summaryrefslogtreecommitdiff
path: root/nch/README.md
blob: e9e3e941c047a5006aa5f4486eeca11127d209e3 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15


scripts
-------------------------------------
results are stored in the data folder in the corresponding sub-folders: 01, 02, ...

config.rb defines which datasets to employ and stores URIs of already uploaded files

* 01_fetch - copies data from old repository and converts to a consistent naming scheme
* 02_decode_inchi.rb - decodes inchis and renames SMILES column to InChI
* 03_validate_compounds.rb - checks if all compounds are included in the feature set, stores uniq compounds without duplicates
* 04_get_feature_names.rb - extracts new features names for features from orig files
* 05_compute_features.rb - computes new features
* 06_compare_features.rb - compares orig features and new features
* 07_validate.rb - starts crossvalidation/test set validation with old / new features