From 6c4fd5809d20596ad2cfe507cd762bdcdce7fc57 Mon Sep 17 00:00:00 2001
From: Christoph Helma
Date: Fri, 3 Feb 2017 14:36:07 +0100
Subject: algorithm selection tutorial

---
 README.md | 99 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 97 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 28ed18f..2bb5c80 100644
--- a/README.md
+++ b/README.md
@@ -59,7 +59,74 @@ Execute the following commands either from an interactive Ruby shell or a Ruby s
 
 #### Experiment with other algorithms
 
-  You can pass algorithms parameters to the `Model::Validation.create_from_csv_file` command. The [API documentation](http://rdoc.info/gems/lazar) provides detailed instructions.
+  You can pass algorithm specifications as parameters to the `Model::Validation.create_from_csv_file` and `Model::Lazar.create` commands. Algorithms for descriptors, similarity calculations, feature_selection and local models are specified in the `algorithm` parameter. Unspecified algorithms and parameters are substituted by default values. The example below selects
+
+    - MP2D fingerprint descriptors
+    - Tanimoto similarity with a threshold of 0.1
+    - no feature selection
+    - weighted majority vote predictions
+
+  ```
+  algorithms = {
+    :descriptors => { # descriptor algorithm
+      :method => "fingerprint", # fingerprint descriptors
+      :type => "MP2D" # fingerprint type, e.g. FP4, MACCS
+    },
+    :similarity => { # similarity algorithm
+      :method => "Algorithm::Similarity.tanimoto",
+      :min => 0.1 # similarity threshold for neighbors
+    },
+    :feature_selection => nil, # no feature selection
+    :prediction => { # local modelling algorithm
+      :method => "Algorithm::Classification.weighted_majority_vote",
+    },
+  }
+
+  training_dataset = Dataset.from_csv_file "hamster_carcinogenicity.csv"
+  model = Model::Lazar.create training_dataset: training_dataset, algorithms: algorithms
+  ```
+
+  The next example creates a regression model with
+
+    - calculated descriptors from OpenBabel libraries
+    - weighted cosine similarity and a threshold of 0.5
+    - descriptors that are correlated with the endpoint
+    - local partial least squares models from the R caret package
+
+  ```
+  algorithms = {
+    :descriptors => { # descriptor algorithm
+      :method => "calculate_properties",
+      :features => PhysChem.openbabel_descriptors,
+    },
+    :similarity => { # similarity algorithm
+      :method => "Algorithm::Similarity.weighted_cosine",
+      :min => 0.5
+    },
+    :feature_selection => { # feature selection algorithm
+      :method => "Algorithm::FeatureSelection.correlation_filter",
+    },
+    :prediction => { # local modelling algorithm
+      :method => "Algorithm::Caret.pls",
+    },
+  }
+  training_dataset = Dataset.from_csv_file "EPAFHM_log10.csv"
+  model = Model::Lazar.create(training_dataset:training_dataset, algorithms:algorithms)
+  ```
+  Please consult the [API documentation](http://rdoc.info/gems/lazar) and [source code](https://github.com/opentox/lazar) for up to date information about implemented algorithms:
+
+  - Descriptor algorithms
+    - [Compounds](http://www.rubydoc.info/gems/lazar/OpenTox/Compound)
+    - [Nanoparticles](http://www.rubydoc.info/gems/lazar/OpenTox/Nanoparticle)
+  - [Similarity algorithms](http://www.rubydoc.info/gems/lazar/OpenTox/Algorithm/Similarity)
+  - [Feature selection algorithms](http://www.rubydoc.info/gems/lazar/OpenTox/Algorithm/FeatureSelection)
+  - Local models
+    - [Classification](http://www.rubydoc.info/gems/lazar/OpenTox/Algorithm/Classification)
+    - [Regression](http://www.rubydoc.info/gems/lazar/OpenTox/Algorithm/Regression)
+    - [R caret](http://www.rubydoc.info/gems/lazar/OpenTox/Algorithm/Caret)
+
+
+  You can find more working examples in the `lazar` `model-*.rb` and `validation-*.rb` [tests](https://github.com/opentox/lazar/tree/master/test).
 
 ### Create and use `lazar` nanoparticle models
 
@@ -87,7 +154,35 @@ Execute the following commands either from an interactive Ruby shell or a Ruby s
 
 #### Experiment with other datasets, endpoints and algorithms
 
-  You can pass training_dataset, prediction_feature and algorithms parameters to the `Model::Validation.create_from_enanomapper` command. The [API documentation](http://rdoc.info/gems/lazar) provides detailed instructions. Detailed documentation and validation results can be found in this [publication](https://github.com/enanomapper/nano-lazar-paper/blob/master/nano-lazar.pdf).
+  You can pass training_dataset, prediction_feature and algorithms parameters to the `Model::Validation.create_from_enanomapper` command. Procedure and options are the same as for compounds. The following commands create and validate a `nano-lazar` model with
+
+    - measured P-CHEM properties as descriptors
+    - descriptors selected with correlation filter
+    - weighted cosine similarity with a threshold of 0.5
+    - Caret random forests
+
+```
+  algorithms = {
+    :descriptors => {
+      :method => "properties",
+      :categories => ["P-CHEM"],
+    },
+    :similarity => {
+      :method => "Algorithm::Similarity.weighted_cosine",
+      :min => 0.5
+    },
+    :feature_selection => {
+      :method => "Algorithm::FeatureSelection.correlation_filter",
+    },
+    :prediction => {
+      :method => "Algorithm::Caret.rf",
+    },
+  }
+  validation_model = Model::Validation.from_enanomapper algorithms: algorithms
+```
+
+
+  Detailed documentation and validation results for nanoparticle models can be found in this [publication](https://github.com/enanomapper/nano-lazar-paper/blob/master/nano-lazar.pdf).
 
 Documentation
 -------------
--
cgit v1.2.3