diff options
author | Andreas Maunz <andreas@maunz.de> | 2012-05-02 17:03:40 +0200 |
---|---|---|
committer | Andreas Maunz <andreas@maunz.de> | 2012-05-02 17:03:40 +0200 |
commit | 813f3fcb9e1ecf873c396a09939518ecddf07def (patch) | |
tree | 3c8bb7734696e473a53ff0597720bac884935831 /_posts/2012-05-02-calculating-physico-chemical-descriptors-with-opentox-algorithm.md | |
parent | ced27698c975bbc581ad1fa624fa0872d5abdd17 (diff) |
pc, dm, fs
Diffstat (limited to '_posts/2012-05-02-calculating-physico-chemical-descriptors-with-opentox-algorithm.md')
-rw-r--r-- | _posts/2012-05-02-calculating-physico-chemical-descriptors-with-opentox-algorithm.md | 131 |
1 files changed, 131 insertions, 0 deletions
diff --git a/_posts/2012-05-02-calculating-physico-chemical-descriptors-with-opentox-algorithm.md b/_posts/2012-05-02-calculating-physico-chemical-descriptors-with-opentox-algorithm.md new file mode 100644 index 0000000..3151337 --- /dev/null +++ b/_posts/2012-05-02-calculating-physico-chemical-descriptors-with-opentox-algorithm.md @@ -0,0 +1,131 @@ +--- +layout: post +title: "Calculating physico chemical descriptors with opentox algorithm" +description: "" +category: +tags: [] +--- +{% include JB/setup %} + +**Opentox-ruby has facilities to calculate more than 300 non-proprietary physico-chemical descriptors, whose calculation can be easily triggered with REST calls.** + +Three libraries are employed for descriptor calculation: + + +* [Chemistry Development Kit](http://cdk.sf.net), developed by Chris Steinbeck, Egon Willighagen, and others. + + +* [OpenBabel](http://openbabel.sf.net), developed by Noel O'Boyle and others. + + +* [Joelib2](http://sourceforge.net/projects/joelib/), developed by Jörg Wegener. + + +CDK and OpenBabel are actively maintained packages with a large community, while Joelib seems to be a one-man effort that has reached its final state. + +Descriptors have been categorized across libraries. It is possible to calculate descriptors individually, or calculate groups of descriptors according to categories and/or libraries. + + +# Categories + + +Descriptors were categorized based on work by [Guha](http://www.rguha.net/code/java/cdkdesc.html). The list gives code names and number of descriptors in braces: + + +* _geometrical (20)_ + +* _topological (186)_ + +* _electronic (10)_ + +* _cpsa (28)_ + +* _constitutional (48)_ + +* _hybrid (23)_ + + + + +# Libraries + + +Three libraries are employed. The list gives code names and number of descriptors in braces. + + +* _cdk (263)_ + +* _openbabel (20)_ + +* _joelib (32)_ + + +**Note:** CDK descriptors are calculated via REST calls to [Ambit](http://ambit.sf.net), while the others are derived locally on the server. + + +# Creating a Feature Dataset + + +The deployment on ot-test.in-silico.ch is used to demonstrate the usage: + + + + +* Get a list of all descriptors: Query the pc descriptor service without parameters. + + + curl "http://ot-test.in-silico.ch/algorithm/pc" + + + +* Calculate all descriptors (using _AllDescriptors_ from the above list) by POSTing a dataset uri. + + + curl -X POST \ + --data-urlencode "dataset_uri=..." \ + "http://ot-test.in-silico.ch/algorithm/ + pc/AllDescriptors" + + +Pass arguments _pc_type_ and/or _lib_ to constrain the query to certain categories and/or libraries. Use code names from above as argument values. E.g., to calculate constitutional and electronic descriptors from CDK, use pc_type=constitutional,electronic and lib=cdk. + + + curl -X POST \ + --data-urlencode "dataset_uri=..." \ + --data-urlencode "pc_type=constitutional,electronic", \ + --data-urlencode "lib=cdk" \ + "http://ot-test.in-silico.ch/algorithm/pc/AllDescriptors" + + +Clearly, it is possible to combine values by appending them in a comma-separated list. + +* Calculate individual descriptors by POSTing to one of the other URIs from the list. + +* The POST returns a task URI that can be queried for progress. + + +That's it! The resulting feature dataset contains extensive [OWL feature metadata](http://opentox.org/data/documents/development/RDF%20files): + + +* OT.hasSource: The dataset passed as _dataset_uri_ argument + +* DC.creator: The webservice used to calculate the feature + +* DC.description: Free text with a human readable description, library, and category of the feature + + + + +# Notes + + +* As with any dataset, resulting feature datasets may also be requested in CSV or YAML formats by specifying appropriate accept-Headers. + +* Further processing may include feature selection as next step. [An implementation based on Random Forests is available in opentox-ruby](http://www.maunz.de/wordpress/opentox/2012/selecting-features-with-opentox-ruby). + +* The dataset (perhaps after feature selection) can be [passed to the Lazar algorithm](http://www.maunz.de/wordpress/opentox/2011/lazar-models-and-how-to-trigger-them) as feature dataset. + +* From a higher perspective: A complete [tutorial](http://www.maunz.de/wordpress/opentox/2012/services-tutorial-lazar-feature-generation-feature-selection-validation) that streamlines the process. + + + |