author: Andreas Maunz <andreas@maunz.de>, 2012-05-02 17:03:40 +0200
committer: Andreas Maunz <andreas@maunz.de>, 2012-05-02 17:03:40 +0200
commit: 813f3fcb9e1ecf873c396a09939518ecddf07def
tree: 3c8bb7734696e473a53ff0597720bac884935831
parent: ced27698c975bbc581ad1fa624fa0872d5abdd17
pc, dm, fs
Diffstat (limited to '_posts/2012-05-02-calculating-physico-chemical-descriptors-with-opentox-algorithm.md')
-rw-r--r-- _posts/2012-05-02-calculating-physico-chemical-descriptors-with-opentox-algorithm.md | 131
1 files changed, 131 insertions, 0 deletions
diff --git a/_posts/2012-05-02-calculating-physico-chemical-descriptors-with-opentox-algorithm.md b/_posts/2012-05-02-calculating-physico-chemical-descriptors-with-opentox-algorithm.md
new file mode 100644
index 0000000..3151337
--- /dev/null
+++ b/_posts/2012-05-02-calculating-physico-chemical-descriptors-with-opentox-algorithm.md
@@ -0,0 +1,131 @@
+---
+layout: post
+title: "Calculating physico-chemical descriptors with opentox algorithm"
+description: ""
+category:
+tags: []
+---
+{% include JB/setup %}
+
+**Opentox-ruby has facilities to calculate more than 300 non-proprietary physico-chemical descriptors, whose calculation can be easily triggered with REST calls.**
+
+Three libraries are employed for descriptor calculation:
+
+
+* [Chemistry Development Kit](http://cdk.sf.net), developed by Chris Steinbeck, Egon Willighagen, and others.
+
+
+* [OpenBabel](http://openbabel.sf.net), developed by Noel O'Boyle and others.
+
+
+* [Joelib2](http://sourceforge.net/projects/joelib/), developed by Jörg Wegner.
+
+
+CDK and OpenBabel are actively maintained packages with a large community, while Joelib seems to be a one-man effort that has reached its final state.
+
+Descriptors have been categorized across libraries. It is possible to calculate descriptors individually, or calculate groups of descriptors according to categories and/or libraries.
+
+
+# Categories
+
+
+Descriptors were categorized based on work by [Guha](http://www.rguha.net/code/java/cdkdesc.html). The list gives code names, with the number of descriptors in parentheses:
+
+
+* _geometrical (20)_
+
+* _topological (186)_
+
+* _electronic (10)_
+
+* _cpsa (28)_
+
+* _constitutional (48)_
+
+* _hybrid (23)_
+
+
+
+
+# Libraries
+
+
+Three libraries are employed. The list gives code names, with the number of descriptors in parentheses:
+
+
+* _cdk (263)_
+
+* _openbabel (20)_
+
+* _joelib (32)_
+
+
+**Note:** CDK descriptors are calculated via REST calls to [Ambit](http://ambit.sf.net), while the others are derived locally on the server.
+
+
+# Creating a Feature Dataset
+
+
+The examples below use the deployment on ot-test.in-silico.ch:
+
+
+
+
+* Get a list of all descriptors: Query the pc descriptor service without parameters.
+
+
+ curl "http://ot-test.in-silico.ch/algorithm/pc"
+
+
+
+* Calculate all descriptors (using _AllDescriptors_ from the above list) by POSTing a dataset URI.
+
+
+ curl -X POST \
+ --data-urlencode "dataset_uri=..." \
+ "http://ot-test.in-silico.ch/algorithm/pc/AllDescriptors"
+
+
+Pass the arguments _pc_type_ and/or _lib_ to constrain the query to certain categories and/or libraries, using the code names from above as argument values. For example, to calculate constitutional and electronic descriptors from CDK, use pc_type=constitutional,electronic and lib=cdk.
+
+
+ curl -X POST \
+ --data-urlencode "dataset_uri=..." \
+ --data-urlencode "pc_type=constitutional,electronic" \
+ --data-urlencode "lib=cdk" \
+ "http://ot-test.in-silico.ch/algorithm/pc/AllDescriptors"
+
+
+Multiple values for one argument are combined by appending them in a comma-separated list.
+
+* Calculate individual descriptors by POSTing to one of the other URIs from the list.
+
+* The POST returns a task URI that can be queried for progress.
+
+
+That's it! The resulting feature dataset contains extensive [OWL feature metadata](http://opentox.org/data/documents/development/RDF%20files):
+
+
+* OT.hasSource: The dataset passed as _dataset_uri_ argument
+
+* DC.creator: The webservice used to calculate the feature
+
+* DC.description: Free text with a human-readable description, library, and category of the feature
+
+
+
+
+# Notes
+
+
+* As with any dataset, resulting feature datasets may also be requested in CSV or YAML format by specifying the appropriate Accept header.
+
+* Further processing may include feature selection as a next step. [An implementation based on Random Forests is available in opentox-ruby](http://www.maunz.de/wordpress/opentox/2012/selecting-features-with-opentox-ruby).
+
+* The dataset (perhaps after feature selection) can be [passed to the Lazar algorithm](http://www.maunz.de/wordpress/opentox/2011/lazar-models-and-how-to-trigger-them) as feature dataset.
+
+* For the bigger picture, see the complete [tutorial](http://www.maunz.de/wordpress/opentox/2012/services-tutorial-lazar-feature-generation-feature-selection-validation) that streamlines the whole process.
+
+
+
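
The post says that the POST returns a task URI that can be queried for progress, but does not show the polling step. The following is a minimal sketch of how that step might be scripted. Nothing in it is taken from the post except the service URI: the function names `task_state` and `wait_for_task` are hypothetical, and the status convention (HTTP 202 while the task runs, 200 once it has completed), the `text/uri-list` request, and the `text/csv` media type are assumptions about the service that should be checked against the deployment.

```shell
# Sketch only. Function names and the HTTP status convention are
# assumptions, not part of the post.

# Map the HTTP status of a GET on a task URI to a coarse state.
# Assumption: 202 = still running, 200 = completed, anything else = error.
task_state() {
  case "$1" in
    200) echo "completed" ;;
    202) echo "running" ;;
    *)   echo "error" ;;
  esac
}

# Poll a task URI until it is no longer running, then print the
# response body (assumed to hold the URI of the feature dataset).
# $1 = task URI, $2 = seconds between polls (default 5)
wait_for_task() {
  local task_uri="$1" interval="${2:-5}" body_file status
  body_file=$(mktemp) || return 1
  while :; do
    # -o stores the body in a file, -w prints only the status code
    status=$(curl -s -o "$body_file" -w '%{http_code}' \
      -H "Accept: text/uri-list" "$task_uri")
    case "$(task_state "$status")" in
      completed) cat "$body_file"; rm -f "$body_file"; return 0 ;;
      running)   sleep "$interval" ;;
      *)         rm -f "$body_file"
                 echo "task failed with HTTP status $status" >&2
                 return 1 ;;
    esac
  done
}

# Possible usage (the dataset URI is elided in the post and must be supplied):
#   task_uri=$(curl -s -X POST \
#     --data-urlencode "dataset_uri=..." \
#     "http://ot-test.in-silico.ch/algorithm/pc/AllDescriptors")
#   result_uri=$(wait_for_task "$task_uri")
#   curl -H "Accept: text/csv" "$result_uri"
```

Keeping the status mapping in its own small function makes the assumed convention easy to adjust in one place if the service reports task progress differently.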