---
layout: post
date: 2012-05-02
title: "Calculating physico-chemical descriptors with opentox algorithm"
description: "Opentox-ruby has facilities to calculate more than 300 non-proprietary physico-chemical descriptors, whose calculation can be easily triggered with REST calls."
category: Usage
tags: [Algorithm, Feature Generation, Feature Selection, Tutorials]
---
{% include JB/setup %}

**Opentox-ruby has facilities to calculate more than 300 non-proprietary physico-chemical descriptors, whose calculation can be easily triggered with REST calls.**

Three libraries are employed for descriptor calculation:


* [Chemistry Development Kit](http://cdk.sf.net) (CDK), developed by Chris Steinbeck, Egon Willighagen, and others.


* [OpenBabel](http://openbabel.sf.net), developed by Noel O'Boyle and others.


* [Joelib2](http://sourceforge.net/projects/joelib/), developed by Jörg Wegener.


CDK and OpenBabel are actively maintained packages with large communities, while Joelib appears to be more or less a one-person project.

Descriptors have been categorized across libraries. It is possible to calculate descriptors individually, or calculate groups of descriptors according to categories and/or libraries.


# Categories


Descriptors were categorized based on work by [R. Guha](http://www.rguha.net/code/java/cdkdesc.html). The list gives the code name of each category and, in parentheses, its number of descriptors:

	
* _geometrical (20)_

* _topological (186)_

* _electronic (10)_

* _cpsa (28)_

* _constitutional (48)_

* _hybrid (23)_




# Libraries


The list gives the code name of each library and, in parentheses, its number of descriptors:

	
* _cdk (263)_

* _openbabel (20)_

* _joelib (32)_



# Creating a Feature Dataset


![Descriptor Calculation](/images/pc.png)


The deployment on `ot-test.in-silico.ch` is used to demonstrate the usage:


	
Get a list of all descriptors by querying the pc descriptor service without parameters:
    
    curl "http://ot-test.in-silico.ch/algorithm/pc"
    
	
Calculate all descriptors (using *AllDescriptors* from the above list) by POSTing a dataset URI:

    
    curl -X POST \
    --data-urlencode "dataset_uri=..." \
    "http://ot-test.in-silico.ch/algorithm/pc/AllDescriptors"
    

Pass arguments *pc_type* and/or *lib* to constrain the query to certain categories and/or libraries. Use code names from above as argument values. E.g., to calculate constitutional and electronic descriptors from CDK, use `pc_type=constitutional,electronic` and `lib=cdk`.


    curl -X POST \
    --data-urlencode "dataset_uri=..." \
    --data-urlencode "pc_type=constitutional,electronic" \
    --data-urlencode "lib=cdk" \
    "http://ot-test.in-silico.ch/algorithm/pc/AllDescriptors"
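
The POST above can be wrapped in a small shell function that sends `pc_type` and `lib` only when they are given. This is a minimal sketch, not part of opentox-ruby: the function name `pc_descriptors` and the variable `PC_SERVICE` are hypothetical, and the endpoint is the `ot-test.in-silico.ch` deployment used throughout this post.

```shell
#!/bin/sh
# Hypothetical helper (not part of opentox-ruby).
# Usage: pc_descriptors DATASET_URI [PC_TYPE] [LIB]
# PC_TYPE and LIB take the code names listed above, comma-separated if several.
PC_SERVICE="http://ot-test.in-silico.ch/algorithm/pc/AllDescriptors"

pc_descriptors() {
  # Save the optional arguments before the positional parameters are reused.
  pc_type=$2
  lib=$3
  # Rebuild the positional parameters as the curl argument list.
  set -- -s -X POST --data-urlencode "dataset_uri=$1"
  if [ -n "$pc_type" ]; then
    set -- "$@" --data-urlencode "pc_type=$pc_type"
  fi
  if [ -n "$lib" ]; then
    set -- "$@" --data-urlencode "lib=$lib"
  fi
  # Print whatever the service returns (a task URI, per the notes below).
  curl "$@" "$PC_SERVICE"
}
```

For example, `pc_descriptors "$DATASET_URI" constitutional,electronic cdk` sends the same request as the second curl command above, with `DATASET_URI` standing in for your dataset. Building the argument list with `set --` keeps the function usable in plain `sh`, which has no arrays.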


Several values for the same argument can be combined in a comma-separated list, as with `pc_type` above. In addition:

* Calculate individual descriptors by POSTing to one of the other URIs from the list.

* The POST returns a task URI that can be queried for progress.
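
Waiting for the task can be scripted as well. The function below is a hypothetical sketch that assumes the task service follows the OpenTox API 1.2 convention: a GET on the task URI answers with HTTP 202 while the task is running and with 200 or 201 once it has finished, returning the result URI as `text/uri-list`. Check these status codes against the deployment you use.

```shell
#!/bin/sh
# Hypothetical helper (not part of opentox-ruby).
# ASSUMPTION: 202 = still running, 200/201 = finished with the result URI in the
# text/uri-list body, anything else = error (OpenTox API 1.2 convention).
# Usage: wait_for_task TASK_URI [POLL_INTERVAL_SECONDS]
wait_for_task() {
  task_uri=$1
  interval=${2:-5}
  while :; do
    # Append the HTTP status code on its own line after the response body.
    resp=$(curl -s -w '\n%{http_code}' -H "Accept: text/uri-list" "$task_uri")
    code=$(printf '%s\n' "$resp" | tail -n 1)
    body=$(printf '%s\n' "$resp" | sed '$d')
    case $code in
      202)
        sleep "$interval" ;;          # still running: poll again later
      200|201)
        printf '%s\n' "$body"         # finished: print the result URI
        return 0 ;;
      *)
        echo "task failed or unreachable (HTTP $code): $task_uri" >&2
        return 1 ;;
    esac
  done
}
```

Combined with the POST from above, `wait_for_task "$(curl -s -X POST --data-urlencode "dataset_uri=$DATASET_URI" http://ot-test.in-silico.ch/algorithm/pc/AllDescriptors)"` would print the URI of the resulting feature dataset, again with `DATASET_URI` as a placeholder.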


That's it! The resulting feature dataset contains extensive [OWL feature metadata](http://opentox.org/data/documents/development/RDF%20files):


* `OT.hasSource`: The dataset passed as *dataset_uri* argument

* `DC.creator`: The webservice used to calculate the feature

* `DC.description`: Free text with a human-readable description, library, and category of the feature


# Notes
	
* As with any dataset, resulting feature datasets may also be requested in CSV or YAML format by setting an appropriate `Accept` header.

* Further processing may include feature selection as a next step. [An implementation based on Random Forests is available in opentox-ruby](/algorithm/2012/05/02/selecting-features-with-opentox-algorithm).

* The dataset (perhaps after feature selection) can be [passed to the Lazar algorithm](/algorithm/2012/05/02/lazar-models-and-how-to-trigger-them) as feature dataset.

* From a higher perspective, a complete [tutorial](/algorithm/2012/05/01/services-tutorial---lazar-feature-generation-feature-selection-validation) streamlines the whole process.
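
The `Accept`-header note above can be sketched as a small download function. The function name `fetch_dataset` is hypothetical, and the media types are assumptions: `text/csv` and `application/x-yaml` for CSV and YAML, and `application/rdf+xml` for the RDF representation that carries the OWL feature metadata. Verify them against the dataset service you use.

```shell
#!/bin/sh
# Hypothetical helper (not part of opentox-ruby).
# ASSUMPTION: the dataset service negotiates these media types; adjust if needed.
# Usage: fetch_dataset DATASET_URI csv|yaml|rdf
fetch_dataset() {
  # Map the short format name to the media type sent in the Accept header.
  case $2 in
    csv)  accept="text/csv" ;;
    yaml) accept="application/x-yaml" ;;
    rdf)  accept="application/rdf+xml" ;;
    *)
      echo "unknown format: $2 (use csv, yaml or rdf)" >&2
      return 1 ;;
  esac
  curl -s -H "Accept: $accept" "$1"
}
```

For example, `fetch_dataset "$FEATURE_DATASET_URI" csv > features.csv` would save the descriptor table as CSV, where `FEATURE_DATASET_URI` stands for the URI of the resulting feature dataset.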