##### CheS-Mapper analysis

We used the visualization tool CheS-Mapper (Chemical Space Mapping and Visualization in 3D,
http://ches-mapper.org, @Gütlein2012) to compare both datasets. CheS-Mapper supports the analysis of relationships between the structure of chemical compounds, their physico-chemical properties, and their biological or toxic effects. It embeds a dataset into 3D space such that compounds with similar feature values are close to each other. CheS-Mapper is generic and can be used with different kinds of features. [@fig:ches-mapper-pc] shows an embedding based on physico-chemical (PC) descriptors: both datasets have very similar PC feature values.
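
The embedding step can be illustrated with a minimal sketch of a principal components projection into three dimensions. This is not CheS-Mapper's implementation: the function names, the power-iteration solver, and the toy descriptor values are assumptions chosen only to keep the example self-contained.

```python
import math
import random


def standardize(rows):
    """Center each feature column and scale it to unit variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = []
    for j in range(d):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)
        stds.append(math.sqrt(var) or 1.0)  # constant columns are left at 0
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]


def covariance(z):
    """Covariance matrix of already centered data."""
    n, d = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_components(cov, k=3, iters=500, seed=0):
    """Approximate the k leading eigenvectors by power iteration with deflation."""
    rng = random.Random(seed)
    d = len(cov)
    cov = [row[:] for row in cov]  # work on a copy
    components = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break  # no variance left in the deflated matrix
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                         for i in range(d))
        components.append(v)
        # Remove the found direction so the next round finds the next one.
        for i in range(d):
            for j in range(d):
                cov[i][j] -= eigenvalue * v[i] * v[j]
    return components


def embed_3d(rows):
    """Project standardized feature vectors onto three principal components."""
    z = standardize(rows)
    components = top_components(covariance(z), k=3)
    return [[sum(r[j] * c[j] for j in range(len(r))) for c in components]
            for r in z]


# Toy descriptor table (made-up values): weight, logP, H-bond donors, acceptors.
toy_descriptors = [
    [46.1, -0.3, 1, 1],
    [78.1, 2.1, 0, 0],
    [180.2, 1.2, 1, 4],
    [94.1, 1.5, 1, 1],
    [151.2, 0.5, 2, 2],
    [46.1, -0.3, 1, 1],  # same feature values as the first row
]
coordinates = embed_3d(toy_descriptors)
```

Rows with identical feature values receive identical coordinates, which is the property the visual comparison of the two datasets relies on.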

![Compounds from the Mazzatorta and the Swiss dataset are highlighted in red and green, respectively. Compounds that occur in both datasets are highlighted in magenta. In this example, CheS-Mapper applied a principal components analysis to map compounds into 3D space according to their physico-chemical (PC) feature values. In general, both datasets have similar PC feature values. As an exception, the Mazzatorta dataset includes most of the very small compounds: we selected the 78 smallest compounds (with 10 atoms or fewer, marked with a blue box in the screenshot) and found that 61 of these compounds occur in the Mazzatorta dataset, whereas only 19 are contained in the Swiss dataset (p-value 3.7E-7).](figure/pc-small-compounds-highlighted.png){#fig:ches-mapper-pc}

We extended CheS-Mapper with functionality to mine the same MolPrint2D features that are used for model building in this work. Applying a minimum frequency of 3 yields 760 distinct MolPrint2D fragments for the combined dataset of 671 unique compounds. Again, a visual inspection confirmed that both datasets are structurally very similar. In addition, CheS-Mapper can detect features that distinguish a group of selected compounds from the entire dataset. Using this functionality, we found discriminating features for compounds that occur in only one of the two datasets, and for the most active and the most inactive compounds (see [@tbl:molprint]). As an example, [@fig:ches-mapper-alert] shows the 9 compounds that match a specific fragment (no other compound in the dataset matches this fragment), most of which have very low mean LOAEL values.
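
To illustrate how a fragment can be scored as discriminating for a selection, the following sketch computes a one-sided hypergeometric tail probability (the enrichment side of Fisher's exact test) from match counts. The choice of test and the function name `enrichment_p_value` are assumptions: the text does not state which test CheS-Mapper applies, so the result need not reproduce the p-values reported in [@tbl:molprint].

```python
from math import comb


def enrichment_p_value(total, total_matches, selected, selected_matches):
    """Probability of drawing at least `selected_matches` matching compounds
    when `selected` compounds are sampled at random from `total` compounds,
    of which `total_matches` match the fragment (hypergeometric upper tail)."""
    upper = min(total_matches, selected)
    tail = sum(
        comb(total_matches, k) * comb(total - total_matches, selected - k)
        for k in range(selected_matches, upper + 1)
    )
    return tail / comb(total, selected)


# Counts for fragment 15;1-1-1;2-2-1;2-1-15; in the low10 selection:
# 9 of 671 compounds match overall, 8 of the 67 selected compounds match.
p = enrichment_p_value(total=671, total_matches=9,
                       selected=67, selected_matches=8)
print(f"{p:.3g}")
```

A small value indicates that the fragment occurs in the selection far more often than expected by chance. Fragments that are depleted in a selection would need the opposite tail or a two-sided test.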

| Selection  | Num selected compounds | Feature name                           | Human-readable feature name          | Feature values entire dataset | Feature values in selection  | P-Value               |
|------------|------------------------|----------------------------------------|--------------------------------------|-------------------------------|------------------------------|-----------------------|
|            |                        |                                        |                                      |                               |                              |                       |
| Mazzatorta | 290                    | 8;1-1-3;2-2-3;                         | O.3 1:C.ar 2:2xC.ar                  | 643× 'no-match', 28× 'match'  | 265× 'no-match', 25× 'match' | 0.005560996217776615  |
| Mazzatorta | 290                    | 9;1-1-2;2-1-3;2-1-28;                  | O 1:C.2 2:C.ar,N.am                  | 629× 'no-match', 42× 'match'  | 284× 'no-match', 6× 'match'  | 0.006195320799272208  |
| Mazzatorta | 290                    | 15;1-1-3;2-2-3;                        | Cl 1:C.ar 2:2xC.ar                   | 504× 'no-match', 167× 'match' | 240× 'no-match', 50× 'match' | 0.009255119323774763  |
|            |                        |                                        |                                      |                               |                              |                       |
| Swiss      | 226                    | 16;1-1-3;2-2-3;                        | F 1:C.ar 2:2xC.ar                    | 630× 'no-match', 41× 'match'  | 199× 'no-match', 27× 'match' | 0.004142648833225349  |
| Swiss      | 226                    | 8;1-1-3;2-2-3;                         | O.3 1:C.ar 2:2xC.ar                  | 643× 'no-match', 28× 'match'  | 225× 'no-match', 1× 'match'  | 0.006101869043914521  |
|            |                        |                                        |                                      |                               |                              |                       |
| low10      | 67                     | 1;1-1-8;2-1-12;                        | C 1:O.3 2:P.3                        | 642× 'no-match', 29× 'match'  | 52× 'no-match', 15× 'match'  | 2.599701232064433E-9  |
| low10      | 67                     | 15;1-1-1;2-2-1;2-1-15;                 | Cl 1:C 2:2xC,Cl                      | 662× 'no-match', 9× 'match'   | 59× 'no-match', 8× 'match'   | 3.499196354894707E-8  |
| low10      | 67                     | 1;1-1-1;1-1-8;2-1-12;                  | C 1:C,O.3 2:P.3                      | 645× 'no-match', 26× 'match'  | 54× 'no-match', 13× 'match'  | 6.053371437442223E-8  |
| low10      | 67                     | 2;1-1-1;1-1-2;2-3-1;                   | C.2 1:C,C.2 2:3xC                    | 663× 'no-match', 8× 'match'   | 61× 'no-match', 6× 'match'   | 8.936801443204523E-6  |
| low10      | 67                     | 2;1-1-1;1-1-2;1-1-15;2-3-1;2-2-15;     | C.2 1:C,C.2,Cl 2:3xC,2xCl            | 665× 'no-match', 6× 'match'   | 62× 'no-match', 5× 'match'   | 2.3279183652191726E-5 |
|            |                        |                                        |                                      |                               |                              |                       |
| high10     | 67                     | 8;1-1-3;2-2-3;                         | O.3 1:C.ar 2:2xC.ar                  | 643× 'no-match', 28× 'match'  | 57× 'no-match', 10× 'match'  | 1.4617532950766954E-4 |
| high10     | 67                     | 3;1-2-3;2-1-1;2-2-3;                   | C.ar 1:2xC.ar 2:C,2xC.ar             | 506× 'no-match', 165× 'match' | 64× 'no-match', 3× 'match'   | 1.8132445228380423E-4 |
| high10     | 67                     | 1;1-1-1;1-1-2;2-1-1;2-1-8;2-1-9;       | C 1:C,C.2 2:C,O.3,O                  | 668× 'no-match', 3× 'match'   | 64× 'no-match', 3× 'match'   | 4.598209084156757E-4  |
| high10     | 67                     | 1;1-2-1;1-1-8;2-1-1;2-2-8;             | C 1:2xC,O.3 2:C,2xO.3                | 668× 'no-match', 3× 'match'   | 64× 'no-match', 3× 'match'   | 4.598209084156757E-4  |
| high10     | 67                     | 3;1-1-2;1-2-3;2-1-2;2-2-3;2-1-8;2-1-9; | C.ar 1:C.2,2xC.ar 2:C.2,2xC.ar,O.3,O | 662× 'no-match', 9× 'match'   | 62× 'no-match', 5× 'match'   | 4.613813663041366E-4  |

: Significant MolPrint2D features. The listed features help to distinguish a selection of compounds from the entire dataset (of 671 compounds). We selected compounds that occur in only one of the two datasets, or the 10 percent of all compounds with the lowest/highest LOAEL values (the mean LOAEL value was used when a compound occurs in both datasets or was measured multiple times). For each set of selected compounds, we list the five most significant fragments with p-value < 0.01 (or fewer if fewer fragments meet this threshold). MolPrint2D fragments are circular fragments that consist of a center atom and two layers of neighboring atoms. {#tbl:molprint}
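
The relation between the two feature name columns can be sketched with a small parser. The token layout (`center;layer-count-type;...`) and the atom-type codes are inferred from the rows of [@tbl:molprint] alone, so both are assumptions. Codes that do not appear in the table are not covered, and `readable_fragment` is a hypothetical helper name.

```python
# Atom-type codes inferred from the table rows (assumption, not exhaustive).
ATOM_TYPES = {
    1: "C", 2: "C.2", 3: "C.ar", 8: "O.3", 9: "O",
    12: "P.3", 15: "Cl", 16: "F", 28: "N.am",
}


def readable_fragment(name):
    """Translate e.g. '15;1-1-1;2-2-1;2-1-15;' into 'Cl 1:C 2:2xC,Cl'."""
    tokens = [t for t in name.split(";") if t]
    center = ATOM_TYPES[int(tokens[0])]
    layers = {}
    for token in tokens[1:]:
        # Each token reads layer-count-type, e.g. '2-2-1' = layer 2, 2x type 1.
        layer, count, atom = (int(x) for x in token.split("-"))
        label = ATOM_TYPES[atom]
        layers.setdefault(layer, []).append(
            label if count == 1 else f"{count}x{label}"
        )
    parts = [center]
    for layer, items in sorted(layers.items()):
        parts.append(f"{layer}:{','.join(items)}")
    return " ".join(parts)


print(readable_fragment("15;1-1-1;2-2-1;2-1-15;"))
```

Read this way, each fragment names the center atom type first, followed by the atom types found one and two bonds away.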

![A CheS-Mapper screenshot showing the 9 compounds that match the MolPrint2D fragment 15;1-1-1;2-2-1;2-1-15; (in SMILES syntax: ClC(C)Cl). Except for the selected compound (blue box), all of these compounds belong to the 10 percent of compounds with the lowest LOAEL values. This feature can therefore be regarded as a structural alert in our dataset: it is matched by only 9 compounds in the entire dataset, and 8 of these compounds are highly active.](figure/matching-ClC(C)Cl.png){#fig:ches-mapper-alert}