Diffstat (limited to 'mutagenicity.md')
-rw-r--r-- mutagenicity.md 38
1 files changed, 28 insertions, 10 deletions
diff --git a/mutagenicity.md b/mutagenicity.md
index c278142..d05cbc7 100644
--- a/mutagenicity.md
+++ b/mutagenicity.md
@@ -478,7 +478,9 @@ Results
10-fold crossvalidations
------------------------
-Crossvalidation results are summarized in the following tables: @tbl:lazar shows `lazar` results with MolPrint2D and PaDEL descriptors, @tbl:R R results and @tbl:tensorflow Tensorflow results.
+Crossvalidation results are summarized in the following tables: @tbl:lazar
+shows `lazar` results with MolPrint2D and PaDEL descriptors, @tbl:R R results
+and @tbl:tensorflow Tensorflow results.
```{#tbl:lazar .table file="tables/lazar-summary.csv" caption="Summary of lazar crossvalidation results (all/high confidence predictions)"}
@@ -494,25 +496,41 @@ Crossvalidation results are summarized in the following tables: @tbl:lazar shows
![ROC plot of crossvalidation results.](figures/roc.png){#fig:roc}
-Confusion matrices for all models are available from the git repository http://git.in-silico.ch/mutagenicity-paper/10-fold-crossvalidations/confusion-matrices/, individual predictions can be found in
-http://git.in-silico.ch/mutagenicity-paper/10-fold-crossvalidations/predictions/.
+Confusion matrices for all models are available from the git repository
+https://git.in-silico.ch/mutagenicity-paper/10-fold-crossvalidations/confusion-matrices/,
+individual predictions can be found in
+https://git.in-silico.ch/mutagenicity-paper/10-fold-crossvalidations/predictions/.
-The most accurate crossvalidation predictions have been obtained with standard `lazar` models using MolPrint2D descriptors ({{cv.lazar-high-confidence.acc}} for predictions with high confidence, {{cv.lazar-all.acc}} for all predictions). Models utilizing PaDEL descriptors have generally lower accuracies ranging from {{cv.R-DL.acc}} (R deep learning) to {{cv.R-RF.acc}} (R/Tensorflow random forests). Sensitivity and specificity is generally well balanced with the exception of `lazar`-PaDEL (low sensitivity) and R deep learning (low specificity) models.
+The most accurate crossvalidation predictions have been obtained with standard
+`lazar` models using MolPrint2D descriptors ({{cv.lazar-high-confidence.acc}}
+for predictions with high confidence, {{cv.lazar-all.acc}} for all
+predictions). Models utilizing PaDEL descriptors have generally lower
+accuracies ranging from {{cv.R-DL.acc}} (R deep learning) to {{cv.R-RF.acc}}
+(R/Tensorflow random forests). Sensitivity and specificity are generally well
+balanced with the exception of `lazar`-PaDEL (low sensitivity) and R deep
+learning (low specificity) models.
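
The accuracy, sensitivity and specificity values discussed above follow directly from a binary confusion matrix. The following is a minimal sketch of that arithmetic, not the code used for the paper: the `ConfusionMatrix` class is a hypothetical container and the counts are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary mutagenicity classifier (hypothetical container)."""

    tp: int  # mutagens predicted as mutagenic
    fp: int  # non-mutagens predicted as mutagenic
    tn: int  # non-mutagens predicted as non-mutagenic
    fn: int  # mutagens predicted as non-mutagenic

    def accuracy(self) -> float:
        # Fraction of all predictions that are correct.
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    def sensitivity(self) -> float:
        # Fraction of true mutagens that are recognised as mutagenic.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Fraction of true non-mutagens that are recognised as non-mutagenic.
        return self.tn / (self.tn + self.fp)


# Made-up counts, for illustration only (not results from the paper).
cm = ConfusionMatrix(tp=40, fp=10, tn=35, fn=15)
print(f"accuracy={cm.accuracy():.2f}")
print(f"sensitivity={cm.sensitivity():.2f}")
print(f"specificity={cm.specificity():.2f}")
```

A model with low sensitivity misses many true mutagens, while a model with low specificity flags many non-mutagens, which is the imbalance noted for the `lazar`-PaDEL and R deep learning models.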
Pyrrolizidine alkaloid mutagenicity predictions
-----------------------------------------------
-Mutagenicity predictions from all investigated models for 602 pyrrolizidine alkaloids are summarized in Table 4.
+Mutagenicity predictions from all investigated models for 602 pyrrolizidine
+alkaloids (PAs) are summarized in Table 4. A CSV table with all predictions can be
+downloaded from https://git.in-silico.ch/mutagenicity-paper/tables/pa-table.csv.
**TODO** **Verena and Philipp** Could you please spot-check the table? Things tend to slip out of place when I do the evaluation.
\input{tables/pa-tab.tex}
-Training data and
-pyrrolizidine alkaloids were visualised with t-distributed stochastic neighbor embedding (t-SNE, @Maaten2008)
-for MolPrint2D and PaDEL descriptors. t-SNA maps each high-dimensional object
-(chemical) to a two-dimensional point. Similar objects are represented by
-nearby points and dissimilar objects are represented by distant points.
+```{#tbl:pa-summary .table file="tables/pa-summary.csv" caption="Summary of pyrrolizidine alkaloid mutagenicity predictions"}
+```
+
+To visualise the position of pyrrolizidine alkaloids with respect to the
+training data set we have applied t-distributed stochastic neighbor
+embedding (t-SNE, @Maaten2008) for MolPrint2D and PaDEL descriptors. t-SNE
+maps each high-dimensional object (chemical) to a two-dimensional point,
+approximately preserving the local neighbourhood structure of the objects.
+Similar objects are represented by nearby points and dissimilar objects are
+represented by distant points.
@fig:tsne-mp2d shows the t-SNE of pyrrolizidine alkaloids (PA) and the mutagenicity training data in MP2D space (Tanimoto/Jaccard similarity).
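
The Tanimoto/Jaccard similarity mentioned above can be illustrated with a short sketch. It assumes that each compound is represented as a set of fingerprint features. The feature identifiers below are hypothetical placeholders rather than real MolPrint2D features, and the function is not taken from `lazar`.

```python
def tanimoto(a: set[str], b: set[str]) -> float:
    """Tanimoto/Jaccard similarity of two fingerprint feature sets."""
    if not a and not b:
        # Convention chosen for this sketch: two empty sets count as identical.
        return 1.0
    # Shared features divided by all distinct features of both compounds.
    return len(a & b) / len(a | b)


# Hypothetical feature identifiers, for illustration only.
fp_a = {"f1", "f2", "f3", "f4"}
fp_b = {"f3", "f4", "f5"}

similarity = tanimoto(fp_a, fp_b)  # 2 shared features out of 5 distinct ones
distance = 1.0 - similarity        # dissimilarity derived from the similarity
print(similarity, distance)
```

One way to use this, as an assumption rather than a description of the authors' workflow, is to collect the pairwise `1 - similarity` values into a distance matrix and pass it to a t-SNE implementation that accepts precomputed distances.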