author Christoph Helma <helma@in-silico.ch> 2020-10-16 18:28:18 +0200
committer Christoph Helma <helma@in-silico.ch> 2020-10-16 18:28:18 +0200
commit e288019a0f3eb691723944d6e47838d52cfdc21a
tree 15df8933515f8d414971c0c9454dc37b88c3b739 /mutagenicity.md
parent e3a32112611f263104c767fae8c6e1f2b95d505f
pa prediction table integrated
Diffstat (limited to 'mutagenicity.md')
-rw-r--r-- mutagenicity.md | 125
1 files changed, 7 insertions, 118 deletions
diff --git a/mutagenicity.md b/mutagenicity.md
index b15ea54..4ac5a32 100644
--- a/mutagenicity.md
+++ b/mutagenicity.md
@@ -35,6 +35,7 @@ header-includes:
- \usepackage{setspace}
- \doublespacing
- \usepackage{lineno}
+ - \usepackage{color, colortbl, longtable}
- \linenumbers
...
@@ -481,7 +482,7 @@ Results
Crossvalidation results are summarized in the following tables: @tbl:lazar shows `lazar` results with MolPrint2D and PaDEL descriptors, @tbl:R R results and @tbl:tensorflow Tensorflow results.
-```{#tbl:lazar .table file="tables/lazar-summary.csv" caption="Summary of lazar crossvalidation results (all predictions/high confidence predictions"}
+```{#tbl:lazar .table file="tables/lazar-summary.csv" caption="Summary of lazar crossvalidation results (all/high confidence predictions"}
```
```{#tbl:R .table file="tables/r-summary.csv" caption="Summary of R crossvalidation results"}
@@ -499,133 +500,21 @@ http://git.in-silico.ch/mutagenicity-paper/10-fold-crossvalidations/predictions/
The most accurate crossvalidation predictions have been obtained with `lazar` models with MolPrint2D descriptors ({{lazar-high-confidence.acc}} for predictions with high confidence, {{lazar-all.acc}} for all predictions). Models utilizing PaDEL descriptors have generally lower accuracies ranging from TODO to TODO. Sensitivity and specificity is generally well balanced with the exception of `lazar`-PaDEL (low sensitivity) and R deep learning (low specificity) models.
-<!--
-| |R-RF | R-SVM | R-DL | TF | TF-FS | L | L-HC | L-P | L-P-HC|
-|-|-----|-------|------|----|-------|---|------|------|--------|
-|Accuracy|{{R-RF.acc}}|{{R-SVM.acc}}|{{R-DL.acc}}|{{tensorflow-all.acc}}|{{tensorflow-selected.acc}}|{{lazar-all.acc}}|{{lazar-high-confidence.acc}}|{{lazar-padel-all.acc}}|{{lazar-padel-high-confidence.acc}}|
-|Sensitivity|{{R-RF.tpr}}|{{R-SVM.tpr}}|{{R-DL.tpr}}|{{tensorflow-all.tpr}}|{{tensorflow-selected.tpr}}|{{lazar-all.tpr}}|{{lazar-high-confidence.tpr}}|{{lazar-padel-all.tpr}}|{{lazar-padel-high-confidence.tpr}}|
-|Specificity|{{R-RF.tnr}}|{{R-SVM.tnr}}|{{R-DL.tnr}}|{{tensorflow-all.tnr}}|{{tensorflow-selected.tnr}}|{{lazar-all.tnr}}|{{lazar-high-confidence.tnr}}|{{lazar-padel-all.tnr}}|{{lazar-padel-high-confidence.tnr}}|
-|PPV|{{R-RF.ppv}}|{{R-SVM.ppv}}|{{R-DL.ppv}}|{{tensorflow-all.ppv}}|{{tensorflow-selected.ppv}}|{{lazar-all.ppv}}|{{lazar-high-confidence.ppv}}|{{lazar-padel-all.ppv}}|{{lazar-padel-high-confidence.ppv}}|
-|NPV|{{R-RF.npv}}|{{R-SVM.npv}}|{{R-DL.npv}}|{{tensorflow-all.npv}}|{{tensorflow-selected.npv}}|{{lazar-all.npv}}|{{lazar-high-confidence.npv}}|{{lazar-padel-all.npv}}|{{lazar-padel-high-confidence.npv}}|
-|Nr. predictions|{{R-RF.n}}|{{R-SVM.n}}|{{R-DL.n}}|{{tensorflow-all.n}}|{{tensorflow-selected.n}}|{{lazar-all.n}}|{{lazar-high-confidence.n}}|{{lazar-padel-all.n}}|{{lazar-padel-high-confidence.n}}|
-
-: Summary of crossvalidation results. *R-RF*: R Random Forests, *R-SVM*: R Support Vector Machines, *R-DL*: R Deep Learning, *TF*: Tensorflow without feature selection, *TF-FS*: Tensorflow with feature selection, *L*: lazar, *L-HC*: lazar high confidence predictions, *L-P*: lazar with PaDEL descriptors, *L-P-HC*: lazar PaDEL high confidence predictions, *PPV*: Positive predictive value (Precision), *NPV*: Negative predictive value {#tbl:summary}
-
-R Models
---------
-
-### Random Forest
-
-10-fold crossvalidation of the R-RF model gave an accuracy of
-{{R-RF.acc_perc}}%, a sensitivity of {{R-RF.tpr_perc}}% and a specificity of
-{{R-RF.tnr_perc}}%. The confusion matrix for {{R-RF.n}}
-predictions is provided in @tbl:R-RF.
-
-```{#tbl:R-RF .table file="tables/R-RF.csv" caption="Confusion matrix for R Random Forest predictions"}
-```
-
-### Support Vector Machines
-
-10-fold crossvalidation of the R-SVM model gave an accuracy of
-{{R-SVM.acc_perc}}%, a sensitivity of {{R-SVM.tpr_perc}}% and a specificity of
-{{R-SVM.tnr_perc}}%. The confusion matrix for {{R-SVM.n}}
-predictions is provided in @tbl:R-SVM.
-
-```{#tbl:R-SVM .table file="tables/R-SVM.csv" caption="Confusion matrix for R Support Vector Machine predictions"}
-```
-
-### Deep Learning
-
-10-fold crossvalidation of the R-DL model gave an accuracy of
-{{R-DL.acc_perc}}%, a sensitivity of {{R-DL.tpr_perc}}% and a specificity of
-{{R-DL.tnr_perc}}%. The confusion matrix for {{R-DL.n}}
-predictions is provided in @tbl:R-DL.
-
-```{#tbl:R-DL .table file="tables/R-DL.csv" caption="Confusion matrix for R Deep Learning predictions"}
-```
-
-Tensorflow Models
------------------
-
-### Without feature selection
-
-10-fold crossvalidation of the Tensorflow DL model gave an accuracy of
-{{tensorflow-all.acc_perc}}%, a sensitivity of {{tensorflow-all.tpr_perc}}% and a specificity of
-{{tensorflow-all.tnr_perc}}%. The confusion matrix for {{tensorflow-all.n}}
-predictions is provided in @tbl:tensorflow-all.
-
-```{#tbl:tensorflow-all .table file="tables/tensorflow-all.csv" caption="Confusion matrix for Tensorflow predictions without feature selecetion"}
-```
-
-### With feature selection
-
-10-fold crossvalidation of the Tensorflow model with feature selection gave an accuracy of
-{{tensorflow-selected.acc_perc}}%, a sensitivity of {{tensorflow-selected.tpr_perc}}% and a specificity of
-{{tensorflow-selected.tnr_perc}}%. The confusion matrix for {{tensorflow-selected.n}}
-predictions is provided in @tbl:tensorflow-selected.
-
-```{#tbl:tensorflow-selected .table file="tables/tensorflow-selected.csv" caption="Confusion matrix for Tensorflow predictions with feature selecetion"}
-```
-
-`lazar` Models
---------------
-
-### MolPrint2D Descriptors
-
-10-fold crossvalidation of the lazar model with MolPrint2D descriptors gave an accuracy of
-{{lazar-all.acc_perc}}%, a sensitivity of {{lazar-all.tpr_perc}}% and a specificity of
-{{lazar-all.tnr_perc}}%.
-The confusion matrix for {{lazar-all.n}}
-predictions is provided in @tbl:lazar-all.
-
-```{#tbl:lazar-all .table file="tables/lazar-all.csv" caption="Confusion matrix for lazar predictions with MolPrint2D descriptors"}
-```
-
-Predictions with high confidence had an accuracy of
-{{lazar-high-confidence.acc_perc}}%, a sensitivity of {{lazar-high-confidence.tpr_perc}}% and a specificity of
-{{lazar-high-confidence.tnr_perc}}%.
-The confusion matrix for {{lazar-high-confidence.n}}
-predictions is provided in @tbl:lazar-high-confidence.
-
-
-```{#tbl:lazar-high-confidence .table file="tables/lazar-high-confidence.csv" caption="Confusion matrix for high confidence lazar predictions with MolPrint2D descriptors"}
-```
-
-### PaDEL Descriptors
-
-10-fold crossvalidation of the lazar model with PaDEL descriptors gave an accuracy of
-{{lazar-all.acc_perc}}%, a sensitivity of {{lazar-all.tpr_perc}}% and a specificity of
-{{lazar-all.tnr_perc}}%.
-The confusion matrix for {{lazar-all.n}}
-predictions is provided in @tbl:lazar-padel-all.
-
-```{#tbl:lazar-padel-all .table file="tables/lazar-padel-all.csv" caption="Confusion matrix for lazar predictions with PaDEL descriptors" }
-```
-
-Predictions with high confidence had an accuracy of
-{{lazar-high-confidence.acc_perc}}%, a sensitivity of {{lazar-high-confidence.tpr_perc}}% and a specificity of
-{{lazar-high-confidence.tnr_perc}}%.
-The confusion matrix for {{lazar-high-confidence.n}}
-predictions is provided in @tbl:lazar-padel-high-confidence.
-
-```{#tbl:lazar-padel-high-confidence .table file="tables/lazar-padel-high-confidence.csv" caption="Confusion matrix for high confidence lazar predictions with PaDEL descriptors"}
-```
--->
Pyrrolizidine alkaloid mutagenicity predictions
-----------------------------------------------
-Pyrrolizidine alkaloid mutagenicity predictions are summarized in Table @tab:pa.
+Pyrrolizidine alkaloid mutagenicity predictions are summarized in @tab:pa.
@fig:tsne-mp2d shows the position of pyrrolizidine alkaloids (PA) in the mutagenicity training dataset in MP2D space
-![t-sne visualisation of mutagenicty training data and pyrrolizidine alkaloids (PA)](figures/tsne-mp2d.png){#fig:tsne-mp2d}
+\input{tables/pa-tab.tex}
-@fig:tsne-padel shows the position of pyrrolizidine alkaloids (PA) in the mutagenicity training dataset in PADEL space
+![t-sne visualisation of mutagenicity training data and pyrrolizidine alkaloids (PA)](figures/tsne-mp2d.png){#fig:tsne-mp2d}
-![t-sne visualisation of mutagenicty training data and pyrrolizidine alkaloids (PA)](figures/tsne-padel.png){#fig:tsne-padel}
+@fig:tsne-padel shows the position of pyrrolizidine alkaloids (PA) in the mutagenicity training dataset in PADEL space
-\input{pa-tab.tex}
+![t-sne visualisation of mutagenicity training data and pyrrolizidine alkaloids (PA)](figures/tsne-padel.png){#fig:tsne-padel}
Discussion
==========