author: Christoph Helma <helma@in-silico.ch>, 2019-10-21 20:29:12 +0200
committer: Christoph Helma <helma@in-silico.ch>, 2019-10-21 20:29:12 +0200
commit: 2e03df94681951a62229b76b52370da094aa1ec6
tree: a1bedd275c3ffab65c49f4eefec91bf6a0768d09 /mutagenicity.md
parent: b1e01382e0580676d3686195f9897a60a2ffee1d
Results section
Diffstat (limited to 'mutagenicity.md')
-rw-r--r-- mutagenicity.md | 132
1 files changed, 73 insertions, 59 deletions
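
Reviewer note: the patch below replaces hard-coded percentages with placeholders such as `{{R-RF.acc_perc}}` and `{{R-RF.n}}`. The commit does not show how those values are produced, so the following is only a minimal sketch of one way they could be derived from a confusion matrix and substituted into the text. The `ConfusionMatrix` class, the `summarize` and `render` helpers, and the example counts are hypothetical; the key names (`acc`, `tpr`, `tnr`, `ppv`, `npv`, `n`, `*_perc`) are taken from the placeholders in the diff, and their exact rounding is an assumption.

```python
import re
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Counts from a binary validation (hypothetical container)."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def summarize(cm: ConfusionMatrix) -> dict:
    """Derive the statistics named by the placeholders.

    Assumes every denominator is non-zero; the rounding rules are a guess.
    """
    n = cm.tp + cm.fp + cm.tn + cm.fn
    positives = cm.tp + cm.fn          # truly positive instances
    negatives = cm.tn + cm.fp          # truly negative instances
    return {
        "n": n,
        "acc": round((cm.tp + cm.tn) / n, 2),      # accuracy
        "tpr": round(cm.tp / positives, 2),        # sensitivity
        "tnr": round(cm.tn / negatives, 2),        # specificity
        "ppv": round(cm.tp / (cm.tp + cm.fp), 2),  # positive predictive value
        "npv": round(cm.tn / (cm.tn + cm.fn), 2),  # negative predictive value
        "acc_perc": round(100 * (cm.tp + cm.tn) / n),
        "tpr_perc": round(100 * cm.tp / positives),
        "tnr_perc": round(100 * cm.tn / negatives),
    }


# Matches placeholders of the form {{model-name.key}}.
PLACEHOLDER = re.compile(r"\{\{([\w-]+)\.(\w+)\}\}")


def render(template: str, summaries: dict) -> str:
    """Replace each {{model.key}} with the matching summary value."""
    def fill(match: re.Match) -> str:
        model, key = match.group(1), match.group(2)
        return str(summaries[model][key])
    return PLACEHOLDER.sub(fill, template)


# Made-up counts, not results from the paper.
demo = summarize(ConfusionMatrix(tp=40, fp=20, tn=30, fn=10))
print(render("accuracy of {{demo.acc_perc}}% for {{demo.n}} predictions",
             {"demo": demo}))
```

Because `render` looks values up by model name, a paragraph that names the wrong model silently prints that other model's numbers rather than failing. Placeholder names are therefore worth checking against the section heading they appear under.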
diff --git a/mutagenicity.md b/mutagenicity.md
index 2f80bad..a9fa116 100644
--- a/mutagenicity.md
+++ b/mutagenicity.md
@@ -25,6 +25,7 @@ institute:
bibliography: bibliography.bib
keywords: mutagenicity, (Q)SAR, lazar, random forest, support vector machine, deep learning
documentclass: scrartcl
+tblPrefix: Table
...
Abstract
@@ -335,83 +336,60 @@ Validation
Results
=======
-{{#programs}}
-{{name}} Models
---------
-{{#algos}}
-
-### {{name}}
-
-10-fold crossvalidation of the {{abbrev}} model gave an accuracy of
-{{accuracy_perc}}%
-a sensitivity of
-{{true_positive_rate_perc}}%
-and a specificity of
-{{true_negative_rate_perc}}%
-The confusion matrix of the
-model, calculated for 8080 instances, is provided in Table 1.
-
-```{.table file="tables/R-RF.csv" caption="Confusion matrix for R Random Forest predictions"}
-```
-{{/algos}}
-{{/programs}}
-
R Models
--------
### Random Forest
-The validation showed that the RF model has an accuracy of
-{{R-RF.accuracy}}%
-`cat /home/ch/src/mutagenicity-paper/10-fold-crossvalidations/summaries/R-RF.json|jq '.accuracy * 100 | round'`{pipe="sh"}%,
-a sensitivity of
-`cat /home/ch/src/mutagenicity-paper/10-fold-crossvalidations/summaries/R-RF.json|jq '.true_positive_rate * 100 | round'`{pipe="sh"}%,
-and a specificity of
-`cat /home/ch/src/mutagenicity-paper/10-fold-crossvalidations/summaries/R-RF.json|jq '.true_negative_rate * 100 | round'`{pipe="sh"}%,
-The confusion matrix of the
-model, calculated for 8080 instances, is provided in Table 1.
-
-```{.table file="tables/R-RF.csv" caption="Confusion matrix for R Random Forest predictions"}
+10-fold crossvalidation of the R-RF model gave an accuracy of
+{{R-RF.acc_perc}}%, a sensitivity of {{R-RF.tpr_perc}}% and a specificity of
+{{R-RF.tnr_perc}}%. The confusion matrix for {{R-RF.n}}
+predictions is provided in @tbl:R-RF.
+
+```{#tbl:R-RF .table file="tables/R-RF.csv" caption="Confusion matrix for R Random Forest predictions"}
```
### Support Vector Machines
-The validation showed that the SVM model has an accuracy of 62%, a
-sensitivity of 65% and a specificity of 60%. The confusion matrix of SVM
-model, calculated for 8080 instances, is provided in Table 2.
-
+10-fold crossvalidation of the R-SVM model gave an accuracy of
+{{R-SVM.acc_perc}}%, a sensitivity of {{R-SVM.tpr_perc}}% and a specificity of
+{{R-SVM.tnr_perc}}%. The confusion matrix for {{R-SVM.n}}
+predictions is provided in @tbl:R-SVM.
-```{.table file="tables/R-SVM.csv" caption="Confusion matrix for R Support Vector Machine predictions"}
+```{#tbl:R-SVM .table file="tables/R-SVM.csv" caption="Confusion matrix for R Support Vector Machine predictions"}
```
### Deep Learning
-The validation showed that the DL model generated in R has an accuracy
-of 59%, a sensitivity of 89% and a specificity of 30%. The confusion
-matrix of the model, normalised to 8080 instances, is provided in Table
-3.
+10-fold crossvalidation of the R-DL model gave an accuracy of
+{{R-DL.acc_perc}}%, a sensitivity of {{R-DL.tpr_perc}}% and a specificity of
+{{R-DL.tnr_perc}}%. The confusion matrix for {{R-DL.n}}
+predictions is provided in @tbl:R-DL.
-```{.table file="tables/R-DL.csv" caption="Confusion matrix for R Deep Learning predictions"}
-```
-
-```{.table file="tables/r-summary.csv" caption="Summary of R model validations"}
+```{#tbl:R-DL .table file="tables/R-DL.csv" caption="Confusion matrix for R Deep Learning predictions"}
```
TensorFlow Models
-----------------
-The validation showed that the DL model generated in TensorFlow has an
-accuracy of 68%, a sensitivity of 70% and a specificity of 46%. The
-confusion matrix of the model, normalised to 8080 instances, is provided
-in Table 4.
+### Without feature selection
-```{.table file="tables/tensorflow-all.csv" caption="Confusion matrix for Tensorflow predictions without variable selecetion"}
-```
+10-fold crossvalidation of the TensorFlow DL model gave an accuracy of
+{{tensorflow-all.acc_perc}}%, a sensitivity of {{tensorflow-all.tpr_perc}}% and a specificity of
+{{tensorflow-all.tnr_perc}}%. The confusion matrix for {{tensorflow-all.n}}
+predictions is provided in @tbl:tensorflow-all.
-```{.table file="tables/tensorflow-selected.csv" caption="Confusion matrix for Tensorflow predictions with variable selecetion"}
+```{#tbl:tensorflow-all .table file="tables/tensorflow-all.csv" caption="Confusion matrix for TensorFlow predictions without feature selection"}
```
-```{.table file="tables/tf-summary.csv" caption="Summary of TensorFlow model validations"}
+### With feature selection
+
+10-fold crossvalidation of the TensorFlow model with feature selection gave an accuracy of
+{{tensorflow-selected.acc_perc}}%, a sensitivity of {{tensorflow-selected.tpr_perc}}% and a specificity of
+{{tensorflow-selected.tnr_perc}}%. The confusion matrix for {{tensorflow-selected.n}}
+predictions is provided in @tbl:tensorflow-selected.
+
+```{#tbl:tensorflow-selected .table file="tables/tensorflow-selected.csv" caption="Confusion matrix for TensorFlow predictions with feature selection"}
```
`lazar` Models
@@ -419,23 +397,59 @@ in Table 4.
### MolPrint2D Descriptors
-```{.table file="tables/lazar-all.csv" caption="Confusion matrix for lazar predictions with MolPrint2D descriptors"}
+10-fold crossvalidation of the lazar model with MolPrint2D descriptors gave an accuracy of
+{{lazar-all.acc_perc}}%, a sensitivity of {{lazar-all.tpr_perc}}% and a specificity of
+{{lazar-all.tnr_perc}}%.
+The confusion matrix for {{lazar-all.n}}
+predictions is provided in @tbl:lazar-all.
+
+```{#tbl:lazar-all .table file="tables/lazar-all.csv" caption="Confusion matrix for lazar predictions with MolPrint2D descriptors"}
```
-```{.table file="tables/lazar-high-confidence.csv" caption="Confusion matrix for high confidence lazar predictions with MolPrint2D descriptors"}
+Predictions with high confidence had an accuracy of
+{{lazar-high-confidence.acc_perc}}%, a sensitivity of {{lazar-high-confidence.tpr_perc}}% and a specificity of
+{{lazar-high-confidence.tnr_perc}}%.
+The confusion matrix for {{lazar-high-confidence.n}}
+predictions is provided in @tbl:lazar-high-confidence.
+
+
+```{#tbl:lazar-high-confidence .table file="tables/lazar-high-confidence.csv" caption="Confusion matrix for high confidence lazar predictions with MolPrint2D descriptors"}
```
### PaDEL Descriptors
-```{.table file="tables/lazar-padel-all.csv" caption="Confusion matrix for lazar predictions with PaDEL descriptors"}
-```
+10-fold crossvalidation of the lazar model with PaDEL descriptors gave an accuracy of
+{{lazar-padel-all.acc_perc}}%, a sensitivity of {{lazar-padel-all.tpr_perc}}% and a specificity of
+{{lazar-padel-all.tnr_perc}}%.
+The confusion matrix for {{lazar-padel-all.n}}
+predictions is provided in @tbl:lazar-padel-all.
-```{.table file="tables/lazar-padel-high-confidence.csv" caption="Confusion matrix for high confidence lazar predictions with PaDEL descriptors"}
+```{#tbl:lazar-padel-all .table file="tables/lazar-padel-all.csv" caption="Confusion matrix for lazar predictions with PaDEL descriptors" }
```
-```{.table file="tables/lazar-summary.csv" caption="Summary of lazar model validations"}
+Predictions with high confidence had an accuracy of
+{{lazar-padel-high-confidence.acc_perc}}%, a sensitivity of {{lazar-padel-high-confidence.tpr_perc}}% and a specificity of
+{{lazar-padel-high-confidence.tnr_perc}}%.
+The confusion matrix for {{lazar-padel-high-confidence.n}}
+predictions is provided in @tbl:lazar-padel-high-confidence.
+
+```{#tbl:lazar-padel-high-confidence .table file="tables/lazar-padel-high-confidence.csv" caption="Confusion matrix for high confidence lazar predictions with PaDEL descriptors"}
```
+The results of all crossvalidation experiments are summarized in @tbl:summary.
+
+| |R-RF | R-SVM | R-DL | TF | TF-FS | L | L-HC | L-P | L-P-HC|
+|-|-----|-------|------|----|-------|---|------|------|--------|
+|Accuracy|{{R-RF.acc}}|{{R-SVM.acc}}|{{R-DL.acc}}|{{tensorflow-all.acc}}|{{tensorflow-selected.acc}}|{{lazar-all.acc}}|{{lazar-high-confidence.acc}}|{{lazar-padel-all.acc}}|{{lazar-padel-high-confidence.acc}}|
+|Sensitivity|{{R-RF.tpr}}|{{R-SVM.tpr}}|{{R-DL.tpr}}|{{tensorflow-all.tpr}}|{{tensorflow-selected.tpr}}|{{lazar-all.tpr}}|{{lazar-high-confidence.tpr}}|{{lazar-padel-all.tpr}}|{{lazar-padel-high-confidence.tpr}}|
+|Specificity|{{R-RF.tnr}}|{{R-SVM.tnr}}|{{R-DL.tnr}}|{{tensorflow-all.tnr}}|{{tensorflow-selected.tnr}}|{{lazar-all.tnr}}|{{lazar-high-confidence.tnr}}|{{lazar-padel-all.tnr}}|{{lazar-padel-high-confidence.tnr}}|
+|PPV|{{R-RF.ppv}}|{{R-SVM.ppv}}|{{R-DL.ppv}}|{{tensorflow-all.ppv}}|{{tensorflow-selected.ppv}}|{{lazar-all.ppv}}|{{lazar-high-confidence.ppv}}|{{lazar-padel-all.ppv}}|{{lazar-padel-high-confidence.ppv}}|
+|NPV|{{R-RF.npv}}|{{R-SVM.npv}}|{{R-DL.npv}}|{{tensorflow-all.npv}}|{{tensorflow-selected.npv}}|{{lazar-all.npv}}|{{lazar-high-confidence.npv}}|{{lazar-padel-all.npv}}|{{lazar-padel-high-confidence.npv}}|
+|Nr. predictions|{{R-RF.n}}|{{R-SVM.n}}|{{R-DL.n}}|{{tensorflow-all.n}}|{{tensorflow-selected.n}}|{{lazar-all.n}}|{{lazar-high-confidence.n}}|{{lazar-padel-all.n}}|{{lazar-padel-high-confidence.n}}|
+
+: Summary of crossvalidation results. *R-RF*: R Random Forests, *R-SVM*: R Support Vector Machines, *R-DL*: R Deep Learning, *TF*: TensorFlow without feature selection, *TF-FS*: TensorFlow with feature selection, *L*: lazar, *L-HC*: lazar high confidence predictions, *L-P*: lazar with PaDEL descriptors, *L-P-HC*: lazar PaDEL high confidence predictions, *PPV*: Positive predictive value (Precision), *NPV*: Negative predictive value {#tbl:summary}
+
+
Discussion
==========