author | Christoph Helma <helma@in-silico.ch> | 2019-10-21 20:29:12 +0200 |
---|---|---|
committer | Christoph Helma <helma@in-silico.ch> | 2019-10-21 20:29:12 +0200 |
commit | 2e03df94681951a62229b76b52370da094aa1ec6 (patch) | |
tree | a1bedd275c3ffab65c49f4eefec91bf6a0768d09 /mutagenicity.md | |
parent | b1e01382e0580676d3686195f9897a60a2ffee1d (diff) |
Results section
Diffstat (limited to 'mutagenicity.md')
-rw-r--r-- | mutagenicity.md | 132 |
1 files changed, 73 insertions, 59 deletions
diff --git a/mutagenicity.md b/mutagenicity.md
index 2f80bad..a9fa116 100644
--- a/mutagenicity.md
+++ b/mutagenicity.md
@@ -25,6 +25,7 @@ institute:
 bibliography: bibliography.bib
 keywords: mutagenicity, (Q)SAR, lazar, random forest, support vector machine, deep learning
 documentclass: scrartcl
+tblPrefix: Table
 ...

 Abstract
@@ -335,83 +336,60 @@ Validation
 Results
 =======

-{{#programs}}
-{{name}} Models
---------
-{{#algos}}
-
-### {{name}}
-
-10-fold crossvalidation of the {{abbrev}} model gave an accuracy of
-{{accuracy_perc}}%
-a sensitivity of
-{{true_positive_rate_perc}}%
-and a specificity of
-{{true_negative_rate_perc}}%
-The confusion matrix of the
-model, calculated for 8080 instances, is provided in Table 1.
-
-```{.table file="tables/R-RF.csv" caption="Confusion matrix for R Random Forest predictions"}
-```
-{{/algos}}
-{{/programs}}
-
 R Models
 --------

 ### Random Forest

-The validation showed that the RF model has an accuracy of
-{{R-RF.accuracy}}%
-`cat /home/ch/src/mutagenicity-paper/10-fold-crossvalidations/summaries/R-RF.json|jq '.accuracy * 100 | round'`{pipe="sh"}%,
-a sensitivity of
-`cat /home/ch/src/mutagenicity-paper/10-fold-crossvalidations/summaries/R-RF.json|jq '.true_positive_rate * 100 | round'`{pipe="sh"}%,
-and a specificity of
-`cat /home/ch/src/mutagenicity-paper/10-fold-crossvalidations/summaries/R-RF.json|jq '.true_negative_rate * 100 | round'`{pipe="sh"}%,
-The confusion matrix of the
-model, calculated for 8080 instances, is provided in Table 1.
-
-```{.table file="tables/R-RF.csv" caption="Confusion matrix for R Random Forest predictions"}
+10-fold crossvalidation of the R-RF model gave an accuracy of
+{{R-RF.acc_perc}}%, a sensitivity of {{R-RF.tpr_perc}}% and a specificity of
+{{R-RF.tnr_perc}}%. The confusion matrix for {{R-RF.n}}
+predictions is provided in @tbl:R-RF.
+
+```{#tbl:R-RF .table file="tables/R-RF.csv" caption="Confusion matrix for R Random Forest predictions"}
 ```

 ### Support Vector Machines

-The validation showed that the SVM model has an accuracy of 62%, a
-sensitivity of 65% and a specificity of 60%. The confusion matrix of SVM
-model, calculated for 8080 instances, is provided in Table 2.
-
+10-fold crossvalidation of the R-SVM model gave an accuracy of
+{{R-SVM.acc_perc}}%, a sensitivity of {{R-SVM.tpr_perc}}% and a specificity of
+{{R-SVM.tnr_perc}}%. The confusion matrix for {{R-SVM.n}}
+predictions is provided in @tbl:R-SVM.

-```{.table file="tables/R-SVM.csv" caption="Confusion matrix for R Support Vector Machine predictions"}
+```{#tbl:R-SVM .table file="tables/R-SVM.csv" caption="Confusion matrix for R Support Vector Machine predictions"}
 ```

 ### Deep Learning

-The validation showed that the DL model generated in R has an accuracy
-of 59%, a sensitivity of 89% and a specificity of 30%. The confusion
-matrix of the model, normalised to 8080 instances, is provided in Table
-3.
+10-fold crossvalidation of the R-DL model gave an accuracy of
+{{R-DL.acc_perc}}%, a sensitivity of {{R-DL.tpr_perc}}% and a specificity of
+{{R-DL.tnr_perc}}%. The confusion matrix for {{R-DL.n}}
+predictions is provided in @tbl:R-DL.

-```{.table file="tables/R-DL.csv" caption="Confusion matrix for R Deep Learning predictions"}
-```
-
-```{.table file="tables/r-summary.csv" caption="Summary of R model validations"}
+```{#tbl:R-DL .table file="tables/R-DL.csv" caption="Confusion matrix for R Deep Learning predictions"}
 ```

 TensorFlow Models
 -----------------

-The validation showed that the DL model generated in TensorFlow has an
-accuracy of 68%, a sensitivity of 70% and a specificity of 46%. The
-confusion matrix of the model, normalised to 8080 instances, is provided
-in Table 4.
+### Without feature selection

-```{.table file="tables/tensorflow-all.csv" caption="Confusion matrix for Tensorflow predictions without variable selecetion"}
-```
+10-fold crossvalidation of the TensorFlow DL model gave an accuracy of
+{{tensorflow-all.acc_perc}}%, a sensitivity of {{tensorflow-all.tpr_perc}}% and a specificity of
+{{tensorflow-all.tnr_perc}}%. The confusion matrix for {{tensorflow-all.n}}
+predictions is provided in @tbl:tensorflow-all.

-```{.table file="tables/tensorflow-selected.csv" caption="Confusion matrix for Tensorflow predictions with variable selecetion"}
+```{#tbl:tensorflow-all .table file="tables/tensorflow-all.csv" caption="Confusion matrix for Tensorflow predictions without feature selecetion"}
 ```

-```{.table file="tables/tf-summary.csv" caption="Summary of TensorFlow model validations"}
+### With feature selection
+
+10-fold crossvalidation of the TensorFlow model with feature selection gave an accuracy of
+{{tensorflow-selected.acc_perc}}%, a sensitivity of {{tensorflow-selected.tpr_perc}}% and a specificity of
+{{tensorflow-selected.tnr_perc}}%. The confusion matrix for {{tensorflow-selected.n}}
+predictions is provided in @tbl:tensorflow-selected.
+
+```{#tbl:tensorflow-selected .table file="tables/tensorflow-selected.csv" caption="Confusion matrix for Tensorflow predictions with feature selecetion"}
 ```

 `lazar` Models
@@ -419,23 +397,59 @@ in Table 4.

 ### MolPrint2D Descriptors

-```{.table file="tables/lazar-all.csv" caption="Confusion matrix for lazar predictions with MolPrint2D descriptors"}
+10-fold crossvalidation of the lazar model with MolPrint2D descriptors gave an accuracy of
+{{lazar-all.acc_perc}}%, a sensitivity of {{lazar-all.tpr_perc}}% and a specificity of
+{{lazar-all.tnr_perc}}%.
+The confusion matrix for {{lazar-all.n}}
+predictions is provided in @tbl:lazar-all.
+
+```{#tbl:lazar-all .table file="tables/lazar-all.csv" caption="Confusion matrix for lazar predictions with MolPrint2D descriptors"}
 ```

-```{.table file="tables/lazar-high-confidence.csv" caption="Confusion matrix for high confidence lazar predictions with MolPrint2D descriptors"}
+Predictions with high confidence had an accuracy of
+{{lazar-high-confidence.acc_perc}}%, a sensitivity of {{lazar-high-confidence.tpr_perc}}% and a specificity of
+{{lazar-high-confidence.tnr_perc}}%.
+The confusion matrix for {{lazar-high-confidence.n}}
+predictions is provided in @tbl:lazar-high-confidence.
+
+
+```{#tbl:lazar-high-confidence .table file="tables/lazar-high-confidence.csv" caption="Confusion matrix for high confidence lazar predictions with MolPrint2D descriptors"}
 ```

 ### PaDEL Descriptors

-```{.table file="tables/lazar-padel-all.csv" caption="Confusion matrix for lazar predictions with PaDEL descriptors"}
-```
+10-fold crossvalidation of the lazar model with PaDEL descriptors gave an accuracy of
+{{lazar-all.acc_perc}}%, a sensitivity of {{lazar-all.tpr_perc}}% and a specificity of
+{{lazar-all.tnr_perc}}%.
+The confusion matrix for {{lazar-all.n}}
+predictions is provided in @tbl:lazar-padel-all.

-```{.table file="tables/lazar-padel-high-confidence.csv" caption="Confusion matrix for high confidence lazar predictions with PaDEL descriptors"}
+```{#tbl:lazar-padel-all .table file="tables/lazar-padel-all.csv" caption="Confusion matrix for lazar predictions with PaDEL descriptors" }
 ```

-```{.table file="tables/lazar-summary.csv" caption="Summary of lazar model validations"}
+Predictions with high confidence had an accuracy of
+{{lazar-high-confidence.acc_perc}}%, a sensitivity of {{lazar-high-confidence.tpr_perc}}% and a specificity of
+{{lazar-high-confidence.tnr_perc}}%.
+The confusion matrix for {{lazar-high-confidence.n}}
+predictions is provided in @tbl:lazar-padel-high-confidence.
+
+```{#tbl:lazar-padel-high-confidence .table file="tables/lazar-padel-high-confidence.csv" caption="Confusion matrix for high confidence lazar predictions with PaDEL descriptors"}
 ```

+The results of all crossvalidation experiments are summarized in @tbl:summary.
+
+| |R-RF | R-SVM | R-DL | TF | TF-FS | L | L-HC | L-P | L-P-HC|
+|-|-----|-------|------|----|-------|---|------|------|--------|
+|Accuracy|{{R-RF.acc}}|{{R-SVM.acc}}|{{R-DL.acc}}|{{tensorflow-all.acc}}|{{tensorflow-selected.acc}}|{{lazar-all.acc}}|{{lazar-high-confidence.acc}}|{{lazar-padel-all.acc}}|{{lazar-padel-high-confidence.acc}}|
+|Sensitivity|{{R-RF.tpr}}|{{R-SVM.tpr}}|{{R-DL.tpr}}|{{tensorflow-all.tpr}}|{{tensorflow-selected.tpr}}|{{lazar-all.tpr}}|{{lazar-high-confidence.tpr}}|{{lazar-padel-all.tpr}}|{{lazar-padel-high-confidence.tpr}}|
+|Specificity|{{R-RF.tnr}}|{{R-SVM.tnr}}|{{R-DL.tnr}}|{{tensorflow-all.tnr}}|{{tensorflow-selected.tnr}}|{{lazar-all.tnr}}|{{lazar-high-confidence.tnr}}|{{lazar-padel-all.tnr}}|{{lazar-padel-high-confidence.tnr}}|
+|PPV|{{R-RF.ppv}}|{{R-SVM.ppv}}|{{R-DL.ppv}}|{{tensorflow-all.ppv}}|{{tensorflow-selected.ppv}}|{{lazar-all.ppv}}|{{lazar-high-confidence.ppv}}|{{lazar-padel-all.ppv}}|{{lazar-padel-high-confidence.ppv}}|
+|NPV|{{R-RF.npv}}|{{R-SVM.npv}}|{{R-DL.npv}}|{{tensorflow-all.npv}}|{{tensorflow-selected.npv}}|{{lazar-all.npv}}|{{lazar-high-confidence.npv}}|{{lazar-padel-all.npv}}|{{lazar-padel-high-confidence.npv}}|
+|Nr. predictions|{{R-RF.n}}|{{R-SVM.n}}|{{R-DL.n}}|{{tensorflow-all.n}}|{{tensorflow-selected.n}}|{{lazar-all.n}}|{{lazar-high-confidence.n}}|{{lazar-padel-all.n}}|{{lazar-padel-high-confidence.n}}|
+
+: Summary of crossvalidation results. *R-RF*: R Random Forests, *R-SVM*: R Support Vector Machines, *R-DL*: R Deep Learning, *TF*: TensorFlow without feature selection, *TF-FS*: TensorFlow with feature selection, *L*: lazar, *L-HC*: lazar high confidence predictions, *L-P*: lazar with PaDEL descriptors, *L-P-HC*: lazar PaADEL high confidence predictions, *PPV*: Positive predictive value (Precision), *NPV*: Negative predictive value {#tbl:summary}
+
+
 Discussion
 ==========
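
The `{{model.key}}` placeholders added in this commit (for example `{{R-RF.acc_perc}}`, `{{R-RF.tpr}}` and `{{R-RF.n}}`) imply per-model summary statistics derived from each confusion matrix. The commit does not show how those values are computed or substituted, so the Python sketch below is only an illustration under assumptions. The function names `summarize` and `render`, the regex-based substitution (a stand-in, not a Mustache implementation), the two-decimal formatting of fractions and the example counts are all hypothetical.

```python
import re


def summarize(tp, fp, tn, fn):
    """Derive summary statistics from confusion matrix counts.

    Key names mirror the placeholders in the diff (acc, tpr, tnr, ppv, npv, n).
    Assumes every denominator is non-zero, i.e. both classes occur and both
    classes are predicted at least once.
    """
    n = tp + fp + tn + fn
    stats = {
        "n": n,
        "acc": (tp + tn) / n,    # accuracy
        "tpr": tp / (tp + fn),   # sensitivity (true positive rate)
        "tnr": tn / (tn + fp),   # specificity (true negative rate)
        "ppv": tp / (tp + fp),   # positive predictive value (precision)
        "npv": tn / (tn + fn),   # negative predictive value
    }
    # Rounded percentage variants used in the running text, e.g. acc_perc.
    for key in ("acc", "tpr", "tnr"):
        stats[key + "_perc"] = round(stats[key] * 100)
    return stats


# Matches placeholders such as {{R-RF.acc_perc}} or {{lazar-padel-all.n}}.
PLACEHOLDER = re.compile(r"\{\{([\w-]+)\.(\w+)\}\}")


def render(template, results):
    """Replace each {{model.key}} placeholder with results[model][key]."""
    def substitute(match):
        model, key = match.group(1), match.group(2)
        value = results[model][key]
        # Hypothetical formatting choice: fractions are shown with two decimals.
        return f"{value:.2f}" if isinstance(value, float) else str(value)
    return PLACEHOLDER.sub(substitute, template)


# Made-up counts, for illustration only (not results from the paper).
results = {"R-RF": summarize(tp=40, fp=10, tn=30, fn=20)}
text = render("an accuracy of {{R-RF.acc_perc}}% for {{R-RF.n}} predictions", results)
print(text)
```

An unknown model or key raises a `KeyError` in this sketch. That would surface a placeholder that points at the wrong summary early instead of silently rendering an empty string.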