author: Christoph Helma <helma@in-silico.ch> 2019-10-21 17:29:52 +0200
committer: Christoph Helma <helma@in-silico.ch> 2019-10-21 17:29:52 +0200
commit: 93f2fb17788b9d02b00935e0d1be7cd1d81ff555
tree: 95ea869bf48bd41bb0d6d341e6cee7f3e01d2c81 /mutagenicity.md
parent: 1035124b854e21998d3fd9de4935780a19a2d3d3
mustache preprocessing
Diffstat (limited to 'mutagenicity.md')
-rw-r--r-- mutagenicity.md 139
1 files changed, 64 insertions, 75 deletions
diff --git a/mutagenicity.md b/mutagenicity.md
index bf4f6d1..2f80bad 100644
--- a/mutagenicity.md
+++ b/mutagenicity.md
@@ -134,8 +134,8 @@ of a compound can be constructed that can be used to calculate chemical
similarities.
The chemical similarity between two compounds a and b is expressed as
-the proportion between atom environments common in both structures A ∩ B
-and the total number of atom environments A U B (Jaccard/Tanimoto
+the proportion between atom environments common in both structures $A \cap B$
+and the total number of atom environments $A \cup B$ (Jaccard/Tanimoto
index).
$$sim = \frac{\left| A\ \cap B \right|}{\left| A\ \cup B \right|}$$
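
Editor's note: the index in the hunk above can be sketched in a few lines of Python. This is an illustration only, assuming each compound's atom environments are already available as a set of hashable identifiers. The function name `tanimoto` and the example environment strings are hypothetical and are not taken from `lazar`.

```python
def tanimoto(a, b):
    """Jaccard/Tanimoto index |A ∩ B| / |A ∪ B| for two collections of atom environments."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Both sets are empty, so the ratio is undefined.
        # Returning 0.0 is a convention chosen for this sketch.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical atom-environment identifiers for two compounds
env_a = {"C;1-2-C", "C;1-1-O", "O;1-1-C", "N;1-1-C"}
env_b = {"C;1-2-C", "C;1-1-O", "O;1-1-C", "Cl;1-1-C"}

# Three environments are shared and five are distinct overall
sim = tanimoto(env_a, env_b)
```

Using sets rather than counted fingerprints matches the formula as written, where each atom environment contributes at most once to the intersection and to the union.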
@@ -335,117 +335,106 @@ Validation
Results
=======
-`lazar`
------
+{{#programs}}
+{{name}} Models
+--------
+{{#algos}}
-Random Forest
--------------
+### {{name}}
-The validation showed that the RF model has an accuracy of 64%, a
-sensitivity of 66% and a specificity of 63%. The confusion matrix of the
+10-fold crossvalidation of the {{abbrev}} model gave an accuracy of
+{{accuracy_perc}}%
+a sensitivity of
+{{true_positive_rate_perc}}%
+and a specificity of
+{{true_negative_rate_perc}}%
+The confusion matrix of the
model, calculated for 8080 instances, is provided in Table 1.
-Table 1: Confusion matrix of the RF model
+```{.table file="tables/R-RF.csv" caption="Confusion matrix for R Random Forest predictions"}
+```
+{{/algos}}
+{{/programs}}
- Predicted genotoxicity
- ----------------------- ------------------------ ---------- ---------- -------------
- Measured genotoxicity ***PP*** ***PN*** ***Total***
- ***TP*** 2274 1163 3437
- ***TN*** 1736 2907 4643
- ***Total*** 4010 4070 8080
+R Models
+--------
-PP: Predicted positive; PN: Predicted negative, TP: True positive, TN:
-True negative
+### Random Forest
-Support Vector Machines
------------------------
+The validation showed that the RF model has an accuracy of
+{{R-RF.accuracy}}%
+`cat /home/ch/src/mutagenicity-paper/10-fold-crossvalidations/summaries/R-RF.json|jq '.accuracy * 100 | round'`{pipe="sh"}%,
+a sensitivity of
+`cat /home/ch/src/mutagenicity-paper/10-fold-crossvalidations/summaries/R-RF.json|jq '.true_positive_rate * 100 | round'`{pipe="sh"}%,
+and a specificity of
+`cat /home/ch/src/mutagenicity-paper/10-fold-crossvalidations/summaries/R-RF.json|jq '.true_negative_rate * 100 | round'`{pipe="sh"}%,
+The confusion matrix of the
+model, calculated for 8080 instances, is provided in Table 1.
+
+```{.table file="tables/R-RF.csv" caption="Confusion matrix for R Random Forest predictions"}
+```
+
+### Support Vector Machines
The validation showed that the SVM model has an accuracy of 62%, a
sensitivity of 65% and a specificity of 60%. The confusion matrix of SVM
model, calculated for 8080 instances, is provided in Table 2.
-Table 2: Confusion matrix of the SVM model
-
- Predicted genotoxicity
- ----------------------- ------------------------ ---------- ---------- -------------
- Measured genotoxicity ***PP*** ***PN*** ***Total***
- ***TP*** 2057 1107 3164
- ***TN*** 1953 2963 4916
- ***Total*** 4010 4070 8080
-PP: Predicted positive; PN: Predicted negative, TP: True positive, TN:
-True negative
+```{.table file="tables/R-SVM.csv" caption="Confusion matrix for R Support Vector Machine predictions"}
+```
-Deep Learning (R-project)
--------------------------
+### Deep Learning
The validation showed that the DL model generated in R has an accuracy
of 59%, a sensitivity of 89% and a specificity of 30%. The confusion
matrix of the model, normalised to 8080 instances, is provided in Table
3.
-Table 3: Confusion matrix of the DL model (R-project)
+```{.table file="tables/R-DL.csv" caption="Confusion matrix for R Deep Learning predictions"}
+```
- Predicted genotoxicity
- ----------------------- ------------------------ ---------- ---------- -------------
- Measured genotoxicity ***PP*** ***PN*** ***Total***
- ***TP*** 3575 435 4010
- ***TN*** 2853 1217 4070
- ***Total*** 6428 1652 8080
+```{.table file="tables/r-summary.csv" caption="Summary of R model validations"}
+```
-PP: Predicted positive; PN: Predicted negative, TP: True positive, TN:
-True negative
-
-DL model (TensorFlow)
----------------------
+TensorFlow Models
+-----------------
The validation showed that the DL model generated in TensorFlow has an
accuracy of 68%, a sensitivity of 70% and a specificity of 46%. The
confusion matrix of the model, normalised to 8080 instances, is provided
in Table 4.
-Table 4: Confusion matrix of the DL model (TensorFlow)
-
- Predicted genotoxicity
- ----------------------- ------------------------ ---------- ---------- -------------
- Measured genotoxicity ***PP*** ***PN*** ***Total***
- ***TP*** 2851 1227 4078
- ***TN*** 1825 2177 4002
- ***Total*** 4676 3404 8080
-
-PP: Predicted positive; PN: Predicted negative, TP: True positive, TN:
-True negative
-
-The ROC curves from the 6-fold validation are shown in Figure 7.
+```{.table file="tables/tensorflow-all.csv" caption="Confusion matrix for Tensorflow predictions without variable selecetion"}
+```
-![](figures/image7.png){width="3.825in"
-height="2.7327045056867894in"}
+```{.table file="tables/tensorflow-selected.csv" caption="Confusion matrix for Tensorflow predictions with variable selecetion"}
+```
-Figure 7: Six-fold cross-validation of TensorFlow DL model show an
-average area under the ROC-curve (ROC-AUC; measure of accuracy) of 68%.
+```{.table file="tables/tf-summary.csv" caption="Summary of TensorFlow model validations"}
+```
-In summary, the validation results of the four methods are presented in
-the following table.
+`lazar` Models
+--------------
-Table 5 Results of the cross-validation of the four models and after
-y-randomisation
+### MolPrint2D Descriptors
- ----------------------------------------------------------------------
- Accuracy CCR Sensitivity Specificity
- ----------------------- ---------- ------- ------------- -------------
- RF model 64.1% 64.4% 66.2% 62.6%
+```{.table file="tables/lazar-all.csv" caption="Confusion matrix for lazar predictions with MolPrint2D descriptors"}
+```
- SVM model 62.1% 62.6% 65.0% 60.3%
+```{.table file="tables/lazar-high-confidence.csv" caption="Confusion matrix for high confidence lazar predictions with MolPrint2D descriptors"}
+```
- DL model\ 59.3% 59.5% 89.2% 29.9%
- (R-project)
+### PaDEL Descriptors
- DL model (TensorFlow) 68% 62.2% 69.9% 45.6%
+```{.table file="tables/lazar-padel-all.csv" caption="Confusion matrix for lazar predictions with PaDEL descriptors"}
+```
- y-randomisation 50.5% 50.4% 50.3% 50.6%
- ----------------------------------------------------------------------
+```{.table file="tables/lazar-padel-high-confidence.csv" caption="Confusion matrix for high confidence lazar predictions with PaDEL descriptors"}
+```
-CCR (correct classification rate)
+```{.table file="tables/lazar-summary.csv" caption="Summary of lazar model validations"}
+```
Discussion
==========
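
Editor's note: the percentages that this patch replaces with template placeholders can be recomputed from a confusion matrix. The following is a minimal Python sketch, not the paper's tooling. It uses the counts from the Random Forest table removed in this diff, reading rows as measured classes and columns as predicted classes. The function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity and specificity from the four confusion-matrix cells."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Counts from the removed RF table:
#   measured positive: 2274 predicted positive, 1163 predicted negative
#   measured negative: 1736 predicted positive, 2907 predicted negative
rf = classification_metrics(tp=2274, fn=1163, fp=1736, tn=2907)

# Convert to whole percentages, as the jq filters in the patch do with `* 100 | round`
rf_percent = {name: round(value * 100) for name, value in rf.items()}
```

The rounded values agree with the 64% accuracy, 66% sensitivity and 63% specificity quoted in the removed prose.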