mustache preprocessing

author: Christoph Helma <helma@in-silico.ch> 2019-10-21 17:29:52 +0200
committer: Christoph Helma <helma@in-silico.ch> 2019-10-21 17:29:52 +0200
commit: 93f2fb17788b9d02b00935e0d1be7cd1d81ff555 (patch)
tree: 95ea869bf48bd41bb0d6d341e6cee7f3e01d2c81 /mutagenicity.md
parent: 1035124b854e21998d3fd9de4935780a19a2d3d3 (diff)
1 file changed, 64 insertions, 75 deletions
diff --git a/mutagenicity.md b/mutagenicity.md
index bf4f6d1..2f80bad 100644
--- a/mutagenicity.md
+++ b/mutagenicity.md
@@ -134,8 +134,8 @@ of a compound can be constructed that can be used to calculate chemical
 similarities.
 
 The chemical similarity between two compounds a and b is expressed as
-the proportion between atom environments common in both structures A ∩ B
-and the total number of atom environments A U B (Jaccard/Tanimoto
+the proportion between atom environments common in both structures $A \cap B$
+and the total number of atom environments $A \cup B$ (Jaccard/Tanimoto
 index).
 
 $$sim = \frac{\left| A\  \cap B \right|}{\left| A\  \cup B \right|}$$
@@ -335,117 +335,106 @@ Validation
 Results
 =======
 
-`lazar`
------
+{{#programs}}
+{{name}} Models
+--------
+{{#algos}}
 
-Random Forest
--------------
+### {{name}}
 
-The validation showed that the RF model has an accuracy of 64%, a
-sensitivity of 66% and a specificity of 63%. The confusion matrix of the
+10-fold cross-validation of the {{abbrev}} model gave an accuracy of
+{{accuracy_perc}}%,
+a sensitivity of
+{{true_positive_rate_perc}}%
+and a specificity of
+{{true_negative_rate_perc}}%.
+The confusion matrix of the
 model, calculated for 8080 instances, is provided in Table 1.
 
-Table 1: Confusion matrix of the RF model
+```{.table file="tables/R-RF.csv" caption="Confusion matrix for R Random Forest predictions"}
+```
+{{/algos}}
+{{/programs}}
 
-                          Predicted genotoxicity                         
-  ----------------------- ------------------------ ---------- ---------- -------------
-  Measured genotoxicity                            ***PP***   ***PN***   ***Total***
-                          ***TP***                 2274       1163       3437
-                          ***TN***                 1736       2907       4643
-                          ***Total***              4010       4070       8080
+R Models
+--------
 
-PP: Predicted positive; PN: Predicted negative, TP: True positive, TN:
-True negative
+### Random Forest
 
-Support Vector Machines
------------------------
+The validation showed that the RF model has an accuracy of
+{{R-RF.accuracy}}%
+`cat /home/ch/src/mutagenicity-paper/10-fold-crossvalidations/summaries/R-RF.json|jq '.accuracy * 100 | round'`{pipe="sh"}%,
+a sensitivity of
+`cat /home/ch/src/mutagenicity-paper/10-fold-crossvalidations/summaries/R-RF.json|jq '.true_positive_rate * 100 | round'`{pipe="sh"}%,
+and a specificity of
+`cat /home/ch/src/mutagenicity-paper/10-fold-crossvalidations/summaries/R-RF.json|jq '.true_negative_rate * 100 | round'`{pipe="sh"}%.
+The confusion matrix of the
+model, calculated for 8080 instances, is provided in Table 1.
+
+```{.table file="tables/R-RF.csv" caption="Confusion matrix for R Random Forest predictions"}
+```
+
+### Support Vector Machines
 
 The validation showed that the SVM model has an accuracy of 62%, a
 sensitivity of 65% and a specificity of 60%. The confusion matrix of the SVM
 model, calculated for 8080 instances, is provided in Table 2.
 
-Table 2: Confusion matrix of the SVM model
-
-                          Predicted genotoxicity                         
-  ----------------------- ------------------------ ---------- ---------- -------------
-  Measured genotoxicity                            ***PP***   ***PN***   ***Total***
-                          ***TP***                 2057       1107       3164
-                          ***TN***                 1953       2963       4916
-                          ***Total***              4010       4070       8080
 
-PP: Predicted positive; PN: Predicted negative, TP: True positive, TN:
-True negative
+```{.table file="tables/R-SVM.csv" caption="Confusion matrix for R Support Vector Machine predictions"}
+```
 
-Deep Learning (R-project)
--------------------------
+### Deep Learning
 
 The validation showed that the DL model generated in R has an accuracy
 of 59%, a sensitivity of 89% and a specificity of 30%. The confusion
 matrix of the model, normalised to 8080 instances, is provided in Table
 3.
 
-Table 3: Confusion matrix of the DL model (R-project)
+```{.table file="tables/R-DL.csv" caption="Confusion matrix for R Deep Learning predictions"}
+```
 
-                          Predicted genotoxicity                         
-  ----------------------- ------------------------ ---------- ---------- -------------
-  Measured genotoxicity                            ***PP***   ***PN***   ***Total***
-                          ***TP***                 3575       435        4010
-                          ***TN***                 2853       1217       4070
-                          ***Total***              6428       1652       8080
+```{.table file="tables/r-summary.csv" caption="Summary of R model validations"}
+```
 
-PP: Predicted positive; PN: Predicted negative, TP: True positive, TN:
-True negative
-
-DL model (TensorFlow)
----------------------
+TensorFlow Models
+-----------------
 
 The validation showed that the DL model generated in TensorFlow has an
 accuracy of 68%, a sensitivity of 70% and a specificity of 46%. The
 confusion matrix of the model, normalised to 8080 instances, is provided
 in Table 4.
 
-Table 4: Confusion matrix of the DL model (TensorFlow)
-
-                          Predicted genotoxicity                         
-  ----------------------- ------------------------ ---------- ---------- -------------
-  Measured genotoxicity                            ***PP***   ***PN***   ***Total***
-                          ***TP***                 2851       1227       4078
-                          ***TN***                 1825       2177       4002
-                          ***Total***              4676       3404       8080
-
-PP: Predicted positive; PN: Predicted negative, TP: True positive, TN:
-True negative
-
-The ROC curves from the 6-fold validation are shown in Figure 7.
+```{.table file="tables/tensorflow-all.csv" caption="Confusion matrix for TensorFlow predictions without variable selection"}
+```
 
-![](figures/image7.png){width="3.825in"
-height="2.7327045056867894in"}
+```{.table file="tables/tensorflow-selected.csv" caption="Confusion matrix for TensorFlow predictions with variable selection"}
+```
 
-Figure 7: Six-fold cross-validation of TensorFlow DL model show an
-average area under the ROC-curve (ROC-AUC; measure of accuracy) of 68%.
+```{.table file="tables/tf-summary.csv" caption="Summary of TensorFlow model validations"}
+```
 
-In summary, the validation results of the four methods are presented in
-the following table.
+`lazar` Models
+--------------
 
-Table 5 Results of the cross-validation of the four models and after
-y-randomisation
+### MolPrint2D Descriptors
 
-  ----------------------------------------------------------------------
-                          Accuracy   CCR     Sensitivity   Specificity
-  ----------------------- ---------- ------- ------------- -------------
-  RF model                64.1%      64.4%   66.2%         62.6%
+```{.table file="tables/lazar-all.csv" caption="Confusion matrix for lazar predictions with MolPrint2D descriptors"}
+```
 
-  SVM model               62.1%      62.6%   65.0%         60.3%
+```{.table file="tables/lazar-high-confidence.csv" caption="Confusion matrix for high confidence lazar predictions with MolPrint2D descriptors"}
+```
 
-  DL model\               59.3%      59.5%   89.2%         29.9%
-  (R-project)                                              
+### PaDEL Descriptors
 
-  DL model (TensorFlow)   68%        62.2%   69.9%         45.6%
+```{.table file="tables/lazar-padel-all.csv" caption="Confusion matrix for lazar predictions with PaDEL descriptors"}
+```
 
-  y-randomisation         50.5%      50.4%   50.3%         50.6%
-  ----------------------------------------------------------------------
+```{.table file="tables/lazar-padel-high-confidence.csv" caption="Confusion matrix for high confidence lazar predictions with PaDEL descriptors"}
+```
 
-CCR (correct classification rate)
+```{.table file="tables/lazar-summary.csv" caption="Summary of lazar model validations"}
+```
 
 Discussion
 ==========
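The Jaccard/Tanimoto similarity in the first hunk, $sim = |A \cap B| / |A \cup B|$, maps directly onto set operations. The following is a minimal Python sketch, assuming each compound's atom environments are already available as a set of hashable identifiers. The function name `tanimoto`, the example environment labels, and the convention of returning 0 for two empty sets are illustrative assumptions, not taken from the `lazar` code base.

```python
def tanimoto(a: set, b: set) -> float:
    """Jaccard/Tanimoto index: |A intersect B| / |A union B|."""
    union = a | b
    if not union:
        # Assumed convention: two compounds without any atom environments get similarity 0.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical atom-environment identifiers for two compounds a and b
a = {"C;c1", "C;c2", "N;n1", "O;o1"}
b = {"C;c1", "C;c2", "N;n2"}
print(tanimoto(a, b))  # 2 shared environments / 5 distinct environments -> 0.4
```

Identical environment sets give 1.0 and disjoint sets give 0.0, so the index is bounded by [0, 1] regardless of how many environments a compound has.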
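The template placeholders `{{accuracy_perc}}`, `{{true_positive_rate_perc}}` and `{{true_negative_rate_perc}}` stand in for the inline `jq '.accuracy * 100 | round'` calls. A minimal Python sketch of how such values could be derived from confusion-matrix counts is shown below. It assumes the field names used in the `jq` filters (`accuracy`, `true_positive_rate`, `true_negative_rate`). The functions `summary` and `add_percentages`, the extra `ccr` key, and the way counts are passed in are hypothetical, not the repository's actual preprocessing step. The usage example takes the Random Forest counts from the confusion matrix removed in this commit, reading its two data rows as measured positive and measured negative compounds.

```python
import math


def summary(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Validation statistics from the four confusion-matrix cells."""
    n = tp + fn + fp + tn
    tpr = tp / (tp + fn)  # sensitivity
    tnr = tn / (tn + fp)  # specificity
    return {
        "accuracy": (tp + tn) / n,
        "true_positive_rate": tpr,
        "true_negative_rate": tnr,
        "ccr": (tpr + tnr) / 2,  # correct classification rate as mean of sensitivity and specificity
    }


def add_percentages(stats: dict) -> dict:
    """Add a *_perc key for every statistic: value * 100, rounded half away from zero."""
    out = dict(stats)
    for key, value in stats.items():
        # floor(x + 0.5) mirrors jq's round for non-negative x;
        # Python's built-in round() would round exact halves to even instead.
        out[f"{key}_perc"] = math.floor(value * 100 + 0.5)
    return out


# Random Forest counts from the removed confusion matrix (8080 instances)
rf = add_percentages(summary(tp=2274, fn=1163, fp=1736, tn=2907))
print(rf["accuracy_perc"], rf["true_positive_rate_perc"], rf["true_negative_rate_perc"])
# -> 64 66 63
```

These counts reproduce the 64% accuracy, 66% sensitivity and 63% specificity stated in the removed Random Forest paragraph, and the mean of sensitivity and specificity (about 64.4%) matches the CCR listed for the RF model in the removed summary table.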