Diffstat (limited to 'mutagenicity.md')
-rw-r--r-- mutagenicity.md 16
1 files changed, 9 insertions, 7 deletions
diff --git a/mutagenicity.md b/mutagenicity.md
index 9f7e349..c278142 100644
--- a/mutagenicity.md
+++ b/mutagenicity.md
@@ -42,7 +42,7 @@ Abstract
Random forest, support vector machine, logistic regression, neural networks and k-nearest neighbor
(`lazar`) algorithms were applied to a new *Salmonella* mutagenicity dataset
with 8309 unique chemical structures. The best prediction accuracies in
-10-fold-crossvalidation were obtained with `lazar` models and MolPrint2D descriptors, that gave accuracies ({{lazar-high-confidence.acc_perc}}%)
+10-fold-crossvalidation were obtained with `lazar` models and MolPrint2D descriptors, that gave accuracies ({{cv.lazar-high-confidence.acc_perc}}%)
similar to the interlaboratory variability of the Ames test.
**TODO**: PA results
@@ -497,13 +497,15 @@ Crossvalidation results are summarized in the following tables: @tbl:lazar shows
Confusion matrices for all models are available from the git repository http://git.in-silico.ch/mutagenicity-paper/10-fold-crossvalidations/confusion-matrices/, individual predictions can be found in
http://git.in-silico.ch/mutagenicity-paper/10-fold-crossvalidations/predictions/.
-The most accurate crossvalidation predictions have been obtained with standard `lazar` models using MolPrint2D descriptors ({{lazar-high-confidence.acc}} for predictions with high confidence, {{lazar-all.acc}} for all predictions). Models utilizing PaDEL descriptors have generally lower accuracies ranging from {{R-DL.acc}} (R deep learning) to {{R-RF.acc}} (R/Tensorflow random forests). Sensitivity and specificity is generally well balanced with the exception of `lazar`-PaDEL (low sensitivity) and R deep learning (low specificity) models.
+The most accurate crossvalidation predictions have been obtained with standard `lazar` models using MolPrint2D descriptors ({{cv.lazar-high-confidence.acc}} for predictions with high confidence, {{cv.lazar-all.acc}} for all predictions). Models utilizing PaDEL descriptors have generally lower accuracies ranging from {{cv.R-DL.acc}} (R deep learning) to {{cv.R-RF.acc}} (R/Tensorflow random forests). Sensitivity and specificity is generally well balanced with the exception of `lazar`-PaDEL (low sensitivity) and R deep learning (low specificity) models.
Pyrrolizidine alkaloid mutagenicity predictions
-----------------------------------------------
Mutagenicity predictions from all investigated models for 602 pyrrolizidine alkaloids are summarized in Table 4.
+**TODO** **Verena and Philipp**: Could you please spot-check the table? Things tend to slip out of place when I do the evaluation.
+
\input{tables/pa-tab.tex}
Training data and
@@ -546,16 +548,16 @@ models have low specificity.
The accuracy of `lazar` *in-silico* predictions is comparable to the
interlaboratory variability of the Ames test (80-85% according to
@Benigni1988), especially for predictions with high confidence
-({{lazar-high-confidence.acc_perc}}%). This is a clear indication that
+({{cv.lazar-high-confidence.acc_perc}}%). This is a clear indication that
*in-silico* predictions can be as reliable as the bioassays, if the compounds
are close to the applicability domain. This conclusion is also supported by our
analysis of `lazar` lowest observed effect level predictions, which are also
similar to the experimental variability (@Helma2018).
-The lowest number of predictions ({{lazar-padel-high-confidence.n}}) has been
+The lowest number of predictions ({{cv.lazar-padel-high-confidence.n}}) has been
obtained from `lazar`-PaDEL high confidence predictions, the largest number of
-predictions comes from Tensorflow models ({{tensorflow-rf.v3.n}}). Standard
-`lazar` give a slightly lower number of predictions ({{lazar-all.n}}) than R
+predictions comes from Tensorflow models ({{cv.tensorflow-rf.v3.n}}). Standard
+`lazar` give a slightly lower number of predictions ({{cv.lazar-all.n}}) than R
and Tensorflow models. This is not necessarily a disadvantage, because `lazar`
abstains from predictions, if the query compound is very dissimilar from the
compounds in the training set and thus avoids making predictions for compounds
@@ -751,7 +753,7 @@ A new public *Salmonella* mutagenicity training dataset with 8309 compounds was
created and used to train `lazar`, R and Tensorflow models with MolPrint2D
and PaDEL descriptors. The best performance was obtained with `lazar` models
using MolPrint2D descriptors, with prediction accuracies
-({{lazar-high-confidence.acc_perc}}%) comparable to the interlaboratory variability
+({{cv.lazar-high-confidence.acc_perc}}%) comparable to the interlaboratory variability
of the Ames test (80-85%). Models based on PaDEL descriptors had lower
accuracies than MolPrint2D models, but only the `lazar` algorithm could use
MolPrint2D descriptors.
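
The change in this diff is mechanical: each `{{...}}` placeholder it touches gains a `cv.` prefix, presumably to separate crossvalidation results from other result sets. A minimal Python sketch of such a rewrite is given here. The function name `add_namespace` and the regular expression are illustrative assumptions, not the tooling the authors used, and the diff does not show how the placeholders are rendered.

```python
import re

# Matches {{key}} placeholders; keys may contain letters, digits, '-', '.', '_'.
# This pattern is an assumption based on the placeholders visible in the diff.
PLACEHOLDER = re.compile(r"\{\{\s*([^{}\s]+)\s*\}\}")


def add_namespace(text: str, namespace: str = "cv") -> str:
    """Prefix every {{key}} placeholder with 'namespace.' unless it already has it."""
    prefix = namespace + "."

    def repl(match: re.Match) -> str:
        key = match.group(1)
        if key.startswith(prefix):
            # Already namespaced: leave the placeholder untouched.
            return match.group(0)
        return "{{" + prefix + key + "}}"

    return PLACEHOLDER.sub(repl, text)


if __name__ == "__main__":
    line = "accuracies ({{lazar-high-confidence.acc_perc}}%) comparable to"
    print(add_namespace(line))
```

Because already-prefixed keys are skipped, running the rewrite twice leaves the text unchanged, so it is safe to apply to a file that has been partially migrated.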