Diffstat (limited to 'mutagenicity.md')
-rw-r--r-- | mutagenicity.md | 16 |
1 files changed, 9 insertions, 7 deletions
diff --git a/mutagenicity.md b/mutagenicity.md
index 9f7e349..c278142 100644
--- a/mutagenicity.md
+++ b/mutagenicity.md
@@ -42,7 +42,7 @@ Abstract
 Random forest, support vector machine, logistic regression, neural networks and
 k-nearest neighbor (`lazar`) algorithms, were applied to new *Salmonella*
 mutagenicity dataset with 8309 unique chemical structures. The best prediction accuracies in
-10-fold-crossvalidation were obtained with `lazar` models and MolPrint2D descriptors, that gave accuracies ({{lazar-high-confidence.acc_perc}}%)
+10-fold-crossvalidation were obtained with `lazar` models and MolPrint2D descriptors, that gave accuracies ({{cv.lazar-high-confidence.acc_perc}}%)
 similar to the interlaboratory variability of the Ames test.
 
 **TODO**: PA results
@@ -497,13 +497,15 @@ Crossvalidation results are summarized in the following tables: @tbl:lazar shows
 
 Confusion matrices for all models are available from the git repository http://git.in-silico.ch/mutagenicity-paper/10-fold-crossvalidations/confusion-matrices/, individual predictions can be found in http://git.in-silico.ch/mutagenicity-paper/10-fold-crossvalidations/predictions/.
 
-The most accurate crossvalidation predictions have been obtained with standard `lazar` models using MolPrint2D descriptors ({{lazar-high-confidence.acc}} for predictions with high confidence, {{lazar-all.acc}} for all predictions). Models utilizing PaDEL descriptors have generally lower accuracies ranging from {{R-DL.acc}} (R deep learning) to {{R-RF.acc}} (R/Tensorflow random forests). Sensitivity and specificity is generally well balanced with the exception of `lazar`-PaDEL (low sensitivity) and R deep learning (low specificity) models.
+The most accurate crossvalidation predictions have been obtained with standard `lazar` models using MolPrint2D descriptors ({{cv.lazar-high-confidence.acc}} for predictions with high confidence, {{cv.lazar-all.acc}} for all predictions). Models utilizing PaDEL descriptors have generally lower accuracies ranging from {{cv.R-DL.acc}} (R deep learning) to {{cv.R-RF.acc}} (R/Tensorflow random forests). Sensitivity and specificity is generally well balanced with the exception of `lazar`-PaDEL (low sensitivity) and R deep learning (low specificity) models.
 
 Pyrrolizidine alkaloid mutagenicity predictions
 -----------------------------------------------
 
 Mutagenicity predictions from all investigated models for 602 pyrrolizidine alkaloids are summarized in Table 4.
 
+**TODO** **Verena and Philipp**: Could you please spot-check the table? Things tend to slip out of place when I do the evaluation.
+
 \input{tables/pa-tab.tex}
 
 Training data and
@@ -546,16 +548,16 @@ models have low specificity.
 The accuracy of `lazar` *in-silico* predictions are comparable to the
 interlaboratory variability of the Ames test (80-85% according to @Benigni1988),
 especially for predictions with high confidence
-({{lazar-high-confidence.acc_perc}}%). This is a clear indication that
+({{cv.lazar-high-confidence.acc_perc}}%). This is a clear indication that
 *in-silico* predictions can be as reliable as the bioassays, if the compounds
 are close to the applicability domain. This conclusion is also supported by our
 analysis of `lazar` lowest observed effect level predictions, which are also
 similar to the experimental variability (@Helma2018).
 
-The lowest number of predictions ({{lazar-padel-high-confidence.n}}) has been
+The lowest number of predictions ({{cv.lazar-padel-high-confidence.n}}) has been
 obtained from `lazar`-PaDEL high confidence predictions, the largest number of
-predictions comes from Tensorflow models ({{tensorflow-rf.v3.n}}). Standard
-`lazar` give a slightly lower number of predictions ({{lazar-all.n}}) than R
+predictions comes from Tensorflow models ({{cv.tensorflow-rf.v3.n}}). Standard
+`lazar` give a slightly lower number of predictions ({{cv.lazar-all.n}}) than R
 and Tensorflow models. This is not necessarily a disadvantage, because `lazar`
 abstains from predictions, if the query compound is very dissimilar from the
 compounds in the training set and thus avoids to make predictions for compounds
@@ -751,7 +753,7 @@ A new public *Salmonella* mutagenicity training dataset with 8309 compounds was
 created and used it to train `lazar`, R and Tensorflow models with MolPrint2D
 and PaDEL descriptors. The best performance was obtained with `lazar` models
 using MolPrint2D descriptors, with prediction accuracies
-({{lazar-high-confidence.acc_perc}}%) comparable to the interlaboratory variability
+({{cv.lazar-high-confidence.acc_perc}}%) comparable to the interlaboratory variability
 of the Ames test (80-85%). Models based on PaDEL descriptors had lower
 accuracies than MolPrint2D models, but only the `lazar` algorithm could use
 MolPrint2D descriptors.
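
The substance of this change is the `cv.` prefix added to every `{{...}}` placeholder. The diff does not show the tool that fills these placeholders, so the Python below is only a minimal sketch under one assumption: values are looked up in a nested results mapping by following the dotted path. The names `resolve`, `render`, `unresolved`, and `SAMPLE_RESULTS`, as well as the numbers, are hypothetical and not taken from the repository. Under that assumption, the sketch shows why a placeholder left without the new prefix would fail to resolve once results are grouped under `cv`.

```python
import re

# Matches {{dotted.path}} placeholders as they appear in mutagenicity.md.
PLACEHOLDER = re.compile(r"\{\{\s*([^{}\s]+)\s*\}\}")


def resolve(results, dotted_key):
    """Follow a dotted path through a nested dict; KeyError unless it ends at a leaf."""
    node = results
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            raise KeyError(dotted_key)
        node = node[part]
    if isinstance(node, dict):
        # The path stopped at a group (e.g. "cv.lazar-all"), not at a value.
        raise KeyError(dotted_key)
    return node


def render(text, results):
    """Replace every placeholder in text with its value from results."""
    return PLACEHOLDER.sub(lambda m: str(resolve(results, m.group(1))), text)


def unresolved(text, results):
    """List placeholder keys in text that cannot be resolved, in order of appearance."""
    missing = []
    for key in PLACEHOLDER.findall(text):
        try:
            resolve(results, key)
        except KeyError:
            missing.append(key)
    return missing


# Made-up numbers for illustration only; they are NOT results from the paper.
SAMPLE_RESULTS = {
    "cv": {
        "lazar-all": {"n": 100},
        "tensorflow-rf": {"v3": {"n": 120}},
    }
}

if __name__ == "__main__":
    line = "Standard `lazar` ({{cv.lazar-all.n}}) vs. Tensorflow ({{cv.tensorflow-rf.v3.n}})"
    print(render(line, SAMPLE_RESULTS))
    # An old-style key without the cv. prefix is reported as unresolved.
    print(unresolved("{{lazar-all.n}} and {{cv.lazar-all.n}}", SAMPLE_RESULTS))
```

Running a check like `unresolved` over the whole manuscript before rendering would be one way to catch placeholders that a rename such as this one missed.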