Diffstat (limited to 'mutagenicity.md')
-rw-r--r-- | mutagenicity.md | 38 |
1 files changed, 28 insertions, 10 deletions
diff --git a/mutagenicity.md b/mutagenicity.md
index c278142..d05cbc7 100644
--- a/mutagenicity.md
+++ b/mutagenicity.md
@@ -478,7 +478,9 @@ Results
 10-fold crossvalidations
 ------------------------

-Crossvalidation results are summarized in the following tables: @tbl:lazar shows `lazar` results with MolPrint2D and PaDEL descriptors, @tbl:R R results and @tbl:tensorflow Tensorflow results.
+Crossvalidation results are summarized in the following tables: @tbl:lazar
+shows `lazar` results with MolPrint2D and PaDEL descriptors, @tbl:R R results
+and @tbl:tensorflow Tensorflow results.

 ```{#tbl:lazar .table file="tables/lazar-summary.csv" caption="Summary of lazar crossvalidation results (all/high confidence predictions)"}
 ```
@@ -494,25 +496,41 @@ Crossvalidation results are summarized in the following tables: @tbl:lazar shows

 ![ROC plot of crossvalidation results.](figures/roc.png){#fig:roc}

-Confusion matrices for all models are available from the git repository http://git.in-silico.ch/mutagenicity-paper/10-fold-crossvalidations/confusion-matrices/, individual predictions can be found in
-http://git.in-silico.ch/mutagenicity-paper/10-fold-crossvalidations/predictions/.
+Confusion matrices for all models are available from the git repository
+https://git.in-silico.ch/mutagenicity-paper/10-fold-crossvalidations/confusion-matrices/,
+individual predictions can be found in
+https://git.in-silico.ch/mutagenicity-paper/10-fold-crossvalidations/predictions/.

-The most accurate crossvalidation predictions have been obtained with standard `lazar` models using MolPrint2D descriptors ({{cv.lazar-high-confidence.acc}} for predictions with high confidence, {{cv.lazar-all.acc}} for all predictions). Models utilizing PaDEL descriptors have generally lower accuracies ranging from {{cv.R-DL.acc}} (R deep learning) to {{cv.R-RF.acc}} (R/Tensorflow random forests). Sensitivity and specificity is generally well balanced with the exception of `lazar`-PaDEL (low sensitivity) and R deep learning (low specificity) models.
+The most accurate crossvalidation predictions have been obtained with standard
+`lazar` models using MolPrint2D descriptors ({{cv.lazar-high-confidence.acc}}
+for predictions with high confidence, {{cv.lazar-all.acc}} for all
+predictions). Models utilizing PaDEL descriptors have generally lower
+accuracies ranging from {{cv.R-DL.acc}} (R deep learning) to {{cv.R-RF.acc}}
+(R/Tensorflow random forests). Sensitivity and specificity is generally well
+balanced with the exception of `lazar`-PaDEL (low sensitivity) and R deep
+learning (low specificity) models.

 Pyrrolizidine alkaloid mutagenicity predictions
 -----------------------------------------------

-Mutagenicity predictions from all investigated models for 602 pyrrolizidine alkaloids are summarized in Table 4.
+Mutagenicity predictions from all investigated models for 602 pyrrolizidine
+alkaloids (PAs) are summarized in Table 4. A CSV table with all predictions can be
+downloaded from https://git.in-silico.ch/mutagenicity-paper/tables/pa-table.csv

 **TODO** **Verena and Philipp** Could you please spot-check the table? Things tend to slip out of place when I run the evaluation.

 \input{tables/pa-tab.tex}

-Training data and
-pyrrolizidine alkaloids were visualised with t-distributed stochastic neighbor embedding (t-SNE, @Maaten2008)
-for MolPrint2D and PaDEL descriptors. t-SNA maps each high-dimensional object
-(chemical) to a two-dimensional point. Similar objects are represented by
-nearby points and dissimilar objects are represented by distant points.
+```{#tbl:pa-summary .table file="tables/pa-summary.csv" caption="Summary of pyrrolizidine alkaloid mutagenicity predictions"}
+```
+
+For the visualisation of the position of pyrrolizidine alkaloids in respect to
+the training data set we have applied t-distributed stochastic neighbor
+embedding (t-SNE, @Maaten2008) for MolPrint2D and PaDEL descriptors. t-SNE
+maps each high-dimensional object (chemical) to a two-dimensional point,
+maintaining the high-dimensional distances of the objects. Similar objects are
+represented by nearby points and dissimilar objects are represented by distant
+points.

 @fig:tsne-mp2d shows the t-SNE of pyrrolizidine alkaloids (PA) and the mutagenicity training data in MP2D space (Tanimoto/Jaccard similarity).