1 files changed, 176 insertions, 107 deletions
diff --git a/mutagenicity.md b/mutagenicity.md
index 5a01ee9..3911799 100644
--- a/mutagenicity.md
+++ b/mutagenicity.md
@@ -53,69 +53,12 @@ accuracies of all investigated models ranged from 80-85% which is comparable
 with the interlaboratory variability of the *Salmonella* mutagenicity assay.
 Pyrrolizidine alkaloid predictions showed a clear distinction between chemical
 groups, where Otonecines had the highest proportion of positive mutagenicity
-predictions and Monoester the lowest.
+predictions and Monoesters the lowest.
 
 Introduction
 ============
 
 **TODO**: rationale for investigation
-
-<!---
-Pyrrolizidine alkaloids (PAs) are secondary plant ingredients found in
-many plant species as protection against predators [Hartmann & Witte
-1995](#_ENREF_59)[Langel et al. 2011](#_ENREF_76)(; ). PAs are ester
-alkaloids, which are composed of a necine base (two fused five-membered
-rings joined by a nitrogen atom) and one or two necic acid (carboxylic
-ester arms). The necine base can have different structures and thereby
-divides PAs into several structural groups, e.g. otonecine, platynecine,
-and retronecine. The structural groups of the necic acid are macrocyclic
-diester, open-ring diester and monoester [Langel et al.
-2011](#_ENREF_76)().
-
-PA are mainly metabolised in the liver, which is at the same time the
-main target organ of toxicity [Bull & Dick 1959](#_ENREF_17)[Bull et al.
-1958](#_ENREF_18)[Butler et al. 1970](#_ENREF_20)[DeLeve et al.
-1996](#_ENREF_33)[Jago 1971](#_ENREF_65)[Li et al.
-2011](#_ENREF_78)[Neumann et al. 2015](#_ENREF_99)(; ; ; ; ; ; ). There
-are three principal metabolic pathways for 1,2-unsaturated PAs [Chen et
-al. 2010](#_ENREF_26)(): (i) Detoxification by hydrolysis: the ester
-bond on positions C7 and C9 are hydrolysed by non-specific esterases to
-release necine base and necic acid, which are then subjected to further
-phase II-conjugation and excretion. (ii) Detoxification by *N*-oxidation
-of the necine base (only possible for retronecine-type PAs): the
-nitrogen is oxidised to form a PA *N*-oxides, which can be conjugated by
-phase II enzymes e.g. glutathione and then excreted. PA *N*-oxides can
-be converted back into the corresponding parent PA [Wang et al.
-2005](#_ENREF_134)(). (iii) Metabolic activation or toxification: PAs
-are metabolic activated/ toxified by oxidation (for retronecine-type
-PAs) or oxidative *N*-demethylation (for otonecine-type PAs [Lin
-1998](#_ENREF_82)()). This pathway is mainly catalysed by cytochrome
-P450 isoforms CYP2B and 3A [Ruan et al. 2014b](#_ENREF_115)(), and
-results in the formation of dehydropyrrolizidines (DHP, also known as
-pyrrolic ester or reactive pyrroles). DHPs are highly reactive and cause
-damage in the cells where they are formed, usually hepatocytes. However,
-they can also pass from the hepatocytes into the adjacent sinusoids and
-damage the endothelial lining cells [Gao et al. 2015](#_ENREF_48)()
-predominantly by reaction with protein, lipids and DNA. There is even
-evidence, that conjugation of DHP to glutathione, which would generally
-be considered a detoxification step, could result in reactive
-metabolites, which might also lead to DNA adduct formation [Xia et al.
-2015](#_ENREF_138)(). Due to the ability to form DNA adducts, DNA
-crosslinks and DNA breaks 1,2-unsaturated PAs are generally considered
-genotoxic and carcinogenic [Chen et al. 2010](#_ENREF_26)[EFSA
-2011](#_ENREF_36)[Fu et al. 2004](#_ENREF_45)[Li et al.
-2011](#_ENREF_78)[Takanashi et al. 1980](#_ENREF_126)[Yan et al.
-2008](#_ENREF_140)[Zhao et al. 2012](#_ENREF_148)(; ; ; ; ; ; ). Still,
-there is no evidence yet that PAs are carcinogenic in humans [ANZFA
-2001](#_ENREF_4)[EMA 2016](#_ENREF_39)(; ). One general limitation of
-studies with PAs is the number of different PAs investigated. Around 30
-PAs are currently commercially available, therefore all studies focus on
-these PAs. This is also true for *in vitro* and *in vivo* tests on
-mutagenicity and genotoxicity. To gain a wider perspective, in this
-study over 600 different PAs were assessed on their mutagenic potential
-using four different machine learning techniques.
---->
-
 <!---
 
 Mutagenicity datasets
@@ -123,9 +66,30 @@ Algorithms
 descriptors
 define abbreviations
 pyrrolizidine 
+large dataset -> comparison of algorithms and descriptors
+reliable experimental outcome
 --->
 
-The main objectives of this study were
+As a case study, we applied these mutagenicity models to {{pa.nr}}
+pyrrolizidine alkaloids (PAs) in order to highlight the potential and the
+problems of applying mutagenicity models to compounds with very limited
+experimental data.
+
+Pyrrolizidine alkaloids (PAs) are characteristic metabolites of some plant
+families, mainly *Asteraceae*, *Boraginaceae*, *Fabaceae* and *Orchidaceae*
+(@Hartmann1995, @Langel2011), and form a powerful defence mechanism against
+herbivores. PAs are heterocyclic ester alkaloids composed of a necine base (two
+fused five-membered rings joined by a single nitrogen atom) and a necic acid
+(one or two carboxylic ester arms), occurring principally in two forms,
+tertiary base PAs and PA N-oxides. Several *in vitro* studies have shown the
+mutagenic potential of PAs, which seems highly dependent on the structure of the
+necine base and necic acid (@Hadi2021; @Allemang2018, @Louisse2019). However, due to the
+limited availability of pure substances, only a few PAs have been
+investigated with regard to their structure-specific mutagenicity. To overcome
+this bottleneck, predicting the structure-specific mutagenic potential of
+PAs with different machine learning models could provide further insight into the mechanisms.
+
+In summary, the main objectives of this study were
 
   - to generate a new mutagenicity training dataset, by combining the most comprehensive public datasets
   - to compare the performance of MolPrint2D (*MP2D*) fingerprints with Chemistry Development Kit (*CDK*) descriptors
@@ -503,43 +467,6 @@ investigated models can be downloaded from
 A visual representation of all PA predictions can be found at
 <https://git.in-silico.ch/mutagenicity-paper/tree/pyrrolizidine-alkaloids/pa-predictions.pdf>.
 
-<!--
-@tbl:pa-mp2d and @tbl:pa-cdk summarise the outcome of pyrrolizidine alkaloid predictions from all models with MolPrint2D and CDK descriptors.
-
-| Model  | mutagenic | non-mutagenic | Nr. predictions |
-|-------:|-----------|---------------|-----------------|
-| lazar-all | {{pa.mp2d_lazar_all.mut_perc}}% ({{pa.mp2d_lazar_all.mut}}) | {{pa.mp2d_lazar_all.non_mut_perc}}% ({{pa.mp2d_lazar_all.non_mut}}) | {{pa.mp2d_lazar_all.n_perc}}% ({{pa.mp2d_lazar_all.n}}) |
-| lazar-HC | {{pa.mp2d_lazar_high_confidence.mut_perc}}% ({{pa.mp2d_lazar_high_confidence.mut}}) | {{pa.mp2d_lazar_high_confidence.non_mut_perc}}% ({{pa.mp2d_lazar_high_confidence.non_mut}}) | {{pa.mp2d_lazar_high_confidence.n_perc}}% ({{pa.mp2d_lazar_high_confidence.n}}) |
-| RF | {{pa.mp2d_rf.mut_perc}}% ({{pa.mp2d_rf.mut}}) | {{pa.mp2d_rf.non_mut_perc}}% ({{pa.mp2d_rf.non_mut}}) | {{pa.mp2d_rf.n_perc}}% ({{pa.mp2d_rf.n}}) |
-| LR-sgd | {{pa.mp2d_lr.mut_perc}}% ({{pa.mp2d_lr.mut}}) | {{pa.mp2d_lr.non_mut_perc}}% ({{pa.mp2d_lr.non_mut}}) | {{pa.mp2d_lr.n_perc}}% ({{pa.mp2d_lr.n}}) |
-| LR-scikit | {{pa.mp2d_lr2.mut_perc}}% ({{pa.mp2d_lr2.mut}}) | {{pa.mp2d_lr2.non_mut_perc}}% ({{pa.mp2d_lr2.non_mut}}) | {{pa.mp2d_lr2.n_perc}}% ({{pa.mp2d_lr2.n}}) |
-| NN | {{pa.mp2d_nn.mut_perc}}% ({{pa.mp2d_nn.mut}}) | {{pa.mp2d_nn.non_mut_perc}}% ({{pa.mp2d_nn.non_mut}}) | {{pa.mp2d_nn.n_perc}}% ({{pa.mp2d_nn.n}}) |
-| SVM | {{pa.mp2d_svm.mut_perc}}% ({{pa.mp2d_svm.mut}}) | {{pa.mp2d_svm.non_mut_perc}}% ({{pa.mp2d_svm.non_mut}}) | {{pa.mp2d_svm.n_perc}}% ({{pa.mp2d_svm.n}}) |
-
-: Summary of MolPrint2D pyrrolizidine alkaloid predictions {#tbl:pa-mp2d}
-
-| Model  | mutagenic | non-mutagenic | Nr. predictions |
-|-------:|-----------|---------------|-----------------|
-| lazar-all | {{pa.cdk_lazar_all.mut_perc}}% ({{pa.cdk_lazar_all.mut}}) | {{pa.cdk_lazar_all.non_mut_perc}}% ({{pa.cdk_lazar_all.non_mut}}) | {{pa.cdk_lazar_all.n_perc}}% ({{pa.cdk_lazar_all.n}}) |
-| lazar-HC | {{pa.cdk_lazar_high_confidence.mut_perc}}% ({{pa.cdk_lazar_high_confidence.mut}}) | {{pa.cdk_lazar_high_confidence.non_mut_perc}}% ({{pa.cdk_lazar_high_confidence.non_mut}}) | {{pa.cdk_lazar_high_confidence.n_perc}}% ({{pa.cdk_lazar_high_confidence.n}}) |
-| RF | {{pa.cdk_rf.mut_perc}}% ({{pa.cdk_rf.mut}}) | {{pa.cdk_rf.non_mut_perc}}% ({{pa.cdk_rf.non_mut}}) | {{pa.cdk_rf.n_perc}}% ({{pa.cdk_rf.n}}) |
-| LR-sgd | {{pa.cdk_lr.mut_perc}}% ({{pa.cdk_lr.mut}}) | {{pa.cdk_lr.non_mut_perc}}% ({{pa.cdk_lr.non_mut}}) | {{pa.cdk_lr.n_perc}}% ({{pa.cdk_lr.n}}) |
-| LR-scikit | {{pa.cdk_lr2.mut_perc}}% ({{pa.cdk_lr2.mut}}) | {{pa.cdk_lr2.non_mut_perc}}% ({{pa.cdk_lr2.non_mut}}) | {{pa.cdk_lr2.n_perc}}% ({{pa.cdk_lr2.n}}) |
-| NN | {{pa.cdk_nn.mut_perc}}% ({{pa.cdk_nn.mut}}) | {{pa.cdk_nn.non_mut_perc}}% ({{pa.cdk_nn.non_mut}}) | {{pa.cdk_nn.n_perc}}% ({{pa.cdk_nn.n}}) |
-| SVM | {{pa.cdk_svm.mut_perc}}% ({{pa.cdk_svm.mut}}) | {{pa.cdk_svm.non_mut_perc}}% ({{pa.cdk_svm.non_mut}}) | {{pa.cdk_svm.n_perc}}% ({{pa.cdk_svm.n}}) |
-
-: Summary of CDK pyrrolizidine alkaloid predictions {#tbl:pa-cdk}
--->
-
-@fig:pa-groups displays the proportion of positive mutagenicity predictions
-from all models for the different pyrrolizidine alkaloid groups. Tensorflow
-models predicted all {{pa.n}} pyrrolizidine alkaloids, `lazar` MP2D models
-predicted {{pa.mp2d_lazar_all.n}} compounds
-({{pa.mp2d_lazar_high_confidence.n}} with high confidence) and `lazar` CDK
-models {{pa.cdk_lazar_all.n}} compounds ({{pa.cdk_lazar_high_confidence.n}}
-with high confidence).
-
-![Summary of pyrrolizidine alkaloid predictions](figures/pa-groups.png){#fig:pa-groups}
 
 <!--
 ![Summary of Diester predictions](figures/Diester.png){#fig:die}
@@ -617,6 +544,65 @@ mutagenicity predictions in the context of training data. t-SNE visualisations o
 ![t-SNE visualisation of CDK support vector machine predictions](figures/tsne-cdk-svm-classifications.png){#fig:tsne-cdk-svm}
 -->
 
+@tbl:pa-summary summarises the outcome of pyrrolizidine alkaloid predictions from all models with MolPrint2D and CDK descriptors.
+
+
+| Model  | MP2D Mutagenic | Nr. predictions | CDK Mutagenic | Nr. predictions |
+|-------:|----------------|-----------------|---------------|-----------------|
+| lazar-all | {{pa.mp2d_lazar_all.mut_perc}}% ({{pa.mp2d_lazar_all.mut}}) | {{pa.mp2d_lazar_all.n_perc}}% ({{pa.mp2d_lazar_all.n}}) | {{pa.cdk_lazar_all.mut_perc}}% ({{pa.cdk_lazar_all.mut}}) | {{pa.cdk_lazar_all.n_perc}}% ({{pa.cdk_lazar_all.n}}) |
+| lazar-HC | {{pa.mp2d_lazar_high_confidence.mut_perc}}% ({{pa.mp2d_lazar_high_confidence.mut}}) | {{pa.mp2d_lazar_high_confidence.n_perc}}% ({{pa.mp2d_lazar_high_confidence.n}}) | {{pa.cdk_lazar_high_confidence.mut_perc}}% ({{pa.cdk_lazar_high_confidence.mut}}) | {{pa.cdk_lazar_high_confidence.n_perc}}% ({{pa.cdk_lazar_high_confidence.n}}) |
+| RF | {{pa.mp2d_rf.mut_perc}}% ({{pa.mp2d_rf.mut}}) | {{pa.mp2d_rf.n_perc}}% ({{pa.mp2d_rf.n}}) | {{pa.cdk_rf.mut_perc}}% ({{pa.cdk_rf.mut}}) | {{pa.cdk_rf.n_perc}}% ({{pa.cdk_rf.n}}) |
+| LR-sgd | {{pa.mp2d_lr.mut_perc}}% ({{pa.mp2d_lr.mut}}) | {{pa.mp2d_lr.n_perc}}% ({{pa.mp2d_lr.n}}) | {{pa.cdk_lr.mut_perc}}% ({{pa.cdk_lr.mut}}) | {{pa.cdk_lr.n_perc}}% ({{pa.cdk_lr.n}}) |
+| LR-scikit | {{pa.mp2d_lr2.mut_perc}}% ({{pa.mp2d_lr2.mut}}) | {{pa.mp2d_lr2.n_perc}}% ({{pa.mp2d_lr2.n}}) | {{pa.cdk_lr2.mut_perc}}% ({{pa.cdk_lr2.mut}}) | {{pa.cdk_lr2.n_perc}}% ({{pa.cdk_lr2.n}}) |
+| NN | {{pa.mp2d_nn.mut_perc}}% ({{pa.mp2d_nn.mut}}) | {{pa.mp2d_nn.n_perc}}% ({{pa.mp2d_nn.n}}) | {{pa.cdk_nn.mut_perc}}% ({{pa.cdk_nn.mut}}) | {{pa.cdk_nn.n_perc}}% ({{pa.cdk_nn.n}}) |
+| SVM | {{pa.mp2d_svm.mut_perc}}% ({{pa.mp2d_svm.mut}}) | {{pa.mp2d_svm.n_perc}}% ({{pa.mp2d_svm.n}}) | {{pa.cdk_svm.mut_perc}}% ({{pa.cdk_svm.mut}}) | {{pa.cdk_svm.n_perc}}% ({{pa.cdk_svm.n}}) |
+
+: Summary of pyrrolizidine alkaloid predictions {#tbl:pa-summary}
+
+@fig:pa-groups displays the proportion of positive mutagenicity predictions
+from all models for the different pyrrolizidine alkaloid groups. Tensorflow
+models predicted all {{pa.n}} pyrrolizidine alkaloids, `lazar` MP2D models
+predicted {{pa.mp2d_lazar_all.n}} compounds
+({{pa.mp2d_lazar_high_confidence.n}} with high confidence) and `lazar` CDK
+models {{pa.cdk_lazar_all.n}} compounds ({{pa.cdk_lazar_high_confidence.n}}
+with high confidence).
+
+![Summary of pyrrolizidine alkaloid predictions](figures/pa-groups.png){#fig:pa-groups}
+
+For the lazar-HC model, only
+{{pa.mp2d_lazar_high_confidence.n_perc}}/{{pa.cdk_lazar_high_confidence.n_perc}}%
+of the PA dataset was within the stricter similarity thresholds of 0.5/0.9
+(MP2D/CDK). Reducing the similarity thresholds to 0.2/0.5 in the lazar-all
+model increased the proportion of predictable PAs to
+{{pa.mp2d_lazar_all.n_perc}}/{{pa.cdk_lazar_all.n_perc}}%. As the other ML
+models do not consider applicability domains, they predicted all PAs.
+
+Although most of the models show similar accuracies, sensitivities and
+specificities in crossvalidation experiments, some of the models (MP2D-RF, CDK-RF
+and CDK-SVM) predict a lower number of mutagens
+({{pa.cdk_rf.mut_perc}}-{{pa.mp2d_rf.mut_perc}}%) than the majority of the
+models ({{pa.mp2d_svm.mut_perc}}-{{pa.mp2d_lazar_high_confidence.mut_perc}}%,
+@tbl:pa-summary, @fig:pa-groups).
+
+Over all models, the mean proportion of PAs predicted as mutagenic was highest for
+Otonecines ({{pa.groups.Otonecine.mut_perc}}%,
+{{pa.groups.Otonecine.mut}}/{{pa.groups.Otonecine.n_pred}}),
+followed by Macrocyclic diesters ({{pa.groups.Macrocyclic_diester.mut_perc}}%, {{pa.groups.Macrocyclic_diester.mut}}/{{pa.groups.Macrocyclic_diester.n_pred}}),
+Dehydropyrrolizidines ({{pa.groups.Dehydropyrrolizidine.mut_perc}}%, {{pa.groups.Dehydropyrrolizidine.mut}}/{{pa.groups.Dehydropyrrolizidine.n_pred}}),
+Tertiary PAs ({{pa.groups.Tertiary_PA.mut_perc}}%, {{pa.groups.Tertiary_PA.mut}}/{{pa.groups.Tertiary_PA.n_pred}}) and
+Retronecines ({{pa.groups.Retronecine.mut_perc}}%, {{pa.groups.Retronecine.mut}}/{{pa.groups.Retronecine.n_pred}}).
+
+When excluding the aforementioned three deviating models,
+the rank order stays the same, but the percentage of mutagenic PAs is higher.
+
+The following rank order for mutagenic probability can be deduced from the results of all models taken together: 
+
+Necine base: 				Platynecine < Retronecine << Otonecine
+
+Necic acid: 				Monoester < Diester << Macrocyclic diester
+
+Modification of necine base:		N-oxide  < Tertiary PA < Dehydropyrrolizidine
+
 Discussion
 ==========
 
@@ -716,8 +702,10 @@ CDK descriptors contain in contrast in every case matrices with
 Pyrrolizidine alkaloid mutagenicity predictions
 -----------------------------------------------
 
-@fig:pa-groups shows a clear differentiation between the different
-pyrrolizidine alkaloid groups. The largest proportion of mutagenic predictions
+### Algorithms and descriptors
+
+<!--
+The largest proportion of mutagenic predictions
 was observed for Otonecines {{pa.groups.Otonecine.mut_perc}}%
 ({{pa.groups.Otonecine.mut}}/{{pa.groups.Otonecine.n_pred}}), the lowest for
 Monoesters {{pa.groups.Monoester.mut_perc}}%
@@ -733,9 +721,12 @@ models ({{pa.mp2d_svm.mut_perc}}-{{pa.mp2d_lazar_high_confidence.mut_perc}}%
 (@fig:pa-groups). lazar-CDK on the other hand
 predicts the largest number of mutagens for all groups with exception of
 Otonecines.
+-->
 
-These differences between predictions from different algorithms and descriptors
-were not expected based on crossvalidation results.
+@fig:pa-groups shows a clear differentiation between the different
+pyrrolizidine alkaloid groups.
+Nevertheless, the differences between predictions from different algorithms and descriptors
+(@tbl:pa-summary) were not expected based on crossvalidation results.
 
 In order to investigate, if any of the investigated models show systematic
 errors in the  vicinity of pyrrolizidine-alkaloids we have performed a
@@ -743,12 +734,72 @@ detailled t-SNE analysis of all models (see @fig:tsne-mp2d-rf and
 @fig:tsne-cdk-lazar-all for two examples, all visualisations can be found at
 <https://git.in-silico.ch/mutagenicity-paper/figures>.
 
-Nevertheless none of the models showed obvious deviations from their expected
+None of the models showed obvious deviations from their expected
 behaviour, so the reason for the disagreement between some of the models
-remains unclear at the moment.  It is however perfectly possible that some
+remains unclear at the moment.  It is however possible that some
 systematic errors are covered up by converting high dimensional spaces to two
 coordinates and are thus invisible in t-SNE visualisations.
 
+### Necic acid
+
+The rank order of the necic acid groups is comparable in all models. PAs of the
+monoester type had the lowest genotoxic potential, followed by PAs of the
+open-ring diester type. PAs with macrocyclic diesters had the highest genotoxic
+potential. This result fits well with the current state of knowledge: in general,
+PAs that have a macrocyclic diester as necic acid are considered to be more toxic
+than those with an open-ring diester or monoester (@EFSA2011, @Fu2004,
+@Ruan2014b). More recent studies have also confirmed that
+macrocyclic and open-ring diesters are more genotoxic *in vitro* than monoesters
+(@Hadi2021; @Allemang2018, @Louisse2019).
+
+### Necine base
+
+In the rank order of necine bases, platynecine is the least mutagenic, followed
+by retronecine and otonecine. Saturated PAs of the platynecine type are
+generally accepted to be less toxic or non-toxic and have been shown in *in vitro*
+experiments to form no DNA adducts (@Xia2013). In the literature,
+otonecine-type PAs were shown to be more toxic than those of the
+retronecine type (@Li2013).
+
+### Modifications of necine base
+
+The group-specific results reflect the expected relationship between the
+groups: the low mutagenic potential of N-oxides and the high potential of
+Dehydropyrrolizidines (DHP) (@Chen2010). 
+
+Dehydropyrrolizidines are regarded as the toxic principle in the metabolism of
+PAs, and are known to produce protein and DNA adducts (@Chen2010). None of the
+models met this expectation; all predicted the majority of DHPs as
+non-mutagenic. However, the following issues need to be considered. First,
+all DHPs were outside the stricter applicability domain of MP2D lazar.
+This indicates that they are structurally very different from the training data
+and might be outside the applicability domain of all models based on this
+training set. Second, DHP has two double bonds in its necine
+base, making it highly reactive. DHP and other comparable molecules have a very
+short lifespan and usually cannot be used in *in vitro* experiments.
+
+<!--
+Furthermore, the probabilities for this substance groups needs to be considered, and not only the consolidated prediction. In the LAZAR model, all DHPs had probabilities for both outcomes (genotoxic and not genotoxic) mainly below 30%. Additionally, the probabilities for both outcomes were close together, often within 10% of each other. The fact that for both outcomes, the probabilities were low and close together, indicates a lower confidence in the prediction of the model for DHPs. 
+-->
+
+PA N-oxides are easily conjugated for excretion; they are generally considered
+detoxification products, which are quickly eliminated renally *in vivo*
+(@Chen2010).
+
+Overall, the low number of positive mutagenicity predictions was unexpected.
+PAs are generally considered to be genotoxic, and the mode of action is also known.
+Therefore, the fact that some models predict the majority of PAs as not
+mutagenic seems contradictory. To understand this result, the experimental
+basis of the training dataset has to be considered. The
+training dataset is based on the *Salmonella typhimurium* mutagenicity bioassay (Ames test). Some
+studies show mutagenicity of PAs in the Ames test (@Chen2010).
+In addition, @Rubiolo1992 examined several different PAs and several
+different extracts of PA-containing plants in the Ames test. They found that
+the Ames test was indeed able to detect mutagenicity of PAs but, in general,
+appeared to have a low sensitivity. The pre-incubation phase for metabolic
+activation of PAs by microsomal enzymes was the sensitivity-limiting step. This
+could very well mean that the low sensitivity of the Ames test for PAs is also reflected in the investigated models.
+
 <!--
 non-conflicting CIDs
 43040
@@ -798,8 +849,6 @@ R RF and SVM models favor very strongly non-mutagenic predictions (only {{pa.r.r
 
 It is interesting to note, that different implementations of the same algorithm show little accordance in their prediction (see e.g R-RF vs. Tensorflow-RF and LR-sgd vs. LR-scikit in Table 4 and @tbl:pa-summary).
 
-**TODO** **Verena, Philipp** habt ihr eine Erklaerung dafuer?
-
 @fig:tsne-mp2d and @fig:tsne-padel show the t-SNE of training data and pyrrolizidine alkaloids. In @fig:tsne-mp2d the PAs are located closely together at the outer border of the training set. In @fig:tsne-padel they are less clearly separated and spread over the space occupied by the training examples.
 
 This is probably the reason why CDK models predicted all instances and the MP2D model only {{pa.lazar.mp2d.all.n}} PAs. Predicting a large number of instances is however not the ultimate goal, we need accurate predictions and an unambiguous estimation of the applicability domain. With CDK descriptors *all* PAs are within the applicability domain of the training data, which is unlikely despite the size of the training set. MolPrint2D descriptors provide a clearer separation, which is also reflected in a better separation between high and low confidence predictions in `lazar` MP2D predictions as compared to `lazar` CDK predictions. Crossvalidation results with substantially higher accuracies for MP2D models than for CDK models also support this argument.
@@ -812,11 +861,31 @@ From a practical point we still have to face the question, how to choose model p
 Conclusions
 ===========
 
-A new public *Salmonella* mutagenicity training dataset with 8309 compounds was
-created and used it to train `lazar` and Tensorflow models with MolPrint2D
-and CDK descriptors.
+A new public *Salmonella* mutagenicity training dataset with {{cv.n}}
+experimental results was created and used to train `lazar` and Tensorflow
+models with MolPrint2D and CDK descriptors. All investigated algorithm and
+descriptor combinations showed accuracies comparable to the interlaboratory
+variability of the Ames test.
+
+Pyrrolizidine alkaloid predictions showed a clear separation between different
+classes of PAs, which was generally in accordance with the current
+toxicological knowledge about these compounds.  Some of the models, however,
+showed a substantially lower number of positive mutagenicity predictions, despite
+similar crossvalidation results, and we were unable to identify the reasons for
+this discrepancy within this investigation.
+
+Thus the practical question of how to choose model predictions in the absence of
+experimental data remains open. Tensorflow predictions do not include
+applicability domain estimations and the rationales for predictions cannot be
+traced by toxicologists.  Transparent models like `lazar` may have an advantage
+in this context, because they present rationales for predictions (similar
+compounds with experimental data) which can be accepted or rejected by
+toxicologists and provide validated applicability domain estimations. 
 
 <!---
+in a form that is understandable and criticiseable by toxicologists without a machine learning background.
+
+is available (we found two PAs in the training data, but this number is too low, to draw any general conclusions). Based on crossvalidation results and the arguments in favor of MolPrint2D descriptors we would put the highest trust in `lazar` MolPrint2D predictions, especially in high-confidence predictions. `lazar` predictions have a accuracy comparable to experimental variability (@Helma2018) for compounds within the applicability domain. But they should not be trusted blindly. For practical purposes it is important to study the rationales (i.e. neighbors and their experimental activities) for each prediction of relevance. A freely accessible GUI for this purpose has been implemented at https://lazar.in-silico.ch.
 The best performance was obtained with `lazar` models
 using MolPrint2D descriptors, with prediction accuracies
 ({{cv.lazar-high-confidence.acc_perc}}%) comparable to the interlaboratory variability