1 files changed, 116 insertions, 189 deletions
diff --git a/mutagenicity.md b/mutagenicity.md
index c80bdf1..1014cc1 100644
--- a/mutagenicity.md
+++ b/mutagenicity.md
@@ -49,7 +49,7 @@ Abstract
 ========
 
 Random forest, support vector machine, logistic regression, neural networks and
-k-nearest neighbor (`lazar`) algorithms, were applied to new *Salmonella*
+k-nearest neighbor (`lazar`) algorithms were applied to a new *Salmonella*
 mutagenicity dataset with {{cv.n_uniq}} unique chemical structures utilizing
 MolPrint2D and Chemistry Development Kit (CDK) descriptors.  Crossvalidation
 accuracies of all investigated models ranged from 80-85% which is comparable
@@ -62,7 +62,7 @@ Introduction
 ============
 
 The assessment of mutagenicity is an important part in the safety assessment of
-chemical structures, because genomic changes may lead to cancer and germ
+chemical structures, because mutations may lead to cancer and germ
 cells damage.  The *Salmonella typhimurium* bacterial reverse mutation
 test (Ames test) is capable to identify substances that cause mutations (e.g.,
 base-pair substitutions, frameshifts, insertions, deletions) and is generally
@@ -93,9 +93,9 @@ Within this study we attempted
   - to compare the performance of MolPrint2D (*MP2D*) fingerprints with Chemistry Development Kit (*CDK*) descriptors for mutagenicity predictions
   - to compare the performance of global QSAR models (random forests (*RF*), support vector machines (*SVM*), logistic regression (*LR*), neural nets (*NN*)) with local models (`lazar`)
 
-In order to highlight potentials and problems with the application of
-mutagenicity models to compounds with limited experimental data we decided to
-apply these mutagenicity models to {{pa.nr}} Pyrrolizidine alkaloids (PAs).
+To demonstrate the application of mutagenicity models to compounds with very
+limited experimental data and to show their strengths and weaknesses, we decided
+to apply them to {{pa.nr}} pyrrolizidine alkaloids (PAs).
 
 Pyrrolizidine alkaloids (PAs) are characteristic metabolites of some plant
 families, mainly: *Asteraceae*, *Boraginaceae*, *Fabaceae* and *Orchidaceae*
@@ -103,14 +103,28 @@ families, mainly: *Asteraceae*, *Boraginaceae*, *Fabaceae* and *Orchidaceae*
 herbivores. PAs are heterocyclic ester alkaloids composed of a necine base (two
 fused five-membered rings joined by a single nitrogen atom) and a necic acid
 (one or two carboxylic ester arms), occurring principally in two forms,
-tertiary base PAs and PA N-oxides. Several *in vitro* studies have shown the
-mutagenic potential of PAs, which seems highly dependent on structure of necine
+tertiary base PAs and PA N-oxides.
+
+In mammals, PAs are mainly metabolized in the liver. There are three principal
+metabolic pathways for 1,2-unsaturated PAs (@Chen2010):
+
+- Detoxification by hydrolysis of the ester bonds at positions C7 and C9 by non-specific esterases, which releases the necine base and the necic acid
+
+- Detoxification by N-oxidation of the necine base to PA N-oxides, which can either be conjugated by phase II enzymes and excreted or be converted back into the corresponding parent PA (following ref). This pathway is not possible for otonecine-type PAs, as they are N-methylated (see @fig:pa-schema, @Wang2005)
+
+- Metabolic activation (toxification) by oxidation (retronecine-type PAs) or oxidative N-demethylation (otonecine-type PAs) by the cytochrome P450 isoforms CYP2B and CYP3A (@Lin1998, @Ruan2014)
+
+Metabolic activation results in the formation of dehydropyrrolizidine (DHP), which is highly reactive and causes damage by forming adducts with proteins, lipids and DNA (@Chen2010).
+Open diesters and macrocyclic PAs, on the other hand, are detoxified less efficiently, because steric hindrance impairs their hydrolysis by the respective esterases (@Ruan2014).
+
+Therefore, the mutagenic probability of PAs depends strongly on the structure of the
+necine
 base and necic acid (@Hadi2021; @Allemang2018, @Louisse2019). However, due to
 limited availability of pure substances, only a limited number of PAs have been
-investigated with regards to their structure-specific mutagenicity. To overcome
-this bottleneck, the prediction of structure-specific mutagenic potential of
-PAs with different machine learning models could provide further inside in the
-mechanisms.
+investigated experimentally in the Ames test with regard to their
+structure-specific mutagenicity. To overcome this bottleneck, predicting the
+structure-specific mutagenic probabilities of PAs with different machine learning
+models could provide further insights into the mechanisms.
 
 Materials and Methods
 =====================
@@ -129,10 +143,17 @@ training dataset was compiled from the following sources:
 
 -   EFSA Dataset (695 compounds @EFSA2016): <https://data.europa.eu/euodp/data/storage/f/2017-0719T142131/GENOTOX%20data%20and%20dictionary.xls>
 
-Mutagenicity classifications from Kazius and Hansen datasets were used
-without further processing. To achieve consistency with these
-datasets, EFSA compounds were classified as mutagenic, if at least one
-positive result was found for TA98 or T100 Salmonella strains.
+Mutagenicity classifications from the Kazius and Hansen datasets were used without
+further processing. According to these publications, compounds were classified
+as mutagenic if at least one positive result was obtained in *Salmonella
+typhimurium* strains TA98, TA100, TA1535, TA1537, TA97, TA102 or TA1538, either
+with or without metabolic activation by S9. *E. coli* results were not
+considered in these databases. To achieve consistency with these datasets, EFSA
+compounds were classified as mutagenic if at least one positive result was
+found for *Salmonella* strains TA98 or TA100, either with or without metabolic
+activation. The complete dataset contains chemicals from very diverse
+application areas (e.g. pharmaceuticals, pesticides, industrial chemicals,
+environmental contaminants).
 
 Dataset merges were based on unique SMILES (*Simplified Molecular Input Line
 Entry Specification*, @Weininger1989) strings of the compound structures.
@@ -158,10 +179,14 @@ substances were searched individually in PubChem and, if available, downloaded
 separately.  Non-PA substances, duplicates, and isomers were removed from the
 files, but artificial PAs, even if unlikely to occur in nature, were kept. The
 resulting PA dataset comprised a total of {{pa.n}} different PAs.
+Further details about the compilation of the PA dataset are described in @Schoening2017.
+
 
 The PAs in the dataset were classified according to structural features. A
 total of 9 different structural features were assigned to the necine base,
-modifications of the necine base and to the necic acid:
+modifications of the necine base and to the necic acid (@fig:pa-schema):
+
+![Structural features of pyrrolizidine alkaloids](figures/PA-Schema.png){#fig:pa-schema}
 
 For the necine base, the following structural features were chosen:
 
@@ -172,8 +197,8 @@ For the necine base, the following structural features were chosen:
 For the modifications of the necine base, the following structural features were chosen:
 
   - N-oxide-type ({{pa.groups.N_oxide.n}} compounds)
+  - Dehydropyrrolizidine-type (DHP, pyrrolic ester, {{pa.groups.Dehydropyrrolizidine.n}} compounds)
   - Tertiary-type (PAs which were neither from the N-oxide- nor DHP-type, {{pa.groups.Tertiary_PA.n}} compounds)
-  - Dehydropyrrolizidine-type (pyrrolic ester, {{pa.groups.Dehydropyrrolizidine.n}} compounds)
 
 For the necic acid, the following structural features were chosen:
 
@@ -181,8 +206,6 @@ For the necic acid, the following structural features were chosen:
   - Open-ring diester-type ({{pa.groups.Diester.n}} compounds)
   - Macrocyclic diester-type ({{pa.groups.Macrocyclic_diester.n}} compounds)
 
-The compilation of the PA dataset is described in detail in @Schoening2017.
-
 Descriptors
 -----------
 
@@ -201,7 +224,8 @@ descriptors. In addition, they allow the efficient calculation of chemical
 similarities (e.g. Tanimoto indices) with simple set operations.
 
 MolPrint2D fingerprints were calculated with the OpenBabel cheminformatics
-library (@OBoyle2011a). They can be obtained from the following locations:
+library (@OBoyle2011a) for the complete training dataset with {{cv.n}}
+instances. They can be obtained from the following locations:
 
 *Training data:*
 
@@ -220,9 +244,9 @@ program (<http://www.yapcwsoft.com> version 2.21, @Yap2011). PaDEL uses the
 Chemistry Development Kit (*CDK*, <https://cdk.github.io/index.html>) library
 for descriptor calculations.
 
-As the training dataset contained {{cv.n_uniq}} instances, it was decided to
-delete instances with missing values during data pre-processing. Furthermore,
-substances with equivocal outcome were removed. The final training dataset
+As the training dataset was large ({{cv.n}} instances), we decided to delete all
+instances for which CDK descriptor calculations failed during pre-processing.
+Furthermore, all substances with contradictory experimental mutagenicity data were removed. The final training dataset
 contained {{cv.cdk.n_descriptors}} descriptors for {{cv.cdk.n_compounds}}
 compounds.
 
@@ -601,12 +625,16 @@ models ({{pa.mp2d_svm.mut_perc}}-{{pa.mp2d_lazar_high_confidence.mut_perc}}%,
 @tbl:pa-summary, @fig:pa-groups). 
 
 Over all models, the mean value of mutagenic predicted PAs was highest for
-otonecines ({{pa.groups.Otonecine.mut_perc}}%, 
-{{pa.groups.Otonecine.mut}}/{{pa.groups.Otonecine.n_pred}}),
-followed by macrocyclic diesters ({{pa.groups.Macrocyclic_diester.mut_perc}}%, {{pa.groups.Macrocyclic_diester.mut}}/{{pa.groups.Macrocyclic_diester.n_pred}}),
-dehydropyrrolizidines ({{pa.groups.Dehydropyrrolizidine.mut_perc}}%, {{pa.groups.Dehydropyrrolizidine.mut}}/{{pa.groups.Dehydropyrrolizidine.n_pred}}),
-tertiary PAs ({{pa.groups.Tertiary_PA.mut_perc}}%, {{pa.groups.Tertiary_PA.mut}}/{{pa.groups.Tertiary_PA.n_pred}}) and
-retronecines ({{pa.groups.Retronecine.mut_perc}}%, {{pa.groups.Retronecine.mut}}/{{pa.groups.Retronecine.n_pred}}).
+otonecines ({{pa.groups.Otonecine.mut_perc}}%,
+{{pa.groups.Otonecine.mut}}/{{pa.groups.Otonecine.n_pred}}), followed by
+macrocyclic diesters ({{pa.groups.Macrocyclic_diester.mut_perc}}%,
+{{pa.groups.Macrocyclic_diester.mut}}/{{pa.groups.Macrocyclic_diester.n_pred}}),
+dehydropyrrolizidines ({{pa.groups.Dehydropyrrolizidine.mut_perc}}%,
+{{pa.groups.Dehydropyrrolizidine.mut}}/{{pa.groups.Dehydropyrrolizidine.n_pred}}),
+tertiary PAs ({{pa.groups.Tertiary_PA.mut_perc}}%,
+{{pa.groups.Tertiary_PA.mut}}/{{pa.groups.Tertiary_PA.n_pred}}) and
+retronecines ({{pa.groups.Retronecine.mut_perc}}%,
+{{pa.groups.Retronecine.mut}}/{{pa.groups.Retronecine.n_pred}}).
 
 When excluding the aforementioned three deviating models,
 the rank order stays the same, but the percentage of mutagenic PAs is higher.
@@ -674,8 +702,8 @@ This allows a critical examination of individual predictions and prevents blind
 trust in models that are intransparent to users with a toxicological
 background.
 
-<!--
 ![`lazar` screenshot of 12,21-Dihydroxy-4-methyl-4,8-secosenecinonan-8,11,16-trione mutagenicity prediction](figures/lazar-screenshot.png){#fig:lazar}
+<!--
 -->
 
 Descriptors
@@ -725,25 +753,6 @@ Pyrrolizidine alkaloid mutagenicity predictions
 
 ### Algorithms and descriptors
 
-<!--
-The largest proportion of mutagenic predictions
-was observed for Otonecines {{pa.groups.Otonecine.mut_perc}}%
-({{pa.groups.Otonecine.mut}}/{{pa.groups.Otonecine.n_pred}}), the lowest for
-Monoesters {{pa.groups.Monoester.mut_perc}}%
-({{pa.groups.Monoester.mut}}/{{pa.groups.Monoester.n_pred}}) and N-Oxides
-{{pa.groups.N_oxide.mut_perc}}%
-({{pa.groups.N_oxide.mut}}/{{pa.groups.N_oxide.n_pred}}).
-
-Although most of the models show similar accuracies, sensitivities and
-specificities in crossvalidation experiments some of the models (MPD-RF, CDK-RF
-and CDK-SVM) predict a lower number of mutagens
-({{pa.cdk_rf.mut_perc}}-{{pa.mp2d_rf.mut_perc}}%) than the majority of the
-models ({{pa.mp2d_svm.mut_perc}}-{{pa.mp2d_lazar_high_confidence.mut_perc}}%
-(@fig:pa-groups). lazar-CDK on the other hand
-predicts the largest number of mutagens for all groups with exception of
-Otonecines.
--->
-
 @fig:pa-groups shows a clear differentiation between the different
 pyrrolizidine alkaloid groups.
 Nevertheless differences between predictions from different algorithms and descriptors
@@ -753,7 +762,7 @@ In order to investigate, if any of the investigated models show systematic
 errors in the  vicinity of pyrrolizidine-alkaloids we have performed a
 detailled t-SNE analysis of all models (see @fig:tsne-mp2d-rf and
 @fig:tsne-cdk-lazar-all for two examples, all visualisations can be found at
-<https://git.in-silico.ch/mutagenicity-paper/figures>.
+<https://git.in-silico.ch/mutagenicity-paper/figures>).
 
 None of the models showed obvious deviations from their expected
 behaviour, so the reason for the disagreement between some of the models
@@ -761,15 +770,24 @@ remains unclear at the moment.  It is however possible that some
 systematic errors are covered up by converting high dimensional spaces to two
 coordinates and are thus invisible in t-SNE visualisations.
 
+Only two compounds from the PA dataset (senecivernine and retronecine) are part
+of the training set. Both are non-mutagenic and were predicted as non-mutagenic
+by all models (both compounds were removed from the training set before
+prediction to avoid biased results). Despite this perfect agreement, two
+examples with the same outcome do not allow general conclusions about model
+performance.
+
 ### Necic acid
 
 The rank order of the necic acid is comparable in all models. PAs from the
-monoester type had the lowest genotoxic potential, followed by PAs from the
+monoester type had the lowest genotoxic probability, followed by PAs from the
 open-ring diester type. PAs with macrocyclic diesters had the highest genotoxic
-potential. The result fits well with current state of knowledge: in general,
-PAs, which have a macrocyclic diesters as necic acid, are considered to be more toxic
-than those with an open-ring diester or monoester (@EFSA2011, @Fu2004,
-Ruan2014b). This was also confirmed by more recent studies, confirming that
+probability. The result fits well with the current state of knowledge: in general,
+PAs with a macrocyclic diester as necic acid are considered to be more
+mutagenic than those with an open-ring diester or monoester (@EFSA2011,
+@Fu2004). As pointed out above, open diesters and macrocyclic PAs are detoxified
+less efficiently, because steric hindrance impairs their hydrolysis by esterases
+(@Ruan2014). More recent studies have also confirmed that
 macrocyclic- and open-diesters are more genotoxic *in vitro* than monoesters
 (@Hadi2021; @Allemang2018, @Louisse2019). 
 
@@ -777,17 +795,20 @@ macrocyclic- and open-diesters are more genotoxic *in vitro* than monoesters
 
 In the rank order of necine base PAs, platynecine is the least mutagenic, followed
 by retronecine, and otonecine. Saturated PAs of the platynecine-type are
-generally accepted to be less or non-toxic and have been shown in *in vitro*
+generally accepted to be less mutagenic or non-mutagenic and have been shown in *in vitro*
 experiments to form no DNA-adducts (@Xia2013). In literature,
-otonecine-type PAs were shown to be more toxic than those of the
+otonecine-type PAs were shown to be more mutagenic than those of the
 retronecine-type (@Li2013). 
 
 ### Modifications of necine base
 
 The group-specific results reflect the expected relationship between the
-groups: the low mutagenic potential of *N*-oxides and the high potential of
-dehydropyrrolizidines (DHP) (@Chen2010). 
-However, *N*-oxides may be *in vivo* converted back to their parent toxic/tumorigenic parent PA (@Yan2008),  on the other hand they are highly water soluble and generally considered as detoxification products, which are *in vivo* quickly renally eliminated (@Chen2010).
+groups: the low mutagenic probability of *N*-oxides and the high probability of
+dehydropyrrolizidines (DHP) (@Chen2010). However, *N*-oxides may be converted
+back *in vivo* to their mutagenic/tumorigenic parent PAs (@Yan2008). On the
+other hand, they are highly water soluble and generally considered to be
+detoxification products, which are quickly eliminated renally *in vivo*
+(@Chen2010).
 
 DHP are regarded as the toxic principle in the metabolism of
 PAs, and are known to produce protein- and DNA-adducts (@Chen2010). None of our investigated
@@ -800,107 +821,50 @@ training set. In addition, DHP has two unsaturated double bounds in its necine
 base, making it highly reactive. DHP and other comparable molecules have a very
 short lifespan *in vivo*, and usually cannot be used in *in vitro* experiments. 
 
-<!--
-Furthermore, the probabilities for this substance groups needs to be considered, and not only the consolidated prediction. In the LAZAR model, all DHPs had probabilities for both outcomes (genotoxic and not genotoxic) mainly below 30%. Additionally, the probabilities for both outcomes were close together, often within 10% of each other. The fact that for both outcomes, the probabilities were low and close together, indicates a lower confidence in the prediction of the model for DHPs. 
--->
-
-<!--
-PA N-oxides are easily conjugated for extraction, they are generally considered
-as detoxification products, which are *in vivo* quickly renally eliminated
-(@Chen2010).
--->
-
 Overall the low number of positive mutagenicity predictions was unexpected.
-PAs are generally considered to be genotoxic, and the mode of action is also known.
-Therefore, the fact that some models predict the majority of PAs as not
+PAs are generally considered to be genotoxic, and the mode of action is also
+known.  Therefore, the fact that some models predict the majority of PAs as not
 mutagenic seems contradictory. To understand this result, the experimental
-basis of the training dataset has to be considered. The 
-training dataset is based on the *Salmonella typhimurium* mutagenicity bioassay (Ames test). There are some
-studies, which show mutagenicity of PAs in the Ames test (@Chen2010).
-Also, @Rubiolo1992 examined several different PAs and several
-different extracts of PA-containing plants in the AMES test. They found that
-the Ames test was indeed able to detect mutagenicity of PAs, but in general,
-appeared to have a low sensitivity. The pre-incubation phase for metabolic
-activation of PAs by microsomal enzymes was the sensitivity-limiting step. This
-could very well mean that the low sensitivity of the Ames test for PAs is also reflected in the investigated models.
+basis of the training dataset has to be considered. The training dataset is
+based on the *Salmonella typhimurium* mutagenicity bioassay (Ames test). Some
+studies have shown mutagenicity of PAs in the Ames test (@Chen2010). In
+addition, @Rubiolo1992 examined several PAs and several extracts of
+PA-containing plants in the Ames test. They found that the Ames test was
+indeed able to detect the mutagenicity of PAs, but that its sensitivity was
+generally low. The pre-incubation phase for the metabolic activation of PAs by
+microsomal enzymes was the sensitivity-limiting step. The low sensitivity of
+the Ames test for PAs may therefore also be reflected in the investigated
+models.
 
+<!--
 A *in vitro* screen of cellular PA effects (metabolic activation and mutagenic
 effects) in human and rodent hepatocytes (HepG2 and H-4-II-E) showed that
 results may also critically depend on the cellular model and cell culture
 conditions and may underestimate the effects of PAs (@Forsch2018).
+-->
 
-In summary, we found marked differences in the predicted genotoxic potential
-between the PA groups: most toxic appeared the otonecines and macrocyclic
-diesters, least toxic the platynecines and the mono- and diesters. These
+In summary, we found marked differences in the predicted genotoxic probability
+between the PA groups: otonecines and macrocyclic diesters appeared most
+mutagenic; platynecines, monoesters and open-ring diesters least mutagenic. These
 results are comparable with *in vitro* measurements in hepatic HepaRG cells
 (@Louisse2019), where relative potencies (RP) were determined: for otonecines
 and cyclic diesters RP = 1, for open diesters RP = 0.1 and for monoesters RP =
 0.01.
 
-Due to a lack of
-differential data, European authorities based their risk assessment in a
-worst-case approach on lasiocarpine, for which sufficient data on genotoxicity
-and carcinogenicity were available (@HMPC2014, @EMA2020). Our data further support a tiered risk assessment
-based on *in silico* and experimental data on the relative potency of
-individual PAs as already suggested by other authors (@Merz2016, @Rutz2020, @Louisse2019). 
+Due to a lack of differential data, European authorities based their risk
+assessment in a worst-case approach on lasiocarpine, for which sufficient data
+on genotoxicity and carcinogenicity were available (@HMPC2014, @EMA2020). Our
+data further support a tiered risk assessment based on *in silico* and
+experimental data on the relative potency of individual PAs, as already
+suggested by other authors (@Merz2016, @Rutz2020, @Louisse2019).
 
-<!--
-non-conflicting CIDs
-43040
-186980
-187805
-610955
-3033169
-6429355
-10095536
-10251171
-10577975
-10838897
-10992912
-10996028
-11618501
-11827237
-11827238
-16687858
-73893122
-91747608
-91749688
-91751314
-91752877
-100979630
-100979631
-101648301
-102478913
-148322
-194088
-21626760
-91747610
-91747612
-91749428
-91749448
-102596226
-6440436
-4483893
-5315247
-46930232
-67189194
-91747354
-91749894
-101324794
-118701599
-
-R RF and SVM models favor very strongly non-mutagenic predictions (only {{pa.r.rf.mut_perc}} and {{pa.r.svm.mut_perc}} % mutagenic PAs), while Tensorflow models classify approximately half of the PAs as mutagenic (RF {{pa.tf.rf.mut_perc}}%, LR-sgd {{pa.tf.lr_sgd}}%, LR-scikit:{{pa.tf.lr_scikit.mut_perc}}, LR-NN:{{pa.tf.nn.mut_perc}}%). `lazar` models predict predominately non-mutagenicity, but to a lesser extend than R models (MP2D:{{pa.lazar.mp2d.all.mut_perc}}, CDK:{{pa.lazar.padel.all.mut_perc}}).
-
-It is interesting to note, that different implementations of the same algorithm show little accordance in their prediction (see e.g R-RF vs. Tensorflow-RF and LR-sgd vs. LR-scikit in Table 4 and @tbl:pa-summary).
-
-@fig:tsne-mp2d and @fig:tsne-padel show the t-SNE of training data and pyrrolizidine alkaloids. In @fig:tsne-mp2d the PAs are located closely together at the outer border of the training set. In @fig:tsne-padel they are less clearly separated and spread over the space occupied by the training examples.
-
-This is probably the reason why CDK models predicted all instances and the MP2D model only {{pa.lazar.mp2d.all.n}} PAs. Predicting a large number of instances is however not the ultimate goal, we need accurate predictions and an unambiguous estimation of the applicability domain. With CDK descriptors *all* PAs are within the applicability domain of the training data, which is unlikely despite the size of the training set. MolPrint2D descriptors provide a clearer separation, which is also reflected in a better separation between high and low confidence predictions in `lazar` MP2D predictions as compared to `lazar` CDK predictions. Crossvalidation results with substantially higher accuracies for MP2D models than for CDK models also support this argument.
-
-Differences between MP2D and CDK descriptors can be explained by their specific properties: CDK calculates a fixed set of descriptors for all structures, while MolPrint2D descriptors resemble substructures that are present in a compound. For this reason there is no fixed number of MP2D descriptors, the descriptor space are all unique substructures of the training set. If a query compound contains new substructures, this is immediately reflected in a lower similarity to training compounds, which makes applicability domain estimations very straightforward. With CDK (or any other predefined descriptors), the same set of descriptors is calculated for every compound, even if a compound comes from an completely new chemical class. 
-
-From a practical point we still have to face the question, how to choose model predictions, if no experimental data is available (we found two PAs in the training data, but this number is too low, to draw any general conclusions). Based on crossvalidation results and the arguments in favor of MolPrint2D descriptors we would put the highest trust in `lazar` MolPrint2D predictions, especially in high-confidence predictions. `lazar` predictions have a accuracy comparable to experimental variability (@Helma2018) for compounds within the applicability domain. But they should not be trusted blindly. For practical purposes it is important to study the rationales (i.e. neighbors and their experimental activities) for each prediction of relevance. A freely accessible GUI for this purpose has been implemented at https://lazar.in-silico.ch.
--->
+The practical question of how to choose between model predictions in the absence
+of experimental data remains open. Tensorflow predictions do not include
+applicability domain estimations, and the rationales for predictions cannot be
+traced by toxicologists. Transparent models like `lazar` may have an advantage
+in this context, because they present rationales for predictions (similar
+compounds with experimental data) that can be accepted or rejected by
+toxicologists, and they provide validated applicability domain estimations.
 
 Conclusions
 ===========
@@ -918,48 +882,11 @@ however a substantially lower number of mutagenicity predictions, despite
 similar crossvalidation results and we were unable to identify the reasons for
 this discrepancy within this investigation.
 
-Thus the practical question how to choose model predictions in the absence of
-experimental data remains open. Tensorflow predictions do not include
-applicability domain estimations and the rationales for predictions cannot be
-traced by toxicologists.  Transparent models like `lazar` may have an advantage
-in this context, because they present rationales for predictions (similar
-compounds with experimental data) which can be accepted or rejected by
-toxicologists and provide validated applicability domain estimations. 
-
-Our data show that large difference exist with regard to genotoxic potential between different pyrrolizidine subgroups. These results may allow to adjust risk assessment of pyrrolizidine contamination.
-
-<!---
-in a form that is understandable and criticiseable by toxicologists without a machine learning background.
-
-is available (we found two PAs in the training data, but this number is too low, to draw any general conclusions). Based on crossvalidation results and the arguments in favor of MolPrint2D descriptors we would put the highest trust in `lazar` MolPrint2D predictions, especially in high-confidence predictions. `lazar` predictions have a accuracy comparable to experimental variability (@Helma2018) for compounds within the applicability domain. But they should not be trusted blindly. For practical purposes it is important to study the rationales (i.e. neighbors and their experimental activities) for each prediction of relevance. A freely accessible GUI for this purpose has been implemented at https://lazar.in-silico.ch.
-The best performance was obtained with `lazar` models
-using MolPrint2D descriptors, with prediction accuracies
-({{cv.lazar-high-confidence.acc_perc}}%) comparable to the interlaboratory variability
-of the Ames test (80-85%). Models based on CDK descriptors had lower
-accuracies than MolPrint2D models, but only the `lazar` algorithm could use
-MolPrint2D descriptors.
-
-**TODO**: PA Vorhersagen
-
-In this study, an attempt was made to predict the genotoxic potential of
-PAs using five different machine learning techniques (LAZAR, RF, SVM, DL
-(R-project and Tensorflow). The results of all models fitted only partly
-to the findings in literature, with best results obtained with the
-Tensorflow DL model. Therefore, modelling allows statements on the
-relative risks of genotoxicity of the different PA groups. Individual
-predictions for selective PAs appear, however, not reliable on the
-current basis of the used training dataset.
-
-This study emphasises the importance of critical assessment of
-predictions by QSAR models. This includes not only extensive literature
-research to assess the plausibility of the predictions, but also a good
-knowledge of the metabolism of the test substances and understanding for
-possible mechanisms of toxicity.
-
-In further studies, additional machine learning techniques or a modified
-(extended) training dataset should be used for an additional attempt to
-predict the genotoxic potential of PAs.
---->
+Our data show that large differences exist with regard to genotoxic probabilities
+between different pyrrolizidine alkaloid subgroups. For the risk assessment of
+pyrrolizidine contamination, our data therefore support a tiered approach based
+on *in silico* and experimental data on the relative potency of individual
+pyrrolizidine alkaloids.
 
 References
 ==========