1 files changed, 61 insertions, 27 deletions
diff --git a/mutagenicity.md b/mutagenicity.md
index 9c5f427..c80bdf1 100644
--- a/mutagenicity.md
+++ b/mutagenicity.md
@@ -10,6 +10,8 @@ author:
       institute: insel
   - Jürgen Drewe:
       institute: zeller, unibas
+      email: juergendrewe@zellerag.ch
+      correspondence: "yes"
   - Philipp Boss:
       institute: sysbio
 
@@ -33,11 +35,12 @@ institute:
 bibliography: bibliography.bib
 keywords: mutagenicity, QSAR, lazar, random forest, support vector machine, linear regression, neural nets, deep learning, pyrrolizidine alkaloids, OpenBabel, CDK
 
+#documentclass: frontiersHLTH
 documentclass: scrartcl
 tblPrefix: Table
 figPrefix: Figure
 header-includes:
-    - \usepackage{lineno, setspace, color, colortbl, longtable}
+    - \usepackage{lineno, color, setspace}
     - \doublespacing
     - \linenumbers
 ...
@@ -69,13 +72,14 @@ Computer based (*in silico*) mutagenicity predictions can be used in the early
 screening of novel compounds (e.g. drug candidates), but they are also gaining
 regulatory acceptance e.g. for the registration of industrial chemicals within
 REACH (@ECHA2017) or the assessment of impurities in pharmaceuticals (ICH M7
-guideline, @ICH2017).
+guideline, Harmonisation of Technical Requirements for Pharmaceuticals for
+Human Use @ICH2017).
 
-*Salmonella* mutagenicity is at the moment the toxicological endpoint with the
+Currently, *Salmonella* mutagenicity is the toxicological endpoint with the
 largest amount of public data for almost 10000 structures, whereas datasets for
 other endpoints contain typically only a few hundred compounds. The Ames test
 itself is relatively reproducible with an interlaboratory variability of 80-85%
-(@Benigni1988).
+(@Piegorsch1991).
 
 This makes the development of mutagenicity models also interesting from a
 computational chemistry and machine learning point of view.  The relatively
@@ -148,8 +152,8 @@ under a GPL3 License. The new combined dataset can be found at
 The pyrrolizidine alkaloid dataset was created from five independent, necine
 base substructure searches in PubChem (https://pubchem.ncbi.nlm.nih.gov/) and
 compared to the PAs listed in the EFSA publication @EFSA2011 and the book by
-Mattocks @Mattocks1986, to ensure, that all major PAs were included. PAs
-mentioned in these publications which were not found in the downloaded
+@Mattocks1986, to ensure that all major PAs were included. PAs
+mentioned in these publications that were not found in the downloaded
 substances were searched individually in PubChem and, if available, downloaded
 separately.  Non-PA substances, duplicates, and isomers were removed from the
 files, but artificial PAs, even if unlikely to occur in nature, were kept. The
@@ -193,7 +197,7 @@ In contrast to predefined lists of fragments (e.g. FP3, FP4 or MACCs
 fingerprints) or descriptors (e.g CDK) they are generated dynamically from
 chemical structures. This has the advantage that they can capture unknown
 substructures of toxicological relevance that are not included in other
-descriptors. In addition they allow the efficient calculation of chemical
+descriptors. In addition, they allow the efficient calculation of chemical
 similarities (e.g. Tanimoto indices) with simple set operations.
 
 MolPrint2D fingerprints were calculated with the OpenBabel cheminformatics
@@ -297,7 +301,7 @@ absence of closely related neighbours, we follow a tiered approach:
     flagged with a warning that it might be out of the applicability domain of
     the training data (*low confidence*).
 
--   These Similarity thresholds are the default values chosen
+-   These similarity thresholds are the default values chosen
     by software developers and remained unchanged during the
     course of these experiments.
 
@@ -368,13 +372,13 @@ to a uniform distribution. MP2D features were not preprocessed.
 #### Random forests (*RF*)
 
 For the random forest classifier we used the parameters 
-n_estimators=1000and max_leaf_nodes=200. For the other parameters we 
+n_estimators=1000 and max_leaf_nodes=200. For the other parameters we 
 used the scikit-learn default values.
 
 #### Logistic regression (SGD) (*LR-sgd*)
 
 For the logistic regression we used an ensemble of five trained models. 
-For each model we used a batch size of 64 and trained for 50 epoch. As 
+For each model we used a batch size of 64 and trained for 50 epochs. As 
 an optimizer ADAM was chosen. For the other parameters we used the 
 tensorflow default values.
 
@@ -386,7 +390,7 @@ default values.
 #### Neural Nets (*NN*)
 
 For the neural network we used an ensemble of five trained models. For 
-each model we used a batch size of 64 and trained for 50 epoch. As an 
+each model we used a batch size of 64 and trained for 50 epochs. As an 
 optimizer ADAM was chosen. The neural network had 4 hidden layers with 
 64 nodes each and a ReLu activation function. For the other parameters 
 we used the tensorflow default values.
@@ -467,7 +471,7 @@ https://git.in-silico.ch/mutagenicity-paper/tree/crossvalidations/predictions/.
 All investigated algorithm/descriptor combinations
 give accuracies between (80 and 85%) which is equivalent to the experimental
 variability of the *Salmonella typhimurium* mutagenicity bioassay (80-85%,
-@Benigni1988). Sensitivities and specificities are balanced in all of
+@Piegorsch1991). Sensitivities and specificities are balanced in all of
 these models.
 
 Pyrrolizidine alkaloid mutagenicity predictions 
@@ -638,16 +642,16 @@ frequently *local models*, because models are generated specifically for each
 query compound. The investigated tensorflow models are in contrast *global
 models*, i.e. a single model is used to make predictions for all compounds. It
 has been postulated in the past, that local models are more accurate, because
-they can account better for mechanisms, that affect only a subset of the
+they can account better for mechanisms that affect only a subset of the
 training data.
 
 @tbl:cv-mp2d, @tbl:cv-cdk and @fig:roc show that the crossvalidation accuracies
 of all models are comparable to the experimental variability of the *Salmonella
-typhimurium* mutagenicity bioassay (80-85% according to @Benigni1988). All of
-these models have balanced sensitivity (true position rate) and specificity
+typhimurium* mutagenicity bioassay (80-85% according to @Piegorsch1991). All of
+these models have balanced sensitivity (true positive rate) and specificity
 (true negative rate) and provide highly significant concordance with
 experimental data (as determined by McNemar's Test). This is a clear indication
-that *in-silico* predictions can be as reliable as the bioassays. Given that
+that *in silico* predictions can be as reliable as the bioassays. Given that
 the variability of experimental data is similar to model variability it is
 impossible to decide which model gives the most accurate predictions, as models
 with higher accuracies might just approximate experimental errors better than
@@ -663,11 +667,16 @@ depend more on practical considerations than on intrinsic  properties. Nearest
 neighbor algorithms like `lazar` have the practical advantage that the
 rationales for individual predictions can be presented in a  straightforward
 manner that is understandable without a background in statistics or machine
-learning (@fig:lazar). This allows a critical examination of individual
-predictions and prevents blind trust in models that are intransparent to users
-with a toxicological background.
+learning (a screenshot of the mutagenicity prediction for
+12,21-Dihydroxy-4-methyl-4,8-secosenecinonan-8,11,16-trione can be found at
+https://git.in-silico.ch/mutagenicity-paper/tree/figures/lazar-screenshot.png).
+This allows a critical examination of individual predictions and prevents blind
+trust in models that are not transparent to users with a toxicological
+background.
 
-![Lazar screenshot of 12,21-Dihydroxy-4-methyl-4,8-secosenecinonan-8,11,16-trione mutagenicity prediction](figures/lazar-screenshot.png){#fig:lazar}
+<!--
+![`lazar` screenshot of 12,21-Dihydroxy-4-methyl-4,8-secosenecinonan-8,11,16-trione mutagenicity prediction](figures/lazar-screenshot.png){#fig:lazar}
+-->
 
 Descriptors
 -----------
@@ -776,27 +785,30 @@ retronecine-type (@Li2013). 
 ### Modifications of necine base
 
 The group-specific results reflect the expected relationship between the
-groups: the low mutagenic potential of N-oxides and the high potential of
-Dehydropyrrolizidines (DHP) (@Chen2010). 
+groups: the low mutagenic potential of *N*-oxides and the high potential of
+dehydropyrrolizidines (DHP) (@Chen2010). 
+However, *N*-oxides may be converted back *in vivo* to their toxic/tumorigenic parent PA (@Yan2008). On the other hand, they are highly water soluble and generally considered detoxification products that are quickly eliminated renally *in vivo* (@Chen2010).
 
-Dehydropyrrolizidines are regarded as the toxic principle in the metabolism of
-PAs, and known to produce protein- and DNA-adducts (@Chen2010). None of the
-models did not meet this expectation and predicted the majority of DHP as
+DHP are regarded as the toxic principle in the metabolism of
+PAs, and are known to produce protein- and DNA-adducts (@Chen2010). None of the investigated
+models met this expectation; all of them predicted the majority of DHP as
 non-mutagenic. However, the following issues need to be considered. On the one
-hand, all DHP were outside of the stricter applicability domain of MP2D lazar.
+hand, all DHP were outside of the stricter applicability domain of MP2D `lazar`.
 This indicates that they are structurally very different than the training data
 and might be out of the applicability domain of all models based on this
 training set. In addition, DHP has two unsaturated double bounds in its necine
 base, making it highly reactive. DHP and other comparable molecules have a very
-short lifespan, and usually cannot be used in *in vitro* experiments. 
+short lifespan *in vivo*, and usually cannot be used in *in vitro* experiments. 
 
 <!--
 Furthermore, the probabilities for this substance groups needs to be considered, and not only the consolidated prediction. In the LAZAR model, all DHPs had probabilities for both outcomes (genotoxic and not genotoxic) mainly below 30%. Additionally, the probabilities for both outcomes were close together, often within 10% of each other. The fact that for both outcomes, the probabilities were low and close together, indicates a lower confidence in the prediction of the model for DHPs. 
 -->
 
+<!--
 PA N-oxides are easily conjugated for extraction, they are generally considered
 as detoxification products, which are *in vivo* quickly renally eliminated
 (@Chen2010).
+-->
 
 Overall the low number of positive mutagenicity predictions was unexpected.
 PAs are generally considered to be genotoxic, and the mode of action is also known.
@@ -812,6 +824,26 @@ appeared to have a low sensitivity. The pre-incubation phase for metabolic
 activation of PAs by microsomal enzymes was the sensitivity-limiting step. This
 could very well mean that the low sensitivity of the Ames test for PAs is also reflected in the investigated models.
 
+An *in vitro* screen of cellular PA effects (metabolic activation and mutagenic
+effects) in human and rodent hepatocytes (HepG2 and H-4-II-E) showed that
+results may also critically depend on the cellular model and cell culture
+conditions and may underestimate the effects of PAs (@Forsch2018).
+
+In summary, we found marked differences in the predicted genotoxic potential
+between the PA groups: the otonecines and macrocyclic diesters appeared most
+toxic, the platynecines and the mono- and diesters least toxic. These
+results are comparable with *in vitro* measurements in hepatic HepaRG cells
+(@Louisse2019), where relative potencies (RP) were determined: for otonecines
+and cyclic diesters RP = 1, for open diesters RP = 0.1 and for monoesters RP =
+0.01.
+
+Due to a lack of
+differential data, European authorities based their risk assessment in a
+worst-case approach on lasiocarpine, for which sufficient data on genotoxicity
+and carcinogenicity were available (@HMPC2014, @EMA2020). Our data further support a tiered risk assessment
+based on *in silico* and experimental data on the relative potency of
+individual PAs as already suggested by other authors (@Merz2016, @Rutz2020, @Louisse2019). 
+
 <!--
 non-conflicting CIDs
 43040
@@ -894,6 +926,8 @@ in this context, because they present rationales for predictions (similar
 compounds with experimental data) which can be accepted or rejected by
 toxicologists and provide validated applicability domain estimations. 
 
+Our data show that large differences exist in genotoxic potential between different pyrrolizidine subgroups. These results may allow the risk assessment of pyrrolizidine contamination to be adjusted.
+
 <!---
 in a form that is understandable and criticiseable by toxicologists without a machine learning background.