From 7a6d28d9ad32198af1feb848352731ab2fd7e2f1 Mon Sep 17 00:00:00 2001 From: Christoph Helma Date: Fri, 25 Jun 2021 11:29:49 +0200 Subject: first revision --- mutagenicity.md | 305 +++++++++++++++++++++----------------------------------- 1 file changed, 116 insertions(+), 189 deletions(-) diff --git a/mutagenicity.md b/mutagenicity.md index c80bdf1..1014cc1 100644 --- a/mutagenicity.md +++ b/mutagenicity.md @@ -49,7 +49,7 @@ Abstract ======== Random forest, support vector machine, logistic regression, neural networks and -k-nearest neighbor (`lazar`) algorithms, were applied to new *Salmonella* +k-nearest neighbor (`lazar`) algorithms, were applied to a new *Salmonella* mutagenicity dataset with {{cv.n_uniq}} unique chemical structures utilizing MolPrint2D and Chemistry Development Kit (CDK) descriptors. Crossvalidation accuracies of all investigated models ranged from 80-85% which is comparable @@ -62,7 +62,7 @@ Introduction ============ The assessment of mutagenicity is an important part in the safety assessment of -chemical structures, because genomic changes may lead to cancer and germ +chemical structures, because mutations may lead to cancer and germ cells damage. 
The *Salmonella typhimurium* bacterial reverse mutation test (Ames test) is capable to identify substances that cause mutations (e.g., base-pair substitutions, frameshifts, insertions, deletions) and is generally @@ -93,9 +93,9 @@ Within this study we attempted - to compare the performance of MolPrint2D (*MP2D*) fingerprints with Chemistry Development Kit (*CDK*) descriptors for mutagenicity predictions - to compare the performance of global QSAR models (random forests (*RF*), support vector machines (*SVM*), logistic regression (*LR*), neural nets (*NN*)) with local models (`lazar`) -In order to highlight potentials and problems with the application of -mutagenicity models to compounds with limited experimental data we decided to -apply these mutagenicity models to {{pa.nr}} Pyrrolizidine alkaloids (PAs). +To demonstrate the application of mutagenicity models to compounds with very +limited experimental data and to show their strengths and weaknesses, we decided +to apply them to {{pa.nr}} Pyrrolizidine alkaloids (PAs). Pyrrolizidine alkaloids (PAs) are characteristic metabolites of some plant families, mainly: *Asteraceae*, *Boraginaceae*, *Fabaceae* and *Orchidaceae* @@ -103,14 +103,28 @@ families, mainly: *Asteraceae*, *Boraginaceae*, *Fabaceae* and *Orchidaceae* herbivores. PAs are heterocyclic ester alkaloids composed of a necine base (two fused five-membered rings joined by a single nitrogen atom) and a necic acid (one or two carboxylic ester arms), occurring principally in two forms, -tertiary base PAs and PA N-oxides. Several *in vitro* studies have shown the -mutagenic potential of PAs, which seems highly dependent on structure of necine +tertiary base PAs and PA N-oxides. + +In mammals, PAs are mainly metabolized in the liver. 
There are three principal metabolic pathways for 1,2-unsaturated PAs (@Chen2010):  + +Detoxification by  + +- hydrolysis of the ester bond on positions C7 and C9 by non-specific esterases to release necine base and necic acid  + +- N-oxidation of the necine base to form PA N-oxides, which can be either conjugated by phase II enzymes and then excreted or converted back into the corresponding parent PA (following ref). This detoxification pathway is not possible for otonecine-type PAs, as they are N-methylated (see @fig:pa-schema, @Wang2005). + +- Metabolic activation or toxification by oxidation (for retronecine-type PAs) or oxidative N-demethylation (for otonecine-type PAs) by cytochrome P450 isoforms CYP2B and 3A (@Lin1998, @Ruan2014) + +The latter reactions result in the formation of dehydropyrrolizidine (DHP), which is highly reactive and causes damage by forming adducts with proteins, lipids and DNA (@Chen2010). On the other hand, open diesters and macrocyclic PAs have a reduced detoxification due to steric hindrance of the respective esterases (@Ruan2014). + +Therefore the +mutagenic probability of PAs is highly dependent on the structure of necine base and necic acid (@Hadi2021; @Allemang2018, @Louisse2019). However, due to limited availability of pure substances, only a limited number of PAs have been -investigated with regards to their structure-specific mutagenicity. To overcome -this bottleneck, the prediction of structure-specific mutagenic potential of -PAs with different machine learning models could provide further inside in the -mechanisms. +investigated with regard to their structure-specific mutagenicity and tested +experimentally in an Ames test. To overcome this bottleneck, the prediction of +structure-specific mutagenic probabilities of PAs with different machine learning +models could provide further insights into the mechanisms. 
Materials and Methods ===================== @@ -129,10 +143,17 @@ training dataset was compiled from the following sources: - EFSA Dataset (695 compounds @EFSA2016): -Mutagenicity classifications from Kazius and Hansen datasets were used -without further processing. To achieve consistency with these -datasets, EFSA compounds were classified as mutagenic, if at least one -positive result was found for TA98 or T100 Salmonella strains. +Mutagenicity classifications from Kazius and Hansen datasets were used without +further processing. According to these publications, compounds were classified +as mutagenic if at least one positive result was obtained in *Salmonella +typhimurium* strains TA98, TA100, TA1535, TA1537, TA97, TA102 and TA1538 either +with or without metabolic activation by S9. *E. coli* results were not +considered in these databases. To achieve consistency with these datasets, EFSA +compounds were classified as mutagenic if at least one positive result was +found for TA98 or TA100 *Salmonella* strains either with or without metabolic +activation. The complete dataset contains chemicals from very diverse +application areas (e.g. pharmaceuticals, pesticides, industrial chemicals, +environmental contaminants). Dataset merges were based on unique SMILES (*Simplified Molecular Input Line Entry Specification*, @Weininger1989) strings of the compound structures. @@ -158,10 +179,14 @@ substances were searched individually in PubChem and, if available, downloaded separately. Non-PA substances, duplicates, and isomers were removed from the files, but artificial PAs, even if unlikely to occur in nature, were kept. The resulting PA dataset comprised a total of {{pa.n}} different PAs. +Further details about the compilation of the PA dataset are described in @Schoening2017. + The PAs in the dataset were classified according to structural features. 
A total of 9 different structural features were assigned to the necine base, -modifications of the necine base and to the necic acid: +modifications of the necine base and to the necic acid (@fig:pa-schema): + +![Structural features of pyrrolizidine alkaloids](figures/PA-Schema.png){#fig:pa-schema} For the necine base, the following structural features were chosen: @@ -172,8 +197,8 @@ For the necine base, the following structural features were chosen: For the modifications of the necine base, the following structural features were chosen: - N-oxide-type ({{pa.groups.N_oxide.n}} compounds) + - Dehydropyrrolizidine-type (DHP, pyrrolic ester, {{pa.groups.Dehydropyrrolizidine.n}} compounds) - Tertiary-type (PAs which were neither from the N-oxide- nor DHP-type, {{pa.groups.Tertiary_PA.n}} compounds) - - Dehydropyrrolizidine-type (pyrrolic ester, {{pa.groups.Dehydropyrrolizidine.n}} compounds) For the necic acid, the following structural features were chosen: @@ -181,8 +206,6 @@ For the necic acid, the following structural features were chosen: - Open-ring diester-type ({{pa.groups.Diester.n}} compounds) - Macrocyclic diester-type ({{pa.groups.Macrocyclic_diester.n}} compounds) -The compilation of the PA dataset is described in detail in @Schoening2017. - Descriptors ----------- @@ -201,7 +224,8 @@ descriptors. In addition, they allow the efficient calculation of chemical similarities (e.g. Tanimoto indices) with simple set operations. MolPrint2D fingerprints were calculated with the OpenBabel cheminformatics -library (@OBoyle2011a). They can be obtained from the following locations: +library (@OBoyle2011a) for the complete training dataset with {{cv.n}} +instances. They can be obtained from the following locations: *Training data:* @@ -220,9 +244,9 @@ program ( version 2.21, @Yap2011). PaDEL uses the Chemistry Development Kit (*CDK*, ) library for descriptor calculations. 
-As the training dataset contained {{cv.n_uniq}} instances, it was decided to -delete instances with missing values during data pre-processing. Furthermore, -substances with equivocal outcome were removed. The final training dataset +As the training dataset contained {{cv.n}} instances, it was decided to +delete all instances where CDK descriptor calculations failed during pre-processing. Furthermore, +all substances with contradictory experimental mutagenicity data were removed. The final training dataset contained {{cv.cdk.n_descriptors}} descriptors for {{cv.cdk.n_compounds}} compounds. @@ -601,12 +625,16 @@ models ({{pa.mp2d_svm.mut_perc}}-{{pa.mp2d_lazar_high_confidence.mut_perc}}%, @tbl:pa-summary, @fig:pa-groups).  Over all models, the mean value of mutagenic predicted PAs was highest for -otonecines ({{pa.groups.Otonecine.mut_perc}}%, -{{pa.groups.Otonecine.mut}}/{{pa.groups.Otonecine.n_pred}}), -followed by macrocyclic diesters ({{pa.groups.Macrocyclic_diester.mut_perc}}%, {{pa.groups.Macrocyclic_diester.mut}}/{{pa.groups.Macrocyclic_diester.n_pred}}), -dehydropyrrolizidines ({{pa.groups.Dehydropyrrolizidine.mut_perc}}%, {{pa.groups.Dehydropyrrolizidine.mut}}/{{pa.groups.Dehydropyrrolizidine.n_pred}}), -tertiary PAs ({{pa.groups.Tertiary_PA.mut_perc}}%, {{pa.groups.Tertiary_PA.mut}}/{{pa.groups.Tertiary_PA.n_pred}}) and -retronecines ({{pa.groups.Retronecine.mut_perc}}%, {{pa.groups.Retronecine.mut}}/{{pa.groups.Retronecine.n_pred}}). 
+otonecines ({{pa.groups.Otonecine.mut_perc}}%, +{{pa.groups.Otonecine.mut}}/{{pa.groups.Otonecine.n_pred}}), followed by +macrocyclic diesters ({{pa.groups.Macrocyclic_diester.mut_perc}}%, +{{pa.groups.Macrocyclic_diester.mut}}/{{pa.groups.Macrocyclic_diester.n_pred}}), +dehydropyrrolizidines ({{pa.groups.Dehydropyrrolizidine.mut_perc}}%, +{{pa.groups.Dehydropyrrolizidine.mut}}/{{pa.groups.Dehydropyrrolizidine.n_pred}}), +tertiary PAs ({{pa.groups.Tertiary_PA.mut_perc}}%, +{{pa.groups.Tertiary_PA.mut}}/{{pa.groups.Tertiary_PA.n_pred}}) and +retronecines ({{pa.groups.Retronecine.mut_perc}}%, +{{pa.groups.Retronecine.mut}}/{{pa.groups.Retronecine.n_pred}}). When excluding the aforementioned three deviating models, the rank order stays the same, but the percentage of mutagenic PAs is higher. @@ -674,8 +702,8 @@ This allows a critical examination of individual predictions and prevents blind trust in models that are intransparent to users with a toxicological background. - Descriptors @@ -725,25 +753,6 @@ Pyrrolizidine alkaloid mutagenicity predictions ### Algorithms and descriptors - - @fig:pa-groups shows a clear differentiation between the different pyrrolizidine alkaloid groups. Nevertheless differences between predictions from different algorithms and descriptors @@ -753,7 +762,7 @@ In order to investigate, if any of the investigated models show systematic errors in the vicinity of pyrrolizidine-alkaloids we have performed a detailled t-SNE analysis of all models (see @fig:tsne-mp2d-rf and @fig:tsne-cdk-lazar-all for two examples, all visualisations can be found at -. +). None of the models showed obvious deviations from their expected behaviour, so the reason for the disagreement between some of the models @@ -761,15 +770,24 @@ remains unclear at the moment. It is however possible that some systematic errors are covered up by converting high dimensional spaces to two coordinates and are thus invisible in t-SNE visualisations. 
+Only two compounds from the PA dataset (Senecivernine and Retronecine) are part +of the training set. Both are non-mutagenic and were predicted as non-mutagenic +by all models (both instances were removed from the training set to obtain unbiased +predictions). Despite the exact concordance, we cannot draw any general +conclusions about model performance based on two examples with a single +outcome. + ### Necic acid The rank order of the necic acid is comparable in all models. PAs from the -monoester type had the lowest genotoxic potential, followed by PAs from the +monoester type had the lowest genotoxic probability, followed by PAs from the open-ring diester type. PAs with macrocyclic diesters had the highest genotoxic -potential. The result fits well with current state of knowledge: in general, -PAs, which have a macrocyclic diesters as necic acid, are considered to be more toxic -than those with an open-ring diester or monoester (@EFSA2011, @Fu2004, -Ruan2014b). This was also confirmed by more recent studies, confirming that +probability. The result fits well with the current state of knowledge: in general, +PAs with a macrocyclic diester as necic acid are considered to be more +mutagenic than those with an open-ring diester or monoester (@EFSA2011, +@Fu2004). As pointed out above, open diesters and macrocyclic PAs have a +reduced detoxification due to steric hindrance of the respective esterases +(@Ruan2014). This was also confirmed by more recent studies, which showed that macrocyclic- and open-diesters are more genotoxic *in vitro* than monoesters (@Hadi2021; @Allemang2018, @Louisse2019).  @@ -777,17 +795,20 @@ macrocyclic- and open-diesters are more genotoxic *in vitro* than monoesters In the rank order of necine base PAs, platynecine is the least mutagenic, followed by retronecine, and otonecine. 
Saturated PAs of the platynecine-type are -generally accepted to be less or non-toxic and have been shown in *in vitro* +generally accepted to be less mutagenic or non-mutagenic and have been shown in *in vitro* experiments to form no DNA-adducts (@Xia2013). In literature, -otonecine-type PAs were shown to be more toxic than those of the +otonecine-type PAs were shown to be more mutagenic than those of the retronecine-type (@Li2013).  ### Modifications of necine base The group-specific results reflect the expected relationship between the -groups: the low mutagenic potential of *N*-oxides and the high potential of -dehydropyrrolizidines (DHP) (@Chen2010).  -However, *N*-oxides may be *in vivo* converted back to their parent toxic/tumorigenic parent PA (@Yan2008),  on the other hand they are highly water soluble and generally considered as detoxification products, which are *in vivo* quickly renally eliminated (@Chen2010). +groups: the low mutagenic probability of *N*-oxides and the high probability of +dehydropyrrolizidines (DHP) (@Chen2010).  However, *N*-oxides may be *in vivo* +converted back to their mutagenic/tumorigenic parent PA (@Yan2008); on the +other hand, they are highly water soluble and generally considered to be +detoxification products, which are quickly eliminated renally *in vivo* +(@Chen2010). DHP are regarded as the toxic principle in the metabolism of PAs, and are known to produce protein- and DNA-adducts (@Chen2010). None of our investigated @@ -800,107 +821,50 @@ training set. In addition, DHP has two unsaturated double bounds in its necine base, making it highly reactive. DHP and other comparable molecules have a very short lifespan *in vivo*, and usually cannot be used in *in vitro* experiments. - - - - Overall the low number of positive mutagenicity predictions was unexpected. -PAs are generally considered to be genotoxic, and the mode of action is also known. 
-Therefore, the fact that some models predict the majority of PAs as not +PAs are generally considered to be genotoxic, and the mode of action is also +known. Therefore, the fact that some models predict the majority of PAs as not mutagenic seems contradictory. To understand this result, the experimental -basis of the training dataset has to be considered. The -training dataset is based on the *Salmonella typhimurium* mutagenicity bioassay (Ames test). There are some -studies, which show mutagenicity of PAs in the Ames test (@Chen2010). -Also, @Rubiolo1992 examined several different PAs and several -different extracts of PA-containing plants in the AMES test. They found that -the Ames test was indeed able to detect mutagenicity of PAs, but in general, -appeared to have a low sensitivity. The pre-incubation phase for metabolic -activation of PAs by microsomal enzymes was the sensitivity-limiting step. This -could very well mean that the low sensitivity of the Ames test for PAs is also reflected in the investigated models. +basis of the training dataset has to be considered. The training dataset is +based on the *Salmonella typhimurium* mutagenicity bioassay (Ames test). There +are some studies, which show mutagenicity of PAs in the Ames test (@Chen2010). +Also, @Rubiolo1992 examined several different PAs and several different +extracts of PA-containing plants in the Ames test. They found that the Ames +test was indeed able to detect mutagenicity of PAs, but in general, appeared to +have a low sensitivity. The pre-incubation phase for metabolic activation of +PAs by microsomal enzymes was the sensitivity-limiting step. This could very +well mean that the low sensitivity of the Ames test for PAs is also reflected +in the investigated models. + -In summary, we found marked differences in the predicted genotoxic potential -between the PA groups: most toxic appeared the otonecines and macrocyclic -diesters, least toxic the platynecines and the mono- and diesters. 
These +In summary, we found marked differences in the predicted genotoxic probability +between the PA groups: most mutagenic appeared the otonecines and macrocyclic +diesters, least mutagenic the platynecines and the mono- and diesters. These results are comparable with *in vitro* measurements in hepatic HepaRG cells (@Louisse2019), where relative potencies (RP) were determined: for otonecines and cyclic diesters RP = 1, for open diesters RP = 0.1 and for monoesters RP = 0.01. -Due to a lack of -differential data, European authorities based their risk assessment in a -worst-case approach on lasiocarpine, for which sufficient data on genotoxicity -and carcinogenicity were available (@HMPC2014, @EMA2020). Our data further support a tiered risk assessment -based on *in silico* and experimental data on the relative potency of -individual PAs as already suggested by other authors (@Merz2016, @Rutz2020, @Louisse2019).  +Due to a lack of differential data, European authorities based their risk +assessment in a worst-case approach on lasiocarpine, for which sufficient data +on genotoxicity and carcinogenicity were available (@HMPC2014, @EMA2020). Our +data further support a tiered risk assessment based on *in silico* and +experimental data on the relative potency of individual PAs as already +suggested by other authors (@Merz2016, @Rutz2020, @Louisse2019).  - +The practical question how to choose model predictions in the absence of +experimental data remains open. Tensorflow predictions do not include +applicability domain estimations and the rationales for predictions cannot be +traced by toxicologists. Transparent models like `lazar` may have an advantage +in this context, because they present rationales for predictions (similar +compounds with experimental data) which can be accepted or rejected by +toxicologists and provide validated applicability domain estimations. 
Conclusions =========== @@ -918,48 +882,11 @@ however a substantially lower number of mutagenicity predictions, despite similar crossvalidation results and we were unable to identify the reasons for this discrepancy within this investigation. -Thus the practical question how to choose model predictions in the absence of -experimental data remains open. Tensorflow predictions do not include -applicability domain estimations and the rationales for predictions cannot be -traced by toxicologists. Transparent models like `lazar` may have an advantage -in this context, because they present rationales for predictions (similar -compounds with experimental data) which can be accepted or rejected by -toxicologists and provide validated applicability domain estimations. - -Our data show that large difference exist with regard to genotoxic potential between different pyrrolizidine subgroups. These results may allow to adjust risk assessment of pyrrolizidine contamination. - - +Our data show that large differences exist with regard to genotoxic probabilities +between different pyrrolizidine subgroups. To adjust the risk assessment of +pyrrolizidine contamination, our data support a tiered risk assessment based +on *in silico* and experimental data on the relative potency of individual +pyrrolizidine alkaloids. References ========== -- cgit v1.2.3