author | Christoph Helma <helma@in-silico.ch> | 2021-06-30 12:50:00 +0200 |
---|---|---|
committer | Christoph Helma <helma@in-silico.ch> | 2022-08-18 12:51:45 +0200 |
commit | 66543ccf5724f55e39775a1650b7b36381ae5ea9 | |
tree | f4b40e1774cdc368143b00e60939c12642bdc6f2 /mutagenicity.md | |
parent | 1f956a4963f62c90475ac8e1f713b989b5a99b36 |
Diffstat (limited to 'mutagenicity.md')
-rw-r--r-- | mutagenicity.md | 73 |
1 files changed, 38 insertions, 35 deletions
diff --git a/mutagenicity.md b/mutagenicity.md
index eb0ce3c..98e25a5 100644
--- a/mutagenicity.md
+++ b/mutagenicity.md
@@ -62,11 +62,11 @@ Introduction
 ============

 The assessment of mutagenicity is an important part in the safety assessment of
-chemical structures, because mutations may lead to cancer and germ
-cells damage. The *Salmonella typhimurium* bacterial reverse mutation
-test (Ames test) is capable to identify substances that cause mutations (e.g.,
-base-pair substitutions, frameshifts, insertions, deletions) and is generally
-used as the first step in genotoxicity and carcinogenicity assessments.
+chemical structures, because mutations may lead to cancer and germ cells
+damage. The bacterial reverse mutation test (Ames test) is capable to identify
+substances that cause mutations (e.g., base-pair substitutions, frameshifts,
+insertions, deletions) and is generally used as the first step in genotoxicity
+and carcinogenicity assessments.

 Computer based (*in silico*) mutagenicity predictions can be used in the early
 screening of novel compounds (e.g. drug candidates), but they are also gaining
@@ -75,7 +75,7 @@ REACH (@ECHA2017) or the assessment of impurities in pharmaceuticals (ICH M7
 guideline, Harmonisation of Technical Requirements for Pharmaceuticals for Human
 Use @ICH2017).

-Currently, *Salmonella* mutagenicity is the toxicological endpoint with the
+Currently, mutagenicity is the toxicological endpoint with the
 largest amount of public data for almost 10000 structures, whereas datasets for
 other endpoints contain typically only a few hundred compounds. The Ames test
 itself is relatively reproducible with an interlaboratory variability of 80-85%
@@ -89,7 +89,7 @@ of overfitting experimental errors.

 Within this study we attempted

- - to generate a new public mutagenicity training dataset, by combining the most comprehensive public datasets
+ - to generate a new public mutagenicity training dataset focusing on *Salmonella typhimurium*, by combining the most comprehensive public datasets
  - to compare the performance of MolPrint2D (*MP2D*) fingerprints with Chemistry Development Kit (*CDK*) descriptors for mutagenicity predictions
  - to compare the performance of global QSAR models (random forests (*RF*), support vector machines (*SVM*), logistic regression (*LR*), neural nets (*NN*)) with local models (`lazar`)

@@ -109,20 +109,23 @@ In mammals, PAs are mainly metabolized in the liver. There are three principal m

 - Detoxification by hydrolysis of the ester bond on positions C7 and C9 by non-specific esterases to release necine base and necic acid.

-- N-oxidation of the necine base to form a PA N-oxides, which can be either conjugated by phase II enzymes and then excreted or converted back into the corresponding parent PA (@Wang2005). This detoxification pathway is not possible for otonecine-type PAs, as they are N-methylated (see @fig:pa-schema).
+- N-oxidation of the necine base to form PA N-oxides, which can be either conjugated by phase II enzymes and then excreted or converted back into the corresponding parent PA (@Wang2005). This detoxification pathway is not possible for otonecine-type PAs, as they are N-methylated (see @fig:pa-schema).

 - Metabolic activation or toxification by oxidation (for retronecine-type PAs) or oxidative N-demethylation (for otonecine-type Pas) by cytochromes P450 isoforms CYP2B and 3A (@Lin1998, @Ruan2014).

-The latter reactions result in the formation of dehydropyrrolizidine (DHP) that is highly reactive and causes damage by building adducts with protein, lipids and DNA (@Chen2010). On the other hand, open diesters and macrocyclic PAs have a reduced detoxification due to steric hinderance of the respective esterases (@Ruan2014)
+The latter reactions result in the formation of dehydropyrrolizidine (DHP) that
+is highly reactive and causes damage by building adducts with protein, lipids
+and DNA (@Chen2010). On the other hand, open diesters and macrocyclic PAs have
+a reduced detoxification due to steric hinderance of the respective esterases
+(@Ruan2014).

-Therefore the
-mutagenic probability of PAs is highly dependent on structure of necine
-base and necic acid (@Hadi2021; @Allemang2018, @Louisse2019). However, due to
-limited availability of pure substances, only a limited number of PAs have been
-investigated with regards to their structure-specific mutagenicity and
-experimentally in an Ames test. To overcome this bottleneck, the prediction of
-structure-specific mutagenic probabilities of PAs with different machine learning
-models could provide further insights in the mechanisms.
+Therefore, the mutagenic probability of PAs is highly dependent on the
+structure of necine base and necic acid (@Hadi2021; @Allemang2018,
+@Louisse2019). However, due to limited availability of pure substances, only a
+small number of PAs have been investigated experimentally in an Ames test. To
+overcome this bottleneck, the application of different machine learning models
+to predict mutagenic probabilities based on structures and properties
+could provide further insights into the mutagenicity mechanisms of PAs.

 Materials and Methods
 =====================
@@ -142,16 +145,16 @@ training dataset was compiled from the following sources:
 - EFSA Dataset (695 compounds @EFSA2016): <https://data.europa.eu/euodp/data/storage/f/2017-0719T142131/GENOTOX%20data%20and%20dictionary.xls>

 Mutagenicity classifications from Kazius and Hansen datasets were used without
-further processing. According to these publications compounds were classified
-as mutagenic, if at least one positive result has been obtained in *Salmonella
-typhimurium* strains TA98, TA100, TA1535, TA1537, TA97, TA102 and 1538 either
+further processing. According to these publications, compounds were classified
+as mutagenic if at least one positive result has been obtained in *Salmonella
+typhimurium* strains TA97, TA98, TA100, TA102, TA1535, TA1537 and TA1538 either
 with or without metabolic activation by S9. *E. coli* results were not
 considered in these databases. To achieve consistency with these datasets, EFSA
 compounds were classified as mutagenic, if at least one positive result was
-found for TA98 or T100 Salmonella strains either with or without metabolic
-activation. The complete dataset contains chemicals for very diverse
-application areas (e.g. pharmaceuticals, pesticides, industrial chemicals,
-environmental contaminants).
+found for the same *Salmonella* strains either with or without metabolic
+activation and as non-mutagenic if no positive result was found. The complete
+dataset contains chemicals from very diverse application areas (e.g.
+pharmaceuticals, pesticides, industrial chemicals, environmental contaminants).

 Dataset merges were based on unique SMILES (*Simplified Molecular Input Line
 Entry Specification*, @Weininger1989) strings of the compound structures.
@@ -159,7 +162,7 @@ Duplicated experimental data with the same outcome was merged into a single
 value, because it is likely that it originated from the same experiment.
 Contradictory results were kept as multiple measurements in the database. The
 combined training dataset contains {{cv.n_uniq}} unique structures and {{cv.n}}
-individual measurements.
+individual measurements. Contradictory results were found for {{cv.n_mult}} substances.

 Source code for all data download, extraction and merge operations is publicly
 available from the git repository <https://git.in-silico.ch/mutagenicity-paper>
@@ -170,7 +173,7 @@ under a GPL3 License. The new combined dataset can be found at

 The pyrrolizidine alkaloid dataset was created from five independent, necine
 base substructure searches in PubChem (https://pubchem.ncbi.nlm.nih.gov/) and
-compared to the PAs listed in the EFSA publication @EFSA2011 and the book by
+compared to the PAs listed in @EFSA2011 and the book by
 @Mattocks1986, to ensure, that all major PAs were included. PAs mentioned in
 these publications, which were not found in the downloaded substances were
 searched individually in PubChem and, if available, downloaded
@@ -182,7 +185,7 @@ Further details about the compilation of the PA dataset are described in @Schoen

 The PAs in the dataset were classified according to structural features. A
 total of 9 different structural features were assigned to the necine base,
-modifications of the necine base and to the necic acid (@fig:pa-schema):
+to modifications of the necine base and to the necic acid (@fig:pa-schema):

 ![Structural features of pyrrolizidine alkaloids](figures/PA-Schema.png){#fig:pa-schema}

@@ -222,8 +225,8 @@ descriptors. In addition, they allow the efficient calculation of chemical
 similarities (e.g. Tanimoto indices) with simple set operations.

 MolPrint2D fingerprints were calculated with the OpenBabel cheminformatics
-library (@OBoyle2011a) for the complete training dataset with {{cv.n}}
-instances. They can be obtained from the following locations:
+library (@OBoyle2011a) for the complete training dataset with {{cv.n_uniq}}
+unique structures. They can be obtained from the following locations:

 *Training data:*

@@ -244,7 +247,7 @@ for descriptor calculations.

 As the training dataset contained {{cv.n}} instances, it was decided to delete
 all instances where CDK descriptor calculations failed during pre-processing. Furthermore,
-all substances with contradictory experimental mutagenicity data were removed. The final training dataset
+{{cv.n_mult}} substances with contradictory experimental results were removed. The final training dataset
 contained {{cv.cdk.n_descriptors}} descriptors for {{cv.cdk.n_compounds}}
 compounds.

@@ -272,7 +275,7 @@ following basic workflow: For a given chemical structure `lazar`:
 compound.

 This procedure resembles an automated version of read across predictions
-in toxicology, in machine learning terms it would be classified as a
+in toxicology. In machine learning terms it would be classified as a
 k-nearest-neighbour algorithm.

 Apart from this basic workflow, `lazar` is completely modular and allows
@@ -399,7 +402,7 @@ used the scikit-learn default values.

 #### Logistic regression (SGD) (*LR-sgd*)

-For the logistic regression we used an ensemble of five trained models.
+For the logistic regression we used a combination of five trained models.
 For each model we used a batch size of 64 and trained for 50 epochs. As
 an optimizer ADAM was chosen. For the other parameters we used the
 tensorflow default values.
@@ -411,7 +414,7 @@ default values.

 #### Neural Nets (*NN*)

-For the neural network we used an ensemble of five trained models. For
+For the neural network we used a combination of five trained models. For
 each model we used a batch size of 64 and trained for 50 epochs. As an
 optimizer ADAM was chosen. The neural network had 4 hidden layers with
 64 nodes each and a ReLu activation function. For the other parameters
@@ -876,10 +879,10 @@ however a substantially lower number of mutagenicity predictions, despite
 similar crossvalidation results and we were unable to identify the reasons for
 this discrepancy within this investigation.

-Our data show that large difference exist with regard to genotoxic probabilities
+Our data show that large difference exist with regard to mutagenic probabilities
 between different pyrrolizidine subgroups. To adjust risk assessment of
 pyrrolizidine contamination, our data supports a tiered risk assessment based
-on *in silico* and experimental data on the relative potency of individual
+on *in silico* predictions and experimental data of individual
 pyrrolizidine alkaloids.

 References