Diffstat (limited to 'mutagenicity.md')
-rw-r--r-- | mutagenicity.md | 72 |
1 files changed, 42 insertions, 30 deletions
diff --git a/mutagenicity.md b/mutagenicity.md
index 3911799..9c5f427 100644
--- a/mutagenicity.md
+++ b/mutagenicity.md
@@ -52,28 +52,46 @@ MolPrint2D and Chemistry Development Kit (CDK) descriptors.
 Crossvalidation accuracies of all investigated models ranged from 80-85% which is
 comparable with the interlaboratory variability of the *Salmonella* mutagenicity
 assay. Pyrrolizidine alkaloid predictions showed a clear distinction between chemical
-groups, where Otonecines had the highest proportion of positive mutagenicity
-predictions and Monoesters the lowest.
+groups, where otonecines had the highest proportion of positive mutagenicity
+predictions and monoesters the lowest.

 Introduction
 ============

-**TODO**: rationale for investigation
-<!---
-
-Mutagenicity datasets
-Algorithms
-descriptors
-define abbreviations
-pyrrolizidine
-large dataset -> comparison of algorithms and descriptors
-reliable experimental outcome
---->
+The assessment of mutagenicity is an important part of the safety assessment of
+chemical structures, because genomic changes may lead to cancer and germ
+cell damage. The *Salmonella typhimurium* bacterial reverse mutation
+test (Ames test) can identify substances that cause mutations (e.g.,
+base-pair substitutions, frameshifts, insertions, deletions) and is generally
+used as the first step in genotoxicity and carcinogenicity assessments.
+
+Computer-based (*in silico*) mutagenicity predictions can be used in the early
+screening of novel compounds (e.g. drug candidates), but they are also gaining
+regulatory acceptance, e.g. for the registration of industrial chemicals within
+REACH (@ECHA2017) or the assessment of impurities in pharmaceuticals (ICH M7
+guideline, @ICH2017).
+
+*Salmonella* mutagenicity is currently the toxicological endpoint with the
+largest amount of public data (almost 10000 structures), whereas datasets for
+other endpoints typically contain only a few hundred compounds. The Ames test
+itself is relatively reliable, with an interlaboratory reproducibility of 80-85%
+(@Benigni1988).
+
+This also makes the development of mutagenicity models interesting from a
+computational chemistry and machine learning point of view. The relatively
+large amount of public data reduces the probability of chance effects due to
+small sample sizes, and the reliability of the underlying assay reduces the risk
+of overfitting experimental errors.
+
+In this study we aimed
+
+ - to generate a new public mutagenicity training dataset by combining the most comprehensive public datasets
+ - to compare the performance of MolPrint2D (*MP2D*) fingerprints with Chemistry Development Kit (*CDK*) descriptors for mutagenicity predictions
+ - to compare the performance of global QSAR models (random forests (*RF*), support vector machines (*SVM*), logistic regression (*LR*), neural nets (*NN*)) with local models (`lazar`)

-As case study we decided to apply these mutagenicity models to {{pa.nr}}
-Pyrrolizidines alkaloids (PAs) in order to highlight potentials and problems
-with the applicability of mutagenicity models for compounds with very limited
-experimental data.
+To highlight the potential and the problems of applying mutagenicity models to
+compounds with limited experimental data, we applied these models to
+{{pa.nr}} pyrrolizidine alkaloids (PAs).

 Pyrrolizidine alkaloids (PAs) are characteristic metabolites of some plant
 families, mainly: *Asteraceae*, *Boraginaceae*, *Fabaceae* and *Orchidaceae*
@@ -87,14 +105,8 @@ base and necic acid (@Hadi2021; @Allemang2018, @Louisse2019).
 However, due to limited availability of pure substances, only a limited number of
 PAs have been investigated with regards to their structure-specific mutagenicity.
 To overcome this bottleneck, the prediction of structure-specific mutagenic potential of
-PAs with different machine learning models could provide further inside in the mechanisms.
-
-Summing up the main objectives of this study were
-
- - to generate a new mutagenicity training dataset, by combining the most comprehensive public datasets
- - to compare the performance of MolPrint2D (*MP2D*) fingerprints with Chemistry Development Kit (*CDK*) descriptors
- - to compare the performance of global QSAR models (random forests (*RF*), support vector machines (*SVM*), logistic regression (*LR*), neural nets (*NN*)) with local models (`lazar`)
- - to apply these models for the prediction of pyrrolizidine alkaloid mutagenicity
+PAs with different machine learning models could provide further insight into the
+mechanisms.

 Materials and Methods
 =====================
@@ -585,12 +597,12 @@ models ({{pa.mp2d_svm.mut_perc}}-{{pa.mp2d_lazar_high_confidence.mut_perc}}%,
 @tbl:pa-summary, @fig:pa-groups).

 Over all models, the mean value of mutagenic predicted PAs was highest for
-Otonecines ({{pa.groups.Otonecine.mut_perc}}%,
+otonecines ({{pa.groups.Otonecine.mut_perc}}%,
 {{pa.groups.Otonecine.mut}}/{{pa.groups.Otonecine.n_pred}}),
-followed by Macrocyclic diesters ({{pa.groups.Macrocyclic_diester.mut_perc}}%, {{pa.groups.Macrocyclic_diester.mut}}/{{pa.groups.Macrocyclic_diester.n_pred}}),
-Dehydropyrrolizidine ({{pa.groups.Dehydropyrrolizidine.mut_perc}}%, {{pa.groups.Dehydropyrrolizidine.mut}}/{{pa.groups.Dehydropyrrolizidine.n_pred}}),
-Tertiary PAs ({{pa.groups.Tertiary_PA.mut_perc}}%, {{pa.groups.Tertiary_PA.mut}}/{{pa.groups.Tertiary_PA.n_pred}}) and
-Retronecines ({{pa.groups.Retronecine.mut_perc}}%, {{pa.groups.Retronecine.mut}}/{{pa.groups.Retronecine.n_pred}}).
+followed by macrocyclic diesters ({{pa.groups.Macrocyclic_diester.mut_perc}}%, {{pa.groups.Macrocyclic_diester.mut}}/{{pa.groups.Macrocyclic_diester.n_pred}}),
+dehydropyrrolizidines ({{pa.groups.Dehydropyrrolizidine.mut_perc}}%, {{pa.groups.Dehydropyrrolizidine.mut}}/{{pa.groups.Dehydropyrrolizidine.n_pred}}),
+tertiary PAs ({{pa.groups.Tertiary_PA.mut_perc}}%, {{pa.groups.Tertiary_PA.mut}}/{{pa.groups.Tertiary_PA.n_pred}}) and
+retronecines ({{pa.groups.Retronecine.mut_perc}}%, {{pa.groups.Retronecine.mut}}/{{pa.groups.Retronecine.n_pred}}).
 When excluding the aforementioned three deviating models, the rank order stays the
 same, but the percentage of mutagenic PAs is higher.
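The per-group figures in the last hunk (a percentage `mut_perc` reported together with `mut`/`n_pred`, and a rank order of PA groups) can be illustrated with a minimal Python sketch. It assumes that predictions are available as (model, PA groups, mutagenic call) tuples, that "over all models" means pooling the counts of all models, and that compounds without a prediction are left out of `n_pred`. The function names, model names and toy predictions are hypothetical and not taken from the study's code or data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

# One prediction: (model name, PA groups of the compound, mutagenic call).
# The call is None when a model made no prediction; such cases are assumed
# (hypothetically) not to count towards n_pred.
Prediction = Tuple[str, Set[str], Optional[bool]]
# Per group: (mut, n_pred, mut_perc)
Summary = Dict[str, Tuple[int, int, float]]


def group_mutagenicity(predictions: Iterable[Prediction],
                       exclude_models: Iterable[str] = ()) -> Summary:
    """Pool mutagenic calls per PA group over all non-excluded models.

    A compound can belong to several groups (e.g. an otonecine that is also
    a macrocyclic diester), so each call is counted once per group.
    """
    excluded = set(exclude_models)
    mut: Dict[str, int] = defaultdict(int)
    n_pred: Dict[str, int] = defaultdict(int)
    for model, groups, is_mutagenic in predictions:
        if model in excluded or is_mutagenic is None:
            continue  # excluded model or no prediction: not counted
        for group in groups:
            n_pred[group] += 1
            mut[group] += int(is_mutagenic)
    return {g: (mut[g], n_pred[g], 100 * mut[g] / n_pred[g]) for g in n_pred}


def rank_groups(summary: Summary) -> List[str]:
    """PA groups ordered from highest to lowest percentage of mutagenic calls."""
    return sorted(summary, key=lambda g: summary[g][2], reverse=True)


# Hypothetical toy predictions from two hypothetical models (not study data).
preds = [
    ("model_a", {"Otonecine", "Macrocyclic_diester"}, True),
    ("model_b", {"Otonecine", "Macrocyclic_diester"}, True),
    ("model_a", {"Retronecine", "Macrocyclic_diester"}, False),
    ("model_b", {"Retronecine", "Macrocyclic_diester"}, None),
    ("model_a", {"Retronecine"}, True),
]
summary = group_mutagenicity(preds)
print(rank_groups(summary))  # ['Otonecine', 'Macrocyclic_diester', 'Retronecine']
```

Passing `exclude_models` mirrors the second comparison in the text (dropping deviating models before pooling the counts); the sketch does not name which models those are.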