Diffstat (limited to '190501_Genotox-PA.md')
-rw-r--r-- 190501_Genotox-PA.md 1325
1 files changed, 1325 insertions, 0 deletions
diff --git a/190501_Genotox-PA.md b/190501_Genotox-PA.md
new file mode 100644
index 0000000..acf64e9
--- /dev/null
+++ b/190501_Genotox-PA.md
@@ -0,0 +1,1325 @@
+Prediction of the mutagenic potential of different pyrrolizidine
+alkaloids using LAZAR, Random Forest, Support Vector Machines, and Deep
+Learning
+
+Authors
+
+Verena Schöning, Christoph Helma, Philipp Boss, Jürgen Drewe
+
+**Manuscript in preparation.**
+
+Corresponding author:
+
+Prof. Dr. Jürgen Drewe, MSc
+
+Abstract
+========
+
+Pyrrolizidine alkaloids (PAs) are secondary plant metabolites of some
+plant families, which protect against predators and are generally
+considered genotoxic and mutagenic. This mutagenicity is also the point
+of concern in the regulatory risk assessment of this substance group
+([EFSA 2011](#_ENREF_36); [EMA 2014](#_ENREF_38),
+[2016](#_ENREF_39)). Several investigations have already shown that the
+mutagenic potential of PAs differs and largely depends on their
+structure.
+
+Since only very few of the over 600 known PAs are available for *in
+vitro* or *in vivo* experiments, the mutagenicity of PAs was estimated
+in this study using four different machine learning techniques: LAZAR,
+Deep Learning, Random Forest and Support Vector Machines. However, none
+of the models was optimal for predicting the genotoxic potential of PAs,
+either due to problems with the applicability domain or due to low
+performance. Therefore, no estimation regarding the genotoxic potential
+of single PAs could be made. An analysis of the genotoxic potential of
+different structural groups showed promising results. For necine base
+and necic acid, the results fitted well with the literature for three
+models. However, the prediction for the toxic principle of PAs,
+dehydropyrrolizidine, was within expectation in only one model (the
+TensorFlow-generated Deep Learning model), but not in the other four
+models. This study clearly shows the need to critically review and
+assess the predictions obtained from machine learning approaches not
+only by internal cross-validation, but also by external validation
+through comparison with the literature.
+
+Introduction
+============
+
+Pyrrolizidine alkaloids (PAs) are secondary plant ingredients found in
+many plant species as protection against predators ([Hartmann & Witte
+1995](#_ENREF_59); [Langel et al. 2011](#_ENREF_76)). PAs are ester
+alkaloids, which are composed of a necine base (two fused five-membered
+rings joined by a nitrogen atom) and one or two necic acids (carboxylic
+ester arms). The necine base can have different structures and thereby
+divides PAs into several structural groups, e.g. otonecine, platynecine,
+and retronecine. The structural groups of the necic acid are macrocyclic
+diester, open-ring diester and monoester ([Langel et al.
+2011](#_ENREF_76)).
+
+PAs are mainly metabolised in the liver, which is at the same time the
+main target organ of toxicity ([Bull & Dick 1959](#_ENREF_17); [Bull et
+al. 1958](#_ENREF_18); [Butler et al. 1970](#_ENREF_20); [DeLeve et al.
+1996](#_ENREF_33); [Jago 1971](#_ENREF_65); [Li et al.
+2011](#_ENREF_78); [Neumann et al. 2015](#_ENREF_99)). There are three
+principal metabolic pathways for 1,2-unsaturated PAs ([Chen et al.
+2010](#_ENREF_26)): (i) Detoxification by hydrolysis: the ester bonds at
+positions C7 and C9 are hydrolysed by non-specific esterases to release
+necine base and necic acid, which are then subjected to further phase II
+conjugation and excretion. (ii) Detoxification by *N*-oxidation of the
+necine base (only possible for retronecine-type PAs): the nitrogen is
+oxidised to form PA *N*-oxides, which can be conjugated by phase II
+enzymes, e.g. with glutathione, and then excreted. PA *N*-oxides can be
+converted back into the corresponding parent PA ([Wang et al.
+2005](#_ENREF_134)). (iii) Metabolic activation or toxification: PAs are
+metabolically activated (toxified) by oxidation (for retronecine-type
+PAs) or oxidative *N*-demethylation (for otonecine-type PAs; [Lin
+1998](#_ENREF_82)). This pathway is mainly catalysed by the cytochrome
+P450 isoforms CYP2B and 3A ([Ruan et al. 2014b](#_ENREF_115)) and
+results in the formation of dehydropyrrolizidines (DHPs, also known as
+pyrrolic esters or reactive pyrroles). DHPs are highly reactive and
+cause damage in the cells where they are formed, usually hepatocytes.
+However, they can also pass from the hepatocytes into the adjacent
+sinusoids and damage the endothelial lining cells ([Gao et al.
+2015](#_ENREF_48)), predominantly by reaction with proteins, lipids and
+DNA. There is even evidence that conjugation of DHP to glutathione,
+which would generally be considered a detoxification step, could result
+in reactive metabolites, which might also lead to DNA adduct formation
+([Xia et al. 2015](#_ENREF_138)). Due to their ability to form DNA
+adducts, DNA crosslinks and DNA breaks, 1,2-unsaturated PAs are
+generally considered genotoxic and carcinogenic ([Chen et al.
+2010](#_ENREF_26); [EFSA 2011](#_ENREF_36); [Fu et al.
+2004](#_ENREF_45); [Li et al. 2011](#_ENREF_78); [Takanashi et al.
+1980](#_ENREF_126); [Yan et al. 2008](#_ENREF_140); [Zhao et al.
+2012](#_ENREF_148)). Still, there is no evidence yet that PAs are
+carcinogenic in humans ([ANZFA 2001](#_ENREF_4); [EMA
+2016](#_ENREF_39)). One general limitation of studies with PAs is the
+number of different PAs investigated. Around 30 PAs are currently
+commercially available; therefore, all studies focus on these PAs. This
+is also true for *in vitro* and *in vivo* tests on mutagenicity and
+genotoxicity. To gain a wider perspective, over 600 different PAs were
+assessed in this study for their mutagenic potential using four
+different machine learning techniques.
+
+Materials and Methods
+=====================
+
+Training dataset
+----------------
+
+For all methods, the same validated training dataset was used. The
+training dataset was compiled from the following sources:
+
+- Kazius/Bursi Dataset (4337 compounds, [Kazius et al.
+  2005](#_ENREF_71)):
+
+> <http://cheminformatics.org/datasets/bursi/cas_4337.zip>
+
+- Hansen Dataset (6513 compounds, [Hansen et al. 2009](#_ENREF_57)):
+
+> <http://doc.ml.tu-berlin.de/toxbenchmark/Mutagenicity_N6512.csv>
+
+- EFSA Dataset (695 compounds, [EFSA 2011](#_ENREF_36)):
+
+> <https://data.europa.eu/euodp/data/storage/f/2017-0719T142131/GENOTOX%20data%20and%20dictionary.xls>
+
+Mutagenicity classifications from the Kazius and Hansen datasets were
+used without further processing. To achieve consistency between these
+datasets, EFSA compounds were classified as mutagenic if at least one
+positive result was found for the TA98 or TA100 Salmonella strains.
+
+Dataset merges were based on unique SMILES (*Simplified Molecular Input
+Line Entry Specification*) strings of the compound structures.
+Duplicated experimental data with the same outcome was merged into a
+single value, because it is likely that it originated from the same
+experiment. Contradictory results were kept as multiple measurements in
+the database. The combined training dataset contains 8281 unique
+structures.
+
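+The merge rule described above can be sketched in a few lines of
+Python. This is only an illustration, not the code from the
+repository: the function name `merge_datasets`, the toy records and
+the assumption that SMILES strings are already canonical are all
+hypothetical.
+
```python
from collections import defaultdict


def merge_datasets(*datasets):
    """Merge (smiles, outcome) records from several sources.

    Identical outcomes for one SMILES collapse into a single value;
    contradictory outcomes are all kept as separate measurements.
    Assumes the SMILES strings are already canonical (hypothetical).
    """
    merged = defaultdict(list)
    for records in datasets:
        for smiles, outcome in records:
            if outcome not in merged[smiles]:
                merged[smiles].append(outcome)
    return dict(merged)


# Toy records; structures and outcomes are illustrative only.
source_a = [("c1ccccc1", False), ("C=O", True)]
source_b = [("c1ccccc1", False), ("CCO", False), ("C=O", False)]

merged = merge_datasets(source_a, source_b)
print(merged)
```
+
+In this toy example the duplicated benzene record collapses to one
+value, while the contradictory formaldehyde records are both kept.
+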
+Source code for all data download, extraction and merge operations is
+publicly available from the git repository
+<https://git.in-silico.ch/pyrrolizidine> under a GPL3 License.
+
+Testing dataset
+---------------
+
+The testing dataset consisted of 602 different PAs. The compilation of
+the PA dataset is described in detail in [Schöning et al.
+(2017)](#_ENREF_119). The PAs were assigned to groups according to
+structural features of the necine base and necic acid.
+
+For the necine base, the following groups were assigned:
+
+- Retronecine-type (1,2-unsaturated necine base)
+
+- Otonecine-type (1,2-unsaturated necine base)
+
+- Platynecine-type (1,2-saturated necine base)
+
+For the modification of the necine base, the following groups were
+assigned:
+
+- *N*-oxide-type
+
+- Tertiary-type (PAs which were neither of the *N*-oxide- nor of the
+  DHP-type)
+
+- DHP-type (dehydropyrrolizidine, pyrrolic ester)
+
+For the necic acid, the following groups were assigned:
+
+- Monoester-type
+
+- Open-ring diester-type
+
+- Macrocyclic diester-type
+
+For the Random Forest (RF), Support Vector Machines (SVM), and Deep
+Learning (DL) models, molecular descriptors of the PAs were calculated
+using the program PaDEL-Descriptors (version 2.21) ([Yap
+2011](#_ENREF_142), [2014](#_ENREF_143)). From these, only the
+descriptors that were actually used for the generation of the DL model
+were selected.
+
+LAZAR
+-----
+
+LAZAR (*lazy structure activity relationships*) is a modular framework
+for read-across model development and validation. It follows this basic
+workflow: for a given chemical structure, LAZAR
+
+- searches in a database for similar structures (neighbours) with
+ experimental data,
+
+- builds a local QSAR model with these neighbours and
+
+- uses this model to predict the unknown activity of the query
+ compound.
+
+This procedure resembles an automated version of read-across predictions
+in toxicology; in machine learning terms, it would be classified as a
+k-nearest-neighbour algorithm.
+
+Apart from this basic workflow, LAZAR is completely modular and allows
+the researcher to use any algorithm for similarity searches and local
+QSAR (*Quantitative structure--activity relationship*) modelling.
+Algorithms used within this study are described in the following
+sections.
+
+### Neighbour identification
+
+Similarity calculations were based on MolPrint2D fingerprints ([Bender
+et al. 2004](#_ENREF_8)) from the OpenBabel cheminformatics library
+([O\'Boyle et al. 2011](#_ENREF_104)). The MolPrint2D fingerprint uses
+atom environments as molecular representation, which basically resembles
+the chemical concept of functional groups. For each atom in a molecule,
+it represents the chemical environment using the atom types of connected
+atoms.
+
+MolPrint2D fingerprints are generated dynamically from chemical
+structures and do not rely on predefined lists of fragments (such as
+OpenBabel FP3, FP4 or MACCS fingerprints or lists of
+toxicophores/toxicophobes). This has the advantage that they may capture
+substructures of toxicological relevance that are not included in other
+fingerprints.
+
+From MolPrint2D fingerprints a feature vector with all atom environments
+of a compound can be constructed that can be used to calculate chemical
+similarities.
+
+The chemical similarity between two compounds a and b is expressed as
+the ratio of the number of atom environments common to both structures
+(A ∩ B) to the total number of atom environments (A ∪ B), i.e. the
+Jaccard/Tanimoto index.
+
+$$sim = \frac{\left| A\ \cap B \right|}{\left| A\ \cup B \right|}$$
+
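+As a minimal sketch, the index can be computed on two sets of atom
+environments as follows. The environment strings are made-up
+placeholders, not real MolPrint2D output, and the function name
+`tanimoto` is an assumption for illustration.
+
```python
def tanimoto(env_a, env_b):
    """Jaccard/Tanimoto index of two collections of atom environments."""
    a, b = set(env_a), set(env_b)
    union = a | b
    if not union:
        # Two empty fingerprints: define the similarity as 0 (assumption).
        return 0.0
    return len(a & b) / len(union)


# Placeholder environment labels (hypothetical, for illustration only).
env_a = {"C;1-C;1-O", "C;1-C;2-N", "O;1-C"}
env_b = {"C;1-C;1-O", "O;1-C", "N;1-C;2-C"}

print(tanimoto(env_a, env_b))  # 2 shared environments out of 4 in total
```
+
+Here two of four distinct environments are shared, so the similarity is
+0.5, which would just reach the stricter threshold discussed next.
+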
+Threshold selection is a trade-off between prediction accuracy (high
+threshold) and the number of predictable compounds (low threshold). As
+it is in many practical cases desirable to make predictions even in the
+absence of closely related neighbours, we follow a tiered approach:
+
+- First a similarity threshold of 0.5 is used to collect neighbours,
+ to create a local QSAR model and to make a prediction for the query
+ compound.
+
+- If any of these steps fails, the procedure is repeated with a
+ similarity threshold of 0.2 and the prediction is flagged with a
+ warning that it might be out of the applicability domain of the
+ training data.
+
+- Similarity thresholds of 0.5 and 0.2 are the default values chosen
+  by the software developers and remained unchanged during the
+  course of these experiments.
+
+Compounds with the same structure as the query structure are
+automatically eliminated from neighbours to obtain unbiased predictions
+in the presence of duplicates.
+
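+The tiered procedure might look like the following sketch. It is not
+the LAZAR implementation: the function name `select_neighbours`, the
+use of `>=` for the threshold comparison and the minimum of two
+neighbours are assumptions made for illustration.
+
```python
def select_neighbours(similarities, thresholds=(0.5, 0.2), min_neighbours=2):
    """Collect neighbours with a tiered similarity threshold.

    similarities: list of (similarity, activity) pairs for the training
    compounds, with duplicates of the query already removed.
    Returns (neighbours, warning); warning is True when the lower
    threshold was needed or no prediction is possible.
    min_neighbours=2 is an assumption, not a documented LAZAR setting.
    """
    for tier, threshold in enumerate(thresholds):
        neighbours = [(s, y) for s, y in similarities if s >= threshold]
        if len(neighbours) >= min_neighbours:
            return neighbours, tier > 0
    return [], True


candidates = [(0.6, True), (0.3, False), (0.25, True), (0.1, False)]
neighbours, warning = select_neighbours(candidates)
print(len(neighbours), warning)
```
+
+In this example only one compound passes the 0.5 threshold, so the
+search falls back to 0.2 and the result carries a warning.
+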
+### Local QSAR models and predictions
+
+Only similar compounds (neighbours) above the threshold are used for
+local QSAR models. In this investigation, we use a weighted majority
+vote from the neighbours' experimental data for mutagenicity
+classifications. Probabilities for both classes
+(mutagenic/non-mutagenic) are calculated according to the following
+formula, and the class with the higher probability is used as the
+prediction outcome.
+
+$$p_{c} = \frac{\sum \text{sim}_{n,c}}{\sum \text{sim}_{n}}$$
+
+$p_{c}$ Probability of class c (e.g. mutagenic or non-mutagenic)\
+$\sum \text{sim}_{n,c}$ Sum of similarities of neighbours with
+class c\
+$\sum \text{sim}_{n}$ Sum of similarities of all neighbours
+
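+A small worked sketch of this weighted vote, with invented similarity
+values and a hypothetical function name `class_probabilities`:
+
```python
def class_probabilities(neighbours):
    """Similarity-weighted class probabilities.

    neighbours: list of (similarity, class_label) pairs.
    For each class c, p_c = sum of sim(n, c) / sum of sim(n).
    """
    total = sum(sim for sim, _ in neighbours)
    if total <= 0:
        raise ValueError("at least one neighbour with similarity > 0 is needed")
    probs = {}
    for sim, label in neighbours:
        probs[label] = probs.get(label, 0.0) + sim / total
    return probs


# Invented neighbours for illustration.
neighbours = [(0.8, "mutagenic"), (0.6, "non-mutagenic"), (0.6, "mutagenic")]
probs = class_probabilities(neighbours)
prediction = max(probs, key=probs.get)
print(probs, prediction)
```
+
+With these values the mutagenic class collects 1.4 of a total
+similarity of 2.0, i.e. a probability of 0.7, and is therefore the
+predicted class.
+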
+### Applicability domain
+
+The applicability domain (AD) of LAZAR models is determined by the
+structural diversity of the training data. If no similar compounds are
+found in the training data no predictions will be generated. Warnings
+are issued if the similarity threshold had to be lowered from 0.5 to 0.2
+in order to enable predictions. Predictions without warnings can be
+considered as close to the applicability domain and predictions with
+warnings as more distant from the applicability domain. Quantitative
+applicability domain information can be obtained from the similarities
+of individual neighbours.
+
+### Availability
+
+- LAZAR experiments for this manuscript:
+  <https://git.in-silico.ch/pyrrolizidine> (source code, GPL3)
+
+- LAZAR framework: <https://git.in-silico.ch/lazar> (source code, GPL3)
+
+- LAZAR GUI: <https://git.in-silico.ch/lazar-gui> (source code, GPL3)
+
+- Public web interface: <https://lazar.in-silico.ch>
+
+Random Forest, Support Vector Machines, and Deep Learning in R-project
+----------------------------------------------------------------------
+
+In comparison to LAZAR, three other models (Random Forest (RF), Support
+Vector Machines (SVM), and Deep Learning (DL)) were evaluated.
+
+For the generation of these models, molecular 1D and 2D descriptors of
+the training dataset were calculated using PaDEL-Descriptors (version
+2.21) ([Yap 2011](#_ENREF_142), [2014](#_ENREF_143)).
+
+As the training dataset contained over 8280 instances, it was decided to
+delete instances with missing values during data pre-processing.
+Furthermore, substances with equivocal outcome were removed. The final
+training dataset contained 8080 instances with known mutagenic
+potential. The RF, SVM, and DL models were generated using the R
+software (R-project for Statistical Computing,
+<https://www.r-project.org/>; version 3.3.1); the specific R packages
+used are identified for each step in the description below. During
+feature selection, descriptors with near-zero variance were removed
+using the '*nearZeroVar*'-function (package 'caret'). A descriptor was
+classified as having near-zero variance if the percentage of the most
+common value was more than 90% or if the frequency ratio of the most
+common value to the second most common value was greater than 95:5 (e.g.
+95 instances of the most common value and only 5 or fewer instances of
+the second most common value). After that, highly correlated descriptors
+were removed using the '*findCorrelation*'-function (package 'caret')
+with a cut-off of 0.9. This resulted in a training dataset with 516
+descriptors. These descriptors were scaled to the range between 0 and 1
+using the '*preProcess*'-function (package 'caret'). The scaling routine
+was saved in order to apply the same scaling to the testing dataset. As
+these three steps did not consider the outcome, it was decided that they
+do not need to be included in the cross-validation of the model. To
+further reduce the number of features, a LASSO (*least absolute
+shrinkage and selection operator*) regression was performed using the
+'*glmnet*'-function (package '*glmnet*'). The reduced dataset was used
+for the generation of the pre-trained models.
+
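+Two of these pre-processing steps can be illustrated with a short,
+library-free Python sketch. It paraphrases the rule as worded above and
+is not a reimplementation of the 'caret' functions. The function names
+and the toy data are hypothetical.
+
```python
from collections import Counter


def near_zero_variance(values, max_share=0.90, max_ratio=95 / 5):
    """Flag a descriptor using the rule as worded in the text above."""
    counts = Counter(values).most_common(2)
    top = counts[0][1]
    if top / len(values) > max_share:
        return True  # most common value dominates the column
    if len(counts) == 1:
        return True  # constant column
    return top / counts[1][1] > max_ratio


def fit_min_max(train_column):
    """Learn the scaling parameters on the training data only."""
    return min(train_column), max(train_column)


def apply_min_max(column, lo, hi):
    """Apply previously saved scaling parameters to any dataset."""
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in column]


lo, hi = fit_min_max([2.0, 4.0, 6.0])
scaled_train = apply_min_max([2.0, 4.0, 6.0], lo, hi)
scaled_test = apply_min_max([3.0, 8.0], lo, hi)
print(scaled_train, scaled_test)
```
+
+Because the scaling parameters come from the training data, values in
+the testing data can fall outside the range of 0 to 1, as the second
+test value does here.
+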
+For the RF model, the '*randomForest*'-function (package
+'*randomForest*') was used. A forest with 1000 trees with maximal
+terminal nodes of 200 was grown for the prediction.
+
+The '*svm*'-function (package 'e1071') with a *radial basis function
+kernel* was used for the SVM model.
+
+The DL model was generated using the '*h2o.deeplearning*'-function
+(package '*h2o*'). The DL model contained four hidden layers with 70,
+50, 50, and 10 neurons, respectively. Other hyperparameters were set as
+follows: l1=1.0E-7, l2=1.0E-11, epsilon = 1.0E-10, rho = 0.8, and
+quantile\_alpha = 0.5. For all other hyperparameters, the default values
+were used. Weights and biases were determined in a first step with an
+unsupervised DL model. These values were then used for the actual,
+supervised DL model.
+
+To validate these models, an internal cross-validation approach was
+chosen. The training dataset was randomly split into training data,
+which contained 95% of the data, and validation data, which contained 5%
+of the data. A feature selection with LASSO on the training data was
+performed, reducing the number of descriptors to approximately 100. This
+step was repeated five times. Based on each of the five different
+training data sets, the predictive models were trained and their
+performance was tested with the validation data. This step was repeated
+10 times. Furthermore, a y-randomisation using the RF model was
+performed. During y-randomisation, the outcome (y-variable) is randomly
+permuted. The theory is that after randomisation of the outcome, the
+model should not be able to correlate the outcome with the properties
+(descriptor values) of the substances. The performance of the model
+should therefore indicate a by-chance prediction with an accuracy of
+about 50%. If this is true, it can be concluded that the correlation
+between the actual outcome and the properties of the substances is real
+and not due to chance ([Rücker et al. 2007](#_ENREF_117)).
+
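+The idea behind y-randomisation can be demonstrated on synthetic data.
+The sketch below deliberately swaps the RF model for a one-dimensional
+nearest-centroid classifier so that it runs with the Python standard
+library alone. The data, the seed and all function names are
+assumptions for illustration.
+
```python
import random


def fit_centroids(xs, ys):
    """1-D nearest-centroid classifier: mean of x for each class."""
    m0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    m1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    return m0, m1


def accuracy(model, xs, ys):
    m0, m1 = model
    preds = [1 if abs(x - m1) < abs(x - m0) else 0 for x in xs]
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)


def holdout_accuracy(xs, ys, n_train):
    """Train on the first n_train records, evaluate on the rest."""
    model = fit_centroids(xs[:n_train], ys[:n_train])
    return accuracy(model, xs[n_train:], ys[n_train:])


rng = random.Random(0)
ys = [rng.randint(0, 1) for _ in range(2000)]
# Synthetic descriptor that really depends on the outcome.
xs = [rng.gauss(2.0 if y else -2.0, 1.0) for y in ys]

acc_real = holdout_accuracy(xs, ys, 1600)

ys_perm = ys[:]
rng.shuffle(ys_perm)  # y-randomisation: break the x-y relationship
acc_perm = holdout_accuracy(xs, ys_perm, 1600)

print(round(acc_real, 3), round(acc_perm, 3))
```
+
+With the true labels the accuracy is high, whereas with permuted labels
+it falls to roughly chance level, which is the behaviour the check
+looks for.
+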
+![](./media/media/image1.png){width="6.26875in"
+height="5.486111111111111in"}
+
+Figure 1: Flowchart of the generation and validation of the models
+generated in R-project
+
+Deep Learning in TensorFlow
+---------------------------
+
+Alternatively, a DL model was established with the Python-based
+TensorFlow program (<https://www.tensorflow.org/>), using the high-level
+API Keras (<https://www.tensorflow.org/guide/keras>) to build the
+models.
+
+Data pre-processing was done by rank transformation using the
+'*QuantileTransformer*' procedure. A sequential model with four layers
+was used: an input layer and two hidden layers (with 12, 8 and 8 nodes,
+respectively) and one output layer. For the output layer, a sigmoidal
+activation function was used, and for all other layers the ReLU
+('*Rectified Linear Unit*') activation function. Additionally, an
+L^2^-penalty of 0.001 was used for the input layer. For training of the
+model, the ADAM algorithm was used to minimise the cross-entropy loss
+using the default parameters of Keras. Training was performed for 100
+epochs with a batch size of 64. The model was implemented with Python
+3.6 and Keras. For training of the model, a 6-fold cross-validation was
+used. Accuracy was estimated by ROC-AUC and confusion matrix.
+
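+For readers unfamiliar with ROC-AUC, the following sketch computes it
+directly from its rank interpretation: the probability that a randomly
+chosen positive receives a higher score than a randomly chosen
+negative. It is a plain-Python illustration with invented scores, not
+the evaluation code used for the model.
+
```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented labels and predicted probabilities.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)
```
+
+In this example three of the four positive/negative pairs are ranked
+correctly, giving a ROC-AUC of 0.75. A value of 0.5 corresponds to
+random ranking.
+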
+Results
+=======
+
+LAZAR
+-----
+
+For 46 PAs, no prediction could be made: 26 PAs had no neighbours and 20
+PAs had only one neighbour. For an additional 396 PAs, the similarity
+threshold had to be reduced from 0.5 to 0.2 to obtain enough neighbours
+for a prediction. This means that these substances might not be within
+the applicability domain (AD). Therefore, only 160 of the 602 PAs were
+well within the stricter AD with the similarity threshold of 0.5, and
+556 PAs were within the AD with the similarity threshold of 0.2.
+
+![](./media/media/image2.png){width="5.905511811023622in"
+height="3.868241469816273in"}
+
+Figure 2: Genotoxic potential of the different PA groups as predicted by
+LAZAR, using the **similarity threshold** **of 0.5**.
+
+*Genotoxic*: percentage of compounds per group that were predicted to be
+genotoxic.\
+*Not genotoxic*: percentage of compounds per group that were predicted
+to be not genotoxic.\
+*Outside AD*: percentage of compounds per group that were outside the
+applicability domain (AD).
+
+![](./media/media/image3.png){width="5.905511811023622in"
+height="3.868241469816273in"}
+
+Figure 3: Genotoxic potential of the different PA groups as predicted by
+LAZAR, using the **similarity threshold of 0.2**
+
+*Genotoxic*: percentage of compounds per group that were predicted to be
+genotoxic.\
+*Not genotoxic*: percentage of compounds per group that were predicted
+to be not genotoxic.\
+*Outside AD*: percentage of compounds per group that were outside the
+applicability domain (AD).
+
+Interestingly, with both similarity thresholds (i.e. 0.2 and 0.5), the
+majority of PAs in all groups except otonecine were predicted to be not
+genotoxic.
+
+The following rank order for genotoxicity probability can be deduced
+from the results of both similarity thresholds:
+
+- Necine base: platynecine ≤ retronecine \<\< otonecine
+
+- Necic acid: monoester \< diester \< macrocyclic diester
+
+- Modification of necine base: *N*-oxide \< DHP \< tertiary PA
+
+Random Forest, Support Vector Machines, and Deep Learning
+---------------------------------------------------------
+
+Applicability domain
+
+The AD of the training dataset and the PA dataset was evaluated using
+the Jaccard distance. A Jaccard distance of '0' indicates that the
+substances are similar, whereas a value of '1' shows that the substances
+are different. The Jaccard distance was below 0.2 for all PAs relative
+to the training dataset. Therefore, the PA dataset is within the AD of
+the training dataset and the models can be used to predict the genotoxic
+potential of the PA dataset.
+
+y-randomisation
+
+After y-randomisation of the outcome, the accuracy and CCR are around
+50%, i.e. at chance level. This shows that the relationship between
+outcome and predictors found with the original data is real and not due
+to chance.
+
+Random Forest
+
+The validation showed that the RF model has an accuracy of 64%, a
+sensitivity of 66% and a specificity of 63%. The confusion matrix of the
+model, calculated for 8080 instances, is provided in Table 1.
+
+Table 1: Confusion matrix of the RF model
+
+  Measured genotoxicity   ***PP***   ***PN***   ***Total***
+  ----------------------- ---------- ---------- -------------
+  ***TP***                2274       1163       3437
+  ***TN***                1736       2907       4643
+  ***Total***             4010       4070       8080
+
+Rows: measured genotoxicity; columns: predicted genotoxicity.
+
+PP: Predicted positive; PN: Predicted negative, TP: True positive, TN:
+True negative
+
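+The reported performance figures can be recomputed from the confusion
+matrix. The sketch below reads the TP and TN rows of Table 1 as the
+measured positives and measured negatives, and assumes that CCR is the
+mean of sensitivity and specificity. Both are interpretations, and the
+function name is hypothetical.
+
```python
def classification_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity, specificity and CCR from a 2x2 matrix.

    CCR is taken here as the mean of sensitivity and specificity
    (an assumption; the manuscript does not spell out the formula).
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    ccr = (sensitivity + specificity) / 2
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ccr": ccr,
    }


# Counts from Table 1 (RF model), read row-wise.
rf = classification_metrics(tp=2274, fn=1163, fp=1736, tn=2907)
for name, value in rf.items():
    print(f"{name}: {100 * value:.1f}%")
```
+
+Under these assumptions the RF counts give an accuracy of 64.1%, a
+sensitivity of 66.2%, a specificity of 62.6% and a CCR of 64.4%.
+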
+In general, the majority of PAs were considered to be not genotoxic by
+the RF model (Figure 4).
+
+![](./media/media/image4.png){width="6.063194444444444in"
+height="3.8756944444444446in"}
+
+Figure 4: Genotoxic potential of the different PA groups as predicted by
+**RF model**
+
+*Genotoxic*: percentage of compounds per group that were predicted to be
+genotoxic.\
+*Not genotoxic*: percentage of compounds per group that were predicted
+to be not genotoxic.
+
+From the results, the following rank orders of genotoxic potential could
+be deduced:
+
+- Necine base: platynecine \< retronecine \< otonecine
+
+- Necic acid: monoester (= 0%) \< diester \< macrocyclic diester
+
+- Modification of necine base: *N*-oxide = dehydropyrrolizidine (0%)
+ \< tertiary PA
+
+Support Vector Machines
+
+The validation showed that the SVM model has an accuracy of 62%, a
+sensitivity of 65% and a specificity of 60%. The confusion matrix of SVM
+model, calculated for 8080 instances, is provided in Table 2.
+
+Table 2: Confusion matrix of the SVM model
+
+  Measured genotoxicity   ***PP***   ***PN***   ***Total***
+  ----------------------- ---------- ---------- -------------
+  ***TP***                2057       1107       3164
+  ***TN***                1953       2963       4916
+  ***Total***             4010       4070       8080
+
+Rows: measured genotoxicity; columns: predicted genotoxicity.
+
+PP: Predicted positive; PN: Predicted negative, TP: True positive, TN:
+True negative
+
+In the SVM model, the majority of PAs were likewise considered to be not
+genotoxic (Figure 5).
+
+![](./media/media/image5.png){width="6.063194444444444in"
+height="3.9694444444444446in"}
+
+Figure 5: Genotoxic potential of the different PA groups as predicted by
+**SVM model**
+
+*Genotoxic*: percentage of compounds per group that were predicted to be
+genotoxic.\
+*Not genotoxic*: percentage of compounds per group that were predicted
+to be not genotoxic.
+
+From the results, the following rank orders of genotoxic potential could
+be deduced:
+
+- Necine base: otonecine \< platynecine = retronecine
+
+- Necic acid: macrocyclic diester \< monoester = diester
+
+- Modification of necine base: dehydropyrrolizidine \< tertiary
+ PA \< *N*-oxide 
+
+Deep Learning (R-project)
+
+The validation showed that the DL model generated in R has an accuracy
+of 59%, a sensitivity of 89% and a specificity of 30%. The confusion
+matrix of the model, normalised to 8080 instances, is provided in Table
+3.
+
+Table 3: Confusion matrix of the DL model (R-project)
+
+  Measured genotoxicity   ***PP***   ***PN***   ***Total***
+  ----------------------- ---------- ---------- -------------
+  ***TP***                3575       435        4010
+  ***TN***                2853       1217       4070
+  ***Total***             6428       1652       8080
+
+Rows: measured genotoxicity; columns: predicted genotoxicity.
+
+PP: Predicted positive; PN: Predicted negative, TP: True positive, TN:
+True negative
+
+In contrast, the majority of PAs were considered to be genotoxic by the
+DL model in R (Figure 6).
+
+![](./media/media/image6.png){width="6.063194444444444in"
+height="3.982638888888889in"}
+
+Figure 6: Genotoxic potential of the different PA groups as predicted by
+**DL model (R-project)**
+
+*Genotoxic*: percentage of compounds per group that were predicted to be
+genotoxic.\
+*Not genotoxic*: percentage of compounds per group that were predicted
+to be not genotoxic.
+
+From the results, the following rank orders of genotoxic potential could
+be proposed:
+
+- Necine base: platynecine \< retronecine \< otonecine
+
+- Necic acid: monoester \< diester \< macrocyclic diester
+
+- Modification of necine base: tertiary PA = dehydropyrrolizidine \<
+ *N*-oxide.
+
+DL model (TensorFlow)
+
+The validation showed that the DL model generated in TensorFlow has an
+accuracy of 68%, a sensitivity of 70% and a specificity of 46%. The
+confusion matrix of the model, normalised to 8080 instances, is provided
+in Table 4.
+
+Table 4: Confusion matrix of the DL model (TensorFlow)
+
+  Measured genotoxicity   ***PP***   ***PN***   ***Total***
+  ----------------------- ---------- ---------- -------------
+  ***TP***                2851       1227       4078
+  ***TN***                1825       2177       4002
+  ***Total***             4676       3404       8080
+
+Rows: measured genotoxicity; columns: predicted genotoxicity.
+
+PP: Predicted positive; PN: Predicted negative, TP: True positive, TN:
+True negative
+
+The ROC curves from the 6-fold validation are shown in Figure 7.
+
+![](./media/media/image7.png){width="3.825in"
+height="2.7327045056867894in"}
+
+Figure 7: Six-fold cross-validation of the TensorFlow DL model shows an
+average area under the ROC-curve (ROC-AUC; measure of accuracy) of 68%.
+
+In contrast to the DL model generated in R, the DL model generated in
+TensorFlow predicted the majority of PAs as not genotoxic.
+
+![](./media/media/image8.png){width="6.26875in"
+height="3.6993055555555556in"}
+
+Figure 8: Genotoxic potential of the different PA groups as predicted by
+**DL model (TensorFlow)**
+
+*Genotoxic*: percentage of compounds per group that were predicted to be
+genotoxic.\
+*Not genotoxic*: percentage of compounds per group that were predicted
+to be not genotoxic.
+
+The following rank orders of genotoxic potential could be proposed based
+on the results:
+
+- Necine base: platynecine \< otonecine \< retronecine 
+
+- Necic acid: monoester \< diester \< macrocyclic diester
+
+- Modification of necine base: tertiary PA \< *N*-oxide \<\<
+ dehydropyrrolizidine.
+
+In summary, the validation results of the four methods are presented in
+the following table.
+
+Table 5: Results of the cross-validation of the four models and after
+y-randomisation
+
+ ----------------------------------------------------------------------
+ Accuracy CCR Sensitivity Specificity
+ ----------------------- ---------- ------- ------------- -------------
+ RF model 64.1% 64.4% 66.2% 62.6%
+
+ SVM model 62.1% 62.6% 65.0% 60.3%
+
+ DL model\ 59.3% 59.5% 89.2% 29.9%
+ (R-project)
+
+ DL model (TensorFlow) 68% 62.2% 69.9% 45.6%
+
+ y-randomisation 50.5% 50.4% 50.3% 50.6%
+ ----------------------------------------------------------------------
+
+CCR (correct classification rate)
+
+Discussion
+==========
+
+General model performance
+
+Based on the results of the cross-validation for all models (LAZAR, RF,
+SVM, DL (R-project) and DL (TensorFlow)), it can be stated that the
+prediction results are not optimal, for different reasons. The accuracy
+as measured during cross-validation of the four models (RF, SVM, DL
+(R-project and TensorFlow)) was partly low, with values between 59.3%
+and 68%, with the R-generated DL model and the TensorFlow-generated DL
+model showing the worst and the best performance, respectively. The
+validation of the R-generated DL model revealed a high sensitivity
+(89.2%) but an unacceptably low specificity of 29.9%, indicating a high
+number of false positive estimates. The TensorFlow-generated DL model,
+however, showed an acceptable but not optimal accuracy of 68%, a
+sensitivity of 69.9% and a specificity of 45.6%. The low specificity
+indicates that both DL models tend to predict too many instances as
+positive (genotoxic) and therefore have a high false positive rate. This
+allows group statements to be made, at least with the
+TensorFlow-generated DL model, but the confidence for estimations of
+single PAs appears to be insufficient.
+
+Several factors have likely contributed to the low to moderate
+performance of the methods used, as shown during the cross-validation:
+
+1. The outcome in the training dataset was based on the results of Ames
+   tests for genotoxicity ([ICH 2011](#_ENREF_63)), an *in vitro* test
+   in different strains of the bacterium *Salmonella typhimurium*. In
+   this test, mutagenicity is evaluated with and without prior
+   metabolic activation of the test substance. Metabolic activation
+   could result in the formation of genotoxic metabolites from
+   non-genotoxic parent compounds. However, no distinction was made in
+   the training dataset between substances that needed metabolic
+   activation before being mutagenic and those that were mutagenic
+   without metabolic activation. LAZAR is able to handle this
+   'inaccuracy' in the training dataset well due to the way the
+   algorithm works: LAZAR predicts the genotoxic potential based on the
+   neighbours of substances with comparable structural features,
+   considering mutagenic and non-mutagenic neighbours. Based on the
+   structural similarity, a probability for mutagenicity and for
+   non-mutagenicity is calculated independently from each other (meaning
+   that the sum of probabilities does not necessarily add up to 100%).
+   The class with the higher probability is then the overall outcome for
+   the substance.
+
+> In contrast, the other models first need to be trained to recognise
+> the structural features that are responsible for genotoxicity.
+> Therefore, the mixture of substances that are mutagenic with and
+> without metabolic activation in the training dataset may have
+> adversely affected the ability to separate the dataset into two
+> distinct classes, which may explain the relatively low performance of
+> these models.
+
+2. Machine learning algorithms try to find an optimized solution in a
+ high-dimensional space (one dimension per predictor). Sometimes
+ these methods do not find the global optimum but only a local,
+ suboptimal solution. One strategy for approaching the global
+ optimum is a systematic variation (grid search) of the
+ hyperparameters of the methods, which may be very time-consuming,
+ particularly for large datasets.
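+
+A grid search can be sketched as follows. This is a minimal illustration
+in Python, not the tuning procedure used in the study: the
+hyperparameter names, the grid values, and the toy scoring function (a
+stand-in for a cross-validated performance measure) are assumptions.
+

```python
import itertools
import math


def grid_search(score, grid):
    """Evaluate score(**params) for every combination in grid.

    grid maps each hyperparameter name to a list of candidate values.
    Returns the best parameter combination and its score.
    """
    names = sorted(grid)
    best_params, best_score = None, -math.inf
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(**params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


def toy_score(learning_rate, n_units):
    # Hypothetical validation score that peaks at learning_rate=0.01,
    # n_units=64; a real score would come from cross-validation.
    return -(math.log10(learning_rate) + 2) ** 2 - ((n_units - 64) / 64) ** 2


# Hypothetical grid: 3 x 3 = 9 model fits would be needed
grid = {"learning_rate": [0.1, 0.01, 0.001], "n_units": [16, 64, 256]}
best_params, best_score = grid_search(toy_score, grid)
n_evaluations = math.prod(len(values) for values in grid.values())
```

+
+The number of evaluations is the product of the grid sizes (here
+3 x 3 = 9), which is why an exhaustive search becomes costly when each
+evaluation requires retraining a model on a large dataset.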
+
+Mutagenicity of PAs
+
+Due to the low to moderate predictivity of all models, quantitative
+statements on the genotoxicity of single PAs cannot be made with
+sufficient confidence.
+
+The predictions of the SVM model did not fit with the other models or
+literature, and are therefore not further considered in the discussion.
+
+Necic acid
+
+The rank order of the necic acids is comparable in the four models
+considered (LAZAR, RF, and DL (R-project and TensorFlow)). PAs of the
+monoester type had the lowest genotoxic potential, followed by PAs of
+the open-ring diester type. PAs with macrocyclic diesters had the
+highest genotoxic potential. These results fit well with the current
+state of knowledge: in general, PAs that have a macrocyclic diester as
+necic acid are considered more toxic than those with an open-ring
+diester or monoester ([EFSA 2011](#_ENREF_36); [Fu et al.
+2004](#_ENREF_45); [Ruan et al. 2014b](#_ENREF_115)).
+
+Necine base
+
+The rank order of the necine bases is comparable in the LAZAR, RF, and
+DL (R-project) models, with platynecine being less genotoxic than or as
+genotoxic as retronecine, and otonecine being the most genotoxic. In the
+TensorFlow-generated DL model, platynecine also has the lowest genotoxic
+probability, but it is followed by otonecine and then by retronecine.
+These results partly correspond to earlier published studies. Saturated
+PAs of the platynecine type are generally accepted to be less toxic or
+non-toxic and have been shown in *in vitro* experiments to form no DNA
+adducts ([Xia et al. 2013](#_ENREF_139)). It is therefore striking that
+1,2-unsaturated PAs of the retronecine type should have a genotoxic
+potential almost comparable to that of the platynecine type in the
+LAZAR and DL (R-project) models. In the literature, otonecine-type PAs
+were shown to be more toxic than those of the retronecine type ([Li et
+al. 2013](#_ENREF_80)).
+
+Modifications of necine base
+
+The group-specific results of the TensorFlow-generated DL model appear
+to reflect the expected relationship between the groups: a low
+genotoxic potential of *N*-oxides and the highest potential of
+dehydropyrrolizidines ([Chen et al. 2010](#_ENREF_26)).
+
+In the LAZAR model, the genotoxic potential of dehydropyrrolizidines
+(DHP) (using the extended AD) is comparable to that of tertiary PAs.
+DHP is regarded as the toxic principle in the metabolism of PAs and is
+known to produce protein and DNA adducts ([Chen et al.
+2010](#_ENREF_26)). The LAZAR model did not meet this expectation: it
+predicted the majority of DHP as not genotoxic. However, the following
+issues need to be considered. First, all DHP were outside the stricter
+AD of 0.5, which indicates that there might be a general problem with
+the AD. Second, DHP has two double bonds in its necine base, which
+makes it highly reactive. DHP and other comparable molecules have a
+very short lifespan and usually cannot be used in *in vitro*
+experiments. This might explain the absence of suitable neighbours in
+LAZAR.
+
+Furthermore, the probabilities for this substance group need to be
+considered, and not only the consolidated prediction. In the LAZAR
+model, all DHPs had probabilities for both outcomes (genotoxic and not
+genotoxic) mainly below 30%. Additionally, the probabilities for both
+outcomes were close together, often within 10% of each other. The fact
+that the probabilities for both outcomes were low and close together
+indicates a lower confidence in the prediction of the model for DHPs.
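+
+The reasoning above can be sketched as a simple check. The following
+Python snippet is an illustration and does not reproduce LAZAR's
+implementation; the function name and the cut-offs (30% for low
+support, 10 percentage points for a close call, taken from the
+observations above) are assumptions made for the example.
+

```python
def consolidate(p_genotoxic, p_non_genotoxic, min_prob=0.30, min_gap=0.10):
    """Pick the class with the higher probability and flag weak evidence.

    The two probabilities are treated as independent estimates, so they
    are not required to sum to 1. The cut-offs are illustrative only.
    """
    for p in (p_genotoxic, p_non_genotoxic):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    if p_genotoxic > p_non_genotoxic:
        label = "genotoxic"
    elif p_non_genotoxic > p_genotoxic:
        label = "non-genotoxic"
    else:
        label = "undecided"
    return {
        "label": label,
        # Neither class is well supported by the neighbours
        "low_support": max(p_genotoxic, p_non_genotoxic) < min_prob,
        # The two classes are almost equally supported
        "close_call": abs(p_genotoxic - p_non_genotoxic) < min_gap,
    }


# Hypothetical probabilities, NOT taken from the study
weak = consolidate(0.22, 0.27)
strong = consolidate(0.85, 0.10)
```

+
+A prediction such as the first example would receive a consolidated
+label, but both flags would signal that the label rests on weak and
+nearly balanced evidence.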
+
+In the DL (R-project) and RF models, *N*-oxides have a far higher
+genotoxic potential than tertiary PAs or dehydropyrrolizidines. As PA
+*N*-oxides are easily conjugated for excretion, they are generally
+considered detoxification products, which are quickly eliminated
+renally *in vivo* ([Chen et al. 2010](#_ENREF_26)). On the other hand,
+*N*-oxides can also be transformed back to the corresponding tertiary PA
+([Wang et al. 2005](#_ENREF_134)). It may therefore be questioned
+whether *N*-oxides themselves are generally less genotoxic than the
+corresponding tertiary PAs. However, among the modifications of the
+necine base, dehydropyrrolizidine, the toxic principle of PAs, should
+have had the highest genotoxic potential. Taken together, the
+predictions for the modifications of the necine base from the LAZAR,
+RF, and R-generated DL models cannot be considered reliable, in
+contrast to those of the TensorFlow DL model.
+
+Overall, when comparing the prediction results for the PAs with
+currently published knowledge, it can be concluded that the performance
+of most models was low to moderate. This might be attributed to the
+following issues:
+
+1. In the LAZAR model, only 26.6% of the PAs were within the stricter
+ AD. With the extended AD, 92.3% of the PAs could be included in the
+ prediction. Even though the Jaccard distance between the training
+ dataset and the PA dataset for the RF, SVM, and DL (R-project and
+ TensorFlow) models was small, suggesting a high similarity, LAZAR
+ indicated that PAs have only a few local neighbours, which might
+ adversely affect the prediction of the mutagenic potential of PAs.
+
+2. All above-mentioned models were used to predict the mutagenicity of
+ PAs. PAs are generally considered to be genotoxic, and their mode of
+ action is known. Therefore, the fact that some models predict the
+ majority of PAs as not genotoxic seems contradictory. To understand
+ this result, the basis of the models, the training dataset, has to
+ be considered. The mutagenicity data in the training dataset are
+ based on mutagenicity in bacteria. Some studies show mutagenicity of
+ PAs in the Ames test ([Chen et al. 2010](#_ENREF_26)). In addition,
+ [Rubiolo et al. (1992)](#_ENREF_116) examined several PAs and
+ several extracts of PA-containing plants in the Ames test. They
+ found that the Ames test was indeed able to detect mutagenicity of
+ PAs but, in general, appeared to have a low sensitivity. The
+ pre-incubation phase for metabolic activation of PAs by microsomal
+ enzymes was the sensitivity-limiting step. This limitation may well
+ be reflected in the QSAR models, too.
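+
+The notion of local neighbours within an AD can be illustrated with a
+minimal sketch. It assumes that each compound is represented as a set
+of structural features, that similarity is the Jaccard index of two
+such sets, and that the AD is a minimum similarity threshold. The value
+0.5 mirrors the stricter AD mentioned above under that assumption; the
+feature names, compound labels, and the relaxed threshold of 0.2 are
+hypothetical, and the code is not LAZAR's implementation.
+

```python
def jaccard_similarity(a, b):
    """Jaccard index of two feature sets (1.0 = identical, 0.0 = disjoint)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(union)


def jaccard_distance(a, b):
    """Jaccard distance, i.e. 1 minus the Jaccard index."""
    return 1.0 - jaccard_similarity(a, b)


def neighbours_within_ad(query, training, min_similarity):
    """Return (name, similarity) of training compounds inside the AD."""
    hits = []
    for name, features in training.items():
        similarity = jaccard_similarity(query, features)
        if similarity >= min_similarity:
            hits.append((name, similarity))
    # Most similar neighbours first
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical feature sets, NOT derived from real fingerprints
training = {
    "A": {"f1", "f2", "f3"},
    "B": {"f1", "f5", "f6"},
    "C": {"f1", "f2", "f7", "f8"},
}
query = {"f1", "f2", "f3", "f4"}

strict = neighbours_within_ad(query, training, min_similarity=0.5)
relaxed = neighbours_within_ad(query, training, min_similarity=0.2)
```

+
+In this toy example the strict threshold leaves a single neighbour,
+whereas the relaxed threshold admits a second, less similar one. This
+is the trade-off between coverage and similarity that an extended AD
+implies.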
+
+Conclusions
+===========
+
+In this study, an attempt was made to predict the genotoxic potential of
+PAs using five different machine learning models (LAZAR, RF, SVM, and DL
+(R-project and TensorFlow)). The results of all models fitted only
+partly to the findings in the literature, with the best results obtained
+with the TensorFlow DL model. Modelling therefore allows statements on
+the relative genotoxic risk of the different PA groups. Individual
+predictions for single PAs, however, do not appear reliable on the basis
+of the training dataset used.
+
+This study emphasises the importance of a critical assessment of
+predictions by QSAR models. This includes not only extensive literature
+research to assess the plausibility of the predictions, but also a good
+knowledge of the metabolism of the test substances and an understanding
+of possible mechanisms of toxicity.
+
+In further studies, additional machine learning techniques or a modified
+(extended) training dataset should be used for an additional attempt to
+predict the genotoxic potential of PAs.
+
+References
+==========
+
+[]{#_ENREF_4 .anchor}
+
+[]{#_ENREF_8 .anchor}
+
+[]{#_ENREF_17 .anchor}
+
+[]{#_ENREF_18 .anchor}
+
+[]{#_ENREF_20 .anchor}
+
+[]{#_ENREF_26 .anchor}
+
+[]{#_ENREF_33 .anchor}
+
+[]{#_ENREF_36 .anchor}
+
+[]{#_ENREF_38 .anchor}
+
+[]{#_ENREF_39 .anchor}
+
+[]{#_ENREF_45 .anchor}
+
+[]{#_ENREF_48 .anchor}
+
+[]{#_ENREF_57 .anchor}
+
+[]{#_ENREF_59 .anchor}
+
+[]{#_ENREF_63 .anchor}
+
+[]{#_ENREF_65 .anchor}
+
+[]{#_ENREF_71 .anchor}
+
+[]{#_ENREF_76 .anchor}
+
+[]{#_ENREF_78 .anchor}
+
+[]{#_ENREF_80 .anchor}
+
+[]{#_ENREF_82 .anchor}
+
+[]{#_ENREF_99 .anchor}
+
+[]{#_ENREF_104 .anchor}
+
+<https://openbabel.org/docs/dev/Fingerprints/intro.html>
+
+[]{#_ENREF_115 .anchor}
+
+[]{#_ENREF_116 .anchor}
+
+[]{#_ENREF_117 .anchor}
+
+[]{#_ENREF_119 .anchor}
+
+[]{#_ENREF_126 .anchor}
+
+[]{#_ENREF_134 .anchor}
+
+[]{#_ENREF_138 .anchor}
+
+[]{#_ENREF_139 .anchor}
+
+[]{#_ENREF_140 .anchor}
+
+[]{#_ENREF_142 .anchor}
+
+[]{#_ENREF_143
+.anchor}<http://www.yapcwsoft.com/dd/padeldescriptor/Descriptors.xls>
+
+[]{#_ENREF_148 .anchor}
+
+Aguer C, Gambarotta D, Mailloux RJ, Moffat C, Dent R, et al. 2011.
+Galactose enhances oxidative metabolism and reveals mitochondrial
+dysfunction in human primary muscle cells. PLoS One 6:e28536
+
+Ahmed SN, Siddiqi ZA. 2006. Antiepileptic drugs and liver disease.
+Seizure 15:156-64
+
+Aleo MD, Luo Y, Swiss R, Bonin PD, Potter DM, Will Y. 2014. Human
+drug-induced liver injury severity is highly associated with dual
+inhibition of liver mitochondrial function and bile salt export pump.
+Hepatology (Baltimore, Md) 60:1015-22
+
+ANZFA. 2001. Pyrrolizidine alkaloids in food. A Toxicological Review and
+Risk Assessment. ed. Authority, ANZF, pp. 1-16
+
+Armstrong SJ, Zuckerman AJ, Bird RG. 1972. Induction of morphological
+changes in human embryo liver cells by the pyrrolizidine alkaloid
+lasiocarpine. British journal of experimental pathology 53:145-9
+
+Barysz M, Jashari G, Lall RS, Srivastava AK, Trinajstic N. 1983. On the
+distance matrix of molecules containing heteroatoms. In *Chemical
+Applications of Topology and Graph Theory*, pp. 222-30. Amsterdam, The
+Netherlands: Elsevier
+
+Basak SC, Harriss DK, Magnuson VR. Comparative Study of Lipophilicity
+*versus* Topological Molecular Descriptors in Biological Correlations.
+Journal of Pharmaceutical Sciences 73:429-37
+
+Bender A, Mussa HY, Glen RC, Reiling S. 2004. Molecular similarity
+searching using atom environments, information-based feature selection,
+and a naive Bayesian classifier. J Chem Inf Comput Sci 44:170-8
+
+Benichou C, Danan G, Flahault A. 1993. Causality assessment of adverse
+reactions to drugs\--II. An original model for validation of drug
+causality assessment methods: case reports with positive rechallenge. J
+Clin Epidemiol 46:1331-6
+
+Bergmeir C, Benítez JM. 2012. Neural Networks in R Using the Stuttgart
+Neural Network Simulator: RSNNS. Journal of Statistical Software 46:1-26
+
+Bishop-Bailey D, Thomson S, Askari A, Faulkner A, Wheeler-Jones C. 2014.
+Lipid-metabolizing CYPs in the regulation and dysregulation of
+metabolism. Annu Rev Nutr 34:261-79
+
+Blower PE, Cross KP. 2006. Decision Tree Methods in Pharmaceutical
+Research. Current topics in medicinal chemistry 6:31-9
+
+Boelsterli UA, Lee KK. 2014. Mechanisms of isoniazid-induced
+idiosyncratic liver injury: emerging role of mitochondrial stress.
+Journal of gastroenterology and hepatology 29:678-87
+
+Bramer M. 2013. Principles of Data Mining. p. 444: Springer-Verlag
+
+Breiman L. 2001. Random Forests. Machine Learning 45:5-32
+
+Breiman L. 2003. Manual-Setting Up, Using, And Understanding Random
+Forests V4.0.1-33
+
+Bull LB, Dick AT. 1959. The chronic pathological effects on the liver of
+the rat of the pyrrolizidine alkaloids heliotrine, lasiocarpine and
+their N-oxides. J Path Bact 78:483-502
+
+Bull LB, Dick AT, McKenzie JS. 1958. The acute toxic effects of
+heliotrine and lasiocarpine, and their N-oxides, on the rat. J Path Bact
+75:17-25
+
+Burden FR. 1989. Molecular identification number for substructure
+searches. Journal of Chemical Information and Computer Sciences 29:225-7
+
+Butler WH, Mattocks AR, Barnes JM. 1970. Lesions in the liver and lungs
+of rats given pyrrole derivatives of pyrrolizidine alkaloids. J Path
+100:169-75
+
+Chai J, He Y, Cai SY, Jiang Z, Wang H, et
+al. 2012. Elevated hepatic multidrug resistance-associated protein
+3/ATP-binding cassette subfamily C 3 expression in human obstructive
+cholestasis is mediated through tumor necrosis factor alpha and c-Jun
+NH2-terminal kinase/stress-activated protein kinase-signaling pathway.
+Hepatology 55:1485-94
+
+Chalhoub WM, Sliman KD, Arumuganathan M, Lewis JH. 2014. Drug-induced
+liver injury: what was new in 2013? Expert Opin Drug Metab Toxicol
+10:959-80
+
+Chawla NV, Bowyer KW, Hall LO. 2002. SMOTE: Synthetic Minority
+Over-sampling Technique. Journal of Artificial Intelligence Research
+16:321-57
+
+Chen M, Borlak J, Tong W. 2013. High lipophilicity and high daily dose
+of oral medications are associated with significant risk for
+drug-induced liver injury. Hepatology (Baltimore, Md) 58:388-96
+
+Chen M, Suzuki A, Thakkar S, Yu K, Hu C, Tong W. 2016. DILIrank: the
+largest reference drug list ranked by the risk for developing
+drug-induced liver injury in humans. Drug Discov Today 21:648-53
+
+Chen T, Mei N, Fu PP. 2010. Genotoxicity of pyrrolizidine alkaloids. J
+Appl Toxicol 30:183-96
+
+Crabtree HG. 1928. The carbohydrate metabolism of certain pathological
+overgrowths. Biochem J 22:1289-98
+
+Daly AK, Donaldson PT, Bhatnagar P, Shen Y, Pe\'er I, et al. 2009.
+HLA-B\*5701 genotype is a major determinant of drug-induced liver injury
+due to flucloxacillin. Nature genetics 41:816-9
+
+Danan G, Benichou C. 1993. Causality assessment of adverse reactions to
+drugs\--I. A novel method based on the conclusions of international
+consensus meetings: application to drug-induced liver injuries. J Clin
+Epidemiol 46:1323-30
+
+Dar AC, Shokat KM. 2011. The evolution of protein kinase inhibitors from
+antagonists to agonists of cellular signaling. Annu Rev Biochem
+80:769-95
+
+de Wildt SN, Kearns GL, Leeder JS, van den Anker JN. 1999. Cytochrome
+P450 3A: ontogeny and drug disposition. Clin Pharmacokinet 37:485-505
+
+DeLeve LD, Ito Y, Bethea NW, McCuskey MK, Wang X, McCuskey RS. 2003.
+Embolization by sinusoidal lining cells obstructs the microcirculation
+in rat sinusoidal obstruction syndrome. Am J Physiol Gastrointest Liver
+Physiol 284:G1045-G52
+
+DeLeve LD, Wang X, Kuhlenkamp JF, Kaplowitz N. 1996. Toxicity of
+Azathioprine and Monocrotaline in Murine Sinusoidal Endothelial Cells
+and Hepatocytes: The Role of Glutathione and Relevance to Hepatic
+Venoocclusive Disease. Hepatology 23:589-99
+
+Dong H, Haining RL, Thummel KE, Rettie AE, Nelson SD. 2000. Involvement
+of human cytochrome P450 2D6 in the bioactivation of acetaminophen. Drug
+Metab Dispos 28:1397-400
+
+Doostdar H, Grant MH, Melvin WT, Wolf CR, Burke MD. 1993. The effects of
+inducing agents on cytochrome P450 and UDP-glucuronyltransferase
+activities in human HEPG2 hepatoma cells. Biochemical pharmacology
+46:629-35
+
+EFSA. 2011. Scientific Opinion on Pyrrolizidine alkaloids in food and
+feed. EFSA Journal 9:1-134
+
+Ekins S, Williams AJ, Xu JJ. 2010. A predictive ligand-based Bayesian
+model for human drug-induced liver injury. Drug Metab. Dispos. 38:2302-8
+
+EMA. 2014. EMA/HMPC/893108/2011: Public statement on the use of herbal
+medicinal products containing toxic, unsaturated pyrrolizidine alkaloids
+(PAs). 1-24
+
+EMA. 2016. EMA/HMPC/328782/2016: Public statement on contamination of
+herbal medicinal products/traditional herbal medicinal products with
+pyrrolizidine alkaloids. 1-11
+
+Fashe MM, Juvonen RO, Petsalo
+A, Vepsalainen J, Pasanen M, Rahnasto-Rilla M. 2015. In silico
+prediction of the site of oxidation by cytochrome P450 3A4 that leads to
+the formation of the toxic metabolites of pyrrolizidine alkaloids. Chem
+Res Toxicol 28:702-10
+
+Field RA, Stegelmeier BL, Colegate SM, Brown AW, Green BT. 2015. An in
+vitro comparison of the cytotoxic potential of selected
+dehydropyrrolizidine alkaloids and some N-oxides. Toxicon 97:36-45
+
+Fleming I. 2014. The pharmacology of the cytochrome P450
+epoxygenase/soluble epoxide hydrolase axis in the vasculature and
+cardiovascular disease. Pharmacol Rev 66:1106-40
+
+Fonti V. 2017. *Feature Selection using LASSO*. Research paper. VU
+Amsterdam. 26 pp.
+
+Fu PP, Chou MW, Churchwell M, Wang Y, Zhao Y, et al. 2010.
+High-Performance Liquid Chromatography Electrospray Ionization Tandem
+Mass Spectrometry for the Detection and Quantitation of Pyrrolizidine
+Alkaloid-Derived DNA Adducts in Vitro and in Vivo. Chem Res Toxicol
+23:637-52
+
+Fu PP, Xia Q, Lin G, Chou MW. 2004. Pyrrolizidine
+alkaloids\--genotoxicity, metabolism enzymes, metabolic activation, and
+mechanisms. Drug Metab Rev 36:1-55
+
+Galeotti N, Vivoli E, Bilia AR, Vincieri FF, Ghelardini C. 2010. St.
+John\'s wort reduces neuropathic pain through a hypericin-mediated
+inhibition of the protein kinase Cgamma and epsilon activity. Biochem
+Pharmacol 79:1327-36
+
+Ganesan S, Tekwani BL, Sahu R, Tripathi LM, Walker LA. 2009. Cytochrome
+P(450)-dependent toxic effects of primaquine on human erythrocytes.
+Toxicol Appl Pharmacol 241:14-22
+
+Gao H, Ruan JQ, Chen J, Li N, Ke CQ, et al. 2015. Blood pyrrole-protein
+adducts as a diagnostic and prognostic index in pyrrolizidine
+alkaloid-hepatic sinusoidal obstruction syndrome. Drug Des Devel Ther
+9:4861-8
+
+Gitlin N. 1980. Salicylate hepatotoxicity: the potential role of
+hypoalbuminemia. J Clin Gastroenterol 2:281-5
+
+Gordon GJ, Coleman WB, Grisham JW. 2000. Bax-mediated apoptosis in the
+livers of rats after partial hepatectomy in the retrorsine model of
+hepatocellular injury. Hepatology 32:312-20
+
+Gradhand U, Lang T, Schaeffeler E, Glaeser H, Tegude H, et al. 2008.
+Variability in human hepatic MRP4 expression: influence of cholestasis
+and genotype. Pharmacogenomics J 8:42-52
+
+Gramatica P, Corradi M, Consonni V. 2000. Modelling and prediction of
+soil sorption coefficients of non-ionic organic pesticides by molecular
+descriptors. Chemosphere 41:763-77
+
+Greene N, Fisk L, Naven RT, Note RR, Patel ML, Pelletier DJ. 2010.
+Developing structure-activity relationships for the prediction of
+hepatotoxicity. Chemical Research in Toxicology 23:1215-22
+
+Guo YX, Xu XF, Zhang QZ, Li C, Deng Y, et al. 2015. The inhibition of
+hepatic bile acids transporters Ntcp and Bsep is involved in the
+pathogenesis of isoniazid/rifampicin-induced hepatotoxicity. Toxicology
+mechanisms and methods 25:382-7
+
+Hall LH, Kier LB. 1995. Electrotopological State Indices for Atom Types:
+A Novel Combination of Electronic, Topological, and Valence State
+Information. Journal of Chemical Information and Computer Sciences
+35:1039-45
+
+Hammann F, Schoning V, Drewe J. 2018. Prediction of clinically relevant
+drug-induced liver injury from structure using machine learning. J Appl
+Toxicol
+
+Hansen K, Mika S, Schroeter T, Sutter A, ter Laak A, et al. 2009.
+Benchmark data set for in silico prediction of Ames mutagenicity. J Chem
+Inf Model 49:2077-81
+
+Hartmann T, Ehmke A, Eilert U, von Borstel K, Theuring C. 1989. Sites of
+synthesis, translocation and accumulation of
+pyrrolizidine alkaloid N-oxides in Senecio vulgaris L. Planta
+177:98-107
+
+Hartmann T, Witte L. 1995. Chemistry, Biology and Chemoecology of the
+Pyrrolizidine Alkaloids. In *Alkaloids: Chemical and Biological
+Perspectives*, ed. Pelletier, pp. 155-233. Pergamon, London, New York
+
+Hessel S, Gottschalk C, Schumann D, These A, Preiss-Weigert A, Lampen A.
+2014. Structure-activity relationship in the passage of different
+pyrrolizidine alkaloids through the gastrointestinal barrier: ABCB1
+excretes heliotrine and echimidine. Mol Nutr Food Res 58:995-1004
+
+Hunt CM, Westerkam WR, Stave GM. 1992. Effect of age and gender on the
+activity of human hepatic CYP3A. Biochemical pharmacology 44:275-83
+
+Ibanez L, Perez E, Vidal X, Laporte JR, Grup d\'Estudi Multicenteric
+d\'Hepatotoxicitat Aguda de B. 2002. Prospective surveillance of acute
+serious liver disease unrelated to infectious, obstructive, or metabolic
+diseases: epidemiological and clinical features, and exposure to drugs.
+J Hepatol 37:592-600
+
+ICH. 2011. Guidance on genotoxicity testing and data interpretation for
+pharmaceuticals intended for human use S2(R1). p. 29
+
+Iyer VV, Yang H, Ierapetritou MG, Roth CM. 2010. Effects of glucose and
+insulin on HepG2-C3A cell metabolism. Biotechnol Bioeng 107:347-56
+
+Jago MV. 1971. Factors affecting the chronic hepatotoxicity of
+pyrrolizidine alkaloids. The Journal of Pathology 105:1-11
+
+Jeon JY, Sparreboom A, Baker SD. 2017. Kinase Inhibitors: The Reality
+Behind the Success. Clin Pharmacol Ther 102:726-30
+
+Jeong W, Doroshow JH, Kummar S. 2013. United States Food and Drug
+Administration approved oral kinase inhibitors for the treatment of
+malignancies. Curr Probl Cancer 37:110-44
+
+Ji L, Chen Y, Liu T, Wang Z. 2008. Involvement of Bcl-xL degradation and
+mitochondrial-mediated apoptotic pathway in pyrrolizidine
+alkaloids-induced apoptosis in hepatocytes. Toxicol Appl Pharmacol
+231:393-400
+
+Jornil J, Nielsen TS, Rosendal I, Ahlner J, Zackrisson AL, et al. 2013.
+A poor metabolizer of both CYP2C19 and CYP2D6 identified by mechanistic
+pharmacokinetic simulation in a fatal drug poisoning case involving
+venlafaxine. Forensic Sci Int 226:e26-31
+
+Kalthoff S, Ehmer U, Freiberg N, Manns MP, Strassburg CP. 2010.
+Interaction between oxidative stress sensor Nrf2 and
+xenobiotic-activated aryl hydrocarbon receptor in the regulation of the
+human phase II detoxifying UDP-glucuronosyltransferase 1A10. J Biol Chem
+285:5993-6002
+
+Kazius J, McGuire R, Bursi R. 2005. Derivation and validation of
+toxicophores for mutagenicity prediction. J Med Chem 48:312-20
+
+Khan D, Khan AU. 2016. Descriptors and their selection methods in QSAR
+analysis: paradigm for drug design. Drug Discov Today 21:1291-302
+
+Kim HY, Stermitz FR, Molyneux RJ, Wilson DW, Taylor D, Coulombe RA, Jr.
+1993. Structural influences on pyrrolizidine alkaloid-induced
+cytopathology. Toxicol Appl Pharmacol 122:61-9
+
+Kock K, Ferslew BC, Netterberg I, Yang K, Urban TJ, et al. 2014. Risk
+factors for development of cholestatic drug-induced liver injury:
+inhibition of hepatic basolateral bile acid transporters multidrug
+resistance-associated proteins 3 and 4. Drug Metab Dispos 42:665-74
+
+Lammert C, Einarsson S, Saha C, Niklasson A, Bjornsson E, Chalasani N.
+2008. Relationship between daily dose of oral medications and
+idiosyncratic drug-induced liver injury: search for signals. Hepatology
+47:2003-9
+
+Langel D, Ober D, Pelser PB. 2011. The evolution of
+pyrrolizidine alkaloid biosynthesis and diversity in the Senecioneae.
+Phytochemistry Reviews 10:3-74
+
+Lasser KE, Allen PD, Woolhandler SJ, Himmelstein DU, Wolfe SM, Bor DH.
+2002. Timing of new black box warnings and withdrawals for prescription
+medications. JAMA 287:2215-20
+
+Li N, Xia Q, Ruan J, Fu PP, Lin G. 2011. Hepatotoxicity and
+Tumorigenicity Induced by Metabolic Activation of Pyrrolizidine
+Alkaloids in Herbs. Current Drug Metabolism 12
+
+Li X, Cameron MD. 2012. Potential role of a quetiapine metabolite in
+quetiapine-induced neutropenia and agranulocytosis. Chem Res Toxicol
+25:1004-11
+
+Li YH, Kan WL, Li N, Lin G. 2013. Assessment of pyrrolizidine
+alkaloid-induced toxicity in an in vitro screening model. J
+Ethnopharmacol 150:560-7
+
+Lima A, Bernardes M, Azevedo R, Medeiros R, Seabra V. 2015.
+Pharmacogenomics of Methotrexate Membrane Transport Pathway: Can
+Clinical Response to Methotrexate in Rheumatoid Arthritis Be Predicted?
+Int J Mol Sci 16:13760-80
+
+Lin G. 1998. Microsomal Formation of a Pyrrolic Alcohol Glutathione
+Conjugate of Clivorine. Firm Evidence for the Formation of a Pyrrolic
+Metabolite of an Otonecine-Type Pyrrolizidine Alkaloid. Drug Metabolism
+and Disposition 26:181-4
+
+Lindigkeit R, Biller A, Buch M, Schiebel H-M, Boppré M, Hartmann T.
+1997. The two faces of pyrrolizidine alkaloids: the role of the tertiary
+amine and its N-oxide in chemical defense of insects with acquired plant
+alkaloids. Eur J Biochem 245
+
+Makhlouf HA, Helmy A, Fawzy E, El-Attar M, Rashed HA. 2008. A
+prospective study of antituberculous drug-induced hepatotoxicity in an
+area endemic for liver diseases. Hepatol Int 2:353-60
+
+Marin-Hernandez A, Rodriguez-Enriquez S, Vital-Gonzalez PA,
+Flores-Rodriguez FL, Macias-Silva M, et al. 2006. Determining and
+understanding the control of glycolysis in fast-growth tumor cells. Flux
+control by an over-expressed but strongly product-inhibited hexokinase.
+FEBS J 273:1975-88
+
+Marroquin LD, Hynes J, Dykens JA, Jamieson JD, Will Y. 2007.
+Circumventing the Crabtree effect: replacing media glucose with
+galactose increases susceptibility of HepG2 cells to mitochondrial
+toxicants. Toxicol Sci 97:539-47
+
+Mattocks AR. 1986. *Chemistry and Toxicology of Pyrrolizidine
+Alkaloids*: Academic Press
+
+Meharena HS, Chang P, Keshwani MM, Oruganty K, Nene AK, et al. 2013.
+Deciphering the structural basis of eukaryotic protein kinase
+regulation. PLoS Biol 11:e1001680
+
+Merz KH, Schrenk D. 2016. Interim relative potency factors for the
+toxicological risk assessment of pyrrolizidine alkaloids in food and
+herbal medicines. Toxicol Lett 263:44-57
+
+Miners JO, Birkett DJ. 1998. Cytochrome P4502C9: an enzyme of major
+importance in human drug metabolism. British Journal of Clinical
+Pharmacology 45:525-38
+
+Mingard C, Paech F, Bouitbir J, Krahenbuhl S. 2018. Mechanisms of
+toxicity associated with six tyrosine kinase inhibitors in human
+hepatocyte cell lines. J Appl Toxicol 38:418-31
+
+Mingatto FE, Dorta DJ, dos Santos AB, Carvalho I, da Silva CH, et al.
+2007. Dehydromonocrotaline inhibits mitochondrial complex I. A potential
+mechanism accounting for hepatotoxicity of monocrotaline. Toxicon
+50:724-30
+
+Mitchell JB. 2014. Machine learning methods in chemoinformatics. Wiley
+Interdiscip Rev Comput Mol Sci 4:468-81
+
+Morgan RE, Trauner M, van Staden CJ, Lee PH, Ramachandran B, et al.
+2010. Interference with bile salt export pump function is a
+susceptibility factor for human liver injury in drug development.
+Toxicol Sci 118:485-500
+
+Muegge I, Mukherjee P. 2016. An overview of molecular
+fingerprint similarity search in virtual screening. Expert Opin Drug
+Discov 11:137-48
+
+Najibi A, Heidari R, Zarifi J, Jamshidzadeh A, Firoozabadi N, Niknahad
+H. 2016. Evaluating the Role of Drug Metabolism and Reactive
+Intermediates in Trazodone-Induced Cytotoxicity toward Freshly-Isolated
+Rat Hepatocytes. Drug Res (Stuttg) 66:592-6
+
+Nantasenamat C, Isarankura-Na-Ayudhya C, Naenna T, Prachayasittikul V.
+2009. A Practical Overview of Quantitative Structure-Activity
+Relationship. EXCLI Journal 8:74-88
+
+National Cancer Institute. 2006. Common Terminology Criteria for Adverse
+Events v3.0 (CTCAE). ed. Program, CTE
+
+Neumann MG, Cohen LB, Opris M, Nanau R, Jeong H. 2015. Hepatotoxicity of
+Pyrrolizidine Alkaloids. J Pharm Pharm Sci 18:825-43
+
+Newby D, Freitas AA, Ghafourian T. 2015. Decision trees to characterise
+the roles of permeability and solubility on the prediction of oral
+absorption. Eur J Med Chem 90:751-65
+
+Niederer C, Behra R, Harder A, Schwarzenbach RP, Escher BI. 2004.
+Mechanistic approaches for evaluating the toxicity of reactive
+organochlorines and epoxides in green algae. Environmental Toxicology
+and Chemistry 23:697-704
+
+NTP. 1978. Bioassay of lasiocarpine for possible carcinogenicity. pp.
+1-82
+
+NTP. 2003. Toxicology and Carcinogenesis Studies of Riddelliine (CAS No.
+23246-96-0) in F344/N Rats And B6c3F~1~ Mice (Gavage Studies). ed.
+Health, NIo
+
+O\'Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR.
+2011. Open Babel: An open chemical toolbox. J Cheminform 3:33
+
+Open Babel community. 2011. *Molecular fingerprints and similarity
+searching - Open Babel v2.3.1 documentation. Openbabel.org*. , December
+31, 2018
+
+Paech F, Bouitbir J, Krahenbuhl S. 2017. Hepatocellular Toxicity
+Associated with Tyrosine Kinase Inhibitors: Mitochondrial Damage and
+Inhibition of Glycolysis. Front Pharmacol 8:367
+
+Parkinson A, Mudra DR, Johnson C, Dwyer A, Carroll KM. 2004. The effects
+of gender, age, ethnicity, and liver cirrhosis on cytochrome P450 enzyme
+activity in human liver microsomes and inducibility in cultured human
+hepatocytes. Toxicol Appl Pharmacol 199:193-209
+
+Pellinen P, Honkakoski P, Stenback F, Niemitz M, Alhava E, et al. 1994.
+Cocaine N-demethylation and the metabolism-related hepatotoxicity can be
+prevented by cytochrome P450 3A inhibitors. Eur J Pharmacol 270:35-43
+
+Regev A, Seeff LB, Merz M, Ormarsdottir S, Aithal GP, et al. 2014.
+Causality assessment for suspected DILI during clinical phases of drug
+development. Drug Saf 37 Suppl 1:S47-56
+
+Rendic S. 2002. Summary of information on human CYP enzymes: human P450
+metabolism data. Drug Metab Rev 34:83-448
+
+Reuben A, Koch DG, Lee WM, Acute Liver Failure Study G. 2010.
+Drug-induced acute liver failure: results of a U.S. multicenter,
+prospective study. Hepatology 52:2065-76
+
+Rodrigues AC. 2010. Efflux and uptake transporters as determinants of
+statin response. Expert Opin Drug Metab Toxicol 6:621-32
+
+Roskoski R, Jr. 2015. A historical overview of protein kinases and their
+targeted small molecule inhibitors. Pharmacol Res 100:1-23
+
+Ruan J, Liao C, Ye Y, Lin G. 2014a. Lack of metabolic activation and
+predominant formation of an excreted metabolite of nontoxic
+platynecine-type pyrrolizidine alkaloids. Chem Res Toxicol 27:7-16
+
+Ruan J, Yang M, Fu P, Ye Y, Lin G. 2014b. Metabolic activation of
+pyrrolizidine alkaloids: insights into the structural and enzymatic
+basis. Chem Res Toxicol 27:1030-9
+
+Rubiolo P, Pieters L, Calomme M, Bicchi C, Vlietinck A, Vanden Berghe D.
+1992. Mutagenicity of pyrrolizidine alkaloids in the Salmonella
+typhimurium/mammalian microsome system. Mutat Res 281:143-7
+
+Rücker C, Rücker G, Meringer M. 2007. y-Randomization and Its Variants
+in QSPR/QSAR. J. Chem. Inf. Model. 47:2345-57
+
+Schoental R, Head MA. 1957. Progression of liver
+lesions produced in rats by temporary treatment with pyrrolizidine
+(senecio) alkaloids, and the effects of betaine and high casein diet. Br
+J Cancer 11:535-44Schöning V, Hammann F, Peinl M, Drewe J. 2017.
+Editor\'s Highlight: Identification of Any Structure-Specific
+Hepatotoxic Potential of Different Pyrrolizidine Alkaloids Using Random
+Forests and Artificial Neural Networks. Toxicol Sci 160:361-70Shah RR,
+Morganroth J, Shah DR. 2013. Hepatotoxicity of tyrosine kinase
+inhibitors: clinical and regulatory perspectives. Drug Saf
+36:491-503Spjuth O, Alvarsson J, Berg A, Eklund M, Kuhn S, et al. 2009.
+Bioclipse 2: A scriptable integration platform for the life sciences.
+BMC Bioinformatics 10:1-5Spjuth O, Helmus T, Willighagen EL, Kuhn S,
+Eklund M, et al. 2007. Bioclipse: an open source workbench for chemo-
+and bioinformatics. BMC Bioinformatics 8:1-10Srinivas N, Sandeep KS,
+Anusha Y, Devendra BN. 2014. In Vitro Cytotoxic Evaluation and
+Detoxification of Monocrotaline (Mct) Alkaloid: An In Silico Approach.
+International Invention Journal of Biochemistry and Bioinformatics
+2:20-9Stine JG, Chalasani NP. 2017. Drug Hepatotoxicity: Environmental
+Factors. Clin Liver Dis 21:103-13Stine JG, Lewis JH. 2011. Drug-induced
+liver injury: a summary of recent advances. Expert Opin Drug Metab
+Toxicol 7:875-90Takanashi H, Umeda M, Hirono I. 1980. Chromosomal
+aberrations and mutations in cultured mammalidan cells induced by
+pyrrolizidine alkaloids. Mutation Research 78:67-77. Takeda M, Okamoto I,
+Nakagawa K. 2015. Pooled safety analysis of EGFR-TKI treatment for EGFR
+mutation-positive non-small cell lung cancer. Lung Cancer 88:74-9. Tamta
+H, Pawar RS, Wamer WG, Grundel E, Krynitsky AJ, Rader JI. 2012.
+Comparison of metabolism-mediated effects of pyrrolizidine alkaloids in
+a HepG2/C3A cell-S9 co-incubation system and quantification of their
+glutathione conjugates. Xenobiotica 42:1038-48. Teh LK, Bertilsson L.
+2012. Pharmacogenomics of CYP2D6: molecular genetics, interethnic
+differences and clinical importance. Drug Metab Pharmacokinet
+27:55-67. Teo YL, Ho HK, Chan A. 2013. Risk of tyrosine kinase
+inhibitors-induced hepatotoxicity in cancer patients: a meta-analysis.
+Cancer Treat Rev 39:199-206. Teo YL, Ho HK, Chan A. 2015. Formation of
+reactive metabolites and management of tyrosine kinase inhibitor-induced
+hepatotoxicity: a literature review. Expert Opin Drug Metab Toxicol
+11:231-42. Thompson RA, Isin EM, Ogese MO, Mettetal JT, Williams DP. 2016.
+Reactive Metabolites: Current and Emerging Risk and Hazard Assessments.
+Chem Res Toxicol 29:505-33. Walker K, Ginsberg G, Hattis D, Johns DO,
+Guyton KZ, Sonawane B. 2009. Genetic polymorphism in N-Acetyltransferase
+(NAT): Population distribution of NAT1 and NAT2 activity. Journal of
+toxicology and environmental health. Part B, Critical reviews
+12:440-72. Wang YP, Yan J, Fu PP, Chou MW. 2005. Human liver microsomal
+reduction of pyrrolizidine alkaloid N-oxides to form the corresponding
+carcinogenic parent alkaloid. Toxicol Lett 155:411-20. Weininger D. 1988.
+SMILES, a chemical language and information system. 1. Introduction to
+methodology and encoding rules. J Chem Inf Comput Sci 28:31-6. Westerink
+WM, Schoonen WG. 2007. Phase II enzyme levels in HepG2 cells and
+cryopreserved primary human hepatocytes and their induction in HepG2
+cells. Toxicol In Vitro 21:1592-602. Wu P, Nielsen TE, Clausen MH. 2015.
+FDA-approved small-molecule kinase inhibitors. Trends Pharmacol Sci
+36:422-39. Xia Q, Ma L, He X, Cai L, Fu PP. 2015. 7-glutathione pyrrole
+adduct: a potential DNA reactive metabolite of pyrrolizidine alkaloids.
+Chem Res Toxicol 28:615-20. Xia Q, Zhao Y, Von Tungeln LS, Doerge DR, Lin
+G, et al. 2013. Pyrrolizidine alkaloid-derived DNA adducts as a common
+biological biomarker of pyrrolizidine alkaloid-induced tumorigenicity.
+Chem Res Toxicol 26:1384-96. Yan J, Xia Q, Chou MW, Fu P. 2008. Metabolic
+activation of retronecine and retronecine N-oxide -- formation of
+DHP-derived DNA adducts. Toxicology and Industrial Health 24. Yang X, Li
+W, Sun Y, Guo X, Huang W, et al. 2017. Comparative Study of
+Hepatotoxicity of Pyrrolizidine Alkaloids Retrorsine and Monocrotaline.
+Chem Res Toxicol 30:532-9. Yap CW. 2011. PaDEL-descriptor: an open source
+software to calculate molecular descriptors and fingerprints. Journal of
+computational chemistry 32:1466-74. Yap CW. 2014. *Descriptors*. ,
+27.10.2016. Yu K, Geng X, Chen M, Zhang J, Wang B, et al. 2014a. High
+daily dose and being a substrate of cytochrome P450 enzymes are two
+important predictors of drug-induced liver injury. Drug Metab Dispos
+42:744-50. Yu K, Geng X, Chen M, Zhang J, Wang B, et al. 2014b. High daily
+dose and being a substrate of cytochrome P450 enzymes are two important
+predictors of drug-induced liver injury. Drug Metab. Dispos.
+42:744-50. Zanger UM, Turpeinen M, Klein K, Schwab M. 2008. Functional
+pharmacogenetics/genomics of human cytochromes P450 involved in drug
+biotransformation. Anal Bioanal Chem 392:1093-108. Zhang J, Sheng Y, Shi
+L, Zheng Z, Chen M, et al. 2017. Quercetin and baicalein suppress
+monocrotaline-induced hepatic sinusoidal obstruction syndrome in rats.
+Eur J Pharmacol 795:160-8. Zhao Y, Xia Q, Gamboa da Costa G, Yu H, Cai L,
+Fu PP. 2012. Full structure assignments of pyrrolizidine alkaloid DNA
+adducts and mechanism of tumor initiation. Chem Res Toxicol
+25:1985-96. Zheng Z, Shi L, Sheng Y, Zhang J, Lu B, Ji L. 2016.
+Chlorogenic acid suppresses monocrotaline-induced sinusoidal obstruction
+syndrome: The potential contribution of NFkappaB, Egr1, Nrf2, MAPKs and
+PI3K signals. Environ Toxicol Pharmacol 46:80-9. Zhu XW, Xin YJ, Ge HL.
+2015. Recursive Random Forests Enable Better Predictive Performance and
+Model Interpretation than Variable Selection by LASSO. J Chem Inf Model
+55:736-46