---
title: A comparison of nine machine learning mutagenicity models and their application for predicting pyrrolizidine alkaloids

author:
  - Christoph Helma:
      institute: ist
      email: helma@in-silico.ch
      correspondence: "yes"
  - Verena Schöning:
      institute: insel
  - Jürgen Drewe:
      institute: zeller, unibas
      email: juergendrewe@zellerag.ch
      correspondence: "yes"
  - Philipp Boss:
      institute: sysbio

institute:
  - ist:
      name: in silico toxicology gmbh
      address: "Rastatterstrasse 41, 4057 Basel, Switzerland"
  - zeller: 
      name: Max Zeller Söhne AG
      address: "Seeblickstrasse 4, 8590 Romanshorn, Switzerland"
  - sysbio:
      name: Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association
      address: "Robert-Rössle-Strasse 10, Berlin, 13125, Germany"
  - unibas:
      name: Clinical Pharmacology, Department of Pharmaceutical Sciences, University Hospital Basel, University of Basel
      address: "Petersgraben 4, 4031 Basel, Switzerland"
  - insel:
      name: Clinical Pharmacology and Toxicology, Department of General Internal Medicine, University Hospital Bern, University of Bern
      address: "Inselspital, 3010 Bern, Switzerland"

bibliography: bibliography.bib
keywords: mutagenicity, QSAR, lazar, random forest, support vector machine, linear regression, neural nets, deep learning, pyrrolizidine alkaloids, OpenBabel, CDK

#documentclass: frontiersHLTH
documentclass: scrartcl
tblPrefix: Table
figPrefix: Figure
header-includes:
    - \usepackage{lineno, color, setspace}
    - \doublespacing
    - \linenumbers
...

Abstract
========

Random forest, support vector machine, logistic regression, neural network and
k-nearest neighbor (`lazar`) algorithms were applied to a new *Salmonella*
mutagenicity dataset with {{cv.n_uniq}} unique chemical structures utilizing
MolPrint2D and Chemistry Development Kit (CDK) descriptors. Crossvalidation
accuracies of all investigated models ranged from 80 to 85%, which is comparable
with the interlaboratory variability of the *Salmonella* mutagenicity assay.
Pyrrolizidine alkaloid predictions showed a clear distinction between chemical
groups, where otonecines had the highest proportion of positive mutagenicity
predictions and monoesters the lowest.

Introduction
============

The assessment of mutagenicity is an important part of the safety assessment of
chemical structures, because mutations may lead to cancer and germ
cell damage. The *Salmonella typhimurium* bacterial reverse mutation
test (Ames test) is capable of identifying substances that cause mutations (e.g.,
base-pair substitutions, frameshifts, insertions, deletions) and is generally
used as the first step in genotoxicity and carcinogenicity assessments.

Computer-based (*in silico*) mutagenicity predictions can be used in the early
screening of novel compounds (e.g. drug candidates), but they are also gaining
regulatory acceptance, e.g. for the registration of industrial chemicals within
REACH (@ECHA2017) or the assessment of impurities in pharmaceuticals (ICH M7
guideline of the International Council for Harmonisation of Technical
Requirements for Pharmaceuticals for Human Use, @ICH2017).

Currently, *Salmonella* mutagenicity is the toxicological endpoint with the
largest amount of public data (almost 10000 structures), whereas datasets for
other endpoints typically contain only a few hundred compounds. The Ames test
itself is relatively reproducible with an interlaboratory variability of 80-85%
(@Piegorsch1991).

This makes the development of mutagenicity models also interesting from a
computational chemistry and machine learning point of view.  The relatively
large amount of public data reduces the probability of chance effects due to
small sample sizes and the reliability of the underlying assay reduces the risk
of overfitting experimental errors.

Within this study we attempted

  - to generate a new public mutagenicity training dataset, by combining the most comprehensive public datasets
  - to compare the performance of MolPrint2D (*MP2D*) fingerprints with Chemistry Development Kit (*CDK*) descriptors for mutagenicity predictions
  - to compare the performance of global QSAR models (random forests (*RF*), support vector machines (*SVM*), logistic regression (*LR*), neural nets (*NN*)) with local models (`lazar`)

To demonstrate the application of mutagenicity models to compounds with very
limited experimental data and to show their strengths and weaknesses, we decided
to apply them to {{pa.nr}} pyrrolizidine alkaloids (PAs).

Pyrrolizidine alkaloids (PAs) are characteristic metabolites of some plant
families, mainly: *Asteraceae*, *Boraginaceae*, *Fabaceae* and *Orchidaceae*
(@Hartmann1995, @Langel2011) and form a powerful defence mechanism against
herbivores. PAs are heterocyclic ester alkaloids composed of a necine base (two
fused five-membered rings joined by a single nitrogen atom) and a necic acid
(one or two carboxylic ester arms), occurring principally in two forms,
tertiary base PAs and PA N-oxides.

In mammals, PAs are mainly metabolized in the liver. There are three principal metabolic pathways for 1,2-unsaturated PAs (@Chen2010):

- Detoxification by hydrolysis of the ester bonds at positions C7 and C9 by non-specific esterases, releasing the necine base and necic acid

- Detoxification by N-oxidation of the necine base to form PA N-oxides, which can either be conjugated by phase II enzymes and then excreted, or converted back into the corresponding parent PA. This detoxification pathway is not possible for otonecine-type PAs, as they are N-methylated (see @fig:pa-schema, @Wang2005)

- Metabolic activation (toxification) by oxidation (for retronecine-type PAs) or oxidative N-demethylation (for otonecine-type PAs) by cytochrome P450 isoforms CYP2B and 3A (@Lin1998, @Ruan2014)

The latter reactions result in the formation of dehydropyrrolizidine (DHP), which is highly reactive and causes damage by forming adducts with proteins, lipids and DNA (@Chen2010). In addition, open diesters and macrocyclic PAs show reduced detoxification due to steric hindrance of the respective esterases (@Ruan2014).

Therefore, the mutagenic probability of PAs is highly dependent on the
structure of the necine base and necic acid (@Hadi2021; @Allemang2018,
@Louisse2019). However, due to the limited availability of pure substances,
only a limited number of PAs have been investigated experimentally in an Ames
test with regard to their structure-specific mutagenicity. To overcome this
bottleneck, the prediction of structure-specific mutagenic probabilities of PAs
with different machine learning models could provide further insights into the
mechanisms.

Materials and Methods
=====================

Data
----

### Mutagenicity training data

An identical training dataset was used for all models. The
training dataset was compiled from the following sources:

-   Kazius/Bursi Dataset (4337 compounds, @Kazius2005): <http://cheminformatics.org/datasets/bursi/cas_4337.zip>

-   Hansen Dataset (6513 compounds, @Hansen2009): <http://doc.ml.tu-berlin.de/toxbenchmark/Mutagenicity_N6512.csv>

-   EFSA Dataset (695 compounds, @EFSA2016): <https://data.europa.eu/euodp/data/storage/f/2017-0719T142131/GENOTOX%20data%20and%20dictionary.xls>

Mutagenicity classifications from the Kazius and Hansen datasets were used without
further processing. According to these publications, compounds were classified
as mutagenic if at least one positive result was obtained in *Salmonella
typhimurium* strains TA98, TA100, TA1535, TA1537, TA97, TA102 and TA1538, either
with or without metabolic activation by S9. *E. coli* results were not
considered in these databases. To achieve consistency with these datasets, EFSA
compounds were classified as mutagenic if at least one positive result was
found for the TA98 or TA100 *Salmonella* strains, either with or without metabolic
activation. The complete dataset contains chemicals from very diverse
application areas (e.g. pharmaceuticals, pesticides, industrial chemicals,
environmental contaminants).

Dataset merges were based on unique SMILES (*Simplified Molecular Input Line
Entry Specification*, @Weininger1989) strings of the compound structures.
Duplicated experimental data with the same outcome was merged into a single
value, because it is likely that it originated from the same experiment.
Contradictory results were kept as multiple measurements in the database. The
combined training dataset contains {{cv.n_uniq}} unique structures and {{cv.n}}
individual measurements.
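
The merge rule described above can be illustrated with a minimal Python sketch. It is not the code from the repository: the function name `merge_measurements`, the toy records and the assumption that SMILES strings have already been canonicalised are purely illustrative.

```python
from collections import defaultdict

def merge_measurements(records):
    """Merge (smiles, outcome) records from several sources.

    Identical outcomes for the same SMILES collapse into one value;
    contradictory outcomes are kept as separate measurements.
    """
    outcomes = defaultdict(set)
    for smiles, outcome in records:
        outcomes[smiles].add(outcome)
    # One row per distinct (structure, outcome) pair, in a stable order.
    return [(smi, out) for smi in sorted(outcomes) for out in sorted(outcomes[smi])]

# Hypothetical toy records; SMILES are assumed to be canonicalised already.
records = [
    ("c1ccccc1N", "mutagenic"),
    ("c1ccccc1N", "mutagenic"),   # duplicate with the same outcome, merged
    ("CCO", "non-mutagenic"),
    ("CCO", "mutagenic"),         # contradictory result, kept
]
merged = merge_measurements(records)
n_uniq = len({smi for smi, _ in merged})  # number of unique structures
n = len(merged)                           # number of individual measurements
```

In this toy example `n_uniq` and `n` play the role of the two counts reported for the combined dataset: structures are counted once, while contradictory outcomes add extra measurements.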

Source code for all data download, extraction and merge operations is publicly
available from the git repository <https://git.in-silico.ch/mutagenicity-paper>
under a GPL3 License. The new combined dataset can be found at
<https://git.in-silico.ch/mutagenicity-paper/tree/mutagenicity/mutagenicity.csv>.

### Pyrrolizidine alkaloid (PA) dataset

The pyrrolizidine alkaloid dataset was created from five independent necine
base substructure searches in PubChem (<https://pubchem.ncbi.nlm.nih.gov/>) and
compared to the PAs listed in the EFSA publication @EFSA2011 and the book by
@Mattocks1986 to ensure that all major PAs were included. PAs
mentioned in these publications that were not found among the downloaded
substances were searched individually in PubChem and, if available, downloaded
separately. Non-PA substances, duplicates, and isomers were removed from the
files, but artificial PAs, even if unlikely to occur in nature, were kept. The
resulting PA dataset comprised a total of {{pa.n}} different PAs.
Further details about the compilation of the PA dataset are described in @Schoening2017.


The PAs in the dataset were classified according to structural features. A
total of 9 different structural features were assigned to the necine base,
modifications of the necine base and to the necic acid (@fig:pa-schema):

![Structural features of pyrrolizidine alkaloids](figures/PA-Schema.png){#fig:pa-schema}

For the necine base, the following structural features were chosen:

  - Retronecine-type (1,2-unsaturated necine base, {{pa.groups.Retronecine.n}} compounds)
  - Otonecine-type (1,2-unsaturated necine base, {{pa.groups.Otonecine.n}} compounds)
  - Platynecine-type (1,2-saturated necine base, {{pa.groups.Platynecine.n}} compounds)

For the modifications of the necine base, the following structural features were chosen:

  - N-oxide-type ({{pa.groups.N_oxide.n}} compounds)
  - Dehydropyrrolizidine-type (DHP, pyrrolic ester, {{pa.groups.Dehydropyrrolizidine.n}} compounds)
  - Tertiary-type (PAs which were neither from the N-oxide- nor DHP-type, {{pa.groups.Tertiary_PA.n}} compounds)

For the necic acid, the following structural features were chosen:

  - Monoester-type ({{pa.groups.Monoester.n}} compounds)
  - Open-ring diester-type ({{pa.groups.Diester.n}} compounds)
  - Macrocyclic diester-type ({{pa.groups.Macrocyclic_diester.n}} compounds)

Descriptors
-----------

### MolPrint2D (*MP2D*) fingerprints

MolPrint2D fingerprints (@OBoyle2011a) use atom environments as molecular
representation. For each atom in a molecule, they record the atom types of
its connected atoms to represent its chemical environment. This basically
resembles the chemical concept of functional groups.

In contrast to predefined lists of fragments (e.g. FP3, FP4 or MACCS
fingerprints) or descriptors (e.g. CDK), they are generated dynamically from
chemical structures. This has the advantage that they can capture unknown
substructures of toxicological relevance that are not included in other
descriptors. In addition, they allow the efficient calculation of chemical
similarities (e.g. Tanimoto indices) with simple set operations.

MolPrint2D fingerprints were calculated with the OpenBabel cheminformatics
library (@OBoyle2011a) for the complete training dataset with {{cv.n}}
instances. They can be obtained from the following locations:

*Training data:*

  - sparse representation (<https://git.in-silico.ch/mutagenicity-paper/tree/mutagenicity/mp2d/fingerprints.mp2d>)
  - descriptor matrix (<https://git.in-silico.ch/mutagenicity-paper/tree/mutagenicity/mp2d/mutagenicity-fingerprints.csv.gz>)

*Pyrrolizidine alkaloids:*

  - sparse representation (<https://git.in-silico.ch/mutagenicity-paper/tree/pyrrolizidine-alkaloids/mp2d/fingerprints.mp2d>)
  - descriptor matrix (<https://git.in-silico.ch/mutagenicity-paper/tree/pyrrolizidine-alkaloids/mp2d/pa-fingerprints.csv.gz>)

### Chemistry Development Kit (*CDK*) descriptors

Molecular 1D and 2D descriptors were calculated with the PaDEL-Descriptors
program (<http://www.yapcwsoft.com> version 2.21, @Yap2011). PaDEL uses the
Chemistry Development Kit (*CDK*, <https://cdk.github.io/index.html>) library
for descriptor calculations.

Because the training dataset is large ({{cv.n}} instances), all instances
where CDK descriptor calculations failed were deleted during pre-processing.
Furthermore, all substances with contradictory experimental mutagenicity data
were removed. The final training dataset contained {{cv.cdk.n_descriptors}}
descriptors for {{cv.cdk.n_compounds}} compounds.

CDK training data can be obtained from <https://git.in-silico.ch/mutagenicity-paper/tree/mutagenicity/cdk/mutagenicity-mod-2.new.csv>.

The same procedure was applied to the pyrrolizidine dataset, yielding
{{pa.cdk.n_descriptors}} descriptors for {{pa.cdk.n_compounds}}
compounds. CDK features for pyrrolizidine alkaloids are available at <https://git.in-silico.ch/mutagenicity-paper/tree/pyrrolizidine-alkaloids/cdk/PA-Padel-2D_m2.csv>.

Algorithms
----------

### `lazar`

`lazar` (*lazy structure activity relationships*) is a modular framework
for read-across model development and validation. Its basic workflow is as
follows. For a given chemical structure `lazar`:

-   searches in a database for similar structures (neighbours) with
    experimental data,

-   builds a local QSAR model with these neighbours and

-   uses this model to predict the unknown activity of the query
    compound.

This procedure resembles an automated version of read-across predictions
in toxicology. In machine learning terms it would be classified as a
k-nearest-neighbour algorithm.

Apart from this basic workflow, `lazar` is completely modular and allows
the researcher to use arbitrary algorithms for similarity searches and local
QSAR (*Quantitative structure--activity relationship*) modelling.
Algorithms used within this study are described in the following
sections.

#### Feature preprocessing

MolPrint2D features were used without preprocessing. Near zero variance and
strongly correlated CDK descriptors were removed and the remaining descriptor
values were centered and scaled. Preprocessing was performed with the R `caret`
preProcess function using the methods "nzv","corr","center" and "scale" with
default settings.

#### Neighbour identification

Utilizing this modularity, similarity calculations were based both on
MolPrint2D fingerprints and on CDK descriptors.

For MolPrint2D fingerprints chemical similarity between two compounds $a$ and
$b$ is expressed as the proportion between atom environments common in both
structures $A \cap B$ and the total number of atom environments $A \cup B$
(Jaccard/Tanimoto index).

$$sim = \frac{\lvert A\  \cap B \rvert}{\lvert A\  \cup B \rvert}$$

For CDK descriptors chemical similarity between two compounds $a$ and $b$ is
expressed as the cosine similarity between the descriptor vectors $A$ for $a$
and $B$ for $b$.

$$sim = \frac{A \cdot B}{\lvert A \rvert \lvert B \rvert}$$
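
Both similarity measures can be written down in a few lines of plain Python. The following is a minimal sketch for illustration only, not the `lazar` implementation; the function names and the toy feature identifiers are assumptions.

```python
import math

def tanimoto(a, b):
    """Jaccard/Tanimoto index of two sets of atom environments."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cosine(x, y):
    """Cosine similarity of two equally long descriptor vectors."""
    dot = sum(p * q for p, q in zip(x, y))
    norm = math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y))
    return dot / norm if norm else 0.0

# Hypothetical atom-environment identifiers and descriptor values
fp_a = {"env1", "env2", "env3"}
fp_b = {"env2", "env3", "env4"}
print(tanimoto(fp_a, fp_b))            # 2 shared / 4 total = 0.5
print(cosine([1.0, 0.0], [1.0, 1.0]))
```

Because fingerprints are handled as sets, the Tanimoto index needs only an intersection and a union, which is why the calculation is cheap even for large training sets.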


Threshold selection is a trade-off between prediction accuracy (high
threshold) and the number of predictable compounds (low threshold). As
it is in many practical cases desirable to make predictions even in the
absence of closely related neighbours, we follow a tiered approach:

-   First a similarity threshold of 0.5 (MP2D/Tanimoto) or 0.9 (CDK/Cosine) is
    used to collect neighbours, to create a local QSAR model and to make a
    prediction for the query compound. These are predictions with *high
    confidence*.

-   If any of these steps fails, the procedure is repeated with a similarity
    threshold of 0.2 (MP2D/Tanimoto) or 0.7 (CDK/Cosine) and the prediction is
    flagged with a warning that it might be out of the applicability domain of
    the training data (*low confidence*).

-   These similarity thresholds are the default values chosen
    by software developers and remained unchanged during the
    course of these experiments.

Compounds with the same structure as the query structure are
automatically eliminated from neighbours to obtain unbiased predictions
in the presence of duplicates.
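
The tiered neighbour search can be sketched as follows. This is a simplified illustration under stated assumptions, not the `lazar` source code: the function and variable names are hypothetical, a tier is treated as failed only when no neighbour is found, and the comparison `>=` against the threshold is an assumption.

```python
def find_neighbours(query_id, query_features, training, similarity,
                    thresholds=(0.5, 0.2)):
    """Return (neighbours, confidence) using a tiered similarity threshold.

    training:   list of (structure_id, features, activity) tuples
    similarity: function(features_a, features_b) -> float
    thresholds: (high, low), e.g. (0.5, 0.2) for MP2D or (0.9, 0.7) for CDK
    """
    # Similarities to all training structures except the query itself.
    scored = [
        (similarity(query_features, features), activity)
        for structure_id, features, activity in training
        if structure_id != query_id
    ]
    for threshold, confidence in zip(thresholds, ("high", "low")):
        neighbours = [(sim, act) for sim, act in scored if sim >= threshold]
        if neighbours:
            return neighbours, confidence
    return [], None  # no similar compounds: no prediction is made

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical toy training set
training = [
    ("t1", {"a", "b", "c"}, "mutagenic"),
    ("t2", {"a", "x", "y", "z"}, "non-mutagenic"),
]
neighbours, confidence = find_neighbours("q", {"a", "b", "c", "d"}, training, jaccard)
```

Here the query shares three of four atom environments with `t1`, so the first tier already succeeds and the prediction would be flagged as high confidence.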

#### Local QSAR models and predictions

Only similar compounds (neighbours) above the threshold are used for
local QSAR models. In this investigation, we are using a weighted
majority vote from the neighbours' experimental data for mutagenicity
classifications. Probabilities for both classes (mutagenic/non-mutagenic) are
calculated according to the following formula and the class with the higher
probability is used as prediction outcome.

$$p_{c} = \frac{\sum_{n} \text{sim}_{n,c}}{\sum_{n} \text{sim}_{n}}$$

$p_{c}$: probability of class $c$ (e.g. mutagenic or non-mutagenic)\
$\sum_{n} \text{sim}_{n,c}$: sum of similarities of neighbours with class $c$\
$\sum_{n} \text{sim}_{n}$: sum of similarities of all neighbours
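
As a worked illustration of the formula for $p_{c}$, the following Python sketch computes the class probabilities from a list of neighbours. The function name and the neighbour values are hypothetical and do not come from the study.

```python
def class_probabilities(neighbours):
    """Similarity-weighted majority vote.

    neighbours: list of (similarity, activity) pairs
    Returns {class: p_c}, where p_c is the summed similarity of the
    neighbours in class c divided by the summed similarity of all neighbours.
    """
    total = sum(sim for sim, _ in neighbours)
    probs = {}
    for sim, activity in neighbours:
        probs[activity] = probs.get(activity, 0.0) + sim / total
    return probs

# Hypothetical neighbours of a query compound
neighbours = [(0.75, "mutagenic"), (0.5, "non-mutagenic"), (0.75, "mutagenic")]
probs = class_probabilities(neighbours)
prediction = max(probs, key=probs.get)  # class with the higher probability
```

With these values the summed similarity is 2.0, of which 1.5 belongs to mutagenic neighbours, so the mutagenic class receives a probability of 0.75 and is returned as the prediction.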

#### Applicability domain

The applicability domain (AD) of `lazar` models is determined by the
structural diversity of the training data. If no similar compounds are
found in the training data, no predictions will be generated. Warnings are
issued if the similarity threshold had to be lowered from 0.5 to 0.2
(MP2D/Tanimoto) or from 0.9 to 0.7 (CDK/Cosine) in order to enable predictions.
Predictions without warnings can be considered as close to the applicability
domain (*high confidence*) and predictions with warnings as more distant from
the applicability domain (*low confidence*). Quantitative applicability domain
information can be obtained from the similarities of individual neighbours.

#### Validation

10-fold cross validation was performed for model evaluation.
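
For readers unfamiliar with the procedure, a minimal sketch of a 10-fold split is given below. It only illustrates the general technique; how folds were actually assigned in these experiments (e.g. shuffling, stratification, random seed) is not specified here, so those details are assumptions.

```python
import random

def k_fold_indices(n_items, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)       # assumed: random assignment
    folds = [indices[i::k] for i in range(k)]  # k folds of nearly equal size
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

# Every instance is used for testing exactly once and for training k-1 times.
splits = list(k_fold_indices(25, k=10))
```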

#### Pyrrolizidine alkaloid predictions

For the prediction of pyrrolizidine alkaloids, models were generated with the
MP2D and CDK training datasets. The complete feature set was used for MP2D
predictions. For CDK predictions, the intersection between training and
pyrrolizidine alkaloid features was used.

#### Availability

  - Source code for this manuscript (GPL3):
    <https://git.in-silico.ch/lazar/tree/?h=mutagenicity-paper>
  
  - Crossvalidation experiments (GPL3):
    <https://git.in-silico.ch/lazar/tree/models/?h=mutagenicity-paper>
  
  - Pyrrolizidine alkaloid predictions (GPL3):
    <https://git.in-silico.ch/lazar/tree/predictions/?h=mutagenicity-paper>
  
  - Public web interface:
    <https://lazar.in-silico.ch>

### Tensorflow models

#### Feature Preprocessing

For preprocessing of the CDK features we used a quantile transformation 
to a uniform distribution. MP2D features were not preprocessed.
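
The idea behind a quantile transformation can be illustrated with a simplified rank-based mapping in plain Python. This is an assumption-laden sketch of the general technique (an empirical distribution function with mid-ranks for ties), not the implementation used for the experiments, which is not described in detail here.

```python
from bisect import bisect_left, bisect_right

def fit_quantile_transform(train_values):
    """Return a function mapping a value to its empirical quantile in [0, 1].

    The reference distribution is taken from the training values of one
    descriptor; new values are placed relative to it.
    """
    ref = sorted(train_values)
    n = len(ref)

    def transform(value):
        lo = bisect_left(ref, value)   # training values strictly below
        hi = bisect_right(ref, value)  # training values below or equal
        return (lo + hi) / (2 * n)     # mid-rank, scaled to [0, 1]

    return transform

# Hypothetical descriptor column
to_uniform = fit_quantile_transform([10.0, 20.0, 30.0, 40.0])
scaled = [to_uniform(v) for v in (5.0, 20.0, 25.0, 45.0)]
```

The transformed values depend only on the rank of a value within the training distribution, which makes the result insensitive to outliers and to the original scale of each descriptor.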

#### Random forests (*RF*)

For the random forest classifier we used the parameters 
n_estimators=1000 and max_leaf_nodes=200. For the other parameters we 
used the scikit-learn default values.

#### Logistic regression (SGD) (*LR-sgd*)

For the logistic regression we used an ensemble of five trained models. 
For each model we used a batch size of 64 and trained for 50 epochs. As 
an optimizer ADAM was chosen. For the other parameters we used the 
tensorflow default values.

#### Logistic regression (scikit) (*LR-scikit*)

For the logistic regression we used as parameters the scikit-learn 
default values.

#### Neural Nets (*NN*)

For the neural network we used an ensemble of five trained models. For 
each model we used a batch size of 64 and trained for 50 epochs. As an 
optimizer ADAM was chosen. The neural network had 4 hidden layers with 
64 nodes each and a ReLU activation function. For the other parameters 
we used the tensorflow default values.

#### Support vector machines (*SVM*)

We used the SVM implemented in scikit-learn. We used the parameters 
kernel='rbf', gamma='scale'. For the other parameters we used the 
scikit-learn default values.

#### Validation

10-fold cross-validation was used for all Tensorflow models.

#### Pyrrolizidine alkaloid predictions

For the prediction of pyrrolizidine alkaloids we trained the models described
above on the training data. For training and prediction, only those features
were used that were in the intersection of features from the training data and
the pyrrolizidine alkaloids.
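
Restricting both datasets to their shared features can be sketched as follows. The function name, the dictionary-based row format and the choice to fill missing values with 0 are illustrative assumptions, not details taken from the notebooks.

```python
def restrict_to_shared_features(train_rows, query_rows):
    """Keep only features present in both datasets, in a fixed column order.

    Each row is a dict mapping feature name -> value.
    Returns (shared_feature_names, train_matrix, query_matrix).
    """
    train_features = set().union(*(row.keys() for row in train_rows))
    query_features = set().union(*(row.keys() for row in query_rows))
    shared = sorted(train_features & query_features)

    def to_matrix(rows):
        # Assumed: a feature absent from a row is encoded as 0.
        return [[row.get(name, 0) for name in shared] for row in rows]

    return shared, to_matrix(train_rows), to_matrix(query_rows)

# Hypothetical toy fingerprints
train = [{"f1": 1, "f2": 0, "f3": 1}, {"f1": 0, "f2": 1, "f3": 0}]
pa = [{"f2": 1, "f3": 1, "f9": 1}]
shared, X_train, X_pa = restrict_to_shared_features(train, pa)
```

Using the same sorted column order for both matrices ensures that a model trained on `X_train` sees each feature in the same position when it is applied to `X_pa`.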

#### Availability

Jupyter notebooks for these experiments can be found at the following locations:

*Crossvalidation:*

  - MolPrint2D fingerprints: <https://git.in-silico.ch/mutagenicity-paper/tree/crossvalidations/mp2d/tensorflow>
  - CDK descriptors: <https://git.in-silico.ch/mutagenicity-paper/tree/crossvalidations/cdk/tensorflow>

*Pyrrolizidine alkaloids:*

  - MolPrint2D fingerprints: <https://git.in-silico.ch/mutagenicity-paper/tree/pyrrolizidine-alkaloids/mp2d/tensorflow>
  - CDK descriptors: <https://git.in-silico.ch/mutagenicity-paper/tree/pyrrolizidine-alkaloids/cdk/tensorflow>

Results
=======

10-fold crossvalidations
------------------------

Crossvalidation results are summarized in the following tables: @tbl:cv-mp2d
shows results with MolPrint2D descriptors and @tbl:cv-cdk with CDK descriptors.

|  | lazar-HC | lazar-all | RF | LR-sgd | LR-scikit | NN | SVM |
|:-|----------|-----------|----|--------|-----------|----|-----|
Accuracy | {{cv.mp2d_lazar_high_confidence.acc_perc}} | {{cv.mp2d_lazar_all.acc_perc}} | {{cv.mp2d_rf.acc_perc}} | {{cv.mp2d_lr.acc_perc}} | {{cv.mp2d_lr2.acc_perc}} | {{cv.mp2d_nn.acc_perc}} | {{cv.mp2d_svm.acc_perc}} |
True positive rate | {{cv.mp2d_lazar_high_confidence.tpr_perc}} | {{cv.mp2d_lazar_all.tpr_perc}} | {{cv.mp2d_rf.tpr_perc}} | {{cv.mp2d_lr.tpr_perc}} | {{cv.mp2d_lr2.tpr_perc}} | {{cv.mp2d_nn.tpr_perc}} | {{cv.mp2d_svm.tpr_perc}} |
True negative rate | {{cv.mp2d_lazar_high_confidence.tnr_perc}} | {{cv.mp2d_lazar_all.tnr_perc}} | {{cv.mp2d_rf.tnr_perc}} | {{cv.mp2d_lr.tnr_perc}} | {{cv.mp2d_lr2.tnr_perc}} | {{cv.mp2d_nn.tnr_perc}} | {{cv.mp2d_svm.tnr_perc}} |
Positive predictive value | {{cv.mp2d_lazar_high_confidence.ppv_perc}} | {{cv.mp2d_lazar_all.ppv_perc}} | {{cv.mp2d_rf.ppv_perc}} | {{cv.mp2d_lr.ppv_perc}} | {{cv.mp2d_lr2.ppv_perc}} | {{cv.mp2d_nn.ppv_perc}} | {{cv.mp2d_svm.ppv_perc}} |
Negative predictive value | {{cv.mp2d_lazar_high_confidence.npv_perc}} | {{cv.mp2d_lazar_all.npv_perc}} | {{cv.mp2d_rf.npv_perc}} | {{cv.mp2d_lr.npv_perc}} | {{cv.mp2d_lr2.npv_perc}} | {{cv.mp2d_nn.npv_perc}} | {{cv.mp2d_svm.npv_perc}} |
Nr. predictions | {{cv.mp2d_lazar_high_confidence.n}} | {{cv.mp2d_lazar_all.n}} | {{cv.mp2d_rf.n}} | {{cv.mp2d_lr.n}} | {{cv.mp2d_lr2.n}} | {{cv.mp2d_nn.n}} | {{cv.mp2d_svm.n}} |

: Summary of crossvalidation results with MolPrint2D descriptors (lazar-HC: lazar with high confidence, lazar-all: all lazar predictions, RF: random forests, LR-sgd: logistic regression (stochastic gradient descent), LR-scikit: logistic regression (scikit), NN: neural networks, SVM: support vector machines) {#tbl:cv-mp2d}


|  | lazar-HC | lazar-all | RF | LR-sgd | LR-scikit | NN | SVM |
|:-|----------|-----------|----|--------|-----------|----|-----|
Accuracy | {{cv.cdk_lazar_high_confidence.acc_perc}} | {{cv.cdk_lazar_all.acc_perc}} | {{cv.cdk_rf.acc_perc}} | {{cv.cdk_lr.acc_perc}} | {{cv.cdk_lr2.acc_perc}} | {{cv.cdk_nn.acc_perc}} | {{cv.cdk_svm.acc_perc}} |
True positive rate | {{cv.cdk_lazar_high_confidence.tpr_perc}} | {{cv.cdk_lazar_all.tpr_perc}} | {{cv.cdk_rf.tpr_perc}} | {{cv.cdk_lr.tpr_perc}} | {{cv.cdk_lr2.tpr_perc}} | {{cv.cdk_nn.tpr_perc}} | {{cv.cdk_svm.tpr_perc}} |
True negative rate | {{cv.cdk_lazar_high_confidence.tnr_perc}} | {{cv.cdk_lazar_all.tnr_perc}} | {{cv.cdk_rf.tnr_perc}} | {{cv.cdk_lr.tnr_perc}} | {{cv.cdk_lr2.tnr_perc}} | {{cv.cdk_nn.tnr_perc}} | {{cv.cdk_svm.tnr_perc}} |
Positive predictive value | {{cv.cdk_lazar_high_confidence.ppv_perc}} | {{cv.cdk_lazar_all.ppv_perc}} | {{cv.cdk_rf.ppv_perc}} | {{cv.cdk_lr.ppv_perc}} | {{cv.cdk_lr2.ppv_perc}} | {{cv.cdk_nn.ppv_perc}} | {{cv.cdk_svm.ppv_perc}} |
Negative predictive value | {{cv.cdk_lazar_high_confidence.npv_perc}} | {{cv.cdk_lazar_all.npv_perc}} | {{cv.cdk_rf.npv_perc}} | {{cv.cdk_lr.npv_perc}} | {{cv.cdk_lr2.npv_perc}} | {{cv.cdk_nn.npv_perc}} | {{cv.cdk_svm.npv_perc}} |
Nr. predictions | {{cv.cdk_lazar_high_confidence.n}} | {{cv.cdk_lazar_all.n}} | {{cv.cdk_rf.n}} | {{cv.cdk_lr.n}} | {{cv.cdk_lr2.n}} | {{cv.cdk_nn.n}} | {{cv.cdk_svm.n}} |

: Summary of crossvalidation results with CDK descriptors (lazar-HC: lazar with high confidence, lazar-all: all lazar predictions, RF: random forests, LR-sgd: logistic regression (stochastic gradient descent), LR-scikit: logistic regression (scikit), NN: neural networks, SVM: support vector machines) {#tbl:cv-cdk}

@fig:roc depicts the position of all crossvalidation results in receiver operating characteristic (ROC) space.

![ROC plot of crossvalidation results (lazar-HC: lazar with high confidence, lazar-all: all lazar predictions, RF: random forests, LR-sgd: logistic regression (stochastic gradient descent), LR-scikit: logistic regression (scikit), NN: neural networks, SVM: support vector machines).](figures/roc.png){#fig:roc}

Confusion matrices for all models are available from the git repository
<https://git.in-silico.ch/mutagenicity-paper/tree/crossvalidations/confusion-matrices/>;
individual predictions can be found in
<https://git.in-silico.ch/mutagenicity-paper/tree/crossvalidations/predictions/>.

All investigated algorithm/descriptor combinations
give accuracies between 80 and 85%, which is equivalent to the experimental
variability of the *Salmonella typhimurium* mutagenicity bioassay (80-85%,
@Piegorsch1991). Sensitivities and specificities are balanced in all of
these models.
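
The summary statistics reported in the crossvalidation tables follow the usual definitions for a 2x2 confusion matrix. The sketch below restates them in Python, with mutagenic taken as the positive class; the counts are hypothetical and are not results from this study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Summary statistics from a 2x2 confusion matrix (mutagenic = positive)."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "tpr": tp / (tp + fn),  # true positive rate (sensitivity)
        "tnr": tn / (tn + fp),  # true negative rate (specificity)
        "ppv": tp / (tp + fp),  # positive predictive value
        "npv": tn / (tn + fn),  # negative predictive value
    }

# Hypothetical counts for illustration only
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```

A model is balanced in the sense used above when its true positive rate and true negative rate are of similar magnitude.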

Pyrrolizidine alkaloid mutagenicity predictions 
-----------------------------------------------

Mutagenicity predictions of {{pa.n}} pyrrolizidine alkaloids (PAs) from all
investigated models can be downloaded from
<https://git.in-silico.ch/mutagenicity-paper/tree/pyrrolizidine-alkaloids/pa-predictions.csv>.
A visual representation of all PA predictions can be found at
<https://git.in-silico.ch/mutagenicity-paper/tree/pyrrolizidine-alkaloids/pa-predictions.pdf>.


<!--
![Summary of Diester predictions](figures/Diester.png){#fig:die}

![Summary of Macrocyclic-diester predictions](figures/Macrocyclic.diester.png){#fig:mcdie}

![Summary of Monoester predictions](figures/Monoester.png){#fig:me}

![Summary of N-oxide predictions](figures/N.oxide.png){#fig:nox}

![Summary of Otonecine predictions](figures/Otonecine.png){#fig:oto}

![Summary of Platynecine predictions](figures/Platynecine.png){#fig:plat}

![Summary of Retronecine predictions](figures/Retronecine.png){#fig:ret}

![Summary of Tertiary PA predictions](figures/Tertiary.PA.png){#fig:tert}
-->

To visualise the position of pyrrolizidine alkaloids with respect to
the training dataset, we applied t-distributed stochastic neighbor
embedding (t-SNE, @Maaten2008) to MolPrint2D and CDK descriptors. t-SNE maps
each high-dimensional object (chemical) to a two-dimensional point while
preserving, as far as possible, the neighbourhood relations of the
high-dimensional space. Similar objects are represented
by nearby points and dissimilar objects are represented by distant points.
t-SNE coordinates were calculated with the R `Rtsne` package using the default
settings (perplexity = 30, theta = 0.5, max_iter = 1000).

@fig:tsne-mp2d shows the t-SNE of pyrrolizidine alkaloids (PA) and the
mutagenicity training data in MP2D space (Tanimoto/Jaccard similarity), which
basically reflects the structural diversity of the investigated compounds.

![t-SNE visualisation of mutagenicity training data and pyrrolizidine alkaloids (PA) in MP2D space](figures/tsne-mp2d-mutagenicity.png){#fig:tsne-mp2d}

@fig:tsne-cdk shows the t-SNE of pyrrolizidine alkaloids (PA) and the
mutagenicity training data in CDK space (Euclidean distance), which basically
reflects the physical-chemical properties of the investigated compounds.

![t-SNE visualisation of mutagenicity training data and pyrrolizidine alkaloids (PA) in CDK space](figures/tsne-cdk-mutagenicity.png){#fig:tsne-cdk}

@fig:tsne-mp2d-rf and @fig:tsne-cdk-lazar-all depict two example pyrrolizidine alkaloid
mutagenicity predictions in the context of training data. t-SNE visualisations of all investigated models can be downloaded from <https://git.in-silico.ch/mutagenicity-paper/figures>.

<!--
![t-SNE visualisation of all MP2D lazar predictions](figures/tsne-mp2d-lazar-all-classifications.png){#fig:tsne-mp2d-lazar-all}

![t-SNE visualisation of MP2D lazar high-confidence predictions](figures/tsne-mp2d-lazar-high-confidence-classifications.png){#fig:tsne-mp2d-lazar-high-confidence}

![t-SNE visualisation of MP2D logistic regression (sgd) predictions](figures/tsne-mp2d-lr-classifications.png){#fig:tsne-mp2d-lr}

![t-SNE visualisation of MP2D logistic regression (scikit) predictions](figures/tsne-mp2d-lr2-classifications.png){#fig:tsne-mp2d-lr2}

![t-SNE visualisation of MP2D neural network predictions](figures/tsne-mp2d-nn-classifications.png){#fig:tsne-mp2d-nn}
-->

![t-SNE visualisation of MP2D random forest predictions](figures/tsne-mp2d-rf-classifications.png){#fig:tsne-mp2d-rf}

<!--
![t-SNE visualisation of MP2D support vector machine predictions](figures/tsne-mp2d-svm-classifications.png){#fig:tsne-mp2d-svm}
-->

![t-SNE visualisation of all CDK lazar predictions](figures/tsne-cdk-lazar-all-classifications.png){#fig:tsne-cdk-lazar-all}

<!--
![t-SNE visualisation of CDK lazar high-confidence predictions](figures/tsne-cdk-lazar-high-confidence-classifications.png){#fig:tsne-cdk-lazar-high-confidence}

![t-SNE visualisation of CDK logistic regression (sgd) predictions](figures/tsne-cdk-lr-classifications.png){#fig:tsne-cdk-lr}

![t-SNE visualisation of CDK logistic regression (scikit) predictions](figures/tsne-cdk-lr2-classifications.png){#fig:tsne-cdk-lr2}

![t-SNE visualisation of CDK neural network predictions](figures/tsne-cdk-nn-classifications.png){#fig:tsne-cdk-nn}

![t-SNE visualisation of CDK random forest predictions](figures/tsne-cdk-rf-classifications.png){#fig:tsne-cdk-rf}

![t-SNE visualisation of CDK support vector machine predictions](figures/tsne-cdk-svm-classifications.png){#fig:tsne-cdk-svm}
-->

@tbl:pa-summary summarises the outcome of pyrrolizidine alkaloid predictions from all models with MolPrint2D and CDK descriptors.


| Model  | MP2D Mutagenic | Nr. predictions | CDK Mutagenic | Nr. predictions |
|-------:|----------------|-----------------|---------------|-----------------|
| lazar-all | {{pa.mp2d_lazar_all.mut_perc}}% ({{pa.mp2d_lazar_all.mut}}) | {{pa.mp2d_lazar_all.n_perc}}% ({{pa.mp2d_lazar_all.n}}) | {{pa.cdk_lazar_all.mut_perc}}% ({{pa.cdk_lazar_all.mut}}) | {{pa.cdk_lazar_all.n_perc}}% ({{pa.cdk_lazar_all.n}}) |
| lazar-HC | {{pa.mp2d_lazar_high_confidence.mut_perc}}% ({{pa.mp2d_lazar_high_confidence.mut}}) | {{pa.mp2d_lazar_high_confidence.n_perc}}% ({{pa.mp2d_lazar_high_confidence.n}}) | {{pa.cdk_lazar_high_confidence.mut_perc}}% ({{pa.cdk_lazar_high_confidence.mut}}) | {{pa.cdk_lazar_high_confidence.n_perc}}% ({{pa.cdk_lazar_high_confidence.n}}) |
| RF | {{pa.mp2d_rf.mut_perc}}% ({{pa.mp2d_rf.mut}}) | {{pa.mp2d_rf.n_perc}}% ({{pa.mp2d_rf.n}}) | {{pa.cdk_rf.mut_perc}}% ({{pa.cdk_rf.mut}}) | {{pa.cdk_rf.n_perc}}% ({{pa.cdk_rf.n}}) |
| LR-sgd | {{pa.mp2d_lr.mut_perc}}% ({{pa.mp2d_lr.mut}}) | {{pa.mp2d_lr.n_perc}}% ({{pa.mp2d_lr.n}}) | {{pa.cdk_lr.mut_perc}}% ({{pa.cdk_lr.mut}}) | {{pa.cdk_lr.n_perc}}% ({{pa.cdk_lr.n}}) |
| LR-scikit | {{pa.mp2d_lr2.mut_perc}}% ({{pa.mp2d_lr2.mut}}) | {{pa.mp2d_lr2.n_perc}}% ({{pa.mp2d_lr2.n}}) | {{pa.cdk_lr2.mut_perc}}% ({{pa.cdk_lr2.mut}}) | {{pa.cdk_lr2.n_perc}}% ({{pa.cdk_lr2.n}}) |
| NN | {{pa.mp2d_nn.mut_perc}}% ({{pa.mp2d_nn.mut}}) | {{pa.mp2d_nn.n_perc}}% ({{pa.mp2d_nn.n}}) | {{pa.cdk_nn.mut_perc}}% ({{pa.cdk_nn.mut}}) | {{pa.cdk_nn.n_perc}}% ({{pa.cdk_nn.n}}) |
| SVM | {{pa.mp2d_svm.mut_perc}}% ({{pa.mp2d_svm.mut}}) | {{pa.mp2d_svm.n_perc}}% ({{pa.mp2d_svm.n}}) | {{pa.cdk_svm.mut_perc}}% ({{pa.cdk_svm.mut}}) | {{pa.cdk_svm.n_perc}}% ({{pa.cdk_svm.n}}) |

: Summary of pyrrolizidine alkaloid predictions {#tbl:pa-summary}

@fig:pa-groups displays the proportion of positive mutagenicity predictions
from all models for the different pyrrolizidine alkaloid groups. Tensorflow
models predicted all {{pa.n}} pyrrolizidine alkaloids, `lazar` MP2D models
predicted {{pa.mp2d_lazar_all.n}} compounds
({{pa.mp2d_lazar_high_confidence.n}} with high confidence) and `lazar` CDK
models {{pa.cdk_lazar_all.n}} compounds ({{pa.cdk_lazar_high_confidence.n}}
with high confidence).

![Summary of pyrrolizidine alkaloid predictions](figures/pa-groups.png){#fig:pa-groups}

For the lazar-HC model, only
{{pa.mp2d_lazar_high_confidence.n_perc}}/{{pa.cdk_lazar_high_confidence.n_perc}}%
of the PA dataset were within the stricter similarity thresholds of 0.5/0.9
(MP2D/CDK). Reduction of the similarity threshold to 0.2/0.5 in the lazar-all
model increased the number of predictable PAs to
{{pa.mp2d_lazar_all.n_perc}}/{{pa.cdk_lazar_all.n_perc}}%. As the other ML
models do not consider applicability domains, all PAs were predicted.
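The effect of the similarity threshold on the number of predictable compounds can be illustrated with a minimal Python sketch. Only the threshold values are taken from the text; the function names, the `>=` comparison, the requirement of at least one neighbour and the example similarities are assumptions made for illustration and do not reproduce the `lazar` implementation.

```python
# Similarity thresholds quoted in the text (MP2D / CDK); the rest is illustrative.
THRESHOLDS = {
    "lazar-HC": {"MP2D": 0.5, "CDK": 0.9},
    "lazar-all": {"MP2D": 0.2, "CDK": 0.5},
}


def in_applicability_domain(similarities, model, descriptor, min_neighbours=1):
    """Return True if enough training compounds reach the similarity threshold.

    `similarities` holds the similarities of one query compound to all
    training compounds. `min_neighbours=1` is an assumption.
    """
    threshold = THRESHOLDS[model][descriptor]
    n_neighbours = sum(1 for s in similarities if s >= threshold)
    return n_neighbours >= min_neighbours


def coverage_percent(similarities_per_compound, model, descriptor):
    """Percentage of query compounds that would receive a prediction."""
    if not similarities_per_compound:
        return 0.0
    predicted = sum(
        in_applicability_domain(s, model, descriptor)
        for s in similarities_per_compound
    )
    return 100.0 * predicted / len(similarities_per_compound)


# Hypothetical similarities of three query compounds to two training compounds.
example = [[0.6, 0.3], [0.25, 0.1], [0.1, 0.05]]
hc_coverage = coverage_percent(example, "lazar-HC", "MP2D")
all_coverage = coverage_percent(example, "lazar-all", "MP2D")
```

In this toy example the stricter threshold covers one of three compounds and the relaxed threshold two of three, which mirrors the trade-off between coverage and neighbour similarity described above.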

Although most of the models show similar accuracies, sensitivities and
specificities in crossvalidation experiments, some of the models (MP2D-RF, CDK-RF
and CDK-SVM) predict a lower number of mutagens
({{pa.cdk_rf.mut_perc}}-{{pa.mp2d_rf.mut_perc}}%) than the majority of the
models ({{pa.mp2d_svm.mut_perc}}-{{pa.mp2d_lazar_high_confidence.mut_perc}}%,
@tbl:pa-summary, @fig:pa-groups). 

Over all models, the mean proportion of PAs predicted as mutagenic was highest for
otonecines ({{pa.groups.Otonecine.mut_perc}}%,
{{pa.groups.Otonecine.mut}}/{{pa.groups.Otonecine.n_pred}}), followed by
macrocyclic diesters ({{pa.groups.Macrocyclic_diester.mut_perc}}%,
{{pa.groups.Macrocyclic_diester.mut}}/{{pa.groups.Macrocyclic_diester.n_pred}}),
dehydropyrrolizidines ({{pa.groups.Dehydropyrrolizidine.mut_perc}}%,
{{pa.groups.Dehydropyrrolizidine.mut}}/{{pa.groups.Dehydropyrrolizidine.n_pred}}),
tertiary PAs ({{pa.groups.Tertiary_PA.mut_perc}}%,
{{pa.groups.Tertiary_PA.mut}}/{{pa.groups.Tertiary_PA.n_pred}}) and
retronecines ({{pa.groups.Retronecine.mut_perc}}%,
{{pa.groups.Retronecine.mut}}/{{pa.groups.Retronecine.n_pred}}).
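The following sketch shows one way such group-wise figures could be derived by pooling predictions over all models. It assumes that compounds without a prediction (outside the applicability domain) are excluded from the denominator; the function name, the input format and the example records are hypothetical and not taken from the scripts used in this study.

```python
from collections import defaultdict


def group_summary(predictions):
    """Pool predictions over all models and summarise them per PA group.

    `predictions` is an iterable of (model, group, is_mutagenic) tuples,
    where is_mutagenic is None if the model made no prediction.
    """
    counts = defaultdict(lambda: {"mut": 0, "n_pred": 0})
    for _model, group, is_mutagenic in predictions:
        if is_mutagenic is None:
            continue  # compound not predicted by this model
        counts[group]["n_pred"] += 1
        counts[group]["mut"] += int(is_mutagenic)
    return {
        group: {**c, "mut_perc": round(100.0 * c["mut"] / c["n_pred"], 1)}
        for group, c in counts.items()
    }


# Hypothetical example records, not results from this study.
example = [
    ("RF", "Otonecine", True),
    ("NN", "Otonecine", True),
    ("lazar-HC", "Otonecine", None),
    ("RF", "Monoester", False),
    ("NN", "Monoester", True),
]
summary = group_summary(example)
```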

When the three deviating models mentioned above are excluded,
the rank order stays the same, but the percentage of mutagenic PAs is higher.

The following rank order for mutagenic probability can be deduced from the results of all models taken together:

Necine base: Platynecine < Retronecine << Otonecine

Necic acid: Monoester < Diester << Macrocyclic diester

Modification of necine base: N-oxide < Tertiary PA < Dehydropyrrolizidine

Discussion
==========

Data
----

A new training dataset for *Salmonella* mutagenicity was created from three
different sources (@Kazius2005, @Hansen2009, @EFSA2016). It contains {{cv.n_uniq}}
unique chemical structures and is, to our knowledge, the largest
public mutagenicity dataset presently available. The new training data can be
downloaded from
<https://git.in-silico.ch/mutagenicity-paper/tree/mutagenicity/mutagenicity.csv>.

Algorithms
----------

`lazar` is formally a *k-nearest-neighbor* algorithm that searches for similar
structures for a given compound and calculates the prediction based on the
experimental data for these structures. The QSAR literature frequently calls
such models *local models*, because models are generated specifically for each
query compound. The investigated tensorflow models are in contrast *global
models*, i.e. a single model is used to make predictions for all compounds. It
has been postulated in the past that local models are more accurate, because
they can account better for mechanisms that affect only a subset of the
training data.
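The idea of a local model can be sketched as a similarity-weighted vote over the neighbours of a single query compound. This is a simplified illustration under our own assumptions (weighting by similarity, a decision cut-off of 0.5, hypothetical function and label names), not the actual `lazar` code.

```python
def local_prediction(neighbours):
    """Predict one query compound from its neighbours only.

    `neighbours` is a list of (similarity, is_mutagenic) pairs taken from the
    training data. Returns (label, probability), or None when no neighbours
    are available and no prediction can be made.
    """
    total_weight = sum(similarity for similarity, _ in neighbours)
    if total_weight == 0:
        return None
    mutagenic_weight = sum(
        similarity for similarity, is_mutagenic in neighbours if is_mutagenic
    )
    p_mutagenic = mutagenic_weight / total_weight
    label = "mutagenic" if p_mutagenic >= 0.5 else "non-mutagenic"
    return label, p_mutagenic


# Hypothetical neighbours: one similar mutagen, two slightly less similar non-mutagens.
result = local_prediction([(0.8, True), (0.6, False), (0.6, False)])
```

A global model would instead be fitted once on the complete training set and applied unchanged to every query compound.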

@tbl:cv-mp2d, @tbl:cv-cdk and @fig:roc show that the crossvalidation accuracies
of all models are comparable to the experimental variability of the *Salmonella
typhimurium* mutagenicity bioassay (80-85% according to @Piegorsch1991). All of
these models have balanced sensitivity (true positive rate) and specificity
(true negative rate) and provide highly significant concordance with
experimental data (as determined by McNemar's Test). This is a clear indication
that *in silico* predictions can be as reliable as the bioassays. Given that
the variability of experimental data is similar to model variability, it is
impossible to decide which model gives the most accurate predictions, as models
with higher accuracies might just approximate experimental errors better than
more robust models.
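For readers less familiar with these performance measures, the sketch below computes accuracy, sensitivity and specificity from the four cells of a confusion matrix. The function name and the example counts are hypothetical and are not results from this study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity and specificity from confusion matrix counts.

    tp/fn: experimentally mutagenic compounds predicted mutagenic/non-mutagenic
    tn/fp: experimentally non-mutagenic compounds predicted non-mutagenic/mutagenic
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Hypothetical counts for 100 compounds.
metrics = classification_metrics(tp=40, fp=5, tn=45, fn=10)
```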

Our results do not support the assumption that local models are superior to
global models for classification purposes. For regression models (lowest
observed effect level) we have found however that local models may outperform
global models (@Helma2018) with accuracies similar to experimental variability.

As all investigated algorithms give similar accuracies, the selection will
depend more on practical considerations than on intrinsic properties. Nearest
neighbor algorithms like `lazar` have the practical advantage that the
rationales for individual predictions can be presented in a straightforward
manner that is understandable without a background in statistics or machine
learning (a screenshot of the mutagenicity prediction for
12,21-Dihydroxy-4-methyl-4,8-secosenecinonan-8,11,16-trione can be found at
<https://git.in-silico.ch/mutagenicity-paper/tree/figures/lazar-screenshot.png>).
This allows a critical examination of individual predictions and prevents blind
trust in models that are opaque to users with a toxicological
background.

![`lazar` screenshot of 12,21-Dihydroxy-4-methyl-4,8-secosenecinonan-8,11,16-trione mutagenicity prediction](figures/lazar-screenshot.png){#fig:lazar}

Descriptors
-----------

This study uses two types of descriptors for the characterisation of chemical
structures:

*MolPrint2D* fingerprints (MP2D, @Bender2004) use atom environments (i.e.
connected atom types for all atoms in a molecule) as molecular representation,
which essentially corresponds to the chemical concept of functional groups. MP2D
descriptors are used to determine chemical similarities in the default `lazar`
settings, and previous experiments have shown that they give more accurate
results than predefined fingerprints (e.g. MACCS, FP2-4).

*Chemistry Development Kit* (CDK, @Willighagen2017) descriptors 
were calculated with the PaDEL graphical interface (@Yap2011). They include 
1D and 2D topological descriptors as well as physical-chemical properties.

All investigated algorithms obtained models within the experimental variability
for both types of descriptors (@tbl:cv-mp2d, @tbl:cv-cdk, @fig:roc).

Given that similar predictive accuracies are obtainable from both types of
descriptors the choice depends once more on practical considerations:

MolPrint2D fragments can be calculated very efficiently for every well defined
chemical structure with OpenBabel (@OBoyle2011a). CDK descriptor calculations
are in contrast much more resource intensive and may fail for a significant
number of compounds ({{cv.cdk.n_failed}} of {{cv.n_uniq}}).

MolPrint2D fragments are generated dynamically from chemical structures and can
be used to determine if a compound contains structural features that are absent
in training data. This feature can be used to determine applicability domains.
CDK descriptors contain in contrast a predefined set of descriptors with
unknown toxicological relevance.
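A minimal sketch of this check, assuming that each compound is represented as a set of fragment identifiers: features of the query compound that occur in no training compound are reported as unknown. The function name and the placeholder feature strings are hypothetical and do not follow the real MolPrint2D notation.

```python
def unknown_features(query_features, training_features):
    """Return the features of the query compound that never occur in training.

    `training_features` is a list with one feature set per training compound.
    """
    known = set().union(*training_features)
    return set(query_features) - known


# Placeholder feature identifiers, not real MolPrint2D fragments.
training = [{"f1", "f2"}, {"f2", "f3"}]
missing = unknown_features({"f1", "f4"}, training)
```

A non-empty result indicates that the query compound contains substructures for which the training data provide no experimental evidence.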

MolPrint2D fingerprints can be represented compactly as sets of features
that are present in a given compound, which makes similarity calculations very
efficient. Due to the large number of substructures present in training
compounds, they lead however to large and sparsely populated datasets if they
have to be expanded to a binary matrix (e.g. as input for tensorflow models).
CDK descriptors in contrast always yield matrices with
{{cv.cdk.n_descriptors}} columns, which can cause substantial computational overhead.
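The difference between the two representations can be illustrated with a short sketch: Tanimoto/Jaccard similarity is computed directly on feature sets, whereas a global model needs the same information expanded into a binary matrix with one column per distinct feature. Function names and the placeholder features are assumptions for illustration only.

```python
def tanimoto(a, b):
    """Tanimoto/Jaccard similarity of two feature sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def to_binary_matrix(fingerprints):
    """Expand feature sets to a binary matrix (one row per compound).

    The number of columns equals the number of distinct features in the
    dataset, so the matrix grows wide and sparse for diverse compounds.
    """
    columns = sorted(set().union(*fingerprints))
    index = {feature: i for i, feature in enumerate(columns)}
    matrix = []
    for fingerprint in fingerprints:
        row = [0] * len(columns)
        for feature in fingerprint:
            row[index[feature]] = 1
        matrix.append(row)
    return columns, matrix


# Placeholder features, not real MolPrint2D fragments.
fingerprints = [{"a", "b", "c"}, {"b", "c", "d"}, {"e"}]
similarity = tanimoto(fingerprints[0], fingerprints[1])
columns, matrix = to_binary_matrix(fingerprints)
```

In this toy case three compounds already require five columns, most of which are zero for any given compound.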

Pyrrolizidine alkaloid mutagenicity predictions
-----------------------------------------------

### Algorithms and descriptors

@fig:pa-groups shows a clear differentiation between the different
pyrrolizidine alkaloid groups.
Nevertheless, the differences between predictions from different algorithms and descriptors
(@tbl:pa-summary) were not expected based on crossvalidation results.

In order to investigate whether any of the investigated models show systematic
errors in the vicinity of pyrrolizidine alkaloids, we have performed a
detailed t-SNE analysis of all models (see @fig:tsne-mp2d-rf and
@fig:tsne-cdk-lazar-all for two examples; all visualisations can be found at
<https://git.in-silico.ch/mutagenicity-paper/figures>).

None of the models showed obvious deviations from their expected
behaviour, so the reason for the disagreement between some of the models
remains unclear at the moment. It is however possible that some
systematic errors are masked by the projection of high-dimensional spaces onto
two coordinates and are thus invisible in t-SNE visualisations.

Only two compounds from the PA dataset (Senecivernine and Retronecine) are part
of the training set. Both are non-mutagenic and were predicted as non-mutagenic
by all models (both compounds were removed from the training set before
prediction to obtain unbiased results). Despite the exact concordance, we cannot
draw any general conclusions about model performance based on two examples with
a single outcome.

### Necic acid

The rank order of the necic acids is comparable in all models. PAs of the
monoester type had the lowest genotoxic probability, followed by PAs of the
open-ring diester type. PAs with macrocyclic diesters had the highest genotoxic
probability. This result fits well with the current state of knowledge: in general,
PAs with a macrocyclic diester as necic acid are considered to be more
mutagenic than those with an open-ring diester or monoester (@EFSA2011,
@Fu2004). As pointed out above, open diesters and macrocyclic PAs have a
reduced detoxification due to steric hindrance of the respective esterases
(@Ruan2014). More recent studies also confirmed that
macrocyclic and open diesters are more genotoxic *in vitro* than monoesters
(@Hadi2021, @Allemang2018, @Louisse2019).

### Necine base

In the rank order of necine bases, platynecine is the least mutagenic, followed
by retronecine and otonecine. Saturated PAs of the platynecine-type are
generally accepted to be less or non-mutagenic and have been shown in *in vitro*
experiments to form no DNA-adducts (@Xia2013). In the literature,
otonecine-type PAs were shown to be more mutagenic than those of the
retronecine-type (@Li2013).

### Modifications of necine base

The group-specific results reflect the expected relationship between the
groups: the low mutagenic probability of *N*-oxides and the high probability of
dehydropyrrolizidines (DHP) (@Chen2010). *N*-oxides may be converted back
*in vivo* to their mutagenic/tumorigenic parent PA (@Yan2008). On the
other hand, they are highly water soluble and generally considered to be
detoxification products, which are quickly eliminated renally *in vivo*
(@Chen2010).

DHP are regarded as the toxic principle in the metabolism of
PAs, and are known to produce protein- and DNA-adducts (@Chen2010). None of our investigated
models met this expectation, and all of them predicted the majority of DHP as
non-mutagenic. However, the following issues need to be considered. First,
all DHP were outside of the stricter applicability domain of MP2D `lazar`.
This indicates that they are structurally very different from the training data
and might be outside the applicability domain of all models based on this
training set. In addition, DHP have two double bonds in their necine
base, making them highly reactive. DHP and other comparable molecules have a very
short lifespan *in vivo*, and usually cannot be used in *in vitro* experiments.

Overall the low number of positive mutagenicity predictions was unexpected.
PAs are generally considered to be genotoxic, and the mode of action is also
known.  Therefore, the fact that some models predict the majority of PAs as not
mutagenic seems contradictory. To understand this result, the experimental
basis of the training dataset has to be considered. The training dataset is
based on the *Salmonella typhimurium* mutagenicity bioassay (Ames test). Some
studies show mutagenicity of PAs in the Ames test (@Chen2010).
In addition, @Rubiolo1992 examined several different PAs and several different
extracts of PA-containing plants in the Ames test. They found that the Ames
test was indeed able to detect mutagenicity of PAs, but that it generally
appeared to have a low sensitivity. The pre-incubation phase for metabolic activation of
PAs by microsomal enzymes was the sensitivity-limiting step. This could very
well mean that the low sensitivity of the Ames test for PAs is also reflected
in the investigated models.

<!--
A *in vitro* screen of cellular PA effects (metabolic activation and mutagenic
effects) in human and rodent hepatocytes (HepG2 and H-4-II-E) showed that
results may also critically depend on the cellular model and cell culture
conditions and may underestimate the effects of PAs (@Forsch2018).
-->

In summary, we found marked differences in the predicted genotoxic probability
between the PA groups: the otonecines and macrocyclic diesters appeared most
mutagenic, the platynecines and the mono- and diesters least mutagenic. These
results are comparable with *in vitro* measurements in hepatic HepaRG cells
(@Louisse2019), where relative potencies (RP) were determined: for otonecines
and cyclic diesters RP = 1, for open diesters RP = 0.1 and for monoesters RP =
0.01.

Due to a lack of differential data, European authorities based their risk
assessment in a worst-case approach on lasiocarpine, for which sufficient data
on genotoxicity and carcinogenicity were available (@HMPC2014, @EMA2020). Our
data further support a tiered risk assessment based on *in silico* and
experimental data on the relative potency of individual PAs as already
suggested by other authors (@Merz2016, @Rutz2020, @Louisse2019). 

The practical question of how to choose model predictions in the absence of
experimental data remains open. Tensorflow predictions do not include
applicability domain estimations and the rationales for predictions cannot be
traced by toxicologists.  Transparent models like `lazar` may have an advantage
in this context, because they present rationales for predictions (similar
compounds with experimental data) which can be accepted or rejected by
toxicologists and provide validated applicability domain estimations. 

Conclusions
===========

A new public *Salmonella* mutagenicity training dataset with {{cv.n}}
experimental results was created and used to train `lazar` and Tensorflow
models with MolPrint2D and CDK descriptors. All investigated algorithm and
descriptor combinations showed accuracies comparable to the interlaboratory
variability of the Ames test.

Pyrrolizidine alkaloid predictions showed a clear separation between different
classes of PAs that was generally in accordance with the current
toxicological knowledge about these compounds. Some of the models, however,
showed a substantially lower number of mutagenicity predictions despite
similar crossvalidation results, and we were unable to identify the reasons for
this discrepancy within this investigation.

Our data show that large differences exist with regard to genotoxic probabilities
between different pyrrolizidine subgroups. To adjust risk assessment of
pyrrolizidine contamination, our data support a tiered risk assessment based
on *in silico* and experimental data on the relative potency of individual
pyrrolizidine alkaloids.

References
==========