---
title: A comparison of nine machine learning models based on an expanded mutagenicity dataset and their application for predicting pyrrolizidine alkaloid mutagenicity
author:
  - Christoph Helma:
      institute: ist
      email: helma@in-silico.ch
      correspondence: "yes"
  - Verena Schöning:
      institute: zeller
  - Philipp Boss:
      institute: sysbio
  - Jürgen Drewe:
      institute: zeller
institute:
  - ist:
      name: in silico toxicology gmbh
      address: "Rastatterstrasse 41, 4057 Basel, Switzerland"
  - zeller:
      name: Zeller AG
      address: "Seeblickstrasse 4, 8590 Romanshorn, Switzerland"
  - sysbio:
      name: Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association
      address: "Robert-Rössle-Strasse 10, Berlin, 13125, Germany"
bibliography: bibliography.bib
keywords: mutagenicity, QSAR, lazar, random forest, support vector machine, linear regression, neural nets, deep learning
documentclass: scrartcl
tblPrefix: Table
figPrefix: Figure
header-includes:
  - \usepackage{lineno, setspace, color, colortbl, longtable}
  - \doublespacing
  - \linenumbers
...

Abstract
========

Random forest, support vector machine, logistic regression, neural network and k-nearest-neighbour (`lazar`) algorithms were applied to a new *Salmonella* mutagenicity dataset with 8309 unique chemical structures. The best prediction accuracies in 10-fold cross-validation were obtained with `lazar` models and MolPrint2D descriptors, which gave accuracies ({{cv.lazar-high-confidence.acc_perc}}%) similar to the interlaboratory variability of the Ames test.

**TODO**: PA results

Introduction
============

**TODO**: rationale for investigation

The main objectives of this study were

- to generate a new mutagenicity training dataset by combining the most comprehensive public datasets,
- to compare the performance of MolPrint2D (*MP2D*) fingerprints with PaDEL descriptors,
- to compare the performance of global QSAR models (random forests (*RF*), support vector machines (*SVM*), logistic regression (*LR*), neural nets (*NN*)) with local models (`lazar`) and
- to apply these models to the prediction of pyrrolizidine alkaloid mutagenicity.

Materials and Methods
=====================

Data
----

### Mutagenicity training data

An identical training dataset was used for all models. It was compiled from the following sources:

- Kazius/Bursi Dataset (4337 compounds, @Kazius2005)
- Hansen Dataset (6513 compounds, @Hansen2009)
- EFSA Dataset (695 compounds, @EFSA2016)

Mutagenicity classifications from the Kazius and Hansen datasets were used without further processing. To achieve consistency with these datasets, EFSA compounds were classified as mutagenic if at least one positive result was found for the *Salmonella* strains TA98 or TA100.

Dataset merges were based on unique SMILES (*Simplified Molecular Input Line Entry Specification*) strings of the compound structures. Duplicated experimental data with the same outcome was merged into a single value, because it is likely that it originated from the same experiment; contradictory results were kept as multiple measurements in the database. A minimal sketch of this merge logic is shown below. The combined training dataset contains 8309 unique structures.

Source code for all data download, extraction and merge operations is publicly available from the git repository under a GPL3 License. The new combined dataset can be found at .
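The merge logic can be illustrated with a minimal R sketch. The data frame layout and column names are assumptions for illustration only; the actual merge code is available from the git repository.

```r
library(dplyr)

# Toy records compiled from the source datasets; 'smiles' is assumed to
# hold canonical SMILES and 'mutagenic' the Ames classification.
records <- data.frame(
  smiles    = c("c1ccccc1N", "c1ccccc1N", "c1ccccc1N", "CCO"),
  mutagenic = c(TRUE, TRUE, FALSE, FALSE)
)

# Duplicates with the same outcome collapse into a single record, while
# contradictory results remain as multiple measurements:
merged <- distinct(records, smiles, mutagenic)
merged
#>      smiles mutagenic
#> 1 c1ccccc1N      TRUE
#> 2 c1ccccc1N     FALSE
#> 3       CCO     FALSE
```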
### Pyrrolizidine alkaloid (PA) dataset

The testing dataset consisted of 602 different PAs. **TODO**: **Verena** Can you briefly summarise the sources and selection criteria? The compilation of the PA dataset is described in detail in Schöning et al. (2017).

Descriptors
-----------

### MolPrint2D (*MP2D*) fingerprints

MolPrint2D fingerprints (@Bender2004) use atom environments as molecular representation. For each atom in a molecule, they determine the atom types of its connected atoms to represent its chemical environment. This basically resembles the chemical concept of functional groups.

In contrast to predefined lists of fragments (e.g. FP3, FP4 or MACCS fingerprints) or descriptors (e.g. PaDEL), they are generated dynamically from chemical structures. This has the advantage that they can capture substructures of toxicological relevance that are not included in other descriptors.

Chemical similarities (e.g. Tanimoto indices) can be calculated very efficiently with MolPrint2D fingerprints. Using them as descriptors for global models leads, however, to huge, sparsely populated matrices that cannot be handled by traditional machine learning algorithms. In our experiments none of the R and Tensorflow algorithms was capable of using them as descriptors.

MolPrint2D fingerprints were calculated with the OpenBabel cheminformatics library (@OBoyle2011a).

### PaDEL descriptors

Molecular 1D and 2D descriptors were calculated with the PaDEL-Descriptors program (version 2.21, @Yap2011).

As the training dataset contained 8309 instances, it was decided to delete instances with missing values during data pre-processing. Furthermore, substances with equivocal outcome were removed. The final training dataset contained 8080 instances with known mutagenic potential.

During feature selection, descriptors with near zero variance were removed using the '*nearZeroVar*' function (package '*caret*'). A descriptor was classified as having near zero variance if the percentage of its most common value was more than 90%, or if the frequency ratio of its most common value to the second most common value was greater than 95:5 (e.g. 95 instances of the most common value and 5 or fewer instances of the second most common value). After that, highly correlated descriptors were removed using the '*findCorrelation*' function (package '*caret*') with a cut-off of 0.9. This resulted in a training dataset with 516 descriptors. These descriptors were scaled to the range between 0 and 1 using the '*preProcess*' function (package '*caret*'). The scaling routine was saved in order to apply the same scaling to the testing dataset. As these three steps did not consider the dependent variable (experimental mutagenicity), they did not need to be included in the cross-validation of the models.

To further reduce the number of features, a LASSO (*least absolute shrinkage and selection operator*) regression was performed using the '*glmnet*' function (package '*glmnet*'). The reduced dataset was used for the generation of the pre-trained models. A sketch of this pre-processing pipeline is given below.

PaDEL descriptors were used in global (RF, SVM, LR, NN) and local (`lazar`) models.
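The following sketch condenses this pre-processing pipeline into the caret and glmnet calls named above. Variable names (`x`, `y`) and all settings other than the stated 95:5 frequency cut, the 0.9 correlation cut-off and the [0, 1] scaling are illustrative assumptions, not the exact project scripts.

```r
library(caret)
library(glmnet)

# x: matrix of PaDEL descriptors, y: factor of mutagenicity classes
# (assumed to be loaded already; names are illustrative)

# 1. remove descriptors with near zero variance
nzv <- nearZeroVar(x, freqCut = 95/5, uniqueCut = 10)
if (length(nzv) > 0) x <- x[, -nzv]

# 2. remove highly correlated descriptors (cut-off 0.9)
high_cor <- findCorrelation(cor(x), cutoff = 0.9)
if (length(high_cor) > 0) x <- x[, -high_cor]

# 3. scale descriptors to [0, 1]; keep the routine for the test set
scaler <- preProcess(x, method = "range")
x <- predict(scaler, x)

# 4. LASSO feature selection (alpha = 1) with cross-validated lambda
fit <- cv.glmnet(as.matrix(x), y, family = "binomial", alpha = 1)
w <- coef(fit, s = "lambda.min")
selected <- rownames(w)[as.vector(w) != 0][-1]  # drop the intercept
x <- x[, selected]
```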
Algorithms
----------

### `lazar`

`lazar` (*lazy structure--activity relationships*) is a modular framework for read-across model development and validation. It follows this basic workflow: for a given chemical structure, `lazar`

- searches in a database for similar structures (neighbours) with experimental data,
- builds a local QSAR model with these neighbours and
- uses this model to predict the unknown activity of the query compound.

This procedure resembles an automated version of read-across predictions in toxicology; in machine learning terms it would be classified as a k-nearest-neighbour algorithm.

Apart from this basic workflow, `lazar` is completely modular and allows the researcher to use arbitrary algorithms for similarity searches and local QSAR (*quantitative structure--activity relationship*) modelling. The algorithms used within this study are described in the following sections, and a minimal sketch of the complete procedure is given at the end of this section.

#### Neighbour identification

Utilizing this modularity, similarity calculations were based both on MolPrint2D fingerprints and on PaDEL descriptors.

For MolPrint2D fingerprints, the chemical similarity between two compounds $a$ and $b$ is expressed as the proportion between the atom environments common to both structures $A \cap B$ and the total number of atom environments $A \cup B$ (Jaccard/Tanimoto index):

$$sim = \frac{\lvert A \cap B \rvert}{\lvert A \cup B \rvert}$$

For PaDEL descriptors, the chemical similarity between two compounds $a$ and $b$ is expressed as the cosine similarity between their descriptor vectors $A$ and $B$:

$$sim = \frac{A \cdot B}{\lvert A \rvert \lvert B \rvert}$$

Threshold selection is a trade-off between prediction accuracy (high threshold) and the number of predictable compounds (low threshold). As it is in many practical cases desirable to make predictions even in the absence of closely related neighbours, we follow a tiered approach:

- First, a similarity threshold of 0.5 is used to collect neighbours, to create a local QSAR model and to make a prediction for the query compound. These are predictions with *high confidence*.
- If any of these steps fails, the procedure is repeated with a similarity threshold of 0.2 and the prediction is flagged with a warning that it might be out of the applicability domain of the training data (*low confidence*).

The similarity thresholds of 0.5 and 0.2 are the default values chosen by the software developers and remained unchanged during the course of these experiments.

Compounds with the same structure as the query structure are automatically eliminated from the neighbours to obtain unbiased predictions in the presence of duplicates.

#### Local QSAR models and predictions

Only similar compounds (neighbours) above the threshold are used for local QSAR models. In this investigation, we used a weighted majority vote of the neighbours' experimental data for mutagenicity classifications. Probabilities for both classes (mutagenic/non-mutagenic) are calculated according to the following formula, and the class with the higher probability is used as prediction outcome:

$$p_{c} = \frac{\sum \text{sim}_{n,c}}{\sum \text{sim}_{n}}$$

where $p_{c}$ is the probability of class $c$ (e.g. mutagenic or non-mutagenic), $\sum \text{sim}_{n,c}$ the sum of the similarities of neighbours with class $c$, and $\sum \text{sim}_{n}$ the sum of the similarities of all neighbours.

#### Applicability domain

The applicability domain (AD) of `lazar` models is determined by the structural diversity of the training data. If no similar compounds are found in the training data, no predictions will be generated. Warnings are issued if the similarity threshold had to be lowered from 0.5 to 0.2 in order to enable predictions. Predictions without warnings can be considered as close to the applicability domain (*high confidence*) and predictions with warnings as more distant from the applicability domain (*low confidence*). Quantitative applicability domain information can be obtained from the similarities of the individual neighbours.
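The following self-contained R sketch illustrates the tiered neighbour search and the weighted majority vote described above. It is an illustration of the algorithm, not `lazar`'s actual implementation; fingerprints are represented as character vectors of atom environments and all data are toy examples.

```r
# Jaccard/Tanimoto similarity between two sets of atom environments
tanimoto <- function(a, b) length(intersect(a, b)) / length(union(a, b))

lazar_predict <- function(query, fingerprints, labels) {
  # (compounds identical to the query would be removed here first)
  sims <- sapply(fingerprints, tanimoto, b = query)
  for (threshold in c(0.5, 0.2)) {  # tiered thresholds: high/low confidence
    idx <- which(sims >= threshold)
    if (length(idx) > 0) {
      # weighted majority vote: p_c = sum(sim_{n,c}) / sum(sim_n)
      p_mut <- sum(sims[idx][labels[idx] == "mutagenic"]) / sum(sims[idx])
      return(list(
        prediction  = ifelse(p_mut >= 0.5, "mutagenic", "non-mutagenic"),
        probability = max(p_mut, 1 - p_mut),
        confidence  = ifelse(threshold == 0.5, "high", "low")
      ))
    }
  }
  NULL  # no neighbours at all: outside the applicability domain
}

# Toy usage with three training compounds and one query:
train_fps <- list(c("C-C", "C-N", "C=O"), c("C-C", "C-N"), c("C-O"))
labels    <- c("mutagenic", "non-mutagenic", "non-mutagenic")
query     <- c("C-C", "C-N", "C-O")
lazar_predict(query, train_fps, labels)
```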
#### Availability

- `lazar` experiments for this manuscript: (source code, GPL3)
- `lazar` framework: (source code, GPL3)
- `lazar` GUI: (source code, GPL3)
- Public web interface:

### R Random Forest, Support Vector Machines, and Deep Learning

The RF, SVM, and DL models were generated with the R software (R-project for Statistical Computing, version 3.3.1); the specific R packages used are identified for each step in the description below. Condensed sketches of the three model calls are given at the end of this section.

#### Random Forest (*RF*)

For the RF model, the '*randomForest*' function (package '*randomForest*') was used. A forest with 1000 trees and a maximum of 200 terminal nodes was grown for the prediction.

#### Support Vector Machines (*SVM*)

The '*svm*' function (package '*e1071*') with a *radial basis function kernel* was used for the SVM model.

**TODO**: **Verena, Philipp** Should we refer to the DL models, like the Tensorflow models, as neural nets (NN)?

#### Deep Learning (*DL*)

The DL model was generated using the '*h2o.deeplearning*' function (package '*h2o*'). It contained four hidden layers with 70, 50, 50, and 10 neurons, respectively. Other hyperparameters were set as follows: l1 = 1.0E-7, l2 = 1.0E-11, epsilon = 1.0E-10, rho = 0.8 and quantile\_alpha = 0.5. For all other hyperparameters, the default values were used. Weights and biases were determined in a first step with an unsupervised DL model. These values were then used for the actual, supervised DL model.

To validate these models, an internal cross-validation approach was chosen: the training dataset was randomly split into training data, which contained 95% of the data, and validation data, which contained 5% of the data. A feature selection with LASSO was performed on the training data, reducing the number of descriptors to approximately 100. This step was repeated five times. The predictive models were trained on each of the five different training datasets, and their performance was tested with the corresponding validation data. This step was repeated 10 times.

**TODO**: **Verena** Can you please check whether this is still correct and, if necessary, adjust Figure 1?

![Flowchart of the generation and validation of the models generated in R](figures/image1.png){#fig:valid}

#### Applicability domain

**TODO**: **Verena** Which descriptors did you use to calculate the Jaccard index? The Jaccard index requires binary descriptors (e.g. MP2D); with PaDEL descriptors one could calculate e.g. a Euclidean or cosine distance instead.

The AD of the training dataset and the PA dataset was evaluated using the Jaccard distance. A Jaccard distance of 0 indicates that the substances are similar, whereas a value of 1 shows that the substances are different. The Jaccard distance was below 0.2 for all PAs relative to the training dataset. Therefore, the PA dataset is within the AD of the training dataset and the models can be used to predict the genotoxic potential of the PA dataset.

#### Availability

R scripts for these experiments can be found at https://git.in-silico.ch/mutagenicity-paper/scripts/R.
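The three model calls can be sketched as follows, assuming a data frame `train` with the pre-processed PaDEL descriptors and a binary factor `mutagenicity`. Function and argument names follow the cited packages; everything else, including the omission of the unsupervised pre-training step, is an illustrative simplification.

```r
library(randomForest)
library(e1071)
library(h2o)

# Random forest: 1000 trees, at most 200 terminal nodes per tree
rf_model <- randomForest(mutagenicity ~ ., data = train,
                         ntree = 1000, maxnodes = 200)

# Support vector machine with a radial basis function kernel
svm_model <- svm(mutagenicity ~ ., data = train, kernel = "radial")

# Deep learning: four hidden layers (70, 50, 50, 10 neurons) with the
# hyperparameters stated above (the unsupervised pre-training step
# described in the text is omitted here)
h2o.init()
train_h2o <- as.h2o(train)
dl_model <- h2o.deeplearning(
  y = "mutagenicity", training_frame = train_h2o,
  hidden = c(70, 50, 50, 10),
  l1 = 1e-7, l2 = 1e-11, epsilon = 1e-10, rho = 0.8,
  quantile_alpha = 0.5
)
```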
### Tensorflow models

Data pre-processing was done by rank transformation using the '*QuantileTransformer*' procedure. A sequential model with four layers was used: an input layer (12 nodes), two hidden layers (8 nodes each) and one output layer. A sigmoidal activation function was used for the output layer and the ReLU ('*Rectified Linear Unit*') activation function for all other layers. Additionally, an L^2^ penalty of 0.001 was applied to the input layer.

For the training of the model, the ADAM algorithm was used to minimise the cross-entropy loss, using the default parameters of Keras. Training was performed for 100 epochs with a batch size of 64. The model was implemented with Python 3.6 and Keras; a sketch of the architecture is shown below.
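The original implementation used Python 3.6; for consistency with the other examples in this paper, the sketch below uses the equivalent R interface to Keras. `x_train`, `y_train` and `n_features` are assumed to be the pre-processed descriptors, the binary labels and the descriptor count, and the described input layer is interpreted as the first dense layer.

```r
library(keras)

# Sequential model: 12-node input layer with L2 penalty, two 8-node
# hidden layers (ReLU), one sigmoidal output node
model <- keras_model_sequential() %>%
  layer_dense(units = 12, activation = "relu",
              kernel_regularizer = regularizer_l2(0.001),
              input_shape = n_features) %>%
  layer_dense(units = 8, activation = "relu") %>%
  layer_dense(units = 8, activation = "relu") %>%
  layer_dense(units = 1, activation = "sigmoid")

model %>% compile(
  optimizer = "adam",                 # ADAM with Keras default parameters
  loss      = "binary_crossentropy",  # cross-entropy for two classes
  metrics   = "accuracy"
)

model %>% fit(x_train, y_train, epochs = 100, batch_size = 64)
```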
**TODO**: **Philipp** I have left out the old results with feature selection, is that OK? In that case this paragraph would also have to be deleted, right?

**TODO**: **Philipp** Can you please complete the following paragraphs?

#### Random forests (*RF*)

#### Logistic regression (SGD) (*LR-sgd*)

#### Logistic regression (scikit) (*LR-scikit*)

**TODO**: **Philipp, Verena** DL or NN?

#### Neural Nets (*NN*)

Alternatively, a DL model was established with the Python-based Tensorflow program, using the high-level API Keras to build the models.

Tensorflow models used the same PaDEL descriptors as the R models.

#### Availability

Jupyter notebooks for these experiments can be found at https://git.in-silico.ch/mutagenicity-paper/scripts/tensorflow.

Validation
----------

10-fold cross-validation was used for all Tensorflow models.

Results
=======

10-fold crossvalidations
------------------------

Crossvalidation results are summarized in the following tables: @tbl:lazar shows the `lazar` results with MolPrint2D and PaDEL descriptors, @tbl:R the R results and @tbl:tensorflow the Tensorflow results.

```{#tbl:lazar .table file="tables/lazar-summary.csv" caption="Summary of lazar crossvalidation results (all/high confidence predictions)"}
```

```{#tbl:R .table file="tables/r-summary.csv" caption="Summary of R crossvalidation results"}
```

```{#tbl:tensorflow .table file="tables/tensorflow-summary.csv" caption="Summary of tensorflow crossvalidation results"}
```

@fig:roc depicts the position of all crossvalidation results in receiver operating characteristic (ROC) space.

![ROC plot of crossvalidation results.](figures/roc.png){#fig:roc}

Confusion matrices for all models are available from the git repository https://git.in-silico.ch/mutagenicity-paper/10-fold-crossvalidations/confusion-matrices/; individual predictions can be found at https://git.in-silico.ch/mutagenicity-paper/10-fold-crossvalidations/predictions/.

The most accurate crossvalidation predictions were obtained with standard `lazar` models using MolPrint2D descriptors ({{cv.lazar-high-confidence.acc}} for predictions with high confidence, {{cv.lazar-all.acc}} for all predictions). Models utilizing PaDEL descriptors have generally lower accuracies, ranging from {{cv.R-DL.acc}} (R deep learning) to {{cv.R-RF.acc}} (R/Tensorflow random forests). Sensitivity and specificity are generally well balanced, with the exception of the `lazar`-PaDEL (low sensitivity) and R deep learning (low specificity) models.

Pyrrolizidine alkaloid mutagenicity predictions
-----------------------------------------------

Mutagenicity predictions from all investigated models for 602 pyrrolizidine alkaloids (PAs) are shown in Table 4. A CSV table with all predictions can be downloaded from https://git.in-silico.ch/mutagenicity-paper/tables/pa-table.csv.

**TODO**: **Verena and Philipp** Could you please spot-check the table?

\input{tables/pa-tab.tex}

@tbl:pa-summary summarises the number of positive and negative mutagenicity predictions for all investigated models.

```{#tbl:pa-summary .table file="tables/pa-summary.csv" caption="Summary of pyrrolizidine alkaloid mutagenicity predictions"}
```

For the visualisation of the position of the pyrrolizidine alkaloids with respect to the training dataset, we have applied t-distributed stochastic neighbor embedding (t-SNE, @Maaten2008) to the MolPrint2D and PaDEL descriptors. t-SNE maps each high-dimensional object (chemical) to a two-dimensional point such that similar objects are represented by nearby points and dissimilar objects by distant points. A sketch for reproducing these maps is shown after the figures.

@fig:tsne-mp2d shows the t-SNE of the pyrrolizidine alkaloids (PA) and the mutagenicity training data in MP2D space (Tanimoto/Jaccard similarity).

![t-SNE visualisation of mutagenicity training data and pyrrolizidine alkaloids (PA) in MP2D space](figures/tsne-mp2d.png){#fig:tsne-mp2d}

@fig:tsne-padel shows the t-SNE of the pyrrolizidine alkaloids (PA) and the mutagenicity training data in PaDEL space (Euclidean similarity).

![t-SNE visualisation of mutagenicity training data and pyrrolizidine alkaloids (PA) in PaDEL space](figures/tsne-padel.png){#fig:tsne-padel}
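The t-SNE maps can be reproduced along the following lines with the Rtsne package. This is a generic sketch, assuming a descriptor matrix `x` with one row per compound and a logical vector `is_pa` marking the pyrrolizidine alkaloids; it is not the exact script used for the figures.

```r
library(Rtsne)

# x: numeric matrix (PaDEL descriptors or binary MP2D fingerprints),
# one row per compound; 'is_pa' flags the pyrrolizidine alkaloids
tsne <- Rtsne(x, dims = 2, perplexity = 30, check_duplicates = FALSE)
plot(tsne$Y, col = ifelse(is_pa, "red", "grey"),
     xlab = "t-SNE 1", ylab = "t-SNE 2")
```

For the MP2D map, a precomputed Tanimoto distance matrix can be passed to `Rtsne` instead, together with `is_distance = TRUE`.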
Discussion
==========

Data
----

A new training dataset for *Salmonella* mutagenicity was created from three different sources (@Kazius2005, @Hansen2009, @EFSA2016). It contains 8309 unique chemical structures, which is, to our knowledge, the largest public mutagenicity dataset presently available. The new training data can be downloaded from .

Model performance
-----------------

@tbl:lazar, @tbl:R, @tbl:tensorflow and @fig:roc show that the standard `lazar` algorithm (with MP2D fingerprints) gives the most accurate crossvalidation results. R Random Forest, Support Vector Machine and Tensorflow models have similar accuracies with balanced sensitivity (true positive rate) and specificity (true negative rate). `lazar` models with PaDEL descriptors have low sensitivity, and R Deep Learning models have low specificity.

The accuracy of `lazar` *in-silico* predictions is comparable to the interlaboratory variability of the Ames test (80-85% according to @Benigni1988), especially for predictions with high confidence ({{cv.lazar-high-confidence.acc_perc}}%). This is a clear indication that *in-silico* predictions can be as reliable as the bioassays, if the compounds are close to the applicability domain. This conclusion is also supported by our analysis of `lazar` lowest observed effect level predictions, which are also similar to the experimental variability (@Helma2018).

The lowest number of predictions ({{cv.lazar-padel-high-confidence.n}}) was obtained from `lazar`-PaDEL high confidence predictions; the largest number of predictions comes from Tensorflow models ({{cv.tensorflow-rf.v3.n}}). Standard `lazar` models give a slightly lower number of predictions ({{cv.lazar-all.n}}) than R and Tensorflow models. This is not necessarily a disadvantage, because `lazar` abstains from predictions if the query compound is very dissimilar from the compounds in the training set, and thus avoids making predictions for compounds outside of the applicability domain.

Descriptors
-----------

This study uses two types of descriptors for the characterisation of chemical structures:

*MolPrint2D* fingerprints (MP2D, @Bender2004) use atom environments (i.e. connected atom types for all atoms in a molecule) as molecular representation, which basically resembles the chemical concept of functional groups.

MP2D descriptors are used to determine chemical similarities in the default `lazar` settings, and previous experiments have shown that they give more accurate results than predefined fragments (e.g. MACCS, FP2-4).

In order to investigate if MP2D fingerprints are also suitable for global models, we tried to build R and Tensorflow models, both with and without unsupervised feature selection. Unfortunately, none of the algorithms was capable of dealing with the large and sparsely populated descriptor matrix. Based on this result we can conclude that MolPrint2D descriptors are at the moment unsuitable for standard global machine learning algorithms. `lazar` does not suffer from this size and sparseness problem, because (a) it internally utilizes a much more efficient occurrence-based representation and (b) it uses fingerprints only for similarity calculations and not as model parameters.

PaDEL calculates topological and physical-chemical descriptors.

**TODO**: **Verena** Can you please briefly describe the descriptors again?

*PaDEL* descriptors were used for `lazar`, R and Tensorflow models. All models based on PaDEL descriptors had similar crossvalidation accuracies, which were significantly lower than the `lazar` MolPrint2D results. Direct comparisons are available only for the `lazar` algorithm, and in this case, too, PaDEL accuracies were lower than MolPrint2D accuracies.

Based on the `lazar` results we can conclude that PaDEL descriptors are less suited for chemical similarity calculations than MP2D descriptors. It is also likely that PaDEL descriptors lead to less accurate predictions for global models, but we cannot draw any definitive conclusion in the absence of MP2D models.

Algorithms
----------

`lazar` is formally a *k-nearest-neighbour* algorithm that searches for similar structures for a given compound and calculates the prediction based on the experimental data for these structures. The QSAR literature frequently calls such models *local models*, because models are generated specifically for each query compound. R and Tensorflow models are, in contrast, *global models*, i.e. a single model is used to make predictions for all compounds.

It has been postulated in the past that local models are more accurate, because they can account better for mechanisms that affect only a subset of the training data. Our results seem to support this assumption, because standard `lazar` models with MolPrint2D descriptors perform better than global models. The accuracy of `lazar` models with PaDEL descriptors is, however, substantially lower and comparable to that of global models with the same descriptors.

This observation may lead to the conclusion that the choice of suitable descriptors is more important for predictive accuracy than the modelling algorithm, but we were unable to obtain global MP2D models for direct comparisons. The selection of an appropriate modelling algorithm is still crucial, because it needs the capability to handle the descriptor space. Neighbour (and thus similarity) based algorithms like `lazar` have a clear advantage in this respect over global machine learning algorithms (e.g. RF, SVM, LR, NN), because Tanimoto/Jaccard similarities can be calculated efficiently with simple set operations.
Pyrrolizidine alkaloid mutagenicity predictions
-----------------------------------------------

`lazar` models with MolPrint2D descriptors predicted {{pa.lazar.mp2d.all.n_perc}}% of the pyrrolizidine alkaloids (PAs) ({{pa.lazar.mp2d.high_confidence.n_perc}}% with high confidence); the remaining compounds are not within their applicability domain. All other models predicted 100% of the 602 compounds, indicating that all compounds are within their applicability domain.

Mutagenicity predictions from the different models show little agreement in general (Table 4). 42 of the 602 PAs have non-conflicting predictions (all of them non-mutagenic). Most models predict a predominantly non-mutagenic outcome for PAs, with the exception of the R deep learning (DL) and the Tensorflow scikit logistic regression models ({{pa.tf.dl.mut_perc}}% and {{pa.tf.lr_scikit.mut_perc}}% positive predictions). R RF and SVM models favour non-mutagenic predictions very strongly (only {{pa.r.rf.mut_perc}}% and {{pa.r.svm.mut_perc}}% mutagenic PAs), while Tensorflow models classify approximately half of the PAs as mutagenic (RF {{pa.tf.rf.mut_perc}}%, LR-sgd {{pa.tf.lr_sgd}}%, LR-scikit {{pa.tf.lr_scikit.mut_perc}}%, NN {{pa.tf.nn.mut_perc}}%). `lazar` models predict predominantly non-mutagenicity, but to a lesser extent than the R models (MP2D {{pa.lazar.mp2d.all.mut_perc}}%, PaDEL {{pa.lazar.padel.all.mut_perc}}%).

It is interesting to note that different implementations of the same algorithm show little accordance in their predictions (see e.g. R-RF vs. Tensorflow-RF and LR-sgd vs. LR-scikit in Table 4 and @tbl:pa-summary). **TODO**: **Verena, Philipp** Do you have an explanation for this?

@fig:tsne-mp2d and @fig:tsne-padel show the t-SNE maps of the training data and the pyrrolizidine alkaloids. In @fig:tsne-mp2d the PAs are located closely together at the outer border of the training set. In @fig:tsne-padel they are less clearly separated and spread over the space occupied by the training examples.

This is probably the reason why the PaDEL models predicted all instances and the MP2D model only {{pa.lazar.mp2d.all.n}} PAs. Predicting a large number of instances is, however, not the ultimate goal; we need accurate predictions and an unambiguous estimation of the applicability domain. With PaDEL descriptors *all* PAs are within the applicability domain of the training data, which is unlikely despite the size of the training set. MolPrint2D descriptors provide a clearer separation, which is also reflected in a better separation between high and low confidence predictions in `lazar` MP2D predictions as compared to `lazar` PaDEL predictions. Crossvalidation results with substantially higher accuracies for MP2D models than for PaDEL models also support this argument.

Differences between MP2D and PaDEL descriptors can be explained by their specific properties: PaDEL calculates a fixed set of descriptors for all structures, while MolPrint2D descriptors represent the substructures that are present in a compound. For this reason there is no fixed number of MP2D descriptors; the descriptor space consists of all unique substructures of the training set. If a query compound contains new substructures, this is immediately reflected in a lower similarity to training compounds, which makes applicability domain estimations very straightforward. With PaDEL (or any other predefined descriptors), the same set of descriptors is calculated for every compound, even if a compound comes from a completely new chemical class.
From a practical point of view, we still have to face the question of how to choose model predictions if no experimental data is available (we found two PAs in the training data, but this number is too low to draw any general conclusions). Based on the crossvalidation results and the arguments in favour of MolPrint2D descriptors, we would put the highest trust in `lazar` MolPrint2D predictions, especially in high-confidence predictions. `lazar` predictions have an accuracy comparable to the experimental variability (@Helma2018) for compounds within the applicability domain. They should, however, not be trusted blindly. For practical purposes it is important to study the rationales (i.e. the neighbours and their experimental activities) for each prediction of relevance. A freely accessible GUI for this purpose has been implemented at https://lazar.in-silico.ch.

**TODO**: **Verena** If you want to discuss lazar results in detail, I can compile detailed predictions (with similar compounds and their activities) for individual examples.

Conclusions
===========

A new public *Salmonella* mutagenicity training dataset with 8309 compounds was created and used to train `lazar`, R and Tensorflow models with MolPrint2D and PaDEL descriptors.

The best performance was obtained with `lazar` models using MolPrint2D descriptors, with prediction accuracies ({{cv.lazar-high-confidence.acc_perc}}%) comparable to the interlaboratory variability of the Ames test (80-85%). Models based on PaDEL descriptors had lower accuracies than MolPrint2D models, but only the `lazar` algorithm could use MolPrint2D descriptors.

**TODO**: PA predictions

References
==========