author | Christoph Helma <helma@in-silico.ch> | 2018-03-13 15:06:05 +0100 |
---|---|---|
committer | Christoph Helma <helma@in-silico.ch> | 2018-03-13 15:06:05 +0100 |
commit | 1aa8093ea8f182ec7cc9aae626f494a1e14c8c84 | |
tree | 545cad6d548ac26c6c23961a805a07884fd0f6f0 /loael.tex | |
parent | 391042ada12bd0f9be2649b47e8746071354955a |
text revisions
Diffstat (limited to 'loael.tex')
-rw-r--r-- | loael.tex | 78 |
1 files changed, 46 insertions, 32 deletions
@@ -100,14 +100,15 @@
 \maketitle
 \begin{abstract}
 This study compares the accuracy of (Q)SAR/read-across predictions with
-the experimental variability of chronic LOAEL values from \emph{in vivo}
-experiments. We could demonstrate that predictions of the \texttt{lazar}
-algrorithm within the applicability domain of the training data have the
-same variability as the experimental training data. Predictions with a
-lower similarity threshold (i.e.~a larger distance from the
-applicability domain) are also significantly better than random
-guessing, but the errors to be expected are higher and a manual
-inspection of prediction results is highly recommended.
+the experimental variability of chronic lowest-observed-adverse-effect
+levels (LOAELs) from \emph{in vivo} experiments. We could demonstrate
+that predictions of the lazy structure-activity relationships
+(\texttt{lazar}) algorithm within the applicability domain of the
+training data have the same variability as the experimental training
+data. Predictions with a lower similarity threshold (i.e.~a larger
+distance from the applicability domain) are also significantly better
+than random guessing, but the errors to be expected are higher and a
+manual inspection of prediction results is highly recommended.
 \end{abstract}
 
 \textsuperscript{1} in silico toxicology gmbh, Basel,
@@ -166,13 +167,16 @@ methods that lead to impressive validation results, but also to
 overfitted models with little practical relevance.
 
 In the present study, automatic read-across like models were built to
-generate quantitative predictions of long-term toxicity. Two databases
-compiling chronic oral rat Lowest Adverse Effect Levels (LOAEL) as
-endpoint were used. An early review of the databases revealed that many
-chemicals had at least two independent studies/LOAELs. These studies
-were exploited to generate information on the reproducibility of chronic
-animal studies and were used to evaluate prediction performance of the
-models in the context of experimental variability.
+generate quantitative predictions of long-term toxicity. The aim of the
+work was not to predict the nature of the toxicological effects of
+chemicals, but to obtain quantitative values which could be compared to
+exposure. Two databases compiling chronic oral rat Lowest Adverse Effect
+Levels (LOAEL) as endpoint were used. An early review of the databases
+revealed that many chemicals had at least two independent
+studies/LOAELs. These studies were exploited to generate information on
+the reproducibility of chronic animal studies and were used to evaluate
+prediction performance of the models in the context of experimental
+variability.
 
 An important limitation often raised for computational toxicology is the
 lack of transparency on published models and consequently on the
@@ -334,8 +338,6 @@ structures and do not rely on predefined lists of fragments (such as
 OpenBabel FP3, FP4 or MACCs fingerprints or lists of
 toxocophores/toxicophobes). This has the advantage that they may capture
 substructures of toxicological relevance that are not included in other
-fingerprints. Unpublished experiments have shown that predictions with
-MolPrint2D fingerprints are indeed more accurate than other OpenBabel
 fingerprints.
 
 From MolPrint2D fingerprints we can construct a feature vector with all
@@ -367,6 +369,10 @@ absence of closely related neighbors, we follow a tiered approach:
   similarity threshold of 0.2 and the prediction is flagged with a
   warning that it might be out of the applicability domain of the
   training data.
+\item
+  Similarity thresholds of 0.5 and 0.2 are the default values chosen by
+  the software developers and remained unchanged during the course of
+  these experiments.
 \end{itemize}
 
 Compounds with the same structure as the query structure are
@@ -393,11 +399,12 @@ resampling.
 
 Finally the local RF model is applied to
 \href{https://github.com/opentox/lazar/blob/loael-paper.submission/lib/model.rb\#L194-L272}{predict
-the activity} of the query compound. The RMSE of bootstrapped local
-model predictions is used to construct 95\% prediction intervals at
-1.96*RMSE. The width of the prediction interval indicates the expected
-prediction accuracy. The ``true'' value of a prediction should be with
-95\% probability within the prediction interval.
+the activity} of the query compound. The root-mean-square error (RMSE)
+of bootstrapped local model predictions is used to construct 95\%
+prediction intervals at 1.96*RMSE. The width of the prediction interval
+indicates the expected prediction accuracy. The ``true'' value of a
+prediction should be with 95\% probability within the prediction
+interval.
 
 If RF modelling or prediction fails, the program resorts to using the
 \href{https://github.com/opentox/lazar/blob/loael-paper.submission/lib/regression.rb\#L6-L16}{weighted
@@ -724,15 +731,15 @@ In order to establish the level of safety concern of food chemicals
 toxicologically not characterized, a methodology mimicking the process
 of chemical risk assessment, and supported by computational toxicology,
 was proposed (Schilter et al. 2014). It is based on the calculation of
-margins of exposure (MoE) between predicted values of toxicity and
-exposure estimates. The level of safety concern of a chemical is then
-determined by the size of the MoE and its suitability to cover the
-uncertainties of the assessment. To be applicable, such an approach
-requires quantitative predictions of toxicological endpoints relevant
-for risk assessment. The present work focuses on the prediction of
-chronic toxicity, a major and often pivotal endpoint of toxicological
-databases used for hazard identification and characterization of food
-chemicals.
+margins of exposure (MoE) that is the ratio between the predicted
+chronic toxicity value (LOAEL) and exposure estimate. The level of
+safety concern of a chemical is then determined by the size of the MoE
+and its suitability to cover the uncertainties of the assessment. To be
+applicable, such an approach requires quantitative predictions of
+toxicological endpoints relevant for risk assessment. The present work
+focuses on the prediction of chronic toxicity, a major and often pivotal
+endpoint of toxicological databases used for hazard identification and
+characterization of food chemicals.
 
 In a previous study, automated read-across like models for predicting
 carcinogenic potency were developed. In these models, substances in the
@@ -845,7 +852,14 @@ variability as the experimental training data. In such cases
 experimental investigations can be substituted with \emph{in silico}
 predictions. Predictions with a lower similarity threshold can still
 give usable results, but the errors to be expected are higher and a
-manual inspection of prediction results is highly recommended.
+manual inspection of prediction results is highly recommended. Anyway,
+our suggested workflow includes always the visual inspection of the
+chemical structures of the neighbors selected by the model. Indeed it
+will strength the prediction confidence (if the input structure looks
+very similar to the neighbors selected to build the model) or it can
+drive to the conclusion to use read-across with the most similar
+compound of the database (in case not enough similar compounds to build
+the model are present in the database).
 
 \section*{References}\label{references}
 \addcontentsline{toc}{section}{References}