Diffstat (limited to 'loael.tex')
-rw-r--r-- | loael.tex | 41 |
1 files changed, 21 insertions, 20 deletions
@@ -1,4 +1,4 @@
-\documentclass[]{article}
+\documentclass[]{achemso}
 \usepackage{lmodern}
 \usepackage{amssymb,amsmath}
 \usepackage{ifxetex,ifluatex}
@@ -389,7 +389,9 @@ Finally the local RF model is applied to
 \href{https://github.com/opentox/lazar/blob/loael-paper.submission/lib/model.rb\#L194-L272}{predict
 the activity} of the query compound. The RMSE of bootstrapped local
 model predictions is used to construct 95\% prediction intervals at
-1.96*RMSE.
+1.96*RMSE. The width of the prediction interval indicates the expected
+prediction accuracy. The ``true'' value of a prediction should be with
+95\% probability within the prediction interval.
 
 If RF modelling or prediction fails, the program resorts to using the
 \href{https://github.com/opentox/lazar/blob/loael-paper.submission/lib/regression.rb\#L6-L16}{weighted
@@ -725,20 +727,20 @@ experimental variability (Lo Piparo et al. 2014). In the present study, a
 similar approach was applied to build models generating quantitative
 predictions of long-term toxicity. Two databases compiling chronic oral
 rat lowest adverse effect levels (LOAEL) as endpoint were available from
-different sources. \protect\hypertarget{dataset-comparison-1}{}{}Our
-investigations clearly indicated that the Nestlé and FSVO databases are
-very similar in terms of chemical structures and properties as well as
-distribution of experimental LOAEL values. The only significant
-difference that we observed was that the Nestlé one has larger amount of
-small molecules, than the FSVO database. For this reason we pooled both
-databases into a single training dataset for read across predictions.
+different sources. Our investigations clearly indicated that the Nestlé
+and FSVO databases are very similar in terms of chemical structures and
+properties as well as distribution of experimental LOAEL values. The
+only significant difference that we observed was that the Nestlé one has
+larger amount of small molecules, than the FSVO database. For this
+reason we pooled both databases into a single training dataset for read
+across predictions.
 
 An early review of the databases revealed that 155 out of the 671
 chemicals available in the training datasets had at least two
 independent studies/LOAELs. These studies were exploited to generate
 information on the reproducibility of chronic animal studies and were
 used to evaluate prediction performance of the models in the context of
-experimental variability.Considerable variability in the experimental
+experimental variability. Considerable variability in the experimental
 data was observed. Study design differences, including dose selection,
 dose spacing and route of administration are likely explanation of
 experimental variability. High experimental variability has an impact on
@@ -747,15 +749,14 @@ quality by introducing noise into the training data, secondly it
 influences accuracy estimates because predictions have to be compared
 against noisy data where ``true'' experimental values are unknown. This
 will become obvious in the next section, where comparison of predictions
-with experimental data is
-discussed.\protect\hypertarget{lazar-predictions}{}{}The data obtained
-in the present study indicate that \texttt{lazar} generates reliable
-predictions for compounds within the applicability domain of the
-training data (i.e.~predictions without warnings, which indicates a
-sufficient number of neighbors with similarity \textgreater{} 0.5 to
-create local random forest models). Correlation analysis shows that
-errors (\(\text{RMSE}\)) and explained variance (\(r^{2}\)) are
-comparable to experimental variability of the training data.
+with experimental data is discussed. The data obtained in the present
+study indicate that \texttt{lazar} generates reliable predictions for
+compounds within the applicability domain of the training data
+(i.e.~predictions without warnings, which indicates a sufficient number
+of neighbors with similarity \textgreater{} 0.5 to create local random
+forest models). Correlation analysis shows that errors (\(\text{RMSE}\))
+and explained variance (\(r^{2}\)) are comparable to experimental
+variability of the training data.
 
 Predictions with a warning (neighbor similarity \textless{} 0.5 and
 \textgreater{} 0.2 or weighted average predictions) are more uncertain.
@@ -786,7 +787,7 @@ since evidence suggest that exposure duration has little impact on the
 levels of NOAELs/LOAELs (Zarn, Engeli, and Schlatter 2011, Zarn, Engeli,
 and Schlatter (2013)).
 
-Elena: Should we add a GUI screenshot?
+TODO: GUI screenshot
 
 \section{Summary}\label{summary}