Diffstat (limited to 'loael.tex')
-rw-r--r-- loael.tex 41
1 files changed, 21 insertions, 20 deletions
diff --git a/loael.tex b/loael.tex
index b5c625b..b82a370 100644
--- a/loael.tex
+++ b/loael.tex
@@ -1,4 +1,4 @@
-\documentclass[]{article}
+\documentclass[]{achemso}
\usepackage{lmodern}
\usepackage{amssymb,amsmath}
\usepackage{ifxetex,ifluatex}
@@ -389,7 +389,9 @@ Finally the local RF model is applied to
\href{https://github.com/opentox/lazar/blob/loael-paper.submission/lib/model.rb\#L194-L272}{predict
the activity} of the query compound. The RMSE of bootstrapped local
model predictions is used to construct 95\% prediction intervals at
-1.96*RMSE.
+1.96*RMSE. The width of the prediction interval indicates the expected
+prediction accuracy. The ``true'' value of a prediction should be with
+95\% probability within the prediction interval.
If RF modelling or prediction fails, the program resorts to using the
\href{https://github.com/opentox/lazar/blob/loael-paper.submission/lib/regression.rb\#L6-L16}{weighted
@@ -725,20 +727,20 @@ experimental variability (Lo Piparo et al. 2014). In the present study,
a similar approach was applied to build models generating quantitative
predictions of long-term toxicity. Two databases compiling chronic oral
rat lowest adverse effect levels (LOAEL) as endpoint were available from
-different sources. \protect\hypertarget{dataset-comparison-1}{}{}Our
-investigations clearly indicated that the Nestlé and FSVO databases are
-very similar in terms of chemical structures and properties as well as
-distribution of experimental LOAEL values. The only significant
-difference that we observed was that the Nestlé one has larger amount of
-small molecules, than the FSVO database. For this reason we pooled both
-databases into a single training dataset for read across predictions.
+different sources. Our investigations clearly indicated that the Nestlé
+and FSVO databases are very similar in terms of chemical structures and
+properties as well as distribution of experimental LOAEL values. The
+only significant difference that we observed was that the Nestlé one has
+larger amount of small molecules, than the FSVO database. For this
+reason we pooled both databases into a single training dataset for read
+across predictions.
An early review of the databases revealed that 155 out of the 671
chemicals available in the training datasets had at least two
independent studies/LOAELs. These studies were exploited to generate
information on the reproducibility of chronic animal studies and were
used to evaluate prediction performance of the models in the context of
-experimental variability.Considerable variability in the experimental
+experimental variability. Considerable variability in the experimental
data was observed. Study design differences, including dose selection,
dose spacing and route of administration are likely explanation of
experimental variability. High experimental variability has an impact on
@@ -747,15 +749,14 @@ quality by introducing noise into the training data, secondly it
influences accuracy estimates because predictions have to be compared
against noisy data where ``true'' experimental values are unknown. This
will become obvious in the next section, where comparison of predictions
-with experimental data is
-discussed.\protect\hypertarget{lazar-predictions}{}{}The data obtained
-in the present study indicate that \texttt{lazar} generates reliable
-predictions for compounds within the applicability domain of the
-training data (i.e.~predictions without warnings, which indicates a
-sufficient number of neighbors with similarity \textgreater{} 0.5 to
-create local random forest models). Correlation analysis shows that
-errors (\(\text{RMSE}\)) and explained variance (\(r^{2}\)) are
-comparable to experimental variability of the training data.
+with experimental data is discussed. The data obtained in the present
+study indicate that \texttt{lazar} generates reliable predictions for
+compounds within the applicability domain of the training data
+(i.e.~predictions without warnings, which indicates a sufficient number
+of neighbors with similarity \textgreater{} 0.5 to create local random
+forest models). Correlation analysis shows that errors (\(\text{RMSE}\))
+and explained variance (\(r^{2}\)) are comparable to experimental
+variability of the training data.
Predictions with a warning (neighbor similarity \textless{} 0.5 and
\textgreater{} 0.2 or weighted average predictions) are more uncertain.
@@ -786,7 +787,7 @@ since evidence suggest that exposure duration has little impact on the
levels of NOAELs/LOAELs (Zarn, Engeli, and Schlatter 2011, Zarn, Engeli,
and Schlatter (2013)).
-Elena: Should we add a GUI screenshot?
+TODO: GUI screenshot
\section{Summary}\label{summary}