From d32fea79a1b6f1673510f1666bb471e6deb37eff Mon Sep 17 00:00:00 2001
From: Christoph Helma <helma@in-silico.ch>
Date: Fri, 26 Jan 2018 14:36:18 +0100
Subject: final proof before submission

---
 figures/lazar-screenshot.pdf | Bin 0 -> 255264 bytes
 loael.Rmd                    | 179 +++++++------------------------------------
 loael.md                     |  48 +++++-------
 loael.pdf                    | Bin 433007 -> 683927 bytes
 loael.tex                    |  92 ++++++++++++----------
 5 files changed, 99 insertions(+), 220 deletions(-)
 create mode 100644 figures/lazar-screenshot.pdf

diff --git a/figures/lazar-screenshot.pdf b/figures/lazar-screenshot.pdf
new file mode 100644
index 0000000..f550fa4
Binary files /dev/null and b/figures/lazar-screenshot.pdf differ
diff --git a/loael.Rmd b/loael.Rmd
index 0225ba9..c39a3f7 100644
--- a/loael.Rmd
+++ b/loael.Rmd
@@ -62,8 +62,8 @@ prioritization in research and development (safety by design) is a big
 challenge mainly because of the time and cost constraints associated with the
 generation of relevant animal data. In this context, alternative approaches to
 obtain timely and fit-for-purpose toxicological information are being
-developed. Amongst others, non-testing, structure-activity based *in silico*
-toxicology methods (also called computational toxicology) are considered highly
+developed. Amongst others, *in silico*
+toxicology methods are considered highly
 promising. Importantly, they are raising more and more interests
 and getting increased acceptance in various regulatory (e.g.
 [@ECHA2008, @EFSA2016, @EFSA2014, @HealthCanada2016, @OECD2015]) and industrial (e.g.
@@ -72,13 +72,12 @@ and getting increased acceptance in various regulatory (e.g.
 For a long time already, computational methods have been an integral
 part of pharmaceutical discovery pipelines, while in chemical food
 safety their actual potentials emerged only recently [@LoPiparo2011].
-In this later field, an application considered critical is in the
+In this field, an application considered critical is in the
 establishment of levels of safety concern in order to rapidly and
 efficiently manage toxicologically uncharacterized chemicals identified
 in food. This requires a risk-based approach to benchmark exposure with
 a quantitative value of toxicity relevant for risk assessment [@Schilter2014].
-Since most of the time chemical food safety deals with
-life-long exposures to relatively low levels of chemicals, and because
+Since chronic studies have the highest statistical power (more animals per group and more endpoints than other studies) and because
 long-term toxicity studies are often the most sensitive in food
 toxicology databases, predicting chronic toxicity is of prime
 importance. Up to now, read-across and Quantitative Structure Activity
@@ -96,44 +95,25 @@ tempting for model developers to use aggressive model optimisation
 methods that lead to impressive validation results, but also to
 overfitted models with little practical relevance.
 
-In the present study, automatic read-across like models were built to
-generate quantitative predictions of long-term toxicity. Two databases
-compiling chronic oral rat Lowest Adverse Effect Levels (LOAEL) as
-endpoint were used. An early review of the databases revealed that many
-chemicals had at least two independent studies/LOAELs. These studies
-were exploited to generate information on the reproducibility of chronic
-animal studies and were used to evaluate prediction performance of the
-models in the context of experimental variability.
+In the present study, automatic read-across like models were built to generate
+quantitative predictions of long-term toxicity. Two databases compiling chronic
+oral rat Lowest Adverse Effect Levels (LOAEL) as endpoint were used. An early
+review of the databases revealed that many chemicals had at least two
+independent studies/LOAELs. These studies were exploited to generate
+information on the reproducibility of chronic animal studies and were used to
+evaluate prediction performance of the models in the context of experimental
+variability.
 
 An important limitation often raised for computational toxicology is the lack
 of transparency on published models and consequently on the difficulty for the
 scientific community to reproduce and apply them. To overcome these issues,
-source code for all programs and libraries and the data that have been used to generate this
-manuscript are made available under GPL3 licenses. Data and compiled
-programs with all dependencies for the reproduction of results in this manuscript are available as
-a self-contained docker image. All data, tables and figures in this manuscript
-was generated directly from experimental results using the `R` package `knitR`.
-<!-- A single command repeats all experiments (possibly with different settings) and
-updates the manuscript with the new results. -->
-
-<!--
-overcome these issues, all databases and programs that have been used to
-generate this manuscript are made available under GPL3 licenses.
-A self-contained docker image with all programs, libraries and data
-required for the reproduction of these results is available from
-<https://hub.docker.com/r/insilicotox/loael-paper/>.
-
-Source code and datasets for the reproduction of this manuscript can be
-downloaded from the GitHub repository
-<https://github.com/opentox/loael-paper>. The lazar framework [@Maunz2013]
-is also available under a GPL3 License from
-<https://github.com/opentox/lazar>.
-
-A graphical webinterface for `lazar` model predictions and validation results
-is publicly accessible at <https://lazar.in-silico.ch>, models presented in
-this manuscript will be included in future versions. Source code for the GUI
-can be obtained from <https://github.com/opentox/lazar-gui>.
--->
+source code for all programs and libraries and the data that have been used to
+generate this manuscript are made available under GPL3 licenses. Data and
+compiled programs with all dependencies for the reproduction of results in this
+manuscript are available as a self-contained docker image. All data, tables and
+figures in this manuscript were generated directly from experimental results
+using the `R` package `knitr`.
+
 Materials and Methods
 =====================
 
@@ -344,7 +324,7 @@ data (Nestlé and FSVO databases combined).
 ## Availability
 
 Public webinterface
-  ~ <https://lazar.in-silico.ch>
+  ~ <https://lazar.in-silico.ch> (see [@fig:screenshot])
 
 `lazar` framework
   ~ <https://github.com/opentox/lazar> (source code)
@@ -358,6 +338,7 @@ Manuscript
 Docker image
   ~ <https://hub.docker.com/r/insilicotox/loael-paper/> (container with manuscript, validation experiments, `lazar` libraries and third party dependencies)
 
+![Screenshot of a lazar prediction from the public webinterface.](figures/lazar-screenshot.pdf){#fig:screenshot}
 
 Results
 =======
@@ -395,31 +376,9 @@ used with different kinds of features. We have investigated structural as well
 as physico-chemical properties and concluded that both databases are very
 similar, both in terms of chemical structures and physico-chemical properties. 
 
-The only statistically significant difference between both databases, is that
+The only statistically significant difference between both databases is that
 the Nestlé database contains more small compounds (61 structures with less than
-11 atoms) than the FSVO-database (19 small structures, p-value 3.7E-7).
-
-<!--
-[@fig:ches-mapper-pc] shows an embedding that is based on physico-chemical (PC)
-descriptors.
-
-![Compounds from the Mazzatorta and the Swiss Federal Office dataset are highlighted in red and green. Compounds that occur in both datasets are highlighted in magenta.](figures/pc-small-compounds-highlighted.png){#fig:ches-mapper-pc}
-
-Martin: please explain light colors at bottom of histograms
-
-In this example, CheS-Mapper applied a principal components analysis to map
-compounds according to their physico-chemical (PC) feature values into 3D
-space. Both datasets have in general very similar PC feature values. As an
-exception, the Nestlé database includes most of the tiny compound
-structures: we have selected the 78 smallest compounds (with 10 atoms and less,
-marked with a blue box in the screen-shot) and found that 61 of these compounds
-occur in the Nestlé database, whereas only 19 are contained in the Swiss
-dataset (p-value 3.7E-7).
-
-This result was confirmed for structural features (fingerprints) including
-MolPrint2D features that are utilized for model building in this work.
--->
-
+11 non-hydrogen atoms) than the FSVO-database (19 small structures, chi-square test: p-value 3.7E-7).
 
 ### Experimental variability versus prediction uncertainty 
 
@@ -464,7 +423,7 @@ c.mg$sd <- ave(c.mg$LOAEL,c.mg$SMILES,FUN=sd)
 ```
 
 Both databases contain substances with multiple measurements, which allow the determination of experimental variabilities. 
-For this purpose we have calculated the mean standard deviation of compounds with multiple measurements. Mean standard deviations and thus experimental variabilities are similar for both databases. 
+For this purpose we have calculated the mean LOAEL standard deviation of compounds with multiple measurements. Mean standard deviations and thus experimental variabilities are similar for both databases. 
 
 The Nestlé database has `r length(m$SMILES)` LOAEL values for
 `r length(levels(m$SMILES))` unique structures, `r m.dupnr` compounds have
@@ -489,7 +448,7 @@ The combined test set has a mean standard deviation (-log10 transformed values)
 `r round(mean(10^(-1*c.dup$sd)),2)` mmol/kg_bw/day)
 ([@fig:intra]). 
 
-![LOAEL distribution and variability of compounds with multiple measurements in both databases. Compounds were sorted according to LOAEL values. Each vertical line represents a compound, and each dot an individual LOAEL value. Experimental variability can be inferred from dots (LOAELs) lying on the same line (compound).](figures/dataset-variability.pdf){#fig:intra}
+![LOAEL distribution and variability of compounds with multiple measurements in both databases. Compounds were sorted according to LOAEL values. Each vertical line represents a compound, and each dot an individual LOAEL value. Experimental variability can be inferred from dots (LOAELs) on the same line (compound).](figures/dataset-variability.pdf){#fig:intra}
 
 ##### Inter database variability
 
@@ -548,17 +507,9 @@ data).
 In `r round(100*correct_predictions/length(training$SMILES))`\% of the test examples
 experimental LOAEL values were located within the 95\% prediction intervals. 
 
-<!--
-Experimental data and 95\% prediction intervals did not overlap in `r incorrect_predictions` cases
-(`r round(100*incorrect_predictions/length(training$SMILES))`\%),
-`r length(which(sign(misclassifications$Distance) == 1))` predictions were too high and
-`r length(which(sign(misclassifications$Distance) == -1))` predictions too low (after -log10 transformation).
--->
-
 [@fig:comp] shows a comparison of predicted with experimental values. Most
 predicted values were located within the experimental variability.
 
-
 ![Comparison of experimental with predicted LOAEL values. Each vertical line
 represents a compound, dots are individual measurements (blue), predictions
 (green) or predictions far from the applicability domain, i.e. with warnings
@@ -638,10 +589,6 @@ All | `r round(cv.t2all.r_square,2)`  | `r round(cv.t2all.rmse,2)` | `r length(u
 
 : Results from 3 independent 10-fold crossvalidations {#tbl:cv}
 
-<!--
-![Correlation of experimental with predicted LOAEL values (10-fold crossvalidation)](figures/crossvalidation.pdf){#fig:cv}
--->
-
 <div id="fig:cv">
 ![](figures/crossvalidation0.pdf){#fig:cv0 height=30%}
 
@@ -659,7 +606,7 @@ Discussion
 
 It is currently acknowledged that there is a strong need for
 toxicological information on the multiple thousands of chemicals to
-which human may be exposed through food. These include for examples many
+which humans may be exposed through food. These include, for example, many
 chemicals in commerce, which could potentially find their way into food
 [@Stanton2016, @Fowler2011], but also substances
 migrating from food contact materials [@Grob2006], chemicals
@@ -685,8 +632,8 @@ exposure estimates. The level of safety concern of a chemical is then
 determined by the size of the MoE and its suitability to cover the
 uncertainties of the assessment. To be applicable, such an approach
 requires quantitative predictions of toxicological endpoints relevant
-for risk assessment. The present work focuses on prediction of chronic
-toxicity, a major and often pivotal endpoints of toxicological databases
+for risk assessment. The present work focuses on the prediction of chronic
+toxicity, a major and often pivotal endpoint of toxicological databases
 used for hazard identification and characterization of food chemicals.
 
 In a previous study, automated read-across like models for predicting
@@ -697,7 +644,7 @@ observed in these models were within the published estimation of
 experimental variability [@LoPiparo2014]. In the present
 study, a similar approach was applied to build models generating
 quantitative predictions of long-term toxicity. Two databases compiling
-chronic oral rat lowest adverse effect levels (LOAEL) as endpoint were
+chronic oral rat lowest adverse effect levels (LOAEL) as reference value were
 available from different sources. Our investigations clearly indicated that the
 Nestlé and FSVO databases are very similar in terms of chemical
 structures and properties as well as distribution of experimental LOAEL
@@ -755,25 +702,6 @@ shorter duration endpoints would also be valuable for chronic toxicy
 since evidence suggest that exposure duration has little impact on the
 levels of NOAELs/LOAELs [@Zarn2011, @Zarn2013].
 
-<!--
-Elena + Benoit
-
-### Dataset comparison
-
-Our investigations clearly indicate that the Mazzatorta and Swiss Federal Office datasets are very similar in terms of chemical structures and properties and the distribution of experimental LOAEL values. The only significant difference that we have observed was that the Nestlé database has larger amount of small molecules, than the Swiss Federal Office dataset. For this reason we have pooled both dataset into a single training dataset for read across predictions.
-
-[@fig:intra] and [@fig:corr] and [@tbl:common-pred] show however considerable
-variability in the experimental data. High experimental variability has an
-impact on model building and on model validation. First it influences model
-quality by introducing noise into the training data, secondly it influences
-accuracy estimates because predictions have to be compared against noisy data
-where "true" experimental values are unknown.
-
-<!--
-This will become obvious in the
-next section, where we compare predictions with experimental data.
--->
-
 ### `lazar` predictions
 
 [@tbl:common-pred], [@tbl:cv], [@fig:comp], [@fig:corr] and [@fig:cv] clearly
@@ -802,47 +730,6 @@ Finally there is a substantial number of compounds
 (`r length(unique(t$SMILES))-length(training$LOAEL_predicted)`),
 where no predictions can be made, because there are no similar compounds in the training data. These compounds clearly fall beyond the applicability domain of the training dataset 
  and in such cases it is preferable to avoid predictions instead of random guessing.
--->
-
-TODO: GUI screenshot
-
-<!--
-is covered in
-prediction interval shows that `lazar` read across predictions fit well into
-the experimental variability of LOAEL values.
-
-It is tempting to increase the "quality" of predictions by performing parameter
-or algorithm optimisations, but this may lead to overfitted models, because the
-training set is known beforehand. As prediction accuracies correspond well to
-experimental accuracies, and the visual inspection of predictions does not show
-obvious anomalies, we consider our model as a robust method for LOAEL
-estimations. Prediction accuracies that are lower than experimental variability
-would be a clear sign for a model that is overfitted for a particular test set.
-
-we present a brief analysis of the two most severe mispredictions:
-
-```{r echo=F}
-smi = "COP(=O)(SC)N"
-misclass = training[which(training$SMILES==smi),]
-med = round(misclass[,2],2)
-pred = round(misclass[,3],2)
-pi = round(log10(misclass[,4]),2)
-```
-
-The compound with the largest deviation of prediction intervals is (amino-methylsulfanyl-phosphoryl)oxymethane (SMILES `r smi`) with an experimental median of `r med` and a prediction interval of `r pred` +/- `r pi`. In this case the prediction is based on two neighbors with very low similarity (0.1 and 0.13). Such cases can be eliminated by raising the similarity threshold for neighbors, but that could come at the cost of a larger number of unpredicted compounds. The graphical user interface shows for each prediction neighbors and similarities for a critical examination which should make the detection of similar cases rather straightforward.
-
-```{r echo=F}
-smi = "O=S1OCC2C(CO1)C1(C(C2(Cl)C(=C1Cl)Cl)(Cl)Cl)Cl"
-misclass = training[which(training$SMILES==smi),]
-med = round(misclass[,2],2)
-pred = round(misclass[,3],2)
-pi = round(misclass[,4],2)
-```
-
-The compound with second largest deviation of prediction intervals is
-Endosulfan (SMILES `r smi`)
-with an experimental median of `r med` and a prediction interval of `r pred` +/- `r pi`. In this case the prediction is based on 5 neighbors with similarities between 0.33 and 0.4. All of them are polychlorinated compound, but none of them contains sulfur or is a sulfurous acid ester. Again such problems are easily identified from a visual inspection of neighbors, and we want to stress the importance of inspecting rationales for predictions in the graphical interface before accepting a prediction.
--->
 
 Summary
 =======
@@ -855,13 +742,5 @@ data. In such cases experimental investigations can be substituted with
 still give usable results, but the errors to be expected are higher and
 a manual inspection of prediction results is highly recommended.
 
-<!--
-We could demonstrate that `lazar` predictions within the applicability domain of the training data have the same variability as the experimental training data. In such cases experimental investigations can be substituted with in silico predictions.
-Predictions with a lower similarity threshold can still give usable results, but the errors to be expected are higher and a manual inspection of prediction results is highly recommended.
-
-- beware of over-optimisations and the race for "better" validation results
-- reproducible research
--->
-
 References
 ==========
diff --git a/loael.md b/loael.md
index 7dbe8a4..8d68575 100644
--- a/loael.md
+++ b/loael.md
@@ -5,9 +5,11 @@ author:
     - David Vorgrimmler^1^
     - Denis Gebele^1^
     - Martin Gütlein^2^
-    - Benoit Schilter^3^
-    - Elena Lo Piparo^3^
-include-before: ^1^ in silico toxicology gmbh,  Basel, Switzerland\newline^2^ Inst. f. Computer Science, Johannes Gutenberg Universität Mainz, Germany\newline^3^ Chemical Food Safety Group, Nestlé Research Center, Lausanne, Switzerland
+    - Barbara Engeli^3^
+    - Jürg Zarn^3^
+    - Benoit Schilter^4^
+    - Elena Lo Piparo^4^
+include-before: ^1^ in silico toxicology gmbh, Basel, Switzerland\newline^2^ Inst. f. Computer Science, Johannes Gutenberg Universität Mainz, Germany\newline^3^ Federal Food Safety and Veterinary Office (FSVO), Risk Assessment Division, Bern, Switzerland\newline^4^ Chemical Food Safety Group, Nestlé Research Center, Lausanne, Switzerland
 keywords: (Q)SAR, read-across, LOAEL, experimental variability
 date: \today
 abstract: |
@@ -52,8 +54,8 @@ prioritization in research and development (safety by design) is a big
 challenge mainly because of the time and cost constraints associated with the
 generation of relevant animal data. In this context, alternative approaches to
 obtain timely and fit-for-purpose toxicological information are being
-developed. Amongst others, non-testing, structure-activity based *in silico*
-toxicology methods (also called computational toxicology) are considered highly
+developed. Amongst others, *in silico*
+toxicology methods are considered highly
 promising. Importantly, they are raising more and more interests
 and getting increased acceptance in various regulatory (e.g.
 [@ECHA2008, @EFSA2016, @EFSA2014, @HealthCanada2016, @OECD2015]) and industrial (e.g.
@@ -62,13 +64,12 @@ and getting increased acceptance in various regulatory (e.g.
 For a long time already, computational methods have been an integral
 part of pharmaceutical discovery pipelines, while in chemical food
 safety their actual potentials emerged only recently [@LoPiparo2011].
-In this later field, an application considered critical is in the
+In this field, an application considered critical is in the
 establishment of levels of safety concern in order to rapidly and
 efficiently manage toxicologically uncharacterized chemicals identified
 in food. This requires a risk-based approach to benchmark exposure with
 a quantitative value of toxicity relevant for risk assessment [@Schilter2014].
-Since most of the time chemical food safety deals with
-life-long exposures to relatively low levels of chemicals, and because
+Since chronic studies have the highest statistical power (more animals per group and more endpoints than other studies) and because
 long-term toxicity studies are often the most sensitive in food
 toxicology databases, predicting chronic toxicity is of prime
 importance. Up to now, read-across and Quantitative Structure Activity
@@ -334,7 +335,7 @@ data (Nestlé and FSVO databases combined).
 ## Availability
 
 Public webinterface
-  ~ <https://lazar.in-silico.ch>
+  ~ <https://lazar.in-silico.ch> (see [@fig:screenshot])
 
 `lazar` framework
   ~ <https://github.com/opentox/lazar> (source code)
@@ -348,6 +349,7 @@ Manuscript
 Docker image
   ~ <https://hub.docker.com/r/insilicotox/loael-paper/> (container with manuscript, validation experiments, `lazar` libraries and third party dependencies)
 
+![Screenshot of a lazar prediction from the public webinterface.](figures/lazar-screenshot.pdf){#fig:screenshot}
 
 Results
 =======
@@ -383,9 +385,9 @@ used with different kinds of features. We have investigated structural as well
 as physico-chemical properties and concluded that both databases are very
 similar, both in terms of chemical structures and physico-chemical properties. 
 
-The only statistically significant difference between both databases, is that
+The only statistically significant difference between both databases is that
 the Nestlé database contains more small compounds (61 structures with less than
-11 atoms) than the FSVO-database (19 small structures, p-value 3.7E-7).
+11 non-hydrogen atoms) than the FSVO-database (19 small structures, chi-square test: p-value 3.7E-7).
 
 <!--
 [@fig:ches-mapper-pc] shows an embedding that is based on physico-chemical (PC)
@@ -424,7 +426,7 @@ same experiments.
 
 
 Both databases contain substances with multiple measurements, which allow the determination of experimental variabilities. 
-For this purpose we have calculated the mean standard deviation of compounds with multiple measurements. Mean standard deviations and thus experimental variabilities are similar for both databases. 
+For this purpose we have calculated the mean LOAEL standard deviation of compounds with multiple measurements. Mean standard deviations and thus experimental variabilities are similar for both databases. 
 
 The Nestlé database has 567 LOAEL values for
 445 unique structures, 93 compounds have
@@ -449,7 +451,7 @@ The combined test set has a mean standard deviation (-log10 transformed values)
 0.55 mmol/kg_bw/day)
 ([@fig:intra]). 
 
-![Distribution and variability of compounds with multiple LOAEL values in both databases Each vertical line represents a compound, dots are individual LOAEL values.](figures/dataset-variability.pdf){#fig:intra}
+![LOAEL distribution and variability of compounds with multiple measurements in both databases. Compounds were sorted according to LOAEL values. Each vertical line represents a compound, and each dot an individual LOAEL value. Experimental variability can be inferred from dots (LOAELs) on the same line (compound).](figures/dataset-variability.pdf){#fig:intra}
 
 ##### Inter database variability
 
@@ -570,7 +572,7 @@ Discussion
 
 It is currently acknowledged that there is a strong need for
 toxicological information on the multiple thousands of chemicals to
-which human may be exposed through food. These include for examples many
+which humans may be exposed through food. These include, for example, many
 chemicals in commerce, which could potentially find their way into food
 [@Stanton2016, @Fowler2011], but also substances
 migrating from food contact materials [@Grob2006], chemicals
@@ -596,8 +598,8 @@ exposure estimates. The level of safety concern of a chemical is then
 determined by the size of the MoE and its suitability to cover the
 uncertainties of the assessment. To be applicable, such an approach
 requires quantitative predictions of toxicological endpoints relevant
-for risk assessment. The present work focuses on prediction of chronic
-toxicity, a major and often pivotal endpoints of toxicological databases
+for risk assessment. The present work focuses on the prediction of chronic
+toxicity, a major and often pivotal endpoint of toxicological databases
 used for hazard identification and characterization of food chemicals.
 
 In a previous study, automated read-across like models for predicting
@@ -608,7 +610,7 @@ observed in these models were within the published estimation of
 experimental variability [@LoPiparo2014]. In the present
 study, a similar approach was applied to build models generating
 quantitative predictions of long-term toxicity. Two databases compiling
-chronic oral rat lowest adverse effect levels (LOAEL) as endpoint were
+chronic oral rat lowest adverse effect levels (LOAEL) as reference value were
 available from different sources. Our investigations clearly indicated that the
 Nestlé and FSVO databases are very similar in terms of chemical
 structures and properties as well as distribution of experimental LOAEL
@@ -713,11 +715,9 @@ Finally there is a substantial number of compounds
 (37),
 where no predictions can be made, because there are no similar compounds in the training data. These compounds clearly fall beyond the applicability domain of the training dataset 
  and in such cases it is preferable to avoid predictions instead of random guessing.
--->
-
-TODO: GUI screenshot
 
 <!--
+TODO: GUI screenshot
 is covered in
 prediction interval shows that `lazar` read across predictions fit well into
 the experimental variability of LOAEL values.
@@ -754,13 +754,5 @@ data. In such cases experimental investigations can be substituted with
 still give usable results, but the errors to be expected are higher and
 a manual inspection of prediction results is highly recommended.
 
-<!--
-We could demonstrate that `lazar` predictions within the applicability domain of the training data have the same variability as the experimental training data. In such cases experimental investigations can be substituted with in silico predictions.
-Predictions with a lower similarity threshold can still give usable results, but the errors to be expected are higher and a manual inspection of prediction results is highly recommended.
-
-- beware of over-optimisations and the race for "better" validation results
-- reproducible research
--->
-
 References
 ==========
diff --git a/loael.pdf b/loael.pdf
index 4c35492..3effcef 100644
Binary files a/loael.pdf and b/loael.pdf differ
diff --git a/loael.tex b/loael.tex
index 52712f3..f9ab237 100644
--- a/loael.tex
+++ b/loael.tex
@@ -25,7 +25,7 @@
 \PassOptionsToPackage{usenames,dvipsnames}{color} % color is loaded by hyperref
 \hypersetup{
             pdftitle={Modeling Chronic Toxicity: A comparison of experimental variability with (Q)SAR/read-across predictions},
-            pdfauthor={Christoph Helma1; David Vorgrimmler1; Denis Gebele1; Martin Gütlein2; Benoit Schilter3; Elena Lo Piparo3},
+            pdfauthor={Christoph Helma1; David Vorgrimmler1; Denis Gebele1; Martin Gütlein2; Barbara Engeli3; Jürg Zarn3; Benoit Schilter4; Elena Lo Piparo4},
             pdfkeywords={(Q)SAR, read-across, LOAEL, experimental variability},
             colorlinks=true,
             linkcolor=Maroon,
@@ -93,7 +93,7 @@
 
 \title{Modeling Chronic Toxicity: A comparison of experimental variability with
 (Q)SAR/read-across predictions}
-\author{Christoph Helma\textsuperscript{1} \and David Vorgrimmler\textsuperscript{1} \and Denis Gebele\textsuperscript{1} \and Martin Gütlein\textsuperscript{2} \and Benoit Schilter\textsuperscript{3} \and Elena Lo Piparo\textsuperscript{3}}
+\author{Christoph Helma\textsuperscript{1} \and David Vorgrimmler\textsuperscript{1} \and Denis Gebele\textsuperscript{1} \and Martin Gütlein\textsuperscript{2} \and Barbara Engeli\textsuperscript{3} \and Jürg Zarn\textsuperscript{3} \and Benoit Schilter\textsuperscript{4} \and Elena Lo Piparo\textsuperscript{4}}
 \date{\today}
 
 \begin{document}
@@ -113,8 +113,9 @@ inspection of prediction results is highly recommended.
 \textsuperscript{1} in silico toxicology gmbh, Basel,
 Switzerland\newline\textsuperscript{2} Inst. f. Computer Science,
 Johannes Gutenberg Universität Mainz, Germany\newline\textsuperscript{3}
-Chemical Food Safety Group, Nestlé Research Center, Lausanne,
-Switzerland
+Federal Food Safety and Veterinary Office (FSVO), Risk Assessment
+Division, Bern, Switzerland\newline\textsuperscript{4} Chemical Food
+Safety Group, Nestlé Research Center, Lausanne, Switzerland
 
 \section{Introduction}\label{introduction}
 
@@ -130,29 +131,28 @@ research and development (safety by design) is a big challenge mainly
 because of the time and cost constraints associated with the generation
 of relevant animal data. In this context, alternative approaches to
 obtain timely and fit-for-purpose toxicological information are being
-developed. Amongst others, non-testing, structure-activity based
-\emph{in silico} toxicology methods (also called computational
-toxicology) are considered highly promising. Importantly, they are
-raising more and more interests and getting increased acceptance in
-various regulatory (e.g. (ECHA 2008, EFSA (2016), EFSA (2014), Health
-Canada (2016), OECD (2015))) and industrial (e.g. (Stanton and
-Krusezewski 2016, Lo Piparo et al. (2011))) frameworks.
+developed. Amongst others, \emph{in silico} toxicology methods are
+considered highly promising. Importantly, they are raising more and more
+interests and getting increased acceptance in various regulatory (e.g.
+(ECHA 2008, EFSA (2016), EFSA (2014), Health Canada (2016), OECD
+(2015))) and industrial (e.g. (Stanton and Krusezewski 2016, Lo Piparo
+et al. (2011))) frameworks.
 
 For a long time already, computational methods have been an integral
 part of pharmaceutical discovery pipelines, while in chemical food
 safety their actual potentials emerged only recently (Lo Piparo et al.
-2011). In this later field, an application considered critical is in the
+2011). In this field, an application considered critical is in the
 establishment of levels of safety concern in order to rapidly and
 efficiently manage toxicologically uncharacterized chemicals identified
 in food. This requires a risk-based approach to benchmark exposure with
 a quantitative value of toxicity relevant for risk assessment (Schilter
-et al. 2014). Since most of the time chemical food safety deals with
-life-long exposures to relatively low levels of chemicals, and because
-long-term toxicity studies are often the most sensitive in food
-toxicology databases, predicting chronic toxicity is of prime
-importance. Up to now, read-across and Quantitative Structure Activity
-Relationships (QSAR) have been the most used \emph{in silico} approaches
-to obtain quantitative predictions of chronic toxicity.
+et al. 2014). Since chronic studies have the highest power (more animals
+per group and more endpoints than other studies) and because long-term
+toxicity studies are often the most sensitive in food toxicology
+databases, predicting chronic toxicity is of prime importance. Up to
+now, read-across and Quantitative Structure Activity Relationships
+(QSAR) have been the most used \emph{in silico} approaches to obtain
+quantitative predictions of chronic toxicity.
 
 The quality and reproducibility of (Q)SAR and read-across predictions
 has been a continuous and controversial topic in the toxicological
@@ -449,7 +449,7 @@ LOAEL data (Nestlé and FSVO databases combined).
 \begin{description}
 \tightlist
 \item[Public webinterface]
-\url{https://lazar.in-silico.ch}
+\url{https://lazar.in-silico.ch} (see Figure~\ref{fig:screenshot})
 \item[\texttt{lazar} framework]
 \url{https://github.com/opentox/lazar} (source code)
 \item[\texttt{lazar} GUI]
@@ -463,6 +463,13 @@ manuscript, validation experiments, \texttt{lazar} libraries and third
 party dependencies)
 \end{description}
 
+\begin{figure}
+\centering
+\includegraphics{figures/lazar-screenshot.pdf}
+\caption{Screenshot of a lazar prediction from the public
+webinterface.}\label{fig:screenshot}
+\end{figure}
+
 \section{Results}\label{results}
 
 \subsubsection{Dataset comparison}\label{dataset-comparison}
@@ -499,10 +506,10 @@ We have investigated structural as well as physico-chemical properties
 and concluded that both databases are very similar, both in terms of
 chemical structures and physico-chemical properties.
 
-The only statistically significant difference between both databases, is
+The only statistically significant difference between both databases is
 that the Nestlé database contains more small compounds (61 structures
-with less than 11 atoms) than the FSVO-database (19 small structures,
-p-value 3.7E-7).
+with fewer than 11 non-hydrogen atoms) than the FSVO-database (19 small
+structures, chi-square test: p-value 3.7E-7).
 
 \subsubsection{Experimental variability versus prediction
 uncertainty}\label{experimental-variability-versus-prediction-uncertainty}
@@ -520,7 +527,7 @@ variability}\label{intra-database-variability}
 
 Both databases contain substances with multiple measurements, which
 allow the determination of experimental variabilities. For this purpose
-we have calculated the mean standard deviation of compounds with
+we have calculated the mean LOAEL standard deviation of compounds with
 multiple measurements. Mean standard deviations and thus experimental
 variabilities are similar for both databases.
 
@@ -543,9 +550,11 @@ test set has a mean standard deviation (-log10 transformed values) of
 \begin{figure}
 \centering
 \includegraphics{figures/dataset-variability.pdf}
-\caption{Distribution and variability of compounds with multiple LOAEL
-values in both databases Each vertical line represents a compound, dots
-are individual LOAEL values.}\label{fig:intra}
+\caption{LOAEL distribution and variability of compounds with multiple
+measurements in both databases. Compounds were sorted according to LOAEL
+values. Each vertical line represents a compound, and each dot an
+individual LOAEL value. Experimental variability can be inferred from
+dots (LOAELs) on the same line (compound).}\label{fig:intra}
 \end{figure}
 
 \subparagraph{Inter database
@@ -693,7 +702,7 @@ random forest models.}
 
 It is currently acknowledged that there is a strong need for
 toxicological information on the multiple thousands of chemicals to
-which human may be exposed through food. These include for examples many
+which humans may be exposed through food. These include, for example, many
 chemicals in commerce, which could potentially find their way into food
 (Stanton and Krusezewski 2016, Fowler, Savage, and Mendez (2011)), but
 also substances migrating from food contact materials (Grob et al.
@@ -720,9 +729,10 @@ exposure estimates. The level of safety concern of a chemical is then
 determined by the size of the MoE and its suitability to cover the
 uncertainties of the assessment. To be applicable, such an approach
 requires quantitative predictions of toxicological endpoints relevant
-for risk assessment. The present work focuses on prediction of chronic
-toxicity, a major and often pivotal endpoints of toxicological databases
-used for hazard identification and characterization of food chemicals.
+for risk assessment. The present work focuses on the prediction of
+chronic toxicity, a major and often pivotal endpoint of toxicological
+databases used for hazard identification and characterization of food
+chemicals.
 
 In a previous study, automated read-across like models for predicting
 carcinogenic potency were developed. In these models, substances in the
@@ -732,14 +742,14 @@ observed in these models were within the published estimation of
 experimental variability (Lo Piparo et al. 2014). In the present study,
 a similar approach was applied to build models generating quantitative
 predictions of long-term toxicity. Two databases compiling chronic oral
-rat lowest adverse effect levels (LOAEL) as endpoint were available from
-different sources. Our investigations clearly indicated that the Nestlé
-and FSVO databases are very similar in terms of chemical structures and
-properties as well as distribution of experimental LOAEL values. The
-only significant difference that we observed was that the Nestlé one has
-larger amount of small molecules, than the FSVO database. For this
-reason we pooled both databases into a single training dataset for read
-across predictions.
+rat lowest observed adverse effect levels (LOAEL) as reference values were
+available from different sources. Our investigations clearly indicated
+that the Nestlé and FSVO databases are very similar in terms of chemical
+structures and properties as well as distribution of experimental LOAEL
+values. The only significant difference that we observed was that the
+Nestlé database contains more small molecules than the FSVO database.
+For this reason we pooled both databases into a single training dataset
+for read-across predictions.
 
 An early review of the databases revealed that 155 out of the 671
 chemicals available in the training datasets had at least two
@@ -825,9 +835,7 @@ Finally there is a substantial number of compounds (37), where no
 predictions can be made, because there are no similar compounds in the
 training data. These compounds clearly fall beyond the applicability
 domain of the training dataset and in such cases it is preferable to
-avoid predictions instead of random guessing. --\textgreater{}
-
-TODO: GUI screenshot
+avoid predictions instead of random guessing.
 
 \section{Summary}\label{summary}
 
-- 
cgit v1.2.3