remaining manuscript fixes

author: Christoph Helma <helma@in-silico.ch> 2018-01-10 12:02:30 +0100
committer: Christoph Helma <helma@in-silico.ch> 2018-01-10 12:02:30 +0100
commit: 3800005bafa367ba3f09365459f7ec426483becf
tree: a9610a4bacbce27ab113ebd958a1f05b56f7b92c
parent: 160d9d4489a4d9ebed72db46c5bbe94f1d8131bc
4 files changed, 28 insertions, 30 deletions
diff --git a/loael.Rmd b/loael.Rmd
index 47ff2c6..3886c50 100644
--- a/loael.Rmd
+++ b/loael.Rmd
@@ -106,13 +106,13 @@ models in the context of experimental variability.
 An important limitation often raised for computational toxicology is the lack
 of transparency on published models and consequently on the difficulty for the
 scientific community to reproduce and apply them. To overcome these issues,
-source code for all programs and libraries and the databases that have been used to generate this
-manuscript are made available under GPL3 licenses. Databases and compiled
+source code for all programs and libraries and the data that have been used to generate this
+manuscript are made available under GPL3 licenses. Data and compiled
 programs with all dependencies for the reproduction of results in this manuscript are available as
 a self-contained docker image. All data, tables and figures in this manuscript
 was generated directly from experimental results using the `R` package `knitR`.
-A single command repeats all experiments (possibly with different settings) and
-updates the manuscript with the new results.
+<!-- A single command repeats all experiments (possibly with different settings) and
+updates the manuscript with the new results. -->
 
 <!--
 overcome these issues, all databases and programs that have been used to
@@ -462,7 +462,7 @@ c.mg$sd <- ave(c.mg$LOAEL,c.mg$SMILES,FUN=sd)
 ```
 
 Both databases contain substances with multiple measurements, which allow the determination of experimental variabilities. 
-For this purpose we have calculated the mean standard deviation of compounds with multiple measurements, which is roughly a factor of 2 for both databases. 
+For this purpose we have calculated the mean standard deviation of compounds with multiple measurements. Mean standard deviations and thus experimental variabilities are similar for both databases. 
 
 The Nestlé database has `r length(m$SMILES)` LOAEL values for
 `r length(levels(m$SMILES))` unique structures, `r m.dupnr` compounds have
@@ -493,10 +493,6 @@ The combined test set has a mean standard deviation (-log10 transformed values)
 
 In order to compare the correlation of LOAEL values in both databases and to establish a reference for predicted values, we have investigated compounds, that occur in both databases.
 
-[@fig:comp] shows the experimental LOAEL variability of compounds occurring in
-both datasets (i.e. the *test* dataset) colored in blue (experimental). This is
-the baseline reference for the comparison with predicted values.
-
 ```{r echo=F}
 data <- read.csv("data/median-correlation.csv",header=T)
 cor <- cor.test(data$mazzatorta,data$swiss)
@@ -513,6 +509,10 @@ experimental variability.  Correlation analysis shows a significant (p-value < 2
 correlation between the experimental data in both databases with r\^2:
 `r round(median.r.square,2)`, RMSE: `r round(median.rmse,2)`
 
+[@fig:comp] shows the experimental LOAEL variability of compounds occurring in
+both datasets (i.e. the *test* dataset) colored in blue (experimental). This is
+the baseline reference for the comparison with predicted values.
+
 ![Correlation of median LOAEL values from Nestlé and FSVO databases. Data with
   identical values in both databases was removed from
   analysis.](figures/median-correlation.pdf){#fig:datacorr}
diff --git a/loael.md b/loael.md
index ec3f743..7dbe8a4 100644
--- a/loael.md
+++ b/loael.md
@@ -98,13 +98,13 @@ models in the context of experimental variability.
 An important limitation often raised for computational toxicology is the lack
 of transparency on published models and consequently on the difficulty for the
 scientific community to reproduce and apply them. To overcome these issues,
-source code for all programs and libraries and the databases that have been used to generate this
-manuscript are made available under GPL3 licenses. Databases and compiled
+source code for all programs and libraries and the data that have been used to generate this
+manuscript are made available under GPL3 licenses. Data and compiled
 programs with all dependencies for the reproduction of results in this manuscript are available as
 a self-contained docker image. All data, tables and figures in this manuscript
 was generated directly from experimental results using the `R` package `knitR`.
-A single command repeats all experiments (possibly with different settings) and
-updates the manuscript with the new results.
+<!-- A single command repeats all experiments (possibly with different settings) and
+updates the manuscript with the new results. -->
 
 <!--
 overcome these issues, all databases and programs that have been used to
@@ -424,7 +424,7 @@ same experiments.
 
 
 Both databases contain substances with multiple measurements, which allow the determination of experimental variabilities. 
-For this purpose we have calculated the mean standard deviation of compounds with multiple measurements, which is roughly a factor of 2 for both databases. 
+For this purpose we have calculated the mean standard deviation of compounds with multiple measurements. Mean standard deviations and thus experimental variabilities are similar for both databases. 
 
 The Nestlé database has 567 LOAEL values for
 445 unique structures, 93 compounds have
@@ -455,10 +455,6 @@ The combined test set has a mean standard deviation (-log10 transformed values)
 
 In order to compare the correlation of LOAEL values in both databases and to establish a reference for predicted values, we have investigated compounds, that occur in both databases.
 
-[@fig:comp] shows the experimental LOAEL variability of compounds occurring in
-both datasets (i.e. the *test* dataset) colored in blue (experimental). This is
-the baseline reference for the comparison with predicted values.
-
 
 
 [@fig:datacorr] depicts the correlation between LOAEL values from both
@@ -469,6 +465,10 @@ experimental variability.  Correlation analysis shows a significant (p-value < 2
 correlation between the experimental data in both databases with r\^2:
 0.52, RMSE: 0.59
 
+[@fig:comp] shows the experimental LOAEL variability of compounds occurring in
+both datasets (i.e. the *test* dataset) colored in blue (experimental). This is
+the baseline reference for the comparison with predicted values.
+
 ![Correlation of median LOAEL values from Nestlé and FSVO databases. Data with
   identical values in both databases was removed from
   analysis.](figures/median-correlation.pdf){#fig:datacorr}
diff --git a/loael.pdf b/loael.pdf
index f4c1351..4c35492 100644
--- a/loael.pdf
+++ b/loael.pdf
diff --git a/loael.tex b/loael.tex
index 6b9a2fb..52712f3 100644
--- a/loael.tex
+++ b/loael.tex
@@ -178,14 +178,12 @@ An important limitation often raised for computational toxicology is the
 lack of transparency on published models and consequently on the
 difficulty for the scientific community to reproduce and apply them. To
 overcome these issues, source code for all programs and libraries and
-the databases that have been used to generate this manuscript are made
-available under GPL3 licenses. Databases and compiled programs with all
+the data that have been used to generate this manuscript are made
+available under GPL3 licenses. Data and compiled programs with all
 dependencies for the reproduction of results in this manuscript are
 available as a self-contained docker image. All data, tables and figures
 in this manuscript was generated directly from experimental results
-using the \texttt{R} package \texttt{knitR}. A single command repeats
-all experiments (possibly with different settings) and updates the
-manuscript with the new results.
+using the \texttt{R} package \texttt{knitR}.
 
 \section{Materials and Methods}\label{materials-and-methods}
 
@@ -523,8 +521,8 @@ variability}\label{intra-database-variability}
 Both databases contain substances with multiple measurements, which
 allow the determination of experimental variabilities. For this purpose
 we have calculated the mean standard deviation of compounds with
-multiple measurements, which is roughly a factor of 2 for both
-databases.
+multiple measurements. Mean standard deviations and thus experimental
+variabilities are similar for both databases.
 
 The Nestlé database has 567 LOAEL values for 445 unique structures, 93
 compounds have multiple measurements with a mean standard deviation
@@ -557,11 +555,6 @@ In order to compare the correlation of LOAEL values in both databases
 and to establish a reference for predicted values, we have investigated
 compounds, that occur in both databases.
 
-Figure~\ref{fig:comp} shows the experimental LOAEL variability of
-compounds occurring in both datasets (i.e.~the \emph{test} dataset)
-colored in blue (experimental). This is the baseline reference for the
-comparison with predicted values.
-
 Figure~\ref{fig:datacorr} depicts the correlation between LOAEL values
 from both databases. As both databases contain duplicates medians were
 used for the correlation plot and statistics. It should be kept in mind
@@ -571,6 +564,11 @@ Correlation analysis shows a significant (p-value \textless{} 2.2e-16)
 correlation between the experimental data in both databases with r\^{}2:
 0.52, RMSE: 0.59
 
+Figure~\ref{fig:comp} shows the experimental LOAEL variability of
+compounds occurring in both datasets (i.e.~the \emph{test} dataset)
+colored in blue (experimental). This is the baseline reference for the
+comparison with predicted values.
+
 \begin{figure}
 \centering
 \includegraphics{figures/median-correlation.pdf}
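
Note on the statistics touched by this patch: the reworded variability paragraph relies on the mean per-compound standard deviation (the `ave(c.mg$LOAEL,c.mg$SMILES,FUN=sd)` context line), and the moved `fig:comp` paragraph now follows the r^2 and RMSE of per-compound medians. The following is a minimal Python sketch of both quantities, not the manuscript's R code. It assumes the input is a list of `(SMILES, value)` pairs already on the -log10 scale, that RMSE is the root of the mean squared difference between medians, and it skips the removal of identical values mentioned in the figure caption. All function names are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, median, stdev


def group_by_compound(records):
    """Collect all values measured for each SMILES string."""
    groups = defaultdict(list)
    for smiles, value in records:
        groups[smiles].append(value)
    return groups


def mean_sd_of_replicates(records):
    """Mean of the sample standard deviations of compounds measured more than once."""
    groups = group_by_compound(records)
    sds = [stdev(values) for values in groups.values() if len(values) > 1]
    return mean(sds)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def compare_medians(db_a, db_b):
    """Return (r^2, RMSE) of per-compound medians for compounds present in both inputs."""
    med_a = {s: median(v) for s, v in group_by_compound(db_a).items()}
    med_b = {s: median(v) for s, v in group_by_compound(db_b).items()}
    shared = sorted(set(med_a) & set(med_b))
    xs = [med_a[s] for s in shared]
    ys = [med_b[s] for s in shared]
    r_squared = pearson_r(xs, ys) ** 2
    # Assumption: RMSE is taken over the differences between the two medians.
    rmse = math.sqrt(mean([(x - y) ** 2 for x, y in zip(xs, ys)]))
    return r_squared, rmse
```

Calling `mean_sd_of_replicates` on one database gives the intra-database variability figure, while `compare_medians` on two databases gives the inter-database baseline that predicted values are later compared against. Compounds with a single measurement are excluded from the first function because a sample standard deviation needs at least two values.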