author: Christoph Helma <helma@in-silico.ch> 2018-01-10 12:02:30 +0100
committer: Christoph Helma <helma@in-silico.ch> 2018-01-10 12:02:30 +0100
commit: 3800005bafa367ba3f09365459f7ec426483becf
tree: a9610a4bacbce27ab113ebd958a1f05b56f7b92c
parent: 160d9d4489a4d9ebed72db46c5bbe94f1d8131bc
remaining manuscript fixes
-rw-r--r-- loael.Rmd 18
-rw-r--r-- loael.md 18
-rw-r--r-- loael.pdf bin 433140 -> 433007 bytes
-rw-r--r-- loael.tex 22
4 files changed, 28 insertions, 30 deletions
diff --git a/loael.Rmd b/loael.Rmd
index 47ff2c6..3886c50 100644
--- a/loael.Rmd
+++ b/loael.Rmd
@@ -106,13 +106,13 @@ models in the context of experimental variability.
An important limitation often raised for computational toxicology is the lack
of transparency on published models and consequently on the difficulty for the
scientific community to reproduce and apply them. To overcome these issues,
-source code for all programs and libraries and the databases that have been used to generate this
-manuscript are made available under GPL3 licenses. Databases and compiled
+source code for all programs and libraries and the data that have been used to generate this
+manuscript are made available under GPL3 licenses. Data and compiled
programs with all dependencies for the reproduction of results in this manuscript are available as
a self-contained docker image. All data, tables and figures in this manuscript
was generated directly from experimental results using the `R` package `knitR`.
-A single command repeats all experiments (possibly with different settings) and
-updates the manuscript with the new results.
+<!-- A single command repeats all experiments (possibly with different settings) and
+updates the manuscript with the new results. -->
<!--
overcome these issues, all databases and programs that have been used to
@@ -462,7 +462,7 @@ c.mg$sd <- ave(c.mg$LOAEL,c.mg$SMILES,FUN=sd)
```
Both databases contain substances with multiple measurements, which allow the determination of experimental variabilities.
-For this purpose we have calculated the mean standard deviation of compounds with multiple measurements, which is roughly a factor of 2 for both databases.
+For this purpose we have calculated the mean standard deviation of compounds with multiple measurements. Mean standard deviations and thus experimental variabilities are similar for both databases.
The Nestlé database has `r length(m$SMILES)` LOAEL values for
`r length(levels(m$SMILES))` unique structures, `r m.dupnr` compounds have
@@ -493,10 +493,6 @@ The combined test set has a mean standard deviation (-log10 transformed values)
In order to compare the correlation of LOAEL values in both databases and to establish a reference for predicted values, we have investigated compounds, that occur in both databases.
-[@fig:comp] shows the experimental LOAEL variability of compounds occurring in
-both datasets (i.e. the *test* dataset) colored in blue (experimental). This is
-the baseline reference for the comparison with predicted values.
-
```{r echo=F}
data <- read.csv("data/median-correlation.csv",header=T)
cor <- cor.test(data$mazzatorta,data$swiss)
@@ -513,6 +509,10 @@ experimental variability. Correlation analysis shows a significant (p-value < 2
correlation between the experimental data in both databases with r\^2:
`r round(median.r.square,2)`, RMSE: `r round(median.rmse,2)`
+[@fig:comp] shows the experimental LOAEL variability of compounds occurring in
+both datasets (i.e. the *test* dataset) colored in blue (experimental). This is
+the baseline reference for the comparison with predicted values.
+
![Correlation of median LOAEL values from Nestlé and FSVO databases. Data with
identical values in both databases was removed from
analysis.](figures/median-correlation.pdf){#fig:datacorr}
diff --git a/loael.md b/loael.md
index ec3f743..7dbe8a4 100644
--- a/loael.md
+++ b/loael.md
@@ -98,13 +98,13 @@ models in the context of experimental variability.
An important limitation often raised for computational toxicology is the lack
of transparency on published models and consequently on the difficulty for the
scientific community to reproduce and apply them. To overcome these issues,
-source code for all programs and libraries and the databases that have been used to generate this
-manuscript are made available under GPL3 licenses. Databases and compiled
+source code for all programs and libraries and the data that have been used to generate this
+manuscript are made available under GPL3 licenses. Data and compiled
programs with all dependencies for the reproduction of results in this manuscript are available as
a self-contained docker image. All data, tables and figures in this manuscript
was generated directly from experimental results using the `R` package `knitR`.
-A single command repeats all experiments (possibly with different settings) and
-updates the manuscript with the new results.
+<!-- A single command repeats all experiments (possibly with different settings) and
+updates the manuscript with the new results. -->
<!--
overcome these issues, all databases and programs that have been used to
@@ -424,7 +424,7 @@ same experiments.
Both databases contain substances with multiple measurements, which allow the determination of experimental variabilities.
-For this purpose we have calculated the mean standard deviation of compounds with multiple measurements, which is roughly a factor of 2 for both databases.
+For this purpose we have calculated the mean standard deviation of compounds with multiple measurements. Mean standard deviations and thus experimental variabilities are similar for both databases.
The Nestlé database has 567 LOAEL values for
445 unique structures, 93 compounds have
@@ -455,10 +455,6 @@ The combined test set has a mean standard deviation (-log10 transformed values)
In order to compare the correlation of LOAEL values in both databases and to establish a reference for predicted values, we have investigated compounds, that occur in both databases.
-[@fig:comp] shows the experimental LOAEL variability of compounds occurring in
-both datasets (i.e. the *test* dataset) colored in blue (experimental). This is
-the baseline reference for the comparison with predicted values.
-
[@fig:datacorr] depicts the correlation between LOAEL values from both
@@ -469,6 +465,10 @@ experimental variability. Correlation analysis shows a significant (p-value < 2
correlation between the experimental data in both databases with r\^2:
0.52, RMSE: 0.59
+[@fig:comp] shows the experimental LOAEL variability of compounds occurring in
+both datasets (i.e. the *test* dataset) colored in blue (experimental). This is
+the baseline reference for the comparison with predicted values.
+
![Correlation of median LOAEL values from Nestlé and FSVO databases. Data with
identical values in both databases was removed from
analysis.](figures/median-correlation.pdf){#fig:datacorr}
diff --git a/loael.pdf b/loael.pdf
index f4c1351..4c35492 100644
--- a/loael.pdf
+++ b/loael.pdf
Binary files differ
diff --git a/loael.tex b/loael.tex
index 6b9a2fb..52712f3 100644
--- a/loael.tex
+++ b/loael.tex
@@ -178,14 +178,12 @@ An important limitation often raised for computational toxicology is the
lack of transparency on published models and consequently on the
difficulty for the scientific community to reproduce and apply them. To
overcome these issues, source code for all programs and libraries and
-the databases that have been used to generate this manuscript are made
-available under GPL3 licenses. Databases and compiled programs with all
+the data that have been used to generate this manuscript are made
+available under GPL3 licenses. Data and compiled programs with all
dependencies for the reproduction of results in this manuscript are
available as a self-contained docker image. All data, tables and figures
in this manuscript was generated directly from experimental results
-using the \texttt{R} package \texttt{knitR}. A single command repeats
-all experiments (possibly with different settings) and updates the
-manuscript with the new results.
+using the \texttt{R} package \texttt{knitR}.
\section{Materials and Methods}\label{materials-and-methods}
@@ -523,8 +521,8 @@ variability}\label{intra-database-variability}
Both databases contain substances with multiple measurements, which
allow the determination of experimental variabilities. For this purpose
we have calculated the mean standard deviation of compounds with
-multiple measurements, which is roughly a factor of 2 for both
-databases.
+multiple measurements. Mean standard deviations and thus experimental
+variabilities are similar for both databases.
The Nestlé database has 567 LOAEL values for 445 unique structures, 93
compounds have multiple measurements with a mean standard deviation
@@ -557,11 +555,6 @@ In order to compare the correlation of LOAEL values in both databases
and to establish a reference for predicted values, we have investigated
compounds, that occur in both databases.
-Figure~\ref{fig:comp} shows the experimental LOAEL variability of
-compounds occurring in both datasets (i.e.~the \emph{test} dataset)
-colored in blue (experimental). This is the baseline reference for the
-comparison with predicted values.
-
Figure~\ref{fig:datacorr} depicts the correlation between LOAEL values
from both databases. As both databases contain duplicates medians were
used for the correlation plot and statistics. It should be kept in mind
@@ -571,6 +564,11 @@ Correlation analysis shows a significant (p-value \textless{} 2.2e-16)
correlation between the experimental data in both databases with r\^{}2:
0.52, RMSE: 0.59
+Figure~\ref{fig:comp} shows the experimental LOAEL variability of
+compounds occurring in both datasets (i.e.~the \emph{test} dataset)
+colored in blue (experimental). This is the baseline reference for the
+comparison with predicted values.
+
\begin{figure}
\centering
\includegraphics{figures/median-correlation.pdf}