Diffstat (limited to 'paper/loael.Rmd')
-rw-r--r-- paper/loael.Rmd | 34
1 files changed, 6 insertions, 28 deletions
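The R chunks in this diff report $r^2$ and RMSE by calling two helpers, `rsquare()` and `rmse()`, on `-log`-transformed LOAEL values. Their definitions are not part of the shown hunks. The following is a minimal sketch that assumes `rsquare()` is the squared Pearson correlation and `rmse()` is the root mean squared error. Both definitions are assumptions, and the helpers actually used in `loael.Rmd` may differ. The `measured` and `predicted` vectors are made-up illustration values, not data from the paper.

```r
# Hypothetical helper definitions: the real rsquare()/rmse() used in
# loael.Rmd are not shown in this diff and may be implemented differently.

# Squared Pearson correlation between measured (x) and predicted (y) values
rsquare <- function(x, y) {
  cor(x, y)^2
}

# Root mean squared error between measured (x) and predicted (y) values
rmse <- function(x, y) {
  sqrt(mean((x - y)^2))
}

# Illustration with made-up LOAEL values (not data from the paper),
# transformed the same way as in the chunks below: -log(LOAEL)
measured  <- c(10, 50, 200, 5)
predicted <- c(12, 40, 150, 8)
round(rsquare(-log(measured), -log(predicted)), 2)
round(rmse(-log(measured), -log(predicted)), 2)
```

In R, `log()` is the natural logarithm, so an RMSE computed this way is in natural-log units; `log10()` would express it in log10 units. $r^2$ does not depend on the choice of base, because changing the base only rescales both vectors by the same constant.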
diff --git a/paper/loael.Rmd b/paper/loael.Rmd
index 98c9e81..c34f5f6 100644
--- a/paper/loael.Rmd
+++ b/paper/loael.Rmd
@@ -38,7 +38,10 @@ We are using two datasets, one from [@mazzatorta08] (*Mazzatorta* dataset) and o
Elena: do you have a reference and the name of the department?
```{r echo=F}
-t = read.csv("data/test.csv")
+m = read.csv("data/mazzatorta.csv",header=T)
+s = read.csv("data/swiss.csv",header=T)
+t = read.csv("data/test.csv",header=T)
+c = read.csv("data/combined.csv",header=T)
```
`r length(unique(t$SMILES))` compounds are common in both datasets and we use them as a test set in our investigation. For this test set we will
@@ -65,13 +68,6 @@ Materials and Methods
Datasets
--------
-```{r echo=F}
-m = read.csv("data/mazzatorta.csv",header=T)
-s = read.csv("data/swiss.csv",header=T)
-t = read.csv("data/test.csv",header=T)
-c = read.csv("data/combined.csv",header=T)
-```
-
### Mazzatorta dataset
The first dataset (*Mazzatorta* dataset for further reference) originates from
@@ -306,7 +302,7 @@ The Mazzatorta dataset has `r length(m$SMILES)` LOAEL values for `r length(level
The Swiss Federal Office dataset has `r length(s$SMILES)` rat LOAEL values for `r length(levels(s$SMILES))` unique structures, `r s.dupnr` compounds have multiple measurements with a similar variance (average `r round(mean(s.dup$var),2)` log10 units). Variances of both datasets do not show a statistically significant difference with a
p-value (t-test) of `r round(p,2)`.
-![Variability of LOAEL values in both datasets: Each vertical line represents a compound, dots are individual LOAEL values.](figure/dataset-variability.pdf){#fig:intra}
+![Distribution and variability of LOAEL values in both datasets: Each vertical line represents a compound, dots are individual LOAEL values.](figure/dataset-variability.pdf){#fig:intra}
##### Inter dataset variability
@@ -315,7 +311,7 @@ p-value (t-test) of `r round(p,2)`.
##### LOAEL correlation between datasets
```{r echo=F}
-data <- read.csv("data/common-median.csv",header=T)
+data <- read.csv("data/median-correlation.csv",header=T)
cor <- cor.test(-log(data$mazzatorta),-log(data$swiss))
median.p <- cor$p.value
median.r.square <- round(rsquare(-log(data$mazzatorta),-log(data$swiss)),2)
@@ -335,14 +331,7 @@ The Mazzatorta, the Swiss Federal Office dataset and a combined dataset were use
![Comparison of experimental with predicted LOAEL values, each vertical line represents a compound, dots are individual measurements (red) or predictions (green).](figure/test-prediction.pdf){#fig:comp}
```{r echo=F}
-mazzatorta = read.csv("data/mazzatorta-test-predictions.csv",header=T)
-swiss = read.csv("data/swiss-test-predictions.csv",header=T)
combined = read.csv("data/combined-test-predictions.csv",header=T)
-
-mazzatorta.r_square = round(rsquare(-log(mazzatorta$LOAEL_measured_median),-log(mazzatorta$LOAEL_predicted)),2)
-mazzatorta.rmse = round(rmse(-log(mazzatorta$LOAEL_measured_median),-log(mazzatorta$LOAEL_predicted)),2)
-swiss.r_square = round(rsquare(-log(swiss$LOAEL_measured_median),-log(swiss$LOAEL_predicted)),2)
-swiss.rmse = round(rmse(-log(swiss$LOAEL_measured_median),-log(swiss$LOAEL_predicted)),2)
combined.r_square = round(rsquare(-log(combined$LOAEL_measured_median),-log(combined$LOAEL_predicted)),2)
combined.rmse = round(rmse(-log(combined$LOAEL_measured_median),-log(combined$LOAEL_predicted)),2)
```
@@ -356,8 +345,6 @@ These results are presented in [@fig:corr] and [@tbl:cv]. Please bear in mind th
Training data | $r^2$ | RMSE
--------------|---------------------------|-------------------------
Experimental | `r median.r.square` | `r median.rmse`
-Mazzatorta | `r mazzatorta.r_square` | `r mazzatorta.rmse`
-Swiss Federal Office |`r swiss.r_square` | `r swiss.rmse`
Combined | `r combined.r_square` | `r combined.rmse`
: Comparison of model predictions with experimental variability. {#tbl:common-pred}
@@ -365,14 +352,7 @@ Combined | `r combined.r_square` | `r combined.rmse`
![Correlation of experimental with predicted LOAEL values (test set)](figure/test-correlation.pdf){#fig:corr}
```{r echo=F}
-mazzatorta = read.csv("data/mazzatorta-cv.csv",header=T)
-swiss = read.csv("data/swiss-cv.csv",header=T)
combined = read.csv("data/combined-cv.csv",header=T)
-
-cv.mazzatorta.r_square = round(rsquare(-log(mazzatorta$LOAEL_measured_median),-log(mazzatorta$LOAEL_predicted)),2)
-cv.mazzatorta.rmse = round(rmse(-log(mazzatorta$LOAEL_measured_median),-log(mazzatorta$LOAEL_predicted)),2)
-cv.swiss.r_square = round(rsquare(-log(swiss$LOAEL_measured_median),-log(swiss$LOAEL_predicted)),2)
-cv.swiss.rmse = round(rmse(-log(swiss$LOAEL_measured_median),-log(swiss$LOAEL_predicted)),2)
cv.combined.r_square = round(rsquare(-log(combined$LOAEL_measured_median),-log(combined$LOAEL_predicted)),2)
cv.combined.rmse = round(rmse(-log(combined$LOAEL_measured_median),-log(combined$LOAEL_predicted)),2)
```
@@ -383,8 +363,6 @@ All correlations are statistically highly significant with a p-value < 2.2e-16.
Training dataset | $r^2$ | RMSE
-----------------|-------|------
-Mazzatorta | `r round(cv.mazzatorta.r_square,2)` | `r round(cv.mazzatorta.rmse,2)`
-Swiss Federal Office | `r round(cv.swiss.r_square,2)` | `r round(cv.swiss.rmse,2)`
Combined | `r round(cv.combined.r_square,2)` | `r round(cv.combined.rmse,2)`
: 10-fold crossvalidation results {#tbl:cv}