Diffstat (limited to 'paper/loael.Rmd')
-rw-r--r-- | paper/loael.Rmd | 34 |
1 files changed, 6 insertions, 28 deletions
diff --git a/paper/loael.Rmd b/paper/loael.Rmd
index 98c9e81..c34f5f6 100644
--- a/paper/loael.Rmd
+++ b/paper/loael.Rmd
@@ -38,7 +38,10 @@ We are using two datasets, one from [@mazzatorta08] (*Mazzatorta* dataset) and o
 Elena: do you have a reference and the name of the department?
 
 ```{r echo=F}
-t = read.csv("data/test.csv")
+m = read.csv("data/mazzatorta.csv",header=T)
+s = read.csv("data/swiss.csv",header=T)
+t = read.csv("data/test.csv",header=T)
+c = read.csv("data/combined.csv",header=T)
 ```
 
 `r length(unique(t$SMILES))` compounds are common in both datasets and we use them as a test set in our investigation. For this test set we will
@@ -65,13 +68,6 @@ Materials and Methods
 Datasets
 --------
 
-```{r echo=F}
-m = read.csv("data/mazzatorta.csv",header=T)
-s = read.csv("data/swiss.csv",header=T)
-t = read.csv("data/test.csv",header=T)
-c = read.csv("data/combined.csv",header=T)
-```
-
 ### Mazzatorta dataset
 
 The first dataset (*Mazzatorta* dataset for further reference) originates from
@@ -306,7 +302,7 @@ The Mazzatorta dataset has `r length(m$SMILES)` LOAEL values for `r length(level
 The Swiss Federal Office dataset has `r length(s$SMILES)` rat LOAEL values for `r length(levels(s$SMILES))` unique structures, `r s.dupnr` compounds have multiple measurements with a similar variance (average `r round(mean(s.dup$var),2)` log10 units). Variances of both datasets do not show a statistically significant difference with a
 p-value (t-test) of `r round(p,2)`.
 
-![Variability of LOAEL values in both datasets: Each vertical line represents a compound, dots are individual LOAEL values.](figure/dataset-variability.pdf){#fig:intra}
+![Distribution and variability of LOAEL values in both datasets: Each vertical line represents a compound, dots are individual LOAEL values.](figure/dataset-variability.pdf){#fig:intra}
 
 ##### Inter dataset variability
 
@@ -315,7 +311,7 @@ p-value (t-test) of `r round(p,2)`.
 ##### LOAEL correlation between datasets
 
 ```{r echo=F}
-data <- read.csv("data/common-median.csv",header=T)
+data <- read.csv("data/median-correlation.csv",header=T)
 cor <- cor.test(-log(data$mazzatorta),-log(data$swiss))
 median.p <- cor$p.value
 median.r.square <- round(rsquare(-log(data$mazzatorta),-log(data$swiss)),2)
@@ -335,14 +331,7 @@ The Mazzatorta, the Swiss Federal Office dataset and a combined dataset were use
 ![Comparison of experimental with predicted LOAEL values, each vertical line represents a compound, dots are individual measurements (red) or predictions (green).](figure/test-prediction.pdf){#fig:comp}
 
 ```{r echo=F}
-mazzatorta = read.csv("data/mazzatorta-test-predictions.csv",header=T)
-swiss = read.csv("data/swiss-test-predictions.csv",header=T)
 combined = read.csv("data/combined-test-predictions.csv",header=T)
-
-mazzatorta.r_square = round(rsquare(-log(mazzatorta$LOAEL_measured_median),-log(mazzatorta$LOAEL_predicted)),2)
-mazzatorta.rmse = round(rmse(-log(mazzatorta$LOAEL_measured_median),-log(mazzatorta$LOAEL_predicted)),2)
-swiss.r_square = round(rsquare(-log(swiss$LOAEL_measured_median),-log(swiss$LOAEL_predicted)),2)
-swiss.rmse = round(rmse(-log(swiss$LOAEL_measured_median),-log(swiss$LOAEL_predicted)),2)
 combined.r_square = round(rsquare(-log(combined$LOAEL_measured_median),-log(combined$LOAEL_predicted)),2)
 combined.rmse = round(rmse(-log(combined$LOAEL_measured_median),-log(combined$LOAEL_predicted)),2)
 ```
@@ -356,8 +345,6 @@ These results are presented in [@fig:corr] and [@tbl:cv]. Please bear in mind th
 Training data | $r^2$ | RMSE
 --------------|---------------------------|-------------------------
 Experimental | `r median.r.square` | `r median.rmse`
-Mazzatorta | `r mazzatorta.r_square` | `r mazzatorta.rmse`
-Swiss Federal Office |`r swiss.r_square` | `r swiss.rmse`
 Combined | `r combined.r_square` | `r combined.rmse`
 : Comparison of model predictions with experimental variability.
 {#tbl:common-pred}
@@ -365,14 +352,7 @@ Combined | `r combined.r_square` | `r combined.rmse`
 ![Correlation of experimental with predicted LOAEL values (test set)](figure/test-correlation.pdf){#fig:corr}
 
 ```{r echo=F}
-mazzatorta = read.csv("data/mazzatorta-cv.csv",header=T)
-swiss = read.csv("data/swiss-cv.csv",header=T)
 combined = read.csv("data/combined-cv.csv",header=T)
-
-cv.mazzatorta.r_square = round(rsquare(-log(mazzatorta$LOAEL_measured_median),-log(mazzatorta$LOAEL_predicted)),2)
-cv.mazzatorta.rmse = round(rmse(-log(mazzatorta$LOAEL_measured_median),-log(mazzatorta$LOAEL_predicted)),2)
-cv.swiss.r_square = round(rsquare(-log(swiss$LOAEL_measured_median),-log(swiss$LOAEL_predicted)),2)
-cv.swiss.rmse = round(rmse(-log(swiss$LOAEL_measured_median),-log(swiss$LOAEL_predicted)),2)
 cv.combined.r_square = round(rsquare(-log(combined$LOAEL_measured_median),-log(combined$LOAEL_predicted)),2)
 cv.combined.rmse = round(rmse(-log(combined$LOAEL_measured_median),-log(combined$LOAEL_predicted)),2)
 ```
@@ -383,8 +363,6 @@ All correlations are statistically highly significant with a p-value < 2.2e-16.
 
 Training dataset | $r^2$ | RMSE
 -----------------|-------|------
-Mazzatorta | `r round(cv.mazzatorta.r_square,2)` | `r round(cv.mazzatorta.rmse,2)`
-Swiss Federal Office | `r round(cv.swiss.r_square,2)` | `r round(cv.swiss.rmse,2)`
 Combined | `r round(cv.combined.r_square,2)` | `r round(cv.combined.rmse,2)`
 
 : 10-fold crossvalidation results {#tbl:cv}