From d467b34ca9ea79095205d022b9a62888294b543d Mon Sep 17 00:00:00 2001 From: Christoph Helma Date: Mon, 18 Dec 2017 17:13:03 +0100 Subject: abstract, tex file added --- Makefile | 8 +- loael.Rmd | 63 ++-- loael.md | 93 ++++-- loael.pdf | Bin 359957 -> 471524 bytes loael.tex | 931 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ references.bibtex | 6 +- 6 files changed, 1042 insertions(+), 59 deletions(-) create mode 100644 loael.tex diff --git a/Makefile b/Makefile index 50b9456..19cff1d 100644 --- a/Makefile +++ b/Makefile @@ -6,12 +6,14 @@ validations = data/training-test-predictions.csv $(crossvalidations) data/miscla figures = figures/functional-groups.pdf figures/test-prediction.pdf figures/prediction-test-correlation.pdf figures/dataset-variability.pdf figures/median-correlation.pdf figures/crossvalidation0.pdf figures/crossvalidation1.pdf figures/crossvalidation2.pdf # Paper +loael.pdf: loael.tex + pdflatex loael.tex; pdflatex loael.tex -loael.pdf: loael.md references.bibtex - pandoc -s --bibliography=references.bibtex --latex-engine=pdflatex --filter pandoc-crossref --filter pandoc-citeproc -o loael.pdf loael.md +loael.tex: loael.md references.bibtex + pandoc -s --bibliography=references.bibtex --filter pandoc-crossref --filter pandoc-citeproc -o loael.tex loael.md loael.md: loael.Rmd $(figures) $(datasets) $(validations) - Rscript --vanilla -e "library(knitr); knit('loael.Rmd');" + export LANG=en_US.UTF-8; Rscript --vanilla -e "library(knitr); knit('loael.Rmd');" loael.docx: loael.md pandoc -s --bibliography=references.bibtex --latex-engine=pdflatex --filter pandoc-crossref --filter pandoc-citeproc -o loael.docx loael.md diff --git a/loael.Rmd b/loael.Rmd index 2a32482..2905bb5 100644 --- a/loael.Rmd +++ b/loael.Rmd @@ -1,15 +1,27 @@ --- -author: | - Christoph Helma^1^, David Vorgrimmler^1^, Denis Gebele^1^, Martin Gütlein^2^, Benoit Schilter^3^, Elena Lo Piparo^3^ -title: | - Modeling Chronic Toxicity: A comparison of experimental variability with 
read across predictions +title: 'Modeling Chronic Toxicity: A comparison of experimental variability with read across predictions' +author: + - Christoph Helma^1^ + - David Vorgrimmler^1^ + - Denis Gebele^1^ + - Martin Gütlein^2^ + - Benoit Schilter^3^ + - Elena Lo Piparo^3^ include-before: ^1^ in silico toxicology gmbh, Basel, Switzerland\newline^2^ Inst. f. Computer Science, Johannes Gutenberg Universität Mainz, Germany\newline^3^ Chemical Food Safety Group, Nestlé Research Center, Lausanne, Switzerland -keywords: (Q)SAR, read-across, LOAEL +keywords: (Q)SAR, read-across, LOAEL, experimental variability date: \today -abstract: " " -documentclass: achemso +abstract: | + This study compares the accuracy of (Q)SAR/read-across predictions with the + experimental variability of chronic LOAEL values from *in vivo* experiments. + We could demonstrate that predictions of the `lazar` algorithm within + the applicability domain of the training data have the same variability as + the experimental training data. Predictions with a lower similarity threshold + (i.e. a larger distance from the applicability domain) are also significantly + better than random guessing, but the errors to be expected are higher and + a manual inspection of prediction results is highly recommended. + +documentclass: article bibliography: references.bibtex -bibliographystyle: achemso figPrefix: Figure eqnPrefix: Equation tblPrefix: Table @@ -18,6 +30,8 @@ output: pdf_document: fig_caption: yes header-includes: + - \usepackage{a4wide} + - \linespread{2} - \usepackage{lineno} - \linenumbers ... @@ -89,12 +103,20 @@ were exploited to generate information on the reproducibility of chronic animal studies and were used to evaluate prediction performance of the models in the context of experimental variability. 
-An important limitation often raised for computational toxicology is the -lack of transparency on published models and consequently on the -difficulty for the scientific community to reproduce and apply them. To +An important limitation often raised for computational toxicology is the lack +of transparency of published models and, consequently, the difficulty for the +scientific community to reproduce and apply them. To overcome these issues, +source code for all programs and libraries and the databases that have been used to generate this +manuscript are made available under GPL3 licenses. Databases and compiled +programs with all dependencies for the reproduction of results in this manuscript are available as +a self-contained Docker image. All data, tables and figures in this manuscript +were generated directly from experimental results using the `R` package `knitr`. +A single command repeats all experiments (possibly with different settings) and +updates the manuscript with the new results. + + Materials and Methods ===================== @@ -128,9 +150,11 @@ observed effect levels (LOAEL) for rats (*Rattus norvegicus*) after oral (gavage, diet, drinking water) administration. The Nestlé database consists of `r length(m$SMILES)` LOAEL values for `r length(unique(m$SMILES))` unique chemical structures. -The Nestlé database can be obtained from the following GitHub links: [original data](https://github.com/opentox/loael-paper/blob/submission/data/LOAEL_mg_corrected_smiles_mmol.csv), -[unique smiles](https://github.com/opentox/loael-paper/blob/submission/data/mazzatorta.csv), -[-log10 transfomed LOAEL](https://github.com/opentox/loael-paper/blob/submission/data/mazzatorta_log10.csv). 
+The Nestlé database can be obtained from the following GitHub links: + + - original data: [https://github.com/opentox/loael-paper/blob/submission/data/LOAEL_mg_corrected_smiles_mmol.csv](https://github.com/opentox/loael-paper/blob/submission/data/LOAEL_mg_corrected_smiles_mmol.csv) + - unique SMILES: [https://github.com/opentox/loael-paper/blob/submission/data/mazzatorta.csv](https://github.com/opentox/loael-paper/blob/submission/data/mazzatorta.csv) + - -log10 transformed LOAEL: [https://github.com/opentox/loael-paper/blob/submission/data/mazzatorta_log10.csv](https://github.com/opentox/loael-paper/blob/submission/data/mazzatorta_log10.csv). ### Swiss Food Safety and Veterinary Office (FSVO) database @@ -143,9 +167,10 @@ described elsewhere [@Zarn2011, @Zarn2013]. The FSVO-database consists of `r length(s$SMILES)` rat LOAEL values for `r length(unique(s$SMILES))` unique chemical structures. It can be obtained from the following GitHub links: -[original data](https://github.com/opentox/loael-paper/blob/submission/data/NOAEL-LOAEL_SMILES_rat_chron.csv), -[unique smiles and mmol/kg_bw/day units](https://github.com/opentox/loael-paper/blob/submission/data/swiss.csv), -[-log10 transfomed LOAEL](https://github.com/opentox/loael-paper/blob/submission/data/swiss_log10.csv). 
+ - original data: [https://github.com/opentox/loael-paper/blob/submission/data/NOAEL-LOAEL_SMILES_rat_chron.csv](https://github.com/opentox/loael-paper/blob/submission/data/NOAEL-LOAEL_SMILES_rat_chron.csv) + - unique SMILES and mmol/kg_bw/day units: [https://github.com/opentox/loael-paper/blob/submission/data/swiss.csv](https://github.com/opentox/loael-paper/blob/submission/data/swiss.csv) + - -log10 transformed LOAEL: [https://github.com/opentox/loael-paper/blob/submission/data/swiss_log10.csv](https://github.com/opentox/loael-paper/blob/submission/data/swiss_log10.csv) + ### Preprocessing diff --git a/loael.md b/loael.md index f2a967c..0ca8d7e 100644 --- a/loael.md +++ b/loael.md @@ -1,15 +1,27 @@ --- -author: | - Christoph Helma^1^, David Vorgrimmler^1^, Denis Gebele^1^, Martin Gtlein^2^, Benoit Schilter^3^, Elena Lo Piparo^3^ -title: | - Modeling Chronic Toxicity: A comparison of experimental variability with read across predictions -include-before: ^1^ in silico toxicology gmbh, Basel, Switzerland\newline^2^ Inst. f. Computer Science, Johannes Gutenberg Universitt Mainz, Germany\newline^3^ Chemical Food Safety Group, Nestl Research Center, Lausanne, Switzerland -keywords: (Q)SAR, read-across, LOAEL +title: 'Modeling Chronic Toxicity: A comparison of experimental variability with read across predictions' +author: + - Christoph Helma^1^ + - David Vorgrimmler^1^ + - Denis Gebele^1^ + - Martin Gütlein^2^ + - Benoit Schilter^3^ + - Elena Lo Piparo^3^ +include-before: ^1^ in silico toxicology gmbh, Basel, Switzerland\newline^2^ Inst. f. Computer Science, Johannes Gutenberg Universität Mainz, Germany\newline^3^ Chemical Food Safety Group, Nestlé Research Center, Lausanne, Switzerland +keywords: (Q)SAR, read-across, LOAEL, experimental variability date: \today -abstract: " " -documentclass: achemso +abstract: | + This study compares the accuracy of (Q)SAR/read-across predictions with the + experimental variability of chronic LOAEL values from *in vivo* experiments. 
+ We could demonstrate that predictions of the `lazar` algorithm within + the applicability domain of the training data have the same variability as + the experimental training data. Predictions with a lower similarity threshold + (i.e. a larger distance from the applicability domain) are also significantly + better than random guessing, but the errors to be expected are higher and + a manual inspection of prediction results is highly recommended. + +documentclass: article bibliography: references.bibtex -bibliographystyle: achemso figPrefix: Figure eqnPrefix: Equation tblPrefix: Table @@ -18,6 +30,8 @@ output: pdf_document: fig_caption: yes header-includes: + - \usepackage{a4wide} + - \linespread{2} - \usepackage{lineno} - \linenumbers ... @@ -81,12 +95,20 @@ were exploited to generate information on the reproducibility of chronic animal studies and were used to evaluate prediction performance of the models in the context of experimental variability. -An important limitation often raised for computational toxicology is the -lack of transparency on published models and consequently on the -difficulty for the scientific community to reproduce and apply them. To +An important limitation often raised for computational toxicology is the lack +of transparency of published models and, consequently, the difficulty for the +scientific community to reproduce and apply them. To overcome these issues, +source code for all programs and libraries and the databases that have been used to generate this +manuscript are made available under GPL3 licenses. Databases and compiled +programs with all dependencies for the reproduction of results in this manuscript are available as +a self-contained Docker image. All data, tables and figures in this manuscript +were generated directly from experimental results using the `R` package `knitr`. +A single command repeats all experiments (possibly with different settings) and +updates the manuscript with the new results. 
+ + Materials and Methods ===================== @@ -112,17 +134,19 @@ and datasets, links to source code and data sources are included in the text. Datasets -------- -### Nestl database +### Nestlé database -The first database (Nestl database for further reference) originates +The first database (Nestlé database for further reference) originates from the publication of [@mazzatorta08]. It contains chronic (> 180 days) lowest observed effect levels (LOAEL) for rats (*Rattus norvegicus*) after oral -(gavage, diet, drinking water) administration. The Nestl database consists +(gavage, diet, drinking water) administration. The Nestlé database consists of 567 LOAEL values for 445 unique chemical structures. -The Nestl database can be obtained from the following GitHub links: [original data](https://github.com/opentox/loael-paper/blob/submission/data/LOAEL_mg_corrected_smiles_mmol.csv), -[unique smiles](https://github.com/opentox/loael-paper/blob/submission/data/mazzatorta.csv), -[-log10 transfomed LOAEL](https://github.com/opentox/loael-paper/blob/submission/data/mazzatorta_log10.csv). +The Nestlé database can be obtained from the following GitHub links: + + - original data: [https://github.com/opentox/loael-paper/blob/submission/data/LOAEL_mg_corrected_smiles_mmol.csv](https://github.com/opentox/loael-paper/blob/submission/data/LOAEL_mg_corrected_smiles_mmol.csv) + - unique SMILES: [https://github.com/opentox/loael-paper/blob/submission/data/mazzatorta.csv](https://github.com/opentox/loael-paper/blob/submission/data/mazzatorta.csv) + - -log10 transformed LOAEL: [https://github.com/opentox/loael-paper/blob/submission/data/mazzatorta_log10.csv](https://github.com/opentox/loael-paper/blob/submission/data/mazzatorta_log10.csv). ### Swiss Food Safety and Veterinary Office (FSVO) database @@ -135,9 +159,10 @@ described elsewhere [@Zarn2011, @Zarn2013]. The FSVO-database consists of 493 rat LOAEL values for 381 unique chemical structures. 
It can be obtained from the following GitHub links: -[original data](https://github.com/opentox/loael-paper/blob/submission/data/NOAEL-LOAEL_SMILES_rat_chron.csv), -[unique smiles and mmol/kg_bw/day units](https://github.com/opentox/loael-paper/blob/submission/data/swiss.csv), -[-log10 transfomed LOAEL](https://github.com/opentox/loael-paper/blob/submission/data/swiss_log10.csv). + - original data: [https://github.com/opentox/loael-paper/blob/submission/data/NOAEL-LOAEL_SMILES_rat_chron.csv](https://github.com/opentox/loael-paper/blob/submission/data/NOAEL-LOAEL_SMILES_rat_chron.csv) + - unique SMILES and mmol/kg_bw/day units: [https://github.com/opentox/loael-paper/blob/submission/data/swiss.csv](https://github.com/opentox/loael-paper/blob/submission/data/swiss.csv) + - -log10 transformed LOAEL: [https://github.com/opentox/loael-paper/blob/submission/data/swiss_log10.csv](https://github.com/opentox/loael-paper/blob/submission/data/swiss_log10.csv) + ### Preprocessing @@ -169,7 +194,7 @@ unique chemical structures and was used for The [*training* dataset](https://github.com/opentox/loael-paper/blob/submission/data/training_log10.csv) -is the union of the Nestl and the FSVO databases and it was used to build +is the union of the Nestlé and the FSVO databases and it was used to build predictive models. LOAEL duplicates were removed using the same criteria as for the test dataset. The training dataset has 998 LOAEL values for 671 unique chemical structures. @@ -304,7 +329,7 @@ with independent training/test set splits are provided as additional information to the test set results. The final model for production purposes was trained with all available LOAEL -data (Nestl and FSVO databases combined). +data (Nestlé and FSVO databases combined). ## Availability @@ -359,7 +384,7 @@ as physico-chemical properties and concluded that both datasets are very similar, both in terms of chemical structures and physico-chemical properties. 
The only statistically significant difference between both datasets, is that -the Nestl database contains more small compounds (61 structures with less than +the Nestlé database contains more small compounds (61 structures with less than 11 atoms) than the FSVO-database (19 small structures, p-value 3.7E-7).