benoits comments

author: Christoph Helma <helma@in-silico.ch> 2017-12-20 14:25:02 +0100
committer: Christoph Helma <helma@in-silico.ch> 2017-12-20 14:25:02 +0100
commit: a5a9144dd7eb4cb9455c5674325ce6e0cc17af61 (patch)
tree: 608e85d62e1fa8992f2acb36c6cfb29a264a44ae
parent: d467b34ca9ea79095205d022b9a62888294b543d (diff)
4 files changed, 88 insertions, 66 deletions
diff --git a/loael.Rmd b/loael.Rmd
index 2905bb5..3ff6146 100644
--- a/loael.Rmd
+++ b/loael.Rmd
@@ -175,12 +175,12 @@ structures. It can be obtained from the following GitHub links:
 ### Preprocessing
 
 Chemical structures (represented as SMILES [@doi:10.1021/ci00057a005]) in both
-datasets were checked for correctness. When syntactically incorrect or missing
+databases were checked for correctness. When syntactically incorrect or missing
 SMILES were generated from other identifiers (e.g names, CAS numbers). Unique
 smiles from the OpenBabel library [@OBoyle2011] were used for the
 identification of duplicated structures. 
 
-Studies with undefined or empty LOAEL entries were removed from the datasets.
+Studies with undefined or empty LOAEL entries were removed from the databases.
 LOAEL values were converted to mmol/kg_bw/day units and rounded to five
 significant digits. For prediction, validation and visualisation purposes
 -log10 transformations are used.
@@ -373,9 +373,9 @@ baseline for evaluating prediction performance.
 fg = read.csv('data/functional-groups.csv',head=F)
 ```
 
-In order to compare the structural diversity of both datasets we evaluated the
+In order to compare the structural diversity of both databases we evaluated the
 frequency of functional groups from the OpenBabel FP4 fingerprint. [@fig:fg]
-shows the frequency of functional groups in both datasets. `r length(fg$V1)`
+shows the frequency of functional groups in both databases. `r length(fg$V1)`
 functional groups with a frequency > 25 are depicted, the complete table for
 all functional groups can be found in the supplemental
 material at [GitHub](https://github.com/opentox/loael-paper/blob/submission/data/functional-groups.csv).
@@ -390,10 +390,10 @@ CheS-Mapper can be used to analyze the relationship between the structure of
 chemical compounds, their physico-chemical properties, and biological or toxic
 effects. It depicts closely related (similar) compounds in 3D space and can be
 used with different kinds of features. We have investigated structural as well
-as physico-chemical properties and concluded that both datasets are very
+as physico-chemical properties and concluded that both databases are very
 similar, both in terms of chemical structures and physico-chemical properties. 
 
-The only statistically significant difference between both datasets, is that
+The only statistically significant difference between both databases, is that
 the Nestlé database contains more small compounds (61 structures with less than
 11 atoms) than the FSVO-database (19 small structures, p-value 3.7E-7).
 
@@ -421,11 +421,11 @@ MolPrint2D features that are utilized for model building in this work.
 
 ### Experimental variability versus prediction uncertainty 
 
-Duplicated LOAEL values can be found in both datasets and there is
+Duplicated LOAEL values can be found in both databases and there is
 a substantial number of `r length(unique(t$SMILES))` compounds with more than
 one LOAEL. These chemicals allow us to estimate the variability of
-experimental results within individual datasets and between datasets. Data with
-*identical* values (at five significant digits) in both datasets were excluded
+experimental results within individual databases and between databases. Data with
+*identical* values (at five significant digits) in both databases were excluded
 from variability analysis, because it it likely that they originate from the
 same experiments.
 
@@ -461,6 +461,9 @@ c.mg = read.csv("data/all_mg_dup.csv",header=T)
 c.mg$sd <- ave(c.mg$LOAEL,c.mg$SMILES,FUN=sd)
 ```
 
+Both databases contain substances with multiple measurements, which allow the determination of experimental variabilities. 
+For this purpose we have calculated the mean standard deviation of compounds with multiple measurements, which is roughly a factor of 2 for both databases. 
+
 The Nestlé database has `r length(m$SMILES)` LOAEL values for
 `r length(levels(m$SMILES))` unique structures, `r m.dupnr` compounds have
 multiple measurements with a mean standard deviation (-log10 transformed
@@ -476,7 +479,7 @@ multiple measurements with a mean standard deviation (-log10 transformed values)
 `r round(mean(10^(-1*s.dup$sd)),2)` mmol/kg_bw/day)
 ([@fig:intra]). 
 
-Standard deviations of both datasets do not show
+Standard deviations of both databases do not show
 a statistically significant difference with a p-value (t-test) of `r round(p,2)`.
 The combined test set has a mean standard deviation (-log10 transformed values) of
 `r round(mean(c.dup$sd),2)`
@@ -484,12 +487,14 @@ The combined test set has a mean standard deviation (-log10 transformed values)
 `r round(mean(10^(-1*c.dup$sd)),2)` mmol/kg_bw/day)
 ([@fig:intra]). 
 
-![Distribution and variability of LOAEL values in both datasets. Each vertical line represents a compound, dots are individual LOAEL values.](figures/dataset-variability.pdf){#fig:intra}
+![Distribution and variability of compounds with multiple LOAEL values in both databases. Each vertical line represents a compound, dots are individual LOAEL values.](figures/dataset-variability.pdf){#fig:intra}
 
 ##### Inter database variability
 
+In order to compare the correlation of LOAEL values in both databases and to establish a reference for predicted values, we have investigated compounds that occur in both databases.
+
 [@fig:comp] shows the experimental LOAEL variability of compounds occurring in
-both datasets (i.e. the *test* dataset) colored in red (experimental). This is
+both datasets (i.e. the *test* dataset) colored in blue (experimental). This is
 the baseline reference for the comparison with predicted values.
 
 ```{r echo=F}
@@ -501,11 +506,11 @@ median.rmse <- round(rmse(data$mazzatorta,data$swiss),2)
 ``` 
 
 [@fig:datacorr] depicts the correlation between LOAEL values from both
-datasets. As both datasets contain duplicates medians were used for the
+databases. As both databases contain duplicates medians were used for the
 correlation plot and statistics. It should be kept in mind that the aggregation of duplicated
 measurements into a single median value hides a substantial portion of the
 experimental variability.  Correlation analysis shows a significant (p-value < 2.2e-16)
-correlation between the experimental data in both datasets with r\^2:
+correlation between the experimental data in both databases with r\^2:
 `r round(median.r.square,2)`, RMSE: `r round(median.rmse,2)`
 
 ![Correlation of median LOAEL values from Nestlé and FSVO databases. Data with
@@ -530,7 +535,7 @@ correct_predictions = length(training$SMILES)-incorrect_predictions
 ```
 
 In order to compare the performance of *in silico* read across models with
-experimental variability we are using compounds that occur in both datasets as
+experimental variability we are using compounds with multiple measurements as
 a test set (`r  length(t$SMILES)` measurements, `r  length(unique(t$SMILES))`
 compounds). `lazar` read across predictions were obtained for
 `r length(unique(t$SMILES))` compounds, `r  length(unique(t$SMILES)) - length(training$SMILES)`
@@ -697,7 +702,7 @@ Nestlé and FSVO databases are very similar in terms of chemical
 structures and properties as well as distribution of experimental LOAEL
 values. The only significant difference that we observed was that the
 Nestlé one has larger amount of small molecules, than the FSVO database.
-For this reason we pooled both dataset into a single training dataset
+For this reason we pooled both databases into a single training dataset
 for read across predictions.
 
 An early review of the databases revealed that 155 out of the 671
@@ -726,8 +731,8 @@ data.
 
 Predictions with a warning (neighbor similarity &lt; 0.5 and &gt; 0.2 or
 weighted average predictions) are more uncertain. However, they still
-show a strong correlation with experimental data, but the errors are
-larger than for compounds within the applicability domain. Expected
+show a strong correlation with experimental data, but the errors are ~ 20-40\%
+larger than for compounds within the applicability domain ([@fig:corr] and [@tbl:cv]). Expected
 errors are displayed as 95% prediction intervals, which covers 100% of
 the experimental data. The main advantage of lowering the similarity
 threshold is that it allows to predict a much larger number of
@@ -795,7 +800,7 @@ where no predictions can be made, because there are no similar compounds in the
  and in such cases it is preferable to avoid predictions instead of random guessing.
 -->
 
-Elena: Should we add a GUI screenshot?
+TODO: GUI screenshot
 
 <!--
 is covered in
diff --git a/loael.md b/loael.md
index 0ca8d7e..fe9eb27 100644
--- a/loael.md
+++ b/loael.md
@@ -167,12 +167,12 @@ structures. It can be obtained from the following GitHub links:
 ### Preprocessing
 
 Chemical structures (represented as SMILES [@doi:10.1021/ci00057a005]) in both
-datasets were checked for correctness. When syntactically incorrect or missing
+databases were checked for correctness. When syntactically incorrect or missing
 SMILES were generated from other identifiers (e.g names, CAS numbers). Unique
 smiles from the OpenBabel library [@OBoyle2011] were used for the
 identification of duplicated structures. 
 
-Studies with undefined or empty LOAEL entries were removed from the datasets.
+Studies with undefined or empty LOAEL entries were removed from the databases.
 LOAEL values were converted to mmol/kg_bw/day units and rounded to five
 significant digits. For prediction, validation and visualisation purposes
 -log10 transformations are used.
@@ -363,9 +363,9 @@ baseline for evaluating prediction performance.
 
 
 
-In order to compare the structural diversity of both datasets we evaluated the
+In order to compare the structural diversity of both databases we evaluated the
 frequency of functional groups from the OpenBabel FP4 fingerprint. [@fig:fg]
-shows the frequency of functional groups in both datasets. 139
+shows the frequency of functional groups in both databases. 139
 functional groups with a frequency > 25 are depicted, the complete table for
 all functional groups can be found in the supplemental
 material at [GitHub](https://github.com/opentox/loael-paper/blob/submission/data/functional-groups.csv).
@@ -380,10 +380,10 @@ CheS-Mapper can be used to analyze the relationship between the structure of
 chemical compounds, their physico-chemical properties, and biological or toxic
 effects. It depicts closely related (similar) compounds in 3D space and can be
 used with different kinds of features. We have investigated structural as well
-as physico-chemical properties and concluded that both datasets are very
+as physico-chemical properties and concluded that both databases are very
 similar, both in terms of chemical structures and physico-chemical properties. 
 
-The only statistically significant difference between both datasets, is that
+The only statistically significant difference between both databases, is that
 the Nestlé database contains more small compounds (61 structures with less than
 11 atoms) than the FSVO-database (19 small structures, p-value 3.7E-7).
 
@@ -411,11 +411,11 @@ MolPrint2D features that are utilized for model building in this work.
 
 ### Experimental variability versus prediction uncertainty 
 
-Duplicated LOAEL values can be found in both datasets and there is
+Duplicated LOAEL values can be found in both databases and there is
 a substantial number of 155 compounds with more than
 one LOAEL. These chemicals allow us to estimate the variability of
-experimental results within individual datasets and between datasets. Data with
-*identical* values (at five significant digits) in both datasets were excluded
+experimental results within individual databases and between databases. Data with
+*identical* values (at five significant digits) in both databases were excluded
 from variability analysis, because it it likely that they originate from the
 same experiments.
 
@@ -423,6 +423,9 @@ same experiments.
 
 
 
+Both databases contain substances with multiple measurements, which allow the determination of experimental variabilities. 
+For this purpose we have calculated the mean standard deviation of compounds with multiple measurements, which is roughly a factor of 2 for both databases. 
+
 The Nestlé database has 567 LOAEL values for
 445 unique structures, 93 compounds have
 multiple measurements with a mean standard deviation (-log10 transformed
@@ -438,7 +441,7 @@ multiple measurements with a mean standard deviation (-log10 transformed values)
 0.59 mmol/kg_bw/day)
 ([@fig:intra]). 
 
-Standard deviations of both datasets do not show
+Standard deviations of both databases do not show
 a statistically significant difference with a p-value (t-test) of 0.21.
 The combined test set has a mean standard deviation (-log10 transformed values) of
 0.33
@@ -446,22 +449,24 @@ The combined test set has a mean standard deviation (-log10 transformed values)
 0.55 mmol/kg_bw/day)
 ([@fig:intra]). 
 
-![Distribution and variability of LOAEL values in both datasets. Each vertical line represents a compound, dots are individual LOAEL values.](figures/dataset-variability.pdf){#fig:intra}
+![Distribution and variability of compounds with multiple LOAEL values in both databases. Each vertical line represents a compound, dots are individual LOAEL values.](figures/dataset-variability.pdf){#fig:intra}
 
 ##### Inter database variability
 
+In order to compare the correlation of LOAEL values in both databases and to establish a reference for predicted values, we have investigated compounds that occur in both databases.
+
 [@fig:comp] shows the experimental LOAEL variability of compounds occurring in
-both datasets (i.e. the *test* dataset) colored in red (experimental). This is
+both datasets (i.e. the *test* dataset) colored in blue (experimental). This is
 the baseline reference for the comparison with predicted values.
 
 
 
 [@fig:datacorr] depicts the correlation between LOAEL values from both
-datasets. As both datasets contain duplicates medians were used for the
+databases. As both databases contain duplicates medians were used for the
 correlation plot and statistics. It should be kept in mind that the aggregation of duplicated
 measurements into a single median value hides a substantial portion of the
 experimental variability.  Correlation analysis shows a significant (p-value < 2.2e-16)
-correlation between the experimental data in both datasets with r\^2:
+correlation between the experimental data in both databases with r\^2:
 0.52, RMSE: 0.59
 
 ![Correlation of median LOAEL values from Nestlé and FSVO databases. Data with
@@ -473,7 +478,7 @@ correlation between the experimental data in both datasets with r\^2:
 
 
 In order to compare the performance of *in silico* read across models with
-experimental variability we are using compounds that occur in both datasets as
+experimental variability we are using compounds with multiple measurements as
 a test set (375 measurements, 155
 compounds). `lazar` read across predictions were obtained for
 155 compounds, 37
@@ -610,7 +615,7 @@ Nestlé and FSVO databases are very similar in terms of chemical
 structures and properties as well as distribution of experimental LOAEL
 values. The only significant difference that we observed was that the
 Nestlé one has larger amount of small molecules, than the FSVO database.
-For this reason we pooled both dataset into a single training dataset
+For this reason we pooled both databases into a single training dataset
 for read across predictions.
 
 An early review of the databases revealed that 155 out of the 671
@@ -639,8 +644,8 @@ data.
 
 Predictions with a warning (neighbor similarity &lt; 0.5 and &gt; 0.2 or
 weighted average predictions) are more uncertain. However, they still
-show a strong correlation with experimental data, but the errors are
-larger than for compounds within the applicability domain. Expected
+show a strong correlation with experimental data, but the errors are ~ 20-40\%
+larger than for compounds within the applicability domain ([@fig:corr] and [@tbl:cv]). Expected
 errors are displayed as 95% prediction intervals, which covers 100% of
 the experimental data. The main advantage of lowering the similarity
 threshold is that it allows to predict a much larger number of
diff --git a/loael.pdf b/loael.pdf
index 80921e4..82b541d 100644
--- a/loael.pdf
+++ b/loael.pdf
diff --git a/loael.tex b/loael.tex
index 738fea5..b5c625b 100644
--- a/loael.tex
+++ b/loael.tex
@@ -249,13 +249,13 @@ chemical structures. It can be obtained from the following GitHub links:
 \subsubsection{Preprocessing}\label{preprocessing}
 
 Chemical structures (represented as SMILES (Weininger 1988)) in both
-datasets were checked for correctness. When syntactically incorrect or
+databases were checked for correctness. When syntactically incorrect or
 missing SMILES were generated from other identifiers (e.g names, CAS
 numbers). Unique smiles from the OpenBabel library (OBoyle et al. 2011)
 were used for the identification of duplicated structures.
 
 Studies with undefined or empty LOAEL entries were removed from the
-datasets. LOAEL values were converted to mmol/kg\_bw/day units and
+databases. LOAEL values were converted to mmol/kg\_bw/day units and
 rounded to five significant digits. For prediction, validation and
 visualisation purposes -log10 transformations are used.
 
@@ -466,10 +466,10 @@ baseline for evaluating prediction performance.
 
 \subparagraph{Structural diversity}\label{structural-diversity}
 
-In order to compare the structural diversity of both datasets we
+In order to compare the structural diversity of both databases we
 evaluated the frequency of functional groups from the OpenBabel FP4
 fingerprint. Figure~\ref{fig:fg} shows the frequency of functional
-groups in both datasets. 139 functional groups with a frequency
+groups in both databases. 139 functional groups with a frequency
 \textgreater{} 25 are depicted, the complete table for all functional
 groups can be found in the supplemental material at
 \href{https://github.com/opentox/loael-paper/blob/submission/data/functional-groups.csv}{GitHub}.
@@ -488,10 +488,10 @@ structure of chemical compounds, their physico-chemical properties, and
 biological or toxic effects. It depicts closely related (similar)
 compounds in 3D space and can be used with different kinds of features.
 We have investigated structural as well as physico-chemical properties
-and concluded that both datasets are very similar, both in terms of
+and concluded that both databases are very similar, both in terms of
 chemical structures and physico-chemical properties.
 
-The only statistically significant difference between both datasets, is
+The only statistically significant difference between both databases, is
 that the Nestlé database contains more small compounds (61 structures
 with less than 11 atoms) than the FSVO-database (19 small structures,
 p-value 3.7E-7).
@@ -499,17 +499,23 @@ p-value 3.7E-7).
 \subsubsection{Experimental variability versus prediction
 uncertainty}\label{experimental-variability-versus-prediction-uncertainty}
 
-Duplicated LOAEL values can be found in both datasets and there is a
+Duplicated LOAEL values can be found in both databases and there is a
 substantial number of 155 compounds with more than one LOAEL. These
 chemicals allow us to estimate the variability of experimental results
-within individual datasets and between datasets. Data with
-\emph{identical} values (at five significant digits) in both datasets
+within individual databases and between databases. Data with
+\emph{identical} values (at five significant digits) in both databases
 were excluded from variability analysis, because it it likely that they
 originate from the same experiments.
 
 \subparagraph{Intra database
 variability}\label{intra-database-variability}
 
+Both databases contain substances with multiple measurements, which
+allow the determination of experimental variabilities. For this purpose
+we have calculated the mean standard deviation of compounds with
+multiple measurements, which is roughly a factor of 2 for both
+databases.
+
 The Nestlé database has 567 LOAEL values for 445 unique structures, 93
 compounds have multiple measurements with a mean standard deviation
 (-log10 transformed values) of 0.32 (0.56 mg/kg\_bw/day, 0.56
@@ -520,7 +526,7 @@ compounds have multiple measurements with a mean standard deviation
 (-log10 transformed values) of 0.29 (0.57 mg/kg\_bw/day, 0.59
 mmol/kg\_bw/day) (Figure~\ref{fig:intra}).
 
-Standard deviations of both datasets do not show a statistically
+Standard deviations of both databases do not show a statistically
 significant difference with a p-value (t-test) of 0.21. The combined
 test set has a mean standard deviation (-log10 transformed values) of
 0.33 (0.56 mg/kg\_bw/day, 0.55 mmol/kg\_bw/day)
@@ -529,26 +535,30 @@ test set has a mean standard deviation (-log10 transformed values) of
 \begin{figure}
 \centering
 \includegraphics{figures/dataset-variability.pdf}
-\caption{Distribution and variability of LOAEL values in both datasets.
-Each vertical line represents a compound, dots are individual LOAEL
-values.}\label{fig:intra}
+\caption{Distribution and variability of compounds with multiple LOAEL
+values in both databases. Each vertical line represents a compound, dots
+are individual LOAEL values.}\label{fig:intra}
 \end{figure}
 
 \subparagraph{Inter database
 variability}\label{inter-database-variability}
 
+In order to compare the correlation of LOAEL values in both databases
+and to establish a reference for predicted values, we have investigated
+compounds that occur in both databases.
+
 Figure~\ref{fig:comp} shows the experimental LOAEL variability of
 compounds occurring in both datasets (i.e.~the \emph{test} dataset)
-colored in red (experimental). This is the baseline reference for the
+colored in blue (experimental). This is the baseline reference for the
 comparison with predicted values.
 
 Figure~\ref{fig:datacorr} depicts the correlation between LOAEL values
-from both datasets. As both datasets contain duplicates medians were
+from both databases. As both databases contain duplicates medians were
 used for the correlation plot and statistics. It should be kept in mind
 that the aggregation of duplicated measurements into a single median
 value hides a substantial portion of the experimental variability.
 Correlation analysis shows a significant (p-value \textless{} 2.2e-16)
-correlation between the experimental data in both datasets with r\^{}2:
+correlation between the experimental data in both databases with r\^{}2:
 0.52, RMSE: 0.59
 
 \begin{figure}
@@ -562,8 +572,8 @@ analysis.}\label{fig:datacorr}
 \subsubsection{Local QSAR models}\label{local-qsar-models}
 
 In order to compare the performance of \emph{in silico} read across
-models with experimental variability we are using compounds that occur
-in both datasets as a test set (375 measurements, 155 compounds).
+models with experimental variability we are using compounds with
+multiple measurements as a test set (375 measurements, 155 compounds).
 \texttt{lazar} read across predictions were obtained for 155 compounds,
 37 predictions failed, because no similar compounds were found in the
 training data (i.e.~they were not covered by the applicability domain of
@@ -721,7 +731,7 @@ very similar in terms of chemical structures and properties as well as
 distribution of experimental LOAEL values. The only significant
 difference that we observed was that the Nestlé one has larger amount of
 small molecules, than the FSVO database. For this reason we pooled both
-dataset into a single training dataset for read across predictions.
+databases into a single training dataset for read across predictions.
 
 An early review of the databases revealed that 155 out of the 671
 chemicals available in the training datasets had at least two
@@ -750,16 +760,18 @@ comparable to experimental variability of the training data.
 Predictions with a warning (neighbor similarity \textless{} 0.5 and
 \textgreater{} 0.2 or weighted average predictions) are more uncertain.
 However, they still show a strong correlation with experimental data,
-but the errors are larger than for compounds within the applicability
-domain. Expected errors are displayed as 95\% prediction intervals,
-which covers 100\% of the experimental data. The main advantage of
-lowering the similarity threshold is that it allows to predict a much
-larger number of substances than with more rigorous applicability domain
-criteria. As each of this prediction could be problematic, they are
-flagged with a warning to alert risk assessors that further inspection
-is required. This can be done in the graphical interface
-(\url{https://lazar.in-silico.ch}) which provides intuitive means of
-inspecting the rationales and data used for read across predictions.
+but the errors are \textasciitilde{} 20-40\% larger than for compounds
+within the applicability domain (Figure~\ref{fig:corr} and
+Table~\ref{tbl:cv}). Expected errors are displayed as 95\% prediction
+intervals, which covers 100\% of the experimental data. The main
+advantage of lowering the similarity threshold is that it allows to
+predict a much larger number of substances than with more rigorous
+applicability domain criteria. As each of this prediction could be
+problematic, they are flagged with a warning to alert risk assessors
+that further inspection is required. This can be done in the graphical
+interface (\url{https://lazar.in-silico.ch}) which provides intuitive
+means of inspecting the rationales and data used for read across
+predictions.
 
 Finally there is a substantial number of chemicals (37), where no
 predictions can be made, because no similar compounds in the training