author Christoph Helma <helma@in-silico.ch> 2018-03-19 20:02:38 +0100
committer Christoph Helma <helma@in-silico.ch> 2018-03-19 20:02:38 +0100
commit bff63dfc03d903d5788051128d2d59e1b585f89b
tree d7291148b029fa5bc3f944b68a069e30e2e5388f
parent 9d3f7f3c2ec14f53446a36de4343b25f57a82bb7
paper revision links fixed
-rw-r--r-- loael.Rmd 22
-rw-r--r-- loael.md 24
-rw-r--r-- loael.pdf bin 661974 -> 661959 bytes
-rw-r--r-- loael.tex 26
4 files changed, 36 insertions, 36 deletions
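
The manuscript passages whose links this commit updates describe three calculations: the Jaccard/Tanimoto similarity between the atom environments of two compounds, the similarity-weighted mean of neighbor LOAEL values used as a fallback, and a 95% prediction interval at 1.96*RMSE. The following is a minimal Python sketch of those formulas as the text states them. It is not taken from the `lazar` code base (which is Ruby); the function names and the toy atom-environment labels (`e1`, `e2`, ...) are hypothetical, and the interval is read here as prediction ± 1.96*RMSE, which is an assumption about how the manuscript applies it.

```python
def jaccard(a, b):
    """Jaccard/Tanimoto index: |A ∩ B| / |A ∪ B| for two sets of atom environments."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Two empty fingerprints share nothing; treat them as dissimilar (assumption).
        return 0.0
    return len(a & b) / len(union)


def weighted_mean(values, weights):
    """Mean of neighbor values, each weighted by its similarity to the query."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total


def prediction_interval(prediction, rmse, z=1.96):
    """Symmetric interval prediction ± z * RMSE (z = 1.96 for 95%)."""
    half_width = z * rmse
    return (prediction - half_width, prediction + half_width)


# Toy data: hypothetical atom-environment labels and LOAEL-like values.
query = {"e1", "e2", "e3", "e4"}
neighbors = [
    ({"e1", "e2", "e3", "e5"}, 2.0),        # shares 3 of 5 environments
    ({"e1", "e2", "e3", "e4", "e8"}, 3.0),  # shares 4 of 5 environments
]

similarities = [jaccard(query, fingerprint) for fingerprint, _ in neighbors]
estimate = weighted_mean([value for _, value in neighbors], similarities)
low, high = prediction_interval(estimate, rmse=0.5)  # RMSE value is made up
print(similarities, estimate, (low, high))
```

The sketch leaves out the similarity thresholds and the removal of compounds identical to the query, both of which the manuscript describes as part of neighbor selection.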
diff --git a/loael.Rmd b/loael.Rmd
index 08087eb..03f5089 100644
--- a/loael.Rmd
+++ b/loael.Rmd
@@ -197,9 +197,9 @@ Algorithms
In this study we are using the modular lazar (*la*zy *s*tructure *a*ctivity
*r*elationships) framework [@Maunz2013] for model development and validation.
-The complete `lazar` source code can be found on [GitHub](https://github.com/opentox/lazar).
+The complete `lazar` source code can be found on [GitHub](https://github.com/opentox/lazar/tree/loael-paper.revision).
-lazar follows the following basic [workflow](https://github.com/opentox/lazar/blob/loael-paper.revision/lib/model.rb#L180-L257):
+lazar follows the following basic [workflow](https://github.com/opentox/lazar/blob/loael-paper.revision/lib/model.rb#L191-L281):
For a given chemical structure lazar
@@ -239,7 +239,7 @@ similarities.
[//]: # https://openbabel.org/docs/dev/FileFormats/MolPrint2D_format.html#molprint2d-format
-The [chemical similarity](https://github.com/opentox/lazar/blob/loael-paper.revision/lib/similarity.rb#L20-L27) between two compounds A and B is expressed as the
+The [chemical similarity](https://github.com/opentox/lazar/blob/loael-paper.revision/lib/similarity.rb#L22-L27) between two compounds A and B is expressed as the
proportion between atom environments common in both structures $A \cap B$ and the
total number of atom environments $A \cup B$ (Jaccard/Tanimoto index, [@eq:jaccard]).
@@ -258,7 +258,7 @@ closely related neighbors, we follow a tiered approach:
- Similarity thresholds of 0.5 and 0.2 are the default values chosen by the software developers and remained unchanged during the course of these experiments.
Compounds with the same structure as the query structure are automatically
-[eliminated from neighbors](https://github.com/opentox/lazar/blob/loael-paper.revision/lib/model.rb#L231-L236)
+[eliminated from neighbors](https://github.com/opentox/lazar/blob/loael-paper.revision/lib/model.rb#L233-L236)
to obtain unbiased predictions in the presence of
duplicates.
@@ -267,7 +267,7 @@ duplicates.
Only similar compounds (*neighbors*) above the threshold are used for local
QSAR models. In this investigation we are using [weighted random forests
regression
-(RF)](https://github.com/opentox/lazar/blob/loael-paper.revision/lib/caret.rb#L7-L78)
+(RF)](https://github.com/opentox/lazar/blob/loael-paper.revision/lib/model.rb#L82-L85)
for the prediction of quantitative properties. First all uninformative
fingerprints (i.e. features with identical values across all neighbors) are
removed. The remaining set of features is used as descriptors for creating
@@ -277,12 +277,12 @@ used for this purpose. Models are trained with the default `caret` settings,
optimizing the number of RF components by bootstrap resampling.
Finally the local RF model is applied to [predict the
-activity](https://github.com/opentox/lazar/blob/loael-paper.revision/lib/model.rb#L194-L272)
+activity](https://github.com/opentox/lazar/blob/loael-paper.revision/lib/model.rb#L191-L281)
of the query compound. The root-mean-square error (RMSE) of bootstrapped local model predictions is used
to construct 95\% prediction intervals at 1.96*RMSE. The width of the prediction interval indicates the expected prediction accuracy. The "true" value of a prediction should be with 95\% probability within the prediction interval.
If RF modelling or prediction fails, the program resorts to using the [weighted
-mean](https://github.com/opentox/lazar/blob/loael-paper.revision/lib/regression.rb#L6-L16)
+mean](https://github.com/opentox/lazar/blob/loael-paper.revision/lib/regression.rb#L7-L21)
of the neighbors LOAEL values, where the contribution of each neighbor is
weighted by its similarity to the query compound. In this case the prediction
is also flagged with a warning.
@@ -309,14 +309,14 @@ interval associated with each prediction.
For the comparison of experimental variability with predictive accuracies we
are using a test set of compounds that occur in both databases. Unbiased read
across predictions are obtained from the *training* dataset, by [removing *all*
-information](https://github.com/opentox/lazar/blob/loael-paper.revision/lib/model.rb#L234-L238)
+information](https://github.com/opentox/lazar/blob/loael-paper.revision/lib/model.rb#L233-L237)
from the test compound from the training set prior to predictions. This
procedure is hardcoded into the prediction algorithm in order to prevent
validation errors. As we have only a single test set no model or parameter
optimisations were performed in order to avoid overfitting a single dataset.
-Results from 3 repeated [10-fold
-crossvalidations](https://github.com/opentox/lazar/blob/loael-paper.revision/lib/crossvalidation.rb#L85-L93)
+Results from 50 repeated [10-fold
+crossvalidations](https://github.com/opentox/lazar/blob/loael-paper.revision/lib/crossvalidation.rb#L10-L48)
with independent training/test set splits are provided as additional
information to the test set results.
@@ -335,7 +335,7 @@ Public webinterface
~ <https://github.com/opentox/lazar-gui> (source code)
Manuscript
- ~ <https://github.com/opentox/loael-paper> (source code for the manuscript and validation experiments)
+ ~ <https://github.com/opentox/loael-paper/tree/revision> (source code for the manuscript and validation experiments)
Docker image
~ <https://hub.docker.com/r/insilicotox/loael-paper/> (container with manuscript, validation experiments, `lazar` libraries and third party dependencies)
diff --git a/loael.md b/loael.md
index a698ce2..9da9949 100644
--- a/loael.md
+++ b/loael.md
@@ -189,9 +189,9 @@ Algorithms
In this study we are using the modular lazar (*la*zy *s*tructure *a*ctivity
*r*elationships) framework [@Maunz2013] for model development and validation.
-The complete `lazar` source code can be found on [GitHub](https://github.com/opentox/lazar).
+The complete `lazar` source code can be found on [GitHub](https://github.com/opentox/lazar/tree/loael-paper.revision).
-lazar follows the following basic [workflow](https://github.com/opentox/lazar/blob/loael-paper.revision/lib/model.rb#L180-L257):
+lazar follows the following basic [workflow](https://github.com/opentox/lazar/blob/loael-paper.revision/lib/model.rb#L191-L281):
For a given chemical structure lazar
@@ -210,7 +210,7 @@ modelling. Algorithms used within this study are described in the following sect
### Neighbor identification
-Similarity calculations are based on [MolPrint2D fingerprints](https://github.com/opentox/lazar/blob/loael-paper.revision/lib/nanoparticle.rb#L17-L21)
+Similarity calculations are based on [MolPrint2D fingerprints](https://github.com/opentox/lazar/blob/loael-paper.revision/lib/compound.rb#L38-L42)
[@doi:10.1021/ci034207y] from the OpenBabel chemoinformatics library
[@OBoyle2011].
@@ -231,7 +231,7 @@ similarities.
[//]: # https://openbabel.org/docs/dev/FileFormats/MolPrint2D_format.html#molprint2d-format
-The [chemical similarity](https://github.com/opentox/lazar/blob/loael-paper.revision/lib/similarity.rb#L18-L20) between two compounds A and B is expressed as the
+The [chemical similarity](https://github.com/opentox/lazar/blob/loael-paper.revision/lib/similarity.rb#L22-L27) between two compounds A and B is expressed as the
proportion between atom environments common in both structures $A \cap B$ and the
total number of atom environments $A \cup B$ (Jaccard/Tanimoto index, [@eq:jaccard]).
@@ -250,7 +250,7 @@ closely related neighbors, we follow a tiered approach:
- Similarity thresholds of 0.5 and 0.2 are the default values chosen by the software developers and remained unchanged during the course of these experiments.
Compounds with the same structure as the query structure are automatically
-[eliminated from neighbors](https://github.com/opentox/lazar/blob/loael-paper.revision/lib/model.rb#L180-L257)
+[eliminated from neighbors](https://github.com/opentox/lazar/blob/loael-paper.revision/lib/model.rb#L233-L236)
to obtain unbiased predictions in the presence of
duplicates.
@@ -259,7 +259,7 @@ duplicates.
Only similar compounds (*neighbors*) above the threshold are used for local
QSAR models. In this investigation we are using [weighted random forests
regression
-(RF)](https://github.com/opentox/lazar/blob/loael-paper.revision/lib/caret.rb#L7-L78)
+(RF)](https://github.com/opentox/lazar/blob/loael-paper.revision/lib/model.rb#L82-L85)
for the prediction of quantitative properties. First all uninformative
fingerprints (i.e. features with identical values across all neighbors) are
removed. The remaining set of features is used as descriptors for creating
@@ -269,12 +269,12 @@ used for this purpose. Models are trained with the default `caret` settings,
optimizing the number of RF components by bootstrap resampling.
Finally the local RF model is applied to [predict the
-activity](https://github.com/opentox/lazar/blob/loael-paper.revision/lib/model.rb#L194-L272)
+activity](https://github.com/opentox/lazar/blob/loael-paper.revision/lib/model.rb#L191-L281)
of the query compound. The root-mean-square error (RMSE) of bootstrapped local model predictions is used
to construct 95\% prediction intervals at 1.96*RMSE. The width of the prediction interval indicates the expected prediction accuracy. The "true" value of a prediction should be with 95\% probability within the prediction interval.
If RF modelling or prediction fails, the program resorts to using the [weighted
-mean](https://github.com/opentox/lazar/blob/loael-paper.revision/lib/regression.rb#L6-L16)
+mean](https://github.com/opentox/lazar/blob/loael-paper.revision/lib/regression.rb#L7-L21)
of the neighbors LOAEL values, where the contribution of each neighbor is
weighted by its similarity to the query compound. In this case the prediction
is also flagged with a warning.
@@ -301,14 +301,14 @@ interval associated with each prediction.
For the comparison of experimental variability with predictive accuracies we
are using a test set of compounds that occur in both databases. Unbiased read
across predictions are obtained from the *training* dataset, by [removing *all*
-information](https://github.com/opentox/lazar/blob/loael-paper.revision/lib/model.rb#L234-L238)
+information](https://github.com/opentox/lazar/blob/loael-paper.revision/lib/model.rb#L233-L237)
from the test compound from the training set prior to predictions. This
procedure is hardcoded into the prediction algorithm in order to prevent
validation errors. As we have only a single test set no model or parameter
optimisations were performed in order to avoid overfitting a single dataset.
-Results from 3 repeated [10-fold
-crossvalidations](https://github.com/opentox/lazar/blob/loael-paper.revision/lib/crossvalidation.rb#L85-L93)
+Results from 50 repeated [10-fold
+crossvalidations](https://github.com/opentox/lazar/blob/loael-paper.revision/lib/crossvalidation.rb#L10-L48)
with independent training/test set splits are provided as additional
information to the test set results.
@@ -327,7 +327,7 @@ Public webinterface
~ <https://github.com/opentox/lazar-gui> (source code)
Manuscript
- ~ <https://github.com/opentox/loael-paper> (source code for the manuscript and validation experiments)
+ ~ <https://github.com/opentox/loael-paper/tree/revision> (source code for the manuscript and validation experiments)
Docker image
~ <https://hub.docker.com/r/insilicotox/loael-paper/> (container with manuscript, validation experiments, `lazar` libraries and third party dependencies)
diff --git a/loael.pdf b/loael.pdf
index 550e001..25d0755 100644
--- a/loael.pdf
+++ b/loael.pdf
Binary files differ
diff --git a/loael.tex b/loael.tex
index 7c30c58..5376ce0 100644
--- a/loael.tex
+++ b/loael.tex
@@ -294,10 +294,10 @@ In this study we are using the modular lazar (\emph{la}zy
\emph{s}tructure \emph{a}ctivity \emph{r}elationships) framework (A.
Maunz et al. 2013) for model development and validation. The complete
\texttt{lazar} source code can be found on
-\href{https://github.com/opentox/lazar}{GitHub}.
+\href{https://github.com/opentox/lazar/tree/loael-paper.revision}{GitHub}.
lazar follows the following basic
-\href{https://github.com/opentox/lazar/blob/loael-paper.revision/lib/model.rb\#L180-L257}{workflow}:
+\href{https://github.com/opentox/lazar/blob/loael-paper.revision/lib/model.rb\#L191-L281}{workflow}:
For a given chemical structure lazar
@@ -324,7 +324,7 @@ following sections.
\subsubsection{Neighbor identification}\label{neighbor-identification}
Similarity calculations are based on
-\href{https://github.com/opentox/lazar/blob/loael-paper.revision/lib/nanoparticle.rb\#L17-L21}{MolPrint2D
+\href{https://github.com/opentox/lazar/blob/loael-paper.revision/lib/compound.rb\#L38-L42}{MolPrint2D
fingerprints} (Bender et al. 2004) from the OpenBabel chemoinformatics
library (OBoyle et al. 2011).
@@ -345,7 +345,7 @@ atom environments of a compound, which can be used to calculate chemical
similarities.
The
-\href{https://github.com/opentox/lazar/blob/loael-paper.revision/lib/similarity.rb\#L18-L20}{chemical
+\href{https://github.com/opentox/lazar/blob/loael-paper.revision/lib/similarity.rb\#L22-L27}{chemical
similarity} between two compounds A and B is expressed as the proportion
between atom environments common in both structures \(A \cap B\) and the
total number of atom environments \(A \cup B\) (Jaccard/Tanimoto index,
@@ -377,7 +377,7 @@ absence of closely related neighbors, we follow a tiered approach:
Compounds with the same structure as the query structure are
automatically
-\href{https://github.com/opentox/lazar/blob/loael-paper.revision/lib/model.rb\#L180-L257}{eliminated
+\href{https://github.com/opentox/lazar/blob/loael-paper.revision/lib/model.rb\#L233-L236}{eliminated
from neighbors} to obtain unbiased predictions in the presence of
duplicates.
@@ -386,7 +386,7 @@ predictions}\label{local-qsar-models-and-predictions}
Only similar compounds (\emph{neighbors}) above the threshold are used
for local QSAR models. In this investigation we are using
-\href{https://github.com/opentox/lazar/blob/loael-paper.revision/lib/caret.rb\#L7-L78}{weighted
+\href{https://github.com/opentox/lazar/blob/loael-paper.revision/lib/model.rb\#L82-L85}{weighted
random forests regression (RF)} for the prediction of quantitative
properties. First all uninformative fingerprints (i.e.~features with
identical values across all neighbors) are removed. The remaining set of
@@ -398,7 +398,7 @@ settings, optimizing the number of RF components by bootstrap
resampling.
Finally the local RF model is applied to
-\href{https://github.com/opentox/lazar/blob/loael-paper.revision/lib/model.rb\#L194-L272}{predict
+\href{https://github.com/opentox/lazar/blob/loael-paper.revision/lib/model.rb\#L191-L281}{predict
the activity} of the query compound. The root-mean-square error (RMSE)
of bootstrapped local model predictions is used to construct 95\%
prediction intervals at 1.96*RMSE. The width of the prediction interval
@@ -407,7 +407,7 @@ prediction should be with 95\% probability within the prediction
interval.
If RF modelling or prediction fails, the program resorts to using the
-\href{https://github.com/opentox/lazar/blob/loael-paper.revision/lib/regression.rb\#L6-L16}{weighted
+\href{https://github.com/opentox/lazar/blob/loael-paper.revision/lib/regression.rb\#L7-L21}{weighted
mean} of the neighbors LOAEL values, where the contribution of each
neighbor is weighted by its similarity to the query compound. In this
case the prediction is also flagged with a warning.
@@ -436,15 +436,15 @@ For the comparison of experimental variability with predictive
accuracies we are using a test set of compounds that occur in both
databases. Unbiased read across predictions are obtained from the
\emph{training} dataset, by
-\href{https://github.com/opentox/lazar/blob/loael-paper.revision/lib/model.rb\#L234-L238}{removing
+\href{https://github.com/opentox/lazar/blob/loael-paper.revision/lib/model.rb\#L233-L237}{removing
\emph{all} information} from the test compound from the training set
prior to predictions. This procedure is hardcoded into the prediction
algorithm in order to prevent validation errors. As we have only a
single test set no model or parameter optimisations were performed in
order to avoid overfitting a single dataset.
-Results from 3 repeated
-\href{https://github.com/opentox/lazar/blob/loael-paper.revision/lib/crossvalidation.rb\#L85-L93}{10-fold
+Results from 50 repeated
+\href{https://github.com/opentox/lazar/blob/loael-paper.revision/lib/crossvalidation.rb\#L10-L48}{10-fold
crossvalidations} with independent training/test set splits are provided
as additional information to the test set results.
@@ -462,8 +462,8 @@ LOAEL data (Nestlé and FSVO databases combined).
\item[\texttt{lazar} GUI]
\url{https://github.com/opentox/lazar-gui} (source code)
\item[Manuscript]
-\url{https://github.com/opentox/loael-paper} (source code for the
-manuscript and validation experiments)
+\url{https://github.com/opentox/loael-paper/tree/revision} (source code
+for the manuscript and validation experiments)
\item[Docker image]
\url{https://hub.docker.com/r/insilicotox/loael-paper/} (container with
manuscript, validation experiments, \texttt{lazar} libraries and third