Diffstat (limited to 'loael.tex')
-rw-r--r-- | loael.tex | 26 |
1 files changed, 13 insertions, 13 deletions
@@ -294,10 +294,10 @@ In this study we are using the modular lazar (\emph{la}zy
 \emph{s}tructure \emph{a}ctivity \emph{r}elationships) framework (A.
 Maunz et al. 2013) for model development and validation. The complete
 \texttt{lazar} source code can be found on
-\href{https://github.com/opentox/lazar}{GitHub}.
+\href{https://github.com/opentox/lazar/tree/loael-paper.revision}{GitHub}.
 
 lazar follows the following basic
-\href{https://github.com/opentox/lazar/blob/loael-paper.revision/lib/model.rb\#L180-L257}{workflow}:
+\href{https://github.com/opentox/lazar/blob/loael-paper.revision/lib/model.rb\#L191-L281}{workflow}:
 
 For a given chemical structure lazar
 
@@ -324,7 +324,7 @@ following sections.
 \subsubsection{Neighbor identification}\label{neighbor-identification}
 
 Similarity calculations are based on
-\href{https://github.com/opentox/lazar/blob/loael-paper.revision/lib/nanoparticle.rb\#L17-L21}{MolPrint2D
+\href{https://github.com/opentox/lazar/blob/loael-paper.revision/lib/compound.rb\#L38-L42}{MolPrint2D
 fingerprints} (Bender et al. 2004) from the OpenBabel chemoinformatics
 library (OBoyle et al. 2011).
 
@@ -345,7 +345,7 @@ atom environments of a compound, which can be used to calculate chemical
 similarities.
 
 The
-\href{https://github.com/opentox/lazar/blob/loael-paper.revision/lib/similarity.rb\#L18-L20}{chemical
+\href{https://github.com/opentox/lazar/blob/loael-paper.revision/lib/similarity.rb\#L22-L27}{chemical
 similarity} between two compounds A and B is expressed as the proportion
 between atom environments common in both structures \(A \cap B\) and the
 total number of atom environments \(A \cup B\) (Jaccard/Tanimoto index,
@@ -377,7 +377,7 @@ absence of closely related neighbors, we follow a tiered approach:
 
 Compounds with the same structure as the query structure are
 automatically
-\href{https://github.com/opentox/lazar/blob/loael-paper.revision/lib/model.rb\#L180-L257}{eliminated
+\href{https://github.com/opentox/lazar/blob/loael-paper.revision/lib/model.rb\#L233-L236}{eliminated
 from neighbors} to obtain unbiased predictions in the presence of
 duplicates.
 
@@ -386,7 +386,7 @@ predictions}\label{local-qsar-models-and-predictions}
 
 Only similar compounds (\emph{neighbors}) above the threshold are used
 for local QSAR models. In this investigation we are using
-\href{https://github.com/opentox/lazar/blob/loael-paper.revision/lib/caret.rb\#L7-L78}{weighted
+\href{https://github.com/opentox/lazar/blob/loael-paper.revision/lib/model.rb\#L82-L85}{weighted
 random forests regression (RF)} for the prediction of quantitative
 properties. First all uninformative fingerprints (i.e.~features with
 identical values across all neighbors) are removed. The remaining set of
@@ -398,7 +398,7 @@ settings, optimizing the number of RF components by bootstrap
 resampling.
 
 Finally the local RF model is applied to
-\href{https://github.com/opentox/lazar/blob/loael-paper.revision/lib/model.rb\#L194-L272}{predict
+\href{https://github.com/opentox/lazar/blob/loael-paper.revision/lib/model.rb\#L191-L281}{predict
 the activity} of the query compound. The root-mean-square error (RMSE)
 of bootstrapped local model predictions is used to construct 95\%
 prediction intervals at 1.96*RMSE. The width of the prediction interval
@@ -407,7 +407,7 @@ prediction should be with 95\% probability within the prediction
 interval.
 
 If RF modelling or prediction fails, the program resorts to using the
-\href{https://github.com/opentox/lazar/blob/loael-paper.revision/lib/regression.rb\#L6-L16}{weighted
+\href{https://github.com/opentox/lazar/blob/loael-paper.revision/lib/regression.rb\#L7-L21}{weighted
 mean} of the neighbors LOAEL values, where the contribution of each
 neighbor is weighted by its similarity to the query compound. In this
 case the prediction is also flagged with a warning.
@@ -436,15 +436,15 @@ For the comparison of experimental variability with predictive
 accuracies we are using a test set of compounds that occur in both
 databases. Unbiased read across predictions are obtained from the
 \emph{training} dataset, by
-\href{https://github.com/opentox/lazar/blob/loael-paper.revision/lib/model.rb\#L234-L238}{removing
+\href{https://github.com/opentox/lazar/blob/loael-paper.revision/lib/model.rb\#L233-L237}{removing
 \emph{all} information} from the test compound from the training set
 prior to predictions. This procedure is hardcoded into the prediction
 algorithm in order to prevent validation errors. As we have only a
 single test set no model or parameter optimisations were performed in
 order to avoid overfitting a single dataset.
 
-Results from 3 repeated
-\href{https://github.com/opentox/lazar/blob/loael-paper.revision/lib/crossvalidation.rb\#L85-L93}{10-fold
+Results from 50 repeated
+\href{https://github.com/opentox/lazar/blob/loael-paper.revision/lib/crossvalidation.rb\#L10-L48}{10-fold
 crossvalidations} with independent training/test set splits are provided
 as additional information to the test set results.
 
@@ -462,8 +462,8 @@ LOAEL data (Nestlé and FSVO databases combined).
 \item[\texttt{lazar} GUI]
 \url{https://github.com/opentox/lazar-gui} (source code)
 \item[Manuscript]
-\url{https://github.com/opentox/loael-paper} (source code for the
-manuscript and validation experiments)
+\url{https://github.com/opentox/loael-paper/tree/revision} (source code
+for the manuscript and validation experiments)
 \item[Docker image]
 \url{https://hub.docker.com/r/insilicotox/loael-paper/} (container with
 manuscript, validation experiments, \texttt{lazar} libraries and third