-rw-r--r-- loael.md | 38
1 files changed, 7 insertions, 31 deletions
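The first hunk deletes the R chunk that quantified intra-dataset variability. It collects the SMILES that occur more than once in a dataset (`duplicated()`). It then gives every duplicated row the standard deviation of its structure's LOAEL values (`ave(..., FUN=sd)`). Finally, it compares the SD vectors of the `m` and `s` data frames with `t.test()`; these frames are presumably the Mazzatorta and Swiss datasets. In R, `t.test()` defaults to Welch's unequal-variance test. The sketch below reproduces that computation in Python using only the standard library. It assumes each dataset is a list of `(smiles, loael)` pairs. The function names and toy records are hypothetical, and it stops at the t statistic and degrees of freedom. A p-value would come from a statistics library, for example `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
"""Sketch of the removed R variability chunk (assumed (smiles, loael) input)."""
import math
from collections import defaultdict
from statistics import mean, stdev


def duplicate_sds(records):
    """Return one SD per row whose SMILES occurs more than once.

    Mirrors R's ave(LOAEL, SMILES, FUN=sd) restricted to duplicated SMILES:
    every row of a repeated structure carries its group's SD, so structures
    with more measurements contribute more values to the later t-test.
    """
    groups = defaultdict(list)
    for smiles, loael in records:
        groups[smiles].append(loael)
    sds = []
    for values in groups.values():
        if len(values) > 1:  # keep only structures measured more than once
            sd = stdev(values)  # n - 1 denominator, like R's sd()
            sds.extend([sd] * len(values))
    return sds


def welch_t(a, b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom.

    This is the test R's t.test(a, b) runs by default (var.equal = FALSE).
    """
    va = stdev(a) ** 2 / len(a)  # squared standard error of mean(a)
    vb = stdev(b) ** 2 / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


if __name__ == "__main__":
    # Hypothetical toy records, not taken from either dataset.
    mazz = [("CCO", 1.0), ("CCO", 1.4), ("CCN", 2.0),
            ("CCN", 2.2), ("CCN", 2.6), ("CCC", 3.0)]
    swiss = [("CCO", 1.1), ("CCO", 1.2), ("CCCl", 0.5), ("CCCl", 0.9)]
    t, df = welch_t(duplicate_sds(mazz), duplicate_sds(swiss))
    print(f"t = {t:.3f}, df = {df:.2f}")
```

Singletons such as `CCC` above are dropped, matching the `%in% dupsmi` filter in the R code. The per-row SD layout is kept deliberately so that the test weights structures the same way the original `ave()` call did.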
diff --git a/loael.md b/loael.md
index 9f49cff..aa1764e 100644
--- a/loael.md
+++ b/loael.md
@@ -304,30 +304,6 @@ experimental results within individual datasets and between datasets.
##### Intra dataset variability
-```r
-m.dupsmi <- unique(m$SMILES[duplicated(m$SMILES)])
-s.dupsmi <- unique(s$SMILES[duplicated(s$SMILES)])
-c.dupsmi <- unique(c$SMILES[duplicated(c$SMILES)])
-
-m.dup <- m[m$SMILES %in% m.dupsmi,]
-s.dup <- s[s$SMILES %in% s.dupsmi,]
-c.dup <- c[c$SMILES %in% c.dupsmi,]
-
-m.dupnr <- length(m.dupsmi)
-s.dupnr <- length(s.dupsmi)
-c.dupnr <- length(c.dupsmi)
-
-#m.dup
-#m.dup$LOAEL
-#m.dup$SMILES
-
-m.dup$sd <- ave(m.dup$LOAEL,m.dup$SMILES,FUN=sd)
-s.dup$sd <- ave(s.dup$LOAEL,s.dup$SMILES,FUN=sd)
-c.dup$sd <- ave(c.dup$LOAEL,c.dup$SMILES,FUN=sd)
-t$sd <- ave(t$LOAEL,t$SMILES,FUN=sd)
-
-p = t.test(m.dup$sd,s.dup$sd)$p.value
-```
The Mazzatorta dataset has 567 LOAEL values for
445 unique structures, 93
@@ -369,7 +345,7 @@ In order to compare the performance of in silico read across models with experim
variability we are using compounds that occur in both datasets as a test set
(375 measurements, 155 compounds).
`lazar` read across predictions
-were obtained for 155 compounds, 5
+were obtained for 155 compounds, 129
predictions failed, because no similar compounds were found in the training data (i.e. they were not covered by the applicability domain of the training data).
@@ -391,7 +367,7 @@ experimental data into a single median value hides experimental variability.
Comparison | $r^2$ | RMSE
--------------|---------------------------|-------------------------
Mazzatorta vs. Swiss | 0.52 | 0.59
-Prediction vs. Test median | 0.31 | 0.72
+Prediction vs. Test median | 0.48 | 0.53
: Comparison of model predictions with experimental variability. {#tbl:common-pred}
@@ -405,9 +381,9 @@ All correlations of predicted with experimental values are statistically highly
$r^2$ | RMSE | Nr. predicted
-------|------|----------------
-0.33 | 0.82 | 631/671
-0.36 | 0.81 | 633/671
-0.35 | 0.81 | 636/671
+0.41 | 0.57 | 51/671
+0.52 | 0.54 | 57/671
+0.51 | 0.52 | 58/671
: Results from 3 independent 10-fold crossvalidations {#tbl:cv}
@@ -445,13 +421,13 @@ we present a brief analysis of the two most severe mispredictions:
-The compound with the largest deviation of prediction intervals is (amino-methylsulfanyl-phosphoryl)oxymethane (SMILES COP(=O)(SC)N) with an experimental median of 2.69 and a prediction interval of 0.74 +/- 0.29. In this case the prediction is based on two neighbors with very low similarity (0.1 and 0.13). Such cases can be eliminated by raising the similarity threshold for neighbors, but that could come at the cost of a larger number of unpredicted compounds. The graphical user interface shows for each prediction neighbors and similarities for a critical examination which should make the detection of similar cases rather straightforward.
+The compound with the largest deviation of prediction intervals is (amino-methylsulfanyl-phosphoryl)oxymethane (SMILES COP(=O)(SC)N) with an experimental median of and a prediction interval of +/- . In this case the prediction is based on two neighbors with very low similarity (0.1 and 0.13). Such cases can be eliminated by raising the similarity threshold for neighbors, but that could come at the cost of a larger number of unpredicted compounds. The graphical user interface shows for each prediction neighbors and similarities for a critical examination which should make the detection of similar cases rather straightforward.
The compound with second largest deviation of prediction intervals is
Endosulfan (SMILES O=S1OCC2C(CO1)C1(C(C2(Cl)C(=C1Cl)Cl)(Cl)Cl)Cl)
-with an experimental median of 1.91 and a prediction interval of 3.43 +/- 1.52. In this case the prediction is based on 5 neighbors with similarities between 0.33 and 0.4. All of them are polychlorinated compound, but none of them contains sulfur or is a sulfurous acid ester. Again such problems are easily identified from a visual inspection of neighbors, and we want to stress the importance of inspecting rationales for predictions in the graphical interface before accepting a prediction.
+with an experimental median of and a prediction interval of +/- . In this case the prediction is based on 5 neighbors with similarities between 0.33 and 0.4. All of them are polychlorinated compound, but none of them contains sulfur or is a sulfurous acid ester. Again such problems are easily identified from a visual inspection of neighbors, and we want to stress the importance of inspecting rationales for predictions in the graphical interface before accepting a prediction.
Summary
=======