Validation of read across predictions for nanoparticle toxicities

Christoph Helma, Micha Rautenberg, Denis Gebele

in silico toxicology gmbh, Basel, Switzerland

Objectives

lazar read across framework

Similarity calculation

Relevant features
Features that correlate significantly with toxicity (Pearson correlation p-value < 0.05)
Weighted cosine similarity
  • Scaled and centered relevant feature vectors
  • Feature contributions weighted by Pearson correlation coefficient
  • Similarity threshold: sim > 0.5
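
The similarity step above can be sketched as follows. This is a minimal illustration, not lazar's actual code: the function names `weighted_cosine` and `select_neighbors` are hypothetical, the input vectors are assumed to be already scaled and centered, and the weights are assumed to be the absolute values of each feature's Pearson correlation coefficient with toxicity.

```python
import math

def weighted_cosine(a, b, weights):
    """Cosine similarity of two feature vectors with per-feature weights.

    Assumption: a and b are already scaled and centered; weights are the
    Pearson correlation coefficients of the relevant features, used here
    by absolute value.
    """
    w = [abs(x) for x in weights]
    dot = sum(wi * x * y for wi, x, y in zip(w, a, b))
    norm_a = math.sqrt(sum(wi * x * x for wi, x in zip(w, a)))
    norm_b = math.sqrt(sum(wi * y * y for wi, y in zip(w, b)))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # similarity is undefined for a zero vector; treat as dissimilar
    return dot / (norm_a * norm_b)

def select_neighbors(query, training, weights, threshold=0.5):
    """Return (index, similarity) for training vectors with sim > threshold."""
    result = []
    for i, vector in enumerate(training):
        sim = weighted_cosine(query, vector, weights)
        if sim > threshold:
            result.append((i, sim))
    return result
```

Only the particles returned by `select_neighbors` would then be passed on to the local model.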

Local regression algorithms

Partial least squares and random forest models use the caret R package with default settings

Prediction intervals: 1.96 × RMSE of caret's bootstrapped model predictions
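
As a worked illustration of the interval arithmetic, assuming the RMSE has already been obtained from the bootstrapped predictions (the helper names `rmse` and `prediction_interval` are hypothetical and not part of lazar or caret):

```python
import math

def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def prediction_interval(prediction, model_rmse, z=1.96):
    """Symmetric interval prediction +/- z * RMSE (z = 1.96 as on the slide)."""
    half_width = z * model_rmse
    return (prediction - half_width, prediction + half_width)
```

For example, a prediction of 1.0 with an RMSE of 2.0 gives a half-width of 3.92.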

If PLS/RF modelling or prediction fails, lazar falls back to the weighted average (WA) method.
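
The fallback logic might look roughly like this. It is a sketch under assumptions: the weighted average is taken to be the similarity-weighted mean of the neighbors' measured values, and `local_model` is a hypothetical callable standing in for the PLS or RF model.

```python
def weighted_average(neighbors):
    """Similarity-weighted mean; neighbors is a list of (similarity, value)."""
    total = sum(sim for sim, _ in neighbors)
    if total == 0:
        return None  # no usable neighbors
    return sum(sim * value for sim, value in neighbors) / total

def predict(neighbors, local_model=None):
    """Try the local model first; fall back to the weighted average on failure."""
    if local_model is not None:
        try:
            value = local_model(neighbors)
            if value is not None:
                return value, "local model"
        except Exception:
            pass  # model building or prediction failed; use the fallback
    return weighted_average(neighbors), "weighted average"
```

Returning the method name alongside the value makes it visible which predictions came from the fallback.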

Validation

Data requirements

Net cell association endpoint of the Protein corona dataset (121 gold and silver particles)

10-fold cross-validations

Descriptors  Algorithm  r2                    RMSE
Physchem     WA         0.42, 0.46, 0.48      2.02, 1.94, 1.92
Physchem     PLS        0.53, 0.54, 0.49      1.83, 1.8, 1.9
Physchem     RF         0.53, 0.52, 0.54      1.82, 1.84, 1.79
Proteomics   WA         0.66, 0.63, 0.63 *    1.58, 1.62, 1.66 *
Proteomics   PLS        0.59, 0.66, 0.63 *    1.74, 1.56, 1.65 *
Proteomics   RF         0.66, 0.65, 0.63 *    1.56, 1.59, 1.64 *
All          WA         0.73, 0.66, 0.66 *    1.41, 1.57, 1.58 *
All          PLS        0.67, 0.64, 0.69 *    1.53, 1.63, 1.5 *
All          RF         0.69, 0.69, 0.7 **    1.51, 1.5, 1.46 **
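
For orientation, the two statistics in the table could be computed from cross-validation predictions along these lines. This is a sketch only: r2 is assumed here to be the squared Pearson correlation between measured and predicted values, and the fold assignment in `kfold_indices` (a hypothetical helper) is a generic random split, not necessarily the one lazar uses.

```python
import math
import random

def kfold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds of near-equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        return float("nan")  # correlation is undefined for constant input
    return cov * cov / (var_o * var_p)

def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
```

With 121 particles and k = 10, each fold holds 12 or 13 particles; each particle is predicted once by a model that did not see it, and the pooled predictions are scored with `r_squared` and `rmse`.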

Gold and silver particles included!

Correlation plot

Correlation of log2 transformed net cell association measurements with random forest predictions using physchem properties and protein corona data.

Reproducible research

Lazar (source code)
https://github.com/opentox/lazar
Manuscript (source code)
https://github.com/opentox/nano-lazar-paper
Docker image
https://hub.docker.com/r/insilicotox/nano-lazar-paper/

Questions

TODO