Nanomaterial read across predictions with nano-lazar

Christoph Helma, Micha Rautenberg, Denis Gebele

in silico toxicology gmbh, Basel, Switzerland


Contents

lazar read across framework

A reproducible version of the read across procedure commonly used in toxicological risk assessment (based on the k-nearest-neighbor algorithm)

lazar was originally designed for small molecules with a defined chemical structure. The nanoparticle extension was developed and validated within the eNanoMapper project.

Similarity calculation

Requirements
Descriptors (features) for the query substance and the neighbor candidate
Observation
A large number of irrelevant features can lead to meaningless similarity estimates
Relevant features
Features that correlate significantly with toxicity (Pearson correlation p-value < 0.05)
Weighted cosine similarity
  • Scaled and centered relevant feature vectors
  • Feature contributions weighted by Pearson correlation coefficient
  • Similarity threshold: sim > 0.5
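The similarity steps above can be sketched in a few lines of Python. This is only an illustration, not the lazar implementation: the function names are hypothetical, the weights are assumed to be the absolute values of the Pearson correlation coefficients, and the selection of relevant features (p-value < 0.05) is assumed to have happened beforehand.

```python
import math


def scale_and_center(column):
    """Center one feature column to mean 0 and scale it to unit standard deviation.

    Expects at least two values (sample standard deviation, n - 1 denominator).
    """
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (n - 1))
    if sd == 0:
        return [0.0] * n  # constant feature carries no information
    return [(x - mean) / sd for x in column]


def weighted_cosine(a, b, weights):
    """Cosine similarity of two scaled feature vectors.

    Assumption: each feature contributes with the absolute value of its
    Pearson correlation coefficient with toxicity.
    """
    dot = sum(abs(w) * x * y for w, x, y in zip(weights, a, b))
    norm_a = math.sqrt(sum(abs(w) * x * x for w, x in zip(weights, a)))
    norm_b = math.sqrt(sum(abs(w) * y * y for w, y in zip(weights, b)))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def neighbors(query, candidates, weights, threshold=0.5):
    """Return (name, similarity) pairs with sim > threshold, most similar first."""
    hits = []
    for name, vector in candidates.items():
        sim = weighted_cosine(query, vector, weights)
        if sim > threshold:
            hits.append((name, sim))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

Called as `neighbors(query_vector, {"particle A": vector_a}, weights)`, the sketch keeps only candidates above the 0.5 threshold; all vectors are expected to contain the same relevant features in the same order.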

Local regression algorithms

Partial least squares and random forest models use the caret R package with default settings

Prediction intervals: prediction ± 1.96*RMSE of caret's bootstrapped model predictions

If PLS/RF modelling or prediction fails, lazar resorts to using the weighted average method.
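A minimal sketch of the fallback logic and the prediction interval, under stated assumptions: the weighted average is taken to be a similarity-weighted mean of the neighbors' measured values, and `local_model` is a hypothetical callable standing in for the PLS or random forest step (which lazar delegates to caret in R). None of these names come from lazar itself.

```python
def weighted_average(neighbors):
    """Similarity-weighted mean of neighbor measurements.

    neighbors: list of (similarity, measured_value) pairs.
    Assumption: similarities serve directly as weights.
    """
    total = sum(sim for sim, _ in neighbors)
    if total == 0:
        raise ValueError("no neighbors with non-zero similarity")
    return sum(sim * value for sim, value in neighbors) / total


def prediction_interval(prediction, rmse):
    """Interval of prediction +/- 1.96 * RMSE."""
    half_width = 1.96 * rmse
    return prediction - half_width, prediction + half_width


def predict_with_fallback(neighbors, local_model=None):
    """Try the local model first; resort to the weighted average if it fails.

    local_model is a hypothetical callable taking the neighbor list and
    returning a number, returning None, or raising on failure.
    """
    if local_model is not None:
        try:
            value = local_model(neighbors)
            if value is not None:
                return value, "local model"
        except Exception:
            pass  # modelling or prediction failed, fall through
    return weighted_average(neighbors), "weighted average"
```

The second return value records which method produced the prediction, which is useful when comparing algorithms in a validation run.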

Validation

Data requirements

Net cell association endpoint of the Protein corona dataset (121 gold and silver particles)

10-fold cross-validations

Descriptors  Algorithm  r2                     RMSE
Physchem     WA         0.42, 0.46, 0.48       2.02, 1.94, 1.92
Physchem     PLS        0.53, 0.54, 0.49       1.83, 1.8, 1.9
Physchem     RF         0.53, 0.52, 0.54       1.82, 1.84, 1.79
Proteomics   WA         0.66, 0.63, 0.63 *     1.58, 1.62, 1.66 *
Proteomics   PLS        0.59, 0.66, 0.63 *     1.74, 1.56, 1.65 *
Proteomics   RF         0.66, 0.65, 0.63 *     1.56, 1.59, 1.64 *
All          WA         0.73, 0.66, 0.66 *     1.41, 1.57, 1.58 *
All          PLS        0.67, 0.64, 0.69 *     1.53, 1.63, 1.5 *
All          RF         0.69, 0.69, 0.7 **     1.51, 1.5, 1.46 **
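How r2 and RMSE values like those in the table can be obtained from a 10-fold cross-validation is sketched below. The sketch assumes that r2 is the squared Pearson correlation between measured and predicted values; `predict` is a hypothetical callable that builds a model from the training indices and returns a prediction for one held-out substance.

```python
import math
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and split them into k disjoint folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def rmse(measured, predicted):
    """Root mean squared error of the predictions."""
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(measured, predicted):
    """Squared Pearson correlation (assumed definition of r2)."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_m == 0 or var_p == 0:
        return 0.0
    return cov * cov / (var_m * var_p)


def cross_validate(measured, predict, k=10, seed=0):
    """Predict every substance once from a model that never saw it.

    predict(train_indices, test_index) is a hypothetical callable.
    Returns (r2, RMSE) over all held-out predictions.
    """
    predictions = [None] * len(measured)
    for fold in k_fold_indices(len(measured), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(measured)) if i not in held_out]
        for i in fold:
            predictions[i] = predict(train, i)
    return r_squared(measured, predictions), rmse(measured, predictions)
```

Changing `seed` gives a different random split, so repeated runs generally yield slightly different r2 and RMSE values.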

Gold and silver particles included!

Correlation plot

Correlation of log2 transformed net cell association measurements with random forest predictions using physchem properties and protein corona data.


Exercises

Try the nano-lazar versions at

Old (stable) version (physchem only)
https://nano-lazar.in-silico.ch
Next release
https://nano-lazar-dev.in-silico.ch/predict

Questions