verena additions

author: Christoph Helma <helma@in-silico.ch> 2020-11-04 21:03:13 +0100
committer: Christoph Helma <helma@in-silico.ch> 2020-11-04 21:03:13 +0100
commit: 2936efd649e6494220b7474f8e79761d5fa84136 (patch)
tree: 6286c73ca129d83b73aa68871285365e398a0872 /mutagenicity.md
parent: 3f94e07f7d81da41e3fdb745603997deef47610b (diff)
1 files changed, 29 insertions, 28 deletions
diff --git a/mutagenicity.md b/mutagenicity.md
index 896e088..4d0a602 100644
--- a/mutagenicity.md
+++ b/mutagenicity.md
@@ -7,7 +7,7 @@ author:
       email: helma@in-silico.ch
       correspondence: "yes"
   - Verena Schöning:
-      institute: zeller
+      institute: insel
   - Philipp Boss:
       institute: sysbio
   - Jürgen Drewe:
@@ -23,6 +23,9 @@ institute:
   - sysbio:
       name: Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association
       address: "Robert-Rössle-Strasse 10, Berlin, 13125, Germany"
+  - insel:
+      name: Clinical Pharmacology and Toxicology, Department of General Internal Medicine, Bern University Hospital, University of Bern
+      address: "Inselspital, 3010 Bern, Switzerland"
 
 bibliography: bibliography.bib
 keywords: mutagenicity, QSAR, lazar, random forest, support vector machine, linear regression, neural nets, deep learning
@@ -163,40 +166,40 @@ under a GPL3 License. The new combined dataset can be found at
 
 The testing dataset consisted of 602 different PAs.
 
-**TODO**: **Verena** Can you briefly summarize the sources and selection criteria?
-
-The compilation of the PA dataset is described in detail in [Schöning et al.
-(2017)](#_ENREF_119).
-
-<!---
-The PAs were assigned to groups according to
-structural features of the necine base and necic acid.
-
-For the necine base, following groups were assigned:
+The PA dataset was created from five independent necine base substructure
+searches in PubChem (https://pubchem.ncbi.nlm.nih.gov/) and compared to the PAs
+listed in the EFSA publication @EFSA2011 and the book by Mattocks
+@Mattocks1986 to ensure that all major PAs were included. PAs mentioned in
+these publications which were not found in the downloaded substances were
+searched individually in PubChem and, if available, downloaded separately.
+Non-PA substances, duplicates, and isomers were removed from the files, but
+artificial PAs, even if unlikely to occur in nature, were kept. The resulting
+PA dataset comprised a total of 602 different PAs.
 
--   Retronecine-type (1,2-unstaturated necine base)
+The PAs in the dataset were classified according to structural features. A
+total of 9 different structural features were assigned to the necine base,
+modifications of the necine base and to the necic acid:
 
--   Otonecine-type (1,2-unstaturated necine base)
+For the necine base, the following structural features were chosen:
 
--   Platynecine-type (1,2-saturated necine base)
+  - Retronecine-type (1,2-unsaturated necine base)
+  - Otonecine-type (1,2-unsaturated necine base)
+  - Platynecine-type (1,2-saturated necine base)
 
-For the modification of necine base, following groups were assigned:
+For the modifications of the necine base, the following structural features were chosen:
 
--   *N*-oxide-type
+  - N-oxide-type
+  - Tertiary-type (PAs which were neither from the N-oxide- nor DHP-type)
+  - DHP-type (pyrrolic ester)
 
--   Tertiary-type (PAs which were neither from the *N*-oxide- nor
-    > DHP-type)
+For the necic acid, the following structural features were chosen:
 
--   DHP-type (dehydropyrrolizidine, pyrrolic ester)
+  - Monoester-type
+  - Open-ring diester-type
+  - Macrocyclic diester-type
 
-For the necic acid, following groups were assigned:
+The compilation of the PA dataset is described in detail in @Schoening2017.
 
--   Monoester-type
-
--   Open-ring diester-type
-
--   Macrocyclic diester-type
---->
 Descriptors
 -----------
 
@@ -405,8 +408,6 @@ repeated five times. Based on each of the five different training data,
 the predictive models were trained and the performance tested with the
 validation data. This step was repeated 10 times. 
 
-**TODO**: **Verena** Can you please check whether this is still correct and, if necessary, adjust Figure 1?
-
 ![Flowchart of the generation and validation of the models generated in R-project](figures/image1.png){#fig:valid}
 
 <!--