add readmes

author: mguetlein <martin.guetlein@gmail.com> 2014-03-18 13:03:17 +0100
committer: mguetlein <martin.guetlein@gmail.com> 2014-03-18 13:03:17 +0100
commit: b8deb1aacd26eeafd57a075290ba0451f24dae05 (patch)
tree: 7ad18a1e02b252462a7b056406929724a9414e78
parent: ffeef2d78bff7ec5d4bf406fc70de61375c66f31 (diff)
5 files changed, 128 insertions, 2 deletions
diff --git a/README.md b/README.md
index 7005b22..02838ab 100644
--- a/README.md
+++ b/README.md
@@ -6,4 +6,36 @@ OpenTox Validation
 [API documentation](http://rdoc.info/github/opentox/validation)
 --------------------------------------------------------------
 
-Copyright (c) 2009-2012 Martin Guetlein, Christoph Helma. See LICENSE for details.
+General:
+--------
+
+* validation and reporting are separated in code (see below)
+* see validation/README and report/README for more general info
+
+Source Directories:
+-------------------
+
+* **validation** all validation stuff excluding the reports (should not access code in *report*)
+* **report** reporting stuff (should not access code in *validation*)
+* **lib** helper classes used by validation and by report 
+* **test** test examples and use-cases (in addition to those in the test repository)
+
+Non-Source Directories:
+-----------------------
+
+* **data** data files for validation
+* **docbook-xml-4.5** for converting xml reports into html
+* **docbook-xsl-1.76.1** for converting xml reports into html
+* **RankPlotter** for creating rank-plots in compare-algorithm reports
+* **reports** reports are stored in this folder
+* **resources** icons and stylesheet
+
+Glossary / Wording:
+-------------------
+
+* **accept_values** domain or possible class-values for classification (e.g. 'active','inactive')
+* **prediction_feature** endpoint feature that is predicted (exists once in cross-validation)
+* **predicted_variable** feature for predictions of a model (exists 10 times in 10-fold cross-validation)
+* **predicted_confidence** feature for predicted-confidence of a model (exists 10 times in 10-fold cross-validation)
+
+Copyright (c) 2009-2012 Martin Guetlein, Christoph Helma. See LICENSE for details.
+\ No newline at end of file
diff --git a/TODO.md b/TODO.md
new file mode 100644
index 0000000..c1f7e92
--- /dev/null
+++ b/TODO.md
@@ -0,0 +1,18 @@
+TODOs for validation service
+============================
+
+author: Martin Guetlein, date: 2014-03-18
+
+Refactoring
+-----------
+
+* remove redis, replace with 4store
+* remove gnuplot, replace with R
+* to_json support
+
+Pitfalls
+--------
+
+* for better performance, datasets are cached in memory (see lib/dataset_cache.rb); this might cause memory issues when working with large datasets
+* validation objects store predictions in the dataset (for better performance, to avoid reading all datasets and models again); this can cause redis to get pretty large
+* redis loads everything into main memory, which can cause memory problems in the long run
+\ No newline at end of file
diff --git a/lib/README.md b/lib/README.md
new file mode 100644
index 0000000..76777ca
--- /dev/null
+++ b/lib/README.md
@@ -0,0 +1,17 @@
+Directory: lib - OpenTox Validation
+=======================================
+
+author: Martin Guetlein, date: 2014-03-18
+
+Code
+----
+
+* **dataset_cache.rb** stores datasets in memory
+* **format_util.rb** util class for formatting to rdf / yaml
+* **merge.rb** general merge class, to merge e.g. numeric arrays, computes mean and variance
+* **ohm_util.rb** utils for redis (ohm is gem for redis)
+* **predictions.rb** prediction statistics for classification and regression
+* **ot_predictions.rb** extends predictions.rb, mainly by storing predicted compounds
+* **prediction_data.rb** compounds and input data for predictions.rb, can be filtered
+* **test_util.rb** util for debugging and testing
+* **validation_db.rb** validation and crossvalidation redis-objects, validation-statistic fields
diff --git a/report/README.md b/report/README.md
new file mode 100644
index 0000000..2fd9507
--- /dev/null
+++ b/report/README.md
@@ -0,0 +1,36 @@
+Directory: report - OpenTox Validation
+=======================================
+
+author: Martin Guetlein, date: 2014-03-18
+
+General
+-------
+
+* reports are not an 'html-view' of validations
+* instead, reports are separate objects that are created for validations and stored separately
+* report types: 
+ * **validation** for a single validation
+ * **crossvalidation** for a cross-validation
+ * **algorithm-comparison** compares cross-validations of different algorithms on >=1 datasets
+ * **method-comparison** compares arbitrary single validations
+* reports are stored as docbook-xml files with additional plotfiles
+* **IMPORTANT** reports have their own representation of validations (see validation_data.rb, not the objects in validation/validation_db.rb)
+
+Code
+----
+
+* **environment.rb** requires all gems/files, inits r-util
+* **report_application.rb** REST call handling
+* **report_service.rb** provides/deletes reports
+* **report_factory.rb** creates various report types
+* **report_content.rb** fills report content, wraps xml-report + plot-files
+* **xml_report.rb** xml-object, this is the actual report content
+* **xml_report_util.rb** utils for xml report
+* **plot_factory.rb** creates plots
+* **report_format.rb** formats reports (to html/pdf)
+* **report_persistance.rb** handles storing of reports (stored as file and in redis)
+* **report_test.rb** debugging and testing stuff
+* **statistical_test.rb** applies t-test to check for significantly different performance
+* **util.rb** various utils 
+* **validation_access.rb** how validations are accessed in reports
+* **validation_data.rb** how validations are represented in reports
diff --git a/validation/README.md b/validation/README.md
index 9daeafb..b535872 100644
--- a/validation/README.md
+++ b/validation/README.md
@@ -1 +1,24 @@
-test
+Directory: validation - OpenTox Validation
+=======================================
+
+author: Martin Guetlein, date: 2014-03-18
+
+General
+-------
+
+* a *validation* object represents a test-set validation, i.e. compounds in the test-set are predicted and prediction quality is measured
+* different types of *validation*s:
+ * **test-set-validation** input: model and test-set, no algorithm, no training-set
+ * **training-test-validation** input: algorithm, test-set, training-set
+ * **training-test-split** input: algorithm, dataset, split-ratio
+ * **bootstrapping** input: algorithm, dataset
+* k-fold *crossvalidation* creates k *validation* objects and an additional *validation* object that stores the aggregated statistics
+
+Code
+----
+
+* **validation_application.rb** REST call handling
+* **validation_format.rb** to_rdf and to_yaml stuff
+* **validation_service.rb** does the actual validation work (e.g. model building)
+* **validation_test.rb** test-routines for debugging
+
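
The validation README above says that a k-fold *crossvalidation* creates k *validation* objects plus one additional object holding the aggregated statistics. A minimal Ruby sketch of that idea follows. It is not the service's code: `FoldValidation`, `fold_indices`, and `aggregate` are hypothetical names, and the round-robin fold assignment and the population variance are assumptions, since the READMEs do not say how folds are drawn or which statistics are aggregated.

```ruby
# Hypothetical stand-in for a per-fold validation object (not the class
# defined in lib/validation_db.rb).
FoldValidation = Struct.new(:fold, :accuracy)

# Assigns item indices 0...n_items to k folds in round-robin order.
# Assumption: the real service may split differently (e.g. stratified).
def fold_indices(n_items, k)
  folds = Array.new(k) { [] }
  n_items.times { |i| folds[i % k] << i }
  folds
end

# Builds the "additional" aggregated record from k per-fold validations:
# mean and population variance of the per-fold accuracy.
def aggregate(validations)
  values = validations.map(&:accuracy)
  mean = values.sum / values.size.to_f
  variance = values.map { |v| (v - mean)**2 }.sum / values.size
  { folds: values.size, mean_accuracy: mean, variance: variance }
end

# Usage with made-up accuracies for a 3-fold run.
validations = [0.8, 0.6, 0.7].each_with_index.map do |acc, i|
  FoldValidation.new(i + 1, acc)
end
puts fold_indices(10, 3).inspect
puts aggregate(validations).inspect
```

Keeping the per-fold objects next to the aggregate mirrors the README's description: individual folds remain inspectable, while reports can read a single summary record.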