author | mguetlein <martin.guetlein@gmail.com> | 2014-03-18 13:03:17 +0100 |
---|---|---|
committer | mguetlein <martin.guetlein@gmail.com> | 2014-03-18 13:03:17 +0100 |
commit | b8deb1aacd26eeafd57a075290ba0451f24dae05 (patch) | |
tree | 7ad18a1e02b252462a7b056406929724a9414e78 | |
parent | ffeef2d78bff7ec5d4bf406fc70de61375c66f31 (diff) |
add readmes
-rw-r--r-- | README.md | 34 |
-rw-r--r-- | TODO.md | 18 |
-rw-r--r-- | lib/README.md | 17 |
-rw-r--r-- | report/README.md | 36 |
-rw-r--r-- | validation/README.md | 25 |
5 files changed, 128 insertions, 2 deletions
@@ -6,4 +6,36 @@ OpenTox Validation
 [API documentation](http://rdoc.info/github/opentox/validation)
 --------------------------------------------------------------
 
-Copyright (c) 2009-2012 Martin Guetlein, Christoph Helma. See LICENSE for details.
+General:
+--------
+
+* validation and reporting are separated in code (see below)
+* see validation/README and report/README for more general info
+
+Source Directories:
+-------------------
+
+* **validation** all validation stuff excluding the reports (should not access code in *report*)
+* **report** reporting stuff (should not access code in *validation*)
+* **lib** helper classes used by validation and by report
+* **test** test examples and use-cases (in addition to those in the test repository)
+
+Non-Source Directories:
+-----------------------
+
+* **data** data files for validation
+* **docbook-xml-4.5** for converting xml reports into html
+* **docbook-xsl-1.76.1** for converting xml reports into html
+* **RankPlotter** for creating rank-plots in compare-algorithm reports
+* **reports** reports are stored in this folder
+* **resources** icons and stylesheet
+
+Glossary / Wording:
+-------------------
+
+* **accept_values** domain or possible class-values for classification (e.g. 'active','inactive')
+* **prediction_feature** endpoint feature that is predicted (exists once in cross-validation)
+* **predicted_variable** feature for predictions of a model (exists 10 times in 10-fold cross-validation)
+* **predicted_confidence** feature for predicted-confidence of a model (exists 10 times in 10-fold cross-validation)
+
+Copyright (c) 2009-2012 Martin Guetlein, Christoph Helma. See LICENSE for details.
\ No newline at end of file
@@ -0,0 +1,18 @@
+TODOs for validation service
+============================
+
+author: Martin Guetlein, date: 2014-03-18
+
+Refactoring
+-----------
+
+* remove redis, replace with 4store
+* remove gnuplot, replace with R
+* to_json support
+
+Pitfalls
+--------
+
+* for better performance datasets are cached in memory (see lib/dataset_cache.rb), this might cause memory issues when working with large datasets
+* validation objects store predictions in dataset (for better performance, to not read all datasets and models again), this can cause redis to get pretty large
+* redis loads everything into main memory, which can cause memory problems in the long run
\ No newline at end of file
diff --git a/lib/README.md b/lib/README.md
new file mode 100644
index 0000000..76777ca
--- /dev/null
+++ b/lib/README.md
@@ -0,0 +1,17 @@
+Directory: lib - OpenTox Validation
+=======================================
+
+author: Martin Guetlein, date: 2014-03-18
+
+Code
+----
+
+* **dataset_cache.rb** stores datasets in memory
+* **format_util.rb** util class for formatting to rdf / yaml
+* **merge.rb** general merge class, to merge e.g. numeric arrays, computes mean and variance
+* **ohm_util.rb** utils for redis (ohm is a gem for redis)
+* **predictions.rb** prediction statistics for classification and regression
+* **ot_predictions.rb** extends predictions.rb, mainly by storing predicted compounds
+* **prediction_data.rb** compounds and input data for predictions.rb, can be filtered
+* **test_util.rb** util for debugging and testing
+* **validation_db.rb** validation and crossvalidation redis-objects, validation-statistic fields
diff --git a/report/README.md b/report/README.md
new file mode 100644
index 0000000..2fd9507
--- /dev/null
+++ b/report/README.md
@@ -0,0 +1,36 @@
+Directory: report - OpenTox Validation
+=======================================
+
+author: Martin Guetlein, date: 2014-03-18
+
+General
+-------
+
+* reports are not an 'html-view' of validations
+* instead, reports are objects of their own, created for validations and stored separately
+* report types:
+  * **validation** for a single validation
+  * **crossvalidation** for a cross-validation
+  * **algorithm-comparison** compares cross-validations of different algorithms on >=1 datasets
+  * **method-comparison** compares arbitrary single validations
+* reports are stored as docbook-xml files with additional plotfiles
+* **IMPORTANT** reports have their own representation of validations (see validation_data.rb, not the objects in validation/validation_db.rb)
+
+Code
+----
+
+* **environment.rb** requires all gems/files, inits r-util
+* **report_application.rb** REST call handling
+* **report_service.rb** provides/deletes reports
+* **report_factory.rb** creates various report types
+* **report_content.rb** fills report content, wraps xml-report + plot-files
+* **xml_report.rb** xml-object, this is the actual report content
+* **xml_report_util.rb** utils for xml report
+* **plot_factory.rb** creates plots
+* **report_format.rb** formats reports (to html/pdf)
+* **report_persistance.rb** handles storing of reports (stored as file and in redis)
+* **report_test.rb** debugging and testing stuff
+* **statistical_test.rb** applies t-test for significantly different performance
+* **util.rb** various utils
+* **validation_access.rb** how validations are accessed in reports
+* **validation_data.rb** how validations are represented in reports
diff --git a/validation/README.md b/validation/README.md
index 9daeafb..b535872 100644
--- a/validation/README.md
+++ b/validation/README.md
@@ -1 +1,24 @@
-test
+Directory: validation - OpenTox Validation
+=======================================
+
+author: Martin Guetlein, date: 2014-03-18
+
+General
+-------
+
+* a *validation* object represents a test-set validation, i.e. compounds in the test-set are predicted and prediction quality is measured
+* different types of *validation*s:
+  * **test-set-validation** input: model and test-set, no algorithm, no training-set
+  * **training-test-validation** input: algorithm, test-set, training-set
+  * **training-test-split** input: algorithm, dataset, split-ratio
+  * **bootstrapping** input: algorithm, dataset
+* k-fold *crossvalidation* creates k *validation* objects and an additional *validation* object that stores the aggregated statistics
+
+Code
+----
+
+* **validation_application.rb** REST call handling
+* **validation_format.rb** to_rdf and to_yaml stuff
+* **validation_service.rb** does the actual validation work (e.g. model building)
+* **validation_test.rb** test-routines for debugging
+
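
Note on the validation README: it states that a k-fold crossvalidation creates k validation objects plus one additional validation object holding the aggregated statistics. A minimal Ruby sketch of that structure follows. It is not the service's code: `Validation`, `crossvalidate`, the round-robin fold assignment, and the toy majority-class "model" are all hypothetical names and choices made for illustration. The real objects live in lib/validation_db.rb and are not shown in this commit.

```ruby
# Illustrative sketch only. Validation and crossvalidate are hypothetical
# names, not the redis-backed classes from lib/validation_db.rb.
Validation = Struct.new(:fold, :num_instances, :num_correct) do
  def accuracy
    num_correct.to_f / num_instances
  end
end

# data: array of [input, actual_class] pairs.
# The block receives (training_set, input) and returns a predicted class.
# Returns [k per-fold validations, one aggregated validation].
def crossvalidate(data, k)
  # Assumption: simple round-robin assignment of rows to folds.
  folds = Array.new(k) { [] }
  data.each_with_index { |row, i| folds[i % k] << row }

  validations = folds.each_with_index.map do |test_set, i|
    # Training set = all rows that are not in the current test fold.
    training_set = folds.each_with_index.reject { |_, j| j == i }.flat_map(&:first)
    correct = test_set.count { |input, actual| yield(training_set, input) == actual }
    Validation.new(i + 1, test_set.size, correct)
  end

  # The additional object sums the statistics over all folds (fold is nil).
  aggregated = Validation.new(nil,
                              validations.sum(&:num_instances),
                              validations.sum(&:num_correct))
  [validations, aggregated]
end

# Toy data using the accept_values from the glossary ('active', 'inactive').
classes = %w[active active inactive active inactive active active active active]
data = classes.each_with_index.map { |c, i| [i, c] }

validations, aggregated = crossvalidate(data, 3) do |training_set, _input|
  # Toy "model": always predict the most frequent class of the training set.
  training_set.map(&:last).group_by(&:itself).max_by { |_, rows| rows.size }.first
end

puts "folds: #{validations.size}, aggregated accuracy: #{aggregated.accuracy.round(3)}"
```

The sketch keeps the per-fold objects and the aggregate as the same type, mirroring the README's wording that the aggregate is itself a *validation* object.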