Commit 356aac0c authored by Carlos H. Brandt

Add logfiles section to data-store, update workflow figure

parent 497aa9c3
# Data Store
###### ToC
* [Data and metadata](#data-and-metadata)
* [Structure](#structure)
In this document we discuss the structure of the data archive and metadata database,
which is necessary not only to keep queries and data access efficient but also to record
the full _history_ each data product has gone through since retrieval from its original source.
......@@ -12,6 +16,24 @@ Service _products_ are kept for the long term either for reuse by our services
-- _e.g._, mosaic products used in landing-site analysis --
or for direct access and download.
## Data and metadata
_Data_ (products) are stored in an object store or filesystem -- here called the _archive_ --
while _metadata_ are stored in a database, with each product uniquely identified by its ID.
There are three types of information we want to keep in some form of registry
(on disk or database):
* Image data
* on-disk (.IMG, .CUB, .TIF, .JPG)
* Image metadata
* on-disk (.LBL)
* database
* Processing log
* on-disk (.log)
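
As a rough illustration of the three kinds of information listed above, the following sketch classifies a file by its extension. The extensions come from the list; the category names (`image_data`, `image_metadata`, `processing_log`) and the `classify` function are hypothetical and only serve the example.

```python
from pathlib import Path

# Extensions are taken from the list above; the category names are
# illustrative assumptions, not identifiers used by the service.
REGISTRY_KINDS = {
    "image_data": {".img", ".cub", ".tif", ".jpg"},
    "image_metadata": {".lbl"},
    "processing_log": {".log"},
}


def classify(path):
    """Return the kind of information a file holds, or None if unknown."""
    suffix = Path(path).suffix.lower()
    for kind, suffixes in REGISTRY_KINDS.items():
        if suffix in suffixes:
            return kind
    return None
```

Note that image metadata also lives in the database, which a filename-based check like this one cannot capture.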
### Archive
The data (files) are stored in a filesystem (or object store) according to their
......@@ -75,7 +97,7 @@ Such multiplicity of values -- for file formats, for instance -- demands the dat
-- from where such information is retrieved -- to reflect unequivocally such structure.
### DB
### Database
Databases considered to store (spatial) data: MongoDB or PostGIS
......@@ -142,16 +164,24 @@ products with 'productPath'", document `abc_99` will be retrieved but _not_ `abc
_records_ on SQL DBs.
## Data and metadata
### Logfiles
Log files are cumulative: for each data product, the respective _logfile_ (`file.log`)
starts when the data download is requested, and each processing step appends to it,
so that the data at each processing level (0, 1, 2, ...) is accompanied by a logfile containing
its full processing history since download.
> Every processing routine should follow a standard log-reporting, stating _inputs_
> and _outputs_ and any information (variables, functions) for reproducibility
> (software version, container hash).
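
A minimal sketch of what such cumulative log reporting could look like is given below. It assumes one JSON record per line; the entry layout and the helper names `append_log_entry` and `carry_log_forward` are hypothetical, chosen only to show inputs, outputs, software version and container hash being recorded at each step.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_log_entry(logfile, step, inputs, outputs,
                     software_version, container_hash):
    """Append one processing record to a product's logfile (assumed format)."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "step": step,
        "inputs": list(inputs),
        "outputs": list(outputs),
        "software_version": software_version,
        "container_hash": container_hash,
    }
    # Mode "a" keeps earlier records, which is what makes the log cumulative.
    with Path(logfile).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry


def carry_log_forward(previous_logfile, next_logfile):
    """Start the next processing level's logfile from the previous history."""
    history = Path(previous_logfile).read_text(encoding="utf-8")
    Path(next_logfile).write_text(history, encoding="utf-8")
```

Under these assumptions, a level-1 product would call `carry_log_forward` on the level-0 logfile and then append its own record, so each logfile holds the history since download.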
## Structure
_Data_ (products) are stored in an object store or filesystem -- a.k.a. the _archive_ --
while _metadata_ are stored in a database.
#### Store data and metadata
Store final products in the archive and insert the respective metadata into the DB.
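
The step above could be sketched as follows. This is only an illustration: it uses SQLite from the Python standard library as a stand-in for the candidate databases, and the table layout, the per-product directory layout and the `store_product` function are assumptions (only the product ID, the product path and the source URL are named in this document).

```python
import shutil
import sqlite3
from pathlib import Path


def store_product(product_file, product_id, source_url, archive_root, conn):
    """Copy a final product into the archive and record its metadata."""
    # Assumed layout: one directory per product ID under the archive root.
    dest_dir = Path(archive_root) / product_id
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / Path(product_file).name
    shutil.copy2(product_file, dest)

    # The connection context manager commits on success, rolls back on error.
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS products ("
            "product_id TEXT PRIMARY KEY, "
            "product_path TEXT NOT NULL, "
            "source_url TEXT)"
        )
        conn.execute(
            "INSERT OR REPLACE INTO products VALUES (?, ?, ?)",
            (product_id, str(dest), source_url),
        )
    return dest
```

Copying the file before inserting the row means a failed copy leaves no metadata record pointing at a missing product.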
### DB
- - -
### DB
Metadata (DB):
* product ID
......@@ -160,4 +190,3 @@ Metadata (DB):
* source URL
- - -
assets/workflow.jpg (image updated: 26.4 KB before, 29.4 KB after)