NEANIAS Gitlab

Commit a150f3d3 authored by Carlos H. Brandt

[wip] update data-store

parent 92bb6bfd
@@ -119,6 +119,7 @@ they are to the interfaces (visualization, processing tools) they are used on.
The db is modified every time a new data product is downloaded (_insert_ new entry in db)
or a data product is further processed to a higher-level (_update_ existing entry).
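
The two kinds of modification can be sketched as plain MongoDB-style documents. This is only a minimal sketch under assumptions: the field names (`product_id`, `product_type`, `path`) and both helper names are hypothetical and not taken from the project. The dictionaries it returns have the shape one would pass to pymongo's `Collection.insert_one` and `Collection.update_one`.

```python
def new_entry(product_id, product_type, path):
    """Document to insert when a data product is first downloaded."""
    return {"product_id": product_id, "product_type": product_type, "path": path}


def level_update(product_id, product_type, path):
    """(filter, update) pair for a product processed to a higher level."""
    # The filter selects the existing entry by its (immutable) product ID.
    query = {"product_id": product_id}
    # "$set" overwrites only the level-dependent fields.
    update = {"$set": {"product_type": product_type, "path": path}}
    return query, update
```

Keeping the filter on the product ID alone means an update can never create a second entry for the same product.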
##### MongoDB
Besides providing support for (WGS84) spatial index, MongoDB's flexibility regarding
@@ -178,15 +179,71 @@ its processing history -- since downloading.
## Structure
The database must reflect the structure of the data archive.
The archive must follow the structure set up in the environment manager.
The environment manager is discussed in more detail in the document
[Environment](00_environment.md), but essentially
-- for the matter of this discussion --
it sets the _base path_ for each data product _level_ for each _dataset_
(see [Data Source](01_data_source.md) for datasets in use).
> Collections are split by datasets.
### Archive
Where to read a data product from or where to write data to are the questions behind
structuring a data archive. It will probably reflect the underlying storage physical
structure, but, more importantly, it should reflect a logical/conceptual structure
related to the data itself.
For instance, we have been discussing data product levels:
in the archive, each product level is designated to an area in the file system tree.
Considering also that products relate uniquely to their datasets
-- _product type_<sup>++</sup>, instrument, mission, planet --
the archive structure should reflect this relationship.
Imprinting the relationship between the data products and some of their (immutable)
metadata on the archive structure is valid as it (_i_) naturally creates a hierarchy -- avoiding a flat, super-populated pool of data -- and (_ii_) guarantees that data product IDs/filenames will _not_ clash.
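
A minimal sketch of such a hierarchy, under stated assumptions: the base paths, the ordering planet/mission/instrument, the lower-casing, and the function name `product_dir` are all hypothetical. In the actual setup the base path for each level (and dataset) would come from the environment manager rather than from a hard-coded table.

```python
from pathlib import PurePosixPath

# Hypothetical base paths, one per product level (simplified: the text
# says they are set per level and per dataset by the environment manager).
BASE_PATHS = {
    "EDR": "/archive/edr",
    "RDR": "/archive/rdr",
}


def product_dir(level, planet, mission, instrument, product_id):
    """Directory of one data product under its level's base path."""
    base = PurePosixPath(BASE_PATHS[level])
    # Lower-case the dataset components so the tree is uniform.
    parts = [p.lower() for p in (planet, mission, instrument)]
    return base.joinpath(*parts, product_id)
```

Because every component of the path is immutable metadata, two products can only collide if they already share planet, mission, instrument, level, and product ID.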
- - -
### DB
Metadata (DB):
### Database
Metadata associated with data products can be split into two groups: immutable and mutable.
_Immutable_ information refers to data that will not change after a processing step
(_e.g._, product ID), whereas _mutable_ information changes according to the product
level (_e.g._, image path).
The immutable information is listed below:
* product ID
* footprint (geometry)
* local path
* instrument
* mission
* footprint (geometry:polygon)
* start-time
* stop-time
* source URL
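
As an illustration, a document carrying some of the listed fields could look like the sketch below. All values are placeholders, the snake_case field names are an assumption, and the footprint is written as a GeoJSON polygon, which is the geometry form MongoDB's `2dsphere` index works with.

```python
from datetime import datetime, timezone

# Illustrative values only; none of them come from a real product.
immutable_doc = {
    "product_id": "EXAMPLE_0001",
    "instrument": "EXAMPLE_INSTRUMENT",
    "mission": "EXAMPLE_MISSION",
    # GeoJSON polygon: one linear ring of [longitude, latitude] pairs,
    # closed by repeating the first position at the end.
    "footprint": {
        "type": "Polygon",
        "coordinates": [[
            [10.0, -5.0], [11.0, -5.0], [11.0, -4.0], [10.0, -4.0], [10.0, -5.0],
        ]],
    },
    "start_time": datetime(2020, 1, 1, 0, 0, 0, tzinfo=timezone.utc),
    "stop_time": datetime(2020, 1, 1, 0, 0, 30, tzinfo=timezone.utc),
    "source_url": "https://example.org/EXAMPLE_0001.IMG",
}

# Key specification for a spatial index on the footprint, in the form
# accepted by pymongo's Collection.create_index.
footprint_index = [("footprint", "2dsphere")]
```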
- - -
> Do we get all this info from ODE's query result or should we read LABEL?
The mutable information is listed below:
* product type<sup>++</sup>
* data isis path
* data tiff path
* data jpeg path
> Not clear to me whether this "mutable" (type/level-related) information should
> be:
>
> * in the same collection (together next to immutables)
> * or should it/they be in a separate collection(s) (specific for each level)
>
> I tend to prefer the second option as it clearly separates the `r/w` permission regimes.
> (Better to guarantee data/base consistency.) In that case -- _i.e._ having specific
> collections for each product level --, we need to keep track of the "latest level"
> (or "current product type") we have in our archive so that the _join_ between
> documents across the database collections is effectively done.
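
The second option can be sketched with in-memory stand-ins for the collections. This is an assumption-laden illustration, not a proposal for the actual schema: the `current_type` field, the collection layout, and the function name `latest_record` are hypothetical.

```python
# Hypothetical in-memory stand-ins for the collections.
products = {  # immutable metadata plus a pointer to the latest level
    "EXAMPLE_0001": {"mission": "EXAMPLE_MISSION", "current_type": "RDR"},
}
levels = {  # one "collection" per product type, keyed by product ID
    "EDR": {"EXAMPLE_0001": {"path": "/archive/edr/EXAMPLE_0001.IMG"}},
    "RDR": {"EXAMPLE_0001": {"path": "/archive/rdr/EXAMPLE_0001.cub"}},
}


def latest_record(product_id):
    """Merge the immutable document with the one for its latest level."""
    core = products[product_id]
    # "current_type" tells us which per-level collection to join against.
    level_doc = levels[core["current_type"]][product_id]
    return {"product_id": product_id, **core, **level_doc}
```

In MongoDB itself, a join of this kind would typically be expressed with the `$lookup` aggregation stage; the point here is only that the pointer to the latest level has to live somewhere for the join to be possible.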
<sup>++</sup> _product type_ is the way PDS refers to their data _level_ (EDR, RDR, RDRv11, etc.)