NEANIAS Gitlab

Commit 8aefed75 authored by Carlos H. Brandt
Update DMP

parent e1be6c14
# Data Management Plan
# NEANIAS Data
* Tickets:
* WP01:
......
......@@ -4,207 +4,197 @@
**_What is the purpose of the data collection/generation and its relation to the objectives of the project?_**
The Planetary data in NEANIAS is such as to support surface analysis over
planetary bodies of common interest.
Planetary bodies (or simply _bodies_ hereafter) of common interest are planets
and moons greatly targeted in past scientific missions and of potential
interest for future human exploration, for instance Mars and the Moon.
The Planetary group of NEANIAS, under the NEANIAS-Space working group, will provide imagery analysis
tools for planetary bodies in high demand -- Mars and the Moon -- hereafter called simply _bodies_. The goal is to scale up services we have developed in recent years to a larger audience,
comprising scientists, engineers, educators and their students.
Large coverage of the bodies is important because the exploration of large datasets
is fundamental to NEANIAS as an EOSC infrastructure project, where a
basic goal is to provide data analysis at scales for which individual users
mostly lack the resources.
To provide the services for _image mosaicking_, _map making_, _landing sites_ evaluation, or analysis of the surface _composition_, we will handle data sets composed mainly of images and their metadata, as well as vector databases.
Our services will provide access to data archives, giving the
user a high-level interface for their exploration and filtering, as well as
automation of common processes in geosciences that typically require
area-specific technical knowledge.
Planetary images at different wavelengths, resolutions and observation angles provide complementary information that can be combined into higher-level products. We will use data already _reduced_ -- photometrically and spatially calibrated images -- from different wavelengths, as well as _digital terrain models_. Vector data sets are used to annotate images, either as regions or specific locations.
_Geolocation_ is the primary feature of our data. Database systems capable of managing geolocated data -- either images or tables -- are responsible for handling their storage as well as access.
**_What types and formats of data will the project generate/collect?_**
In the following sections, we provide the list of data sets to be handled, followed by their description,
and the interface to users and internal system components. As the project evolves, the content of this document will evolve accordingly to accommodate the details of our data software system.
Our services will at some point handle a combination of raster and vector data
to provide surface imaging enriched with attributes, either produced by the project's data
processing tasks or merged from source data archives.
Raster data comprises typical images -- _i.e._, two-dimensional data arrays -- as
well as data-cubes -- multi-dimensional images whose third dimension holds
finely resolved spectra or discrete spectral indexes.
Vector data are asymmetric multi-dimensional tabular data, typically encoded in
text format, where one or more columns provide geometric features like polygons,
lines, or points.
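
As a minimal sketch of the vector layout described above, the snippet below reads a small text table in which one column carries a point geometry in Well-Known Text (WKT). The table content, the column names `name` and `geometry`, and the helper functions are assumptions made for illustration; they are not taken from the NEANIAS data sets.

```python
import csv
import io

# Hypothetical sample table: one attribute column and one WKT geometry column.
SAMPLE = """name,geometry
site_a,POINT (137.4 -4.6)
site_b,POINT (77.5 18.4)
"""


def parse_wkt_point(wkt):
    """Parse a 2D WKT point such as 'POINT (137.4 -4.6)' into (x, y)."""
    prefix, _, coords = wkt.partition("(")
    if prefix.strip().upper() != "POINT":
        raise ValueError(f"not a WKT point: {wkt!r}")
    x, y = coords.rstrip(") ").split()
    return float(x), float(y)


def read_points(text):
    """Map each row name to its (x, y) coordinates."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["name"]: parse_wkt_point(row["geometry"]) for row in reader}


points = read_points(SAMPLE)
```

Polygons and lines would follow the same pattern with a longer coordinate list; in practice a GIS library would handle the parsing.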
* **TODO**: data formats:
### Datasets
* VICAR
Sources of our data are mostly PDS ( [ref_pds] ) data archives:
* dataset: [MEX HRSC DTM and Nadir imagery](/datasets/MEX/HRSC/README.md)
* archive: PDS3
* format: PDS IMAGE (array)
* source: ESA, NASA
* volume: 5 TB 
* dataset: [MRO CRISM](/datasets/MRO/CRISM/README.md)
* archive: PDS3
* format: PDS IMAGE (cube)
* source: NASA
* volume: 20 TB
* dataset: [MRO HiRISE](/datasets/MRO/HiRISE/README.md)
* archive: PDS3
* format: PDS IMAGE (array)
* source: NASA
* volume: 20 TB
* dataset: [MRO CTX](/datasets/MRO/CTX/README.md)
* archive: PDS3
* format: PDS IMAGE (array)
* source: NASA
* volume: 10 TB 
* dataset: [LRO LROC](/datasets/LRO/LROC/README.md)
* archive: PDS3
* format: PDS IMAGE (array)
* source: NASA
* volume: 10 TB 
* dataset: [MRO MOLA + KAGUYAw](/datasets/MRO/MOLA/README.md)
* archive: PDS3
* format: PDS IMAGE (array)
* source: NASA
* volume: 2 TB
**_Will you re-use any existing data and how?_**
Original data will be retrieved from NASA and ESA archives.
Data sources:
* dataset: [ESA MEX HRSC Nadir imagery](/datasets/MEX/HRSC/README.md)
* format: Geotiff
* source: ESA PSA
* volume: 2 TB 
* dataset: [NASA MRO CRISM](/datasets/MRO/CRISM/README.md)
* format: PDS/ISIS cube/geotiff
* source: NASA PDS
* volume: 20 TB
* dataset: [NASA MRO HIRISE](/datasets/MRO/HiRISE/README.md)
* format: PDS/ISIS cube/geotiff
* source: NASA PDS
* volume: 20 TB
* dataset: ESA MEX HRSC DTM
* format: PDS/ISIS cube/geotiff
* source: ESA MEX (PSA)
* volume: 2 TB 
* dataset: NASA MRO CTX
* format: PDS/ISIS cube/geotiff
* source: NASA MRO
* volume: 10 TB 
* dataset: NASA LRO LROC
* format: PDS/ISIS cube/geotiff
* source: NASA PDS
* volume: 10 TB 
* dataset: NASA MRO LOLA + KAGUYAw
* format: PDS/ISIS cube/geotiff
* source: NASA PDS / JAXA
* volume: 2 TB
#### TBD:
* dataset: PLANMAP maps
* archive: PLANMAP
* format: GeoTIFF/Geopackage
* source: Planmap consortium
* volume: 5 TB
* source: PLANMAP consortium
* volume: 5 TB
* dataset: Mars Cave Database (https://www.usgs.gov/center-news/caves-mars, https://www.sciencebase.gov/catalog/item/5bd36eb1e4b0b3fc5ce51783)
* format: Shapefile
* source: NASA USGS
* volume: 2 MB
* dataset: Mars Global Digital Dune Database: MC–1 (https://pubs.usgs.gov/of/2010/1170/)
* format: GeoTIFF/Shapefile
* source: NASA USGS
* volume: 2 GB
* dataset: Mars Global Digital Dune Database: MC2–MC29 (https://pubs.usgs.gov/of/2007/1158/)
* format: GeoTIFF/Shapefile
* source: NASA USGS
* volume: 2 GB
* dataset: Mars Global Digital Dune Database: MC–30 (https://pubs.usgs.gov/of/2012/1259/)
* format: GeoTIFF/Shapefile
* source: NASA USGS
* volume: 3 GB
Total amount of original data is around 80TB.
**_To whom might it be useful ('data utility')?_**
Scientists, Engineers.
Total amount of source data is around 80TB.
### FAIR Data
Key components, and observations, towards a better use of the services and data provided:
* Findable: Publish data on indexes like Virtual Observatories (IVOA) and the OGC Catalogue service, as well as a clear _Data Access_ HTML interface under the NEANIAS web portal (well indexed by search engines, _e.g._, Google);
* Accessible: Use standard data access protocols -- IVOA/EPN-TAP([R4][]) and OGC services([R1][],[R2][],[R3][]) --, and well known data and metadata file formats. The technical barrier to read the data from a file or a database should be reasonably low.
* Interoperable: The use of standard, well known data formats, models and access APIs allows our data to exploit a virtually infinite set of resources -- software and documentation -- available worldwide. Data is provided in open file formats -- GeoTIFF, GeoPackage, PDS IMAGE --, and access interfaces through open standards -- OGC, IVOA, REST.
* Reproducible: Data products -- produced by our systems -- should be linked to their respective processing _logfile_, either as a separate file or merged as a _header_ section.
[R1]: https://www.ogc.org/standards/cat
[R2]: https://docs.geoserver.org/latest/en/user/services/csw/index.html
[R3]: https://docs.geoserver.org/latest/en/user/data/cascaded/index.html
[R4]: https://www.ivoa.net
Planetary data will be available through OGC standards WCS, WMS, WFS, and the Catalogue Services for the Web ([R1][],[R2][]). We may also use the Cascade service provided by GeoServer ([R3][]) to proxy external services -- for instance, Rasdaman.
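
To illustrate what access through an OGC service looks like from the client side, the sketch below assembles a WMS 1.3.0 `GetMap` request URL using only the Python standard library. The endpoint `https://example.org/geoserver/wms`, the layer name, and the `CRS:84` value are placeholders assumed for this example; an actual deployment would publish its own endpoint and layers, and planetary layers would normally use a body-specific CRS.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only.
WMS_ENDPOINT = "https://example.org/geoserver/wms"


def build_getmap_url(layer, bbox, width=512, height=512):
    """Return a WMS 1.3.0 GetMap URL for `layer` over `bbox`.

    `bbox` is (min_x, min_y, max_x, max_y) in the axis order of the CRS.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the default style
        "CRS": "CRS:84",  # placeholder; a planetary CRS would be used in practice
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return WMS_ENDPOINT + "?" + urlencode(params)


# Hypothetical layer name.
url = build_getmap_url("mars:hrsc_nadir", (0, -10, 20, 10))
```

Any HTTP client, or a GIS application such as QGIS, can then fetch the image at that URL.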
The IVOA registry is an important resource for data discovery; publishing our data through EPN-TAP -- specifically designed for Planetary data -- should effectively connect our data to the astronomical community.
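
As a rough sketch of how a client could query an EPN-TAP service, the snippet below builds a synchronous TAP request carrying an ADQL query against an `epn_core` table. The base URL and the schema name `neanias_planetary` are hypothetical; the columns `granule_uid`, `target_name` and `dataproduct_type` are standard EPN-TAP parameters, but the actual table content depends on what the service publishes.

```python
from urllib.parse import urlencode

# Hypothetical TAP base URL and schema name; real values depend on deployment.
TAP_BASE_URL = "https://example.org/tap"
SCHEMA = "neanias_planetary"


def build_epn_tap_query_url(target_name, max_rows=10):
    """Return a TAP /sync URL selecting granules for one target body."""
    safe_target = target_name.replace("'", "''")  # escape quotes for ADQL
    adql = (
        f"SELECT TOP {int(max_rows)} granule_uid, target_name, dataproduct_type "
        f"FROM {SCHEMA}.epn_core "
        f"WHERE target_name = '{safe_target}'"
    )
    params = {
        "REQUEST": "doQuery",
        "LANG": "ADQL",
        "FORMAT": "votable",
        "QUERY": adql,
    }
    return f"{TAP_BASE_URL}/sync?" + urlencode(params)


tap_url = build_epn_tap_query_url("Mars")
```

In practice, a VO client library or the QGIS plugin would issue such queries on the user's behalf.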
**TBD: _Data products/sets naming conventions to follow_**
## FAIR DATA
Making data findable, including provisions for metadata
**_Are the data produced and/or used in the project discoverable with metadata, identifiable and locatable by means of a standard identification mechanism (e.g. persistent and unique identifiers such as Digital Object Identifiers)?_**
Planetary data will be available through OGC standards WCS, WMS, WFS, and the
Catalogue Services for the Web[@ref1;@ref2], where we may also use the
Cascade service provided by GeoServer[@ref3] to proxy external services (for
instance, Rasdaman) and provide a single point of access.
In synergy with astronomical data, the IVOA registry, through EPN-TAP, offers a simple,
straightforward interface that we should implement to make planetary data
findable from both sides: for astrophysicists, used to the IVOA framework, as well
as geoscientists, long used to OGC standards.
[@ref1]: https://www.ogc.org/standards/cat
[@ref2]: https://docs.geoserver.org/latest/en/user/services/csw/index.html
[@ref3]: https://docs.geoserver.org/latest/en/user/data/cascaded/index.html
**TBD: _Will search keywords be provided that optimize possibilities for re-use?_**
**_What naming conventions do you follow?_**
Dataset name conventions should simply replicate those from original data
sources whenever original data or direct products from original data are
provided. In case of direct products (_e.g._, recalibrated data), we must
append an expression -- `-NEANIAS` -- to the dataset name as well as any
metadata element (_e.g._, file header) that composes it.
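
The naming rule above can be sketched as a small helper. The function name, the `is_derived` flag and the placeholder dataset identifier are assumptions made for illustration; only the `-NEANIAS` suffix comes from the convention itself.

```python
NEANIAS_SUFFIX = "-NEANIAS"


def derived_dataset_name(original_name, is_derived):
    """Keep the source name for original data; append the suffix once for derived products."""
    if not is_derived or original_name.endswith(NEANIAS_SUFFIX):
        return original_name
    return original_name + NEANIAS_SUFFIX


# Placeholder identifier, not a real dataset name.
original = derived_dataset_name("SOURCE_DATASET_ID", is_derived=False)
recalibrated = derived_dataset_name("SOURCE_DATASET_ID", is_derived=True)
```

The same suffixed name would then be written into any metadata element, such as a file header, that carries the dataset name.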
### Version numbers
**TBD**: naming convention for derived, high-level products.
NEANIAS Planetary services will provide dynamic, on-demand data products; as such, no versioning schema is foreseen for its products other than an _expressive data product/file naming_ specification and a detailed logfile -- _i.e._, the product definition.
**_Will search keywords be provided that optimize possibilities for re-use?_**
**TODO**
### Metadata
Metadata describing the processing steps carried out during generation must be provided with each data product. Metadata can be provided either as an individual text file or as a text header section of the data file. For instance, GeoTIFF images allow _keyword-value_ pairs of metadata in their headers, while GeoPackage datasets can host any number of tables, some of which can serve as metadata tables.
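
Since a GeoPackage is an SQLite container, one way to carry processing metadata inside the data file is a keyword-value table. The sketch below uses the Python standard library on an in-memory database; the table name `neanias_processing_log` and the example keywords are assumptions, because the metadata keywords are still to be defined. A complete GeoPackage also requires its mandatory system tables, which this sketch does not create.

```python
import sqlite3


def write_processing_metadata(conn, records):
    """Store keyword-value processing metadata in a hypothetical table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS neanias_processing_log ("
        "keyword TEXT PRIMARY KEY, value TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO neanias_processing_log (keyword, value) "
        "VALUES (?, ?)",
        sorted(records.items()),
    )
    conn.commit()


def read_processing_metadata(conn):
    """Return the stored metadata as a dictionary."""
    rows = conn.execute("SELECT keyword, value FROM neanias_processing_log")
    return dict(rows.fetchall())


# A real product would open the .gpkg file path instead of ":memory:".
conn = sqlite3.connect(":memory:")
write_processing_metadata(
    conn, {"software": "example-tool", "software_version": "0.1.0"}
)
stored = read_processing_metadata(conn)
```

Recording the software name and version this way would also support assigning a version to each data product.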
**_Do you provide clear version numbers?_**
**TBD: define metadata keywords and datatypes**
Data products by NEANIAS planetary services are dynamic in that they
will be produced on user demand, according to individual use cases.
The major goal of our services is to abstract -- or ease -- the technical details of
well-defined processes given well-known data (sources).
It is, though, foreseen that the software providing the different products will
evolve, or even that different options for algorithms will be offered to the user.
At this point, it is clear that a version should be assigned to the data
products and that such version can be defined based on the software and
the software version used.
**_What metadata will be created?
In case metadata standards do not exist in your discipline, please outline what type of metadata will be created and how._**
### Data Products
Whenever possible and convenient, the data product should be accompanied by
metadata regarding the main processing steps carried out during its
generation.
_Convenience_ and _possibility_ of inclusion of such metadata depend on the
file format in which the data is to be stored; we do not want to provide multiple files just to
carry the metadata.
For instance, GeoTIFF images allow basic _keyword-value_ pairs, and GeoPackage
allows the inclusion of table-formatted comments with free-format values.
#### Image Mosaic
TBD
**_Making data openly accessible
Which data produced and/or used in the project will be made openly available as the default?_**
#### Map Making
All planetary data used are public, the products are public as well.
TBD
#### Landing Sites
**_How will the data be made accessible (e.g. by deposition in a
repository)?_**
TBD
As previously stated, the main interface for data access is through the
use of REST APIs (OGC and IVOA protocols), which means users will use
suitable software clients to _request_ data.
#### Mineralogy
TBD
**_What methods or software tools are needed to access the data?_**
The decision of using standard protocols to access the data is supported by two
aspects:
### Data access
#### GUI
* Interactive web map interface -- ADAM/PlanetServer -- where the user can visually check for the data to download or launch data analysis services.
* [QGIS][@ref-qgis] is a popular open-source GIS client that can access our data through the OGC services, and also through the IVOA/EPN-TAP service with the plugin developed by [VESPA/EPN][@ref-qgisplugin].
[@ref-qgis]: https://qgis.org/en/site/
[@ref-qgisplugin]: https://github.com/epn-vespa/VO_QGIS_plugin
* Jupyter Notebook, for interactive lower-level data analysis.
#### Query interface
* OGC WMS, WCS, WPS, WFS
* EPN-TAP
* REST
#### File format
* GeoTIFF
* GeoPackage
* PDS IMAGE
* PDS LABEL
### Databases
* Rasdaman
* PostGIS
* ?
### Data Archives
* PDS
<hr/>
The decision of using standard protocols to access the data is supported by two aspects:
1. a well-defined interface that covers a consistent set of use cases;
2. the availability of working implementations of clients (and servers).
Planetary geoscience -- as a spin-off, if I may, from Earth geoscience --
benefits from a mature, stable and rich software landscape where OGC
standards are broadly used.
In particular, to provide access to NEANIAS Space/Planetary services,
QGIS[@ref-qgis] is a popular open-source GIS client that will allow
our users to access our data. As an extra feature, QGIS users can also access
IVOA EPN-TAP services through the plugin developed during the H2020 EPN-VESPA
project[@ref-qgisplugin].
[@ref-qgis]: https://qgis.org/en/site/
[@ref-qgisplugin]: https://github.com/epn-vespa/VO_QGIS_plugin
**_Where will the data and associated metadata, documentation and code be deposited? Preference should be given to certified repositories which support open access where possible._**
......
# NEANIAS Data Management Plan
# Data Management Plan Template
* [NEANIAS Planetary DMP](DMP_NEANIAS_Planetary.md)
......