Climate data products to STAC
Publish "climate products" in our STAC catalogue. These products currently include:
- AWS temperature and precipitation data (but this requires Vector Cubes to be supported)
- AWS-derived grids of temperature and precipitation (Trentino-South Tyrol)
- AWS-derived grids of temperature and precipitation climatologies (Trentino-South Tyrol) [TBD: How often are climatologies updated?]
- RCP 4.5/8.5 EURO-CORDEX-based daily temperature and precipitation climate projections (Trentino-South Tyrol)
  - /mnt/CEPH_PROJECTS/FACT_CLIMAX/CORDEX-Adjust/QDM/*
- RCP 4.5/8.5 EURO-CORDEX-based yearly temperature and precipitation climate indices (Trentino-South Tyrol)
  - Warm Spell Duration Index in Trentino-South Tyrol, IT (1971-2100) - EDP (an example)
  - /mnt/CEPH_PROJECTS/FACT_CLIMAX/CORDEX-Adjust/INDICES/*
- X-RISK-CC CERRA-based (historical) and EURO-CORDEX-based (future scenarios) climate indices data (Alpine Space)
NOTE: data that are already in the EDP either need to be ported to STAC (they might be stored in another backend, e.g. rasdaman), or in any case need a syncing job that runs on a daily/weekly basis to keep the catalogue up to date.
The outcome shall be a set of 1+ command-line utilities for publishing/updating[/deleting] collections in a STAC catalogue (meaning data on CEPH + STAC metadata on the server, for the time being). These utilities will be integrated into existing data pipelines such as the "meteo data pipeline".
Local environments must be defined (e.g. conda envs), and the program must take into account dotenv and/or configuration files, plus other optional metadata files (e.g. extra metadata not included in the original files).
Basic functionality of the implemented modules (a CLI skeleton is sketched after this list):
- logging
- "dry-run" option
- verbose/debug option to enable DEBUG log lines in stdout
- management of collections that already exist in the remote catalogue: options to i) forcibly overwrite, ii) update, iii) do nothing (NOP), etc.
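A minimal sketch of what the entry point of such a utility could look like, assuming Python with `argparse`, `python-dotenv` and the standard `logging` module; script, option and function names are placeholders, not an agreed interface:

```python
"""Hypothetical skeleton for one of the publishing utilities (all names are placeholders)."""
import argparse
import logging
from pathlib import Path

from dotenv import load_dotenv  # python-dotenv

logger = logging.getLogger("stac-publish")


def parse_args():
    parser = argparse.ArgumentParser(description="Publish/update a collection in the STAC catalogue.")
    parser.add_argument("input", type=Path, help="Input file or folder on CEPH")
    parser.add_argument("--config", type=Path, default=None, help="Optional configuration / extra-metadata file")
    parser.add_argument("--dry-run", action="store_true", help="Build metadata but do not write to the catalogue")
    parser.add_argument("-v", "--verbose", action="store_true", help="Enable DEBUG log lines on stdout")
    parser.add_argument("--on-existing", choices=["overwrite", "update", "skip"], default="skip",
                        help="What to do if the collection already exists in the remote catalogue")
    return parser.parse_args()


def main():
    args = parse_args()
    logging.basicConfig(level=logging.DEBUG if args.verbose else logging.INFO)
    load_dotenv()  # pick up catalogue URL, credentials, etc. from a .env file

    logger.debug("Arguments: %s", vars(args))
    if args.dry_run:
        logger.info("Dry run: collection for %s would be published with policy '%s'",
                    args.input, args.on_existing)
        return
    # ... build the datacube, write the STAC metadata, upload/update the collection ...


if __name__ == "__main__":
    main()
```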
A suite of tests for all "types" of datacubes involved shall be part of the repository as well, meaning "toy" simplified versions of the actual input datasets (e.g. spatial resolution lowered to 2x2 pixels, etc.).
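Such fixtures could also be generated programmatically instead of being stored as files; a minimal sketch with xarray, assuming a hypothetical `tas` variable and NetCDF output:

```python
# Minimal sketch of a "toy" test dataset: a 2x2-pixel daily temperature cube
# written to NetCDF, to be used as a fixture in the test suite (names are illustrative).
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("2000-01-01", periods=10, freq="D")
toy = xr.Dataset(
    {"tas": (("time", "y", "x"), np.random.rand(10, 2, 2).astype("float32"))},
    coords={"time": time, "y": [46.5, 46.0], "x": [11.0, 11.5]},
)
toy.to_netcdf("tas_toy.nc")
```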
Regarding the utility for importing the raster files into data cubes / a STAC catalogue (which could be part of the raster-to-stac package), the rationale behind the conversion to xarrays (which are then exported to STAC) could be the following, given the INPUT file(s) (see the sketch after this list):
- if INPUT is a single file (e.g. NetCDF):
  - easy, basically delegate to `xarray.open_dataset()`
- if INPUT is a folder with N files:
  - delegate to a proper call to `xarray.open_mfdataset()`
  - the user must be able to specify along which dimension (label and type) the single files shall be concatenated
  - the user must be able to specify how to label each coordinate (or label) along this dimension
- if INPUT is 2+ folders containing files, then:
  - manually concatenate the results of consecutive calls to `xarray.open_mfdataset()`
  - the user must be able to specify along which dimension (label and type):
    - the files/arrays within each folder shall be concatenated
    - the resulting arrays representing each folder shall be concatenated
  - the user must be able to specify how to label each coordinate (or label) along these two "arbitrary" dimensions
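A minimal sketch of how the three cases could map to xarray calls; the dimension name `scenario`, the labels and the paths are only illustrative assumptions:

```python
# Sketch of the three INPUT cases (dimension names, labels and paths are placeholders).
import xarray as xr

# Case 1: single file -> open_dataset
ds = xr.open_dataset("input/tas_1971-2100.nc")

# Case 2: folder with N files -> open_mfdataset, concatenated along a
# user-chosen dimension (here a new "scenario" dimension, as an example)
ds = xr.open_mfdataset("input/*.nc", combine="nested", concat_dim="scenario")
ds = ds.assign_coords(scenario=["rcp45", "rcp85"])  # user-provided labels, one per file

# Case 3: 2+ folders -> one open_mfdataset per folder, then concatenate the
# resulting datasets along a second, folder-level dimension
folders = {"rcp45": "input/rcp45/*.nc", "rcp85": "input/rcp85/*.nc"}
per_folder = [
    xr.open_mfdataset(pattern, combine="nested", concat_dim="time")
    for pattern in folders.values()
]
ds = xr.concat(per_folder, dim="scenario").assign_coords(scenario=list(folders))
```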
IMPORTANT: regarding the way to let a user specify the exact labels for the arbitrary dimensions, I would suggest giving the possibility to (see the sketch after this list):
- extract portions of the file names through a provided regex (e.g. the time footprint or other domain information is often available in the file names)
- same as above, but on the folder names, for the folder-level arbitrary dimension in case 3 above
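A minimal sketch of the regex-based labelling, assuming a hypothetical file naming scheme `tas_<year>.nc` in which every file matches the pattern:

```python
# Sketch: derive coordinate labels from file names via a user-provided regex
# (the pattern, folder and naming scheme below are only examples).
import re
from pathlib import Path

pattern = re.compile(r"tas_(\d{4})\.nc$")  # e.g. capture the year from "tas_1995.nc"

files = sorted(Path("input/rcp45").glob("*.nc"))
labels = [int(pattern.search(f.name).group(1)) for f in files]
# `labels` is then used as the coordinate values of the file-level dimension;
# applying the same mechanism to folder names gives the folder-level labels (case 3).
```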
IMPORTANT: in all of the cases, the user can specify which dimension to turn into BANDS in the data cube, i.e. turning a single xarray `Dataset` variable into an array of variables, one per coordinate along that dimension (see the sketch below).
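A minimal sketch of this "dimension to bands" step, assuming a hypothetical `tas` variable with a `scenario` dimension:

```python
# Sketch: turn the "scenario" dimension into bands, i.e. split the single
# variable "tas" into one variable per scenario (variable/dimension names are examples).
import xarray as xr

ds = xr.open_dataset("input/tas_scenarios.nc")  # assumed dims: (scenario, time, y, x)
bands = xr.Dataset(
    {f"tas_{s}": ds["tas"].sel(scenario=s, drop=True) for s in ds["scenario"].values}
)
```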