load scale and offset if present
This merge request adds functionality to automatically load the scale and offset values from the input datasets into the STAC metadata when they are present, so the user no longer has to enter them manually.
If these values are missing or incorrectly written in the dataset metadata, the user can run the write_scale_and_offset.py script to write them to the datasets. Raster2STAC should then read the correct values directly into the metadata.
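As an illustration of the behaviour described above, here is a minimal sketch of the lookup. It assumes the values are stored under the attribute names `scale_factor` and `add_offset` (the names used in the diff under review) and that missing attributes fall back to a scale of 1 and an offset of 0, the defaults the code used before this change. The helper name `read_scale_offset` is hypothetical and not part of Raster2STAC.

```python
from typing import Any, Mapping, Tuple


def read_scale_offset(attrs: Mapping[str, Any]) -> Tuple[float, float]:
    """Return (scale, offset) from a mapping of dataset attributes.

    Hypothetical helper: falls back to scale=1 and offset=0 when the
    attributes are missing or explicitly set to None.
    """
    scale = attrs.get("scale_factor")
    offset = attrs.get("add_offset")
    return (
        1.0 if scale is None else float(scale),
        0.0 if offset is None else float(offset),
    )


# Attributes present: the stored values are used.
print(read_scale_offset({"scale_factor": 0.01, "add_offset": 0.0}))
# Attributes absent: the defaults are used instead of raising KeyError.
print(read_scale_offset({}))
```

With xarray, the `attrs` dictionary of a DataArray could be passed in directly. Using `.get` rather than indexing means a dataset without these attributes does not raise an error.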
Activity
added dev label
assigned to @michele.claus
requested review from @michele.claus
assigned to @RufaiOmowunmi.Balogun and unassigned @michele.claus
added 2 commits
@michele.claus could you help review the updates? If they look fine to you, I can add the new changes to the changelog and tag it.
-        if isinstance(self.data, xr.DataArray) or isinstance(self.data, str):
-            if isinstance(self.data, xr.DataArray):
+        if isinstance(self.data, xr.Dataset) or isinstance(self.data, str):
+            if isinstance(self.data, xr.Dataset):
                 pass
             elif isinstance(self.data, str):
                 source_path = os.path.dirname(self.data)
-                local_conn = LocalConnection(source_path)
-                self.data = local_conn.load_collection(self.data).execute()
+                # local_conn = LocalConnection(source_path)
+                # self.data = local_conn.load_collection(self.data).execute()
+                self.data = xr.open_dataset(source_path)
+
+            # store datasets in a placeholder
+            self.data_ds = self.data.copy(deep=True)
+            self.data = self.data.to_array(dim="bands")
changed this line in version 5 of the diff
             self.upload_s3(output_path)

     def generate_cog_stac(self):
-        if isinstance(self.data, xr.DataArray) or isinstance(self.data, str):
-            if isinstance(self.data, xr.DataArray):
+        if isinstance(self.data, xr.Dataset) or isinstance(self.data, str):
+            if isinstance(self.data, xr.Dataset):
                 pass
             elif isinstance(self.data, str):
                 source_path = os.path.dirname(self.data)
-                local_conn = LocalConnection(source_path)
-                self.data = local_conn.load_collection(self.data).execute()
+                # local_conn = LocalConnection(source_path)
+                # self.data = local_conn.load_collection(self.data).execute()
+                self.data = xr.open_dataset(source_path)
I would leave the loading of data from file paths to the openEO implementation, since it takes care of correctly loading the data based on the file format (Zarr, netCDF, GeoTIFF) and of other details (CRS, descriptions). See here: https://github.com/Open-EO/openeo-python-client/blob/d64fc30decfcf94860553f1b32027f5d1183099d/openeo/local/processing.py#L50
You would then have to convert it to an xarray Dataset, for example:
self.data_ds = self.data.to_dataset(dim=self.B_DIM)
changed this line in version 5 of the diff
@RufaiOmowunmi.Balogun you are overwriting self.data, instead of assigning it to self.data_ds
Edited by Claus Michele
             # Missing `bits_per_sample` and `spatial_resolution`
             # It should contain only one band/variable
             # for band in src_dst.indexes:
-            value = {
-                "data_type": str(src_dst.dtype),
-                "scale": 1,  # TODO: load scale and offset if present
-                "offset": 0,
-            }
+            if src_dst.attrs["scale_factor"] or src_dst.attrs["add_offset"]:
changed this line in version 5 of the diff
@RufaiOmowunmi.Balogun I left some comments, let me know!
By the way, do you think it would make sense to include the nodata field extraction directly in this PR? Or should we leave it for a new PR?
@michele.claus Okay, I will check them and update you accordingly. Actually, the nodata extraction was already included in a previous commit and now works well when we load directly as an xr.Dataset:
    if src_dst.rio.nodata is not None:
        if numpy.isnan(src_dst.rio.nodata):
            value["nodata"] = "nan"
        elif numpy.isposinf(src_dst.rio.nodata):
            value["nodata"] = "inf"
        elif numpy.isneginf(src_dst.rio.nodata):
            value["nodata"] = "-inf"
        else:
            value["nodata"] = src_dst.rio.nodata
Edited by RufaiOmowunmi.Balogun
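The nodata mapping quoted above can also be written as a small standalone function, which makes it easy to test without opening a raster file. This is only a sketch using the standard library. `encode_nodata` is a hypothetical name, and the string spellings are the ones used in the snippet above.

```python
import math
from typing import Optional, Union


def encode_nodata(nodata: Optional[float]) -> Optional[Union[float, str]]:
    """Map a nodata value to a JSON-friendly representation.

    NaN and infinities cannot be written as plain JSON numbers, so they
    are returned as the strings "nan", "inf" and "-inf". Finite values
    are returned unchanged, and None means no nodata value is set.
    """
    if nodata is None:
        return None
    if math.isnan(nodata):
        return "nan"
    if math.isinf(nodata):
        return "inf" if nodata > 0 else "-inf"
    return nodata


for candidate in (float("nan"), float("-inf"), -32768.0, None):
    print(candidate, "->", encode_nodata(candidate))
```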
@michele.claus I have made some updates based on the recommendations. Could you take a look?
             "scale": 0.01,
             "offset": 0.0,
             "sampling": "area",
+            "nodata": -32768.0,
@RufaiOmowunmi.Balogun: if the data type is integer, the "nodata" value should also be an integer. You can use the dtype of the data you extract to perform this check and cast the value to the proper dtype.
changed this line in version 6 of the diff
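A minimal sketch of the check suggested in the comment above, assuming the dtype is available as a string such as "int16" (as produced by `str(src_dst.dtype)` in the earlier diff). `cast_nodata` is a hypothetical helper, and the name-prefix test stands in for a proper dtype check so that the example needs only the standard library.

```python
from typing import Union


def cast_nodata(nodata: float, dtype_name: str) -> Union[int, float]:
    """Return nodata as an int when dtype_name names an integer type.

    Hypothetical helper: values that are not whole numbers (including
    NaN and infinities) are returned unchanged, since they cannot be
    represented as an integer.
    """
    is_integer_dtype = dtype_name.startswith(("int", "uint"))
    if is_integer_dtype and float(nodata).is_integer():
        return int(nodata)
    return nodata


print(cast_nodata(-32768.0, "int16"))    # integer dtype: cast to int
print(cast_nodata(-9999.0, "float32"))   # float dtype: left as float
```

With NumPy available, `numpy.issubdtype(dtype, numpy.integer)` would be a more robust way to detect integer dtypes than matching on the name.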