invalid dtype: 'object' when processing NetCDF files with separate band variables

When attempting to process NetCDF files that contain separate band variables (such as Sentinel-2 data with individual bands like B02, B03, B04, etc.), raster2stac fails with a TypeError: invalid dtype: 'object' during the COG generation process.

Environment

  • raster2stac version: 0.0.8
  • Python version: 3.11
  • Operating System: Linux
  • xarray version: 2024.3.0
  • rioxarray version: 0.17.0

Steps to Reproduce

  1. Download or use a NetCDF file with separate band variables (example: Sentinel-2 L2A data)
  2. Run the following code:
from raster2stac import Raster2STAC

rs2stac = Raster2STAC(
    data="path/to/S2_L2A_sample.nc",
    collection_id="SENTINEL2_L2A_SAMPLE",
    collection_url="https://stac.eurac.edu/collections/",
    output_folder="SENTINEL2_L2A_SAMPLE_STAC"
).generate_cog_stac()

Expected Behavior

The library should successfully process the NetCDF file and generate COG files and STAC metadata for each band and timestep.

Actual Behavior

The call fails during COG generation with the following traceback:

Traceback (most recent call last):
  File "test_stac.py", line 7, in <module>
    ).generate_cog_stac()
      ^^^^^^^^^^^^^^^^^^^
  File "raster2stac/raster2stac.py", line 1429, in generate_cog_stac
    ].to_dataset(name=band).rio.to_raster(
                                ^^^^^^^^^^
  File "rioxarray/raster_dataset.py", line 539, in to_raster
    return data_array.rio.set_spatial_dims(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "rioxarray/raster_array.py", line 1135, in to_raster
    return RasterioWriter(raster_path=raster_path).to_raster(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "rioxarray/raster_writer.py", line 279, in to_raster
    with rasterio.open(self.raster_path, "w", **kwargs) as rds:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "rasterio/env.py", line 463, in wrapper
    return f(*args, **kwds)
           ^^^^^^^^^^^^^^^^
  File "rasterio/__init__.py", line 254, in open
    raise TypeError(f"invalid dtype: {dtype!r}")
TypeError: invalid dtype: 'object'
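
To narrow down which variable triggers the error, it can help to list the variables whose dtype is not numeric before any raster is written. The helper below is only a diagnostic sketch: `non_numeric_variables` is a hypothetical name, not part of raster2stac, and the sample data is a synthetic stand-in for the real file. It accepts any mapping of names to objects with a `.dtype` attribute, so it should also work on the `data_vars` of an `xarray.Dataset`.

```python
import numpy as np


def non_numeric_variables(variables):
    """Return {name: dtype} for every entry whose dtype is not numeric.

    `variables` is any mapping of name -> object exposing `.dtype`,
    for example `ds.data_vars` of an xarray.Dataset.
    """
    bad = {}
    for name, var in variables.items():
        dtype = np.dtype(var.dtype)
        if not np.issubdtype(dtype, np.number):
            bad[name] = dtype
    return bad


# Synthetic stand-in (hypothetical): two numeric bands plus one
# object-typed metadata variable.
sample = {
    "B02": np.zeros((2, 2), dtype="float32"),
    "B03": np.zeros((2, 2), dtype="uint16"),
    "crs": np.array("EPSG:32632", dtype=object),
}

offenders = non_numeric_variables(sample)
print(offenders)
```

With a real file, the same check could be run as `non_numeric_variables(xr.open_dataset(path).data_vars)`, assuming xarray is imported as `xr`.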

Root Cause Analysis

The issue occurs because:

  1. Data Structure Mismatch: raster2stac expects the input data to be an xarray.DataArray with a bands dimension
  2. NetCDF Structure: Many NetCDF files (especially Earth observation data) store bands as separate data variables in an xarray.Dataset rather than as a single DataArray
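  3. Dtype Conversion: When the library extracts individual bands from the Dataset structure, it ends up with an array of dtype 'object' (likely from combining variables with incompatible dtypes). rasterio cannot write that dtype, so rasterio.open raises the TypeError shown in the traceback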
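
A possible workaround, sketched under the assumptions above, is to stack the band variables into a single DataArray with a bands dimension before handing the data to the library. The sketch uses the xarray method `Dataset.to_array`. The helper name `bands_to_dataarray`, the rule for picking band variables (numeric dtype, at least two dimensions), and the synthetic dataset are illustrative assumptions. Whether raster2stac accepts the stacked result directly as `data` has not been verified here.

```python
import numpy as np
import xarray as xr


def bands_to_dataarray(ds, band_dim="bands"):
    """Stack the numeric, spatial variables of `ds` along a new dimension.

    Non-numeric or scalar variables (for example a CRS metadata variable)
    are skipped so they cannot affect the dtype of the stacked array.
    """
    band_names = [
        name
        for name, var in ds.data_vars.items()
        if np.issubdtype(var.dtype, np.number) and var.ndim >= 2
    ]
    return ds[band_names].to_array(dim=band_dim)


# Synthetic stand-in for a Sentinel-2 style file (hypothetical values).
ds = xr.Dataset(
    {
        "B02": (("y", "x"), np.ones((2, 3), dtype="float32")),
        "B03": (("y", "x"), np.ones((2, 3), dtype="float32")),
        "B04": (("y", "x"), np.ones((2, 3), dtype="float32")),
        # Non-numeric metadata variable that should not become a band.
        "crs": xr.DataArray(np.array("EPSG:32632")),
    },
    coords={"y": [0.0, 1.0], "x": [0.0, 1.0, 2.0]},
)

stacked = bands_to_dataarray(ds)
print(stacked.dims, stacked.dtype)
```

Filtering before stacking keeps the stacked array in the bands' own numeric dtype, which avoids handing an 'object' array to the writer.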