Why¶
Rasterix provides a RasterIndex that uses an affine transform to enable indexing of the underlying data.
Why should you use it?
It eliminates an entire class of bugs where Xarray allows you to add (for example) two datasets with different affine transforms (and/or projections) and return nonsensical outputs.
It enables indexing using the coordinate transformation, minimizing impacts of any floating-point mismatches.
To fit the Xarray data model, RasterIndex creates lazy coordinate variables and propagated them during indexing as much as possible. Thus very large coordinates can be represented with minimal memory cost.
Quick demo¶
Below we quickly demonstrate some of these features. Note that this particular example notebook skips CRS handling.
We will demonstrate the raster index with a simple dataset with rectilinear coordinates and no rotation or skew. Both x and y coordinates are 1-dimensional.
source = "../data/S_20240101_concentration_v4.0.tif"
With and without RasterIndex.¶
da_no_raster_index = xr.open_dataarray(source, engine="rasterio")
da_no_raster_index
<xarray.DataArray 'band_data' (band: 1, y: 332, x: 316)> Size: 420kB
[104912 values with dtype=float32]
Coordinates:
* band (band) int64 8B 1
* y (y) float64 3kB 4.338e+06 4.312e+06 ... -3.912e+06 -3.938e+06
* x (x) float64 3kB -3.938e+06 -3.912e+06 ... 3.912e+06 3.938e+06
spatial_ref int64 8B ...
Attributes:
AREA_OR_POINT: Areada_raster_index = xr.open_dataarray(source, engine="rasterio", parse_coordinates=False).pipe(
rasterix.assign_index
)
da_raster_index
<xarray.DataArray 'band_data' (band: 1, y: 332, x: 316)> Size: 420kB
[104912 values with dtype=float32]
Coordinates:
* band (band) int64 8B 1
* y (y) float64 3kB 4.338e+06 4.312e+06 ... -3.912e+06 -3.938e+06
* x (x) float64 3kB -3.938e+06 -3.912e+06 ... 3.912e+06 3.938e+06
spatial_ref int64 8B ...
Indexes:
┌ x RasterIndex (crs=None)
└ y
Attributes:
AREA_OR_POINT: AreaThe affine transforms are accessible:
da_raster_index.xindexes["x"].transform() # top-left
Affine(25000.0, 0.0, -3950000.0,
0.0, -25000.0, 4350000.0)
da_raster_index.xindexes["x"].center_transform() # pixel centrres
Affine(25000.0, 0.0, -3937500.0,
0.0, -25000.0, 4337500.0)
Now Let’s compare the coordinates for the DataArrays with and without RasterIndex.
Without a raster index, x and y are in-memory data.
da_no_raster_index.x
<xarray.DataArray 'x' (x: 316)> Size: 3kB
array([-3937500., -3912500., -3887500., ..., 3887500., 3912500., 3937500.],
shape=(316,))
Coordinates:
* x (x) float64 3kB -3.938e+06 -3.912e+06 ... 3.912e+06 3.938e+06
spatial_ref int64 8B ...By contrast, with a RasterIndex, coordinate values are lazy and generated on-demand!
da_raster_index.x
<xarray.DataArray 'x' (x: 316)> Size: 3kB
[316 values with dtype=float64]
Coordinates:
x (x) float64 3kB -3.938e+06 -3.912e+06 ... 3.912e+06 3.938e+06
spatial_ref int64 8B ...These differences are viewable in the repr. Note the PandasIndex type under “Indexes”.
da_no_raster_index
<xarray.DataArray 'band_data' (band: 1, y: 332, x: 316)> Size: 420kB
[104912 values with dtype=float32]
Coordinates:
* band (band) int64 8B 1
* y (y) float64 3kB 4.338e+06 4.312e+06 ... -3.912e+06 -3.938e+06
* x (x) float64 3kB -3.938e+06 -3.912e+06 ... 3.912e+06 3.938e+06
spatial_ref int64 8B ...
Attributes:
AREA_OR_POINT: AreaThe repr below shows a few values for each coordinate (those have been computed on-the-fly) but clicking on the database icon doesn’t show any value in the spatial coordinate data reprs.
da_raster_index
<xarray.DataArray 'band_data' (band: 1, y: 332, x: 316)> Size: 420kB
[104912 values with dtype=float32]
Coordinates:
* band (band) int64 8B 1
* y (y) float64 3kB 4.338e+06 4.312e+06 ... -3.912e+06 -3.938e+06
* x (x) float64 3kB -3.938e+06 -3.912e+06 ... 3.912e+06 3.938e+06
spatial_ref int64 8B ...
Indexes:
┌ x RasterIndex (crs=None)
└ y
Attributes:
AREA_OR_POINT: AreaThe generated coordinate values correspond to cell centers.
da_raster_index.x.to_numpy()
array([-3937500., -3912500., ..., 3912500., 3937500.], shape=(316,))
Equality¶
equals compares variable values without relying on Xarray coordinate indexes. Both dataarrays should thus be equal.
da_raster_index.equals(da_no_raster_index)
True
Alignment¶
Xarray alignment relies on Xarray coordinate indexes. Trying to align both datasets fails here since they each have different index types.
da_raster_index + da_no_raster_index
AlignmentError: cannot align objects on coordinate 'x' because of conflicting indexes
first index: PandasIndex(Index([-3937500.0, -3912500.0, -3887500.0, -3862500.0, -3837500.0, -3812500.0,
-3787500.0, -3762500.0, -3737500.0, -3712500.0,
...
3712500.0, 3737500.0, 3762500.0, 3787500.0, 3812500.0, 3837500.0,
3862500.0, 3887500.0, 3912500.0, 3937500.0],
dtype='float64', name='x', length=316))
second index: RasterIndex(crs=None)
AxisAffineTransformIndex(AxisAffineTransform(a=2.5e+04, b=0, c=-3.938e+06, d=0, e=-2.5e+04, f=4.338e+06, axis=X, dim='x'))
AxisAffineTransformIndex(AxisAffineTransform(a=2.5e+04, b=0, c=-3.938e+06, d=0, e=-2.5e+04, f=4.338e+06, axis=Y, dim='y'))
first variable: <xarray.IndexVariable 'x' (x: 316)> Size: 3kB
array([-3937500., -3912500., -3887500., ..., 3887500., 3912500., 3937500.],
shape=(316,))
second variable: <xarray.Variable (x: 316)> Size: 3kB
[316 values with dtype=float64]