Why

Rasterix provides a RasterIndex that uses an affine transform to enable indexing of the underlying data.

Why should you use it?

  1. It eliminates an entire class of bugs where Xarray allows you to add (for example) two datasets with different affine transforms (and/or projections) and return nonsensical outputs.

  2. It enables indexing using the coordinate transformation, minimizing impacts of any floating-point mismatches.

  3. To fit the Xarray data model, RasterIndex creates lazy coordinate variables and propagated them during indexing as much as possible. Thus very large coordinates can be represented with minimal memory cost.

Quick demo

Below we quickly demonstrate some of these features. Note that this particular example notebook skips CRS handling.

Hide code cell source

%xmode minimal

import numpy as np
import xarray as xr

import rasterix

np.set_printoptions(threshold=10, edgeitems=2)
xr.set_options(display_expand_indexes=True)

Hide code cell output

Exception reporting mode: Minimal
<xarray.core.options.set_options at 0x785e8733ac60>

We will demonstrate the raster index with a simple dataset with rectilinear coordinates and no rotation or skew. Both x and y coordinates are 1-dimensional.

source = "../data/S_20240101_concentration_v4.0.tif"

With and without RasterIndex.

da_no_raster_index = xr.open_dataarray(source, engine="rasterio")
da_no_raster_index
<xarray.DataArray 'band_data' (band: 1, y: 332, x: 316)> Size: 420kB
[104912 values with dtype=float32]
Coordinates:
  * band         (band) int64 8B 1
  * y            (y) float64 3kB 4.338e+06 4.312e+06 ... -3.912e+06 -3.938e+06
  * x            (x) float64 3kB -3.938e+06 -3.912e+06 ... 3.912e+06 3.938e+06
    spatial_ref  int64 8B ...
Attributes:
    AREA_OR_POINT:  Area
da_raster_index = xr.open_dataarray(source, engine="rasterio", parse_coordinates=False).pipe(
    rasterix.assign_index
)
da_raster_index
<xarray.DataArray 'band_data' (band: 1, y: 332, x: 316)> Size: 420kB
[104912 values with dtype=float32]
Coordinates:
  * band         (band) int64 8B 1
  * y            (y) float64 3kB 4.338e+06 4.312e+06 ... -3.912e+06 -3.938e+06
  * x            (x) float64 3kB -3.938e+06 -3.912e+06 ... 3.912e+06 3.938e+06
    spatial_ref  int64 8B ...
Indexes:
  ┌ x        RasterIndex (crs=None)
  └ y
Attributes:
    AREA_OR_POINT:  Area

The affine transforms are accessible:

da_raster_index.xindexes["x"].transform()  # top-left
Affine(25000.0, 0.0, -3950000.0,
       0.0, -25000.0, 4350000.0)
da_raster_index.xindexes["x"].center_transform()  # pixel centrres
Affine(25000.0, 0.0, -3937500.0,
       0.0, -25000.0, 4337500.0)

Now Let’s compare the coordinates for the DataArrays with and without RasterIndex.

Without a raster index, x and y are in-memory data.

da_no_raster_index.x
<xarray.DataArray 'x' (x: 316)> Size: 3kB
array([-3937500., -3912500., -3887500., ...,  3887500.,  3912500.,  3937500.],
      shape=(316,))
Coordinates:
  * x            (x) float64 3kB -3.938e+06 -3.912e+06 ... 3.912e+06 3.938e+06
    spatial_ref  int64 8B ...

By contrast, with a RasterIndex, coordinate values are lazy and generated on-demand!

da_raster_index.x
<xarray.DataArray 'x' (x: 316)> Size: 3kB
[316 values with dtype=float64]
Coordinates:
    x            (x) float64 3kB -3.938e+06 -3.912e+06 ... 3.912e+06 3.938e+06
    spatial_ref  int64 8B ...

These differences are viewable in the repr. Note the PandasIndex type under “Indexes”.

da_no_raster_index
<xarray.DataArray 'band_data' (band: 1, y: 332, x: 316)> Size: 420kB
[104912 values with dtype=float32]
Coordinates:
  * band         (band) int64 8B 1
  * y            (y) float64 3kB 4.338e+06 4.312e+06 ... -3.912e+06 -3.938e+06
  * x            (x) float64 3kB -3.938e+06 -3.912e+06 ... 3.912e+06 3.938e+06
    spatial_ref  int64 8B ...
Attributes:
    AREA_OR_POINT:  Area

The repr below shows a few values for each coordinate (those have been computed on-the-fly) but clicking on the database icon doesn’t show any value in the spatial coordinate data reprs.

da_raster_index
<xarray.DataArray 'band_data' (band: 1, y: 332, x: 316)> Size: 420kB
[104912 values with dtype=float32]
Coordinates:
  * band         (band) int64 8B 1
  * y            (y) float64 3kB 4.338e+06 4.312e+06 ... -3.912e+06 -3.938e+06
  * x            (x) float64 3kB -3.938e+06 -3.912e+06 ... 3.912e+06 3.938e+06
    spatial_ref  int64 8B ...
Indexes:
  ┌ x        RasterIndex (crs=None)
  └ y
Attributes:
    AREA_OR_POINT:  Area

The generated coordinate values correspond to cell centers.

da_raster_index.x.to_numpy()
array([-3937500., -3912500., ...,  3912500.,  3937500.], shape=(316,))

Equality

equals compares variable values without relying on Xarray coordinate indexes. Both dataarrays should thus be equal.

da_raster_index.equals(da_no_raster_index)
True

Alignment

Xarray alignment relies on Xarray coordinate indexes. Trying to align both datasets fails here since they each have different index types.

da_raster_index + da_no_raster_index
AlignmentError: cannot align objects on coordinate 'x' because of conflicting indexes
first index: PandasIndex(Index([-3937500.0, -3912500.0, -3887500.0, -3862500.0, -3837500.0, -3812500.0,
       -3787500.0, -3762500.0, -3737500.0, -3712500.0,
       ...
        3712500.0,  3737500.0,  3762500.0,  3787500.0,  3812500.0,  3837500.0,
        3862500.0,  3887500.0,  3912500.0,  3937500.0],
      dtype='float64', name='x', length=316))
second index: RasterIndex(crs=None)
    AxisAffineTransformIndex(AxisAffineTransform(a=2.5e+04, b=0, c=-3.938e+06, d=0, e=-2.5e+04, f=4.338e+06, axis=X, dim='x'))
    AxisAffineTransformIndex(AxisAffineTransform(a=2.5e+04, b=0, c=-3.938e+06, d=0, e=-2.5e+04, f=4.338e+06, axis=Y, dim='y'))
first variable: <xarray.IndexVariable 'x' (x: 316)> Size: 3kB
array([-3937500., -3912500., -3887500., ...,  3887500.,  3912500.,  3937500.],
      shape=(316,))
second variable: <xarray.Variable (x: 316)> Size: 3kB
[316 values with dtype=float64]