With exactextract

Rasterix includes dask-aware wrappers for exactextract that support two types of operations:

  1. Coverage calculation: Compute precise fractional coverage values for each geometry using coverage()

  2. Rasterization (burning): Burn geometry indices into a raster using rasterize() with engine="exactextract"

Note

The exactextract engine does not support the all_touched parameter. However, exactextract naturally includes any pixel with non-zero coverage, which produces results similar to all_touched=True in rasterio.

Read data

Read in some raster data

import xarray as xr
import xproj  # noqa

ds = xr.tutorial.open_dataset("eraint_uvz")
ds = ds.proj.assign_crs(spatial_ref="epsg:4326")
ds
/home/docs/checkouts/readthedocs.org/user_builds/rasterix/envs/stable/lib/python3.12/site-packages/xarray/conventions.py:205: SerializationWarning: variable 'z' has non-conforming '_FillValue' np.float64(nan) defined, dropping '_FillValue' entirely.
  var = coder.decode(var, name=name)
/home/docs/checkouts/readthedocs.org/user_builds/rasterix/envs/stable/lib/python3.12/site-packages/xarray/conventions.py:205: SerializationWarning: variable 'u' has non-conforming '_FillValue' np.float64(nan) defined, dropping '_FillValue' entirely.
  var = coder.decode(var, name=name)
/home/docs/checkouts/readthedocs.org/user_builds/rasterix/envs/stable/lib/python3.12/site-packages/xarray/conventions.py:205: SerializationWarning: variable 'v' has non-conforming '_FillValue' np.float64(nan) defined, dropping '_FillValue' entirely.
  var = coder.decode(var, name=name)
<xarray.Dataset> Size: 17MB
Dimensions:      (month: 2, level: 3, latitude: 241, longitude: 480)
Coordinates:
  * month        (month) int32 8B 1 7
  * level        (level) int32 12B 200 500 850
  * latitude     (latitude) float32 964B 90.0 89.25 88.5 ... -88.5 -89.25 -90.0
  * longitude    (longitude) float32 2kB -180.0 -179.2 -178.5 ... 178.5 179.2
  * spatial_ref  int64 8B 0
Data variables:
    z            (month, level, latitude, longitude) float64 6MB ...
    u            (month, level, latitude, longitude) float64 6MB ...
    v            (month, level, latitude, longitude) float64 6MB ...
Indexes:
    spatial_ref  CRSIndex (crs=EPSG:4326)
Attributes:
    Conventions:  CF-1.0
    Info:         Monthly ERA-Interim data. Downloaded and edited by fabien.m...

Read in example geometries

import geodatasets
import geopandas as gpd

world = gpd.read_file(geodatasets.get_path("naturalearth land"))
world
featurecla scalerank min_zoom geometry
0 Land 1 1.0 POLYGON ((-59.57209 -80.04018, -59.86585 -80.5...
1 Land 1 1.0 POLYGON ((-159.20818 -79.49706, -161.1276 -79....
2 Land 1 0.0 POLYGON ((-45.15476 -78.04707, -43.92083 -78.4...
3 Land 1 1.0 POLYGON ((-121.21151 -73.50099, -119.91885 -73...
4 Land 1 1.0 POLYGON ((-125.55957 -73.48135, -124.03188 -73...
... ... ... ... ...
122 Land 1 1.0 POLYGON ((51.13619 80.54728, 49.79368 80.41543...
123 Land 0 0.0 POLYGON ((99.93976 78.88094, 97.75794 78.7562,...
124 Land 0 0.0 POLYGON ((-87.02 79.66, -85.81435 79.3369, -87...
125 Land 0 0.0 POLYGON ((-68.5 83.10632, -65.82735 83.02801, ...
126 Land 0 0.0 POLYGON ((-27.10046 83.51966, -20.84539 82.726...

127 rows × 4 columns

Calculate coverage

from rasterix.rasterize.exact import coverage

n = coverage(ds, world[["geometry"]], xdim="longitude", ydim="latitude")
n
<xarray.DataArray 'coverage' (geometry: 127, latitude: 241, longitude: 480)> Size: 1MB
<COO: shape=(127, 241, 480), dtype=float64, nnz=42244, fill_value=0.0>
Coordinates:
    geometry     (geometry) object 1kB POLYGON ((-59.57209469261153 -80.04017...
  * latitude     (latitude) float32 964B 90.0 89.25 88.5 ... -88.5 -89.25 -90.0
  * longitude    (longitude) float32 2kB -180.0 -179.2 -178.5 ... 178.5 179.2
    spatial_ref  int64 8B 0
n.isel(geometry=112).plot()
<matplotlib.collections.QuadMesh at 0x7487df119160>
../_images/ec5d69df75db6c9185c6dd25585106682be3fe81a36abe3a88f47640da438ba0.png

Optionally clip to the total_bounds of the geometries passed in

from rasterix.rasterize.exact import coverage

coverage(
    ds,
    world[["geometry"]].iloc[slice(112, 113)],
    xdim="longitude",
    ydim="latitude",
    clip=True,
).plot()
<matplotlib.collections.QuadMesh at 0x7487df290410>
../_images/e967b042db975a682409f7d90b474ae7c42bd5a0bb4513b495d95278bec1dd86.png

Notice that the output data is a sparse.COO array with a new dimension geometry. The input geometries are propagated as a coordinate variable named geometry.

Different coverage weights are supported. Extra attributes units and long_name are assigned as appropriate.

from rasterix.rasterize.exact import coverage

n = coverage(ds, world[["geometry"]], xdim="longitude", ydim="latitude", coverage_weight="fraction")
n
<xarray.DataArray 'coverage' (geometry: 127, latitude: 241, longitude: 480)> Size: 1MB
<COO: shape=(127, 241, 480), dtype=float64, nnz=42244, fill_value=0.0>
Coordinates:
    geometry     (geometry) object 1kB POLYGON ((-59.57209469261153 -80.04017...
  * latitude     (latitude) float32 964B 90.0 89.25 88.5 ... -88.5 -89.25 -90.0
  * longitude    (longitude) float32 2kB -180.0 -179.2 -178.5 ... 178.5 179.2
    spatial_ref  int64 8B 0

Out-of-core support

As with other rasterization code, various combinations of chunked and in-memory arrays and geometries are supported.

chunked = ds.chunk({"latitude": -1, "longitude": 120})
d = coverage(chunked, world[["geometry"]], xdim="longitude", ydim="latitude")
d
<xarray.DataArray 'coverage' (geometry: 127, latitude: 241, longitude: 480)> Size: 118MB
dask.array<coverage_np_dask_wrapper, shape=(127, 241, 480), dtype=float64, chunksize=(127, 241, 120), chunktype=sparse.COO>
Coordinates:
    geometry     (geometry) object 1kB POLYGON ((-59.57209469261153 -80.04017...
  * latitude     (latitude) float32 964B 90.0 89.25 88.5 ... -88.5 -89.25 -90.0
  * longitude    (longitude) float32 2kB -180.0 -179.2 -178.5 ... 178.5 179.2
    spatial_ref  int64 8B 0
import dask_geopandas as dgpd

dd = coverage(
    ds.chunk({"latitude": -1, "longitude": 240}),
    dgpd.from_geopandas(world[["geometry"]], npartitions=3),
    xdim="longitude",
    ydim="latitude",
)
dd
<xarray.DataArray 'coverage' (geometry: 127, latitude: 241, longitude: 480)> Size: 118MB
dask.array<coverage_np_dask_wrapper, shape=(127, 241, 480), dtype=float64, chunksize=(43, 241, 240), chunktype=sparse.COO>
Coordinates:
    geometry     (geometry) object 1kB dask.array<chunksize=(43,), meta=np.ndarray>
  * latitude     (latitude) float32 964B 90.0 89.25 88.5 ... -88.5 -89.25 -90.0
  * longitude    (longitude) float32 2kB -180.0 -179.2 -178.5 ... 178.5 179.2
    spatial_ref  int64 8B 0

Rasterization (burning)

In addition to coverage calculation, exactextract can be used as an engine for the main rasterize() function. This “burns” geometry indices into a raster, similar to rasterio.features.rasterize.

The exactextract engine uses binary coverage (coverage_weight=none) internally, burning any pixel with non-zero coverage.

from rasterix.rasterize import rasterize

rasterized = rasterize(ds, world[["geometry"]], xdim="longitude", ydim="latitude", engine="exactextract")
rasterized.plot()
<matplotlib.collections.QuadMesh at 0x7487df5723f0>
../_images/39f33b1e01a58e51d815013a42abe4207693a96ff1d55d78914306b95b1d4d6d.png

Out-of-core support works the same way as with the coverage function:

rasterized_dask = rasterize(
    ds.chunk({"latitude": -1, "longitude": 120}),
    world[["geometry"]],
    xdim="longitude",
    ydim="latitude",
    engine="exactextract",
)
rasterized_dask
<xarray.DataArray 'rasterized' (latitude: 241, longitude: 480)> Size: 116kB
dask.array<replace_values, shape=(241, 480), dtype=uint8, chunksize=(241, 120), chunktype=numpy.ndarray>
Coordinates:
  * latitude     (latitude) float32 964B 90.0 89.25 88.5 ... -88.5 -89.25 -90.0
  * longitude    (longitude) float32 2kB -180.0 -179.2 -178.5 ... 178.5 179.2
    spatial_ref  int64 8B 0
xr.testing.assert_identical(rasterized_dask, rasterized)