Skip to content

warp.reproject does not handle masked arrays properly #2575

@adehecq

Description

@adehecq

Issue

When passing a masked array to warp.reproject, the mask is not properly taken into account.

Looking at the source code at this line, it seems that the behavior of the function is:

  • set the src_nodata to the masked_array fill_value - This is not desirable, as the Raster's nodata value and masked_arrays fill_value are two independent concepts. Moreover, a masked value is not necessarily set to fill_value (e.g. if the mask has been updated)
  • the mask is not actually taken into account, at L 267 source = np.array(source, copy=False) means that the
  • for UInt8 rasters using the whole range 0-255, it can be annoying to have to set a nodata value that might actually be used.

Expected behavior and actual behavior.

  1. The mask should be used during the reprojection, not only the fill_value.
  2. If src_nodata is not set, should raise an error or warning
  3. It should be possible to run the reprojection without setting a nodata value, especially for UInt8 types

Point 1 is easy to address by replacing source = np.array(source, copy=False) with source = np.ma.masked_array(source).filled(src_nodata).
Point 2 is also straightforward.
Point 3 might be more complex.

Steps to reproduce the problem.

Below is simple reproducible example:

import numpy as np
import rasterio as rio
from rasterio import warp

# Create fake raster
width = height = 10
src_data = np.random.randint(0, 255, (height, width), dtype="uint8")
src_transform = rio.transform.from_bounds(0, 0, 1, 1, width, height)
crs = rio.crs.CRS.from_epsg(4326)
nodata = 255

# Shift by a 1/3 pixel to the right, with nearest output should be equal
dst_transform = rio.transform.from_bounds(0 + src_transform[0]/3., 0, 1, 1, width, height)
dst_data, _ = warp.reproject(src_data, src_transform=src_transform, resampling=warp.Resampling.nearest, src_crs=crs, dst_crs=crs, dst_transform=dst_transform, destination=np.ones_like(src_data), dst_nodata=nodata, src_nodata=255)
np.array_equal(dst_data, src_data) # -> True

# Same but with 3 left columns masked
src_data2 = np.ma.masked_array(src_data)
src_data2[:, :3] = np.ma.masked
dst_data2, _ = warp.reproject(src_data2, src_transform=src_transform, resampling=warp.Resampling.nearest, src_crs=crs, dst_crs=crs, dst_transform=dst_transform, destination=np.ones_like(src_data), dst_nodata=nodata, src_nodata=255)
dst_data2 = np.ma.masked_array(dst_data2, mask=(dst_data2==nodata))
np.array_equal(src_data2.data, dst_data2.data) # -> True, the underlying pixels are unchanged
np.array_equal(src_data2.mask, dst_data2.mask) # -> False, the mask is not preserved

# Fixing the inapprorpiate behavior
src_data3 = np.ma.masked_array(src_data2).filled(255)
dst_data3, _ = warp.reproject(src_data3, src_transform=src_transform, resampling=warp.Resampling.nearest, src_crs=crs, dst_crs=crs, dst_transform=dst_transform, destination=np.ones_like(src_data), dst_nodata=nodata, src_nodata=255)
dst_data3 = np.ma.masked_array(dst_data3, mask=(dst_data3==nodata))

np.array_equal(src_data3, dst_data3.data) # -> True, the original pixels are unchanged
np.array_equal(src_data2.mask, dst_data3.mask) # -> True, the mask is preserved

Environment Information

rasterio info:
rasterio: 1.3.2
GDAL: 3.5.1
PROJ: 9.0.1
GEOS: 3.11.0
PROJ DATA: /usr/local/Caskroom/miniconda/base/envs/xdem/share/proj
GDAL DATA: /usr/local/Caskroom/miniconda/base/envs/xdem/share/gdal

System:
python: 3.9.13 | packaged by conda-forge | (main, May 27 2022, 17:00:52) [Clang 13.0.1 ]
executable: /usr/local/Caskroom/miniconda/base/envs/xdem/bin/python
machine: macOS-11.6.4-x86_64-i386-64bit

Python deps:
affine: 2.3.1
attrs: 22.1.0
certifi: 2022.06.15
click: 8.0.4
cligj: 0.7.2
cython: None
numpy: 1.22.4
snuggs: 1.4.7
click-plugins: None
setuptools: 65.3.0

Installation Method

Installed from conda.

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions