Skip to content

Boundless windowed masked reads of rasters with no NODATA tag set 0 to NaN #3245

@ajnisbet

Description

@ajnisbet

Expected behavior and actual behavior.

Calling read with boundless=True, masked=True, window=window on a all-valid raster with no NODATA tag incorrectly masks 0-valued pixels.

Expected behaviour is for there to be no masked values in an all-valid array.

read_masks returns the correct mask (so a workaround is to read unmasked, read the mask, then do your own masking).

Bug is present in 1.4.0, 1.4.1, and 1.4.2. But not present < 1.4.0.

This branch looks like the culprit:

rasterio/rasterio/_io.pyx

Lines 697 to 699 in ff66850

mask_vrt_doc = _boundless_vrt_doc(
self, nodata=0,
width=max(self.width, window.width) + 1,

Steps to reproduce the problem.

Testcase:

import numpy as np
import rasterio

# Test array: random data > 0, with one pixel set to zero.
shape = (180, 360)
transform = rasterio.transform.from_bounds(-180, -90, 180, 90, 360, 180)
a = np.random.RandomState(1).uniform(1, 2, shape)
a[100, 100] = 0

# Save to geotiff.
path = '/tmp/rasterio-boundless-masked.tif'
profile = {
    "driver":'GTiff',
    "height": a.shape[0],
    "width": a.shape[1],
    "crs": rasterio.crs.CRS.from_epsg(4326),
    "transform": transform,
    "dtype": np.float32,
    "count": 1,
    
}
with rasterio.open(path, 'w', **profile) as f_out:
    f_out.write(a, 1)

# Read with different options, check no nan values area present.
with rasterio.open(path) as f:
    bounds_full = f.bounds
    shape_full = f.shape
    
    window = rasterio.windows.Window.from_slices((1, 180), (1, 360), boundless=True)

    a_full = f.read(1, masked=True).filled(np.nan)
    a_window = f.read(1, window=window, masked=True).filled(np.nan)
    a_boundless = f.read(1, boundless=True, masked=True).filled(np.nan)
    a_window_boundless = f.read(1, window=window, boundless=True, masked=True).filled(np.nan)
    
    mask_full = f.read_masks(1)
    mask_window = f.read_masks(1, window=window)
    mask_boundless = f.read_masks(1, boundless=True)
    mask_window_boundless = f.read_masks(1, window=window, boundless=True)

print(f"{shape_full=}")
print(f"{window=}")
print()

print(f"{np.isnan(a_full).sum()=:d}")
print(f"{np.isnan(a_window).sum()=:d}")
print(f"{np.isnan(a_boundless).sum()=:d}")
print(f"{np.isnan(a_window_boundless).sum()=:d}")
print()
print(f"{(mask_full == 0).sum()=:d}")
print(f"{(mask_window == 0).sum()=:d}")
print(f"{(mask_boundless == 0).sum()=:d}")
print(f"{(mask_window_boundless == 0).sum()=:d}")

Output:

shape_full=(180, 360)
window=Window(col_off=1, row_off=1, width=359, height=179)

np.isnan(a_full).sum()=0
np.isnan(a_window).sum()=0
np.isnan(a_boundless).sum()=0
np.isnan(a_window_boundless).sum()=1  # ⚠️ this should be zero

(mask_full == 0).sum()=0
(mask_window == 0).sum()=0
(mask_boundless == 0).sum()=0
(mask_window_boundless == 0).sum()=0

Bug persists if using a truly boundless window larger than the raster bounds.

Bug disappears if setting the single pixel to any value other than zero (a[100, 100] = -1) or setting a nodata tag on the raster ("nodata": -9999).

Environment Information

rasterio info:
  rasterio: 1.4.2
      GDAL: 3.9.3
      PROJ: 9.4.1
      GEOS: 3.11.1
 PROJ DATA: /home/user/.local/lib/python3.12/site-packages/rasterio/proj_data
 GDAL DATA: None

System:
    python: 3.12.4 | packaged by Anaconda, Inc. | (main, Jun 18 2024, 15:12:24) [GCC 11.2.0]
executable: /opt/conda/bin/python
   machine: Linux-5.15.0-119-generic-x86_64-with-glibc2.39

Python deps:
    affine: 2.4.0
     attrs: 24.2.0
   certifi: 2024.08.30
     click: 8.1.7
     cligj: 0.7.2
    cython: None
     numpy: 2.0.2
click-plugins: None
setuptools: 69.5.1

Installation Method

pip install rasterio==1.4.2 in conda py312_24.5.0-0 in ghcr.io/osgeo/gdal:ubuntu-small-3.9.2.

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions