Tiled GTiff: Windowed read memory leak in latest rasterio version #3241

@EmilRybergAkson

Description

Hi.

We are experiencing severe memory leaks when calling dataset.read(window=window) in rasterio 1.4.2 on Linux with tiled GeoTiffs. The problem only seems to occur when the file is created with the tiled=True option. It is most severe on Linux, but it also appears to a lesser extent on Windows. I'm not sure whether this is related to another issue I opened in the GDAL repo (OSGeo/gdal#11164), although that problem mostly occurs on Windows.

Expected behavior and actual behavior.

When reading from a large GeoTiff dataset using windows, memory should be released once the returned array is garbage collected. E.g.

for window in tqdm(windows):
    with rasterio.open(ds_name, "r") as ds:
        data = ds.read(window=window)

This loop keeps increasing memory usage on the data = ds.read(window=window) line.

[Image: memory profiler output for the sample code]
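For anyone who wants numbers rather than a profiler screenshot, the growth could be tracked with something like the following. This is only a minimal sketch using the standard library: `peak_rss_kib`, `track_peak_rss` and `step` are hypothetical names, and it assumes Linux, where `ru_maxrss` is reported in KiB. Peak RSS is used instead of tracemalloc because tracemalloc only sees allocations made through Python's allocator, not memory allocated inside native libraries.

```python
import resource


def peak_rss_kib():
    """Return this process's peak resident set size (KiB on Linux)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def track_peak_rss(step, iterations, report_every=100):
    """Call step(i) repeatedly and sample peak RSS at regular intervals.

    `step` is a hypothetical callable; for this issue it would wrap
    rasterio.open(ds_name) followed by ds.read(window=windows[i]).
    """
    samples = []
    for i in range(iterations):
        step(i)
        if i % report_every == 0 or i == iterations - 1:
            samples.append((i, peak_rss_kib()))
    return samples


if __name__ == "__main__":
    # Stand-in workload: allocate and free a buffer the size of one
    # 3 x 1024 x 1024 uint8 tile. Peak RSS should level off quickly here;
    # a leaking read would keep climbing instead.
    def step(i):
        buf = bytearray(3 * 1024 * 1024)
        del buf

    for i, rss in track_peak_rss(step, 500):
        print(f"iteration {i}: peak RSS {rss} KiB")
```

Replacing the stand-in `step` with the windowed read from the loop above should show whether peak RSS plateaus or keeps growing with the number of windows read.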

Steps to reproduce the problem.

The following code should reproduce the issue on Linux:

import rasterio
import rasterio.windows
from tqdm import tqdm
import numpy as np
import rasterio.crs


ds_name = "test.tif"
dim = 64000
win_dim = 1024
num_columns = dim // win_dim

def create_dataset():
    transform = [
        1, 0, 0, 0, 1, 0
    ]
    spatial_ref = rasterio.crs.CRS.from_epsg(4326)

    print("Creating test tiff")
    with rasterio.open(
        ds_name, "w", driver="GTiff", width=dim, height=dim, count=3,
        dtype=np.uint8, crs=spatial_ref, transform=transform,
        blockxsize=win_dim, blockysize=win_dim, tiled=True,
        bigtiff="yes", compress="lzw",
    ) as ds:
        for i in tqdm(range(0, num_columns ** 2)):
            x = i % num_columns
            y = i // num_columns
            x_offset = x * win_dim
            y_offset = y * win_dim

            data_np = (np.random.rand(3, win_dim, win_dim) * 255).astype(np.uint8)
            window = rasterio.windows.Window(x_offset, y_offset, win_dim, win_dim)
            ds.write(data_np, window=window)


def read_dataset():
    with rasterio.open(ds_name, "r") as ds:
        windows = [window for _, window in ds.block_windows()]

    print("Windowed reading")
    for window in tqdm(windows):
        with rasterio.open(ds_name, "r") as ds:
            data = ds.read(window=window)

if __name__ == "__main__":
    create_dataset()
    read_dataset()

create_dataset() is optional. Any large tiled GeoTiff seems to trigger the problem.
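A side note on the arithmetic in the script, in case it matters to anyone reproducing this: 64000 is not a multiple of 1024, so create_dataset() only writes full tiles and leaves a strip along the right and bottom edges unwritten. The sketch below just works through the numbers. It assumes the tiled file has one extra, partial block per row and column, which is probably unrelated to the memory growth.

```python
import math

dim = 64000      # raster width and height from the script above
win_dim = 1024   # tile size from the script above

full_tiles = dim // win_dim              # full tiles per row written by create_dataset()
blocks = math.ceil(dim / win_dim)        # blocks per row, counting the partial edge block
remainder = dim - full_tiles * win_dim   # pixels per row that are never written

print(full_tiles, blocks, remainder)  # prints: 62 63 512
```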

Environment Information

rasterio info:
  rasterio: 1.4.2
      GDAL: 3.9.3
      PROJ: 9.5.0
      GEOS: 3.13.0
 PROJ DATA: /home/ear/miniconda3/envs/akson-310/share/proj
 GDAL DATA: /home/ear/miniconda3/envs/akson-310/share/gdal

System:
    python: 3.10.15 | packaged by conda-forge | (main, Oct 16 2024, 01:24:24) [GCC 13.3.0]
executable: /home/ear/miniconda3/envs/akson-310/bin/python
   machine: Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.35

Python deps:
    affine: 2.4.0
     attrs: 24.2.0
   certifi: 2024.08.30
     click: 8.1.7
     cligj: 0.7.2
    cython: None
     numpy: 1.26.4
click-plugins: None
setuptools: 75.3.0

Note: the environment above is from WSL2 Linux, but we have also observed the problem on Amazon Linux on AWS EC2.

Installation Method

Installed from conda-forge.
