Description
Hi.
We are experiencing severe memory leaks when calling dataset.read(window=window) in rasterio 1.4.2 on Linux with tiled GeoTIFFs. The problem seems to occur only when the tiled=True option is used. It is most severe on Linux, but it also appears to be present to a lesser extent on Windows. I'm not sure whether this is related to another issue I opened in the GDAL repo (OSGeo/gdal#11164), although that problem mostly occurs on Windows.
Expected behavior and actual behavior.
When reading from a large GeoTIFF dataset using windows, memory should be released once the variable holding the data is garbage collected. For example, this loop:

    for window in tqdm(windows):
        with rasterio.open(ds_name, "r") as ds:
            data = ds.read(window=window)

just keeps increasing memory usage on the data = ds.read(window=window) line.
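One way to check whether the growth comes from Python objects that are still referenced, rather than from native allocations made outside the Python heap (for example inside GDAL), is the standard-library tracemalloc module. The sketch below is only an illustration: python_heap_growth and step are hypothetical names that are not part of rasterio, and step stands in for one iteration of the loop above.

```python
import gc
import tracemalloc


def python_heap_growth(step, n_iterations):
    """Return the change in Python-tracked bytes after calling step(i) n_iterations times.

    `step` is a hypothetical callable standing in for one windowed read.
    """
    gc.collect()
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    for i in range(n_iterations):
        step(i)
    gc.collect()  # drop anything that is only waiting for the collector
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before
```

If this number stays roughly flat while the process's memory usage keeps rising, the growth is probably happening in native code that tracemalloc cannot see, which would be consistent with the arrays themselves being freed correctly.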
Memory profiler of sample code:

Steps to reproduce the problem.
The following code should reproduce the issue on Linux:

    import rasterio
    import rasterio.windows
    from tqdm import tqdm
    import numpy as np
    import rasterio.crs

    ds_name = "test.tif"
    dim = 64000
    win_dim = 1024
    num_columns = dim // win_dim


    def create_dataset():
        transform = [
            1, 0, 0, 0, 1, 0
        ]
        spatial_ref = rasterio.crs.CRS.from_epsg(4326)
        print("Creating test tiff")
        with rasterio.open(ds_name, "w", driver="GTiff", width=dim, height=dim,
                           count=3, dtype=np.uint8, crs=spatial_ref,
                           transform=transform, blockxsize=win_dim,
                           blockysize=win_dim, tiled=True, bigtiff="yes",
                           compress="lzw") as ds:
            for i in tqdm(range(0, num_columns ** 2)):
                x = i % num_columns
                y = i // num_columns
                x_offset = x * win_dim
                y_offset = y * win_dim
                data_np = (np.random.rand(3, win_dim, win_dim) * 255).astype(np.uint8)
                window = rasterio.windows.Window(x_offset, y_offset, win_dim, win_dim)
                ds.write(data_np, window=window)


    def read_dataset():
        with rasterio.open(ds_name, "r") as ds:
            windows = [window for _, window in ds.block_windows()]
        print("Windowed reading")
        for window in tqdm(windows):
            with rasterio.open(ds_name, "r") as ds:
                data = ds.read(window=window)


    if __name__ == "__main__":
        create_dataset()
        read_dataset()

create_dataset() is optional. It seems like any large tiled GeoTiff will introduce the problem.
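To put numbers on the growth without an external profiler, the resident set size of the process can be sampled while the loop runs. This is a minimal, Linux-only sketch that reads /proc/self/statm; current_rss_bytes, sample_rss, step and every are hypothetical names introduced here for illustration, and step would be a small function that opens ds_name and reads the i-th window.

```python
import gc
import os


def current_rss_bytes():
    """Resident set size of this process in bytes (Linux only, via /proc)."""
    with open("/proc/self/statm") as f:
        resident_pages = int(f.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE")


def sample_rss(step, n_iterations, every=100):
    """Call step(i) n_iterations times and return a list of (i, rss_bytes) samples.

    `step` is a hypothetical callable standing in for one windowed read.
    """
    samples = []
    for i in range(n_iterations):
        step(i)
        if i % every == 0:
            gc.collect()  # rule out garbage that is merely uncollected
            samples.append((i, current_rss_bytes()))
    return samples
```

A steadily rising second column across the samples would confirm the behaviour described above; a curve that climbs and then levels off would instead point towards a cache filling up rather than a leak.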
Environment Information
rasterio info:
  rasterio: 1.4.2
  GDAL: 3.9.3
  PROJ: 9.5.0
  GEOS: 3.13.0
  PROJ DATA: /home/ear/miniconda3/envs/akson-310/share/proj
  GDAL DATA: /home/ear/miniconda3/envs/akson-310/share/gdal
System:
  python: 3.10.15 | packaged by conda-forge | (main, Oct 16 2024, 01:24:24) [GCC 13.3.0]
  executable: /home/ear/miniconda3/envs/akson-310/bin/python
  machine: Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
Python deps:
  affine: 2.4.0
  attrs: 24.2.0
  certifi: 2024.08.30
  click: 8.1.7
  cligj: 0.7.2
  cython: None
  numpy: 1.26.4
  click-plugins: None
  setuptools: 75.3.0
Note: the environment above is from WSL2 Linux, but we have also observed the problem on Amazon Linux on AWS EC2.
Installation Method
Installed from conda-forge.