Description
Expected behavior and actual behavior.
Repeatedly using rasterio.open on the same file should eventually stop consuming more RAM.
Instead, RAM usage increases every time rasterio.open is called.
Might be related to #3241.
Steps to reproduce the problem.
Create a test tif with random data. Then repeatedly open it (thousands of times).
# Create a dummy test tif file
import numpy as np
import rasterio
profile = {
    'driver': 'GTiff',
    'dtype': 'uint8',
    'nodata': 255,
    'height': 224,
    'width': 224,
    'count': 1
}
np.random.seed(45612)
with rasterio.open('test.tif', 'w', **profile) as tif:
    tif.write(np.random.randint(0, 255, size=(1, 224, 224), dtype='uint8'))
# File is 52 kB on disk

# Repeatedly open the dummy test file and measure memory usage
import psutil
import rasterio
proc = psutil.Process()
for x in range(20):
    for y in range(5000):
        with rasterio.open('test.tif') as tif:
            pass
    print(x, f'{proc.memory_info().rss / 1024 / 1024 / 1024:5.3f}')

I have observed this same behaviour across my real tif files, too (larger, tiled, compressed). If you open the file once and then repeatedly read from it, there is no memory leak. So it seems to be something fairly fundamental to rasterio.open.
Environment Information
rasterio info:
rasterio: 1.4.2
GDAL: 3.9.3
PROJ: 9.5.0
GEOS: 3.13.0
PROJ DATA: /home/brandon/git/gff/envs/gff2/share/proj
GDAL DATA: /home/brandon/git/gff/envs/gff2/share/gdal
System:
python: 3.11.10 | packaged by conda-forge | (main, Oct 16 2024, 01:27:36) [GCC 13.3.0]
executable: /home/brandon/git/gff/envs/gff2/bin/python
machine: Linux-6.8.0-48-generic-x86_64-with-glibc2.35
Python deps:
affine: 2.4.0
attrs: 24.2.0
certifi: 2024.08.30
click: 8.1.7
cligj: 0.7.2
cython: None
numpy: 2.1.3
click-plugins: None
setuptools: 75.3.0
Installation Method
mamba and conda-forge
Across versions
Using the above environment, the snippet uses ~2.815GB of RAM.
I used Docker containers to try the above code on different rasterio versions. It seems to be a problem that has existed for quite a while, but it has gotten worse over time. The container was affected much worse than my host environment, and I'm not sure why.
# Dockerfiles like this one, but with different versions
FROM mambaorg/micromamba:2.0.3
RUN micromamba install --yes "rasterio=1.4.1" "psutil"

| | 1.3.1 | 1.3.8 | 1.3.9 | 1.3.10 | 1.3.11 | 1.4.0 | 1.4.1 | 1.4.2 |
|---|---|---|---|---|---|---|---|---|
| GDAL | 3.5.2 | 3.7.3 | 3.8.5 | 3.9.2 | 3.9.3 | 3.9.3 | 3.9.3 | 3.10.0 |
| Python | 3.10 | 3.12 | 3.12 | 3.12 | 3.13 | 3.13 | 3.13 | 3.13 |
| Numpy | 1.26.4 | 1.26.4 | 1.26.4 | 2.1.3 | 2.1.3 | 2.1.3 | 2.1.3 | 2.1.3 |
| Mem. (GB) | 0.098 | 0.140 | 0.099 | 2.825 | 2.814 | 35.011 | 35.012 | 36.807 |
In 1.3.9 the above script leaks <100MB after 100k file opens. Then in 1.3.10 that jumps to 2.8GB for 100k file opens, and in 1.4.0 it jumps again to 35GB for 100k file opens. To be clear, in all cases the total memory increases over time; probably no one noticed in <=1.3.9 because it leaked so little. Also, my current host environment behaves like 1.3.10 and 1.3.11, even though I'm using 1.4.2. It's weird, and I'm not sure what to explore from here.
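In case psutil itself is ever suspected in these measurements, the growth can be cross-checked with only the standard library. This is a sketch assuming Linux, where `ru_maxrss` is reported in kilobytes (it is bytes on macOS); note it tracks peak RSS, so it can only confirm growth, not shrinkage:

```python
import resource

def peak_rss_mb():
    # Peak resident set size of this process; kilobytes on Linux,
    # so divide by 1024 to get megabytes.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024

baseline = peak_rss_mb()
# ... run the rasterio.open loop from the snippet above here ...
growth = peak_rss_mb() - baseline
print(f'peak RSS grew by {growth:.3f} MB')
```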