
Memory leak in rasterio.open #3250

@Multihuntr

Expected behavior and actual behavior.

Repeatedly using rasterio.open on the same file should eventually stop consuming more RAM.

Instead, RAM usage increases every time rasterio.open is called.

Might be related to #3241.

Steps to reproduce the problem.

Create a test tif with random data. Then repeatedly open it (thousands of times).

# Create a dummy test tif file
import numpy as np
import rasterio

profile = {
    'driver': 'GTiff',
    'dtype': 'uint8',
    'nodata': 255,
    'height': 224,
    'width': 224,
    'count': 1
}
np.random.seed(45612)
with rasterio.open('test.tif', 'w', **profile) as tif:
    # Generate uint8 values directly so the array dtype matches the profile
    tif.write(np.random.randint(0, 255, size=(1, 224, 224), dtype='uint8'))
# File is 52kB large

# Repeatedly open the dummy test file and measure memory usage
import psutil
import rasterio

proc = psutil.Process()
for x in range(20):
    for y in range(5000):
        with rasterio.open('test.tif') as tif:
            pass
    print(x, f'{proc.memory_info().rss / 1024 / 1024 / 1024:5.3f}')

I have observed this same behaviour across my real tif files, too (larger, tiled, compressed). If you just open the file once, then repeatedly read from it, there is no memory leak. So it seems to be something fairly fundamental to rasterio.open.
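To make the comparison between "reopen every time" and "open once, read many times" explicit, here is a minimal sketch of how the two could be measured side by side. The helper names `measure_growth` and `compare_open_strategies` are made up for illustration, and unlike the script above the reopening variant also reads band 1, so that both variants do the same work apart from the open itself.

```python
import gc


def measure_growth(action, iterations, get_rss):
    """Call action() `iterations` times and return the RSS change in bytes.

    get_rss is any zero-argument callable returning the current resident
    set size, e.g. lambda: psutil.Process().memory_info().rss
    """
    gc.collect()  # drop collectable Python garbage before sampling
    before = get_rss()
    for _ in range(iterations):
        action()
    gc.collect()
    return get_rss() - before


def compare_open_strategies(path, iterations, get_rss):
    """Return (growth when reopening each time, growth when reusing one handle)."""
    import rasterio  # imported lazily so measure_growth works without it

    def reopen_each_time():
        with rasterio.open(path) as tif:
            tif.read(1)

    reopened = measure_growth(reopen_each_time, iterations, get_rss)

    with rasterio.open(path) as tif:
        reused = measure_growth(lambda: tif.read(1), iterations, get_rss)

    return reopened, reused


# Example usage (assumes psutil is installed and test.tif exists):
#   import psutil
#   proc = psutil.Process()
#   print(compare_open_strategies('test.tif', 5000,
#                                 lambda: proc.memory_info().rss))
```

If the leak is specific to rasterio.open, the first number should keep growing with the iteration count while the second stays close to zero.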

Environment Information

rasterio info:
  rasterio: 1.4.2
      GDAL: 3.9.3
      PROJ: 9.5.0
      GEOS: 3.13.0
 PROJ DATA: /home/brandon/git/gff/envs/gff2/share/proj
 GDAL DATA: /home/brandon/git/gff/envs/gff2/share/gdal

System:
    python: 3.11.10 | packaged by conda-forge | (main, Oct 16 2024, 01:27:36) [GCC 13.3.0]
executable: /home/brandon/git/gff/envs/gff2/bin/python
   machine: Linux-6.8.0-48-generic-x86_64-with-glibc2.35

Python deps:
    affine: 2.4.0
     attrs: 24.2.0
   certifi: 2024.08.30
     click: 8.1.7
     cligj: 0.7.2
    cython: None
     numpy: 2.1.3
click-plugins: None
setuptools: 75.3.0

Installation Method

mamba and conda-forge

Across versions

Using the above environment, the snippet uses ~2.815GB of RAM.

I used Docker containers to try the above code on different rasterio versions. The problem seems to have existed for quite a while, but it has gotten worse over time. The 1.4.2 container had it much worse than my host environment did (36.807GB versus ~2.815GB), and I'm not sure why.

# Dockerfiles like this one, but with different versions
FROM mambaorg/micromamba:2.0.3
RUN micromamba install --yes "rasterio=1.4.1" "psutil"

rasterio    1.3.1   1.3.8   1.3.9   1.3.10  1.3.11  1.4.0   1.4.1   1.4.2
GDAL        3.5.2   3.7.3   3.8.5   3.9.2   3.9.3   3.9.3   3.9.3   3.10.0
Python      3.10    3.12    3.12    3.12    3.13    3.13    3.13    3.13
Numpy       1.26.4  1.26.4  1.26.4  2.1.3   2.1.3   2.1.3   2.1.3   2.1.3
Mem. (GB)   0.098   0.140   0.099   2.825   2.814   35.011  35.012  36.807

In 1.3.9, the above script leaks less than 100MB after 100k file opens. In 1.3.10 that jumps to 2.8GB for 100k file opens, and in 1.4.0 it jumps again to 35GB. To be clear, total memory increases over time in all cases; probably no one noticed in 1.3.9 and earlier because it leaked so little. Also, my host environment behaves like the 1.3.10 and 1.3.11 containers, even though it uses 1.4.2. It's weird, and I'm not sure what to explore from here.
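For a sense of scale, the totals in the table above can be converted into an approximate growth per call to rasterio.open. This is only a rough sketch. It assumes the reported figures are GiB (the script divides by 1024 three times), and it treats the whole final RSS as leaked, so the interpreter's baseline memory is included and each result is an upper bound.

```python
OPENS = 20 * 5000  # outer loop x inner loop of the reproduction script

# Final RSS per rasterio version, copied from the table above.
final_rss_gib = {
    "1.3.1": 0.098,
    "1.3.8": 0.140,
    "1.3.9": 0.099,
    "1.3.10": 2.825,
    "1.3.11": 2.814,
    "1.4.0": 35.011,
    "1.4.1": 35.012,
    "1.4.2": 36.807,
}


def kib_per_open(rss_gib, opens=OPENS):
    """Upper bound on memory growth per open, in KiB."""
    return rss_gib * 1024 * 1024 / opens


per_open = {version: kib_per_open(gib) for version, gib in final_rss_gib.items()}

for version, kib in per_open.items():
    print(f"rasterio {version:>6}: ~{kib:6.1f} KiB per open")
```

That works out to roughly 1 KiB per open up to 1.3.9, about 30 KiB for 1.3.10 and 1.3.11, and about 367 KiB or more from 1.4.0 onwards, which is several times the size of the 52kB test file.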
