Skip to content

FlwdirRaster.stream crashed on a ~44000x20000 raster on a machine with 16 GB or less #44

@vzagorovskiy

Description

@vzagorovskiy

pyFlwDir version checks

  • I have checked that this issue has not already been reported.
  • I have checked that this bug exists on the latest version of pyFlwDir.

Reproducible Example

import os.path
from datetime import datetime

import geopandas as gpd
import numpy as np
import rasterio
import pyflwdir
filename = r"c:\temp\hand\altaisk_krai_30m.tif"
min_sto = 6

start_time = datetime.now()
start_time_global = start_time
print(f"Read file: {filename}")
print(f"Start time: {start_time:%Y-%m-%d %H:%M:%S}")
with rasterio.open(filename, "r") as src:
    elevtn = src.read(1)
    nodata = src.nodata
    transform = src.transform
    crs = src.crs
    extent = np.array(src.bounds)[[0, 2, 1, 3]]
    latlon = src.crs.is_geographic
    prof = src.profile
end_time = datetime.now()
print(f"End time: {end_time:%Y-%m-%d %H:%M:%S}")
print(f"Duration: {end_time - start_time}")

start_time = datetime.now()
print(f"Calculate flow directions")
print(f"Start time: {start_time:%Y-%m-%d %H:%M:%S}")

flw = pyflwdir.from_dem(
    data=elevtn,
    nodata=src.nodata,
    transform=transform,
    latlon=latlon,
    outlets="edge",
)

print(f"End time: {datetime.now():%Y-%m-%d %H:%M:%S}")
print(f"Duration: {datetime.now() - start_time}")

start_time = datetime.now()
print(f"Calculate stream network")
print(f"Start time: {datetime.now():%Y-%m-%d %H:%M:%S}")

feats = flw.streams(min_sto=4)

print(f"End time: {datetime.now():%Y-%m-%d %H:%M:%S}")
print(f"Duration: {datetime.now() - start_time}")

start_time = datetime.now()
print(f"Vectorize stream network")
print(f"Start time: {datetime.now():%Y-%m-%d %H:%M:%S}")
gdf = gpd.GeoDataFrame.from_features(feats, crs=crs)
print(f"End time: {datetime.now():%Y-%m-%d %H:%M:%S}")
print(f"Duration: {datetime.now() - start_time}")

start_time = datetime.now()
print(f"Write stream network to file")
print(f"Start time: {datetime.now():%Y-%m-%d %H:%M:%S}")
gdf.to_file(os.path.splitext(filename)[0] + '.shp')
print(f"End time: {datetime.now():%Y-%m-%d %H:%M:%S}")
print(f"Duration: {datetime.now() - start_time}")

print(f"Total duration: {datetime.now() - start_time_global}")

Current behaviour

You cannot use rasters larger than ~44000x20000 on a computer with 16gb of memory or less

image

image

Crashed here on this line of my script

feats = flw.streams(min_sto=4)

pyflwdir.pyflwdir.FlwdirRaster.streams

pyflwdir/pyflwdir.py:977

...
        if xs is None or ys is None:
            idxs0 = np.arange(self.size, dtype=np.intp)
->            xs, ys = gis.idxs_to_coords(idxs0, self.transform, self.shape)
...

pyflwdir/gis_utils.py:222

...
->    xs, ys = transform * transform.translation(coff, roff) * (cols, rows)
...

Desired behaviour

Do not create the whole array with pixel coordinates at once, but only those that are necessary to form geometry in pyflwdir.gis_utils.features

Additional context

I tried to specify the coordinate accuracy to float32

pyflwdir/gis_utils.py:207

pyflwdir.gis_utils.xy

rows, cols = np.asarray(rows, dtype=np.float32), np.asarray(cols, dtype=np.float32)

But I also got the error

image

I tried to specify the coordinate accuracy to float16

pyflwdir/gis_utils.py:207

pyflwdir.gis_utils.xy

rows, cols = np.asarray(rows, dtype=np.float16), np.asarray(cols, dtype=np.float16)

I was able to do the calculations, but the accuracy was very poor

image

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions