Skip to content

FlwdirRaster.accuflux() segmentation faults with a raster ~ (63000, 80000) on machine with >1Tb RAM #46

@robgpita

Description

@robgpita

pyFlwDir version checks

  • I have checked that this issue has not already been reported.
  • I have checked that this bug exists on the latest version of pyFlwDir.

Reproducible Example

import numpy as np
import rasterio as rio
import pyflwdir 

def accumulate_flow(
    flow_direction_filename,
    headwaters_filename
):

   with rio.open(flow_direction_filename) as src:
        data = src.read(1)
        nodata = src.nodata
        profile = src.profile

    temp = np.ndarray(shape=np.shape(data), dtype=np.uint8)

    temp[data == 1] = 1
    temp[data == 2] = 128
    temp[data == 3] = 64
    temp[data == 4] = 32
    temp[data == 5] = 16
    temp[data == 6] = 8
    temp[data == 7] = 4
    temp[data == 8] = 2
    temp[data == nodata] = 247

    flw = pyflwdir.from_array(temp, ftype='d8', check_ftype=False)

    del temp

    # Read the flow direction raster
    with rio.open(headwaters_filename) as src:
        headwaters = src.read(1)
        nodata = src.nodata

    flowaccum = flw.accuflux(headwaters, nodata=nodata, direction='up')

    ...

Current behaviour

In trying to accumulate flow (accuflux) on larger rasters generated from 1m LiDAR Data, Segmentation Faults are occurring.
The specifics:
1m LiDAR flow direction file & headwaters file (same size raster)

>>> np.shape(data)
(63708, 79780)

flow_direction_filename is 253M
head_waters_filename is 66M

Rasters generated from 3m LiDAR data will not seg fault, and process successfully.

>>> np.shape(data)
(21236, 26593)

flow_direction_filename is 107M
head_waters_filename is 7.4M

When the Python script is called from a shell script, an Exit Status of 139 is observed. Further debugging:

export PYTHONFAULTHANDLER=1 
root@f7f7e3af5d8b:/# python3 $srcDir/accumulate_headwaters.py -fd flowdir_d8_burned_filled_0.tif -wg headwaters_0.tif

Fatal Python error: Segmentation fault

Current thread 0x00007ff9300d8000 (most recent call first):
  File "/usr/local/lib/python3.10/dist-packages/pyflwdir/flwdir.py", line 215 in order_cells
  File "/usr/local/lib/python3.10/dist-packages/pyflwdir/pyflwdir.py", line 272 in idxs_seq
  File "/usr/local/lib/python3.10/dist-packages/pyflwdir/flwdir.py", line 555 in accuflux
  File "/accumulate_headwaters.py", line 73 in accumulate_flow
  File "/accumulate_headwaters.py", line 110 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, yaml._yaml, numba.core.typeconv._typeconv, numba._helperlib, numba._dynfunc, numba._dispatcher, numba.core.runtime._nrt_python, numba.np.ufunc._internal, numba.experimental.jitclass._box, scipy._lib._ccallback_c, _cffi_backend, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg._cythonized_array_utils, scipy.linalg._flinalg, scipy.linalg._solve_toeplitz, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_lapack, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.sparse.linalg._isolve._iterative, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, scipy.integrate._odepack, scipy.integrate._quadpack, scipy.integrate._vode, scipy.integrate._dop, scipy.integrate._lsoda, scipy.optimize._minpack2, scipy.optimize._group_columns, scipy._lib.messagestream, scipy.optimize._trlib._trlib, numpy.linalg.lapack_lite, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize.__nnls, scipy.optimize._highs.cython.src._highs_wrapper, scipy.optimize._highs._highs_wrapper, scipy.optimize._highs.cython.src._highs_constants, scipy.optimize._highs._highs_constants, scipy.linalg._interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.spatial._ckdtree, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.spatial.transform._rotation, scipy.optimize._direct, scipy.ndimage._nd_image, _ni_label, scipy.ndimage._ni_label, rasterio._version, rasterio._err, rasterio._filepath, rasterio._env, _brotli, rasterio._transform, rasterio._base, rasterio.crs, rasterio._features, rasterio._warp, rasterio._io (total: 98)
Segmentation fault (core dumped)

Naively, the source code was modified (call to order_cells) hardcoding method="sort". This did not work. As seen above, line 215 in order_cells, calls core.idxs_seq which appears to be the root of the problem. No further investigation has been made past this point.

Desired behaviour

Ideally larger rasters would process without segmentation faults. If not, the exception could potentially be handled a little more elegantly from python with a message stating that the raster is too big to process, or....

Another option might entail providing documentation/examples to users on how to split larger rasters into chunks, and then provide the tool/utility to join/concatenate 'blocked' or 'chunked' rasters back into a single pyflwdir.FlwdirRaster object once processing (whether it be accuflux or other FlwdirRaster methods) is finished.

Additional context

Memory usage was tracked, and it was observed that less than 1/3 of the available RAM was in use when the segmentation fault occurred.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions