Skip to content

Add conditional to set _mv variable to np.uint64 data type#47

Closed
robgpita wants to merge 2 commits intoDeltares:mainfrom
robgpita:Flwdir-uint64
Closed

Add conditional to set _mv variable to np.uint64 data type#47
robgpita wants to merge 2 commits intoDeltares:mainfrom
robgpita:Flwdir-uint64

Conversation

@robgpita
Copy link
Copy Markdown
Contributor

@robgpita robgpita commented May 7, 2024

Allow the processing of large rasters, shaped at (63708, 79780), for Flwdir object.

Issue addressed

Closes #46

Explanation

As described in issue #46, when large rasters are used, an IndexError was thrown from underlying numba code.

The upstream_count function was the cause of the Indexing Error, due to an incorrect value set in the _mv variable when using a np.uint64 data type.

Checklist

  • Updated tests or added new tests
  • Branch is up to date with main
  • Updated documentation if needed
  • Updated CHANGELOG.rst if needed

Additional Notes (optional)

The setting of the np.uint64 dtype in pyflwdir.py was not accounted for, and caused non-correct values in the _mv variable, when assigned in core.py. This was causing the C pointer references to not line up with the signed integer type that was being used in python. See numpy docs for np.intp()

The downstream effects

self._mv = core._mv
if idxs_ds.dtype == np.uint32:
self._mv = np.uint32(self._mv)
if idxs_ds.dtype == np.uint64:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just appears to be a dtype limitation?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fernando-aristizabal not exactly. The dtype assigned to self._mv needs to be the same as the dtype for idxs.ds. Those two dtypes need to line up, otherwise there were errors in numba code. In the downstream code, incorrect aliasing was occurring based on _mv's dtype. This was causing a mis-match in values between _mv and items in the idxs.ds array, which ultimately led to the IndexErrors being thrown.

In order to avoid the IndexErrors, for the large array indices case, where the idxs.ds.dtype is set to uint64, the dtype for _mv needs to be set to uint64 as well.

@DirkEilander DirkEilander mentioned this pull request Dec 19, 2024
4 tasks
@DirkEilander
Copy link
Copy Markdown
Contributor

@robgpita and @fernando-aristizabal Many thanks for looking into this and proposing a solution. I've had to create a new branch with your commits in #56 to be able to merge it. With that merge of that branch I'm also closing this PR. Happy holidays!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

FlwdirRaster.accuflux() segmentation faults with a raster ~ (63000, 80000) on machine with >1Tb RAM

3 participants