Skip to content

Comments

Eliminate boundless reads in merge, new compositing implementation#3234

Merged
sgillies merged 7 commits intomaint-1.4from
issue3228
Nov 6, 2024
Merged

Eliminate boundless reads in merge, new compositing implementation#3234
sgillies merged 7 commits intomaint-1.4from
issue3228

Conversation

@sgillies
Copy link
Member

@sgillies sgillies commented Nov 5, 2024

Standardizing window alignment for use in compositing allows us to avoid large virtual sources and operate on smaller arrays when merging. This improves performance by ~50x for the case of many small datasets merged into one large dataset.

Only one test needed adjustment in the end.

Resolves #3228

data = src.read(1)
assert data[184, 61] != 0
assert data[183, 61] != 0
assert data[184, 60] != 0
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SImilar story here. The gap we're testing for remains out of the picture, but the merged raster has shifted.

assert result.shape == (3, 719, 792)
assert numpy.allclose(data.mean(), result[:, :-1, :-1].mean())
assert numpy.allclose(
data[data != 0].mean(), result[result != 0].mean(), rtol=1e-3
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe implement a difference hash?

FWIW, the result looks fine in QGIS:

Untitled

@groutr
Copy link
Contributor

groutr commented Nov 5, 2024

Nice. I'm getting good performance with the case in #3228

starting merge with rasterio 1.3.9
merging took 0.04064512252807617 seconds
--------------------------------
starting merge with rasterio 1.4.3dev
merging took 0.03887510299682617 seconds

Combines the effects of Window.round_offsets and
Window.round_lengths, producing the same effect as gdal_merge.py
continue

temp_shape = (src_count, chunk.height, chunk.width)
cw = cw.align()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New! This does the kind of aligning in gdal_merge.py.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the new method. I noticed in .align(), 0.1 is added to the offsets, however, 0.001 is added to the offsets in .round_offsets().

temp_shape = (src_count, chunk.height, chunk.width)
cw = cw.align()
rows, cols = cw.toslices()
region = dest[:, rows, cols]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a smaller region for copyto() is the bulk of our speedup, as you pointed out @groutr.

d_x = math.floor(self.col_off + 0.1)
d_h = math.floor(self.height + 0.5)
d_w = math.floor(self.width + 0.5)
return Window(d_x, d_y, d_w, d_h)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As in gdal_merge.py and Window.round_offsets() and Window.round_lengths().

@sgillies
Copy link
Member Author

sgillies commented Nov 6, 2024

@groutr @snowman2 @vincentsarago this is ready to go. It's correct and fast.

Copy link
Member

@snowman2 snowman2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sgillies 👍

Also increase the fraction used in round_offsets() to be
consistent with align()
@groutr
Copy link
Contributor

groutr commented Nov 6, 2024

@sgillies I'm not sure how to mark this as approved, but I'm +1. Liked removing the lambda in the rounding methods.

height = math.floor(self.height + 0.5)
width = math.floor(self.width + 0.5)
return Window(col_off, row_off, width, height)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not adding this in 1.4.3, but will in 1.5.0.

crs="EPSG:32611",
transform=t2,
) as src:
src.write(numpy.ones((1, 2, 2)) * 2 - 1j)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean up of some illegible code.

@sgillies sgillies merged commit 4f32241 into maint-1.4 Nov 6, 2024
@sgillies sgillies deleted the issue3228 branch November 6, 2024 18:08
@groutr
Copy link
Contributor

groutr commented Nov 6, 2024

Does this also resolve #3196?

sgillies added a commit that referenced this pull request Dec 2, 2024
* Eliminate boundless reads in merge, new compositing implementation (#3234)

* Eliminate boundless reads in merge, new compositing implementation

Resolves #3228

* Fix nits

* Compare non-zero mean of arrays

* Implement and use Window.align()

Combines the effects of Window.round_offsets and
Window.round_lengths, producing the same effect as gdal_merge.py

* Remove unused math import

* Add more docs about align()

Also increase the fraction used in round_offsets() to be
consistent with align()

* Move align() to a private func, use two existing methods in tests

* Add a test for merging WarpedVRTs (#3237)

Resolves #3196

* Backport of #3217 (#3243)

* Backport of #3217

* Update change log

* Increment GDAL and Python versions for CI (#3244)

* Rewrite _matches() to better support to_authority() and to_epsg() (#3255)

* Rewrite _matches() to better support to_authority() and to_epsg()

Resolves #3239

* Remove list() call and update change log

* Use to_epsg() in is_epsg_code() (#3258)

* Use to_epsg() in is_epsg_code()

Resolves #3248

* Update change log

* Allow MRF compression to surface in properties (#3259)

* Allow MRF compression to surface in properties

Resolves #3256

* Update change log

* Register drivers at most once per process (#3260)

* Register drivers at most once per process

Resolves #3250

* Appease flake8

* Add a note about GDAL_SKIP in the change log

* Support all GDALFillNodata() options in rasterio.fill (#3265)

* Support all GDALFillNodata() options

Resolve #3175.

* Cast values to str and update docs

* Update change log

* Prevent rasterio from trying to open a dataset object (#3266)

Resolves #3105.

* Fix typos discovered by codespell (#3264) (#3267)

* Fix typos discovered by codespell

* crasher



---------

Co-authored-by: Christian Clauss <[email protected]>

* Fix erroneous masking of 0-valued data (#3268)

* Fix erroneous masking of 0-valued data

Resolves #3245

* Add an assertion about data values and update change log

* This is 1.4.3

---------

Co-authored-by: Christian Clauss <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants