Revert to using non-Numy affine transformation operation by sgillies · Pull Request #3313 · rasterio/rasterio

sgillies · 2025-02-17T23:36:18Z

Resolves #3310

sgillies · 2025-02-17T23:37:13Z

@groutr what do you think? Looks like we have a bad interaction between Python and Numpy math on a particular platform.

groutr · 2025-02-18T00:29:49Z

Very interesting. I can take a deeper look at it tomorrow when I'm back at a computer.

groutr · 2025-02-18T17:18:00Z

After thoroughly reading the bug reports more in-depth, I'm a big 👎 on this solution. I doubt this is an issue of mixing Python and Numpy and seems to point to the BLAS/LAPACK library. I don't think universally reverting a big improvement of rasterio 1.4 is the correct way to resolve this.

If we are adamant about the issue being fixed in rasterio, I think a less intrusive fix would be to limit the precision by rounding before applying the floor operation. There may be value in first scrutinizing how BLAS/LAPACK are being packaged on opensuse as mentioned in https://bugzilla.suse.com/show_bug.cgi?id=1236116#c5. If the test suite for BLAS/LAPACK is having issues, then everything that relies on BLAS/LAPACK will also be affected.

@lpechacek the reason numpy is used for matrix multiplication here is because, without it, the process of transforming any large number of points is unbearably slow.

lpechacek · 2025-02-20T09:19:10Z

After thoroughly reading the bug reports more in-depth, I'm a big 👎 on this solution. I doubt this is an issue of mixing Python and Numpy and seems to point to the BLAS/LAPACK library. I don't think universally reverting a big improvement of rasterio 1.4 is the correct way to resolve this.

Understood. I also originally thought that the problem is in BLAS and how Numpy uses it. That is why I originally titled my openSUSE report as an "unreliable calculation". After reading comments from colleagues whose opinion I value, I changed my mind and accepted the idea that there are different kinds of "floating point number" (beyond float, double, and long double).

If we are adamant about the issue being fixed in rasterio, I think a less intrusive fix would be to limit the precision by rounding before applying the floor operation.

Whatever solves the issue. :)

There may be value in first scrutinizing how BLAS/LAPACK are being packaged on opensuse as mentioned in https://bugzilla.suse.com/show_bug.cgi?id=1236116#c5. If the test suite for BLAS/LAPACK is having issues, then everything that relies on BLAS/LAPACK will also be affected.

Well, we can work on openSUSE's BLAS/LAPACK but it won't help Debian. I haven't researched on the impact on distributions but I guess that all Linux distributions that support x86 (32 bit) are affected.

@lpechacek the reason numpy is used for matrix multiplication here is because, without it, the process of transforming any large number of points is unbearably slow.

Thanks for the explanation!

lpechacek · 2025-02-20T12:10:53Z

@lpechacek the reason numpy is used for matrix multiplication here is because, without it, the process of transforming any large number of points is unbearably slow.

Thanks for the explanation!

Pardon me, I fail to understand what "any large number of points" means. I tested the performance of rasterio.io index() and xy() and in all cases I got shorter processing times with the patch from this pull request than with the Numpy implementation.

Mind that I just a packager and know essentially nothing about the Rasterio module and related topics. Still I've got CS education and when I see an inner-loop routine that performs complicated preparation steps, I know that its performance will be degraded for individual values. Then I expect some economies of scale from processing large vectors (here I used sets of 300-3M coordinates) but I fail to detect them too.

In any case, the performance for single value lookup (e.g. index() does not accept vectors) is better with the patch here. Just saying. :)

For the record, my testing procedure:

user@vbox:~/rasterio-1.4.3> PYTHONPATH=build/lib.linux-i686-cpython-311 python3 
Python 3.11.11 (main, Dec 04 2024, 21:44:34) [GCC] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> import numpy as np
>>> import rasterio
>>> src = rasterio.open('tests/data/RGB.byte.tif')
>>> timeit.timeit(lambda: src.xy(np.random.random_sample(300), np.random.random_sample(300)), number = 100000)
10.638185453999995
>>> timeit.timeit(lambda: src.xy(np.random.random_sample(300), np.random.random_sample(300)), number = 100000)
10.657873143000415
>>> timeit.timeit(lambda: src.xy(np.random.random_sample(300), np.random.random_sample(300)), number = 100000)
10.645034958999986
>>> 
user@vbox:~/rasterio-1.4.3> wget --quiet https://github.com/rasterio/rasterio/pull/3313/commits/7608a04567c9960d626b6836ae99b2665e978cee.patch
user@vbox:~/rasterio-1.4.3> patch build/lib.linux-i686-cpython-311/rasterio/transform.py < 7608a04567c9960d626b6836ae99b2665e978cee.patch
patching file build/lib.linux-i686-cpython-311/rasterio/transform.py
Hunk #1 FAILED at 6.
1 out of 1 hunk FAILED -- saving rejects to file build/lib.linux-i686-cpython-311/rasterio/transform.py.rej
patching file build/lib.linux-i686-cpython-311/rasterio/transform.py
user@vbox:~/rasterio-1.4.3> # The patch rejects is the changelog entry
user@vbox:~/rasterio-1.4.3> PYTHONPATH=build/lib.linux-i686-cpython-311 python3 
Python 3.11.11 (main, Dec 04 2024, 21:44:34) [GCC] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> import numpy as np
>>> import rasterio
>>> src = rasterio.open('tests/data/RGB.byte.tif')
>>> timeit.timeit(lambda: src.xy(np.random.random_sample(300), np.random.random_sample(300)), number = 100000)
8.270643490000111
>>> timeit.timeit(lambda: src.xy(np.random.random_sample(300), np.random.random_sample(300)), number = 100000)
8.358940623999843
>>> timeit.timeit(lambda: src.xy(np.random.random_sample(300), np.random.random_sample(300)), number = 100000)
8.308591027000148
>>> 
>>> # Maybe the 300-item vector is too short and Rasterio will choke on a larger vector?
>>> timeit.timeit(lambda: src.xy(np.random.random_sample(30000), np.random.random_sample(30000)), number = 1000)
3.994692398999632
>>> timeit.timeit(lambda: src.xy(np.random.random_sample(3000000), np.random.random_sample(3000000)), number = 1)
0.295577516999856
>>>

The single-point lookup:

user@vbox:~/rasterio-1.4.3> cat timing.py 
import timeit
import rasterio

with rasterio.open('tests/data/RGB.byte.tif') as src:
    left, bottom, right, top = src.bounds
    print(timeit.timeit(lambda: src.index(left, top), number=500000))

user@vbox:~/rasterio-1.4.3> PYTHONPATH=build/lib.linux-i686-cpython-311 python3 timing.py 
34.141054769999755
user@vbox:~/rasterio-1.4.3> patch build/lib.linux-i686-cpython-311/rasterio/transform.py < 7608a04567c9960d626b6836ae99b2665e978cee.patch
patching file build/lib.linux-i686-cpython-311/rasterio/transform.py
Hunk #1 FAILED at 6.
1 out of 1 hunk FAILED -- saving rejects to file build/lib.linux-i686-cpython-311/rasterio/transform.py.rej
patching file build/lib.linux-i686-cpython-311/rasterio/transform.py
user@vbox:~/rasterio-1.4.3> PYTHONPATH=build/lib.linux-i686-cpython-311 python3 timing.py 
27.806108064
user@vbox:~/rasterio-1.4.3> echo $[100*28/34]
82

groutr · 2025-02-24T18:03:25Z

If we performed this computation in a way that avoids directly computing an inverse, we can test if the accuracy errors are being introduced by the inverse transform. Since I don't have 32bit hardware to test on, I'm curious to see what the following results in.

import numpy as np
import rasterio
# This is the transform from the RGB.byte.tif test file.
test_file = rasterio.open("./tests/data/RGB.byte.tif")
transform = np.array(test_file.transform).reshape(3,3)
bounds = test_file.bounds
height = test_file.height
width = test_file.width
# transform = [300.0379266750948,0.0,101985.0,0.0,-300.041782729805, 2826915.0, 0.0, 0.0, 1.0]
# bounds = (101985.0, 2611485.0, 339315.0, 2826915.0)
# height = 791
# width = 718

pxs = np.array([[bounds[0], bounds[3]], [bounds[2], bounds[3]], [bounds[2], bounds[1]], [bounds[0], bounds[1]]])
coords = np.linalg.solve(transform, pxs)   # solve for the coordinates, but without directly inverting the transform
expected = np.array([[0, width, width, 0], [0, 0, height, height]])
np.allclose(np.floor(coords[:2]), expected)

If that test passes, then perhaps AffineTransformer could be improved by not relying on affine to directly compute an inverse, which would be preferred.

class AffineTransformer(TransformerBase):
    """A pure Python class related to affine based coordinate transformations."""
    def __init__(self, affine_transform):
        super().__init__()
        if not isinstance(affine_transform, Affine):
            raise ValueError("Not an affine transform")
        self._transformer = affine_transform
        self._transformer_arr = np.array(affine_transform, dtype='float64').reshape(3, 3)

    def _transform(self, xs, ys, zs, transform_direction):
        bi = np.broadcast(xs, ys)
        input_matrix = np.empty((3, bi.size))
        input_matrix[0] = bi.iters[0]
        input_matrix[1] = bi.iters[1]
        input_matrix[2] = 1

        if transform_direction is TransformDirection.forward:
            transformed = np.matmul(self._transformer_arr, input_matrix) 
        elif transform_direction is TransformDirection.reverse:
            transformed = np.linalg.solve(self._transformer_arr, input_matrix)
        return transformed[0], transformed[1]

lpechacek · 2025-02-26T16:47:01Z

Thanks for the proposal!

I'm curious to see what the following results in.

Triggers a run time error:

lpechacek@fmn:/data/src/rasterio> cat > /tmp/rasterio-test-script.py
import numpy as np
import rasterio
# This is the transform from the RGB.byte.tif test file.
test_file = rasterio.open("./tests/data/RGB.byte.tif")
transform = np.array(test_file.transform).reshape(3,3)
bounds = test_file.bounds
height = test_file.height
width = test_file.width
# transform = [300.0379266750948,0.0,101985.0,0.0,-300.041782729805, 2826915.0, 0.0, 0.0, 1.0]
# bounds = (101985.0, 2611485.0, 339315.0, 2826915.0)
# height = 791
# width = 718

pxs = np.array([[bounds[0], bounds[3]], [bounds[2], bounds[3]], [bounds[2], bounds[1]], [bounds[0], bounds[1]]])
coords = np.linalg.solve(transform, pxs)   # solve for the coordinates, but without directly inverting the transform
expected = np.array([[0, width, width, 0], [0, 0, height, height]])
np.allclose(np.floor(coords[:2]), expected)
lpechacek@fmn:/data/src/rasterio> python3 /tmp/rasterio-test-script.py
Traceback (most recent call last):
  File "/tmp/rasterio-test-script.py", line 15, in <module>
    coords = np.linalg.solve(transform, pxs)   # solve for the coordinates, but without directly inverting the transform
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/site-packages/numpy/linalg/_linalg.py", line 413, in solve
    r = gufunc(a, b, signature=signature)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: solve: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (m,m),(m,n)->(m,n) (size 4 is different from 3)

If that test passes, then perhaps AffineTransformer could be improved by not relying on affine to directly compute an inverse, which would be preferred.

This makes the tests pass. The patch is https://build.opensuse.org/projects/home:LPechacek:branches:Application:Geo/packages/python-rasterio/files/v1.4.3-transform-hotfix.patch?expand=1

The build results, which include running the tests, are at https://build.opensuse.org/package/show/home:LPechacek:branches:Application:Geo/python-rasterio .

groutr · 2025-02-26T22:33:09Z

@lpechacek Thank you for testing the np.linalg.solve method! I'm relieved to hear that it passes the tests. I'm a +1 on that method as I think it's a more mathematically and numerically sound approach.

@sgillies I can submit the PR later this evening.

Revert to using non-Numy affine transformation operation

7608a04

Resolves #3310

sgillies self-assigned this Feb 17, 2025

groutr mentioned this pull request Feb 27, 2025

Use Lapack solve to more accurately compute affine reverse transforms. #3315

Merged

sgillies closed this Feb 27, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments

Revert to using non-Numy affine transformation operation#3313

Revert to using non-Numy affine transformation operation#3313
sgillies wants to merge 1 commit intomaint-1.4from
issue3310

sgillies commented Feb 17, 2025

Uh oh!

sgillies commented Feb 17, 2025

Uh oh!

groutr commented Feb 18, 2025

Uh oh!

groutr commented Feb 18, 2025 •

edited

Loading

Uh oh!

lpechacek commented Feb 20, 2025 •

edited

Loading

Uh oh!

lpechacek commented Feb 20, 2025

Uh oh!

groutr commented Feb 24, 2025 •

edited

Loading

Uh oh!

lpechacek commented Feb 26, 2025

Uh oh!

groutr commented Feb 26, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Comments

Conversation

sgillies commented Feb 17, 2025

Uh oh!

sgillies commented Feb 17, 2025

Uh oh!

groutr commented Feb 18, 2025

Uh oh!

groutr commented Feb 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lpechacek commented Feb 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lpechacek commented Feb 20, 2025

Uh oh!

groutr commented Feb 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lpechacek commented Feb 26, 2025

Uh oh!

groutr commented Feb 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

groutr commented Feb 18, 2025 •

edited

Loading

lpechacek commented Feb 20, 2025 •

edited

Loading

groutr commented Feb 24, 2025 •

edited

Loading

groutr commented Feb 26, 2025 •

edited

Loading