Conversation
|
@groutr what do you think? Looks like we have a bad interaction between Python and Numpy math on a particular platform. |
|
Very interesting. I can take a deeper look at it tomorrow when I'm back at a computer. |
|
After thoroughly reading the bug reports more in-depth, I'm a big 👎 on this solution. I doubt this is an issue of mixing Python and Numpy and seems to point to the BLAS/LAPACK library. I don't think universally reverting a big improvement of rasterio 1.4 is the correct way to resolve this. If we are adamant about the issue being fixed in rasterio, I think a less intrusive fix would be to limit the precision by rounding before applying the floor operation. There may be value in first scrutinizing how BLAS/LAPACK are being packaged on opensuse as mentioned in https://bugzilla.suse.com/show_bug.cgi?id=1236116#c5. If the test suite for BLAS/LAPACK is having issues, then everything that relies on BLAS/LAPACK will also be affected. @lpechacek the reason numpy is used for matrix multiplication here is because, without it, the process of transforming any large number of points is unbearably slow. |
Understood. I also originally thought that the problem is in BLAS and how Numpy uses it. That is why I originally titled my openSUSE report as an "unreliable calculation". After reading comments from colleagues whose opinion I value, I changed my mind and accepted the idea that there are different kinds of "floating point number" (beyond
Whatever solves the issue. :)
Well, we can work on openSUSE's BLAS/LAPACK but it won't help Debian. I haven't researched on the impact on distributions but I guess that all Linux distributions that support x86 (32 bit) are affected.
Thanks for the explanation! |
Pardon me, I fail to understand what "any large number of points" means. I tested the performance of Mind that I just a packager and know essentially nothing about the Rasterio module and related topics. Still I've got CS education and when I see an inner-loop routine that performs complicated preparation steps, I know that its performance will be degraded for individual values. Then I expect some economies of scale from processing large vectors (here I used sets of 300-3M coordinates) but I fail to detect them too. In any case, the performance for single value lookup (e.g. For the record, my testing procedure: The single-point lookup: |
|
If we performed this computation in a way that avoids directly computing an inverse, we can test if the accuracy errors are being introduced by the inverse transform. Since I don't have 32bit hardware to test on, I'm curious to see what the following results in. import numpy as np
import rasterio
# This is the transform from the RGB.byte.tif test file.
test_file = rasterio.open("./tests/data/RGB.byte.tif")
transform = np.array(test_file.transform).reshape(3,3)
bounds = test_file.bounds
height = test_file.height
width = test_file.width
# transform = [300.0379266750948,0.0,101985.0,0.0,-300.041782729805, 2826915.0, 0.0, 0.0, 1.0]
# bounds = (101985.0, 2611485.0, 339315.0, 2826915.0)
# height = 791
# width = 718
pxs = np.array([[bounds[0], bounds[3]], [bounds[2], bounds[3]], [bounds[2], bounds[1]], [bounds[0], bounds[1]]])
coords = np.linalg.solve(transform, pxs) # solve for the coordinates, but without directly inverting the transform
expected = np.array([[0, width, width, 0], [0, 0, height, height]])
np.allclose(np.floor(coords[:2]), expected)If that test passes, then perhaps AffineTransformer could be improved by not relying on affine to directly compute an inverse, which would be preferred. class AffineTransformer(TransformerBase):
"""A pure Python class related to affine based coordinate transformations."""
def __init__(self, affine_transform):
super().__init__()
if not isinstance(affine_transform, Affine):
raise ValueError("Not an affine transform")
self._transformer = affine_transform
self._transformer_arr = np.array(affine_transform, dtype='float64').reshape(3, 3)
def _transform(self, xs, ys, zs, transform_direction):
bi = np.broadcast(xs, ys)
input_matrix = np.empty((3, bi.size))
input_matrix[0] = bi.iters[0]
input_matrix[1] = bi.iters[1]
input_matrix[2] = 1
if transform_direction is TransformDirection.forward:
transformed = np.matmul(self._transformer_arr, input_matrix)
elif transform_direction is TransformDirection.reverse:
transformed = np.linalg.solve(self._transformer_arr, input_matrix)
return transformed[0], transformed[1] |
|
Thanks for the proposal!
Triggers a run time error:
This makes the tests pass. The patch is https://build.opensuse.org/projects/home:LPechacek:branches:Application:Geo/packages/python-rasterio/files/v1.4.3-transform-hotfix.patch?expand=1 The build results, which include running the tests, are at https://build.opensuse.org/package/show/home:LPechacek:branches:Application:Geo/python-rasterio . |
|
@lpechacek Thank you for testing the np.linalg.solve method! I'm relieved to hear that it passes the tests. I'm a +1 on that method as I think it's a more mathematically and numerically sound approach. @sgillies I can submit the PR later this evening. |
Resolves #3310