np.histogram() inconsistent results #7628
Description
I think that #6100 may have introduced a bug in some edge cases. Here is a case found through fuzz testing:
|35> import numpy as np
|36> np.__version__
'1.10.4'
|37> arr = np.array([337, 404, 739, 806, 1007, 1811, 2012])
|39> hist, edges = np.histogram(arr, bins=8296, range=(2, 2280))
|40> mask = hist > 0
|41> left_edges = edges[:-1][mask]
|42> right_edges = edges[1:][mask]
|43> zip(arr, left_edges, right_edges)
[(337, 337.00000000000006, 337.27459016393448),
(404, 404.00000000000006, 404.27459016393448),
(739, 739.00000000000011, 739.27459016393448),
(806, 806.00000000000011, 806.27459016393448),
(1007, 1007.0000000000001, 1007.2745901639345),
(1811, 1811.0000000000002, 1811.2745901639346),
(2012, 2012.0000000000002, 2012.2745901639346)]
In every row, the value was counted in a bin whose reported left edge is slightly greater than the value itself, so the counts and the returned edges disagree.
This was found through fuzz testing an internal accelerated histogram routine (which takes roughly the same strategy as #6100 but is implemented in Cython) and comparing the results to np.histogram(). Against numpy 1.9.2, the results were identical. As of numpy 1.10.4 at least, edge cases like the one above started to appear.
I believe the main difference between #6100 and our routine is that we do not precompute the histogram scaling factor bins / (mx - mn): https://github.com/numpy/numpy/blob/master/numpy/lib/function_base.py#L638
We compute ((tmp_a - mn) * bins) / (mx - mn) instead. The comments in our routine cite floating-point problems as the reason for deliberately avoiding the precomputation.
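To make the difference between the two strategies concrete, here is a minimal sketch that computes the bin index both ways for the array above. It is an illustration only, not the actual code from #6100 or from our Cython routine. The names norm, idx_precomputed and idx_multiply_first are made up for this example, and building the edges with np.linspace is an assumption about how they are constructed. Note that for this input the bin width is 2278 / 8296 = 67 / 244 and every arr - 2 is a multiple of 67, so each value sits mathematically exactly on a bin boundary. That is why a rounding error of one ULP is enough to matter.

```python
import numpy as np

arr = np.array([337, 404, 739, 806, 1007, 1811, 2012])
bins = 8296
mn, mx = 2.0, 2280.0

tmp_a = arr.astype(float)

# Strategy A: precompute the scaling factor, then multiply.
# The division is rounded once, and the product is rounded again.
norm = bins / (mx - mn)
idx_precomputed = ((tmp_a - mn) * norm).astype(np.intp)

# Strategy B: multiply first, divide last.
# Here (tmp_a - mn) * bins is an exactly representable integer, and the
# quotient is an exact integer too, so no rounding error is introduced.
idx_multiply_first = (((tmp_a - mn) * bins) / (mx - mn)).astype(np.intp)

# Assumption: edges form a uniform grid built with linspace.
edges = np.linspace(mn, mx, bins + 1)

# Print both indices next to the left edge of the multiply-first bin, so any
# disagreement between an index and the edge array is easy to spot.
for value, ia, ib in zip(arr, idx_precomputed, idx_multiply_first):
    print(value, ia, ib, repr(edges[ib]))
```

Whether strategy A lands one bin low for a given value, and whether the linspace edge comes out slightly above the value, depends on the rounding of each individual operation, so the printed output is worth checking on the numpy version in question rather than assuming.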