BUG: NaT now sorts to ends of arrays #12658

tylerjereddy · 2019-01-03T22:07:23Z

Fixes #12629, which is also referenced from a pandas PR.

If we ultimately decide that this a bug fix we want to do then probably need:

versionchanged directive in sort-related docs
1.17 release note of some sort
any follow-ups / issues with argmin/max & so-on behaviors / consistency for NaT handling to worry about?
- for _argmin there's a note in numpy/core/src/multiarray/arraytypes.c.src that "NPY_DATETIME_NAT is smaller than every other value"
- there are separate TYPE_COMPARE and TYPE_LT style functions in NumPy & I've only touched the latter here as per minimum changes for the specific issue at hand

Other useful notes:

from around 10 years ago, 717c7ac contains the original changes to adjust nan sorting to the end, as far as I can tell; the file in question no longer exists
but numpy/core/src/multiarray/arraytypes.c.src contains the comment regarding np.nan values being sorted to the end; I initially tried making some changes in there, but it really seems to be the TYPE_LT style function that needs adjustment in this case

mhvk

To me, this seems logical. Not sure how much of a deprecation is needed.

numpy/core/src/npysort/npysort_common.h

seberg · 2019-01-05T11:22:01Z

It seems currently NaT is completely ignored for min/max and argmin/argmax. I wonder if we should push to change that to be identical NaN (NaN is always returned). I guess if it that bites someone, on the up-side it probably will be loud since it points to NaT.

There is also an independend bug for argmin when it starts with NaT and has only one not NaT item at the end:

--- a/numpy/core/src/multiarray/arraytypes.c.src
+++ b/numpy/core/src/multiarray/arraytypes.c.src
@@ -3198,7 +3198,7 @@ static int
         mp = ip[i];
         i++;
     }
-    if (i == n) {
+    if (i - 1 == n) {
         /* All NaTs: return 0 */
         *min_ind = 0;
         return 0;

EDIT: Or probably nicer, just init it to -1 or so. The loop is not called for empty arrays.

charris · 2019-01-11T14:40:37Z

It is LT, so I guess to keep sort order equal should evaluate to false

Agree. If the two quantities are the same, NaT, NaN or not, then one cannot be < the other. That is for sorting only and independent of their "normal" arithmetic.

charris · 2019-01-18T00:18:25Z

Needs a release note.

tylerjereddy · 2019-01-18T18:26:21Z

Added release note, versionchanged directive, and adopted the first ternary operator that @mhvk suggested (second one didn't pass my unit tests). Good to verify that adjustment--there was some back & forth there between reviewers.

Perhaps we can get away with the adjustment to other functions in separate PRs?

seberg · 2019-01-18T18:47:30Z

@tylerjereddy, I think separate PRs should be fine. Just need to open an issue with 1.17.0 milestone, because I would hate to not do it all in the same release.

numpy/core/src/npysort/npysort_common.h

tylerjereddy · 2019-01-18T23:17:04Z

Ok, I restored logic closer to my original instead of one of the reviewer-suggested ternary operators. It is just easier for me to read it. I did apply some of the suggested simplifications to those conditionals.

I added a test for argsort stabillity with all NaT values in an array. This seems ok as long as I force mergesort. It looks like the other stability tests also force mergesort. That does make sense of course, but I wonder about the other sorting code paths.

I did notice that some of the sorting algorithms will segfault with NaT if LT implementation is not correct (i.e., some of them would segfault with one of the ternary operators suggested above). That means we don't test the NaT argsort codepaths with other sort algorithms. Do we want something crude parametrized over the sorting kinds just to make sure there's no major issue (i.e., all-NaT array in sorts to all-NaT array out)?

charris · 2019-01-19T00:03:57Z

The other sorts will change the relative position of even identical elements, the tests for them is simply that the result be sorted. We will probably want parametrized tests at some point, especially with type specific sorts like the upcoming radix sort. Both timsort and radixsort should also be stable, so we will need to expand the tests for those. A new test_sorting.py module is looking like the way to go, that whole library has grown over the years.

seberg · 2019-02-12T20:39:28Z

@tylerjereddy do you want to put this in and you/I can do the rest in a followup, or should we/I continue here? Got to resolve the conflicts, but I can do that.

tylerjereddy · 2019-02-12T23:30:29Z

Are we concerned about deprecation? I guess opening a tracking issue for the other things we'd need to do before 1.17 related to this may be useful.

seberg · 2019-02-13T21:28:46Z

Yeah, you are right. It is a bit annoying because it will create FutureWarnings with no good way of silencing them (except manually removing the NaTs, or, I suppose, view(np.int64) is a hackish way to do it in a more or less future proof way). For min/max it is a bit "nicer" because at least the result is pretty special.
It all feels painful to deprecate, but I suppose few people will actually hit it.

tylerjereddy · 2019-06-11T19:03:21Z

I'm happy if we want to delay this to 1.18. I think 1.17 is getting quite a lot of stuff already?

seberg · 2019-06-11T22:28:07Z

Sounds OK to me, I do not remember that there was something specific that made it super pressing. Only feel it should keep in sync with the argmin/argmax issue. Moved both milestones.

seberg · 2019-10-15T22:57:58Z

We need to fix up the release notes. We just opened a fixup for the (somewhat, but maybe not exactly) related fixes to min/max and argmin/argmax. So I would like to put this into 1.18.

* added unit tests for NaT sorting to the ends of arrays * NaT now treated as largest integer for sorting purposes, so that it sorts to end of arrays for consistency with NaN

mattip · 2019-11-22T19:33:35Z

Rebased and reformated release note for 1.18

charris · 2019-11-25T23:08:23Z

Let's put this in and see what feedback we get.

charris · 2019-11-25T23:08:40Z

Thanks Tyler.

jorisvandenbossche · 2019-12-06T09:58:13Z

It seems this only has been done for datetime64, but not for timedelta64 (introducing another inconsistency). That might be something to still consider for 1.8 ?

tylerjereddy · 2019-12-07T14:01:00Z

numpy/core/fromnumeric.py

    'mergesort' and 'stable' are mapped to radix sort for integer data types. Radix sort is an
    O(n) sort instead of O(n log n).

+    .. versionchanged:: 1.17.0


Minor point -- I forgot to bump this for 1.18.0?

) This work is a follow up of gh-12658. As requested in gh-15063, add NaT sort support for timedelta64 datatypes also. Fixes gh-15063

…gh-15068) This work is a follow up of numpygh-12658. As requested in numpygh-15063, add NaT sort support for timedelta64 datatypes also. Fixes numpygh-15063

In numpy#12658 and numpy#15068 the sort ordering for datetime and timedelta was changed so that NaT sorts to the end, but the internal compare array function was not updated. Fixes numpy#19574.

…edelta In numpy#12658 and numpy#15068 the sort ordering for datetime and timedelta was changed so that NaT sorts to the end, but the internal compare array function was not updated. Fixes numpy#19574.

tylerjereddy added 00 - Bug component: numpy._core component: numpy.datetime64 (and timedelta64) labels Jan 3, 2019

mhvk reviewed Jan 5, 2019

View reviewed changes

numpy/core/src/npysort/npysort_common.h Outdated Show resolved Hide resolved

numpy/core/src/npysort/npysort_common.h Outdated Show resolved Hide resolved

numpy/core/src/npysort/npysort_common.h Outdated Show resolved Hide resolved

charris added the 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes label Jan 18, 2019

tylerjereddy force-pushed the NaT_sort_order branch from 4d79344 to fc27db0 Compare January 18, 2019 18:23

tylerjereddy added this to the 1.17.0 release milestone Jan 18, 2019

tylerjereddy mentioned this pull request Jan 18, 2019

MAINT: NaT (arg)min/max behavior #12797

Closed

charris reviewed Jan 18, 2019

View reviewed changes

numpy/core/src/npysort/npysort_common.h Outdated Show resolved Hide resolved

tylerjereddy force-pushed the NaT_sort_order branch from fc27db0 to 547786d Compare January 18, 2019 23:08

seberg modified the milestones: 1.17.0 release, 1.18.0 release Jun 11, 2019

tylerjereddy and others added 2 commits November 22, 2019 11:31

BUG: NaT now sorts to end of arrays

c251a51

* added unit tests for NaT sorting to the ends of arrays * NaT now treated as largest integer for sorting purposes, so that it sorts to end of arrays for consistency with NaN

DOC: add release note

fc99245

mattip force-pushed the NaT_sort_order branch from 547786d to fc99245 Compare November 22, 2019 19:33

mattip removed the 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes label Nov 24, 2019

charris merged commit d051ab5 into numpy:master Nov 25, 2019

seberg mentioned this pull request Dec 6, 2019

BUG: NaT not sorted to end for timedelta #15063

Closed

tirthasheshpatel mentioned this pull request Dec 7, 2019

ENH: Add support to sort timedelta64 NaT to end of the array #15068

Merged

tylerjereddy commented Dec 7, 2019

View reviewed changes

seberg pushed a commit that referenced this pull request Dec 11, 2019

ENH: Add support to sort timedelta64 NaT to end of the array (gh-15068

578f4e7

) This work is a follow up of gh-12658. As requested in gh-15063, add NaT sort support for timedelta64 datatypes also. Fixes gh-15063

charris mentioned this pull request Dec 15, 2019

ENH: Add support to sort timedelta64 NaT to end of the array #15111

Merged

asmeurer mentioned this pull request Aug 3, 2021

BUG: Fix NaT handling in the PyArray_CompareFunc for datetime and timedelta #19607

Merged

charris mentioned this pull request Oct 10, 2021

BUG: Fix NaT handling in the PyArray_CompareFunc for datetime and timedelta #20081

Merged

Uh oh!

BUG: NaT now sorts to ends of arrays #12658

BUG: NaT now sorts to ends of arrays #12658

Uh oh!

Conversation

tylerjereddy commented Jan 3, 2019 • edited by seberg Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mhvk left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

seberg commented Jan 5, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

charris commented Jan 11, 2019

Uh oh!

charris commented Jan 18, 2019

Uh oh!

tylerjereddy commented Jan 18, 2019

Uh oh!

seberg commented Jan 18, 2019

Uh oh!

Uh oh!

tylerjereddy commented Jan 18, 2019

Uh oh!

charris commented Jan 19, 2019

Uh oh!

seberg commented Feb 12, 2019

Uh oh!

tylerjereddy commented Feb 12, 2019

Uh oh!

seberg commented Feb 13, 2019

Uh oh!

tylerjereddy commented Jun 11, 2019

Uh oh!

seberg commented Jun 11, 2019

Uh oh!

seberg commented Oct 15, 2019

Uh oh!

mattip commented Nov 22, 2019

Uh oh!

charris commented Nov 25, 2019

Uh oh!

charris commented Nov 25, 2019

Uh oh!

jorisvandenbossche commented Dec 6, 2019

Uh oh!

tylerjereddy Dec 7, 2019

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

tylerjereddy commented Jan 3, 2019 •

edited by seberg

Loading

seberg commented Jan 5, 2019 •

edited

Loading