Skip to content

Conversation

@tylerjereddy
Copy link
Contributor

@tylerjereddy tylerjereddy commented Jan 3, 2019

Fixes #12629, which is also referenced from a pandas PR.

If we ultimately decide that this a bug fix we want to do then probably need:

  • versionchanged directive in sort-related docs
  • 1.17 release note of some sort
  • any follow-ups / issues with argmin/max & so-on behaviors / consistency for NaT handling to worry about?
    • for _argmin there's a note in numpy/core/src/multiarray/arraytypes.c.src that "NPY_DATETIME_NAT is smaller than every other value"
    • there are separate TYPE_COMPARE and TYPE_LT style functions in NumPy & I've only touched the latter here as per minimum changes for the specific issue at hand

Other useful notes:

  • from around 10 years ago, 717c7ac contains the original changes to adjust nan sorting to the end, as far as I can tell; the file in question no longer exists
  • but numpy/core/src/multiarray/arraytypes.c.src contains the comment regarding np.nan values being sorted to the end; I initially tried making some changes in there, but it really seems to be the TYPE_LT style function that needs adjustment in this case

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, this seems logical. Not sure how much of a deprecation is needed.

@seberg
Copy link
Member

seberg commented Jan 5, 2019

It seems currently NaT is completely ignored for min/max and argmin/argmax. I wonder if we should push to change that to be identical NaN (NaN is always returned). I guess if it that bites someone, on the up-side it probably will be loud since it points to NaT.

There is also an independend bug for argmin when it starts with NaT and has only one not NaT item at the end:

--- a/numpy/core/src/multiarray/arraytypes.c.src
+++ b/numpy/core/src/multiarray/arraytypes.c.src
@@ -3198,7 +3198,7 @@ static int
         mp = ip[i];
         i++;
     }
-    if (i == n) {
+    if (i - 1 == n) {
         /* All NaTs: return 0 */
         *min_ind = 0;
         return 0;

EDIT: Or probably nicer, just init it to -1 or so. The loop is not called for empty arrays.

@charris
Copy link
Member

charris commented Jan 11, 2019

It is LT, so I guess to keep sort order equal should evaluate to false

Agree. If the two quantities are the same, NaT, NaN or not, then one cannot be < the other. That is for sorting only and independent of their "normal" arithmetic.

@charris charris added the 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes label Jan 18, 2019
@charris
Copy link
Member

charris commented Jan 18, 2019

Needs a release note.

@tylerjereddy
Copy link
Contributor Author

Added release note, versionchanged directive, and adopted the first ternary operator that @mhvk suggested (second one didn't pass my unit tests). Good to verify that adjustment--there was some back & forth there between reviewers.

Perhaps we can get away with the adjustment to other functions in separate PRs?

@seberg
Copy link
Member

seberg commented Jan 18, 2019

@tylerjereddy, I think separate PRs should be fine. Just need to open an issue with 1.17.0 milestone, because I would hate to not do it all in the same release.

@tylerjereddy tylerjereddy added this to the 1.17.0 release milestone Jan 18, 2019
@tylerjereddy
Copy link
Contributor Author

Ok, I restored logic closer to my original instead of one of the reviewer-suggested ternary operators. It is just easier for me to read it. I did apply some of the suggested simplifications to those conditionals.

I added a test for argsort stabillity with all NaT values in an array. This seems ok as long as I force mergesort. It looks like the other stability tests also force mergesort. That does make sense of course, but I wonder about the other sorting code paths.

I did notice that some of the sorting algorithms will segfault with NaT if LT implementation is not correct (i.e., some of them would segfault with one of the ternary operators suggested above). That means we don't test the NaT argsort codepaths with other sort algorithms. Do we want something crude parametrized over the sorting kinds just to make sure there's no major issue (i.e., all-NaT array in sorts to all-NaT array out)?

@charris
Copy link
Member

charris commented Jan 19, 2019

The other sorts will change the relative position of even identical elements, the tests for them is simply that the result be sorted. We will probably want parametrized tests at some point, especially with type specific sorts like the upcoming radix sort. Both timsort and radixsort should also be stable, so we will need to expand the tests for those. A new test_sorting.py module is looking like the way to go, that whole library has grown over the years.

@seberg
Copy link
Member

seberg commented Feb 12, 2019

@tylerjereddy do you want to put this in and you/I can do the rest in a followup, or should we/I continue here? Got to resolve the conflicts, but I can do that.

@tylerjereddy
Copy link
Contributor Author

Are we concerned about deprecation? I guess opening a tracking issue for the other things we'd need to do before 1.17 related to this may be useful.

@seberg
Copy link
Member

seberg commented Feb 13, 2019

Yeah, you are right. It is a bit annoying because it will create FutureWarnings with no good way of silencing them (except manually removing the NaTs, or, I suppose, view(np.int64) is a hackish way to do it in a more or less future proof way). For min/max it is a bit "nicer" because at least the result is pretty special.
It all feels painful to deprecate, but I suppose few people will actually hit it.

@tylerjereddy
Copy link
Contributor Author

I'm happy if we want to delay this to 1.18. I think 1.17 is getting quite a lot of stuff already?

@seberg seberg modified the milestones: 1.17.0 release, 1.18.0 release Jun 11, 2019
@seberg
Copy link
Member

seberg commented Jun 11, 2019

Sounds OK to me, I do not remember that there was something specific that made it super pressing. Only feel it should keep in sync with the argmin/argmax issue. Moved both milestones.

@seberg
Copy link
Member

seberg commented Oct 15, 2019

We need to fix up the release notes. We just opened a fixup for the (somewhat, but maybe not exactly) related fixes to min/max and argmin/argmax. So I would like to put this into 1.18.

tylerjereddy and others added 2 commits November 22, 2019 11:31
* added unit tests for NaT sorting
to the ends of arrays

* NaT now treated as largest integer
for sorting purposes, so that it sorts
to end of arrays for consistency with
NaN
@mattip
Copy link
Member

mattip commented Nov 22, 2019

Rebased and reformated release note for 1.18

@mattip mattip removed the 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes label Nov 24, 2019
@charris
Copy link
Member

charris commented Nov 25, 2019

Let's put this in and see what feedback we get.

@charris charris merged commit d051ab5 into numpy:master Nov 25, 2019
@charris
Copy link
Member

charris commented Nov 25, 2019

Thanks Tyler.

@jorisvandenbossche
Copy link
Contributor

It seems this only has been done for datetime64, but not for timedelta64 (introducing another inconsistency). That might be something to still consider for 1.8 ?

'mergesort' and 'stable' are mapped to radix sort for integer data types. Radix sort is an
O(n) sort instead of O(n log n).
.. versionchanged:: 1.17.0
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor point -- I forgot to bump this for 1.18.0?

seberg pushed a commit that referenced this pull request Dec 11, 2019
)

This work is a follow up of gh-12658.
As requested in gh-15063, add NaT sort support for timedelta64 datatypes also.

Fixes gh-15063
charris pushed a commit to charris/numpy that referenced this pull request Dec 15, 2019
…gh-15068)

This work is a follow up of numpygh-12658.
As requested in numpygh-15063, add NaT sort support for timedelta64 datatypes also.

Fixes numpygh-15063
asmeurer added a commit to asmeurer/numpy that referenced this pull request Aug 3, 2021
In numpy#12658 and numpy#15068 the sort ordering for datetime and timedelta was changed
so that NaT sorts to the end, but the internal compare array function was not
updated.

Fixes numpy#19574.
asmeurer added a commit to asmeurer/numpy that referenced this pull request Aug 3, 2021
…edelta

In numpy#12658 and numpy#15068 the sort ordering for datetime and timedelta was changed
so that NaT sorts to the end, but the internal compare array function was not
updated.

Fixes numpy#19574.
scratchmex pushed a commit to scratchmex/numpy that referenced this pull request Aug 13, 2021
…edelta

In numpy#12658 and numpy#15068 the sort ordering for datetime and timedelta was changed
so that NaT sorts to the end, but the internal compare array function was not
updated.

Fixes numpy#19574.
charris pushed a commit to charris/numpy that referenced this pull request Oct 10, 2021
…edelta

In numpy#12658 and numpy#15068 the sort ordering for datetime and timedelta was changed
so that NaT sorts to the end, but the internal compare array function was not
updated.

Fixes numpy#19574.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

NaT and extended sort order

6 participants