-
-
Notifications
You must be signed in to change notification settings - Fork 11.9k
BUG: NaT now sorts to ends of arrays #12658
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
mhvk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, this seems logical. Not sure how much of a deprecation is needed.
|
It seems currently NaT is completely ignored for min/max and argmin/argmax. I wonder if we should push to change that to be identical NaN (NaN is always returned). I guess if it that bites someone, on the up-side it probably will be loud since it points to NaT. There is also an independend bug for --- a/numpy/core/src/multiarray/arraytypes.c.src
+++ b/numpy/core/src/multiarray/arraytypes.c.src
@@ -3198,7 +3198,7 @@ static int
mp = ip[i];
i++;
}
- if (i == n) {
+ if (i - 1 == n) {
/* All NaTs: return 0 */
*min_ind = 0;
return 0;EDIT: Or probably nicer, just init it to -1 or so. The loop is not called for empty arrays. |
Agree. If the two quantities are the same, NaT, NaN or not, then one cannot be < the other. That is for sorting only and independent of their "normal" arithmetic. |
|
Needs a release note. |
4d79344 to
fc27db0
Compare
|
Added release note, Perhaps we can get away with the adjustment to other functions in separate PRs? |
|
@tylerjereddy, I think separate PRs should be fine. Just need to open an issue with 1.17.0 milestone, because I would hate to not do it all in the same release. |
fc27db0 to
547786d
Compare
|
Ok, I restored logic closer to my original instead of one of the reviewer-suggested ternary operators. It is just easier for me to read it. I did apply some of the suggested simplifications to those conditionals. I added a test for argsort stabillity with all I did notice that some of the sorting algorithms will segfault with NaT if LT implementation is not correct (i.e., some of them would segfault with one of the ternary operators suggested above). That means we don't test the NaT argsort codepaths with other sort algorithms. Do we want something crude parametrized over the sorting |
|
The other sorts will change the relative position of even identical elements, the tests for them is simply that the result be sorted. We will probably want parametrized tests at some point, especially with type specific sorts like the upcoming radix sort. Both |
|
@tylerjereddy do you want to put this in and you/I can do the rest in a followup, or should we/I continue here? Got to resolve the conflicts, but I can do that. |
|
Are we concerned about deprecation? I guess opening a tracking issue for the other things we'd need to do before 1.17 related to this may be useful. |
|
Yeah, you are right. It is a bit annoying because it will create FutureWarnings with no good way of silencing them (except manually removing the NaTs, or, I suppose, |
|
I'm happy if we want to delay this to 1.18. I think 1.17 is getting quite a lot of stuff already? |
|
Sounds OK to me, I do not remember that there was something specific that made it super pressing. Only feel it should keep in sync with the argmin/argmax issue. Moved both milestones. |
|
We need to fix up the release notes. We just opened a fixup for the (somewhat, but maybe not exactly) related fixes to min/max and argmin/argmax. So I would like to put this into 1.18. |
* added unit tests for NaT sorting to the ends of arrays * NaT now treated as largest integer for sorting purposes, so that it sorts to end of arrays for consistency with NaN
547786d to
fc99245
Compare
|
Rebased and reformated release note for 1.18 |
|
Let's put this in and see what feedback we get. |
|
Thanks Tyler. |
|
It seems this only has been done for datetime64, but not for timedelta64 (introducing another inconsistency). That might be something to still consider for 1.8 ? |
| 'mergesort' and 'stable' are mapped to radix sort for integer data types. Radix sort is an | ||
| O(n) sort instead of O(n log n). | ||
| .. versionchanged:: 1.17.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor point -- I forgot to bump this for 1.18.0?
…gh-15068) This work is a follow up of numpygh-12658. As requested in numpygh-15063, add NaT sort support for timedelta64 datatypes also. Fixes numpygh-15063
In numpy#12658 and numpy#15068 the sort ordering for datetime and timedelta was changed so that NaT sorts to the end, but the internal compare array function was not updated. Fixes numpy#19574.
…edelta In numpy#12658 and numpy#15068 the sort ordering for datetime and timedelta was changed so that NaT sorts to the end, but the internal compare array function was not updated. Fixes numpy#19574.
…edelta In numpy#12658 and numpy#15068 the sort ordering for datetime and timedelta was changed so that NaT sorts to the end, but the internal compare array function was not updated. Fixes numpy#19574.
…edelta In numpy#12658 and numpy#15068 the sort ordering for datetime and timedelta was changed so that NaT sorts to the end, but the internal compare array function was not updated. Fixes numpy#19574.
Fixes #12629, which is also referenced from a
pandasPR.If we ultimately decide that this a bug fix we want to do then probably need:
versionchangeddirective insort-related docs1.17release note of some sortNaThandling to worry about?_argminthere's a note innumpy/core/src/multiarray/arraytypes.c.srcthat "NPY_DATETIME_NAT is smaller than every other value"TYPE_COMPAREandTYPE_LTstyle functions in NumPy & I've only touched the latter here as per minimum changes for the specific issue at handOther useful notes:
numpy/core/src/multiarray/arraytypes.c.srccontains the comment regardingnp.nanvalues being sorted to the end; I initially tried making some changes in there, but it really seems to be theTYPE_LTstyle function that needs adjustment in this case