[WIP] BUG make _weighted_percentile(data, ones, 50) consistent with numpy.median(data) #17377
glemaitre wants to merge 52 commits into scikit-learn:main
I was wondering about this when I was working on sample weights in HGBT, so I'm all in :) If we have one Cython version, then we could use it in other places such as HGBT as well.
So I think that I am converging here (the tests against NumPy pass). Not sure that we actually need
I really rely on NumPy there. If we are fine internally in the HGBDT to use the
if last_y_pred is not None:
    assert_array_almost_equal(last_y_pred, y_pred)

    assert_allclose(last_y_pred, y_pred)
@lucyleeow I added your test here. Using the nearest strategy everywhere seems to be the winner here, since it keeps the sample_weight semantics right.
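The sample_weight semantics referred to here is presumably the usual convention that an integer weight k should behave like repeating the sample k times. A minimal sketch of that check follows. The helper `percentile_from_cdf` is hypothetical and uses an inverted-CDF rule that always returns an observed value; it is a stand-in for the strategy discussed in this comment, not the code in this PR.

```python
import numpy as np


def percentile_from_cdf(data, sample_weight, percentile=50):
    """Hypothetical helper: return the first sorted sample whose cumulative
    weight reaches the requested fraction of the total weight."""
    data = np.asarray(data, dtype=float)
    sample_weight = np.asarray(sample_weight, dtype=float)
    order = np.argsort(data)
    x, cdf = data[order], np.cumsum(sample_weight[order])
    idx = np.searchsorted(cdf, percentile / 100.0 * cdf[-1], side="left")
    # Guard against floating-point overshoot at the upper end.
    return x[min(idx, x.size - 1)]


data = np.array([10.0, 20.0, 30.0, 40.0])
weights = np.array([1, 3, 2, 4])  # integer weights, read as repetition counts

# Weighted percentile on the original samples...
weighted = percentile_from_cdf(data, weights, 50)

# ...versus the same rule on the explicitly repeated samples with unit weights.
repeated = np.repeat(data, weights)
unweighted = percentile_from_cdf(repeated, np.ones(repeated.size), 50)
```

Under these assumptions `weighted` and `unweighted` coincide (both are 30.0). A rule that interpolates between neighbouring values can break this equivalence, because the repeated array does not have the same neighbours as the weighted one.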
I merged a couple of things into this PR to be able to check the integration in the gradient boosting as well. If the tests pass, I would propose cutting this PR into 3 parts:
So this is no longer for review?
No, I think that I need to cut it into smaller pieces because the diff is too large. I don't want to make the reviewing process even harder. I will take care of it this week.
See #17370 (comment)
closes #17370
closes #6189
Add a new parameter to select the type of interpolation. By default, the median in NumPy and our implementation should be the same.
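
To illustrate the intent, here is a minimal sketch, not the code in this PR. The function name `weighted_percentile_sketch`, its `interpolation` argument, and the two supported values are assumptions made for illustration; the real private helper may differ in its signature and in how it handles ties or zero weights.

```python
import numpy as np


def weighted_percentile_sketch(data, sample_weight, percentile=50,
                               interpolation="linear"):
    """Hypothetical weighted percentile; assumes strictly positive weights."""
    data = np.asarray(data, dtype=float)
    sample_weight = np.asarray(sample_weight, dtype=float)
    order = np.argsort(data)
    x, w = data[order], sample_weight[order]
    cdf = np.cumsum(w)
    q = percentile / 100.0

    if interpolation == "lower":
        # Inverted CDF: first sample whose cumulative weight reaches q * total.
        idx = np.searchsorted(cdf, q * cdf[-1], side="left")
        return x[min(idx, x.size - 1)]

    if interpolation == "linear":
        if x.size == 1:
            return x[0]
        # With unit weights the positions are i / (n - 1), the grid on which
        # NumPy's default linear percentile interpolates.
        positions = (cdf - w) / (cdf[-1] - w[-1])
        return np.interp(q, positions, x)

    raise ValueError(f"unknown interpolation: {interpolation!r}")


data = np.array([4.0, 1.0, 3.0, 2.0])
ones = np.ones_like(data)
median_linear = weighted_percentile_sketch(data, ones, 50)
median_lower = weighted_percentile_sketch(data, ones, 50, interpolation="lower")
```

With unit weights, `median_linear` agrees with `np.median(data)` (2.5 for this even-length input), while `median_lower` returns an observed value (2.0). Making the linear rule the default is what would keep the two medians consistent in this sketch.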