[WIP] BUG make _weighted_percentile(data, ones, 50) consistent with numpy.median(data) by glemaitre · Pull Request #17377 · scikit-learn/scikit-learn

glemaitre · 2020-05-28T18:11:07Z

Add a new parameter to select the type of interpolation. By default, the median in NumPy and our implementation should be the same.
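A minimal sketch of the mismatch named in the title. `weighted_percentile_lower` is a hypothetical helper written for this illustration, not the code in sklearn/utils/stats.py; it assumes a cumulative-weight definition with no interpolation.

```python
import numpy as np


def weighted_percentile_lower(data, weights, percentile=50):
    """Hypothetical helper: smallest value whose cumulative weight reaches the target."""
    data = np.asarray(data, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(data)
    sorted_data = data[order]
    cum_weights = np.cumsum(weights[order])
    target = percentile / 100.0 * cum_weights[-1]
    # First position where the cumulative weight is >= target.
    idx = min(np.searchsorted(cum_weights, target), len(sorted_data) - 1)
    return sorted_data[idx]


data = np.array([1.0, 2.0, 3.0, 4.0])
unweighted_median = np.median(data)  # averages the two middle values
weighted_median = weighted_percentile_lower(data, np.ones_like(data))  # picks a data point
print(unweighted_median, weighted_median)
```

With an even number of samples and unit weights, the two results differ, which is the kind of inconsistency an interpolation parameter could address.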

sklearn/utils/stats.py

adrinjalali · 2020-05-29T09:30:02Z

I was wondering about this when I was working on sample weights on HGBT. So I'm all in :)

If we have one Cython version, then we could use it in other places such as HGBT as well.

sklearn/utils/stats.py

glemaitre · 2020-05-29T09:55:27Z

> I was wondering about this when I was working on sample weights on HGBT. So I'm all in :)

So I think that I am converging here (the tests against NumPy pass). I am not sure that we actually need "lower" and "higher". Basically, "nearest" should be fast, and "linear" would match the NumPy default and allow interpolation.

> If we have one cython version, then we could use it in other places such as HGBT as well.

I really rely on NumPy there. If we are fine with using "nearest" internally in the HGBDT, I am not sure that we would gain much by cythonizing.
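To make the "linear" versus "nearest" distinction concrete, here is a small sketch using `np.percentile` directly. The wrapper `percentile` is a hypothetical convenience for this example; it assumes the keyword is `method` on recent NumPy releases and `interpolation` on older ones.

```python
import numpy as np


def percentile(data, q, kind):
    """Call np.percentile with whichever keyword the installed NumPy accepts."""
    try:
        return np.percentile(data, q, method=kind)  # newer NumPy
    except TypeError:
        return np.percentile(data, q, interpolation=kind)  # older NumPy


data = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
linear = percentile(data, 40, "linear")    # interpolates between 2.0 and 3.0
nearest = percentile(data, 40, "nearest")  # returns an observed value
print(linear, nearest)
```

"linear" can return a value that is not in the data, while "nearest" always returns one of the samples.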

sklearn/utils/stats.py

glemaitre · 2020-06-22T22:07:33Z

sklearn/ensemble/tests/test_gradient_boosting.py

        if last_y_pred is not None:
-            assert_array_almost_equal(last_y_pred, y_pred)
-
+            assert_allclose(last_y_pred, y_pred)
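
For readers comparing the two assertions in the diff above, a short sketch of how their default tolerances differ. The input values are made up for illustration and are not taken from the test suite.

```python
import numpy as np
from numpy.testing import assert_allclose, assert_array_almost_equal

a = np.array([1e-8])
b = np.array([2e-8])

# Passes: the absolute difference (1e-8) is below 1.5 * 10**-6 (decimal=6).
assert_array_almost_equal(a, b)

# Fails: the default check is relative (rtol=1e-7, atol=0), and b is twice a.
try:
    assert_allclose(a, b)
    allclose_passed = True
except AssertionError:
    allclose_passed = False
print(allclose_passed)
```

With default settings, `assert_allclose` is the stricter check for values close to zero, because its tolerance scales with the magnitude of the expected value.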


@lucyleeow I added your test here. Using the "nearest" strategy everywhere seems to be the winner here, since it keeps the sample_weight semantics right.
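
One common reading of "sample_weight semantics" is that an integer weight k should behave like repeating the sample k times. The sketch below checks that property for a hypothetical non-interpolating helper, `weighted_percentile_lower`, which is an assumption made for this example and not the scikit-learn implementation.

```python
import numpy as np


def weighted_percentile_lower(data, weights, percentile=50):
    """Hypothetical helper: smallest value whose cumulative weight reaches the target."""
    data = np.asarray(data, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(data)
    sorted_data = data[order]
    cum_weights = np.cumsum(weights[order])
    target = percentile / 100.0 * cum_weights[-1]
    idx = min(np.searchsorted(cum_weights, target), len(sorted_data) - 1)
    return sorted_data[idx]


data = np.array([1.0, 2.0, 3.0])
weights = np.array([1, 3, 1])

# Weighted median on the original samples.
weighted = weighted_percentile_lower(data, weights)

# Same statistic on the data with each sample repeated `weight` times.
repeated = np.repeat(data, weights)
unweighted = weighted_percentile_lower(repeated, np.ones_like(repeated))
print(weighted, unweighted)
```

Because the helper always returns an observed value, weighting and repetition agree here; an interpolating estimator does not guarantee this in general.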

glemaitre · 2020-06-22T22:10:42Z

I merged a couple of things in this PR to be able to check the integration in the gradient boosting as well.

If the tests pass, I would propose to cut this PR into 3 parts:

- implementing _weighted_percentile with its own tests;
- replacing boston with make_regression in the gradient boosting tests;
- changing the call from np.median to np.percentile(..., interpolation="nearest") to ensure that we have the same behaviour with sample_weight=None and sample_weight=np.ones(...).

jnothman · 2020-06-25T14:36:52Z

> If the tests pass, I would propose to cut this PR into 3 parts:

So this is no longer for review?

glemaitre · 2020-06-29T07:51:00Z

No, I think that I need to cut it into smaller pieces because the diff is too large. I don't want to make the reviewing process even harder. I will take care of it this week.

lorentzenchr · 2024-04-26T17:24:20Z

See #17370 (comment)

lucyleeow and others added 5 commits April 16, 2020 11:08

check diabetes

2ecc96d

use diabetes and cali

deacbf5

pytest network

f257ff1

Merge remote-tracking branch 'origin/master' into pr/lucyleeow/16937

e0b00ac

BUG make _weighted_percentile behave as NumPy

31a116e

github-actions bot added the module:utils label May 28, 2020

glemaitre added 2 commits May 29, 2020 00:53

iter

6c8a405

revert setup.cfg

23af759

glemaitre commented May 28, 2020

sklearn/utils/stats.py (resolved)

glemaitre added 3 commits May 29, 2020 11:08

iter

3be7c09

iter

06aeab1

iter

f389292

lucyleeow reviewed May 29, 2020

sklearn/utils/stats.py (resolved)

iter

9e1222f

lucyleeow reviewed May 29, 2020

sklearn/utils/stats.py (outdated, resolved)

iter

9314aee

lucyleeow reviewed May 29, 2020

sklearn/utils/stats.py (outdated, resolved)

lucyleeow reviewed May 29, 2020

sklearn/utils/stats.py (outdated, resolved)

glemaitre added 10 commits May 29, 2020 13:10

iter

0e857a9

improve documentation

5988234

iter

8100873

iter

cc4a172

parametrize debug

4e100a9

iter

25e5d24

case we have a single weight non null

ff2a6e0

update test

201f0c7

compat old numpy

d8a4a73

iter

450f4c8

lucyleeow added 3 commits June 17, 2020 13:01

fix lint

65f11e9

up rtol

7a61848

[empty] CI

bda24fa

lucyleeow mentioned this pull request Jun 17, 2020

TST replace Boston in test_gradient_boosting.py #16937

Merged

lucyleeow and others added 11 commits June 17, 2020 14:04

try rtol float

475db41

reduc rtol

b631eb4

rtol=100

5e3b31c

suggestions

97281b2

use weighted_percentile everywhere

899d56d

iter

cd4344b

Merge remote-tracking branch 'origin/master' into is/17370

2327f98

iter

2de439b

Merge remote-tracking branch 'lucyleeow/test_grad_boost' into is/17370

45f6123

iter

c02f1ea

iter

d47b162

glemaitre commented Jun 22, 2020

glemaitre added 3 commits June 23, 2020 10:16

iter

089da6b

change name variable

548bda1

Merge remote-tracking branch 'origin/master' into is/17370

7ca2d47

glemaitre mentioned this pull request Jun 29, 2020

Make _weighted_percentile more robust #6189

Closed

glemaitre changed the title ~~BUG make _weighted_percentile behave as NumPy~~ [WIP] BUG make _weighted_percentile behave as NumPy Jun 29, 2020

glemaitre mentioned this pull request Jun 29, 2020

ENH improve _weighted_percentile to provide several interpolation #17768

Closed

Base automatically changed from master to main January 22, 2021 10:52

ogrisel mentioned this pull request Jan 5, 2022

ENH add support for sample_weight in KBinsDiscretizer(strategy="quantile") #22048

Closed

ogrisel changed the title ~~[WIP] BUG make _weighted_percentile behave as NumPy~~ [WIP] BUG make _weighted_percentile(data, ones, 50) consistent with numpy.median(data) Jan 16, 2024

lorentzenchr closed this Apr 26, 2024

glemaitre wants to merge 52 commits into scikit-learn:main from glemaitre:is/17370

6 participants
