TST replace Boston in test_gradient_boosting.py #16937
glemaitre merged 17 commits into scikit-learn:master
Conversation
ping @glemaitre 😅
@ogrisel I was looking at the LS loss in GBDT; it seems that we divide by the sum of the weights, meaning that weights of all 1s or all 2s will lead to the same loss.
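A minimal sketch (not scikit-learn's internal loss code) of the normalization described above: because the weighted squared error is divided by the sum of the weights, rescaling all weights by a constant leaves the loss value unchanged, so all-1s and all-2s weights are indistinguishable. `weighted_ls_loss` is a hypothetical helper written only for this illustration.

```python
import numpy as np

def weighted_ls_loss(y_true, y_pred, sample_weight):
    # Hypothetical illustration of a least-squares loss that normalizes
    # by the total sample weight, as discussed in this thread.
    sample_weight = np.asarray(sample_weight, dtype=float)
    return np.sum(sample_weight * (y_true - y_pred) ** 2) / np.sum(sample_weight)

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 2.0, 2.0])

loss_ones = weighted_ls_loss(y_true, y_pred, np.ones(3))
loss_twos = weighted_ls_loss(y_true, y_pred, 2 * np.ones(3))

# Scaling every weight by 2 cancels out in numerator and denominator.
assert loss_ones == loss_twos
```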
Ah, actually the issue is with the Huber and LAD losses.
After digging in: FYI, the initial raw predictions given by the LAD and Huber losses divide by the sum of the sample weights as well, so it should work as expected.
I made a small push by merging master.
|
      if last_y_pred is not None:
    -     assert_array_almost_equal(last_y_pred, y_pred)
    +     assert_array_almost_equal(last_y_pred, y_pred, decimal=0)
Before merging, we need to add a FIXME to explain the issue and link to the relevant PR/issue, just so we don't lose track.
    - assert_array_almost_equal(last_y_pred, y_pred, decimal=0)
    + # FIXME: `decimal=0` is very permissive. This is due to the fact that
    + # GBRT with and without `sample_weight` do not use the same implementation
    + # of the median during the initialization with the `DummyRegressor`.
    + # In the future, we should make sure that both implementations are the same.
    + # Refer to https://github.com/scikit-learn/scikit-learn/pull/17377 for some detailed
    + # explanations.
    + assert_array_almost_equal(last_y_pred, y_pred, decimal=0)
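The discrepancy this FIXME refers to can be reproduced outside GBRT: `np.median` interpolates between the two middle values of an even-length array, while a percentile routine that returns the smallest sample whose cumulative weight reaches 50% of the total does not interpolate. The `lower_weighted_median` helper below is hypothetical, written only to illustrate the mismatch; it is not the scikit-learn implementation.

```python
import numpy as np

def lower_weighted_median(values, weights):
    # Hypothetical illustration: return the smallest value at which the
    # cumulative weight reaches half of the total weight (no interpolation).
    order = np.argsort(values)
    values = np.asarray(values, dtype=float)[order]
    weights = np.asarray(weights, dtype=float)[order]
    cum = np.cumsum(weights)
    return values[np.searchsorted(cum, 0.5 * cum[-1])]

data = np.array([1.0, 2.0, 3.0, 4.0])

# Unweighted median interpolates the two middle values: (2 + 3) / 2
print(np.median(data))                                   # 2.5
# The weighted variant with unit weights picks an actual sample value
print(lower_weighted_median(data, np.ones_like(data)))   # 2.0
```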
This is weird. I am not sure why the bitness of Python would influence the results of this test. But maybe we should focus on fixing #17377 first and then see what is the scale of the discrepancy that remains when fitting those models with
The 4 failing tests with
AFAICT the 4 tests are failing at
Not sure why the difference in results. Trying
If it is only on 32 bits, I would not be surprised that it is linked to the underlying trees: #8853 |
Sorry, I misread. This is happening on both architectures; disregard my last comment.
    - ones = np.ones(len(boston.target))
    + ones = np.ones(len(y_reg))
      last_y_pred = None
      for sample_weight in None, ones, 2 * ones:
    - for sample_weight in None, ones, 2 * ones:
    + for sample_weight in [None, ones, 2 * ones]:
Is it always failing for the
    X_reg, y_reg = make_regression(
        n_samples=500, n_features=10, n_informative=8, noise=10, random_state=7
    )
    y_reg = StandardScaler().fit_transform(y_reg.reshape((-1, 1)))
You can use the `scale` function: `from sklearn.preprocessing import scale`.
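For reference, a sketch of the equivalence: both `scale` and a freshly fitted `StandardScaler` center to zero mean and rescale to unit variance, so either spelling works for the one-off transform in this test.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, scale

rng = np.random.RandomState(0)
y = rng.normal(loc=5.0, scale=3.0, size=100)

# Form used in the diff: fit a scaler on the reshaped 1-D target.
y_scaler = StandardScaler().fit_transform(y.reshape((-1, 1))).ravel()

# Suggested shorthand: scale() applies the same centering and scaling.
y_scale = scale(y)

np.testing.assert_allclose(y_scaler, y_scale)
```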
Yes, the failure is always with
      last_y_pred = None
    - for sample_weight in None, ones, 2 * ones:
    + for sample_weight in [None, ones, 2 * ones]:
      clf = GradientBoostingRegressor(n_estimators=100,
Could you rename `clf` to `reg`?
                  sample_weight=sample_weight)
    leaves = clf.apply(boston.data)
    assert leaves.shape == (506, 100)
    assert_raises(ValueError, clf.predict, X_reg)
remove this check since it is covered by the common test
    # FIXME: `rtol=65` is very permissive. This is due to the fact that
    # GBRT with and without `sample_weight` do not use the same
    # implementation of the median during the initialization with the
    # `DummyRegressor`. In the future, we should make sure that both
    # implementations should be the same. See PR #17377 for more.
    assert_allclose(last_y_pred, y_pred, rtol=100)
    - # FIXME: `rtol=65` is very permissive. This is due to the fact that
    - # GBRT with and without `sample_weight` do not use the same
    - # implementation of the median during the initialization with the
    - # `DummyRegressor`. In the future, we should make sure that both
    - # implementations should be the same. See PR #17377 for more.
    - assert_allclose(last_y_pred, y_pred, rtol=100)
    + # FIXME: We temporarily bypass this test. This is due to the fact that
    + # GBRT with and without `sample_weight` do not use the same
    + # implementation of the median during the initialization with the
    + # `DummyRegressor`. In the future, we should make sure that both
    + # implementations should be the same. See PR #17377 for more.
    + # assert_allclose(last_y_pred, y_pred)
    + pass
Let's bypass the test. I think we cannot test this part until we fix the percentile computation.
There is no difference between checking with a tolerance so large that we don't test anything and not testing it at all :)
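To make that point concrete: `assert_allclose` accepts `actual` whenever `abs(actual - desired) <= atol + rtol * abs(desired)`, so `rtol=100` tolerates deviations up to 100 times the target magnitude. A sketch:

```python
import numpy as np
from numpy.testing import assert_allclose

desired = np.array([1.0, 2.0, 3.0])

# Wildly wrong values still pass with rtol=100: the per-element
# tolerance is 100 * |desired|, i.e. 100, 200 and 300 here.
assert_allclose(desired * 50, desired, rtol=100)

# With the default rtol=1e-7, the same comparison raises AssertionError.
try:
    assert_allclose(desired * 50, desired)
except AssertionError:
    print("default tolerance rejects the mismatch")
```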
Thanks @glemaitre, changes made.
Thanks @lucyleeow |
Co-authored-by: Guillaume Lemaitre <[email protected]>
Reference Issues/PRs

Towards #16155

What does this implement/fix? Explain your changes.

Remove the Boston dataset in sklearn/ensemble/tests/test_gradient_boosting.py. Use a subset of the California housing dataset for `test_boston` and use the diabetes dataset for all remaining tests.

Any other comments?

Used the California dataset for `test_boston` due to the high MSE if the diabetes dataset is used (for the parameters in the test, MSE ranged from ~600-1700). Correspondingly, there was also a bigger difference between predictions (`assert_array_almost_equal(last_y_pred, y_pred)`) and predictions were only equal with `decimals=-2`. Happy to change to diabetes/another dataset if California is not suitable.
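For context on the `decimals=-2` remark (the actual keyword of `assert_array_almost_equal` is `decimal`): the check requires `abs(desired - actual) < 1.5 * 10**(-decimal)`, so `decimal=-2` accepts absolute gaps up to 150. A sketch with hypothetical prediction values:

```python
import numpy as np
from numpy.testing import assert_array_almost_equal

y_pred = np.array([1000.0, 1500.0])
last_y_pred = y_pred + 120.0  # off by 120 in absolute terms

# decimal=-2 means the absolute difference must stay below
# 1.5 * 10**2 = 150, so a 120-unit gap still passes -- a very loose check.
assert_array_almost_equal(last_y_pred, y_pred, decimal=-2)

# The default decimal=6 would reject the same gap.
try:
    assert_array_almost_equal(last_y_pred, y_pred)
except AssertionError:
    print("default precision rejects the mismatch")
```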