
Conversation

@qinhanmin2014 (Member)

closes #8459, closes #9300, closes #9301, closes #9562, closes #11245
Correct brier_score_loss when there's only one class in y_true.
This is an ugly solution, but it fixes the existing bugs and avoids a backward-incompatible change.
Remove some redundant code in calibration_curve and brier_score_loss.
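
For context, here is a minimal reproduction of the single-class case (public scikit-learn API; the expected value shown assumes the post-fix behaviour discussed in this PR, where an all-zero y_true treats 1 as the positive class):

import numpy as np
from sklearn.metrics import brier_score_loss

# y_true contains only one class. Before this fix, pos_label defaulted to
# y_true.max(), so an all-0 (or all -1) y_true silently treated 0 (or -1)
# as the positive class and gave a surprising score.
y_true = np.zeros(4, dtype=int)
y_prob = np.array([0.1, 0.2, 0.1, 0.3])

# With 1 as the positive class, the Brier score is
# mean((y_prob - 0) ** 2) = 0.0375.
print(brier_score_loss(y_true, y_prob))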

@qinhanmin2014 (Member Author)

ping @jnothman

@qinhanmin2014 changed the title from "FIX Correct brier_score_loss when there's only one class in y_true" to "[MRG] FIX Correct brier_score_loss when there's only one class in y_true" on Apr 13, 2019
@jnothman (Member) left a comment

Okay. We could also raise an error if it's all not 1 and pos_label is not specified. Maybe in the future?

Add what's new?

raise ValueError("y_prob has values outside [0, 1] and normalize is "
                 "set to False.")

y_true = _check_binary_probabilistic_predictions(y_true, y_prob)
@jnothman (Member)

why are we no longer validating y_prob?

@qinhanmin2014 (Member Author)

y_prob is already validated above.
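
For readers following the thread, the validation referred to is the range check quoted in the excerpt above; a rough standalone sketch (the helper name is illustrative and not part of scikit-learn):

import numpy as np

def _check_prob_range(y_prob):
    # Illustrative only: brier_score_loss / calibration_curve perform an
    # equivalent [0, 1] range check on y_prob before examining the labels.
    y_prob = np.asarray(y_prob, dtype=float)
    if y_prob.min() < 0 or y_prob.max() > 1:
        raise ValueError("y_prob has values outside [0, 1].")
    return y_prob

print(_check_prob_range([0.1, 0.9, 0.5]))   # passes through unchanged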


if pos_label is None:
    pos_label = y_true.max()
if (np.array_equal(labels, [0, 1]) or
@jnothman (Member)

Share this code with _binary_clf_curve? And document the behaviour?

@qinhanmin2014 (Member Author)

This seems difficult since the definition of pos_label=None is different?

@qinhanmin2014 (Member Author)

We could also raise an error if it's all not 1 and pos_label is not specified. Maybe in the future?

Maybe in the future if someone complains? I guess it's not so important.

@jnothman (Member) left a comment

Update the parameter documentation please

@qinhanmin2014 (Member Author)

Update the parameter documentation please

Docstring updated. IMO it's not so easy to describe the current behavior.

Before:
    Label of the positive class. If None, the maximum label is used as
    positive class

After:
    Label of the positive class.
    When ``pos_label=None``, if y_true is in {-1, 1} or {0, 1},
@jnothman (Member)

Defaults to the greater label unless y_true is all 0 or all -1 in which case pos_label defaults to 1.

How about that?

@qinhanmin2014 (Member Author)

How about that?

Much better; I also simplified the code.
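
To make the documented default concrete, here is a small standalone sketch of the rule in the wording proposed above (illustrative only, not the merged implementation; the helper name is made up):

import numpy as np

def _default_pos_label(y_true):
    # pos_label defaults to the greater label, unless y_true contains only
    # 0s or only -1s, in which case 1 is used as the positive class.
    labels = np.unique(y_true)
    if np.array_equal(labels, [0]) or np.array_equal(labels, [-1]):
        return 1
    return labels[-1]

print(_default_pos_label([0, 1, 1]))   # 1 (greater label)
print(_default_pos_label([-1, 1]))     # 1 (greater label)
print(_default_pos_label([0, 0, 0]))   # 1 (all zeros: special case)
print(_default_pos_label([-1, -1]))    # 1 (all -1: special case)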

@qinhanmin2014 (Member Author)

ping @jnothman since the previous PR is tagged as 0.21 by you :)

@jnothman (Member) left a comment

This is as far as I got today!!

labels = np.unique(y_true)
if len(labels) > 2:
    raise ValueError("Only binary classification is supported. "
                     "Provided labels %s." % labels)
@jnothman (Member)

Suggested change:
-                     "Provided labels %s." % labels)
+                     "Labels in y_true: %s." % labels)

@jnothman (Member) left a comment

@qinhanmin2014 added this to the 0.21 milestone on Apr 24, 2019

- |Fix| Fixed a bug where :func:`metrics.brier_score_loss` will sometimes
  return incorrect result when there's only one class in ``y_true``.
  :issue:`13628` by :user:`Hanmin Qin <qinhanmin2014>`.
Member

Suggested change:
-  :issue:`13628` by :user:`Hanmin Qin <qinhanmin2014>`.
+  :pr:`13628` by :user:`Hanmin Qin <qinhanmin2014>`.

@glemaitre self-requested a review on April 26, 2019, 09:15
@glemaitre merged commit 296ee0c into scikit-learn:master on Apr 26, 2019
@glemaitre (Member)

Thanks @qinhanmin2014, I made the change in the what's new.

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
@qinhanmin2014 deleted the brier_score branch on April 29, 2019, 01:21

Development

Successfully merging this pull request may close these issues.

brier_score_loss error
brier_score_loss returns incorrect value when all y_true values are True/1
