[MRG] Multi-class Brier Score Loss #18699
aggvarun01 wants to merge 26 commits into scikit-learn:main
Conversation
I think this is the best course of action. Introduce a new … Also when trying to call … If the user decides to use …
Should be ready for a review.
ogrisel
left a comment
Thanks for bearing with me, here is another round of review comments:
> the probabilities provided are assumed to be that of the
> positive class. The labels in ``y_pred`` are assumed to be
> ordered alphabetically, as done by
> :class:`preprocessing.LabelBinarizer`.
alphabetically => lexicographically.
I find it misleading that we do not respect the order implied by `labels` when passed. Maybe we should raise a ValueError or a warning when the user passes `labels=` in an order that does not respect the lexicographical order.
Note that if we decide to do something else w.r.t. the handling of the y_prob class order, we would have to update the log_loss metric accordingly.
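For reference, the reordering can be seen directly: `LabelBinarizer` always stores classes in sorted order, regardless of the order in which it sees them (a minimal sketch; the class names here are made up):

```python
# LabelBinarizer stores classes in lexicographical order, regardless of
# the order in which they are first seen. This is why a user-supplied
# `labels` order would be silently ignored.
from sklearn.preprocessing import LabelBinarizer

lb = LabelBinarizer()
lb.fit(["spam", "ham", "eggs"])
print(lb.classes_)  # ['eggs' 'ham' 'spam']
```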
As you correctly identified, this implementation of multiclass_brier_score_loss shadows the one in log_loss. Any change we make will have to be made to both functions.
However, both functions use LabelBinarizer to infer labels, so it seems like any warning/error we raise should be raised there. Right?
On second thought, this is a misuse of LabelBinarizer.
We have the following options:
- Fix `multiclass_brier_score` so that it respects `labels`
- Keep `multiclass_brier_score` as is, but raise a warning/error
If we go with the latter, then we should do the same for log_loss. Or is that also a separate PR?
> labels : array-like, default=None
>     If not provided, labels will be inferred from y_true. If ``labels``
>     is ``None`` and ``y_prob`` has shape (n_samples,) the labels are
>     assumed to be binary and are inferred from ``y_true``.
We need to pass an explicit pos_label if y_true has object or string values. Similar to fbeta_score, I believe. @glemaitre let me know if I missed something.
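For illustration, this is roughly what the binary `brier_score_loss` already requires for string targets (the data here is made up):

```python
from sklearn.metrics import brier_score_loss

y_true = ["spam", "ham", "ham", "spam"]
y_prob = [0.9, 0.1, 0.2, 0.8]

# With string labels the positive class cannot be inferred and must be
# given explicitly, as this comment suggests for the multiclass variant.
score = brier_score_loss(y_true, y_prob, pos_label="spam")
print(score)  # 0.025
```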
But then log_loss has the same problem... maybe for a subsequent PR then.
sklearn/metrics/_classification.py
Outdated
```python
else:
    raise ValueError(f'The number of classes in labels is different '
                     f'from that in y_prob. Classes found in '
                     f'labels: {lb.classes_}')
```
If I am not mistaken, all this input validation code is duplicated from the log_loss function. Could you please factorize into a private helper method:
`y_true, y_prob, labels = _validate_multiclass_probabilistic_prediction(y_true, y_prob, labels)`
Not sure about the convention here. Should I put this function in sklearn.metrics._base or in sklearn.metrics._classification?
> [[0.2, 0.7, 0.1],
>  [0.6, 0.2, 0.2],
>  [0.6, 0.1, 0.3]]),
> .41333333)
Could you please split this test into 2 tests, one for invalid input and error messages and the other for expected numerical values of valid inputs?
Also, could you please add a test that checks that perfect predictions lead to a Brier score of 0 and perfectly wrong predictions lead to a Brier score of 2 (for both a 2-class and a 3-class classification problem)?
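As a sanity check on the requested bounds, the original multiclass formula does attain 0 for perfect predictions and 2 for perfectly wrong ones (a sketch applying the reference formula directly, not the PR's function):

```python
import numpy as np

def multiclass_brier(y_onehot, y_prob):
    # Reference formula: mean over samples of the per-sample sum of
    # squared differences across all classes.
    y_onehot = np.asarray(y_onehot, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    return np.mean(np.sum((y_onehot - y_prob) ** 2, axis=1))

for n_classes in (2, 3):
    y = np.eye(n_classes)                  # one sample per class, one-hot
    assert multiclass_brier(y, y) == 0.0   # perfect predictions -> 0
    wrong = np.roll(y, 1, axis=1)          # all mass on a wrong class
    assert multiclass_brier(y, wrong) == 2.0
```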
> case as well. When used for the binary case, `multiclass_brier_score_loss`
> returns a Brier score that is exactly twice the value returned by this
> function.
Please recall the LaTeX equation in the docstring here.
I copied the equation for multiclass_brier_score_loss as asked. But the docstring for brier_score_loss did not include the alternate (i.e. original) formula, so I added that equation to the docstring as well.
Co-authored-by: Olivier Grisel <[email protected]>
Apologies for the late reply. I implemented the changes you suggested. However, there are still a few things that need a review/approval:
import numpy as np
from sklearn.metrics import log_loss
y_true = np.array([0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1])
y_pred = np.array([0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0])
def add_dimension(y_prob):
y_prob = y_prob[:, np.newaxis]
y_prob = np.append(1 - y_prob, y_prob, axis=1)
return y_prob
# This is what the test is doing, which gives different results
print(log_loss(y_pred, y_true)) # 15.542609297195847
print(log_loss(y_true, y_pred)) # 15.542649277067358
# alternate way to run the same test, which gives the same results
print(log_loss(y_true, add_dimension(y_pred))) # 15.542449377709806
print(log_loss(y_pred, add_dimension(y_true)))  # 15.542449377709806

I have removed …
Reference Issues/PRs
Resolves #16055
What does this implement/fix?
The original formulation of the Brier score inherently supports multiclass classification (source). This is currently absent in scikit-learn, which restricts the Brier score to binary classification. This PR implements the Brier score for multi-class classification.
Notes/Open Questions
There are two different definitions of the Brier score. The one implemented in scikit-learn, which is only applicable to the binary case, is:

`BS = \frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2`

(where `\hat{y}_i` is the predicted probability of the positive class)

Whereas the original, more general definition, applicable to both the binary and the multi-class case, is:

`BS = \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{R}(y_{ij} - \hat{y}_{ij})^2`

(where `R` is the number of classes and `y_{ij}` is 1 if sample `i` belongs to class `j`, 0 otherwise)
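The factor-of-two relationship between the two definitions in the binary case can be checked numerically (a sketch with made-up data, applying the formulas directly rather than any sklearn function):

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0])
p_pos = np.array([0.9, 0.2, 0.6, 0.7, 0.4])  # predicted P(class 1)

# Binary definition: squared error on the positive class only.
binary_bs = np.mean((y_true - p_pos) ** 2)

# Original definition: squared error summed over both classes.
y_onehot = np.column_stack([1 - y_true, y_true])
p = np.column_stack([1 - p_pos, p_pos])
multiclass_bs = np.mean(np.sum((y_onehot - p) ** 2, axis=1))

assert np.isclose(multiclass_bs, 2 * binary_bs)
```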
The range of values for the former is [0, 1], whereas for the latter it is [0, 2]. For backwards compatibility, this PR uses the old definition for the binary case and the new one for the multi-class case. However, this seems unintuitive, since the range of values changes. I can see the following workarounds:
While the current PR implements method 1., I personally lean towards method 3.