[MRG] Multi-class Brier Score Loss #18699
aggvarun01 wants to merge 26 commits into scikit-learn:main
Conversation
I think this is the best course of action. Introduce a new … Also when trying to call … If the user decides to use …
Should be ready for a review.
ogrisel
left a comment
Thanks for bearing with me, here is another round of review comments:
> the probabilities provided are assumed to be that of the
> positive class. The labels in ``y_pred`` are assumed to be
> ordered alphabetically, as done by
> :class:`preprocessing.LabelBinarizer`.
alphabetically => lexicographically.
I find it misleading that we do not respect the order implied by `labels` when passed. Maybe we should raise a ValueError or a warning when the user passes `labels=` in an order that does not respect the lexicographical order.
Note that if we decide to do something else w.r.t. the handling of the y_prob class order, we would have to update the log_loss metric accordingly.
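For reference, the reordering can be seen directly: `LabelBinarizer` always stores classes in sorted order, regardless of the order in which it sees them (a minimal sketch; the class names here are made up):

```python
# LabelBinarizer stores classes in lexicographical order, regardless of
# the order in which they are first seen. This is why a user-supplied
# `labels` order would be silently ignored.
from sklearn.preprocessing import LabelBinarizer

lb = LabelBinarizer()
lb.fit(["spam", "ham", "eggs"])
print(lb.classes_)  # ['eggs' 'ham' 'spam']
```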
As you correctly identified, this implementation of multiclass_brier_score_loss shadows the one in log_loss. Any change we make will have to be made to both functions.
However, both functions use LabelBinarizer to infer labels, so it seems like any warning/error we raise should be raised there. Right?
On second thought, this is a misuse of LabelBinarizer.
We have the following options:
- Fix `multiclass_brier_score` so that it respects `labels`
- Keep `multiclass_brier_score` as is, but raise a warning/error
If we go with the latter, then we should do the same for log_loss. Or is that also a separate PR?
> labels : array-like, default=None
>     If not provided, labels will be inferred from y_true. If ``labels``
>     is ``None`` and ``y_prob`` has shape (n_samples,) the labels are
>     assumed to be binary and are inferred from ``y_true``.
We need to pass an explicit pos_label if y_true has object or string values. Similar to fbeta_score, I believe. @glemaitre let me know if I missed something.
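For illustration, this is roughly what the binary `brier_score_loss` already requires for string targets (the data here is made up):

```python
from sklearn.metrics import brier_score_loss

y_true = ["spam", "ham", "ham", "spam"]
y_prob = [0.9, 0.1, 0.2, 0.8]

# With string labels the positive class cannot be inferred and must be
# given explicitly, as this comment suggests for the multiclass variant.
score = brier_score_loss(y_true, y_prob, pos_label="spam")
print(score)  # 0.025
```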
But then log_loss has the same problem... maybe for a subsequent PR then.
sklearn/metrics/_classification.py
Outdated
```python
else:
    raise ValueError(f'The number of classes in labels is different '
                     f'from that in y_prob. Classes found in '
                     f'labels: {lb.classes_}')
```
If I am not mistaken, all this input validation code is duplicated from the log_loss function. Could you please factorize into a private helper method:
`y_true, y_prob, labels = _validate_multiclass_probabilistic_prediction(y_true, y_prob, labels)`
Not sure about the convention here. Should I put this function in sklearn.metrics._base or in sklearn.metrics._classification?
> [[0.2, 0.7, 0.1],
>  [0.6, 0.2, 0.2],
>  [0.6, 0.1, 0.3]]),
> .41333333)
Could you please split this test into 2 tests, one for invalid input and error messages and the other for expected numerical values of valid inputs?
Also, could you please add a test that checks that perfect predictions lead to a Brier score of 0 and perfectly wrong predictions lead to a Brier score of 2 (for both a 2-class and a 3-class classification problem)?
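As a sanity check on the requested bounds, the original multiclass formula does attain 0 for perfect predictions and 2 for perfectly wrong ones (a sketch applying the reference formula directly, not the PR's function):

```python
import numpy as np

def multiclass_brier(y_onehot, y_prob):
    # Reference formula: mean over samples of the per-sample sum of
    # squared differences across all classes.
    y_onehot = np.asarray(y_onehot, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    return np.mean(np.sum((y_onehot - y_prob) ** 2, axis=1))

for n_classes in (2, 3):
    y = np.eye(n_classes)                  # one sample per class, one-hot
    assert multiclass_brier(y, y) == 0.0   # perfect predictions -> 0
    wrong = np.roll(y, 1, axis=1)          # all mass on a wrong class
    assert multiclass_brier(y, wrong) == 2.0
```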
> case as well. When used for the binary case, `multiclass_brier_score_loss`
> returns a Brier score that is exactly twice the value returned by this
> function.
Please recall the LaTeX equation in the docstring here.
I copied the equation for multiclass_brier_score_loss as asked. But the docstring for brier_score_loss did not include the alternate (i.e. original) formula, so I added that equation to the docstring as well.
Co-authored-by: Olivier Grisel <[email protected]>
Apologies for the late reply. I implemented the changes you suggested. However, there are still a few things that need a review/approval:
import numpy as np
from sklearn.metrics import log_loss
y_true = np.array([0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1])
y_pred = np.array([0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0])
def add_dimension(y_prob):
y_prob = y_prob[:, np.newaxis]
y_prob = np.append(1 - y_prob, y_prob, axis=1)
return y_prob
# This is what the test is doing, which gives different results
print(log_loss(y_pred, y_true)) # 15.542609297195847
print(log_loss(y_true, y_pred)) # 15.542649277067358
# alternate way to run the same test, which gives the same results
print(log_loss(y_true, add_dimension(y_pred))) # 15.542449377709806
print(log_loss(y_pred, add_dimension(y_true)))  # 15.542449377709806

I have removed …
Reference Issues/PRs
Resolves #16055
What does this implement/fix?
The original formulation of the Brier score inherently supports multiclass classification (source). This is currently absent in scikit-learn, which restricts the Brier score to binary classification. This PR implements the Brier score for multi-class classification.
Notes/Open Questions
There are two different definitions of the Brier score. The one implemented in scikit-learn, which is only applicable to the binary case, is:

`BS = \frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2`

(where `\hat{y}_i` is the predicted probability of the positive class)

Whereas the original, more general definition, applicable to both the binary and the multi-class case, is:

`BS = \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{R}(y_{ij} - \hat{y}_{ij})^2`

(where `R` is the number of classes and `y_{ij}` is 1 if sample `i` belongs to class `j`, 0 otherwise)
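The factor-of-two relationship between the two definitions in the binary case can be checked numerically (a sketch with made-up data, applying the formulas directly rather than any sklearn function):

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0])
p_pos = np.array([0.9, 0.2, 0.6, 0.7, 0.4])  # predicted P(class 1)

# Binary definition: squared error on the positive class only.
binary_bs = np.mean((y_true - p_pos) ** 2)

# Original definition: squared error summed over both classes.
y_onehot = np.column_stack([1 - y_true, y_true])
p = np.column_stack([1 - p_pos, p_pos])
multiclass_bs = np.mean(np.sum((y_onehot - p) ** 2, axis=1))

assert np.isclose(multiclass_bs, 2 * binary_bs)
```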
The range of values for the former is [0, 1], whereas for the latter it is [0, 2]. For backwards compatibility, this PR uses the old definition for the binary case and the new one for the multi-class case. However, this seems unintuitive, since the range of values changes. I can see the following workarounds:
While the current PR implements method 1., I personally lean towards method 3.