FEA Adds decision_threshold_curve function #31338
base: main
Conversation
    .. currentmodule:: sklearn.model_selection

    .. _TunedThresholdClassifierCV:
    .. _threshold_tunning:
I referenced this page starting here for `decision_threshold_curve` because I thought this first paragraph was appropriate for context, not just the section I added. Not 100% sure on this though, and happy to change.
    .. _threshold_tunning:

    ==================================================
    Tuning the decision threshold for class prediction
@glemaitre Can't comment where this is relevant (after L49), but I wonder if it would be interesting to add another scenario where threshold tuning may be of interest: imbalanced datasets?
        good, or a loss function, meaning low is good. In the latter case, the
        output of `score_func` will be sign-flipped.

    labels : array-like, default=None
`_CurveScorer` uses the term "classes", but "labels" is consistent with what is used for other classification metrics, so I chose this.
        between the minimum and maximum of `y_score`. If an array-like, it will be
        used as the thresholds.

    greater_is_better : bool, default=True
Again, this is consistent with the term used in other metrics, so I avoided using "sign".
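For illustration, a minimal sketch (not the PR's code) of how an integer `thresholds`, as described in the docstring quoted above, could be expanded into candidate thresholds; `_expand_thresholds` is a hypothetical helper name and NumPy inputs are assumed:

    import numpy as np

    def _expand_thresholds(y_score, thresholds=20):
        # If `thresholds` is an int, generate that many candidate thresholds
        # uniformly spaced between the min and max of `y_score`; otherwise
        # use the provided array-like as-is.
        if isinstance(thresholds, int):
            return np.linspace(y_score.min(), y_score.max(), thresholds)
        return np.asarray(thresholds)

    y_score = np.array([0.1, 0.4, 0.35, 0.8])
    print(_expand_thresholds(y_score, 5))  # 5 evenly spaced values in [0.1, 0.8]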
    # This could also be done in `decision_threshold_curve`, not sure which
    # is better
    y_true_unique = cached_unique(y_true)
    if classes is None:
        classes = y_true_unique
    # not sure if this separate error msg needed.
    # there is the possibility that set(classes) != set(y_true_unique) fails
    # because `y_true` only contains one class.
    if len(y_true_unique) == 1:
        raise ValueError("`y_true` only contains one class label.")
    if set(classes) != set(y_true_unique):
        raise ValueError(
            f"`classes` ({classes}) is not equal to the unique values found in "
            f"`y_true` ({y_true_unique})."
        )
These checks could be done in `decision_threshold_curve` instead; I'm not sure which is better.
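For illustration, the same checks written as a standalone sketch (hypothetical helper name, with `cached_unique` replaced by `np.unique`), showing when each error would be raised:

    import numpy as np

    def _validate_classes(y_true, classes=None):
        # Infer classes from `y_true` when not given, then run the two checks.
        y_true_unique = np.unique(y_true)
        if classes is None:
            classes = y_true_unique
        if len(y_true_unique) == 1:
            raise ValueError("`y_true` only contains one class label.")
        if set(classes) != set(y_true_unique):
            raise ValueError(
                f"`classes` ({classes}) is not equal to the unique values found in "
                f"`y_true` ({y_true_unique})."
            )
        return classes

    _validate_classes(np.array([0, 1, 1, 0]))                  # OK, returns array([0, 1])
    # _validate_classes(np.array([1, 1, 1]))                   # raises: only one class label
    # _validate_classes(np.array([0, 1]), classes=[0, 1, 2])   # raises: classes mismatch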
        for th in potential_thresholds
    ]
    return np.array(score_thresholds), potential_thresholds
    # why 'potential' ?
Just for my education, why use the term "potential" in "potential_thresholds"? Is it because there is a possibility that a threshold is redundant, because the predicted labels are the same for adjacent thresholds?
@glemaitre I've highlighted questions in the code. Tests still to be added for [...]
jeremiedbb left a comment:
Thanks for the PR @lucyleeow. I haven't looked at the whole PR yet. I'm wondering if there's a way to not have `thresholds` as a parameter.
    thresholds : int or array-like, default=20
        Specifies number of decision thresholds to compute score for. If an integer,
        it will be used to generate `thresholds` thresholds uniformly distributed
        between the minimum and maximum of `y_score`. If an array-like, it will be
        used as the thresholds.
I believe we can compute the thresholds automatically, like in the other curve functions.
`score_func` is called on `y_true` (independent of the threshold) and `y_pred`, where `y_pred` is computed from `y_score` and the threshold.
If we sort `y_score` in ascending order, it's only where the value of the sorted `y_score` changes that we need to set a threshold. That way we compute the exact number of thresholds. We can probably leverage some code from `_binary_clf_curve`:
scikit-learn/sklearn/metrics/_ranking.py, lines 891 to 906 (at 18e89a4):

    # sort scores and corresponding truth values
    desc_score_indices = xp.argsort(y_score, stable=True, descending=True)
    y_score = y_score[desc_score_indices]
    y_true = y_true[desc_score_indices]
    if sample_weight is not None:
        weight = sample_weight[desc_score_indices]
    else:
        weight = 1.0
    # y_score typically has many tied values. Here we extract
    # the indices associated with the distinct values. We also
    # concatenate a value for the end of the curve.
    distinct_value_indices = xp.nonzero(xp.diff(y_score))[0]
    threshold_idxs = xp.concat(
        [distinct_value_indices, xp.asarray([size(y_true) - 1], device=device)]
    )
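For reference, a rough plain-NumPy sketch of the same step (no sample weights or array API handling); reading the distinct thresholds off the sorted scores is an assumption about how this snippet would be reused for `decision_threshold_curve`:

    import numpy as np

    y_score = np.array([0.1, 0.8, 0.4, 0.4, 0.35])
    order = np.argsort(y_score, kind="stable")[::-1]   # sort scores descending
    y_sorted = y_score[order]                          # [0.8, 0.4, 0.4, 0.35, 0.1]

    # indices where the sorted score changes, plus the end of the array
    distinct = np.nonzero(np.diff(y_sorted))[0]
    threshold_idxs = np.concatenate([distinct, [y_sorted.size - 1]])

    thresholds = y_sorted[threshold_idxs]              # [0.8, 0.4, 0.35, 0.1]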
This is a good point, I have added a `thresholds=None` option that does the above.

Would you care to take a quick look at the overall PR before I add tests? I am not sure about a few things, thanks 🙏
Sorry for the delay, there was something puzzling me with the approach to the problem in this PR and I had a hard time figuring out precisely what it was. In particular, the need for [...]. To me the issue is the use of the [...]:

scikit-learn/sklearn/metrics/_ranking.py, line 830 (at ea9e824)
The difference is that instead of computing the true positives and false positives, we'd compute any (threshold-dependent) metric. Briefly, this is done in 3 steps: [...]
I think that we can extract most of the [...]. This is related to #30134, where the goal is to make [...]
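(For illustration only: a minimal sketch of the idea described above, evaluating an arbitrary threshold-dependent metric at the distinct values of `y_score`. This is not the PR's implementation; the `>=` decision rule and the function name are assumptions.)

    import numpy as np
    from sklearn.metrics import accuracy_score

    def metric_per_threshold(score_func, y_true, y_score):
        # The distinct values of `y_score` are the only thresholds worth
        # evaluating: between two consecutive distinct values, `y_pred`
        # (and hence the metric) cannot change.
        y_true = np.asarray(y_true)
        y_score = np.asarray(y_score)
        thresholds = np.unique(y_score)
        scores = np.array(
            [score_func(y_true, (y_score >= th).astype(int)) for th in thresholds]
        )
        return scores, thresholds

    scores, thresholds = metric_per_threshold(
        accuracy_score, y_true=[0, 0, 1, 1], y_score=[0.1, 0.4, 0.35, 0.8]
    )
    print(thresholds)  # [0.1, 0.35, 0.4, 0.8]
    print(scores)      # accuracy at each candidate threshold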
Interesting, I've reviewed #30134, hopefully that will be merged soon and I can look at your suggested changes after!
@jeremiedbb I've tried your approach in a separate PR #32732, as I thought it was nice to be able to see the two implementations side by side, and it would also have been complicated history/changes-wise to do it all in this PR. I think your suggested approach is better, please share your thoughts!
Reference Issues/PRs
closes #25639 (supersedes)
closes #21391
Thought it would be easier to open a new PR and there were no long discussions on the old PR (#25639).
What does this implement/fix? Explain your changes.
Adds a function that takes a scoring function, `y_true`, `y_score`, `thresholds` and outputs a score per threshold.

The intention is to later add a new display class using this function, allowing us to plot the metric score per threshold, e.g. [...]
Uses the `_CurveScorer` as suggested by @glemaitre. Refactors out a new `_scores_from_predictions` static method that takes the predictions. The old `_score` takes the estimator, calculates `y_score` and passes it to the new `_scores_from_predictions` (a rough structural sketch follows the notes below).

Notes:
- `_scores_from_predictions` - the name is inspired by the display class methods 'from_predictions', but happy to change.
- Did not use `from_scorer`, because we are not using a `scorer` (which has the signature `callable(estimator, X, y)`); we are using a 'scoring function' with signature `score_func(y_true, y_pred, **kwargs)`. Instead, `_CurveScorer` is instantiated directly, and then `_scores_from_predictions` is called in `decision_threshold_curve`.
- Decided to make `_scores_from_predictions` a static method, but I also could have made it a method, and in `decision_threshold_curve` instantiate `_CurveScorer` directly first (not via `from_scorer`). Only went with the staticmethod path because I initially didn't register that `from_scorer` instantiates differently than directly via `_CurveScorer`. Not 100% on what is best here. [Realised that a method is nicer: it avoids having too many params in `decision_threshold_curve` and avoids some lines of code (use `self.xx` directly, instead of passing `self.xx` to `_scores_from_predictions`).]

Any other comments?
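For readers, a very rough structural sketch of the refactor described in the notes above; every name, signature and default below is a hypothetical placeholder rather than the PR's actual code:

    import numpy as np

    class _CurveScorer:
        # Hypothetical, simplified structure only.
        def __init__(self, score_func, thresholds=20, **kwargs):
            self._score_func = score_func
            self._thresholds = thresholds
            self._kwargs = kwargs

        @staticmethod
        def _scores_from_predictions(score_func, y_true, y_score, thresholds, **kwargs):
            # Threshold-dependent scores computed directly from predictions
            # (signature guessed from the description; `>=` rule assumed).
            scores = [
                score_func(y_true, (y_score >= th).astype(int), **kwargs)
                for th in thresholds
            ]
            return np.array(scores), thresholds

        def _score(self, estimator, X, y_true):
            # The old path: compute `y_score` from the estimator, then delegate
            # to `_scores_from_predictions`.
            y_score = estimator.predict_proba(X)[:, 1]
            thresholds = np.linspace(y_score.min(), y_score.max(), self._thresholds)
            return self._scores_from_predictions(
                self._score_func, y_true, y_score, thresholds, **self._kwargs
            )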
cc @glemaitre
Lots more to do, but should get the implementation right first.
To do:
- `_decision_threshold.py` module docstring