FEA add confusion_matrix_at_thresholds (#30134)
Conversation
adrinjalali
left a comment
You'd also need to add this in api_reference.py under the right section to have it rendered in the docs properly.
@glemaitre you happy with the name?
I think I'm fine with the name. I was looking to see if we can have the word …
Co-authored-by: Guillaume Lemaitre <[email protected]>
adrinjalali
left a comment
Otherwise LGTM.
Co-authored-by: Adrin Jalali <[email protected]>
@adrinjalali There is an issue with numpydoc validation in the …
When you look at the CI log, this is the error, not the return value:

    [gw0] linux -- Python 3.12.7 /usr/share/miniconda/envs/testvenv/bin/python
    810     Decreasing score values.
    811
    812     Examples
    813     --------
    814     >>> import numpy as np
    815     >>> from sklearn.metrics import binary_classification_curve
    816     >>> y_true = np.array([0, 0, 1, 1])
    817     >>> y_scores = np.array([0.1, 0.4, 0.35, 0.8])
    818     >>> fps, tps, thresholds = binary_classification_curve(y_true, y_scores)
    819     >>> fps
    Expected:
        array([0, 1, 1, 2])
    Got:
        array([0., 1., 1., 2.])

You just need to fix the expected output to use floats.
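For readers following along, here is a minimal numpy sketch (illustrative only, not the scikit-learn implementation) of one plausible way the cumulative counts end up as floats: once a float `sample_weight` array is folded into the accumulation, `np.cumsum` promotes everything to float64, so the doctest's expected output must show `0.` rather than `0`.

```python
import numpy as np

# Toy data from the failing doctest above.
y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])

# Sort by decreasing score, as the curve helpers do.
order = np.argsort(y_scores)[::-1]
y_sorted = y_true[order]  # [1, 0, 1, 0]

# Accumulating the raw integer labels keeps an integer dtype ...
tps_int = np.cumsum(y_sorted)  # int dtype: [1, 1, 2, 2]

# ... but multiplying by a (float) sample_weight promotes the cumulative
# sums to float64, which is why the doctest now prints 0. instead of 0.
weight = np.ones_like(y_sorted, dtype=np.float64)
tps = np.cumsum(y_sorted * weight)        # float64: [1., 1., 2., 2.]
fps = np.cumsum((1 - y_sorted) * weight)  # float64: [0., 1., 1., 2.]
```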
jeremiedbb
left a comment
Thanks for the PR @SuccessMoses. I directly pushed the requested change to return the negatives as well. I also added a small smoke test; we don't need more, since it's heavily tested through the other curve functions.
LGTM.
I wonder if …
After some discussion with @glemaitre and @ogrisel, we converged toward … I tested with both, and I find the second option a bit too long, since it makes every call to this function multi-line, which hurts readability a bit. Since a confusion matrix per threshold doesn't make sense for multiclass (with a single threshold, as we usually do in sklearn), it didn't feel that necessary. So in the end I went for …
adrinjalali
left a comment
LGTM.
    # For binary problems, :func:`sklearn.metrics.confusion_matrix` has the ``ravel`` method
    # we can use to get counts of true negatives, false positives, false negatives and
    # true positives.
should we remove this paragraph? It's not clear to me here.
I've added a line, just to link the two paragraphs, but not sure if it is ideal either.
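For context on the paragraph under discussion: on a binary problem, `sklearn.metrics.confusion_matrix(y_true, y_pred).ravel()` returns the four counts in `(tn, fp, fn, tp)` order. A numpy-only sketch of the equivalent computation for a single fixed threshold (illustrative, not the scikit-learn code):

```python
import numpy as np

y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])

# Binarize the scores at one fixed threshold.
threshold = 0.5
y_pred = (y_scores >= threshold).astype(int)  # [0, 0, 0, 1]

# The four cells of the 2x2 confusion matrix, in the same
# (tn, fp, fn, tp) order that confusion_matrix(...).ravel() uses.
tn = int(np.sum((y_true == 0) & (y_pred == 0)))
fp = int(np.sum((y_true == 0) & (y_pred == 1)))
fn = int(np.sum((y_true == 1) & (y_pred == 0)))
tp = int(np.sum((y_true == 1) & (y_pred == 1)))

print(tn, fp, fn, tp)  # 2 0 1 1
```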
Docs are failing @jeremiedbb

Just need to update the import in confusion_matrix_at_thresholds.
lucyleeow
left a comment
Do we want to amend the test names (e.g., test_binary_clf_curve_multiclass_error) to use the new name?
    # we can use to get counts of true negatives, false positives, false negatives and
    # true positives.
    #
    # :func:`sklearn.metrics.binary_classification_curve`
I think this needs updating
    (2, 1, 2, 3)

    With :func:`confusion_matrix_at_thresholds` we can get true negatives, false positives,
    false negatives and true positives for different thresholds::
For the new user, should we expand on how we determine the thresholds used?
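To sketch an answer to that question: as I understand the scikit-learn curve helpers, the candidate thresholds are the distinct predicted scores in decreasing order, with each threshold meaning "predict positive when score >= threshold". A hedged numpy sketch of that idea (illustrative, not the actual sklearn code):

```python
import numpy as np

y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])

# Sort samples by decreasing score; each distinct score value becomes
# one candidate threshold.
order = np.argsort(y_scores)[::-1]
y_sorted = y_true[order]
scores_sorted = y_scores[order]

# Index of the last occurrence of each distinct score value.
distinct = np.where(np.diff(scores_sorted))[0]
idx = np.r_[distinct, y_true.size - 1]

tps = np.cumsum(y_sorted)[idx]   # true positives at each threshold
fps = 1 + idx - tps              # false positives at each threshold
fns = y_true.sum() - tps         # positives not yet predicted positive
tns = (y_true == 0).sum() - fps  # negatives correctly left negative
thresholds = scores_sorted[idx]  # distinct scores, decreasing
```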
lucyleeow
left a comment
I've directly pushed my suggested changes, to help move this along, and fixed the doc failure.
Also checked the docs render okay.
Actually, still one last comment we may want to address here: …
@adrinjalali or @jeremiedbb feel free to merge if you're happy!
Reference Issues/PRs
What does this implement/fix? Explain your changes.
Fixes #16470
Any other comments?
- In sklearn/metrics/_ranking.py, I changed the name of the function _binary_clf_curve to binary_classification_curve without changing the body. I also renamed test functions like test_binary_clf_curve_multiclass_error without changing their bodies.
- det_curve, roc_curve and precision_recall_curve call this function, so I updated the name of the function in their bodies.
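Since the description notes that det_curve, roc_curve and precision_recall_curve are all built on this shared helper, here is a simplified sketch (not the scikit-learn code, which also handles endpoints and degenerate cases) of how the cumulative fps/tps it returns become curve coordinates, using the values from the doctest quoted earlier:

```python
import numpy as np

# Cumulative counts as returned by the shared curve helper for
# y_true = [0, 0, 1, 1], y_scores = [0.1, 0.4, 0.35, 0.8].
fps = np.array([0.0, 1.0, 1.0, 2.0])
tps = np.array([1.0, 1.0, 2.0, 2.0])

# ROC: normalize by the totals (the last cumulative values).
fpr = fps / fps[-1]
tpr = tps / tps[-1]

# Precision/recall reuse the very same counts differently.
precision = tps / (tps + fps)
recall = tps / tps[-1]
```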