Conversation

@SuccessMoses
Contributor

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Fixes #16470

Any other comments?

  • In sklearn/metrics/_ranking.py, renamed the function _binary_clf_curve to binary_classification_curve without changing its body. I also renamed test functions such as test_binary_clf_curve_multiclass_error, again without changing their bodies.
  • det_curve, roc_curve and precision_recall_curve call this function, so I updated the name in their bodies.
  • I added examples in the docstring of the function
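For context, here is an illustrative pure-NumPy sketch of what this helper computes (an assumption-laden reimplementation for readers of this thread, not the actual scikit-learn code): cumulative false-positive and true-positive counts at each distinct decreasing score threshold.

```python
import numpy as np

def binary_classification_curve_sketch(y_true, y_score):
    """Sketch of cumulative FP/TP counts per distinct decreasing threshold.

    Illustrative only; the real private helper also handles sample weights,
    label validation, and multilabel-indicator edge cases.
    """
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)

    # Sort by decreasing score.
    order = np.argsort(y_score, kind="stable")[::-1]
    y_true, y_score = y_true[order], y_score[order]

    # Indices where the score value changes mark the distinct thresholds.
    threshold_idxs = np.r_[np.where(np.diff(y_score))[0], y_true.size - 1]

    # Cumulative true positives; false positives follow by subtraction.
    tps = np.cumsum(y_true)[threshold_idxs]
    fps = 1 + threshold_idxs - tps
    return fps, tps, y_score[threshold_idxs]

y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])
fps, tps, thresholds = binary_classification_curve_sketch(y_true, y_score)
```

det_curve, roc_curve and precision_recall_curve can all be derived from these three arrays, which is why the helper is worth making public.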

@github-actions

github-actions bot commented Oct 22, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: d214d32. Link to the linter CI: here

Member

@adrinjalali adrinjalali left a comment


You'd also need to add this in api_reference.py under the right section to have it rendered in the docs properly.

@glemaitre you happy with the name?

@glemaitre glemaitre changed the title from "Added binary_classification_curve from _binary_clf_curve" to "FEA Added binary_classification_curve from _binary_clf_curve" Nov 5, 2024
@glemaitre glemaitre changed the title from "FEA Added binary_classification_curve from _binary_clf_curve" to "FEA add binary_classification_curve" Nov 5, 2024
@glemaitre
Member

I think I'm fine with the name. I looked into including the word "counts" in the function name, but that makes it really long. So I'm OK with the proposed name.

Member

@adrinjalali adrinjalali left a comment


Otherwise LGTM.

Co-authored-by: Adrin Jalali <[email protected]>
@adrinjalali adrinjalali requested a review from glemaitre November 7, 2024 09:06
@SuccessMoses
Contributor Author

@adrinjalali There is an issue with numpydoc validation in the binary_classification_curve function; the error is RT03: Return value has no description. Do you know why this is? I have already documented the return value of the function.

@adrinjalali
Member

When you look at the CI log, this is the error, not the return:

[gw0] linux -- Python 3.12.7 /usr/share/miniconda/envs/testvenv/bin/python
810         Decreasing score values.
811 
812     Examples
813     -------
814     >>> import numpy as np
815     >>> from sklearn.metrics import binary_classification_curve
816     >>> y_true = np.array([0, 0, 1, 1])
817     >>> y_scores = np.array([0.1, 0.4, 0.35, 0.8])
818     >>> fps, tps, thresholds = binary_classification_curve(y_true, y_scores)
819     >>> fps
Expected:
    array([0, 1, 1, 2])
Got:
    array([0., 1., 1., 2.])

You just need to fix the expected output in the docstring to floats
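For anyone puzzled by the floats: the counts are cumulative sums of the per-sample weights, which default to 1.0 (a float), so the arrays come out as float64. A minimal NumPy sketch of that effect (illustrative, not the library code):

```python
import numpy as np

y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# The default per-sample weight is 1.0, a float, so the cumulative
# counts are float arrays -- hence "0." rather than "0" in the doctest.
sample_weight = np.ones(y_true.shape, dtype=np.float64)
order = np.argsort(y_score)[::-1]
fps = np.cumsum((1 - y_true[order]) * sample_weight)
tps = np.cumsum(y_true[order] * sample_weight)
print(fps)  # [0. 1. 1. 2.]
```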

Member

@jeremiedbb jeremiedbb left a comment


Thanks for the PR @SuccessMoses. I directly pushed the requested change to return negatives as well. I also added a small smoke test; we don't need more since it's heavily tested through the other curve functions.

LGTM.

@jeremiedbb
Member

I wonder if binary_classification_curve is the best name though. Maybe confusion_matrix_curve or binary_confusion_matrix_curve would be more explicit. What do you think @glemaitre @adrinjalali ?

@jeremiedbb
Member

After some discussion with @glemaitre and @ogrisel, we converged toward confusion_matrix_at_thresholds (with or without final "s") or binary_confusion_matrix_at_threshold.

I tested both and find the second option a bit too long: it makes every call to this function multi-line, which hurts readability. And since a confusion matrix per threshold doesn't make sense for multiclass (with a single threshold, as we usually do in sklearn), the "binary" prefix didn't feel necessary.

So in the end I went for confusion_matrix_at_thresholds.
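To make the chosen semantics concrete, here is a hypothetical pure-NumPy sketch of what a function with this name would return (the real function's signature and return layout may differ; this is only an illustration of the idea):

```python
import numpy as np

def confusion_matrix_at_thresholds_sketch(y_true, y_score):
    # For each distinct score value t (largest first), count the confusion
    # matrix of the classifier "predict positive iff score >= t".
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    thresholds = np.unique(y_score)[::-1]
    n_pos = int(np.sum(y_true == 1))
    n_neg = y_true.size - n_pos
    tps = np.array([np.sum((y_score >= t) & (y_true == 1)) for t in thresholds])
    fps = np.array([np.sum((y_score >= t) & (y_true == 0)) for t in thresholds])
    return n_neg - fps, fps, n_pos - tps, tps, thresholds

y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])
tns, fps, fns, tps, thresholds = confusion_matrix_at_thresholds_sketch(y_true, y_score)
```

Each position in the returned arrays is one full binary confusion matrix, which is exactly what the name tries to convey.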

Member

@adrinjalali adrinjalali left a comment


LGTM.

Comment on lines 77 to 79
# For binary problems, :func:`sklearn.metrics.confusion_matrix` has the ``ravel`` method
# we can use to get counts of true negatives, false positives, false negatives and
# true positives.
Member


should we remove this paragraph? It's not clear to me here.

Member


I've added a line, just to link the two paragraphs, but not sure if it is ideal either.

@adrinjalali
Member

Doc's failing @jeremiedbb

@lucyleeow
Member

Doc's failing

Just need to update the import in plot_confusion_matrix.py

@lucyleeow lucyleeow changed the title from "FEA add binary_classification_curve" to "FEA add confusion_matrix_at_thresholds" Oct 28, 2025
Member

@lucyleeow lucyleeow left a comment


Do we want to amend the tests names (e.g., test_binary_clf_curve_multiclass_error) to use the new name?

# we can use to get counts of true negatives, false positives, false negatives and
# true positives.
#
# :func:`sklearn.metrics.binary_classification_curve`
Member


I think this needs updating

(2, 1, 2, 3)

With :func:`confusion_matrix_at_thresholds` we can get true negatives, false positives,
false negatives and true positives for different thresholds::
Member


For the new user, should we expand on how we determine the thresholds used?
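On that question: assuming the standard scikit-learn behaviour for such curves (an assumption worth spelling out in the docs, as this sketch is not text from the PR itself), the thresholds are simply the distinct predicted score values in decreasing order, since only a change in score can change the confusion matrix:

```python
import numpy as np

# Candidate thresholds are the distinct score values, largest first;
# duplicate scores collapse because equal scores cannot be separated.
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.4])
thresholds = np.unique(y_score)[::-1]
```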

Member

@lucyleeow lucyleeow left a comment


I've directly pushed my suggested changes, to help move this along, and fixed the doc failure.

Also checked the docs render okay.

@lucyleeow
Member

Actually, still one last comment we may want to address here:

Do we want to amend the tests names (e.g., test_binary_clf_curve_multiclass_error) to use the new name?

@lucyleeow
Member

@adrinjalali or @jeremiedbb feel free to merge if you're happy!


Successfully merging this pull request may close these issues.

Make _binary_clf_curve a "public" method?

6 participants