Skip to content

Conversation

@Micky774
Copy link
Contributor

@Micky774 Micky774 commented Apr 5, 2023

Reference Issues/PRs

Addresses #24686
Selected subset of #26011

What does this implement/fix? Explain your changes.

Changes variable names to new standard, and includes an algorithm refactor to do_labelling. The new function is logically equivalent to the old, just with if-statement de-nesting and improved naming of intermediate values for readability.

Any other comments?

These changes were extracted from #26011 to facilitate quick review

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree the logic is the same. Since the logic is a little involved, I think this needs a second review.

@thomasjpfan thomasjpfan added Quick Review For PRs that are quick to review Waiting for Second Reviewer First reviewer is done, need a second one! labels Apr 5, 2023
@jjerphan jjerphan self-requested a review April 24, 2023 15:06
@ogrisel
Copy link
Member

ogrisel commented Apr 24, 2023

Changes variable names to new standard, and includes an algorithm refactor to do_labelling. The new function is logically equivalent to the old, just with if-statement de-nesting and improved naming of intermediate values for readability.

Have you measured any impact on performance?

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the do_labelling function is complex enough to be turned into a cpdef function to be testable with a few dedicated unit tests, at least for the nominal cases, on simplistic data and ideally on a few edge case.

Testing the nominal cases would serve as:

  • documenting how this function is expected to work for future maintainers,
  • and for current day HDBSCAN reviewers,
  • non regression tests for future refactorings.

I also thinkg that a docstring would help (in addition).

Similar comment for the other functions / classes of this Cython module although probably in decidated PRs to keep this one focused.

@Micky774
Copy link
Contributor Author

Micky774 commented May 3, 2023

@ogrisel I've added docstring and tests for _do_labelling and I agree that it is worth revisiting those for other functions as well in other PRs, however I would prefer to save those for after merger into main.

@Micky774
Copy link
Contributor Author

@scikit-learn/core-devs If anyone has some extra bandwidth I would greatly appreciate a second review on this PR :)

@adrinjalali adrinjalali merged commit a773efb into scikit-learn:hdbscan May 16, 2023
@Micky774 Micky774 deleted the hdbscan_labelling branch May 16, 2023 14:00
Micky774 added a commit to Micky774/scikit-learn that referenced this pull request May 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cython module:cluster No Changelog Needed Quick Review For PRs that are quick to review Waiting for Second Reviewer First reviewer is done, need a second one!

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants