Hdbscan boruvka #4

Micky774 · 2022-09-16T02:35:54Z

Reference Issues/PRs

What does this implement/fix? Explain your changes.

This is meant to be a sort of "live tracking" PR which handles the reconciliation and reintroduction of the Boruvka algorithm. It exists because I find it more convenient to adapt to incremental changes in the upstream branch as they come out, rather than trying to reconcile a massive gap once the upstream branch is actually merged into scikit-learn/main.
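For readers unfamiliar with the algorithm being reintroduced, here is a minimal sketch of textbook Borůvka on an explicit edge list. It is not the code in this PR: the commits below point to a tree-accelerated Cython variant (`boruvka_kdtree`), whereas this version scans all edges each round. The name `boruvka_mst` and the `(weight, u, v)` edge format are assumptions made for illustration only.

```python
def boruvka_mst(n_nodes, edges):
    """Return the MST edges of a connected, undirected graph.

    `edges` is a list of (weight, u, v) tuples (hypothetical format).
    """
    parent = list(range(n_nodes))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    n_components = n_nodes
    while n_components > 1:
        # Step 1: find the cheapest edge leaving each component.
        cheapest = {}
        for edge in edges:
            _, u, v = edge
            ru, rv = find(u), find(v)
            if ru == rv:
                continue  # edge is internal to one component
            for root in (ru, rv):
                # Tuple comparison breaks weight ties consistently.
                if root not in cheapest or edge < cheapest[root]:
                    cheapest[root] = edge
        if not cheapest:
            raise ValueError("graph is not connected")
        # Step 2: add those edges, merging the components they join.
        for w, u, v in cheapest.values():
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                mst.append((w, u, v))
                n_components -= 1
    return mst


example_edges = [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 0, 3), (5, 0, 2)]
example_mst = boruvka_mst(4, example_edges)
```

Each round at least halves the number of components, so the loop runs at most about log2(n) times. In HDBSCAN the graph in question is the mutual-reachability graph over the samples.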

Any other comments?

Everything here is volatile and subject to great change.

- Added support for `n_features_in_`
- Improved validation and added support for `feature_names_in_`
- Renamed `kwargs` to `metric_params` and added safety check for an empty dict
- Removed attributes set in init and deferred to properties
- Raised error if tree query is performed with too few samples
- Cleaned up some list/dict comprehension logic
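The "too few samples" item can be illustrated with a rough sketch: a k-nearest-neighbor query cannot return `k` neighbors when fewer than `k` samples exist, so it is friendlier to fail early with a clear message. The brute-force helper `knn_query` below is hypothetical and stands in for the tree query used in the PR; the exact condition and error text there may differ.

```python
def knn_query(X, k):
    """Return the k smallest distances from each sample to the data set.

    Hypothetical brute-force stand-in for a KDTree/BallTree query.
    """
    n_samples = len(X)
    if k > n_samples:
        # Fail early instead of letting the query error out obscurely.
        raise ValueError(
            f"Cannot query k={k} neighbors from only {n_samples} samples."
        )
    result = []
    for a in X:
        # Euclidean distance from `a` to every sample (including itself).
        dists = sorted(
            sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5 for b in X
        )
        result.append(dists[:k])
    return result


X_small = [(0,), (1,), (3,)]
neighbors = knn_query(X_small, 2)
```

Requesting `k=4` from these three samples raises `ValueError`.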

- Removed internal minkowski metric parameter validation in favor of `sklearn.metrics` built-in handling
- Removed default argument and presence of `p` in hdbscan functions
- Now users must pass `p` in through `metric_params`, consistent with other metrics such as `wminkowski` and `mahalanobis`
- Removed vestigial estimator check; now supported via common tests
- Fixed bug where `boruvka_kdtree` algorithm's accepted metrics were based on `BallTree` instead of `KDTree`
- Cleaned up lines with unused returns by indexing output of `hdbscan`
- Greatly expanded scope of algorithm/metric compatibility tests
- Streamlined some other tests
- Deleted commented-out tests
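The `p` change can be pictured as follows: the clustering code no longer knows about `p` at all and simply forwards `metric_params` to the metric, which owns its own default. This is a pure-Python sketch under that assumption; `minkowski`, `METRICS`, and `pairwise` are hypothetical stand-ins for the `sklearn.metrics` machinery, not its actual API.

```python
def minkowski(a, b, p=2.0):
    """Minkowski distance; the metric itself owns the default for p."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


# Hypothetical metric registry, for illustration only.
METRICS = {"minkowski": minkowski}


def pairwise(X, metric="minkowski", metric_params=None):
    """Dense distance matrix; metric parameters are forwarded untouched."""
    # Treat None as "no extra parameters" so ** unpacking always works.
    metric_params = {} if metric_params is None else metric_params
    dist = METRICS[metric]
    return [[dist(a, b, **metric_params) for b in X] for a in X]


X_demo = [(0.0, 0.0), (3.0, 4.0)]
d_default = pairwise(X_demo)[0][1]                      # metric's own p=2
d_p1 = pairwise(X_demo, metric_params={"p": 1})[0][1]   # caller-supplied p
```

Because `pairwise` has no `p` argument of its own, there is only one place where the parameter is defaulted and validated, which is the motivation stated in the list above.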

Included temporary addition of homogeneity measure from original library for debugging purposes

Micky774 added 30 commits February 24, 2022 23:58

Initial addition of hdbscan

1c61429

Added wraparound wrappers where needed

c5240b7

Updated documentation

74bd0b3

Merge branch 'main' into hdbscan

15793b2

Added a new batch of doc updates for passing docstring tests

faa06b5

Improved metric_params handling

2a7cc22

Propagated metric_params change to tests and other functions

97f036f

Removed plotting, to_pandas, to_networkx infrastructure

8aa297a

Removed plotting, to_pandas, to_networkx infrastructure

fe362b5

Merge branch 'hdbscan' of https://github.com/Micky774/scikit-learn in…

dd44dbc

…to hdbscan

Renamed plots.py-->_trees.py

fda9350

Fixed package namespace in cluster/__init__.py

7478586

Drop-in replaced private dist_metrics with metrics.dist_metrics

cd1edc4

Revert "Drop-in replaced private dist_metrics with `metrics.dist_me…

0802504

…trics`" This reverts commit cd1edc4.

Docstring compliance for flat.py

ce94591

Renamed flat.py --> _flat.py

e93bfe1

Renamed flat.py-->_flat.py

028e98f

Renamed validity.py-->_validity.py

a1ac99a

Renamed robust_single_linkage_.py

788d4bc

Merge branch 'main' into hdbscan

5fba5e0

Removed _flat_.py and associated tests

cf4f239

Made memview readonly constant

1ceac43

Removed experimental/extra API -- may reenable in future PRs

6f20a08

Merge branch 'main' into hdbscan

6705fa7

WIP docstring improvements for RSL

9e9be81

Trimmed and removed unnecessary RSL estimator

0cd08f3

Updated sqrt2 default in robust_single_linkage

7b73dd8

Updated alpha arg for rsl functions

62cf09e

Micky774 added 30 commits September 26, 2023 10:26

Added prototype test

05276bd

Corrected algo key-word

b3ac0d1

Added partial dispatch for boruvka

27d593c

Updated boruvka formatting

6a72efa

Refactored NodeData_t and formatted code

be2b4e9

Formatting and new Numpy API

11e91f9

Corrected indexing error

014c168

Added greater nogil support and started boruvka bug fix

baa6a02

Removed debug statements and improved test

ffc6b77

Improved tests and hdbscan dispatch logic

974673e

Cleaned up cython file

61aef63

Removed unnecessary public attributes

2634228

Updated formatting and removed parallel-query schema

53a7ec6

Remove attribute used in debugging

37e5d5c

Improved tests

0788281

Merge branch 'main' into hdbscan_boruvka

57330e5

Updated changelog

3dd7c5a

Changed default to preserve backwards compatibility

5a9aebe

Merge branch 'main' into hdbscan_boruvka

90d9403

Improved tests and adjusted auto option for backwards compatibility

3b40f8a

Corrected changelog entry

04e4007

Removed extraneous function

e06188f

Stabilized tests by using sorted lists

6ef1668

Updated to include deprecation for auto heuristic

76713ff

Updated example in docstring

de5d041

Updated centers test to use less adversarial data

8ba71e8

Corrected test by making hdb model more noise-tolerant

6a592f4

Avoid FutureWarning in tests

68d4fd1

Fixed remaining FutureWarning

39aa992

Merge branch 'main' into hdbscan_boruvka

fedeb90
