MAINT additional cleaning in reachibility.pyx #5

glemaitre · 2022-10-19T13:44:00Z

Some additional style changes.

glemaitre · 2022-10-19T13:49:28Z

It was easier to open a PR than to leave comments.
I think that we should test the function mutual_reachibility_distance directly.
At a minimum, we should ensure that dense and sparse inputs lead to the same results.

glemaitre · 2022-10-19T13:49:59Z

sklearn/cluster/_hdbscan/_reachability.pyx

        a core point.

-    max_dist : float, default=0.0
+    max_distance : float, default=0.0


I think that we can be explicit regarding the naming.

glemaitre · 2022-10-19T13:55:28Z

sklearn/cluster/_hdbscan/_reachability.pyx

        `LIL` format.

-    min_points : int, default=5
+    min_samples : int, default=5


Change the name to be consistent with the high-level function, and with DBSCAN as well.

glemaitre · 2022-10-19T13:56:39Z

sklearn/cluster/_hdbscan/_reachability.pyx

+            )[farther_neighbor_idx]
        else:
-            core_distances[i] = np.infty
+            core_distances[i] = INFINITY


This would avoid an interaction with the Python interpreter.

glemaitre · 2022-10-19T13:57:04Z

sklearn/cluster/_hdbscan/_reachability.pyx

-    graph of a distance matrix. Note that computation is performed in-place for
-    `distance_matrix`. If out-of-place computation is required, pass a copy to
-    this function.
+def mutual_reachability_graph(


Since we build the graph, I prefer to make it explicit.

glemaitre · 2022-10-19T14:36:40Z

The test could be something like:

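import numpy as np
from scipy import sparse
from sklearn.utils._testing import assert_allclose

# Import path taken from the file under review.
from sklearn.cluster._hdbscan._reachability import mutual_reachability_graph

dist = np.random.randn(5, 5)
# The docstring notes that the computation may be in-place, so pass a copy.
mr_dense = mutual_reachability_graph(dist.copy())
mr_sparse = mutual_reachability_graph(sparse.lil_matrix(dist))

# `.toarray()` is the explicit spelling of the `.A` shorthand.
assert_allclose(mr_sparse.toarray(), mr_dense)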

With my PR it fails. I will have a closer look to see what is wrong.
I probably introduced a bug :)
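
To make the dense/sparse comparison concrete without depending on the compiled module, here is a minimal pure-Python sketch. The names `mutual_reachability_dense` and `mutual_reachability_sparse` and the dict-per-row storage are illustrative assumptions, not the code in `_reachability.pyx`. It assumes a symmetric distance matrix and the usual definition max(core_i, core_j, d_ij), with an infinite core distance when a sparse row stores too few entries.

```python
import math


def mutual_reachability_dense(dist, min_samples=5):
    """Reference sketch for a dense, symmetric distance matrix (list of lists)."""
    k = min_samples - 1  # index of the farthest neighbor that still counts
    core = [sorted(row)[k] for row in dist]
    n = len(dist)
    return [
        [max(core[i], core[j], dist[i][j]) for j in range(n)]
        for i in range(n)
    ]


def mutual_reachability_sparse(rows, min_samples=5, max_distance=0.0):
    """Same computation when rows[i] is a dict {column: distance}.

    Rows with too few stored entries get an infinite core distance.
    Infinite results are replaced by max_distance when it is positive.
    """
    k = min_samples - 1
    core = []
    for row in rows:
        values = sorted(row.values())
        core.append(values[k] if len(values) > k else math.inf)

    result = []
    for i, row in enumerate(rows):
        out = {}
        for j, d in row.items():
            mr = max(core[i], core[j], d)
            if math.isinf(mr) and max_distance > 0.0:
                mr = max_distance
            out[j] = mr
        result.append(out)
    return result


# A small symmetric example in which every entry is stored explicitly.
dist = [
    [0, 1, 2, 3],
    [1, 0, 4, 5],
    [2, 4, 0, 6],
    [3, 5, 6, 0],
]
rows = [dict(enumerate(row)) for row in dist]

dense = mutual_reachability_dense(dist, min_samples=2)
sparse_result = mutual_reachability_sparse(rows, min_samples=2)
sparse_as_dense = [[sparse_result[i][j] for j in range(4)] for i in range(4)]
```

When every entry is stored, both code paths should agree entry by entry, which is the property the proposed test checks on the real implementation.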

glemaitre

I marked the PR as WIP, but it is basically working now.
I still have to update the documentation and make another pass to be sure that I am happy with what I wrote.

glemaitre · 2022-10-20T09:39:37Z

sklearn/cluster/_hdbscan/_reachability.pyx

-    graph of a distance matrix. Note that computation is performed in-place for
-    `distance_matrix`. If out-of-place computation is required, pass a copy to
-    this function.
+ctypedef fused integral:


I added the fused type directly.

glemaitre · 2022-10-20T09:40:05Z

sklearn/cluster/_hdbscan/_reachability.pyx

-    return distance_matrix
+            core_distances[i] = INFINITY
+
+    for col_ind in range(n_samples):


This is a nogil interaction.

glemaitre · 2022-10-20T09:58:30Z

I am pretty happy with the current implementation.
Basically, the changes are the following:

- mainly renaming variables
- add a CSR implementation that does not require the GIL (not for the sorting part, though, since that is based on NumPy)
- add a fused type to already be compatible with huge sparse matrices with np.int64_t indices and np.float32 arrays.
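
For readers less familiar with the CSR layout, here is a minimal pure-Python sketch of how per-row core distances can be read from the `indptr` and `data` arrays. The function name `csr_core_distances` is a hypothetical one chosen for illustration; the real implementation is in Cython and uses NumPy for the sorting step.

```python
import math


def csr_core_distances(indptr, data, min_samples=5):
    """Core distance per row of a CSR matrix given as plain lists.

    Row i owns the slice data[indptr[i]:indptr[i + 1]]. A row with fewer
    than min_samples stored entries gets an infinite core distance.
    """
    k = min_samples - 1
    core = []
    for i in range(len(indptr) - 1):
        row = sorted(data[indptr[i]:indptr[i + 1]])
        core.append(row[k] if len(row) > k else math.inf)
    return core


# Three rows storing 3, 1 and 2 entries respectively.
indptr = [0, 3, 4, 6]
data = [0.5, 0.2, 0.9, 0.4, 0.3, 0.1]
core = csr_core_distances(indptr, data, min_samples=2)
```

Because each row is a contiguous slice, the loop only needs integer offsets into flat arrays, which is what makes a typed implementation over the index and value dtypes straightforward.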

A bit later, I will add a file with minimal tests for the implementation.

One thing I was wondering about, and it seems quite important: we appear to assume that the distance used is symmetric. Otherwise, the way we build the core_distances would not work.
@Micky774 is that the case, from your understanding?
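
To illustrate the symmetry concern, here is a small sketch under my reading of it: if core distances are taken along one axis of the matrix, the result only matches the other axis when the matrix is symmetric. The helper names below are hypothetical and not part of the module.

```python
def kth_smallest_by_row(dist, k):
    """k-th smallest value of each row (0-indexed k)."""
    return [sorted(row)[k] for row in dist]


def kth_smallest_by_column(dist, k):
    """k-th smallest value of each column (0-indexed k)."""
    return [sorted(col)[k] for col in zip(*dist)]


symmetric = [
    [0, 1, 5],
    [1, 0, 2],
    [5, 2, 0],
]
asymmetric = [
    [0, 1, 5],
    [4, 0, 2],
    [3, 6, 0],
]

# With a symmetric matrix the axis does not matter.
same = kth_smallest_by_row(symmetric, 1) == kth_smallest_by_column(symmetric, 1)

# With an asymmetric matrix the two axes give different core distances.
by_row = kth_smallest_by_row(asymmetric, 1)
by_column = kth_smallest_by_column(asymmetric, 1)
```

On the asymmetric matrix the two results differ, so the mutual reachability values would depend on which axis the implementation happens to use.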

glemaitre added 2 commits October 19, 2022 15:40

MAINT further style improvement

0ab847c

Merge branch 'HDBSCAN/clean_reachability' into further_cleanup_hdbsca…

b4c1660

…n_reachibility

FIX let's be consistent and call min_samples

d6a59a5

glemaitre mentioned this pull request Oct 19, 2022

CLN Cleaned cluster/_hdbscan/_reachability.pyx scikit-learn/scikit-learn#24701

Merged

TMP POC for CSC processing

e09ece7

glemaitre marked this pull request as draft October 19, 2022 16:07

glemaitre added 3 commits October 20, 2022 11:34

ENH CSR, fused type, no-copy

1cb0db8

iter

8a38591

homogeneous dtype for max_distance

41cb21e

glemaitre marked this pull request as ready for review October 20, 2022 09:40

glemaitre added 4 commits October 20, 2022 16:33

TST add a couple of tests (wip)

c510bf8

TST some more tests

85c1914

fused type

9ba964d

FIX put correct name on indices

0c65f8c

Micky774 merged commit 93f1896 into Micky774:HDBSCAN/clean_reachability Nov 5, 2022
