-
-
Notifications
You must be signed in to change notification settings - Fork 26.5k
ENH Add dtype preservation for SpectralClustering
#22669
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
test_spectral.py to test implementations on 32bit datasets
jeremiedbb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
According to some irl discussions, such tests should only be added after Let's keep this PR open in the mean time. |
|
Resurrecting this PR: |
SpectralClustering
| self.affinity_matrix_ = pairwise_kernels( | ||
| X, metric=self.affinity, filter_params=True, **params | ||
| ) | ||
| ).astype(X.dtype, copy=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should work on making pairwise_kernels work efficiently with float32 data first.
Since affinity="rbf" is the default, flagging SpectralClustering as dtype preserving without that prerequisite would be misleading to our users: there would be very few performance or peak memory usage gains in passing float32 data to this estimator when using the default hyper-params.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW we also need a test that checks the dtype of affinity_matrix_ when the affinity param is the str name of a kernel function and another such assertion in a test that covers the case when affinity is nearest_neighbors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also need to have ArgKmin and RadiusNeighbors preserve dtypes because it is also used when affinity="precomputed_nearest_neighbors".
Let's turn this PR as a draft for now.
| .fit(S) | ||
| .labels_ | ||
| ) | ||
| assert adjusted_rand_score(y, labels) == 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check the dtype of affinity_matrix_ here.
| ) | ||
| results.append(labels) | ||
|
|
||
| assert_array_equal(results[0], results[1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check the dtype of affinity_matrix_ here.
|
Closing for now, might reopen later. |
Reference Issues/PRs
Partially addresses #22881
Precedes #22590
What does this implement/fix? Explain your changes.
This parametrizes tests from
test_spectral.pyto run on 32bit datasets.Any other comments?
We could introduce a mechanism to be able to able to remove tests' execution on 32bit datasets if this takes too much time to complete.