-
-
Notifications
You must be signed in to change notification settings - Fork 26.5k
[MRG+2] Fix sparse matrix handling in clustering #4052
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG+2] Fix sparse matrix handling in clustering #4052
Conversation
|
@jnothman |
77a4e13 to
de91554
Compare
|
Huh? Oh, perhaps that's the only part of the implementation I didn't test! :s |
|
You can never test enough ;) |
|
No, had nothing to do with parts I didn't test. It's an edge case. That scipy.sparse doesn't seem to handle well... Until recently there was no support for sparse matrices with a zero-dimension. Apart from that, it seems |
|
Meh :-/ Do you want to look into it / do a fix? I'm not super familiar with the DBSCAN code. |
sklearn/utils/estimator_checks.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
y, oh y?
|
more tests with less code :D |
|
Regrding the edge case of slicing with an empty index array: this is unsupported in scipy 0.13, but is fixed in master. So perhaps the solution is to special-case it. If we land up with no core samples, create a dense array of the correct shape (but with no data)! Or, could special-case it only if scipy raises an error. |
|
maybe just special-case it all the time and comment saying it is scipy backward compatibility? |
9925a98 to
c71ebb2
Compare
|
fixed the dbscan issue. |
ae632fc to
75f2f5e
Compare
sklearn/cluster/tests/test_dbscan.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*no_core_samples
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Maybe that's where the missing s went from a very different patch)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
haha I bet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment has not been addressed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Damn. I forgot to push at some point or a rebase messed it up :-/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
Except for |
|
The |
|
But I can also ditch the other one and we just use this. |
|
the problem is that I used #4058 elsewhere, too. |
04a4364 to
2345d23
Compare
|
Ah sorry, I'd forgotten this was on top of #4058 On 12 January 2015 at 05:45, Andreas Mueller [email protected]
|
2345d23 to
3e3fdd3
Compare
|
done |
3e3fdd3 to
93ed5d8
Compare
sklearn/cluster/tests/test_dbscan.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume that all samples are clustered as outliers in that case? It would be great to add an assertion to check that here, first by asserting that core_sample_indices_ has shape (0,) and second that labels_ is filed with -1s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
Aside from this last minor comment, +1 on my side as well. |
Make sparsity check check everything. don't test everything. That would be nice but is out of scope :-/ catch special case of no core samples in DBSCAN add nonregression test for sparse dbscan with no core samples.
93ed5d8 to
494b8e5
Compare
|
will merge if travis agrees with me. |
[MRG+2] Fix sparse matrix handling in clustering
See #4051.
Needs fixing: