[MRG] FIX/TST boundary cases in dbscan #4073

jnothman · 2015-01-10T14:55:37Z

#3994 handled the min_samples boundary case differently to the prior DBSCAN implementation. This is now clarified in the docs. Unfortunately, when properly testing boundary cases, I found the inconsistency reported at #4072. I fix it here for 'brute' search without tests, pending a complete patch for #4072.
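For readers following along, here is a minimal sketch of what the min_samples boundary case means. It assumes a convention in which a point's neighbourhood includes the point itself and any point at distance exactly eps; the toy data and the variable names are illustrative only and are not taken from the patch.

```python
import numpy as np

# Three collinear points spaced exactly eps apart (illustrative data).
X = np.array([[0.0], [1.0], [2.0]])
eps = 1.0
min_samples = 3

# Pairwise Euclidean distances via broadcasting.
dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

# Assumed convention: distance <= eps, and each point counts itself.
inclusive_counts = (dist <= eps).sum(axis=1)
# Alternative convention for comparison: strict inequality.
strict_counts = (dist < eps).sum(axis=1)

core_inclusive = inclusive_counts >= min_samples
core_strict = strict_counts >= min_samples

print(inclusive_counts, core_inclusive)
print(strict_counts, core_strict)
```

Under the inclusive convention only the middle point reaches min_samples=3; under the strict one no point does, which is why the choice has to be documented and tested.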

ogrisel · 2015-01-12T10:35:36Z

sklearn/cluster/tests/test_dbscan.py

typo: boundaries

amueller · 2015-03-03T22:15:50Z

This should now be testable, right?

jnothman · 2015-03-04T11:05:19Z

> This should now be testable, right?

Rather the tests were written elsewhere with a different patch.

This is now rebased and ready for review.

amueller · 2015-03-04T16:34:35Z

the case with no core samples fails...

jnothman · 2015-03-05T00:13:29Z

Of course I reviewed #4052, but what basis did we have for thinking X = rng.rand(40, 10); X[X < .8] = 0 would generate data without core samples for eps=.5, min_samples=5? I get:

>>> np.bincount((pairwise_distances(X) <= .5).sum(axis=1))
[ 0 18 10  3  4  5]

I've made that test more certain.
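The check described above can be sketched with NumPy alone. This is an illustration, not the test that was committed: `neighbor_counts` and `has_core_samples` are hypothetical helpers, the seed is arbitrary, and the sparsifying threshold is assumed to be .8.

```python
import numpy as np


def neighbor_counts(X, eps):
    """Hypothetical helper: samples within eps of each row, self included."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return (dist <= eps).sum(axis=1)


def has_core_samples(X, eps, min_samples):
    """True if any sample has at least min_samples neighbours within eps."""
    return bool((neighbor_counts(X, eps) >= min_samples).any())


rng = np.random.RandomState(0)  # arbitrary seed, assumed for illustration
X = rng.rand(40, 10)
X[X < .8] = 0  # assumed threshold; makes most entries zero

counts = neighbor_counts(X, eps=0.5)
# Histogram of neighbourhood sizes: index k holds the number of
# samples that have exactly k neighbours (including themselves).
print(np.bincount(counts))
print(has_core_samples(X, eps=0.5, min_samples=5))
```

A test that is meant to exercise the no-core-samples path can assert `not has_core_samples(...)` on its data first, so the precondition is checked rather than assumed.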

amueller · 2015-03-05T17:07:07Z

Sorry, that was a hacky test. It probably came from some example that was failing at the time.

amueller · 2015-03-05T17:08:14Z

sklearn/cluster/tests/test_dbscan.py

Maybe a stupid question but why do you need [1] twice?

* tag '0.16b1': (1589 commits) 0.16.X branching, version 0.16b1 Fix scikit-learn#4351. Rendering of docs in MinMaxScaler. Fix rebase conflict MAINT use canonical PEP-440 dev version consistently Adding fix for issue scikit-learn#4297, isotonic infinite loop DOC deprecate random_state for DBSCAN FIX/TST boundary cases in dbscan (closes scikit-learn#4073) Do not shuffle in DBSCAN (warn if `random_state` is used). Update docstring predict_proba() Update documentation of predict_proba in tree module add scipy2013 tutorial links to presentations on website. TST boundary handling in LSHForest.radius_neighbors ENH improve docstrings and test for radius_neighbors models use a pipeline for pre-processing feature selection, as per best practise DOC remove unnecessary backticks in CONTRIBUTING. ENH no need for tie breaking jitter in calibration Implement "secondary" tie strategy in isotonic. Adding unit test to cover ties/duplicate x values in Isotonic Regression re: issue scikit-learn#4184 MAINT fix typo pyagm -> pygamg in SkipTest STYLE trailing spaces ...


ogrisel reviewed Jan 12, 2015
jnothman force-pushed the dbscan_boundary branch from aec25bb to 495c146 Compare January 13, 2015 08:23

kno10 mentioned this pull request Jan 14, 2015

[MRG + 1] Do not shuffle by default for DBSCAN. #4066

Closed

amueller added this to the 0.16 milestone Jan 16, 2015

ogrisel changed the title ~~[MRG pending #4072] FIX/TST boundary cases in dbscan~~ [MRG] FIX/TST boundary cases in dbscan Mar 4, 2015

kno10 mentioned this pull request Mar 4, 2015

[MRG] Faster vectorization of DBSCAN (plain python) #4334

Closed

FIX/TST boundary cases in dbscan

cdb0577

jnothman force-pushed the dbscan_boundary branch from 495c146 to cdb0577 Compare March 4, 2015 11:04

TST fix test_dbscan_no_core_samples condition

b875625

amueller reviewed Mar 5, 2015

jnothman closed this in 15c9c0f Mar 5, 2015

alexsavio pushed a commit to alexsavio/scikit-learn that referenced this pull request Mar 9, 2015

FIX/TST boundary cases in dbscan (closes scikit-learn#4073)

042f9e2

rasbt pushed a commit to rasbt/scikit-learn that referenced this pull request Apr 6, 2015

FIX/TST boundary cases in dbscan (closes scikit-learn#4073)

8fd587d
