Skip to content

Conversation

@iansan5653
Copy link
Contributor

Prevent division by zeros warning in davies_bouldin_score by replacing zero intracluster distances with np.inf. This results in division with those values returning zero instead of np.nan, which is intuitive (clusters are identical to themselves, so their similarity to themselves is 0). This fixes #12611.

Also improves the docstring for davies_bouldin_score by adding a bit more detail on how it works and what value is expected.

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a non-regression test that ensures no warning is raised

@iansan5653
Copy link
Contributor Author

Is there any way I can can run the tests without building? I don't have C++ build tools installed, nor do I have admin access on my machine to install them.

@jnothman
Copy link
Member

The tests are run here in any case, and pass. It should be possible to run tests on an installed package, but I don't know a neat way to install your working copy without a compiler or some hackery.

@jnothman
Copy link
Member

You should not need admin access, though. conda for instance will happily install build tools without super-user rights. You just need to first install conda in user space.

Copy link
Contributor

@eamanu eamanu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Like say jnothman is good idea use np.max alias. whatsnew?

@iansan5653
Copy link
Contributor Author

I don't understand what's going on with the codecov test; if I'm understanding the output correctly, it's reflecting changes that I didn't make in these commits.

@iansan5653
Copy link
Contributor Author

I'm not sure why it's failing again. It just says No module named builtins

@jnothman
Copy link
Member

Please merge the latest master. Circle CI config has changed

@iansan5653
Copy link
Contributor Author

Anything else I need to do to get this merged?

@jnothman
Copy link
Member

Wait for a second core developer to review

@jnothman
Copy link
Member

Pinging, as you have, helps :)

Copy link
Member

@agramfort agramfort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@agramfort
Copy link
Member

+1 for MRG provided the test added fails on master.

@jnothman
Copy link
Member


(py37) joel@MacBook-Pro:~/repos/scikit-learn(master*)$ git checkout upstream/pr/12760 sklearn/metrics/cluster/tests/test_unsupervised.py
(py37) joel@MacBook-Pro:~/repos/scikit-learn(master*)$ pytest sklearn/metrics/cluster/tests/test_unsupervised.py
============================================================================================= test session starts ==============================================================================================
platform darwin -- Python 3.7.2, pytest-4.1.1, py-1.7.0, pluggy-0.8.1
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/Users/joel/repos/scikit-learn/.hypothesis/examples')
rootdir: /Users/joel/repos/scikit-learn, inifile: setup.cfg
plugins: cov-2.6.1, hypothesis-4.6.0
collected 9 items

sklearn/metrics/cluster/tests/test_unsupervised.py ........F                                                                                                                                             [100%]

=================================================================================================== FAILURES ===================================================================================================
__________________________________________________________________________________________ test_davies_bouldin_score ___________________________________________________________________________________________

    def test_davies_bouldin_score():
        assert_raises_on_only_one_label(davies_bouldin_score)
        assert_raises_on_all_points_same_cluster(davies_bouldin_score)

        # Assert the value is 0. when all samples are equals
        assert davies_bouldin_score(np.ones((10, 2)),
                                    [0] * 5 + [1] * 5) == pytest.approx(0.0)

        # Assert the value is 0. when all the mean cluster are equal
        assert davies_bouldin_score([[-1, -1], [1, 1]] * 10,
                                    [0] * 10 + [1] * 10) == pytest.approx(0.0)

        # General case (with non numpy arrays)
        X = ([[0, 0], [1, 1]] * 5 + [[3, 3], [4, 4]] * 5 +
             [[0, 4], [1, 3]] * 5 + [[3, 1], [4, 0]] * 5)
        labels = [0] * 10 + [1] * 10 + [2] * 10 + [3] * 10
        pytest.approx(davies_bouldin_score(X, labels), 2 * np.sqrt(0.5) / 3)

        # Ensure divide by zero warning is not raised in general case
        with pytest.warns(None) as record:
            davies_bouldin_score(X, labels)
        div_zero_warnings = [
            warning for warning in record
            if "divide by zero encountered" in warning.message.args[0]
        ]
>       assert len(div_zero_warnings) == 0
E       assert 1 == 0
E        +  where 1 = len([<warnings.WarningMessage object at 0x11e0a7c18>])

sklearn/metrics/cluster/tests/test_unsupervised.py:244: AssertionError
================================================================================ 1 failed, 8 passed, 2 warnings in 2.60 seconds ================================================================================
(py37) joel@MacBook-Pro:~/repos/scikit-learn(master*)$

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure that this needs a what's new.

@jnothman jnothman merged commit 93a1b77 into scikit-learn:master Feb 25, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Davies Bouldin measure: division by zero

4 participants