-
-
Notifications
You must be signed in to change notification settings - Fork 26.5k
FIX Create a cosistent image via random_state additions in "plot_cluster_comparison.py" #26976
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FIX Create a cosistent image via random_state additions in "plot_cluster_comparison.py" #26976
Conversation
27c4c1e to
aa91f56
Compare
| ) | ||
| noisy_moons = datasets.make_moons(n_samples=n_samples, noise=0.05, random_state=8) | ||
| blobs = datasets.make_blobs(n_samples=n_samples, random_state=8) | ||
| no_structure = np.random.rand(n_samples, 2), None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is not seeded.
So we need to create a random number generator at least for all the np.random of the current file:
| no_structure = np.random.rand(n_samples, 2), None | |
| rng = np.random.RandomState(42) | |
| no_structure = rng.rand(n_samples, 2), None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @glemaitre, I have pushed a commit 7dd631f that adds the changes you propose.
| noisy_moons = datasets.make_moons(n_samples=n_samples, noise=0.05, random_state=8) | ||
| blobs = datasets.make_blobs(n_samples=n_samples, random_state=8) | ||
| no_structure = np.random.rand(n_samples, 2), None | ||
| rng = np.random.RandomState(42) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd move this rng a few lines above and pass the same thing to mane_moons and make_circles
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this relate to the changing order and the RandomState object usage conversation we had at the workshop (cc @glemaitre as it was us talking)? I got the impression that whenever possible the random_state parameter is to be preferred? As far as I understand, it isn't directly possible to add it in the no_structure case, so we use RandomState instead of the numpy implementation. From a best practice perspective, is it different in this specific case? Or this is a style suggestion so to say? Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'm not saying not to use random_state, I'm suggesting:
noisy_moons = datasets.make_moons(n_samples=n_samples, noise=0.05, random_state=rng)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right @adrinjalali sorry, I wasn't clear enough, I meant passing an integer to the random_state parameter, instead of a RandomState object. From my conversation with @glemaitre at the event it felt like the first is preferred, we discussed exactly this. I am also happy to go with it as it's better to merge it than to stall it for a few weeks, just trying to understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then I'll let @glemaitre comment as why :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adrinjalali passing the rng instance, it will be mutated by each call. Potentially, if we change a line of code that alternates the rng in a future PR, we will change all subsequent behaviour of the function using rng. Passing an int will make sure that we don't change the behaviour in future PRs.
Basically, random-state instance dilemma stuff :)
|
I observe a couple of changes that I think we don't want to have. To be honest I am even surprised that this is linked to the random seed, I would have not expected such a difference. The main changes are on the first dataset: BeforeAfterDifferences
For I think that it could be nice to change the random seed such that we preserve the visual effect. |
We noticed these radical changes at the end of the contribution night, but since they were consistent we thought it might be ok 😊 Would experimenting with different random seed values be the right way to proceed here? Is there any other approach? |
I would say that this is enough for this PR indeed. |
89e27aa to
7dd631f
Compare
@glemaitre Having 30 as a random seed seems to give a close enough impression of how the original image was. Some of them are still slightly different, but it is much better. Would this suffice in your opinion? The change can be found in 0f3dc62. I will also keep in mind that the image needs to give out a similar visual impression for future PRs. |
|
I am fine with these changes. Would you mind fixing the linter issues ;) |
…lustering algorithms and data. Co-authored-by: AnnaWey <[email protected]>
Co-authored-by: AnnaWey <[email protected]>
0f3dc62 to
b6d47f3
Compare
Done! Looks all green now. |
|
Thanks @TamaraAtanasoska, all good then. Merging. |
…ter_comparison.py" (scikit-learn#26976) Co-authored-by: AnnaWey <[email protected]>
…ter_comparison.py" (scikit-learn#26976) Co-authored-by: AnnaWey <[email protected]>
…ter_comparison.py" (scikit-learn#26976) Co-authored-by: AnnaWey <[email protected]>
…ter_comparison.py" (#26976) Co-authored-by: AnnaWey <[email protected]>
…ter_comparison.py" (scikit-learn#26976) Co-authored-by: AnnaWey <[email protected]>





Reference Issues/PRs
Fixes #17568.
What does this implement/fix? Explain your changes.
This PR adds the parameter
random_stateto all the clustering algos in the file that take that parameter. Additionally, we have added therandom_stateparameter to two datasets and removed the globalrandom.seedsetting.Any other comments?
After a few tests, it looks like the generated image is consistently the same, but we aren't as confident as we were for the last PR. Please let us know what we can change to improve in case it is still not the same.
This PR was done as part of the PyLadies workshop meetup.
@adrinjalali and @glemaitre please take a look. Thank you!