This repository was archived by the owner on Feb 28, 2024. It is now read-only.

Conversation

@MechCoder
Member

No description provided.

@MechCoder
Member Author

@betatim @glouppe

[Plot: random_vs_gp — best score per iteration for gp_minimize vs random search]

DISCLAIMER: I varied the random state till gp_search performs better than random_search :P

@betatim
Member

betatim commented Apr 6, 2016

🍒 picking :)

To avoid the option of tweaking random_state, I would suggest running the whole thing several times and plotting the mean score after each iteration, together with the error on the mean.
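A minimal sketch of this repeat-and-average idea, assuming a placeholder `objective` and `space` and the current scikit-optimize API (which may differ from the signatures discussed in 2016):

```python
# Sketch: run each search with several random seeds, track the best score
# found after each iteration, and plot the mean trace with the standard
# error of the mean. `objective` and `space` are placeholders.
import numpy as np
import matplotlib.pyplot as plt
from skopt import gp_minimize, dummy_minimize

def best_so_far(res):
    # Cumulative minimum of the objective values, one entry per iteration.
    return np.minimum.accumulate(res.func_vals)

def averaged_trace(minimizer, objective, space, n_calls=50, n_repeats=10):
    traces = np.array([best_so_far(minimizer(objective, space,
                                              n_calls=n_calls,
                                              random_state=seed))
                       for seed in range(n_repeats)])
    mean = traces.mean(axis=0)
    sem = traces.std(axis=0, ddof=1) / np.sqrt(n_repeats)
    return mean, sem

# mean_gp, sem_gp = averaged_trace(gp_minimize, objective, space)
# mean_rnd, sem_rnd = averaged_trace(dummy_minimize, objective, space)
# iters = np.arange(1, len(mean_gp) + 1)
# plt.errorbar(iters, mean_gp, yerr=sem_gp, label="gp_minimize")
# plt.errorbar(iters, mean_rnd, yerr=sem_rnd, label="random search")
# plt.xlabel("iteration"); plt.ylabel("best objective value"); plt.legend()
# plt.show()
```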

@MechCoder
Member Author

I thought about that as well.

But finding the mean score at every iteration kind of defeats the purpose, right? Typically you have the budget to run only n iterations, not n * x, so it does depend on the random state. If we average this randomness out, we are more likely to get a better score at every iteration.

@MechCoder
Member Author

Maybe. I'll give it a try.

@betatim
Member

betatim commented Apr 6, 2016

Doing n repeats isn't something you'd do in a real case. For the example it is a way to allow people to judge if there is a statistically significant advantage for GP and how big that advantage is.

If we repeat the example 100 times with 100 different seeds, and GP wins ~50 times while random wins ~50 times, the conclusion would be that there isn't much difference. If, however, GP wins in 90 cases, you'd conclude that GP is really much smarter.
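A hedged sketch of that win-counting check; `objective` and `space` are again placeholders, and `dummy_minimize` stands in for plain random search:

```python
# Sketch: count how often gp_minimize ends with a strictly better (lower)
# objective value than random search over many seeds.
from skopt import gp_minimize, dummy_minimize

def gp_win_fraction(objective, space, n_calls=50, n_repeats=100):
    wins = 0
    for seed in range(n_repeats):
        gp = gp_minimize(objective, space, n_calls=n_calls, random_state=seed)
        rnd = dummy_minimize(objective, space, n_calls=n_calls, random_state=seed)
        wins += int(gp.fun < rnd.fun)
    return wins / n_repeats  # ~0.5 means no clear advantage, ~0.9 means GP wins
```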

@MechCoder
Member Author

Indeed, I agree with you!

@betatim
Member

betatim commented Apr 6, 2016

Wondering if the average score after each iteration is the best way to visualise this, or if it would be better to show the distribution of the final score (after 100 iterations) for each method.
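A small sketch of the distribution view, assuming the per-seed final scores have already been collected (e.g. as `res.fun` from each repeat):

```python
# Sketch: compare the distributions of the final best scores directly,
# e.g. with a box plot, instead of averaging the traces.
import matplotlib.pyplot as plt

def plot_final_score_distributions(final_scores_gp, final_scores_random):
    plt.boxplot([final_scores_gp, final_scores_random])
    plt.xticks([1, 2], ["gp_minimize", "random search"])
    plt.ylabel("best objective value after 100 iterations")
    plt.show()
```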

@MechCoder
Member Author

[Plot: figure_1-4 — best scores averaged across 5 random states]

I averaged the best scores across 5 random states. Hope the new graphs are convincing. Please merge if happy.

@MechCoder
Member Author

I have changed the example.

@glouppe
Member

glouppe commented Apr 15, 2016

Thanks! However, I think we should not mix messages and should illustrate only one concept at a time. What do you think of showing only how to use gp_minimize for hyperparameter search, without the comparison with RandomizedSearchCV?

@MechCoder
Member Author

Removed the comparison with dummy_search and updated the example to show just how to use it in combination with an sklearn estimator.

params = {
    'max_depth': [max_depth], 'max_features': [max_features],
    'min_samples_split': [mss], 'min_samples_leaf': [msl]}
gscv = GridSearchCV(rfc, params, n_jobs=-1)
Member

Not sure why GridSearchCV is needed; cross_val_score should be enough.
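A minimal sketch of that suggestion, scoring each candidate with `cross_val_score` inside the objective passed to `gp_minimize`; the dataset, estimator settings, and search space below are illustrative assumptions, and the skopt call uses the current API:

```python
# Sketch: evaluate each candidate with cross_val_score directly instead of
# wrapping a single parameter setting in GridSearchCV.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from skopt import gp_minimize

X, y = load_digits(return_X_y=True)

def objective(params):
    max_depth, max_features, mss, msl = params
    rfc = RandomForestClassifier(n_estimators=20,
                                 max_depth=max_depth,
                                 max_features=max_features,
                                 min_samples_split=mss,
                                 min_samples_leaf=msl,
                                 random_state=0)
    # gp_minimize minimizes, so return the negated mean CV accuracy.
    return -np.mean(cross_val_score(rfc, X, y, cv=3, n_jobs=-1))

space = [(3, 30),   # max_depth
         (1, 64),   # max_features
         (2, 10),   # min_samples_split
         (1, 10)]   # min_samples_leaf

res = gp_minimize(objective, space, n_calls=30, random_state=0)
print("best parameters:", res.x, "best CV accuracy:", -res.fun)
```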

@MechCoder
Member Author

@glouppe Addressed!

@glouppe
Member

glouppe commented Apr 21, 2016

Merging!

glouppe merged commit e4a010f into master on Apr 21, 2016
glouppe deleted the random_vs_gp branch on April 21, 2016 at 11:37
holgern added a commit that referenced this pull request Feb 21, 2020