
Conversation

@kiudee
Contributor

@kiudee kiudee commented Jul 12, 2017

This acquisition function aims to reduce the overall uncertainty of our approximation of the objective function.
This is useful if you want to accurately gauge the effect of each hyperparameter on the objective function, typically to set proper ranges for a subsequent optimization or to remove a parameter completely.

The gaussian_a_opt function uses the standard deviation provided by the base estimator and samples first the points where it is maximal.
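
For concreteness, a minimal runnable sketch of the idea (a hypothetical snippet, not the PR's exact code): rank candidate points by the predictive standard deviation of a fitted scikit-learn GaussianProcessRegressor and evaluate the most uncertain one first.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

# Fit the base estimator on the points evaluated so far.
X_train = np.array([[0.0], [0.5], [1.0]])
y_train = np.array([0.0, 0.25, 1.0])
gp = GaussianProcessRegressor().fit(X_train, y_train)

# Greedy A-optimal step: pick the candidate with maximal predictive std.
X_cand = np.linspace(-1.0, 2.0, 50).reshape(-1, 1)
_, std = gp.predict(X_cand, return_std=True)
next_point = X_cand[np.argmax(std)]
```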

Suggestions for improvement are welcome.

@codecov-io

codecov-io commented Jul 12, 2017

Codecov Report

Merging #432 into master will increase coverage by 0.02%.
The diff coverage is 75%.


@@            Coverage Diff             @@
##           master     #432      +/-   ##
==========================================
+ Coverage   86.43%   86.46%   +0.02%     
==========================================
  Files          22       22              
  Lines        1563     1581      +18     
==========================================
+ Hits         1351     1367      +16     
- Misses        212      214       +2
Impacted Files          Coverage Δ
skopt/acquisition.py    95.95% <75%> (-0.89%) ⬇️
skopt/callbacks.py      95.65% <0%>  (-0.51%) ⬇️


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update afb0e49...3824ef0.

@iaroslav-ai
Member

Looks interesting!

Could you elaborate a bit more on particular use cases for the function, e.g. give a bit more description of practical applications? Could you also provide some references to the literature where such a technique is used? That would be good so that people can take a look at it in more detail.

One idea I have in mind is to use this instead of random initialization for the optimizers, so that the initial points generated are distributed "more evenly" across the search space.

@kiudee
Contributor Author

kiudee commented Jul 12, 2017

The general setting is called active learning, in which you want to learn the target function with as few evaluations as possible.

"A-optimality" was established in optimal design . The goal is to specify design points in advance which reduce the average variance of the parameter estimates. See [1] for a good treatment of the different optimality criteria when applied in Bayesian optimization. This reference could also be useful if we want to implement more criteria like the mutual information.

For initialization we could calculate a fixed set of n_random_starts points to implement an optimal design.
I would advise against using the surrogate model for that purpose.
For quasi-random initialization I would recommend a low-discrepancy sequence of points (see [2] for a recent paper on quasi-Monte Carlo integration). This captures your intuition of exploring the search space "more evenly".
The Spearmint library uses a Sobol sequence for initialization. I would recommend choosing a random start value for the sequence; otherwise it will always start with the exact same points (see the sketch after the references).

[1] Krause, Andreas, Ajit Singh, and Carlos Guestrin. "Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies." Journal of Machine Learning Research 9.Feb (2008): 235-284.
[2] Dick, Josef, Frances Y. Kuo, and Ian H. Sloan. "High-dimensional integration: the quasi-Monte Carlo way." Acta Numerica 22 (2013): 133-288.
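
A sketch of the suggested scrambled-Sobol initialization, using scipy.stats.qmc (a SciPy API that postdates this discussion; scrambling with a seed plays the role of the random start value):

```python
from scipy.stats import qmc

# Scrambled Sobol sequence in [0, 1)^3; a different seed gives a
# different randomization, so runs do not all start at the same points.
sampler = qmc.Sobol(d=3, scramble=True, seed=42)
unit_points = sampler.random(n=8)

# Rescale to the actual search-space bounds, here [-5, 5] per dimension.
init_points = qmc.scale(unit_points, l_bounds=[-5.0] * 3, u_bounds=[5.0] * 3)
```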

@betatim
Member

betatim commented Jul 12, 2017

Naive question: how is this acquisition function different from evaluating the objective using a Sobol (or your favourite quasi-random) sequence? Is it because with a Sobol sequence you explore the space "evenly", while here you pick points that have large uncertainty? Is there a simple example where the two don't lead to "the same" thing? (A heteroscedastic objective?)

@MechCoder
Member

MechCoder commented Jul 12, 2017

Hmm, I think you can achieve the same thing by setting kappa to a very high value in LCB. Is that not the case?

@kiudee
Contributor Author

kiudee commented Jul 13, 2017

@betatim I will play around with a few GPs to come up with an example where the behavior is different. In any case, the Sobol sequence is not adaptive, i.e. it will not change if the user provides an initial set of points for which the objective value is already known.

@MechCoder Yes, indeed, I was doing exactly that as a workaround before deciding to implement the acquisition function. In my opinion it is cleaner this way, since the effect of the mean is completely removed (see the sketch below).
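
To make the comparison concrete, a hedged sketch of both criteria, assuming an estimator that exposes predict(X, return_std=True):

```python
def lcb(X, model, kappa=1.96):
    # Lower confidence bound: a very large kappa drowns out the mean
    # term numerically, which was the workaround discussed above.
    mu, std = model.predict(X, return_std=True)
    return mu - kappa * std

def pure_exploration(X, model):
    # The mean term is removed entirely; minimizing -std is the same as
    # picking the point of maximal predictive standard deviation.
    _, std = model.predict(X, return_std=True)
    return -std
```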

@MechCoder
Member

In that case, I would prefer having a special value for kappa that sets exploitation to zero (and will have no controversy in getting merged) instead of having yet another acquisition function.

@kiudee
Contributor Author

kiudee commented Jul 15, 2017 via email

@kiudee
Contributor Author

kiudee commented Jul 19, 2017

I made the change by letting the user provide the special string 'Aopt' as the kappa parameter in LCB.

Somehow GitHub did not like that I rebased the commits and force-pushed. Any ideas on how to fix the pull request without recreating it?
Edit: It appears that simply reopening it fixed the history, but we need to rerun the tests.

@kiudee kiudee reopened this Jul 19, 2017
@glouppe
Member

glouppe commented Jul 21, 2017

Looks good to me. +1 for merge

@glouppe glouppe changed the title Implement greedy A-optimal acquisition function for pure exploration [MRG+1] Implement greedy A-optimal acquisition function for pure exploration Jul 21, 2017
Controls how much of the variance in the predicted values should be
taken into account. If set to be very high, then we are favouring
exploration over exploitation and vice versa.
If set to 'Aopt', the acquisition function will only use the variance
Member


Sorry for being a prick but is Aopt the best name?

Contributor Author


I agree. Since we do not have any other acquisition functions approximating optimal designs, we could call it something like 'var', 'variance', 'var_only' or 'explore_only'. I am open to suggestions.

Member


"variance" is fine with me.

Controls how much of the variance in the predicted values should be
taken into account. If set to be very high, then we are favouring
exploration over exploitation and vice versa.
If set to 'variance', the acquisition function will only use the variance
Member


Sorry again, but should this be `std`?

Member


Are you talking about the name of the acquisition function? Some might have weird associations with 'std' as an abbreviation 😅

@kiudee
Contributor Author

kiudee commented Jul 26, 2017 via email

@MechCoder
Member

So the confusion on my side is that kappa denotes the value by which the std is multiplied, not the acquisition function itself.

I would be fine with allowing kappa="inf" and/or kappa=np.inf, with a note that says this switches off exploitation. WDYT?

@glouppe
Member

glouppe commented Jul 27, 2017 via email

@kiudee
Contributor Author

kiudee commented Jul 27, 2017 via email

Since in LCB the variable kappa is used to describe how much weight is
given to the standard deviation, 'inf' is a more natural name for
the limit of this weight.
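
With the merged behaviour, usage would look roughly like the following sketch (assuming, as in this PR, that gp_minimize forwards kappa unchanged to the LCB acquisition):

```python
from skopt import gp_minimize

def objective(x):
    return (x[0] - 0.3) ** 2

result = gp_minimize(
    objective,
    [(-1.0, 1.0)],    # search space: one real-valued dimension
    acq_func="LCB",
    kappa="inf",      # pure exploration: only the predictive std is used
    n_calls=15,
    random_state=0,
)
```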
@glouppe
Member

glouppe commented Jul 27, 2017

Good to go for me when Travis is happy.

@kiudee
Contributor Author

kiudee commented Jul 27, 2017

The Travis build was canceled due to:
"The job exceeded the maximum time limit for jobs, and has been terminated."

@MechCoder MechCoder merged commit bb73e24 into scikit-optimize:master Jul 28, 2017
@MechCoder
Member

Thanks!
