[MRG] Model expected inverse time along with function #369
Conversation
Codecov Report

```
@@            Coverage Diff             @@
##           master     #369      +/-   ##
==========================================
+ Coverage   86.22%   86.25%   +0.02%
==========================================
  Files          22       22
  Lines        1496     1550      +54
==========================================
+ Hits         1290     1337      +47
- Misses        206      213       +7
```

Continue to review full report at Codecov.
edit: moved to the issue thread, keep the PR to technical discussions
skopt/optimizer/base.py
Outdated
This implies `1/t` follows a lognormal distribution with mean `-m`
and `sigma`, or `E(1/t) = exp(-m + sigma**2 / 2)`.
The next point suggested is the point that minimizes `acq(x)*E(-1/t(x))`
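As an aside, the quoted identity is easy to sanity-check numerically: if `log(t) ~ N(m, sigma**2)`, then `1/t` is lognormal with log-mean `-m` and log-std `sigma`, so `E(1/t) = exp(-m + sigma**2 / 2)`. A quick simulation (illustrative only, not part of the PR):

```python
import numpy as np

rng = np.random.default_rng(0)
m, sigma = 1.5, 0.4

# Sample log(t) ~ N(m, sigma**2), so t is lognormal.
log_t = rng.normal(m, sigma, size=1_000_000)
t = np.exp(log_t)

# Closed form for the mean of the reciprocal of a lognormal variable.
expected = np.exp(-m + sigma**2 / 2)
empirical = (1.0 / t).mean()

assert abs(expected - empirical) / expected < 1e-2
```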
Just curious, do you have a reference for this strategy?
A smart strategy, but one which would require quite some changes in our codebase, is freeze-thaw Bayesian optimization (http://arxiv.org/abs/1406.3896), where the idea is basically to extrapolate the function wrt some of its parameters (like n_epochs). This in turn allows early stopping, thereby reducing the total execution time.
Yes, see section 3.2 of https://arxiv.org/pdf/1206.2944.pdf .
I'll have to look at the other paper though.
While in general I am in favor of taking running time into account, I also think we should be cautious about what we end up implementing. Most of these approaches are very state-of-the-art, that is, none are well-established. So before adding yet another option to the minimizers, I think it would be wise to first establish whether this is worth it.
I removed the …. I'll add tests and some experiments in a while.
Can someone make a pass for correctness?
I wrote a toy example here; the function optimized is a constant function while the time is proportional to the input. Here are the "EI" and "EIps" values after 3 points have been seen.

```python
from math import log

import numpy as np
import matplotlib.pyplot as plt
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.multioutput import MultiOutputRegressor

from skopt.acquisition import gaussian_ei


def constant_func(x):
    return 10.0


def time_func(x):
    """Hard-coded such that the time taken is x[0] (log-time is stored)."""
    return constant_func(x), log(x[0])


x = [3, 5, 7]
x_1D = np.reshape(x, (-1, 1))
y = [constant_func(xi) for xi in x]
x_pred = np.linspace(1, 9, 100)
x_pred1D = np.reshape(x_pred, (-1, 1))

# Naive EI.
gpr = GaussianProcessRegressor(random_state=0)
gpr.fit(x_1D, y)
y_pred = gpr.predict(x_pred1D)
ei_vals = gaussian_ei(x_pred1D, gpr, y_opt=10.0)

# EI that takes time into account.
y_2D = [time_func(xi) for xi in x_1D]
mor = MultiOutputRegressor(gpr)
mor.fit(x_1D, y_2D)
eips_vals = gaussian_ei(x_pred1D, mor, y_opt=10.0, per_second=True)

plt.plot(x, y, "ro")
plt.plot(x_pred, y_pred, label="mean")
plt.plot(x_pred, ei_vals, label="EI")
plt.plot(x_pred, eips_vals, label="EIps")
plt.legend()
plt.show()
```
I can take a closer look maybe this afternoon (EU time) or next week. Had a quick look now and was wondering if we should make the infrastructure to support "per second" already flexible enough to handle arbitrary multi-objective problems? I think that would mainly influence the decision in ….
We can do that, but I don't think the present acquisition function machinery can handle multitask problems. I'd prefer writing separate acquisition functions as and when we decide to support that.
Updated to [MRG]. Ready for review from my side.
Can I attract some attention here?
I don't really know much about the science side of EI per second etc. Would be good to have @glouppe comment on that; at least he seems to know that none are established methods?
Since @glouppe's initial comment, I have moved the option to include time from an option on the minimizers to a new acquisition function. Not sure what you mean by "the science side of EI", can you please elaborate? Also, surely my links to the original Bayesian optimisation paper and the MATLAB toolbox that allows such options make it a more established method than gbrt_minimize, which our codebase supports but which hasn't been published?
In the graph (#369 (comment)), you can see a constant function being modelled by the GP, with time proportional to the logarithm of x. Including the time makes sure that if there are two points to the left and right of the origin, it is likelier to pick the point on the left. You are right that one cannot fine-tune the balance between the function and the time, but I would prefer implementing methods that already exist rather than something new as a starting point.
@iaroslav-ai I've removed references to LCBps and gphedgeps
- `"EI"` for negative expected improvement,
- `"PI"` for negative probability of improvement.
- `"EIps"` for negated expected improvement per second to take into
  account the function compute time. Then, the objective function is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: maybe drop "Then, ", seems a bit redundant
skopt/optimizer/base.py
Outdated

else:
    if np.ndim(y0) != 2 or np.shape(y0)[1] != 2:
        raise ValueError(
            "`y0` elements should be a tuple of 2 values.")
Nitpick: one might get confused by this message by assuming that y0 as such should be a tuple of 2 values. Maybe use something like "Every element in y0 should be a tuple or list containing 2 scalar values"
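A sketch of what that validation with the clearer message could look like (the helper name is hypothetical, not the PR's code):

```python
import numpy as np

def check_y0_pairs(y0):
    # With a per-second acquisition, each prior observation must carry
    # both the objective value and the elapsed time.
    if np.ndim(y0) != 2 or np.shape(y0)[1] != 2:
        raise ValueError(
            "Every element in `y0` should be a tuple or list "
            "containing 2 scalar values.")
```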
acq_vals = -func_and_grad

else:
    raise ValueError("Acquisition function not implemented.")
Consider maybe adding here also some code which prints the list of supported acquisition function identifiers. Might be useful for someone who is familiar with BO but not with skopt, or for someone lazy who does not want to look up how to spell the acquisition name properly.
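For example, a check along those lines might look like this (the identifier list and helper name are illustrative, not skopt's actual API):

```python
SUPPORTED_ACQ_FUNCS = ["LCB", "EI", "PI", "gp_hedge", "EIps", "PIps"]

def check_acq_func(acq_func):
    # Include the full list in the error so users need not look up the spelling.
    if acq_func not in SUPPORTED_ACQ_FUNCS:
        raise ValueError(
            "Acquisition function %r not implemented. Supported values "
            "are: %s" % (acq_func, ", ".join(SUPPORTED_ACQ_FUNCS)))
```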
I have added detailed error messages in the Optimizer instance. It's unlikely that anyone will use this private function directly.
skopt/tests/test_acquisition.py
Outdated

return np.zeros(X.shape[0]), np.ones(X.shape[0])

class MultiOutputSurrogate:
It's more like a two-output surrogate; we might want to add multi-output surrogates later, so it might be better to spare this name for now to use it later. Could you use something along the lines of "TwoOutputSurrogate"?
skopt/tests/test_acquisition.py
Outdated

X_pred = [[1], [2], [4], [6], [8], [9]]
for acq_func in ["EIps", "PIps"]:
    vals = _gaussian_acquisition(X_pred, mos, y_opt=1.0, acq_func=acq_func)
    for fast, slow in zip([0, 1, 2], [5, 4, 3]):
Could you just check here whether vals is a sorted list? Something like
`all(vals[i] <= vals[i+1] for i in range(len(vals)-1))`
Or maybe skip some elements to be safe:
`all(vals[i] <= vals[i+2] for i in range(len(vals)-2))`
... to make it a bit easier for the surrogate to pass the test; while I would expect that most of the time it should learn the right thing, it might not, simply because there is not much data.
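That lagged comparison could be packaged as a small helper along these lines (a sketch, not from the PR):

```python
def is_almost_sorted(vals, lag=2):
    """True if every element is <= the element `lag` positions later.

    A lag > 1 tolerates small local swaps, which makes the test less
    brittle when the surrogate is trained on very little data.
    """
    return all(vals[i] <= vals[i + lag] for i in range(len(vals) - lag))
```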
skopt/tests/test_common.py
Outdated

MINIMIZERS = [gp_minimize]
ACQUISITION = ["LCB", "PI", "EI"]

ACQ_FUNCS_PS = ["LCBps", "PIps", "EIps", "gp_hedgeps"]
Don't forget to remove the unsupported acquisitions here
Just thinking out loud: a user can always trade off the value of time against the objective, in principle, by just multiplying the time returned in the objective function by some constant. This would be good to clarify somewhere in the documentation. Still, an explicit tradeoff parameter would be better, I think.
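The scaling idea could be sketched as a wrapper around the user's objective (`scale_time` and `time_weight` are hypothetical names, not part of skopt):

```python
def scale_time(objective, time_weight):
    """Wrap a `(value, time)` objective so the reported time is scaled.

    A larger `time_weight` makes a per-second acquisition penalize slow
    points more heavily; `time_weight` is a hypothetical tradeoff knob.
    """
    def wrapped(x):
        value, elapsed = objective(x)
        return value, time_weight * elapsed
    return wrapped
```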
if len(x0) != len(y0):
    raise ValueError("`x0` and `y0` should have the same length")

if not all(map(np.isscalar, y0)):
these checks were moved inside Optimizer
@iaroslav-ai I've addressed all your comments and have added gradient support with gradient checks. I would postpone adding an example and allowing the user to configure the tradeoff to another pull request. Please have another look.
skopt/optimizer/forest.py
Outdated

account the function compute time. Then, the objective function is
assumed to return two values, the first being the objective value and
the second being the time taken.
- `"PIps"` for negated probability of improvement per second.
Consider adding here that the objective function is assumed to be the same as for `"EIps"`, or maybe describe `"EIps"` and `"PIps"` jointly. Otherwise someone in a rush (or tired) might miss that the objective function is the same as for `"EIps"`, which might lead to some confusion.
if "ps" in self.acq_func:
    if is_2Dlistlike(x):
        if np.ndim(y) == 2 and np.shape(y)[1] == 2:
            y = [[val, log(t)] for (val, t) in y]
A bit of a nitpick: maybe use `log(t + 1e-3)`, for example in case the user measures time in minutes and it happens that for some task the time is zero minutes (< 60 seconds).
Maybe it is easier to document that time should be in seconds?
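Whichever convention is documented, guarding the transform is cheap; a sketch with the suggested `1e-3` floor (`safe_log_time` is a hypothetical name):

```python
from math import log

def safe_log_time(t, eps=1e-3):
    # Guard against t == 0 (e.g. sub-resolution timings) before taking the log.
    return log(t + eps)
```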
return np.zeros(X.shape[0]), np.ones(X.shape[0])

class ConstantGPRSurrogate(object):
Add a comment somewhere around to explain why you need those.
@pytest.mark.fast_test
@pytest.mark.parametrize("acq_func", ["EIps", "PIps"])
def test_acquisition_per_second(acq_func):
You check here whether the acquisition function works properly with an example surrogate, ConstantGPRSurrogate, by checking whether the time function wrt x is learned, right? Would be good to have a comment elaborating on this.
Added below but the diff view does not collapse it.
LGTM when my latest comments are addressed!
Will merge when tests pass.

`return_grad=True` when `per_second=True`