Skip to content
This repository was archived by the owner on Feb 28, 2024. It is now read-only.

Conversation

@MechCoder
Copy link
Member

@MechCoder MechCoder commented Apr 26, 2017

  • Add tests
  • Support return_grad=True when per_second=True

@codecov-io
Copy link

codecov-io commented Apr 26, 2017

Codecov Report

Merging #369 into master will increase coverage by 0.02%.
The diff coverage is 90.9%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #369      +/-   ##
==========================================
+ Coverage   86.22%   86.25%   +0.02%     
==========================================
  Files          22       22              
  Lines        1496     1550      +54     
==========================================
+ Hits         1290     1337      +47     
- Misses        206      213       +7
Impacted Files Coverage Δ
skopt/optimizer/gp.py 100% <ø> (ø) ⬆️
skopt/optimizer/gbrt.py 100% <ø> (ø) ⬆️
skopt/optimizer/base.py 94.82% <ø> (+1.27%) ⬆️
skopt/optimizer/forest.py 100% <ø> (ø) ⬆️
skopt/utils.py 98.33% <100%> (+0.23%) ⬆️
skopt/acquisition.py 96.84% <100%> (+0.78%) ⬆️
skopt/optimizer/optimizer.py 93.61% <81.08%> (-3.95%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4c8bbd0...9dd249d. Read the comment docs.

@betatim
Copy link
Member

betatim commented Apr 27, 2017

edit: moved to the issue thread, keep the PR to technical discussions

This implies `1/t` follows a lognormal distribution with mean `-m`
and `sigma` or `E(1/t) = exp(-m + sigma**2)`
The next point suggested is the point that minimizes `acq(x)*E(-1/t(x))`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, do you have a reference for this strategy?

A smart strategy, but which would require quite some changes in our codebase, is freeze-thaw bayesian optimization (http://arxiv.org/abs/1406.3896) where the idea is basically to extrapolate the function wrt some its parameters (like n_epochs). This is in turn allows to do early stopping, thereby reducing the total execution time.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, see section 3.2 of https://arxiv.org/pdf/1206.2944.pdf .

I'll have to look at the other paper though.

@glouppe
Copy link
Member

glouppe commented May 2, 2017

While in general I am in favor of taking running time into account, I also think we should be cautious in what we end up implementing. Most of those approaches are very state-of-the-art -- that is none are well-established. So before adding yet another option to the minimizers, I think it would be wise to really establish first whether this is worth it.

@MechCoder MechCoder force-pushed the acq_function_with_time branch 3 times, most recently from f06cafa to 4c00169 Compare May 31, 2017 14:45
@MechCoder MechCoder force-pushed the acq_function_with_time branch from c30b035 to 7e2f965 Compare June 1, 2017 06:37
@MechCoder
Copy link
Member Author

I removed the include_time parameter and added support for acquisition functions based on the suffix "ps". I still need to finish manually compute gradients for optimizations.

I'll add tests and some experiments in a while.

@MechCoder
Copy link
Member Author

Can someone make a pass for correctness?

@MechCoder MechCoder force-pushed the acq_function_with_time branch from 68161b1 to 7d81c51 Compare June 2, 2017 05:54
@MechCoder
Copy link
Member Author

I wrote a toy example here, the function optimized is a constant function while the time is proportional to the input. Here are the "EI" and "EIps" values after 3 points have been seen.

def contant_func(x):
    return 10.0

def time_func(x):
    """
    Hard-code such that time is x[0]
    """
    return contant_func(x), log(x[0])

x = [3, 5, 7]
x_1D = np.reshape(x, (-1, 1))
y = [contant_func(xi) for xi in x]
x_pred = np.linspace(1, 9, 100)
x_pred1D = np.reshape(x_pred, (-1, 1))

# Naive EI.
gpr = GaussianProcessRegressor(random_state=0)
gpr.fit(x_1D, y)
y_pred = gpr.predict(x_pred1D)
ei_vals = gaussian_ei(x_pred1D, gpr, y_opt=10.0)

# EI that takes into account time.
y_2D = [time_func(xi) for xi in x_1D]
mor = MultiOutputRegressor(gpr)
mor.fit(x_1D, y_2D)
eips_vals = gaussian_ei(x_pred1D, mor, y_opt=10.0, per_second=True)

plt.plot(x, y, "ro")
plt.plot(x_pred, y_pred, label="mean")
plt.plot(x_pred, ei_vals, label="EI")
plt.plot(x_pred, eips_vals, label="EIps")
plt.legend()
plt.show()

eips

@betatim
Copy link
Member

betatim commented Jun 2, 2017

I can take a closer look maybe this afternoon (EU time) or next week. Had a quick look now and was wondering if we should make the infrastructure to support "per second" already flexible enough to handle arbitrary multi-objective problems? I think that would mainly influence decision in Optimizer, the user facing parts would stay the same. Thoughts?

@MechCoder
Copy link
Member Author

We can do that but I don't think the present acquisition function machinery can handle multitask problems. I'd prefer writing separate acquisition functions as and when we decide to support that.

@MechCoder
Copy link
Member Author

Updated to [MRG]. Ready for review from my side.

@MechCoder MechCoder changed the title [WIP] Model expected inverse time along with function [MRG] Model expected inverse time along with function Jun 2, 2017
@MechCoder MechCoder added this to the 0.4 milestone Jun 2, 2017
@MechCoder
Copy link
Member Author

Can I attract some attention here?

@betatim
Copy link
Member

betatim commented Jun 13, 2017

I don't really know much about the science side of EI per second etc. Would be good to have @glouppe comment on that, at least he seems to know that none are established methods??

@MechCoder
Copy link
Member Author

Since @glouppe 's initial comment, I have moved the option to include time as an option to the minimizers to a new acquisition function. Not sure what you mean by "the science side of EI", can you please elaborate? Also, surely my links to the original bayesian optimisation paper and the matlab toolbox that allows such options make it a more established method than gbrt_minimize which our codebase supports which hasn't been published?

@MechCoder
Copy link
Member Author

In the graph (#369 (comment)), you can see a constant function being modelled by the GP, with time proportional to the logarithm of x. Including the time, makes sure that if there are two points to the left and right of the origin, it is likelier to pick the point on the left.

You are right that one cannot fine-tune the balance between the function and time, but I would prefer implementing methods that are already there rather than something new as a starting point.

@MechCoder
Copy link
Member Author

@iaroslav-ai I've removed references to LCBps and gphedgeps

- `"EI"` for negative expected improvement,
- `"PI"` for negative probability of improvement.
- ``"EIps"`` for negated expected improvement per second to take into
account the function compute time. Then, the objective function is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: maybe drop "Then, ", seems a bit redundant

else:
if np.ndim(y0) != 2 and np.shape(y0)[1] != 2:
raise ValueError(
"`y0` elements should be a tuple of 2 values.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: one might get confused by this message by assuming that y0 as such should be a tuple of 2 values. Maybe use something like "Every element in y0 should be a tuple or list containing 2 scalar values"

acq_vals = -func_and_grad

else:
raise ValueError("Acquisition function not implemented.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider maybe adding here also some code which prints the list of supported acquisition function identifiers. Might be useful for someone who is familiar with BO but not with skopt, or for someone lazy who does not want to look up how to spell the acquisition name properly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added detailed error messages in the Optimizer instance. It's unlikely that anyone will use this private function directly.

return np.zeros(X.shape[0]), np.ones(X.shape[0])


class MultiOutputSurrogate:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its more like a two output surrogate, we might want to add multi - output surrogates later, might be better to spare this name for now to use it later. Could you use something along the lines of "TwoOutputSurrogate"?

X_pred = [[1], [2], [4], [6], [8], [9]]
for acq_func in ["EIps", "PIps"]:
vals = _gaussian_acquisition(X_pred, mos, y_opt=1.0, acq_func=acq_func)
for fast, slow in zip([0, 1, 2], [5, 4, 3]):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you here just check if vals is a sorted list? Something like

all(vals[i] <= vals[i+1] for i in range(len(vals)-1))

Or maybe skip some elements to be sure

all(vals[i] <= vals[i+2] for i in range(len(vals)-2))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... to make it a bit more easy to pass the test for the surrogate - while I would expect that most of the time it should learn the right thing, it might not just because data is not too much

MINIMIZERS = [gp_minimize]
ACQUISITION = ["LCB", "PI", "EI"]

ACQ_FUNCS_PS = ["LCBps", "PIps", "EIps", "gp_hedgeps"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't forget to remove here the unsupported acquisitions

@iaroslav-ai
Copy link
Member

Just thinking out loud: a user one can always have a tradeoff between value of time and objective in principle by just multiplying the time returned in the objective function by some constant. This would be good to clarify in documentation somewhere. Still explicit tradeoff parameter would be better I think.

@MechCoder MechCoder force-pushed the acq_function_with_time branch from 4e146ce to 677008e Compare June 27, 2017 13:31
if len(x0) != len(y0):
raise ValueError("`x0` and `y0` should have the same length")

if not all(map(np.isscalar, y0)):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these checks were moved inside Optimizer

@MechCoder
Copy link
Member Author

@iaroslav-ai I've addressed all your comments and have added support for gradient with gradient checks. I would postpone adding an example and allowing the user to configure the tradeoff to another pull request. Please have another look.

account the function compute time. Then, the objective function is
assumed to return two values, the first being the objective value and
the second being the time taken.
- `"PIps"` for negated probability of improvement per second.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding here that the objective function is assumed to be the same as for 'EIps', or maybe describe "EIps" and "PIps" jointly. Otherwise someone who might be in a rush or tired can just miss the implied similarity of objective function as is for the 'EIps', which might lead to some confusion.

if "ps" in self.acq_func:
if is_2Dlistlike(x):
if np.ndim(y) == 2 and np.shape(y)[1] == 2:
y = [[val, log(t)] for (val, t) in y]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit of a nitpick: maybe use log(t + 1e-3), for example in case user measures time in minutes, and then it might happen that for some task the time is zero minutes (< 60 seconds).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it is easier to document that time should be in seconds?

return np.zeros(X.shape[0]), np.ones(X.shape[0])


class ConstantGPRSurrogate(object):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment somewhere around to explain why you need those.


@pytest.mark.fast_test
@pytest.mark.parametrize("acq_func", ["EIps", "PIps"])
def test_acquisition_per_second(acq_func):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You check here whether the acquisition function is working properly with an example surrogate ConstantGPRSurrogate by checking if the time function wrt x is learned, right? Would be good to have a comment elaborating on this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added below but the diff view does not collapse it.

@iaroslav-ai
Copy link
Member

LGTM when my latest comments are addressed!

@MechCoder
Copy link
Member Author

Will merge when tests pass.

@MechCoder MechCoder merged commit a151a9a into scikit-optimize:master Jul 10, 2017
@MechCoder MechCoder deleted the acq_function_with_time branch July 10, 2017 01:30
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants