This repository was archived by the owner on Feb 28, 2024. It is now read-only.

Conversation

@nfcampos
Contributor

@nfcampos nfcampos commented Aug 12, 2016

added validate_sample argument to Space

  • function that takes in each sample and returns True if the sample is valid
  • added tests for validate_sample
  • catch RecursionError to raise ValueError about dimensions and validate_func being incompatible

this isn't ready for merging or anything, opening the PR just for discussion

@nfcampos nfcampos mentioned this pull request Aug 12, 2016
@betatim betatim changed the title first attempt at conditional Space [WIP] first attempt at conditional Space Aug 15, 2016
@betatim
Member

betatim commented Aug 15, 2016

Could you explain a little how to use this with an example or two?

This is what I had in mind with conditional spaces (taken from the sklearn docs):

param_grid = [
  {'C': [1, 10, 100, 1000], 'kernel': ['linear']},
  {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001], 'kernel': ['rbf']},
 ]
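For readers less familiar with the sklearn convention: a list of dicts describes the *union* of several independent grids, so kernel-specific parameters like gamma only appear in the sub-grid where they are meaningful. A small stdlib-only sketch of how the grid above expands (illustrative only, not scikit-optimize code):

```python
from itertools import product

param_grid = [
    {'C': [1, 10, 100, 1000], 'kernel': ['linear']},
    {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001], 'kernel': ['rbf']},
]

def expand(grid):
    """Enumerate every parameter combination across all sub-grids."""
    for sub in grid:
        keys = sorted(sub)
        for values in product(*(sub[k] for k in keys)):
            yield dict(zip(keys, values))

# 4 linear combinations + 4 * 2 rbf combinations = 12 candidates in total
combos = list(expand(param_grid))
```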

This doesn't work in scikit-optimize yet because we specify the dimensions in a way inspired by scipy.optimize. In a very early version of Space we had support for the following:

param_grid = [
  ((1, 10, 100, 1000), ('linear',)),
  ((1, 10, 100, 1000), ('rbf',), Categorical(0.001, 0.0001)), # note the order
]

Already in this example you have to be careful about how you order your dimensions, so that you don't end up doing a lot of work in your objective function just to figure out which value is which. For this having named dimensions would help, because then we could pass them as named arguments.

How would one implement handling these varying size spaces in a GP? -> should all this be supported by optimizing over two spaces separately "under the hood"?

The more I think about it, the more I am thinking that this is going to take some trial & error before we get it right API-wise.

@nfcampos
Contributor Author

nfcampos commented Aug 15, 2016

I agree that getting this API right will not be obvious. Because of that, I thought we could start by just having a function that validates samples (returning True for valid samples).
While this does not let you define conditional dimensions, it does let you declare that a given value of dimension 0 is incompatible with a given value of dimension 1.

example:

def validate_sample(sample):
  # 'lbfgs' only supports the 'l2' penalty
  return sample[0] == 'liblinear' or sample[1] == 'l2'

space = Space([
  ('liblinear', 'lbfgs'),  # solver
  ('l1', 'l2'),  # penalty
], validate_sample=validate_sample)
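To make the RecursionError point from the commit message concrete, here is a rough sketch (not the PR's actual code) of how a space's rvs could use validate_sample: rejection-sample candidates and give up with a ValueError when the validator looks unsatisfiable. `rvs_validated` and `MAX_ATTEMPTS` are hypothetical names, and dimensions are treated as simple categorical tuples for brevity.

```python
import random

MAX_ATTEMPTS = 1000  # hypothetical cut-off standing in for the recursion limit

def rvs_validated(dimensions, validate_sample, rng=random):
    """Draw candidates and reject invalid ones; fail if nothing passes."""
    for _ in range(MAX_ATTEMPTS):
        sample = [rng.choice(dim) for dim in dimensions]
        if validate_sample(sample):
            return sample
    raise ValueError("dimensions and validate_sample appear incompatible")

def validate_sample(sample):
    # same validator as above, repeated so this snippet is self-contained
    return sample[0] == 'liblinear' or sample[1] == 'l2'

dimensions = [('liblinear', 'lbfgs'), ('l1', 'l2')]
samples = [rvs_validated(dimensions, validate_sample) for _ in range(100)]
```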

For this having named dimensions would help, because then we could pass them as named arguments.

That's exactly the benefit of using the DictSpace from the other PR. Do you think it'd be better if the dimensions themselves had names? What if you then used a mix of named and unnamed dimensions in the same space? That's why I placed the names at the level of the space.
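As a sketch of the named-dimension idea (all names here are hypothetical, not an existing API): a positional sample can be zipped with the space's names and handed to the objective as keyword arguments, so the objective never has to track positional order.

```python
names = ['solver', 'penalty']   # assumed to live on the space itself
sample = ['liblinear', 'l1']    # one positional draw from the space

def objective(solver, penalty):
    # dummy score; a real objective would fit and evaluate a model here
    return 0.0 if (solver, penalty) == ('liblinear', 'l1') else 1.0

score = objective(**dict(zip(names, sample)))
```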

should all this be supported by optimizing over two spaces separately "under the hood"?

Maybe, but then how do you weight the various spaces when sampling? Are they all equally weighted, or weighted by the number of distinct possibilities each one defines (the product of its bounds)?

@MechCoder MechCoder added this to the 0.2 milestone Sep 8, 2016
@codecov-io

Current coverage is 81.75% (diff: 77.77%)

Merging #199 into master will decrease coverage by 0.08%

@@             master       #199   diff @@
==========================================
  Files            18         18          
  Lines           892        899     +7   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits            730        735     +5   
- Misses          162        164     +2   
  Partials          0          0          

Powered by Codecov. Last update 6655873...6f22379

@betatim
Member

betatim commented Sep 14, 2016

Brief thought: it feels more natural to me that the objective decides whether a sample is valid or not. It keeps everything "together". Am I the only one for whom that is intuitive? The objective function could return +inf, raise an exception, or similar to signal that this configuration is invalid. WDYT?
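A minimal sketch of this objective-side approach (the invalid combination and the scores are made up):

```python
import math

def objective(sample):
    """Reject invalid combinations early with +inf instead of fitting."""
    solver, penalty = sample
    if solver == 'lbfgs' and penalty == 'l1':  # invalid combination
        return math.inf                         # fail early, no model fit
    # ... expensive model fit and scoring would go here ...
    return 0.5                                  # dummy score
```

The optimiser still records the invalid point, but pays nothing for evaluating it.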

@nfcampos
Contributor Author

@betatim Yeah, that's a good point. I guess I went with this because I think of Space as something that is useful on its own, so sample validation seemed like something that belongs on the Space itself. Thinking purely in terms of how good the API for using *_minimize is (which I'm well aware is probably the right framing here), I do tend to agree that this can be done more naturally inside the objective function.

@MechCoder
Member

MechCoder commented Sep 14, 2016

The problem with that approach is that it allows unnecessary, expensive function evaluations (unless there is the strict assumption that the objective function fails early on these invalid samples). It would be better to choose candidate points only from those that are "valid".

I would favour the dict-space method, but again I am not sure how to optimize spaces of different sizes. :-/

@MechCoder
Member

optimizing over two spaces separately "under the hood"?

That in a vague way is equivalent to optimising 2 different gp_minimize functions with different spaces and simply choosing the best among (n_calls / 2) * 2 candidate points, no?

@betatim
Member

betatim commented Sep 16, 2016

optimizing over two spaces separately "under the hood"?

That in a vague way is equivalent to optimising 2 different gp_minimize functions with different spaces and simply choosing the best among (n_calls / 2) * 2 candidate points, no?

I think so. It would be merely syntactic sugar.
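A rough sketch of that equivalence, with plain random search standing in for gp_minimize, a naive equal split of the budget, and all dimensions treated as categorical (the names and the toy objective are made up):

```python
import random

def minimize_subspace(objective, dimensions, n_calls, rng):
    """Random search over one sub-space; returns (best_score, best_x)."""
    best_score, best_x = float('inf'), None
    for _ in range(n_calls):
        x = [rng.choice(dim) for dim in dimensions]
        score = objective(x)
        if score < best_score:
            best_score, best_x = score, x
    return best_score, best_x

def minimize_union(objective, subspaces, n_calls, seed=0):
    """Split the budget equally across sub-spaces and keep the overall best."""
    rng = random.Random(seed)
    share = n_calls // len(subspaces)
    return min((minimize_subspace(objective, dims, share, rng)
                for dims in subspaces), key=lambda r: r[0])

# Toy problem: the rbf sub-space always scores better than linear.
subspaces = [
    [(1, 10, 100, 1000), ('linear',)],
    [(1, 10, 100, 1000), ('rbf',), (0.001, 0.0001)],
]

def objective(x):
    return 1.0 if x[1] == 'linear' else x[-1]

best_score, best_x = minimize_union(objective, subspaces, n_calls=40)
```

The equal split here dodges the weighting question raised earlier; a real implementation would have to decide how to apportion the budget.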

@betatim betatim changed the title [WIP] first attempt at conditional Space [WIP] Support Spaces with invalid parameter combinations Sep 16, 2016
@betatim
Member

betatim commented Sep 16, 2016

I'm not worried about the extra calls to the objective. I had assumed that if you set up a problem where parts of the parameter space are "invalid", you'd be smart enough to fail early in the objective.

One advantage of generating "invalid" samples and having the objective tell us that they are invalid is that you would have them in the OptimizeResult and could visualise them afterwards etc.

@nfcampos
Contributor Author

nfcampos commented Jan 9, 2018

Closing this one as well, as I won’t be updating it.

@nfcampos nfcampos closed this Jan 9, 2018
@sytham

sytham commented Jun 21, 2018

It's a shame that this was closed without merging. I think calling the objective function on a sample that you can know in advance is invalid is a waste, even if you fail early in the objective. For example, if I set n_calls of the optimizer to 20, I want that to be 20 valid calls, not, say, 20 calls where 50% of the samples are invalid, leaving only 10 evaluation points. I personally also see no upside in having these invalid samples in OptimizeResult and visualizing them: they're invalid, so I'm not interested in them.

