This repository was archived by the owner on Feb 28, 2024. It is now read-only.

Conversation

@Hvass-Labs
Contributor

This pull-request partially solves issue #576.

I have added some more support for named dimensions in the search-space, as well as two plotting functions. I don't have time to fix your existing plotting functions, but I have added TODO comments in plots.py describing what needs to be done.

Before I start, let me repeat that I really like your library! So please take the following as friendly and constructive feedback. I spent the whole Saturday from morning to evening working on this, and half of this Sunday. A large part of yesterday was spent just trying to understand your code because of the severe lack of comments. I put in this much effort because I think your library is important, and I hope you will polish it some more.

Please note that I have to travel tomorrow, so it might be a while before I can respond if you have questions or comments about this PR.

How to test this pull-request

The following code exercises the additions in this PR by simulating an objective function for tuning the hyper-parameters of a simple neural network. It saves a few plots to disk and prints various info. It should work if you copy it into a Python file and run it.

(And yes! My code is also ugly when I'm writing it. I polish it when it is done and works. But this code is not for release :-)

import numpy as np
from math import exp

from skopt import gp_minimize
from skopt.space import Real, Categorical, Integer
from skopt.plots import plot_histogram, plot_contour, plot_objective

dim_learning_rate = Real(name='learning_rate', low=1e-6, high=1e-2, prior='log-uniform')
dim_num_dense_layers = Integer(name='num_dense_layers', low=1, high=5)
dim_num_dense_nodes = Integer(name='num_dense_nodes', low=5, high=512)
dim_activation = Categorical(name='activation', categories=['relu', 'sigmoid'])

dimensions = [dim_learning_rate,
              dim_num_dense_layers,
              dim_num_dense_nodes,
              dim_activation]

default_parameters = [1e-4, 1, 64, 'relu']


def model_fitness(x):
    learning_rate, num_dense_layers, num_dense_nodes, activation = x

    fitness = ((exp(learning_rate) - 1.0) * 1000) ** 2 + \
              num_dense_layers ** 2 + \
              (num_dense_nodes / 100) ** 2

    fitness *= 1.0 + 0.1 * np.random.rand()

    if activation == 'sigmoid':
        fitness += 10

    return fitness

print(model_fitness(x=default_parameters))

search_result = gp_minimize(func=model_fitness,
                            dimensions=dimensions,
                            n_calls=40,
                            x0=default_parameters)

print(search_result.x)
print(search_result.fun)

for fitness, x in sorted(zip(search_result.func_vals, search_result.x_iters)):
    print(fitness, x)

space = search_result.space

print(search_result.x_iters)

print(space.to_dict(x=default_parameters))

print("Plotting now ...")

# fig = plot_histogram(result=search_result, dimension_id='activation')
fig = plot_histogram(result=search_result, dimension_id='learning_rate', bins=20)
# fig = plot_histogram(result=search_result, dimension_id='num_dense_layers', bins=20)
# fig = plot_histogram(result=search_result, dimension_id='num_dense_nodes', bins=20)
fig.savefig('histogram.png')

fig = plot_contour(result=search_result,
                   dimension_id1='learning_rate',
                   dimension_id2='num_dense_nodes')
fig.savefig('contour_learning_rate_vs_num_dense_nodes.png')

fig = plot_contour(result=search_result,
                   dimension_id1='num_dense_layers',
                   dimension_id2='num_dense_nodes')
fig.savefig('contour_num_dense_layers_vs_num_dense_nodes.png')

print("Done plotting!")

Comments about this PR

  • I am completely new to your library so it is possible I have misunderstood how something works. Please feel free to make changes to the code I have added. But please keep the good commenting and coding style as it will help people in the future who have to understand and maintain this code.

  • I am not familiar with the syntax you use for doc-strings; I have tried to imitate it. I normally use PyCharm, which has a different syntax for doc-strings. It is one of the things I really don't like about Python: you don't have to specify data-types in the function declaration, but then you often have to specify them in the doc-strings using a strange syntax. It's really poor language design. But anyway, please check if my doc-strings are correct.

  • Some of the functionality I have added such as space.get_dimensions() is not actually used by my code. It is intended to be helpful for when you fix the other plotting functions to support named dimensions and search-spaces with categories, etc.

  • I added the function space.to_dict(). There is a somewhat related function in utils.point_asdict(). But I prefer the name to_dict() which is also used by Pandas. I realize it is bad to have two naming conventions in one library. Could I convince you to change the name of asdict()? :-)

Comments about your existing code

You REALLY need to make better comments in your code! It is very difficult for others to understand what you want to do and how everything fits together. When there is a bug we have to guess what your intentions are with the code. If you don't have time to maintain this library in the future, it will probably become abandoned because others will find it easier to just start a new project rather than to try and understand and build on yours. That would be a pity, because you have some outstanding ideas in your library!

For example, this is one of the few comments in plots.py:

# plots on the diagonal are special, like Texas. They have
# their own range so do not mess with them.

I had to read this numerous times to finally understand that it was probably a joke about Texas, USA, although I'm still not 100% sure. You really have to be a master at writing comments in order to pull off jokes; otherwise it will be confusing and harmful to the people who try to understand your code. I personally don't write jokes in code, and I am very reluctant to make jokes in my video tutorials as well, for the same reason.

Please look at the coding and commenting style in the functions I have added. You may think that I write too many comments, but what I am aiming for is to explain as much as possible in plain English. Every time you have a function call into skopt or matplotlib or some other package, the reader needs to look up the semantics of that function. If that is also poorly documented, reading your code becomes like solving a giant puzzle. It takes immense effort and bugs are allowed to propagate very easily. When everything is commented in plain English, you can read and understand the code very quickly.

Another issue is your actual coding style. I found bugs as well as code that was correct but not very good. For example, these nested loops are typical in plots.py:

for i in range(space.n_dims):  # rows
    for j in range(space.n_dims):  # columns
        # ...
        if i != j:
            # ...
            if j > 0:
                # ...

Firstly, you should have a local variable called n_dims instead of always referring to space.n_dims. Secondly, your nesting depth sometimes reaches 4 or 5 inside these for-loops, because you haven't been very precise in your loop-conditions. You can rewrite the for-loops to something like the following. It drastically reduces the nesting depth and only executes the loop iterations that are necessary:

for i in range(n_dims):
    # Do something for the diagonal case.

    for j in range(i):
        # Do something for the case where j<i.

    for j in range(i+1, n_dims):
        # Do something for the case where j>i.
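To make the comparison concrete, here is a runnable sketch (the `visit_cells` helpers are hypothetical names, not part of skopt) showing that the restructured loops cover exactly the same cells as the nested-if version, with the three cases made explicit:

```python
def visit_cells(n_dims):
    """Yield (i, j, case) for every cell of an n_dims x n_dims plot matrix,
    with the loop structure itself separating the three cases."""
    for i in range(n_dims):
        # The diagonal case: the 1D plot for dimension i.
        yield i, i, 'diagonal'

        # Cells left of the diagonal (j < i).
        for j in range(i):
            yield i, j, 'off-diagonal'

        # Cells right of the diagonal (j > i).
        for j in range(i + 1, n_dims):
            yield i, j, 'off-diagonal'


def visit_cells_nested(n_dims):
    """The nested-if style: same cells, but each iteration re-tests the case."""
    for i in range(n_dims):
        for j in range(n_dims):
            if i != j:
                yield i, j, 'off-diagonal'
            else:
                yield i, j, 'diagonal'
```

Both generators produce the same set of cells; only the control flow differs.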

You also have this code in plots.py which is incredibly difficult to understand because of the multiple nestings and list-comprehension:

diagonal_ylim = (np.min([ax[i, i].get_ylim()[0]
                         for i in range(space.n_dims)]),
                 np.max([ax[i, i].get_ylim()[1]
                         for i in range(space.n_dims)]))

It is better to split code like this into several lines to avoid the complicated nesting and also make it easier to step-debug. I added a revised version to the code, although I've commented it out for now so you can test it properly.
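For illustration, here is a sketch of what the split-up version could look like. The function name is made up for this example; `ax` is assumed to be a 2-D grid of Matplotlib-style axes (anything with a `get_ylim()` method works):

```python
import numpy as np


def diagonal_ylim(ax, n_dims):
    """Common y-limits for the diagonal plots, computed one step at a time."""
    # Collect the lower and upper y-limits of each diagonal plot separately.
    ylim_lo = [ax[i][i].get_ylim()[0] for i in range(n_dims)]
    ylim_hi = [ax[i][i].get_ylim()[1] for i in range(n_dims)]

    # The common range spans the lowest low and the highest high.
    return np.min(ylim_lo), np.max(ylim_hi)
```

Each intermediate list can now be inspected in a step-debugger, which the original nested expression did not allow.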

Another thing is that it is quite confusing how the Space and Dimension objects interact and wrap each other, e.g. regarding the transformation. It appears that the transformation and its inverse may be carried out repeatedly to nullify each other, although I'm not sure.

@pep8speaks
Copy link

pep8speaks commented Dec 17, 2017

Hello @Hvass-Labs! Thanks for updating the PR.

Line 140:1: W293 blank line contains whitespace
Line 142:1: W293 blank line contains whitespace
Line 274:80: E501 line too long (80 > 79 characters)
Line 304:1: W293 blank line contains whitespace
Line 334:80: E501 line too long (80 > 79 characters)
Line 498:78: W291 trailing whitespace
Line 523:80: E501 line too long (80 > 79 characters)
Line 609:74: W291 trailing whitespace
Line 669:1: W293 blank line contains whitespace
Line 689:80: E501 line too long (80 > 79 characters)
Line 734:80: E501 line too long (81 > 79 characters)
Line 796:1: W293 blank line contains whitespace
Line 829:80: W291 trailing whitespace
Line 833:80: W291 trailing whitespace
Line 844:80: E501 line too long (80 > 79 characters)
Line 933:80: W291 trailing whitespace

Line 112:1: W293 blank line contains whitespace
Line 289:80: E501 line too long (85 > 79 characters)
Line 387:80: E501 line too long (91 > 79 characters)
Line 494:80: E501 line too long (95 > 79 characters)
Line 773:1: W293 blank line contains whitespace
Line 777:1: W293 blank line contains whitespace
Line 781:1: W293 blank line contains whitespace
Line 816:80: E501 line too long (90 > 79 characters)
Line 857:5: E303 too many blank lines (2)
Line 863:1: W293 blank line contains whitespace

Line 173:80: E501 line too long (81 > 79 characters)
Line 177:80: E501 line too long (89 > 79 characters)

Line 518:1: W293 blank line contains whitespace

Comment last updated on January 26, 2018 at 12:06 Hours UTC

@Hvass-Labs
Copy link
Contributor Author

I have tested the code in this PR on the actual problem I am trying to solve, which is hyper-parameter optimization of a neural network. It is for one of my tutorials on TensorFlow but it is not yet released as I want to include plots from skopt.

I have a question for you regarding the interpretation of the plots. The plots I have here are similar to the ones in your original version, except that I have normalized the color-gradients.

The parameters for my search-space are:

  • learning_rate (1e-6 to 1e-2, log-uniform)
  • num_dense_layers (1 to 5)
  • num_dense_nodes (5 to 512)
  • activation (relu or sigmoid)

These are the fitness values and neural network parameters found by skopt. The fitness is the negative classification accuracy on a validation-set.

[(-0.99019999999999997, [0.0010475183322128938, 1, 149, 'relu']),
(-0.98860000000000003, [0.0018261589189723629, 2, 470, 'relu']),
(-0.98839999999999995, [0.00037986474511170302, 2, 512, 'relu']),
(-0.98839999999999995, [0.0010415672190332141, 1, 512, 'relu']),
(-0.98799999999999999, [0.0024978263671842247, 3, 504, 'relu']),
(-0.98760000000000003, [0.0017094485608837695, 1, 512, 'relu']),
(-0.98680000000000001, [0.0017476496180086432, 3, 512, 'relu']),
(-0.98499999999999999, [0.00077047010944170677, 3, 43, 'relu']),
(-0.98480000000000001, [0.0014475695468873279, 3, 220, 'relu']),
(-0.98260000000000003, [0.01, 4, 198, 'relu']),
(-0.97919999999999996, [0.0014732190199105925, 3, 286, 'sigmoid']),
(-0.97499999999999998, [0.0022674676715302932, 3, 481, 'sigmoid']),
(-0.97419999999999995, [4.4040787332101281e-05, 4, 512, 'relu']),
(-0.97419999999999995, [7.1181229112317682e-05, 4, 498, 'relu']),
(-0.9698, [0.01, 2, 5, 'relu']),
(-0.96440000000000003, [0.01, 4, 461, 'relu']),
(-0.96179999999999999, [0.0001, 1, 64, 'relu']),
(-0.96099999999999997, [3.4301200074707198e-05, 3, 512, 'relu']),
(-0.95979999999999999, [0.01, 3, 5, 'relu']),
(-0.91579999999999995, [0.01, 5, 5, 'relu']),
(-0.90759999999999996, [0.00076921095317043851, 3, 5, 'relu']),
(-0.88200000000000001, [1.4842453224480901e-05, 1, 88, 'relu']),
(-0.82620000000000005, [5.7751197554451486e-06, 2, 238, 'relu']),
(-0.64780000000000004, [1.8824150512622488e-06, 5, 259, 'relu']),
(-0.53779999999999994, [0.00049781616995792967, 4, 5, 'relu']),
(-0.47120000000000001, [9.9999999999999995e-07, 1, 512, 'relu']),
(-0.11260000000000001, [1.3010975577257652e-06, 3, 456, 'sigmoid']),
(-0.11260000000000001, [7.553247055497371e-06, 4, 209, 'sigmoid']),
(-0.11260000000000001, [0.0049652350761811535, 1, 5, 'sigmoid']),
(-0.107, [0.0017182189692616892, 5, 49, 'sigmoid'])]

The min and max of the fitness values in result.func_vals are -0.9902 and -0.107.

In the code for plot_contour() that I have submitted in this PR, I have normalized the color-gradients of all these plots to use these min and max values. This is an attempt at giving a consistent interpretation of the color-scales between the different dimensions, because their ranges sometimes vary significantly.

(1) This picture shows the learning_rate vs. num_dense_layers. You will note the red star which shows the best parameters found, and the yellow area which is presumably the minimum area of the Gaussian Process? The zi returned by partial_dependence() ranges between -0.99 and -0.34 but the plot-colors are normalized to range between -0.9902 and -0.107 which are the extreme values from the optimization results.

learning_rate_vs_num_dense_layers

(2) This next picture shows the learning_rate vs. num_dense_nodes where the zi from partial_dependence() now only ranges between -0.73 and -0.44. Because of the normalization I do using vmin and vmax in my new plot_contour() function, the color-gradients are now almost just completely green.

learning_rate_vs_num_dense_nodes

(3) This next picture shows the num_dense_layers vs. num_dense_nodes, whose zi ranges between -0.78 and -0.47. Again because of the color-normalization in plot_contour() this is almost completely green, with a slight yellow in the center.

num_dense_layers_vs_num_dense_nodes

I'm a bit confused about how to interpret these plots. Can you perhaps explain that to me? Is it because e.g. plot (3) for num_dense_layers vs. num_dense_nodes "averages" the fitness over the other remaining dimensions for learning_rate and activation? How should I interpret this? Does it mean that the performance is unpredictable from these two variables alone?

But if we look at plot (1), which shows a yellow area in the middle, does this mean that it is "generally" better to have a learning_rate around 0.002 and num_dense_layers of 3? But the best solution had num_dense_layers=1. Is it because the Gaussian Process is incapable of modelling this search-space accurately? Or is it because we are averaging the performance over the remaining variables, so that for particular choices of the other variables, the best point can fall outside of the yellow area?
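To make my question concrete, here is a toy example (entirely made up, not using skopt) of what I understand "averaging over the remaining dimensions" to mean:

```python
import numpy as np

rng = np.random.RandomState(0)


def f(x1, x2, x3):
    # Toy objective where the best (x1, x2) depends on x3:
    # for a fixed x3, the optimum is at x1 = x3.
    return (x1 - x3) ** 2 + (x2 - 0.5) ** 2


# Partial dependence on (x1, x2): average f over samples of x3.
x3_samples = rng.uniform(0.0, 1.0, size=1000)


def partial_dependence(x1, x2):
    return np.mean(f(x1, x2, x3_samples))


# The averaged surface is minimized near x1 = 0.5 (the mean of x3),
# even though for any particular x3 the best x1 equals that x3.
# So the best observed point can indeed fall outside the "good"
# region of a partial-dependence plot.
```

If this is the right mental model, it would explain why the red star does not have to sit in the yellow area.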

Another thing is that these plots change each time I run gp_minimize(). Does that indicate I should run a lot more iterations than the 30 I have done here? The problem, of course, is that each iteration is very costly, as it is a full training of a neural network.

I hope you can enlighten me as to what is going on with these plots :-)

@codecov-io

codecov-io commented Dec 17, 2017

Codecov Report

Merging #579 into master will decrease coverage by 4.36%.
The diff coverage is 15.12%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #579      +/-   ##
==========================================
- Coverage    86.6%   82.23%   -4.37%     
==========================================
  Files          23       23              
  Lines        1777     1903     +126     
==========================================
+ Hits         1539     1565      +26     
- Misses        238      338     +100
Impacted Files Coverage Δ
skopt/utils.py 90.11% <33.33%> (-1.01%) ⬇️
skopt/plots.py 7.98% <6.09%> (+0.44%) ⬆️
skopt/space/space.py 91.23% <60.52%> (-4.08%) ⬇️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 20e46a7...dd7433d. Read the comment docs.

@betatim
Member

betatim commented Dec 21, 2017

Thanks a lot for your hard work. I think currently everyone is travelling, so it will take some time before anyone can look at this seriously.

The doc-string format we try to follow is the NumPy one: https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt. This is not yet entirely successful/uniform across the code base.

The code could (always) be commented better or restructured. It probably also depends on each person's history how confusing they find different idioms. The way forward is to add comments or propose changes like you did. Leaving comments about work to be done/homework for others (empirically) does not lead to things getting done. So I vote to remove them or turn them into comments about the code.

self.dimensions = [check_dimension(dim) for dim in dimensions]

# Names of all the dimensions in the search-space.
self.dimension_names = [dim.name for dim in self.dimensions]
Member

This should be a property like is_real below so the doc string is accessible in the REPL

Contributor Author
Good point!
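A minimal sketch of what that could look like (a simplified stand-in, not the actual skopt Space class):

```python
class Space:
    def __init__(self, dimensions):
        self.dimensions = dimensions

    @property
    def dimension_names(self):
        """Names of all the dimensions in the search-space.

        Because this is a property, the doc-string is accessible in
        the REPL, e.g. via help(Space.dimension_names).
        """
        return [dim.name for dim in self.dimensions]
```

The list is then computed on access instead of being stored in `__init__`, so it also stays correct if a dimension's name changes later.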


return distance

def get_dimension(self, id):
Member

What about using a dict-like interface, so that some_space['learning_rate'] returns the dimension called learning_rate? We might not need to support indexing-based access, as people can do some_space.dimensions[idx]. This way we avoid having to support both some_space[idx] and some_space['learning_rate'], which might require us to make guesses about what the user meant.

Contributor Author

Nice idea I'll look into it!

Contributor Author

I read this again and you make a very good point.

Contributor Author

This works now using Space.__getitem__().

Returns
-------
* (`dimension`, `index`, `name`) [tuple(Dimension, int, str)]:
Member

we should return just the dimension

Contributor Author

Well, the reason I do it like this is to try and provide efficient support for the existing plotting functions. I agree that it is an ugly function. I'll see if I can fix the plotting functions to make this better.

Contributor Author

The index is now set inside each Dimension-instance by Space.__init__ as we need it for e.g. plotting.

Member

I don't think a Dimension inside a space should know what its index is. If you need the index when iterating over the dimensions in a space you should use:

for i, dim in enumerate(space):
  print(i, dim)

Contributor Author

The reason I originally made the above function return both the index and the Dimension-objects was precisely because I did not want to modify your code. But you need the index for various things when only using a subset of the dimensions and you cannot loop in the way you propose. So an efficient way of doing it was the above, but the API was ugly as I agreed.

What I have done now is to modify both Dimension and Space to give each Dimension-instance an index once they are added to a Space. This way we have easy access to it when we need it e.g. for plotting a sub-set of the dimensions, and for having default names such as X_i for unnamed dimensions. I think it is a reasonable solution.

The only problem with this would be if you later want to change the composition of dimensions in a search-space. But I don't see any need for doing that, or why that should even be supported by the API.

Are there any other problems with having an index inside the Dimension-object?


# Use a default name for unnamed dimensions.
# TODO: Maybe it would be better to set this in Space.__init__?
if name is None:
Member

Dimensions by default do not have a name, we should only give them "fake" names in the place where we need them (like in plotting) instead of here.

Contributor Author

I wonder if it even makes sense to have some dimensions named and others not?

I think it is quite inelegant that some dimensions don't have names because we then have special cases elsewhere e.g. in plotting. So if we want to give them default names it might as well be done in the Space-class.

Let us think about what dimension-names are used for:

  • With PR Added support for named arguments to objective function. #590 dim-names are now used for calling the objective function.
  • Dim-names are used in plotting for the names on the axes.
  • Dim-names may be used to lookup a dim from the search-space object e.g. using the dict-like syntax you proposed above.
  • Anything else?

Can you name a case where we don't want dimensions to have names, except for benchmark functions?

Contributor Author

This is really a key question! It makes the code very messy that there is the possibility for having unnamed dimensions. It requires special cases in so many places in the code. If the only reason for having unnamed dimensions is to have support for simple benchmark functions, then it would be better to drop this "feature" and force all dimensions to have names. Please explain why you have unnamed dimensions.

Member

Unnamed dimensions are part historical (to support the shorthand way of defining them), for small search spaces you don't need to have names, and for some problems the dimensions do not have (sensible) names. I'd be -1 on requiring all dimensions to have names (or giving them dummy names automatically).

Contributor Author

I am currently in the process of rewriting plots.py. The functions now use the dimension-names for axis-labels. I think the most sensible way of supporting unnamed dimensions is to return X_i for the i'th dimension, or something similar. I have already added an index to Dimension, which is necessary when plotting only a subset of the dimensions (you'll see when I commit the code). I have just now modified the name property in the Dimension-class to return X_index if self._name is None.

This also allows you to look up unnamed dimensions using e.g. space['X_2'], as well as named dimensions using space['foo']. It works fine and I think it is a reasonable solution. The only problem is that you cannot use different names for the axes in the plots. Sure, it can be coded, but it would be a mess. If you want to name the axes in the plot, then the names should be set when you define the Dimension. I think this is reasonable.
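In simplified form, the idea looks like this (a sketch only; the real skopt Dimension class also carries bounds, priors and transforms):

```python
class Dimension:
    def __init__(self, name=None):
        self._name = name
        self.index = None  # Set by Space when the dimension is added to it.

    @property
    def name(self):
        # Unnamed dimensions fall back to a default name 'X_i',
        # where i is the dimension's index in the search-space.
        if self._name is None:
            return 'X_{}'.format(self.index)
        return self._name
```

With this fallback, code such as plotting never has to special-case unnamed dimensions.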


return dimension, index, name

def get_dimensions(self, ids=None):
Member

Could combine this with the above comment about using a dict-like interface: some_space[['lr', 'C']]. We would have to limit names to strings.

Contributor Author

What would be the right way of doing this in Python? Is there a dunder-method that can be overridden for this type of behaviour?

Contributor Author

This works now using Space.__getitem__() which can take either a single string or list of strings as argument.
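A simplified sketch of that behaviour (stand-in class; the real implementation also handles unnamed dimensions via their default X_i names):

```python
class Space:
    def __init__(self, dimensions):
        self.dimensions = list(dimensions)
        # Map from dimension-name to Dimension object for fast lookup.
        self._dim_by_name = {dim.name: dim for dim in self.dimensions}

    def __getitem__(self, names):
        """Look up dimensions by a single name or a list of names."""
        if isinstance(names, str):
            return self._dim_by_name[names]
        return [self._dim_by_name[name] for name in names]
```

So space['lr'] returns a single Dimension, while space[['lr', 'C']] returns a list of them in the requested order.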

skopt/plots.py Outdated
n_points=n_points)

# Start a new plot.
fig = plt.figure()
Member

we should create a figure and axis. Then later we can do ax.plot(...) and ax.scatter() instead of using plt.scatter.

Contributor Author

I never really understood the semantics of Matplotlib. What is the reason for this?

Member

The plt.plot and co. interface uses global state and is meant to make interactive plotting easy. The object-oriented interface has less potential for weird interactions between plots because it does not have global state. The best way to learn is probably following some matplotlib tutorials/videos and looking at how other libraries that do plotting handle things.
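For concreteness, a minimal example of the object-oriented style (using the non-interactive Agg backend so it runs without a display; the data is arbitrary):

```python
import io

import matplotlib
matplotlib.use('Agg')  # Non-interactive backend; no display required.
import matplotlib.pyplot as plt

# Create the figure and axis explicitly instead of drawing through
# pyplot's hidden "current figure" global state.
fig, ax = plt.subplots()
ax.scatter([1, 2, 3], [4, 5, 6])
ax.set_xlabel('learning_rate')
ax.set_ylabel('fitness')

# The caller holds explicit handles, so saving does not depend on
# which figure happens to be "current" at the time.
buf = io.BytesIO()
fig.savefig(buf, format='png')
```

A plotting function written this way can simply return `fig, ax`, giving the caller both handles.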

Contributor Author

OK, I see. I'll take a look if I can fix this. I've used Matplotlib many times for many projects, but never liked its API.

skopt/plots.py Outdated
plt.xlabel(dimension_name1)
plt.ylabel(dimension_name2)

return fig
Member

Most of the pandas plotting functions return an axis. What other examples are there of libraries that return plots? I think we should try to be consistent with what other libraries do. I'd at least return fig, axis.

Contributor Author

As I said above, I never really understood Matplotlib, but when I ran your plotting functions from a Python file I couldn't figure out how to e.g. save the plot without having the fig object. When using the functions in a Jupyter Notebook they work either way, returning fig or axis. I also thought about returning both fig and axis, but I didn't think it was necessary. I can give it a try and see if it works in Jupyter.

@Hvass-Labs
Contributor Author

I get the impression that the plotting functions will not get fixed by anyone else and since I really would like this to work, I'll try and take a look at it. I'll probably have to change the existing plotting functions and we'll have to discuss later if there are API changes etc. E.g. the naming scheme you have now is a bit strange when you already have names in the dimensions, so I'll definitely change that.

@betatim thanks for all the comments, they were quite helpful and I'll try and incorporate your suggestions.

Regarding comments in code, it is the responsibility of the person adding new code to properly document it, as they are the only one who really understands what the code is supposed to do. You are hoping for a miracle if you think others will come along and fix your work for you. I will help you out a bit, but to be perfectly honest, it is quite arrogant to add code that isn't properly polished and commented: if there are 1000 people who will be using the code and they each waste just a few hours, then thousands of man-hours are wasted because the original author didn't take an hour or two more to document the code. It is not just a problem with your project but a general problem. The solution is simply that you don't accept PRs until they are properly documented, so that you understand what is going on and feel the API is sufficiently elegant. Like you are doing with this PR now :-)

PS: I really could use an explanation for the plots that I showed above, so I can understand what is going on and explain it in my tutorial. Perhaps I should have opened a separate thread for that question.

@betatim
Member

betatim commented Dec 29, 2017

All PRs are reviewed by someone who is not the original author before they are merged. This means there are at least two people who are happy with the state of things. The code base is maintained by people who are experienced with Python, numpy, matplotlib and scikit-learn; as a result it relies on common idioms and conventions from the PyData ecosystem. This can result in some pretty dense or hard-to-read code. The way out is for an outsider (who doesn't view the world through the same biased lens) to come along and help improve things.

Some may even disagree on the appropriate level of comments/understandability. To avoid bikeshedding over what the appropriate level of comments, docs, style, etc. is, we require a concrete proposal via PR. This seems to work quite well.

The miracle is that random people volunteering their time manage to put together a quite well maintained and functioning software project implementing a complex bit of research. There will always be parts that aren't as good as they could be, new things to add, historical legacy that is annoying, etc. Assuming people are already donating as much time and effort as they can, the only way to move things along faster is by donating more effort yourself or donating money to pay for developer time.

@glouppe
Member

glouppe commented Dec 29, 2017

As for #590, please remove your inline comments about our coding conventions. This is distracting everyone from the main goal of this PR (i.e., adding two plotting functions). If you want to propose changes to improve how the code looks and reads, we welcome that, but please make self-contained, targeted PRs.

@Hvass-Labs
Contributor Author

I have now pushed a new version. Almost the entire plots.py has been rewritten.

This example tests the new functions using the built-in hart6 benchmark function.

from skopt import forest_minimize
from skopt.space import Real, Categorical, Integer
from skopt.benchmarks import hart6
from skopt.plots import plot_objective_matrix, plot_evaluations_matrix
from skopt.plots import plot_histogram, plot_objective_2D

dim1 = Real(name='foo', low=0.0, high=1.0)
dim2 = Real(name='bar', low=0.0, high=1.0)
dim3 = Real(name='baz', low=0.0, high=1.0)
dim4 = Real(name='rip', low=0.0, high=1.0)
dim5 = Real(name='rap', low=0.0, high=1.0)
dim6 = Real(name='rup', low=0.0, high=1.0)

dimensions = [dim1, dim2, dim3, dim4, dim5, dim6]

print("Optimizing ...")

result = forest_minimize(hart6, dimensions, n_calls=200,
                         base_estimator="ET", random_state=4)

print("Plotting ...")

fig, ax = plot_evaluations_matrix(result=result, bins=20, dimension_names=None)
fig.savefig('plot_evaluations_matrix_all.png')

fig, ax = plot_objective_matrix(result=result, dimension_names=None)
fig.savefig('plot_objective_matrix_all.png')

# Now use a subset of the dimensions identified by their names.
# Note the order is different from the search-space,
# which will also be reflected in the plots.
dim_names = ['bar', 'rip', 'foo']

fig, ax = plot_evaluations_matrix(result=result, bins=20, dimension_names=dim_names)
fig.savefig('plot_evaluations_matrix_some.png')

fig, ax = plot_objective_matrix(result=result, dimension_names=dim_names)
fig.savefig('plot_objective_matrix_some.png')

fig, ax = plot_histogram(result=result, dimension_name='baz', bins=40)
fig.savefig('plot_histogram.png')

fig, ax = plot_objective_2D(result=result,
                            dimension_name1='baz',
                            dimension_name2='rap')
fig.savefig('plot_objective_2D.png')

print("Done plotting!")

These are the resulting plots:

plot_evaluations_matrix_all
plot_evaluations_matrix_some
plot_objective_matrix_all
plot_objective_matrix_some

plot_histogram
plot_objective_2d

@Hvass-Labs
Copy link
Contributor Author

Here is another example using a "simulated" Neural Network for the objective function. This has a Categorical dimension, which cannot be plotted by the matrix plotting functions.

import numpy as np
from math import exp
from skopt import forest_minimize
from skopt.space import Real, Categorical, Integer
from skopt.plots import plot_objective_matrix, plot_evaluations_matrix
from skopt.plots import plot_histogram, plot_objective_2D

dim_learning_rate = Real(name='learning_rate', low=1e-6, high=1e-2, prior='log-uniform')
dim_num_dense_layers = Integer(name='num_dense_layers', low=1, high=5)
dim_num_dense_nodes = Integer(name='num_dense_nodes', low=5, high=512)
dim_activation = Categorical(name='activation', categories=['relu', 'sigmoid', 'linear'])

dimensions = [dim_learning_rate,
              dim_num_dense_layers,
              dim_num_dense_nodes,
              dim_activation]

default_parameters = [1e-4, 1, 64, 'relu']


def model_fitness(x):
    learning_rate, num_dense_layers, num_dense_nodes, activation = x

    fitness = ((exp(learning_rate - 1e-3) - 1.0) * 5000) ** 2 + \
              (num_dense_layers - 3) ** 2 + \
              ((num_dense_nodes - 200) / 100) ** 2

    fitness *= 1.0 + 0.05 * np.random.rand()

    if activation == 'sigmoid':
        fitness += 10
    elif activation == 'linear':
        fitness += 20

    return fitness


print("Optimizing ...")

result = forest_minimize(model_fitness, dimensions, n_calls=200,
                         base_estimator="ET", random_state=4)

print(result.x)
print(result.fun)

for fitness, x in sorted(zip(result.func_vals, result.x_iters)):
    print(fitness, x)

print("Plotting ...")

try:
    fig, ax = plot_evaluations_matrix(result=result, bins=20, dimension_names=None)
    fig.savefig('plot_evaluations_matrix_all.png')

    fig, ax = plot_objective_matrix(result=result, dimension_names=None)
    fig.savefig('plot_objective_matrix_all.png')
except Exception:
    # A bare 'except:' would also swallow KeyboardInterrupt etc.
    print("Cannot plot Categorical dimensions! Skipping ...")

# Now use a subset of the dimensions identified by their names.
# Note the order is different from the search-space,
# which will also be reflected in the plots.
dim_names = ['num_dense_nodes', 'learning_rate', 'num_dense_layers']

fig, ax = plot_evaluations_matrix(result=result, bins=20, dimension_names=dim_names)
fig.savefig('plot_evaluations_matrix_some.png')

fig, ax = plot_objective_matrix(result=result, dimension_names=dim_names)
fig.savefig('plot_objective_matrix_some.png')

fig, ax = plot_histogram(result=result, dimension_name='learning_rate', bins=40)
fig.savefig('plot_histogram1.png')

fig, ax = plot_histogram(result=result, dimension_name='activation', bins=40)
fig.savefig('plot_histogram2.png')

fig, ax = plot_objective_2D(result=result,
                            dimension_name1='learning_rate',
                            dimension_name2='num_dense_nodes')
fig.savefig('plot_objective_2D.png')

print("Done plotting!")

These are the resulting plots:

plot_evaluations_matrix_some
plot_histogram1
plot_histogram2
plot_objective_2d
plot_objective_matrix_some

@Hvass-Labs
Copy link
Contributor Author

Let me once again start out by saying that I really like the idea of your project! My issue is with the implementation and the code "style" of certain parts of the project.

I rewrote almost the entire plots.py except for plot_convergence() which I don't want to spend hours understanding and rewriting. I'm not happy about leaving a file that I've almost completely rewritten, but frankly I've had enough of reading your code and feel my time is better spent elsewhere.

I would also like to add the plotting functions from: https://scikit-optimize.github.io/notebooks/bayesian-optimization.html to plots.py but I'll await your comments on this PR before I spend probably a whole day understanding and rewriting that.

From our previous discussions I get the impression that you don't like comments inside code. You find comments to be redundant or whatever. So I expect that you will either demand that I remove all those comments I added to plots.py, or you will bull-doze them the next time you work on the code yourself. At any rate, I get the impression that you look down on comments in code.

Here is the truth: Code with few comments might be OK if your code is really fantastic and beautiful, and your audience is very familiar with all the APIs you are using, the math and methods, etc.

But your code is far from fantastic and I repeatedly got the impression that plots.py was written by someone without formal training in computer science and software engineering. It would certainly not get a passing grade where I come from. And when the code is poorly written AND has almost no comments, then it is really problematic for others to work on. This is what I meant by an arrogant coding style, because it shows complete disregard for whoever is going to work on the code next. Fair enough if the person is not a masterful coder - but everyone can write proper comments in English!

Fact is that the code was broken, and since you didn't want to fix it yourselves, someone from the outside would have to do it for you. If the code had been well-written or at least well-documented and well-commented, then it could have been fixed in a fraction of the time I have now spent. Instead I feel I've wasted almost a week understanding and cleaning up your code, because you didn't even want to add comments that showed what the intention of the code was, so I had to spend hours upon hours trying to deduce it.

And when you talk about "programming idioms" and "complex research" and that is why I might find your code difficult to understand, it is also rather arrogant, because what you're really saying is that you think people who don't understand your code must be ignorant or stupid. But in reality you are just making poor excuses for writing bad code!

So it has been very frustrating working on your code!

Let me ask you two questions:

  1. Do you think someone in the future will find it easier to work on my version of plots.py or your old version? If you ask 100 programmers to fix a bug or extend plots.py with new features, I bet you that most of them would give up trying to understand your old code and they would just hack it until it maybe worked. This would bloat the code and quite possibly introduce new bugs. But even first-year students can now understand my version of plots.py.

  2. Would you rather have someone waste a week trying to understand code you neglected to spend a few hours commenting when you wrote it, or would you rather have that person spend a week contributing something more meaningful to your project?

It is obvious that you are getting quite annoyed with me, but I really hope you take this criticism to heart. You don't have to like me, but at least consider that what I am saying might be reasonable and would significantly improve your project. Scikit-optimize could be a fantastic library! - but you need to improve the code-base so others can understand it and help you extend it, especially when you no longer work on it yourself!

Best wishes and Happy New Year! :-)

@Hvass-Labs
Copy link
Contributor Author

I've looked a bit into plotting the Gaussian Processes from your tutorial here:

https://scikit-optimize.github.io/notebooks/bayesian-optimization.html

For a real problem we do not have the true function so that can be omitted.

Then we have the mean and variance of the Gaussian Process which is green in those plots and the acquisition function which is blue.

I don't know if I understand this correctly, but I think that in order to plot these three aspects of the Gaussian Processes for multi-dimensional search-spaces, we would have to make Partial Dependence plots similarly to plot_objective() (which I call plot_objective_matrix() in my version of plots.py). That is, we would make a plot-matrix for the GP mean, another plot-matrix for the GP variance, and finally a plot-matrix for the acquisition function.

A significant part of the existing code could be reused - either the lazy way by copy and paste the common code, or the proper way of having the common code in a separate function which is used by both plot_objective() and a new plot_gaussian_process().

But I'm not sure how useful these plots of the Gaussian Process would really be for multi-dimensional problems and if it would be worth the effort. Any thoughts?

@Hvass-Labs
Copy link
Contributor Author

Last evening I thought about this and I really have to move on to other things, so I won't be implementing the plots of the Gaussian Processes, even though plots of especially the acquisition function might be quite useful.

So let's focus on what needs to be fixed with the PR as it stands now before it can be merged. Perhaps you don't like that I changed the names to plot_..._matrix() and I can easily change them back, that's perfectly fine.

Note from the discussion in PR #592 that I will not remove comments from the code. They serve a purpose of helping newcomers 5 years from now to quickly and clearly understand the code. This code is not written for you - it is written for them!

Hvass-Labs added a commit to Hvass-Labs/scikit-optimize that referenced this pull request Jan 3, 2018
@iaroslav-ai
Copy link
Member

I assume that this PR is ready to be merged (@Hvass-Labs if otherwise let me know) so I add [MRG] to the title (scikit-learn style titles).

@iaroslav-ai iaroslav-ai changed the title Added two plotting functions and more support for named dimensions. [MRG] Added two plotting functions and more support for named dimensions. Jan 6, 2018
@Hvass-Labs Hvass-Labs changed the title [MRG] Added two plotting functions and more support for named dimensions. [MRG] Rewrote most of plots.py. Added 2 more plotting functions. Better support for named dimensions. Jan 8, 2018
@Hvass-Labs
Copy link
Contributor Author

@iaroslav-ai and @betatim I'm sorry that this PR has become so unwieldy. As you can see the checks are failing because I rewrote the API a bit e.g. renamed plot_objective() to plot_objective_matrix(). I'll tell you what. I'll revert these things to try and make it compatible with your old API, so all the checks pass and you also don't have to rewrite all your tutorials. Then I'll let you know when I believe this is ready to be merged 'as is'. That should make it easier for you to review.

@Hvass-Labs
Copy link
Contributor Author

I have now pushed a version that I believe can be merged 'as is', provided all tests pass. The tests all pass on my system and I have also run bash ./build_tools/circle/execute.sh locally and it seems to create all the plots correctly. I have also run the test-code that I have shown above and it works fine.

There is no need to read all of the above discussions when reviewing this. Here is essentially what this PR does:

  1. Improves support for named dimensions in several ways:

    • A dimension is given a default name 'X_index' if the user has not supplied a name, where index is a count for this dimension in the search-space. This is the original naming scheme you have used in plots.py and I have just moved it into Dimension and Space. I'm not so fond of the name X_index but I don't really have a better suggestion. Maybe dim0, dim1, dim2 which would also work better with @use_named_args? Maybe another PR for this in the future.

    • Plots now use these dimension-names on the x- and y-axis. This means there is a slight API change in e.g. plot_objective() and plot_evaluations() but I don't think this even affects v. 0.4 users (not sure).

    • Names should be valid Python variable names because they may be used as args with @use_named_args. For example, you should use n_trees instead of number of trees. If you want another name to be displayed e.g. in plots, then an idea would be to either add display_name to Dimension or to let the plotting functions take a separate list of names to be displayed. That would be a future PR.

    • Dimensions can be looked up from the search-space using dict-like lookup e.g. space['foo'] and several dimensions can be looked up using e.g. space[['foo', 'bar']]. (Thanks to @betatim for suggesting this feature instead of e.g. get_dimensions()).

    • I also had to add an __init__ to Dimension and remove test_dimension_name_none because a dimension's name can now only be None during initialization. I also fixed various bugs related to names.

  2. plots.py has been almost completely rewritten because I needed to add support for search-spaces with Categorical dimensions. The easiest way of doing this, was to let the user select a subset of the dimensions to be plotted, hence the need for e.g. space[['foo', 'bar']]. But this required so many changes to plots.py and because I found it very difficult to understand your code, I decided to do a complete rewrite.

  3. I have added two new plotting functions plot_histogram() and plot_objective_2D(). These also serve as a simple example of plotting data from the optimization log. plot_histogram() also allows plotting of Categorical dimensions, which is not allowed by plot_objective() and plot_evaluations().
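The default naming and dict-like lookup described in point 1 can be illustrated with a minimal, self-contained sketch in plain Python (toy classes for illustration only, not the actual skopt implementation):

```python
class Dimension:
    """Toy stand-in for a skopt search-space dimension."""
    def __init__(self, name=None):
        self.name = name


class Space:
    """Toy search-space supporting lookup by name, index, or list of names."""
    def __init__(self, dimensions):
        self.dimensions = list(dimensions)
        # Assign a default name 'X_<index>' to unnamed dimensions.
        for i, dim in enumerate(self.dimensions):
            if dim.name is None:
                dim.name = 'X_%d' % i

    def __getitem__(self, key):
        # space[['foo', 'bar']] -> list of dimensions in the given order.
        if isinstance(key, (list, tuple)):
            return [self[k] for k in key]
        # space['foo'] -> single dimension looked up by name.
        if isinstance(key, str):
            for dim in self.dimensions:
                if dim.name == key:
                    return dim
            raise KeyError(key)
        # space[0] -> single dimension looked up by index.
        return self.dimensions[key]


space = Space([Dimension('foo'), Dimension(), Dimension('bar')])
print(space['foo'].name)                         # foo
print(space[1].name)                             # X_1
print([d.name for d in space[['bar', 'foo']]])   # ['bar', 'foo']
```

Note how a subset lookup like `space[['bar', 'foo']]` preserves the requested order, which is what lets the plotting functions show dimensions in a different order than the search-space.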

In my TensorFlow video tutorials I sometimes spend a few minutes commenting on the coding style of other libraries. But there were so many issues with plots.py that I decided to do a whole separate video, because it might be useful to a lot more people than just TensorFlow users. So I have actually already recorded a 40-minute video where I discuss why I rewrote plots.py like this. I have tried to make the video funny and memorable. It is sort of a stand-up comedy code review and I hope you don't get too upset. I don't think it is a personal attack, so just take it on the chin :-) I'll put it on youtube shortly after the actual tutorial which I haven't recorded yet.

So what I suggest, is that you merge this code now if it works correctly, so my audience can run my new tutorial which relies on this code addition. If you greatly dislike my coding style and want to bulldoze the comments, then please wait until you've watched the video where I explain my coding philosophy. Perhaps you'll realize that what I've done actually makes sense.

@Hvass-Labs
Copy link
Contributor Author

I have fixed the merge conflicts with the master-branch. I did a git rebase to squash all my commits into a single commit, then rebased to master and fixed merge conflicts, and then push --force back to github. I've never done this before and it took me some hours to figure out how to do it. If I've messed up anything please let me know.

If all other checks pass, then I believe this is ready to be merged.

@Hvass-Labs
Copy link
Contributor Author

Hvass-Labs commented Jan 17, 2018

@betatim Thanks!
@iaroslav-ai If you have some time, would you mind helping with this review? I found your review of my @use_named_args contribution to be constructive.

EDIT: I don't think it's a good idea to have another outsider / newcomer help with this review. I really wish @iaroslav-ai would give a hand.

@iaroslav-ai
Copy link
Member

iaroslav-ai commented Jan 17, 2018

I am currently experimenting with the plotting functionality. Will update here soon with results.

@iaroslav-ai
Copy link
Member

iaroslav-ai commented Jan 17, 2018

P.S. most likely tomorrow after ~5 pm.

@Hvass-Labs
Copy link
Contributor Author

Hvass-Labs commented Jan 18, 2018

@iaroslav-ai Thanks! I don't mean to stress you, but it would be nice if we could close this soon. I think it has been almost 3 weeks since I finished this (apart from minor fixes and rebasing). The installation instructions for my TensorFlow / skopt tutorial are awkward because it uses a commit from my skopt-fork and it apparently doesn't always work. So it would be nice if this could get merged into master soon, so we can all move on.

To avoid having to rebase and fix merge conflicts multiple times, I'll await both of your full reviews before fixing the code.

@Hvass-Labs
Copy link
Contributor Author

You can test the new plotting functions using the code-snippets in the above comments. I think you just need to change plot_*_matrix() to plot_*().

#579 (comment)
#579 (comment)

@iaroslav-ai
Copy link
Member

Hi everyone. Here are some of my thoughts on this.
I did not previously use the plotting functionality and got to know it through this PR.
I would like to understand better where the idea for the following plotting functions comes from, in particular for:

plot_objective
plot_evaluations

I have never seen the plots that these functions produce used in the black-box optimization literature. Does anyone know of a paper where these are used? Was it perhaps originally intended for debugging optimization algorithms?

@iaroslav-ai
Copy link
Member

iaroslav-ai commented Jan 18, 2018

The reason why I ask these questions is that I am not sure how these plotting functions are useful for skopt users.
Consider for instance the example plot below, obtained by optimizing a deterministic 6D function, code here:

figure_1

Remark 1. To obtain the plots, the average of the objective values is used (over randomly sampled, invisible parameters), whereas a user / algorithm is interested in the minimum value of the objective. Because of this, a user might draw incorrect conclusions about what parameter values are "good".
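A tiny numeric example (hypothetical values, purely illustrative) of this remark: the parameter value that minimizes the average over a hidden dimension need not be the one that achieves the true minimum.

```python
import numpy as np

# Objective values on a 2x2 grid: rows are the plotted parameter x,
# columns are the hidden (averaged-out) parameter y.
f = np.array([[0.0, 10.0],   # x=0: best case 0, worst case 10 -> mean 5
              [4.0,  4.0]])  # x=1: always 4                   -> mean 4

print(np.argmin(f.mean(axis=1)))  # 1 -> averaging prefers x=1
print(np.argmin(f.min(axis=1)))   # 0 -> but the true minimum is at x=0
```

So a contour built from averages would colour x=1 as the "better" region even though the optimum lies at x=0.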

Remark 2. The best-found minimum points do not coincide with the minima of the plots. This is due to (a) Remark 1 and (b) the fact that the objective is generally assumed to be sampled from a Gaussian process prior, where you can have complex non-linear correlations between n > 2 dimensions, which hence cannot be represented in 2D plots. Because of this, incorrect inferences can be drawn about the correctness of optimizers, correlations between two variables, and the absence of "higher-order" correlations.

Remark 3. The primary goal of skopt is to optimize the user's objective efficiently. How do these plots help with that? For instance, the idea of using such plots to aid the optimization contradicts the idea of taking the user out of the loop.

Remark 4. A minimum could be computed instead of the average from Remark 1. However, this would still fall under Remark 3. Also, technical difficulties can occur for a large number of dimensions (say 20+).

Somewhat similar issues can be raised for plot_evaluations.

A possible benefit of such plots is to catch failures of the optimization algorithm. However, ideally such failures should not occur / should be detected automatically without user intervention.

I am happy to elaborate further on this if necessary or if something is unclear :)

@iaroslav-ai
Copy link
Member

@Hvass-Labs sorry that I start these discussions here! I know you want to be done with this soon. But it would be good to clarify the why, as it then dictates the how. The results of the discussion here could e.g. go into docstrings.

@Hvass-Labs
Copy link
Contributor Author

@iaroslav-ai Your review is very good and perfectly fine to have the discussion here! You are neutral and polite, and your goal is to make it a good user-experience. That's why I asked you to help with this review :-)

Quick comment on your example: In my experiments I found that 40 iterations was very low for getting semi-accurate contour plots. You could try 100 or 200 iterations instead. I use 40 iterations in my own tutorial on skopt and TensorFlow because those experiments are very expensive to run:

https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/19_Hyper-Parameters.ipynb

Note: I wrote the tutorial in just a matter of days. The reason it took me 3-4 weeks to finish it, was because I wanted these plots to be in the tutorial as I found them quite useful - in spite of their short-comings - and I needed to fix plots.py to make it work with my Categorical dimension. It turned out to be a bit of a hornet's nest to fix it, though :-)

I will try and answer your questions as best as I can, considering that I'm not the original author of these, and I mainly just cleaned up the code and added a couple of features. I have combined the answers for several of your questions.

About plot_objective()

These contour-plots show a kind-of "approximation of an approximation" and are hence not completely reliable, as you point out. Perhaps this should be more strongly noted in the doc-strings? The first approximation is from the model used for the objective function (e.g. Gaussian Process) which can be unreliable in regions that are thinly sampled. The second approximation is from the so-called Partial Dependence plot, which is a "trick" used to show 3D and higher dimensions in 2D-plots, by averaging over the dimensions that aren't shown.

There are a few problems with the best-found "red star" that is shown in the plots. Due to noise in the objective function, and the inaccurate surrogate model, and the averaging of the Partial Dependence plot, the best-found point may fall outside of the best region shown in the plots (yellow colour). I agree that this is somewhat confusing but I'm not a statistician so I don't know any clever ways to make a better plot of these things. Perhaps the doc-string should explain this better?

Also note that the function is very expensive to compute for higher dimensions, because it evaluates the surrogate model n_dims^2 * n_samples times as I recall, where n_samples might be fairly high (default 250) to get a fine-grained contour-plot. That is another strong argument for the new feature I added, where you can select a subset of the dimensions to plot.
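The Partial Dependence averaging described above can be sketched roughly as follows (a simplified illustration with a hypothetical cheap surrogate function and a unit-cube search-space, not the actual plots.py code): for each grid point of the two plotted dimensions, the remaining dimensions are filled in from random samples and the surrogate's predictions are averaged.

```python
import numpy as np

rng = np.random.RandomState(0)


def surrogate(X):
    """Hypothetical cheap surrogate model: predicts the objective
    for each row of X. Stands in for e.g. a fitted GP's predict()."""
    return np.sum((X - 0.5) ** 2, axis=1)


n_dims, n_samples, n_points = 6, 250, 40
# Random samples used to average out the dimensions that are not plotted.
samples = rng.rand(n_samples, n_dims)
xi = np.linspace(0.0, 1.0, n_points)  # grid for plotted dimension 0
yi = np.linspace(0.0, 1.0, n_points)  # grid for plotted dimension 1

zi = np.empty((n_points, n_points))
for i, y in enumerate(yi):
    for j, x in enumerate(xi):
        X = samples.copy()
        X[:, 0] = x   # fix the two plotted dimensions at the grid point ...
        X[:, 1] = y
        # ... and average the surrogate over all the other dimensions.
        zi[i, j] = surrogate(X).mean()

# zi can now be passed to e.g. matplotlib's contourf(xi, yi, zi).
```

This also makes the cost visible: the surrogate is called once per grid cell, i.e. n_points^2 times with n_samples rows each, which is why restricting the plot to a subset of dimensions matters.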

In my own work on Meta-Optimization I also plotted the "meta-fitness landscapes" but I didn't have a surrogate model, so I needed to actually evaluate the real objective function in a grid, while keeping the other dimensions fixed at some "good" hand-tuned value. You can see a few examples of landscapes plotted this way here:

https://en.wikipedia.org/wiki/Meta-optimization

I think the plots in plot_objective() are more useful in spite of their short-comings.

About plot_evaluations()

This shows a histogram of samples on the diagonal instead of the Partial Dependence plots. I also found the histogram to be useful.

Below the diagonal this shows the points where the objective function was sampled. You can argue that this was already shown in plot_objective() and hence not very useful. That is sort-of right, but this plot doesn't show the contour of the objective function (which is very expensive to compute for higher dimensions!), and instead it shows a colour-grading of the time-step at which the points were sampled. Because you only have 40 iterations in your example, it may not be very useful, but what you would typically see for higher iterations, is that the samples start to converge in a certain region, as shown by the colour-grading.
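The colour-grading idea can be reproduced in a few lines of plain matplotlib (a standalone sketch with simulated sample points, not the plots.py code): colour each evaluated point by its iteration number, so convergence shows up as late (bright) points clustering together.

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, just saves to file
import matplotlib.pyplot as plt

rng = np.random.RandomState(0)
n_calls = 200

# Stand-in for two dimensions of result.x_iters: samples that slowly
# converge towards the point (0.3, 0.7) as the optimization progresses.
t = np.arange(n_calls)
shrink = 1.0 - t / n_calls
x = 0.3 + shrink * (rng.rand(n_calls) - 0.5)
y = 0.7 + shrink * (rng.rand(n_calls) - 0.5)

fig, ax = plt.subplots()
sc = ax.scatter(x, y, c=t, cmap='viridis')  # colour = time-step of sample
fig.colorbar(sc, ax=ax, label='iteration')
fig.savefig('colour_graded_samples.png')
```

With enough iterations, the bright (late) points visibly concentrate in the region the optimizer has converged to, which is the effect described above.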

About plot_histogram()

If you only want to show the histograms from plot_evaluations() then you can use plot_histogram() instead (which I added). It also supports Categorical dimensions.

About plot_objective_2D()

This shows the contour-plot from plot_objective() but only for 2 dimensions. This may be useful for some people, and it also is a simpler code-example in case people want to write their own plotting-functions, because plot_objective() has a lot of other things as well.

Conclusion

Altogether, I think the plots in skopt are useful in spite of their short-comings. And like I said, I only waited to finish my tutorial on TensorFlow and skopt to get these plots in. (Was it worth it in hindsight considering all the trouble? Well ... :-)

@Hvass-Labs
Copy link
Contributor Author

Should I conclude from your silence that if I fix the things you've mentioned then this can be merged? @betatim mentioned a couple of minor issues and @iaroslav-ai wanted better doc-strings for the plotting functions. I don't want to rebase and fix merge conflicts only to find out that you have more things you want changed.

I think I argued quite well that (1) these plotting functions are useful, (2) the new semantics for dimension-names are useful and sensible, (3) a dimension's search-space index is needed for efficiently working with sub-sets of dimensions and the natural place to store the index is inside the dimension-object. I believe this dealt with all your concerns.

Please note that the very small PR #597 is also needed for me to complete this PR.

My tutorial on TensorFlow and skopt has been viewed almost 800 times by now. My guess would be that it will get 10000 views within a year. Currently people have to install my dev-fork of skopt and it doesn't work for everyone. People are going to be disappointed in skopt if it doesn't work, so it really is in your own best interest to get this merged soon. Not to mention that people are probably going to be reluctant to contribute if it takes so long to get their work merged.

@iaroslav-ai
Copy link
Member

iaroslav-ai commented Jan 22, 2018

Sorry, I cannot convince myself about plot_objective and plot_evaluations. Right now I am of the opinion that they should be removed. This is because of the following:

  1. There is no peer-reviewed publication on black-box optimization that I know of that uses these plots (ideally with an evaluation of the usefulness of these plots, but a publication where they are shown to aid the user in a way that leads to better objectives would also be good).
  2. The plot_objective is borrowed from the supervised-learning domain, where you are interested in all possible inputs, whereas we are interested in the subset of minimizing inputs. This is also better addressed with the plots that you have referenced on the wiki.
  3. A possible use of plot_evaluations, as you mention, is to check for convergence. However, this is a subjective criterion that depends on the viewer of the plot, and it does not scale if a user runs multiple different optimization problems in parallel (say, training on different datasets), as the user then needs to monitor n plots. I would rather prefer objective criteria such as the one that we have, or others in the literature. These are objective, scale with the number of optimizations run, and can be tested automatically.
  4. Testing the usefulness of these plots is hard - it would require a user study to show that the plots indeed help users optimize their objective better. This is necessary in the absence of point 1.

These methodological issues are important, as these plots are intended to aid the user in getting better optimization outcomes, beyond purely technical support (correct me if you have something else in mind). Due to this uncertainty I would rather go with an Occam's razor approach and not include them.

Consider examples for other functionality of skopt:

I cannot argue like that for these plots.

I did not come to this conclusion easily - I know and appreciate that you put a lot of work into rewriting these plots. Yet it is better to address these methodological issues right now, to avoid unnecessary effort spent, and breaking changes in the future, which could also affect your tutorial.

Please do let me know if you wish for me to elaborate more on the points that I raise or if you have any comments on this.

@Hvass-Labs
Copy link
Contributor Author

@iaroslav-ai Thanks for your review. You have good intentions even if we strongly disagree. Of course I'm not happy if all my work is lost :-) But I don't want to waste more time on this anyway, so we need to close this PR this week whichever way you guys decide.

On your points:

  1. I'm not sure if I understand you correctly, but this goes together with what you said the other day. It sounds like you haven't seen these plots in publications before, so from that you conclude nobody will use them in the future? I don't mean to be condescending, but strangely enough I sometimes hear this "argument" from people in academia, that there is no precedent for something, so they don't think it should be done now. But if people only did what was previously accepted then science and technology would stagnate completely. It's not a valid argument that something hasn't been done before. In fact, you should be proud if skopt introduces something new to the field! Unfortunately I can't take credit for coming up with these plots - somebody else was apparently quite visionary by adding these plots (I actually don't know who originally wrote this code because GitHub shows several people have worked on this file.)
  2. The plots for Meta-optimization that I referenced on Wikipedia are not possible to do for most problems because they are created by evaluating the actual objective function on a grid in the search-space, which would be extremely time-consuming, as you would have to evaluate func many more times to make such a plot than you did for the actual optimization using e.g. gp_minimize(). Furthermore, in order to plot high dimensional spaces in 2D those plots on Wikipedia had fixed the parameters for all other dimensions, which is not a very good way to do it. The plots in skopt plot_objective() are created from evaluating the fairly cheap surrogate model, and they use Partial Dependence to show high-dimensional spaces. This distorts the landscape-plot, but it is a compromise that is necessary to make if you want to be able to plot it.
  3. I actually have already read the Vizier paper (but thanks for the link) and I don't understand why Google hasn't open-sourced it. It is not instrumental to their business and they already have $100 billion in cash and they generate more than $20 billion cash per year in profits, as I recall, so they can surely afford to open-source a tiny little library that could help others in the world. There is a discrepancy between what Google say they will do for the benefit of society and what they actually do. Anyway, I digress. What you reference here are dedicated convergence plots in the Vizier paper and the DeltaX-stopping criterion in skopt. But plot_objective and plot_evaluations are of course meant for a completely different purpose - to plot various aspects of the optimization progress and the search-space after it has been run. Personally I like plot_objective the most but plot_evaluations is useful for other things, e.g. plotting the histogram (which can now also be done with plot_histogram), but it should also be much faster than plot_objective for high dimensionalities, so it is useful to see where in the search-space the samples were taken, without plotting the landscape which is very expensive.
  4. Usefulness is highly subjective. I find these plots very useful - you find them completely useless. Neither of us seems able to convince the other :-)
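To make point 2 concrete: the Partial Dependence trick averages the cheap surrogate's predictions over the observed samples for all other dimensions, rather than fixing them at some value. Here is a minimal NumPy sketch of that idea - illustrative only, not skopt's actual implementation, and `surrogate`, `samples` and `grid` are hypothetical names:

```python
import numpy as np

def partial_dependence_1d(surrogate, samples, dim, grid):
    """Partial dependence of `surrogate` on dimension `dim`.

    For each grid value, the chosen dimension is clamped to that value
    in every observed sample and the surrogate's predictions are
    averaged -- so the other dimensions are averaged over, not fixed.
    """
    averages = []
    for value in grid:
        pts = np.array(samples, dtype=float)
        pts[:, dim] = value
        averages.append(surrogate(pts).mean())
    return np.array(averages)

# Toy surrogate: f(x) = x0**2 + x1 (cheap to evaluate, unlike the real
# objective), with a few "observed" sample locations.
surrogate = lambda X: X[:, 0] ** 2 + X[:, 1]
samples = np.array([[0.0, 1.0], [0.5, 3.0], [1.0, 2.0]])
grid = np.linspace(-1.0, 1.0, 5)
pd = partial_dependence_1d(surrogate, samples, 0, grid)
# pd equals grid**2 plus the mean of the second sample column (2.0)
```

For this toy surrogate the averaging is exact, which is what makes the result easy to check; with a real Gaussian-process surrogate the same loop produces the smoothed 1D curves shown on the diagonal of plot_objective.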

Please take a look at your own tutorial which uses these plots. As you will see, the plots make good sense there and clearly show you the objective-landscape and the regions that were sampled, the sample-ordering etc. These plots are better approximations to the true objective function because they are built from surrogate models with 160 samples of the true objective-function, whereas you only took 40 samples. I think these plots are great! And I am perfectly willing to accept more distorted plots for harder problems where we cannot evaluate hundreds of samples of the objective-function.

https://scikit-optimize.github.io/notebooks/visualizing-results.html

I think it is a mistake to discuss whether these plotting functions should be in skopt or not. We are way beyond that point. If you have other plotting functions that are better, then those can be added later by someone else in another PR. Somebody else thought these plots were useful to add to skopt (the above tutorial was written by @betatim so perhaps he also wrote the original plots.py - I don't know). I agree completely that these plots are useful! If they are "new" to the field, then congratulations you have added something to the field that you should be proud of! Perhaps in 5 years everyone will show these plots in their papers, thanks to you!

Unfortunately plots.py didn't work with Categorical dimensions, which I had in my Neural Network problem, so I had to fix that. But I didn't understand your code because it was so convoluted and undocumented that I had to rewrite it all in order to modify it. It is now probably the cleanest code you have in skopt - you can get angry with me for saying that, but it's the truth. It makes very minor changes to the layout of the plots, which I think look better. So all we need to decide now is whether the code is in correct working order and if something needs to be fixed.

The code passes all the tests and also creates the plots correctly for your own tutorial. I can expand on the doc-strings with my explanations above. And I can fix the minor issues @betatim raised. I'll be happy to do that, but then we must merge it. I cannot afford to waste any more time on this. @betatim and @iaroslav-ai should I make these minor changes so you will merge, or should I stop work on this now?

@iaroslav-ai Please review and merge #597 which is needed for this PR as well (some of the issues @betatim raised are fixed by that).

Please consider that this thread may deter others from wanting to contribute to skopt in the future because it is so problematic. So once again, I think it is in your own best interest to get this PR closed ASAP.

@iaroslav-ai
Member

@Hvass-Labs thanks for detailed comments! Will respond later today.

@iaroslav-ai
Member

Thanks again for a thorough discussion! To extend a bit on my arguments:

I would measure the usefulness of an algorithm as the minimum objective found by the algorithm after k iterations, k > 0, averaged over a set of n >> 0 problems such as the ones used here or others. Our library is concerned with doing optimization as best it can, so we should measure our success by how well we optimize “in general”.

To argue about the usefulness of the plots, I would like to see a comparison of such quantities, which could look like:

| With plots  | Without plots |
| ----------- | ------------- |
| 0.6 +/- 0.2 | 0.9 +/- 0.1   |

It does not matter to me where such data comes from as long as it is reliable - paper or not :) But without such reliable data, we cannot make any claims about whether these plots are helpful or otherwise. Any claim without supporting data is just speculation.
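The comparison described above could be sketched as a small benchmark harness: the best objective value found after k iterations, averaged over a set of problems. This is an illustrative toy under made-up assumptions - random search stands in for a real optimizer, and all names here are invented for the example:

```python
import random

def random_search(objective, bounds, n_calls, seed=0):
    """Toy optimizer: best objective value seen after each call."""
    rng = random.Random(seed)
    best = float("inf")
    trace = []
    for _ in range(n_calls):
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        best = min(best, objective(x))
        trace.append(best)
    return trace

def benchmark(optimizer, problems, n_calls):
    """Average the best-found value after each iteration over a problem set."""
    traces = [optimizer(f, b, n_calls) for f, b in problems]
    return [sum(t[k] for t in traces) / len(traces) for k in range(n_calls)]

# Two cheap test problems, each a (objective, bounds) pair.
problems = [
    (lambda x: x[0] ** 2, [(-1.0, 1.0)]),
    (lambda x: abs(x[0] - 0.5), [(0.0, 1.0)]),
]
curve = benchmark(random_search, problems, n_calls=20)
# curve[k] is the averaged minimum found after k + 1 evaluations.
```

Running two optimizer configurations (e.g. "with plots consulted" vs. "without") through the same harness would produce exactly the kind of table shown above.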

I would like to provide to every user of skopt a functionality which is verified to be beneficial in optimizing user’s objective.

One can possibly define other meaningful evaluation metrics. Generally, I find it important to show that plotting the search space can indeed be used to improve some measurable and important performance metric, to motivate the necessity of plotting in the first place.

@iaroslav-ai
Member

iaroslav-ai commented Jan 23, 2018

On the other hand, I must confess that removing the functions would involve too much stone-turning in skopt, with an already large PR. Plus, the plots might be useful for tutorials/examples in, say, a 2D space. Finally, it is good to decide on something concrete and doable.

I propose the following plan:

  1. Let me and others who might be looking here think about this for a day and add comments, if any.
  2. If nothing comes up, we modify the docstrings to add a warning that there are no publications or public data available to support these plots.
  3. No guarantees are made that these functions will not be removed or changed in the future. This is also noted in the docstrings.
  4. We fix any remaining technical issues with the PR, and merge it.

Any comments on the plan?

P.S. I will not be very responsive tomorrow and the morning of the day after, but it will get better after that.

@Hvass-Labs
Contributor Author

@iaroslav-ai Thanks for the quick response! Once again I don't mean to be condescending, but I am smiling at this discussion. I think I am a lot older and more experienced than you, so I hope you take this as friendly feedback. It is amusing and annoying at the same time because I don't have time for it, but it is a bit funny :-) You were such a cool dude on the @use_named_args PR where you came in and resolved matters very quickly and professionally in the middle of a heated discussion - just like Mr. Wolf :-) But what you're saying here doesn't make any sense. That is why I'm smiling now :-)

How are you going to quantify the usefulness of a plot? It is plotted after the optimization has taken place. Personally I am just curious what the objective function looks like. The surrogate model is only an approximation and the 2D mapping is a further approximation. So I know that the plots are probably misleading to some extent, especially if the surrogate model is only built from e.g. 40 samples. So the plots have limitations. That doesn't mean they are useless. Especially if you don't have a better way of plotting them. But you cannot even prove and quantify the usefulness of an ordinary convergence plot, yet those are used everywhere.

As you agree, this is not a very fruitful discussion to be having. If in 6 or 12 months you find a better way of making plots then you can add those, or modify plot_objective if you find a better way to map 3 or more dimensions to 2D plots. But don't remove these plotting functions because you don't find them useful. That is not a rational decision.

@iaroslav-ai If you agree to merge, then I will clarify in the doc-strings that the plots have the limitations we have discussed. I don't think it is reasonable to add a warning that they might be removed, but perhaps a note that the visuals might change if a better method is found in the future. You are very welcome to review the code for bugs / problems, but I don't think there's a need for you to spend many hours on that - it is fairly high quality code and it passes all the tests. Please merge #597 first as it is needed here.

@betatim Please confirm that you will also support the merge if I fix the minor issues you raised. You guys have had almost 4 weeks to discuss the new naming semantics and I don't have any more time for this. So you need to make a decision now whether you will merge this or not.

I am in beautiful Greece surrounded by spectacular vistas of mountains and water. Even so, I am happy to spend my Saturday in front of my computer finishing this PR. But that is the last day that I am willing to help you.

Please note that you are already getting issues from people who are having problems with these functions (one in #606 and another person in this thread). Those people are coming here by way of my tutorial. You should expect a large number of people to learn about skopt from my tutorial in the future. I explain in both the video and the Python Notebook that people need to install from my dev-fork to use this plotting functionality. Hopefully these functions are added to skopt 0.5 so they can be installed easily with pip. But after this discussion I will keep my dev-fork as long as there is a possibility that the functions might be removed from skopt master. This is harmful to your cause because people will be very confused about what to use, especially as I will not maintain my dev-fork.

@iaroslav-ai
Member

iaroslav-ai commented Jan 25, 2018

Let's just focus on the PR. I think our time is best spent on reviewing the code, or enjoying our evening, or enjoying the beauty of Greece :)

The tests fail on Travis for some reason:
https://travis-ci.org/scikit-optimize/scikit-optimize/jobs/326786738

@Hvass-Labs please make sure that you rebase your code on master and that all the tests pass locally for you. Please do address the comments that @betatim raised. Also, state the limitations of dependency plots in docstrings as you mentioned. I was told by @betatim that this was the original inspiration for the plots.

I am fine with merging this PR as long as everyone has reviewed the code and given their +1.

@betatim
Member

betatim commented Jan 25, 2018

I haven't had time to look at this any further than the comments I made previously. The tone of the discussion doesn't make me want to schedule time for it either. The nature of volunteer-driven projects is that it is more important to create a welcoming environment than anything else, so that people do schedule time to look at things. I'd ask you to please tone down and edit your comments, otherwise people will not engage with the work you are doing. Beyond just your work, it creates a combative and adversarial feeling which turns people off. As an option of last resort, the maintainers of the project would ask you to leave.

Trying to force things with comments that are easy to (mis)read as threats ("You will be flooded by users requesting this.", "People will get a bad impression.") is not productive. The best possible outcome is a bad atmosphere amongst contributors.

Experience has shown that the effort to review work grows exponentially with the amount of change. So many small (<50 lines) PRs are good and keep the ball rolling. For large changes building consensus first and following it up with a series of small PRs seems to work well. Some things can't be achieved in small steps, but this means they will be unfortunately very, very slow. (Anecdote: changing two lines in the jupyter notebook project took ~3months to get merged.)

You mentioned that you also don't have a lot of time to spare. That is understandable. Feel free to leave the PR as it is now, someone will eventually come and pick it up, or cherry pick parts of it into a smaller PR. Authorship will be maintained through the cherry-picking. It is part of the volunteer aspect that PRs stall, get picked up by others later, or just abandoned. This is Ok and not a failure on anyone's side.

@Hvass-Labs
Contributor Author

@iaroslav-ai This is the text I've added to the doc-string of plot_objective():

NOTE: The Partial Dependence plot is only an estimation of the surrogate
      model which in turn is only an estimation of the true objective
      function that has been optimized. This means the plots show
      an "estimate of an estimate" and may therefore be quite imprecise,
      especially if few samples have been collected during the optimization
      (e.g. less than 100-200 samples), and in regions of the search-space
      that have been sparsely sampled (e.g. regions away from the optimum).
      This means that the plots may change each time you run the
      optimization and they should not be considered completely reliable.
      These compromises are necessary because we cannot evaluate the
      expensive objective function in order to plot it, so we have to use
      the cheaper surrogate model to plot its contour. And in order to
      show search-spaces with 3 dimensions or more in a 2-dimensional plot,
      we further need to map those dimensions to only 2-dimensions using
      the Partial Dependence, which also causes distortions in the plots.

@Hvass-Labs
Contributor Author

I have now fixed what you have requested and pushed it back to github. I did not squash it to make a single commit in the PR, because it might cause problems for people who are using my dev-fork. I have run all the tests in skopt/tests and it works locally. I believe this PR is ready to be merged.

This ends my involvement in your project. You can now merge it or not, it's up to you.

I will leave my tutorial using my dev-fork for now. That is liable to give you guys a headache later on as explained above, but that's apparently how you want it - it's not a threat, it's just a fact. (I have re-read my message above and don't understand how you perceive that as menacing. #579 (comment))

You have had 4 weeks to make a very simple decision and it is obvious that it could take many months and never result in anything. I simply cannot afford to waste any more time on this.

@betatim raised some issues about this PR 9 days ago (!) that I answered within one hour (#579 (comment)). I explained in detail the technical reasons why I had made those decisions, including issues that @betatim himself had raised earlier. But @betatim never responded, which was of course frustrating because it stalled the PR, and now he's mad again about something I said, so he refuses to respond and finish this PR.

Nobody is more sorry about this whole situation than me. I am the person who has wasted a lot of time and energy on it. I also don't enjoy this, but I do the work anyway because it needs to be finished.

This whole thing got off course when you guys refused to accept responsibility for writing really bad code that was functionally broken and impossible to understand and fix for others. Instead you tried to shift the blame to a newcomer for being unable to understand your code. All you had to do was say: "We can see that this code may be really hard to understand for others, and we're really sorry for the pain it has caused. If you're willing to take a crack at fixing and cleaning it up, we'll be happy to help explain it wherever we can. In the future we'll try and make our code easier for others to work on."

But instead you got pissy when I started complaining about your code, and I also recall that @betatim mentioned somewhere early on that I might be having mental issues! (That might have been in another PR ... something about me having angst or something.) That is about as low as you can go and I certainly haven't sunken that low when I have criticized your work. Of course I didn't take it very seriously, but before you guys blame me for having bad manners and being counter-productive, you may want to look at yourself first.

In my video tutorial on TensorFlow and skopt towards the end I encourage people to help you guys out. But I feel I now have a responsibility to moderate that recommendation and make it clear that it can be a very prolonged and painful experience that ultimately may not yield anything, so I will post a link to this thread and let people judge for themselves whether they want to spend time on your project. I know you guys believe I have harmed my own reputation in this thread, but that is really not the case.

I still think your project is really cool, but I also think it is really difficult to work on, in great part because of your attitude towards "collaboration". Naturally, I wish I had never gotten involved so I could have spent the time and energy on something more productive.

Despite all of this, I still sincerely wish you the best of luck in the future of this project and hope we all learned something from this experience!

To end all of this on a humorous note :-)

https://www.youtube.com/watch?v=J6VjPM5CeWs

@scikit-optimize scikit-optimize locked as too heated and limited conversation to collaborators Jan 26, 2018
@betatim betatim closed this Feb 13, 2020
@betatim betatim reopened this Feb 13, 2020
@holgern holgern merged commit 7ee5971 into scikit-optimize:master Feb 28, 2020