-
Notifications
You must be signed in to change notification settings - Fork 554
[MRG] Rewrote most of plots.py. Added 2 more plotting functions. Better support for named dimensions. #579
[MRG] Rewrote most of plots.py. Added 2 more plotting functions. Better support for named dimensions. #579
Conversation
|
Hello @Hvass-Labs! Thanks for updating the PR.
Comment last updated on January 26, 2018 at 12:06 Hours UTC |
|
I have tested the code in this PR on the actual problem I am trying to solve, which is hyper-parameter optimization of a neural network. It is for one of my tutorials on TensorFlow but it is not yet released as I want to include plots from skopt. I have a question for you regarding the interpretation of the plots. The plots I have here are similar to the ones in your original version, except that I have normalized the color-gradients. The parameters for my search-space are:
These are the fitness values and neural network parameters found by skopt. The fitness is the negative classification accuracy on a validation-set. [(-0.99019999999999997, [0.0010475183322128938, 1, 149, 'relu']), The min and max of the fitness values in In the code for (1) This picture shows the learning_rate vs. num_dense_layers. You will note the red star which shows the best parameters found, and the yellow area which is presumably the minimum area of the Gaussian Process? The (2) This next picture shows the learning_rate vs. num_dense_nodes where the (3) This next picture shows the num_dense_layers vs. num_dense_nodes, whose I'm a bit confused about how to interpret these plots. Can you perhaps explain that to me? Is it because e.g. plot (3) for num_dense_layers vs. num_dense_nodes "averages" the fitness over the other remaining dimensions for learning_rate and activation? How should I interpret this? Does it mean that the performance is unpredictable from these two variables alone? But if we look at plot (1) which shows a yellow area in the middle, does this mean that it is "generally" better to have a learning_rate around 0.002 and num_dense_layers of 3. But the best solution had num_dense_layers=1. Is it because the Gaussian Process is incapable of modelling this search-space accurately? Or is it because we are averaging the performance over the remaining variables, so it is possible that for particular choices of the other variables, the best point can fall outside of the yellow area? Another thing is that these plots change each time I run I hope you can enlighten me as to what is going on with these plots :-) |
Codecov Report
@@ Coverage Diff @@
## master #579 +/- ##
==========================================
- Coverage 86.6% 82.23% -4.37%
==========================================
Files 23 23
Lines 1777 1903 +126
==========================================
+ Hits 1539 1565 +26
- Misses 238 338 +100
Continue to review full report at Codecov.
|
|
Thanks a lot for your hard work. I think currently everyone is travelling so it will take some time for looking at this seriously. The doc string format we try to follow https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt this is not entirely successful/uniform across the code base, yet. The code could (always) be commented better or restructured. It probably also depends on each persons history on how confusing they find different idioms. The way forward is to add comments or propose changes like you did. Leaving comments about work to be done/homework for others (empirically) does not lead to things getting done. So I vote to remove them or turn them into comments about the code. |
skopt/space/space.py
Outdated
| self.dimensions = [check_dimension(dim) for dim in dimensions] | ||
|
|
||
| # Names of all the dimensions in the search-space. | ||
| self.dimension_names = [dim.name for dim in self.dimensions] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be a property like is_real below so the doc string is accessible in the REPL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point!
skopt/space/space.py
Outdated
|
|
||
| return distance | ||
|
|
||
| def get_dimension(self, id): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about using a dict like interface so that some_space['learning_rate'] returns the dimension called learning_rate. We might not need to support indexing based access as people can to some_space.dimensions[idx]. This way we avoid having to support some_space[idx] and some_space['learning_rate'] which might require us making guesses about what the user meant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice idea I'll look into it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read this again and you make a very good point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works now using Space.__getitem__().
skopt/space/space.py
Outdated
| Returns | ||
| ------- | ||
| * (`dimension`, `index`, `name`) [tuple(Dimension, int, str)]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should return just the dimension
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, the reason I do it like this is to try and provide efficient support for the existing plotting functions. I agree that it is an ugly function. I'll see if I can fix the plotting functions to make this better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The index is now set inside each Dimension-instance by Space.__init__ as we need it for e.g. plotting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think a Dimension inside a space should know what its index is. If you need the index when iterating over the dimensions in a space you should use:
for i, dim in enumerate(space):
print(i, dim)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I originally made the above function return both the index and the Dimension-objects was precisely because I did not want to modify your code. But you need the index for various things when only using a subset of the dimensions and you cannot loop in the way you propose. So an efficient way of doing it was the above, but the API was ugly as I agreed.
What I have done now is to modify both Dimension and Space to give each Dimension-instance an index once they are added to a Space. This way we have easy access to it when we need it e.g. for plotting a sub-set of the dimensions, and for having default names such as X_i for unnamed dimensions. I think it is a reasonable solution.
The only problem with this would be if you later want to change the composition of dimensions in a search-space. But I don't see any need for doing that, or why that should even be supported by the API.
Are there any other problems with having an index inside the Dimension-object?
skopt/space/space.py
Outdated
|
|
||
| # Use a default name for unnamed dimensions. | ||
| # TODO: Maybe it would be better to set this in Space.__init__? | ||
| if name is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dimensions by default do not have a name, we should only give them "fake" names in the place where we need them (like in plotting) instead of here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it even makes sense to have some dimensions named and others not?
I think it is quite inelegant that some dimensions don't have names because we then have special cases elsewhere e.g. in plotting. So if we want to give them default names it might as well be done in the Space-class.
Let us think about what dimension-names are used for:
- With PR Added support for named arguments to objective function. #590 dim-names are now used for calling the objective function.
- Dim-names are used in plotting for the names on the axes.
- Dim-names may be used to lookup a dim from the search-space object e.g. using the dict-like syntax you proposed above.
- Anything else?
Can you name a case where we don't want dimensions to have names, except for benchmark functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really a key question! It makes the code very messy that there is the possibility for having unnamed dimensions. It requires special cases in so many places in the code. If the only reason for having unnamed dimensions is to have support for simple benchmark functions, then it would be better to drop this "feature" and force all dimensions to have names. Please explain why you have unnamed dimensions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unnamed dimensions are part historical (to support the shorthand way of defining them), for small search spaces you don't need to have names, and for some problems the dimensions do not have (sensible) names. I'd be -1 on requiring all dimensions to have names (or giving them dummy names automatically).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am currently in the process of rewriting plots.py They now use the dimension-names for axis-labels. I think the most sensible way of supporting unnamed dimensions, is to return X_i for the i'th dimension, or something similar. I have already added an index to Dimension which is necessary when plotting only a sub-set of the dimensions (you'll see when I commit the code). I have just now modified the name property in the Dimension-class to return X_index if self._name is None. This also allows you to lookup unnamed dimensions using e.g. space['X_2'] as well as named dimensions using space['foo']. It works fine and I think it is a reasonable solution. The only problem is that you cannot use different names for the axis in the plots. Sure it can be coded, but it would be a mess. If you want to name the axis in the plot, then they should be set when you define the Dimension. I think this is reasonable.
skopt/space/space.py
Outdated
|
|
||
| return dimension, index, name | ||
|
|
||
| def get_dimensions(self, ids=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could combine this with the above comment about using a dict like interface: some_space[['lr', 'C']] we would have to limit names to strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would be the right way of doing this in Python? Is there a dunder-method that can be overrided for this type of behaviour?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works now using Space.__getitem__() which can take either a single string or list of strings as argument.
skopt/plots.py
Outdated
| n_points=n_points) | ||
|
|
||
| # Start a new plot. | ||
| fig = plt.figure() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should create a figure and axis. Then later we can do ax.plot(...) and ax.scatter() instead of using plt.scatter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I never really understood the semantics of Matplotlib. What is the reason for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The plt.plot and co interface uses global state and is meant to make interactive plotting easy. The object orientated interface has less potential for weird interactions between plots because it does not have global state. The best way to learn is probably following some matplotlib tutorials/videos and looking at how other libraries that do plotting handle things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I see. I'll take a look if I can fix this. I've used Matplotlib many times for many projects, but never liked its API.
skopt/plots.py
Outdated
| plt.xlabel(dimension_name1) | ||
| plt.ylabel(dimension_name2) | ||
|
|
||
| return fig |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
most of the pandas plotting functions return an axis. What other examples of libraries are there that return plots? I think we should try to be consistent with what other libraries do. I'd at least return fig, axis
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I said above, I never really understood Matplotlib, but when I ran your plotting-functions in a python-file I couldn't figure out how to e.g. save the plot without having the fig-object. When using the functions in a Jupyter Notebook they can work either way with returning fig or axis. I also thought about returning both fig, axis but I didn't think it was necessary. I can give it a try and see if it works in Jupyter.
|
I get the impression that the plotting functions will not get fixed by anyone else and since I really would like this to work, I'll try and take a look at it. I'll probably have to change the existing plotting functions and we'll have to discuss later if there are API changes etc. E.g. the naming scheme you have now is a bit strange when you already have names in the dimensions, so I'll definitely change that. @betatim thanks for all the comments, they were quite helpful and I'll try and incorporate your suggestions. Regarding comments in code, it is the responsibility of the person adding new code to properly document it. He is the only person who really understands what the code is supposed to do. You are hoping for a miracle if you think others will come along and fix your work for you. I will help you out a bit, but to be perfectly honest, it is quite arrogant to add code that isn't properly polished and commented, because if there are 1000 people who will be using the code and they waste just a few hours each, then it is thousands of man-hours that are wasted, because the original author didn't take an hour or two more to document the code. It is not just a problem with your project but a general problem. The solution is simply that you don't accept PR's until they are properly documented so you understand what is going on and you feel the API is sufficiently elegant. Like you are doing with this PR now :-) PS: I really could use an explanation for the plots that I showed above, so I can understand what is going on and explain it in my tutorial. Perhaps I should have opened a separate thread for that question. |
|
All PRs are reviewed by someone who is not the original author before they are merged. This means there are at least two people who are happy with the state of things. The code base is maintained by people who are experienced with python, numpy, matplotlib and scikit-learn, as a result it relies on common idioms and conventions from the PyData ecosystem. This can result in some pretty dense or hard to read code. The way out is for an outsider (who doesn't view the world through the same biased view) to come along and help improve things. Some may even disagree on the level of comments/understandability. To avoid bike shedding what the appropriate level of comments, docs, style, etc is we require a concrete proposal via PR. This seems to work quite well. The miracle is that random people volunteering their time manage to put together a quite well maintained and functioning software project implementing a complex bit of research. There will always be parts that aren't as good as they could be, new things to add, historical legacy that is annoying, etc. Assuming people are already donating as much time and effort as they can, the only way to move things along faster is by donating more effort yourself or donating money to pay for developer time. |
|
As for #590, please remove your inline comments about our coding conventions. This is distracting everyone from the main goal of this PR (i.e., adding two plotting functions). If you want to propose changes to improve how to code looks and is read, we welcome that, but please make self-contained targeted PRs. |
|
I have now pushed a new version. Almost the entire This example tests the new functions using the built-in hart6 benchmark function. These are the resulting plots: |
|
Here is another example using a "simulated" Neural Network for the objective function. This has a Categorical dimension which cannot be plotted. These are the resulting plots: |
|
Let me once again start out by saying that I really like the idea of your project! My issue is with the implementation and the code "style" of certain parts of the project. I rewrote almost the entire I would also like to add the plotting functions from: https://scikit-optimize.github.io/notebooks/bayesian-optimization.html to From our previous discussions I get the impression that you don't like comments inside code. You find comments to be redundant or whatever. So I expect that you will either demand that I remove all those comments I added to Here is the truth: Code with few comments might be OK if your code is really fantastic and beautiful, and your audience is very familiar with all the API's you are using, the math and methods, etc. But your code is far from fantastic and I repeatedly got the impression that Fact is that the code was broken, and since you didn't want to fix it yourselves, someone from the outside would have to do it for you. If the code had been well-written or at least well-documented and well-commented, then it could have been fixed in a fraction of the time I have now spent. Instead I feel I've wasted almost a week understanding and cleaning up your code, because you didn't even want to add comments that showed what the intention of the code was, so I had to spend hours upon hours trying to deduce it. And when you talk about "programming idioms" and "complex research" and that is why I might find your code difficult to understand, it is also rather arrogant, because what you're really saying is that you think people who don't understand your code must be ignorant or stupid. But in reality you are just making poor excuses for writing bad code! So it has been very frustrating working on your code! Let me ask you two questions:
It is obvious that you are getting quite annoyed with me, but I really hope you take this criticism to heart. You don't have to like me, but at least consider that what I am saying might be reasonable and would significantly improve your project. Scikit-optimize could be a fantastic library! - but you need to improve the code-base so others can understand it and help you extend it, especially when you no longer work on it yourself! Best wishes and Happy New Years! :-) |
|
I've looked a bit into plotting the Gaussian Processes from your tutorial here: For a real problem we do not have the true function so that can be omitted. Then we have the mean and variance of the Gaussian Process which is green in those plots and the acquisition function which is blue. I don't know if I understand this correctly, but I think that in order to plot these three aspects of the Gaussian Processes for multi-dimensional search-spaces, we would have to make Partial Dependence plots similarly to A significant part of the existing code could be reused - either the lazy way by copy and paste the common code, or the proper way of having the common code in a separate function which is used by both But I'm not sure how useful these plots of the Gaussian Process would really be for multi-dimensional problems and if it would be worth the effort. Any thoughts? |
|
Last evening I thought about this and I really have to move on to other things, so I won't be implementing the plots of the Gaussian Processes, even though plots of especially the acquisition function might be quite useful. So let's focus on what needs to be fixed with the PR as it stands now before it can be merged. Perhaps you don't like that I changed the names to Note from the discussion in PR #592 that I will not remove comments from the code. They serve a purpose of helping newcomers 5 years from now to quickly and clearly understand the code. This code is not written for you - it is written for them! |
|
I assume that this PR is ready to be merged (@Hvass-Labs if otherwise let me know) so I add [MRG] to the title (scikit-learn style titles). |
plots.py. Added 2 more plotting functions. Better support for named dimensions.
plots.py. Added 2 more plotting functions. Better support for named dimensions.|
@iaroslav-ai and @betatim I'm sorry that this PR has become so unwieldy. As you can see the checks are failing because I rewrote the API a bit e.g. renamed |
|
I have now pushed a version that I believe can be merged 'as is', provided all tests pass. The tests all pass on my system and I have also run There is no need to read all of the above discussions when reviewing this. Here is essentially what this PR does:
In my TensorFlow video tutorials I sometimes spend a few minutes commenting on the coding style of other libraries. But there were so many issues with So what I suggest, is that you merge this code now if it works correctly, so my audience can run my new tutorial which relies on this code addition. If you greatly dislike my coding style and want to bulldoze the comments, then please wait until you've watched the video where I explain my coding philosophy. Perhaps you'll realize that what I've done actually makes sense. |
f5d5c5d to
610ce8d
Compare
|
I have fixed the merge conflicts with the master-branch. I did a If all other checks pass, then I believe this is ready to be merged. |
|
@betatim Thanks! EDIT: I don't think it's a good idea to have another outsider / newcomer help with this review. I really wish @iaroslav-ai would give a hand. |
|
I am currently experimenting with the plotting functionality. Will update here soon with results. |
|
P.S. most likely tomorrow after ~5 pm. |
|
@iaroslav-ai Thanks! I don't mean to stress you, but it would be nice if we could close this soon. I think it has been almost 3 weeks since I finished this (apart from minor fixes and rebasing). The installation instructions for my TensorFlow / skopt tutorial are awkward because it uses a commit from my skopt-fork and it apparently doesn't always work. So it would be nice if this could get merged into master soon, so we can all move on. To avoid having to rebase and fix merge conflicts multiple times, I'll await both of your full reviews before fixing the code. |
|
You can test the new plotting functions using the code-snippets in the above comments. I think you just need to change |
|
Hi everyone. Here are some of my thoughts on this. I have never seen plots that these functions produce to be used in black box optimization literature. Does anyone knows a paper where these are used? Was it intended originally maybe for the debugging of optimization algorithms? |
|
The reason why I ask these questions is because I am not sure how these plotting functions are useful for the skopt user’s. Remark 1. To obtain plots, the average over values of objective is used (over random invisible parameters), whereas a user / algorithm is interested in minimum value of objective. Because of this, user might draw incorrect conclusions about what parameter values are “good”. Remark 2. Best found minima point values do not correlate with minima of plots. This is due to a) remark 1 and b) because generally the objective is assumed to be sampled from a Gaussian process prior where you can have complex non-linear correlations between n > 2 dimensions, which hence cannot be represented in 2d plots. Because of this, incorrect inferences can be drawn about correctness of optimizers, correlations between two variables, and absence of “higher order” correlations. Remark 3. Primary goal of skopt is to optimize user objective efficiently. How do these plots help for that? For instance, the idea of using such plots to aid the optimization contradicts the idea of taking user out of the loop. Remark 4. A minimum can be computed as per Remark 1. However this would still fall under remark 3. Also technical difficulties can occur for large number of dimensions (say 20+). Somewhat similar issues can be brought for the Possible benefit of such plots is to catch failures of optimization algorithm. However ideally such failures should not occur / should be detected automatically without user’s intervention. I am happy to further elaborate on this if necessary or if something unclear :) |
|
@Hvass-Labs sorry that I start these discussions here! I know you want to be done with this soon. But it would be good to clarify the why, as it then dictates the how . Results of discussion here could eg go to docstrings. |
|
@iaroslav-ai Your review is very good and perfectly fine to have the discussion here! You are neutral and polite, and your goal is to make it a good user-experience. That's why I asked you to help with this review :-) Quick comment on your example: In my experiments I found that 40 iterations was very low for getting semi-accurate contour plots. You could try 100 or 200 iterations instead. I use 40 iterations in my own tutorial on skopt and TensorFlow because those experiments are very expensive to run: Note: I wrote the tutorial in just a matter of days. The reason it took me 3-4 weeks to finish it, was because I wanted these plots to be in the tutorial as I found them quite useful - in spite of their short-comings - and I needed to fix I will try and answer your questions as best as I can, considering that I'm not the original author of these, and I mainly just cleaned up the code and added a couple of features. I have combined the answers for several of your questions. About
|
|
Should I conclude from your silence that if I fix the things you've mentioned then this can be merged? @betatim mentioned a couple of minor issues and @iaroslav-ai wanted better doc-strings for the plotting functions. I don't want to rebase and fix merge conflicts only to find out that you have more things you want changed. I think I argued quite well that (1) these plotting functions are useful, (2) the new semantics for dimension-names are useful and sensible, (3) a dimension's search-space index is needed for efficiently working with sub-sets of dimensions and the natural place to store the index is inside the dimension-object. I believe this dealt with all your concerns. Please note that the very small PR #597 is also needed for me to complete this PR. My tutorial on TensorFlow and skopt has been viewed almost 800 times by now. My guess would be that it will get 10000 views within a year. Currently people have to install my dev-fork of skopt and it doesn't work for everyone. People are going to be disappointed in skopt if it doesn't work, so it really is in your own best interest to get this merged soon. Not to mention that people are probably going to be reluctant to contribute if it takes so long to get their work merged. |
|
Sorry, I cannot convince myself about
These methodological issues are important, as these plots are intended to aid user in getting better optimization outcomes, beyond purely technical support (correct me if you have something else in mind). Due to uncertainty I would rather go with Occam’s razor approach and not include them. Consider examples for other functionality of skopt:
I cannot argue like that for these plots. I did not come to this conclusion easily - I know and appreciate that you put a lot of work into rewriting these plots. Yet it is better to address these methodological issues right now, to avoid unnecessary effort spent, and breaking changes in the future, which could also affect your tutorial. Please do let me know if you wish for me to elaborate more on the points that I raise or if you have any comments on this. |
|
@iaroslav-ai Thanks for your review. You have good intentions even if we strongly disagree. Of course I'm not happy if all my work is lost :-) But I don't want to waste more time on this anyway, so we need to close this PR this week whichever way you guys decide. On your points:
Please take a look at your own tutorial which uses these plots. As you will see, the plots make good sense there and clearly show you the objective-landscape and the regions that were sampled, the sample-ordering etc. These plots are better approximations to the true objective function because they are built from surrogate models with 160 samples of the true objective-function, whereas you only took 40 samples. I think these plots are great! And I am perfectly willing to accept more distorted plots for harder problems where we cannot evaluate hundreds of samples of the objective-function. I think it is a mistake to discuss whether these plotting functions should be in skopt or not. We are way beyond that point. If you have other plotting functions that are better, then those can be added later by someone else in another PR. Somebody else thought these plots were useful to add to skopt (the above tutorial was written by @betatim so perhaps he also wrote the original Unfortunately The code passes all the tests and also creates the plots correctly for your own tutorial. I can expand on the doc-strings with my explanations above. And I can fix the minor issues @betatim raised. I'll be happy to do that, but then we must merge it. I cannot afford to waste anymore time on this. @betatim and @iaroslav-ai should I make these minor changes so you will merge, or should I stop work on this now? @iaroslav-ai Please review and merge #597 which is needed for this PR as well (some of the issues @betatim raised are fixed by that). Please consider that this thread may deter others from wanting to contribute to skopt in the future because it is so problematic. So once again, I think it is in your own best interest to get this PR closed ASAP. |
|
@Hvass-Labs thanks for detailed comments! Will respond later today. |
|
Thanks again for a thorough discussion! To extend a bit on my arguments: I would measure the usefulness of an algorithm as a minimum objective found by an algorithm after k iterations, k > 0, averaged on a set of n >> 0 problems such as ones used here or others. Our library is concerned with doing optimization best it can, so we should measure our success by how well we optimize “in general”. To argue about the usefulness of the plots, I would like to see a comparison of such quantities, which could look like:
It does not matter to me where such data comes from as long as it is reliable - paper or not :) But without such reliable data, we cannot make any claims of whether these plots are helpful or otherwise. Any claims without supporting data is just a speculation. I would like to provide to every user of skopt a functionality which is verified to be beneficial in optimizing user’s objective. One can possibly define other meaningful evaluation metrics. Generally, I find it important to show that indeed plotting search space can be used to improve some measurable and important performance metric, to motivate necessity for plotting in the first place. |
|
On the other hand side, I must confess removing the functions will involve too much stone turning in skopt, with already large PR. Plus it might be useful for tutorials/examples in say 2d space. Finally, it is good to decide on something concrete and doable to do. I propose the following plan:
Anyone comments on the plan? P.S. I will be not very responsive tomorrow and the day after tomorrow in the morning, but will get better after that. |
|
@iaroslav-ai Thanks for the quick response! Once again I don't mean to be condescending, but I am smiling at this discussion. I think I am a lot older and more experienced than you, so I hope you take this as friendly feedback. It is amusing and annoying at the same time because I don't have time for it, but it is a bit funny :-) You were such a cool dude on the How are you going to quantify the usefulness of a plot? It is plotted after the optimization has taken place. Personally I am just curious what the objective function looks like. The surrogate model is only an approximation and the 2D mapping is a further approximation. So I know that the plots are probably misleading to some extent, especially if the surrogate model is only built from e.g. 40 samples. So the plots have limitations. That doesn't mean they are useless. Especially if you don't have a better way of plotting them. But you cannot even prove and quantify the usefulness of an ordinary convergence plot, yet those are used everywhere. As you agree, this is not a very fruitful discussion to be having. If in 6 or 12 months you find a better way of making plots then you can add those, or modify @iaroslav-ai If you agree to merge, then I will clarify in the doc-strings that the plots have the limitations we have discussed. I don't think it is reasonable to add a warning that they might be removed, but perhaps a note that the visuals might change if a better method is found in the future. You are very welcome to review the code for bugs / problems, but I don't think there's a need for you to spend many hours on that - it is fairly high quality code and it passes all the tests. Please merge #597 first as it is needed here. @betatim Please confirm that you will also support the merge if I fix the minor issues you raised. You guys have had almost 4 weeks to discuss the new naming semantics and I don't have anymore time for this. So you need to make a decision now whether you will merge this or not. I am in beautiful Greece surrounded by spectacular vistas of mountains and water. Even so, I am happy to spend my Saturday in front of my computer finishing this PR. But that is the last day that I am willing to help you. Please note that you already start getting issues with people who are having problems with these functions (one in #606 and another person in this thread). Those people are coming here by way of my tutorial. You should expect a large number of people to learn about skopt from my tutorial in the future. I explain in both the video and the Python Notebook that people need to install from my dev-fork to use this plotting functionality. Hopefully these functions are added to skopt 0.5 so it can be installed easily with pip. But after this discussion I will keep my dev-fork as long as there is a possibility that the functions might be removed from skopt master. This is harmful to your cause because people will be very confused about what to use, especially as I will not maintain my dev-fork. |
|
Let's just focus on the PR. I think our time is best spent on reviewing the code or enjoying our evening or enjoying beauty of Greece :) The tests fail on Travis for some reason: @Hvass-Labs please make sure that you rebase your code on master and that all the tests pass locally for you. Please do address the comments that @betatim raised. Also, state the limitations of dependency plots in docstrings as you mentioned. I was told by @betatim that this was the original inspiration for the plots. I am fine with merging this PR as long as everyone has reviewed the code and given their +1. |
|
I haven't had time to look at this any further than the comments I made previously. The tone of the discussion doesn't make me want to schedule time for it either. The nature of volunteer driven projects is that it is more important to create a welcoming environment than anything else so that people do schedule time to look at things. I'd ask you to please tone down and edit your comments, otherwise people will not engage with the work you are doing. Beyond just your work it creates a combative and adversarial feeling which turns people off. As a option of last resort the maintainers of the project would ask you to leave. Trying to force things with comments that are easy to (mis)read as threats ("You will be flooded by users requesting this.", "People will get a bad impression.") is not productive. The best possible outcome is a bad atmosphere amongst contributors. Experience has shown that the effort to review work grows exponentially with the amount of change. So many small (<50 lines) PRs are good and keep the ball rolling. For large changes building consensus first and following it up with a series of small PRs seems to work well. Some things can't be achieved in small steps, but this means they will be unfortunately very, very slow. (Anecdote: changing two lines in the jupyter notebook project took ~3months to get merged.) You mentioned that you also don't have a lot of time to spare. That is understandable. Feel free to leave the PR as it is now, someone will eventually come and pick it up, or cherry pick parts of it into a smaller PR. Authorship will be maintained through the cherry-picking. It is part of the volunteer aspect that PRs stall, get picked up by others later, or just abandoned. This is Ok and not a failure on anyone's side. |
|
@iaroslav-ai This is the text I've added to the doc-string of |
610ce8d to
dd7433d
Compare
|
I have now fixed what you have requested and pushed it back to github. I did not squash it to make a single commit in the PR, because it might cause problems for people who are using my dev-fork. I have run all the tests in This ends my involvement in your project. You can now merge it or not, it's up to you. I will leave my tutorial using my dev-fork for now. That is liable to give you guys a headache later on as explained above, but that's apparently how you want it - it's not a threat, it's just a fact. (I have re-read my message above and don't understand how you perceive that as menacing. #579 (comment)) You have had 4 weeks to make a very simple decision and it is obvious that it could take many months and never result in anything. I simply cannot afford to waste anymore time on this. @betatim raised some issues about this PR 9 days ago (!) that I answered within one hour (#579 (comment)). I explained in detail the technical reasons why I had made those decision, including issues that @betatim himself had raised earlier. But @betatim never responded which was of course frustrating because it stalled the PR, and now he's mad again about something I said so he refuses to respond and finish this PR. Nobody is more sorry about this whole situation than me. I am the person who has wasted a lot of time and energy on it. I also don't enjoy this, but I do the work anyway because it needs to be finished. This whole thing got off course when you guys refused to accept responsibility for writing really bad code that was functionally broken and impossible to understand and fix for others. Instead you tried to shift the blame to a newcomer for being unable to understand your code. All you had to do was say: "We can see that this code may be really hard to understand for others, and we're really sorry for the pain it has caused. If you're willing to take a crack at fixing and cleaning it up, we'll be happy to help explain it wherever we can. In the future we'll try and make our code easier for others to work on." But instead you got pissy when I started complaining about your code, and I also recall that @betatim mentioned somewhere early on that I might be having mental issues! (That might have been in another PR ... something about me having angst or something.) That is about as low as you can go and I certainly haven't sunken that low when I have criticized your work. Of course I didn't take it very seriously, but before you guys blame me for having bad manners and being counter-productive, you may want to look at yourself first. In my video tutorial on TensorFlow and skopt towards the end I encourage people to help you guys out. But I feel I now have a responsibility to moderate that recommendation and make it clear that it can be a very prolonged and painful experience that ultimately may not yield anything, so I will post a link to this thread and let people judge for themselves whether they want to spend time on your project. I know you guys believe I have harmed my own reputation in this thread, but that is really not the case. I still think your project is really cool, but I also think it is really difficult to work on, in great part because of your attitude towards "collaboration". Naturally, I wish I had never gotten involved so I could have spent the time and energy on something more productive. Despite all of this, I still sincerely wish you the best of luck in the future of this project and hope we all learned something from this experience! To end all of this on a humorous note :-) |















This pull-request partially solves issue #576.
I have added some more support for named dimensions in the search-space, as well as two plotting functions. I don't have time to fix your existing plotting functions but I have made TODO comments for what needs to be done in
plots.pyBefore I start, let me repeat that I really like your library! So please take the following as friendly and constructive feedback. I spent the whole Saturday from morning to evening working on this and half of this Sunday. A large part of yesterday was just trying to understand your code because of the severe lack of comments. I spent this much effort because I think your library is important and I hope you will polish it some more.
Please note that I have to travel tomorrow so it might be a while before I can respond if you have questions / comments to this PR.
How to test this pull-request
The following code uses the additions of this PR by simulating an objective function for tuning the hyper-parameters of a simple neural network. It will save a few plots to disk and print various info. It should work if you copy this to a python-file and run it.
(And yes! My code is also ugly when I'm writing it. I polish it when it is done and works. But this code is not for release :-)
Comments about this PR
I am completely new to your library so it is possible I have misunderstood how something works. Please feel free to make changes to the code I have added. But please keep the good commenting and coding style as it will help people in the future who have to understand and maintain this code.
I am not familiar with the syntax you use for doc-strings. I have tried to imitate it. I normally use PyCharm which has a different syntax for doc-strings. It is one of the things I really don't like about Python, that you don't have to specify data-types in the function-declaration, but then you often have to do it in the doc-strings using a strange syntax. It's a really poor language-design. But anyway, please check if my docstrings are correct.
Some of the functionality I have added such as
space.get_dimensions()is not actually used by my code. It is intended to be helpful for when you fix the other plotting functions to support named dimensions and search-spaces with categories, etc.I added the function
space.to_dict(). There is a somewhat related function inutils.point_asdict(). But I prefer the nameto_dict()which is also used by Pandas. I realize it is bad to have two naming conventions in one library. Could I convince you to change the name ofasdict()? :-)Comments about your existing code
You REALLY need to make better comments in your code! It is very difficult for others to understand what you want to do and how everything fits together. When there is a bug we have to guess what your intentions are with the code. If you don't have time to maintain this library in the future, it will probably become abandoned because others will find it easier to just start a new project rather than to try and understand and build on yours. That would be a pity, because you have some outstanding ideas in your library!
For example, this is one of the few comments in
plots.py:I had to read this numerous times to finally understand that it was probably a joke
about Texas USA, although I'm still not 100% sure. You really have to be a master at writing comments in order to pull off jokes, otherwise it will be confusing and harmful to the people who try to understand your code. I personally don't write jokes in code and I am very reluctant to make jokes in my video tutorials as well, for the same reason.
Please look at the coding and commenting style in the functions I have added. You may think that I write too many comments, but what I am aiming for is to explain as much as possible in plain English. Every time you have a function call into skopt or matplotlib or some other package, the reader needs to look up the semantics of that function. If that is also poorly documented, reading your code becomes like solving a giant puzzle. It takes immense effort and bugs are allowed to propagate very easily. When everything is commented in plain English, you can read and understand the code very quickly.
Another issue is your actual coding style. I found bugs as well as code that was correct but not very good. For example, these nested loops are typical in
plots.py:Firstly, you should have a local variable called
n_dimsinstead of always referring tospace.n_dims. Secondly, your nesting depth is sometimes 4 or 5 inside these for-loops, because you haven't been very precise in your loop-conditions. You can rewrite the for-loops to something like the following. It drastically reduces the nesting-depth and also only executes the loops that are necessary:You also have this code in
plots.pywhich is incredibly difficult to understand because of the multiple nestings and list-comprehension:It is better to split code like this into several lines to avoid the complicated nesting and also make it easier to step-debug. I added a revised version to the code, although I've commented it out for now so you can test it properly.
Another thing is that it is quite confusing how the Space and Dimension objects interact and wrap each other, e.g. regarding the transformation. It appears that the transformation and its inverse may be carried out repeatedly to nullify each other, although I'm not sure.