Benchmarks and Interfaces for Black Box Optimization Software

The interfaces are available to following software:

scikit-optimize (native support)
spearmint (python 2 only)
smac3 (python 3 only)
gpyopt
hyperopt

Repository contains a set of benchmarks intended for testing black box optimization algorithms. Some are inspired by practical problems, and many originate from literature and are based on sigopt's evalset .

Aggregated results

Results below are for n_calls=64:

Method	Average rank (less is better)
dummy_minimize	4.552
forest_minimize	2.362
gbrt_minimize	2.172
gp_minimize	1.241
gpyopt_minimize	1.069
hyperopt_minimize	3.052
smac_minimize	3.431

Important note: these results need not generalize to the problems which are largely different from problems in the evaluation set.

See below how these results are calculated.

Installation on Ubuntu - like systems

To benchmark scikit-optimize only: sudo bash skopt_py2.sh or sudo bash skopt_py3.sh depending whether you use python 2 or 3.
Run sudo bash full_install.sh .

If any of this fails at some point, let us know!

Docker image

Has all of necessary software installed. Requires you to run mongo in a screen in order to run spearmint.

sudo docker run -t -i iaroslavai/scikit-optimize-benchmarks /bin/bash

Run on a single machine

from tracks.ampgo import Hartmann3_3_ri, Ackley_3_1_r
from evaluation import parallel_evaluate, plot_results, calculate_metrics

from skopt import forest_minimize
from wrappers.hyperopt_minimize import hyperopt_minimize
from wrappers.gpyopt_minimize import gpyopt_minimize

r = parallel_evaluate(
    solvers=[forest_minimize, gpyopt_minimize, hyperopt_minimize],
    task_subset=[Hartmann3_3_ri, Ackley_3_1_r],  # set to None to evaluate on all tasks
    n_reps=2, # number of repetitions
    eval_kwargs={'n_calls': 10},
    joblib_kwargs={'n_jobs': -1, 'verbose': 10})

p = calculate_metrics(r) # returns pandas dataframe
p.to_csv('data.csv')
plot_results(r)

How performance is calculated

Results can be found in results_history folder, in .csv file with the latest date. To reproduce, run python distributed_run.py with n_reps >= 64.

Every entry in such csv file corresponds to performance of some algorithm on some problem. Such entries consist of 3 values: lower confidence bound < mean < upper confidence bound, where 95% confidence interval for the value of mean is computed using bootstrapping method.

On every test optimization problem algorithms are ranked based on their relative performance. A rank of some algorithm is a number of other algorithms that significantly (based on derived confidence intervals) i mprove over the algorithm. Consider example results below:

A	B	C	D
3<4<5	0<1<2	-1<0<1	-5<-4<-3

Based on above results, ranks of algorithms are: A = 3, B = 1, C = 1, D = 0. The rank for B and C is the same, as their confidence intervals overlap. A large number of repetitions is used to reduce the size of such intervals.

Confidence intervals and large number of repetitions is used because of empirical observation that results with small number of iterations often cannot be reproduced and hence are unreliable.

Contributions

All contributions are welcome! :)

If you want to add a benchmark, consider this:

It needs to have practical relevance and solving corresponding optimization problem should clearly be valuable. For example, minimizing random polynomials of power 3 is unlikely to be a problem encountered in practice. Optimizing components of some medication to improve patient recovery rate is.
It needs to simulate a task, where the objective is unknown or complex, and is expensive to evaluate.
It needs to run quickly, so that benchmarking does not take days and so that progress can be done quickly. A speed - up for realistic optimization problem can be obtained by learning predictive models from the data to simulate actual objective function.
It should be hard to guess [near] global optimal value with small number of random guesses, that is, the problem should not be "easy". This can be verified by running the dummy_minimize procedure for many iterations. If the objective does not improve after small number of iterations, it implies that optimization task is not too complex.

Name		Name	Last commit message	Last commit date
Latest commit History 30 Commits
bbob		bbob
examples		examples
results_history		results_history
.gitignore		.gitignore
README.md		README.md
combine_results.py		combine_results.py
distributed_run.py		distributed_run.py
full_install.sh		full_install.sh
mongod_install.sh		mongod_install.sh
run_inf.sh		run_inf.sh
setup.py		setup.py
skopt_py2.sh		skopt_py2.sh
skopt_py3.sh		skopt_py3.sh
test_minimizers.py		test_minimizers.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Benchmarks and Interfaces for Black Box Optimization Software

Aggregated results

Installation on Ubuntu - like systems

Docker image

Run on a single machine

How performance is calculated

Contributions

About

Uh oh!

Releases

Packages

Languages

iaroslav-ai/scikit-optimize-benchmarks

Folders and files

Latest commit

History

Repository files navigation

Benchmarks and Interfaces for Black Box Optimization Software

Aggregated results

Installation on Ubuntu - like systems

Docker image

Run on a single machine

How performance is calculated

Contributions

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages