
Commit 0c94b55

ENH add randomized hyperparameter optimization
1 parent db6f005 commit 0c94b55

File tree

10 files changed, +622 -167 lines changed

doc/modules/classes.rst

Lines changed: 3 additions & 1 deletion

@@ -455,7 +455,9 @@ From text
    :template: class.rst
 
    grid_search.GridSearchCV
-   grid_search.IterGrid
+   grid_search.ParameterGrid
+   grid_search.ParameterSampler
+   grid_search.RandomizedSearchCV
 
 
 .. _hmm_ref:

doc/modules/grid_search.rst

Lines changed: 66 additions & 14 deletions

@@ -1,11 +1,11 @@
 .. _grid_search:
 
+.. currentmodule:: sklearn.grid_search
+
 ==========================================
 Grid Search: setting estimator parameters
 ==========================================
 
-.. currentmodule:: sklearn
-
 Grid Search is used to optimize the parameters of a model (e.g. ``C``,
 ``kernel`` and ``gamma`` for Support Vector Classifier, ``alpha`` for
 Lasso, etc.) using an internal :ref:`cross_validation` scheme).

@@ -15,10 +15,10 @@ GridSearchCV
 ============
 
 The main class for implementing hyperparameters grid search in
-scikit-learn is :class:`grid_search.GridSearchCV`. This class is passed
+scikit-learn is :class:`GridSearchCV`. This class is passed
 a base model instance (for example ``sklearn.svm.SVC()``) along with a
 grid of potential hyper-parameter values specified with the `param_grid`
-attribute. For instace the following `param_grid`::
+attribute. For instance the following `param_grid`::
 
     param_grid = [
         {'C': [1, 10, 100, 1000], 'kernel': ['linear']},

@@ -30,7 +30,7 @@ C values in [1, 10, 100, 1000], and the second one with an RBG kernel,
 and the cross-product of C values ranging in [1, 10, 100, 1000] and gamma
 values in [0.001, 0.0001].
 
-The :class:`grid_search.GridSearchCV` instance implements the usual
+The :class:`GridSearchCV` instance implements the usual
 estimator API: when "fitting" it on a dataset all the possible
 combinations of hyperparameter values are evaluated and the best
 combinations is retained.

@@ -64,24 +64,76 @@ alternative scoring function can be specified via the ``scoring`` parameter to
 :class:`GridSearchCV`.
 See :ref:`score_func_objects` for more details.
 
-Examples
-========
+.. topic:: Examples:
 
-- See :ref:`example_grid_search_digits.py` for an example of
-  Grid Search computation on the digits dataset.
+    - See :ref:`example_grid_search_digits.py` for an example of
+      Grid Search computation on the digits dataset.
 
-- See :ref:`example_grid_search_text_feature_extraction.py` for an example
-  of Grid Search coupling parameters from a text documents feature
-  extractor (n-gram count vectorizer and TF-IDF transformer) with a
-  classifier (here a linear SVM trained with SGD with either elastic
-  net or L2 penalty) using a :class:`pipeline.Pipeline` instance.
+    - See :ref:`example_grid_search_text_feature_extraction.py` for an example
+      of Grid Search coupling parameters from a text documents feature
+      extractor (n-gram count vectorizer and TF-IDF transformer) with a
+      classifier (here a linear SVM trained with SGD with either elastic
+      net or L2 penalty) using a :class:`pipeline.Pipeline` instance.
 
 .. note::
 
     Computations can be run in parallel if your OS supports it, by using
    the keyword n_jobs=-1, see function signature for more details.
 
 
+Randomized Hyper-Parameter Optimization
+=======================================
+
+While using a grid of parameter settings is currently the most widely used
+method for hyper-parameter optimization, other search methods have more
+favourable properties.
+:class:`RandomizedSearchCV` implements a randomized search over hyperparameters,
+where each setting is sampled from a distribution over possible parameter values.
+This has two main benefits over searching over a grid:
+
+* A budget can be chosen independent of the number of parameters and possible values.
+
+* Adding parameters that do not influence the performance does not decrease efficiency.
+
+Specifying how parameters should be sampled is done using a dictionary, very
+similar to specifying parameters for :class:`GridSearchCV`. Additionally,
+a computation budget is specified using ``n_iter``, which is the number
+of iterations (parameter samples) to be used.
+For each parameter, either a distribution over possible values or a list of
+discrete choices (which will be sampled uniformly) can be specified::
+
+    [{'C': scipy.stats.expon(scale=100), 'gamma': scipy.stats.expon(scale=.1),
+      'kernel': ['rbf'], 'class_weight': ['auto', None]}]
+
+This example uses the ``scipy.stats`` module, which contains many useful
+distributions for sampling hyperparameters, such as ``expon``, ``gamma``,
+``uniform`` or ``randint``.
+In principle, any function can be passed that provides a ``rvs`` (random
+variate sample) method to sample a value. A call to the ``rvs`` function should
+provide independent random samples from possible parameter values on
+consecutive calls.
+
+.. warning::
+
+    The distributions in ``scipy.stats`` do not allow specifying a random
+    state. Instead, they use the global numpy random state, which can be seeded
+    via ``np.random.seed`` or set using ``np.random.set_state``.
+
+For continuous parameters, such as ``C`` above, it is important to specify
+a continuous distribution to take full advantage of the randomization. This way,
+increasing ``n_iter`` will always lead to a finer search.
+
+.. topic:: Examples:
+
+    * :ref:`example_randomized_search.py` compares the usage and efficiency
+      of randomized search and grid search.
+
+.. topic:: References:
+
+    * Bergstra, J. and Bengio, Y.,
+      Random search for hyper-parameter optimization,
+      The Journal of Machine Learning Research (2012)
+
+
 Alternatives to brute force grid search
 =======================================
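To make the new section concrete, here is a minimal sketch of driving the randomized search described above end to end. The constructor arguments (base estimator, distribution dictionary, ``n_iter``) are assumed from the prose of the diff, and the ``LogUniform`` class is a hypothetical illustration of the ``rvs`` protocol, not part of the commit::

    import numpy as np
    import scipy.stats
    from sklearn.datasets import load_digits
    from sklearn.svm import SVC
    from sklearn.grid_search import RandomizedSearchCV

    digits = load_digits()

    # scipy.stats distributions draw from the global numpy random state
    # (see the warning above), so seed it for reproducible samples.
    np.random.seed(0)

    class LogUniform(object):
        # Hypothetical custom distribution: any object whose ``rvs`` method
        # returns one independent sample per call can stand in for
        # a scipy.stats distribution.
        def __init__(self, low, high):
            self.low, self.high = np.log(low), np.log(high)

        def rvs(self):
            return np.exp(np.random.uniform(self.low, self.high))

    # Distributions for continuous parameters, lists for discrete choices.
    param_distributions = {
        'C': LogUniform(1e-2, 1e3),            # custom rvs-providing object
        'gamma': scipy.stats.expon(scale=.1),  # scipy.stats distribution
        'kernel': ['rbf'],                     # sampled uniformly from the list
        'class_weight': ['auto', None],
    }

    # n_iter is the computation budget: the number of settings sampled.
    search = RandomizedSearchCV(SVC(), param_distributions, n_iter=20)
    search.fit(digits.data, digits.target)
    print(search.best_score_)
    print(search.best_params_)

Because ``C`` and ``gamma`` are drawn from continuous distributions, increasing ``n_iter`` refines the search rather than revisiting fixed grid points, which is exactly the property the documentation highlights.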

doc/tutorial/statistical_inference/model_selection.rst

Lines changed: 1 addition & 1 deletion

@@ -146,7 +146,7 @@ estimator during the construction and exposes an estimator API::
     >>> clf.fit(X_digits[:1000], y_digits[:1000]) # doctest: +ELLIPSIS
     GridSearchCV(cv=None,...
     >>> clf.best_score_
-    0.988991985997974
+    0.98899999999999999
     >>> clf.best_estimator_.gamma
     9.9999999999999995e-07
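For context, the doctest above exercises the estimator API that :class:`GridSearchCV` exposes; the construction itself sits outside the hunk. A hedged reconstruction, assuming the tutorial's digits data and a gamma grid (suggested by the ``best_estimator_.gamma`` check)::

    import numpy as np
    from sklearn import datasets, svm
    from sklearn.grid_search import GridSearchCV

    digits = datasets.load_digits()
    X_digits, y_digits = digits.data, digits.target

    # A gamma grid is assumed because the doctest inspects
    # best_estimator_.gamma; the real construction is above this hunk.
    gammas = np.logspace(-6, -1, 10)
    clf = GridSearchCV(estimator=svm.SVC(C=1), param_grid=dict(gamma=gammas))
    clf.fit(X_digits[:1000], y_digits[:1000])
    print(clf.best_score_)            # the value the doctest checks
    print(clf.best_estimator_.gamma)  # the selected gamma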

doc/whats_new.rst

Lines changed: 11 additions & 0 deletions

@@ -35,6 +35,10 @@ Changelog
     attribute. Setting ``compute_importances=True`` is no longer required.
     By `Gilles Louppe`_.
 
+   - Added :class:`grid_search.RandomizedSearchCV` and
+     :class:`grid_search.ParameterSampler` for randomized hyperparameter
+     optimization. By `Andreas Müller`_.
+
    - :class:`LinearSVC`, :class:`SGDClassifier` and :class:`SGDRegressor`
      now have a ``sparsify`` method that converts their ``coef_`` into a
      sparse matrix, meaning stored models trained using these estimators

@@ -46,6 +50,13 @@ Changelog
    - Fixed bug in :class:`MinMaxScaler` causing incorrect scaling of the
      features for non-default ``feature_range`` settings. By `Andreas Müller`_.
 
+
+API changes summary
+-------------------
+
+   - :class:`grid_search.IterGrid` was renamed to
+     :class:`grid_search.ParameterGrid`.
+
    - Fixed bug in :class:`KFold` causing imperfect class balance in some
      cases. By `Alexandre Gramfort`_ and Tadej Janež.
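The renamed :class:`ParameterGrid` and the new :class:`ParameterSampler` can also be used on their own. A minimal sketch, with the signatures (dict input, ``n_iter`` budget) assumed from the names and prose above::

    import scipy.stats
    from sklearn.grid_search import ParameterGrid, ParameterSampler

    # ParameterGrid (formerly IterGrid) enumerates the full cross-product.
    for params in ParameterGrid({'kernel': ['linear', 'rbf'], 'C': [1, 10]}):
        print(params)  # four dicts in total

    # ParameterSampler draws a fixed number of random settings instead.
    dist = {'C': scipy.stats.expon(scale=100), 'kernel': ['linear', 'rbf']}
    for params in ParameterSampler(dist, n_iter=4):
        print(params)  # C sampled from the distribution, kernel uniform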

examples/grid_search_digits.py

Lines changed: 1 addition & 1 deletion

@@ -59,7 +59,7 @@
     print()
     print("Grid scores on development set:")
     print()
-    for params, mean_score, scores in clf.grid_scores_:
+    for params, mean_score, scores in clf.cv_scores_:
         print("%0.3f (+/-%0.03f) for %r"
               % (mean_score, scores.std() / 2, params))
     print()

examples/svm/plot_rbf_parameters.py

Lines changed: 2 additions & 2 deletions

@@ -105,8 +105,8 @@
 pl.axis('tight')
 
 # plot the scores of the grid
-# grid_scores_ contains parameter settings and scores
-score_dict = grid.grid_scores_
+# cv_scores_ contains parameter settings and scores
+score_dict = grid.cv_scores_
 
 # We extract just the scores
 scores = [x[1] for x in score_dict]
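Both example scripts unpack ``cv_scores_`` the same way, which implies each entry is a ``(params, mean_score, fold_scores)`` tuple. A self-contained sketch under that assumption::

    from sklearn.datasets import load_digits
    from sklearn.grid_search import GridSearchCV
    from sklearn.svm import SVC

    digits = load_digits()
    grid = GridSearchCV(SVC(), {'C': [1, 10], 'gamma': [0.001, 0.0001]})
    grid.fit(digits.data, digits.target)

    # Assumed entry layout, inferred from the loops in the diffs:
    # (params_dict, mean_validation_score, array_of_per_fold_scores).
    for params, mean_score, fold_scores in grid.cv_scores_:
        print("%0.3f (+/-%0.3f) for %r"
              % (mean_score, fold_scores.std() / 2, params))

    # Just the mean scores, as both example scripts do:
    scores = [mean for _, mean, _ in grid.cv_scores_]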

examples/svm/plot_svm_scale_c.py

Lines changed: 1 addition & 1 deletion

@@ -131,7 +131,7 @@
                     cv=ShuffleSplit(n=n_samples, train_size=train_size,
                                     n_iter=250, random_state=1))
     grid.fit(X, y)
-    scores = [x[1] for x in grid.grid_scores_]
+    scores = [x[1] for x in grid.cv_scores_]
 
     scales = [(1, 'No scaling'),
               ((n_samples * train_size), '1/n_samples'),

0 commit comments