Support numpy.random.Generator and/or BitGenerator for random number generation #16988
Description
Describe the workflow you want to enable
I'd like to use a Generator or BitGenerator with scikit-learn where I'd otherwise use RandomState or a seed int.
For example:
import numpy as np
bit_generator = np.random.PCG64(seed=0)
generator = np.random.Generator(bit_generator)
and then use this for random_state= in scikit-learn:
from sklearn.datasets import make_classification
from sklearn.model_selection import ShuffleSplit
from sklearn.svm import LinearSVC
X, y = make_classification(random_state=generator) # or my bit_generator here
classifier = LinearSVC(random_state=generator)
cv = ShuffleSplit(random_state=generator)
This fails because these functions and classes expect random_state to be None, an int seed, or a RandomState object. The specific trigger is check_random_state(random_state), which raises a ValueError for any other type.
Describe your proposed solution
This would require:
- changing code to allow Generator or BitGenerator as acceptable values for random_state=.. in every function and class constructor that accepts random_state.
- changing check_random_state() to allow Generator and/or BitGenerator objects.
- adding tests for using Generator or BitGenerator with classes or functions that consume random_state (similar to the existing tests for seed int or RandomState objects).
- changing any internal code that assumes RandomState methods that aren't available with Generator (e.g. rand, randn).
- maybe switching to Generator instead of RandomState by default, when a seed int is given.
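The check_random_state() change in the list above could look roughly like the following. This is a minimal sketch, not scikit-learn's implementation: the name check_random_state_extended is hypothetical, and passing a Generator through unchanged while wrapping a BitGenerator in a Generator are assumptions made for illustration.

```python
import numbers

import numpy as np


def check_random_state_extended(seed):
    """Hypothetical variant of check_random_state (illustration only).

    Accepts None, an int, a RandomState, a Generator, or a BitGenerator.
    """
    if seed is None:
        # Simplification: return a fresh, OS-seeded RandomState.
        return np.random.RandomState()
    if isinstance(seed, numbers.Integral):
        # Existing behaviour: an int seeds a legacy RandomState.
        return np.random.RandomState(seed)
    if isinstance(seed, (np.random.RandomState, np.random.Generator)):
        # Already a usable random number source: pass it through.
        return seed
    if isinstance(seed, np.random.BitGenerator):
        # Assumption: wrap a bare BitGenerator in the newer Generator API.
        return np.random.Generator(seed)
    raise ValueError(f"{seed!r} cannot be used as a random_state")


rng = check_random_state_extended(np.random.PCG64(0))
print(type(rng).__name__)
```

Callers that receive a Generator from such a helper would still need to avoid RandomState-only methods, which is the fourth item in the list.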
Describe alternatives you've considered, if relevant
The scope could include either or both of BitGenerator or Generator.
It might be easiest to allow only BitGenerator, and not Generator.
- This allows flexibility.
- Users have control over seed and PRNG algorithm.
- This is easier to implement (can be treated just like a seed int value). BitGenerator can be given to RandomState, and I think it then produces the same values as Generator.
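As a rough illustration of the last point, the sketch below passes a BitGenerator to both RandomState and Generator and prints uniform draws from each for comparison. It only shows that both constructors accept a BitGenerator. Generator uses different algorithms than RandomState for some distributions (for example normal), so the two should not be assumed to produce identical values in general.

```python
import numpy as np

# Same BitGenerator type and seed, wrapped in the legacy and the newer interface.
legacy = np.random.RandomState(np.random.PCG64(0))
modern = np.random.Generator(np.random.PCG64(0))

# Uniform draws in [0, 1) from each wrapper, printed side by side for comparison.
legacy_draws = legacy.random_sample(3)
modern_draws = modern.random(3)
print("RandomState:", legacy_draws)
print("Generator:  ", modern_draws)
```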
Additional context
NumPy v1.17 added the numpy.random.Generator (docs) interface for random number generation.
Overview:
- Generator is similar to RandomState, but enables different PRNG algorithms
- BitGenerator (docs) encapsulates the PRNG and seed value, e.g. PCG64(seed=0)
- RandomState "is considered frozen" and uses "the slow Mersenne Twister" by default (docs)
- RandomState can work with non-Mersenne BitGenerator objects
- More info in NEP-19, the design document from NumPy.
The API for Generator and BitGenerator looks like:
from numpy import random
bit_generator = random.PCG64(seed=0) # PCG64 is a BitGenerator subclass
generator = random.Generator(bit_generator)
generator.uniform(low=0.0, high=1.0, size=3) # API is similar to RandomState
# there's also this, for making a PCG64-backed Generator
generator = random.default_rng(seed=0)
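Although the two APIs are similar, some RandomState method names do not exist on Generator, which is why internal code may need changes. The sketch below shows the Generator counterparts of rand, randn and randint as I understand them; the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# RandomState.rand(3, 2) corresponds to Generator.random((3, 2)).
uniform_draws = rng.random((3, 2))
# RandomState.randn(3, 2) corresponds to Generator.standard_normal((3, 2)).
normal_draws = rng.standard_normal((3, 2))
# RandomState.randint(0, 10, size=5) corresponds to Generator.integers(0, 10, size=5).
integer_draws = rng.integers(0, 10, size=5)

# Legacy method names that are absent from Generator.
missing = [name for name in ("rand", "randn", "randint") if not hasattr(rng, name)]
print(missing)
```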