Description
Describe the workflow you want to enable
Since NumPy 1.17.0, the np.random.RandomState constructor accepts a BitGenerator instance, such as the ._bit_generator attribute of a np.random.Generator. This is useful for those who use np.random.Generator in their code and want to share the same bit generator with sklearn's estimators. Currently this is not possible:
import numpy as np
from sklearn.datasets import make_classification
from sklearn.manifold import TSNE
X, y = make_classification(n_samples=150, n_features=5, n_informative=5,
                           n_redundant=0, n_repeated=0, n_classes=3,
                           n_clusters_per_class=1,
                           weights=[0.01, 0.05, 0.94],
                           class_sep=0.8, random_state=0)
rng = np.random.default_rng(12345)
tsne = TSNE()
# some piece of code here
# then later we use our own rng to set the seed of `tsne`
# notice `_bit_generator` used here, which is compatible with RandomState
tsne.set_params(random_state=rng._bit_generator)
tsne.fit_transform(X, y)
This leads to the error:
File "/home/python3.6/site-packages/sklearn/manifold/_t_sne.py", line 932, in fit_transform
embedding = self._fit(X)
File "/home/python3.6/site-packages/sklearn/manifold/_t_sne.py", line 728, in _fit
random_state = check_random_state(self.random_state)
File "/home/python3.6/site-packages/sklearn/utils/validation.py", line 944, in check_random_state
' instance' % seed)
ValueError: <numpy.random.pcg64.PCG64 object at 0x7ffa3ab471b8> cannot be used to seed a numpy.random.RandomState instance
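For reference, here is a minimal sketch of the NumPy behaviour this request relies on, assuming NumPy >= 1.17. It uses the public `bit_generator` property rather than the private `_bit_generator` attribute; the variable names are only illustrative.

```python
import numpy as np

# A Generator and the bit generator that drives it (PCG64 by default).
rng = np.random.default_rng(12345)
bit_gen = rng.bit_generator  # public property exposing `rng._bit_generator`

# RandomState accepts a BitGenerator instance in place of an integer seed.
legacy_rng = np.random.RandomState(bit_gen)

# The wrapper is an ordinary RandomState instance.
assert isinstance(legacy_rng, np.random.RandomState)

# Draws made through the wrapper come from the wrapped bit generator.
sample = legacy_rng.random_sample(3)
```

Since check_random_state already returns RandomState instances unchanged, wrapping the bit generator by hand like this and passing `legacy_rng` as `random_state` should work as a stopgap, though it is less convenient than passing the bit generator directly.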
Describe your proposed solution
I propose we add a conditional in check_random_state that supports an instance of BitGenerator, see:
scikit-learn/sklearn/utils/validation.py
Lines 926 to 944 in 2beed55
def check_random_state(seed):
    """Turn seed into a np.random.RandomState instance

    Parameters
    ----------
    seed : None, int or instance of RandomState
        If seed is None, return the RandomState singleton used by np.random.
        If seed is an int, return a new RandomState instance seeded with seed.
        If seed is already a RandomState instance, return it.
        Otherwise raise ValueError.
    """
    if seed is None or seed is np.random:
        return np.random.mtrand._rand
    if isinstance(seed, numbers.Integral):
        return np.random.RandomState(seed)
    if isinstance(seed, np.random.RandomState):
        return seed
    raise ValueError('%r cannot be used to seed a numpy.random.RandomState'
                     ' instance' % seed)
Something like:
supported_bitgenerators = {'PCG64', 'SFC64', 'Philox', ...}
def check_random_state(seed):
    ...
    if seed.__class__.__name__ in supported_bitgenerators:
        return np.random.RandomState(seed)  # should work if numpy>=1.17.0
    ...
Describe alternatives you've considered, if relevant
I know there is an existing issue about supporting the new NumPy Generator interface, but I feel this request is slightly different since it does not attempt to replace RandomState.
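To make the proposed conditional concrete, here is a rough, self-contained sketch. It is not the actual scikit-learn implementation: the function name `check_random_state_sketch` is hypothetical, and instead of matching class names against a set it uses an `isinstance` check against `np.random.BitGenerator`, which assumes NumPy >= 1.17.

```python
import numbers

import numpy as np


def check_random_state_sketch(seed):
    """Hypothetical variant of check_random_state that also accepts a BitGenerator."""
    if seed is None or seed is np.random:
        return np.random.mtrand._rand
    if isinstance(seed, numbers.Integral):
        return np.random.RandomState(seed)
    if isinstance(seed, np.random.RandomState):
        return seed
    # Assumption: NumPy >= 1.17, where BitGenerator is the common base class
    # of PCG64, SFC64, Philox and MT19937.
    if isinstance(seed, np.random.BitGenerator):
        return np.random.RandomState(seed)
    raise ValueError('%r cannot be used to seed a numpy.random.RandomState'
                     ' instance' % seed)


# Usage: hand over the bit generator of an existing Generator.
rng = np.random.default_rng(12345)
random_state = check_random_state_sketch(rng.bit_generator)
```

The `isinstance` check avoids maintaining a list of supported class names, but it would need a guard for older NumPy versions where `np.random.BitGenerator` does not exist.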