
ENH: Should there be an rng.clone() or similar? #24086

@seberg

Description

Proposed new feature or change:

While discussing with @betatim, it came up that sklearn sometimes wants to re-use the same random number state (e.g. to split data the same way, potentially many times).

I am not sure whether that is a good idea, but I assume there is a need for it. So this issue is to track the idea and get feedback on whether we should add a .clone() or .copy() method to the rng/bit_generator.

Right now, the best pattern I could think of is to basically do:

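    import copy

    import numpy as np

    class Estimator:  # hypothetical class that owns the rng
        def __init__(self, rng=None):
            # Maybe spawn a new child generator to have one for ourselves
            # (exact logic still open; Generator.spawn needs NumPy >= 1.25).
            self._blueprint_rng, = np.random.default_rng(rng).spawn(1)

        def method(self):
            # Work on a copy so every call starts from the same state.
            rng = copy.deepcopy(self._blueprint_rng)
            # use rng, for example:
            return rng.permutation(10)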

We have to work with a copy because otherwise threading breaks: concurrent calls would share, and advance, a single generator. deepcopy works, and it might be nice to implement __deepcopy__ to reduce the overhead a bit (I think this could easily be <1 µs rather than >10 µs).
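As a quick sanity check of the deepcopy approach, the sketch below copies a "blueprint" Generator twice and confirms both copies produce the same permutation, then roughly times one copy. The seed `12345`, the name `blueprint`, and the use of `permutation(10)` as a stand-in for a data split are arbitrary choices for illustration; timings depend on the machine and NumPy version.

```python
import copy
import timeit

import numpy as np

# Blueprint generator that is never advanced directly (seed is arbitrary).
blueprint = np.random.default_rng(12345)

# Each use works on its own deep copy, so the blueprint state is untouched.
first = copy.deepcopy(blueprint).permutation(10)
second = copy.deepcopy(blueprint).permutation(10)

# Both copies start from the same state, so the "splits" are identical.
print(np.array_equal(first, second))  # True

# Rough per-copy overhead on this machine (numbers vary by system).
n = 10_000
seconds = timeit.timeit(lambda: copy.deepcopy(blueprint), number=n)
print(f"deepcopy: {seconds / n * 1e6:.1f} µs per copy")
```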

But the other question is whether copy.deepcopy() is an obvious enough solution to begin with, or whether an explicit method that makes this easy would be better.
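For comparison, here is a sketch of what an explicit clone could do using only the existing public `bit_generator.state` getter/setter. `clone_generator` is a hypothetical name, not a NumPy function; it creates a fresh bit generator of the same class and overwrites its state with a copy of the original's.

```python
import numpy as np


def clone_generator(rng: np.random.Generator) -> np.random.Generator:
    """Hypothetical helper (not a NumPy API): copy rng's current state.

    Sketch of what an explicit ``rng.clone()`` could do, built only on the
    public ``bit_generator.state`` getter/setter.
    """
    bit_gen = rng.bit_generator
    # Fresh bit generator of the same class (seeded from OS entropy) ...
    new_bit_gen = type(bit_gen)()
    # ... whose state is then overwritten with a copy of the original's.
    new_bit_gen.state = bit_gen.state
    return np.random.Generator(new_bit_gen)


rng = np.random.default_rng(2023)
clone = clone_generator(rng)
# The clone reproduces the original's upcoming draws.
print(np.array_equal(clone.integers(0, 100, size=5), rng.integers(0, 100, size=5)))  # True
```

One caveat of this state-only approach: the clone's seed sequence is the fresh one created by `type(bit_gen)()`, not the original's, so calling `spawn()` on the clone does not give the same children as the original would. A built-in method could decide explicitly whether that should be copied too.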

Or maybe I am missing a nicer pattern for such a "rewinding" rng?
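One possible alternative "rewinding" pattern (not something proposed in the discussion, just a sketch) is to keep a `SeedSequence` instead of a Generator and build a fresh Generator from it on every call. `RewindingSplitter` and `split` are hypothetical names; the entropy drawn in `__init__` lets it accept `None`, an int, or a Generator like the pattern above.

```python
import numpy as np


class RewindingSplitter:
    """Hypothetical class: every call replays the same random stream.

    Instead of copying a Generator, it keeps a SeedSequence and builds a
    fresh Generator from it on each call.
    """

    def __init__(self, rng=None):
        # Draw entropy once from whatever was passed (None, int, Generator).
        entropy = int(np.random.default_rng(rng).integers(0, 2**62))
        self._seed_seq = np.random.SeedSequence(entropy)

    def split(self, n_samples):
        # A new Generator per call: same seed, same stream, and no
        # generator object shared between threads.
        rng = np.random.default_rng(self._seed_seq)
        return rng.permutation(n_samples)


splitter = RewindingSplitter(rng=0)
print(np.array_equal(splitter.split(10), splitter.split(10)))  # True
```

The trade-off is that each call pays for seeding a new bit generator rather than for copying state, so which is cheaper would need measuring.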
