Conversation
Performance benchmarks:
As indicated in the fixme, I am not sure yet that this is the right approach. I am not even sure whether I want to allow a generator or SeedSequence as a valid argument.
Thanks for the review! Since the main uncertainty seems to be around the API design, I’d love to understand your preference for how Mesa should treat RNG inputs. One way I’ve been thinking about it is in terms of intent:
Does this framing make sense to you, or is Mesa aiming for a different mental model here? Once I understand the direction you prefer, I’m happy to adjust the PR accordingly, or leave it as-is if this isn’t something you’d like to move forward with right now.
My assumption is that most users are at least somewhat familiar with the idea of a seed as an integer used to initialize the random number generator. However, NumPy has moved beyond this, e.g., through its spawn method. I am inclined to favor the idea of the second bullet. That said, every stochastic realization of a scenario must remain reproducible. So, if we use spawn, how can we reproduce those results?
spawn is deterministic given the same root seed and spawning order, so as long as we know the root seed and the run ID (index), we can reconstruct the exact generator for any specific run without re-running the others. Here is an example to show this:

```python
import numpy as np

# The user runs a batch of 5 models
seed = 42
root_A = np.random.default_rng(seed)
children_A = root_A.spawn(5)

# We record the result of Run #2 (index 2)
value_run_2_original = children_A[2].random()
print(f"Original Run 2: {value_run_2_original}")

# Later, we want to debug JUST Run #2.
# We don't need to run 0, 1, 3, or 4.
run_id = 2
root_B = np.random.default_rng(seed)

# Since spawn is deterministic, the generator at index 2 is identical to scenario A
rng_reproduced = root_B.spawn(5)[run_id]
value_run_2_reproduced = rng_reproduced.random()
print(f"Reproduced Run 2: {value_run_2_reproduced}")
```

Output:
Ok, but that is still not particularly user-friendly. Imagine that you have done a bunch of experiments. There is a weird outlier for one of the realizations (i.e., seeds) of a given experiment. You now want to parameterize the model to have a closer look at that one realization. What do you need to do? In this case, you would need the
I agree that if a user spots an outlier in the data, they shouldn't have to manually calculate which seed generated it. However, I realized that Mesa’s batch_run actually handles this for us already: the output of batch_run includes an

If you want, I can share an example showing how batch_run + iteration already allows reproducing a single realization using generators, without rerunning the rest.
closing this in favor of #3493 |
this PR resolves the

```python
# fixme we might want to spawn a generator (in essence a copy)
```

Changes: