
fix-spawn-generator #3195

Closed
champ-byte wants to merge 1 commit into mesa:main from champ-byte:fix-spawn-generator

@champ-byte
Contributor

This PR resolves the fixme comment "# fixme we might want to spawn a generator (in essence a copy)".

Changes:

  • Modified initialization to spawn a child generator when a Generator instance is passed
  • Updated tests to check the assertions

@github-actions

Performance benchmarks:

| Model | Size | Init time [95% CI] | Run time [95% CI] |
|---|---|---|---|
| BoltzmannWealth | small | 🟢 -4.6% [-5.5%, -3.6%] | 🔵 -0.5% [-0.7%, -0.3%] |
| BoltzmannWealth | large | 🔵 +2.9% [-7.4%, +14.2%] | 🟢 -14.3% [-17.1%, -11.5%] |
| Schelling | small | 🔵 -3.4% [-4.1%, -2.6%] | 🔵 -1.7% [-2.5%, -1.1%] |
| Schelling | large | 🔵 +2.6% [-4.5%, +9.2%] | 🔵 -3.7% [-6.9%, -0.8%] |
| WolfSheep | small | 🔵 +0.9% [-2.1%, +3.9%] | 🔵 -1.0% [-1.5%, -0.4%] |
| WolfSheep | large | 🔵 +3.0% [-9.3%, +14.9%] | 🔴 +6.2% [+3.8%, +9.1%] |
| BoidFlockers | small | 🔵 -0.4% [-0.8%, -0.1%] | 🔵 +0.6% [+0.4%, +0.8%] |
| BoidFlockers | large | 🔵 +2.9% [+0.7%, +5.6%] | 🔵 +0.7% [+0.5%, +1.0%] |

@quaquel
Member

quaquel commented Jan 23, 2026

As indicated in the fixme, I am not yet sure this is the right approach. I am not even sure whether I want to allow a Generator or SeedSequence as a valid argument.

@champ-byte
Contributor Author

Thanks for the review.

Since the main uncertainty seems to be around the API design, I’d love to understand your preference for how Mesa should treat RNG inputs.

One way I’ve been thinking about it is in terms of intent:

  • If a Scenario is meant to act as a reproducible blueprint, then accepting a SeedSequence seems attractive. It keeps Model() deterministic, and users who want multiple runs can explicitly spawn child sequences beforehand. This also aligns with the fact that Mesa already defines SeedSequence as a valid input type.

  • If instead the Scenario is closer to a configured experiment runner, then accepting a Generator (and spawning a child for isolation) feels more convenient, since variation across runs is handled automatically while still avoiding shared state.

Does this framing make sense to you, or is Mesa aiming for a different mental model here?

Once I understand the direction you prefer, I’m happy to adjust the PR accordingly or leave it as-is if this isn’t something you’d like to move forward with right now.

@quaquel
Member

quaquel commented Jan 23, 2026

My assumption is that most users are at least somewhat familiar with the idea of a seed as an integer used to initialize the random number generator. However, NumPy has moved beyond this, e.g., through its spawn method. I am inclined to favor the idea of the second bullet. However, every stochastic realization of a scenario must remain reproducible. So, if we use spawn, how can we reproduce those results?

@champ-byte
Contributor Author

spawn is deterministic given the same root seed and spawning order, so as long as we know the Root Seed and the Run ID (index), we can reconstruct the exact generator for any specific run without re-running the others.

Here is an example to show this:

import numpy as np
# The user runs a batch of 5 models
seed = 42
root_A = np.random.default_rng(seed)
children_A = root_A.spawn(5)

# We record the result of Run #2 (Index 2)
value_run_2_original = children_A[2].random()
print(f"Original Run 2: {value_run_2_original}")

# Later, we want to debug just run #2.
# We don't need to run 0, 1, 3, or 4.
run_id = 2
root_B = np.random.default_rng(seed)
# Since spawn is deterministic, the generator at index 2 is identical to children_A[2].
rng_reproduced = root_B.spawn(5)[run_id]

value_run_2_reproduced = rng_reproduced.random()
print(f"Reproduced Run 2: {value_run_2_reproduced}")

Output:
Original Run 2: 0.07123920291270869
Reproduced Run 2: 0.07123920291270869

@quaquel
Member

quaquel commented Jan 27, 2026

> spawn is deterministic given the same root seed and spawning order, so as long as we know the Root Seed and the Run ID (index), we can reconstruct the exact generator for any specific run without re-running the others.

Ok, but that is still not particularly user friendly. Imagine that you have done a bunch of experiments. There is a weird outlier for one of the realizations (i.e., seeds) of a given experiment. You now want to parameterize the model to have a closer look at that one realization. What do you need to do? In this case, you would need the Scenario, which is easy, but you would also need to know the spawn index or something equivalent.

@champ-byte
Contributor Author

I agree that if a user spots an outlier in the data, they shouldn't have to manually calculate which seed generated it.

However, I realized that Mesa’s batch_run actually handles this for us already.

The output of batch_run includes an iteration column. This column corresponds 1:1 to the index of the RNG in the input list (i.e., the spawn index).

If you want, I can share an example showing how batch_run + iteration already allows reproducing a single realization using generators, without rerunning the rest.

@quaquel
Member

quaquel commented Mar 10, 2026

Closing this in favor of #3493.

@quaquel quaquel closed this Mar 11, 2026
Labels

enhancement Release notes label
