Replies: 7 comments 15 replies
-
I have an initial concern that I also raised in the original PR (#3103), to which you replied. In short: if a Scenario can be mutated after it is used inside a model, throwing a list of Scenario objects into an executor becomes problematic, and it also prevents Scenario reuse.
This feels like a natural fit for a single-file sweep backend. We could extend the SQLDataRecorder (or build a wrapper around it) that creates a centralized scenarios table storing the Scenario parameters under a unique run_id; the time-series/result tables would then just get a run_id (here, replication_id) foreign key. For the much richer outputs you mentioned (like maps over time), can we consider HDF5 or Xarray/NetCDF as a backend target? They are built explicitly for storing complex N-dimensional arrays tightly bound with metadata attributes in a single file.
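A minimal sketch of the schema idea above, using the stdlib `sqlite3` module. Table and column names here (`scenarios`, `model_data`, `happy_agents`) are illustrative assumptions, not the actual SQLDataRecorder schema:

```python
import sqlite3

# Central "scenarios" table keyed by run_id; result tables reference it.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE scenarios (
        run_id INTEGER PRIMARY KEY,
        scenario_id INTEGER,
        replication_id INTEGER,
        density REAL,
        minority_pc REAL,
        homophily REAL
    );
    CREATE TABLE model_data (
        run_id INTEGER REFERENCES scenarios(run_id),
        step INTEGER,
        happy_agents INTEGER
    );
    """
)
conn.execute("INSERT INTO scenarios VALUES (1, 0, 0, 0.8, 0.3, 3)")
conn.execute("INSERT INTO model_data VALUES (1, 0, 120)")

# Joining on run_id recovers the parameters that produced each result row.
row = conn.execute(
    "SELECT s.density, m.happy_agents FROM model_data m "
    "JOIN scenarios s ON s.run_id = m.run_id"
).fetchone()
```

The point of the join at the end is that a single file then carries both the experiment design and its outputs.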
-
I share this concern. Conceptually, I want scenarios to be the equivalent of a frozen dataclass. The current implementation is less restrictive: it only disallows modification after the scenario has been used inside a model. I am inclined to change this to be fully frozen upon instantiation. I have to see whether this means I can just use a frozen dataclass, or whether I have to implement the freezing myself.
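For illustration, this is what "fully frozen upon instantiation" would look like with a plain frozen dataclass; the field names are hypothetical Schelling parameters, not Mesa's actual Scenario definition:

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Scenario:
    density: float
    minority_pc: float
    homophily: int


s = Scenario(density=0.8, minority_pc=0.3, homophily=3)

# Any mutation after instantiation raises FrozenInstanceError.
try:
    s.density = 0.5
    frozen = False
except FrozenInstanceError:
    frozen = True
```

Whether this suffices depends on whether Scenario needs any mutable internal state (e.g. an rng) that a frozen dataclass would forbid.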
-
Love this update. This has been a headache. RE: data management: if I understand the problem correctly, this feels a little like we don't want too much in core Mesa, but we also want this handled by core Mesa. I want to propose a middle ground: we could define a storage protocol, and Mesa ships with one default implementation for new users (e.g., SQLite). More experienced users can swap it out for more advanced setups. I am not married to this idea; I am throwing it out as an option or jumping-off point.
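A sketch of what such a storage protocol could look like with `typing.Protocol`; the name `SweepStorage` and its method signatures are assumptions for illustration, not a proposed final API:

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class SweepStorage(Protocol):
    """Anything with these methods can serve as a sweep backend."""

    def store(
        self, scenario_id: int, replication_id: int, data: dict[str, Any]
    ) -> None: ...

    def close(self) -> None: ...


class InMemoryStorage:
    """Trivial implementation satisfying the protocol, for illustration.
    A SQLite-backed default would implement the same two methods."""

    def __init__(self):
        self.rows = []

    def store(self, scenario_id, replication_id, data):
        self.rows.append((scenario_id, replication_id, data))

    def close(self):
        pass


storage = InMemoryStorage()
storage.store(0, 0, {"happy": 42})
```

Because it is a structural protocol, user-provided backends need no inheritance from Mesa classes; they just implement the methods.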
-
I have a concern about how flexible the execution logic should be in the proposed `run_scenario` function:

```python
model = model_cls(scenario=scenario)
model.run_for(n_steps)
data = model.data_recorder.get_data()
```

But models are often executed in different ways (`run_for`, stepping loops, stop conditions, etc.), and experiments also vary in what they want to extract as the result (full recorder output vs. summary statistics or custom metrics). It might be better for `run_scenario` to separate the execution logic from the result extraction, for example by allowing user-provided callables for one or both. That way the worker function stays general enough for different experiment patterns while still giving Mesa a standard way to execute a scenario.
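One way to sketch the callable-based separation, with a stub model standing in for a real Mesa model (`run_scenario`, `StubModel`, and the stop condition are all illustrative names, not Mesa API):

```python
def run_scenario(model_cls, scenario, execute, extract):
    """Worker function: construction is fixed, execution and
    result extraction are supplied by the user."""
    model = model_cls(scenario=scenario)
    execute(model)
    return extract(model)


class StubModel:
    """Minimal stand-in for a Mesa model, for illustration only."""

    def __init__(self, scenario=None):
        self.scenario = scenario
        self.steps_taken = 0

    def step(self):
        self.steps_taken += 1


def run_until_ten(model):
    # A stepping loop with a stop condition instead of a fixed run_for.
    while model.steps_taken < 10:
        model.step()


result = run_scenario(
    StubModel, None, execute=run_until_ten, extract=lambda m: m.steps_taken
)
```

The same worker then covers fixed-horizon runs, stop conditions, and custom metrics simply by swapping the two callables.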
-
With #3493 merged, let's see where we stand. It is now trivial to generate scenarios (see below). What is still missing are the remaining pieces outlined in the opening post.

```python
from __future__ import annotations

from concurrent.futures import ProcessPoolExecutor

import scipy.stats

# Schelling, SchellingScenario, and the Model/Scenario types come from
# the user's model code and Mesa.


class RunSpec:
    def __init__(
        self,
        model_class: type[Model],
        n_steps: int,
    ):
        self.model_class = model_class
        self.n_steps = n_steps

    def __call__(self, scenario: Scenario) -> tuple[int, int, dict[str, pd.DataFrame]]:
        model = self.model_class(scenario=scenario)
        model.run_for(self.n_steps)
        data = {
            name: model.recorder.get_table_dataframe(name)
            for name in model.recorder.storage
        }
        return scenario.scenario_id, scenario.replication_id, data


if __name__ == "__main__":
    # draw 1000 samples
    samples = scipy.stats.qmc.LatinHypercube(d=3).random(1000)
    scenarios = SchellingScenario.from_numpy(
        samples,
        parameter_names=["density", "minority_pc", "homophily"],
        replications=10,
    )
    run_schelling = RunSpec(Schelling, 100)
    with ProcessPoolExecutor() as executor:
        results = list(executor.map(run_schelling, scenarios))
```
-
Building on the RunSpec idea, I think this can be the default execution unit: a small object that takes a Scenario, runs a model, and returns a result, without handling replications or storage.

```python
class RunSpec:
    def __init__(self, model_class, steps: int = 100):
        self.model_class = model_class
        self.steps = steps

    def execute(self, model):
        model.run_for(self.steps)

    def extract(self, model):
        return model.data_registry

    def __call__(self, scenario):
        model = self.model_class(scenario=scenario)
        self.execute(model)
        data = self.extract(model)
        return scenario.scenario_id, scenario.replication_id, data
```

The intent here is that this replaces the current `run_scenario`.
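Because `execute` and `extract` are separate methods, users can override either one; below is a hypothetical subclass that runs step-by-step until a convergence flag instead of a fixed horizon. `RunSpec` here is a minimal self-contained stand-in for the class above, and `StubModel` with its `converged` attribute is purely illustrative:

```python
class RunSpec:
    """Minimal stand-in mirroring the RunSpec sketched above."""

    def __init__(self, model_class, steps: int = 100):
        self.model_class = model_class
        self.steps = steps

    def execute(self, model):
        model.run_for(self.steps)

    def extract(self, model):
        return model.ticks

    def __call__(self, scenario):
        model = self.model_class(scenario=scenario)
        self.execute(model)
        return self.extract(model)


class RunUntilConverged(RunSpec):
    """Override execute: step until a stop condition, capped at self.steps."""

    def execute(self, model):
        while not model.converged and model.ticks < self.steps:
            model.step()


class StubModel:
    def __init__(self, scenario=None):
        self.ticks = 0
        self.converged = False

    def step(self):
        self.ticks += 1
        if self.ticks >= 7:  # pretend the model settles after 7 steps
            self.converged = True

    def run_for(self, n):
        for _ in range(n):
            self.step()


result = RunUntilConverged(StubModel, 100)(None)
```

This keeps the worker signature (`scenario` in, result out) stable while letting the run_while/stop_condition variants live in subclasses.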
-
In my initial implementation of RunSpec in #3641, one thing I didn't account for is that none of the recorders store scenario_id or replication_id. They are single-run by design, so once results are aggregated across runs, there is no way to tell where each row came from unless that information is attached somewhere. So where should run identity live, given that it has to be added before aggregation? One option is to make extract responsible for tagging each dataset with scenario_id and replication_id, since it could be given access to both the model and the scenario; that would require passing scenario into extract. If tagging doesn't live there, it has to happen at the executor or storage layer instead; either way, it needs to be handled explicitly somewhere in the pipeline.
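To make the extract-level option concrete, here is a sketch of tagging each per-run dataframe before aggregation; `tag_results` is a hypothetical helper, not part of any recorder:

```python
import pandas as pd


def tag_results(data: dict[str, pd.DataFrame], scenario_id: int, replication_id: int):
    """Attach run identity to every table so rows stay traceable
    after the per-run outputs are concatenated."""
    for df in data.values():
        df["scenario_id"] = scenario_id
        df["replication_id"] = replication_id
    return data


# Two replications of the same scenario, with toy recorder output.
runs = [
    tag_results({"agents": pd.DataFrame({"wealth": [1, 2]})}, 0, 0),
    tag_results({"agents": pd.DataFrame({"wealth": [3, 4]})}, 0, 1),
]

# Aggregation no longer loses run identity.
combined = pd.concat([r["agents"] for r in runs], ignore_index=True)
```

The same tagging could equally live in the executor or storage layer; the key point is that it happens exactly once, before concatenation.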
-
I have been thinking about a replacement for the batch_runner. As I indicated in Mesa 4 Goals, the current batchrunner does too much. In my view, the generation of samples and the (parallel) execution of these samples is not the responsibility of Mesa but can be handled by other libraries. For example, `scipy.stats.qmc` and `SALib` can be used to create samples, while `concurrent.futures` offers a `ProcessPoolExecutor` for parallel model execution, and this API is mimicked by e.g. `mpi4py` for support on HPC using MPI. However, what would still be needed inside Mesa to make this all work? In short, there are three things still needed: experiment representation including seed management for replications, a clear reusable specification of executing a single sample, and a helper function for storing a sample and its result.
1. Replication support on Scenario
In #3103, we added support for experiment representation via `Scenario`. However, `Scenario` does not yet provide strong support for replications. I suggest we expand `rng` inside `Scenario` to support spawning multiple replications of the same scenario. The basic API would be something along the lines of `scenario.spawn_replications(n)`. This can quite easily be implemented, but it requires some refactoring:

- `scenario.spawn_replications` internally uses `numpy.random.Generator.spawn` to spawn n independent generators and returns a list of copies of itself with just the generator changed and the replication_id specified. See also fix-spawn-generator #3195.
- `scenario_id` becomes a bit trickier than the current always-increment-on-instantiation.

2. A standard run_scenario function which specifies the logic of executing a single scenario
This is the worker function to be used by a `concurrent.futures` executor. There are two open questions here:

- How flexible should the execution logic be? The `run_for` approach is just one option; likely, we want to support a richer set of possibilities, including the still unresolved run_while/stop_condition problem.

3. Some helper function that takes the (scenario, result) tuple and turns this into something that can be stored in an appropriate back-end
When doing parameter sweeps, we ideally want to store the results to disk in an appropriate back-end. This means we need to take each replication and its results, process them so they can be easily stored, and store them. This can be as simple as turning the replication and the results into dictionaries, which can then be combined into several dataframes. But for larger outputs (e.g., maps, timeseries, maps over time), we might need something richer. Moreover, dataframes are just one possible data structure, which can be stored as, e.g., CSV files. Ideally, however, the experiments and all results are stored in a single file rather than spread across multiple files.
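The simple end of this spectrum can be sketched as follows: flatten (scenario, result) pairs into two dataframes, one for the design and one for the outputs. `to_frames` and the `Scenario` fields are illustrative assumptions, not a proposed Mesa API:

```python
from dataclasses import asdict, dataclass

import pandas as pd


@dataclass
class Scenario:
    """Toy scenario carrying run identity plus one parameter."""

    scenario_id: int
    replication_id: int
    density: float


def to_frames(runs):
    """runs: iterable of (scenario, result_dict) pairs.
    Returns (scenarios_df, results_df), both tagged with run identity,
    ready to be written to a single back-end file."""
    scenario_rows = [asdict(s) for s, _ in runs]
    data_frames = []
    for s, result in runs:
        for name, df in result.items():
            data_frames.append(
                df.assign(
                    table=name,
                    scenario_id=s.scenario_id,
                    replication_id=s.replication_id,
                )
            )
    return pd.DataFrame(scenario_rows), pd.concat(data_frames, ignore_index=True)


scenarios_df, results_df = to_frames(
    [
        (Scenario(0, 0, 0.8), {"model": pd.DataFrame({"step": [0, 1]})}),
        (Scenario(0, 1, 0.8), {"model": pd.DataFrame({"step": [0, 1]})}),
    ]
)
```

For richer outputs (maps over time), the same pairing would instead map onto labeled N-dimensional arrays in an HDF5/NetCDF-style single-file back-end.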