Mesa 4 goals #2972
Replies: 10 comments 69 replies
-
I broadly agree with these ideas with a few caveats.
-
Time
A thread to discuss time. In my vision, there is a single universal truth of time. That's kept in the model in
One complication: so much is built around
-
Removing
-
Performance
There have been previous discussions on making Mesa more performant. I believe that as part of the move to Mesa 4.0, we should conduct extensive profiling and testing to determine the overhead of the Mesa framework itself, while also exploring ways to reduce it. To be clear: in any non-trivial ABM in Python, the user-written code is highly likely to dominate the runtime. This is true even in our example models. For example, I have been running some tests on Schelling. Most of the runtime is in
Critical here is the word "reasonably". Two options have been previously discussed: using Cython and migrating parts of the codebase to Rust. I am open to exploring both. However, either will have profound ramifications: if we include compiled code, we need to completely overhaul how Mesa is built and distributed. Moreover, although I have used Cython and understand basic Rust, I am not a fluent Cython developer, and the steep learning curve of Rust has so far been an obstacle, as I simply don't have the time for it. So there is a skill issue on the maintainer side that also needs to be considered.
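As a concrete starting point for this kind of profiling, `cProfile` can break down where time goes in a model run. The sketch below uses a stand-in step loop rather than a real Mesa model, so that the technique is clear independent of any Mesa internals; in practice `run_model` would be replaced by the actual model's run method.

```python
import cProfile
import io
import pstats
import random


def agent_step(state):
    # Stand-in for user-written agent logic; in a real Mesa model this
    # would be each agent's step() method.
    return state + random.random()


def run_model(steps=1000, n_agents=100):
    # Stand-in for a full model run (e.g. Schelling for `steps` iterations).
    states = [0.0] * n_agents
    for _ in range(steps):
        states = [agent_step(s) for s in states]
    return states


profiler = cProfile.Profile()
profiler.enable()
run_model()
profiler.disable()

# Sort by cumulative time to see which layer (framework vs. user code)
# dominates; print only the top five entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)
print(stream.getvalue().splitlines()[0])  # e.g. "N function calls in X seconds"
```

Running this against the real example models would give a baseline for how much of the total runtime is attributable to Mesa itself versus user code.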
-
Mesa exceptions
For Mesa 4.0, it might also be good to make a decision on custom Mesa exceptions. Currently, we rely heavily on the various built-in exceptions or on generic Exceptions with more informative messages. However, I don't know the best practice in Python for how to proceed on this.
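One common Python convention is a small exception hierarchy rooted in a single library base class. The class names below are hypothetical, not an agreed Mesa API; the sketch only illustrates the pattern:

```python
class MesaError(Exception):
    """Hypothetical base class for all Mesa-specific errors.

    A single root lets users catch any framework error with one
    `except MesaError:` clause, while built-in exceptions (TypeError,
    ValueError) remain reserved for ordinary argument misuse.
    """


class AgentError(MesaError):
    """Hypothetical: problems with agent lifecycle or membership."""


class SpaceError(MesaError):
    """Hypothetical: invalid spatial operations, e.g. out-of-bounds moves."""


# Usage: framework errors can be caught separately from user-code bugs.
try:
    raise SpaceError("position (12, 40) is outside the grid")
except MesaError as err:
    print(f"caught: {err}")
```

This is the approach taken by libraries like requests (`requests.RequestException`) and pandas (`pandas.errors`), so it would at least be familiar to users.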
-
Keyword-only arguments
We could consider using keyword-only arguments (using the bare `*` separator in signatures). While this can be perceived as a bit defensive, it does make things nicely explicit and gives one obvious way to do things. Methods with several parameters, or where the order isn't obvious, are natural candidates. Of course, we have to find some balance here. See PEP 3102.
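To make the PEP 3102 mechanism concrete: everything after a bare `*` in a signature must be passed by name. The function below is illustrative only, not an actual Mesa API:

```python
def create_agents(model, n, *, behavior="random", seed=None):
    """Hypothetical factory: `behavior` and `seed` are keyword-only."""
    return [(model, behavior, seed) for _ in range(n)]


# Keyword use is explicit and self-documenting at the call site:
agents = create_agents("my_model", 3, behavior="greedy")
print(len(agents))  # → 3

# Passing a keyword-only argument positionally raises a TypeError:
try:
    create_agents("my_model", 3, "greedy")
except TypeError:
    print("rejected positional call")
```

The call-site readability is the main payoff: `create_agents(m, 100, behavior="greedy")` is unambiguous in a way that `create_agents(m, 100, "greedy")` is not.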
-
Mesa signals
#2291 added experimental support for signals to Mesa. However, at present, they are not part of the core functionality. For Mesa 4, I would like to make them a core part of Mesa that users can hook into. To be clear: I don't advocate shifting Mesa to a fully reactive programming style. Rather, signals are simply another convenient mechanism available to users. It will always be up to the user to decide whether to use them and to profile their models for the performance implications.
Why? Signals are very convenient for creating loose coupling between model parts. They also make it easier to use reactive programming patterns where appropriate. Currently, it is very difficult to implement data collection on a subset of agents whose membership varies over the run. Likewise, parts of the model may be relevant only if certain conditions are met; at present, such a condition must be checked at every model step. Signals might make it easier (and potentially faster) to implement these kinds of things.
What signals to add? At a minimum, I would like to add signals to
Additional suggestions for places to add signals are welcome.
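To illustrate the data-collection use case above with a minimal observer pattern (this `Signal` class is a stand-in sketch, not Mesa's actual signals API from #2291): a varying subset of agents can be tracked by reacting to membership events instead of re-scanning all agents every step.

```python
class Signal:
    """Minimal illustrative signal; not Mesa's actual signals API."""

    def __init__(self):
        self._subscribers = []

    def connect(self, callback):
        self._subscribers.append(callback)

    def emit(self, *args, **kwargs):
        for callback in self._subscribers:
            callback(*args, **kwargs)


# Hypothetical use: maintain the set of infected agents via signals,
# rather than checking every agent's state at every model step.
infected_ids = set()
agent_infected = Signal()
agent_recovered = Signal()
agent_infected.connect(infected_ids.add)
agent_recovered.connect(infected_ids.discard)

agent_infected.emit(1)
agent_infected.emit(2)
agent_recovered.emit(1)
print(infected_ids)  # → {2}
```

The loose coupling is the point: the data collector subscribes to the events and never needs a reference to, or per-step scan of, the full agent population.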
-
Proposal: Add Real-World Signal Performance Benchmarks

Problem
Mesa Signals (#3166) is becoming core to Mesa 4.0 (behavioral framework, data collection #3145, stopping conditions #2921), but we lack benchmarks showing signal overhead in real simulation contexts.
Current gap: we can't answer "What's the performance cost of using signals in my model?"

Proposed Solution
Add 3 benchmark scenarios to the existing benchmark suite:
1. Signal Overhead Benchmark
2. Signal Density Impact
3. Signal vs. Polling

Expected Outcomes

Implementation
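Scenario 3 could be sketched as a `timeit` micro-benchmark along the following lines. The workload and the `Signal` class are stand-ins chosen for illustration, not Mesa's implementation: polling recomputes a derived quantity from scratch every step, while the signal variant maintains it incrementally and updates only when a (sparse) change event fires.

```python
import timeit


class Signal:
    """Stand-in signal for the benchmark sketch; not Mesa's implementation."""

    def __init__(self):
        self._subscribers = []

    def connect(self, callback):
        self._subscribers.append(callback)

    def emit(self, delta):
        for callback in self._subscribers:
            callback(delta)


N_STEPS = 1_000
values = [1.0] * 500
# Sparse changes: one event of size 2.0 every 100 steps.
events = dict((i, 2.0) for i in range(0, N_STEPS, 100))


def polling():
    # Derived quantity recomputed from scratch at every model step.
    total = 0.0
    for _ in range(N_STEPS):
        total = sum(values)
    return total


def signalling():
    # Derived quantity kept incrementally; updated only when an event fires.
    state = {"total": sum(values)}
    changed = Signal()
    changed.connect(lambda delta: state.__setitem__("total", state["total"] + delta))
    for step in range(N_STEPS):
        if step in events:
            changed.emit(events[step])
    return state["total"]


print(f"polling:  {timeit.timeit(polling, number=50):.4f}s")
print(f"signals:  {timeit.timeit(signalling, number=50):.4f}s")
```

The interesting axis for the real benchmark is how the crossover point moves as event density increases, i.e. at what rate of change polling becomes cheaper than dispatch overhead.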
-
Hi @quaquel ! I saw your note about needing a dynamic agent model for ContinuousSpace benchmarking. I have started building a Predator-Prey model to address this. I've got the Mesa 4.0 continuous space and random movement wired up, and I am working on adding the dynamic birth/death/energy mechanics next. I will open a PR in mesa-examples once it's fully ready!
-
Formalizing the design proposal process
Mesa Enhancement Proposals (MEPs)

I feel like the annoying person who constantly brings this up, but I have also at times been annoyed by changes that were unexpected, or by designs I would have done differently but didn't have the opportunity to chime in on. While not everyone needs to track everything, there is currently no good way for people to keep a pulse on where Mesa is heading. I think we need some more formality around changes. It is difficult to track and discuss a design when its final form is held in one or two heads and is linked across one PR, one comment thread in a discussion, and an issue. There is no cleanly structured way to move an idea from "discussion" to "accepted design". Mesa has experienced a lot of growth over the years, and people need predictability. Mesa 4 seems like a good time to do it.

Looking at this thread alone: signals, the Scenario class, data collection backends, keyword-only arguments, and the module system have all received conceptual approval, but it is not clear what was decided, why, and what the alternatives were. That knowledge might live in someone's head and eventually get put into a PR, but by the time a PR is submitted, the element feels "done". Furthermore, six months from now, a new contributor (or even a current maintainer) has no authoritative place to look to understand why something is the way it is.

Proposal
Python's PEP process does this well. Borrowing from it, but scaled down...

Structure of a Mesa Enhancement Proposal (MEP)

Process:

Mesa 4 involves multiple interconnected breaking changes: the time model, signals, data collection, spatial abstractions, the module system. These need to be designed in harmony. Proposals will have cross-cutting implications, but each element can be discussed independently and in context with the others. This also gives a clear map of what to code to, so that by the time a fully coded-up PR is submitted, there is no remaining conceptual discussion.

I don't want overhead; no one wants that, and I don't see this as that. Trivial changes (bug fixes, docs, small additions) don't need proposals. The requirement should hinge on something like: does this change a public API or introduce a new abstraction? If yes, a proposal. If no, a normal PR.

Extra benefits:

-
Update 2026-01-14: We're starting to move towards Mesa 4 development. See #3132 as tracking issue.
As we start development towards Mesa 4.0 soon, we have an opportunity to address fundamental challenges in agent-based modeling from a clean slate design. In this discussion I want to start aligning our overarching goals and priorities, focusing on what users should be able to do.
I would like to invite the community to help shape these priorities and identify what matters most for the future of Mesa.
Personally, I would propose focusing Mesa 4.0 on two areas: Fundamentals and Extensibility.
Fundamentals
Time & Space in order
Users should be able to model temporal dynamics and spatial relationships as naturally as they think about them. Today, Mesa requires users to choose between fixed time steps and complex simulator objects, while spatial modeling involves navigating multiple overlapping abstractions. In Mesa 4, a researcher studying disease spread should be able to schedule events at arbitrary times (an infection after 3.7 days), have agents perform tasks with explicit durations (a commute that takes 45 minutes), and seamlessly mix regular stepping with event-driven behavior—all without managing separate simulator objects or worrying about time unit mismatches.
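The arbitrary-time scheduling described above could, for instance, rest on a single priority queue of timestamped events. The sketch below is a minimal illustration under that assumption; the class and method names are hypothetical, not Mesa 4's API.

```python
import heapq
import itertools


class EventList:
    """Minimal discrete-event queue sketch; hypothetical, not Mesa 4's API."""

    def __init__(self):
        self._events = []
        self._counter = itertools.count()  # tie-breaker for equal times
        self.time = 0.0

    def schedule(self, at, callback):
        # Events may be scheduled at any float time, e.g. 3.7 days.
        heapq.heappush(self._events, (at, next(self._counter), callback))

    def run_until(self, end_time):
        # Pop events in time order, advancing the single model clock.
        while self._events and self._events[0][0] <= end_time:
            self.time, _, callback = heapq.heappop(self._events)
            callback()


log = []
events = EventList()
# An infection after 3.7 days, mixed with regular daily steps:
events.schedule(3.7, lambda: log.append(("infection", 3.7)))
for day in range(1, 6):
    events.schedule(float(day), lambda d=day: log.append(("step", float(d))))
events.run_until(5.0)
print(log)  # events fire in time order, regular steps interleaved
```

The point of the sketch is that regular stepping is just a special case of event scheduling, which is what lets fixed-step and event-driven behavior mix without separate simulator objects.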
Similarly, when modeling movement, users should be able to work with a unified coordinate system where agents have positions, spaces have properties, and everything relates through clear geometric relationships. Whether modeling flocking birds in continuous 2D space, pedestrians navigating urban networks, or cells on a grid, the conceptual model should be consistent and the transitions between representations should be straightforward.
Cleansheet Experimentation and Data Collection
Users should be able to design sophisticated experiments and collect data without fighting Mesa's infrastructure. A researcher running parameter sweeps on an HPC cluster should have the same simple experience as a student doing local replications on their laptop—just with appropriate scaling. Users should be able to specify complex experimental designs (Latin hypercube sampling, adaptive parameter exploration), execute them across diverse computing environments (from laptops to cloud platforms), and have their data land in formats that their analysis tools already understand (Parquet, Zarr, Xarray). The barrier between "running my model" and "analyzing my results" should vanish: data collection should integrate naturally with modern Python analytics tools, support heterogeneous agent types and spatial properties without manual wrangling, and provide clean provenance tracking for reproducibility. Whether someone needs a quick parameter sweep or a systematic exploration of thousands of scenarios, the path should be clear and the tools should get out of the way.
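As an illustration of the kind of experimental design meant here, Latin hypercube sampling can be sketched in a few lines of pure Python. This is a toy version for exposition; in practice one would reach for an existing sampler such as `scipy.stats.qmc.LatinHypercube`. The parameter names in the example are hypothetical.

```python
import random


def latin_hypercube(n_samples, bounds, rng=None):
    """Toy Latin hypercube sampler over box-bounded parameters.

    bounds: dict mapping parameter name -> (low, high).
    Each parameter's range is split into n_samples equal strata; one
    uniform draw is taken per stratum, and the strata are shuffled
    independently per parameter so every marginal is evenly covered.
    """
    rng = rng or random.Random()
    samples = [{} for _ in range(n_samples)]
    for name, (low, high) in bounds.items():
        strata = list(range(n_samples))
        rng.shuffle(strata)
        width = (high - low) / n_samples
        for i, stratum in enumerate(strata):
            samples[i][name] = low + (stratum + rng.random()) * width
    return samples


# Hypothetical sweep over two Schelling-like parameters:
design = latin_hypercube(
    5, {"density": (0.1, 0.9), "homophily": (0.0, 1.0)}, rng=random.Random(42)
)
for point in design:
    print({k: round(v, 3) for k, v in point.items()})
```

Each design point would then be handed to a model run, with the results landing directly in an analysis-ready format (Parquet, Xarray) rather than requiring manual wrangling.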
Stable, Performant Visualization
Users should be able to visualize their models reliably without the visualization system being a source of uncertainty. In Mesa 4, visualization should be a stable, well-documented tool that users can depend on across releases. Someone building a model for policy presentations should be confident that their visualizations will work the same way six months later, that performance will be acceptable for their model scale, and that they can customize what they need without diving into internals. The focus should shift from adding new experimental features to ensuring what exists works consistently, performs well, and has clear documentation. Users should spend their time thinking about what insights to show from their model, not debugging visualization quirks or working around performance issues.
Extensibility
Powerful Agent Behavior
Users should be able to model sophisticated agent behavior without implementing everything from scratch. A modeler studying cooperation should be able to give agents competing goals, time-consuming tasks that can be interrupted, and continuous states that evolve based on environment and actions—using composable building blocks rather than ad-hoc conditional logic. Agents in an evacuation model should be able to start moving toward an exit, reassess when conditions change, prioritize helping others over escaping, and track continuous stress levels that affect their decisions. Users modeling hunter-gatherers should express that hunting takes hours, can be abandoned if threats appear, and success depends on continuously updated energy states—without manually tracking dozens of counters and flags. The infrastructure should make modeling human-like reasoning and decision-making natural rather than forcing everything into single-step reactive rules.
ML/RL/DL/AI Integration
Users should be able to integrate modern AI and machine learning approaches into their agent-based models without impedance mismatches. A researcher studying learning in agent populations should be able to calibrate model parameters against real-world data using simulation-based inference, train agents with reinforcement learning, or have agents make decisions via language models—with Mesa providing the scaffolding rather than forcing them to bridge incompatible abstractions. Someone building a policy model should be able to specify prior distributions over micro-parameters, run inference to identify parameter sets consistent with macro observations, and quantify how much variance their agent-based hypothesis actually explains. Users experimenting with LLM-driven agents should be able to manage context windows, enforce structured outputs, and integrate local or cloud models without building custom orchestration. The boundary between Mesa models and the ML ecosystem should be clean: models should be differentiable where possible, inference tools should connect naturally, and the framework should support both traditional parameter calibration and modern learned behaviors.
Extensibility Through Modules
Users should be able to build on and share sophisticated modeling components without either reinventing the wheel or depending on fragile code. A researcher implementing a new negotiation protocol should be able to package it as a reusable module that others can install, understand, and compose with their own models. Someone studying epidemiology should be able to draw on a library of disease transmission models, economic primitives, or movement behaviors—each maintained by domain experts, with clear documentation and tested examples—rather than copying code from papers or reimplementing from descriptions. The Mesa ecosystem should make high-quality components discoverable, provide clear expectations about compatibility and maintenance, and create paths for community contributions to move from experiments to stable, widely-used infrastructure. Users should be able to focus on their research questions, confident that the behavioral building blocks they need are either already available or can be contributed back to benefit others.
These goals might be a starting point for discussion. Pursuing all of them simultaneously probably isn't feasible, so we need your input: Which of these matters most for your work? What are we missing? What tradeoffs should we consider? Please share your thoughts, use cases, and priorities in the discussions linked above and here in this thread.