
Introducing _StrongAgentSet to support strong references in agents.py #3160

Closed
codebreaker32 wants to merge 26 commits into mesa:main from codebreaker32:agentset

Conversation

@codebreaker32
Collaborator

@codebreaker32 codebreaker32 commented Jan 17, 2026

Problem:

  1. Iteration Overhead: Profiling indicated that AgentSet iteration was unnecessarily expensive due to its WeakKeyDictionary storage: every iteration required a liveness check for every agent. Since model.agents (the central registry) relies on AgentSet, this overhead impacted every step of the simulation.

This PR solves it by:

  1. Introducing _StrongAgentSet, a subclass of AgentSet that uses a standard dict for storage instead of WeakKeyDictionary.
  2. Implementing automatic downgrading so that copies or views (e.g., from .select()) return a standard AgentSet (weak references), preventing memory leaks in user space.
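For context on the liveness-check cost, here is a minimal standalone timing sketch (not mesa code; the Agent stand-in and the collection size are made up for illustration):

```python
import time
import weakref

class Agent:
    """Stand-in for a mesa Agent (hypothetical; the real class is heavier)."""

agents = [Agent() for _ in range(100_000)]

weak = weakref.WeakKeyDictionary()
for a in agents:
    weak[a] = None              # roughly what a WeakKeyDictionary-backed set stores
strong = dict.fromkeys(agents)  # roughly what a plain-dict-backed set stores

def time_iteration(mapping, repeats=20):
    start = time.perf_counter()
    for _ in range(repeats):
        for _agent in mapping:  # WeakKeyDictionary dereferences a weakref per key
            pass
    return time.perf_counter() - start

print(f"weak:   {time_iteration(weak):.3f}s")
print(f"strong: {time_iteration(strong):.3f}s")
```

On CPython the weak-ref loop typically takes a multiple of the plain-dict loop, which is the overhead the problem statement above refers to.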

Backward Compatibility: No breaking changes to the public API signature.

Closes #3128

@github-actions

Performance benchmarks:

| Model | Size | Init time [95% CI] | Run time [95% CI] |
|---|---|---|---|
| BoltzmannWealth | small | 🟢 -4.5% [-4.9%, -3.9%] | 🔵 -2.2% [-2.3%, -2.1%] |
| BoltzmannWealth | large | 🟢 -6.7% [-7.3%, -6.1%] | 🔵 -1.7% [-5.2%, +3.1%] |
| Schelling | small | 🟢 -4.9% [-5.0%, -4.7%] | 🔵 +0.8% [+0.7%, +0.9%] |
| Schelling | large | 🟢 -3.7% [-4.0%, -3.4%] | 🔵 -2.3% [-3.0%, -1.6%] |
| WolfSheep | small | 🟢 -12.0% [-12.3%, -11.8%] | 🟢 -10.6% [-11.2%, -10.0%] |
| WolfSheep | large | 🟢 -12.1% [-12.5%, -11.6%] | 🟢 -9.4% [-10.4%, -8.5%] |
| BoidFlockers | small | 🟢 -8.0% [-8.6%, -7.4%] | 🔵 -1.0% [-1.2%, -0.8%] |
| BoidFlockers | large | 🟢 -7.5% [-8.3%, -6.7%] | 🔵 -0.7% [-0.9%, -0.4%] |

@EwoutH EwoutH requested a review from quaquel January 17, 2026 15:38
@EwoutH EwoutH added the enhancement Release notes label label Jan 17, 2026
@quaquel
Member

quaquel commented Jan 17, 2026

Would you mind separating this into 2 PRs? The agent creation one will be straightforward to merge.

The agent set stuff is more difficult. I don't want to change the internal weakkeydict to a dict while still under mesa 3.x; it might cause all kinds of unexpected results in existing models that rely on AgentSet having a weakkeydict.

I'm also currently thinking of keeping the existing AgentSet as a weakkeydict, adding a separate dict version used only internally for model.agents and model.agents_by_type, and removing model._agents (because it's redundant from there on out). But even that might not be fully backward compatible, because people might use a copy operation on their agentset. The behavior of all methods that return an agentset is also poorly defined at the moment.

@codebreaker32
Collaborator Author

codebreaker32 commented Jan 17, 2026

I have reverted the creation change.

I'm also currently thinking of keeping the existing AgentSet as a weakkeydict, adding a separate dict version used only internally for model.agents and model.agents_by_type

What if we keep AgentSet as it is and derive a new subclass, StrongAgentSet (or some better name), for dict-only purposes? It would be cleaner, and all the methods of AgentSet would still be available to it.

@quaquel
Member

quaquel commented Jan 17, 2026

What if we can keep AgentSet as it is and inherit a new subclass StrongAgentSet(or some better name) for dict-only purposes. It'd be more cleaner and all the methods of AgentSet would be still available to it

Yes that is a logical next step.

The main issue however is how to properly integrate this into mesa for model.agents and model.agents_by_type. Clearly, for performance, we want to use normal dicts here. And they are controlled by model.register_agent and model.deregister_agent, which in turn is tied to agent.remove. So, memory leaks here are not a concern.

However, what happens if someone makes a copy of these hard-ref agent sets? This can happen via any of the methods with an inplace keyword argument: if you pass inplace=False, you get a copy. And now we create the potential for a memory leak, because these copies are not controlled by model.deregister_agent.

I see two solutions:

  1. Don't allow inplace on any of the methods of the hard-ref agent set. This avoids the implicit copy and forces a user to use copy.copy. Next, we can make it clear in the docstring that making a copy is a bad idea due to the potential for memory leaks.
  2. Allow inplace, but always return the current AgentSet (so with weakkeydict). It keeps the API the same between both; however, it implicitly changes the datatype of the object you are dealing with.
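The leak concern motivating both options can be sketched with plain dicts standing in for the agent sets (an illustrative sketch only; registry, user_copy, and the Agent class are invented names, and the collection behavior shown relies on CPython's reference counting):

```python
import weakref

class Agent:
    """Hypothetical stand-in for a mesa Agent."""

# Hard-ref registry, as model.agents would be with a plain-dict backend.
registry = dict.fromkeys(Agent() for _ in range(3))

# A copy made in user space, e.g. what inplace=False would hand back
# if it also used strong references.
user_copy = dict(registry)

probe = weakref.ref(next(iter(registry)))
registry.clear()  # the model deregisters every agent

alive_after_deregister = probe() is not None   # user_copy keeps the agents alive
user_copy.clear()
alive_after_copy_clear = probe() is not None   # refcount hits zero on CPython

print(alive_after_deregister, alive_after_copy_clear)
```

As long as user_copy exists, deregistering the agents from the model does not free them: that is exactly the leak a weak-ref downgrade on copies is meant to prevent.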

@codebreaker32
Collaborator Author

Allow inplace, but always return the current AgentSet (so with weakkeydict). It keeps the API the same between both; however, it implicitly changes the datatype of the object you are dealing with.

The implementation I suggested already addresses this naturally. We don't need to write any extra logic to enforce it; the inheritance structure handles it for us.

We define _StrongAgentSet to strictly override the storage mechanism:

```python
class _StrongAgentSet(AgentSet):
    """Internal optimization for Model.agents. Uses strong refs for speed."""

    def _storage(self, agents):
        return dict.fromkeys(agents)
```
In the base AgentSet class, methods like select, shuffle, and sort explicitly call the AgentSet constructor for their return values:

```python
def select(self, ...):
    # ...
    return AgentSet(agents, self.random)  # <--- explicitly returns a WEAK AgentSet
```

  1. Input: you call model.agents.select(...) on the optimized _StrongAgentSet (internal/strong).
  2. Output: the method runs, filters the agents, and returns a standard AgentSet (public/weak).

This automatically creates the safety boundary you described:

  • Internal Engine (model.agents): Fast, Strong Refs.
  • External Views (return values): Safe, Weak Refs.

For extra safety, we can explicitly override __copy__ to ensure that Python's standard copy operations also respect this "downgrade" to weak references.

@quaquel
Member

quaquel commented Jan 18, 2026

Thanks for the clarification. This PR currently subclasses the weakkey version from the hard key version. I suggest changing this around.

Moreover, there are many optimizations inside the current agent set when we iterate. These really need to stay in place because they have a profound impact on performance. From memory, the one in shuffle is the most critical, but the others also matter. Basically, we cannot just change the type of the dict used internally and keep everything else the same.

@codebreaker32
Collaborator Author

Basically, we cannot just change the type of the dict used internally and keep everything else the same.

You are completely right, we cannot just swap the storage backend and expect it to work. The current AgentSet implementation is tightly coupled to WeakKeyDictionary specific APIs (like .keyrefs() and .data), which don't exist on a standard dict.

To make _StrongAgentSet work, we have to do more than just change _storage; instead, we have to override every method that relies on those weakref internals.
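To illustrate the coupling (with invented minimal classes, not mesa's actual internals): the weak-ref path leans on WeakKeyDictionary.keyrefs(), which a plain dict simply does not have, so the dict-backed variant needs its own implementation of any method built on it:

```python
import weakref

class Agent:
    """Hypothetical stand-in for a mesa Agent."""

class WeakStore:
    """Weak-ref path: must dereference and liveness-check every key."""
    def __init__(self, agents):
        self._agents = weakref.WeakKeyDictionary()
        for a in agents:
            self._agents[a] = None

    def live_agents(self):
        # keyrefs() exists only on WeakKeyDictionary, not on dict.
        return [a for ref in self._agents.keyrefs() if (a := ref()) is not None]

class StrongStore:
    """dict-backed override: the keys are the agents themselves."""
    def __init__(self, agents):
        self._agents = dict.fromkeys(agents)

    def live_agents(self):
        return list(self._agents)

agents = [Agent() for _ in range(5)]
assert WeakStore(agents).live_agents() == StrongStore(agents).live_agents()
```

Calling the WeakStore-style implementation on a plain dict would raise AttributeError, which is why each such method needs an explicit override rather than just a swapped storage backend.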

@codebreaker32
Collaborator Author

I’ve implemented a working version where _StrongAgentSet passes all safety checks and correctly downgrades copies to weak sets. Happy to leave the final merge decision to you.

@quaquel
Member

quaquel commented Jan 18, 2026

Can you update the title to reflect what this PR does?

I'll try to review this tomorrow. A first quick look suggests most is fine.

What are your thoughts on updating the internals of Mesa to start using this?

@codebreaker32
Collaborator Author

codebreaker32 commented Jan 19, 2026

What are your thoughts on updating the internals of Mesa to start using this?

In my opinion, this change isn't introducing strong ownership; it is optimizing the model.agents to match the strong ownership that already exists, removing the redundant weak-reference overhead.

I think updating model.agents to use _StrongAgentSet is the right move because:

  1. Conceptually, the Model is the "source of truth" and owner of the agents. In the current implementation, model.py already enforces this ownership via the internal _agents dict. However, the public model.agents (currently a standard AgentSet) wraps these in WeakKeyDictionary, adding unnecessary overhead to check for liveness that the Model already guarantees. Switching to _StrongAgentSet removes this redundancy and optimizes the hot loop.

  2. Since the Model already holds strong references (in _agents), "Lazy Deletion" (relying on GC to remove agents) is already disabled in practice. Adopting _StrongAgentSet formalizes this: agents persist until agent.remove() is explicitly called.

My view: This is a positive standard. Relying on Python's Garbage Collector for simulation logic is non-deterministic. Explicit remove() calls ensure the simulation state is stable and predictable.

Integration Plan
I recommend we apply this specifically to Model.agents:

  • Change: Initialize self._all_agents = _StrongAgentSet([], self.random) and self._agents_by_type inside Model.__init__.
  • Constraint: Keep _StrongAgentSet strictly internal. It should never be instantiated by users directly.

I also believe this is going to be a critical change, as AgentSet directly interacts (and will increasingly interact) with AgentDataSet.

@codebreaker32 codebreaker32 changed the title Optimize AgentSet Storage and Agent Creation Introducing _StrongAgentSet to support strong references in agents.py Jan 19, 2026
Member

@EwoutH EwoutH left a comment


Thanks for working on this, interesting performance benefits!

I will leave the full review to Jan, two minor points.


@quaquel
Member

quaquel commented Jan 19, 2026

For Schelling, you should also update agents_by_type to use the dict based agentset. Since this one is already managed by Model.deregister_agent it should be a trivial change.

@codebreaker32
Collaborator Author

Some cons I can think of for this approach:

  1. Copying Costs
    AgentSet.copy() is usually fast (shallow copy of weakrefs). _StrongAgentSet.copy() forces the creation of a new AgentSet, which requires generating new weak references for every agent in the set. This makes operations like copy() or non-inplace shuffle() significantly heavier.

  2. Polymorphism Issue
    _StrongAgentSet shadows AgentSet. If a future contributor adds a new method to AgentSet that relies on internal weakref structures (like .data or .keyrefs()) and forgets to override it in _StrongAgentSet, it will cause runtime crashes (AttributeError).

@EwoutH
Member

EwoutH commented Jan 24, 2026

On the GitHub side, it’s running on shared, virtual runners. Performance is influenced by other workloads. Not the most stable platform for benchmarks (but free).

With 3.4.2 released, we can move towards merging this.

Small note: we have maintenance branches (or can create them), to which we can backport specific fixes. main always points to at least a new minor version, sometimes a major, but never a patch. So don’t let that slow you down :)

@codebreaker32
Collaborator Author

it’s running on shared, virtual runners. Performance is influenced by other workloads

Agree. But I found some points we could perhaps fix on our side:

  1. We currently measure from the very first run, which captures the cold-start penalty. Standard practice (in NetLogo and ASV) is to run the model 3+ times to "warm up" the system before starting the timer.
  2. gc.enable() is currently on, meaning the garbage collector can fire at any point during runs. If GC hits during a specific run, that run looks artificially slow. The standard approach is gc.disable() during the timed loop and a manual gc.collect() between runs.
  3. We currently track the minimum (fastest_init). While common, this is extremely noisy on shared CI runners. The industry standard (used by NetLogo and ASV) is to calculate the median and Interquartile Range (IQR), which filters out outliers caused by system spikes.
  4. We use timeit.default_timer(). We should use time.perf_counter(), which provides high resolution and is monotonic (unaffected by system clock updates).
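Points 1–4 combine into a harness shaped roughly like this (an illustrative sketch, not mesa's actual global_benchmarks.py; benchmark, warmups, and repeats are invented names):

```python
import gc
import statistics
import time

def benchmark(run_model, warmups=3, repeats=15):
    """Warm up first, time with GC disabled, report median and IQR."""
    for _ in range(warmups):
        run_model()                      # cold-start runs, not timed

    samples = []
    gc.disable()
    try:
        for _ in range(repeats):
            gc.collect()                 # pay GC cost between runs, not inside them
            start = time.perf_counter()  # monotonic, high-resolution clock
            run_model()
            samples.append(time.perf_counter() - start)
    finally:
        gc.enable()

    q1, median, q3 = statistics.quantiles(samples, n=4)
    return median, q3 - q1               # central tendency and spread

med, iqr = benchmark(lambda: sum(range(50_000)))
print(f"median={med:.6f}s  IQR={iqr:.6f}s")
```

statistics.quantiles with n=4 returns the three quartile cut points, so the IQR falls out directly without any manual sorting.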

@quaquel
Member

quaquel commented Jan 24, 2026

and

  1. We don't properly clean up after a run. As indicated in Memory leak in models #3179, there is still some memory issue. It would be good to call model.remove_agents() at the end of run_model.

@codebreaker32, If you have time, a PR would be welcome. Otherwise, I'll take a look hopefully tonight.

@codebreaker32
Collaborator Author

@codebreaker32, If you have time, a PR would be welcome

Sure, I am on it

@EwoutH
Member

EwoutH commented Jan 24, 2026

3. We currently track the minimum(fastest_init). While common, this is extremely noisy on shared CI runners. The industry standard (used by NetLogo and ASV) is to calculate the median and Interquartile Range (IQR), which filters out outliers caused by system spikes.

To give some context, there are good arguments for this: while compute execution is often slower, it is rarely faster, so the distribution tends to be heavily skewed. When originally developing these benchmarks, using the minimum was a conscious decision that gave more stable results.

That said, improvements to our benchmarking are certainly welcome.

Polymorphism Issue
_StrongAgentSet shadows AgentSet. If a future contributor adds a new method to AgentSet that relies on internal weakref structures (like .data or .keyrefs()) and forgets to override it in _StrongAgentSet, it will cause runtime crashes (AttributeError).

Did we address this yet?

@codebreaker32
Collaborator Author

To give some context, there are good arguments for this: While compute execution is often slower, is rarely faster. The distribution tends to be heavily skewed. When originally developing these benchmarks, using the minimum was a conscious decision with more stable results.

After testing both approaches (results shared on #3203), I strongly agree with you :)

Polymorphism Issue
_StrongAgentSet shadows AgentSet. If a future contributor adds a new method to AgentSet that relies on internal weakref structures (like .data or .keyrefs()) and forgets to override it in _StrongAgentSet, it will cause runtime crashes (AttributeError).

Did we address this yet?

I thought it was okay for now, after what @quaquel said.

However, I doubt there will be major changes to AgentSet. The API has proven very stable after its introduction and early optimizations.

@EwoutH
Member

EwoutH commented Jan 24, 2026

Could you check if there's a simple test we could add to check for this?

@quaquel
Member

quaquel commented Jan 24, 2026

Could you check if there's a simple test we could add to check for this?

I think you are overthinking the issue. First, the API has been very stable. Second, if new stuff is added to agentset, we are likely to remember this and test accordingly. Third, I expect to go in and clean this up in a follow-up PR by adding an ABC AgentSet class that both weakkeydict and normal dict inherit from.

@codebreaker32
Collaborator Author

codebreaker32 commented Jan 24, 2026

Could you check if there's a simple test we could add to check for this?

Also, don't worry: there are enough tests for _StrongAgentSet to save us from unsafe method definitions or overrides :)

@quaquel quaquel added trigger-benchmarks Special label that triggers the benchmarking CI and removed trigger-benchmarks Special label that triggers the benchmarking CI labels Jan 25, 2026
@github-actions

Performance benchmarks:

| Model | Size | Init time [95% CI] | Run time [95% CI] |
|---|---|---|---|
| BoltzmannWealth | small | 🟢 -20.6% [-21.1%, -20.0%] | 🟢 -6.3% [-6.6%, -6.0%] |
| BoltzmannWealth | large | 🔴 +7.0% [+3.2%, +10.2%] | 🟢 -7.8% [-9.9%, -5.4%] |
| Schelling | small | 🟢 -10.9% [-11.1%, -10.6%] | 🟢 -5.9% [-6.0%, -5.7%] |
| Schelling | large | 🔵 -5.4% [-7.6%, -2.8%] | 🔵 -3.6% [-5.5%, -1.5%] |
| WolfSheep | small | 🟢 -11.3% [-12.4%, -10.0%] | 🟢 -8.0% [-8.3%, -7.7%] |
| WolfSheep | large | 🔵 -2.2% [-7.1%, +4.1%] | 🟢 -5.2% [-5.9%, -4.3%] |
| BoidFlockers | small | 🟢 -13.1% [-13.8%, -12.3%] | 🔵 -1.3% [-1.6%, -1.1%] |
| BoidFlockers | large | 🟢 -10.1% [-10.5%, -9.8%] | 🔵 -1.2% [-1.5%, -0.9%] |

@codebreaker32
Collaborator Author

codebreaker32 commented Jan 25, 2026

The benchmarks above ran using the old global_benchmarks.py on this PR and the updated one on main. I have now merged main into this branch; you can run them again for the updated behavior.

@quaquel quaquel added trigger-benchmarks Special label that triggers the benchmarking CI and removed trigger-benchmarks Special label that triggers the benchmarking CI labels Jan 25, 2026
@github-actions

Performance benchmarks:

| Model | Size | Init time [95% CI] | Run time [95% CI] |
|---|---|---|---|
| BoltzmannWealth | small | 🟢 -7.4% [-7.7%, -7.2%] | 🟢 -5.5% [-5.6%, -5.4%] |
| BoltzmannWealth | large | 🟢 -6.7% [-7.0%, -6.4%] | 🟢 -6.7% [-7.9%, -5.7%] |
| Schelling | small | 🟢 -4.2% [-4.3%, -4.0%] | 🟢 -3.3% [-3.4%, -3.2%] |
| Schelling | large | 🔵 -2.8% [-3.0%, -2.6%] | 🔵 -1.1% [-1.9%, -0.3%] |
| WolfSheep | small | 🟢 -7.1% [-7.3%, -7.0%] | 🟢 -5.0% [-5.1%, -4.9%] |
| WolfSheep | large | 🟢 -7.2% [-7.5%, -6.9%] | 🟢 -4.8% [-5.6%, -3.9%] |
| BoidFlockers | small | 🟢 -5.6% [-5.8%, -5.3%] | 🔵 -0.4% [-0.5%, -0.3%] |
| BoidFlockers | large | 🟢 -5.6% [-5.9%, -5.3%] | 🔵 -0.5% [-0.7%, -0.4%] |

@quaquel
Member

quaquel commented Jan 25, 2026

I am reviewing the code locally at the moment. I have a few minor comments still:

  1. The typing on model for model.agents and model.agents_by_type is not entirely correct anymore since these return the new _StrongAgentSet instead of AgentSet. I guess because of inheritance it still works.
  2. It might also be good to slightly expand the documentation on model.agents and model.agents_by_type, warning people to use this with care.
  3. It would be good to open an issue indicating that model._agent and model._all_agents can be removed. I know this will be a messy job.

@codebreaker32
Collaborator Author

You are right: since _StrongAgentSet is a subclass, the typing holds up via inheritance, but it is subtle. I also acknowledge that the current structure (with _agent, _all_agents, and the inheritance mix) is due for a cleanup.

I see the complete process as a 3-step roadmap that maintains backward compatibility and keeps each step easy to review:

  1. Refactor AgentSet: Introduce an AbstractAgentSet and make the standard AgentSet inherit from it.
  2. Formalize Strong Storage: Introduce HardKeyAgentSet (inheriting from AbstractAgentSet) and formally swap it into the Model.
  3. Cleanup: Remove redundant instances like model._all_agents once the new hierarchy is stable.
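Steps 1 and 2 of that roadmap might look roughly like this (class names taken from the roadmap above; everything else is an illustrative sketch, not mesa's API):

```python
from abc import ABC, abstractmethod
import weakref

class AbstractAgentSet(ABC):
    """Shared interface; concrete subclasses pick the storage backend."""

    def __init__(self, agents):
        self._agents = self._storage(agents)

    @abstractmethod
    def _storage(self, agents):
        """Return the backing mapping for the given agents."""

    def __len__(self):
        return len(self._agents)

class AgentSet(AbstractAgentSet):
    """Weak-ref variant handed out to user space."""

    def _storage(self, agents):
        store = weakref.WeakKeyDictionary()
        for agent in agents:
            store[agent] = None
        return store

class HardKeyAgentSet(AbstractAgentSet):
    """Strong-ref variant, kept internal to Model.agents."""

    def _storage(self, agents):
        return dict.fromkeys(agents)
```

With the ABC in place, neither concrete class inherits the other's storage assumptions, which sidesteps the shadowing concern raised earlier in the thread.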

We can open a tracking issue for this roadmap. If you are fine with this approach, I will proceed

@quaquel
Member

quaquel commented Jan 25, 2026

Sounds like a plan. Does this mean that this PR will be superseded and so does not need to be merged, or do you want to start from this PR?

@codebreaker32
Collaborator Author

codebreaker32 commented Jan 25, 2026

Does this mean that this PR will be superseded and so does not need to be merged, or do you want to start from this PR?

I am fine either way!

If you want to get the ABC in immediately, we can merge this PR by tomorrow (I will finalize it tonight).
Alternatively, if you prefer to have the clean architecture in place first, we can consider this PR superseded. I would then close this one and start the new PR series (beginning with AbstractAgentSet) from scratch, along with a tracking issue.

Which approach do you prefer?

@quaquel
Member

quaquel commented Jan 25, 2026

Let's start from the ABC instead of merging this. The git history will be cleaner, making it easier to reconstruct in the future, if needed, what changed and why.

Labels: enhancement, trigger-benchmarks

Linked issue: Do we really need weakrefs in AgentSet?