Fix: Add deepcopy to agent reporters to prevent mutable reference lea… by Nithin9585 · Pull Request #3038 · mesa/mesa

Nithin9585 · 2025-12-29T16:02:07Z

The DataCollector was storing references to mutable objects (lists, dicts, numpy arrays, etc.) returned by agent reporters instead of creating deep copies. This caused all historical records to retroactively update when agents modified these objects in subsequent steps, completely invalidating longitudinal data analysis.

The Problem

When an agent reporter returns a mutable object, the DataCollector stores a reference to that object. As the simulation progresses and the agent modifies the object, the historical data changes to reflect the current state rather than the state at the time of collection.

Example:

# Step 1: Agent has list = []
datacollector.collect()  # Should record []

# Step 2: Agent has list = [1]  
datacollector.collect()  # Should record [1]

# Bug: When checking Step 1's data later, it shows [1] instead of []

The Solution

Apply deepcopy() to agent reporter results in both _record_agents() and _record_agenttype() methods, consistent with the existing behavior for model-level reporters.

Changes Made

mesa/datacollection.py: Add deepcopy() to agent reporter results (lines 284, 296)
tests/test_datacollector_mutable_fix.py: Add test to verify mutable data independence

Testing

New test test_mutable_data_independence passes
All existing DataCollector tests pass (15/15)
No regressions introduced

Impact

Data Integrity: Historical agent data now correctly preserves state at each collection time
Compatibility: Works with all mutable types (lists, dicts, sets, numpy arrays, custom objects)

github-actions · 2025-12-29T16:09:40Z

Performance benchmarks:

Model	Size	Init time [95% CI]	Run time [95% CI]
BoltzmannWealth	small	🔵 +0.7% [+0.4%, +1.0%]	🔵 +2.6% [+2.5%, +2.7%]
BoltzmannWealth	large	🔵 +1.6% [+1.2%, +2.0%]	🔵 +5.5% [+3.0%, +7.6%]
Schelling	small	🔵 +0.6% [-1.0%, +2.4%]	🔴 +5.5% [+4.1%, +7.1%]
Schelling	large	🔵 +0.8% [+0.7%, +1.0%]	🔴 +19.3% [+14.4%, +24.6%]
WolfSheep	small	🔵 -0.9% [-1.9%, -0.2%]	🔵 +0.8% [+0.5%, +1.0%]
WolfSheep	large	🔵 -2.9% [-7.9%, +0.2%]	🔵 -0.1% [-4.0%, +3.4%]
BoidFlockers	small	🔵 +1.1% [+0.7%, +1.5%]	🔵 +1.4% [+1.3%, +1.5%]
BoidFlockers	large	🔵 +0.9% [+0.2%, +1.6%]	🔵 +1.5% [+1.2%, +1.8%]

mesa#3035) The DataCollector was storing references to mutable objects returned by agent reporters instead of creating deep copies. This caused all historical records to show the current state rather than the state at each collection time. Changes: - Add deepcopy() to _record_agents() method - Add deepcopy() to _record_agenttype() method - Add test to verify mutable data independence Fixes mesa#3035

tests/test_datacollector_mutable_fix.py

…ting test file

quaquel · 2025-12-30T08:16:01Z

Thanks for this PR and for addressing my initial comments.

As you can see in the benchmark, deepcopy adds substantial overhead. So, I am not sure whether this is the ideal solution for this bug. It is a wasteful operation in the case of immutables, and expensive in the case of complicated, nested data structures. Moreover, if I am not mistaken, returning collections will break some of the downstream behavior of the data collector, such as returning dataframes. So I am curious if there are alternative solutions that don't have this overhead.

One option is to simply not allow mutables as a return to datacollector. Which, in my experience, should not be a significant problem since people typically collect simple data (i.e., numbers instead of mutable collections).

codebyNJ · 2025-12-30T14:18:08Z

@Nithin9585 @quaquel

I guess we can use type-aware approach.

def get_reports(agent):
        _prefix = (agent.model.steps, agent.unique_id)
        reports = []
        for rep in rep_funcs:
            value = rep(agent)
            # only copy known mutable types
            if isinstance(value, (list, dict, set, bytearray)):
                reports.append(deepcopy(value))
            elif hasattr(value, '__array__'):  # numpy arrays
                reports.append(value.copy())
            else:
                reports.append(value)  # immutable types: int, float, str, tuple
        return _prefix + tuple(reports)

quaquel · 2025-12-30T18:46:36Z

I have to think about this a bit more. Your code, as given, will solve the problem only for known mutable types, but will still silently work fine for user-defined classes. I am inclined instead to limit data that can be collected to only known built-in immutable types. Which is a much tighter constraint.

Alternatively, you could do something like

python_immutable_types = [str, int, long, bool, float, tuple,]

if insintance(value, *python_immutable_types)
   reports.append(value)
else:
   reports.append(copy.deepcopy(value))

At least, this avoids the expensive deepcopy in 80% of all cases, if not more.

Nithin9585 · 2025-12-30T19:05:10Z

@quaquel Implemented your suggested approach:

python
python_immutable_types = (str, int, bool, float, tuple)
if isinstance(value, python_immutable_types):
    reports.append(value)
else:
    reports.append(deepcopy(value))

This avoids deepcopy overhead for primitives (80%+ of cases) while still fixing the bug. All 16 tests pass.

… overhead

github-actions · 2025-12-30T19:41:30Z

Performance benchmarks:

Model	Size	Init time [95% CI]	Run time [95% CI]
BoltzmannWealth	small	🔵 +0.7% [-0.2%, +1.6%]	🟢 -4.9% [-5.1%, -4.8%]
BoltzmannWealth	large	🔵 +3.1% [+2.6%, +3.5%]	🟢 -13.8% [-15.6%, -12.0%]
Schelling	small	🔵 +4.4% [+2.8%, +6.1%]	🟢 -6.8% [-7.8%, -5.8%]
Schelling	large	🔵 +3.0% [+2.8%, +3.2%]	🔵 -2.8% [-4.0%, -1.5%]
WolfSheep	small	🔵 +2.9% [+1.9%, +3.4%]	🔵 +0.1% [-0.1%, +0.2%]
WolfSheep	large	🔵 +4.5% [-3.7%, +13.4%]	🔵 -0.0% [-1.1%, +1.2%]
BoidFlockers	small	🔵 +0.4% [+0.1%, +0.7%]	🔵 +0.1% [+0.0%, +0.3%]
BoidFlockers	large	🔵 -0.2% [-0.8%, +0.3%]	🔵 +0.1% [+0.0%, +0.2%]

Nithin9585 force-pushed the fix/datacollector-mutable-reference-3035 branch from 13b3932 to 86ba0da Compare December 29, 2025 16:06

Nithin9585 force-pushed the fix/datacollector-mutable-reference-3035 branch from 86ba0da to c35145e Compare December 29, 2025 16:12

quaquel reviewed Dec 29, 2025

View reviewed changes

tests/test_datacollector_mutable_fix.py Outdated Show resolved Hide resolved

Refactor: Convert mutable data test to pytest and integrate into exis…

ff2ff4d

…ting test file

Nithin9585 force-pushed the fix/datacollector-mutable-reference-3035 branch from 800ff21 to ff2ff4d Compare December 30, 2025 05:59

Merge branch 'main' into fix/datacollector-mutable-reference-3035

96a4878

quaquel added the bug Release notes label label Dec 30, 2025

Merge branch 'main' into fix/datacollector-mutable-reference-3035

314ff6b

Optimize: Use type-aware conditional deepcopy to minimize performance…

7fe252c

… overhead

Nithin9585 force-pushed the fix/datacollector-mutable-reference-3035 branch from 20eee9d to 7fe252c Compare December 30, 2025 19:14

quaquel added the trigger-benchmarks Special label that triggers the benchmarking CI label Dec 30, 2025

quaquel approved these changes Dec 30, 2025

View reviewed changes

quaquel removed the trigger-benchmarks Special label that triggers the benchmarking CI label Dec 30, 2025

quaquel merged commit 489545d into mesa:main Dec 30, 2025
15 checks passed

Nithin9585 deleted the fix/datacollector-mutable-reference-3035 branch December 30, 2025 19:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix: Add deepcopy to agent reporters to prevent mutable reference lea…#3038

Fix: Add deepcopy to agent reporters to prevent mutable reference lea…#3038
quaquel merged 5 commits intomesa:mainfrom
Nithin9585:fix/datacollector-mutable-reference-3035

Nithin9585 commented Dec 29, 2025

Uh oh!

github-actions bot commented Dec 29, 2025

Uh oh!

Uh oh!

quaquel commented Dec 30, 2025

Uh oh!

codebyNJ commented Dec 30, 2025

Uh oh!

quaquel commented Dec 30, 2025

Uh oh!

Nithin9585 commented Dec 30, 2025

Uh oh!

github-actions bot commented Dec 30, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

Nithin9585 commented Dec 29, 2025

The Problem

The Solution

Changes Made

Testing

Impact

Uh oh!

github-actions bot commented Dec 29, 2025

Uh oh!

Uh oh!

quaquel commented Dec 30, 2025

Uh oh!

codebyNJ commented Dec 30, 2025

Uh oh!

quaquel commented Dec 30, 2025

Uh oh!

Nithin9585 commented Dec 30, 2025

Uh oh!

github-actions bot commented Dec 30, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants