DataCollector Mutable Reference Leak #3035

The `DataCollector` does not create deep copies of agent-level data when collecting. When an agent reporter returns a mutable object (list, dict, set, NumPy array, etc.), the `DataCollector` stores a reference to that object instead of a snapshot of its value. Because every recorded entry points to the same object, each in-place modification the agent makes in later steps shows up retroactively in all historical records, which invalidates any longitudinal analysis of that variable.
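The failure mode is ordinary Python aliasing. The minimal sketch below uses plain lists: `by_reference` is a hypothetical stand-in for what the collector currently stores, and `by_copy` for what it should store. Neither is Mesa's actual internal data structure.

```python
from copy import deepcopy

# Hypothetical stand-in for an agent attribute that is mutated in place.
my_list = []

by_reference = []  # stores whatever the reporter returns (current behavior)
by_copy = []       # stores an independent snapshot taken at collection time

for step in range(1, 4):
    by_reference.append(my_list)       # the same list object every time
    by_copy.append(deepcopy(my_list))  # a new object frozen at this step
    my_list.append(step)               # the "agent" then mutates its attribute

print(by_reference)  # [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
print(by_copy)       # [[], [1], [1, 2]]
```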

**Expected behavior**

When collecting agent data at each step, the `DataCollector` should preserve the state of mutable objects at that specific time step. Historical records should remain unchanged when agents modify their attributes in later steps.

For example:
- Step 1: Agent has `grades = []`, DataCollector should record `[]`
- Step 2: Agent has `grades = [85]`, DataCollector should record `[85]`
- Step 3: Agent has `grades = [85, 92]`, DataCollector should record `[85, 92]`

When reviewing historical data, Step 1 should still show `[]`, Step 2 should show `[85]`, and Step 3 should show `[85, 92]`.
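Preserving the per-step state needs a deep copy, not just a shallow one, whenever the attribute contains other mutable objects: a shallow copy creates a new outer container but still shares the inner objects. A plain-Python sketch, assuming a hypothetical nested `grades` attribute that maps a subject to a list of grades:

```python
import copy

# Hypothetical nested agent attribute: subject -> list of grades.
grades = {"math": []}

shallow = copy.copy(grades)   # new dict, but the inner list is shared
deep = copy.deepcopy(grades)  # new dict and a new inner list

grades["math"].append(85)     # the agent mutates the nested list in place

print(shallow)  # {'math': [85]}  (the "snapshot" changed retroactively)
print(deep)     # {'math': []}    (the snapshot keeps the original state)
```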

**To Reproduce**

```python
from mesa.datacollection import DataCollector
from mesa.model import Model
from mesa.agent import Agent

class TestAgent(Agent):
    def __init__(self, model):
        super().__init__(model)
        self.my_list = []  # Mutable attribute
    
    def step(self):
        self.my_list.append(self.model.steps)

class TestModel(Model):
    def __init__(self):
        super().__init__()
        self.agent = TestAgent(self)
        
        # Track the mutable list
        self.datacollector = DataCollector(
            agent_reporters={"MyList": lambda a: a.my_list}
        )
    
    def step(self):
        self.datacollector.collect(self)
        self.agent.step()

# Run simulation
model = TestModel()
model.step()  # Step 1: list is []
model.step()  # Step 2: list is [1]
model.step()  # Step 3: list is [1, 2]

# Check historical data
df = model.datacollector.get_agent_vars_dataframe()
print(df)

# BUG: All steps show [1, 2, 3] instead of their historical values
# Expected:
#   Step 1: []
#   Step 2: [1]
#   Step 3: [1, 2]
# Actual:
#   Step 1: [1, 2, 3]
#   Step 2: [1, 2, 3]
#   Step 3: [1, 2, 3]
```
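Until the collector copies values itself, a possible user-side workaround is to have each agent reporter return a copy. In the sketch below, `snapshot` is a hypothetical helper (not part of Mesa) that wraps any reporter callable with `copy.deepcopy`; in the model above it would presumably be used as `agent_reporters={"MyList": snapshot(lambda a: a.my_list)}`, assuming `agent_reporters` accepts any callable that takes an agent. The demo uses a `SimpleNamespace` stand-in agent so it runs without Mesa installed.

```python
import copy
from types import SimpleNamespace


def snapshot(reporter):
    """Wrap a reporter so each call returns a deep copy of its result.

    Hypothetical helper for illustration; not part of Mesa.
    """
    def wrapped(agent):
        return copy.deepcopy(reporter(agent))
    return wrapped


# Stand-in for TestAgent so the example runs without Mesa.
agent = SimpleNamespace(my_list=[])
report = snapshot(lambda a: a.my_list)

records = []
for step in range(1, 4):
    records.append(report(agent))  # collect before the agent acts
    agent.my_list.append(step)     # then the agent mutates its list

print(records)  # [[], [1], [1, 2]]
```

For a flat list of immutable values, `lambda a: list(a.my_list)` is a cheaper shallow alternative; nested structures still need the deep copy.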

**Additional context**

- This bug affects any mutable data type: lists, dicts, sets, numpy arrays, custom objects, etc.
- The `DataCollector` already uses `deepcopy()` for model-level reporters (line 330 in `datacollection.py`) but not for agent-level reporters
- The fix is to apply `deepcopy()` to agent reporter results in the `_record_agents()` method (around line 284)
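The proposed fix can be sketched in isolation. The function below is a simplified, hypothetical stand-in for `_record_agents()`: Mesa's real method has a different signature, builds its rows differently, and also handles attribute-name reporters. It only shows where `deepcopy()` would be applied to each reporter result before the row is stored.

```python
from copy import deepcopy
from types import SimpleNamespace


def record_agents(step, agents, agent_reporters):
    """Hypothetical, simplified version of DataCollector._record_agents().

    Returns one (step, agent_id, *values) tuple per agent, deep-copying each
    reporter result so later in-place mutations cannot leak into stored rows.
    """
    rows = []
    for agent in agents:
        values = [deepcopy(rep(agent)) for rep in agent_reporters.values()]
        rows.append((step, agent.unique_id, *values))
    return rows


# Stand-in agent and reporter mirroring the reproduction above.
agents = [SimpleNamespace(unique_id=1, my_list=[])]
reporters = {"MyList": lambda a: a.my_list}

history = []
for step in range(1, 4):
    history.extend(record_agents(step, agents, reporters))
    agents[0].my_list.append(step)  # the agent acts after collection

for row in history:
    print(row)
# (1, 1, [])
# (2, 1, [1])
# (3, 1, [1, 2])
```

For immutable values such as ints and strings, `deepcopy()` returns the original object, so the extra cost falls mainly on reporters that return large mutable structures.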


This bug silently corrupts historical data, making research conclusions based on Mesa simulations potentially invalid when tracking mutable agent attributes.

