Skip to content

DataCollector Mutable Reference Leak #3035

@Nithin9585

Description

@Nithin9585

The DataCollector fails to create deep copies of agent-level data when collecting. When an agent reporter returns a mutable object (list, dict, numpy array, etc.), the DataCollector stores a reference to that object instead of a copy. As the agent modifies this object in subsequent steps, all historical records are retroactively updated to reflect the current state, completely invalidating longitudinal data analysis.

Expected behavior

When collecting agent data at each step, the DataCollector should preserve the state of mutable objects at that specific time step. Historical records should remain unchanged when agents modify their attributes in later steps.

For example:

  • Step 1: Agent has grades = [], DataCollector should record []
  • Step 2: Agent has grades = [85], DataCollector should record [85]
  • Step 3: Agent has grades = [85, 92], DataCollector should record [85, 92]

When reviewing historical data, Step 1 should still show [], Step 2 should show [85], and Step 3 should show [85, 92].

To Reproduce

from mesa.datacollection import DataCollector
from mesa.model import Model
from mesa.agent import Agent

class TestAgent(Agent):
    def __init__(self, model):
        super().__init__(model)
        self.my_list = []  # Mutable attribute
    
    def step(self):
        self.my_list.append(self.model.steps)

class TestModel(Model):
    def __init__(self):
        super().__init__()
        self.agent = TestAgent(self)
        
        # Track the mutable list
        self.datacollector = DataCollector(
            agent_reporters={"MyList": lambda a: a.my_list}
        )
    
    def step(self):
        self.datacollector.collect(self)
        self.agent.step()

# Run simulation
model = TestModel()
model.step()  # Step 1: list is []
model.step()  # Step 2: list is [1]
model.step()  # Step 3: list is [1, 2]

# Check historical data
df = model.datacollector.get_agent_vars_dataframe()
print(df)

# BUG: All steps show [1, 2, 3] instead of their historical values
# Expected:
#   Step 1: []
#   Step 2: [1]
#   Step 3: [1, 2]
# Actual:
#   Step 1: [1, 2, 3]
#   Step 2: [1, 2, 3]
#   Step 3: [1, 2, 3]

Additional context

  • This bug affects any mutable data type: lists, dicts, sets, numpy arrays, custom objects, etc.
  • The DataCollector already uses deepcopy() for model-level reporters (line 330 in datacollection.py) but not for agent-level reporters
  • The fix is to apply deepcopy() to agent reporter results in the _record_agents() method (around line 284)

This bug silently corrupts historical data, making research conclusions based on Mesa simulations potentially invalid when tracking mutable agent attributes.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugRelease notes label

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions