Conversation
13b3932 to
86ba0da
Compare
|
Performance benchmarks:
|
mesa#3035) The DataCollector was storing references to mutable objects returned by agent reporters instead of creating deep copies. This caused all historical records to show the current state rather than the state at each collection time. Changes: - Add deepcopy() to _record_agents() method - Add deepcopy() to _record_agenttype() method - Add test to verify mutable data independence Fixes mesa#3035
86ba0da to
c35145e
Compare
800ff21 to
ff2ff4d
Compare
|
Thanks for this PR and for addressing my initial comments. As you can see in the benchmark, deepcopy adds substantial overhead. So, I am not sure whether this is the ideal solution for this bug. It is a wasteful operation in the case of immutables, and expensive in the case of complicated, nested data structures. Moreover, if I am not mistaken, returning collections will break some of the downstream behavior of the data collector, such as returning dataframes. So I am curious if there are alternative solutions that don't have this overhead. One option is to simply not allow mutables as a return to datacollector. Which, in my experience, should not be a significant problem since people typically collect simple data (i.e., numbers instead of mutable collections). |
|
I guess we can use type-aware approach. def get_reports(agent):
_prefix = (agent.model.steps, agent.unique_id)
reports = []
for rep in rep_funcs:
value = rep(agent)
# only copy known mutable types
if isinstance(value, (list, dict, set, bytearray)):
reports.append(deepcopy(value))
elif hasattr(value, '__array__'): # numpy arrays
reports.append(value.copy())
else:
reports.append(value) # immutable types: int, float, str, tuple
return _prefix + tuple(reports) |
|
I have to think about this a bit more. Your code, as given, will solve the problem only for known mutable types, but will still silently work fine for user-defined classes. I am inclined instead to limit data that can be collected to only known built-in immutable types. Which is a much tighter constraint. Alternatively, you could do something like python_immutable_types = [str, int, long, bool, float, tuple,]
if insintance(value, *python_immutable_types)
reports.append(value)
else:
reports.append(copy.deepcopy(value))At least, this avoids the expensive deepcopy in 80% of all cases, if not more. |
|
@quaquel Implemented your suggested approach: This avoids deepcopy overhead for primitives (80%+ of cases) while still fixing the bug. All 16 tests pass. |
20eee9d to
7fe252c
Compare
|
Performance benchmarks:
|
Fixes #3035
The
DataCollectorwas storing references to mutable objects (lists, dicts, numpy arrays, etc.) returned by agent reporters instead of creating deep copies. This caused all historical records to retroactively update when agents modified these objects in subsequent steps, completely invalidating longitudinal data analysis.The Problem
When an agent reporter returns a mutable object, the
DataCollectorstores a reference to that object. As the simulation progresses and the agent modifies the object, the historical data changes to reflect the current state rather than the state at the time of collection.Example:
The Solution
Apply
deepcopy()to agent reporter results in both_record_agents()and_record_agenttype()methods, consistent with the existing behavior for model-level reporters.Changes Made
mesa/datacollection.py: Adddeepcopy()to agent reporter results (lines 284, 296)tests/test_datacollector_mutable_fix.py: Add test to verify mutable data independenceTesting
test_mutable_data_independencepassesImpact