
Fix batch_run Data Collection to Ensure Accuracy and Capture All Steps#3109

Merged
quaquel merged 7 commits into mesa:main from codebreaker32:bug/batch_run
Jan 12, 2026

Conversation

codebreaker32 (Collaborator) commented Jan 10, 2026

Summary

This PR fixes a bug where batch_run failed to capture the final step of a simulation and incorrectly handled data in models with non-standard collection frequencies. It transitions the BatchRunner from a "prediction-based" step calculation to an "observation-based" approach, ensuring 100% data integrity.

Bug / Issue

Related Issue: Fixes #2514

When using batch_run, the DataCollector would frequently miss the final step of the model. This was caused by the function _model_run_func using a hardcoded range() based on model.steps - 1, which stopped the data extraction one step short of the actual final state.

However, #2514 did not fully articulate the underlying problem: the previous implementation assumed a strict 1-to-1 relationship between model steps and data rows. This led to:

  • Data Loss: In models with "Time Dilation" (collecting multiple times per step), sub-step data was silently deleted because range() only generates one integer per step.
  • Ghost Data: In sparse collection models, batch_run would try to fetch data for steps where none was collected, resulting in duplicated or NaN rows.
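To make the "Ghost Data" failure mode concrete, here is a small sketch in plain Python (no Mesa). The history shape and the fallback lookup are illustrative assumptions, not Mesa's actual internals:

```python
# Sparse collection: the model runs 20 steps but only collects every 5th step.
collected_steps = [0, 5, 10, 15]           # steps where collect() actually ran
history = {0: 4, 5: 4, 10: 9, 15: 14}      # recorded values

def nearest_recorded(step):
    # Fetching data for an uncollected step falls back to the most
    # recent record, duplicating it ("ghost data").
    candidates = [s for s in collected_steps if s <= step]
    return history[candidates[-1]]

# Old approach: one lookup per predicted step index -> 20 mostly-duplicate rows.
old_rows = [nearest_recorded(step) for step in range(20)]

# New approach: one row per actual collection -> exactly what was recorded.
new_rows = [history[s] for s in collected_steps]

print(len(old_rows), len(new_rows))  # 20 vs 4
```

This mirrors the sparse-collection test failure shown later in this thread, where the old code produced 20 rows instead of the expected 4.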

Implementation

The logic in _model_run_func has been refactored to stop guessing the step indices and instead trust the DataCollector's recorded history.

Key Changes:

  • Truth-Sourcing: Replaced range(0, model.steps, ...) with a direct lookup of model.datacollector._collection_steps.
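In sketch form, with the model and collector stubbed out (only the `_collection_steps` name comes from this PR; everything else here is illustrative):

```python
# Stubs of the relevant structure; not Mesa's actual classes.
class StubCollector:
    def __init__(self, collection_steps):
        # One entry per collect() call, as described above.
        self._collection_steps = collection_steps

class StubModel:
    steps = 5
    # 1 initial collection + 2 collections per step for 5 steps = 11 entries.
    datacollector = StubCollector([0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5])

model = StubModel()
data_collection_period = 1

# Before: predict which steps hold data.
predicted = list(range(0, model.steps, data_collection_period))

# After: trust the DataCollector's recorded history.
observed = model.datacollector._collection_steps

print(len(predicted), len(observed))  # 5 vs 11
```

The prediction-based loop visits only 5 indices, while the observation-based loop visits all 11 recorded entries.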

Testing

  • Accuracy Verification: Confirmed via the provided MRE that all steps are collected.
  • Fault in Previous Code: Using range() returned only 6 rows for a 5-step TimeDilation run (initial collection + 5 steps), losing nearly half the data (locally tested using TimeDilationModel in test_batch_run).
  • New Code Result: Correctly retrieves all 11 rows (initial collection + 2 collections per step × 5 steps), ensuring len(results) == 11.

Additional Notes

Performance vs. Accuracy Trade-off:
By moving away from range(), we now handle the actual list of timestamps recorded by the model. While this ensures accurate results for all model types (sparse, dilated, or standard), it introduces some overhead. The point of debate is whether this change is worth the performance trade-off.

MRE

import mesa
import pandas as pd

class TimeDilationModel(mesa.Model):
    def __init__(self, rng=None):
        super().__init__()
        self.datacollector = mesa.DataCollector(
            model_reporters={"Step_Value": lambda m: m.steps}
        )
        # 1. Collect Initial State (Step 0)
        self.datacollector.collect(self)

    def step(self):
        self.datacollector.collect(self)
        self.datacollector.collect(self)
        
        if self.steps >= 2:
            self.running = False


# Logic: 
# Step 0: 1 collection (Init)
# Step 1: 2 collections (Start/End)
# Step 2: 2 collections (Start/End)
# Total Expected Rows: 5

results = mesa.batch_run(
    TimeDilationModel,
    parameters={},
    iterations=1,
    max_steps=10,
    data_collection_period=1,
    display_progress=False
)

df = pd.DataFrame(results)

print("--- RESULTS ---")
print(df[["Step", "Step_Value"]])
print(f"\nTotal Rows: {len(df)}")

expected = 5
if len(df) == expected:
    print("\nSUCCESS: All sub-step data captured.")
else:
    print(f"\nFAILED: Data Loss! Expected {expected} rows, but got {len(df)}.")

@github-actions commented

Performance benchmarks:

Model Size Init time [95% CI] Run time [95% CI]
BoltzmannWealth small 🔵 -1.7% [-2.3%, -0.9%] 🔴 +4.8% [+4.6%, +5.0%]
BoltzmannWealth large 🔴 +7.8% [+6.8%, +8.7%] 🔵 -0.3% [-4.7%, +3.5%]
Schelling small 🔵 +2.3% [+1.3%, +3.3%] 🔵 +3.6% [+2.8%, +4.2%]
Schelling large 🔵 +2.2% [+1.4%, +3.1%] 🔴 +9.2% [+6.2%, +12.2%]
WolfSheep small 🔴 +8.3% [+8.0%, +8.6%] 🔵 +0.2% [+0.0%, +0.4%]
WolfSheep large 🔴 +8.1% [+4.3%, +10.6%] 🔵 +0.6% [-1.6%, +2.6%]
BoidFlockers small 🔴 +6.7% [+5.8%, +7.5%] 🔵 +2.0% [+1.7%, +2.2%]
BoidFlockers large 🔴 +4.8% [+4.1%, +5.5%] 🔵 +1.5% [+1.2%, +1.7%]

quaquel (Member) commented Jan 10, 2026

Can you ensure proper test code coverage?

Also, any explanation for the performance change?

@quaquel quaquel added the bug label Jan 10, 2026
codebreaker32 (Collaborator, Author) commented Jan 11, 2026

The new code processes more data because it stops deleting "sub-steps".

Old Implementation: The code used range() to generate a list of unique integers.

# Old Code
steps = list(range(0, model.steps, data_collection_period))

If your model ran for 5 steps but collected data 3 times per step (15 total records), range(0, 5) only generated 5 indices. The loop ran 5 times, silently ignoring the other 10 records.
New Implementation: The code points directly to the DataCollector's full history.

If the DataCollector recorded 15 entries, recorded_steps has length 15, so the loop for step in steps: now executes 15 times instead of 5.
The "slowdown" is simply the time it takes to process the 10 rows of data that were previously being discarded.
Honestly, I expected a larger increase in execution time; I suspect eliminating the list(range(...)) construction (an O(n/k) overhead) offsets part of the cost.
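A quick way to see the loop-count difference from the 15-record example above (plain Python, illustrative numbers):

```python
# 5 model steps, 3 collect() calls per step = 15 records in the history.
recorded_steps = [step for step in range(1, 6) for _ in range(3)]

model_steps = 5
data_collection_period = 1

# Old loop bound: one pass per predicted integer index.
old_passes = list(range(0, model_steps, data_collection_period))

# New loop bound: one pass per recorded entry.
new_passes = recorded_steps

# 5 vs 15: the old loop silently skipped 10 of the 15 records.
print(len(old_passes), len(new_passes))
```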

The new tests run against the old code:

FAILED tests/test_batch_run.py::test_batch_run_sparse_collection - AssertionError: Expected 4 rows for sparse collection, got 20
assert 20 == 4
 +  where 20 = len([{'RunId': 0, 'iteration': 0, 'Step': 0, 'collect_interval': 5, 'rng': 42, 'Value': 4}, {'RunId': 0,
 'iteration': 0, 'Step': 1, 'collect_interval': 5, 'rng': 42, 'Value': 4}, {'RunId': 0, 'iteration': 0, 'Step': 2, 
'collect_interval': 5, 'rng': 42, 'Value': 4}, {'RunId': 0, 'iteration': 0, 'Step': 3, 'collect_interval': 5, 'rng': 42,
 'Value': 4}, {'RunId': 0, 'iteration': 0, 'Step': 4, 'collect_interval': 5, 'rng': 42, 'Value': 4}, {'RunId': 0, 'iteration': 0, 
'Step': 5, 'collect_interval': 5, 'rng': 42, 'Value': 4}, {'RunId': 0, 'iteration': 0, 'Step': 6, 'collect_interval': 5, 'rng': 
42, 'Value': 4}, {'RunId': 0, 'iteration': 0, 'Step': 7, 'collect_interval': 5, 'rng': 42, 'Value': 4}, {'RunId': 0, 'iteration': 
0, 'Step': 8, 'collect_interval': 5, 'rng': 42, 'Value': 4}, {'RunId': 0, 'iteration': 0, 'Step': 9, 'collect_interval': 5, 
'rng': 42, 'Value': 4}, {'RunId': 0, 'iteration': 0, 'Step': 10, 'collect_interval': 5, 'rng': 42, 'Value': 9}, {'RunId': 0,
 'iteration': 0, 'Step': 11, 'collect_interval': 5, 'rng': 42, 'Value': 9}, {'RunId': 0, 'iteration': 0, 'Step': 12, 
'collect_interval': 5, 'rng': 42, 'Value': 9}, {'RunId': 0, 'iteration': 0, 'Step': 13, 'collect_interval': 5, 'rng': 42,
 'Value': 9}, {'RunId': 0, 'iteration': 0, 'Step': 14, 'collect_interval': 5, 'rng': 42, 'Value': 9}, {'RunId': 0, 'iteration':
 0, 'Step': 15, 'collect_interval': 5, 'rng': 42, 'Value': 14}, {'RunId': 0, 'iteration': 0, 'Step': 16, 'collect_interval': 5,
 'rng': 42, 'Value': 14}, {'RunId': 0, 'iteration': 0, 'Step': 17, 'collect_interval': 5, 'rng': 42, 'Value': 14}, {'RunId': 0,
 'iteration': 0, 'Step': 18, 'collect_interval': 5, 'rng': 42, 'Value': 14}, {'RunId': 0, 'iteration': 0, 'Step': 19, 
'collect_interval': 5, 'rng': 42, 'Value': 14}])

FAILED tests/test_batch_run.py::test_batch_run_time_dilation - AssertionError: Data Loss! Expected 11 rows, got 6
assert 6 == 11
 +  where 6 = len([{'RunId': 0, 'iteration': 0, 'Step': 0, 'rng': None, 'RealStep': 0.0}, {'RunId': 0, 'iteration': 0, 
'Step': 1, 'rng': None, 'RealStep': 1.0}, {'RunId': 0, 'iteration': 0, 'Step': 2, 'rng': None, 'RealStep': 2.0}, {'RunId':
 0, 'iteration': 0, 'Step': 3, 'rng': None, 'RealStep': 3.0}, {'RunId': 0, 'iteration': 0, 'Step': 4, 'rng': None, 
'RealStep': 4.0}, {'RunId': 0, 'iteration': 0, 'Step': 5, 'rng': None, 'RealStep': 5.0}])

FAILED tests/test_batch_run.py::test_batch_run_missing_step - AssertionError: Expected 4 rows, got 7

assert 7 == 4
 +  where 7 = len([{'RunId': 0, 'iteration': 0, 'Step': 0, 'rng': None, 'Value': 0.0}, {'RunId': 0, 'iteration': 0, 'Step': 
1, 'rng': None, 'Value': 0.0}, {'RunId': 0, 'iteration': 0, 'Step': 2, 'rng': None, 'Value': 2.0}, {'RunId': 0, 'iteration': 
0, 'Step': 3, 'rng': None, 'Value': 2.0}, {'RunId': 0, 'iteration': 0, 'Step': 4, 'rng': None, 'Value': 4.0}, {'RunId': 0, 
'iteration': 0, 'Step': 5, 'rng': None, 'Value': 4.0}, {'RunId': 0, 'iteration': 0, 'Step': 6, 'rng': None, 'Value': 6.0}])

FAILED tests/test_batch_run.py::test_batch_run_empty_collection_edge_case - AssertionError: Expected 1 
row, got 4
assert 4 == 1
 +  where 4 = len([{'RunId': 0, 'iteration': 0, 'Step': 0, 'rng': None, 'Value': 3.0}, {'RunId': 0, 'iteration': 0, 'Step': 
1, 'rng': None, 'Value': 3.0}, {'RunId': 0, 'iteration': 0, 'Step': 2, 'rng': None, 'Value': 3.0}, {'RunId': 0, 'iteration':
 0, 'Step': 3, 'rng': None, 'Value': 3.0}])

================ 4 failed, 11 passed in 13.85s ===========================

@codebreaker32 codebreaker32 requested a review from quaquel January 11, 2026 07:18
quaquel (Member) commented Jan 11, 2026

Code coverage sits at less than 80%. Can you please ensure full coverage?

You can click on codecov/patch to see which code paths are not yet covered by tests.

codebreaker32 (Collaborator, Author) commented Jan 11, 2026

Added tests to ensure 100% patch coverage.

@EwoutH EwoutH added trigger-benchmarks Special label that triggers the benchmarking CI and removed trigger-benchmarks Special label that triggers the benchmarking CI labels Jan 11, 2026
@github-actions commented

Performance benchmarks:

Model Size Init time [95% CI] Run time [95% CI]
BoltzmannWealth small 🔵 -0.2% [-0.6%, +0.3%] 🔵 -0.2% [-0.3%, -0.1%]
BoltzmannWealth large 🔵 -0.3% [-1.1%, +0.4%] 🔵 -1.0% [-3.5%, +1.2%]
Schelling small 🔵 -1.3% [-2.1%, -0.4%] 🔵 -2.6% [-3.1%, -2.1%]
Schelling large 🔵 -0.8% [-1.5%, -0.1%] 🟢 -5.6% [-6.7%, -4.6%]
WolfSheep small 🔵 -2.1% [-2.4%, -1.8%] 🔵 -0.8% [-1.0%, -0.6%]
WolfSheep large 🔵 -2.9% [-6.4%, -1.0%] 🔵 -2.5% [-3.6%, -1.5%]
BoidFlockers small 🔵 -1.9% [-2.2%, -1.5%] 🔵 -0.1% [-0.3%, +0.1%]
BoidFlockers large 🔵 -0.7% [-1.4%, +0.1%] 🔵 +0.1% [-0.3%, +0.6%]

@quaquel quaquel merged commit d3781de into mesa:main Jan 12, 2026
14 checks passed
@codebreaker32 codebreaker32 deleted the bug/batch_run branch February 17, 2026 16:46

Labels

bug Release notes label

Development

Successfully merging this pull request may close these issues.

Batch runner doesn't collect data on last step

3 participants