
Fix batch_run Data Collection to Ensure Accuracy and Capture All Steps#3109

Merged
quaquel merged 7 commits into mesa:main from codebreaker32:bug/batch_run
Jan 12, 2026

Conversation

codebreaker32 (Collaborator) commented Jan 10, 2026

Summary

This PR fixes a bug where batch_run failed to capture the final step of a simulation and incorrectly handled data in models with non-standard collection frequencies. It transitions the BatchRunner from a "prediction-based" step calculation to an "observation-based" approach, ensuring 100% data integrity.

Bug / Issue

Related Issue: Fixes #2514

When using batch_run, the DataCollector would frequently miss the final step of the model. This was caused by the function _model_run_func using a hardcoded range() based on model.steps - 1, which stopped the data extraction one step short of the actual final state.

However, #2514 did not fully articulate the underlying problem: the previous implementation assumed a strict 1-to-1 relationship between model steps and data rows. This led to:

  • Data Loss: In models with "Time Dilation" (collecting multiple times per step), sub-step data was silently deleted because range() only generates one integer per step.
  • Ghost Data: In sparse collection models, batch_run would try to fetch data for steps where none was collected, resulting in duplicated or NaN rows.
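To make the "Ghost Data" failure mode concrete, here is a small sketch in plain Python (no Mesa). The history shape and the fallback lookup are illustrative assumptions, not Mesa's actual internals:

```python
# Sparse collection: the model runs 20 steps but only collects every 5th step.
collected_steps = [0, 5, 10, 15]           # steps where collect() actually ran
history = {0: 4, 5: 4, 10: 9, 15: 14}      # recorded values

def nearest_recorded(step):
    # Fetching data for an uncollected step falls back to the most
    # recent record, duplicating it ("ghost data").
    candidates = [s for s in collected_steps if s <= step]
    return history[candidates[-1]]

# Old approach: one lookup per predicted step index -> 20 mostly-duplicate rows.
old_rows = [nearest_recorded(step) for step in range(20)]

# New approach: one row per actual collection -> exactly what was recorded.
new_rows = [history[s] for s in collected_steps]

print(len(old_rows), len(new_rows))  # 20 vs 4
```

This mirrors the sparse-collection test failure shown later in this thread, where the old code produced 20 rows instead of the expected 4.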

Implementation

The logic in _model_run_func has been refactored to stop guessing the step indices and instead trust the DataCollector's recorded history.

Key Changes:

  • Truth-Sourcing: Replaced range(0, model.steps, ...) with a direct lookup of model.datacollector._collection_steps.
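In sketch form, with the model and collector stubbed out (only the `_collection_steps` name comes from this PR; everything else here is illustrative):

```python
# Stubs of the relevant structure; not Mesa's actual classes.
class StubCollector:
    def __init__(self, collection_steps):
        # One entry per collect() call, as described above.
        self._collection_steps = collection_steps

class StubModel:
    steps = 5
    # 1 initial collection + 2 collections per step for 5 steps = 11 entries.
    datacollector = StubCollector([0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5])

model = StubModel()
data_collection_period = 1

# Before: predict which steps hold data.
predicted = list(range(0, model.steps, data_collection_period))

# After: trust the DataCollector's recorded history.
observed = model.datacollector._collection_steps

print(len(predicted), len(observed))  # 5 vs 11
```

The prediction-based loop visits only 5 indices, while the observation-based loop visits all 11 recorded entries.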

Testing

  • Accuracy Verification: Confirmed via the provided MRE that all steps are collected.
  • Fault in Previous Code: Using range() returned only 6 rows for a 5-step TimeDilation run (initial collection + 5 steps), losing nearly half the data (locally tested using TimeDilationModel in test_batch_run).
  • New Code Result: Correctly retrieves all 11 rows (initial collection + 2 collections per step × 5 steps), ensuring len(results) == 11.

Additional Notes

Performance vs. Accuracy Trade-off:
By moving away from range(), we now handle the actual list of timestamps recorded by the model. While this ensures accurate results for all model types (sparse, dilated, or standard), it introduces some overhead. The point of debate is whether this change is worth the performance trade-off.

MRE

import mesa
import pandas as pd

class TimeDilationModel(mesa.Model):
    def __init__(self, rng=None):
        super().__init__()
        self.datacollector = mesa.DataCollector(
            model_reporters={"Step_Value": lambda m: m.steps}
        )
        # 1. Collect Initial State (Step 0)
        self.datacollector.collect(self)

    def step(self):
        self.datacollector.collect(self)
        self.datacollector.collect(self)
        
        if self.steps >= 2:
            self.running = False


# Logic: 
# Step 0: 1 collection (Init)
# Step 1: 2 collections (Start/End)
# Step 2: 2 collections (Start/End)
# Total Expected Rows: 5

results = mesa.batch_run(
    TimeDilationModel,
    parameters={},
    iterations=1,
    max_steps=10,
    data_collection_period=1,
    display_progress=False
)

df = pd.DataFrame(results)

print("--- RESULTS ---")
print(df[["Step", "Step_Value"]])
print(f"\nTotal Rows: {len(df)}")

expected = 5
if len(df) == expected:
    print("\nSUCCESS: All sub-step data captured.")
else:
    print(f"\nFAILED: Data Loss! Expected {expected} rows, but got {len(df)}.")

@github-actions commented

Performance benchmarks:

Model Size Init time [95% CI] Run time [95% CI]
BoltzmannWealth small 🔵 -1.7% [-2.3%, -0.9%] 🔴 +4.8% [+4.6%, +5.0%]
BoltzmannWealth large 🔴 +7.8% [+6.8%, +8.7%] 🔵 -0.3% [-4.7%, +3.5%]
Schelling small 🔵 +2.3% [+1.3%, +3.3%] 🔵 +3.6% [+2.8%, +4.2%]
Schelling large 🔵 +2.2% [+1.4%, +3.1%] 🔴 +9.2% [+6.2%, +12.2%]
WolfSheep small 🔴 +8.3% [+8.0%, +8.6%] 🔵 +0.2% [+0.0%, +0.4%]
WolfSheep large 🔴 +8.1% [+4.3%, +10.6%] 🔵 +0.6% [-1.6%, +2.6%]
BoidFlockers small 🔴 +6.7% [+5.8%, +7.5%] 🔵 +2.0% [+1.7%, +2.2%]
BoidFlockers large 🔴 +4.8% [+4.1%, +5.5%] 🔵 +1.5% [+1.2%, +1.7%]

quaquel (Member) commented Jan 10, 2026

Can you ensure proper test code coverage?

Also, any explanation for the performance change?

@quaquel quaquel added the bug label Jan 10, 2026
codebreaker32 (Collaborator, Author) commented Jan 11, 2026

The new code processes more data because it stops deleting "sub-steps".

Old Implementation: The code used range() to generate a list of unique integers.

# Old Code
steps = list(range(0, model.steps, data_collection_period))

If your model ran for 5 steps but collected data 3 times per step (15 total records), range(0, 5) only generated 5 indices. The loop ran 5 times, silently ignoring the other 10 records.
New Implementation: The code points directly to the DataCollector's full history.

If the DataCollector recorded 15 entries, recorded_steps has length 15, so the loop for step in steps: now executes 15 times instead of 5.
The "slowdown" is simply the time it takes to process the 10 rows of data that were previously being discarded.
Honestly, I expected a larger increase in execution time; I suspect eliminating the list(range(...)) construction (an O(n/k) overhead) offsets part of the cost.
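A quick way to see the loop-count difference from the 15-record example above (plain Python, illustrative numbers):

```python
# 5 model steps, 3 collect() calls per step = 15 records in the history.
recorded_steps = [step for step in range(1, 6) for _ in range(3)]

model_steps = 5
data_collection_period = 1

# Old loop bound: one pass per predicted integer index.
old_passes = list(range(0, model_steps, data_collection_period))

# New loop bound: one pass per recorded entry.
new_passes = recorded_steps

# 5 vs 15: the old loop silently skipped 10 of the 15 records.
print(len(old_passes), len(new_passes))
```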

The new tests run against the old code:

FAILED tests/test_batch_run.py::test_batch_run_sparse_collection - AssertionError: Expected 4 rows for sparse collection, got 20
assert 20 == 4
 +  where 20 = len([{'RunId': 0, 'iteration': 0, 'Step': 0, 'collect_interval': 5, 'rng': 42, 'Value': 4}, {'RunId': 0,
 'iteration': 0, 'Step': 1, 'collect_interval': 5, 'rng': 42, 'Value': 4}, {'RunId': 0, 'iteration': 0, 'Step': 2, 
'collect_interval': 5, 'rng': 42, 'Value': 4}, {'RunId': 0, 'iteration': 0, 'Step': 3, 'collect_interval': 5, 'rng': 42,
 'Value': 4}, {'RunId': 0, 'iteration': 0, 'Step': 4, 'collect_interval': 5, 'rng': 42, 'Value': 4}, {'RunId': 0, 'iteration': 0, 
'Step': 5, 'collect_interval': 5, 'rng': 42, 'Value': 4}, {'RunId': 0, 'iteration': 0, 'Step': 6, 'collect_interval': 5, 'rng': 
42, 'Value': 4}, {'RunId': 0, 'iteration': 0, 'Step': 7, 'collect_interval': 5, 'rng': 42, 'Value': 4}, {'RunId': 0, 'iteration': 
0, 'Step': 8, 'collect_interval': 5, 'rng': 42, 'Value': 4}, {'RunId': 0, 'iteration': 0, 'Step': 9, 'collect_interval': 5, 
'rng': 42, 'Value': 4}, {'RunId': 0, 'iteration': 0, 'Step': 10, 'collect_interval': 5, 'rng': 42, 'Value': 9}, {'RunId': 0,
 'iteration': 0, 'Step': 11, 'collect_interval': 5, 'rng': 42, 'Value': 9}, {'RunId': 0, 'iteration': 0, 'Step': 12, 
'collect_interval': 5, 'rng': 42, 'Value': 9}, {'RunId': 0, 'iteration': 0, 'Step': 13, 'collect_interval': 5, 'rng': 42,
 'Value': 9}, {'RunId': 0, 'iteration': 0, 'Step': 14, 'collect_interval': 5, 'rng': 42, 'Value': 9}, {'RunId': 0, 'iteration':
 0, 'Step': 15, 'collect_interval': 5, 'rng': 42, 'Value': 14}, {'RunId': 0, 'iteration': 0, 'Step': 16, 'collect_interval': 5,
 'rng': 42, 'Value': 14}, {'RunId': 0, 'iteration': 0, 'Step': 17, 'collect_interval': 5, 'rng': 42, 'Value': 14}, {'RunId': 0,
 'iteration': 0, 'Step': 18, 'collect_interval': 5, 'rng': 42, 'Value': 14}, {'RunId': 0, 'iteration': 0, 'Step': 19, 
'collect_interval': 5, 'rng': 42, 'Value': 14}])

FAILED tests/test_batch_run.py::test_batch_run_time_dilation - AssertionError: Data Loss! Expected 11 rows, got 6
assert 6 == 11
 +  where 6 = len([{'RunId': 0, 'iteration': 0, 'Step': 0, 'rng': None, 'RealStep': 0.0}, {'RunId': 0, 'iteration': 0, 
'Step': 1, 'rng': None, 'RealStep': 1.0}, {'RunId': 0, 'iteration': 0, 'Step': 2, 'rng': None, 'RealStep': 2.0}, {'RunId':
 0, 'iteration': 0, 'Step': 3, 'rng': None, 'RealStep': 3.0}, {'RunId': 0, 'iteration': 0, 'Step': 4, 'rng': None, 
'RealStep': 4.0}, {'RunId': 0, 'iteration': 0, 'Step': 5, 'rng': None, 'RealStep': 5.0}])

FAILED tests/test_batch_run.py::test_batch_run_missing_step - AssertionError: Expected 4 rows, got 7

assert 7 == 4
 +  where 7 = len([{'RunId': 0, 'iteration': 0, 'Step': 0, 'rng': None, 'Value': 0.0}, {'RunId': 0, 'iteration': 0, 'Step': 
1, 'rng': None, 'Value': 0.0}, {'RunId': 0, 'iteration': 0, 'Step': 2, 'rng': None, 'Value': 2.0}, {'RunId': 0, 'iteration': 
0, 'Step': 3, 'rng': None, 'Value': 2.0}, {'RunId': 0, 'iteration': 0, 'Step': 4, 'rng': None, 'Value': 4.0}, {'RunId': 0, 
'iteration': 0, 'Step': 5, 'rng': None, 'Value': 4.0}, {'RunId': 0, 'iteration': 0, 'Step': 6, 'rng': None, 'Value': 6.0}])

FAILED tests/test_batch_run.py::test_batch_run_empty_collection_edge_case - AssertionError: Expected 1 
row, got 4
assert 4 == 1
 +  where 4 = len([{'RunId': 0, 'iteration': 0, 'Step': 0, 'rng': None, 'Value': 3.0}, {'RunId': 0, 'iteration': 0, 'Step': 
1, 'rng': None, 'Value': 3.0}, {'RunId': 0, 'iteration': 0, 'Step': 2, 'rng': None, 'Value': 3.0}, {'RunId': 0, 'iteration':
 0, 'Step': 3, 'rng': None, 'Value': 3.0}])

================ 4 failed, 11 passed in 13.85s ===========================

@codebreaker32 codebreaker32 requested a review from quaquel January 11, 2026 07:18
quaquel (Member) commented Jan 11, 2026

Code coverage sits at less than 80%. Can you please ensure full coverage?

You can click on codecov/patch to see which code paths are not yet covered by tests.

codebreaker32 (Collaborator, Author) commented Jan 11, 2026

Added tests to ensure 100% patch coverage.

@EwoutH EwoutH added trigger-benchmarks Special label that triggers the benchmarking CI and removed trigger-benchmarks Special label that triggers the benchmarking CI labels Jan 11, 2026
@github-actions commented

Performance benchmarks:

Model Size Init time [95% CI] Run time [95% CI]
BoltzmannWealth small 🔵 -0.2% [-0.6%, +0.3%] 🔵 -0.2% [-0.3%, -0.1%]
BoltzmannWealth large 🔵 -0.3% [-1.1%, +0.4%] 🔵 -1.0% [-3.5%, +1.2%]
Schelling small 🔵 -1.3% [-2.1%, -0.4%] 🔵 -2.6% [-3.1%, -2.1%]
Schelling large 🔵 -0.8% [-1.5%, -0.1%] 🟢 -5.6% [-6.7%, -4.6%]
WolfSheep small 🔵 -2.1% [-2.4%, -1.8%] 🔵 -0.8% [-1.0%, -0.6%]
WolfSheep large 🔵 -2.9% [-6.4%, -1.0%] 🔵 -2.5% [-3.6%, -1.5%]
BoidFlockers small 🔵 -1.9% [-2.2%, -1.5%] 🔵 -0.1% [-0.3%, +0.1%]
BoidFlockers large 🔵 -0.7% [-1.4%, +0.1%] 🔵 +0.1% [-0.3%, +0.6%]

@quaquel quaquel merged commit d3781de into mesa:main Jan 12, 2026
14 checks passed
@codebreaker32 codebreaker32 deleted the bug/batch_run branch February 17, 2026 16:46

Labels

bug Release notes label

Development

Successfully merging this pull request may close these issues.

Batch runner doesn't collect data on last step

3 participants