Skip to content

fix: prevent infinite loop in select_random_empty_cell via heuristic fallback#3014

Merged
quaquel merged 7 commits intomesa:mainfrom
DipayanDasgupta:fix-grid-infinite-loop-2968
Dec 29, 2025
Merged

fix: prevent infinite loop in select_random_empty_cell via heuristic fallback#3014
quaquel merged 7 commits intomesa:mainfrom
DipayanDasgupta:fix-grid-infinite-loop-2968

Conversation

@DipayanDasgupta
Copy link
Copy Markdown
Contributor

@DipayanDasgupta DipayanDasgupta commented Dec 25, 2025

Closes #2968

This PR fixes the infinite loop in Grid.select_random_empty_cell when the grid is full, while optimizing for performance on large grids. It includes a comprehensive regression test suite.

The Issue

The original implementation used an unbounded while True loop to find a random empty cell. On a full grid, this condition is never met, causing the simulation to hang indefinitely.

The Solution: Heuristic Fallback

Instead of checking every cell (O(N)) or looping indefinitely, this implementation uses a constant-time heuristic:

  1. Try Random (O(1)): It attempts to find an empty cell via random sampling up to 50 times. This preserves the performance benefits for sparse grids.
  2. Fallback (Deterministic): If no empty cell is found after 50 tries (implying the grid is dense or full), it immediately falls back to super().select_random_empty_cell().

The parent method correctly builds a list of empty cells and selects one, or raises an IndexError if the grid is truly full.

Changes

  • Source: Modified mesa/discrete_space/grid.py to replace while True with a bounded loop.
  • Tests: Added tests/discrete_space/test_grid_sampling.py with comprehensive coverage:
    1. Full Grid: Verifies fallback is called and IndexError is raised.
    2. Sparse Grid: Verifies random sampling returns a cell quickly (fast path).
    3. Nearly Full Grid: Verifies the correct empty cell is found even when probability is low.
    4. Disabled Random: Verifies _try_random=False forces immediate fallback.
  • Documentation: Preserved original context comments regarding the cutoff trade-offs.

Verification

  • Passed all new unit tests locally.
  • Passed strict ruff linting and formatting.
  • Verified that UserWarning regarding RNG is suppressed in tests by using explicit fixtures.

@DipayanDasgupta
Copy link
Copy Markdown
Contributor Author

@coderabbitai review and give feedback on improvements possible

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Dec 25, 2025

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Get the diff for the PR to see what changed
gh pr diff 3014

# Read the modified file to understand context
cat mesa/discrete_space/grid.py

# Check if there are test files related to this change
fd -e py test | rg -l "select_random_empty_cell" || echo "No test files found with select_random_empty_cell"

Length of output: 12349


@DipayanDasgupta, great fix for the infinite loop issue! The bounded loop approach is solid and the fallback mechanism is well-implemented. Here's my detailed feedback with suggestions for improvements:

✅ What Works Well

  1. Core fix is correct: The bounded loop prevents infinite loops while preserving the O(1) fast path for sparse grids
  2. Fallback logic: Properly delegates to the parent method which handles empty grid detection
  3. Behavior preserved: When _try_random=False, it correctly skips random sampling entirely

🔧 Suggested Improvements

1. Add Comprehensive Unit Tests (Critical)

No tests were found for select_random_empty_cell. This is a critical bug fix that should include:

def test_select_random_empty_cell_full_grid():
    """Test that full grid raises IndexError instead of hanging."""
    grid = OrthogonalMooreGrid((5, 5), capacity=1)
    # Fill all cells
    for cell in grid.all_cells:
        cell.add_agent(SomeAgent())
    
    with pytest.raises(IndexError):
        grid.select_random_empty_cell()

def test_select_random_empty_cell_sparse_grid():
    """Test random selection on sparse grid completes quickly."""
    grid = OrthogonalMooreGrid((100, 100))
    # Should find empty cell quickly without fallback
    cell = grid.select_random_empty_cell()
    assert cell.is_empty

def test_select_random_empty_cell_nearly_full():
    """Test behavior when grid is 99% full."""
    grid = OrthogonalMooreGrid((10, 10), capacity=1)
    cells = list(grid.all_cells)
    # Fill all but one cell
    for cell in cells[:-1]:
        cell.add_agent(SomeAgent())
    
    empty_cell = grid.select_random_empty_cell()
    assert empty_cell == cells[-1]

2. Make Max Attempts Configurable

The magic number 50 lacks justification. Consider:

def __init__(self, ..., max_empty_search_attempts: int = 50):
    ...
    self._max_empty_search_attempts = max_empty_search_attempts

def select_random_empty_cell(self) -> T:
    if self._try_random:
        for _ in range(self._max_empty_search_attempts):
            cell = self.all_cells.select_random_cell()
            if cell.is_empty:
                return cell
    
    return super().select_random_empty_cell()

Alternative: Make it a class constant MAX_EMPTY_SEARCH_ATTEMPTS = 50 with documentation explaining the trade-off.

3. Improve Documentation

The comments could be more explicit:

def select_random_empty_cell(self) -> T:
    """Select a random empty cell from the grid.
    
    Uses a heuristic approach: attempts random sampling up to 50 times (O(1) per attempt),
    then falls back to exhaustive search (O(N) where N is total cells).
    
    Returns:
        T: A random empty cell
        
    Raises:
        IndexError: If the grid is completely full
        
    See Also:
        https://github.com/mesa/mesa/issues/1052
        https://github.com/mesa/mesa/issues/2968
    """

4. Consider Adaptive Approach (Optional Enhancement)

For very large grids, 50 attempts might be insufficient before hitting the expensive O(N) fallback. Consider:

# Scale attempts with grid size (e.g., log or sqrt)
max_attempts = min(50, max(10, int(math.log10(len(self._cells)) * 10)))

5. Add Performance Metrics (Optional)

For debugging and optimization, consider tracking fallback usage:

# In __init__
self._empty_cell_search_stats = {"fast_path": 0, "fallback": 0}

# In select_random_empty_cell
if self._try_random:
    for _ in range(50):
        cell = self.all_cells.select_random_cell()
        if cell.is_empty:
            self._empty_cell_search_stats["fast_path"] += 1
            return cell

self._empty_cell_search_stats["fallback"] += 1
return super().select_random_empty_cell()

6. Consider Performance Trade-off Documentation

Add a note about when to toggle _try_random:

# When grid is >98% full, set _try_random=False to skip random attempts
# grid._try_random = False

Priority Recommendations

  1. High Priority: Add unit tests (especially for full grid case)
  2. Medium Priority: Document why 50 was chosen or make it configurable
  3. Low Priority: Consider adaptive approach for very large grids

The fix is production-ready as-is, but tests would greatly increase confidence in the implementation!

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@github-actions
Copy link
Copy Markdown

Performance benchmarks:

Model Size Init time [95% CI] Run time [95% CI]
BoltzmannWealth small 🔵 -1.9% [-3.1%, -0.7%] 🔵 -0.2% [-0.4%, -0.1%]
BoltzmannWealth large 🔵 -0.0% [-2.0%, +3.2%] 🟢 -5.9% [-7.6%, -4.2%]
Schelling small 🔵 -0.4% [-1.5%, +0.6%] 🔵 -0.3% [-0.7%, +0.2%]
Schelling large 🔵 +0.8% [-0.3%, +2.5%] 🔵 +1.3% [-1.2%, +3.6%]
WolfSheep small 🔵 -0.4% [-0.8%, -0.0%] 🔵 -0.5% [-0.7%, -0.3%]
WolfSheep large 🔵 -0.7% [-4.1%, +1.2%] 🔵 +2.4% [+0.6%, +4.0%]
BoidFlockers small 🔵 -2.1% [-2.6%, -1.6%] 🔵 +0.2% [-0.1%, +0.5%]
BoidFlockers large 🔵 -1.2% [-1.6%, -0.9%] 🔵 +0.7% [+0.5%, +1.0%]

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Dec 25, 2025

Walkthrough

The select_random_empty_cell method in the grid module has been modified to prevent infinite loops. It now limits random sampling attempts to 50 iterations, and if no empty cell is found, it falls back to the parent class's deterministic implementation, ensuring termination in all cases.

Changes

Cohort / File(s) Summary
Bug Fix: Prevent Infinite Loop in Random Cell Selection
mesa/discrete_space/grid.py
Modified select_random_empty_cell to cap random attempts at 50 before falling back to parent class implementation. Addresses hang when grid is full or nearly full.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

🐰 A grid that hung when full and tight,
Now bounces back with bounded might,
Fifty tries, then fallback's grace—
No more loops in this tight space! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The pull request title clearly and concisely summarizes the main change: preventing an infinite loop in select_random_empty_cell by implementing a heuristic fallback.
Linked Issues check ✅ Passed The PR successfully addresses the core objective from #2968: implementing a bounded heuristic with fallback to prevent infinite loops while preserving performance for sparse grids.
Out of Scope Changes check ✅ Passed All changes are directly related to fixing the infinite loop issue in select_random_empty_cell; no unrelated modifications were introduced.
Description check ✅ Passed The PR description comprehensively covers the issue, solution, changes made, and verification steps, following a clear structure.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
mesa/discrete_space/grid.py (1)

151-154: Bounded loop successfully prevents infinite loops.

The fix correctly limits random sampling attempts, ensuring termination in all cases. The logic preserves the O(1) fast path for sparse grids while guaranteeing the method will never hang.

Consider making the attempt limit configurable.

The hardcoded value of 50 works well, but making it configurable would improve flexibility for different use cases (e.g., very large grids or performance tuning).

💡 Options for making the limit configurable

Option 1: Class constant

class Grid(DiscreteSpace[T], HasPropertyLayers):
    """Base class for all grid classes."""
    
    MAX_RANDOM_ATTEMPTS = 50  # Maximum attempts before fallback to deterministic search
    
    # ... rest of class ...
    
    def select_random_empty_cell(self) -> T:
        if self._try_random:
            for _ in range(self.MAX_RANDOM_ATTEMPTS):
                # ... rest of method

Option 2: Constructor parameter

def __init__(
    self,
    dimensions: Sequence[int],
    torus: bool = False,
    capacity: float | None = None,
    random: Random | None = None,
    cell_klass: type[T] = Cell,
    max_random_attempts: int = 50,
) -> None:
    # ... existing init code ...
    self._max_random_attempts = max_random_attempts

Option 3 (minimal): Inline comment

for _ in range(50):  # Attempt limit chosen as balance between performance and reliability
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3be1194 and 4c45f9e.

📒 Files selected for processing (1)
  • mesa/discrete_space/grid.py
🧰 Additional context used
🧬 Code graph analysis (1)
mesa/discrete_space/grid.py (3)
mesa/discrete_space/discrete_space.py (2)
  • all_cells (151-155)
  • select_random_empty_cell (168-170)
mesa/discrete_space/cell_collection.py (1)
  • select_random_cell (101-103)
mesa/discrete_space/cell.py (1)
  • is_empty (130-132)
🔇 Additional comments (2)
mesa/discrete_space/grid.py (2)

147-149: LGTM! Clear documentation of the fix.

The comments effectively explain the rationale for the bounded loop and link to the relevant issue for future reference.


156-158: LGTM! Fallback ensures deterministic termination.

The fallback to the parent class's deterministic implementation correctly handles both the case where random sampling exhausts its attempts and when the grid is truly full (raising IndexError as expected).

Comment on lines 143 to +158
def select_random_empty_cell(self) -> T: # noqa
# FIXME:: currently just a simple boolean to control behavior
# FIXME:: basically if grid is close to 99% full, creating empty list can be faster
# FIXME:: note however that the old results don't apply because in this implementation
# FIXME:: because empties list needs to be rebuild each time
# This method is based on Agents.jl's random_empty() implementation. See
# https://github.com/JuliaDynamics/Agents.jl/pull/541. For the discussion, see
# https://github.com/mesa/mesa/issues/1052 and
# https://github.com/mesa/mesa/pull/1565. The cutoff value provided
# is the break-even comparison with the time taken in the else branching point.
#
# FIX: Added a limit (50 attempts) to prevent infinite loops on full grids.
# See https://github.com/mesa/mesa/issues/2968
if self._try_random:
while True:
for _ in range(50):
cell = self.all_cells.select_random_cell()
if cell.is_empty:
return cell
else:
return super().select_random_empty_cell()

# Fallback to the robust parent method (O(N)) if random sampling fails
# or if the grid is full.
return super().select_random_empty_cell()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for test files related to grid and discrete_space

echo "=== Repository structure for tests ==="
find . -type d -name "tests" -o -name "test" | head -20

echo -e "\n=== Searching for grid-related test files ==="
fd -e py -t f "test.*grid" -i

echo -e "\n=== Searching for discrete_space test files ==="
fd -e py -t f -path "*test*" "discrete_space" -i

echo -e "\n=== Checking main test directory structure ==="
if [ -d "tests" ]; then ls -la tests/ | head -20; fi
if [ -d "test" ]; then ls -la test/ | head -20; fi

Repository: mesa/mesa

Length of output: 1561


🏁 Script executed:

#!/bin/bash
# Search for select_random_empty_cell references in test files

echo "=== Searching for select_random_empty_cell in all Python test files ==="
rg "select_random_empty_cell" --type py -g "*test*" -C 3

echo -e "\n=== Searching for test files that might test this method ==="
rg "select_random_empty_cell" --type py -C 2

echo -e "\n=== Searching for Grid class tests ==="
rg "class.*Test.*Grid|def test.*empty.*cell" --type py -g "*test*" -i -C 2

Repository: mesa/mesa

Length of output: 11791


🏁 Script executed:

#!/bin/bash
# Get the full test file to see complete test coverage

echo "=== Content of test_discrete_space.py ==="
cat tests/discrete_space/test_discrete_space.py

echo -e "\n=== Lines around select_random_empty_cell in test_discrete_space.py ==="
rg "select_random_empty_cell" tests/discrete_space/test_discrete_space.py -B 10 -A 10

Repository: mesa/mesa

Length of output: 37856


🏁 Script executed:

#!/bin/bash
# Also check test_grid.py for any tests related to select_random_empty_cell

echo "=== Searching test_grid.py for select_random_empty_cell ==="
rg "select_random_empty_cell" tests/discrete_space/test_grid.py -B 5 -A 5

echo -e "\n=== Full content of test_grid.py (first 200 lines) ==="
head -200 tests/discrete_space/test_grid.py

Repository: mesa/mesa

Length of output: 6964


Add dedicated unit tests for select_random_empty_cell() in the Grid class.

While select_random_empty_cell() is used in existing tests (e.g., test_agents_property), there are no dedicated unit tests for the Grid class method being fixed. Comprehensive test coverage is essential for this critical bug fix, especially to prevent regressions on edge cases.

Add tests to cover:

  1. Full grid: Verify that the fallback to super().select_random_empty_cell() is called when all cells are occupied.
  2. Sparse grid: Verify that an empty cell is returned quickly when grid has many empty cells.
  3. Nearly-full grid: Verify that the 50-attempt limit and fallback work correctly when empty cells are rare.
  4. _try_random=False: Verify that the fallback path is used when random sampling is disabled.
🧰 Tools
🪛 Ruff (0.14.10)

143-143: Unused blanket noqa directive

Remove unused noqa directive

(RUF100)

🤖 Prompt for AI Agents
mesa/discrete_space/grid.py around lines 143-158: add a new pytest file (e.g.,
tests/test_grid_select_random_empty_cell.py) with four focused unit tests: (1)
Full grid: create a grid with every cell occupied, monkeypatch/spy the parent
Grid.select_random_empty_cell to assert it is called and that the method returns
its value; (2) Sparse grid: create a large grid with many empty cells and assert
select_random_empty_cell returns an empty cell quickly (no fallback call); (3)
Nearly-full grid: populate the grid so empty cells are rare (e.g., one empty
cell) and assert the method either finds the empty cell or falls back after 50
attempts by spying the parent fallback was invoked; (4) _try_random=False: set
grid._try_random = False and assert the parent fallback is invoked immediately.
Use pytest fixtures and monkeypatch to spy the superclass fallback (patch the
base class method or use a small subclass wrapper) and assert correct behavior
and returned cell validity for each case.

@DipayanDasgupta DipayanDasgupta force-pushed the fix-grid-infinite-loop-2968 branch from 4c45f9e to 5e430a2 Compare December 25, 2025 14:03
@DipayanDasgupta
Copy link
Copy Markdown
Contributor Author

Thanks for the feedback! I have implemented the requested changes:
Added Comprehensive Tests: Created tests/discrete_space/test_grid_sampling.py which explicitly verifies:
Full grid: Confirms IndexError is raised via fallback.
Sparse grid: Confirms random sampling works quickly.
Nearly full grid: Confirms the correct empty cell is found (handling CellCollection indexing correctly).
Disabled random: Confirms _try_random=False forces immediate fallback.
Linting & Docs: Added the missing docstring to select_random_empty_cell and removed the unused # noqa directive.
All local tests and pre-commit hooks are passing

@DipayanDasgupta DipayanDasgupta force-pushed the fix-grid-infinite-loop-2968 branch 2 times, most recently from 19781e2 to 1bc58ee Compare December 25, 2025 14:09
# See https://github.com/mesa/mesa/issues/2968
if self._try_random:
while True:
for _ in range(50):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest turning this into an attribute. As a general rule, it's bad to have implicit constants. By making this an attribute (e.g., self._n_tries_random), users can also potentially control the behavior.

@@ -0,0 +1,84 @@
"""Unit tests for Grid.select_random_empty_cell heuristic behavior."""
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please include these in the overall test file rather than adding new test files.

@DipayanDasgupta DipayanDasgupta force-pushed the fix-grid-infinite-loop-2968 branch from 6213a77 to 6a4eaeb Compare December 29, 2025 05:16
@DipayanDasgupta
Copy link
Copy Markdown
Contributor Author

DipayanDasgupta commented Dec 29, 2025

@quaquel implemented it, _n_tries_random is now an instance attribute (default 50).
Moved the new tests into tests/discrete_space/test_grid.py as requested.

@quaquel quaquel added the bug Release notes label label Dec 29, 2025
@quaquel
Copy link
Copy Markdown
Member

quaquel commented Dec 29, 2025

Moved the new tests into tests/discrete_space/test_grid.py as requested.

Are you sure you commited this change because I still see the old test file under files changed at the top.

@quaquel quaquel mentioned this pull request Dec 29, 2025
@DipayanDasgupta DipayanDasgupta force-pushed the fix-grid-infinite-loop-2968 branch from 6b80f41 to c57a1b5 Compare December 29, 2025 14:40
@@ -1,4 +1,4 @@
"""Test Solara visualizations."""
"""Test Solara visualizations - Modern API."""
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be part of the PR. Please remove any changes you have made here (probably some branching issue?)

Copy link
Copy Markdown
Contributor Author

@DipayanDasgupta DipayanDasgupta Dec 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya im sorry, I was working on another branch , and probably forgot to switch, which caused all this confusion. I'm so sorry.

@DipayanDasgupta DipayanDasgupta force-pushed the fix-grid-infinite-loop-2968 branch 2 times, most recently from cc899d3 to 5ae3e06 Compare December 29, 2025 14:58
@quaquel
Copy link
Copy Markdown
Member

quaquel commented Dec 29, 2025

I have taken the liberty to make a few additional changes to your PR so it can be merged, which I will do later today.

Thanks so far.

@quaquel quaquel merged commit 1d674e8 into mesa:main Dec 29, 2025
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Release notes label

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Grid.select_random_empty_cell hangs on full grids

2 participants