Skip to content

Fix: Correct FixedAgent state after removal#3100

Merged
quaquel merged 5 commits intomesa:mainfrom
Sonu0305:3099
Jan 9, 2026
Merged

Fix: Correct FixedAgent state after removal#3100
quaquel merged 5 commits intomesa:mainfrom
Sonu0305:3099

Conversation

@Sonu0305
Copy link
Copy Markdown
Contributor

@Sonu0305 Sonu0305 commented Jan 9, 2026

Summary

This pull request refactors the removal logic for FixedAgent to ensure its internal state is consistent after being removed. The remove() method now guarantees that the agent's .cell attribute is set to None.

Bug / Issue

This PR addresses an inconsistency in the state of a FixedAgent after remove() is called.
Fixes #3099

Implementation

Testing

To verify the fix, a new test case, test_fixed_agent_removal_state, was added to tests/discrete_space/test_discrete_space.py.

Additional Notes

This change improves the internal consistency of the FixedAgent's state after agent.remove() has been called.

@Sonu0305 Sonu0305 changed the title Fix: Correct FixedAgent state after removal Fix: Correct FixedAgent state after removal Jan 9, 2026
@github-actions
Copy link
Copy Markdown

github-actions bot commented Jan 9, 2026

Performance benchmarks:

Model Size Init time [95% CI] Run time [95% CI]
BoltzmannWealth small 🔵 +2.2% [+1.0%, +3.4%] 🔵 +0.7% [+0.5%, +0.8%]
BoltzmannWealth large 🔵 +0.8% [+0.3%, +1.5%] 🔵 +4.5% [+1.0%, +8.1%]
Schelling small 🔵 +0.5% [-0.5%, +1.7%] 🔵 -0.1% [-0.7%, +0.6%]
Schelling large 🔵 +0.4% [+0.1%, +0.8%] 🔴 +7.2% [+5.4%, +8.9%]
WolfSheep small 🔵 +1.2% [+0.9%, +1.5%] 🔵 +0.7% [+0.3%, +1.1%]
WolfSheep large 🔵 -0.8% [-3.7%, +1.1%] 🔵 +1.3% [-0.8%, +3.2%]
BoidFlockers small 🔵 +1.9% [+1.6%, +2.3%] 🔵 +1.0% [+0.8%, +1.3%]
BoidFlockers large 🔵 +1.2% [+0.5%, +2.0%] 🔵 +0.0% [-0.2%, +0.2%]

@quaquel
Copy link
Copy Markdown
Member

quaquel commented Jan 9, 2026

It eliminates a potential source of subtle bugs in user models that rely on the state of removed agents.

This is not correct. The moment you call agent.remove(), this also removes the agent from the model itself. So, anything that relies on model.agent, will no longer access the removed agent.

So, in short, technically this PR is cleaner than the current code, so I am fine with merging it after review. However, the claimed bug is simply not there.

@EwoutH EwoutH requested a review from quaquel January 9, 2026 12:47
@EwoutH EwoutH added the bug Release notes label label Jan 9, 2026
@Sonu0305
Copy link
Copy Markdown
Contributor Author

Sonu0305 commented Jan 9, 2026

It eliminates a potential source of subtle bugs in user models that rely on the state of removed agents.

This is not correct. The moment you call agent.remove(), this also removes the agent from the model itself. So, anything that relies on model.agent, will no longer access the removed agent.

So, in short, technically this PR is cleaner than the current code, so I am fine with merging it after review. However, the claimed bug is simply not there.

Right, makes sense.
The problem is that the agent object itself, is left in an inconsistent state. Its cell property does not reflect reality.

@quaquel
Copy link
Copy Markdown
Member

quaquel commented Jan 9, 2026

The problem is that the agent object itself, is left in an inconsistent state. Its cell property does not reflect reality.

Exactly, which is unlikely to cause a bug because the agent has been removed, so no reference to it inside Mesa itself exists anymore. If you can refine the PR description, I'll merge this asap.

@Sonu0305
Copy link
Copy Markdown
Contributor Author

Sonu0305 commented Jan 9, 2026

The problem is that the agent object itself, is left in an inconsistent state. Its cell property does not reflect reality.

Exactly, which is unlikely to cause a bug because the agent has been removed, so no reference to it inside Mesa itself exists anymore. If you can refine the PR description, I'll merge this asap.

Is the new description appropriate?
Thanks.

@Sonu0305 Sonu0305 requested a review from quaquel January 9, 2026 13:28
@quaquel quaquel merged commit f962207 into mesa:main Jan 9, 2026
14 checks passed
@Sonu0305 Sonu0305 deleted the 3099 branch January 9, 2026 18:12
@codebreaker32
Copy link
Copy Markdown
Collaborator

codebreaker32 commented Jan 9, 2026

I think this was an intended feature (albeit implemented via a "side effect") rather than a bug.

It appears the author(@Corvince ) intentionally left the _mesa_cell variable pointing to the old cell in FixedAgent.remove() specifically to maintain the immobility contract. If we "clean up" this variable by setting it to None, we accidentally unlock the agent, allowing it to be resurrected and teleported.

To Reproduce:

from mesa.model import Model
from mesa.discrete_space import OrthogonalMooreGrid
from mesa.discrete_space.cell_agent import FixedAgent, CellAgent

class Walker(CellAgent):
    """A commuter trying to walk East."""
    def step(self):
        # Try to move East (0, 1) relative to current position
        print(f"Walker at {self.cell.coordinate} trying to move East...")
        
        # Get the cell to the East
        current_x, current_y = self.cell.coordinate
        target_cell = self.model.grid._cells.get((current_x, current_y + 1))
        
        if target_cell and target_cell.is_empty:
            self.cell = target_cell
            print(f" -> Success! Moved to {self.cell.coordinate}")
        else:
            # If blocked, report WHO is blocking us
            blockers = target_cell.agents if target_cell else []
            print(f" -> BLOCKED! Path obstructed by: {blockers}")

class MockModel(Model):
    def __init__(self):
        super().__init__()
        self.grid = OrthogonalMooreGrid((1, 3), random=self.random)
        
        self.walker = Walker(self)
        self.walker.cell = self.grid[(0, 0)]
        
        self.building = FixedAgent(self)
        self.building.cell = self.grid[(0, 2)]

    def step(self):
        self.agents.shuffle_do("step")


city = MockModel()

print(f"1. Initial State:")
print(f"   Walker is at {city.walker.cell.coordinate}")
print(f"   Building is at {city.building.cell.coordinate}")
print(f"   Grid (0,1) is empty: {city.grid[(0,1)].is_empty}")

print("\n2. EVENT: Demolishing the Building...")
# Correctly remove from scheduler (Time)
city.building.remove() 

print("\n3. EVENT: The Exploit (Teleportation)")
# We move the 'dead' building to (0, 1) - right in the Walker's path
# This should NOT be allowed for a FixedAgent
city.building.cell = city.grid[(0, 1)] 
print(f"   Zombie Building successfully placed at {city.building.cell.coordinate}")

print(f"\n4. VERIFICATION: Is the building in the Model?")
print(f"   In Scheduler? {'Yes' if city.building in city.agents else 'NO'}")
print(f"   (The model thinks the building is gone!)")

print("\n5. STEP: Walker tries to move")
# The walker steps. It should be blocked by the "ghost" building.
city.step()

output

Before:

1. Initial State:
   Walker is at (0, 0)
   Building is at (0, 2)
   Grid (0,1) is empty: True

2. EVENT: Demolishing the Building...

3. EVENT: The Exploit (Teleportation)
Traceback (most recent call last):
  File "/home/aman/mesa/script.py", line 52, in <module>
    city.building.cell = city.grid[(0, 1)] 
    ^^^^^^^^^^^^^^^^^^
  File "/home/aman/mesa/mesa/discrete_space/cell_agent.py", line 83, in cell
    raise ValueError("Cannot move agent in FixedCell")
ValueError: Cannot move agent in FixedCell

after

1. Initial State:
   Walker is at (0, 0)
   Building is at (0, 2)
   Grid (0,1) is empty: True

2. EVENT: Demolishing the Building...

3. EVENT: The Exploit (Teleportation)
   Zombie Building successfully placed at (0, 1)

4. VERIFICATION: Is the building in the Model?
   In Scheduler? NO
   (The model thinks the building is gone!)

5. STEP: Walker tries to move
Walker at (0, 0) trying to move East...
 -> BLOCKED! Path obstructed by: [<mesa.discrete_space.cell_agent.FixedAgent object at 0x7f3ee5cb5be0>]

While it looks like a bug, retaining the _mesa_cell reference was currently the only thing preventing a removed FixedAgent from being reassigned to a new location.
The FixedCell mixin enforced immobility by checking if self.cell is not None

Please let me know if I am wrong in assuming it!

@quaquel
Copy link
Copy Markdown
Member

quaquel commented Jan 9, 2026

I built the discrete spaces in close interaction with Corvince. I also added Agent.remove().

To understand why this PR is fine and was already merged, please carefully check what happens in Agent.remove() and how it interacts with model.agents, and most importantly, model._agents. You only ever call agent.remove() to completely remove an agent from the simulation. With a properly implemented model, after you have called this, the object will be garbage collected.

Your example works, because you have done self.building = FixedAgent(self). This creates a hard reference to the agent, preventing garbage collection from cleaning it up. The proper way to handle this would be to either subclass FixedAgent to clear model.building, use a weakref for model.building, or (best practice) simply not assigning FixedAgent(self) to an attribute.

@codebreaker32
Copy link
Copy Markdown
Collaborator

I just asked this out of curiosity(seeing it merged), so I wanted to dig deeper to see what would happen. Thanks for the explanation, really appreciate it :)

@quaquel
Copy link
Copy Markdown
Member

quaquel commented Jan 9, 2026

What your example highlights, but what is not well-documented yet in the Mesa docs, is that you have to be careful with hard references to agents.

I ran into this myself with a PhD student I was advising. She had a model where agents had a social network. Basically, each agent had friends, and all this was implemented using a normal dict. When running a large parameter sweep on this model, we encountered a massive memory issue. The social network of hard references was so large that the garbage collector was unable to detect that it was, in fact, a cyclical network and so could be cleared.

If agent.friends had been implemented using a weakkeydict, the entire problem would have been avoided.

@codebreaker32
Copy link
Copy Markdown
Collaborator

codebreaker32 commented Jan 9, 2026

I think it’s always helpful to learn from real-world experiences like that, it definitely gave me some good perspective.
Really appreciate you sharing this :)

@hillhack
Copy link
Copy Markdown

hillhack commented Mar 7, 2026

After going through the discussion above, I want to clarify the interpretation of the two viewpoints.

@codebreaker32 — As I understand, the idea was that the current “broken” setter was effectively acting as a guardrail.
If the internal variable were cleared, a user could potentially “resurrect” a removed FixedAgent (for example a building) and move it elsewhere. The intention was to keep that behavior as a safeguard.

@quaquel — your perspective seems to be that once an agent is removed, it should simply be considered dead.
If a user keeps a hard reference to that object and tries to reuse it, that is a misuse of the API, and the library should not rely on stale internal state as a protective mechanism.

However, the main issue is that the Spatial API itself becomes inconsistent.

A common Mesa operation is:

agent.cell = None

which is the standard way to remove an agent from the grid.

For FixedAgent, this operation currently raises either a ValueError or an AttributeError.
So the intended “safety guardrail” ends up behaving more like a barbed wire fence, blocking normal use of the API and causing crashes in otherwise valid workflows (e.g., removing a building from the grid to place something else).

The goal of the fix is simply to restore consistent spatial API behavior, without requiring users to bypass the public API or modify private attributes.
This is reason why i raise #3467 pr. If still any confusion do let me know. or if m wrong do correct me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Release notes label

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Inconsistent state in FixedAgent after removal

5 participants