Bug: Race condition between stop and rm can leak network resources #130

@ramsyana

Description

I've been digging into the container lifecycle and found a potential race condition that can occur when a user runs container stop <name> followed quickly by container rm <name>.

The Problem

The stop command returns control to the user before all of its background resource cleanup (like deallocating network IPs) is finished. If a user immediately runs rm, the container's on-disk assets can be deleted while the stop command's cleanup task is still trying to access them.

This causes two issues:

  1. The stop task crashes with a "file not found" error, creating noise in the system logs.
  2. More importantly, because the cleanup task fails, the container's IP address is never deallocated, leading to a resource leak.

How to Reproduce

It's a race, but this sequence triggers it:

  1. container run -d --name test-race alpine sleep infinity
  2. container stop test-race; container rm test-race (run both in quick succession)
  3. Observe file-access errors in container system logs, emitted after the container bundle has been removed.
  4. Each time the race is hit, one IP address is leaked, so repeating steps 1 and 2 will eventually exhaust the IP address pool.

Proposal for a Fix

The root of the issue seems to be that the "stopped" state isn't granular enough. The system needs to differentiate between "stop requested" and "fully cleaned up."

I think we can solve this by making the delete operation aware of the stop operation's full lifecycle. A possible approach:

  1. Introduce a more descriptive state: Add a .cleaning state to the SandboxService state machine to signal that resource deallocation is in progress.
  2. Make delete wait: When ContainersService.delete() sees a container in a .stopping or .cleaning state, it should wait until the state transitions to fully .stopped before proceeding with the deletion.
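
A rough sketch of the idea in Swift, to make the proposal concrete. Only the .stopping, .cleaning, and .stopped state names come from the list above. SandboxLifecycle, waitUntilStopped, DeleteError, and the delete signature are hypothetical names for illustration and are not taken from the actual SandboxService or ContainersService code.

```swift
// Hypothetical sketch: not the project's real types or API.
enum SandboxState: Equatable {
    case running
    case stopping   // stop requested, workload still shutting down
    case cleaning   // workload gone, resources (e.g. IPs) being released
    case stopped    // fully cleaned up, safe to delete on-disk assets
}

enum DeleteError: Error {
    case stillRunning
}

actor SandboxLifecycle {
    private(set) var state: SandboxState = .running
    // Callers suspended until the sandbox reaches .stopped.
    private var waiters: [CheckedContinuation<Void, Never>] = []

    func transition(to newState: SandboxState) {
        state = newState
        guard newState == .stopped else { return }
        // Wake everyone who was waiting for cleanup to finish.
        let pending = waiters
        waiters.removeAll()
        for waiter in pending {
            waiter.resume()
        }
    }

    func waitUntilStopped() async {
        // Re-check on the actor so a transition cannot slip in unnoticed.
        if state == .stopped { return }
        await withCheckedContinuation { continuation in
            self.waiters.append(continuation)
        }
    }
}

// Assumed shape of the delete path: wait out .stopping/.cleaning,
// then remove the bundle only once cleanup has completed.
func delete(
    _ sandbox: SandboxLifecycle,
    removeBundle: @Sendable () throws -> Void
) async throws {
    switch await sandbox.state {
    case .running:
        throw DeleteError.stillRunning
    case .stopping, .cleaning:
        await sandbox.waitUntilStopped()
    case .stopped:
        break
    }
    try removeBundle()
}
```

With something like this, rm issued right after stop would simply block until the network cleanup finishes instead of racing it. A real implementation would probably also want a timeout on the wait so a stuck cleanup cannot hang delete forever.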

This would ensure resources are always cleaned up correctly before the on-disk assets are removed, making the whole process more robust.

I'd be happy to work on a PR for this if the approach seems reasonable.

Thanks for building such a cool project!
