fix: Isolate step containers network namespace to match docker:// action semantics #1333

Merged
mfenniak merged 3 commits from syncstack/runner:fix/step-container-network-isolation into main 2026-02-01 06:15:31 +00:00
Contributor

When using uses: docker://... in workflows, step containers are created with NetworkMode: "container:<job_container_name>", which makes them share the entire network namespace with the job container, including:

  • Network interfaces
  • IP addresses
  • Hostname
  • Ports and localhost

Reproduction:

jobs:
  test:
    runs-on: ubuntu-latest
    container: alpine:latest
    steps:
      - run: hostname

      - uses: docker://busybox:latest
        with:
          args: hostname

When you exec into the step container, hostname returns the job container's ID, not its own. This makes debugging confusing and breaks the expected isolation model.

Expected Behavior

As a user, when I specify uses: docker://image, I expect:

  1. Container isolation: The step runs in a separate, isolated container
  2. Own identity: The container has its own hostname (container ID by default)
  3. Network communication: The step can still communicate with the job container and services via the Docker network

Solution

Changed the network configuration in step_docker.go to connect step containers to the Docker network by name instead of sharing the job container's network namespace.
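As a rough illustration of the difference, here is a minimal Go sketch. It is not the code from step_docker.go: the `stepNetworkConfig` type, the function names, and the alias sanitization rule are all assumptions made for this example. Only the `container:<job_container_name>` mode string, the use of a network name, and the idea of an alias derived from the step ID come from this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// stepNetworkConfig is a hypothetical, simplified stand-in for the network
// settings of a step container. It is not the runner's real type.
type stepNetworkConfig struct {
	NetworkMode    string
	NetworkAliases []string
}

// sharedNamespaceConfig models the old behavior: the step container joins the
// job container's network namespace, so it sees the same interfaces, IP
// addresses, hostname, and localhost.
func sharedNamespaceConfig(jobContainerName string) stepNetworkConfig {
	return stepNetworkConfig{NetworkMode: "container:" + jobContainerName}
}

// isolatedConfig models the new behavior: the step container gets its own
// namespace and is attached to the network by name, with an alias derived
// from the step ID.
func isolatedConfig(networkName, stepID string) stepNetworkConfig {
	return stepNetworkConfig{
		NetworkMode:    networkName,
		NetworkAliases: []string{sanitizeAlias(stepID)},
	}
}

// sanitizeAlias applies an assumed rule (not taken from the PR): lowercase
// the ID and replace every character outside [a-z0-9-] with '-'.
func sanitizeAlias(id string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(id) {
		if (r >= 'a' && r <= 'z') || (r >= '0' && r <= '9') || r == '-' {
			b.WriteRune(r)
		} else {
			b.WriteRune('-')
		}
	}
	return b.String()
}

func main() {
	fmt.Printf("%+v\n", sharedNamespaceConfig("job-container"))
	fmt.Printf("%+v\n", isolatedConfig("job-network", "Run Curl_1"))
}
```

With the second configuration the step container no longer shares localhost with the job container, so other containers would be reached by name over the shared network rather than via 127.0.0.1.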

Potentially Breaking Changes

This may change behavior for workflows that rely on a shared network namespace:

Potentially affected pattern:

jobs:
  test:
    container: node:18
    steps:
      # If something in job container listens on localhost:8080
      - run: node server.js &
      
      # This docker:// step tries to access it
      - uses: docker://curlimages/curl:latest
        run: curl http://localhost:8080  # Will fail after this PR
  • bug fixes
    • PR: fix: Isolate step containers network namespace to match docker:// action semantics
- Use `rc.getNetworkName` instead of `container:jobContainerName`
- Add `NetworkAliases` with sanitized step ID
test: remove fork-specific tests and align with upstream test suite
All checks were successful
checks / validate pre-commit-hooks file (pull_request) Successful in 43s
checks / validate mocks (pull_request) Successful in 48s
checks / Build Forgejo Runner (pull_request) Successful in 52s
checks / runner exec tests (pull_request) Successful in 40s
checks / Build unsupported platforms (pull_request) Successful in 1m7s
checks / integration tests (docker-latest) (pull_request) Successful in 9m5s
checks / integration tests (docker-stable) (pull_request) Successful in 11m3s
cascade / debug (pull_request_target) Has been skipped
cascade / forgejo (pull_request_target) Has been skipped
cascade / end-to-end (pull_request_target) Has been skipped
issue-labels / release-notes (pull_request_target) Successful in 7s
716f398630
Member

Thanks a lot. Makes sense and looks good. I don't approve it because I'm not that familiar with all aspects of Docker networking in Forgejo Runner.

Contributor

cascading-pr updated at actions/setup-forgejo#864

Owner

What's the GitHub Actions behavior? I think most users assume the same behavior as GitHub.

Author
Contributor

I currently can't check it on GitHub Actions.
If someone has the ability to verify how GitHub handles this scenario, that would be valuable information.

P.S. While I understand the desire for compatibility, I think there's value in
considering what the correct behavior should be, regardless of what GitHub does.
Sometimes it's worth doing the right thing rather than copying bugs. :)

Member

GitHub Actions prints different container IDs when running the reproducer whereas Forgejo Actions prints identical container IDs.

The example listed under "Potentially affected pattern" doesn't work out of the box and I wasn't able to find a variant that works on either GitHub Actions or Forgejo Actions. My latest attempt:

on:
  push:
  workflow_dispatch:
jobs:
  test:
    runs-on: ubuntu-latest
    container: nginx:latest
    steps:
      - run: nginx &
      
      - uses: docker://curlimages/curl:latest
        with:
          args: curl http://localhost:8080

mfenniak approved these changes 2026-02-01 06:15:13 +00:00
mfenniak left a comment
Owner

This change looks great to me.

I think the real-world risk of this being a breaking change is very low -- you'd have to be doing something weird, which I'm not even confident is possible, to leave a running process in the job container from an earlier step. And that weirdness would have to occur concurrently with the pretty uncommon usage of uses: docker://, of course.

I was more concerned initially with "are services still accessible?", which I wanted to do a hands-on test for. That worked perfectly, as below. I am wondering if we should include a more integration-style test which validates the network access we expect to have, as the current added test is a bit more "did we configure it the way we expect" rather than "does it work the way we expect"... but I think it's a grey area as the runner test suite's job isn't to test the OCI runtime works like we're telling it to. I'll leave that as an open question which can be addressed in a future PR if desired.

on:
  pull_request:

jobs:
  test:
    runs-on: docker
    services:
      maindb:
        image: data.forgejo.org/oci/mysql:8.4
        env:
          MYSQL_DATABASE: dbname
          MYSQL_USER: dbuser
          MYSQL_PASSWORD: dbpass
          MYSQL_RANDOM_ROOT_PASSWORD: yes
        options: --health-cmd="mysqladmin ping" --health-interval=10s --health-timeout=5s --health-retries=3
    steps:
    - uses: docker://data.forgejo.org/oci/mysql:8.4
      with:
        entrypoint: mysql
        args: -u dbuser -D dbname -pdbpass -h maindb -e "create table T(id INT NOT NULL AUTO_INCREMENT, val VARCHAR(255), PRIMARY KEY (id))"
Owner

Thanks for the contribution!

Reference
forgejo/runner!1333