forgejo/runner


fix: Isolate step containers network namespace to match docker:// action semantics #1333

Merged

mfenniak merged 3 commits from syncstack/runner:fix/step-container-network-isolation into main

2026-02-01 06:15:31 +00:00

syncstack commented

2026-01-24 21:31:58 +00:00

Contributor

When using uses: docker://... in workflows, step containers are created with NetworkMode: "container:<job_container_name>", which makes them share the entire network namespace with the job container, including:

Network interfaces
IP addresses
Hostname
Ports and localhost

Reproduction:

jobs:
  test:
    runs-on: ubuntu-latest
    container: alpine:latest
    steps:
      - run: hostname

      - uses: docker://busybox:latest
        with:
          args: hostname

When you exec into the step container, hostname returns the job container's ID, not its own. This makes debugging confusing and breaks the expected isolation model.

Expected Behavior

As a user, when I specify uses: docker://image, I expect:

Container isolation: The step runs in a separate, isolated container
Own identity: The container has its own hostname (container ID by default)
Network communication: Can still communicate with job container and services via Docker network

Solution

Changed network configuration in step_docker.go to connect step containers via network name instead of namespace sharing:
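A minimal sketch of the difference between the two modes, for illustration only. The `stepNetworkConfig` type and both helper functions are hypothetical stand-ins and are not the runner's actual code in `step_docker.go`; only the `container:<job_container_name>` form and the use of a network name with aliases come from this PR.

```go
package main

import "fmt"

// stepNetworkConfig is a hypothetical, simplified stand-in for the network
// settings passed when creating a step container. It is not the runner's type.
type stepNetworkConfig struct {
	NetworkMode    string
	NetworkAliases []string
}

// sharedNamespace models the previous behaviour: the step container joins the
// job container's network namespace, so hostname, interfaces and localhost
// are all shared with the job container.
func sharedNamespace(jobContainerName string) stepNetworkConfig {
	return stepNetworkConfig{NetworkMode: "container:" + jobContainerName}
}

// isolatedOnNetwork models the new behaviour: the step container gets its own
// namespace and is attached to the job's network by name, with an alias so
// other containers on that network can resolve it.
func isolatedOnNetwork(networkName, alias string) stepNetworkConfig {
	return stepNetworkConfig{
		NetworkMode:    networkName,
		NetworkAliases: []string{alias},
	}
}

func main() {
	// The container and network names below are made up for the example.
	fmt.Printf("before: %+v\n", sharedNamespace("job-abc"))
	fmt.Printf("after:  %+v\n", isolatedOnNetwork("job-abc-network", "step-1"))
}
```

With the second form, the job container and services stay reachable by name over the shared Docker network, but `localhost` inside the step container no longer refers to the job container.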

Maybe Breaking Changes

This may change behavior for workflows that rely on shared network namespace:

Potentially affected pattern:

jobs:
  test:
    container: node:18
    steps:
      # If something in job container listens on localhost:8080
      - run: node server.js &
      
      # This docker:// step tries to access it
      - uses: docker://curlimages/curl:latest
        with:
          args: curl http://localhost:8080  # Will fail after this PR

bug fixes
- PR: fix: Isolate step containers network namespace to match docker:// action semantics


syncstack added 3 commits

2026-01-24 21:31:58 +00:00

fix(docker): correct step container network configuration 0ea44fc9f6

- Use `rc.getNetworkName` instead of `container:jobContainerName`
- Add `NetworkAliases` with sanitized step ID
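The commit message does not spell out how the step ID is sanitized. As a rough sketch of what turning a step ID into a DNS-friendly network alias could look like, assuming a lowercase-and-hyphenate rule (the `sanitizeAlias` helper and its rule are hypothetical, not the runner's implementation):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// invalidAliasChars matches runs of characters outside a conservative
// hostname-like set. The allowed set is an assumption made for this sketch.
var invalidAliasChars = regexp.MustCompile(`[^a-z0-9-]+`)

// sanitizeAlias is a hypothetical helper: it lowercases the step ID, replaces
// each run of disallowed characters with a single hyphen, and trims leading
// and trailing hyphens.
func sanitizeAlias(stepID string) string {
	alias := invalidAliasChars.ReplaceAllString(strings.ToLower(stepID), "-")
	return strings.Trim(alias, "-")
}

func main() {
	fmt.Println(sanitizeAlias("Build & Test (linux)"))
}
```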

test(step_docker): clarify network mode assertion text in Docker step tests 9e9492e1c1

test: remove fork-specific tests and align with upstream test suite 716f398630

checks / validate pre-commit-hooks file (pull_request) Successful in 43s

checks / validate mocks (pull_request) Successful in 48s

checks / Build Forgejo Runner (pull_request) Successful in 52s

checks / runner exec tests (pull_request) Successful in 40s

checks / Build unsupported platforms (pull_request) Successful in 1m7s

checks / integration tests (docker-latest) (pull_request) Successful in 9m5s

checks / integration tests (docker-stable) (pull_request) Successful in 11m3s

cascade / debug (pull_request_target) Has been skipped

cascade / forgejo (pull_request_target) Has been skipped

cascade / end-to-end (pull_request_target) Has been skipped

issue-labels / release-notes (pull_request_target) Successful in 7s

aahlenst commented

2026-01-26 16:24:15 +00:00

Member

Thanks a lot. Makes sense and looks good. I'm not approving it because I'm not that familiar with all aspects of Docker networking in Forgejo Runner.

viceice added the run-end-to-end-tests label

2026-01-27 05:52:03 +00:00

cascading-pr referenced this pull request from actions/setup-forgejo

2026-01-27 05:52:14 +00:00

cascading-pr from https://code.forgejo.org/forgejo/runner refs/pull/1333/head to forgejo/runner-1333 #864

cascading-pr commented

2026-01-27 05:52:14 +00:00

Contributor

cascading-pr updated at actions/setup-forgejo#864

cascading-pr updated at https://code.forgejo.org/actions/setup-forgejo/pulls/864

viceice commented

2026-01-27 05:53:12 +00:00

Owner

What's the GitHub Actions behavior? I think most users assume the same behavior as GitHub.

syncstack commented

2026-01-27 09:55:35 +00:00

Author

Contributor

I currently can't check it on GitHub Actions.
If someone has the ability to verify how GitHub handles this scenario, that would be valuable information.

P.S. While I understand the desire for compatibility, I think there's value in
considering what the correct behavior should be, regardless of what GitHub does.
Sometimes it's worth doing the right thing rather than copying bugs. :)


aahlenst commented

2026-01-28 14:39:50 +00:00

Member

GitHub Actions prints different container IDs when running the reproducer whereas Forgejo Actions prints identical container IDs.

The example listed under "Potentially affected pattern" doesn't work out of the box and I wasn't able to find a variant that works on either GitHub Actions or Forgejo Actions. My latest attempt:

on:
  push:
  workflow_dispatch:
jobs:
  test:
    runs-on: ubuntu-latest
    container: nginx:latest
    steps:
      - run: nginx &
      
      - uses: docker://curlimages/curl:latest
        with:
          args: curl http://localhost:8080


mfenniak approved these changes

2026-02-01 06:15:13 +00:00

mfenniak left a comment

Owner

This change looks great to me.

I think the real-world risk of this being a breaking change is very low -- you'd have to be doing something weird, which I'm not even confident is possible, to leave a running process in the job container from an earlier step. And that weirdness would have to occur concurrently with the pretty uncommon usage of uses: docker://, of course.

I was more concerned initially with "are services still accessible?", which I wanted to do a hands-on test for. That worked perfectly, as below. I am wondering if we should include a more integration-style test which validates the network access we expect to have, as the current added test is a bit more "did we configure it the way we expect" rather than "does it work the way we expect"... but I think it's a grey area as the runner test suite's job isn't to test the OCI runtime works like we're telling it to. I'll leave that as an open question which can be addressed in a future PR if desired.

on:
  pull_request:

jobs:
  test:
    runs-on: docker
    services:
      maindb:
        image: data.forgejo.org/oci/mysql:8.4
        env:
          MYSQL_DATABASE: dbname
          MYSQL_USER: dbuser
          MYSQL_PASSWORD: dbpass
          MYSQL_RANDOM_ROOT_PASSWORD: yes
        options: --health-cmd="mysqladmin ping" --health-interval=10s --health-timeout=5s --health-retries=3
    steps:
    - uses: docker://data.forgejo.org/oci/mysql:8.4
      with:
        entrypoint: mysql
        args: -u dbuser -D dbname -pdbpass -h maindb -e "create table T(id INT NOT NULL AUTO_INCREMENT, val VARCHAR(255), PRIMARY KEY (id))"
