fix: Isolate step containers network namespace to match docker:// action semantics #1333
No reviewers
Labels
No labels
FreeBSD
Kind/Breaking
Kind/Bug
Kind/Chore
Kind/DependencyUpdate
Kind/Documentation
Kind/Enhancement
Kind/Feature
Kind/Security
Kind/Testing
Priority
Critical
Priority
High
Priority
Low
Priority
Medium
Reviewed
Confirmed
Reviewed
Duplicate
Reviewed
Invalid
Reviewed
Won't Fix
Status
Abandoned
Status
Blocked
Status
Need More Info
Windows
linux-powerpc64le
linux-riscv64
linux-s390x
run-end-to-end-tests
run-forgejo-tests
No milestone
No project
No assignees
5 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
forgejo/runner!1333
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "syncstack/runner:fix/step-container-network-isolation"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
When using
uses: docker://...in workflows, step containers are created withNetworkMode: "container:<job_container_name>", which makes them share the entire network namespace with the job container, including:Reproduction:
When you exec into the step container,
hostnamereturns the job container's ID, not its own. This makes debugging confusing and breaks the expected isolation model.Expected Behavior
As a user, when I specify
uses: docker://image, I expect:Solution
Changed network configuration in
step_docker.goto connect step containers via network name instead of namespace sharing:Maybe Breaking Changes
This may change behavior for workflows that rely on shared network namespace:
Potentially affected pattern:
Thanks a lot. Makes sense and looks good. I don't approve it because I'm not that familiar with all aspects of Docker networking in Forgejo Runner.
cascading-pr updated at actions/setup-forgejo#864
what's the GitHub action behavior? I think most users assume same behavior as GitHub.
I currently can't check it on GitHub Actions.
If someone has the ability to verify how GitHub handles this scenario, that would be valuable information.
P.S. While I understand the desire for compatibility, I think there's value in
considering what the correct behavior should be, regardless of what GitHub does.
Sometimes it's worth doing the right thing rather than copying bugs. :)
GitHub Actions prints different container IDs when running the reproducer whereas Forgejo Actions prints identical container IDs.
The example listed under "Potentially affected pattern" doesn't work out of the box and I wasn't able to find a variant that works on either GitHub Actions or Forgejo Actions. My latest attempt:
This change looks great to me.
I think the real-world risk of this being a breaking change is very low -- you'd have to be doing something weird, which I'm not even confident is possible, to leave a running process in the job container from an earlier step. And that weirdness would have to occur concurrently with the pretty uncommon usage of
uses: docker://, of course.I was more concerned initially with "are services still accessible?", which I wanted to do a hands-on test for. That worked perfectly, as below. I am wondering if we should include a more integration-style test which validates the network access we expect to have, as the current added test is a bit more "did we configure it the way we expect" rather than "does it work the way we expect"... but I think it's a grey area as the runner test suite's job isn't to test the OCI runtime works like we're telling it to. I'll leave that as an open question which can be addressed in a future PR if desired.
Thanks for the contribution!