[docker-29.x backport] seccomp: Block AF_ALG sockets in default profile (CVE-2026-31431)#52501

Merged
vvoland merged 4 commits into moby:docker-29.x from vvoland:52494-docker-29.x
May 1, 2026

vvoland commented May 1, 2026

CVE-2026-31431 ("Copy Fail") is a logic flaw in the kernel's algif_aead module that allows any unprivileged user with access to AF_ALG sockets to perform a controlled 4-byte page-cache write, leading to reliable local privilege escalation. The exploit is a 732-byte Python script that works on every Linux distribution shipped since 2017.

Inside a container, this allows escalation to root within the container by corrupting setuid binaries in the page cache. Since the page cache is shared across the host, corruption of shared image-layer files is also visible to other containers using the same layers on the same node.

Seccomp profile changes

The previous default seccomp profile allowed AF_ALG sockets (only AF_VSOCK was denied). This update denies both AF_ALG (38) and AF_VSOCK (40) by allowing socket creation only for the remaining address families, including the single family (39) that sits between them:

  • arg0 < 38 (AF_ALG) → allow
  • arg0 == 39 (the single family between them) → allow
  • arg0 > 40 (AF_VSOCK) → allow
  • everything else (38 and 40) → falls through to default ERRNO
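
The three allow conditions can be modeled as separate single-condition rules that are OR-ed together. The following is a minimal Python sketch of that evaluation, not the actual profile: the dict layout loosely mirrors the `index`/`value`/`op` fields used in seccomp profile JSON, and `OPS`, `RULES`, and `socket_allowed` are hypothetical names introduced for illustration.

```python
import operator

AF_ALG = 38    # address family numbers as given in the description
AF_VSOCK = 40

# Comparison operators, keyed by the names used in seccomp profile JSON.
OPS = {
    "SCMP_CMP_LT": operator.lt,
    "SCMP_CMP_EQ": operator.eq,
    "SCMP_CMP_GT": operator.gt,
}

# One allow rule per bullet; each rule has a single condition on arg0.
RULES = [
    {"index": 0, "value": AF_ALG, "op": "SCMP_CMP_LT"},
    {"index": 0, "value": 39, "op": "SCMP_CMP_EQ"},
    {"index": 0, "value": AF_VSOCK, "op": "SCMP_CMP_GT"},
]


def socket_allowed(family: int) -> bool:
    """Return True if any allow rule matches; otherwise the default ERRNO applies."""
    args = (family,)
    return any(OPS[r["op"]](args[r["index"]], r["value"]) for r in RULES)


if __name__ == "__main__":
    for fam in (2, 38, 39, 40, 41):
        print(fam, "allow" if socket_allowed(fam) else "ERRNO")
```

Because each rule carries only one comparison on arg0, this shape avoids putting two conditions on the same argument index in a single rule.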

The previous socket rule used a single arg0 != AF_VSOCK condition. Naively adding a second OpNotEqual for AF_ALG does not work: seccomp evaluates multiple argument conditions within a single rule as a logical AND, so arg0 != 38 AND arg0 != 40 requires two comparisons against the same argument index, which libseccomp does not support reliably in one rule. Splitting into separate deny-action rules also fails because any matching allow rule takes precedence in seccomp's first-match-wins evaluation.
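
As a sanity check on the reasoning above, the range-based rules can be compared against the intended condition `arg0 != 38 AND arg0 != 40` over a span of family numbers. The sketch below also illustrates a related pitfall: splitting the two not-equal checks into two separate allow rules does not help, since separate rules combine as OR and every family differs from at least one of the two values. This is a Python model of the rule logic only, not libseccomp itself, and all function names are hypothetical.

```python
AF_ALG, AF_VSOCK = 38, 40


def intended(family: int) -> bool:
    # The condition one would like to express: AND of two not-equal checks.
    return family != AF_ALG and family != AF_VSOCK


def three_range_rules(family: int) -> bool:
    # Three single-condition allow rules, OR-ed together.
    return family < AF_ALG or family == 39 or family > AF_VSOCK


def split_ne_rules(family: int) -> bool:
    # Naive split into two allow rules: OR of the two not-equal checks.
    return family != AF_ALG or family != AF_VSOCK


families = range(0, 64)

# The range-based form matches the intended condition for every family checked.
assert all(intended(f) == three_range_rules(f) for f in families)

# The naive split blocks nothing: 38 passes "!= 40" and 40 passes "!= 38".
assert all(split_ne_rules(f) for f in families)
```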

See moby/profiles#20 for more details.

Additionally, socketcall(2) is now explicitly denied to prevent bypassing the socket address family filters on architectures with the legacy socketcall multiplexer. See https://github.com/moby/profiles/releases/tag/seccomp%2Fv0.2.2 for details.
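
The reason socketcall(2) needs its own rule can be sketched as follows: a seccomp filter only sees the syscall number and the raw argument values, and for socketcall(2) the address family lives in user memory behind a pointer that the filter cannot dereference. The Python model below is illustrative only. `SeccompView` and both helper functions are hypothetical, the pointer value is an arbitrary placeholder, and `SYS_SOCKET = 1` is the sub-call number from `<linux/net.h>`.

```python
from typing import NamedTuple, Tuple

SYS_SOCKET = 1   # socketcall(2) sub-call number for socket(), from <linux/net.h>
AF_ALG = 38


class SeccompView(NamedTuple):
    """What a seccomp filter can inspect: the syscall and its raw argument values."""
    syscall: str
    args: Tuple[int, ...]


def direct_socket(family: int, sock_type: int, protocol: int) -> SeccompView:
    # socket(2): the address family is arg0, so a filter can compare it.
    return SeccompView("socket", (family, sock_type, protocol))


def via_socketcall(family: int, sock_type: int, protocol: int,
                   args_ptr: int = 0x40000000) -> SeccompView:
    # socketcall(2): arg0 is the sub-call number and arg1 is a pointer to the
    # real arguments in user memory. The filter sees the pointer, not the family.
    # args_ptr is a made-up address used only for this model.
    return SeccompView("socketcall", (SYS_SOCKET, args_ptr))


if __name__ == "__main__":
    print(direct_socket(AF_ALG, 5, 0))
    print(via_socketcall(AF_ALG, 5, 0))
```

In this model, an AF_ALG request and an AF_INET request made through socketcall(2) look identical to the filter, which is why a family-based rule on socket(2) alone cannot cover that path.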

Integration tests

Adds TestExecSocketDenied, which compiles and runs small C programs inside a container to verify that:

  • AF_ALG socket creation is denied
  • AF_VSOCK socket creation is denied
  • AF_ALG via socketcall(2) (using int $0x80 from amd64) is denied
  • AF_ALG via a cross-compiled static i386 binary is denied
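
For a quick manual check of the first two cases, a probe along these lines could be run inside a container. This is a hedged Python sketch, not the C programs used by the test: `probe_family` and `is_denied` are hypothetical helpers, the family numbers are the ones quoted in this description, and the set of errno values treated as "denied" (EPERM or EAFNOSUPPORT) follows the test's commit message. It does not cover the socketcall(2) paths.

```python
import errno
import socket
from typing import Optional

AF_ALG = 38     # numeric values as given in the PR description
AF_VSOCK = 40


def probe_family(family: int, sock_type: int) -> Optional[int]:
    """Try to create a socket; return None on success or the errno on failure."""
    try:
        s = socket.socket(family, sock_type, 0)
    except OSError as exc:
        return exc.errno
    s.close()
    return None


def is_denied(err: Optional[int]) -> bool:
    # The test's commit message accepts EPERM or EAFNOSUPPORT as "denied".
    return err in (errno.EPERM, errno.EAFNOSUPPORT)


if __name__ == "__main__":
    for name, fam, typ in (("AF_ALG", AF_ALG, socket.SOCK_SEQPACKET),
                           ("AF_VSOCK", AF_VSOCK, socket.SOCK_STREAM)):
        err = probe_family(fam, typ)
        status = "denied" if is_denied(err) else "not denied"
        print(f"{name}: {status} (errno={err})")
```

Note that EAFNOSUPPORT can also mean the kernel lacks the address family entirely, so a "denied" result here does not by itself prove that the seccomp profile was the cause.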

Changelog

CVE-2026-31431: Block `AF_ALG` sockets and the `socketcall(2)` multiplexer in the default seccomp profile to prevent in-container privilege escalation via the kernel crypto API ("Copy Fail").

Big thanks to @tianon for helping me figure out why the initial socketcall block didn't work: https://github.com/moby/profiles/releases/tag/seccomp%2Fv0.2.2 ❤️

vvoland added 2 commits May 1, 2026 03:10
Verify that AF_ALG and AF_VSOCK sockets cannot be created inside a
container running with the default seccomp profile.

The test compiles small C programs inside a debian:trixie-slim container
that attempt to create sockets with these address families, then runs
them as a non-root user (uid 1000) and asserts that socket creation is
denied with EPERM or EAFNOSUPPORT.

Signed-off-by: Paweł Gronowski <[email protected]>
(cherry picked from commit ccabd78)
Signed-off-by: Paweł Gronowski <[email protected]>

Test that AF_ALG is also denied through the socketcall(2) multiplexer,
which is used by glibc on i386 instead of direct socket(2) syscalls.

Two subtests:
- AF_ALG_socketcall_int80: uses int $0x80 inline assembly from a native
  64-bit binary to invoke the ia32 socketcall path, with MAP_32BIT to
  keep the args pointer below 4 GB (ia32 compat truncates registers).
- AF_ALG_socketcall_i386: cross-compiles a static i386 binary using
  gcc-i686-linux-gnu where glibc naturally routes socket() through
  socketcall(2).

Both are amd64-only.

Signed-off-by: Paweł Gronowski <[email protected]>
(cherry picked from commit 5a34580)
Signed-off-by: Paweł Gronowski <[email protected]>
vvoland force-pushed the 52494-docker-29.x branch from ff71fa2 to a1197d9 on May 1, 2026 01:13
vvoland marked this pull request as ready for review on May 1, 2026 01:16
vvoland force-pushed the 52494-docker-29.x branch from cea106a to becdb42 on May 1, 2026 01:17
vvoland merged commit d329809 into moby:docker-29.x on May 1, 2026
96 of 121 checks passed
kaosagnt added a commit to kaosagnt/boot2docker-xfs-ng that referenced this pull request May 2, 2026
https://docs.docker.com/engine/release-notes/29/

Security

This release includes hardening for CVE-2026-31431.

Block AF_ALG sockets and the socketcall(2) multiplexer in the default
seccomp profile to prevent in-container privilege escalation via the kernel
crypto API ("Copy Fail"). moby/moby#52501
kaosagnt added a commit to kaosagnt/toolbox2docker that referenced this pull request May 2, 2026