Introduction
I have documented the failures in depth on a PR where the runs were failing - #40519 (comment)
Failed run
https://github.com/cilium/cilium/actions/runs/16281632742/job/45972300970
Output
❌ 2/22 tests failed (2/202 actions), 7 tests skipped, 0 scenarios skipped:
Test [client-egress-tls-sni]:
🟥 client-egress-tls-sni/pod-to-world:https-to-one.one.one.one.-ipv4-0: cilium-test-5/client-645b68dcf7-2jmrw (10.244.3.30) -> one.one.one.one.-https (one.one.one.one.:443): command "curl --silent --fail --show-error --connect-timeout 2 --max-time 10 -4 -H Host: one.one.one.one -w %{local_ip}:%{local_port} -> %{remote_ip}:%{remote_port} = %{response_code}\n --output /dev/null [https://one.one.one.one.:443](https://one.one.one.one./)" failed: command failed (pod=cilium-test-5/client-645b68dcf7-2jmrw, container=client): command terminated with exit code 28
⛑️ The following owners are responsible for reliability of the testsuite:
- @cilium/proxy (pod-to-world)
- @cilium/ci-structure (.github/workflows/tests-e2e-upgrade.yaml)
Test [client-egress-tls-sni-wildcard]:
🟥 client-egress-tls-sni-wildcard/pod-to-world:https-to-one.one.one.one.-ipv4-1: cilium-test-5/client2-66475877c6-fbw7t (10.244.3.68) -> one.one.one.one.-https (one.one.one.one.:443): command "curl --silent --fail --show-error --connect-timeout 2 --max-time 10 -4 -H Host: one.one.one.one -w %{local_ip}:%{local_port} -> %{remote_ip}:%{remote_port} = %{response_code}\n --output /dev/null [https://one.one.one.one.:443](https://one.one.one.one./)" failed: command failed (pod=cilium-test-5/client2-66475877c6-fbw7t, container=client2): command terminated with exit code 28
⛑️ The following owners are responsible for reliability of the testsuite:
- @cilium/proxy (pod-to-world)
- @cilium/ci-structure (.github/workflows/tests-e2e-upgrade.yaml)
Attachments
All sysdump link - https://github.com/cilium/cilium/actions/runs/16281632742/artifacts/3531894952
Attached as zip the failed run
cilium-sysdump-conn-disrupt-test-cilium-upgrade-6-concurrent-20250715-013221.zip
Brief analysis of the failure - #40519 (comment)
Slack discussion
There's a long Slack discussion about the failure - https://cilium.slack.com/archives/CDLN7836J/p1752524978378719?thread_ts=1752163611.807769&cid=CDLN7836J
More details
I am trying to check in this change - cb40019 .
Here's the details of the bug and root cause:
The test is only failing during upgrade. And the test is failing consistently (though it passes occasionally). What I was trying to determine if there is an issue with syncing the policy to the eBPF map which was causing curl failures. Given the test pass consistently on retry, I think either
- the client-egress-sni* test is flaky and retry is the correct way to go (but at the test level), or
- there is an existing bug in L7 that this change exposes
Introduction
I have documented the failures in depth on a PR where the runs were failing - #40519 (comment)
Failed run
https://github.com/cilium/cilium/actions/runs/16281632742/job/45972300970
Output
Attachments
All
sysdumplink - https://github.com/cilium/cilium/actions/runs/16281632742/artifacts/3531894952Attached as zip the failed run
cilium-sysdump-conn-disrupt-test-cilium-upgrade-6-concurrent-20250715-013221.zip
Brief analysis of the failure - #40519 (comment)
Slack discussion
There's a long Slack discussion about the failure - https://cilium.slack.com/archives/CDLN7836J/p1752524978378719?thread_ts=1752163611.807769&cid=CDLN7836J
More details
I am trying to check in this change - cb40019 .
Here's the details of the bug and root cause:
The test is only failing during upgrade. And the test is failing consistently (though it passes occasionally). What I was trying to determine if there is an issue with syncing the policy to the eBPF map which was causing curl failures. Given the test pass consistently on retry, I think either