pkg/bpf/collection: Temporarily don't error on unused maps #41379
When we introduced the new unused map pruning logic we added a verification step which, after loading, confirms we did everything as expected. This works by reading back the loaded program instructions.

However, on GKE the process of reading back the instructions fails inside of cilium/ebpf logic. To unblock CI until the root cause can be resolved, let us not throw an error in the verification step.

Signed-off-by: Dylan Reimerink <[email protected]>
/test

/ci-gke
joestringer left a comment
Thanks, this is a much more surgical mitigation of the immediate failure for GKE. Given that commit 059977b introduced this check and it seems the repercussions are just that Cilium may load additional maps (which is what it would have done prior to the optimization), this seems like a safe workaround for now.
ci-eks run: hit known issue #36428. Not sure why @cilium/loader was not pulled in for review, but as I understand there's a lack of bandwidth there at the moment. Given this should allow us to mitigate reliable failures on GKE for the immediate term I think this is valuable to push in as-is.
In cilium#41379 we stopped returning errors from unused map verification because on systems with sysctls set to restrict instruction readback the verification would always fail and block CI.

In cilium/ebpf#1858 an `ErrRestrictedKernel` error was added to indicate this specific case. This allows us to now differentiate between genuine unused map errors and the restricted kernel case, and only ignore the latter. Thus we can re-enable unused map verification errors on systems that support it.

Signed-off-by: Dylan Reimerink <[email protected]>
See #41245