Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[release/1.7] Fix initial sync race when registering NRI plugins #11326

Merged
merged 2 commits into from
Feb 13, 2025

Conversation

klihub
Copy link
Member

@klihub klihub commented Jan 31, 2025

Update NRI deps to v0.8.0 and fix racy initial NRI plugin synchronization wrt. event processing by blocking plugin synchronization altogether during CRI event processing.

Notes: this PR does not pull in the latest tagged NRI version (v0.9.0) to keep the blast radius smaller. We can pull in NRI v0.9.0 if that is preferred. That would also pull in support for NRI WASM plugins and the related wazero and go-plugin dependency bits.

@dosubot dosubot bot added area/nri Node Resource Interface (NRI) dependencies Pull requests that update a dependency file labels Jan 31, 2025
@klihub klihub force-pushed the fixes/1.7/nri-plugin-sync branch 2 times, most recently from 2d08d83 to 83f7b51 Compare January 31, 2025 16:37
@klihub klihub requested a review from mikebrow February 3, 2025 07:31
@klihub
Copy link
Member Author

klihub commented Feb 4, 2025

/test pull-containerd-node-e2e-1-7

Block the synchronization of registering NRI plugins during
CRI events to avoid the plugin ending up in an inconsistent
starting state after initial sync (missing pods, containers
or missed events for some pods or containers).

Signed-off-by: Krisztian Litkey <[email protected]>
@klihub klihub force-pushed the fixes/1.7/nri-plugin-sync branch from 83f7b51 to 11af051 Compare February 13, 2025 19:12
@austinvazquez
Copy link
Member

🤞🏼 the ttrpc update has the magic fix for the almalinux 8 test failure. Is that optimistic wishing? containerd 2.0 branch is testing almalinux 8 and this fix was successful there.

@klihub
Copy link
Member Author

klihub commented Feb 13, 2025

🤞🏼 the ttrpc update has the magic fix for the almalinux 8 test failure. Is that optimistic wishing? containerd 2.0 branch is testing almalinux 8 and this fix was successful there.

Yeah, I'm not sure if it's a flake or not. There are PRs against 1.7 that pass all tests (#11346) and there are ones that failed on almalinux8 (#11358).

So unless someone has definite proof that tests are now flakey on almalinux8 for a known reason, I'd say let's play safe, assume that this PR breaks something, and keep it out from the imminent 1.7 release. I'll need to then dig in and try to figure out what goes on, so it could get into 1.7.27...

Copy link
Member

@austinvazquez austinvazquez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestSandboxRemoveWithoutIPLeakage passed on rerun for Almalinux 8 and has consistently passed for other platforms.

@estesp estesp merged commit 12f214a into containerd:release/1.7 Feb 13, 2025
58 checks passed
@klihub klihub deleted the fixes/1.7/nri-plugin-sync branch February 14, 2025 06:59
@dmcgowan dmcgowan changed the title [release/1.7] fix initial sync race of registering NRI plugins. [release/1.7] Fix initial sync race when registering NRI plugins Feb 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/nri Node Resource Interface (NRI) dependencies Pull requests that update a dependency file impact/changelog size/XXL
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

5 participants