Skip to content

policy: Replace versioned with part.Map redux#43279

Merged
jrajahalme merged 3 commits intocilium:mainfrom
jrajahalme:policy-part-map-redux
Dec 16, 2025
Merged

policy: Replace versioned with part.Map redux#43279
jrajahalme merged 3 commits intocilium:mainfrom
jrajahalme:policy-part-map-redux

Conversation

@jrajahalme
Copy link
Copy Markdown
Member

@jrajahalme jrajahalme commented Dec 11, 2025

#43237 reverted the last commit of #42992 due to a new unit test flake.

This PR reapplies the reverted commit as-is (1st commit) and adds two unit test fixes:

  1. policy: Terminate incremental notification handlers in tests

Return a closer function from test infra returning new SelectorCache instances, so that the test can call closer() to terminate the incremental update notification handler goroutine, if one was started for the SelectorCache instance. This eliminates leaked goroutines which take both resources and complicate go test output in error situations such as test timeouts.

  1. policy: Defer test validations of selector notifications till commit

The commit changes the test tooling for the mock cachedSelectionUser, no Test code is changed otherwise. The rationale for the change is that the mock was accessing the current version of the selected identities via selector.GetSelections(), while it really should postpone this access until IdentitySelectionCommit() is called (as that is the time the user gets a handle to the version where the changes were applied) and use that version explicitly (selector.GetSelectionsAt(...)).

Production implementation (in pkg/policy/mapstate.go) is already doing the right thing.

Previously the selection updates were always done prior IdentitySelectionUpdated() was called, but with the change in the policy: Replace versioned with part.Map commit the publication of the changes is done later, but always prior to the IdentitySelectionCommit() call. This results in a race, where sometimes the update on the selector selections is not yet available when the IdentitySelectionUpdated() callback executes (in a different goroutine). If the execution of the notifications was synchronous then the associated tests would always fail without this fix.

This test can be used to validate the fix:

$ go test -v -run "TestTransactionalUpdate" -count=10000 -timeout=10s ./pkg/policy/.
  • When run on main (with the already merged revert) this test should pass
  • When run with the 1st commit applied, this fails with console full of goroutines with
github.com/cilium/cilium/pkg/policy.(*SelectorCache).handleUserNotifications
  • When run with the 2nd commit also applied the actual error is visible:
selectorcache_test.go:204:
        Error Trace:    /Volumes/dev/cilium/cilium/pkg/policy/selectorcache_test.go:204
                                  /Volumes/dev/cilium/cilium/pkg/policy/selectorcache.go:384
                                  /usr/local/go/src/runtime/asm_arm64.s:1268
        Error:          Should be false
        Test:           TestTransactionalUpdate
  • when run with all commits applied it succeeds again:
$ go test -run "TestTransactionalUpdate" -count=10000 -timeout=10s ./pkg/policy/.
ok      github.com/cilium/cilium/pkg/policy     0.708s

Fixes: #42992
Fixes: #43237

@jrajahalme jrajahalme requested review from a team as code owners December 11, 2025 18:54
@jrajahalme jrajahalme added the area/CI Continuous Integration testing issue or flake label Dec 11, 2025
@jrajahalme jrajahalme requested review from a team as code owners December 11, 2025 18:54
@jrajahalme jrajahalme added the sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. label Dec 11, 2025
@jrajahalme jrajahalme requested a review from a team as a code owner December 11, 2025 18:54
@jrajahalme jrajahalme requested review from gandro and nebril December 11, 2025 18:54
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Dec 11, 2025
@jrajahalme jrajahalme added the release-note/misc This PR makes changes that have no direct user impact. label Dec 11, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Dec 11, 2025
@jrajahalme
Copy link
Copy Markdown
Member Author

/test

@jrajahalme
Copy link
Copy Markdown
Member Author

/ci-l3-l4

@jrajahalme
Copy link
Copy Markdown
Member Author

/ci-l7

Copy link
Copy Markdown
Member

@gandro gandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't re-review the first commit in detail given it was already functionally reviewed. The test changes seem good to me. Thanks!

This reverts commit 206d042.

Signed-off-by: Jarno Rajahalme <[email protected]>
Add a done channel to allow test code to terminate the incremental update
handler goroutine. This prevents leaking of goroutines in the test suite,
which is especially bad if tests are repeated multiple times.

The handler is never stopped in production to avoid the associated
failure modes (handler stopping when it should not, etc.).

Signed-off-by: Jarno Rajahalme <[email protected]>
Perform sanity checks on the cached selector selections only after the
corresponding snapshot is delivered via commit.

Fixes: cilium#42992
Signed-off-by: Jarno Rajahalme <[email protected]>
@jrajahalme jrajahalme force-pushed the policy-part-map-redux branch from 77040f1 to e5c8323 Compare December 15, 2025 14:58
@jrajahalme jrajahalme requested a review from squeed December 15, 2025 14:59
@jrajahalme
Copy link
Copy Markdown
Member Author

/test

@jrajahalme jrajahalme enabled auto-merge December 15, 2025 15:00
@jrajahalme
Copy link
Copy Markdown
Member Author

rebased and changed to use tb.Cleanup() instead of returning closer and deferring it.

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Dec 15, 2025
@jrajahalme jrajahalme added this pull request to the merge queue Dec 16, 2025
Merged via the queue into cilium:main with commit 87fe8c0 Dec 16, 2025
78 of 79 checks passed
@jrajahalme jrajahalme deleted the policy-part-map-redux branch December 16, 2025 07:24
@odinuge odinuge mentioned this pull request Dec 16, 2025
8 tasks
@cilium-release-bot cilium-release-bot bot moved this to Released in cilium v1.19.0 Feb 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/CI Continuous Integration testing issue or flake ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies.

Projects

No open projects
Status: Released

Development

Successfully merging this pull request may close these issues.

7 participants