Skip to content

Revert "policy: Replace versioned with part.Map"#43237

Merged
jrajahalme merged 1 commit intocilium:mainfrom
odinuge:revert-42992-policy-selectorcache-statedb
Dec 11, 2025
Merged

Revert "policy: Replace versioned with part.Map"#43237
jrajahalme merged 1 commit intocilium:mainfrom
odinuge:revert-42992-policy-selectorcache-statedb

Conversation

@odinuge
Copy link
Copy Markdown
Member

@odinuge odinuge commented Dec 10, 2025

Reverts the last commit from #42992.

Seeing pretty consistent failures of the TestTransactionalUpdate test on main now. Looks like this can be bisected back to this change.

Locally I'm able to repro the following error with latest main;

=== RUN   TestTransactionalUpdate
    selectorcache_test.go:151:
                Error Trace:    /mnt/code/pkg/policy/selectorcache_test.go:151
                                                        /mnt/code/pkg/policy/selectorcache.go:378
                                                        /usr/local/go/src/runtime/asm_arm64.s:1268
                Error:          Should be false
                Test:           TestTransactionalUpdate

With this revert I'm not able to reproduce so far - so its clearly some kind of race here, but its not fully clear to me why. I think we should revert for now until we fully understand whats going on. Its not deterministically failing, and it only does so very rarely - but there has to be a race somewhere.

I think the reason for the CI fail in #43231 was because some other test took a long time, causing the go testing tooling to realize that the selectorcache is leaking the goroutine handleUserNotifications thats never cleaned up. So if all tests take more time than the test timeout, it can hit that. Reproducing that test failure is easy locally, since you can do something like go test -v -run "TestTransactionalUpdate" -count=100000 -timeout=1s. This is also probably something we should fix at some point.

@maintainer-s-little-helper
Copy link
Copy Markdown

@maintainer-s-little-helper maintainer-s-little-helper bot added dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Dec 10, 2025
@github-actions github-actions bot added the sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. label Dec 10, 2025
@odinuge
Copy link
Copy Markdown
Member Author

odinuge commented Dec 10, 2025

/test

@odinuge
Copy link
Copy Markdown
Member Author

odinuge commented Dec 10, 2025

Locally I'm able to repro the following error with latest main;

=== RUN   TestTransactionalUpdate
    selectorcache_test.go:151:
                Error Trace:    /mnt/code/pkg/policy/selectorcache_test.go:151
                                                        /mnt/code/pkg/policy/selectorcache.go:378
                                                        /usr/local/go/src/runtime/asm_arm64.s:1268
                Error:          Should be false
                Test:           TestTransactionalUpdate

With this revert I'm not able to reproduce - so its clearly some kind of race here.

I think the reason for the CI fail was because some other test took a long time, causing the go testing tooling to realize that the selectorcache is leaking the goroutine handleUserNotifications thats never cleaned up. So if all tests take more time than the test timeout, it can hit that. Reproducing that test failure is easy locally, since you can do something like go test -v -run "TestTransactionalUpdate" -count=100000 -timeout=1s.

It looks like its only 0d76100 causing issues here, so I'll revert only that one and add the sign-off message 😄

@odinuge odinuge force-pushed the revert-42992-policy-selectorcache-statedb branch from ea7b381 to 88838be Compare December 10, 2025 11:13
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Dec 10, 2025
@odinuge odinuge marked this pull request as ready for review December 10, 2025 11:15
@odinuge odinuge requested review from a team as code owners December 10, 2025 11:15
@odinuge
Copy link
Copy Markdown
Member Author

odinuge commented Dec 10, 2025

/test

@odinuge odinuge mentioned this pull request Dec 10, 2025
8 tasks
@odinuge
Copy link
Copy Markdown
Member Author

odinuge commented Dec 10, 2025

Hitting #42922. Will rerun

@odinuge
Copy link
Copy Markdown
Member Author

odinuge commented Dec 10, 2025

/ci-clustermesh

@odinuge
Copy link
Copy Markdown
Member Author

odinuge commented Dec 10, 2025

cc @jrajahalme

@odinuge
Copy link
Copy Markdown
Member Author

odinuge commented Dec 10, 2025

/ci-e2e-upgrade

@odinuge odinuge force-pushed the revert-42992-policy-selectorcache-statedb branch from 88838be to 5ff22c4 Compare December 10, 2025 12:25
Copy link
Copy Markdown
Member

@jrajahalme jrajahalme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK with the revert, thanks for the repro, will investigate ASAP.

@odinuge
Copy link
Copy Markdown
Member Author

odinuge commented Dec 10, 2025

Thanks! We should probably deal with the "leaked" handleUserNotifications goroutine as well separately. When running loads of tests, each selector cache will create a separate leaked goroutine, that causes loads of noise if tests timeout - and its very hard to understand what the actual problem is.

@odinuge
Copy link
Copy Markdown
Member Author

odinuge commented Dec 10, 2025

/test

@odinuge
Copy link
Copy Markdown
Member Author

odinuge commented Dec 10, 2025

Another instance of #42922. Will retry

@jrajahalme
Copy link
Copy Markdown
Member

Can we merge as-is or do I need to do another rebase?

Looks like most of the fails in ci-e2e-upgrade are due to this being merged a few hours ago: #42150, so unfortunately a rebase is needed.

This reverts commit 4c4c7c9.

Signed-off-by: Odin Ugedal <[email protected]>
Signed-off-by: Odin Ugedal <[email protected]>
auto-merge was automatically disabled December 11, 2025 13:03

Head branch was pushed to by a user without write access

@odinuge odinuge force-pushed the revert-42992-policy-selectorcache-statedb branch from 6b94d0b to 3e1ee89 Compare December 11, 2025 13:03
@odinuge
Copy link
Copy Markdown
Member Author

odinuge commented Dec 11, 2025

/test

@odinuge
Copy link
Copy Markdown
Member Author

odinuge commented Dec 11, 2025

/ci-e2e-upgrade

@jrajahalme jrajahalme enabled auto-merge December 11, 2025 15:24
@jrajahalme jrajahalme added this pull request to the merge queue Dec 11, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Dec 11, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 11, 2025
@jrajahalme jrajahalme added this pull request to the merge queue Dec 11, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 11, 2025
@jrajahalme jrajahalme added this pull request to the merge queue Dec 11, 2025
@jrajahalme
Copy link
Copy Markdown
Member

Merge keeps failing due to some unrelated infrastructure issue on image build workflows, trying again:

Error:  received HTTP status=503 for url='https://github.com/anchore/syft/releases/download/v1.38.0/syft_1.38.0_linux_amd64.tar.gz' 

@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 11, 2025
@odinuge
Copy link
Copy Markdown
Member Author

odinuge commented Dec 11, 2025

Looks like some github.com issues. All tests are passing now.

@jrajahalme jrajahalme added this pull request to the merge queue Dec 11, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 11, 2025
@jrajahalme jrajahalme added this pull request to the merge queue Dec 11, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 11, 2025
@jrajahalme jrajahalme added this pull request to the merge queue Dec 11, 2025
Merged via the queue into cilium:main with commit 206d042 Dec 11, 2025
75 of 76 checks passed
@cilium-release-bot cilium-release-bot bot moved this to Released in cilium v1.19.0 Feb 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies.

Projects

No open projects
Status: Released

Development

Successfully merging this pull request may close these issues.

5 participants