fix: Correctly create PolicyEvaluatorPre and consume eval timeouts by viccuad · Pull Request #1284 · kubewarden/policy-server

viccuad · 2025-09-19T12:25:06Z

Description

Fix #1270.
With this, both the integration tests and the coverage job (which also runs the integration tests) should no longer be flaky.

This PR fixes 2 bugs:

- Ensure we enable the eval timeout even if the policy-server global timeout is not set.
- Before, we were creating one PolicyEvaluator per compiled module.
  This is not enough, as each policy can now have a different
  spec.timeoutEvalSeconds.
  This bug made the timeout integration test flaky.
  The integration tests have 2 policies, sleep and sleep-1s-timeout.
  They are saved in a hashmap (nondeterministic iteration order). If, on a
  given run, the sleep policy was registered first, sleep-1s-timeout would
  reuse the sleep PolicyEvaluator, which meant that it didn't have its
  spec.timeoutEvalSeconds assigned and the test would fail.
  To fix this:
  - Extend the check when registering and creating a PolicyEvaluatorPre
    from only the module to the module + policy_evaluation_limit_seconds.
    Add new PolicyEvaluatorPreKey{} with those values as key for the
    hashmap of policies and PolicyEvaluatorPre.
  - Add module_digest_and_eval_timeout_to_policy_evaluator_pre, needed
    to obtain the eval timeout used in the key when we are rehydrating.
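
The keying change described above can be illustrated with a minimal sketch. It is not the code from this PR: `EvaluatorPre`, `PreKey`, `Registry`, and the digest strings are hypothetical stand-ins, and the real `PolicyEvaluatorPre` wraps a precompiled Wasm module rather than two plain fields. The sketch only shows why a key made of module digest plus eval timeout stops `sleep-1s-timeout` from reusing the entry created for `sleep`, regardless of registration order.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the real PolicyEvaluatorPre.
/// It only records what it was built with.
#[derive(Debug, Clone, PartialEq, Eq)]
struct EvaluatorPre {
    module_digest: String,
    eval_timeout_seconds: Option<u64>,
}

/// Illustrative composite key, modeled on the PolicyEvaluatorPreKey idea:
/// the module digest plus the eval timeout.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct PreKey {
    module_digest: String,
    eval_timeout_seconds: Option<u64>,
}

#[derive(Debug, Default)]
struct Registry {
    /// One precompiled evaluator per (module, timeout) pair.
    pres: HashMap<PreKey, EvaluatorPre>,
    /// policy_id -> key, so the right entry can be found when rehydrating.
    /// This extra map is the "+N" space cost mentioned in the tradeoff.
    policy_keys: HashMap<String, PreKey>,
}

impl Registry {
    fn register(&mut self, policy_id: &str, module_digest: &str, timeout: Option<u64>) {
        let key = PreKey {
            module_digest: module_digest.to_string(),
            eval_timeout_seconds: timeout,
        };
        // Reuse an existing entry only when both module and timeout match.
        self.pres.entry(key.clone()).or_insert_with(|| EvaluatorPre {
            module_digest: module_digest.to_string(),
            eval_timeout_seconds: timeout,
        });
        self.policy_keys.insert(policy_id.to_string(), key);
    }

    fn pre_for(&self, policy_id: &str) -> Option<&EvaluatorPre> {
        self.pres.get(self.policy_keys.get(policy_id)?)
    }
}
```

Registering `sleep` with no timeout and `sleep-1s-timeout` with a 1 second timeout against the same digest yields two entries in this sketch, which is also the memory cost raised later in the review.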

Test

Extended existing unit test for creation of PolicyEvaluators.
Use existing integration tests for the timeoutEvalSeconds feature.
Add configurable logging output to integration tests.

Additional Information

Tradeoff

Negligible space cost increase of +N in the worst case (N: number of policies), as we have a new hashmap of policy_id to eval timeout.

Potential improvement

Maybe drop the commit that adds configurable logging output to integration tests (defaults to INFO) if it slows test runs.

codecov · 2025-09-19T12:31:34Z

Codecov Report

❌ Patch coverage is 20.00000% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 26.61%. Comparing base (d9504b9) to head (82cec0f).
⚠️ Report is 9 commits behind head on main.

Files with missing lines	Patch %	Lines
src/evaluation/evaluation_environment.rs	33.33%	8 Missing ⚠️
src/lib.rs	0.00%	8 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1284      +/-   ##
==========================================
- Coverage   26.62%   26.61%   -0.02%     
==========================================
  Files          16       16              
  Lines        1044     1052       +8     
==========================================
+ Hits          278      280       +2     
- Misses        766      772       +6

Flag	Coverage Δ
unit-tests	`26.61% <20.00%> (-0.02%)`	⬇️

viccuad · 2025-09-19T15:01:16Z

Dropped the commit adding logging to the integration tests, which slowed them in CI and made them fail.

flavio · 2025-09-20T13:56:05Z

@viccuad your approach works, but it leads to more memory being used by the policy server. That's because, when multiple policies share the same module but have different timeout values, we end up creating multiple "pre" objects.

I've looked into the codebase and I discovered that the evaluation timeout doesn't actually need to be bound to the "pre" objects. Instead, it can be provided to a "pre" object at rehydration time.

This required quite some changes across different projects:

- rust sdk, policy-sdk-rust#201 (feat: update Kubewarden CRDs): I needed to read the evaluation timeout value that has been added to our CRDs.
- wapc, wapc/wapc-rs#168 (feat!(wasmtime-provider): do not bind epoch deadline to "pre" instances): this untangles the epoch deadline from the pre object.
- policy-evaluator, policy-evaluator#766 (feat!: do not bind epoch deadline to pre instances): on top of consuming the changes from SDK and wapc, the code has been changed to build Rego and WASI "pre" objects without binding the epoch deadline to them.
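
The idea above, supplying the evaluation timeout at rehydration time instead of binding it to the "pre" object, could look roughly like the following sketch. All names here (`Pre`, `Evaluator`, `Environment`, `rehydrate`) are hypothetical and do not reflect the actual policy-evaluator or wapc APIs; the point is only that one shared "pre" per module can serve policies with different timeouts.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Hypothetical stand-in for a "pre" object: the expensive, precompiled
/// module. It deliberately carries no timeout.
#[derive(Debug)]
struct Pre {
    module_digest: String,
}

/// Hypothetical stand-in for a rehydrated, ready-to-run evaluator.
#[derive(Debug)]
struct Evaluator {
    module_digest: String,
    eval_timeout: Option<Duration>,
}

impl Pre {
    /// The timeout is supplied here, per rehydration, not at build time.
    fn rehydrate(&self, eval_timeout: Option<Duration>) -> Evaluator {
        Evaluator {
            module_digest: self.module_digest.clone(),
            eval_timeout,
        }
    }
}

#[derive(Debug, Default)]
struct Environment {
    /// module digest -> shared pre (one per module, whatever the timeout)
    pres: HashMap<String, Pre>,
    /// policy id -> module digest
    policy_modules: HashMap<String, String>,
    /// policy id -> eval timeout, only for policies that set one
    policy_timeouts: HashMap<String, Duration>,
}

impl Environment {
    fn register(&mut self, policy_id: &str, module_digest: &str, timeout: Option<Duration>) {
        self.pres
            .entry(module_digest.to_string())
            .or_insert_with(|| Pre {
                module_digest: module_digest.to_string(),
            });
        self.policy_modules
            .insert(policy_id.to_string(), module_digest.to_string());
        if let Some(t) = timeout {
            self.policy_timeouts.insert(policy_id.to_string(), t);
        }
    }

    fn rehydrate(&self, policy_id: &str) -> Option<Evaluator> {
        let digest = self.policy_modules.get(policy_id)?;
        let pre = self.pres.get(digest)?;
        Some(pre.rehydrate(self.policy_timeouts.get(policy_id).copied()))
    }
}
```

With this layout, two policies that share a module but differ in timeout still map to a single entry in `pres`, so memory use does not grow with the number of distinct timeout values.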

Stuff left to be done:

- Update kwctl to consume the new version of policy evaluator. The API changed, so we will have to do some small tweaks. I don't think this is going to be hard, since we don't use the epoch deadline in there (AFAIK).
- Merge and tag all the crates involved.

I'll be traveling next week, but I'll find time to review your comments. Please ping me on Slack whenever you leave a message on one of these PRs.

flavio · 2025-09-20T13:59:06Z

I just did a force push because I had to solve some conflicts between this branch and main.

viccuad

Changes LGTM, thanks! (can't approve, I'm also author)

jvanz

Considering all the changes related to this fix (all the other PRs opened by Flavio), the code looks fine. But there is a linter error that we can fix to make the CI green.

viccuad · 2025-09-22T15:57:15Z

The lint fix would be the same as this one:
d83a9bd
We can either port it or rebase after merging that one.

The prior commit that fixes kubewarden#1270 causes higher memory usage. If the same module is used by two different policies, each with a different epoch deadline, two "pre" objects are created and kept in memory, because the epoch deadline is bound to the "pre" object. This limitation has been lifted by PR kubewarden/policy-evaluator#766. This commit uses the new API introduced by that PR, which restores the previous memory consumption.

Signed-off-by: Flavio Castelli <[email protected]>

flavio · 2025-09-23T05:30:31Z

I've updated the PR to use officially tagged crates, and I've fixed the linter error in the way @viccuad suggested.

viccuad · 2025-09-23T06:22:48Z

Thanks, merging!

viccuad requested a review from a team as a code owner September 19, 2025 12:25

github-project-automation bot added this to Kubewarden Admission Controller Sep 19, 2025

github-project-automation bot moved this to Pending review in Kubewarden Admission Controller Sep 19, 2025

github-actions bot added kind/bug kind/feature labels Sep 19, 2025

viccuad changed the title ~~fix: Correctly create PolicyEvaluatorPre and consume timeouts for spec.timeoutEvalSeconds feature~~ fix: Correctly create PolicyEvaluatorPre and consume eval timeouts Sep 19, 2025

viccuad removed this from Kubewarden Admission Controller Sep 19, 2025

github-project-automation bot moved this to Pending review in Kubewarden Admission Controller Sep 19, 2025

github-project-automation bot added this to Kubewarden Admission Controller Sep 19, 2025

viccuad removed this from Kubewarden Admission Controller Sep 19, 2025

viccuad force-pushed the fix/timeout branch from 7396a27 to 9d18f68 Compare September 19, 2025 15:00

This was referenced Sep 20, 2025

feat: update Kubewarden CRDs kubewarden/policy-sdk-rust#201

Merged

feat!: do not bind epoch deadline to pre instances kubewarden/policy-evaluator#766

Merged

viccuad added 4 commits September 20, 2025 15:56

test: Extend test that checks for PolicyEvaluator instances

695b1b7

Signed-off-by: Víctor Cuadrado Juan <[email protected]>

refactor: Rename EvaluationEnvironmentBuilder var

7dad19f

Signed-off-by: Víctor Cuadrado Juan <[email protected]>

fix: Enable eval timeout in needed cases

3c26d72

Ensure we enable eval timeout even if we don't have policy-server global timeout. Signed-off-by: Víctor Cuadrado Juan <[email protected]>
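
As a rough illustration of the condition this commit message describes, the sketch below decides whether timeout protection has to be switched on. It is not taken from the PR: both function names are hypothetical, and the rule that a policy's own timeout takes precedence over the global one is an assumption made for the example.

```rust
/// Timeout that applies to one policy.
/// Assumption: the per-policy value wins over the global one when both are set.
fn effective_timeout(policy_timeout: Option<u64>, global_timeout: Option<u64>) -> Option<u64> {
    policy_timeout.or(global_timeout)
}

/// Timeout protection is needed when the global timeout is set OR when at
/// least one policy sets its own timeout. Checking only the global value is
/// the kind of gap the commit message describes.
fn timeout_protection_needed(global_timeout: Option<u64>, policy_timeouts: &[Option<u64>]) -> bool {
    global_timeout.is_some() || policy_timeouts.iter().any(Option::is_some)
}
```

In this sketch, a server with no global timeout and a single policy that sets a 1 second timeout still reports that protection is needed.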

flavio force-pushed the fix/timeout branch from dc5f6d9 to 2526adc Compare September 20, 2025 13:58

viccuad commented Sep 22, 2025

jvanz approved these changes Sep 22, 2025

jvanz suggested changes Sep 22, 2025

flavio added 2 commits September 23, 2025 07:29

chore(test): fix linter

82cec0f

Signed-off-by: Flavio Castelli <[email protected]>

flavio force-pushed the fix/timeout branch from 2526adc to 82cec0f Compare September 23, 2025 05:30

viccuad merged commit 92cd9c0 into kubewarden:main Sep 23, 2025
11 of 14 checks passed

viccuad merged 6 commits into kubewarden:main from viccuad:fix/timeout
