fix: Correctly create PolicyEvaluatorPre and consume eval timeouts #1284
viccuad merged 6 commits into kubewarden:main
Conversation
Codecov Report ❌ Patch coverage is

```
@@            Coverage Diff             @@
##             main    #1284      +/-   ##
==========================================
- Coverage   26.62%   26.61%   -0.02%
==========================================
  Files          16       16
  Lines        1044     1052       +8
==========================================
+ Hits          278      280       +2
- Misses        766      772       +6
```

☔ View full report in Codecov by Sentry.
Force-pushed from 7396a27 to 9d18f68.
Dropped the commit adding logging to the integration tests, which slowed them down in CI and made them fail.
@viccuad your approach works, but it leads to more memory being used. That's because, when multiple policies share the same module but have different timeout values, we end up creating multiple "pre" objects. This leads to more memory being used by the policy server. I've looked into the codebase and I discovered that the evaluation timeout doesn't actually need to be bound to the "pre" objects. Instead, it can be provided to a "pre" object at rehydration time. This required quite a few changes across different projects:
Stuff left to be done:
I'll be traveling next week, but I'll find time to review your comments. Please ping me on Slack whenever you leave a message on one of these PRs.
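The rehydration-time approach described above can be sketched as follows. This is a minimal illustration, not the actual policy-server/policy-evaluator types: the struct and method names are hypothetical stand-ins. The point is that when the eval timeout is supplied at rehydration time rather than at "pre" creation time, one cached "pre" object per module is enough, even when policies sharing that module use different timeouts.

```rust
use std::collections::HashMap;
use std::time::Duration;

// Hypothetical stand-in for a precompiled policy ("pre" object).
#[derive(Clone)]
struct PolicyEvaluatorPre {
    module_digest: String,
}

// Hypothetical stand-in for a ready-to-run evaluator.
struct PolicyEvaluator {
    module_digest: String,
    eval_timeout: Option<Duration>,
}

impl PolicyEvaluatorPre {
    // The timeout is passed at rehydration time, so the "pre" object
    // itself stays timeout-agnostic and can be shared.
    fn rehydrate(&self, eval_timeout: Option<Duration>) -> PolicyEvaluator {
        PolicyEvaluator {
            module_digest: self.module_digest.clone(),
            eval_timeout,
        }
    }
}

fn main() {
    let mut cache: HashMap<String, PolicyEvaluatorPre> = HashMap::new();
    let digest = "sha256:abc".to_owned();
    cache
        .entry(digest.clone())
        .or_insert(PolicyEvaluatorPre { module_digest: digest });

    // Two policies share the module but use different timeouts; both
    // rehydrate from the single cached "pre" object.
    let pre = cache.get("sha256:abc").unwrap();
    let _sleep = pre.rehydrate(None);
    let sleep_1s = pre.rehydrate(Some(Duration::from_secs(1)));

    assert_eq!(cache.len(), 1);
    assert_eq!(sleep_1s.eval_timeout, Some(Duration::from_secs(1)));
    println!("pre objects cached: {}", cache.len());
}
```

This is why the alternative avoids the memory growth of the keyed-by-timeout approach: the cache size stays proportional to the number of distinct modules, not module/timeout pairs.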
Before, we were creating a PolicyEvaluator per compiled module.
This is not enough, as each policy can now have a different
spec.timeoutEvalSeconds.
This bug made the timeout integration test fail in a flaky manner.
The integration tests have 2 policies, `sleep` and `sleep-1s-timeout`.
They are saved in a hashmap (nondeterministic order). If on the run the
`sleep` policy was registered first, the `sleep-1s-timeout` would reuse
the `sleep` PolicyEvaluator, which meant that it didn't have the
spec.timeoutEvalSeconds assigned and the test would fail.
To fix this:
- Extend the check when registering and creating a PolicyEvaluatorPre
from only the module to the module + policy_evaluation_limit_seconds.
Add new `PolicyEvaluatorPreKey{}` with those values as key for the
hashmap of policies and PolicyEvaluatorPre.
- Add `module_digest_and_eval_timeout_to_policy_evaluator_pre`, needed
to obtain the eval timeout used in the key when we are rehydrating.
Signed-off-by: Víctor Cuadrado Juan <[email protected]>
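The composite-key idea from the commit message above can be sketched like this. The names mirror the `PolicyEvaluatorPreKey{}` described in the commit, but the types and helper are simplified illustrations, not the actual policy-server code: keying the hashmap by module digest plus `policy_evaluation_limit_seconds` means two policies that share a module but differ in timeout get distinct "pre" objects, so the second policy can no longer silently inherit the first one's (missing) timeout.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a precompiled policy ("pre" object).
#[derive(Debug, Clone)]
struct PolicyEvaluatorPre {
    module_digest: String,
    eval_timeout_seconds: Option<u64>,
}

// Composite key: module digest + eval timeout, mirroring the
// `PolicyEvaluatorPreKey{}` described in the commit message.
#[derive(Hash, PartialEq, Eq)]
struct PolicyEvaluatorPreKey {
    module_digest: String,
    policy_evaluation_limit_seconds: Option<u64>,
}

fn get_or_create<'a>(
    cache: &'a mut HashMap<PolicyEvaluatorPreKey, PolicyEvaluatorPre>,
    module_digest: &str,
    timeout: Option<u64>,
) -> &'a PolicyEvaluatorPre {
    let key = PolicyEvaluatorPreKey {
        module_digest: module_digest.to_owned(),
        policy_evaluation_limit_seconds: timeout,
    };
    // Reuse only when BOTH digest and timeout match.
    cache.entry(key).or_insert_with(|| PolicyEvaluatorPre {
        module_digest: module_digest.to_owned(),
        eval_timeout_seconds: timeout,
    })
}

fn main() {
    let mut cache = HashMap::new();
    // Same module, different timeouts: two distinct "pre" objects,
    // so `sleep-1s-timeout` no longer reuses `sleep`'s evaluator.
    get_or_create(&mut cache, "sha256:abc", None);
    get_or_create(&mut cache, "sha256:abc", Some(1));
    // Same module + same timeout: the cached entry is reused.
    get_or_create(&mut cache, "sha256:abc", Some(1));
    assert_eq!(cache.len(), 2);
    println!("cache entries: {}", cache.len());
}
```

Note the tradeoff the reviewer points out later in the thread: this fixes the flaky test, but one module shared by N distinct timeouts now yields N "pre" objects in memory.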
Ensure we enable eval timeout even if we don't have policy-server global timeout. Signed-off-by: Víctor Cuadrado Juan <[email protected]>
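The first bug fix mentioned in the commit above, enabling the per-policy eval timeout even without a policy-server global timeout, can be sketched as a simple resolution function. This is a hypothetical illustration of the intended behavior (the function name and signature are invented, not taken from policy-server): the per-policy value must be honored on its own, rather than being gated on the global value being set.

```rust
// Hypothetical sketch: resolve the effective eval timeout for a policy.
// Before the fix, an unset global timeout effectively disabled the
// per-policy `spec.timeoutEvalSeconds` as well.
fn effective_eval_timeout(
    global_timeout_seconds: Option<u64>,
    policy_timeout_seconds: Option<u64>,
) -> Option<u64> {
    // Per-policy timeout wins when present; otherwise fall back to the
    // global one; otherwise run with no timeout.
    policy_timeout_seconds.or(global_timeout_seconds)
}

fn main() {
    // Per-policy timeout honored even without a global timeout.
    assert_eq!(effective_eval_timeout(None, Some(1)), Some(1));
    // Global timeout used as a fallback.
    assert_eq!(effective_eval_timeout(Some(10), None), Some(10));
    // No timeout configured anywhere.
    assert_eq!(effective_eval_timeout(None, None), None);
    println!("timeout resolution ok");
}
```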
I just did a force-push because I had to solve some conflicts between this branch and
viccuad
left a comment
Changes LGTM, thanks! (can't approve, I'm also author)
jvanz
left a comment
Considering all the changes related to this fix (all the other PRs opened by Flavio), the code looks fine. But there is a linter error that we can fix to make the CI green.
The lint fix would be the same as this one:
The prior commit that fixes kubewarden#1270 causes a higher memory usage. That happens because, if the same module is used by two different policies, each one of them with different epoch deadlines, two "pre" objects would be created and kept in memory. This happens because the epoch deadline is bound to the "pre" object. This limitation has been changed with PR kubewarden/policy-evaluator#766. This commit uses the new API introduced by the PR linked above, allowing to go back to the old memory consumption. Signed-off-by: Flavio Castelli <[email protected]>
I've updated the PR to use officially tagged crates, plus I've fixed the linter error in the way @viccuad suggested.
Thanks, merging! |
Description
Fix #1270.
With this, both integration tests and coverage (which run integration tests) should not be flaky anymore.
This PR fixes 2 bugs:
Ensure we enable eval timeout even if we don't have policy-server global timeout.
Before, we were creating a PolicyEvaluator per compiled module.
This is not enough, as each policy can now have a different
spec.timeoutEvalSeconds.
This bug made the timeout integration test fail in a flaky manner.
The integration tests have 2 policies, `sleep` and `sleep-1s-timeout`. They are saved in a hashmap (nondeterministic order). If on the run the `sleep` policy was registered first, the `sleep-1s-timeout` would reuse the `sleep` PolicyEvaluator, which meant that it didn't have the spec.timeoutEvalSeconds assigned and the test would fail.
To fix this:
- Extend the check when registering and creating a PolicyEvaluatorPre from only the module to the module + policy_evaluation_limit_seconds. Add new `PolicyEvaluatorPreKey{}` with those values as key for the hashmap of policies and PolicyEvaluatorPre.
- Add `module_digest_and_eval_timeout_to_policy_evaluator_pre`, needed to obtain the eval timeout used in the key when we are rehydrating.
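The second bullet above can be illustrated with a small sketch. The side map and lookup function here are hypothetical simplifications (the real helper is `module_digest_and_eval_timeout_to_policy_evaluator_pre` in policy-server): since the cache key now includes the eval timeout, rehydration needs a way to recover each policy's timeout in order to rebuild the composite key from a policy id.

```rust
use std::collections::HashMap;

// Hypothetical sketch: a side map from policy_id to its eval timeout,
// consulted at rehydration time to reconstruct the composite cache key
// (module digest + timeout).
fn lookup_timeout(
    policy_timeouts: &HashMap<String, Option<u64>>,
    policy_id: &str,
) -> Option<u64> {
    // `get` yields Option<&Option<u64>>; copy and flatten to Option<u64>.
    policy_timeouts.get(policy_id).copied().flatten()
}

fn main() {
    let mut policy_timeouts: HashMap<String, Option<u64>> = HashMap::new();
    policy_timeouts.insert("sleep".into(), None);
    policy_timeouts.insert("sleep-1s-timeout".into(), Some(1));

    // Each policy resolves to its own timeout, so rehydration picks the
    // right "pre" object even when both policies share a module.
    assert_eq!(lookup_timeout(&policy_timeouts, "sleep"), None);
    assert_eq!(lookup_timeout(&policy_timeouts, "sleep-1s-timeout"), Some(1));
    println!("timeouts resolved");
}
```

This side map is the source of the small space-cost increase noted in the Tradeoff section below the Test notes.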
Test
Extended existing unit test for creation of PolicyEvaluators.
Use existing integration tests for the timeoutEvalSeconds feature.
Add configurable logging output to integration tests.
Additional Information
Tradeoff
Irrelevant space cost increase of +N in the worst case, N: number of policies, as we have a new hashmap of `policy_id`, eval timeout.
Potential improvement
Maybe drop the commit that adds configurable logging output to integration tests (defaults to INFO) if it slows test runs.