Conversation
Signed-off-by: Alex Leong <[email protected]>
Ah, the irony of hitting a different test flakiness. For the record: I will rerun the integration tests and investigate that flakiness separately.
```rust
let endpoints_ready = |obj: Option<&k8s::Endpoints>| -> bool {
    if let Some(ep) = obj {
        return ep.subsets.iter().flatten().count() > 0;
    }
    false
};
await_condition(&client, &ns, "web", endpoints_ready).await;
```
nit/TIOLI: is there a reason endpoints_ready has to be repeated in all these tests, or can it be a helper in linkerd_policy_test?
Signed-off-by: Alex Leong <[email protected]>
hawkw left a comment
looks good to me! i commented on a couple of nits that you're welcome to address or ignore :)
```rust
if let Some(ep) = obj {
    return ep.subsets.iter().flatten().count() > 0;
}
false
```
nit, take it or leave it: could also be represented as
```rust
obj.into_iter().flat_map(|ep| ep.subsets.iter().flatten()).count() > 0
```
although 🤷‍♀️ not sure if that's any clearer.
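To see that the two forms agree, here is a minimal, self-contained sketch. The `Endpoints` and `EndpointSubset` structs below are simplified stand-ins for the `k8s-openapi` types used in the real test, not the actual definitions:

```rust
// Simplified stand-ins for the k8s-openapi types (hypothetical, for illustration).
struct EndpointSubset;

struct Endpoints {
    subsets: Option<Vec<EndpointSubset>>,
}

// Original form: explicit `if let` with an early return.
fn ready_if_let(obj: Option<&Endpoints>) -> bool {
    if let Some(ep) = obj {
        return ep.subsets.iter().flatten().count() > 0;
    }
    false
}

// Suggested form: treat the `Option` as a zero-or-one-element iterator
// and flatten through the nested `Option<Vec<_>>`.
fn ready_flat_map(obj: Option<&Endpoints>) -> bool {
    obj.into_iter()
        .flat_map(|ep| ep.subsets.iter().flatten())
        .count()
        > 0
}

fn main() {
    let populated = Endpoints { subsets: Some(vec![EndpointSubset]) };
    let empty = Endpoints { subsets: None };
    for obj in [None, Some(&empty), Some(&populated)] {
        // Both forms agree on every case: missing object, empty subsets,
        // and a populated Endpoints resource.
        assert_eq!(ready_if_let(obj), ready_flat_map(obj));
    }
    println!("equivalent");
}
```

Both rely on the fact that `Option<T>` implements `IntoIterator`, so an absent object or absent subsets list simply contributes zero items to the count.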
```rust
        .inner
}

// Wait for the endpoints controller to populate the Endpoints resource.
```
nit, not a big deal:
```rust
/// Returns `true` if the endpoints controller has populated an Endpoints resource.
```
we could leave the "// wait for..." comment in the places where this is actually used with await_condition, if you want, but not a big deal either way.
We intermittently see flaky policy integration test failures like:
```
failures:
either
thread 'either' panicked at 'assertion failed: `(left == right)`
left: `7`,
right: `0`: blessed uninjected curl must succeed', policy-test/tests/e2e_server_authorization.rs:293:9
```
This failure means that the curl process returned an exit code of 7 instead of the expected 0. Exit code 7 indicates that curl failed to establish a connection: https://everything.curl.dev/usingcurl/returns
It's unclear why this connection occasionally fails in CI and I have not been able to reproduce this failure locally.
However, looking at the logic of the integration test, we can see that it creates the `web` Service and the `web` Pod, and waits for that pod to become ready before unblocking curl. This means that, theoretically, there is a race between the test and the Kubernetes endpoints controller: as soon as the `web` pod becomes ready, the endpoints controller updates the Endpoints resource for the `web` Service, and at the same time our test unblocks the curl command. If the test wins this race, curl may run before the Endpoints resource has been updated.
We add an additional wait condition to the test so that it waits until the Endpoints resource has at least one endpoint before unblocking curl.
Since I could not reproduce the test failure locally, it is impossible to say if this is actually the cause of the flakiness or if this change fixes it.
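The shape of the fix can be sketched without a cluster. The snippet below is a self-contained illustration only: `Endpoints`, `endpoints_ready`, and the blocking `wait_until` helper are simplified stand-ins, since the real test uses the `k8s-openapi` `Endpoints` type and kube's watch-based `await_condition` rather than polling:

```rust
use std::time::Duration;

// Simplified stand-in for the k8s Endpoints resource: `None` subsets means
// the endpoints controller has not populated the resource yet (hypothetical
// type, for illustration).
struct Endpoints {
    subsets: Option<Vec<()>>,
}

// Readiness predicate in the same shape as the closure added to the test:
// true once the Endpoints resource has at least one subset entry.
fn endpoints_ready(obj: Option<&Endpoints>) -> bool {
    if let Some(ep) = obj {
        return ep.subsets.iter().flatten().count() > 0;
    }
    false
}

// Hypothetical blocking sketch of the wait: repeatedly fetch the object and
// check the predicate. kube's real `await_condition` watches the API server
// instead of polling.
fn wait_until<T>(
    mut fetch: impl FnMut() -> Option<T>,
    cond: impl Fn(Option<&T>) -> bool,
    attempts: usize,
) -> bool {
    for _ in 0..attempts {
        let obj = fetch();
        if cond(obj.as_ref()) {
            return true;
        }
        std::thread::sleep(Duration::from_millis(10));
    }
    false
}

fn main() {
    // Simulate the endpoints controller populating the resource on the third poll.
    let mut polls = 0;
    let became_ready = wait_until(
        || {
            polls += 1;
            if polls < 3 {
                Some(Endpoints { subsets: None })
            } else {
                Some(Endpoints { subsets: Some(vec![()]) })
            }
        },
        endpoints_ready,
        10,
    );
    assert!(became_ready);
    println!("endpoints became ready after {polls} polls");
}
```

The point of the wait is exactly this: curl only runs once the predicate has observed a populated Endpoints resource, closing the window where the pod is ready but the endpoints are not.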
Signed-off-by: Alex Leong <[email protected]>
Signed-off-by: Adam Shaw <[email protected]>
#11368 added a step to certain e2e integration tests where we await the endpoints becoming ready before attempting to send traffic to them. This was done to combat flakiness in those tests. We have observed flakiness in other, similar tests, `targets_route` in particular. We add the same await step to that test and to all other tests of that form. Given the nature of flaky tests, it's difficult to confirm that this fixes the flakiness.
Signed-off-by: Alex Leong <[email protected]>