Conversation
2edb5d4 to
3b4ecf2
Compare
13aead3 to
301fbaf
Compare
agent/xds/delta_test.go
Outdated
There was a problem hiding this comment.
~ Given how huge delta_test.go is, I think it might be worth considering moving some of these recent tests to delta_envoy_extender_test.go, which is currently just a set of helpers to aid goldens in their respective test files.
Following suit with Test_validateAndApplyEnvoyExtension_Validations today to minimize change.
There was a problem hiding this comment.
These changes were necessary to avoid nil pointer panics when calling CanApply on extensions that use these helpers, which aren't guaranteed to have data outside their use case (IsSourcedFromUpstream).
301fbaf to
7402d06
Compare
|
cc @mkeeler in case you'd like to follow along since you raised this bug |
cthain
left a comment
There was a problem hiding this comment.
This looks great. Nice work on this fix, the test coverage is awesome 👍
Left a few minor, non-blocking comments.
agent/xds/delta.go
Outdated
| r = resources | ||
| // Only report an error if the extension should have been applied. | ||
| if canApply { | ||
| e = fmt.Errorf("unexpected panic: %v", e) |
There was a problem hiding this comment.
I'm glad you guard against the case of faulty extension logic causing a panic! WDYT about making the error message here a bit more verbose? Also, I think we want to wrap err in the new error that we return.
| e = fmt.Errorf("unexpected panic: %v", e) | |
| e = fmt.Errorf("attempt to apply Envoy extension %q caused an unexpected panic: %w", runtimeConfig.EnvoyExtension.Name, err) |
There was a problem hiding this comment.
I don't see a test for the panic case. It feels like a case we should probably unit test for.
There was a problem hiding this comment.
Good call on both 👍🏻 I'll follow up on these.
There was a problem hiding this comment.
Had to stick w/ %v due to type uncertainty of err after recover() (TIL), but updated message. Also added tests around panic.
|
Thank you for the thorough review @cthain ! Will get these comments addressed before merging, agree w/ all of them. |
Ensure that non-required extensions do not change xDS resources before exiting on failure by cloning proto messages prior to applying each extension. To support this change, also move `CanApply` checks up a layer and make them prior to attempting extension application, s.t. we avoid unnecessary copies where extensions can't be applied. Last, ensure that we do not allow panics from `CanApply` or `Extend` checks to escape the attempted extension application.
7402d06 to
a920c71
Compare
agent/xds/delta.go
Outdated
| r = resources | ||
| // Only report an error if the extension should have been applied. | ||
| if canApply { | ||
| e = fmt.Errorf("unexpected panic: %v", e) |
There was a problem hiding this comment.
Had to stick w/ %v due to type uncertainty of err after recover() (TIL), but updated message. Also added tests around panic.
| e = fmt.Errorf("attempt to apply Envoy extension %q caused an unexpected panic: %v", | ||
| runtimeConfig.EnvoyExtension.Name, err) |
There was a problem hiding this comment.
When writing tests for the panic handling, I decided it was best to err on the side of explicit errors if CanApply fails entirely, so I removed the previous conditional around this logging.
| } | ||
| } | ||
|
|
||
| func Test_applyEnvoyExtension_HandlesPanics(t *testing.T) { |
Ensure that non-required extensions do not change xDS resources before exiting on failure by cloning proto messages prior to applying each extension.
Also avoid unnecessary cloning by moving
CanApplychecks up a layer and making them before application is attempted.Description
Today, it's theoretically possible that a non-required extension could fail partway through making updates. If that extension is not configured as
Required, we will not short-circuit xDS updates; the error will be logged and swallowed, allowing any other extensions to continue applying before updating xDS resources.By proactively cloning resources that are subject to extensions, s.t. we do not pass any data by reference that is intended to be modified atomically, we can prevent the unintentional partial application of extensions regardless of their
Requiredstatus.Testing & Reproduction steps
CanApplycheck to ensure we skip application when extensions should not be appliedClonemethod ofIndexedResourcesPR Checklist