Skip to content

[NET-4703] Prevent partial application of Envoy extensions#18068

Merged
zalimeni merged 1 commit intomainfrom
zalimeni/net-4703-prevent-envoy-extension-partial-application
Jul 31, 2023
Merged

[NET-4703] Prevent partial application of Envoy extensions#18068
zalimeni merged 1 commit intomainfrom
zalimeni/net-4703-prevent-envoy-extension-partial-application

Conversation

@zalimeni
Copy link
Copy Markdown
Member

@zalimeni zalimeni commented Jul 10, 2023

Ensure that non-required extensions do not change xDS resources before exiting on failure by cloning proto messages prior to applying each extension.

Also avoid unnecessary cloning by moving CanApply checks up a layer and making them before application is attempted.

Description

Today, it's theoretically possible that a non-required extension could fail partway through making updates. If that extension is not configured as Required, we will not short-circuit xDS updates; the error will be logged and swallowed, allowing any other extensions to continue applying before updating xDS resources.

By proactively cloning resources that are subject to extensions, s.t. we do not pass any data by reference that is intended to be modified atomically, we can prevent the unintentional partial application of extensions regardless of their Required status.

Testing & Reproduction steps

  • Added tests that use a faulty extension to make changes, then fail, and verify that those changes are not retained
  • Added tests for newly promoted CanApply check to ensure we skip application when extensions should not be applied
  • Added tests for new Clone method of IndexedResources

PR Checklist

  • updated test coverage
  • external facing docs updated
  • appropriate backport labels added
  • not a security concern

@zalimeni zalimeni added pr/no-docs PR does not include docs and should not trigger reminder for cherrypicking them. backport/1.16 This release series is no longer active on CE. Use backport/ent/1.16. labels Jul 10, 2023
@github-actions github-actions bot added the theme/envoy/xds Related to Envoy support label Jul 10, 2023
@zalimeni zalimeni force-pushed the zalimeni/net-4703-prevent-envoy-extension-partial-application branch 8 times, most recently from 2edb5d4 to 3b4ecf2 Compare July 11, 2023 14:58
@zalimeni zalimeni changed the title Prevent partial application of Envoy extensions [NET-4703] Prevent partial application of Envoy extensions Jul 11, 2023
@zalimeni zalimeni force-pushed the zalimeni/net-4703-prevent-envoy-extension-partial-application branch 6 times, most recently from 13aead3 to 301fbaf Compare July 24, 2023 17:57
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

~ Given how huge delta_test.go is, I think it might be worth considering moving some of these recent tests to delta_envoy_extender_test.go, which is currently just a set of helpers to aid goldens in their respective test files.

Following suit with Test_validateAndApplyEnvoyExtension_Validations today to minimize change.

Comment on lines 73 to 76
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes were necessary to avoid nil pointer panics when calling CanApply on extensions that use these helpers, which aren't guaranteed to have data outside their use case (IsSourcedFromUpstream).

@zalimeni zalimeni force-pushed the zalimeni/net-4703-prevent-envoy-extension-partial-application branch from 301fbaf to 7402d06 Compare July 24, 2023 18:00
@zalimeni zalimeni marked this pull request as ready for review July 24, 2023 18:25
@zalimeni zalimeni requested review from cthain and hashi-derek July 24, 2023 18:27
@zalimeni
Copy link
Copy Markdown
Member Author

cc @mkeeler in case you'd like to follow along since you raised this bug

Copy link
Copy Markdown
Contributor

@cthain cthain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. Nice work on this fix, the test coverage is awesome 👍

Left a few minor, non-blocking comments.

r = resources
// Only report an error if the extension should have been applied.
if canApply {
e = fmt.Errorf("unexpected panic: %v", e)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm glad you guard against the case of faulty extension logic causing a panic! WDYT about making the error message here a bit more verbose? Also, I think we want to wrap err in the new error that we return.

Suggested change
e = fmt.Errorf("unexpected panic: %v", e)
e = fmt.Errorf("attempt to apply Envoy extension %q caused an unexpected panic: %w", runtimeConfig.EnvoyExtension.Name, err)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a test for the panic case. It feels like a case we should probably unit test for.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call on both 👍🏻 I'll follow up on these.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had to stick w/ %v due to type uncertainty of err after recover() (TIL), but updated message. Also added tests around panic.

@zalimeni
Copy link
Copy Markdown
Member Author

Thank you for the thorough review @cthain ! Will get these comments addressed before merging, agree w/ all of them.

Ensure that non-required extensions do not change xDS resources before
exiting on failure by cloning proto messages prior to applying each
extension.

To support this change, also move `CanApply` checks up a layer and make
them prior to attempting extension application, s.t. we avoid
unnecessary copies where extensions can't be applied.

Last, ensure that we do not allow panics from `CanApply` or `Extend`
checks to escape the attempted extension application.
@zalimeni zalimeni force-pushed the zalimeni/net-4703-prevent-envoy-extension-partial-application branch from 7402d06 to a920c71 Compare July 27, 2023 22:23
r = resources
// Only report an error if the extension should have been applied.
if canApply {
e = fmt.Errorf("unexpected panic: %v", e)
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had to stick w/ %v due to type uncertainty of err after recover() (TIL), but updated message. Also added tests around panic.

Comment on lines +549 to +550
e = fmt.Errorf("attempt to apply Envoy extension %q caused an unexpected panic: %v",
runtimeConfig.EnvoyExtension.Name, err)
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When writing tests for the panic handling, I decided it was best to err on the side of explicit errors if CanApply fails entirely, so I removed the previous conditional around this logging.

}
}

func Test_applyEnvoyExtension_HandlesPanics(t *testing.T) {
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New panic test here

@zalimeni zalimeni requested a review from cthain July 27, 2023 22:26
Copy link
Copy Markdown
Contributor

@cthain cthain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀