Skip to content

thrift: Reject router config if cluster/weighted clusters/mirror cluster doesn't exist for static route config provider#20577

Merged
wrowe merged 13 commits intoenvoyproxy:mainfrom
JuniorHsu:unknownCluster
Apr 7, 2022
Merged

thrift: Reject router config if cluster/weighted clusters/mirror cluster doesn't exist for static route config provider#20577
wrowe merged 13 commits intoenvoyproxy:mainfrom
JuniorHsu:unknownCluster

Conversation

@JuniorHsu
Copy link
Copy Markdown
Contributor

@JuniorHsu JuniorHsu commented Mar 30, 2022

Signed-off-by: kuochunghsu [email protected]

Commit Message:
Expose ServerFactoryContext to ConfigTraitsImpl::createConfig so the underlying RouteEntryImpl is able to validate its clusters.

Currently HTTP validates the clusters for static route config by default but doesn't validate the clusters for RDS.
Moreover, optional field validate_clusters changes the default behavior.

In this diff, thrift route config provider manager do the same default behavior for cluster validation.
That is, validate the clusters for static route config by default but not validate the clusters for TRDS.

Optional field validate_clusters is coming soon.

Risk Level: low
Testing: unit test
[Optional Fixes #Issue] #20559

@JuniorHsu JuniorHsu requested a review from zuercher as a code owner March 30, 2022 03:36
@JuniorHsu JuniorHsu marked this pull request as draft March 30, 2022 03:37
@JuniorHsu
Copy link
Copy Markdown
Contributor Author

Before I fixed all the interesting tests and add the appropriate test coverage.

Like http config setup, we don't validate the cluster for rds. Instead, only validate the clusters in static config.

Which introduce two interfaces I'm hesitating to determine:

virtual ConfigConstSharedPtr createConfig(const Protobuf::Message& rc,
                                            Server::Configuration::ServerFactoryContext& context,
                                            bool validate_clusters_default) const PURE;

I'm not sure if we need to have the flexibility the change the validate_clusters_default while createConfig
Or once we decide the configTraits, the rule is fixed.

Currently I keep the flexibility since reverting interface is expensive .
2.

RouteMatcher::RouteMatcher(
    const envoy::extensions::filters::network::thrift_proxy::v3::RouteConfiguration& config,
    Server::Configuration::ServerFactoryContext& factory_context, bool validate_clusters)

Same thing, for thrift RouteMatcher, we could pass the limited information absl::optional<Upstream::ClusterManager::ClusterInfoMaps> instead of while context.

It's not an interface, we might want the minimal exposure principal?

cc @zuercher @rgs1 @tkovacs-2

@JuniorHsu
Copy link
Copy Markdown
Contributor Author

JuniorHsu commented Mar 30, 2022

After I take another look, 1. is not an issue given the config traits in common/rds doesn't know if it should create a config with validation. Unless the traits take another parameters.

I'll go ahead with 2. to minimal exposure and fix the tests.

kuochunghsu added 2 commits March 30, 2022 10:05
Copy link
Copy Markdown
Member

@rgs1 rgs1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM, just one nit

@JuniorHsu JuniorHsu marked this pull request as ready for review March 30, 2022 19:41
@tkovacs-2
Copy link
Copy Markdown
Contributor

I'm not sure if we need to have the flexibility the change the validate_clusters_default while createConfig. Or once we decide the configTraits, the rule is fixed.

Since it is not protocol specific, I think the validate_clusters_default is ok here.

It's not an interface, we might want the minimal exposure principal?

I also vote for minimal exposure.

kuochunghsu added 2 commits March 30, 2022 21:03
Signed-off-by: kuochunghsu <[email protected]>
Signed-off-by: kuochunghsu <[email protected]>
@tkovacs-2
Copy link
Copy Markdown
Contributor

Don't you add flag for the validation into the config proto?
I'm not sure but in that case if the listener and the routing is in the bootstrap config but the clusters are through xDS then the validation fails at startup.

@JuniorHsu
Copy link
Copy Markdown
Contributor Author

Don't you add flag for the validation into the config proto? I'm not sure but in that case if the listener and the routing is in the bootstrap config but the clusters are through xDS then the validation fails at startup.

No. Currently follow what the http did: only validate in static provider. No config proto is able to change the behavior at the moment.

As for your scenario, envoy will retry after the cluster arrives after it fails at startup without cluster information. Not perfect but it works.

@JuniorHsu
Copy link
Copy Markdown
Contributor Author

Writing fixes to clang-tidy-fixes.yaml ...
clang-tidy check failed, potentially fixed by clang-apply-replacements:
Diagnostics:
- BuildDirectory: /build/tmp/_bazel_envoybuild/b570b5ccd0454dc9af9f65ab1833764d/execroot/envoy
  DiagnosticMessage:
    FileOffset: 298
    FilePath: ./source/common/common/logger.h
    Message: '''source/common/common/logger_impl.h'' file not found'
    Replacements: []
  DiagnosticName: clang-diagnostic-error
  Level: Error
  Ranges:
  - FileOffset: 298
    FilePath: ./source/common/common/logger.h
    Length: 36

For the test failure in "envoy-presubmit (check linux_x64 clang_tidy)" , I guess it's unrelated since I didn't touch nor include the logger.

@JuniorHsu
Copy link
Copy Markdown
Contributor Author

Note: We might want validate_clusters in proto at some point like http.

google.protobuf.BoolValue validate_clusters = 7;

@JuniorHsu
Copy link
Copy Markdown
Contributor Author

/retest

@repokitteh-read-only
Copy link
Copy Markdown

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #20577 (comment) was created by @JuniorHsu.

see: more, trace.

Copy link
Copy Markdown
Member

@rgs1 rgs1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks reasonable to me, thanks.

@htuch do you have some cycles to look at the common RDS code?

@tkovacs-2
Copy link
Copy Markdown
Contributor

tkovacs-2 commented Apr 2, 2022

Note: We might want validate_clusters in proto at some point like http.

I would set the default to false even for of static case and add the flag to the proto to control the validation.
The current state is not fully backward compatible: If someone has a dummy/legacy rule referring to an unknown cluster but never matches, this change breaks a previously working system.

route:
cluster: cluster2
)EOF";

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: if you want, you can add back the assert for the case when the cluster list is empty.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done. thanks!

virtual ConfigConstSharedPtr createConfig(const Protobuf::Message& rc) const PURE;
virtual ConfigConstSharedPtr createConfig(const Protobuf::Message& rc,
Server::Configuration::ServerFactoryContext& context,
bool validate_clusters_default) const PURE;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please document the new params in the Doxygen-style comments above.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done. thanks.

Copy link
Copy Markdown
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo testing question.

@JuniorHsu
Copy link
Copy Markdown
Contributor Author

@htuch it's ready to have another review with three follow-up items
(a) make validate_clusters configurable (I can do it after this lands)
(b) consider cluster manager return a ref
(c) consider remove ClusterHeader() from ThriftProxy::RouteEntry

thanks

htuch
htuch previously approved these changes Apr 5, 2022
Copy link
Copy Markdown
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@htuch htuch enabled auto-merge (squash) April 5, 2022 03:51
@JuniorHsu
Copy link
Copy Markdown
Contributor Author

Writing fixes to clang-tidy-fixes.yaml ...
clang-tidy check failed, potentially fixed by clang-apply-replacements:
Diagnostics:
- BuildDirectory: /build/tmp/_bazel_envoybuild/b570b5ccd0454dc9af9f65ab1833764d/execroot/envoy
  DiagnosticMessage:
    FileOffset: 298
    FilePath: ./source/common/common/logger.h
    Message: '''source/common/common/logger_impl.h'' file not found'
    Replacements: []
  DiagnosticName: clang-diagnostic-error
  Level: Error
  Ranges:
  - FileOffset: 298
    FilePath: ./source/common/common/logger.h
    Length: 36

For the test failure in "envoy-presubmit (check linux_x64 clang_tidy)" , I guess it's unrelated since I didn't touch nor include the logger.

This might need a manual merge.
The clang-tidy error from Error while processing /source/envoy/rds/config_traits.h.
which includes context files which include logger

However the clang-tidy can't parse strip_include_prefix

strip_include_prefix = "standard",

@wrowe
Copy link
Copy Markdown
Contributor

wrowe commented Apr 6, 2022

The logger was indirectly added by including "envoy/server/factory_context.h", probably without adding the deps for it to envoy/rds/config_traits.h

@wrowe
Copy link
Copy Markdown
Contributor

wrowe commented Apr 6, 2022

/wait for deps to be patched up

Signed-off-by: kuochunghsu <[email protected]>
@JuniorHsu
Copy link
Copy Markdown
Contributor Author

The logger was indirectly added by including "envoy/server/factory_context.h", probably without adding the deps for it to envoy/rds/config_traits.h

It's a promising solution and it takes some effort to break the circular reference. Next commit is comming

.-> //envoy/router:router_interface (8a9baa446c4cdc335df7924c31f550a82dcac9f2cefefd75763ac169448146ca)
|   //envoy/rds:rds_interface (8a9baa446c4cdc335df7924c31f550a82dcac9f2cefefd75763ac169448146ca)
|   //envoy/server:factory_context_interface (8a9baa446c4cdc335df7924c31f550a82dcac9f2cefefd75763ac169448146ca)
|   //envoy/http:filter_interface (8a9baa446c4cdc335df7924c31f550a82dcac9f2cefefd75763ac169448146ca)
`-- //envoy/router:router_interface (8a9baa446c4cdc335df7924c31f550a82dcac9f2cefefd75763ac169448146ca)

auto-merge was automatically disabled April 6, 2022 02:52

Head branch was pushed to by a user without write access

@JuniorHsu
Copy link
Copy Markdown
Contributor Author

/retest

@repokitteh-read-only
Copy link
Copy Markdown

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #20577 (comment) was created by @JuniorHsu.

see: more, trace.

@wrowe wrowe enabled auto-merge (squash) April 6, 2022 15:19
@wrowe wrowe self-assigned this Apr 6, 2022
@wrowe
Copy link
Copy Markdown
Contributor

wrowe commented Apr 6, 2022

Would a reviewer please revisit? It looks that all the comments and feedback have been addressed.

@JuniorHsu
Copy link
Copy Markdown
Contributor Author

@htuch could you move this forward? I commited @wrowe 's great suggestion to make the test happier. Thanks.

Copy link
Copy Markdown
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@wrowe wrowe merged commit 456e449 into envoyproxy:main Apr 7, 2022
@JuniorHsu JuniorHsu deleted the unknownCluster branch April 15, 2022 00:43
htuch pushed a commit that referenced this pull request Apr 27, 2022
#20874)

This is a continuation of #20577.

Additional Description:
Risk Level: low
Testing: unit test

Signed-off-by: kuochunghsu <[email protected]>
ravenblackx pushed a commit to ravenblackx/envoy that referenced this pull request Jun 8, 2022
…ter doesn't exist for static route config provider (envoyproxy#20577)

* Reject thrift router config if cluster/weighted clusters/mirror cluster don't exist
* Only expose cluster information to `RouteMatcher`
* Fix router ratelimit test, add unit test, add test coverage
* address rgs1's , tkovacs-2's comment, Tamas and Harvey's comments
* remove redundant ref, dead code and 
* add comment to state the current status
* add dep lib

Signed-off-by: kuochunghsu <[email protected]>
ravenblackx pushed a commit to ravenblackx/envoy that referenced this pull request Jun 8, 2022
envoyproxy#20874)

This is a continuation of envoyproxy#20577.

Additional Description:
Risk Level: low
Testing: unit test

Signed-off-by: kuochunghsu <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants