cluster: destroy on main thread#14954
Conversation
Signed-off-by: Yuchen Dai <[email protected]>
Signed-off-by: Yuchen Dai <[email protected]>
Signed-off-by: Yuchen Dai <[email protected]>
Signed-off-by: Yuchen Dai <[email protected]>
Signed-off-by: Yuchen Dai <[email protected]>
Signed-off-by: Yuchen Dai <[email protected]>
Signed-off-by: Yuchen Dai <[email protected]>
Signed-off-by: Yuchen Dai <[email protected]>
Signed-off-by: Yuchen Dai <[email protected]>
Signed-off-by: Yuchen Dai <[email protected]>
Signed-off-by: Yuchen Dai <[email protected]>
Signed-off-by: Yuchen Dai <[email protected]>
Signed-off-by: Yuchen Dai <[email protected]>
Signed-off-by: Yuchen Dai <[email protected]>
Signed-off-by: Yuchen Dai <[email protected]>
|
ClusterInfo is not really destroyed because mainthread dispatcher stops dispatching posted tasks. SSLContextManager and symbol tables are not happy: these two managers expect all resources destroyed. Before this commit the main thread dispatcher will drain the post queue but though, but the exit flag is fragile. Any advice on how to gracefully shut down dispatcher? @antoniovicente |
|
The tests are failing because symbol tables is not empty when the IsolatedStoreImpl is destroyed. I am investigating. But it's also good to hear if this is a big issue. |
antoniovicente
left a comment
There was a problem hiding this comment.
ClusterInfo is not really destroyed because mainthread dispatcher stops dispatching posted tasks. SSLContextManager and symbol tables are not happy: these two managers expect all resources destroyed.
Before this commit the main thread dispatcher will drain the post queue but though, but the exit flag is fragile.
Any advice on how to gracefully shut down dispatcher? @antoniovicente
Clean shutdown of dispatchers is a fairly involved topic. The envoy dispatcher class hasn't really tried to shoot for clean shutdown. There are some things that could be done to move in that direction like:
- introducing a bool tryPost(cb) that rejects additional callbacks after shutdown.
- executing scheduled callbacks as part of the shutdown process
- force close downstream connections
- not sure what's relevant to timers
Signed-off-by: Yuchen Dai <[email protected]>
Signed-off-by: Yuchen Dai <[email protected]>
Signed-off-by: Yuchen Dai <[email protected]>
Signed-off-by: Yuchen Dai <[email protected]>
Signed-off-by: Yuchen Dai <[email protected]>
Signed-off-by: Yuchen Dai <[email protected]>
|
Is tsan error a red herring? |
|
fixing format |
Signed-off-by: Yuchen Dai <[email protected]>
|
Thank you! |
|
@cpakulski The buggy behavior caused lots of churns in istio. See the PR description. Can we backport it? |
|
@lambdai Sure. To which releases? |
Thanks! The newly added SdsCdsIntegrationTest doesn't use any recent features. I am afraid all the stable releases are impacted. |
|
OK - will port to all 4 latest releases. |
|
Awesome. Thanks! |
Signed-off-by: Yuchen Dai <[email protected]> Signed-off-by: Christoph Pakulski <[email protected]>
* Dispatcher: keeps a stack of tracked objects. (#14573) Dispatcher will now keep a stack of tracked objects; on crash it'll "unwind" and have those objects dump their state. Moreover, it'll invoke fatal actions with the tracked objects. This allows us to dump more information during crash. See related PR: #14509 Will follow up with another PR dumping information at the codec/parser level. Signed-off-by: Kevin Baichoo <[email protected]> Signed-off-by: Christoph Pakulski <[email protected]> * cluster: destroy on main thread (#14954) Signed-off-by: Yuchen Dai <[email protected]> Signed-off-by: Christoph Pakulski <[email protected]> * Updated release notes. Signed-off-by: Christoph Pakulski <[email protected]> Co-authored-by: Kevin Baichoo <[email protected]> Co-authored-by: Yuchen Dai <[email protected]>
@cpakulski could you please confirm starting with version of envoy is this fixed? |
|
I do not remember the starting version. It was long long time ago. Please check the code. |
Commit Message:
Always destroy cluster info object on the master thread.
Fixed SDS churn and a rare crash case.
Additional Description:
Risk Level:
Testing:
Docs Changes:
Release Notes:
Platform Specific Features:
[Optional Runtime guard:]
Fixes:
#13209
istio/istio#30199
istio/istio#28315
[Optional Deprecated:]
[Optional API Considerations:]