admin: add an endpoint to dump spawned Tokio tasks #595
@hawkw re: feature-flagging: What's the "cost" of this? Is there any discernible difference in benchmarks, for instance?
It should be pretty minimal when disabled at runtime, but I'll do a benchmark run to make sure, good call.
Signed-off-by: Eliza Weisman <[email protected]>
};
use tokio_trace::tasks::TaskList;
#[derive(Clone)]
pub struct Tasks {
Not a blocker (obviously) but do you think this can live in hawkw/tokio-trace behind a feature flag?
kleimkuhler
left a comment
This works well for me!
I installed linkerd with this change as the proxy version and set up a port-forward for 4191 from the controller pod.
I can go to localhost:4191/tasks or localhost:4191/tasks.json and see all the tasks for the controller's proxy.
RUN --mount=type=cache,target=/var/lib/apt/lists \
--mount=type=cache,target=/var/tmp \
apt update && apt install -y time cmake
--mount=type=cache,target=/var/tmp \
Auto formatting here? Not a blocker but just wondering if this was intentional.
yup, i think this change can be removed entirely — this was left over from an attempt at feature-flagging.
Yeah, would prefer to revert this as it removes some indentation below
linkerd/app/core/src/trace.rs
Outdated
Self::Json(level) => level.reload(filter)?,
Self::Plain(level) => level.reload(filter)?,
LevelHandle::Json(level) => level.reload(filter)?,
LevelHandle::Plain(level) => level.reload(filter)?,
I think these methods briefly moved to a different type and then came back. Will back that out.
@hawkw looks like this merge wasn't clean
yeah, I'm fixing that up right now!
This release enables a multi-threaded runtime. Previously, the proxy would only ever use a single thread for data plane processing; now, when the proxy is allocated more than 1 CPU share, the proxy allocates a thread per available CPU. This has shown substantial latency improvements in benchmarks, especially when the proxy is serving requests for many concurrent connections.

---

* Add a `multicore` feature flag (linkerd/linkerd2-proxy#611)
* Add `multicore` to default features (linkerd/linkerd2-proxy#612)
* admin: add an endpoint to dump spawned Tokio tasks (linkerd/linkerd2-proxy#595)
* trace: roll `tracing` and `tracing-subscriber` dependencies (linkerd/linkerd2-proxy#615)
* stack: Add NewService::into_make_service (linkerd/linkerd2-proxy#618)
* trace: tweak tracing & test support for the multithreaded runtime (linkerd/linkerd2-proxy#616)
* Make FailFast cloneable (linkerd/linkerd2-proxy#617)
* Move HTTP detection & server into linkerd2_proxy_http (linkerd/linkerd2-proxy#619)
* Mark tap integration tests as flakey (linkerd/linkerd2-proxy#621)
* Introduce a SkipDetect layer to preempt detection (linkerd/linkerd2-proxy#620)



Motivation
When debugging proxy issues, it can be useful to inspect the list of
currently spawned Tokio tasks and their states. This can be used
similarly to the thread or coroutine dumps provided by other languages'
runtimes.
Solution
This branch adds a new endpoint to the proxy's admin server, `/tasks`,
that returns a dump of all tasks currently spawned on the Tokio runtime,
using the new Tracing instrumentation added in tokio-rs/tokio#2655, and
a work-in-progress `tokio-trace` crate that provides Tokio-specific
Tracing layers.
Currently, the `/tasks` admin endpoint records the following information
about each task:
* [...] currently, since Linkerd does not use local or blocking tasks...but
  we might eventually!)
* [...] to be polled)
* [...] it was first polled (essentially, measuring the Tokio scheduler's
  latency)
* [...] polled)
* [...] polled)
In the future, Tokio will likely expose additional Tracing information,
which we'll be able to collect as well.
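As a rough illustration of the timing data described above, here is a minimal sketch of how scheduler latency and time spent being polled could be derived from poll start and end events. The `TaskTimings` type and its method names are hypothetical and are not taken from `tokio-trace` or the proxy; the sketch only assumes that each poll is bracketed by an "enter" and an "exit" notification.

```rust
use std::time::{Duration, Instant};

/// Hypothetical per-task timing record (names are illustrative only).
pub struct TaskTimings {
    spawned_at: Instant,
    first_poll_at: Option<Instant>,
    poll_started_at: Option<Instant>,
    busy: Duration,
}

impl TaskTimings {
    /// Starts tracking a task that was spawned at `spawned_at`.
    pub fn new(spawned_at: Instant) -> Self {
        Self {
            spawned_at,
            first_poll_at: None,
            poll_started_at: None,
            busy: Duration::default(),
        }
    }

    /// Records the start of a poll; the first call also fixes the first-poll time.
    pub fn enter(&mut self, now: Instant) {
        if self.first_poll_at.is_none() {
            self.first_poll_at = Some(now);
        }
        self.poll_started_at = Some(now);
    }

    /// Records the end of a poll and adds its duration to the busy time.
    pub fn exit(&mut self, now: Instant) {
        if let Some(start) = self.poll_started_at.take() {
            self.busy += now.duration_since(start);
        }
    }

    /// Time between spawn and first poll, or `None` if never polled.
    pub fn scheduler_latency(&self) -> Option<Duration> {
        self.first_poll_at.map(|t| t.duration_since(self.spawned_at))
    }

    /// Total time spent inside polls so far.
    pub fn busy(&self) -> Duration {
        self.busy
    }

    /// True while a poll is in progress.
    pub fn is_active(&self) -> bool {
        self.poll_started_at.is_some()
    }
}
```

Passing timestamps in explicitly, rather than calling `Instant::now()` inside the methods, keeps the arithmetic deterministic and easy to check.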
The task dump can be accessed either as an HTML table or as JSON. JSON
is returned if the request has an `Accept: application/json` header, or
whenever the path `/tasks.json` is requested; otherwise, the data is
rendered as an HTML table. Like the `/proxy-log-level` endpoint, access
is denied to requests coming from sources other than localhost, to help
restrict access to authorized users (since a high volume of requests for
task dumps could be used to starve the proxy).
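The format selection and localhost restriction described above can be sketched roughly as follows. This is not the proxy's actual handler: `Format`, `choose_format`, and `is_localhost` are hypothetical names, and the `Accept` parsing is deliberately simplified (it ignores quality values and wildcards).

```rust
use std::net::IpAddr;

/// Output formats for the task dump.
#[derive(Debug, PartialEq, Eq)]
pub enum Format {
    Json,
    Html,
}

/// Picks the response format from the request path and `Accept` header.
/// JSON is chosen for `/tasks.json` or when `application/json` is listed;
/// everything else falls back to HTML.
pub fn choose_format(path: &str, accept: Option<&str>) -> Format {
    let wants_json = accept
        .map(|value| {
            value
                .split(',')
                // Drop parameters such as `;q=0.9` before comparing.
                .map(|part| part.split(';').next().unwrap_or("").trim())
                .any(|media| media.eq_ignore_ascii_case("application/json"))
        })
        .unwrap_or(false);

    if path == "/tasks.json" || wants_json {
        Format::Json
    } else {
        Format::Html
    }
}

/// Returns true only for loopback peers (127.0.0.0/8 or ::1).
pub fn is_localhost(peer: IpAddr) -> bool {
    peer.is_loopback()
}
```

A handler built on this sketch would first reject any peer for which `is_localhost` is false, and only then render the dump in the chosen format.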
Example JSON output (in Firefox Dev Edition's extremely nice GUI
JSON viewer):
Zoomed in on the timing data for a single task:

And HTML:
Because the task data is generated from Tracing spans emitted by Tokio,
the task spans must be enabled for it to be used. This can be done by
setting a trace filter that enables the `trace` level for the target
`tokio::task`.
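To illustrate what enabling the `trace` level for a target means, here is a simplified, self-contained model of a `target=level` filter directive. It is only a sketch: `Level`, `Directive`, and `parse_directive` are hypothetical names, and the prefix-matching rule is an assumption about how such filters typically behave, not a copy of the real `tracing-subscriber` implementation.

```rust
/// Verbosity levels, ordered from least to most verbose.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Level {
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

/// One `target=level` filter directive.
#[derive(Debug, PartialEq, Eq)]
pub struct Directive {
    pub target: String,
    pub level: Level,
}

/// Parses a directive such as `tokio::task=trace`.
/// Returns `None` if there is no `=` or the level is unknown.
pub fn parse_directive(s: &str) -> Option<Directive> {
    let (target, level) = s.rsplit_once('=')?;
    let level = match level.trim().to_ascii_lowercase().as_str() {
        "error" => Level::Error,
        "warn" => Level::Warn,
        "info" => Level::Info,
        "debug" => Level::Debug,
        "trace" => Level::Trace,
        _ => return None,
    };
    Some(Directive {
        target: target.trim().to_string(),
        level,
    })
}

impl Directive {
    /// A span is enabled when its target starts with the directive's target
    /// and its level is no more verbose than the directive allows.
    pub fn enables(&self, target: &str, level: Level) -> bool {
        target.starts_with(self.target.as_str()) && level <= self.level
    }
}
```

Under this model, a directive of `tokio::task=trace` enables trace-level spans whose target is `tokio::task`, while a directive of `tokio::task=info` would leave them disabled.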
Notes
* [...] Tokio change that has merged to master but not been published, and my
  unreleased work-in-progress `tokio-trace` crate. Therefore, I've
  pinned these upstreams to fixed Git SHAs, to guard against
  dependencies changing under us unexpectedly.
* [...] feature, the way we do for the mock SO_ORIG_DST implementation for
  testing. However, this would make it harder to use task tracking to
  debug issues in proxies not built with the flag. I'm happy to change
  this code to be feature flagged if we think that's the right approach.
Closes linkerd/linkerd2#3803
Signed-off-by: Eliza Weisman [email protected]