[otel-tracing] Initial opentelemetry support #5570
alakesh wants to merge 3 commits into containerd:main
Conversation
Can we make that build-time optional, please?
It would be better to explain the reason in the comment. See https://golangci-lint.run/usage/false-positives/ for the format.
It complains about github.com/golang/protobuf/proto being deprecated and suggests using protobuf from Google. Will add a comment. Thanks.
Do we really need to support stdout? containerd's stdout is currently used for logging as like all systemd-native daemons.
stdout is very helpful for debugging, once opentelemetry support is stable we can remove it if needed.
It doesn't seem like we need to expose an exporter name at this point.
I don't see why people would need something other than OTLP, but if they do we can address that later if it comes up (e.g. maybe a plugin interface for exporters).
As such we can remove all of this it seems and turn on tracing based on ExporterEndpoint being set or not.
Is it okay to shutdown the exporter at the end of this function?
No, this function should probably return this closure instead of deferring it so that it can be called by the main package when the application is ready to shutdown. It should probably also be calling Shutdown() on the BatchSpanProcessor created below, since that may be holding on to batched spans that it will flush on shutdown. The BSP will take care of shutting down the exporter it wraps.
@Aneurysm9 @rakyll Do you think it is better to support both OTLP/gRPC and OTLP/HTTP?
would be nice to know what protocols are being enabled.. what the options are..
Do you mean making opentelemetry support a build-time option? Not sure how that would work once we have tracing added in various places.
cpuguy83 left a comment:
I feel like we should think about this in terms of the client wanting to trace where cri is one of those clients and as such needs to expose things like trace exporters in the cri config.
These seem like debug messages.
feels like it's at the wrong scope and log level.. suggest moving into initopentelemetry not sure if you need to log even debug If the config isn't on.. you'll have that from info..
We'll need to expose this on the config.
mikebrow left a comment:
probably need some docs or links somewhere to explain how this works, how to set it up.. what it's for...
is there a test plan? interlock with k8s?
tracing is expensive (obvious statement :-) is this part of a plan to add end-to-end tracing by service request?
SGTM for 1.22 k8s vendor updates .. can we split this into two prs one for the vendor updates?
reference/link.. / explanation?
what are we trying to trace.. containerd daemon and plugins, tasks, image services?
Trivial: add some tiny wrappers that are no-op when building without it.
```go
otel.SetTextMapPropagator(propagation.TraceContext{})
```
The later call adding a composite with baggage and tracecontext propagators will replace this, so no need for this invocation.
tp.Shutdown should be returned rather than invoked on exit here. This function sets up an SDK and immediately tears it down. Returning the shutdown function allows the caller to control the lifecycle of the SDK.
The OpenTelemetry API provides a default, no-op implementation that is used when no SDK is configured.
Still a huge increase in binary size. containerd alone is increased by over 2 MB. Note that some people use it in constrained environments (e.g. embedded), where code size and the amount of code to audit really matter.
That increase likely comes from protobuf, supporting the OTLP exporter. If you want to make anything a build-time option, that would be it. The OTel API module does not include any dependencies not already included in this module.

Attempting to create a wrapper API around the OTel API for instrumentation is only likely to lead to confusion for developers attempting to instrument the application, and is likely to be rendered ineffective by the inclusion of an instrumented library at some point in the future. We've worked very hard to make the OTel API low-overhead so that it can be included in as many libraries as possible while staying out of the way of applications that have not configured an SDK.
Signed-off-by: Alakesh Haloi <[email protected]>
Signed-off-by: Alakesh Haloi <[email protected]>
Add basic initialization of opentelemetry, including minimum support to read the open telemetry config from config.toml and initialize the exporter. The tracer is initialized and ready to be used for creating spans, sub-spans, etc. With no opentelemetry configuration enabled in the config file, this patch is a no-op.

A basic config stub to use opentelemetry is to add the following to config.toml. We use the otlp exporter with default port 4317; the otel-collector binary needs to be running and listening at the same port.

```toml
[otel]
  exporter_name = "otlp"
  exporter_endpoint = "0.0.0.1:4317"
```

Signed-off-by: Alakesh Haloi <[email protected]>
```go
// Get tracer
ctrdTracer := otel.Tracer("containerd")
ctx, mainCtrdSpan := ctrdTracer.Start(gocontext.Background(), "containerd-exporter")
defer mainCtrdSpan.End()
```
@Aneurysm9 @alakesh Is it common to have the "main" span?
It probably depends largely on your program's usage pattern. If you expect a main() invocation to be quick and represent a single unit of work, then it might make sense to have one. On the other hand, if your program is long-lived then the value of having such a span decreases and you'll want to ensure that spans representing individual units of work are linked to the main span but create their own traces.
```go
// LoadPlugins loads all plugins into containerd and generates an ordered graph
// of all plugins.
func LoadPlugins(ctx context.Context, config *srvconfig.Config) ([]*plugin.Registration, error) {

```
Better to remove the empty line :)
```go
// Currently we only support OTLP as exporter
if otelCfg.ExporterName != "otlp" {
	return nil, errors.Wrapf(err, "Unsupported exporter %s", otelCfg.ExporterName)
}
```
```go
otlptracegrpc.WithInsecure(),
otlptracegrpc.WithEndpoint("localhost:4317"),
```
- Can we hardcode localhost:4317 here?
- For remote locations, shouldn't we support TLS or some other encrypted transports? (cc @Aneurysm9)
```go
// process to aggregate spans before export.
ctrdBatchSpanProcessor := sdktrace.NewBatchSpanProcessor(ctrdTraceExporter)
ctrdTracerProvider := sdktrace.NewTracerProvider(
	sdktrace.WithSampler(sdktrace.TraceIDRatioBased(0.5)),
```
The sampling rate shouldn't be hard-coded.
```go
// Shutdown will flush any remaining spans and shut down the exporter
err := ctrdTracerProvider.Shutdown(ctx)
if err != nil {
	logrus.Fatalf("failed to shutdown TracerProvider: %v", err)
```
Wouldn't it kill containerd process? Should it be?
@alakesh why did you close/remove this PR? (is there a replacement for this one?)
Thanks, yes, I saw the other PR that was opened after I posted.
Add basic initialization of opentelemetry, including minimum support to read the open telemetry config from config.toml and initialize the exporter. The tracer is initialized and ready to be used for creating spans, sub-spans, etc. With no opentelemetry configuration enabled in the config file, this patch is a no-op.

Signed-off-by: Alakesh Haloi [email protected]
Example config changes to use/test this initialization code:

```toml
[otel]
  exporter_name = "otlp"
  exporter_endpoint = "0.0.0.1:4317"
```

or

```toml
[otel]
  exporter_name = "stdout"
```

Tested without any config change, and containerd starts as usual.
Note: Opentelemetry does not seem to be compatible with go 1.13, hence the failures seen in the 1.13 build.