Skip to content

[otel-tracing] Initial opentelemetry support#5570

Closed
alakesh wants to merge 3 commits intocontainerd:mainfrom
alakesh:opentelemetry-basic
Closed

[otel-tracing] Initial opentelemetry support#5570
alakesh wants to merge 3 commits intocontainerd:mainfrom
alakesh:opentelemetry-basic

Conversation

@alakesh
Copy link
Copy Markdown
Contributor

@alakesh alakesh commented Jun 4, 2021

Add basic intiialization of opentelemetry including minimum support to
be able to read open telemetry config from config.toml and initialize
exporter. Tracer is initialized and ready to be be used for creating
spans, sub spans etc. With no opentelemetry configuration enabled in
config file, this patch is a no-op.

Signed-off-by: Alakesh Haloi [email protected]

Example config changes to use/test this initialization code

[otel]
exporter_name = "otlp"
exporter_endpoint = "0.0.0.1:4317"

or

[otel]
exporter_name = "stdout"

Tested without any config change and containerd starts as usual.

Note: Opentelemetry does not seem to be compatible with go 1.13 and hence failures seen in 1.13 build.

@k8s-ci-robot
Copy link
Copy Markdown

Hi @alakesh. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@alakesh alakesh force-pushed the opentelemetry-basic branch from 8c43b04 to 57e4816 Compare June 4, 2021 18:29
@theopenlab-ci
Copy link
Copy Markdown

theopenlab-ci Bot commented Jun 4, 2021

Build succeeded.

@alakesh alakesh force-pushed the opentelemetry-basic branch from 57e4816 to 4dee0fe Compare June 4, 2021 23:29
@theopenlab-ci
Copy link
Copy Markdown

theopenlab-ci Bot commented Jun 4, 2021

Build succeeded.

@alakesh alakesh force-pushed the opentelemetry-basic branch from 4dee0fe to b962579 Compare June 7, 2021 23:50
@theopenlab-ci
Copy link
Copy Markdown

theopenlab-ci Bot commented Jun 7, 2021

Build succeeded.

@alakesh alakesh force-pushed the opentelemetry-basic branch from b962579 to c21bfb4 Compare June 8, 2021 00:48
@theopenlab-ci
Copy link
Copy Markdown

theopenlab-ci Bot commented Jun 8, 2021

Build succeeded.

@alakesh alakesh force-pushed the opentelemetry-basic branch from c21bfb4 to 96bd4f5 Compare June 8, 2021 01:16
@theopenlab-ci
Copy link
Copy Markdown

theopenlab-ci Bot commented Jun 8, 2021

Build succeeded.

@alakesh alakesh force-pushed the opentelemetry-basic branch from 96bd4f5 to d059186 Compare June 8, 2021 01:22
@theopenlab-ci
Copy link
Copy Markdown

theopenlab-ci Bot commented Jun 8, 2021

Build succeeded.

@alakesh alakesh force-pushed the opentelemetry-basic branch from d059186 to 1c98ea0 Compare June 8, 2021 06:11
@theopenlab-ci
Copy link
Copy Markdown

theopenlab-ci Bot commented Jun 8, 2021

Build succeeded.

@metux
Copy link
Copy Markdown
Contributor

metux commented Jun 8, 2021

Can we make that build-time optional, please ?

@alakesh alakesh force-pushed the opentelemetry-basic branch from 1c98ea0 to b2bb672 Compare June 8, 2021 17:44
@theopenlab-ci
Copy link
Copy Markdown

theopenlab-ci Bot commented Jun 8, 2021

Build succeeded.

@alakesh alakesh force-pushed the opentelemetry-basic branch from b2bb672 to 12ad573 Compare June 8, 2021 21:11
@theopenlab-ci
Copy link
Copy Markdown

theopenlab-ci Bot commented Jun 8, 2021

Build succeeded.

@alakesh alakesh force-pushed the opentelemetry-basic branch from 12ad573 to 4367996 Compare June 8, 2021 21:25
@theopenlab-ci
Copy link
Copy Markdown

theopenlab-ci Bot commented Jun 8, 2021

Build succeeded.

@alakesh alakesh force-pushed the opentelemetry-basic branch from 4367996 to 5b0d540 Compare June 8, 2021 21:31
@theopenlab-ci
Copy link
Copy Markdown

theopenlab-ci Bot commented Jun 8, 2021

Build succeeded.

@alakesh alakesh force-pushed the opentelemetry-basic branch from 5b0d540 to 8e06148 Compare June 8, 2021 22:07
@theopenlab-ci
Copy link
Copy Markdown

theopenlab-ci Bot commented Jun 8, 2021

Build succeeded.

@alakesh alakesh force-pushed the opentelemetry-basic branch from 8e06148 to 11901ec Compare June 8, 2021 23:13
@theopenlab-ci
Copy link
Copy Markdown

theopenlab-ci Bot commented Jun 8, 2021

Build succeeded.

@alakesh alakesh marked this pull request as ready for review June 8, 2021 23:50
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to explain the reason in the comment. See https://golangci-lint.run/usage/false-positives/ for the format.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It complains about github.com/golang/protobu/proto being deprecated and suggests using protobuf from google. Will add a comment. Thanks

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest using TODO format..

Comment thread services/server/config/config.go Outdated
Comment on lines 223 to 224
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to support stdout? containerd's stdout is currently used for logging as like all systemd-native daemons.

Copy link
Copy Markdown
Contributor Author

@alakesh alakesh Jun 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stdout is very helpful for debugging, once opentelemetry support is stable we can remove it if needed.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem like we need to expose an exporter name at this point.
I don't see why people would need something other than OTLP, but if they do we can address that later if it comes up (e.g. maybe a plugin interface for exporters).

As such we can remove all of this it seems and turn on tracing based on ExporterEndpoint being set or not.

Comment thread tracing/tracing.go Outdated
Comment on lines 52 to 57
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it okay to shutdown the exporter at the end of this function?

Copy link
Copy Markdown

@Aneurysm9 Aneurysm9 Jun 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this function should probably return this closure instead of deferring it so that it can be called by the main package when the application is ready to shutdown. It should probably also be calling Shutdown() on the BatchSpanProcessor created below, since that may be holding on to batched spans that it will flush on shutdown. The BSP will take care of shutting down the exporter it wraps.

Comment thread tracing/exporter.go Outdated
Comment on lines 50 to 53
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Aneurysm9 @rakyll Do you think it is better to support both OLTP/gRPC and OLTP/HTTP?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be nice to know what protocols are being enabled.. what the options are..

@alakesh
Copy link
Copy Markdown
Contributor Author

alakesh commented Jun 9, 2021

Can we make that build-time optional, please ?

Do you mean making opentelemetry support to be build-time option? Not sure how it would work when we will have trace added at various places.

Copy link
Copy Markdown
Member

@cpuguy83 cpuguy83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like we should think about this in terms of the client wanting to trace where cri is one of those clients and as such needs to expose things like trace exporters in the cri config.

Comment thread cmd/containerd/command/main.go Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These seem like debug messages.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feels like it's at the wrong scope and log level.. suggest moving into initopentelemetry not sure if you need to log even debug If the config isn't on.. you'll have that from info..

Comment thread services/server/config/config.go Outdated
Comment on lines 223 to 224
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem like we need to expose an exporter name at this point.
I don't see why people would need something other than OTLP, but if they do we can address that later if it comes up (e.g. maybe a plugin interface for exporters).

As such we can remove all of this it seems and turn on tracing based on ExporterEndpoint being set or not.

Comment thread tracing/tracing.go Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll need to expose this on the config.

Copy link
Copy Markdown
Member

@mikebrow mikebrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably need some docs or links somewhere to explain how this works, how to set it up.. what it's for...

is there a a test plan? interlock with k8s?

tracing is expensive, (obvious statement :-) is this a part of a plan to add end to end tracing by service request?

Comment thread go.mod Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM for 1.22 k8s vendor updates .. can we split this into two prs one for the vendor updates?

Comment thread tracing/tracing.go Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/ups/up/

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest using TODO format..

Comment thread cmd/containerd/command/main.go Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same..

Comment thread cmd/containerd/command/main.go Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feels like it's at the wrong scope and log level.. suggest moving into initopentelemetry not sure if you need to log even debug If the config isn't on.. you'll have that from info..

Comment thread services/server/config/config.go Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reference/link.. / explanation?

Comment thread tracing/tracing.go Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are we trying to trace.. containerd daemon and plugins, tasks, image services?

Comment thread tracing/exporter.go Outdated
Comment on lines 50 to 53
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be nice to know what protocols are being enabled.. what the options are..

@metux
Copy link
Copy Markdown
Contributor

metux commented Jun 10, 2021

Can we make that build-time optional, please ?

Do you mean making opentelemetry support to be build-time option? Not sure how it would work when we will have trace added at various places.

Trivial: add some tiny wrappers that are no-op when building w/o it.

Comment thread tracing/tracing.go Outdated
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
otel.SetTextMapPropagator(propagation.TraceContext{})

The later call adding a composite with baggage and tracecontext propagators will replace this, so no need for this invocation.

Comment thread tracing/tracing.go Outdated
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tp.Shutdown should be returned rather than invoked on exit here. This function sets up an SDK and immediately tears it down. Returning the shutdown function allows the caller to control the lifecycle of the SDK.

@Aneurysm9
Copy link
Copy Markdown

Can we make that build-time optional, please ?

Do you mean making opentelemetry support to be build-time option? Not sure how it would work when we will have trace added at various places.

Trivial: add some tiny wrappers that are no-op when building w/o it.

The OpenTelemetry API provides a default, no-op implementation that is used when no SDK is configured. To the extent that InitOpenTelemetry doesn't register a TracerProvider if there is no exporter configured then the no-op implementation of the API will be used by any instrumentation.

@metux
Copy link
Copy Markdown
Contributor

metux commented Jun 10, 2021

Can we make that build-time optional, please ?

Do you mean making opentelemetry support to be build-time option? Not sure how it would work when we will have trace added at various places.

Trivial: add some tiny wrappers that are no-op when building w/o it.

The OpenTelemetry API provides a default, no-op implementation that is used when no SDK is configured. To the extent that InitOpenTelemetry doesn't register a TracerProvider if there is no exporter configured then the no-op implementation of the API will be used by any instrumentation.

Still a huge increase in binary size. containerd alone is increased by over 2 MB,
all together ca. 10% bigger.

Note that some people use it in constrained environments (eg. embedded), where code size and amount of code to audit really matters.

@Aneurysm9
Copy link
Copy Markdown

Can we make that build-time optional, please ?

Do you mean making opentelemetry support to be build-time option? Not sure how it would work when we will have trace added at various places.

Trivial: add some tiny wrappers that are no-op when building w/o it.

The OpenTelemetry API provides a default, no-op implementation that is used when no SDK is configured. To the extent that InitOpenTelemetry doesn't register a TracerProvider if there is no exporter configured then the no-op implementation of the API will be used by any instrumentation.

Still a huge increase in binary size. containerd alone is increased by over 2 MB,
all together ca. 10% bigger.

Note that some people use it in constrained environments (eg. embedded), where code size and amount of code to audit really matters.

That increase likely comes from protobuf, supporting the OTLP exporter. If you want to make anything a build-time option that would be it. The OTel API module does not include any dependencies not already included in this module.

Attempting to create a wrapper API around the OTel API for instrumentation is only likely to lead to confusion for developers attempting to instrument the application and is likely to be rendered ineffective by the inclusion of an instrumented library at some point in the future. We've worked very hard to make the OTel API low-overhead so that it can be included in as many libraries as possible while staying out of the way of applications that have not configured an SDK.

@kzys kzys added the status/needs-update Awaiting contributor update label Jul 2, 2021
@alakesh alakesh force-pushed the opentelemetry-basic branch from 11901ec to 1188259 Compare July 12, 2021 22:32
@theopenlab-ci
Copy link
Copy Markdown

theopenlab-ci Bot commented Jul 12, 2021

Build succeeded.

@alakesh alakesh force-pushed the opentelemetry-basic branch from 1188259 to 3802531 Compare July 13, 2021 16:22
@theopenlab-ci
Copy link
Copy Markdown

theopenlab-ci Bot commented Jul 13, 2021

Build succeeded.

alakesh added 3 commits July 13, 2021 09:53
Add basic intiialization of opentelemetry including minimum support to
be able to read open telemetry config from config.toml and initialize
exporter. Tracer is initialized and ready to be be used for creating
spans, sub spans etc. With no opentelemetry configuration enabled in
config file, this patch is a no-op.

Basic config stub to be added to use opentelemetry is to add following
in config.toml. We use otlp exporter with default port 4317.

[otel]
  exporter_name = "otlp"
  exporter_endpoint = "0.0.0.1:4317"

otel-collector binary needs to run listening at the same port.

Signed-off-by: Alakesh Haloi <[email protected]>
@alakesh alakesh force-pushed the opentelemetry-basic branch from 3802531 to cd561d9 Compare July 13, 2021 16:55
@theopenlab-ci
Copy link
Copy Markdown

theopenlab-ci Bot commented Jul 13, 2021

Build succeeded.

Comment on lines +143 to +146
// Get tracer
ctrdTracer := otel.Tracer("containerd")
ctx, mainCtrdSpan := ctrdTracer.Start(gocontext.Background(), "containerd-expoerter")
defer mainCtrdSpan.End()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Aneurysm9 @alakesh Is it common to have the "main" span?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It probably depends largely on your program's usage pattern. If you expect a main() invocation to be quick and represent a single unit of work, then it might make sense to have one. On the other hand, if your program is long-lived then the value of having such a span decreases and you'll want to ensure that spans representing individual units of work are linked to the main span but create their own traces.

Comment thread services/server/server.go
// LoadPlugins loads all plugins into containerd and generates an ordered graph
// of all plugins.
func LoadPlugins(ctx context.Context, config *srvconfig.Config) ([]*plugin.Registration, error) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to remove the empty line :)

Comment thread tracing/tracing.go
Comment on lines +54 to +57
// Currently we only support OTLP as exporter
if otelCfg.ExporterName != "otlp" {
return nil, errors.Wrapf(err, "Unsupported exporter %s", otelCfg.ExporterName)
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't it the Validate() does?

Comment thread tracing/tracing.go
Comment on lines +73 to +74
otlptracegrpc.WithInsecure(),
otlptracegrpc.WithEndpoint("localhost:4317"),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Can we hardcode localhost:4317 here?
  • For remote locations, shouldn't we support TLS or some other encrypted transports? (cc @Aneurysm9)

Comment thread tracing/tracing.go
// process to aggregate spans before export.
ctrdBatchSpanProcessor := sdktrace.NewBatchSpanProcessor(ctrdTraceExporter)
ctrdTracerProvider := sdktrace.NewTracerProvider(
sdktrace.WithSampler(sdktrace.TraceIDRatioBased(0.5)),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sampling rate shouldn't be hard-coded.

Comment thread tracing/tracing.go
// Shutdown will flush any remaining spans and shut down the exporter
err := ctrdTracerProvider.Shutdown(ctx)
if err != nil {
logrus.Fatalf("failed to shutdown TracerProvider: %v", err)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it kill containerd process? Should it be?

@alakesh alakesh closed this Jul 13, 2021
@alakesh alakesh deleted the opentelemetry-basic branch July 13, 2021 19:04
@thaJeztah
Copy link
Copy Markdown
Member

@alakesh why did you close/remove this PR? (is there a replacement for this one?)

@alakesh
Copy link
Copy Markdown
Contributor Author

alakesh commented Jul 23, 2021

@alakesh why did you close/remove this PR? (is there a replacement for this one?)
@thaJeztah It was a mistake, i accidentally deleted my forked branch and since the PR was linked, it got closed. I did not have a way to revive it other than creating a duplicate one. I have tried to take care of the comments in the latest one
#5731

@thaJeztah
Copy link
Copy Markdown
Member

Thanks, yes I saw the other PR was opened after I posted

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs-ok-to-test status/needs-update Awaiting contributor update

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants