Skip to content

Conversation

@swagatbora90
Copy link
Contributor

@swagatbora90 swagatbora90 commented Sep 29, 2022

Issue: #6550

Adding Otel spans to CRI + client methods.

This PR adds spans in CRI ImageService apis and pull.go

TODO: Add spans to CRI RuntimeService APIs

Signed-off-by: Swagat Bora [email protected]

  • Image_pull
    Screen Shot 2022-10-12 at 11 56 24 AM

  • image_list

Screen Shot 2022-09-29 at 2 07 30 PM

  • image_status

Screen Shot 2022-09-29 at 2 09 07 PM

  • image_fsinfo

Screen Shot 2022-09-29 at 2 10 11 PM

  • image_remove

Screen Shot 2022-09-29 at 2 11 04 PM

@k8s-ci-robot
Copy link

Hi @swagatbora90. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@swagatbora90 swagatbora90 force-pushed the trace-cri-image branch 3 times, most recently from 38d25e7 to 98d73dc Compare September 30, 2022 19:12
@swagatbora90 swagatbora90 marked this pull request as ready for review September 30, 2022 19:14
@cpuguy83
Copy link
Member

Looks like the go.mod in the integration client needs to be updated.

@cpuguy83
Copy link
Member

Otherwise this seems good.

/ok-to-test

runtimeRunhcsV1 = "io.containerd.runhcs.v1"

// name prefix for CRI sbserver specific spans
criSbServerSpanPrefix = "pkg.cri.sbserver"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding pkg.cri.sbserver vs pkg.cri.server, I'm unsure how much we'd like to expose to users.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to follow a common naming convention for the spans, so used : "relative path to the package"+ "name describing the span logic or the method", so added the name. Any reason we do not want to expose server vs sbserver difference? I was assuming this could help catch divergence issues between the both.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we eventually migrate from pkg.cri.server to pkg.cri.sbserver and don't want to surprise the consumer of the traces.

That being said, I do agree having different names will help users spot the difference in between.

imageRef := r.GetImage().GetImage()
namedRef, err := distribution.ParseDockerRef(imageRef)
if err != nil {
span.RecordError(err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For RecordError calls, how about renaming this function to something like pullImage and calling it like below? Calling RecordError() on all branches seems error-prone.

resp, err := pullImage(...)
if err != nil {
  span.RecordError(err)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I can take a look into creating a thin wrapper

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also concerned about the pattern of span.RecordError(err) in each branch; that's not going to be easy to enforce as we continue to maintain the code.

I'd prefer the wrapper, but another reasonable approach would be to change the signature to use a named return:

func (c *criService) PullImage(ctx context.Context, r *runtime.PullImageRequest) (_ *runtime.PullImageResponse, err error)

and then capture the error in a defer:

defer func(){
	if err != nil {
		span.RecordError(err)
	}
}()

On further reflection: for sbserver and server, can we create the spans inside a gRPC interceptor and stuff them into ctx if we later want to add attributes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @samuelkarp, I already updated this PR to add a wrapper and removed most of the RecordError calls. Please take a look. I am fine changing the current impl to use a defer func with named error, if we strongly feel about it.

for sbserver and server, can we create the spans inside a gRPC interceptor and stuff them into ctx if we later want to add attributes

IIUC we already have a span from the gRPC interceptor which is currently the top level span. This PR adds additional child spans for tracing after the gRPC calls is received. Sure, we can keep using the parent gRPC span as well to add the attributes, but I think ability to trace which packages are being invoked is an added advantage.


// registryHosts is the registry hosts to be used by the resolver.
func (c *criService) registryHosts(ctx context.Context, auth *runtime.AuthConfig, updateClientFn config.UpdateClientFunc) docker.RegistryHosts {
ctx, span := tracing.StartSpan(ctx, makeSpanName("registryHosts"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm hesitant to expose Go's unexported methods to spans. Wouldn't people treat them as a part of the published interface?

Copy link
Contributor Author

@swagatbora90 swagatbora90 Oct 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I agree to that. Isn't the idea of tracing is being able to look into the unexposed methods, say a method that does network calls ,to get more visibility into the network latency?

For this case though, I don't think there is any real value to instrumenting this method so I will go ahead and remove the instrumentation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there can be value for long operations like network, but not for small functions like this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated PR to remove this operation

return nil, fmt.Errorf("can not resolve %q locally: %w", r.GetImage().GetImage(), err)
}

span.SetAttributes(attribute.String("imageID", image.ID))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be different from imageRef?

Copy link
Contributor Author

@swagatbora90 swagatbora90 Oct 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think they are the same, but I might be wrong

Image object has both ID (digest of image config) and References (repo tag, repo digest)

ID: sha256:1123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef
References: []string{
			"docker.io/library/busybox:latest",
			"docker.io/library/busybox@sha256:e6693c20186f837fc393390135d8a598a96a833917917789d63766cab6c59582",
		},

However, In containerd metadata store, I see that we assign Image.ID to metadata ImageRef https://github.com/containerd/containerd/blob/main/pkg/cri/server/container_create.go#L206. So within containerd, imageRef is treated same as image.ID

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. How about using imageRef here then? I'd like to keep attributes consistent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay. I am changing all the attributes to lowercase and dot notation, so this will be "image.ref"

pull.go Outdated
span.RecordError(err)
return nil, fmt.Errorf("unable to resolve snapshotter: %w", err)
}
span.SetAttributes(attribute.String("snapshotterName", snapshotterName))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have snapshotter as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the difference? If no valid snapshotter is provided, the snapshotterName defaults to "overlayfs" so decided to use this value

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sorry, I meant, we have snapshotterName and snapshotter as attributes. If they are semantically same, we should pick one (I prefer snapshotter).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated. I decided to go with "snapshotter.name". also we have "image.name" and "image.ref" so it felt more semantically similar

@swagatbora90 swagatbora90 mentioned this pull request Oct 4, 2022
imageRef := r.GetImage().GetImage()
namedRef, err := distribution.ParseDockerRef(imageRef)
if err != nil {
span.RecordError(err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also concerned about the pattern of span.RecordError(err) in each branch; that's not going to be easy to enforce as we continue to maintain the code.

I'd prefer the wrapper, but another reasonable approach would be to change the signature to use a named return:

func (c *criService) PullImage(ctx context.Context, r *runtime.PullImageRequest) (_ *runtime.PullImageResponse, err error)

and then capture the error in a defer:

defer func(){
	if err != nil {
		span.RecordError(err)
	}
}()

On further reflection: for sbserver and server, can we create the spans inside a gRPC interceptor and stuff them into ctx if we later want to add attributes?

}
log.G(ctx).Debugf("PullImage %q with snapshotter %s", ref, snapshotter)
span.SetAttributes(
attribute.String("imageRef", ref),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just use "ref" as the name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made some changes to the attribute names, based on recommednations in the https://opentelemetry.io/docs/reference/specification/common/attribute-naming/. As per that, I went with image.ref.


// registryHosts is the registry hosts to be used by the resolver.
func (c *criService) registryHosts(ctx context.Context, auth *runtime.AuthConfig, updateClientFn config.UpdateClientFunc) docker.RegistryHosts {
ctx, span := tracing.StartSpan(ctx, makeSpanName("registryHosts"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there can be value for long operations like network, but not for small functions like this.

Comment on lines +151 to 145
_, unpackSpan := tracing.StartSpan(ctx, "pull.UnpackWait")
if ur, err = unpacker.Wait(); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably also want a span within the unpack operation. The pull.UnpackWait span will capture the delay imposed on the overall pull operation, but won't capture the actual time it takes to unpack a layer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated PR to add spans around unpack operations

This is how the new traces will look like
Screen Shot 2022-10-12 at 11 56 24 AM

Also added span attributes for the descriptor mediatype, size and digest, to identify which particular oci_descriptor/layer information

Screen Shot 2022-10-12 at 12 02 50 PM

Screen Shot 2022-10-12 at 12 01 06 PM

Screen Shot 2022-10-12 at 12 01 24 PM

return nil, fmt.Errorf("can not resolve %q locally: %w", r.GetImage().GetImage(), err)
}

span.SetAttributes(attribute.String("image.ref", image.ID))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should pick either image.name or image.ref, assuming both of them would have something like docker.io/library/debian:latest.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wherever I used image.name refers to name of the images docker.io/library/debian:latest

image.ref refers to the image id (which is the digest of the image config) sha256:51086ed63d8cba3a6a3d94ecd103e9638b4cb8533bb896caf2cda04fb79b862f

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the digest same as the image's digest? How about calling it image.digest

We use ref as a way to refer the human-readable image name. We shouldn't use that for something else.

fmt.Fprintln(tw, "REF\tTYPE\tDIGEST\tSIZE\tPLATFORMS\tLABELS\t")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, sure I can change image.name to image.ref

As for ctr

[ec2-user@ip-172-31-62-162 client]$ sudo ctr i ls
REF                             TYPE                                                      DIGEST                                                                  SIZE     PLATFORMS                                                                                               LABELS
docker.io/library/debian:latest application/vnd.docker.distribution.manifest.list.v2+json sha256:e538a2f0566efc44db21503277c7312a142f4d0dedc5d2886932b92626104bff 52.5 MiB linux/386,linux/amd64,linux/arm/v5,linux/arm/v7,linux/arm64/v8,linux/mips64le,linux/ppc64le,linux/s390x -

Digest here refers to the image Index file,

[ec2-user@ip-172-31-62-162 client]$ cat  /var/lib/containerd/io.containerd.content.v1.content/blobs/sha256/e538a2f0566efc44db21503277c7312a142f4d0dedc5d2886932b92626104bff| json_reformat | tail -n 10
            "platform": {
                "architecture": "s390x",
                "os": "linux"
            },
            "size": 529
        }
    ],
    "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
    "schemaVersion": 2
}

this is different from image.id which refers to the image config digest.

The way I see this, ID is used to refer to unique images (platform + name), REF is a generic name for the image that can belong to any platform, DIGEST is used to denote the index file for the image.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, this is what I am using now:
image.ref - name of the image such as docker.io/library/debian:latest
image.id - digest of the image config which is also the unique image identifier

Copy link
Member

@kzys kzys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Still not too sure about the naming convention of spans, but we may figure out while adding more spans.

image.id vs. image.name vs. image.ref is the only blocker for me.

@jterry75
Copy link
Contributor

High level comments. For CRI the original idea here was to use the instrumentedService for entry level spans. We have been doing that for a long time. But I love the otel spans. Because they pass the context the downstream methods can all add to it but dont have to open/end the spans. 2nd, we should really put this on the gRPC hook like we do for containerd as well as ttrpc via the interceptors.

@jterry75
Copy link
Contributor

@kevpar will likely be interested here as well

@swagatbora90
Copy link
Contributor Author

Thanks for the feedback @jterry75.

Because they pass the context the downstream methods can all add to it but don't have to open/end the spans

I think having child spans do add value specially when a sub method is invoked multiple times say a network call or when there are async calls. I am not sure how we could track those using a single span. We are able to add attributes and events to an existing span, but they do not represent the time spent in each of the sub operations (which I believe is what we will be interested in when debugging any issue). Also, thinking in terms of analyzing/filtering, having granular sub spans will provide better ability to filter the operations we are interested in. I might be missing something though, so correct me if I am missing anything here.

2nd, we should really put this on the gRPC hook like we do for containerd as well as ttrpc via the interceptors.

We are already capturing the gRPC call invoked by kubelet(in my example I am using crictl). This PR add additional subspans to that top-level gRPC trace, so we are now able to expand into some of the main methods that actually get once the gRPC call is received

For example

runtime.v1.ImageService/PullImage is already being captured with the gRPC interceptor
Screen Shot 2022-09-29 at 2 06 14 PM

@jterry75
Copy link
Contributor

Oh I see the misunderstanding here. You are adding "sub spans" to the cri image service. :). I thought you were thinking these were the parent spans and I was trying to say that we already open them on the interceptor via gRPC. What that means is that in the entrylevel context PullImage for example in the instrumentedService we can grab the span from the context and attribute it with the deserialized values we care about such as container id etc. Since this already represents the entry context I dont think we need to duplicate it in the actual PullImage function unless that gives us some value I dont understand.

As for your other question about child spans. My general philosophy (just my opinion) is that all Entry and Exit's should be covered by spans and they should flow context, at a minimum this is the span/trace id's. So what this means is that when PullImage is called we start a span. When PullImage calls into the containerd core runtime for the content store, the resolver, the layer store, the snapshotter, etc these should ALL have spans! This is the granular level that you want. Now should each of those components have spans on each private function? Likely not, but eventually we call into the shims via ttrpc and yet another span, which eventually makes some syscalls. These syscalls IMO should be surrounded by a span, so that we can measure the effect of the OS in the entire call chain. Etc. So yes, Span On Entry, Span On Exit, maybe, when needed, span in the middle.

Let me review your code again here now that I better understand the intent.

ctx, span := tracing.StartSpan(ctx, makeSpanName("PullImage"))
defer span.End()
response, err := c.pullImage(ctx, r)
if err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/microsoft/hcsshim/blob/main/cmd/containerd-shim-runhcs-v1/service.go#L100

Here is an example of how we did this in hcsshim. Basically we set the error state based on a defer rather than needing to wrap all methods. That way any call that returns the error from func will apply.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comments @jterry75. I will implement some of the changes based on your recommendations:

  1. Create the top level span at the InstrumentedService layer. Use the same span in the CRI layer and create new spans. I will add child spans when we reach containerd client side methods[same as today].
  2. Use the defer approach to catch errors during exit at the InstrumentedService layer.

Also, need a clarification regardingspan.RecordError(err). RecordError() does not set the status of the Span to error. it is simply recording the error in the span similar to how we add attributes.

From otel doc:

It is highly recommended that you also set a span’s status to Error when using RecordError, unless you do not wish to consider the span tracking a failed operation as an error span. The RecordError function does not automatically set a span status when called.

Currently, I am considering the span to be still valid even though we might encounter a failed operation. Is there any real benefit of marking the span as an error span when we encounter an error in the code?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh what an odd change in otel. Wow I guess I need to freshen up. IMO yes. The span status will be wrong if we dont set it to error when there was an api failure. So yes a human can read the logs and sorta make sense but anything scraping the output for alarms/errors/etc will not because it will not see the status as a failure. We dont want to have to add special cases for every api like "output was X on PullImage when status was SUCCESS == an error". That is just a poor experience.

I do get the case for recording an error without the span failing I guess. If for example we retried an operation internally but failed on the first attempt etc. But if the overall operation fails, I feel like we must set the span status to error or else its misleading right?

return nil, err
}
log.G(ctx).Debugf("PullImage %q with snapshotter %s", ref, snapshotter)
span.SetAttributes(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love love love to set attributes on existing spans!!!!!!

image, err := c.client.Pull(pctx, ref, pullOpts...)
pcancel()
if err != nil {
span.RecordError(err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again if we defer this it would hit any branch.

// Remove the whole image no matter the it's image id or reference. This is the
// semantic defined in CRI now.
func (c *criService) RemoveImage(ctx context.Context, r *runtime.RemoveImageRequest) (*runtime.RemoveImageResponse, error) {
ctx, span := tracing.StartSpan(ctx, makeSpanName("RemoveImage"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do this again in the instrumented service rather than duplicating functions here?

defer span.End()
image, err := c.localResolve(r.GetImage().GetImage())
if err != nil {
span.RecordError(err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good example of why defer matters. We actually arent going to error from the api when it IsNotFound so we shouldnt process it then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was actually looking at it a little differently. If the image reference does not exist, client would only see an empty image status response and no error. Looking at the trace can provide information about the underlying error which was encountered. This will be true even if I use defer, sicne only returned errors will be caught.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is new for otel so yes my expectation here was off. It sorta feels odd to record this as an error though given that its "expected" from the api perspective. Its not like the retry example I gave above that could be valid for recording an error case.

// TODO(windows): Usage for windows is always 0 right now. Support this for windows.
func (c *criService) ImageFsInfo(ctx context.Context, r *runtime.ImageFsInfoRequest) (*runtime.ImageFsInfoResponse, error) {
_, span := tracing.StartSpan(ctx, makeSpanName("ImageFsInfo"))
defer span.End()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No error state?

Copy link
Contributor

@jterry75 jterry75 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its a really good start! Thank you so much. I left a few comments that will change the model a bit so didnt leave them everywhere but I do think they are important to address.

@swagatbora90 swagatbora90 force-pushed the trace-cri-image branch 2 times, most recently from b0c17a5 to 570cb1b Compare October 24, 2022 20:46
if err := in.checkInitialized(); err != nil {
return nil, err
}
ctx, span := tracing.StartSpan(ctx, makeSpanName("PullImage"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice to see we can use instrumentedService.

@kzys
Copy link
Member

kzys commented Oct 27, 2022

Looks almost good to me. Can you clean up the commits a bit? We don't have to keep try-and-error commits like 88bba62.

@swagatbora90
Copy link
Contributor Author

Looks almost good to me. Can you clean up the commits a bit? We don't have to keep try-and-error commits like 88bba62.

Sure, let me rebase and squash the commits

Copy link
Contributor

@jterry75 jterry75 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! This is awesome! HUGE improvement

Copy link
Member

@estesp estesp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

if err := in.checkInitialized(); err != nil {
return nil, err
}
ctx, span := tracing.StartSpan(ctx, makeSpanName("PullImage"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make this strings.Join a part of tracing package API?

For instance, StartSpan could accept arbitrary number of strings, like this:
tracing.StartSpan(ctx, criSbServerSpanPrefix, "PullImage")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can but does it bring any value? it will be simpler if the instrumentation side makes the decision on how to create the span name( add prefix, suffix or use any custom logic)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add prefix, suffix or use any custom logic

Does it make sense to be consistent everywhere?
Like we'd specify tracing.StartSpan(ctx, "a", "b", "c") and it'll apply same separator across the project.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I see your point. I am planning to change the current span methods once I add a wrapper object, so I will add this change in the follow up PR.

return nil, err
}
ctx, span := tracing.StartSpan(ctx, makeSpanName("ListImages"))
defer tracing.StopSpan(span)
Copy link
Member

@mxpv mxpv Nov 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

defer span.End() or defer span.Stop() or defer span.Close() feels more natural (we'd probably have to wrap span object).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, that is a good point. I can make the change and wrap trace.Span object. As for context propagation, I am thinking I will keep the existing logic. I can modify StartSpan and CurrentSpan to return me the wrapper Span, but keep span context propagation as is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer a wrapper object, but I don't want to block this PR. That's fine to keep it as is and/or follow up later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me. I will send a follow up PR adding the wrapper object and restructure some of the current span methods.

@samuelkarp samuelkarp added the area/cri Container Runtime Interface (CRI) label Nov 2, 2022
Signed-off-by: Swagat Bora <[email protected]>

Add spans around image unpack operations
Use image.ref to denote image name and image.id for the image config digest
Add top-level spand and record errors in the CRI instrumentation service
Copy link
Member

@mxpv mxpv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be good to get in.
We have a couple of things to follow up, which are not blocker for this PR.

LGTM on green.

Signed-off-by: Swagat Bora <[email protected]>
@mxpv
Copy link
Member

mxpv commented Nov 3, 2022

/test pull-containerd-sandboxed-node-e2e

@mxpv mxpv merged commit b2a01ee into containerd:main Nov 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/cri Container Runtime Interface (CRI) ok-to-test

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants