
danlenar (Contributor) commented Feb 17, 2023

Fixes a panic in containerd that occurs when CRI metrics are enabled.

The panic was observed with containerd 1.6.18 and Kubernetes 1.23.16.
We started seeing it after enabling the PodAndContainerStatsFromCRI feature gate on the Kubernetes side.

fatal error: concurrent map read and map write
goroutine 24245 [running]:
runtime.throw({0x55555c169112?, 0xc003f0b340?})
        /usr/local/go/src/runtime/panic.go:992 +0x71 fp=0xc00575c6c8 sp=0xc00575c698 pc=0x55555aed5511
runtime.mapaccess1(0x55555c7717e0?, 0xc00145ee10?, 0x40?)
        /usr/local/go/src/runtime/map.go:415 +0x1f7 fp=0xc00575c708 sp=0xc00575c6c8 pc=0x55555aeaccd7
github.com/containerd/containerd/pkg/cri/store/sandbox.(*Store).UpdateContainerStats(0xc00145eea0, {0xc001d4d140?, 0xc00179a240?}, 0xc003f28a00)
        /root/rpmbuild/BUILD/pkg/cri/store/sandbox/sandbox.go:142 +0x1d9 fp=0xc00575c850 sp=0xc00575c708 pc=0x55555bdeff59
github.com/containerd/containerd/pkg/cri/server.(*criService).getUsageNanoCores(0xc00049d800, {0xc001d4d140, 0x40}, 0x1, 0x144f0a7e, {0x203001?, 0x203001?, 0x55555d63bc20?})
        /root/rpmbuild/BUILD/pkg/cri/server/container_stats_list_linux.go:134 +0x465 fp=0xc00575cce0 sp=0xc00575c850 pc=0x55555bfdfc45
github.com/containerd/containerd/pkg/cri/server.(*criService).cpuContainerStats(0xc00575ce78?, {0xc001d4d140, 0x40}, 0xc5?, {0x55555c8ff4e0?, 0xc0058ccab0?}, {0x1000000002000?, 0x7f60b05d08b0?, 0x55555d6>
        /root/rpmbuild/BUILD/pkg/cri/server/container_stats_list_linux.go:205 +0x30f fp=0xc00575ce00 sp=0xc00575cce0 pc=0x55555bfe048f
github.com/containerd/containerd/pkg/cri/server.(*criService).podSandboxStats(0xc00049d800, {0x55555c9d7d58, 0xc00512be30}, {{{0xc001d4d140, 0x40}, {0xc00179a240, 0x5c}, 0xc000236e70, {0xc001d4d1c0, 0x37>
        /root/rpmbuild/BUILD/pkg/cri/server/sandbox_stats_linux.go:60 +0x2be fp=0xc00575d1b8 sp=0xc00575ce00 pc=0x55555c0267be
github.com/containerd/containerd/pkg/cri/server.(*criService).ListPodSandboxStats(0x55555c9d7d58?, {0x55555c9d7d58, 0xc00512be30}, 0x6?)
        /root/rpmbuild/BUILD/pkg/cri/server/sandbox_stats_list.go:41 +0x156 fp=0xc00575d3d8 sp=0xc00575d1b8 pc=0x55555c027b36
github.com/containerd/containerd/pkg/cri/server.(*instrumentedService).ListPodSandboxStats(0xc00044e890, {0x55555c9d7d58, 0xc00512bc80}, 0xc005905b10)
        /root/rpmbuild/BUILD/pkg/cri/server/instrumented_service.go:1319 +0x1c6 fp=0xc00575d480 sp=0xc00575d3d8 pc=0x55555c0099a6
k8s.io/cri-api/pkg/apis/runtime/v1._RuntimeService_ListPodSandboxStats_Handler.func1({0x55555c9d7d58, 0xc00512bc80}, {0x55555c919020?, 0xc005905b10})
        /root/rpmbuild/BUILD/vendor/k8s.io/cri-api/pkg/apis/runtime/v1/api.pb.go:9862 +0x78 fp=0xc00575d4c0 sp=0xc00575d480 pc=0x55555bbc1298
github.com/containerd/containerd/services/server.unaryNamespaceInterceptor({0x55555c9d7d58, 0xc00512bc80}, {0x55555c919020, 0xc005905b10}, 0x3?, 0xc001b72330)
        /root/rpmbuild/BUILD/services/server/namespace.go:31 +0x6b fp=0xc00575d4f0 sp=0xc00575d4c0 pc=0x55555c0ea04b
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1({0x55555c9d7d58?, 0xc00512bc80?}, {0x55555c919020?, 0xc005905b10?})
        /root/rpmbuild/BUILD/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25 +0x3a fp=0xc00575d530 sp=0xc00575d4f0 pc=0x55555c0e0d1a
github.com/grpc-ecosystem/go-grpc-prometheus.(*ServerMetrics).UnaryServerInterceptor.func1({0x55555c9d7d58, 0xc00512bc80}, {0x55555c919020, 0xc005905b10}, 0x0?, 0xc0008e5a60)
        /root/rpmbuild/BUILD/vendor/github.com/grpc-ecosystem/go-grpc-prometheus/server_metrics.go:107 +0x87 fp=0xc00575d590 sp=0xc00575d530 pc=0x55555c0e3867
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1({0x55555c9d7d58?, 0xc00512bc80?}, {0x55555c919020?, 0xc005905b10?})
        /root/rpmbuild/BUILD/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25 +0x3a fp=0xc00575d5d0 sp=0xc00575d590 pc=0x55555c0e0d1a
go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc.UnaryServerInterceptor.func1({0x55555c9d7d58, 0xc00512ba40}, {0x55555c919020, 0xc005905b10}, 0xc0008e5a00, 0xc0008e5a80)
        /root/rpmbuild/BUILD/vendor/go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc/interceptor.go:325 +0x664 fp=0xc00575da10 sp=0xc00575d5d0 pc=0x55555c0e7484
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1({0x55555c9d7d58?, 0xc00512ba40?}, {0x55555c919020?, 0xc005905b10?})
        /root/rpmbuild/BUILD/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25 +0x3a fp=0xc00575da50 sp=0xc00575da10 pc=0x55555c0e0d1a
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1({0x55555c9d7d58, 0xc00512ba40}, {0x55555c919020, 0xc005905b10}, 0xc0024e9af0?, 0x55555c789220?)
        /root/rpmbuild/BUILD/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:34 +0xbf fp=0xc00575daa8 sp=0xc00575da50 pc=0x55555c0e0bbf
k8s.io/cri-api/pkg/apis/runtime/v1._RuntimeService_ListPodSandboxStats_Handler({0x55555c985e60?, 0xc00044e890}, {0x55555c9d7d58, 0xc00512ba40}, 0xc003f3e060, 0xc000322090)
        /root/rpmbuild/BUILD/vendor/k8s.io/cri-api/pkg/apis/runtime/v1/api.pb.go:9864 +0x138 fp=0xc00575db00 sp=0xc00575daa8 pc=0x55555bbc1158
google.golang.org/grpc.(*Server).processUnaryRPC(0xc000404540, {0x55555c9ddb88, 0xc0043cc000}, 0xc004c4d7a0, 0xc001834780, 0x55555d5b3838, 0x0)
        /root/rpmbuild/BUILD/vendor/google.golang.org/grpc/server.go:1283 +0xcfd fp=0xc00575de48 sp=0xc00575db00 pc=0x55555b53841d
google.golang.org/grpc.(*Server).handleStream(0xc000404540, {0x55555c9ddb88, 0xc0043cc000}, 0xc004c4d7a0, 0x0)
        /root/rpmbuild/BUILD/vendor/google.golang.org/grpc/server.go:1620 +0xa1b fp=0xc00575df68 sp=0xc00575de48 pc=0x55555b53ca7b
google.golang.org/grpc.(*Server).serveStreams.func1.2()
        /root/rpmbuild/BUILD/vendor/google.golang.org/grpc/server.go:922 +0x98 fp=0xc00575dfe0 sp=0xc00575df68 pc=0x55555b535f38
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc00575dfe8 sp=0xc00575dfe0 pc=0x55555af0a241
created by google.golang.org/grpc.(*Server).serveStreams.func1
        /root/rpmbuild/BUILD/vendor/google.golang.org/grpc/server.go:920 +0x28a

k8s-ci-robot commented

Hi @danlenar. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

dcantah (Member) commented Feb 18, 2023

Can you include a short blurb in the commit message describing how this fixes the panic? The PR description is fine as is, since it includes the stack trace that shows the offending code.

samuelkarp (Member) commented

/ok-to-test

samuelkarp (Member) commented

/test pull-containerd-sandboxed-node-e2e


Labels:
area/cri (Container Runtime Interface (CRI))
cherry-picked/1.6.x (PR commits are cherry-picked into release/1.6 branch)
kind/bug
ok-to-test


6 participants