Seeing errors/panics when trying to upgrade past 2.14.1 #12010

@alpeb

Discussed in #11927

Originally posted by nealharris January 12, 2024
This feels like a bug to me, but I'm not exactly sure how to reproduce it, so I'm not yet willing to open this as an issue in the project.

We're currently running Linkerd 2.14.1 in production. We try to keep up to date, and always run the latest stable release. However, for each stable release after 2.14.1, we've had some trouble when trying to upgrade.

Specifically, a few days after we upgrade our staging environment, we notice restarts in the destination controller. The logs show errors like:

ERROR 2024-01-12T08:00:46.599129659Z [resource.labels.containerName: destination] fatal error: concurrent map iteration and map write
ERROR 2024-01-12T08:00:46.602373609Z [resource.labels.containerName: destination] {}
ERROR 2024-01-12T08:00:46.602422320Z [resource.labels.containerName: destination] goroutine 824961 [running]:
ERROR 2024-01-12T08:00:46.602427273Z [resource.labels.containerName: destination] github.com/linkerd/linkerd2/controller/api/destination.(*endpointTranslator).add(...)
ERROR 2024-01-12T08:00:46.602430571Z [resource.labels.containerName: destination] /linkerd-build/controller/api/destination/endpoint_translator.go:192
ERROR 2024-01-12T08:00:46.602434365Z [resource.labels.containerName: destination] github.com/linkerd/linkerd2/controller/api/destination.(*endpointTranslator).processUpdate(0xc0043d8ea0, {0x1b12480?, 0xc000e1e2e8?})
ERROR 2024-01-12T08:00:46.602436814Z [resource.labels.containerName: destination] /linkerd-build/controller/api/destination/endpoint_translator.go:183 +0x12d
ERROR 2024-01-12T08:00:46.602439176Z [resource.labels.containerName: destination] github.com/linkerd/linkerd2/controller/api/destination.(*endpointTranslator).Start.func1()
ERROR 2024-01-12T08:00:46.602441799Z [resource.labels.containerName: destination] /linkerd-build/controller/api/destination/endpoint_translator.go:167 +0x37
ERROR 2024-01-12T08:00:46.602444278Z [resource.labels.containerName: destination] created by github.com/linkerd/linkerd2/controller/api/destination.(*endpointTranslator).Start
ERROR 2024-01-12T08:00:46.602446869Z [resource.labels.containerName: destination] /linkerd-build/controller/api/destination/endpoint_translator.go:163 +0x56
ERROR 2024-01-12T08:00:46.602449332Z [resource.labels.containerName: destination] {}
ERROR 2024-01-12T08:00:46.602479161Z [resource.labels.containerName: destination] goroutine 1 [chan receive, 2169 minutes]:
ERROR 2024-01-12T08:00:46.602482727Z [resource.labels.containerName: destination] github.com/linkerd/linkerd2/controller/cmd/destination.Main({0xc0000500e0, 0xa, 0xa})
ERROR 2024-01-12T08:00:46.602486313Z [resource.labels.containerName: destination] /linkerd-build/controller/cmd/destination/main.go:162 +0xf65
ERROR 2024-01-12T08:00:46.602496863Z [resource.labels.containerName: destination] main.main()
ERROR 2024-01-12T08:00:46.602499788Z [resource.labels.containerName: destination] /linkerd-build/controller/cmd/main.go:23 +0x191
ERROR 2024-01-12T08:00:46.602502201Z [resource.labels.containerName: destination] {}
ERROR 2024-01-12T08:00:46.602505639Z [resource.labels.containerName: destination] goroutine 48 [select]:
ERROR 2024-01-12T08:00:46.602508155Z [resource.labels.containerName: destination] go.opencensus.io/stats/view.(*worker).start(0xc000146200)
ERROR 2024-01-12T08:00:46.602513801Z [resource.labels.containerName: destination] /go/pkg/mod/[email protected]/stats/view/worker.go:292 +0xad
ERROR 2024-01-12T08:00:46.602516690Z [resource.labels.containerName: destination] created by go.opencensus.io/stats/view.init.0
ERROR 2024-01-12T08:00:46.602521661Z [resource.labels.containerName: destination] /go/pkg/mod/[email protected]/stats/view/worker.go:34 +0x8d
ERROR 2024-01-12T08:00:46.602524146Z [resource.labels.containerName: destination] {}
ERROR 2024-01-12T08:00:46.602526628Z [resource.labels.containerName: destination] goroutine 185 [sync.Cond.Wait, 10 minutes]:
ERROR 2024-01-12T08:00:46.602533518Z [resource.labels.containerName: destination] sync.runtime_notifyListWait(0xc000612028, 0xa90)
ERROR 2024-01-12T08:00:46.602543572Z [resource.labels.containerName: destination] /usr/local/go/src/runtime/sema.go:517 +0x14c
ERROR 2024-01-12T08:00:46.602548176Z [resource.labels.containerName: destination] sync.(*Cond).Wait(0xc0042b0800?)
ERROR 2024-01-12T08:00:46.602551037Z [resource.labels.containerName: destination] /usr/local/go/src/sync/cond.go:70 +0x8c
ERROR 2024-01-12T08:00:46.602558126Z [resource.labels.containerName: destination] k8s.io/client-go/tools/cache.(*DeltaFIFO).Pop(0xc000612000, 0xc000616010)
ERROR 2024-01-12T08:00:46.602560684Z [resource.labels.containerName: destination] /go/pkg/mod/k8s.io/[email protected]/tools/cache/delta_fifo.go:575 +0x256
ERROR 2024-01-12T08:00:46.602569140Z [resource.labels.containerName: destination] k8s.io/client-go/tools/cache.(*controller).processLoop(0xc000618000)
ERROR 2024-01-12T08:00:46.602571907Z [resource.labels.containerName: destination] /go/pkg/mod/k8s.io/[email protected]/tools/cache/controller.go:192 +0x36
ERROR 2024-01-12T08:00:46.602592218Z [resource.labels.containerName: destination] k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x40d91f?)
ERROR 2024-01-12T08:00:46.602612454Z [resource.labels.containerName: destination] /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:226 +0x3e
ERROR 2024-01-12T08:00:46.602621037Z [resource.labels.containerName: destination] k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xb331c5?, {0x22a2200, 0xc000614030}, 0x1, 0x0)
ERROR 2024-01-12T08:00:46.602624141Z [resource.labels.containerName: destination] /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:227 +0xb6
ERROR 2024-01-12T08:00:46.602627223Z [resource.labels.containerName: destination] k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000618078?, 0x3b9aca00, 0x0, 0x0?, 0x8bb2c97000?)
ERROR 2024-01-12T08:00:46.602629739Z [resource.labels.containerName: destination] /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:204 +0x89
ERROR 2024-01-12T08:00:46.602632922Z [resource.labels.containerName: destination] k8s.io/apimachinery/pkg/util/wait.Until(...)
ERROR 2024-01-12T08:00:46.602635384Z [resource.labels.containerName: destination] /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:161
ERROR 2024-01-12T08:00:46.602649519Z [resource.labels.containerName: destination] k8s.io/client-go/tools/cache.(*controller).Run(0xc000618000, 0x0)
ERROR 2024-01-12T08:00:46.602666866Z [resource.labels.containerName: destination] /go/pkg/mod/k8s.io/[email protected]/tools/cache/controller.go:163 +0x385
ERROR 2024-01-12T08:00:46.602693511Z [resource.labels.containerName: destination] k8s.io/client-go/tools/cache.(*sharedIndexInformer).Run(0xc00051a210, 0xc000129fd0?)
ERROR 2024-01-12T08:00:46.602705904Z [resource.labels.containerName: destination] /go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:503 +0x542
ERROR 2024-01-12T08:00:46.602710075Z [resource.labels.containerName: destination] k8s.io/client-go/informers.(*sharedInformerFactory).Start.func1()
ERROR 2024-01-12T08:00:46.602713741Z [resource.labels.containerName: destination] /go/pkg/mod/k8s.io/[email protected]/informers/factory.go:150 +0x6b
ERROR 2024-01-12T08:00:46.602717251Z [resource.labels.containerName: destination] created by k8s.io/client-go/informers.(*sharedInformerFactory).Start
ERROR 2024-01-12T08:00:46.602721222Z [resource.labels.containerName: destination] /go/pkg/mod/k8s.io/[email protected]/informers/factory.go:148 +0x22a

The logs above are from our most recent upgrade attempt, to 2.14.8.
