client-go/tools/cache: add APIs with context parameter #126387

Merged
k8s-ci-robot merged 3 commits into kubernetes:master from pohly:log-client-go-tools-cache-apis on Dec 18, 2024

Conversation

@pohly (Contributor) commented Jul 26, 2024

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

The context is used for cancellation and to support contextual logging.

In most cases, alternative *WithContext APIs get added, except for NewIntegerResourceVersionMutationCache where code searches indicate that the API is not used downstream.

An API break around SharedInformer couldn't be avoided because the
alternative (keeping the interface unchanged and adding a second one with
the new method) would have been worse. controller-runtime needs to be updated
because it implements that interface in a test package. Downstream consumers of
controller-runtime will keep working unless they use that test package.

Converting Kubernetes to use the other new alternatives will follow. In the meantime, usage of the new alternatives cannot be enforced via logcheck yet (see #126379 for the process).
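
To make the shape of the change concrete, here is a minimal, self-contained sketch (toy functions, not the actual client-go APIs) of moving from the old stop-channel style to the new context style:

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// run mimics the old stop-channel style; runWithContext mimics the new
// *WithContext style. Both names are illustrative, not client-go's.
func run(stopCh <-chan struct{}) {
	<-stopCh
	fmt.Println("stopped via channel")
}

func runWithContext(ctx context.Context) {
	<-ctx.Done()
	// The context also carries values (e.g. a logger for contextual logging).
	fmt.Println("stopped via context:", context.Cause(ctx))
}

func main() {
	// Old style: the caller owns a bare channel and must remember to close it.
	stopCh := make(chan struct{})
	go run(stopCh)
	close(stopCh)

	// New style: cancelling one context stops everything started from it.
	ctx, cancel := context.WithCancelCause(context.Background())
	go runWithContext(ctx)
	cancel(fmt.Errorf("shutting down"))

	time.Sleep(100 * time.Millisecond) // let the goroutines print
}
```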

Which issue(s) this PR fixes:

Related-to: #126379

Special notes for your reviewer:

#121989 contains this change and the related changes in the rest of k/k.

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot added the following labels on Jul 26, 2024: release-note-none, size/XL, kind/cleanup, cncf-cla: yes, do-not-merge/needs-sig, needs-triage, needs-priority, area/test, sig/api-machinery, sig/apps, sig/auth, sig/node, sig/testing; and removed: do-not-merge/needs-sig.
@k8s-ci-robot requested review from aojea and bart0sh on July 26, 2024 14:14
@pohly changed the title from "client-go/tools/cache: add APIs with context parameter" to "WIP: client-go/tools/cache: add APIs with context parameter" on Jul 26, 2024
@k8s-ci-robot added the do-not-merge/work-in-progress label on Jul 26, 2024
@pohly force-pushed the log-client-go-tools-cache-apis branch 2 times, most recently from 8e98f6b to a5ffb2c on July 29, 2024 07:08
@pohly changed the title from "WIP: client-go/tools/cache: add APIs with context parameter" to "client-go/tools/cache: add APIs with context parameter" on Jul 29, 2024
@k8s-ci-robot removed the do-not-merge/work-in-progress label on Jul 29, 2024
@pohly force-pushed the log-client-go-tools-cache-apis branch from a5ffb2c to 215861c on July 30, 2024 12:12
@seans3 (Contributor) commented Jul 30, 2024

/triage accepted

@k8s-ci-robot added the triage/accepted label and removed the needs-triage label on Jul 30, 2024
@pohly (author) commented Jul 31, 2024

/test pull-kubernetes-apidiff

}

func NewNamingConditionController(
ctx context.Context,
pohly (author):

https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/126387/pull-kubernetes-apidiff/1818538357228048384:

- ./pkg/controller/status.NewNamingConditionController: changed from func(k8s.io/apiextensions-apiserver/pkg/client/informers/externalversions/apiextensions/v1.CustomResourceDefinitionInformer, k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset/typed/apiextensions/v1.CustomResourceDefinitionsGetter) *NamingConditionController to func(context.Context, k8s.io/apiextensions-apiserver/pkg/client/informers/externalversions/apiextensions/v1.CustomResourceDefinitionInformer, k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset/typed/apiextensions/v1.CustomResourceDefinitionsGetter) *NamingConditionController

This seems to be unused outside of Kubernetes, so breaking the API seems acceptable.

Contributor:

I think all controllers can be considered internal and breaking consumers is ok.

//
// It's an error to call Run more than once.
// RunWithContext blocks; call via go.
RunWithContext(ctx context.Context)
pohly (author):

https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/126387/pull-kubernetes-apidiff/1818538357228048384:

 ## ./staging/src/k8s.io/client-go
Incompatible changes:
- ./tools/cache.Controller.RunWithContext: added 

Adding a new method to an interface is a breaking change because out-of-tree implementations of that interface no longer work. I think this is an interface only because there is one test mocking it, not because there are supposed to be other implementations, so this API change is okay?
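
For readers who want the Go mechanics spelled out, a toy sketch (not the real cache.Controller) of why adding an interface method breaks out-of-tree implementations:

```go
package main

// Toy illustration of why adding a method to an exported interface is a
// breaking change: every out-of-tree implementation stops compiling.

type Controller interface {
	Run(stopCh <-chan struct{})
	// Uncommenting the next line breaks fakeController below until it
	// implements the new method:
	// RunWithContext(ctx context.Context)
}

// fakeController stands in for an out-of-tree mock of the old interface.
type fakeController struct{}

func (fakeController) Run(stopCh <-chan struct{}) {}

func main() {
	var _ Controller = fakeController{} // compiles only while the interface is unchanged
}
```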

Contributor:

Would rather change the existing. This interface seems to be very low-level. Would not be surprised if it is not used externally.

pohly (author):

Works for me - changed.

pohly (author):

Actually, this Run method is what everyone who uses an informer calls. After changing it, lots of code no longer compiled. I think we have to keep RunWithContext and transition to it gradually.

type WatchErrorHandlerWithContext func(ctx context.Context, r *Reflector, err error)

// DefaultWatchErrorHandler is the default implementation of WatchErrorHandlerWithContext.
func DefaultWatchErrorHandler(ctx context.Context, r *Reflector, err error) {
pohly (author):

https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/126387/pull-kubernetes-apidiff/1818538357228048384:

Incompatible changes:
- ./tools/cache.DefaultWatchErrorHandler: changed from func(*Reflector, error) to func(context.Context, *Reflector, error)

https://grep.app/search?q=DefaultWatchErrorHandler

Some copies seem to exist, but not used elsewhere?

Users of cache.Config can still set a watch handler without the extra context parameter.
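
For illustration, a toy sketch (simplified types; the real handlers also take a *Reflector) of how a context-free handler can be adapted to the context-aware type:

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Toy versions of the two handler types discussed above.
type WatchErrorHandler func(err error)
type WatchErrorHandlerWithContext func(ctx context.Context, err error)

// wrapHandler adapts an old-style handler, so callers who don't care about
// the context don't have to change.
func wrapHandler(h WatchErrorHandler) WatchErrorHandlerWithContext {
	return func(_ context.Context, err error) { h(err) }
}

func main() {
	old := WatchErrorHandler(func(err error) { fmt.Println("watch error:", err) })
	h := wrapHandler(old)
	h(context.Background(), errors.New("connection reset"))
}
```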

Contributor:

Both OPA Gatekeeper and KubeVirt aren't libraries, so version skew shouldn't be an issue.

// in the underlying store. This is only safe if your use of the cache can handle mutation entries
// remaining in the cache for up to ttl when mutations and deletes occur very closely in time.
func NewIntegerResourceVersionMutationCache(backingCache Store, indexer Indexer, ttl time.Duration, includeAdds bool) MutationCache {
func NewIntegerResourceVersionMutationCache(ctx context.Context, backingCache Store, indexer Indexer, ttl time.Duration, includeAdds bool) MutationCache {
pohly (author):

Incompatible changes:
 - ./tools/cache.NewIntegerResourceVersionMutationCache: changed from func(Store, Indexer, time.Duration, bool) MutationCache to func(context.Context, Store, Indexer, time.Duration, bool) MutationCache 

https://grep.app/search?q=NewIntegerResourceVersionMutationCache

Not used outside of Kubernetes?

Contributor:

Doesn't look like it.

Contributor:

This seems like overkill. Based on the discussion we had with Jordan during KubeCon, I believe we agreed to apply context where it makes sense. This cache seems like one that should not have it; at most I'd expect it to accept only a logger. That would also resolve my other comment about creating a controller without requiring a context.

pohly (author):

Ack. This is one of those places that I hadn't updated based on the "it's okay to put logr.Logger into our APIs" conclusion.

What could potentially throw a monkey wrench into this "no context during construction" approach is other components not following it. We've seen that with the event recorder. Yesterday I found that constructing a delaying queue also already spawns a goroutine during New*.

I still think "no goroutines during construction" is a sound principle; sometimes it just might be a bit harder to follow.

pohly (author):

I took your ctx -> logger replacement for NewIntegerResourceVersionMutationCache from #129148 into this PR.

I did not take the other changes for controller construction. That can be a follow-up PR?

Contributor:

I still think "no goroutines during construction" is a sound principle, sometimes it just might be a bit harder to follow.

I don't want to clean those, yet. Probably in a separate PR, but in the long run we should definitely ensure that.

// where additional optional parameters are passed in a struct. If no resync period
// is specified there, AddEventHandlerWithConfig behaves like AddEventHandler.
// The context is used for contextual logging.
AddEventHandlerWithConfig(ctx context.Context, handler ResourceEventHandler, config HandlerConfig) (ResourceEventHandlerRegistration, error)
pohly (author):

This WithConfig is used instead of adding a WithContext because it enables adding other optional parameters besides resync period without having to add another method.

Incompatible changes:
- ./tools/cache.SharedInformer.AddEventHandlerWithConfig: added
- ./tools/cache.SharedInformer.RunWithContext: added
- ./tools/cache.SharedInformer.SetWatchErrorHandlerWithContext: added 

Same as with Controller: probably shouldn't be implemented outside of Kubernetes?

Contributor:

This is implemented outside of kube, especially for mocking. controller-runtime, for example, has an instance.

pohly (author):

Is it acceptable to break that other code? Probably not.

The alternative is to leave cache.SharedInformer as it is and add a second cache.SharedInformerCtx with the new methods. The code which returns a cache.SharedInformer can return a cache.SharedInformerCtx (assignable to variables or fields of type cache.SharedInformer). The only API break for such producers is that the function signature changes.

The downside of this approach is that code accepting a cache.SharedInformer must continue to do so. It can optionally use the new method only after a checked type cast.
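
A toy sketch of that alternative (stand-in interfaces, not the real client-go types), showing the checked type cast a consumer would need:

```go
package main

import "context"

// Toy sketch of the rejected alternative: keep the old interface, add a
// second one with the new method, and upgrade via a checked type cast.

type SharedInformer interface {
	Run(stopCh <-chan struct{})
}

type SharedInformerCtx interface {
	SharedInformer
	RunWithContext(ctx context.Context)
}

// start accepts the old interface but uses the new method when available.
func start(ctx context.Context, informer SharedInformer) {
	if ci, ok := informer.(SharedInformerCtx); ok {
		go ci.RunWithContext(ctx)
		return
	}
	// Fallback: bridge the context to the old stop channel.
	stopCh := make(chan struct{})
	go informer.Run(stopCh)
	go func() { <-ctx.Done(); close(stopCh) }()
}

type fakeInformer struct{}

func (fakeInformer) Run(stopCh <-chan struct{}) { <-stopCh }

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	start(ctx, fakeInformer{})
	cancel()
}
```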

Contributor:

Isn't such a struct usually called Options in Kube, not Config?

Contributor:

And in that case, I would rather not mention Config/Options in the method name, but call it AddEventHandlerWithContext.

pohly (author):

I think that satisfies what consumers of controller-runtime want

... unless those consumers import those packages:
https://grep.app/search?q=sigs.k8s.io/controller-runtime/pkg/controller/controllertest

😢

Member:

Sounds okay from my side. We intentionally state in our docs that every CR minor version is only compatible with a specific client-go minor version:

Every minor version of controller-runtime has been tested with a specific minor version of client-go. A controller-runtime minor version may be compatible with other client-go minor versions, but this is by chance and neither supported nor tested. In general, we create one minor version of controller-runtime for each minor version of client-go and other k8s.io/* dependencies.
https://github.com/kubernetes-sigs/controller-runtime?tab=readme-ov-file#compatibility

It has also happened in the past that we weren't entirely compatible with other client-go versions.

So seems okay to me to have a breaking change here.

pohly (author):

Thanks @sbueringer for the heads-up. I've pushed a single commit with the original proposal (extending SharedInformer) and with the renames suggested by @sttts above.

Please take another look, it might be ready for merging now.

Member:

Looks good to me from a CR perspective

Comment thread on staging/src/k8s.io/client-go/tools/cache/shared_informer.go (outdated)
@k8s-ci-robot added the area/code-generation and area/dependency labels on Nov 29, 2024
// the next notification will be attempted. This is usually better than the alternative of never
// delivering again.
stopCh := make(chan struct{})
wait.Until(func() {
pohly (author):

The only purpose of wait.Until here is to deal with panics. The way it was implemented prevented shutting down the goroutine immediately when the last item in the channel caused a panic, because wait.Until was stuck in a sleep.

I found that when adding more unit tests and checking for proper termination of goroutines. Instead of allowing goleak to wait longer, I changed this code so that it sleeps only if really needed (= before processing the next item after a panic).
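
A toy sketch of the reworked loop (stand-in code, not the actual processorListener): the backoff happens only before the item that follows a panic, so a drained channel lets the goroutine exit immediately:

```go
package main

import "time"

// processLoop delivers items, recovering from panics. It sleeps only before
// the item that follows a panic, never on shutdown.
func processLoop(items <-chan func(), backoff time.Duration) {
	sleepBeforeNext := false
	for item := range items {
		if sleepBeforeNext {
			time.Sleep(backoff) // back off only while recovering from a panic
			sleepBeforeNext = false
		}
		func() {
			defer func() {
				if r := recover(); r != nil {
					sleepBeforeNext = true // the next item waits; no sleep if there is none
				}
			}()
			item()
		}()
	}
}

func main() {
	items := make(chan func(), 2)
	items <- func() { panic("bad notification") }
	items <- func() {} // delivered after one backoff
	close(items)
	processLoop(items, 10*time.Millisecond) // returns promptly after draining
}
```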

Contributor:

It took me a while to grok it, but it does make sense.

@pohly force-pushed the log-client-go-tools-cache-apis branch from 460930a to 11dffab on November 29, 2024 15:39
@pohly (author) commented Nov 30, 2024

/retest

Some jobs didn't start.

// It replaces its internal store with the collected items and
// reuses the current watch requests for getting further events.
func (r *Reflector) watchList(stopCh <-chan struct{}) (watch.Interface, error) {
func (r *Reflector) watchList(ctx context.Context) (watch.Interface, error) {
pohly (author):

The elephant in the room which does not get addressed by this PR is the ListWatcher API. The implementations all use context.TODO. They get called here:

w, err = r.listerWatcher.Watch(options)

In this PR, that code has a proper context, but cannot pass it on. Let's address that in a follow-up PR, similar to #129109?

pohly (author):

I've addressed this in #129125. But that's a massive PR that I need to split up properly. The main benefit of doing the full exercise is that there will be no more surprises later on. Decisions like "this code does not need a context" could turn out to be incorrect when later it is found that the code directly or indirectly calls code which does need a context after all.

@soltysh (Contributor) left a review comment:

The biggest comment I have is around not adding the context to controller New* methods, since that's what we discussed a month ago during KubeCon. I've made a small POC, dropping the changes in #129148

return nil, false, fmt.Errorf("failed to build token generator: %v", err)
}
tokenController, err := serviceaccountcontroller.NewTokensController(
ctx,
Contributor:

I believe we discussed that we should not require context for controller creation, only for the actual run method. If you change this to only pass a logger, similar to what the change below was before your modification, we could also get rid of the changes in the controller to use .AddEventHandler for now. This would also address part of the concerns Jordan had in this thread.

pohly (author):

@soltysh: so do you also want to avoid adding AddEventHandlerWithContext? Or add AddEventHandlerWithLogger instead?

The problem is that there's a bunch of log calls potentially triggered by AddEventHandler. Without something passed by the caller, those cannot be properly attributed to the caller. In particular when those log calls point out things like "Warning: resync period is smaller than resync check period and the informer has already started" it would be good to get more information into the log entry from a contextual logger.

pohly (author):

Coming back to *WithContext: part of my motivation for adding it was that it's easier for developers to cancel one context and have everything created with it stopped than to remember to call all the individual Stop or Remove methods. Is that goal worth the extra complexity?

Perhaps not. Resource leaks are annoying, but perhaps developers simply have to be more careful if they care.

In particular when those log calls point out things like "Warning: resync period is smaller than resync check period and the informer has already started" it would be good to get more information into the log entry from a contextual logger.

That may be possible without an API change. The code underneath AddEventHandler can mark itself as helper code, then stack unwinding will skip it and the log message will be recorded as coming from the code which made the broken AddEventHandler call. How does that sound?
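
For reference, a minimal sketch of that helper-marking idea via logr's WithCallStackHelper (assuming the logging sink supports the call-stack-helper mechanism; klog.Background is used here only for illustration):

```go
package main

import (
	"k8s.io/klog/v2"
)

// addEventHandler stands in for library code that wants its log messages
// attributed to the caller rather than to itself.
func addEventHandler(logger klog.Logger) {
	helper, logger := logger.WithCallStackHelper()
	helper() // mark this frame as a helper; supporting sinks skip it during unwinding
	logger.Info("Warning: resync period is smaller than resync check period")
}

func main() {
	logger := klog.Background()
	addEventHandler(logger) // with a supporting sink, the logged file:line points here
}
```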

pohly (author):

I removed AddEventHandlerWithContext and then my own new TestEventPanics reminded me why a per-event-handler logger is useful: when it panics, we want to know what panicked. I've replaced AddEventHandlerWithContext with AddEventHandlerWithOptions, with a logger as one optional option.
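
A minimal sketch of the resulting options pattern (toy types, not the real tools/cache HandlerOptions): optional knobs, including a per-handler logger for attributing panics, live in a struct whose zero value keeps today's behavior:

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/klog/v2"
)

// HandlerOptions here is a toy stand-in: new optional fields can be added
// later without another method, and the zero value means "use defaults".
type HandlerOptions struct {
	Logger       *klog.Logger  // nil => use the informer's default logger
	ResyncPeriod time.Duration // 0  => same default as plain AddEventHandler
}

type informer struct {
	defaultLogger klog.Logger
	defaultResync time.Duration
}

func (i *informer) AddEventHandlerWithOptions(handler func(), opts HandlerOptions) {
	logger, resync := i.defaultLogger, i.defaultResync
	if opts.Logger != nil {
		logger = *opts.Logger
	}
	if opts.ResyncPeriod != 0 {
		resync = opts.ResyncPeriod
	}
	// A per-handler logger lets a panic in handler be attributed to the
	// code that registered it.
	logger.Info("handler registered", "resyncPeriod", resync)
	handler()
}

func main() {
	i := &informer{defaultLogger: klog.Background(), defaultResync: time.Minute}
	i.AddEventHandlerWithOptions(func() { fmt.Println("event") }, HandlerOptions{})
}
```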

Contributor:

I've replaced AddEventHandlerWithContext with AddEventHandlerWithOptions, with a logger as one optional option.

This will probably serve us much better in the long run, w/o the need to change the API.

Comment thread on staging/src/k8s.io/apimachinery/pkg/util/runtime/runtime.go (outdated)
@soltysh (Contributor) commented Dec 10, 2024

I'm planning to go through the client-go changes once more tomorrow, most likely.

There are situations where it makes more sense to pass a logger through a
call chain, primarily because passing a context would imply that the call chain
should honor cancellation even though there is a different shutdown mechanism.

Using the *WithContext variants would cause additional overhead for
klog.NewContext, which hurts in particular for HandleCrash because that
function is typically a nop that doesn't actually need to log
anything. HandleCrashWithLogger avoids that overhead.

For HandleError that is less relevant because it always logs, but for the sake
of symmetry it also gets added.

Putting klog.Logger (= logr.Logger) into the public Kubernetes Go API is okay
because it's no longer realistic that these packages can ever drop the klog
dependency. Callers using slog as logger in their binary can use
https://github.com/veqryn/slog-context to store a slog.Logger in a context and
then call the *WithContext variants, klog.FromContext will be able to use it.
This is probably very rare, so there's no need for *WithSlog variants.

While at it, unit testing gets enhanced and logging in panic handlers gets
improved such that they are guaranteed to get a saner location when not
doing any caller skipping. Previously, this was undefined.
Several tests leaked goroutines. All of those get fixed where possible
without API changes. Goleak is used to prevent regressions.

One new test specifically covers shutdown of an informer and its event
handlers.
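
To make the overhead argument concrete, a sketch with illustrative stand-ins (not the real k8s.io/apimachinery/pkg/util/runtime signatures):

```go
package main

import (
	"context"

	"k8s.io/klog/v2"
)

// Illustrative stand-ins for the two crash-handler variants discussed above.
func handleCrashWithContext(ctx context.Context) {
	if r := recover(); r != nil {
		klog.FromContext(ctx).Info("observed a panic", "panic", r)
	}
}

func handleCrashWithLogger(logger klog.Logger) {
	if r := recover(); r != nil {
		logger.Info("observed a panic", "panic", r)
	}
}

func work(ctx context.Context, logger klog.Logger) {
	// With only a *WithContext variant, a caller holding a logger pays for
	// klog.NewContext on every call, even though nothing panics:
	defer handleCrashWithContext(klog.NewContext(ctx, logger))
	// The *WithLogger variant skips that per-call context allocation:
	defer handleCrashWithLogger(logger)
}

func main() {
	work(context.Background(), klog.Background())
}
```
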
@pohly (author) commented Dec 12, 2024

I force-pushed to sync with #129125. That includes the WithLogr -> WithLogger renaming and adds comments to suppress logcheck warnings. These suppressions will become relevant later, when the comment "Contextual logging: HandleErrorWithContext or HandleErrorWithLogger should be used instead of HandleError in code which supports contextual logging" gets turned into "logcheck:context // HandleErrorWithContext ..." so that logcheck enforces it in relevant code.

@pohly (author) commented Dec 12, 2024

/test pull-kubernetes-apidiff

}
discoveryController := NewDiscoveryController(s.Informers.Apiextensions().V1().CustomResourceDefinitions(), versionDiscoveryHandler, groupDiscoveryHandler, aggregatedDiscoveryManager)
namingController := status.NewNamingConditionController(s.Informers.Apiextensions().V1().CustomResourceDefinitions(), crdClient.ApiextensionsV1())
namingController := status.NewNamingConditionController(context.TODO() /* for contextual logging */, s.Informers.Apiextensions().V1().CustomResourceDefinitions(), crdClient.ApiextensionsV1())
Contributor:

This should be just the logger, like we agreed for all controllers. Especially in the New* methods.

}, r.backoffManager, true, stopCh)
klog.V(3).Infof("Stopping reflector %s (%s) from %s", r.typeDescription, r.resyncPeriod, r.name)
}, r.backoffManager, true, ctx.Done())
klog.FromContext(ctx).V(3).Info("Stopping reflector", "type", r.typeDescription, "resyncPeriod", r.resyncPeriod, "reflector", r.name)
Contributor:

Nit:

Suggested change
klog.FromContext(ctx).V(3).Info("Stopping reflector", "type", r.typeDescription, "resyncPeriod", r.resyncPeriod, "reflector", r.name)
logger.V(3).Info("Stopping reflector", "type", r.typeDescription, "resyncPeriod", r.resyncPeriod, "reflector", r.name)

you already have logger there.

pohly (author):

Fixed.

// In practice, the reflector exits the watchHandler as soon as the bookmark event is received and calls the t.C() method.
func newInitialEventsEndBookmarkTicker(name string, c clock.Clock, watchStart time.Time, exitOnWatchListBookmarkReceived bool) *initialEventsEndBookmarkTicker {
return newInitialEventsEndBookmarkTickerInternal(name, c, watchStart, 10*time.Second, exitOnWatchListBookmarkReceived)
func newInitialEventsEndBookmarkTicker(ctx context.Context, name string, c clock.Clock, watchStart time.Time, exitOnWatchListBookmarkReceived bool) *initialEventsEndBookmarkTicker {
Contributor:

Context seems unnecessary here, based on my understanding. Logger should be sufficient.

pohly (author):

Changed.

}
if err != nil {
klog.Warningf("The watchlist request ended with an error, falling back to the standard LIST/WATCH semantics because making progress is better than deadlocking, err = %v", err)
logger.Error(err, "The watchlist request ended with an error, falling back to the standard LIST/WATCH semantics because making progress is better than deadlocking")
Contributor:

Below in this method we have:

	if fallbackToList {
		err = r.list(stopCh)
		if err != nil {
			return err
		}
	}

which seems to be the only consumer of stopCh. I tested locally, and switching that bit to context should be an easy change, especially since later in the tests you're using ctx.Done to pass to the list invocation.

pohly (author):

Looks like I already made that change in my full PR (#129125).

I've included it here instead.


if s.HasStarted() {
klog.Warningf("The sharedIndexInformer has started, run more than once is not allowed")
logger.Info("Warning: the sharedIndexInformer has started, run more than once is not allowed")
Contributor:

There's a bit of inconsistency: here you're using logger.Info("Warning: ..."), but in staging/src/k8s.io/client-go/tools/cache/listers.go#ListAllByNamespace you switch to klog.TODO().Error. I'd do both consistently: either info with a warning prefix, or error.

pohly (author):

This one here is correct. Warnings are not errors, just info messages that "need more attention" - whatever that means.

I've changed ListAllByNamespace to do the same.

processorStopCh := make(chan struct{})
// Separate stop context because Processor should be stopped strictly after controller.
// Cancelation in the parent context is ignored.
processorStopCtx, stopProcessor := context.WithCancelCause(context.WithoutCancel(ctx))
Contributor:

The double context wrapping is to:

  1. drop parent cancel
  2. maintain the logger from context

Do I get that correct? The former is written down, but the 2nd point is something I had to figure out. I'd put it in the comment too, because the first thing that came to my mind was: why not just start with a fresh context.Background() rather than double wrapping?

pohly (author):

Correct. It's not just the logger (we could add that back explicitly) but also all other values which we don't know about here. I added a comment.
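
A runnable demo of exactly this double wrapping with the standard library (the stored value stands in for the logger and anything else carried by the context):

```go
package main

import (
	"context"
	"fmt"
)

// WithoutCancel drops the parent's cancellation while keeping its values,
// and WithCancelCause provides the independent stop signal for the processor.
type loggerKey struct{}

func main() {
	parent := context.WithValue(context.Background(), loggerKey{}, "some-logger")
	parent, cancelParent := context.WithCancel(parent)

	processorCtx, stopProcessor := context.WithCancelCause(context.WithoutCancel(parent))

	cancelParent() // does NOT cancel processorCtx
	fmt.Println("canceled with parent?", processorCtx.Err() != nil) // false
	fmt.Println("value kept:", processorCtx.Value(loggerKey{}))     // some-logger

	stopProcessor(fmt.Errorf("controller finished"))
	fmt.Println("cause:", context.Cause(processorCtx)) // controller finished
}
```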

if resyncPeriod < s.resyncCheckPeriod {
if s.started {
klog.Warningf("resyncPeriod %v is smaller than resyncCheckPeriod %v and the informer has already started. Changing it to %v", resyncPeriod, s.resyncCheckPeriod, s.resyncCheckPeriod)
logger.Info("Warning: resync period is smaller than resync check period and the informer has already started. Changing it to the resync check period", "resyncPeriod", resyncPeriod, "resyncCheckPeriod", s.resyncCheckPeriod)
Contributor:

Another instance of info+warning. I'm leaning towards doing that also in that one place where you've used error. I don't see a reason why it should be an error there.

pohly (author):

Agreed, and changed there.


Passing context through and checking it for cancellation is tricky for event
handlers. A better approach is to map the context cancellation to the normal
removal of an event handler via a helper goroutine. Thanks to the new
HandleErrorWithLogr and HandleCrashWithLogr, remembering the logger is
sufficient for handling problems at runtime.
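
A minimal sketch of that cancellation-to-removal mapping (toy types: registry stands in for the informer, remove for RemoveEventHandler):

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// Rather than checking ctx inside every event callback, a helper goroutine
// removes the handler once the context ends.
type registry struct {
	mu       sync.Mutex
	handlers map[int]func(string)
}

func (r *registry) add(h func(string)) int {
	r.mu.Lock()
	defer r.mu.Unlock()
	id := len(r.handlers)
	r.handlers[id] = h
	return id
}

func (r *registry) remove(id int) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.handlers, id)
}

func (r *registry) count() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.handlers)
}

func addUntilDone(ctx context.Context, r *registry, h func(string)) {
	id := r.add(h)
	go func() {
		<-ctx.Done()
		r.remove(id) // the normal removal path doubles as cancellation handling
	}()
}

func main() {
	r := &registry{handlers: map[int]func(string){}}
	ctx, cancel := context.WithCancel(context.Background())
	addUntilDone(ctx, r, func(ev string) { fmt.Println("event:", ev) })
	cancel()
	time.Sleep(50 * time.Millisecond)        // give the helper goroutine time to run
	fmt.Println("handlers left:", r.count()) // 0
}
```
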
@k8s-ci-robot commented Dec 18, 2024

@pohly: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                          Commit   Required  Rerun command
pull-kubernetes-apidiff            eea739a  false     /test pull-kubernetes-apidiff
pull-kubernetes-apidiff-client-go  4638ba9  false     /test pull-kubernetes-apidiff-client-go
pull-kubernetes-linter-hints       4638ba9  false     /test pull-kubernetes-linter-hints

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@soltysh (Contributor) left a review:

/lgtm
/approve

@k8s-ci-robot:
LGTM label has been added.

Git tree hash: 73c2ccaf794c3ac71eb3ff0dc4c67337c3eab90b

@k8s-ci-robot:
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pohly, soltysh, sttts

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


Labels

approved - Indicates a PR has been approved by an approver from all required OWNERS files.
area/code-generation
area/dependency - Issues or PRs related to dependency changes
area/logging
area/test
cncf-cla: yes - Indicates the PR's author has signed the CNCF CLA.
kind/cleanup - Categorizes issue or PR as related to cleaning up code, process, or technical debt.
lgtm - "Looks good to me", indicates that a PR is ready to be merged.
priority/important-soon - Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
release-note-none - Denotes a PR that doesn't merit a release note.
sig/api-machinery - Categorizes an issue or PR as relevant to SIG API Machinery.
sig/apps - Categorizes an issue or PR as relevant to SIG Apps.
sig/auth - Categorizes an issue or PR as relevant to SIG Auth.
sig/node - Categorizes an issue or PR as relevant to SIG Node.
sig/testing - Categorizes an issue or PR as relevant to SIG Testing.
size/XXL - Denotes a PR that changes 1000+ lines, ignoring generated files.
triage/accepted - Indicates an issue or PR is ready to be actively worked on.
wg/device-management - Categorizes an issue or PR as relevant to WG Device Management.
wg/structured-logging - Categorizes an issue or PR as relevant to WG Structured Logging.

Projects

Archived in 4 projects
