Skip to content

Add support for kubernetes events#87

Merged
karanthukral merged 8 commits intomainfrom
karan/cpo-79/event-recorder
Sep 15, 2025
Merged

Add support for kubernetes events#87
karanthukral merged 8 commits intomainfrom
karan/cpo-79/event-recorder

Conversation

@karanthukral
Copy link
Copy Markdown
Contributor

💸 TL;DR

  • Adds support for Kubernetes events
  • Added the ability for the the sdk to automatically create metrics for every event fired. Idea being that this is easier to track externally to k8s

📜 Details

  • Creates a new events package to create a client-go event recorder. Despite the event recorder interface being extremely simple and already being able to use it with the sdk, the new events package will allow us to add extra smarts to the package.
  • I added the ability for the caller to pass an optional metrics interface to the event recorder creation method. If passed, the interface will automatically emit prometheus metrics for each created event. The idea behind this would be to have the event creation be externally observable along with the option of adding alerting to it.
  • When an object is deleted the event metrics will also get cleaned up

🧪 Testing Steps / Validation

  • I used the branch in a local version of an internal controller.
  • The setup logic example:
eventRecorder := fsmevents.NewEventRecorder("orchestration-controller-manager", mgr, metrics)

// pass it to internal context so it can be used by a controller
  • Example event creation in code:
r.eventRecorder.RecordEvent(ns, "NSCreated", "Namespace created in cluster")
  • Testing in running controller, kubectl describe output
Events:
  Type    Reason         Age   From                              Message
  ----    ------         ----  ----                              -------
  Normal  NSCreated  5s    orchestration-controller-manager  Namespace created in cluster
  • Metrics output (curl):
# TYPE achilles_event counter
achilles_event{controller="orchestration-controller-manager",eventType="Normal",group="app.infrared.reddit.com",kind="RedditNamespace",objName="test-apps",objNamespace="",reason="NSCreated",version="v1alpha1"} 3
  • Deleted the object and ensured the metrics were cleaned up:
❯ curl localhost:8080/metrics | grep achilles_event
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  396k    0  396k    0     0  2388k      0 --:--:-- --:--:-- --:--:-- 2400k

✅ Checks

  • CI tests (if present) are passing
  • Adheres to code style for repo
  • Contributor License Agreement (CLA) completed if not a Reddit employee

I am open to skipping the metrics inclusion but wanted to share the current version to start getting people's thoughts on the utility of the events interface

@karanthukral karanthukral requested a review from a team as a code owner September 12, 2025 14:43
@karanthukral karanthukral force-pushed the karan/cpo-79/event-recorder branch from 1b3a8d2 to 764d0fd Compare September 12, 2025 14:45
Copy link
Copy Markdown

@william-richard william-richard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love it - KISS!

if message == "" {
message = "Object is ready"
}
e.recorder.Event(obj, EventTypeNormal, EventReadyReason, message)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A thought for future iterations.

One best practice I've seen around k8s events is avoiding repetitive/duplicate events - like emitting an event once when an object is reconciled successfully for the first time, and not on subsequent reconsile loops.

It might be nice if this RecordReady function did the dedupe for you, or had a flag to enable/disable dedupe logic.

Like I said, just a thought - feel free to kick this idea into the future!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a cool idea. Let me think on how best to do the dedup. Due to the concern with event cardinality is the reason why I chose not to emit events automatically in the sdk and instead leave it up to the controller especially in v1. That being said let me mull it over on how best to tackle this in the interface itself

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not to discourage the idea, but this may be difficult to implement on the SDK given that Events model point-in-time events. So, naively, an event for reconciliation at T1 is distinct from an event for reconciliation at T2.

That being said, the user of the SDK can make decisions about what constitutes as a "unique" event (i.e. the dedupe logic would need to be on the caller side since it's contingent on their specific logic)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya I decided to punt on this as a follow up to have some time to think about how best to do it or make the call that this should be a client side responsibility in which case we can come up with a recommendation in the docs

Copy link
Copy Markdown
Member

@jessicayuen jessicayuen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yay events!

return
}

m.sink.RecordEvent(triggerGVK, objectName, objectNamespace, eventType, reason, controllerName)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can recording fail? should we be erroring here and surfacing this in the logs?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The metrics interface from prometheus for Inc() call does not return any error so not really sure how to capture failure and this matches what we do with other metrics. Let me look again to see if I missed something. For context the prom inc function looks like:

func (c *counter) Inc() {
	atomic.AddUint64(&c.valInt, 1)
}

Copy link
Copy Markdown
Collaborator

@harveyxia harveyxia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM—how do you envision metrics against the events being used?

Comment on lines +12 to +15
EventReadyReason = "Ready"

EventTypeNormal = "Normal"
EventTypeWarning = "Warning"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can these be kept private if access is done through the receiver methods like RecordReady?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes good catch I'll make them private

if message == "" {
message = "Object is ready"
}
e.recorder.Event(obj, EventTypeNormal, EventReadyReason, message)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not to discourage the idea, but this may be difficult to implement on the SDK given that Events model point-in-time events. So, naively, an event for reconciliation at T1 is distinct from an event for reconciliation at T2.

That being said, the user of the SDK can make decisions about what constitutes as a "unique" event (i.e. the dedupe logic would need to be on the caller side since it's contingent on their specific logic)

@karanthukral
Copy link
Copy Markdown
Contributor Author

how do you envision metrics against the events being used?

I was chatting with Justin and he mentioned that it could be useful to be able to alert on certain k8s events or if events are not simply tied to state transitions, having metrics would provide an easy way to visualize them in a dashboard. It feels very much a client specific behaviour regarding this utility which is why I pitched the idea to see what others thought

@karanthukral karanthukral merged commit 3cfc237 into main Sep 15, 2025
1 check passed
@karanthukral karanthukral deleted the karan/cpo-79/event-recorder branch September 15, 2025 13:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

4 participants