fix(roleinstance): nil pointer dereference in in-place update#274
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces robust handling for missing ControllerRevision objects during Pod updates. Specifically, it adds a nil check in splitComponentControllerRevision and implements a fallback to the ReCreate strategy in updatePod when a Pod's referenced revision is no longer available. Additionally, it improves context usage by replacing context.TODO() with the provided context and includes comprehensive unit tests for these scenarios. I have no feedback to provide.
There was a problem hiding this comment.
Pull request overview
Fixes a nil pointer dereference in RoleInstance pod in-place update flow when the pod’s referenced ControllerRevision no longer exists, by falling back to recreate and adding a defensive nil check in revision-splitting logic.
Changes:
- Add
oldRevision == nilguard inupdatePodto delete/recreate instead of attempting in-place update with a nil revision. - Add a nil guard in
splitComponentControllerRevisionto return an error rather than panic. - Add focused unit tests covering nil revision handling and recreate fallback behavior.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| pkg/reconciler/roleinstance/sync/instance_update.go | Adds nil oldRevision fallback path and uses ctx for delete. |
| pkg/reconciler/roleinstance/sync/instance_update_test.go | Adds tests for nil revision fallback, in-place success, and recreate fallback. |
| pkg/reconciler/roleinstance/inplaceupdate/inplaceupdate.go | Adds nil input guard returning error in revision splitting. |
| pkg/reconciler/roleinstance/inplaceupdate/inplaceupdate_test.go | Adds test ensuring nil revision returns an error. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
When a pod references a ControllerRevision that has been garbage-collected by truncateHistory, the updatePod function leaves oldRevision as nil. This nil flows into splitComponentControllerRevision which panics on revision.Data.Raw. Fix by adding a nil guard in updatePod that falls back to recreate (delete pod) when the old revision is missing, and a defense-in-depth nil check in splitComponentControllerRevision. Also fix pre-existing context.TODO() usage to propagate the reconciler's ctx. Upstream: mslsrc/supercomputing#418
e88b905 to
a46de72
Compare
|
@cheyang / @Syspretor : any change required on this PR? |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Summary
updatePodwhen a pod's referenced ControllerRevision is missingErrNilControllerRevisionguard insplitComponentControllerRevisionctxfor the recreate delete path and make theRevisionNotFoundlog and event wording accurateerrors.Is, non-empty pod metadata, expectation cleanup, and preciseNotFoundassertionsUpstream: mslsrc/supercomputing#418
Test plan
go test ./pkg/reconciler/roleinstance/inplaceupdatego test ./pkg/reconciler/roleinstance/sync