Skip to content

fix(roleinstance): nil pointer dereference in in-place update#274

Merged
Syspretor merged 2 commits intosgl-project:mainfrom
sebest:fix/nil-pointer-roleinstance-inplace-update
Apr 22, 2026
Merged

fix(roleinstance): nil pointer dereference in in-place update#274
Syspretor merged 2 commits intosgl-project:mainfrom
sebest:fix/nil-pointer-roleinstance-inplace-update

Conversation

@sebest
Copy link
Copy Markdown
Contributor

@sebest sebest commented Apr 15, 2026

Summary

  • Fix nil pointer dereference in updatePod when a pod's referenced ControllerRevision is missing
  • Fall back to recreate when no matching revision is found for the pod
  • Add a sentinel ErrNilControllerRevision guard in splitComponentControllerRevision
  • Use ctx for the recreate delete path and make the RevisionNotFound log and event wording accurate
  • Harden unit tests with errors.Is, non-empty pod metadata, expectation cleanup, and precise NotFound assertions

Upstream: mslsrc/supercomputing#418

Test plan

  • go test ./pkg/reconciler/roleinstance/inplaceupdate
  • go test ./pkg/reconciler/roleinstance/sync

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces robust handling for missing ControllerRevision objects during Pod updates. Specifically, it adds a nil check in splitComponentControllerRevision and implements a fallback to the ReCreate strategy in updatePod when a Pod's referenced revision is no longer available. Additionally, it improves context usage by replacing context.TODO() with the provided context and includes comprehensive unit tests for these scenarios. I have no feedback to provide.

@sebest sebest marked this pull request as draft April 15, 2026 08:54
@sebest sebest marked this pull request as ready for review April 15, 2026 08:55
@cheyang cheyang requested a review from Copilot April 17, 2026 02:25
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a nil pointer dereference in RoleInstance pod in-place update flow when the pod’s referenced ControllerRevision no longer exists, by falling back to recreate and adding a defensive nil check in revision-splitting logic.

Changes:

  • Add oldRevision == nil guard in updatePod to delete/recreate instead of attempting in-place update with a nil revision.
  • Add a nil guard in splitComponentControllerRevision to return an error rather than panic.
  • Add focused unit tests covering nil revision handling and recreate fallback behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File Description
pkg/reconciler/roleinstance/sync/instance_update.go Adds nil oldRevision fallback path and uses ctx for delete.
pkg/reconciler/roleinstance/sync/instance_update_test.go Adds tests for nil revision fallback, in-place success, and recreate fallback.
pkg/reconciler/roleinstance/inplaceupdate/inplaceupdate.go Adds nil input guard returning error in revision splitting.
pkg/reconciler/roleinstance/inplaceupdate/inplaceupdate_test.go Adds test ensuring nil revision returns an error.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pkg/reconciler/roleinstance/sync/instance_update_test.go
Comment thread pkg/reconciler/roleinstance/inplaceupdate/inplaceupdate.go
Comment thread pkg/reconciler/roleinstance/inplaceupdate/inplaceupdate_test.go
Comment thread pkg/reconciler/roleinstance/sync/instance_update.go Outdated
Comment thread pkg/reconciler/roleinstance/sync/instance_update_test.go
sebest added 2 commits April 21, 2026 22:39
When a pod references a ControllerRevision that has been garbage-collected
by truncateHistory, the updatePod function leaves oldRevision as nil. This
nil flows into splitComponentControllerRevision which panics on
revision.Data.Raw.

Fix by adding a nil guard in updatePod that falls back to recreate (delete
pod) when the old revision is missing, and a defense-in-depth nil check in
splitComponentControllerRevision. Also fix pre-existing context.TODO() usage
to propagate the reconciler's ctx.

Upstream: mslsrc/supercomputing#418
@sebest sebest force-pushed the fix/nil-pointer-roleinstance-inplace-update branch from e88b905 to a46de72 Compare April 22, 2026 05:39
@sebest
Copy link
Copy Markdown
Contributor Author

sebest commented Apr 22, 2026

@cheyang / @Syspretor : any change required on this PR?

Copy link
Copy Markdown
Collaborator

@cheyang cheyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@Syspretor Syspretor merged commit fc6bad9 into sgl-project:main Apr 22, 2026
12 of 13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants