refactor(rbg): simplify rbg reconcile - Phase 1#155
refactor(rbg): simplify rbg reconcile - Phase 1#155cheyang merged 2 commits intosgl-project:mainfrom
Conversation
Summary of ChangesHello @Syspretor, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a major refactoring of the Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request significantly refactors the Reconcile function in rolebasedgroup_controller.go, breaking it down into smaller, more manageable steps. This greatly improves the readability and maintainability of the code. The introduction of a cache for workload reconcilers is a good performance optimization. I've added a couple of minor suggestions for further improvement. Overall, this is a great refactoring.
There was a problem hiding this comment.
Pull request overview
Refactors the RoleBasedGroup controller reconcile loop into clearer step-based helpers (phase 1), and marks one e2e scenario as pending.
Changes:
- Split
Reconcile()into structured steps (preCheck,handleRevisions, status construction, coordination strategy handling, role reconciliation, cleanup). - Add cached
WorkloadReconcilercreation with RWMutex protection to avoid repeated instantiation. - Mark the volcano gang scheduling e2e test as pending.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
internal/controller/workloads/rolebasedgroup_controller.go |
Refactors reconcile flow into helper methods and adds a cached workload reconciler map. |
test/e2e/testcase/rbg.go |
Changes the volcano gang scheduling test from It to PIt (pending). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
315d06c to
d59ba2e
Compare
Pull Request Test Coverage Report for Build 21506047280Details
💛 - Coveralls |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request is a solid refactoring of the Reconcile function in rolebasedgroup_controller.go. Breaking down the logic into smaller, well-defined functions for pre-checks, revision handling, role reconciliation, and cleanup significantly improves the code's readability and maintainability. The introduction of a cache for workload reconcilers is a good performance enhancement.
My review focuses on further improving the robustness and safety of the new code. I've suggested ensuring all cleanup operations are attempted, even if one fails, and refactoring a couple of loops to avoid a common Go pitfall of taking the address of a loop variable. These changes will make the code more resilient and align it better with Go best practices.
| for _, role := range rbg.Spec.Roles { | ||
| logger := log.FromContext(ctx) | ||
| roleCtx := log.IntoContext(ctx, logger.WithValues("role", role.Name)) | ||
| // first check whether watch lws cr | ||
| dynamicWatchCustomCRD(roleCtx, role.Workload.Kind) | ||
|
|
||
| reconciler, err := reconciler.NewWorkloadReconciler(role.Workload, r.scheme, r.client) | ||
| workloadReconciler, err := r.getOrCreateWorkloadReconciler(roleCtx, role.Workload) | ||
| if err != nil { | ||
| logger.Error(err, "Failed to create workload reconciler") | ||
| r.recorder.Eventf( | ||
| rbg, corev1.EventTypeWarning, FailedReconcileWorkload, | ||
| "Failed to reconcile role %s, err: %v", role.Name, err, | ||
| "Failed to validate role %s, err: %v", role.Name, err, | ||
| ) | ||
| return ctrl.Result{}, err | ||
| errs = append(errs, fmt.Errorf("role %s: %w", role.Name, err)) | ||
| continue | ||
| } | ||
|
|
||
| if err := reconciler.Validate(ctx, &role); err != nil { | ||
| if err := workloadReconciler.Validate(roleCtx, &role); err != nil { | ||
| logger.Error(err, "Failed to validate role declaration") | ||
| r.recorder.Eventf( | ||
| rbg, corev1.EventTypeWarning, FailedReconcileWorkload, | ||
| "Failed to validate role %s declaration, err: %v", role.Name, err, | ||
| ) | ||
| return ctrl.Result{}, err | ||
| errs = append(errs, fmt.Errorf("role %s: %w", role.Name, err)) | ||
| } | ||
|
|
||
| roleReconciler[role.Name] = reconciler | ||
|
|
||
| roleStatus, updateRoleStatus, err := reconciler.ConstructRoleStatus(roleCtx, rbg, &role) | ||
| if err != nil { | ||
| if !apierrors.IsNotFound(err) { | ||
| r.recorder.Eventf( | ||
| rbg, corev1.EventTypeWarning, FailedReconcileWorkload, | ||
| "Failed to construct role %s status: %v", role.Name, err, | ||
| ) | ||
| return ctrl.Result{}, err | ||
| } | ||
| logger.V(1).Info("Workload not found for role, using empty status", "role", role.Name) | ||
| } | ||
| updateStatus = updateStatus || updateRoleStatus | ||
| roleStatuses = append(roleStatuses, roleStatus) | ||
| } |
There was a problem hiding this comment.
The loop for _, role := range rbg.Spec.Roles creates a copy of the role for each iteration. Taking the address of this loop variable (&role) can be risky if the called function stores the pointer, as it would point to the same memory location which gets updated in each iteration. Although it's not causing a bug here, it's a common pitfall in Go. It's safer to iterate by index and get a pointer to the slice element directly.
for i := range rbg.Spec.Roles {
role := &rbg.Spec.Roles[i]
roleCtx := log.IntoContext(ctx, logger.WithValues("role", role.Name))
workloadReconciler, err := r.getOrCreateWorkloadReconciler(roleCtx, role.Workload)
if err != nil {
logger.Error(err, "Failed to create workload reconciler")
r.recorder.Eventf(
rbg, corev1.EventTypeWarning, FailedReconcileWorkload,
"Failed to validate role %s, err: %v", role.Name, err,
)
err_s := append(errs, fmt.Errorf("role %s: %w", role.Name, err))
continue
}
if err := workloadReconciler.Validate(roleCtx, role); err != nil {
logger.Error(err, "Failed to validate role declaration")
r.recorder.Eventf(
rbg, corev1.EventTypeWarning, FailedReconcileWorkload,
"Failed to validate role %s declaration, err: %v", role.Name, err,
)
err_s := append(errs, fmt.Errorf("role %s: %w", role.Name, err))
}
}| func (r *RoleBasedGroupReconciler) cleanup(ctx context.Context, rbg *workloadsv1alpha1.RoleBasedGroup) error { | ||
| // Delete orphan roles | ||
| if err := r.deleteOrphanRoles(ctx, rbg); err != nil { | ||
| r.recorder.Eventf( | ||
| rbg, corev1.EventTypeWarning, "delete role error", | ||
| "Failed to delete roles for %s: %v", rbg.Name, err, | ||
| "Failed to delete orphan roles for %s: %v", rbg.Name, err, | ||
| ) | ||
| return ctrl.Result{}, err | ||
| return err | ||
| } | ||
|
|
||
| // delete expired controllerRevision | ||
| // Delete expired controllerRevision | ||
| if _, err := utils.CleanExpiredRevision(ctx, r.client, rbg); err != nil { | ||
| r.recorder.Eventf( | ||
| rbg, corev1.EventTypeWarning, "delete expired revision error", | ||
| "Failed to delete expired revision for %s: %v", rbg.Name, err, | ||
| ) | ||
| return ctrl.Result{}, err | ||
| return err | ||
| } | ||
|
|
||
| r.recorder.Event(rbg, corev1.EventTypeNormal, Succeed, "ReconcileSucceed") | ||
| return ctrl.Result{}, nil | ||
| return nil | ||
| } |
There was a problem hiding this comment.
The current implementation of the cleanup function returns early if deleteOrphanRoles fails, which prevents CleanExpiredRevision from running. To make the cleanup process more robust, it's better to execute all cleanup steps and aggregate any errors that occur. This ensures that even if one cleanup step fails, the others are still attempted.
func (r *RoleBasedGroupReconciler) cleanup(ctx context.Context, rbg *workloadsv1alpha1.RoleBasedGroup) error {
var allErrs []error
// Delete orphan roles
if err := r.deleteOrphanRoles(ctx, rbg); err != nil {
r.recorder.Eventf(
rbg, corev1.EventTypeWarning, "delete role error",
"Failed to delete orphan roles for %s: %v", rbg.Name, err,
)
allErrs = append(allErrs, err)
}
// Delete expired controllerRevision
if _, err := utils.CleanExpiredRevision(ctx, r.client, rbg); err != nil {
r.recorder.Eventf(
rbg, corev1.EventTypeWarning, "delete expired revision error",
"Failed to delete expired revision for %s: %v", rbg.Name, err,
)
allErrs = append(allErrs, err)
}
return kerrors.NewAggregate(allErrs)
}| for _, role := range rbg.Spec.Roles { | ||
| logger := log.FromContext(ctx) | ||
| roleCtx := log.IntoContext(ctx, logger.WithValues("role", role.Name)) | ||
|
|
||
| reconciler, err := r.getOrCreateWorkloadReconciler(ctx, role.Workload) | ||
| if err != nil { | ||
| logger.Error(err, "Failed to get workload reconciler") | ||
| r.recorder.Eventf( | ||
| rbg, corev1.EventTypeWarning, FailedReconcileWorkload, | ||
| "Failed to get workload reconciler for role %s, err: %v", role.Name, err, | ||
| ) | ||
| return nil, err | ||
| } else { | ||
| logger.Info(fmt.Sprintf("Create revision [%s] successfully", expectedRevision.Name)) | ||
| r.recorder.Event(rbg, corev1.EventTypeNormal, SucceedCreateRevision, "Successful create revision for RoleBasedGroup") | ||
| } | ||
|
|
||
| roleStatus, updateRoleStatus, err := reconciler.ConstructRoleStatus(roleCtx, rbg, &role) | ||
| if err != nil { | ||
| if !apierrors.IsNotFound(err) { | ||
| r.recorder.Eventf( | ||
| rbg, corev1.EventTypeWarning, FailedReconcileWorkload, | ||
| "Failed to construct role %s status: %v", role.Name, err, | ||
| ) | ||
| return nil, err | ||
| } | ||
| } | ||
| updateStatus = updateStatus || updateRoleStatus | ||
| roleStatuses = append(roleStatuses, roleStatus) | ||
| } |
There was a problem hiding this comment.
Similar to the preCheck function, this loop takes the address of a loop variable (&role). This is a potential source of bugs if the pointer is stored by any of the functions it's passed to. To avoid this common Go pitfall, it's safer to iterate using an index and get a pointer to the actual slice element.
for i := range rbg.Spec.Roles {
role := &rbg.Spec.Roles[i]
logger := log.FromContext(ctx)
roleCtx := log.IntoContext(ctx, logger.WithValues("role", role.Name))
reconciler, err := r.getOrCreateWorkloadReconciler(ctx, role.Workload)
if err != nil {
logger.Error(err, "Failed to get workload reconciler")
r.recorder.Eventf(
rbg, corev1.EventTypeWarning, FailedReconcileWorkload,
"Failed to get workload reconciler for role %s, err: %v", role.Name, err,
)
return nil, err
}
roleStatus, updateRoleStatus, err := reconciler.ConstructRoleStatus(roleCtx, rbg, role)
if err != nil {
if !apierrors.IsNotFound(err) {
r.recorder.Eventf(
rbg, corev1.EventTypeWarning, FailedReconcileWorkload,
"Failed to construct role %s status: %v", role.Name, err,
)
return nil, err
}
}
updateStatus = updateStatus || updateRoleStatus
roleStatuses = append(roleStatuses, roleStatus)
}There was a problem hiding this comment.
Code Review
This pull request is a great step towards simplifying the Reconcile function in the RoleBasedGroupReconciler. By breaking down the monolithic function into smaller, well-defined steps (preCheck, handleRevisions, reconcileRoles, cleanup, etc.), you've significantly improved the code's readability and maintainability. The introduction of a cache for WorkloadReconciler instances is also a nice performance optimization.
My review includes a couple of suggestions to further improve error handling by aggregating errors instead of returning early, which will provide a better experience for users by reporting all validation or cleanup issues at once.
Overall, this is a solid refactoring. Well done!
| func (r *RoleBasedGroupReconciler) preCheck(ctx context.Context, rbg *workloadsv1alpha1.RoleBasedGroup) error { | ||
| logger := log.FromContext(ctx) | ||
| // Validate RoleTemplates | ||
| if err := workloadsv1alpha1.ValidateRoleTemplates(rbg); err != nil { | ||
| r.recorder.Event(rbg, corev1.EventTypeWarning, InvalidRoleTemplates, err.Error()) | ||
| return errors.Wrap(err, "invalid role templates") | ||
| } | ||
|
|
||
| // Validate template references in roles | ||
| if err := workloadsv1alpha1.ValidateRoleTemplateReferences(rbg); err != nil { | ||
| r.recorder.Event(rbg, corev1.EventTypeWarning, InvalidTemplateRef, err.Error()) | ||
| return errors.Wrap(err, "invalid template references") | ||
| } | ||
|
|
||
| // Validate role workload declarations | ||
| var errs []error | ||
| for _, role := range rbg.Spec.Roles { | ||
| logger := log.FromContext(ctx) | ||
| roleCtx := log.IntoContext(ctx, logger.WithValues("role", role.Name)) | ||
| // first check whether watch lws cr | ||
| dynamicWatchCustomCRD(roleCtx, role.Workload.Kind) | ||
|
|
||
| reconciler, err := reconciler.NewWorkloadReconciler(role.Workload, r.scheme, r.client) | ||
| workloadReconciler, err := r.getOrCreateWorkloadReconciler(roleCtx, role.Workload) | ||
| if err != nil { | ||
| logger.Error(err, "Failed to create workload reconciler") | ||
| r.recorder.Eventf( | ||
| rbg, corev1.EventTypeWarning, FailedReconcileWorkload, | ||
| "Failed to reconcile role %s, err: %v", role.Name, err, | ||
| "Failed to validate role %s, err: %v", role.Name, err, | ||
| ) | ||
| return ctrl.Result{}, err | ||
| errs = append(errs, fmt.Errorf("role %s: %w", role.Name, err)) | ||
| continue | ||
| } | ||
|
|
||
| if err := reconciler.Validate(ctx, &role); err != nil { | ||
| if err := workloadReconciler.Validate(roleCtx, &role); err != nil { | ||
| logger.Error(err, "Failed to validate role declaration") | ||
| r.recorder.Eventf( | ||
| rbg, corev1.EventTypeWarning, FailedReconcileWorkload, | ||
| "Failed to validate role %s declaration, err: %v", role.Name, err, | ||
| ) | ||
| return ctrl.Result{}, err | ||
| errs = append(errs, fmt.Errorf("role %s: %w", role.Name, err)) | ||
| } | ||
|
|
||
| roleReconciler[role.Name] = reconciler | ||
|
|
||
| roleStatus, updateRoleStatus, err := reconciler.ConstructRoleStatus(roleCtx, rbg, &role) | ||
| if err != nil { | ||
| if !apierrors.IsNotFound(err) { | ||
| r.recorder.Eventf( | ||
| rbg, corev1.EventTypeWarning, FailedReconcileWorkload, | ||
| "Failed to construct role %s status: %v", role.Name, err, | ||
| ) | ||
| return ctrl.Result{}, err | ||
| } | ||
| logger.V(1).Info("Workload not found for role, using empty status", "role", role.Name) | ||
| } | ||
| updateStatus = updateStatus || updateRoleStatus | ||
| roleStatuses = append(roleStatuses, roleStatus) | ||
| } | ||
|
|
||
| // Update the status based on the observed role statuses. | ||
| if updateStatus { | ||
| if err := r.updateRBGStatus(ctx, rbg, roleStatuses); err != nil { | ||
| r.recorder.Eventf( | ||
| rbg, corev1.EventTypeWarning, FailedUpdateStatus, | ||
| "Failed to update status for %s: %v", rbg.Name, err, | ||
| ) | ||
| return ctrl.Result{}, err | ||
| } | ||
| } | ||
| return errors.Wrap(kerrors.NewAggregate(errs), "invalid role workload declarations") | ||
| } |
There was a problem hiding this comment.
The preCheck function currently returns early on the first validation failure for RoleTemplates and RoleTemplateReferences. This prevents aggregation of all validation errors, which can be inconvenient for users who have to fix issues one by one. It's better to collect all validation errors and return them as a single aggregate error. This provides a more comprehensive feedback loop.
func (r *RoleBasedGroupReconciler) preCheck(ctx context.Context, rbg *workloadsv1alpha1.RoleBasedGroup) error {
logger := log.FromContext(ctx)
var errs []error
// Validate RoleTemplates
if err := workloadsv1alpha1.ValidateRoleTemplates(rbg); err != nil {
r.recorder.Event(rbg, corev1.EventTypeWarning, InvalidRoleTemplates, err.Error())
errs = append(errs, errors.Wrap(err, "invalid role templates"))
}
// Validate template references in roles
if err := workloadsv1alpha1.ValidateRoleTemplateReferences(rbg); err != nil {
r.recorder.Event(rbg, corev1.EventTypeWarning, InvalidTemplateRef, err.Error())
errs = append(errs, errors.Wrap(err, "invalid template references"))
}
// Validate role workload declarations
for _, role := range rbg.Spec.Roles {
roleCtx := log.IntoContext(ctx, logger.WithValues("role", role.Name))
workloadReconciler, err := r.getOrCreateWorkloadReconciler(roleCtx, role.Workload)
if err != nil {
logger.Error(err, "Failed to create workload reconciler")
r.recorder.Eventf(
rbg, corev1.EventTypeWarning, FailedReconcileWorkload,
"Failed to validate role %s, err: %v", role.Name, err,
)
errs = append(errs, fmt.Errorf("role %s: %w", role.Name, err))
continue
}
if err := workloadReconciler.Validate(roleCtx, &role); err != nil {
logger.Error(err, "Failed to validate role declaration")
r.recorder.Eventf(
rbg, corev1.EventTypeWarning, FailedReconcileWorkload,
"Failed to validate role %s declaration, err: %v", role.Name, err,
)
errs = append(errs, fmt.Errorf("role %s: %w", role.Name, err))
}
}
return kerrors.NewAggregate(errs)
}| func (r *RoleBasedGroupReconciler) cleanup(ctx context.Context, rbg *workloadsv1alpha1.RoleBasedGroup) error { | ||
| // Delete orphan roles | ||
| if err := r.deleteOrphanRoles(ctx, rbg); err != nil { | ||
| r.recorder.Eventf( | ||
| rbg, corev1.EventTypeWarning, "delete role error", | ||
| "Failed to delete roles for %s: %v", rbg.Name, err, | ||
| "Failed to delete orphan roles for %s: %v", rbg.Name, err, | ||
| ) | ||
| return ctrl.Result{}, err | ||
| return err | ||
| } | ||
|
|
||
| // delete expired controllerRevision | ||
| // Delete expired controllerRevision | ||
| if _, err := utils.CleanExpiredRevision(ctx, r.client, rbg); err != nil { | ||
| r.recorder.Eventf( | ||
| rbg, corev1.EventTypeWarning, "delete expired revision error", | ||
| "Failed to delete expired revision for %s: %v", rbg.Name, err, | ||
| ) | ||
| return ctrl.Result{}, err | ||
| return err | ||
| } | ||
|
|
||
| r.recorder.Event(rbg, corev1.EventTypeNormal, Succeed, "ReconcileSucceed") | ||
| return ctrl.Result{}, nil | ||
| return nil | ||
| } |
There was a problem hiding this comment.
The cleanup function returns on the first error, which means if deleteOrphanRoles fails, CleanExpiredRevision will not be executed. Since these two cleanup tasks are independent, it's better to execute both and aggregate any errors. This ensures that the cleanup process is as complete as possible in each reconciliation cycle.
func (r *RoleBasedGroupReconciler) cleanup(ctx context.Context, rbg *workloadsv1alpha1.RoleBasedGroup) error {
var allErrs []error
// Delete orphan roles
if err := r.deleteOrphanRoles(ctx, rbg); err != nil {
r.recorder.Eventf(
rbg, corev1.EventTypeWarning, "delete role error",
"Failed to delete orphan roles for %s: %v", rbg.Name, err,
)
allErrs = append(allErrs, err)
}
// Delete expired controllerRevision
if _, err := utils.CleanExpiredRevision(ctx, r.client, rbg); err != nil {
r.recorder.Eventf(
rbg, corev1.EventTypeWarning, "delete expired revision error",
"Failed to delete expired revision for %s: %v", rbg.Name, err,
)
allErrs = append(allErrs, err)
}
return kerrors.NewAggregate(allErrs)
}| logger.Error(err, "Failed get or create revision") | ||
| return nil, err | ||
| // Fast path: check if reconciler already exists with read lock | ||
| r.reconcilerMu.RLock() |
There was a problem hiding this comment.
Ⅰ. Motivation
Ⅱ. Modifications
Ⅲ. Does this pull request fix one issue?
fixes #XXXX
Ⅳ. List the added test cases (unit test/integration test) if any, please explain if no tests are needed.
Ⅴ. Describe how to verify it
VI. Special notes for reviews
Checklist
make fmt.