Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThis PR implements worker selector propagation from parent DAGs to sub-DAGs during execution. A new Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/runtime/executor/dag_runner.go (1)
439-444:⚠️ Potential issue | 🟠 MajorTrack distributed retry runs before waiting.
Killonly requests coordinator cancellation for IDs indistributedRuns. The distributedExecutepath records the run before dispatch, butRetrydoes not, so canceling a parent while a distributed step retry is running can leave the coordinator task uncanceled and wait until timeout.Proposed fix
if e.shouldDispatchToCoordinator(rCtx.DefaultExecMode) { logger.Info(ctx, "Retrying sub DAG via distributed execution", tag.Step(stepName)) + e.mu.Lock() + e.distributedRuns[runParams.RunID] = true + e.mu.Unlock() if err := e.dispatchRetryToCoordinator(ctx, runParams, stepName); err != nil { return nil, fmt.Errorf("distributed step retry failed: %w", err) } return e.waitCompletion(ctx, runParams.RunID) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/runtime/executor/dag_runner.go` around lines 439 - 444, The retry path doesn't register the distributed run ID before dispatching, so the Kill logic (which cancels IDs in distributedRuns) can miss it; modify the retry flow in the block guarded by e.shouldDispatchToCoordinator to record the run into the same distributed run tracking used by the Execute path (the distributedRuns registry/collection on the executor) prior to calling e.dispatchRetryToCoordinator(ctx, runParams, stepName), and ensure any cleanup or removal remains paired (e.g., after waitCompletion or on error) so the coordinator cancellation logic sees the retry run ID.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@internal/runtime/executor/dag_runner.go`:
- Around line 439-444: The retry path doesn't register the distributed run ID
before dispatching, so the Kill logic (which cancels IDs in distributedRuns) can
miss it; modify the retry flow in the block guarded by
e.shouldDispatchToCoordinator to record the run into the same distributed run
tracking used by the Execute path (the distributedRuns registry/collection on
the executor) prior to calling e.dispatchRetryToCoordinator(ctx, runParams,
stepName), and ensure any cleanup or removal remains paired (e.g., after
waitCompletion or on error) so the coordinator cancellation logic sees the retry
run ID.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 26d170d0-16f0-41ab-a2c0-babdc226ed30
📒 Files selected for processing (4)
internal/intg/distr/subdag_test.gointernal/runtime/builtin/dag/dag.gointernal/runtime/builtin/dag/parallel.gointernal/runtime/executor/dag_runner.go
Summary
worker_selectorthrough sub-DAG and parallel sub-DAG executioncallstep whose child DAG has no DAG-level selectorRoot Cause
worker_selectoroncallsteps was parsed and validated, butSubDAGExecutoronly consulted the child DAG's ownWorkerSelector. When the selector existed only on the parent step, the child DAG was treated as local and the coordinator task was never created with the requested worker labels.Testing
go test ./internal/intg/distr -run 'TestSubDAG_CallStepWorkerSelector/immediateParentDispatchesChildUsingCallStepSelector' -count=1 -vgo test ./internal/intg/distr -run 'TestSubDAG' -count=1go test ./internal/runtime/executor ./internal/runtime/builtin/dag -count=1go test ./internal/intg/distr -count=1Closes #1638
Summary by CodeRabbit
Release Notes
New Features
Tests