Preserve DAG env parity in subprocess relaunches#1860
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughAdds a presolved build-environment transport and env-resolution flow so explicit host env values are encoded by the CLI, carried through subprocess invocations, and reconstructed for subprocesses when restoring DAGs or retrying/enqueuing runs. Changes
Sequence Diagram(s)sequenceDiagram
participant CLI as CLI (dagu)
participant API as API / Frontend
participant Scheduler as Scheduler
participant Worker as Worker / TaskHandler
participant Subproc as Subprocess (built exec)
CLI->>CLI: presolve build env (buildenv.Encode)
CLI->>API: start/enqueue/retry request + presolved env vars
API->>Scheduler: enqueue or dispatch job (carries presolved env)
Scheduler->>Worker: assign task (includes dag, presolved env)
Worker->>Worker: prepare DAG for subprocess (spec.ResolveEnv)
Worker->>Subproc: launch built executable with presolved env in os.Env
Subproc->>Subproc: subprocess reconstructs env (buildenv.Load -> apply)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (1)
internal/cmd/start_test.go (1)
24-27: Spawn the built CLI with a deadline.A hung subprocess here blocks the whole suite until the global
go testtimeout becauseCombinedOutput()is running on a plainexec.Command. Useexec.CommandContextwith a timeout or a deadline derived fromt.Deadline().Suggested hardening
-func runBuiltCLICommand(th test.Command, extraEnv []string, args ...string) ([]byte, error) { - cmd := osexec.Command(th.Config.Paths.Executable, test.WithConfigFlag(args, th.Config)...) +func runBuiltCLICommand(t *testing.T, th test.Command, extraEnv []string, args ...string) ([]byte, error) { + t.Helper() + + ctx := context.Background() + if deadline, ok := t.Deadline(); ok { + var cancel context.CancelFunc + ctx, cancel = context.WithDeadline(ctx, deadline) + defer cancel() + } + + cmd := osexec.CommandContext(ctx, th.Config.Paths.Executable, test.WithConfigFlag(args, th.Config)...) cmd.Env = append(append([]string{}, th.ChildEnv...), extraEnv...) return cmd.CombinedOutput() } @@ - output, err := runBuiltCLICommand(th, extraEnv, args...) + output, err := runBuiltCLICommand(t, th, extraEnv, args...)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/cmd/start_test.go` around lines 24 - 27, runBuiltCLICommand currently spawns the subprocess via osexec.Command and calls CombinedOutput(), which can hang the test runner; change runBuiltCLICommand to accept a context (or *testing.T) and use exec.CommandContext with a context created from t.Deadline() (or a sensible timeout) so the subprocess is killed on timeout, then call cmd.CombinedOutput() on that command; keep the same env handling (th.ChildEnv and extraEnv) and preserve the call sites by passing a context or t through to runBuiltCLICommand.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/cmd/retry_test.go`:
- Around line 227-230: The current assertion only checks the left side of RESULT
and can miss leaked raw host variable; update the test after
retriedAttempt.ReadStatus to fetch the full RESULT via statusOutputValue(t,
retriedStatus, "RESULT") and assert that it equals "from-host|" (or split on "|"
and assert the second segment is empty) instead of only comparing the left
segment, ensuring the raw CMD_RETRY_EXPLICIT_ENV is not preserved.
In `@internal/cmn/buildenv/env.go`:
- Around line 30-33: The current loop appends presolved values into process env
using PresolvedEnvPrefix (building "PresolvedEnvPrefix+key=..."), which exposes
secrets; stop serializing presolved entries into the child environment in the
Encode path. Instead write the presolved payload (the entries map for keys) to a
secure temporary file or create a pipe and pass only a single reference (e.g.,
PRESOLVED_PAYLOAD_PATH or a file descriptor) to the child; remove the append of
PresolvedEnvPrefix+key from the extra env slice and update the child-side loader
(the code that reads presolved envs) to read the payload from that path/pipe and
then delete/close the temp resource. Ensure functions/variables referenced are
Encode, PresolvedEnvPrefix, entries, and keys so reviewers can locate and
replace the env export with file/pipe transport and corresponding reader
changes.
In `@internal/core/spec/loader.go`:
- Around line 233-250: When loading multi-document YAML from memory in
loadDAGsFromData the code only returns dags[0] and never preserves secondary
documents, causing dag.LocalDAGs to be lost; update the branch that handles
dags, err := loadDAGsFromData(...) so that after selecting dag := dags[0] you
set dag.LocalDAGs (or equivalent) to the remaining documents (e.g., dags[1:]) so
sub-DAG definitions are retained, then continue to set dag.YamlData,
dag.BaseConfigData, WorkingDir/WorkingDirExplicit and call
core.InitializeDefaults(dag) before returning.
In `@internal/core/spec/runtime_env.go`:
- Around line 65-78: ResolveEnv currently prefers reloading from dag.Location
before using the persisted snapshot in dag.YamlData, causing retries to read
local files and diverge; change the switch in ResolveEnv (the cases handling
dag.Location and dag.YamlData) to check for non-empty dag.YamlData first and
call LoadYAML(ctx, dag.YamlData, ...) before falling back to Load(ctx,
dag.Location, ...), keeping the same error handling and returning
append([]string{}, fresh.Env...) in both branches.
---
Nitpick comments:
In `@internal/cmd/start_test.go`:
- Around line 24-27: runBuiltCLICommand currently spawns the subprocess via
osexec.Command and calls CombinedOutput(), which can hang the test runner;
change runBuiltCLICommand to accept a context (or *testing.T) and use
exec.CommandContext with a context created from t.Deadline() (or a sensible
timeout) so the subprocess is killed on timeout, then call cmd.CombinedOutput()
on that command; keep the same env handling (th.ChildEnv and extraEnv) and
preserve the call sites by passing a context or t through to runBuiltCLICommand.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: a6f1be5d-f523-4af3-84e1-e511c2aed6c2
📒 Files selected for processing (19)
internal/cmd/helper.gointernal/cmd/helper_test.gointernal/cmd/restart_test.gointernal/cmd/retry_test.gointernal/cmd/start.gointernal/cmd/start_test.gointernal/cmn/buildenv/env.gointernal/core/spec/loader.gointernal/core/spec/runtime_env.gointernal/core/spec/variables.gointernal/intg/queue/queue_test.gointernal/runtime/subcmd.gointernal/runtime/subcmd_test.gointernal/service/frontend/api/v1/dagruns.gointernal/service/frontend/api/v1/dags_test.gointernal/service/scheduler/dag_executor.gointernal/service/scheduler/dag_executor_test.gointernal/service/worker/handler.gointernal/service/worker/handler_test.go
# Conflicts: # internal/runtime/subcmd.go # internal/service/frontend/api/v1/dags_test.go
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (3)
internal/core/spec/loader.go (1)
256-298:⚠️ Potential issue | 🟠 Major
LoadYAMLWithOptsdoes not preserveLocalDAGsfrom multi-document YAML.When loading from raw YAML data (not file), this function only processes the first document via
def.build(buildCtx). If the input contains multiple YAML documents (separated by---), secondary documents that define sub-DAGs are silently dropped. The file-basedloadDAGpath at lines 396-404 correctly populatesmainDAG.LocalDAGs, but this memory-based path does not.This could cause issues when rebuilding DAGs from
YamlDataduring retry/restart if the original DAG contained inline sub-DAGs.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/core/spec/loader.go` around lines 256 - 298, LoadYAMLWithOpts currently only builds the first parsed document via def.build(buildCtx), which drops additional documents and therefore loses LocalDAGs; update the memory-path in LoadYAMLWithOpts to iterate over all parsed documents (similar to the file-path behavior that sets mainDAG.LocalDAGs), building each document with def.build(buildCtx) and either merging or appending the resulting DAGs into dest.LocalDAGs (and merging into dest where appropriate), while preserving existing logic around baseDef/build errors, dest.YamlData and dest.BaseConfigData; use the existing symbols def, buildCtx, baseDef, dest, dag and the merge function to locate where to add the loop/aggregation.internal/cmn/buildenv/env.go (1)
30-34:⚠️ Potential issue | 🟠 MajorSecurity: Presolved env values are exposed in process environment.
This serializes presolved values (which may include secrets defined in
env:blocks) into child process environments viaPresolvedEnvPrefix+key. These values become visible in/proc/<pid>/environ, crash dumps, and debug tooling, increasing secret exposure surface.Consider an alternative transport mechanism (temp file with restricted permissions, pipe, or file descriptor) for sensitive payloads.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/cmn/buildenv/env.go` around lines 30 - 34, The code currently serializes presolved values into child process env using PresolvedEnvPrefix+key (the loop building extra), which exposes secrets; change this to avoid putting secrets in the environment: instead write the presolved payload to a temp file with strict permissions (0600) or a pipe/FD and return only a single reference (e.g., PRESOLVED_ENV_FILE) or file descriptor identifier to the child process; update the function that builds extras (the loop over keys producing PresolvedEnvPrefix+key entries) to (1) write entries[key] to a securely created temp file or a dedicated FD, (2) add only a single env var like PRESOLVED_ENV_FILE=<path> or pass the FD, and (3) ensure the file/FD is properly cleaned up and access-restricted—do not place individual PresolvedEnvPrefix+key entries into the environment.internal/core/spec/runtime_env.go (1)
65-78:⚠️ Potential issue | 🟠 MajorPrefer the persisted YAML snapshot over
Location.Line 66 currently reloads from
dag.Locationbeforedag.YamlData. If the file changed after the run was queued or persisted, env reconstruction will silently pick up newer on-disk content instead of the stored snapshot, which breaks retry/restart parity.Suggested fix
switch { - case dag.Location != "": - fresh, err := Load(ctx, dag.Location, loadOpts...) - if err != nil { - return nil, err - } - return append([]string{}, fresh.Env...), nil - case len(dag.YamlData) > 0: fresh, err := LoadYAML(ctx, dag.YamlData, loadOpts...) if err != nil { return nil, err } return append([]string{}, fresh.Env...), nil + + case dag.Location != "": + fresh, err := Load(ctx, dag.Location, loadOpts...) + if err != nil { + return nil, err + } + return append([]string{}, fresh.Env...), nil default: return append([]string{}, dag.Env...), nil }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/core/spec/runtime_env.go` around lines 65 - 78, The current switch prefers dag.Location over dag.YamlData, causing on-disk changes to override the persisted YAML snapshot; update the logic in the function that reconstructs the environment to check dag.YamlData first and call LoadYAML(ctx, dag.YamlData, loadOpts...) when len(dag.YamlData) > 0 (returning append([]string{}, fresh.Env...)), and only fall back to Load(ctx, dag.Location, loadOpts...) when YamlData is empty, preserving retry/restart parity.
🧹 Nitpick comments (2)
internal/core/spec/variables.go (1)
101-106: Semantic mismatch:EnvSourceDAGEnvused for presolved host-env values.The presolved values from
ctx.opts.BuildEnvoriginated from the host environment (viabuildenv.Load()), not from the DAG'senv:field definition. UsingEnvSourceDAGEnvfor these values is semantically incorrect.Per
internal/cmn/eval/envscope.go,EnvSourceDAGEnvis defined as "From DAG env: field". While this doesn't currently affect resolution (scope lookup ignores source), it could:
- Cause confusion during debugging when tracing variable origins
- Break future logic that uses
EnvSourcefor precedence decisionsConsider adding a dedicated
EnvSourcePresolvedBuildconstant or usingEnvSourceOSto accurately represent the origin.♻️ Suggested approach
In
internal/cmn/eval/envscope.go, add:EnvSourcePresolved EnvSource = "presolved" // From presolved build env transportThen in this file:
if presolved, ok := ctx.opts.BuildEnv[p.key]; ok { value = presolved - scope = scope.WithEntry(p.key, value, eval.EnvSourceDAGEnv) + scope = scope.WithEntry(p.key, value, eval.EnvSourcePresolved) vars[p.key] = value continue }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/core/spec/variables.go` around lines 101 - 106, The presolved values taken from ctx.opts.BuildEnv are annotated with EnvSourceDAGEnv but should be marked as a distinct source; add a new EnvSource constant (e.g., EnvSourcePresolved or EnvSourcePresolvedBuild) to internal/cmn/eval/envscope.go and then change the scope.WithEntry call in variables.go (the block referencing ctx.opts.BuildEnv and presolved) to use that new EnvSource constant instead of EnvSourceDAGEnv so the entry correctly reflects "presolved build env" as the origin.internal/cmd/start_test.go (1)
24-57: Extract the built-CLI/output helpers intointernal/test.
runBuiltCLI*andstatusOutputValuenow duplicate logic that also exists ininternal/runtime/subcmd_test.goandinternal/intg/queue/queue_test.go. A small shared helper would keep env injection andkey=trimming consistent across all three suites.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/cmd/start_test.go` around lines 24 - 57, Create a shared test helper package (internal/test) and move the duplicate helpers into it: implement exported functions RunBuiltCLICommand(th test.Command, extraEnv []string, args ...string) ([]byte, error), RunBuiltCLI(t *testing.T, th test.Command, extraEnv []string, args ...string) string and StatusOutputValue(t *testing.T, status *exec.DAGRunStatus, key string) string that preserve the exact behavior (env injection via th.ChildEnv + extraEnv, using test.WithConfigFlag for args, require.NoError in RunBuiltCLI, and trimming the "key=" prefix from output in StatusOutputValue); then replace the local implementations in internal/cmd/start_test.go, internal/runtime/subcmd_test.go and internal/intg/queue/queue_test.go with imports from internal/test and call the new exported functions, removing the duplicated code.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/service/worker/handler.go`:
- Around line 127-135: Replace use of task.RootDagRunName when building load
options with the actual DAG identifier from the task (e.g., task.DagName or
task.Dag) so spec.Load uses the executed DAG rather than the root run name; keep
the dagNameHint(originalTarget) fallback but prefer the task's DAG field, and
continue to append spec.WithName(dagName) to loadOpts only when that selected
dagName is non-empty.
---
Duplicate comments:
In `@internal/cmn/buildenv/env.go`:
- Around line 30-34: The code currently serializes presolved values into child
process env using PresolvedEnvPrefix+key (the loop building extra), which
exposes secrets; change this to avoid putting secrets in the environment:
instead write the presolved payload to a temp file with strict permissions
(0600) or a pipe/FD and return only a single reference (e.g.,
PRESOLVED_ENV_FILE) or file descriptor identifier to the child process; update
the function that builds extras (the loop over keys producing
PresolvedEnvPrefix+key entries) to (1) write entries[key] to a securely created
temp file or a dedicated FD, (2) add only a single env var like
PRESOLVED_ENV_FILE=<path> or pass the FD, and (3) ensure the file/FD is properly
cleaned up and access-restricted—do not place individual PresolvedEnvPrefix+key
entries into the environment.
In `@internal/core/spec/loader.go`:
- Around line 256-298: LoadYAMLWithOpts currently only builds the first parsed
document via def.build(buildCtx), which drops additional documents and therefore
loses LocalDAGs; update the memory-path in LoadYAMLWithOpts to iterate over all
parsed documents (similar to the file-path behavior that sets
mainDAG.LocalDAGs), building each document with def.build(buildCtx) and either
merging or appending the resulting DAGs into dest.LocalDAGs (and merging into
dest where appropriate), while preserving existing logic around baseDef/build
errors, dest.YamlData and dest.BaseConfigData; use the existing symbols def,
buildCtx, baseDef, dest, dag and the merge function to locate where to add the
loop/aggregation.
In `@internal/core/spec/runtime_env.go`:
- Around line 65-78: The current switch prefers dag.Location over dag.YamlData,
causing on-disk changes to override the persisted YAML snapshot; update the
logic in the function that reconstructs the environment to check dag.YamlData
first and call LoadYAML(ctx, dag.YamlData, loadOpts...) when len(dag.YamlData) >
0 (returning append([]string{}, fresh.Env...)), and only fall back to Load(ctx,
dag.Location, loadOpts...) when YamlData is empty, preserving retry/restart
parity.
---
Nitpick comments:
In `@internal/cmd/start_test.go`:
- Around line 24-57: Create a shared test helper package (internal/test) and
move the duplicate helpers into it: implement exported functions
RunBuiltCLICommand(th test.Command, extraEnv []string, args ...string) ([]byte,
error), RunBuiltCLI(t *testing.T, th test.Command, extraEnv []string, args
...string) string and StatusOutputValue(t *testing.T, status *exec.DAGRunStatus,
key string) string that preserve the exact behavior (env injection via
th.ChildEnv + extraEnv, using test.WithConfigFlag for args, require.NoError in
RunBuiltCLI, and trimming the "key=" prefix from output in StatusOutputValue);
then replace the local implementations in internal/cmd/start_test.go,
internal/runtime/subcmd_test.go and internal/intg/queue/queue_test.go with
imports from internal/test and call the new exported functions, removing the
duplicated code.
In `@internal/core/spec/variables.go`:
- Around line 101-106: The presolved values taken from ctx.opts.BuildEnv are
annotated with EnvSourceDAGEnv but should be marked as a distinct source; add a
new EnvSource constant (e.g., EnvSourcePresolved or EnvSourcePresolvedBuild) to
internal/cmn/eval/envscope.go and then change the scope.WithEntry call in
variables.go (the block referencing ctx.opts.BuildEnv and presolved) to use that
new EnvSource constant instead of EnvSourceDAGEnv so the entry correctly
reflects "presolved build env" as the origin.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: b941120f-b274-4a73-acae-3f9917857a71
📒 Files selected for processing (22)
internal/cmd/helper.gointernal/cmd/helper_test.gointernal/cmd/restart_test.gointernal/cmd/retry_test.gointernal/cmd/start.gointernal/cmd/start_test.gointernal/cmn/buildenv/env.gointernal/core/spec/loader.gointernal/core/spec/loader_test.gointernal/core/spec/runtime_env.gointernal/core/spec/variables.gointernal/intg/queue/queue_test.gointernal/runtime/subcmd.gointernal/runtime/subcmd_test.gointernal/service/frontend/api/v1/dagruns.gointernal/service/frontend/api/v1/dags_test.gointernal/service/scheduler/dag_executor.gointernal/service/scheduler/dag_executor_test.gointernal/service/worker/handler.gointernal/service/worker/handler_test.gointernal/service/worker/worker.gointernal/service/worker/worker_init_test.go
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #1860 +/- ##
==========================================
+ Coverage 69.03% 69.07% +0.03%
==========================================
Files 443 445 +2
Lines 54478 54771 +293
==========================================
+ Hits 37609 37832 +223
- Misses 13541 13581 +40
- Partials 3328 3358 +30
... and 19 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
Summary
start,enqueue,restart, andretrychild commands so built executables, scheduler launches, and API-triggered retries keep explicit env-backed values even when the child process filters host env vars.Testing
go test ./internal/cmd ./internal/runtime -count=1go test ./internal/service/frontend/api/v1 ./internal/service/scheduler ./internal/service/worker -count=1go test ./internal/intg/queue -count=1Closes #1856
Summary by CodeRabbit
Bug Fixes
Tests