A sibling hunt during PR #636 (which fixed #614 by bounding the 5 Docker SDK calls in the cli/web layers) found 14 more unbounded `context.Background()` / `context.TODO()` sites in `core/` and `middlewares/`.
High-severity (job execution can hang on a slow upstream)
- `core/common.go:64` — `Ctx: context.Background()` in `NewContext` factory. All jobs inherit this unless caller overrides. Foundational.
- `core/scheduler.go:216` — `m.RunWithContext(context.Background())` in `maxConcurrentSkipJob.Run()`. No timeout on job execution.
- `core/scheduler.go:818` — `w.runWithCtx(context.Background())` in scheduler watchdog. Context never propagated.
- `core/execjob.go:47` — fallback `runCtx = context.Background()` when ctx.Ctx is nil.
- `core/runjob.go:107` — same fallback for run jobs.
- `core/runservice.go:72` — same fallback for service jobs.
- `core/localjob.go:54` — `ResolveJobEnvironment(context.Background(), ...)` for env-from-file/env-from-container resolution.
Lower priority (already bounded or CLI-only)
- `core/resilient_job.go:101` — `execCtx := context.Background()` then `WithTimeout` applied immediately at line 105. OK.
- `core/shutdown.go:95,214` — both have explicit `WithTimeout`/`WithCancel`. OK.
- `middlewares/preset.go:220` — has 30s WithTimeout. OK.
- `middlewares/slack.go:103` — has m.Client.Timeout WithTimeout. OK.
Suggested approach
This isn't a single PR. Suggest tackling in three follow-ups:
- Job-context plumbing (high): make the `Ctx` field of `*core.Context` mandatory and non-nil, drop the four fallbacks, and force callers (scheduler, executors) to pass a real bounded context.
- Scheduler watchdog + maxConcurrentSkipJob: bound with the job's configured `max-runtime` (already a global config setting).
- Job env resolution: bound `ResolveJobEnvironment` so env-from-container can't hang on a slow Docker daemon.
Severity
Medium overall — these don't surface as readily as the Docker pings PR #636 fixed (those hung at startup; these would only hang during actual job runs). But they're the next class of "Ofelia silently stops responding" bugs.
Related
- #608 → #611: NegotiateAPIVersion bound
- #614 → #636: web/cli Docker calls bound
- This issue: the deeper core/ paths