-
-
Notifications
You must be signed in to change notification settings - Fork 69.1k
Feature Request: Model Fallback Support for Cron Jobs #26120
Copy link
Copy link
Closed
Description
Problem Statement
Currently, cron jobs have no automatic model fallback mechanism. If a job specifies a specific model (e.g., kimi-coding/k2p5) and that model hits a rate limit or quota, the job simply fails with consecutiveErrors++.
The Self-Heal Cron may retry failed jobs, but it retries with the same model, which often results in repeated failures during rate-limiting events.
Current Workarounds
Users must either:
- Set all cron jobs to a single "safe" model (expensive)
- Implement manual fallback logic inside every cron prompt (tedious, error-prone)
- Accept periodic job failures during quota/rate-limit events
Proposed Solution
Add a fallbacks field to cron job payloads, similar to how agents already support model fallbacks:
{
"payload": {
"kind": "agentTurn",
"message": "Do the thing...",
"model": "kimi-coding/k2p5",
"fallbacks": [
"anthropic/claude-sonnet-4-6",
"openai-codex/gpt-5.3-codex"
]
}
}Expected Behavior
- Cron executes with primary model
- If primary fails with rate-limit/quota error → automatic retry with fallback[0]
- If fallback[0] fails → retry with fallback[1]
- Only increment
consecutiveErrorsif all models exhausted - Log which model succeeded for observability
Use Case
We run 20+ cron jobs for routine tasks (health checks, digests, monitoring). We want:
- Primary: Cheap/fast models (MiniMax, Kimi) for cost efficiency
- Fallback: Premium models (Sonnet, Opus) when quotas are hit
- Result: Jobs complete reliably without manual intervention or cost bloat
Prior Art
Agent configs already support this:
"agents": {
"defaults": {
"model": {
"primary": "anthropic/claude-sonnet-4-6",
"fallbacks": ["openai-codex/gpt-5.3-codex", "kimi-coding/k2p5"]
}
}
}Extending this pattern to cron jobs would provide consistency across the platform.
Additional Context
- Runtime: OpenClaw 2026.2.23
- Current cron count: 22 jobs
- Pain point: Rate limits on budget models cause cascading job failures during high-activity periods
Happy to provide more details or help test if this gets prioritized!
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels
Type
Fields
Give feedbackNo fields configured for issues without a type.