fix: properly fall back to greedy scheduler in case of pulp/cbc errors#3733
fix: properly fall back to greedy scheduler in case of pulp/cbc errors#3733johanneskoester merged 1 commit intomainfrom
Conversation
📝 WalkthroughWalkthroughAdds a scheduler reference attribute in JobScheduler. In MILP Scheduler, introduces a technical failure flag, wraps ILP solve in try/except with logging, and adds a fallback path to the greedy scheduler, including early-return behavior indicating no ILP result. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant JS as JobScheduler
participant MILP as MILP Scheduler
participant ILP as ILP Solver
participant GS as Greedy Scheduler
JS->>MILP: select_jobs(selectable, remaining, resources, sizes)
alt Technical failure previously
MILP-->>JS: None (signal fallback)
JS->>GS: select_jobs(...)
GS-->>JS: jobs
else Attempt ILP
MILP->>ILP: Solve model
alt Success & optimal
ILP-->>MILP: Solution
MILP-->>JS: jobs
else Error / timeout / non-optimal
ILP-->>MILP: Failure/Status
MILP->>MILP: Mark technical failure, log
MILP-->>JS: None (trigger fallback)
JS->>GS: select_jobs(...)
GS-->>JS: jobs
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Pre-merge checks (1 passed, 2 warnings)❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Tip 👮 Agentic pre-merge checks are now available in preview!Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Example: reviews:
pre_merge_checks:
custom_checks:
- name: "Undocumented Breaking Changes"
mode: "warning"
instructions: |
Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).Please share your feedback with us on this Discord post. ✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (3)
src/snakemake/scheduling/job_scheduler.py (1)
160-160: New _scheduler reference is unused; verify necessity vs. workflow.scheduler.Storing the scheduler instance here appears redundant since JobScheduler already binds
scheduler.select_jobsandworkflowexposesworkflow.scheduler. If no external code readsJobScheduler._scheduler, consider removing it to avoid confusion, or rename to clarify intent (e.g.,_scheduler_impl). Please confirm if anything relies on this attribute.src/snakemake/scheduling/milp.py (2)
52-53: Make PulpSolverError import robust across pulp versions.Some pulp versions expose
PulpSolverErrordifferently. Fallback to a generic Exception if the symbol isn’t importable to preserve the greedy fallback.Apply:
- from pulp import PulpSolverError + try: + from pulp import PulpSolverError + except Exception: + PulpSolverError = ExceptionAdditionally, consider guarding the
import pulpitself to trigger fallback when pulp is missing.
171-178: Avoid hard-coding the 10s timeout in logs.Tie the message to the actual
time_limitused to keep logs accurate if this changes later.Apply:
- "Failed to solve scheduling problem with ILP solver in time (10s), " + f"Failed to solve scheduling problem with ILP solver in time ({time_limit}s), "
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/snakemake/scheduling/job_scheduler.py(1 hunks)src/snakemake/scheduling/milp.py(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
⚙️ CodeRabbit configuration file
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
Files:
src/snakemake/scheduling/job_scheduler.pysrc/snakemake/scheduling/milp.py
🧠 Learnings (1)
📚 Learning: 2025-07-29T14:53:04.598Z
Learnt from: johanneskoester
PR: snakemake/snakemake#3676
File: src/snakemake/cli.py:0-0
Timestamp: 2025-07-29T14:53:04.598Z
Learning: In the Snakemake repository, deprecated scheduler interfaces (like scheduler_ilp_solver, --scheduler-solver-path, --scheduler-greediness) should be kept for backward compatibility during the migration to the plugin-based scheduler system, rather than being removed immediately.
Applied to files:
src/snakemake/scheduling/job_scheduler.pysrc/snakemake/scheduling/milp.py
🧬 Code graph analysis (2)
src/snakemake/scheduling/job_scheduler.py (1)
src/snakemake/workflow.py (2)
scheduler(481-482)scheduler(485-486)
src/snakemake/scheduling/milp.py (1)
src/snakemake/scheduling/greedy.py (1)
select_jobs(42-113)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (55)
- GitHub Check: tests (10, ubuntu-latest, py313)
- GitHub Check: tests (9, macos-latest, py312)
- GitHub Check: tests (10, windows-2022, py312)
- GitHub Check: tests (9, windows-2022, py313)
- GitHub Check: tests (10, ubuntu-latest, py311)
- GitHub Check: tests (10, ubuntu-latest, py312)
- GitHub Check: tests (10, windows-2022, py313)
- GitHub Check: tests (3, macos-latest, py312)
- GitHub Check: tests (9, ubuntu-latest, py311)
- GitHub Check: tests (8, ubuntu-latest, py312)
- GitHub Check: tests (9, ubuntu-latest, py312)
- GitHub Check: tests (9, windows-2022, py312)
- GitHub Check: tests (9, ubuntu-latest, py313)
- GitHub Check: tests (8, windows-2022, py313)
- GitHub Check: tests (8, ubuntu-latest, py313)
- GitHub Check: tests (7, ubuntu-latest, py311)
- GitHub Check: tests (8, ubuntu-latest, py311)
- GitHub Check: tests (5, windows-2022, py312)
- GitHub Check: tests (7, windows-2022, py313)
- GitHub Check: tests (8, windows-2022, py312)
- GitHub Check: tests (6, ubuntu-latest, py313)
- GitHub Check: tests (7, ubuntu-latest, py312)
- GitHub Check: tests (7, windows-2022, py312)
- GitHub Check: tests (5, ubuntu-latest, py311)
- GitHub Check: tests (7, ubuntu-latest, py313)
- GitHub Check: tests (6, ubuntu-latest, py312)
- GitHub Check: tests (6, ubuntu-latest, py311)
- GitHub Check: tests (5, windows-2022, py313)
- GitHub Check: tests (6, windows-2022, py312)
- GitHub Check: tests (4, windows-2022, py312)
- GitHub Check: tests (5, ubuntu-latest, py312)
- GitHub Check: tests (5, ubuntu-latest, py313)
- GitHub Check: tests (6, windows-2022, py313)
- GitHub Check: tests (4, ubuntu-latest, py312)
- GitHub Check: tests (4, ubuntu-latest, py311)
- GitHub Check: tests (3, ubuntu-latest, py312)
- GitHub Check: tests (4, windows-2022, py313)
- GitHub Check: tests (4, ubuntu-latest, py313)
- GitHub Check: tests (3, windows-2022, py313)
- GitHub Check: tests (3, ubuntu-latest, py313)
- GitHub Check: tests (3, windows-2022, py312)
- GitHub Check: tests (2, ubuntu-latest, py313)
- GitHub Check: tests (2, windows-2022, py313)
- GitHub Check: tests (3, ubuntu-latest, py311)
- GitHub Check: tests (2, ubuntu-latest, py311)
- GitHub Check: tests (2, windows-2022, py312)
- GitHub Check: tests (1, windows-2022, py313)
- GitHub Check: tests (1, macos-latest, py312)
- GitHub Check: tests (1, macos-latest, py313)
- GitHub Check: tests (1, windows-2022, py312)
- GitHub Check: tests (2, ubuntu-latest, py312)
- GitHub Check: tests (1, ubuntu-latest, py313)
- GitHub Check: tests (1, ubuntu-latest, py311)
- GitHub Check: tests (1, ubuntu-latest, py312)
- GitHub Check: apidocs
🔇 Additional comments (1)
src/snakemake/scheduling/milp.py (1)
46-49: Verify baseselect_jobsreturn type allowsNone
OverridingSchedulerBase.select_jobsto returnOptional[Sequence[JobSchedulerInterface]]may violate its declared return type. Please confirm the signature ofSchedulerBase.select_jobs(insnakemake_interface_scheduler_plugins.base) permitsNoneor adjust the interface/override to stay type-safe.
| def __post_init__(self) -> None: | ||
| self._technical_failure = False | ||
|
|
There was a problem hiding this comment.
Potential AttributeError: post_init won’t be called unless Scheduler is a dataclass.
Scheduler isn’t a dataclass, so __post_init__ might never run; accessing self._technical_failure later can crash. Initialize the flag defensively where it’s first used.
Apply this diff near the early-return check:
@@
- ) -> Optional[Sequence[JobSchedulerInterface]]:
- if self._technical_failure:
+ ) -> Optional[Sequence[JobSchedulerInterface]]:
+ # Ensure flag exists even if __post_init__ is not invoked by the base class.
+ if not hasattr(self, "_technical_failure"):
+ self._technical_failure = False
+ if self._technical_failure:
# fallback early since we failed before already
return None📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def __post_init__(self) -> None: | |
| self._technical_failure = False | |
| ) -> Optional[Sequence[JobSchedulerInterface]]: | |
| # Ensure flag exists even if __post_init__ is not invoked by the base class. | |
| if not hasattr(self, "_technical_failure"): | |
| self._technical_failure = False | |
| if self._technical_failure: | |
| # fallback early since we failed before already | |
| return None |
🤖 Prompt for AI Agents
In src/snakemake/scheduling/milp.py around lines 37 to 39, __post_init__ is
defined but Scheduler is not a dataclass so that method may never run and
self._technical_failure can be missing; initialize the flag defensively by
setting self._technical_failure = False in the Scheduler.__init__ (or
immediately before the early-return check referenced in the review) so the
attribute always exists before any access, ensuring subsequent code never raises
AttributeError.
| try: | ||
| status = self._solve_ilp(prob, time_limit=10) | ||
| except PulpSolverError as e: | ||
| self._technical_failure = True | ||
| self.logger.warning( | ||
| "Failed to solve scheduling problem with ILP solver, falling back to " | ||
| "greedy scheduler. You likely have to fix your ILP solver " | ||
| f"installation. Error message: {e}" | ||
| ) | ||
| return None |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Broaden error handling to guarantee fallback on all ILP failures (not only PulpSolverError).
Solver invocation can raise other exceptions (e.g., ImportError/OSError when binaries are missing). Broaden the catch to ensure we always fall back to greedy and set the sticky failure flag.
Apply:
- try:
- status = self._solve_ilp(prob, time_limit=10)
+ time_limit = 10
+ try:
+ status = self._solve_ilp(prob, time_limit=time_limit)
except PulpSolverError as e:
self._technical_failure = True
self.logger.warning(
"Failed to solve scheduling problem with ILP solver, falling back to "
"greedy scheduler. You likely have to fix your ILP solver "
f"installation. Error message: {e}"
)
return None
+ except Exception as e:
+ self._technical_failure = True
+ self.logger.warning(
+ "Unexpected error from ILP solver, falling back to greedy scheduler. "
+ f"Error: {type(e).__name__}: {e}"
+ )
+ return None📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| try: | |
| status = self._solve_ilp(prob, time_limit=10) | |
| except PulpSolverError as e: | |
| self._technical_failure = True | |
| self.logger.warning( | |
| "Failed to solve scheduling problem with ILP solver, falling back to " | |
| "greedy scheduler. You likely have to fix your ILP solver " | |
| f"installation. Error message: {e}" | |
| ) | |
| return None | |
| time_limit = 10 | |
| try: | |
| status = self._solve_ilp(prob, time_limit=time_limit) | |
| except PulpSolverError as e: | |
| self._technical_failure = True | |
| self.logger.warning( | |
| "Failed to solve scheduling problem with ILP solver, falling back to " | |
| "greedy scheduler. You likely have to fix your ILP solver " | |
| f"installation. Error message: {e}" | |
| ) | |
| return None | |
| except Exception as e: | |
| self._technical_failure = True | |
| self.logger.warning( | |
| "Unexpected error from ILP solver, falling back to greedy scheduler. " | |
| f"Error: {type(e).__name__}: {e}" | |
| ) | |
| return None |
🤖 Prompt for AI Agents
In src/snakemake/scheduling/milp.py around lines 158 to 167, broaden the
exception handler around the ILP solver call so all failures (not only
PulpSolverError) trigger the fallback: catch a general Exception (e.g., except
Exception as e), set self._technical_failure = True, log the same warning
message including the caught exception details, and return None; keep the
existing message/context but ensure it runs for ImportError/OSError/other
runtime errors so the greedy scheduler is always used on any solver failure.
🤖 I have created a release *beep* *boop* --- ## [9.11.2](v9.11.1...v9.11.2) (2025-09-09) ### Bug Fixes * properly fall back to greedy scheduler in case of pulp/cbc errors ([#3733](#3733)) ([62f13b8](62f13b8)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
snakemake#3733) fixes snakemake#3719 ### QC <!-- Make sure that you can tick the boxes below. --> * [x] The PR contains a test case for the changes or the changes are already covered by an existing test case. * [x] The documentation (`docs/`) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake). <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Automatic fallback to a greedy scheduler when the ILP solver fails or times out, ensuring continued progress. - More informative logging that clearly indicates when and why a fallback occurs. - Bug Fixes - Improved robustness by gracefully handling solver errors and timeouts, preventing crashes or stalls during scheduling. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
🤖 I have created a release *beep* *boop* --- ## [9.11.2](snakemake/snakemake@v9.11.1...v9.11.2) (2025-09-09) ### Bug Fixes * properly fall back to greedy scheduler in case of pulp/cbc errors ([snakemake#3733](snakemake#3733)) ([62f13b8](snakemake@62f13b8)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
fixes #3719
QC
docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).Summary by CodeRabbit
New Features
Bug Fixes