Skip to content

fix: make ilp solver enumeration lazy#3900

Merged
johanneskoester merged 8 commits intosnakemake:mainfrom
coroa:fix/avoid-calling-pulp-listsolvers
Jan 7, 2026
Merged

fix: make ilp solver enumeration lazy#3900
johanneskoester merged 8 commits intosnakemake:mainfrom
coroa:fix/avoid-calling-pulp-listsolvers

Conversation

@coroa
Copy link
Copy Markdown
Contributor

@coroa coroa commented Jan 2, 2026

Hi everyone,

The enumeration of available solvers with pulp.listSolvers(onlyAvailable=True) typically acquires license slots of all installed commercial solvers, which becomes problematic, if only a limited number of shared slots are available. pulp correctly releases the licenses immediately again, but it still repeatedly leads to friction.

This PR avoids the pulp solver enumeration by using pulp.getSolver(...).available() to check for whether the chosen solver is available and only on failure queries the full list.

The default solver choice from #3736 was updated to rely on pulp.apis.LpSolverDefault, which is compatible with "PULP_CBC_CMD" (if it is available).

I also moved the pulp specific checking out of api.py entirely, so that all the solver choice is now handled by the new structure.

With these changes, even if pulp is not installed, snakemake gracefully falls back to the greedy scheduler.

QC

  • The PR contains a test case for the changes or the changes are already covered by an existing test case.
  • The documentation (docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).

Summary by CodeRabbit

  • Improvements

    • Lazy, more robust solver discovery with improved default selection for ILP scheduling, reducing startup overhead and surprises.
    • Scheduler now detects when the configured solver is unavailable and automatically falls back to the greedy scheduler.
  • New Features

    • Exposes a solver-availability indicator for easier diagnostics and decisions.
  • Bug Fixes

    • Clearer warning when falling back, indicating missing solver support and required external solver dependencies.

✏️ Tip: You can customize this high-level summary in your review settings.

only calls pulp.listSolvers if a non-default solver is selected to avoid acquiring a
license.
@coroa coroa requested a review from johanneskoester as a code owner January 2, 2026 11:58
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jan 2, 2026

📝 Walkthrough

Walkthrough

Add a lazy LpSolverCollection to discover available pulp solvers, replace the global solver list with it, update SchedulerSettings to use its .default and expose .lp_solver_available, and make the API fallback to the greedy scheduler when no ILP solver is available.

Changes

Cohort / File(s) Summary
MILP solver collection & scheduling
src/snakemake/scheduling/milp.py
Add LpSolverCollection(Collection[str]) (lazy discovery, .default, iteration/containment); replace previous lp_solvers list with LpSolverCollection(); simplify _solve_ilp to use pulp.getSolver(self.settings.solver); update imports and typing.
SchedulerSettings adjustments
src/snakemake/scheduling/milp.py
Change SchedulerSettings.solver default to lp_solvers.default; add lp_solver_available property indicating whether configured solver is present.
API ILP fallback logic
src/snakemake/api.py
Replace direct pulp checks with scheduler_settings.lp_solver_available; if unavailable, log a warning and switch scheduler and settings to greedy; update warning text to reference pulp and coin-or/glpk requirement.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant API as API (scheduler request)
  participant Settings as SchedulerSettings
  participant Collection as LpSolverCollection
  participant Pulp as pulp (external)
  participant Scheduler as Scheduler (ILP / Greedy)

  API->>Settings: use configured solver
  Settings->>Collection: check availability (lp_solver_available)
  Collection->>Pulp: query available solvers / LpSolverDefault.name
  Pulp-->>Collection: solver list / default
  alt solver available
    Collection-->>Settings: available=True
    Settings-->>Scheduler: select ILP solver (pulp.getSolver(...))
  else solver unavailable
    Collection-->>Settings: available=False
    Settings-->>API: indicate fallback
    API-->>Scheduler: switch to Greedy scheduler
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 12.50% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'fix: make ilp solver enumeration lazy' directly and clearly describes the main technical change: converting eager solver enumeration to lazy evaluation to avoid acquiring license slots.
Description check ✅ Passed The PR description includes a clear explanation of the problem, the solution approach, implementation details, and both QC checkboxes are marked as completed.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 656c377 and 65a823e.

📒 Files selected for processing (1)
  • src/snakemake/scheduling/milp.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

  • src/snakemake/scheduling/milp.py
🧠 Learnings (6)
📓 Common learnings
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3676
File: src/snakemake/cli.py:0-0
Timestamp: 2025-07-29T14:53:04.598Z
Learning: In the Snakemake repository, deprecated scheduler interfaces (like scheduler_ilp_solver, --scheduler-solver-path, --scheduler-greediness) should be kept for backward compatibility during the migration to the plugin-based scheduler system, rather than being removed immediately.
📚 Learning: 2025-07-29T14:53:04.598Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3676
File: src/snakemake/cli.py:0-0
Timestamp: 2025-07-29T14:53:04.598Z
Learning: In the Snakemake repository, deprecated scheduler interfaces (like scheduler_ilp_solver, --scheduler-solver-path, --scheduler-greediness) should be kept for backward compatibility during the migration to the plugin-based scheduler system, rather than being removed immediately.

Applied to files:

  • src/snakemake/scheduling/milp.py
📚 Learning: 2025-09-05T16:17:21.298Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3726
File: src/snakemake/common/__init__.py:413-414
Timestamp: 2025-09-05T16:17:21.298Z
Learning: Snakemake requires Python 3.11 at runtime nowadays. Only imports need to work with Python 3.7 for compatibility reasons.

Applied to files:

  • src/snakemake/scheduling/milp.py
📚 Learning: 2024-11-12T12:08:20.342Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3148
File: snakemake/dag.py:1332-1336
Timestamp: 2024-11-12T12:08:20.342Z
Learning: In `snakemake/dag.py`, when code is outdated and will disappear upon resolving merge conflicts, avoid making code review suggestions on that code.

Applied to files:

  • src/snakemake/scheduling/milp.py
📚 Learning: 2024-08-13T09:25:24.046Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 2985
File: tests/tests.py:2051-2051
Timestamp: 2024-08-13T09:25:24.046Z
Learning: In the Snakemake repository, avoid suggesting return type annotations for test functions.

Applied to files:

  • src/snakemake/scheduling/milp.py
📚 Learning: 2024-08-13T16:22:09.641Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3014
File: snakemake/workflow.py:1147-1147
Timestamp: 2024-08-13T16:22:09.641Z
Learning: Avoid suggesting type annotations for functions that are inside methods in the Snakemake codebase.

Applied to files:

  • src/snakemake/scheduling/milp.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (40)
  • GitHub Check: tests (10, ubuntu-latest, py311)
  • GitHub Check: tests (10, ubuntu-latest, py312)
  • GitHub Check: tests (7, ubuntu-latest, py311)
  • GitHub Check: tests (8, ubuntu-latest, py311)
  • GitHub Check: tests (10, ubuntu-latest, py313)
  • GitHub Check: tests (9, ubuntu-latest, py311)
  • GitHub Check: tests (6, ubuntu-latest, py312)
  • GitHub Check: tests (9, ubuntu-latest, py312)
  • GitHub Check: tests (9, windows-2022, py313)
  • GitHub Check: tests (9, ubuntu-latest, py313)
  • GitHub Check: tests (8, ubuntu-latest, py313)
  • GitHub Check: tests (8, ubuntu-latest, py312)
  • GitHub Check: tests (10, windows-2022, py313)
  • GitHub Check: tests (2, ubuntu-latest, py313)
  • GitHub Check: tests (7, ubuntu-latest, py313)
  • GitHub Check: tests (6, ubuntu-latest, py313)
  • GitHub Check: tests (4, windows-2022, py313)
  • GitHub Check: tests (8, windows-2022, py313)
  • GitHub Check: tests (3, ubuntu-latest, py311)
  • GitHub Check: tests (3, ubuntu-latest, py312)
  • GitHub Check: tests (5, ubuntu-latest, py312)
  • GitHub Check: tests (5, windows-2022, py313)
  • GitHub Check: tests (2, ubuntu-latest, py311)
  • GitHub Check: tests (6, windows-2022, py313)
  • GitHub Check: tests (4, ubuntu-latest, py311)
  • GitHub Check: tests (4, ubuntu-latest, py312)
  • GitHub Check: tests (3, windows-2022, py313)
  • GitHub Check: tests (5, ubuntu-latest, py311)
  • GitHub Check: tests (5, ubuntu-latest, py313)
  • GitHub Check: tests (6, ubuntu-latest, py311)
  • GitHub Check: tests (4, ubuntu-latest, py313)
  • GitHub Check: tests (3, ubuntu-latest, py313)
  • GitHub Check: tests (1, ubuntu-latest, py311)
  • GitHub Check: tests (1, ubuntu-latest, py313)
  • GitHub Check: tests (1, windows-2022, py313)
  • GitHub Check: tests (2, windows-2022, py313)
  • GitHub Check: tests (2, ubuntu-latest, py312)
  • GitHub Check: tests (1, ubuntu-latest, py312)
  • GitHub Check: formatting
  • GitHub Check: apidocs
🔇 Additional comments (3)
src/snakemake/scheduling/milp.py (3)

13-59: Excellent lazy evaluation implementation!

The LpSolverCollection successfully addresses the PR objective by avoiding eager pulp.listSolvers(onlyAvailable=True) calls that can acquire license slots. Key strengths:

  • Default solver retrieved from pulp.apis.LpSolverDefault without enumerating all solvers
  • nondefault_solvers computed lazily via cached_property only when needed
  • __contains__ checks availability directly via pulp.getSolver(x).available() without materializing the full list
  • Proper Collection[str] interface implementation with __iter__, __contains__, and __len__
  • Python 3.7+ compatible type annotations (List[str])

The past review concerns about type annotations and iterator consistency have been properly addressed.


66-82: Clean integration of lazy solver checking.

The changes to SchedulerSettings properly support lazy evaluation:

  • Switching the default from lp_solvers[0] to lp_solvers.default (line 67) avoids triggering the cached nondefault_solvers property unnecessarily
  • The lp_solver_available property provides a clear API for the fallback logic in api.py mentioned in the PR objectives
  • Correctly delegates to lp_solvers.__contains__ which checks availability via pulp.getSolver(solver).available()

255-255: Appropriate simplification of solver selection.

Removing conditional logic in favor of direct pulp.getSolver(self.settings.solver) is clean and correct. The safety is provided by:

  • SchedulerSettings.solver defaulting to lp_solvers.default (which is None only if no solver is available)
  • The api.py logic checking lp_solver_available before attempting ILP scheduling (per PR objectives)
  • The PulpSolverError catch on line 208 providing fallback if solver initialization fails

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coroa
Copy link
Copy Markdown
Contributor Author

coroa commented Jan 2, 2026

@johanneskoester Happy to add docstrings to the methods, if you think that improves legibility. I am doubtful.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Fix all issues with AI Agents 🤖
In @src/snakemake/scheduling/milp.py:
- Around line 44-60: The iterator and length logic currently skip
nondefault_solvers when self.default is None; update __iter__ to always iterate
over self.nondefault_solvers (filtering them for availability using
pulp.getSolver(name).available() like __contains__ does) and update __len__ to
count available nondefault_solvers when default is None (or in all cases return
(1 if self.default is not None else 0) plus the number of available
nondefault_solvers). Ensure you reference the same availability check used in
__contains__ to keep behavior consistent.
- Around line 31-42: The return type annotation on the cached_property
nondefault_solvers uses Python 3.9+ syntax (list[str]) which breaks
compatibility with Snakemake 9 (Python 3.7); change the annotation to use
typing.List[str] and add List to the imports (e.g., import List from typing at
the top), or alternatively enable postponed evaluation by adding from __future__
import annotations at the top of the file; update the signature of
nondefault_solvers accordingly (referencing nondefault_solvers) so the file
stays compatible with Python 3.7.
🧹 Nitpick comments (1)
src/snakemake/scheduling/milp.py (1)

49-55: Catch PulpSolverError to handle invalid solver names more robustly.

pulp.getSolver() raises PulpSolverError (not KeyError) when given an unknown solver name. The current code only catches KeyError, leaving PulpSolverError unhandled. While the code works in normal cases since self.solver is typically validated through SchedulerSettings choices, a more defensive approach would improve robustness.

To properly handle this, use nested try-except blocks to avoid referencing pulp.PulpSolverError when the import itself fails:

Suggested approach
    def __contains__(self, x: object) -> bool:
        try:
            import pulp
+           from pulp import PulpSolverError
        except ImportError:
            return False
        
        try:
            return pulp.getSolver(x).available()
-       except (ImportError, KeyError):
+       except (KeyError, PulpSolverError):
            return False
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b11aba6 and 19d3e95.

📒 Files selected for processing (2)
  • src/snakemake/api.py
  • src/snakemake/scheduling/milp.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

  • src/snakemake/scheduling/milp.py
  • src/snakemake/api.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3676
File: src/snakemake/cli.py:0-0
Timestamp: 2025-07-29T14:53:04.598Z
Learning: In the Snakemake repository, deprecated scheduler interfaces (like scheduler_ilp_solver, --scheduler-solver-path, --scheduler-greediness) should be kept for backward compatibility during the migration to the plugin-based scheduler system, rather than being removed immediately.
📚 Learning: 2025-07-29T14:53:04.598Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3676
File: src/snakemake/cli.py:0-0
Timestamp: 2025-07-29T14:53:04.598Z
Learning: In the Snakemake repository, deprecated scheduler interfaces (like scheduler_ilp_solver, --scheduler-solver-path, --scheduler-greediness) should be kept for backward compatibility during the migration to the plugin-based scheduler system, rather than being removed immediately.

Applied to files:

  • src/snakemake/scheduling/milp.py
  • src/snakemake/api.py
🧬 Code graph analysis (1)
src/snakemake/api.py (1)
src/snakemake/scheduling/milp.py (1)
  • lp_solver_available (81-82)
🔇 Additional comments (3)
src/snakemake/api.py (1)

609-616: LGTM! Clean integration with the new lazy solver collection.

The fallback logic correctly uses scheduler_settings.lp_solver_available which internally checks solver availability via pulp.getSolver().available() rather than enumerating all solvers. This avoids the license acquisition issue mentioned in the PR objectives.

src/snakemake/scheduling/milp.py (2)

80-82: LGTM! Correctly implements lazy solver availability check.

The property uses the collection's __contains__ method which calls pulp.getSolver(x).available() directly, avoiding the listSolvers enumeration that triggers license acquisition. This is the core fix for the PR objective.


68-73: Good: Using lp_solvers.default avoids triggering solver enumeration.

The previous lp_solvers[0] approach would have required implementing __getitem__ or would have failed. Using the explicit .default attribute aligns with the lazy initialization pattern and keeps the default solver lookup O(1) without enumerating all solvers.

Comment thread src/snakemake/scheduling/milp.py
Comment thread src/snakemake/scheduling/milp.py Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/snakemake/scheduling/milp.py (1)

254-258: Add defensive check for None solver to improve error clarity.

If both self.settings.solver and pulp.apis.LpSolverDefault are None, the code would assign solver = None, causing an AttributeError at line 261 when accessing solver.optionsDict. While the API code (per the PR summary) checks lp_solver_available before invoking ILP, a defensive check here would improve robustness against edge cases (solver becoming unavailable between check and use, or direct instantiation bypassing API checks).

🔎 Proposed defensive check
 solver = (
     pulp.getSolver(self.settings.solver)
     if self.settings.solver
     else pulp.apis.LpSolverDefault
 )
+if solver is None:
+    raise pulp.PulpSolverError(
+        "No ILP solver available. Please install a supported solver "
+        "(e.g., CBC, GLPK) or specify a solver with --solver."
+    )

This raises PulpSolverError so it's caught by the existing except block at line 207, which logs a clear warning and falls back to the greedy scheduler.

🧹 Nitpick comments (1)
src/snakemake/scheduling/milp.py (1)

13-59: Approve the lazy solver enumeration approach.

The LpSolverCollection class successfully implements lazy solver discovery, avoiding pulp.listSolvers calls until explicitly needed. The __iter__ and __len__ methods correctly include nondefault_solvers regardless of whether default is None, ensuring the collection properly reports all available solvers.

Consider the consistency difference between listSolvers and getSolver(...).available().

The nondefault_solvers property caches results from pulp.listSolvers(onlyAvailable=True), while __contains__ checks availability via pulp.getSolver(x).available(). If PuLP's internal logic differs between these two methods, or if solver availability changes between the cache time and check time, the collection could report a solver in __iter__ but return False for __contains__. Depending on how critical this edge case is for your use case, you might consider normalizing the availability check logic.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 03c627d and f75ef82.

📒 Files selected for processing (1)
  • src/snakemake/scheduling/milp.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

  • src/snakemake/scheduling/milp.py
🧠 Learnings (6)
📓 Common learnings
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3676
File: src/snakemake/cli.py:0-0
Timestamp: 2025-07-29T14:53:04.598Z
Learning: In the Snakemake repository, deprecated scheduler interfaces (like scheduler_ilp_solver, --scheduler-solver-path, --scheduler-greediness) should be kept for backward compatibility during the migration to the plugin-based scheduler system, rather than being removed immediately.
📚 Learning: 2025-07-29T14:53:04.598Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3676
File: src/snakemake/cli.py:0-0
Timestamp: 2025-07-29T14:53:04.598Z
Learning: In the Snakemake repository, deprecated scheduler interfaces (like scheduler_ilp_solver, --scheduler-solver-path, --scheduler-greediness) should be kept for backward compatibility during the migration to the plugin-based scheduler system, rather than being removed immediately.

Applied to files:

  • src/snakemake/scheduling/milp.py
📚 Learning: 2025-09-05T16:17:21.298Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3726
File: src/snakemake/common/__init__.py:413-414
Timestamp: 2025-09-05T16:17:21.298Z
Learning: Snakemake requires Python 3.11 at runtime nowadays. Only imports need to work with Python 3.7 for compatibility reasons.

Applied to files:

  • src/snakemake/scheduling/milp.py
📚 Learning: 2024-11-12T12:08:20.342Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3148
File: snakemake/dag.py:1332-1336
Timestamp: 2024-11-12T12:08:20.342Z
Learning: In `snakemake/dag.py`, when code is outdated and will disappear upon resolving merge conflicts, avoid making code review suggestions on that code.

Applied to files:

  • src/snakemake/scheduling/milp.py
📚 Learning: 2024-10-08T17:41:54.542Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 2985
File: tests/tests.py:2051-2051
Timestamp: 2024-10-08T17:41:54.542Z
Learning: In the Snakemake repository, avoid suggesting return type annotations for test functions.

Applied to files:

  • src/snakemake/scheduling/milp.py
📚 Learning: 2024-10-08T17:41:54.542Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3014
File: snakemake/workflow.py:1147-1147
Timestamp: 2024-10-08T17:41:54.542Z
Learning: Avoid suggesting type annotations for functions that are inside methods in the Snakemake codebase.

Applied to files:

  • src/snakemake/scheduling/milp.py
🔇 Additional comments (1)
src/snakemake/scheduling/milp.py (1)

64-81: LGTM: SchedulerSettings correctly integrates with lazy solver collection.

The changes properly use lp_solvers.default instead of indexing, and the new lp_solver_available property provides a clean interface for checking solver availability at use time via __contains__.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/snakemake/scheduling/milp.py (1)

78-80: Consider adding a docstring to clarify the property's purpose.

The lp_solver_available property is concise and correct, but a brief docstring would help document that it checks whether the configured solver is available without triggering full solver enumeration.

🔎 Suggested docstring
 @property
 def lp_solver_available(self):
+    """Check if the configured solver is available via pulp.getSolver().available()."""
     return self.solver in lp_solvers
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f75ef82 and 40f4401.

📒 Files selected for processing (1)
  • src/snakemake/scheduling/milp.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

  • src/snakemake/scheduling/milp.py
🧠 Learnings (6)
📓 Common learnings
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3676
File: src/snakemake/cli.py:0-0
Timestamp: 2025-07-29T14:53:04.598Z
Learning: In the Snakemake repository, deprecated scheduler interfaces (like scheduler_ilp_solver, --scheduler-solver-path, --scheduler-greediness) should be kept for backward compatibility during the migration to the plugin-based scheduler system, rather than being removed immediately.
📚 Learning: 2025-07-29T14:53:04.598Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3676
File: src/snakemake/cli.py:0-0
Timestamp: 2025-07-29T14:53:04.598Z
Learning: In the Snakemake repository, deprecated scheduler interfaces (like scheduler_ilp_solver, --scheduler-solver-path, --scheduler-greediness) should be kept for backward compatibility during the migration to the plugin-based scheduler system, rather than being removed immediately.

Applied to files:

  • src/snakemake/scheduling/milp.py
📚 Learning: 2025-09-05T16:17:21.298Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3726
File: src/snakemake/common/__init__.py:413-414
Timestamp: 2025-09-05T16:17:21.298Z
Learning: Snakemake requires Python 3.11 at runtime nowadays. Only imports need to work with Python 3.7 for compatibility reasons.

Applied to files:

  • src/snakemake/scheduling/milp.py
📚 Learning: 2024-11-12T12:08:20.342Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3148
File: snakemake/dag.py:1332-1336
Timestamp: 2024-11-12T12:08:20.342Z
Learning: In `snakemake/dag.py`, when code is outdated and will disappear upon resolving merge conflicts, avoid making code review suggestions on that code.

Applied to files:

  • src/snakemake/scheduling/milp.py
📚 Learning: 2024-10-08T17:41:54.542Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 2985
File: tests/tests.py:2051-2051
Timestamp: 2024-10-08T17:41:54.542Z
Learning: In the Snakemake repository, avoid suggesting return type annotations for test functions.

Applied to files:

  • src/snakemake/scheduling/milp.py
📚 Learning: 2024-10-08T17:41:54.542Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3014
File: snakemake/workflow.py:1147-1147
Timestamp: 2024-10-08T17:41:54.542Z
Learning: Avoid suggesting type annotations for functions that are inside methods in the Snakemake codebase.

Applied to files:

  • src/snakemake/scheduling/milp.py
🔇 Additional comments (3)
src/snakemake/scheduling/milp.py (3)

13-59: Excellent lazy loading design that achieves the PR objective.

The LpSolverCollection successfully defers the expensive listSolvers(onlyAvailable=True) call until needed (e.g., CLI help display) while enabling efficient individual solver checks via getSolver().available(). The use of cached_property on nondefault_solvers ensures the list is enumerated at most once.


1-6: LGTM! Clean imports and integration.

The addition of cached_property and the global lp_solvers instance integrate cleanly with the existing code structure.

Also applies to: 61-61


254-254: The upstream guard is sufficient—no action needed.

The api.py check at line 609 confirms that scheduler_settings.lp_solver_available is verified before ILP scheduling is invoked. The lp_solver_available property (milp.py:79-80) validates that self.solver is in the available solvers list, ensuring the solver is never None or invalid when _solve_ilp is called. The try-except wrapping pulp.getSolver provides additional defensive fallback for solver initialization failures.

@coroa
Copy link
Copy Markdown
Contributor Author

coroa commented Jan 5, 2026

@coderabbitai Add docstrings, please

@coroa
Copy link
Copy Markdown
Contributor Author

coroa commented Jan 5, 2026

@coderabbitai generate docstrings

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jan 5, 2026

✅ Actions performed

Initiated docstring generation; will generate only if new commits exist.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jan 5, 2026

Note

Docstrings generation - SUCCESS
Generated docstrings for this pull request at #3904

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jan 5, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Read the scheduling/milp.py file to see the current code structure
cat src/snakemake/scheduling/milp.py

Length of output: 8739


@coroa: I'll add comprehensive docstrings to the methods and classes in src/snakemake/scheduling/milp.py. Here's the updated code:

from dataclasses import dataclass, field
import math
import os
from functools import cached_property
from pathlib import Path
from typing import Collection, Dict, Iterator, List, Mapping, Optional, Sequence, Union
from snakemake_interface_scheduler_plugins.base import SchedulerBase
from snakemake_interface_scheduler_plugins.settings import SchedulerSettingsBase
from snakemake_interface_scheduler_plugins.interfaces.jobs import JobSchedulerInterface
from snakemake_interface_common.io import AnnotatedStringInterface


class LpSolverCollection(Collection[str]):
    """A lazy collection that avoids calling pulp.listSolvers if the default solver is selected.
    
    This collection provides access to available linear programming solvers from PuLP
    without eagerly enumerating all solvers, which can cause issues with commercial
    solver license acquisition.
    """

    def __init__(self):
        """Initialize the solver collection and determine the default solver.
        
        Attempts to import PuLP and retrieve the default solver name from
        pulp.apis.LpSolverDefault. If PuLP is not installed or no default
        solver is available, sets default to None.
        """
        default = None
        try:
            import pulp

            solver_default = pulp.apis.LpSolverDefault
            if solver_default is not None:
                default = solver_default.name
        except ImportError:
            pass

        self.default = default

    @cached_property
    def nondefault_solvers(self) -> List[str]:
        """Get a sorted list of available non-default solvers.
        
        This property is cached and only evaluated when accessed, deferring the
        call to pulp.listSolvers until necessary.
        
        Returns:
            List[str]: Sorted list of available solver names excluding the default solver.
        """
        try:
            import pulp

            return sorted(
                solver
                for solver in pulp.listSolvers(onlyAvailable=True)
                if solver != self.default
            )
        except ImportError:
            return []

    def __iter__(self) -> Iterator[str]:
        """Iterate over all available solvers, starting with the default.
        
        Yields:
            str: Solver names, with the default solver yielded first if available.
        """
        if self.default is not None:
            yield self.default
        yield from self.nondefault_solvers

    def __contains__(self, x: object) -> bool:
        """Check if a solver is available.
        
        Args:
            x: The solver name to check.
            
        Returns:
            bool: True if the solver is available, False otherwise.
        """
        try:
            import pulp

            return pulp.getSolver(x).available()
        except Exception:  # noqa: BLE001
            return False

    def __len__(self) -> int:
        """Get the total number of available solvers.
        
        Returns:
            int: The count of available solvers including the default.
        """
        return (1 if self.default is not None else 0) + len(self.nondefault_solvers)


lp_solvers = LpSolverCollection()


@dataclass
class SchedulerSettings(SchedulerSettingsBase):
    """Settings for the MILP scheduler.
    
    Attributes:
        solver: Name of the MILP solver to use. Defaults to the system's default solver.
        solver_path: Optional path to search for scheduler solver binaries.
    """
    
    solver: Optional[str] = field(
        default=lp_solvers.default,
        metadata={
            "help": "Set MILP solver to use",
            "choices": lp_solvers,
        },
    )
    solver_path: Optional[Path] = field(
        default=None,
        metadata={"help": "Set the PATH to search for scheduler solver binaries."},
    )

    @property
    def lp_solver_available(self):
        """Check if the configured solver is available.
        
        Returns:
            bool: True if the configured solver exists in the available solvers.
        """
        return self.solver in lp_solvers


class Scheduler(SchedulerBase):
    """MILP-based job scheduler for Snakemake.
    
    This scheduler uses Mixed Integer Linear Programming to optimally select jobs
    based on priorities, resource requirements, and temporary file management.
    Falls back to greedy scheduling if the ILP solver fails or is unavailable.
    """
    
    def __post_init__(self) -> None:
        """Initialize the scheduler state after dataclass initialization.
        
        Sets up the technical failure flag used to track solver failures
        and enable early fallback to greedy scheduling.
        """
        self._technical_failure = False

    def select_jobs(
        self,
        selectable_jobs: Sequence[JobSchedulerInterface],
        remaining_jobs: Sequence[JobSchedulerInterface],
        available_resources: Mapping[str, Union[int, str]],
        input_sizes: Dict[AnnotatedStringInterface, int],
    ) -> Optional[Sequence[JobSchedulerInterface]]:
        """Select jobs to execute using MILP optimization.
        
        Formulates a Mixed Integer Linear Programming problem to select an optimal
        subset of jobs considering job priorities, resource constraints, core load,
        and temporary file management. Falls back to None (greedy scheduler) on failure.
        
        Args:
            selectable_jobs: Jobs that can potentially be scheduled in this round.
            remaining_jobs: All jobs remaining in the workflow.
            available_resources: Currently available resources (cores, memory, etc.).
            input_sizes: Sizes of input files in bytes.
            
        Returns:
            Optional[Sequence[JobSchedulerInterface]]: Selected jobs to execute, or None
                to fall back to greedy scheduling.
        """
        if self._technical_failure:
            # fallback early since we failed before already
            return None
        import pulp
        from pulp import lpSum
        from pulp import PulpSolverError

        scheduled_jobs = {
            job: pulp.LpVariable(
                f"job_{idx}", lowBound=0, upBound=1, cat=pulp.LpInteger
            )
            for idx, job in enumerate(selectable_jobs)
        }

        job_temp_files = {}
        for job in remaining_jobs:
            job_temp_files[job] = {
                infile for infile in job.input if infile.is_flagged("temp")
            }

        temp_files = {
            f for job in selectable_jobs for f in job.input if f.is_flagged("temp")
        }

        temp_sizes_gb = {f: input_sizes[f] / 1e9 for f in temp_files}

        temp_job_improvement = {
            temp_file: pulp.LpVariable(
                f"temp_file_{idx}", lowBound=0, upBound=1, cat="Continuous"
            )
            for idx, temp_file in enumerate(temp_files)
        }

        temp_file_deletable = {
            temp_file: pulp.LpVariable(
                f"deletable_{idx}",
                lowBound=0,
                upBound=1,
                cat=pulp.LpInteger,
            )
            for idx, temp_file in enumerate(temp_files)
        }
        prob = pulp.LpProblem("JobScheduler", pulp.LpMaximize)

        total_temp_size = max(
            sum([temp_sizes_gb[temp_file] for temp_file in temp_files]), 1
        )
        total_core_requirement = sum(
            [
                max(job.scheduler_resources.get("_cores", 1), 1)
                for job in selectable_jobs
            ]
        )
        # Objective function
        # Job priority > Core load
        # Core load > temp file removal
        # Instant removal > temp size
        prob += (
            2
            * total_core_requirement
            * 2
            * total_temp_size
            * lpSum([job.priority * scheduled_jobs[job] for job in selectable_jobs])
            + 2
            * total_temp_size
            * lpSum(
                [
                    max(job.scheduler_resources.get("_cores", 1), 1)
                    * scheduled_jobs[job]
                    for job in selectable_jobs
                ]
            )
            + total_temp_size
            * lpSum(
                [
                    temp_file_deletable[temp_file] * temp_sizes_gb[temp_file]
                    for temp_file in temp_files
                ]
            )
            + lpSum(
                [
                    temp_job_improvement[temp_file] * temp_sizes_gb[temp_file]
                    for temp_file in temp_files
                ]
            )
        )

        # Constraints:
        for name in available_resources:
            prob += (
                lpSum(
                    [
                        scheduled_jobs[job] * job.scheduler_resources.get(name, 0)
                        for job in selectable_jobs
                    ]
                )
                <= available_resources[name]
            )

        # Choose jobs that lead to "fastest" (minimum steps) removal of existing temp file
        for temp_file in temp_files:
            prob += temp_job_improvement[temp_file] <= lpSum(
                [
                    scheduled_jobs[job]
                    for job in selectable_jobs
                    if temp_file in job_temp_files[job]
                ]
            ) / lpSum([1 for job in remaining_jobs if temp_file in job_temp_files[job]])

            prob += temp_file_deletable[temp_file] <= temp_job_improvement[temp_file]

        try:
            status = self._solve_ilp(prob, time_limit=10)
        except PulpSolverError as e:
            self._technical_failure = True
            self.logger.warning(
                "Failed to solve scheduling problem with ILP solver, falling back to "
                "greedy scheduler. You likely have to fix your ILP solver "
                f"installation. Error message: {e}"
            )
            return None
        if pulp.LpStatus[status] != "Optimal":
            if pulp.LpStatus[status] == "Not Solved":
                self.logger.warning(
                    "Failed to solve scheduling problem with ILP solver in time (10s), "
                    "falling back to greedy scheduler."
                )
            elif pulp.LpStatus[status] == "Infeasible":
                self.logger.warning(
                    "Failed to solve scheduling problem with ILP solver, falling back "
                    "to greedy scheduler."
                )
            return None

        selected_jobs = [
            job
            for job, variable in scheduled_jobs.items()
            if math.isclose(variable.value(), 1.0)
        ]

        if not selected_jobs:
            # No selected jobs. This could be due to insufficient resources or a failure in the ILP solver
            # Hence, we silently fall back to the greedy solver to make sure that we don't miss anything.
            return None

        return selected_jobs

    def _solve_ilp(self, prob, threads=2, time_limit=10):
        """Solve the ILP problem using the configured solver.
        
        Temporarily modifies the PATH environment variable if solver_path is set
        to ensure the solver binary can be found in cluster environments.
        
        Args:
            prob: The PuLP LpProblem to solve.
            threads: Number of threads for the solver to use. Defaults to 2.
            time_limit: Maximum time in seconds for solving. Defaults to 10.
            
        Returns:
            The status code from the solver indicating the result of the optimization.
        """
        import pulp

        old_path = os.environ["PATH"]
        if self.settings.solver_path is not None:
            # Temporarily prepend the given snakemake env to the path, such that the solver can be found in any case.
            # This is needed for cluster envs, where the cluster job might have a different environment but
            # still needs access to the solver binary.
            os.environ["PATH"] = "{}:{}".format(
                self.settings.solver_path,
                os.environ["PATH"],
            )
        try:
            solver = pulp.getSolver(self.settings.solver)
        finally:
            os.environ["PATH"] = old_path
        solver.optionsDict["threads"] = threads
        solver.timeLimit = time_limit
        solver.msg = False  # Suppress solver output
        return prob.solve(solver)

The docstrings now document:

  • Purpose and behavior of each class and method
  • Parameters with types and descriptions
  • Return values with types and descriptions
  • Key implementation details and gotchas

🧠 Learnings used
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3676
File: src/snakemake/cli.py:0-0
Timestamp: 2025-07-29T14:53:04.598Z
Learning: In the Snakemake repository, deprecated scheduler interfaces (like scheduler_ilp_solver, --scheduler-solver-path, --scheduler-greediness) should be kept for backward compatibility during the migration to the plugin-based scheduler system, rather than being removed immediately.

coderabbitai Bot added a commit that referenced this pull request Jan 5, 2026
Docstrings generation was requested by @coroa.

* #3900 (comment)

The following files were modified:

* `src/snakemake/api.py`
* `src/snakemake/scheduling/milp.py`
@coroa
Copy link
Copy Markdown
Contributor Author

coroa commented Jan 5, 2026

I am confident i addressed all open reservations of the bot. Ready for review!

Comment thread src/snakemake/scheduling/milp.py Outdated
@johanneskoester johanneskoester enabled auto-merge (squash) January 7, 2026 10:15
@johanneskoester johanneskoester merged commit 30e1509 into snakemake:main Jan 7, 2026
59 checks passed
johanneskoester pushed a commit that referenced this pull request Jan 8, 2026
🤖 I have created a release *beep* *boop*
---


##
[9.14.6](v9.14.5...v9.14.6)
(2026-01-08)


### Bug Fixes

* create local clone of git repos for source files from hosting
providers ([#3643](#3643))
([d2f8aba](d2f8aba))
* create potentially missing .snakemake folder in case of very long
command lines for spawned jobs
([#3894](#3894))
([4b431dd](4b431dd))
* make ilp solver enumeration lazy
([#3900](#3900))
([30e1509](30e1509))
* Prevent broken report_href links by using deterministic report IDs
with fixed prefix length
([#3889](#3889))
([6d8f4d8](6d8f4d8))
* refactor LoggerManager setup and scope
([#3851](#3851))
([f46d904](f46d904))
* yield proper error message in case a local git source file is not
retrievable
([#3892](#3892))
([ed79cae](ed79cae))


### Documentation

* explain how to pass nested config via CLI
([#3885](#3885))
([9d8c539](9d8c539))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
@coroa coroa deleted the fix/avoid-calling-pulp-listsolvers branch January 29, 2026 19:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants