Skip to content

fix: make --containerize consider conda env specs from all rules, not just from dry-run DAG jobs#3783

Merged
johanneskoester merged 3 commits intomainfrom
fix/containerize-use-all-env-specs
Oct 14, 2025
Merged

fix: make --containerize consider conda env specs from all rules, not just from dry-run DAG jobs#3783
johanneskoester merged 3 commits intomainfrom
fix/containerize-use-all-env-specs

Conversation

@tedil
Copy link
Copy Markdown
Contributor

@tedil tedil commented Oct 10, 2025

Since rules that depend on checkpoints aren't part of the initial (dry-run) DAG, we previously didn't include their conda env specs in the --containerize call. Now we do ;)

QC

  • The PR contains a test case for the changes or the changes are already covered by an existing test case.
  • The documentation (docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).

Summary by CodeRabbit

  • New Features

    • Containerization now discovers Conda environments from both executed rules and imported modules (including checkpoints), producing a unified, de-duplicated and consistently ordered environment set for image builds.
    • Externally managed environments are explicitly skipped with a warning; dynamic or non-file-based env definitions in non-initial rules are validated and reported with clear errors.
  • Tests

    • Added end-to-end test covering checkpoints, module rules, multiple env files, and Dockerfile generation to verify environment discovery and containerization.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Oct 10, 2025

📝 Walkthrough

Walkthrough

Collects conda environments from both DAG jobs and all workflow rules (including modules), validates non-DAG env specs (disallowing callables/wildcards), resolves file-based envs, skips externally managed envs with a warning, and uses the unified env set for hashing and Dockerfile generation. Adds tests and env YAMLs.

Changes

Cohort / File(s) Summary
Containerize logic
src/snakemake/deployment/containerize.py
Gather conda envs from dag.jobs and from all workflow.rules (including module rules); validate non-DAG rule envs (reject callables or wildcards); resolve file-based specs via infer_source_file; construct CondaEnvFileSpec and obtain concrete envs; skip externally managed envs with a warning; unify env set and update relfile ordering, hashing, and Dockerfile emission.
Checkpoint test workflow and envs
tests/test_containerize_checkpoint/Snakefile, tests/test_containerize_checkpoint/envs/a.yaml, tests/test_containerize_checkpoint/envs/b.yaml, tests/test_containerize_checkpoint/workflow/Snakefile, tests/test_containerize_checkpoint/workflow/envs/c.yaml
Add a small workflow including a module, rules (one checkpoint), and three conda env YAMLs (a.yaml, b.yaml, c.yaml) to exercise env discovery from DAG and non-DAG/module rules.
Conda/containerize test
tests/tests_using_conda.py
Add test_containerize_checkpoint to run containerize, assert Dockerfile contains source/COPY entries for the three env files and that three conda env creations occur; includes cleanup.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant Snakemake
  participant DAG
  participant Workflow
  participant EnvResolver as Env Resolver
  participant DockerGen as Dockerfile Gen

  User->>Snakemake: snakemake --containerize
  Snakemake->>DAG: Collect conda_envs from initial DAG jobs
  Snakemake->>Workflow: Iterate all rules (incl. modules & non-DAG)
  Snakemake->>EnvResolver: Validate & resolve env specs
  EnvResolver-->>Snakemake: file-based env paths or errors/warnings
  Snakemake->>Snakemake: Build unified set of conda envs (dedupe & sort)
  Snakemake->>DockerGen: Emit COPY + conda env create steps using unified env set
  DockerGen-->>User: Dockerfile with resolved envs
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • johanneskoester

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The title clearly and concisely summarizes the primary fix—ensuring --containerize considers conda environment specifications from all rules rather than only dry-run DAG jobs—and directly matches the changes introduced without extraneous details.
Description Check ✅ Passed The description provides a clear summary of the change in a single opening paragraph and correctly follows the repository’s template by including a QC section with both required checkboxes ticked, demonstrating test coverage and documentation status.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/containerize-use-all-env-specs

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
tests/test_containerize_checkpoint/workflow/envs/c.yaml (1)

1-4: Optional: Consider removing space before equals sign for consistency with conda conventions.

The dependency specification python =3.13 (line 4) uses a space before the equals sign. While conda accepts both formats, the standard convention is python=3.13 without a space. This same pattern appears in a.yaml and b.yaml.

Apply this diff for consistency with typical conda YAML syntax:

-  - python =3.13
+  - python=3.13
tests/test_containerize_checkpoint/envs/a.yaml (1)

1-4: Optional: Consider removing space before equals sign for consistency with conda conventions.

The dependency specification python =3.11 (line 4) uses a space before the equals sign. While conda accepts both formats, the standard convention is python=3.11 without a space.

Apply this diff for consistency:

-  - python =3.11
+  - python=3.11
tests/test_containerize_checkpoint/envs/b.yaml (1)

1-4: Optional: Consider removing space before equals sign for consistency with conda conventions.

The dependency specification python =3.12 (line 4) uses a space before the equals sign. While conda accepts both formats, the standard convention is python=3.12 without a space.

Apply this diff for consistency:

-  - python =3.12
+  - python=3.12
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2eb9079 and 4397f38.

📒 Files selected for processing (7)
  • src/snakemake/deployment/containerize.py (1 hunks)
  • tests/test_containerize_checkpoint/Snakefile (1 hunks)
  • tests/test_containerize_checkpoint/envs/a.yaml (1 hunks)
  • tests/test_containerize_checkpoint/envs/b.yaml (1 hunks)
  • tests/test_containerize_checkpoint/workflow/Snakefile (1 hunks)
  • tests/test_containerize_checkpoint/workflow/envs/c.yaml (1 hunks)
  • tests/tests_using_conda.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

  • src/snakemake/deployment/containerize.py
  • tests/tests_using_conda.py
🧬 Code graph analysis (2)
src/snakemake/deployment/containerize.py (3)
src/snakemake/workflow.py (2)
  • sourcecache (497-498)
  • basedir (481-482)
src/snakemake/sourcecache.py (2)
  • LocalSourceFile (105-135)
  • infer_source_file (316-346)
src/snakemake/deployment/conda.py (4)
  • CondaEnvFileSpec (888-928)
  • CondaEnvSpecType (1005-1025)
  • from_spec (1011-1025)
  • is_externally_managed (109-111)
tests/tests_using_conda.py (1)
tests/common.py (1)
  • dpath (33-36)
🪛 Ruff (0.13.3)
src/snakemake/deployment/containerize.py

34-38: Avoid specifying long messages outside the exception class

(TRY003)


43-47: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (45)
  • GitHub Check: tests (10, ubuntu-latest, py313)
  • GitHub Check: tests (6, ubuntu-latest, py313)
  • GitHub Check: tests (4, ubuntu-latest, py311)
  • GitHub Check: tests (10, ubuntu-latest, py312)
  • GitHub Check: tests (10, ubuntu-latest, py311)
  • GitHub Check: tests (9, ubuntu-latest, py312)
  • GitHub Check: tests (10, windows-2022, py313)
  • GitHub Check: tests (9, windows-2022, py313)
  • GitHub Check: tests (9, ubuntu-latest, py313)
  • GitHub Check: tests (8, ubuntu-latest, py312)
  • GitHub Check: tests (8, windows-2022, py313)
  • GitHub Check: tests (9, ubuntu-latest, py311)
  • GitHub Check: tests (8, ubuntu-latest, py311)
  • GitHub Check: tests (8, ubuntu-latest, py313)
  • GitHub Check: tests (7, ubuntu-latest, py313)
  • GitHub Check: tests (6, ubuntu-latest, py311)
  • GitHub Check: tests (5, ubuntu-latest, py313)
  • GitHub Check: tests (5, windows-2022, py313)
  • GitHub Check: tests (7, ubuntu-latest, py312)
  • GitHub Check: tests (6, ubuntu-latest, py312)
  • GitHub Check: tests (6, macos-latest, py313)
  • GitHub Check: tests (6, windows-2022, py313)
  • GitHub Check: tests (7, windows-2022, py313)
  • GitHub Check: tests (7, ubuntu-latest, py311)
  • GitHub Check: tests (3, windows-2022, py313)
  • GitHub Check: tests (5, ubuntu-latest, py312)
  • GitHub Check: tests (4, ubuntu-latest, py312)
  • GitHub Check: tests (5, ubuntu-latest, py311)
  • GitHub Check: tests (4, ubuntu-latest, py313)
  • GitHub Check: tests (4, windows-2022, py313)
  • GitHub Check: tests (3, ubuntu-latest, py313)
  • GitHub Check: tests (4, macos-latest, py313)
  • GitHub Check: tests (1, macos-latest, py313)
  • GitHub Check: tests (2, macos-latest, py313)
  • GitHub Check: tests (3, macos-latest, py313)
  • GitHub Check: tests (1, ubuntu-latest, py311)
  • GitHub Check: tests (1, windows-2022, py313)
  • GitHub Check: tests (2, windows-2022, py313)
  • GitHub Check: tests (1, ubuntu-latest, py313)
  • GitHub Check: tests (3, ubuntu-latest, py312)
  • GitHub Check: tests (3, ubuntu-latest, py311)
  • GitHub Check: tests (2, ubuntu-latest, py311)
  • GitHub Check: tests (2, ubuntu-latest, py312)
  • GitHub Check: tests (2, ubuntu-latest, py313)
  • GitHub Check: tests (1, ubuntu-latest, py312)
🔇 Additional comments (7)
tests/test_containerize_checkpoint/workflow/Snakefile (1)

1-9: LGTM!

The rule definition is correct and properly demonstrates the use of a conda environment from a module. The wildcard in the output pattern and the relative conda env path are both appropriate for this test case.

tests/tests_using_conda.py (1)

371-409: LGTM!

The test implementation is well-structured:

  • Proper test decorators (@skip_on_windows, @conda) ensure it runs in the correct environment
  • Comprehensive assertions verify Dockerfile content includes all three environments (a.yaml, b.yaml, c.yaml) with correct source comments and COPY directives
  • Validates that exactly 3 conda environments are created
  • Includes proper cleanup in the finally block

This effectively tests the PR's objective of including conda environments from rules outside the initial DAG.

tests/test_containerize_checkpoint/Snakefile (1)

1-39: LGTM!

This workflow effectively demonstrates the PR's objective by:

  • Using a module (c_module) to test environment discovery from module rules
  • Including a checkpoint (b) that causes downstream rules (like c from the module) to be excluded from the initial DAG
  • Providing a realistic scenario where --containerize needs to collect environments from both DAG and non-DAG rules

The workflow structure is correct, with proper dependency chains and conda environment specifications.

src/snakemake/deployment/containerize.py (4)

7-8: LGTM!

The new imports are appropriate for the enhanced functionality:

  • infer_source_file resolves environment file paths
  • is_callable and contains_wildcard validate environment specifications

15-21: LGTM!

The docstring clearly explains the function's purpose and the local import of conda-specific types is appropriate to avoid circular dependencies.


23-57: LGTM!

The two-phase collection strategy correctly addresses the PR objective:

  1. Lines 23-24: Collects environments from initial DAG jobs (existing behavior)
  2. Lines 26-57: Extends collection to all workflow rules, including those downstream of checkpoints

The validation logic is sound:

  • Lines 33-38: Prevents dynamic environments (callable/wildcard) in non-DAG rules, which cannot be resolved at containerization time
  • Lines 41-47: Enforces file-based specifications for non-DAG rules, ensuring predictable containerization
  • Lines 51-56: Properly skips externally managed environments with appropriate logging

The use of a set for conda_envs automatically deduplicates environments that appear in both collections.

Note: The static analysis hints about TRY003 (long error messages) are acceptable here, as the detailed messages provide necessary context to users about why their workflow cannot be containerized. As per coding guidelines.


65-65: LGTM!

Correctly uses the unified conda_envs set that now includes both DAG and non-DAG rule environments, completing the integration of the two-phase collection strategy.

Comment thread src/snakemake/deployment/containerize.py
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
src/snakemake/deployment/containerize.py (2)

34-39: LGTM! The validation correctly prevents dynamic environments.

The check ensures that conda environments for non-DAG rules can be resolved without workflow execution, which is necessary for containerization.

Note: Static analysis flagged TRY003 for the error message length. While the message is clear and helpful, you could optionally define it as a constant if following strict style guidelines.


41-48: LGTM! The file-based requirement is correctly enforced.

Only file-based conda environments can be containerized, and the error message clearly explains this limitation.

Note: Static analysis flagged TRY003 for the error message length (same as above).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4397f38 and e1575df.

📒 Files selected for processing (1)
  • src/snakemake/deployment/containerize.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

  • src/snakemake/deployment/containerize.py
🧬 Code graph analysis (1)
src/snakemake/deployment/containerize.py (2)
src/snakemake/sourcecache.py (2)
  • LocalSourceFile (105-135)
  • infer_source_file (316-346)
src/snakemake/deployment/conda.py (4)
  • CondaEnvFileSpec (888-928)
  • CondaEnvSpecType (1005-1025)
  • from_spec (1011-1025)
  • is_externally_managed (109-111)
🪛 Ruff (0.13.3)
src/snakemake/deployment/containerize.py

35-39: Avoid specifying long messages outside the exception class

(TRY003)


44-48: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (45)
  • GitHub Check: tests (10, ubuntu-latest, py313)
  • GitHub Check: tests (10, windows-2022, py313)
  • GitHub Check: tests (10, ubuntu-latest, py312)
  • GitHub Check: tests (7, ubuntu-latest, py313)
  • GitHub Check: tests (9, windows-2022, py313)
  • GitHub Check: tests (10, ubuntu-latest, py311)
  • GitHub Check: tests (9, ubuntu-latest, py313)
  • GitHub Check: tests (8, windows-2022, py313)
  • GitHub Check: tests (9, ubuntu-latest, py311)
  • GitHub Check: tests (8, ubuntu-latest, py312)
  • GitHub Check: tests (8, ubuntu-latest, py313)
  • GitHub Check: tests (8, ubuntu-latest, py311)
  • GitHub Check: tests (9, ubuntu-latest, py312)
  • GitHub Check: tests (7, windows-2022, py313)
  • GitHub Check: tests (7, ubuntu-latest, py311)
  • GitHub Check: tests (6, ubuntu-latest, py313)
  • GitHub Check: tests (6, macos-latest, py313)
  • GitHub Check: tests (6, windows-2022, py313)
  • GitHub Check: tests (7, ubuntu-latest, py312)
  • GitHub Check: tests (6, ubuntu-latest, py312)
  • GitHub Check: tests (5, windows-2022, py313)
  • GitHub Check: tests (6, ubuntu-latest, py311)
  • GitHub Check: tests (5, ubuntu-latest, py313)
  • GitHub Check: tests (5, ubuntu-latest, py312)
  • GitHub Check: tests (4, macos-latest, py313)
  • GitHub Check: tests (4, ubuntu-latest, py312)
  • GitHub Check: tests (5, ubuntu-latest, py311)
  • GitHub Check: tests (2, ubuntu-latest, py313)
  • GitHub Check: tests (3, windows-2022, py313)
  • GitHub Check: tests (4, windows-2022, py313)
  • GitHub Check: tests (3, ubuntu-latest, py312)
  • GitHub Check: tests (4, ubuntu-latest, py313)
  • GitHub Check: tests (3, ubuntu-latest, py313)
  • GitHub Check: tests (2, ubuntu-latest, py312)
  • GitHub Check: tests (4, ubuntu-latest, py311)
  • GitHub Check: tests (2, macos-latest, py313)
  • GitHub Check: tests (3, macos-latest, py313)
  • GitHub Check: tests (3, ubuntu-latest, py311)
  • GitHub Check: tests (2, windows-2022, py313)
  • GitHub Check: tests (1, macos-latest, py313)
  • GitHub Check: tests (1, windows-2022, py313)
  • GitHub Check: tests (1, ubuntu-latest, py313)
  • GitHub Check: tests (1, ubuntu-latest, py311)
  • GitHub Check: tests (2, ubuntu-latest, py311)
  • GitHub Check: tests (1, ubuntu-latest, py312)
🔇 Additional comments (6)
src/snakemake/deployment/containerize.py (6)

7-8: LGTM! Imports support the new env collection logic.

The new imports are necessary for resolving and validating conda environment definitions from non-DAG rules.


15-20: Good documentation of the updated behavior.

The docstring clearly explains that the function now collects conda environment specifications from both the initial DAG and all workflow rules.


21-21: LGTM! Local import is appropriate.

Moving the imports inside the function can help avoid potential circular dependencies.


23-25: LGTM! DAG env collection is correct.

The code correctly collects conda environments from the initial DAG and tracks job names to avoid reprocessing those rules later.


27-32: LGTM! The skip condition correctly addresses past review concerns.

The check rule.name in dag_jobs ensures that rules already processed via the DAG are not reprocessed, preventing the error cases from being triggered for DAG jobs.


66-66: LGTM! Correctly uses the unified environment set.

The change from deriving environments solely from DAG jobs to using the combined conda_envs set ensures that all conda environments (including those from non-DAG rules) are included in containerization.

Comment thread src/snakemake/deployment/containerize.py
@johanneskoester johanneskoester merged commit d092d8c into main Oct 14, 2025
60 checks passed
@johanneskoester johanneskoester deleted the fix/containerize-use-all-env-specs branch October 14, 2025 15:06
johanneskoester pushed a commit that referenced this pull request Oct 17, 2025
🤖 I have created a release *beep* *boop*
---


##
[9.13.3](v9.13.2...v9.13.3)
(2025-10-17)


### Bug Fixes

* corner case bug causing key error when module statement is used
without config directive; better error message when empty string is
defined as input or output file
([#3792](#3792))
([640c549](640c549))
* make `--containerize` consider conda env specs from all rules, not
just from dry-run DAG jobs
([#3783](#3783))
([d092d8c](d092d8c))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
kjohnsen pushed a commit to kjohnsen/snakemake that referenced this pull request Dec 15, 2025
…ot just from dry-run DAG jobs (snakemake#3783)

Since rules that depend on checkpoints aren't part of the initial
(dry-run) DAG, we previously didn't include their conda env specs in the
`--containerize` call. Now we do ;)

### QC
<!-- Make sure that you can tick the boxes below. -->

* [x] The PR contains a test case for the changes or the changes are
already covered by an existing test case.
* [x] The documentation (`docs/`) is updated to reflect the changes or
this is not necessary (e.g. if the change does neither modify the
language nor the behavior or functionalities of Snakemake).


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- New Features
- Containerization now discovers Conda environments from both executed
rules and imported modules (including checkpoints), producing a unified,
de-duplicated and consistently ordered environment set for image builds.
- Externally managed environments are explicitly skipped with a warning;
dynamic or non-file-based env definitions in non-initial rules are
validated and reported with clear errors.

- Tests
- Added end-to-end test covering checkpoints, module rules, multiple env
files, and Dockerfile generation to verify environment discovery and
containerization.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
kjohnsen pushed a commit to kjohnsen/snakemake that referenced this pull request Dec 15, 2025
🤖 I have created a release *beep* *boop*
---


##
[9.13.3](snakemake/snakemake@v9.13.2...v9.13.3)
(2025-10-17)


### Bug Fixes

* corner case bug causing key error when module statement is used
without config directive; better error message when empty string is
defined as input or output file
([snakemake#3792](snakemake#3792))
([640c549](snakemake@640c549))
* make `--containerize` consider conda env specs from all rules, not
just from dry-run DAG jobs
([snakemake#3783](snakemake#3783))
([d092d8c](snakemake@d092d8c))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants