Skip to content

fix: create local clone of git repos for source files from hosting providers#3643

Merged
johanneskoester merged 27 commits intomainfrom
fix/source-cache-git-clone
Jan 8, 2026
Merged

fix: create local clone of git repos for source files from hosting providers#3643
johanneskoester merged 27 commits intomainfrom
fix/source-cache-git-clone

Conversation

@johanneskoester
Copy link
Copy Markdown
Contributor

@johanneskoester johanneskoester commented Jun 23, 2025

Description

QC

  • The PR contains a test case for the changes or the changes are already covered by an existing test case.
  • The documentation (docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).

Summary by CodeRabbit

  • New Features

    • Add Git-hosted source support for importing files/modules from GitHub/GitLab with token-aware access and on-disk caching.
  • Improvements

    • Propagate and improve runtime caching for hosted sources.
    • Safer secret-free path handling and centralized path validation/formatting.
    • Streamlined Conda environment spec detection and validation.
  • Tests

    • Added a GitHub-import integration test (skipped by default).

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 23, 2025

Important

Review skipped

Review was skipped due to path filters

⛔ Files ignored due to path filters (1)
  • pyproject.toml is excluded by !pyproject.toml

CodeRabbit blocks several paths by default. You can override this behavior by explicitly including those paths in the path filters. For example, including **/dist/** will override the default block on the dist directory, by removing the pattern from both the lists.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Walkthrough

Adds git-backed hosting support and extends SourceFile APIs (check, format, endswith, replace_suffix); introduces HostedGitRepo and HostingProviderFile with auth/cache_path/checkout/open; propagates cache_path through SourceCache; updates conda env spec APIs and callers; centralizes I/O path checks.

Changes

Cohort / File(s) Change Summary
Core Git-backed hosting & SourceFile surface
src/snakemake/sourcecache.py
Adds HostedGitRepo; expands SourceFile abstract API (check, format, endswith, replace_suffix); implements HostingProviderFile (repo/path/host/cache_path, auth, hosted_repo, get_path_or_uri, mtime, _checkout, apply_wildcards, str); adds Local/Generic/LocalGit implementations and path-suffix utilities; runtime cache dir renamed to snakemake-runtime-cache.
Workflow integration
src/snakemake/workflow.py
Imports HostingProviderFile; assigns cache_path for hosting-backed snakefiles; treats hosting-backed files like local files for sys.path insertion and reuses computed snakefile path/URI.
SourceCache & runtime I/O
src/snakemake/sourcecache.py
Propagates cache_path to HostingProviderFile in _cache_entry; integrates HostingProviderFile.open into file resolution; preserves LocalGitFile behavior; adds __str__ for hosting-backed sources.
Conda env spec API & callers
src/snakemake/deployment/conda.py, src/snakemake/rules.py, src/snakemake/deployment/containerize.py
Simplifies CondaEnvFileSpec/CondaEnvDirSpec constructors (remove rule param); apply_wildcards now single-arg and returns Self; CondaEnvSpecType.from_spec updated (YAML/dir detection and LocalSourceFile handling); callers adjusted and validations added; containerize reorders detection before I/O.
Source path handling & secret propagation
src/snakemake/parser.py, src/snakemake/script/__init__.py
Calls to get_path_or_uri now use secret_free=True; script.get_source formats SourceFile via path.format(...) when wildcards/params provided.
I/O validation centralization
src/snakemake/io/__init__.py
Extracts inline path-warning logic into new public check(iofile, rule=None, is_storage=False) and delegates _IOFile.check to it.
Typing / utilities
src/snakemake/common/__init__.py
Adds return type annotation to smart_join.
Tests & examples
tests/tests.py, tests/test_sourcecache.py, tests/test_python_import_from_github_module/*, tests/tests_using_conda.py
Adds skipped integration test for importing from GitHub; new minimal test module files; updates tests for hosting behavior and adds a GitHub fetch test; minor test renames and imports.

Sequence Diagram

sequenceDiagram
    participant Workflow
    participant SourceCache
    participant HostingProviderFile
    participant HostedGitRepo
    participant RemoteGit

    Workflow->>SourceCache: request cache entry for source_file
    SourceCache->>HostingProviderFile: set cache_path (if hosted)
    Workflow->>HostingProviderFile: open() / get_path_or_uri()
    HostingProviderFile->>HostedGitRepo: ensure repo checkout / fetch(ref)
    HostedGitRepo->>RemoteGit: clone or fetch from remote host
    RemoteGit-->>HostedGitRepo: repository data
    HostedGitRepo-->>HostingProviderFile: provide checked-out file path
    HostingProviderFile-->>Workflow: return file handle or path
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • tedil
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 3.85% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: creating local clones of Git repositories for source files from hosting providers.
Description check ✅ Passed The description provides a QC checklist with both required boxes checked, confirming test coverage and documentation review, though it lacks detailed explanation of the changes.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
src/snakemake/sourcecache.py (1)

211-218: Consider cleanup on move failure

If shutil.move fails, the temporary directory might not be cleaned up properly since it was created with a custom prefix.

Consider wrapping the move operation in a try-except to ensure cleanup:

with tempfile.TemporaryDirectory(prefix=f"{self.repo_clone}.") as tmpdir:
    # the clone is not atomic, hence we do that in a temporary directory
    Repo.clone_from(repo_url, to_path=tmpdir)
-    # move is atomic, so we can safely move the directory to the final
-    # location
-    shutil.move(tmpdir, self.repo_clone)
+    # move is atomic, so we can safely move the directory to the final
+    # location
+    try:
+        shutil.move(tmpdir, self.repo_clone)
+    except Exception:
+        # Ensure cleanup even if move fails
+        raise
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 10b4ddf and 03e36a2.

📒 Files selected for processing (3)
  • src/snakemake/sourcecache.py (10 hunks)
  • src/snakemake/workflow.py (3 hunks)
  • tests/tests.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.py`: Do not try to improve formatting. Do not suggest type annotations for functions that are defined inside of functions or methods. Do not suggest type annotation of the `s...

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

  • tests/tests.py
  • src/snakemake/workflow.py
  • src/snakemake/sourcecache.py
🪛 Ruff (0.11.9)
src/snakemake/sourcecache.py

31-31: Undefined name TYPE_CHECKING

(F821)


381-385: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🪛 Pylint (3.3.7)
src/snakemake/sourcecache.py

[error] 31-31: Undefined variable 'TYPE_CHECKING'

(E0602)


[refactor] 232-238: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it

(R1705)


[refactor] 279-279: Too many instance attributes (8/7)

(R0902)


[refactor] 284-284: Too many arguments (8/5)

(R0913)


[refactor] 284-284: Too many positional arguments (8/5)

(R0917)

⏰ Context from checks skipped due to timeout of 90000ms (13)
  • GitHub Check: tests (10, macos-latest, py312)
  • GitHub Check: tests (10, windows-latest, py312)
  • GitHub Check: tests (9, macos-latest, py312)
  • GitHub Check: tests (6, windows-latest, py312)
  • GitHub Check: tests (9, windows-latest, py312)
  • GitHub Check: tests (8, windows-latest, py312)
  • GitHub Check: tests (7, windows-latest, py312)
  • GitHub Check: tests (4, windows-latest, py312)
  • GitHub Check: tests (4, macos-latest, py312)
  • GitHub Check: tests (3, windows-latest, py312)
  • GitHub Check: tests (5, windows-latest, py312)
  • GitHub Check: tests (1, windows-latest, py312)
  • GitHub Check: tests (2, windows-latest, py312)
🔇 Additional comments (8)
src/snakemake/sourcecache.py (4)

242-278: Well-structured checkout management

The implementation properly handles different checkout scenarios and includes good error recovery with fetch retry. The staleness tracking and assertion in source_path help catch programming errors early.


327-333: Good validation for host parameter

The validation ensures the host is provided as a domain name without protocol prefix, which prevents URL construction errors later. The error message provides clear context about what went wrong.


342-350: Proper cache path management

The implementation correctly manages cache paths with appropriate assertions and directory structure. The integration with SourceCache ensures the cache path is set before use.

Also applies to: 524-526


430-434: Clean authentication implementation

The auth properties provide a clean abstraction for authentication tokens, correctly formatting them with "@" suffix for URL construction.

Also applies to: 443-447

tests/tests.py (1)

2328-2331: New test for GitHub module imports

The test follows the established pattern and appropriately tests the new functionality for importing Python modules from GitHub-hosted sources.

src/snakemake/workflow.py (3)

124-124: LGTM: Import addition supports new functionality.

The HostingProviderFile import is correctly added to support the hosted git repository feature being introduced.


582-584: LGTM: Proper cache path setup for hosted git repositories.

The cache path assignment ensures that HostingProviderFile instances use the workflow's source cache, enabling proper caching of hosted git repositories.


610-615: LGTM: Extends module import support to hosted git repositories.

The modification correctly extends the existing sys.path insertion logic to include HostingProviderFile instances, enabling Python module imports from hosted git repositories while maintaining consistency with LocalSourceFile handling.

Comment thread src/snakemake/sourcecache.py
Comment thread src/snakemake/sourcecache.py Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
src/snakemake/sourcecache.py (1)

377-386: Preserve exception chain for better debugging.

When re-raising exceptions, use raise ... from e to preserve the original exception chain.

def _checkout(self):
    try:
        return self.hosted_repo.checkout(self.branch, self.commit or self.tag)
    except Exception as e:
        raise WorkflowError(
            "Failed to clone/checkout git repository "
            f"{self.host}/{self.repo} at {self.branch or self.commit or self.tag}.",
-            e,
-        )
+        ) from e
🧹 Nitpick comments (2)
src/snakemake/sourcecache.py (2)

190-240: Well-designed git repository management class.

The HostedGitRepo class provides solid functionality with atomic cloning operations and proper caching. The design prevents repository corruption through temporary directory usage and implements efficient checkout management.

Consider these improvements:

  1. More specific exception handling in clone operation - The current broad exception handling in checkout could mask specific git errors
  2. Explicit cleanup documentation - While tempfile.TemporaryDirectory handles cleanup automatically, documenting the atomic operation guarantees would help maintainability

242-277: Solid checkout management implementation.

The HostedGitRepoCheckout class effectively manages git checkouts with proper branch updating logic and stale state tracking.

Consider improving exception handling specificity.

The broad exception handling at line 261 could mask important git errors. Consider catching specific git exceptions instead:

-        except Exception:
+        except git.exc.GitCommandError:
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 03e36a2 and 61a9367.

📒 Files selected for processing (1)
  • src/snakemake/sourcecache.py (10 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.py`: Do not try to improve formatting. Do not suggest type annotations for functions that are defined inside of functions or methods. Do not suggest type annotation of the `s...

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

  • src/snakemake/sourcecache.py
🪛 Pylint (3.3.7)
src/snakemake/sourcecache.py

[refactor] 232-238: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it

(R1705)


[refactor] 279-279: Too many instance attributes (8/7)

(R0902)


[refactor] 284-284: Too many arguments (8/5)

(R0913)


[refactor] 284-284: Too many positional arguments (8/5)

(R0917)

🪛 Ruff (0.11.9)
src/snakemake/sourcecache.py

381-385: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

⏰ Context from checks skipped due to timeout of 90000ms (35)
  • GitHub Check: tests (7, windows-latest, py312)
  • GitHub Check: tests (8, ubuntu-latest, py311)
  • GitHub Check: tests (10, windows-latest, py312)
  • GitHub Check: tests (9, windows-latest, py312)
  • GitHub Check: tests (9, ubuntu-latest, py311)
  • GitHub Check: tests (10, ubuntu-latest, py311)
  • GitHub Check: tests (9, ubuntu-latest, py312)
  • GitHub Check: tests (10, ubuntu-latest, py312)
  • GitHub Check: tests (5, windows-latest, py312)
  • GitHub Check: tests (6, ubuntu-latest, py312)
  • GitHub Check: tests (9, macos-latest, py312)
  • GitHub Check: tests (8, windows-latest, py312)
  • GitHub Check: tests (7, ubuntu-latest, py311)
  • GitHub Check: tests (8, ubuntu-latest, py312)
  • GitHub Check: tests (6, windows-latest, py312)
  • GitHub Check: tests (7, ubuntu-latest, py312)
  • GitHub Check: tests (5, ubuntu-latest, py312)
  • GitHub Check: tests (2, windows-latest, py312)
  • GitHub Check: tests (6, ubuntu-latest, py311)
  • GitHub Check: tests (2, ubuntu-latest, py311)
  • GitHub Check: tests (5, ubuntu-latest, py311)
  • GitHub Check: tests (4, macos-latest, py312)
  • GitHub Check: tests (4, ubuntu-latest, py311)
  • GitHub Check: tests (5, macos-latest, py312)
  • GitHub Check: tests (4, windows-latest, py312)
  • GitHub Check: tests (3, windows-latest, py312)
  • GitHub Check: tests (2, ubuntu-latest, py312)
  • GitHub Check: tests (3, macos-latest, py312)
  • GitHub Check: tests (4, ubuntu-latest, py312)
  • GitHub Check: tests (3, ubuntu-latest, py312)
  • GitHub Check: tests (1, ubuntu-latest, py312)
  • GitHub Check: tests (3, ubuntu-latest, py311)
  • GitHub Check: tests (2, macos-latest, py312)
  • GitHub Check: tests (1, windows-latest, py312)
  • GitHub Check: tests (1, ubuntu-latest, py311)
🔇 Additional comments (11)
src/snakemake/sourcecache.py (11)

12-12: Import addition looks good.

This addresses the missing TYPE_CHECKING import flagged in previous reviews.


31-34: TYPE_CHECKING block properly implemented.

The conditional import block for git module is correctly structured to avoid runtime import costs while providing type hints.


286-293: Constructor enhancement looks good.

The addition of cache_path parameter and improved type hints enhance the class functionality. The optional parameters maintain backward compatibility while enabling better dependency injection.

The pylint warnings about argument count are acceptable given the functional requirements.


327-333: Excellent input validation for host format.

This validation prevents common user errors and provides clear guidance on the expected host format. The early validation approach follows fail-fast principles.


337-361: Well-designed property implementations.

The properties provide clean abstractions for authentication, cache management, and lazy repository initialization. The cache path setter automatically handling the subdirectory is a nice touch for encapsulation.


369-376: Good delegation to checkout system.

The methods properly delegate path and modification time queries to the checkout system, maintaining clean separation of concerns.


399-400: Constructor calls properly updated.

The addition of cache_path parameter maintains consistency with the updated constructor signature.

Also applies to: 415-416


425-434: Clean refactoring of GitHub authentication.

The refactoring properly delegates authentication to the base class while maintaining GitHub-specific token handling. The removal of redundant methods improves maintainability.


442-446: Consistent GitLab authentication refactoring.

The refactoring mirrors the GitHub implementation, maintaining consistency while providing GitLab-specific token handling.


491-491: Improved cache directory naming.

The change to "snakemake-runtime-cache" provides clearer identification and reduces potential conflicts with other tools.


524-526: Proper cache path initialization.

Setting the cache_path on HostingProviderFile instances ensures proper initialization before caching operations. This enables the dependency injection pattern used by the new git repository management system.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 23, 2025

Please format your code with pixi run format

durr1602 added a commit to durr1602/gyoza that referenced this pull request Jul 7, 2025
until we get something better, e.g. from snakemake/snakemake#3643
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ec416a3 and 353609b.

📒 Files selected for processing (3)
  • src/snakemake/sourcecache.py
  • src/snakemake/workflow.py
  • tests/tests.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/snakemake/workflow.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

  • tests/tests.py
  • src/snakemake/sourcecache.py
🧠 Learnings (7)
📓 Common learnings
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3263
File: snakemake/sourcecache.py:195-196
Timestamp: 2025-01-23T08:17:08.873Z
Learning: Including authentication tokens in Git repository URLs is accepted in the Snakemake codebase when using the HostedGitRepo class, as secure authentication handling through GitPython's credential helpers is not feasible in this context.
📚 Learning: 2024-10-06T14:09:54.370Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3117
File: tests/test_wrapper/Snakefile:11-11
Timestamp: 2024-10-06T14:09:54.370Z
Learning: Changes made within test cases, such as in `tests/test_wrapper/Snakefile`, are for testing purposes and do not require updates to the project documentation.

Applied to files:

  • tests/tests.py
📚 Learning: 2024-11-07T00:32:44.137Z
Learnt from: mbhall88
Repo: snakemake/snakemake PR: 3188
File: tests/test_script/scripts/test.sh:4-4
Timestamp: 2024-11-07T00:32:44.137Z
Learning: In test scripts within the Snakemake project, concise code is preferred over verbose error handling when simplicity suffices.

Applied to files:

  • tests/tests.py
📚 Learning: 2024-11-12T20:22:54.184Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3204
File: tests/test_cores_cluster/qsub:1-6
Timestamp: 2024-11-12T20:22:54.184Z
Learning: In the Snakemake codebase, the `tests/test_cores_cluster/qsub` script is a dummy script for testing, and input validation and error handling are not required in such scripts.

Applied to files:

  • tests/tests.py
📚 Learning: 2025-01-23T08:17:08.873Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3263
File: snakemake/sourcecache.py:195-196
Timestamp: 2025-01-23T08:17:08.873Z
Learning: Including authentication tokens in Git repository URLs is accepted in the Snakemake codebase when using the HostedGitRepo class, as secure authentication handling through GitPython's credential helpers is not feasible in this context.

Applied to files:

  • src/snakemake/sourcecache.py
📚 Learning: 2024-10-29T09:26:26.636Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3171
File: snakemake/cli.py:106-145
Timestamp: 2024-10-29T09:26:26.636Z
Learning: In the `snakemake` codebase, exception chaining (using `raise ... from ...`) is avoided due to backward compatibility reasons. Do not suggest using exception chaining in `snakemake` codebase.

Applied to files:

  • src/snakemake/sourcecache.py
📚 Learning: 2024-10-06T14:09:26.494Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3117
File: snakemake/deployment/conda.py:0-0
Timestamp: 2024-10-06T14:09:26.494Z
Learning: In `snakemake/deployment/conda.py`, the exception handling in the `_get_version` method is intentional. The method raises a `WorkflowError` when the version cannot be determined, and this behavior is desired. Do not suggest modifying this exception handling.

Applied to files:

  • src/snakemake/sourcecache.py
🧬 Code graph analysis (1)
src/snakemake/sourcecache.py (1)
src/snakemake/workflow.py (1)
  • source_path (1551-1573)
🪛 Ruff (0.14.10)
src/snakemake/sourcecache.py

266-266: Do not catch blind exception: Exception

(BLE001)


333-337: Avoid specifying long messages outside the exception class

(TRY003)


385-385: Do not catch blind exception: Exception

(BLE001)


386-390: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


386-390: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: tests (2, macos-latest, py313)
  • GitHub Check: tests (5, macos-latest, py313)
  • GitHub Check: tests (7, ubuntu-latest, py313)
  • GitHub Check: tests (4, macos-latest, py313)
  • GitHub Check: tests (3, macos-latest, py313)
  • GitHub Check: tests (2, ubuntu-latest, py311)
  • GitHub Check: tests (1, macos-latest, py313)
🔇 Additional comments (11)
tests/tests.py (2)

2495-2497: LGTM! Clean test addition.

The new test follows the established pattern and correctly tests the GitHub module import functionality introduced in this PR.


2710-2710: LGTM! Appropriate platform skip.

The decorator correctly skips this test on Windows due to filename length limitations, with a clear explanatory comment.

src/snakemake/sourcecache.py (9)

12-12: LGTM! Proper type checking import.

The TYPE_CHECKING import is correctly used to conditionally import the git module for type hints without adding a runtime dependency.

Also applies to: 32-34


195-245: LGTM! Solid git repository management implementation.

The HostedGitRepo class properly handles:

  • Lock-free cloning using temporary directories with atomic moves
  • Checkout caching to avoid redundant operations
  • Authentication via URL tokens (acceptable per coding guidelines)

Based on learnings, embedding authentication tokens in Git repository URLs is accepted in the Snakemake codebase.


247-282: LGTM! Well-designed checkout management.

The HostedGitRepoCheckout class handles:

  • Branch updates when repository existed previously
  • Fallback logic with fetch when initial checkout fails
  • Staleness tracking to prevent using outdated checkouts
  • Correct mtime implementation based on local clone

The bare Exception catch at line 266 provides robustness for the fallback fetch-and-checkout logic, which is acceptable here.


332-337: LGTM! Good validation for common user error.

The host validation correctly prevents users from including protocol prefixes, with a clear error message explaining the expected format.


346-366: LGTM! Clean property implementation.

The lazy initialization of hosted_repo is appropriate for expensive git operations, and the cache_path property correctly enforces that it's set before use (by SourceCache).


382-390: Exception handling follows project conventions.

The exception handling correctly wraps git errors in WorkflowError with context about which repository and ref failed. The absence of exception chaining (raise ... from e) is intentional, as per Snakemake's backward compatibility guidelines.


428-439: LGTM! Clean GitHub-specific implementation.

The auth property properly returns the token with "@" suffix when available, and the default host assignment simplifies usage for standard GitHub.


447-458: LGTM! Consistent GitLab implementation.

The auth property follows the same pattern as GithubFile, providing consistent authentication handling across hosting providers.


515-515: LGTM! Proper cache path propagation.

The runtime cache directory rename to "snakemake-runtime-cache" reduces naming collision risk, and setting cache_path for HostingProviderFile instances ensures the property is properly initialized before use.

Also applies to: 548-549

Comment thread src/snakemake/sourcecache.py Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/snakemake/sourcecache.py (1)

264-409: Critical: Missing abstract method implementation.

HostingProviderFile extends SourceFile which declares get_path_or_uri(self, secret_free: bool) as abstract (line 50), but this class does not implement it. While GithubFile and GitlabFile subclasses provide implementations, any code working directly with HostingProviderFile instances will fail with TypeError.

🔎 Proposed fix

If HostingProviderFile is intended to be abstract (only instantiated via subclasses), consider making it explicitly abstract by inheriting from ABC and decorating get_path_or_uri with @abstractmethod.

Alternatively, if it should be instantiable, implement the method:

+    def get_path_or_uri(self, secret_free: bool = True) -> str:
+        checkout = self._checkout()
+        return str(checkout.source_path(self.path))

However, based on the code structure, it appears this class is meant to be abstract, so the first approach is recommended.

🧹 Nitpick comments (1)
src/snakemake/sourcecache.py (1)

229-246: Optional: Move success returns to else blocks for clarity.

Static analysis suggests moving the return True statements to else blocks to make the success path more explicit.

🔎 Proposed refactor
 def ref_exists(self, ref: str):
     import git
 
     try:
         self.repo.git.rev_parse("--verify", ref)
-        return True
     except git.GitCommandError:
         return False
+    else:
+        return True

 def file_exists(self, path: str, ref: str):
     try:
         # Get the tree for the specific revision
         tree = self.repo.commit(ref).tree
         # Attempt to access the file by path
         tree[path]
-        return True
     except KeyError:
         return False
+    else:
+        return True
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 353609b and 343b4fd.

📒 Files selected for processing (2)
  • src/snakemake/sourcecache.py
  • tests/tests.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/tests.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

  • src/snakemake/sourcecache.py
🧠 Learnings (3)
📚 Learning: 2025-01-23T08:17:08.873Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3263
File: snakemake/sourcecache.py:195-196
Timestamp: 2025-01-23T08:17:08.873Z
Learning: Including authentication tokens in Git repository URLs is accepted in the Snakemake codebase when using the HostedGitRepo class, as secure authentication handling through GitPython's credential helpers is not feasible in this context.

Applied to files:

  • src/snakemake/sourcecache.py
📚 Learning: 2024-10-29T09:26:26.636Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3171
File: snakemake/cli.py:106-145
Timestamp: 2024-10-29T09:26:26.636Z
Learning: In the `snakemake` codebase, exception chaining (using `raise ... from ...`) is avoided due to backward compatibility reasons. Do not suggest using exception chaining in `snakemake` codebase.

Applied to files:

  • src/snakemake/sourcecache.py
📚 Learning: 2024-10-06T14:09:26.494Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3117
File: snakemake/deployment/conda.py:0-0
Timestamp: 2024-10-06T14:09:26.494Z
Learning: In `snakemake/deployment/conda.py`, the exception handling in the `_get_version` method is intentional. The method raises a `WorkflowError` when the version cannot be determined, and this behavior is desired. Do not suggest modifying this exception handling.

Applied to files:

  • src/snakemake/sourcecache.py
🪛 Ruff (0.14.10)
src/snakemake/sourcecache.py

234-234: Consider moving this statement to an else block

(TRY300)


244-244: Consider moving this statement to an else block

(TRY300)


313-317: Avoid specifying long messages outside the exception class

(TRY003)


372-374: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


372-374: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: tests (7, macos-latest, py313)
  • GitHub Check: tests (5, macos-latest, py313)
  • GitHub Check: tests (8, macos-latest, py313)
  • GitHub Check: tests (8, ubuntu-latest, py312)
  • GitHub Check: tests (4, macos-latest, py313)
  • GitHub Check: tests (1, macos-latest, py313)
🔇 Additional comments (6)
src/snakemake/sourcecache.py (6)

195-224: LGTM: Robust lock-free cloning design.

The lock-free cloning pattern using a temporary directory with atomic move is well-designed and prevents race conditions during concurrent access.


312-317: Good defensive validation for host parameter.

The check prevents users from accidentally including protocol prefixes in the host parameter, which would break git URL construction.


414-422: LGTM: Clean auth property implementation.

The auth property cleanly provides authentication credentials in the format expected by HostedGitRepo, with proper handling of missing tokens.


437-441: LGTM: Consistent auth implementation.

The auth property implementation mirrors GithubFile, providing consistency across hosting providers.


531-533: LGTM: Proper cache_path propagation.

Setting cache_path on HostingProviderFile instances before computing their cache path ensures the git repository cache is properly initialized.


604-605: LGTM: Clean delegation for git-hosted sources.

Delegating to source_file.open() for HostingProviderFile instances keeps the caching logic clean and leverages the new git-backed file access.

Comment thread src/snakemake/sourcecache.py
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (1)
src/snakemake/sourcecache.py (1)

415-427: Missing runtime import for git in exception handler.

git.GitCommandError on line 424 requires git to be imported at runtime, but it's only imported under TYPE_CHECKING. This will raise a NameError when the exception occurs.

🔎 Proposed fix
     def open(self) -> io.BytesIO:
+        import git
+
         if not self.hosted_repo.ref_exists(
             self.ref
         ) or not self.hosted_repo.file_exists(path=self.path, ref=self.ref):
             self.hosted_repo.fetch()
         try:
             return io.BytesIO(
                 self.hosted_repo.repo.git.show(f"{self.ref}:{self.path}").encode()
             )
         except git.GitCommandError as e:
             raise WorkflowError(
                 f"Failed to get cached git source file {self.repo}:{self.path}: {e}. "
             )
🧹 Nitpick comments (1)
src/snakemake/sourcecache.py (1)

304-304: Consider annotating shared cache with ClassVar.

The _hosted_repos dictionary is shared across all instances. Using ClassVar makes this intent explicit and satisfies type checkers.

🔎 Proposed fix
+from typing import ClassVar
...
-    _hosted_repos: Dict[str, HostedGitRepo] = {}
+    _hosted_repos: ClassVar[Dict[str, HostedGitRepo]] = {}
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 343b4fd and 0c16ad2.

📒 Files selected for processing (3)
  • src/snakemake/common/__init__.py
  • src/snakemake/deployment/conda.py
  • src/snakemake/sourcecache.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

  • src/snakemake/common/__init__.py
  • src/snakemake/sourcecache.py
  • src/snakemake/deployment/conda.py
🧠 Learnings (7)
📚 Learning: 2024-10-08T17:41:54.542Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 2985
File: tests/tests.py:2051-2051
Timestamp: 2024-10-08T17:41:54.542Z
Learning: In the Snakemake repository, avoid suggesting return type annotations for test functions.

Applied to files:

  • src/snakemake/common/__init__.py
📚 Learning: 2024-08-13T16:22:09.641Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3014
File: snakemake/workflow.py:1147-1147
Timestamp: 2024-08-13T16:22:09.641Z
Learning: Avoid suggesting type annotations for functions that are inside methods in the Snakemake codebase.

Applied to files:

  • src/snakemake/common/__init__.py
📚 Learning: 2024-10-08T17:41:54.542Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3014
File: snakemake/workflow.py:0-0
Timestamp: 2024-10-08T17:41:54.542Z
Learning: In the Snakemake project, type annotations are not necessary for internal helper functions.

Applied to files:

  • src/snakemake/common/__init__.py
📚 Learning: 2025-01-23T08:17:08.873Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3263
File: snakemake/sourcecache.py:195-196
Timestamp: 2025-01-23T08:17:08.873Z
Learning: Including authentication tokens in Git repository URLs is accepted in the Snakemake codebase when using the HostedGitRepo class, as secure authentication handling through GitPython's credential helpers is not feasible in this context.

Applied to files:

  • src/snakemake/sourcecache.py
📚 Learning: 2024-10-29T09:26:26.636Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3171
File: snakemake/cli.py:106-145
Timestamp: 2024-10-29T09:26:26.636Z
Learning: In the `snakemake` codebase, exception chaining (using `raise ... from ...`) is avoided due to backward compatibility reasons. Do not suggest using exception chaining in `snakemake` codebase.

Applied to files:

  • src/snakemake/sourcecache.py
📚 Learning: 2024-10-06T14:09:26.494Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3117
File: snakemake/deployment/conda.py:0-0
Timestamp: 2024-10-06T14:09:26.494Z
Learning: In `snakemake/deployment/conda.py`, the exception handling in the `_get_version` method is intentional. The method raises a `WorkflowError` when the version cannot be determined, and this behavior is desired. Do not suggest modifying this exception handling.

Applied to files:

  • src/snakemake/sourcecache.py
📚 Learning: 2024-10-11T13:12:35.827Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3132
File: snakemake/deployment/conda.py:85-88
Timestamp: 2024-10-11T13:12:35.827Z
Learning: In the `snakemake/deployment/conda.py` file, within the `Env` class, validation of `env_dir` occurs later in the code, so it's unnecessary to validate `env_dir` in the `__init__` method.

Applied to files:

  • src/snakemake/deployment/conda.py
🧬 Code graph analysis (3)
src/snakemake/common/__init__.py (1)
src/snakemake/sourcecache.py (1)
  • abspath (143-144)
src/snakemake/sourcecache.py (2)
src/snakemake/common/__init__.py (4)
  • LockFreeWritableFile (391-432)
  • is_local_file (131-132)
  • parse_uri (135-150)
  • smart_join (153-170)
src/snakemake/common/git.py (1)
  • split_git_path (7-14)
src/snakemake/deployment/conda.py (2)
src/snakemake/io/__init__.py (1)
  • file (459-466)
src/snakemake/sourcecache.py (5)
  • apply_wildcards (69-69)
  • apply_wildcards (110-111)
  • apply_wildcards (131-132)
  • apply_wildcards (179-186)
  • apply_wildcards (351-360)
🪛 Ruff (0.14.10)
src/snakemake/sourcecache.py

37-37: Undefined name Mapping

(F821)


267-267: Consider moving this statement to an else block

(TRY300)


277-277: Consider moving this statement to an else block

(TRY300)


304-304: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


385-389: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


385-389: Avoid specifying long messages outside the exception class

(TRY003)


425-427: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


425-427: Avoid specifying long messages outside the exception class

(TRY003)

src/snakemake/deployment/conda.py

893-893: Unused method argument: rule

(ARG002)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: tests (10, macos-latest, py313)
  • GitHub Check: tests (7, macos-latest, py313)
  • GitHub Check: tests (8, macos-latest, py313)
  • GitHub Check: tests (5, macos-latest, py313)
🔇 Additional comments (8)
src/snakemake/common/__init__.py (1)

153-170: LGTM!

The return type annotation -> str accurately reflects the function's behavior across all code paths.

src/snakemake/sourcecache.py (5)

68-79: LGTM!

The abstract apply_wildcards method enforces implementation across all SourceFile subclasses, and the join method correctly uses the base class's path for joining operations.


378-399: LGTM!

The hosted_repo property correctly caches instances and validates the host format. The exception handling follows the project's convention. Based on learnings, exception chaining is intentionally avoided in the Snakemake codebase.


473-496: LGTM!

The auth property implementations correctly format tokens for HTTPS Git URL authentication. Based on learnings, including authentication tokens in Git repository URLs is an accepted pattern in this codebase.


554-554: Runtime cache directory renamed.

The rename from "runtime-cache" to "snakemake-runtime-cache" adds a namespace prefix. Since the runtime cache is ephemeral (created as a TemporaryDirectory), this change should have no impact on existing workflows.


586-598: LGTM!

The _cache_entry method correctly propagates cache_path to HostingProviderFile instances, enabling git-backed source resolution.

src/snakemake/deployment/conda.py (2)

893-898: LGTM with note on unused parameter.

The refactored CondaEnvFileSpec correctly delegates to the SourceFile abstraction. The rule parameter in __init__ is unused but retained for API consistency with sibling classes (CondaEnvDirSpec, CondaEnvNameSpec), which is acceptable.


916-918: No action needed. The contains_wildcard function explicitly converts its parameter to a string using str(path), and SourceFile has a proper __str__ implementation that returns the path. The code works correctly with SourceFile objects.

Likely an incorrect or invalid review comment.

Comment thread src/snakemake/sourcecache.py Outdated
Comment thread src/snakemake/sourcecache.py Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/snakemake/deployment/conda.py (2)

926-931: Critical: CondaEnvDirSpec.apply_wildcards signature doesn't match abstract method.

This method has signature (self, wildcards, rule) but the abstract method at line 857 declares (self, wildcards) -> Self. This signature mismatch causes a TypeError at runtime when called from rules.py line 1238.

🐛 Proposed fix
-    def apply_wildcards(self, wildcards, rule):
+    def apply_wildcards(self, wildcards) -> Self:
         filepath = self.path.apply_wildcards(wildcards)
         if is_local_file(filepath):
             # Normalize 'file:///my/path.yml' to '/my/path.yml'
             filepath = parse_uri(filepath).uri_path
-        return CondaEnvDirSpec(filepath, rule)
+        return CondaEnvDirSpec(filepath, rule=self.path.rule)

964-965: Critical: CondaEnvNameSpec.apply_wildcards signature doesn't match abstract method.

This method has signature (self, wildcards, _) but the abstract method at line 857 declares (self, wildcards) -> Self. The unused _ parameter must be removed, and the return type annotation added.

🐛 Proposed fix
-    def apply_wildcards(self, wildcards, _):
+    def apply_wildcards(self, wildcards) -> Self:
         return CondaEnvNameSpec(apply_wildcards(self.name, wildcards))
🤖 Fix all issues with AI agents
In @src/snakemake/deployment/conda.py:
- Line 9: The code imports Self from typing which only exists on Python 3.11+;
add a conditional fallback to import Self from typing_extensions when
typing.Self isn't available so the module works on Python 3.7–3.10; update the
import block in src/snakemake/deployment/conda.py (and do the same in
src/snakemake/sourcecache.py) to try importing Self from typing and on
ImportError import Self from typing_extensions.

In @src/snakemake/sourcecache.py:
- Around line 464-466: The hosted_repo property is inconsistent: the getter
returns from the instance cache self._hosted_repos[self.repo] while the setter
writes to a different attribute self._hosted_repo, so setting has no effect; fix
by either (A) removing the setter if unused, or (B) make the setter update the
same cache key (assign hosted_repo to self._hosted_repos[self.repo] or
initialize self._hosted_repos if missing) so that property getter and setter
operate on the same storage; update or remove the hosted_repo.setter
accordingly.
🧹 Nitpick comments (1)
src/snakemake/sourcecache.py (1)

353-353: Annotate mutable class attribute with ClassVar.

This mutable class attribute is intentionally shared across instances as a cache. Adding ClassVar annotation makes this intent explicit and silences static analysis warnings.

♻️ Suggested fix
+from typing import ClassVar
 ...
-    _hosted_repos: Dict[str, HostedGitRepo] = {}
+    _hosted_repos: ClassVar[Dict[str, HostedGitRepo]] = {}
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0c16ad2 and 199b2dc.

📒 Files selected for processing (4)
  • src/snakemake/deployment/conda.py
  • src/snakemake/io/__init__.py
  • src/snakemake/rules.py
  • src/snakemake/sourcecache.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

  • src/snakemake/io/__init__.py
  • src/snakemake/rules.py
  • src/snakemake/deployment/conda.py
  • src/snakemake/sourcecache.py
🧠 Learnings (5)
📚 Learning: 2024-10-11T12:49:08.705Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3132
File: snakemake/rules.py:1105-1105
Timestamp: 2024-10-11T12:49:08.705Z
Learning: In `snakemake/rules.py`, within the `Rule` class, `self.basedir` is a custom class (not a `pathlib.Path` object) that has a `join()` method. Therefore, using `self.basedir.join(conda_env)` is appropriate in this context.

Applied to files:

  • src/snakemake/rules.py
📚 Learning: 2024-10-11T13:12:35.827Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3132
File: snakemake/deployment/conda.py:85-88
Timestamp: 2024-10-11T13:12:35.827Z
Learning: In the `snakemake/deployment/conda.py` file, within the `Env` class, validation of `env_dir` occurs later in the code, so it's unnecessary to validate `env_dir` in the `__init__` method.

Applied to files:

  • src/snakemake/rules.py
  • src/snakemake/deployment/conda.py
📚 Learning: 2024-10-06T14:09:26.494Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3117
File: snakemake/deployment/conda.py:0-0
Timestamp: 2024-10-06T14:09:26.494Z
Learning: In `snakemake/deployment/conda.py`, the exception handling in the `_get_version` method is intentional. The method raises a `WorkflowError` when the version cannot be determined, and this behavior is desired. Do not suggest modifying this exception handling.

Applied to files:

  • src/snakemake/deployment/conda.py
  • src/snakemake/sourcecache.py
📚 Learning: 2025-01-23T08:17:08.873Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3263
File: snakemake/sourcecache.py:195-196
Timestamp: 2025-01-23T08:17:08.873Z
Learning: Including authentication tokens in Git repository URLs is accepted in the Snakemake codebase when using the HostedGitRepo class, as secure authentication handling through GitPython's credential helpers is not feasible in this context.

Applied to files:

  • src/snakemake/sourcecache.py
📚 Learning: 2024-10-29T09:26:26.636Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3171
File: snakemake/cli.py:106-145
Timestamp: 2024-10-29T09:26:26.636Z
Learning: In the `snakemake` codebase, exception chaining (using `raise ... from ...`) is avoided due to backward compatibility reasons. Do not suggest using exception chaining in `snakemake` codebase.

Applied to files:

  • src/snakemake/sourcecache.py
🧬 Code graph analysis (3)
src/snakemake/rules.py (1)
src/snakemake/deployment/conda.py (6)
  • CondaEnvFileSpec (882-914)
  • CondaEnvDirSpec (917-957)
  • apply_wildcards (857-857)
  • apply_wildcards (886-888)
  • apply_wildcards (926-931)
  • apply_wildcards (964-965)
src/snakemake/deployment/conda.py (1)
src/snakemake/sourcecache.py (10)
  • replace_suffix (85-85)
  • replace_suffix (123-128)
  • replace_suffix (154-159)
  • replace_suffix (212-223)
  • replace_suffix (400-413)
  • apply_wildcards (79-79)
  • apply_wildcards (133-134)
  • apply_wildcards (164-165)
  • apply_wildcards (228-235)
  • apply_wildcards (418-427)
src/snakemake/sourcecache.py (3)
src/snakemake/common/__init__.py (4)
  • LockFreeWritableFile (391-432)
  • is_local_file (131-132)
  • parse_uri (135-150)
  • smart_join (153-170)
src/snakemake/common/git.py (1)
  • split_git_path (7-14)
src/snakemake/io/__init__.py (4)
  • apply_wildcards (799-839)
  • apply_wildcards (1175-1184)
  • open (397-416)
  • get (1913-1918)
🪛 Ruff (0.14.10)
src/snakemake/sourcecache.py

316-316: Consider moving this statement to an else block

(TRY300)


326-326: Consider moving this statement to an else block

(TRY300)


353-353: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


452-456: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


452-456: Avoid specifying long messages outside the exception class

(TRY003)


492-494: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


492-494: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (45)
  • GitHub Check: tests (9, windows-2022, py313)
  • GitHub Check: tests (10, ubuntu-latest, py311)
  • GitHub Check: tests (8, windows-2022, py313)
  • GitHub Check: tests (7, ubuntu-latest, py313)
  • GitHub Check: tests (10, ubuntu-latest, py313)
  • GitHub Check: tests (8, ubuntu-latest, py312)
  • GitHub Check: tests (10, windows-2022, py313)
  • GitHub Check: tests (10, ubuntu-latest, py312)
  • GitHub Check: tests (9, ubuntu-latest, py312)
  • GitHub Check: tests (8, ubuntu-latest, py313)
  • GitHub Check: tests (9, ubuntu-latest, py313)
  • GitHub Check: tests (8, ubuntu-latest, py311)
  • GitHub Check: tests (7, ubuntu-latest, py311)
  • GitHub Check: tests (7, windows-2022, py313)
  • GitHub Check: tests (9, ubuntu-latest, py311)
  • GitHub Check: tests (4, macos-latest, py313)
  • GitHub Check: tests (4, ubuntu-latest, py311)
  • GitHub Check: tests (6, windows-2022, py313)
  • GitHub Check: tests (3, ubuntu-latest, py311)
  • GitHub Check: tests (5, macos-latest, py313)
  • GitHub Check: tests (6, ubuntu-latest, py312)
  • GitHub Check: tests (3, ubuntu-latest, py313)
  • GitHub Check: tests (6, ubuntu-latest, py311)
  • GitHub Check: tests (4, ubuntu-latest, py312)
  • GitHub Check: tests (7, ubuntu-latest, py312)
  • GitHub Check: tests (5, ubuntu-latest, py312)
  • GitHub Check: tests (5, windows-2022, py313)
  • GitHub Check: tests (4, windows-2022, py313)
  • GitHub Check: tests (5, ubuntu-latest, py313)
  • GitHub Check: tests (3, windows-2022, py313)
  • GitHub Check: tests (5, ubuntu-latest, py311)
  • GitHub Check: tests (4, ubuntu-latest, py313)
  • GitHub Check: tests (2, windows-2022, py313)
  • GitHub Check: tests (6, macos-latest, py313)
  • GitHub Check: tests (2, ubuntu-latest, py313)
  • GitHub Check: tests (6, ubuntu-latest, py313)
  • GitHub Check: tests (1, ubuntu-latest, py312)
  • GitHub Check: tests (2, ubuntu-latest, py311)
  • GitHub Check: tests (3, macos-latest, py313)
  • GitHub Check: tests (2, macos-latest, py313)
  • GitHub Check: tests (3, ubuntu-latest, py312)
  • GitHub Check: tests (1, ubuntu-latest, py313)
  • GitHub Check: tests (1, windows-2022, py313)
  • GitHub Check: tests (2, ubuntu-latest, py312)
  • GitHub Check: tests (1, ubuntu-latest, py311)
🔇 Additional comments (15)
src/snakemake/io/__init__.py (2)

468-471: LGTM!

Clean refactoring that delegates to the new standalone check() function while maintaining the same interface.


953-1001: LGTM!

The new standalone check() function properly centralizes all file path validation logic. The implementation correctly handles the optional rule parameter for context and appropriately skips the double-slash check for storage files.

src/snakemake/sourcecache.py (9)

12-38: LGTM!

The imports are correctly organized, with the TYPE_CHECKING guard properly protecting the git import for type hints only.


60-116: LGTM!

The abstract method additions properly define the interface for all SourceFile subclasses. Using Self return type correctly enables proper type inference for subclass method chaining.


537-554: LGTM!

The GithubFile class correctly defaults the host and provides the auth property for git URL authentication.


556-578: LGTM!

The GitlabFile class correctly mirrors the GithubFile pattern with appropriate defaults and authentication handling.


656-658: LGTM!

Properly propagates cache_path to HostingProviderFile instances before computing their cache path, enabling the git-hosted file functionality.


729-730: LGTM!

Cleanly integrates git-hosted file support by delegating to HostingProviderFile.open() method.


196-272: LGTM!

The LocalGitFile class correctly implements all new abstract methods following the established patterns.


474-480: LGTM!

The mtime() implementation correctly accesses the underlying git.Repo via self.hosted_repo.repo to get the last commit date for the file path.


336-342: Undefined attribute self.repo - should be self.repo_name.

Line 287 stores the repo as self.repo_name, but line 338 references self.repo which doesn't exist. This will cause an AttributeError at runtime.

🐛 Proposed fix
         with self._tmpdir() as tmpdir:
             logger.info(
-                f"Fetching latest changes of {self.host}/{self.repo} to {self.repo_clone}"
+                f"Fetching latest changes of {self.host}/{self.repo_name} to {self.repo_clone}"
             )
             shutil.copytree(self.repo_clone, tmpdir)
             Repo(tmpdir).remotes.origin.fetch()
             shutil.move(tmpdir, self.repo_clone)

Likely an incorrect or invalid review comment.

src/snakemake/deployment/conda.py (3)

857-857: LGTM: Abstract method return type annotation is correct.

The addition of -> Self to the abstract method signature is appropriate and helps ensure type safety across implementations.

However, note that concrete implementations at lines 926 (CondaEnvDirSpec) and 964 (CondaEnvNameSpec) have signature mismatches with extra parameters—this is flagged separately as a critical issue.


883-888: LGTM: Clean refactor to simplified API.

The updated CondaEnvFileSpec implementation:

  • Simplifies the constructor to accept source_file directly
  • Correctly implements the abstract method signature
  • Uses self.__class__(...) for proper subclass support
  • Returns Self for fluent chaining

This aligns well with the SourceFile abstraction pattern.


138-145: LGTM: Cleaner delegation to SourceFile API.

The refactor to use self.file.replace_suffix([".yaml", ".yml"], suffix) is cleaner than manual prefix handling. The None check appropriately handles cases where the conda env file doesn't have the expected extension.

src/snakemake/rules.py (1)

1232-1238: Critical: Signature mismatch will cause runtime failures for DIR and NAME env specs.

The changes at lines 1232 and 1238 update the API calls to match the new CondaEnvFileSpec signature. However, at line 1238, conda_env.apply_wildcards(wildcards) is called polymorphically on three different spec types:

  • CondaEnvFileSpec (line 1232) - has apply_wildcards(self, wildcards) -> Self
  • CondaEnvNameSpec (line 1234) - has apply_wildcards(self, wildcards, _) in conda.py line 964 ✗
  • CondaEnvDirSpec (line 1236) - has apply_wildcards(self, wildcards, rule) in conda.py line 926 ✗

The call at line 1238 will raise TypeError: missing required positional argument when conda_env is a CondaEnvDirSpec or CondaEnvNameSpec.

The signatures in conda.py need to be fixed:

  • CondaEnvDirSpec.apply_wildcards should extract rule from self.path.rule instead of taking it as a parameter
  • CondaEnvNameSpec.apply_wildcards should remove the unused _ parameter

Both must match the abstract method signature declared at conda.py line 857.

⛔ Skipped due to learnings
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3132
File: snakemake/deployment/conda.py:85-88
Timestamp: 2024-10-11T13:12:35.827Z
Learning: In the `snakemake/deployment/conda.py` file, within the `Env` class, validation of `env_dir` occurs later in the code, so it's unnecessary to validate `env_dir` in the `__init__` method.
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3132
File: snakemake/rules.py:1105-1105
Timestamp: 2024-10-11T12:49:08.705Z
Learning: In `snakemake/rules.py`, within the `Rule` class, `self.basedir` is a custom class (not a `pathlib.Path` object) that has a `join()` method. Therefore, using `self.basedir.join(conda_env)` is appropriate in this context.

Comment thread src/snakemake/deployment/conda.py Outdated
Comment thread src/snakemake/sourcecache.py Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
src/snakemake/deployment/conda.py (3)

921-926: Add return type annotation for consistency.

The apply_wildcards method should include a -> Self return type annotation to match the abstract method signature on line 857 and be consistent with other implementations like CondaEnvFileSpec.

♻️ Proposed fix
-    def apply_wildcards(self, wildcards):
+    def apply_wildcards(self, wildcards) -> Self:
         filepath = apply_wildcards(self.path, wildcards)

959-960: Add return type annotation for consistency.

The apply_wildcards method should include a -> Self return type annotation to match the abstract method signature on line 857 and be consistent with other implementations.

♻️ Proposed fix
-    def apply_wildcards(self, wildcards):
+    def apply_wildcards(self, wildcards) -> Self:
         return CondaEnvNameSpec(apply_wildcards(self.name, wildcards))

136-151: Improve error message readability for consistency with other log statements.

The refactored _get_aux_file method calls self.file.replace_suffix([".yaml", ".yml"], suffix) on line 138, which is properly implemented across all SourceFile subclasses.

However, the error message on line 142 directly interpolates self.file, which produces a less user-friendly representation. Use self.file.get_path_or_uri(secret_free=True) instead to match the pattern already established elsewhere in this file (lines 324, 615, 638):

             if aux_file is None:
                 logger.warning(
-                    f"Conda environment file {self.file} does not end "
+                    f"Conda environment file {self.file.get_path_or_uri(secret_free=True)} does not end "
                     f"on .yaml or .yml. {omit_msg}"
                 )
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e0e2873 and 82c2c83.

📒 Files selected for processing (1)
  • src/snakemake/deployment/conda.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

  • src/snakemake/deployment/conda.py
🧠 Learnings (5)
📓 Common learnings
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3263
File: snakemake/sourcecache.py:195-196
Timestamp: 2025-01-23T08:17:08.873Z
Learning: Including authentication tokens in Git repository URLs is accepted in the Snakemake codebase when using the HostedGitRepo class, as secure authentication handling through GitPython's credential helpers is not feasible in this context.
📚 Learning: 2024-10-11T13:12:35.827Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3132
File: snakemake/deployment/conda.py:85-88
Timestamp: 2024-10-11T13:12:35.827Z
Learning: In the `snakemake/deployment/conda.py` file, within the `Env` class, validation of `env_dir` occurs later in the code, so it's unnecessary to validate `env_dir` in the `__init__` method.

Applied to files:

  • src/snakemake/deployment/conda.py
📚 Learning: 2024-10-06T14:09:26.494Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3117
File: snakemake/deployment/conda.py:0-0
Timestamp: 2024-10-06T14:09:26.494Z
Learning: In `snakemake/deployment/conda.py`, the exception handling in the `_get_version` method is intentional. The method raises a `WorkflowError` when the version cannot be determined, and this behavior is desired. Do not suggest modifying this exception handling.

Applied to files:

  • src/snakemake/deployment/conda.py
📚 Learning: 2025-09-05T16:17:21.298Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3726
File: src/snakemake/common/__init__.py:413-414
Timestamp: 2025-09-05T16:17:21.298Z
Learning: Snakemake requires Python 3.11 at runtime nowadays. Only imports need to work with Python 3.7 for compatibility reasons.

Applied to files:

  • src/snakemake/deployment/conda.py
📚 Learning: 2024-10-11T12:49:08.705Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3132
File: snakemake/rules.py:1105-1105
Timestamp: 2024-10-11T12:49:08.705Z
Learning: In `snakemake/rules.py`, within the `Rule` class, `self.basedir` is a custom class (not a `pathlib.Path` object) that has a `join()` method. Therefore, using `self.basedir.join(conda_env)` is appropriate in this context.

Applied to files:

  • src/snakemake/deployment/conda.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (45)
  • GitHub Check: tests (5, ubuntu-latest, py311)
  • GitHub Check: tests (6, ubuntu-latest, py313)
  • GitHub Check: tests (6, windows-2022, py313)
  • GitHub Check: tests (7, ubuntu-latest, py311)
  • GitHub Check: tests (9, ubuntu-latest, py312)
  • GitHub Check: tests (10, windows-2022, py313)
  • GitHub Check: tests (8, ubuntu-latest, py311)
  • GitHub Check: tests (10, ubuntu-latest, py312)
  • GitHub Check: tests (9, ubuntu-latest, py313)
  • GitHub Check: tests (7, ubuntu-latest, py313)
  • GitHub Check: tests (6, ubuntu-latest, py311)
  • GitHub Check: tests (9, ubuntu-latest, py311)
  • GitHub Check: tests (5, ubuntu-latest, py312)
  • GitHub Check: tests (5, ubuntu-latest, py313)
  • GitHub Check: tests (6, ubuntu-latest, py312)
  • GitHub Check: tests (9, windows-2022, py313)
  • GitHub Check: tests (10, ubuntu-latest, py313)
  • GitHub Check: tests (8, ubuntu-latest, py312)
  • GitHub Check: tests (10, ubuntu-latest, py311)
  • GitHub Check: tests (8, ubuntu-latest, py313)
  • GitHub Check: tests (5, windows-2022, py313)
  • GitHub Check: tests (7, ubuntu-latest, py312)
  • GitHub Check: tests (7, macos-latest, py313)
  • GitHub Check: tests (7, windows-2022, py313)
  • GitHub Check: tests (2, ubuntu-latest, py311)
  • GitHub Check: tests (4, windows-2022, py313)
  • GitHub Check: tests (3, macos-latest, py313)
  • GitHub Check: tests (4, ubuntu-latest, py311)
  • GitHub Check: tests (8, windows-2022, py313)
  • GitHub Check: tests (4, ubuntu-latest, py313)
  • GitHub Check: tests (4, macos-latest, py313)
  • GitHub Check: tests (3, ubuntu-latest, py312)
  • GitHub Check: tests (1, ubuntu-latest, py312)
  • GitHub Check: tests (4, ubuntu-latest, py312)
  • GitHub Check: tests (2, macos-latest, py313)
  • GitHub Check: tests (2, ubuntu-latest, py313)
  • GitHub Check: tests (3, ubuntu-latest, py313)
  • GitHub Check: tests (2, windows-2022, py313)
  • GitHub Check: tests (3, windows-2022, py313)
  • GitHub Check: tests (1, ubuntu-latest, py311)
  • GitHub Check: tests (1, ubuntu-latest, py313)
  • GitHub Check: tests (3, ubuntu-latest, py311)
  • GitHub Check: tests (1, windows-2022, py313)
  • GitHub Check: tests (2, ubuntu-latest, py312)
  • GitHub Check: apidocs
🔇 Additional comments (2)
src/snakemake/deployment/conda.py (2)

857-857: LGTM! Good addition of return type annotation.

Adding -> Self to the abstract method signature clarifies the fluent interface pattern and ensures consistency across implementations.


883-888: LGTM! Clean simplification of the implementation.

The refactored CondaEnvFileSpec properly:

  • Removes the unused rule parameter
  • Delegates wildcard expansion to the underlying source_file object
  • Returns the correct type with -> Self annotation
  • Maintains consistency with the abstract method signature

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/snakemake/workflow.py (1)

1616-1657: Align linemap/included keys with the compile filename to avoid failures for hosted snakefiles with tokens

The new include() behavior has two important aspects:

  • Good changes:

    • Initializing snakefile.cache_path when snakefile is a HostingProviderFile ensures that git-hosted sources have a well-defined cache root in the workflow’s SourceCache, which is required for the new HostedGitRepo-based cloning.
    • Treating HostingProviderFile like LocalSourceFile for sys.path insertion is consistent with the intention that hosted snakefiles behave like local modules once cloned.
  • Problematic interaction with linemaps and __code__.co_filename:

    • _included and linemaps are now keyed using snakefile.get_path_or_uri(secret_free=True).

    • The compiled code still uses snakefile.get_path_or_uri(secret_free=False) as the filename:

      exec(
          compile(code, snakefile.get_path_or_uri(secret_free=False), "exec"),
          self.globals,
      )
    • For local snakefiles these two strings are identical, but for hosted ones (e.g. GitHub/GitLab) secret_free=False may embed credentials (e.g. https://:[email protected]/...), while secret_free=True omits them.

    • Workflow.get_rule_source() relies on exact key equality:

      sourcefile = func.__code__.co_filename
      linemap = self.linemaps[sourcefile]

      With the current change, for token-bearing hosted snakefiles sourcefile will be the non-secret-free string, while self.linemaps is keyed by the secret-free variant, leading to a KeyError when attempting to populate rule.run_func_src.

    • Similarly, source_path() checks if calling_file in self._included using frame.f_back.f_code.co_filename, which will now miss for such hosted snakefiles and fall back to the less precise heuristic path calculation.

To keep secrets out of user-visible paths and maintain correct behavior, the filename passed to compile() should match the key used in _included and linemaps. A minimal fix is to compute a single safe filename once and use it consistently:

Suggested fix to keep filenames secret-free and consistent
     snakefile = infer_source_file(snakefile, basedir)

     if not self.modifier.is_module and snakefile in self.included:
         logger.info(f"Multiple includes of {snakefile} ignored")
         return
-    self._included[snakefile.get_path_or_uri(secret_free=True)] = snakefile
+    safe_snakefile_name = snakefile.get_path_or_uri(secret_free=True)
+    self._included[safe_snakefile_name] = snakefile
     self.included_stack.append(snakefile)

     default_target = self.default_target
     linemap: Dict[int, int] = dict()
-    self.linemaps[snakefile.get_path_or_uri(secret_free=True)] = linemap
+    self.linemaps[safe_snakefile_name] = linemap
@@
-        exec(
-            compile(code, snakefile.get_path_or_uri(secret_free=False), "exec"),
-            self.globals,
-        )
+        exec(
+            compile(code, safe_snakefile_name, "exec"),
+            self.globals,
+        )

This preserves the goal of not leaking tokens (because safe_snakefile_name is secret-free) and restores the invariant that func.__code__.co_filename matches the keys used in _included and linemaps, so get_rule_source() and source_path() continue to work for both local and hosted snakefiles, including those that require tokens.

🤖 Fix all issues with AI agents
In @src/snakemake/exceptions.py:
- Line 17: Remove the debugging call breakpoint() from
src/snakemake/exceptions.py (it’s the stray breakpoint() introduced in the
exception formatting/path) so execution is not paused in production; simply
delete the breakpoint() invocation and run the test suite to confirm no runtime
side-effects remain.

In @src/snakemake/sourcecache.py:
- Line 359: The log f-string in the HostedGitRepo fetch/update code uses a
nonexistent attribute self.repo; replace it with the correct attribute
self.repo_name (e.g., update the f-string that currently references self.repo to
use self.repo_name) so the log message reads the correct repository name and
avoids AttributeError.
- Around line 500-520: The except clause in the open method references
git.GitCommandError but git is only imported under TYPE_CHECKING, causing a
NameError at runtime; fix by importing the exception at runtime (e.g., add "from
git import GitCommandError" or "import git" at module scope) and update the
except to catch GitCommandError (or keep git.GitCommandError if you imported
git), ensuring the open method's except block correctly references a real
symbol.
- Around line 295-368: The lock-free clone and fetch can raise shutil.Error if
another process created/moved the destination concurrently; wrap the shutil.move
call in both __init__ (clone path where Repo.clone_from -> shutil.move(tmpdir,
self.repo_clone)) and fetch (shutil.move(tmpdir, self.repo_clone)) in try/except
handling shutil.Error: if the destination already exists, swallow the error and
refresh self._repo (e.g., set self._repo = Repo(self.repo_clone) or reopen after
a short retry), otherwise re-raise; ensure you import or reference shutil.Error
and keep the tmpdir cleanup path intact (methods to touch: __init__, fetch,
using _tmpdir, repo_clone, and _repo).
- Around line 471-492: The hosted_repo property accesses and mutates the shared
class-level dictionary _hosted_repos without synchronization and the hosted_repo
setter mistakenly assigns to self._hosted_repo instead of updating the shared
dict; fix by introducing a class-level threading.Lock (e.g., _hosted_repos_lock)
and wrap the check-create-assign sequence in the hosted_repo getter in a with
_hosted_repos_lock: block (or use _hosted_repos.setdefault(...) under the lock)
so creating/setting HostedGitRepo(self.repo, ...) is atomic, and change the
hosted_repo.setter to update self._hosted_repos[self.repo] = hosted_repo (under
the same lock) instead of assigning self._hosted_repo.
🧹 Nitpick comments (4)
tests/tests.py (1)

2495-2504: Re‑check whether the GitHub module import test should still be globally skipped

This new test exercises Python imports from a GitHub-hosted module but is unconditionally skipped with a reason stating that maintaining a full local GitHub repo is “unsupported for now”. Given this PR introduces git-hosted clones, please confirm whether:

  • the scenario in test_python_import_from_github_module is now expected to work, in which case the skip can likely be dropped (or turned into a conditional xfail), or
  • the test still targets functionality beyond the current implementation, in which case updating the skip reason to be more specific (e.g. referencing the remaining limitation) would help future maintainers.
src/snakemake/script/__init__.py (1)

960-965: Prefer explicit if/else over ifelse with ... in R preambles

The new R and RMarkdown preambles correctly distinguish between local and https?:// scriptdirs, but using:

is_url <- grepl("^https?://", snakemake@scriptdir)
file <- ifelse(is_url, file.path(snakemake@scriptdir, ...), ...)
if (!is_url) setwd(snakemake@scriptdir)
source(file)

is awkward because ifelse() eagerly evaluates both branches and you’re passing ... in two different positions. That’s harder to reason about and could behave unexpectedly if multiple path components are supplied.

A clearer and more robust pattern would be an explicit branch, e.g.:

Suggested R/RMarkdown preamble adjustment
-        is_url <- grepl("^https?://", snakemake@scriptdir)
-        file <- ifelse(is_url, file.path(snakemake@scriptdir, ...), ...)
-        if (!is_url) setwd(snakemake@scriptdir)
-        source(file)
+        is_url <- grepl("^https?://", snakemake@scriptdir)
+        if (is_url) {
+            file <- file.path(snakemake@scriptdir, ...)
+        } else {
+            file <- ...
+            setwd(snakemake@scriptdir)
+        }
+        source(file)

This keeps the existing behavior for local scripts, cleanly supports URL-based scriptdirs, and ensures ... is only expanded in the path construction relevant to the actual case.

Also applies to: 1073-1078

src/snakemake/sourcecache.py (2)

374-374: Mutable class attribute without ClassVar annotation.

The static analysis correctly identifies that _hosted_repos is a mutable class attribute that should be annotated with typing.ClassVar to clarify it's shared across all instances.

♻️ Suggested fix
+from typing import TYPE_CHECKING, Any, ClassVar, Dict, List, Optional
...
-    _hosted_repos: Dict[str, HostedGitRepo] = {}
+    _hosted_repos: ClassVar[Dict[str, HostedGitRepo]] = {}

460-469: Assert in property getter may be problematic for debugging.

The cache_path property uses an assertion that will fail with a generic AssertionError if _cache_path is not set. This could be confusing when debugging. Consider raising a more descriptive error:

♻️ Optional improvement
     @property
     def cache_path(self) -> Path:
-        assert (
-            self._cache_path
-        ), "bug: cache_path not set, should be done by SourceCache"
-
+        if not self._cache_path:
+            raise RuntimeError(
+                "bug: cache_path not set, should be done by SourceCache"
+            )
         return self._cache_path
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 82c2c83 and 08d195d.

📒 Files selected for processing (9)
  • src/snakemake/deployment/conda.py
  • src/snakemake/deployment/containerize.py
  • src/snakemake/exceptions.py
  • src/snakemake/parser.py
  • src/snakemake/rules.py
  • src/snakemake/script/__init__.py
  • src/snakemake/sourcecache.py
  • src/snakemake/workflow.py
  • tests/tests.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

  • src/snakemake/deployment/containerize.py
  • src/snakemake/workflow.py
  • src/snakemake/parser.py
  • src/snakemake/exceptions.py
  • src/snakemake/deployment/conda.py
  • src/snakemake/script/__init__.py
  • src/snakemake/rules.py
  • tests/tests.py
  • src/snakemake/sourcecache.py
🧠 Learnings (12)
📓 Common learnings
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3263
File: snakemake/sourcecache.py:195-196
Timestamp: 2025-01-23T08:17:08.873Z
Learning: Including authentication tokens in Git repository URLs is accepted in the Snakemake codebase when using the HostedGitRepo class, as secure authentication handling through GitPython's credential helpers is not feasible in this context.
📚 Learning: 2024-10-11T13:12:35.827Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3132
File: snakemake/deployment/conda.py:85-88
Timestamp: 2024-10-11T13:12:35.827Z
Learning: In the `snakemake/deployment/conda.py` file, within the `Env` class, validation of `env_dir` occurs later in the code, so it's unnecessary to validate `env_dir` in the `__init__` method.

Applied to files:

  • src/snakemake/deployment/containerize.py
  • src/snakemake/deployment/conda.py
  • src/snakemake/rules.py
📚 Learning: 2024-10-06T14:09:26.494Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3117
File: snakemake/deployment/conda.py:0-0
Timestamp: 2024-10-06T14:09:26.494Z
Learning: In `snakemake/deployment/conda.py`, the exception handling in the `_get_version` method is intentional. The method raises a `WorkflowError` when the version cannot be determined, and this behavior is desired. Do not suggest modifying this exception handling.

Applied to files:

  • src/snakemake/deployment/containerize.py
  • src/snakemake/deployment/conda.py
  • src/snakemake/sourcecache.py
📚 Learning: 2024-10-11T12:49:08.705Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3132
File: snakemake/rules.py:1105-1105
Timestamp: 2024-10-11T12:49:08.705Z
Learning: In `snakemake/rules.py`, within the `Rule` class, `self.basedir` is a custom class (not a `pathlib.Path` object) that has a `join()` method. Therefore, using `self.basedir.join(conda_env)` is appropriate in this context.

Applied to files:

  • src/snakemake/deployment/containerize.py
  • src/snakemake/deployment/conda.py
  • src/snakemake/rules.py
📚 Learning: 2025-08-28T13:17:37.693Z
Learnt from: Hocnonsense
Repo: snakemake/snakemake PR: 3714
File: src/snakemake/workflow.py:637-665
Timestamp: 2025-08-28T13:17:37.693Z
Learning: In Snakemake, rule.module_globals["__name__"] always exists and defaults to "snakemake.workflow" when not modified, so direct dictionary access is safe without needing .get().

Applied to files:

  • src/snakemake/workflow.py
📚 Learning: 2025-09-17T04:03:59.943Z
Learnt from: Hocnonsense
Repo: snakemake/snakemake PR: 3714
File: src/snakemake/modules.py:236-241
Timestamp: 2025-09-17T04:03:59.943Z
Learning: In Snakemake's WorkflowModifier, rule filtering via rule_whitelist and rule_exclude_list happens upstream during rule creation in the rule() decorator. The avail_rulename() method returns None for filtered-out rules, causing the entire rule creation to be skipped. This means child_modifier.rule_proxies only contains rules that passed the filtering, so inherit_rule_proxies doesn't need additional filtering logic.

Applied to files:

  • src/snakemake/workflow.py
📚 Learning: 2024-10-06T14:09:54.370Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3117
File: tests/test_wrapper/Snakefile:11-11
Timestamp: 2024-10-06T14:09:54.370Z
Learning: Changes made within test cases, such as in `tests/test_wrapper/Snakefile`, are for testing purposes and do not require updates to the project documentation.

Applied to files:

  • src/snakemake/parser.py
  • tests/tests.py
📚 Learning: 2025-01-14T14:04:30.554Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3107
File: snakemake/logging.py:510-515
Timestamp: 2025-01-14T14:04:30.554Z
Learning: In Snakemake's logging handlers, KeyboardInterrupt and SystemExit exceptions should be ignored (pass) rather than re-raised, as these are control flow signals that should not disrupt the logging process. The exception variable in the final except block should be kept for potential debugging purposes.

Applied to files:

  • src/snakemake/exceptions.py
📚 Learning: 2025-09-05T16:17:21.298Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3726
File: src/snakemake/common/__init__.py:413-414
Timestamp: 2025-09-05T16:17:21.298Z
Learning: Snakemake requires Python 3.11 at runtime nowadays. Only imports need to work with Python 3.7 for compatibility reasons.

Applied to files:

  • src/snakemake/deployment/conda.py
📚 Learning: 2024-11-07T00:32:44.137Z
Learnt from: mbhall88
Repo: snakemake/snakemake PR: 3188
File: tests/test_script/scripts/test.sh:4-4
Timestamp: 2024-11-07T00:32:44.137Z
Learning: In test scripts within the Snakemake project, concise code is preferred over verbose error handling when simplicity suffices.

Applied to files:

  • tests/tests.py
📚 Learning: 2025-01-23T08:17:08.873Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3263
File: snakemake/sourcecache.py:195-196
Timestamp: 2025-01-23T08:17:08.873Z
Learning: Including authentication tokens in Git repository URLs is accepted in the Snakemake codebase when using the HostedGitRepo class, as secure authentication handling through GitPython's credential helpers is not feasible in this context.

Applied to files:

  • src/snakemake/sourcecache.py
📚 Learning: 2024-10-29T09:26:26.636Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3171
File: snakemake/cli.py:106-145
Timestamp: 2024-10-29T09:26:26.636Z
Learning: In the `snakemake` codebase, exception chaining (using `raise ... from ...`) is avoided due to backward compatibility reasons. Do not suggest using exception chaining in `snakemake` codebase.

Applied to files:

  • src/snakemake/sourcecache.py
🧬 Code graph analysis (8)
src/snakemake/deployment/containerize.py (2)
src/snakemake/deployment/conda.py (2)
  • CondaEnvSpecType (986-1009)
  • from_spec (992-1009)
src/snakemake/sourcecache.py (1)
  • infer_source_file (607-637)
src/snakemake/workflow.py (1)
src/snakemake/sourcecache.py (11)
  • HostingProviderFile (370-560)
  • cache_path (460-465)
  • cache_path (468-469)
  • infer_source_file (607-637)
  • get_path_or_uri (66-66)
  • get_path_or_uri (147-148)
  • get_path_or_uri (183-184)
  • get_path_or_uri (258-259)
  • get_path_or_uri (575-579)
  • get_path_or_uri (594-604)
  • LocalSourceFile (161-209)
src/snakemake/parser.py (1)
src/snakemake/sourcecache.py (6)
  • get_path_or_uri (66-66)
  • get_path_or_uri (147-148)
  • get_path_or_uri (183-184)
  • get_path_or_uri (258-259)
  • get_path_or_uri (575-579)
  • get_path_or_uri (594-604)
src/snakemake/deployment/conda.py (2)
src/snakemake/sourcecache.py (21)
  • replace_suffix (89-91)
  • replace_suffix (132-139)
  • replace_suffix (168-175)
  • replace_suffix (231-244)
  • replace_suffix (424-439)
  • get_path_or_uri (66-66)
  • get_path_or_uri (147-148)
  • get_path_or_uri (183-184)
  • get_path_or_uri (258-259)
  • get_path_or_uri (575-579)
  • get_path_or_uri (594-604)
  • format (80-80)
  • format (144-145)
  • format (180-181)
  • format (249-256)
  • endswith (86-86)
  • endswith (129-130)
  • endswith (165-166)
  • endswith (228-229)
  • endswith (421-422)
  • LocalSourceFile (161-209)
src/snakemake/common/__init__.py (2)
  • is_local_file (131-132)
  • parse_uri (135-150)
src/snakemake/script/__init__.py (1)
src/snakemake/sourcecache.py (11)
  • get_path_or_uri (66-66)
  • get_path_or_uri (147-148)
  • get_path_or_uri (183-184)
  • get_path_or_uri (258-259)
  • get_path_or_uri (575-579)
  • get_path_or_uri (594-604)
  • format (80-80)
  • format (144-145)
  • format (180-181)
  • format (249-256)
  • infer_source_file (607-637)
src/snakemake/rules.py (1)
src/snakemake/deployment/conda.py (8)
  • CondaEnvFileSpec (882-914)
  • CondaEnvSpecType (986-1009)
  • CondaEnvNameSpec (955-983)
  • CondaEnvDirSpec (917-952)
  • apply_wildcards (857-857)
  • apply_wildcards (886-888)
  • apply_wildcards (921-926)
  • apply_wildcards (959-960)
tests/tests.py (1)
tests/common.py (2)
  • run (160-546)
  • dpath (38-41)
src/snakemake/sourcecache.py (5)
src/snakemake/utils.py (1)
  • format (523-567)
src/snakemake/common/__init__.py (3)
  • is_local_file (131-132)
  • parse_uri (135-150)
  • smart_join (153-170)
src/snakemake/exceptions.py (1)
  • SourceFileError (201-203)
src/snakemake/common/git.py (1)
  • split_git_path (7-14)
src/snakemake/io/__init__.py (4)
  • apply_wildcards (799-839)
  • apply_wildcards (1175-1184)
  • open (397-416)
  • get (1913-1918)
🪛 Ruff (0.14.10)
src/snakemake/deployment/containerize.py

43-47: Avoid specifying long messages outside the exception class

(TRY003)

src/snakemake/exceptions.py

17-17: Trace found: breakpoint used

(T100)

src/snakemake/rules.py

1240-1240: Avoid specifying long messages outside the exception class

(TRY003)

src/snakemake/sourcecache.py

337-337: Consider moving this statement to an else block

(TRY300)


347-347: Consider moving this statement to an else block

(TRY300)


374-374: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


478-482: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


478-482: Avoid specifying long messages outside the exception class

(TRY003)


518-520: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


518-520: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (40)
  • GitHub Check: tests (7, ubuntu-latest, py313)
  • GitHub Check: apidocs
  • GitHub Check: tests (7, ubuntu-latest, py312)
  • GitHub Check: tests (9, ubuntu-latest, py313)
  • GitHub Check: tests (10, ubuntu-latest, py311)
  • GitHub Check: tests (10, windows-2022, py313)
  • GitHub Check: tests (8, windows-2022, py313)
  • GitHub Check: tests (10, macos-latest, py313)
  • GitHub Check: tests (8, ubuntu-latest, py311)
  • GitHub Check: tests (8, ubuntu-latest, py312)
  • GitHub Check: tests (9, ubuntu-latest, py311)
  • GitHub Check: tests (4, ubuntu-latest, py312)
  • GitHub Check: tests (7, macos-latest, py313)
  • GitHub Check: tests (3, ubuntu-latest, py313)
  • GitHub Check: tests (6, macos-latest, py313)
  • GitHub Check: tests (10, ubuntu-latest, py312)
  • GitHub Check: tests (6, ubuntu-latest, py312)
  • GitHub Check: tests (4, ubuntu-latest, py311)
  • GitHub Check: tests (3, windows-2022, py313)
  • GitHub Check: tests (3, ubuntu-latest, py312)
  • GitHub Check: tests (8, macos-latest, py313)
  • GitHub Check: tests (10, ubuntu-latest, py313)
  • GitHub Check: tests (5, windows-2022, py313)
  • GitHub Check: tests (7, windows-2022, py313)
  • GitHub Check: tests (4, ubuntu-latest, py313)
  • GitHub Check: tests (6, ubuntu-latest, py313)
  • GitHub Check: tests (8, ubuntu-latest, py313)
  • GitHub Check: tests (5, ubuntu-latest, py311)
  • GitHub Check: tests (5, ubuntu-latest, py312)
  • GitHub Check: tests (4, windows-2022, py313)
  • GitHub Check: tests (3, ubuntu-latest, py311)
  • GitHub Check: tests (6, ubuntu-latest, py311)
  • GitHub Check: tests (2, macos-latest, py313)
  • GitHub Check: tests (1, windows-2022, py313)
  • GitHub Check: tests (1, ubuntu-latest, py312)
  • GitHub Check: tests (1, ubuntu-latest, py313)
  • GitHub Check: tests (2, ubuntu-latest, py312)
  • GitHub Check: tests (1, ubuntu-latest, py311)
  • GitHub Check: tests (2, ubuntu-latest, py311)
  • GitHub Check: tests (2, ubuntu-latest, py313)
🔇 Additional comments (23)
src/snakemake/parser.py (1)

1356-1364: Using secret_free=True for Snakefile.path is appropriate

This keeps error messages and rule metadata free of embedded tokens while leaving actual file access via workflow.sourcecache.open(path) unchanged. No functional issues spotted.

tests/tests.py (1)

1709-1715: Windows skip on test_module_with_script looks reasonable

Marking this test with @skip_on_windows to avoid long-filename issues is consistent with other tests and avoids known Windows-specific path problems.

src/snakemake/deployment/containerize.py (1)

32-51: Conda env spec classification refactor is sound

Computing spec_type = CondaEnvSpecType.from_spec(rule.conda_env) before infer_source_file(...) keeps the FILE/NAME/DIR detection tied to the original spec (string/SourceFile/Path) and plays nicely with the newer SourceFile subclasses. The subsequent FILE-only check and error path remain correct.

src/snakemake/script/__init__.py (1)

660-664: Secret‑free script paths and get_source wildcard handling look correct

  • ScriptBase.local_path now using self.path.get_path_or_uri(secret_free=True) is safe under assert self.is_local and avoids ever embedding credentials in path strings.
  • All language preambles (Python/R/RMarkdown/Julia/Rust/Bash) now derive scriptdir via get_path_or_uri(secret_free=True), which is exactly what you want for hosted/git-backed sources: actual I/O still goes through SourceCache and the SourceFile objects, while human-visible paths are sanitized.
  • get_source()’s new logic:
    • Calls path.format(wildcards=..., params=...) when path is a SourceFile, letting LocalSourceFile, HostingProviderFile, etc. control how wildcards are applied.
    • Otherwise formats the string and then wraps it once via infer_source_file.
    • Finally resolves basedir and calls infer_source_file(path, basedir) a single time.
      This is a clean way to support the new hosted/git-backed types without double-wrapping or losing type information.

No functional issues spotted in these areas.

Also applies to: 988-989, 1100-1102, 1186-1188, 1256-1257, 1554-1557, 1648-1659

src/snakemake/rules.py (3)

1219-1219: LGTM!

The type assertion at line 1219 ensures conda_env is of valid type before proceeding with spec type determination. This is good defensive programming that will catch issues early.


1233-1238: LGTM!

The simplified constructor calls for CondaEnvFileSpec and CondaEnvDirSpec align well with the updated signatures in conda.py. The type assertion at line 1235 ensures conda_env is a string before constructing a CondaEnvNameSpec.


1239-1242: LGTM!

The explicit error handling for unsupported spec types (lines 1239-1240) is a good defensive measure. The updated apply_wildcards call (line 1242) correctly uses the simplified single-argument signature.

src/snakemake/deployment/conda.py (6)

136-151: LGTM!

The _get_aux_file method now correctly uses the replace_suffix method from the SourceFile interface. The logic properly handles the case where the suffix replacement returns None (when the file doesn't end with .yaml or .yml), and the existence check via self.workflow.sourcecache.exists(aux_file) is appropriate.


437-437: LGTM!

Using secret_free=True when obtaining the path for local source files is the correct approach here, as this path is used for creating conda environments and should not contain any secrets.


883-888: LGTM!

The simplified CondaEnvFileSpec constructor now directly accepts a source_file parameter, and the apply_wildcards method correctly uses the format method of the source file with keyword argument unpacking.


918-926: LGTM!

The CondaEnvDirSpec constructor and apply_wildcards method are correctly updated. The path is stored as a string, and wildcard application properly handles local file URI normalization.


959-960: LGTM!

The apply_wildcards method signature is correctly simplified to take only wildcards as an argument, consistent with the other spec types.


991-1009: LGTM!

The from_spec method now properly distinguishes between FILE, DIR, and NAME spec types. The logic correctly:

  1. Checks for .yaml/.yml extensions to identify FILE specs
  2. Uses LocalSourceFile path check to identify DIR specs
  3. Falls back to NAME for other cases
src/snakemake/sourcecache.py (10)

38-58: LGTM!

The helper functions _format_or_none and _replace_suffix are clean utilities that handle None values appropriately and support the new suffix replacement functionality across SourceFile subclasses.


61-122: LGTM!

The SourceFile ABC is properly extended with new abstract methods (format, endswith, replace_suffix) that enable consistent suffix-based operations across all source file types. The simplify_path method now has a sensible default implementation.


125-159: LGTM!

Both GenericSourceFile and LocalSourceFile correctly implement the new abstract methods (endswith, replace_suffix, format) using the shared _replace_suffix helper. The implementations are consistent and properly handle None returns when suffix doesn't match.

Also applies to: 161-210


212-256: LGTM!

LocalGitFile correctly implements the new abstract methods. The format method properly handles wildcards for all fields including the optional tag, ref, and commit using _format_or_none.


421-443: LGTM!

The endswith, replace_suffix, check, and apply_wildcards methods for HostingProviderFile are properly implemented and maintain consistency with other SourceFile subclasses.


563-573: LGTM!

The GithubFile class correctly sets a default host and implements the auth property to provide token-based authentication for Git operations.


582-592: LGTM!

The GitlabFile class mirrors the GithubFile pattern with its own auth property implementation.


650-650: LGTM!

The runtime cache directory is appropriately renamed to snakemake-runtime-cache.


682-685: LGTM!

Setting cache_path on HostingProviderFile instances before computing cache paths ensures the git-backed infrastructure is properly configured.


755-756: LGTM!

The _open method correctly routes HostingProviderFile instances to use their own open() method for git-backed file access.

Comment thread src/snakemake/exceptions.py Outdated
Comment thread src/snakemake/sourcecache.py
Comment thread src/snakemake/sourcecache.py Outdated
Comment thread src/snakemake/sourcecache.py Outdated
Comment thread src/snakemake/sourcecache.py Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Fix all issues with AI agents
In @src/snakemake/sourcecache.py:
- Around line 472-488: The shared registry _hosted_repos in the hosted_repo
property is currently keyed only by self.repo, which can cause wrong
HostedGitRepo instances when different HostingProviderFile objects use the same
repo but different cache_path; change the registry key to include both repo and
cache_path (e.g., tuple of self.repo and str(self.cache_path)) or alternatively
validate that an existing entry for self.repo has the same cache_path before
returning it; update all accesses to _hosted_repos in the hosted_repo method
(lookup, assignment, and KeyError handling) to use the combined key or to
perform the cache_path consistency check and raise/replace accordingly so each
unique repo+cache_path gets its own HostedGitRepo.
- Line 374: The _hosted_repos registry is a class-level dictionary and should be
annotated as a typing.ClassVar to indicate it is shared across instances; update
the declaration of _hosted_repos to use ClassVar[Dict[str, HostedGitRepo]] and
add ClassVar to the imports from typing (or typing_extensions if used), ensuring
the symbol name _hosted_repos and the HostedGitRepo type are preserved.
- Around line 508-520: The open method in src/snakemake/sourcecache.py
references git.GitCommandError at runtime but git is only imported inside the
TYPE_CHECKING block, causing a NameError; fix by importing git for runtime
use—either add a top-level "import git" to the module imports or (preferred to
avoid startup cost) add a local "import git" at the start of the open method
(which references hosted_repo, ref, path, and git.GitCommandError) so the
exception class and git.show call are available at runtime.
- Around line 351-363: The current fetch method using shutil.copytree and
shutil.move around Repo(tmpdir).remotes.origin.fetch() is broken (tmpdir is
precreated, destination exists) and inefficient; replace the copy-fetch-move
approach with a direct git fetch against the existing working clone: open the
repository at self.repo_clone via Repo(self.repo_clone) and call
remotes.origin.fetch() with proper try/except to log failures via logger, and
add a fallback path that reclones into a temporary directory and atomically
swaps in only if self.repo_clone is missing or corrupted (use self._tmpdir for
the clone target and shutil.replace or os.replace to swap), removing
shutil.copytree/shutil.move usage to prevent failures and corruption.
- Around line 317-327: The clone block using self._tmpdir(), Repo.clone_from and
shutil.move is unsafe: moving a TemporaryDirectory-managed path prevents
cleanup, and shutil.move can fail if self.repo_clone is created concurrently;
replace the pattern by creating a distinct temporary directory path (not the
TemporaryDirectory's root) for the bare clone, clone into that path, then use
os.rename (or atomic os.replace/rename) to move the temp path into
self.repo_clone while catching FileExistsError to handle the race (if
destination exists, delete/cleanup the temp path and proceed to open existing
repo), and ensure the TemporaryDirectory/context is not moved so it can always
be cleaned up; update the logic around self._existed_before and setting
self._repo = Repo(self.repo_clone) accordingly.
🧹 Nitpick comments (1)
src/snakemake/sourcecache.py (1)

650-650: Runtime cache directory renamed.

The runtime cache directory name changed from "runtime-cache" to "snakemake-runtime-cache". This is a minor breaking change if users or scripts reference the old directory name directly.

Consider documenting this change in migration notes or deprecation warnings if backward compatibility is important.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 08d195d and c0a7796.

📒 Files selected for processing (1)
  • src/snakemake/sourcecache.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

  • src/snakemake/sourcecache.py
🧠 Learnings (4)
📓 Common learnings
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3263
File: snakemake/sourcecache.py:195-196
Timestamp: 2025-01-23T08:17:08.873Z
Learning: Including authentication tokens in Git repository URLs is accepted in the Snakemake codebase when using the HostedGitRepo class, as secure authentication handling through GitPython's credential helpers is not feasible in this context.
📚 Learning: 2024-10-29T09:26:26.636Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3171
File: snakemake/cli.py:106-145
Timestamp: 2024-10-29T09:26:26.636Z
Learning: In the `snakemake` codebase, exception chaining (using `raise ... from ...`) is avoided due to backward compatibility reasons. Do not suggest using exception chaining in `snakemake` codebase.

Applied to files:

  • src/snakemake/sourcecache.py
📚 Learning: 2024-10-08T17:41:54.542Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3117
File: snakemake/deployment/conda.py:0-0
Timestamp: 2024-10-08T17:41:54.542Z
Learning: In `snakemake/deployment/conda.py`, the exception handling in the `_get_version` method is intentional. The method raises a `WorkflowError` when the version cannot be determined, and this behavior is desired. Do not suggest modifying this exception handling.

Applied to files:

  • src/snakemake/sourcecache.py
📚 Learning: 2025-01-23T08:17:08.873Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3263
File: snakemake/sourcecache.py:195-196
Timestamp: 2025-01-23T08:17:08.873Z
Learning: Including authentication tokens in Git repository URLs is accepted in the Snakemake codebase when using the HostedGitRepo class, as secure authentication handling through GitPython's credential helpers is not feasible in this context.

Applied to files:

  • src/snakemake/sourcecache.py
🧬 Code graph analysis (1)
src/snakemake/sourcecache.py (7)
src/snakemake/parser.py (1)
  • parse (1391-1414)
src/snakemake/utils.py (1)
  • format (523-567)
src/snakemake/common/__init__.py (4)
  • LockFreeWritableFile (391-432)
  • is_local_file (131-132)
  • parse_uri (135-150)
  • smart_join (153-170)
src/snakemake/exceptions.py (1)
  • SourceFileError (200-202)
src/snakemake/common/git.py (1)
  • split_git_path (7-14)
src/snakemake/workflow.py (2)
  • check (645-653)
  • is_local (714-718)
src/snakemake/io/__init__.py (8)
  • check (468-471)
  • check (953-1000)
  • exists (473-477)
  • parents (479-484)
  • mtime (161-162)
  • mtime (505-521)
  • open (397-416)
  • get (1913-1918)
🪛 Ruff (0.14.10)
src/snakemake/sourcecache.py

337-337: Consider moving this statement to an else block

(TRY300)


347-347: Consider moving this statement to an else block

(TRY300)


374-374: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


478-482: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


478-482: Avoid specifying long messages outside the exception class

(TRY003)


518-520: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


518-520: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (46)
  • GitHub Check: tests (10, ubuntu-latest, py313)
  • GitHub Check: tests (1, ubuntu-latest, py313)
  • GitHub Check: tests (9, ubuntu-latest, py312)
  • GitHub Check: tests (1, ubuntu-latest, py312)
  • GitHub Check: tests (10, windows-2022, py313)
  • GitHub Check: tests (6, ubuntu-latest, py313)
  • GitHub Check: tests (9, windows-2022, py313)
  • GitHub Check: tests (9, ubuntu-latest, py311)
  • GitHub Check: tests (8, windows-2022, py313)
  • GitHub Check: tests (5, ubuntu-latest, py311)
  • GitHub Check: tests (10, ubuntu-latest, py312)
  • GitHub Check: tests (8, ubuntu-latest, py313)
  • GitHub Check: tests (8, ubuntu-latest, py312)
  • GitHub Check: tests (7, ubuntu-latest, py312)
  • GitHub Check: tests (6, macos-latest, py313)
  • GitHub Check: tests (10, ubuntu-latest, py311)
  • GitHub Check: tests (7, ubuntu-latest, py311)
  • GitHub Check: tests (6, ubuntu-latest, py311)
  • GitHub Check: tests (9, ubuntu-latest, py313)
  • GitHub Check: tests (7, ubuntu-latest, py313)
  • GitHub Check: tests (6, ubuntu-latest, py312)
  • GitHub Check: tests (2, ubuntu-latest, py311)
  • GitHub Check: tests (7, windows-2022, py313)
  • GitHub Check: tests (7, macos-latest, py313)
  • GitHub Check: tests (5, macos-latest, py313)
  • GitHub Check: tests (8, ubuntu-latest, py311)
  • GitHub Check: tests (4, windows-2022, py313)
  • GitHub Check: tests (5, ubuntu-latest, py313)
  • GitHub Check: tests (3, ubuntu-latest, py311)
  • GitHub Check: tests (6, windows-2022, py313)
  • GitHub Check: tests (3, macos-latest, py313)
  • GitHub Check: tests (3, windows-2022, py313)
  • GitHub Check: tests (4, ubuntu-latest, py312)
  • GitHub Check: tests (2, windows-2022, py313)
  • GitHub Check: tests (2, ubuntu-latest, py312)
  • GitHub Check: tests (3, ubuntu-latest, py313)
  • GitHub Check: tests (3, ubuntu-latest, py312)
  • GitHub Check: tests (5, windows-2022, py313)
  • GitHub Check: tests (2, ubuntu-latest, py313)
  • GitHub Check: tests (1, windows-2022, py313)
  • GitHub Check: tests (5, ubuntu-latest, py312)
  • GitHub Check: tests (2, macos-latest, py313)
  • GitHub Check: tests (4, ubuntu-latest, py313)
  • GitHub Check: tests (1, ubuntu-latest, py311)
  • GitHub Check: tests (4, ubuntu-latest, py311)
  • GitHub Check: apidocs
🔇 Additional comments (2)
src/snakemake/sourcecache.py (2)

444-453: Clean implementation of the format method.

The format method correctly creates a new instance with formatted parameters and properly handles optional fields using _format_or_none. The pattern is consistently applied across all SourceFile subclasses.


80-80: No action needed. The code correctly uses "typing.Self" as a string forward reference, which is compatible with any Python version and does not require importing typing.Self. Type hints in string form are evaluated only by static type checkers, not at runtime, so there is no compatibility concern here. Additionally, the project requires Python 3.11 at runtime per the coding guidelines.

Likely an incorrect or invalid review comment.

Comment thread src/snakemake/sourcecache.py
Comment thread src/snakemake/sourcecache.py Outdated
Comment thread src/snakemake/sourcecache.py Outdated
Comment thread src/snakemake/sourcecache.py
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @src/snakemake/sourcecache.py:
- Around line 296-328: HostedGitRepo's lock-free cloning can race when multiple
threads clone the same repo: after Repo.clone_from into tmpdir, wrap
shutil.move(tmpdir, self.repo_clone) in a try/except to handle race conditions
(e.g., FileExistsError/OSError/shutil.Error); if the move fails because
self.repo_clone already exists, discard tmpdir (cleanup) and open the existing
repository with Repo(self.repo_clone) instead of failing, and ensure self._repo
is set to that Repo instance (and update _existed_before if appropriate);
perform these adjustments around the move and Repo(self.repo_clone) calls inside
the __init__ (use the same _tmpdir context and repo_clone symbol to locate the
code).
- Around line 375-376: The class-level mutable `_hosted_repos` dict should be
annotated as a ClassVar to signal it's shared across instances and satisfy
static type checkers: update the annotation of `_hosted_repos` to
`ClassVar[Dict[str, HostedGitRepo]] = {}` and add/import `ClassVar` from
`typing` if not already present; keep the existing variable name `_hosted_repos`
and the `HostedGitRepo` type reference unchanged so static analyzers understand
it's a class-level mutable.
🧹 Nitpick comments (1)
src/snakemake/sourcecache.py (1)

352-364: Consider optimizing concurrent fetch operations.

Multiple threads calling fetch() simultaneously will each copy the repository, perform a fetch, and move it back. While the final shutil.move() will overwrite and leave the repository in a consistent state, this approach wastes resources with redundant fetch operations.

Consider adding a flag or lock to prevent concurrent fetches of the same repository.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c0a7796 and aa67d23.

📒 Files selected for processing (1)
  • src/snakemake/sourcecache.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

  • src/snakemake/sourcecache.py
🧠 Learnings (4)
📓 Common learnings
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3263
File: snakemake/sourcecache.py:195-196
Timestamp: 2025-01-23T08:17:08.873Z
Learning: Including authentication tokens in Git repository URLs is accepted in the Snakemake codebase when using the HostedGitRepo class, as secure authentication handling through GitPython's credential helpers is not feasible in this context.
📚 Learning: 2024-10-29T09:26:26.636Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3171
File: snakemake/cli.py:106-145
Timestamp: 2024-10-29T09:26:26.636Z
Learning: In the `snakemake` codebase, exception chaining (using `raise ... from ...`) is avoided due to backward compatibility reasons. Do not suggest using exception chaining in `snakemake` codebase.

Applied to files:

  • src/snakemake/sourcecache.py
📚 Learning: 2024-10-08T17:41:54.542Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3117
File: snakemake/deployment/conda.py:0-0
Timestamp: 2024-10-08T17:41:54.542Z
Learning: In `snakemake/deployment/conda.py`, the exception handling in the `_get_version` method is intentional. The method raises a `WorkflowError` when the version cannot be determined, and this behavior is desired. Do not suggest modifying this exception handling.

Applied to files:

  • src/snakemake/sourcecache.py
📚 Learning: 2025-01-23T08:17:08.873Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3263
File: snakemake/sourcecache.py:195-196
Timestamp: 2025-01-23T08:17:08.873Z
Learning: Including authentication tokens in Git repository URLs is accepted in the Snakemake codebase when using the HostedGitRepo class, as secure authentication handling through GitPython's credential helpers is not feasible in this context.

Applied to files:

  • src/snakemake/sourcecache.py
🧬 Code graph analysis (1)
src/snakemake/sourcecache.py (5)
src/snakemake/utils.py (1)
  • format (523-567)
src/snakemake/common/__init__.py (1)
  • LockFreeWritableFile (391-432)
src/snakemake/exceptions.py (1)
  • SourceFileError (200-202)
src/snakemake/common/git.py (1)
  • split_git_path (7-14)
src/snakemake/io/__init__.py (4)
  • check (468-471)
  • check (953-1000)
  • exists (473-477)
  • get (1913-1918)
🪛 Ruff (0.14.10)
src/snakemake/sourcecache.py

338-338: Consider moving this statement to an else block

(TRY300)


348-348: Consider moving this statement to an else block

(TRY300)


375-375: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


480-484: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


480-484: Avoid specifying long messages outside the exception class

(TRY003)


520-522: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


520-522: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (45)
  • GitHub Check: tests (9, ubuntu-latest, py313)
  • GitHub Check: tests (3, ubuntu-latest, py313)
  • GitHub Check: tests (10, ubuntu-latest, py311)
  • GitHub Check: tests (9, ubuntu-latest, py311)
  • GitHub Check: tests (9, windows-2022, py313)
  • GitHub Check: tests (10, ubuntu-latest, py312)
  • GitHub Check: tests (9, ubuntu-latest, py312)
  • GitHub Check: tests (10, windows-2022, py313)
  • GitHub Check: tests (8, windows-2022, py313)
  • GitHub Check: tests (7, ubuntu-latest, py311)
  • GitHub Check: tests (7, ubuntu-latest, py313)
  • GitHub Check: tests (10, ubuntu-latest, py313)
  • GitHub Check: tests (1, macos-latest, py313)
  • GitHub Check: tests (8, ubuntu-latest, py313)
  • GitHub Check: tests (8, ubuntu-latest, py311)
  • GitHub Check: tests (7, windows-2022, py313)
  • GitHub Check: tests (7, ubuntu-latest, py312)
  • GitHub Check: tests (7, macos-latest, py313)
  • GitHub Check: tests (4, ubuntu-latest, py311)
  • GitHub Check: tests (6, ubuntu-latest, py313)
  • GitHub Check: tests (8, ubuntu-latest, py312)
  • GitHub Check: tests (2, ubuntu-latest, py311)
  • GitHub Check: tests (6, ubuntu-latest, py312)
  • GitHub Check: tests (5, ubuntu-latest, py313)
  • GitHub Check: tests (3, ubuntu-latest, py312)
  • GitHub Check: tests (5, ubuntu-latest, py312)
  • GitHub Check: tests (2, ubuntu-latest, py313)
  • GitHub Check: tests (6, windows-2022, py313)
  • GitHub Check: tests (6, ubuntu-latest, py311)
  • GitHub Check: tests (2, macos-latest, py313)
  • GitHub Check: tests (5, ubuntu-latest, py311)
  • GitHub Check: tests (4, windows-2022, py313)
  • GitHub Check: tests (5, windows-2022, py313)
  • GitHub Check: tests (1, ubuntu-latest, py313)
  • GitHub Check: tests (4, ubuntu-latest, py312)
  • GitHub Check: tests (6, macos-latest, py313)
  • GitHub Check: tests (1, ubuntu-latest, py312)
  • GitHub Check: tests (2, ubuntu-latest, py312)
  • GitHub Check: tests (3, windows-2022, py313)
  • GitHub Check: tests (2, windows-2022, py313)
  • GitHub Check: tests (4, ubuntu-latest, py313)
  • GitHub Check: tests (1, windows-2022, py313)
  • GitHub Check: tests (3, ubuntu-latest, py311)
  • GitHub Check: tests (1, ubuntu-latest, py311)
  • GitHub Check: apidocs
🔇 Additional comments (6)
src/snakemake/sourcecache.py (6)

12-14: LGTM! Clean additions to imports and helper utilities.

The new imports and helper functions are well-structured:

  • TYPE_CHECKING is used correctly for conditional imports
  • _format_or_none and _replace_suffix are useful utility functions

Also applies to: 21-21, 32-32, 35-40, 55-59


130-146: LGTM! Consistent implementation across concrete SourceFile classes.

The new methods are properly implemented across GenericSourceFile, LocalSourceFile, and LocalGitFile:

  • Path manipulation methods use the _replace_suffix helper correctly
  • check() appropriately delegates to snakemake.io.check where applicable
  • format() properly handles all relevant fields

Also applies to: 166-182, 214-257


423-455: LGTM! Well-implemented git-backed hosting features.

The additions to HostingProviderFile are well-structured:

  • Path manipulation methods properly propagate all required fields including cache_path
  • mtime() efficiently retrieves commit timestamp
  • open() method with fetch-on-demand is a good approach
  • __str__() provides useful representation

Also applies to: 500-522, 525-528, 538-538, 554-554, 561-562


567-575: LGTM! Auth property implementations are correct.

The auth property implementations in both GithubFile and GitlabFile properly handle authentication tokens for git operations.

Also applies to: 590-594


652-652: LGTM! Clean integration of git-backed hosting in SourceCache.

The changes properly integrate the new git-backed hosting features:

  • Renamed runtime cache directory for clarity
  • Cache path propagation to HostingProviderFile instances
  • Routing of HostingProviderFile to its specialized open() method

Also applies to: 685-686, 757-758


80-81: No action needed. The use of typing.Self in string-literal form (forward references) is fully compatible with Snakemake's minimum Python 3.11+ runtime requirement and requires no changes.

Comment thread src/snakemake/sourcecache.py
Comment thread src/snakemake/sourcecache.py Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @src/snakemake/sourcecache.py:
- Around line 511-525: The open() method assigns fetch_error from
hosted_repo.fetch() but never uses it, so if fetching fails the subsequent
hosted_repo.repo.git.show(...) will raise an uninformative error; update open()
to check the result of hosted_repo.fetch() (the fetch_error variable returned by
hosted_repo.fetch()) and if it indicates failure include its details in the
WorkflowError message raised when git.GitCommandError occurs (or raise a
WorkflowError immediately when fetch_error shows failure) so that failures from
hosted_repo.fetch() are surfaced; reference the open() method,
hosted_repo.ref_exists/ref_exists, hosted_repo.file_exists/file_exists,
hosted_repo.fetch()/fetch_error, hosted_repo.repo.git.show(...) and the
WorkflowError/generic exception handling to locate and change the logic.
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aa67d23 and 6ebb4ba.

📒 Files selected for processing (1)
  • src/snakemake/sourcecache.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

  • src/snakemake/sourcecache.py
🧠 Learnings (5)
📚 Learning: 2026-01-08T11:13:20.523Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3643
File: src/snakemake/sourcecache.py:474-492
Timestamp: 2026-01-08T11:13:20.523Z
Learning: In Python files under src/snakemake, ensure that the per-run cache_path for source files does not change within a single workflow run, so that HostingProviderFile._hosted_repos can be keyed reliably by repository name. This invariant should be documented and enforced (e.g., by freezing cache_path after initialization, avoiding mutations during a run, and ensuring any caching uses a run-scoped key). This pattern applies to all modules in the Snakemake source cache area where per-run caching and repository hosting are implemented.

Applied to files:

  • src/snakemake/sourcecache.py
📚 Learning: 2024-10-29T09:26:26.636Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3171
File: snakemake/cli.py:106-145
Timestamp: 2024-10-29T09:26:26.636Z
Learning: In the `snakemake` codebase, exception chaining (using `raise ... from ...`) is avoided due to backward compatibility reasons. Do not suggest using exception chaining in `snakemake` codebase.

Applied to files:

  • src/snakemake/sourcecache.py
📚 Learning: 2024-10-08T17:41:54.542Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3117
File: snakemake/deployment/conda.py:0-0
Timestamp: 2024-10-08T17:41:54.542Z
Learning: In `snakemake/deployment/conda.py`, the exception handling in the `_get_version` method is intentional. The method raises a `WorkflowError` when the version cannot be determined, and this behavior is desired. Do not suggest modifying this exception handling.

Applied to files:

  • src/snakemake/sourcecache.py
📚 Learning: 2025-01-23T08:17:08.873Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3263
File: snakemake/sourcecache.py:195-196
Timestamp: 2025-01-23T08:17:08.873Z
Learning: Including authentication tokens in Git repository URLs is accepted in the Snakemake codebase when using the HostedGitRepo class, as secure authentication handling through GitPython's credential helpers is not feasible in this context.

Applied to files:

  • src/snakemake/sourcecache.py
📚 Learning: 2024-08-27T20:15:58.699Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3026
File: snakemake/assets/__init__.py:0-0
Timestamp: 2024-08-27T20:15:58.699Z
Learning: Do not suggest type annotations for `cls` arguments in class methods when reviewing code for the Snakemake project.

Applied to files:

  • src/snakemake/sourcecache.py
🧬 Code graph analysis (1)
src/snakemake/sourcecache.py (4)
src/snakemake/utils.py (1)
  • format (523-567)
src/snakemake/common/__init__.py (1)
  • LockFreeWritableFile (391-432)
src/snakemake/common/git.py (1)
  • split_git_path (7-14)
src/snakemake/io/__init__.py (3)
  • check (468-471)
  • check (953-1000)
  • get (1913-1918)
🪛 Ruff (0.14.10)
src/snakemake/sourcecache.py

339-339: Consider moving this statement to an else block

(TRY300)


349-349: Consider moving this statement to an else block

(TRY300)


483-487: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


483-487: Avoid specifying long messages outside the exception class

(TRY003)


517-517: Local variable fetch_error is assigned to but never used

Remove assignment to unused variable fetch_error

(F841)


523-525: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


523-525: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (46)
  • GitHub Check: tests (10, windows-2022, py313)
  • GitHub Check: tests (10, ubuntu-latest, py312)
  • GitHub Check: tests (9, ubuntu-latest, py313)
  • GitHub Check: tests (10, ubuntu-latest, py313)
  • GitHub Check: tests (6, windows-2022, py313)
  • GitHub Check: tests (9, macos-latest, py313)
  • GitHub Check: tests (8, ubuntu-latest, py313)
  • GitHub Check: tests (9, windows-2022, py313)
  • GitHub Check: tests (10, ubuntu-latest, py311)
  • GitHub Check: tests (8, ubuntu-latest, py311)
  • GitHub Check: tests (8, ubuntu-latest, py312)
  • GitHub Check: tests (5, macos-latest, py313)
  • GitHub Check: tests (8, windows-2022, py313)
  • GitHub Check: tests (6, ubuntu-latest, py313)
  • GitHub Check: tests (9, ubuntu-latest, py312)
  • GitHub Check: tests (7, ubuntu-latest, py313)
  • GitHub Check: tests (9, ubuntu-latest, py311)
  • GitHub Check: tests (6, ubuntu-latest, py312)
  • GitHub Check: tests (6, ubuntu-latest, py311)
  • GitHub Check: tests (7, ubuntu-latest, py311)
  • GitHub Check: tests (4, windows-2022, py313)
  • GitHub Check: tests (7, ubuntu-latest, py312)
  • GitHub Check: tests (5, ubuntu-latest, py313)
  • GitHub Check: tests (7, windows-2022, py313)
  • GitHub Check: tests (5, windows-2022, py313)
  • GitHub Check: tests (7, macos-latest, py313)
  • GitHub Check: tests (5, ubuntu-latest, py312)
  • GitHub Check: tests (4, ubuntu-latest, py313)
  • GitHub Check: tests (1, ubuntu-latest, py311)
  • GitHub Check: tests (3, windows-2022, py313)
  • GitHub Check: tests (2, ubuntu-latest, py312)
  • GitHub Check: tests (5, ubuntu-latest, py311)
  • GitHub Check: tests (2, windows-2022, py313)
  • GitHub Check: tests (3, ubuntu-latest, py311)
  • GitHub Check: tests (2, macos-latest, py313)
  • GitHub Check: tests (4, ubuntu-latest, py312)
  • GitHub Check: tests (1, ubuntu-latest, py313)
  • GitHub Check: tests (4, ubuntu-latest, py311)
  • GitHub Check: tests (1, ubuntu-latest, py312)
  • GitHub Check: tests (2, ubuntu-latest, py311)
  • GitHub Check: tests (1, windows-2022, py313)
  • GitHub Check: tests (3, ubuntu-latest, py312)
  • GitHub Check: tests (3, ubuntu-latest, py313)
  • GitHub Check: tests (1, macos-latest, py313)
  • GitHub Check: tests (2, ubuntu-latest, py313)
  • GitHub Check: apidocs
🔇 Additional comments (7)
src/snakemake/sourcecache.py (7)

40-41: LGTM! Helper functions are well-designed.

The _format_or_none and _replace_suffix utility functions correctly handle optional parameters and suffix replacement logic.

Also applies to: 56-61


64-93: LGTM! Abstract method declarations enhance the SourceFile API.

The new abstract methods (check, format, endswith, replace_suffix) provide a consistent interface across all SourceFile implementations.


131-184: LGTM! Method implementations are correct and consistent.

All SourceFile subclasses correctly implement the new abstract methods. The use of _format_or_none in LocalGitFile.format() properly handles optional parameters.

Also applies to: 215-258


297-372: LGTM! Lock-free cloning strategy is sound.

The HostedGitRepo class correctly implements lock-free repository cloning via temporary directory + atomic move. The bare clone approach is appropriate for read-only operations. The fetch() method returns an error message string on failure and None on success, providing clear semantics for callers.


426-458: LGTM! Path manipulation and authentication methods are correctly implemented.

The implementations of endswith, replace_suffix, check, format, mtime, __str__, and the auth properties in GithubFile and GitlabFile are all correct. The format method correctly propagates cache_path through to new instances.

Also applies to: 503-509, 564-565, 574-597


687-689: LGTM! SourceCache correctly integrates with hosting provider files.

The cache path propagation in _cache_entry() (lines 688-689) and the routing of HostingProviderFile to its own open() method (lines 760-761) correctly integrate the new git-backed hosting support.

Also applies to: 760-761


476-495: Class-level caching relies on cache_path stability—confirmed by design.

The _hosted_repos class-level cache is keyed by repository name alone, which safely relies on cache_path being set once per workflow run and remaining stable thereafter. This invariant is maintained by SourceCache._cache_entry (line 689), which assigns cache_path exactly once when creating cache entries for HostingProviderFile instances, and subsequent accesses only read the stored value.

Comment thread src/snakemake/sourcecache.py Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
src/snakemake/sourcecache.py (1)

522-540: Consider consistent exception handling approach.

The open() method uses exception chaining (raise ... from e) at line 540, which differs from the exception handling pattern used elsewhere in this file (e.g., lines 771-774 for LocalGitFile). Based on learnings, exception chaining has traditionally been avoided in the Snakemake codebase for backward compatibility reasons.

♻️ For consistency with existing patterns
-            raise WorkflowError(msg) from e
+            raise WorkflowError(msg)
tests/tests_using_conda.py (1)

6-6: Remove unused imports.

The imports partial (line 6) and HostingProviderFile (line 16) are not used anywhere in this test file and should be removed.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6ebb4ba and f066b37.

📒 Files selected for processing (4)
  • src/snakemake/deployment/containerize.py
  • src/snakemake/sourcecache.py
  • tests/test_sourcecache.py
  • tests/tests_using_conda.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

  • tests/tests_using_conda.py
  • tests/test_sourcecache.py
  • src/snakemake/sourcecache.py
  • src/snakemake/deployment/containerize.py
🧠 Learnings (11)
📓 Common learnings
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3643
File: src/snakemake/sourcecache.py:474-492
Timestamp: 2026-01-08T11:13:20.523Z
Learning: In Snakemake, the `cache_path` for source files does not change within a single workflow run, ensuring that all `HostingProviderFile` instances in a run share the same cache path. This makes it safe for the `_hosted_repos` class variable in `HostingProviderFile` to be keyed only by repository name.
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3263
File: snakemake/sourcecache.py:195-196
Timestamp: 2025-01-23T08:17:08.873Z
Learning: Including authentication tokens in Git repository URLs is accepted in the Snakemake codebase when using the HostedGitRepo class, as secure authentication handling through GitPython's credential helpers is not feasible in this context.
📚 Learning: 2024-10-06T14:09:54.370Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3117
File: tests/test_wrapper/Snakefile:11-11
Timestamp: 2024-10-06T14:09:54.370Z
Learning: Changes made within test cases, such as in `tests/test_wrapper/Snakefile`, are for testing purposes and do not require updates to the project documentation.

Applied to files:

  • tests/tests_using_conda.py
  • tests/test_sourcecache.py
📚 Learning: 2026-01-08T11:13:20.523Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3643
File: src/snakemake/sourcecache.py:474-492
Timestamp: 2026-01-08T11:13:20.523Z
Learning: In Snakemake, the `cache_path` for source files does not change within a single workflow run, ensuring that all `HostingProviderFile` instances in a run share the same cache path. This makes it safe for the `_hosted_repos` class variable in `HostingProviderFile` to be keyed only by repository name.

Applied to files:

  • tests/tests_using_conda.py
  • tests/test_sourcecache.py
📚 Learning: 2025-01-13T18:50:40.791Z
Learnt from: cademirch
Repo: snakemake/snakemake PR: 3107
File: snakemake/logging.py:20-20
Timestamp: 2025-01-13T18:50:40.791Z
Learning: Dependencies for Snakemake are managed through test-environment.yml file, which includes snakemake-interface-logger-plugins (>= 0.1.0) and other required packages.

Applied to files:

  • tests/tests_using_conda.py
📚 Learning: 2026-01-08T11:13:20.523Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3643
File: src/snakemake/sourcecache.py:474-492
Timestamp: 2026-01-08T11:13:20.523Z
Learning: In Python files under src/snakemake, ensure that the per-run cache_path for source files does not change within a single workflow run, so that HostingProviderFile._hosted_repos can be keyed reliably by repository name. This invariant should be documented and enforced (e.g., by freezing cache_path after initialization, avoiding mutations during a run, and ensuring any caching uses a run-scoped key). This pattern applies to all modules in the Snakemake source cache area where per-run caching and repository hosting are implemented.

Applied to files:

  • src/snakemake/sourcecache.py
  • src/snakemake/deployment/containerize.py
📚 Learning: 2024-10-29T09:26:26.636Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3171
File: snakemake/cli.py:106-145
Timestamp: 2024-10-29T09:26:26.636Z
Learning: In the `snakemake` codebase, exception chaining (using `raise ... from ...`) is avoided due to backward compatibility reasons. Do not suggest using exception chaining in `snakemake` codebase.

Applied to files:

  • src/snakemake/sourcecache.py
📚 Learning: 2024-10-08T17:41:54.542Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3117
File: snakemake/deployment/conda.py:0-0
Timestamp: 2024-10-08T17:41:54.542Z
Learning: In `snakemake/deployment/conda.py`, the exception handling in the `_get_version` method is intentional. The method raises a `WorkflowError` when the version cannot be determined, and this behavior is desired. Do not suggest modifying this exception handling.

Applied to files:

  • src/snakemake/sourcecache.py
  • src/snakemake/deployment/containerize.py
📚 Learning: 2025-01-23T08:17:08.873Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3263
File: snakemake/sourcecache.py:195-196
Timestamp: 2025-01-23T08:17:08.873Z
Learning: Including authentication tokens in Git repository URLs is accepted in the Snakemake codebase when using the HostedGitRepo class, as secure authentication handling through GitPython's credential helpers is not feasible in this context.

Applied to files:

  • src/snakemake/sourcecache.py
📚 Learning: 2024-08-27T20:15:58.699Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3026
File: snakemake/assets/__init__.py:0-0
Timestamp: 2024-08-27T20:15:58.699Z
Learning: Do not suggest type annotations for `cls` arguments in class methods when reviewing code for the Snakemake project.

Applied to files:

  • src/snakemake/sourcecache.py
📚 Learning: 2024-10-11T13:12:35.827Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3132
File: snakemake/deployment/conda.py:85-88
Timestamp: 2024-10-11T13:12:35.827Z
Learning: In the `snakemake/deployment/conda.py` file, within the `Env` class, validation of `env_dir` occurs later in the code, so it's unnecessary to validate `env_dir` in the `__init__` method.

Applied to files:

  • src/snakemake/deployment/containerize.py
📚 Learning: 2024-10-11T12:49:08.705Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3132
File: snakemake/rules.py:1105-1105
Timestamp: 2024-10-11T12:49:08.705Z
Learning: In `snakemake/rules.py`, within the `Rule` class, `self.basedir` is a custom class (not a `pathlib.Path` object) that has a `join()` method. Therefore, using `self.basedir.join(conda_env)` is appropriate in this context.

Applied to files:

  • src/snakemake/deployment/containerize.py
🧬 Code graph analysis (3)
tests/tests_using_conda.py (2)
src/snakemake/workflow.py (1)
  • sourcecache (502-503)
src/snakemake/sourcecache.py (1)
  • HostingProviderFile (385-580)
tests/test_sourcecache.py (1)
src/snakemake/sourcecache.py (5)
  • GithubFile (583-599)
  • GitlabFile (602-624)
  • repo (381-382)
  • open (522-540)
  • open (685-689)
src/snakemake/deployment/containerize.py (2)
src/snakemake/deployment/conda.py (3)
  • CondaEnvSpecType (986-1009)
  • from_spec (992-1009)
  • CondaEnvFileSpec (882-914)
src/snakemake/sourcecache.py (1)
  • infer_source_file (627-657)
🪛 Ruff (0.14.10)
src/snakemake/sourcecache.py

347-347: Consider moving this statement to an else block

(TRY300)


357-357: Consider moving this statement to an else block

(TRY300)


494-498: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


494-498: Avoid specifying long messages outside the exception class

(TRY003)

src/snakemake/deployment/containerize.py

43-47: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (46)
  • GitHub Check: tests (10, ubuntu-latest, py313)
  • GitHub Check: tests (5, windows-2022, py313)
  • GitHub Check: tests (6, ubuntu-latest, py313)
  • GitHub Check: tests (10, windows-2022, py313)
  • GitHub Check: tests (10, ubuntu-latest, py312)
  • GitHub Check: tests (9, windows-2022, py313)
  • GitHub Check: tests (10, ubuntu-latest, py311)
  • GitHub Check: tests (5, ubuntu-latest, py313)
  • GitHub Check: tests (9, ubuntu-latest, py312)
  • GitHub Check: tests (9, ubuntu-latest, py313)
  • GitHub Check: tests (8, ubuntu-latest, py312)
  • GitHub Check: tests (6, ubuntu-latest, py311)
  • GitHub Check: tests (8, ubuntu-latest, py313)
  • GitHub Check: tests (8, windows-2022, py313)
  • GitHub Check: tests (4, ubuntu-latest, py312)
  • GitHub Check: tests (6, windows-2022, py313)
  • GitHub Check: tests (4, ubuntu-latest, py311)
  • GitHub Check: tests (5, ubuntu-latest, py312)
  • GitHub Check: tests (6, ubuntu-latest, py312)
  • GitHub Check: tests (4, macos-latest, py313)
  • GitHub Check: tests (3, windows-2022, py313)
  • GitHub Check: tests (5, ubuntu-latest, py311)
  • GitHub Check: tests (7, windows-2022, py313)
  • GitHub Check: tests (7, ubuntu-latest, py312)
  • GitHub Check: tests (3, ubuntu-latest, py313)
  • GitHub Check: tests (9, ubuntu-latest, py311)
  • GitHub Check: tests (7, ubuntu-latest, py311)
  • GitHub Check: tests (3, macos-latest, py313)
  • GitHub Check: tests (7, ubuntu-latest, py313)
  • GitHub Check: tests (4, windows-2022, py313)
  • GitHub Check: tests (8, ubuntu-latest, py311)
  • GitHub Check: tests (4, ubuntu-latest, py313)
  • GitHub Check: tests (7, macos-latest, py313)
  • GitHub Check: tests (1, ubuntu-latest, py312)
  • GitHub Check: tests (2, ubuntu-latest, py313)
  • GitHub Check: tests (1, windows-2022, py313)
  • GitHub Check: tests (1, macos-latest, py313)
  • GitHub Check: tests (3, ubuntu-latest, py311)
  • GitHub Check: tests (2, ubuntu-latest, py311)
  • GitHub Check: tests (2, macos-latest, py313)
  • GitHub Check: tests (2, windows-2022, py313)
  • GitHub Check: tests (2, ubuntu-latest, py312)
  • GitHub Check: tests (1, ubuntu-latest, py313)
  • GitHub Check: tests (1, ubuntu-latest, py311)
  • GitHub Check: tests (3, ubuntu-latest, py312)
  • GitHub Check: apidocs
🔇 Additional comments (11)
src/snakemake/deployment/containerize.py (1)

41-50: LGTM! Improved control flow.

The refactoring correctly determines the conda env spec type before inferring the source file, which is more efficient than the previous approach. The changes properly:

  • Check the spec type first (line 41) to validate it's FILE-based
  • Only call infer_source_file after validation (line 49), avoiding unnecessary work for non-FILE specs
  • Align with the updated CondaEnvFileSpec constructor that takes only the source file parameter (line 50)

The logic is sound and the API usage matches the updated signatures from conda.py.

src/snakemake/sourcecache.py (8)

6-6: LGTM!

The new imports and helper functions are well-structured:

  • TYPE_CHECKING pattern correctly avoids runtime git import
  • _format_or_none and _replace_suffix helpers reduce code duplication
  • All imports are utilized in the code

Also applies to: 13-15, 22-24, 34-34, 37-39, 41-42, 57-62


65-66: LGTM!

The abstract method additions (check, format, endswith, replace_suffix) ensure consistent API across all SourceFile implementations. The use of smart_join in the join method centralizes path joining logic.

Also applies to: 82-94, 99-101


132-148: LGTM!

The implementations of the new abstract methods across GenericSourceFile, LocalSourceFile, and LocalGitFile are consistent and correct. Each class appropriately implements endswith, replace_suffix, check, and format according to its specific needs.

Also applies to: 168-184, 216-259


298-383: LGTM!

The HostedGitRepo class implementation is well-designed with robust concurrent access handling:

  • Atomic repository cloning using tmpdir and rename for race-free initialization
  • Proper cleanup on exceptions during cloning
  • Appropriate error handling in ref_exists, file_exists, and fetch methods
  • Use of retry logic with exponential backoff for network operations

The static analysis hints (TRY300) about moving statements to else blocks are false positives—the early returns are intentional and improve readability.


437-469: LGTM!

The new methods in HostingProviderFile are well-implemented:

  • endswith, replace_suffix, check, and format follow the pattern established in other SourceFile subclasses
  • mtime() correctly retrieves the last commit date
  • ref property provides clean access to the active reference
  • __str__ provides useful string representation

Also applies to: 514-520, 542-546, 579-580


471-506: LGTM!

The properties are well-implemented:

  • auth provides extensibility for subclasses
  • cache_path getter/setter with appropriate validation and subdirectory structure
  • hosted_repo uses thread-safe lazy initialization with class-level caching
  • Host validation prevents common misconfiguration

The static analysis hints (B904, TRY003) at lines 494-498 are false positives:

  • B904 suggests exception chaining, but this is raising a new exception, not re-raising
  • TRY003 suggests shorter messages, but the detailed message aids debugging

Based on learnings, the cache_path design correctly assumes it remains constant during a workflow run.


583-599: LGTM!

The auth property additions to GithubFile and GitlabFile are well-designed. They provide authentication tokens in the correct format for Git repository URLs while maintaining clean separation of concerns. The token formatting with "@" suffix integrates seamlessly with the HostedGitRepo URL construction.

Also applies to: 602-624


670-670: LGTM!

The SourceCache updates correctly integrate hosting-backed source files:

  • Runtime cache directory naming is clear and consistent
  • cache_path assignment in _cache_entry ensures HostingProviderFile instances have their cache path set before use (as per learnings)
  • _open method properly delegates to HostingProviderFile.open() for hosting-backed sources

Also applies to: 703-705, 775-776

tests/test_sourcecache.py (2)

1-4: LGTM!

The import additions and test function rename are appropriate:

  • New imports (Path, tempfile, patch, GithubFile) are all utilized
  • Test function renamed to follow Python naming conventions

Also applies to: 7-12


14-25: Verify test requirements and consider mocking fetch behavior.

This test makes actual network requests to GitHub, which could cause issues:

  1. Requires network connectivity to pass
  2. May fail if the repository is unavailable or changes
  3. Depends on external service reliability

Consider either:

  • Adding a @connected decorator (similar to other network-dependent tests in the codebase)
  • Mocking additional methods (fetch, ref_exists) to make this a true unit test
  • Explicitly documenting this as an integration test
🔍 Suggested improvements

Option 1: Add @connected decorator

from .common import connected

@connected
def test_github_file_fetch():
    # ... existing test code

Option 2: Mock fetch to make it a unit test

def test_github_file_fetch():
    with patch("snakemake.sourcecache.HostedGitRepo.file_exists", return_value=False), \
         patch("snakemake.sourcecache.HostedGitRepo.ref_exists", return_value=False), \
         patch("snakemake.sourcecache.HostedGitRepo.fetch", return_value=None) as mock_fetch:
        with tempfile.TemporaryDirectory() as tmpdir:
            file = GithubFile(
                repo="snakemake/snakemake",
                path="README.md",
                branch="main",
                cache_path=Path(tmpdir),
            )
            # This would require additional mocking of git.Repo to avoid actual network calls
            # ...

@johanneskoester johanneskoester merged commit d2f8aba into main Jan 8, 2026
59 checks passed
@johanneskoester johanneskoester deleted the fix/source-cache-git-clone branch January 8, 2026 16:20
johanneskoester pushed a commit that referenced this pull request Jan 8, 2026
🤖 I have created a release *beep* *boop*
---


##
[9.14.6](v9.14.5...v9.14.6)
(2026-01-08)


### Bug Fixes

* create local clone of git repos for source files from hosting
providers ([#3643](#3643))
([d2f8aba](d2f8aba))
* create potentially missing .snakemake folder in case of very long
command lines for spawned jobs
([#3894](#3894))
([4b431dd](4b431dd))
* make ilp solver enumeration lazy
([#3900](#3900))
([30e1509](30e1509))
* Prevent broken report_href links by using deterministic report IDs
with fixed prefix length
([#3889](#3889))
([6d8f4d8](6d8f4d8))
* refactor LoggerManager setup and scope
([#3851](#3851))
([f46d904](f46d904))
* yield proper error message in case a local git source file is not
retrievable
([#3892](#3892))
([ed79cae](ed79cae))


### Documentation

* explain how to pass nested config via CLI
([#3885](#3885))
([9d8c539](9d8c539))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
@ezherman
Copy link
Copy Markdown
Contributor

ezherman commented Jan 9, 2026

Hi @johanneskoester, thank you for implementing this fix. I believe I am unfortunately seeing the error I originally reported, and I was wondering whether you could take a look.

After updating Snakemake to 9.14.6 and testing with my repo (ezherman/minimal_example_snakemake_module, branch renamed_script_directory_2), I am running into an import error:

ModuleNotFoundError in file "https://raw.githubusercontent.com/ezherman/minimal_example_snakemake_module/renamed_script_directory_2/workflow/minimal_module/workflow/Snakefile", line 4:
No module named 'module_scripts'

Inspecting sys.path during execution, I see that the first entry is a remote URL:

['https://raw.githubusercontent.com/ezherman/minimal_example_snakemake_module/renamed_script_directory_2/workflow/minimal_module/workflow', ...]

I expected it to point to the local cached clone so Python could resolve imports like import module_scripts as proper local modules.

To summarize:

  • The repo seems to be cloned to the local cache, but the local path is not being prepended to sys.path.
  • The relative module import fails.

Is there a step I might be missing? Thanks in advance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants