fix(config): restore $ref resolution for schemas#3777
fix(config): restore $ref resolution for schemas#3777johanneskoester merged 1 commit intosnakemake:mainfrom
Conversation
📝 WalkthroughWalkthroughReplaces remote JSON Schema $ref resolution with local-file resolution: sets the primary schema "$id" to the schema file's file:// URI and provides a custom retrieve_uri that converts URIs to local filesystem paths before loading referenced schemas; adds tests for relative, nested, fragment, allOf/anyOf/defs reference scenarios. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor User
participant Workflow
participant Utils as snakemake.utils.validate
participant Loader as _load_configfile
participant JSONS as jsonschema.Registry/Resource
User->>Workflow: run workflow (config + schema path)
Workflow->>Utils: validate(config, schema=".../config.schema.yaml")
Utils->>Loader: load primary schema file
Note right of Utils: set schema["$id"]=Path(...).resolve().as_uri()
rect rgb(230,245,255)
Utils->>JSONS: build Resource/Registry with custom retrieve_uri
JSONS-->>Utils: validator ready
Utils->>JSONS: validate config
JSONS->>Utils: request $ref (e.g. "bar.schema.yaml")
Note right of Utils: retrieve_uri -> urlparse(uri).path -> url2pathname -> local path
Utils->>Loader: load referenced schema from local filesystem path
Loader-->>Utils: referenced schema contents
Utils-->>JSONS: provide referenced schema
end
JSONS-->>Utils: validation result
Utils-->>Workflow: return success or raise WorkflowError
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
This comment was marked as outdated.
This comment was marked as outdated.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/snakemake/utils.py (1)
95-104: Consider Windows path handling for file:// URIs.On Windows,
urlparse(uri).pathforfile:///C:/path/to/filereturns/C:/path/to/file, which includes a leading slash before the drive letter. While_load_configfilemight handle this correctly (viaopen()), it's worth verifying that this works on Windows.If issues arise on Windows, use
urllib.request.url2pathname:from urllib.parse import urlparse + from urllib.request import url2pathname ... def retrieve_uri(uri): ... return Resource.from_contents( - contents=_load_configfile(urlparse(uri).path, filetype="Schema") + contents=_load_configfile(url2pathname(urlparse(uri).path), filetype="Schema") )Alternatively, verify current behavior is correct on Windows through testing.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/snakemake/utils.py(2 hunks)tests/test_schema.py(4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
⚙️ CodeRabbit configuration file
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
Files:
src/snakemake/utils.pytests/test_schema.py
🧬 Code graph analysis (2)
src/snakemake/utils.py (1)
src/snakemake/common/configfile.py (1)
_load_configfile(7-33)
tests/test_schema.py (1)
src/snakemake/utils.py (1)
validate(26-263)
🔇 Additional comments (6)
tests/test_schema.py (4)
4-4: LGTM!The import of
WorkflowErroris correctly added to support the new test assertions that expect validation errors.
61-159: LGTM!The new schema constants comprehensively cover various JSON Schema reference patterns:
- Relative $ref (
RELATIVE_CONFIG_SCHEMA)- Nested relative $ref chains (
NESTED_BAR_SCHEMA,NESTED_BAZ_SCHEMA)- Remote $id with local $ref (
REMOTE_ID_CONFIG_SCHEMA)- Fragment references (
FRAGMENT_BAR_SCHEMA,FRAGMENT_CONFIG_SCHEMA)- allOf with defaults (
ALLOF_CONFIG_SCHEMA)- $defs with anyOf (
DEFS_CONFIG_SCHEMA)All schemas are syntactically valid and properly test the reference resolution functionality described in the PR objectives.
220-280: LGTM!The new fixtures properly set up the necessary file structure for testing various $ref resolution scenarios. Each fixture:
- Creates a main schema file with appropriate $ref declarations
- Creates referenced schema files (bar.schema.yaml, baz.schema.yaml) in the same directory
- Returns the path to the main schema for use in tests
This structure accurately simulates the real-world use case described in issue #3648 where local cross-file schema references need to resolve correctly.
312-360: LGTM!The new test functions comprehensively validate the $ref resolution fix:
- Test relative references with required fields (lines 314-320)
- Test nested reference chains (lines 322-328)
- Test that remote $id doesn't break local $ref resolution (lines 330-338)
- Test fragment references (#/definitions/...) (lines 340-348)
- Test allOf with default value propagation (lines 350-355)
- Test anyOf via $defs with default value selection (lines 357-360)
Each test verifies both validation failures (missing required fields) and successes (valid data), which ensures the reference resolution is working correctly at multiple levels.
src/snakemake/utils.py (2)
53-53: LGTM!The import of
urlparseis necessary for parsing file:// URIs in the updatedretrieve_urifunction.
69-92: LGTM!The comment block is thorough and clearly documents:
- The purpose of the $id injection (enforce local reference resolution)
- The relationship to the regression (PR #3420, issue #3648)
- The intentional breaking change (no longer fetching remote URIs specified in schema $id)
- The rationale for this design decision (efficiency and predictability)
This level of documentation will be valuable for future maintainers.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as resolved.
This comment was marked as resolved.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/snakemake/utils.py (1)
94-94: Consider using pathname2url for robust URI construction.The current approach of string concatenation may not handle Windows paths correctly (backslashes, drive letters) or URL-encode special characters (spaces, etc.) in file paths.
Apply this diff to use the standard library's URI construction:
+ from urllib.request import pathname2url + - schema["$id"] = "file://" + os.path.abspath(schemafile.get_path_or_uri()) + schema["$id"] = "file:" + pathname2url(os.path.abspath(schemafile.get_path_or_uri()))Alternatively, use pathlib for a more modern approach:
+ from pathlib import Path + - schema["$id"] = "file://" + os.path.abspath(schemafile.get_path_or_uri()) + schema["$id"] = Path(schemafile.get_path_or_uri()).resolve().as_uri()Both approaches ensure:
- Correct URI format on Windows (
file:///C:/pathinstead offile://C:\path)- Proper URL encoding of special characters (spaces →
%20)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/snakemake/utils.py(2 hunks)tests/test_schema.py(4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
⚙️ CodeRabbit configuration file
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
Files:
src/snakemake/utils.pytests/test_schema.py
🧬 Code graph analysis (2)
src/snakemake/utils.py (2)
src/snakemake/sourcecache.py (1)
abspath(118-119)src/snakemake/common/configfile.py (1)
_load_configfile(7-33)
tests/test_schema.py (1)
src/snakemake/utils.py (1)
validate(26-266)
🔇 Additional comments (7)
src/snakemake/utils.py (3)
53-54: LGTM! Imports are appropriate for URI handling.The addition of
urlparseandurl2pathnameis necessary for the new URI parsing logic in theretrieve_urifunction.
70-93: Excellent documentation of the design decision.The comment thoroughly explains the rationale for injecting the
$id, the regression being fixed, the backward compatibility break, and the reasoning behind it. This will be valuable for future maintainers.
96-107: LGTM! URI retrieval logic is correct.The function correctly:
- Accepts URIs resolved by the referencing library (absolute file:// URIs)
- Extracts the path component with
urlparse(uri).path- Converts the URL path to a local filesystem path with
url2pathname(handles platform differences and URL decoding)- Loads the schema using the existing
_load_configfilehelpertests/test_schema.py (4)
4-4: LGTM! Good practice to import at module scope.Moving the
WorkflowErrorimport to module scope is cleaner than importing it inside individual test functions.
61-159: LGTM! Comprehensive schema definitions for testing.The new schema definitions cover all the scenarios mentioned in the PR objectives:
- Relative
$refresolution- Nested
$refchains- Remote
$idwith local$ref(validates new behavior)- Fragment references (
#/definitions/...)allOfwith defaultsanyOfvia$defswith defaults
220-280: LGTM! Fixtures properly set up test environments.The fixtures correctly:
- Create schema files in temporary directories
- Write schema content from the constants
- Set up dependencies (e.g., bar.schema.yaml, baz.schema.yaml)
- Follow consistent naming conventions
314-360: LGTM! Thorough test coverage of reference resolution scenarios.The new tests effectively validate:
- Relative refs (
test_config_ref_relative): Basic relative$refresolution- Nested refs (
test_config_ref_relative_nested): Multi-level$refchains- Remote $id (
test_config_ref_relative_with_remote_id): Ensures local resolution even with remote$id(validates the behavior change documented in utils.py)- Fragment refs (
test_config_ref_relative_with_fragment): References to schema definitions- allOf defaults (
test_config_allof_default): Default value assignment withallOf- anyOf via $defs (
test_config_anyof_via_defs_default): Default value assignment withanyOfEach test correctly verifies both validation errors (invalid configs) and successful validation (valid configs), ensuring the fix works as intended.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/test_schema.py (1)
314-360: Good test coverage, but workflow-context scenario not tested.The new tests provide excellent coverage of various JSON Schema reference scenarios, including:
- Relative references (simple and nested)
- Remote
$idwith local$refresolution (verifying the intentional behavior)- Fragment references
- Default value assignment with
allOfandanyOfEach test follows a clear pattern: validate invalid config expecting failure, then validate valid config expecting success.
However, all tests call
validate()without a workflow context (i.e.,workflow=None). According to the past review comment onsrc/snakemake/utils.py, the original bug would manifest "whenvalidate()is called from a Snakefile" (whenworkflowexists andworkflow.sourcecache.open(schemafile)is used at line 64 instead ofschemafile.get_path_or_uri()at line 66).The current tests don't cover the workflow-context case, which means they wouldn't catch regressions in that code path. Consider adding at least one integration test that validates a schema from within a Snakefile context to ensure the fix works in production scenarios.
Would you like me to help design a workflow-context test or open an issue to track this test coverage gap?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/snakemake/utils.py(2 hunks)tests/test_schema.py(4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
⚙️ CodeRabbit configuration file
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
Files:
src/snakemake/utils.pytests/test_schema.py
🔇 Additional comments (6)
src/snakemake/utils.py (3)
53-55: LGTM!The imports are from the standard library and are appropriate for the URI-to-path conversion logic introduced in this PR.
71-94: LGTM!The extensive comments clearly document the rationale for injecting the local
file://URI as$id, the intentional backward-compatibility break from the RefResolver approach, and why remote URI resolution is not supported. This level of detail is appropriate given the complexity and the intentional breaking change.
97-108: LGTM!The
retrieve_uriimplementation correctly convertsfile://URIs to filesystem paths usingurl2pathname(urlparse(uri).path)before passing to_load_configfile. This approach properly handles:
- URL encoding in the path
- Windows file URIs (
file:///C:/...)- The intentional design constraint that remote URIs are not supported
The comments clearly explain that the referencing library resolves relative
$refs against the schema's$idbefore callingretrieve, so all URIs should be absolutefile://URIs by this point.tests/test_schema.py (3)
4-4: LGTM!Moving the
WorkflowErrorimport to module scope is a good practice and improves code organization.
61-159: LGTM!The new schema constants provide comprehensive coverage of various JSON Schema reference scenarios:
- Relative references
- Nested references
- Remote
$idwith local$ref(tests the intentional behavior change)- Fragment references (
#/definitions/...)allOfwith defaults$defswithanyOfand defaultsThe schema definitions are syntactically correct and well-structured for testing the restored local reference resolution behavior.
220-280: LGTM!The new fixtures are well-structured and correctly set up the schema file dependencies for testing various reference scenarios. Each fixture:
- Creates the main schema file
- Creates any dependent schema files that will be referenced
- Returns the path to the main schema file
This approach provides clean test isolation and comprehensive scenario coverage.
|
Thanks a lot! Also for the in depth and careful testing!! |
|
@johanneskoester: Thank you for the fast review. Anything else you want me to adjust? Like how exactly to construct the absolute config schema |
Head branch was pushed to by a user without write access
This commit fixes a regression introduced in Snakemake 9.6.0 where relative `$ref` references in local config schemas were not correctly resolved during validation. This regression was introduced when refactoring the config validation from the (deprecated) `RefResolver` to the (current) `resolving` API in snakemake#3420. The two required changes were: 1. Injection of an absolute file:// $id in schemas to enforce proper local reference resolution. 2. A follow-up update to retrieve_uri to correctly load subschemas (as it now receives a file:// URI). In addition, to prevent future regressions, this commit adds tests for (nested) `$ref`s, `$ref` with remote `$id` (see below), fragment references, `allOf`/`anyOf` defaults, and `$def`s. This commit restores pre-9.6.0 behavior and prevents ValidationError when nested schemas are referenced. Caveat: This commit intentionally does *not* restore one specific aspect of the old (`RefResolver` based) implementation: If a schema file contains an explicit `$id` pointing to a remote copy of the schema, the old implementation would fetch any references (in- or external) from that remote URI. Instead, the implementation introduced by this commit always resolves references based on the local schema file. This behaviour is demonstrated (and validated) with the remote `$id` test mentioned above. The following checks have been performed to ensure the correctness of this patch: 1. All tests (but the remote `$id` one, see above) were successfully run at commit `960f6a89eaa31da6014e810dfcf08f635ac03a6e` (last ancestor of current `main` that still uses the old `RefResolver` implementation), proving that they cover pre-9.6.0 behaviour. 2. Re-running the same tests at commit `cf724272eefcd9b30fb625d2e16380727bef9c3e` (direct child of the above, introducing the changes from PR snakemake#3420), the three (nested) `$ref`s test fail, verifying this commit/PR caused the regression and the tests added in this commit catch that regression. 3. The same tests still fail at commit `2eb9079dae29f271aeacff75e59f6c2f6a0d352d` (current `main`), demonstrating that the regression has not been fixed yet. 4. All tests pass after applying the changes from this commit, validating it fixes the regression. --- fixes snakemake#3648
|
I went ahead and force-pushed a rebase and the typo fix caught by your automatic check. So this version should pass all tests now. 🤞🏻 |
|
Thanks a lot! |
🤖 I have created a release *beep* *boop* --- ## [9.13.1](v9.13.0...v9.13.1) (2025-10-11) ### Bug Fixes * **config:** restore $ref resolution for schemas ([#3777](#3777)) ([a3e0939](a3e0939)) * fix iofile formatting to always display the storage query if it is set, also in case of missing input exceptions ([#3786](#3786)) ([b1c29fa](b1c29fa)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
This PR fixes a regression introduced in Snakemake 9.6.0 where relative `$ref` references in local config schemas were not correctly resolved during validation. This regression was introduced when refactoring the config validation from the (deprecated) `RefResolver` to the (current) `resolving` API in snakemake#3420. The two required changes were: 1. Injection of an absolute `file://` `$id` in schemas to enforce proper local reference resolution. 2. A follow-up update to retrieve_uri to correctly load subschemas (as it now receives a `file://` URI). In addition, to prevent future regressions, this PR adds tests for (nested) `$ref`s, `$ref` with remote `$id` (see below), fragment references, `allOf`/`anyOf` defaults, and `$def`s. This commit restores pre-9.6.0 behavior and prevents ValidationError when nested schemas are referenced. Caveat: This PR intentionally does *not* restore one specific aspect of the old (`RefResolver` based) implementation: If a schema file contains an explicit `$id` pointing to a remote copy of the schema, the old implementation would fetch any references (in- or external) from that remote URI. Instead, the implementation introduced by this PR always resolves references based on the local schema file. This behaviour is demonstrated (and validated) with the remote `$id` test mentioned above. The following checks have been performed to ensure the correctness of this patch: 1. All tests (but the remote `$id` one, see above) were successfully run at commit [`960f6a89eaa31da6014e810dfcf08f635ac03a6e`](snakemake@960f6a8) (last ancestor of current `main` that still uses the old `RefResolver` implementation), proving that they cover pre-9.6.0 behaviour. 2. Re-running the same tests at commit [`cf724272eefcd9b30fb625d2e16380727bef9c3e`](snakemake@cf72427) (direct child of the above, introducing the changes from PR snakemake#3420), the three (nested) `$ref`s test fail, verifying this commit/PR caused the regression and the tests added in this commit catch that regression. 3. The same tests still fail at commit [`2eb9079dae29f271aeacff75e59f6c2f6a0d352d`](snakemake@2eb9079) (current `main`), demonstrating that the regression has not been fixed yet. 4. All tests pass after applying the changes from this PR, validating it fixes the regression. ### QC * [x] The PR contains a test case for the changes or the changes are already covered by an existing test case. * [x] The documentation (`docs/`) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake). #### Notes: This is my first contribution to the project. Thus, I am not familiar with the codebase. It is also my first time dealing with `jsonschema` and the both, its (deprecated) `RefResolver` API (to analyse the old implementation and design tests accordingly), and its current `resolving` API (to fix the regression). Please let me know if you have any adjustments you want/need me to do. Also, I started from a standalone POC script to make sure I understand the API, defined tests and then worked on the actual fix off that. Initially, I overestimated the power of pre-9.6.0's implementation of the `validate` function and designed a test that tested defaults propagation across (nested) `$ref`s. Thus, my intermediate solution was much more complex (and powerful) than this simple (minimal) fix. I do have a local branch with that work in case anybody is interested in implementing more powerful defaults value handling during config validation. Please reach out if there is interest. --- fixes snakemake#3648 <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Bug Fixes** * Schema resolution now enforces local resolution of relative $ref references, improving validation and default assignment across nested refs, fragments, allOf/anyOf/defs; may change behavior for schemas that previously relied on remote resolution. * **Tests** * Added comprehensive tests and fixtures covering relative/fragment references, nested schemas, allOf/anyOf/defs, default handling, and error scenarios. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
🤖 I have created a release *beep* *boop* --- ## [9.13.1](snakemake/snakemake@v9.13.0...v9.13.1) (2025-10-11) ### Bug Fixes * **config:** restore $ref resolution for schemas ([snakemake#3777](snakemake#3777)) ([a3e0939](snakemake@a3e0939)) * fix iofile formatting to always display the storage query if it is set, also in case of missing input exceptions ([snakemake#3786](snakemake#3786)) ([b1c29fa](snakemake@b1c29fa)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
This PR fixes a regression introduced in Snakemake 9.6.0 where relative
$refreferences in local config schemas were not correctly resolved during validation.This regression was introduced when refactoring the config validation from the (deprecated)
RefResolverto the (current)resolvingAPI in #3420.The two required changes were:
file://$idin schemas to enforce proper local reference resolution.file://URI).In addition, to prevent future regressions, this PR adds tests for (nested)
$refs,$refwith remote$id(see below), fragment references,allOf/anyOfdefaults, and$defs.This commit restores pre-9.6.0 behavior and prevents ValidationError when nested schemas are referenced.
Caveat: This PR intentionally does not restore one specific aspect of the old (
RefResolverbased) implementation:If a schema file contains an explicit
$idpointing to a remote copy of the schema, the old implementation would fetch any references (in- or external) from that remote URI.Instead, the implementation introduced by this PR always resolves references based on the local schema file.
This behaviour is demonstrated (and validated) with the remote
$idtest mentioned above.The following checks have been performed to ensure the correctness of this patch:
$idone, see above) were successfully run at commit960f6a89eaa31da6014e810dfcf08f635ac03a6e(last ancestor of currentmainthat still uses the oldRefResolverimplementation), proving that they cover pre-9.6.0 behaviour.cf724272eefcd9b30fb625d2e16380727bef9c3e(direct child of the above, introducing the changes from PR fix: DeprecationWarning when using snakemake.utils.validate #3420), the three (nested)$refs test fail, verifying this commit/PR caused the regression and the tests added in this commit catch that regression.2eb9079dae29f271aeacff75e59f6c2f6a0d352d(currentmain), demonstrating that the regression has not been fixed yet.QC
docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).Notes:
This is my first contribution to the project. Thus, I am not familiar with the codebase.
It is also my first time dealing with
jsonschemaand the both, its (deprecated)RefResolverAPI (to analyse the old implementation and design tests accordingly), and its currentresolvingAPI (to fix the regression).Please let me know if you have any adjustments you want/need me to do.
Also, I started from a standalone POC script to make sure I understand the API, defined tests and then worked on the actual fix off that.
Initially, I overestimated the power of pre-9.6.0's implementation of the
validatefunction and designed a test that tested defaults propagation across (nested)$refs. Thus, my intermediate solution was much more complex (and powerful) than this simple (minimal) fix.I do have a local branch with that work in case anybody is interested in implementing more powerful defaults value handling during config validation. Please reach out if there is interest.
fixes #3648
Summary by CodeRabbit
Bug Fixes
Tests