Skip to content

fix(config): restore $ref resolution for schemas#3777

Merged
johanneskoester merged 1 commit intosnakemake:mainfrom
mschilli87:issue-3648
Oct 11, 2025
Merged

fix(config): restore $ref resolution for schemas#3777
johanneskoester merged 1 commit intosnakemake:mainfrom
mschilli87:issue-3648

Conversation

@mschilli87
Copy link
Copy Markdown
Contributor

@mschilli87 mschilli87 commented Oct 6, 2025

This PR fixes a regression introduced in Snakemake 9.6.0 where relative $ref references in local config schemas were not correctly resolved during validation.

This regression was introduced when refactoring the config validation from the (deprecated) RefResolver to the (current) resolving API in #3420.

The two required changes were:

  1. Injection of an absolute file:// $id in schemas to enforce proper local reference resolution.
  2. A follow-up update to retrieve_uri to correctly load subschemas (as it now receives a file:// URI).

In addition, to prevent future regressions, this PR adds tests for (nested) $refs, $ref with remote $id (see below), fragment references, allOf/anyOf defaults, and $defs.

This commit restores pre-9.6.0 behavior and prevents ValidationError when nested schemas are referenced.

Caveat: This PR intentionally does not restore one specific aspect of the old (RefResolver based) implementation:
If a schema file contains an explicit $id pointing to a remote copy of the schema, the old implementation would fetch any references (in- or external) from that remote URI.
Instead, the implementation introduced by this PR always resolves references based on the local schema file.
This behaviour is demonstrated (and validated) with the remote $id test mentioned above.

The following checks have been performed to ensure the correctness of this patch:

  1. All tests (but the remote $id one, see above) were successfully run at commit 960f6a89eaa31da6014e810dfcf08f635ac03a6e (last ancestor of current main that still uses the old RefResolver implementation), proving that they cover pre-9.6.0 behaviour.
  2. Re-running the same tests at commit cf724272eefcd9b30fb625d2e16380727bef9c3e (direct child of the above, introducing the changes from PR fix: DeprecationWarning when using snakemake.utils.validate #3420), the three (nested) $refs test fail, verifying this commit/PR caused the regression and the tests added in this commit catch that regression.
  3. The same tests still fail at commit 2eb9079dae29f271aeacff75e59f6c2f6a0d352d (current main), demonstrating that the regression has not been fixed yet.
  4. All tests pass after applying the changes from this PR, validating it fixes the regression.

QC

  • The PR contains a test case for the changes or the changes are already covered by an existing test case.
  • The documentation (docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).

Notes:

This is my first contribution to the project. Thus, I am not familiar with the codebase.
It is also my first time dealing with jsonschema and the both, its (deprecated) RefResolver API (to analyse the old implementation and design tests accordingly), and its current resolving API (to fix the regression).
Please let me know if you have any adjustments you want/need me to do.

Also, I started from a standalone POC script to make sure I understand the API, defined tests and then worked on the actual fix off that.
Initially, I overestimated the power of pre-9.6.0's implementation of the validate function and designed a test that tested defaults propagation across (nested) $refs. Thus, my intermediate solution was much more complex (and powerful) than this simple (minimal) fix.
I do have a local branch with that work in case anybody is interested in implementing more powerful defaults value handling during config validation. Please reach out if there is interest.


fixes #3648

Summary by CodeRabbit

  • Bug Fixes

    • Schema resolution now enforces local resolution of relative $ref references, improving validation and default assignment across nested refs, fragments, allOf/anyOf/defs; may change behavior for schemas that previously relied on remote resolution.
  • Tests

    • Added comprehensive tests and fixtures covering relative/fragment references, nested schemas, allOf/anyOf/defs, default handling, and error scenarios.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Oct 6, 2025

📝 Walkthrough

Walkthrough

Replaces remote JSON Schema $ref resolution with local-file resolution: sets the primary schema "$id" to the schema file's file:// URI and provides a custom retrieve_uri that converts URIs to local filesystem paths before loading referenced schemas; adds tests for relative, nested, fragment, allOf/anyOf/defs reference scenarios.

Changes

Cohort / File(s) Summary
Schema resolution logic
src/snakemake/utils.py
Add imports (Path, urlparse, url2pathname); assign schema["$id"] = Path(schemafile.get_path_or_uri()).resolve().as_uri(); implement retrieve_uri to convert URIs to local paths via urlparse(...).path and url2pathname and load referenced schemas with the existing loader; construct Resource/Registry with the custom retriever; retain comments about local vs remote resolution and backward-compatibility.
Schema resolution tests
tests/test_schema.py
Add multiple inline schema constants and pytest fixtures for relative, nested, remote-id, fragment, allOf/defs cases; add tests test_config_ref_relative, test_config_ref_relative_nested, test_config_ref_relative_with_remote_id, test_config_ref_relative_with_fragment, test_config_allof_default, test_config_anyof_via_defs_default; move WorkflowError import to module scope; assert expected validation outcomes and error raising for invalid references.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant Workflow
  participant Utils as snakemake.utils.validate
  participant Loader as _load_configfile
  participant JSONS as jsonschema.Registry/Resource

  User->>Workflow: run workflow (config + schema path)
  Workflow->>Utils: validate(config, schema=".../config.schema.yaml")
  Utils->>Loader: load primary schema file
  Note right of Utils: set schema["$id"]=Path(...).resolve().as_uri()

  rect rgb(230,245,255)
    Utils->>JSONS: build Resource/Registry with custom retrieve_uri
    JSONS-->>Utils: validator ready
    Utils->>JSONS: validate config
    JSONS->>Utils: request $ref (e.g. "bar.schema.yaml")
    Note right of Utils: retrieve_uri -> urlparse(uri).path -> url2pathname -> local path
    Utils->>Loader: load referenced schema from local filesystem path
    Loader-->>Utils: referenced schema contents
    Utils-->>JSONS: provide referenced schema
  end

  JSONS-->>Utils: validation result
  Utils-->>Workflow: return success or raise WorkflowError
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 5.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
Check name Status Explanation
Title Check ✅ Passed The title concisely indicates the fix to restore $ref resolution in configuration schemas within the config validation logic, directly reflecting the main change in the pull request.
Linked Issues Check ✅ Passed The changes inject an absolute file:// $id, update retrieve_uri, and add comprehensive tests for nested, remote‐ID, fragment, allOf/anyOf, and $defs scenarios, directly addressing the objectives of issue #3648 to restore local JSON Schema $ref resolution.
Out of Scope Changes Check ✅ Passed All modifications are limited to the schema resolution logic in utils.py and the corresponding test suite; no unrelated files or functionality have been altered.
Description Check ✅ Passed The pull request description clearly outlines the regression issue, the specific code changes made to restore correct $ref resolution, the rationale behind those changes, and the historical testing steps, and it includes a properly formatted QC section with both required checkboxes ticked in accordance with the repository’s template.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@mschilli87

This comment was marked as outdated.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/snakemake/utils.py (1)

95-104: Consider Windows path handling for file:// URIs.

On Windows, urlparse(uri).path for file:///C:/path/to/file returns /C:/path/to/file, which includes a leading slash before the drive letter. While _load_configfile might handle this correctly (via open()), it's worth verifying that this works on Windows.

If issues arise on Windows, use urllib.request.url2pathname:

     from urllib.parse import urlparse
+    from urllib.request import url2pathname
     
     ...
     
     def retrieve_uri(uri):
         ...
         return Resource.from_contents(
-            contents=_load_configfile(urlparse(uri).path, filetype="Schema")
+            contents=_load_configfile(url2pathname(urlparse(uri).path), filetype="Schema")
         )

Alternatively, verify current behavior is correct on Windows through testing.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a9c9d27 and dc95ddb.

📒 Files selected for processing (2)
  • src/snakemake/utils.py (2 hunks)
  • tests/test_schema.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

  • src/snakemake/utils.py
  • tests/test_schema.py
🧬 Code graph analysis (2)
src/snakemake/utils.py (1)
src/snakemake/common/configfile.py (1)
  • _load_configfile (7-33)
tests/test_schema.py (1)
src/snakemake/utils.py (1)
  • validate (26-263)
🔇 Additional comments (6)
tests/test_schema.py (4)

4-4: LGTM!

The import of WorkflowError is correctly added to support the new test assertions that expect validation errors.


61-159: LGTM!

The new schema constants comprehensively cover various JSON Schema reference patterns:

  • Relative $ref (RELATIVE_CONFIG_SCHEMA)
  • Nested relative $ref chains (NESTED_BAR_SCHEMA, NESTED_BAZ_SCHEMA)
  • Remote $id with local $ref (REMOTE_ID_CONFIG_SCHEMA)
  • Fragment references (FRAGMENT_BAR_SCHEMA, FRAGMENT_CONFIG_SCHEMA)
  • allOf with defaults (ALLOF_CONFIG_SCHEMA)
  • $defs with anyOf (DEFS_CONFIG_SCHEMA)

All schemas are syntactically valid and properly test the reference resolution functionality described in the PR objectives.


220-280: LGTM!

The new fixtures properly set up the necessary file structure for testing various $ref resolution scenarios. Each fixture:

  • Creates a main schema file with appropriate $ref declarations
  • Creates referenced schema files (bar.schema.yaml, baz.schema.yaml) in the same directory
  • Returns the path to the main schema for use in tests

This structure accurately simulates the real-world use case described in issue #3648 where local cross-file schema references need to resolve correctly.


312-360: LGTM!

The new test functions comprehensively validate the $ref resolution fix:

  • Test relative references with required fields (lines 314-320)
  • Test nested reference chains (lines 322-328)
  • Test that remote $id doesn't break local $ref resolution (lines 330-338)
  • Test fragment references (#/definitions/...) (lines 340-348)
  • Test allOf with default value propagation (lines 350-355)
  • Test anyOf via $defs with default value selection (lines 357-360)

Each test verifies both validation failures (missing required fields) and successes (valid data), which ensures the reference resolution is working correctly at multiple levels.

src/snakemake/utils.py (2)

53-53: LGTM!

The import of urlparse is necessary for parsing file:// URIs in the updated retrieve_uri function.


69-92: LGTM!

The comment block is thorough and clearly documents:

  • The purpose of the $id injection (enforce local reference resolution)
  • The relationship to the regression (PR #3420, issue #3648)
  • The intentional breaking change (no longer fetching remote URIs specified in schema $id)
  • The rationale for this design decision (efficiency and predictability)

This level of documentation will be valuable for future maintainers.

Comment thread src/snakemake/utils.py Outdated
@mschilli87

This comment was marked as outdated.

@mschilli87

This comment was marked as resolved.

mschilli87 added a commit to mschilli87/snakemake that referenced this pull request Oct 6, 2025
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/snakemake/utils.py (1)

94-94: Consider using pathname2url for robust URI construction.

The current approach of string concatenation may not handle Windows paths correctly (backslashes, drive letters) or URL-encode special characters (spaces, etc.) in file paths.

Apply this diff to use the standard library's URI construction:

+    from urllib.request import pathname2url
+
-    schema["$id"] = "file://" + os.path.abspath(schemafile.get_path_or_uri())
+    schema["$id"] = "file:" + pathname2url(os.path.abspath(schemafile.get_path_or_uri()))

Alternatively, use pathlib for a more modern approach:

+    from pathlib import Path
+
-    schema["$id"] = "file://" + os.path.abspath(schemafile.get_path_or_uri())
+    schema["$id"] = Path(schemafile.get_path_or_uri()).resolve().as_uri()

Both approaches ensure:

  • Correct URI format on Windows (file:///C:/path instead of file://C:\path)
  • Proper URL encoding of special characters (spaces → %20)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3f606e2 and 5d5f507.

📒 Files selected for processing (2)
  • src/snakemake/utils.py (2 hunks)
  • tests/test_schema.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

  • src/snakemake/utils.py
  • tests/test_schema.py
🧬 Code graph analysis (2)
src/snakemake/utils.py (2)
src/snakemake/sourcecache.py (1)
  • abspath (118-119)
src/snakemake/common/configfile.py (1)
  • _load_configfile (7-33)
tests/test_schema.py (1)
src/snakemake/utils.py (1)
  • validate (26-266)
🔇 Additional comments (7)
src/snakemake/utils.py (3)

53-54: LGTM! Imports are appropriate for URI handling.

The addition of urlparse and url2pathname is necessary for the new URI parsing logic in the retrieve_uri function.


70-93: Excellent documentation of the design decision.

The comment thoroughly explains the rationale for injecting the $id, the regression being fixed, the backward compatibility break, and the reasoning behind it. This will be valuable for future maintainers.


96-107: LGTM! URI retrieval logic is correct.

The function correctly:

  1. Accepts URIs resolved by the referencing library (absolute file:// URIs)
  2. Extracts the path component with urlparse(uri).path
  3. Converts the URL path to a local filesystem path with url2pathname (handles platform differences and URL decoding)
  4. Loads the schema using the existing _load_configfile helper
tests/test_schema.py (4)

4-4: LGTM! Good practice to import at module scope.

Moving the WorkflowError import to module scope is cleaner than importing it inside individual test functions.


61-159: LGTM! Comprehensive schema definitions for testing.

The new schema definitions cover all the scenarios mentioned in the PR objectives:

  • Relative $ref resolution
  • Nested $ref chains
  • Remote $id with local $ref (validates new behavior)
  • Fragment references (#/definitions/...)
  • allOf with defaults
  • anyOf via $defs with defaults

220-280: LGTM! Fixtures properly set up test environments.

The fixtures correctly:

  • Create schema files in temporary directories
  • Write schema content from the constants
  • Set up dependencies (e.g., bar.schema.yaml, baz.schema.yaml)
  • Follow consistent naming conventions

314-360: LGTM! Thorough test coverage of reference resolution scenarios.

The new tests effectively validate:

  • Relative refs (test_config_ref_relative): Basic relative $ref resolution
  • Nested refs (test_config_ref_relative_nested): Multi-level $ref chains
  • Remote $id (test_config_ref_relative_with_remote_id): Ensures local resolution even with remote $id (validates the behavior change documented in utils.py)
  • Fragment refs (test_config_ref_relative_with_fragment): References to schema definitions
  • allOf defaults (test_config_allof_default): Default value assignment with allOf
  • anyOf via $defs (test_config_anyof_via_defs_default): Default value assignment with anyOf

Each test correctly verifies both validation errors (invalid configs) and successful validation (valid configs), ensuring the fix works as intended.

@mschilli87

This comment was marked as resolved.

@mschilli87

This comment was marked as resolved.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Oct 6, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/test_schema.py (1)

314-360: Good test coverage, but workflow-context scenario not tested.

The new tests provide excellent coverage of various JSON Schema reference scenarios, including:

  • Relative references (simple and nested)
  • Remote $id with local $ref resolution (verifying the intentional behavior)
  • Fragment references
  • Default value assignment with allOf and anyOf

Each test follows a clear pattern: validate invalid config expecting failure, then validate valid config expecting success.

However, all tests call validate() without a workflow context (i.e., workflow=None). According to the past review comment on src/snakemake/utils.py, the original bug would manifest "when validate() is called from a Snakefile" (when workflow exists and workflow.sourcecache.open(schemafile) is used at line 64 instead of schemafile.get_path_or_uri() at line 66).

The current tests don't cover the workflow-context case, which means they wouldn't catch regressions in that code path. Consider adding at least one integration test that validates a schema from within a Snakefile context to ensure the fix works in production scenarios.

Would you like me to help design a workflow-context test or open an issue to track this test coverage gap?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5d5f507 and cc4ddf3.

📒 Files selected for processing (2)
  • src/snakemake/utils.py (2 hunks)
  • tests/test_schema.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

  • src/snakemake/utils.py
  • tests/test_schema.py
🔇 Additional comments (6)
src/snakemake/utils.py (3)

53-55: LGTM!

The imports are from the standard library and are appropriate for the URI-to-path conversion logic introduced in this PR.


71-94: LGTM!

The extensive comments clearly document the rationale for injecting the local file:// URI as $id, the intentional backward-compatibility break from the RefResolver approach, and why remote URI resolution is not supported. This level of detail is appropriate given the complexity and the intentional breaking change.


97-108: LGTM!

The retrieve_uri implementation correctly converts file:// URIs to filesystem paths using url2pathname(urlparse(uri).path) before passing to _load_configfile. This approach properly handles:

  • URL encoding in the path
  • Windows file URIs (file:///C:/...)
  • The intentional design constraint that remote URIs are not supported

The comments clearly explain that the referencing library resolves relative $refs against the schema's $id before calling retrieve, so all URIs should be absolute file:// URIs by this point.

tests/test_schema.py (3)

4-4: LGTM!

Moving the WorkflowError import to module scope is a good practice and improves code organization.


61-159: LGTM!

The new schema constants provide comprehensive coverage of various JSON Schema reference scenarios:

  • Relative references
  • Nested references
  • Remote $id with local $ref (tests the intentional behavior change)
  • Fragment references (#/definitions/...)
  • allOf with defaults
  • $defs with anyOf and defaults

The schema definitions are syntactically correct and well-structured for testing the restored local reference resolution behavior.


220-280: LGTM!

The new fixtures are well-structured and correctly set up the schema file dependencies for testing various reference scenarios. Each fixture:

  1. Creates the main schema file
  2. Creates any dependent schema files that will be referenced
  3. Returns the path to the main schema file

This approach provides clean test isolation and comprehensive scenario coverage.

Comment thread src/snakemake/utils.py
@johanneskoester
Copy link
Copy Markdown
Contributor

Thanks a lot! Also for the in depth and careful testing!!

@johanneskoester johanneskoester enabled auto-merge (squash) October 8, 2025 09:45
@mschilli87
Copy link
Copy Markdown
Contributor Author

mschilli87 commented Oct 8, 2025

@johanneskoester: Thank you for the fast review. Anything else you want me to adjust? Like how exactly to construct the absolute config schema file: URI (see bot messages), spelling mistakes, rebasing on / merging of latest main, etc.?

auto-merge was automatically disabled October 8, 2025 10:07

Head branch was pushed to by a user without write access

This commit fixes a regression introduced in Snakemake 9.6.0 where
relative `$ref` references in local config schemas were not correctly
resolved during validation.

This regression was introduced when refactoring the config validation
from the (deprecated) `RefResolver` to the (current) `resolving` API in
snakemake#3420.

The two required changes were:

1. Injection of an absolute file:// $id in schemas to enforce proper
   local reference resolution.
2. A follow-up update to retrieve_uri to correctly load subschemas (as
   it now receives a file:// URI).

In addition, to prevent future regressions, this commit adds tests for
(nested) `$ref`s, `$ref` with remote `$id` (see below), fragment
references, `allOf`/`anyOf` defaults, and `$def`s.

This commit restores pre-9.6.0 behavior and prevents ValidationError
when nested schemas are referenced.

Caveat: This commit intentionally does *not* restore one specific aspect
of the old (`RefResolver` based) implementation:
If a schema file contains an explicit `$id` pointing to a remote copy of
the schema, the old implementation would fetch any references (in- or
external) from that remote URI.
Instead, the implementation introduced by this commit always resolves
references based on the local schema file.
This behaviour is demonstrated (and validated) with the remote `$id`
test mentioned above.

The following checks have been performed to ensure the correctness of
this patch:

1. All tests (but the remote `$id` one, see above) were successfully
   run at commit `960f6a89eaa31da6014e810dfcf08f635ac03a6e` (last
   ancestor of current `main` that still uses the old `RefResolver`
   implementation), proving that they cover pre-9.6.0 behaviour.
2. Re-running the same tests at commit
   `cf724272eefcd9b30fb625d2e16380727bef9c3e` (direct child of the
   above, introducing the changes from PR snakemake#3420), the three (nested)
   `$ref`s test fail, verifying this commit/PR caused the regression and
   the tests added in this commit catch that regression.
3. The same tests still fail at commit
   `2eb9079dae29f271aeacff75e59f6c2f6a0d352d` (current `main`),
   demonstrating that the regression has not been fixed yet.
4. All tests pass after applying the changes from this commit,
   validating it fixes the regression.

---
fixes snakemake#3648
@mschilli87
Copy link
Copy Markdown
Contributor Author

I went ahead and force-pushed a rebase and the typo fix caught by your automatic check. So this version should pass all tests now. 🤞🏻

@johanneskoester johanneskoester enabled auto-merge (squash) October 11, 2025 08:30
@johanneskoester
Copy link
Copy Markdown
Contributor

Thanks a lot!

@johanneskoester johanneskoester merged commit a3e0939 into snakemake:main Oct 11, 2025
61 checks passed
johanneskoester pushed a commit that referenced this pull request Oct 11, 2025
🤖 I have created a release *beep* *boop*
---


##
[9.13.1](v9.13.0...v9.13.1)
(2025-10-11)


### Bug Fixes

* **config:** restore $ref resolution for schemas
([#3777](#3777))
([a3e0939](a3e0939))
* fix iofile formatting to always display the storage query if it is
set, also in case of missing input exceptions
([#3786](#3786))
([b1c29fa](b1c29fa))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
@mschilli87 mschilli87 deleted the issue-3648 branch October 11, 2025 09:33
kjohnsen pushed a commit to kjohnsen/snakemake that referenced this pull request Dec 15, 2025
This PR fixes a regression introduced in Snakemake 9.6.0 where relative
`$ref` references in local config schemas were not correctly resolved
during validation.

This regression was introduced when refactoring the config validation
from the (deprecated) `RefResolver` to the (current) `resolving` API in
snakemake#3420.

The two required changes were:

1. Injection of an absolute `file://` `$id` in schemas to enforce proper
local reference resolution.
2. A follow-up update to retrieve_uri to correctly load subschemas (as
it now receives a `file://` URI).

In addition, to prevent future regressions, this PR adds tests for
(nested) `$ref`s, `$ref` with remote `$id` (see below), fragment
references, `allOf`/`anyOf` defaults, and `$def`s.

This commit restores pre-9.6.0 behavior and prevents ValidationError
when nested schemas are referenced.

Caveat: This PR intentionally does *not* restore one specific aspect of
the old (`RefResolver` based) implementation:
If a schema file contains an explicit `$id` pointing to a remote copy of
the schema, the old implementation would fetch any references (in- or
external) from that remote URI.
Instead, the implementation introduced by this PR always resolves
references based on the local schema file.
This behaviour is demonstrated (and validated) with the remote `$id`
test mentioned above.

The following checks have been performed to ensure the correctness of
this patch:

1. All tests (but the remote `$id` one, see above) were successfully run
at commit
[`960f6a89eaa31da6014e810dfcf08f635ac03a6e`](snakemake@960f6a8)
(last ancestor of current `main` that still uses the old `RefResolver`
implementation), proving that they cover pre-9.6.0 behaviour.
2. Re-running the same tests at commit
[`cf724272eefcd9b30fb625d2e16380727bef9c3e`](snakemake@cf72427)
(direct child of the above, introducing the changes from PR snakemake#3420), the
three (nested) `$ref`s test fail, verifying this commit/PR caused the
regression and the tests added in this commit catch that regression.
3. The same tests still fail at commit
[`2eb9079dae29f271aeacff75e59f6c2f6a0d352d`](snakemake@2eb9079)
(current `main`), demonstrating that the regression has not been fixed
yet.
4. All tests pass after applying the changes from this PR, validating it
fixes the regression.

### QC

* [x] The PR contains a test case for the changes or the changes are
already covered by an existing test case.
* [x] The documentation (`docs/`) is updated to reflect the changes or
this is not necessary (e.g. if the change does neither modify the
language nor the behavior or functionalities of Snakemake).

#### Notes:

This is my first contribution to the project. Thus, I am not familiar
with the codebase.
It is also my first time dealing with `jsonschema` and the both, its
(deprecated) `RefResolver` API (to analyse the old implementation and
design tests accordingly), and its current `resolving` API (to fix the
regression).
Please let me know if you have any adjustments you want/need me to do.

Also, I started from a standalone POC script to make sure I understand
the API, defined tests and then worked on the actual fix off that.
Initially, I overestimated the power of pre-9.6.0's implementation of
the `validate` function and designed a test that tested defaults
propagation across (nested) `$ref`s. Thus, my intermediate solution was
much more complex (and powerful) than this simple (minimal) fix.
I do have a local branch with that work in case anybody is interested in
implementing more powerful defaults value handling during config
validation. Please reach out if there is interest.

---
fixes snakemake#3648

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Bug Fixes**
* Schema resolution now enforces local resolution of relative $ref
references, improving validation and default assignment across nested
refs, fragments, allOf/anyOf/defs; may change behavior for schemas that
previously relied on remote resolution.

* **Tests**
* Added comprehensive tests and fixtures covering relative/fragment
references, nested schemas, allOf/anyOf/defs, default handling, and
error scenarios.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
kjohnsen pushed a commit to kjohnsen/snakemake that referenced this pull request Dec 15, 2025
🤖 I have created a release *beep* *boop*
---


##
[9.13.1](snakemake/snakemake@v9.13.0...v9.13.1)
(2025-10-11)


### Bug Fixes

* **config:** restore $ref resolution for schemas
([snakemake#3777](snakemake#3777))
([a3e0939](snakemake@a3e0939))
* fix iofile formatting to always display the storage query if it is
set, also in case of missing input exceptions
([snakemake#3786](snakemake#3786))
([b1c29fa](snakemake@b1c29fa))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Regression: Config YAML validation with cross-file JSON Schema references fails in Snakemake >= 9.6.0

2 participants