feat: Add --external-ref-mapping to import from external packages instead of generating #3006
Add a new --external-ref-mapping option that maps external $ref file paths to Python import packages. When a $ref points to a mapped file, an import statement is generated instead of a duplicate class definition. This commit adds the option to the CLI argument parser, Config class, GenerateConfig, ParserConfig, and their TypedDict counterparts. No behavior change yet — the core logic follows in the next commit. Co-Authored-By: Claude Opus 4.6 <[email protected]>
When --external-ref-mapping is provided, external $ref targets that match a mapped file produce import-based DataTypes (via Import.from_full_path and DataType.from_import) instead of loading and parsing the external file. This follows the exact same pattern as the existing x-python-import vendor extension, but configured externally via CLI rather than requiring modifications to the schema YAML.
Three changes in jsonschema.py:
- __init__: normalize mapping file paths to absolute for reliable matching
- get_ref_data_type: check mapping before _load_ref_schema_object
- resolve_ref: skip loading/parsing for mapped external files
Co-Authored-By: Claude Opus 4.6 <[email protected]>
Five test cases covering:
- Basic CLI usage: external refs produce imports, not class definitions
- No duplicate classes: mapped types are imported, not generated
- Regression: without the flag, behavior is unchanged
- Invalid format: missing '=' in mapping produces a clear error
- Programmatic API: GenerateConfig with external_ref_mapping dict
Test fixtures: api.yaml referencing common.yaml via $ref, with expected output showing imports from the mapped package.
Co-Authored-By: Claude Opus 4.6 <[email protected]>
📝 Walkthrough
Sequence Diagram
sequenceDiagram
participant User as User
participant CLI as CLI
participant Config as Config
participant Parser as JSONSchemaParser
participant Resolver as Resolver
User->>CLI: run with --external-ref-mapping path/schema.yaml=mypackage.models
CLI->>Config: parse args (list of KEY=VALUE)
Config->>Config: validate & convert to dict[str,str]
Config->>Parser: initialize with external_ref_mapping
Parser->>Parser: normalize mapping (abs paths / URLs)
User->>Parser: encounter $ref "path/schema.yaml#/components/schemas/User"
Parser->>Parser: _resolve_external_ref_mapping(ref)?
alt mapped
Parser->>Parser: _check_external_ref_mapping -> build Import-backed DataType
Parser->>User: return DataType referencing imported model
else not mapped
Parser->>Resolver: resolve_ref -> load/parse external schema
Resolver->>Parser: return generated DataType (class)
Parser->>User: return generated DataType
end
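The "normalize mapping (abs paths / URLs)" step in the diagram could look roughly like the sketch below. It is not the PR's code: `normalize_mapping` is a hypothetical helper, and leaving URL keys untouched is an assumption made for illustration.

```python
from pathlib import Path
from urllib.parse import urlparse


def normalize_mapping(mapping: dict[str, str], base_dir: Path) -> dict[str, str]:
    """Return a copy of the mapping whose file keys are absolute paths.

    Hypothetical helper: URL keys are kept verbatim (an assumption), file
    keys are resolved against base_dir so later lookups compare like with like.
    """
    normalized: dict[str, str] = {}
    for key, package in mapping.items():
        if urlparse(key).scheme in {"http", "https"}:
            normalized[key] = package
        else:
            # resolve() also collapses ".." segments
            normalized[str((base_dir / key).resolve())] = package
    return normalized


example = normalize_mapping(
    {"../common/components.yaml": "mypackage.shared.models"},
    Path("/project/specs/api"),
)
```

Normalizing once at start-up means every later `$ref` only has to be resolved to an absolute path and looked up in a plain dict.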
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pull request overview
This PR adds a new --external-ref-mapping CLI option that allows mapping external $ref file paths to Python import packages. When enabled, the code generator produces import statements instead of duplicating class definitions from external schemas. This solves the problem of code bloat and type identity issues in multi-spec API architectures.
Changes:
- Added `--external-ref-mapping` CLI option with validation for `FILE_PATH=PYTHON_PACKAGE` format
- Implemented mapping logic in JSON Schema parser to check refs against configured mappings before loading
- Added comprehensive test coverage with 5 test cases covering basic functionality, regression, error handling, and programmatic API usage
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/datamodel_code_generator/parser/jsonschema.py | Core implementation: path normalization, ref checking, and import generation logic |
| src/datamodel_code_generator/__main__.py | CLI validator to parse KEY=VALUE format and error handling |
| src/datamodel_code_generator/config.py | Added external_ref_mapping field to Config and ParserConfig |
| src/datamodel_code_generator/arguments.py | CLI argument definition with help text |
| src/datamodel_code_generator/cli_options.py | CLI option metadata registration |
| src/datamodel_code_generator/_types/parser_config_dicts.py | TypedDict field for type safety |
| src/datamodel_code_generator/_types/generate_config_dict.py | TypedDict field for GenerateConfig |
| tests/main/openapi/test_external_ref_mapping.py | Comprehensive test suite with 5 test cases |
| tests/data/openapi/external_ref_mapping/api.yaml | Test fixture for main API spec |
| tests/data/openapi/external_ref_mapping/common.yaml | Test fixture for shared schemas |
| tests/data/expected/main/openapi/external_ref_mapping.py | Expected output with imports instead of classes |
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/datamodel_code_generator/parser/jsonschema.py`:
- Around line 1338-1346: The code extracts class_name from the JSON pointer
fragment and uses it raw to build an import, which can produce invalid or
mismatched names; update the logic that builds full_path so the pointer segment
is unescaped and normalized using the same class-name resolver the generator
uses for models (apply the resolver used by the generator but skip the
uniqueness-renaming step), e.g. replace the raw class_name derived from fragment
with a normalized_name produced by the resolver, then call
Import.from_full_path(f"{python_package}.{normalized_name}") and append that
import to self.imports; reference symbols: fragment, class_name, python_package,
Import.from_full_path, self.imports, and the generator's class-name resolver
function.
- Around line 3831-3839: Update _is_external_ref_mapped and
_check_external_ref_mapping to resolve file refs using the dynamic parser
context instead of the static root: when splitting ref into file_part use
base_path = Path(file_part).parent if file_part else
self.model_resolver.current_base_path (same pattern as at line 1742), then
resolve the referenced file path against that context (use
model_resolver.current_base_path as the fallback) and check membership in
self._external_ref_mapping; replace usages of self.base_path / file_part with
this context-aware resolution so nested-schema relative refs map correctly
(apply the same change in both _is_external_ref_mapped and
_check_external_ref_mapping).
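To see why the second comment matters, consider a nested schema that refers to `../common.yaml`. Resolving that ref against the root directory and against the directory of the file currently being parsed gives different absolute paths, and only the latter matches the mapping key. The sketch below uses hypothetical names and paths (`resolve_ref_file`, `/specs/...`) and pure POSIX path arithmetic; it does not touch the project's `model_resolver`.

```python
import posixpath
from pathlib import PurePosixPath


def resolve_ref_file(file_part: str, current_base: PurePosixPath) -> str:
    """Resolve the file part of a $ref against the directory being parsed."""
    return posixpath.normpath(str(current_base / file_part))


root_dir = PurePosixPath("/specs")
nested_dir = PurePosixPath("/specs/nested")
mapping = {"/specs/common.yaml": "mypackage.shared.models"}

# A ref as it might be written inside /specs/nested/level1.yaml
ref_file = "../common.yaml"

from_root = resolve_ref_file(ref_file, root_dir)        # resolved against the static root
from_context = resolve_ref_file(ref_file, nested_dir)   # resolved against the current file's directory

print(from_root in mapping, from_context in mapping)
```

With the static root the lookup misses, so the external file would be loaded and its classes regenerated even though a mapping was configured.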
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
- tests/data/openapi/external_ref_mapping/api.yaml is excluded by !tests/data/**/*.yaml and included by none
- tests/data/openapi/external_ref_mapping/common.yaml is excluded by !tests/data/**/*.yaml and included by none
📒 Files selected for processing (9)
- src/datamodel_code_generator/__main__.py
- src/datamodel_code_generator/_types/generate_config_dict.py
- src/datamodel_code_generator/_types/parser_config_dicts.py
- src/datamodel_code_generator/arguments.py
- src/datamodel_code_generator/cli_options.py
- src/datamodel_code_generator/config.py
- src/datamodel_code_generator/parser/jsonschema.py
- tests/data/expected/main/openapi/external_ref_mapping.py
- tests/main/openapi/test_external_ref_mapping.py
With nargs="+", multiple mappings are passed after a single flag invocation, not by repeating the flag. Updated help text to reflect this. Co-Authored-By: Claude Opus 4.6 <[email protected]>
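The difference between the two invocation styles can be checked with a standalone `argparse` parser. This is a minimal sketch using a throwaway parser (`demo`), not the project's actual argument definitions; it assumes the default `store` action, which is what `add_argument` uses when no `action` is given.

```python
import argparse

parser = argparse.ArgumentParser(prog="demo")
parser.add_argument(
    "--external-ref-mapping",
    nargs="+",
    metavar="FILE_PATH=PYTHON_PACKAGE",
)

# Two mappings after a single flag: both values are collected.
once = parser.parse_args(
    ["--external-ref-mapping", "a.yaml=pkg.a", "b.yaml=pkg.b"]
)

# Repeating the flag: with the default "store" action, the last occurrence
# replaces the earlier one instead of extending it.
twice = parser.parse_args(
    [
        "--external-ref-mapping", "a.yaml=pkg.a",
        "--external-ref-mapping", "b.yaml=pkg.b",
    ]
)

print(once.external_ref_mapping)
print(twice.external_ref_mapping)
```

Repeating the flag silently drops the first mapping, which is why the help text needed to describe the single-flag form.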
🧹 Nitpick comments (2)
src/datamodel_code_generator/arguments.py (2)
998-1005: Consider adding a `type=` validator for early format validation.
Without a validator, malformed entries like `path/to/schema.yaml` (missing the `=` separator) are silently accepted at parse time and only fail downstream. A lightweight validator would surface the error immediately with a clear message.

♻️ Example validator
```python
from argparse import ArgumentTypeError


def _external_ref_mapping(value: str) -> str:
    """Validate FILE_PATH=PYTHON_PACKAGE format."""
    if "=" not in value:
        msg = f"Invalid format {value!r}: expected 'FILE_PATH=PYTHON_PACKAGE'"
        raise ArgumentTypeError(msg)
    return value
```
```diff
 base_options.add_argument(
     "--external-ref-mapping",
     nargs="+",
     metavar="FILE_PATH=PYTHON_PACKAGE",
+    type=_external_ref_mapping,
     help=...,
 )
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/datamodel_code_generator/arguments.py` around lines 998 - 1005, Add a lightweight validator function named _external_ref_mapping(value: str) that checks for the '=' separator and raises argparse.ArgumentTypeError with a clear message when missing, import ArgumentTypeError from argparse, and wire it into the existing base_options.add_argument call for "--external-ref-mapping" via the type=_external_ref_mapping parameter so malformed entries are rejected at parse time; update the add_argument invocation to include type=_external_ref_mapping and keep the existing nargs/metavar/help unchanged.
998-1005: Nit: argument is placed inside the `# Schema version options` section.
`--external-ref-mapping` is functionally a base/input option unrelated to schema versioning. Moving it above the `# Schema version options` comment (or after it, before the GraphQL section) would keep the section grouping meaningful.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/datamodel_code_generator/arguments.py` around lines 998 - 1005, The --external-ref-mapping argument (added via base_options.add_argument("--external-ref-mapping", ...)) is located inside the "# Schema version options" block; move that add_argument call out of that section into the base/input options area so grouping remains logical — e.g., relocate the base_options.add_argument for "--external-ref-mapping" above the "# Schema version options" comment (or immediately after it but before the GraphQL section), keeping the argument definition unchanged.
Merging this PR will not alter performance
♻️ Duplicate comments (1)
src/datamodel_code_generator/parser/jsonschema.py (1)
1359-1366: ⚠️ Potential issue | 🟠 Major
Normalize mapped class names before building imports.
At Line [1360], the class name is taken directly from the JSON pointer fragment. This can produce invalid/mismatched imports for escaped or non-identifier schema keys (e.g., kebab-case or `~1`/`%` escapes).

🔧 Proposed fix
```diff
-        # Extract class name from fragment (e.g., /components/schemas/SecretMetadata -> SecretMetadata)
-        class_name = fragment.rstrip("/").rsplit("/", maxsplit=1)[-1]
+        # Extract and normalize class name from JSON pointer fragment
+        raw_name = unescape_json_pointer_segment(fragment.rstrip("/").rsplit("/", maxsplit=1)[-1])
+        class_name = self.model_resolver.get_class_name(raw_name, unique=False).name
         if not class_name:
             return None
         # Construct import — same pattern as x-python-import
         full_path = f"{python_package}.{class_name}"
```
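For readers unfamiliar with the escapes mentioned here, the sketch below shows what unescaping a JSON Pointer segment (RFC 6901) and normalizing it to a class-like name might involve. Both functions are hypothetical stand-ins: `unescape_pointer_segment` is not the helper named in the proposed fix, and `to_class_name` is a deliberately crude PascalCase converter, not the generator's resolver.

```python
import re
from urllib.parse import unquote


def unescape_pointer_segment(segment: str) -> str:
    """Decode one JSON Pointer segment taken from a URI fragment (RFC 6901)."""
    decoded = unquote(segment)  # percent-decoding applies inside URI fragments
    # Order matters: "~1" must be handled before "~0" so "~01" becomes "~1".
    return decoded.replace("~1", "/").replace("~0", "~")


def to_class_name(raw: str) -> str:
    """Rough stand-in for a class-name resolver: PascalCase the word parts."""
    words = re.findall(r"[A-Za-z0-9]+", raw)
    return "".join(word[:1].upper() + word[1:] for word in words)


fragment = "/components/schemas/secret-metadata"
last_segment = fragment.rstrip("/").rsplit("/", maxsplit=1)[-1]
class_name = to_class_name(unescape_pointer_segment(last_segment))
print(class_name)
```

Using the raw segment here would yield `mypackage.models.secret-metadata`, which is not a valid Python import target.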
⛔ Files ignored due to path filters (2)
- tests/data/openapi/external_ref_mapping/api_nested.yaml is excluded by !tests/data/**/*.yaml and included by none
- tests/data/openapi/external_ref_mapping/nested/level1.yaml is excluded by !tests/data/**/*.yaml and included by none
📒 Files selected for processing (5)
- src/datamodel_code_generator/parser/jsonschema.py
- tests/data/expected/main/input_model/config_class.py
- tests/data/expected/main/openapi/external_ref_mapping_nested.py
- tests/main/openapi/test_external_ref_mapping.py
- tests/main/test_public_api_signature_baseline.py
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage Diff

| Metric | main | #3006 | +/- |
|---|---|---|---|
| Coverage | 100.00% | 100.00% | |
| Files | 94 | 94 | |
| Lines | 18182 | 18348 | +166 |
| Branches | 2108 | 2129 | +21 |
| Hits | 18182 | 18348 | +166 |
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/datamodel_code_generator/__main__.py`:
- Around line 327-336: The loop that parses raw external-ref mappings accepts
empty file_path or python_package values; update the parsing in __main__.py (the
for item in raw loop that splits into file_path, python_package and assigns
mapping[file_path.strip()] = python_package.strip()) to validate that both
file_path.strip() and python_package.strip() are non-empty after splitting, and
if either is empty raise Error with a clear message (e.g., "Invalid
--external-ref-mapping: empty file path or package in {item!r}"); ensure you
still trim whitespace and only add the mapping when both values are present and
valid.
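A parsing loop with the requested checks might look like the sketch below. It is an illustration only: `parse_external_ref_mapping` is a hypothetical name, and it raises `ValueError` where the project code would raise its own `Error` type.

```python
def parse_external_ref_mapping(raw: list[str]) -> dict[str, str]:
    """Turn ['FILE_PATH=PYTHON_PACKAGE', ...] into a dict, rejecting bad items."""
    mapping: dict[str, str] = {}
    for item in raw:
        file_path, sep, python_package = item.partition("=")
        file_path = file_path.strip()
        python_package = python_package.strip()
        if not sep:
            # No "=" at all: the item is not in KEY=VALUE form.
            raise ValueError(
                f"Invalid --external-ref-mapping: expected FILE_PATH=PYTHON_PACKAGE, got {item!r}"
            )
        if not file_path or not python_package:
            # "=" present but one side is empty after trimming whitespace.
            raise ValueError(
                f"Invalid --external-ref-mapping: empty file path or package in {item!r}"
            )
        mapping[file_path] = python_package
    return mapping


print(parse_external_ref_mapping(["common.yaml = mypackage.shared.models"]))
```

Splitting on the first `=` only (via `partition`) keeps any later `=` characters inside the package side rather than failing the split.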
📒 Files selected for processing (4)
- src/datamodel_code_generator/__main__.py
- src/datamodel_code_generator/arguments.py
- src/datamodel_code_generator/parser/jsonschema.py
- tests/main/openapi/test_external_ref_mapping.py
🚧 Files skipped from review as they are similar to previous changes (1)
- src/datamodel_code_generator/arguments.py
Breaking Change Analysis
Result: No breaking changes detected
Reasoning: This PR is purely additive - it introduces a new opt-in
This analysis was performed by Claude Code Action
🎉 Released in 0.54.1
This PR is now available in the latest release. See the release notes for details.
Summary
Adds `--external-ref-mapping` CLI option that maps external `$ref` file paths to Python import packages. When a `$ref` points to a mapped file, an import statement is generated instead of a duplicate class definition.

```bash
datamodel-codegen \
  --input api.yaml \
  --external-ref-mapping "../common/components.yaml=mypackage.shared.models"
```

Before (without flag):
After (with flag):
Motivation
Multi-spec API architectures (microservices, admin/public API splits, monorepos) commonly share schemas via external `$ref`. Today, `datamodel-codegen` resolves these refs and generates duplicate class definitions in every output package. This causes:
- `admin.models.User != shared.models.User` even though they're semantically identical

Comparison with `x-python-import`
The codebase already supports `x-python-import` for this purpose, but it requires modifying the schema YAML itself:

| `x-python-import` | `--external-ref-mapping` |
|---|---|

Implementation
101 lines added, 0 deleted. Three clean commits:
- `Import.from_full_path()` + `DataType.from_import()` pattern as `x-python-import`

Core changes in `parser/jsonschema.py`:
- `__init__`: normalize mapping paths to absolute for reliable matching
- `_check_external_ref_mapping()`: split ref → match file → extract class name → return `DataType.from_import()`
- `get_ref_data_type()`: check mapping before `_load_ref_schema_object()` (prevents loading external file)
- `resolve_ref()`: skip loading/parsing for mapped external files

Zero behavior change without the flag
Purely opt-in. All existing tests pass unmodified.
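The "split ref → match file → extract class name" steps listed under the core changes can be sketched as follows. This is a simplified illustration under stated assumptions: `import_for_ref` is a hypothetical function, it works on POSIX-style path strings, and it returns a dotted import path as a plain string where the PR is described as returning `DataType.from_import()`.

```python
import posixpath
from typing import Optional


def import_for_ref(ref: str, base_dir: str, mapping: dict[str, str]) -> Optional[str]:
    """Return 'package.ClassName' if ref points into a mapped file, else None."""
    file_part, _, fragment = ref.partition("#")
    if not file_part:
        return None  # local ref such as "#/components/schemas/User"
    # Resolve the file against the directory of the referencing schema.
    abs_path = posixpath.normpath(posixpath.join(base_dir, file_part))
    package = mapping.get(abs_path)
    if package is None:
        return None  # not mapped: the caller falls back to loading the file
    class_name = fragment.rstrip("/").rsplit("/", maxsplit=1)[-1]
    if not class_name:
        return None
    return f"{package}.{class_name}"


mapping = {"/specs/common/components.yaml": "mypackage.shared.models"}
result = import_for_ref(
    "../common/components.yaml#/components/schemas/User", "/specs/api", mapping
)
print(result)
```

Returning `None` for unmapped or local refs is what keeps the feature opt-in: callers continue down the existing load-and-generate path.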
Test plan
- Missing `=` in mapping produces a clear error message
- `GenerateConfig(external_ref_mapping={...})` works

🤖 Generated with Claude Code