
Preserve Python types (Set, Mapping, Sequence) in --input-model #2837

Merged
koxudaxi merged 1 commit into main from feature/preserve-python-types-in-input-model
Dec 28, 2025

@koxudaxi
Owner

@koxudaxi koxudaxi commented Dec 28, 2025

Summary by CodeRabbit

  • New Features

    • Generated models now preserve Python container types (Set, FrozenSet, Mapping, Sequence) from the source models through JSON Schema conversion, improving code fidelity and preventing type information loss.
  • Tests

    • Added comprehensive test coverage for type preservation across multiple output formats, nested models, and recursive structures.


@coderabbitai

coderabbitai Bot commented Dec 28, 2025

Warning

Rate limit exceeded

@koxudaxi has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 9 minutes and 24 seconds before requesting another review.


📥 Commits

Reviewing files that changed from the base of the PR and between 632e736 and d01a0b1.

📒 Files selected for processing (6)
  • src/datamodel_code_generator/__main__.py
  • src/datamodel_code_generator/parser/jsonschema.py
  • src/datamodel_code_generator/types.py
  • tests/data/python/input_model/dataclass_models.py
  • tests/data/python/input_model/pydantic_models.py
  • tests/test_input_model.py
📝 Walkthrough


This change introduces systematic preservation of Python typing semantics (Set, FrozenSet, Mapping, Sequence) across JSON Schema conversion by annotating schemas with x-python-type metadata during model loading, then propagating and applying these flags during code generation to restore accurate type hints in output models.
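For readers unfamiliar with the annotation step, here is a minimal sketch of the idea in plain Python. It is not the PR's `_add_python_type_info`; the function `annotate_python_types`, the `Example` dataclass, the hand-written schema, and the exact tag strings (`"set"`, `"frozenset"`, `"Mapping"`, `"Sequence"`) are assumptions loosely modeled on the sequence diagram. It reads each field's annotation with `typing.get_type_hints`, maps the runtime origin to a tag, and writes that tag into the matching schema property.

```python
import collections.abc
import dataclasses
import typing

# Hypothetical origin -> x-python-type tag table (tag spellings are assumptions).
_ORIGIN_TAGS = {
    set: "set",
    frozenset: "frozenset",
    collections.abc.Mapping: "Mapping",
    collections.abc.Sequence: "Sequence",
}


def annotate_python_types(cls: type, schema: dict) -> dict:
    """Add x-python-type to each property whose annotation is a preserved container."""
    hints = typing.get_type_hints(cls)
    for name, prop in schema.get("properties", {}).items():
        origin = typing.get_origin(hints.get(name))  # None for missing/plain hints
        tag = _ORIGIN_TAGS.get(origin)
        if tag is not None:
            prop["x-python-type"] = tag
    return schema


@dataclasses.dataclass
class Example:  # hypothetical input model
    tags: set[str]
    ids: frozenset[int]
    counts: collections.abc.Mapping[str, int]
    names: collections.abc.Sequence[str]
    items: list[str]  # plain list: no tag, default behavior applies


# Hand-written schema: at the JSON Schema level these are all just arrays/objects.
schema = {
    "type": "object",
    "properties": {
        "tags": {"type": "array", "items": {"type": "string"}, "uniqueItems": True},
        "ids": {"type": "array", "items": {"type": "integer"}, "uniqueItems": True},
        "counts": {"type": "object", "additionalProperties": {"type": "integer"}},
        "names": {"type": "array", "items": {"type": "string"}},
        "items": {"type": "array", "items": {"type": "string"}},
    },
}

annotate_python_types(Example, schema)
for name, prop in schema["properties"].items():
    print(name, prop.get("x-python-type"))
```

Note that `uniqueItems` alone cannot tell `set` from `frozenset`, which is one reason an explicit extension key is useful. This sketch also ignores wrappers such as `Optional[set[str]]`, whose origin is `Union`; a real implementation would need to unwrap those.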

Changes

  • Core type preservation logic (src/datamodel_code_generator/__main__.py): Adds utilities to serialize Python types into x-python-type strings and augments JSON Schema with preserved type metadata via the _serialize_python_type and _add_python_type_info helpers when loading from Pydantic/dataclass/TypedDict sources.
  • Schema parsing integration (src/datamodel_code_generator/parser/jsonschema.py): Introduces _get_python_type_flags to extract container-type overrides from x-python-type metadata in JSON Schema; propagates parent_obj context through parsing methods to apply dynamic is_dict/is_list/is_set/is_mapping/is_sequence flags.
  • Type system extensions (src/datamodel_code_generator/types.py): Adds three new boolean flags (is_frozen_set, is_mapping, is_sequence) to Config and DataType; updates import generation and type_hint composition to prioritize frozen sets, sequences, and mappings in output.
  • Test input models (tests/data/python/input_model/dataclass_models.py, tests/data/python/input_model/pydantic_models.py): Introduces new test dataclasses and Pydantic models (DataclassWithPythonTypes, Tag, ModelWithPythonTypes, RecursiveNode) with fields exercising Set, FrozenSet, Mapping, Sequence, and nested/optional variants.
  • Integration tests (tests/test_input_model.py): Adds 9 test functions verifying x-python-type preservation and correct type generation for TypedDict, dataclass, and recursive model targets.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant __main__
    participant JsonSchema
    participant jsonschema parser
    participant types
    participant Output

    User->>__main__: Load model (Pydantic/dataclass)
    __main__->>__main__: Generate JSON Schema
    __main__->>__main__: _serialize_python_type() for each property
    __main__->>JsonSchema: Annotate with x-python-type metadata<br/>(Set→"set", Mapping→"Mapping", etc.)
    JsonSchema-->>__main__: Augmented schema with x-python-type
    
    __main__->>jsonschema parser: Parse augmented schema
    jsonschema parser->>jsonschema parser: _get_python_type_flags(parent_obj)<br/>Extract x-python-type from extras
    jsonschema parser->>jsonschema parser: Apply is_set/is_mapping/is_sequence<br/>to data_type construction
    jsonschema parser-->>types: Property data_type<br/>with preserved flags
    
    types->>types: type_hint(): Prefer FrozenSet,<br/>Mapping, Sequence based on flags
    types-->>Output: Generate code with<br/>correct Python types

    Output-->>User: Model with preserved types<br/>(Set[str], Mapping[str, int], etc.)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

breaking-change-analyzed

Poem

🐰 A rabbit hops through JSON schemes,
Preserving types—fulfilling dreams!
Set, Mapping, Sequence—no more lost,
x-python-type: worth every cost! ✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Description check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title directly and clearly describes the main objective of the pull request: preserving Python types (Set, Mapping, Sequence) when using the --input-model feature.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, above the required threshold of 80.00%.


@github-actions
Contributor

github-actions Bot commented Dec 28, 2025

@codspeed-hq

codspeed-hq Bot commented Dec 28, 2025

CodSpeed Performance Report

Merging #2837 will not alter performance

Comparing feature/preserve-python-types-in-input-model (d01a0b1) with main (fca31fe)

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

Summary

✅ 11 untouched
⏩ 98 skipped¹

Footnotes

  1. 98 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.

@codecov

codecov Bot commented Dec 28, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.51%. Comparing base (e4394af) to head (d01a0b1).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2837      +/-   ##
==========================================
- Coverage   99.52%   99.51%   -0.01%     
==========================================
  Files          90       90              
  Lines       14092    14245     +153     
  Branches     1674     1697      +23     
==========================================
+ Hits        14025    14176     +151     
- Misses         36       37       +1     
- Partials       31       32       +1     
Flag Coverage Δ
unittests 99.51% <100.00%> (-0.01%) ⬇️

Flags with carried-forward coverage won't be shown.


@koxudaxi koxudaxi force-pushed the feature/preserve-python-types-in-input-model branch from bc520e7 to 632e736 on December 28, 2025 14:45

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (3)
src/datamodel_code_generator/types.py (1)

336-347: New container flags are wired correctly, but base_type_hint ignores them

The additions of is_frozen_set, is_mapping, and is_sequence are cleanly integrated into imports and type_hint, so emitted hints and imports for FrozenSet, Mapping, and Sequence should be correct.

One behavioral gap: base_type_hint still only considers is_list, is_set, and is_dict. For types marked with the new flags, base_type_hint will effectively return just the inner type (e.g. int instead of Mapping[str, int] or str instead of Sequence[str]). If base_type_hint is ever used downstream for these preserved containers (beyond the current regex/RootModel use cases), this could be surprising.

Consider mirroring the type_hint container wrapping in base_type_hint for is_frozen_set, is_mapping, and is_sequence to keep the two accessors consistent, or explicitly documenting that base_type_hint may drop these container wrappers.

Also applies to: 489-533, 579-681
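One way to address this nitpick is to route both accessors through a single container-wrapping helper. The sketch below uses a hypothetical, heavily simplified `ToyDataType`, not the real `DataType`; the `inner`/`key` fields, the `_wrap_container` helper, and the assumption that the only difference between the accessors is an `Optional[...]` wrapper are all illustrative inventions.

```python
from dataclasses import dataclass


@dataclass
class ToyDataType:
    """Hypothetical, heavily simplified stand-in for DataType (not the real class)."""

    inner: str  # already-rendered inner type, e.g. "str" or "int"
    key: str = "str"  # key type for dict/Mapping containers
    is_optional: bool = False  # assumption: only type_hint adds Optional[...]
    is_list: bool = False
    is_set: bool = False
    is_frozen_set: bool = False
    is_sequence: bool = False
    is_dict: bool = False
    is_mapping: bool = False

    def _wrap_container(self, hint: str) -> str:
        # Single place that applies the container wrapper, so both accessors agree.
        # The new, more specific flags are checked before the older generic ones.
        if self.is_frozen_set:
            return f"frozenset[{hint}]"
        if self.is_set:
            return f"set[{hint}]"
        if self.is_sequence:
            return f"Sequence[{hint}]"
        if self.is_list:
            return f"list[{hint}]"
        if self.is_mapping:
            return f"Mapping[{self.key}, {hint}]"
        if self.is_dict:
            return f"dict[{self.key}, {hint}]"
        return hint

    def base_type_hint(self) -> str:
        return self._wrap_container(self.inner)

    def type_hint(self) -> str:
        hint = self._wrap_container(self.inner)
        return f"Optional[{hint}]" if self.is_optional else hint


dt = ToyDataType(inner="int", is_mapping=True)
print(dt.type_hint(), dt.base_type_hint())
```

With a shared helper, adding a future container flag in one place keeps `type_hint` and `base_type_hint` from drifting apart.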

src/datamodel_code_generator/parser/jsonschema.py (1)

1226-1234: Clean up unused noqa directives flagged by Ruff

Ruff reports unused noqa directives here:

  • Line 1226: # noqa: PLR6301 on _get_python_type_flags
  • Line 2458: # noqa: PLR0912 on parse_property_names
  • Line 2462: # noqa: FBT001 on additional_properties parameter

Given these rules are not enabled in this project’s configuration, the noqa comments just generate RUF100 noise. Removing them keeps lint output clean without changing behavior.

Also applies to: 2458-2464

src/datamodel_code_generator/__main__.py (1)

603-608: Remove unused noqa pragmas on local imports and global mutation

Ruff’s RUF100 warnings about unused noqa directives apply here:

  • # noqa: PLC0415 on the local imports inside _init_preserved_type_origins, _serialize_python_type, _find_models_in_type, and _get_type_hints_safe.
  • # noqa: PLW0603 on the global _PRESERVED_TYPE_ORIGINS line.

Since those rules are not enabled, the noqa annotations are unnecessary and now themselves cause lint noise. Dropping these noqa comments will keep the linter quiet without affecting runtime behavior.

Also applies to: 624-624, 635-635, 690-690, 702-702

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e4394af and 632e736.

📒 Files selected for processing (6)
  • src/datamodel_code_generator/__main__.py
  • src/datamodel_code_generator/parser/jsonschema.py
  • src/datamodel_code_generator/types.py
  • tests/data/python/input_model/dataclass_models.py
  • tests/data/python/input_model/pydantic_models.py
  • tests/test_input_model.py
🧰 Additional context used
🧬 Code graph analysis (2)
src/datamodel_code_generator/parser/jsonschema.py (1)
src/datamodel_code_generator/types.py (1)
  • DataType (296-805)
src/datamodel_code_generator/types.py (7)
src/datamodel_code_generator/model/typed_dict.py (1)
  • imports (164-170)
src/datamodel_code_generator/model/pydantic/base_model.py (2)
  • imports (268-276)
  • field (92-106)
src/datamodel_code_generator/model/base.py (3)
  • imports (324-349)
  • imports (810-815)
  • field (403-405)
src/datamodel_code_generator/model/pydantic_v2/types.py (1)
  • imports (62-67)
src/datamodel_code_generator/model/dataclass.py (2)
  • imports (139-144)
  • field (147-152)
src/datamodel_code_generator/model/enum.py (1)
  • imports (119-121)
src/datamodel_code_generator/model/type_alias.py (1)
  • imports (28-34)
🪛 Ruff (0.14.10)
src/datamodel_code_generator/parser/jsonschema.py

  • 1226: Unused noqa directive (non-enabled: PLR6301). Remove unused noqa directive (RUF100).
  • 2458: Unused noqa directive (non-enabled: PLR0912). Remove unused noqa directive (RUF100).
  • 2462: Unused noqa directive (non-enabled: FBT001). Remove unused noqa directive (RUF100).

src/datamodel_code_generator/__main__.py

  • 603, 604, 605, 606, 607, 608: Unused noqa directive (non-enabled: PLC0415). Remove unused noqa directive (RUF100).
  • 624: Unused noqa directive (non-enabled: PLW0603). Remove unused noqa directive (RUF100).
  • 635, 690, 702: Unused noqa directive (non-enabled: PLC0415). Remove unused noqa directive (RUF100).

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: py312-black24 on Ubuntu
  • GitHub Check: 3.10 on Windows
  • GitHub Check: py312-isort6 on Ubuntu
  • GitHub Check: 3.11 on Windows
  • GitHub Check: 3.12 on Windows
  • GitHub Check: 3.13 on Windows
  • GitHub Check: 3.14 on Windows
  • GitHub Check: Analyze (python)
  • GitHub Check: benchmarks
🔇 Additional comments (5)
tests/data/python/input_model/dataclass_models.py (1)

3-25: Dataclass fixture cleanly exercises the preserved Python container types

The new DataclassWithPythonTypes (and associated imports) correctly covers Set, FrozenSet, Mapping, and Sequence using standard-library types and __future__ annotations. This is a good, minimal fixture for the new --input-model behavior and aligns with the Pydantic model counterparts.

src/datamodel_code_generator/parser/jsonschema.py (1)

1226-1253: x-python-type flag handling is well-scoped and keeps existing behavior as a fallback

The new _get_python_type_flags helper and its use in:

  • parse_property_names (via parent_obj),
  • the additionalProperties object branch in parse_item, and
  • parse_array_fields for arrays

cleanly thread x-python-type through to DataType flags (is_set, is_frozen_set, is_mapping, is_sequence) without disturbing the previous defaults:

  • When x-python-type is absent or not a string, you still fall back to {"is_dict": True} or {"is_list": True}.
  • For tuples (is_tuple=True), you intentionally skip container overrides, preserving the existing tuple handling.
  • Using parent_obj for propertyNames is sensible, since the extension describes the container rather than the key-schema itself.

This looks consistent with the DataType changes and should be backwards compatible for schemas without x-python-type.

Also applies to: 2458-2539, 2621-2630, 2705-2715, 2839-2841
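The fallback behavior described here can be illustrated with a small sketch. `python_type_flags` and its tag tables are hypothetical and do not reproduce the real `_get_python_type_flags` signature; tuple handling is omitted. The point is the shape of the logic: honor a recognized string tag, otherwise return the pre-existing `{"is_list": True}` or `{"is_dict": True}` default.

```python
from typing import Any

# Hypothetical tag -> flag tables; the tag spellings are assumptions.
_ARRAY_TAGS: dict[str, dict[str, bool]] = {
    "set": {"is_set": True},
    "frozenset": {"is_frozen_set": True},
    "Sequence": {"is_sequence": True},
}
_OBJECT_TAGS: dict[str, dict[str, bool]] = {
    "Mapping": {"is_mapping": True},
}


def python_type_flags(obj: dict[str, Any], *, is_array: bool) -> dict[str, bool]:
    """Return DataType-style container flags for an array or object schema.

    Falls back to the pre-existing defaults when x-python-type is missing,
    not a string, or not recognized.
    """
    default = {"is_list": True} if is_array else {"is_dict": True}
    tag = obj.get("x-python-type")
    if not isinstance(tag, str):
        return default
    table = _ARRAY_TAGS if is_array else _OBJECT_TAGS
    # Copy so callers cannot mutate the shared tables.
    return dict(table.get(tag, default))


# For propertyNames, the tag sits on the enclosing object (parent_obj),
# not on the key schema, so the lookup must be done on the parent.
parent_obj = {
    "type": "object",
    "x-python-type": "Mapping",
    "propertyNames": {"type": "string"},
    "additionalProperties": {"type": "integer"},
}
print(python_type_flags(parent_obj, is_array=False))
print(python_type_flags(parent_obj["propertyNames"], is_array=False))
```

The second call shows why `parent_obj` matters: the key schema carries no tag, so looking there would silently fall back to a plain dict.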

src/datamodel_code_generator/__main__.py (1)

597-760: Python-type preservation pipeline for --input-model is well factored

The new helpers:

  • _init_preserved_type_origins / _get_preserved_type_origins
  • _serialize_python_type and _simple_type_name
  • _collect_nested_models / _find_models_in_type
  • _get_type_hints_safe
  • _add_python_type_to_properties, _add_python_type_info, and _add_python_type_info_generic

are cohesive and safely scoped:

  • You only touch schemas produced by BaseModel.model_json_schema() and TypeAdapter(obj).json_schema(), so flows that do not use --input-model are unaffected.
  • Container origins are mapped for both builtins (set, frozenset) and ABCs (Mapping, Sequence, Mutable* variants), which matches the imports used in your test fixtures.
  • Nested models in $defs are handled via model_fields and recursive discovery, ensuring that things like Tag.values: FrozenSet[str] get x-python-type even when defined as a referenced definition.
  • The dataclass/TypedDict path via _add_python_type_info_generic is a reasonable, simpler approximation that still captures per-field containers via get_type_hints.

Given the parser changes, this should provide the end-to-end preservation you’re testing (Set/FrozenSet/Mapping/Sequence in both top-level and nested Pydantic models, plus dataclasses/TypedDict).

Also applies to: 762-867
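The nested-model discovery step (the role described for _collect_nested_models / _find_models_in_type) can be sketched as a recursive walk over type hints with a `seen` set to stop on self-references. This sketch uses dataclasses rather than Pydantic's `model_fields`; `collect_nested_dataclasses`, `Tag`, and `Node` are hypothetical stand-ins, not the PR's code or fixtures.

```python
import dataclasses
import typing
from typing import Optional


def collect_nested_dataclasses(root: type) -> list[type]:
    """Return root plus every dataclass reachable through its field annotations."""
    seen: set[type] = set()  # guards against infinite recursion on self-references
    order: list[type] = []

    def visit_type(tp: object) -> None:
        if isinstance(tp, type) and dataclasses.is_dataclass(tp):
            visit_model(tp)
            return
        # Descend into generic arguments: Optional[list[Node]] -> list[Node] -> Node.
        for arg in typing.get_args(tp):
            visit_type(arg)

    def visit_model(cls: type) -> None:
        if cls in seen:
            return
        seen.add(cls)
        order.append(cls)
        for hint in typing.get_type_hints(cls).values():
            visit_type(hint)

    visit_model(root)
    return order


@dataclasses.dataclass
class Tag:  # hypothetical nested model
    values: frozenset[str]


@dataclasses.dataclass
class Node:  # hypothetical recursive model
    name: str
    tags: set[str]
    label: Optional[Tag] = None
    children: "Optional[list[Node]]" = None  # string annotation resolved by get_type_hints


print([cls.__name__ for cls in collect_nested_dataclasses(Node)])
```

Each discovered class could then be passed to the per-model annotation step, which is how a referenced definition such as `Tag` would also receive its x-python-type tags.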

tests/data/python/input_model/pydantic_models.py (1)

3-41: Pydantic fixtures comprehensively cover preserved type scenarios

The new Tag, ModelWithPythonTypes, and RecursiveNode models are well chosen:

  • They mirror the dataclass fixtures and exercise Set, FrozenSet, Mapping, Sequence, nested Mapping[str, Set[int]], and recursive Set usage.
  • from __future__ import annotations makes the recursive children: Optional[list[RecursiveNode]] safe for Pydantic v2.

These look like solid inputs for the new --input-model tests.

tests/test_input_model.py (1)

464-558: New --input-model tests accurately validate preserved container types

The added “x-python-type preservation” tests:

  • Exercise Set/FrozenSet/Mapping/Sequence on Pydantic models (ModelWithPythonTypes) and confirm they survive into generated code.
  • Verify that preservation holds when targeting typing.TypedDict and dataclasses.dataclass.
  • Cover dataclass input (DataclassWithPythonTypes) and a recursive model (RecursiveNode), which is important for the nested-model collection logic.

The use of SKIP_PYDANTIC_V1 and the shared run_input_model_and_assert helper keeps these tests consistent with the rest of the suite. The expectations ("set[str]", "frozenset[int]", "Mapping[str, int]", "Sequence[str]") match the intended output under the new pipeline.

@koxudaxi koxudaxi force-pushed the feature/preserve-python-types-in-input-model branch from 632e736 to d01a0b1 on December 28, 2025 15:00
@koxudaxi koxudaxi merged commit 2cc9cee into main Dec 28, 2025
36 of 37 checks passed
@koxudaxi koxudaxi deleted the feature/preserve-python-types-in-input-model branch December 28, 2025 15:04
@github-actions
Contributor

Breaking Change Analysis

Result: Breaking changes detected

Reasoning: This PR changes the generated code output when using the --input-model feature with Pydantic models or dataclasses that contain Set, FrozenSet, Mapping, or Sequence types. Previously, these types were converted to list/dict in the generated output because JSON Schema doesn't distinguish them. Now, the tool preserves the original Python types via an x-python-type extension. While this is semantically correct and an improvement, users who have code depending on the exact previous output (e.g., expecting list[str] instead of set[str]) may experience breakage. The change affects both code generation output and adds new type flags to the DataType model that custom templates may need to handle.

Content for Release Notes

Code Generation Changes

  • Different output when using --input-model with Set, FrozenSet, Mapping, or Sequence types - When using --input-model to convert Pydantic models or dataclasses, types that were previously converted to list or dict are now preserved as their original Python types. For example, a field typed as Set[str] now generates set[str] instead of list[str], FrozenSet[T] generates frozenset[T], Mapping[K, V] generates Mapping[K, V] instead of dict[K, V], and Sequence[T] generates Sequence[T] instead of list[T]. This may cause type checking differences or runtime behavior changes if your code depended on the previous output types. (Preserve Python types (Set, Mapping, Sequence) in --input-model #2837)
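As a hypothetical illustration of the kind of downstream code this can break: the `GeneratedBefore`/`GeneratedAfter` classes below are invented stand-ins, not actual generator output, and plain dataclasses are used so no validation or coercion happens (values are passed in explicitly). Code that indexes the old `list[str]` field, or checks for `dict` instead of the `Mapping` ABC, behaves differently once the preserved types are used.

```python
from collections.abc import Mapping
from dataclasses import dataclass
from types import MappingProxyType


@dataclass
class GeneratedBefore:  # hypothetical output before this change
    tags: list[str]
    counts: dict[str, int]


@dataclass
class GeneratedAfter:  # hypothetical output with preserved types
    tags: set[str]
    counts: Mapping[str, int]


def first_tag(model) -> str:
    # Caller code written against the old list[str] field.
    return model.tags[0]


def is_plain_dict(model) -> bool:
    # Caller code that checked for dict rather than the Mapping ABC.
    return isinstance(model.counts, dict)


before = GeneratedBefore(tags=["a", "b"], counts={"a": 1})
# A read-only mapping is a valid Mapping[str, int] but not a dict.
after = GeneratedAfter(tags={"a", "b"}, counts=MappingProxyType({"a": 1}))

print(first_tag(before), is_plain_dict(before))
try:
    first_tag(after)
except TypeError as exc:  # sets are unordered and not subscriptable
    print("TypeError:", exc)
print(is_plain_dict(after), isinstance(after.counts, Mapping))
```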

Custom Template Update Required

  • New DataType flags available for custom templates - Three new boolean flags have been added to the DataType class: is_frozen_set, is_mapping, and is_sequence. Custom Jinja2 templates that inspect DataType flags may need to be updated to handle these new type variations if they contain logic that depends on exhaustive type flag checks. (Preserve Python types (Set, Mapping, Sequence) in --input-model #2837)

This analysis was performed by Claude Code Action

@github-actions
Contributor

github-actions Bot commented Jan 1, 2026

🎉 Released in 0.51.0

This PR is now available in the latest release. See the release notes for details.
