Fix --use-unique-items-as-set to output set literals for default values by koxudaxi · Pull Request #2672 · koxudaxi/datamodel-code-generator

koxudaxi · 2025-12-16T04:45:50Z

Summary

Fix --use-unique-items-as-set option to convert list default values to set literals
Previously: tags: Optional[Set[str]] = ['tag1', 'tag2'] (type mismatch)
Now: tags: Optional[Set[str]] = {'tag1', 'tag2'} (correct set literal)

Summary by CodeRabbit

New Features
- Set defaults now render deterministically (sorted representation) for consistent, reproducible output
- When converting arrays with unique-item constraints to sets, non-hashable elements are detected and left unchanged to avoid errors
Tests
- Added end-to-end tests covering set defaults with unique-item constraints across all supported output model types


coderabbitai · 2025-12-16T04:45:56Z

Walkthrough

Adds a deterministic set repr helper and applies it when generating set defaults across dataclass, msgspec, and pydantic outputs; parser gains a hashability check before converting list defaults to sets; new OpenAPI fixture and parametrized tests validate uniqueItems-as-set behavior.

Changes

Cohort / File(s)	Summary
Set utility & base model `src/datamodel_code_generator/model/base.py`	Adds public `repr_set_sorted(value: set[Any]) -> str`; updates `DataModelFieldBase.represented_default` to use it for set defaults.
Model field generators `src/datamodel_code_generator/model/dataclass.py`, `src/datamodel_code_generator/model/msgspec.py`, `src/datamodel_code_generator/model/pydantic/base_model.py`	Use `repr_set_sorted` (via local import) when producing default factories/representations for set defaults; adjust default_factory handling for empty vs non-empty collections.
Parser set conversion `src/datamodel_code_generator/parser/base.py`	Adds a hashability check before converting list-typed defaults to sets when `uniqueItems` is enabled; skips conversion if any element is unhashable.
OpenAPI test fixture `tests/data/openapi/unique_items_default_set.yaml`	New schema defining `TestModel` with three array properties (`tags`, `empty_tags`, `numbers`) using `uniqueItems: true` and defaults.
Expected outputs `tests/data/expected/main/openapi/unique_items_default_set_dataclass.py`, `..._msgspec.py`, `..._pydantic.py`, `..._pydantic_v2.py`	New expected files showing `TestModel` with `Optional[Set[...]]` fields and default_factory-based deterministic set defaults.
Parametrized test `tests/main/openapi/test_main_openapi.py`	Adds `test_main_unique_items_default_set` parameterized across pydantic, pydantic_v2, dataclass, and msgspec output types validating `--use-unique-items-as-set`.

Sequence Diagram

sequenceDiagram
    participant Schema as OpenAPI Schema
    participant Parser as Parser (base.py)
    participant Validator as Hashability Check
    participant Gen as Model Generator (dataclass/msgspec/pydantic)
    participant Repr as repr_set_sorted()
    participant Output as Generated Code

    Schema->>Parser: Parse field with uniqueItems=true and default list
    Parser->>Validator: Attempt to convert list -> set (check element hashability)
    alt Elements hashable
        Validator->>Parser: OK — convert default to set, update field type
        Parser->>Gen: Provide field with set default
        Gen->>Repr: Request deterministic set repr
        Repr->>Gen: Return sorted representation
        Gen->>Output: Emit field using default_factory with sorted repr
    else Elements not hashable
        Validator->>Parser: Not hashable — skip conversion
        Parser->>Gen: Provide field with original list default
        Gen->>Output: Emit field using list default handling
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Inspect repr_set_sorted for correctness with heterogeneous element types and empty-set handling.
Verify local imports and consistent usage across dataclass, msgspec, and pydantic generators.
Review parser try/except to ensure it only suppresses hashability issues and preserves original defaults when appropriate.
Check new tests and expected outputs for exact formatting/whitespace consistency.

Poem

🐇 I sorted my hops and sorted my sets,

No more surprises, no tangled nets.
From parser check to generated line,
Deterministic berries, oh how fine! ✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically describes the main change: fixing the --use-unique-items-as-set option to output set literals instead of list literals for default values.
Docstring Coverage	✅ Passed	Docstring coverage is 87.50% which is sufficient. The required threshold is 80.00%.


📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1e7ae24 and 14240f3.

📒 Files selected for processing (2)

src/datamodel_code_generator/model/dataclass.py (1 hunks)
tests/data/expected/main/openapi/unique_items_default_set_dataclass.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

tests/data/expected/main/openapi/unique_items_default_set_dataclass.py

🧰 Additional context used

🧬 Code graph analysis (1)

src/datamodel_code_generator/model/dataclass.py (1)

src/datamodel_code_generator/model/base.py (1)

repr_set_sorted (52-62)

🪛 GitHub Check: CodeQL

src/datamodel_code_generator/model/dataclass.py

[notice] 155-155: Cyclic import
Import of module datamodel_code_generator.model.base begins an import cycle.

🪛 Ruff (0.14.8)

src/datamodel_code_generator/model/dataclass.py

155-155: Unused noqa directive (non-enabled: PLC0415)

Remove unused noqa directive

(RUF100)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)

GitHub Check: py312-black22 on Ubuntu
GitHub Check: 3.9 on macOS
GitHub Check: 3.9 on Windows
GitHub Check: 3.11 on Windows
GitHub Check: 3.10 on Windows
GitHub Check: benchmarks
GitHub Check: 3.13 on Windows
GitHub Check: 3.13 on Ubuntu
GitHub Check: 3.14 on Windows
GitHub Check: Analyze (python)


            default_value = data.pop("default")
            if default_value:
-                data["default_factory"] = f"lambda: {default_value!r}"
+                from datamodel_code_generator.model.base import repr_set_sorted  # noqa: PLC0415


To fix the cyclic import, break the dependency between model.msgspec and model.base on repr_set_sorted. The least disruptive way is to move repr_set_sorted out of model.base into a new utility module (e.g., datamodel_code_generator.utils.repr_set or similar). Both model.base and model.msgspec would then import repr_set_sorted from that module. Because only the code from model.msgspec was provided, the suggested change is limited to adjusting the import in place: update the local import inside the __str__ method so that it imports from the new location.

Required steps:

Move the function repr_set_sorted out of datamodel_code_generator.model.base into a new module, e.g., datamodel_code_generator.utils.repr_set.py. (Not possible here; not enough context—so simulate as if that has been done.)

In src/datamodel_code_generator/model/msgspec.py, update the dynamic import in __str__ to use the new import path:
from datamodel_code_generator.utils.repr_set import repr_set_sorted

Make this change only inside the __str__ method where the dynamic import previously occurred.

codecov · 2025-12-16T04:49:52Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.61%. Comparing base (b458594) to head (14240f3).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #2672   +/-   ##
=======================================
  Coverage   99.61%   99.61%           
=======================================
  Files          76       76           
  Lines       10648    10666   +18     
  Branches     1300     1303    +3     
=======================================
+ Hits        10607    10625   +18     
  Misses         21       21           
  Partials       20       20

Flag	Coverage Δ
unittests	`99.61% <100.00%> (+<0.01%)`	⬆️


codspeed-hq · 2025-12-16T04:51:01Z

CodSpeed Performance Report

Merging #2672 will not alter performance

Comparing fix/unique-items-set-default-literal (14240f3) with main (b458594)

Summary

✅ 50 untouched
⏩ 3 skipped¹

¹ 3 benchmarks were skipped, so the baseline results were used instead.

coderabbitai

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b458594 and 1e7ae24.

📒 Files selected for processing (11)

src/datamodel_code_generator/model/base.py (2 hunks)
src/datamodel_code_generator/model/dataclass.py (1 hunks)
src/datamodel_code_generator/model/msgspec.py (1 hunks)
src/datamodel_code_generator/model/pydantic/base_model.py (1 hunks)
src/datamodel_code_generator/parser/base.py (1 hunks)
tests/data/expected/main/openapi/unique_items_default_set_dataclass.py (1 hunks)
tests/data/expected/main/openapi/unique_items_default_set_msgspec.py (1 hunks)
tests/data/expected/main/openapi/unique_items_default_set_pydantic.py (1 hunks)
tests/data/expected/main/openapi/unique_items_default_set_pydantic_v2.py (1 hunks)
tests/data/openapi/unique_items_default_set.yaml (1 hunks)
tests/main/openapi/test_main_openapi.py (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (4)

src/datamodel_code_generator/model/msgspec.py (1)

src/datamodel_code_generator/model/base.py (1)

repr_set_sorted (52-62)

tests/data/expected/main/openapi/unique_items_default_set_msgspec.py (1)

src/datamodel_code_generator/model/msgspec.py (1)

field (243-248)

src/datamodel_code_generator/model/pydantic/base_model.py (1)

src/datamodel_code_generator/model/base.py (1)

repr_set_sorted (52-62)

src/datamodel_code_generator/model/dataclass.py (1)

src/datamodel_code_generator/model/base.py (1)

repr_set_sorted (52-62)

🪛 Checkov (3.2.334)

tests/data/openapi/unique_items_default_set.yaml

[high] 1-35: Ensure that the global security field has rules defined

(CKV_OPENAPI_4)

[medium] 12-20: Ensure that arrays have a maximum number of items

(CKV_OPENAPI_21)

🪛 Ruff (0.14.8)

src/datamodel_code_generator/model/msgspec.py

280-280: Unused noqa directive (non-enabled: PLC0415)

Remove unused noqa directive

(RUF100)

src/datamodel_code_generator/model/pydantic/base_model.py

224-224: Unused noqa directive (non-enabled: PLC0415)

Remove unused noqa directive

(RUF100)

src/datamodel_code_generator/model/dataclass.py

154-154: Unused noqa directive (non-enabled: PLC0415)

Remove unused noqa directive

(RUF100)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)

GitHub Check: 3.10 on macOS
GitHub Check: 3.11 on macOS
GitHub Check: 3.10 on Windows
GitHub Check: 3.12 on Windows
GitHub Check: 3.9 on Windows
GitHub Check: 3.13 on Windows
GitHub Check: 3.12 on macOS
GitHub Check: 3.11 on Windows
GitHub Check: 3.13 on macOS
GitHub Check: Analyze (python)
GitHub Check: benchmarks

🔇 Additional comments (10)

tests/data/openapi/unique_items_default_set.yaml (1)

1-33: Well-structured test fixture for unique items with set defaults.

The OpenAPI spec correctly covers the key test scenarios: non-empty string set, empty set, and integer set with uniqueItems enabled. The static analysis warnings about security rules and maxItems are false positives—this is a minimal test fixture, not a production API specification.

tests/data/expected/main/openapi/unique_items_default_set_pydantic_v2.py (1)

12-15: Expected output correctly demonstrates the fix.

The generated model now properly uses set literals ({'tag1', 'tag2'}, {1, 2, 3}) instead of list literals, and set() for the empty set. This matches the PR objective of fixing the type mismatch between Set[...] annotations and list default values.

tests/data/expected/main/openapi/unique_items_default_set_pydantic.py (1)

12-15: Expected Pydantic v1 output is correct.

The generated model properly uses set literals within Field() calls. The unique_items=True constraint is correctly preserved.

src/datamodel_code_generator/model/pydantic/base_model.py (1)

224-227: Logic for set default representation is correct.

The conditional handling properly uses repr_set_sorted for set defaults to ensure deterministic output, while falling back to standard repr for other types. This aligns with the pattern used in src/datamodel_code_generator/model/base.py.

Regarding the Ruff hint about unused noqa directive: the PLC0415 rule (import-outside-top-level) may be disabled in the project's Ruff configuration, but the noqa comment maintains consistency with other local imports in this file (e.g., line 375). Consider removing if it causes linter noise, but it's not blocking.

tests/data/expected/main/openapi/unique_items_default_set_msgspec.py (1)

12-15: Msgspec output correctly uses default_factory pattern.

The generated code properly uses field(default_factory=...) for mutable set defaults—this is the correct approach for msgspec Structs. Using set directly (line 14) for the empty set factory is idiomatic, and lambdas with set literals are appropriate for non-empty defaults.

src/datamodel_code_generator/model/base.py (2)

51-62: Well-designed helper for deterministic set representation.

The repr_set_sorted function elegantly handles the key challenge of producing consistent output across Python runs:

The (type(x).__name__, repr(x)) sort key safely handles heterogeneous collections and types without __lt__ defined.

Empty set correctly returns "set()" (since {} would be an empty dict literal).
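A minimal sketch of what such a helper could look like, reconstructed only from the description above (the sort key and the empty-set case). The body is an assumption for illustration, not the code merged in this PR.

```python
from __future__ import annotations

from typing import Any


def repr_set_sorted(value: set[Any]) -> str:
    """Return a deterministic source-code representation of a set (illustrative sketch)."""
    if not value:
        # "{}" would be an empty dict literal, so the empty set is spelled out.
        return "set()"
    # Sorting on (type name, repr) avoids comparing elements of different types directly.
    ordered = sorted(value, key=lambda x: (type(x).__name__, repr(x)))
    return "{" + ", ".join(repr(x) for x in ordered) + "}"


print(repr_set_sorted({"tag2", "tag1"}))
print(repr_set_sorted({3, 1, 2}))
print(repr_set_sorted(set()))
```

One consequence of sorting on repr() in this sketch is that the order is lexicographic rather than numeric, so {2, 10} would render as {10, 2}. The output is still stable across runs, which is the property the review cares about.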

296-300: Clean integration with existing property.

The represented_default property now correctly delegates to repr_set_sorted for set values while preserving the existing repr() behavior for all other types.

src/datamodel_code_generator/parser/base.py (1)

1383-1402: Safe default conversion for --use-unique-items-as-set

The new hashability check before converting list defaults to sets is correct: it fixes the type/default mismatch and gracefully skips conversion when elements are unhashable, keeping type and default consistent. This is a good, low-risk guardrail around the option’s behavior.
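The guard described here can be sketched roughly as follows. The function name list_default_to_set is hypothetical and the try/except shape is an assumption based on the review notes; the actual parser code in parser/base.py is not shown in this thread.

```python
from __future__ import annotations

from typing import Any


def list_default_to_set(default: list[Any]) -> Any:
    """Hypothetical helper: convert a list default to a set only if every element is hashable."""
    try:
        return set(default)
    except TypeError:
        # An unhashable element (for example a dict or a nested list) was found,
        # so the original list default is kept unchanged.
        return default


print(list_default_to_set(["tag1", "tag2", "tag1"]))
print(list_default_to_set([{"name": "a"}]))
```

Catching only TypeError keeps the guard narrow: other failures would still surface instead of being silently swallowed.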

tests/data/expected/main/openapi/unique_items_default_set_dataclass.py (1)

1-15: Expected dataclass output matches new set default semantics

The generated TestModel correctly uses Optional[Set[...]] with field(default_factory=lambda: {...}) (and set() for empty), which aligns with the new deterministic set representation and avoids mutable defaults.
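For readers without the expected file at hand, the pattern being described looks roughly like the sketch below. It is an illustration built from the field names and defaults mentioned in this review, not a copy of the expected output; the element type of empty_tags and the use of default_factory=set for the empty case are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class TestModel:
    # Non-empty defaults go through a lambda so each instance gets its own set.
    tags: Optional[Set[str]] = field(default_factory=lambda: {'tag1', 'tag2'})
    # Assumed form for the empty default; the element type is also an assumption.
    empty_tags: Optional[Set[str]] = field(default_factory=set)
    numbers: Optional[Set[int]] = field(default_factory=lambda: {1, 2, 3})


first = TestModel()
second = TestModel()
first.tags.add('tag3')
print(second.tags == {'tag1', 'tag2'})
```

Because the factory runs once per instance, mutating one instance's set does not leak into another, which is the mutable-default problem the default_factory pattern avoids.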

tests/main/openapi/test_main_openapi.py (1)

3746-3764: Good cross-backend coverage for --use-unique-items-as-set

The new parametrized test cleanly validates that all four backends (Pydantic v1/v2, dataclasses, msgspec) render unique-items defaults as set literals when --use-unique-items-as-set is enabled. This directly guards the behavior the PR is fixing.

Add support for unique items as set literals in generated models

1e7ae24

koxudaxi marked this pull request as ready for review December 16, 2025 04:46

github-advanced-security AI found potential problems Dec 16, 2025

coderabbitai Bot reviewed Dec 16, 2025

Comment thread src/datamodel_code_generator/model/dataclass.py

Comment thread src/datamodel_code_generator/model/msgspec.py

Add support for unique items as set literals in generated models

14240f3

github-advanced-security AI found potential problems Dec 16, 2025

Comment thread src/datamodel_code_generator/model/dataclass.py Dismissed

koxudaxi merged commit b72471d into main Dec 16, 2025
40 checks passed

koxudaxi deleted the fix/unique-items-set-default-literal branch December 16, 2025 06:23

coderabbitai Bot mentioned this pull request Jan 4, 2026

Add __hash__ to Pydantic v2 models used in sets #2918

Merged

@@ -277,7 +277,7 @@
                     if "default" in data and isinstance(data["default"], (list, dict, set)) and "default_factory" not in data:
                         default_value = data.pop("default")
                         if default_value:
-                            from datamodel_code_generator.model.base import repr_set_sorted  # noqa: PLC0415
+                            from datamodel_code_generator.utils.repr_set import repr_set_sorted  # noqa: PLC0415
                             default_repr = repr_set_sorted(default_value) if isinstance(default_value, set) else repr(default_value)
                             data["default_factory"] = f"lambda: {default_repr}"

koxudaxi merged 2 commits into main from fix/unique-items-set-default-literal
