
Fix --use-unique-items-as-set to output set literals for default values #2672

Merged
koxudaxi merged 2 commits into main from fix/unique-items-set-default-literal
Dec 16, 2025

Conversation

@koxudaxi
Owner

@koxudaxi koxudaxi commented Dec 16, 2025

Summary

  • Fix --use-unique-items-as-set option to convert list default values to set literals
  • Previously: tags: Optional[Set[str]] = ['tag1', 'tag2'] (type mismatch)
  • Now: tags: Optional[Set[str]] = {'tag1', 'tag2'} (correct set literal)
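
The mismatch can be seen by evaluating the two default expressions. This is a minimal illustration using only the standard library; the variable names are hypothetical and the snippet is not part of the generator.

```python
import ast

# Default expressions as they would appear in generated code for a field
# annotated Optional[Set[str]] (strings taken from the summary above).
before_fix = "['tag1', 'tag2']"
after_fix = "{'tag1', 'tag2'}"

# literal_eval safely evaluates each literal so its runtime type can be checked.
before_value = ast.literal_eval(before_fix)
after_value = ast.literal_eval(after_fix)

print(type(before_value).__name__)  # list: does not match the Set[str] annotation
print(type(after_value).__name__)   # set: matches the annotation
```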

Summary by CodeRabbit

  • New Features

    • Set defaults now render deterministically (sorted representation) for consistent, reproducible output
    • When converting arrays with unique-item constraints to sets, non-hashable elements are detected and left unchanged to avoid errors
  • Tests

    • Added end-to-end tests covering set defaults with unique-item constraints across all supported output model types


@coderabbitai

coderabbitai Bot commented Dec 16, 2025

Walkthrough

Adds a deterministic set repr helper and applies it when generating set defaults across dataclass, msgspec, and pydantic outputs; parser gains a hashability check before converting list defaults to sets; new OpenAPI fixture and parametrized tests validate uniqueItems-as-set behavior.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Set utility & base model (src/datamodel_code_generator/model/base.py) | Adds public repr_set_sorted(value: set[Any]) -> str; updates DataModelFieldBase.represented_default to use it for set defaults. |
| Model field generators (src/datamodel_code_generator/model/dataclass.py, src/datamodel_code_generator/model/msgspec.py, src/datamodel_code_generator/model/pydantic/base_model.py) | Use repr_set_sorted (via local import) when producing default factories/representations for set defaults; adjust default_factory handling for empty vs non-empty collections. |
| Parser set conversion (src/datamodel_code_generator/parser/base.py) | Adds a hashability check before converting list-typed defaults to sets when uniqueItems is enabled; skips conversion if any element is unhashable. |
| OpenAPI test fixture (tests/data/openapi/unique_items_default_set.yaml) | New schema defining TestModel with three array properties (tags, empty_tags, numbers) using uniqueItems: true and defaults. |
| Expected outputs (tests/data/expected/main/openapi/unique_items_default_set_dataclass.py, ..._msgspec.py, ..._pydantic.py, ..._pydantic_v2.py) | New expected files showing TestModel with Optional[Set[...]] fields and default_factory-based deterministic set defaults. |
| Parametrized test (tests/main/openapi/test_main_openapi.py) | Adds test_main_unique_items_default_set parameterized across pydantic, pydantic_v2, dataclass, and msgspec output types validating --use-unique-items-as-set. |

Sequence Diagram

sequenceDiagram
    participant Schema as OpenAPI Schema
    participant Parser as Parser (base.py)
    participant Validator as Hashability Check
    participant Gen as Model Generator (dataclass/msgspec/pydantic)
    participant Repr as repr_set_sorted()
    participant Output as Generated Code

    Schema->>Parser: Parse field with uniqueItems=true and default list
    Parser->>Validator: Attempt to convert list -> set (check element hashability)
    alt Elements hashable
        Validator->>Parser: OK — convert default to set, update field type
        Parser->>Gen: Provide field with set default
        Gen->>Repr: Request deterministic set repr
        Repr->>Gen: Return sorted representation
        Gen->>Output: Emit field using default_factory with sorted repr
    else Elements not hashable
        Validator->>Parser: Not hashable — skip conversion
        Parser->>Gen: Provide field with original list default
        Gen->>Output: Emit field using list default handling
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Inspect repr_set_sorted for correctness with heterogeneous element types and empty-set handling.
  • Verify local imports and consistent usage across dataclass, msgspec, and pydantic generators.
  • Review parser try/except to ensure it only suppresses hashability issues and preserves original defaults when appropriate.
  • Check new tests and expected outputs for exact formatting/whitespace consistency.

Poem

🐇 I sorted my hops and sorted my sets,

No more surprises, no tangled nets.
From parser check to generated line,
Deterministic berries, oh how fine! ✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: fixing the --use-unique-items-as-set option to output set literals instead of list literals for default values. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 87.50% which is sufficient. The required threshold is 80.00%. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1e7ae24 and 14240f3.

📒 Files selected for processing (2)
  • src/datamodel_code_generator/model/dataclass.py (1 hunks)
  • tests/data/expected/main/openapi/unique_items_default_set_dataclass.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/data/expected/main/openapi/unique_items_default_set_dataclass.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/datamodel_code_generator/model/dataclass.py (1)
src/datamodel_code_generator/model/base.py (1)
  • repr_set_sorted (52-62)
🪛 GitHub Check: CodeQL
src/datamodel_code_generator/model/dataclass.py

[notice] 155-155: Cyclic import
Import of module datamodel_code_generator.model.base begins an import cycle.

🪛 Ruff (0.14.8)
src/datamodel_code_generator/model/dataclass.py

155-155: Unused noqa directive (non-enabled: PLC0415)

Remove unused noqa directive

(RUF100)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: py312-black22 on Ubuntu
  • GitHub Check: 3.9 on macOS
  • GitHub Check: 3.9 on Windows
  • GitHub Check: 3.11 on Windows
  • GitHub Check: 3.10 on Windows
  • GitHub Check: benchmarks
  • GitHub Check: 3.13 on Windows
  • GitHub Check: 3.13 on Ubuntu
  • GitHub Check: 3.14 on Windows
  • GitHub Check: Analyze (python)


@koxudaxi koxudaxi marked this pull request as ready for review December 16, 2025 04:46
Comment thread src/datamodel_code_generator/model/dataclass.py Fixed
default_value = data.pop("default")
if default_value:
    data["default_factory"] = f"lambda: {default_value!r}"
    from datamodel_code_generator.model.base import repr_set_sorted  # noqa: PLC0415

Check notice

Code scanning / CodeQL

Cyclic import Note

Import of module datamodel_code_generator.model.base begins an import cycle.

Copilot Autofix

AI 5 months ago

To fix the cyclic import, the dependency between model.msgspec and model.base (specifically for repr_set_sorted) must be broken. The best and least disruptive way is to move the function repr_set_sorted out of model.base into a new utility module (e.g., datamodel_code_generator.utils.repr.py or similar). Both model.base and model.msgspec should then import repr_set_sorted from this new utility module. Since you only provided code from model.msgspec, you can only adjust the import in-place. Thus, update the dynamic import inside the __str__ method so that it imports from the new location.

Required steps:

  • Move the function repr_set_sorted out of datamodel_code_generator.model.base into a new module, e.g., datamodel_code_generator.utils.repr_set.py. (Not possible here; not enough context—so simulate as if that has been done.)
  • In src/datamodel_code_generator/model/msgspec.py, update the dynamic import in __str__ to use the new import path:
    from datamodel_code_generator.utils.repr_set import repr_set_sorted
  • Make this change only inside the __str__ method where the dynamic import previously occurred.

Suggested changeset 1
src/datamodel_code_generator/model/msgspec.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/src/datamodel_code_generator/model/msgspec.py b/src/datamodel_code_generator/model/msgspec.py
--- a/src/datamodel_code_generator/model/msgspec.py
+++ b/src/datamodel_code_generator/model/msgspec.py
@@ -277,7 +277,7 @@
         if "default" in data and isinstance(data["default"], (list, dict, set)) and "default_factory" not in data:
             default_value = data.pop("default")
             if default_value:
-                from datamodel_code_generator.model.base import repr_set_sorted  # noqa: PLC0415
+                from datamodel_code_generator.utils.repr_set import repr_set_sorted  # noqa: PLC0415
 
                 default_repr = repr_set_sorted(default_value) if isinstance(default_value, set) else repr(default_value)
                 data["default_factory"] = f"lambda: {default_repr}"
EOF
Comment thread src/datamodel_code_generator/model/pydantic/base_model.py Dismissed
@codecov

codecov Bot commented Dec 16, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.61%. Comparing base (b458594) to head (14240f3).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2672   +/-   ##
=======================================
  Coverage   99.61%   99.61%           
=======================================
  Files          76       76           
  Lines       10648    10666   +18     
  Branches     1300     1303    +3     
=======================================
+ Hits        10607    10625   +18     
  Misses         21       21           
  Partials       20       20           
| Flag | Coverage Δ |
|---|---|
| unittests | 99.61% <100.00%> (+<0.01%) ⬆️ |


@codspeed-hq

codspeed-hq Bot commented Dec 16, 2025

CodSpeed Performance Report

Merging #2672 will not alter performance

Comparing fix/unique-items-set-default-literal (14240f3) with main (b458594)

Summary

✅ 50 untouched
⏩ 3 skipped [1]

Footnotes

  1. 3 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b458594 and 1e7ae24.

📒 Files selected for processing (11)
  • src/datamodel_code_generator/model/base.py (2 hunks)
  • src/datamodel_code_generator/model/dataclass.py (1 hunks)
  • src/datamodel_code_generator/model/msgspec.py (1 hunks)
  • src/datamodel_code_generator/model/pydantic/base_model.py (1 hunks)
  • src/datamodel_code_generator/parser/base.py (1 hunks)
  • tests/data/expected/main/openapi/unique_items_default_set_dataclass.py (1 hunks)
  • tests/data/expected/main/openapi/unique_items_default_set_msgspec.py (1 hunks)
  • tests/data/expected/main/openapi/unique_items_default_set_pydantic.py (1 hunks)
  • tests/data/expected/main/openapi/unique_items_default_set_pydantic_v2.py (1 hunks)
  • tests/data/openapi/unique_items_default_set.yaml (1 hunks)
  • tests/main/openapi/test_main_openapi.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
src/datamodel_code_generator/model/msgspec.py (1)
src/datamodel_code_generator/model/base.py (1)
  • repr_set_sorted (52-62)
tests/data/expected/main/openapi/unique_items_default_set_msgspec.py (1)
src/datamodel_code_generator/model/msgspec.py (1)
  • field (243-248)
src/datamodel_code_generator/model/pydantic/base_model.py (1)
src/datamodel_code_generator/model/base.py (1)
  • repr_set_sorted (52-62)
src/datamodel_code_generator/model/dataclass.py (1)
src/datamodel_code_generator/model/base.py (1)
  • repr_set_sorted (52-62)
🪛 Checkov (3.2.334)
tests/data/openapi/unique_items_default_set.yaml

[high] 1-35: Ensure that the global security field has rules defined

(CKV_OPENAPI_4)


[medium] 12-20: Ensure that arrays have a maximum number of items

(CKV_OPENAPI_21)

🪛 Ruff (0.14.8)
src/datamodel_code_generator/model/msgspec.py

280-280: Unused noqa directive (non-enabled: PLC0415)

Remove unused noqa directive

(RUF100)

src/datamodel_code_generator/model/pydantic/base_model.py

224-224: Unused noqa directive (non-enabled: PLC0415)

Remove unused noqa directive

(RUF100)

src/datamodel_code_generator/model/dataclass.py

154-154: Unused noqa directive (non-enabled: PLC0415)

Remove unused noqa directive

(RUF100)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: 3.10 on macOS
  • GitHub Check: 3.11 on macOS
  • GitHub Check: 3.10 on Windows
  • GitHub Check: 3.12 on Windows
  • GitHub Check: 3.9 on Windows
  • GitHub Check: 3.13 on Windows
  • GitHub Check: 3.12 on macOS
  • GitHub Check: 3.11 on Windows
  • GitHub Check: 3.13 on macOS
  • GitHub Check: Analyze (python)
  • GitHub Check: benchmarks
🔇 Additional comments (10)
tests/data/openapi/unique_items_default_set.yaml (1)

1-33: Well-structured test fixture for unique items with set defaults.

The OpenAPI spec correctly covers the key test scenarios: non-empty string set, empty set, and integer set with uniqueItems enabled. The static analysis warnings about security rules and maxItems are false positives—this is a minimal test fixture, not a production API specification.

tests/data/expected/main/openapi/unique_items_default_set_pydantic_v2.py (1)

12-15: Expected output correctly demonstrates the fix.

The generated model now properly uses set literals ({'tag1', 'tag2'}, {1, 2, 3}) instead of list literals, and set() for the empty set. This matches the PR objective of fixing the type mismatch between Set[...] annotations and list default values.

tests/data/expected/main/openapi/unique_items_default_set_pydantic.py (1)

12-15: Expected Pydantic v1 output is correct.

The generated model properly uses set literals within Field() calls. The unique_items=True constraint is correctly preserved.

src/datamodel_code_generator/model/pydantic/base_model.py (1)

224-227: Logic for set default representation is correct.

The conditional handling properly uses repr_set_sorted for set defaults to ensure deterministic output, while falling back to standard repr for other types. This aligns with the pattern used in src/datamodel_code_generator/model/base.py.

Regarding the Ruff hint about unused noqa directive: the PLC0415 rule (import-outside-top-level) may be disabled in the project's Ruff configuration, but the noqa comment maintains consistency with other local imports in this file (e.g., line 375). Consider removing if it causes linter noise, but it's not blocking.

tests/data/expected/main/openapi/unique_items_default_set_msgspec.py (1)

12-15: Msgspec output correctly uses default_factory pattern.

The generated code properly uses field(default_factory=...) for mutable set defaults—this is the correct approach for msgspec Structs. Using set directly (line 14) for the empty set factory is idiomatic, and lambdas with set literals are appropriate for non-empty defaults.
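
The empty versus non-empty handling described here might be sketched as follows. This is an assumption-laden illustration, not the code in msgspec.py: `default_factory_source` and `_sorted_set_repr` are hypothetical names, and the empty-collection branch is inferred only from the remark that `set` is used directly as the factory.

```python
from typing import Any


def _sorted_set_repr(value: set) -> str:
    # Hypothetical stand-in for a deterministic set repr (non-empty sets only).
    items = sorted(value, key=lambda x: (type(x).__name__, repr(x)))
    return "{" + ", ".join(repr(x) for x in items) + "}"


def default_factory_source(default: Any) -> str:
    """Return source text suitable for a default_factory=... argument."""
    if not default:
        # Empty collection: the constructor itself serves as the factory
        # (set, list, or dict), so no lambda is needed.
        return type(default).__name__
    if isinstance(default, set):
        # Non-empty set: wrap a sorted literal in a lambda for stable output.
        return f"lambda: {_sorted_set_repr(default)}"
    # Non-empty list or dict: the ordinary repr is already deterministic.
    return f"lambda: {default!r}"


print(default_factory_source(set()))             # set
print(default_factory_source({"tag2", "tag1"}))  # lambda: {'tag1', 'tag2'}
```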

src/datamodel_code_generator/model/base.py (2)

51-62: Well-designed helper for deterministic set representation.

The repr_set_sorted function elegantly handles the key challenge of producing consistent output across Python runs:

  • The (type(x).__name__, repr(x)) sort key safely handles heterogeneous collections and types without __lt__ defined.
  • Empty set correctly returns "set()" (since {} would be an empty dict literal).
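
For readers without the diff at hand, a helper with the two properties listed above could look roughly like this. It is a reconstruction from the review description only; the actual body of repr_set_sorted in model/base.py is not shown in this conversation and may differ.

```python
from typing import Any


def repr_set_sorted(value: set[Any]) -> str:
    """Return a deterministic source representation of a set (sketch)."""
    if not value:
        # "{}" would be an empty dict literal, so spell out the constructor.
        return "set()"
    # Sort by (type name, repr) so mixed or non-orderable elements still sort.
    items = sorted(value, key=lambda x: (type(x).__name__, repr(x)))
    return "{" + ", ".join(repr(x) for x in items) + "}"


print(repr_set_sorted({"tag2", "tag1"}))  # {'tag1', 'tag2'}
print(repr_set_sorted({3, 1, 2}))         # {1, 2, 3}
print(repr_set_sorted(set()))             # set()
print(repr_set_sorted({"a", 1}))          # {1, 'a'}
```

One consequence of sorting on repr text is that, in this sketch, multi-digit integers order lexicographically rather than numerically ({2, 10} renders as {10, 2}). The output is still stable between runs, which is the stated goal.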

296-300: Clean integration with existing property.

The represented_default property now correctly delegates to repr_set_sorted for set values while preserving the existing repr() behavior for all other types.

src/datamodel_code_generator/parser/base.py (1)

1383-1402: Safe default conversion for --use-unique-items-as-set

The new hashability check before converting list defaults to sets is correct: it fixes the type/default mismatch and gracefully skips conversion when elements are unhashable, keeping type and default consistent. This is a good, low-risk guardrail around the option’s behavior.
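
A guard of this kind might be sketched as below. The function name `convert_unique_items_default` is hypothetical, and the real check in parser/base.py (lines 1383-1402) is not reproduced in this conversation, so treat it as an illustration of the idea rather than the merged code.

```python
from typing import Any


def convert_unique_items_default(default: Any) -> Any:
    """Convert a list default to a set when every element is hashable."""
    if not isinstance(default, list):
        # Only list defaults are candidates for conversion.
        return default
    try:
        return set(default)
    except TypeError:
        # Unhashable element (for example a dict or nested list):
        # keep the original list so type and default stay consistent.
        return default


print(convert_unique_items_default(["a", "b", "a"]) == {"a", "b"})  # True
print(convert_unique_items_default([{"k": 1}]))                     # [{'k': 1}]
```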

tests/data/expected/main/openapi/unique_items_default_set_dataclass.py (1)

1-15: Expected dataclass output matches new set default semantics

The generated TestModel correctly uses Optional[Set[...]] with field(default_factory=lambda: {...}) (and set() for empty), which aligns with the new deterministic set representation and avoids mutable defaults.
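
As an illustration of why default_factory matters here, the shape described above can be written out by hand. The field names come from the fixture summary; the element type of empty_tags and the exact layout of the expected file are assumptions, since the file itself is not quoted in this conversation.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class TestModel:
    # Each instance gets its own fresh set from the factory.
    tags: Optional[Set[str]] = field(default_factory=lambda: {'tag1', 'tag2'})
    empty_tags: Optional[Set[str]] = field(default_factory=set)
    numbers: Optional[Set[int]] = field(default_factory=lambda: {1, 2, 3})


a = TestModel()
b = TestModel()
a.tags.add('tag3')     # mutate one instance only
print(sorted(b.tags))  # ['tag1', 'tag2']
```

A plain `= {'tag1', 'tag2'}` default would be rejected by dataclasses with a ValueError, because mutable defaults such as set, list, and dict must go through default_factory.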

tests/main/openapi/test_main_openapi.py (1)

3746-3764: Good cross-backend coverage for --use-unique-items-as-set

The new parametrized test cleanly validates that all four backends (Pydantic v1/v2, dataclasses, msgspec) render unique-items defaults as set literals when --use-unique-items-as-set is enabled. This directly guards the behavior the PR is fixing.

Comment thread src/datamodel_code_generator/model/dataclass.py
Comment thread src/datamodel_code_generator/model/msgspec.py
Comment thread src/datamodel_code_generator/model/dataclass.py Dismissed
@koxudaxi koxudaxi merged commit b72471d into main Dec 16, 2025
40 checks passed
@koxudaxi koxudaxi deleted the fix/unique-items-set-default-literal branch December 16, 2025 06:23