Fix --use-unique-items-as-set to output set literals for default values#2672
Fix --use-unique-items-as-set to output set literals for default values#2672
Conversation
WalkthroughAdds a deterministic set repr helper and applies it when generating set defaults across dataclass, msgspec, and pydantic outputs; parser gains a hashability check before converting list defaults to sets; new OpenAPI fixture and parametrized tests validate uniqueItems-as-set behavior. Changes
Sequence DiagramsequenceDiagram
participant Schema as OpenAPI Schema
participant Parser as Parser (base.py)
participant Validator as Hashability Check
participant Gen as Model Generator (dataclass/msgspec/pydantic)
participant Repr as repr_set_sorted()
participant Output as Generated Code
Schema->>Parser: Parse field with uniqueItems=true and default list
Parser->>Validator: Attempt to convert list -> set (check element hashability)
alt Elements hashable
Validator->>Parser: OK — convert default to set, update field type
Parser->>Gen: Provide field with set default
Gen->>Repr: Request deterministic set repr
Repr->>Gen: Return sorted representation
Gen->>Output: Emit field using default_factory with sorted repr
else Elements not hashable
Validator->>Parser: Not hashable — skip conversion
Parser->>Gen: Provide field with original list default
Gen->>Output: Emit field using list default handling
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
🧰 Additional context used🧬 Code graph analysis (1)src/datamodel_code_generator/model/dataclass.py (1)
🪛 GitHub Check: CodeQLsrc/datamodel_code_generator/model/dataclass.py[notice] 155-155: Cyclic import 🪛 Ruff (0.14.8)src/datamodel_code_generator/model/dataclass.py155-155: Unused Remove unused (RUF100) ⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
| default_value = data.pop("default") | ||
| if default_value: | ||
| data["default_factory"] = f"lambda: {default_value!r}" | ||
| from datamodel_code_generator.model.base import repr_set_sorted # noqa: PLC0415 |
Check notice
Code scanning / CodeQL
Cyclic import Note
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 5 months ago
To fix the cyclic import, the dependency between model.msgspec and model.base (specifically for repr_set_sorted) must be broken. The best and least disruptive way is to move the function repr_set_sorted out of model.base into a new utility module (e.g., datamodel_code_generator.utils.repr.py or similar). Both model.base and model.msgspec should then import repr_set_sorted from this new utility module. Since you only provided code from model.msgspec, you can only adjust the import in-place. Thus, update the dynamic import inside the __str__ method so that it imports from the new location.
Required steps:
- Move the function
repr_set_sortedout ofdatamodel_code_generator.model.baseinto a new module, e.g.,datamodel_code_generator.utils.repr_set.py. (Not possible here; not enough context—so simulate as if that has been done.) - In
src/datamodel_code_generator/model/msgspec.py, update the dynamic import in__str__to use the new import path:
from datamodel_code_generator.utils.repr_set import repr_set_sorted - Make this change only inside the
__str__method where the dynamic import previously occurred.
| @@ -277,7 +277,7 @@ | ||
| if "default" in data and isinstance(data["default"], (list, dict, set)) and "default_factory" not in data: | ||
| default_value = data.pop("default") | ||
| if default_value: | ||
| from datamodel_code_generator.model.base import repr_set_sorted # noqa: PLC0415 | ||
| from datamodel_code_generator.utils.repr_set import repr_set_sorted # noqa: PLC0415 | ||
|
|
||
| default_repr = repr_set_sorted(default_value) if isinstance(default_value, set) else repr(default_value) | ||
| data["default_factory"] = f"lambda: {default_repr}" |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #2672 +/- ##
=======================================
Coverage 99.61% 99.61%
=======================================
Files 76 76
Lines 10648 10666 +18
Branches 1300 1303 +3
=======================================
+ Hits 10607 10625 +18
Misses 21 21
Partials 20 20
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
CodSpeed Performance ReportMerging #2672 will not alter performanceComparing Summary
Footnotes
|
There was a problem hiding this comment.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
src/datamodel_code_generator/model/base.py(2 hunks)src/datamodel_code_generator/model/dataclass.py(1 hunks)src/datamodel_code_generator/model/msgspec.py(1 hunks)src/datamodel_code_generator/model/pydantic/base_model.py(1 hunks)src/datamodel_code_generator/parser/base.py(1 hunks)tests/data/expected/main/openapi/unique_items_default_set_dataclass.py(1 hunks)tests/data/expected/main/openapi/unique_items_default_set_msgspec.py(1 hunks)tests/data/expected/main/openapi/unique_items_default_set_pydantic.py(1 hunks)tests/data/expected/main/openapi/unique_items_default_set_pydantic_v2.py(1 hunks)tests/data/openapi/unique_items_default_set.yaml(1 hunks)tests/main/openapi/test_main_openapi.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
src/datamodel_code_generator/model/msgspec.py (1)
src/datamodel_code_generator/model/base.py (1)
repr_set_sorted(52-62)
tests/data/expected/main/openapi/unique_items_default_set_msgspec.py (1)
src/datamodel_code_generator/model/msgspec.py (1)
field(243-248)
src/datamodel_code_generator/model/pydantic/base_model.py (1)
src/datamodel_code_generator/model/base.py (1)
repr_set_sorted(52-62)
src/datamodel_code_generator/model/dataclass.py (1)
src/datamodel_code_generator/model/base.py (1)
repr_set_sorted(52-62)
🪛 Checkov (3.2.334)
tests/data/openapi/unique_items_default_set.yaml
[high] 1-35: Ensure that the global security field has rules defined
(CKV_OPENAPI_4)
[medium] 12-20: Ensure that arrays have a maximum number of items
(CKV_OPENAPI_21)
🪛 Ruff (0.14.8)
src/datamodel_code_generator/model/msgspec.py
280-280: Unused noqa directive (non-enabled: PLC0415)
Remove unused noqa directive
(RUF100)
src/datamodel_code_generator/model/pydantic/base_model.py
224-224: Unused noqa directive (non-enabled: PLC0415)
Remove unused noqa directive
(RUF100)
src/datamodel_code_generator/model/dataclass.py
154-154: Unused noqa directive (non-enabled: PLC0415)
Remove unused noqa directive
(RUF100)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: 3.10 on macOS
- GitHub Check: 3.11 on macOS
- GitHub Check: 3.10 on Windows
- GitHub Check: 3.12 on Windows
- GitHub Check: 3.9 on Windows
- GitHub Check: 3.13 on Windows
- GitHub Check: 3.12 on macOS
- GitHub Check: 3.11 on Windows
- GitHub Check: 3.13 on macOS
- GitHub Check: Analyze (python)
- GitHub Check: benchmarks
🔇 Additional comments (10)
tests/data/openapi/unique_items_default_set.yaml (1)
1-33: Well-structured test fixture for unique items with set defaults.The OpenAPI spec correctly covers the key test scenarios: non-empty string set, empty set, and integer set with uniqueItems enabled. The static analysis warnings about security rules and maxItems are false positives—this is a minimal test fixture, not a production API specification.
tests/data/expected/main/openapi/unique_items_default_set_pydantic_v2.py (1)
12-15: Expected output correctly demonstrates the fix.The generated model now properly uses set literals (
{'tag1', 'tag2'},{1, 2, 3}) instead of list literals, andset()for the empty set. This matches the PR objective of fixing the type mismatch betweenSet[...]annotations and list default values.tests/data/expected/main/openapi/unique_items_default_set_pydantic.py (1)
12-15: Expected Pydantic v1 output is correct.The generated model properly uses set literals within
Field()calls. Theunique_items=Trueconstraint is correctly preserved.src/datamodel_code_generator/model/pydantic/base_model.py (1)
224-227: Logic for set default representation is correct.The conditional handling properly uses
repr_set_sortedfor set defaults to ensure deterministic output, while falling back to standardreprfor other types. This aligns with the pattern used insrc/datamodel_code_generator/model/base.py.Regarding the Ruff hint about unused
noqadirective: thePLC0415rule (import-outside-top-level) may be disabled in the project's Ruff configuration, but thenoqacomment maintains consistency with other local imports in this file (e.g., line 375). Consider removing if it causes linter noise, but it's not blocking.tests/data/expected/main/openapi/unique_items_default_set_msgspec.py (1)
12-15: Msgspec output correctly uses default_factory pattern.The generated code properly uses
field(default_factory=...)for mutable set defaults—this is the correct approach for msgspec Structs. Usingsetdirectly (line 14) for the empty set factory is idiomatic, and lambdas with set literals are appropriate for non-empty defaults.src/datamodel_code_generator/model/base.py (2)
51-62: Well-designed helper for deterministic set representation.The
repr_set_sortedfunction elegantly handles the key challenge of producing consistent output across Python runs:
- The
(type(x).__name__, repr(x))sort key safely handles heterogeneous collections and types without__lt__defined.- Empty set correctly returns
"set()"(since{}would be an empty dict literal).
296-300: Clean integration with existing property.The
represented_defaultproperty now correctly delegates torepr_set_sortedfor set values while preserving the existingrepr()behavior for all other types.src/datamodel_code_generator/parser/base.py (1)
1383-1402: Safe default conversion for--use-unique-items-as-setThe new hashability check before converting list defaults to sets is correct: it fixes the type/default mismatch and gracefully skips conversion when elements are unhashable, keeping type and default consistent. This is a good, low-risk guardrail around the option’s behavior.
tests/data/expected/main/openapi/unique_items_default_set_dataclass.py (1)
1-15: Expected dataclass output matches new set default semanticsThe generated
TestModelcorrectly usesOptional[Set[...]]withfield(default_factory=lambda: {...})(andset()for empty), which aligns with the new deterministic set representation and avoids mutable defaults.tests/main/openapi/test_main_openapi.py (1)
3746-3764: Good cross-backend coverage for--use-unique-items-as-setThe new parametrized test cleanly validates that all four backends (Pydantic v1/v2, dataclasses, msgspec) render unique-items defaults as set literals when
--use-unique-items-as-setis enabled. This directly guards the behavior the PR is fixing.
Summary
--use-unique-items-as-setoption to convert list default values to set literalstags: Optional[Set[str]] = ['tag1', 'tag2'](type mismatch)tags: Optional[Set[str]] = {'tag1', 'tag2'}(correct set literal)Summary by CodeRabbit
New Features
Tests
✏️ Tip: You can customize this high-level summary in your review settings.