
Return str or dict when output=None in generate() #2787

Merged
koxudaxi merged 1 commit into main from feature/generate-return-string-or-dict on Dec 24, 2025

@koxudaxi
Owner

@koxudaxi koxudaxi commented Dec 24, 2025

Fixes: #423

Summary by CodeRabbit

  • New Features

    • generate() can return generated code as a string (single module) or a mapping of modules to code (multi-module)
    • CLI prints generated code to stdout when no output path is specified
    • Custom file headers supported
  • Documentation

    • Usage guide updated with "Getting Generated Code as String", multi-module examples, file-writing guidance, and a Return Value Summary table
  • Behavior Changes

    • Supplying a file path for multiple modules now raises an error; use a directory path
  • Tests

    • Expanded tests covering return values, header handling, multi-module outputs, and CLI stdout


@coderabbitai

coderabbitai Bot commented Dec 24, 2025

Walkthrough

This PR makes generate() return generated code when output=None: a single module returns a string, multiple modules return a GeneratedModules dict. The CLI now prints such returned content to stdout. Docs and tests are updated to reflect and verify these behaviors.
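A caller-side sketch of handling the three return shapes described above. It deliberately does not import the library: the local `GeneratedModules` alias is an assumption that mirrors the described mapping of module-path tuples to code, and `normalize_result` and its `single_name` default are hypothetical names used only for illustration.

```python
from typing import Dict, Optional, Tuple, Union

# Assumption: stands in for the library's public alias
# (module path tuple -> generated source code).
GeneratedModules = Dict[Tuple[str, ...], str]


def normalize_result(
    result: Union[str, GeneratedModules, None],
    single_name: str = "model.py",
) -> GeneratedModules:
    """Coerce a generate()-style return value into one mapping shape."""
    if result is None:
        # An output path was supplied, so the code was written to disk.
        return {}
    if isinstance(result, dict):
        # Multi-module case: already a mapping of path tuples to code.
        return dict(result)
    # Single-module case: wrap the string under a placeholder file name.
    return {(single_name,): result}
```

Normalizing early lets downstream code iterate over one shape instead of branching on the return type at every call site.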

Changes

Cohort / File(s): Summary

  • Core API (src/datamodel_code_generator/__init__.py): Adds public GeneratedModules TypeAlias; extends generate() return type to str | GeneratedModules | None; adds _build_module_content() helper to assemble headers/body and relocate __future__ imports when custom headers exist; adjusts single vs multi-module return behavior.
  • CLI / Entrypoint (src/datamodel_code_generator/__main__.py): Captures generate() result and prints to stdout when output=None, handling string and mapping results (iterates and prints values).
  • Documentation (docs/using_as_module.md): Replaces previous single-module guidance with examples for returning code as string, multiple-module output (GeneratedModules), writing to files, and a Return Value Summary table; clarifies file vs directory output rules.
  • Tests, Behavior & Export (tests/main/test_main_general.py, tests/test_main_kr.py): Adds tests verifying return types/values for single and multiple modules, custom_file_header permutations, file-writing behavior, and that GeneratedModules is exported; updates test_main_modular_no_file to use capsys and assert stdout.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~35 minutes

Suggested labels

breaking-change-analyzed

Poem

🐰 Hoppity hop, code in a string,
No temp files, I dance and sing.
One module, one tidy line,
Many modules, a mapping fine.
Toast to simpler dev-time spring! ✨

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
Check name: Status. Explanation

  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title clearly and concisely summarizes the main change: returning string or dict from generate() when output=None, which is the primary objective of this PR.
  • Linked Issues check: ✅ Passed. The PR fully implements the requirements from issue #423 by enabling generate() to return generated content as a string (or dict for multiple modules) when output=None, eliminating the need for temporary directories.
  • Out of Scope Changes check: ✅ Passed. All changes are directly scoped to implementing the core feature: updating generate() return behavior, adding documentation, updating tests, and handling stdout printing for CLI when no output path is specified.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.

@koxudaxi koxudaxi force-pushed the feature/generate-return-string-or-dict branch from cdc64fd to 7a6c76b on December 24, 2025 17:45

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/datamodel_code_generator/__init__.py (1)

965-992: Consider consolidating duplicate future import extraction logic.

The future import extraction logic (lines 965-992) duplicates the logic in _build_module_content (lines 459-485). While the file writing path uses print() statements versus string concatenation, this could potentially be refactored to reduce duplication.

This is a minor observation; the current implementation is correct and functional.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5cceb4f and cdc64fd.

📒 Files selected for processing (5)
  • docs/using_as_module.md
  • src/datamodel_code_generator/__init__.py
  • src/datamodel_code_generator/__main__.py
  • tests/main/test_main_general.py
  • tests/test_main_kr.py
🧰 Additional context used
🧬 Code graph analysis (3)
tests/test_main_kr.py (3)
tests/main/openapi/test_main_openapi.py (1)
  • test_main_modular_no_file (463-470)
tests/main/conftest.py (1)
  • run_main_with_args (215-241)
src/datamodel_code_generator/__main__.py (1)
  • Exit (94-100)
src/datamodel_code_generator/__init__.py (1)
src/datamodel_code_generator/format.py (1)
  • Formatter (162-168)
src/datamodel_code_generator/__main__.py (1)
src/datamodel_code_generator/__init__.py (1)
  • generate (489-1016)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: 3.10 on macOS
  • GitHub Check: py312-black24 on Ubuntu
  • GitHub Check: 3.10 on Windows
  • GitHub Check: py312-isort5 on Ubuntu
  • GitHub Check: 3.11 on macOS
  • GitHub Check: benchmarks
  • GitHub Check: py312-pydantic1 on Ubuntu
  • GitHub Check: 3.11 on Ubuntu
  • GitHub Check: py312-isort6 on Ubuntu
  • GitHub Check: 3.13 on Windows
  • GitHub Check: 3.12 on macOS
  • GitHub Check: py312-isort7 on Ubuntu
  • GitHub Check: Analyze (python)
  • GitHub Check: 3.13 on macOS
  • GitHub Check: 3.13 on Ubuntu
  • GitHub Check: 3.11 on Windows
  • GitHub Check: 3.12 on Ubuntu
  • GitHub Check: 3.14 on Windows
  • GitHub Check: 3.14 on Ubuntu
  • GitHub Check: 3.12 on Windows
🔇 Additional comments (13)
tests/test_main_kr.py (1)

104-109: LGTM!

The test correctly validates the new behavior where modular output without a file path now succeeds and prints generated content to stdout. The assertions for "class Chocolate" and "class Source" verify that the expected models are present in the output.

src/datamodel_code_generator/__main__.py (1)

849-855: LGTM!

The stdout output logic correctly handles both single-module (string) and multi-module (mapping) return types from generate(). Adding a trailing newline ensures proper stdout formatting.
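The printing behavior described here can be sketched roughly as follows. This is not the code in `__main__.py`; `print_generated` is a hypothetical helper, and the optional `stream` parameter exists only so the sketch can be exercised without capturing real stdout.

```python
import sys
from typing import Dict, Optional, TextIO, Tuple, Union


def print_generated(
    result: Union[str, Dict[Tuple[str, ...], str], None],
    stream: Optional[TextIO] = None,
) -> None:
    """Print a generate()-style result: one string, or each module's code."""
    out = sys.stdout if stream is None else stream
    if result is None:
        # Nothing was returned (an output path handled the writing).
        return
    if isinstance(result, str):
        # Single module: print() appends the trailing newline.
        print(result, file=out)
        return
    # Multiple modules: print only the code values, in mapping order.
    for code in result.values():
        print(code, file=out)
```

Note that printing only the values drops the module paths, so stdout output for multi-module schemas is mainly useful for inspection rather than for reconstructing a package.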

docs/using_as_module.md (2)

15-62: LGTM!

The documentation clearly explains the new return value behavior with practical examples. The type hint str | GeneratedModules and the handling pattern for both cases are well-documented.


190-198: LGTM!

The return value summary table is a helpful addition that clearly documents the behavior matrix for different output parameter scenarios.

src/datamodel_code_generator/__init__.py (4)

85-90: LGTM!

The GeneratedModules type alias is well-documented and provides a clear contract for the multi-module return type.


448-486: LGTM!

The _build_module_content helper correctly handles future import extraction and placement when a custom file header is provided. The logic properly preserves the docstring position while inserting __future__ imports in the correct location.


606-616: LGTM!

The updated return type and docstring accurately describe the three possible return scenarios.


921-935: LGTM!

The new return logic correctly handles both single-module (string) and multi-module (GeneratedModules dict) cases when output is None.

tests/main/test_main_general.py (5)

10-21: LGTM!

The new imports for inline_snapshot and GeneratedModules are appropriate for the added tests.


1442-1464: LGTM!

This test validates the core new functionality: generate() returns a string when output=None for single-file schemas. The snapshot testing approach ensures the output format is verified.


1518-1554: LGTM!

These tests correctly verify that:

  1. generate() returns None when an output path is provided
  2. The file content matches what would be returned with output=None

The .strip() comparison on line 1554 appropriately handles potential trailing whitespace differences.


1557-1642: LGTM!

This test validates the multi-module return behavior using a directory input with cross-file references. The snapshot correctly captures the GeneratedModules dict structure with module path tuples as keys.


1650-1717: LGTM!

These tests comprehensively cover custom file header scenarios:

  • Basic custom header
  • Custom header with code after docstring (testing __future__ import placement)
  • Custom header with disable_future_imports=True

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
src/datamodel_code_generator/__init__.py (2)

448-487: Code duplication between _build_module_content and file-writing logic.

The future imports extraction logic at lines 463-477 is nearly identical to lines 969-985 in the file-writing path. Consider extracting shared logic to reduce duplication, though this is not critical for this PR.

🔎 Potential refactor to reduce duplication

You could extract the future imports handling into a separate helper:

def _extract_future_imports(body: str) -> tuple[str, str]:
    """Extract future imports from body, returning (extracted, body_without_future)."""
    lines = body.split("\n")
    future_indices = [i for i, line in enumerate(lines) if line.strip().startswith("from __future__")]
    if not future_indices:
        return "", body
    extracted = "\n".join(lines[i] for i in future_indices)
    remaining = [line for i, line in enumerate(lines) if i not in future_indices]
    return extracted, "\n".join(remaining).lstrip("\n")

This could be used in both _build_module_content and the file-writing path.
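To make the suggestion concrete, here is a minimal sketch of how such a helper might feed header assembly. It is not the project's `_build_module_content`: `build_with_header` is a hypothetical name, and the sketch assumes the custom header contains only comments or a docstring. A header that mixes a docstring with other code would need the imports placed right after the docstring, which this simplified version does not attempt.

```python
from typing import Tuple


def extract_future_imports(body: str) -> Tuple[str, str]:
    """Split body into (future import lines, everything else)."""
    lines = body.split("\n")
    is_future = [line.strip().startswith("from __future__") for line in lines]
    future = [line for line, hit in zip(lines, is_future) if hit]
    rest = [line for line, hit in zip(lines, is_future) if not hit]
    return "\n".join(future), "\n".join(rest).lstrip("\n")


def build_with_header(header: str, body: str) -> str:
    """Place header first, then any __future__ imports, then the body."""
    future, rest = extract_future_imports(body)
    parts = [header.rstrip("\n")]
    if future:
        # __future__ imports must precede all other statements.
        parts.append(future)
    parts.append(rest)
    return "\n\n".join(parts)
```

Keeping extraction separate from assembly means both the in-memory return path and the file-writing path could share one definition of what counts as a future import.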


958-1000: Consider using a context manager for file handling.

The file is opened at line 960 but closed manually at line 1000. Using a context manager would be safer against exceptions during write operations.

🔎 Proposed refactor using context manager
     for path, (body, future_imports, filename) in modules.items():
         if not path.parent.exists():
             path.parent.mkdir(parents=True)
-        file = path.open("wt", encoding=encoding)
+        with path.open("wt", encoding=encoding) as file:

-        safe_filename = filename.replace("\n", " ").replace("\r", " ") if filename else ""
-        effective_header = custom_file_header or header.format(safe_filename)
+            safe_filename = filename.replace("\n", " ").replace("\r", " ") if filename else ""
+            effective_header = custom_file_header or header.format(safe_filename)

-        if custom_file_header and body:
-            # ... (indent rest of the block)
+            if custom_file_header and body:
+                # ... (rest of the block)
-        file.close()
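A self-contained sketch of the same idea, for readers who want to run it. It is simplified on purpose: `write_modules` is a hypothetical helper, and it assumes each path maps directly to its final text, whereas the code under review stores a (body, future_imports, filename) tuple per path and assembles headers inside the loop.

```python
from pathlib import Path
from typing import Mapping


def write_modules(modules: Mapping[Path, str], encoding: str = "utf-8") -> None:
    """Write each module's text, closing every file even if a write fails."""
    for path, body in modules.items():
        # exist_ok=True replaces the explicit exists() check.
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("wt", encoding=encoding) as file:
            # The with-block closes the handle on normal exit and on exceptions.
            print(body.rstrip(), file=file)
```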
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cdc64fd and 7a6c76b.

📒 Files selected for processing (5)
  • docs/using_as_module.md
  • src/datamodel_code_generator/__init__.py
  • src/datamodel_code_generator/__main__.py
  • tests/main/test_main_general.py
  • tests/test_main_kr.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/test_main_kr.py
🧰 Additional context used
🧬 Code graph analysis (2)
tests/main/test_main_general.py (1)
src/datamodel_code_generator/__init__.py (4)
  • AllExportsScope (259-267)
  • DataModelType (226-234)
  • generate (489-1016)
  • InputFileType (204-214)
src/datamodel_code_generator/__main__.py (1)
src/datamodel_code_generator/__init__.py (1)
  • generate (489-1016)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: py312-black23 on Ubuntu
  • GitHub Check: 3.10 on Ubuntu
  • GitHub Check: py312-isort5 on Ubuntu
  • GitHub Check: py312-black22 on Ubuntu
  • GitHub Check: py312-pydantic1 on Ubuntu
  • GitHub Check: 3.11 on Windows
  • GitHub Check: 3.10 on macOS
  • GitHub Check: py312-isort7 on Ubuntu
  • GitHub Check: 3.11 on macOS
  • GitHub Check: py312-black24 on Ubuntu
  • GitHub Check: 3.11 on Ubuntu
  • GitHub Check: 3.13 on Windows
  • GitHub Check: 3.12 on Ubuntu
  • GitHub Check: 3.14 on Windows
  • GitHub Check: 3.10 on Windows
  • GitHub Check: 3.12 on macOS
  • GitHub Check: 3.13 on Ubuntu
  • GitHub Check: 3.12 on Windows
  • GitHub Check: 3.13 on macOS
  • GitHub Check: benchmarks
🔇 Additional comments (10)
src/datamodel_code_generator/__main__.py (1)

734-855: LGTM! Clean implementation of stdout printing for generated content.

The changes correctly:

  1. Capture the return value from generate()
  2. Print string results directly when output is None
  3. Iterate over GeneratedModules dict values for multi-module output
docs/using_as_module.md (2)

15-62: Documentation looks comprehensive and well-structured.

The examples clearly demonstrate:

  1. Getting generated code as a string (single module)
  2. Handling GeneratedModules dict for multi-module schemas
  3. The isinstance(result, dict) check correctly distinguishes between return types

190-199: Return Value Summary table is accurate and helpful.

The table correctly documents the behavior, including that a file path with multiple modules raises an error.

src/datamodel_code_generator/__init__.py (3)

85-91: Well-documented type alias.

The GeneratedModules TypeAlias with its docstring clearly explains the purpose and structure of the return type for multi-module generation.


921-935: Clean implementation of in-memory return for both single and multi-module cases.

The logic correctly:

  1. Returns a str for single-file output
  2. Returns a GeneratedModules dict with sorted keys for deterministic ordering
  3. Applies headers consistently to all modules
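The three points above can be illustrated with a small sketch. It is an approximation under stated assumptions, not the reviewed code: `collect_result` is a hypothetical name, the header is joined with a blank line, and "exactly one entry" is used as the single-module test, which may differ from how the library actually decides.

```python
from typing import Dict, Mapping, Tuple, Union

# Assumption: stands in for the library's public alias.
GeneratedModules = Dict[Tuple[str, ...], str]


def collect_result(
    bodies: Mapping[Tuple[str, ...], str],
    header: str,
) -> Union[str, GeneratedModules]:
    """Return one string for a single module, else a key-sorted mapping."""
    # Apply the same header to every module.
    rendered = {key: f"{header}\n\n{body}" for key, body in bodies.items()}
    if len(rendered) == 1:
        return next(iter(rendered.values()))
    # Sorting keys makes iteration order deterministic across runs.
    return {key: rendered[key] for key in sorted(rendered)}
```

Deterministic ordering matters mostly for snapshot tests and for diffing generated output between runs.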

1052-1052: LGTM! GeneratedModules correctly exported.

Adding GeneratedModules to __all__ makes it part of the public API, enabling users to type-hint their code when working with multi-module generation.

tests/main/test_main_general.py (4)

1439-1461: Good test coverage for basic string return.

The test correctly validates:

  1. Return type is str when output=None
  2. Generated content structure with header, imports, and model

1554-1639: Comprehensive test for multi-module generation.

The test properly validates the GeneratedModules return type with tuple keys mapping to generated code strings.

Note: The snapshot shows test_generate_returns_dict_for0 as the filename in __init__.py (lines 1597-1598), which appears truncated. This is likely because the directory name (tmp_path) is used as input_filename fallback, which is expected behavior for directory inputs.


1647-1714: Excellent edge case coverage for custom file headers.

The three tests thoroughly validate:

  1. Basic custom header with future imports placement
  2. Custom header with docstring and code - future imports inserted after docstring
  3. Custom header when disable_future_imports=True - no future import handling needed

1642-1644: Simple but effective export verification.

The test confirms GeneratedModules is importable from the public API. The assertion is minimal but sufficient since the import at line 16 would fail if the export was missing.

@koxudaxi koxudaxi enabled auto-merge (squash) December 24, 2025 17:50
@koxudaxi koxudaxi merged commit f3029e8 into main Dec 24, 2025
35 checks passed
@koxudaxi koxudaxi deleted the feature/generate-return-string-or-dict branch December 24, 2025 17:51
@codspeed-hq

codspeed-hq Bot commented Dec 24, 2025

CodSpeed Performance Report

Merging #2787 will not alter performance

Comparing feature/generate-return-string-or-dict (7a6c76b) with main (5cceb4f)

Summary

✅ 73 untouched
⏩ 10 skipped [1]

Footnotes

  1. 10 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.

@codecov

codecov Bot commented Dec 24, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.48%. Comparing base (a46ceb8) to head (7a6c76b).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff            @@
##             main    #2787    +/-   ##
========================================
  Coverage   99.47%   99.48%            
========================================
  Files          88       88            
  Lines       13213    13348   +135     
  Branches     1556     1565     +9     
========================================
+ Hits        13144    13279   +135     
  Misses         36       36            
  Partials       33       33            
Flag Coverage Δ
unittests 99.48% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.

@github-actions
Contributor

Breaking Change Analysis

Result: Breaking changes detected

Reasoning: This PR contains several breaking changes:

  1. The return type of generate() changed from None to str | GeneratedModules | None. When output=None, it now returns the generated code as a string (single module) or a dictionary (multiple modules). This is a significant API change that could affect type checking and code that relies on the previous behavior.
  2. The CLI now prints generated code to stdout when no --output flag is specified, a new default behavior that could affect scripts or pipelines expecting silent operation.
  3. An error message changed slightly, which could affect code that parses error text.

Removing the "Modular references require an output directory" error for the output=None case is not breaking, because generate() now returns the dict instead of raising.

Content for Release Notes

API/CLI Changes

  • generate() function return type changed - Previously returned None, now returns str | GeneratedModules | None. When output=None, returns str for a single module or a GeneratedModules dict for multiple modules. Code that ignores the return value, or that passes an output path and checks that the result is None, will continue to work, but type checkers may flag this change. (Return str or dict when output=None in generate() #2787)

Default Behavior Changes

  • CLI now prints to stdout when no output path specified - When running datamodel-codegen without the --output flag, generated code is now printed to stdout instead of silently doing nothing. This enables piping output but may affect scripts that expected no output. (Return str or dict when output=None in generate() #2787)
# Before: No output
datamodel-codegen --input schema.json

# After: Prints generated code to stdout
datamodel-codegen --input schema.json

Error Handling Changes

  • Error message changed for multi-module output without directory - The error message when attempting multi-module generation with a file path changed from "Modular references require an output directory" to "Modular references require an output directory, not a file". Scripts parsing error messages may need updates. (Return str or dict when output=None in generate() #2787)
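A rough sketch of the file-versus-directory rule behind this message. It is illustrative only: `check_output_target` is a hypothetical helper, `ValueError` stands in for whatever exception type the library raises, and treating "the path has a suffix" as "the path is a file" is an assumed heuristic rather than the library's actual check.

```python
from pathlib import Path

MESSAGE = "Modular references require an output directory, not a file"


def check_output_target(module_count: int, output: Path) -> None:
    """Reject a file-like output path when several modules must be written."""
    # Assumed heuristic: a suffix such as ".py" marks the path as a file.
    if module_count > 1 and output.suffix:
        raise ValueError(MESSAGE)
```

Callers that previously matched on the old message text would need to match the new wording, or better, catch the exception type instead of parsing the string.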

This analysis was performed by Claude Code Action

koxudaxi added a commit that referenced this pull request Dec 25, 2025
* Add --collapse-root-models-name-strategy option

* docs: update CLI reference documentation and prompt data

🤖 Generated by GitHub Actions

* Add pragma no cover for defensive edge cases

* Achieve 100% diff coverage for collapse-root-models-name-strategy

* Use cast instead of type ignore comment

* Remove line comments from collapse-root-models implementation

* Add complex e2e tests for collapse-root-models-name-strategy

* Update reference metadata when renaming in parent strategy

* Refactor collapse-root-models tests to use parameterization for v1/v2

* Add schema path context to error messages (#2786)

* Return str or dict when output=None in generate() (#2787)

* Add --http-timeout CLI option (#2788)

* Add --http-timeout CLI option for configurable HTTP request timeout

* docs: update CLI reference documentation and prompt data

🤖 Generated by GitHub Actions

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Pass schema extensions to templates (#2790)

* Pass schema extensions to templates

* Move model_base import to top of file

* Add schema extensions documentation

Document how x-* schema extensions are passed to custom templates
via the extensions variable, with examples for database model
configuration and other use cases.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

* Add propertyNames and x-propertyNames support (#2789)

* Add propertyNames and x-propertyNames support

* Fix Pydantic v1 compatibility for x-propertyNames

Use the model_validate utility function from util module instead of
calling JsonSchemaObject.model_validate() directly, which only
exists in Pydantic v2.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

* Add test for x-propertyNames non-dict branch coverage

Test that x-propertyNames with non-dict value (e.g., boolean) is
correctly ignored, achieving 100% diff coverage.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

* Add support for additional_imports in extra-template-data JSON (#2793)

* Update zensical to 0.0.15 (#2794)

* Add --use-field-description-example option (#2792)

* Add --use-field-description-example option

* docs: update CLI reference documentation and prompt data

🤖 Generated by GitHub Actions

* Add tests for complete branch coverage of docstring property

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Fix formatting in test file

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

Development

Successfully merging this pull request may close these issues.

Simple Use Case with CSV

1 participant