Skip to content

Add decimal precision constraints to JSON schema#11016

Closed
KCui0327 wants to merge 4 commits intopydantic:mainfrom
KCui0327:decimal_constraint
Closed

Add decimal precision constraints to JSON schema#11016
KCui0327 wants to merge 4 commits intopydantic:mainfrom
KCui0327:decimal_constraint

Conversation

@KCui0327
Copy link
Copy Markdown

@KCui0327 KCui0327 commented Dec 1, 2024

Change Summary

This PR added a decimal precision pattern validation to JSON schema generation. The implementation enforces max_digits and decimal_places constraints by adding a regex pattern. A corresponding unit test was also added to test the changes.

Related issue number

fix #10867

Checklist

  • The pull request title is a good summary of the changes - it will be used in the changelog
  • Unit tests for the changes exist
  • Tests pass on CI
  • Documentation reflects the changes where applicable
  • My PR is ready to review, please add a comment including the phrase "please review" to assign reviewers

Selected Reviewer: @sydney-runkle

@github-actions github-actions Bot added the relnotes-fix Used for bugfixes. label Dec 1, 2024
@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented Dec 1, 2024

CodSpeed Performance Report

Merging #11016 will not alter performance

Comparing KCui0327:decimal_constraint (c159bea) with main (32f405b)

Summary

✅ 45 untouched benchmarks

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Dec 1, 2024

Coverage report

Click to see where and how coverage changed

FileStatementsMissingCoverageCoverage
(new stmts)
Lines missing
  pydantic
  json_schema.py 712-720
Project Total  

This report was generated by python-coverage-comment-action

@KCui0327
Copy link
Copy Markdown
Author

KCui0327 commented Dec 1, 2024

please review (thanks!)

Copy link
Copy Markdown
Member

@Viicos Viicos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. A couple remarks on the pattern implementation:

  • Can't we also set a pattern if only decimal_places is set?
  • I'm wondering if we should special case leading zeros differently. What we could do is:
    • in validation, allow arbitrary leading zeros to be specified. Pydantic will not error during validation.
    • in serialization, only allow a single 0 or [1-9] followed by the constrained number of integers.
  • Taking into account what was said above, let's define the pattern as a private constant in this module, and explain it with comments (and then compile with the re.VERBOSE flag), e.g:
     _DECIMAL_PATTERN = r"""
     ^-?      # Possible minus sign
     ...
     """

@KCui0327 KCui0327 requested a review from Viicos December 6, 2024 08:47
@KCui0327
Copy link
Copy Markdown
Author

KCui0327 commented Dec 6, 2024

Thanks for the PR. A couple remarks on the pattern implementation:

* Can't we also set a pattern if only `decimal_places` is set?

* I'm wondering if we should special case leading zeros differently. What we could do is:
  
  * in validation, allow arbitrary leading zeros to be specified. Pydantic will not error during validation.
  * in serialization, only allow a single 0 or `[1-9]` followed by the constrained number of integers.

* Taking into account what was said above, let's define the pattern as a private constant in this module, and explain it with comments (and then compile with the `re.VERBOSE` flag), e.g:
  ```python
   _DECIMAL_PATTERN = r"""
   ^-?      # Possible minus sign
   ...
   """
  ```

Hi @Viicos, thank you for your feedback! I have address your feedback in the latest commits, please let me know if there's any other changes I should add.

Copy link
Copy Markdown
Member

@Viicos Viicos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I think this does not yet handle all cases:

  • in validation, you can either have:
    • decimal_places set, max_digits not set. In that case, your current _DECIMAL_JSON_VALIDATION_PATTERN handles this.
    • decimal_places set and max_digits set. In that case, we need a new regex.
    • decimal_places not set and max_digits set. In that case, the regex will be limited but you can make use of a positive lookahead, e.g. "^(?=.{0,n+1}$){decimal_regex}$", with n being max_digits.
  • in serialization, similar logic applies.

Comment thread pydantic/json_schema.py
^
-? # Minus sign (optional)
\d+ # One or more digits for the integer part (allows leading zeros)
( # Optional decimal group
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make it non capturing? (same for the other regex). Or is there a reason not to do so?

Comment thread tests/test_json_schema.py
@@ -935,11 +935,41 @@ class Model(BaseModel):

def test_decimal_constraints():
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test no longer test anything related to JSON Schema. I believe we already have plenty of such tests in test_types.py

@sydney-runkle sydney-runkle added awaiting author revision awaiting changes from the PR author and removed ready for review labels Jan 2, 2025
@sydney-runkle
Copy link
Copy Markdown
Contributor

Hey @KCui0327,

Are you still interested in working on this? Thanks for your contributions thus far!

@Viicos Viicos mentioned this pull request Jan 25, 2025
1 task
@KCui0327
Copy link
Copy Markdown
Author

KCui0327 commented Jan 30, 2025

Hey @KCui0327,

Are you still interested in working on this? Thanks for your contributions thus far!

Hi @sydney-runkle ,

Apologies for the late response. I will not be able to actively contribute to this PR for the next few months so I want to open it to anyone who wants to take this PR further and address the comments from Viccos.

@hellt
Copy link
Copy Markdown

hellt commented Feb 2, 2025

at pydantify we are looking forward to have this merged as well. Thanks for all the work on this PR

@Jack70248
Copy link
Copy Markdown
Contributor

Hi @sydney-runkle and @Viicos

I can have a look at finishing this off. I should be able to submit an updated PR later this week.

I was reviewing the current PR and noticed the use of a verbose regex pattern makes its way through to the json_schema() output, and it doesn't look like a re.Pattern object provides an easy way to extract the regex without the comments and extra white space.

{'anyOf': [{'maximum': 10.0, 'minimum': 0.0, 'type': 'number'}, {'pattern': '\n ^\n -? # Minus sign (optional)\n \\d+ # One or more digits for the integer part (allows leading zeros)\n ( # Optional decimal group\n \\. # Decimal point\n \\d{0,2} # Up to 2 decimal digits\n )?\n $\n', 'type': 'string'}]}

@Viicos
Copy link
Copy Markdown
Member

Viicos commented Feb 3, 2025

Thanks for looking into it.

My bad, I thought compiling the verbose pattern and using the pattern attribute would give us the final pattern with comments and extra spaces stripped.

Instead, lets use a simple multi-line string and add comments this way:

_DECIMAL_JSON_SERIALIZATION_PATTERN = (
    r"^-?"                                  # Minus sign (optional)
    r"(?:0|[1-9]\d{{0,{integer_places}}})"  # Integer part: single zero OR single non-zero digit...
)

@sydney-runkle
Copy link
Copy Markdown
Contributor

Looks like this will be finished up by #11420, thanks for the initial work on this PR!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

awaiting author revision awaiting changes from the PR author relnotes-fix Used for bugfixes.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

max_digits and decimal_places do not serialize to json_schema for decimal

5 participants