Skip to content

Add regex patterns reflecting Decimal Constraints to the Decimal type's string schema#11420

Closed
Jack70248 wants to merge 11 commits intopydantic:mainfrom
Jack70248:decimal_constraint
Closed

Add regex patterns reflecting Decimal Constraints to the Decimal type's string schema#11420
Jack70248 wants to merge 11 commits intopydantic:mainfrom
Jack70248:decimal_constraint

Conversation

@Jack70248
Copy link
Copy Markdown
Contributor

@Jack70248 Jack70248 commented Feb 8, 2025

Change Summary

This PR builds on and addresses remaining requirements from #11016

Based on the remaining requirements from the previous PR I have:

  1. Updated the patterns and corresponding logic in json_schema.py. These should now handle the following cases for both validation and serialization:

    • decimal_places and max_digits set: both decimal_places and integer_places enforced via a repetition quantifier, the total of the maximum of integer_places and decimal_places enforces max_digits
    • decimal_places set and max_digits not set: decimal_places enforced via a repetition quantifier, excludes the lookahead and leaves the upper bound of integer_places repetition quantifier blank
    • decimal_places not set and max_digits set: max_digits enforced via positive lookahead, integer_places and decimal_places repetition quantifier upper bound left blank
  2. Removed the test that was added in the previous PR, and added a few more parametrize test cases for test_constraints_schema_validation() and test_constraints_schema_serialization() tests - ruff format split these out over quite a few lines due to their length

I've been able to reuse the patterns for all cases by using bracketed repetition quantifiers where we just leave the upper bound empty if it doesn't need to be enforced by max_digits or decimal_places. This does have a downside in that the output of json_schema() for the user will be slightly less succinct, using {0,} or {1,} in some places where * or + feels more natural. What this looks like can be seen in the new test cases, example here

I also have a couple of questions:

  • Do we want to allow for trailing zeroes in validation and/or serialization? Implemented for validation
  • We currently match against something like 1. as a valid Decimal, but wouldn't match against .1 due to the pattern requiring at minimum one integer place.
    • Do we want to match against entries like .1? We can drop the requirement for a minimum of one integer place but then we'd match against . and empty entries - the best way to handle it might be a minimum digit requirement in the positive lookahead, where we don't care if its an integer or decimal, and then always include the lookahead.
    • Instead if we shouldn't be matching against entries like 1. this should be an easy fix by requiring a minimum of one digit in the optional decimal place capture group

Related issue number

fix #10867

Checklist

  • The pull request title is a good summary of the changes - it will be used in the changelog
  • Unit tests for the changes exist
  • Tests pass on CI
  • Documentation reflects the changes where applicable
  • My PR is ready to review, please add a comment including the phrase "please review" to assign reviewers

Selected Reviewer: @sydney-runkle

@github-actions github-actions Bot added the relnotes-fix Used for bugfixes. label Feb 8, 2025
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Feb 8, 2025

Coverage report

Click to see where and how coverage changed

FileStatementsMissingCoverageCoverage
(new stmts)
Lines missing
  pydantic
  json_schema.py
Project Total  

This report was generated by python-coverage-comment-action

@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented Feb 8, 2025

CodSpeed Performance Report

Merging #11420 will not alter performance

Comparing Jack70248:decimal_constraint (88da428) with main (10af6a8)

Summary

✅ 46 untouched benchmarks

@Jack70248 Jack70248 changed the title Decimal constraint Add regex patterns reflecting user-defined constraints to the Decimal type's string schema Feb 8, 2025
@Jack70248 Jack70248 changed the title Add regex patterns reflecting user-defined constraints to the Decimal type's string schema Add regex patterns reflecting Decimal Constraints to the Decimal type's string schema Feb 8, 2025
@Jack70248 Jack70248 marked this pull request as ready for review February 9, 2025 02:46
@Jack70248
Copy link
Copy Markdown
Contributor Author

please review - thanks!

@Jack70248
Copy link
Copy Markdown
Contributor Author

Jack70248 commented Feb 11, 2025

Apologies, I've spotted a couple of outstanding issues. I'll pull this PR back to draft until I can push an update:

  • The lookahead is redundant when both max_digits and decimal_places are set
  • The lookahead isn't currently working with the requirement for arbitrary leading zeroes in validation This was actually fine

I'll also add in support for arbitrary trailing zeroes in validation, on the basis that Pydantic successfully validates 1111.000 with a constraint of 'max_digits':4

@Jack70248 Jack70248 marked this pull request as draft February 11, 2025 08:58
@sydney-runkle sydney-runkle added relnotes-feature awaiting author revision awaiting changes from the PR author and removed ready for review relnotes-fix Used for bugfixes. labels Feb 12, 2025
@Jack70248 Jack70248 marked this pull request as ready for review February 22, 2025 03:21
@Jack70248
Copy link
Copy Markdown
Contributor Author

This should be ready for review now, thanks @sydney-runkle

Comment thread pydantic/json_schema.py Outdated
# ##### Regex for Decimal JSON Schema Generation #####

_DECIMAL_JSON_VALIDATION_MAX_DIGIT_LOOKAHEAD_PATTERN = (
r'(?=\d{{0,{max_digits}}}0*$' # Positive lookahead for max_digits, allows trailing zeroes
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will match arbitrary trailing zeros, which is only allowed after the . separator:

ta = TypeAdapter(Annotated[Decimal, Field(max_digits=4)])

ta.validate_python("30000000") # ValidationError, passes regex:

ta.json_schema()['anyOf'][1]['pattern']
#> '^-?0*(?=\\d{0,4}0*$|(?=.*\\..*)[\\d\\.]{0,5}0*$)\\d{1,}(?:\\.\\d{0,})?0*$'

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I don't know if this is reasonably easy to support, so perhaps we should use a simpler (but one that accepts invalid inputs in some cases) regex when having only max_digits specified.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Viicos. I've realised this issue also occurs when both max_digits and decimal_places is set as well. I've been able to allow trailing zeroes only if a decimal place exists using the (?(id/name)yes-pattern|no-pattern) syntax in the max_digits lookahead and always including the lookahead if max_digits is set.

In looking in to a solution I've also been able to simplify the max_digit lookahead a bit.

@Viicos
Copy link
Copy Markdown
Member

Viicos commented Mar 27, 2025

Thanks and sorry for the delayed response. We are almost here. One issue I noticed is that during serialization, we don't make use of the normalized version:

import re
from decimal import Decimal
from typing import Annotated

from pydantic import TypeAdapter, Field


ta = TypeAdapter(Annotated[Decimal, Field(max_digits=4)])

reg = ta.json_schema(mode='serialization')['pattern']

val = ta.validate_python("0000001.001000")
print(val)
#> 1.001000
dumped = ta.dump_python(val, mode='json')
print(dumped)
#> 1.001000

re.compile(reg).match(dumped)  # None

@Jack70248
Copy link
Copy Markdown
Contributor Author

Thanks and sorry for the delayed response. We are almost here. One issue I noticed is that during serialization, we don't make use of the normalized version:

import re
from decimal import Decimal
from typing import Annotated

from pydantic import TypeAdapter, Field


ta = TypeAdapter(Annotated[Decimal, Field(max_digits=4)])

reg = ta.json_schema(mode='serialization')['pattern']

val = ta.validate_python("0000001.001000")
print(val)
#> 1.001000
dumped = ta.dump_python(val, mode='json')
print(dumped)
#> 1.001000

re.compile(reg).match(dumped)  # None

Thanks @Viicos , and no problem on the delay. I've got a few things on for the next week but will be able to push an update later next week.

@Jack70248
Copy link
Copy Markdown
Contributor Author

Hi @Viicos

I might need a hand to wrap my head around the solution to this problem. It sounds like we want the output of the TypeAdaptors dump_python() method (and BaseModel's dump_model() method?) to return the normalised Decimal form: str(Decimal("1.001000").normalize()) -> "1.001", so that it correctly matches against the serialization regex pattern.

It looks like the default serializers for this come from pydantic_core, but there's mechanisms to overwrite these through defining custom serializers such as a PlainSerializer or using something like core_schema.plain_serializer_function_ser_schema(), but it's not clear to me the best way to overwrite the TypeAdaptors/BaseModels serializer with a custom serializer, or if this should instead be changed in pydantic-core?

Let me know if I've misunderstood anything, thanks!

@Viicos
Copy link
Copy Markdown
Member

Viicos commented May 8, 2025

I might need a hand to wrap my head around the solution to this problem. It sounds like we want the output of the TypeAdaptors dump_python() method (and BaseModel's dump_model() method?) to return the normalised Decimal form: str(Decimal("1.001000").normalize()) -> "1.001", so that it correctly matches against the serialization regex pattern.

We could, but this would be a breaking change so it can only be considered for v3.

@Viicos
Copy link
Copy Markdown
Member

Viicos commented Jul 11, 2025

Handled in #11420.

@Viicos Viicos closed this Jul 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

awaiting author revision awaiting changes from the PR author relnotes-feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

max_digits and decimal_places do not serialize to json_schema for decimal

4 participants