Add regex patterns reflecting Decimal Constraints to the Decimal type's string schema #11420

Jack70248 wants to merge 11 commits into pydantic:main from
Conversation
Coverage report generated by python-coverage-comment-action.
CodSpeed Performance Report: merging #11420 will not alter performance.
This reverts commit 7014353.
please review - thanks!
Apologies, I've spotted a couple of outstanding issues. I'll pull this PR back to draft until I can push an update:
I'll also add in support for arbitrary trailing zeroes in validation, on the basis that Pydantic successfully validates

This should be ready for review now, thanks @sydney-runkle
```python
# ##### Regex for Decimal JSON Schema Generation #####

_DECIMAL_JSON_VALIDATION_MAX_DIGIT_LOOKAHEAD_PATTERN = (
    r'(?=\d{{0,{max_digits}}}0*$'  # Positive lookahead for max_digits, allows trailing zeroes
```
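To make the excerpt above concrete, here is a hedged sketch of how such a lookahead template could be completed and compiled. The completion (the closing `)` and the `\d+$` body) is assumed for illustration, not the PR's final pattern:

```python
import re

# Hypothetical completion of the lookahead excerpt above -- NOT the PR's
# final pattern. The doubled braces escape literal regex braces so that
# str.format() can inject max_digits.
_MAX_DIGIT_LOOKAHEAD = r'(?=\d{{0,{max_digits}}}0*$)'

pattern = re.compile('^' + _MAX_DIGIT_LOOKAHEAD.format(max_digits=4) + r'\d+$')

print(bool(pattern.match('1234')))      # four digits pass
print(bool(pattern.match('12345')))     # five digits fail the lookahead
print(bool(pattern.match('12340000')))  # trailing zeros slip through
```

The last line shows the behaviour discussed in the review below: `0*` lets arbitrary trailing zeros satisfy the digit cap.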
This will match arbitrary trailing zeros, which is only allowed after the `.` separator:

```python
from decimal import Decimal
from typing import Annotated

from pydantic import Field, TypeAdapter

ta = TypeAdapter(Annotated[Decimal, Field(max_digits=4)])
ta.validate_python("30000000")  # ValidationError, yet the input passes the regex:
ta.json_schema()['anyOf'][1]['pattern']
#> '^-?0*(?=\\d{0,4}0*$|(?=.*\\..*)[\\d\\.]{0,5}0*$)\\d{1,}(?:\\.\\d{0,})?0*$'
```
But I don't know if this is reasonably easy to support, so perhaps we should use a simpler regex (one that accepts invalid inputs in some cases) when only max_digits is specified.
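As a sketch of what such a "simpler but permissive" pattern could look like (hypothetical, not a concrete proposal from the thread): cap the integer and fractional parts at max_digits each, which over-accepts values whose total digit count exceeds the limit:

```python
import re

# Hypothetical permissive alternative when only max_digits is set: each
# side of the decimal point is capped at max_digits, so the *total*
# digit count can still exceed the limit.
max_digits = 4
permissive = re.compile(rf'^-?\d{{1,{max_digits}}}(?:\.\d{{1,{max_digits}}})?$')

print(bool(permissive.match('1234')))    # valid: 4 digits
print(bool(permissive.match('12345')))   # correctly rejected: 5 integer digits
print(bool(permissive.match('12.345')))  # over-accepted: 5 digits total
```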
Thanks @Viicos. I've realised this issue also occurs when both `max_digits` and `decimal_places` are set. I've been able to allow trailing zeroes only if a decimal place exists, using the `(?(id/name)yes-pattern|no-pattern)` syntax in the `max_digits` lookahead and always including the lookahead if `max_digits` is set.
While looking into a solution I've also been able to simplify the `max_digits` lookahead a bit.
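For readers unfamiliar with the conditional-group syntax mentioned above, here is a minimal toy example (a phone-number pattern, deliberately unrelated to the Decimal patterns) showing `(?(id)yes-pattern|no-pattern)` in Python's `re`:

```python
import re

# Group 1 optionally captures an opening paren; the conditional (?(1)...)
# then requires the closing paren only when group 1 participated in the match.
pat = re.compile(r'^(\()?\d{3}(?(1)\)|)\d{4}$')

print(bool(pat.match('(123)4567')))  # balanced parens: match
print(bool(pat.match('1234567')))    # no parens: match
print(bool(pat.match('(1234567')))   # unbalanced: no match
```

The same mechanism lets a pattern permit trailing zeros only when an earlier group (e.g. one capturing the decimal point) matched.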
Thanks and sorry for the delayed response. We are almost there. One issue I noticed is that during serialization, we don't make use of the normalized version:

```python
import re
from decimal import Decimal
from typing import Annotated

from pydantic import TypeAdapter, Field

ta = TypeAdapter(Annotated[Decimal, Field(max_digits=4)])
reg = ta.json_schema(mode='serialization')['pattern']
val = ta.validate_python("0000001.001000")
print(val)
#> 1.001000
dumped = ta.dump_python(val, mode='json')
print(dumped)
#> 1.001000
re.compile(reg).match(dumped)  # None
```
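The mismatch comes down to trailing zeros surviving serialization. A stdlib-only illustration of the normalization in question, including a caveat any fix would need to handle:

```python
from decimal import Decimal

# Decimal.normalize() strips trailing zeros, but for integral values it
# switches to scientific notation, which the schema pattern wouldn't match.
print(str(Decimal('1.001000').normalize()))  # 1.001
print(str(Decimal('100').normalize()))       # 1E+2  <- caveat
```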
Thanks @Viicos, and no problem on the delay. I've got a few things on for the next week but will be able to push an update later next week.
Hi @Viicos, I might need a hand to wrap my head around the solution to this problem. It sounds like we want the output of the TypeAdapter's dump_python() method (and BaseModel's model_dump() method?) to return the normalised Decimal form. It looks like the default serializers for this come from pydantic_core, but there are mechanisms to override these by defining custom serializers such as a PlainSerializer, or using something like

Let me know if I've misunderstood anything, thanks!
We could, but this would be a breaking change so it can only be considered for v3. |
Handled in #11420. |
Change Summary
This PR builds on and addresses remaining requirements from #11016
Based on the remaining requirements from the previous PR I have:
Updated the patterns and corresponding logic in json_schema.py. These should now handle the following cases for both validation and serialization:
Removed the test that was added in the previous PR, and added a few more parametrized test cases for the test_constraints_schema_validation() and test_constraints_schema_serialization() tests - ruff format split these out over quite a few lines due to their length.

I've been able to reuse the patterns for all cases by using bracketed repetition quantifiers, where we just leave the upper bound empty if it doesn't need to be enforced by max_digits or decimal_places. This does have a downside in that the output of json_schema() for the user will be slightly less succinct, using {0,} or {1,} in some places where * or + feels more natural. What this looks like can be seen in the new test cases (example here).

I also have a couple of questions:
Do we want to allow for trailing zeroes in validation and/or serialization? Implemented for validation.

Pydantic accepts `1.` as a valid Decimal, but the pattern wouldn't match against `.1` due to it requiring at minimum one integer place:

`.1` - we can drop the requirement for a minimum of one integer place, but then we'd match against `.` and empty entries. The best way to handle it might be a minimum digit requirement in the positive lookahead, where we don't care if it's an integer or decimal, and then always include the lookahead.

`1.` - this should be an easy fix by requiring a minimum of one digit in the optional decimal place capture group.

Related issue number
fix #10867
Checklist
Selected Reviewer: @sydney-runkle