Improve alias configuration APIs #11468
CodSpeed Performance Report
Merging #11468 will not alter performance.

Deploying pydantic-docs
Latest commit: ffb5edd
Status: ✅ Deploy successful!
Preview URL: https://c14714b2.pydantic-docs.pages.dev
Branch Preview URL: https://alias-consistency-new-api.pydantic-docs.pages.dev
…_name
* Bump pydantic_core to appropriate PR
* Document new alias config settings: validate_by_name, validate_by_alias, and serialize_by_alias
* Deprecate and document populate_by_name
* Add initial API docs for new alias config settings
ef29d15 to 2fdfe89
Closing and reopening to trigger codspeed on the new |
tests/test_aliases.py (outdated)
    class Model(BaseModel):
        model_config = ConfigDict(**config_dict)

        a: int = Field(validation_alias='A')
I've heavily tested all of the config / runtime setting combos for each applicable schema type in `pydantic-core`. It feels redundant to do so here as well, so I've gone with the simple-model-only approach.
@sydney-runkle, ignore the Hyperlint failure on this PR.
to empower users with more fine grained validation control. In <v2.11, disabling validation by alias was not possible.

!!! tip
    If you set `validate_by_alias` to `False`, you should set `validate_by_name` to `True` to ensure that the field can still be populated.
This makes me think the literal pattern would really fit better here. If having this boolean pattern on two configuration values only introduced the inconsistency when setting both `validate_by_alias=False` and `validate_by_name=False`, it would be fine (I don't see why users would do so), but I won't be surprised if many users find it counter-intuitive that you also need to set `validate_by_name=True` here.
I think it's worth reconsidering, cc @samuelcolvin
Also, what should happen if you set `validate_by_alias=False`, but explicitly set `by_alias=True` or `by_name=True` during validation?
> Also, what should happen if you set `validate_by_alias=False`, but explicitly set `by_alias=True` or `by_name=True` during validation?

Validation-time settings always take priority when set. This is the same as with `strict`.
I'm sympathetic to the literal pattern argument. If we were starting fully from scratch, I think it might make more sense. Specifically, the boolean traps can be a bit confusing. In particular, the fact that you have to explicitly set `validate_by_name=True` if `validate_by_alias=False` is a bit confusing, especially for new users.
One thing we could do to mitigate this challenge is automatically set `validate_by_name=True` if a user sets `validate_by_alias=False`.
My thoughts re why we should stick with the 2 boolean flags:
- It represents less change to this setting compared to a switch to literals. There's already a lot of change going on here, and I'm hesitant to introduce a setting type change as well.
- 2 boolean flags provide greater configurability for interaction between config and runtime settings, as you can override one behavior and not the other. It's also helpful to have unset markers for each thing. For example:
  M1: validate_by_alias = True, validate_by_name = False
  M2: validate_by_alias = False, validate_by_name = True
  runtime setting: by_name = True
  ==>
  M1: alias and name validation
  M2: name only validation
  This can't be achieved with the literal approach. Either you'd use:
  - `validate_by='name'`, and M1 would lose alias validation
  - `validate_by='name and alias'`, and M2 would no longer avoid validating with alias
- Autocomplete is easier with boolean flags, and the behavior is relatively intuitive
Aliases are one of the most common (if not the most commonly used) field tools, so I do think this decision is quite important. I also understand that if we go with bools here, we're stuck with that until at least V4.
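The M1/M2 interaction described in this comment can be modeled in a few lines of plain Python. This is only a sketch of the stated rule (a runtime setting wins when it is set, otherwise the config value applies); the helper names `resolve` and `effective_modes` are hypothetical and this is not how pydantic or pydantic-core implement it.

```python
from typing import Optional, Tuple


def resolve(config_value: bool, runtime_value: Optional[bool]) -> bool:
    """Return the runtime value when it is set, else fall back to the config value."""
    return config_value if runtime_value is None else runtime_value


def effective_modes(
    config: dict,
    by_alias: Optional[bool] = None,
    by_name: Optional[bool] = None,
) -> Tuple[bool, bool]:
    """Resolve (alias_enabled, name_enabled) for a single validation call."""
    return (
        resolve(config["validate_by_alias"], by_alias),
        resolve(config["validate_by_name"], by_name),
    )


# The two hypothetical model configurations from the comment above.
M1 = {"validate_by_alias": True, "validate_by_name": False}
M2 = {"validate_by_alias": False, "validate_by_name": True}

# Only by_name is overridden at runtime; by_alias stays unset (None).
m1_modes = effective_modes(M1, by_name=True)  # alias and name validation
m2_modes = effective_modes(M2, by_name=True)  # name-only validation
```

Because each flag has its own unset marker, overriding `by_name` leaves each model's alias behavior untouched, which is the configurability argument made for the two-flag design.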
Yes, as discussed on Slack. Thanks for summing things up here; this might be useful as a reference in case we get questions about the current API.
As we also discussed, defaulting `validate_by_name` to `True` if `validate_by_alias` is set to `False` is postponed until after this PR, and should be tackled either before 2.11 or after. Leaving this conversation unresolved so that it's easier to find later.
Co-authored-by: Victorien <[email protected]>
Need to fix a few things (third party test investigation, mypy, docs build, etc). However, great to see that all of our tests are passing - big step!
@@ -88,7 +88,7 @@ that the version support policy is subject to change at discretion of contributo

 * Any required fields that don't have dynamically-determined aliases will be included as required
   keyword arguments.
-* If the [`populate_by_name`][pydantic.ConfigDict.populate_by_name] model configuration value is set to
+* If the [`validate_by_name`][pydantic.ConfigDict.validate_by_name] model configuration value is set to
I'm wondering whether we should have another pair of `ConfigDict` settings, `instantiate_by_name` and `instantiate_by_alias`, to distinguish direct instantiation from `model_validate_X` (and we should allow them to coexist, resulting in an overloaded `__init__`).
For background see #8379 (comment) and #6762
You mean `instantiate_by_*` would only take effect on direct instantiation (i.e. `Model(...)`)?
This would really complicate the API. Using `__init__` directly is better suited when you provide the arguments directly (e.g. `Model(a=1, b='test')`). In that case, the user can simply provide the aliases (and this is what static type checkers will enforce; we have no control over it).
If you want to validate data where you don't control the provided keys, then `model_validate()` is better suited anyway: `Model.model_validate({'a': 1, 'b': 'test'})`, and you can provide `by_name=True` there.
> Using `__init__` directly is better suited when you provide the arguments directly (e.g. `Model(a=1, b='test')`).

This is exactly why some developers (I would say most, if we consider popular serialization frameworks in other languages) are reluctant to use aliases in direct instantiation. Field names are carefully chosen according to language conventions, e.g. `snake_case` for Python, while aliases are decided by business logic. We wouldn't like to see things like `Model(SomeRandomValue=147, Env_Global='test')` in a Python code review, and it would be unfortunate if we needed to annotate every `model_validate_X` call for this reason.
If we don't want that complexity, I would suggest enforcing `validate_by_name` for `__init__` (keeping the current behavior in v2 for compatibility). This is intuitive and aligned with `dataclass` and other frameworks in statically typed languages.
`validate_by_name` logic still applies to `__init__` :)
> This is intuitive and aligned with dataclass and other frameworks in statically typed languages.

Dataclasses don't make use of aliases, but this is something supported by the `@dataclass_transform` spec, and as per the field specifiers section:

> `alias` is an optional str parameter that provides an alternative name for the field. This alternative name is used in the synthesized `__init__` method.

But I get your point, `Model(SomeRandomValue=147, Env_Global='test')` feels weird in Python code. The fact that type checkers will enforce aliases in `__init__` is unfortunate though.
This merits a broader discussion. Currently we don't have a proper distinction between direct instantiation (`__init__`) and the `model_validate(_*)` methods when it comes to validation behavior.
@stevapple, feel free to open an issue with a summary of this discussion!
Co-authored-by: Victorien <[email protected]>
19d1e8b to b3aaa78
b3aaa78 to ffb5edd
Great work, good to see this highly requested issue (almost) tackled!
The CI of the merge commit is failing: https://github.com/pydantic/pydantic/actions/runs/13547096675/job/37861427725
This PR introduces a few new features and a few changes. They might be easier to digest by checking out this simple API changes snippet:
Here's the longer version:
- `validate_by_alias` has been introduced as a `bool` type configuration flag, set to `True` by default.
- `populate_by_name` has been deprecated in favor of `validate_by_name` (for consistency with `validate_by_alias`). This is set to `False` by default to match the behavior of `populate_by_name`. Though `populate_by_name` is deprecated, we include a patch in the configuration init logic so that this setting still works in the short term. This will be removed in V3.
  - New feature capability: you can now set `validate_by_alias = False` and `validate_by_name = True` if you want to only allow validation by field name. This restriction was not possible with solely the `populate_by_name` configuration setting.
  - Note: You cannot set both `validate_by_name` and `validate_by_alias` to `False`. This results in a schema error.
- `serialize_by_alias` has been introduced as a `bool` type configuration flag, set to `False` by default to match the `by_alias` setting on model serialization functions. We anticipate changing this default in V3 to be consistent with `validate_by_alias = True` by default.

Most of these changes are described in more detail, and practically implemented, in pydantic/pydantic-core#1640.
This makes significant progress on #8379. Some changes (like default value changes) will have to wait for V3.