Improve alias configuration APIs #11468

Merged
merged 25 commits into from
Feb 26, 2025

Conversation

sydney-runkle
Contributor

@sydney-runkle sydney-runkle commented Feb 19, 2025

This PR introduces a few new features and a few changes. They might be easier to digest by checking out this simple API changes snippet:

class ConfigDict:
    validate_by_alias: bool = True
    serialize_by_alias: bool = False
        in v3, serialize_by_alias default should change to True

    populate_by_name: bool = False ---> validate_by_name: bool = False

def model_dump_X(by_alias: bool = False, ...):
    in v3, by_alias default should change to True

def model_validate_X(by_alias: bool = True, by_name: bool = False, ...):

Here's the longer version:

  1. validate_by_alias has been introduced as a bool type configuration flag, set to True by default.
  2. populate_by_name has been deprecated in favor of validate_by_name (for consistency with validate_by_alias). validate_by_name is set to False by default to match the behavior of populate_by_name. Although populate_by_name is deprecated, we include a patch in the configuration init logic so that the setting still works in the short term. This patch will be removed in V3.

New capability: you can now set validate_by_alias = False and validate_by_name = True if you want to allow validation by field name only. This restriction could not be expressed with the populate_by_name configuration setting alone.

Note: You cannot set both validate_by_name and validate_by_alias to False. This results in a schema error.

  3. serialize_by_alias has been introduced as a bool type configuration flag, set to False by default to match the by_alias setting on model serialization functions. We anticipate changing this default in V3 to be consistent with validate_by_alias = True by default.

Most of these changes are described in more detail, and practically implemented, in pydantic/pydantic-core#1640.

This makes significant progress on #8379. Some changes (like default value changes) will have to wait for V3.

@github-actions github-actions bot added the relnotes-fix Used for bugfixes. label Feb 19, 2025

codspeed-hq bot commented Feb 19, 2025

CodSpeed Performance Report

Merging #11468 will not alter performance

Comparing alias-consistency-new-api (ffb5edd) with main (52fe685)

Summary

✅ 46 untouched benchmarks


cloudflare-workers-and-pages bot commented Feb 19, 2025

Deploying pydantic-docs with  Cloudflare Pages  Cloudflare Pages

Latest commit: ffb5edd
Status: ✅  Deploy successful!
Preview URL: https://c14714b2.pydantic-docs.pages.dev
Branch Preview URL: https://alias-consistency-new-api.pydantic-docs.pages.dev


…_name

* Bump pydantic_core to appropriate PR
* Document new alias config settings: validate_by_name, validate_by_alias, and serialize_by_alias
* Deprecate and document populate_by_name
* Add initial API docs for new alias config settings
@sydney-runkle sydney-runkle force-pushed the alias-consistency-new-api branch from ef29d15 to 2fdfe89 Compare February 20, 2025 12:12
@sydney-runkle sydney-runkle added relnotes-feature relnotes-change Used for changes to existing functionality which don't have a better categorization. and removed relnotes-fix Used for bugfixes. labels Feb 20, 2025
@sydney-runkle
Contributor Author

sydney-runkle commented Feb 21, 2025

Closing and reopening to trigger codspeed on the new pydantic-core commits. Assume that anytime I do this here, that's the purpose :)

@sydney-runkle sydney-runkle marked this pull request as ready for review February 22, 2025 16:00
Comment on lines 797 to 800
class Model(BaseModel):
    model_config = ConfigDict(**config_dict)

    a: int = Field(validation_alias='A')
Contributor Author

I've heavily tested all of the config / runtime setting combos for each applicable schema type in pydantic-core. It feels redundant to do so here as well, so I've gone with the simple model only approach.

@bllchmbrs
Contributor

@sydney-runkle, ignore the Hyperlint failure on this PR.

to empower users with more fine grained validation control. In <v2.11, disabling validation by alias was not possible.

!!! tip
    If you set `validate_by_alias` to `False`, you should set `validate_by_name` to `True` to ensure that the field can still be populated.
Member

This makes me think the literal pattern would really fit better here. If this boolean pattern on two configuration values only introduced an inconsistency when setting both validate_by_alias=False and validate_by_name=False, it would be fine (I don't see why users would do so), but I won't be surprised if many users find it counter-intuitive that you also need to set validate_by_name=True here.

I think it's worth reconsidering, cc @samuelcolvin

Member

Also, what should happen if you set validate_by_alias=False, but explicitly set by_alias=True or by_name=True during validation?

Contributor Author

> Also, what should happen if you set validate_by_alias=False, but explicitly set by_alias=True or by_name=True during validation?

Validation-time settings always take priority when they are set. This is the same as with strict.

Contributor Author

I'm sympathetic to the literal pattern argument. If we were starting fully from scratch, I think it might make more sense. Boolean traps can be confusing: in particular, having to explicitly set validate_by_name=True whenever validate_by_alias=False is not obvious, especially for new users.

One thing we could do to mitigate this challenge is automatically set validate_by_name=True if a user sets validate_by_alias=False.

My thoughts re why we should stick with the 2 boolean flags:

  • It represents less change to this setting compared to a switch to literals - there's already a lot of change going on here, and I'm hesitant to introduce a setting type change as well.
  • 2 boolean flags provide greater configurability for interaction between config and runtime settings, as you can override one behavior and not the other. It's also helpful to have unset markers for each thing. For example:

    M1: validate_by_alias = True, validate_by_name = False
    M2: validate_by_alias = False, validate_by_name = True

    runtime setting: by_name = True

    ==>

    M1: alias and name validation
    M2: name only validation

    This can't be achieved with the literal approach. Either you'd use:

      • validate_by='name', and M1 would lose alias validation
      • validate_by='name and alias', and M2 would no longer avoid validating with alias

  • Autocomplete is easier with boolean flags, and the behavior is relatively intuitive

Aliases are one of the most common (if not the most commonly used) field tools, so I do think this decision is quite important. I also understand that if we go with bools here, we're stuck with that until at least V4.

Member

Yes, as discussed on Slack, thanks for summing things up here, this might be useful as a reference in case we get questions about the current API.

As we also discussed, defaulting validate_by_name to True if validate_by_alias is set to False is postponed until after this PR, and should be tackled either before 2.11 or after. Leaving this conversation unresolved so that it's easier to find later.

@sydney-runkle sydney-runkle added the third-party-tests Add this label on a PR to trigger 3rd party tests label Feb 25, 2025
@sydney-runkle
Contributor Author

Need to fix a few things (third party test investigation, mypy, docs build, etc). However, great to see that all of our tests are passing - big step!

@@ -88,7 +88,7 @@ that the version support policy is subject to change at discretion of contributo

* Any required fields that don't have dynamically-determined aliases will be included as required
keyword arguments.
- * If the [`populate_by_name`][pydantic.ConfigDict.populate_by_name] model configuration value is set to
+ * If the [`validate_by_name`][pydantic.ConfigDict.validate_by_name] model configuration value is set to

I'm wondering whether we should have another pair of ConfigDict settings, instantiate_by_name and instantiate_by_alias, to distinguish direct instantiation from model_validate_X (and we should allow them to coexist, resulting in an overloaded __init__).

For background see #8379 (comment) and #6762

Member

You mean instantiate_by_* would only take effect on direct instantiation (i.e. Model(...))?

This would really complicate the API. Using __init__ directly is better suited when you provide the arguments directly (e.g. Model(a=1, b='test')). In that case, the user can simply provide the aliases (and this is what static type checkers will enforce, we have no control over it).

If you want to validate data where you don't control the provided keys, then model_validate() is better suited anyway: Model.model_validate({'a': 1, 'b': 'test'}), and you can provide by_name=True there.

@stevapple stevapple Feb 26, 2025

> Using __init__ directly is better suited when you provide the arguments directly (e.g. Model(a=1, b='test')).

This is exactly why some (I would say most if we consider popular serialization frameworks in other languages) developers are reluctant to use alias in a direct instantiation. Field names are carefully chosen according to language conventions, e.g. snake_case for Python, while aliases are decided by business logic. We wouldn't like to see things like Model(SomeRandomValue=147, Env_Global='test') in a Python code review, and it would be unfortunate if we need to annotate every model_validate_X call for this reason.

If we don't want that complexity, I would suggest enforcing validate_by_name for __init__ (keep the current behavior in v2 for compatibility). This is intuitive and aligned with dataclass and other frameworks in statically typed languages.

Contributor Author

validate_by_name logic still applies to __init__ :)

Member

> This is intuitive and aligned with dataclass and other frameworks in statically typed languages.

Dataclasses don't make use of aliases, but this is something supported by the @dataclass_transform spec, and as per the field specifiers section:

> alias is an optional str parameter that provides an alternative name for the field. This alternative name is used in the synthesized __init__ method.

But I get your point, Model(SomeRandomValue=147, Env_Global='test') feels weird in Python code. The fact that type checkers will enforce aliases in __init__ is unfortunate though.

This merits a broader discussion. Currently, we don't have a proper distinction between direct instantiation (__init__) and the model_validate(_*) methods when it comes to validation behavior.

Contributor Author

@stevapple, feel free to open an issue with a summary of this discussion!

Contributor

github-actions bot commented Feb 26, 2025

Coverage report

This PR does not seem to contain any modification to coverable code.

@sydney-runkle sydney-runkle changed the title Improve alias management API Improve alias configutation APIs Feb 26, 2025
@sydney-runkle sydney-runkle force-pushed the alias-consistency-new-api branch 3 times, most recently from 19d1e8b to b3aaa78 Compare February 26, 2025 14:54
@sydney-runkle sydney-runkle force-pushed the alias-consistency-new-api branch from b3aaa78 to ffb5edd Compare February 26, 2025 15:06
Member

@Viicos Viicos left a comment

Great work, good to see this highly requested issue (almost) tackled!

@sydney-runkle sydney-runkle merged commit acb0f10 into main Feb 26, 2025
85 checks passed
@sydney-runkle sydney-runkle deleted the alias-consistency-new-api branch February 26, 2025 15:11
@sydney-runkle sydney-runkle changed the title Improve alias configutation APIs Improve alias configuration APIs Mar 6, 2025
Labels
relnotes-change Used for changes to existing functionality which don't have a better categorization. relnotes-feature third-party-tests Add this label on a PR to trigger 3rd party tests

5 participants