More consistent and intuitive alias behavior for validation and serialization #8379

Open
sydney-runkle opened this issue Dec 15, 2023 · 27 comments
Labels: feature request, v3 (Under consideration for V3)

@sydney-runkle
Contributor

sydney-runkle commented Dec 15, 2023

Right now, we have some inconsistent behavior in terms of using aliases in validation and serialization.

By default, if an alias or validation_alias is defined on a field, we use the alias for validation. This behavior can be changed by setting populate_by_name to True on the model_config.

Conversely, if an alias or serialization_alias is defined on a field, that alias is not used by default for serialization. We must specify by_alias=True in the call to model_dump and the other serialization functions.
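The asymmetry described above can be reproduced with a minimal sketch, assuming pydantic v2 with its default settings. The `User` and `LenientUser` models and the `fullName` alias are made up for illustration.

```python
from pydantic import BaseModel, ConfigDict, Field, ValidationError


class User(BaseModel):
    # Illustrative model: the Python name and the external key differ.
    full_name: str = Field(alias="fullName")


# Validation uses the alias by default.
user = User.model_validate({"fullName": "Ada"})

# Serialization uses the field name by default; the alias is opt-in.
default_dump = user.model_dump()
aliased_dump = user.model_dump(by_alias=True)

# Without populate_by_name, the field name is rejected on input.
try:
    User.model_validate({"full_name": "Ada"})
    name_accepted = True
except ValidationError:
    name_accepted = False


class LenientUser(BaseModel):
    # populate_by_name=True accepts the field name in addition to the alias.
    model_config = ConfigDict(populate_by_name=True)

    full_name: str = Field(alias="fullName")


lenient = LenientUser.model_validate({"full_name": "Ada"})
```

Here `default_dump` is keyed by `full_name` while `aliased_dump` is keyed by `fullName`, even though both come from the same instance.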

I propose that in V3:

  • We use alias by default for both validation and serialization
  • We add a field (or fields, but ideally just one) to ConfigDict to support different behavior than the default

This is a breaking change, hence the V3 label.

Requests to change the inconsistent default behavior have been made for a few years, so I'm going to comb through issues and close those so we can centralize discussion here.

@marcussaad

Plus one on having this behavior match, with an option to pass to ConfigDict!

@bluenote10
Contributor

It would be great if this could be considered for V2 as well, because it looks more like a bug currently, no? The current behavior is not what the original plan for the alias system #5426 (comment) had specified (every other row in that specification table currently misbehaves). Also, the current behavior means that it is effectively not possible in V2 to decouple the Python field name from its serialized name -- which is kind of the primary feature of the alias system.

@sydney-runkle
Contributor Author

@bluenote10,

Though I agree that the current behavior is not great, changing the API is enough of a breaking change that I don't think it makes sense to do in V2.

Thanks for linking that comment - that's a helpful reference to have in the future on this issue.

> Also, the current behavior means that it is effectively not possible in V2 to decouple the Python field name from its serialized name -- which is kind of the primary feature of the alias system.

Could you please say more about this? I don't think I understand the issue you're having.

@bluenote10
Contributor

> Could you please say more about this? I don't think I understand the issue you're having.

I'm often seeing variations of code that contains funny comments like these:

class SomeModel(BaseModel):
    # Please don't change these field names even though they are poorly named!
    # They are determined by external API X / third-party JSON schema Y / ...
    foo: int
    bar: int

or even

class SomeModel(BaseModel):
    # Please don't fix the typo in this field name to avoid breaking backwards compatibility.
    field_wit_typo: int

This raises the question: Can we not simply decouple the field name visible in the code from the representation in the serialized layer? The alias system seems to be the feature that should provide exactly that. Ideally the code should look like:

class SomeModel(BaseModel):
    # We are decoupling the field names from external API X for better readability
    better_named_foo: int = Field(alias="foo")
    better_named_bar: int = Field(alias="bar")

However currently in V2 the alias system is a rather dangerous pitfall: It suggests that it addresses this use case but it doesn't! The above code looks right, and a shallow experiment may even wrongly conclude that it does what one expects. But as demonstrated in #8551, none of the 8 possible combinations of alias, validation_alias, serialization_alias and populate_by_name actually behaves correctly in the sense of fully decoupling the field name (behaving exactly like the un-aliased model)!

Identifying each and every occurrence of model_dump and adding by_alias is not "behaving like the un-aliased model": Forgetting to do so is very easy (especially in generic usages of models), and will be a big bug, because the code will serialize to an invalid payload. This strategy is way too bug prone to be a realistic work-around.

Having to use a custom serializer on top doesn't feel right either. Currently the alias system is non-intuitive, does not cover its primary use case, does not follow the docs and the specs in #5426 (comment), so it looks a bit more like a bug, no?
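To make the pitfall concrete, here is a small sketch (assuming pydantic v2 defaults) that reuses the aliased `SomeModel` from above. It shows that a plain `model_dump()` produces a payload the same model refuses to validate, while the `by_alias=True` payload round-trips.

```python
from pydantic import BaseModel, Field, ValidationError


class SomeModel(BaseModel):
    better_named_foo: int = Field(alias="foo")
    better_named_bar: int = Field(alias="bar")


original = SomeModel.model_validate({"foo": 1, "bar": 2})

# Forgetting by_alias yields keys that the external API does not know.
payload_default = original.model_dump()
payload_aliased = original.model_dump(by_alias=True)

# The default payload cannot even be fed back into the same model.
try:
    SomeModel.model_validate(payload_default)
    default_round_trips = True
except ValidationError:
    default_round_trips = False

# The aliased payload round-trips to an equal instance.
aliased_round_trips = SomeModel.model_validate(payload_aliased) == original
```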

@justin-snyder-slgg

This is a pretty major pitfall for my current attempt at upgrading a codebase to 2.x. While V2 aliases seemed promising at first, I am still running into a fair amount of frustration with regard to the by_alias=False default and surprising behavior where sometimes the wrong name is serialized into the JSON.

@alexanderankin

alexanderankin commented Mar 13, 2024

Are we just waiting for PRs on this, or are we not yet sold on a particular solution? Where are we at with this?

diff --git a/pydantic/config.py b/pydantic/config.py
index 7edf7c60..504d627d 100644
--- a/pydantic/config.py
+++ b/pydantic/config.py
@@ -161,6 +161,11 @@ class ConfigDict(TypedDict, total=False):
     3. The model is populated by the field name `'name'`.
     """
 
+    serialize_by_name: bool
+    """
+    counter part to populate_by_name
+    """
+
     use_enum_values: bool
     """
     Whether to populate models with the `value` property of enums, rather than the raw enum.
diff --git a/pydantic/main.py b/pydantic/main.py
index 525c8f98..fc61d8a7 100644
--- a/pydantic/main.py
+++ b/pydantic/main.py
@@ -363,6 +363,8 @@ class BaseModel(metaclass=_model_construction.ModelMetaclass):
         Returns:
             A JSON string representation of the model.
         """
+        if self.model_config.get("serialize_by_name", False):
+            by_alias = True
         return self.__pydantic_serializer__.to_json(
             self,
             indent=indent,

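Until a config option like this exists, a similar effect can be approximated in user code. The following is only a sketch, assuming pydantic v2; `AliasDumpModel` is a hypothetical base class, not part of pydantic. It only affects direct calls to `model_dump` and `model_dump_json`: a model nested inside a plain `BaseModel` parent still follows whatever `by_alias` value the parent was dumped with.

```python
from typing import Any

from pydantic import BaseModel, Field


class AliasDumpModel(BaseModel):
    """Hypothetical base class that dumps by alias unless told otherwise."""

    # The loose **kwargs signatures keep the sketch short; a type checker
    # may flag them as incompatible overrides.
    def model_dump(self, **kwargs: Any) -> dict[str, Any]:
        kwargs.setdefault("by_alias", True)
        return super().model_dump(**kwargs)

    def model_dump_json(self, **kwargs: Any) -> str:
        kwargs.setdefault("by_alias", True)
        return super().model_dump_json(**kwargs)


class Item(AliasDumpModel):
    better_named_foo: int = Field(alias="foo")


item = Item.model_validate({"foo": 1})
dumped = item.model_dump()                      # alias keys by default
dumped_by_name = item.model_dump(by_alias=False)  # explicit opt-out still works
dumped_json = item.model_dump_json()
```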

@samuelcolvin
Member

I think we can add an option to config now. We can't change the default behaviour until V3.

@fortify-avnenciu

fortify-avnenciu commented Jul 15, 2024

@samuelcolvin
Hi! Is anybody working on implementing the "new option in the config" approach?
I would like to pick it up if not.

I have not made any contributions to pydantic before, but it looks like a pretty manageable change.

@alexanderankin

alexanderankin commented Jul 15, 2024

I started learning Rust in order to contribute to pydantic-core but so far have not been able to get anything meaningful working. IIRC it's going to be a change to core.

Which is to say, I'm not working on it at the moment.

@sydney-runkle
Contributor Author

I don't know of anyone working on it at the moment.

This will definitely require changes in pydantic-core, as well as changes in pydantic. Most of the logical changes will be in pydantic-core, whereas the changes in pydantic will just reflect the new API.

@fortify-avnenciu or @jammymalina, feel free to take a stab!

@mmzeynalli

I have a different case that needs this feature, so I decided to share it in this thread too:

If I have a superclass model whose fields have aliases, I would like to be able to define how to serialize them in a subclass. For example, I have a class that converts a date object to month and year as ints, and two models that inherit from it:

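from datetime import date
from typing import Optional

from pydantic import BaseModel, ConfigDict, Field, computed_field


class DateToMonthYearSchema(BaseModel):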
    model_config = ConfigDict(populate_by_name=True)

    issue_date: date = Field(
        exclude=True,
        alias='start_date',
    )
    expire_date: Optional[date] = Field(
        default=None,
        exclude=True,
        alias='end_date',
    )

    @computed_field(alias='start_date_month')  # type: ignore[misc]
    @property
    def issue_date_month(self) -> int:
        return self.issue_date.month

    @computed_field(alias='start_date_year')  # type: ignore[misc]
    @property
    def issue_date_year(self) -> int:
        return self.issue_date.year
        
class UserWorkExperienceSchema(DateToMonthYearSchema):
    id: int
    currently_working: bool

class CertificateSchema(DateToMonthYearSchema):
    id: int

Ideally, I could use `.model_dump(by_alias=True)`; however, these two models are also part of another model:

class UserProfileSchema(BaseModel):
    work_experiences: list[UserWorkExperienceSchema] = []
    certificates: list[CertificateSchema] = []

So, I get either start_date or issue_date when serializing UserProfileSchema
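A reduced sketch of this situation, assuming pydantic v2: `Period` and `Profile` are made-up stand-ins for the schemas above. It illustrates that `by_alias` is a single flag for the whole `model_dump` call, so it applies to the parent and to every nested model alike and cannot be chosen per model.

```python
from datetime import date

from pydantic import BaseModel, ConfigDict, Field, computed_field


class Period(BaseModel):
    model_config = ConfigDict(populate_by_name=True)

    # Excluded from output; only the computed month is serialized.
    issue_date: date = Field(alias="start_date", exclude=True)

    @computed_field(alias="start_date_month")
    @property
    def issue_date_month(self) -> int:
        return self.issue_date.month


class Profile(BaseModel):
    display_name: str = Field(alias="displayName")
    periods: list[Period] = []


profile = Profile.model_validate(
    {"displayName": "Ada", "periods": [{"start_date": "2024-03-01"}]}
)

# The flag is all-or-nothing across the nested structure.
by_name = profile.model_dump()
by_alias = profile.model_dump(by_alias=True)
```

With `by_name`, both levels use Python names; with `by_alias`, both levels use aliases. There is no call that yields `display_name` at the top together with `start_date_month` inside.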

@Youssefares
Contributor

There don't seem to be any open PRs for this on pydantic-core, so I am going to look at this. 🧐 Very keen to see this feature.

@sydney-runkle
Contributor Author

@Youssefares,

Any progress? I can also take a look at this for v2.10, if desired.

@Viicos mentioned this issue Sep 12, 2024
@Youssefares
Contributor

Hey @sydney-runkle, I had only started to look at this when you commented, so tbf feel free to reassign if this is high on the list of priorities; otherwise I can probably put a PR up for this within the next 7-10 days.

@alexanderankin

Is there a related pydantic-core issue for keeping track of failed attempts yet?

@sydney-runkle
Contributor Author

> Hey @sydney-runkle, I had only started to look at this when you commented, so tbf feel free to reassign if this is high on the list of priorities; otherwise I can probably put a PR up for this within the next 7-10 days.

Great, ping me when you're ready :).

@mirober

mirober commented Oct 13, 2024

It would be great to see some way of doing:

class SomeModel(BaseModel):
    # We are decoupling the field names from external API X for better readability
    better_named_foo: int = Field(alias="foo")
    better_named_bar: int = Field(alias="bar")

I have recently started using pydantic and have been really impressed, but I am working with a lot of camelCase APIs and am having to replicate this in all of my models, which feels very awkward.

I would be happy to put some time into a solution to this if there is agreement on what it should look like - there have been a lot of suggestions in different threads.

Maybe if v3 is going to fix the behaviour this could be backported under a different argument to Field? e.g. you could have better_named_foo: int = Field(alias_v3="foo").

@alicederyn
Contributor

It would be great if the Pydantic model could completely separate the Python attribute name for a field from the dictionary key used when serializing and deserializing JSON/YAML/etc.

class SomeModel(BaseModel):
  some_field: str
  model_config = ConfigDict(serialized_key_generator=to_camel, extra="forbid")

SomeModel(some_field="x")  # Legal
SomeModel(someField="x")  # Illegal: someField not allowed here
SomeModel.model_validate_json('{"someField": "x"}')  # Legal
SomeModel.model_validate_json('{"some_field": "x"}')  # Illegal: some_field not allowed here
SomeModel(some_field="x").model_dump_json()  # {"someField": "x"}

The Python attribute name has constraints that are very different from the serialized form. Currently it seems as if the only options are:

  • allowing the attribute name to appear in the JSON, which breaks validation, or
  • not allowing the attribute name to be used in Python in the constructor, which I think breaks type checkers, and makes it impossible to use keyword arguments at all if the JSON key happens to not be a valid Python attribute, e.g. if it is the reserved word type
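The two options can be compared side by side with existing pydantic v2 configuration. This is a sketch only: the model names and the `accepts` helper are made up, and `alias_generator=to_camel` stands in for the hypothetical `serialized_key_generator` above.

```python
from pydantic import BaseModel, ConfigDict, ValidationError
from pydantic.alias_generators import to_camel


class StrictAlias(BaseModel):
    # Only the generated alias ("someField") is accepted as input.
    model_config = ConfigDict(alias_generator=to_camel, extra="forbid")

    some_field: str


class NameAllowed(BaseModel):
    # The attribute name is accepted too, including in external payloads.
    model_config = ConfigDict(
        alias_generator=to_camel, populate_by_name=True, extra="forbid"
    )

    some_field: str


def accepts(model: type[BaseModel], data: dict) -> bool:
    """Return True if `data` validates against `model`."""
    try:
        model.model_validate(data)
        return True
    except ValidationError:
        return False


# Option 1: external data is strict, but keyword construction by name fails.
try:
    StrictAlias(some_field="x")
    strict_kwarg_ok = True
except ValidationError:
    strict_kwarg_ok = False

# Option 2: keyword construction works, but so does the "wrong" external key.
lenient_json = NameAllowed(some_field="x").model_dump_json(by_alias=True)
```

Neither configuration gives the combination requested above, where the attribute name is valid only in Python code and the camelCase key is valid only in serialized data.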

I think this is addressing the same problem as #8379 (comment) only without changing how aliases work? Also, I think this could be maybe done in v2 since it's opt-in?

@kamzil

kamzil commented Nov 8, 2024

@mrob95 @alicederyn doesn't model_config = ConfigDict(alias_generator=to_camel) together with using by_alias=True in model_dump[_json] achieve what you want? We just need a new config setting to make the model use by_alias=True by default when dumping.

@alicederyn
Contributor

As far as I can tell, no, it does not.

@stevapple

stevapple commented Nov 27, 2024

Just want to give a shout-out to #6762 here. Ideally, when we use ModelName(key=value) instead of ModelName.model_validate(obj), we're requesting construction by programmatic name rather than by alias. This means we should have different defaults for #10921 regarding these two methods.

A new model config option might be unavoidable before we can make it the new default behavior. populate_by_name is not sufficient because it means "allow the programmatic name in addition to the alias", while for strict validation we need exactly one of them. This is a big difference from by_alias, which doesn't introduce such ambiguity.

cc @sydney-runkle who seems to be working on it.

@alicederyn
Contributor

@stevapple how about my suggestion of adding a new metadata type, serialized_key, instead of trying to retrofit this onto aliases?

@stevapple

stevapple commented Nov 27, 2024

@alicederyn As long as others are comfortable with the migration away from alias, which I believe has been heavily used for such cases.

With your proposed change, it looks like alias should only be used to loosen validation, so a deprecation in favor of serialized_key + aliases would be more appropriate (and a long list of options could be dropped!).

@sydney-runkle
Contributor Author

I plan on working on this for v2.11 👍

@MtkN1

MtkN1 commented Feb 2, 2025

We are excited that the pydantic team has planned this for v2.11!

I don't know what the API will look like, but I wrote some test code as a small contribution. If I understand correctly, this is the behavior users should expect.

import pytest
from pydantic import BaseModel, Field, ValidationError


class SomeModel(BaseModel):
    some_field: str = Field(alias="someField")  # NOTE: Replace "alias" with the new API


def test_construct() -> None:
    # Currently, ValidationError is raised at runtime and the type checker reports an error
    m = SomeModel(some_field="x")

    assert m.some_field == "x"
    assert m.model_dump_json() == '{"someField":"x"}'


def test_construct_illegal() -> None:
    with pytest.raises(ValidationError):
        SomeModel(someField="x")  # Currently, DID NOT RAISE


def test_validate() -> None:
    m = SomeModel.model_validate_json('{"someField":"x"}')

    assert m.some_field == "x"
    # Currently, AssertionError '{"some_field":"x"}'
    assert m.model_dump_json() == '{"someField":"x"}'


def test_validate_illegal() -> None:  # Currently, OK
    with pytest.raises(ValidationError):
        SomeModel.model_validate_json('{"some_field":"x"}')

@sydney-runkle self-assigned this Feb 10, 2025
@sydney-runkle
Contributor Author

Yes, we're excited to have this land in v2.11 as well. I'll be starting work on this today :)

@sydney-runkle
Contributor Author

I've merged #11468 which makes some significant strides towards a unified API here :)
