Exponential performance drop (100k%) for edge-case serialization in pydantic>=2.11.0b1 (0.044 seconds -> 183.207 seconds) #12800

@Danipulok

Initial Checks

  • I confirm that I'm using Pydantic V2

Greetings

Hey there!

Summary

Exponential serialization performance drop in a real edge-case scenario (which we unfortunately encountered).

Context:

  • we have been using pydantic as our core component for over 1.5 years now;
  • we have deep inheritance hierarchies (up to 5 levels of model inheritance);
  • we have heavily nested entity structures (nesting depth reaches up to 8 levels);
  • we have large union types that use Discriminator (17 entities in the largest union);
  • we have large serialized payloads (my initial debug entity was 17k lines of JSON);

What happened:

After adding an ordinary field_validator, serialization time increased from 0.044 seconds to 183.207 seconds (for the 17k-line entity), which is an increase of about (183.163 / 0.044) × 100% ≈ 416279%.
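The arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the calculation; the variable names are illustrative and the two timings are the ones quoted in this report.

```python
# Timings quoted in this report, in seconds.
before_s = 0.044   # serialization time before the field_validator was added
after_s = 183.207  # serialization time after the field_validator was added

# How many times slower the second run is.
slowdown_factor = after_s / before_s

# Relative increase, expressed as a percentage of the original time.
percent_increase = (after_s - before_s) / before_s * 100

print(f"slowdown factor: {slowdown_factor:.1f}x")
print(f"percent increase: {percent_increase:.1f}%")
```

In other words, the run is a bit more than four thousand times slower, which is the same thing as an increase of a bit more than four hundred thousand percent.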

Here's a simplified version of the original code that caused the performance issue:

class FooType(StrEnum):
    FOO_ENABLED = "foo_enabled"
    FOO_DISABLED = "foo_disabled"

class InnerTool(BaseModel):
    foo: FooType = FooType.FOO_ENABLED

class WrapTool(BaseModel):
    tool: InnerTool
    context_settings: dict[str, Any]

    # START OF NEW CODE:
    @field_validator("tool")
    @classmethod
    def _model_validate_tool(
        cls,
        tool: Tool,
        info: ValidationInfo,
    ) -> Tool:
        # Automatically set context for `InnerTool` in runtime
        context_settings = info.data.get("context_settings")
        if not context_settings:
            return tool

        if isinstance(tool, InnerTool):
            if "foo" in context_settings:
                tool.foo = context_settings["foo"]

        return tool
    # END OF NEW CODE

What we did after that:

  • we discovered the performance drop some time later;
  • we use FastAPI; it was fully blocked while serializing the response and did not respond to any other requests for a few minutes straight;
  • unfortunately we had the code untested for some time, so we had to trace back to the commit where the performance drop had happened;
  • then we had to debug which lines of code had caused this behaviour;
  • the hardest part was creating an MRE for this; that alone took about 10 hours of concentrated work;

What seems to be the issue:

  • the core (?) issue seems to be setting entity.enum_field = "string_value" inside field_validator;
  • but it also happens ONLY if some other model has @model_serializer(mode="wrap") overridden (yes, I know it sounds really strange);

How I have fixed it (temporary solution, perhaps?):

I simply coerced string values to enums.

Before:

if "foo" in context_settings:
    tool.foo = context_settings["foo"]

After:

if "foo" in context_settings:
    tool.foo = FooType(context_settings["foo"])

Result:
Before field_validator: 0.044 seconds;
Without coercion: 183.207 seconds;
With coercion: 0.041 seconds;

MRE:

Below is the MRE.
The 'base' time for the serialization is 5 seconds (if you just copy and paste the code).
Since there are many variables that somehow affect this issue, I tried to keep all of them and separate them into 'cases'.
I also couldn't make this any smaller, because so many of these variables affect the performance...

Code

# /// script
# dependencies = [
#     "pydantic==2.12.5",
# ]
# requires-python = ">=3.11"
# ///

# ruff: noqa: T201
# ruff: noqa: E501

from __future__ import annotations as _annotations

import copy
import time
from enum import Enum, StrEnum
from typing import TYPE_CHECKING, Annotated, Any, Final, Literal

from pydantic import (
    BaseModel,
    Discriminator,
    TypeAdapter,
    __version__,
    field_validator,
    model_serializer,
)

if TYPE_CHECKING:
    from collections.abc import Callable, Iterable

    from pydantic_core.core_schema import SerializationInfo, ValidationInfo


# BASE BENCHMARK: 5s


class DataType(StrEnum):
    STRING = "string"
    OBJECT = "object"
    NULL = "null"
    NUMBER = "number"
    INTEGER = "integer"
    BOOLEAN = "boolean"
    ARRAY = "array"


class Schema(BaseModel):
    type: DataType | list[DataType] | None = None
    properties: dict[str, Schema] | None = None

    # CASE1:
    # IF COMMENT OUT `__model_serializer_wrap`:
    # 5s -> Time taken: 0.007714509963989258 seconds
    # Comment: I genuinely don't understand why this affects anything, but it does
    @model_serializer(mode="wrap")
    def __model_serializer_wrap(
        self,
        handler: Callable[[BaseModel], dict[str, Any]],
        _info: SerializationInfo,
    ) -> dict[str, Any]:
        return handler(self)

    # CASE2:
    # IF COMMENT OUT `__repr_args__`:
    # 5s -> 4.8-5s
    # Quite insignificant change, but still surprising for me, that's why I left it in the MRE
    def __repr_args__(self) -> Iterable[tuple[str | None, Any]]:
        return (
            (k, v.value if isinstance(v, Enum) else v)
            for k, v in super().__repr_args__()
            if v is not None
        )


# CASE3:
# Core of the issue

# OPTION1 - `StrEnum` + set value is `string`: 5s
# Base case, takes 5 seconds because we set string, not an enum
class MyStrangeEnum(StrEnum):
    ENABLED = "enabled"
    DISABLED = "disabled"


STRANGE_ENUM_DEFAULT: Final[MyStrangeEnum] = MyStrangeEnum.ENABLED
STRANGE_ENUM_OTHER_VALUE: Final[MyStrangeEnum] = "disabled"

# OPTION2 - `StrEnum` + set value is `Enum`: 5s -> 0.0020003318786621094 seconds + [no warnings]
# In this case we operate ONLY with `Enums`, NOT strings
# class MyStrangeEnum(StrEnum):
#     ENABLED = "enabled"
#     DISABLED = "disabled"
#
#
# STRANGE_ENUM_DEFAULT: Final[MyStrangeEnum] = MyStrangeEnum.ENABLED
# STRANGE_ENUM_OTHER_VALUE: Final[MyStrangeEnum] = MyStrangeEnum.DISABLED


class InnerTool(BaseModel):
    type: Literal["tool:inner"] = "tool:inner"
    my_enum: MyStrangeEnum = STRANGE_ENUM_DEFAULT


class WrapTool(BaseModel):
    type: Literal["tool:wrap"] = "tool:wrap"
    input_schema: Schema
    tool: InnerTool

    @field_validator("tool")
    @classmethod
    def _model_validate_tool(
        cls,
        tool: Tool,
        _info: ValidationInfo,
    ) -> InnerTool:
        # CASE4:
        # IF COMMENT OUT `tool.my_enum=STRANGE_ENUM_OTHER_VALUE`:
        # 5s -> Time taken: 0.0009996891021728516 seconds
        # The main line where the issue happens
        tool.my_enum = STRANGE_ENUM_OTHER_VALUE
        return tool


class ToolType1(BaseModel):
    type: Literal["tool:type-1"] = "tool:type-1"


class ToolType2(BaseModel):
    type: Literal["tool:type-2"] = "tool:type-2"


class ToolType3(BaseModel):
    type: Literal["tool:type-3"] = "tool:type-3"


class ToolType4(BaseModel):
    type: Literal["tool:type-4"] = "tool:type-4"


class ToolType5(BaseModel):
    type: Literal["tool:type-5"] = "tool:type-5"


# CASE5:
# The more entities we have in a union, the more time it takes (exponentially).
# But why, if it should be discriminated by the `type` field?

# OPTION1: Union of 7 entities
# 5s
type Tool = Annotated[
    InnerTool | WrapTool | ToolType1 | ToolType2 | ToolType3 | ToolType4 | ToolType5,
    Discriminator("type"),
]

# OPTION2: Union of 2 entities
# 5s -> 0.8845579624176025s
# type Tool = Annotated[
#     InnerTool | WrapTool,
#     Discriminator("type"),
# ]


def gen_json_schema(num_properties: int) -> dict:
    """Generate a JSON schema with the specified number of properties."""

    # CASE6:
    # It also seems pretty strange to me that this can affect the serialization...

    # OPTION1: ALL TYPES PRESENT:
    # 5s
    types = [
        {"type": "string"},
        {"type": "integer"},
        {"type": "number"},
        {"type": "boolean"},
        {"type": "array", "items": {"type": "string"}},
        {"type": "object", "properties": {"nested": {"type": "string"}}},
    ]

    # OPTION2: ONLY `string` IS PRESENT:
    # 5s -> 3.7s
    # types = [
    #     {"type": "string"},
    # ]

    properties = {}
    for i in range(num_properties):
        prop_name = f"property_{i}"
        prop_type = types[i % len(types)]
        properties[prop_name] = prop_type

    return {
        "type": "object",
        "properties": properties,
    }


tools_adapter = TypeAdapter(list[Tool])
tool = WrapTool(
    input_schema=Schema(
        # CASE7:
        # The more schema properties -> the longer is the serialization time
        # OPTION1: 65 properties => 5 seconds
        **gen_json_schema(num_properties=65),
        # OPTION2: 10 properties => 0.18109583854675293 seconds
        # **gen_json_schema(num_properties=10),
        # OPTION3: 100 properties => 11.81133508682251 seconds
        # **gen_json_schema(num_properties=100),
    ),
    tool=InnerTool(),
)

# CASE8:
# The more tools properties -> the longer is the serialization time
# OPTION1: 5 tools => 5 seconds
tools_amount = 5
# OPTION2: 1 tool => 0.10 seconds
# tools_amount = 1
# OPTION3: 10 tools => 21 seconds
# tools_amount = 10

tools = [copy.deepcopy(tool) for _ in range(tools_amount)]

print("serialization.start")
time_start = time.time()
data = tools_adapter.dump_python(tools)
time_end = time.time()
time_elapsed = time_end - time_start
print("serialization.end")
print("PYDANTIC VERSION", __version__)
print(f"Time taken: {time_elapsed} seconds")
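As a rough check of how steeply the time grows, the data points from the CASE7 and CASE8 comments can be fed into a log-log slope estimate. This is only a sketch: growth_exponent is a hypothetical helper, the "5 seconds" base value is approximate, and three points per series are too few to draw firm conclusions.

```python
import math


def growth_exponent(n1: float, t1: float, n2: float, t2: float) -> float:
    """Estimate k in t ~ n**k from two (size, seconds) measurements."""
    return math.log(t2 / t1) / math.log(n2 / n1)


# (size, seconds) pairs copied from the CASE7 and CASE8 comments above.
series = {
    "schema properties": [(10, 0.181), (65, 5.0), (100, 11.811)],
    "tools": [(1, 0.10), (5, 5.0), (10, 21.0)],
}

exponents = []
for label, points in series.items():
    # Compare each measurement with the next larger one.
    for (n1, t1), (n2, t2) in zip(points, points[1:]):
        k = growth_exponent(n1, t1, n2, t2)
        exponents.append(k)
        print(f"{label}: {n1} -> {n2}: exponent ~ {k:.2f}")
```

With these numbers the estimated exponents all land roughly between 1.8 and 2.4, so within the measured range the growth looks closer to quadratic in each variable, though the combined effect of all variables is still dramatic.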

The affected pydantic versions:

2.12.5 - 5s
2.12.0b1 - 5s
2.11.10 - 5s
2.11.5 - 5s
2.11.3 - 5s
2.11.2 - 5s
2.11.1 - 3.2s
2.11.0 - 3.2s
2.11.0b2 - 3.4s
2.11.0b1 - 3.3s
2.11.0a2 - 0.025s
2.11.0a1 - 0.024s
2.10.6 - 0.03s
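The version where the regression first shows up can be picked out of this list mechanically. The sketch below uses the timings listed above, ordered oldest first; first_regression is a hypothetical helper and the 10x threshold is an arbitrary assumption.

```python
# (version, seconds) pairs from the list above, ordered oldest first.
timings = [
    ("2.10.6", 0.03),
    ("2.11.0a1", 0.024),
    ("2.11.0a2", 0.025),
    ("2.11.0b1", 3.3),
    ("2.11.0b2", 3.4),
    ("2.11.0", 3.2),
    ("2.11.1", 3.2),
    ("2.11.2", 5.0),
    ("2.11.3", 5.0),
    ("2.11.5", 5.0),
    ("2.11.10", 5.0),
    ("2.12.0b1", 5.0),
    ("2.12.5", 5.0),
]


def first_regression(rows, factor=10.0):
    """Return (last_good, first_bad) for the first jump larger than `factor`."""
    for (prev_version, prev_s), (cur_version, cur_s) in zip(rows, rows[1:]):
        if cur_s > prev_s * factor:
            return prev_version, cur_version
    return None


boundary = first_regression(timings)
print(boundary)
```

This points at the change between 2.11.0a2 and 2.11.0b1, which matches the version range in the title; there is also a smaller step from about 3.2s to 5s between 2.11.1 and 2.11.2.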

Python, Pydantic & OS Version

pydantic version: 2.12.0b1
        pydantic-core version: 2.40.1
          pydantic-core build: profile=release pgo=false
               python version: 3.12.12 (main, Jan 14 2026, 19:30:21) [MSC v.1944 64 bit (AMD64)]
                     platform: Windows-10-10.0.19045-SP0
             related packages: email-validator-2.3.0 fastapi-0.117.0 pydantic-extra-types-2.10.6 pydantic-settings-2.11.0 typing_extensions-4.15.0
                       commit: unknown

Labels

  • bug V2: Bug related to Pydantic V2
  • topic-discriminated unions: Related to discriminated unions
  • topic-serialization: How Pydantic serializes data, often related to `model_dump`, etc.
