Initial Checks
Greetings
Hey there!
Summary
Exponential serialization performance drop in a very edge-case scenario (which we unfortunately encountered).
Context:
- we have been using `pydantic` as our core component for over 1.5 years now;
- we have deep inheritance hierarchies (up to 5 levels of model inheritance);
- we have heavily nested entity structures (nesting depth reaches up to 8 levels);
- we have large union types that use `Discriminator` (17 entities in the largest union);
- we have large serialized payloads (my initial debug entity was 17k lines of JSON).
What happened:
After adding an ordinary `field_validator`, serialization time increased from 0.044 seconds to 183.207 seconds (for the 17k-line entity), an increase of roughly (183.163 / 0.044) × 100% ≈ 416,279%.
Here's a simplified version of the original code that caused the performance issue:
```python
from enum import StrEnum
from typing import Any

from pydantic import BaseModel, ValidationInfo, field_validator


class FooType(StrEnum):
    FOO_ENABLED = "foo_enabled"
    FOO_DISABLED = "foo_disabled"


class InnerTool(BaseModel):
    foo: FooType = FooType.FOO_ENABLED


class WrapTool(BaseModel):
    # `context_settings` is declared before `tool` so that it has already
    # been validated and is available via `info.data` in the validator below.
    context_settings: dict[str, Any]
    tool: InnerTool

    # START OF NEW CODE:
    @field_validator("tool")
    @classmethod
    def _model_validate_tool(
        cls,
        tool: InnerTool,
        info: ValidationInfo,
    ) -> InnerTool:
        # Automatically set context for `InnerTool` at runtime
        context_settings = info.data.get("context_settings")
        if not context_settings:
            return tool
        if isinstance(tool, InnerTool):
            if "foo" in context_settings:
                tool.foo = context_settings["foo"]
        return tool
    # END OF NEW CODE
```
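As a usage example of the snippet above, here is a hypothetical call showing the silent type mismatch this introduces: the validator assigns the raw string from `context_settings`, and since `validate_assignment` is off by default, it is never coerced back to `FooType`.

```python
wrap = WrapTool.model_validate(
    {"context_settings": {"foo": "foo_disabled"}, "tool": {}}
)
# The enum-typed field now holds a plain str, not a FooType member:
print(type(wrap.tool.foo))  # <class 'str'>
```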
What we did after that:
- we discovered the performance drop some time later;
- we use FastAPI; it was fully blocked while serializing the response and did not respond to any other requests for a few minutes straight;
- unfortunately, the code had been untested for some time, so we had to trace back at which commit the performance drop happened;
- then we had to debug which lines of code caused this behaviour;
- the hardest part was creating an MRE for this; that alone took about 10 hours of concentrated work.
What seems to be the issue:
- the core (?) issue seems to be setting `entity.enum_field = "string_value"` inside a `field_validator`;
- but it also happens ONLY if some other model has `@model_serializer(mode="wrap")` overridden (yes, I know it sounds really strange); a sketch of just these two ingredients is shown below.
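To make the hypothesis concrete, here is a minimal sketch of just those two suspected ingredients in isolation (hypothetical toy models `Mode`/`Child`/`Sibling`/`Parent`, not our real code; on its own it likely also needs the union size and nesting from the MRE below to show the full slowdown):

```python
from collections.abc import Callable
from enum import StrEnum
from typing import Any

from pydantic import BaseModel, field_validator, model_serializer


class Mode(StrEnum):
    ON = "on"
    OFF = "off"


class Child(BaseModel):
    mode: Mode = Mode.ON


class Sibling(BaseModel):
    # Ingredient 2: a model in the same tree overrides a wrap serializer.
    @model_serializer(mode="wrap")
    def _ser(self, handler: Callable[[BaseModel], dict[str, Any]]) -> dict[str, Any]:
        return handler(self)


class Parent(BaseModel):
    sibling: Sibling = Sibling()
    child: Child

    @field_validator("child")
    @classmethod
    def _set_mode(cls, child: Child) -> Child:
        # Ingredient 1: a plain string is assigned to an enum-typed field.
        # validate_assignment is off by default, so the value is never
        # coerced back to `Mode`; the instance now holds a raw str.
        child.mode = "off"
        return child
```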
How I fixed it (temporary solution, perhaps?):
I simply coerced the string values to enums.
Before:

```python
if "foo" in context_settings:
    tool.foo = context_settings["foo"]
```

After:

```python
if "foo" in context_settings:
    tool.foo = FooType(context_settings["foo"])
```
Result:
- before the `field_validator`: 0.044 seconds;
- without coercion: 183.207 seconds;
- with coercion: 0.041 seconds.
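A possible alternative to the manual coercion (an untested sketch, not something we have verified against our codebase): enable `validate_assignment`, so pydantic re-validates attribute assignments and coerces the string back to the enum automatically.

```python
from enum import StrEnum

from pydantic import BaseModel, ConfigDict


class FooType(StrEnum):
    FOO_ENABLED = "foo_enabled"
    FOO_DISABLED = "foo_disabled"


class InnerTool(BaseModel):
    # With validate_assignment=True, `tool.foo = "foo_disabled"` is
    # re-validated on assignment and stored as FooType.FOO_DISABLED,
    # so the model never holds a raw str in an enum-typed field.
    model_config = ConfigDict(validate_assignment=True)

    foo: FooType = FooType.FOO_ENABLED


tool = InnerTool()
tool.foo = "foo_disabled"  # coerced to FooType.FOO_DISABLED on assignment
assert isinstance(tool.foo, FooType)
```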
MRE:
Below is the MRE.
The 'base' time for the serialization is 5 seconds (if you just copy and paste the code).
Since there are many variables that somehow affect this issue, I tried to keep all of them and separate them into 'cases'.
I also couldn't make this any smaller, because there are so many variables that affect the performance...
Code
```python
# /// script
# dependencies = [
#     "pydantic==2.12.5",
# ]
# requires-python = ">=3.12"
# ///
# ruff: noqa: T201
# ruff: noqa: E501
from __future__ import annotations as _annotations

import copy
import time
from enum import Enum, StrEnum
from typing import TYPE_CHECKING, Annotated, Any, Final, Literal

from pydantic import (
    BaseModel,
    Discriminator,
    TypeAdapter,
    __version__,
    field_validator,
    model_serializer,
)

if TYPE_CHECKING:
    from collections.abc import Callable, Iterable

    from pydantic_core.core_schema import SerializationInfo, ValidationInfo

# BASE BENCHMARK: 5s


class DataType(StrEnum):
    STRING = "string"
    OBJECT = "object"
    NULL = "null"
    NUMBER = "number"
    INTEGER = "integer"
    BOOLEAN = "boolean"
    ARRAY = "array"


class Schema(BaseModel):
    type: DataType | list[DataType] | None = None
    properties: dict[str, Schema] | None = None

    # CASE1:
    # IF `__model_serializer_wrap` IS COMMENTED OUT:
    # 5s -> Time taken: 0.007714509963989258 seconds
    # Comment: I genuinely don't know or understand why it affects anything, but it does
    @model_serializer(mode="wrap")
    def __model_serializer_wrap(
        self,
        handler: Callable[[BaseModel], dict[str, Any]],
        _info: SerializationInfo,
    ) -> dict[str, Any]:
        return handler(self)

    # CASE2:
    # IF `__repr_args__` IS COMMENTED OUT:
    # 5s -> 4.8-5s
    # Quite an insignificant change, but still surprising to me; that's why I left it in the MRE
    def __repr_args__(self) -> Iterable[tuple[str | None, Any]]:
        return (
            (k, v.value if isinstance(v, Enum) else v)
            for k, v in super().__repr_args__()
            if v is not None
        )


# CASE3:
# Core of the issue
# OPTION1 - `StrEnum` + set value is a `string`: 5s
# Base case, takes 5 seconds because we set a string, not an enum
class MyStrangeEnum(StrEnum):
    ENABLED = "enabled"
    DISABLED = "disabled"


STRANGE_ENUM_DEFAULT: Final[MyStrangeEnum] = MyStrangeEnum.ENABLED
STRANGE_ENUM_OTHER_VALUE: Final[MyStrangeEnum] = "disabled"  # deliberately a plain str

# OPTION2 - `StrEnum` + set value is an `Enum`: 5s -> 0.0020003318786621094 seconds + [no warnings]
# In this case we operate ONLY with `Enum`s, NOT strings
# class MyStrangeEnum(StrEnum):
#     ENABLED = "enabled"
#     DISABLED = "disabled"
#
#
# STRANGE_ENUM_DEFAULT: Final[MyStrangeEnum] = MyStrangeEnum.ENABLED
# STRANGE_ENUM_OTHER_VALUE: Final[MyStrangeEnum] = MyStrangeEnum.DISABLED


class InnerTool(BaseModel):
    type: Literal["tool:inner"] = "tool:inner"
    my_enum: MyStrangeEnum = STRANGE_ENUM_DEFAULT


class WrapTool(BaseModel):
    type: Literal["tool:wrap"] = "tool:wrap"
    input_schema: Schema
    tool: InnerTool

    @field_validator("tool")
    @classmethod
    def _model_validate_tool(
        cls,
        tool: Tool,
        _info: ValidationInfo,
    ) -> InnerTool:
        # CASE4:
        # IF `tool.my_enum = STRANGE_ENUM_OTHER_VALUE` IS COMMENTED OUT:
        # 5s -> Time taken: 0.0009996891021728516 seconds
        # The main line where the issue happens
        tool.my_enum = STRANGE_ENUM_OTHER_VALUE
        return tool


class ToolType1(BaseModel):
    type: Literal["tool:type-1"] = "tool:type-1"


class ToolType2(BaseModel):
    type: Literal["tool:type-2"] = "tool:type-2"


class ToolType3(BaseModel):
    type: Literal["tool:type-3"] = "tool:type-3"


class ToolType4(BaseModel):
    type: Literal["tool:type-4"] = "tool:type-4"


class ToolType5(BaseModel):
    type: Literal["tool:type-5"] = "tool:type-5"


# CASE5:
# The more entities we have in a union, the more time it takes (exponentially).
# But why, if it should be discriminated by the `type` field?
# OPTION1: Union of 7 entities
# 5s
type Tool = Annotated[
    InnerTool | WrapTool | ToolType1 | ToolType2 | ToolType3 | ToolType4 | ToolType5,
    Discriminator("type"),
]
# OPTION2: Union of 2 entities
# 5s -> 0.8845579624176025s
# type Tool = Annotated[
#     InnerTool | WrapTool,
#     Discriminator("type"),
# ]


def gen_json_schema(num_properties: int) -> dict:
    """Generate a JSON schema with the specified number of properties."""
    # CASE6:
    # It also seems pretty strange to me that this can affect serialization...
    # OPTION1: ALL TYPES PRESENT:
    # 5s
    types = [
        {"type": "string"},
        {"type": "integer"},
        {"type": "number"},
        {"type": "boolean"},
        {"type": "array", "items": {"type": "string"}},
        {"type": "object", "properties": {"nested": {"type": "string"}}},
    ]
    # OPTION2: ONLY `string` IS PRESENT:
    # 5s -> 3.7s
    # types = [
    #     {"type": "string"},
    # ]
    properties = {}
    for i in range(num_properties):
        prop_name = f"property_{i}"
        prop_type = types[i % len(types)]
        properties[prop_name] = prop_type
    return {
        "type": "object",
        "properties": properties,
    }


tools_adapter = TypeAdapter(list[Tool])

tool = WrapTool(
    input_schema=Schema(
        # CASE7:
        # The more schema properties -> the longer the serialization time
        # OPTION1: 65 properties => 5 seconds
        **gen_json_schema(num_properties=65),
        # OPTION2: 10 properties => 0.18109583854675293 seconds
        # **gen_json_schema(num_properties=10),
        # OPTION3: 100 properties => 11.81133508682251 seconds
        # **gen_json_schema(num_properties=100),
    ),
    tool=InnerTool(),
)

# CASE8:
# The more tools -> the longer the serialization time
# OPTION1: 5 tools => 5 seconds
tools_amount = 5
# OPTION2: 1 tool => 0.10 seconds
# tools_amount = 1
# OPTION3: 10 tools => 21 seconds
# tools_amount = 10

tools = [copy.deepcopy(tool) for _ in range(tools_amount)]

print("serialization.start")
time_start = time.time()
data = tools_adapter.dump_python(tools)
time_end = time.time()
time_elapsed = time_end - time_start
print("serialization.end")
print("PYDANTIC VERSION", __version__)
print(f"Time taken: {time_elapsed} seconds")
```
The affected `pydantic` versions:

```text
2.12.5   - 5s
2.12.0b1 - 5s
2.11.10  - 5s
2.11.5   - 5s
2.11.3   - 5s
2.11.2   - 5s
2.11.1   - 3.2s
2.11.0   - 3.2s
2.11.0b2 - 3.4s
2.11.0b1 - 3.3s
2.11.0a2 - 0.025s
2.11.0a1 - 0.024s
2.10.6   - 0.03s
```
Python, Pydantic & OS Version
```text
pydantic version: 2.12.0b1
pydantic-core version: 2.40.1
pydantic-core build: profile=release pgo=false
python version: 3.12.12 (main, Jan 14 2026, 19:30:21) [MSC v.1944 64 bit (AMD64)]
platform: Windows-10-10.0.19045-SP0
related packages: email-validator-2.3.0 fastapi-0.117.0 pydantic-extra-types-2.10.6 pydantic-settings-2.11.0 typing_extensions-4.15.0
commit: unknown
```
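For reference, this environment block should be reproducible with pydantic's built-in version report:

```python
import pydantic.version

print(pydantic.version.version_info())
```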