- Structured Output constrains an LLM's output so that downstream tasks receive a consistent format
- Most LLM providers use Pydantic for Structured Output, e.g. OpenAI, Gemini
- Pydantic models can be lengthy to define; StrictJSON schema lets you create a Pydantic model concisely
- Details on usage can be found in the Jupyter Notebook: Provider Structured Output
Pydantic model:

```python
from typing import List
from pydantic import BaseModel, Field
from datetime import date as Date

class Participant(BaseModel):
    Name: str = Field(..., description="starting with A")
    Age: int = Field(..., description="between 5 to 12")

class CalendarEvent(BaseModel):
    name: str = Field(..., description="Name of birthday party")
    date: Date = Field(..., description="Any date in March 2026")
    participants: List[Participant]
```

Equivalent StrictJSON schema:

```python
{"name": "Name of birthday party, str",
 "date": "Any date in Mar 2026, date",
 "participants": [{'Name': 'starting with A, str',
                   'Age': 'between 5 to 12, int'}]}
```

Installation:

```shell
pip install strictjson
```

Converting a StrictJSON schema to a Pydantic model:

```python
from strictjson import convert_schema_to_pydantic

output_format = {"name": "Name of birthday party, str",
                 "date": "Any date in Mar 2026, date",
                 "participants": [{'Name': 'starting with A, str',
                                   'Age': 'between 5 to 12, int'}]}

# This is preferred, can be used for Gemini models
pydantic_model = convert_schema_to_pydantic(output_format)

# use this code if you need json schema input for the LLM
json_schema = pydantic_model.model_json_schema()
```

- Use the Pydantic model generated by `convert_schema_to_pydantic` to pass into your LLM provider's Structured Output format
- If required, you can also use `model_json_schema()` to convert the Pydantic model into JSON Schema
- Note: For the OpenAI API, use `convert_schema_to_openai_pydantic` instead, as the OpenAI API does not accept `Any` or `dict` datatypes and we will convert them to `str` and `List[]` accordingly
We offer default multimodal integration support for OpenAI and Gemini models with the StrictJSON schema (`openai_sync`, `openai_async`, `gemini_sync`, `gemini_async`). If you would like support for more models, let us know on Discord (John's AI Group on Discord).
Code:

```python
from strictjson.llm import openai_sync

kwargs = {"model": "gpt-5.1"}  # any extra parameters for the llm

res = openai_sync(system_prompt="You are a helpful assistant",
                  user_prompt="Generate a birthday event for Alex",
                  output_format={"name": "Name of birthday party, str",
                                 "date": "Any date in Mar 2026, date",
                                 "participants": [{'Name': 'starting with A, str',
                                                   'Age': 'between 5 to 12, int'}]},
                  **kwargs)
```

Output:

```python
{'name': "Alex's Birthday Party",
 'date': datetime.date(2026, 3, 15),
 'participants': [{'Name': 'Alice', 'Age': 10},
                  {'Name': 'Adam', 'Age': 8},
                  {'Name': 'Ava', 'Age': 9}]}
```

Example with an image input (ai_electricity.png):
Code:

```python
from strictjson.llm import gemini_async

kwargs = {"model": "gemini-2.5-flash"}  # any extra parameters for the llm

# Use the filepath to understand the images
# (top-level await works in a notebook; in a script, wrap this call in an
# async function and run it with asyncio.run)
res = await gemini_async(system_prompt='''Output all text on the image and give an overall description''',
                         user_prompt="Image: <<ai_electricity.png>>",
                         output_format={"Image_Text": "Output word for word all the text on the image, str",
                                        "Image_Description": "Describe the overall image, str"},
                         **kwargs)
```

Output:

```python
{'Image_Text': '\'AI IS THE NEW ELECTRICITY\'\n"Just as electricity transformed almost everything 100 years ago, today I actually have a hard time thinking of an industry that I don\'t think AI will transform in the next several years."\n\nAndrew Ng\nFormer chief scientist at Baidu, Co-founder at Coursera.',
 'Image_Description': 'The image features a quote attributed to Andrew Ng, surrounded by a blue background. In the center is a circular photograph of a man smiling, presumably Andrew Ng. The quote discusses the transformative impact of AI, comparing it to the historical transformation brought by electricity.'}
```

- StrictJSON Schema is a dictionary of keys (maintained) and values (replaced with a suitable generation from the LLM)
- Values are of the format `<description>, <datatype>`
- Values can also be `<description>, type: <datatype>`
- Values can also be `<datatype>`, in which case there will be no description
- Values can also be `<description>`, in which case the datatype defaults to `Any`
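
To make the four value formats concrete, here is a minimal sketch of how a value string could be split into a description and a datatype. This is not the library's actual parser: `split_value`, `looks_like_type`, `split_top_level`, and the `_TYPE_NAMES` list are hypothetical names introduced for illustration, and the list of recognised type names is an assumption based on the datatypes described in this README.

```python
import re
from typing import List, Tuple

# Assumed set of recognised type names (hypothetical, for this sketch only).
_TYPE_NAMES = ["str", "int", "float", "bool", "list", "dict", "List", "Dict",
               "datetime", "date", "UUID", "Decimal", "None", "Any",
               "Enum", "Optional", "Union"]
_TYPE_RE = re.compile(r"^(?:%s)(?:\[.*\])?$" % "|".join(_TYPE_NAMES))


def split_top_level(text: str, sep: str) -> List[str]:
    """Split on `sep`, ignoring separators nested inside brackets."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch in "[{(":
            depth += 1
        elif ch in "]})":
            depth -= 1
        elif ch == sep and depth == 0:
            parts.append(text[start:i])
            start = i + 1
    parts.append(text[start:])
    return parts


def looks_like_type(text: str) -> bool:
    """True if every `|`-separated piece is a known type name, optionally with [...]."""
    return all(_TYPE_RE.match(p.strip()) for p in split_top_level(text, "|"))


def split_value(value: str) -> Tuple[str, str]:
    """Return (description, datatype); the datatype falls back to 'Any'."""
    parts = split_top_level(value, ",")
    candidate = parts[-1].strip()
    if candidate.startswith("type:"):
        candidate = candidate[len("type:"):].strip()
    if looks_like_type(candidate):
        return ",".join(parts[:-1]).strip(), candidate
    # No recognisable datatype at the end: the whole value is a description.
    return value.strip(), "Any"


print(split_value("between 5 to 12, int"))
print(split_value("Name of birthday party, type: str"))
print(split_value("Dict[str, Any]"))
print(split_value("a short, friendly greeting"))
```

Splitting only on commas outside brackets keeps types such as `Dict[str, Any]` intact, and descriptions that themselves contain commas are left whole when no datatype follows.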
You can use simple data types like these:
| Type | Example Value (in Python) | Meaning / Notes |
|---|---|---|
| `str` | `"Hello"` | A string of text. |
| `int` | `5` | A whole number (integer). |
| `float` | `3.14` | A floating-point number (decimal). |
| `bool` | `True` / `False` | A boolean value: yes/no, true/false. |
| `List[int]` | `[1, 2, 3]` | A list (array-like) of integers. From `typing`. |
| `Dict[str, Any]` | `{"a": 1, "b": 2}` | A dictionary (key–value pairs). From `typing`. |
| `date` | `date(2025, 5, 9)` | A calendar date (`from datetime import date`). |
| `datetime` | `datetime(2025, 5, 9, 14, 30)` | A date and time (`from datetime import datetime`). |
| `UUID` | `UUID("550e8400-e29b-41d4-a716-446655440000")` | A universally unique identifier (`from uuid import UUID`). |
| `Decimal` | `Decimal("12.50")` | An exact decimal number (`from decimal import Decimal`). |
| `None` | `None` | The null or "no value" object in Python. |
The datatype `Any` accepts any datatype. If you do not define a datatype in a value field, it defaults to `Any`.
All of the datatypes in StrictJSON Schema are similar to Python type hints.
If you write `list` or `dict` without brackets, we interpret them as `List[Any]` and `Dict[str, Any]`.
If you write `List[{'key1': 'str', 'key2': 'int'}]`, we will ensure the output is a list of dictionaries with the desired key names and output types for each key. This is useful if you need a list of a particular schema.
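
As a rough illustration of what "a list of dictionaries with the desired key names and output types" means in practice, the sketch below checks a result by hand using only the standard library. `check_rows` is a hypothetical helper written for this example; the library itself performs this validation through a generated Pydantic model.

```python
from typing import Any, Dict, List


def check_rows(rows: Any, row_schema: Dict[str, type]) -> List[str]:
    """Return a list of problems; an empty list means the rows conform."""
    if not isinstance(rows, list):
        return [f"expected a list, got {type(rows).__name__}"]
    problems: List[str] = []
    for i, row in enumerate(rows):
        if not isinstance(row, dict):
            problems.append(f"item {i}: expected a dict, got {type(row).__name__}")
            continue
        # Key names must match the schema exactly.
        for key in sorted(set(row_schema) - set(row)):
            problems.append(f"item {i}: missing key {key}")
        for key in sorted(set(row) - set(row_schema)):
            problems.append(f"item {i}: unexpected key {key}")
        # Each present value must have the declared type.
        for key, expected in row_schema.items():
            if key in row and not isinstance(row[key], expected):
                problems.append(
                    f"item {i}: {key} should be {expected.__name__}, "
                    f"got {type(row[key]).__name__}"
                )
    return problems


schema = {"key1": str, "key2": int}
print(check_rows([{"key1": "a", "key2": 1}], schema))    # conforming rows
print(check_rows([{"key1": "a", "key2": "1"}], schema))  # wrong type for key2
```

Note that `isinstance(True, int)` is true in Python, so this simple check would accept a boolean where an integer is expected.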
You can constrain a field to a fixed set of values using `Enum`:
- `Enum['A','B','C']`: limits output to one of these values. (Note: Use `Enum` instead of the `Literal` type hint in Python)
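
The constraint can be pictured as a membership test against the listed values. The following is a minimal sketch under that assumption; `enum_choices` and `is_allowed` are hypothetical helpers, not functions provided by the library.

```python
import ast
from typing import List


def enum_choices(datatype: str) -> List[object]:
    """Extract the allowed values from a string such as "Enum['A','B','C']"."""
    if not (datatype.startswith("Enum[") and datatype.endswith("]")):
        raise ValueError(f"not an Enum datatype: {datatype!r}")
    # literal_eval safely parses the bracketed part as a Python list literal.
    return list(ast.literal_eval(datatype[len("Enum"):]))


def is_allowed(value: object, datatype: str) -> bool:
    """True if `value` is one of the values listed in the Enum datatype."""
    return value in enum_choices(datatype)


print(enum_choices("Enum['A','B','C']"))
print(is_allowed("B", "Enum['A','B','C']"))
print(is_allowed("D", "Enum['A','B','C']"))
```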
You can use `Optional` for fields that can return null values, e.g. `Optional[str]` returns either a string or a null value.
You can use `Union` for fields that can return more than one datatype, e.g. `Union[str, int]` returns either a string or an integer.
We also support PEP 604 syntax using `|`:
- `str | int` → `Union[str, int]`
- `str | None` → `Optional[str]`
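
The mapping from `|` syntax to `Union` and `Optional` can be sketched as a small string rewrite. This is only an illustration of the equivalence, not the library's implementation; `normalize_union` and `split_top_level` are hypothetical names, and the sketch rewrites only the outermost level of the type.

```python
from typing import List


def split_top_level(text: str, sep: str = "|") -> List[str]:
    """Split on `sep`, ignoring separators nested inside brackets."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch in "[{(":
            depth += 1
        elif ch in "]})":
            depth -= 1
        elif ch == sep and depth == 0:
            parts.append(text[start:i])
            start = i + 1
    parts.append(text[start:])
    return parts


def normalize_union(datatype: str) -> str:
    """Rewrite 'a | b' as 'Union[a, b]' and 'a | None' as 'Optional[a]'."""
    parts = [p.strip() for p in split_top_level(datatype)]
    if len(parts) == 1:
        return parts[0]
    others = [p for p in parts if p != "None"]
    if not others:
        return "None"
    has_none = len(others) < len(parts)
    inner = others[0] if len(others) == 1 else "Union[%s]" % ", ".join(others)
    return "Optional[%s]" % inner if has_none else inner


print(normalize_union("str | int"))   # Union[str, int]
print(normalize_union("str | None"))  # Optional[str]
```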
List[...] and Dict[...] are type hints — the actual runtime values are list and dict. Likewise, date, datetime, UUID, and Decimal appear as proper Python objects in the result.
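
For a sense of what "proper Python objects" means, the sketch below builds the same kinds of objects from plain text using only the standard library. The field names (`day`, `moment`, `id`, `price`) and the input strings are made up for the example; in the library, the conversion is handled for you during validation.

```python
from datetime import date, datetime
from decimal import Decimal
from uuid import UUID

# Raw text as it might appear in an LLM reply (hypothetical values).
raw = {
    "day": "2025-05-09",
    "moment": "2025-05-09T14:30:00",
    "id": "550e8400-e29b-41d4-a716-446655440000",
    "price": "12.50",
}

# One standard-library constructor per field.
converters = {
    "day": date.fromisoformat,
    "moment": datetime.fromisoformat,
    "id": UUID,
    "price": Decimal,
}

result = {key: converters[key](text) for key, text in raw.items()}

for key, value in result.items():
    print(key, type(value).__name__, value)
```

Because the values are real objects, they support the usual operations directly, such as date arithmetic on `result["day"]` or exact decimal arithmetic on `result["price"]`.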
parse_yaml helps you get clean, structured output from LLMs. It tells the LLM the required output format and data types, to ensure strict conformity to the output format.
Old versions (e.g. `strict_json`) that used JSON still work, but `parse_yaml` is the new and improved method.

- YAML structure avoids the need for backslash escaping or heavy bracket use, simplifying output and parsing compared to JSON.
- Shorter output context size with YAML than JSON.
- Creates / uses a Pydantic model for type checking, ensuring robustness of output.
- Fixes mistakes automatically (tries again up to three times by default, configurable via `num_tries`).
- Works with any LLM model (ChatGPT, Claude, Gemini, etc.) with adequate YAML understanding capabilities.
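
The retry behaviour can be pictured as a loop that feeds the previous error back to the model. The sketch below is an assumption about the general pattern, not the library's code: `call_with_retries` is a hypothetical function, and it parses JSON with the standard library as a stand-in for YAML so that the example runs without extra dependencies.

```python
import json
from typing import Any, Callable, Dict, List


def call_with_retries(llm: Callable[[str, str], str],
                      system_prompt: str,
                      user_prompt: str,
                      required_keys: List[str],
                      num_tries: int = 3) -> Dict[str, Any]:
    """Call `llm` until its reply parses and contains every required key."""
    error = ""
    for _ in range(num_tries):
        # On a retry, tell the model what went wrong last time.
        prompt = user_prompt if not error else (
            f"{user_prompt}\n\nPrevious attempt failed: {error}")
        reply = llm(system_prompt, prompt)
        try:
            data = json.loads(reply)
            if not isinstance(data, dict):
                raise ValueError("expected a mapping at the top level")
            missing = [k for k in required_keys if k not in data]
            if missing:
                raise ValueError(f"missing keys: {missing}")
            return data
        except ValueError as exc:  # json.JSONDecodeError is a ValueError
            error = str(exc)
    raise RuntimeError(f"no valid output after {num_tries} tries: {error}")


# A fake LLM that fails once and then succeeds, to show the loop in action.
replies = iter(["not valid", '{"title": "Adaptive Tutors"}'])
print(call_with_retries(lambda sp, up: next(replies), "system", "user", ["title"]))
```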
- Basics: Shows how to create LLM functions that take `system_prompt` and `user_prompt` as input parameters and return a string. Also covers both sync and async modes of `parse_yaml`.
- Multimodal Input: Shows how to use the decorators `image_parser` and `image_parser_async` to let the LLM view images.
Here’s a simple program that asks the AI for a blog post idea:
```python
from strictjson import parse_yaml

def llm(system_prompt: str, user_prompt: str, **kwargs) -> str:
    from openai import OpenAI
    client = OpenAI()
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_prompt})
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        messages=messages
    )
    return resp.choices[0].message.content

result = parse_yaml(
    system_prompt="You are a friendly assistant.",
    user_prompt="Write a blog post idea about AI in education",
    output_format={
        "title": "str",
        "tags": "List[str]",
        "published": "bool"
    },
    llm=llm
)

print(result)
# {'title': 'Adaptive Tutors', 'tags': ['ai', 'edtech', 'personalization'], 'published': False}
```

- Updated: 20 Nov 2025
- Created: 7 Apr 2023
- Tested with: Pydantic 2.12.4
- Community: John's AI Group on Discord
