tanchongmin/strictjson

StrictJSON v6.3.0

Easy Structured Output for Large Language Models (LLMs)

  • Structured Output constrains an LLM's output so that downstream tasks receive a consistent format
  • Most LLM providers, e.g. OpenAI and Gemini, accept Pydantic models for Structured Output
  • Pydantic models can be lengthy to define; a StrictJSON schema lets you create a Pydantic model concisely
  • Details on usage can be found in the Jupyter Notebook: Provider Structured Output

Pydantic (lengthy)

from typing import List
from pydantic import BaseModel, Field
from datetime import date as Date

class Participant(BaseModel):
    Name: str = Field(..., description="starting with A")
    Age: int = Field(..., description="between 5 to 12")

class CalendarEvent(BaseModel):
    name: str = Field(..., description="Name of birthday party")
    date: Date = Field(..., description="Any date in March 2026")
    participants: List[Participant]

StrictJSON Schema (short)

{"name": "Name of birthday party, str",
 "date": "Any date in Mar 2026, date",
 "participants": [{'Name': 'starting with A, str', 
                   'Age': 'between 5 to 12, int'}]}

How to Install

pip install strictjson

How to Convert StrictJSON Schema to Pydantic

from strictjson import convert_schema_to_pydantic

output_format = {"name": "Name of birthday party, str",
                 "date": "Any date in Mar 2026, date",
                 "participants": [{'Name': 'starting with A, str', 
                                   'Age': 'between 5 to 12, int'}]}

# This is preferred, can be used for Gemini models
pydantic_model = convert_schema_to_pydantic(output_format)

# use this code if you need json schema input for the LLM
json_schema = pydantic_model.model_json_schema()

  • Pass the Pydantic model generated by convert_schema_to_pydantic to your LLM provider's Structured Output format
  • If required, you can also call model_json_schema() to convert the Pydantic model into a JSON Schema
  • Note: For the OpenAI API, use convert_schema_to_openai_pydantic instead. The OpenAI API does not accept the Any or dict datatypes, so this function converts them to str and List[] accordingly

Default StrictJSON integration with LLMs

We offer default multimodal integrations for OpenAI and Gemini models that accept the StrictJSON schema (openai_sync, openai_async, gemini_sync, gemini_async). If you would like support for more models, let us know on Discord (John's AI Group on Discord).

Sample Text Input

Code:

from strictjson.llm import openai_sync

kwargs = {"model": "gpt-5.1"} # any extra parameters for the llm

res = openai_sync(
    system_prompt = "You are a helpful assistant",
    user_prompt = "Generate a birthday event for Alex",
    output_format = {"name": "Name of birthday party, str",
                     "date": "Any date in Mar 2026, date",
                     "participants": [{'Name': 'starting with A, str',
                                       'Age': 'between 5 to 12, int'}]},
    **kwargs)

Output:

{'name': "Alex's Birthday Party",
 'date': datetime.date(2026, 3, 15),
 'participants': [{'Name': 'Alice', 'Age': 10},
  {'Name': 'Adam', 'Age': 8},
  {'Name': 'Ava', 'Age': 9}]}

Sample Multimodal Input (place your filepath or url inside << >> in user_prompt)

ai_electricity.png:

[Image: AI Electricity]

Code:

from strictjson.llm import gemini_async

kwargs = {"model": "gemini-2.5-flash"} # any extra parameters for the llm

# Pass the image filepath inside << >> so the model can view the image.
# Top-level await works in a notebook; in a plain script, call this from inside an async function.
res = await gemini_async(
    system_prompt = "Output all text on the image and give an overall description",
    user_prompt = "Image: <<ai_electricity.png>>",
    output_format = {"Image_Text": "Output word for word all the text on the image, str",
                     "Image_Description": "Describe the overall image, str"},
    **kwargs)

Output:

{'Image_Text': '\'AI IS THE NEW ELECTRICITY\'\n"Just as electricity transformed almost everything 100 years ago, today I actually have a hard time thinking of an industry that I don\'t think AI will transform in the next several years."\n\nAndrew Ng\nFormer chief scientist at Baidu, Co-founder at Coursera.',
 'Image_Description': 'The image features a quote attributed to Andrew Ng, surrounded by a blue background. In the center is a circular photograph of a man smiling, presumably Andrew Ng. The quote discusses the transformative impact of AI, comparing it to the historical transformation brought by electricity.'}
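
The `<< >>` convention above can be illustrated with a small sketch. This is not the library's own parser; the names `PLACEHOLDER`, `find_media_references`, and `is_url` are hypothetical, and the code only shows how file paths or URLs wrapped in `<< >>` could be pulled out of a prompt.

```python
import re

# Matches anything wrapped in << >>; non-greedy so several placeholders can coexist.
PLACEHOLDER = re.compile(r"<<(.+?)>>")


def find_media_references(user_prompt: str) -> list[str]:
    """Return every filepath or URL wrapped in << >>, in order of appearance."""
    return [ref.strip() for ref in PLACEHOLDER.findall(user_prompt)]


def is_url(reference: str) -> bool:
    """Distinguish remote URLs from local file paths."""
    return reference.startswith(("http://", "https://"))


refs = find_media_references("Compare <<a.png>> with <<https://example.com/b.jpg>>")
for ref in refs:
    print(ref, "-> url" if is_url(ref) else "-> local file")
```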

StrictJSON Schema Definition

  • A StrictJSON Schema is a dictionary: the keys are kept as they are, and each value is replaced with suitable content generated by the LLM
  • Values are of the format <description>, <datatype>
  • Values can also be <description>, type: <datatype>
  • Values can also be <datatype> - there will be no description
  • Values can also be <description> - datatype will then be defaulted to Any

StrictJSON Datatype

You can use simple data types like these:

| Type | Example Value (in Python) | Meaning / Notes |
|---|---|---|
| str | "Hello" | A string of text. |
| int | 5 | A whole number (integer). |
| float | 3.14 | A floating-point number (decimal). |
| bool | True / False | A boolean value: yes/no, true/false. |
| List[int] | [1, 2, 3] | A list (array-like) of integers. From typing. |
| Dict[str, Any] | {"a": 1, "b": 2} | A dictionary (key–value pairs). From typing. |
| date | date(2025, 5, 9) | A calendar date (from datetime import date). |
| datetime | datetime(2025, 5, 9, 14, 30) | A date and time (from datetime import datetime). |
| UUID | UUID("550e8400-e29b-41d4-a716-446655440000") | A universally unique identifier (from uuid import UUID). |
| Decimal | Decimal("12.50") | An exact decimal number (from decimal import Decimal). |
| None | None | The null or "no value" object in Python. |

The datatype Any accepts any datatype. If you do not define a datatype in a value field, it defaults to Any.

All of the datatypes in StrictJSON Schema are similar to Python type hints.

If you write list or dict without brackets, we interpret them as List[Any] and Dict[str, Any].

If you write List[{'key1': 'str', 'key2': 'int'}], we ensure that the output is a list of dictionaries, each with the given key names and the given output type for each key. This is useful when you need a list of items that all follow a particular schema.

You can constrain a field to a fixed set of values using Enum:

Enum['A','B','C'] — limits output to one of these values. (Note: Use Enum instead of the Literal type hint in Python)
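
To make the constraint concrete, here is a minimal sketch of how an `Enum[...]` annotation string could be turned into a Python `Enum` that rejects values outside the allowed set. It is an assumption about one possible approach, not a description of how StrictJSON implements it; `enum_from_annotation` is a hypothetical helper.

```python
import ast
from enum import Enum


def enum_from_annotation(annotation: str, name: str = "Choice") -> type[Enum]:
    """Build an Enum class from a string such as "Enum['A','B','C']"."""
    if not annotation.startswith("Enum["):
        raise ValueError(f"not an Enum annotation: {annotation!r}")
    # Drop the leading "Enum" and parse the bracketed list literal safely.
    values = ast.literal_eval(annotation[len("Enum"):])
    return Enum(name, {str(value): value for value in values})


Choice = enum_from_annotation("Enum['A','B','C']")
print(Choice("B"))  # a valid member

try:
    Choice("D")  # not one of the allowed values
except ValueError as exc:
    print("rejected:", exc)
```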

You can use Optional for fields that can return null values, e.g. Optional[str] returns either a string or a null value.

You can use Union for fields that can return more than one datatype, e.g. Union[str, int] returns either a string or integer.

We also support PEP 604 syntax using |:

  • str | int → Union[str, int]
  • str | None → Optional[str]

List[...] and Dict[...] are type hints — the actual runtime values are list and dict. Likewise, date, datetime, UUID, and Decimal appear as proper Python objects in the result.
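
As a rough illustration of what "proper Python objects" means, the sketch below converts text values into `date`, `datetime`, `UUID`, and `Decimal` using only the standard library. The `raw` dictionary is hypothetical sample data, and the conversion is done by hand here; it does not show StrictJSON's internals.

```python
from datetime import date, datetime
from decimal import Decimal
from uuid import UUID

# Hypothetical raw text values, as they might appear in an LLM reply.
raw = {
    "day": "2025-05-09",
    "moment": "2025-05-09T14:30:00",
    "id": "550e8400-e29b-41d4-a716-446655440000",
    "price": "12.50",
}

# Convert each text value into the matching Python object.
typed = {
    "day": date.fromisoformat(raw["day"]),
    "moment": datetime.fromisoformat(raw["moment"]),
    "id": UUID(raw["id"]),
    "price": Decimal(raw["price"]),
}

# Typed values support real operations, unlike plain strings.
days_left = (date(2025, 5, 31) - typed["day"]).days
total = typed["price"] * 2
print(days_left, total, typed["moment"].hour)
```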


StrictJSON's in-built Structured Output Parser for any LLM: parse_yaml

parse_yaml helps you get clean, structured output from LLMs. It tells the LLM the required output format and data types, to ensure strict conformity to the output format.

Old versions (e.g. strict_json) that used JSON still work, but parse_yaml is the new and improved method.

What Makes It Useful

  • YAML structure avoids the need for backslash escaping or heavy bracket use, simplifying output and parsing compared to JSON.
  • Shorter output context size with YAML than JSON.
  • Creates / uses a Pydantic model for type checking, ensuring robustness of output.
  • Fixes mistakes automatically (tries again up to three times by default — configurable via num_tries).
  • Works with any LLM model (ChatGPT, Claude, Gemini, etc.) with adequate YAML understanding capabilities.
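
The retry behaviour can be sketched with a generic validate-and-retry loop. This is a simplified illustration under stated assumptions, not parse_yaml's implementation: it uses JSON instead of YAML to stay within the standard library, checks types with `isinstance` instead of a Pydantic model, and `parse_with_retries` and `make_flaky_llm` are hypothetical names. Only the `num_tries` parameter name comes from the text above.

```python
import json
from typing import Callable


def parse_with_retries(llm: Callable[[str], str], prompt: str,
                       required_types: dict[str, type], num_tries: int = 3) -> dict:
    """Ask the LLM, validate the reply, and retry with the error fed back."""
    error = ""
    for _ in range(num_tries):
        # On a retry, append the previous error so the model can correct itself.
        full_prompt = prompt if not error else f"{prompt}\nPrevious error: {error}"
        raw = llm(full_prompt)
        try:
            data = json.loads(raw)  # raises json.JSONDecodeError, a ValueError subclass
            if not isinstance(data, dict):
                raise ValueError("expected a JSON object")
            for key, expected in required_types.items():
                if key not in data:
                    raise ValueError(f"missing key {key!r}")
                if not isinstance(data[key], expected):
                    raise ValueError(f"{key!r} should be {expected.__name__}")
            return data
        except ValueError as exc:
            error = str(exc)
    raise ValueError(f"no valid output after {num_tries} tries: {error}")


def make_flaky_llm() -> Callable[[str], str]:
    """Stand-in LLM that fails twice before giving a valid reply."""
    replies = iter(["not json", '{"title": 42}', '{"title": "Adaptive Tutors"}'])
    return lambda prompt: next(replies)


result = parse_with_retries(make_flaky_llm(), "Give a blog title", {"title": str})
print(result)
```

Feeding the validation error back into the next prompt gives the model a concrete hint about what to fix, which is the general idea behind automatic correction.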

Tutorial

Basics: Teaches how to create LLM functions that take system_prompt and user_prompt as input parameters and return a string. Also covers both the sync and async modes of parse_yaml.

Multimodal Input: Teaches how to use the decorators image_parser and image_parser_async to let the LLM view images.

Quick Example

Here’s a simple program that asks the AI for a blog post idea:

from openai import OpenAI
from strictjson import parse_yaml

def llm(system_prompt: str, user_prompt: str, **kwargs) -> str:
    """Call the OpenAI Chat Completions API and return the reply text."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_prompt})
    # Extra kwargs are accepted but not used in this example.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        messages=messages
    )
    return resp.choices[0].message.content

result = parse_yaml(
  system_prompt="You are a friendly assistant.",
  user_prompt="Write a blog post idea about AI in education",
  output_format={
    "title": "str",
    "tags": "List[str]",
    "published": "bool"
  },
  llm=llm
)

print(result)
# {'title': 'Adaptive Tutors', 'tags': ['ai', 'edtech', 'personalization'], 'published': False}

Project Info

About

A Structured Output Framework for LLM Outputs
