Reasoning best practices
Learn when to use reasoning models and how they compare to GPT models.
OpenAI offers two types of models: reasoning
models (o1 and o3-mini, for example) and GPT
models (like GPT-4o). These model families behave
differently.
This guide covers:
1. The difference between our reasoning and non-reasoning GPT models
2. When to use our reasoning models
3. How to prompt reasoning models effectively
Reasoning models vs. GPT models
Compared to GPT models, our o-series models
excel at different tasks and require different
prompts. One model family isn't better than the
other—they're just different.
We trained our o-series models (“the planners”) to
think longer and harder about complex tasks,
making them effective at strategizing, planning
solutions to complex problems, and making
decisions based on large volumes of ambiguous
information. These models can also execute tasks
with high accuracy and precision, making them
ideal for domains that would otherwise require a
human expert—like math, science, engineering,
financial services, and legal services.
On the other hand, our lower-latency, more cost-
efficient GPT models (“the workhorses”) are
designed for straightforward execution. An
application might use o-series models to plan out
the strategy to solve a problem, and use GPT
models to execute specific tasks, particularly when
speed and cost are more important than perfect
accuracy.
How to choose
What's most important for your use case?
Speed and cost → GPT models are faster and tend to cost less
Executing well-defined tasks → GPT models handle explicitly defined tasks well
Accuracy and reliability → o-series models are reliable decision makers
Complex problem-solving → o-series models work through ambiguity and complexity
If speed and cost are the most important factors when completing your tasks, and your use case is made up of straightforward, well-defined tasks, then our GPT models are the best fit for you. However, if accuracy and reliability are the most important factors and you have a very complex, multi-step problem to solve, our o-series models are likely right for you.
Most AI workflows will use a combination of both
models—o-series for agentic planning and decision-
making, GPT series for task execution.
For example, our GPT-4o and GPT-4o mini models can triage order details against customer information, identify the order issues and the applicable return policy, and then feed all of these data points into o3-mini to make the final decision about the viability of the return based on that policy.
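As a rough sketch of that split, here is how the returns example might look with the OpenAI Python SDK. The model names are real, but the prompts and the order data are hypothetical placeholders:

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical order data; in a real workflow this would come from your systems.
order_details = "Order #1234: wireless headphones, delivered Jan 15, opened box, buyer requests refund."

# Step 1: a fast, cost-efficient GPT model triages and extracts the key facts.
triage = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Extract the order issue and the relevant return policy points as a short bullet list."},
        {"role": "user", "content": order_details},
    ],
)

# Step 2: a reasoning model makes the final policy decision on those facts.
decision = client.chat.completions.create(
    model="o3-mini",
    messages=[
        {
            "role": "user",
            "content": "Based on these facts, decide whether this return is viable under policy, and say why:\n\n"
            + triage.choices[0].message.content,
        }
    ],
)
print(decision.choices[0].message.content)
```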
When to use our reasoning models
Here are a few patterns of successful usage that
we’ve observed from customers and internally at
OpenAI. This isn't a comprehensive review of all
possible use cases but, rather, some practical
guidance for testing our o-series models.
Ready to use a reasoning model? Skip to the
quickstart →
1. Navigating ambiguous tasks
Reasoning models are particularly good at taking limited or disparate pieces of information and, with only a simple prompt, understanding the user’s intent and handling any gaps in the instructions. In fact, reasoning models will often ask clarifying questions rather than making uneducated guesses or attempting to fill information gaps.
“o1’s reasoning capabilities enable our multi-
agent platform Matrix to produce exhaustive,
well-formatted, and detailed responses when
processing complex documents. For example,
o1 enabled Matrix to easily identify baskets
available under the restricted payments
capacity in a credit agreement, with a basic
prompt. No former models are as performant.
o1 yielded stronger results on 52% of complex
prompts on dense Credit Agreements
compared to other models.”
—Hebbia, AI knowledge platform company for
legal and finance
2. Finding a needle in a haystack
When you’re passing in large amounts of unstructured information, reasoning models are great at understanding it and pulling out only the most relevant details to answer a question.
"To analyze a company's acquisition, o1
reviewed dozens of company documents—like
contracts and leases—to find any tricky
conditions that might affect the deal. The
model was tasked with flagging key terms and
in doing so, identified a crucial "change of
control" provision in the footnotes: if the
company was sold, it would have to pay off a
$75 million loan immediately. o1's extreme
attention to detail enables our AI agents to
support finance professionals by identifying
mission-critical information."
—Endex, AI financial intelligence platform
3. Finding relationships and nuance across a large dataset
We’ve found that reasoning models are particularly good at reasoning over complex documents that have hundreds of pages of dense, unstructured information—things like legal contracts, financial statements, and insurance claims. The models are especially strong at drawing parallels between documents and making decisions based on unspoken truths represented in the data.
“Tax research requires synthesizing multiple
documents to produce a final, cogent answer.
We swapped GPT-4o for o1 and found that o1
was much better at reasoning over the
interplay between documents to reach logical
conclusions that were not evident in any one
single document. As a result, we saw a 4x
improvement in end-to-end performance by
switching to o1—incredible.”
—Blue J, AI platform for tax research
Reasoning models are also skilled at reasoning over
nuanced policies and rules, and applying them to
the task at hand in order to reach a reasonable
conclusion.
"In financial analyses, analysts often tackle
complex scenarios around shareholder equity
and need to understand the relevant legal
intricacies. We tested about 10 models from
different providers with a challenging but
common question: how does a fundraise affect
existing shareholders, especially when they
exercise their anti-dilution privileges? This
required reasoning through pre- and post-
money valuations and dealing with circular
dilution loops—something top financial
analysts would spend 20-30 minutes to figure
out. We found that o1 and o3-mini can do this
flawlessly! The models even produced a clear
calculation table showing the impact on a
$100k shareholder."
–BlueFlame AI, AI platform for investment
management
4. Multi-step agentic planning
Reasoning models are critical to agentic planning
and strategy development. We’ve seen success
when a reasoning model is used as “the planner,”
producing a detailed, multi-step solution to a
problem and then selecting and assigning the right
GPT model (“the doer”) for each step, based on
whether high intelligence or low latency is most
important.
“We use o1 as the planner in our agent
infrastructure, letting it orchestrate other
models in the workflow to complete a multi-
step task. We find o1 is really good at selecting
data types and breaking down big questions
into smaller chunks, enabling other models to
focus on execution.”
—Argon AI, AI knowledge platform for the
pharmaceutical industry
“o1 powers many of our agentic workflows at
Lindy, our AI assistant for work. The model
uses function calling to pull information from
your calendar or email and then can
automatically help you schedule meetings,
send emails, and manage other parts of your
day-to-day tasks. We switched all of our
agentic steps that used to cause issues to o1
and observed our agents become basically flawless overnight!”
—Lindy.AI, AI assistant for work
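Here is a minimal sketch of the planner/doer pattern described above, assuming the OpenAI Python SDK. The task, the plain-text plan format, and the step-execution prompt are hypothetical simplifications:

```python
from openai import OpenAI

client = OpenAI()

task = "Summarize our three largest vendor contracts and flag renewal risks."  # hypothetical task

# The reasoning model ("the planner") produces a multi-step plan.
plan = client.chat.completions.create(
    model="o1",
    messages=[
        {"role": "user", "content": "Break this task into a short numbered list of small, concrete steps, one per line:\n" + task}
    ],
)

# A GPT model ("the doer") executes each step, where speed and cost matter most.
for step in plan.choices[0].message.content.splitlines():
    if not step.strip():
        continue  # skip blank lines in the plan
    result = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Complete this step and return only the result:\n" + step}],
    )
    print(step, "->", result.choices[0].message.content)
```

A production planner would more likely return structured output (for example JSON) and route each step to whichever model best matches its difficulty; the loop above sends every step to GPT-4o for simplicity.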
5. Visual reasoning
As of today, o1 is the only reasoning model that
supports vision capabilities. What sets it apart from
GPT-4o is that o1 can grasp even the most
challenging visuals, like charts and tables with
ambiguous structure or photos with poor image
quality.
“We automate risk and compliance reviews for
millions of products online, including luxury
jewelry dupes, endangered species, and
controlled substances. GPT-4o reached 50%
accuracy on our hardest image classification
tasks. o1 achieved an impressive 88% accuracy
without any modifications to our pipeline.”
—SafetyKit, AI-powered risk and compliance
platform
From our own internal testing, we’ve seen that o1
can identify fixtures and materials from highly
detailed architectural drawings to generate a
comprehensive bill of materials. One of the most
surprising things we observed was that o1 can draw
parallels across different images by taking a legend
on one page of the architectural drawings and
correctly applying it across another page without
explicit instructions. For example, for the 4×4 PT wood posts, o1 recognized that "PT" stands for pressure treated based on the legend.
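For reference, image inputs reach o1 through the same chat interface as text. Here is a minimal sketch with the OpenAI Python SDK, where the image URL and question are placeholders:

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="o1",  # currently the only reasoning model with vision support
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "List the fixtures and materials called out in this drawing."},
                # Placeholder URL; base64 data URLs work here as well.
                {"type": "image_url", "image_url": {"url": "https://example.com/architectural-drawing.png"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```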
6. Reviewing, debugging, and improving code quality
Reasoning models are particularly effective at reviewing and improving large amounts of code. Given the models’ higher latency, code reviews are often run in the background.
“We deliver automated AI Code Reviews on
platforms like GitHub and GitLab. While the code review process is not inherently latency-sensitive, it does require understanding the
code diffs across multiple files. This is where o1
really shines—it's able to reliably detect minor
changes to a codebase that could be missed
by a human reviewer. We were able to increase
product conversion rates by 3x after switching
to o-series models.”
—CodeRabbit, AI code review startup
While GPT-4o and GPT-4o mini, with their lower latency, may be a better fit for writing code, we’ve also seen o3-mini perform strongly on code production for use cases that are slightly less latency-sensitive.
“o3-mini consistently produces high-quality,
conclusive code, and very frequently arrives at
the correct solution when the problem is well-
defined, even for very challenging coding tasks.
While other models may only be useful for
small-scale, quick code iterations, o3-mini
excels at planning and executing complex
software design systems.”
—Windsurf, collaborative agentic AI-powered
IDE, built by Codeium
7. Evaluation and benchmarking for other model responses
We’ve also seen reasoning models do well in
benchmarking and evaluating other model
responses. Data validation is important for ensuring
dataset quality and reliability, especially in sensitive
fields like healthcare. Traditional validation methods
use predefined rules and patterns, but advanced
models like o1 and o3-mini can understand context
and reason about data for a more flexible and
intelligent approach to validation.
"Many customers use LLM-as-a-judge as part
of their eval process in Braintrust. For example,
a healthcare company might summarize
patient questions using a workhorse model like
gpt-4o, then assess the summary quality with
o1. One Braintrust customer saw the F1 score of
a judge go from 0.12 with 4o to 0.74 with o1! In
these use cases, they’ve found o1’s reasoning
to be a game-changer in finding nuanced
differences in completions, for the hardest and
most complex grading tasks."
—Braintrust, AI evals platform
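A minimal LLM-as-a-judge sketch in that spirit, assuming the OpenAI Python SDK; the source text, the summary being graded, and the 1-5 rubric are all hypothetical choices:

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical source text and a model-generated summary to grade.
source = "Patient asks whether ibuprofen is safe to take alongside their current blood-pressure medication."
summary = "Patient question: possible interaction between ibuprofen and an existing prescription."

# The reasoning model acts as the judge, grading the summary against the source.
judgment = client.chat.completions.create(
    model="o1",
    messages=[
        {
            "role": "user",
            "content": (
                "Rate how faithfully the summary captures the source on a 1-5 scale, "
                "then justify the score in one sentence.\n\n"
                f"<source>{source}</source>\n<summary>{summary}</summary>"
            ),
        }
    ],
)
print(judgment.choices[0].message.content)
```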
How to prompt reasoning models effectively
These models perform best with straightforward
prompts. Some prompt engineering techniques, like
instructing the model to "think step by step," may
not enhance performance (and can sometimes
hinder it). See best practices below, or get started
with prompt examples.
Developer messages are the new system messages: Starting with o1-2024-12-17, reasoning models support developer messages rather than system messages, to align with the chain of command behavior described in the model spec. (See the sketch after this list.)
Keep prompts simple and direct: The models
excel at understanding and responding to brief,
clear instructions.
Avoid chain-of-thought prompts: Since these
models perform reasoning internally, prompting
them to "think step by step" or "explain your
reasoning" is unnecessary.
Use delimiters for clarity: Use delimiters like
markdown, XML tags, and section titles to
clearly indicate distinct parts of the input,
helping the model interpret different sections
appropriately.
Try zero-shot first, then few-shot if needed:
Reasoning models often don't need few-shot
examples to produce good results, so try to
write prompts without examples first. If you
have more complex requirements for your
desired output, it may help to include a few
examples of inputs and desired outputs in your
prompt. Just ensure that the examples align
very closely with your prompt instructions, as
discrepancies between the two may produce
poor results.
Provide specific guidelines: If there are ways
you explicitly want to constrain the model's
response (like "propose a solution with a
budget under $500"), explicitly outline those
constraints in the prompt.
Be very specific about your end goal: In your
instructions, try to give very specific
parameters for a successful response, and
encourage the model to keep reasoning and
iterating until it matches your success criteria.
Markdown formatting: Starting with o1-2024-12-17, reasoning models in the API will avoid generating responses with markdown formatting. To signal to the model when you do want markdown formatting in the response, include the string Formatting re-enabled on the first line of your developer message, as shown in the sketch below.
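Pulling several of these practices together (a developer message, delimiters, and the Formatting re-enabled string), here is a minimal sketch with the OpenAI Python SDK. The contract-review prompt and the XML tag names are illustrative only:

```python
from openai import OpenAI

client = OpenAI()

# "Formatting re-enabled" on the first line opts back in to markdown output;
# the rest of the developer message stays short and direct.
developer_message = (
    "Formatting re-enabled\n"
    "You review customer contracts. Flag any clause that allows early termination, "
    "and keep the summary under 200 words."
)

# XML-style delimiters separate the document from the question.
user_message = (
    "<contract>\n"
    "...contract text here...\n"
    "</contract>\n"
    "<question>Which clauses allow early termination?</question>"
)

response = client.chat.completions.create(
    model="o3-mini",  # reasoning models from o1-2024-12-17 onward accept developer messages
    messages=[
        {"role": "developer", "content": developer_message},
        {"role": "user", "content": user_message},
    ],
)
print(response.choices[0].message.content)
```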
Other resources
For more inspiration, visit the OpenAI Cookbook,
which contains example code and links to third-
party resources, or learn more about our models
and reasoning capabilities:
Meet the models
Reasoning guide
How to use reasoning for validation
Video course: Reasoning with o1
Papers on advanced prompting to improve
reasoning