Best practice scanned PDF / What model to use?

JorenDeb · February 17, 2025, 9:53am

Hi all

I’m new to the OpenAI API. I’ve written a (backoffice) application which uploads documents (mainly pdf) to OpenAI to extract data.

All works perfectly, but i’m struggling with scanned pdf’s. What the best practice?

I can do OCR before sending the file to ChatGPT and make a searchable pdf
I can make an image of each pdf page and upload those using the vision API calls. I tried this but the chat then asks to upload the document, so I guess im doing something wrong here.
I can extract the text from the pdf and send it to the API instead of the file, but i’m worried about the results if i do that. (Position of the data,…)
I read making a html file of the pdf does the trick. Anyone van verify?

Additional questions:

anyone knows how the data extraction works on secured pdf files? Like the security which makes you can’t extract a page for example.
whats the best model to use? I’m now using gpt4o-mini and results are fine. But i’ve read gpt4o is cheaper for the vision calls?

Alot of questions. Hopefully alot of answers too I have read alot of it but the API has changed a lot recently it seems so it’s hard to find the right answers online. Community to the rescue?

Thanks!

udm17 · February 17, 2025, 12:01pm

I’ve not worked with pdf’s specifically before but using gpt4o has worked well with images for me so far.

My suggestion would be to do it one pdf at a time. For each page in the pdf, get the image and encode it into base64. Then add all the images in order of page number into a message object that you can send to 4o.

Just be careful about the amount of images and tokens you are sending in one message to ensure they dont cross the limits.

The prompt in the begining of the messages object can have your task description init with the developer role.

This should do the trick for you

JorenDeb · February 17, 2025, 1:33pm

Thanks for your answer.
I used the Assistant API so far. Guess you use the chat completion one?

udm17 · February 19, 2025, 6:45am

Yeah. While I like the concept of the Assistant API, it just feels inflexible most of the times to me.

It’s quite possible to create a assistant like architecture using tool calls and message trails which I prefer

Topic		Replies	Views
What is the best way to parse a PDF file with ChatGPT? API	9	46377	November 16, 2024
Process scanned pdfs through api API gpt-4 , chatgpt , api , pdf , ocr	2	712	December 12, 2024
Scanned pdf with API and ask questions API chatgpt , api	3	1101	October 15, 2024
OCR using API for text extraction API api	9	8491	December 18, 2024
Best practices for PDF parsing with Assistants API and file_search tool API assistants-api	6	632	March 4, 2025

Best practice scanned PDF / What model to use?

Related topics