Depending on what structure you are opting for, you might want to take a look at this thread:
A few Forum members, myself included, discussed and worked out a solution for semantically chunking a document using GPT-4-turbo. In essence, the approach uses GPT-4-turbo to create an outline of the document (including the start and end positions of the individual sections) and then uses that information to programmatically extract the text verbatim from the document into structured JSON.
The benefit of this approach is that you only need one API call to get the document’s basic structure and you don’t have to worry about output token constraints. It also saves a lot of cost compared to asking the model to return the text verbatim. That said, the approach is currently most applicable to documents with clearly defined sections.
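To make the second step concrete, here is a minimal sketch of the programmatic extraction, assuming the model was asked to return its outline as JSON with character offsets. The `extract_sections` helper and the hard-coded `outline` are illustrative stand-ins for what a real GPT-4-turbo response might contain:

```python
import json

def extract_sections(document: str, outline: list[dict]) -> list[dict]:
    # Slice the original document verbatim using the model-provided offsets,
    # so no output tokens are spent reproducing the text itself.
    return [
        {"title": s["title"], "text": document[s["start"]:s["end"]]}
        for s in outline
    ]

# In practice, the outline would come from a single GPT-4-turbo call asked to
# return JSON like [{"title": ..., "start": <offset>, "end": <offset>}, ...].
# It is hard-coded here so the sketch runs stand-alone.
document = "1. Intro\nSome intro text.\n2. Methods\nSome methods text."
outline = [
    {"title": "Intro", "start": 0, "end": 25},
    {"title": "Methods", "start": 25, "end": len(document)},
]

chunks = extract_sections(document, outline)
print(json.dumps(chunks, indent=2))
```

Because the text is sliced locally, the extracted chunks are guaranteed to match the source verbatim; the model only has to get the offsets right.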