Ensure JSON response format

Hello,

I often want the API to send back a response that is something I can parse programmatically. For example, I want it to summarize a passage of text, providing comments, where each comment is a member of a JSON array.

It’s frustrating. Sometimes the response will have “JSON:” prepended to it. Sometimes it won’t. Sometimes there will be some other sort of preamble explaining that it’s JSON.

A pretty common feature of APIs is the ability to specify a response format. It would make things much more usable for developers here.

8 Likes

Have you tried asking in the prompt for certain parts of the response to be JSON? For a similar problem with math expressions, asking for LaTeX is working fairly well. (ref)

It also seems that the more ChatGPT is asked to use LaTeX for math expressions, the more it does so automatically and gets it correct, but that could be my wishful thinking. :slightly_smiling_face:

1 Like

Yes, I tell it to format the response as JSON.

:slightly_frowning_face:

Have you tried adding a few-shot example?

Some things you might try:

Provide your answer in JSON form. Reply with only the answer in JSON form and include no other commentary:

Another thing you might try adding to the very end of your prompt is \n```json to indicate the beginning of a JSON object in markdown format.
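
Put together, a minimal sketch of those two suggestions might look like the following. The message shape matches the Chat Completions API; the system prompt, example passage, and example answer are placeholders of my own, not anything prescribed above.

````typescript
// Sketch only: combine a few-shot example with the trailing ```json hint.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

const messages: ChatMessage[] = [
  {
    role: "system",
    content:
      "You summarize a passage as comments. Reply with only a JSON array of strings and no other commentary.",
  },
  // One few-shot example showing exactly the shape you expect back.
  { role: "user", content: "Passage: The quick brown fox jumps over the lazy dog." },
  {
    role: "assistant",
    content: '["Describes a fox jumping over a dog.", "Uses every letter of the alphabet."]',
  },
  // The real request, ending with the markdown cue that a JSON block follows.
  { role: "user", content: "Passage: <your text here>\n```json" },
];
````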

6 Likes

Telling the model what to do (prompting) alone gives only around a 60% success rate, which is poor reliability. Give a few examples of your output style and you can improve that to around 90%. If you need 100% reliability in achieving JSON output and wish to parse it error-free, use two API calls: the first one gets the text, and the second call formats that text as JSON. If you get a parse error, you loop the generated text back into the second API call; the loop only ends when the JSON parses without error. This is one way I have used to ensure reliability when passing the text to the front end and to avoid errors.
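
Roughly, that loop might look like the sketch below. It calls the Chat Completions endpoint with plain fetch (Node 18+); the model name, prompt wording, and the retry cap I added as a safety net are my own choices, not part of the approach described above.

```typescript
async function chat(messages: { role: string; content: string }[]): Promise<string> {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({ model: "gpt-3.5-turbo", messages }),
  });
  const data = await res.json();
  return data.choices[0].message.content as string;
}

async function summarizeAsJson(passage: string, maxRetries = 3): Promise<unknown> {
  // Call 1: get the plain-text answer.
  const text = await chat([
    { role: "user", content: `Summarize this passage as a list of comments:\n${passage}` },
  ]);

  // Call 2, looped: reformat that text as JSON until it parses cleanly.
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const candidate = await chat([
      {
        role: "user",
        content: `Convert the following into a JSON array of strings. Reply with only the JSON:\n${text}`,
      },
    ]);
    try {
      return JSON.parse(candidate); // only exit the loop when parsing succeeds
    } catch {
      // Parse error: loop the text back into another formatting call.
    }
  }
  throw new Error("Could not get parseable JSON after retries");
}
```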

5 Likes

As mentioned, a few-shot example works wonders. That said, depending on how large and nested your JSON object is, you may still continuously run into issues.

You would have a better time if you were to train a model, or simplify it with token classification and use that to logically create your object.

1 Like

@RonaldGRuckus, that is interesting. I could train my own custom model to understand JSON output, and then use that model going forward. Is that possible with the chat endpoint? Seems like it’s not, or not yet:

https://platform.openai.com/docs/api-reference/chat/create#chat/create-model

What do you mean by this:

simplify it with token classification

The API response is actually already in JSON format; it just so happens that the completion text is a string value inside that JSON.

I think it is more reliable to parse this text value and convert it to JSON after the completion, versus trying to prompt a stochastic language model to always output reliable JSON data.
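
A small sketch of that “parse after the fact” idea, pulling whatever JSON the completion text happens to contain out of the surrounding preamble (the helper name and the regex are mine, just for illustration):

```typescript
function extractJson(completionText: string): unknown {
  // Slice from the first bracket/brace to the last one, ignoring any
  // preamble like "JSON:", then let JSON.parse decide whether it is valid.
  const start = completionText.search(/[[{]/);
  const end = Math.max(completionText.lastIndexOf("]"), completionText.lastIndexOf("}"));
  if (start === -1 || end <= start) {
    throw new Error("No JSON found in completion text");
  }
  return JSON.parse(completionText.slice(start, end + 1));
}

// Example: handles 'JSON: ["first comment", "second comment"]' as well as a bare array.
console.log(extractJson('JSON: ["first comment", "second comment"]'));
```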

FWIW,

:slight_smile:

I think you may be right. I’ve had more consistent luck asking it to respond with a numbered list, with each list item on its own line. That’s easy enough to parse into JSON with a little string hackery.
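
For what it’s worth, the string hackery can stay pretty small. A sketch, with an invented reply for illustration:

```typescript
function numberedListToArray(reply: string): string[] {
  return reply
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => /^\d+[.)]\s+/.test(line)) // keep only "1. ..." / "2) ..." lines
    .map((line) => line.replace(/^\d+[.)]\s+/, "")); // drop the numbering itself
}

const reply =
  "1. The author introduces the topic.\n2. A counterargument is raised.\n3. The conclusion restates the thesis.";
console.log(JSON.stringify(numberedListToArray(reply), null, 2));
```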

1 Like

For those who may have noticed the earlier link for this was broken, it has now been corrected.

I have had reasonable success with asking ChatGPT to create/improve the questions that you ask it, e.g. “Can you modify the question I asked you so that next time I ask it you will…”, then, for example, “… provide the answer in JSON format”. You can also ask ChatGPT to compress the question. Here is an example I achieved using this technique. I wanted ChatGPT to provide topics and weightings for any input phrase I gave it, with the answer in the form of a JSON object. The resulting question was…

:memo:: Yesterday I bought myself a teapot. :dart:: Topics/keywords/people/places/things/dates/times. :dart:: 10+ topics. :pray: Output: JSON hash {topic: % weight} only. :mute:NoExplanation

The bit about “buying a teapot” is an example of the phrase I want it to categorize. The output I get is consistently just pure JSON e.g.
{
  "teapot": 0.8,
  "yesterday": 0.4,
  "buying": 0.3,
  "self": 0.3,
  "household items": 0.2,
  "kitchenware": 0.2,
  "shopping": 0.2,
  "personal purchase": 0.2,
  "home goods": 0.1,
  "retail": 0.1,
  "April 16, 2023": 0.1
}

1 Like

I personally found few-shot examples and an English description of the schema to be unreliable for my use case. Specifying the schema as a TypeScript interface finally did the trick for me. You can see some examples at ResuLLMe/__init__.py at main · IvanIsCoding/ResuLLMe · GitHub

In general, I recommend the following prompt style:

You are going to write a JSON <insert the rest of your instructions>

Now consider the following TypeScript Interface for the JSON schema:

interface MessageItem {
    receiver: string;
    sender: string;
    content: string;
}

interface Message {
   messages: MessageItem[]; 
}

Write the basics section according to the Message schema. 
On the response, include only the JSON.

1 Like

TypeScript is the way to go. I also find using zod works really well too, because you can validate the outputs at runtime and, if there is an error, feed it back in and let GPT reflect on its mistake.
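
For anyone curious what that looks like in practice, here is a minimal sketch of the zod-plus-reflection idea. The schema mirrors the Message interface from the earlier post; the chat callback, retry count, and correction prompts are assumptions of mine, not anything prescribed in this thread.

```typescript
import { z } from "zod";

// Runtime schema mirroring the TypeScript interface shown above.
const MessageItem = z.object({
  receiver: z.string(),
  sender: z.string(),
  content: z.string(),
});
const Message = z.object({ messages: z.array(MessageItem) });

async function getValidatedMessage(
  chat: (prompt: string) => Promise<string>, // your own call to the chat API
  prompt: string,
  retries = 2
): Promise<z.infer<typeof Message>> {
  let currentPrompt = prompt;
  for (let attempt = 0; attempt <= retries; attempt++) {
    const reply = await chat(currentPrompt);
    try {
      const result = Message.safeParse(JSON.parse(reply));
      if (result.success) return result.data;
      // Feed the validation errors back so the model can reflect on its mistake.
      currentPrompt = `${prompt}\n\nYour previous reply did not match the schema:\n${result.error.message}\nCorrect it and reply with only the JSON.`;
    } catch (e) {
      currentPrompt = `${prompt}\n\nYour previous reply was not valid JSON (${(e as Error).message}). Reply with only the JSON.`;
    }
  }
  throw new Error("Failed to get schema-valid JSON after retries");
}
```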

How are you expressing line breaks in your code when providing your prompt? Currently I just have “\r\n” inside a string, and it does seem to work. But it’s confusing, because obviously if I just log the request, I don’t see a new line, I see the literal “\r\n”.

You can try my system prompt: GitHub - rlogank/chatgpt-controller

Thank you. This guide really works for me.

1 Like

I use this technique regularly

You just need to ask it.

I use a prompt along the lines of

# How to respond to this prompt
- Your response MUST be a JSON array
- No other text, just the JSON array please

You’ll probably want to strip out the ```json header and ``` footer before parsing the response
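
Something like this is usually enough for the stripping (a sketch; the regexes only cover the common fenced wrapper):

````typescript
function stripJsonFences(reply: string): string {
  return reply
    .trim()
    .replace(/^```(?:json)?\s*/i, "") // leading ```json or ``` header
    .replace(/```\s*$/, "")           // trailing ``` footer
    .trim();
}

const raw = '```json\n["point one", "point two"]\n```';
const parsed = JSON.parse(stripJsonFences(raw)); // ["point one", "point two"]
````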

GPT-3.5 in particular struggles not to throw these Markdown bits in

But beyond that… it generally just works fine

P.S. I also find adding an example response can help a lot if the structure you’re looking for is complex

I always put the requirements in both the system and user messages.

2 Likes

From my personal experience, outputting a JSON array is less stable than outputting a JSON object, so instead of ["a","b","c"], I always ask the model to output {"0": "a", "1": "b", "2": "c"}, which is more reliable.
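
If you need an actual array on your side afterwards, converting that shape back is straightforward. A sketch, with made-up values:

```typescript
function numberedObjectToArray(obj: Record<string, string>): string[] {
  return Object.keys(obj)
    .sort((a, b) => Number(a) - Number(b)) // keep the model's intended order
    .map((key) => obj[key]);
}

console.log(numberedObjectToArray({ "0": "a", "1": "b", "2": "c" })); // ["a", "b", "c"]
```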