Hi everyone,
Does anyone know how OpenAI actually produces the JSON output once we define a schema inside a function/tool definition through the API?
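For context, here's roughly what I mean, a minimal sketch using the official `openai` Python SDK. The `get_weather` function, its parameters, and the model name are all made-up placeholders:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical tool: the name, description, and schema are placeholders.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

# The arguments come back as a JSON string; my question is how the API
# makes sure this string conforms to the schema defined above.
print(response.choices[0].message.tool_calls[0].function.arguments)
```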
Does the same LLM that generates the response also handle the JSON formatting? If so, has anyone measured whether using JSON mode affects response quality? If a single model is handling both tasks (answering the prompt and adhering to the schema instructions), I'd imagine it could affect the model's self-attention and potentially change its performance.
It wouldn't be too hard to test: I could set up a coding experiment or math question and compare responses in normal mode versus JSON mode (rough sketch below). But I'm curious whether anyone has already run such experiments. I'm particularly interested in non-coding, non-math questions; it's just that for those it's harder to measure differences and mistakes.
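Something like this is what I have in mind. The model name and the single question are placeholders, and note one confound I can't avoid: JSON mode requires the word "JSON" to appear somewhere in the prompt, so the two prompts can't be perfectly identical:

```python
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"  # placeholder; use whatever model you're testing

question = (
    "A train travels 60 km in 45 minutes. "
    "At the same speed, how far does it travel in 2 hours?"
)

# Normal mode: plain-text answer.
plain = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": question + " Give only the final number."}],
)

# JSON mode: same question, but the API requires the prompt to mention JSON.
json_mode = client.chat.completions.create(
    model=MODEL,
    response_format={"type": "json_object"},
    messages=[{
        "role": "user",
        "content": question + ' Respond in JSON as {"answer": <number>}.',
    }],
)

print("plain:", plain.choices[0].message.content)
print("json :", json_mode.choices[0].message.content)
# A real experiment would loop over many questions, parse both answers,
# and compare accuracy across the two modes.
```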
The alternative would be that a separate LLM or program post-processes the initial model response and converts it into JSON. If that were the case, we'd expect output quality to stay the same while latency increases (which does seem to happen with JSON responses; a quick way to check is sketched below). On top of that, this setup would likely be more costly for OpenAI, since a single request would require two inference passes, so I doubt they'd want that.
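The latency part, at least, is easy to measure directly. Something like this (same placeholder model as above) would give a rough per-mode average:

```python
import time
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"  # placeholder

def timed_call(**kwargs):
    """Return wall-clock seconds for one chat completion."""
    start = time.perf_counter()
    client.chat.completions.create(model=MODEL, **kwargs)
    return time.perf_counter() - start

prompt = 'List three countries. Respond in JSON as {"countries": [...]}.'

for label, extra in [
    ("plain", {}),
    ("json_mode", {"response_format": {"type": "json_object"}}),
]:
    runs = [
        timed_call(messages=[{"role": "user", "content": prompt}], **extra)
        for _ in range(5)
    ]
    print(f"{label}: avg {sum(runs) / len(runs):.2f}s over {len(runs)} runs")
```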
Also, is it just me, or does this whole thing feel a bit unscientific? Features get released, sometimes they work, sometimes they don't, and the inner workings stay secret, which makes it hard for companies to decide whether to adopt them. That doesn't exactly encourage businesses to integrate LLMs into their systems.