How to Improve Time and Date Recognition in the Real-Time API?
I’m using the GPT-4o Mini real-time API for a product that helps users book and reschedule appointments. However, I’ve noticed that the API frequently misinterprets dates and times, even when customers state them clearly.
Are there any effective prompting strategies or other methods to improve its accuracy in recognizing dates and times?
Just provide the current date and time in the prompt. Phrases like “next Friday” are interpreted correctly in my app using 4o-mini with this technique.
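As a rough sketch of what that can look like, the snippet below builds an instruction string that states the date, weekday, and time explicitly. The function name `build_instructions` and the exact wording are assumptions for illustration; how the string is passed to the session depends on your client code and is not shown.

```python
from datetime import datetime, timezone


def build_instructions(now: datetime) -> str:
    """Return instructions that spell out the current date, weekday, and time.

    Hypothetical helper: the wording is only an example, not a required format.
    """
    return (
        "You are a scheduling assistant. "
        f"Today is {now.strftime('%A, %B %d, %Y')}. "
        f"The current time is {now.strftime('%H:%M')} ({now.tzname()}). "
        "Resolve relative dates such as 'next Friday' against this date."
    )


# A fixed timestamp is used here so the result is reproducible;
# in the app you would pass the real current time instead.
example_now = datetime(2025, 3, 14, 9, 30, tzinfo=timezone.utc)
instructions = build_instructions(example_now)
print(instructions)
```

Including the weekday name saves the model from having to work it out from the numeric date.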
Voice output generated from voice input reproduces data less reliably than text does. The model is trained to produce the sound of a voice, and the actual content sits a layer below that.
The typical advice is to give system messages such as “session start time and date {datetime}, maximum session length 30 minutes” so the model keeps a continuing sense of time. When a text chat can be revisited later, the app can go further by adding a “latest user input” time marker, or even by timestamping every message.
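A minimal sketch of both ideas, assuming you assemble the message text yourself before sending it. The helper names `session_header` and `stamp_message` and the ISO timestamp format are my own choices, not anything the API requires.

```python
from datetime import datetime, timezone


def session_header(start: datetime, max_minutes: int = 30) -> str:
    """Build the system line that anchors the session in time."""
    return (
        f"Session start time and date: {start.isoformat(timespec='minutes')}, "
        f"maximum session length {max_minutes} minutes."
    )


def stamp_message(text: str, at: datetime) -> str:
    """Prefix a user message with the time it was received."""
    return f"[{at.isoformat(timespec='minutes')}] {text}"


start = datetime(2025, 3, 14, 9, 30, tzinfo=timezone.utc)
later = datetime(2025, 3, 14, 9, 42, tzinfo=timezone.utc)

print(session_header(start))
print(stamp_message("Can I move my appointment to next Friday?", later))
```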
What I think could help is having the date produced in writing before it is spoken. For a scheduling application, calculation tools are useful for “next Friday”, “next week”, “calendar_output”, etc., so that the model gets valid dates back instead of computing them itself. Having the AI put those dates in writing and see them will also help reproduction a bit, but you still face the translation of that information into accurate audio.
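Here is one way such a calculation tool could look on the application side. It is only a sketch: the function name `resolve_weekday` and the returned fields are hypothetical, and it treats “next Friday” as the first Friday strictly after today, which is one of two common readings, so adjust the rule to match how your users speak. Registering it as a tool with the model is left out.

```python
from datetime import date, timedelta

WEEKDAYS = [
    "monday", "tuesday", "wednesday", "thursday",
    "friday", "saturday", "sunday",
]


def resolve_weekday(name: str, today: date) -> dict:
    """Return the first occurrence of the named weekday strictly after today.

    Raises ValueError if the name is not a weekday.
    """
    target = WEEKDAYS.index(name.strip().lower())
    # Days until the target weekday, always in the range 1..7.
    days_ahead = (target - today.weekday() - 1) % 7 + 1
    resolved = today + timedelta(days=days_ahead)
    return {
        # Machine-readable form for the booking system.
        "iso": resolved.isoformat(),
        # Written-out form the model can see before it speaks the date.
        "spoken": resolved.strftime("%A, %B %d, %Y"),
    }


today = date(2025, 3, 14)  # a Friday
print(resolve_weekday("Friday", today))
print(resolve_weekday("monday", today))
```

Returning the written-out form alongside the ISO date gives the model the date in text to read back, rather than leaving it to reconstruct the date from audio.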
Thanks. I have a set of prompts that tell the AI the current date, what this Friday is, and what next Friday is. The annoying thing is that when, for example, the user says 3/20th, GPT will respond with something like “OK, I see you want to book on 3/19th. Is that correct?” The user says, “No, it’s 3/20th.” GPT still does not get it and just keeps assuming it’s 3/19th.
Examine the full context of your conversation. It might not be what you’re expecting. It’s very easy to have something in a prompt or an amendment somewhere that confuses the smaller model. Remember that it has no reasoning, so your instructions must be linear.
Is it possible it’s due to the u-law codec and the Twilio integration? In the same test with the Chrome API (speech to text → text to speech), it recognizes dates and times perfectly.