We’re trying to use the Assistants API for a module that needs two languages, say English and French. Given a paragraph that mixes English and French sentences or words, we want to segment the text by language so that each segment can be sent to a TTS API with the correct language specified.
We tried multiple prompts like the following:
When you respond with text containing mixed English and French, separate the segments based on language. For each segment, wrap it in markup Welcome</> and Bienvenue</>
Example:
Input: “Welcome to our website. Bienvenue sur notre site Web.”
Output: Welcome to our website.</> Bienvenue sur notre site Web.</>
—
However, it doesn’t segment the text well when the segments are short, even with gpt-4-0125-preview. For example:
- Bonjour - Hello
- Merci - Thank you
- Oui - Yes
- Non - No
- Au revoir - Goodbye
just comes out as (all inside a French tag):
1. Bonjour - Hello 2. Merci - Thank you 3. Oui - Yes 4. Non - No 5. Au revoir - Goodbye </>
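For context, here is a minimal sketch of how tagged output could be split into per-language segments before calling TTS. The tag names `<en>` and `<fr>` are assumptions made for illustration, and `parse_segments` is a hypothetical helper, not part of any API.

```python
import re
from typing import List, Tuple

# Assumed tag names (<en>, <fr>); adjust to whatever markup the prompt requests.
SEGMENT_RE = re.compile(r"<(en|fr)>(.*?)</\1>", re.DOTALL)


def parse_segments(tagged: str) -> List[Tuple[str, str]]:
    """Return (language, text) pairs in the order they appear."""
    return [(m.group(1), m.group(2).strip()) for m in SEGMENT_RE.finditer(tagged)]


tagged_output = (
    "<en>Welcome to our website.</en> "
    "<fr>Bienvenue sur notre site Web.</fr>"
)

for lang, text in parse_segments(tagged_output):
    # Each pair would be passed to the TTS call with its language code.
    print(lang, text)
```

With this kind of parser, a response that puts the whole word list inside one tag yields a single segment, so every word would be spoken in the same language.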
ChatGPT may not be the best way to do this. Does anyone know a better way?