
fix(mistral): preserve diarization segments in transcription response#23925

Merged
Chesars merged 1 commit into BerriAI:litellm_oss_staging_03_17_2026 from Chesars:fix/mistral-diarize-segments-response
Mar 18, 2026


@Chesars
Contributor

@Chesars Chesars commented Mar 18, 2026

Relevant issues

Fixes #23890

Pre-Submission checklist

  • I have added testing in the tests/test_litellm/ directory. Adding at least 1 test is a hard requirement.
  • My PR passes all unit tests with `make test-unit`
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Type

🐛 Bug Fix

Changes

When using Mistral's Voxtral model with `diarize=true`, the API returns `segments` (each with a `speaker_id` and `start`/`end` timestamps) and a `language` field. Both were dropped in `transform_audio_transcription_response`, which only extracted `text`.

Now `segments` and `language` are preserved on the `TranscriptionResponse` object, matching the pattern already used by other providers such as Deepgram and ElevenLabs.

Fixes BerriAI#23890: Mistral's Voxtral transcription with `diarize=true` returns
`segments` (with `speaker_id` and timestamps) and `language`, but these fields
were dropped when mapping the response to `TranscriptionResponse`.
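The change amounts to copying two optional keys from the raw JSON onto the response object. Below is a minimal, self-contained sketch of that idea, not the merged code: `TranscriptionResponseStub` is a hypothetical stand-in for litellm's `TranscriptionResponse` that only mimics the dict-style `response["key"] = value` assignment, and `transform_transcription_response` is a hypothetical name. It assumes a field is copied whenever its key is present, so an explicit `"language": null` comes through as `None`, while non-diarized responses gain no new attributes.

```python
from typing import Any, Dict


class TranscriptionResponseStub:
    """Hypothetical stand-in for litellm's TranscriptionResponse.

    Supports attribute access (resp.text) and dict-style item
    assignment (resp["segments"] = ...), the access pattern the PR uses.
    """

    def __init__(self, text: str) -> None:
        self.text = text

    def __setitem__(self, key: str, value: Any) -> None:
        setattr(self, key, value)

    def __getitem__(self, key: str) -> Any:
        return getattr(self, key)


# Extra fields a diarized Voxtral response may carry besides "text".
PRESERVED_FIELDS = ("segments", "language")


def transform_transcription_response(raw: Dict[str, Any]) -> TranscriptionResponseStub:
    """Build the response from raw JSON and keep diarization fields.

    Assumption: a field is copied whenever its key is present, so an
    explicit null value is preserved as None; absent keys are skipped.
    """
    response = TranscriptionResponseStub(text=raw.get("text", ""))
    for key in PRESERVED_FIELDS:
        if key in raw:
            response[key] = raw[key]
    return response
```

Because the fields are only set when the provider sends them, callers that may receive non-diarized responses should read them defensively, for example `getattr(resp, "segments", None)`.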
@vercel

vercel bot commented Mar 18, 2026

The latest updates on your projects.

Project: litellm | Deployment: Ready | Actions: Preview, Comment | Updated: Mar 18, 2026 2:07am (UTC)


@greptile-apps
Contributor

greptile-apps bot commented Mar 18, 2026

Greptile Summary

This PR fixes a data-loss bug where Mistral Voxtral's diarization fields (segments and language) were silently dropped during transform_audio_transcription_response, leaving users unable to access speaker-attributed transcript segments.

Changes:

  • litellm/llms/mistral/audio_transcription/transformation.py: After constructing the base TranscriptionResponse, the fix conditionally copies segments and language from the raw API response JSON to the top-level response object using dictionary-style attribute assignment (response["key"] = value).
  • tests/test_litellm/llms/mistral/audio_transcription/test_mistral_audio_transcription_transformation.py: Adds test_mistral_audio_transcription_response_transform_diarized, a fully mocked unit test verifying that both segments (with speaker_id, start, end) and language (including None) survive the transformation.
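The new test is described as fully mocked. The sketch below illustrates that testing approach with only the standard library; it is not the actual `test_mistral_audio_transcription_response_transform_diarized`. `transform_from_http_response` and `check_diarized_fields_survive` are hypothetical helpers, and `unittest.mock.MagicMock` stands in for the HTTP response so that `.json()` returns a canned diarized payload without any network call. It checks both a detected language and an explicit `None`.

```python
from types import SimpleNamespace
from typing import Any
from unittest.mock import MagicMock


def transform_from_http_response(raw_response: Any) -> SimpleNamespace:
    """Hypothetical transform: read JSON from a response-like object
    (anything with a .json() method) and keep segments/language."""
    payload = raw_response.json()
    result = SimpleNamespace(text=payload.get("text", ""))
    for key in ("segments", "language"):
        if key in payload:  # copy on key presence, so None survives
            setattr(result, key, payload[key])
    return result


def check_diarized_fields_survive(language: Any) -> None:
    """Mocked-response check: no network, just a canned payload."""
    mock_response = MagicMock()
    mock_response.json.return_value = {
        "text": "Hello. Hi there.",
        "language": language,
        "segments": [
            {"speaker_id": "speaker_0", "start": 0.0, "end": 1.1},
            {"speaker_id": "speaker_1", "start": 1.1, "end": 2.4},
        ],
    }
    result = transform_from_http_response(mock_response)
    # Speaker attribution and timestamps come through unchanged.
    assert [s["speaker_id"] for s in result.segments] == ["speaker_0", "speaker_1"]
    assert result.segments[1]["start"] == 1.1
    assert result.segments[1]["end"] == 2.4
    assert result.language == language
    mock_response.json.assert_called_once()


# Cover both a detected language and an explicit null.
for lang in ("en", None):
    check_diarized_fields_survive(lang)
```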

The implementation correctly follows the pattern already used by Deepgram (response["language"], response["words"]) and ElevenLabs (response["language"]), and no real network calls are made in the new test.

Confidence Score: 5/5

  • This PR is safe to merge — it is a small, additive, non-breaking change with appropriate mock test coverage.
  • The change is minimal (7 lines), follows an already-established pattern used by Deepgram and ElevenLabs, introduces no breaking changes (segments/language are only surfaced when present in the response), and is fully covered by a mock unit test. No hardcoded model flags, no FastAPI imports, no DB calls, and no backwards-incompatible behaviour changes.
  • No files require special attention.

Important Files Changed

  • litellm/llms/mistral/audio_transcription/transformation.py: Adds preservation of the segments and language fields from Mistral's diarized transcription API response, following the same pattern used by Deepgram and ElevenLabs.
  • tests/test_litellm/llms/mistral/audio_transcription/test_mistral_audio_transcription_transformation.py: Adds a mock-only unit test, test_mistral_audio_transcription_response_transform_diarized, that verifies segments and language are preserved in the transformed response when diarization is active.

Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant LiteLLM
    participant MistralAPI

    Client->>LiteLLM: transcription(model="mistral/voxtral-mini-latest", diarize=True)
    LiteLLM->>MistralAPI: POST /v1/audio/transcriptions (form: diarize=true)
    MistralAPI-->>LiteLLM: { text, language, segments: [{speaker_id, start, end, ...}], usage }
    Note over LiteLLM: transform_audio_transcription_response()<br/>Extract text → TranscriptionResponse<br/>Preserve segments → response["segments"]<br/>Preserve language → response["language"]<br/>Store full JSON → _hidden_params
    LiteLLM-->>Client: TranscriptionResponse(text, segments, language)
```
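Once `segments` reach the client, they can be turned into a speaker-attributed transcript. The sketch below merges consecutive segments from the same speaker into turns. It assumes segments are plain dicts with `speaker_id`, `start` and `end` (as in the diagram) plus a per-segment `text` key; that last key is an assumption, since the diagram elides the remaining fields as `...`. `to_speaker_turns` and `format_turns` are hypothetical helper names.

```python
from typing import Any, Dict, List, Tuple

# (speaker_id, start, end, text)
Turn = Tuple[str, float, float, str]


def to_speaker_turns(segments: List[Dict[str, Any]]) -> List[Turn]:
    """Merge consecutive segments from the same speaker into one turn.

    Assumption: each segment may carry a "text" key; missing text is
    treated as an empty string. Segments are sorted by start time first.
    """
    turns: List[Turn] = []
    for seg in sorted(segments, key=lambda s: s["start"]):
        speaker = seg["speaker_id"]
        text = seg.get("text", "").strip()
        if turns and turns[-1][0] == speaker:
            # Same speaker as the previous turn: extend its end time and text.
            prev_speaker, start, _, prev_text = turns[-1]
            turns[-1] = (prev_speaker, start, seg["end"], f"{prev_text} {text}".strip())
        else:
            turns.append((speaker, seg["start"], seg["end"], text))
    return turns


def format_turns(turns: List[Turn]) -> str:
    """Render one "[start-end] speaker: text" line per turn."""
    return "\n".join(
        f"[{start:.1f}-{end:.1f}] {speaker}: {text}"
        for speaker, start, end, text in turns
    )
```

For example, with `segments = getattr(response, "segments", None) or []`, calling `print(format_turns(to_speaker_turns(segments)))` prints one line per speaker turn.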

Last reviewed commit: "fix(mistral): preser..."

@Chesars Chesars merged commit f059ba5 into BerriAI:litellm_oss_staging_03_17_2026 Mar 18, 2026
5 checks passed
