fix(azure-ai-agents): preserve annotations in Bing Search grounding responses #23939
Conversation
Azure AI Agents with Grounding (e.g., Bing Search) include annotations (citation URLs) in responses, but the handler was dropping them during transformation. This fix:

- Extracts annotations from text content in agent responses
- Transforms them to OpenAI-compatible ChatCompletionAnnotation format
- Passes annotations through all completion paths (sync, async, streaming)
- Handles both polling and SSE streaming responses

Fixes BerriAI#19126

Co-Authored-By: Claude Haiku 4.5 <[email protected]>
… in streaming

- Fix bug where only the last text item's annotations were preserved when thread.message.completed contained multiple text content items
- Accumulate annotations via extend() instead of overwriting
- Add test_azure_ai_agents_streaming_annotations_from_completed_message
- Add test_azure_ai_agents_streaming_accumulates_annotations_from_multiple_text_items

Addresses Greptile review on PR BerriAI#23849

Made-with: Cursor
Greptile Summary

This PR fixes a bug where Azure AI Foundry Agents with Bing Search grounding were silently dropping citation annotations from their responses. The fix introduces an annotation extraction and transformation pipeline in the handler.

Key changes:
Notes:
Confidence Score: 4/5
| Filename | Overview |
|---|---|
| litellm/llms/azure_ai/agents/handler.py | Core fix: adds annotation extraction and transformation pipeline. _extract_content_from_messages now returns (content, annotations), _transform_annotations maps Azure format to OpenAI-compatible format, and _build_model_response/streaming path attach annotations to the response. The streaming path correctly accumulates annotations from all text content blocks; the non-streaming path still returns early on the first text block (previously flagged). Logic is otherwise sound. |
| tests/llm_translation/test_azure_agents.py | Adds 5 new tests covering annotation extraction, transformation, model response building (with and without annotations), and streaming annotation accumulation (including multi-block accumulation). All tests use mocks only (no real network calls). The streaming test is missing start_index/end_index assertions in the url_citation sub-object (previously flagged). |
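The index remapping that `_transform_annotations` performs (per the handler overview above: moving `start_index`/`end_index` into the `url_citation` sub-object) can be sketched as follows. This is an illustrative reconstruction, not the PR's exact code; the standalone function name and dict shapes are assumptions based on the review.

```python
from typing import Any, Dict, List, Optional


def transform_annotations(
    raw_annotations: Optional[List[Dict[str, Any]]],
) -> Optional[List[Dict[str, Any]]]:
    """Sketch: map Azure AI agent annotations to OpenAI-style
    url_citation annotations, relocating start_index/end_index
    from the top level into the url_citation sub-object."""
    if not raw_annotations:
        return None
    transformed: List[Dict[str, Any]] = []
    for ann in raw_annotations:
        citation = ann.get("url_citation", {})
        transformed.append(
            {
                "type": "url_citation",
                "url_citation": {
                    "url": citation.get("url", ""),
                    "title": citation.get("title", ""),
                    # indices moved in from the top-level annotation
                    "start_index": ann.get("start_index"),
                    "end_index": ann.get("end_index"),
                },
            }
        )
    return transformed
```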
Sequence Diagram
```mermaid
sequenceDiagram
    participant C as Caller
    participant H as AzureAIAgentsHandler
    participant A as Azure AI Agent API
    C->>H: completion() / acompletion()
    H->>A: POST /threads (create thread)
    A-->>H: {id: thread_id}
    H->>A: POST /threads/{id}/messages
    A-->>H: 200 OK
    H->>A: POST /threads/{id}/runs
    A-->>H: {id: run_id}
    loop Poll until completed
        H->>A: GET /threads/{id}/runs/{run_id}
        A-->>H: {status: "..."}
    end
    H->>A: GET /threads/{id}/messages
    A-->>H: {data: [{role: "assistant", content: [{type: "text", text: {value, annotations}}]}]}
    H->>H: _extract_content_from_messages() → (content, annotations)
    H->>H: _transform_annotations() maps Azure→OpenAI format
    H->>H: _build_model_response(annotations=annotations)
    H-->>C: ModelResponse with Message.annotations
    Note over H,A: Streaming path (SSE)
    C->>H: acompletion_stream()
    H->>A: POST /threads/runs (stream=True)
    loop SSE events
        A-->>H: event: thread.created / data: {id}
        A-->>H: event: thread.message.delta / data: {delta.content}
        H-->>C: ModelResponseStream chunk (content)
        A-->>H: event: thread.message.completed / data: {content[].text.annotations}
        H->>H: collect_annotations.extend(_transform_annotations(...))
        A-->>H: data: [DONE]
        H-->>C: Final chunk with Delta.annotations = collected_annotations
    end
```
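The non-streaming leg of the diagram (create thread, post message, start run, poll, then fetch messages) can be sketched as below. The `client` object and the helper name are hypothetical stand-ins for the handler's HTTP layer; only the endpoint paths come from the diagram.

```python
import time
from typing import Any, Dict


def run_agent_and_fetch_messages(client: Any, agent_id: str, prompt: str) -> Dict[str, Any]:
    """Hypothetical sketch of the polling path shown in the sequence diagram."""
    thread = client.post("/threads")  # -> {id: thread_id}
    client.post(
        f"/threads/{thread['id']}/messages",
        json={"role": "user", "content": prompt},
    )
    run = client.post(f"/threads/{thread['id']}/runs", json={"agent_id": agent_id})
    # Poll the run until it reaches a terminal status
    while True:
        status = client.get(f"/threads/{thread['id']}/runs/{run['id']}")["status"]
        if status in ("completed", "failed", "cancelled", "expired"):
            break
        time.sleep(1)
    # Message content items carry text.value and text.annotations
    return client.get(f"/threads/{thread['id']}/messages")
```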
Last reviewed commit: "Update litellm/llms/..."
```diff
         for msg in messages_data.get("data", []):
             if msg.get("role") == "assistant":
                 for content_item in msg.get("content", []):
                     if content_item.get("type") == "text":
-                        return content_item.get("text", {}).get("value", "")
-        return ""
+                        text_obj = content_item.get("text", {})
+                        content = text_obj.get("value", "")
+                        raw_annotations = text_obj.get("annotations")
+                        annotations = self._transform_annotations(
+                            raw_annotations
+                        )
+                        return content, annotations
```
Non-streaming path only captures annotations from the first text content item

`_extract_content_from_messages` returns immediately upon finding the first `type == "text"` content item, so if an assistant message contains multiple text blocks, annotations (and content) from all subsequent blocks are silently dropped.

The streaming path (`_process_sse_stream`) handles this correctly by looping over all content items and accumulating all annotations via `collected_annotations.extend(transformed)`. The non-streaming path should mirror that approach to be consistent.
Consider accumulating all text values and all annotations instead of returning early:
```python
def _extract_content_from_messages(
    self, messages_data: dict
) -> Tuple[str, Optional[List[Dict[str, Any]]]]:
    for msg in messages_data.get("data", []):
        if msg.get("role") == "assistant":
            content_parts: List[str] = []
            all_annotations: List[Dict[str, Any]] = []
            for content_item in msg.get("content", []):
                if content_item.get("type") == "text":
                    text_obj = content_item.get("text", {})
                    content_parts.append(text_obj.get("value", ""))
                    transformed = self._transform_annotations(
                        text_obj.get("annotations")
                    )
                    if transformed:
                        all_annotations.extend(transformed)
            if content_parts:
                return "".join(content_parts), all_annotations or None
    return "", None
```

`assert ann["url_citation"]["title"] == "Citation Source"`
Missing assertion on `start_index`/`end_index` in streaming annotation test

`test_azure_ai_agents_streaming_annotations_from_completed_message` verifies `url` and `title` inside `url_citation`, but does not assert that `start_index` (12) and `end_index` (15) were also moved into the `url_citation` sub-object by `_transform_annotations`. Adding those assertions would give full coverage of the index-remapping logic for the streaming path, matching what `test_azure_ai_agents_extract_content_with_annotations` already covers for the non-streaming path.
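A minimal sketch of the assertions the review asks for, assuming the annotation dict has already had its indices remapped into `url_citation` (the title and index values come from the review comment; the dict shape is an assumption):

```python
def assert_streaming_annotation(ann: dict) -> None:
    """Sketch of the extra assertions suggested for the streaming test."""
    citation = ann["url_citation"]
    assert citation["title"] == "Citation Source"
    # The two assertions the review flags as missing
    # (verifying the index remapping):
    assert citation["start_index"] == 12
    assert citation["end_index"] == 15
```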
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

Merged commit 0673c57 into BerriAI:litellm_dev_sameer_16_march_week
Relevant issues
Fixes #19126
Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR:

- I have added a relevant test in the `tests/test_litellm/` directory (adding at least 1 test is a hard requirement - see details)
- I have run `make test-unit`
- I have reviewed the PR with `@greptileai` and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?
If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).
CI (LiteLLM team)
Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:
Type
🐛 Bug Fix
Changes
When using Azure AI Foundry Agents with Grounding (e.g., Bing Search), the upstream agent response includes `annotations` (citation URLs) in the message content. The LiteLLM handler was silently dropping these because `_extract_content_from_messages` only returned the text string, ignoring the `annotations` field.

Changes:

- `_extract_content_from_messages` now returns `(content, annotations)` instead of just `content`
- New `_transform_annotations` method maps the Azure AI annotation format to OpenAI-compatible `ChatCompletionAnnotation` (moves `start_index`/`end_index` into the `url_citation` sub-object)
- `_build_model_response` accepts and passes `annotations` to the `Message` object
- The streaming path (`_process_sse_stream`) extracts annotations from `thread.message.completed` SSE events and includes them in the final chunk
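The accumulate-rather-than-overwrite behavior from the last change above can be sketched as below. The helper name is illustrative; in the real handler each raw annotation list would first pass through `_transform_annotations` before being collected.

```python
from typing import Any, Dict, List


def collect_annotations_from_completed_event(
    event_data: Dict[str, Any],
    collected_annotations: List[Dict[str, Any]],
) -> None:
    """Sketch: gather annotations from every text content block of a
    thread.message.completed SSE event."""
    for content_item in event_data.get("content", []):
        if content_item.get("type") != "text":
            continue
        raw = content_item.get("text", {}).get("annotations") or []
        # extend() so annotations from blocks after the first
        # are appended instead of replacing earlier ones
        collected_annotations.extend(raw)
```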