fix(azure-ai-agents): preserve annotations in Bing Search grounding responses #23939
Conversation
Azure AI Agents with Grounding (e.g., Bing Search) include annotations (citation URLs) in responses, but the handler was dropping them during transformation. This fix:

- Extracts annotations from text content in agent responses
- Transforms them to OpenAI-compatible ChatCompletionAnnotation format
- Passes annotations through all completion paths (sync, async, streaming)
- Handles both polling and SSE streaming responses

Fixes BerriAI#19126

Co-Authored-By: Claude Haiku 4.5 <[email protected]>
… in streaming

- Fix bug where only the last text item's annotations were preserved when thread.message.completed contained multiple text content items
- Accumulate annotations via extend() instead of overwriting
- Add test_azure_ai_agents_streaming_annotations_from_completed_message
- Add test_azure_ai_agents_streaming_accumulates_annotations_from_multiple_text_items

Addresses Greptile review on PR BerriAI#23849

Made-with: Cursor
Greptile Summary

This PR fixes a bug where Azure AI Foundry Agents with Bing Search grounding were silently dropping citation annotations from their responses. The fix introduces an annotation extraction and transformation pipeline in the handler.

Key changes:
Notes:
Confidence Score: 4/5
| Filename | Overview |
|---|---|
| litellm/llms/azure_ai/agents/handler.py | Core fix: adds annotation extraction and transformation pipeline. _extract_content_from_messages now returns (content, annotations), _transform_annotations maps Azure format to OpenAI-compatible format, and _build_model_response/streaming path attach annotations to the response. The streaming path correctly accumulates annotations from all text content blocks; the non-streaming path still returns early on the first text block (previously flagged). Logic is otherwise sound. |
| tests/llm_translation/test_azure_agents.py | Adds 5 new tests covering annotation extraction, transformation, model response building (with and without annotations), and streaming annotation accumulation (including multi-block accumulation). All tests use mocks only (no real network calls). The streaming test is missing start_index/end_index assertions in the url_citation sub-object (previously flagged). |
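The index remapping that `_transform_annotations` performs (per the handler overview above: moving `start_index`/`end_index` into the `url_citation` sub-object) can be sketched as follows. This is an illustrative reconstruction, not the PR's exact code; the standalone function name and dict shapes are assumptions based on the review.

```python
from typing import Any, Dict, List, Optional


def transform_annotations(
    raw_annotations: Optional[List[Dict[str, Any]]],
) -> Optional[List[Dict[str, Any]]]:
    """Sketch: map Azure AI agent annotations to OpenAI-style
    url_citation annotations, relocating start_index/end_index
    from the top level into the url_citation sub-object."""
    if not raw_annotations:
        return None
    transformed: List[Dict[str, Any]] = []
    for ann in raw_annotations:
        citation = ann.get("url_citation", {})
        transformed.append(
            {
                "type": "url_citation",
                "url_citation": {
                    "url": citation.get("url", ""),
                    "title": citation.get("title", ""),
                    # indices moved in from the top-level annotation
                    "start_index": ann.get("start_index"),
                    "end_index": ann.get("end_index"),
                },
            }
        )
    return transformed
```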
Sequence Diagram
```mermaid
sequenceDiagram
    participant C as Caller
    participant H as AzureAIAgentsHandler
    participant A as Azure AI Agent API
    C->>H: completion() / acompletion()
    H->>A: POST /threads (create thread)
    A-->>H: {id: thread_id}
    H->>A: POST /threads/{id}/messages
    A-->>H: 200 OK
    H->>A: POST /threads/{id}/runs
    A-->>H: {id: run_id}
    loop Poll until completed
        H->>A: GET /threads/{id}/runs/{run_id}
        A-->>H: {status: "..."}
    end
    H->>A: GET /threads/{id}/messages
    A-->>H: {data: [{role: "assistant", content: [{type: "text", text: {value, annotations}}]}]}
    H->>H: _extract_content_from_messages() → (content, annotations)
    H->>H: _transform_annotations() maps Azure→OpenAI format
    H->>H: _build_model_response(annotations=annotations)
    H-->>C: ModelResponse with Message.annotations
    Note over H,A: Streaming path (SSE)
    C->>H: acompletion_stream()
    H->>A: POST /threads/runs (stream=True)
    loop SSE events
        A-->>H: event: thread.created / data: {id}
        A-->>H: event: thread.message.delta / data: {delta.content}
        H-->>C: ModelResponseStream chunk (content)
        A-->>H: event: thread.message.completed / data: {content[].text.annotations}
        H->>H: collect_annotations.extend(_transform_annotations(...))
        A-->>H: data: [DONE]
        H-->>C: Final chunk with Delta.annotations = collected_annotations
    end
```
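The non-streaming leg of the diagram (create thread, post message, start run, poll, then fetch messages) can be sketched as below. The `client` object and the helper name are hypothetical stand-ins for the handler's HTTP layer; only the endpoint paths come from the diagram.

```python
import time
from typing import Any, Dict


def run_agent_and_fetch_messages(client: Any, agent_id: str, prompt: str) -> Dict[str, Any]:
    """Hypothetical sketch of the polling path shown in the sequence diagram."""
    thread = client.post("/threads")  # -> {id: thread_id}
    client.post(
        f"/threads/{thread['id']}/messages",
        json={"role": "user", "content": prompt},
    )
    run = client.post(f"/threads/{thread['id']}/runs", json={"agent_id": agent_id})
    # Poll the run until it reaches a terminal status
    while True:
        status = client.get(f"/threads/{thread['id']}/runs/{run['id']}")["status"]
        if status in ("completed", "failed", "cancelled", "expired"):
            break
        time.sleep(1)
    # Message content items carry text.value and text.annotations
    return client.get(f"/threads/{thread['id']}/messages")
```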
Last reviewed commit: "Update litellm/llms/..."
```diff
         for msg in messages_data.get("data", []):
             if msg.get("role") == "assistant":
                 for content_item in msg.get("content", []):
                     if content_item.get("type") == "text":
-                        return content_item.get("text", {}).get("value", "")
-        return ""
+                        text_obj = content_item.get("text", {})
+                        content = text_obj.get("value", "")
+                        raw_annotations = text_obj.get("annotations")
+                        annotations = self._transform_annotations(
+                            raw_annotations
+                        )
+                        return content, annotations
```
Non-streaming path only captures annotations from the first text content item

`_extract_content_from_messages` returns immediately upon finding the first `type == "text"` content item, so if an assistant message contains multiple text blocks, annotations (and content) from all subsequent blocks are silently dropped.

The streaming path (`_process_sse_stream`) handles this correctly by looping over all content items and accumulating all annotations via `collected_annotations.extend(transformed)`. The non-streaming path should mirror that approach to be consistent.
Consider accumulating all text values and all annotations instead of returning early:
```python
def _extract_content_from_messages(
    self, messages_data: dict
) -> Tuple[str, Optional[List[Dict[str, Any]]]]:
    for msg in messages_data.get("data", []):
        if msg.get("role") == "assistant":
            content_parts: List[str] = []
            all_annotations: List[Dict[str, Any]] = []
            for content_item in msg.get("content", []):
                if content_item.get("type") == "text":
                    text_obj = content_item.get("text", {})
                    content_parts.append(text_obj.get("value", ""))
                    transformed = self._transform_annotations(
                        text_obj.get("annotations")
                    )
                    if transformed:
                        all_annotations.extend(transformed)
            if content_parts:
                return "".join(content_parts), all_annotations or None
    return "", None
```

`assert ann["url_citation"]["title"] == "Citation Source"`
Missing assertion on `start_index`/`end_index` in streaming annotation test

`test_azure_ai_agents_streaming_annotations_from_completed_message` verifies `url` and `title` inside `url_citation`, but does not assert that `start_index` (12) and `end_index` (15) were also moved into the `url_citation` sub-object by `_transform_annotations`. Adding those assertions would give full coverage of the index-remapping logic for the streaming path, matching what `test_azure_ai_agents_extract_content_with_annotations` already covers for the non-streaming path.
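A minimal sketch of the assertions the review asks for, assuming the annotation dict has already had its indices remapped into `url_citation` (the title and index values come from the review comment; the dict shape is an assumption):

```python
def assert_streaming_annotation(ann: dict) -> None:
    """Sketch of the extra assertions suggested for the streaming test."""
    citation = ann["url_citation"]
    assert citation["title"] == "Citation Source"
    # The two assertions the review flags as missing
    # (verifying the index remapping):
    assert citation["start_index"] == 12
    assert citation["end_index"] == 15
```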
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

Merged commit 0673c57 into BerriAI:litellm_dev_sameer_16_march_week
Relevant issues
Fixes #19126
Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR:

- I have added a relevant test in the `tests/test_litellm/` directory (adding at least 1 test is a hard requirement - see details)
- I have run `make test-unit`
- I have reviewed the PR with `@greptileai` and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?
If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).
CI (LiteLLM team)
Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:
Type
🐛 Bug Fix
Changes
When using Azure AI Foundry Agents with Grounding (e.g., Bing Search), the upstream agent response includes `annotations` (citation URLs) in the message content. The LiteLLM handler was silently dropping these because `_extract_content_from_messages` only returned the text string, ignoring the `annotations` field.

Changes:

- `_extract_content_from_messages` now returns `(content, annotations)` instead of just `content`
- New `_transform_annotations` method maps the Azure AI annotation format to OpenAI-compatible `ChatCompletionAnnotation` (moves `start_index`/`end_index` into the `url_citation` sub-object)
- `_build_model_response` accepts and passes `annotations` to the `Message` object
- The streaming path (`_process_sse_stream`) extracts annotations from `thread.message.completed` SSE events and includes them in the final chunk
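The accumulate-rather-than-overwrite behavior from the last change above can be sketched as below. The helper name is illustrative; in the real handler each raw annotation list would first pass through `_transform_annotations` before being collected.

```python
from typing import Any, Dict, List


def collect_annotations_from_completed_event(
    event_data: Dict[str, Any],
    collected_annotations: List[Dict[str, Any]],
) -> None:
    """Sketch: gather annotations from every text content block of a
    thread.message.completed SSE event."""
    for content_item in event_data.get("content", []):
        if content_item.get("type") != "text":
            continue
        raw = content_item.get("text", {}).get("annotations") or []
        # extend() so annotations from blocks after the first
        # are appended instead of replacing earlier ones
        collected_annotations.extend(raw)
```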