
fix(azure-ai-agents): preserve annotations in Bing Search grounding responses#23939

Merged
Sameerlite merged 3 commits into BerriAI:litellm_dev_sameer_16_march_week from Sameerlite:Sameerlite/azure-ai-annotations
Mar 20, 2026
Merged

fix(azure-ai-agents): preserve annotations in Bing Search grounding responses#23939
Sameerlite merged 3 commits intoBerriAI:litellm_dev_sameer_16_march_weekfrom
Sameerlite:Sameerlite/azure-ai-annotations

Conversation

@Sameerlite
Collaborator

Relevant issues

Fixes #19126

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/test_litellm/ directory. Adding at least 1 test is a hard requirement (see details)
  • My PR passes all unit tests with make test-unit
  • My PR's scope is as isolated as possible; it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?

If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable, but needs attention.
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type

🐛 Bug Fix

Changes

When using Azure AI Foundry Agents with Grounding (e.g., Bing Search), the upstream agent response includes annotations (citation URLs) in the message content. The LiteLLM handler was silently dropping these because _extract_content_from_messages only returned the text string, ignoring the annotations field.

Changes:

  • _extract_content_from_messages now returns (content, annotations) instead of just content
  • New _transform_annotations method maps Azure AI annotation format to OpenAI-compatible ChatCompletionAnnotation (moves start_index/end_index into url_citation sub-object)
  • _build_model_response accepts and passes annotations to the Message object
  • Streaming path (_process_sse_stream) extracts annotations from thread.message.completed SSE events and includes them in the final chunk
  • 4 new unit tests covering annotation extraction, transformation, and response building
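The index-remapping described above can be sketched as follows. This is a minimal standalone sketch of the behavior the PR describes, not the actual handler code; the function name and dict shapes are assumptions based on the PR description:

```python
from typing import Any, Dict, List, Optional


def transform_annotations(
    raw_annotations: Optional[List[Dict[str, Any]]],
) -> Optional[List[Dict[str, Any]]]:
    """Sketch: Azure keeps start_index/end_index at the annotation root;
    the OpenAI-compatible format nests them inside the url_citation
    sub-object. Unknown annotation types pass through unchanged."""
    if not raw_annotations:
        return None
    transformed: List[Dict[str, Any]] = []
    for ann in raw_annotations:
        if ann.get("type") == "url_citation":
            citation = dict(ann.get("url_citation", {}))
            # Move the indices from the annotation root into url_citation
            citation["start_index"] = ann.get("start_index")
            citation["end_index"] = ann.get("end_index")
            transformed.append(
                {"type": "url_citation", "url_citation": citation}
            )
        else:
            # Forward compatibility: keep unknown annotation types as-is
            transformed.append(ann)
    return transformed
```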

Sameerlite and others added 2 commits March 17, 2026 15:55
Azure AI Agents with Grounding (e.g., Bing Search) include annotations
(citation URLs) in responses, but the handler was dropping them during
transformation. This fix:

- Extracts annotations from text content in agent responses
- Transforms them to OpenAI-compatible ChatCompletionAnnotation format
- Passes annotations through all completion paths (sync, async, streaming)
- Handles both polling and SSE streaming responses

Fixes BerriAI#19126

Co-Authored-By: Claude Haiku 4.5 <[email protected]>
… in streaming

- Fix bug where only last text item's annotations were preserved when
  thread.message.completed contained multiple text content items
- Accumulate annotations via extend() instead of overwriting
- Add test_azure_ai_agents_streaming_annotations_from_completed_message
- Add test_azure_ai_agents_streaming_accumulates_annotations_from_multiple_text_items

Addresses Greptile review on PR BerriAI#23849

Made-with: Cursor
@vercel

vercel bot commented Mar 18, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| litellm | Ready | Preview, Comment | Mar 18, 2026 3:41am |


@codspeed-hq
Contributor

codspeed-hq bot commented Mar 18, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing Sameerlite:Sameerlite/azure-ai-annotations (6514446) with main (cfeafbe)

Open in CodSpeed

@greptile-apps
Contributor

greptile-apps bot commented Mar 18, 2026

Greptile Summary

This PR fixes a bug where Azure AI Foundry Agents with Bing Search grounding were silently dropping citation annotations from their responses. The fix introduces a _transform_annotations helper that maps the Azure annotation format (where start_index/end_index live at the annotation root level) to the OpenAI-compatible format (where those indices live inside the url_citation sub-object), and threads annotations through both the non-streaming and streaming response-building paths.

Key changes:

  • _extract_content_from_messages now returns (content, annotations) instead of just content; callers updated accordingly across sync, async, and streaming paths
  • New _transform_annotations method handles url_citation type remapping and passes unknown annotation types through as-is (good forward-compatibility practice)
  • _build_model_response conditionally attaches annotations to the Message object only when non-empty, matching OpenAI spec behavior (where Message.annotations is deleted when absent)
  • Streaming path (_process_sse_stream) collects annotations from thread.message.completed SSE events across all text content items using extend, correctly accumulating multiple blocks before emitting them on the final [DONE] chunk
  • 5 new mock-only unit tests cover annotation extraction, transformation, response building, streaming accumulation, and multi-block accumulation — all compatible with CI requirements
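The extend-based accumulation from thread.message.completed events can be sketched as below. The helper name and event shape are assumptions based on the summary; the real handler also runs each batch through _transform_annotations before extending:

```python
from typing import Any, Dict, List


def collect_annotations_from_completed_event(
    event_data: Dict[str, Any],
    collected_annotations: List[Dict[str, Any]],
) -> None:
    """Accumulate annotations from every text content item of a
    thread.message.completed event via extend(), so a message with
    multiple text blocks keeps all annotations instead of only the
    last block's (the bug fixed in the second commit)."""
    for content_item in event_data.get("content", []):
        if content_item.get("type") == "text":
            annotations = content_item.get("text", {}).get("annotations") or []
            collected_annotations.extend(annotations)
```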

Notes:

  • The non-streaming _extract_content_from_messages still exits on the first text content block (already discussed in a prior review thread), while the streaming path iterates all blocks correctly; in practice Azure returns a single text item per message, so this asymmetry is unlikely to affect real usage
  • Both Delta and Message Pydantic types already had annotations: Optional[List[ChatCompletionAnnotation]] fields, so no type changes were needed in litellm/types/

Confidence Score: 4/5

  • Safe to merge; the fix is narrow in scope and correctly preserves backward compatibility (annotations field is optional and omitted when empty, matching prior behavior)
  • The implementation is logically sound: annotation transformation is correct, both paths (sync/async/streaming) are updated, type system compatibility is maintained (ChatCompletionAnnotation/Delta.annotations were pre-existing), and 5 mock-only unit tests validate the behaviour. Score is 4 (not 5) because the non-streaming path still returns early on the first text content block (flagged in a prior review thread but not yet resolved), and the streaming annotation test is missing start_index/end_index assertions inside url_citation (also flagged previously)
  • No files require special attention; both changed files are self-contained within the Azure AI Agents handler and its test suite

Important Files Changed

Filename Overview
litellm/llms/azure_ai/agents/handler.py Core fix: adds annotation extraction and transformation pipeline. _extract_content_from_messages now returns (content, annotations), _transform_annotations maps Azure format to OpenAI-compatible format, and _build_model_response/streaming path attach annotations to the response. The streaming path correctly accumulates annotations from all text content blocks; the non-streaming path still returns early on the first text block (previously flagged). Logic is otherwise sound.
tests/llm_translation/test_azure_agents.py Adds 5 new tests covering annotation extraction, transformation, model response building (with and without annotations), and streaming annotation accumulation (including multi-block accumulation). All tests use mocks only (no real network calls). The streaming test is missing start_index/end_index assertions in the url_citation sub-object (previously flagged).

Sequence Diagram

sequenceDiagram
    participant C as Caller
    participant H as AzureAIAgentsHandler
    participant A as Azure AI Agent API

    C->>H: completion() / acompletion()
    H->>A: POST /threads (create thread)
    A-->>H: {id: thread_id}
    H->>A: POST /threads/{id}/messages
    A-->>H: 200 OK
    H->>A: POST /threads/{id}/runs
    A-->>H: {id: run_id}
    loop Poll until completed
        H->>A: GET /threads/{id}/runs/{run_id}
        A-->>H: {status: "..."}
    end
    H->>A: GET /threads/{id}/messages
    A-->>H: {data: [{role: "assistant", content: [{type: "text", text: {value, annotations}}]}]}
    H->>H: _extract_content_from_messages() → (content, annotations)
    H->>H: _transform_annotations() maps Azure→OpenAI format
    H->>H: _build_model_response(annotations=annotations)
    H-->>C: ModelResponse with Message.annotations

    Note over H,A: Streaming path (SSE)
    C->>H: acompletion_stream()
    H->>A: POST /threads/runs (stream=True)
    loop SSE events
        A-->>H: event: thread.created / data: {id}
        A-->>H: event: thread.message.delta / data: {delta.content}
        H-->>C: ModelResponseStream chunk (content)
        A-->>H: event: thread.message.completed / data: {content[].text.annotations}
        H->>H: collect_annotations.extend(_transform_annotations(...))
        A-->>H: data: [DONE]
        H-->>C: Final chunk with Delta.annotations = collected_annotations
    end

Last reviewed commit: "Update litellm/llms/..."

Comment on lines 108 to +118
```diff
         for msg in messages_data.get("data", []):
             if msg.get("role") == "assistant":
                 for content_item in msg.get("content", []):
                     if content_item.get("type") == "text":
-                        return content_item.get("text", {}).get("value", "")
-        return ""
+                        text_obj = content_item.get("text", {})
+                        content = text_obj.get("value", "")
+                        raw_annotations = text_obj.get("annotations")
+                        annotations = self._transform_annotations(
+                            raw_annotations
+                        )
+                        return content, annotations
```
Contributor


P1 Non-streaming path only captures annotations from the first text content item

_extract_content_from_messages returns immediately upon finding the first type == "text" content item, so if an assistant message contains multiple text blocks, annotations (and content) from all subsequent blocks are silently dropped.

The streaming path (_process_sse_stream) handles this correctly by looping over all content items and accumulating all annotations via collected_annotations.extend(transformed). The non-streaming path should mirror that approach to be consistent.

Consider accumulating all text values and all annotations instead of returning early:

```python
def _extract_content_from_messages(
    self, messages_data: dict
) -> Tuple[str, Optional[List[Dict[str, Any]]]]:
    for msg in messages_data.get("data", []):
        if msg.get("role") == "assistant":
            content_parts: List[str] = []
            all_annotations: List[Dict[str, Any]] = []
            for content_item in msg.get("content", []):
                if content_item.get("type") == "text":
                    text_obj = content_item.get("text", {})
                    content_parts.append(text_obj.get("value", ""))
                    transformed = self._transform_annotations(
                        text_obj.get("annotations")
                    )
                    if transformed:
                        all_annotations.extend(transformed)
            if content_parts:
                return "".join(content_parts), all_annotations or None
    return "", None
```

```python
    assert ann["url_citation"]["title"] == "Citation Source"


@pytest.mark.asyncio
```
Contributor


P2 Missing assertion on start_index/end_index in streaming annotation test

test_azure_ai_agents_streaming_annotations_from_completed_message verifies url and title inside url_citation, but does not assert that start_index (12) and end_index (15) were also moved into the url_citation sub-object by _transform_annotations. Adding those assertions would give full coverage of the index-remapping logic for the streaming path, matching what test_azure_ai_agents_extract_content_with_annotations already covers for the non-streaming path.
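Concretely, the suggested assertions might look like the following. The fixture dict is hypothetical, modeled on the annotation values (12/15) mentioned in this comment, not copied from the actual test file:

```python
# Hypothetical fixture: a transformed annotation as the streaming test
# would receive it after _transform_annotations has moved the indices
# into the url_citation sub-object.
ann = {
    "type": "url_citation",
    "url_citation": {
        "url": "https://example.com",
        "title": "Citation Source",
        "start_index": 12,
        "end_index": 15,
    },
}

# The assertions the review suggests adding:
assert ann["url_citation"]["start_index"] == 12
assert ann["url_citation"]["end_index"] == 15
```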

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
@Sameerlite Sameerlite changed the base branch from main to litellm_dev_sameer_16_march_week March 20, 2026 18:03
@Sameerlite Sameerlite merged commit 0673c57 into BerriAI:litellm_dev_sameer_16_march_week Mar 20, 2026
17 of 39 checks passed

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Azure AI Foundry Agents (Grounding with Bing Search): annotations missing in LiteLLM Proxy response (present in upstream agent response)

1 participant