Commit ac15a16

crysmags and claude authored
doc(skills): add LLMObs integration and testing skills (#7655)
* feat(skills): Add LLMObs integration and testing skills. Adds two new agent skills for LLM Observability (LLMObs) instrumentation. These skills provide comprehensive guidance for agents creating LLMObs integrations, covering category classification, message handling, and testing strategies. Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
1 parent b6d42d8 commit ac15a16

File tree

10 files changed: +1990 −0 lines changed
Lines changed: 203 additions & 0 deletions
@@ -0,0 +1,203 @@
---
name: llmobs-integration
description: |
  This skill should be used when the user asks to "add LLMObs support", "create an LLMObs plugin",
  "instrument an LLM library", "add LLM Observability", "add llmobs", "add llm observability",
  "instrument chat completions", "instrument streaming", "instrument embeddings",
  "instrument agent runs", "instrument orchestration", "instrument LLM",
  "LLMObsPlugin", "LlmObsPlugin", "getLLMObsSpanRegisterOptions", "setLLMObsTags",
  "tagLLMIO", "tagEmbeddingIO", "tagRetrievalIO", "tagTextIO", "tagMetrics", "tagMetadata",
  "tagSpanTags", "tagPrompt", "LlmObsCategory", "LlmObsSpanKind",
  "span kind llm", "span kind workflow", "span kind agent", "span kind embedding",
  "span kind tool", "span kind retrieval",
  "openai llmobs", "anthropic llmobs", "genai llmobs", "google llmobs",
  "langchain llmobs", "langgraph llmobs", "ai-sdk llmobs",
  "llm span", "llmobs span event", "model provider", "model name",
  "CompositePlugin llmobs", "llmobs tracing", "VCR cassettes",
  or needs to build, modify, or debug an LLMObs plugin for any LLM library in dd-trace-js.
---

# LLM Observability Integration Skill

## Purpose

This skill helps you create LLMObs plugins that instrument LLM library operations and emit proper span events for LLM observability in dd-trace-js. Supported operation types include:

- **Chat completions** — standard request/response LLM calls
- **Streaming chat completions** — streamed token-by-token responses
- **Embeddings** — vector embedding generation
- **Agent runs** — autonomous LLM agent execution loops
- **Orchestration** — multi-step workflow and graph execution (langgraph, etc.)
- **Tool calls** — tool/function invocations
- **Retrieval** — vector DB / RAG operations
## When to Use

- Creating a new LLMObs plugin for an LLM library
- Adding LLMObs support to an existing tracing integration
- Understanding LLMObsPlugin architecture and patterns
- Determining how to instrument a new LLM package
## Core Concepts

### 1. LLMObsPlugin Base Class

All LLMObs plugins extend the `LLMObsPlugin` base class, which provides the core instrumentation framework.

**Key responsibilities:**
- **Span registration**: Define span metadata (model provider, model name, span kind)
- **Tag extraction**: Extract and tag LLM-specific data (messages, metrics, metadata)
- **Context management**: Handle span lifecycle and parent context

**Required methods to implement:**
- `getLLMObsSpanRegisterOptions(ctx)` - Returns span registration options (modelProvider, modelName, kind, name)
- `setLLMObsTags(ctx)` - Extracts and tags LLM data (input/output messages, metrics, metadata)

**Plugin lifecycle:**
1. `start(ctx)` - Registers the span with LLMObs and captures context
2. The operation executes (e.g., a chat completion call)
3. `asyncEnd(ctx)` - Calls `setLLMObsTags()` to extract and tag data
4. `end(ctx)` - Restores the parent context

See [references/plugin-architecture.md](references/plugin-architecture.md) for complete implementation details.
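As a sketch, the two required methods for a hypothetical chat-completions integration might look like the following. Everything here is illustrative: `LLMObsPlugin` is a local stand-in for the real base class so the snippet runs on its own, and the `ctx` shapes assume an OpenAI-style request/response.

```javascript
'use strict'

// Illustrative stand-in for the real LLMObsPlugin base class.
class LLMObsPlugin {}

class ExampleChatLLMObsPlugin extends LLMObsPlugin {
  // Called at span start: describe the span to LLMObs.
  getLLMObsSpanRegisterOptions (ctx) {
    const request = ctx.arguments?.[0] ?? {}
    return {
      modelProvider: 'example', // assumption: fixed provider for this sketch
      modelName: request.model ?? 'unknown',
      kind: 'llm', // the usual kind for chat completions
      name: 'example.createChatCompletion'
    }
  }

  // Called from asyncEnd: extract input/output messages. A real plugin
  // would tag the span via this._tagger; this sketch returns the data.
  setLLMObsTags (ctx) {
    const request = ctx.arguments?.[0] ?? {}
    const response = ctx.result ?? {}

    const inputMessages = (request.messages ?? [])
      .map(m => ({ content: m.content, role: m.role }))

    // OpenAI-style response shape assumed; errors yield empty output.
    const outputMessages = (response.choices ?? [])
      .map(c => ({ content: c.message?.content ?? '', role: c.message?.role ?? 'assistant' }))

    return { inputMessages, outputMessages }
  }
}
```

A real plugin would `require` the actual base class and call the tagger helpers (`tagLLMIO`, `tagMetrics`, etc.) rather than returning values.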

### 2. Package Category System

**CRITICAL:** Every integration must be classified into one category using the `LlmObsCategory` enum. The category determines both the test strategy and the implementation approach.

#### LlmObsCategory Enum Values

- **`LlmObsCategory.LLM_CLIENT`** - Direct API wrappers (openai, anthropic, genai)
  - Signs: Makes HTTP calls to LLM provider endpoints, requires API keys
  - Test strategy: VCR with real API calls via proxy
  - Instrumentation: Hook chat/completion methods

- **`LlmObsCategory.MULTI_PROVIDER`** - Multi-provider frameworks (ai-sdk, langchain)
  - Signs: Supports multiple LLM providers via configuration, wraps LLM_CLIENT libraries
  - Test strategy: VCR with real API calls via proxy
  - Instrumentation: Hook the provider abstraction layer

- **`LlmObsCategory.ORCHESTRATION`** - Workflow managers (langgraph)
  - Signs: Graph/workflow execution, state management, NO direct HTTP to LLM providers
  - Test strategy: Pure function tests, NO VCR, NO real API calls
  - Instrumentation: Hook workflow lifecycle (invoke, stream, run)
  - **Special:** Tests should use an actual LLM as an orchestration node (not mock responses)

- **`LlmObsCategory.INFRASTRUCTURE`** - Protocols/servers (MCP)
  - Signs: Protocol implementation, server/client architecture, transport layers
  - Test strategy: Mock server tests
  - Instrumentation: Hook protocol handlers

#### Decision Tree

Answer these questions by reading the code:

1. **Does the package make direct HTTP calls to LLM provider endpoints?**
   - YES → Go to question 2
   - NO → Go to question 3

2. **Does it support multiple LLM providers via configuration?**
   - YES → **`LlmObsCategory.MULTI_PROVIDER`**
   - NO → **`LlmObsCategory.LLM_CLIENT`**

3. **Does it implement workflow/graph orchestration with state management?**
   - YES → **`LlmObsCategory.ORCHESTRATION`**
   - NO → **`LlmObsCategory.INFRASTRUCTURE`**

See [references/category-detection.md](references/category-detection.md) for detailed heuristics and examples.
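The decision tree above can be encoded as a small helper. The function and its boolean inputs are hypothetical (not part of the codebase); only the category names come from the enum.

```javascript
'use strict'

// Hypothetical encoding of the decision tree: three yes/no answers
// deterministically select one LlmObsCategory name.
function classifyPackage ({ makesDirectLLMCalls, supportsMultipleProviders, orchestratesWorkflows }) {
  if (makesDirectLLMCalls) {
    // Question 2: multiple providers via configuration?
    return supportsMultipleProviders ? 'MULTI_PROVIDER' : 'LLM_CLIENT'
  }
  // Question 3: workflow/graph orchestration with state management?
  return orchestratesWorkflows ? 'ORCHESTRATION' : 'INFRASTRUCTURE'
}

// e.g. an openai-style client: direct calls, single provider
classifyPackage({ makesDirectLLMCalls: true, supportsMultipleProviders: false }) // → 'LLM_CLIENT'
```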

### 3. LLM Span Kinds

Use the `LlmObsSpanKind` enum:

- **`LlmObsSpanKind.LLM`** - Chat completions, text generation
- **`LlmObsSpanKind.WORKFLOW`** - Graph/chain execution
- **`LlmObsSpanKind.AGENT`** - Agent runs
- **`LlmObsSpanKind.TOOL`** - Tool/function calls
- **`LlmObsSpanKind.EMBEDDING`** - Embedding generation
- **`LlmObsSpanKind.RETRIEVAL`** - Vector DB/RAG retrieval

**Most common:** Use `LlmObsSpanKind.LLM` for chat completions and text generation in the LLM_CLIENT and MULTI_PROVIDER categories.
### 4. Message Extraction

All plugins must convert provider-specific message formats to the standard format:

**Standard format:** `[{content: string, role: string}]`

**Common roles:** `'user'`, `'assistant'`, `'system'`, `'tool'`

**Provider-specific handling:**
- OpenAI: Direct format match; handle `function_call` and `tool_calls`
- Anthropic: Map `role` values, flatten nested content arrays
- Google GenAI: Extract from `parts` arrays, map role names
- Multi-provider: Detect the provider and apply the appropriate extraction

See [references/message-extraction.md](references/message-extraction.md) for provider-specific patterns.
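To illustrate the Anthropic case (mapping roles and flattening nested content arrays), a normalizer might look like this. The input shape follows Anthropic's public `messages` format, where `content` can be a plain string or an array of typed blocks; the helper itself is hypothetical.

```javascript
'use strict'

// Hypothetical normalizer: Anthropic-style messages → [{ content, role }].
function normalizeAnthropicMessages (messages) {
  return (messages ?? []).map(message => {
    let content = message.content
    if (Array.isArray(content)) {
      // Flatten nested content blocks, keeping only the text parts.
      content = content
        .filter(block => block.type === 'text')
        .map(block => block.text)
        .join('')
    }
    return { content: content ?? '', role: message.role }
  })
}
```

For example, `normalizeAnthropicMessages([{ role: 'user', content: [{ type: 'text', text: 'hi' }] }])` yields `[{ content: 'hi', role: 'user' }]`.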

## Implementation Steps

1. **Detect the package category** (REQUIRED FIRST STEP)
   - Follow the decision tree above
   - Output: category, confidence, reasoning

2. **Create the plugin file**
   - Location: `packages/dd-trace/src/llmobs/plugins/{integration}/index.js`
   - Extend: `LLMObsPlugin` base class
   - Implement: Required methods per the plugin architecture

3. **Implement `getLLMObsSpanRegisterOptions(ctx)`**
   - Extract the model provider and name from context
   - Determine the span kind (usually `LlmObsSpanKind.LLM`)
   - Return the registration options object

4. **Implement `setLLMObsTags(ctx)`**
   - Extract input messages from `ctx.arguments`
   - Extract output messages from `ctx.result`
   - Extract token metrics (input_tokens, output_tokens, total_tokens)
   - Extract metadata (temperature, max_tokens, etc.)
   - Tag the span using `this._tagger` methods

5. **Handle edge cases**
   - Streaming responses (if applicable)
   - Error cases (empty output messages)
   - Non-standard message formats
   - Missing metadata

See [references/plugin-architecture.md](references/plugin-architecture.md) for a step-by-step implementation guide.
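Step 4's metric and metadata extraction can be sketched as two small helpers. The response shape assumes an OpenAI-style `usage` object (other providers name these fields differently), and the helper names are illustrative.

```javascript
'use strict'

// Illustrative token-metric extraction (step 4); assumes an
// OpenAI-style `usage` object on the response.
function extractMetrics (response) {
  const usage = response?.usage
  if (!usage) return {} // missing usage: tag nothing rather than fail

  return {
    inputTokens: usage.prompt_tokens,
    outputTokens: usage.completion_tokens,
    totalTokens: usage.total_tokens ?? (usage.prompt_tokens + usage.completion_tokens)
  }
}

// Illustrative metadata extraction: copy only model parameters that
// were actually set on the request.
function extractMetadata (request) {
  const metadata = {}
  for (const key of ['temperature', 'max_tokens', 'top_p', 'stream']) {
    if (request?.[key] !== undefined) metadata[key] = request[key]
  }
  return metadata
}
```

Returning an empty object for missing data matches the "handle failures gracefully" principle: absent fields are simply not tagged.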

## Common Patterns

Based on category:

- **LLM_CLIENT**: Messages arrive in an array; straightforward extraction from `result.choices[0]` or equivalent
- **MULTI_PROVIDER**: Handle multiple provider formats with provider detection logic
- **ORCHESTRATION**: May use the `LlmObsSpanKind.WORKFLOW` span kind instead of `LLM`; focus on lifecycle events
- **INFRASTRUCTURE**: Protocol-specific instrumentation; may not have traditional messages
## Plugin Registration

All plugins must export an array.

**Static properties required:**
- `integration` - Integration name (e.g., 'openai')
- `id` - Unique plugin ID (e.g., 'llmobs_openai')
- `prefix` - Channel prefix (e.g., 'tracing:apm:openai:chat')
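The three required static properties, sketched with the example values listed above (the class name is illustrative and the body is elided):

```javascript
'use strict'

// Sketch of the required statics, using the example values above.
class OpenAiLLMObsPlugin /* extends LLMObsPlugin */ {
  static integration = 'openai' // integration name
  static id = 'llmobs_openai' // unique plugin ID
  static prefix = 'tracing:apm:openai:chat' // channel prefix
}
```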
## References

For detailed information, see:

- [references/plugin-architecture.md](references/plugin-architecture.md) - Complete plugin structure, implementation steps, helper methods
- [references/category-detection.md](references/category-detection.md) - Package classification heuristics and detection process
- [references/message-extraction.md](references/message-extraction.md) - Provider-specific message format patterns
- [references/reference-implementations.md](references/reference-implementations.md) - Working plugin examples (Anthropic, Google GenAI)
## Key Principles

1. **Category determines approach** - Always detect the category first using the decision tree
2. **Use enum values** - Reference the `LlmObsCategory` and `LlmObsSpanKind` enums from the models
3. **Standard message format** - Always convert to the `[{content, role}]` format
4. **Complete metadata** - Extract all available model parameters and token metrics
5. **Error handling** - Handle failures gracefully (empty messages on error)
6. **Test strategy follows category** - VCR for clients, pure functions for orchestration
Lines changed: 180 additions & 0 deletions
@@ -0,0 +1,180 @@
# Package Category Detection Reference

A detailed guide for classifying LLM packages into `LlmObsCategory` enum values.
## Categories Explained

### LlmObsCategory.LLM_CLIENT

**Definition:** Direct wrappers around LLM provider APIs.

**Examples:**
- `@google/generative-ai` - Google GenAI client (recommended reference implementation)
- `@anthropic-ai/sdk` - Anthropic Claude client (recommended reference implementation)
- `openai` - OpenAI API client

**Observable signs:**
- Package name contains a provider name (openai, anthropic, genai, etc.)
- Has chat/completion/embedding methods (`chat.completions.create`, `messages.create`)
- Makes HTTP calls directly to LLM provider endpoints
- Requires API keys for authentication
- Has HTTP client dependencies (axios, fetch, request)
- Code contains HTTP request patterns

**Test strategy:** VCR with real API calls via proxy

**Enum value:** `LlmObsCategory.LLM_CLIENT`
### LlmObsCategory.MULTI_PROVIDER

**Definition:** Unified interfaces that abstract multiple LLM providers.

**Examples:**
- `ai` - Vercel AI SDK
- `langchain` - LangChain framework

**Observable signs:**
- Package name suggests multi-provider (ai-sdk, langchain)
- Provider configuration and switching support
- Wraps multiple LLM_CLIENT libraries
- Dependencies include 2+ LLM provider SDKs
- Has abstraction layers over providers

**Test strategy:** VCR with real API calls via proxy

**Enum value:** `LlmObsCategory.MULTI_PROVIDER`
### LlmObsCategory.ORCHESTRATION

**Definition:** Workflow/graph managers that coordinate LLM calls but don't make them directly.

**Examples:**
- `@langchain/langgraph` - LangGraph workflow engine
- Workflow engines, agent coordinators

**Observable signs:**
- Package name suggests orchestration (langgraph, crew, workflow, graph)
- Has graph/workflow/chain execution methods (`invoke`, `stream`, `run`)
- Manages state and control flow between nodes/agents
- Dependencies include orchestration libraries (e.g., @langchain/core)
- Methods focus on state management, not API calls

**Test strategy:** Pure function tests, NO VCR, NO real API calls

**Enum value:** `LlmObsCategory.ORCHESTRATION`
### LlmObsCategory.INFRASTRUCTURE

**Definition:** Communication protocols, server frameworks, infrastructure layers.

**Examples:**
- MCP (Model Context Protocol) clients/servers
- Protocol implementations
- Transport layers

**Observable signs:**
- Package name suggests infrastructure (mcp, protocol, server, transport)
- Implements protocols or server/client architecture
- Transport layer code

**Test strategy:** Mock server tests

**Enum value:** `LlmObsCategory.INFRASTRUCTURE`
## Decision Tree

Follow this tree to determine the category:

```
1. Does the package make direct HTTP calls to LLM provider endpoints?
   ├─ YES → Go to question 2
   └─ NO → Go to question 3

2. Does it support multiple LLM providers via configuration?
   ├─ YES → LlmObsCategory.MULTI_PROVIDER
   └─ NO → LlmObsCategory.LLM_CLIENT

3. Does it implement workflow/graph orchestration with state management?
   ├─ YES → LlmObsCategory.ORCHESTRATION
   └─ NO → LlmObsCategory.INFRASTRUCTURE
```
## Detection Process

### Step 1: Read the Package Name

Analyze the package name for patterns:

- Contains "openai", "anthropic", "genai" → likely `LlmObsCategory.LLM_CLIENT`
- Contains "langchain", "llamaindex", "ai-sdk" → likely `LlmObsCategory.MULTI_PROVIDER`
- Contains "langgraph", "crew", "workflow" → likely `LlmObsCategory.ORCHESTRATION`
- Contains "mcp", "protocol", "server" → likely `LlmObsCategory.INFRASTRUCTURE`
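These name patterns can be captured with a small ordered lookup. The helper is hypothetical; it returns `null` when no pattern matches so the later steps can decide. Note the ordering: `@langchain/langgraph` also contains "langchain", so the orchestration patterns must be checked first.

```javascript
'use strict'

// Hypothetical first-pass guess from the package name (Step 1).
// Ordered: orchestration patterns before langchain, since
// "@langchain/langgraph" contains both.
const NAME_HINTS = [
  [/langgraph|crew|workflow/, 'ORCHESTRATION'],
  [/openai|anthropic|genai/, 'LLM_CLIENT'],
  [/langchain|llamaindex|ai-sdk/, 'MULTI_PROVIDER'],
  [/mcp|protocol|server/, 'INFRASTRUCTURE']
]

function guessCategoryFromName (packageName) {
  for (const [pattern, category] of NAME_HINTS) {
    if (pattern.test(packageName)) return category
  }
  return null // no hint: fall through to Steps 2-4
}
```

For example, `guessCategoryFromName('@langchain/langgraph')` returns `'ORCHESTRATION'`, while an unrelated name like `'left-pad'` returns `null`.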
### Step 2: Check package.json Dependencies

```bash
cat node_modules/{{package}}/package.json
```

Look for:
- HTTP clients (axios, fetch, got) → `LlmObsCategory.LLM_CLIENT`
- Multiple LLM SDKs (openai + anthropic + cohere) → `LlmObsCategory.MULTI_PROVIDER`
- LangChain/orchestration libs → `LlmObsCategory.ORCHESTRATION`
- Protocol/transport libs → `LlmObsCategory.INFRASTRUCTURE`
### Step 3: Check Exported Methods

```bash
node -e "console.log(Object.keys(require('{{package}}')))"
```

Method patterns:
- `chat()`, `complete()`, `embed()` → `LlmObsCategory.LLM_CLIENT` or `MULTI_PROVIDER`
- `invoke()`, `stream()`, `graph()`, `workflow()` → `LlmObsCategory.ORCHESTRATION`
- `connect()`, `listen()`, `handle()` → `LlmObsCategory.INFRASTRUCTURE`
### Step 4: Analyze Source Code

Check for:
- HTTP request patterns (`http.request`, `.post(`, `fetch(`) → `LlmObsCategory.LLM_CLIENT`
- Provider switching logic → `LlmObsCategory.MULTI_PROVIDER`
- State management, graph execution → `LlmObsCategory.ORCHESTRATION`
- Protocol implementation → `LlmObsCategory.INFRASTRUCTURE`
## Real-World Examples

### Example 1: Anthropic (LLM_CLIENT)

**Package:** `@anthropic-ai/sdk` — see `packages/datadog-plugin-anthropic/`

**Category:** `LlmObsCategory.LLM_CLIENT` — the name contains "anthropic", it makes direct HTTP calls to the Claude API, it requires an API key, and its methods are `messages.create`

### Example 2: Google GenAI (LLM_CLIENT)

**Package:** `@google/generative-ai` — see `packages/datadog-plugin-google-genai/`

**Category:** `LlmObsCategory.LLM_CLIENT` — the name contains "genai", it makes direct HTTP calls to the Gemini API, and it uses a complex nested message format (contents/parts)

### Example 3: Vercel AI SDK (MULTI_PROVIDER)

**Package:** `ai` (Vercel AI SDK)

- Associated `@ai-sdk/*` provider packages suggest multi-provider
- Depends on openai + anthropic SDKs (multiple LLM providers)
- Methods include a provider-agnostic chat interface

**Category:** `LlmObsCategory.MULTI_PROVIDER`

### Example 4: LangGraph (ORCHESTRATION)

**Package:** `@langchain/langgraph` — see `packages/dd-trace/src/llmobs/plugins/langgraph/`

**Category:** `LlmObsCategory.ORCHESTRATION` — the name indicates graph orchestration, it depends on `@langchain/core`, its methods manage workflow state (`StateGraph.invoke`, `Pregel.stream`), and it makes no direct LLM HTTP calls
## Edge Cases

When signals conflict or are weak, choose the category with the most evidence, and prefer the category that matches the test-strategy needs: if the package makes HTTP calls, it needs VCR (LLM_CLIENT/MULTI_PROVIDER); if it doesn't, use pure functions (ORCHESTRATION) or mock servers (INFRASTRUCTURE).

Some packages don't fit cleanly:

- Utilities/helpers → Check what they instrument
- Plugins/extensions → Follow the parent library's category
- Hybrid packages → Categorize by primary function
