---
name: llmobs-integration
description: |
  This skill should be used when the user asks to "add LLMObs support", "create an LLMObs plugin",
  "instrument an LLM library", "add LLM Observability", "add llmobs", "add llm observability",
  "instrument chat completions", "instrument streaming", "instrument embeddings",
  "instrument agent runs", "instrument orchestration", "instrument LLM",
  "LLMObsPlugin", "LlmObsPlugin", "getLLMObsSpanRegisterOptions", "setLLMObsTags",
  "tagLLMIO", "tagEmbeddingIO", "tagRetrievalIO", "tagTextIO", "tagMetrics", "tagMetadata",
  "tagSpanTags", "tagPrompt", "LlmObsCategory", "LlmObsSpanKind",
  "span kind llm", "span kind workflow", "span kind agent", "span kind embedding",
  "span kind tool", "span kind retrieval",
  "openai llmobs", "anthropic llmobs", "genai llmobs", "google llmobs",
  "langchain llmobs", "langgraph llmobs", "ai-sdk llmobs",
  "llm span", "llmobs span event", "model provider", "model name",
  "CompositePlugin llmobs", "llmobs tracing", "VCR cassettes",
  or needs to build, modify, or debug an LLMObs plugin for any LLM library in dd-trace-js.
---

# LLM Observability Integration Skill

## Purpose

This skill helps you create LLMObs plugins that instrument LLM library operations and emit proper span events for LLM observability in dd-trace-js. Supported operation types include:

- **Chat completions** — standard request/response LLM calls
- **Streaming chat completions** — streamed token-by-token responses
- **Embeddings** — vector embedding generation
- **Agent runs** — autonomous LLM agent execution loops
- **Orchestration** — multi-step workflow and graph execution (langgraph, etc.)
- **Tool calls** — tool/function invocations
- **Retrieval** — vector DB / RAG operations

## When to Use

- Creating a new LLMObs plugin for an LLM library
- Adding LLMObs support to an existing tracing integration
- Understanding LLMObsPlugin architecture and patterns
- Determining how to instrument a new LLM package

## Core Concepts

### 1. LLMObsPlugin Base Class

All LLMObs plugins extend the `LLMObsPlugin` base class, which provides the core instrumentation framework.

**Key responsibilities:**
- **Span registration**: Define span metadata (model provider, model name, span kind)
- **Tag extraction**: Extract and tag LLM-specific data (messages, metrics, metadata)
- **Context management**: Handle span lifecycle and parent context

**Required methods to implement:**
- `getLLMObsSpanRegisterOptions(ctx)` - Returns span registration options (modelProvider, modelName, kind, name)
- `setLLMObsTags(ctx)` - Extracts and tags LLM data (input/output messages, metrics, metadata)

**Plugin lifecycle:**
1. `start(ctx)` - Registers span with LLMObs, captures context
2. Operation executes (e.g., a chat completion call)
3. `asyncEnd(ctx)` - Calls `setLLMObsTags()` to extract and tag data
4. `end(ctx)` - Restores parent context

See [references/plugin-architecture.md](references/plugin-architecture.md) for complete implementation details.
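
As a sketch of the shape described above, here is a hypothetical plugin for an imaginary `acme` chat client. The `LLMObsPlugin` base class below is a stand-in stub so the example is self-contained; `ctx.arguments` and `ctx.result` follow this skill's conventions, while the tagger method signature and the location of the active span on `ctx` are assumptions to verify against the real base class.

```javascript
// Stand-in stub so the sketch runs on its own; a real plugin extends the
// actual LLMObsPlugin base class from dd-trace-js instead.
class LLMObsPlugin {
  constructor () {
    // The real base class provides the LLMObs tagger; stubbed here.
    this._tagger = { tagLLMIO () {} }
  }
}

// Hypothetical plugin for an imaginary "acme" chat client.
class AcmeLLMObsPlugin extends LLMObsPlugin {
  static integration = 'acme'
  static id = 'llmobs_acme'
  static prefix = 'tracing:apm:acme:chat'

  // Span registration: model identity and span kind, read from the call args.
  getLLMObsSpanRegisterOptions (ctx) {
    const request = ctx.arguments?.[0] ?? {}
    return {
      modelProvider: 'acme',
      modelName: request.model ?? 'unknown',
      kind: 'llm',
      name: 'acme.chat'
    }
  }

  // Tag extraction: called once the result is available (asyncEnd).
  setLLMObsTags (ctx) {
    const span = ctx.currentStore?.span // exact location is an assumption
    const request = ctx.arguments?.[0] ?? {}
    const inputMessages = request.messages ?? []
    const outputMessages = ctx.result?.messages ?? []
    this._tagger.tagLLMIO(span, inputMessages, outputMessages)
  }
}
```

The lifecycle hooks (`start`, `asyncEnd`, `end`) are inherited from the base class, which invokes these two methods at the points listed above.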

### 2. Package Category System

**CRITICAL:** Every integration must be classified into one category using the `LlmObsCategory` enum. This determines the test strategy and implementation approach.

#### LlmObsCategory Enum Values

- **`LlmObsCategory.LLM_CLIENT`** - Direct API wrappers (openai, anthropic, genai)
  - Signs: Makes HTTP calls to LLM provider endpoints, requires API keys
  - Test strategy: VCR with real API calls via proxy
  - Instrumentation: Hook chat/completion methods

- **`LlmObsCategory.MULTI_PROVIDER`** - Multi-provider frameworks (ai-sdk, langchain)
  - Signs: Supports multiple LLM providers via configuration, wraps LLM_CLIENT libraries
  - Test strategy: VCR with real API calls via proxy
  - Instrumentation: Hook the provider abstraction layer

- **`LlmObsCategory.ORCHESTRATION`** - Workflow managers (langgraph)
  - Signs: Graph/workflow execution, state management, NO direct HTTP to LLM providers
  - Test strategy: Pure function tests, NO VCR, NO real API calls
  - Instrumentation: Hook workflow lifecycle (invoke, stream, run)
  - **Special:** Tests should use an actual LLM as an orchestration node (not mock responses)

- **`LlmObsCategory.INFRASTRUCTURE`** - Protocols/servers (MCP)
  - Signs: Protocol implementation, server/client architecture, transport layers
  - Test strategy: Mock server tests
  - Instrumentation: Hook protocol handlers

#### Decision Tree

Answer these questions by reading the code:

1. **Does the package make direct HTTP calls to LLM provider endpoints?**
   - YES → Go to question 2
   - NO → Go to question 3

2. **Does it support multiple LLM providers via configuration?**
   - YES → **`LlmObsCategory.MULTI_PROVIDER`**
   - NO → **`LlmObsCategory.LLM_CLIENT`**

3. **Does it implement workflow/graph orchestration with state management?**
   - YES → **`LlmObsCategory.ORCHESTRATION`**
   - NO → **`LlmObsCategory.INFRASTRUCTURE`**

See [references/category-detection.md](references/category-detection.md) for detailed heuristics and examples.
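
The decision tree can be expressed as a small function over yes/no observations gathered by reading the package's code. The `LlmObsCategory` member names come from the enum above; the string values here are placeholder assumptions (use the real enum from the models module):

```javascript
// Placeholder enum values; use the real LlmObsCategory enum in dd-trace-js.
const LlmObsCategory = Object.freeze({
  LLM_CLIENT: 'llm_client',
  MULTI_PROVIDER: 'multi_provider',
  ORCHESTRATION: 'orchestration',
  INFRASTRUCTURE: 'infrastructure'
})

// Questions 1-3 of the decision tree, in order.
function classifyPackage ({ callsProviderHttp, multiProvider, orchestratesWorkflows }) {
  if (callsProviderHttp) {
    // Question 2: one provider or many?
    return multiProvider ? LlmObsCategory.MULTI_PROVIDER : LlmObsCategory.LLM_CLIENT
  }
  // Question 3: workflow/graph orchestration with state management?
  return orchestratesWorkflows ? LlmObsCategory.ORCHESTRATION : LlmObsCategory.INFRASTRUCTURE
}
```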

### 3. LLM Span Kinds

Use the `LlmObsSpanKind` enum:

- **`LlmObsSpanKind.LLM`** - Chat completions, text generation
- **`LlmObsSpanKind.WORKFLOW`** - Graph/chain execution
- **`LlmObsSpanKind.AGENT`** - Agent runs
- **`LlmObsSpanKind.TOOL`** - Tool/function calls
- **`LlmObsSpanKind.EMBEDDING`** - Embedding generation
- **`LlmObsSpanKind.RETRIEVAL`** - Vector DB/RAG retrieval

**Most common:** Use `LlmObsSpanKind.LLM` (`'llm'`) for chat completions/text generation in the LLM_CLIENT and MULTI_PROVIDER categories.
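
The kinds above can be sketched as a frozen string enum. The member names come from the list above; the string values are assumptions consistent with the `'llm'` usage just noted, so check them against the real enum in the models module:

```javascript
// Assumed string values; verify against the real LlmObsSpanKind enum.
const LlmObsSpanKind = Object.freeze({
  LLM: 'llm',
  WORKFLOW: 'workflow',
  AGENT: 'agent',
  TOOL: 'tool',
  EMBEDDING: 'embedding',
  RETRIEVAL: 'retrieval'
})
```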

### 4. Message Extraction

All plugins must convert provider-specific message formats to the standard format:

**Standard format:** `[{content: string, role: string}]`

**Common roles:** `'user'`, `'assistant'`, `'system'`, `'tool'`

**Provider-specific handling:**
- OpenAI: Direct format match; handle `function_call` and `tool_calls`
- Anthropic: Map `role` values, flatten nested content arrays
- Google GenAI: Extract from `parts` arrays, map role names
- Multi-provider: Detect the provider and apply the appropriate extraction

See [references/message-extraction.md](references/message-extraction.md) for provider-specific patterns.
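
As an illustration of the provider-specific handling above, here is a hedged sketch that normalizes OpenAI-style and Anthropic-style messages into the standard `[{content, role}]` shape. The field names follow each provider's public message format; real extraction logic should follow references/message-extraction.md:

```javascript
// Normalize provider-specific messages to [{ content, role }].
function normalizeMessages (provider, messages) {
  return messages.map(message => {
    let content = message.content ?? ''
    // Anthropic content may be an array of blocks; flatten the text blocks.
    if (provider === 'anthropic' && Array.isArray(content)) {
      content = content
        .filter(block => block.type === 'text')
        .map(block => block.text)
        .join('')
    }
    return { content, role: message.role }
  })
}
```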

## Implementation Steps

1. **Detect package category** (REQUIRED FIRST STEP)
   - Follow decision tree above
   - Output: category, confidence, reasoning

2. **Create plugin file**
   - Location: `packages/dd-trace/src/llmobs/plugins/{integration}/index.js`
   - Extend: `LLMObsPlugin` base class
   - Implement: Required methods per plugin architecture

3. **Implement `getLLMObsSpanRegisterOptions(ctx)`**
   - Extract model provider and name from context
   - Determine span kind (usually `'llm'`)
   - Return registration options object

4. **Implement `setLLMObsTags(ctx)`**
   - Extract input messages from `ctx.arguments`
   - Extract output messages from `ctx.result`
   - Extract token metrics (input_tokens, output_tokens, total_tokens)
   - Extract metadata (temperature, max_tokens, etc.)
   - Tag span using `this._tagger` methods

5. **Handle edge cases**
   - Streaming responses (if applicable)
   - Error cases (empty output messages)
   - Non-standard message formats
   - Missing metadata

See [references/plugin-architecture.md](references/plugin-architecture.md) for a step-by-step implementation guide.
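
The metric and metadata extraction in step 4 can be sketched as two helpers. The request/response field names (`usage.prompt_tokens`, `temperature`, ...) follow the OpenAI chat completion shape and are assumptions for other providers; the output keys match the metric names listed above:

```javascript
// Token metrics from an OpenAI-style response usage object.
function extractTokenMetrics (response) {
  const usage = response?.usage ?? {}
  const inputTokens = usage.prompt_tokens ?? 0
  const outputTokens = usage.completion_tokens ?? 0
  return {
    input_tokens: inputTokens,
    output_tokens: outputTokens,
    // Fall back to the sum when the provider omits a total.
    total_tokens: usage.total_tokens ?? inputTokens + outputTokens
  }
}

// Model parameters worth tagging as metadata, when present on the request.
function extractMetadata (request) {
  const metadata = {}
  for (const key of ['temperature', 'max_tokens', 'top_p', 'stop']) {
    if (request?.[key] !== undefined) metadata[key] = request[key]
  }
  return metadata
}
```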

## Common Patterns

Based on category:

- **LLM_CLIENT**: Messages in an array; straightforward extraction from `result.choices[0]` or equivalent
- **MULTI_PROVIDER**: Handle multiple provider formats with provider detection logic
- **ORCHESTRATION**: May use the `'workflow'` span kind instead of `'llm'`; focus on lifecycle events
- **INFRASTRUCTURE**: Protocol-specific instrumentation; may not have traditional messages

## Plugin Registration

Plugin modules must export an array of plugin classes.

**Static properties required:**
- `integration` - Integration name (e.g., 'openai')
- `id` - Unique plugin ID (e.g., 'llmobs_openai')
- `prefix` - Channel prefix (e.g., 'tracing:apm:openai:chat')
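
Using the example values above, a registration sketch for a hypothetical OpenAI plugin (the class name and getter style are illustrative; the real module would export this array via `module.exports`):

```javascript
// Each plugin class carries the three required static properties.
class OpenAiLLMObsPlugin {
  static get integration () { return 'openai' }
  static get id () { return 'llmobs_openai' }
  static get prefix () { return 'tracing:apm:openai:chat' }
}

// The module's export: an array of plugin classes.
const plugins = [OpenAiLLMObsPlugin]
```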

## References

For detailed information, see:

- [references/plugin-architecture.md](references/plugin-architecture.md) - Complete plugin structure, implementation steps, helper methods
- [references/category-detection.md](references/category-detection.md) - Package classification heuristics and detection process
- [references/message-extraction.md](references/message-extraction.md) - Provider-specific message format patterns
- [references/reference-implementations.md](references/reference-implementations.md) - Working plugin examples (Anthropic, Google GenAI)

## Key Principles

1. **Category determines approach** - Always detect the category first using the decision tree
2. **Use enum values** - Reference the `LlmObsCategory` and `LlmObsSpanKind` enums from the models
3. **Standard message format** - Always convert to the `[{content, role}]` format
4. **Complete metadata** - Extract all available model parameters and token metrics
5. **Error handling** - Handle failures gracefully (empty messages on error)
6. **Test strategy follows category** - VCR for clients, pure functions for orchestration