doc(skills): add LLMObs integration and testing skills #7655
Conversation
Overall package size

Self size: 5.04 MB

Dependency sizes:

| name | version | self size | total size |
|------|---------|-----------|------------|
| import-in-the-middle | 3.0.0 | 81.15 kB | 815.98 kB |
| dc-polyfill | 0.1.10 | 26.73 kB | 26.73 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe
Force-pushed from a009de0 to e5bb825
Benchmarks

Benchmark execution time: 2026-03-18 20:33:32
Comparing candidate commit 88b214e in PR branch.
Found 0 performance improvements and 0 performance regressions! Performance is the same for 229 metrics, 31 unstable metrics.
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```diff
@@ Coverage Diff @@
##           master    #7655      +/-   ##
==========================================
+ Coverage   80.31%   80.45%   +0.14%
==========================================
  Files         739      748       +9
  Lines       31946    32405     +459
==========================================
+ Hits        25657    26072     +415
- Misses       6289     6333      +44
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
> - `openai` - OpenAI API client
> - `@google/generative-ai` - Google GenAI client
> - `@anthropic-ai/sdk` - Anthropic Claude client
> - `@mistralai/mistralai` - Mistral AI client
> - `cohere-ai` - Cohere API client

Verify with @sabrenner about the best reference integrations to point at.

I think the google genai and anthropic ones are best; openai is probably a bit too complicated. We also don't have mistral/cohere integrations.
.agents/skills/llmobs-integration/references/category-detection.md: 2 outdated comments (resolved)
Adds two new agent skills for LLM Observability (LLMObs) instrumentation:

1. **llmobs-integration** - Creating LLMObs plugins
   - Package category system (LLM_CLIENT, MULTI_PROVIDER, ORCHESTRATION, INFRASTRUCTURE)
   - Category detection decision tree
   - Message extraction patterns (OpenAI, Anthropic, Google GenAI formats)
   - LLMObsSpanKind enum (llm, workflow, agent, tool, embedding, retrieval)
   - Plugin architecture patterns (getLLMObsSpanRegisterOptions, setLLMObsTags)
   - Reference implementations

2. **llmobs-testing** - Testing LLMObs plugins
   - Category-specific test strategies
   - VCR cassette system for API call recording
   - Assertion helpers (assertLlmObsSpanEvent, MOCK_* matchers)
   - Test structure patterns
   - Error handling validation

These skills provide comprehensive guidance for agents creating LLMObs integrations, ensuring proper category classification, message handling, and testing strategies.

Files added:

- .agents/skills/llmobs-integration/SKILL.md
- .agents/skills/llmobs-integration/references/category-detection.md
- .agents/skills/llmobs-integration/references/message-extraction.md
- .agents/skills/llmobs-integration/references/plugin-architecture.md
- .agents/skills/llmobs-integration/references/reference-implementations.md
- .agents/skills/llmobs-testing/SKILL.md
- .agents/skills/llmobs-testing/references/assertion-helpers.md
- .agents/skills/llmobs-testing/references/category-strategies.md
- .agents/skills/llmobs-testing/references/test-structure.md
- .agents/skills/llmobs-testing/references/vcr-cassettes.md

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Claude Code discovers skills via .claude/skills/ directory. Add symlinks to make llmobs-integration and llmobs-testing auto-discoverable, following the pattern from apm-integrations. Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
The skills should be accessed from .agents/skills/ directory, not .claude/skills/. Toolkit now uses symlinks in anubis_apm/agent/skills/dd_trace_js/ to access these. Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
These symlinks created circular references when skill loader tried to copy .agents/skills/* to .claude/skills/*. Skills are accessed via toolkit's anubis_apm/agent/skills/dd_trace_js/ symlinks instead. Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Skills should only be accessed via toolkit symlinks in anubis_apm/agent/skills/dd_trace_js/, not from .claude/skills/. This directory should remain empty to avoid confusion. Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Force-pushed from 9e86463 to 1813645
- Strengthen descriptions with keyword-rich frontmatter for both skills
- Expand Purpose section to cover streaming, embeddings, agent runs, orchestration
- Convert all bare references to proper MD links
- Remove mistral/cohere examples (no integrations exist); promote genai/anthropic
- Remove toolkit-specific Multi-Signal Heuristics scoring section
- Remove toolkit-specific Output Format JSON section
- Remove anubis_apm module path references

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
.agents/skills/llmobs-integration/references/plugin-architecture.md: 2 outdated comments (resolved)
…ons symlink

- plugin-architecture.md: replace code blocks with lightweight prose descriptions, pointing to real implementation files for reference
- message-extraction.md: replace provider-specific code blocks with an explanation of what varies per provider and pointers to reference implementations
- assertion-helpers.md: replace synthetic "Complete Test Example" with links to real test files (anthropic, google-genai)
- Restore deleted .claude/skills/apm-integrations symlink

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…esting

Documents that instrumented sub-packages (e.g. @openai/agents-openai as a dep of @openai/agents-core) must be required before the parent package so RITM can patch them before they are cached by the transitive import.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
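The require-ordering constraint above comes from module caching: once a module is loaded it is cached, so a patch applied afterwards never reaches the already-cached copy. A self-contained sketch of the mechanism, using a toy cache as a stand-in for Node's real module cache (the package names come from the commit message; nothing here uses the actual packages or RITM):

```javascript
// Toy model of Node's module cache. The cache Map stands in for
// require.cache; fakeRequire stands in for require().
const cache = new Map()

function fakeRequire (name, load) {
  if (!cache.has(name)) cache.set(name, load())
  return cache.get(name)
}

const loadSub = () => ({ name: '@openai/agents-openai', patched: false })
const patchedLoadSub = () => ({ name: '@openai/agents-openai', patched: true })

// Wrong order: the parent transitively requires (and caches) the unpatched dep.
fakeRequire('@openai/agents-core', () =>
  fakeRequire('@openai/agents-openai', loadSub))
console.log(cache.get('@openai/agents-openai').patched) // false

// Right order: require the sub-package first, while the patcher is active,
// so the parent's later transitive require hits the patched, cached copy.
cache.clear()
fakeRequire('@openai/agents-openai', patchedLoadSub)
fakeRequire('@openai/agents-core', () =>
  fakeRequire('@openai/agents-openai', loadSub))
console.log(cache.get('@openai/agents-openai').patched) // true
```

The same ordering applies in real tests: require the instrumented sub-package before the parent so RITM's hook fires before the transitive import caches it.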
sabrenner left a comment:

Cool! Left some comments on some inconsistencies/incorrect things, but looks good overall.
> **Examples:**
> - `@ai-sdk/vercel` - Vercel AI SDK
> - `langchain` - LangChain framework
> - `llamaindex` - LlamaIndex framework

We don't have llamaindex support here yet, so we can probably remove this so it's only referencing integrations we already support.
> **Examples:**
> - `@langchain/langgraph` - LangGraph workflow engine
> - `crewai` - CrewAI multi-agent framework

Again, maybe we can also exclude this crewai ref, since we don't have it instrumented yet.
> **Observable signs:**
> - Package name suggests orchestration (langgraph, crew, workflow, graph)
> - Has graph/workflow/chain execution methods (`invoke`, `stream`, `run`)

Nit: I guess this could be true for non-orchestration libraries too; we might be able to remove this point.
> **Definition:** Communication protocols, server frameworks, infrastructure layers.
>
> **Examples:**
> - MCP (Model Context Protocol) clients/servers

I think this one we're good to leave, since we don't have any examples to point to 👍
> ## Required Methods
>
> ### getLLMObsSpanRegisterOptions(ctx)

We should specify that this method can return null to not record an LLMObs span for a given ctx.
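A minimal sketch of that null-return contract. Only the method name and the "return null to skip the span" behavior come from this thread; the class, the guard condition, and the returned option names are illustrative stand-ins, not dd-trace's actual base plugin API:

```javascript
// Stand-in plugin class: getLLMObsSpanRegisterOptions(ctx) may return null
// to tell the framework not to record an LLMObs span for that ctx.
class ExampleLLMObsPlugin {
  getLLMObsSpanRegisterOptions (ctx) {
    // Hypothetical guard: only record spans for chat-completion calls,
    // skip everything else (e.g. model listing).
    if (ctx.resource !== 'createChatCompletion') return null
    return { kind: 'llm', name: ctx.resource }
  }
}

const plugin = new ExampleLLMObsPlugin()
console.log(plugin.getLLMObsSpanRegisterOptions({ resource: 'listModels' })) // null
console.log(plugin.getLLMObsSpanRegisterOptions({ resource: 'createChatCompletion' }))
```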
> - `asyncEnd(ctx)` - Calls setLLMObsTags
> - `end(ctx)` - Restores context
>
> **Inherited helpers:**

Ditto on these not being inherited methods but instead methods on the tagger (see above comment).
> const CompositePlugin = require('../../plugins/composite')
>
> module.exports = [
>   CompositePlugin.createPlugin([TracingPlugin, LLMObsPlugin])

I don't think this is a function; we should spell it out more directly that they need to aggregate the plugins and then create a CompositePlugin subclass with a static plugins field, like:

```javascript
'use strict'

const CompositePlugin = require('../../dd-trace/src/plugins/composite')
const VercelAILLMObsPlugin = require('../../dd-trace/src/llmobs/plugins/ai')
const VercelAITracingPlugin = require('./tracing')

class VercelAIPlugin extends CompositePlugin {
  static get id () { return 'ai' }
  static get plugins () {
    return {
      llmobs: VercelAILLMObsPlugin,
      tracing: VercelAITracingPlugin,
    }
  }
}

module.exports = VercelAIPlugin
```

Instead of adding a code block, I pointed to an example of the composite plugin structure.
> **Signature:**
> ```javascript
> assertLlmObsSpanEvent(actual, expected)
> ```

This function does have a docstring with types. Do we still need the description below if it can read the docstring, or are the table/parameters below still necessary?
> ```javascript
> const client = new MyLLMClient({
>   apiKey: 'test-api-key', // Any value works for recording
> ```

This is wrong: not any value works for recording; it needs to be a real API key for the first go. This can be pulled from process.env. I think we can give an example like:

```javascript
apiKey: process.env.OPENAI_API_KEY ?? 'test-api-key'
```

This way it always works locally on the first go as long as the API key is set in the process env; otherwise in CI, where it's not set, it still has a value.

Although I see we specify it below, we should maybe clarify the comment here anyhow.
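The env-fallback pattern discussed in this thread, as a runnable fragment. `FakeLLMClient` is a hypothetical stand-in for a real provider client, and the `OPENAI_API_KEY` variable name mirrors the reviewer's example:

```javascript
// A real key (set locally via the environment) enables first-time cassette
// recording; the placeholder keeps CI replay working where no key is set.
class FakeLLMClient {
  constructor ({ apiKey }) {
    this.apiKey = apiKey
  }
}

const client = new FakeLLMClient({
  apiKey: process.env.OPENAI_API_KEY ?? 'test-api-key'
})

// Either way, apiKey is always a non-empty string.
console.log(client.apiKey.length > 0) // true
```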
> - Cassettes are outdated
>
> **Process:**
> 1. Delete old cassettes: `rm -rf test/llmobs/plugins/{provider}/cassettes/`

We don't want to delete the whole folder's contents, just the one associated with a specific test. I don't have a great heuristic for doing this; usually it's the one in the git diff of whatever PR is being worked on.
* feat(skills): Add LLMObs integration and testing skills

  Adds two new agent skills for LLM Observability (LLMObs) instrumentation. These skills provide comprehensive guidance for agents creating LLMObs integrations, ensuring proper category classification, message handling, and testing strategies.

  Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Summary
Adds two new agent skills for LLM Observability (LLMObs) instrumentation to support agents creating LLMObs integrations.
Changes
New Skills Added
1. llmobs-integration/
- **LLM_CLIENT**: Direct LLM API clients (OpenAI, Anthropic, etc.)
- **MULTI_PROVIDER**: Abstraction layers over multiple providers
- **ORCHESTRATION**: Workflow/chain orchestration libraries
- **INFRASTRUCTURE**: Vector databases, retrievers, embeddings

2. llmobs-testing/
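A hedged sketch of how the four categories could map onto a detection function. The category names come from this PR; the predicates are simplified illustrations, not the skill's actual decision tree:

```javascript
// Simplified stand-in for the skill's category decision tree.
// The input shape and each predicate are hypothetical.
function detectCategory (pkg) {
  if (pkg.isVectorStoreOrRetriever) return 'INFRASTRUCTURE'
  if (pkg.orchestratesWorkflows) return 'ORCHESTRATION'
  if (pkg.abstractsMultipleProviders) return 'MULTI_PROVIDER'
  return 'LLM_CLIENT' // direct API client, e.g. openai or @anthropic-ai/sdk
}

console.log(detectCategory({ orchestratesWorkflows: true })) // 'ORCHESTRATION'
console.log(detectCategory({})) // 'LLM_CLIENT'
```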
Files Added
Why This Change?
Without these skills, agents have no guidance on category classification, message handling, or testing strategies.
These skills ensure agents create consistent, properly-tested LLMObs integrations with correct category classification and appropriate testing patterns.
Testing
Skills are referenced by LLMObs workflow agents during integration creation. Content verified against existing LLMObs integrations (OpenAI, Anthropic, LangChain, etc.).
🤖 Generated with Claude Code