Issue/319 splunk integration#791
Conversation
Greptile Summary

This PR adds first-class Splunk integration to OpenSRE, following the existing Coralogix/BetterStack pattern end-to-end: config model, catalog wiring, env var loading, verification, source detection, SPL query building, a …

Confidence Score: 5/5

Safe to merge — all remaining findings are P2 style/usability suggestions that don't affect correctness or the primary investigation path. Both prior P1 concerns (dead `_client.py` code and unused `_ERROR_KEYWORDS` constant) have been resolved. The two remaining comments are P2: a local import that should be module-level, and the wizard not supporting `verify_ssl` for self-signed certs (still configurable via env var). Core logic, error handling, and test coverage are solid.

Important Files Changed: `app/cli/wizard/flow.py` — `verify_ssl` gap for enterprise self-signed cert environments.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Alert as Alert Payload
    participant DS as detect_sources.py
    participant SPL as build_splunk_spl_query()
    participant BP as build_prompt.py
    participant LLM as LLM Planner
    participant Tool as SplunkSearchTool.run()
    participant Client as SplunkClient
    participant Splunk as Splunk REST API
    Alert->>DS: raw_alert + annotations
    DS->>DS: resolve splunk integration (base_url, token, index)
    DS->>SPL: raw_query / error_message / alert_name / trace_id
    SPL-->>DS: default_query (SPL string)
    DS-->>BP: sources["splunk"] (base_url, index, default_query)
    BP-->>LLM: Splunk Available hint block
    LLM->>Tool: query_splunk_logs(query, time_range_minutes, limit)
    Tool->>Client: search_logs(query, time_range_minutes, limit)
    Client->>Splunk: POST /services/search/jobs/export
    Splunk-->>Client: NDJSON stream
    Client-->>Tool: success, logs, total
    Tool-->>LLM: available, logs, error_logs, truncation_note
```
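The final legs of the diagram (one POST to the export endpoint, an NDJSON stream back) can be sketched with a small parsing helper. This is a hypothetical sketch, not code from the PR; `parse_export_stream` is an invented name, and it assumes the stream's rows carry Splunk's standard `preview`/`result` keys:

```python
import json

def parse_export_stream(lines):
    """Parse NDJSON rows streamed by /services/search/jobs/export.

    Hypothetical helper: Splunk's JSON export interleaves preview rows
    with finalized results, so only rows with preview=false are kept.
    """
    logs = []
    for line in lines:
        if not line.strip():
            continue  # the stream may contain blank keep-alive lines
        row = json.loads(line)
        if row.get("preview") is False and "result" in row:
            logs.append(row["result"])
    return logs

# Example: two streamed rows, one preview and one final
stream = [
    '{"preview": true, "result": {"_raw": "partial"}}',
    '{"preview": false, "result": {"_raw": "ERROR NullPointerException"}}',
]
```

Per the PR description, the real `SplunkClient.search_logs()` additionally normalizes each row to a common shape and reports `success`, `logs`, and `total`.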
Reviews (2): Last reviewed commit: "Add verify ssl flag in onboarding wizard..."
Hey @abhishek-marathe04 👋 Thanks for sharing the working video. It looks like the Splunk happy path is working end-to-end. The investigation picked up the Splunk integration and ran `query_splunk_logs`. One thing I noticed:

Can you please pass …

And add one small test to confirm …
Let me check.
…t, Added test case to confirm same
Thanks for the review. Added this in the latest commit: eb4e6c2. Please check.
Hi @muddlebee, can you look at this one when you get a chance?
My bad, I see the demo video now.
Fix all observations @muddlebee |
yashksaini-coder
left a comment
Ran locally against the current head. Results: 66/66 tests pass, lint clean, no new mypy errors.
All six prior review threads are resolved. The implementation is solid — deterministic SPL builder, streaming export endpoint (no polling), StrictConfigModel on the integration config, "splunk" in EvidenceSource, connection_verified set in detect_sources, and clean verify/catalog wiring.
One real gap before merging:
`SplunkSearchTool` is missing `surfaces`

Every peer log-search tool in this codebase sets `surfaces = ("investigation", "chat")` — BetterStackLogsTool (line 27), CoralogixLogsTool (line 27), AzureMonitorLogsTool (line 55), and others. `SplunkSearchTool` has no `surfaces` attribute, which means it won't appear in the chat surface and won't be listed in tool discovery for that context.

Add to the class body:

```python
surfaces = ("investigation", "chat")
```

Everything else is in good shape. Fix `surfaces` and this is ready to merge.
🎻 "The diff was clean, the tests did pass, the reviewer wept." That poem was about @abhishek-marathe04's PR. 🥹 👋 Join us on Discord - OpenSRE: hang out, contribute, or hunt for features and issues. Everyone's welcome.
@abhishek-marathe04 lovely, nice work 🎸
Resolves conflicts in auto-generated bot files: docs/daily-updates/overview.mdx, docs/daily-updates/2026-04-30.mdx, docs/daily-updates/2026-05-01.mdx, and README.md (contributors section). All four are produced by scheduled bot workflows in main and have no relationship to this PR; resolved by accepting the upstream main version verbatim. Pulls in 35+ commits from main since this PR was opened, including: refactor of integrations module (Tracer-Cloud#1165), Splunk integration (Tracer-Cloud#791), interactive shell improvements (Tracer-Cloud#1159, Tracer-Cloud#1167), Claude Code CLI provider (Tracer-Cloud#1168), and CI quality gate restoration. None of these touch the OpenSearch wizard, detect_sources, or the validation modules this PR modifies. Refs: Tracer-Cloud#1143

Fixes #319
Describe the changes you have made in this PR -
Adds first-class Splunk integration to OpenSRE. When a Splunk instance is configured (via env vars, the local credential store, or the onboarding wizard), the agent automatically detects it, builds an SPL query from the alert payload, and runs `query_splunk_logs` as part of the investigation. Log evidence from Splunk flows into the root-cause diagnosis node alongside all other sources.

Files changed:

- `app/integrations/models.py` — Added `SplunkIntegrationConfig` Pydantic model and `splunk` field on `EffectiveIntegrations`
- `app/integrations/catalog.py` — Wired Splunk into `_SERVICE_KEY_MAP`, `_classify_service_instance()`, and `load_env_integrations()` (reads `SPLUNK_URL`, `SPLUNK_TOKEN`, `SPLUNK_INDEX`, `SPLUNK_VERIFY_SSL`)
- `app/integrations/verify.py` — Added `_verify_splunk()` and wired it into `verify_integrations()`
- `app/services/splunk/client.py` — New `SplunkClient` (uses the `/services/search/jobs/export` streaming endpoint; no polling) and `build_splunk_spl_query()` (deterministic SPL builder — no LLM involvement)
- `app/services/splunk/__init__.py` — Package init re-exporting public surface
- `app/tools/SplunkSearchTool/__init__.py` — `SplunkSearchTool` (`BaseTool` subclass): `is_available()` checks `connection_verified`, `extract_params()` pulls from `sources["splunk"]`, `run()` separates error logs and compacts results
- `app/tools/SplunkSearchTool/_client.py` — Client factory for the tool layer
- `app/nodes/plan_actions/detect_sources.py` — Splunk source detection block: reads resolved integration, extracts SPL from alert annotations (priority cascade: verbatim query → error_message → alert_name → fallback), sets `sources["splunk"]`
- `app/nodes/plan_actions/build_prompt.py` — Added "Splunk Available" hint block so the LLM planner knows the tool is available
- `app/cli/wizard/flow.py` — `_configure_splunk()` prompts user, validates live, saves to credential store
- `app/cli/wizard/integration_health.py` — `validate_splunk_integration()` calls `/services/server/info`
- `app/cli/alert_templates.py` — Added `"splunk"` template for `opensre investigate --template splunk`
- `tests/tools/test_splunk_search_tool.py` — Contract tests + behavior tests (happy path, unavailable paths, error log separation, truncation note)
- `tests/integrations/test_splunk.py` — Config model tests, `build_splunk_spl_query` tests, `SplunkClient.validate_access` tests, catalog classification tests, env var loader tests
- `tests/tools/conftest.py` — Added `"splunk"` entry to `mock_agent_state()`
- `tests/e2e/rca/splunk_errors.json` — E2E alert fixture for the payments-service NullPointerException scenario

Screenshots of the UI changes (If any) -
Splunk working example :
https://youtu.be/Dy-JIaR0Lec
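As a rough illustration of the env-var loading path listed under Files changed, here is a stdlib-only sketch. A dataclass stands in for the PR's Pydantic `StrictConfigModel`, and `load_splunk_from_env` is a hypothetical name mirroring the described `load_env_integrations()` behavior:

```python
import os
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class SplunkIntegrationConfig:
    # Fields follow the env vars listed in the PR; the real model is a
    # Pydantic StrictConfigModel, this is a stdlib stand-in.
    base_url: str
    token: str
    index: Optional[str] = None
    verify_ssl: bool = True

def load_splunk_from_env(env=os.environ):
    """Build a Splunk config from SPLUNK_* env vars, or None if absent."""
    base_url, token = env.get("SPLUNK_URL"), env.get("SPLUNK_TOKEN")
    if not base_url or not token:
        return None  # integration absent -> detect_sources skips Splunk
    return SplunkIntegrationConfig(
        base_url=base_url.rstrip("/"),
        token=token,
        index=env.get("SPLUNK_INDEX"),
        verify_ssl=env.get("SPLUNK_VERIFY_SSL", "true").lower() != "false",
    )
```

The trailing-slash strip and the `"false"` string check are illustrative choices, not necessarily how the PR parses these values.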
Code Understanding and AI Usage
Did you use AI assistance (ChatGPT, Claude, Copilot, etc.) to write any part of this code?
If you used AI assistance:
Explain your implementation approach:
Problem: OpenSRE had no way to query Splunk logs during an investigation, leaving a gap for teams that use Splunk as their primary log platform.
Approach: Followed the existing integration pattern (Coralogix/BetterStack as reference). The key design decision was to keep SPL query generation fully deterministic — `build_splunk_spl_query()` assembles the query from alert fields in a priority cascade (verbatim annotation query → `error_message` → `alert_name` → fallback). The LLM only decides whether to call the tool; it never writes SPL. This makes investigations reproducible and auditable.

Used the Splunk `/services/search/jobs/export` streaming endpoint instead of the two-step create-job/poll-results flow — this is simpler, has no polling loop, and is ideal for the time-bounded RCA use case.

Alternatives considered: Using the HEC (HTTP Event Collector) endpoint — rejected because HEC is write-only. Using the standard search job API with polling — rejected in favor of the simpler export endpoint, which returns results in one request.
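A minimal sketch of that priority cascade, assuming the alert arrives as a plain dict; the payload keys and exact quoting are illustrative, not the PR's actual implementation in app/services/splunk/client.py:

```python
def build_splunk_spl_query(alert, index=None, limit=100):
    """Deterministic SPL from alert fields: verbatim query ->
    error_message -> alert_name -> fallback, always capped with | head."""
    annotations = alert.get("annotations", {})
    verbatim = annotations.get("query")
    if verbatim:
        spl = verbatim  # 1. verbatim SPL from the alert, used as-is
    else:
        scope = f"index={index} " if index else ""
        if annotations.get("error_message"):
            needle = annotations["error_message"]  # 2. search the error text
        elif alert.get("alert_name"):
            needle = alert["alert_name"]           # 3. fall back to alert name
        else:
            needle = "*"                           # 4. last resort: everything
        spl = f"search {scope}*" if needle == "*" else f'search {scope}"{needle}"'
    if "| head" not in spl:  # appended only if missing, to bound result size
        spl += f" | head {limit}"
    return spl
```

Because no LLM is involved, the same alert always yields the same SPL, which is what makes the investigation step reproducible.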
Key components:

- `SplunkClient.search_logs()` — POSTs to the export endpoint, parses the NDJSON stream, normalizes rows to a common shape
- `SplunkClient.validate_access()` — hits `/services/server/info` to confirm credentials during onboarding
- `build_splunk_spl_query()` — deterministic SPL builder with a priority cascade; always appends `| head N` to prevent runaway queries
- `SplunkSearchTool.run()` — calls the client, separates error logs by keyword match, applies `compact_logs()` before returning evidence

Edge cases handled:

- Missing `base_url` or `token` → tool returns `available: False` immediately, no HTTP call made
- Query failure → `error` field in result, investigation continues with other sources
- `| head N` appended only if missing, preventing runaway queries
- `verify_ssl=False` supported for self-signed corporate Splunk instances (common in enterprise)
- Only result-type rows kept

Checklist before requesting a review
Note: Please check Allow edits from maintainers if you would like us to assist in the PR.