
Issue/319 splunk integration #791

Merged
muddlebee merged 30 commits into Tracer-Cloud:main from abhishek-marathe04:issue/319-splunk-integration
Apr 30, 2026

Conversation

@abhishek-marathe04
Contributor

@abhishek-marathe04 abhishek-marathe04 commented Apr 23, 2026

Fixes #319

Describe the changes you have made in this PR -

Adds first-class Splunk integration to OpenSRE. When a Splunk instance is configured (via env vars, the local credential store, or the onboarding wizard), the agent automatically detects it, builds an SPL query from the alert payload, and runs query_splunk_logs as part of the investigation. Log evidence from Splunk flows into the root-cause diagnosis node alongside all other sources.

Files changed:

  • app/integrations/models.py — Added SplunkIntegrationConfig Pydantic model and splunk field on EffectiveIntegrations
  • app/integrations/catalog.py — Wired Splunk into _SERVICE_KEY_MAP, _classify_service_instance(), and load_env_integrations() (reads SPLUNK_URL, SPLUNK_TOKEN, SPLUNK_INDEX, SPLUNK_VERIFY_SSL)
  • app/integrations/verify.py — Added _verify_splunk() and wired it into verify_integrations()
  • app/services/splunk/client.py — New SplunkClient (uses the /services/search/jobs/export streaming endpoint; no polling) and build_splunk_spl_query() (deterministic SPL builder — no LLM involvement)
  • app/services/splunk/__init__.py — Package init re-exporting public surface
  • app/tools/SplunkSearchTool/__init__.py — SplunkSearchTool (BaseTool subclass): is_available() checks connection_verified, extract_params() pulls from sources["splunk"], run() separates error logs and compacts results
  • app/tools/SplunkSearchTool/_client.py — Client factory for the tool layer
  • app/nodes/plan_actions/detect_sources.py — Splunk source detection block: reads resolved integration, extracts SPL from alert annotations (priority cascade: verbatim query → error_message → alert_name → fallback), sets sources["splunk"]
  • app/nodes/plan_actions/build_prompt.py — Added "Splunk Available" hint block so the LLM planner knows the tool is available
  • app/cli/wizard/flow.py — _configure_splunk() prompts the user, validates live, saves to the credential store
  • app/cli/wizard/integration_health.py — validate_splunk_integration() calls /services/server/info
  • app/cli/alert_templates.py — Added "splunk" template for opensre investigate --template splunk
  • tests/tools/test_splunk_search_tool.py — Contract tests + behavior tests (happy path, unavailable paths, error log separation, truncation note)
  • tests/integrations/test_splunk.py — Config model tests, build_splunk_spl_query tests, SplunkClient.validate_access tests, catalog classification tests, env var loader tests
  • tests/tools/conftest.py — Added "splunk" entry to mock_agent_state()
  • tests/e2e/rca/splunk_errors.json — E2E alert fixture for the payments-service NullPointerException scenario
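The env-var loading path described above (catalog.py reading SPLUNK_URL, SPLUNK_TOKEN, SPLUNK_INDEX, SPLUNK_VERIFY_SSL) can be sketched roughly as follows. The helper name and the returned dict shape are illustrative assumptions, not the project's actual SplunkIntegrationConfig:

```python
import os

def load_splunk_from_env():
    """Illustrative env-var loader; the real SplunkIntegrationConfig differs."""
    url = os.environ.get("SPLUNK_URL")
    token = os.environ.get("SPLUNK_TOKEN")
    if not url or not token:
        return None  # integration not configured
    return {
        "base_url": url.rstrip("/"),
        "token": token,
        "index": os.environ.get("SPLUNK_INDEX", "main"),
        # Anything other than an explicit "false"/"0"/"no" keeps SSL verification on.
        "verify_ssl": os.environ.get("SPLUNK_VERIFY_SSL", "true").lower()
        not in ("false", "0", "no"),
    }
```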

Screenshots of the UI changes (If any) -

Splunk working example:
https://youtu.be/Dy-JIaR0Lec


Code Understanding and AI Usage

Did you use AI assistance (ChatGPT, Claude, Copilot, etc.) to write any part of this code?

  • No, I wrote all the code myself
  • Yes, I used AI assistance (continue below)

If you used AI assistance:

  • I have reviewed every single line of the AI-generated code
  • I can explain the purpose and logic of each function/component I added
  • I have tested edge cases and understand how the code handles them
  • I have modified the AI output to follow this project's coding standards and conventions

Explain your implementation approach:

Problem: OpenSRE had no way to query Splunk logs during an investigation, leaving a gap for teams that use Splunk as their primary log platform.

Approach: Followed the existing integration pattern (Coralogix/BetterStack as reference). The key design decision was to keep SPL query generation fully deterministic — build_splunk_spl_query() assembles the query from alert fields in a priority cascade (verbatim annotation query → error_message → alert_name → fallback). The LLM only decides whether to call the tool; it never writes SPL. This makes investigations reproducible and auditable.
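A minimal sketch of that cascade, under the assumption that the alert fields carry the names used above; the helper name, defaults, and exact signature of build_splunk_spl_query() are illustrative:

```python
def build_spl_query(alert, index="main", limit=100):
    """Build an SPL query from alert fields, most specific source first."""
    if alert.get("raw_query"):
        spl = alert["raw_query"]  # verbatim annotation wins outright
    elif alert.get("error_message"):
        # Quote the message so SPL treats it as a literal search term.
        spl = f'search index={index} "{alert["error_message"]}"'
    elif alert.get("alert_name"):
        spl = f'search index={index} "{alert["alert_name"]}"'
    else:
        spl = f"search index={index} log_level=ERROR"  # last-resort fallback
    # Always bound result size; append | head only if the verbatim
    # query did not already include one.
    if "| head" not in spl:
        spl += f" | head {limit}"
    return spl
```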

Used the Splunk /services/search/jobs/export streaming endpoint instead of the two-step create-job/poll-results flow — this is simpler, has no polling loop, and is ideal for the time-bounded RCA use case.

Alternatives considered: Using the HEC (HTTP Event Collector) endpoint — rejected because HEC is write-only. Using the standard search job API with polling — rejected in favor of the simpler export endpoint which returns results in one request.
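A hedged sketch of the one-request export flow: the /services/search/jobs/export endpoint and output_mode=json (one JSON object per line) are real Splunk REST API features, while the surrounding names, defaults, and error handling here are illustrative rather than the project's SplunkClient:

```python
import json

def parse_export_stream(lines):
    """Keep only result-type rows from Splunk's NDJSON export stream."""
    rows = []
    for line in lines:
        if not line:
            continue  # the stream can contain blank keep-alive lines
        obj = json.loads(line)
        # Metadata/preview rows lack a "result" key; drop them defensively.
        if "result" in obj:
            rows.append(obj["result"])
    return rows

def export_search(base_url, token, spl, earliest="-60m", verify=True):
    import requests  # third-party; imported here so the parser stays stdlib-only

    resp = requests.post(
        f"{base_url}/services/search/jobs/export",
        headers={"Authorization": f"Bearer {token}"},
        data={"search": spl, "output_mode": "json", "earliest_time": earliest},
        stream=True,    # results arrive incrementally, no job polling
        verify=verify,  # bool, or a path to a CA bundle
        timeout=60,
    )
    resp.raise_for_status()
    return parse_export_stream(resp.iter_lines())
```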

Key components:

  • SplunkClient.search_logs() — POSTs to the export endpoint, parses the NDJSON stream, normalizes rows to a common shape
  • SplunkClient.validate_access() — hits /services/server/info to confirm credentials during onboarding
  • build_splunk_spl_query() — deterministic SPL builder with a priority cascade; always appends | head N to prevent runaway queries
  • SplunkSearchTool.run() — calls the client, separates error logs by keyword match, applies compact_logs() before returning evidence
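The error-log separation step in run() can be sketched like this; the keyword list and the _raw field lookup are assumptions for illustration, not the tool's actual implementation:

```python
ERROR_KEYWORDS = ("error", "exception", "fatal", "traceback")  # illustrative list

def separate_error_logs(logs):
    """Split logs into (error_logs, other_logs) by keyword match on _raw text."""
    errors, others = [], []
    for log in logs:
        text = str(log.get("_raw", "")).lower()
        if any(kw in text for kw in ERROR_KEYWORDS):
            errors.append(log)
        else:
            others.append(log)
    return errors, others
```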

Edge cases handled:

  • Missing base_url or token → tool returns available: False immediately, no HTTP call made
  • Splunk REST API error (4xx/5xx) → caught, returned as error field in result, investigation continues with other sources
  • Raw SPL supplied in alert annotations → used verbatim; | head N appended only if missing, preventing runaway queries
  • verify_ssl=False supported for self-signed corporate Splunk instances (common in enterprise)
  • NDJSON export stream may contain non-result lines (metadata rows) → parsed defensively, only result-type rows kept
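The first edge case above — no HTTP call when configuration is incomplete — reduces to a short guard; the key names and result shape are illustrative:

```python
def run_guarded(source):
    # Missing base_url or token short-circuits before any HTTP call is made,
    # so the investigation continues with other sources.
    if not source.get("base_url") or not source.get("token"):
        return {"available": False, "reason": "splunk not configured"}
    return {"available": True}  # the real tool would query the client here
```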

Checklist before requesting a review

  • I have added proper PR title and linked to the issue
  • I have performed a self-review of my code
  • I can explain the purpose of every function, class, and logic block I added
  • I understand why my changes work and have tested them thoroughly
  • I have considered potential edge cases and how my code handles them
  • If it is a core feature, I have added thorough tests
  • My code follows the project's style guidelines and conventions

Note: Please check Allow edits from maintainers if you would like us to assist in the PR.

@abhishek-marathe04 abhishek-marathe04 marked this pull request as draft April 23, 2026 12:47
Comment thread app/services/splunk/client.py Fixed
@greptile-apps
Contributor

greptile-apps Bot commented Apr 23, 2026

Greptile Summary

This PR adds first-class Splunk integration to OpenSRE, following the existing Coralogix/BetterStack pattern end-to-end: config model, catalog wiring, env var loading, verification, source detection, SPL query building, a BaseTool subclass, wizard onboarding, and comprehensive tests. The implementation is clean and well-tested with two minor P2 items remaining.

Confidence Score: 5/5

Safe to merge — all remaining findings are P2 style/usability suggestions that don't affect correctness or the primary investigation path.

Both prior P1 concerns (dead _client.py code and unused _ERROR_KEYWORDS constant) have been resolved. The two remaining comments are P2: a local import that should be module-level, and the wizard not supporting verify_ssl for self-signed certs (still configurable via env var). Core logic, error handling, and test coverage are solid.

app/cli/wizard/flow.py — verify_ssl gap for enterprise self-signed cert environments.

Important Files Changed

Filename Overview
app/services/splunk/client.py — New Splunk REST client; clean implementation using the export streaming endpoint; handles HTTP errors, cert verification, NDJSON parsing, and row normalization correctly.
app/tools/SplunkSearchTool/__init__.py — BaseTool subclass for Splunk search; now correctly imports and uses make_client/unavailable from _client.py; error-log separation and compaction follow existing patterns.
app/nodes/plan_actions/detect_sources.py — Splunk source detection block added; has a minor style inconsistency with a local import that should be module-level; the SPL priority cascade logic is correct.
app/cli/wizard/flow.py — Splunk wizard configuration added; verify_ssl is not prompted for or saved, so enterprises with self-signed certs cannot complete wizard-based setup.
app/integrations/catalog.py — Splunk wired into service key map, classification, and env var loader; SPLUNK_INSTANCES multi-instance path correctly prevents double-loading single-instance env vars.
app/integrations/models.py — SplunkIntegrationConfig added with proper field validators for normalization; follows the StrictConfigModel pattern used by other integrations.
app/integrations/verify.py — _verify_splunk added and wired into verify_integrations; follows the exact same pattern as _verify_opensearch and _verify_openobserve.
tests/integrations/test_splunk.py — Comprehensive test coverage for config model, SPL builder, validate_access, search_logs, catalog classification, and env var loading.
tests/tools/test_splunk_search_tool.py — Good contract + behavior coverage including unavailable paths, error-log separation, truncation notes, and verify_ssl propagation.

Sequence Diagram

sequenceDiagram
    participant Alert as Alert Payload
    participant DS as detect_sources.py
    participant SPL as build_splunk_spl_query()
    participant BP as build_prompt.py
    participant LLM as LLM Planner
    participant Tool as SplunkSearchTool.run()
    participant Client as SplunkClient
    participant Splunk as Splunk REST API

    Alert->>DS: raw_alert + annotations
    DS->>DS: resolve splunk integration (base_url, token, index)
    DS->>SPL: raw_query / error_message / alert_name / trace_id
    SPL-->>DS: default_query (SPL string)
    DS-->>BP: sources["splunk"] (base_url, index, default_query)
    BP-->>LLM: Splunk Available hint block
    LLM->>Tool: query_splunk_logs(query, time_range_minutes, limit)
    Tool->>Client: search_logs(query, time_range_minutes, limit)
    Client->>Splunk: POST /services/search/jobs/export
    Splunk-->>Client: NDJSON stream
    Client-->>Tool: success, logs, total
    Tool-->>LLM: available, logs, error_logs, truncation_note

Reviews (2): Last reviewed commit: "Add verify ssl flag in onboarding wizard..."

Comment thread app/tools/SplunkSearchTool/_client.py
Comment thread app/services/splunk/client.py
@abhishek-marathe04 abhishek-marathe04 marked this pull request as ready for review April 24, 2026 11:49
@rrajan94
Collaborator

Hey @abhishek-marathe04 👋

Thanks for sharing the working video. It looks like the Splunk happy path is working end-to-end. The investigation picked up the Splunk integration, ran query_splunk_logs, and used Splunk evidence in the final report.

One thing I noticed: ca_bundle is supported in config, env loading, wizard validation, and SplunkConfig.ssl_verify, but it does not seem to reach the actual SplunkSearchTool execution path.

detect_sources() adds verify_ssl to sources["splunk"], but not ca_bundle. Then SplunkSearchTool.extract_params() / run() and _client.make_client() only pass verify_ssl, so a setup where integrations verify splunk passes using SPLUNK_CA_BUNDLE could still fail when the tool actually runs a Splunk query.

Can you please pass ca_bundle through:

  • detect_sources()
  • SplunkSearchTool.extract_params()
  • SplunkSearchTool.run()
  • _client.make_client()

And add one small test to confirm ca_bundle is passed into SplunkConfig during tool execution.
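The pass-through the reviewer asks for typically collapses into the single `verify` argument that requests accepts (a bool, or a path to a CA bundle). This helper name and wiring are illustrative, not the project's actual _client.make_client():

```python
def resolve_verify(verify_ssl=True, ca_bundle=None):
    """Collapse verify_ssl + ca_bundle into requests' single `verify` argument."""
    if ca_bundle:
        return ca_bundle  # path to a custom CA bundle (e.g. from SPLUNK_CA_BUNDLE)
    return verify_ssl  # plain bool: enable/disable certificate verification
```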

@abhishek-marathe04
Contributor Author

> One thing I noticed: ca_bundle is supported in config, env loading, wizard validation, and SplunkConfig.ssl_verify, but it does not seem to reach the actual SplunkSearchTool execution path. […]

Let me check

@abhishek-marathe04
Contributor Author

> One thing I noticed: ca_bundle is supported in config, env loading, wizard validation, and SplunkConfig.ssl_verify, but it does not seem to reach the actual SplunkSearchTool execution path. […]

Thanks for the review. Added this in the latest commit: eb4e6c2. Please check.

@abhishek-marathe04
Contributor Author

Hi @muddlebee, can you look at this one when you get a chance?

@muddlebee
Collaborator

muddlebee commented Apr 28, 2026

@abhishek-marathe04 can you share a demo video please, and walk us through the entire integration?

My bad, I see the demo video now.

Comment thread app/tools/SplunkSearchTool/__init__.py Outdated
Comment thread app/integrations/catalog.py
Comment thread app/nodes/plan_actions/detect_sources.py
@abhishek-marathe04
Contributor Author

Fixed all observations @muddlebee.
It's ready for review.
Thanks

Collaborator

@muddlebee muddlebee left a comment


LGTM

Collaborator

@yashksaini-coder yashksaini-coder left a comment


Ran locally against the current head. Results: 66/66 tests pass, lint clean, no new mypy errors.

All six prior review threads are resolved. The implementation is solid — deterministic SPL builder, streaming export endpoint (no polling), StrictConfigModel on the integration config, "splunk" in EvidenceSource, connection_verified set in detect_sources, and clean verify/catalog wiring.

One real gap before merging:


SplunkSearchTool is missing surfaces

Every peer log-search tool in this codebase sets surfaces = ("investigation", "chat") — BetterStackLogsTool (line 27), CoralogixLogsTool (line 27), AzureMonitorLogsTool (line 55), and others. SplunkSearchTool has no surfaces attribute, which means it won't appear in the chat surface and won't be listed in tool discovery for that context.

Add to the class body:

surfaces = ("investigation", "chat")

Everything else is in good shape. Fix surfaces and this is ready to merge.

Comment thread app/tools/SplunkSearchTool/__init__.py
@muddlebee muddlebee merged commit 83a5d13 into Tracer-Cloud:main Apr 30, 2026
10 checks passed
@github-actions
Contributor

🎻 "The diff was clean, the tests did pass, the reviewer wept." That poem was about @abhishek-marathe04's PR. 🥹


👋 Join us on Discord - OpenSRE : hang out, contribute, or hunt for features and issues. Everyone's welcome.

@muddlebee
Collaborator

@abhishek-marathe04 lovely, nice work 🎸

Sarah-Salah added a commit to Sarah-Salah/opensre that referenced this pull request May 5, 2026
Resolves conflicts in auto-generated bot files: docs/daily-updates/overview.mdx, docs/daily-updates/2026-04-30.mdx, docs/daily-updates/2026-05-01.mdx, and README.md (contributors section). All four are produced by scheduled bot workflows in main and have no relationship to this PR; resolved by accepting the upstream main version verbatim.

Pulls in 35+ commits from main since this PR was opened, including: refactor of integrations module (Tracer-Cloud#1165), Splunk integration (Tracer-Cloud#791), interactive shell improvements (Tracer-Cloud#1159, Tracer-Cloud#1167), Claude Code CLI provider (Tracer-Cloud#1168), and CI quality gate restoration. None of these touch the OpenSearch wizard, detect_sources, or the validation modules this PR modifies.

Refs: Tracer-Cloud#1143


Development

Successfully merging this pull request may close these issues.

[FEATURE] Add Splunk integration for log search and RCA evidence

5 participants