Issue/319 splunk integration#791
Conversation
Greptile Summary

This PR adds first-class Splunk integration to OpenSRE, following the existing Coralogix/BetterStack pattern end-to-end: config model, catalog wiring, env var loading, verification, source detection, SPL query building, a …

Confidence Score: 5/5

Safe to merge — all remaining findings are P2 style/usability suggestions that don't affect correctness or the primary investigation path. Both prior P1 concerns (dead `_client.py` code and unused `_ERROR_KEYWORDS` constant) have been resolved. The two remaining comments are P2: a local import that should be module-level, and the wizard not supporting `verify_ssl` for self-signed certs (still configurable via env var). Core logic, error handling, and test coverage are solid.

Important Files Changed: `app/cli/wizard/flow.py` — `verify_ssl` gap for enterprise self-signed cert environments.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Alert as Alert Payload
    participant DS as detect_sources.py
    participant SPL as build_splunk_spl_query()
    participant BP as build_prompt.py
    participant LLM as LLM Planner
    participant Tool as SplunkSearchTool.run()
    participant Client as SplunkClient
    participant Splunk as Splunk REST API
    Alert->>DS: raw_alert + annotations
    DS->>DS: resolve splunk integration (base_url, token, index)
    DS->>SPL: raw_query / error_message / alert_name / trace_id
    SPL-->>DS: default_query (SPL string)
    DS-->>BP: sources["splunk"] (base_url, index, default_query)
    BP-->>LLM: Splunk Available hint block
    LLM->>Tool: query_splunk_logs(query, time_range_minutes, limit)
    Tool->>Client: search_logs(query, time_range_minutes, limit)
    Client->>Splunk: POST /services/search/jobs/export
    Splunk-->>Client: NDJSON stream
    Client-->>Tool: success, logs, total
    Tool-->>LLM: available, logs, error_logs, truncation_note
```
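The final legs of the diagram (one POST to the export endpoint, an NDJSON stream back) can be sketched with a small parsing helper. This is a hypothetical sketch, not code from the PR; `parse_export_stream` is an invented name, and it assumes the stream's rows carry Splunk's standard `preview`/`result` keys:

```python
import json

def parse_export_stream(lines):
    """Parse NDJSON rows streamed by /services/search/jobs/export.

    Hypothetical helper: Splunk's JSON export interleaves preview rows
    with finalized results, so only rows with preview=false are kept.
    """
    logs = []
    for line in lines:
        if not line.strip():
            continue  # the stream may contain blank keep-alive lines
        row = json.loads(line)
        if row.get("preview") is False and "result" in row:
            logs.append(row["result"])
    return logs

# Example: two streamed rows, one preview and one final
stream = [
    '{"preview": true, "result": {"_raw": "partial"}}',
    '{"preview": false, "result": {"_raw": "ERROR NullPointerException"}}',
]
```

Per the PR description, the real `SplunkClient.search_logs()` additionally normalizes each row to a common shape and reports `success`, `logs`, and `total`.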
Reviews (2): Last reviewed commit: "Add verify ssl flag in onboarding wizard..."
Hey @abhishek-marathe04 👋 Thanks for sharing the working video. It looks like the Splunk happy path is working end-to-end. The investigation picked up the Splunk integration and ran `query_splunk_logs`. One thing I noticed:

Can you please pass …

And add one small test to confirm …
Let me check.
…t, Added test case to confirm same
Thanks for the review. Added this in the latest commit: eb4e6c2. Please check.
Hi @muddlebee, can you look at this one when you get a chance?
My bad, I see the demo video now.
Fix all observations @muddlebee |
yashksaini-coder
left a comment
Ran locally against the current head. Results: 66/66 tests pass, lint clean, no new mypy errors.
All six prior review threads are resolved. The implementation is solid — deterministic SPL builder, streaming export endpoint (no polling), StrictConfigModel on the integration config, "splunk" in EvidenceSource, connection_verified set in detect_sources, and clean verify/catalog wiring.
One real gap before merging:
`SplunkSearchTool` is missing `surfaces`

Every peer log-search tool in this codebase sets `surfaces = ("investigation", "chat")` — BetterStackLogsTool (line 27), CoralogixLogsTool (line 27), AzureMonitorLogsTool (line 55), and others. `SplunkSearchTool` has no `surfaces` attribute, which means it won't appear in the chat surface and won't be listed in tool discovery for that context.

Add to the class body:

```python
surfaces = ("investigation", "chat")
```

Everything else is in good shape. Fix `surfaces` and this is ready to merge.
🎻 "The diff was clean, the tests did pass, the reviewer wept." That poem was about @abhishek-marathe04's PR. 🥹 👋 Join us on Discord - OpenSRE: hang out, contribute, or hunt for features and issues. Everyone's welcome.
@abhishek-marathe04 lovely, nice work 🎸
Resolves conflicts in auto-generated bot files: docs/daily-updates/overview.mdx, docs/daily-updates/2026-04-30.mdx, docs/daily-updates/2026-05-01.mdx, and README.md (contributors section). All four are produced by scheduled bot workflows in main and have no relationship to this PR; resolved by accepting the upstream main version verbatim. Pulls in 35+ commits from main since this PR was opened, including: refactor of integrations module (Tracer-Cloud#1165), Splunk integration (Tracer-Cloud#791), interactive shell improvements (Tracer-Cloud#1159, Tracer-Cloud#1167), Claude Code CLI provider (Tracer-Cloud#1168), and CI quality gate restoration. None of these touch the OpenSearch wizard, detect_sources, or the validation modules this PR modifies. Refs: Tracer-Cloud#1143

Fixes #319
Describe the changes you have made in this PR -
Adds first-class Splunk integration to OpenSRE. When a Splunk instance is configured (via env vars, the local credential store, or the onboarding wizard), the agent automatically detects it, builds an SPL query from the alert payload, and runs `query_splunk_logs` as part of the investigation. Log evidence from Splunk flows into the root-cause diagnosis node alongside all other sources.

Files changed:

- `app/integrations/models.py` — Added `SplunkIntegrationConfig` Pydantic model and `splunk` field on `EffectiveIntegrations`
- `app/integrations/catalog.py` — Wired Splunk into `_SERVICE_KEY_MAP`, `_classify_service_instance()`, and `load_env_integrations()` (reads `SPLUNK_URL`, `SPLUNK_TOKEN`, `SPLUNK_INDEX`, `SPLUNK_VERIFY_SSL`)
- `app/integrations/verify.py` — Added `_verify_splunk()` and wired it into `verify_integrations()`
- `app/services/splunk/client.py` — New `SplunkClient` (uses the `/services/search/jobs/export` streaming endpoint; no polling) and `build_splunk_spl_query()` (deterministic SPL builder — no LLM involvement)
- `app/services/splunk/__init__.py` — Package init re-exporting public surface
- `app/tools/SplunkSearchTool/__init__.py` — `SplunkSearchTool` (`BaseTool` subclass): `is_available()` checks `connection_verified`, `extract_params()` pulls from `sources["splunk"]`, `run()` separates error logs and compacts results
- `app/tools/SplunkSearchTool/_client.py` — Client factory for the tool layer
- `app/nodes/plan_actions/detect_sources.py` — Splunk source detection block: reads resolved integration, extracts SPL from alert annotations (priority cascade: verbatim query → error_message → alert_name → fallback), sets `sources["splunk"]`
- `app/nodes/plan_actions/build_prompt.py` — Added "Splunk Available" hint block so the LLM planner knows the tool is available
- `app/cli/wizard/flow.py` — `_configure_splunk()` prompts user, validates live, saves to credential store
- `app/cli/wizard/integration_health.py` — `validate_splunk_integration()` calls `/services/server/info`
- `app/cli/alert_templates.py` — Added `"splunk"` template for `opensre investigate --template splunk`
- `tests/tools/test_splunk_search_tool.py` — Contract tests + behavior tests (happy path, unavailable paths, error log separation, truncation note)
- `tests/integrations/test_splunk.py` — Config model tests, `build_splunk_spl_query` tests, `SplunkClient.validate_access` tests, catalog classification tests, env var loader tests
- `tests/tools/conftest.py` — Added `"splunk"` entry to `mock_agent_state()`
- `tests/e2e/rca/splunk_errors.json` — E2E alert fixture for the payments-service NullPointerException scenario

Screenshots of the UI changes (If any) -
Splunk working example :
https://youtu.be/Dy-JIaR0Lec
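As a rough illustration of the env-var loading path listed under Files changed, here is a stdlib-only sketch. A dataclass stands in for the PR's Pydantic `StrictConfigModel`, and `load_splunk_from_env` is a hypothetical name mirroring the described `load_env_integrations()` behavior:

```python
import os
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class SplunkIntegrationConfig:
    # Fields follow the env vars listed in the PR; the real model is a
    # Pydantic StrictConfigModel, this is a stdlib stand-in.
    base_url: str
    token: str
    index: Optional[str] = None
    verify_ssl: bool = True

def load_splunk_from_env(env=os.environ):
    """Build a Splunk config from SPLUNK_* env vars, or None if absent."""
    base_url, token = env.get("SPLUNK_URL"), env.get("SPLUNK_TOKEN")
    if not base_url or not token:
        return None  # integration absent -> detect_sources skips Splunk
    return SplunkIntegrationConfig(
        base_url=base_url.rstrip("/"),
        token=token,
        index=env.get("SPLUNK_INDEX"),
        verify_ssl=env.get("SPLUNK_VERIFY_SSL", "true").lower() != "false",
    )
```

The trailing-slash strip and the `"false"` string check are illustrative choices, not necessarily how the PR parses these values.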
Code Understanding and AI Usage
Did you use AI assistance (ChatGPT, Claude, Copilot, etc.) to write any part of this code?
If you used AI assistance:
Explain your implementation approach:
Problem: OpenSRE had no way to query Splunk logs during an investigation, leaving a gap for teams that use Splunk as their primary log platform.
Approach: Followed the existing integration pattern (Coralogix/BetterStack as reference). The key design decision was to keep SPL query generation fully deterministic — `build_splunk_spl_query()` assembles the query from alert fields in a priority cascade (verbatim annotation query → `error_message` → `alert_name` → fallback). The LLM only decides whether to call the tool; it never writes SPL. This makes investigations reproducible and auditable.

Used the Splunk `/services/search/jobs/export` streaming endpoint instead of the two-step create-job/poll-results flow — this is simpler, has no polling loop, and is ideal for the time-bounded RCA use case.

Alternatives considered: Using the HEC (HTTP Event Collector) endpoint — rejected because HEC is write-only. Using the standard search job API with polling — rejected in favor of the simpler export endpoint, which returns results in one request.
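A minimal sketch of that priority cascade, assuming the alert arrives as a plain dict; the payload keys and exact quoting are illustrative, not the PR's actual implementation in app/services/splunk/client.py:

```python
def build_splunk_spl_query(alert, index=None, limit=100):
    """Deterministic SPL from alert fields: verbatim query ->
    error_message -> alert_name -> fallback, always capped with | head."""
    annotations = alert.get("annotations", {})
    verbatim = annotations.get("query")
    if verbatim:
        spl = verbatim  # 1. verbatim SPL from the alert, used as-is
    else:
        scope = f"index={index} " if index else ""
        if annotations.get("error_message"):
            needle = annotations["error_message"]  # 2. search the error text
        elif alert.get("alert_name"):
            needle = alert["alert_name"]           # 3. fall back to alert name
        else:
            needle = "*"                           # 4. last resort: everything
        spl = f"search {scope}*" if needle == "*" else f'search {scope}"{needle}"'
    if "| head" not in spl:  # appended only if missing, to bound result size
        spl += f" | head {limit}"
    return spl
```

Because no LLM is involved, the same alert always yields the same SPL, which is what makes the investigation step reproducible.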
Key components:

- `SplunkClient.search_logs()` — POSTs to the export endpoint, parses the NDJSON stream, normalizes rows to a common shape
- `SplunkClient.validate_access()` — hits `/services/server/info` to confirm credentials during onboarding
- `build_splunk_spl_query()` — deterministic SPL builder with a priority cascade; always appends `| head N` to prevent runaway queries
- `SplunkSearchTool.run()` — calls the client, separates error logs by keyword match, applies `compact_logs()` before returning evidence

Edge cases handled:

- Missing `base_url` or `token` → tool returns `available: False` immediately, no HTTP call made
- Query failure → `error` field in result, investigation continues with other sources
- `| head N` appended only if missing, preventing runaway queries
- `verify_ssl=False` supported for self-signed corporate Splunk instances (common in enterprise)
- Only result-type rows kept

Checklist before requesting a review
Note: Please check Allow edits from maintainers if you would like us to assist in the PR.