Skip to content

test(tools): add dedicated unit tests for Snowflake, OpenObserve, OpenSearch tools (#790)#1238

Merged
muddlebee merged 2 commits intoTracer-Cloud:mainfrom
Powlisher:issue/790-snowflake-openobserve-opensearch-tests
May 4, 2026
Merged

test(tools): add dedicated unit tests for Snowflake, OpenObserve, OpenSearch tools (#790)#1238
muddlebee merged 2 commits intoTracer-Cloud:mainfrom
Powlisher:issue/790-snowflake-openobserve-opensearch-tests

Conversation

@Powlisher
Copy link
Copy Markdown
Contributor

Fixes #790

Describe the changes you have made in this PR

Adds three new dedicated test files under `tests/tools/` with the file names and proposed coverage exactly as listed in #790. Each file mirrors the existing tool test conventions: inherits from `BaseToolContract` for the shared metadata/contract checks, uses pytest `monkeypatch` to stub `httpx.post` (or `ElasticsearchClient` for OpenSearch) — no live HTTP calls, no required external services.

File New tests What it covers
`tests/tools/test_snowflake_query_history_tool.py` 26 `is_available` + `extract_params`, account/token validation, default vs custom SQL with `LIMIT` enforcement, optional session params (`warehouse`/`role`/`database`/`schema`), bounded limit cap, row normalization for both dict-rows AND tabular rows + `resultSetMetaData`, HTTP failure handling.
`tests/tools/test_openobserve_logs_tool.py` 36 `is_available` across token-only / basic-auth-only / both, `extract_params`, bearer vs basic `Authorization` header behavior (bearer wins when both provided), endpoint org templating, default SQL fallback, time window + `size` payload shape, `stream_name` only when set, bounded limit cap, record extraction across the four supported response shapes (top-level `hits` list, Elastic-style `hits.hits._source`, `records`, `data`), HTTP failure handling.
`tests/tools/test_opensearch_analytics_tool.py` 26 `is_available` + `extract_params`, `ElasticsearchConfig` normalization (URL trim, blank `api_key` → `None`, blank index → `*`), search call shape (`query` default, `time_range_minutes` floor 1, bounded limit, `_MAX_HARD_LIMIT` cap), log filtering (non-dict items dropped), missing / non-list `logs` field, client error propagation (with and without `error` field).

Demo/Screenshot for feature changes and bug fixes

```
$ .venv/Scripts/python -m pytest tests/tools/test_snowflake_query_history_tool.py tests/tools/test_openobserve_logs_tool.py tests/tools/test_opensearch_analytics_tool.py -v

============================= 88 passed in 1.55s ==============================
```

Full `tests/tools/` suite: 1541 passed in 6.96s (was 1453 before this PR; +88 new tests, no regressions).

Local CI gates:

  • `ruff check` on the three new files → All checks passed!
  • `ruff format --check` on the three new files → 3 files already formatted

Note on the existing `test_integration_wave_tools.py`

That file already exists and contains 1 limit-bound smoke test per tool for Snowflake / OpenObserve / OpenSearch (plus Azure). The new dedicated files supersede those in scope (deeper coverage of normalization, auth, validation, error paths). No changes were made to `test_integration_wave_tools.py` in this PR — happy to drop the now-duplicative cases there in a follow-up if a maintainer prefers.


Code Understanding and AI Usage

Did you use AI assistance (ChatGPT, Claude, Copilot, etc.) to write any part of this code?

  • Yes, I used AI assistance (Claude Code)

If you used AI assistance:

  • I have reviewed every single line of the AI-generated code
  • I can explain the purpose and logic of each function/component I added
  • I have tested edge cases and understand how the code handles them
  • I have modified the AI output to follow this project's coding standards and conventions

Explain your implementation approach:

For each tool, I started by reading `app/tools//init.py` to enumerate every branch the tool can take: validation guards, request shaping, response normalization, error handling. I then wrote one test per branch — each isolated, deterministic, and inheriting from `BaseToolContract` for the shared metadata checks (`name`/`description`/`input_schema`/`source`/`is_available` returns bool / `extract_params` returns dict).

Mocking strategy:

  • Snowflake & OpenObserve call `httpx.post` directly. I built a tiny `_MockResponse` class with just `raise_for_status()` and `.json()` and patched `app.tools..httpx.post`. The `raise_for_status_error` kwarg lets one helper class cover both "HTTP error status" and "raises during `raise_for_status`" failure modes without duplicating boilerplate.
  • OpenSearch wraps `ElasticsearchClient`. I patched `ElasticsearchClient.init` AND `search_logs` so I can both (a) capture the `ElasticsearchConfig` the tool builds and (b) control what `search_logs` returns. This let me write the "blank `api_key` → `None`" and "blank `index_pattern` → `*`" tests with no plumbing duplication.

Tricky cases that drove specific tests:

  • OpenObserve auth precedence — when both bearer token AND basic credentials are present, the tool short-circuits on bearer in `_auth_headers`. I added `test_bearer_takes_precedence_over_basic` to lock that contract.
  • Snowflake `_ensure_sql_limit` — must not double-stamp `LIMIT` if the caller already wrote one. I added a test that asserts `statement.lower().count("limit") == 1` AND the trailing `;` is stripped.
  • OpenSearch `time_range_minutes=0` — the tool floors it via `max(1, time_range_minutes)` to avoid zero-length windows. Without a test, future refactors could silently break this.

Checklist before requesting a review

  • I have added proper PR title and linked to the issue
  • I have performed a self-review of my code
  • I can explain the purpose of every function, class, and logic block I added
  • I understand why my changes work and have tested them thoroughly
  • I have considered potential edge cases and how my code handles them
  • If it is a core feature, I have added thorough tests
  • My code follows the project's style guidelines and conventions

…nSearch tools (Tracer-Cloud#790)

Per the issue's proposed coverage list, adds three new dedicated test files
under tests/tools/:

- test_snowflake_query_history_tool.py (26 tests)
  is_available + extract_params, account/token validation, default vs
  custom SQL with LIMIT enforcement, optional session params (warehouse/
  role/database/schema), bounded limit cap, row normalization for both
  dict-rows and tabular rows + resultSetMetaData, HTTP failure handling.

- test_openobserve_logs_tool.py (36 tests)
  is_available across token-only / basic-auth-only / both, extract_params,
  bearer vs basic Authorization header behavior (bearer wins), endpoint
  org templating, default SQL fallback, time window + size payload shape,
  stream_name only when set, bounded limit cap, record extraction across
  the four supported response shapes (top-level hits list, Elastic-style
  hits.hits._source, records, data), HTTP failure handling.

- test_opensearch_analytics_tool.py (26 tests)
  is_available + extract_params, ElasticsearchConfig normalization (URL
  trim, blank api_key -> None, blank index -> '*'), search call shape
  (query default, time_range floor 1, bounded limit, _MAX_HARD_LIMIT cap),
  log filtering (non-dict items dropped), missing/non-list logs field,
  client error propagation (with and without 'error' field).

Each file inherits from BaseToolContract for the shared metadata/contract
checks and uses pytest's monkeypatch to stub httpx.post or
ElasticsearchClient — no live HTTP calls.

tests/tools/ -> 1541 passed (was 1453; +88 new tests, no regressions).
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 3, 2026

Greptile Summary

Adds 88 dedicated unit tests (26 + 36 + 26) for the Snowflake, OpenObserve, and OpenSearch tools, mirroring the existing test conventions: BaseToolContract for shared metadata checks and monkeypatch for deterministic mocking with no live HTTP calls. All assertions were verified correct against the current tool implementations — including SQL LIMIT enforcement, auth-header precedence, ElasticsearchConfig normalization, and all four OpenObserve response shapes.

Confidence Score: 4/5

Safe to merge — test-only additions with no changes to production code and no regressions reported.

All 88 test assertions are logically correct relative to the tool implementations. Two P2 findings: a duplicated _MockResponse class across two files (could live in conftest.py) and inconsistent ElasticsearchClient.init patching in the OpenSearch tests (safe today due to lazy init, but fragile if that ever changes). No P0 or P1 issues.

tests/tools/test_opensearch_analytics_tool.py — inconsistent init patching; tests/tools/test_snowflake_query_history_tool.py — duplicated _MockResponse

Important Files Changed

Filename Overview
tests/tools/test_openobserve_logs_tool.py 36 new unit tests covering availability gating (token vs. basic auth), auth header precedence, endpoint/org templating, default SQL fallback, payload shape, stream_name inclusion, bounded limits, all four response shapes, and HTTP failure handling — all assertions verified correct against the tool implementation.
tests/tools/test_opensearch_analytics_tool.py 26 new unit tests covering availability, param extraction, ElasticsearchConfig normalization, search-call shape, log filtering, missing/non-list logs field, and error propagation — correct logic throughout, but several tests only patch search_logs without patching init, creating a latent fragility if ElasticsearchClient initialization ever becomes eager.
tests/tools/test_snowflake_query_history_tool.py 26 new unit tests covering availability gating, param extraction, SQL LIMIT enforcement (appending, no-double-stamp, semicolon stripping), optional session params, bounded limits, dict-row and tabular-row normalization, and HTTP failure handling — all assertions correct, but the _MockResponse helper class is duplicated verbatim from test_openobserve_logs_tool.py.

Comments Outside Diff (1)

  1. tests/tools/test_opensearch_analytics_tool.py, line 729-766 (link)

    P2 Inconsistent ElasticsearchClient.__init__ patching

    Tests that verify search-call shape (e.g. test_search_query_defaults_to_star_when_blank, test_time_range_minutes_floor_one, test_bounded_limit_caps_caller_request) patch only ElasticsearchClient.search_logs but leave __init__ unpatched, while the config-normalization tests patch both. This works correctly today because __init__ is lazy (self._client = None; no connection made), but the asymmetry makes it less obvious which tests are actually isolated from the real client. If __init__ ever becomes eager (e.g. validates URL reachability), several of these tests would start failing with confusing network errors rather than a clear mocking gap. Patching __init__ consistently across all test functions would make the isolation explicit.

Reviews (1): Last reviewed commit: "test(tools): add dedicated unit tests fo..." | Re-trigger Greptile

Comment on lines +27 to +45
raise self._error

def json(self) -> dict[str, Any]:
return self._payload


# ---------------------------------------------------------------------------
# Contract — metadata, is_available, extract_params surface
# ---------------------------------------------------------------------------


class TestSnowflakeQueryHistoryToolContract(BaseToolContract):
def get_tool_under_test(self) -> Any:
return query_snowflake_history.__opensre_registered_tool__


def _rt() -> Any:
return query_snowflake_history.__opensre_registered_tool__

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Duplicated _MockResponse helper class

_MockResponse is defined identically here and in test_openobserve_logs_tool.py (lines 27–39). Both files duplicate the same two-method stub with the same raise_for_status_error kwarg pattern. Moving it to tests/tools/conftest.py alongside mock_http_response would eliminate the duplication and make both files shorter, especially if more tools are added later.

Address two Greptile P2 findings on PR Tracer-Cloud#1238:

1. _MockResponse was duplicated in test_snowflake_query_history_tool.py
   and test_openobserve_logs_tool.py. Promoted it to tests/tools/conftest.py
   as MockHttpxResponse with a docstring, and both files now import it.

2. test_opensearch_analytics_tool.py patched ElasticsearchClient.__init__
   in three tests but only search_logs in nine others. The asymmetry worked
   today because __init__ is lazy, but it would silently break if __init__
   ever validated the URL or opened a connection. Added _install_es_stubs
   helper that patches both methods together and returns the captured
   ElasticsearchConfig — every test now goes through it.

tests/tools/ -> 1541 passed (unchanged), ruff lint + format check clean.
@Powlisher
Copy link
Copy Markdown
Contributor Author

Addressed both Greptile P2 findings in 2eefef1:

  1. Promoted _MockResponse to tests/tools/conftest.py as a documented MockHttpxResponse class. Both test_snowflake_query_history_tool.py and test_openobserve_logs_tool.py now import it instead of redefining.
  2. Added an _install_es_stubs(monkeypatch, search_impl) helper in test_opensearch_analytics_tool.py that patches both ElasticsearchClient.__init__ and search_logs together. Every OpenSearch test now goes through it, so isolation is consistent regardless of whether __init__ ever stops being lazy.

tests/tools/ still 1541/1541 PASS, ruff check + ruff format --check clean.

@muddlebee muddlebee merged commit 226e8f0 into Tracer-Cloud:main May 4, 2026
10 checks passed
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 4, 2026

💜 One more reason the project grows. Thanks @Powlisher — your contribution just landed!


👋 Join us on Discord - OpenSRE : hang out, contribute, or hunt for features and issues. Everyone's welcome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add dedicated tool tests for Snowflake, OpenObserve, and OpenSearch integration-wave tools

2 participants