test(tools): add dedicated unit tests for Snowflake, OpenObserve, OpenSearch tools (#790) #1238
Conversation
test(tools): add dedicated unit tests for Snowflake, OpenObserve, OpenSearch tools (Tracer-Cloud#790)

Per the issue's proposed coverage list, adds three new dedicated test files under tests/tools/:

- test_snowflake_query_history_tool.py (26 tests): is_available + extract_params, account/token validation, default vs custom SQL with LIMIT enforcement, optional session params (warehouse/role/database/schema), bounded limit cap, row normalization for both dict-rows and tabular rows + resultSetMetaData, HTTP failure handling.
- test_openobserve_logs_tool.py (36 tests): is_available across token-only / basic-auth-only / both, extract_params, bearer vs basic Authorization header behavior (bearer wins), endpoint org templating, default SQL fallback, time window + size payload shape, stream_name only when set, bounded limit cap, record extraction across the four supported response shapes (top-level hits list, Elastic-style hits.hits._source, records, data), HTTP failure handling.
- test_opensearch_analytics_tool.py (26 tests): is_available + extract_params, ElasticsearchConfig normalization (URL trim, blank api_key -> None, blank index -> '*'), search call shape (query default, time_range floor 1, bounded limit, _MAX_HARD_LIMIT cap), log filtering (non-dict items dropped), missing/non-list logs field, client error propagation (with and without 'error' field).

Each file inherits from BaseToolContract for the shared metadata/contract checks and uses pytest's monkeypatch to stub httpx.post or ElasticsearchClient — no live HTTP calls.

tests/tools/ -> 1541 passed (was 1453; +88 new tests, no regressions).
Greptile Summary

Adds 88 dedicated unit tests (26 + 36 + 26) for the Snowflake, OpenObserve, and OpenSearch tools, mirroring the existing test conventions.

Confidence Score: 4/5

Safe to merge — test-only additions with no changes to production code and no regressions reported. All 88 test assertions are logically correct relative to the tool implementations. Two P2 findings: a duplicated _MockResponse class across two files (could live in conftest.py) and inconsistent ElasticsearchClient.__init__ patching in the OpenSearch tests (safe today due to lazy init, but fragile if that ever changes). No P0 or P1 issues.

Important Files Changed
- tests/tools/test_opensearch_analytics_tool.py — inconsistent __init__ patching
- tests/tools/test_snowflake_query_history_tool.py — duplicated _MockResponse
Review context from tests/tools/test_snowflake_query_history_tool.py:

```python
        raise self._error

    def json(self) -> dict[str, Any]:
        return self._payload


# ---------------------------------------------------------------------------
# Contract — metadata, is_available, extract_params surface
# ---------------------------------------------------------------------------


class TestSnowflakeQueryHistoryToolContract(BaseToolContract):
    def get_tool_under_test(self) -> Any:
        return query_snowflake_history.__opensre_registered_tool__


def _rt() -> Any:
    return query_snowflake_history.__opensre_registered_tool__
```
Duplicated _MockResponse helper class
_MockResponse is defined identically here and in test_openobserve_logs_tool.py (lines 27–39). Both files duplicate the same two-method stub with the same raise_for_status_error kwarg pattern. Moving it to tests/tools/conftest.py alongside mock_http_response would eliminate the duplication and make both files shorter, especially if more tools are added later.
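A minimal sketch of the promoted helper, assuming it lands in tests/tools/conftest.py with the same two methods and raise_for_status_error kwarg described above; the docstring and type hints here are illustrative, not the PR's verbatim code:

```python
from typing import Any


class MockHttpxResponse:
    """Stand-in for httpx.Response: serves a canned JSON payload and can
    raise a pre-injected error from raise_for_status()."""

    def __init__(
        self,
        payload: dict[str, Any],
        *,
        raise_for_status_error: Exception | None = None,
    ) -> None:
        self._payload = payload
        self._error = raise_for_status_error

    def raise_for_status(self) -> None:
        # Mimic httpx: raise only when the test injected an error.
        if self._error is not None:
            raise self._error

    def json(self) -> dict[str, Any]:
        return self._payload
```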
Address two Greptile P2 findings on PR Tracer-Cloud#1238:

1. _MockResponse was duplicated in test_snowflake_query_history_tool.py and test_openobserve_logs_tool.py. Promoted it to tests/tools/conftest.py as MockHttpxResponse with a docstring, and both files now import it.
2. test_opensearch_analytics_tool.py patched ElasticsearchClient.__init__ in three tests but only search_logs in nine others. The asymmetry worked today because __init__ is lazy, but it would silently break if __init__ ever validated the URL or opened a connection. Added an _install_es_stubs helper that patches both methods together and returns the captured ElasticsearchConfig — every test now goes through it.

tests/tools/ -> 1541 passed (unchanged), ruff lint + format check clean.
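A hedged sketch of the _install_es_stubs idea, assuming ElasticsearchClient is importable from the tool's module and that search_logs is the only client method the tool calls; the import path and signatures are assumptions, not the committed code:

```python
from typing import Any

# Assumed import path for the client the OpenSearch tool wraps.
from app.tools.opensearch_analytics import ElasticsearchClient


def _install_es_stubs(monkeypatch, search_result: dict[str, Any]) -> dict[str, Any]:
    """Patch __init__ and search_logs together, so no test silently relies
    on the constructor staying lazy; returns a dict capturing the calls."""
    captured: dict[str, Any] = {}

    def fake_init(self, config) -> None:
        captured["config"] = config  # the normalized ElasticsearchConfig

    def fake_search_logs(self, **kwargs) -> dict[str, Any]:
        captured["search_kwargs"] = kwargs
        return search_result

    monkeypatch.setattr(ElasticsearchClient, "__init__", fake_init)
    monkeypatch.setattr(ElasticsearchClient, "search_logs", fake_search_logs)
    return captured
```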
Addressed both Greptile P2 findings in 2eefef1.
💜 One more reason the project grows. Thanks @Powlisher — your contribution just landed! 👋 Join us on Discord - OpenSRE: hang out, contribute, or hunt for features and issues. Everyone's welcome.

Fixes #790
Describe the changes you have made in this PR
Adds three new dedicated test files under `tests/tools/` with the file names and proposed coverage exactly as listed in #790. Each file mirrors the existing tool test conventions: inherits from `BaseToolContract` for the shared metadata/contract checks, uses pytest `monkeypatch` to stub `httpx.post` (or `ElasticsearchClient` for OpenSearch) — no live HTTP calls, no required external services.
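For illustration, the stubbing pattern looks roughly like this; the tool invocation and assertion are hypothetical, and only the monkeypatch-over-httpx.post mechanics mirror what the tests do:

```python
import httpx


def test_http_failure_is_surfaced(monkeypatch):
    # Replace the transport so the tool's HTTP-failure branch runs offline.
    def fake_post(url, **kwargs):
        return MockHttpxResponse({}, raise_for_status_error=RuntimeError("boom"))

    monkeypatch.setattr(httpx, "post", fake_post)
    result = _rt().run(limit=5)  # hypothetical way to invoke the registered tool
    assert "boom" in str(result)  # the failure is reported, not propagated
```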
Demo/Screenshot for feature changes and bug fixes
```
$ .venv/Scripts/python -m pytest tests/tools/test_snowflake_query_history_tool.py tests/tools/test_openobserve_logs_tool.py tests/tools/test_opensearch_analytics_tool.py -v
============================= 88 passed in 1.55s ==============================
```
Full `tests/tools/` suite: 1541 passed in 6.96s (was 1453 before this PR; +88 new tests, no regressions).
Local CI gates:
Note on the existing `test_integration_wave_tools.py`
That file already exists and contains 1 limit-bound smoke test per tool for Snowflake / OpenObserve / OpenSearch (plus Azure). The new dedicated files supersede those in scope (deeper coverage of normalization, auth, validation, error paths). No changes were made to `test_integration_wave_tools.py` in this PR — happy to drop the now-duplicative cases there in a follow-up if a maintainer prefers.
Code Understanding and AI Usage
Did you use AI assistance (ChatGPT, Claude, Copilot, etc.) to write any part of this code?
If you used AI assistance:
Explain your implementation approach:
For each tool, I started by reading the tool's `__init__.py` under `app/tools/` to enumerate every branch the tool can take: validation guards, request shaping, response normalization, error handling. I then wrote one test per branch — each isolated, deterministic, and inheriting from `BaseToolContract` for the shared metadata checks (`name`, `description`, `input_schema`, `source`, plus `is_available` returning bool and `extract_params` returning dict). A hypothetical example of the style follows below.
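As a concrete but hypothetical instance of the one-test-per-branch style, a single validation guard gets its own deterministic test; the env var name is an assumption:

```python
def test_is_available_false_without_account(monkeypatch):
    # Exercise exactly one branch: the account-validation guard.
    monkeypatch.delenv("SNOWFLAKE_ACCOUNT", raising=False)  # assumed env var name
    tool = query_snowflake_history.__opensre_registered_tool__
    assert tool.is_available() is False  # guard trips; no request is shaped
```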
Mocking strategy:
Tricky cases that drove specific tests:
Checklist before requesting a review