Conversation

@neha00290 neha00290 commented Sep 1, 2025

PR Type

Tests


Description

  • Add end-to-end SQL query validation tests

  • Verify WHERE, match_all, ORDER BY semantics

  • Enforce LIMIT and result consistency checks

  • Validate CTE filtering and fields presence


Diagram Walkthrough

flowchart LR
  A["New E2E tests"] -- "WHERE filter" --> B["Validate field equality"]
  A -- "match_all" --> C["Check term presence"]
  A -- "ORDER BY" --> D["Assert timestamp DESC"]
  A -- "LIMIT" --> E["Enforce max hits"]
  A -- "CTE" --> F["Validate filtered fields"]

File Walkthrough

Relevant files:

Tests: test_search.py (tests/api-testing/tests/test_search.py, +252/-0)
Add comprehensive E2E SQL query tests:

  • Add test for WHERE condition equality
  • Add test for match_all term presence
  • Add test for ORDER BY timestamp DESC
  • Add tests for LIMIT and CTE behavior

github-actions bot commented Sep 1, 2025

PR Reviewer Guide 🔍

(Review updated until commit e79f915)

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Flaky Timing Window

Tests constrain the time range to the last minute; depending on data ingestion latency, this may intermittently yield zero hits and cause flaky results. Consider widening the window or waiting for data readiness (see the sketch after the excerpt below).

    now = datetime.now(timezone.utc)
    end_time = int(now.timestamp() * 1000000)
    one_min_ago = int((now - timedelta(minutes=1)).timestamp() * 1000000)
    json_data = {
        "query": {
            "sql": "SELECT * FROM \"stream_pytest_data\" WHERE kubernetes_container_name = 'ziox'",
            "start_time": one_min_ago,
            "end_time": end_time,
            "from": 0,
            "size": 100,
            "quick_mode": False
        }
    }

    resp_get_where_query = session.post(f"{url}api/{org_id}/_search?type=logs", json=json_data)
    assert (
        resp_get_where_query.status_code == 200
    ), f"WHERE condition query failed with status {resp_get_where_query.status_code} {resp_get_where_query.content}"

    response_data = resp_get_where_query.json()

    # Assert that we got results
    assert "hits" in response_data, "Response should contain 'hits' field"

    hits = response_data["hits"]

    # If we have hits, validate that ALL hits contain kubernetes_container_name = 'ziox'
    if len(hits) > 0:
        print(f"Found {len(hits)} hits matching the WHERE condition")

        for i, hit in enumerate(hits):
            assert "kubernetes_container_name" in hit, f"Hit {i} should contain 'kubernetes_container_name' field"
            assert hit["kubernetes_container_name"] == "ziox", (
                f"Hit {i} kubernetes_container_name should be 'ziox', but got '{hit.get('kubernetes_container_name')}'"
            )

        print(f"✅ All {len(hits)} hits have kubernetes_container_name = 'ziox' as expected")
    else:
        print("⚠️  No hits found for the WHERE condition query")

    # Log the total count and took time for monitoring
    total_hits = response_data.get("total", 0)
    took_time = response_data.get("took", 0)
    print(f"Query executed in {took_time}ms and found {total_hits} total matching records")


def test_e2e_match_all_validation(create_session, base_url):
    """Test match_all query and validate all hits contain the search term."""

    session = create_session
    url = base_url
    org_id = "default"
    now = datetime.now(timezone.utc)
    end_time = int(now.timestamp() * 1000000)
    one_min_ago = int((now - timedelta(minutes=1)).timestamp() * 1000000)
    json_data = {
        "query": {
            "sql": "SELECT * FROM \"stream_pytest_data\" WHERE match_all('level=info')",
            "start_time": one_min_ago,
            "end_time": end_time,
            "from": 0,
            "size": 50,
            "quick_mode": False
        }
    }

    resp_get_match_query = session.post(f"{url}api/{org_id}/_search?type=logs", json=json_data)
    assert (
        resp_get_match_query.status_code == 200
    ), f"match_all query failed with status {resp_get_match_query.status_code} {resp_get_match_query.content}"

    response_data = resp_get_match_query.json()
    assert "hits" in response_data, "Response should contain 'hits' field"

    hits = response_data["hits"]

    if len(hits) > 0:
        print(f"Found {len(hits)} hits for match_all query")

        matching_hits = 0
        for i, hit in enumerate(hits):
            # Convert hit to string and check if it contains 'level=info'
            hit_str = str(hit).lower()
            if 'level=info' in hit_str or 'level":"info' in hit_str:
                matching_hits += 1
            else:
                print(f"Hit {i} does not contain 'level=info': {hit}")

        # For match_all queries, we expect at least some hits to contain the term
        assert matching_hits > 0, f"Expected some hits to contain 'level=info', but found {matching_hits}/{len(hits)}"
        print(f"✅ {matching_hits}/{len(hits)} hits contain 'level=info' as expected")
    else:
        print("⚠️  No hits found for match_all query")

    total_hits = response_data.get("total", 0)
    took_time = response_data.get("took", 0)
    print(f"match_all query executed in {took_time}ms and found {total_hits} total matching records")
Assertion Strictness

The WHERE test enforces that every hit has kubernetes_container_name exactly equal to 'ziox'. If the source stream includes heterogeneous records, this may cause false negatives. Consider scoping the dataset or using additional filters (a sketch follows the excerpt below).

# If we have hits, validate that ALL hits contain kubernetes_container_name = 'ziox'
if len(hits) > 0:
    print(f"Found {len(hits)} hits matching the WHERE condition")

    for i, hit in enumerate(hits):
        assert "kubernetes_container_name" in hit, f"Hit {i} should contain 'kubernetes_container_name' field"
        assert hit["kubernetes_container_name"] == "ziox", (
            f"Hit {i} kubernetes_container_name should be 'ziox', but got '{hit.get('kubernetes_container_name')}'"
        )

    print(f"✅ All {len(hits)} hits have kubernetes_container_name = 'ziox' as expected")
else:
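
A minimal sketch of the scoping idea; the extra kubernetes_namespace_name predicate is a hypothetical example of an additional filter, not a field guaranteed to exist in this stream.

# Hypothetical: tighten the WHERE clause so heterogeneous records fall out of scope.
json_data["query"]["sql"] = (
    'SELECT * FROM "stream_pytest_data" '
    "WHERE kubernetes_container_name = 'ziox' "
    "AND kubernetes_namespace_name = 'ziox'"  # assumed field and value
)
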
Weak match_all Validation

The match_all test only asserts that some hits contain the term, via a string search over the stringified hit dict, which can pass on unrelated fields or formatting. Prefer asserting that the structured 'level' field equals 'info' for all returned hits (a sketch follows the excerpt below).

    matching_hits = 0
    for i, hit in enumerate(hits):
        # Convert hit to string and check if it contains 'level=info'
        hit_str = str(hit).lower()
        if 'level=info' in hit_str or 'level":"info' in hit_str:
            matching_hits += 1
        else:
            print(f"Hit {i} does not contain 'level=info': {hit}")

    # For match_all queries, we expect at least some hits to contain the term
    assert matching_hits > 0, f"Expected some hits to contain 'level=info', but found {matching_hits}/{len(hits)}"
    print(f"✅ {matching_hits}/{len(hits)} hits contain 'level=info' as expected")
else:
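
A minimal sketch of the structured check suggested above, assuming each hit exposes a top-level 'level' field:

# Stricter alternative: assert on the parsed field instead of string matching.
for i, hit in enumerate(hits):
    assert "level" in hit, f"Hit {i} should contain a 'level' field"
    assert hit["level"] == "info", (
        f"Hit {i} level should be 'info', but got '{hit.get('level')}'"
    )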

github-actions bot commented Sep 1, 2025

PR Code Suggestions ✨

Latest suggestions up to e79f915
Explore these optional code suggestions:

Category | Suggestion | Impact
Possible issue
Enforce strict timestamp presence

Defaulting a missing _timestamp to 0 hides ordering errors and can let the test pass incorrectly.
Fail fast if _timestamp is missing or non-numeric to ensure correct validation.

tests/api-testing/tests/test_search.py [1157-1163]

 # Validate descending timestamp order
 for i in range(len(hits) - 1):
-    current_timestamp = hits[i].get("_timestamp", 0)
-    next_timestamp = hits[i + 1].get("_timestamp", 0)
-    
+    assert "_timestamp" in hits[i] and "_timestamp" in hits[i + 1], f"Missing '_timestamp' in hit {i} or {i+1}"
+    current_timestamp = hits[i]["_timestamp"]
+    next_timestamp = hits[i + 1]["_timestamp"]
+    assert isinstance(current_timestamp, (int, float)) and isinstance(next_timestamp, (int, float)), "Timestamps must be numeric"
     assert current_timestamp >= next_timestamp, (
         f"Timestamps not in DESC order: hit {i} ({current_timestamp}) should be >= hit {i+1} ({next_timestamp})"
     )
Suggestion importance[1-10]: 8


Why: Requiring _timestamp presence and numeric type prevents false positives where missing values default to 0, directly strengthening the ORDER BY validation with accurate, low-risk checks.

Impact: Medium
Normalize hits structure handling

The API often wraps results as an object with items under hits['hits']. Guard for
both structures to avoid false negatives and KeyErrors. Normalize hits to a list
before validations.

tests/api-testing/tests/test_search.py [1095-1098]

 assert "hits" in response_data, "Response should contain 'hits' field"
 
-hits = response_data["hits"]
+# Normalize hits shape: allow either list at top-level or under 'hits'
+hits_section = response_data["hits"]
+if isinstance(hits_section, dict) and "hits" in hits_section:
+    hits = hits_section["hits"]
+else:
+    hits = hits_section if isinstance(hits_section, list) else []
Suggestion importance[1-10]: 6


Why: The normalization can prevent KeyErrors if the API returns Elasticsearch-like hits format, improving robustness with minimal risk; however, it assumes an alternate response shape not evidenced in this PR.

Impact: Low
General
Widen time window to stabilize

Using the current moving window can create flaky tests when data isn’t present in
the last minute. Expand the window slightly to reduce flakiness and ensure results
during low activity.

tests/api-testing/tests/test_search.py [1075-1077]

 now = datetime.now(timezone.utc)
 end_time = int(now.timestamp() * 1000000)
-one_min_ago = int((now - timedelta(minutes=1)).timestamp() * 1000000)
+one_min_ago = int((now - timedelta(minutes=5)).timestamp() * 1000000)
Suggestion importance[1-10]: 5


Why: Expanding the time window can reduce test flakiness when data is sparse, but it may also loosen test precision and isn't strictly required by the diff.

Impact: Low

Previous suggestions

Suggestions up to commit 4531a4b
Category | Suggestion | Impact
Possible issue
Enforce timestamp presence and type

Defaulting missing '_timestamp' to 0 can mask ordering issues and create false
positives. Fail fast when a hit lacks '_timestamp' or when values are non-numeric by
asserting presence and type before comparing.

tests/api-testing/tests/test_search.py [1157-1163]

 for i in range(len(hits) - 1):
-    current_timestamp = hits[i].get("_timestamp", 0)
-    next_timestamp = hits[i + 1].get("_timestamp", 0)
-    
+    assert "_timestamp" in hits[i] and "_timestamp" in hits[i + 1], f"Missing '_timestamp' in hits {i} or {i+1}"
+    current_timestamp = hits[i]["_timestamp"]
+    next_timestamp = hits[i + 1]["_timestamp"]
+    assert isinstance(current_timestamp, (int, float)) and isinstance(next_timestamp, (int, float)), f"Non-numeric '_timestamp' in hits {i} or {i+1}"
     assert current_timestamp >= next_timestamp, (
         f"Timestamps not in DESC order: hit {i} ({current_timestamp}) should be >= hit {i+1} ({next_timestamp})"
     )
Suggestion importance[1-10]: 7


Why: Asserting _timestamp presence and numeric type avoids masking ordering issues caused by defaulting to 0, improving test correctness with moderate impact.

Impact: Medium
Validate hits type robustly

Guard against non-list 'hits' to prevent type errors when iterating or taking len.
Some backends return hits under a nested object or as None on empty results.
Validate type and normalize to a list before assertions.

tests/api-testing/tests/test_search.py [1095-1098]

 assert "hits" in response_data, "Response should contain 'hits' field"
+hits = response_data["hits"] or []
+assert isinstance(hits, list), f"'hits' should be a list, got {type(hits).__name__}"
 
-hits = response_data["hits"]
-
Suggestion importance[1-10]: 6


Why: Normalizing and type-checking hits prevents errors when iterating or using len if the API returns None or a non-list. It's a reasonable robustness improvement, though not critical.

Impact: Low
General
Normalize total count value

The API often wraps total inside an object (e.g., {'value': X}). Accessing it
directly may print dicts or zero. Normalize 'total' to an int to avoid misleading
logs and assertions. Apply this wherever total is read.

tests/api-testing/tests/test_search.py [1064-1066]

-total_hits = response_data.get("total", 0)
+total_raw = response_data.get("total", 0)
+total_hits = total_raw.get("value", 0) if isinstance(total_raw, dict) else int(total_raw or 0)
 took_time = response_data.get("took", 0)
 print(f"Query executed in {took_time}ms and found {total_hits} total matching records")
Suggestion importance[1-10]: 6


Why: Handling total as either a dict with value or a raw int avoids misleading logs; it's accurate and improves resilience but is a minor enhancement.

Impact: Low

@neha00290 neha00290 changed the title from "test : add pytest for queries and assert hits" to "test: add pytest for queries and assert hits" on Sep 1, 2025
@neha00290 neha00290 marked this pull request as ready for review September 1, 2025 15:18

github-actions bot commented Sep 1, 2025

Persistent review updated to latest commit e79f915

@greptile-apps greptile-apps bot left a comment

Greptile Summary

This PR enhances the existing test suite for OpenObserve's search API by adding 5 comprehensive end-to-end validation tests that go beyond basic HTTP status code checks to validate actual query result correctness. The changes are made exclusively to tests/api-testing/tests/test_search.py, adding 252 lines of new test code.

The new tests validate critical SQL query functionality:

  • WHERE condition validation: Ensures all returned hits actually match the WHERE clause filter (kubernetes_container_name = 'ziox')
  • match_all search validation: Verifies that when using match_all queries, at least some results contain the expected search term
  • Timestamp ordering validation: Confirms that ORDER BY _timestamp DESC actually returns results in descending chronological order
  • LIMIT clause validation: Checks that LIMIT 3 actually restricts results to 3 or fewer hits
  • CTE query validation: Validates Common Table Expression queries return expected filtered results (level = 'info')

These tests follow a consistent pattern: they execute SQL queries against the search API, parse the JSON response, and perform detailed assertions on the returned data structure. Each test includes proper error handling for empty result sets, informative console logging for debugging, and meaningful assertions that validate data correctness rather than just API availability. This addresses a critical gap in the existing test suite, which only validated HTTP response codes (200/400) without ensuring the returned data was actually correct.
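
That shared pattern, reduced to an illustrative skeleton (the names and values are placeholders, not code from the PR):

def test_e2e_query_pattern(create_session, base_url):
    """Skeleton of the pattern the new tests follow (illustrative only)."""
    session, org_id = create_session, "default"
    json_data = {"query": {"sql": "...", "start_time": 0, "end_time": 0,
                           "from": 0, "size": 100, "quick_mode": False}}
    resp = session.post(f"{base_url}api/{org_id}/_search?type=logs", json=json_data)
    assert resp.status_code == 200, f"Query failed with status {resp.status_code}"
    hits = resp.json().get("hits", [])
    for hit in hits:
        pass  # per-test assertions on the returned fields go here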

The implementation integrates seamlessly with the existing test infrastructure, using the same session management, base URL configuration, and time range handling patterns as the current tests. This systematic approach to improving test quality is essential for a search and analytics platform where data accuracy is paramount.

Confidence score: 4/5

  • This PR is safe to merge with minimal risk as it only adds test cases without modifying production code
  • Score reflects well-structured test additions that follow existing patterns and improve test coverage significantly
  • No files require special attention as this is purely additive test code with consistent implementation patterns

1 file reviewed, 1 comment

@oasisk oasisk merged commit ed11cd1 into main Sep 1, 2025
29 of 30 checks passed
@oasisk oasisk deleted the pytest-query-results branch September 1, 2025 17:00