test: add regression test for text field aggregation bug #3952
Merged
Add pg_regress test for the "unexpected type Str" bug that affected v0.21.2 when using metric aggregations (value_count, count, etc.) on text fields as sub-aggregations inside bucket aggregations. The bug was fixed in tantivy commit 65b5a1a3.

Tests cover:
- GROUP BY on text field + ORDER BY count() (high cardinality)
- pdb.agg value_count on text field with GROUP BY
- Histogram + value_count sub-aggregation on text field
- Range + value_count sub-aggregation on text field
- Simple value_count on text field (top-level, always worked)
Added EXPLAIN (FORMAT TEXT, COSTS OFF, TIMING OFF, VERBOSE) before each test query to show query plans and verify the custom scan is being used.
mdashti pushed a commit that referenced this pull request on Feb 13, 2026
## Summary
- Adds pg_regress regression test for the "unexpected type Str" bug that affected v0.21.2
- Tests metric aggregations (value_count, count) on TEXT fields when used as sub-aggregations inside bucket aggregations (histogram, range, terms)
- The bug was fixed in tantivy commit 65b5a1a3
- Includes EXPLAIN output showing query plans to verify custom scan is being used

## Test Cases
1. GROUP BY on text field + ORDER BY count() - High cardinality triggers HashMap path
2. pdb.agg value_count on text field with GROUP BY - Direct value_count on text
3. Histogram + value_count sub-aggregation - Bucket agg with metric sub-agg on text
4. Range + value_count sub-aggregation - Range buckets with text field metric
5. Simple value_count on text field - Top-level metric (baseline, always worked)

## Background
The bug occurred when SegmentStatsCollector::collect() received a text field column type (ColumnType::Str) during sub-aggregation collection. Unlike collect_block_with_field(), it lacked the is_number_or_date_type check and panicked at f64_from_fastfield_u64().

This test ensures any future tantivy regressions in this area are caught.
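To make the Background concrete, here is a minimal, self-contained Rust sketch of the failure mode and of one plausible guard. It is not tantivy's actual code: the `ColumnType` enum, the `Stats` struct, `f64_from_raw`, and `collect_guarded` are hypothetical stand-ins, and the choice to count text values without converting them is an assumption made for illustration. Only the name `is_number_or_date_type` and the panic message are taken from the description above.

```rust
// Simplified, hypothetical model of the bug; NOT tantivy's real implementation.

#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum ColumnType {
    I64,
    U64,
    F64,
    DateTime,
    Str,
}

/// Same idea as the `is_number_or_date_type` check named in the PR description.
fn is_number_or_date_type(ty: ColumnType) -> bool {
    matches!(
        ty,
        ColumnType::I64 | ColumnType::U64 | ColumnType::F64 | ColumnType::DateTime
    )
}

/// Hypothetical stand-in for `f64_from_fastfield_u64`: conversion is only
/// defined for numeric and date columns, so a text column panics.
fn f64_from_raw(raw: u64, ty: ColumnType) -> f64 {
    match ty {
        ColumnType::U64 | ColumnType::DateTime => raw as f64,
        // Simplified decoding; a real column store uses its own mapping.
        ColumnType::I64 => (raw as i64) as f64,
        ColumnType::F64 => f64::from_bits(raw),
        ColumnType::Str => panic!("unexpected type Str"),
    }
}

#[derive(Debug, Default, PartialEq)]
struct Stats {
    count: u64,
    sum: f64,
}

/// Guarded collection: numeric values are converted and summed, while text
/// values are only counted (assumed behavior, enough for value_count).
fn collect_guarded(stats: &mut Stats, raw_values: &[u64], ty: ColumnType) {
    if is_number_or_date_type(ty) {
        for &raw in raw_values {
            stats.count += 1;
            stats.sum += f64_from_raw(raw, ty);
        }
    } else {
        // Never call the numeric conversion on a text column.
        stats.count += raw_values.len() as u64;
    }
}

fn main() {
    let mut stats = Stats::default();
    collect_guarded(&mut stats, &[3, 4], ColumnType::U64);
    // Term ordinals of a text column: counted, never converted.
    collect_guarded(&mut stats, &[10, 11, 12], ColumnType::Str);
    println!("{:?}", stats);
}
```

Without the `is_number_or_date_type` branch, the second call would reach `f64_from_raw` with `ColumnType::Str` and panic, which is the shape of the regression the pg_regress test is meant to catch.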