
test: add regression test for text field aggregation bug #3952

Merged
mithuncy merged 2 commits into main from fix/text-field-aggregation-regression-test
Jan 20, 2026

@mithuncy
Contributor

Summary

  • Adds a pg_regress regression test for the "unexpected type Str" bug that affected v0.21.2
  • Tests metric aggregations (value_count, count) on TEXT fields when they are used as sub-aggregations inside bucket aggregations (histogram, range, terms)
  • The bug was fixed in tantivy commit 65b5a1a3
  • Includes EXPLAIN output of the query plans to verify that the custom scan is used

Test Cases

  1. GROUP BY on a text field + ORDER BY count() - High cardinality triggers the HashMap path
  2. pdb.agg value_count on a text field with GROUP BY - Direct value_count on text
  3. Histogram + value_count sub-aggregation - Bucket agg with a metric sub-agg on text
  4. Range + value_count sub-aggregation - Range buckets with a text field metric
  5. Simple value_count on a text field - Top-level metric (baseline, always worked)
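To make the semantics behind test cases 3 and 4 concrete, here is a minimal Rust sketch of what a value_count sub-aggregation over a text field computes inside histogram and range buckets. It is a toy in-memory model, not pg_search or tantivy code: the `Row` type, the `rating`/`category` fields, and the helper functions are hypothetical, and the real test runs SQL through pg_regress. What it illustrates is that value_count only counts present values, so the text values never need a numeric conversion.

```rust
use std::collections::BTreeMap;

/// Hypothetical test row: a numeric field plus a TEXT field.
struct Row {
    rating: f64,
    category: Option<&'static str>, // None models a missing text value
}

/// value_count over the text field: counts present values only.
/// No text value is ever converted to a number.
fn value_count(rows: &[&Row]) -> u64 {
    rows.iter().filter(|r| r.category.is_some()).count() as u64
}

/// Histogram buckets on `rating`, each with a value_count sub-aggregation.
/// Keys are floor(rating / interval) * interval, stored as i64 because f64 is not Ord.
fn histogram_value_count(rows: &[Row], interval: f64) -> BTreeMap<i64, u64> {
    let mut buckets: BTreeMap<i64, Vec<&Row>> = BTreeMap::new();
    for r in rows {
        let key = ((r.rating / interval).floor() * interval) as i64;
        buckets.entry(key).or_default().push(r);
    }
    buckets
        .into_iter()
        .map(|(key, members)| (key, value_count(&members)))
        .collect()
}

/// Range buckets over half-open [from, to) intervals, each with a value_count sub-aggregation.
fn range_value_count(rows: &[Row], ranges: &[(f64, f64)]) -> Vec<u64> {
    ranges
        .iter()
        .map(|&(from, to)| {
            let members: Vec<&Row> = rows
                .iter()
                .filter(|r| r.rating >= from && r.rating < to)
                .collect();
            value_count(&members)
        })
        .collect()
}

fn main() {
    let rows = vec![
        Row { rating: 1.0, category: Some("books") },
        Row { rating: 1.5, category: None },
        Row { rating: 2.0, category: Some("music") },
        Row { rating: 3.2, category: Some("books") },
        Row { rating: 3.9, category: Some("games") },
    ];
    println!("{:?}", histogram_value_count(&rows, 1.0)); // {1: 1, 2: 1, 3: 2}
    println!("{:?}", range_value_count(&rows, &[(0.0, 2.0), (2.0, 4.0)])); // [1, 3]
}
```

The range buckets follow the usual half-open [from, to) convention, so the row with rating 2.0 lands only in the second range.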

Background

The bug occurred when SegmentStatsCollector::collect() received a text column (ColumnType::Str) during sub-aggregation collection. Unlike collect_block_with_field(), it lacked the is_number_or_date_type check, so it panicked in f64_from_fastfield_u64().
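The failure mode and the missing guard can be illustrated with a small Rust sketch. This is a simplified, hypothetical model that reuses the names from the description above; it is not tantivy's actual `SegmentStatsCollector`, `ColumnType`, or `f64_from_fastfield_u64`, and the numeric decoding is deliberately simplified. It also assumes, for illustration only, that a text column reaches the collector as u64 term ordinals. The guarded path shows one plausible shape of the fix; the actual change is in tantivy commit 65b5a1a3.

```rust
use std::panic;

// Hypothetical, simplified stand-in for tantivy's column types.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum ColumnType {
    U64,
    I64,
    F64,
    DateTime,
    Str,
}

fn is_number_or_date_type(ty: ColumnType) -> bool {
    matches!(
        ty,
        ColumnType::U64 | ColumnType::I64 | ColumnType::F64 | ColumnType::DateTime
    )
}

/// Simplified conversion of a raw fast-field value to f64.
/// Panics on non-numeric columns, reproducing the "unexpected type Str" symptom.
fn f64_from_fastfield_u64(val: u64, ty: ColumnType) -> f64 {
    match ty {
        // Real code applies a per-type monotonic mapping; omitted here.
        ColumnType::U64 | ColumnType::I64 | ColumnType::F64 | ColumnType::DateTime => val as f64,
        _ => panic!("unexpected type {ty:?}"),
    }
}

#[derive(Default, Debug)]
struct SegmentStats {
    count: u64,
    sum: f64,
}

impl SegmentStats {
    /// Unguarded path (the bug): every value goes through the f64 conversion.
    fn collect_unguarded(&mut self, vals: &[u64], ty: ColumnType) {
        for &v in vals {
            self.count += 1;
            self.sum += f64_from_fastfield_u64(v, ty);
        }
    }

    /// Guarded path: only numeric/date columns feed the f64 statistics,
    /// while text columns still contribute to the count that value_count needs.
    fn collect_guarded(&mut self, vals: &[u64], ty: ColumnType) {
        let numeric = is_number_or_date_type(ty);
        for &v in vals {
            self.count += 1;
            if numeric {
                self.sum += f64_from_fastfield_u64(v, ty);
            }
        }
    }
}

fn main() {
    // Assumed term ordinals of a text column, not numeric values.
    let text_ordinals = [0u64, 3, 3, 7];

    let mut guarded = SegmentStats::default();
    guarded.collect_guarded(&text_ordinals, ColumnType::Str);
    println!("guarded value_count = {}", guarded.count); // 4

    // The unguarded path panics (the default hook prints the message to stderr).
    let unguarded = panic::catch_unwind(|| {
        let mut stats = SegmentStats::default();
        stats.collect_unguarded(&text_ordinals, ColumnType::Str);
        stats.count
    });
    println!("unguarded panicked: {}", unguarded.is_err()); // true
}
```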

This test ensures that any future tantivy regression in this area is caught.

Add pg_regress test for the "unexpected type Str" bug that affected
v0.21.2 when using metric aggregations (value_count, count, etc.) on
text fields as sub-aggregations inside bucket aggregations.

The bug was fixed in tantivy commit 65b5a1a3.

Tests cover:
- GROUP BY on text field + ORDER BY count() (high cardinality)
- pdb.agg value_count on text field with GROUP BY
- Histogram + value_count sub-aggregation on text field
- Range + value_count sub-aggregation on text field
- Simple value_count on text field (top-level, always worked)

Added EXPLAIN (FORMAT TEXT, COSTS OFF, TIMING OFF, VERBOSE) before each
test query to show query plans and verify the custom scan is being used.
@mithuncy added the cherry-pick/0.23.x label Jan 20, 2026

@mdashti left a comment

LGTM.

@mithuncy added the Do Not Cherry Pick label and removed the cherry-pick/0.23.x label Jan 20, 2026
@mithuncy merged commit c36317f into main Jan 20, 2026
21 of 23 checks passed
@mithuncy deleted the fix/text-field-aggregation-regression-test branch January 20, 2026 16:47
mdashti pushed a commit that referenced this pull request Feb 13, 2026