feat: Add composite type support for BM25 indexes (>32 fields)#3776
Force-pushed from d6aca04 to 3071050
Support indexing more than 32 columns using `ROW(...)::type` expressions, with field-level search granularity, via the `composite.rs` module and tests.
Force-pushed from 3071050 to be575d6
Move all `#[pg_test]` composite tests to pg_regress golden tests using the v2 query syntax (`paradedb.parse`, `paradedb.boolean`). Remove implementation-dependent assertions and update expected outputs.
Extract the field name from the composite type definition when the column is on the left-hand side of the `@@@` operator. Update composite tests to use the v2 `pdb` APIs with `EXPLAIN` and TopN queries.
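To make the field-name extraction concrete, here is a minimal Rust sketch under simplified assumptions: the composite type definition is reduced to an ordered list of field names, and the left-hand side of `@@@` is assumed to resolve to a 1-based field number, as Postgres's `FieldSelect` node does with `fieldnum`. `CompositeTypeDef`, `FieldRef`, and `field_name_for_lhs` are hypothetical names, not the PR's code.

```rust
/// Simplified, hypothetical model of a composite type definition: field names
/// in declaration order, as written in `CREATE TYPE ... AS (...)`.
pub struct CompositeTypeDef {
    pub fields: Vec<String>,
}

/// Hypothetical model of the `@@@` left-hand side when it selects one field of
/// an indexed composite. `fieldnum` is 1-based, like Postgres attribute numbers.
pub struct FieldRef {
    pub fieldnum: usize,
}

/// Look up the field name in the type definition. Returns `None` for 0 or for
/// a number past the last field, rather than panicking.
pub fn field_name_for_lhs<'a>(def: &'a CompositeTypeDef, lhs: &FieldRef) -> Option<&'a str> {
    lhs.fieldnum
        .checked_sub(1)
        .and_then(|idx| def.fields.get(idx))
        .map(String::as_str)
}
```

Returning `Option` instead of indexing directly keeps a malformed field number from panicking.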
Force-pushed from 64df5e2 to 1bd5189
Force-pushed from 1bd5189 to b2b77cc
- Use `.is_some()` instead of `if let Some(_)` for the `nodecast!` check
- Use `crate::api::HashMap` instead of `std::collections::HashMap`
- Remove self-explanatory comments
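The first bullet is a standard Clippy cleanup: `if let Some(_) = x` binds a value only to discard it, and Clippy's `redundant_pattern_matching` lint suggests `.is_some()` instead. The sketch uses a hypothetical `Node` enum and `as_row_expr` function as a stand-in for the real `nodecast!` macro, whose exact signature is not shown here.

```rust
/// Hypothetical stand-in for a Postgres parse node.
pub enum Node {
    RowExpr { fields: usize },
    Const(i64),
}

/// Hypothetical stand-in for a `nodecast!`-style downcast: returns the inner
/// data only when the node is a `RowExpr`.
pub fn as_row_expr(node: &Node) -> Option<usize> {
    match node {
        Node::RowExpr { fields } => Some(*fields),
        _ => None,
    }
}

/// Before: binds a value only to throw it away (flagged by Clippy).
pub fn is_row_expr_verbose(node: &Node) -> bool {
    if let Some(_) = as_row_expr(node) { true } else { false }
}

/// After: states the intent directly.
pub fn is_row_expr(node: &Node) -> bool {
    as_row_expr(node).is_some()
}
```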
- Replace on-demand caching with upfront unpacking via `from_composites()`
- Add `collect_composites_for_unpacking()` helper in `utils.rs`
- Simplify `get_field_value()` to use an immutable reference and a simple lookup
- Update all callers: `insert.rs`, `build_parallel.rs`, `mvcc.rs`
- Remove integration tests (covered by pg_regress tests)
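A minimal model of the upfront-unpacking design, assuming composites arrive as rows of already-decoded values. `Value`, the `(composite, field)` key layout, and both method signatures are illustrative assumptions; `std::collections::HashMap` stands in for `crate::api::HashMap` so the snippet compiles on its own.

```rust
use std::collections::HashMap;

/// Hypothetical decoded value standing in for a Postgres `Datum`.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Text(String),
    Int(i64),
    Null,
}

/// Simplified model: every composite is unpacked once, at construction time,
/// so later lookups only need `&self`.
pub struct CompositeSlotValues {
    // (composite index, field index) -> value
    fields: HashMap<(usize, usize), Value>,
}

impl CompositeSlotValues {
    /// Unpack all composites eagerly (hypothetical signature).
    pub fn from_composites(composites: &[Vec<Value>]) -> Self {
        let mut fields = HashMap::new();
        for (ci, row) in composites.iter().enumerate() {
            for (fi, value) in row.iter().enumerate() {
                fields.insert((ci, fi), value.clone());
            }
        }
        Self { fields }
    }

    /// Plain lookup with no lazy cache, so no `&mut self` is required.
    pub fn get_field_value(&self, composite: usize, field: usize) -> Option<&Value> {
        self.fields.get(&(composite, field))
    }
}
```

Because every field is materialized in `from_composites()`, `get_field_value()` can take `&self` and be shared freely, at the cost of decoding fields that a given insert might never read.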
Force-pushed from ab8f6b2 to 74ac551
…hing Bug: the refactoring moved `expr_no += 1` outside the tokenizer check and dropped the early `return None` for non-tokenizable types.
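The bug is about ordering: the counter should advance only for expressions that pass the tokenizer check, and a non-tokenizable match should return `None` immediately. The sketch below reconstructs that shape with hypothetical names (`IndexExpr`, `tokenizable_ordinal`); the real loop in pg_search may differ in what it returns.

```rust
/// Hypothetical model of an indexed expression.
pub struct IndexExpr {
    pub name: &'static str,
    pub tokenizable: bool,
}

/// Return the ordinal of `target` among tokenizable expressions only.
pub fn tokenizable_ordinal(exprs: &[IndexExpr], target: &str) -> Option<usize> {
    let mut expr_no = 0;
    for expr in exprs {
        if expr.tokenizable {
            if expr.name == target {
                return Some(expr_no);
            }
            // Advance only for expressions that passed the tokenizer check.
            expr_no += 1;
        } else if expr.name == target {
            // Non-tokenizable target: bail out early instead of returning an ordinal.
            return None;
        }
    }
    None
}
```

With the buggy ordering, where `expr_no` advances for every expression, a tokenizable expression that follows a non-tokenizable one would be assigned an ordinal one too high.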
Resolved conflicts:
- `pg_search/src/api/operator.rs`: Keep composite type support; add `type_is_alias` guard to `expr_matches_node` (#3760 fix)
Avoid intermediate Vec allocation; clarify safety contract for lazy pointer dereferencing.
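One way to avoid an intermediate `Vec` is to return a lazy iterator that reads each element through the raw pointer only when it is reached. That shifts responsibility onto the caller, so the safety contract has to spell out how long the pointed-to data must stay valid. The sketch assumes a plain array of a hypothetical `Attr` struct rather than a real Postgres tuple descriptor.

```rust
/// Hypothetical attribute record standing in for a tuple-descriptor entry.
#[derive(Debug, Clone, Copy)]
pub struct Attr {
    pub name: &'static str,
    pub dropped: bool,
}

/// Yield the names of live (non-dropped) attributes lazily, with no
/// intermediate `Vec<Attr>`.
///
/// # Safety
/// `ptr` must point to `len` initialized, properly aligned `Attr` values that
/// remain valid and unmodified until the returned iterator is dropped.
/// Element `i` is read only when the iterator reaches it.
pub unsafe fn live_attr_names(ptr: *const Attr, len: usize) -> impl Iterator<Item = &'static str> {
    (0..len)
        // SAFETY: `i < len`, and the caller guarantees `ptr..ptr + len` is valid.
        .map(move |i| unsafe { *ptr.add(i) })
        .filter(|attr| !attr.dropped)
        .map(|attr| attr.name)
}
```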
paradedb-bot left a comment

pg_search single-server.toml Performance - TPS
| Benchmark suite | Current: 1d01c3b | Previous: ac76a85 | Ratio |
|---|---|---|---|
| Custom Scan - Primary - tps | 581.6928221528699 median tps | 541.6051734873952 median tps | 0.93 |
| Delete values - Primary - tps | 3088.6080709462467 median tps | 3010.4362011538165 median tps | 0.97 |
| Index Only Scan - Primary - tps | 594.8230011332421 median tps | 622.8590893387328 median tps | 1.05 |
| Index Scan - Primary - tps | 501.01156127041315 median tps | 452.78617553248864 median tps | 0.90 |
| Insert value - Primary - tps | 3328.9979710095527 median tps | 3201.977317603253 median tps | 0.96 |
| Update random values - Primary - tps | 2145.9159451854002 median tps | 2089.9038650329157 median tps | 0.97 |
| Vacuum - Primary - tps | 156.04670488067885 median tps | 111.76418077854224 median tps | 0.72 |
This comment was automatically generated by workflow using github-action-benchmark.
paradedb-bot left a comment

pg_search single-server.toml Performance - Other Metrics
| Benchmark suite | Current: 1d01c3b | Previous: ac76a85 | Ratio |
|---|---|---|---|
| Custom Scan - Primary - cpu | 4.6647234 median cpu | 4.678363 median cpu | 1.00 |
| Custom Scan - Primary - mem | 57.712890625 median mem | 57.34375 median mem | 1.01 |
| Delete values - Primary - cpu | 4.655674 median cpu | 4.669261 median cpu | 1.00 |
| Delete values - Primary - mem | 33.1796875 median mem | 33.5390625 median mem | 0.99 |
| Index Only Scan - Primary - cpu | 4.6647234 median cpu | 4.669261 median cpu | 1.00 |
| Index Only Scan - Primary - mem | 58.234375 median mem | 57.796875 median mem | 1.01 |
| Index Scan - Primary - cpu | 4.64666 median cpu | 4.655674 median cpu | 1.00 |
| Index Scan - Primary - mem | 57.43359375 median mem | 57.203125 median mem | 1.00 |
| Insert value - Primary - cpu | 4.660194 median cpu | 4.6647234 median cpu | 1.00 |
| Insert value - Primary - mem | 45.8046875 median mem | 45.85546875 median mem | 1.00 |
| Monitor Index Size - Primary - block_count | 1767 median block_count | 1702 median block_count | 1.04 |
| Monitor Index Size - Primary - segment_count | 12 median segment_count | 7 median segment_count | 1.71 |
| Update random values - Primary - cpu | 4.6511626 median cpu | 4.6875 median cpu | 0.99 |
| Update random values - Primary - mem | 48.421875 median mem | 48.48828125 median mem | 1.00 |
| Vacuum - Primary - cpu | 0 median cpu | 4.673807 median cpu | 0 |
| Vacuum - Primary - mem | 49.8515625 median mem | 50.8125 median mem | 0.98 |
paradedb-bot left a comment
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'pg_search single-server.toml Performance - Other Metrics'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.10.
| Benchmark suite | Current: 1d01c3b | Previous: ac76a85 | Ratio |
|---|---|---|---|
| Monitor Index Size - Primary - segment_count | 12 median segment_count | 7 median segment_count | 1.71 |
CC: @mithuncy
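The Ratio column follows a convention that can be read off the numbers in these tables: for bigger-is-better metrics (TPS) it is previous / current, and for smaller-is-better metrics (cpu, mem, segment_count) it is current / previous, so a value above 1.0 always means worse. For example, segment_count gives 12 / 7 ≈ 1.71, above the 1.10 threshold, while Vacuum TPS gives 111.76 / 156.05 ≈ 0.72, an improvement. The sketch below encodes that inference; it is an illustration, not github-action-benchmark's source.

```rust
/// Ratio convention inferred from the tables: > 1.0 always means "worse".
pub fn ratio(current: f64, previous: f64, bigger_is_better: bool) -> f64 {
    if bigger_is_better {
        previous / current // e.g. tps
    } else {
        current / previous // e.g. cpu, mem, segment_count
    }
}

/// An alert fires when the ratio exceeds the configured threshold (1.10 here).
pub fn is_regression(current: f64, previous: f64, bigger_is_better: bool, threshold: f64) -> bool {
    ratio(current, previous, bigger_is_better) > threshold
}
```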
paradedb-bot left a comment

pg_search bulk-updates.toml Performance - TPS
| Benchmark suite | Current: 1d01c3b | Previous: ac76a85 | Ratio |
|---|---|---|---|
| Bulk Update - Primary - tps | 7.4167670507693435 median tps | 7.440606065943415 median tps | 1.00 |
| Count Query - Primary - tps | 5.481560146066513 median tps | 5.430168528742451 median tps | 0.99 |
paradedb-bot left a comment

pg_search bulk-updates.toml Performance - Other Metrics
| Benchmark suite | Current: 1d01c3b | Previous: ac76a85 | Ratio |
|---|---|---|---|
| Bulk Update - Primary - cpu | 23.233301 median cpu | 23.166023 median cpu | 1.00 |
| Bulk Update - Primary - mem | 232.51171875 median mem | 232.1953125 median mem | 1.00 |
| Count Query - Primary - cpu | 23.323614 median cpu | 23.233301 median cpu | 1.00 |
| Count Query - Primary - mem | 172.30078125 median mem | 171.9921875 median mem | 1.00 |
| Monitor Index Size - Primary - block_count | 48506 median block_count | 48783 median block_count | 0.99 |
| Monitor Index Size - Primary - segment_count | 88 median segment_count | 89 median segment_count | 0.99 |
paradedb-bot left a comment

pg_search wide-table.toml Performance - TPS
| Benchmark suite | Current: 1d01c3b | Previous: ac76a85 | Ratio |
|---|---|---|---|
| Bulk Update - Primary - tps | 1089.4493537883277 median tps | 1129.8923655292951 median tps | 1.04 |
| Single Insert - Primary - tps | 1219.6374030035308 median tps | 1229.8000758847982 median tps | 1.01 |
| Single Update - Primary - tps | 1826.849307784978 median tps | 1926.2613579019683 median tps | 1.05 |
| Top N - Primary - tps | 5.3612485573338535 median tps | 5.725964068646208 median tps | 1.07 |
paradedb-bot left a comment

pg_search wide-table.toml Performance - Other Metrics
| Benchmark suite | Current: 1d01c3b | Previous: ac76a85 | Ratio |
|---|---|---|---|
| Background Merger - Primary - background_merging | 0 median background_merging | 0 median background_merging | 1 |
| Background Merger - Primary - cpu | 4.660194 median cpu | 4.660194 median cpu | 1 |
| Background Merger - Primary - mem | 22.77734375 median mem | 22.90625 median mem | 0.99 |
| Bulk Update - Primary - cpu | 4.660194 median cpu | 4.669261 median cpu | 1.00 |
| Bulk Update - Primary - mem | 165.8359375 median mem | 165.80859375 median mem | 1.00 |
| Monitor Index Size - Primary - block_count | 66871 median block_count | 64565 median block_count | 1.04 |
| Monitor Index Size - Primary - segment_count | 46 median segment_count | 47 median segment_count | 0.98 |
| Single Insert - Primary - cpu | 4.6647234 median cpu | 4.655674 median cpu | 1.00 |
| Single Insert - Primary - mem | 120.6796875 median mem | 123.1640625 median mem | 0.98 |
| Single Update - Primary - cpu | 4.660194 median cpu | 4.660194 median cpu | 1 |
| Single Update - Primary - mem | 165.29296875 median mem | 165.23828125 median mem | 1.00 |
| Top N - Primary - cpu | 23.369036 median cpu | 23.346306 median cpu | 1.00 |
| Top N - Primary - mem | 160.04296875 median mem | 160.0078125 median mem | 1.00 |
paradedb-bot left a comment

pg_search background-merge.toml Performance - TPS
| Benchmark suite | Current: 1d01c3b | Previous: ac76a85 | Ratio |
|---|---|---|---|
| Custom scan - Primary - tps | 32.3171187074876 median tps | 32.52606751548291 median tps | 1.01 |
| Delete value - Primary - tps | 237.33447161497622 median tps | 238.48276200379138 median tps | 1.00 |
| Insert value - Primary - tps | 1902.59970177451 median tps | 1868.3480100972972 median tps | 0.98 |
| Update random values - Primary - tps | 154.4303790030059 median tps | 153.8506542306622 median tps | 1.00 |
| Vacuum - Primary - tps | 14.939362952411155 median tps | 14.697187851368014 median tps | 0.98 |
paradedb-bot left a comment

pg_search background-merge.toml Performance - Other Metrics
| Benchmark suite | Current: 1d01c3b | Previous: ac76a85 | Ratio |
|---|---|---|---|
| Custom scan - Primary - cpu | 18.60465 median cpu | 18.60465 median cpu | 1 |
| Custom scan - Primary - mem | 163.08984375 median mem | 151.4453125 median mem | 1.08 |
| Delete value - Primary - cpu | 4.6511626 median cpu | 4.6511626 median cpu | 1 |
| Delete value - Primary - mem | 117.12109375 median mem | 118.33984375 median mem | 0.99 |
| Insert value - Primary - cpu | 4.6421666 median cpu | 4.6421666 median cpu | 1 |
| Insert value - Primary - mem | 124.9140625 median mem | 125.390625 median mem | 1.00 |
| Monitor Segment Count - Primary - block_count | 14031 median block_count | 14027 median block_count | 1.00 |
| Monitor Segment Count - Primary - cpu | 4.6376815 median cpu | 4.6332045 median cpu | 1.00 |
| Monitor Segment Count - Primary - mem | 99.01171875 median mem | 98.64453125 median mem | 1.00 |
| Monitor Segment Count - Primary - segment_count | 26 median segment_count | 26 median segment_count | 1 |
| Update random values - Primary - cpu | 9.248554 median cpu | 9.239654 median cpu | 1.00 |
| Update random values - Primary - mem | 159.5390625 median mem | 150.65625 median mem | 1.06 |
| Vacuum - Primary - cpu | 13.859479 median cpu | 13.913043 median cpu | 1.00 |
| Vacuum - Primary - mem | 171.921875 median mem | 171.8671875 median mem | 1.00 |
paradedb-bot left a comment

pg_search logical-replication.toml Performance - TPS
| Benchmark suite | Current: 1d01c3b | Previous: ac76a85 | Ratio |
|---|---|---|---|
| Custom Scan - Subscriber - tps | 558.4758186580741 median tps | 546.2631357068778 median tps | 0.98 |
| Index Only Scan - Subscriber - tps | 620.9670331744552 median tps | 660.1542280056016 median tps | 1.06 |
| Parallel Custom Scan - Subscriber - tps | 85.68014104349551 median tps | 86.85447420211887 median tps | 1.01 |
| Top N - Subscriber - tps | 109.46626996065463 median tps | 109.53068986286965 median tps | 1.00 |
paradedb-bot left a comment

pg_search logical-replication.toml Performance - Other Metrics
| Benchmark suite | Current: 1d01c3b | Previous: ac76a85 | Ratio |
|---|---|---|---|
| Custom Scan - Subscriber - cpu | 4.5714283 median cpu | 4.5714283 median cpu | 1 |
| Custom Scan - Subscriber - mem | 47.08984375 median mem | 47.40234375 median mem | 0.99 |
| Delete values - Publisher - cpu | 4.58891 median cpu | 4.5584044 median cpu | 1.01 |
| Delete values - Publisher - mem | 29.94140625 median mem | 29.8828125 median mem | 1.00 |
| Find by ctid - Subscriber - cpu | 9.099526 median cpu | 9.108159 median cpu | 1.00 |
| Find by ctid - Subscriber - mem | 49.61328125 median mem | 50.16015625 median mem | 0.99 |
| Index Only Scan - Subscriber - cpu | 4.5714283 median cpu | 4.567079 median cpu | 1.00 |
| Index Only Scan - Subscriber - mem | 46.78515625 median mem | 47.17578125 median mem | 0.99 |
| Index Size Info - Subscriber - cpu | 4.567079 median cpu | 4.567079 median cpu | 1 |
| Index Size Info - Subscriber - mem | 30.7421875 median mem | 30.86328125 median mem | 1.00 |
| Index Size Info - Subscriber - pages | 1127 median pages | 1122 median pages | 1.00 |
| Index Size Info - Subscriber - relation_size:MB | 8.8046875 median relation_size:MB | 8.765625 median relation_size:MB | 1.00 |
| Index Size Info - Subscriber - segment_count | 8 median segment_count | 7 median segment_count | 1.14 |
| Insert value A - Publisher - cpu | 4.567079 median cpu | 4.524034 median cpu | 1.01 |
| Insert value A - Publisher - mem | 27.7421875 median mem | 27.41796875 median mem | 1.01 |
| Insert value B - Publisher - cpu | 4.549763 median cpu | 4.5540795 median cpu | 1.00 |
| Insert value B - Publisher - mem | 27.62890625 median mem | 27.4453125 median mem | 1.01 |
| Parallel Custom Scan - Subscriber - cpu | 4.597701 median cpu | 4.58891 median cpu | 1.00 |
| Parallel Custom Scan - Subscriber - mem | 44.9453125 median mem | 45.25 median mem | 0.99 |
| `SELECT pid, pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replication_lag, application_name::text, state::text FROM pg_stat_replication;` - Publisher - replication_lag:MB | 0 median replication_lag:MB | 0 median replication_lag:MB | 1 |
| Top N - Subscriber - cpu | 4.5714283 median cpu | 4.5714283 median cpu | 1 |
| Top N - Subscriber - mem | 45.68359375 median mem | 46.0390625 median mem | 0.99 |
| Update 1..9 - Publisher - cpu | 4.5801525 median cpu | 4.5933013 median cpu | 1.00 |
| Update 1..9 - Publisher - mem | 30.5859375 median mem | 30.55859375 median mem | 1.00 |
| Update 10,11 - Publisher - cpu | 4.567079 median cpu | 4.567079 median cpu | 1 |
| Update 10,11 - Publisher - mem | 30.484375 median mem | 30.65625 median mem | 0.99 |
paradedb-bot left a comment
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'pg_search logical-replication.toml Performance - Other Metrics'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.10.
| Benchmark suite | Current: 1d01c3b | Previous: ac76a85 | Ratio |
|---|---|---|---|
| Index Size Info - Subscriber - segment_count | 8 median segment_count | 7 median segment_count | 1.14 |
CC: @mithuncy





Summary
Fixes #3686
`ROW(...)::type` expressions

Changes

Core Module (`pg_search/src/postgres/composite.rs`)
- `CompositeSlotValues` - Unpacks composite values upfront during construction for field extraction
- `CompositeFieldInfo` - Metadata struct for composite type fields
- `CompositeError` - Validation errors for nested composites, anonymous ROW, and domain types
- `is_composite_type`, `get_composite_type_fields`, `get_composite_fields_for_index`

Field Matching (`pg_search/src/api/operator.rs`)
- `expr_matches_node` - New helper function for matching WHERE clause expressions against indexed expressions
- `field_name_from_node` to detect composite type fields and match them by field name

Integration (`pg_search/src/postgres/utils.rs`)
- `FieldSource::CompositeField` variant for composite-derived fields
- `get_field_value` helper to extract individual fields from unpacked composites
- `extract_field_attributes` to detect and expand composite expressions

Parallel Build & Insert
- `build_parallel.rs` and `insert.rs` updated to use `CompositeSlotValues`
- `mvcc.rs` updated for MVCC-aware composite unpacking

Usage Example

Test plan
- `composite.sql` (39 test sections): Basic indexing, >32 fields, 100 fields, JSON/array fields, tokenizers, error cases, parallel builds, MVCC, and more
- `composite_advanced.sql` (8 test sections): Field-level queries with `field @@@ query` syntax, pdb functions, scoring, snippets
- `cargo fmt` and `cargo clippy` clean
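The three rejection cases listed for `CompositeError` can be sketched as a validation pass over a simplified expression type. The `ExprType` enum, the variant names, the error messages, and the reading of "domain types" as a domain used as the cast target are all assumptions for illustration, not the PR's definitions.

```rust
use std::fmt;

/// Hypothetical, simplified view of the type of an indexed expression.
pub enum ExprType {
    /// `ROW(...)::some_type` cast to a named composite; each field is
    /// (name, whether that field is itself a composite).
    Named { fields: Vec<(&'static str, bool)> },
    /// Bare `ROW(...)` without a cast, i.e. an anonymous `record`.
    AnonymousRow,
    /// Cast target is a domain (assumed here to be the rejected case).
    Domain,
}

#[derive(Debug, PartialEq)]
pub enum CompositeError {
    NestedComposite { field: String },
    AnonymousRow,
    DomainType,
}

impl fmt::Display for CompositeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CompositeError::NestedComposite { field } => {
                write!(f, "nested composite field \"{field}\" is not supported")
            }
            CompositeError::AnonymousRow => {
                write!(f, "anonymous ROW(...) must be cast to a named composite type")
            }
            CompositeError::DomainType => write!(f, "domain types are not supported"),
        }
    }
}

impl std::error::Error for CompositeError {}

/// Validate an expression and return how many fields it would expose.
pub fn validate_composite(ty: &ExprType) -> Result<usize, CompositeError> {
    match ty {
        ExprType::AnonymousRow => Err(CompositeError::AnonymousRow),
        ExprType::Domain => Err(CompositeError::DomainType),
        ExprType::Named { fields } => {
            // Reject the first field that is itself a composite.
            if let Some((name, _)) = fields.iter().find(|(_, is_composite)| *is_composite) {
                return Err(CompositeError::NestedComposite { field: name.to_string() });
            }
            Ok(fields.len())
        }
    }
}
```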