Skip to content

Refactor keys dict - [MOD-13151]#7843

Merged
GuyAv46 merged 8 commits intomasterfrom
guyav-refactor_keys_dict
Dec 28, 2025
Merged

Refactor keys dict - [MOD-13151]#7843
GuyAv46 merged 8 commits intomasterfrom
guyav-refactor_keys_dict

Conversation

@GuyAv46
Copy link
Collaborator

@GuyAv46 GuyAv46 commented Dec 19, 2025

Describe the changes in the pull request

Since our terms are no longer stored in Redis key-space, we don't need to format them before adding them to the spec's keys dictionary.

Avoiding the format will speed up query parsing, especially for complex queries and infix (suffix, prefix) queries, where we traverse many tokens and therefore perform numerous temporary allocations and string copies.

To stop formatting inverted index keys, we need:

  1. Change keysDict key type to avoid creating temporary RedisModuleStrings for lookup.
  2. Stop storing Vector indexes in this dictionary. This is required because the vector formatted-name template was the plain name given by the user, and we can't have 2 different types of values in the same dictionary if there is a chance for name collision. Luckily, it shouldn't have been added to this dictionary in the first place, and we can easily avoid that. Done in Remove non-TEXT fields from the spec's keys dictionary - [MOD-13150] #7871

We also now create (and allocate) NewQueryTerm only if it's needed, avoiding the paths where we allocate and later free the term without using it

Mark if applicable

  • This PR introduces API changes
  • This PR introduces serialization changes

Note

Consolidates key handling for in-memory inverted indexes and reduces per-term allocations across query paths.

  • keysDict refactor: keysDict now stores InvertedIndex directly keyed by CharBuf with a custom dictType (invIdxDictType); remove KeysDictValue wrapper and add CharBuf hash/dup/compare/destructors and InvIndFreeCb for values. IndexSpec_MakeKeyless uses the new dict type; missingFieldDict uses the shared value dtor.
  • Stop term key formatting: Redis_OpenInvertedIndex/openIndexKeysDict accept raw CharBuf (no RedisModuleString); introduce Legacy_fmtRedis* helpers and Redis_LegacyDropScanHandler for old key-space cleanup paths only.
  • Query iterator changes: Redis_OpenReader now takes RSToken and tok_id; RSQueryTerm is created only when building the iterator. Update token/prefix/wildcard expansion and disk-iterator sites to pass raw token data and increment tokenId appropriately.
  • GC/spec updates: Fork GC deletes terms from keysDict using CharBuf; spec drop logic and iterators switched to legacy formatters only where needed.
  • Minor comment/rename cleanup in field-spec info and related files.

Written by Cursor Bugbot for commit 4bb4b70. This will update automatically on new commits. Configure here.

@codecov
Copy link

codecov bot commented Dec 19, 2025

Codecov Report

❌ Patch coverage is 92.72727% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.17%. Comparing base (f95b33f) to head (4bb4b70).
⚠️ Report is 13 commits behind head on master.

Files with missing lines Patch % Lines
src/query.c 50.00% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7843      +/-   ##
==========================================
- Coverage   84.22%   84.17%   -0.05%     
==========================================
  Files         360      361       +1     
  Lines       56225    56283      +58     
  Branches    15717    15720       +3     
==========================================
+ Hits        47356    47379      +23     
- Misses       8653     8688      +35     
  Partials      216      216              
Flag Coverage Δ
flow 85.17% <92.72%> (-0.16%) ⬇️
unit 51.94% <76.36%> (-0.10%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@fcostaoliveira
Copy link
Contributor

fcostaoliveira commented Dec 19, 2025

Automated performance analysis summary

This comment was automatically generated given there is performance data available.

In summary:

  • Detected a total of 30 stable tests between versions.
  • Detected a total of 9 highly unstable benchmarks (9 baseline).
  • Detected a total of 2 improvements above the improvement water line.
  • Detected a total of 1 regressions bellow the regression water line 8.0%.

You can check a comparison in detail via the grafana link

Performance Improvements - Comparison between master and guyav-refactor_keys_dict.

Time Period from 30 days ago. (environment used: oss-standalone)

Test Case Baseline master (median obs. +- std.dev) Comparison guyav-refactor_keys_dict (median obs. +- std.dev) % change (higher-better) Note
vecsim-arxiv-titles-384-angular-filters-m16-ef-128-fulltext-filter 558 +- 4.3% (7 datapoints) 618 10.8% IMPROVEMENT
ftsb-10K-enwiki_abstract-hashes-term-wildcard 8582 +- 3.6% (7 datapoints) 9454 10.2% IMPROVEMENT

Performance Regressions and Issues - Comparison between master and guyav-refactor_keys_dict.

Time Period from 30 days ago. (environment used: oss-standalone)

Test Case Baseline master (median obs. +- std.dev) Comparison guyav-refactor_keys_dict (median obs. +- std.dev) % change (higher-better) Note
search-numeric-sortby 3786 +- 23.3% UNSTABLE (7 datapoints) 2205 -41.8% UNSTABLE (baseline high variance); server: FT.SEARCH p50 increased 74.4% (baseline CV=30.3%); client: client latency stable; only server side confirms regression (client side stable) - insufficient evidence
search-numeric-sortby-optimize 30 +- 7.1% (7 datapoints) 27 -9.3% REGRESSION
ftsb-1M-enwiki_abstract-hashes-fulltext-2word-intersection-query-non-sortable 43 +- 31.6% UNSTABLE (7 datapoints) 40 -6.2% UNSTABLE (baseline high variance); server: p50 latency stable; client: OverallQuantiles.allCommands.q50 increased 5.1% (baseline CV=12.6%); neither server nor client side confirms regression
ftsb-1M-enwiki_abstract-hashes-fulltext-2word-union-query 2862 +- 11.4% UNSTABLE (7 datapoints) 2811 -1.8% UNSTABLE (baseline high variance); server: p50 latency stable; client: client latency stable; neither server nor client side confirms regression
ftsb-1M-enwiki_abstract-hashes-fulltext-2word-intersection-query 344 +- 18.4% UNSTABLE (7 datapoints) 352 2.1% UNSTABLE (baseline high variance); server: p50 latency stable; client: OverallQuantiles.allCommands.q50 decreased 5.5% (baseline CV=17.3%); neither server nor client side confirms regression
ftsb-1M-enwiki_abstract-hashes-fulltext-2word-union-query-non-sortable 1103 +- 11.9% UNSTABLE (7 datapoints) 1131 2.6% UNSTABLE (baseline high variance); server: p50 latency stable; client: client latency stable; neither server nor client side confirms regression
search-numeric-sortby-desc 2380 +- 31.9% UNSTABLE (7 datapoints) 2443 2.7% UNSTABLE (baseline high variance); server: p50 latency stable; client: client latency stable; neither server nor client side confirms regression
ftsb-1M-enwiki_abstract-hashes-fulltext-simple-1word-query 929 +- 16.0% UNSTABLE (7 datapoints) 958 3.2% UNSTABLE (baseline high variance); server: p50 latency stable; client: client latency stable; neither server nor client side confirms regression
search-filtering-tag-numeric 248 +- 13.5% UNSTABLE (7 datapoints) 268 7.9% UNSTABLE (baseline high variance); server: FT.AGGREGATE p50 decreased 7.7% (baseline CV=12.3%); client: Latency decreased 7.4% (baseline CV=12.1%); neither server nor client side confirms regression
search-numeric 2383 +- 30.1% UNSTABLE (7 datapoints) 3967 66.5% UNSTABLE (baseline high variance); server: FT.SEARCH p50 decreased 41.0% (baseline CV=21.9%); client: Latency decreased 39.4% (baseline CV=21.4%); neither server nor client side confirms regression
Tests with No Significant Changes (30 tests)

Tests with No Significant Changes

Test Case Baseline master (median obs. +- std.dev) Comparison guyav-refactor_keys_dict (median obs. +- std.dev) % change (higher-better) Note
ftsb-10K-enwiki_abstract-hashes-fulltext-sortby 93 +- 5.6% (7 datapoints) 91.0 -2.9% No Change
ftsb-10K-enwiki_abstract-hashes-term-prefix 6114 +- 6.9% (7 datapoints) 6347.0 3.8% potential IMPROVEMENT
ftsb-10K-enwiki_abstract-hashes-term-suffix 2203 +- 1.0% (7 datapoints) 2187.0 -0.7% No Change
ftsb-10K-enwiki_abstract-hashes-term-suffix-withsuffixtrie 16782 +- 0.8% (7 datapoints) 16339.0 -2.6% No Change
ftsb-10K-enwiki_pages-hashes-fulltext-mixed_simple-1word-query_write_1_to_read_20.yml 1058 +- 4.1% (7 datapoints) 1075.0 1.6% No Change
ftsb-10K-enwiki_pages-hashes-load 64683 +- 4.4% (7 datapoints) 66703.0 3.1% potential IMPROVEMENT
ftsb-10K-multivalue-numeric-json 1034 +- 1.7% (7 datapoints) 1011.0 -2.2% No Change
ftsb-10K-singlevalue-numeric-json 487 +- 0.6% (7 datapoints) 491.0 0.8% No Change
ftsb-1K-enwiki_abstract-hashes-term-contains 1879 +- 1.1% (7 datapoints) 1921.0 2.2% No Change
ftsb-1M-enwiki_abstract-hashes-load 22618 +- 5.9% (7 datapoints) 23930.0 5.8% potential IMPROVEMENT
ftsb-1M-nyc_taxis-ftadd-load 29595 +- 4.9% (7 datapoints) 29625.0 0.1% No Change
ftsb-1M-nyc_taxis-hashes-load 32312 +- 3.5% (7 datapoints) 32716.0 1.3% No Change
search-aggregate-post-filter-simple.yml 17355 +- 0.7% (7 datapoints) 17776.0 2.4% No Change
search-filtering-tag-numeric-filter-pipeline 11304 +- 1.8% (7 datapoints) 11289.0 -0.1% No Change
search-ftsb-10K-enwiki_abstract-hashes-fulltext-aggregate-sortby-limit-0-100 958 +- 2.2% (7 datapoints) 945.0 -1.4% No Change
search-ftsb-10K-enwiki_abstract-hashes-fulltext-search-sortby-limit-0-100 947 +- 2.4% (7 datapoints) 955.0 0.9% No Change
search-ftsb-10K-enwiki_abstract-hashes-term-withoutsuffix-trie 14471 +- 1.7% (7 datapoints) 14429.0 -0.3% No Change
search-ftsb-10K-enwiki_abstract-hashes-term-withsuffix-trie 14217 +- 1.3% (7 datapoints) 14620.0 2.8% No Change
search-ftsb-1700K-docs-union-iterators-q3 8.3 +- 1.0% (7 datapoints) 8.4 0.8% No Change
search-ftsb-1M-enwiki_abstract-hashes-fulltext-simple-1word-query-non-sortable 174 +- 5.6% (7 datapoints) 176.0 1.2% No Change
search-ftsb-370K-docs-union-iterators-q4 8.5 +- 1.1% (7 datapoints) 8.7 1.4% No Change
search-ftsb-5200K-docs-union-iterators-q1 0.86 +- 1.3% (7 datapoints) 0.9 4.7% potential IMPROVEMENT
search-ftsb-5500K-docs-union-iterators-q2 1.2 +- 1.5% (7 datapoints) 1.2 -1.6% No Change
search-geo 221 +- 2.6% (7 datapoints) 222.0 0.2% No Change
search-high-cardinality-negation-term-baseline 32 +- 6.1% (7 datapoints) 33.0 1.6% No Change
search-high-cardinality-negation-term-comparison_union_all_other_terms 13 +- 3.1% (7 datapoints) 13.0 0.9% No Change
search-numeric-optimize 8194 +- 0.4% (7 datapoints) 8346.0 1.9% No Change
search-numeric-sortby-desc-optimize 30 +- 6.5% (7 datapoints) 29.0 -2.1% No Change
vecsim-arxiv-titles-384-angular-filters-m16-ef-128-numeric-filter 150 +- 4.6% (7 datapoints) 148.0 -1.7% No Change
vecsim-arxiv-titles-384-angular-filters-m16-ef-128-tag-filter 15999 +- 1.1% (7 datapoints) 15706.0 -1.8% No Change

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the internal key dictionary (keysDict) to use plain CharBuf structures instead of RedisModuleString for keys, and moves vector index storage from the shared dictionary to individual FieldSpec structures. This optimization eliminates temporary string allocations during query parsing.

Key changes:

  • Introduced CharBuf type and custom dictionary functions for hash/compare/duplicate/destroy operations
  • Modified IndexSpec_GetFormattedKey* functions to return CharBuf* instead of RedisModuleString*
  • Moved vector index storage from keysDict to FieldSpec.vectorOpts.vecIndex, managed per-field

Reviewed changes

Copilot reviewed 28 out of 28 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/spec.h Added CharBuf struct definition and updated IndexSpecFmtStrings to use CharBuf*
src/spec.c Implemented CharBuf dict operations; updated IndexSpec_GetFormattedKey* to return CharBuf*; updated cleanup logic
src/field_spec.h Added vecIndex pointer to vectorOpts for per-field vector index storage
src/field_spec.c Added cleanup for vecIndex in FieldSpec_Cleanup
src/vector_index.h Changed openVectorIndex to take FieldSpec* instead of key name
src/vector_index.c Simplified openVectorIndex to access field-level vecIndex directly; removed dict-based lookup
src/tag_index.h/c Updated TagIndex_Open to accept CharBuf*; updated callers
src/numeric_index.h/c Updated openNumericKeysDict to accept CharBuf*; updated callers
src/geometry_index.c Updated openGeometryKeysDict to accept CharBuf*
src/redis_index.h/c Updated Redis_OpenReader to take RSToken* and token id; renamed legacy functions; added Redis_LegacyDeleteKey
src/obfuscation/hidden.h/c Renamed HiddenString_DropFromKeySpace to HiddenString_LegacyDropFromKeySpace
src/query.c Updated query evaluation to pass tokens directly to Redis_OpenReader
src/module.c Updated to use new CharBuf* based APIs
src/document.c Updated indexers to use new API signatures
src/indexer.c Updated vector index access for document deletion
src/fork_gc.c Updated GC to use CharBuf for dictionary operations
src/debug_commands.c Updated debug commands to work with field specs and CharBuf*
src/info/field_spec_info.c Updated to use new openVectorIndex signature
tests/pytests/test_index.py Removed vector index count increments (vectors no longer in keysDict)
tests/cpptests/*.cpp Updated tests to use CharBuf* and new API signatures; removed string cleanup calls

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@GuyAv46 GuyAv46 changed the title Refactor keys dict Refactor keys dict - [MOD-13151] Dec 21, 2025
@eyalrund

This comment was marked as resolved.

@GuyAv46 GuyAv46 force-pushed the guyav-refactor_keys_dict branch from 1d13500 to 35ec166 Compare December 22, 2025 12:04
@GuyAv46 GuyAv46 marked this pull request as draft December 22, 2025 16:18
@GuyAv46 GuyAv46 force-pushed the guyav-refactor_keys_dict branch from 35ec166 to ccdbe22 Compare December 23, 2025 16:56
@GuyAv46 GuyAv46 changed the base branch from master to guyav-deprecate_fields_formatting December 23, 2025 16:56
Base automatically changed from guyav-deprecate_fields_formatting to master December 24, 2025 13:11
@GuyAv46 GuyAv46 force-pushed the guyav-refactor_keys_dict branch from ccdbe22 to ecd02bc Compare December 24, 2025 13:28
@GuyAv46 GuyAv46 marked this pull request as ready for review December 24, 2025 13:33
@GuyAv46 GuyAv46 removed the size:L label Dec 24, 2025
ofiryanai
ofiryanai previously approved these changes Dec 24, 2025
Copy link
Contributor

@ofiryanai ofiryanai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@GuyAv46 GuyAv46 enabled auto-merge December 25, 2025 10:21
@GuyAv46 GuyAv46 added this pull request to the merge queue Dec 28, 2025
Merged via the queue into master with commit 692e14f Dec 28, 2025
49 of 52 checks passed
@GuyAv46 GuyAv46 deleted the guyav-refactor_keys_dict branch December 28, 2025 11:25
@redisearch-backport-pull-request
Copy link
Contributor

Backport failed for 8.2, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 8.2
git worktree add -d .worktree/backport-7843-to-8.2 origin/8.2
cd .worktree/backport-7843-to-8.2
git switch --create backport-7843-to-8.2
git cherry-pick -x 692e14fa369df2c41bf1d577cbac92b0c6fee217

@redisearch-backport-pull-request
Copy link
Contributor

Backport failed for 8.4, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 8.4
git worktree add -d .worktree/backport-7843-to-8.4 origin/8.4
cd .worktree/backport-7843-to-8.4
git switch --create backport-7843-to-8.4
git cherry-pick -x 692e14fa369df2c41bf1d577cbac92b0c6fee217

GuyAv46 added a commit that referenced this pull request Dec 28, 2025
* mark legacy API as such

* refactor spec's keysDict to hold char buffer

* cleanups

* make keysDict a dict for inverted indexes only (simplify value struct)

* use bool

* some comment fixes

* address review comment

* dead code cleanup

(cherry picked from commit 692e14f)
GuyAv46 added a commit that referenced this pull request Dec 28, 2025
* mark legacy API as such

* refactor spec's keysDict to hold char buffer

* cleanups

* make keysDict a dict for inverted indexes only (simplify value struct)

* use bool

* some comment fixes

* address review comment

* dead code cleanup

(cherry picked from commit 692e14f)
GuyAv46 added a commit that referenced this pull request Dec 29, 2025
* mark legacy API as such

* refactor spec's keysDict to hold char buffer

* cleanups

* make keysDict a dict for inverted indexes only (simplify value struct)

* use bool

* some comment fixes

* address review comment

* dead code cleanup

(cherry picked from commit 692e14f)
github-merge-queue bot pushed a commit that referenced this pull request Dec 29, 2025
* Refactor keys dict - [MOD-13151] (#7843)

* mark legacy API as such

* refactor spec's keysDict to hold char buffer

* cleanups

* make keysDict a dict for inverted indexes only (simplify value struct)

* use bool

* some comment fixes

* address review comment

* dead code cleanup

(cherry picked from commit 692e14f)

* CP fix

* CP fix for 8.2

* add back missing definition
github-merge-queue bot pushed a commit that referenced this pull request Dec 29, 2025
Refactor keys dict - [MOD-13151] (#7843)

* mark legacy API as such

* refactor spec's keysDict to hold char buffer

* cleanups

* make keysDict a dict for inverted indexes only (simplify value struct)

* use bool

* some comment fixes

* address review comment

* dead code cleanup

(cherry picked from commit 692e14f)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants