Skip to content

[MOD-11406] Switch to rust II#6958

Merged
chesedo merged 25 commits intomasterfrom
switch_to_rust_ii
Nov 5, 2025
Merged

[MOD-11406] Switch to rust II#6958
chesedo merged 25 commits intomasterfrom
switch_to_rust_ii

Conversation

@chesedo
Copy link
Collaborator

@chesedo chesedo commented Oct 1, 2025

Describe the changes in the pull request

Switch inverted index implementation to use the Rust impl. The C benchmarks are also removed since the static library for the C inverted index is no longer compiled with this change.

NOTE

This PR changes the behaviour around turning numeric compression on / off. The old C code used to implement this change immediately all new and existing numeric inverted indices. This is not the case with the Rust code. For the Rust code it will only take effect for new indices. Existing indices have to be dropped and recreated to have the config change. See this commit for an example: 97678be

Post work

After this merge is done, we should complete RED-175099 to recover the memory increase.

Mark if applicable

  • This PR introduces API changes
  • This PR introduces serialization changes

Note

Switch to the Rust inverted index, update GC/iterator APIs and FFI, remove C benches, and align tests/memory expectations with the new implementation.

  • Core/Build:
    • Replace C inverted_index with Rust implementation; remove src/inverted_index from build/link and include paths.
    • Update redisearch target links; drop inverted_index object.
  • GC & Index APIs:
    • Update GC delta API (InvertedIndex_GcDelta_Read/Free/Apply) and stats fields (entries_removed/bytes_freed/bytes_allocated/blocks_ignored).
    • Change IndexRepairParams field names; pass blockCount to InvertedIndex_BlocksSummaryFree.
    • Use const IndexBlock * refs; adjust HLL/cardinality reset logic and block index math.
  • Iterators/Numeric:
    • Relax iterator precondition (allow empty indexes).
    • Numeric median helper now receives num_entries.
  • Rust FFI/Deps:
    • Add/export inverted_index_ffi; update build to generate headers from headers/inverted_index.h.
    • Use NewInvertedIndex_Ex in benchers; wire through redisearch_rs re-exports.
  • Benchmarks:
    • Remove C-based encode/decode benches and related build scripts; keep Rust-only paths.
  • Tests:
    • Adjust numerous memory/size expectations, GC counters, block counts, and buffer growth behavior to match Rust index.
    • Update decoder context initialization and debug outputs accordingly.

Written by Cursor Bugbot for commit 1b6baac. This will update automatically on new commits. Configure here.

@fcostaoliveira
Copy link
Contributor

fcostaoliveira commented Oct 1, 2025

Automated performance analysis summary

This comment was automatically generated given there is performance data available.

In summary:

  • Detected a total of 17 stable tests between versions.
  • Detected a total of 1 highly unstable benchmarks (1 baseline).
  • Detected a total of 1 regressions bellow the regression water line 8.0%.

You can check a comparison in detail via the grafana link

Performance Regressions and Issues - Comparison between master and switch_to_rust_ii.

Time Period from 30 days ago. (environment used: oss-standalone)

Test Case Baseline master (median obs. +- std.dev) Comparison switch_to_rust_ii (median obs. +- std.dev) % change (higher-better) Note
ftsb-1M-nyc_taxis-hashes-load 33027 +- 3.1% (7 datapoints) 29959 -9.3% REGRESSION
vecsim-arxiv-titles-384-angular-filters-m16-ef-128-numeric-filter 151 +- 15.1% UNSTABLE (7 datapoints) 164 8.5% UNSTABLE (baseline high variance); client: Latency decreased 7.8% (baseline CV=16.9%); neither server nor client side confirms regression
Tests with No Significant Changes (17 tests)

Tests with No Significant Changes

Test Case Baseline master (median obs. +- std.dev) Comparison switch_to_rust_ii (median obs. +- std.dev) % change (higher-better) Note
ftsb-10K-enwiki_abstract-hashes-term-prefix 9306 +- 4.2% (7 datapoints) 8607 -7.5% potential REGRESSION
ftsb-10K-enwiki_abstract-hashes-term-suffix 32 +- 0.3% (7 datapoints) 32 0.6% No Change
ftsb-10K-enwiki_abstract-hashes-term-suffix-withsuffixtrie 32 +- 0.2% (7 datapoints) 31 -0.9% No Change
ftsb-10K-enwiki_abstract-hashes-term-wildcard 64 +- 0.1% (7 datapoints) 64 -0.0% No Change
ftsb-10K-enwiki_pages-hashes-fulltext-mixed_simple-1word-query_write_1_to_read_20.yml 62 +- 0.3% (7 datapoints) 62 0.2% No Change
ftsb-10K-enwiki_pages-hashes-load 64834 +- 5.5% (7 datapoints) 63958 -1.4% No Change
ftsb-10K-multivalue-numeric-json 8.0 +- 0.1% (7 datapoints) 8 -0.3% No Change
ftsb-10K-singlevalue-numeric-json 8.0 +- 0.1% (7 datapoints) 8 0.1% No Change
ftsb-1K-enwiki_abstract-hashes-term-contains 32 +- 0.2% (7 datapoints) 32 0.0% No Change
ftsb-1M-enwiki_abstract-hashes-fulltext-2word-intersection-query 63 +- 0.4% (7 datapoints) 63 0.3% No Change
ftsb-1M-enwiki_abstract-hashes-fulltext-2word-intersection-query-non-sortable 63 +- 0.1% (7 datapoints) 63 0.2% No Change
ftsb-1M-enwiki_abstract-hashes-fulltext-2word-union-query 3296 +- 7.8% (7 datapoints) 3313 0.5% No Change
ftsb-1M-enwiki_abstract-hashes-fulltext-2word-union-query-non-sortable 63 +- 0.4% (7 datapoints) 63 0.1% No Change
ftsb-1M-enwiki_abstract-hashes-fulltext-simple-1word-query 63 +- 0.3% (7 datapoints) 63 0.5% No Change
ftsb-1M-enwiki_abstract-hashes-load 23432 +- 3.4% (7 datapoints) 22931 -2.1% No Change
ftsb-1M-nyc_taxis-ftadd-load 30958 +- 3.8% (7 datapoints) 29130 -5.9% potential REGRESSION
search-ftsb-1M-enwiki_abstract-hashes-fulltext-simple-1word-query-non-sortable 32 +- 0.2% (7 datapoints) 32 0.1% No Change

@chesedo chesedo force-pushed the switch_to_rust_ii branch 2 times, most recently from 4ec20de to 5ef14b6 Compare October 2, 2025 09:20
@chesedo chesedo changed the base branch from master to rust_ii_fork_rw October 2, 2025 09:20
}

// Test Revalidate returns VALIDATE_MOVED when the lastDocId was deleted from the index
TEST_P(InvIndIteratorRevalidateTest, RevalidateAfterDocumentDeleted) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is quite useful. Why is it being removed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test requires getting a mutable reference to the inverted index from the index reader. The Rust index reader only holds a borrowed reference to the inverted index. So I can't implement the FFI needed for this test.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alonre24 This is testing an iterator. So there is not an equivalent Rust tests for it (the Rust iterators does have this kind of test but that code is not in production yet)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My problem is things like line 1020 which tries to mutate an index while it will only have *const access to it

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the Rust iterator tests we have for revalidation

Base automatically changed from rust_ii_fork_rw to master October 20, 2025 22:19
@chesedo chesedo force-pushed the switch_to_rust_ii branch 2 times, most recently from 90b53ab to 049a70c Compare October 22, 2025 06:05
@chesedo chesedo requested review from GlenDC and gdesmott October 22, 2025 06:09
GlenDC
GlenDC previously approved these changes Oct 22, 2025
@chesedo chesedo force-pushed the switch_to_rust_ii branch 3 times, most recently from 5c05b42 to c91a117 Compare October 27, 2025 14:32
@gdesmott
Copy link
Collaborator

Not a blocker for this branch, but can't we now remove IndexReader_HasSeeker and safely assume all readers can seek?

@chesedo
Copy link
Collaborator Author

chesedo commented Oct 30, 2025

Not a blocker for this branch, but can't we now remove IndexReader_HasSeeker and safely assume all readers can seek?

We can definitely do that as a cleanup tasks after this is merged

@chesedo chesedo force-pushed the switch_to_rust_ii branch 3 times, most recently from af78c1f to 9fd6f6e Compare October 31, 2025 14:14
This test requires a mutable access to the inverted index in a reader
and direct access to the repair block call. Both of these are not
possible so the test is removed.
The text inverted indices expect to have a field mask filter, not the
`None` case.
It is now possible to have an inverted index with no blocks in it.
@chesedo chesedo added this pull request to the merge queue Nov 5, 2025
Merged via the queue into master with commit a8e9b3c Nov 5, 2025
30 of 32 checks passed
@chesedo chesedo deleted the switch_to_rust_ii branch November 5, 2025 07:24
@chesedo chesedo changed the title Switch to rust ii [MOD-11406] Switch to rust II Nov 5, 2025
This was referenced Nov 5, 2025
raz-mon pushed a commit that referenced this pull request Nov 8, 2025
* Switch to Rust inverted index

* Remove encoder and decoder getter usage

It is no longer possible to get the encoders or decoders from the flags.
Instead they are coupled to the inverted index or its reader.

* Update `testNumericInverted`

* Update testIndexFlags

* Don't check buffer growth in test

There is no reason for the test to check the internals of these structures.

* Fix size test on flags

* Fix size test for numeric index

* Fix size test for tag index

* Fix sizes on LLapi

* Update fork tests to const

* C GC updates

* Update fork tests

* Don't free deltas twice

* Fix sizes in Python tests

* Recreate index to have compression turned on

* Remove impossible test

This test requires a mutable access to the inverted index in a reader
and direct access to the repair block call. Both of these are not
possible so the test is removed.

* Remove C benchmarks

* Move header out of future

* Remove from iterator benches

* Fix symbols for GC benchmark

* Update iterator benchmark FFI

* Get num_entries on index directly

* Trim bench dependencies

* Fix filters in CPP tests

The text inverted indices expect to have a field mask filter, not the
`None` case.

* Update debug assertions

It is now possible to have an inverted index with no blocks in it.
github-merge-queue bot pushed a commit that referenced this pull request Nov 9, 2025
* Fix BM25STD underflow wraparound

* Fix comment

* Fix documentation and tweak test

* Address review

* Remove old test

* Skip cluster on expiration test

* Fix leak in test

* Switch to rust ii (#6958)

* Switch to Rust inverted index

* Remove encoder and decoder getter usage

It is no longer possible to get the encoders or decoders from the flags.
Instead they are coupled to the inverted index or its reader.

* Update `testNumericInverted`

* Update testIndexFlags

* Don't check buffer growth in test

There is no reason for the test to check the internals of these structures.

* Fix size test on flags

* Fix size test for numeric index

* Fix size test for tag index

* Fix sizes on LLapi

* Update fork tests to const

* C GC updates

* Update fork tests

* Don't free deltas twice

* Fix sizes in Python tests

* Recreate index to have compression turned on

* Remove impossible test

This test requires a mutable access to the inverted index in a reader
and direct access to the repair block call. Both of these are not
possible so the test is removed.

* Remove C benchmarks

* Move header out of future

* Remove from iterator benches

* Fix symbols for GC benchmark

* Update iterator benchmark FFI

* Get num_entries on index directly

* Trim bench dependencies

* Fix filters in CPP tests

The text inverted indices expect to have a field mask filter, not the
`None` case.

* Update debug assertions

It is now possible to have an inverted index with no blocks in it.

* Partially fix totalDocLen metric

* Fix

* Make test not pass by accident

* Remove var

* Update totalDocLen on deletions as well

* Fix tests

* Fix assertions

* Clean up unused old dmd

* Add assertions to llapi

* Remove Delete API

* Skip unnecesary cluster tests

---------

Co-authored-by: Pieter <[email protected]>
redisearch-backport-pull-request bot pushed a commit that referenced this pull request Nov 9, 2025
* Fix BM25STD underflow wraparound

* Fix comment

* Fix documentation and tweak test

* Address review

* Remove old test

* Skip cluster on expiration test

* Fix leak in test

* Switch to rust ii (#6958)

* Switch to Rust inverted index

* Remove encoder and decoder getter usage

It is no longer possible to get the encoders or decoders from the flags.
Instead they are coupled to the inverted index or its reader.

* Update `testNumericInverted`

* Update testIndexFlags

* Don't check buffer growth in test

There is no reason for the test to check the internals of these structures.

* Fix size test on flags

* Fix size test for numeric index

* Fix size test for tag index

* Fix sizes on LLapi

* Update fork tests to const

* C GC updates

* Update fork tests

* Don't free deltas twice

* Fix sizes in Python tests

* Recreate index to have compression turned on

* Remove impossible test

This test requires a mutable access to the inverted index in a reader
and direct access to the repair block call. Both of these are not
possible so the test is removed.

* Remove C benchmarks

* Move header out of future

* Remove from iterator benches

* Fix symbols for GC benchmark

* Update iterator benchmark FFI

* Get num_entries on index directly

* Trim bench dependencies

* Fix filters in CPP tests

The text inverted indices expect to have a field mask filter, not the
`None` case.

* Update debug assertions

It is now possible to have an inverted index with no blocks in it.

* Partially fix totalDocLen metric

* Fix

* Make test not pass by accident

* Remove var

* Update totalDocLen on deletions as well

* Fix tests

* Fix assertions

* Clean up unused old dmd

* Add assertions to llapi

* Remove Delete API

* Skip unnecesary cluster tests

---------

Co-authored-by: Pieter <[email protected]>
(cherry picked from commit 25e65ef)
github-merge-queue bot pushed a commit that referenced this pull request Nov 9, 2025
* MOD-12234: Fix totalDocsLen updates (#7227)

* Fix BM25STD underflow wraparound

* Fix comment

* Fix documentation and tweak test

* Address review

* Remove old test

* Skip cluster on expiration test

* Fix leak in test

* Switch to rust ii (#6958)

* Switch to Rust inverted index

* Remove encoder and decoder getter usage

It is no longer possible to get the encoders or decoders from the flags.
Instead they are coupled to the inverted index or its reader.

* Update `testNumericInverted`

* Update testIndexFlags

* Don't check buffer growth in test

There is no reason for the test to check the internals of these structures.

* Fix size test on flags

* Fix size test for numeric index

* Fix size test for tag index

* Fix sizes on LLapi

* Update fork tests to const

* C GC updates

* Update fork tests

* Don't free deltas twice

* Fix sizes in Python tests

* Recreate index to have compression turned on

* Remove impossible test

This test requires a mutable access to the inverted index in a reader
and direct access to the repair block call. Both of these are not
possible so the test is removed.

* Remove C benchmarks

* Move header out of future

* Remove from iterator benches

* Fix symbols for GC benchmark

* Update iterator benchmark FFI

* Get num_entries on index directly

* Trim bench dependencies

* Fix filters in CPP tests

The text inverted indices expect to have a field mask filter, not the
`None` case.

* Update debug assertions

It is now possible to have an inverted index with no blocks in it.

* Partially fix totalDocLen metric

* Fix

* Make test not pass by accident

* Remove var

* Update totalDocLen on deletions as well

* Fix tests

* Fix assertions

* Clean up unused old dmd

* Add assertions to llapi

* Remove Delete API

* Skip unnecesary cluster tests

---------

Co-authored-by: Pieter <[email protected]>
(cherry picked from commit 25e65ef)

* Remove unrelated diffs

---------

Co-authored-by: Raz Monsonego <[email protected]>
Co-authored-by: Pieter <[email protected]>
raz-mon added a commit that referenced this pull request Nov 9, 2025
* Fix BM25STD underflow wraparound

* Fix comment

* Fix documentation and tweak test

* Address review

* Remove old test

* Skip cluster on expiration test

* Fix leak in test

* Switch to rust ii (#6958)

* Switch to Rust inverted index

* Remove encoder and decoder getter usage

It is no longer possible to get the encoders or decoders from the flags.
Instead they are coupled to the inverted index or its reader.

* Update `testNumericInverted`

* Update testIndexFlags

* Don't check buffer growth in test

There is no reason for the test to check the internals of these structures.

* Fix size test on flags

* Fix size test for numeric index

* Fix size test for tag index

* Fix sizes on LLapi

* Update fork tests to const

* C GC updates

* Update fork tests

* Don't free deltas twice

* Fix sizes in Python tests

* Recreate index to have compression turned on

* Remove impossible test

This test requires a mutable access to the inverted index in a reader
and direct access to the repair block call. Both of these are not
possible so the test is removed.

* Remove C benchmarks

* Move header out of future

* Remove from iterator benches

* Fix symbols for GC benchmark

* Update iterator benchmark FFI

* Get num_entries on index directly

* Trim bench dependencies

* Fix filters in CPP tests

The text inverted indices expect to have a field mask filter, not the
`None` case.

* Update debug assertions

It is now possible to have an inverted index with no blocks in it.

* Partially fix totalDocLen metric

* Fix

* Make test not pass by accident

* Remove var

* Update totalDocLen on deletions as well

* Fix tests

* Fix assertions

* Clean up unused old dmd

* Add assertions to llapi

* Remove Delete API

* Skip unnecesary cluster tests

---------

Co-authored-by: Pieter <[email protected]>
github-merge-queue bot pushed a commit that referenced this pull request Nov 10, 2025
MOD-12234: Fix totalDocsLen updates (#7227)

* Fix BM25STD underflow wraparound

* Fix comment

* Fix documentation and tweak test

* Address review

* Remove old test

* Skip cluster on expiration test

* Fix leak in test

* Switch to rust ii (#6958)

* Switch to Rust inverted index

* Remove encoder and decoder getter usage

It is no longer possible to get the encoders or decoders from the flags.
Instead they are coupled to the inverted index or its reader.

* Update `testNumericInverted`

* Update testIndexFlags

* Don't check buffer growth in test

There is no reason for the test to check the internals of these structures.

* Fix size test on flags

* Fix size test for numeric index

* Fix size test for tag index

* Fix sizes on LLapi

* Update fork tests to const

* C GC updates

* Update fork tests

* Don't free deltas twice

* Fix sizes in Python tests

* Recreate index to have compression turned on

* Remove impossible test

This test requires a mutable access to the inverted index in a reader
and direct access to the repair block call. Both of these are not
possible so the test is removed.

* Remove C benchmarks

* Move header out of future

* Remove from iterator benches

* Fix symbols for GC benchmark

* Update iterator benchmark FFI

* Get num_entries on index directly

* Trim bench dependencies

* Fix filters in CPP tests

The text inverted indices expect to have a field mask filter, not the
`None` case.

* Update debug assertions

It is now possible to have an inverted index with no blocks in it.

* Partially fix totalDocLen metric

* Fix

* Make test not pass by accident

* Remove var

* Update totalDocLen on deletions as well

* Fix tests

* Fix assertions

* Clean up unused old dmd

* Add assertions to llapi

* Remove Delete API

* Skip unnecesary cluster tests

---------

Co-authored-by: Pieter <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants