Skip to content

MOD-11335: Merge feature-RRF branch into master#6874

Merged
nafraf merged 43 commits intomasterfrom
nafraf_merge-feature-RRF-to-master
Sep 23, 2025
Merged

MOD-11335: Merge feature-RRF branch into master#6874
nafraf merged 43 commits intomasterfrom
nafraf_merge-feature-RRF-to-master

Conversation

@nafraf
Copy link
Collaborator

@nafraf nafraf commented Sep 22, 2025

Describe the changes in the pull request

Merge feature-RRFbranch into master

Note:

  • In cachedVars, lastLk renamed to lastLookup
  • HasScorer() was modified to receive AREQ
  • Fix to serializeVectorNormalizer because it failed on Alpine.
  • Metric GetMetric(const QueryIterator *it); was unused, then it was removed
  • The changes to reply.c created by PR MOD-10250: RRF RESP3 format response #6388 were considered risky and were not included in this PR.

raz-mon and others added 30 commits June 24, 2025 10:36
Sync feature branch with master
Add pipeline activation and result serialization functions for `FT.HYBRID`
* wrap common areq members usage with functions

* fix compilation

* small fixes

* revert some small changes

* use AREQ_RequestFlags in a few other places

* additional changes

* minor changes

* Apply suggestions from code review

use right address

* code review comments
…tency (#6413)

* Add index locking/freeing mechanism for index-space consistency

* Verify lock is released at the end of the depleter's logic

* Fix locking functions

* Set take-lock = false

* Address review and add tests with locking mechanism

* Add tests with index synchronization

* Move released var to sync

* Add important note

* Remove leftovers

* Address review
* wrap common areq members usage with functions

* fix compilation

* small fixes

* revert some small changes

* use AREQ_RequestFlags in a few other places

* additional changes

* minor changes

* initial commit

common initialization

use functions to access and manipulate common areq members

use local variables to reduce duplicate function calls

compilation fixes

revert line

remove hybrid request

* fix test

* add AggregationPipeline_Free

* move more code to AggregationPipeline_Free

* fix compilation + max search result

* code review comments

* fix compilation

* address code review comments, better pipeline and params separation for pipeline construction

* fix type name

* remove comment

* add cmake changes

* bring back old clean logic

* move AREQ_BuildPipeline to original position in header

* fix compilation post merge

* fix some tests seg faults

* remove hybrid request, not supposed to be in this branch

* add missing newlines

* bring back missing freeing of areq members

* minor api changes

* copy the context before releasing the loader

* add comments

* code review comments

* code review - rename

* accidently added hybrid request - removing
* wrap common areq members usage with functions

* use AREQ_RequestFlags in a few other places

* common initialization

* use functions to access and manipulate common areq members

* use local variables to reduce duplicate function calls

* add AggregationPipeline_Free

* move more code to AggregationPipeline_Free

* fix compilation + max search result

* address code review comments, better pipeline and params separation for pipeline construction

* add cmake changes

* bring back old clean logic

* move AREQ_BuildPipeline to original position in header

* fix compilation post merge

* fix some tests seg faults

* remove hybrid request, not supposed to be in this branch

* add missing newlines

* bring back missing freeing of areq members

* minor api changes

* copy the context before releasing the loader

* add comments

* add hybrid request

* add loader to areq + tests

* add cmake changes

* align hybrid request api

* fix tests - use the AGGPLN inside the hybrid request for consistent linke list pointer comparison

* remove RLookup_CloneKey API

* remove pln member from params

* fix memory leak

* Moved loadDtor from aggregate_request to aggregate_plan and remove duplicate implementation of it

* Remove redundant VERBATIM flag setting in CreateTestAREQ function

* Add TODO comments for sorter

---------

Co-authored-by: [email protected] <[email protected]>
* Initial FT.HYBRID parsing

* Fix parseSearchSubquery compilation error

* Fix search parsing and endless loop in error handling

* Improve cpp tests argv creation and fix

* Parse tail AGGPlan & rebase

* All tests pass

* Add database selection functions to redismock

* Fix parseCombine() memory leak

* Add HybridPipelineParams* to HybridRequest  struct

* Fix AggPlan handling in parser

* Update copyright notice and licensing information in parse_hybrid files

* Change AREQ pipeline to pointer

* Apply context to vector requests and refactoring assertions in tests

* Copy reqConfig to searchRequest and vectorRequest.

* Cleanup

* MOD-10303: Hybrid Request (#6404)

* wrap common areq members usage with functions

* use AREQ_RequestFlags in a few other places

* common initialization

* use functions to access and manipulate common areq members

* use local variables to reduce duplicate function calls

* add AggregationPipeline_Free

* move more code to AggregationPipeline_Free

* fix compilation + max search result

* address code review comments, better pipeline and params separation for pipeline construction

* add cmake changes

* bring back old clean logic

* move AREQ_BuildPipeline to original position in header

* fix compilation post merge

* fix some tests seg faults

* remove hybrid request, not supposed to be in this branch

* add missing newlines

* bring back missing freeing of areq members

* minor api changes

* copy the context before releasing the loader

* add comments

* add hybrid request

* add loader to areq + tests

* add cmake changes

* align hybrid request api

* fix tests - use the AGGPLN inside the hybrid request for consistent linke list pointer comparison

* remove RLookup_CloneKey API

* remove pln member from params

* fix memory leak

* Moved loadDtor from aggregate_request to aggregate_plan and remove duplicate implementation of it

* Remove redundant VERBATIM flag setting in CreateTestAREQ function

* Add TODO comments for sorter

---------



* CR: Add docs for a few functions

* CR: extract HYBRID_REQUEST_NUM_SUBQUERIES constant

* Fix pipeline reference in parseQueryArgs and AREQ_BuildPipeline functions

* Refactor to don't use AREQ to parse tail

* Refactor pipeline handling to use direct struct access instead of pointers

* Remove unused test util

* Revert "Remove unused test util"

This reverts commit 7130515.

* Revert "Revert "Remove unused test util""

This reverts commit d7e54d2.

* Remove multiline request test

This test only tested ArgvList initialization with multiline command

* Revert AREQ_AddRequestFlags() and AREQ_RemoveRequestFlags() changes

* Fix maxResultsLimit assignment logic

* Update HybridRequest_Free and tests to reflect scoring context is freed by hybrid merger

* Add test for valid input with parameters

* Implement context cleanup scheduling for background thread processing

---------

Co-authored-by: nafraf <[email protected]>
Co-authored-by: kei-nan <[email protected]>
Co-authored-by: [email protected] <[email protected]>
* Implement automatic @__key and @__score return for hybrid search (MOD-10249)

- Add CreateImplicitLoadStep() function to create implicit LOAD with @__key and @__score fields
- Modify HybridRequest_BuildPipeline() to detect missing LOAD and add implicit one after merge operation
- Implicit LOAD is added only to tail pipeline for final merged results, not individual request pipelines
- Add proper RLookupKey creation following existing test helper patterns
- Add 2 focused tests following existing naming conventions:
  - testHybridRequestImplicitLoad: verifies implicit LOAD creation
  - testHybridRequestExplicitLoadPreserved: ensures explicit LOAD steps are preserved
- Update baseline test expectations to match correct behavior (no loaders in individual pipelines)
- All tests passing, implementation complete and ready for use

Implements PRD requirement 1.12: automatic doc_id and combined_score return when no LOAD field specified

* Fix hybrid search implicit LOAD field names

Change from @__key/@__score to doc_id/combined_score

* Remove empty lines

* Remove redundant test

* Fix hybrid search implicit LOAD field names

- Change from @__key/@__score to doc_id/combined_score
- Add HYBRID_IMPLICIT_DOC_ID and HYBRID_IMPLICIT_COMBINED_SCORE constants
- Update RLookupKey mapping and test comments
- Preserves both automatic and explicit LOAD behaviors

All tests passing.

* Fix implicit LOAD args setup for proper field name representation

- Add both doc_id and combined_score fields to implicit LOAD step
- Set up ArgsCursor with proper field names for serialization and display
- Use static array for args.objs to avoid memory management issues with loadDtor
- Include rmutil/args.h for ArgsCursor type definitions

All tests passing.

* Fix lookup to first

Last is after groupby steps, if exist. Currently it's not supported in hybrid subqueries so this was not noticed.

* Move implicit load from tail to hybrid subqueries

* Add implicit score if no LOAD was supplied

* Fix memory leak

* Rename key field

* CR: Remove comments and extend tests

* Add scoreKey validation for HybridMerger

* Fix merge
* Implement automatic @__key and @__score return for hybrid search (MOD-10249)

- Add CreateImplicitLoadStep() function to create implicit LOAD with @__key and @__score fields
- Modify HybridRequest_BuildPipeline() to detect missing LOAD and add implicit one after merge operation
- Implicit LOAD is added only to tail pipeline for final merged results, not individual request pipelines
- Add proper RLookupKey creation following existing test helper patterns
- Add 2 focused tests following existing naming conventions:
  - testHybridRequestImplicitLoad: verifies implicit LOAD creation
  - testHybridRequestExplicitLoadPreserved: ensures explicit LOAD steps are preserved
- Update baseline test expectations to match correct behavior (no loaders in individual pipelines)
- All tests passing, implementation complete and ready for use

Implements PRD requirement 1.12: automatic doc_id and combined_score return when no LOAD field specified

* Fix hybrid search implicit LOAD field names

Change from @__key/@__score to doc_id/combined_score

* Remove empty lines

* Remove redundant test

* Fix hybrid search implicit LOAD field names

- Change from @__key/@__score to doc_id/combined_score
- Add HYBRID_IMPLICIT_DOC_ID and HYBRID_IMPLICIT_COMBINED_SCORE constants
- Update RLookupKey mapping and test comments
- Preserves both automatic and explicit LOAD behaviors

All tests passing.

* Fix implicit LOAD args setup for proper field name representation

- Add both doc_id and combined_score fields to implicit LOAD step
- Set up ArgsCursor with proper field names for serialization and display
- Use static array for args.objs to avoid memory management issues with loadDtor
- Include rmutil/args.h for ArgsCursor type definitions

All tests passing.

* Fix lookup to first

Last is after groupby steps, if exist. Currently it's not supported in hybrid subqueries so this was not noticed.

* Move implicit load from tail to hybrid subqueries

* Add implicit score if no LOAD was supplied

* Fix memory leak

* Rename key field

* CR: Remove comments and extend tests

* Add implicit sort-by-score for hybrid search and refactor test boilerplate

- Add implicit sort-by-score step in HybridRequest_BuildPipeline when no explicit SORTBY is provided
- Ensures hybrid search results are automatically sorted by computed hybrid scores (descending order)
- Only adds implicit sorting when AGPLN_FindStep finds no PLN_T_ARRANGE step in tail pipeline
- Uses maxResultsLimit from aggregation params, falls back to DEFAULT_LIMIT if not specified
- Follows same pattern as regular search pipelines for consistent behavior

Test improvements:
- Extract boilerplate setup/cleanup code into reusable helper methods in HybridRequestTest class
- Reduce test code size by 23% (743 → 568 lines) while maintaining full functionality
- Add automatic resource management with tracked cleanup in TearDown()
- Improve test readability by focusing on test logic rather than setup boilerplate
- Add comprehensive helper methods for hybrid request creation, scoring contexts, and pipeline verification
- Fix TODOs in existing tests to reflect new implicit sort-by-score functionality

* Add scoreKey validation for HybridMerger

* Fix merge

* Add SORTBY 0 to disable implicit sorting

* revert change

* Fix SORTBY 0 and add test

* CR: Removed unused code

* CR: Removed wrong comment

* CR: Preserve FT.AGGREGATE behavior with SORTBY 0

We introduce new IsHybrid request flag for that, and apply the new
behavior only if it's a hybrid request

* CR: Comments

* CR: Replace RP create with AGG Plan

* Fix tests
…uery (#6583)

* Add scorer and sorter by setting search flag for first subquery

* Change search subquery type to Hybrid

* Fix missing parenthesis

Co-authored-by: Raz Monsonego <[email protected]>

* Fix missing parenthesis

* Clarify query type access

`reqflag & (QEXEC_F_IS_SEARCH | QEXEC_F_IS_HYBRID)` might be confusing because it suggests a query can have mixed type with the 2 flags set at the same time.
 Using `IsSearch(req) || IsHybrid(req)` to better convey that a request is either Hybrid or Search.

* Change type to Hybrid Search Subquery

Introducing a new separate type for Search Subquery of Hybrid.
Hybrid type is still left for the tail pipeline construction and output.

* Rename Hybrid query type to Hybrid Tail

* Fix bug

* Update src/aggregate/aggregate.h

Co-authored-by: nafraf <[email protected]>

* CR: Fix bug

* Fix flow test

---------

Co-authored-by: Raz Monsonego <[email protected]>
Co-authored-by: nafraf <[email protected]>
Fix SORTBY 0 by not creating Arrange AggPlanStep
* Add default Window/Limit handling

Also fix rrf k parsing to support decimal values.

* Remove unneeded comments

* Fix memory leaks in hybrid search: properly free HybridScoringContext

- Fix HybridRequest_Free to properly call HybridScoringContext_Free
- Remove manual HybridScoringContext_Free calls from all ParseHybridTest tests
- Establish proper ownership: HybridRequest owns and cleans up its scoring context
- Resolves use-after-free errors and memory leaks detected by AddressSanitizer
- All 52 ParseHybridTest and 13 HybridDefaultsTest now pass with ASAN enabled

* Update src/hybrid/parse_hybrid.c

Co-authored-by: Raz Monsonego <[email protected]>

* Remove accidentally committed files (TODO.md and CMakeLists.txt)

* Update src/hybrid/parse_hybrid.c

Co-authored-by: Raz Monsonego <[email protected]>

* CR: fix default RRF K, Add a test

* Add KNN K validation on Defaults tests

* Remove scope creeping test

I'm not sure whats the expected behavior on LIMIT 0 0 on Hybrid. counting results seems non obvious (should we count results that appear in both subqueries once or twice?) and missing the purpose of this command.
I'm removing this test and if this requires further handling lets make it on separate task to avoid scope creep

* Remove originalWeights

Also make num of weights parameter to parseCombine, preparing for multi vectors in Hybrid

* Move window back only to RRF

* fix CR comments

* CR cosmetics

- remove malloc failure check
- rename explicit load search for clarity

* Apply KNN K <= WINDOW constraint internally

* Refactor

* Fix Tests Refactor

* Fix Tests

* Another test refactor to extract boilerplate

* Trying to fix leaks

* Trying to fix leaks

* Move HybridScoringContext ownership from request to Hybrid Merger

* Release scoring ctx on parsing tests

Now that Merger RP owns scoring ctx and HybridRequest doesn't free it,
release it manually in parsing tests as the merger is not created there

* Remove redundant test cleanup

* Remove Merger dependency on parent RP

* SIZE_T_MAX not defined, use SIZE_MAX instead

---------

Co-authored-by: Raz Monsonego <[email protected]>
Add tests utils for FT.HYBRID flow tests and add tests for FT.HYBRID + KNN + RRF.
size_t *prefixesOffset; // Prefixes offset
CursorConfig *cursorConfig; // Cursor configuration
const char ***requiredFields; // Required fields
size_t *maxSearchResults; // Maximum search results
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not convinced on using pointers here

if (internal_only && isClusterCoord(debug_req)) {
if (results_count == 0) {
if (!(debug_req->r.reqflags & QEXEC_F_IS_CURSOR)) {
if (!(AREQ_RequestFlags(&debug_req->r) & QEXEC_F_IS_CURSOR)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here we could have a AREQ_IsCursor?

QEFlags reqFlags = AREQ_RequestFlags(req);

if (req->reqflags & QEXEC_F_IS_SEARCH) {
if (reqFlags & QEXEC_F_IS_SEARCH) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could also use these kind of methods here


bool hasTimeoutError(QueryError *err);

bool ShouldReplyWithError(QueryError *status, RSTimeoutPolicy timeoutPolicy, bool isProfile);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add docstrings?

return results;
}

void startPipelineCommon(RSTimeoutPolicy timeoutPolicy, struct timespec *timeout,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I miss some documentation, for instance why is Results not filled if No TimeOutPolicy FAIL?


static int parseSortby(PLN_ArrangeStep *arng, ArgsCursor *ac, QueryError *status, int isLegacy) {
static int parseSortby(PLN_ArrangeStep *arng, ArgsCursor *ac, QueryError *status, ParseAggPlanContext *papCtx) {
bool isLegacy = *papCtx->reqflags & QEXEC_F_IS_SEARCH;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be interesting a comment explaining with IS_SEARCH implies Legacy

int Param_DictAdd(dict *d, const char *name, const char *value, size_t value_len, QueryError *status);
const char *Param_DictGet(dict *d, const char *name, size_t *value_len, QueryError *status);
void Param_DictFree(dict *);
dict *Param_DictClone(dict *source);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if it exists, but adding a specific unit test would be nice

@JoanFM
Copy link
Collaborator

JoanFM commented Sep 23, 2025

@nafraf I added my review. Some of the comments do not seem quite important. Feel free to ignore those that you think may not be good given the time constraint

}
}

std::array<RedisModuleCtx*, NumberOfContexts> redisContexts;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need an array of Module contexts?

json_add_close(reply, " ]");
}
int count = _RedisModule_Reply_Pop(reply);
if (reply->resp3) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems risky. I would perhaps try to push this in a separate PR

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the changes to reply.c were reverted.

cursor[bot]

This comment was marked as outdated.

JoanFM
JoanFM previously approved these changes Sep 23, 2025
Copy link
Collaborator

@JoanFM JoanFM left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve waiting for some follow Ups for minor things

@nafraf nafraf added this pull request to the merge queue Sep 23, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Sep 23, 2025
@nafraf nafraf added this pull request to the merge queue Sep 23, 2025
Merged via the queue into master with commit e017b6c Sep 23, 2025
39 checks passed
@nafraf nafraf deleted the nafraf_merge-feature-RRF-to-master branch September 23, 2025 19:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants