[feature-RRF] MOD-10224: Add vector filter validation by nafraf · Pull Request #6487 · RediSearch/RediSearch

nafraf · 2025-07-10T13:00:54Z

Description

This PR contains the same changes of #6376 but applied to feature-RRF branch.

For HYBRID command add:

Vector filter validation - cannot use weights, cannot use vector commands
Search query validation - cannot use vector commands

codecov · 2025-07-10T13:35:03Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (feature-RRF@e80f690). Learn more about missing BASE report.

Additional details and impacted files

@@              Coverage Diff               @@
##             feature-RRF    #6487   +/-   ##
==============================================
  Coverage               ?   88.60%           
==============================================
  Files                  ?      254           
  Lines                  ?    41893           
  Branches               ?     3630           
==============================================
  Hits                   ?    37119           
  Misses                 ?     4725           
  Partials               ?       49

Flag	Coverage Δ
flow	`80.68% <64.70%> (?)`
unit	`47.37% <100.00%> (?)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

* MOD-10030: Add background depleter result processor (#6325) Add RPDepleter * Sync feature branch with master (#6373) Sync feature branch with master * Prevent CI breakage when Rust 1.88.0 is released (#6375) (#6389) * MOD-10031: RPHybridMerger (#6317) * MOD-10250: RRF RESP3 format response (#6388) Add pipeline activation and result serialization functions for `FT.HYBRID` * MOD-10299: Wrap Common AREQ Members Usage With Functions (#6390) * wrap common areq members usage with functions * fix compilation * small fixes * revert some small changes * use AREQ_RequestFlags in a few other places * additional changes * minor changes * Apply suggestions from code review use right address * code review comments * MOD-10301: Add index locking/freeing mechanism for index-space consistency (#6413) * Add index locking/freeing mechanism for index-space consistency * Verify lock is released at the end of the depleter's logic * Fix locking functions * Set take-lock = false * Address review and add tests with locking mechanism * Add tests with index synchronization * Move released var to sync * Add important note * Remove leftovers * Address review * MOD-10254: Separate Aggregation Pipeline From AREQ (#6399) * wrap common areq members usage with functions * fix compilation * small fixes * revert some small changes * use AREQ_RequestFlags in a few other places * additional changes * minor changes * initial commit common initialization use functions to access and manipulate common areq members use local variables to reduce duplicate function calls compilation fixes revert line remove hybrid request * fix test * add AggregationPipeline_Free * move more code to AggregationPipeline_Free * fix compilation + max search result * code review comments * fix compilation * address code review comments, better pipeline and params separation for pipeline construction * fix type name * remove comment * add cmake changes * bring back old clean logic * move AREQ_BuildPipeline to original position in header * fix compilation post merge * fix some tests seg faults * remove hybrid request, not supposed to be in this branch * add missing newlines * bring back missing freeing of areq members * minor api changes * copy the context before releasing the loader * add comments * code review comments * code review - rename * accidently added hybrid request - removing * [feature-RRF] MOD-10224: Add vector filter validation (#6487) Apply PR 6376 to feature-RRF branch * MOD-10303: Hybrid Request (#6404) * wrap common areq members usage with functions * use AREQ_RequestFlags in a few other places * common initialization * use functions to access and manipulate common areq members * use local variables to reduce duplicate function calls * add AggregationPipeline_Free * move more code to AggregationPipeline_Free * fix compilation + max search result * address code review comments, better pipeline and params separation for pipeline construction * add cmake changes * bring back old clean logic * move AREQ_BuildPipeline to original position in header * fix compilation post merge * fix some tests seg faults * remove hybrid request, not supposed to be in this branch * add missing newlines * bring back missing freeing of areq members * minor api changes * copy the context before releasing the loader * add comments * add hybrid request * add loader to areq + tests * add cmake changes * align hybrid request api * fix tests - use the AGGPLN inside the hybrid request for consistent linke list pointer comparison * remove RLookup_CloneKey API * remove pln member from params * fix memory leak * Moved loadDtor from aggregate_request to aggregate_plan and remove duplicate implementation of it * Remove redundant VERBATIM flag setting in CreateTestAREQ function * Add TODO comments for sorter --------- Co-authored-by: [email protected] <[email protected]> * MOD-10041 Parse FT.HYBRID command (#6428) (#6551) * Initial FT.HYBRID parsing * Fix parseSearchSubquery compilation error * Fix search parsing and endless loop in error handling * Improve cpp tests argv creation and fix * Parse tail AGGPlan & rebase * All tests pass * Add database selection functions to redismock * Fix parseCombine() memory leak * Add HybridPipelineParams* to HybridRequest struct * Fix AggPlan handling in parser * Update copyright notice and licensing information in parse_hybrid files * Change AREQ pipeline to pointer * Apply context to vector requests and refactoring assertions in tests * Copy reqConfig to searchRequest and vectorRequest. * Cleanup * MOD-10303: Hybrid Request (#6404) * wrap common areq members usage with functions * use AREQ_RequestFlags in a few other places * common initialization * use functions to access and manipulate common areq members * use local variables to reduce duplicate function calls * add AggregationPipeline_Free * move more code to AggregationPipeline_Free * fix compilation + max search result * address code review comments, better pipeline and params separation for pipeline construction * add cmake changes * bring back old clean logic * move AREQ_BuildPipeline to original position in header * fix compilation post merge * fix some tests seg faults * remove hybrid request, not supposed to be in this branch * add missing newlines * bring back missing freeing of areq members * minor api changes * copy the context before releasing the loader * add comments * add hybrid request * add loader to areq + tests * add cmake changes * align hybrid request api * fix tests - use the AGGPLN inside the hybrid request for consistent linke list pointer comparison * remove RLookup_CloneKey API * remove pln member from params * fix memory leak * Moved loadDtor from aggregate_request to aggregate_plan and remove duplicate implementation of it * Remove redundant VERBATIM flag setting in CreateTestAREQ function * Add TODO comments for sorter --------- * CR: Add docs for a few functions * CR: extract HYBRID_REQUEST_NUM_SUBQUERIES constant * Fix pipeline reference in parseQueryArgs and AREQ_BuildPipeline functions * Refactor to don't use AREQ to parse tail * Refactor pipeline handling to use direct struct access instead of pointers * Remove unused test util * Revert "Remove unused test util" This reverts commit 7130515. * Revert "Revert "Remove unused test util"" This reverts commit d7e54d2. * Remove multiline request test This test only tested ArgvList initialization with multiline command * Revert AREQ_AddRequestFlags() and AREQ_RemoveRequestFlags() changes * Fix maxResultsLimit assignment logic * Update HybridRequest_Free and tests to reflect scoring context is freed by hybrid merger * Add test for valid input with parameters * Implement context cleanup scheduling for background thread processing --------- Co-authored-by: nafraf <[email protected]> Co-authored-by: kei-nan <[email protected]> Co-authored-by: [email protected] <[email protected]> * MOD-10249 Hybrid implicit load (#6529) * Implement automatic @__key and @__score return for hybrid search (MOD-10249) - Add CreateImplicitLoadStep() function to create implicit LOAD with @__key and @__score fields - Modify HybridRequest_BuildPipeline() to detect missing LOAD and add implicit one after merge operation - Implicit LOAD is added only to tail pipeline for final merged results, not individual request pipelines - Add proper RLookupKey creation following existing test helper patterns - Add 2 focused tests following existing naming conventions: - testHybridRequestImplicitLoad: verifies implicit LOAD creation - testHybridRequestExplicitLoadPreserved: ensures explicit LOAD steps are preserved - Update baseline test expectations to match correct behavior (no loaders in individual pipelines) - All tests passing, implementation complete and ready for use Implements PRD requirement 1.12: automatic doc_id and combined_score return when no LOAD field specified * Fix hybrid search implicit LOAD field names Change from @__key/@__score to doc_id/combined_score * Remove empty lines * Remove redundant test * Fix hybrid search implicit LOAD field names - Change from @__key/@__score to doc_id/combined_score - Add HYBRID_IMPLICIT_DOC_ID and HYBRID_IMPLICIT_COMBINED_SCORE constants - Update RLookupKey mapping and test comments - Preserves both automatic and explicit LOAD behaviors All tests passing. * Fix implicit LOAD args setup for proper field name representation - Add both doc_id and combined_score fields to implicit LOAD step - Set up ArgsCursor with proper field names for serialization and display - Use static array for args.objs to avoid memory management issues with loadDtor - Include rmutil/args.h for ArgsCursor type definitions All tests passing. * Fix lookup to first Last is after groupby steps, if exist. Currently it's not supported in hybrid subqueries so this was not noticed. * Move implicit load from tail to hybrid subqueries * Add implicit score if no LOAD was supplied * Fix memory leak * Rename key field * CR: Remove comments and extend tests * Add scoreKey validation for HybridMerger * Fix merge * MOD-10549 Hybrid implicit SORTBY (#6552) * Implement automatic @__key and @__score return for hybrid search (MOD-10249) - Add CreateImplicitLoadStep() function to create implicit LOAD with @__key and @__score fields - Modify HybridRequest_BuildPipeline() to detect missing LOAD and add implicit one after merge operation - Implicit LOAD is added only to tail pipeline for final merged results, not individual request pipelines - Add proper RLookupKey creation following existing test helper patterns - Add 2 focused tests following existing naming conventions: - testHybridRequestImplicitLoad: verifies implicit LOAD creation - testHybridRequestExplicitLoadPreserved: ensures explicit LOAD steps are preserved - Update baseline test expectations to match correct behavior (no loaders in individual pipelines) - All tests passing, implementation complete and ready for use Implements PRD requirement 1.12: automatic doc_id and combined_score return when no LOAD field specified * Fix hybrid search implicit LOAD field names Change from @__key/@__score to doc_id/combined_score * Remove empty lines * Remove redundant test * Fix hybrid search implicit LOAD field names - Change from @__key/@__score to doc_id/combined_score - Add HYBRID_IMPLICIT_DOC_ID and HYBRID_IMPLICIT_COMBINED_SCORE constants - Update RLookupKey mapping and test comments - Preserves both automatic and explicit LOAD behaviors All tests passing. * Fix implicit LOAD args setup for proper field name representation - Add both doc_id and combined_score fields to implicit LOAD step - Set up ArgsCursor with proper field names for serialization and display - Use static array for args.objs to avoid memory management issues with loadDtor - Include rmutil/args.h for ArgsCursor type definitions All tests passing. * Fix lookup to first Last is after groupby steps, if exist. Currently it's not supported in hybrid subqueries so this was not noticed. * Move implicit load from tail to hybrid subqueries * Add implicit score if no LOAD was supplied * Fix memory leak * Rename key field * CR: Remove comments and extend tests * Add implicit sort-by-score for hybrid search and refactor test boilerplate - Add implicit sort-by-score step in HybridRequest_BuildPipeline when no explicit SORTBY is provided - Ensures hybrid search results are automatically sorted by computed hybrid scores (descending order) - Only adds implicit sorting when AGPLN_FindStep finds no PLN_T_ARRANGE step in tail pipeline - Uses maxResultsLimit from aggregation params, falls back to DEFAULT_LIMIT if not specified - Follows same pattern as regular search pipelines for consistent behavior Test improvements: - Extract boilerplate setup/cleanup code into reusable helper methods in HybridRequestTest class - Reduce test code size by 23% (743 → 568 lines) while maintaining full functionality - Add automatic resource management with tracked cleanup in TearDown() - Improve test readability by focusing on test logic rather than setup boilerplate - Add comprehensive helper methods for hybrid request creation, scoring contexts, and pipeline verification - Fix TODOs in existing tests to reflect new implicit sort-by-score functionality * Add scoreKey validation for HybridMerger * Fix merge * Add SORTBY 0 to disable implicit sorting * revert change * Fix SORTBY 0 and add test * CR: Removed unused code * CR: Removed wrong comment * CR: Preserve FT.AGGREGATE behavior with SORTBY 0 We introduce new IsHybrid request flag for that, and apply the new behavior only if it's a hybrid request * CR: Comments * CR: Replace RP create with AGG Plan * Fix tests * MOD-10226: Implement parser for Hybrid Query VECTOR query params (#6482) * MOD-10633 Add scorer and sorter by setting search flag for first subquery (#6583) * Add scorer and sorter by setting search flag for first subquery * Change search subquery type to Hybrid * Fix missing parenthesis Co-authored-by: Raz Monsonego <[email protected]> * Fix missing parenthesis * Clarify query type access `reqflag & (QEXEC_F_IS_SEARCH | QEXEC_F_IS_HYBRID)` might be confusing because it suggests a query can have mixed type with the 2 flags set at the same time. Using `IsSearch(req) || IsHybrid(req)` to better convey that a request is either Hybrid or Search. * Change type to Hybrid Search Subquery Introducing a new separate type for Search Subquery of Hybrid. Hybrid type is still left for the tail pipeline construction and output. * Rename Hybrid query type to Hybrid Tail * Fix bug * Update src/aggregate/aggregate.h Co-authored-by: nafraf <[email protected]> * CR: Fix bug * Fix flow test --------- Co-authored-by: Raz Monsonego <[email protected]> Co-authored-by: nafraf <[email protected]> * MOD-10533: Sync RLookups - tail and upstreams (#6612) * MOD-10835 Fix SORTBY 0 by not creating Arrange AggPlanStep (#6635) Fix SORTBY 0 by not creating Arrange AggPlanStep * MOD-10550 MOD-10645 Window / Limit defaults (#6610) * Add default Window/Limit handling Also fix rrf k parsing to support decimal values. * Remove unneeded comments * Fix memory leaks in hybrid search: properly free HybridScoringContext - Fix HybridRequest_Free to properly call HybridScoringContext_Free - Remove manual HybridScoringContext_Free calls from all ParseHybridTest tests - Establish proper ownership: HybridRequest owns and cleans up its scoring context - Resolves use-after-free errors and memory leaks detected by AddressSanitizer - All 52 ParseHybridTest and 13 HybridDefaultsTest now pass with ASAN enabled * Update src/hybrid/parse_hybrid.c Co-authored-by: Raz Monsonego <[email protected]> * Remove accidentally committed files (TODO.md and CMakeLists.txt) * Update src/hybrid/parse_hybrid.c Co-authored-by: Raz Monsonego <[email protected]> * CR: fix default RRF K, Add a test * Add KNN K validation on Defaults tests * Remove scope creeping test I'm not sure whats the expected behavior on LIMIT 0 0 on Hybrid. counting results seems non obvious (should we count results that appear in both subqueries once or twice?) and missing the purpose of this command. I'm removing this test and if this requires further handling lets make it on separate task to avoid scope creep * Remove originalWeights Also make num of weights parameter to parseCombine, preparing for multi vectors in Hybrid * Move window back only to RRF * fix CR comments * CR cosmetics - remove malloc failure check - rename explicit load search for clarity * Apply KNN K <= WINDOW constraint internally * Refactor * Fix Tests Refactor * Fix Tests * Another test refactor to extract boilerplate * Trying to fix leaks * Trying to fix leaks * Move HybridScoringContext ownership from request to Hybrid Merger * Release scoring ctx on parsing tests Now that Merger RP owns scoring ctx and HybridRequest doesn't free it, release it manually in parsing tests as the merger is not created there * Remove redundant test cleanup * Remove Merger dependency on parent RP * SIZE_T_MAX not defined, use SIZE_MAX instead --------- Co-authored-by: Raz Monsonego <[email protected]> * MOD-10533: Merge SearchResults (#6579) * MOD-10408: Hybrid command handler - part 1 (#6639) * MOD-10227: Disable tag scoring for FT.HYBRID (#6707) * MOD-11012: Fix VSIM filter not working in hybrid queries (#6708) * MOD-11132: Fix direct blob issue in FT.HYBRID (#6744) * MOD-10407: Handle warnings and errors of the subquery pipelines (#6737) * MOD-11140: Implement range vector subquery filter as intersect (#6751) * MOD-10050: Add hybrid search flow tests - part 1 (#6711) Add tests utils for FT.HYBRID flow tests and add tests for FT.HYBRID + KNN + RRF. * MOD-11047: Rename RRF K parameter to CONSTANT (#6745) * MOD-11144: Fix FT.HYBRID GROUPBY bug (#6776) * MOD-10047: Prepare hybrid parsing for cursors (#6770) * MOD-11049: Use BM25STD.NORM for text subquery for LINEAR fusion (#6789) * Revert BM25STD.NORM as default (#6822) * Revert "MOD-11049: Use BM25STD.NORM for text subquery for LINEAR fusion (#6789)" This reverts commit 5d41d9e. * Restore the explicit and implicit scorer tests for FT.HYBRID * newline EOF * Add flow test for FT.HYBRID with explicit scorer * newline EOF * MOD-11276, MOD-11043: Reconstruct RlookupRow when merging (#6801) * Implement RLookup_AddKeysFrom and RLookupRow_TransferFields functions with corresponding tests * Remove RLookupKey_Clone and RLookup_CloneInto and their respective tests * Integrate hybrid_lookup_context to FT.HYBRID flow * Remove leftovers and redundencies * Refactor merge search results * Refactor rlookup add&transfer fucntions, Refactor their tests Fix hybrid merger tests (include added parameter) * enable YIELD_DISTANCE_AS cpp tests * Add tests/pytests/test_hybrid_distance.py * Fix memory leak * Cleanups * Fix test * Enhance RLookup_AddKeysFrom to combine caller flags with source key properties, preserving non-transient flags. Add tests for F_HIDDEN flag handling, verifying preservation and override behavior. * CR comments * CR comments * CI fix * CR comments * Cleanup of lookup context in test_cpp_hybridmerger.cpp * Fix memory leak * CR comments * spellcheck * Refactor RLookup tests to use helper functions for keys initialization and values verification * fix parsing of FT.HYBRID results in pytest * MOD-10050: Tests with dialects and multithread (#6777) * MOD-11048, MOD-11157: Add default normalizers to vector search (#6820) * Implement RLookup_AddKeysFrom and RLookupRow_TransferFields functions with corresponding tests * Remove RLookupKey_Clone and RLookup_CloneInto and their respective tests * Integrate hybrid_lookup_context to FT.HYBRID flow * Remove leftovers and redundencies * Refactor merge search results * Refactor rlookup add&transfer fucntions, Refactor their tests Fix hybrid merger tests (include added parameter) * enable YIELD_DISTANCE_AS cpp tests * Add tests/pytests/test_hybrid_distance.py * Fix memory leak * Cleanups * Fix test * Enhance RLookup_AddKeysFrom to combine caller flags with source key properties, preserving non-transient flags. Add tests for F_HIDDEN flag handling, verifying preservation and override behavior. * Implement RPVectorNormalizer and PLN_VectorNormalizerStep * Intermediate work of vector normalization * Add GetVecSimMetricFromVectorField function to extract metrics from vector field specifications * Make default vector distance field hidden from response unless explicitly requested Vector normalization step now reads distance from RLookup, normalizes it, and writes back Always normalize vectors for both RFF and LINEAR scoring methods Update test expectations to account for normalized L2 distances * CR comments * Implement RPVectorNormalizer and PLN_VectorNormalizerStep * Intermediate work of vector normalization * Add GetVecSimMetricFromVectorField function to extract metrics from vector field specifications * Make default vector distance field hidden from response unless explicitly requested Vector normalization step now reads distance from RLookup, normalizes it, and writes back Always normalize vectors for both RFF and LINEAR scoring methods Update test expectations to account for normalized L2 distances * Add tests/pytests/test_hybrid_vector_normalizer.py - test the yielded distance againt a calculated one * Fix comments * Fix tests to match new RLookups solution in FT.HYBRID * Fix hybrid parsing tests as default scorefield is now enabled * CR comments * CR comments * Move flag setting logic from aggregate_request to parse_hybrid * Minor refactor of GetVecSimMetricFromVectorField * new line EOF * Add ['INT8', 'UINT8'] to test_hybrid_vector_normalizer * CR comments * CR comments * Rename YIELD_DISTANCE_AS to YIELD_SCORE_AS in FT.HYBRID VSIM * Fix test_hybrid_multithread * CR comnents * Fix test_hybrid_dialects * Minor fix * MOD-11258: Add parameter count validation to COMBINE LINEAR clause (#6826) * MOD-10047: Add Shard Cursor Support For _FT.HYBRID (#6697) * hybrid request and hybrid parsing changes * important fixes to tests and code * revert unit-tests file changes * fix memory leaks in unit tests * fix memory leaks in flow tests * code review comments * initial commit * small fix * implemented basic functionality * bring back commented out code, remove isHybrid * post rebase on feature-RRF * fix compilation post rebase * trying to reduce amount of changes * fix compilation post rebase * some general fixes * try and fix cpp-tests paths issue * fix the hybrid tests, bring back support for running cpp-tests with gdb * rename test file * fixing the new cursor flow * added flow tests, some minor fixes * rebase fixes * remove code which went in accidently * revert again unit tests changes * don't run internal command in cluster environment * try and fix memory leak * increase coverage for when cursor limit is reached, fix request lifetime management. * fix typo * remove free in execute since now lifetime is managed by ref counting. * fix memory leak * fix leaks in error flow * fix StartCursors signature * fix query error leaks * compilation fixes post rebase * remove some redundant changes * remove some more redundant changes * code review comments * comments + result limit issue workaround * minor fix * code review comments * try and fix tests * move query test to be a cpp test since it needs redis mock, fix query error clear function * code review comments, fix leak in query test, add flow test * fix typo * small fix * fix if * don't rely on done_depleting since it is set by depleter thread * hybrid debug request should now also free the hybrid request * fix code review comment * MOD-11335: Sync master and feature-RRF branch (#6862) * Fix after code review * Update HasScorer() * Revert reply.c changes * Fix serializeVectorNormalizer() --------- Co-authored-by: Raz Monsonego <[email protected]> Co-authored-by: Itzikvaknin <[email protected]> Co-authored-by: kei-nan <[email protected]> Co-authored-by: [email protected] <[email protected]> Co-authored-by: ofiryanai <[email protected]>

Apply PR 6376 to feature-RRF branch

cf1449a

github-actions bot added the size:M label Jul 10, 2025

nafraf changed the title ~~Apply PR 6376 to feature-RRF branch~~ [feature-RRF] MOD-10224: Add vector filter validation Jul 10, 2025

nafraf requested a review from Itzikvaknin July 10, 2025 13:04

nafraf marked this pull request as ready for review July 10, 2025 13:04

Itzikvaknin approved these changes Jul 10, 2025

View reviewed changes

nafraf merged commit 00868c8 into feature-RRF Jul 10, 2025
23 of 25 checks passed

nafraf deleted the nafraf_vector-filter-parser-to-feature-RRF branch July 10, 2025 13:38

nafraf mentioned this pull request Jul 10, 2025

MOD-10224: Add vector filter validation #6376

Closed

2 tasks

Itzikvaknin mentioned this pull request Jul 21, 2025

MOD-10226: Implement parser for Hybrid Query VECTOR query params #6482

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[feature-RRF] MOD-10224: Add vector filter validation #6487

[feature-RRF] MOD-10224: Add vector filter validation #6487
nafraf merged 1 commit intofeature-RRFfrom
nafraf_vector-filter-parser-to-feature-RRF

nafraf commented Jul 10, 2025 •

edited

Loading

Uh oh!

codecov bot commented Jul 10, 2025 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

nafraf commented Jul 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Uh oh!

codecov bot commented Jul 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

nafraf commented Jul 10, 2025 •

edited

Loading

codecov bot commented Jul 10, 2025 •

edited

Loading