Skip to content

HasVector filtering condition#5303

Merged
generall merged 8 commits intodevfrom
has-vector-condition
Oct 25, 2024
Merged

HasVector filtering condition#5303
generall merged 8 commits intodevfrom
has-vector-condition

Conversation

@generall
Copy link
Copy Markdown
Member

@generall generall commented Oct 23, 2024

Related to: #5264 and #2737

Also required for faster selection of samples in Matrix API

This PR includes:

  • In order to check for vector presence, optimized query should have access to vector storage. So StructPayloadIndex now have vector_storages field.
  • API change for REST and gRPC
  • Query optimization logic previously was implemented as standalone functions, but it caused "too many argumants" problem. This PR makes functions like optimize_filter part of the StructPayloadIndex, so they can access context via self., which allows to simplify interface significantly

Pro tip: review by commits

@generall generall changed the title include vector storage into struct vector index HasVector filtering condition Oct 24, 2024
@generall generall marked this pull request as ready for review October 24, 2024 11:37
Comment thread lib/segment/src/index/query_optimization/optimizer.rs
Comment thread lib/segment/src/index/struct_payload_index.rs Outdated
Comment thread lib/segment/src/index/struct_payload_index.rs Outdated

for (vector_name, vector_config) in &segment_config.vector_data {
let vector_index_path = get_vector_index_path(temp_dir.path(), vector_name);
let mut vector_storages_arc = HashMap::new();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one is not an Arc

id_tracker: Arc<AtomicRefCell<IdTrackerSS>>,
/// Vector storages for each field, used for `has_vector` condition
#[allow(dead_code)]
vector_storages: HashMap<VectorName, Arc<AtomicRefCell<VectorStorageEnum>>>,
Copy link
Copy Markdown
Contributor

@coszio coszio Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the payload index only reads from the vector storage, can't this just be an Arc<VectorStorageEnum>?

Ah, nvm, it is the same refcell that is used in other structs which DO modify it, so removing it does not make sense

@generall generall merged commit 633d996 into dev Oct 25, 2024
@generall generall deleted the has-vector-condition branch October 25, 2024 16:47
timvisee pushed a commit that referenced this pull request Nov 8, 2024
* include vector storage into struct vector index

* implement has_vector

* generate schemas

* refactor query filter optimizer so avoid too many function arguments

* test + fix for sparse vectors

* Update lib/segment/src/index/struct_payload_index.rs

Co-authored-by: Jojii <[email protected]>

* Update lib/segment/src/index/query_optimization/optimizer.rs

Co-authored-by: Jojii <[email protected]>

* fmt

---------

Co-authored-by: Jojii <[email protected]>
@timvisee timvisee mentioned this pull request Nov 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants