Expose SVS Vamana defaults + debug info [MOD-9759]#702
Conversation
meiravgri
left a comment
There was a problem hiding this comment.
I think we should have some runtime getters for default values, since some may depend on other runtime parameters.
src/VecSim/vec_sim_common.h
Outdated
| #define SVS_VAMANA_DEFAULT_ALPHA_IP 0.95f | ||
| #define SVS_VAMANA_DEFAULT_GRAPH_MAX_DEGREE 32 | ||
| #define SVS_VAMANA_DEFAULT_CONSTRUCTION_WINDOW_SIZE 200 | ||
| #define SVS_VAMANA_DEFAULT_MAX_CANDIDATE_POOL_SIZE (3 * SVS_VAMANA_DEFAULT_CONSTRUCTION_WINDOW_SIZE) |
There was a problem hiding this comment.
what if SVS_VAMANA_DEFAULT_CONSTRUCTION_WINDOW_SIZE is defined by the user differently than the default, but SVS_VAMANA_DEFAULT_MAX_CANDIDATE_POOL_SIZE is not?
There was a problem hiding this comment.
Right, in this case, the info report will not be correct. I believe we should not expose SVS_VAMANA_DEFAULT_MAX_CANDIDATE_POOL_SIZE at all for users and have it only as an internal param, but let's wait for svs devlopers decision
There was a problem hiding this comment.
Do you mean it shouldn’t be configurable at all, or just not exposed via the info command?
In any case, if we do expose it in the info output, I think it should reflect the correct runtime-derived value. So maybe the options are:
- Don’t expose it now and possibly introduce a runtime fetch API later if needed, or
- Remove it altogether if it's not meant to be user-facing.
There was a problem hiding this comment.
We will not expose it now, and if we decide to leave these params as user-facing facing we will expose them to info via API call (with all others)
…VAMANA_DEFAULT_USE_SEARCH_HISTORY Co-authored-by: meiravgri <[email protected]>
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #702 +/- ##
=======================================
Coverage 96.85% 96.86%
=======================================
Files 122 122
Lines 7340 7359 +19
=======================================
+ Hits 7109 7128 +19
Misses 231 231 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
src/VecSim/vec_sim_tiered_index.h
Outdated
| { | ||
| // Compute the union of the labels from both indices. | ||
| std::vector<size_t> labels_union; | ||
| auto flat_labels = this->frontendIndex->getLabelsSet(); | ||
| auto main_labels = this->backendIndex->getLabelsSet(); | ||
| labels_union.reserve(flat_labels.size() + main_labels.size()); | ||
| std::set_union(flat_labels.begin(), flat_labels.end(), main_labels.begin(), | ||
| main_labels.end(), std::back_inserter(labels_union)); | ||
| info.commonInfo.indexLabelCount = labels_union.size(); | ||
| } | ||
|
|
There was a problem hiding this comment.
following the comment on indexLabelCount api, can this code block replace the existing implmneation of indexLabelCount of TieredHNSW and TieredSVS?
There was a problem hiding this comment.
not relevant since not adding the new API. In tiered indexes, this should be guarded anyway
rfsaliev
left a comment
There was a problem hiding this comment.
Some comments to getLabelsSet()
| // labels in a tiered index. | ||
| template <typename DataType, typename DistType> | ||
| vecsim_stl::set<labelType> HNSWIndex_Single<DataType, DistType>::getLabelsSet() const { | ||
| std::shared_lock<std::shared_mutex> index_data_lock(this->indexDataGuard); |
There was a problem hiding this comment.
should we add a lock to getLabelsSet in svs as well?
There was a problem hiding this comment.
I don't think so, the main index guard that is assumed to be held from the tiered index should be enough (we never access internal svs locks from our code), right @rfsaliev ?
| // labels in a tiered index, and caller should hold the index data guard. | ||
| // labels in a tiered index. | ||
| inline vecsim_stl::set<labelType> getLabelsSet() const override { | ||
| std::shared_lock<std::shared_mutex> index_data_lock(this->indexDataGuard); |
There was a problem hiding this comment.
consider using the api lockSharedIndexDataGuard
There was a problem hiding this comment.
Though we are using that api in many places, I believe it is a better practice to acquire locks like this where possible
Describe the changes in the pull request
Expose SVS Vamana default build parameters to be used by redisearch in ft.info, and implement the
debugInfoAPI for svs index and tiered svs.This includes adding unit tests for the svs factory to validate that both default and non-default parameters are passed properly, and refining the debug info call for the tiered index to compute the effective label set count internally, thereby avoiding the need to reacquire locks in
indexLabelCount(). The last change was possible due to moving thegetLabelSetvirtual call to the abstract index level (all derived indexes were already implementing this call)Mark if applicable