Skip to content

MOD-9985 Refactoring of C Code for the RSSortingVector Port#6312

Merged
DarthB merged 9 commits intomasterfrom
rp-sortvec-crefactor
Jun 17, 2025
Merged

MOD-9985 Refactoring of C Code for the RSSortingVector Port#6312
DarthB merged 9 commits intomasterfrom
rp-sortvec-crefactor

Conversation

@DarthB
Copy link
Contributor

@DarthB DarthB commented Jun 13, 2025

Describe the changes in the pull request

  1. Introduce inline functions to avoid direct field accesses, such that Rust functions could be called at that place.
  2. Switch from one complex variant-based function to multiple type-safe functions to put values into the RSSortingVector
  3. Remove unnecessary argument from SortingVector_RdbLoad function
  4. Add an access point for the Increment and Decrement Functions of RSValue to be used by Rust code too.

Extracted from: #6311

Mark if applicable

  • This PR introduces API changes
  • This PR introduces serialization changes

@DarthB DarthB force-pushed the rp-sortvec-crefactor branch from 95c19b3 to 46f0fd3 Compare June 13, 2025 15:34
src/value.h Outdated
}
}

void RSValue_IncrRef_Rust(RSValue* v);
Copy link
Collaborator

@JoanFM JoanFM Jun 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it may be better not to name them Rust, but call the static one with some _ underscore to imply "private"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, having two variants of a function raises questions.
Instead of a _ prefix, a _inl postfix is more descriptive, what do you think?

Addressed in 365cd88

@codecov
Copy link

codecov bot commented Jun 16, 2025

Codecov Report

Attention: Patch coverage is 85.41667% with 7 lines in your changes missing coverage. Please review.

Project coverage is 88.84%. Comparing base (119d11a) to head (cc2e2c4).
Report is 7 commits behind head on master.

Files with missing lines Patch % Lines
src/document.c 68.75% 5 Missing ⚠️
src/doc_table.c 0.00% 1 Missing ⚠️
src/sortable.c 91.66% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6312      +/-   ##
==========================================
- Coverage   88.90%   88.84%   -0.07%     
==========================================
  Files         241      242       +1     
  Lines       40793    40796       +3     
  Branches     3435     3435              
==========================================
- Hits        36266    36244      -22     
- Misses       4491     4516      +25     
  Partials       36       36              
Flag Coverage Δ
flow 82.18% <85.41%> (-0.17%) ⬇️
unit 46.32% <35.41%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

src/value.h Outdated
Comment on lines 152 to 164
// Use RSValue_IncrRef_inl for performance in C; RSValue_IncrRef for FFI
static inline RSValue *RSValue_IncrRef_inl(RSValue *v) {
__atomic_fetch_add(&v->refcount, 1, __ATOMIC_RELAXED);
return v;
}
void RSValue_IncrRef(RSValue* v);

#define RSValue_Decref(v) \
if (!__atomic_sub_fetch(&(v)->refcount, 1, __ATOMIC_RELAXED)) { \
RSValue_Free(v); \
// Use RSValue_DecrRef_inl for performance in C; RSValue_IncrRef for FFI
static inline void RSValue_DecrRef_inl(RSValue* v) {
if (__atomic_sub_fetch(&(v)->refcount, 1, __ATOMIC_RELAXED) == 0) {
RSValue_Free(v);
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would keep the original name for these two functions, using a more verbose name for the FFI ones, since those will only be used through bindings in Rust.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More generally, do we truly need two definitions? The C compiler may inline these functions even if not marked as static inline. A benchmark run might clarify if the change is required.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean the ones started by the action:run_benchmark label?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 51 to 53
size_t len = RSSortingVector_Length(r->rowdata.sv);
if ((kk->flags & RLOOKUP_F_SVSRC) && (r->rowdata.sv && len > kk->svidx)) {
return RSSortingVector_Get(r->rowdata.sv, kk->svidx);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move the length access after the null check on rowdata.sv.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 55 to 61
static inline size_t RSSortingVector_Length(const RSSortingVector* vec) {
if (vec) {
return vec->len;
} else {
return 0;
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have this defaulting behaviour for the null case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DarthB DarthB force-pushed the rp-sortvec-crefactor branch from be743d2 to cc2e2c4 Compare June 16, 2025 08:56
@fcostaoliveira
Copy link
Contributor

fcostaoliveira commented Jun 16, 2025

Automated performance analysis summary

This comment was automatically generated given there is performance data available.

In summary:

  • Detected a total of 23 stable tests between versions.
  • Detected a total of 8 highly unstable benchmarks.
  • Detected a total of 5 improvements above the improvement water line.
  • Detected a total of 4 regressions bellow the regression water line 5.0.

You can check a comparison in detail via the grafana link

Comparison between master and rp-sortvec-crefactor.

Time Period from 30 days ago. (environment used: oss-standalone)

Test Case Baseline master (median obs. +- std.dev) Comparison rp-sortvec-crefactor (median obs. +- std.dev) % change (higher-better) Note
ftsb-10K-enwiki_abstract-hashes-fulltext-sortby 84 +- 1.2% (7 datapoints) 89.00 5.9% IMPROVEMENT
ftsb-10K-enwiki_abstract-hashes-term-prefix 8703 +- 8.5% (7 datapoints) 9176.00 5.4% waterline=8.5%. potential IMPROVEMENT
ftsb-10K-enwiki_abstract-hashes-term-suffix 2724 +- 1.0% (7 datapoints) 2708.00 -0.6% No Change
ftsb-10K-enwiki_abstract-hashes-term-suffix-withsuffixtrie 85617 +- 4.6% (7 datapoints) 83462.00 -2.5% No Change
ftsb-10K-enwiki_abstract-hashes-term-wildcard 14423 +- 10.2% UNSTABLE (7 datapoints) 15389.00 6.7% UNSTABLE (very high variance)
ftsb-10K-enwiki_pages-hashes-fulltext-mixed_simple-1word-query_write_1_to_read_20.yml 1152 +- 3.4% (7 datapoints) 1099.00 -4.6% potential REGRESSION
ftsb-10K-enwiki_pages-hashes-load 60097 +- 5.8% (7 datapoints) 61835.00 2.9% waterline=5.8%. No Change
ftsb-10K-multivalue-numeric-json 983 +- 1.0% (7 datapoints) 951.00 -3.2% potential REGRESSION
ftsb-10K-singlevalue-numeric-json 448 +- 0.5% (7 datapoints) 445.00 -0.6% No Change
ftsb-1K-enwiki_abstract-hashes-term-contains 2279 +- 1.1% (7 datapoints) 2258.00 -0.9% No Change
ftsb-1M-enwiki_abstract-hashes-fulltext-2word-intersection-query 430 +- 13.7% UNSTABLE (7 datapoints) 334.00 -22.4% UNSTABLE (very high variance)
ftsb-1M-enwiki_abstract-hashes-fulltext-2word-intersection-query-non-sortable 27 +- 38.6% UNSTABLE (7 datapoints) 41.00 49.4% UNSTABLE (very high variance)
ftsb-1M-enwiki_abstract-hashes-fulltext-2word-union-query 3874 +- 9.2% (7 datapoints) 3478.00 -10.2% waterline=9.2%. REGRESSION
ftsb-1M-enwiki_abstract-hashes-fulltext-2word-union-query-non-sortable 1147 +- 8.1% (7 datapoints) 1283.00 11.8% waterline=8.1%. IMPROVEMENT
ftsb-1M-enwiki_abstract-hashes-fulltext-simple-1word-query 847 +- 20.1% UNSTABLE (7 datapoints) 986.00 16.5% UNSTABLE (very high variance)
ftsb-1M-enwiki_abstract-hashes-load 23710 +- 4.3% (7 datapoints) 23967.00 1.1% No Change
ftsb-1M-nyc_taxis-ftadd-load 29082 +- 2.1% (7 datapoints) 29060.00 -0.1% No Change
ftsb-1M-nyc_taxis-hashes-load 32084 +- 1.7% (7 datapoints) 32102.00 0.1% No Change
search-aggregate-post-filter-simple.yml 110910 +- 4.9% (7 datapoints) 122019.00 10.0% IMPROVEMENT
search-filtering-tag-numeric 261 +- 12.6% UNSTABLE (7 datapoints) 275.00 5.2% UNSTABLE (very high variance)
search-filtering-tag-numeric-filter-pipeline 25576 +- 1.3% (7 datapoints) 24938.00 -2.5% No Change
search-ftsb-10K-enwiki_abstract-hashes-term-withoutsuffix-trie 50751 +- 3.2% (7 datapoints) 48169.00 -5.1% REGRESSION
search-ftsb-10K-enwiki_abstract-hashes-term-withsuffix-trie 48409 +- 2.6% (7 datapoints) 48945.00 1.1% No Change
search-ftsb-1700K-docs-union-iterators-q3 8.1 +- 1.1% (7 datapoints) 8.20 1.4% No Change
search-ftsb-1M-enwiki_abstract-hashes-fulltext-simple-1word-query-non-sortable 159 +- 4.9% (7 datapoints) 168.00 5.2% IMPROVEMENT
search-ftsb-370K-docs-union-iterators-q4 8.3 +- 1.0% (7 datapoints) 8.30 -0.8% No Change
search-ftsb-5200K-docs-union-iterators-q1 0.84 +- 0.6% (7 datapoints) 0.85 1.2% No Change
search-ftsb-5500K-docs-union-iterators-q2 1.2 +- 1.5% (7 datapoints) 1.20 0.0%
search-geo 217 +- 2.7% (7 datapoints) 225.00 4.1% potential IMPROVEMENT
search-high-cardinality-negation-term-baseline 30 +- 3.0% (7 datapoints) 28.00 -7.2% REGRESSION
search-high-cardinality-negation-term-comparison_union_all_other_terms 8.2 +- 2.0% (7 datapoints) 8.80 7.1% IMPROVEMENT
search-numeric 2413 +- 52.4% UNSTABLE (7 datapoints) 4179.00 73.2% UNSTABLE (very high variance)
search-numeric-optimize 14269 +- 1.3% (7 datapoints) 14151.00 -0.8% No Change
search-numeric-sortby 2460 +- 44.1% UNSTABLE (7 datapoints) 4897.00 99.1% UNSTABLE (very high variance)
search-numeric-sortby-desc 2453 +- 40.1% UNSTABLE (7 datapoints) 2467.00 0.5% UNSTABLE (very high variance)
search-numeric-sortby-desc-optimize 26 +- 5.4% (7 datapoints) 26.00 -1.4% waterline=5.4%. No Change
search-numeric-sortby-optimize 24 +- 5.4% (7 datapoints) 25.00 1.4% waterline=5.4%. No Change
vecsim-arxiv-titles-384-angular-filters-m16-ef-128-fulltext-filter 605 +- 5.2% (7 datapoints) 625.00 3.3% waterline=5.2%. potential IMPROVEMENT
vecsim-arxiv-titles-384-angular-filters-m16-ef-128-numeric-filter 167 +- 6.7% (7 datapoints) 167.00 0.2% waterline=6.7%. No Change
vecsim-arxiv-titles-384-angular-filters-m16-ef-128-tag-filter 72152 +- 3.4% (7 datapoints) 67320.00 -6.7% REGRESSION

Copy link
Collaborator

@raz-mon raz-mon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@DarthB DarthB added this pull request to the merge queue Jun 17, 2025
Merged via the queue into master with commit fa57fd2 Jun 17, 2025
30 checks passed
@DarthB DarthB deleted the rp-sortvec-crefactor branch June 17, 2025 13:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants