Performance improvement in indexBulkFields - [MOD-8093] #5186

Merged
GuyAv46 merged 9 commits into master from guyav-perf_indexBulkFields on Nov 10, 2024
@GuyAv46 GuyAv46 commented Nov 9, 2024

Describe the changes in the pull request

This PR removes a redundant caching mechanism from indexBulkFields. The mechanism is no longer used, yet it takes a long time to initialize.

The cache served two purposes: caching field keys and caching inverted indexes. The first is no longer relevant, and the second only helps with bulk indexing, which we no longer do.

Initializing the cache means setting up a large block of memory, which takes a large percentage of the indexing time and brings no benefit.
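
To make the cost argument concrete, here is a minimal sketch of the two shapes of the indexing path. It is not the RediSearch code: every name (`SketchBulkCache`, `SKETCH_MAX_FIELDS`, both functions) and the cache layout are assumptions made up for illustration. It only shows that clearing a fixed-size cache is paid on every call, regardless of how many fields the document actually has.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical capacity, chosen only for illustration. */
#define SKETCH_MAX_FIELDS 1024

/* Hypothetical cache slot: one per possible field in the index. */
typedef struct {
  void *fieldKey;  /* stand-in for a cached field key */
  void *invIndex;  /* stand-in for a cached inverted index */
} SketchCacheSlot;

typedef struct {
  SketchCacheSlot slots[SKETCH_MAX_FIELDS];
} SketchBulkCache;

/* Cached shape: the whole cache is cleared on every call, even when the
 * document touches only a few fields. Returns the number of bytes that
 * were initialized before any indexing work could start. */
size_t sketch_setup_cost_with_cache(size_t numDocFields) {
  assert(numDocFields <= SKETCH_MAX_FIELDS);
  SketchBulkCache cache;
  memset(&cache, 0, sizeof(cache));
  /* Only the first numDocFields slots are ever used. */
  for (size_t i = 0; i < numDocFields; i++) {
    cache.slots[i].fieldKey = &cache.slots[i];
  }
  return sizeof(cache);
}

/* Uncached shape: nothing to set up, so the fixed cost disappears. */
size_t sketch_setup_cost_without_cache(size_t numDocFields) {
  (void)numDocFields;
  return 0;
}
```

In this sketch a document with 3 fields and one with 100 fields pay the same setup cost in the cached shape, which is the kind of overhead the PR description attributes to the removed mechanism.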

Note:

This is a performance fix only: the bug does not affect the correctness of any search.

Future work:

Remove the need for "formatted keys": stop using the spec's dictionary for fields and store the relevant data in the field spec instead. Today we usually extract the field data in three steps: first obtain the field spec by its name, then get the formatted Redis string from it, and lastly perform a dictionary lookup by the formatted key name.
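
The difference between the current lookup chain and the proposed one can be sketched as follows. This is an illustration under stated assumptions, not the actual RediSearch data structures: the `Sketch*` types, the `"idx:%s"` key format, the linear-scan "dictionary", and the use of an `int` as the field data are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical field spec; fieldData is where the proposal would keep the data. */
typedef struct {
  const char *name;
  int fieldData;
} SketchFieldSpec;

/* Hypothetical dictionary entry, keyed by the formatted key name. */
typedef struct {
  const char *formattedKey;
  int fieldData;
} SketchDictEntry;

static const SketchFieldSpec sketchSpecs[] = {{"title", 10}, {"price", 20}};
static const SketchDictEntry sketchDict[] = {{"idx:title", 10}, {"idx:price", 20}};

/* Step 1 in both variants: obtain the field spec by its name. */
const SketchFieldSpec *sketch_get_spec(const char *name) {
  assert(name != NULL);
  for (size_t i = 0; i < sizeof(sketchSpecs) / sizeof(sketchSpecs[0]); i++) {
    if (strcmp(sketchSpecs[i].name, name) == 0) return &sketchSpecs[i];
  }
  return NULL;
}

/* Current chain: spec -> formatted key string -> dictionary lookup. */
int sketch_lookup_via_formatted_key(const char *name) {
  const SketchFieldSpec *fs = sketch_get_spec(name);
  if (fs == NULL) return -1;
  char formatted[64];
  snprintf(formatted, sizeof(formatted), "idx:%s", fs->name);
  for (size_t i = 0; i < sizeof(sketchDict) / sizeof(sketchDict[0]); i++) {
    if (strcmp(sketchDict[i].formattedKey, formatted) == 0) {
      return sketchDict[i].fieldData;
    }
  }
  return -1;
}

/* Proposed chain: the data lives on the field spec, so one lookup is enough. */
int sketch_lookup_direct(const char *name) {
  const SketchFieldSpec *fs = sketch_get_spec(name);
  return fs != NULL ? fs->fieldData : -1;
}
```

Both functions return the same data for a known field; the direct variant simply skips the string formatting and the second lookup.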

Mark if applicable

  • This PR introduces API changes
  • This PR introduces serialization changes

@GuyAv46 GuyAv46 changed the title Perf index bulk fields Perf index bulk fields - [MOD-8093] Nov 10, 2024
@GuyAv46 GuyAv46 changed the title Perf index bulk fields - [MOD-8093] Performance improvement in indexBulkFields - [MOD-8093] Nov 10, 2024
@GuyAv46 GuyAv46 requested a review from raz-mon November 10, 2024 07:26
@GuyAv46 GuyAv46 marked this pull request as ready for review November 10, 2024 07:26
@GuyAv46 GuyAv46 requested review from alonre24 and oshadmi November 10, 2024 07:27
codecov bot commented Nov 10, 2024

Codecov Report

Attention: Patch coverage is 80.00000% with 12 lines in your changes missing coverage. Please review.

Project coverage is 86.50%. Comparing base (a93aa64) to head (1504d54).
Report is 1 commits behind head on master.

Files with missing lines    Patch %    Lines
src/document.c              63.63%     8 Missing ⚠️
src/fork_gc.c               50.00%     3 Missing ⚠️
src/debug_commands.c        90.00%     1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5186      +/-   ##
==========================================
+ Coverage   86.43%   86.50%   +0.07%     
==========================================
  Files         192      192              
  Lines       34784    34686      -98     
==========================================
- Hits        30066    30006      -60     
+ Misses       4718     4680      -38     
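
The percentages in the diff above can be re-derived from the hit and line counts. The helper below is a hypothetical sketch (the function name is not from any tool); it computes coverage in hundredths of a percent with integer division. That the report truncates rather than rounds is an inference from these numbers, not something the report states.

```c
#include <assert.h>

/* Coverage in hundredths of a percent, truncated (not rounded).
 * Example: 8650 means 86.50%. */
long long sketch_coverage_hundredths(long long hits, long long lines) {
  assert(lines > 0 && hits >= 0 && hits <= lines);
  return hits * 10000LL / lines;
}
```

With the counts from the table, `sketch_coverage_hundredths(30066, 34784)` gives 8643 (86.43%) and `sketch_coverage_hundredths(30006, 34686)` gives 8650 (86.50%), a difference of 7 hundredths, which matches the reported +0.07%. The miss counts are also consistent: 34784 - 30066 = 4718 and 34686 - 30006 = 4680.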


alonre24 previously approved these changes Nov 10, 2024
@GuyAv46 GuyAv46 added this pull request to the merge queue Nov 10, 2024
@GuyAv46 GuyAv46 removed this pull request from the merge queue due to a manual request Nov 10, 2024
Merged via the queue into master with commit dfd463f Nov 10, 2024
@GuyAv46 GuyAv46 deleted the guyav-perf_indexBulkFields branch November 10, 2024 13:21
@redisearch-backport-pull-request
Contributor

Backport failed for 2.8, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 2.8
git worktree add -d .worktree/backport-5186-to-2.8 origin/2.8
cd .worktree/backport-5186-to-2.8
git switch --create backport-5186-to-2.8
git cherry-pick -x dfd463fe089bf2f0ef1bc495f206a6925df72ff7

@redisearch-backport-pull-request
Contributor

Backport failed for 2.6, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 2.6
git worktree add -d .worktree/backport-5186-to-2.6 origin/2.6
cd .worktree/backport-5186-to-2.6
git switch --create backport-5186-to-2.6
git cherry-pick -x dfd463fe089bf2f0ef1bc495f206a6925df72ff7

@redisearch-backport-pull-request
Contributor

Backport failed for 2.10, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 2.10
git worktree add -d .worktree/backport-5186-to-2.10 origin/2.10
cd .worktree/backport-5186-to-2.10
git switch --create backport-5186-to-2.10
git cherry-pick -x dfd463fe089bf2f0ef1bc495f206a6925df72ff7

@redisearch-backport-pull-request
Contributor

Backport failed for 8.0, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 8.0
git worktree add -d .worktree/backport-5186-to-8.0 origin/8.0
cd .worktree/backport-5186-to-8.0
git switch --create backport-5186-to-8.0
git cherry-pick -x dfd463fe089bf2f0ef1bc495f206a6925df72ff7

GuyAv46 added a commit that referenced this pull request Nov 10, 2024
* initial cleanup

* cleanup tag

* cleanup geoshape

* cleanup numeric

* more cleanup

* clean bulk data object

* missed code cleanup

* improve vecsim delete doc flow

* another vecsim improvement

(cherry picked from commit dfd463f)
Comment on lines -68 to -71
RedisModule_ModuleTypeSetValue(*idxKey, GeometryIndexType, idx);
return idx;
}
if (RedisModule_ModuleTypeGetType(*idxKey) == GeometryIndexType) {

@GuyAv46 Can we also remove the global GeometryIndexType?

github-merge-queue bot pushed a commit that referenced this pull request Nov 11, 2024
* Performance improvement in indexBulkFields - [MOD-8093] (#5186)

* initial cleanup

* cleanup tag

* cleanup geoshape

* cleanup numeric

* more cleanup

* clean bulk data object

* missed code cleanup

* improve vecsim delete doc flow

* another vecsim improvement

(cherry picked from commit dfd463f)

* fixes for 2.6
github-merge-queue bot pushed a commit that referenced this pull request Nov 11, 2024
* Performance improvement in indexBulkFields - [MOD-8093] (#5186)

* initial cleanup

* cleanup tag

* cleanup geoshape

* cleanup numeric

* more cleanup

* clean bulk data object

* missed code cleanup

* improve vecsim delete doc flow

* another vecsim improvement

(cherry picked from commit dfd463f)

* fixes for 2.8
github-merge-queue bot pushed a commit that referenced this pull request Nov 11, 2024
* Performance improvement in indexBulkFields - [MOD-8093] (#5186)

* initial cleanup

* cleanup tag

* cleanup geoshape

* cleanup numeric

* more cleanup

* clean bulk data object

* missed code cleanup

* improve vecsim delete doc flow

* another vecsim improvement

(cherry picked from commit dfd463f)

* remove const
github-merge-queue bot pushed a commit that referenced this pull request Nov 11, 2024
* Performance improvement in indexBulkFields - [MOD-8093] (#5186)

* initial cleanup

* cleanup tag

* cleanup geoshape

* cleanup numeric

* more cleanup

* clean bulk data object

* missed code cleanup

* improve vecsim delete doc flow

* another vecsim improvement

(cherry picked from commit dfd463f)

* fix inverted index

* remove const