Skip to content

Fix string embedding decision with keys#2087

Merged
ranshid merged 1 commit intovalkey-io:unstablefrom
poiuj:fix-embedding
May 15, 2025
Merged

Fix string embedding decision with keys#2087
ranshid merged 1 commit intovalkey-io:unstablefrom
poiuj:fix-embedding

Conversation

@poiuj
Copy link
Contributor

@poiuj poiuj commented May 14, 2025

This commit fixes the size calculation for embedded strings with keys in createStringObjectWithKeyAndExpire(). The previous implementation used a fixed size of 3 bytes for the key's header, which was incorrect. The new implementation properly calculates the required size based on the key's length and SDS type.

Added a unit test (test_embedded_string_with_key) to verify that:

  1. Short values with long keys are properly encoded as embedded strings
  2. Values that would cause the total size to exceed 64 bytes are encoded as raw

Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, you're right!

So what's the impact of this bug? We sometimes use embedded strings when the total allocation size needed is 65 or 66?

@poiuj
Copy link
Contributor Author

poiuj commented May 14, 2025

That's right. And when we need 65 or 66 bytes, we actually get 80 bytes as this is the next jemalloc bucket.

Copy link
Member

@ranshid ranshid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Vadym!

The change LGTM. added some minor addition which IMO makes the code more clear.

src/object.c Outdated
size += sdsReqSize(key_len, sdsReqType(key_len)) + 1; /* 1 byte for prefixed sds hdr size */
}
size += (expire != -1) * sizeof(long long);
size += 4 + len; /* embstr header (3) + nullterm (1) */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although no problem here, I would love to see this part changed as well:
size += sdsReqSize(len, SDS_TYPE_8)

Suggested change
size += 4 + len; /* embstr header (3) + nullterm (1) */
size += sdsReqSize(len, SDS_TYPE_8); /* embstr header (3) + nullterm (1) */

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and in that case, we don't need the comment on the right.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Let met update the PR.

This commit fixes the size calculation for embedded strings with keys in
createStringObjectWithKeyAndExpire(). The previous implementation used a
fixed size of 3 bytes for the key's header, which was incorrect. The new
implementation properly calculates the required size based on the key's
length and SDS type.

Added a unit test (test_embedded_string_with_key) to verify that:
1. Short values with long keys are properly encoded as embedded strings
2. Values that would cause the total size to exceed 64 bytes are encoded as raw

Signed-off-by: Vadym Khoptynets <[email protected]>
@ranshid
Copy link
Member

ranshid commented May 15, 2025

@zuiderkwast I think this should be backported right?

@zuiderkwast
Copy link
Contributor

zuiderkwast commented May 15, 2025

@zuiderkwast I think this should be backported right?

We can if you want, but I don't think it's very important. We save 8 bytes in very few specific cases. It's not a regression because 8.1 saves memory anyway compared too earlier versions.

Only when key >= 32, value < 7 and key + value < 48 (or 40 if there is an expire), we save 8 bytes here. In this case we get a 64 bytes allocation jemalloc bucket for the robj and 8 bytes for the value.

When the value is >= 7, we need a 16 bytes bucket for it, so the sum is 80 bytes again.

@codecov
Copy link

codecov bot commented May 15, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 71.22%. Comparing base (9e87473) to head (1f818f7).
Report is 17 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #2087      +/-   ##
============================================
+ Coverage     71.05%   71.22%   +0.17%     
============================================
  Files           122      122              
  Lines         66143    66024     -119     
============================================
+ Hits          46997    47025      +28     
+ Misses        19146    18999     -147     
Files with missing lines Coverage Δ
src/object.c 81.35% <100.00%> (-0.17%) ⬇️

... and 23 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@ranshid ranshid merged commit 11a2b6c into valkey-io:unstable May 15, 2025
51 checks passed
TEST_ASSERT(embstr_obj->encoding == OBJ_ENCODING_EMBSTR);

robj *raw_obj = objectSetKeyAndExpire(longer_val_obj, key, -1);
TEST_ASSERT(raw_obj->encoding == OBJ_ENCODING_RAW);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A pointer is 32 bits so the robj struct is smaller.

@zuiderkwast
Copy link
Contributor

zuiderkwast commented May 22, 2025

Bump. This PR broke the daily build. I'm willing to revert this PR if it isn't fixed. Sorry that was rude. If you're not able to fix it, someone else will.

@zuiderkwast
Copy link
Contributor

I have a fix here:

@poiuj
Copy link
Contributor Author

poiuj commented May 25, 2025

@zuiderkwast thanks for the fix!
Sorry, didn't get the notification.

shanwan1 pushed a commit to shanwan1/valkey that referenced this pull request Jun 13, 2025
This commit fixes the size calculation for embedded strings with keys in
createStringObjectWithKeyAndExpire(). The previous implementation used a
fixed size of 3 bytes for the key's header, which was incorrect. The new
implementation properly calculates the required size based on the key's
length and SDS type.

Added a unit test (test_embedded_string_with_key) to verify that:
1. Short values with long keys are properly encoded as embedded strings
2. Values that would cause the total size to exceed 64 bytes are encoded
as raw

Signed-off-by: Vadym Khoptynets <[email protected]>
Signed-off-by: shanwan1 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants