Skip to content

Conversation

@61yao
Copy link
Contributor

@61yao 61yao commented Sep 20, 2022

This change is backward incompatible:

  1. Change get bytes behavior to return raw bytes directly.
  • This doesn't impact the values that are originally stored as bytes
  • If the values are stored as strings and read as bytes, in the past, we returned the hex encoded bytes for the string value. This PR changes the behavior to return utf8 bytes directly.
  1. Add support to getBytesMV

@codecov-commenter
Copy link

codecov-commenter commented Sep 20, 2022

Codecov Report

Merging #9441 (2ac7f0a) into master (b043dd8) will increase coverage by 6.30%.
The diff coverage is 27.27%.

@@             Coverage Diff              @@
##             master    #9441      +/-   ##
============================================
+ Coverage     63.53%   69.84%   +6.30%     
- Complexity     5072     5136      +64     
============================================
  Files          1836     1910      +74     
  Lines         98545   101832    +3287     
  Branches      15062    15451     +389     
============================================
+ Hits          62614    71122    +8508     
+ Misses        31313    25699    -5614     
- Partials       4618     5011     +393     
Flag Coverage Δ
integration1 26.09% <2.27%> (?)
integration2 24.72% <0.00%> (?)
unittests1 67.08% <27.27%> (+0.01%) ⬆️
unittests2 15.48% <0.00%> (+0.13%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...a/org/apache/pinot/core/common/DataBlockCache.java 89.44% <0.00%> (-1.47%) ⬇️
...ore/operator/docvalsets/ProjectionBlockValSet.java 86.48% <0.00%> (-3.66%) ⬇️
...core/operator/docvalsets/TransformBlockValSet.java 59.43% <0.00%> (+4.09%) ⬆️
...ator/transform/function/BaseTransformFunction.java 75.80% <0.00%> (+0.03%) ⬆️
...ransform/function/IdentifierTransformFunction.java 77.08% <0.00%> (-1.65%) ⬇️
...r/transform/function/LiteralTransformFunction.java 70.19% <0.00%> (+1.26%) ⬆️
...e/pinot/core/query/reduce/RowBasedBlockValSet.java 15.71% <0.00%> (-0.23%) ⬇️
...pache/pinot/segment/local/io/util/ValueReader.java 100.00% <ø> (ø)
...ot/segment/local/io/util/VarLengthValueReader.java 72.97% <0.00%> (-2.03%) ⬇️
.../dictionary/BigDecimalOnHeapMutableDictionary.java 34.83% <0.00%> (ø)
... and 502 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@61yao 61yao changed the title [not for review] Update get bytes to return raw bytes of string [enhancement] Update get bytes to return raw bytes of string and support getBytesMV Sep 21, 2022
@61yao 61yao changed the title [enhancement] Update get bytes to return raw bytes of string and support getBytesMV [enhancement] [release-note] Update get bytes to return raw bytes of string and support getBytesMV Sep 21, 2022
Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly minor comments

}
break;
case BYTES:
case STRING:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove the switch and directly read the values

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we don't want to support this for other types?

String unpaddedString = getUnpaddedString(i, buffer);
_unpaddedStrings[i] = unpaddedString;
_unPaddedStringToIdMap.put(unpaddedString, i);
_unpaddedBytes[i] = getUnpaddedBytes(i, buffer);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only read the unpadded bytes, and directly create unpaddedString from the unpadded bytes for better performance

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The performance will not be good because we have do encode every time if we want to read unpadded string? Or you are talking about space efficiency?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm talking about the bytes/string reading during the construction

Suggested change
_unpaddedBytes[i] = getUnpaddedBytes(i, buffer);
byte[] unpaddedBytes = getUnpaddedBytes(i, buffer);
String unpaddedString = new String(unpaddedBytes, UTF_8);
...

@Jackie-Jiang Jackie-Jiang added enhancement release-notes Referenced by PRs that need attention when compiling the next release notes backward-incompat Referenced by PRs that introduce or fix backward compat issues labels Sep 21, 2022
Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM otherwise

String unpaddedString = getUnpaddedString(i, buffer);
_unpaddedStrings[i] = unpaddedString;
_unPaddedStringToIdMap.put(unpaddedString, i);
_unpaddedBytes[i] = getUnpaddedBytes(i, buffer);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm talking about the bytes/string reading during the construction

Suggested change
_unpaddedBytes[i] = getUnpaddedBytes(i, buffer);
byte[] unpaddedBytes = getUnpaddedBytes(i, buffer);
String unpaddedString = new String(unpaddedBytes, UTF_8);
...

walterddr
walterddr approved these changes Sep 26, 2022
@walterddr walterddr merged commit 3818397 into apache:master Sep 27, 2022
61yao pushed a commit to 61yao/pinot that referenced this pull request Oct 3, 2022
…string and support getBytesMV (apache#9441)

* get raw bytes for string value column

* get raw bytes for string value column

* support getBytesMV

* fix fail test

* fix style

* Address sytle commets

* address comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backward-incompat Referenced by PRs that introduce or fix backward compat issues enhancement release-notes Referenced by PRs that need attention when compiling the next release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants