-
Notifications
You must be signed in to change notification settings - Fork 1.4k
add a experiment API for upsert heap memory estimation #8355
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add a experiment API for upsert heap memory estimation #8355
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is resource is about UpsertTable only. Why not name it explicitly so that it is clear that this is about UpsertTable resource estimation?
68769c6 to
3bc1c66
Compare
Codecov Report
@@ Coverage Diff @@
## master #8355 +/- ##
============================================
- Coverage 70.47% 70.28% -0.20%
- Complexity 4313 4409 +96
============================================
Files 1703 1718 +15
Lines 89569 90597 +1028
Branches 13564 13713 +149
============================================
+ Hits 63123 63673 +550
- Misses 21990 22431 +441
- Partials 4456 4493 +37
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
pinot-spi/src/main/java/org/apache/pinot/spi/config/table/ColumnStats.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This API is built as a tool instead of a regular restlet API as it doesn't rely on the information stored in the cluster. We should at least read the instance config info, calculate the partitions on each server, and estimate the heap usage per server
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gotcha, I added numPartitions, replicasPerPartition and memoryPerHost
.../org/apache/pinot/controller/api/resources/PinotUpsertCapacityEstimationRestletResource.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If comparison column is fixed width, no need to provide it in columnStats
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gotcha, thanks Jackie. I will infer the comparison size from Datatype.
FieldSpec.DataType dt = schema.getFieldSpecFor(comparisonColumn).getDataType(); bytesPerValue = 52 + columnStats.getComparisonColSize();
for datatype that are not fixed width, i will throw an controllerException since it's a not supported type for comparison column e.g. String
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably also count the array size?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gotcha, thanks Jackie. Java has a 24 bytes array overhead and there's also 8 bytes for the actual array object, so i added 32 bytes in bytes per key.
0e29e7e to
b38c607
Compare
ad1c735 to
210e036
Compare
pinot-controller/src/test/java/org/apache/pinot/controller/ControllerTestUtils.java
Outdated
Show resolved
Hide resolved
.../org/apache/pinot/controller/api/resources/PinotUpsertCapacityEstimationRestletResource.java
Outdated
Show resolved
Hide resolved
.../org/apache/pinot/controller/api/resources/PinotUpsertCapacityEstimationRestletResource.java
Outdated
Show resolved
Hide resolved
.../org/apache/pinot/controller/api/resources/PinotUpsertCapacityEstimationRestletResource.java
Outdated
Show resolved
Hide resolved
.../org/apache/pinot/controller/api/resources/PinotUpsertCapacityEstimationRestletResource.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need replicasPerPartition because multiple replicas won't be assigned to the same host
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gotcha.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
totalSpacePerPartition(bytes)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gotcha, thanks for suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(optional) Here we only calculate the map content size. Should we also consider the map entry size and the array size within the map?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The object overhead will vary between different machines.
I found this https://dzone.com/articles/size-entry-map shows 81.4 bytes per concurrentHashmap entry, do you think this is accurate and want me to add that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to make more accurate estimation, we should count the object overhead because as you can see, the object overhead can be larger than the actual content within the entry.
I don't know the formula of object overhead from map size, so we need to do some research here. If you find that is too hard, we may add a TODO and address that later. But users should know this size does not include the overhead, so it can be quite off.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for suggestion! I can add a Todo and address later. I will do some benchmark with heap dump.
pinot-controller/src/main/java/org/apache/pinot/controller/api/resources/Constants.java
Outdated
Show resolved
Hide resolved
pinot-spi/src/main/java/org/apache/pinot/spi/utils/builder/ControllerRequestURLBuilder.java
Outdated
Show resolved
Hide resolved
4bfc9aa to
f00f66d
Compare
Jackie-Jiang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update the release note of the PR description for the new added rest endpoint
...ller/src/main/java/org/apache/pinot/controller/api/resources/PinotUpsertRestletResource.java
Outdated
Show resolved
Hide resolved
…/resources/PinotUpsertRestletResource.java Co-authored-by: Xiaotian (Jackie) Jiang <[email protected]>
Thanks @Jackie-Jiang , done. |
Description
Added a new rest resource and an endpoint for estimating heap memory usage for Pinot upsert table.
Related Pinot issue: #6729
Release Notes
Added a http endpoint for estimate the heap usage for Pinot upsert table per partition based on PK cardinality.
Documentation