Skip to content

Conversation

@deemoliu
Copy link
Contributor

@deemoliu deemoliu commented Mar 15, 2022

Description

Added a new rest resource and an endpoint for estimating heap memory usage for Pinot upsert table.
Related Pinot issue: #6729

Release Notes

Added a http endpoint for estimate the heap usage for Pinot upsert table per partition based on PK cardinality.

  • POST /upsert/estimateHeapUsage

Documentation

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is resource is about UpsertTable only. Why not name it explicitly so that it is clear that this is about UpsertTable resource estimation?

@deemoliu deemoliu force-pushed the qiaochu/capacity-estimate-api branch from 68769c6 to 3bc1c66 Compare March 29, 2022 20:52
@codecov-commenter
Copy link

codecov-commenter commented Mar 29, 2022

Codecov Report

Merging #8355 (795fa15) into master (b5690a7) will decrease coverage by 0.19%.
The diff coverage is 41.66%.

@@             Coverage Diff              @@
##             master    #8355      +/-   ##
============================================
- Coverage     70.47%   70.28%   -0.20%     
- Complexity     4313     4409      +96     
============================================
  Files          1703     1718      +15     
  Lines         89569    90597    +1028     
  Branches      13564    13713     +149     
============================================
+ Hits          63123    63673     +550     
- Misses        21990    22431     +441     
- Partials       4456     4493      +37     
Flag Coverage Δ
integration1 26.81% <4.16%> (-0.36%) ⬇️
integration2 25.45% <4.16%> (-0.06%) ⬇️
unittests1 66.42% <0.00%> (-0.03%) ⬇️
unittests2 13.98% <41.66%> (-0.09%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...ache/pinot/controller/api/resources/Constants.java 23.07% <ø> (ø)
...org/apache/pinot/spi/config/table/ColumnStats.java 0.00% <0.00%> (ø)
...spi/utils/builder/ControllerRequestURLBuilder.java 0.00% <0.00%> (ø)
...ller/api/resources/PinotUpsertRestletResource.java 56.81% <56.81%> (ø)
...inot/controller/helix/ControllerRequestClient.java 63.00% <100.00%> (ø)
...data/manager/realtime/DefaultSegmentCommitter.java 0.00% <0.00%> (-80.00%) ⬇️
...ot/segment/local/segment/store/TextIndexUtils.java 76.47% <0.00%> (-23.53%) ⬇️
...er/api/resources/LLCSegmentCompletionHandlers.java 43.56% <0.00%> (-18.82%) ⬇️
...ava/org/apache/pinot/client/ConnectionFactory.java 54.54% <0.00%> (-16.05%) ⬇️
...data/manager/realtime/SegmentCommitterFactory.java 88.23% <0.00%> (-11.77%) ⬇️
... and 105 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b5690a7...795fa15. Read the comment docs.

@deemoliu deemoliu changed the title add a experiment API for upsert heap memory estimation [WIP] add a experiment API for upsert heap memory estimation Mar 29, 2022
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API is built as a tool instead of a regular restlet API as it doesn't rely on the information stored in the cluster. We should at least read the instance config info, calculate the partitions on each server, and estimate the heap usage per server

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gotcha, I added numPartitions, replicasPerPartition and memoryPerHost

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If comparison column is fixed width, no need to provide it in columnStats

Copy link
Contributor Author

@deemoliu deemoliu Apr 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gotcha, thanks Jackie. I will infer the comparison size from Datatype.
FieldSpec.DataType dt = schema.getFieldSpecFor(comparisonColumn).getDataType(); bytesPerValue = 52 + columnStats.getComparisonColSize();

for datatype that are not fixed width, i will throw an controllerException since it's a not supported type for comparison column e.g. String

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably also count the array size?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gotcha, thanks Jackie. Java has a 24 bytes array overhead and there's also 8 bytes for the actual array object, so i added 32 bytes in bytes per key.

@deemoliu deemoliu force-pushed the qiaochu/capacity-estimate-api branch from 0e29e7e to b38c607 Compare April 20, 2022 16:28
@deemoliu deemoliu changed the title [WIP] add a experiment API for upsert heap memory estimation add a experiment API for upsert heap memory estimation Apr 20, 2022
@deemoliu deemoliu force-pushed the qiaochu/capacity-estimate-api branch 8 times, most recently from ad1c735 to 210e036 Compare April 27, 2022 04:14
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need replicasPerPartition because multiple replicas won't be assigned to the same host

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gotcha.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

totalSpacePerPartition(bytes)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gotcha, thanks for suggestion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(optional) Here we only calculate the map content size. Should we also consider the map entry size and the array size within the map?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The object overhead will vary between different machines.

I found this https://dzone.com/articles/size-entry-map shows 81.4 bytes per concurrentHashmap entry, do you think this is accurate and want me to add that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to make more accurate estimation, we should count the object overhead because as you can see, the object overhead can be larger than the actual content within the entry.
I don't know the formula of object overhead from map size, so we need to do some research here. If you find that is too hard, we may add a TODO and address that later. But users should know this size does not include the overhead, so it can be quite off.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for suggestion! I can add a Todo and address later. I will do some benchmark with heap dump.

@deemoliu deemoliu force-pushed the qiaochu/capacity-estimate-api branch from 4bfc9aa to f00f66d Compare May 2, 2022 21:26
Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the release note of the PR description for the new added rest endpoint

@Jackie-Jiang Jackie-Jiang added the release-notes Referenced by PRs that need attention when compiling the next release notes label May 3, 2022
…/resources/PinotUpsertRestletResource.java

Co-authored-by: Xiaotian (Jackie) Jiang <[email protected]>
@deemoliu
Copy link
Contributor Author

deemoliu commented May 4, 2022

Please update the release note of the PR description for the new added rest endpoint

Thanks @Jackie-Jiang , done.

@Jackie-Jiang Jackie-Jiang merged commit 9873ed9 into apache:master May 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-notes Referenced by PRs that need attention when compiling the next release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants