-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Set max number of parallel segment downloads per table in pinot-server #8694
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #8694 +/- ##
============================================
+ Coverage 66.16% 69.69% +3.52%
- Complexity 4387 4575 +188
============================================
Files 1277 1722 +445
Lines 64430 89862 +25432
Branches 10014 13326 +3312
============================================
+ Hits 42633 62633 +20000
- Misses 18786 22908 +4122
- Partials 3011 4321 +1310
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest doing something similar to SegmentRefreshSemaphore by wrapping the Semaphore inside it or can we just reuse it here since it is reused for both refresh and reload ? May have to change the log message to be more appropriate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question! Let me explain a bit on why I didn't choose to use the same existing semaphore in this PR:
- For segment reload, it doesn't have to always download the segment. If the same semaphore is reused, pinot server could lose the ability to download for an actual download request while it's blocked by the same semaphore which is used for reloading a local segment.
- The default number of segment to be refresh/reload is 1, meaning that for reloading all segments request, there is at most 1 segment to be reloaded, which is also different from the download request.
- For segment refresh/reload, the target segments are already shown in external view, which means that these segments are already used for querying. While download requests could be made by just adding a segment or table rebalance, during which the EV could not have been converged yet. Thus, it's okay to have higher parallelism for those cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My suggestion was not to reuse the same object/instance (which will be problematic like you mentioned) but the implementation instead and instantiate it here with the required number of permits for download because generically, semaphore just needs those 2 configs and can be instantiated separately by each different user for different purpose.
But this is not super necessary. The current approach is fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Later on we can improve it to be dynamic instead of a server level config.
pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
Outdated
Show resolved
Hide resolved
b915f2e to
6436a9e
Compare
...rver/src/main/java/org/apache/pinot/server/starter/helix/HelixInstanceDataManagerConfig.java
Outdated
Show resolved
Hide resolved
6436a9e to
7e6bb56
Compare
|
For my understanding, does this feature implicitly lower impact on the servers as well? If we set this and the controller is using the semaphore, does this mean the servers will also be throttled in the rate they download? For context, I just pushed 4096 segments to a table, and the servers were at 100% CPU for ~30 minutes while it downloaded the segments and finished building indices. |
@jadami10 Yes, this could be one of the optimizations to lower your CPU usage. While as you mentioned there are two major steps that require servers to handle. For building indices, you can enable the index generation logic during the segment generation so that servers won't be busy doing so. You can enable it in the table config. Take a look at this: |
|
This PR adds the ability to set the max number of parallel segment downloads per table in pinot-server. There are a couple of reasons: