Skip to content

Conversation

@richardstartin
Copy link
Member

@richardstartin richardstartin commented Feb 18, 2022

We frequently see that prometheus is a resource hog in our customer deployments, with regular expression matching leading to excessive allocation and CPU time. Pinot currently uses version 0.12.0, which is hardcoded into deployments and the jar is checked in.
Screenshot 2022-02-17 at 14 31 09

Screenshot 2022-02-18 at 08 43 34

Prometheus fixed this here, allowing : prometheus/jmx_exporter#518

This PR changes this so that the latest version (pinned at 0.16.1) of jmx_prometheus_javaagent is downloaded in the docker image, and caching is enabled for our rules.

Copy link
Contributor

@walterddr walterddr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great. btw we shouldn't check in a binary jar to repo in the first place :-)

@richardstartin
Copy link
Member Author

btw we shouldn't check in a binary jar to repo in the first place :-)

you're telling me?

@codecov-commenter
Copy link

codecov-commenter commented Feb 18, 2022

Codecov Report

Merging #8227 (55ce55f) into master (8f0e49e) will decrease coverage by 0.06%.
The diff coverage is n/a.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #8227      +/-   ##
============================================
- Coverage     71.06%   71.00%   -0.07%     
- Complexity     4320     4322       +2     
============================================
  Files          1626     1626              
  Lines         85036    85076      +40     
  Branches      12795    12802       +7     
============================================
- Hits          60428    60405      -23     
- Misses        20456    20518      +62     
- Partials       4152     4153       +1     
Flag Coverage Δ
integration1 28.78% <ø> (-0.08%) ⬇️
integration2 27.48% <ø> (-0.06%) ⬇️
unittests1 67.40% <ø> (+0.04%) ⬆️
unittests2 14.14% <ø> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...data/manager/realtime/DefaultSegmentCommitter.java 0.00% <0.00%> (-80.00%) ⬇️
...er/api/resources/LLCSegmentCompletionHandlers.java 43.56% <0.00%> (-18.82%) ⬇️
...data/manager/realtime/SegmentCommitterFactory.java 88.23% <0.00%> (-11.77%) ⬇️
...pinot/common/utils/fetcher/HttpSegmentFetcher.java 61.53% <0.00%> (-10.26%) ⬇️
...altime/ServerSegmentCompletionProtocolHandler.java 51.42% <0.00%> (-6.67%) ⬇️
...he/pinot/segment/local/segment/store/IndexKey.java 70.00% <0.00%> (-5.00%) ⬇️
...nction/DistinctCountBitmapAggregationFunction.java 51.81% <0.00%> (-3.63%) ⬇️
.../predicate/NotEqualsPredicateEvaluatorFactory.java 75.00% <0.00%> (-2.95%) ⬇️
...r/helix/SegmentOnlineOfflineStateModelFactory.java 63.20% <0.00%> (-1.89%) ⬇️
...nMaxValueBasedSelectionOrderByCombineOperator.java 71.21% <0.00%> (-1.52%) ⬇️
... and 22 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8f0e49e...55ce55f. Read the comment docs.

@richardstartin
Copy link
Member Author

Adding the cache expressions needs thorough testing as it is not appropriate to all metric matching logic, I suggest delaying caching

@xiangfu0
Copy link
Contributor

Makes a lot of sense! Thanks Richard!

@xiangfu0 xiangfu0 changed the title upgrade prometheus and enable caching [pinot docker image] Upgrade prometheus and enable caching Feb 18, 2022
@xiangfu0 xiangfu0 self-requested a review February 18, 2022 19:20
@xiangfu0
Copy link
Contributor

btw we shouldn't check in a binary jar to repo in the first place :-)

you're telling me?

This is my bad :p

@richardstartin
Copy link
Member Author

I figured out that the much bigger problem here is that all the rules are evaluated on all the instance types, so we are running controller rules on the brokers, controller + broker on the servers, controller + broker + server rules on the minions before matching. We should break this down before considering caching.

@richardstartin richardstartin changed the title [pinot docker image] Upgrade prometheus and enable caching [pinot docker image] Upgrade prometheus and scope rulesets to components Feb 19, 2022
@richardstartin richardstartin merged commit 45c1062 into apache:master Feb 19, 2022
xiangfu0 pushed a commit to xiangfu0/pinot that referenced this pull request Feb 23, 2022
…nts (apache#8227)

* upgrade prometheus and enable caching

* revert cahcing for now

* download prometheus 0.12.0 for backward compatibility

* separate config files to reduce scope of rules to appropriate services

* newlines
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants