@9aman (Contributor) commented Mar 27, 2025

Context: add metrics for Pauseless Ingestion

Existing metrics don't expose the full state of a pauseless-enabled table. For example, realtime ingestion does not stop on build failures, so those failures go unflagged; failures during reingestion and the number of error segments still to be corrected are likewise not visible.

Scope of the PR

This PR aims to add the following metrics:

  • Server-level and controller-level metrics to check whether pauseless is enabled.
  • Build failures during segment commit.
    • For conventional tables, these are flagged by ingestion stopping.
    • For pauseless tables, ingestion does not stop, so a separate metric is needed.
  • Reingestion failures.
    • Reingestion applies only to pauseless tables.
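
As a rough illustration of the three metric kinds listed above, here is a minimal, self-contained sketch. It does not use Pinot's metrics classes; `PauselessMetricsSketch` and all of its method names are hypothetical and exist only for this example. It models the "pauseless enabled" signal as a 0/1 gauge per table, and build and reingestion failures as monotonically increasing counters per table.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical in-memory registry for illustration; NOT Pinot's metrics API. */
final class PauselessMetricsSketch {
  // Gauge per table: 1 if pauseless is enabled, 0 otherwise.
  private final Map<String, AtomicLong> pauselessGauges = new ConcurrentHashMap<>();
  // Counters per table; they only ever increase.
  private final Map<String, AtomicLong> buildFailureCounts = new ConcurrentHashMap<>();
  private final Map<String, AtomicLong> reingestionFailureCounts = new ConcurrentHashMap<>();

  /** Sets the 0/1 gauge that reports whether pauseless is enabled for a table. */
  void setPauselessEnabled(String table, boolean enabled) {
    pauselessGauges.computeIfAbsent(table, t -> new AtomicLong()).set(enabled ? 1L : 0L);
  }

  /** Records one segment build failure; ingestion is assumed to keep running. */
  void recordBuildFailure(String table) {
    buildFailureCounts.computeIfAbsent(table, t -> new AtomicLong()).incrementAndGet();
  }

  /** Records one reingestion failure (callers would only reach this for pauseless tables). */
  void recordReingestionFailure(String table) {
    reingestionFailureCounts.computeIfAbsent(table, t -> new AtomicLong()).incrementAndGet();
  }

  long pauselessEnabledGauge(String table) {
    return read(pauselessGauges, table);
  }

  long buildFailureCount(String table) {
    return read(buildFailureCounts, table);
  }

  long reingestionFailureCount(String table) {
    return read(reingestionFailureCounts, table);
  }

  // Returns 0 for tables that have never reported a value.
  private static long read(Map<String, AtomicLong> metrics, String table) {
    AtomicLong value = metrics.get(table);
    return value == null ? 0L : value.get();
  }
}
```

With counters like these, an alert on a non-zero build-failure count for a table whose pauseless gauge is 1 would stand in for the "ingestion stopped" signal that conventional tables already provide.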

Additional Metrics

  • Some metrics that are useful for debugging pauseless tables are handled by this PR: Disable reingestion for Pauseless dedup #15383
    • These include the number of errored segments.
    • They also include the number of segments that can't be fixed for dedup and partial upsert tables.
  • A metric for the number of segments with a missing download URL already exists and will help pinpoint commitEndMetadata failures.

@9aman 9aman force-pushed the observability_for_pauseless branch from 5822a15 to a14940f Compare March 28, 2025 05:06
codecov-commenter commented Mar 28, 2025

Codecov Report

Attention: Patch coverage is 47.82609% with 12 lines in your changes missing coverage. Please review.

Project coverage is 63.22%. Comparing base (59551e4) to head (f6015ae).
Report is 1936 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...a/manager/realtime/RealtimeSegmentDataManager.java | 30.00% | 7 Missing ⚠️ |
| .../helix/core/realtime/SegmentCompletionManager.java | 50.00% | 2 Missing and 1 partial ⚠️ |
| ...inot/server/api/resources/ReingestionResource.java | 0.00% | 2 Missing ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #15384      +/-   ##
============================================
+ Coverage     61.75%   63.22%   +1.47%     
- Complexity      207     1375    +1168     
============================================
  Files          2436     2812     +376     
  Lines        133233   159002   +25769     
  Branches      20636    24354    +3718     
============================================
+ Hits          82274   100527   +18253     
- Misses        44911    50883    +5972     
- Partials       6048     7592    +1544     

| Flag | Coverage Δ |
|---|---|
| custom-integration1 | 100.00% <ø> (+99.99%) ⬆️ |
| integration | 100.00% <ø> (+99.99%) ⬆️ |
| integration1 | 100.00% <ø> (+99.99%) ⬆️ |
| integration2 | 0.00% <ø> (ø) |
| java-11 | 63.17% <47.82%> (+1.46%) ⬆️ |
| java-21 | 63.21% <47.82%> (+1.58%) ⬆️ |
| skip-bytebuffers-false | 63.22% <47.82%> (+1.47%) ⬆️ |
| skip-bytebuffers-true | 63.16% <47.82%> (+35.43%) ⬆️ |
| temurin | 63.22% <47.82%> (+1.47%) ⬆️ |
| unittests | 63.21% <47.82%> (+1.47%) ⬆️ |
| unittests1 | 56.25% <53.33%> (+9.36%) ⬆️ |
| unittests2 | 33.83% <34.78%> (+6.10%) ⬆️ |

Flags with carried forward coverage won't be shown.

@9aman 9aman marked this pull request as ready for review March 28, 2025 06:31
9aman added 2 commits March 28, 2025 12:53
1. Pauseless is enabled/disabled on server and controller
2. Build failures during the segment commit protocol
@9aman 9aman force-pushed the observability_for_pauseless branch from 72a7285 to f6015ae Compare March 28, 2025 07:24
@KKcorps KKcorps merged commit 5dd0c3a into apache:master Mar 28, 2025
22 checks passed