Skip to content

Conversation

@snleee
Copy link
Contributor

@snleee snleee commented Jan 17, 2024

  1. Fix the issue with fetching validDocId metadata for table with a large number of segments. (Added POST API with list of segments to be part of the request body)
  2. Added POST support for MultiHttpRequest to cover 1.
  3. Added GET /tables//validDocIdMetadata API on the controller for improving debuggability.
e.g. sample response

GET tables/{tableName}/validDocIdMetadata
[
  {
    "segmentName": "gameScores__1__0__20240117T0931Z",
    "totalValidDocs": 0,
    "totalInvalidDocs": 100,
    "totalDocs": 100
  },
  {
    "segmentName": "gameScores__1__10__20240117T0931Z",
    "totalValidDocs": 0,
    "totalInvalidDocs": 100,
    "totalDocs": 100
  }
]

@snleee snleee force-pushed the upsert-compaction-refactoring branch 3 times, most recently from b8618c6 to e176928 Compare January 17, 2024 09:13
@snleee
Copy link
Contributor Author

snleee commented Jan 17, 2024

@robertzych Can you take a look at this?

@snleee snleee requested a review from Jackie-Jiang January 17, 2024 09:15
@snleee snleee force-pushed the upsert-compaction-refactoring branch from e176928 to 4d4bb48 Compare January 17, 2024 09:19
1. Fix the issue with fetching validDocId metadata for table with
   a large number of segments. (Added POST API with list of segments
   to be part of the request body)
2. Added POST support for MultiHttpRequest to cover 1.
3. Added GET /tables/<tableName>/validDocIdMetadata API on the controller
   for improving debuggability.
@snleee snleee force-pushed the upsert-compaction-refactoring branch from 4d4bb48 to 9d0b2df Compare January 17, 2024 09:22
@codecov-commenter
Copy link

codecov-commenter commented Jan 17, 2024

Codecov Report

Attention: 181 lines in your changes are missing coverage. Please review.

Comparison is base (8f5fa80) 61.50% compared to head (4156f04) 61.53%.
Report is 23 commits behind head on master.

Files Patch % Lines
...t/controller/util/ServerSegmentMetadataReader.java 5.33% 71 Missing ⚠️
...ent/readers/CompactedPinotSegmentRecordReader.java 0.00% 29 Missing ⚠️
...oller/api/resources/PinotTableRestletResource.java 0.00% 17 Missing ⚠️
...che/pinot/server/api/resources/TablesResource.java 26.31% 11 Missing and 3 partials ⚠️
...che/pinot/controller/util/TableMetadataReader.java 0.00% 12 Missing ⚠️
...che/pinot/plugin/minion/tasks/MinionTaskUtils.java 38.88% 9 Missing and 2 partials ⚠️
...mmon/restlet/resources/ValidDocIdMetadataInfo.java 0.00% 10 Missing ⚠️
...psertcompaction/UpsertCompactionTaskGenerator.java 52.63% 8 Missing and 1 partial ⚠️
...pinot/controller/util/CompletionServiceHelper.java 75.00% 3 Missing ⚠️
...roker/requesthandler/BaseBrokerRequestHandler.java 0.00% 2 Missing ⚠️
... and 2 more
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #12275      +/-   ##
============================================
+ Coverage     61.50%   61.53%   +0.02%     
- Complexity      207     1153     +946     
============================================
  Files          2416     2417       +1     
  Lines        131179   131406     +227     
  Branches      20246    20269      +23     
============================================
+ Hits          80686    80862     +176     
- Misses        44595    44626      +31     
- Partials       5898     5918      +20     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 61.48% <20.96%> (+0.01%) ⬆️
java-21 61.41% <20.96%> (+0.03%) ⬆️
skip-bytebuffers-false 61.51% <20.96%> (+0.02%) ⬆️
skip-bytebuffers-true 61.39% <20.96%> (+0.02%) ⬆️
temurin 61.53% <20.96%> (+0.02%) ⬆️
unittests 61.53% <20.96%> (+0.02%) ⬆️
unittests1 46.58% <25.00%> (-0.02%) ⬇️
unittests2 27.77% <15.28%> (+0.08%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

+ "aggregate valid doc id metadata of all segments for a table")
public String getTableAggregateValidDocIdMetadata(
@ApiParam(value = "Name of the table", required = true) @PathParam("tableName") String tableName,
@ApiParam(value = "OFFLINE|REALTIME") @QueryParam("type") String tableTypeStr) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The table type should always be REALTIME as this API makes sense only for upsert tables. Should we just remove this from request?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tibrewalpratik17 Currently, I return the error when the table is OFFLINE. In the future, we may want to support upsert in the offline table. We can use this flag once we support this. I'm trying to follow the convention in this class to make the API look similar as other ones.

@Produces(MediaType.APPLICATION_JSON)
@ApiOperation(value = "Get the aggregate valid doc id metadata of all segments for a table", notes = "Get the "
+ "aggregate valid doc id metadata of all segments for a table")
public String getTableAggregateValidDocIdMetadata(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also add an optional list of segmentNames in the API request? This will be ideal for tables with a lot of segments.

Copy link
Contributor Author

@snleee snleee Jan 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added segment names to the api param

Copy link
Contributor

@robertzych robertzych left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

Copy link
Contributor

@swaminathanmanish swaminathanmanish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM otherwise. Thanks for factoring out the UpsertCompactionReader as well.

@snleee snleee force-pushed the upsert-compaction-refactoring branch from 57709f6 to 313d531 Compare January 18, 2024 08:43
@snleee snleee force-pushed the upsert-compaction-refactoring branch from 313d531 to 4156f04 Compare January 18, 2024 08:46
@snleee snleee requested a review from klsince January 18, 2024 08:53
@snleee snleee merged commit 7b69d09 into apache:master Jan 22, 2024
@snleee snleee deleted the upsert-compaction-refactoring branch January 22, 2024 08:55
saurabhd336 pushed a commit to saurabhd336/pinot that referenced this pull request Jan 24, 2024
* Refactoring the upsert compaction related code

1. Fix the issue with fetching validDocId metadata for table with
   a large number of segments. (Added POST API with list of segments
   to be part of the request body)
2. Added POST support for MultiHttpRequest to cover 1.
3. Added GET /tables/<tableName>/validDocIdMetadata API on the controller
   for improving debuggability.

* Addressing comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants