Skip to content

Conversation

@swaminathanmanish
Copy link
Contributor

@swaminathanmanish swaminathanmanish commented Oct 26, 2023

Problem:
We need the ability to pass in custom RecordTransformers that can be used during map transformation (segment generation). Without this, users have to introduce custom RecordReaders where custom transformations have to be done. With customer RecordReaders, we no longer have control over allocations/deallocations of RecordReaders (eg: users can pass in long list of pre-allocated Parquet recordReaders which can result in holding up large heap allocations during segment generation and even result in OOM). Moreover we also have a well defined RecordTransformer interface just to do this transformation.

This PR allows users to pass custom RecordTransformers to SegmentProcessorFramework.

@swaminathanmanish swaminathanmanish marked this pull request as draft October 26, 2023 23:19
Copy link
Contributor

@snleee snleee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM otherwise

@codecov-commenter
Copy link

codecov-commenter commented Oct 27, 2023

Codecov Report

❌ Patch coverage is 76.92308% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 61.42%. Comparing base (60edf49) to head (d19c6af).
⚠️ Report is 3092 commits behind head on master.

Files with missing lines Patch % Lines
.../local/recordtransformer/CompositeTransformer.java 50.00% 3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #11887      +/-   ##
============================================
- Coverage     61.43%   61.42%   -0.02%     
- Complexity     1146     1147       +1     
============================================
  Files          2375     2376       +1     
  Lines        128511   128779     +268     
  Branches      19848    19906      +58     
============================================
+ Hits          78955    79102     +147     
- Misses        43859    43956      +97     
- Partials       5697     5721      +24     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 61.37% <76.92%> (+0.01%) ⬆️
java-21 61.30% <76.92%> (-0.01%) ⬇️
skip-bytebuffers-false 61.40% <76.92%> (-0.02%) ⬇️
skip-bytebuffers-true 61.27% <76.92%> (+<0.01%) ⬆️
temurin 61.42% <76.92%> (-0.02%) ⬇️
unittests 61.41% <76.92%> (-0.02%) ⬇️
unittests1 46.62% <53.84%> (-0.04%) ⬇️
unittests2 27.66% <23.07%> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@swaminathanmanish swaminathanmanish marked this pull request as ready for review October 30, 2023 22:30
@swaminathanmanish swaminathanmanish changed the title (WIP) Allowing custom transformers to be passed to SegmentProcessingFramework Allowing custom transformers to be passed to SegmentProcessingFramework Oct 30, 2023
@snleee snleee merged commit 7b0eb29 into apache:master Oct 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants