Skip to content

Conversation

@walterddr
Copy link
Contributor

@walterddr walterddr commented Jan 19, 2024

this addresses: #12284

@codecov-commenter
Copy link

codecov-commenter commented Jan 19, 2024

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (6bb387a) 61.58% compared to head (2db4bc4) 61.57%.
Report is 9 commits behind head on master.

Files Patch % Lines
...e/pinot/spi/config/table/DimensionTableConfig.java 50.00% 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #12290      +/-   ##
============================================
- Coverage     61.58%   61.57%   -0.01%     
- Complexity     1152     1153       +1     
============================================
  Files          2415     2417       +2     
  Lines        131476   131611     +135     
  Branches      20303    20315      +12     
============================================
+ Hits          80963    81046      +83     
- Misses        44597    44629      +32     
- Partials       5916     5936      +20     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 61.53% <81.81%> (+0.02%) ⬆️
java-21 61.46% <81.81%> (+<0.01%) ⬆️
skip-bytebuffers-false 61.55% <81.81%> (+<0.01%) ⬆️
skip-bytebuffers-true 61.43% <81.81%> (+0.01%) ⬆️
temurin 61.57% <81.81%> (-0.01%) ⬇️
unittests 61.57% <81.81%> (-0.01%) ⬇️
unittests1 46.69% <81.81%> (+0.08%) ⬆️
unittests2 27.72% <0.00%> (-0.06%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

private final AtomicInteger _loadToken = new AtomicInteger();

private boolean _disablePreload = false;
private boolean _allowDuplicatePrimaryKey = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using true as default boolean config might be error prone. Consider changing it to errorOnDuplicatePrimaryKey and default to false?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point. will fix

use false as default for disallowed config key;
improve test
private final AtomicInteger _loadToken = new AtomicInteger();

private boolean _disablePreload = false;
private boolean _disallowDuplicatePrimaryKey = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still suggest naming it _errorOnDuplicatePrimaryKey. We shouldn't allow duplicate primary key in any case since it will result in inconsistent behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it.

@walterddr walterddr merged commit eeaf1f0 into apache:master Jan 22, 2024
aishikbh added a commit to aishikbh/pinot that referenced this pull request Jan 23, 2024
- PR apache#12290 touched dimBaseballTeams.csv on which the tests depends on. Modified the test to reflect that.
- Changed import statements (addressed comment).
snleee pushed a commit that referenced this pull request Jan 24, 2024
… Disk Issues (#12220)

* modularized map, reduce and segment generation phase.

* added interfaces and classes

* added staetefulrecordreaderfileconfig instead of recordreaderfileconfig

* save Progress

* major changes, working tests.

* more changes

* made the number of bytes to be configurable

* added global sequence id to segments since it is modularised out, made the intermediate file size configurable through segmentsConfig.

* remove logic for global segment IDs.

* used recordreaderconfig instead of statefulrecordreaderconfig, changed looping conditions, made recordreader of recordreaderfileconfig modifiable.

* added proper sequence ids for segments.

* ingestion working without errors. Should be good in sunny day scenarios.

* remove typo from testing

* replace _partitionToFilemanagerMap with local variable asthe mapping is happening in multiple iterations. Change exception handling for mapper.

* added precondition check for intermediateFileSizeThresholdInBytes and inferred observer from segmentsConfig

* fix typo and add comment.

* removed redundant method.

* config related changes, fix unit tests.

* remove redundant flag.

* replaced size based constraint checker with size based constraint writer.

* consolidated all the constraint checks into one place.

* add test to validate segmentprocessorframework

* decoupled recordreader closing logic from segment mapper

* addressing comments related to segmentprocessorframework.

* delegated initialisation and check for recordreader being done with all the records to recordreaderfileconfig.

* simplify logic for termination of map phase.

* change variable names for better readability.

* Added tests and minor changes in logic

- Added further tests for SegmentProcessorFramework.
- Removed redundant checks for SegmentProcessorFramework.
- Detected an edge case and added an additional statement in
  SegmentMapper.

* isolated the termination logs to mapAndTransformRow.

* Keep SegmentMapper public interface unchanged

- Pass the total count of RecordReaders through SegmentMapper
  Constructor instead of map() arguments.

* addressing review comments.

* cleaned up logging logic.

* Channged Logs

* kept the logging for default scenario same as before
* conditionally logged the progress between iterations in case the
  feature is enabled.

* modify tests

- PR #12290 touched dimBaseballTeams.csv on which the tests depends on. Modified the test to reflect that.
- Changed import statements (addressed comment).
@Jackie-Jiang Jackie-Jiang added feature documentation release-notes Referenced by PRs that need attention when compiling the next release notes labels Feb 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation feature release-notes Referenced by PRs that need attention when compiling the next release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants