-
Notifications
You must be signed in to change notification settings - Fork 1.4k
[feature] allow dim table config to detect/disallow duplicate PK #12290
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[feature] allow dim table config to detect/disallow duplicate PK #12290
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #12290 +/- ##
============================================
- Coverage 61.58% 61.57% -0.01%
- Complexity 1152 1153 +1
============================================
Files 2415 2417 +2
Lines 131476 131611 +135
Branches 20303 20315 +12
============================================
+ Hits 80963 81046 +83
- Misses 44597 44629 +32
- Partials 5916 5936 +20
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
| private final AtomicInteger _loadToken = new AtomicInteger(); | ||
|
|
||
| private boolean _disablePreload = false; | ||
| private boolean _allowDuplicatePrimaryKey = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using true as default boolean config might be error prone. Consider changing it to errorOnDuplicatePrimaryKey and default to false?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point. will fix
use false as default for disallowed config key; improve test
| private final AtomicInteger _loadToken = new AtomicInteger(); | ||
|
|
||
| private boolean _disablePreload = false; | ||
| private boolean _disallowDuplicatePrimaryKey = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still suggest naming it _errorOnDuplicatePrimaryKey. We shouldn't allow duplicate primary key in any case since it will result in inconsistent behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it.
- PR apache#12290 touched dimBaseballTeams.csv on which the tests depends on. Modified the test to reflect that. - Changed import statements (addressed comment).
… Disk Issues (#12220) * modularized map, reduce and segment generation phase. * added interfaces and classes * added staetefulrecordreaderfileconfig instead of recordreaderfileconfig * save Progress * major changes, working tests. * more changes * made the number of bytes to be configurable * added global sequence id to segments since it is modularised out, made the intermediate file size configurable through segmentsConfig. * remove logic for global segment IDs. * used recordreaderconfig instead of statefulrecordreaderconfig, changed looping conditions, made recordreader of recordreaderfileconfig modifiable. * added proper sequence ids for segments. * ingestion working without errors. Should be good in sunny day scenarios. * remove typo from testing * replace _partitionToFilemanagerMap with local variable asthe mapping is happening in multiple iterations. Change exception handling for mapper. * added precondition check for intermediateFileSizeThresholdInBytes and inferred observer from segmentsConfig * fix typo and add comment. * removed redundant method. * config related changes, fix unit tests. * remove redundant flag. * replaced size based constraint checker with size based constraint writer. * consolidated all the constraint checks into one place. * add test to validate segmentprocessorframework * decoupled recordreader closing logic from segment mapper * addressing comments related to segmentprocessorframework. * delegated initialisation and check for recordreader being done with all the records to recordreaderfileconfig. * simplify logic for termination of map phase. * change variable names for better readability. * Added tests and minor changes in logic - Added further tests for SegmentProcessorFramework. - Removed redundant checks for SegmentProcessorFramework. - Detected an edge case and added an additional statement in SegmentMapper. * isolated the termination logs to mapAndTransformRow. * Keep SegmentMapper public interface unchanged - Pass the total count of RecordReaders through SegmentMapper Constructor instead of map() arguments. * addressing review comments. * cleaned up logging logic. * Channged Logs * kept the logging for default scenario same as before * conditionally logged the progress between iterations in case the feature is enabled. * modify tests - PR #12290 touched dimBaseballTeams.csv on which the tests depends on. Modified the test to reflect that. - Changed import statements (addressed comment).
this addresses: #12284