Allow ingestion of errored records with incorrect datatype #9320
Can we also address the time column here? We should let users control the default value when it is null or not in the valid time range.
  throw new RuntimeException("Caught exception while transforming data type for column: " + column, e);
} else {
  LOGGER.debug("Caught exception while transforming data type for column: {}", column, e);
  record.putValue(column, null);
Is it intended to always set null here and let NullValueTransformer (which runs after DataTypeTransformer) change null to the default value set for nullValueHandling? It's pretty neat to me.
Yes. Since NullValueTransformer takes care of this, I don't want to duplicate that logic and introduce the risk of it diverging from the current logic in future patches.
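To illustrate the division of labor described in this thread, here is a minimal, self-contained sketch. It does not use the Pinot classes: `TransformPipelineSketch`, `transformDataType`, `fillNull`, and `ingest` are hypothetical names, the record is a plain `Map`, and only an INT conversion is modeled. The first stage writes null on a conversion failure (or rethrows when `continueOnError` is false), and the second stage is the only place where a default value is substituted.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-ins for the two transformers discussed above.
// These are NOT the Pinot classes; names and signatures are assumptions.
class TransformPipelineSketch {

  // Stage 1: convert the raw value to the column type (INT only, for illustration).
  // On failure, rethrow or write null, depending on continueOnError.
  static void transformDataType(Map<String, Object> record, String column, boolean continueOnError) {
    Object raw = record.get(column);
    if (raw == null) {
      return; // Nothing to convert; stage 2 handles nulls.
    }
    try {
      record.put(column, Integer.parseInt(raw.toString().trim()));
    } catch (NumberFormatException e) {
      if (!continueOnError) {
        throw new RuntimeException("Caught exception while transforming data type for column: " + column, e);
      }
      // Do not pick a default here; leave that to the null-handling stage.
      record.put(column, null);
    }
  }

  // Stage 2: the single place where null is replaced by the default value.
  static void fillNull(Map<String, Object> record, String column, Object defaultValue) {
    if (record.get(column) == null) {
      record.put(column, defaultValue);
    }
  }

  // Runs both stages in order on a one-column record and returns the final value.
  static Object ingest(Object rawValue, boolean continueOnError, Object defaultValue) {
    Map<String, Object> record = new HashMap<>();
    record.put("col", rawValue);
    transformDataType(record, "col", continueOnError);
    fillNull(record, "col", defaultValue);
    return record.get("col");
  }

  public static void main(String[] args) {
    System.out.println(ingest("42", true, -1));
    System.out.println(ingest("not-a-number", true, -1));
  }
}
```

Keeping the default-value substitution in one stage is the point made in the reply above: the type-conversion step only signals failure by writing null, so the two code paths cannot drift apart.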
Codecov Report
@@ Coverage Diff @@
## master #9320 +/- ##
=============================================
- Coverage 63.49% 24.79% -38.71%
+ Complexity 4991 53 -4938
=============================================
Files 1820 1867 +47
Lines 97342 99608 +2266
Branches 14906 15167 +261
=============================================
- Hits 61806 24695 -37111
- Misses 30955 72406 +41451
+ Partials 4581 2507 -2074
Added this change. If the config |
Jackie-Jiang
left a comment
This is great. With it we can catch bad data early
  }
}
_useDefaultValueOnError = tableConfig.getIndexingConfig().useDefaultValueOnError();
Suggest moving it inside the IngestionConfig.
a4738d3 to
89230dc
Compare
Jackie-Jiang
left a comment
LGTM otherwise
Can you please update the Pinot docs for these changes?
Currently, if an exception occurs during data type transformation, we simply stop the ingestion and throw the error. This PR introduces a config so that if the user sets `continueOnError: true` in the tableConfig, we use `null` or the default value instead of throwing an error for that particular column or record.
The PR also adds another config, `validateTimeValue: true`, which checks during ingestion whether the time value falls within the valid range.
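As a rough illustration of the time value check, the sketch below validates an epoch-millisecond value against a fixed range. Everything in it is an assumption for illustration: the class and method names are hypothetical, the 1971 to 2071 bounds are placeholder values rather than something stated in this PR, and combining the check with `continueOnError` is one possible reading of the description, not the confirmed behavior.

```java
import java.time.Instant;

// Hypothetical sketch of an ingestion-time check on the time column.
// The bounds and the interaction with continueOnError are assumptions.
class TimeValueCheckSketch {

  // Assumed valid range, for illustration only.
  static final long MIN_VALID_MILLIS = Instant.parse("1971-01-01T00:00:00Z").toEpochMilli();
  static final long MAX_VALID_MILLIS = Instant.parse("2071-01-01T00:00:00Z").toEpochMilli();

  static boolean inValidRange(long timeMillis) {
    return timeMillis >= MIN_VALID_MILLIS && timeMillis <= MAX_VALID_MILLIS;
  }

  // Returns the value unchanged when it is in range. For an out-of-range value,
  // returns null when continueOnError is true (so that null handling can supply
  // a default later) and throws otherwise. A null input is passed through as null.
  static Long validateTimeValue(Long timeMillis, boolean continueOnError) {
    if (timeMillis == null || inValidRange(timeMillis)) {
      return timeMillis;
    }
    if (!continueOnError) {
      throw new IllegalStateException("Time value out of valid range: " + timeMillis);
    }
    return null;
  }

  public static void main(String[] args) {
    System.out.println(validateTimeValue(1_600_000_000_000L, true)); // in range
    System.out.println(validateTimeValue(0L, true));                 // 1970-01-01, out of the assumed range
  }
}
```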