Skip to content

Conversation

@KKcorps
Copy link
Contributor

@KKcorps KKcorps commented Sep 1, 2022

Currently, if there is an exception during data type transform, we simply stop the ingestion and throw the error. The PR introduces a config so that if user sets continueOnError: true in the tableConfig, we simply use null or defaultValue instead of throwing error for the particular column or record.

The PR also adds another config validateTimeValue : true that would check if the time value falls in the valid range or not during the ingestion time itself.

@mayankshriv
Copy link
Contributor

mayankshriv commented Sep 1, 2022

Can we also address time column here? We should have users control the default value when it is null or not in valid time range.

throw new RuntimeException("Caught exception while transforming data type for column: " + column, e);
} else {
LOGGER.debug("Caught exception while transforming data type for column: {}", column, e);
record.putValue(column, null);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it intended to always set null here and let NullValueTransformer (which happens after DataTypeTransformer) to change null to the default value set for nullValueHandling?It's pretty neat to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Since NullValueTransformer takes care of this, I don't want to duplicate that logic and introduce the risk of it diverging from the current logic in future patches.

@codecov-commenter
Copy link

codecov-commenter commented Sep 1, 2022

Codecov Report

Merging #9320 (2764ef0) into master (04c5a1a) will decrease coverage by 38.70%.
The diff coverage is 5.32%.

❗ Current head 2764ef0 differs from pull request most recent head 5dd8bec. Consider uploading reports for the commit 5dd8bec to get more accurate results

@@              Coverage Diff              @@
##             master    #9320       +/-   ##
=============================================
- Coverage     63.49%   24.79%   -38.71%     
+ Complexity     4991       53     -4938     
=============================================
  Files          1820     1867       +47     
  Lines         97342    99608     +2266     
  Branches      14906    15167      +261     
=============================================
- Hits          61806    24695    -37111     
- Misses        30955    72406    +41451     
+ Partials       4581     2507     -2074     
Flag Coverage Δ
integration2 24.79% <5.32%> (?)
unittests1 ?
unittests2 ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...lugin/inputformat/parquet/ParquetRecordReader.java 0.00% <0.00%> (-81.25%) ⬇️
...inputformat/parquet/ParquetRecordReaderConfig.java 0.00% <0.00%> (-54.55%) ⬇️
...pinot/plugin/inputformat/parquet/ParquetUtils.java 0.00% <0.00%> (-47.06%) ⬇️
...ot/query/runtime/executor/WorkerQueryExecutor.java 0.00% <0.00%> (-90.91%) ⬇️
...e/pinot/query/runtime/operator/FilterOperator.java 0.00% <0.00%> (ø)
...inot/query/runtime/operator/TransformOperator.java 0.00% <0.00%> (-87.15%) ⬇️
...query/runtime/operator/operands/FilterOperand.java 0.00% <0.00%> (ø)
...ery/runtime/operator/operands/FunctionOperand.java 0.00% <0.00%> (ø)
...uery/runtime/operator/operands/LiteralOperand.java 0.00% <0.00%> (ø)
...ry/runtime/operator/operands/ReferenceOperand.java 0.00% <0.00%> (ø)
... and 1542 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@KKcorps
Copy link
Contributor Author

KKcorps commented Sep 1, 2022

Can we also address time column here? We should have users control the default value when it is null or not in valid time range.

Added this change. If the config useDefaultValueOnError is enabled and timeColumn is set, we will check if each timestamp lies in the valid range or not. If not, then we set it to null and let NullValueTransform take care of replacing it with appropriate default value.

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great. With it we can catch bad data early

}
}

_useDefaultValueOnError = tableConfig.getIndexingConfig().useDefaultValueOnError();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest making it inside the IngestionConfig

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM otherwise

@KKcorps KKcorps merged commit 6047b06 into apache:master Sep 2, 2022
@Jackie-Jiang Jackie-Jiang added release-notes Referenced by PRs that need attention when compiling the next release notes Configuration Config changes (addition/deletion/change in behavior) labels Sep 3, 2022
@Jackie-Jiang
Copy link
Contributor

Can you please update the pinot doc for the changes?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Configuration Config changes (addition/deletion/change in behavior) release-notes Referenced by PRs that need attention when compiling the next release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants