[Feature][transform-v2] Data Validator Transform support #9445
@Hisoka-X PTAL
Hisoka-X
left a comment
Thanks @CosmosNi
...-v2/src/main/java/org/apache/seatunnel/transform/validator/DataValidatorTransformConfig.java
Outdated
...ansforms-v2/src/main/java/org/apache/seatunnel/transform/validator/udf/DataValidatorUDF.java
@DisabledOnContainer(
        value = {},
        type = {EngineType.SPARK, EngineType.FLINK},
        disabledReason = "Currently SPARK not support adapt")
Why are Flink and Spark disabled?
if (errorTable != null && !errorTable.isEmpty()) {
    SeaTunnelRow errorRow = inputRow.copy();
    errorRow.setTableId(errorTable);
    log.debug("Routing invalid data to error table: {}", errorTable);
    return errorRow;
If we have 1000 tables that need to be verified, do I also need to create 1000 error tables in the sink, since each table has a different schema? Can we design a standard structure so that the sink can use a single table to store all error data?
How about keeping only three fields: source_table_id, original_data, and validation_errors?
id, table_path, original_data, validation_error, create_time.
Let original_data and validation_error store their values in JSON format, so users can parse them more easily.
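To make the proposed layout concrete, here is a minimal sketch of how one shared error row could be built for any source table. It is only an illustration, not the code in this PR: the class ErrorRowSketch, its nested ErrorRow type, and the helper methods are hypothetical names, the source row is modeled as a plain Map rather than a SeaTunnelRow, and the JSON encoding is hand-rolled for null, numbers, booleans, and strings only. A real implementation would presumably use the project's own JSON utilities.

```java
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch: none of these names come from the SeaTunnel code base.
final class ErrorRowSketch {

    /** One row of the shared error table; the shape is the same for every source table. */
    static final class ErrorRow {
        final String id;
        final String tablePath;
        final String originalData;    // JSON object holding the rejected row
        final String validationError; // JSON array of validation failure messages
        final Instant createTime;

        ErrorRow(String id, String tablePath, String originalData,
                 String validationError, Instant createTime) {
            this.id = id;
            this.tablePath = tablePath;
            this.originalData = originalData;
            this.validationError = validationError;
            this.createTime = createTime;
        }
    }

    /** Wraps a string in double quotes and escapes JSON special characters. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    /** Assumption: only null, numbers, booleans, and strings need to be handled here. */
    static String toJsonValue(Object value) {
        if (value == null) {
            return "null";
        }
        if (value instanceof Number || value instanceof Boolean) {
            return value.toString();
        }
        return quote(value.toString());
    }

    /** Serializes field name/value pairs as a JSON object, keeping the map's iteration order. */
    static String toJsonObject(Map<String, Object> fields) {
        StringBuilder sb = new StringBuilder("{");
        String sep = "";
        for (Map.Entry<String, Object> e : fields.entrySet()) {
            sb.append(sep).append(quote(e.getKey())).append(':').append(toJsonValue(e.getValue()));
            sep = ",";
        }
        return sb.append('}').toString();
    }

    /** Serializes the validation messages as a JSON array of strings. */
    static String toJsonArray(List<String> messages) {
        StringBuilder sb = new StringBuilder("[");
        String sep = "";
        for (String m : messages) {
            sb.append(sep).append(quote(m));
            sep = ",";
        }
        return sb.append(']').toString();
    }

    /** Builds the schema-independent error row for a rejected source row. */
    static ErrorRow toErrorRow(String tablePath, Map<String, Object> row, List<String> errors) {
        return new ErrorRow(
                UUID.randomUUID().toString(),
                tablePath,
                toJsonObject(row),
                toJsonArray(errors),
                Instant.now());
    }
}
```

Because the source schema only appears inside the original_data JSON string, the sink needs just one error table with five fixed columns, no matter how many source tables are validated.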
...orms-v2-e2e-part-1/src/test/java/org/apache/seatunnel/e2e/transform/TestDataValidatorIT.java
@Hisoka-X PTAL
seatunnel-common/src/main/java/org/apache/seatunnel/common/exception/CommonErrorCode.java
Outdated
...sforms-v2/src/main/java/org/apache/seatunnel/transform/validator/DataValidatorTransform.java
Outdated
...ransforms-v2/src/main/java/org/apache/seatunnel/transform/common/TransformCommonOptions.java
Outdated
| name             | type   | required | default value |
|------------------|--------|----------|---------------|
| error_handle_way | enum   | no       | FAIL          |
| error_table      | string | no       |               |
Please update the doc too.
| name             | type   | required | default value |
|------------------|--------|----------|---------------|
| error_handle_way | enum   | no       | FAIL          |
Suggested change:
- | error_handle_way | enum | no | FAIL |
+ | row_error_handle_way | enum | no | FAIL |
Waiting for the test cases to pass.
close #9413
Purpose of this pull request
Does this PR introduce any user-facing change?
How was this patch tested?
Check list
New License Guide