[Feature][Sink] File support new format: maxwell_json,canal_json,debezium_json (#9278) #9336
…bezium_json Signed-off-by: dyp12 <[email protected]>
Signed-off-by: dyp12 <[email protected]>
Pull Request Overview
This pull request adds support for three new JSON file formats—maxwell_json, canal_json, and debezium_json—by introducing new serialization schemas and write strategies, and updating the file format configuration.
- Added overloaded constructors with Charset support in the serialization schema classes
- Created new write strategy classes for maxwell, canal, and debezium JSON formats
- Updated the FileFormat enum to include new file format options
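As a rough illustration of the first bullet, the sketch below shows one way a Charset-accepting overload can sit next to an existing constructor. `SketchJsonSerializationSchema` is a hypothetical stand-in, not the SeaTunnel class, and the UTF-8 default is an assumption rather than something this PR states.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a JSON serialization schema; not the SeaTunnel class.
final class SketchJsonSerializationSchema {
    private final String[] fieldNames;
    private final Charset charset;

    // Existing-style constructor: delegates with an assumed UTF-8 default.
    SketchJsonSerializationSchema(String[] fieldNames) {
        this(fieldNames, StandardCharsets.UTF_8);
    }

    // Overload in the spirit of the PR: the caller chooses the output encoding.
    SketchJsonSerializationSchema(String[] fieldNames, Charset charset) {
        this.fieldNames = fieldNames.clone();
        this.charset = charset;
    }

    // Encodes one row as a flat JSON object of string values (no escaping, for brevity).
    byte[] serialize(String[] values) {
        StringBuilder sb = new StringBuilder("{");
        for (int i = 0; i < fieldNames.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append('"').append(fieldNames[i]).append("\":\"").append(values[i]).append('"');
        }
        return sb.append('}').toString().getBytes(charset);
    }
}
```

Delegating from the short constructor to the long one keeps existing callers unchanged while letting a file writer pass its configured encoding.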
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| seatunnel-formats/seatunnel-format-json/src/main/java/org/apache/seatunnel/format/json/maxwell/MaxWellJsonSerializationSchema.java | Added an overloaded constructor accepting Charset for maxwell JSON serialization |
| seatunnel-formats/seatunnel-format-json/src/main/java/org/apache/seatunnel/format/json/debezium/DebeziumJsonSerializationSchema.java | Added an overloaded constructor accepting Charset for debezium JSON serialization |
| seatunnel-formats/seatunnel-format-json/src/main/java/org/apache/seatunnel/format/json/canal/CanalJsonSerializationSchema.java | Added an overloaded constructor accepting Charset for canal JSON serialization |
| seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/sink/writer/MaxWellJsonWriteStrategy.java | New write strategy implementation for maxwell JSON format |
| seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/sink/writer/DebeziumJsonWriteStrategy.java | New write strategy implementation for debezium JSON format |
| seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/sink/writer/CanalJsonWriteStrategy.java | New write strategy implementation for canal JSON format |
| seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/config/FileFormat.java | Updated the file format enum to include new JSON file formats |
Comments suppressed due to low confidence (3)
seatunnel-formats/seatunnel-format-json/src/main/java/org/apache/seatunnel/format/json/maxwell/MaxWellJsonSerializationSchema.java:35
- [nitpick] Consider renaming the class to 'MaxwellJsonSerializationSchema' to maintain consistent naming with the file format 'maxwell_json'.
public class MaxWellJsonSerializationSchema implements SerializationSchema {
seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/config/FileFormat.java:155
- Consider updating the exception message to 'file format debezium json type does not support reading' for improved clarity.
throw new UnsupportedOperationException("file format debezium json type not support read");
seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/config/FileFormat.java:168
- Consider updating this exception message to 'file format maxwell json type does not support reading' for consistency and clarity.
throw new UnsupportedOperationException("file format maxwell json type not support read");
    @Override
    public ReadStrategy getReadStrategy() {
        throw new UnsupportedOperationException("file format canal json type not support read");
Copilot
AI
May 20, 2025
Consider updating the exception message to improve clarity and grammar, e.g., 'file format canal json type does not support reading'.
Suggested change:
- throw new UnsupportedOperationException("file format canal json type not support read");
+ throw new UnsupportedOperationException("File format 'canal_json' does not support reading.");
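The review flags the same wording problem in three separate exception messages. A minimal sketch of one way to keep them consistent is to build the message from the format name in a single place. `SketchFileFormat` and `SketchWriteStrategy` are hypothetical names, and the string return type of `getReadStrategy` is a simplification; this is not the layout of SeaTunnel's `FileFormat` enum.

```java
// Hypothetical, simplified model of a write-only file format enum; not SeaTunnel's FileFormat.
interface SketchWriteStrategy {
    String describe();
}

enum SketchFileFormat {
    JSON("json", true),
    CANAL_JSON("canal_json", false),
    DEBEZIUM_JSON("debezium_json", false),
    MAXWELL_JSON("maxwell_json", false);

    private final String formatName;
    private final boolean readable;

    SketchFileFormat(String formatName, boolean readable) {
        this.formatName = formatName;
        this.readable = readable;
    }

    // Every format in this sketch can be written.
    SketchWriteStrategy getWriteStrategy() {
        return () -> formatName + " writer";
    }

    // Write-only formats fail with one uniformly worded message.
    String getReadStrategy() {
        if (!readable) {
            throw new UnsupportedOperationException(
                    "File format '" + formatName + "' does not support reading.");
        }
        return formatName + " reader";
    }
}
```

With the message assembled from `formatName`, the canal, debezium, and maxwell variants cannot drift apart in wording.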
Waiting for CI to pass.
Signed-off-by: dyp12 <[email protected]>
Pull Request Overview
This pull request adds support for new file formats (maxwell_json, canal_json, debezium_json) to the File sink.
- Introduces new constructors accepting Charset in the JSON serialization schemas.
- Implements new write strategies for maxwell_json, canal_json, and debezium_json in the file connector.
- Extends the FileFormat enum to include entries for the new JSON formats.
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| seatunnel-formats/seatunnel-format-json/src/main/java/org/apache/seatunnel/format/json/maxwell/MaxWellJsonSerializationSchema.java | Added a new constructor to support Charset for maxwell_json serialization. |
| seatunnel-formats/seatunnel-format-json/src/main/java/org/apache/seatunnel/format/json/debezium/DebeziumJsonSerializationSchema.java | Added a new constructor to support Charset for debezium_json serialization. |
| seatunnel-formats/seatunnel-format-json/src/main/java/org/apache/seatunnel/format/json/canal/CanalJsonSerializationSchema.java | Added a new constructor to support Charset for canal_json serialization. |
| seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/sink/writer/*.java | Implements new write strategies for maxwell_json, debezium_json, and canal_json formats with a similar approach to existing JSON strategies. |
| seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/config/FileFormat.java | Extended the enum to handle the new JSON formats with appropriate write strategies and read strategy exceptions. |
Comments suppressed due to low confidence (1)
seatunnel-formats/seatunnel-format-json/src/main/java/org/apache/seatunnel/format/json/maxwell/MaxWellJsonSerializationSchema.java:53
- [nitpick] The class and file name 'MaxWellJsonSerializationSchema' use a mixed casing ('MaxWell') which might be inconsistent with the naming of the supported format ('maxwell_json') in the enum. Consider reviewing the naming for consistency.
public MaxWellJsonSerializationSchema(SeaTunnelRowType rowType, Charset charset) {
@hailin0 SqlServerSchemaChangeIT>AbstractSchemaChangeBaseIT.testMysqlCdcWithSchemaEvolutionCaseExactlyOnce:291->AbstractSchemaChangeBaseIT.assertSchemaEvolution:296 » ConditionTimeout
Just retry it; CI is unstable.
Signed-off-by: dyp12 <[email protected]>
Hisoka-X
left a comment
Please update the docs for all file-series connectors. Thanks.
Sorry, I will update them.
Signed-off-by: dyp12 <[email protected]>
Hisoka-X
left a comment
Please add a test case. You can refer to line 52 in fbf7872:
    public class LocalFileTest {
Signed-off-by: dyp12 <[email protected]>
Hisoka-X
left a comment
LGTM if CI passes. Thanks @dyp12!
Try CI again.
Please rebase on dev. Already fixed in #9360.
CI passes. Thank you for your help.
Thank you very much for your contribution. A note on the debezium and maxwell formats: in practice, these formats need a time field (ts_ms) that indicates when the record was generated, which makes it convenient to obtain the state of the latest record. In full Debezium JSON, the source section is also very useful, for example when a file holds changes from multiple tables for batch Change Data Capture (CDC). It is recommended to add at least the "ts" field. If the other information is available, it is recommended to keep it as well.
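To make the request concrete, here is a minimal sketch of a Debezium-style envelope that carries both `ts_ms` and a `source` block. The field names `before`, `after`, `source`, `op`, and `ts_ms` follow Debezium's documented change event layout; `SketchDebeziumEnvelope` and its parameters are hypothetical, and this is not what the sink in this PR emits.

```java
// Hypothetical helper that assembles a Debezium-style change event by hand.
// Values are assumed to need no JSON escaping, to keep the sketch short.
final class SketchDebeziumEnvelope {
    private SketchDebeziumEnvelope() {}

    static String build(String db, String table, String op, String afterJson, long tsMs) {
        return "{\"before\":null"
                + ",\"after\":" + afterJson
                // source identifies the originating table, useful for multi-table files
                + ",\"source\":{\"db\":\"" + db + "\",\"table\":\"" + table + "\"}"
                + ",\"op\":\"" + op + "\""
                // ts_ms records when the event was generated
                + ",\"ts_ms\":" + tsMs + "}";
    }
}
```

A caller could pass `System.currentTimeMillis()` as `tsMs` when no upstream event time is available; whether that is the right fallback is a design question for the follow-up issue.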
Please create a new issue to improve it.
Purpose of this pull request
[Feature][Sink] File support new format: maxwell_json,canal_json,debezium_json (close #9278)
Does this PR introduce any user-facing change?
How was this patch tested?