@dyp12
Contributor

@dyp12 dyp12 commented May 19, 2025

Purpose of this pull request

[Feature][Sink] File support new format: maxwell_json, canal_json, debezium_json (close #9278)

Does this PR introduce any user-facing change?

How was this patch tested?

Check list

@nielifeng nielifeng requested a review from Copilot May 20, 2025 01:24
Contributor

Copilot AI left a comment

Pull Request Overview

This pull request adds support for three new JSON file formats—maxwell_json, canal_json, and debezium_json—by introducing new serialization schemas and write strategies, and updating the file format configuration.

  • Added overloaded constructors with Charset support in the serialization schema classes
  • Created new write strategy classes for maxwell, canal, and debezium JSON formats
  • Updated the FileFormat enum to include new file format options
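The overloaded-constructor pattern described in the first bullet can be sketched as follows. This is a minimal illustration, not the SeaTunnel code: the class name `CharsetAwareSerializer` and its `serialize` method are hypothetical, and the UTF-8 default is an assumption made for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a JSON serialization schema; not the SeaTunnel class.
final class CharsetAwareSerializer {
    private final Charset charset;

    // Original-style constructor: existing call sites keep working.
    // Defaulting to UTF-8 is an assumption of this sketch.
    CharsetAwareSerializer() {
        this(StandardCharsets.UTF_8);
    }

    // Overloaded constructor: the caller (for example a file sink) picks the encoding.
    CharsetAwareSerializer(Charset charset) {
        this.charset = charset;
    }

    // Encodes an already-built JSON line with the configured charset.
    byte[] serialize(String jsonLine) {
        return jsonLine.getBytes(charset);
    }
}
```

Delegating from the no-argument constructor to the new one keeps a single place where the charset is stored, so the two construction paths cannot drift apart.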

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

File | Description
seatunnel-formats/seatunnel-format-json/src/main/java/org/apache/seatunnel/format/json/maxwell/MaxWellJsonSerializationSchema.java | Added an overloaded constructor accepting Charset for maxwell JSON serialization
seatunnel-formats/seatunnel-format-json/src/main/java/org/apache/seatunnel/format/json/debezium/DebeziumJsonSerializationSchema.java | Added an overloaded constructor accepting Charset for debezium JSON serialization
seatunnel-formats/seatunnel-format-json/src/main/java/org/apache/seatunnel/format/json/canal/CanalJsonSerializationSchema.java | Added an overloaded constructor accepting Charset for canal JSON serialization
seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/sink/writer/MaxWellJsonWriteStrategy.java | New write strategy implementation for maxwell JSON format
seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/sink/writer/DebeziumJsonWriteStrategy.java | New write strategy implementation for debezium JSON format
seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/sink/writer/CanalJsonWriteStrategy.java | New write strategy implementation for canal JSON format
seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/config/FileFormat.java | Updated the file format enum to include new JSON file formats
Comments suppressed due to low confidence (3)

seatunnel-formats/seatunnel-format-json/src/main/java/org/apache/seatunnel/format/json/maxwell/MaxWellJsonSerializationSchema.java:35

  • [nitpick] Consider renaming the class to 'MaxwellJsonSerializationSchema' to maintain consistent naming with the file format 'maxwell_json'.
public class MaxWellJsonSerializationSchema implements SerializationSchema {

seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/config/FileFormat.java:155

  • Consider updating the exception message to 'file format debezium json type does not support reading' for improved clarity.
throw new UnsupportedOperationException("file format debezium json type not support read");

seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/config/FileFormat.java:168

  • Consider updating this exception message to 'file format maxwell json type does not support reading' for consistency and clarity.
throw new UnsupportedOperationException("file format maxwell json type not support read");


@Override
public ReadStrategy getReadStrategy() {
    throw new UnsupportedOperationException("file format canal json type not support read");

Copilot AI May 20, 2025

Consider updating the exception message to improve clarity and grammar, e.g., 'file format canal json type does not support reading'.

Suggested change
- throw new UnsupportedOperationException("file format canal json type not support read");
+ throw new UnsupportedOperationException("File format 'canal_json' does not support reading.");
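One way to keep these messages consistent across all write-only formats is to build them from the format name in a single place. The sketch below is an illustration under stated assumptions: `SketchFileFormat` and `describeReadStrategy` are hypothetical names, and the real `FileFormat` enum returns strategy objects rather than strings.

```java
// Hypothetical, simplified enum; not the SeaTunnel FileFormat enum.
enum SketchFileFormat {
    JSON("json", true),
    CANAL_JSON("canal_json", false),
    DEBEZIUM_JSON("debezium_json", false),
    MAXWELL_JSON("maxwell_json", false);

    private final String formatName;
    private final boolean readable;

    SketchFileFormat(String formatName, boolean readable) {
        this.formatName = formatName;
        this.readable = readable;
    }

    // Returns a placeholder description of the reader, or fails with a
    // uniformly worded message for formats that can only be written.
    String describeReadStrategy() {
        if (!readable) {
            throw new UnsupportedOperationException(
                    "File format '" + formatName + "' does not support reading.");
        }
        return formatName + " reader";
    }
}
```

With this shape, the three new formats produce the same sentence and differ only in the quoted format name.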

@hailin0
Member

hailin0 commented May 20, 2025

Waiting for CI to pass.

Signed-off-by: dyp12 <[email protected]>
@nielifeng nielifeng requested a review from Copilot May 20, 2025 07:35
Contributor

Copilot AI left a comment

Pull Request Overview

This pull request adds support for new file formats (maxwell_json, canal_json, debezium_json) to the File sink.

  • Introduces new constructors accepting Charset in the JSON serialization schemas.
  • Implements new write strategies for maxwell_json, canal_json, and debezium_json in the file connector.
  • Extends the FileFormat enum to include entries for the new JSON formats.

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

File | Description
seatunnel-formats/seatunnel-format-json/src/main/java/org/apache/seatunnel/format/json/maxwell/MaxWellJsonSerializationSchema.java | Added a new constructor to support Charset for maxwell_json serialization.
seatunnel-formats/seatunnel-format-json/src/main/java/org/apache/seatunnel/format/json/debezium/DebeziumJsonSerializationSchema.java | Added a new constructor to support Charset for debezium_json serialization.
seatunnel-formats/seatunnel-format-json/src/main/java/org/apache/seatunnel/format/json/canal/CanalJsonSerializationSchema.java | Added a new constructor to support Charset for canal_json serialization.
seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/sink/writer/*.java | Implements new write strategies for maxwell_json, debezium_json, and canal_json formats with a similar approach to existing JSON strategies.
seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/config/FileFormat.java | Extended the enum to handle the new JSON formats with appropriate write strategies and read strategy exceptions.
Comments suppressed due to low confidence (1)

seatunnel-formats/seatunnel-format-json/src/main/java/org/apache/seatunnel/format/json/maxwell/MaxWellJsonSerializationSchema.java:53

  • [nitpick] The class and file name 'MaxWellJsonSerializationSchema' use a mixed casing ('MaxWell') which might be inconsistent with the naming of the supported format ('maxwell_json') in the enum. Consider reviewing the naming for consistency.
public MaxWellJsonSerializationSchema(SeaTunnelRowType rowType, Charset charset) {

@dyp12
Contributor Author

dyp12 commented May 20, 2025

Waiting for CI to pass.

@hailin0 SqlServerSchemaChangeIT>AbstractSchemaChangeBaseIT.testMysqlCdcWithSchemaEvolutionCaseExactlyOnce:291->AbstractSchemaChangeBaseIT.assertSchemaEvolution:296 » ConditionTimeout
Do I need to set something up? It always fails because of a timeout.

@liunaijie
Member

Waiting for CI to pass.

@hailin0 SqlServerSchemaChangeIT>AbstractSchemaChangeBaseIT.testMysqlCdcWithSchemaEvolutionCaseExactlyOnce:291->AbstractSchemaChangeBaseIT.assertSchemaEvolution:296 » ConditionTimeout Do I need to set something up? It always fails because of a timeout.

Just retry it; CI is unstable.

dyp12 added 2 commits May 21, 2025 19:38
Signed-off-by: dyp12 <[email protected]>
Signed-off-by: dyp12 <[email protected]>
Member

@Hisoka-X Hisoka-X left a comment

Please update all file series docs. Thanks

@dyp12
Contributor Author

dyp12 commented May 22, 2025

Please update all file series docs. Thanks

Sorry, I will update them.

Signed-off-by: dyp12 <[email protected]>
Member

@Hisoka-X Hisoka-X left a comment

Signed-off-by: dyp12 <[email protected]>
Member

@Hisoka-X Hisoka-X left a comment

LGTM if CI passes. Thanks @dyp12!

@corgy-w
Contributor

corgy-w commented May 26, 2025

Try CI again.

@dyp12
Contributor Author

dyp12 commented May 27, 2025

Try CI again.

i rea

I have re-run the failed jobs many times, but jdbc-connectors-it-part-7 always fails.

@Hisoka-X
Member

Please rebase on dev. Already fixed in #9360

@dyp12
Contributor Author

dyp12 commented May 27, 2025

Please rebase on dev. Already fixed in #9360

CI passes. Thank you for your help.

@hailin0 hailin0 merged commit a1bfbb2 into apache:dev May 27, 2025
4 checks passed
dybyte pushed a commit to dybyte/seatunnel that referenced this pull request Jul 23, 2025
@wubx

wubx commented Aug 7, 2025

Thank you very much for your contribution.
I have tested this PR:

debezium format
update t01 set c1='debezium_json' where id=7;

{"before":{"id":7,"c1":"test"},"after":null,"op":"d"}
{"before":null,"after":{"id":7,"c1":"debezium_json"},"op":"c"}
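The two lines above show one UPDATE written as a delete of the old row followed by a create of the new row. A minimal sketch that reproduces that shape is given below. `UpdateSplitSketch` and `splitUpdate` are hypothetical names for illustration, and the row values are passed in as ready-made JSON strings rather than built from a real row type.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not the SeaTunnel serialization schema.
final class UpdateSplitSketch {
    // Builds the two Debezium-style lines seen in the test output for one UPDATE:
    // a delete ("d") carrying the old row, then a create ("c") carrying the new row.
    static List<String> splitUpdate(String beforeJson, String afterJson) {
        List<String> lines = new ArrayList<>();
        lines.add("{\"before\":" + beforeJson + ",\"after\":null,\"op\":\"d\"}");
        lines.add("{\"before\":null,\"after\":" + afterJson + ",\"op\":\"c\"}");
        return lines;
    }
}
```

Because neither line carries a timestamp or an update marker, a reader cannot tell from a single line that the pair belongs to one UPDATE.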

maxwell format
update t01 set c1='debezium_json10' where id=7;

{"data":{"id":7,"c1":"debezium_json"},"type":"DELETE"}
{"data":{"id":7,"c1":"debezium_json10"},"type":"INSERT"}

From a usage standpoint, this format needs a time field (ts_ms) that indicates when each record was generated. That makes it easy to determine the latest state of a record.
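To illustrate why a timestamp helps, the sketch below keeps only the most recent change per primary key. It assumes the records have already been parsed and that each one carries a `ts_ms` value, which the current output does not include. `LatestRecordSketch`, `Change`, and `latestPerKey` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class LatestRecordSketch {
    // Hypothetical, already-parsed change record: primary key, operation, event time.
    static final class Change {
        final int id;
        final String op;
        final long tsMs;

        Change(int id, String op, long tsMs) {
            this.id = id;
            this.op = op;
            this.tsMs = tsMs;
        }
    }

    // Keeps, for each key, the change with the largest ts_ms.
    // On equal timestamps the record that appears later in the list wins.
    static Map<Integer, Change> latestPerKey(List<Change> changes) {
        Map<Integer, Change> latest = new LinkedHashMap<>();
        for (Change change : changes) {
            Change seen = latest.get(change.id);
            if (seen == null || change.tsMs >= seen.tsMs) {
                latest.put(change.id, change);
            }
        }
        return latest;
    }
}
```

Without the timestamp, the only ordering signal is the position of a line within a file, which is lost once several files are read together.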

In the full Debezium JSON, the source section is also very useful. If the file includes the source section, it can support batch Change Data Capture (CDC) across multiple tables.

{
    "before": {
        "id": 111,
        "name": "scooter",
        "description": "Big 2-wheel scooter ",
        "weight": 5.18
    },
    "after": {
        "id": 111,
        "name": "scooter",
        "description": "Big 2-wheel scooter ",
        "weight": 5.17
    },
    "source": {
        "version": "1.1.1.Final",
        "connector": "mysql",
        "name": "dbserver1",
        "ts_ms": 1589362330000,
        "snapshot": "false",
        "db": "inventory",
        "table": "products",
        "server_id": 223344,
        "gtid": null,
        "file": "mysql-bin.000003",
        "pos": 2090,
        "row": 0,
        "thread": 2,
        "query": null
    },
    "op": "u",
    "ts_ms": 1589362330904,
    "transaction": null
}
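As a rough illustration of the multi-table use case, the sketch below groups records by the `db` and `table` values from the source section. It assumes the JSON has already been parsed into a small holder object. `SourceRoutingSketch`, `Event`, and `groupByTable` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SourceRoutingSketch {
    // Hypothetical, already-parsed record: the source fields used for routing plus the raw line.
    static final class Event {
        final String db;
        final String table;
        final String rawJson;

        Event(String db, String table, String rawJson) {
            this.db = db;
            this.table = table;
            this.rawJson = rawJson;
        }
    }

    // Groups raw lines under a "db.table" key, preserving the order of arrival.
    static Map<String, List<String>> groupByTable(List<Event> events) {
        Map<String, List<String>> byTable = new LinkedHashMap<>();
        for (Event event : events) {
            String key = event.db + "." + event.table;
            byTable.computeIfAbsent(key, k -> new ArrayList<>()).add(event.rawJson);
        }
        return byTable;
    }
}
```

For the record above, the key would be `inventory.products`. Records that lack a source section cannot be routed this way.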

At a minimum, I recommend adding the "ts" field. If the other information is already available, I recommend keeping it as well.

@wubx

wubx commented Aug 7, 2025

I created a new issue to improve this:
#9675

Development

Successfully merging this pull request may close these issues.

[Feature][Sink] S3File support new format: maxwell_json

6 participants