[Bug] [Zeta Engine] the checkpoint lock cause checkpoint-flow blocking with long time #5694

@happyboy1024

Search before asking

  • I had searched in the issues and found no similar issues.

What happened

I want to stop a synchronization task by triggering a savepoint. However, the task always ends only after the data synchronization is complete. I traced the logs and found that the checkpoint process triggered by the savepoint always fails when it tries to obtain the checkpointLock.

Note that my task is running in an environment with a single CPU core and 4 GB of memory.

After analyzing the problem, I found that checkpointLock is acquired through synchronized, and synchronized is an unfair lock: it gives no ordering guarantee among waiting threads. In a single-core environment, thread starvation is more likely under high CPU load, so the checkpoint flow fails to obtain the checkpointLock.
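To illustrate the kind of change I have in mind, here is a minimal sketch that guards the shared section with a fair `java.util.concurrent.locks.ReentrantLock` instead of `synchronized`. The class and method names (`FairCheckpointLockSketch`, `emitRecord`, `handleBarrier`) are hypothetical and are not taken from the SeaTunnel code base; this only shows the locking pattern, not a proposed patch.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only; not SeaTunnel's actual classes or methods.
final class FairCheckpointLockSketch {
    // Passing 'true' requests the fair ordering policy: under contention,
    // the lock favors the longest-waiting thread.
    private final ReentrantLock checkpointLock = new ReentrantLock(true);
    private long recordsEmitted = 0;
    private long barriersHandled = 0;

    // Stands in for the data thread's per-record work.
    void emitRecord() {
        checkpointLock.lock();
        try {
            recordsEmitted++;
        } finally {
            checkpointLock.unlock();
        }
    }

    // Stands in for the checkpoint thread handling a savepoint barrier.
    void handleBarrier() {
        checkpointLock.lock();
        try {
            barriersHandled++;
        } finally {
            checkpointLock.unlock();
        }
    }

    boolean isFair() {
        return checkpointLock.isFair();
    }

    long recordsEmitted() {
        checkpointLock.lock();
        try {
            return recordsEmitted;
        } finally {
            checkpointLock.unlock();
        }
    }

    long barriersHandled() {
        checkpointLock.lock();
        try {
            return barriersHandled;
        } finally {
            checkpointLock.unlock();
        }
    }
}
```

A fair lock usually has lower throughput than an unfair one, so this trades some speed on the data path for a bounded wait on the checkpoint path. Also note that the untimed `tryLock()` does not honor the fairness setting.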

SeaTunnel Version

2.3.3

SeaTunnel Config

env {
    job.mode=BATCH
    job.name=DEMO
}
source {
    Jdbc {
        url="jdbc:mysql://xxxx/transfer_source"
        driver="com.mysql.cj.jdbc.Driver"
        user="root"
        password="xxxx"
        query="select * from order_info"
        partition_column=id
        partition_num=20
        parallelism=2
    }
}
transform {
}
sink {
    Jdbc {
        url="jdbc:mysql://xxxx/transfer_sink?rewriteBatchedStatements=true"
        driver="com.mysql.cj.jdbc.Driver"
        user="root"
        password="xxxx"
        database="transfer_sink"
        table="order_info_sink"
        batch_size=1000
        enable_upsert=true
        generate_sink_sql=true
        primary_keys = [id]
        query = ""
    }
}

Running Command

./bin/seatunnel-local.sh -c config/savepoint.config

./bin/seatunnel-local.sh -s {jobid}

Error Exception

no exception

Zeta or Flink or Spark Version

zeta

Java or Scala Version

1.8

Screenshots

The main thread acquires the checkpoint lock here:

image

The checkpoint process tries to acquire the checkpoint lock here:

image

When the savepoint is triggered, the main thread is executing pollNext. The checkpoint thread is blocked for a long time at the location marked in the first screenshot, until the main thread has completed.
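The behavior I would expect instead can be sketched as a small standalone program. A data thread repeatedly takes the lock for each pollNext-style iteration, and a checkpoint thread queues for the same lock while the first iteration is in flight. Everything here (`SavepointHandoffSketch`, `iterationsBeforeBarrier`) is a hypothetical stand-in, not SeaTunnel code. It assumes a fair `ReentrantLock`, under which the data thread has to queue behind the waiting checkpoint thread when it comes back for its next iteration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only; not SeaTunnel's actual classes or methods.
final class SavepointHandoffSketch {

    // Returns how many data iterations had completed when the checkpoint
    // thread finally obtained the lock.
    static int iterationsBeforeBarrier(int totalIterations) {
        if (totalIterations < 1) {
            throw new IllegalArgumentException("totalIterations must be >= 1");
        }
        final ReentrantLock checkpointLock = new ReentrantLock(true); // fair
        final AtomicInteger completed = new AtomicInteger();
        final AtomicInteger seenByBarrier = new AtomicInteger(-1);
        final CountDownLatch firstIterationRunning = new CountDownLatch(1);

        Thread checkpointThread = new Thread(() -> {
            try {
                // The "savepoint" arrives while the data thread holds the lock.
                firstIterationRunning.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            checkpointLock.lock(); // queues behind the current holder
            try {
                seenByBarrier.set(completed.get());
            } finally {
                checkpointLock.unlock();
            }
        });
        checkpointThread.start();

        // Stand-in for the data thread's loop of pollNext-style calls.
        for (int i = 0; i < totalIterations; i++) {
            checkpointLock.lock();
            try {
                if (i == 0) {
                    firstIterationRunning.countDown();
                    // Keep holding the lock until the checkpoint thread is queued.
                    while (!checkpointLock.hasQueuedThread(checkpointThread)) {
                        Thread.yield();
                    }
                }
                completed.incrementAndGet();
            } finally {
                checkpointLock.unlock();
            }
        }

        try {
            checkpointThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seenByBarrier.get();
    }
}
```

With the fair lock, `SavepointHandoffSketch.iterationsBeforeBarrier(1000)` returns 1: the checkpoint thread gets the lock as soon as the in-flight iteration releases it. With `synchronized` there is no such guarantee, which matches what I observe on the single-core machine.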

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct
