Description
Search before asking
- I have searched the issues and found no similar issues.
What happened
1. Create a shell task and configure the timeout failure strategy
2. Manually kill the task. The logs show that the kill succeeds (the cancelApplication method is called only once).
2025-08-15 13:49:33.105 INFO [WorkerRpcServer-methodInvoker-224] - Publish TaskExecutorKillLifecycleEvent: {
"taskInstanceId" : 1081,
"eventCreateTime" : 1755236973105,
"type" : "KILL"
}
2025-08-15 13:49:33.147 INFO [PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-87] - Begin killing task instance, processId: 749659
2025-08-15 13:49:33.449 INFO [PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-87] - prepare to parse pid, raw pid string: sudo(749659)---1081.sh(749674)---sleep(749748)
2025-08-15 13:49:34.003 INFO [PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-87] - Sending SIGINT to process group: 749659 749674 749748, command: sudo -u dolphinscheduler -i kill -s SIGINT 749659 749674 749748
2025-08-15 13:49:44.992 INFO [PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-87] - Kill command: sudo -u dolphinscheduler -i kill -s SIGINT 749659 749674 749748, timed out, still running PIDs: 749659 749674 749748
2025-08-15 13:49:45.545 INFO [PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-87] - Sending SIGTERM to process group: 749659 749674 749748, command: sudo -u dolphinscheduler -i kill -s SIGTERM 749659 749674 749748
2025-08-15 13:49:46.253 INFO [PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-87] - Successfully killed process tree using SIGTERM, processId: 749659
2025-08-15 13:49:46.254 INFO [PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-87] - Process tree for task: 1081 is killed or already finished, pid: 749659
2025-08-15 13:49:46.254 INFO [PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-87] - Get appIds from worker 192.168.30.121:1234, taskLogPath: /data01/dolphinscheduler/20250815/149143631011392/1/1015/1081.log
2025-08-15 13:49:46.254 INFO [PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-87] - Start finding appId in /data01/dolphinscheduler/20250815/149143631011392/1/1015/1081.log, fetch way: log
2025-08-15 13:49:46.254 INFO [PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-87] - The appId is empty
2025-08-15 13:49:46.254 INFO [PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-87] - Success fire TaskExecutorKillLifecycleEvent: {
"taskInstanceId" : 1081,
"eventCreateTime" : 1755236973105,
"type" : "KILL"
}
2025-08-15 13:49:46.360 INFO [exclusive-task-executor-container-worker-0] - process has exited. execute path:/data01/dolphinscheduler/exec/process/1081, processId:749659 ,exitStatusCode:143 ,processWaitForStatus:true ,processExitValue:143
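3. However, when the task is killed because of the timeout, an exception is thrown. The cancelApplication method is called twice: once for the kill event published via WorkerRpcServer, and again by the timeout handler on exclusive-task-executor-container-worker-0, by which point the raw pid string is empty.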
2025-08-15 16:55:37.289 INFO [WorkerRpcServer-methodInvoker-31] - Publish TaskExecutorKillLifecycleEvent: {
"taskInstanceId" : 1084,
"eventCreateTime" : 1755248137289,
"type" : "KILL"
}
2025-08-15 16:55:37.333 INFO [PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-0] - Begin killing task instance, processId: 837363
2025-08-15 16:55:37.730 INFO [PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-0] - prepare to parse pid, raw pid string: sudo(837363)---1084.sh(837379)---sleep(837453)
2025-08-15 16:55:38.316 INFO [PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-0] - Sending SIGINT to process group: 837363 837379 837453, command: sudo -u dolphinscheduler -i kill -s SIGINT 837363 837379 837453
2025-08-15 16:55:49.325 INFO [PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-0] - Kill command: sudo -u dolphinscheduler -i kill -s SIGINT 837363 837379 837453, timed out, still running PIDs: 837363 837379 837453
2025-08-15 16:55:49.876 INFO [PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-0] - Sending SIGTERM to process group: 837363 837379 837453, command: sudo -u dolphinscheduler -i kill -s SIGTERM 837363 837379 837453
2025-08-15 16:55:50.166 ERROR [exclusive-task-executor-container-worker-0] - process has failure, the task timeout configuration value is:60, ready to kill ...
2025-08-15 16:55:50.167 INFO [exclusive-task-executor-container-worker-0] - Begin killing task instance, processId: 837363
2025-08-15 16:55:50.566 INFO [exclusive-task-executor-container-worker-0] - prepare to parse pid, raw pid string:
2025-08-15 16:55:50.567 ERROR [exclusive-task-executor-container-worker-0] - Kill task instance error, processId: 837363
java.lang.NumberFormatException: For input string: ""
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:592)
at java.lang.Integer.parseInt(Integer.java:615)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
at org.apache.dolphinscheduler.plugin.task.api.utils.ProcessUtils.kill(ProcessUtils.java:124)
at org.apache.dolphinscheduler.plugin.task.api.AbstractCommandExecutor.cancelApplication(AbstractCommandExecutor.java:216)
at org.apache.dolphinscheduler.plugin.task.api.AbstractCommandExecutor.run(AbstractCommandExecutor.java:196)
at org.apache.dolphinscheduler.plugin.task.shell.ShellTask.handle(ShellTask.java:85)
at org.apache.dolphinscheduler.server.worker.executor.PhysicalTaskExecutor.doTriggerTaskPlugin(PhysicalTaskExecutor.java:74)
at org.apache.dolphinscheduler.task.executor.AbstractTaskExecutor.start(AbstractTaskExecutor.java:80)
at org.apache.dolphinscheduler.task.executor.worker.TaskExecutorWorker.start(TaskExecutorWorker.java:65)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
2025-08-15 16:55:50.567 ERROR [exclusive-task-executor-container-worker-0] - Failed to kill process tree for task: 1084, pid: 837363
2025-08-15 16:55:50.567 INFO [exclusive-task-executor-container-worker-0] - Get appIds from worker 192.168.30.121:1234, taskLogPath: /data01/dolphinscheduler/20250815/149143631011392/1/1018/1084.log
2025-08-15 16:55:50.567 INFO [exclusive-task-executor-container-worker-0] - Start finding appId in /data01/dolphinscheduler/20250815/149143631011392/1/1018/1084.log, fetch way: log
2025-08-15 16:55:50.567 INFO [exclusive-task-executor-container-worker-0] - The appId is empty
2025-08-15 16:55:50.568 INFO [exclusive-task-executor-container-worker-0] - process has exited. execute path:/data01/dolphinscheduler/exec/process/1084, processId:837363 ,exitStatusCode:-1 ,processWaitForStatus:false ,processExitValue:143
2025-08-15 16:55:50.568 INFO [exclusive-task-executor-container-worker-0] - Publish TaskExecutorFailedLifecycleEvent: {
"taskInstanceId" : 1084,
"eventCreateTime" : 1755248150568,
"type" : "FAILED",
"workflowInstanceId" : 1018,
"workflowInstanceHost" : "192.168.30.11:5678",
"taskInstanceHost" : "192.168.30.121:1234",
"appIds" : "",
"endTime" : 1755248150568,
"latestReportTime" : null
}
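The stack trace points at PID parsing inside ProcessUtils.kill: on the second call the raw pid string is empty, and Integer.parseInt("") throws NumberFormatException. As a minimal sketch (not the project's code), assume a hypothetical helper parsePids that extracts the number from each name(pid) node with a regex, so an empty or blank string yields an empty list instead of an exception.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class PidParser {

    // Matches the numeric PID inside each "name(pid)" node of pstree-style output,
    // e.g. "sudo(837363)---1084.sh(837379)---sleep(837453)".
    private static final Pattern PID_PATTERN = Pattern.compile("\\((\\d+)\\)");

    private PidParser() {
    }

    /**
     * Hypothetical helper: extracts PIDs from a raw pstree-style string.
     * Returns an empty list (instead of throwing NumberFormatException)
     * when the string is null or blank.
     */
    public static List<Integer> parsePids(String rawPidString) {
        List<Integer> pids = new ArrayList<>();
        if (rawPidString == null || rawPidString.trim().isEmpty()) {
            return pids;
        }
        Matcher matcher = PID_PATTERN.matcher(rawPidString);
        while (matcher.find()) {
            pids.add(Integer.parseInt(matcher.group(1)));
        }
        return pids;
    }

    public static void main(String[] args) {
        // Raw string taken from the first kill attempt in the log above.
        System.out.println(parsePids("sudo(837363)---1084.sh(837379)---sleep(837453)"));
        // Raw string from the second (timeout) kill attempt: empty.
        System.out.println(parsePids(""));
    }
}
```

With this shape, a caller could treat an empty list as "process tree already finished" and skip sending signals, rather than logging a kill failure.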
What you expected to happen
1. Killing a task on timeout should not throw an exception.
2. Ideally, the kill action should be triggered only once.
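For the second expectation, one possible approach, assuming the manual-kill path and the timeout path operate on the same executor instance, is an AtomicBoolean guard so that only the first caller actually runs the kill. KillOnceGuard and killOnce are hypothetical names used for illustration, not DolphinScheduler APIs.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public final class KillOnceGuard {

    // Flipped from false to true by the first caller only.
    private final AtomicBoolean killTriggered = new AtomicBoolean(false);

    /**
     * Hypothetical guard: runs killAction only for the first caller.
     * Later callers (e.g. a timeout handler racing a manual kill)
     * return false without running the action.
     */
    public boolean killOnce(Runnable killAction) {
        if (!killTriggered.compareAndSet(false, true)) {
            return false;
        }
        killAction.run();
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        KillOnceGuard guard = new KillOnceGuard();
        AtomicInteger kills = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        // Simulate the manual-kill thread and the timeout thread firing together.
        for (int i = 0; i < 2; i++) {
            pool.submit(() -> {
                start.await();
                guard.killOnce(kills::incrementAndGet);
                return null;
            });
        }
        start.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("kill actions run: " + kills.get());
    }
}
```

In this sketch the losing caller returns immediately; a real fix would probably also need that caller to wait for the in-flight kill to finish before reporting the task status, which is not shown here.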
How to reproduce
- Create a shell task and configure the timeout failure strategy
- Run the workflow and wait for the task to be killed after it times out
Anything else
No response
Version
dev
Are you willing to submit a PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct