[Fix-17436][Workflow]Task timeout kill throw exception #17437
...r-task-api/src/main/java/org/apache/dolphinscheduler/plugin/task/api/utils/ProcessUtils.java
6ecde23 to 48d28cc
```java
String exitLogMessage = (EXIT_CODE_KILL == exitCode || EXIT_CODE_HARD_KILL == exitCode) ? "process has killed."
        : "process has exited.";
```
It would be better to check the status in `taskExecutionContext` rather than the exit code. We don't care how the process exited; only a kill issued by DS should be treated as a kill.
This line only logs the final state of the process, i.e. whether it exited normally or was killed.
OK
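Some context for the exit-code check in the snippet above: by the common Unix convention, a process terminated by signal N reports an exit status of 128 + N. That is consistent with the `processExitValue:143` (128 + 15, SIGTERM) in the log of this PR. The sketch below decodes a value that way. `ExitStatusSketch` and `describe` are hypothetical names, not DolphinScheduler code. A status above 128 can also be a plain exit code chosen by the script, so this is only a heuristic, which is in line with the reviewer's point that the exit code alone does not show who stopped the process.

```java
public final class ExitStatusSketch {

    private ExitStatusSketch() {
    }

    // Hypothetical helper: interprets an exit value using the "128 + signal number" convention.
    // A value above 128 may also be an ordinary exit code set by the script itself,
    // so the result is a guess, not proof that the scheduler killed the process.
    static String describe(int exitValue) {
        if (exitValue <= 128) {
            return "exited with code " + exitValue;
        }
        int signal = exitValue - 128;
        switch (signal) {
            case 2:
                return "terminated by SIGINT";
            case 9:
                return "terminated by SIGKILL";
            case 15:
                return "terminated by SIGTERM";
            default:
                return "terminated by signal " + signal;
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(143)); // terminated by SIGTERM
        System.out.println(describe(0));   // exited with code 0
    }
}
```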
```diff
 String pids = getPidsStr(processId);
 String[] pidArray = PID_PATTERN.split(pids);
-if (pidArray.length == 0) {
+if (StringUtils.isBlank(pids) || pidArray.length == 0) {
```
If `StringUtils.isEmpty(pids)` is true, there is no need to execute line 117.
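One detail that supports checking `pids` before splitting: `Pattern.split` on an empty string finds no match and returns a one-element array containing `""`, so `pidArray.length == 0` never catches an empty input on its own. Below is a minimal sketch of the "check first, then split" order. It assumes `PID_PATTERN` splits on whitespace (the real pattern may differ), and `splitPids` is a hypothetical name. It uses plain JDK checks instead of `StringUtils` to stay self-contained.

```java
import java.util.regex.Pattern;

public final class PidSplitSketch {

    // Assumption: PIDs are whitespace-separated; the real PID_PATTERN may differ.
    static final Pattern PID_PATTERN = Pattern.compile("\\s+");

    private PidSplitSketch() {
    }

    // Returns an empty array for null or blank input instead of splitting it.
    static String[] splitPids(String pids) {
        if (pids == null || pids.trim().isEmpty()) {
            return new String[0];
        }
        return PID_PATTERN.split(pids.trim());
    }

    public static void main(String[] args) {
        System.out.println(PID_PATTERN.split("").length);         // 1: the empty input itself
        System.out.println(splitPids("").length);                 // 0
        System.out.println(splitPids(" 1115493 1115497 ").length); // 2
    }
}
```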
```java
List<Integer> pidList = Arrays.stream(pidArray).filter(StringUtils::isNotBlank)
        .map(s -> {
            try {
                return Integer.parseInt(s.trim());
            } catch (NumberFormatException e) {
                log.warn("Invalid PID string ignored: {}", s);
                return null;
            }
        })
        .filter(Objects::nonNull)
        .collect(Collectors.toList());
```
It would be better to change `getPidsStr` to return a validated PID list, rather than parsing the PIDs in the upper layer.
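One way to follow this suggestion is to have the lower layer turn the raw process-tree string directly into a `List<Integer>`. The sketch below assumes the raw string has the `name(pid)---name(pid)` shape shown in the `prepare to parse pid` log line of this PR, which looks like `pstree -p` output. `PidParserSketch` and `parsePids` are hypothetical names, not the actual `ProcessUtils` API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class PidParserSketch {

    // Matches the "(12345)" group printed after each process name, e.g. "sleep(1115570)".
    private static final Pattern PID_IN_PARENS = Pattern.compile("\\((\\d+)\\)");

    private PidParserSketch() {
    }

    // Hypothetical helper: returns the PIDs in the order they appear in the raw string.
    static List<Integer> parsePids(String rawProcessTree) {
        List<Integer> pids = new ArrayList<>();
        if (rawProcessTree == null) {
            return pids;
        }
        Matcher m = PID_IN_PARENS.matcher(rawProcessTree);
        while (m.find()) {
            // The regex only admits digits; a value outside the int range still throws
            // NumberFormatException, which is deliberately not caught here.
            pids.add(Integer.parseInt(m.group(1)));
        }
        return pids;
    }

    public static void main(String[] args) {
        System.out.println(parsePids("sudo(1115493)---1341.sh(1115497)---sleep(1115570)"));
        // [1115493, 1115497, 1115570]
    }
}
```

Callers then receive integers only and no longer need their own parsing or filtering step.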
```java
    return Integer.parseInt(s.trim());
} catch (NumberFormatException e) {
    log.warn("Invalid PID string ignored: {}", s);
    return null;
```
Throw the exception here instead; catching it will hide the bug.
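A fail-fast version of the parsing stream might look like the sketch below. It rethrows on the first malformed PID, adding context, rather than logging and dropping the PID. `StrictPidParseSketch` and `toPidList` are hypothetical names, and the blank filter uses plain JDK calls in place of `StringUtils::isNotBlank`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public final class StrictPidParseSketch {

    private StrictPidParseSketch() {
    }

    // Hypothetical helper: blank entries are skipped, but any other non-numeric entry
    // aborts the whole parse so the underlying bug surfaces instead of being hidden.
    static List<Integer> toPidList(String[] pidArray) {
        return Arrays.stream(pidArray)
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .map(s -> {
                    try {
                        return Integer.parseInt(s);
                    } catch (NumberFormatException e) {
                        throw new IllegalArgumentException("Invalid PID string: " + s, e);
                    }
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(toPidList(new String[] {"1115493", " 1115497 ", ""})); // [1115493, 1115497]
        try {
            toPidList(new String[] {"abc"});
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Invalid PID string: abc
        }
    }
}
```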
ruanwenjun left a comment
Pretty good.
@SbloodyS Could you please help review this when available? Thanks.
SbloodyS left a comment
+1
…hod (apache#17005) [Fix-17316][Task-API] Add check process status after killing task (apache#17320) [Fix-17436][Workflow]Task timeout kill throw exception (apache#17437) Modify to support stopping tasks in SeaTunnel cluster mode


Purpose of the pull request
close #17436
Brief change log
Verify this pull request
After the fix, the logs of a timeout kill look like this:
2025-08-21 09:30:25.259 INFO [WorkerRpcServer-methodInvoker-155] - Publish TaskExecutorKillLifecycleEvent: {
"taskInstanceId" : 1341,
"eventCreateTime" : 1755739825259,
"type" : "KILL"
}
2025-08-21 09:30:25.302 INFO [PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-22] - Begin killing task instance, processId: 1115493
2025-08-21 09:30:25.826 INFO [PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-22] - prepare to parse pid, raw pid string: sudo(1115493)---1341.sh(1115497)---sleep(1115570)
2025-08-21 09:30:26.370 INFO [PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-22] - Sending SIGINT to process group: 1115493 1115497 1115570, command: sudo -u dolphinscheduler -i kill -s SIGINT 1115493 1115497 1115570
2025-08-21 09:30:37.408 INFO [PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-22] - Kill command: sudo -u dolphinscheduler -i kill -s SIGINT 1115493 1115497 1115570, timed out, still running PIDs: 1115493 1115497 1115570
2025-08-21 09:30:37.961 INFO [PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-22] - Sending SIGTERM to process group: 1115493 1115497 1115570, command: sudo -u dolphinscheduler -i kill -s SIGTERM 1115493 1115497 1115570
2025-08-21 09:30:38.156 ERROR [exclusive-task-executor-container-worker-1] - process has failure due to timeout kill, the timeout value is:60, the taskTimeoutStrategy is:FAILED
2025-08-21 09:30:38.157 INFO [exclusive-task-executor-container-worker-1] - process has killed. execute path:/data01/dolphinscheduler/exec/process/1341, processId:1115493 ,exitStatusCode:-1 ,processWaitForStatus:false ,processExitValue:143
2025-08-21 09:30:38.157 INFO [exclusive-task-executor-container-worker-1] - Publish TaskExecutorFailedLifecycleEvent: {
"taskInstanceId" : 1341,
"eventCreateTime" : 1755739838157,
"type" : "FAILED",
"workflowInstanceId" : 1273,
"workflowInstanceHost" : "192.168.30.11:5678",
"taskInstanceHost" : "192.168.30.121:1234",
"appIds" : null,
"endTime" : 1755739838157,
"latestReportTime" : null
}
2025-08-21 09:30:38.157 INFO [exclusive-task-executor-container-worker-1] - Publish TaskExecutorFinalizeLifecycleEvent: {
"taskInstanceId" : 1341,
"eventCreateTime" : 1755739838157,
"type" : "FINALIZE"
}
2025-08-21 09:30:38.676 INFO [PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-22] - Successfully killed process tree using SIGTERM, processId: 1115493
2025-08-21 09:30:38.676 INFO [PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-22] - Process tree for task: 1341 is killed or already finished, pid: 1115493
2025-08-21 09:30:38.677 INFO [PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-22] - Get appIds from worker 192.168.30.121:1234, taskLogPath: /data01/dolphinscheduler/20250821/149143631011392/7/1273/1341.log
2025-08-21 09:30:38.677 INFO [PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-22] - Start finding appId in /data01/dolphinscheduler/20250821/149143631011392/7/1273/1341.log, fetch way: log
2025-08-21 09:30:38.677 INFO [PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-22] - The appId is empty
2025-08-21 09:30:38.677 INFO [PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-22] - Success fire TaskExecutorKillLifecycleEvent: {
"taskInstanceId" : 1341,
"eventCreateTime" : 1755739825259,
"type" : "KILL"
}
2025-08-21 09:30:38.718 INFO [PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-20] - Success fire TaskExecutorFailedLifecycleEvent: {
"taskInstanceId" : 1341,
"eventCreateTime" : 1755739838157,
"type" : "FAILED",
"workflowInstanceId" : 1273,
"workflowInstanceHost" : "192.168.30.11:5678",
"taskInstanceHost" : "192.168.30.121:1234",
"appIds" : null,
"endTime" : 1755739838157,
"latestReportTime" : null
}
2025-08-21 09:30:38.768 INFO [PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-21] -
********************************* Finalize task instance ************************************
2025-08-21 09:30:38.768 INFO [PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-21] - FINALIZE_SESSION
2025-08-21 09:30:38.768 INFO [PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-21] - Success fire TaskExecutorFinalizeLifecycleEvent: {
"taskInstanceId" : 1341,
"eventCreateTime" : 1755739838157,
"type" : "FINALIZE"
}
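The log above shows the kill being escalated. SIGINT is sent to 1115493 1115497 1115570 at 09:30:26. About 11 seconds later the same PIDs are still running, so SIGTERM is sent, after which the process tree is reported as killed. The sketch below models that send, wait, and escalate loop. It is not DolphinScheduler's implementation. The `Shell` interface, `killWithEscalation`, and the grace-period parameter are hypothetical, and the final SIGKILL step is an assumption, since the log only shows SIGINT and SIGTERM. A real `Shell` would run the command (for example via `ProcessBuilder`) and probe liveness (for example with `kill -0`). Here a fake shell stands in so the control flow can run anywhere.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public final class KillEscalationSketch {

    // Hypothetical abstraction over running a command and checking whether any PID is alive.
    interface Shell {
        void run(String command);

        boolean anyAlive(List<Integer> pids);
    }

    // SIGINT then SIGTERM matches the log; SIGKILL as the last resort is an assumption.
    static final String[] SIGNALS = {"SIGINT", "SIGTERM", "SIGKILL"};

    private KillEscalationSketch() {
    }

    // Renders a command in the same shape as the "Sending ... command:" log lines.
    static String killCommand(String tenant, String signal, List<Integer> pids) {
        String pidStr = pids.stream().map(String::valueOf).collect(Collectors.joining(" "));
        return "sudo -u " + tenant + " -i kill -s " + signal + " " + pidStr;
    }

    // Sends each signal in turn and waits up to graceMillis for all PIDs to exit.
    // Returns the signal that stopped them, or null if they survived every signal.
    static String killWithEscalation(Shell shell, String tenant, List<Integer> pids, long graceMillis)
            throws InterruptedException {
        for (String signal : SIGNALS) {
            shell.run(killCommand(tenant, signal, pids));
            long deadline = System.nanoTime() + graceMillis * 1_000_000L;
            do {
                if (!shell.anyAlive(pids)) {
                    return signal;
                }
                Thread.sleep(10);
            } while (System.nanoTime() - deadline < 0);
        }
        return null;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> sent = new ArrayList<>();
        // Fake shell: the PIDs "exit" once SIGTERM has been sent, as in the log above.
        Shell fake = new Shell() {
            @Override
            public void run(String command) {
                sent.add(command);
            }

            @Override
            public boolean anyAlive(List<Integer> pids) {
                return sent.stream().noneMatch(c -> c.contains("SIGTERM"));
            }
        };
        List<Integer> pids = Arrays.asList(1115493, 1115497, 1115570);
        System.out.println(killWithEscalation(fake, "dolphinscheduler", pids, 50)); // SIGTERM
        sent.forEach(System.out::println);
    }
}
```

Escalating only when the PIDs outlive the grace period gives the task a chance to clean up on SIGINT before harsher signals are used.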
Pull Request Notice
If your pull request contains an incompatible change, you should also add it to `docs/docs/en/guide/upgrade/incompatible.md`.