Description
When generic-worker receives a SIGTERM, it shuts down immediately rather than going through the graceful shutdown procedure that is used for, e.g., preemptions noticed by worker-manager.
This scenario can come up in practice for graceful terminations that take longer than typical ones. What can happen is:
- worker-runner notices the preemption, and starts graceful termination
- generic-worker begins to wrap up work
- systemd begins shutting down services with SIGTERM
- generic-worker continues shutting down / uploading
- systemd sends SIGTERM to generic-worker
- generic-worker immediately dies
I believe this can be observed in this log snippet:
Feb 18 02:59:37 PM gecko-t-t-linux-docker-kvm-gvcpmc6cqnsynjyxxz-fva start-worker 2026/02/18 19:59:37 Executing command 0: ["docker" "run" "-t" "--name" "taskcontainer_EQcyitziSG2JdilqKar9UA" "--memory-swap" "-1" "--pids-limit" "-1" "--cap-add=SYS_PTRACE" "--add-host=localhost.localdomain:127.0.0.1" "-v" "/home/task_177144461777645/cache0:/builds/worker/tooltool-cache" "-v" "/home/task_177144461777645/cache1:/builds/worker/.task-cache/uv" "-v" "/home/task_177144461777645/cache2:/builds/worker/.task-cache/pip" "--device=/dev/kvm" "--device=/dev/video0" "--add-host=taskcluster:host-gateway" "--env-file" "env.list" "9dba6e98253622cfa1bc00f7b57ef4f677d477f6167e7323199cbae97645663d" "/builds/worker/bin/run-task-hg" "--" "/builds/worker/bin/test-linux.sh" "--test-type=testharness" "--skip-implementation-status=backlog" "--skip-implementation-status=not-implementing" "--skip-timeout" "--skip-crash" "--no-update-status-on-crash" "--exclude-tag=webgpu" "--exclude-tag=canvas" "--exclude-tag=webcodecs" "--exclude-tag=eme" "--disable-fission" "--setpref=fission.disableSessionHistoryInParent=true" "--skip-timeout" "--setpref=layers.d3d11.enable-blacklist=false" "--total-chunk=24" "--this-chunk=21" "--download-symbols=ondemand"]
---------
Feb 18 03:00:23 PM gecko-t-t-linux-docker-kvm-gvcpmc6cqnsynjyxxz-fva start-worker 2026/02/18 20:00:23 GCP Metadata Service says termination is imminent
Feb 18 03:00:23 PM gecko-t-t-linux-docker-kvm-gvcpmc6cqnsynjyxxz-fva start-worker 2026/02/18 20:00:23 Sending graceful-termination request with finish-tasks=false
Feb 18 03:00:23 PM gecko-t-t-linux-docker-kvm-gvcpmc6cqnsynjyxxz-fva start-worker 2026/02/18 20:00:23 polling for termination-time
Feb 18 03:00:23 PM gecko-t-t-linux-docker-kvm-gvcpmc6cqnsynjyxxz-fva start-worker 2026/02/18 20:00:23 Got graceful-termination request with finish-tasks=false
Feb 18 03:00:23 PM gecko-t-t-linux-docker-kvm-gvcpmc6cqnsynjyxxz-fva start-worker 2026/02/18 20:00:23 Killing process tree with parent PID 53816... (0xc000116e80)
Feb 18 03:00:23 PM gecko-t-t-linux-docker-kvm-gvcpmc6cqnsynjyxxz-fva start-worker 2026/02/18 20:00:23 Process tree with parent PID 53816 killed.
Feb 18 03:00:23 PM gecko-t-t-linux-docker-kvm-gvcpmc6cqnsynjyxxz-fva start-worker 2026/02/18 20:00:23 Notifying listener taskcluster-proxy of state change
Feb 18 03:00:23 PM gecko-t-t-linux-docker-kvm-gvcpmc6cqnsynjyxxz-fva start-worker 2026/02/18 20:00:23 Received task status change: Aborted
Feb 18 03:00:23 PM gecko-t-t-linux-docker-kvm-gvcpmc6cqnsynjyxxz-fva start-worker 2026/02/18 20:00:23 Stopping task feature Command Executor...
Feb 18 03:00:23 PM gecko-t-t-linux-docker-kvm-gvcpmc6cqnsynjyxxz-fva start-worker 2026/02/18 20:00:23 Stopping task feature Task Timer...
Feb 18 03:00:23 PM gecko-t-t-linux-docker-kvm-gvcpmc6cqnsynjyxxz-fva start-worker 2026/02/18 20:00:23 Stopping task feature Max Run Time...
Feb 18 03:00:23 PM gecko-t-t-linux-docker-kvm-gvcpmc6cqnsynjyxxz-fva start-worker 2026/02/18 20:00:23 Stopping task feature D2G...
Feb 18 03:00:24 PM gecko-t-t-linux-docker-kvm-gvcpmc6cqnsynjyxxz-fva systemd docker-9968558d164d0f4a4efd3419454888f9516e953389af6c8578bdf9eff5dbc18b.scope: Deactivated successfully.
Feb 18 03:00:24 PM gecko-t-t-linux-docker-kvm-gvcpmc6cqnsynjyxxz-fva systemd docker-9968558d164d0f4a4efd3419454888f9516e953389af6c8578bdf9eff5dbc18b.scope: Consumed 45.038s CPU time.
---------
Feb 18 03:00:24 PM gecko-t-t-linux-docker-kvm-gvcpmc6cqnsynjyxxz-fva systemd-logind Power key pressed short.
Feb 18 03:00:24 PM gecko-t-t-linux-docker-kvm-gvcpmc6cqnsynjyxxz-fva systemd-logind Powering off...
Feb 18 03:00:24 PM gecko-t-t-linux-docker-kvm-gvcpmc6cqnsynjyxxz-fva systemd-logind System is powering down.
---------
Feb 18 03:00:25 PM gecko-t-t-linux-docker-kvm-gvcpmc6cqnsynjyxxz-fva systemd Stopping worker.service - Start TC worker...
----------
Feb 18 03:00:25 PM gecko-t-t-linux-docker-kvm-gvcpmc6cqnsynjyxxz-fva systemd worker.service: Deactivated successfully.
Feb 18 03:00:25 PM gecko-t-t-linux-docker-kvm-gvcpmc6cqnsynjyxxz-fva systemd Stopped worker.service - Start TC worker.
Feb 18 03:00:25 PM gecko-t-t-linux-docker-kvm-gvcpmc6cqnsynjyxxz-fva systemd worker.service: Consumed 33.715s CPU time.
generic-worker seems to begin the "Stopping task feature D2G..." phase, but it never completes it. (I'm guessing it makes little progress in part because the image for this task is well over 1GB, but that's not verifiable.)
AIUI, what happens here is that systemd sends SIGTERM to the entire process group started by the start-worker service. Any process that does not handle the signal dies immediately. This suggests that we probably want to handle it in both generic-worker and worker-runner.