Description
Apache Airflow version: 1.10.14 and 2.0.0
Kubernetes version (if you are using kubernetes) (use kubectl version): N/A
Environment:
- Cloud provider or hardware configuration: AWS, custom Docker image based on Debian buster
- OS (e.g. from /etc/os-release):
- Kernel (e.g. uname -a): Linux 9b57b2a952e3 4.14.193-149.317.amzn2.x86_64
- Install tools:
- Others:
What happened:
When the gunicorn master process dies along with all of its workers, the webserver fails to exit. Instead, it keeps logging the following message about every 10 seconds:
webserver_1 | [2021-01-04 22:32:47 +0000] [31] [INFO] Handling signal: ttou
webserver_1 | [2021-01-04 22:32:47 +0000] [82] [INFO] Worker exiting (pid: 82)
webserver_1 | [2021-01-04 22:32:57 +0000] [31] [INFO] Handling signal: term
webserver_1 | [2021-01-04 22:32:57 +0000] [95] [INFO] Worker exiting (pid: 95)
webserver_1 | [2021-01-04 22:32:57 +0000] [116] [INFO] Worker exiting (pid: 116)
webserver_1 | [2021-01-04 22:32:58 +0000] [31] [INFO] Shutting down: Master
webserver_1 | [2021-01-04 22:32:58,228] {cli.py:1082} ERROR - [0 / 0] Some workers seem to have died and gunicorn did not restart them as expected
webserver_1 | [2021-01-04 22:33:09,239] {cli.py:1082} ERROR - [0 / 0] Some workers seem to have died and gunicorn did not restart them as expected
webserver_1 | [2021-01-04 22:33:20,252] {cli.py:1082} ERROR - [0 / 0] Some workers seem to have died and gunicorn did not restart them as expected
webserver_1 | [2021-01-04 22:33:31,263] {cli.py:1082} ERROR - [0 / 0] Some workers seem to have died and gunicorn did not restart them as expected
webserver_1 | [2021-01-04 22:33:42,275] {cli.py:1082} ERROR - [0 / 0] Some workers seem to have died and gunicorn did not restart them as expected
webserver_1 | [2021-01-04 22:33:53,288] {cli.py:1082} ERROR - [0 / 0] Some workers seem to have died and gunicorn did not restart them as expected
webserver_1 | [2021-01-04 22:34:04,301] {cli.py:1082} ERROR - [0 / 0] Some workers seem to have died and gunicorn did not restart them as expected
webserver_1 | [2021-01-04 22:34:15,313] {cli.py:1082} ERROR - [0 / 0] Some workers seem to have died and gunicorn did not restart them as expected
webserver_1 | [2021-01-04 22:34:26,320] {cli.py:1082} ERROR - [0 / 0] Some workers seem to have died and gunicorn did not restart them as expected
webserver_1 | [2021-01-04 22:34:37,332] {cli.py:1082} ERROR - [0 / 0] Some workers seem to have died and gunicorn did not restart them as expected
webserver_1 | [2021-01-04 22:34:48,344] {cli.py:1082} ERROR - [0 / 0] Some workers seem to have died and gunicorn did not restart them as expected
webserver_1 | [2021-01-04 22:34:59,357] {cli.py:1082} ERROR - [0 / 0] Some workers seem to have died and gunicorn did not restart them as expected
webserver_1 | [2021-01-04 22:35:10,367] {cli.py:1082} ERROR - [0 / 0] Some workers seem to have died and gunicorn did not restart them as expected
webserver_1 | [2021-01-04 22:35:21,379] {cli.py:1082} ERROR - [0 / 0] Some workers seem to have died and gunicorn did not restart them as expected
webserver_1 | [2021-01-04 22:35:32,392] {cli.py:1082} ERROR - [0 / 0] Some workers seem to have died and gunicorn did not restart them as expected
webserver_1 | [2021-01-04 22:35:43,404] {cli.py:1082} ERROR - [0 / 0] Some workers seem to have died and gunicorn did not restart them as expected
webserver_1 | [2021-01-04 22:35:54,414] {cli.py:1082} ERROR - [0 / 0] Some workers seem to have died and gunicorn did not restart them as expected
In the example above, I killed the gunicorn master process intentionally (kill 31). In the real-world scenarios I've observed, the master process and the workers would crash due to some transient issue, such as a temporary failure to fetch a secret.
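For context, the repeating ERROR line above is emitted by the webserver's gunicorn monitor loop (the {cli.py:1082} log prefix). The snippet below is only a minimal sketch of the failure mode, not the actual Airflow code; the helpers get_num_ready_workers_running, get_num_workers_running, and gunicorn_master_is_alive are hypothetical stand-ins. It shows how a loop that only compares worker counts keeps logging the same error forever once the master and every worker are gone, because nothing in it ever reacts to the master process itself being dead.

```python
import logging
import time

log = logging.getLogger(__name__)

# Hypothetical stand-ins for the checks the real monitor performs.
# Here they simulate the state after the master and all workers have died.
def get_num_ready_workers_running() -> int:
    return 0

def get_num_workers_running() -> int:
    return 0

def gunicorn_master_is_alive() -> bool:
    return False

def monitor_loop(num_workers_expected: int = 4) -> None:
    """Sketch of a monitor loop with the reported flaw: it only compares
    worker counts, so it logs the same error every ~10 seconds instead of
    exiting once the gunicorn master itself is gone."""
    while True:
        num_ready = get_num_ready_workers_running()
        num_running = get_num_workers_running()
        if num_running < num_workers_expected:
            # This is the message that repeats in the logs above.
            log.error(
                "[%d / %d] Some workers seem to have died and gunicorn "
                "did not restart them as expected",
                num_ready, num_running,
            )
            # Missing step: if the master is dead, stop looping and exit
            # with a non-zero code so systemd/Docker can restart the service.
            # if not gunicorn_master_is_alive():
            #     raise SystemExit(1)
        time.sleep(10)
```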
What you expected to happen:
The Airflow webserver should exit so that it can be restarted by systemd, the Docker daemon, or whatever else is managing the running services.
How to reproduce it:
Send a KILL signal (SIGKILL) to the gunicorn master process.
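For a scripted reproduction, a sketch along these lines should work; it assumes psutil is available (it is installed alongside Airflow) and that the master process is identifiable by the "gunicorn: master" title that gunicorn sets via setproctitle, so adjust the match if your deployment names it differently:

```python
import os
import signal

import psutil  # used only to locate the gunicorn master process

def kill_gunicorn_master() -> None:
    """Find the gunicorn master process and send it SIGKILL."""
    for proc in psutil.process_iter(["pid", "name", "cmdline"]):
        cmdline = " ".join(proc.info["cmdline"] or [])
        name = proc.info["name"] or ""
        if "gunicorn: master" in cmdline or "gunicorn: master" in name:
            os.kill(proc.info["pid"], signal.SIGKILL)
            print(f"Sent SIGKILL to gunicorn master (pid {proc.info['pid']})")
            return
    print("No gunicorn master process found")

if __name__ == "__main__":
    kill_gunicorn_master()
```

After the master dies, the webserver process keeps printing the error shown above instead of exiting.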
Anything else we need to know:
I'll be providing a PR shortly with a fix for this issue.