Webserver does not exit upon gunicorn master crash #13469

@drago-f5a

Description

Apache Airflow version: 1.10.14 and 2.0.0

Kubernetes version (if you are using kubernetes) (use kubectl version): N/A

Environment:

  • Cloud provider or hardware configuration: AWS, custom Docker image based on Debian buster
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a): Linux 9b57b2a952e3 4.14.193-149.317.amzn2.x86_64
  • Install tools:
  • Others:

What happened:

When the gunicorn master process dies along with all of its workers, the webserver fails to exit; instead, it keeps logging the following message about every 10 seconds:

webserver_1  | [2021-01-04 22:32:47 +0000] [31] [INFO] Handling signal: ttou
webserver_1  | [2021-01-04 22:32:47 +0000] [82] [INFO] Worker exiting (pid: 82)
webserver_1  | [2021-01-04 22:32:57 +0000] [31] [INFO] Handling signal: term
webserver_1  | [2021-01-04 22:32:57 +0000] [95] [INFO] Worker exiting (pid: 95)
webserver_1  | [2021-01-04 22:32:57 +0000] [116] [INFO] Worker exiting (pid: 116)
webserver_1  | [2021-01-04 22:32:58 +0000] [31] [INFO] Shutting down: Master
webserver_1  | [2021-01-04 22:32:58,228] {cli.py:1082} ERROR - [0 / 0] Some workers seem to have died and gunicorn did not restart them as expected
webserver_1  | [2021-01-04 22:33:09,239] {cli.py:1082} ERROR - [0 / 0] Some workers seem to have died and gunicorn did not restart them as expected
webserver_1  | [2021-01-04 22:33:20,252] {cli.py:1082} ERROR - [0 / 0] Some workers seem to have died and gunicorn did not restart them as expected
... (the same ERROR line repeats roughly every 11 seconds, indefinitely)

In the example above, I killed the gunicorn master process intentionally with kill 31. In the real-world scenarios I've observed, the master process and the workers crash due to some transient issue, such as a temporary failure to fetch a secret.
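For context, the airflow webserver command wraps gunicorn in a monitoring loop (the cli.py messages above come from that loop), which keeps re-checking worker counts even after the master is gone. Below is a minimal sketch of the kind of liveness check the loop appears to be missing, written with psutil; the function name, arguments, and structure are hypothetical illustrations, not Airflow's actual internals:

import sys
import time

import psutil

def monitor_gunicorn(master_pid: int, check_interval: float = 10.0) -> None:
    """Poll the gunicorn master and exit once it is gone (hypothetical sketch)."""
    while True:
        try:
            master = psutil.Process(master_pid)
            if master.status() == psutil.STATUS_ZOMBIE:
                # The master died and was never reaped: nothing left to supervise.
                sys.exit(1)
        except psutil.NoSuchProcess:
            # The master is gone entirely; exit so the service manager
            # (systemd, the Docker daemon, ...) can restart the webserver.
            sys.exit(1)
        # ... the existing worker-refresh logic would run here ...
        time.sleep(check_interval)

With a check like this, killing pid 31 above would terminate the monitor instead of leaving it logging errors forever.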

What you expected to happen:

The Airflow webserver should exit so that it can be restarted by systemd, the Docker daemon, or whatever else is managing the running services.

How to reproduce it:

Send a KILL signal to the gunicorn master process.
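If you prefer to do this programmatically (for example from a test), here is a hypothetical psutil-based snippet; the process-name heuristic is an assumption and may need adjusting for your deployment:

import signal

import psutil

# Heuristic: the master is a gunicorn process whose parent is not gunicorn.
for proc in psutil.process_iter(['pid', 'name']):
    if 'gunicorn' in (proc.info['name'] or ''):
        try:
            parent = proc.parent()
            if parent is None or 'gunicorn' not in parent.name():
                print(f"Sending SIGKILL to gunicorn master (pid {proc.info['pid']})")
                proc.send_signal(signal.SIGKILL)
                break
        except psutil.NoSuchProcess:
            continue

After the master is killed, watch the webserver logs: instead of exiting, the monitor loops on the "Some workers seem to have died" error shown above.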

Anything else we need to know:

I'll be providing a PR shortly with a fix for this issue.
