Add docker health check to integrations #13446

mik-laj · 2021-01-02T23:55:28Z

To respond to container failures continuously, I added a health check to all integrations. With it, Docker can detect a problem earlier and restart the container if necessary.

Information about the state of the containers is also available in the output of the `docker ps` command.
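
As a rough illustration of the kind of configuration described here, a health check in a Compose file might look like the fragment below. The service name, the image tag, and the `mysqladmin ping` probe are assumptions made for this sketch, not the exact contents of this PR; the timing values are the ones that appear in the `backend-mysql.yml` diff quoted in the review.

```yaml
# Hypothetical fragment: service name, image, and probe command are illustrative.
services:
  mysql:
    image: mysql:5.7
    healthcheck:
      # Exit code 0 counts as a passed check; a non-zero exit code counts as a failure.
      test: ["CMD", "mysqladmin", "ping", "-h", "localhost"]
      interval: 10s   # wait between consecutive checks
      timeout: 10s    # a check that runs longer than this counts as failed
      retries: 120    # consecutive failures before the container is marked "unhealthy"
```

With a health check defined, the STATUS column of `docker ps` shows `(health: starting)`, `(healthy)`, or `(unhealthy)` next to the uptime.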

XD-DENG · 2021-01-03T10:12:36Z

scripts/ci/docker-compose/backend-mysql.yml

+      interval: 10s
+      timeout: 10s
+      retries: 120
+


Is there any specific reason why restart: always is not added for mysql and postgres?

Good point. I missed it. Added.

XD-DENG

Overall looks good to me.

But I would like to understand a bit more about how the values for retries are determined, especially since they vary quite a lot between cases. For example, for mysql it is 120, which seems too big to me.

github-actions · 2021-01-03T12:40:19Z

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest master at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

potiuk

Really nice. Much better than the in-container check (though the in-container one should stay as a 'sanity check').

mik-laj · 2021-01-04T03:54:07Z

But I would like to understand a bit more about how the values for retries are determined, especially since they vary quite a lot between cases. For example, for mysql it is 120, which seems too big to me.

This value was copied from the internet. All health checks in the integration-*.yml files have an identical configuration:

      interval: 5s
      timeout: 30s
      retries: 50

interval 5s: breeze is used by users, so it should be responsive. timeout 30s: there is no particular reason for this value, but it seems sensible to me. retries 50: this is the value copied from the [in-container health checks](https://github.com/apache/airflow/blob/6ef23aff802032e85ec42dabda83907bfd812b2c/scripts/in_container/check_environment.sh#L158-L174). That value was determined experimentally, and so far my impression is that it works.

mik-laj · 2021-01-04T06:07:10Z

@XD-DENG I updated the PR. Now all health checks have interval set to 5s and timeout set to 30s. All integrations have retries set to 50. All backends have retries set to 5.
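
To make the effect of these numbers concrete, the fragment below restates the final settings with a rough time budget in the comments. The service names are hypothetical and the probe commands are left out; the arithmetic assumes that Docker starts each check `interval` after the previous one finishes, so treat the totals as estimates rather than guarantees.

```yaml
# Hypothetical fragment: service names are placeholders, "test" entries are omitted.
services:
  some-integration:          # stands in for a service from an integration-*.yml file
    healthcheck:
      interval: 5s
      timeout: 30s
      retries: 50
      # Checks failing immediately: about 50 * 5s = 250s until "unhealthy".
      # Checks hanging until the timeout: up to about 50 * (5s + 30s) = 1750s.
  some-backend:              # stands in for a service from a backend-*.yml file
    healthcheck:
      interval: 5s
      timeout: 30s
      retries: 5
      # Checks failing immediately: about 5 * 5s = 25s until "unhealthy".
      # Checks hanging until the timeout: up to about 5 * (5s + 30s) = 175s.
```

The lower retry count for backends means a broken database is reported far sooner than a broken integration.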

XD-DENG · 2021-01-04T06:09:43Z

@XD-DENG I updated the PR. Now all health checks have interval set to 5s and timeout set to 30s. All integrations have retries set to 50. All backends have retries set to 5.

Thanks👍

ashb · 2021-01-04T14:08:07Z

Does docker do anything with these health checks now? Last time I looked (which was 12+ months ago) Docker itself didn't do anything with these.

ashb · 2021-01-04T14:08:47Z

Oh yes I see in your screenshot it does. Cool.

(cherry picked from commit 3341d21)

To avoid initialising the database and other integrations every time you enter or leave breeze, those containers are started when breeze starts but are not stopped when you leave. Such running auxiliary containers should be stopped with `breeze stop` once you are done. However, those who do not do this and then restart their machine will find that the containers get restarted. This behaviour was introduced as part of apache#13446, where health checks were added. However, "always" was not a good choice. It should have been "on-failure".

(cherry picked from commit 337146d)
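
The difference between the two restart policies named in the commit message above can be sketched as follows. The service name is hypothetical, and the comments paraphrase the documented Docker restart policies rather than the exact change made in the follow-up PR.

```yaml
# Hypothetical fragment: the service name is a placeholder.
services:
  some-integration:
    # restart: always
    #   Restarts the container whenever it stops, and also starts it again
    #   when the Docker daemon starts (for example after a machine reboot).
    restart: on-failure
    #   Restarts the container only when it exits with a non-zero exit code.
```

In both cases the policy reacts to the container process exiting, so it is worth checking how it interacts with the health check status in your own setup.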

Kamil Breguła added 3 commits January 2, 2021 20:02

Add health check to integrations

ca90f38

fixup! Add health check to integrations

fe3a4d2

fixup! fixup! Add health check to integrations

d84dea7

boring-cyborg bot added the area:dev-tools label Jan 2, 2021

fixup! fixup! fixup! Add health check to integrations

35b9a0e

mik-laj requested review from kaxil and potiuk January 3, 2021 04:11

XD-DENG reviewed Jan 3, 2021

fixup! fixup! fixup! fixup! Add health check to integrations

4fd0bdb

XD-DENG approved these changes Jan 3, 2021

github-actions bot added the "full tests needed" label (We need to run full set of tests for this PR to merge) Jan 3, 2021

potiuk approved these changes Jan 3, 2021

fixup! fixup! fixup! fixup! fixup! Add health check to integrations

550f5a6

TobKed approved these changes Jan 4, 2021

mik-laj merged commit 3341d21 into apache:master Jan 4, 2021

mik-laj deleted the monitor-container-status branch January 4, 2021 13:40

kaxil pushed a commit that referenced this pull request Jan 21, 2021

Add docker health check to integrations (#13446)

ade4d2b

(cherry picked from commit 3341d21)

mik-laj mentioned this pull request Apr 24, 2021

Remove condition field from depends_on for Docker Compose version 3 file. #15509

Closed

potiuk mentioned this pull request Sep 24, 2022

Do not restart breeze containers after restart #26647

Merged
