Skip to content

Conversation

@mik-laj
Copy link
Member

@mik-laj mik-laj commented Jan 2, 2021

To constantly respond to container failures, I add a health check to all integrations. Thanks to this, Docker is able to detect the problem earlier and restart the container if necessary.

Information about the state of the containers is also available in docker ps command.
Screenshot 2021-01-03 at 00 54 18


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

@mik-laj mik-laj requested review from kaxil and potiuk January 3, 2021 04:11
interval: 10s
timeout: 10s
retries: 120

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any specific reason why restart: always is not added for mysql and postgres?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I missed it. Added.

Copy link
Member

@XD-DENG XD-DENG left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good to me.

But I would like to understand a bit more how the values for retries are determined , especially it varies quite much among different cases. For example, for mysql it is 120, which may be too big to me.

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Jan 3, 2021
@github-actions
Copy link

github-actions bot commented Jan 3, 2021

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest master at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice. Much better than the in-container check (though the in-container one should stay as a 'sanity check's)

@mik-laj
Copy link
Member Author

mik-laj commented Jan 4, 2021

But I would like to understand a bit more how the values for retries are determined , especially it varies quite much among different cases. For example, for mysql it is 120, which may be too big to me.

This value was copied from the internet. All health checks in integration-*. xml files have identical configuration.

      interval: 5s
      timeout: 30s
      retries: 50

interval 5s - breeze is used by users so make it responsive. timeout 30s has no reason whatsoever, but that makes sense to me. retries 50 is is the value copied from (in-container health checks)[https://github.com/apache/airflow/blob/6ef23aff802032e85ec42dabda83907bfd812b2c/scripts/in_container/check_environment.sh#L158-L174]. This value was determined experimentally and so far I have the impression that it works.

@mik-laj
Copy link
Member Author

mik-laj commented Jan 4, 2021

@XD-DENG I updated the PR. Now all healt checks have set interval to 5s and timeout to 30s. All integration have set retries to 50. All backend have set retries to 5.

@XD-DENG
Copy link
Member

XD-DENG commented Jan 4, 2021

@XD-DENG I updated the PR. Now all healt checks have set interval to 5s and timeout to 30s. All integration have set retries to 50. All backend have set retries to 5.

Thanks👍

@mik-laj mik-laj merged commit 3341d21 into apache:master Jan 4, 2021
@mik-laj mik-laj deleted the monitor-container-status branch January 4, 2021 13:40
@ashb
Copy link
Member

ashb commented Jan 4, 2021

Does docker do anything with these health checks now? Last time I looked (which was 12+ months ago) Docker itself didn't do anything with these.

@ashb
Copy link
Member

ashb commented Jan 4, 2021

Oh yes I see in your screenshot it does. Cool.

kaxil pushed a commit that referenced this pull request Jan 21, 2021
potiuk added a commit to potiuk/airflow that referenced this pull request Sep 24, 2022
In order to avoid initialisation of database and other integrations
when you are entering/leaving breeze, those are started when breeze
starts but not stopped when you leave. Then stopping such running
auxiliary containers should be done with `breeze stop` after you
are done. However those who do not do it, and will restart their
machine will find that the containers get restarted.

This has been added as part of apache#13446 where health-checks are added.
However "always" was not a good choice. It should have been "on-failure"
potiuk added a commit that referenced this pull request Sep 25, 2022
In order to avoid initialisation of database and other integrations
when you are entering/leaving breeze, those are started when breeze
starts but not stopped when you leave. Then stopping such running
auxiliary containers should be done with `breeze stop` after you
are done. However those who do not do it, and will restart their
machine will find that the containers get restarted.

This has been added as part of #13446 where health-checks are added.
However "always" was not a good choice. It should have been "on-failure"
ephraimbuddy pushed a commit that referenced this pull request Nov 10, 2022
In order to avoid initialisation of database and other integrations
when you are entering/leaving breeze, those are started when breeze
starts but not stopped when you leave. Then stopping such running
auxiliary containers should be done with `breeze stop` after you
are done. However those who do not do it, and will restart their
machine will find that the containers get restarted.

This has been added as part of #13446 where health-checks are added.
However "always" was not a good choice. It should have been "on-failure"

(cherry picked from commit 337146d)
ephraimbuddy pushed a commit that referenced this pull request Nov 10, 2022
In order to avoid initialisation of database and other integrations
when you are entering/leaving breeze, those are started when breeze
starts but not stopped when you leave. Then stopping such running
auxiliary containers should be done with `breeze stop` after you
are done. However those who do not do it, and will restart their
machine will find that the containers get restarted.

This has been added as part of #13446 where health-checks are added.
However "always" was not a good choice. It should have been "on-failure"

(cherry picked from commit 337146d)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:dev-tools full tests needed We need to run full set of tests for this PR to merge

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants