Skip to content

Perform first healthcheck on startup without waiting the specified interval#33918

Closed
yongtang wants to merge 1 commit intomoby:masterfrom
yongtang:33410-start-period
Closed

Perform first healthcheck on startup without waiting the specified interval#33918
yongtang wants to merge 1 commit intomoby:masterfrom
yongtang:33410-start-period

Conversation

@yongtang
Copy link
Member

@yongtang yongtang commented Jul 1, 2017

- What I did
This fix tries to address the issue raised in #33410 where the first healthcheck was only performed after the specified interval.

That might be problematic in case the interval is long yet, as is specified in #33410.

- How I did it

This fix make the following changes:

  1. The first healthcheck is performed on startup (at time 0)
  2. The first healthcheck is only valid when start-period < 0

The changes allows the backward-compatibility while at the same
time it is possbile to address the issue raised in #33410.

- How to verify it

Additional test cases have been added.

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

cute-kitten-kittens-22438020-480-360

This fix fixes #33410.

Signed-off-by: Yong Tang [email protected]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the indentation is not consistent here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aaronlehmann Thanks. The PR has been updated.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for _, _ =

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Done.

daemon/health.go Outdated

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit weird that you have to set StartPeriod to a negative value to get the health check to run at startup. I would expect it to do this when StartPeriod is 0. I know there are backward compatibility issues, though.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps it would be clearer to introduce a new boolean flag for this, instead of overloading StartPeriod with a magic value.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aaronlehmann. The PR has been updated with a flag --nowait (True/False) to cover the scenario. Please take a look.

@thaJeztah
Copy link
Member

ping @yongtang

@yongtang
Copy link
Member Author

@aaronlehmann The PR has been updated. Please take a look and let me know if there are any issues.

@aaronlehmann
Copy link

Thanks, I like this design a lot more. The naming seems fine to me, but others can feel free to comment if they think of a flag name they prefer to --no-wait.

Design LGTM

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't realize this wasn't hyphenated. Wouldn't no-wait be more consistent with the other flags?

…terval

This fix tries to address the issue raised in moby#33410 where the first
healthcheck was only performed after the specified interval.

That might be problematic in case the interval is long yet, as is
specified in moby#33410.

This fix make the following changes:
1. The first healthcheck is performed on startup (at time 0)
2. The first healthcheck is only valid when no-wait = true

The changes allows the backward-compatibility while at the same
time it is possbile to address the issue raised in moby#33410.

Additional test cases have been added.

This fix fixes moby#33410.

Signed-off-by: Yong Tang <[email protected]>
@yongtang yongtang force-pushed the 33410-start-period branch from d23ea2d to 13a19b8 Compare July 17, 2017 22:09
@yongtang
Copy link
Member Author

Thanks @aaronlehmann for the review. The PR has been updated with no-wait.

@cpuguy83
Copy link
Member

I wonder if we can just have a whole separate configuration for the startup protocol.
For instance, even if the healtcheck is performed immediately on startup, what if it fails because something wasn't quite ready, and then it needs to wait the entire interval period anyway.

@thaJeztah
Copy link
Member

@yongtang can you have a look at @cpuguy83's comment?

@rdxmb
Copy link

rdxmb commented Jan 19, 2018

What do you think about merging this as a good solution?
The current healthcheck is not useable. So this would be a major improvement.

#33918 (comment) could be another issue.

@cpuguy83
Copy link
Member

@rdxmb The problem is I don't believe this is a good solution. It may be a solution that works for some case that you currently have but that does not make it a good one.

@dtakken
Copy link

dtakken commented Feb 24, 2018

I am with @cpuguy83 on this one. I also have a bunch of containers that need a second or so before they are fully running. They should be able to go into healthy state within seconds, but there is no way to make that happen without setting a short check interval. This pull request still would not fix that.

Ideal solution in my case would be if a health check would happen right after the start-period. Then, I can choose a a start period of, say, 5 seconds and a check interval of 5 minutes and still have the container go healthy in seconds.

Some sort of a --wait-for-start option that triggers this behavior would be backward compatible and cover all discussed use cases, right?

@rdxmb
Copy link

rdxmb commented Feb 26, 2018

@cpuguy83 @dtakken Ok, I think I got it.
I just tried to draft the suggestions made up to now: #33410 (comment)

@thaJeztah
Copy link
Member

thaJeztah commented Apr 12, 2018

Discussing in the maintainers meeting: one option is to add a --health-start-interval instead, for example:

--health-start-interval=3sec
--health-interval=5min
--health-retries=3
--health-start-period=0

That configuration would:

  • start the container
  • start health-check every 3 seconds
  • if three consecutive health-checks fail, the container fails

Combining this with start-period:

--health-start-interval=3sec
--health-interval=5min
--health-retries=3
--health-start-period=10min

This would:

  • start the container
  • perform health-checks every 3 seconds
  • if the health-checks fail during the first 10 minutes (--health-start-period), they are not counted as a failure, and health-checks continue to run every 3 seconds
  • if the health-check passes during the first 10 minutes, the container becomes healthy, and the interval is changing to run healthchecks every 5 minutes (--health-interval)

--health-start-x options by default use the same values as the regular --health-x options; In future we can add additional --health-start-xx options that would override those if we see a reason to add those.

@ngrilly
Copy link

ngrilly commented Sep 5, 2018

@thaJeztah The --health-start-interval option proposed above is great! Is it still on the roadmap?

@wolfi101
Copy link

wolfi101 commented Feb 7, 2020

What's the current status here?
Without this fix a service will keep in "Starting" state until the first health check is performed.
This leads to a service downtime of the health intervale whenever a service is updated which makes the health checks unuseable if you don't want to have them in a high frequency.

@cpuguy83
Copy link
Member

cpuguy83 commented May 2, 2020

I've created #40894 to replace this.

@cpuguy83 cpuguy83 closed this May 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Perform first healthcheck on startup without waiting the specified interval

10 participants