Zero-downtime deployments with rolling upgrades

Hey all,

**Use case**
I have a front-end proxy which listens on ports `80` and `443`. My applications are deployed as microservices behind this proxy, which acts as a layer-7 load balancer for them. These applications are, for example, `php:7-apache` containers which simply listen on port `80` for HTTP requests. The idea is to run at least two replicas of each service, so they can be upgraded incrementally using the *rolling update* functionality that comes with Docker Swarm.

**Issue**
At this point, I'm not sure if zero-downtime deployments using the rolling update functionality of Docker Swarm are even possible, at least for my use case. There is one major issue for me here.

When containers are stopped during the rolling update, they are always stopped using the same signal (SIGTERM, followed by SIGKILL after a certain period). Many images, like the aforementioned Apache-based image, won't shut down gracefully on a SIGTERM, but need a different signal to be sent for the container to shut down gracefully. I created an issue (#25696) for this as well, but it didn't make the 1.13 release. I don't see how the current rolling update system can work in any use case, except where containers are *actually* designed to shut down gracefully when receiving a SIGTERM. In my situation, upgrading the service leads to intermittent HTTP 502 errors until the upgrade is complete. I can't imagine this not being a problem for anyone else, unless I'm missing something obvious.

**Possible workaround**
Wrap the main command of any image that needs to shut down gracefully in a wrapper script:

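```bash
#!/bin/bash
# Relay SIGTERM/SIGINT to the child as SIGWINCH (Apache's graceful-stop signal).
shut_down() {
  kill -s WINCH "${SCRIPT_PID}"
}

trap 'shut_down' SIGTERM SIGINT

# start_apache stands in for the image's real foreground command.
start_apache &

SCRIPT_PID="$!"  # no spaces around '=' in a shell assignment
wait "${SCRIPT_PID}"
# A trapped signal interrupts the first wait; wait again so the script
# stays alive until the child has actually exited.
wait "${SCRIPT_PID}"
```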

This would immediately fix the issue I'm having, since any `SIGINT` or `SIGTERM` that reaches the container would simply be relayed as a `SIGWINCH`, which shuts down my Apache container gracefully. However, it would mean modifying every image I'm using to include this script. It's also non-standard and, to be fair, a nasty solution.
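To check the relay idea without Apache, here is a minimal, self-contained sketch. Everything in it is a stand-in: `fake_server` is a hypothetical child that only stops on `SIGWINCH`, `relay` and `CHILD_PID` are illustrative names, and the script sends itself a `SIGTERM` to imitate the container being stopped.

```bash
#!/bin/bash
# Hypothetical stand-in for Apache: it only stops when it receives SIGWINCH.
fake_server() {
  trap 'echo "child: graceful stop"; exit 0' WINCH
  while true; do sleep 1; done
}

# Relay the stop signal to the child as SIGWINCH.
relay() {
  kill -s WINCH "$CHILD_PID"
}
trap relay TERM INT

fake_server &
CHILD_PID=$!

# Imitate the stop request: send SIGTERM to this wrapper after one second.
( sleep 1; kill -s TERM $$ ) &

wait "$CHILD_PID"   # interrupted as soon as the trapped SIGTERM arrives
wait "$CHILD_PID"   # waits until the child has really exited
status=$?
echo "wrapper: child exited with status $status"
```

If the relay works, the child reports its graceful stop before the wrapper prints an exit status of 0. The second `wait` matters: without it the wrapper would exit right after relaying the signal, and the container would go away while the child is still finishing requests.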

What's the recommended course of action here? Am I missing a piece of the puzzle, or simply overlooking something? Also: even if *this* issue were solved, would I get true **zero downtime** deployments with the rolling update functionality of Docker Swarm Mode? In other words, are containers actually removed from the ingress load-balancing pool before the `stop_signal` is sent during the upgrade, or would I still get HTTP 502 errors from containers that are still being load-balanced *to* while they are in the process of shutting down?
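For anyone who wants to reproduce or measure this, a rough sketch of a probe that could run while the service is being updated. The `URL` default, the request count, and the function names `probe` and `count_failures` are my own assumptions, not anything Docker provides.

```bash
#!/bin/bash
# Hypothetical endpoint behind the front-end proxy; override via the URL variable.
URL="${URL:-http://localhost/}"

# Print one HTTP status code per request.
# curl reports 000 when no HTTP response was received at all.
probe() {
  n=$1
  i=0
  while [ "$i" -lt "$n" ]; do
    curl -s -o /dev/null -m 2 -w '%{http_code}\n' "$URL"
    i=$((i + 1))
    sleep 0.2
  done
}

# Count every status code on stdin that is not 2xx or 3xx.
count_failures() {
  grep -c -v -E '^[23][0-9][0-9]$'
}
```

Running `probe 300 | count_failures` in one terminal while the rolling update runs in another should print `0` if the deployment really is zero-downtime. Any other number is the count of failed requests, including the 502s described above.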

Zero-downtime deployments with rolling upgrades #30321

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Zero-downtime deployments with rolling upgrades #30321

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions