Graceful shutdown drain not happening due to prometheus #4125
Description
While testing Envoy graceful shutdown in my staging environment, I am facing an issue where most connections close within 5-10 seconds, but one single connection remains active for 5 minutes (shown as `envoy/shutdown_manager.go:224 total connections: 1` in the shutdown-manager logs; this log line keeps appearing for 5m). Because of this, during deployments/restarts of Envoy proxy pods, new pods come up and become ready, but old pods take 5m to terminate. Full Shutdown Manager Logs
While debugging this, @arkodg pointed out in the following Slack thread that it could be due to Prometheus. After removing the ServiceMonitor, everything worked fine. So the one lingering connection comes from Prometheus scraping and is not being closed automatically. My guess is that the 5m corresponds to the idle timeout of the Go HTTP library used by Prometheus, but I am not sure about this - Source.
Need suggestions on how I can fix this, as I don't see any configurable timeout in Prometheus for the connections used for scraping. Possible solutions I can think of:
- Decrease `shutdown.drainTimeout` in `EnvoyProxy` to shorten the wait, but this doesn't seem like an ideal solution.
- Add a timeout to the `envoy-gateway-proxy-ready-0.0.0.0-19001` listener using `ProxyBootstrap`; haven't tried this yet, but it could work.
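For the first option, the drain timeout lives under `spec.shutdown` in the `EnvoyProxy` resource. A sketch of what lowering it might look like (resource name and value are illustrative; field names are per my reading of the Envoy Gateway API, so double-check against the version in use):

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: custom-proxy-config
  namespace: envoy-gateway-system
spec:
  shutdown:
    # Shorten how long the shutdown manager waits for connections
    # to drain before the pod is allowed to terminate.
    drainTimeout: 30s
```

As noted above, this only caps the wait rather than fixing the root cause: the idle Prometheus connection would still be force-closed instead of drained cleanly.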
Setup info:
- Prometheus global config:

```yaml
global:
  scrape_interval: 30s
  scrape_timeout: 15s
  scrape_protocols:
    - OpenMetricsText1.0.0
    - OpenMetricsText0.0.1
    - PrometheusText0.0.4
  evaluation_interval: 1m
  external_labels:
    prometheus: monitoring/kps-prometheus
    prometheus_replica: prometheus-kps-prometheus-0
```

- Envoy's graceful shutdown settings are at default.
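For the second option, Envoy's HTTP connection manager supports an idle timeout via `common_http_protocol_options`, which would close the idle scrape connection on the ready/stats listener well before 5m. A hypothetical fragment of what the patched listener config could contain (structure assumed, untested; exact placement inside a `ProxyBootstrap` patch would need to target the `envoy-gateway-proxy-ready-0.0.0.0-19001` listener):

```yaml
# Inside the HTTP connection manager of the readiness/metrics listener:
# close connections that have been idle for 10s so they cannot hold up
# the graceful drain. The 10s value is illustrative.
common_http_protocol_options:
  idle_timeout: 10s
```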