Increase graceful timeout and hardcode AWS_PROFILE #306
Conversation
```diff
 main_env = []
 if isinstance(flavor, RunnableImageLike) and flavor.env:
     main_env = [{"name": key, "value": value} for key, value in flavor.env.items()]
+main_env.append({"name": "AWS_PROFILE", "value": build_endpoint_request.aws_role})
```
Should we add a test for this (to cover the various merging behaviors with user-provided AWS profiles)?
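A test along these lines could pin down the merging behavior. This is only a hedged sketch: `build_main_env` is a hypothetical stand-in for the logic in the diff, not the repo's actual API, and the conflict behavior (appending `AWS_PROFILE` after any user-provided entry) mirrors what the diff does.

```python
# Hypothetical helper mirroring the diff above; the real code builds
# main_env inline from flavor.env and build_endpoint_request.aws_role.
def build_main_env(flavor_env, aws_role):
    """Copy user env vars, then append AWS_PROFILE from the endpoint's role."""
    main_env = []
    if flavor_env:
        main_env = [{"name": k, "value": v} for k, v in flavor_env.items()]
    main_env.append({"name": "AWS_PROFILE", "value": aws_role})
    return main_env


def test_user_aws_profile_then_role_appended():
    env = build_main_env({"AWS_PROFILE": "user-profile", "FOO": "bar"}, "endpoint-role")
    # Both entries survive; the endpoint role is appended last, so a
    # last-wins consumer (e.g. container env ordering) sees the role.
    assert env[-1] == {"name": "AWS_PROFILE", "value": "endpoint-role"}
    assert {"name": "FOO", "value": "bar"} in env


def test_no_user_env():
    assert build_main_env(None, "role") == [{"name": "AWS_PROFILE", "value": "role"}]
```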
```diff
 def start_server():
     parser = argparse.ArgumentParser()
-    parser.add_argument("--graceful-timeout", type=int, default=600)
+    parser.add_argument("--graceful-timeout", type=int, default=1800)
```
Is this what's being used in async tasks?
I thought we were going to patch llm-engine/model-engine/model_engine_server/inference/async_inference/celery.py to listen for SIGTERM?
Yeah, IIRC this code serves the user container for both sync and async tasks for artifact-like bundles, and async_inference/celery.py shouldn't be used anymore for the user containers at least (not sure about celery-forwarder, though).
We'd also have to patch celery-forwarder to listen for SIGTERM.
> is this what's being used in async tasks?

This is what Frances's async endpoint uses.

> we'd also have to patch celery-forwarder to listen to sigterm

From the celery documentation: "When shutdown is initiated the worker will finish all currently executing tasks before it actually terminates", which I think means we're good?
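For intuition, the warm-shutdown behavior the celery docs describe can be sketched in a few lines. This is not celery's implementation, just a minimal illustration of the semantics: on SIGTERM, stop taking new tasks but let the currently executing one run to completion.

```python
import signal

# Toy worker illustrating warm shutdown (an assumption-laden sketch,
# not celery code): SIGTERM sets a flag; the loop checks it between
# tasks, so an in-flight task always finishes.
class WarmShutdownWorker:
    def __init__(self, tasks):
        self.tasks = list(tasks)
        self.completed = []
        self.stopping = False
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame):
        # Warm shutdown: finish the current task, accept no new ones.
        self.stopping = True

    def run(self):
        while self.tasks and not self.stopping:
            task = self.tasks.pop(0)
            self.completed.append(task())
```

The open question from the thread still applies: this only helps if the forwarder actually receives SIGTERM and Kubernetes waits out the graceful period (hence the 1800s `--graceful-timeout`).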
Possible to simulate such a scenario by sending traffic while restarting a pod?
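A rough way to run that experiment: hammer the endpoint in a loop and count failures while the pod restarts in parallel (e.g. via `kubectl rollout restart`). The sketch below is hypothetical; `send_fn` is injectable so the loop itself is testable, and in a real run it would wrap an HTTP POST to the endpoint.

```python
# Hedged sketch of a traffic probe during a pod restart. In practice
# you would run `kubectl rollout restart deployment/<name>` in another
# terminal and pass a send_fn that issues a real request; any exception
# (connection refused, 5xx raised by the client) counts as a failure.
def probe_during_restart(send_fn, n_requests=100):
    failures = 0
    for _ in range(n_requests):
        try:
            send_fn()
        except Exception:
            failures += 1
    return failures
```

With graceful shutdown working, the failure count should stay at (or near) zero across the restart.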