fix(runtime): drop service before sys exit#2109
Conversation
There was a problem hiding this comment.
Greptile Summary
This PR refactors the runtime exit handling to enable proper resource cleanup before process termination. The changes modify the rt::start function to return exit codes instead of calling std::process::exit() directly, allowing the caller in start.rs to handle cleanup operations before terminating the process.
The main change involves updating the function signature of rt::start from pub async fn start(...) to pub async fn start(...) -> i32, replacing all internal exit() calls with return statements that preserve the same exit codes (ranging from 0 for success to 202 for various failure scenarios). The start.rs module is updated to capture this exit code and call std::process::exit() explicitly.
This change addresses a fundamental issue in the Shuttle runtime where services and their associated resources (network connections, file handles, background tasks) weren't being properly cleaned up before process termination. In Rust, proper resource cleanup often relies on Drop trait implementations, which wouldn't run when exit() is called directly. By returning control to the caller first, the runtime now allows for orderly shutdown procedures.
The exit codes maintain their semantic meaning for debugging and monitoring purposes - for example, 101 indicates secret loading failures, 111 for loader failures, 121 for provisioning failures, etc. This preserves the runtime's ability to communicate specific failure reasons to process managers and deployment environments.
Important Files Changed
Changed Files
| Filename | Score | Overview |
|---|---|---|
| runtime/src/rt.rs | 3/5 | Refactored start function to return exit codes instead of calling exit() directly, with inconsistent exit() calls remaining in health check task |
| runtime/src/start.rs | 4/5 | Updated to capture exit code from rt::start and explicitly call std::process::exit, enabling proper cleanup |
Confidence score: 3/5
- This PR addresses an important resource management issue but has implementation inconsistencies that need attention
- Score reflects the correct approach to the problem but incomplete refactoring in the health check server spawned task
- Pay close attention to runtime/src/rt.rs lines 101, 107, and 127 where exit() calls still bypass the cleanup mechanism
Sequence Diagram
sequenceDiagram
participant User
participant Runtime as "Runtime (start.rs)"
participant RT as "RT Module (rt.rs)"
participant API as "Shuttle API Client"
participant Service as "User Service"
participant System as "System"
User->>Runtime: "Start Shuttle app"
Runtime->>Runtime: "Check --version flag"
alt version check
Runtime-->>User: "Print version and exit"
end
Runtime->>Runtime: "Parse args and env vars"
alt validation fails
Runtime-->>User: "Print error and hint"
Runtime->>System: "Return (no exit)"
end
Runtime->>Runtime: "Setup tracing/logging"
Runtime->>RT: "start(loader, runner)"
RT->>RT: "Parse RuntimeEnvVars"
RT->>API: "Create ShuttleApiClient"
opt health check enabled
RT->>RT: "Spawn health check server"
end
RT->>API: "get_secrets(project_id)"
API-->>RT: "Return secrets"
RT->>RT: "Create ResourceFactory"
RT->>RT: "loader.load(factory)"
loop for each Shuttle resource
RT->>API: "provision_resource(project_id, resource)"
API-->>RT: "Return resource state"
alt state is Provisioning/Authorizing
RT->>RT: "Sleep 2 seconds"
else state is Ready
RT->>RT: "Update resource bytes"
else bad state
RT-->>Runtime: "Return error code 132"
end
end
opt in Shuttle environment
RT->>API: "Send sidecar shutdown request"
end
RT->>RT: "runner.run(resources)"
RT-->>RT: "Get service instance"
alt runner fails
RT-->>Runtime: "Return error code 151"
end
RT->>Service: "service.bind(service_addr)"
par Service execution
Service->>Service: "Handle requests"
and Signal handling
RT->>RT: "Listen for SIGTERM/SIGINT/etc"
end
alt Signal received
RT-->>Runtime: "Return exit code 10"
else Service terminates normally
RT-->>Runtime: "Return exit code 0"
else Service error
RT-->>Runtime: "Return exit code 1"
end
Runtime->>Service: "Drop service (implicit)"
Runtime->>System: "std::process::exit(exit_code)"
2 files reviewed, no comments
76fe277 to
ce5c3da
Compare
Allows services and other internals to
Dropbefore the runtime callsexit()