Conversation
…te/down while running
📝 WalkthroughWalkthroughThe changes improve process lifecycle management by adding zombie process detection in the local compute provider, changing default cluster status from DOWN to UNKNOWN when no pid file exists, and reducing debug logging noise across the codebase. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/transformerlab/compute_providers/local.py`:
- Around line 269-276: The current liveness check only tests PID existence
(os.kill(pid, 0) and _is_process_zombie) which is vulnerable to PID reuse;
update the check in the status path that returns ClusterStatus/ClusterState.UP
to validate process identity by comparing a persistent launch fingerprint (e.g.,
store launch metadata when starting the cluster such as pid and
create_time/session id) with the live process info (use
psutil.Process(pid).create_time() or equivalent) before returning UP; if the
create_time/session metadata does not match, treat the cluster as not running
and return an appropriate non-UP status. Ensure this validation is added
alongside the existing os.kill/_is_process_zombie checks and that the stored
launch fingerprint is written when the cluster is created.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
api/transformerlab/compute_providers/local.pyapi/transformerlab/services/local_provider_queue.py
💤 Files with no reviewable changes (1)
- api/transformerlab/services/local_provider_queue.py
| os.kill(pid, 0) | ||
| if _is_process_zombie(pid): | ||
| raise ProcessLookupError("Process is zombie/defunct") | ||
| return ClusterStatus( | ||
| cluster_name=cluster_name, | ||
| state=ClusterState.UP, | ||
| status_message="Process running", | ||
| ) |
There was a problem hiding this comment.
PID reuse can still cause false UP status
On Line 269 and Line 272, liveness is inferred from PID existence. If the original process exits and that PID gets reused, an unrelated process can be reported as this cluster.
Please persist and validate process identity (e.g., launch-time fingerprint such as create time/session metadata) before returning ClusterState.UP.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@api/transformerlab/compute_providers/local.py` around lines 269 - 276, The
current liveness check only tests PID existence (os.kill(pid, 0) and
_is_process_zombie) which is vulnerable to PID reuse; update the check in the
status path that returns ClusterStatus/ClusterState.UP to validate process
identity by comparing a persistent launch fingerprint (e.g., store launch
metadata when starting the cluster such as pid and create_time/session id) with
the live process info (use psutil.Process(pid).create_time() or equivalent)
before returning UP; if the create_time/session metadata does not match, treat
the cluster as not running and return an appropriate non-UP status. Ensure this
validation is added alongside the existing os.kill/_is_process_zombie checks and
that the stored launch fingerprint is written when the cluster is created.
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
Summary by CodeRabbit
Bug Fixes
Chores