Update job status to FAILED for a crash detected by tfl remote trap by deep1401 · Pull Request #1432 · transformerlab/transformerlab-app

deep1401 · 2026-03-03T17:54:07Z

Summary by CodeRabbit

Chores
- Project version updated to 0.0.86
- Updated transformerlab dependency to 0.0.86
Bug Fixes
- Improved live status handling for crashed jobs so failures are mirrored to job status and marked as FAILED

coderabbitai · 2026-03-03T17:54:29Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8f1ccfd and 220ab14.

📒 Files selected for processing (2)

- `api/pyproject.toml`
- `lab-sdk/pyproject.toml`

🚧 Files skipped from review as they are similar to previous changes (2)

- `api/pyproject.toml`
- `lab-sdk/pyproject.toml`

📝 Walkthrough


Version bumps: `api` and `lab-sdk` bumped from 0.0.85 → 0.0.86. Remote trap update: the `_set_live_status_async` docstring is expanded, and when `live_status == "crashed"` the function now also sets the job status to `"FAILED"`.
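The crash-mirroring behavior described above can be sketched as follows. This is a minimal, hypothetical illustration, not the code in `lab-sdk/src/lab/remote_trap.py`: `InMemoryJob` and `set_live_status_sketch` are invented stand-ins, the initial `"RUNNING"` status is an assumption, and only the three calls quoted in the Paragon review (`Job.get`, `update_job_data_field`, `update_status`) are modeled. The blanket `except` mirrors the best-effort behavior that review describes.

```python
import asyncio
from typing import Any, Dict, Optional


class InMemoryJob:
    """Hypothetical stand-in for the SDK's Job class; keeps all state in memory."""

    # Class-level lookup table so the async classmethod `get` can find jobs by id.
    _registry: Dict[str, "InMemoryJob"] = {}

    def __init__(self, job_id: str, status: str = "RUNNING") -> None:
        self.job_id = job_id
        self.status = status  # top-level job status (initial value is an assumption)
        self.job_data: Dict[str, Any] = {}  # free-form fields such as live_status
        InMemoryJob._registry[job_id] = self

    @classmethod
    async def get(cls, job_id: str) -> Optional["InMemoryJob"]:
        return cls._registry.get(job_id)

    async def update_job_data_field(self, key: str, value: Any) -> None:
        self.job_data[key] = value

    async def update_status(self, status: str) -> None:
        self.status = status


async def set_live_status_sketch(job_id: str, status: str) -> None:
    """Record live_status and, on a crash, also mark the job FAILED (hypothetical)."""
    try:
        job = await InMemoryJob.get(job_id)
        if job is None:
            return  # unknown job: nothing to update
        await job.update_job_data_field("live_status", status)
        if status == "crashed":
            # Mirror the crash into the top-level status so job-status views see it.
            await job.update_status("FAILED")
    except Exception:
        # Best-effort: swallow storage errors instead of propagating them.
        return


if __name__ == "__main__":
    job = InMemoryJob("job-1")
    asyncio.run(set_live_status_sketch("job-1", "crashed"))
    print(job.job_data["live_status"], job.status)  # crashed FAILED
```

In this sketch `live_status` is written before the job status; the Paragon review flags exactly this ordering as a possible source of inconsistent state.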

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| Version files | `api/pyproject.toml`, `lab-sdk/pyproject.toml` | Project/dependency version incremented from `0.0.85` to `0.0.86`. |
| Remote Trap Logic | `lab-sdk/src/lab/remote_trap.py` | Docstring expanded for `_set_live_status_async` and logic added to mark job status as `"FAILED"` when `live_status` is `"crashed"`. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Remote trap to indicate live status of a command running irrespective of lab-sdk usage #1305: Related change introducing/adjusting remote_trap handling for live_status == "crashed" and marking jobs as failed.

Suggested labels

mode:multiuser

Suggested reviewers

aliasaria

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: updating job status to FAILED when a crash is detected by the remote trap, which is reflected in the lab-sdk/src/lab/remote_trap.py changes. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


paragon-review · 2026-03-03T18:03:32Z

Paragon Summary

This pull request review identified 1 issue across 1 category in 3 files. The review analyzed code changes, potential bugs, security vulnerabilities, performance issues, and code quality concerns using automated analysis tools.

This PR enhances crash detection by updating job status to FAILED when crashes are caught by the tfl remote trap mechanism, with updates to the SDK's remote_trap implementation and version bumps in both the API and SDK packages.

Key changes:

- Updates job status to FAILED when crashes are detected by the tfl remote trap
- Modifies `lab-sdk/src/lab/remote_trap.py` for crash detection logic
- Updates dependencies in both `api/pyproject.toml` and `lab-sdk/pyproject.toml`

Confidence score: 5/5

- This PR has low risk, with no critical or high-priority issues identified
- Score reflects a clean code review with only minor suggestions or no issues found
- Code quality checks passed; safe to proceed with merge

3 files reviewed, 1 comment

Severity breakdown: Low: 1


paragon-review · 2026-03-03T18:03:32Z

```python
job = await Job.get(job_id)
if job is None:
    return
await job.update_job_data_field("live_status", status)
```


Bug: Crash status writes can partially fail silently

Crash status writes can partially fail silently: the job may show `live_status=crashed` without a FAILED status. Consider calling `update_status` before `update_job_data_field`.


Location: `lab-sdk/src/lab/remote_trap.py` (line 18)

Analysis


What fails: If `update_job_data_field` succeeds but `update_status('FAILED')` throws, the blanket `except` silently returns, leaving `live_status=crashed` without a matching FAILED job status.

Result: The job has `live_status=crashed`, but its top-level job status is not set to FAILED, creating an inconsistent state.

Expected: Both `live_status` and the job status should reflect the crash consistently, or the more important status (FAILED) should be written first.

Impact: Minor. Under the existing best-effort design this is acceptable, but the job could appear stuck in monitoring UIs that key on job status rather than `live_status`.
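The reviewer's suggested ordering (write FAILED first) can be sketched in memory. `FlakyJob` and `mark_crashed_status_first` are hypothetical names, the initial `"RUNNING"` status is an assumption, and `OSError` stands in for the unspecified network or storage error. This illustrates the ordering only; it is not a fix confirmed in this PR.

```python
import asyncio
from typing import Any, Dict, Optional


class FlakyJob:
    """Hypothetical in-memory job whose writes can be forced to fail."""

    def __init__(self, fail_on: Optional[str] = None) -> None:
        self.status = "RUNNING"  # assumed initial status
        self.job_data: Dict[str, Any] = {}
        self.fail_on = fail_on  # name of the method that should raise, if any

    async def update_status(self, status: str) -> None:
        if self.fail_on == "update_status":
            raise OSError("simulated storage error")
        self.status = status

    async def update_job_data_field(self, key: str, value: Any) -> None:
        if self.fail_on == "update_job_data_field":
            raise OSError("simulated storage error")
        self.job_data[key] = value


async def mark_crashed_status_first(job: FlakyJob) -> None:
    """Write FAILED before live_status, then swallow errors (best-effort)."""
    try:
        await job.update_status("FAILED")  # the field monitoring UIs key on
        await job.update_job_data_field("live_status", "crashed")
    except Exception:
        return


if __name__ == "__main__":
    for fail_on in (None, "update_status", "update_job_data_field"):
        job = FlakyJob(fail_on)
        asyncio.run(mark_crashed_status_first(job))
        print(fail_on, job.status, job.job_data.get("live_status"))
```

With this order, a single failed write can no longer produce `live_status=crashed` alongside a non-FAILED status: if `update_status` fails, neither field changes, and if the `live_status` write fails, the job is still marked FAILED.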

How to reproduce

1. Trigger a remote job crash (nonzero exit code).
2. Simulate `update_status` raising an exception (e.g., a network or storage error).
3. Observe the job state.
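The reproduction steps above can be simulated without a real remote job. In this sketch `StatusWriteFailsJob` and `mark_crashed_original_order` are hypothetical, the initial `"RUNNING"` status is an assumption, and the crash in step 1 is represented simply by running the handler that records `"crashed"`. The handler follows the ordering described in the analysis: `live_status` first, then FAILED, inside a blanket `except`.

```python
import asyncio
from typing import Any, Dict


class StatusWriteFailsJob:
    """Hypothetical job whose update_status always raises (simulated storage error)."""

    def __init__(self) -> None:
        self.status = "RUNNING"  # assumed initial status
        self.job_data: Dict[str, Any] = {}

    async def update_job_data_field(self, key: str, value: Any) -> None:
        self.job_data[key] = value

    async def update_status(self, status: str) -> None:
        raise OSError("simulated storage error")  # step 2: the status write fails


async def mark_crashed_original_order(job: StatusWriteFailsJob) -> None:
    """live_status first, then FAILED, with errors swallowed silently."""
    try:
        await job.update_job_data_field("live_status", "crashed")
        await job.update_status("FAILED")
    except Exception:
        return  # the OSError is swallowed, so nothing signals the partial write


job = StatusWriteFailsJob()
asyncio.run(mark_crashed_original_order(job))  # steps 1 and 2
print(job.job_data["live_status"], job.status)  # step 3: prints "crashed RUNNING"
```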


sentry · 2026-03-03T18:40:19Z

Codecov Report

✅ All modified and coverable lines are covered by tests.


deep1401 and others added 2 commits March 3, 2026 10:52

Update job status to FAILED for a crash detected by tfl remote trap

0dc46c4

Merge branch 'main' into fix/remote-crash-running-state

8f1ccfd

paragon-review bot reviewed Mar 3, 2026


deep1401 added 3 commits March 3, 2026 11:03

Merge branch 'main' of https://github.com/transformerlab/transformerl…

e893de0

…ab-app into fix/remote-crash-running-state

Merge branch 'main' of https://github.com/transformerlab/transformerl…

b5ffcce

…ab-app into fix/remote-crash-running-state

version

220ab14

Merge branch 'main' into fix/remote-crash-running-state

c3ac6ac

dadmobile approved these changes Mar 3, 2026


aliasaria approved these changes Mar 3, 2026


Merge branch 'main' into fix/remote-crash-running-state

bcad1c9

deep1401 merged commit 52c9dbd into main Mar 3, 2026
10 of 11 checks passed

deep1401 merged 7 commits into main from fix/remote-crash-running-state
