Show more interactive job status and move completed jobs to History #1633

Merged
aliasaria merged 22 commits into main from fix/interactive-jobs-status-history
Mar 26, 2026

@aliasaria
Member

Summary

  • Add WAITING to the active interactive jobs filter so pending jobs appear as cards
  • Move completed/failed/stopped interactive jobs to the History section using JobsList
  • Comment out TaskTemplateList in History (to be removed later)
  • Hide Output and Files buttons for interactive jobs in JobsList
  • Update empty state message when no services are running
  • Use human-readable strings for job_data.live_status instead of codes (e.g. "Remote command started" instead of "started")
  • Add live_status updates during local provider job launch phases: "Preparing environment", "Running setup", "Starting service"
  • Fix org context propagation for live_status callback in queue worker (contextvars don't propagate via run_coroutine_threadsafe)
  • Increase remote job status polling frequency from 15s to 5s
  • Limit error message display size in JobProgress
  • Document SDK installation, context vars, and subsystem-specific docs in AGENTS.md

Test plan

  • Queue an interactive job on local provider and verify status updates appear: "Preparing environment" → "Running setup" → "Starting service" → "Remote command started"
  • Verify completed/failed/stopped interactive jobs appear in History section
  • Verify active jobs (WAITING, LAUNCHING, INTERACTIVE, RUNNING, STOPPING) appear as cards in Running Services
  • Verify Output and Files buttons are hidden for interactive jobs in History
  • Verify long error messages are scrollable and constrained in size

- Add WAITING to active jobs filter so pending jobs appear as cards
- Show COMPLETE/FAILED/STOPPED interactive jobs in History via JobsList
- Comment out TaskTemplateList in History (to be removed later)
- Limit error message display size in JobProgress
- Hide Output and Files buttons for interactive jobs in JobsList
- Update empty state message

live_status is now stored as a display-ready string (e.g.
"Remote command started") rather than a code (e.g. "started").
The frontend renders it directly instead of mapping codes to messages.

Updated: remote_trap.py, lab_facade.py, remote_job_status_service.py,
JobProgress.tsx, and corresponding tests.

Set live_status at each phase of LocalProvider.launch_cluster():
- "Preparing environment" before venv creation
- "Running setup" before setup commands run
- "Starting service" before the run command launches

This fills the visibility gap between job creation and
tfl-remote-trap taking over with "Remote command started".
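
The phase reporting described above might look roughly like the sketch below. It is not the actual LocalProvider code: the `launch_cluster` signature, the `on_status` parameter, and the placeholder work items are assumptions made for illustration. Only the three status strings come from this change.

```python
from typing import Callable, List, Optional

StatusCallback = Callable[[str], None]


def launch_cluster(
    setup_commands: List[str],
    run_command: str,
    on_status: Optional[StatusCallback] = None,
) -> List[str]:
    """Hypothetical stand-in for LocalProvider.launch_cluster()."""

    def report(status: str) -> None:
        # Reporting is optional so callers that do not care can omit it.
        if on_status is not None:
            on_status(status)

    executed: List[str] = []

    report("Preparing environment")
    executed.append("create venv")  # placeholder for venv creation

    report("Running setup")
    executed.extend(setup_commands)  # placeholder for running setup commands

    report("Starting service")
    executed.append(run_command)  # placeholder for launching the run command

    return executed


def demo_phases() -> List[str]:
    # Record every status in order, the way a UI would see them.
    seen: List[str] = []
    launch_cluster(["pip install -r requirements.txt"], "python serve.py", on_status=seen.append)
    return seen
```

Each status is reported before the phase it names begins, so the UI shows what is about to happen rather than what just finished.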
Makes live_status updates visible more quickly in the UI.

The previous approach called job_service via asyncio.run() from the
executor thread, which failed because the org context var wasn't
propagated. Switch to a callback pattern: the queue worker (which
has the right async context) passes an on_status callback to
launch_cluster(), using run_coroutine_threadsafe to bridge the
thread boundary.
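
A minimal sketch of that callback pattern, assuming a blocking launch function that runs in an executor thread. `save_live_status`, `blocking_launch`, and the in-memory `store` are hypothetical stand-ins for the real job service and provider. Only `asyncio.run_coroutine_threadsafe` and `loop.run_in_executor` are real asyncio calls.

```python
import asyncio
from typing import Callable, List


async def save_live_status(store: List[str], status: str) -> None:
    """Hypothetical async job-service call that persists live_status."""
    await asyncio.sleep(0)  # stands in for real async I/O
    store.append(status)


def blocking_launch(on_status: Callable[[str], None]) -> None:
    """Hypothetical blocking launch; runs in an executor thread."""
    on_status("Preparing environment")
    on_status("Starting service")


async def queue_worker() -> List[str]:
    loop = asyncio.get_running_loop()
    store: List[str] = []

    def on_status(status: str) -> None:
        # Called from the executor thread: hand the coroutine to the
        # worker's event loop and wait for it to finish there.
        future = asyncio.run_coroutine_threadsafe(save_live_status(store, status), loop)
        future.result(timeout=5)

    # The worker awaits the executor, so its loop stays free to run the
    # coroutines scheduled by on_status.
    await loop.run_in_executor(None, blocking_launch, on_status)
    return store


def demo_bridge() -> List[str]:
    return asyncio.run(queue_worker())
```

The launch code stays synchronous and never touches the event loop. The worker owns the loop and decides how status updates are persisted.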
run_coroutine_threadsafe doesn't propagate contextvars from the
calling task. Set the organization_id explicitly in the coroutine
so the job directory can be resolved correctly.
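
The explicit-set fix can be sketched as follows. `current_org_id`, `resolve_job_dir`, `update_live_status`, and the path layout are hypothetical names invented for this example, and the event loop is run in a background thread only to keep the sketch self-contained.

```python
import asyncio
import contextvars
import threading

# Hypothetical stand-in for the application's organization context variable.
current_org_id = contextvars.ContextVar("current_org_id", default=None)


def resolve_job_dir(job_id: str) -> str:
    """Hypothetical resolver: the job directory depends on the current org."""
    org_id = current_org_id.get()
    if org_id is None:
        raise RuntimeError("organization id is not set in this context")
    return f"orgs/{org_id}/jobs/{job_id}"


async def update_live_status(org_id: str, job_id: str, status: str) -> str:
    # Set the org explicitly inside the coroutine instead of relying on
    # whatever context the scheduling code happened to have.
    current_org_id.set(org_id)
    return f"{resolve_job_dir(job_id)} -> {status}"


def demo_explicit_org() -> str:
    # Run an event loop in a background thread, then schedule onto it.
    loop = asyncio.new_event_loop()
    thread = threading.Thread(target=loop.run_forever, daemon=True)
    thread.start()
    try:
        future = asyncio.run_coroutine_threadsafe(
            update_live_status("org-1", "job-42", "Running setup"), loop
        )
        return future.result(timeout=5)
    finally:
        loop.call_soon_threadsafe(loop.stop)
        thread.join(timeout=5)
        loop.close()
```

Passing the organization id as an ordinary argument makes the dependency visible at the call site. The value set inside the coroutine stays local to that task's context and does not leak back to the caller.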
- Add note about reinstalling lab-sdk after changes (cd lab-sdk && pip install -e .)
- Point agents to relevant docs based on whether they're working on frontend or backend
- Document context var pitfall: org_id doesn't propagate to new threads/coroutines and must be set explicitly

sentry bot commented Mar 25, 2026

Codecov Report

❌ Patch coverage is 27.27273% with 24 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...pi/transformerlab/services/local_provider_queue.py | 0.00% | 13 Missing ⚠️ |
| api/transformerlab/compute_providers/local.py | 10.00% | 9 Missing ⚠️ |
| ...ansformerlab/services/remote_job_status_service.py | 80.00% | 2 Missing ⚠️ |


-REMOTE_JOB_STATUS_INTERVAL_SECONDS = int(os.getenv("REMOTE_JOB_STATUS_INTERVAL_SECONDS", "15"))
+REMOTE_JOB_STATUS_INTERVAL_SECONDS = int(os.getenv("REMOTE_JOB_STATUS_INTERVAL_SECONDS", "5"))
Member

I'm going to switch this back because this causes some bugs when checking for empty jobs

Member

@deep1401 deep1401 left a comment

Running remote jobs are being marked complete, which brings back an older error we used to have.
Weirdly, live_status also doesn't move past "live_status": "Lab initialized".
Will figure out what the error is.

Member

@deep1401 deep1401 left a comment

Just adding a comment that the bug is fixed. I'll hold off on approval until we build so we don't accidentally merge.

@aliasaria aliasaria merged commit 695f3e4 into main Mar 26, 2026
7 of 9 checks passed