Add more async calls to compute_provider #1369

Merged: dadmobile merged 5 commits into main from fix/more_async_provider_calls on Feb 20, 2026

dadmobile (Member) commented Feb 19, 2026

I think this will address the submit job call causing healthz to time out. Fixes #1367

Summary by CodeRabbit

  • Refactor
    • Compute provider operations (cluster launches, job submission, listing/fetching jobs, logs, cancellations, and credential handling) now run off the main event loop, improving responsiveness and concurrent request handling.
    • Public behavior, returned data shapes, and error handling remain unchanged.
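To illustrate why running provider calls off the event loop matters for a health check, here is a minimal, self-contained sketch. It is not code from this PR: `blocking_submit` and `heartbeat` are hypothetical stand-ins for a synchronous provider call and a healthz-style endpoint, and the timings are arbitrary.

```python
import asyncio
import time


def blocking_submit(seconds: float) -> str:
    # Hypothetical stand-in for a synchronous provider call.
    time.sleep(seconds)
    return "submitted"


async def heartbeat(ticks: list, interval: float, count: int) -> None:
    # Stand-in for a health check: records each time the event loop lets it run.
    for _ in range(count):
        ticks.append(time.monotonic())
        await asyncio.sleep(interval)


async def run(offload: bool) -> int:
    """Return how many heartbeat ticks happened while the submit call ran."""
    ticks: list = []
    beat = asyncio.create_task(heartbeat(ticks, 0.02, 5))
    await asyncio.sleep(0)  # let the heartbeat record its first tick
    start = time.monotonic()
    if offload:
        # Runs in a worker thread; the event loop stays free for other tasks.
        await asyncio.to_thread(blocking_submit, 0.2)
    else:
        # Runs on the event loop thread; nothing else can be scheduled meanwhile.
        blocking_submit(0.2)
    end = time.monotonic()
    ticks_during_call = sum(1 for t in ticks if start < t < end)
    await beat
    return ticks_during_call


if __name__ == "__main__":
    print("ticks during direct call:", asyncio.run(run(offload=False)))
    print("ticks during to_thread call:", asyncio.run(run(offload=True)))
```

In the direct case the heartbeat cannot tick until the call returns, which is roughly the situation a slow job submission would create for a health endpoint served by the same loop.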

coderabbitai bot (Contributor) commented Feb 19, 2026

No actionable comments were generated in the recent review. 🎉


📝 Walkthrough

Synchronous compute-provider operations in the router were moved onto background threads via asyncio.to_thread(): AWS credential loading, provider launch/submit, job listing/info/logs, local path resolution, and cancellation-related provider calls. Public signatures and returned data shapes remain unchanged.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Compute provider router: api/transformerlab/routers/compute_provider.py | Replaced direct synchronous provider and filesystem calls with await asyncio.to_thread(...) in paths for AWS credential retrieval, cluster launch/submit, get_local_provider_job_dir, list_jobs, get_job_info, get_job_logs, and cancellation flows. No public API signature changes; error handling preserved. |
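As a rough illustration of why return shapes and error handling can stay the same after this kind of change: `asyncio.to_thread` hands back the callable's return value unchanged and re-raises any exception at the `await`. The sketch below uses a hypothetical `FakeProvider` and `job_info` helper, not the project's real provider classes or router code.

```python
import asyncio


class FakeProvider:
    """Hypothetical synchronous provider; not the project's real class."""

    def get_job_info(self, job_id: str) -> dict:
        if job_id == "missing":
            raise KeyError(job_id)
        return {"job_id": job_id, "status": "RUNNING"}


async def job_info(provider: FakeProvider, job_id: str) -> dict:
    try:
        # The call runs in a worker thread; its return value comes back as-is.
        return await asyncio.to_thread(provider.get_job_info, job_id)
    except KeyError:
        # Exceptions raised in the thread are re-raised at the await,
        # so an existing try/except around the call keeps working.
        return {"job_id": job_id, "status": "NOT_FOUND"}


if __name__ == "__main__":
    print(asyncio.run(job_info(FakeProvider(), "42")))
    print(asyncio.run(job_info(FakeProvider(), "missing")))
```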

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • transformerlab/transformerlab-app#1362 — Similar change that offloads blocking provider calls to asyncio.to_thread() in the same router and adds provider-specific connection-error handling.

Suggested labels

mode:multiuser

Poem

🐇 I hop on threads to keep queues bright,

I fetch the creds and launch by night,
Logs whisper back without a stall,
Jobs march on — I catch them all,
Hooray for async, small and tall.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: converting synchronous provider calls to asynchronous calls using asyncio.to_thread. |
| Linked Issues check | ✅ Passed | The PR addresses issue #1367 by refactoring synchronous provider operations to use asyncio.to_thread, offloading blocking calls to thread pools to prevent API unresponsiveness during job submission. |
| Out of Scope Changes check | ✅ Passed | All changes are focused on converting provider interactions to async operations, directly addressing the timeout and API unresponsiveness issue without introducing unrelated modifications. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/transformerlab/routers/compute_provider.py`:
- Line 2673: The router currently calls provider_instance.submit_job via
asyncio.to_thread (using symbols provider_instance.submit_job,
asyncio.to_thread, cluster_name, job_config); move this execution into a new or
existing service in api/transformerlab/services (e.g., add a
ComputeProviderService.submit_job method) that encapsulates the threading and
provider invocation, then have the router call that service method and await its
result so the router only orchestrates HTTP I/O; likewise refactor the other
provider-executing calls referenced (the coroutine/threading usages at the other
spots) into corresponding service methods and replace direct
asyncio.to_thread/provider_instance calls in the router with simple service
calls.
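One possible shape for the suggested refactor, as a sketch only: the `ComputeProviderService` name comes from the review comment, but its constructor, the `submit_job` signature, the `RecordingProvider` fake, and the `handle_submit` handler are all assumptions made for illustration and do not reflect the repository's actual code.

```python
import asyncio
import threading
from typing import Any, Optional


class ComputeProviderService:
    """Hypothetical service wrapper; names and signatures are assumptions."""

    def __init__(self, provider_instance: Any) -> None:
        self._provider = provider_instance

    async def submit_job(self, cluster_name: str, job_config: dict) -> Any:
        # The threading detail lives here rather than in the router.
        return await asyncio.to_thread(
            self._provider.submit_job, cluster_name, job_config
        )


class RecordingProvider:
    """Fake provider that records which thread executed the call."""

    def __init__(self) -> None:
        self.thread_id: Optional[int] = None

    def submit_job(self, cluster_name: str, job_config: dict) -> dict:
        self.thread_id = threading.get_ident()
        return {"cluster": cluster_name, "job": job_config["name"]}


async def handle_submit(service: ComputeProviderService) -> dict:
    # What a router handler might reduce to: one awaited service call.
    return await service.submit_job("demo-cluster", {"name": "train"})


if __name__ == "__main__":
    provider = RecordingProvider()
    print(asyncio.run(handle_submit(ComputeProviderService(provider))))
    print("ran off the main thread:", provider.thread_id != threading.get_ident())
```

With this split, a fake provider like the one above could also be used to unit test the service without going through HTTP.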

Comment thread api/transformerlab/routers/compute_provider.py

sentry bot commented Feb 19, 2026

Codecov Report

❌ Patch coverage is 0% with 9 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| api/transformerlab/routers/compute_provider.py | 0.00% | 9 Missing ⚠️ |

@dadmobile dadmobile merged commit 54a100b into main Feb 20, 2026
9 of 10 checks passed

Development

Successfully merging this pull request may close these issues.

When I submit a job I often get a "Connection Lost" overlay

3 participants