Skip to content

Make the interactive jobs not be interrupted by remote status worker#1567

Merged
deep1401 merged 7 commits intomainfrom
fix/remote-status-interactive
Mar 18, 2026
Merged

Make the interactive jobs not be interrupted by remote status worker#1567
deep1401 merged 7 commits intomainfrom
fix/remote-status-interactive

Conversation

@deep1401
Copy link
Copy Markdown
Member

No description provided.

@paragon-review
Copy link
Copy Markdown

Paragon Review Skipped

Hi @deep1401! Your Polarity credit balance is insufficient to complete this review.

Please visit https://home.polarity.cc to add more credits and continue using Paragon reviews.

@deep1401 deep1401 requested a review from dadmobile March 17, 2026 21:04
@sentry
Copy link
Copy Markdown

sentry bot commented Mar 17, 2026

Codecov Report

❌ Patch coverage is 35.71429% with 27 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...ansformerlab/services/remote_job_status_service.py 35.71% 24 Missing and 3 partials ⚠️

📢 Thoughts on this report? Let us know!

else:
return False
elif job_status == JobStatus.STOPPING.value:
# If the user asked to stop, prefer STOPPED even if the wrapper reports "finished".
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not asking for a change here...just thinking if this is right...
So the scenario is user clicks stop and before the background worker can stop it the job finishes on its own? My instinct says to tell the user FINISHED instead of STOPPED, otherwise they will look at the job and wonder if it completely finished or not?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes sometimes what happens is that the live_status may reach the finished state and in that time, the remote status worker marks the job as COMPLETE. This usually happens for me on local provider which doesnt hold the job always.

I didn't want to mark it as COMPLETE because we dont know the exact scenario and this is the behaviour we follow for the normal tasks as well. We just dont mark it as FAILED if it reached till the end of the script because that means the command likely never crashed in between to mark it FAILED.

This function must never raise.
"""
try:
from transformerlab.db.session import async_session
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment here...so basically importing these things can fail and you need it to not fail? When can they fail?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just combined the try except. The main failure to be handled is in the code part only. There was no need to have a separate try except for the imports


provider_id = job_data.get("provider_id")
cluster_name = job_data.get("cluster_name")
if not provider_id or not cluster_name:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think most of these returns are pretty unlikely and you are just catching to make sure there are no exceptions. But just wondering if there's anywhere else that we should at least print an error message about how a job wasn't stopped?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now we just have a print statement but all of this data is present in the job data when we launch so I dont think we'd be at a scenario where a job isn't stopped. Usually for interactive jobs, they would get stopped by this function if they crashed (only for local providers, as all the other providers just stop on their own at the end).

@deep1401 deep1401 requested a review from dadmobile March 18, 2026 13:36
@deep1401 deep1401 merged commit 85dcb36 into main Mar 18, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants