Make the interactive jobs not be interrupted by remote status worker#1567
Make the interactive jobs not be interrupted by remote status worker#1567
Conversation
|
Paragon Review Skipped Hi @deep1401! Your Polarity credit balance is insufficient to complete this review. Please visit https://home.polarity.cc to add more credits and continue using Paragon reviews. |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
| else: | ||
| return False | ||
| elif job_status == JobStatus.STOPPING.value: | ||
| # If the user asked to stop, prefer STOPPED even if the wrapper reports "finished". |
There was a problem hiding this comment.
Not asking for a change here...just thinking if this is right...
So the scenario is user clicks stop and before the background worker can stop it the job finishes on its own? My instinct says to tell the user FINISHED instead of STOPPED, otherwise they will look at the job and wonder if it completely finished or not?
There was a problem hiding this comment.
Yes sometimes what happens is that the live_status may reach the finished state and in that time, the remote status worker marks the job as COMPLETE. This usually happens for me on local provider which doesnt hold the job always.
I didn't want to mark it as COMPLETE because we dont know the exact scenario and this is the behaviour we follow for the normal tasks as well. We just dont mark it as FAILED if it reached till the end of the script because that means the command likely never crashed in between to mark it FAILED.
| This function must never raise. | ||
| """ | ||
| try: | ||
| from transformerlab.db.session import async_session |
There was a problem hiding this comment.
Maybe add a comment here...so basically importing these things can fail and you need it to not fail? When can they fail?
There was a problem hiding this comment.
I just combined the try except. The main failure to be handled is in the code part only. There was no need to have a separate try except for the imports
|
|
||
| provider_id = job_data.get("provider_id") | ||
| cluster_name = job_data.get("cluster_name") | ||
| if not provider_id or not cluster_name: |
There was a problem hiding this comment.
So I think most of these returns are pretty unlikely and you are just catching to make sure there are no exceptions. But just wondering if there's anywhere else that we should at least print an error message about how a job wasn't stopped?
There was a problem hiding this comment.
Right now we just have a print statement but all of this data is present in the job data when we launch so I dont think we'd be at a scenario where a job isn't stopped. Usually for interactive jobs, they would get stopped by this function if they crashed (only for local providers, as all the other providers just stop on their own at the end).
No description provided.