Skip to content

fsl.check_first(): Utilise PID#2597

Closed
Lestropie wants to merge 2 commits intomasterfrom
check_first
Closed

fsl.check_first(): Utilise PID#2597
Lestropie wants to merge 2 commits intomasterfrom
check_first

Conversation

@Lestropie
Copy link
Member

Attempt at resolving #2595.

If possible, it will take the PID reported on stdout and wait for its completion.
Looking at the run_first_all code, I think that fsl_sub is used in such a way that each spawned processes inherits the previously executed processes as its children, as it's the PID of the last executed job that is sent to stdout.

This does as intended on my own system, in that the PID reported by run_first_all no longer exists and so the function can proceed as normal. However it really requires testing on a system with SGE configured. @glasserm is this something you're in a position to do easily?

Also, I've left the path.wait_for() call in place where SGE is easily detected just as a failsafe; I don't expect that code to actually be used anymore since it will only execute if the PID can't be queried.

As per discussion in #2595.
Function will utilise the PID reported by the run_first_all script, as this seems to be intended for use in determining when all processing tasks have been completed.
@Lestropie
Copy link
Member Author

Note that this solution uses the psutil module, which is seemingly not a "default" and needs to be explicitly installed in some environments. This would need to be addressed not only for CI, but also in package generation, containers, and installation-from-source documentation. It may not be an absolute requisite (I can have the code skip those steps if the module is not available), but we should try to propagate the dependency nevertheless.

@glasserm
Copy link

glasserm commented Mar 8, 2023

Happy to help if I can easily. Currently I have this as a conda install, but I am pretty illiterate with conda and python. What is the quickest way to get my current setup in a state to test this patch?

@Lestropie
Copy link
Member Author

Because it's not part of a tagged release, you'd need to clone the feature branch called check_first and then build from source.

@glasserm
Copy link

glasserm commented Mar 8, 2023

I'll see if I can get to this this weekend...

@glasserm
Copy link

glasserm commented Mar 20, 2023

Sorry for the delay. I realized you just changed python files so I applied the patch manually. Unfortunately it doesn't work. SGE does not spit out a process ID, rather it spits out a job ID. You could check to see if it is still in the queue list like this qstat | grep jobID. If it returns the job ID, it hasn't finished running, if it returns nothing, it has.

@Lestropie
Copy link
Member Author

Ah OK, it has to be queried much like a SLURM job.
I have no experience with SGE and no ability to test, so I'm taking shots a little blindly here.

Unfortunately there doesn't seem to be any established / mature Python package for querying SGE jobs. So I would indeed be forced to subprocess regular qstat calls. I don't think that just querying the presence of the qstat command in PATH as a proxy for potential use of SGE works.

Alternatively I could change strategy entirely, and directly access & query the various FIRST log files, looking for any errors. I didn't want to go this way as it could theoretically change easily between FSL versions, and would be entirely inapplicable to any other context (as opposed to path.wait_for()). But it might turn out to be simpler...

@glasserm
Copy link

Could you just fsl_sub -j something and then check for it to finish? Perhaps that would be agnostic to SGE vs SLURM (with fsl_sub handling the queuing system). Your fsl_sub -j command would not run until all of the FIRST jobs had completed (or failed) telling you when it is time to check the outputs.

Lestropie added a commit that referenced this pull request Mar 22, 2023
If possible, use fsl_sub command to halt execution until all jobs have completed.
Result of discussion in #2597.
Addresses #2595.
Lestropie added a commit that referenced this pull request Mar 22, 2023
If possible, use fsl_sub command to halt execution until all jobs have completed.
Result of discussion in #2597.
Addresses #2595.
Replicates some contents of bd3f19e.
@Lestropie
Copy link
Member Author

Closed in favour of #2609.

@Lestropie Lestropie closed this Apr 9, 2023
@Lestropie Lestropie deleted the check_first branch April 9, 2023 06:15
@Lestropie Lestropie restored the check_first branch August 26, 2025 07:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants