Conversation
As per discussion in #2595. Function will utilise the PID reported by the run_first_all script, as this seems to be intended for use in determining when all processing tasks have been completed.
Note that this solution uses the
Happy to help if I can do so easily. Currently I have this as a conda install, but I am pretty illiterate with conda and python. What is the quickest way to get my current setup into a state where I can test this patch?
Because it's not part of a tagged release, you'd need to clone the feature branch called
I'll see if I can get to this this weekend... |
Sorry for the delay. I realized you had just changed python files, so I applied the patch manually. Unfortunately it doesn't work. SGE does not spit out a process ID; rather, it spits out a job ID. You could check whether it is still in the queue like this: `qstat | grep jobID`. If it returns the job ID, the job hasn't finished running; if it returns nothing, it has.
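The manual check described above can be sketched as a small helper. This is a hypothetical illustration, not part of the patch: the function name is made up, and it simply shells out to `qstat` and looks for the job ID as a token in the output, mirroring `qstat | grep jobID`.

```python
import subprocess


def sge_job_running(job_id):
    """Return True if `job_id` still appears in the SGE queue listing.

    Hypothetical helper mirroring the manual check `qstat | grep jobID`:
    the job is considered finished once qstat no longer lists it.
    """
    try:
        output = subprocess.run(
            ['qstat'], capture_output=True, text=True, check=True
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        # qstat unavailable or failing, e.g. no SGE on this system
        return False
    # Match the job ID as a whole token, not a substring of another ID
    return any(str(job_id) in line.split() for line in output.splitlines())
```

A completion wait would then just poll this function until it returns False.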
Ah OK, it has to be queried much like a SLURM job. Unfortunately there doesn't seem to be any established / mature Python package for querying SGE jobs, so I would indeed be forced to implement the query myself. Alternatively I could change strategy entirely, and directly access & query the various FIRST log files, looking for any errors. I didn't want to go this way as it could theoretically change easily between FSL versions, and would be entirely inapplicable to any other context (as opposed to
Could you just `fsl_sub -j` something and then check for it to finish? Perhaps that would be agnostic to SGE vs SLURM (with `fsl_sub` handling the queuing system). Your `fsl_sub -j` command would not run until all of the FIRST jobs had completed (or failed), telling you when it is time to check the outputs.
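The suggestion above could be sketched roughly as follows. This is an assumption-laden illustration, not the actual patch: it assumes `fsl_sub` is on `PATH` and that `-j` imposes a job hold, and the sentinel filename and function name are invented for the example. The held job does nothing but create a sentinel file, so the file appearing means the scheduler has released it, i.e. the FIRST jobs are done.

```python
import os
import subprocess
import time


def wait_for_first_jobs(job_id, workdir, submit_cmd='fsl_sub', poll=1.0):
    """Block until all jobs under `job_id` have finished.

    Sketch only: submits a trivial job held on `job_id` via
    `submit_cmd -j job_id touch <sentinel>`, then polls for the
    sentinel file that the held job creates on release.
    """
    sentinel = os.path.join(workdir, 'first_done.%s' % job_id)
    subprocess.run([submit_cmd, '-j', str(job_id), 'touch', sentinel],
                   check=True)
    while not os.path.exists(sentinel):
        time.sleep(poll)
    return sentinel
```

The `submit_cmd` parameter exists only so the sketch can be exercised without a real scheduler; in practice it would always be `fsl_sub`.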
Closed in favour of #2609. |
Attempt at resolving #2595.
If possible, it will take the PID reported on stdout and wait for its completion.
Looking at the `run_first_all` code, I think that `fsl_sub` is used in such a way that each spawned process inherits the previously executed processes as its children, as it's the PID of the last executed job that is sent to stdout.

This does as intended on my own system, in that the PID reported by `run_first_all` no longer exists and so the function can proceed as normal. However it really requires testing on a system with SGE configured. @glasserm is this something you're in a position to do easily?

Also, I've left the `path.wait_for()` call in place where SGE is easily detected, just as a failsafe; I don't expect that code to actually be used anymore, since it will only execute if the PID can't be queried.
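The PID-based wait described above can be sketched with standard POSIX semantics. This is a minimal illustration of the idea, not the code in the branch; the function names and poll interval are made up. Sending signal 0 with `os.kill` checks for the process's existence without actually signalling it.

```python
import os
import time


def pid_exists(pid):
    """Return True if a process with this PID currently exists.

    Signal 0 performs the existence/permission check without sending
    a signal; EPERM (PermissionError) means the process exists but
    belongs to another user.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True
    return True


def wait_for_pid(pid, poll=5.0):
    """Poll until the PID reported on stdout disappears.

    Minimal sketch of the waiting strategy; the poll interval is
    arbitrary.
    """
    while pid_exists(pid):
        time.sleep(poll)
```

Note this only works when the reported PID lives on the same host, which is exactly why it breaks down under SGE, where only a job ID is reported.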