Help with Failed Workflow Invocation

Hi,

Could I get some advice on how to convince my workflow to complete? It appears that 43 sequences failed Kraken and 20 failed Bracken, which is surprising because these samples previously completed the taxonomic portion of my workflow successfully.

HUMAnN seems to have bottlenecked the entire functional analysis after 79 jobs failed.

Is it possible to force my invocation to re-analyze the sequences that failed and then complete the functional analysis? Alternatively, are there better ways to accomplish my goal?

Looking forward to any help you can provide.

Cheers,

BB

Hi @zpho3nix,

My primary advice would be to split your samples into smaller chunks and invoke your workflow separately for each of them.

Neither Galaxy Europe nor any other public server has the capacity to process the entire set in parallel for a workflow with such computationally expensive steps, so the steps will largely run sequentially anyway. Since the join step towards the end of your workflow has to wait for all upstream steps to finish, it can take a long time to obtain the aggregated result, and if even one step for a single sample fails, you will not get anything at all.

If you set up batches of, say, 36 samples and process them via 12 invocations of your workflow (probably modified to omit the joint analysis part, which you’d turn into a separate workflow, and starting each new workflow run when the previous one has completed), you will obtain partial results faster and will likely end up with failing jobs localized to just some of the batches.
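For reference, the batch invocations could be scripted along these lines. This is a minimal sketch, assuming you export your workflow as a `.ga` file and write one planemo job file per batch; `workflow.ga` and the `batch-NN.yml` names are placeholders, and the `echo` keeps it a dry run:

```shell
# Hypothetical sketch: one 'planemo run' per batch, launched sequentially.
# 'workflow.ga' and the batch-NN.yml job files are placeholders -- each
# job file would map the workflow inputs to the ~36 datasets of one batch.
GALAXY_URL="https://usegalaxy.eu"
# Export your API key beforehand: export GALAXY_API_KEY=...

submitted=0
for job in batch-01.yml batch-02.yml batch-03.yml; do
  # 'echo' makes this a dry run; remove it to actually submit each batch.
  echo planemo run workflow.ga "$job" \
    --galaxy_url "$GALAXY_URL" \
    --galaxy_user_key "$GALAXY_API_KEY"
  submitted=$((submitted + 1))
done
echo "prepared $submitted batch invocations"
```

Running the batches one after another (rather than all at once) also matches the advice above, since the server would queue them anyway.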


Second: for this invocation you have also been unlucky, because we had server issues during this time. We are still working on those, so please hold off on any reruns until tonight or tomorrow morning, and you should have a much smoother experience again.


Now to answer your rerun question:

Normally you can rerun failed jobs that are part of a workflow by finding them in your history and selecting the rerun option. Then, in the tool interface, scroll down and activate the option "Resume dependencies from this job" just above the Run Tool button. When you now run the tool, the new run will replace the failed one, and downstream jobs waiting for its results will resume.

If you have lots of failing jobs, you might be interested in automating the task. For this, you would need to use the command line tool planemo, and I could explain how that works if you’re interested and not afraid of using the command line.

I can also offer to do the automated rerun for you on our server once we have fixed our issues, which might be the simplest solution for now. Just let me know if you’re OK with me performing this action under your user account.

Cheers,

Wolfgang


Hi Wolfgang,

I’d truly appreciate it if you’d automate the rerun for me. Do you think it’s safe now to process my sequences in batch format as you suggested? If so, I’ve got a lot of work to do :’)

I’ve triggered the rerun, but obviously it may take a while.

Just don’t interfere with things manually; you can watch the progress under your original link above.

Oh oh, that didn’t end well.

I hadn’t realized that all your data was on temporary storage, and now everything was deleted before all datasets could be processed.

I’m very sorry, but you will have to upload the original data again and restart your workflow (possibly taking into account my previous recommendations).

I can help you babysit the invocation if problems occur again.

Very sorry for this worst kind of outcome,

Wolfgang

Hi Wolfgang,

-deep sigh- I just can’t win with this bioinformatics pipeline. I am going to reupload the files this week and take your advice about processing them in batches. Would you mind providing the script (and instructions on how to deploy it) to automate the rerun if any of my batches need it? I’m familiar with the command line/Python/R, but I’m not sure how to implement it on the front end of Galaxy since I don’t have administrator access. You’re my only hope, Obi-wan.

Thank you,
Brittani

Hi Brittani,

Yes, doing batches should make your life much easier.

For rerunning jobs:

You will need to install planemo and obtain your Galaxy API key from your Galaxy user preferences.

With those things in place, you can then run:

planemo rerun --invocation INVOCATION_ID --galaxy_url "https://usegalaxy.eu" --galaxy_user_key YOUR_API_KEY

from a terminal environment.

This command will discover any failed jobs associated with the indicated workflow run and put them back in the queue for another attempt.
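If you end up with several batch invocations to patch up, you can loop the same command over all of them. A sketch, where the invocation IDs are placeholders for the real IDs shown on your workflow invocations page, and the `echo` keeps it a dry run:

```shell
# Sketch: rerun the failed jobs of several workflow invocations in one go.
# The IDs below are placeholders -- copy the real ones from the
# workflow invocations list in Galaxy.
GALAXY_URL="https://usegalaxy.eu"
# Export your API key beforehand: export GALAXY_API_KEY=...
INVOCATIONS="0123456789abcdef fedcba9876543210"

count=0
for inv in $INVOCATIONS; do
  # 'echo' makes this a dry run; remove it to submit the reruns for real.
  echo planemo rerun --invocation "$inv" \
    --galaxy_url "$GALAXY_URL" \
    --galaxy_user_key "$GALAXY_API_KEY"
  count=$((count + 1))
done
echo "queued reruns for $count invocations"
```

Keeping the API key in an environment variable rather than in the script itself also makes it harder to leak it by accident (see the note below).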

Good luck and let me know if you need further help,

Wolfgang

Oh, and regarding the API key: never post it anywhere. It is as important to keep it private as your web login credentials, because anyone who knows it can act on your behalf.