Description
This topic has been discussed before, but perhaps not in the same context. I've got a couple of workflow steps like this:
[step]
input: '/path/to/a/single/file.gz', for_each = 'chroms', concurrent = True
output: dynamic(glob.glob(f'{cwd}/{y_data:bnn}/chr*/*.rds'))

[another_step]
input: glob.glob(f'{cwd}/{y_data:bnn}/chr*/*.rds'), group_by = 1, concurrent = True
output: dynamic(glob.glob(f'{cwd}/{y_data:bnn}/SuSiE_CS_*/*.rds'))
R: expand = "${ }"
I run them in two separate, sequential SoS commands:
sos run step
sos run another_step
You see, the first step takes a single file, file.gz, pairs it with different chroms, and then creates many small .rds files as dynamic output. The actual number of output files at the end of the pipeline is:
>>> len(glob.glob('chr*/*.rds'))
43601
Now when I run the 2nd step, it got stuck in the single SoS process that prepares the run. Ten minutes in (I started writing this post 5 minutes ago), it is still working on it ... not yet analyzing the data.
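To get a rough sense of whether plain filesystem work alone could explain the delay, here is the kind of back-of-the-envelope check I have in mind. This is only my guess that per-file metadata/checksum work during preparation is the bottleneck; I have not confirmed what SoS actually does internally, and this snippet is a standalone diagnostic, not part of the workflow:

# Hypothetical diagnostic: time how long it takes merely to glob
# and checksum the ~43K small .rds files from the first step.
import glob
import hashlib
import os
import time

t0 = time.time()
files = glob.glob('chr*/*.rds')
print(f'glob found {len(files)} files in {time.time() - t0:.1f}s')

t0 = time.time()
for f in files:
    os.stat(f)                              # per-file metadata lookup
    with open(f, 'rb') as fh:
        hashlib.md5(fh.read()).hexdigest()  # per-file content hash
print(f'stat + md5 of all files: {time.time() - t0:.1f}s')

If this alone already takes minutes on my filesystem, then any per-file bookkeeping the preparation phase does would plausibly scale the same way.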
~43K files does not sound like a big deal, right? But this is indeed the first time I have used the dynamic output of a previous step as the input of the next step, in separate commands. I am wondering what might be going on in this context, and whether we can do something about it.
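In case the sheer number of substeps turns out to be the problem, one workaround I am considering (not sure it is the idiomatic fix) is to batch the inputs so the second step creates far fewer substeps: if I understand group_by correctly, an integer groups the input into chunks of that size, so raising it from 1 to, say, 500 (an arbitrary number) would cut ~43K substeps down to roughly 90. The R code would then have to loop over the files in ${_input}:

[another_step]
# same inputs, but grouped into chunks of 500 files per substep,
# so roughly 90 substeps are created instead of ~43K
input: glob.glob(f'{cwd}/{y_data:bnn}/chr*/*.rds'), group_by = 500, concurrent = True
output: dynamic(glob.glob(f'{cwd}/{y_data:bnn}/SuSiE_CS_*/*.rds'))
R: expand = "${ }"
  # inside the R block, iterate over the batch of files in ${_input}

Of course this only helps if the slowness comes from per-substep overhead rather than from scanning the files themselves.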