Persistent grouping of sos_targets
input: 'a.txt', 'b.txt', group_by=1
will be considered as equivalent to
input: sos_targets('a.txt', 'b.txt', group_by=1)
which creates a sos_targets with two targets and two groups. The groups are accessible through the property groups, which is a list of sos_targets with no subgroups.
sos_targets will keep its grouping information when it is passed around. That is to say,
step_input will have groups that are essentially _input for substeps.
step_output will contain _output from each substep as its groups.
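As a rough illustration of what group_by=1 does to a flat list of targets, the grouping can be modeled in plain Python. This is a minimal sketch, not the sos_targets implementation; the helper name group_by_size is hypothetical and plain lists of file names stand in for sos_targets.

```python
def group_by_size(targets, size):
    """Split a flat list of targets into consecutive groups of `size` items.

    Hypothetical helper for illustration only; the last group may be shorter
    if len(targets) is not a multiple of `size`.
    """
    return [targets[i:i + size] for i in range(0, len(targets), size)]


targets = ['a.txt', 'b.txt']
groups = group_by_size(targets, 1)
print(groups)  # [['a.txt'], ['b.txt']]
```

In this model, each inner list plays the role of _input for one substep, and the outer list plays the role of the groups property.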
keyword arguments in input and output
Keyword arguments are used to specify the sources of targets.
input: name=targets
output: name=targets
Named input and output can be accessed by _input['name'] and _output['name'].
Implementation-wise,
input: name=targets
creates step_input as sos_targets(name=targets), which assigns sources of targets to name.
output_from(steps, **kwargs) to get output from other steps
Refers to the output of one or more steps. The parameter can be a step name or a number; a number refers to a step in the same workflow (output_from(10) from step_20 is equivalent to output_from('step_10')).
input: output_from('step')
input: output_from(1)
input: output_from([1, 2])
With named input and output, the syntax can be expanded to
input: ref=output_from('get_ref')['ref']
A special step name -1 as in
input: output_from(-1)
is reserved for the output of the previous step, and is only valid when used from a numerically indexed step.
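The way a name, a number, or -1 could be resolved to a full step name can be sketched as follows. The function resolve_step_ref and its parameters are hypothetical, and treating "previous step" as the largest smaller index in the same workflow is an assumption made for this sketch rather than something stated above.

```python
def resolve_step_ref(ref, workflow='step', current=None, known=()):
    """Turn an output_from-style reference into a full step name.

    ref      -- a step name (str) or a step number (int); -1 means "previous step"
    workflow -- name of the workflow the referring step belongs to
    current  -- numeric index of the referring step, or None if it has none
    known    -- numeric indices of all steps in the same workflow
    """
    if isinstance(ref, str):
        return ref  # names are used as given
    if ref == -1:
        if current is None:
            raise ValueError('-1 is only valid from a numerically indexed step')
        earlier = [i for i in known if i < current]
        if not earlier:
            raise ValueError('there is no previous step')
        # Assumption: "previous" means the closest smaller index.
        return f'{workflow}_{max(earlier)}'
    return f'{workflow}_{ref}'


print(resolve_step_ref(10, 'step', current=20, known=[10, 20]))  # step_10
print(resolve_step_ref(-1, 'step', current=20, known=[10, 20]))  # step_10
```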
Options group_by, paired_with, pattern, group_with, and for_each can be used to regroup or attach variables to the output. For example, group_by can be used to regroup the retrieved sos_targets,
input: output_from(10, group_by='all')
named_output('name', **kwargs) for data flow without step name
named_output('ref') in the following example refers to any step with ref in named output,
[A]
output: ref=targets
[B]
input: named_output('ref')
which has the same effect as output_from('A')['ref'] but does not require the step name to be specified.
Similar to output_from, parameters group_by, paired_with, pattern, group_with, and for_each can be used to regroup or attach variables to the retrieved targets.
Merging of multiple sos_targets
Multiple sos_targets can be specified in the input statement, either explicitly with sos_targets, or implicitly with output_from or named_output. In this case, targets and groups from the multiple sos_targets will be merged. sos_targets objects with different numbers of groups can be merged only if one of them has no group information or has a single group with all targets; that single group is then replicated for all groups before merging.
For example,
input: 'a.txt', 'b.txt', sos_targets('c.txt', 'd.txt', group_by=1)
will create a sos_targets with four targets 'a.txt', 'b.txt', 'c.txt', 'd.txt', and two groups
'a.txt', 'b.txt', 'c.txt'
'a.txt', 'b.txt', 'd.txt'
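The merging rule described above can be modeled with plain lists. This is a minimal sketch under the assumption that each operand is a list of groups and that an operand with a single group counts as "ungrouped"; merge_groups is a hypothetical name, not part of the sos API.

```python
def merge_groups(*group_lists):
    """Merge several lists of groups, replicating single-group operands.

    Each argument is a list of groups; each group is a list of target names.
    """
    # Collect the group counts of operands that really are grouped.
    sizes = {len(groups) for groups in group_lists if len(groups) > 1}
    if len(sizes) > 1:
        raise ValueError(f'cannot merge different numbers of groups: {sorted(sizes)}')
    n = sizes.pop() if sizes else 1
    merged = [[] for _ in range(n)]
    for groups in group_lists:
        # A single group is repeated once for every merged group.
        expanded = groups * n if len(groups) == 1 else groups
        for dest, src in zip(merged, expanded):
            dest.extend(src)
    return merged


ungrouped = [['a.txt', 'b.txt']]    # 'a.txt', 'b.txt' with no grouping
grouped = [['c.txt'], ['d.txt']]    # sos_targets('c.txt', 'd.txt', group_by=1)
print(merge_groups(ungrouped, grouped))
# [['a.txt', 'b.txt', 'c.txt'], ['a.txt', 'b.txt', 'd.txt']]
```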
The same rule applies to sos_targets created by output_from() or output_from(group_by). However, if a global group_by option is present, all individual groups will be overridden. That is to say,
input: 'a.txt', 'b.txt', output_from(10), group_by=1
will regroup all targets by 1, regardless of original grouping information from output_from(10).
set and get of attributes to sos targets
Two new functions are added: BaseTarget.set() and BaseTarget.get().
A dictionary is now associated with each BaseTarget and can be accessed with the .set() and .get() functions, or as attributes of the target. .set() is usually called automatically by parameters paired_with and group_with, but can be used directly. With
a = file_target('a.txt')
a.set('name', 'a')
it is usually easier to use
a.name
instead of
a.get('name')
but a.get('name', default=None) will return a default value instead of raising an AttributeError if name does not exist, which can be safer in some cases.
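The difference between attribute access and .get() can be illustrated with a small stand-in class. This is a sketch only: Target below is hypothetical and merely mimics the behavior described above; it is not the BaseTarget implementation.

```python
class Target:
    """Hypothetical stand-in for a target that carries a dictionary of attributes."""

    def __init__(self, path):
        self._path = path
        self._dict = {}

    def set(self, key, value):
        self._dict[key] = value
        return self

    def get(self, key, default=None):
        # Never raises; falls back to `default` for unknown keys.
        return self._dict.get(key, default)

    def __getattr__(self, key):
        # Called only when normal attribute lookup fails.
        try:
            return self.__dict__['_dict'][key]
        except KeyError:
            raise AttributeError(key) from None


a = Target('a.txt')
a.set('name', 'a')
print(a.name)                  # a
print(a.get('name'))           # a
print(a.get('missing'))        # None
print(hasattr(a, 'missing'))   # False, because a.missing raises AttributeError
```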
Changes to parameters paired_with, group_with and for_each
In addition to variables set to the global namespace, the paired values are written to _input as target or group properties. That is to say, with
sample = ['A', 'B']
files = ['a1', 'a2', 'a3', 'a4']
input: 'a1.txt', 'a2.txt', 'b1.txt', 'b2.txt', group_by=2,
paired_with='files', group_with='sample', for_each=dict(i=range(5))
you can access _sample, _files, and i both directly and as properties of _input and its targets:
_input[0]._files
_input._sample
_input.i
So that
sample = ['A', 'B']
files = ['a1', 'a2', 'a3', 'a4']
input: 'a1.txt', 'a2.txt', 'b1.txt', 'b2.txt', group_by=2,
paired_with='files', group_with='sample', for_each=dict(i=range(5))
print(f'_input={_input}, _files={_files}, _sample={_sample}, i={i}')
print(f'_input[0]._files={_input[0]._files}, _input._sample={_input._sample}, _input.i={_input.i}')
would produce:
_input=a1.txt a2.txt, _files=['a1', 'a2'], _sample=A, i=0
_input[0]._files=a1, _input._sample=A, _input.i=0
_input=b1.txt b2.txt, _files=['a3', 'a4'], _sample=B, i=0
_input[0]._files=a3, _input._sample=B, _input.i=0
...
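How paired_with and group_with distribute their values in the example above can be sketched in plain Python. The helper attach is hypothetical and leaves out for_each; it only models the per-target (paired_with) and per-group (group_with) assignment for groups of a fixed size.

```python
def attach(targets, group_size, paired, grouped):
    """Return one record per group with its per-target and per-group values.

    paired  -- one value per target (models paired_with)
    grouped -- one value per group (models group_with)
    """
    starts = list(range(0, len(targets), group_size))
    if len(paired) != len(targets):
        raise ValueError('paired values must match the number of targets')
    if len(grouped) != len(starts):
        raise ValueError('grouped values must match the number of groups')
    records = []
    for index, start in enumerate(starts):
        stop = start + group_size
        records.append({
            '_input': targets[start:stop],
            '_files': paired[start:stop],   # slice follows the targets
            '_sample': grouped[index],      # one value for the whole group
        })
    return records


sample = ['A', 'B']
files = ['a1', 'a2', 'a3', 'a4']
targets = ['a1.txt', 'a2.txt', 'b1.txt', 'b2.txt']

for record in attach(targets, 2, files, sample):
    print(record)
# {'_input': ['a1.txt', 'a2.txt'], '_files': ['a1', 'a2'], '_sample': 'A'}
# {'_input': ['b1.txt', 'b2.txt'], '_files': ['a3', 'a4'], '_sample': 'B'}
```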