
SoS performance dealing with a large number of files #874

Description

@gaow

This bothers me when I do some very simple simulations:

[1]
output: [f"performance_test/{x+1}.out" for x in range(500)]
run:
  touch performance_test/{1..500}.out

[2]
r = [x for x in range(500)]
input: group_by = 1, paired_with = 'r'
output: [f"performance_test/{x+1}.rds" for x in range(500)], group_by = 1
task: concurrent = True
R: expand = '${ }'
  x = rnorm(${_r[0]})
  saveRDS(x, ${_output:r})

If you run this script, you'll see it halt for a second or two at the end of every batch of completed jobs. I can understand that things like signature checks are going on, but as a result a simple simulation that takes < 10 sec as a for loop can take > 700 sec with SoS -- the overhead takes far longer than the actual computation. I remember it used to be 10 sec vs > 100 sec before last summer. I guess that as the signature check has become more strict and careful about race conditions, the whole process has gotten a lot slower. Is there still room for optimization?
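For reference, the for-loop baseline mentioned above amounts to something like the following plain R script (a minimal sketch, not part of the original report; it performs the same 500 draws and saveRDS calls without any workflow engine):

# Baseline for comparison: the same 500 simulations as a plain R loop,
# with none of SoS's signature checking or task management.
dir.create("performance_test", showWarnings = FALSE)
for (r in 0:499) {
  x <- rnorm(r)   # same draw as the SoS step, where _r[0] takes the value r
  saveRDS(x, sprintf("performance_test/%d.rds", r + 1))
}

If signature checking is indeed the bottleneck, one way to test that hypothesis (assuming the -s signature-mode option of the installed SoS version) would be to time the workflow with signatures disabled, e.g. sos run script.sos -s ignore, and compare against the default run.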
