Implement new placement strategy: 'limit-active-tasks' #4118
nader-ziada merged 4 commits into concourse:master
Conversation
Force-pushed b10e607 to 4ef6c31
I like the change. Thanks for the help 👍
idk how everyone feels about this but it might be confusing to differentiate
I don't have any strong opinion on the naming, just keep in mind that |
In any case I've followed your suggestion and the strategy is now called `limit-active-tasks`.
Force-pushed 1966878 to e396d1b
xtreme-sameer-vohra left a comment
Hey @aledeganopix4d
Looks good. We added a few comments.
There are some scenarios where the decrementCounter might not be executed (i.e. the ATC is restarted). In these scenarios, the worker would have to be retired and re-registered to reset the counter. It would be worth documenting this so users are aware of these edge cases.
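The edge case is easy to see in miniature: the decrement runs in a defer, so it only executes if the process itself survives. A toy sketch (the type and method names here are hypothetical, loosely modeled on the increase/decrease counter helpers this PR adds):

```go
package main

import "fmt"

// Toy in-memory stand-in for the db.Worker active-tasks counter.
type dbWorker struct{ activeTasks int }

func (w *dbWorker) IncreaseActiveTasks() { w.activeTasks++ }
func (w *dbWorker) DecreaseActiveTasks() { w.activeTasks-- }

// runTask bumps the counter, runs the task, and decrements on the way out.
// The deferred decrement only runs if this process survives: if the ATC is
// killed mid-task, the counter stays inflated until the worker is retired
// and re-registered, which is the edge case discussed above.
func runTask(w *dbWorker, task func()) {
	w.IncreaseActiveTasks()
	defer w.DecreaseActiveTasks() // skipped entirely if the process dies here
	task()
}

func main() {
	w := &dbWorker{}
	runTask(w, func() { fmt.Println("active tasks during run:", w.activeTasks) })
	fmt.Println("active tasks after clean run:", w.activeTasks)
}
```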
atc/engine/builder/step_factory.go (Outdated)

```go
defaultLimits   atc.ContainerLimits
strategy        worker.ContainerPlacementStrategy
resourceFactory resource.ResourceFactory
lockFactory     lock.LockFactory
```
It seems like this lockFactory isn't used, as StepFactory.TaskStep also has a lockFactory in its signature.
It would be preferable to not modify the interface and instead utilize the lockFactory in the stepFactory struct.
Well spotted, thanks!
I've now removed it!
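The suggestion amounts to reading the dependency from the struct rather than widening the method signature. A minimal sketch with hypothetical, trimmed-down types (the real ones live in atc/engine/builder and the lock package):

```go
package main

import "fmt"

// Stand-in for lock.LockFactory.
type lockFactory struct{}

// The StepFactory interface stays as it was: TaskStep does not take a
// lockFactory parameter.
type stepFactory interface {
	TaskStep(plan string) string
}

// builderStepFactory already holds a lockFactory, so TaskStep can use the
// struct field instead of threading one through the interface.
type builderStepFactory struct {
	lockFactory lockFactory
}

func (f *builderStepFactory) TaskStep(plan string) string {
	_ = f.lockFactory // would be handed to the task step here
	return fmt.Sprintf("task step for %q", plan)
}

func main() {
	var sf stepFactory = &builderStepFactory{}
	fmt.Println(sf.TaskStep("build"))
}
```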
```go
}

if step.strategy.ModifiesActiveTasks() {
	if chosenWorker == nil {
```
Should we differentiate the two potential conditions:
- there are no workers in the pool
- the workers in the pool are busy

In the first case, we might want to bubble the error up to the user.
In the latter case, Concourse can wait until a worker is free to take on the work.
And lastly, is there a max time we would want Concourse to keep trying before giving up and bubbling the error up?
About the first comment:
That's already taken care of as is, because pool.allSatisfying already returns an error when the pool is empty, so that behavior is unchanged.
About the timeout: I've thought about it. If we implement it, it should be customizable, and maybe that's the right approach eventually, but for now I really wanted to keep the PR as simple as possible. We can refine the feature incrementally afterward.
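The two conditions from the review can be told apart by the return shape of the selection function. A sketch under assumed names (errNoWorkers, chooseWorker, and maxActiveTasksPerWorker are illustrative, not the PR's actual identifiers):

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var errNoWorkers = errors.New("no workers in the pool")

type worker struct {
	name        string
	activeTasks int
}

const maxActiveTasksPerWorker = 2

// chooseWorker distinguishes the two cases: an empty pool is a hard error
// bubbled up to the user, while an all-busy pool returns (nil, nil) so the
// caller can wait and retry.
func chooseWorker(pool []*worker) (*worker, error) {
	if len(pool) == 0 {
		return nil, errNoWorkers
	}
	var best *worker
	for _, w := range pool {
		if w.activeTasks >= maxActiveTasksPerWorker {
			continue // at capacity, skip
		}
		if best == nil || w.activeTasks < best.activeTasks {
			best = w
		}
	}
	return best, nil // nil means "all busy, try again later"
}

func main() {
	pool := []*worker{{"w1", 2}, {"w2", 1}}
	for {
		w, err := chooseWorker(pool)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		if w != nil {
			fmt.Println("placing task on", w.name)
			return
		}
		time.Sleep(100 * time.Millisecond) // no max retry time yet, as discussed
	}
}
```

As discussed in the thread, the retry loop has no upper bound here; a configurable timeout could be layered on later.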
Thx @aledeganopix4d
Add active_tasks counter to db Worker. Add db migrations to add the new column. Parametrize max-build-tasks-per-worker from command line argument. Create TaskStep lock. Pass lockFactory to task step. Add locking around task worker choosing. Signed-off-by: Alessandro Degano <[email protected]>
No new tests yet introduced. Signed-off-by: Alessandro Degano <[email protected]>
- placement: FewestActiveTasksPlacementStrategy
- task_step: Increase and Decrease active tasks in fakeWorker
- db.Worker: Increase and Decrease active tasks in DB
Signed-off-by: Alessandro Degano <[email protected]>
Rename ModifyActiveTasks() to ModifiesActiveTasks(). Signed-off-by: Alessandro Degano <[email protected]>
Force-pushed e396d1b to ab15b63
Sure, good idea, would you know where the best place would be to document this?
Hey @aledeganopix4d And if you're so kind, you can submit a PR for Thanks :)
Hello @xtreme-sameer-vohra,
nader-ziada left a comment
Thanks @aledeganopix4d for the PR and the followup changes!
This is a follow-up of concourse#4118, which introduced the new placement strategy: `limit-active-tasks`. To improve the user experience, this PR outputs to the UI when the task is waiting for a worker to be free, warning the user that the system is at full capacity and the task will wait for a worker to free up. Once a worker is available and the task starts, the interface informs the user that the task can start and how long it has been waiting. Signed-off-by: Alessandro Degano <[email protected]>
Fix conflicts caused by PR #4118 - addition of the `limit-active-tasks` container placement strategy
#4118 #4148 #4208 #4277 #4142 #4221 #4293 Signed-off-by: James Thomson <[email protected]> Co-authored-by: Jamie Klassen <[email protected]>
This commit adds the new parameters that were added to Concourse 5.5. Here's a breakdown of the new parameters:
- max-active-tasks-per-worker: used by the `limit-active-tasks` container placement strategy (concourse/concourse#4118)
- support for influxdb batching and a bigger buffer size for metrics emissions (concourse/concourse#3937)
- limiting the number of max connections in db conn pools (concourse/concourse#4232)
Signed-off-by: Ciro S. Costa <[email protected]> Co-authored-by: Zoe Tian <[email protected]>

This PR proposes the addition of a new placement strategy. It introduces the concept of an active task, that is, a task that is effectively running on a worker (as opposed, for instance, to build containers that can stay on the worker long after the task is finished).

A new `active_tasks` counter is added to the db Worker, and the responsibility of increasing/decreasing the counter lies in the `task_step`. The placement strategy considers the number of active tasks on each compatible worker and assigns any step to the worker with the fewest of them. Additionally, a parameter `MaxActiveTasksPerWorker` can be defined; in that case, workers that already have that number of active tasks will not be selected for task placement. If no worker is available with fewer than `MaxActiveTasksPerWorker` active tasks, the task will simply wait until one is free.

The worker selection in the `task_step` is serialized through a lock to prevent races where different tasks could land on the same worker.

Note that other steps are not restrained by `MaxActiveTasksPerWorker` and will simply choose the worker with the fewest active tasks (thus `put`, `get`, `check`, etc. will never be blocked).

The PR is split in 3 commits to ease review, including one that adds the `MaxActiveTasksPerWorker` parameter.

This PR supersedes #4076 and, hopefully, implements all the suggestions proposed in that discussion.
@ddadlani