Fix existing tests. Implement maxTasks == 0 that falls back to the normal fewest-build-containers strategy. Signed-off-by: Alessandro Degano <[email protected]>
ddadlani left a comment
Hi @aledeganopix4d, thanks for working on this. I do have a few questions/concerns though.
I'm a bit concerned that the number of "active tasks" is being tracked separately i.e. we aren't using feedback from the worker to set this number. For example, if you restart the worker, there will be no running containers but the database will still have the old "active tasks" number.
I also didn't realize we would only be tracking tasks, I was under the impression we'd be building off of the build_containers number that is based off of the actual containers placed on the worker. We have the BuildContainersCountPerWorker method in the db_worker_factory that can be used.
Have I misunderstood the use case behind this PR? My understanding was that the workers would get overloaded due to "heavy" containers being placed together on them.
atc/atccmd/command.go
Outdated
ResourceTypeCheckingInterval time.Duration `long:"resource-type-checking-interval" default:"1m" description:"Interval on which to check for new versions of resource types."`

ContainerPlacementStrategy string `long:"container-placement-strategy" default:"volume-locality" choice:"volume-locality" choice:"random" choice:"fewest-build-containers" description:"Method by which a worker is selected during container placement."`
MaxBuildTasksWorker int `long:"max-build-tasks-worker" default:"0" description:"Maximum allowed number of active build tasks per worker. Has effect only when used with fewest-build-containers placement strategy. 0 means no limit."`
Hi @aledeganopix4d, I thought we were going to limit the max build containers per worker? I didn't realize the limit was only for tasks.
If we limit max-build-containers-per-worker, we could leverage the existing build_containers column in the workers table.
It is better explained in the comments, but just for reference: while creating this PR I realized that build_containers is not a reliable number to base this kind of scheduling on, as containers stay around long after the worker has finished processing them.
Hey @aledeganopix4d
To better understand the issue with not being able to use build_containers and tracking only task containers: do you mean resource get & put containers are staying around much longer after a build is completed?
Hi @xtreme-sameer-vohra, no, that's not the case.
build_containers and task containers are actually the same thing, i.e. the containers where tasks are executed. However, the lifetime of a build_container is longer than the time the worker is actively running the task, because the build_container will stay on the worker until it is garbage collected, which happens:
- After ~1 minute if the task was successful
- After 1 hour if the task errored or failed
While the first case wouldn't be that bad, the second really throws a wrench into any scheduling system that relies on counting the build_containers to limit the amount of tasks on a worker.
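Tracking an explicit active-task count sidesteps this GC-lag problem. As a minimal sketch (hypothetical types and names, not the PR's actual code), a fewest-active-tasks selection that honors a per-worker limit, where 0 means no limit, could look like:

```go
package main

import "fmt"

// Worker is a hypothetical stand-in for Concourse's worker record;
// ActiveTasks mirrors the new active_tasks column proposed in this PR.
type Worker struct {
	Name        string
	ActiveTasks int
}

// chooseWorker returns the eligible worker with the fewest active tasks.
// maxTasks == 0 disables the limit, falling back to plain fewest-first.
// It returns nil when every worker is at the limit (the caller would retry).
func chooseWorker(workers []Worker, maxTasks int) *Worker {
	var best *Worker
	for i := range workers {
		w := &workers[i]
		if maxTasks > 0 && w.ActiveTasks >= maxTasks {
			continue // worker is saturated, skip it
		}
		if best == nil || w.ActiveTasks < best.ActiveTasks {
			best = w
		}
	}
	return best
}

func main() {
	workers := []Worker{{"w1", 3}, {"w2", 1}, {"w3", 2}}
	if w := chooseWorker(workers, 2); w != nil {
		fmt.Println(w.Name) // prints "w2": the only worker under the limit
	}
}
```

Unlike counting build_containers, the count here only reflects tasks a worker is actively running, so a task that errored an hour ago no longer blocks placement.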
Thanks for the clarification @aledeganopix4d . I understand what you mean now.
Internally within Concourse, the hierarchy is:
- check containers
- build containers
  - get
  - put
  - task (successful & error'ed)
  - hook containers (on_failure, on_success, etc.)
Adding workload limiting at the build container level, prior to adding it at the sub-task level, allows Concourse as a system to remain general purpose while still providing knobs to be optimal in specific cases.
Just a quick clarification, build_containers include get, put and task containers, basically just excluding check containers which are assumed to be fairly lightweight.
Ah! Sorry, that was indeed a misunderstanding on my part!
atc/atccmd/command.go
Outdated
ResourceTypeCheckingInterval time.Duration `long:"resource-type-checking-interval" default:"1m" description:"Interval on which to check for new versions of resource types."`

ContainerPlacementStrategy string `long:"container-placement-strategy" default:"volume-locality" choice:"volume-locality" choice:"random" choice:"fewest-build-containers" description:"Method by which a worker is selected during container placement."`
MaxBuildTasksWorker int `long:"max-build-tasks-worker" default:"0" description:"Maximum allowed number of active build tasks per worker. Has effect only when used with fewest-build-containers placement strategy. 0 means no limit."`
Would it make more sense to have each worker specify its own limit (when connecting to the web nodes?) That way if you have workers with separate capacities, they could utilize their full capacity.
Yes! That's absolutely the end goal, for the time being I tried to keep the PR as minimal as possible.
atc/worker/pool.go
Outdated
	return nil, err
}

if metadata.Type == db.ContainerTypeTask {
The containerMetadata.Type field is not guaranteed to be accurate, it exists solely for metrics / output of fly containers. I'm not sure we should rely on it for business logic.
Oh, okay, thanks for pointing that out.
I'll find a more reliable place to increase the number of active tasks then!
Hello @ddadlani and first of all thank you very much for your feedback!
About the restart of the workers: you are totally right, and in that case I should reset the counter.
…ent the active tasks count. Remove check on container metadata. Signed-off-by: Alessandro Degano <[email protected]>
@ddadlani following your comments I've updated the PR with the following changes:
Hey @aledeganopix4d, joining the party late :) but a few thoughts on looking at this PR:
Hi @xtreme-sameer-vohra, @aledeganopix4d,
This is one of the reasons I preferred the use of … I'm also a bit worried that tracking …
This is a very valid concern, but it brings up issues with this kind of scheduling optimization in general. We don't account for the state of the containers on a worker (e.g. creating, created, destroying), but we can't fully, because one can still hit Garden's container limit of 250 regardless of what state the containers on a worker are in. If we don't actually measure the resource usage of each container, we're just making guesses as to which kinds of containers are most likely to bring down a worker. I can see the argument that if we have …
Hi @ddadlani and again thank you for taking the time to engage in the discussion.
This is a totally valid point. I would argue that since the worker resources typically engaged in …
I understand your point, and I think this is a discussion that goes well beyond the scope of this PR: in my experience as a CI operator (on multiple systems, not only Concourse), the observation is that if you wait to see which resources a running task is using, then it's already too late for scheduling, as you would need to rebalance the system in flight (e.g. moving containers to workers with a lighter load). I think it's totally reasonable to introduce …
Hi @xtreme-sameer-vohra thank you for taking the time to leave some feedback!
I was thinking that maybe we could have a …

In its current state the PR does not take this into account, true, but I think it would be quite easy to increment the active tasks on a worker when a container on it gets hijacked. Similarly, when …

You're right, I hadn't considered this. The problem is that it's hard to know: …

I'm not sure how this could be addressed.
Hi @aledeganopix4d,
I think you hit the nail on the head here. The most useful distinction to make would be "active" build containers vs "inactive" (read: destroying, initializing, errored, etc.). This would be the most general answer to the race problem that occurs if we naively limit the number of build containers on the worker. The problem is that there is no direct correlation in the runtime between a step's state and its container's state. To me, this is a problem that requires more design, and that makes this issue not as easy to resolve as we originally thought. I'm hesitant to have separate limits for active tasks, active gets and active puts because: …

This leads me to think that this is not a straightforward stop-gap that can easily be implemented before the scheduling queue. Does that make sense? What are your thoughts on it?
I agree, and I think it's relatively easy to expand on the idea I've proposed here: when we run steps (regardless of whether they are task, put or get steps) we can mark the db Container as "running", and then as "completed" when the same steps complete (regardless of how they complete).

Agreed. I'm thinking that what I've written above could address these two points.

I understand that, however we tackle this task, it will require more workarounds. Moreover, it also depends on how we fix #3301, as whatever serializing mechanism we use there will be used here too (btw, I've given a shot to fixing that here: #4108).
ddadlani left a comment
Hey @aledeganopix4d,
Thanks for working on this feature! I still have a few reservations around ways that the number of active tasks may become inconsistent, but based on our discussion and with the changes I have suggested, it should be good for a v1 of this feature. We can document this feature as experimental and allow users to opt in, to minimize impact on general users.
Also regarding PR #4108 that adds in the container creating lock. Could you add a lock in this PR that only locks around running a task step? I understand that you originally wrote #4076 with the assumption that the ContainerCreatingLock would be there, but I don't think we need to lock around all kinds of container creation for this PR. We could add in a TaskStepLock or something similar to ensure that active_tasks for each worker is accurate. This new lock could also be opt-in, i.e. we don't try to serialize the task step if the user has not set max-active-tasks-worker.
I will leave my thoughts on #4108 on that PR.
Last request, can we also get some more test coverage for this feature?
Let me know if you have any questions!
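The suggested opt-in lock could be sketched roughly as below. This is a hypothetical illustration (taskLimiter and its methods are invented names): the check of the current count against the limit and the increment happen under one lock, so two task steps cannot both squeeze past the limit, and the whole mechanism is bypassed when max-active-tasks-worker is unset. A real implementation would use Concourse's db-backed lock rather than an in-process mutex.

```go
package main

import (
	"fmt"
	"sync"
)

// taskLimiter sketches an opt-in TaskStepLock: check-and-increment is
// atomic, so the per-worker active task count stays accurate.
type taskLimiter struct {
	mu       sync.Mutex
	active   int
	maxTasks int // 0 means the feature is not enabled: never block
}

// tryAcquire reports whether a task step may start now, taking a slot if so.
func (l *taskLimiter) tryAcquire() bool {
	if l.maxTasks == 0 {
		return true // user did not set max-active-tasks-worker
	}
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.active >= l.maxTasks {
		return false // worker is at its limit; caller retries later
	}
	l.active++
	return true
}

// release frees a slot when the task step finishes.
func (l *taskLimiter) release() {
	if l.maxTasks == 0 {
		return
	}
	l.mu.Lock()
	l.active--
	l.mu.Unlock()
}

func main() {
	l := &taskLimiter{maxTasks: 1}
	fmt.Println(l.tryAcquire()) // true: slot taken
	fmt.Println(l.tryAcquire()) // false: worker at limit
	l.release()
	fmt.Println(l.tryAcquire()) // true again
}
```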
	workerVersion = &atcWorker.Version
}

activeTasks := 0 // When a worker starts up, by definition it has no active tasks.
When calling saveWorker, it is possible for the worker to already exist in the DB. This can happen in a few ways:
- when there is a network blip and the worker is marked as stalled and then re-registers
- when the worker rebalances to another ATC after some time.
In these cases, we would expect that active_tasks should remain the same as they were previously. active_tasks should only be set to 0 for new/recreated workers.
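The upsert behavior ddadlani describes can be modeled with a small in-memory stand-in for the workers table (registry and saveWorker are illustrative names, not the PR's actual code): active_tasks is initialized to 0 only for genuinely new workers, and preserved when an existing worker re-registers after a stall or ATC rebalance.

```go
package main

import "fmt"

// registry is an in-memory stand-in for the workers table, used only
// to illustrate the desired upsert semantics for active_tasks.
type registry struct {
	activeTasks map[string]int
}

// saveWorker registers a worker and returns its active_tasks count.
// Re-registration keeps the previous count; new workers start at 0.
func (r *registry) saveWorker(name string) int {
	if n, ok := r.activeTasks[name]; ok {
		return n // existing worker (e.g. stalled, then re-registered)
	}
	r.activeTasks[name] = 0 // new worker: by definition no active tasks
	return 0
}

func main() {
	r := &registry{activeTasks: map[string]int{}}
	r.saveWorker("w1")              // new worker starts at 0
	r.activeTasks["w1"] = 4         // four tasks start on it
	fmt.Println(r.saveWorker("w1")) // prints 4: re-registration preserves the count
}
```

In the actual database this would correspond to an upsert that leaves the active_tasks column untouched when the worker row already exists.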
go func() {
	processStatus, processErr = process.Wait()
	logger.Info("Decreasing active tasks on worker.")
We would also want to decrement the number of active tasks in here
If the task was scheduled on a worker, but then the ATC died and another ATC picked up the task, it would go through this flow. If the task has already completed on the worker, we want to decrement the number of tasks to reflect reality.
ContainerPlacementStrategy string `long:"container-placement-strategy" default:"volume-locality" choice:"volume-locality" choice:"random" choice:"fewest-build-containers" choice:"fewest-active-tasks" description:"Method by which a worker is selected during container placement."`
MaxActiveTasksWorker int `long:"max-active-tasks-worker" default:"0" description:"Maximum allowed number of active build tasks per worker. Has effect only when used with fewest-active-tasks placement strategy. 0 means no limit."`
minor nitpick: can we call this MaxActiveTasksPerWorker? :) I don't think the command-line flag needs to change though.
I agree with all your comments and requests.
Superseded by #4118
This is a POC to fix #2928.
It relies on the locking mechanism introduced by #3902 (which has been reverted but I'm hoping that will be re-merged fixed :D ).
Introduce a new column in the workers db table to store the number of active tasks in each worker.
Every time a new task starts on a worker the counter is incremented (inside the lock!) and when the task exits the counter is decremented (still in the lock).
When using the fewest-build-containers strategy, we then select only workers whose number of active tasks is below the configured limit (note: the parameter in this draft is just hardcoded to 1).

This PR still needs some refinement, but it works on my small test setup.
I would like some feedback to understand if I'm going in the right direction.
@vito @ddadlani @mhuangpivotal