Conversation
All steps chose the worker serially by acquiring the same lock. The lock is acquired in the task, put and get steps, each implemented in the relative Run method. The lock is released by the corresponding method that creates the Container in the database just after that's done: either `worker.FindOrCreateContainer` or `fetcher.Fetch`. Resource_scanner, resource_type_scanner and image_resource_fetcher do not implement the locking mechanism thus their processing is independent. The unit tests are adjusted for the addition of the lock. Signed-off-by: Alessandro Degano <[email protected]>
fb29a79 to
6c73810
Compare
ddadlani
left a comment
There was a problem hiding this comment.
Hi @aledeganopix4d
Thanks for working on fixing #3301!
My biggest concern with this implementation is that all steps are effectively serialized. Some get or task steps could take quite a while and holding the lock for this time would mean that nothing else would get scheduled while a step is running, which seems too restrictive.
For the use case of #4076, I think it would be sufficient to have a TaskStepLock that is only required with that feature. What are your thoughts?
We also have some incoming refactors (such as #4109) that will break this locking mechanism because we are trying to centralize container creation in one place (to enable things like scheduling and multiple runtimes).
Would it be sufficient for your use case to limit the lock to #4076?
| versionedSource, err := f.fetchWithLock(ctx, logger, source, imageFetchingDelegate.Stdout()) | ||
| if containerLock != nil { | ||
| containerLock.Release() | ||
| } |
There was a problem hiding this comment.
the lock would need to be released after the second call to fetchWithLock() on line 86 since container creation could occur there, but this is susceptible to deadlock because we have 2 locks being held/released here.
| s.session.Metadata, | ||
| s.containerSpec, | ||
| s.resourceTypes, | ||
| nil, |
There was a problem hiding this comment.
We should be passing the lock in here since we could be creating a db.CreatingContainer inside FindOrCreateContainer.
Hi!
I've noticed #4109 and I was indeed wondering how and when that refactor would impact my work with this PR and #4076. But my understanding is that I can keep working on the current HEAD as that rework might take a while?
Yes, in that case, probably, it's better if I create a single PR, that introduces |
|
Hi @aledeganopix4d, We are planning to merge #4109 into master in the next couple of days. The only effect it should have on your PR is where the "active_tasks" number is incremented. In fact, you could likely move the calls to IncrementActiveTasks and DecrementActiveTasks and the new locking logic, into What do you think? |
Yes, I agree |
That's perfect, thanks! |
|
Superseded by #4118 |
This is my tentative at fixing #3301. Taking a pragmatic way and implementing a lock wherever needed.
All steps chose the worker serially by acquiring the same lock.
The lock is acquired in the task, put and get steps,
each implemented in the relative Run method.
The lock is released by the corresponding method that creates
the Container in the database just after that's done:
either
worker.FindOrCreateContainerorfetcher.Fetch.Resource_scanner, resource_type_scanner and image_resource_fetcher
do not implement the locking mechanism thus their processing is independent.
The unit tests are adjusted for the addition of the lock.
New test cases are probably required but I'd like to have feedback on the general direction I've taken
before working more on this.
@ddadlani @deniseyu @kcmannem
Signed-off-by: Alessandro Degano [email protected]