
optimize shim lock in exec to avoid exec hang #2820

Closed
lifubang wants to merge 2 commits into containerd:master from lifubang:execshimlog

Conversation

@lifubang
Contributor

@lifubang lifubang commented Nov 21, 2018

The current shim locking in Exec may (with small probability) allow two exec operations with the same exec id to enter the process.Exec section concurrently.
Please see:

func (s *Service) Exec(ctx context.Context, r *shimapi.ExecProcessRequest) (*ptypes.Empty, error) {
	s.mu.Lock()
	if p := s.processes[r.ID]; p != nil {
		s.mu.Unlock()
		return nil, errdefs.ToGRPCf(errdefs.ErrAlreadyExists, "id %s", r.ID)
	}
	p := s.processes[s.id]
	s.mu.Unlock()
	if p == nil {
		return nil, errdefs.ToGRPCf(errdefs.ErrFailedPrecondition, "container must be created")
	}
	process, err := p.(*proc.Init).Exec(ctx, s.config.Path, &proc.ExecConfig{
		ID:       r.ID,
		Terminal: r.Terminal,
		Stdin:    r.Stdin,
		Stdout:   r.Stdout,
		Stderr:   r.Stderr,
		Spec:     r.Spec,
	})
	if err != nil {
		return nil, errdefs.ToGRPC(err)
	}
	s.mu.Lock()
	s.processes[r.ID] = process
	s.mu.Unlock()
	return empty, nil
}

They can both enter here:
process, err := p.(*proc.Init).Exec(ctx, s.config.Path, &proc.ExecConfig{

Because the exec id has not been added to s.processes:
s.processes[r.ID] = process
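The window can be reproduced deterministically with a standalone sketch (the type and method names here are illustrative, not the real containerd types; `pause` stands in for the long-running `p.(*proc.Init).Exec` call):

```go
package main

import (
	"fmt"
	"sync"
)

// service mimics the shim's process table; names are illustrative only.
type service struct {
	mu        sync.Mutex
	processes map[string]bool
	pause     func() // stands in for the long p.(*proc.Init).Exec call
}

// exec copies the locking pattern quoted above: check under the lock,
// release it for the slow work, then re-acquire to store the result.
func (s *service) exec(id string) error {
	s.mu.Lock()
	if s.processes[id] {
		s.mu.Unlock()
		return fmt.Errorf("id %s already exists", id)
	}
	s.mu.Unlock()

	s.pause() // window where a second exec with the same id can slip in

	s.mu.Lock()
	s.processes[id] = true
	s.mu.Unlock()
	return nil
}

func main() {
	entered := make(chan struct{})
	release := make(chan struct{})
	s := &service{
		processes: map[string]bool{},
		pause: func() {
			entered <- struct{}{} // report: duplicate check passed
			<-release             // hold until both callers are inside
		},
	}

	var wg sync.WaitGroup
	errs := make([]error, 2)
	for i := range errs {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			errs[i] = s.exec("test")
		}(i)
	}
	<-entered
	<-entered      // both goroutines are now past the duplicate check
	close(release) // let both register the same id
	wg.Wait()
	fmt.Println(errs[0] == nil && errs[1] == nil) // prints true: both succeeded
}
```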

This causes one of the execs to hang on exit.
This one exits successfully:

root@dockerdemo:/opt/runctest/redis# ctr --address /run/docker/containerd/containerd.sock t exec --exec-id test -t redis1 sh
/data # ls /
bin    dev    home   media  proc   run    srv    tmp    var
data   etc    lib    mnt    root   sbin   sys    usr
/data # root@dockerdemo:/opt/runctest/redis#

This one can't exit:

root@dockerdemo:~# ctr --address /run/docker/containerd/containerd.sock t exec --exec-id test -t redis1 sh
/data # ls /
bin    dev    home   media  proc   run    srv    tmp    var
data   etc    lib    mnt    root   sbin   sys    usr
/data # 
^C

So I think the lock should cover the whole function: only one exec should succeed, and the other should return an error.

But this is not important for Docker, because every exec id in Docker is different.
If the maintainers feel it's not really important for other reasons, this can be ignored, since the probability is really very small.
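The whole-function lock proposed here would look roughly like this (a sketch with simplified types, not the actual patch; `run` stands in for the `p.(*proc.Init).Exec` call):

```go
package main

import (
	"fmt"
	"sync"
)

// service is a simplified stand-in for the shim service.
type service struct {
	mu        sync.Mutex
	processes map[string]bool
}

// exec holds s.mu for the entire call, so the duplicate-id check, the
// slow work, and the registration are atomic; the loser of a same-id
// race gets ErrAlreadyExists instead of hanging.
func (s *service) exec(id string, run func() error) error {
	s.mu.Lock()
	defer s.mu.Unlock() // held until the whole exec is done
	if s.processes[id] {
		return fmt.Errorf("id %s already exists", id)
	}
	if err := run(); err != nil {
		return err
	}
	s.processes[id] = true
	return nil
}

func main() {
	s := &service{processes: map[string]bool{}}
	noop := func() error { return nil }
	fmt.Println(s.exec("test", noop)) // <nil>
	fmt.Println(s.exec("test", noop)) // id test already exists
}
```

The obvious cost: while `run` executes, every other shim request that needs `s.mu` is blocked, which is only acceptable if Exec can never hang.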

@codecov-io

Codecov Report

Merging #2820 into master will not change coverage.
The diff coverage is n/a.


@@           Coverage Diff           @@
##           master    #2820   +/-   ##
=======================================
  Coverage   43.79%   43.79%           
=======================================
  Files         100      100           
  Lines       10741    10741           
=======================================
  Hits         4704     4704           
  Misses       5307     5307           
  Partials      730      730
Flag Coverage Δ
#linux 47.45% <ø> (ø) ⬆️
#windows 40.96% <ø> (ø) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 32aa0cd...2fe5c8d. Read the comment docs.

@lifubang
Contributor Author

lifubang commented Nov 21, 2018

Or just add a placeholder value to s.processes[r.ID] after the exec id check, before the s.mu.Unlock() at Line 256?

s.mu.Lock()
if p := s.processes[r.ID]; p != nil {
	s.mu.Unlock()
	return nil, errdefs.ToGRPCf(errdefs.ErrAlreadyExists, "id %s", r.ID)
}
p := s.processes[s.id]
s.mu.Unlock()

For example: s.processes[r.ID] = &execProcess{}, and delete it again if Exec returns an error.

@lifubang
Contributor Author

lifubang commented Nov 21, 2018

Or, to avoid a global shim lock, we could add a new lock (such as s.execmu) just for the Exec function, since it's different from the other functions.
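A dedicated exec lock might look like this (sketch only; `execMu` and the simplified types are illustrative, `run` stands in for the slow exec work):

```go
package main

import (
	"fmt"
	"sync"
)

type process struct{}

// service sketches the two-lock idea: s.mu still guards the process
// table for every shim call, while execMu serializes only concurrent
// Execs, making the check-then-store pair atomic without blocking
// unrelated requests for the whole duration of an Exec.
type service struct {
	mu        sync.Mutex // guards processes, shared with all shim calls
	execMu    sync.Mutex // hypothetical s.execmu: serializes Exec only
	processes map[string]*process
}

func (s *service) exec(id string, run func() (*process, error)) error {
	s.execMu.Lock() // only other Execs wait here
	defer s.execMu.Unlock()

	s.mu.Lock()
	_, exists := s.processes[id]
	s.mu.Unlock()
	if exists {
		return fmt.Errorf("id %s already exists", id)
	}
	p, err := run()
	if err != nil {
		return err
	}
	s.mu.Lock()
	s.processes[id] = p
	s.mu.Unlock()
	return nil
}

func main() {
	s := &service{processes: map[string]*process{}}
	run := func() (*process, error) { return &process{}, nil }
	fmt.Println(s.exec("test", run)) // <nil>
	fmt.Println(s.exec("test", run)) // id test already exists
}
```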

@Random-Liu
Member

Random-Liu commented Nov 21, 2018

If we are confident that Exec is reliable, and will never hang, we can lock the whole function;
If not, I prefer we fix it in another way, e.g. an id reservation table.

@Ace-Tang
Contributor

Ace-Tang commented Nov 22, 2018

Shouldn't duplicate exec ids be handled at the upper level, like moby? I am not sure.

@lifubang
Contributor Author

lifubang commented Nov 22, 2018

Shouldn't duplicate exec ids be handled at the upper level, like moby? I am not sure.

Yes, moby has considered it: each exec id is different from the others.
But if a user consumes containerd as a library directly, it can cause errors.

@ehazlett
Member

@lifubang this needs a rebase please. thanks!

@lifubang
Contributor Author

@lifubang this needs a rebase please. thanks!

Done, thanks.

@crosbymichael
Member

I've been looking through this code recently. I don't think we want to hold the lock during execution, but we should maybe "reserve" the id at the entry point of Exec. I'm going to try updating this with that functionality.
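One way to read "reserve the id at the entry point" is a small reservation table, along the lines Random-Liu suggested (a hypothetical sketch; the names are illustrative, not containerd's actual types):

```go
package main

import (
	"fmt"
	"sync"
)

// reservations hands out an exec id exactly once until it is released,
// so a duplicate Exec fails immediately at the entry point while the
// winner runs the slow work without holding any shim-wide lock.
type reservations struct {
	mu  sync.Mutex
	ids map[string]struct{}
}

func newReservations() *reservations {
	return &reservations{ids: map[string]struct{}{}}
}

// reserve claims id, returning false if it is already held.
func (r *reservations) reserve(id string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.ids[id]; ok {
		return false
	}
	r.ids[id] = struct{}{}
	return true
}

// release frees id, e.g. when the exec fails or the process exits.
func (r *reservations) release(id string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.ids, id)
}

func main() {
	r := newReservations()
	fmt.Println(r.reserve("test")) // true: first caller wins
	fmt.Println(r.reserve("test")) // false: duplicate rejected immediately
	r.release("test")
	fmt.Println(r.reserve("test")) // true: id is reusable after release
}
```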

crosbymichael added a commit to crosbymichael/containerd that referenced this pull request Jun 21, 2019
tussennet pushed a commit to tussennet/containerd that referenced this pull request Sep 11, 2020
