Skip to content

CRI: set container to unknown if fail to start#7044

Closed
Burning1020 wants to merge 1 commit intocontainerd:mainfrom
Burning1020:set-unknown
Closed

CRI: set container to unknown if fail to start#7044
Burning1020 wants to merge 1 commit intocontainerd:mainfrom
Burning1020:set-unknown

Conversation

@Burning1020
Copy link
Copy Markdown
Member

@Burning1020 Burning1020 commented Jun 10, 2022

CRI: Contianer state should be unknown if fail to start, not exited.

I believe the exited state indicates that the process has started successfully and exited either by killed or with a completed execution. While the unknown state can be interpreted as this task has returned an error somehow and the reason is unknown.

In addition, if task.Start() failed and defered task.Delete() failed too, the follow-up RemoveContainer() will failed with ErrFailedPrecondition forever because task still can be loaded.

Signed-off-by: Zhang Tianyang [email protected]

@fuweid
Copy link
Copy Markdown
Member

fuweid commented Jun 11, 2022

please use your real name to sign off. Thanks

Contianer state should be unknown if fail to start, not exited.

Signed-off-by: Zhang Tianyang <[email protected]>
// Set container to exited if fail to start.
// Set container state depended on whether task is alive
if err := cntr.Status.UpdateSync(func(status containerstore.Status) (containerstore.Status, error) {
if _, err := container.Task(ctx, nil); err == nil {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I don't want to make RPC call in UpdateSync because the lock in UpdateSync will impact the ListContainer and kubelet's PLEG.

I would like to use a flag named by shouldBeUnknown to let rollback know there is no idea about shim.

defer func() {
if retErr != nil {
// Set container to exited if fail to start.
// Set container state depended on whether task is alive
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the original comment is not detailed enough... it not only sets the finished time indicating exited state.. it also sets errorStartExitCode which has further meaning:

	// errorStartExitCode is the exit code when fails to start container.
	// 128 is the same with Docker's behavior.
	// TODO(windows): Figure out what should be used for windows.
	errorStartExitCode = 128

thus.. I wonder if this change is needed

@fuweid
Copy link
Copy Markdown
Member

fuweid commented Jul 29, 2022

@Burning1020 could you mind to add the integration test case with failpoint? Thanks

@k8s-ci-robot
Copy link
Copy Markdown

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@dims dims added the area/cri Container Runtime Interface (CRI) label Feb 7, 2024
@github-actions
Copy link
Copy Markdown

This PR is stale because it has been open 90 days with no activity. This PR will be closed in 7 days unless new comments are made or the stale label is removed.

@github-actions github-actions Bot added the Stale label May 16, 2024
@github-actions
Copy link
Copy Markdown

This PR was closed because it has been stalled for 7 days with no activity.

@github-actions github-actions Bot closed this May 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/cri Container Runtime Interface (CRI) needs-rebase Stale

Projects

Status: Done
Archived in project

Development

Successfully merging this pull request may close these issues.

7 participants