Skip to content

Conversation

@katiewasnothere
Copy link

The goal of this PR is to add support for parsing signals based on the container's platform instead of the host's platform. This is important for scenarios like LCOW where linux containers are run on windows hosts.

This PR addresses #5931 by adding support to send signals as strings to the container. This allows us to put off parsing the signal until we're as close as possible to the container runtime (iow in runc). This PR maintains the existing support for sending signals as integers, but prefers the string value when present.

This PR additionally updates Task and Shim (v1 and v2) APIs with a new field RawSignal used for propagating the signal down to the container runtime.

This change relies on containerd/go-runc#78 first.

Signed-off-by: Kathryn Baldauf [email protected]

@k8s-ci-robot
Copy link

Hi @katiewasnothere. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@theopenlab-ci
Copy link

theopenlab-ci bot commented Sep 15, 2021

Build succeeded.

@katiewasnothere katiewasnothere force-pushed the task_kill_raw_signal branch 2 times, most recently from 1b6d4e7 to 6a7933e Compare November 2, 2021 19:48
@theopenlab-ci
Copy link

theopenlab-ci bot commented Nov 2, 2021

Build succeeded.

@katiewasnothere katiewasnothere force-pushed the task_kill_raw_signal branch 2 times, most recently from 3f30327 to b85b2a3 Compare November 3, 2021 23:43
@theopenlab-ci
Copy link

theopenlab-ci bot commented Nov 4, 2021

Build succeeded.

@theopenlab-ci
Copy link

theopenlab-ci bot commented Nov 5, 2021

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.

@theopenlab-ci
Copy link

theopenlab-ci bot commented Nov 5, 2021

Build succeeded.

@katiewasnothere
Copy link
Author

Test failures look unrelated as far as I can tell. @dmcgowan was there any discussion on this PR?

@katiewasnothere
Copy link
Author

@thaJeztah or @kzys PTAL when you can!

if err == nil {
return unix.Signal(s), nil
}
signal, err := signal.ParseSignal(rawSignal)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in the end, ParseSignal will return syscall.Signal, which is an alias for int. In this PR we add rawSignal everywhere along with the old int signal. Would it be possible to parse the string signal earlier (somwhere on containerd/client side) and just pass parsed int signal? This way we don't need to change APIs everywhere.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the situation that this is trying to avoid; in the LCOW case, client != runtime platform (runc runs on Linux, but containerd on Windows).

@kzys
Copy link
Member

kzys commented Dec 2, 2021

I've summarized the discussion I had with @thaJeztah, @katiewasnothere and @jterry75. I'm convinced that passing a signnal as a string is the only viable option, but I'm not still fully sure about the Go API change.

https://docs.google.com/document/d/1UwEVg4e_BCiVtlwWUbSwHo3DUFvcq3-VQv4cv4ug7y8/edit?usp=sharing

@jterry75
Copy link
Contributor

jterry75 commented Dec 2, 2021

I agree, I think that passing signal as string is correct option. Please correct me if I am wrong here but the API change is non-breaking for the following reasons:

  • The public Task.Kill is done with KillOpts even today. This means that passing Kill(ctx, syscall.SIGTERM) is equivalent to Kill(ctx, 0, WithKillRawSignal("SIGTERM")). No breaking change as clients can update as needed.
  • The internal / shared with shims API is based on: KillRequest which is a proto file where extension of RawSignal is 100% allowed.
  • The sorta public api pkg/process/init.go is the only slightly concerning breaking change. It's possible that shims beyond what is "in-tree" have used this package abstraction and would need to update the method header to still compile.

Given that the last one is not really the "public" IE: "client" API, I don't think we should consider this breaking. Updating the shims is a relatively small set of people if and only if they are using that package. For example, the Windows shims do not.

Copy link
Contributor

@jterry75 jterry75 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kzys
Copy link
Member

kzys commented Dec 2, 2021

This means that passing Kill(ctx, syscall.SIGTERM) is equivalent to Kill(ctx, 0, WithKillRawSignal("SIGTERM")). No breaking change as clients can update as needed.

Yes. This is not a breaking change. We are prioritizing its backward-compatibility over its clarity. My question is, do we really want to?

Let's say someone is writing a containerd client, they would do Kill(ctx, syscall.SIGTERM) because it does look right. Then someone may say "Hey, it wouldn't work on LCOW! You should do Kill(ctx, 0, WithKillRawSignal("SIGTERM")) instead!"

@kzys
Copy link
Member

kzys commented Dec 2, 2021

cc @dmcgowan @crosbymichael and @AkihiroSuda since you are on #3554

@mxpv
Copy link
Member

mxpv commented Dec 2, 2021

I'm also not sure about having signal and raw signal side by side :(
As I understand we want strings to be a cross platform representation.
Same way we can define a custom enum with signals we need and parse the enum to a platform specific code, right?

@kzys
Copy link
Member

kzys commented Dec 3, 2021

Yes. Currently containerd daemon converts signals' string representations (e.g. "SIGTERM") to integer representations (e.g. 15). We'd like to move the responsibility from the daemon to each shim. In LCOW, containerd's conversion on Windows host doesn't give its Linux containers right signals.

@thaJeztah
Copy link
Member

So, the "new" (string) option takes precedence if both are set, correct?

Which would make

if err := task.Kill(ctx, 0, containerd.WithKillRawSignal("SIGTERM")); err != nil {

and

if err := task.Kill(ctx, syscall.SIGTERM, containerd.WithKillRawSignal("SIGTERM")); err != nil {

Would be equivalent (mostly wondering what's better; set the numeric value to 0, or to the "existing" value - would that be more backward compatible?)

@kzys
Copy link
Member

kzys commented Dec 3, 2021

Maybe returning an error on the second case is better? Let's say I have

func forceKill(ctx context.Context, opts ...KillOpts) {
    err := task.Kill(ctx, syscall.SIGKILL, opts...)
   ...
}

After this change, the caller can override SIGKILL. This is signature-wise backward compatible though...

I generally think having Do-What-I-Mean behavior on system software is bad. If we have 2 signal information, we may call that is a programmer error.

@dmcgowan
Copy link
Member

dmcgowan commented Dec 3, 2021

I read through the summary document, thanks for the details description. I also agree that option 2 is the best long term solution and share the concerns about how the change is made. I am wondering if it is more of a "containerd 2.0" kind of change rather than something we should squeeze into the API right now. Option 3 doesn't seem that bad as a shorter term (getting into 1.6) solution as it mostly effects LCOW which already deals with these issues and Windows clients are already handling Linux specific cases on Windows host.

To pass signals from Windows to Linux, containerd on Windows has to forward them even if Windows doesn’t support them. In other words, We no longer fail-fast on invalid signals.

Is fail-fast really that important? Invalid signal is likely caused by a bug in the client or bad configuration rather than an expected case which needs to be handled quickly. Am I misunderstanding that?

Moreover, signal’s string -> integer conversion has to be platform-agnostic and the superset of all platforms, since shims would convert the integers to another integers later. This will be problematic if we consider other Unix OSes. For example, illumos has SIGFREEZE and SIGTHAW.

This doesn't seem like too big of a change especially if it is initially focused on supporting LCOW. I'm not sure how the behavior would be different in this case for the mentioned SIGRTMIN since the library is still not supporting the mentioned difference.

Related to Go API compatibility. While we do not guarantee it, it is a best effort to maintain it. This change seems like it would be disruptive so late in 1.x adoption. In the end, we should make the decision based on what is the most clear interface. If we make this change, having two ways to pass a signal into the interface is not the most clean solution. Let's decide what the best API would be if we were to choose it for 2.0, and work toward that. Either adding into a 1.x release or waiting for 2.0.

@kzys
Copy link
Member

kzys commented Dec 9, 2021

Thanks @dmcgowan. I like the two phase approach. In 1.x, we could treat Linux's signals' integer representaiton as signals' platform-agnostic representation. While this is unnecessary Linux-centric, we could unblock LCOW and doesn't cause a lot of confusion for 1.x customers.

In 2.x, we will fully migrate from signals' integer representation to string representation.

@jterry75 @katiewasnothere What do you think? I believe it should unblock LCOW.

@jterry75
Copy link
Contributor

jterry75 commented Dec 9, 2021

I think I have lost the train. What does it mean to use signals platform-agnostic representation? I still don't get how you do a parseSignal(string) on the containerd side that is platform-agnostic. Do you just mean use the moby/sys/signals package and hope for the best?

For containerd 2.0 it sounds like we are saying an API breaking change is preferable to Kill(ctx, string, bool) right?

@kzys
Copy link
Member

kzys commented Dec 9, 2021

Sorry, I meant,

  • containerd daemon will do signals' string -> integer conversion, based on Linux's conversion map.
  • LCOW can take Linux signals since the conversion map is Linux's one.
  • WCOW has to convert Linux signals' integer representation to Windows signals' integer representation.

For containerd 2.0 it sounds like we are saying an API breaking change is preferable to Kill(ctx, string, bool) right?

Yes. Kill(ctx, string, ...opts) and Kill(ctx, string, bool) are the ones that clients would use.

@jterry75
Copy link
Contributor

jterry75 commented Dec 9, 2021

Sorry, I meant,

  • containerd daemon will do signals' string -> integer conversion, based on Linux's conversion map.
  • LCOW can take Linux signals since the conversion map is Linux's one.
  • WCOW has to convert Linux signals' integer representation to Windows signals' integer representation.

For containerd 2.0 it sounds like we are saying an API breaking change is preferable to Kill(ctx, string, bool) right?

Yes. Kill(ctx, string, ...opts) and Kill(ctx, string, bool) are the ones that clients would use.

Got it! Does anyone know if there are any images that actually use Windows Stop Signals? https://github.com/microsoft/hcsshim/blob/7fa8bda4e6ba503caf0d53d0a4ee99b9a64ceed8/internal/signals/signal.go#L105

Would we break any Windows Images if the StopSignal was "CtrlC" and now the containerd parser would need to account for this?

@kzys
Copy link
Member

kzys commented Dec 9, 2021

@jterry75
Copy link
Contributor

@katiewasnothere
Copy link
Author

katiewasnothere commented Dec 16, 2021

@jterry75
Copy link
Contributor

Wow. Do they work today? They are not listed in https://github.com/moby/sys/blob/master/signal/signal_windows.go.

@katiewasnothere - Do you happen to know?

Not that I'm aware of. The code you linked here https://github.com/microsoft/hcsshim/blob/7fa8bda4e6ba503caf0d53d0a4ee99b9a64ceed8/internal/signals/signal.go#L105 is only referenced by code flows we no longer use in hcsshim @jterry75.

Oh wow! Thanks @katiewasnothere. cmd variant of runhcs should be deleted for sure! Its only used in this codebase to generate the bases writable layer for LCOW it looks like. Could be a standalone tool.

Ship it! No use of Windows Signals supported!

@katiewasnothere
Copy link
Author

Closing. This was resolved for now with moby/sys#98

@katiewasnothere katiewasnothere deleted the task_kill_raw_signal branch February 23, 2022 18:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants