[release/1.6] test: introduce failpoint control to runc-shimv2 and cni#7455
Failpoints are used to inject failures into API calls during testing, which is especially useful when the API is complicated, like CRI RunPodSandbox. They help us test unexpected behavior without mocks. The control design is based on FreeBSD fail(9), but simpler.
REF: https://www.freebsd.org/cgi/man.cgi?query=fail&sektion=9&apropos=0&manpath=FreeBSD%2B10.0-RELEASE
Signed-off-by: Wei Fu <[email protected]>
(cherry picked from commit ffd59ba)
Signed-off-by: Qiutong Song <[email protected]>
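To make the fail(9)-style control concrete, here is a minimal sketch of a failpoint evaluator. It is not the code from this PR: the `Failpoint` type, `NewFailpoint`, and `Evaluate` are hypothetical names, and the grammar is an assumed subset (`[count*]action[(arg)]` terms chained with `->`, supporting only `off` and `error`) modeled on the `1*error(sorry)` spec used in the ctr example of this PR.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
	"sync"
)

// term is one step of a failpoint spec: apply action up to count times.
type term struct {
	count  int    // remaining activations; -1 means unlimited
	action string // "off" or "error" in this sketch
	arg    string // error message for the "error" action
}

// Failpoint is a hypothetical, simplified evaluator (not containerd's API).
type Failpoint struct {
	mu    sync.Mutex
	terms []term
}

// parseTerm parses a single "[count*]action[(arg)]" term.
func parseTerm(s string) (term, error) {
	t := term{count: -1}
	if i := strings.Index(s, "*"); i >= 0 {
		n, err := strconv.Atoi(s[:i])
		if err != nil || n <= 0 {
			return t, fmt.Errorf("invalid count in %q", s)
		}
		t.count, s = n, s[i+1:]
	}
	if i := strings.Index(s, "("); i >= 0 {
		if !strings.HasSuffix(s, ")") {
			return t, fmt.Errorf("missing ')' in %q", s)
		}
		t.arg, s = s[i+1:len(s)-1], s[:i]
	}
	if s != "off" && s != "error" {
		return t, fmt.Errorf("unsupported action %q", s)
	}
	t.action = s
	return t, nil
}

// NewFailpoint parses a spec such as "1*error(sorry)" or "2*off->error(boom)".
func NewFailpoint(spec string) (*Failpoint, error) {
	fp := &Failpoint{}
	for _, raw := range strings.Split(spec, "->") {
		t, err := parseTerm(strings.TrimSpace(raw))
		if err != nil {
			return nil, err
		}
		fp.terms = append(fp.terms, t)
	}
	return fp, nil
}

// Evaluate consumes one activation of the first term that is not exhausted.
// It returns nil once every term has been used up.
func (fp *Failpoint) Evaluate() error {
	fp.mu.Lock()
	defer fp.mu.Unlock()
	for i := range fp.terms {
		t := &fp.terms[i]
		if t.count == 0 {
			continue // this term is exhausted, try the next one
		}
		if t.count > 0 {
			t.count--
		}
		if t.action == "error" {
			return errors.New(t.arg)
		}
		return nil
	}
	return nil
}

func main() {
	fp, err := NewFailpoint("1*error(sorry)")
	if err != nil {
		panic(err)
	}
	fmt.Println("first call: ", fp.Evaluate())
	fmt.Println("second call:", fp.Evaluate())
}
```

With the spec `1*error(sorry)`, the first evaluation fails and later ones succeed, which matches the behavior of the two `ctr t kill` calls shown in this PR.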
Currently, the runc shimv2 command-line manager doesn't support customized ttrpc server options, for example a ttrpc server interceptor. This commit allows the task plugin to return the `UnaryServerInterceptor` option to the manager, so that the task plugin can add behavior before handling an incoming request, such as API-level failpoint control.
Signed-off-by: Wei Fu <[email protected]>
(cherry picked from commit 822cc51)
Signed-off-by: Qiutong Song <[email protected]>
Hi @qiutongs. Thanks for your PR. I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/cc @fuweid @samuelkarp
The failure was due to #6827 |
ca2e110 to 1db5e31
fuweid left a comment
LGTM
To me, this belongs to the test framework, and it is useful for testing new features or bug fixes from the main branch. I agree with the backport.
Would you mind squashing "Use the old path of runtime v2 task, prior to PR 6827" with "bin/ctr,integration: new runc-shim with failpoint"? It's better for each commit to be buildable so things like
Add a new runc shim binary for integration testing.
The shim is named io.containerd.runc-fp.v1, which allows us to use the
additional OCI annotation `io.containerd.runtime.v2.shim.failpoint.*` to
set up failpoints for the shim task API. Since the shim can be shared by
multiple containers, as in a Kubernetes pod, the failpoints are
initialized while the shim server is set up. As a result, failpoint
annotations on containers that join the shim later have no effect.
This commit also updates the ctr tool so that `--annotation` can be used
to specify annotations when running a container. For example:
```bash
➜ ctr run -d --runtime runc-fp.v1 \
--annotation "io.containerd.runtime.v2.shim.failpoint.Kill=1*error(sorry)" \
docker.io/library/alpine:latest testing sleep 1d
➜ ctr t ls
TASK PID STATUS
testing 147304 RUNNING
➜ ctr t kill -s SIGKILL testing
ctr: sorry: unknown
➜ ctr t kill -s SIGKILL testing
➜ sudo ctr t ls
TASK PID STATUS
testing 147304 STOPPED
```
The runc-fp.v1 shim is based on the core runc.v2 shim. We can use it to
inject failpoints when testing complicated or large transactional APIs,
like CRI RunPodSandbox.
Signed-off-by: Wei Fu <[email protected]>
(cherry picked from commit 5f9b318)
Signed-off-by: Qiutong Song <[email protected]>
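As a rough illustration of how the `io.containerd.runtime.v2.shim.failpoint.*` annotations could be turned into per-method settings, the sketch below strips the prefix and keeps the remainder as the task API method name. The helper `failpointsFromAnnotations` is hypothetical; the shim in this PR may collect the annotations differently.

```go
package main

import (
	"fmt"
	"strings"
)

// failpointPrefix is the annotation prefix named in the commit message.
const failpointPrefix = "io.containerd.runtime.v2.shim.failpoint."

// failpointsFromAnnotations is a hypothetical helper. It maps a task API
// method name (the annotation suffix) to its failpoint spec and ignores
// annotations that do not carry the prefix.
func failpointsFromAnnotations(annotations map[string]string) map[string]string {
	out := make(map[string]string)
	for k, v := range annotations {
		method := strings.TrimPrefix(k, failpointPrefix)
		if method != k && method != "" {
			out[method] = v
		}
	}
	return out
}

func main() {
	annotations := map[string]string{
		"io.containerd.runtime.v2.shim.failpoint.Kill": "1*error(sorry)",
		"unrelated.annotation":                         "ignored",
	}
	for method, spec := range failpointsFromAnnotations(annotations) {
		fmt.Printf("%s => %s\n", method, spec)
	}
}
```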
If there is any unskippable error while setting up shim plugins, we should fail and return the error to prevent a leaked shim instance. For example, if there is an error while initializing the task plugin, the shim ttrpc server will not contain any shim API method. Any call to the shim will then fail with `failed to create shim task: service containerd.task.v2.Task: not implemented`. In that state containerd can't use `Shutdown` to make the shim exit, so the shim is leaked. Also fail and return an error if there is no ttrpc service.
Signed-off-by: Wei Fu <[email protected]>
(cherry picked from commit b297775)
Signed-off-by: Qiutong Song <[email protected]>
Signed-off-by: Wei Fu <[email protected]> (cherry picked from commit 1ae6e8b) Signed-off-by: Qiutong Song <[email protected]>
Introduce cni-bridge-fp, a wrapper binary around the CNI bridge plugin,
for CRI testing.
With the CNI `io.kubernetes.cri.pod-annotations` capability enabled, the
user can inject failpoint settings through the pod annotation
`cniFailpointControlStateDir`. That directory stores each pod's failpoint
settings in a file named `${K8S_POD_NAMESPACE}-${K8S_POD_NAME}.json`.
When the plugin is invoked, it checks CNI_ARGS to find the pod's file and
loads the failpoint for the current CNI_COMMAND from disk. For testing,
the user can prepare the settings before RunPodSandbox.
Signed-off-by: Wei Fu <[email protected]>
(cherry picked from commit be91a21)
Signed-off-by: Qiutong Song <[email protected]>
Signed-off-by: Wei Fu <[email protected]> (cherry picked from commit 3c5e80b) Signed-off-by: Qiutong Song <[email protected]>
Signed-off-by: Wei Fu <[email protected]> (cherry picked from commit cbebeb9) Signed-off-by: Qiutong Song <[email protected]>
* Use delegated plugin call to simplify cni-bridge-cni
* Add README.md for cni-bridge-cni
Signed-off-by: Wei Fu <[email protected]>
(cherry picked from commit e6a2c07)
Signed-off-by: Qiutong Song <[email protected]>
1db5e31 to a85709c
Done
/ok-to-test
Backport #7069 to 1.6 branch. This is a prerequisite to backport #5904.
Testing