Fix the error of runc doesn't work with go1.22#4193
Fix the error of runc doesn't work with go1.22#4193lifubang wants to merge 4 commits intoopencontainers:mainfrom
Conversation
|
go 1.22.0 error msg: |
This can be fixed by adding |
|
Interestingly, both runc 1.1.12 and runc from git HEAD built with go1.22.0 work fine on my machine (all tests are passing). |
This comment was marked as outdated.
This comment was marked as outdated.
|
We also need to fix this for Go 1.22 I don't remember why I haven't switched to |
It seems that cgo may be broken with clone(2) in go1.22.0? |
Again, I can't repro locally. [kir@kir-tp1 cgoclone2]$ go version
go version go1.21.6 linux/amd64
[kir@kir-tp1 cgoclone2]$ go run main.go
STAGE_PARENT
STAGE_CHILD
STAGE_INIT
This from nsexec
From main!
[kir@kir-tp1 cgoclone2]$ go1.22.0 version
go version go1.22.0 linux/amd64
[kir@kir-tp1 cgoclone2]$ go1.22.0 run main.go
STAGE_PARENT
STAGE_CHILD
STAGE_INIT
This from nsexec
From main!Maybe it's your kernel version @lifubang? Can you show |
|
@lifubang also if you can repro that (alas I can not), you can git bisect golang between 1.21.0 and 1.22.0. |
|
Note in CI it happens with Ubuntu 20.04 but not Ubuntu 22.04. Will try to repro in a VM. |
|
On Ubuntu 20.04, when running the binary compiled with go 1.22, I am seeing a SIGSEGV:
Can't yet figure out what's going on there; will continue tomorrow. |
|
@lifubang I did a bisect, here are the results: golang/go#65625 (comment) Will continue tomorrow. |
So, to summarize the investigation done there -- it's a glibc bug, in fact, two bugs:
These two bugs are apparently specific to glibc used by Ubuntu 20.04 (libc6 2.31-0ubuntu9.14) and maybe also Debian 10 (libc6 2.28-10+deb10u2), as I was able to reproduce on both. With Debian 10, it even prints error from free: For some reason I was unable to repro on older Fedora (F32, glibc-2.31-2.fc32, F33, glibc-2.32-10.fc33) and Debian 11 (libc6 2.31-7). The bad news is, every version of glibc has the bug 1 above, and https://go-review.googlesource.com/c/go/+/563379 may make it so go 1.22.x will fail runc init on every version of glibc. Meaning, we need a workaround for that. Perhaps changing runc libct/nsenter logic in some radical way, so that |
Go 1.22 currently causes crashes on older Debian/Ubuntu systems. lxc/incus#497 golang/go#65625 opencontainers/runc#4193 Signed-off-by: Stéphane Graber <[email protected]>
Go 1.22 currently causes crashes on older Debian/Ubuntu systems. lxc/incus#497 golang/go#65625 opencontainers/runc#4193 Signed-off-by: Stéphane Graber <[email protected]>
👍 |
|
Rebasing this to re-run with Go 1.22.1 |
c54384f to
5907889
Compare
|
Sorry @lifubang I've high-jacked your PR, needed to run it with Go 1.22.1 and added missing changes to |
|
OK, Go 1.22.1 makes no difference. I guess we have to disable Go 1.22 for now. |
3144923 to
aebd5b5
Compare
|
Do you think moving the c code in |
I think maybe no, because after clone(2), it has already in go routine, it can’t longjmp to C. So, it seems that there is no other way to fix this issue? |
7eac8b8 to
df04ed4
Compare
|
This patch also works, while still allowing us to use diff --git a/libcontainer/nsenter/nsexec.c b/libcontainer/nsenter/nsexec.c
index c771ac7e1165..319899bd9b71 100644
--- a/libcontainer/nsenter/nsexec.c
+++ b/libcontainer/nsenter/nsexec.c
@@ -15,6 +15,7 @@
#include <stdbool.h>
#include <string.h>
#include <unistd.h>
+#include <pthread.h> /* _only_ used for pthread_self() in debug log */
#include <sys/ioctl.h>
#include <sys/prctl.h>
@@ -319,7 +320,41 @@ static int clone_parent(jmp_buf *env, int jmpval)
.jmpval = jmpval,
};
- return clone(child_func, ca.stack_ptr, CLONE_PARENT | SIGCHLD, &ca);
+ /*
+ * Since glibc 2.25 (see c579f48edba88380635ab98cb612030e3ed8691e),
+ * glibc no longer updates the TLS state containing the current process
+ * tid after clone(2). This results in stale TIDs being used when Go
+ * 1.22 and later call pthread_gettattr_np(pthread_self()), resulting
+ * in crashes on ancient glibcs and errors on newer glibcs.
+ *
+ * Luckily, because the same address is used for CLONE_PARENT_SETTID,
+ * we can poke around in glibc's internal cache by getting the address
+ * using PR_GET_TID_ADDRESS (only available in Linux >= 3.5, with
+ * CONFIG_CHECKPOINT_RESTORE=y) and then overwriting it with
+ * CLONE_CHILD_SETTID. CLONE_CHILD_CLEARTID is set to allow descendant
+ * PR_GET_TID_ADDRESS calls to work, as well as matching what glibc
+ * does in arch_fork().
+ *
+ * Yes, this is pretty horrific, but the core issue here is that we
+ * need to run Go code ("runc init") in the child after fork(), which
+ * is not allowed by glibc (see signal-safety(7)). We cannot exec to
+ * solve the problem because we are in a security critical situation
+ * here, and doing an exec would allow for container escapes (obvious
+ * issues include that the shared libraries loaded from a re-exec would
+ * come from the container, and doing an exec here would clear the bit
+ * that makes non-dumpable flags effective for userns containers with
+ * CAP_SYS_PTRACE).
+ */
+ pid_t *tid_addr = NULL;
+ if (prctl(PR_GET_TID_ADDRESS, &tid_addr) < 0)
+ /* what should we do here... */;
+ write_log(DEBUG, "nsenter clone: get_tid_address gave us %p (pthread_self=%p)", tid_addr, (void *) pthread_self());
+ if (!tid_addr || *tid_addr != gettid())
+ write_log(WARNING, "nsenter clone: glibc private tid address is wrong: *%p %d != gettid() %d", tid_addr, tid_addr ? *tid_addr : -1, gettid());
+
+ return clone(child_func, ca.stack_ptr,
+ CLONE_PARENT | CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID | SIGCHLD, &ca,
+ NULL /* parent_tid */ , NULL /* tls */ , tid_addr);
}
/* Returns the clone(2) flag for a namespace, given the name of a namespace. */@kolyshkin wdyt? |
|
@lifubang I can also take a look next week at whether we can somehow remove stage-1 so that we don't need a grandchild (which would remove the need for |
|
This PR needs some refactor work, so convert it to draft state. Line 184 in df04ed4 |
df04ed4 to
41d8548
Compare
Signed-off-by: lifubang <[email protected]>
1253158 to
bdec4c7
Compare
|
Welcome more suggestions. |
As the description in opencontainers#4233, there is a bug in glibc, pthread_self() will return wrong info after we do `clone(CLONE_PARENT)` in libct/nsenter, it will cause runc can't work in `go 1.22.*`. So we use fork(2) to replace clone(2) in libct/nsenter, but there is a double-fork in nsenter, so we need to use `PR_SET_CHILD_SUBREAPER` to let runc can reap grand child process in libct/nsenter. Signed-off-by: lifubang <[email protected]>
This reverts commit ac31da6. Signed-off-by: lifubang <[email protected]>
This reverts commit e377e16. Signed-off-by: lifubang <[email protected]>
bdec4c7 to
1e60e59
Compare
| logrus.Warn(err) | ||
| } | ||
| } | ||
| func newSignalHandler(notifySocket *notifySocket) *signalHandler { |
There was a problem hiding this comment.
As @kolyshkin pointed out in #4278 (comment)
This PR has done such things. But there is still one thing I can't determine whether we should do or not:
If we use PR_SET_CHILD_SUBREAPER and fork(2) to replace clone(CLONE_PARENT), I think we should move this signal handler to libcontainer. Or else someone use libcontainer directly in the code will have to write a signal handler by themselves.
|
closing in favor of #4292 |
As the description in #4233, there is a bug in glibc, pthread_self()
will return wrong info after we do
clone(CLONE_PARENT)in libct/nsenter,it will cause runc can't work in
go 1.22.*. So we use fork(2) to replaceclone(2) in libct/nsenter, but there is a double-fork in nsenter, so we
need to use
PR_SET_CHILD_SUBREAPERto let runc can reap grand childprocess in libct/nsenter.
Fix #4233