Skip to content

Collect interpreter backtraces in systemd-coredump#4526

Merged
poettering merged 12 commits intosystemd:masterfrom
keszybz:coredump-python
Feb 16, 2017
Merged

Collect interpreter backtraces in systemd-coredump#4526
poettering merged 12 commits intosystemd:masterfrom
keszybz:coredump-python

Conversation

@keszybz
Copy link
Member

@keszybz keszybz commented Oct 31, 2016

This is a POC for https://bugs.freedesktop.org/show_bug.cgi?id=82507.
We allow callers to specify the backtrace, and attach metadata in systemd-coredump and send it off to the journal. This makes it possible to log e.g. python backtraces easily from within python, while keeping things integrated with systemd-coredumpctl and coredumpctl information gathering.
(The coredumpctl part is missing...)

I'm putting this out here in order to gather comments.

The python counterpart:
https://github.com/keszybz/systemd-coredump-python

if (_x) \
IOVEC_SET_STRING(iovec[(*(n_iovec))++], _x); \
_x; \
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, not fan of such macros i must say. Can't this be a regular function?

Also, this allocates memory and returns it as a "const char*"... That looks strange. In C free() expects a non-const ptr really..

const char *x = SET_IOVEC_FIELD(iovec, n_iovec, field, value); \
free(value); \
x; \
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar case here, as above...

log_debug("Processing traceback on stdin...");

if (argc < CONTEXT_COMM + 1) {
log_error("Not enough arguments passed from kernel (%i, expected %i).", argc - 1, CONTEXT_COMM + 1 - 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this shouldn't say "from kernel" here anymore, no?

return r;
}

static int process_traceback(int argc, char *argv[]) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, i am more used to the word "backtrace" rather than "traceback"? /me is confused...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"traceback" is more commonly used in Python. I'll convert to "backtrace".


#define SD_MESSAGE_COREDUMP SD_ID128_MAKE(fc,2e,22,bc,6e,e6,47,b6,b9,07,29,ab,34,a2,50,b1)
#define SD_MESSAGE_TRUNCATED_CORE SD_ID128_MAKE(5a,ad,d8,e9,54,dc,4b,1a,8c,95,4d,63,fd,9e,11,37)
#define SD_MESSAGE_EXCEPTION SD_ID128_MAKE(1f,4e,0a,44,a8,86,49,93,9a,ae,a3,4f,c6,da,8c,95)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, you are calling this "exception" here... I think we should stick to a single terminology here. I.e. either call the message id "stacktrace" or call the command line switch "--exception" too.

I am mostly leaning towards" "stacktrace for both though.

@poettering
Copy link
Member

looks conceptually good to me, but see my comments.

@poettering poettering added the reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks label Nov 2, 2016
@keszybz keszybz force-pushed the coredump-python branch 2 times, most recently from f0a1d08 to d5efc18 Compare November 3, 2016 13:46
@keszybz
Copy link
Member Author

keszybz commented Nov 3, 2016

Oops, I pushed the wrong branch here. Please ignore for now.

@keszybz
Copy link
Member Author

keszybz commented Nov 3, 2016

TEST RUN: Job-related tests [FAILED]
make: *** [run] Error 1
Makefile:8: recipe for target 'run' failed
make: Leaving directory '/tmp/autopkgtest.abfqqZ/build.z6k/systemd-debian/test/TEST-03-JOBS'

I hope that's not related...

@keszybz keszybz force-pushed the coredump-python branch 3 times, most recently from cc66303 to 42dab00 Compare November 6, 2016 20:08
@keszybz keszybz changed the title [WIP] Collect python backtraces in systemd-coredump Collect interpreter backtraces in systemd-coredump Dec 18, 2016
@keszybz keszybz added tree-wide and removed reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks labels Dec 18, 2016
@keszybz
Copy link
Member Author

keszybz commented Dec 18, 2016

Rebased. Please have a look.

I think the general approach is good. There was some discussion on the bug, but no major objections. I'm sure that some tweaks will be needed as more experience is gained, but I think this should be quite useful in current state. I'll polish and submit the python counterpart as a fedora package once this is accepted.

@poettering
Copy link
Member

hmm, needs rebasing, currently conflicting

Copy link
Member

@poettering poettering left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few comments

HACKING Outdated
$ ./autogen.sh c # configure the source tree
$ make -j `nproc` # build it locally, see if everything compiles fine
$ sudo dnf builddep systemd # install build dependencies
$ sudo dnf install mkosi
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this change doesn't look right. You should really install the systemd build deps before invoking "autogen.sh c", otherwise the deps will be most likely missing

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

core_pid = strjoina("COREDUMP_PID=", context[CONTEXT_PID]);
IOVEC_SET_STRING(iovec[n_iovec++], core_pid);
if (!set_iovec_field(iovec, n_iovec, "COREDUMP_UID=", context[CONTEXT_UID]))
return log_oom();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should probably add a comment somewhere that we will return a "half-initialized" iovec array on failure. Usually after all we try to rollback stuff we failed to finished...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added that in separate commit.

goto finish;

if (isempty(context[CONTEXT_SIGNAL]))
context[CONTEXT_SIGNAL] = "an exception";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds weird to me to initialize this to "an exception", given that the field doesn't really apply... maybe use NULL or (char*) -1 here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part is removed in one of the later commits. I'd leave it as is because of that.

0 })

#define SD_ID128_MAKE_STR(a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p) \
#a #b #c #d #e #f #g #h #i #j #k #l #m #n #o #p
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, so I wonder, why have this macro at all? I mean, it's easier to write this as string directly, isn't it?

i.e. this is shorter and easier to read:

#define MESSAGE_FOOBAR_STR "1a1528c2abd54e1ba9a7585e3edb767e"

than this:

#define MESSAGE_FOOBAR_STR SD_ID128_MAKE_STR(1a,15,28,c2,ab,d5,4e,1b,a9,a7,58,5e,3e,db,76,7e)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The number of digits is checked, and it is easier to copy&paste (or even use some macro as I did) to generate that line from SD_ID128_MAKE, and later to check that those numbers match. Dunno, I think it's a slight win.

@keszybz
Copy link
Member Author

keszybz commented Jan 16, 2017

Updated.

Copy link
Member

@poettering poettering left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more notes. And sorry for forgetting to review this. Looks pretty good, let's get this in quickly now!

" of user ", context[CONTEXT_UID],
" failed with ", context[CONTEXT_SIGNAL]);
if (!message) {
log_oom();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r must be initialized here, before you jump to "finish"

while (true) {
r = journal_importer_process_data(&importer);
if (r < 0)
return log_error_errno(r, "Failed to parse journal entry on stdin: %m");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't you jump to "finish" here, too?

<title>Description</title>
<para><command>systemd-coredump</command> is a system service that can acquire core dumps
from the kernel and handle them in various ways.</para>
<para><command>[email protected]</command> is a system service that can acquire core
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for unit names we so far used <filename></filename>, as they usually map 1:1 to filenames...

<para><command>[email protected]</command> is a system service that can acquire core
dumps from the kernel and handle them in various ways. The <command>systemd-coredump</command>
executable does the actual work. It is invoked twice: once as the handler by the kernel, and the
second time in the <command>[email protected]</command> to actually write the data to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

In preparation for subsequenct changes...

Various stack allocations are changed to use the heap. This might be minimally
slower, but probably doesn't matter. The upside is that we will now properly
free all memory that is allocated.
This is useful for example for Python progams. By installing a python
sys.execepthook we can store the backtrace in the journal. We gather the
backtrace in the python process, and call systemd-coredump to attach additional
fields (COREDUMP_COMM, COREDUMP_EXE, COREDUMP_UNIT, COREDUMP_USER_UNIT,
COREDUMP_OWNER_UID, COREDUMP_SLICE, COREDUMP_CMDLINE, COREDUMP_CGROUP,
COREDUMP_OPEN_FDS, COREDUMP_PROC_STATUS, COREDUMP_PROC_MAPS,
COREDUMP_PROC_LIMITS, COREDUMP_PROC_MOUNTINFO, COREDUMP_CWD, COREDUMP_ROOT,
COREDUMP_ENVIRON, COREDUMP_CONTAINER_CMDLINE). This could also be done in the
python process, but doing this in systemd-coredump saves quite a bit of
duplicate work and unifies the handling of various tricky fields like
COREDUMP_CONTAINER_CMDLINE in one place.

(Of course this applies to any other language which does not dump cores
but wants to log a traceback, e.g. ruby.)

journal entry:
    _TRANSPORT=journal
    _UID=1002
    _GID=1002
    _CAP_EFFECTIVE=0
    _AUDIT_LOGINUID=1002
    _SYSTEMD_OWNER_UID=1002
    _SYSTEMD_SLICE=user-1002.slice
    _SYSTEMD_USER_SLICE=-.slice
    _SELINUX_CONTEXT=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
    _BOOT_ID=1531fd22ec84429e85ae888b12fadb91
    _MACHINE_ID=519a16632fbd4c71966ce9305b360c9c
    _HOSTNAME=laptop
    _AUDIT_SESSION=1
    [email protected]
    _SYSTEMD_INVOCATION_ID=3c4238d790a44aca9576ecdb2c7576d3
    [email protected]
    COREDUMP_USER_UNIT=gnome-terminal-server.service
    COREDUMP_UID=1002
    COREDUMP_GID=1002
    COREDUMP_OWNER_UID=1002
    COREDUMP_SLICE=user-1002.slice
    COREDUMP_CGROUP=/user.slice/user-1002.slice/[email protected]/gnome-terminal-server.service
    COREDUMP_PROC_LIMITS=Limit                     Soft Limit           Hard Limit           Units
                         Max cpu time              unlimited            unlimited            seconds
                         Max file size             unlimited            unlimited            bytes
                         Max data size             unlimited            unlimited            bytes
                         Max stack size            8388608              unlimited            bytes
                         Max core file size        unlimited            unlimited            bytes
                         Max resident set          unlimited            unlimited            bytes
                         Max processes             15413                15413                processes
                         Max open files            4096                 4096                 files
                         Max locked memory         65536                65536                bytes
                         Max address space         unlimited            unlimited            bytes
                         Max file locks            unlimited            unlimited            locks
                         Max pending signals       15413                15413                signals
                         Max msgqueue size         819200               819200               bytes
                         Max nice priority         0                    0
                         Max realtime priority     0                    0
                         Max realtime timeout      unlimited            unlimited            us
    COREDUMP_PROC_CGROUP=1:name=systemd:/
                         0::/user.slice/user-1002.slice/[email protected]/gnome-terminal-server.service
    COREDUMP_PROC_MOUNTINFO=17 39 0:17 / /sys rw,nosuid,nodev,noexec,relatime shared:6 - sysfs sysfs rw,seclabel
                            18 39 0:4 / /proc rw,nosuid,nodev,noexec,relatime shared:5 - proc proc rw
                            19 39 0:6 / /dev rw,nosuid shared:2 - devtmpfs devtmpfs rw,seclabel,size=1972980k,nr_inodes=493245,mode=755
                            20 17 0:18 / /sys/kernel/security rw,nosuid,nodev,noexec,relatime shared:7 - securityfs securityfs rw
                            21 19 0:19 / /dev/shm rw,nosuid,nodev shared:3 - tmpfs tmpfs rw,seclabel
                            22 19 0:20 / /dev/pts rw,nosuid,noexec,relatime shared:4 - devpts devpts rw,seclabel,gid=5,mode=620,ptmxmode=000
                            23 39 0:21 / /run rw,nosuid,nodev shared:12 - tmpfs tmpfs rw,seclabel,mode=755
                            24 17 0:22 / /sys/fs/cgroup rw,nosuid,nodev,noexec,relatime shared:8 - cgroup2 cgroup rw
                            25 17 0:23 / /sys/fs/pstore rw,nosuid,nodev,noexec,relatime shared:9 - pstore pstore rw,seclabel
                            36 17 0:24 / /sys/kernel/config rw,relatime shared:10 - configfs configfs rw
                            39 0 0:26 /root / rw,relatime shared:1 - btrfs /dev/mapper/fedora-root2 rw,seclabel,ssd,space_cache,subvolid=257,subvol=/root
                            26 17 0:16 / /sys/fs/selinux rw,relatime shared:11 - selinuxfs selinuxfs rw
                            27 19 0:15 / /dev/mqueue rw,relatime shared:13 - mqueue mqueue rw,seclabel
                            28 18 0:30 / /proc/sys/fs/binfmt_misc rw,relatime shared:14 - autofs systemd-1 rw,fd=35,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=13663
                            29 17 0:7 / /sys/kernel/debug rw,relatime shared:15 - debugfs debugfs rw,seclabel
                            30 19 0:31 / /dev/hugepages rw,relatime shared:16 - hugetlbfs hugetlbfs rw,seclabel
                            31 18 0:32 / /proc/fs/nfsd rw,relatime shared:17 - nfsd nfsd rw
                            32 28 0:33 / /proc/sys/fs/binfmt_misc rw,relatime shared:18 - binfmt_misc binfmt_misc rw
                            57 39 0:34 / /tmp rw,relatime shared:19 - tmpfs none rw,seclabel
                            61 57 0:35 / /tmp/test rw,relatime shared:20 - autofs systemd-1 rw,fd=48,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=18251
                            59 39 8:1 / /boot rw,relatime shared:21 - ext4 /dev/sda1 rw,seclabel,data=ordered
                            60 39 253:2 / /home rw,relatime shared:22 - ext4 /dev/mapper/fedora-home rw,seclabel,data=ordered
                            65 39 0:37 / /var/lib/nfs/rpc_pipefs rw,relatime shared:23 - rpc_pipefs sunrpc rw
                            136 23 0:39 / /run/user/1002 rw,nosuid,nodev,relatime shared:91 - tmpfs tmpfs rw,seclabel,size=397432k,mode=700,uid=1002,gid=1002
                            211 23 0:41 / /run/user/42 rw,nosuid,nodev,relatime shared:163 - tmpfs tmpfs rw,seclabel,size=397432k,mode=700,uid=42,gid=42
                            329 136 0:44 / /run/user/1002/gvfs rw,nosuid,nodev,relatime shared:277 - fuse.gvfsd-fuse gvfsd-fuse rw,user_id=1002,group_id=1002
                            287 61 253:3 / /tmp/test rw,relatime shared:236 - ext4 /dev/mapper/fedora-test rw,seclabel,data=ordered
                            217 23 0:42 / /run/user/1000 rw,nosuid,nodev,relatime shared:168 - tmpfs tmpfs rw,seclabel,size=397432k,mode=700,uid=1000,gid=1000
                            225 217 0:43 / /run/user/1000/gvfs rw,nosuid,nodev,relatime shared:175 - fuse.gvfsd-fuse gvfsd-fuse rw,user_id=1000,group_id=1000
    COREDUMP_ROOT=/
    PRIORITY=2
    CODE_FILE=src/coredump/coredump.c
    SYSLOG_IDENTIFIER=lt-systemd-coredump
    _COMM=lt-systemd-core
    _SYSTEMD_CGROUP=/user.slice/user-1002.slice/[email protected]/gnome-terminal-server.service
    _SYSTEMD_USER_UNIT=gnome-terminal-server.service
    MESSAGE_ID=1f4e0a44a88649939aaea34fc6da8c95
    CODE_FUNC=process_traceback
    COREDUMP_COMM=python3
    COREDUMP_EXE=/usr/bin/python3.5
    COREDUMP_CMDLINE=python3 systemd_coredump_exception_handler.py
    COREDUMP_CWD=/home/zbyszek/src/systemd-coredump-python
    COREDUMP_RLIMIT=-1
    COREDUMP_OPEN_FDS=0:/dev/pts/1
                      pos:	0
                      flags:	0102002
                      mnt_id:	22

                      1:/dev/pts/1
                      pos:	0
                      flags:	0102002
                      mnt_id:	22

                      2:/dev/pts/1
                      pos:	0
                      flags:	0102002
                      mnt_id:	22
    CODE_LINE=1284
    COREDUMP_SIGNAL=ZeroDivisionError: division by zero
    COREDUMP_ENVIRON=LANG=en_US.utf8
                     DISPLAY=:0
                     ...
                     MANWIDTH=90
                     LC_MESSAGES=en_US.utf8
                     PYTHONPATH=.
                     _=/usr/bin/python3
    COREDUMP_PID=14498
    COREDUMP_PROC_STATUS=Name:	python3
                         Umask:	0002
                         State:	S (sleeping)
                         Tgid:	14498
                         Ngid:	0
                         Pid:	14498
                         PPid:	16245
                         TracerPid:	0
                         Uid:	1002	1002	1002	1002
                         Gid:	1002	1002	1002	1002
                         FDSize:	64
                         Groups:
                         NStgid:	14498
                         NSpid:	14498
                         NSpgid:	14498
                         NSsid:	16245
                         VmPeak:	   34840 kB
                         VmSize:	   34792 kB
                         VmLck:	       0 kB
                         VmPin:	       0 kB
                         VmHWM:	    9332 kB
                         VmRSS:	    9332 kB
                         RssAnon:	    4872 kB
                         RssFile:	    4460 kB
                         RssShmem:	       0 kB
                         VmData:	    5012 kB
                         VmStk:	     136 kB
                         VmExe:	       4 kB
                         VmLib:	    5452 kB
                         VmPTE:	      84 kB
                         VmPMD:	      12 kB
                         VmSwap:	       0 kB
                         HugetlbPages:	       0 kB
                         Threads:	1
                         SigQ:	0/15413
                         SigPnd:	0000000000000000
                         ShdPnd:	0000000000000000
                         SigBlk:	0000000000000000
                         SigIgn:	0000000001001000
                         SigCgt:	0000000180000002
                         CapInh:	0000000000000000
                         CapPrm:	0000000000000000
                         CapEff:	0000000000000000
                         CapBnd:	0000003fffffffff
                         CapAmb:	0000000000000000
                         Seccomp:	0
                         Cpus_allowed:	f
                         Cpus_allowed_list:	0-3
                         Mems_allowed:	00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
                         Mems_allowed_list:	0
                         voluntary_ctxt_switches:	2
                         nonvoluntary_ctxt_switches:	47
    COREDUMP_PROC_MAPS=55cb7b7fe000-55cb7b7ff000 r-xp 00000000 00:1a 5289186                    /usr/bin/python3.5
                       55cb7b9ff000-55cb7ba00000 r--p 00001000 00:1a 5289186                    /usr/bin/python3.5
                       55cb7ba00000-55cb7ba01000 rw-p 00002000 00:1a 5289186                    /usr/bin/python3.5
                       55cb7c007000-55cb7c189000 rw-p 00000000 00:00 0                          [heap]
                       7f4da2d51000-7f4da2d54000 r-xp 00000000 00:1a 5279150                    /usr/lib64/python3.5/lib-dynload/resource.cpython-35m-x86_64-linux-gnu.so
                       7f4da2d54000-7f4da2f53000 ---p 00003000 00:1a 5279150                    /usr/lib64/python3.5/lib-dynload/resource.cpython-35m-x86_64-linux-gnu.so
                       7f4da2f53000-7f4da2f54000 r--p 00002000 00:1a 5279150                    /usr/lib64/python3.5/lib-dynload/resource.cpython-35m-x86_64-linux-gnu.so
                       7f4da2f54000-7f4da2f55000 rw-p 00003000 00:1a 5279150                    /usr/lib64/python3.5/lib-dynload/resource.cpython-35m-x86_64-linux-gnu.so
                       7f4da2f55000-7f4da2f5d000 r-xp 00000000 00:1a 5279143                    /usr/lib64/python3.5/lib-dynload/math.cpython-35m-x86_64-linux-gnu.so
                       7f4da2f5d000-7f4da315c000 ---p 0000800 00:1a 5279143                    /usr/lib64/python3.5/lib-dynload/math.cpython-35m-x86_64-linux-gnu.so
                       7f4da315c000-7f4da315d000 r--p 00007000 00:1a 5279143                    /usr/lib64/python3.5/lib-dynload/math.cpython-35m-x86_64-linux-gnu.so
                       7f4da315d000-7f4da315f000 rw-p 0000800 00:1a 5279143                    /usr/lib64/python3.5/lib-dynload/math.cpython-35m-x86_64-linux-gnu.so
                       7f4da315f000-7f4da319f000 rw-p 00000000 00:00 0
                       7f4da319f000-7f4da31a4000 r-xp 00000000 00:1a 5279151                    /usr/lib64/python3.5/lib-dynload/select.cpython-35m-x86_64-linux-gnu.so
                       7f4da31a4000-7f4da33a3000 ---p 00005000 00:1a 5279151                    /usr/lib64/python3.5/lib-dynload/select.cpython-35m-x86_64-linux-gnu.so
                       7f4da33a3000-7f4da33a4000 r--p 0000400 00:1a 5279151                    /usr/lib64/python3.5/lib-dynload/select.cpython-35m-x86_64-linux-gnu.so
                       7f4da33a4000-7f4da33a6000 rw-p 00005000 00:1a 5279151                    /usr/lib64/python3.5/lib-dynload/select.cpython-35m-x86_64-linux-gnu.so
                       7f4da33a6000-7f4da33a9000 r-xp 00000000 00:1a 5279130                    /usr/lib64/python3.5/lib-dynload/_posixsubprocess.cpython-35m-x86_64-linux-gnu.so
                       7f4da33a9000-7f4da35a8000 ---p 00003000 00:1a 5279130                    /usr/lib64/python3.5/lib-dynload/_posixsubprocess.cpython-35m-x86_64-linux-gnu.so
                       7f4da35a8000-7f4da35a9000 r--p 00002000 00:1a 5279130                    /usr/lib64/python3.5/lib-dynload/_posixsubprocess.cpython-35m-x86_64-linux-gnu.so
                       7f4da35a9000-7f4da35aa000 rw-p 00003000 00:1a 5279130                    /usr/lib64/python3.5/lib-dynload/_posixsubprocess.cpython-35m-x86_64-linux-gnu.so
                       7f4da35aa000-7f4da362a000 rw-p 00000000 00:00 0
                       7f4da362a000-7f4da362c000 r-xp 00000000 00:1a 5279122                    /usr/lib64/python3.5/lib-dynload/_heapq.cpython-35m-x86_64-linux-gnu.so
                       7f4da362c000-7f4da382b000 ---p 00002000 00:1a 5279122                    /usr/lib64/python3.5/lib-dynload/_heapq.cpython-35m-x86_64-linux-gnu.so
                       7f4da382b000-7f4da382c000 r--p 00001000 00:1a 5279122                    /usr/lib64/python3.5/lib-dynload/_heapq.cpython-35m-x86_64-linux-gnu.so
                       7f4da382c000-7f4da382e000 rw-p 00002000 00:1a 5279122                    /usr/lib64/python3.5/lib-dynload/_heapq.cpython-35m-x86_64-linux-gnu.so
                       7f4da382e000-7f4da39ee000 rw-p 00000000 00:00 0
                       7f4da39ee000-7f4da3bab000 r-xp 00000000 00:1a 4844904                    /usr/lib64/libc-2.24.so
                       7f4da3bab000-7f4da3daa000 ---p 001bd000 00:1a 4844904                    /usr/lib64/libc-2.24.so
                       7f4da3daa000-7f4da3dae000 r--p 001bc000 00:1a 4844904                    /usr/lib64/libc-2.24.so
                       7f4da3dae000-7f4da3db0000 rw-p 001c0000 00:1a 4844904                    /usr/lib64/libc-2.24.so
                       7f4da3db0000-7f4da3db4000 rw-p 00000000 00:00 0
                       7f4da3db4000-7f4da3ebc000 r-xp 00000000 00:1a 4844910                    /usr/lib64/libm-2.24.so
                       7f4da3ebc000-7f4da40bb000 ---p 00108000 00:1a 4844910                    /usr/lib64/libm-2.24.so
                       7f4da40bb000-7f4da40bc000 r--p 00107000 00:1a 4844910                    /usr/lib64/libm-2.24.so
                       7f4da40bc000-7f4da40bd000 rw-p 00108000 00:1a 4844910                    /usr/lib64/libm-2.24.so
                       7f4da40bd000-7f4da40bf000 r-xp 00000000 00:1a 4844928                    /usr/lib64/libutil-2.24.so
                       7f4da40bf000-7f4da42be000 ---p 00002000 00:1a 4844928                    /usr/lib64/libutil-2.24.so
                       7f4da42be000-7f4da42bf000 r--p 00001000 00:1a 4844928                    /usr/lib64/libutil-2.24.so
                       7f4da42bf000-7f4da42c0000 rw-p 00002000 00:1a 4844928                    /usr/lib64/libutil-2.24.so
                       7f4da42c0000-7f4da42c3000 r-xp 00000000 00:1a 4844908                    /usr/lib64/libdl-2.24.so
                       7f4da42c3000-7f4da44c2000 ---p 00003000 00:1a 4844908                    /usr/lib64/libdl-2.24.so
                       7f4da44c2000-7f4da44c3000 r--p 00002000 00:1a 4844908                    /usr/lib64/libdl-2.24.so
                       7f4da44c3000-7f4da44c4000 rw-p 00003000 00:1a 4844908                    /usr/lib64/libdl-2.24.so
                       7f4da44c4000-7f4da44dc000 r-xp 00000000 00:1a 4844920                    /usr/lib64/libpthread-2.24.so
                       7f4da44dc000-7f4da46dc000 ---p 00018000 00:1a 4844920                    /usr/lib64/libpthread-2.24.so
                       7f4da46dc000-7f4da46dd000 r--p 00018000 00:1a 4844920                    /usr/lib64/libpthread-2.24.so
                       7f4da46dd000-7f4da46de000 rw-p 00019000 00:1a 4844920                    /usr/lib64/libpthread-2.24.so
                       7f4da46de000-7f4da46e2000 rw-p 00000000 00:00 0
                       7f4da46e2000-7f4da4917000 r-xp 00000000 00:1a 5277535                    /usr/lib64/libpython3.5m.so.1.0
                       7f4da4917000-7f4da4b17000 ---p 00235000 00:1a 5277535                    /usr/lib64/libpython3.5m.so.1.0
                       7f4da4b17000-7f4da4b1c000 r--p 00235000 00:1a 5277535                    /usr/lib64/libpython3.5m.so.1.0
                       7f4da4b1c000-7f4da4b7f000 rw-p 0023a000 00:1a 5277535                    /usr/lib64/libpython3.5m.so.1.0
                       7f4da4b7f000-7f4da4baf000 rw-p 00000000 00:00 0
                       7f4da4baf000-7f4da4bd4000 r-xp 00000000 00:1a 4844897                    /usr/lib64/ld-2.24.so
                       7f4da4bdf000-7f4da4c10000 rw-p 00000000 00:00 0
                       7f4da4c10000-7f4da4c61000 r--p 00000000 00:1a 5225117                    /usr/lib/locale/pl_PL.utf8/LC_CTYPE
                       7f4da4c61000-7f4da4d91000 r--p 00000000 00:1a 4844827                    /usr/lib/locale/en_US.utf8/LC_COLLATE
                       7f4da4d91000-7f4da4d95000 rw-p 00000000 00:00 0
                       7f4da4dc1000-7f4da4dc2000 r--p 00000000 00:1a 4844832                    /usr/lib/locale/en_US.utf8/LC_NUMERIC
                       7f4da4dc2000-7f4da4dc3000 r--p 00000000 00:1a 4844795                    /usr/lib/locale/en_US.utf8/LC_TIME
                       7f4da4dc3000-7f4da4dc4000 r--p 00000000 00:1a 4844793                    /usr/lib/locale/en_US.utf8/LC_MONETARY
                       7f4da4dc4000-7f4da4dc5000 r--p 00000000 00:1a 4844830                    /usr/lib/locale/en_US.utf8/LC_MESSAGES/SYS_LC_MESSAGES
                       7f4da4dc5000-7f4da4dc6000 r--p 00000000 00:1a 4844847                    /usr/lib/locale/en_US.utf8/LC_PAPER
                       7f4da4dc6000-7f4da4dc7000 r--p 00000000 00:1a 4844831                    /usr/lib/locale/en_US.utf8/LC_NAME
                       7f4da4dc7000-7f4da4dc8000 r--p 00000000 00:1a 4844790                    /usr/lib/locale/en_US.utf8/LC_ADDRESS
                       7f4da4dc8000-7f4da4dc9000 r--p 00000000 00:1a 4844794                    /usr/lib/locale/en_US.utf8/LC_TELEPHONE
                       7f4da4dc9000-7f4da4dca000 r--p 00000000 00:1a 4844792                    /usr/lib/locale/en_US.utf8/LC_MEASUREMENT
                       7f4da4dca000-7f4da4dd1000 r--s 00000000 00:1a 4845203                    /usr/lib64/gconv/gconv-modules.cache
                       7f4da4dd1000-7f4da4dd2000 r--p 00000000 00:1a 4844791                    /usr/lib/locale/en_US.utf8/LC_IDENTIFICATION
                       7f4da4dd2000-7f4da4dd4000 rw-p 00000000 00:00 0
                       7f4da4dd4000-7f4da4dd5000 r--p 00025000 00:1a 4844897                    /usr/lib64/ld-2.24.so
                       7f4da4dd5000-7f4da4dd6000 rw-p 00026000 00:1a 4844897                    /usr/lib64/ld-2.24.so
                       7f4da4dd6000-7f4da4dd7000 rw-p 00000000 00:00 0
                       7ffd24da1000-7ffd24dc2000 rw-p 00000000 00:00 0                          [stack]
                       7ffd24de8000-7ffd24dea000 r--p 00000000 00:00 0                          [vvar]
                       7ffd24dea000-7ffd24dec000 r-xp 00000000 00:00 0                          [vdso]
                       ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
    COREDUMP_TIMESTAMP=1477877460000000
    MESSAGE=Process 14498 (python3) of user 1002 failed with ZeroDivisionError: division by zero:

            Traceback (most recent call last):
              File "systemd_coredump_exception_handler.py", line 89, in <module>
                g()
              File "systemd_coredump_exception_handler.py", line 88, in g
                f()
              File "systemd_coredump_exception_handler.py", line 86, in f
                div0 = 1 / 0  # pylint: disable=W0612
            ZeroDivisionError: division by zero

            Local variables in innermost frame:
              h=<function f at 0x7f4da3606e18>
              a=3
    _PID=14499
    _SOURCE_REALTIME_TIMESTAMP=1477877460025975
…rter code

Only one test case is added, but it is enough to check basic sanity of the
code (single-line and binary fields and trusted fields, allocation and freeing).
The entry must be a single entry in the journal export format, including the
terminating double newline. The MESSAGE field is now generated on the sender
side.

The advantage is that the reporter can easily pass additional metadata.
Continuing with the example of the python excepthook:

COREDUMP_PYTHON_EXECUTABLE=/usr/bin/python3
COREDUMP_PYTHON_VERSION=3.5.2 (default, Sep 14 2016, 11:28:32)
                        [GCC 6.2.1 20160901 (Red Hat 6.2.1-1)]
COREDUMP_PYTHON_THREAD_INFO=sys.thread_info(name='pthread', lock='semaphore', version='NPTL 2.24')
COREDUMP_PYTHON_EXCEPTION_TYPE=ZeroDivisionError
COREDUMP_PYTHON_EXCEPTION_VALUE=division by zero
MESSAGE=Process 29514 (systemd_coredump_exception_handler.py) of user zbyszek failed with ZeroDivisionError: division by zero

        Traceback (most recent call last):
          File "systemd_coredump_exception_handler.py", line 134, in <module>
            g()
          File "systemd_coredump_exception_handler.py", line 133, in g
            f()
          File "systemd_coredump_exception_handler.py", line 131, in f
            div0 = 1 / 0
        ZeroDivisionError: division by zero

        Local variables in innermost frame:
          a=3
          h=<function f at 0x7efdc14b6ea0>

One consideration is whether to use the Journal Export Format, or send packets
over a UNIX socket instead. The advantage of current solution is that although
parsing is more complicated on the receiver side, it is much easier to use on the
sender side. I hope this can be used by various languages for which writing
binary structures to a UNIX socket is harder and more likely to be done wrong
than piping of a simple textyish format.
No functional change, and we don't lose match order.
Embedding sd_id128_t's in constant strings was rather cumbersome. We had
SD_ID128_CONST_STR which returned a const char[], but it had two problems:
- it wasn't possible to statically concatanate this array with a normal string
- gcc wasn't really able to optimize this, and generated code to perform the
  "conversion" at runtime.
Because of this, even our own code in coredumpctl wasn't using
SD_ID128_CONST_STR.

Add a new macro to generate a constant string: SD_ID128_MAKE_STR.
It is not as elegant as SD_ID128_CONST_STR, because it requires a repetition
of the numbers, but in practice it is more convenient to use, and allows gcc
to generate smarter code:

$ size .libs/systemd{,-logind,-journald}{.old,}
   text	   data	    bss	    dec	    hex	filename
1265204	 149564	   4808	1419576	 15a938	.libs/systemd.old
1260268	 149564	   4808	1414640	 1595f0	.libs/systemd
 246805	  13852	    209	 260866	  3fb02	.libs/systemd-logind.old
 240973	  13852	    209	 255034	  3e43a	.libs/systemd-logind
 146839	   4984	     34	 151857	  25131	.libs/systemd-journald.old
 146391	   4984	     34	 151409	  24f71	.libs/systemd-journald

It is also much easier to check if a certain binary uses a certain MESSAGE_ID:

$ strings .libs/systemd.old|grep MESSAGE_ID
MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x
MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x
MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x
MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x

$ strings .libs/systemd|grep MESSAGE_ID
MESSAGE_ID=c7a787079b354eaaa9e77b371893cd27
MESSAGE_ID=b07a249cd024414a82dd00cd181378ff
MESSAGE_ID=641257651c1b4ec9a8624d7a40a9e1e7
MESSAGE_ID=de5b426a63be47a7b6ac3eaac82e2f6f
MESSAGE_ID=d34d037fff1847e6ae669a370e694725
MESSAGE_ID=7d4958e842da4a758f6c1cdc7b36dcc5
MESSAGE_ID=1dee0369c7fc4736b7099b38ecb46ee7
MESSAGE_ID=39f53479d3a045ac8e11786248231fbf
MESSAGE_ID=be02cf6855d2428ba40df7e9d022f03d
MESSAGE_ID=7b05ebc668384222baa8881179cfda54
MESSAGE_ID=9d1aaa27d60140bd96365438aad20286
$ ./coredumpctl --no-pager -1
TIME                            PID   UID   GID SIG COREFILE EXE
Sun 2016-11-06 10:10:51 EST   29514  1002  1002   - -        /usr/bin/python3.5

$ ./coredumpctl info 29514
           PID: 29514 (python3)
           UID: 1002 (zbyszek)
           GID: 1002 (zbyszek)
        Reason: ZeroDivisionError
     Timestamp: Sun 2016-11-06 10:10:51 EST (3h 22min ago)
  Command Line: python3 systemd_coredump_exception_handler.py
    Executable: /usr/bin/python3.5
 Control Group: /user.slice/user-1002.slice/[email protected]/gnome-terminal-server.service
          Unit: [email protected]
     User Unit: gnome-terminal-server.service
         Slice: user-1002.slice
     Owner UID: 1002 (zbyszek)
       Boot ID: 1531fd22ec84429e85ae888b12fadb91
    Machine ID: 519a16632fbd4c71966ce9305b360c9c
      Hostname: laptop
       Storage: none
       Message: Process 29514 (systemd_coredump_exception_handler.py) of user zbyszek failed with ZeroDivisionError: division by

                Traceback (most recent call last):
                  File "systemd_coredump_exception_handler.py", line 134, in <module>
                    g()
                  File "systemd_coredump_exception_handler.py", line 133, in g
                    f()
                  File "systemd_coredump_exception_handler.py", line 131, in f
                    div0 = 1 / 0
                ZeroDivisionError: division by zero

                Local variables in innermost frame:
                  a=3
                  h=<function f at 0x7efdc14b6ea0>
@keszybz
Copy link
Member Author

keszybz commented Feb 16, 2017

Pushed with all the suggested changes. I also tested this a bit, and systemd-journal-upload, and everything seems to work.

Assertion 'glob(TEST_DATA_DIR("/test-resolve/*.pkts"), GLOB_NOSORT, NULL, &g) == 0' failed at ../src/resolve/test-dns-packet.c:120, function main(). Aborting.
EXFAIL: test-dns-packet
...
root-unittests       FAIL non-zero exit status 134

I think that's unrelated, but I'm not entirely sure.

@poettering poettering merged commit 2fe917f into systemd:master Feb 16, 2017
@keszybz keszybz deleted the coredump-python branch February 19, 2017 03:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

2 participants