Fix: Timestamp inconsistency between coredump filename and fileattrib… by PRDinesh · Pull Request #4779 · systemd/systemd

PRDinesh · 2016-12-01T06:01:34Z

This fix is addressing the following issues :

The time-stamp in the core dump filename consists of 22 digits, 10 digit in seconds resolution followed by 12 zeros, where as the time-stamp in the file attributes contains 16 digits, ie., in microsecond resolution.
The core dump file attribute of special process (Journald,etc., ) contains 10 digit time-stamp where as other process core dump file attribute contains 16 digit time-stamp.
The signal number which is the reason of the core dump is also added to the file name, so that from the name of the file we could get the reason for the crash. Currently it is available as part of the extended attributes, but some times it gets lost when the files are moved across different file-systems. Hence it is better to preserve this information in the file name itself

…utes This fix is addressing the following issues : 1. The time-stamp in the core dump filename consists of 22 digits, 10 digit in seconds resolution followed by 12 zeros, where as the time-stamp in the file attributes contains 16 digits, ie., in microsecond resolution. 2. The core dump file attribute of special process (Journald,etc., ) contains 10 digit time-stamp where as other process core dump file attribute contains 16 digit time-stamp. 3. The signal number which is the reason of the core dump is also added to the file name, so that from the name of the file we could get the reason for the crash.

keszybz · 2016-12-02T01:03:36Z

src/coredump/coredump.c


+        s = filename_escape(context[CONTEXT_SIGNAL]);
+        if (!s)
+                return -ENOMEM;


I don't think this change is useful. Those names are very long already, and making them even longer is not nice.
The signal that killed the process is already recorded in the core file itself, so even if the extended attribute is lost, this information is available.

I agree, but adding signal number is going to increase the file name size by just 3 bytes. Even though we have the crash reason inside the core file, but that would require us to unxz it, launch gdb to get the reason. Instead if we have this information in the file name, then it would help us to get the reason quickly

Oooh, it's stored as a number? That's even worse, because signal numbers are not portable between architectures, so if it's copied somewhere else, the number becomes meaningless.

Ok, Thank you, I was not aware of this portability issue.

keszybz · 2016-12-02T01:17:00Z

src/coredump/coredump.c

-        context[CONTEXT_TIMESTAMP] = argv[CONTEXT_TIMESTAMP + 1];
+        strncat(timestamp, argv[CONTEXT_TIMESTAMP + 1], 10);
+        strncat(timestamp, "000000", 6);
+        context[CONTEXT_TIMESTAMP] = timestamp;


Don't use strncat. Instead do:

timestamp = strjoina(argv[CONTEXT_TIMESTAMP + 1], "000000");

does the strjoina allocated the newly required memory from heap? do we need to de allocate them? I am assuming it is taking from stack, please clarify

From the stack, no need to allocate (or deallocate) anything.

@keszybz. Today I tested this patch on my fedora machine and found that with this patch the extended attributes are not getting stored on the coredump. The fsetxattr function is returning -1 and the error string is " Operation not supported". I am not sure on how my changes broke the xattr. Without my changes it is working fine. Do you see any mistake here which would brake xattr?

@keszybz , Sorry the xattr issue is not due to this patch. The coredumps are stored in tmpfs. Looks like that tmpfs is not supporting xattrs. I tried setting xattrs on normal files, even there it is not getting set.

keszybz · 2016-12-02T01:18:00Z

The commit message should actually describe the effect of the change you are making... You give the motivation, i.e. what was wrong, but not how it is fixed.

poettering · 2016-12-02T14:46:16Z

So, in systemd we generally handle all time values in µs, and that's why we extend the more coarse time values passed by the kernel with trailing zeroes, when we use them for our own purposes.

The xattrs on the other hand are supposed to carry the raw kernel values, exactly the way the were passed to us, hence there we don't extend them to become µs.

I must say I much prefer if we'd exclusively continue to use µs for all our internal purposes, and that includes the COREDUMP_TIMESTAMP journal field, as well as the coredump file name. And I also think storing the raw kernel data as xattrs makes some sense too. Hence I am not convinced this change would be a good idea.

Or to say this differently: yes, it sucks to simply zero-extend the coarse time values, but I think the right way to fix this would be to fix the kernel to provide userspace with more accurate time values for this, like it tends to do for pretty much everything else...

PRDinesh · 2016-12-02T17:43:12Z

@poettering . Thank you for the insight.
But currently the file name contains 22 digit time stamp, ie., 12 zeros added to the end and the file xattr contains 16 digit timestamp with 6 zeros appended. Is this the intended behaviour? In v219 the timestamp is working as you have explained above, but in v229 it is not.

Also timestamp in case of journald coredump is different than other coredumps. I think it is because the of the fact that process_journal was called before the change of the timestamp value.

keszybz · 2016-12-03T18:07:21Z

Yeah, obviously the format should be the same for "special" coredumps like systemd-journald and other programs. It would be nice to see a separate commit which fixes that.

As to the format: the value in µs is 16 digits. It seems we actually store the time using 16 digits in user.coredump.timestamp xattr, and also in COREDUMP_TIMESTAMP in the journal, but we use 22 digits in the filename. So just changing the filename to also have 16 digits seems most reasonable.

Do you see any mistake here which would brake xattr?

No, everything seems kosher. But I don't want to debug this, if you're going to change the patch anyway. Can you repost the patch without the part that adds the signal number to the filename?

A separate issue is that we store the signal as a number. It'd be nice to (also?) store it as a signal name, so that we can move the journal entry between different architectures and report the signal correctly.

PRDinesh · 2016-12-04T06:12:41Z

@keszybz

A separate issue is that we store the signal as a number. It'd be nice to (also?) store it as a signal name, so that we can move the journal entry between different architectures and report the signal correctly.

Sure. Happy to do that, Shall i use strsignal or signal_to_string to convert the signal number to string?

In summary I will submit three patches separately

Same timestamp format for special and other program's coredump
Change the file name to use 16 digit timestamp
Convert the signal number to string

keszybz · 2016-12-05T14:42:36Z

Sounds good.

Shall i use strsignal or signal_to_string to convert the signal number to string?

No idea. I don't even know why we have signal_to_string if strsignal exists.

poettering · 2016-12-05T16:38:55Z

No idea. I don't even know why we have signal_to_string if strsignal exists.

signal_to_string() turns 2 into "SIGINT". strsignal() turns 2 into "Interrupted". In pretty much all cases strsignal() is hence not what you want, and signal_to_string() is...

PRDinesh · 2016-12-08T17:51:53Z

@poettering , I am sorry, not able to correctly understand your comment on strsignal. Do you suggest to use strsignal or signal_to_string?

keszybz · 2016-12-08T21:00:33Z

signal_to_string.

keszybz · 2017-02-03T01:49:51Z

@PRDinesh any progress?

…off trailing zeroes Our coredump handler operates on a "context" supplied by the kernel via the core_pattern arguments. When we pass off a coredump for processing to coredumpd we pass along enough information for this context to be reconstructed. This information is passed in the usual journal fields, and that means we extended the 1s granularity timestamp to 1µs granularity by appending 6 zeroes. We need to chop them off again when reconstructing the original kernel context. Fixes: systemd#4779

poettering · 2017-02-17T10:45:54Z

I have now posted #5373, which should fix the timestamping issue this is primarily about. Please have a look. I am taking the liberty to close this one now, in favour of the new PR. I hope that's OK!

keszybz requested changes Dec 2, 2016

View reviewed changes

keszybz added coredump reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks labels Dec 2, 2016

poettering added the needs-discussion 🤔 label Dec 2, 2016

keszybz removed the needs-discussion 🤔 label Dec 5, 2016

keszybz added this to the v233 milestone Dec 12, 2016

poettering mentioned this pull request Feb 17, 2017

various coredump fixes #5373

Merged

poettering closed this Feb 17, 2017

Uh oh!

Conversation

PRDinesh commented Dec 1, 2016

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

keszybz commented Dec 2, 2016

Uh oh!

poettering commented Dec 2, 2016

Uh oh!

PRDinesh commented Dec 2, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

keszybz commented Dec 3, 2016

Uh oh!

PRDinesh commented Dec 4, 2016

Uh oh!

keszybz commented Dec 5, 2016

Uh oh!

poettering commented Dec 5, 2016

Uh oh!

PRDinesh commented Dec 8, 2016

Uh oh!

keszybz commented Dec 8, 2016

Uh oh!

keszybz commented Feb 3, 2017

Uh oh!

poettering commented Feb 17, 2017

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

3 participants

PRDinesh commented Dec 2, 2016 •

edited

Loading