Skip to content

Fix: Timestamp inconsistency between coredump filename and fileattrib…#4779

Closed
PRDinesh wants to merge 1 commit intosystemd:masterfrom
PRDinesh:PRDinesh-coredump-timestamp-patch
Closed

Fix: Timestamp inconsistency between coredump filename and fileattrib…#4779
PRDinesh wants to merge 1 commit intosystemd:masterfrom
PRDinesh:PRDinesh-coredump-timestamp-patch

Conversation

@PRDinesh
Copy link

@PRDinesh PRDinesh commented Dec 1, 2016

This fix is addressing the following issues :

  1. The time-stamp in the core dump filename consists of 22 digits, 10 digit in seconds resolution followed by 12 zeros, where as the time-stamp in the file attributes contains 16 digits, ie., in microsecond resolution.

  2. The core dump file attribute of special process (Journald,etc., ) contains 10 digit time-stamp where as other process core dump file attribute contains 16 digit time-stamp.

  3. The signal number which is the reason of the core dump is also added to the file name, so that from the name of the file we could get the reason for the crash. Currently it is available as part of the extended attributes, but some times it gets lost when the files are moved across different file-systems. Hence it is better to preserve this information in the file name itself

…utes

This fix is addressing the following issues :

1. The time-stamp in the core dump filename consists of 22 digits, 10 digit in seconds resolution followed by 12 zeros, where as the time-stamp in the file attributes contains 16 digits, ie., in microsecond resolution.

2. The core dump file attribute of special process (Journald,etc., ) contains 10 digit time-stamp where as other process core dump file attribute contains 16 digit time-stamp.

3. The signal number which is the reason of the core dump is also added to the file name, so that from the name of the file we could get the reason for the crash.

s = filename_escape(context[CONTEXT_SIGNAL]);
if (!s)
return -ENOMEM;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this change is useful. Those names are very long already, and making them even longer is not nice.
The signal that killed the process is already recorded in the core file itself, so even if the extended attribute is lost, this information is available.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but adding signal number is going to increase the file name size by just 3 bytes. Even though we have the crash reason inside the core file, but that would require us to unxz it, launch gdb to get the reason. Instead if we have this information in the file name, then it would help us to get the reason quickly

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oooh, it's stored as a number? That's even worse, because signal numbers are not portable between architectures, so if it's copied somewhere else, the number becomes meaningless.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, Thank you, I was not aware of this portability issue.

context[CONTEXT_TIMESTAMP] = argv[CONTEXT_TIMESTAMP + 1];
strncat(timestamp, argv[CONTEXT_TIMESTAMP + 1], 10);
strncat(timestamp, "000000", 6);
context[CONTEXT_TIMESTAMP] = timestamp;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use strncat. Instead do:

timestamp = strjoina(argv[CONTEXT_TIMESTAMP + 1], "000000");

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does the strjoina allocated the newly required memory from heap? do we need to de allocate them? I am assuming it is taking from stack, please clarify

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the stack, no need to allocate (or deallocate) anything.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@keszybz. Today I tested this patch on my fedora machine and found that with this patch the extended attributes are not getting stored on the coredump. The fsetxattr function is returning -1 and the error string is " Operation not supported". I am not sure on how my changes broke the xattr. Without my changes it is working fine. Do you see any mistake here which would brake xattr?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@keszybz , Sorry the xattr issue is not due to this patch. The coredumps are stored in tmpfs. Looks like that tmpfs is not supporting xattrs. I tried setting xattrs on normal files, even there it is not getting set.

@keszybz
Copy link
Member

keszybz commented Dec 2, 2016

The commit message should actually describe the effect of the change you are making... You give the motivation, i.e. what was wrong, but not how it is fixed.

@keszybz keszybz added coredump reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks labels Dec 2, 2016
@poettering
Copy link
Member

So, in systemd we generally handle all time values in µs, and that's why we extend the more coarse time values passed by the kernel with trailing zeroes, when we use them for our own purposes.

The xattrs on the other hand are supposed to carry the raw kernel values, exactly the way the were passed to us, hence there we don't extend them to become µs.

I must say I much prefer if we'd exclusively continue to use µs for all our internal purposes, and that includes the COREDUMP_TIMESTAMP journal field, as well as the coredump file name. And I also think storing the raw kernel data as xattrs makes some sense too. Hence I am not convinced this change would be a good idea.

Or to say this differently: yes, it sucks to simply zero-extend the coarse time values, but I think the right way to fix this would be to fix the kernel to provide userspace with more accurate time values for this, like it tends to do for pretty much everything else...

@PRDinesh
Copy link
Author

PRDinesh commented Dec 2, 2016

@poettering . Thank you for the insight.
But currently the file name contains 22 digit time stamp, ie., 12 zeros added to the end and the file xattr contains 16 digit timestamp with 6 zeros appended. Is this the intended behaviour? In v219 the timestamp is working as you have explained above, but in v229 it is not.

Also timestamp in case of journald coredump is different than other coredumps. I think it is because the of the fact that process_journal was called before the change of the timestamp value.

@keszybz
Copy link
Member

keszybz commented Dec 3, 2016

Yeah, obviously the format should be the same for "special" coredumps like systemd-journald and other programs. It would be nice to see a separate commit which fixes that.

As to the format: the value in µs is 16 digits. It seems we actually store the time using 16 digits in user.coredump.timestamp xattr, and also in COREDUMP_TIMESTAMP in the journal, but we use 22 digits in the filename. So just changing the filename to also have 16 digits seems most reasonable.

Do you see any mistake here which would brake xattr?

No, everything seems kosher. But I don't want to debug this, if you're going to change the patch anyway. Can you repost the patch without the part that adds the signal number to the filename?

A separate issue is that we store the signal as a number. It'd be nice to (also?) store it as a signal name, so that we can move the journal entry between different architectures and report the signal correctly.

@PRDinesh
Copy link
Author

PRDinesh commented Dec 4, 2016

@keszybz

A separate issue is that we store the signal as a number. It'd be nice to (also?) store it as a signal name, so that we can move the journal entry between different architectures and report the signal correctly.

Sure. Happy to do that, Shall i use strsignal or signal_to_string to convert the signal number to string?

In summary I will submit three patches separately

  1. Same timestamp format for special and other program's coredump
  2. Change the file name to use 16 digit timestamp
  3. Convert the signal number to string

@keszybz
Copy link
Member

keszybz commented Dec 5, 2016

Sounds good.

Shall i use strsignal or signal_to_string to convert the signal number to string?

No idea. I don't even know why we have signal_to_string if strsignal exists.

@poettering
Copy link
Member

No idea. I don't even know why we have signal_to_string if strsignal exists.

signal_to_string() turns 2 into "SIGINT". strsignal() turns 2 into "Interrupted". In pretty much all cases strsignal() is hence not what you want, and signal_to_string() is...

@PRDinesh
Copy link
Author

PRDinesh commented Dec 8, 2016

@poettering , I am sorry, not able to correctly understand your comment on strsignal. Do you suggest to use strsignal or signal_to_string?

@keszybz
Copy link
Member

keszybz commented Dec 8, 2016

signal_to_string.

@keszybz keszybz added this to the v233 milestone Dec 12, 2016
@keszybz
Copy link
Member

keszybz commented Feb 3, 2017

@PRDinesh any progress?

poettering added a commit to poettering/systemd that referenced this pull request Feb 17, 2017
…off trailing zeroes

Our coredump handler operates on a "context" supplied by the kernel via
the core_pattern arguments. When we pass off a coredump for processing
to coredumpd we pass along enough information for this context to be
reconstructed. This information is passed in the usual journal fields,
and that means we extended the 1s granularity timestamp to 1µs
granularity by appending 6 zeroes. We need to chop them off again when
reconstructing the original kernel context.

Fixes: systemd#4779
@poettering
Copy link
Member

I have now posted #5373, which should fix the timestamping issue this is primarily about. Please have a look. I am taking the liberty to close this one now, in favour of the new PR. I hope that's OK!

@poettering poettering closed this Feb 17, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

coredump reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks

Development

Successfully merging this pull request may close these issues.

3 participants