Fix: Timestamp inconsistency between coredump filename and fileattrib…#4779
Fix: Timestamp inconsistency between coredump filename and fileattrib…#4779PRDinesh wants to merge 1 commit intosystemd:masterfrom
Conversation
…utes This fix is addressing the following issues : 1. The time-stamp in the core dump filename consists of 22 digits, 10 digit in seconds resolution followed by 12 zeros, where as the time-stamp in the file attributes contains 16 digits, ie., in microsecond resolution. 2. The core dump file attribute of special process (Journald,etc., ) contains 10 digit time-stamp where as other process core dump file attribute contains 16 digit time-stamp. 3. The signal number which is the reason of the core dump is also added to the file name, so that from the name of the file we could get the reason for the crash.
|
|
||
| s = filename_escape(context[CONTEXT_SIGNAL]); | ||
| if (!s) | ||
| return -ENOMEM; |
There was a problem hiding this comment.
I don't think this change is useful. Those names are very long already, and making them even longer is not nice.
The signal that killed the process is already recorded in the core file itself, so even if the extended attribute is lost, this information is available.
There was a problem hiding this comment.
I agree, but adding signal number is going to increase the file name size by just 3 bytes. Even though we have the crash reason inside the core file, but that would require us to unxz it, launch gdb to get the reason. Instead if we have this information in the file name, then it would help us to get the reason quickly
There was a problem hiding this comment.
Oooh, it's stored as a number? That's even worse, because signal numbers are not portable between architectures, so if it's copied somewhere else, the number becomes meaningless.
There was a problem hiding this comment.
Ok, Thank you, I was not aware of this portability issue.
| context[CONTEXT_TIMESTAMP] = argv[CONTEXT_TIMESTAMP + 1]; | ||
| strncat(timestamp, argv[CONTEXT_TIMESTAMP + 1], 10); | ||
| strncat(timestamp, "000000", 6); | ||
| context[CONTEXT_TIMESTAMP] = timestamp; |
There was a problem hiding this comment.
Don't use strncat. Instead do:
timestamp = strjoina(argv[CONTEXT_TIMESTAMP + 1], "000000");
There was a problem hiding this comment.
does the strjoina allocated the newly required memory from heap? do we need to de allocate them? I am assuming it is taking from stack, please clarify
There was a problem hiding this comment.
From the stack, no need to allocate (or deallocate) anything.
There was a problem hiding this comment.
@keszybz. Today I tested this patch on my fedora machine and found that with this patch the extended attributes are not getting stored on the coredump. The fsetxattr function is returning -1 and the error string is " Operation not supported". I am not sure on how my changes broke the xattr. Without my changes it is working fine. Do you see any mistake here which would brake xattr?
There was a problem hiding this comment.
@keszybz , Sorry the xattr issue is not due to this patch. The coredumps are stored in tmpfs. Looks like that tmpfs is not supporting xattrs. I tried setting xattrs on normal files, even there it is not getting set.
|
The commit message should actually describe the effect of the change you are making... You give the motivation, i.e. what was wrong, but not how it is fixed. |
|
So, in systemd we generally handle all time values in µs, and that's why we extend the more coarse time values passed by the kernel with trailing zeroes, when we use them for our own purposes. The xattrs on the other hand are supposed to carry the raw kernel values, exactly the way the were passed to us, hence there we don't extend them to become µs. I must say I much prefer if we'd exclusively continue to use µs for all our internal purposes, and that includes the COREDUMP_TIMESTAMP journal field, as well as the coredump file name. And I also think storing the raw kernel data as xattrs makes some sense too. Hence I am not convinced this change would be a good idea. Or to say this differently: yes, it sucks to simply zero-extend the coarse time values, but I think the right way to fix this would be to fix the kernel to provide userspace with more accurate time values for this, like it tends to do for pretty much everything else... |
|
@poettering . Thank you for the insight. Also timestamp in case of journald coredump is different than other coredumps. I think it is because the of the fact that process_journal was called before the change of the timestamp value. |
|
Yeah, obviously the format should be the same for "special" coredumps like As to the format: the value in µs is 16 digits. It seems we actually store the time using 16 digits in
No, everything seems kosher. But I don't want to debug this, if you're going to change the patch anyway. Can you repost the patch without the part that adds the signal number to the filename? A separate issue is that we store the signal as a number. It'd be nice to (also?) store it as a signal name, so that we can move the journal entry between different architectures and report the signal correctly. |
Sure. Happy to do that, Shall i use In summary I will submit three patches separately
|
|
Sounds good.
No idea. I don't even know why we have |
signal_to_string() turns 2 into "SIGINT". strsignal() turns 2 into "Interrupted". In pretty much all cases strsignal() is hence not what you want, and signal_to_string() is... |
|
@poettering , I am sorry, not able to correctly understand your comment on strsignal. Do you suggest to use strsignal or signal_to_string? |
|
|
|
@PRDinesh any progress? |
…off trailing zeroes Our coredump handler operates on a "context" supplied by the kernel via the core_pattern arguments. When we pass off a coredump for processing to coredumpd we pass along enough information for this context to be reconstructed. This information is passed in the usual journal fields, and that means we extended the 1s granularity timestamp to 1µs granularity by appending 6 zeroes. We need to chop them off again when reconstructing the original kernel context. Fixes: systemd#4779
|
I have now posted #5373, which should fix the timestamping issue this is primarily about. Please have a look. I am taking the liberty to close this one now, in favour of the new PR. I hope that's OK! |
This fix is addressing the following issues :
The time-stamp in the core dump filename consists of 22 digits, 10 digit in seconds resolution followed by 12 zeros, where as the time-stamp in the file attributes contains 16 digits, ie., in microsecond resolution.
The core dump file attribute of special process (Journald,etc., ) contains 10 digit time-stamp where as other process core dump file attribute contains 16 digit time-stamp.
The signal number which is the reason of the core dump is also added to the file name, so that from the name of the file we could get the reason for the crash. Currently it is available as part of the extended attributes, but some times it gets lost when the files are moved across different file-systems. Hence it is better to preserve this information in the file name itself