Use double_fork() for gzip compression#1223
Conversation
WalkthroughRefactored forked checkpointing to use a type-safe enum and a new double_fork() helper, reorganized SIGCHLD handling and delivery, updated DmtcpWorker::postCheckpoint to support deferred rename under forked mode, and made several local variable type adjustments for consistency. Changes
Sequence Diagram(s)sequenceDiagram
participant Caller
participant open_ckpt as open_ckpt_to_write()
participant double_fork as double_fork()
participant Parent
participant Child
participant Grandchild
Caller->>open_ckpt: initiate fork-based checkpoint
open_ckpt->>double_fork: request two-stage fork
Note over double_fork: First fork -> creates Child
double_fork->>Child: second fork
alt Grandchild path
double_fork-->>Grandchild: return FORKED_CKPT_GRANDCHILD
Grandchild->>open_ckpt: perform checkpoint write/rename (grandchild responsibility)
else Parent path
double_fork-->>Parent: return FORKED_CKPT_PARENT
Parent->>open_ckpt: continue, wait for SIGCHLD delivery handling
else Failure
double_fork-->>open_ckpt: return FORKED_CKPT_FAILED
open_ckpt->>Caller: error handling
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (3)
💤 Files with no reviewable changes (1)
🚧 Files skipped from review as they are similar to previous changes (2)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
1ac1617 to
8b26c5d
Compare
* This fixes a bug, when using forked checkpointing.
Previously, we forked a gzip, and did a waitpid for it.
So, the parent process was stuck in waitpid while the
forked process writing to a ckpt image.
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/ckptserializer.cpp (1)
406-417: After rename in GRANDCHILD, notify coordinator (currently missing).Grandchild renames
.tmp-> final, then_exit(0). To keep coordinator state consistent—especially if parent sent.tmpearlier—send the final filename here.- JASSERT(rename(ProcessInfo::instance().getTempCkptFilename().c_str(), - ProcessInfo::instance().getCkptFilename().c_str()) == 0); + JASSERT(rename(ProcessInfo::instance().getTempCkptFilename().c_str(), + ProcessInfo::instance().getCkptFilename().c_str()) == 0); + // Inform coordinator that final filename is now in place. + CoordinatorAPI::sendCkptFilename();
🧹 Nitpick comments (7)
src/wrappers.cpp (1)
404-407: Loop var type: prefer deduced or signed to match container element type.
vector<int> fdsis iterated withunsigned int fd. Not wrong (FDs are non-negative), but it forces a cast and can confuse overload resolution (e.g., passing to functions expectingint). Useauto fdorint fdfor clarity.Apply:
- for (unsigned int fd : fds) { + for (auto fd : fds) { // fd is intjalib/jfilesystem.cpp (1)
168-198: Avoid narrowingsize_ttointfor path lengths and indices.
strlen()returnssize_t. Converting tointrisks overflow on very long paths and mixes signed/unsigned in loop math. Usesize_t(for length) with a signed index for the sentinel-1, or avoid manual scan viastrrchr.Apply:
- int len = strlen(fname); + size_t len = strlen(fname); @@ - int lastSlash = -1; - for (int i = 0; i < len; i++) { - if (fname[i] == '/') { - lastSlash = i; - } - } + // Prefer libc scan to avoid manual index types + const char* slash = ::strrchr(fname, '/'); + int lastSlash = slash ? static_cast<int>(slash - fname) : -1;src/ckptserializer.cpp (5)
78-83: Initialize and document trivial SIGCHLD action.Good encapsulation with
set_trivial_sigchld_action. ConsiderSA_RESTARTif you rely on restarting certain syscalls elsewhere, but current usage withsigsuspend()is fine.
92-108: sigsuspend waits for “a” SIGCHLD, not necessarily from our child.If other children exist with pending SIGCHLD,
sigsuspend()may return before our freshly forked child exits;waitpid(pid, 0, 0)could then block. Likely safe in ckpt thread, but please confirm invariant “no other children can deliver SIGCHLD here” or guard with a non-blocking loop:- _real_sigsuspend(&mask_all_but_sigchld); + do { + if (_real_waitpid(-1, NULL, WNOHANG) == pid) break; + } while (_real_sigsuspend(&mask_all_but_sigchld) == -1 && errno == EINTR);
109-143: double_fork(): restore handler in parent only; fine, but consider mask state in grandchild.Parent restores SIGCHLD action; grandchild inherits blocked signal mask from ckpt thread. If compressor expects signals unmasked, consider unblocking in the execing path, or at least ensure gzip doesn’t rely on them. Otherwise, LGTM.
248-267: Compression pipe setup: consider O_CLOEXEC to reduce fd leaks.When using
pipe2, passO_CLOEXECto avoid descriptor leakage if future code adds more exec paths.- if (_real_pipe(pipe_fds) == -1) { + if (_real_syscall(SYS_pipe2, pipe_fds, O_CLOEXEC) == -1) {
289-345: Always using double_fork() for compressor: OK. Ensure LD_PRELOAD hygiene and fd discipline.Exec path unsets LD_PRELOAD and dup2s stdin/stdout correctly. One nit: if
fd <= STDERR_FILENO, closing conditionals skip it; safe. Considerfcntl(outfd, F_SETFD, FD_CLOEXEC)before exec as belt-and-suspenders; not required.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
jalib/jfilesystem.cpp(1 hunks)src/ckptserializer.cpp(5 hunks)src/dmtcpworker.cpp(1 hunks)src/wrappers.cpp(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/dmtcpworker.cpp (3)
plugin/pathvirt/pathvirt.cpp (2)
rename(687-695)rename(687-687)src/processinfo.cpp (2)
instance(213-220)instance(214-214)src/coordinatorapi.cpp (2)
sendCkptFilename(724-753)sendCkptFilename(725-725)
src/ckptserializer.cpp (3)
src/signalwrappers.cpp (1)
sigaction(153-153)src/nosyscallsreal.c (2)
_real_waitpid(339-343)_real_close(188-192)src/processinfo.cpp (2)
instance(213-220)instance(214-214)
🔇 Additional comments (2)
src/ckptserializer.cpp (2)
62-69: Enum-based forked status: good move.Replacing raw ints with
forked_rc_timproves readability and avoids magic numbers. No concerns.
272-284: Forked-ckpt gate: env checked here but compressor path always double-forks.This is intentional per PR (double-fork for gzip regardless). No issues.
| // NOTE: If FORKED_CHECKPOINTING, the grandchild may still be checkpointing. | ||
| // Grandchild process will do renaming and inform coordinator. | ||
| if (getenv(ENV_VAR_FORKED_CKPT) == NULL) { | ||
| /* Now that temp checkpoint file is complete, rename it over old permanent | ||
| * checkpoint file. Uses rename() syscall, which doesn't change i-nodes. | ||
| * So, gzip process can continue to write to file even after renaming. | ||
| */ | ||
| JASSERT(rename(ProcessInfo::instance().getTempCkptFilename().c_str(), | ||
| ProcessInfo::instance().getCkptFilename().c_str()) == 0); | ||
| CoordinatorAPI::sendCkptFilename(); | ||
| } else { // else FORKED CHECKPOINTING. Write temp name; grandchild not done. | ||
| string ckptFilename = ProcessInfo::instance().getCkptFilename(); | ||
| ProcessInfo::instance().setCkptFilename( | ||
| ProcessInfo::instance().getTempCkptFilename().c_str()); | ||
| CoordinatorAPI::sendCkptFilename(); | ||
| ProcessInfo::instance().setCkptFilename(ckptFilename.c_str()); | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
Coordinator filename mismatch in forked mode (sends .tmp but never updated).
In the FORKED_CKPT branch, parent sends the temp filename to the coordinator, while the grandchild later renames to final .dmtcp. There is no subsequent update to the coordinator, so restart scripts will point to .tmp. Either: (A) keep sending the final name from the parent (recommended, rename happens later but inode stays the same), or (B) have the grandchild send the final name after rename.
Minimal fix (A):
- } else { // else FORKED CHECKPOINTING. Write temp name; grandchild not done.
- string ckptFilename = ProcessInfo::instance().getCkptFilename();
- ProcessInfo::instance().setCkptFilename(
- ProcessInfo::instance().getTempCkptFilename().c_str());
- CoordinatorAPI::sendCkptFilename();
- ProcessInfo::instance().setCkptFilename(ckptFilename.c_str());
- }
+ } else {
+ // Forked mode: still send final filename; grandchild will rename when done.
+ CoordinatorAPI::sendCkptFilename();
+ }If you prefer (B), ensure the grandchild calls CoordinatorAPI::sendCkptFilename() after the rename (see suggested change in ckptserializer.cpp).
---
🏁 Script executed:
```shell
#!/bin/bash
# Examine the current code in dmtcpworker.cpp around lines 493-509
sed -n '485,520p' src/dmtcpworker.cpp
Length of output: 1572
🏁 Script executed:
#!/bin/bash
# Search for where the grandchild process does the rename operation
rg -nP "rename.*getTempCkptFilename|getTempCkptFilename.*rename" --type=cpp -C5Length of output: 1830
🏁 Script executed:
#!/bin/bash
# Search for FORKED_CKPT usage in ckptserializer.cpp
rg -nP "FORKED_CKPT|ENV_VAR_FORKED_CKPT" --type=cpp -C3 | head -80Length of output: 4334
🏁 Script executed:
#!/bin/bash
# Find where CoordinatorAPI::sendCkptFilename is called
rg -nP "sendCkptFilename" --type=cpp -C2Length of output: 1606
🏁 Script executed:
#!/bin/bash
# Search for grandchild process handling and file renaming logic
rg -nP "grandchild|GRAND_CHILD" --type=cpp -C3 | head -100Length of output: 2099
The review comment is accurate—coordinator receives .tmp filename but file ends up at final .dmtcp name.
The verification confirms a real bug in forked checkpointing:
-
Current behavior: Parent sends
.tmpto coordinator (line 507), but grandchild renames the file to final.dmtcp(ckptserializer.cpp:411) without notifying coordinator, then exits immediately. -
Consequence: Restart scripts record
.tmpbut the actual checkpoint file exists at the final.dmtcpname—causing restore failures. -
Code comment inaccuracy: Lines 493–494 claim the grandchild will "inform coordinator," but ckptserializer.cpp:416 shows it calls
_exit(0)without any coordinator communication. -
Fix A is safe: Sending the final filename from parent works because
rename()preserves i-nodes; the grandchild's subsequent rename operation still succeeds, and the coordinator has the correct final name.
🤖 Prompt for AI Agents
In src/dmtcpworker.cpp around lines 493 to 509, the forked-checkpoint branch
currently sends the .tmp filename to the coordinator but the grandchild later
renames the file to the final .dmtcp name (and exits without notifying), causing
a mismatch; change the else branch so it sends the final checkpoint filename
(ProcessInfo::instance().getCkptFilename()) to the coordinator instead of the
temp name, restore any temporary modifications as needed, and update the comment
to reflect that the parent now sends the final name so coordinator and
filesystem agree.
xuyao0127
left a comment
There was a problem hiding this comment.
The current solution looks good enough to me. So I'm approving this PR. But I think we should maintain a consistent filename on disk and in the coordinator. I'm not sure if the filename reported to the coordinator will be used anywhere, but it may cause confusion when debugging in the future.
|
I believe that DMTCP does not rely on whether the DMTCP coordinator thinks the file is *.dmtcp.temp or *.dmtcp. I think it's used only for informational purposes, and the human will be warned when they see *.dmtcp.temp, and hopefully will look if the file has been renamed yet. I wish there were a clean way to tell the coordinator when the forked ckpt is complete, but anything I can imagine is ugly, because the grandchild cannot talk to the coordinator, and so it would have to signal the parent, or some other weird solution. |
This is required for forked checkpointing. Previously, we forked a gzip, and did a waitpid for it. So, the parent process was stuck in waitpid while the forked process writing to a ckpt image.
By using a new function
double_fork(), the code is simpler to follow, 35 lines shorter, and it fixes the waitpid bug for forked checkpointing.And then the forked checkpointing code uses
double_fork()in a later commit. This cleans up the code.COMMITS:
4fa3ade: If FORKED_CKPT, grandchild renames .tmp to .dmtcp
(The parent process in dmtcpworker.cpp was renaming, even while the grandchild was still saving the ckpt.)
b15389a: Use double_fork() for gzip compression
f97b275: Use double_fork() if FORKED_CHECKPOINTING
(Note that this commit has the original process send the temp ckpt filename (*.dmtcp.temp) to the coordinator, and the grandchild currently cannot send the real filename (*.dmtcp) because the coordinator doesn't recognize it as part of the original computation group. An alternative would be for the original process to send the real filename (*.dmtcp) even while the grandchild process continues to append to that file.)
8c53751: Remove some compiler warnings
This is now ready for review. @karya0, I'll trade you a review of your PR for a review of this PR. :-)
Summary by CodeRabbit
Refactor
Bug Fixes
Chores