Skip to content

Fix for forked ckpt: commit 7e486722bdb broke it#1220

Merged
gc00 merged 1 commit intodmtcp:mainfrom
gc00:fix-forked-ckpt
Oct 11, 2025
Merged

Fix for forked ckpt: commit 7e486722bdb broke it#1220
gc00 merged 1 commit intodmtcp:mainfrom
gc00:fix-forked-ckpt

Conversation

@gc00
Copy link
Copy Markdown
Contributor

@gc00 gc00 commented Oct 10, 2025

@karya0 ,
Apparently, your addition of kvdb for DMTCP (commit 7e48672) had not considered the case of forked checkpointing.
I'm commenting out one line of that code when forked ckpt is being used.
Could you please check this, to see if you want to modify kvdb for this case, or just ignore it, since forked ckpt is less common?

To recall, in forked ckpt, the original process figures out open fd's, signal handlers, etc., and then forks a grandchild. The grandchild writes the memory areas to the ckpt_*.temp file. The original process then renames the *.temp file to *.dmtcp, but the grandchild is allowed to continue executing and write out the rest of the memory areas. The grandchild then exits when done.

@xuyao0127,
When we also have the enhancement for SOCK_SEQPACKET, let's add this commit, also, and do a new point release for DMTCP.

Thanks.

Summary by CodeRabbit

  • Bug Fixes

    • Adjusted behavior under forked checkpointing to avoid writing certain process mapping data, improving consistency with expected operation.
  • Documentation

    • Updated the configure option description for forked checkpointing to clarify its experimental status and note that shared memory is currently not supported.

@gc00 gc00 added the bug label Oct 10, 2025
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Oct 10, 2025

Walkthrough

A conditional compilation guard was added in DmtcpWorker::postCheckpoint() to wrap the kvdb::set write for ProcSelfMaps_Ckpt under #ifndef FORKED_CHECKPOINTING; the ProcSelfMaps_JAllocArenas write is unchanged. The configure/configure.ac help text for --enable-forked-checkpointing was updated to note that shared memory is not supported. No public interfaces were modified.

Changes

Cohort / File(s) Summary
Checkpoint post-processing
src/dmtcpworker.cpp
Wrapped kvdb::set for ProcSelfMaps_Ckpt with #ifndef FORKED_CHECKPOINTING; ProcSelfMaps_JAllocArenas handling unchanged; no other logic altered.
Build configuration docs
configure, configure.ac
Expanded help/description for --enable-forked-checkpointing to: "(EXPERIMENTAL: Currently, shared memory is not supported.)". No behavioral changes.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant W as DmtcpWorker::postCheckpoint()
  participant KV as kvdb

  rect rgb(236, 248, 255)
  note over W: Post-checkpoint metadata writes
  alt FORKED_CHECKPOINTING not defined
    W->>KV: set("ProcSelfMaps_Ckpt", ...)
  else FORKED_CHECKPOINTING defined
    note over W: Skip ProcSelfMaps_Ckpt write (compile-time)
  end
  W->>KV: set("ProcSelfMaps_JAllocArenas", ...)
  end
Loading

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

Thump-thump, I nudge the code with care,
A tiny guard, a compile-time snare—
One map saved when forks aren’t waking,
Another stays, no habits breaking.
Ears up high, I hop and grin—small change—let checks begin! 🐇✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title concisely captures the primary change of restoring forked checkpoint support that was broken by commit 7e48672. It directly reflects the pull request’s main objective without extraneous details. The use of the commit hash helps reviewers immediately identify the associated upstream change. The casual phrasing “broke it” is understandable in context and does not hinder clarity.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bd4e793 and 6208b1e.

📒 Files selected for processing (3)
  • configure (1 hunks)
  • configure.ac (1 hunks)
  • src/dmtcpworker.cpp (1 hunks)
✅ Files skipped from review due to trivial changes (2)
  • configure.ac
  • configure
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/dmtcpworker.cpp
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Member

@karya0 karya0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

However, please note that forked checkpoint won't work beyond trivial use cases. We don't have a way to support shared-memory segments that could get overwritten while ckpt is being written. Not that it's not possible to do, but the outcome isn't worth all the trouble.

Copy link
Copy Markdown
Collaborator

@xuyao0127 xuyao0127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@gc00 gc00 merged commit 98ac51e into dmtcp:main Oct 11, 2025
2 checks passed
@gc00 gc00 deleted the fix-forked-ckpt branch October 11, 2025 17:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants