Skip to content

Fixed readdmtcp.sh and minor cleanup for restore buf handling.#1188

Merged
karya0 merged 3 commits intomainfrom
dev/karya0/readmtcp
Feb 27, 2025
Merged

Fixed readdmtcp.sh and minor cleanup for restore buf handling.#1188
karya0 merged 3 commits intomainfrom
dev/karya0/readmtcp

Conversation

@karya0
Copy link
Copy Markdown
Member

@karya0 karya0 commented Feb 27, 2025

PR #1144 broke readdmtcp.sh working which is now fixed in this PR.

Also cleaned up a bit of logic around restore buffer handling.

@karya0 karya0 requested a review from gc00 February 27, 2025 17:02
Copy link
Copy Markdown
Contributor

@gc00 gc00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worried aboutt the second commit. It says Minor cleanup for restoreBuf handling.. But it seems liek we're removing various comments that would help other developers. Usually, I don't anticipate that a "Minor cleanup" should remove documentation, instead of updaating it to the most recent code. :-) (joke)

Could you update the comments? If we're no longer worried about overlap leaving us no place for a restorebuf hole on restart, ccould you document why this will never be a problem? Even better, can you also add a JASSERT to check for any implicit assumptions that you're making (along with comments about why)? Thanks.

// FIXME: ProcessInfo::init() sets len to RESTORE_TOTAL_SIZE. This
// then sets it to _restoreBufLen. But ProcessInfo::init() had
// previously set _restoreBufLen to RESTORE_TOTAL_SIZE. So, we
// have now made the round trip. This is spaghetti code. :-(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why you removed the comment '// NOTE: This note explains ...`
At the very least, this comment should be useful:

//       See the comment inside writeckpt.cpp:mtcp_writememoryareas() for
//       the purpose of the "restoreBuf" (aka "holebase"). 

But in addition, isn't the point about mmap'ing 3*_restoreBufLen still relevant?
If you could guarantee that the restoreBuf is always at most 4 KB (one page), then maybe we wouldn't need the 3*_restoreBufLen . But then if were assuming that the restoreBuf is 4KB, then we need to do a JASSERT to verify this fact. Further, whatever code we use, will this be valid for HUGEPAGES?

Whatever we decide, it seems like we need to docuent what we're doing here, for future maintainability.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why you removed the comment '// NOTE: This note explains ...` At the very least, this comment should be useful:

//       See the comment inside writeckpt.cpp:mtcp_writememoryareas() for
//       the purpose of the "restoreBuf" (aka "holebase"). 

But in addition, isn't the point about mmap'ing 3*_restoreBufLen still relevant? If you could guarantee that the restoreBuf is always at most 4 KB (one page), then maybe we wouldn't need the 3*_restoreBufLen . But then if were assuming that the restoreBuf is 4KB, then we need to do a JASSERT to verify this fact. Further, whatever code we use, will this be valid for HUGEPAGES?

Whatever we decide, it seems like we need to docuent what we're doing here, for future maintainability.

The comment is now inside mtcp_restart but I can add a reference to it here.

* restart() is heavily overloaded, but assuming it's only relevant
* to processinfo.cpp, we have it used in: ProcessInfo::restart()
* and 'case DMTCP_EVENT_RESTART' in processInfo_EventHook().
*
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this is the related comment. See my above inline comment.

Conceptually, we have non-local code. Data structures must be maintained consistently between writeckpt.cpp and mtcp_restart.c So, we need to document the consistency requirements in both places. Otherwise, new developers have a hard time to understand the code.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't agree more. The three multiplier was mtcp-specific logic that we split across processInfo and write-ckpt but I have now moved it to mtcp_restart. We can probably cleanup some more of writeckpt logic, but we can defer it for now.

@@ -190,14 +184,7 @@ mtcp_writememoryareas(int fd)
// will invoke mmap if the JAlloc arena is full. Similarly, for STL objects
// such as vector and string.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I have a similar worry about removing this comment, and not documenting our consistency/design requirements between writeckpt.cpp and mtcp_restart.c.

@karya0
Copy link
Copy Markdown
Member Author

karya0 commented Feb 27, 2025

I agree it looks bigger than a minor cleanup, at least from the number of lines. However, the majority of comments are not removed, they are just moved to mtcp_restart. We are keeping all the logic related to bigger size in mtcp_restart.c only. The ProcessInfo/WriteCkpt locations are now back to using a single size that's defined in dmtcp.h, which in turn is also used by mtcp_restart.

Effectively, we are doing the following:

  • ProcessInfo: create a buffer of size RESTORE_BUF_TOTAL_SIZE. It doesn't do anything besides mmap/munmap of this size. The associated comment is now moved to mtcp_restart.c.
  • WriteCkpt: remove the 3 * logic since that logic is now entirely inside mtcp_restart.c. Here we are simply looking at the start and end addresses of the restore buffer.
  • mtcp_restart.c: The logic is now to divide the larger RESTORE_BUF_TOTAL_SIZE area into three chunks to ensure at least one doesn't overlap with stack, etc. The logic is the same, except for dividing by three instead of multiplying by three. Since this area is only for mtcp_restart, ProcessInfo doesn't need to know the rest of the logic besides size.
  • Renaming variables- restoreBufAddr and restoreBufLen are now replaced with restoreBuf.startAddr, and restoreBuf.endAddr (along with the len calculations, as appropriate) -- no other logic change.

In a way, this PR consolidates the restore-buf size handling code by putting it all inside mtcp_restart.

Copy link
Copy Markdown
Contributor

@gc00 gc00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Thanks for pointing out that the comment about three times the restoreBuf is still there -- just in mtcp_restart.c now. That works. And as you point out, the writeckpt.cpp code initially just creates an arbitrary buffer size, and mtcp_restart.c is the one that views it as three separate buffers, in case of overlap.

Thanks.

@karya0 karya0 merged commit 81342fd into main Feb 27, 2025
1 check passed
@karya0 karya0 deleted the dev/karya0/readmtcp branch February 27, 2025 21:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants