Skip to content

Conversation

@samuelkarp
Copy link
Member

The 10-containerd-net.conflist file generated from the conf_template should be written atomically so that partial writes are not visible to CNI plugins. This is accomplished by writing a temporary file, syncing, and renaming over the destination file name.

Partial/inconsistent writes can occur due to:

  1. A CNI plugin attempting to read the file while containerd is writing it (both in the case of a newly-generated file or in the case of a re-generated file via a subsequent call to UpdateRuntimeConfig).
  2. Concurrent calls to UpdateRuntimeConfig leading to multiple active writers of the same file

The above mechanism explicitly protects against (1) as all writes are to a file with a temporary name.

There is no explicit protection against multiple, concurrent CRI calls to UpdateRuntimeConfig. However, atomically writing the generated file should mean only one writer will "win" and a consistent file will be visible.

Fixes #8607

@samuelkarp samuelkarp added the area/cri Container Runtime Interface (CRI) label May 31, 2023
@samuelkarp samuelkarp added this to the 2.0 milestone May 31, 2023
@samuelkarp samuelkarp added cherry-pick/1.6.x cherry-pick/1.7.x Change to be cherry picked to release/1.7 branch labels May 31, 2023
@samuelkarp
Copy link
Member Author

Switching to draft; this will need to be conditioned off for Windows.

@samuelkarp samuelkarp marked this pull request as draft May 31, 2023 07:26
@dcantah
Copy link
Member

dcantah commented May 31, 2023

Truly impeccable timing from myself 🤣. Whenever this is un-drafted I'll take a peek again

@samuelkarp
Copy link
Member Author

@dcantah Comments are appreciated and useful too 😄

@dcantah
Copy link
Member

dcantah commented May 31, 2023

Believe atomicity can be achieved on windows with a little wrapper around MoveFileEx (in place of os.Rename). You need the MOVEFILE_REPLACE_EXISTING|MOVEFILE_WRITE_THROUGH flags, which unfortunately os.Rename only provides the first of the two.

@samuelkarp
Copy link
Member Author

MOVEFILE_REPLACE_EXISTING|MOVEFILE_WRITE_THROUGH I don't see a mention of atomicity on the MoveFileEx documentation page, but maybe I'm not sure what I'm looking for. MOVEFILE_WRITE_THROUGH seems to mention a "copy and delete operation" which sounds like it's two separate operations?

@dcantah
Copy link
Member

dcantah commented May 31, 2023

Mmm, my university wisdom might be misguided now (or perhaps always was LOL). This was a pretty common tactic in many "write files atomically" libraries, and even stdlibs:

  1. https://github.com/untitaker/python-atomicwrites/blob/master/atomicwrites/__init__.py#L70-L79
  2. https://github.com/untitaker/rust-atomicwrites/blob/master/src/lib.rs#L303-L313
  3. https://github.com/natefinch/atomic/blob/master/file_windows.go#L29
  4. https://bugs.python.org/msg146307

This area seems completely fraught with peril though. There's no clear picture or guarantee on docs of atomicity, and everyone seems to just almost pray that their use of MoveFile or MoveFileTransacted (deprecated now) works. A fun read: https://www.postgresql.org/message-id/CAPpHfds9trA6ipezK3BsuuOSQwEmESiqj8pkOxACFJpoLpcoNw%40mail.gmail.com.

A recommended approach as recent as 2021 on an article from msft recommends ReplaceFile now for files with "document-like" data. ReplaceFile at least has the advantage that it preserves the following as well:

  • Creation time
  • Short file name
  • Object identifier
  • DACLs
  • Security resource attributes
  • Encryption
  • Compression
  • Named streams not already in the replacement file

This is much deeper than my drive-by comment at 1 AM led on 🤣

@samuelkarp samuelkarp marked this pull request as ready for review May 31, 2023 09:07
@samuelkarp samuelkarp marked this pull request as draft May 31, 2023 16:56
@dcantah
Copy link
Member

dcantah commented May 31, 2023

@kevpar Curious if anyone from the FS teams could give a blessing on some approach for atomic writes on Windows

@samuelkarp
Copy link
Member Author

I've refactored this a bit to encapsulate the rename-file logic out into its own package and keep the CRI code a bit cleaner.

Mmm, my university wisdom might be misguided now (or perhaps always was LOL).

My curriculum didn't cover Windows, so this is all a bit unfamiliar to me 🤣

@dcantah, I didn't attempt to write a Windows codepath yet. Do you think we should:

  1. Use MoveFileEx with MOVEFILE_REPLACE_EXISTING|MOVEFILE_WRITE_THROUGH
  2. Use RenameFile (not sure with which flags)
  3. Punt for now and leave Windows as-is (we're at least not making it worse while we do end up improving Linux)

@dcantah
Copy link
Member

dcantah commented May 31, 2023

I'd be fine with punting given we have an easy win on unix, but I'm curious if Kevin has any insights. At the very least lets add a TODO to consistentfile's Windows impl to find the right approach

@samuelkarp samuelkarp marked this pull request as ready for review May 31, 2023 22:17
@samuelkarp
Copy link
Member Author

At the very least lets add a TODO to consistentfile's Windows impl to find the right approach

Done

@kevpar
Copy link
Member

kevpar commented May 31, 2023

MoveFileEx with MOVEFILE_REPLACE_EXISTING|MOVEFILE_WRITE_THROUGH is not guaranteed to be atomic on Windows. In particular, it has to first delete the existing file, which means the file contents on disk have to be deallocated (per my understanding). So for instance, if the machine crashes in the middle of this operation, you can have a partially-deallocated file.

The filesystem doesn't provide any way to guarantee atomicity for this kind of thing. I think the best you can do is something like MoveFileEx or ReplaceFile.

@samuelkarp
Copy link
Member Author

The filesystem doesn't provide any way to guarantee atomicity for this kind of thing. I think the best you can do is something like MoveFileEx or ReplaceFile.

@kevpar Is there documentation somewhere (or do you have guidance you can share) covering the tradeoffs/recommendation to choose for this use-case?

@fuweid
Copy link
Member

fuweid commented Jun 1, 2023

@samuelkarp
Copy link
Member Author

maybe we can align with https://github.com/containerd/continuity/blob/25762ef7c27a37645197ed6d1d708e55a230e35d/ioutils.go#L28 ?

I hadn't seen that one either, but I was trying to avoid an interface with data []byte. The io.ReadWriteCloser model here allows something like template.Execute to still work (since it accepts an io.Writer) without needing an additional in-memory buffer. template.ParseFiles does already read the whole thing into a buffer, but other use-cases with large files might not, and the consistentfile/atomicfile package allows that pattern to work. (Reading a large user-supplied file into memory could cause containerd to OOM.)

@fuweid
Copy link
Member

fuweid commented Jun 1, 2023

but other use-cases with large files might not, and the consistentfile/atomicfile package allows that pattern to work. (Reading a large user-supplied file into memory could cause containerd to OOM.)

Thanks for the comment. SGTM.

@samuelkarp samuelkarp added the status/needs-update Awaiting contributor update label Jun 1, 2023
@samuelkarp samuelkarp force-pushed the issue-8607 branch 2 times, most recently from b54d795 to 5d6b8a8 Compare June 1, 2023 06:23
@samuelkarp samuelkarp removed the status/needs-update Awaiting contributor update label Jun 1, 2023
@samuelkarp
Copy link
Member Author

As part of replacing WritePidFile and WriteAddress I ported their (somewhat) platform-independent behavior to atomicfile. atomicfile now does the same rename workflow on Windows and reorders the underlying file close ahead of the rename to avoid the restriction that Windows has on renaming open files. If there's consensus on either MoveFileEx or RenameFile instead of Go's os.Rename (which does call MoveFileEx but not with the same flags), I'm happy to switch to that.

Copy link
Member

@mikebrow mikebrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a couple questions

Copy link
Member

@dcantah dcantah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Certain files may need to be written atomically so that partial writes
are not visible to other processes. On Unix-like platforms such as
Linux, FreeBSD, and Darwin, this is accomplished by writing a temporary
file, syncing, and renaming over the destination file name. On Windows,
the same operations are performed, but Windows does not guarantee that a
rename operation is atomic.

Partial/inconsistent reads can occur due to:
1. A process attempting to read the file while containerd is writing it
   (both in the case of a new file with a short/incomplete write or in
   the case of an existing, updated file where new bytes may be written
   at the beginning but old bytes may still be present after).
2. Concurrent goroutines in containerd leading to multiple active
   writers of the same file.

The above mechanism explicitly protects against (1) as all writes are to
a file with a temporary name.

There is no explicit protection against multiple, concurrent goroutines
attempting to write the same file. However, atomically writing the file
should mean only one writer will "win" and a consistent file will be
visible.

Signed-off-by: Samuel Karp <[email protected]>
The 10-containerd-net.conflist file generated from the conf_template
should be written atomically so that partial writes are not visible to
CNI plugins. Use the new consistentfile package to ensure this on
Unix-like platforms such as Linux, FreeBSD, and Darwin.

Fixes containerd#8607

Signed-off-by: Samuel Karp <[email protected]>
Copy link
Member

@dcantah dcantah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@samuelkarp samuelkarp merged commit f92e576 into containerd:main Jun 5, 2023
@samuelkarp samuelkarp deleted the issue-8607 branch June 5, 2023 17:31
@thaJeztah thaJeztah added cherry-picked/1.7.x PR commits are cherry-picked into release/1.7 branch and removed cherry-pick/1.7.x Change to be cherry picked to release/1.7 branch labels Jul 14, 2023
@thaJeztah thaJeztah added cherry-picked/1.6.x PR commits are cherry-picked into release/1.6 branch and removed cherry-pick/1.6.x labels Jul 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/cri Container Runtime Interface (CRI) cherry-picked/1.6.x PR commits are cherry-picked into release/1.6 branch cherry-picked/1.7.x PR commits are cherry-picked into release/1.7 branch

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fix cni conf_template related code

7 participants