Description
What is the problem you're trying to solve
The general goal is to improve the security and performance of Windows container layer operations.
Windows containers currently use NTFS & WCIFS based layers to effectively combine multiple layers (along with a scratch layer) to generate a container's rootfs. These layers are stored on NTFS with special reparse points that allow the creation of rootfs.
This current approach of storing image layers on NTFS with reparse points has a few drawbacks:
- The layers cannot be stored on a non-NTFS formatted drive (since they depend on NTFS specific features like reparse points).
- Layer import with WCIFS involves some operations that need certain (backup/restore) privileges. Currently, containerd acquires these privileges during image import but has no proper way of releasing them after the import is done. The privileges are acquired for the entire process and are not reference counted, so one goroutine releasing previously acquired privileges can cause errors in another goroutine that still needs them.
- Performance of the layer import process has room for improvement. For example, non-base layers are currently first written to a temp directory in the backup tar format and are then rewritten to the actual snapshot directory in the format that WCIFS understands.
- We have seen multiple issues in the past where cleanup of these layers has been a problem because of the permissions that they need.
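To illustrate the privilege problem described above, here is a minimal sketch of what reference counting a process-wide privilege could look like. This is not what containerd or hcsshim does today, and it is not part of the CimFS proposal, which removes the need for these privileges during import altogether. The `privilegeGuard` type and its `enable`/`disable` callbacks are hypothetical stand-ins for the real OS calls.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// privilegeGuard reference-counts a process-wide privilege so that it is
// only dropped when the last user releases it. enable and disable are
// hypothetical stand-ins for the real privilege calls.
type privilegeGuard struct {
	mu      sync.Mutex
	refs    int
	enable  func() error
	disable func() error
}

// Acquire enables the privilege on the first call and only increments
// the counter on later calls.
func (g *privilegeGuard) Acquire() error {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.refs == 0 {
		if err := g.enable(); err != nil {
			return err
		}
	}
	g.refs++
	return nil
}

// Release drops one reference and disables the privilege only when no
// references remain.
func (g *privilegeGuard) Release() error {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.refs == 0 {
		return errors.New("release without matching acquire")
	}
	g.refs--
	if g.refs == 0 {
		return g.disable()
	}
	return nil
}

func main() {
	enabled := false
	g := &privilegeGuard{
		enable:  func() error { enabled = true; return nil },
		disable: func() error { enabled = false; return nil },
	}
	_ = g.Acquire() // import A starts
	_ = g.Acquire() // import B starts
	_ = g.Release() // import A finishes; B still needs the privilege
	fmt.Println("enabled after first release:", enabled)
	_ = g.Release() // import B finishes
	fmt.Println("enabled after second release:", enabled)
}
```

Without the counter, the first `Release` would disable the privilege while the second import is still running, which is the failure mode described in the list above.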
Describe the solution you'd like
At Microsoft, we have been working on a new format and a filesystem for storing and using container images on Windows. It's called CimFS.
Composite Image Filesystem (CimFS) is a filesystem specifically designed for storing layers of Windows container images. A Composite Image (or a CIM) is a complete filesystem within itself, similar to a disk image. APIs exported by CimFS.dll allow applications to create a CIM, add files to it, and write to those files. Once the creation of a CIM is complete, it can be mounted (with the help of CimFS.dll) to a volume, where it shows up as a read-only filesystem. The idea is to use one CIM for each unique layer.
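The lifecycle described above (create a CIM, add files, finish it, then mount it read-only) can be modelled with a small in-memory sketch. `toyCIM` and its methods are hypothetical names made up for this illustration; they are not the CimFS.dll API or the hcsshim wrappers, and they only capture the "writable while being built, read-only afterwards" behaviour.

```go
package main

import (
	"errors"
	"fmt"
)

// toyCIM is a hypothetical in-memory stand-in for a CIM. It only models
// the lifecycle: files can be added while the image is being built, and
// the image becomes read-only once it is committed.
type toyCIM struct {
	files     map[string][]byte
	committed bool
}

func newToyCIM() *toyCIM {
	return &toyCIM{files: map[string][]byte{}}
}

// AddFile stores a copy of data under path while the image is still open.
func (c *toyCIM) AddFile(path string, data []byte) error {
	if c.committed {
		return errors.New("image is committed and read-only")
	}
	c.files[path] = append([]byte(nil), data...)
	return nil
}

// Commit finishes the image; after this only reads are allowed.
func (c *toyCIM) Commit() {
	c.committed = true
}

// Read returns a file's contents. It refuses to read from an uncommitted
// image, mirroring the rule that only a completed CIM is mounted.
func (c *toyCIM) Read(path string) ([]byte, error) {
	if !c.committed {
		return nil, errors.New("image must be committed before it is read")
	}
	data, ok := c.files[path]
	if !ok {
		return nil, fmt.Errorf("%s: not found", path)
	}
	return data, nil
}

func main() {
	layer := newToyCIM() // one image per unique layer
	_ = layer.AddFile("Files/hello.txt", []byte("hello"))
	layer.Commit()

	data, err := layer.Read("Files/hello.txt")
	fmt.Println(string(data), err)

	// Writes after commit are rejected.
	fmt.Println(layer.AddFile("Files/late.txt", []byte("x")))
}
```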
CimFS will eliminate the need for any reparse points in image layers. CimFS will generate the union of read-only image layers with the container scratch. Currently, we also have to create WCIFS reparse points on the container scratch. However, we have also developed a new filesystem filter (named UnionFS; note that this UnionFS is different from the generic unionfs term used on Linux) that will work with CimFS without needing any reparse points on the scratch.
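As a rough mental model of "the union of read-only image layers with the container scratch", the sketch below resolves a path by checking the writable scratch first and then the read-only layers from the topmost down to the base. The lookup order, the `layer` type, and the `resolve` function are assumptions made for illustration; the actual CimFS and UnionFS behaviour (including how deleted files are handled) is not described in this post.

```go
package main

import "fmt"

// layer maps a file path to its contents. This is a hypothetical,
// simplified stand-in for a mounted read-only layer or a scratch.
type layer map[string]string

// resolve looks a path up in the writable scratch first and then in the
// read-only layers, from the topmost layer down to the base layer.
// layers is ordered base first, topmost last.
func resolve(path string, scratch layer, layers []layer) (string, bool) {
	if v, ok := scratch[path]; ok {
		return v, true
	}
	for i := len(layers) - 1; i >= 0; i-- {
		if v, ok := layers[i][path]; ok {
			return v, true
		}
	}
	return "", false
}

func main() {
	base := layer{"a.txt": "from base", "b.txt": "from base"}
	top := layer{"b.txt": "from top"}
	scratch := layer{"c.txt": "from scratch"}
	layers := []layer{base, top}

	for _, p := range []string{"a.txt", "b.txt", "c.txt", "d.txt"} {
		v, ok := resolve(p, scratch, layers)
		fmt.Println(p, v, ok)
	}
}
```

In this model the layers themselves never change, so several containers can share the same read-only layers and differ only in their scratch.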
Advantages of this approach:
- Windows image layers can be imported & stored on non-NTFS drives too. This makes it easier to use in remote snapshotters.
- It avoids certain complexities and inefficiencies of the legacy layer import workflow (like the backuptar format and copying to a temporary directory).
- The import process doesn't need to hold backup/restore privileges, which reduces the risk of security vulnerabilities.
- Cleaning up layer directories (i.e. snapshot directories) doesn't need special permissions anymore.
- As CimFS is a read-only filesystem, it needs much less synchronization, which improves performance.
- In our testing so far, we have seen noticeable performance improvements when using CimFS layers. (I will update the post with actual perf numbers shortly.)
Additional context
Changes required to support CimFS will be split between the containerd & hcsshim repos. A quick overview of these changes/PRs is given below. The PRs depend on each other, so they should be merged in the order listed. (I am planning to open these PRs one by one, and I will keep updating this post with the links to those PRs as I open them.)
- Add Go wrappers for CimFS.dll APIs.
  This change has no dependency. (cimfs support: Add cimfs writer microsoft/hcsshim#927)
- Add a new LayerWriter (https://github.com/microsoft/hcsshim/blob/main/layer.go#L108) to hcsshim.
  This layer writer will be used when importing container images to a CIM format.
- Add a new cimfs snapshotter & cimfs differ in containerd.
  The new snapshotter & differ will in turn use the layer writer added in the previous change to extract image layers into the CIM format. (Add support for cimfs snapshotter & differ #8807)
- Use CimFS based layers in hcsshim.
  This involves using the new mount manager APIs to mount/unmount the container rootfs snapshots before starting the container.
Note that WCIFS is a generic filter that can be used for non-container scenarios. CimFS isn't meant to replace
WCIFS everywhere. CimFS is designed to improve the performance and stability of container scenarios, but WCIFS
remains useful in other scenarios (including non-CimFS containers).
We wanted to share this with the containerd community and hear your opinions on it. We would like to understand
whether the CimFS change & the mount manager change are compatible with the other changes/development happening in
containerd. I will try my best to answer any questions that you may have.