Skip to content

Conversation

@giuseppe
Copy link
Member

@giuseppe giuseppe commented Oct 12, 2020

force_mask sets a permission mask used for the new files and directories.

It is useful for using a NFS share for the rootless storage. It requires this change in fuse-overlayfs:

permissions_xattr, if it is set then it for each file write its original permissions to the specified extended attribute.

Together with force_mask= it allows to use NFS as a backing store for rootless containers:

containers/fuse-overlayfs#246

[storage]
  driver = "overlay"
  graphroot = "/mnt/nfs/home/storage"
  [storage.options]
    mountopt = "xattr_permissions=2"
  [storage.options.overlay]
     force_mask = "0755"
     ignore_chown_errors = "true"

Signed-off-by: Giuseppe Scrivano [email protected]

@giuseppe giuseppe force-pushed the force-mask branch 2 times, most recently from e3b4315 to 04dd4d0 Compare October 12, 2020 14:36
@rhatdan
Copy link
Member

rhatdan commented Oct 12, 2020

What is 493?

force_mask = 493

@giuseppe
Copy link
Member Author

What is 493?

force_mask = 493

that is 0755.

$ printf "%d" 0755
493

Unfortunately toml doesn't support octals. Should we use a string or just document it?

@rhatdan
Copy link
Member

rhatdan commented Oct 13, 2020

Yes it should be a string. Humans will not understand this number.
"0755" is much better.

@giuseppe
Copy link
Member Author

Yes it should be a string. Humans will not understand this number.
"0755" is much better.

changed to be a string.

@rhatdan
Copy link
Member

rhatdan commented Oct 13, 2020

So this means every file and directory will be marked 0755? I guess we need to talk about what you are after here? Should we separate the files from the directories. Would this make /etc/shadow inside of a container work readable to content in the container?

@giuseppe
Copy link
Member Author

So this means every file and directory will be marked 0755? I guess we need to talk about what you are after here? Should we separate the files from the directories. Would this make /etc/shadow inside of a container work readable to content in the container?

yes, every file and directory in the container image will get the mode 0755.

I didn't want to expose too much of the fuse-overlayfs internals in containers/storage. If we extend containers/storage to write the original permission mask (+owner uid/gid) in an extended attribute that fuse-overlayfs understands, we could keep the same privileges at mount time. What do you think about it?

@rhatdan
Copy link
Member

rhatdan commented Oct 14, 2020

I think we should have a quick meeting to discuss this.

@giuseppe giuseppe marked this pull request as draft October 20, 2020 15:46
@giuseppe giuseppe changed the title storage: add option force_mask= storage: add option force_mask= and permissions_xattr= Oct 23, 2020
@giuseppe giuseppe marked this pull request as ready for review October 23, 2020 17:55
@rhatdan
Copy link
Member

rhatdan commented Oct 26, 2020

Should this be an option in storage.conf, or just something set by podman and Buildah for rootless users?

@rhatdan
Copy link
Member

rhatdan commented Oct 26, 2020

The issue I have with this, is that this effects rootfull as well as rootless users.

@giuseppe
Copy link
Member Author

Should this be an option in storage.conf, or just something set by podman and Buildah for rootless users?

IMO, it should be something users set explicitly and not enabled by default. With the wrong fuse-overlayfs version containers will have the wrong permissions for the files.

I see a good use case for root as well. We can make images available to rootless users as a rootless user can read all the files in the root storage.

With this feature we can enable a read-only storage owned by root and accessible to rootless users.

We can discuss it further at the tomorrow meeting.

@rhatdan
Copy link
Member

rhatdan commented Oct 26, 2020

Ok.

Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned elsewhere, if there’s only one value of force_mask that works usefully, and only one value of permissions_xattr that works at all, I’m not a fan of making this available as individual options — and requiring users to know the magic values.

Provide a higher-level permissions_mechanism=native|fuse-overlayfs, which instructs the code to do whatever is necessary to interoperate with fuse-overlayfs. That still allows us to support future permissions mechanisms in the future, but does not imply any assumptions about how those mechanisms work (i.e. that things are stored in an extended attribute at all, let alone using a specific format).

@rhatdan
Copy link
Member

rhatdan commented Oct 27, 2020

permissions_mechanism=native|shared?

I would prefer not to hard code fuse-overlay in this as well. Setting it shared basically means the images are all world readable.

@giuseppe
Copy link
Member Author

I've added the permissions_xattr= option only to avoid hard-coding fuse-overlayfs. Any idea on how it can be done differently?

I like making force_mask more user friendly and support native|shared. What if we support force_mask=0xxx|native|shared so that users can still customize the mask if needed?

@rhatdan
Copy link
Member

rhatdan commented Oct 27, 2020

I am fine with the three options, but having a general user have to figure out 0755 or 0000 stinks versus native and shared.

@giuseppe giuseppe force-pushed the force-mask branch 2 times, most recently from 454c5f6 to cc6f09b Compare October 28, 2020 11:02
@mtrmac
Copy link
Collaborator

mtrmac commented Oct 29, 2020

Per the other conversation, we might actually want two independent parameters (please suggest better names); “NFS-hosted” does not automatically imply “OK-to-share”.

  • metadata-storage=native|simulated , which handles whether ownership+permission(+ACL) of the images are stored directly using the kernel filesystem, or only in a xattr that is interpreted by a FUSE driver.
  • raw-file-permissions=0700 (private) | 0755 (shared-for-read) | 0775 (multiple-writers) | 0750 | 0770 (I ran out of naming ideas): only valid with metadata-storage=simulated, to support sharing the c/storage backing store in multi-user systems.

(Are there any actual multi-user consumers that will benefit, or is it just theoretical at this point? Multi-user computers are increasingly rare.)

@koebi001
Copy link

koebi001 commented Nov 4, 2020

(Are there any actual multi-user consumers that will benefit, or is it just theoretical at this point? Multi-user computers are increasingly rare.)

I am very interested in getting this working. In our lab we use a shared network file system and configure all hosts with slurm to better utilize our limited resources. This is especially important for machine learning topics where it gets more an more challenging to get the frameworks working with all their wide dependencies. Being able to use containers would be a big win.
Right now we have a single shared host for docker where just a few users are testing stuff. The local container storage takes already 1TB.
The only feasible way to run containers on a cluster environment is by having a shared network file system for often used container images.
Using rootless podman the only problem to solve seems to be sharing the storage. Most important the part which can be read only.

@giuseppe giuseppe force-pushed the force-mask branch 2 times, most recently from 3d067d4 to fcb371d Compare November 5, 2020 15:44
@giuseppe giuseppe marked this pull request as ready for review November 5, 2020 16:39
"private": it is equivalent to 0700. The owner has rwx access to the files.
"shared": it is equivalent to 0755. The owner has rwx access to the files and everyone else can read, access and execute them.

**permissions_xattr** = ""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a constant? Or at least document the one that fuse-overlay will use. Not sure why users would expect more then one xattr?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is useless complexity. Just store the attributes in storage.dac.permissions or something and be done with it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't want to hard code "fuse-overlayfs" in containers/storage. I can document it better that it is expected to be used with fuse-overlayfs, and perhaps explain how fuse-overlayfs uses it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just define a "storage" xattr and have fuse-overlyfs deal with it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fuse-overlayfs currently handles every user.fuseoverlayfs* as a private namespace, while any other xattr is not filtered out. If we add a storage xattr we'd have to handle a separate namespace and not permit it inside of the containers, so user.storage or user.containers won't be available to container processes.

SkipMountHome string `toml:"skip_mount_home"`
// ForceMask indicates a mask of permissions (e.g. "0755") that are always set for
// created files. The file permissions will be: ORIGINAL_PERMISSIONS | FORCE_MASK.
ForceMask string `toml:"force_mask"`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be ORed, just an assignment (if not in the “native” case)? I don’t see why a file that is 0755 inside a container should be readable over NFS by users that have nothing to do with that file.

Note the recent realization that the / directory is frequently not a part of a container image, so we are defaulting it to 0555 = making every container’s root directory world-readable by default.

Or is this all pretty much irrelevant because the top-level permissions of c/storage are by default strict enough? Is that true, and reliable enough?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the NFS case, the mask to use is 0700 so they won't become world readable (unless they were already so that ORing the original permission with 0700 won't change the group and world permissions).

The ForceMask=0755 is a different use case. That is useful to share the storage with other users. In that cases all the files/directories must be world readable and accessible.

Do you think we should make this distinction clearer somehow?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m not disputing the 0700/0755 values. But consider that the default permissions of / in a container are 0555, so even in the “private” setting, the file will end up with external 0755 (and an in-container 0777 file would end up 0777 external regardless of ForceMask). Is that OK, or should the file really end up exactly 0700/0755?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure. Both ways are enough for the NFS storage and shared storage cases.

I thought having the ForceMask work as an "opposite" umask could give more flexibility in the future, but at the moment I cannot think of any other use case where it would be useful

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed the semantic to mean "final permissions mask". I guess it is easier to understand and also avoids some issues if a file in an image is world writeable and we end up opening the image to every other user.

// PermissionsXattr is specified is the xattr where the original file
// permissions for the file are stored using the format UID:GID:MODE.
// It is useful together with ForceMask.
PermissionsXattr string `toml:"permissions_xattr"`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I’d still prefer a higher-level option but if this is the consensus, I can live with it.)

@rhatdan
Copy link
Member

rhatdan commented Nov 5, 2020

In my opinion having the force-mask is all you need. This tells containers/storage to store all of the content with that mask and to save the real owner/group/permissions in a storage defined xattr. Once this is standardized, then we can take third party tools like fuse-overlay and use it present the owner/group/permissions to the callers.

@nalind Might prefer this to be a separate driver then to override the overlay driver. How much work would it be to create a fuse-overlay driver rather? Would that make more sense then to store data that the native overlay driver does not understand?

@giuseppe giuseppe force-pushed the force-mask branch 2 times, most recently from f238780 to ee42141 Compare November 6, 2020 12:52
@giuseppe
Copy link
Member Author

giuseppe commented Nov 6, 2020

I've dropped the second option and always write user.containers.override_stat. I've changed fuse-overlayfs to honor it: containers/fuse-overlayfs#257

@nalind Might prefer this to be a separate driver then to override the overlay driver. How much work would it be to create a fuse-overlay driver rather? Would that make more sense then to store data that the native overlay driver does not understand?

would it be ok if we restrict the use case to mountProgram != "" and raise an error when force_mask is used without mountProgram?

@rhatdan
Copy link
Member

rhatdan commented Nov 6, 2020

Ok force the mount_program.

**ignore_chown_errors** = "false"
ignore_chown_errors can be set to allow a non privileged user running with a single UID within a user namespace to run containers. The user can pull and use any image even those with multiple uids. Note multiple UIDs will be squashed down to the default uid in the container. These images will have no separation between the users in the container. (default: false)

**force_mask** = "0000|native|shared|private"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to mention here that it requires mount_program.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update man page, then LGTM

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed. Should we mark it as experimental for now so we keep some freedom to change it if we need?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM

@mtrmac
Copy link
Collaborator

mtrmac commented Nov 7, 2020

I've dropped the second option and always write user.containers.override_stat. I've changed fuse-overlayfs to honor it: [containers/fuse-overlayfs#257](containers/fuse-overlayfs#257

From a quick look, I can’t see anything that prevents this to be set from inside the container (I’m worried about allowing a low-privileged in-container user to escalate to a high-privileged in-container user via a set-UID executable); from a quick search I found ovl_setxattr which does not filter that out. Where is the security mechanism?

@rhatdan
Copy link
Member

rhatdan commented Nov 7, 2020

Yes fuse-overlay will need to be changed to prevent from the setting of the file attribute.

@giuseppe
Copy link
Member Author

giuseppe commented Nov 7, 2020

Thanks for the hint. I've not done it previously as the user already needs write permissions to use setxattr on that file. In any case, I've opened a PR to always block it: containers/fuse-overlayfs#258

@giuseppe
Copy link
Member Author

giuseppe commented Nov 9, 2020

marked as experimental ⬆️

Copy link
Member

@rhatdan rhatdan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also update the default storage.conf file and add the commented out force_mask flag and an explanation of what it is,

@giuseppe giuseppe force-pushed the force-mask branch 2 times, most recently from e0dea11 to 3d4111c Compare November 11, 2020 09:29
force_mask sets a permission mask used for the new files and
directories.

It is useful for using a NFS share for the rootless storage.  It
requires this change in fuse-overlayfs:

containers/fuse-overlayfs#246

[storage]
  driver = "overlay"
  graphroot = "/mnt/nfs/home/storage"
  [storage.options]
    size = ""
    mountopt = "xattr_permissions=2"
   [storage.options.overlay]
     force_mask = "0755"
     ignore_chown_errors = "true"

Signed-off-by: Giuseppe Scrivano <[email protected]>
if force mask is configured, store the original permissions in the
'user.containers.override_stat` xattr.

Signed-off-by: Giuseppe Scrivano <[email protected]>
@rhatdan
Copy link
Member

rhatdan commented Nov 11, 2020

LGTM

@rhatdan rhatdan merged commit 9721783 into containers:master Nov 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants