Description
The current version of the specification proposes a signature system based on
a verifiable executable, allowing agility in the calculation of cryptographic
content digests. A more stable approach would be to define a specific
algorithm for walking the container directory tree and calculating a digest.
We need to compare and contrast these approaches and identify one that can
meet the requirements.
The goal of this issue is to identify the full benefits of this approach and
decide on the level of flexibility we should provide in the specification. Such a
calculation would involve content in the container root, including the
filesystem and configuration.
Benefits and Cost
Let's review the features we get from digesting a container:
- Provide a common digest based on the on-disk container image. It should
be invariant to distribution methods. Any implementation that creates a container
distributed in any manner (tar, rsync, docker, rkt, etc.) will have a common
identifier to verify and sign.
- The digest should be cryptographically secure and verifiable across
implementations. Signing the digest should be sufficient to verify that a
container root file system has not been tampered with. We provide a common
base for pre-run verification.
- Such a digest should only be used to verify after building a container
root. Such a system is not a replacement for validation of content from an
untrusted source. Ensuring trust and content integrity is left to the content
distribution system.
We need to consider the following properties of any approach to achieve these goals:
- Security - Such a system needs to provide a sufficient level of security to
be useful. Content should be well-identified by its hash.
- Cost - Walking a filesystem tree is slow, and hashing all files is expensive
and wrecks the buffer cache. Minimizing this IO or not doing it at all is
ideal. We need to consider the cost against the benefits.
- Stability - The digest needs to be calculated at a time when the container
layout is not changing. It also needs to be reproducible across runtime
environments.
Requirements
We can take the above to define specific requirements for the digest:
- The digest will be made up of the hash of hashes of each resource in the
container.
- The order of the additions to the digest should be based on the lexical sort
order of the relative container path of the resource ensuring stability under
additions and deletions.
- Each resource should only be stat’ed and read once during a digesting process.
- Unless specifically omitted, the digest should include the following resource types:
- files
- directories
- hard links
- soft links
- character devices
- block devices
- named fifo/pipes
- sockets
- The digest of each resource must fix the following attributes:
- File contents
- File path relative to the container root.
- Owners (uid or names?)
- Groups (gid or names?)
- File mode/permissions
- xattr
- major/minor device numbers for block/char devices
- link target names for hard/soft links
- The digest should be re-calculable using information about only changed
files.
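The requirements above can be sketched as a small bash script. This is only an illustration, not a proposed standard: it assumes GNU coreutils (`stat -c`, `sort -z`, `sha256sum` in place of `shasum -a256`) and GNU find, identifies owners by numeric uid/gid, and leaves out xattrs and hard-link detection. The function names `resource_line`, `container_manifest` and `container_digest` are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# resource_line ROOT RELPATH (hypothetical helper)
# Prints "<sha256>  <relpath>", where the hash covers the relative path,
# type, mode, uid, gid, device numbers, link target and, for regular
# files, the content hash. xattrs and hard links are NOT handled here.
resource_line() {
    local rel=$2 full="$1/$2" meta target="" content=""
    # GNU stat (assumed): %F type, %a mode, %u uid, %g gid,
    # %t/%T major/minor device numbers in hex.
    # Without -L, stat describes a symlink itself, not its target.
    meta=$(stat -c '%F %a %u %g %t %T' "$full")
    if [ -L "$full" ]; then
        target=$(readlink "$full")
    elif [ -f "$full" ]; then
        content=$(sha256sum < "$full" | cut -d' ' -f1)
    fi
    printf '%s  %s\n' \
        "$(printf '%s\n%s\n%s\n%s\n' "$rel" "$meta" "$target" "$content" \
            | sha256sum | cut -d' ' -f1)" \
        "$rel"
}

# container_manifest ROOT
# Emits one resource_line per entry, in byte-wise lexical path order.
# The signatures directory is skipped so that signing does not change the digest.
container_manifest() {
    local root=$1 rel
    ( cd "$root" && find . -mindepth 1 -not -path './signatures' \
          -not -path './signatures/*' -print0 | LC_ALL=C sort -z ) |
    while IFS= read -r -d '' rel; do
        resource_line "$root" "${rel#./}"
    done
}

# container_digest ROOT
# The hash of the per-resource hashes.
container_digest() {
    container_manifest "$1" | sha256sum | cut -d' ' -f1
}
```

Calling `container_digest "$CONTAINER_PATH"` prints a single hex digest. Because the top-level value is a hash over per-resource lines, a cached manifest could in principle be patched for only the changed paths and rehashed without rereading unchanged files. Newline-separated fields are ambiguous for paths that contain newlines, so a real specification would need a stricter encoding.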
The Straw Man
The specification currently proposes the following approach to provide
a common "script" location for containers to provide a digest. It is included
here for reference.
Digest
The purpose of the "digest" step is to create a stable summary of the
content, invariant to irrelevant changes yet strong enough to detect tampering.
The algorithm for the digest is defined by an executable file, named “digest”,
directly in the container directory. If such a file is present, it can be run
with the container path as the first argument:
$ $CONTAINER_PATH/digest $CONTAINER_PATH
The nature of this executable is not important other than that it should run
on a variety of systems with minimal dependencies. Typically, this can be a
Bourne shell script. The output of the script is left to the implementation,
but it is recommended that the output adhere to the following properties:
- The script itself should be included in the output in some way to avoid
tampering
- The output should include the content, each filesystem path relative to the
root and any other attributes that must be static for the container to
operate correctly
- The output must be stable
- The only constraint is that the signatures directory should be ignored to
avoid the act of signing preventing the content from being verified
The following is a naive example:
#!/usr/bin/env bash
set -e

# emit content for building a hash of the container filesystem.
content() {
    root=$1
    if [ -z "$root" ]; then
        echo "must specify root" 1>&2
        exit 1
    fi
    cd "$root"
    # emit each regular file's content hash and path, skipping signatures;
    # sort in the C locale so the order does not depend on the environment
    find . -type f -not -path './signatures/*' -exec shasum -a256 {} \; | LC_ALL=C sort
    # emit the script itself to prevent tampering
    cat "$scriptpath"
}

# absolute path of this script, resolved before content() changes directory
scriptpath=$( cd "$(dirname "$0")" ; pwd -P )/$(basename "$0")
content "$1" | shasum -a256

The above is still pretty naive. It does not include permissions and users and
other important aspects. This is just a demo. Part of the specification
process would be producing a rock-solid, standard version of this script. It
can be updated at any time and containers can use different versions depending
on the use case.
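As a rough illustration of how a runtime might consume such a script, the sketch below runs the container's `digest` executable and compares its output with a previously recorded value. The function name `verify_container` and the idea of passing the expected value as an argument are assumptions for this example; how the expected value is signed and stored is outside what the straw man describes.

```shell
#!/usr/bin/env bash

# verify_container CONTAINER_PATH EXPECTED (hypothetical helper)
# Returns 0 if the digest script's output equals EXPECTED,
# 1 if it differs, and 2 if the script is missing or fails.
verify_container() {
    local container=$1 expected=$2 actual
    if [ ! -x "$container/digest" ]; then
        echo "no digest executable in $container" 1>&2
        return 2
    fi
    # Same invocation as in the straw man: container path as first argument.
    actual=$("$container/digest" "$container") || return 2
    [ "$actual" = "$expected" ]
}
```

Note that this executes code shipped inside the container being checked, which is why the straw man asks for the script itself to be part of the digested output.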
Goals
Let's use this issue to decide the following:
- Do we all agree on the benefits of generating a common digest and signature scheme for containers at the runtime level?
- Are there any benefits, trade-offs or considerations missed above?
- Should we provide algorithmic flexibility with a verified "script"
approach or should we define a very specific algorithm?