bundle: trade offs of schemes for bundle digest #5

@stevvooe

The current version of the specification proposes a signature system based on
a verifiable executable, allowing agility in the calculation of cryptographic
content digests. A more stable approach would be to define a specific
algorithm for walking the container directory tree and calculating a digest.
We need to compare and contrast these approaches and identify one that can
meet the requirements.

The goal of this issue is to identify the full benefits of digesting containers
and to decide on the level of flexibility the specification should provide.
Such a calculation would cover content in the container root, including the
filesystem and configuration.

Benefits and Costs

Let's review the features we get from digesting a container:

  1. Provide a common digest based on the on-disk container image. It should
    be invariant to distribution methods: any implementation that creates a
    container, however it was distributed (tar, rsync, docker, rkt, etc.), will
    have a common identifier to verify and sign.
  2. The digest should be cryptographically secure and verifiable across
    implementations. Signing the digest should be sufficient to verify that a
    container root filesystem has not been tampered with. This provides a
    common basis for pre-run verification.
  3. Such a digest should only be used for verification after a container
    root has been built. It is not a replacement for validating content from
    an untrusted source; ensuring trust and content integrity is left to the
    content distribution system.
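To illustrate goal 1, here is a minimal sketch (not a proposed algorithm), assuming GNU coreutils and tar: a digest computed only from relative file paths and contents stays the same when the tree is shipped as a tar stream and a timestamp changes in transit. `content_digest` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU coreutils (sha256sum, touch -d) and tar.
set -euo pipefail

# content_digest ROOT (hypothetical): sha256 over "<sha256>  ./<path>" lines
# sorted bytewise by path; timestamps are deliberately not part of the input.
content_digest() {
  (cd "$1" && find . -type f -exec sha256sum {} + | LC_ALL=C sort -k2) |
    sha256sum | cut -d' ' -f1
}

src=$(mktemp -d); dst=$(mktemp -d); trap 'rm -rf "$src" "$dst"' EXIT
mkdir -p "$src/etc"
printf 'hello\n' > "$src/etc/motd"
printf '#!/bin/sh\n' > "$src/init"

# "Distribute" the tree as a tar stream, then change an mtime, which
# transport tools are free to do.
tar -C "$src" -cf - . | tar -C "$dst" -xf -
touch -d '2001-01-01' "$dst/etc/motd"

d_src=$(content_digest "$src")
d_dst=$(content_digest "$dst")
echo "source:      $d_src"
echo "distributed: $d_dst"
```

Both lines print the same value. Leaving timestamps out is a choice made for this sketch; it matches the attribute list in the requirements, which does not mention times.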

We need to consider the following properties of any approach to achieve these goals:

  1. Security - Such a system needs to provide a sufficient level of security to
    be useful. Content should be well-identified by its hash.
  2. Cost - Walking a filesystem tree is slow, and hashing all files is expensive
    and wrecks the buffer cache. Minimizing this IO, or avoiding it entirely, is
    ideal. We need to weigh the cost against the benefits.
  3. Stability - The digest needs to be calculated at a time when the container
    layout is not changing. It also needs to be reproducible across runtime
    environments.
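On cost, one way to avoid re-walking and re-reading the whole tree is to keep a per-file manifest and re-hash only the paths known to have changed, which is also what requirement 6 below asks for. This is a sketch under stated assumptions: GNU tools, paths without tabs or newlines, and a caller that already knows which paths changed. `full_manifest` and `update_manifest` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: incremental re-digest from a stored per-file manifest.
# Assumes GNU coreutils/awk and paths without tabs or newlines.
set -euo pipefail

# manifest_line ROOT PATH: one "<path><TAB><sha256>" record.
manifest_line() {
  printf '%s\t%s\n' "$2" "$(sha256sum < "$1/$2" | cut -d' ' -f1)"
}

# full_manifest ROOT: the expensive path, reading every regular file.
full_manifest() {
  (cd "$1" && find . -type f -print) | while IFS= read -r p; do
    manifest_line "$1" "$p"
  done | LC_ALL=C sort
}

# update_manifest ROOT MANIFEST PATH...: re-read only the listed paths.
update_manifest() {
  local root=$1 manifest=$2 p
  shift 2
  {
    # keep the stored record of every path that was not reported as changed
    printf '%s\n' "$@" |
      awk -F'\t' 'NR == FNR { changed[$0] = 1; next } !($1 in changed)' - "$manifest"
    # re-hash changed paths that still exist; deleted ones simply drop out
    for p in "$@"; do
      if [ -f "$root/$p" ]; then manifest_line "$root" "$p"; fi
    done
  } | LC_ALL=C sort
}

digest() { sha256sum | cut -d' ' -f1; }

root=$(mktemp -d); m=$(mktemp); trap 'rm -rf "$root" "$m"' EXIT
printf 'one\n' > "$root/a"
printf 'two\n' > "$root/b"
printf 'three\n' > "$root/c"
full_manifest "$root" > "$m"

# Modify ./a and delete ./b; only those two paths are reported as changed.
printf 'ONE\n' > "$root/a"
rm "$root/b"

incremental=$(update_manifest "$root" "$m" ./a ./b | digest)
full=$(full_manifest "$root" | digest)
echo "$incremental"
echo "$full"
```

The incremental result matches a full recomputation while reading only `./a`. The open question is where a trustworthy list of changed paths would come from; this sketch simply takes it as input.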

Requirements

We can take the above to define specific requirements for the digest:

  1. The digest will be made up of the hash of hashes of each resource in the
    container.
  2. The order of the additions to the digest should be based on the lexical sort
    order of each resource's path relative to the container root, ensuring
    stability under additions and deletions.
  3. Each resource should only be stat’ed and read once during a digesting process.
  4. Unless specifically omitted, the digest should include the following resource types:
    1. files
    2. directories
    3. hard links
    4. soft links
    5. character devices
    6. block devices
    7. named fifo/pipes
    8. sockets
  5. The digest of each resource must cover the following attributes:
    1. File contents
    2. File path relative to the container root.
    3. Owners (uid or names?)
    4. Groups (gid or names?)
    5. File mode/permissions
    6. xattr
    7. major/minor device numbers for block/char devices
    8. link target names for hard/soft links
  6. The digest should be re-calculable using information about only changed
    files.
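A minimal shell sketch of how these requirements might translate into code, assuming GNU find, stat and coreutils. It stats each resource once through `find -printf` (block and character devices need one extra `stat` call for major/minor numbers), hashes a per-resource record of type, mode, uid, gid, path and content/link target/device numbers, and then hashes the list of per-resource hashes sorted bytewise by path. Extended attributes and hard-link identity are left out, numeric uid/gid are used rather than names, and `bundle_digest` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch of a "hash of hashes" digest shaped by the requirements above.
# Assumes GNU find/stat/coreutils. Not covered: xattrs, and hard-link
# identity (hard links are hashed as ordinary files).
set -euo pipefail

sha() { sha256sum | cut -d' ' -f1; }

# bundle_digest ROOT: hash one record per resource, sort the per-resource
# hashes bytewise by relative path, then hash that sorted list.
bundle_digest() {
  (
    cd "$1"
    # find stats each resource once; fields are NUL-terminated so any file
    # name survives: type, octal mode, uid, gid, link target, relative path.
    find . -mindepth 1 -path ./signatures -prune -o \
      -printf '%y\0%m\0%U\0%G\0%l\0%P\0' |
      while IFS= read -r -d '' type && IFS= read -r -d '' mode &&
        IFS= read -r -d '' uid && IFS= read -r -d '' gid &&
        IFS= read -r -d '' target && IFS= read -r -d '' path; do
        case $type in
          f) extra=$(sha < "$path") ;;              # contents, read once
          l) extra=$target ;;                        # soft-link target name
          b | c) extra=$(stat -c '%t:%T' "$path") ;; # major:minor (hex)
          *) extra= ;;                               # d, p (fifo), s (socket)
        esac
        h=$(printf '%s\0' "$type" "$mode" "$uid" "$gid" "$path" "$extra" | sha)
        printf '%s\t%s\0' "$path" "$h"
      done
  ) | LC_ALL=C sort -z | sha
}

root=$(mktemp -d); trap 'rm -rf "$root"' EXIT
mkdir -p "$root/rootfs/etc" "$root/signatures"
printf 'hello\n' > "$root/rootfs/etc/motd"
ln -s etc/motd "$root/rootfs/motd"
mkfifo "$root/rootfs/fifo"

base=$(bundle_digest "$root")
chmod 700 "$root/rootfs/etc/motd"   # permissions are part of the digest
after_chmod=$(bundle_digest "$root")
echo "$base"
echo "$after_chmod"
```

Because each resource contributes an independent hash keyed by its path, adding or removing one resource only adds or removes one entry in the sorted list, which is the stability property requirement 2 relies on.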

The Straw Man

The specification currently proposes the following approach, which defines a
common location for a "script" that computes a container's digest. It is
included here for reference.

Digest

The purpose of the "digest" step is to create a stable summary of the
content, invariant to irrelevant changes yet strong enough to detect tampering.
The algorithm for the digest is defined by an executable file, named “digest”,
directly in the container directory. If such a file is present, it can be run
with the container path as the first argument:

$ $CONTAINER_PATH/digest $CONTAINER_PATH

The nature of this executable is not important other than that it should run
on a variety of systems with minimal dependencies. Typically, this can be a
Bourne shell script. The output of the script is left to the implementation,
but it is recommended that the output adhere to the following properties:

  • The script itself should be included in the output in some way to guard
    against tampering
  • The output should include the content, each filesystem path relative to the
    root and any other attributes that must be static for the container to
    operate correctly
  • The output must be stable
  • The only constraint is that the signatures directory should be ignored, so
    that the act of signing does not prevent the content from being verified
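The "stable" property and the signatures exclusion can be checked in isolation. This sketch, assuming GNU find and sort, sorts NUL-delimited paths bytewise (`LC_ALL=C`) so that directory read order, locale, and unusual file names (spaces, newlines) do not affect the output, and it prunes the signatures directory. `stable_digest` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: order- and name-independent digest. Assumes GNU find/sort/coreutils.
set -euo pipefail

# stable_digest ROOT (hypothetical): "<path>\0<sha256>\0" records, sorted
# bytewise by path; ./signatures is pruned so signing does not alter it.
stable_digest() {
  (
    cd "$1"
    find . -path ./signatures -prune -o -type f -print0 | LC_ALL=C sort -z |
      while IFS= read -r -d '' p; do
        printf '%s\0%s\0' "$p" "$(sha256sum < "$p" | cut -d' ' -f1)"
      done
  ) | sha256sum | cut -d' ' -f1
}

a=$(mktemp -d); b=$(mktemp -d); trap 'rm -rf "$a" "$b"' EXIT
# Same files created in opposite orders, so readdir order may differ.
for f in zeta 'with space' $'new\nline' alpha; do printf '%s' "$f" > "$a/$f"; done
for f in alpha $'new\nline' 'with space' zeta; do printf '%s' "$f" > "$b/$f"; done
mkdir "$b/signatures" && printf 'sig' > "$b/signatures/s"   # ignored

da=$(stable_digest "$a")
db=$(stable_digest "$b")
echo "$da"
echo "$db"
```

Sorting with a plain `sort` under a non-C locale could order the same names differently on another machine, which would break cross-implementation verification.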

The following is a naive example:

#!/usr/bin/env bash

# fail on errors, unset variables, and failures anywhere in a pipeline
set -euo pipefail

# emit content for building a hash of the container filesystem.

content() {

    root=$1

    cd "$root"

    # emit each file's content hash and relative path, sorted bytewise
    # (LC_ALL=C) so the order does not depend on the locale
    find . -type f -not -path './signatures/*' -exec shasum -a256 {} + | LC_ALL=C sort

    # emit the script itself to prevent tampering
    cat "$scriptpath"
}

if [ -z "${1:-}" ]; then
    echo "must specify root" 1>&2
    exit 1
fi

scriptpath=$( cd "$(dirname "$0")" ; pwd -P )/$(basename "$0")

content "$1" | shasum -a256

The above is still pretty naive. It does not include permissions and users and
other important aspects. This is just a demo. Part of the specification
process would be producing a rock-solid, standard version of this script. It
can be updated at any time and containers can use different versions depending
on the use case.
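On the runtime side, the straw-man flow might look like the following sketch: run `$CONTAINER_PATH/digest $CONTAINER_PATH` and compare its output with the value recorded when the container was signed. `verify_bundle` and the stand-in `digest` script are hypothetical, the sketch assumes GNU coreutils, and checking the signature over the recorded value is out of scope here.

```shell
#!/usr/bin/env bash
# Sketch: pre-run verification with a container-provided digest executable.
set -euo pipefail

# verify_bundle DIR EXPECTED (hypothetical): run DIR/digest as the straw man
# describes and compare its output with a previously signed value.
verify_bundle() {
  local dir=$1 expected=$2 actual
  [ -x "$dir/digest" ] || { echo "no digest executable in $dir" >&2; return 1; }
  actual=$("$dir/digest" "$dir")
  [ "$actual" = "$expected" ]
}

dir=$(mktemp -d); trap 'rm -rf "$dir"' EXIT
# Hypothetical stand-in digest script: hashes paths and contents, skipping
# signatures/; the script itself is hashed because it lives in the tree.
cat > "$dir/digest" <<'EOF'
#!/bin/sh
cd "$1" && find . -type f -not -path './signatures/*' -exec sha256sum {} + |
  LC_ALL=C sort | sha256sum
EOF
chmod +x "$dir/digest"
printf 'data\n' > "$dir/file"

signed=$("$dir/digest" "$dir")   # value recorded at signing time
verify_bundle "$dir" "$signed" && ok_before=yes || ok_before=no
printf 'tampered\n' > "$dir/file"
verify_bundle "$dir" "$signed" && ok_after=yes || ok_after=no
echo "before tampering: $ok_before, after: $ok_after"
```

Note that the runtime executes code shipped inside the container before the container is trusted; that trade-off is part of the script-versus-fixed-algorithm question below.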

Goals

Let's use this issue to decide the following:

  • Do we all agree on the benefits of generating a common digest and signature scheme for containers at the runtime level?
  • Are there any benefits, trade-offs or considerations missed above?
  • Should we provide algorithmic flexibility with a verified "script"
    approach or should we define a very specific algorithm?
