bundle: trade offs of schemes for bundle digest

The current version of the specification proposes a signature system based on
a verifiable executable, allowing agility in the calculation of cryptographic
content digests. A more stable approach would be to define a specific
algorithm for walking the container directory tree and calculating a digest.
We need to compare and contrast these approaches and identify one that can
meet the requirements.

The goal of this issue is to identify the full benefits of this approach and
decide on the level of flexibility we should provide in the specification. Such a
calculation would involve content in the container root, including the
filesystem and configuration.
### Benefits and Cost

Let's review the features we get from digesting a container:
1. Provide a common digest based on the on-disk container image. It should
   be invariant to distribution methods. Any implementation that creates a container
   distributed in any manner (tar, rsync, docker, rkt, etc.) will have a common
   identifier to verify and sign.
2. The digest should be cryptographically secure and verifiable across
   implementations. Signing the digest should be sufficient to verify that a
   container root file system has not been tampered with. This gives us a common
   base for pre-run verification.
3. Such a digest should only be used to verify _after_ building a container
   root. It is not a replacement for validation of content from an
   untrusted source. Ensuring trust and content integrity is left to the content
   distribution system.

We need to consider the following properties of any approach to achieve these goals:
1. Security - Such a system needs to provide a sufficient level of security to
   be useful. Content should be well-identified by its hash.
2. Cost - Walking a filesystem tree is slow, and hashing all files is expensive
   and wrecks the buffer cache. Minimizing this IO, or not doing it at all, is
   ideal. We need to consider the cost against the benefits.
3. Stability - The digest needs to be calculated at a time when the container
   layout is not changing. It also needs to be reproducible across runtime
   environments.
### Requirements

We can take the above to define specific requirements for the digest:
1. The digest will be made up of the hash of hashes of each resource in the
   container.
2. The order of the additions to the digest should be based on the lexical sort
   order of the relative container path of each resource, ensuring stability under
   additions and deletions.
3. Each resource should only be stat’ed and read once during a digesting process.
4. Unless specifically omitted, the digest should include the following resource types:
   1. files
   2. directories
   3. hard links
   4. soft links
   5. character devices
   6. block devices
   7. named pipes (FIFOs)
   8. sockets
5. The digest of each resource must fix the following attributes:
   1. File contents
   2. File path relative to the container root.
   3. Owners (uid or names?)
   4. Groups (gid or names?)
   5. File mode/permissions
   6. xattr
   7. major/minor device numbers for block/char devices
   8. link target names for hard/soft links
6. The digest should be re-calculable using information about only changed
   files.
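
As a rough illustration of requirements 1, 2 and 4, the following sketch builds a "hash of hashes": one line per resource, emitted in byte-wise order of the relative path, and then hashed as a whole. The function names (`sha256_stdin`, `resource_line`, `manifest`, `bundle_digest`) and the line format are hypothetical and not part of the specification. The sketch covers only regular files, directories and soft links, ignores owners, modes, xattrs and device nodes, and assumes paths without newlines.

```bash
#!/bin/sh
# Hypothetical sketch, not part of the specification.

# Hash stdin with whichever SHA-256 tool is installed; print only the hex digest.
sha256_stdin() {
    if command -v sha256sum >/dev/null 2>&1; then
        sha256sum | cut -d' ' -f1
    else
        shasum -a 256 | cut -d' ' -f1
    fi
}

# Print one manifest line for a resource: "<relative path> <type> <hash>".
# Assumes the current directory is the container root.
resource_line() {
    path=$1
    if [ -L "$path" ]; then
        # Soft link: hash the link target name, not what it points to.
        printf '%s link %s\n' "$path" "$(readlink "$path" | sha256_stdin)"
    elif [ -d "$path" ]; then
        # Directory: no content of its own, so hash the empty string.
        printf '%s dir %s\n' "$path" "$(printf '' | sha256_stdin)"
    elif [ -f "$path" ]; then
        printf '%s file %s\n' "$path" "$(sha256_stdin < "$path")"
    fi
}

# Print the manifest for a container root, skipping the signatures directory.
# LC_ALL=C makes the sort byte-wise and therefore locale-independent.
manifest() {
    (
        cd "$1" || exit 1
        find . ! -path . ! -path './signatures' ! -path './signatures/*' |
            LC_ALL=C sort |
            while IFS= read -r path; do
                resource_line "$path"
            done
    )
}

# The bundle digest is the hash of the sorted per-resource lines.
bundle_digest() {
    manifest "$1" | sha256_stdin
}
```

Calling `bundle_digest "$CONTAINER_PATH"` prints a single hex digest. One possible way to approach requirement 6 with this layout would be to cache the manifest and replace only the lines of changed resources before rehashing it, although that still requires knowing which resources changed.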
### The Straw Man

The specification currently proposes the following approach: each container
provides a "script" at a common location that produces its digest. It is included
here for reference.
#### Digest

The purpose of the "digest" step is to create a stable summary of the
content, invariant to irrelevant changes yet strong enough to detect tampering.
The algorithm for the digest is defined by an executable file, named “digest”,
directly in the container directory. If such a file is present, it can be run
with the container path as the first argument:

```
$ "$CONTAINER_PATH/digest" "$CONTAINER_PATH"
```
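
A caller might wrap this invocation roughly as follows. This is only a sketch: the function name `verify_bundle`, its return codes, and passing the expected digest as a second argument are assumptions for illustration. The specification does not say how the expected value is obtained; presumably it would come from a verified signature.

```bash
#!/bin/sh
# Hypothetical helper, not defined by the specification.
# Usage: verify_bundle <container path> <expected digest output>
# Returns 0 on match, 1 on mismatch, 2 if the digest could not be computed.
verify_bundle() {
    container=$1
    expected=$2

    # The digest step only applies when the bundle ships an executable "digest".
    if [ ! -x "$container/digest" ]; then
        echo "no digest executable in $container" >&2
        return 2
    fi

    # Run the bundle's own digest program with the container path as argument.
    actual=$("$container/digest" "$container") || return 2

    if [ "$actual" = "$expected" ]; then
        return 0
    fi
    echo "digest mismatch for $container" >&2
    return 1
}
```

Note that this runs code shipped inside the bundle before the bundle has been verified, which is one of the trade-offs of the script approach.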

The nature of this executable is not important other than that it should run
on a variety of systems with minimal dependencies. Typically, this can be a
Bourne shell script. The output of the script is left to the implementation,
but it is recommended that the output adhere to the following properties:
- The script itself should be included in the output in some way to avoid
  tampering.
- The output should include the content, each filesystem path relative to the
  root and any other attributes that must be static for the container to
  operate correctly.
- The output must be stable.
- The signatures directory must be ignored, so that the act of signing does not
  prevent the content from being verified. This is the only hard constraint.

The following is a naive example:

``` bash
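#!/usr/bin/env bash

# pipefail makes a failure inside content() fail the whole script instead of
# silently hashing empty output.
set -euo pipefail

# Resolve the absolute path of this script before content() changes directory.
scriptpath="$( cd "$(dirname "$0")" && pwd -P )/$(basename "$0")"

# Emit the content used to build a hash of the container filesystem.
content() {
    local root="${1:-}"
    if [ -z "$root" ]; then
        echo "must specify root" 1>&2
        exit 1
    fi

    cd "$root"

    # Emit the content hash and relative path of each regular file, skipping
    # the signatures directory. LC_ALL=C keeps the sort locale-independent.
    find . -type f -not -path './signatures/*' -exec shasum -a256 {} + | LC_ALL=C sort

    # Emit the script itself to prevent tampering.
    cat "$scriptpath"
}

content "${1:-}" | shasum -a256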
```

The above is still pretty naive. It does not include permissions and users and
other important aspects. This is just a demo. Part of the specification
process would be producing a rock-solid, standard version of this script. It
can be updated at any time and containers can use different versions depending
on the use case.
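
To give a feel for what including permissions and users might involve, here is a minimal sketch that lists the mode and numeric owner of every regular file in the same stable order. The function name `metadata` is hypothetical, and the sketch assumes GNU coreutils `stat` with its `-c` format option; BSD and macOS `stat` use different flags, which is exactly the kind of portability problem a standard script would have to solve.

```bash
#!/bin/sh
# Hypothetical sketch; assumes GNU coreutils `stat -c` (BSD/macOS stat differs).

# Print "<mode> <uid> <gid> <path>" for every regular file under the root,
# in byte-wise path order, skipping the signatures directory.
metadata() {
    (
        cd "$1" || exit 1
        find . -type f ! -path './signatures/*' |
            LC_ALL=C sort |
            while IFS= read -r path; do
                # %a = octal permissions, %u = numeric uid, %g = numeric gid
                printf '%s %s\n' "$(stat -c '%a %u %g' "$path")" "$path"
            done
    )
}
```

Output like this could be emitted by `content()` alongside the file hashes, so that a change of mode or owner changes the final digest. Whether numeric ids or names should be used is still the open question from the requirements.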
### Goals

Let's use this issue to decide the following:
- [ ] Do we all agree on the benefits of generating a common digest and signature scheme for containers at the runtime level? 
- [ ] Are there any benefits, trade offs or considerations missed above?
- [ ] Should we provide algorithmic flexibility with a verified "script"
  approach or should we define a very specific algorithm?

