ccon

A single binary to handle basic container creation. The goal is to produce a lightweight tool in C that can serve as a test-bed for Open Container Specification development. Ccon is a thin wrapper around the underlying syscalls and kernel primitives. It makes it easy to apply a given configuration, but does not have an opinion about what a container should look like (it's even less opinionated than LXC).

Lifecycle

When you invoke it from the command line, ccon clones a child process to create any new namespaces declared in the config file. The parent process continues running in the host namespace. When the child process exits, the host process collects its exit status and returns it to the caller. During an initial setup phase, the two processes pass messages on a Unix socket to synchronize the container setup. Here's an outline of the lifecycle:

Host process	Container process
opens host executable
opens namespace files
clones child →	(clone unshares namespaces)
sets user-ns mappings	blocks on user-ns mappings
sends mappings-complete →
blocks on full namespace	joins namespaces
	mounts filesystems
	← sends namespaces-complete
runs pre-start hooks	blocks on exec-message
sends exec-message →
	opens the local ptmx
	← sends pseudoterminal master
waits on child death	executes user process
splicing standard streams	…
onto the pseudoterminal master
	dies
collects child exit code
runs post-stop hooks
exits with child's code

A number of those steps are optional. For details, see the relevant section in the configuration specification. In general, leaving out a particular value (e.g. namespaces.user.setgroups or namespaces.mount.mounts) will result in that potential action (e.g. writing to /proc/{pid}/setgroups or calling mount) being skipped, while the rest of ccon carries on as usual.

Users who need to join namespaces before unsharing namespaces can use nsenter or a wrapping ccon invocation to join those namespaces before the main ccon invocation creates the new mount namespace.
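
The synchronization in the lifecycle outline can be sketched in a few lines of Python. This is only an illustration of the message ordering, not ccon's C implementation: the function name run_handshake and the byte strings on the wire are assumptions (the message names are borrowed from the outline), and the namespace, mount, and hook work is reduced to comments.

```python
import os
import socket


def run_handshake(child_exit_code=0):
    """Fork a child, step through the sync messages, return the child's exit code."""
    # SOCK_SEQPACKET preserves message boundaries, so each recv() is one message.
    host_sock, container_sock = socket.socketpair(
        socket.AF_UNIX, socket.SOCK_SEQPACKET
    )
    pid = os.fork()
    if pid == 0:
        # Container process: block until the host reports each phase.
        status = 1
        try:
            host_sock.close()
            if container_sock.recv(64) == b"mappings-complete":
                # (a real runtime would join namespaces and mount filesystems here)
                container_sock.send(b"namespaces-complete")
                if container_sock.recv(64) == b"exec-message":
                    # (a real runtime would exec the user process here)
                    status = child_exit_code
        finally:
            os._exit(status)

    # Host process.
    container_sock.close()
    # (a real runtime would write the user-namespace mappings here)
    host_sock.send(b"mappings-complete")
    if host_sock.recv(64) != b"namespaces-complete":
        raise RuntimeError("container setup failed")
    # (a real runtime would run pre-start hooks here)
    host_sock.send(b"exec-message")
    _, wait_status = os.waitpid(pid, 0)
    host_sock.close()
    return os.WEXITSTATUS(wait_status) if os.WIFEXITED(wait_status) else -1
```

Calling run_handshake(7) forks, exchanges the three messages, and hands back the child's exit code, mirroring the “exits with child's code” step.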

Configuration

Ccon is similar to an Open Container Specification runtime in that it reads a configuration file named config.json from its current working directory. However, the JSON content is a bit different, to highlight how the components relate to each other on Linux. For example, setting per-container mounts requires a mount namespace, so ccon's mount listing falls under namespaces.mount.mounts. There's an example in config.json that unprivileged users should be able to use to launch an interactive BusyBox shell in new namespaces (you may need to adjust the hostID entries to match id -u and id -g).

If you want to use ccon to launch OCI bundles, you can use the ccon-oci wrapper (example), which supports the Open Container Specification and the runtime command-line API.

You can load the configuration from a different file by giving its path with the --config option. For example:

$ ccon --config path/to/config.json

or:

$ ccon --config /dev/fd/4 4<path/to/config.json

or (using Bash's process substitution):

$ ccon --config <(echo '{"version": "0.4.0", "process": …}')

You can also specify the config JSON directly on the command line with --config-string, which may be convenient in situations where using pipes or process substitution is too awkward:

$ ccon --config-string '{"version": "0.4.0", "process": …}'
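
When the config is generated by a program rather than typed by hand, --config-string avoids temporary files. A minimal sketch, assuming ccon is on your PATH; the helper name ccon_command is hypothetical:

```python
import json


def ccon_command(config):
    """Return an argv list that passes the config inline via --config-string."""
    # A list argv sidesteps shell quoting of the JSON text.
    return ["ccon", "--config-string", json.dumps(config)]


config = {"version": "0.4.0", "process": {"args": ["busybox", "sh"]}}
argv = ccon_command(config)
# To launch the container: subprocess.run(argv)
```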

There are additional examples focusing on specific tasks in the examples/ directory.

Version

The ccon version represented in the config file.

version (required, SemVer 2.0.0 string)

Example

"version": "0.4.0"

Namespaces

A set of namespaces to be created or joined by the container process. Keys match the long-form options from unshare and nsenter without their leading hyphens. For each namespace entry, the presence of a path key means the container process will join an existing namespace at the absolute path specified by the path value. The absence of a path key means a new namespace will be created. There may be additional per-namespace configuration in the namespace object. If there is no namespaces entry or its value is an empty object, the container process will inherit all its namespaces from the host process. Similarly, if a particular namespaces entry is missing (e.g. user), the container process will inherit that namespace from the host process.

namespaces (optional, object) containing entries for each new or joined namespace.

Example

"namespaces": {
  "uts": {},
  "net": {"path": "/proc/2186/ns/net"},
  "user": {"setgroups": false}
}

Which will create new UTS and user namespaces, join the network namespace at /proc/2186/ns/net, and disable setgroups in the new user namespace.
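
The create-or-join rule can be sketched as a small planner that splits a namespaces object into clone flags (for new namespaces) and paths (for joined ones). The CLONE_NEW* values are the Linux constants from sched.h; the function name plan_namespaces and the return shape are assumptions for illustration, not ccon internals.

```python
# CLONE_NEW* values from <sched.h> on Linux, keyed by the config's namespace names.
CLONE_FLAGS = {
    "mount": 0x00020000,  # CLONE_NEWNS
    "uts": 0x04000000,    # CLONE_NEWUTS
    "ipc": 0x08000000,    # CLONE_NEWIPC
    "user": 0x10000000,   # CLONE_NEWUSER
    "pid": 0x20000000,    # CLONE_NEWPID
    "net": 0x40000000,    # CLONE_NEWNET
}


def plan_namespaces(namespaces):
    """Return (clone_flags, join_paths) for a namespaces config object."""
    flags = 0
    join_paths = {}
    for name, settings in namespaces.items():
        if name not in CLONE_FLAGS:
            raise ValueError("unknown namespace: " + name)
        if "path" in settings:
            join_paths[name] = settings["path"]  # join an existing namespace
        else:
            flags |= CLONE_FLAGS[name]  # create a new namespace
    return flags, join_paths
```

Namespaces missing from the object contribute neither a flag nor a path, which matches the "inherit from the host process" behavior described above.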

User namespace

New user namespaces support the /proc/{pid}/{path} files setgroups, uid_map, and gid_map discussed in user_namespaces(7).

user (optional, object) which may contain:
- path (optional, string) the absolute path to a user namespace which the container process should join.
- setgroups (optional, boolean) whether to enable or disable setgroups. Implemented by writing to /proc/{pid}/setgroups.
- uidMappings (optional, array of objects) maps user IDs between the new namespace and its parent namespace. Implemented by writing to /proc/{pid}/uid_map. Array entries are objects with the following fields:
  - containerID (required, integer) is the start of the mapped UID range in the new namespace.
  - hostID (required, integer) is the start of the mapped UID range in the parent namespace.
  - size (required, integer) is the length of the range of mapped UIDs.
- gidMappings (optional, array of objects) maps group IDs between the new namespace and its parent namespace. Implemented by writing to /proc/{pid}/gid_map. Array entries are objects with the following fields:
  - containerID (required, integer) is the start of the mapped GID range in the new namespace.
  - hostID (required, integer) is the start of the mapped GID range in the parent namespace.
  - size (required, integer) is the length of the range of mapped GIDs.

Example

"user": {
  "setgroups": false,
  "uidMappings": [
    {
      "containerID": 0,
      "hostID": 1000,
      "size": 1
    }
  ],
  "gidMappings": [
    {
      "containerID": 0,
      "hostID": 1000,
      "size": 1
    }
  ]
},

Which will disable setgroups and map the host user and group 1000 to the container user and group 0.
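
To see how these settings relate to the proc files, the sketch below builds a user object for given host IDs and renders the text that user_namespaces(7) documents for uid_map and gid_map (one "inside outside length" line per mapping) and for setgroups ("allow" or "deny"). The helper names are hypothetical, and how ccon itself formats these writes is an assumption.

```python
import os


def default_user_namespace(host_uid=None, host_gid=None):
    """Build a namespaces.user object mapping the host IDs to container ID 0."""
    if host_uid is None:
        host_uid = os.getuid()  # same value as `id -u`
    if host_gid is None:
        host_gid = os.getgid()  # same value as `id -g`
    return {
        "setgroups": False,
        "uidMappings": [{"containerID": 0, "hostID": host_uid, "size": 1}],
        "gidMappings": [{"containerID": 0, "hostID": host_gid, "size": 1}],
    }


def id_map_text(mappings):
    """Render mappings in the uid_map/gid_map file format."""
    return "".join(
        "{} {} {}\n".format(m["containerID"], m["hostID"], m["size"])
        for m in mappings
    )


def setgroups_text(enabled):
    """Render the setgroups boolean as the string the proc file accepts."""
    return "allow" if enabled else "deny"
```

Calling default_user_namespace() without arguments fills in the caller's own IDs, which is the adjustment the Configuration section suggests for the hostID entries.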

Mount namespace

New mount namespaces support the creation of arbitrary mounts, assuming the caller has sufficient privileges for the underlying syscall. The user namespace documentation outlines the mount permissions for processes inside a user namespace.

mount (optional, object) which may contain:
- path (optional, string) the absolute path to a mount namespace which the container process should join.
- mounts (optional, array) an ordered list of mounts to perform. Array entries are objects with fields based on the mount call:
  - type (string) of mount (see filesystems(5)).
  - source (string) path of mount. This may be optional or required depending on type.
  - target (string, required) path of the mount being created or manipulated.
  - flags (array of strings, optional) MS_* flags to set.
  - data (string, optional) type-specific data for the mount.

If they don't start with a slash, source and target are interpreted as paths relative to ccon's current working directory.

In addition to the usual types supported by mount, ccon supports a pivot-root type that invokes the pivot_root syscall, shifting the old root to a temporary directory (after which the old root is unmounted and the temporary directory is removed). In that case, the only other field that matters is source, which specifies the path to the new root.

Example

"mount": {
  "mounts": [
    {
      "source": "rootfs",
      "target": "rootfs",
      "flags": [
        "MS_BIND"
      ]
    },
    {
      "source": "/etc/resolv.conf",
      "target": "rootfs/etc/resolv.conf",
      "flags": [
        "MS_BIND"
      ]
    },
    {
      "source": "root",
      "target": "rootfs/root",
      "flags": [
        "MS_BIND"
      ]
    },
    {
      "source": "rootfs",
      "type": "pivot-root"
    }
  ]
}

Which will bind ${PWD}/rootfs to itself (the “trick” mentioned in switch_root(8) which we need for the later pivot), bind the host's resolv.conf onto ${PWD}/rootfs/etc/resolv.conf, bind ${PWD}/root onto ${PWD}/rootfs/root, and pivot to make ${PWD}/rootfs the container root.
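
The relative-path rule and the flags list can be illustrated with a helper that turns one mounts entry into absolute paths and a numeric flag mask. The MS_* values are the Linux constants from sys/mount.h (only a partial table is shown); the name resolve_mount and the returned dict shape are assumptions for this sketch.

```python
import os

# Partial table of MS_* values from <sys/mount.h> on Linux.
MS_FLAGS = {
    "MS_RDONLY": 1,
    "MS_NOSUID": 2,
    "MS_NODEV": 4,
    "MS_NOEXEC": 8,
    "MS_BIND": 4096,
    "MS_REC": 16384,
    "MS_PRIVATE": 1 << 18,
}


def resolve_mount(entry, cwd):
    """Return a copy of entry with absolute paths and flags OR-ed into an int."""
    flags = 0
    for name in entry.get("flags", []):
        flags |= MS_FLAGS[name]
    resolved = dict(entry)
    resolved["flags"] = flags
    for key in ("source", "target"):
        if key in entry and not entry[key].startswith("/"):
            # Paths without a leading slash are relative to the working directory.
            resolved[key] = os.path.join(cwd, entry[key])
    return resolved
```

With cwd set to the directory ccon runs from, the first entry of the example resolves both source and target to that directory's rootfs, which is the ${PWD}/rootfs self-bind described above.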

PID namespace

There is no special configuration for the PID namespace, although if you are creating both a PID and a mount namespace, you probably want mount entries along the lines of:

{
  "target": "/proc",
  "flags": [
    "MS_PRIVATE",
    "MS_REC"
  ]
},
{
  "target": "/proc",
  "type": "proc",
  "flags": [
    "MS_NOSUID",
    "MS_NOEXEC",
    "MS_NODEV"
  ]
}

For more details, see the “/proc and PID namespaces” section of pid_namespaces(7).

pid (optional, object) which may contain:
- path (optional, string) the absolute path to a PID namespace which the container process should join.

Network namespace

There is no special configuration for the network namespace.

net (optional, object) which may contain:
- path (optional, string) the absolute path to a network namespace which the container process should join.

IPC namespace

There is no special configuration for the IPC namespace.

ipc (optional, object) which may contain:
- path (optional, string) the absolute path to an IPC namespace which the container process should join.

UTS namespace

There is no special configuration for the UTS namespace, although future work might build in support for sethostname.

uts (optional, object) which may contain:
- path (optional, string) the absolute path to a UTS namespace which the container process should join.

Process

After the container setup is finished, the container process can optionally adjust its state and execute the configured code. If process isn't specified, the container process will exit (with an exit code of zero) instead of executing a user process (which can be useful for the creation phase of a workflow that separates creation from execution).

process (optional, object) configuring the container process after the container is set up.

Example

"process": {
  "args": ["busybox", "sh"]
}

Which will execvpe a BusyBox shell with the host process's user and group (possibly mapped by the user namespace), working directory, and environment.

Terminal

If you launch ccon from a terminal (e.g. tty or test -t 0 return zero), your standard input is already a terminal and you probably don't need to worry about this setting. If you launch ccon from a non-terminal process (e.g. from a webserver that is communicating with the user over a socket), you may want to create a UNIX 98 pseudoterminal to do things like translate the user's control-C into SIGINT for the container.

Containers that do not pivot root, or that otherwise keep access to the host ptmx, can create such a pseudoterminal by opening the ptmx (e.g. with posix_openpt).

Containers that are pivoting to a new root and mounting their devpts with newinstance will want to ensure that the pseudoterminal is created using a devpts instance that will be accessible after the pivot, and there are a number of issues to consider.

terminal (optional, boolean) if true, the process will open its local /dev/ptmx (e.g. with posix_openpt), dup the pseudoterminal slave over its standard streams, and send the pseudoterminal master back to the host process. The host process will continually copy its standard input to that pseudoterminal master and the pseudoterminal master to its standard output.
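
The hand-off described for terminal can be sketched as follows: the child opens a pseudoterminal, sends the master to the parent as SCM_RIGHTS ancillary data, puts the slave on its standard streams, and execs. This is an illustration under assumptions, not ccon's code: it uses os.openpty in place of posix_openpt, needs Python 3.9+ for socket.send_fds and socket.recv_fds, only collects output (it does not forward standard input), and the name run_on_pty is hypothetical.

```python
import os
import socket


def run_on_pty(args):
    """Run args on a new pseudoterminal; return (output bytes, exit code)."""
    host_sock, container_sock = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        # Container process.
        try:
            host_sock.close()
            master, slave = os.openpty()
            socket.send_fds(container_sock, [b"pty"], [master])
            os.close(master)
            container_sock.close()
            for fd in (0, 1, 2):
                os.dup2(slave, fd)  # the slave replaces the standard streams
            if slave > 2:
                os.close(slave)
            os.execvp(args[0], args)
        finally:
            os._exit(127)  # only reached if setup or exec failed

    # Host process.
    container_sock.close()
    _msg, fds, _flags, _addr = socket.recv_fds(host_sock, 16, 1)
    host_sock.close()
    if not fds:
        os.waitpid(pid, 0)
        raise RuntimeError("no pseudoterminal master received")
    master = fds[0]
    output = []
    while True:
        try:
            chunk = os.read(master, 4096)
        except OSError:
            break  # Linux reports EIO once every slave descriptor is closed
        if not chunk:
            break
        output.append(chunk)
    os.close(master)
    _, wait_status = os.waitpid(pid, 0)
    code = os.WEXITSTATUS(wait_status) if os.WIFEXITED(wait_status) else -1
    return b"".join(output), code
```

Output read from the master has passed through the terminal line discipline, so newlines written by the child normally arrive as carriage return plus newline.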

Example

"args": ["sh"],
"terminal": true

User

Adjust the user and group IDs before executing the user-specified code.

uid (optional, integer) to setuid a different user.
gid (optional, integer) to setgid a different group.
additionalGids (optional, array of integers) for setgroups. See also namespaces.user.setgroups.

Example

"user": {
  "uid": 0,
  "gid": 0,
  "additionalGids": [5, 6]
}

Which will lead to a container process with id output like:

uid=0(root) gid=0(root) groups=0(root),5(tty),6(disk)
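
The order in which the three calls are made matters: once setuid has dropped privileges, setgroups and setgid would typically fail with EPERM. A minimal sketch of a safe ordering, assuming nothing about how ccon sequences the calls internally; apply_user and its ops parameter (which defaults to the os module and exists only so the order can be observed) are hypothetical.

```python
import os


def apply_user(user, ops=os):
    """Apply a process.user object: supplementary groups, then GID, then UID."""
    if "additionalGids" in user:
        ops.setgroups(user["additionalGids"])
    if "gid" in user:
        ops.setgid(user["gid"])
    if "uid" in user:
        # Last, because changing the UID may remove the right to do the rest.
        ops.setuid(user["uid"])
```

Keys that are absent are skipped, so an empty user object leaves the process credentials untouched.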

Current working directory

Change to a different directory before executing the configured code.

cwd (optional, string) to chdir to a different directory. If unset, the current directory will remain the same as the caller's working directory, unless there is a pivot-root entry in namespaces.mount.mounts, in which case the default working directory will be the new root.

Example

"cwd": "/root"

Capabilities

Define the minimum set of capabilities required for the container process. All other capabilities are dropped from all capability sets, including the bounding set, before executing the configured code.

capabilities (optional, array of strings) Set of CAP_* flags to set.

If unset, the container process will continue with the caller's capabilities (potentially increased in a child user namespace).

Example

"capabilities": [
  "CAP_NET_BIND_SERVICE",
  "CAP_NET_RAW"
]

Arguments

The command that the container process executes after container setup is complete. The process will inherit any open file descriptors; for example the standard streams (unless terminal is true) or systemd's SD_LISTEN_FDS_START.

args (optional, array of strings) holds command-line arguments passed to execvpe. The first argument (args[0]) is also used as the path, unless path is set.

If unset, the container process will exit with status zero instead of executing new code (see Process).

Example

"args": [
  "nginx",
  "-c",
  "/nginx.conf"
]

Which will execute an Nginx server using the configuration in /nginx.conf.

Path

Override args[0] with an alternate path (but the executed code will still see args[0] as its first argument).

path (optional, string) sets the path to the executed command. Paths without slashes will be resolved using the PATH environment variable.

Example

"args": ["sh"],
"path": "busybox"

Which will execute the first busybox executable found in your PATH with its argv[0] set to sh.
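
The relationship between args and path maps directly onto the two parameters of an exec call: one names the file to look up, the other supplies the argument vector. A rough Python equivalent, assuming a process-like dict with only args and path; run_process is a hypothetical helper, and it forks so the caller can collect the exit code.

```python
import os


def run_process(process):
    """Fork, exec the command described by process, and return its exit code."""
    args = process["args"]
    path = process.get("path", args[0])  # path replaces args[0] for lookup only
    pid = os.fork()
    if pid == 0:
        try:
            # Names without a slash are searched for in PATH; args becomes argv.
            os.execvpe(path, args, os.environ)
        finally:
            os._exit(127)  # only reached if the exec failed
    _, wait_status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(wait_status) if os.WIFEXITED(wait_status) else -1
```

The 127 fallback follows the shell convention for "command not found"; it is a choice made for this sketch, not documented ccon behavior.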

Host

Instead of looking up args[0] (or path) in the container mount namespace, look it up in the host mount namespace using the host PATH. This allows you to launch (via execveat, so you need Linux 3.19+) a statically-linked init process that only exists on the host.

host (optional, boolean) look up args[0] (or path) in the host mount namespace using the host PATH.

Example

"args": ["sh"],
"path": "busybox",
"host": true

Which will execute the first busybox executable found in the host PATH (looked up in the host mount namespace) with its argv[0] set to sh.

Environment variables

Override the host environment.

env (optional, array of strings) holds environment settings for execvpe.

If unset, the container process will use the environ it inherited from the host.

Example

"env": [
  "PATH=/bin:/usr/bin",
  "TERM=xterm"
]

Which will set PATH and TERM.
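
Each env entry is a NAME=value string, the same form found in a C environ array. The sketch below shows the override-or-inherit rule as a dict computation; environment_from_config is a hypothetical helper, and splitting on the first "=" is the usual environ convention rather than something this document specifies.

```python
import os


def environment_from_config(process):
    """Return the environment for a process object as a dict."""
    if "env" not in process:
        return dict(os.environ)  # unset: inherit the host environment
    env = {}
    for entry in process["env"]:
        # Only the first "=" separates name from value; later ones are data.
        name, _, value = entry.partition("=")
        env[name] = value
    return env
```

Because env replaces the inherited environment rather than extending it, any variable the command needs (PATH included) has to be listed explicitly.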

Hooks

Not all container-related functionality is built into ccon (the only setup handled by the host process is the /proc/{pid}/setgroups, etc., writes for user namespaces). For example, control group manipulation and veth network configuration should be handled with external tools. What ccon provides are hooks so you can call those external tools at the appropriate point in the lifecycle.

hooks (optional, object) configuring the hooks run for each hook-triggering event.

Example

"hooks": {
  "pre-start": [
    {
      "args": [
        "echo",
        "I'm a pre-start hook"
      ]
    }
  ],
  "post-stop": [
    {
      "args": [
        "echo",
        "I'm a post-stop hook"
      ]
    }
  ]
}

Which will just print messages to the host process's stdout for each hook-triggering event.

Pre-start hooks

Hooks run after the container setup is complete but before the configured process is executed. This is useful for additional container configuration (e.g. creating cgroups or performing network setup).

pre-start (optional, array of objects) holds process objects (like process except for stdin handling and the lack of host) to run after the pre-start event.

Each hook receives the container process's PID in the host PID namespace on its stdin. Its stdout and stderr are inherited from the host process (unless terminal is true). The hooks are executed in the listed order, the host process waits until each hook exits before executing the next, and a nonzero exit code from any hook will cause the host process to abandon further hook execution and SIGKILL the container process. The host process then resumes the usual lifecycle at “waits on child death”.

Example

"pre-start": [
  {
    "args": [
      "mkdir",
      "-p",
      "/sys/fs/cgroup/unified/nginx-0/container"
    ]
  },
  {
    "args": [
      "tee",
      "/sys/fs/cgroup/unified/nginx-0/container/cgroup.procs"
    ]
  }
]

Which will create new nginx-0 and nginx-0/container cgroups in the unified hierarchy (if they don't already exist) and add the container process to that cgroup.
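
The pre-start rules (PID on stdin, in-order execution, stop at the first nonzero exit) can be sketched with subprocess. The function name run_pre_start_hooks is hypothetical, the trailing newline after the PID is an assumption, and only the args field of each hook is handled.

```python
import subprocess


def run_pre_start_hooks(hooks, container_pid):
    """Run hooks in order; return how many exited zero before the first failure."""
    completed = 0
    for hook in hooks:
        result = subprocess.run(
            hook["args"],
            # The hook reads the container PID from its standard input.
            input="{}\n".format(container_pid).encode(),
        )
        if result.returncode != 0:
            break  # abandon the remaining hooks
        completed += 1
    return completed
```

A caller would compare the return value with len(hooks) and, on a shortfall, kill the container process before waiting on it.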

Post-stop hooks

Hooks run after the host process has reaped the container process. You could handle this in the shell with:

$ ccon; post_stop_hook_1; post_stop_hook_2

but the most common use will be cleaning up after pre-start hooks, and it's nice to configure both in the same place (the ccon config file).

post-stop (optional, array of objects) holds process objects (like process except for the lack of host) to run after the post-stop event.

Each hook's standard streams are inherited from the host process (unless terminal is true). The hooks are executed in the listed order, the host process waits until each hook exits before executing the next, and a nonzero exit code from any hook will cause the host process to print a message to stderr, after which it continues as if the hook had exited with zero.

Example

"post-stop": [
  {
    "args": [
      "rmdir",
      "/sys/fs/cgroup/unified/nginx-0/container"
    ]
  },
  {
    "args": [
      "rmdir",
      "/sys/fs/cgroup/unified/nginx-0"
    ]
  }
]

Which will remove the nginx-0/container and nginx-0 cgroups (such as those created by the pre-start example). This will only succeed if the cgroups are empty, so if you were using this in production it would be best to:

- Ensure there were no other processes in those cgroups (e.g. by creating a new PID namespace and adding all additional processes to that namespace before adding them to the nginx-0 cgroup tree).
- Use a tool like cgdelete to recursively remove nginx-0, which would also remove additional child cgroups beyond nginx-0/container that may have been added by other processes since nginx-0 was created.
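
If cgdelete is not available, the recursive removal can be approximated by removing directories children-first. A minimal sketch, assuming every cgroup in the tree is already free of processes; remove_cgroup_tree is a hypothetical helper and is not part of ccon.

```python
import os


def remove_cgroup_tree(root):
    """Remove root and every directory beneath it, deepest first."""
    # topdown=False yields the deepest directories first, so each rmdir
    # sees a directory whose child directories are already gone.
    for dirpath, _dirnames, _filenames in os.walk(root, topdown=False):
        os.rmdir(dirpath)
```

On a cgroup filesystem the control files inside each directory do not block rmdir, but a cgroup that still contains processes will make the call fail with an OSError.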

Dependencies

Linux headers for 3.19+ for execveat (sys-kernel/linux-headers on Gentoo).
The GNU C Library (sys-libs/glibc on Gentoo).
Jansson for JSON parsing (dev-libs/jansson on Gentoo).
libcap-ng for adjusting capabilities (sys-libs/libcap-ng on Gentoo).

Build dependencies

Ccon is pretty easy to compile, but to use the stock Makefile, you'll need:

A C compiler like GCC (sys-devel/gcc on Gentoo).
GNU Make (sys-devel/make on Gentoo).
pkg-config (dev-util/pkgconfig on Gentoo).

Development dependencies

indent (dev-util/indent on Gentoo). Invoke with make fmt.

Licensing

Because all the dependencies are GPL-compatible, ccon binaries can be distributed under the GPLv3+.
