Skip to content

Conversation

@mrunalp
Copy link
Contributor

@mrunalp mrunalp commented Aug 21, 2015

Closes #100
Signed-off-by: Mrunal Patel [email protected]

@wking
Copy link
Contributor

wking commented Aug 21, 2015

I think we need to specify the mechanism for configuring which file
descriptors should be leaked (although this is a runtime-caller issue
more than a bundle-author issue, so maybe it deserves a different spec
1).

The bundle author might want a way to map file descriptor names to a
particular semantic interpretation (e.g. so you can tell the
application that fd 3 is the listening socket), but you could handle
that with environment variables (2 and opencontainers/runc#200), so
I don't think we need any bundle-author docs beyond this note that it
may be possible to get additional file descriptors (although it would
be nice to link to the runtime-caller docs explaining how to configure
which file descriptors are leaked).

runtime-linux.md Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps pass instead of leak?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, I think we should settle on something other than "container process" as the word container means too many things today. "application" seems reasonable to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

container process should be container's init process i think. Do we want to pass extra file descriptors for non-init process ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dqminh what is an "init" process? In my mind "init" is /sbin/init and I don't think the runtime should care whether it is a non-init or init process.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm i was thinking of init process as the first process that is running inside the container, as we can execute multiple processes over the lifetime of the container.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

http://man7.org/linux/man-pages/man7/pid_namespaces.7.html calls it as the "init" process for the pid namespace. However, we could be in a situation where we don't use a new pid namespace.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm fine with calling it an application. I think we should standardize one term for it through out the specs. application sounds fine as user may want to run it inside a VM instead of container for example. Then process/init process is not so approriate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, application implies that there is only one application being run in the container whereas in reality there could be many.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mrunalp As defined today an OCI bundle only has a single application.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@philips Updated.

@conqerAtapple
Copy link

Since the use case here is running services inside containers with a mapped handle between the host and the container, why not use terminology like service, handles and host mapping?

@wking
Copy link
Contributor

wking commented Aug 28, 2015

The wording changes from 3174a03→22ede9f look good to me.

Do we want to suggest environment variables 1 and/or external specs
(like 2) for associating semantic meaning with file descriptors? Or
are we happy to leave that up to spec readers to figure out on their
own?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stdin, stdout, and stderr only seem to be attached between the runtime and the application when process.terminal is false. When it's true, the applications standard streams are attached to a TTY. See (1).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, what those fds point to depends on the TTY setting, but the main point being that these are open for the application process.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Fri, Aug 28, 2015 at 08:21:14AM -0700, Mrunal Patel wrote:

Yes, what those fds point to depends on the TTY setting, but the
main point being that these are open for the application process.

Ah, good point.

the standard streams should maybe not be in backticks. Alternatively,
these existing occurrences should be in backticks [1,2] ;).

@mrunalp
Copy link
Contributor Author

mrunalp commented Aug 28, 2015

@philips @crosbymichael @vbatts ping

@crosbymichael
Copy link
Member

LGTM

@LK4D4
Copy link
Contributor

LK4D4 commented Aug 28, 2015

@mrunalp I'm sorta failing to understand how this closes #100. Shouldn't this include how you can pass additional file descriptors?

@mrunalp
Copy link
Contributor Author

mrunalp commented Aug 28, 2015

@LK4D4 That is left as an implementation detail to the runtime. For e.g. using environment variables like in opencontainers/runc#231. What we concluded was that having a serializable field in the spec json for ExtraFields doesn't make much sense as the fds and what values they have is dynamic and determined at runtime. What could make sense instead over there is passing fds for specific files.

@LK4D4
Copy link
Contributor

LK4D4 commented Aug 28, 2015

@mrunalp Hmm, I don't like implementation detail, why we need anything in spec then? Why this field can't be in runtime?

@philips
Copy link
Contributor

philips commented Aug 28, 2015

LGTM

@philips
Copy link
Contributor

philips commented Aug 28, 2015

I would like for the section to expand to explain that some of the fds may be opened but connected to /dev/null (like stdin).

@mrunalp
Copy link
Contributor Author

mrunalp commented Aug 28, 2015

@LK4D4 Yeah, so we aren't adding anything to the spec. Just saying that the runtimes may pass additional fds. If we want to standardize that then we may have to prescribe environment variables like OCI_LISTEN_FDS and OCI_LISTEN_FDS_START but they will still have to set from the application specific ones to make sense.

@LK4D4
Copy link
Contributor

LK4D4 commented Aug 28, 2015

LGTM

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/dev/null should probably be in backticks.

@philips
Copy link
Contributor

philips commented Aug 29, 2015

Most everyone has LGTM'd this so I am going to fix the one nit about backticks around the /dev/null and merge.

@philips
Copy link
Contributor

philips commented Aug 29, 2015

Merged via e3ee431

@philips philips closed this Aug 29, 2015
@mrunalp
Copy link
Contributor Author

mrunalp commented Aug 29, 2015

@philips Thanks!

wking added a commit to wking/oci-command-line-api that referenced this pull request Dec 2, 2015
This landed in runC with [1], but the bundle-author <-> runtime specs
explicitly avoid talking about how this is set (since the
bundle-author doesn't care about the runtime-caller <-> runtime
interface) [2].  However, *this* spec is about the runtime-caller <->
runtime interface, so we need to document it here.

I've left LISTEN_PID [3,4] out, since I don't see how the
runtime-caller would choose anything other than 1 for its value.  It
seems like something that a process would have to set for itself
(because guessing the PID of a child before spawning it seems racy ;).
In any event, the runC implementation seems to set this to 1
regardless of what systemd passes to it [4].

I've borrowed Shishir's wording for the example [4].

[1]: opencontainers/runc#231
[2]: opencontainers/runtime-spec#113 (comment)
[3]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[4]: opencontainers/runc#231 (comment)

Signed-off-by: W. Trevor King <[email protected]>
wking added a commit to wking/oci-command-line-api that referenced this pull request Dec 2, 2015
This landed in runC with [1], but the bundle-author <-> runtime specs
explicitly avoid talking about how this is set (since the
bundle-author doesn't care about the runtime-caller <-> runtime
interface) [2].  However, *this* spec is about the runtime-caller <->
runtime interface, so we need to document it here.

I've left LISTEN_PID [3,4] out, since I don't see how the
runtime-caller would choose anything other than 1 for its value.  It
seems like something that a process would have to set for itself
(because guessing the PID of a child before spawning it seems racy ;).
In any event, the runC implementation seems to set this to 1
regardless of what systemd passes to it [4].

I've borrowed Shishir's wording for the example [4].

[1]: opencontainers/runc#231
[2]: opencontainers/runtime-spec#113 (comment)
[3]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[4]: opencontainers/runc#231 (comment)

Signed-off-by: W. Trevor King <[email protected]>
wking pushed a commit to wking/opencontainer-runtime-spec that referenced this pull request Jul 25, 2016
# Commands

## create

The --bundle [start-pr-bundle] and --pid-file options and ID argument
[runc-start-id] match runC's interface.

One benefit of the early-exit 'create' is that the exit code does not
conflate container process exits with "failed to setup the sandbox"
exits.  We can take advantage of that and use non-zero 'create' exits
to allow stderr writing (so the runtime can log errors while dying
without having to successfully connect to syslog or some such).
Trevor still likes the long-running 'create' API because it makes
collecting the exit code easier, see the entry under rejected-for-now
avenues at the end of this commit message.

### --pid-file

You can get the PID by calling 'state' [container-pid-from-state], and
container PIDs may not be portable [container-pid-not-portable].  But
it's a common way for interfacing with init systems like systemd
[systemd-pid], and for this first pass at the command line API folks
are ok with some Linux-centrism [linux-centric].

### Document LISTEN_FDS for passing open file descriptors

This landed in runC with [runc-listen-fds], but the bundle-author <->
runtime specs explicitly avoided talking about how this is set (since
the bundle-author didn't care about the runtime-caller <-> runtime
interface) [runtime-spec-caller-api-agnostic].  This commit steps away
from that agnosticism.

Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out,
since he doesn't see how the runtime-caller would choose anything
other than 1 for its value.  It seems like something that a process
would have to set for itself (because guessing the PID of a child
before spawning it seems racy ;).  In any event, the runC
implementation seems to set this to 1 regardless of what systemd
passes to it [listen-fds-description].

We've borrowed Shishir's wording for the example
[listen-fds-description].

## state [state-pr]

Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand
on the definition of our ops, 2015-10-13,
opencontainers#225, v0.4.0).  The state example is
adapted from runtime.md, but we defer the actual specification of the
JSON to that file.

The encoding for the output JSON (and all standard-stream activity) is
covered by the "Character encodings" section.  In cases where the
runtime ignores the SHOULD (still technically compliant), RFC 7159
makes encoding detection reasonably straightforward [rfc7159-s8.1].
The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although
these were dropped in RFC 7518 [rfc7518-aA], probably as a result of
removing the constraint that "JSON text" be an object or array
[rfc7518-aA]).  The hints should still apply to the state output,
because we know it will be an object.  If that ends up being too dicey
and we want to certify runtimes that do not respect their
operating-system conventions, we can add an --encoding option later.

## kill

Partially catch up with opencontainers/runtime-spec@be594153 (Split
create and start, 2016-04-01, opencontainers#384).  The
interface is based on POSIX [posix-kill], util-linux
[util-linux-kill], and GNU coreutils [coreutils-kill].

The TERM/KILL requirement is a minimum portability requirement for
soft/hard stops.  Windows lacks POSIX signals [windows-signals], and
currently supports soft stops in Docker with whatever is behind
hcsshim.ShutdownComputeSystem [docker-hcsshim].  The docs we're
landing here explicitly allow that sort of substitution, because we
need to have soft/hard stop on those platforms but *can't* use POSIX
signals.  They borrow wording from
opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for
platform.os and .arch, 2016-05-19, opencontainers#441) to
recommend runtime authors document the alternative technology so
bundle-authors can prepare (e.g. by installing the equivalent to a
SIGTERM signal handler).

# Command style

Use imperative phrasing for command summaries, to follow the practice
recommended by Python's PEP 257 [pep257-docstring]:

  The docstring is a phrase ending in a period. It prescribes the
  function or method's effect as a command ("Do this", "Return that"),
  not as a description; e.g. don't write "Returns the pathname ...".

The commands have the following layout:

  ### {command name}

  {one-line description}

  * *Options:* ...
  ...
  * *Exit code:* ...

  {additional notes}

  #### Example

  {example}

The four-space list indents follow opencontainers/runtime-spec@7795661
(runtime.md: Fix sub-bullet indentation, 2016-06-08,
opencontainers#495).  From [markdown-syntax]:

  List items may consist of multiple paragraphs.  Each subsequent
  paragraph in a list item must be indented by either 4 spaces or one
  tab...

Trevor expects that's intended to be read with "block element" instead
of "paragraph", in which case it applies to nested lists too.

And while GitHub supports two-space indents [github-lists]:

  You can create nested lists by indenting lines by two spaces.

it seems that pandoc does not.

# Global options

This section is intended to allow runtimes to extend the command line
API with additional options and commands as they see fit without
interfering with the commands and options specified in this document.

With regard to the statement "Command names MUST NOT start with
hyphens", the rationale behind this decision is to distinguish
unrecognized commands from unrecognized options
[distinguish-unrecognized-commands] because we want to allow (but not
require) runtimes to fail fast when faced with an unrecognized
command [optional-fail-fast].

# Long options

Use GNU-style long options to avoid ambiguous, one-character options
in the spec, while still allowing the runtime to support one-character
options with packing.  We don't specify one-character options in this
spec, because portable callers can use the long form, and not
specifying short forms leaves runtimes free to assign those as they
see fit.

# Character encodings

Punt to the operating system for character encodings.  Without this,
the character set for the state JSON or other command output seemed
too ambiguous.

Trevor wishes there were cleaner references for the
{language}.{encoding} locales like en_US.UTF-8 and UTF-8.  But
[wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't
find a more targetted UTF-8 link than just dropping folks into a
Unicode chapter (which is what [wikipedia-utf-8] does):

  The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

The TR35 link is for:

  In addition, POSIX locales may also specify the character encoding,
  which requires the data to be transformed into that target encoding.

and the POSIX §6.2 link is for:

  In other locales, the presence, meaning, and representation of any
  additional characters are locale-specific.

# Standard streams

The "MUST NOT attempt to read from its stdin" means a generic caller
can safely exec the command with a closed or null stdin, and not have
to worry about the command blocking or crashing because of that.  The
stdout spec for start/delete is more lenient, because runtimes are
unlikely to change their behavior because they are unable to write to
stdout.  If this assumption proves troublesome, we may have to tighten
it up later.

# Event triggers

The "Callers MAY block..." wording is going to be hard to enforce, but
with the runC model, clients rely on the command exits to trigger
post-create and post-start activity.  The longer the runtime hangs
around after completing its action, the laggier those triggers will
be.

For an alternative event trigger approach, see the discussion of an
'event' command in the rejected-for-now avenues at the end of this
commit message.

# Lifecycle notes

These aren't documented in the current runtime-spec, and may no longer
be true.  But they were true at one point, and informed the
development of this specification.

## Process cleanup

On IRC on 2015-09-15 (with PDT timestamps):

  10:56 < crosbymichael> if the main process dies in the container,
    all other process are killed
  ...
  10:58 < julz> crosbymichael: I'm assuming what you mean is you kill
    everything in the cgroup -> everything in the container dies?
  10:58 < crosbymichael> julz: yes, that is how its implemented
  ...
  10:59 < crosbymichael> julz: we actually freeze first, send the
    KILL, then unfreeze so we don't have races

## Container IDs for namespace joiners

You can create a config that adds no isolation vs. the runtime
namespace or completely joins another set of existing namespaces.  It
seems odd to call that a new "container", but the ID is really more of
a process ID, and less of a container ID.  The "container" phrasing is
just a useful hint that there might be some isolation going on.  And
we're always creating a new "container process" with 'create'.

# Other changes

This commit also drops the file-descriptor docs from runtime-linux.
It's unclear how these apply to runtimes APIs that are not based on
the command line / execve, and the functionality is covered by the
more tightly scoped LISTEN_FDS wording in the command-line docs.

# Avenues pursued but rejected (for now)

* Early versions of this specification had 'start' taking '--config'
  and '--runtime', but this commit uses '--bundle' [start-pr-bundle].

  The single config file change [single-config-proposal] went through,
  but Trevor would also like to be able to pipe a config into the
  'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path)
  [runc-config-via-stdin], and he has a working example that supports
  this without difficulty [ccon-config-via-stdin].  But since
  [runc-bundle-option] landed on 2015-11-16, runC has replaced their
  --config-file and --runtime-file flags with --bundle, and the
  current goal of this API is "keeping as much similarity with the
  existing runC command-line as possible", not "makes sense to Trevor"
  ;).  It looks like runC was reacting [runc-required-config-file] to
  strict wording in the spec [runtime-spec-required-config-file], so
  we might be able to revisit this if/when we lift that restriction.

* Having 'start' (now 'create') take a --state option to write state
  to a file [start-pr-state].  This is my preferred approach to
  sharing container state, since it punts a persistent state registry
  to higher-level tooling [punt-state-registry].  But runtime-spec
  currently requires the runtime to maintain such a registry
  [state-registry], and we don't need two ways to do that ;).

  On systems like Solaris, the kernel maintains a registry of
  container IDs directly, so they don't need an external registry
  [solaris-kernel-state].

* Having 'start' (now 'create') take an --id option instead of a
  required ID argument, and requiring the runtime to generate a unique
  ID if the option was not set.  When there is a long-running host
  process waiting on the container process to perform cleanup, the
  runtime-caller may not need to know the container ID.  However, runC
  has been requiring a user-specified ID since [runc-start-id], and
  this spec follows the early-exit 'create' from [runc-create-start],
  so we require one here.  We can revisit this if we regain a
  long-running 'create' process.

* Having a long-running 'create' process.  Trevor is not a big fan of
  this early-exit 'create', which requires platform-specific magic to
  collect the container process's exit code.  The ptrace idea in this
  commit is from Mrunal [mrunal-ptrace].

  Trevor has a proposal for an 'event' operation [event] which would
  provide a convenient created trigger.  With 'event' in place, we
  don't need the 'create' process exit to serve as that trigger, and
  could have a long-running 'create' that collects the container
  process exit code using the portable waitid() family.  But the
  consensus after this week's meeting was to table that while we land
  docs for the runC API [mimic-runc].

* Having a 'version' command to make it easy for a caller to report
  which runtime they're using.  But we don't have a use-case that
  makes it strictly necessary for interop, so we're leaving it out for
  now [no-version].

* Using 'sh' syntax highlighting [syntax-highlighting] for the fenced
  code blocks.  The 'sh' keyword comes from [linguist-languages].  But
  the new fenced code blocks are shell sessions, not scripts, and we
  don't want shell-syntax highlighting in the command output.

[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration
[container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376
  Subject: Add initial pass at a cmd line spec
[container-pid-not-portable]: opencontainers#459
  Subject: [ Runtime ] Allow for excluding pid from state
[coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html
[distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167
  Subject: Clarity for commands vs global options
[docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230
  moby/moby@bc503ca8 (Windows: [TP4] docker kill handling,
  2015-10-12, moby/moby#16997)
[event]: opencontainers#508
  Subject: runtime: Add an 'event' operation for subscribing to pushes
[github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists
[linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml
[linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39
[listen-fds-description]: opencontainers/runc#231 (comment)
  Subject: Systemd integration with runc, for on-demand socket
    activation
[markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list
[mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54
[no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.20