Skip to content

Comments

[EXPERIMENTAL] new mount type: introspection#26331

Closed
AkihiroSuda wants to merge 1 commit intomoby:masterfrom
AkihiroSuda:introspectionfs-wip
Closed

[EXPERIMENTAL] new mount type: introspection#26331
AkihiroSuda wants to merge 1 commit intomoby:masterfrom
AkihiroSuda:introspectionfs-wip

Conversation

@AkihiroSuda
Copy link
Member

@AkihiroSuda AkihiroSuda commented Sep 6, 2016

CURRENT STATUS (Apr 6, 2017): discussing the design about "scope": #26331 (comment)


What I did

Introduced a new mount type: "introspection".

The introspection mount is a new feature that allows you to introspect the
metadata about the container from the container itself, via a procfs-like
filesystem.

Hierarchy

If you enable the introspection mount, following files are created under the mount point:

  • container/id: ID string of the container. e.g. 2f3cc2b029e0ca46564d5a5e38772b09947056f3b22b6a114054a468382e872e\n
  • container/name: Name of the container. e.g. nginx.3.8tc0va0kw59rbbdh5x3iqc3v9\n
  • container/fullname: Full name of the container. e.g. /nginx.3.8tc0va0kw59rbbdh5x3iqc3v9\n
  • container/labels/{LABELNAME}: Label of the container. e.g. the content of container/labels/com.docker.swarm.service.name would be nginx\n
  • daemon/name: Hostname of the daemon node. e.g. host01\n

For Swarm task containers running on a manager node, following files appear as well:

  • service/id: ID string of the service. e.g. 6h7nic7tsv16cfo0qhywj7bsh\n
  • service/name: Name of the service .e.g. nginx\n
  • task/id: ID string of the task. e.g. 8tc0va0kw59rbbdh5x3iqc3v9\n
  • task/name: Name of the task. e.g. nginx.3.8tc0va0kw59rbbdh5x3iqc3v9
  • task/slot: Slot number (1-based index for replicas) of the task. e.g. 1\n. Please also refer to the documentation of the Swarmkit. Note that there are cases where a slot may h
    ave multiple tasks with the desired state of RUNNING.

Use cases for the introspection mount

Below are some example use cases for the introspection mount.

Deploying a service that requires the task slot number (e.g. Apache ZooKeeper)

Apache ZooKeeper is a highly available coordination service that is used by
distributed systems such as Hadoop. A typical configuration file (zoo.cfg)
for ZooKeeper would be as follows:

tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888

ZooKeeper also requires a file named myid to be located under dataDir.
The content of myid is 1\n for server.1, 2 for server.2, and so on.

The task/slot file under the introspection mount can be used for generating
the myid file.

See also #24110

3rd party job scheduler

A 3rd party job scheduler can be built on a Docker service using the
introspection mount.

For example, the task/slot file under the introspection mount can be used for
implementing a scheduler that executes multiple batch jobs in parallel.
(Similar to the {%} symbol in the GNU parallel.)

See also #23843

3rd party orchestration/monitoring tool

A container can send the container/id file under the introspection mount to
some 3rd party orchestration/monitoring tool. Then such a tool take appropriate
action using the ID information.

See also #7685

How I did it

Added a new mount type called "introspection".

The filesystem hiearchy is defined as the RuntimeContext struct in daemon/rtcontext.go.
The struct is mapped to the actual files via daemon/introspection.go.

How to verify it

Start the daemon with --experimental flag.

Then create a service with the introspection mount /foo:

$ docker service create --name nginx --replicas 3 --mount type=introspection,introspection-scope=.,dst=/foo nginx

Enter a container and read the files under /foo:

$ docker exec -it $(docker ps -q -f label=com.docker.swarm.service.name=nginx | head -n 1) sh
# find /foo
/foo
/foo/container
/foo/container/name
/foo/container/labels
/foo/container/labels/com.docker.swarm.service.id
/foo/container/labels/com.docker.swarm.node.id
/foo/container/labels/com.docker.swarm.task.name
/foo/container/labels/com.docker.swarm.task
/foo/container/labels/com.docker.swarm.task.id
/foo/container/labels/com.docker.swarm.service.name
/foo/container/id
/foo/container/fullname
/foo/task
/foo/task/name
/foo/task/slot
/foo/task/id
/foo/service
/foo/service/name
/foo/service/id
/foo/daemon
/foo/daemon/name
# cat /foo/task/slot
3

Note that the service directory and the task directory can be seen on manager nodes in the current implementation.

Description for the changelog

new mount type: introspection

A picture of a cute animal (not mandatory but encouraged)

penguin

Replaces #24893
Fix #24110
Update #23843, #24113, #26318

CC (those whose name appeared at #24893):
@stevvooe @justincormack @SvenDowideit @thaJeztah @dweomer @vbatts @dmcgowan @crosbymichael @tonistiigi

The code works, but maybe merging this PR should be deferred until several relevant PRs get merged (e.g. moby/swarmkit#1563, #26837, #26825)

TODOs

Some ideas might worth adding in future (in another PR)

For reference, an older version of this PR (which was implemented as a built-in volume driver) is available at https://web.archive.org/web/20161014053843/https://github.com/docker/docker/pull/26331 .

Signed-off-by: Akihiro Suda [email protected]

@justincormack
Copy link
Contributor

Hi, thanks, taking a look.

I really think map[string]string needs to be individual files ie a file for each label. If we are going to have files you should not have to do any parsing, not microformats. Especially as labels can be multiple paragraphs and contain line breaks.

@stevvooe
Copy link
Contributor

stevvooe commented Sep 6, 2016

@justincormack Is there a way we can do that without having user input become filenames?

@justincormack
Copy link
Contributor

@stevvooe hmm, probably no. We might have to filter keys that are not within the recommended guidelines for naming too.

@AkihiroSuda
Copy link
Member Author

Thank you for the comment and it makes sense 😃

2 questions:

  1. WDYT about escaping filenames? I think we can use net/url.QueryEscape.
  2. WDYT about dynamic label update Proposal: Update labels  #21721 ? There would be an atomicity problem if we allow updating multiple labels at once.

If we support updating multiple lables at once, I suggest creating a symlink from labels to a labels.snapshot.XXXXX directory, but I feel it is not pretty. So my opinion is not to support updating multiple labels at once.

@justincormack
Copy link
Contributor

  1. We already recommend strongly that label names be DNS names, so I would recommend only adding names that are valid DNS names to this interface (ie no initial ., 0-9a-zA-Z-., 64 characters max length). These should then be valid filenames on all systems, otherwise there will be other issues. Labels that are not DNS names would not be available in this interface at all.
  2. I don't think we should make any guarantee that multiple label updates are in any way a "transaction", in this or any other interface. If you have dependent data, put that as the value of a single label (eg hostname plus port; complex cases the user can have a json value for atomicity). So long as each individual label update is atomic that is fine.

@justincormack
Copy link
Contributor

justincormack commented Sep 7, 2016

Ok, trying it out a bit more, re other discussion points:

  1. re / in container name, hmm, not sure. This is confusing, as it is still not used anywhere.
  2. yes definitely remove the swarm directories if not a service, then it is obvious how to see if you are a service or not, it looks odd having them for non swarm containers.
  3. I don't think we need the v1 prefix. We can deprecate fields and add new ones, if we made radical changes we could add a v2 volume driver, as with this model you still have to choose to mount it. We could also expose a version file (not not really sure that is necessary).

My other concern at present is the container can't see where its introspection mountpoint is, so if it is mounted in a place it does not expect the scripts in the container won't work. I wonder if we should have a cli option --introspect which is short for say -v docker_introspection:/docker and so have a recommended mountpoint but still let you override it if you want?

@justincormack
Copy link
Contributor

One other thing people have asked for is host port info for published ports. We could add eg ports/tcp/80/HostPort for each published port. There is an issue about HostIp which is normally 0.0.0.0 which is not very helpful. This is also different in swarm mode, as you cant guarantee to reach a particular container, so maybe there are too many complexities to this for now (lets decide about design first).

@justincormack
Copy link
Contributor

The test failures don 't like the imports:

10:32:16 # github.com/docker/docker/daemon/cluster/executor/container
10:32:16 import cycle not allowed in test
10:32:16 package github.com/docker/docker/daemon/cluster/executor/container (test)
10:32:16    imports github.com/docker/docker/daemon
10:32:16    imports github.com/docker/docker/daemon/cluster
10:32:16    imports github.com/docker/docker/daemon/cluster/executor/container
10:32:16 
10:32:16 FAIL   github.com/docker/docker/daemon/cluster/executor/container [setup failed]

@ehazlett ehazlett added the status/failing-ci Indicates that the PR in its current state fails the test suite label Sep 8, 2016
@justincormack
Copy link
Contributor

Also needs a rebase...

@justincormack
Copy link
Contributor

@AkihiroSuda I think generally I like the overall design now. There is a bunch of stuff to resolve still and some decisions to make, but I think it is the right general direction...

@AkihiroSuda
Copy link
Member Author

@justincormack Thank you, I'll update this PR ASAP

@AkihiroSuda AkihiroSuda force-pushed the introspectionfs-wip branch 2 times, most recently from 2ee4fb6 to 4fffea9 Compare September 9, 2016 07:36
@AkihiroSuda
Copy link
Member Author

@justincormack Updated PR, please look into this?

Changes

Filesystem View

For #26331 (comment), #26331 (comment)

  • Now map[string]string are separate files. (daemon/introspect.go)
  • Prohibit a label that contains /, \, or : from appearing at the filesystem. For keeping compatibility, I didn't make the daemon itself to reject these bad labels. The introspector skips such a label and just calls logrus.Warn. (daemon/introspect.go)

For #26331 (comment)

  • Removed '/' prefix from the value of /container/name. However, I introduced /container/fullname that has the '/' prefix. I'm not sure we need this container/fullname file. (daemon/rtcontext.go)
  • Hide /task, /service, and node if the container is not a swarm task. (daemon/rtcontext.go, daemon/introspect.go)
  • Removed v1 prefix. We could safely add a v2 driver for future radical changes. (daemon/rtcontext.go)

Others (RFC!):

  • /task/name was always empty. So I switched to use the value of Container.Labels["com.docker.swarm.task.name"] rather than types/swarm.Task.Name.
  • There has been already dynamic label update for swarm related things. I removed these label files just for ease of the initial implementation.
  • Introduced /daemon (corresponds to types.Info). Currently, only /daemon/name and /daemon/labels are exposed. I think /daemon/labels is useful but not sure about /daemon/name.

Build

  • Rebased to upstream
  • Avoid import cyle ([EXPERIMENTAL] new mount type: introspection #26331 (comment)) by removing a import from github.com/docker/docker/daemon/cluster to github.com/docker/docker/daemon, and introducing the IntrospectableCluster interface in the daemon package. (daemon/cluster.go, daemon/daemon.go)

Future work (IMO they should be separate PRs?)


DEMO

$ docker run --rm  -v docker_introspection:/foo busybox sh -c 'for f in $(find /foo -type f);do echo ===$f===; cat $f; done'
===/foo/container/name===
romantic_noether
===/foo/container/id===
3b5f9e1bacf85424dbbbbd1373a0ae17ae00504fa1618059fa85bf9f4ad29e30
===/foo/container/fullname===
/romantic_noether
===/foo/daemon/name===
ws01
$ docker service create --name nginx --replicas 3 --mount type=volume,src=docker_introspection,dst=/foo nginx
$ docker exec $SOME_NGINX_CONTAINER bash -c 'for f in $(find /foo -type f);do echo ===$f===; cat $f; done'
===/foo/container/name===
nginx.1.8toszsdz7duypbbl8uwehlrbs
===/foo/container/labels/com.docker.swarm.service.id===
0g8upo0xzr0l7pnw41jcdpt1d
===/foo/container/labels/com.docker.swarm.node.id===
clhlkxk2gzbk3edhropygbknq
===/foo/container/labels/com.docker.swarm.task.name===
nginx.1
===/foo/container/labels/com.docker.swarm.task===
===/foo/container/labels/com.docker.swarm.task.id===
8toszsdz7duypbbl8uwehlrbs
===/foo/container/labels/com.docker.swarm.service.name===
nginx
===/foo/container/id===
e7a6c58f3cbfd6754158d8db284e08396df0cadafa1a4c0e338826a012d9ef67
===/foo/container/fullname===
/nginx.1.8toszsdz7duypbbl8uwehlrbs
===/foo/task/name===
nginx.1
===/foo/task/slot===
1
===/foo/task/id===
8toszsdz7duypbbl8uwehlrbs
===/foo/service/name===
nginx
===/foo/service/id===
0g8upo0xzr0l7pnw41jcdpt1d
===/foo/daemon/name===
ws01

Now I'm going to add tests and docs

@justincormack
Copy link
Contributor

The tests need tweaking to special case this volume driver:

08:16:06 
08:16:06 ----------------------------------------------------------------------
08:16:06 FAIL: docker_cli_run_test.go:4301: DockerSuite.TestRunNamedVolumeNotRemoved
08:16:06 
08:16:06 docker_cli_run_test.go:4309:
08:16:06     c.Assert(strings.TrimSpace(out), checker.Equals, "test")
08:16:06 ... obtained string = "" +
08:16:06 ...     "docker_introspection\n" +
08:16:06 ...     "test"
08:16:06 ... expected string = "test"
08:16:06 
08:16:07 
08:16:07 ----------------------------------------------------------------------
08:16:07 FAIL: docker_cli_run_test.go:4318: DockerSuite.TestRunNamedVolumesFromNotRemoved
08:16:07 
08:16:07 docker_cli_run_test.go:4331:
08:16:07     c.Assert(strings.TrimSpace(out), checker.Equals, "test")
08:16:07 ... obtained string = "" +
08:16:07 ...     "docker_introspection\n" +
08:16:07 ...     "test"
08:16:07 ... expected string = "test"
08:16:07 
08:16:07 

@AkihiroSuda AkihiroSuda force-pushed the introspectionfs-wip branch 3 times, most recently from 10ca8a1 to 9324360 Compare September 13, 2016 06:08
@AkihiroSuda
Copy link
Member Author

AkihiroSuda commented Sep 13, 2016

Now CI is almost passing except a flaky test #26506 .

A basic integration test is added as DockerSwarmSuite.TestSwarmServiceWithIntrospectionVolume.
I'll add more assertions

@cpuguy83 cpuguy83 removed the status/failing-ci Indicates that the PR in its current state fails the test suite label Sep 13, 2016
@justincormack
Copy link
Contributor

@AkihiroSuda one thing we discussed, and is (almost) possible now that #22373 is merged is that rather than using a volume driver we could use a special mount type (this would also require CLI support for swarm style mounts, but I think this is planned now). ie docker run --mount type=introspection,target=/run/introspection ...

Still needs some thought though cc @cpuguy83 @ehazlett

@justincormack
Copy link
Contributor

There is an issue here #8427 that wants to know memory allocation. Just linking - I don't think we should necessarily try to get everything into the initial PR.

@thaJeztah
Copy link
Member

Discussing this in the maintainers meeting, and we seem to agree that option 1 ("explicit") is the way to go; make "scope" a required option

Also discussed / vs ., and we'd prefer using a .

@AkihiroSuda AkihiroSuda force-pushed the introspectionfs-wip branch 2 times, most recently from 8245595 to e2c325c Compare April 13, 2017 09:01
@AkihiroSuda
Copy link
Member Author

AkihiroSuda commented Apr 13, 2017

Implemented scope.

Examples:

  • --mount type=introspection,target=/foo,introspection-scope=.
  • --mount type=introspection,target=/foo,introspection-scope=.container.labels,introspection-scope=.task

Error if no scope is set:

$ docker run -it --rm --mount type=introspection,target=/foo busybox
invalid argument "type=introspection,target=/foo" for --mount: at least one 'introspection-scope' is required. e.g. '.' (denotes all), '.containers.labels', '.task'
See 'docker run --help'.

Error if invalid scope is set:

$ docker run -it --rm --mount type=introspection,target=/foo,introspection-scope=foo busybox
docker: Error response from daemon: invalid scope: foo (valid scopes: [. .container .container.id .container.name .container.fullname .container.labels .daemon .daemon.name .daemon.labels]).

A scope name needs to start with a period ..
The scope name . allows all the files to be present.
This may look weird, but consistent with --format {{json .}} and so on.

This commit introduces a new mount type called "introspection".
The introspection mount allows users to introspect the metadata about the
container from the container itself, via a procfs-like filesystem.

How to test:

    $ docker service create --name nginx --replicas 3 --mount type=introspection,introspection-scope=.,dst=/foo nginx
    $ for f in $(docker ps -q -f label=com.docker.swarm.service.name=nginx);do docker exec $f sh -c 'for g in $(find /foo -type f); do echo ===$g===; cat $g; done'; done

Signed-off-by: Akihiro Suda <[email protected]>
@AkihiroSuda AkihiroSuda force-pushed the introspectionfs-wip branch from e2c325c to 50eb554 Compare April 13, 2017 09:16
@titpetric
Copy link

It was mentioned higher up that DNS records would be a valid introspection API. I assume this would be done with TEXT records on individual hosts entries, and would completely drop the need to add --mount to any containers (and I agree, UX for introspection should be available to all without an additional mount opt). Was some work done along these lines yet?

cc: @krasi-georgiev @AkihiroSuda

@AkihiroSuda
Copy link
Member Author

@titpetric

Not yet.
This PR just adds infrastructure part.
DNS and default mount should be separate GitHub issues/PRs

@AkihiroSuda
Copy link
Member Author

@thaJeztah PTAL?

@AkihiroSuda
Copy link
Member Author

@thaJeztah PTAL?
If SGTY, I'll split CLI part from this PR.

@AkihiroSuda
Copy link
Member Author

kindly ping 😃 @thaJeztah

@cpuguy83
Copy link
Member

What's the status here? Hard to follow along. This is labeled with code-review but some of the comments would suggest otherwise.

@AkihiroSuda
Copy link
Member Author

@cpuguy83

@thaJeztah suggested implementing "scope" before merging this PR
#26331 (comment)
#26331 (comment)

I implemented that feature in #26331 (comment) and waiting for his feedback 😅

  • --mount type=introspection,target=/foo,introspection-scope=. (enables all introspections)
  • --mount type=introspection,target=/foo,introspection-scope=.container.labels,introspection-scope=.task (enables /container/labels/* and /task/*)

If I can get SGTM, I'll split CLI part from this PR.

@jasonmp85
Copy link

Is there an estimate for when it will be simple to determine a container's ID from within the container?

@AkihiroSuda
Copy link
Member Author

It is implemented in this PR but still waiting for review 😅

@cpuguy83
Copy link
Member

Ping

@AkihiroSuda I like this.
Can we make scope (maybe renamed to "format") just a normal go-template on types.ContainerJSON with the default being the full json-ified object?

@AkihiroSuda
Copy link
Member Author

Can we make scope (maybe renamed to "format") just a normal go-template on types.ContainerJSON with the default being the full json-ified object?

SGTM
@justincormack @thaJeztah WDYT?

@vdemeester
Copy link
Member

@AkihiroSuda SGTM 👼

@thaJeztah
Copy link
Member

Trying to get my head round the Go template;

  • First of all; this means we are good with giving access to all properties of a container 👍👎

    • Do we want to limit what can be accessed?
    • Do we want to "abstract" it (perhaps a custom "introspection context" / struct), also to make the format consistent, even if the container JSON format may change?
  • Bit in doubt over changing to format; IIUC we'd only use the Go template to select a key, or do we see uses for actually doing formatting in other ways (again, trying to get my head around this 😅 - bear with me). If all we accept is the path to the information, scope could still make sense (🚲 🏠)

  • What would it look like to select multiple scopes? format={{.Foo}},{{.Bar}}? format={{.Foo}},format={{.Bar}} ?

  • What filename convention should we use for array-values, or are those values put in a single file? Thinking of {{.HostConfig.Dns}}:

    (using docker container inspect for reference; the introspection would look something like --mount type=introspection,target=/foo,introspection-format='{{.HostConfig.DNS}}')

    docker container create --dns=1.1.1.1 --dns=2.2.2.2 --name foobar busybox
    docker container inspect --format '{{.HostConfig.Dns}}' foobar
    [1.1.1.1 2.2.2.2]
    

    Will this produce (inside the container);

    • 2 files: /foo/HostConfig/Dns/0 and /foo/HostConfig/Dns/1
    • 1 file: /foo/HostConfig/Dns, containing [1.1.1.1 2.2.2.2]
    • 1 file: /foo/HostConfig/Dns.json, containing ["1.1.1.1","2.2.2.2"] (or should this be optional by using {{json .HostConfig.Dns}}?)
  • Slightly related to the above; will {{.NetworkSettings.Networks}}

    • create files for each network (/foo/NetworkSettings/Networks/bridge, /foo/NetworkSettings/Networks/mynetwork), each containing a (JSON?) representation of the network?
    • create directories for each network, and files for each property? (/foo/NetworkSettings/Networks/bridge/NetworkID, /foo/NetworkSettings/Networks/bridge/IPAddress)
    • Or will we require users to use a range .. ({{range .NetworkSettings.Networks}}{{println .IPAddress}}{{end}})?
  • Do we want files to be cased the same as the JSON, or (e.g.) snake_case or dash-names ?

@cpuguy83
Copy link
Member

@thaJeztah My thought was to have the actual object created by the template in the mount target (which must be specified for the mount).
So if your format is {{.HostConfig.Dns}}, then that's the value you get in the target path rather than trying to create a tree.

@AkihiroSuda
Copy link
Member Author

As Kubernetes has got dominance now, I'd suggest using DownwardAPIVolumeFile with Kubernetes: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.9/#downwardapivolumefile-v1-core

Field Description
fieldRef ObjectFieldSelector Required: Selects a field of the pod: only annotations, labels, name and namespace are supported.
resourceFieldRef ResourceFieldSelector Selects a resource of the container: only resources limits and requests (limits.cpu, limits.memory, requests.cpu and requests.memory) are currently supported.

However,

  • DownwardAPIVolumeFile seems to lack some fields discussed above, e.g. port mapping and DNS. We might want to open a PR in kubernetes repo.
  • If we want to bring this feature for Swarm-mode, probably we might want to hornor the design of DownwardAPIVolumeFile for interoperatibility. Also, maybe we want to discuss the possibility of common spec with other orchestrator (and runtime) projects.

I'm going to close this PR (Thank you all for spending your time on this PR!), but if somebody else wants to get this PR merged, please feel free to carry or just ping me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.