Skip to content

Conversation

@TBBle
Copy link
Contributor

@TBBle TBBle commented Mar 12, 2022

- What I did

Introduced a new Windows Device string syntax for the --devices and the api.types.container.DeviceMapping.PathOnHost.

The syntax is IDType://ID, reflecting an evolution and generalisation of the existing class/ID syntax. The latter remains supported specifically for class/X: IDType/ID for other IDTypes is not implemented to avoid future parsing ambiguity.

This syntax will function with both the --device parameter from https://github.com/docker/cli, and also when fed from Kubernetes (or other CRI client) via https://github.com/Mirantis/cri-dockerd, or dockershim where supported. Kubernetes and CRI are both currently agnostic about this format, passing the strings down from user-controlled APIs unchanged.

DeviceMapping no long accepts-and-ignores values in the PathInContainer or CgroupPermissions fields for Windows Devices. (This will be a separate to-be-created PR, as this was a more-impactful change than I realised)

For reference, containerd has just implemented the same syntax for both CRI and the ctr CLI, to be part of the 1.7 release series. When a more-specific CRI type is implemented for Windows Devices, then cri-dockerd (or other CRI implementation) will be responsible for packing the required data into this format for consumption by the daemon.

Windows Devices are no-longer limited to Process Isolation mode, as there are IDTypes that are also supported for HyperV Isolation, and I don't think it's desirable to repeat the mapping of which IDTypes work with which Isolation mode here.

Note: Due to limitations of the HCS v1 API used when not using the containerd backend:

  • IDTypes other than class or vpci-class-guid are not supported without the containerd backend.
  • Assigning devices to HyperV-isolated containers is not supported without the containerd backend.

Because these depend on the backend in-use, they are reported as failures from the backend via ContainerStart when not supported. Previously, ContainerCreate rejected both of these cases unconditionally, so these restrictions are looser than they were before this change.

Practically, these limitations from using the HCS v1 API won't matter, since Windows Server 2019 and earlier do not support the other IDType mappings, even if using the HCS v2 API via containerd, and Windows Server 2022 and later is intended to default to using the containerd backend and the HCS v2 API in Docker 21.xx. The only other Windows Server product not already end-of-life is SAC 20H2, which does support the newer ID types. That product, the final Windows Server SAC release, reaches end-of-life in August 2022.

- How I did it

Changed the code that split DeviceMapping{PathOnHost: "class/X"} into WindowsDevice{IDType:"class", ID: "X"} to also split DeviceMapping{PathOnHost: "W://X"} into WindowsDevice{IDType:"W", ID: "X"}

- How to verify it

Unit test coverage for the parser.

Integration tests covering class/5B45201D-F2F2-4F3B-85BB-30FF1F953599, class://5B45201D-F2F2-4F3B-85BB-30FF1F953599 and vpci-class-guid://5B45201D-F2F2-4F3B-85BB-30FF1F953599 for both Process Isolation and HyperV isolation, supported by both containerd and HCS v1-based backends, where possible.

Note: Integration tests were having issues with HyperV isolation, so they have been skipped except for the early-failure case. It turns out we currently have no integration tests for HyperV isolation running on CI. This is possibly a nested-VM issue on the CI side, so it may vary by build-host config. I saw one pass for the skipped tests on an RS5 node.

Manually test exactly like the existing --device CLI flag on Windows, except using class:// where you would currently use class/, e.g. class/5B45201D-F2F2-4F3B-85BB-30FF1F953599 and class://5B45201D-F2F2-4F3B-85BB-30FF1F953599 should produce identical results.

hcshim defines a bunch of other possible IDTypes, but platform support and use-cases will vary, so I can't really advise on how to test them. And we don't deal with them here anyway, they go on opaquely down to hcshim to be handled.

- Description for the changelog

Windows Devices described by string, e.g. --devices in the CLI, now support the syntax IDType://ID as well as class/ID. Supported IDTypes depend on the active backend and other parameters of the container being created.

- A picture of a cute animal (not mandatory but encouraged)

Baby Animal

@TBBle TBBle force-pushed the generalised-Windows-device-syntax branch 2 times, most recently from 1d12509 to 3add247 Compare March 12, 2022 11:08
@thaJeztah thaJeztah added platform/windows status/1-design-review kind/enhancement Enhancements are not bugs or new features but can improve usability or performance. labels Mar 12, 2022
@TBBle TBBle force-pushed the generalised-Windows-device-syntax branch 14 times, most recently from 89f51ac to 87dfecb Compare March 13, 2022 08:32
@TBBle TBBle marked this pull request as ready for review March 13, 2022 10:14
@TBBle

This comment was marked as resolved.

@TBBle TBBle force-pushed the generalised-Windows-device-syntax branch 2 times, most recently from 26651f9 to 2def881 Compare March 13, 2022 14:22
out, _ = dockerCmd(c, "ps", "-q", "-f", "status=created")
containerOut := strings.TrimSpace(out)
assert.Assert(c, strings.HasPrefix(cID, containerOut))
assert.Assert(c, strings.Contains(containerOut, shortCID), "Should have seen '%s' in ps output:\n%s", shortCID, out)
Copy link
Contributor Author

@TBBle TBBle Mar 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test was slightly faulty/weird... The strings.HasPrefix args were transposed, so it was checking if the containerID has the ps -q output as a prefix, which of course only works if the ps -q output lists only the container we want (in its truncated form). So a parallel run of another test could easily have broken it, for example.

It showed up here because one of my new tests earlier has left a container in the "created" state, due to failure being delayed until "run". I was considering cleaning up state at the end of the test, but none of the other tests in that file do that, and of course if the test fails, clean-up may not be determinate.

This line is now more consistent with earlier in the same test, and also the other tests in this file which either use the same 12-character truncated ID, or call docker ps -q --no-trunc in order to compare the full container ID.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing that; perhaps this was done to prevent any other string in the output (errors?) from causing a false "pass" (only guessing here). I expect that would only occur if the command failed, in which case the test would already be marked as failed, so I agree that this approach makes sense.

There's a lot of (some quite old) tests in integration-cli that could use some TLC some of them are quite brittle (and indeed running them in parallel, or in some cases, possible leftovers from previous tests, could make them fail).

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Generally, I think this change sounds good (I'd yet have to do a full review though). This change (indirectly) updates the API (as container create accepts the HostConfig struct). Can you add a mention in docs/api/version-history.md ?

Some thoughts I had while glancing over; most of this isn't new, and may require more substantial changes, so I would consider them really out of scope for this PR (but worth further discussion)

  • The swagger documentation for the Devices struct is really under-documented see this section; that's already the case, so something to improve separately.
  • The errors returned don't have an errdef-type, and looking at daemon.startContainer() it looks like they're converted to a System error (i.e. a 500 status on the API). This is also an existing issue (so for separate), and relates to the next point;
  • It's unfortunate that we do all of this when creating the OCI Spec, which happens on container start. This means that a container can be created without error, even if fields are invalid, only to discover there's a problem when the container is started. While some of the validation / errors at "start" can't be avoided as they will occur when (e.g.) a device isn't there (anymore), and we also must perform checks when starting already existing containers, but ideally, we'd have handling for this on (before) container create, so that the user gets an error reported (400 invalid parameter) and no "faulty" container is created at all.
  • To assist with the above, perhaps we should consider adding a type=device to the Mount API (docker run --mount type=device,src=xxx,dest=xxx); see #39293 (comment) (but I think there's other issues discussing that option as well).

@TBBle
Copy link
Contributor Author

TBBle commented Mar 16, 2022

Thanks! Generally, I think this change sounds good (I'd yet have to do a full review though). This change (indirectly) updates the API (as container create accepts the HostConfig struct). Can you add a mention in docs/api/version-history.md ?

Sure, will do. I'm assuming that 1.42 is API version for 21.xx, so I don't need to bump the API again. It seemed like the changelog here is more about structural changes than semantics, there's no mention of when class/ was introduced as being snuck through the existing PathOnHost field. Looking again, I realise that the Devices array is never mentioned, so in hindsight I made a faulty assumption.

Some thoughts I had while glancing over; most of this isn't new, and may require more substantial changes, so I would consider them really out of scope for this PR (but worth further discussion)

  • The swagger documentation for the Devices struct is really under-documented see this section; that's already the case, so something to improve separately.

I did look at the Swagger docs, and they didn't seem to document the content or semantics of the DeviceMapping at all (particularly platform-specifics) so I left them alone. I recall needing to look up how to do multiple examples for a single field in OpenAPI 2, but don't recall if I actually found an answer. (I do know this changed in OpenAPI 3 with an examples field instead.).

  • The errors returned don't have an errdef-type, and looking at daemon.startContainer() it looks like they're converted to a System error (i.e. a 500 status on the API). This is also an existing issue (so for separate), and relates to the next point;
  • It's unfortunate that we do all of this when creating the OCI Spec, which happens on container start. This means that a container can be created without error, even if fields are invalid, only to discover there's a problem when the container is started. While some of the validation / errors at "start" can't be avoided as they will occur when (e.g.) a device isn't there (anymore), and we also must perform checks when starting already existing containers, but ideally, we'd have handling for this on (before) container create, so that the user gets an error reported (400 invalid parameter) and no "faulty" container is created at all.

Yeah, this was a big disappointment when I realised it was the case. It would be possible to add something to daemon.containerCreate to validate that the device syntax is correct, but the interesting errors of "valid syntax but IDType unknown to hcsshim" depend on calls that come later that daemon.createSpec, when turning OCI spec into HCS spec. hcsshim doesn't expose any API for validating these, AFAIK, particularly if you want to know if an IDType is not just known at all, but is valid in context of the rest of the spec.

  • To assist with the above, perhaps we should consider adding a type=device to the Mount API (docker run --mount type=device,src=xxx,dest=xxx); see #39293 (comment) (but I think there's other issues discussing that option as well).

That's an interesting idea, but I agree it's a bit out-of-scope for what I'm looking at here (because it's CLI usage, which was really just a nice bonus). It also doubles-down on the idea that for Windows devices, we're shoehorning device info via a string, since we're looking at --mount type=device,src=class://GUID, and that needs to end up with dest="", not dest=src which might be the obvious default.

In the end, my main focus is to ensure that dockershim/cri-dockerd users aren't getting shut out of the new niceness I added to containerd. I don't know how valuable the IDTypes other than the already-functional class are to other Docker CLI/API users.

Because of this, it would be quite reasonable for example to say that for the Docker API, it should be PathOnHost: IDType, PathInContainer: ID, with an exception for existing Path: class/GUID behaviour (to cover current CLI and dockershim), and then it's up to the cri-dockerd code to parse the CRI HostPath :// into two fields instead of dockerd. That would also fit closer with the --mount type=device idea, particularly if the src and dest flags could take idtype and id as aliases, which is why I note this. (Long term, CRI should receive IDType and ID as two separate fields, so at that point cri-dockerd would be pasting them back into a single string for PathOnHost, which is odd but not unbearable.)

And CDI will make things slightly weird again, although perhaps dockerd won't deal with that, if it's implemented at the CRI layer. (Although its current spec'd as modding an OCI spec, so I guess it would have to be supported in dockerd due to the way cri-dockerd is implemented? Or reimplement parts as modding a Docker API HostConfig instead.)

Looking at the OpenAPI spec and version changelog, I remember now that I was wondering if DeviceRequest would have been a better place in the API to support IDType+ID devices like this. That would require work in both the CLI and cri-dockerd; that's not a problem, just an observation. And the bulk of this work (getting tests in place and rounding a few rough edges in the existing PathOnHost: class/GUID support) remains valid.


If the :// separator itself is a concern, it'd be good to haggle that out soon, ideally aligning with containerd before it becomes a release build. Although I'm not really seeing any objection to that aspect here, which is nice.

Note: I won't get a chance to get back to this code-wise until this coming weekend. I'm not seeing any complex or structure-changing changes to be made at this point, so I expect I can knock them out pretty easily given a couple of hours.

@thaJeztah
Copy link
Member

Quick reply to some of the above (have to jump out, so may add more later)

Sure, will do. I'm assuming that 1.42 is API version for 21.xx, so I don't need to bump the API again

Correct; 1.42 is the "upcoming" version

there's no mention of when class/ was introduced

Yeah, I suspect there have been various changes that sneaked through in the past (some of them because they made changes "indirectly" through structs or parsing like the one that's being affected here).

I did look at the Swagger docs, and they didn't seem to document the content or semantics of the DeviceMapping at all (particularly platform-specifics) so I left them alone.

That's ok; we still have a lot of things to address in the swagger file(s); I'm trying to tackle them one at a time (but there's many things under-documented).

@thaJeztah thaJeztah added this to the 21.xx milestone Mar 24, 2022
Copy link
Contributor

@corhere corhere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes to the handling of the PathInContainer and CgroupPermissions fields on Windows containers is going to have unintended consequences. Everything else LGTM. Since that API change is tangential to introducing the :// syntax for devices, would you consider spinning commit b63a072 out into its own PR?

Comment on lines 461 to 467
if deviceMapping.PathInContainer != "" {
return nil, errors.Errorf("device container path: '%s' should be empty", deviceMapping.PathInContainer)
}

if deviceMapping.CgroupPermissions != "" {
return nil, errors.Errorf("device cgroup permissions: '%s' must be empty", deviceMapping.CgroupPermissions)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A container's HostConfig is serialized to disk so it's possible for these fields to be set on an existing Windows container. Such containers will fail to start until the container configuration is updated to zero out the offending fields. Since the presence of these fields is otherwise harmless, we should probably allow existing containers to start even if these fields are set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I wasn't aware of that. Definitely happy to pull that change into into a separate PR, and perhaps rework it to deal with this correctly.

Comment on lines 53 to 56
* `POST /containers/create` for Windows containers now rejects non-empty values in
`HostConfig.Resources.Devices.PathInContainer` and
`HostConfig.Resources.Devices.CgroupPermissions` that were previously silently ignored.
This change is not versioned, and affects all API versions if the daemon has this patch.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't appear that the described behavior is aligned with the implementation. func (*Daemon) createSpec is only called in one place: func (*Daemon) containerStart. Therefore PathInContainer and CgroupPermissions will not be checked until one attempts to start the container.

Could this API change be versioned such that for API requests with version <= 1.41, the existing accept-but-ignore behavior is retained?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good catch, that's a cut-and-paste error on my part. It's "create" on the containerd/local-containerd side, but that's part of "start" on the Docker API side. I haven't looked at how API versioning works, I can look into that after breaking out into a separate PR.

@TBBle TBBle force-pushed the generalised-Windows-device-syntax branch 2 times, most recently from 0e11c1e to 0e10fb5 Compare March 26, 2022 04:05
@TBBle
Copy link
Contributor Author

TBBle commented Mar 26, 2022

I've broken out the PathInContainer and CgroupPermissions changes into a separate branch (https://github.com/TBBle/moby/tree/reject-non_empty-hostconfig-device-resources-we-cannot-use-on-windows) but I haven't created a PR yet since it's based on this one (adds to tests, changes a function created here). Once this lands, I'll rebase and create a draft PR to rework it based on @corhere's feedback.

That said, since it wasn't really a blocking change in any way (just seemed like low-hanging fruit, but not as low-hanging as I thought) it's not a high priority for me to advance that new PR right now.

(Edit a year later: I totally forgot about pushing this out into a new PR. I probably won't get around to it for another year (focused elsewhere), but the above branch still exists as it was... I'll try and rebase it to current at some point.)

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you! changes LGTM

unfortunately, looks like this needs one more rebase due to a conflict in the api/changes.md - happy to do a rebase for you if you prefer (should probably be easy to resolve)

Comment on lines +316 to +317
// these represent an Interface Class GUID.
if d.IDType != "class" && d.IDType != "vpci-class-guid" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a pity those consts are internal in hcsshim (https://github.com/microsoft/hcsshim/blob/v0.9.2/internal/uvm/virtual_device.go#L14-L21). Perhaps we should have a look if (some of) those make sense to be exported somewhere by hcshhim so that consumers of hcsshim can use them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was thinking about doing exactly that earlier today. There's hcsshim/pkg/annotations, so something similar for these constants makes sense to me. I have a few (too many) hcsshim PRs in-flight though right now. -_- But this one's on my radar.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, I know you have some other ones. Let's see if we can chat with some of the MS maintainers to get some moving.

BTW; are you on any of the docker or CNCF slack channels?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not on any of the Slack channels, as it happens.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that's a pity, thought it would be easier to (when needed) have a synchronous chat (also with the containerd maintainers, who mostly reside on the CNCF slack). If you ever decide to join that slack, let me know, then I can connect you with them as well 👍

out, _ = dockerCmd(c, "ps", "-q", "-f", "status=created")
containerOut := strings.TrimSpace(out)
assert.Assert(c, strings.HasPrefix(cID, containerOut))
assert.Assert(c, strings.Contains(containerOut, shortCID), "Should have seen '%s' in ps output:\n%s", shortCID, out)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing that; perhaps this was done to prevent any other string in the output (errors?) from causing a false "pass" (only guessing here). I expect that would only occur if the command failed, in which case the test would already be marked as failed, so I agree that this approach makes sense.

There's a lot of (some quite old) tests in integration-cli that could use some TLC some of them are quite brittle (and indeed running them in parallel, or in some cases, possible leftovers from previous tests, could make them fail).

@TBBle
Copy link
Contributor Author

TBBle commented Mar 26, 2022

  • "ci / test-buildkit (./client, integration) (pull_request)" failure appears to have been a panic during a call to t.Skipf: Not related.
  • Jenkins "win-2020-c8d" failed in "github.com/docker/docker/libnetwork/ipam TestRequestReleaseAddressDuplicate": Not related, but annoying because it didn't get to run the integration test suite. Also, given there's a rand.Intn in that test, the failure suggests that there's a subtle or uncommon bug in the test or the code under test.

@thaJeztah
Copy link
Member

TestRequestReleaseAddressDuplicate is known to be flaky (quite some of the libnetwork tests we inherited from the libnetwork repository are known to be flaky (and need to be looked into / improved).

I'll kick CI once more 👍

@thaJeztah
Copy link
Member

nice; CI is green now

@corhere PTAL

TBBle added 5 commits March 27, 2022 13:23
Since this function is about to get more complicated, and change
behaviour, this establishes tests for the existing implementation.

Signed-off-by: Paul "TBBle" Hampson <[email protected]>
If not using the containerd backend, this will still fail, but later.

Signed-off-by: Paul "TBBle" Hampson <[email protected]>
Arbitrary here does not include '', best to catch that one early as it's
almost certainly a mistake (possibly an attempt to pass a POSIX path
through this API)

Signed-off-by: Paul "TBBle" Hampson <[email protected]>
IDType `vpci-class-guid` is a synonym of `class`.

Signed-off-by: Paul "TBBle" Hampson <[email protected]>
The test was dependent on its container being _first_ in the response,
but anywhere on the line should be fine.

Signed-off-by: Paul "TBBle" Hampson <[email protected]>
@TBBle TBBle force-pushed the generalised-Windows-device-syntax branch from c91ebe9 to 064650d Compare March 27, 2022 02:27
@TBBle
Copy link
Contributor Author

TBBle commented Mar 27, 2022

win-RS5 run timed out during integration-cli tests, everything else passed.

@thaJeztah
Copy link
Member

good catch on the changelog rebase (I definitely missed it 😓); let me kick CI once more

@thaJeztah
Copy link
Member

Thanks for review!

I think @corhere's comments have been addressed, and I'll go ahead with bringing this one in; if there's anything left to address let us know, @corhere, then we can address that in a follow-up

Thanks for your contribution, @TBBle !

@thaJeztah thaJeztah merged commit a6005ef into moby:master Mar 31, 2022
@TBBle TBBle deleted the generalised-Windows-device-syntax branch March 31, 2022 21:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

docs/revisit impact/api impact/changelog kind/enhancement Enhancements are not bugs or new features but can improve usability or performance. platform/windows status/2-code-review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants