Skip to content

registry: allow mirror paths in config#46594

Merged
thaJeztah merged 1 commit intomoby:masterfrom
regisb:36598-regisb/registry-mirror-path
Oct 12, 2023
Merged

registry: allow mirror paths in config#46594
thaJeztah merged 1 commit intomoby:masterfrom
regisb:36598-regisb/registry-mirror-path

Conversation

@regisb
Copy link
Contributor

@regisb regisb commented Oct 5, 2023

Close #36598

Note that this is my first PR in the Moby project, or in any go-language project for that matter. Please forgive my mistakes, and give detailed instructions if changes are necessary.

- What I did

Allow users to run registry mirrors that are rooted at a subpath of a domain.

- How I did it

In the registry configuration validation step, I removed the parts that check that there is no path in the mirror URIs.

- How to verify it

Run a registry with docker.io proxy config:

docker-compose.yml:

services:

  registry:
    image: docker.io/registry:2
    volumes:
      - ./registry.yml:/etc/docker/registry/config.yml:ro
      - ./data/registry:/var/lib/registry
    restart: unless-stopped

  proxy:
    image: docker.io/caddy
    ports:
      - "5001:5001"
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile:ro
    restart: unless-stopped

registry.yml:

version: 0.1
log:
  fields:
    service: registry
storage:
  cache:
    blobdescriptor: inmemory
  filesystem:
    rootdirectory: /var/lib/registry
http:
  addr: :5000
  headers:
    X-Content-Type-Options: [nosniff]
health:
  storagedriver:
    enabled: true
    interval: 10s
    threshold: 3
proxy:
  remoteurl: https://registry-1.docker.io

Caddyfile:

:5001 {
    handle_path /hub/* {
        reverse_proxy http://registry:5000
    }
}

Run a Docker daemon with the following configuration in /etc/docker/daemon.json:

{
    "registry-mirrors": [
        "http://localhost:5001/hub"
    ]
}

Then pull an image: docker pull ubuntu and verify that the request goes through the self-hosted registry.

- Description for the changelog

Make it possible to run a registry mirror with an URI that includes a path.

- A picture of a cute animal (not mandatory but encouraged)

cute

@corhere
Copy link
Contributor

corhere commented Oct 5, 2023

Disallowing path, query and fragment components in the registry mirror URL config was not an accident. The restriction was added deliberately, with unit tests no less! There must have been a reason behind it. It might not have been a good reason, or maybe it was a historical reason which is no longer relevant. Regardless, we have to understand why the restrictions were added before they can be removed. Please update the PR description with an explanation as to why path, query and fragment components were historically disallowed in registry mirror URLs and why the rationale is no longer applicable.

How do non-empty path, query and fragment components in a configured mirror URL affect the requests made to the registry mirror? The OCI distribution spec defines all registry API endpoints as fully-qualified paths beginning with /v2/. Is the configured mirror URL path a path prefix, or a prefix to the repository namespace? How is the query component of the mirror config URL merged with the query parameters for e.g. listing tags? What happens if the query parameters in the configured mirror URL conflict with query parameters defined by the distribution API spec? (For the case of listing tags, the query parameters n and last would conflict.)

I can tell you immediately that the fragment component should continue to be disallowed. The URL fragment has no effect on HTTP semantics (it is not transmitted in HTTP requests) and thus is not needed for compatibility with any registry. Only the registry client—the moby daemon—ever sees the fragment component of the mirror URL, and so is the only part of the system which could imbue the mirror URL fragment with some semantic meaning. No such semantics are currently defined. Disallowing fragments in the mirror URLs effectively reserves fragments for future use. Lifting that restriction would effectively define the semantics as "permitted and ignored," making any future use of the fragment into a breaking change.

@gabor-boros
Copy link

@corhere Although I believe you didn't mean it, your comment feels a little bit passive-aggressive.

Disallowing path, query and fragment components in the registry mirror URL config was not an accident. The restriction was added deliberately, with unit tests no less! There must have been a reason behind it. It might not have been a good reason, or maybe it was a historical reason which is no longer relevant. Regardless, we have to understand why the restrictions were added before they can be removed. Please update the PR description with an explanation as to why path, query and fragment components were historically disallowed in registry mirror URLs and why the rationale is no longer applicable.

In my opinion, linking the great article about fence principle is not helping much here, instead generates disappointment as contributors, especially those who are contributing for the first time, would expect help rather than indoctrination -- speaking only in my name.

Before opening the pull request, both @regisb and I looked up the historical reasons, though similarly to docker-related contributors, we didn't find much information. I assume you tried to look up the reason as well, and ended up with similar results as we did. However, reading through the GitHub issue carefully, it turns out the relevant missing information is stated in a comment. Also, the other reference was linked in that comment, that explicitly states the original root cause:

containerd does not currently support subpaths either so its not only validation

but as the first linked comment mentions,

Since then, containerd looks to have added support [...]

So, now we know the historical reasons of the issue, and it is already stated that it is not an issue anymore from containerd's side. However, the comment continues as

however, it has not yet been looked into if that can be used (and BuildKit would have to be updated to user/support it, as it currently doesn't)

The comment is three years old and BuildKit evolved a lot over time. Neither of us (@regisb and I) are familiar with the BuildKit code base, however, as a core maintainer I assume you have better understanding than us and if you don't have time to look it up, at least you could help us pointing to the right direction to figure out if BuildKit added the support or not. If my assumptions are correct, could you please help out with that? Either pointing us to the right direction or validating if BuildKit added support by yourself would help I think.

How do non-empty path, query and fragment components in a configured mirror URL affect the requests made to the registry mirror? The OCI distribution spec defines all registry API endpoints as fully-qualified paths beginning with /v2/.

Although this is the first time I read the specification, the original specification that the one you linked builds on says that

All endpoints will be prefixed by the API version and the repository name: /v2//

though it does not mean that fully qualified API endpoints must start with /v2/. It only says that the endpoints serving the images/metadata/etc. must be prefixed by /v2, which means that /hub/v2/... should still be correct. If that's not the case, then it may worth clarifying this in the specification.

I can tell you immediately that the fragment component should continue to be disallowed. [...]

Regarding this, I agree that query and fragment components should not be allowed, but allowing the path would solve many companies' and individuals' issues without breaking the specification or the ecosystem.

@regisb
Copy link
Contributor Author

regisb commented Oct 6, 2023

Thanks for your detailed response @corhere, you are raising very valid points.

Based on your comment, I think it's quite clear that fragments and queries should not be allowed in the registry mirror URIs. I'll amend the PR in that direction. But for now let's focus on the path prefix, which is our (Gabor and I) primary subject of concern.

Is the configured mirror URL path a path prefix, or a prefix to the repository namespace?

I'm not sure I understand the difference. Would you care to explain?

Disallowing path, query and fragment components in the registry mirror URL config was not an accident. The restriction was added deliberately, with unit tests no less! There must have been a reason behind it. It might not have been a good reason, or maybe it was a historical reason which is no longer relevant. Regardless, we have to understand why the restrictions were added before they can be removed.

Yes, I agree with that. We would be very grateful if you helped us navigate the design decisions bethind the spec.

I may have found a clue in the "Historical context" section of the OCI distribution spec you linked to: https://github.com/opencontainers/distribution-spec/blob/main/spec.md#historical-context This spec points to issue #8093 This issue in turn includes the following paragraph:

Unlike the present /v1/... which is locked to the root of the URI path, the /v2/... is expected be the relative root of the path, for easier route handling.

This sentence seems to indicate that the v1 API was not designed to be compatible with a path prefix, as opposed to v2. Is my understanding correct? That would be confirmed by the PR that initially introduced the --registry-mirrors option: https://github.com/moby/moby/pull/7202/files

return fmt.Sprintf("%s://%s/v1/", uri.Scheme, uri.Host), nil

That mirroring scheme seems to have ignored path prefixes either for the sake of simplicity, or because the v1/ prefix needed to be at the root.

@regisb regisb force-pushed the 36598-regisb/registry-mirror-path branch from c2eb86e to a09d416 Compare October 6, 2023 08:59
@corhere
Copy link
Contributor

corhere commented Oct 6, 2023

Is the configured mirror URL path a path prefix, or a prefix to the repository namespace?

I'm not sure I understand the difference. Would you care to explain?

Consider the case of pulling the image manifest of docker.io/library/hello-world:latest through a configured mirror https://example.com/hub. If the path is a prefix, the daemon would prepend the mirror URL path to the API endpoint path:

GET /hub/v2/library/hello-world/manifests/latest
    \--/   \-------------------/         \------/
      \_ Path prefix  |                      \_ Reference (tag)
                       \_ Repository namespace

This is the interpretation you seem to want, given the verification steps in the PR description. It allows for registries "mounted" at a subpath of a domain to be used as Docker Hub mirrors. However, the Moby daemon would only be able to use a repository deployed on a subpath as a Hub mirror as the Docker image reference syntax used in Dockerfiles and on the docker command line only has affordances for registries hosted at the root of a domain. That is to say, there is no way to instruct docker pull to pull an image from the repository namespace foo/bar hosted on the registry example.com/my-private-registry. Kubernetes (cri-containerd) uses the same image reference syntax, and will have the same limitations. I can't speak for the wider OCI ecosystem, however, as the image reference syntax is merely a de facto standard. Containerd supports arbitrary path prefixes (and mirrors in general) as of v1.6.0 and BuildKit supports it as of v0.12.

I may have found a clue in the "Historical context" section of the OCI distribution spec you linked to: https://github.com/opencontainers/distribution-spec/blob/main/spec.md#historical-context This spec points to issue #8093 This issue in turn includes the following paragraph:

Unlike the present /v1/... which is locked to the root of the URI path, the /v2/... is expected be the relative root of the path, for easier route handling.

This sentence seems to indicate that the v1 API was not designed to be compatible with a path prefix, as opposed to v2. Is my understanding correct?

The reference registry client implementation, used by the daemon when the experimental containerd image backend is not enabled, has always supported registry URLs with an arbitrary path prefix. That is strong evidence in favour of it having been the intention of the designers of the Registry v2 API to permit registries hosted on subpaths of a domain. Unfortunately the language about paths being relative does not appear to have made it into either the Docker Registry v2 spec nor the OCI Distribution spec so other implementations of registries and clients may have interpreted the specs more conservatively. I suggest opening an issue on https://github.com/opencontainers/distribution-spec asking for clarification on the matter.


Whereas if the path is a prefix to the repository namespace, the daemon would prepend the mirror URL path to the repository namespace before interpolating the repository namespace into the API endpoint path:

GET /v2/hub/library/hello-world/manifests/latest
       \-----------------------/         \------/
             \_ Repository namespace        \_ Reference (tag)

This is equivalent to the daemon pulling the image reference example.com/hub/library/hello-world:latest. The registry would need to be hosted at the root of the domain, but would allow the same registry deployment to be used both as a Docker Hub mirror and as a private registry. From what I can tell, this is the interpretation which would interoperate with Harbor. It is also the interpretation I would prefer as it can be explained in terms of mapping one image reference to another, requires no understanding of how image pulls work over the wire, and avoids any ambiguities in the OCI Distribution spec.

Copy link
Contributor

@corhere corhere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Come to think of it, we don't have to decide between allowing a path prefix or a repository namespace prefix to be configured for a Docker Hub mirror. I've set up a false dichotomy: different situations could call for some combination of both! We still have to pick which one of the prefixes the path component of the URL represents, though.

I propose that we stick with the interpretation as currently implemented in this PR: the path component of the mirror URL is defined to be the path prefix for API requests to the mirror. (https://example.com/hub -> GET /hub/v2/...) And in the future, (read: in another PR,) we could allow the repository namespace prefix to be configured through the fragment component to support the Harbor use-case!

Let's see what the other maintainers think.

@corhere corhere changed the title registry: allow mirror paths, query and fragments in config registry: allow mirror paths in config Oct 6, 2023
Path prefixes were originally disallowed in the `--registry-mirrors`
option because the /v1 endpoint was assumed to be at the root of the
URI. This is no longer the case in v2.

Close moby#36598

Signed-off-by: Régis Behmo <[email protected]>
@yob
Copy link

yob commented Oct 12, 2023

Cross linking a related buildkit issue of mine: moby/buildkit#4079

I propose that we stick with the interpretation as currently implemented in this PR: the path component of the mirror URL is defined to be the path prefix for API requests to the mirror. (https://example.com/hub -> GET /hub/v2/...) And in the future, (read: in another PR,) we could allow the repository namespace prefix to be configured through the fragment component to support the Harbor use-case!

This would also work very well with AWS ECR pull through caches. To tweak your example:

Consider the case of pulling the image manifest of public.ecr.aws/datadog/agent:7.46.0 through a configured mirror https://<account_id>.dkr.ecr.us-east-1.amazonaws.com. We'd need to prefix /ecr-public and do a request like this to the mirror:

GET /v2/ecr-public/datadog/agent/manifests/7.46.0
       \-----------------------/           \------/
             \_ Repository namespace        \_ Reference (tag)

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@thaJeztah thaJeztah merged commit 3e43f8e into moby:master Oct 12, 2023
@thaJeztah thaJeztah added this to the 25.0.0 milestone Oct 12, 2023
@regisb regisb deleted the 36598-regisb/registry-mirror-path branch October 13, 2023 13:57
@regisb
Copy link
Contributor Author

regisb commented Oct 13, 2023

Thanks for the quick review everyone! This was my first experience contributing to moby, and it's been a very pleasant one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Docker daemon won't start if a registry-mirror is configured in /etc/docker/daemon.json

5 participants