Skip to content

Add support for extracting certs from Caddy v2 data#3485

Closed
am97 wants to merge 2 commits intodocker-mailserver:masterfrom
am97:master
Closed

Add support for extracting certs from Caddy v2 data#3485
am97 wants to merge 2 commits intodocker-mailserver:masterfrom
am97:master

Conversation

@am97
Copy link
Copy Markdown

@am97 am97 commented Aug 16, 2023

Description

This PR adds support for extracting certificates from Caddy v2 data, as this is already supported for Traefik's acme.json.

Type of change

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation (README.md or the documentation under docs/)
  • If necessary I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

am97 added 2 commits August 16, 2023 17:51
Ensures we can retrieve the certificates even if the domain has a wildcard or other characters
@am97
Copy link
Copy Markdown
Author

am97 commented Aug 16, 2023

I would like to add a test for the new _caddy_sanitize_domain function (call it with multiple values and check that the output is what's expected). Where can I add it ? The function has no dependencies other than bash and sed, perhaps it's not necessary to spawn a container to test this function.

@github-actions
Copy link
Copy Markdown
Contributor

Documentation preview for this PR is ready! 🎉

Built with commit: 55afb92

Copy link
Copy Markdown
Member

@polarathene polarathene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey thanks for the contribution! ❤️

There's a fair amount of changes to address, usually it's a better idea to open an issue and discuss them prior for approval.

If you can address the change requests I'm ok with continuing the review to approval, but keep in mind that a refactor in future may affect this feature.

EXTRACTED_DOMAIN=('DOMAINNAME' "${DOMAINNAME}")
else
_log 'warn' "letsencrypt (acme.json) failed to identify a certificate to extract"
exit 1
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we like to use exit as it'll stop DMS (during startup, or the check-for-changes.sh service), the log is only warn. If we really do want to fail hard we use our panic helpers instead.

The other supported SSL_TYPE do allow for panic, so we could issue the warning followed by panic:

function _dms_panic__misconfigured { dms_panic 'misconfigured' "${1:-}" "${2:-}" "${3:-}" ; }

( 'misconfigured' ) # PANIC_INFO == <something possibly misconfigured, eg an ENV var>
SHUTDOWN_MESSAGE="${PANIC_INFO} appears to be misconfigured, please verify."

Suggested change
exit 1
_dms_panic__misconfigured 'acme.json' "${SCOPE_SSL_TYPE}"

I'm not sure if that's the right choice, it's technically a misconfiguration related to it though. The logs would provide context of the warning prior to it and the SSL_TYPE=letsencrypt scope.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the panic above would trigger DMS shutdown fully, even from the check-for-changes.sh script noticing an updated acme.json if it fails to extract the cert.

I think we may not have panicked here as the acme.json was less explicit of a config vs the others, and the user had less direct control/influence over that 🤷‍♂️

You don't need to apply the suggestion, we could just revert the exit 1 you've added. Feel free to wait for another maintainer to share their opinion on what should happen 👍

In the extraction method call, you'll see some other failures are logged with warn and a return 1 (which is the proper way to exit a function early).

Suggested change
exit 1
return 1

Comment on lines +138 to +157
# Variable only intended for troubleshooting via debug output
local EXTRACTED_DOMAIN

_log 'debug' "Detected caddy certificates"

# Conditional handling depends on the success of `_extract_certs_from_caddy_data`,
# Failure tries the next fallback FQDN to try extract a certificate from.
# Subshell not used in conditional to ensure extraction log output is still captured
if [[ -n ${SSL_DOMAIN} ]] && _extract_certs_from_caddy_data "${SSL_DOMAIN}"; then
EXTRACTED_DOMAIN=('SSL_DOMAIN' "${SSL_DOMAIN}")
elif _extract_certs_from_caddy_data "${HOSTNAME}"; then
EXTRACTED_DOMAIN=('HOSTNAME' "${HOSTNAME}")
elif _extract_certs_from_caddy_data "${DOMAINNAME}"; then
EXTRACTED_DOMAIN=('DOMAINNAME' "${DOMAINNAME}")
else
_log 'warn' "letsencrypt (caddy data) failed to identify a certificate to extract"
exit 1
fi

_log 'trace' "letsencrypt (caddy data) extracted certificate using ${EXTRACTED_DOMAIN[0]}: '${EXTRACTED_DOMAIN[1]}'"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is effectively copy/paste of the traefik method with the important distinction being calls to _extract_certs_from_caddy_data() instead of _extract_certs_from_acme().

As you're introducing a variant here, rather than duplicating this logic, you could instead detect caddy vs traefik handling into a common extract method.

Comment on lines +472 to +496
function _extract_certs_from_caddy_data() {
local CERT_DOMAIN=${1}
local SANITIZED_CERT_DOMAIN
if [[ -z ${CERT_DOMAIN} ]]; then
_log 'warn' "_extract_certs_from_caddy_data | CERT_DOMAIN is empty"
return 1
fi

SANITIZED_CERT_DOMAIN=$(_caddy_sanitize_domain "${CERT_DOMAIN}")

# Currently we advise SSL_DOMAIN for wildcard support using a `*.example.com` value,
# The filepath however should be `example.com`, avoiding the wildcard part:
if [[ ${SSL_DOMAIN} == "${CERT_DOMAIN}" ]]; then
CERT_DOMAIN=$(_strip_wildcard_prefix "${SSL_DOMAIN}")
fi

mkdir -p "/etc/letsencrypt/live/${CERT_DOMAIN}/"
# Caddy can use different CA to obtain a certificate (Lets Encrypt, ZeroSSL or
# local). We don't know which one was used, so we use a '*' to match both. In
# case both exist (improbable), just the first one will be copied by GNU cp
cp -v /caddy_data/caddy/certificates/*/"${SANITIZED_CERT_DOMAIN}/${SANITIZED_CERT_DOMAIN}.key" "/etc/letsencrypt/live/${CERT_DOMAIN}/key.pem" || exit 1
cp -v /caddy_data/caddy/certificates/*/"${SANITIZED_CERT_DOMAIN}/${SANITIZED_CERT_DOMAIN}.crt" "/etc/letsencrypt/live/${CERT_DOMAIN}/fullchain.pem" || exit 1

_log 'trace' "_extract_certs_from_caddy_data | Certificate successfully extracted for '${CERT_DOMAIN}'"
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding earlier review feedback on a common extraction method, we can observe that there is common logic that can be shared, only a conditional is necessary for when traefik vs caddy differ. A slight refactor would make that a single conditional block?


Common check:

local CERT_DOMAIN=${1}
if [[ -z ${CERT_DOMAIN} ]]; then
_log 'warn' "_extract_certs_from_acme | CERT_DOMAIN is empty"
return 1
fi

Attempt to retrieve / detect the certificate files (traefik example):

local KEY CERT
KEY=$(acme_extract.py /etc/letsencrypt/acme.json "${CERT_DOMAIN}" --key)
CERT=$(acme_extract.py /etc/letsencrypt/acme.json "${CERT_DOMAIN}" --cert)
if [[ -z ${KEY} ]] || [[ -z ${CERT} ]]; then
_log 'warn' "_extract_certs_from_acme | Unable to find key and/or cert for '${CERT_DOMAIN}' in '/etc/letsencrypt/acme.json'"
return 1
fi

Common step (now that cert content is found, ensure directory for CERT_DOMAIN exists):

# Currently we advise SSL_DOMAIN for wildcard support using a `*.example.com` value,
# The filepath however should be `example.com`, avoiding the wildcard part:
if [[ ${SSL_DOMAIN} == "${CERT_DOMAIN}" ]]; then
CERT_DOMAIN=$(_strip_wildcard_prefix "${SSL_DOMAIN}")
fi
mkdir -p "/etc/letsencrypt/live/${CERT_DOMAIN}/"

NOTE: The above SSL_DOMAIN is planned for removal (unnecessary legacy config contribtued a while back)

Final step, populate a copy of the cert content to internal files DMS is configured to use (traefik example):

echo "${KEY}" | base64 -d > "/etc/letsencrypt/live/${CERT_DOMAIN}/key.pem" || exit 1
echo "${CERT}" | base64 -d > "/etc/letsencrypt/live/${CERT_DOMAIN}/fullchain.pem" || exit 1
_log 'trace' "_extract_certs_from_acme | Certificate successfully extracted for '${CERT_DOMAIN}'"

Comment on lines +498 to +518
# Make the same substitutions that Caddy does on domains
# This is a translation to sh of the Safe function in
# https://github.com/caddyserver/certmagic/blob/v0.19.1/storage.go#L242
function _caddy_sanitize_domain() {
local CERT_DOMAIN=${1,,} # lowercase
CERT_DOMAIN=${CERT_DOMAIN##+([[:space:]])} # trim leading spaces
CERT_DOMAIN=${CERT_DOMAIN%%+([[:space:]])} # trim trailing spaces

# replace a few specific characters
CERT_DOMAIN=$(sed '
s/ /_/g;
s/+/_plus_/g;
s/*/wildcard_/g;
s/:/-/g;
s/\.\.//g;
' <<< "${CERT_DOMAIN}")

# Remove all non-word characters and print result to stdout
# Note: \w in the original code is equivalent to [:alnum:]_
echo "${CERT_DOMAIN//[^[:alnum:][email protected]]/}"
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CertMagic referenced code is used for plenty of methods to sanitize a filepath. Most of these should not be relevant for DMS support as CERT_DOMAIN should already be aligned with a valid hostname (mail.example.com or example.com). The only special handling we've had was for SSL_DOMAIN=*.example.com but as no *. value is relevant to supporting the Traefik acme.json feature it's only a UX value that we'd rather just drop.

I've not looked at the filenames Caddy is generating, I assume wildcard_ prefix is the only one you'd encounter? If so you're just wanting to check if wildcard_${DOMAINNAME} is a valid match in the filesystem before $HOSTNAME and $DOMAINNAME, like we do with SSL_DOMAIN?

This method could instead just check for existence of the file like we do in a later step:

# Identify a valid letsencrypt FQDN folder to use.
function _find_letsencrypt_domain() {
local LETSENCRYPT_DOMAIN
if [[ -n ${SSL_DOMAIN} ]] && [[ -e /etc/letsencrypt/live/$(_strip_wildcard_prefix "${SSL_DOMAIN}")/fullchain.pem ]]; then
LETSENCRYPT_DOMAIN=$(_strip_wildcard_prefix "${SSL_DOMAIN}")
elif [[ -e /etc/letsencrypt/live/${HOSTNAME}/fullchain.pem ]]; then
LETSENCRYPT_DOMAIN=${HOSTNAME}
elif [[ -e /etc/letsencrypt/live/${DOMAINNAME}/fullchain.pem ]]; then
LETSENCRYPT_DOMAIN=${DOMAINNAME}
else
_log 'error' "Cannot find a valid DOMAIN for '/etc/letsencrypt/live/<DOMAIN>/', tried: '${SSL_DOMAIN}', '${HOSTNAME}', '${DOMAINNAME}'"
_dms_panic__misconfigured 'LETSENCRYPT_DOMAIN' '_find_letsencrypt_domain'
fi
echo "${LETSENCRYPT_DOMAIN}"
}

@georglauterbach or @casperklein might have a better suggestion to scan your child dirs of /caddy_data/caddy/certificates/ for existence of the directory or file "${CERT_DOMAIN}/${CERT_DOMAIN}.key". Probably with the find command. Then I suppose running the same check first with wildcard_ prepended to ${CERT_DOMAIN} instead would do the trick 😅

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, just use find. It was written for this use case.

- caddy_data:/caddy_data
environment:
SSL_TYPE: letsencrypt
SSL_DOMAIN: mail.example.com # Ensure this is exactly the same as the domain you pass to Caddy
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't be necessary once feedback is applied.

Suggested change
SSL_DOMAIN: mail.example.com # Ensure this is exactly the same as the domain you pass to Caddy

caddy_data:

services:
caddy:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consistency with nginx + traefik example configs:

Suggested change
caddy:
reverse-proxy:

labels:
caddy.local_certs: # Remove this label when going to production

dms:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consistency with nginx + traefik example configs:

Suggested change
dms:
mailserver:

[Caddy](https://caddyserver.com/) is an open-source web server with built-in SSL certificate generation. You can use the [official Docker image](https://hub.docker.com/_/caddy) and write your own Caddyfile, or use [lucaslorentz/caddy-docker-proxy](https://github.com/lucaslorentz/caddy-docker-proxy), which lets you add labels to your services in order to generate a Caddyfile.

The generated certificates can then be mounted:
DMS supports retrieving certificates from Caddy data when the local file system storage is used (the default). You just need to mount your Caddy data and set `SSL_TYPE: letsencrypt` in the DMS container. Here is a minimal example with `lucaslorentz/caddy-docker-proxy`:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You removed the admonition (example type, the purple collapsible section in our docs): ??? example

Adding that back requires 4 space indentation of the codeblock that follows, like the JSON snippet prior was.

}
}
```
[Caddy](https://caddyserver.com/) is an open-source web server with built-in SSL certificate generation. You can use the [official Docker image](https://hub.docker.com/_/caddy) and write your own Caddyfile, or use [lucaslorentz/caddy-docker-proxy](https://github.com/lucaslorentz/caddy-docker-proxy), which lets you add labels to your services in order to generate a Caddyfile.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the Caddy section was outdated a fair bit, you have dropped information about configuring the key type (Caddyfile and JSON examples). The self-generated and nginx-proxy sections both touch on this too, but it's most likely not too relevant / specific to DMS, so dropping these old examples is acceptable.

Comment on lines -567 to -568
- ${CADDY_DATA_DIR}/certificates/acme-v02.api.letsencrypt.org-directory/mail.example.com/mail.example.com.crt:/etc/letsencrypt/live/mail.example.com/fullchain.pem
- ${CADDY_DATA_DIR}/certificates/acme-v02.api.letsencrypt.org-directory/mail.example.com/mail.example.com.key:/etc/letsencrypt/live/mail.example.com/privkey.pem
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here in the old docs we have an example filepath that your PR presumably operates on and is still roughly the same?

Like with certbot section, it advises mounting the relevant cert folder to /etc/letsencrypt/live location that DMS will internally check for with SSL_TYPE=letsencrypt:

function _find_letsencrypt_key() {
local LETSENCRYPT_KEY
local LETSENCRYPT_DOMAIN=${1}
if [[ -z ${LETSENCRYPT_DOMAIN} ]]; then
_dms_panic__misconfigured 'LETSENCRYPT_DOMAIN' '_find_letsencrypt_key'
fi
if [[ -e /etc/letsencrypt/live/${LETSENCRYPT_DOMAIN}/privkey.pem ]]; then
LETSENCRYPT_KEY='privkey'
elif [[ -e /etc/letsencrypt/live/${LETSENCRYPT_DOMAIN}/key.pem ]]; then
LETSENCRYPT_KEY='key'
else
_log 'error' "Cannot find key file ('privkey.pem' or 'key.pem') in '/etc/letsencrypt/live/${LETSENCRYPT_DOMAIN}/'"
_dms_panic__misconfigured 'LETSENCRYPT_KEY' '_find_letsencrypt_key'
fi
echo "${LETSENCRYPT_KEY}"
}

Traefik has special support because of the acme.json format. Whereas I feel this PR is adding a very minor convenience to the user?

Our docs do cover SSL_TYPE=manual as an alternative to SSL_TYPE=letsencrypt. Both of which have check-for-changes.sh support as their locations are monitored:

if [[ ${SSL_TYPE:-} == 'manual' ]]; then
# When using "manual" as the SSL type,
# the following variables may contain the certificate files
STAGING_FILES+=(
"${SSL_CERT_PATH:-}"
"${SSL_KEY_PATH:-}"
"${SSL_ALT_CERT_PATH:-}"
"${SSL_ALT_KEY_PATH:-}"
)
elif [[ ${SSL_TYPE:-} == 'letsencrypt' ]]; then
# React to any cert changes within the following LetsEncrypt locations:
STAGING_FILES+=(
/etc/letsencrypt/acme.json
/etc/letsencrypt/live/"${SSL_DOMAIN}"/*.pem
/etc/letsencrypt/live/"${HOSTNAME}"/*.pem
/etc/letsencrypt/live/"${DOMAINNAME}"/*.pem
)
fi

However in your current PR state, your approach for Caddy would miss out on this and would not benefit. The Traefik acme.json support is hard-coded into our /etc/letsencrypt internal location which has been considered acceptable. While for this Caddy support we need to also rely on /caddy_data to monitor for changes (ideally only for those relevant to DMS, which is better supported if mounting the full path to isolate if the reverse-proxy was using the same data folder for other services, as shown with rspamd).

The difference with normal SSL_TYPE=letsencrypt is that unlike SSL_TYPE=manual there is no internal copy made. I'd like to simplify the SSL / TLS support, but haven't found time to. Ideally it'd be more generic in finding a cert from some location and then copying it to a fixed internal location. I understand the desire for convenience though.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here in the old docs we have an example filepath that your PR presumably operates on and is still roughly the same?

Whereas I feel this PR is adding a very minor convenience to the user?

The reason for implementing Caddy support is that we can't know in advance the path where Caddy will store the certificates. Caddy uses Lets Encrypt or ZeroSSL by default, and may be configured to issue local certs, or use another CA. The paths depends on the issuer, and may look like the paths in the old docs, but they may also have acme.zerossl.com-v2-dv90 (or local, or other) instead of acme-v02.api.letsencrypt.org-directory

Th paths could also change over time, for example if a certificate is initially issued with Lets Encrypt, but for some reason the renewal fails and Caddy falls back to ZeroSSL.

@am97
Copy link
Copy Markdown
Author

am97 commented Aug 17, 2023

Hello and thanks for the extended review !

As you say I should have opened an issue to discuss this first. The changes required to support Caddy are bigger than I originally thought. I was not aware of the internals of DMS, such as check-for-changes.sh.

I think that support for multiple «certificate managers» may be implemented in a modular way with «extractors», I'm opening an issue to discuss this from scratch.

If you're okay with the approach, I can transform this PR to «Add a Caddy extractor» (or open a new one).

@polarathene
Copy link
Copy Markdown
Member

If you're okay with the approach, I can transform this PR to «Add a Caddy extractor» (or open a new one).

A new PR would be better thanks. Feel free to close this one if you'd like to pursue the alternative discussed in your new issue.

There's a possibility I may be able to tackle it but if I don't find time by September I doubt it'll be any time soon 😅

I outlined in the issue what a rework would look like, I have some familiarity with what commands can be used to accomplish the different tasks. In the meantime, I believe Caddy can be configured to restrict the providers to a single one if you choose to, which would avoid your concern with potential vendor changing allowing for SSL_TYPE=manual to work reliably.

@am97
Copy link
Copy Markdown
Author

am97 commented Aug 18, 2023

Ok ! I'm Closing this PR then. I'll wait for your refactor of check-for-changes.sh and ssl.sh and add some notes on the issue.

Yes, Caddy can be configured to use a single CA, with the acme_ca global option (I'm keeping the default of Lets Encrypt + ZeroSSL on my side, and I use an intermediate container to copy the certs, it does not account for all corner cases but should be sufficient for the moment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants