Improved logic to handle HOSTNAME and DOMAINNAME + Improving handling of Letsencrypt changes + build docs WARNINGs should throw exit code#2245
Improved logic to handle HOSTNAME and DOMAINNAME + Improving handling of Letsencrypt changes + build docs WARNINGs should throw exit code#2245NorseGaud wants to merge 56 commits intodocker-mailserver:masterfrom NorseGaud:hostname-domain-and-ssl-issues
Conversation
|
UPDATE: Misunderstanding on my part, despite my recall ,
Just in-case anyone isn't aware... Our tests also I think only set So I'm not surprised if reliance on these values may be wonky 😅 |
polarathene
left a comment
There was a problem hiding this comment.
I think this being another PR about issues with HOSTNAME/DOMAINNAME highlights the concern to properly review the usage is correct.
I'm not particularly fond of band-aiding such issues regularly, especially if they're not well understood and the fix is accepted on the merits of "well it resolves problem X for me".
When contributing fixes such as these, a related test that would fail prior and pass with the fix would at least be a bit helpful in mapping out the problems experienced.
Our test suite does not presently encounter the issue that has been reported for which this PR is intended to fix. In this case there appears to be a few variants of inputs to test against, I don't think DOMAINNAME looks like the appropriate fix.
That issue is due to HOSTNAME being a single label, not a FQDN, preventing it from matching the correct letsencrypt folder to monitor for changes?
This comment has been minimized.
This comment has been minimized.
|
Anyone know why we set Do our users even use NIS? |
It's legacy AFAIK, intended for pre-LDAP. Would have to do a bit of time traveling via I assume the original project author went with it, he made quite a few assumptions without properly understanding the choices I think, other than they seemed to work and that was good enough at the time. I'm inclined to think that we don't need it for proper functionality, since AFAIK all or the majority of our tests only use It was probably misinterpreted and confused with a regular domain name. That said, Postfix or similar may implicitly leverage it as a default (pretty sure it does with Docker should be taking the |
…domain.sld.tld or only sld.tld - look for all certs that have changed - support only hostname (no domainname/subdomain) - note about domain only for mailserver - test fixes/improvements for new hostname logic
| image: docker.io/mailserver/docker-mailserver:latest | ||
| container_name: mailserver | ||
| hostname: mail | ||
| hostname: mail # If you don't use a subdomain for your mail server (example: mail.example.com), you can set only `hostname` to the domain |
There was a problem hiding this comment.
| hostname: mail # If you don't use a subdomain for your mail server (example: mail.example.com), you can set only `hostname` to the domain | |
| # If the FQDN for your mail-server is only two labels (eg: example.com), you can assign this entirely to `hostname` and remove `domainname`. | |
| hostname: mail |
Apart from the comment and example being a tad confusing, it should sit above the property when it's that long.
There was a problem hiding this comment.
Whoops, didn't mean to resolve this.
How do we feel about eliminating domainname entirely at this point @polarathene @casperklein ? I've confirmed that both subdomains and regular domains work in hostname without domainname.
domainname is not well documented and a little confusing.
There was a problem hiding this comment.
I am not sure, what exactly you mean be "removing" domainname (ps: haven't read the whole PR discussion)?
The docker(compose) options for domainname do only affect /etc/hosts in the container afaik.
Of course without domainname, similar can be achieved by using a FQDN as hostname. What do you exactly try to achieve? What improvements are you expecting?
There was a problem hiding this comment.
It seems like all we require is setting hostname and not domainname. I'd like to pull it out but it's a breaking change so I'll leave it alone for now.
There was a problem hiding this comment.
How do we feel about eliminating
domainnameentirely at this point
👍
I've confirmed that both subdomains and regular domains work in
hostnamewithoutdomainname.
I identified some cases where that's not the case I think, although I did not try them against docker-mailserver, and they're likely niche if at all in use (and some non-sensical ones).
I'd prefer to get the test suite refactored to run it against the variants though.
domainnameis not well documented and a little confusing.
Yeah, IIRC it was made available for some need, but was generally advised against. Probably why the docs are lacking. I think it was adopted by this project either by misunderstanding or an attempted workaround.
There was a problem hiding this comment.
The docker(compose) options for domainname do only affect
/etc/hostsin the container afaik.
I've tested docker-compose config and it matches expectation of /etc/hosts with the Docker CLI results I observed. I didn't do a full parity comparison, but assume they're in sync these days.
Of course without domainname, similar can be achieved by using a FQDN as hostname. What do you exactly try to achieve? What improvements are you expecting?
Supporting domainname is legacy. These two config options used to behave differently with CLI and docker-compose a while back. There are past issues and PRs for the project related to it with workarounds due to confusion IIRC.
Depending on the config, and other contexts (eg network host mode), inconsistencies can be introduced.
Proposal is to at some point move any reference/usage of $HOSTNAME env (and not overwrite it) to a DMS scoped one, look at any usage of hostname command etc. We have some workaround ENV like OVERRIDE_HOSTNAME that may no longer be relevant, plus various *DOMAIN ENV.
Docs in a follow up PR will remove the usage of domainname and only advise hostname if this is the only relevant option to set. I am not sure what's required to support mail.example.com that uses mail headers example.com, or other related changes (eg if FQDN is more than 3 labels), but those use cases should be covered (I think we have tests that do).
| then | ||
| DOMAINNAME="${HOSTNAME}" | ||
| #shellcheck disable=SC2034 | ||
| DOMAINNAME="$(hostname -d)" |
There was a problem hiding this comment.
| DOMAINNAME="$(hostname -d)" | |
| DMS_DOMAIN="$(hostname -d)" |
Has hostname -d been checked with:
-
hostname: mail.example.com(nodomainname) -
hostname: example.com(nodomainname) -
hostname: mailanddomainname: example.com
There was a problem hiding this comment.
Yep. Looks good.
hostname: deliver
domainname: xkr.email
root@deliver:/# hostname -f
deliver.xkr.email
root@deliver:/# hostname -d
xkr.email
root@deliver:/# hostname
deliver
root@deliver:/# echo $HOSTNAME
deliver
hostname: xkr.email
root@xkr:/# hostname -f
xkr.email
root@xkr:/# hostname -d
email # We override this and use HOSTNAME internally in the scripts
root@xkr:/# hostname
xkr.email
root@xkr:/# echo $HOSTNAME
xkr.email
hostname: deliver.xkr.email
root@deliver:/# hostname -f
deliver.xkr.email
root@deliver:/# hostname -d
xkr.email
root@deliver:/# hostname
deliver.xkr.email
root@deliver:/# echo $HOSTNAME
deliver.xkr.email
There was a problem hiding this comment.
Yep. Looks good.
Was this the output from docker-mailserver after scripts are run? Or on a direct base image without docker-mailserver (at least it's scripts) involved? (EDIT: I've collected results from Debian and Alpine base images and have shared them in a follow up comment)
# We override this and use HOSTNAME internally in the scripts
This is kind of an issue for me as it sounds like we're messing with it when we shouldn't need to?
Is the difference in $HOSTNAME from hostname -f desirable? (it's the same in two examples, but in the 1st it is different), same with the commented -d result for xhr.email in the 2nd example.
UPDATE: Consistent outputs can be done with:
domainnamecommand will output the actual domainname set for the container. Don't usehostname -dto retrieve it. Unless you want the FQDN minus the 1st label. If domainname was not set, the returned value for thedomainnamecommand is(none).hostnamelikewise matches the actual hostname set for the container.hostname -fwill return the composite value of hostname + domainname if a match is in/etc/hosts, but it must have an alias that matches the queried hostname (value of/proc/sys/kernel/hostname). For all 3 exampleshostname -fwill output the expected result (unless using--net=host).
Perhaps something like?:
| DOMAINNAME="$(hostname -d)" | |
| if [[ ! $(domainname) == '(none)' ]]; | |
| then | |
| DMS_DOMAIN="$(domainname)" | |
| fi |
| else | ||
| NEW_DOMAIN_NAME="${DOMAINNAME}" | ||
| fi | ||
| NEW_DOMAIN_NAME="${SRS_DOMAINNAME:-"${DOMAINNAME}"}" |
There was a problem hiding this comment.
Outdated
If this is relying on the system (in this case within the container instance) ENV (UPDATE: DOMAINNAME value, it should be equivalent to hostname -d, domainname -d, dnsdomain -d right?DOMAINNAME ENV isn't a system provided one, my bad)
| NEW_DOMAIN_NAME="${SRS_DOMAINNAME:-"${DOMAINNAME}"}" | |
| local NEW_DOMAIN_NAME="${SRS_DOMAINNAME:-"${DMS_DOMAIN}"}" |
or
| NEW_DOMAIN_NAME="${SRS_DOMAINNAME:-"${DOMAINNAME}"}" | |
| local NEW_DOMAIN_NAME="${SRS_DOMAINNAME:-"$(hostname -d)"}" |
Technically, I don't think we want the output of domainname -d, and if hostname -d is only correct under some configurations, then dnsdomain -d may be more reliable. I'd need to check.
UPDATE: All three commands -d option output the same result, no difference. domainname command default output however matches the actual domainname value configured for the container.
| NEW_DOMAIN_NAME="${SRS_DOMAINNAME:-"${DOMAINNAME}"}" | |
| local NEW_DOMAIN_NAME="${SRS_DOMAINNAME:-"$(dnsdomain -d)"}" |
There was a problem hiding this comment.
Not in a situation where hostname in the docker-compose/run is set as a normal domain like example.com. hostname -d would be com :(
There was a problem hiding this comment.
EDIT: At the time of writing this comment I had some misunderstandings. Those have since been cleared up with findings described here.
hostname -dwould becom:(
That would be expected and working correctly imo. Can you explain why that's bad?
Think about it with 4 labels or 5 perhaps, and what you'd expect. Or what about example.co.uk and mail.example.co.uk, some may even have mail.internal.company.com but want company.com in the email header? 🤷♂️
Perhaps this just highlights the issue you're working on here? Should you be relying on HOSTNAME/DOMAINNAME?
When is a distinction of separation important and would it be better handled via setting only hostname, reading HOSTNAME(or hostname -s and hostname -f) and having a separate ENV that we use internally (unrelated to DOMAINNAME or hostname -d and similar) that allows control to explicitly deviate from the default behaviour?
I know that some software like Postfix can implicitly use the detected hostname (and possibly the domainname) values it queries from the system, as can others, but if they also support explicit config, then that's a non-issue.
Some routing is handled via /etc/hosts based on these two values, but it would only impact services within the container AFAIK, since we're using containers for sandboxing, other containers on the same network I believe also can query the containers IP via that, but nothing on the docker daemon host itself should.
I know that the tests cover different variations, but I'm not familiar of what the different use-cases are that they're all covering. Lacks examples 😅
I assume setting hostname is still of value.
There was a problem hiding this comment.
AFAIK our use of OVERRIDE_HOSTNAME, HOSTNAME, and DOMAINNAME (now DMS_HOSTNAME_DOMAIN) is simply to create uniformity and predictability depending on what the user ends up configuring.
You'd notice that HOSTNAME takes priority over DOMAINNAME in postfix's config and several other important places. Changing it gives me a lot of anxiety.
Having hostname -d be com or in my case email (xkr.email) is not correct and doesn't work without some changes ( I'd rather not be responsible for ;) ).
We can't use hostname it's the just subdomain part of subdomain.sld.tld.
It looks like the /etc/hosts entry is being set properly and I've added tests for it too.
There was a problem hiding this comment.
AFAIK our use of
OVERRIDE_HOSTNAME,HOSTNAME, andDOMAINNAME(nowDMS_HOSTNAME_DOMAIN) is simply to create uniformity and predictability depending on what the user ends up configuring.
Seems a bit much and these names are a bit overloaded? The word HOSTNAME and DOMAINNAME is getting awfully confusing looking into this and discussing it since the values vary based on context.
I'd like to try address that in future. The less I have to think about things the better 😅
This is the current usage right?:
- Public (user provided):
OVERRIDE_HOSTNAME. - Private (internal):
DOMAINNAME,DMS_HOSTNAME_DOMAIN. - System (container default ENV already set):
HOSTNAME(partial internal override).
Changing it gives me a lot of anxiety.
That's fine. No need to change it any time soon if there's no immediate issues.
I'd like to get the test suite into a better shape for more easily testing this sort of change. I'll create a PR for it when I have the time.
Having
hostname -dbecomor in my case
My point was that it's the correct behaviour for those commands.
- Use only
hostnameconfig option with the two label FQDN (example.comorxkr.email) instead. - Use
hostnameordomainnamecommands if you want to match the equivalent config, or source directly from/proc/sys/kernel/{hostname,domainname}. hostname -dordomainname -dwill both output the same result. It is the 1st value from a matched/etc/hostsentry after the IP. The full value is whathostname -f/domainname -foutput, while the-dvariant truncates the 1st label.- For the short hostname, Debian will provide the first label via
hostname -s. Alpine linux and others however may do the same or act similar to-dand-fin sourcing from/etc/hosts, depending on packages installed. Remember that-sis always the first label only, regardless if you set a multi-labelhostnameand still provide adomainname. - In the even that there is no valid match in
/etc/hosts, the queried hostname may perform a DNS query to public resolvers, which may not be desireable (rarely likely to be an issue, but may hide the underlying issue).
We can't use
hostnameit's the just subdomain part ofsubdomain.sld.tld.
The command? As mentioned above, it's whatever was assigned to the container's hostname via config options.
The hostname is meant to identify the container itself on a network. That network doesn't need to be publicly accessible AFAIK.
It looks like the
/etc/hostsentry is being set properly and I've added tests for it too.
It's valid if not:
- Duplicating labels between
hostnameanddomainname. - Having both config options set but with a multi-label
hostnamevalue. - Using host network mode, unless an entry has been added for the containers hostname into
/etc/hosts(host or container copy, note that these differ in resolution behaviour).
| -e OVERRIDE_HOSTNAME=subdomain.sld.tld \ | ||
| --domainname sld2.tld \ | ||
| -h subdomain2 \ |
There was a problem hiding this comment.
Just curious what the point is to provide subdomain2 if you have ENV OVERRIDE_HOSTNAME with subdomain?
Is this ENV meant to simulate -h subdomain.sld.tld while having some value in hostname -s returning subdomain2 or hostname -f returning subdomain2.sld.tld? I haven't looked into what goes on logic wise, but that's the impression I get. I would look at /etc/hosts as well, as presumably it would container the FQDN of subdomain2.sld2.tld and alias subdomain2.
If I'm correct, it could be why some breakage is already being experienced, and possibly affecting your encryption PR issues if you've got a similar setup and Dovecot or similar tries to query the hostname (short or FQDN) and that conflicts with the HOSTNAME ENV that docker-mailserver modifies?
EDIT: I viewed the related test and see that it at least checks for expected override behaviour from all known occurrences.
There was a problem hiding this comment.
Is this ENV meant to simulate -h subdomain.sld.tld
Yep. That's what OVVERIDE_HOSTNAME does from what I can tell.
I would look at /etc/hosts as well, as presumably it would container the FQDN of subdomain2.sld2.tld and alias subdomain2.
Correct. We don't override /etc/hosts.
possibly affecting your encryption PR issues if you've got a similar setup and Dovecot or similar tries to query the hostname (short or FQDN) and that conflicts with the HOSTNAME ENV that docker-mailserver modifies?
Good point -- I'm hoping to try that stuff again and find the time to reach out to dovecot admins list after this is addressed.
This comment has been minimized.
This comment has been minimized.
…e + Added LETSENCRYPT_DOMAIN=
|
Good to go now. As the changes are rather large, along with the lengthy discussion and commit history.. if reviewers would prefer I can interactively rebase the whole thing and cherry pick it out to separate smaller PRs if preferred? Otherwise I can write up a summary/overview of the entire PR here. The commits I've added to finish up the PR don't have the cleanest history (I have refrained from rewriting history already pushed), but I've tried to scope commits to ease review if needed, with associated commit messages for context. I've done a bit of refactoring, added some fixes and rather heavy inline documentation to avoid any maintenance confusion about the niche knowledge we encountered during this PR. I've completely rewritten the letsencrypt tests. They adopt the helpers I added recently, and have a few methods that could later be extracted out to share with the other TLS tests. Let me know if I went overkill with the DRY abstractions, I understand that may not be as easy to decipher vs raw hard-coded commands. The entire letsencrypt test file directory has been deleted. I didn't like the actual cert data when inspecting it, and decided to use the existing For the Traefik tests, There are some checks in the I am deferring that to a follow-up PR due to the size of this one and the refactoring required to let @NorseGaud thanks for the tips on a previous PR for better debugging bats test failures, and for your contributions at Regarding the SSL docs, I've tried to make a compromise given your feedback earlier. There's now an early section about the FQDN and it's relation to configuration. It mentions the usage of |
Yeah man -- big thanks to @martin-schulze-vireso for that as he took the reins and finalized it. |
Also moves the service conditions into the prepare method.
|
Documentation preview for this PR is ready! 🎉 Built with commit: 1f0c237 |
There was a problem hiding this comment.
Spotted an issue that should be investigated further (I'll do so after I complete other tasks) while working on a complimentary check-for-changes.sh PR to pull out it's duplicated code that fell out of sync. I came across this concern when looking into /etc/postfix/vhost usage in our scripts.
Notes:
- Our default
target/postfix/main.cfuses the value$myhostname, localhost.$mydomain, localhost, same as the FAQ entry example. This is also the upstream default formydestination. setup-stack.sh:_setup_ldaphas a sed operation at the end on/etc/postfix/main.cfto remove$myhostnamefrom themydestinationvalue resulting in:mydestination = localhost.$mydomain, localhost.
Based on the Postfix docs for $myhostname, it is probably better that we explicitly set the value of myhostname in main.cf rather than rely on implicit values. That may also require some consideration for mydomain too, in addition to possibly explicitly setting mydestination.
EDIT: We already do set myhostname and mydomain via _setup_postfix_hostname with postconf -e.
| # If the FQDN for your mail-server is only two labels (eg: example.com), | ||
| # you can assign this entirely to `hostname` and remove `domainname`. |
There was a problem hiding this comment.
The docs also mention this should work.
However, a FAQ entry points out that using a bare domain requires additional configuration.
mydestination should not contain a value that /etc/postfix/vhost does, which is used for virtual_mailbox_domains in Postfix main.cf, the Postfix docs specifically warn against this:
NEVER list a virtual MAILBOX domain name as a mydestination domain!
We may want to add some extra handling logic to detect and abort or possibly fix this issue before advising bare domain support like this.
There was a problem hiding this comment.
May want to additionally add mention of issues with --network=host which will have trouble establishing the hostname from /etc/hosts. Present advice has been to use OVERRIDE_HOSTNAME, but it's probably more correct to ensure an entry in /etc/hosts.
|
We aren't running That was introduced with the original At the time it was only for non-LDAP config updates, and it seems we have since added in the SSL cert renewal change detection. I don't think anything else matters to LDAP users, but I'm not familiar enough with the monitored configuration and LDAP setups to know for sure. I will defer this concern to a follow up PR. That PR also seems to be the origin of |
Description
Original PR that may have broken this: https://github.com/docker-mailserver/docker-mailserver/pull/2175/files
In #2239, the user has:
And the certificate was triggering changedetector:
[[ INF ]] Cert found in /etc/letsencrypt/acme.json for mail.somedomain.comYet, they had two different directories under
/etc/letsencrypt:They indicated that they found a solution by:
This means that
mailis getting the renewal (_extract_certs_from_acme), butmail.somedomain.comis being watched.The user has explicitly set
SSL_DOMAINtomail.somedomain.comand yet the extraction seems to be happening into a folder namedmail. Under thesetup-stack.shI see:There are a few problems here:
mailand notmail.somedomain.comLETSENCRYPT_DOMAINtoSSL_DOMAINFor 1, we see
[[ INF ]] Cert found in /etc/letsencrypt/acme.json for mail.somedomain.com, so that's not the issue. HOWEVER, the code shows that ${1} is being used instead of ${HOSTNAME} so we aren't seeing accurately what is being used:To fix this, we shouldn't be using HOSTNAME and instead use
${1}.For 2, I have added the required code
It also looked as if we were testing acme with a "changed" cert that was for
example.comand notmail.my-domain.com. I'm unsure how this worked... I've expanded the tests to test the regular situation of mai.my-domain.com and then also the wildcard acme for example.com.Seems as if the docs generation script wasn't properly exiting on WARNING of broken links. I fixed that too.
Type of change
Checklist:
docs/)