Skip to content

Add support for plaintext clients in TLS cluster#8587

Merged
yossigo merged 5 commits intoredis:unstablefrom
zuiderkwast:tls-cluster-plaintext-port
Mar 30, 2021
Merged

Add support for plaintext clients in TLS cluster#8587
yossigo merged 5 commits intoredis:unstablefrom
zuiderkwast:tls-cluster-plaintext-port

Conversation

@zuiderkwast
Copy link
Copy Markdown
Contributor

@zuiderkwast zuiderkwast commented Mar 2, 2021

The cluster bus is established over TLS or non-TLS depending on the configuration tls-cluster. The client ports distributed in the cluster and sent to clients are assumed to be TLS or non-TLS also depending on tls-cluster.

The cluster bus is now extended to also contain the non-TLS port of clients in a TLS cluster, when available. The non-TLS port of a cluster node, when available, is sent to clients connected without TLS in responses to CLUSTER SLOTS, CLUSTER NODES, CLUSTER SLAVES and MOVED and ASK redirects, instead of the TLS port.

The user was able to override the client port by defining cluster-announce-port. Now cluster-announce-tls-port is added, so the user can define an alternative announce port for both TLS and non-TLS clients.

Fixes #8134

@zuiderkwast zuiderkwast force-pushed the tls-cluster-plaintext-port branch from 53e81f3 to f4733d0 Compare March 4, 2021 14:46
@zuiderkwast zuiderkwast marked this pull request as ready for review March 4, 2021 14:47
@zuiderkwast zuiderkwast changed the title Add support for plaintext clients in TLS cluster (WIP) Add support for plaintext clients in TLS cluster Mar 4, 2021
@zuiderkwast
Copy link
Copy Markdown
Contributor Author

Open question: In commands like CLUSTER SLOTS, redirects, etc. the plaintext port of a node is reported to clients connected over plaintext if such port exists, otherwise, the TLS port is returned as before. This is backward-compatible, but given that the TLS port can never work for plaintext clients, it may be better to return an error instead of a redirect and exclude nodes without plaintext ports in the result of CLUSTER SLOTS/NODES, when the client is non-TLS. WDYT?

@yossigo, @oranagra whenever you have time...

@madolson madolson added the state:major-decision Requires core team consensus label Mar 5, 2021
@yossigo
Copy link
Copy Markdown
Collaborator

yossigo commented Mar 7, 2021

@zuiderkwast The heuristic of returning a TCP or TLS port depending on the client connection has a few other unresolved issues:

  • What do we return to Unix domain clients?
  • How does that going to play along with more complex network setups (proxies, service mesh, SSL termination endpoints, etc.)?

The real problem with -MOVED is not availability of a matching port, but the fact that we don't specify the protocol in any way (think about it as an HTTP redirect that assumes the caller will reuse the same scheme). Fixing that will break backwards compatibility unfortunately.

We plan to anyway extend CLUSTER SLOTS to support hostnames, and I think while at it we should also specify protocols and ports. If we do that, then the host:port tuple is just a key pointing to a specific node, and while we may make a best effort attempt to return a sensible port, it's not a must.

We could also take a step forward and let clients explicitly specify their preference, which will solve the Unix socket clients issue.

BTW personally I find the term "TCP port" more obvious than "plain port", not sure if it's just me though.

@zuiderkwast
Copy link
Copy Markdown
Contributor Author

  • What do we return to Unix domain clients?

Good question. Perhaps it's safest to return the TLS port to these. (If domain sockets are used by a cluster client, it can only be used for the (first) connected node on the same host as the application. It's possible to use a domain socket for the init node to get slot configuration and then use TLS for the rest of the nodes, although I think it's an unlikely scenario.) Agree? Do you have a different use case?

  • How does that going to play along with more complex network setups (proxies, service mesh, SSL termination endpoints, etc.)?

If any proxy (etc.) speaks non-TLS with one Redis node, it would need the non-TLS port to the other Redis nodes too AFAIK. The only scenario which could break is the TCP version of the unix domain socket example above, a very unlikely scenario I think.

The real problem with -MOVED is not availability of a matching port, but the fact that we don't specify the protocol in any way (think about it as an HTTP redirect that assumes the caller will reuse the same scheme). Fixing that will break backwards compatibility unfortunately.

HTTP redirect with relative scheme is a good comparison. Location: //example.org/path. It's not terribly strange, and I think it's the only reasonably way.

We plan to anyway extend CLUSTER SLOTS to support hostnames, and I think while at it we should also specify protocols and ports. If we do that, then the host:port tuple is just a key pointing to a specific node, and while we may make a best effort attempt to return a sensible port, it's not a must.

I'm all for this, but our motivation for this PR is to ease transition to TLS by allowing some VMs/containers/etc. or some corner case clients (cron jobs, etc.) to continue running non-TLS during a transition phase, until all parts of a system has switched to TLS. The solution has to work with old clients out of the box to be meaningful. After transitioning to TLS, I don't see much value in using TLS and non-TLS side by side.

As for the CLUSTER SLOTS extension: Please consider using URIs to encode hostnames, schemes and other things. Some clients and similar use redis:// URIs already.

We could also take a step forward and let clients explicitly specify their preference, which will solve the Unix socket clients issue.

Sure, if you think it would be valuable. It sounds like a separate PR to me in that case.

BTW personally I find the term "TCP port" more obvious than "plain port", not sure if it's just me though.

The TLS port is also a TCP port, so I don't think it's a very good term for non-TLS, but I can accept it if it's used in this sense in existing config and docs. Other words for non-TLS are "plaintext" and "cleartext", as what I can think of. Perhaps "clear" is a better short version than "plain" as part of a config field?

@yossigo
Copy link
Copy Markdown
Collaborator

yossigo commented Mar 8, 2021

We should assume multiple nodes run on the same operating system (VM/instance) as a way to utilize multiple cores, so theoretically we could need to redirect from one Unix socket to another. This is of course not support and already a shortcoming of the protocol.

I think we can conclude this --

  1. Start with this PR as-is.
  2. Consider adding a mechanism for clients to explicitly ask for redirection target (TLS or non-TLS).
  3. CLUSTER SLOTS multiple endpoints, possibly along with a way to support Unix sockets, hostnames, etc.

Regarding terminology, the configuration file and minimal documentation already refer to TCP vs. TLS. Technically it's not a very correct but I think it's common, but would be happy to hear other opinions/preferences.

@zuiderkwast
Copy link
Copy Markdown
Contributor Author

theoretically we could need to redirect from one Unix socket to another

MOVED 12345 /path/to/socket:0 by default for clients connected over Unix socket on the same machine. Nice! (Separate feature)

the configuration file and minimal documentation already refer to TCP vs. TLS

Interesting. I can change cluster-announce-plain-port to cluster-announce-tcp-port if you prefer(?).

For reference, here's what I've found: TCP vs TLS is used in comments in redis.conf but not in actual config names, so it's not hard-wired. In docs, the closest to this I've found is this:

"The tls-port configuration directive enables accepting SSL/TLS connections on the specified port. This is in addition to listening on port for TCP connections, so it is possible to access Redis on different ports using TLS and non-TLS connections simultaneously." (from https://redis.io/topics/encryption)

One config comment mentions "plain TCP":

# By default, the Redis Cluster bus uses a plain TCP connection. To enable
# TLS for the bus protocol, use the following directive:
#
# tls-cluster yes

Additionally, some TCP config probably also affects TLS, e.g. tcp-keepalive and repl-disable-tcp-nodelay(?).

@yossigo
Copy link
Copy Markdown
Collaborator

yossigo commented Mar 9, 2021

@zuiderkwast Redis is consistently inconsistent when it comes to terminology. I'm OK either way, @redis/core-team any other suggestions?

yossigo
yossigo previously approved these changes Mar 9, 2021
@itamarhaber
Copy link
Copy Markdown
Member

I think one of our goals is to make it eventually consistent, and we'll get there eventually :)

To make it plain, I'd keep it at cluster-announce-plain-port.

@oranagra
Copy link
Copy Markdown
Member

regarding terminology, i personally don't like "plain text" (maybe "plain-tcp" is ok, but IMHO the word "text" is unrelated). i would rather use "tcp" or "non-tls" than "plain text".

But maybe we can still find a better solution.
Maybe "port" can by default mean either a plain TCP port when TLS is, and a TLS port when TLS is enabled, unless a specific TLS port is specified, in which case "port" means plain TCP port. this way we have something that's hopefully simple to use, and also backwards compatible.

i'm not sufficiently familiar with that area, so forgive me if my advise makes no sense.

so we'll have cluster-announce-port and cluster-announce-tls-port.
by default cluster-announce-port means a TCP port when tls-cluster is off, and means a TLS port when tls-cluster is on.
however, if you also specify an non-zero cluster-announce-tls-port, then cluster-announce-port always refers to the plain TCP port.

p.s. putting that config apart, we also have a backwards compatibility issue with the cluster command responses, right?

@zuiderkwast
Copy link
Copy Markdown
Contributor Author

Nice idea @oranagra! I like the symmetry:

name default
cluster-announce-tls-port tls-port
cluster-announce-port port

But then it has to be explained that, for historical reasons, cluster-announce-port is the TLS port if cluster-announce-tls-port = 0 and tls-cluster = yes. If that's not too complicated to understand for a user, then I'm fine with this.

@yossigo, @itamarhaber Agree on cluster-announce-port and cluster-announce-tls-port? Not too complicated for users?

i personally don't like "plain text"
...
the word "text" is unrelated

Note the difference between plain text (e.g. .txt files) and plaintext (used in cryptography as the input to encryption and output from decryption).

Note also that "cluster-announce-plain-port" doesn't contain "text". :-)

I suppose "non-TLS" is indeed a more exact term when encryption is not used. I can totally change all occurrences of "plaintext" to "non-TLS" in the comments and docs.

@oranagra
Copy link
Copy Markdown
Member

But then it has to be explained that, for historical reasons, cluster-announce-port is the TLS port if cluster-announce-tls-port = 0 and tls-cluster = yes. If that's not too complicated to understand for a user, then I'm fine with this.

arguably, it's not only for historical reasons, it's also possibly slightly easier to use. if you define just one port, and it is used in the context of the other configs.

I suppose "non-TLS" is indeed a more exact term when encryption is not used. I can totally change all occurrences of "plaintext" to "non-TLS" in the comments and docs.

noted about "plaintext" vs "plain text", but i still think "non-tls" would be better if we are to keep the current approach of the PR.
but it would be better to avoid the use of "non-" if my proposal is acceptable.

what about the concern about backwards compatibility issue with the cluster command responses? is there any concern here?

@zuiderkwast
Copy link
Copy Markdown
Contributor Author

but it would be better to avoid the use of "non-" if my proposal is acceptable.

I'll try. "Non-TLS" is already used in docs though, to highlight the absence of TLS. The only way around it would be formulations like "clients connected without TLS"

what about the concern about backwards compatibility issue with the cluster command responses? is there any concern here?

Not AFAIK.

@zuiderkwast
Copy link
Copy Markdown
Contributor Author

arguably, it's not only for historical reasons, it's also possibly slightly easier to use. if you define just one port, and it is used in the context of the other configs.

Given how port and tls-port are used in non-cluster (you have to zero port to force TLS) I think it's confusing to use cluster-announce-port for the TLS port on purpose or in examples. Rather, I think including "for historical reasons" provides an explanation for the slightly strange logic. Stay tuned for an updated commit.

@yossigo
Copy link
Copy Markdown
Collaborator

yossigo commented Mar 10, 2021

Maybe we just need to be more explicit and use cluster-announce-port <tcp|tls>:<port> <tcp|tls>:<port>. If no port prefix is included, we treat it as tcp or tls depending on tls-cluster settings so we're 100% backwards compatible. Configuration is a bit more messy but I think the user experience is as good as we can get.

Note that we also have some ambiguity about the port usage inside the cluster bus. We assume port is either TLS or non-TLS, and can include an additional non-TLS port; but what if the cluster bus is non-TLS and we want to also expose a TLS port to clients?

@oranagra
Copy link
Copy Markdown
Member

@yossigo i'm not sure i'm a fan of concatenation and string manipulation, i think i'd rather configs to be primitives (integers) whenever we can, and avoid additional parsing of compound configs (the places were we have that already are painful)

@zuiderkwast
Copy link
Copy Markdown
Contributor Author

zuiderkwast commented Mar 10, 2021

what if the cluster bus is non-TLS and we want to also expose a TLS port to clients?

I've thought about the possibility. For a cluster running on a single host to utilize multiple cores it could be useful. Unix domain sockets can also be useful for the cluster bus in this case.

If we want to support it, I guess we can something similar, e.g. cluster-announce-tls-bus-port and cluster-announce-bus-port. [Edit] Sorry, I mixed up stuff. Perhaps we could add a flag in the bus protocol for this. [Edit 2] ... or we can swap them in the bus protocol so the port field is non-TLS and pport is TLS, if the cluster bus is non-TLS (and possibly rename pport to secondary port, etc.). This PR sends pport = 0 if the cluster bus is non-TLS, so it can be extended in this way without backward compat issues.

yossigo
yossigo previously approved these changes Mar 14, 2021
Copy link
Copy Markdown
Collaborator

@yossigo yossigo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zuiderkwast I think we can push this forward, need @redis/core-team approval as it's tagged a major decision.

To summarize our discussions, there is more follow-up work to this PR:

  1. Fully support the case where we have a non-TLS cluster and we want to advertise TLS ports for TLS clients. This will require another case in the config handling as well as more tests.
  2. A way for clients to explicitly reference their connection redirection preference.

In the longer term, we need to complete this work by extending CLUSTER SLOTS to support multiple endpoints per node. This can be used not only for TLS/non-TLS but also Unix sockets etc.

@yossigo yossigo added approval-needed Waiting for core team approval to be merged release-notes indication that this issue needs to be mentioned in the release notes labels Mar 14, 2021
oranagra
oranagra previously approved these changes Mar 15, 2021
Copy link
Copy Markdown
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New config directives LGTM.

I didn't review the code, i imagine the top comment needs some update.
So I'll describe what I understand:

  1. the nodes used to communicate on the cluster bus either the TLS or non-TLS port depending on the tls-cluster config. and the user was able to override that by defining cluster-announce-port
  2. now the nodes can communicate both ports between them, and the user is able to define an alternative (non-default) announce port for both TLS and non-TLS (with cluster-announce-tls-port).
  3. CLUSTER SLOTS now report the TLS port if the client connected to the current node on the TLS port, and reports the non-TLS port if the client connected to it by non-TLS.

Copy link
Copy Markdown
Contributor

@madolson madolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly just some high level questions. I'm not a big fan of adding support for this type of configuration since it seems very insecure, but it seems like a stepping stone for having online TLS enabling, so that seems like a win.

@zuiderkwast zuiderkwast dismissed stale reviews from oranagra and yossigo via fef221e March 16, 2021 09:35
@zuiderkwast
Copy link
Copy Markdown
Contributor Author

zuiderkwast commented Mar 24, 2021

Bump.

Please re-approve if you're happy with the small refactoring commit added.

... or should I rebase first?

Although Redis can be configured to allow plaintext and TLS clients side by
side, (until now) Redis Cluster doesn't. If tls-cluster is set to yes, the
cluster bus and the client ports reported by CLUSTER SLOTS, CLUSTER NODES,
ASK and MOVED redirects are treated as TLS ports.

To support also plaintext clients, commands such as CLUSTER NODES report
the plaintext port to clients connected over plaintext and the TLS port
to clients connected over TLS.

The cluster bus is extended to contain the plaintext port of each node,
whenever TLS cluster is enabled. This is fit in unused bits in the cluster
bus header structure and the gossip structure. Before this commit, the
unused area was memset to zero, which is treated as none, meaning a
cluster node with this feature compatible with an older node without the
feature.
If a client is connected over plaintext (non-TLS), report the plaintext
port to the client in:

* CLUSTER NODES
* CLUSTER SLAVES
* ASK and MOVED redirects

Test:

* In TCL cluster client, add parameter to specify if TLS should be
  used, instead of using the global ::tls variable.
* In 04-resharding, query with both TLS and plaintext clients, covering
  redirects and CLUSTER NODES
If tls-cluster is yes and cluster-announce-tls-port is set, it refers
to TLS port and cluster-announce-port refers to the non-TLS port.

If tls-cluster is yes and cluster-announce-tls-port is zero, then
instead cluster-announce-port refers to the TLS port.

This logic replaces the cluster-announce-plain-port added in previous
commit.
@zuiderkwast zuiderkwast force-pushed the tls-cluster-plaintext-port branch from fef221e to dec4780 Compare March 24, 2021 20:56
@zuiderkwast
Copy link
Copy Markdown
Contributor Author

Rebased.

Copy link
Copy Markdown
Contributor

@madolson madolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good with this change. I left a comment about also committing the pport to the nodes conf so we don't show the wrong port transitively, but don't feel like that needs to be solved immediately.

@yossigo yossigo merged commit 5629dbe into redis:unstable Mar 30, 2021
@yossigo
Copy link
Copy Markdown
Collaborator

yossigo commented Mar 30, 2021

A long journey but this is now merged, thank you @zuiderkwast!

@oranagra oranagra mentioned this pull request Apr 19, 2021
JackieXie168 pushed a commit to JackieXie168/redis that referenced this pull request Sep 8, 2021
The cluster bus is established over TLS or non-TLS depending on the configuration tls-cluster. The client ports distributed in the cluster and sent to clients are assumed to be TLS or non-TLS also depending on tls-cluster.

The cluster bus is now extended to also contain the non-TLS port of clients in a TLS cluster, when available. The non-TLS port of a cluster node, when available, is sent to clients connected without TLS in responses to CLUSTER SLOTS, CLUSTER NODES, CLUSTER SLAVES and MOVED and ASK redirects, instead of the TLS port.

The user was able to override the client port by defining cluster-announce-port. Now cluster-announce-tls-port is added, so the user can define an alternative announce port for both TLS and non-TLS clients.

Fixes redis#8134
@zuiderkwast zuiderkwast deleted the tls-cluster-plaintext-port branch January 12, 2024 09:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approval-needed Waiting for core team approval to be merged release-notes indication that this issue needs to be mentioned in the release notes state:major-decision Requires core team consensus

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[NEW] Include the plaintext or TLS ports in CLUSTER SLOTS depending on client connection

5 participants