Add support for plaintext clients in TLS cluster by zuiderkwast · Pull Request #8587 · redis/redis

zuiderkwast · 2021-03-02T14:14:51Z

The cluster bus is established over TLS or non-TLS depending on the configuration tls-cluster. The client ports distributed in the cluster and sent to clients are assumed to be TLS or non-TLS also depending on tls-cluster.

The cluster bus is now extended to also contain the non-TLS port of clients in a TLS cluster, when available. The non-TLS port of a cluster node, when available, is sent to clients connected without TLS in responses to CLUSTER SLOTS, CLUSTER NODES, CLUSTER SLAVES and MOVED and ASK redirects, instead of the TLS port.

The user was able to override the client port by defining cluster-announce-port. Now cluster-announce-tls-port is added, so the user can define an alternative announce port for both TLS and non-TLS clients.

Fixes #8134

zuiderkwast · 2021-03-04T14:54:50Z

Open question: In commands like CLUSTER SLOTS, redirects, etc. the plaintext port of a node is reported to clients connected over plaintext if such port exists, otherwise, the TLS port is returned as before. This is backward-compatible, but given that the TLS port can never work for plaintext clients, it may be better to return an error instead of a redirect and exclude nodes without plaintext ports in the result of CLUSTER SLOTS/NODES, when the client is non-TLS. WDYT?

@yossigo, @oranagra whenever you have time...

yossigo · 2021-03-07T08:38:50Z

@zuiderkwast The heuristic of returning a TCP or TLS port depending on the client connection has a few other unresolved issues:

What do we return to Unix domain clients?
How does that going to play along with more complex network setups (proxies, service mesh, SSL termination endpoints, etc.)?

The real problem with -MOVED is not availability of a matching port, but the fact that we don't specify the protocol in any way (think about it as an HTTP redirect that assumes the caller will reuse the same scheme). Fixing that will break backwards compatibility unfortunately.

We plan to anyway extend CLUSTER SLOTS to support hostnames, and I think while at it we should also specify protocols and ports. If we do that, then the host:port tuple is just a key pointing to a specific node, and while we may make a best effort attempt to return a sensible port, it's not a must.

We could also take a step forward and let clients explicitly specify their preference, which will solve the Unix socket clients issue.

BTW personally I find the term "TCP port" more obvious than "plain port", not sure if it's just me though.

zuiderkwast · 2021-03-08T11:13:02Z

What do we return to Unix domain clients?

Good question. Perhaps it's safest to return the TLS port to these. (If domain sockets are used by a cluster client, it can only be used for the (first) connected node on the same host as the application. It's possible to use a domain socket for the init node to get slot configuration and then use TLS for the rest of the nodes, although I think it's an unlikely scenario.) Agree? Do you have a different use case?

How does that going to play along with more complex network setups (proxies, service mesh, SSL termination endpoints, etc.)?

If any proxy (etc.) speaks non-TLS with one Redis node, it would need the non-TLS port to the other Redis nodes too AFAIK. The only scenario which could break is the TCP version of the unix domain socket example above, a very unlikely scenario I think.

The real problem with -MOVED is not availability of a matching port, but the fact that we don't specify the protocol in any way (think about it as an HTTP redirect that assumes the caller will reuse the same scheme). Fixing that will break backwards compatibility unfortunately.

HTTP redirect with relative scheme is a good comparison. Location: //example.org/path. It's not terribly strange, and I think it's the only reasonably way.

We plan to anyway extend CLUSTER SLOTS to support hostnames, and I think while at it we should also specify protocols and ports. If we do that, then the host:port tuple is just a key pointing to a specific node, and while we may make a best effort attempt to return a sensible port, it's not a must.

I'm all for this, but our motivation for this PR is to ease transition to TLS by allowing some VMs/containers/etc. or some corner case clients (cron jobs, etc.) to continue running non-TLS during a transition phase, until all parts of a system has switched to TLS. The solution has to work with old clients out of the box to be meaningful. After transitioning to TLS, I don't see much value in using TLS and non-TLS side by side.

As for the CLUSTER SLOTS extension: Please consider using URIs to encode hostnames, schemes and other things. Some clients and similar use redis:// URIs already.

We could also take a step forward and let clients explicitly specify their preference, which will solve the Unix socket clients issue.

Sure, if you think it would be valuable. It sounds like a separate PR to me in that case.

BTW personally I find the term "TCP port" more obvious than "plain port", not sure if it's just me though.

The TLS port is also a TCP port, so I don't think it's a very good term for non-TLS, but I can accept it if it's used in this sense in existing config and docs. Other words for non-TLS are "plaintext" and "cleartext", as what I can think of. Perhaps "clear" is a better short version than "plain" as part of a config field?

yossigo · 2021-03-08T12:33:11Z

We should assume multiple nodes run on the same operating system (VM/instance) as a way to utilize multiple cores, so theoretically we could need to redirect from one Unix socket to another. This is of course not support and already a shortcoming of the protocol.

I think we can conclude this --

Start with this PR as-is.
Consider adding a mechanism for clients to explicitly ask for redirection target (TLS or non-TLS).
CLUSTER SLOTS multiple endpoints, possibly along with a way to support Unix sockets, hostnames, etc.

Regarding terminology, the configuration file and minimal documentation already refer to TCP vs. TLS. Technically it's not a very correct but I think it's common, but would be happy to hear other opinions/preferences.

zuiderkwast · 2021-03-08T15:46:47Z

theoretically we could need to redirect from one Unix socket to another

MOVED 12345 /path/to/socket:0 by default for clients connected over Unix socket on the same machine. Nice! (Separate feature)

the configuration file and minimal documentation already refer to TCP vs. TLS

Interesting. I can change cluster-announce-plain-port to cluster-announce-tcp-port if you prefer(?).

For reference, here's what I've found: TCP vs TLS is used in comments in redis.conf but not in actual config names, so it's not hard-wired. In docs, the closest to this I've found is this:

"The tls-port configuration directive enables accepting SSL/TLS connections on the specified port. This is in addition to listening on port for TCP connections, so it is possible to access Redis on different ports using TLS and non-TLS connections simultaneously." (from https://redis.io/topics/encryption)

One config comment mentions "plain TCP":

# By default, the Redis Cluster bus uses a plain TCP connection. To enable
# TLS for the bus protocol, use the following directive:
#
# tls-cluster yes

Additionally, some TCP config probably also affects TLS, e.g. tcp-keepalive and repl-disable-tcp-nodelay(?).

yossigo · 2021-03-09T12:15:06Z

@zuiderkwast Redis is consistently inconsistent when it comes to terminology. I'm OK either way, @redis/core-team any other suggestions?

itamarhaber · 2021-03-09T15:04:40Z

I think one of our goals is to make it eventually consistent, and we'll get there eventually :)

To make it plain, I'd keep it at cluster-announce-plain-port.

oranagra · 2021-03-10T08:55:29Z

regarding terminology, i personally don't like "plain text" (maybe "plain-tcp" is ok, but IMHO the word "text" is unrelated). i would rather use "tcp" or "non-tls" than "plain text".

But maybe we can still find a better solution.
Maybe "port" can by default mean either a plain TCP port when TLS is, and a TLS port when TLS is enabled, unless a specific TLS port is specified, in which case "port" means plain TCP port. this way we have something that's hopefully simple to use, and also backwards compatible.

i'm not sufficiently familiar with that area, so forgive me if my advise makes no sense.

so we'll have cluster-announce-port and cluster-announce-tls-port.
by default cluster-announce-port means a TCP port when tls-cluster is off, and means a TLS port when tls-cluster is on.
however, if you also specify an non-zero cluster-announce-tls-port, then cluster-announce-port always refers to the plain TCP port.

p.s. putting that config apart, we also have a backwards compatibility issue with the cluster command responses, right?

zuiderkwast · 2021-03-10T09:51:32Z

Nice idea @oranagra! I like the symmetry:

name	default
cluster-announce-tls-port	tls-port
cluster-announce-port	port

But then it has to be explained that, for historical reasons, cluster-announce-port is the TLS port if cluster-announce-tls-port = 0 and tls-cluster = yes. If that's not too complicated to understand for a user, then I'm fine with this.

@yossigo, @itamarhaber Agree on cluster-announce-port and cluster-announce-tls-port? Not too complicated for users?

i personally don't like "plain text"
...
the word "text" is unrelated

Note the difference between plain text (e.g. .txt files) and plaintext (used in cryptography as the input to encryption and output from decryption).

Note also that "cluster-announce-plain-port" doesn't contain "text". :-)

I suppose "non-TLS" is indeed a more exact term when encryption is not used. I can totally change all occurrences of "plaintext" to "non-TLS" in the comments and docs.

oranagra · 2021-03-10T10:25:49Z

But then it has to be explained that, for historical reasons, cluster-announce-port is the TLS port if cluster-announce-tls-port = 0 and tls-cluster = yes. If that's not too complicated to understand for a user, then I'm fine with this.

arguably, it's not only for historical reasons, it's also possibly slightly easier to use. if you define just one port, and it is used in the context of the other configs.

I suppose "non-TLS" is indeed a more exact term when encryption is not used. I can totally change all occurrences of "plaintext" to "non-TLS" in the comments and docs.

noted about "plaintext" vs "plain text", but i still think "non-tls" would be better if we are to keep the current approach of the PR.
but it would be better to avoid the use of "non-" if my proposal is acceptable.

what about the concern about backwards compatibility issue with the cluster command responses? is there any concern here?

zuiderkwast · 2021-03-10T10:44:15Z

but it would be better to avoid the use of "non-" if my proposal is acceptable.

I'll try. "Non-TLS" is already used in docs though, to highlight the absence of TLS. The only way around it would be formulations like "clients connected without TLS"

what about the concern about backwards compatibility issue with the cluster command responses? is there any concern here?

Not AFAIK.

zuiderkwast · 2021-03-10T10:51:47Z

arguably, it's not only for historical reasons, it's also possibly slightly easier to use. if you define just one port, and it is used in the context of the other configs.

Given how port and tls-port are used in non-cluster (you have to zero port to force TLS) I think it's confusing to use cluster-announce-port for the TLS port on purpose or in examples. Rather, I think including "for historical reasons" provides an explanation for the slightly strange logic. Stay tuned for an updated commit.

yossigo · 2021-03-10T12:41:07Z

Maybe we just need to be more explicit and use cluster-announce-port <tcp|tls>:<port> <tcp|tls>:<port>. If no port prefix is included, we treat it as tcp or tls depending on tls-cluster settings so we're 100% backwards compatible. Configuration is a bit more messy but I think the user experience is as good as we can get.

Note that we also have some ambiguity about the port usage inside the cluster bus. We assume port is either TLS or non-TLS, and can include an additional non-TLS port; but what if the cluster bus is non-TLS and we want to also expose a TLS port to clients?

oranagra · 2021-03-10T13:31:01Z

@yossigo i'm not sure i'm a fan of concatenation and string manipulation, i think i'd rather configs to be primitives (integers) whenever we can, and avoid additional parsing of compound configs (the places were we have that already are painful)

zuiderkwast · 2021-03-10T13:59:01Z

what if the cluster bus is non-TLS and we want to also expose a TLS port to clients?

I've thought about the possibility. For a cluster running on a single host to utilize multiple cores it could be useful. Unix domain sockets can also be useful for the cluster bus in this case.

If we want to support it, I guess we can something similar, e.g. cluster-announce-tls-bus-port and cluster-announce-bus-port. [Edit] Sorry, I mixed up stuff. Perhaps we could add a flag in the bus protocol for this. [Edit 2] ... or we can swap them in the bus protocol so the port field is non-TLS and pport is TLS, if the cluster bus is non-TLS (and possibly rename pport to secondary port, etc.). This PR sends pport = 0 if the cluster bus is non-TLS, so it can be extended in this way without backward compat issues.

redis.conf

yossigo

@zuiderkwast I think we can push this forward, need @redis/core-team approval as it's tagged a major decision.

To summarize our discussions, there is more follow-up work to this PR:

Fully support the case where we have a non-TLS cluster and we want to advertise TLS ports for TLS clients. This will require another case in the config handling as well as more tests.
A way for clients to explicitly reference their connection redirection preference.

In the longer term, we need to complete this work by extending CLUSTER SLOTS to support multiple endpoints per node. This can be used not only for TLS/non-TLS but also Unix sockets etc.

src/cluster.c

oranagra

New config directives LGTM.

I didn't review the code, i imagine the top comment needs some update.
So I'll describe what I understand:

the nodes used to communicate on the cluster bus either the TLS or non-TLS port depending on the tls-cluster config. and the user was able to override that by defining cluster-announce-port
now the nodes can communicate both ports between them, and the user is able to define an alternative (non-default) announce port for both TLS and non-TLS (with cluster-announce-tls-port).
CLUSTER SLOTS now report the TLS port if the client connected to the current node on the TLS port, and reports the non-TLS port if the client connected to it by non-TLS.

madolson

Mostly just some high level questions. I'm not a big fan of adding support for this type of configuration since it seems very insecure, but it seems like a stepping stone for having online TLS enabling, so that seems like a win.

src/cluster.c

redis.conf

zuiderkwast · 2021-03-24T20:09:38Z

Bump.

Please re-approve if you're happy with the small refactoring commit added.

... or should I rebase first?

Although Redis can be configured to allow plaintext and TLS clients side by side, (until now) Redis Cluster doesn't. If tls-cluster is set to yes, the cluster bus and the client ports reported by CLUSTER SLOTS, CLUSTER NODES, ASK and MOVED redirects are treated as TLS ports. To support also plaintext clients, commands such as CLUSTER NODES report the plaintext port to clients connected over plaintext and the TLS port to clients connected over TLS. The cluster bus is extended to contain the plaintext port of each node, whenever TLS cluster is enabled. This is fit in unused bits in the cluster bus header structure and the gossip structure. Before this commit, the unused area was memset to zero, which is treated as none, meaning a cluster node with this feature compatible with an older node without the feature.

If a client is connected over plaintext (non-TLS), report the plaintext port to the client in: * CLUSTER NODES * CLUSTER SLAVES * ASK and MOVED redirects Test: * In TCL cluster client, add parameter to specify if TLS should be used, instead of using the global ::tls variable. * In 04-resharding, query with both TLS and plaintext clients, covering redirects and CLUSTER NODES

If tls-cluster is yes and cluster-announce-tls-port is set, it refers to TLS port and cluster-announce-port refers to the non-TLS port. If tls-cluster is yes and cluster-announce-tls-port is zero, then instead cluster-announce-port refers to the TLS port. This logic replaces the cluster-announce-plain-port added in previous commit.

zuiderkwast · 2021-03-24T20:56:31Z

Rebased.

madolson

I'm good with this change. I left a comment about also committing the pport to the nodes conf so we don't show the wrong port transitively, but don't feel like that needs to be solved immediately.

yossigo · 2021-03-30T20:13:12Z

A long journey but this is now merged, thank you @zuiderkwast!

The cluster bus is established over TLS or non-TLS depending on the configuration tls-cluster. The client ports distributed in the cluster and sent to clients are assumed to be TLS or non-TLS also depending on tls-cluster. The cluster bus is now extended to also contain the non-TLS port of clients in a TLS cluster, when available. The non-TLS port of a cluster node, when available, is sent to clients connected without TLS in responses to CLUSTER SLOTS, CLUSTER NODES, CLUSTER SLAVES and MOVED and ASK redirects, instead of the TLS port. The user was able to override the client port by defining cluster-announce-port. Now cluster-announce-tls-port is added, so the user can define an alternative announce port for both TLS and non-TLS clients. Fixes redis#8134

zuiderkwast force-pushed the tls-cluster-plaintext-port branch from 53e81f3 to f4733d0 Compare March 4, 2021 14:46

zuiderkwast marked this pull request as ready for review March 4, 2021 14:47

zuiderkwast changed the title ~~Add support for plaintext clients in TLS cluster (WIP)~~ Add support for plaintext clients in TLS cluster Mar 4, 2021

madolson added the state:major-decision Requires core team consensus label Mar 5, 2021

yossigo previously approved these changes Mar 9, 2021

View reviewed changes

zuiderkwast dismissed yossigo’s stale review via 1966491 March 10, 2021 13:06

zuiderkwast commented Mar 10, 2021

View reviewed changes

redis.conf Outdated Show resolved Hide resolved

yossigo previously approved these changes Mar 14, 2021

View reviewed changes

src/cluster.c Outdated Show resolved Hide resolved

yossigo added approval-needed Waiting for core team approval to be merged release-notes indication that this issue needs to be mentioned in the release notes labels Mar 14, 2021

oranagra previously approved these changes Mar 15, 2021

View reviewed changes

madolson reviewed Mar 16, 2021

View reviewed changes

src/cluster.c Outdated Show resolved Hide resolved

src/cluster.c Outdated Show resolved Hide resolved

src/cluster.c Outdated Show resolved Hide resolved

redis.conf Outdated Show resolved Hide resolved

zuiderkwast dismissed stale reviews from oranagra and yossigo via fef221e March 16, 2021 09:35

zuiderkwast added 5 commits March 24, 2021 21:32

Add test for CLUSTER SLOTS and store TLS flag per client in TCL client

f9587e6

Refactor into helper function deriveAnnouncedPorts()

dec4780

zuiderkwast force-pushed the tls-cluster-plaintext-port branch from fef221e to dec4780 Compare March 24, 2021 20:56

madolson approved these changes Mar 25, 2021

View reviewed changes

yossigo approved these changes Mar 25, 2021

View reviewed changes

yossigo merged commit 5629dbe into redis:unstable Mar 30, 2021

oranagra mentioned this pull request Mar 31, 2021

Solve sentinel test issue in TLS due to recent tests change. #8728

Merged

oranagra mentioned this pull request Apr 19, 2021

Release 6.2.2 #8817

Merged

soloestoy mentioned this pull request May 12, 2023

Decouple the use of TLS in cluster bus and in client connection. #12143

Closed

zuiderkwast deleted the tls-cluster-plaintext-port branch January 12, 2024 09:56

ggivo mentioned this pull request Nov 11, 2024

Revisit TLS Cluster tests to drop stunnel redis/jedis#4018

Closed

Conversation

zuiderkwast commented Mar 2, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

zuiderkwast commented Mar 4, 2021

Uh oh!

yossigo commented Mar 7, 2021

Uh oh!

zuiderkwast commented Mar 8, 2021

Uh oh!

yossigo commented Mar 8, 2021

Uh oh!

zuiderkwast commented Mar 8, 2021

Uh oh!

yossigo commented Mar 9, 2021

Uh oh!

itamarhaber commented Mar 9, 2021

Uh oh!

oranagra commented Mar 10, 2021

Uh oh!

zuiderkwast commented Mar 10, 2021

Uh oh!

oranagra commented Mar 10, 2021

Uh oh!

zuiderkwast commented Mar 10, 2021

Uh oh!

zuiderkwast commented Mar 10, 2021

Uh oh!

yossigo commented Mar 10, 2021

Uh oh!

oranagra commented Mar 10, 2021

Uh oh!

zuiderkwast commented Mar 10, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

yossigo left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

oranagra left a comment

Choose a reason for hiding this comment

Uh oh!

madolson left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

zuiderkwast commented Mar 24, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

zuiderkwast commented Mar 24, 2021

Uh oh!

madolson left a comment

Choose a reason for hiding this comment

Uh oh!

yossigo commented Mar 30, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

zuiderkwast commented Mar 2, 2021 •

edited

Loading

zuiderkwast commented Mar 10, 2021 •

edited

Loading

zuiderkwast commented Mar 24, 2021 •

edited

Loading