
Conversation

@oranagra
Member

@oranagra oranagra commented Jan 2, 2023

Related to the hang reported in #11671
Currently, Redis can disconnect a client for reaching its output buffer limit; it also stops feeding that buffer with more data, but it keeps running the loop inside the command (even though the client is already marked for disconnection).

This PR is an attempt to mitigate the problem for commands that are easy to abuse: KEYS, HRANDFIELD, SRANDMEMBER, and ZRANDMEMBER.
The RAND family of commands can take a negative COUNT argument (which is not bound to the number of elements in the key), so it's enough to create a key with one field, and then these commands can be used to hang Redis.
For KEYS, the caller can rely on the existing keyspace in Redis (if it is big enough).

NOTICE: in Redis 7.0 this fix covers KEYS as well, but in 6.2 and 6.0 it doesn't. This is because 7.0 has a mechanism to avoid sending partial replies to the client, while older releases don't; without it there's a risk that the client would be able to read what looks like a complete KEYS reply.
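A minimal, self-contained sketch of the pattern (not the actual Redis source; the client struct, buffer accounting, and limit below are simplified stand-ins, while the CLIENT_CLOSE_ASAP flag and the shape of the if (CLIENT_CLOSE_ASAP) break check follow the real code):

```c
/* Simplified simulation of the fix: once the output buffer limit is hit, the
 * client is flagged CLIENT_CLOSE_ASAP and further replies are dropped; the
 * command loop now checks that flag and breaks instead of looping on
 * (e.g. for HRANDFIELD key -1000000000 on a one-field hash). */
#include <stdio.h>
#include <string.h>

#define CLIENT_CLOSE_ASAP (1 << 0)
#define OBUF_HARD_LIMIT 64                     /* pretend limit, in bytes */

typedef struct { int flags; size_t obuf_used; } client;

static void addReplyBulk(client *c, const char *item) {
    if (c->flags & CLIENT_CLOSE_ASAP) return;  /* reply already being dropped */
    c->obuf_used += strlen(item) + 16;         /* item plus protocol overhead */
    if (c->obuf_used > OBUF_HARD_LIMIT)
        c->flags |= CLIENT_CLOSE_ASAP;         /* mark for disconnection */
}

int main(void) {
    client c = {0, 0};
    long long count = 1000000000LL, emitted = 0; /* negative COUNT, made positive */
    while (emitted < count) {
        addReplyBulk(&c, "field");
        emitted++;
        if (c.flags & CLIENT_CLOSE_ASAP) break;  /* the fix: stop the loop */
    }
    printf("stopped after %lld of %lld iterations\n", emitted, count);
    return 0;
}
```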

@oranagra oranagra linked an issue Jan 2, 2023 that may be closed by this pull request
@ranshid
Contributor

ranshid commented Jan 2, 2023

@oranagra I think that long commands are something we need to address at some point. It is true that this is an anti-pattern, but the truth is that it is still widely used, and such commands do cause operational issues (like replica disconnections and cluster failovers).
I am also not sure it is very maintainable to start placing checks inside commands, but let me address this specific proposal.
I took a quick look at this PR, and I was wondering: if we start "heartbeating" in the middle of potentially long commands, maybe we need something closer to the Lua timeout handling? Something that can also feed the cluster bus keepalive, report busy on hanging clients, and optionally feed replicas (maybe only for the sake of ping-pongs).
Having a dedicated keepAliveWhileBusy function could serve many cases, like module commands, and might help to solve some issues related to Lua and long-running commands.

@oranagra
Member Author

oranagra commented Jan 2, 2023

I don't think we want to add heartbeating or yielding to core commands; I think we should aim for them to be short.
That is, until we have threaded command execution, which is meant to improve performance and latency, not to avoid hangs and timeouts.

I think what makes this one different is one of two things:

  1. Once we've already identified that the client should be disconnected, it's kind of a bug that we keep running a (nearly infinite) loop for it.
  2. Specifically for the RANDMEMBER commands, the caller can cause a loop that's not bound by the size of the data, with an effectively unbounded amplification factor between the work they have to invest and the load it causes on the server.

I'm thinking of reverting all the changes in this PR and leaving just the ones in the RANDMEMBER family.

Contributor

@zuiderkwast zuiderkwast left a comment

LGTM. This doesn't solve everything, but it solves some things, and these if (CLIENT_CLOSE_ASAP) break checks don't make the code harder to read or anything.

@ranshid
Contributor

ranshid commented Jan 2, 2023

I don't think we want to add heartbeating or yielding to core commands; I think we should aim for them to be short. That is, until we have threaded command execution, which is meant to improve performance and latency, not to avoid hangs and timeouts.

I think what makes this one different is one of two things:

  1. Once we've already identified that the client should be disconnected, it's kind of a bug that we keep running a (nearly infinite) loop for it.
  2. Specifically for the RANDMEMBER commands, the caller can cause a loop that's not bound by the size of the data, with an effectively unbounded amplification factor between the work they have to invest and the load it causes on the server.

I'm thinking of reverting all the changes in this PR and leaving just the ones in the RANDMEMBER family.

If we suspect that RANDMEMBER can take so long and cause OOM, why don't we just limit the count?

@oranagra
Member Author

oranagra commented Jan 2, 2023

Limit how? By which threshold or factor? That would be a breaking change...

@ranshid
Contributor

ranshid commented Jan 2, 2023

Limit how? By which threshold or factor? That would be a breaking change...

Yes, I agree this is a breaking change (even though I doubt anyone uses such high numbers).

@madolson
Contributor

madolson commented Jan 3, 2023

I'm not very happy with this suggested mitigation, although it does handle some cases.

  • The change assumes that "disconnecting" clients is not a breaking change, but we are doing it after the server has already been impacted. Disconnecting before doing the work would be much better than doing it late.
  • It doesn't meaningfully stop many of the CVEs, since you can turn replies off and then execute the impacting command, which won't accumulate any COB. We also won't evict in various memory scenarios when client eviction is disabled.
  • KEYS * need not accumulate any COB if it doesn't match anything.

I would still rather see us investigate potentially breaking changes in order to better lock this down. I agree with @ranshid that in practice this won't be a breaking change.

@oranagra
Member Author

oranagra commented Jan 3, 2023

Limit how? By which threshold or factor? That would be a breaking change...

Yes, I agree this is a breaking change (even though I doubt anyone uses such high numbers).

My concern is that any threshold we choose would be too high for some deployments (and would not protect them from abuse), and on the other hand too low for other use cases (and could be a bad breaking change).

The one that did occur to me with regard to the RAND commands is to limit the count to a factor of 4 times the size of the set; beyond that, I'd argue that the probability of selecting items at random doesn't matter much, and it may also be much better to do that on the client side.
But I suppose even that would still be a breaking change.

The fact is that regardless of this discussion about RAND and KEYS, if Redis is left exposed to abuse, it's not hard to overload it with other commands, not to mention scripts.

You are right that turning off client replies will render this mitigation useless, but on the bright side, it means the offender can only hog CPU and Redis availability, and not also the host memory.

@zuiderkwast
Contributor

I don't think this stops a malicious attacker. If they get in, they can cause problems using scripts, as you say.

I believe KEYS is often used in simple admin scripts, for taking a backup of all keys or something. It may work for the user but not for the Redis admin or hosting service. If that's what we want to solve, then I guess we can add a config for the maximum N for these O(N) commands. Then the Redis admin can prevent the Redis user from doing this. WDYT?

@oranagra
Member Author

oranagra commented Jan 3, 2023

I don't like it.
It means that when adding that config, the default value must be infinite / disabled.
And for anyone who sets it, their app can appear to work one day and break the next, when for some reason a collection grows unexpectedly (normally I'd rather pay that price in latency and have them debug things to find out what's wrong, i.e. a gradual overhead rather than a sudden malfunction).

It's important to note that this discussion has two topics that we should maybe keep separate.
The discussion around KEYS (and EVAL) is old news, while the one about the RAND family with a negative count may deserve a quicker reaction (for being new, and having a larger amplification factor between the work an attacker needs to invest and the damage it causes).

@ranshid
Contributor

ranshid commented Jan 3, 2023

I am not sure why the DDoS issue was raised, since Redis has mechanisms to prevent unauthorized execution of these commands.
In my opinion it is fine for a user to choose to run a very long command that takes a long time to complete. It might be an anti-pattern, but it seems bad IMO to prevent the user from doing so entirely.
I think the main problem is that Redis as a service is impacted by these "anti-patterns". Lagging replication, cluster failovers, and the user's inability to identify a busy engine are all issues that prevent Redis from being operated correctly in managed environments.
We once discussed running long commands in the background (which is why I started working on refactoring the Redis blocking mechanism), which would probably help users run long commands with less impact on their application, but I think the service should still not suffer even when all commands are executed in the foreground.

@madolson
Contributor

madolson commented Jan 4, 2023

The fact is that regardless of this discussion about RAND and KEYS, if Redis is left exposed to abuse, it's not hard to overload it with other commands, not to mention scripts.

This does seem to be the "crux" of the DDoS conversation. I want to write a little about it here, but maybe we should move the conversation elsewhere. There are really two attack vectors we are talking about. The first is someone with a valid authenticated connection to Redis. I honestly don't think we should really care about this vector; it includes someone writing an arbitrary Lua script that causes an infinite loop. I think this could be generalized to just sending a large number of TLS connections as well, which can also overwhelm the server.

The second vector is someone who is able to inject some data into an application which then sends it verbatim to a downstream Redis server. As was mentioned, I don't think many applications routinely call KEYS * or arbitrary Lua scripts, but I do think they may call SRANDMEMBER as part of a real flow. I think it's worth going out of our way to limit the output and impact of these commands.

The one that did occur to me with regard to the RAND commands is to limit the count to a factor of 4 times the size of the set; beyond that, I'd argue that the probability of selecting items at random doesn't matter much, and it may also be much better to do that on the client side. But I suppose even that would still be a breaking change.

I would be okay with this type of breaking change.

@JimB123
Contributor

JimB123 commented Jan 10, 2023

In this context, I'm generally not that concerned about a malicious attacker. A Redis instance should already be protected by networking restrictions and authentication/authorization rules. IMO, the bigger concern is accidental/unintended execution.

KEYS * is the classic case. We see all too often that users mistakenly keep using it as their dataset grows. We introduced SCAN as the recommended alternative, but have done nothing to deprecate KEYS *. I would recommend that we introduce a limit on run time and on the number of keys returned. Yes, this is a "breaking change". But if we simply say (for example) that KEYS will return the first 1000 values and will exit after 100 ms of run time, that will clarify that it's a development-only command. It encourages the use of SCAN and eliminates accidental/unintentional usage.

We can do something similar for other commands that accept a "count" parameter: simply limit the count to a "reasonable" value. Who really needs 1,000,000,000 items extracted from a set? One option would be to add a config parameter that can be used to explicitly set these limits.
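A hypothetical sketch of the kind of cap proposed here; the rand_count_limit name and the clamp helper are invented for illustration and do not exist in Redis:

```c
/* Hypothetical sketch of the proposed limit: clamp a command's COUNT argument
 * to a configurable ceiling. "rand_count_limit" is an invented config name,
 * not a real Redis setting. */
#include <stdio.h>
#include <stdlib.h>

static long long rand_count_limit = 100000;    /* imagined default ceiling */

static long long clamp_count(long long requested) {
    long long n = llabs(requested);            /* RAND commands accept negative counts */
    return n > rand_count_limit ? rand_count_limit : n;
}

int main(void) {
    /* e.g. SRANDMEMBER key -1000000000 would be capped to 100000 samples */
    printf("%lld\n", clamp_count(-1000000000LL));
    return 0;
}
```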

@oranagra
Member Author

We discussed this issue in a core-team meeting, and our conclusions were:

  1. We decided not to consider this a new security issue, since it's effectively the same outcome as a script with a write command and an infinite loop.
  2. Instead, we can consider the current PR a bug fix, i.e. if we decided to drop a connection, we could / should quit the loop (we still need to decide which commands to apply that fix to).
  3. We would like to improve Redis with a config that would limit these (and other) loops and have a sensible default, but since that's a breaking change, we're likely to add the config with an infinite default in 7.2 and maybe change the default to a sane one in 8.0.

@oranagra
Member Author

I decided to revert the changes to HGETALL, LRANGE, ZRANGE, etc., since going this way would mean we'd need to start treating too many loops in the Redis code, and it's hard to know when to stop (slippery slope).
Instead, the decision of what to fix is based on where there's a huge amplification factor the caller can take advantage of.
E.g.:

  • In HRANDFIELD the caller only needs to create a hash with one field, and it can be used to hang Redis.
  • In KEYS the caller can make use of the existing keyspace to hang Redis.
  • But for HGETALL it would require either creating a huge hash or finding an existing one.
  • Similar issues can come from SCAN, but with the current implementation it doesn't append to the output buffer while searching, so we can't take advantage of the client obuf limit.

@oranagra oranagra changed the title from "WIP - Obuf limit, exit during loop in *RAND* commands, KEYS, and alike" to "Obuf limit, exit during loop in *RAND* commands and KEYS" Jan 11, 2023
@oranagra oranagra requested a review from yossigo January 11, 2023 13:26
Contributor

@soloestoy soloestoy left a comment

I agree with this change, but do we really need to backport it to old versions? I don't think it's a security issue.

@oranagra
Member Author

It's a harmless bug fix that can prevent a lot of pain, why not backport it?

@soloestoy
Contributor

soloestoy commented Jan 13, 2023

It's about the mechanism by which we handle clients with the CLIENT_CLOSE_ASAP flag.

In 7.0 we free CLIENT_CLOSE_ASAP clients in beforeNextClient, after processInputBuffer. This means that if the client's obuf reaches the limit, the client is freed immediately and no reply is sent to it.

But before 7.0, we only free CLIENT_CLOSE_ASAP clients in beforeSleep, by calling freeClientsInAsyncFreeQueue. A client which reaches the limit cannot be freed immediately, which means that if it has already registered the write event (not the pending-write list, the epoll event), the server may send a wrong reply to that client.

For example, if we backport this to the old versions before 7.0, the client may get a wrong reply from the server before being disconnected.

It's OK if you backport it together with beforeNextClient, but I don't like the dependency: when you backport a feature you have to backport the other features it relies on, and then we could fall into infinite recursion, unless the original feature is important enough, like a security issue.
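A toy model of that ordering difference (purely illustrative; this is not the Redis event loop, and only the function names mentioned above come from the real code):

```c
/* Toy model (not Redis source) of the ordering difference described above.
 * Protocol CRLFs are written as the literal text \r\n for readability. */
#include <stdio.h>
#include <stdbool.h>
#include <string.h>

typedef struct {
    bool close_asap;
    char obuf[64];     /* reply accumulated by the command */
    char socket[64];   /* what actually reaches the client */
} client;

static void run_command(client *c) {
    strcpy(c->obuf, "*3\\r\\n$1\\r\\na\\r\\n$1\\r\\nb\\r\\n$1\\r\\nc\\r\\n");
    c->close_asap = true;              /* obuf limit exceeded mid-command */
}

/* Before 7.0: the client already has a write event registered, so the buffer
 * is flushed before beforeSleep/freeClientsInAsyncFreeQueue frees the client. */
static void pre_7_0(client *c) {
    run_command(c);
    strcpy(c->socket, c->obuf);        /* write handler fires first */
}

/* 7.0: beforeNextClient frees the CLOSE_ASAP client right after the command,
 * before anything is written to the socket. */
static void v7_0(client *c) {
    run_command(c);
    if (c->close_asap) return;         /* freed here: nothing reaches the client */
    strcpy(c->socket, c->obuf);
}

int main(void) {
    client a = {0}, b = {0};
    pre_7_0(&a);
    v7_0(&b);
    printf("pre-7.0 client reads: \"%s\"\n", a.socket);
    printf("7.0     client reads: \"%s\"\n", b.socket);
    return 0;
}
```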

@oranagra
Member Author

There's something I don't understand.
The issue of writing a partial response to the disconnected client existed in previous releases anyway (regardless of this change).
The only difference in this change is that we break the loop. It doesn't affect anything else, and doesn't affect what the client can read.

@soloestoy
Contributor

soloestoy commented Jan 13, 2023

In previous releases the client could only get part of a correct reply, but if we break in KEYS, the client may get a wrong reply.

Say an instance contains 1000 keys, and the client's obuf reaches the limit after 3 keys have been iterated.

Before the backport, the client may get *1000\r\n$1\r\na\r\n$1\r\nb\r\n$1\r\nc\r\n: the right count but only part of the keys, and then it's disconnected by the server.

After the backport, the client may get *3\r\n$1\r\na\r\n$1\r\nb\r\n$1\r\nc\r\n: what looks like a whole reply, but the count is wrong. Users can see this reply before the client is disconnected by the server.

If we backport together with beforeNextClient, the client can be disconnected without receiving any reply.
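To make the two reply shapes concrete, here is a small self-contained simulation (not Redis code) of a 1000-key instance whose output buffer limit is hit after 3 keys: without the break, the array header ends up announcing the full key count while only 3 elements were buffered; with the break, the header reflects the 3 keys actually emitted, so the reply looks complete but reports the wrong count.

```c
/* Self-contained illustration (not Redis source) of the two reply shapes
 * described above. Protocol CRLFs are written as the literal text \r\n so
 * the output matches the notation used in this thread. */
#include <stdio.h>
#include <string.h>

static void build_reply(char *out, size_t outlen, int break_on_limit,
                        int total_keys, int buffered_keys) {
    const char *keys[] = {"a", "b", "c"};
    char body[128] = "";
    for (int i = 0; i < buffered_keys; i++) {
        char item[32];
        snprintf(item, sizeof(item), "$%zu\\r\\n%s\\r\\n", strlen(keys[i]), keys[i]);
        strcat(body, item);
    }
    /* With the break, the array header reflects the keys actually emitted
     * (reply looks complete, count is wrong for the keyspace); without it,
     * the header announces all 1000 keys and the reply is visibly truncated. */
    snprintf(out, outlen, "*%d\\r\\n%s",
             break_on_limit ? buffered_keys : total_keys, body);
}

int main(void) {
    char before[256], after[256];
    build_reply(before, sizeof(before), 0, 1000, 3);
    build_reply(after,  sizeof(after),  1, 1000, 3);
    printf("before the backport: %s\n", before);  /* *1000\r\n$1\r\na...      */
    printf("after the backport : %s\n", after);   /* *3\r\n$1\r\na... (wrong) */
    return 0;
}
```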

@oranagra
Member Author

oranagra commented Jan 13, 2023

Ohh, now I get it... it's because of the deferred reply length, which we update when exiting the loop, and that part doesn't check the CLOSE_ASAP flag.

Well, since addReplyBulk returns without adding the item to the output buffer, but we still increment numkeys, the response will still be an incomplete one (i.e. a client that tries to read it and ignores the disconnection will hang as before).

This was referenced Jan 16, 2023
@oranagra oranagra merged commit b412366 into redis:unstable Jan 16, 2023
@oranagra oranagra deleted the obuf_exit_loops branch January 16, 2023 11:51
@soloestoy
Contributor

Ohh, now I get it... it's because of the deferred reply length, which we update when exiting the loop, and that part doesn't check the CLOSE_ASAP flag.

Well, since addReplyBulk returns without adding the item to the output buffer, but we still increment numkeys, the response will still be an incomplete one (i.e. a client that tries to read it and ignores the disconnection will hang as before).

The client will not hang, since it can receive the complete response *3\r\n$1\r\na\r\n$1\r\nb\r\n$1\r\nc\r\n; but the response is wrong, as the instance contains 1000 keys.

I see you backported to the previous versions but without beforeNextClient. Did you forget, or did you fix it another way?

@oranagra
Member Author

It will not be a complete response, since it'll be a *3 with only 2 elements (or a *4 with 3).
Am I wrong?

@soloestoy
Contributor

The KEYS iteration breaks after addReplyBulk(c,keyobj) and numkeys++, so it has a chance to be a complete response, *3 with 3 elements, I think.

@oranagra
Member Author

Ohh, I see what I was missing. For some reason I thought that prepareClientToWrite was the one setting the CLOSE_ASAP flag (before adding the reply, thus incrementing numkeys on something that wasn't added to the reply list).

@yossigo please advise.
TL;DR: in 6.2 (released but not announced yet), KEYS will return an incomplete reply (which looks valid) that will be readable by the client before it's disconnected.
Shall we make a 6.2.10 to fix this, by backporting another fix (to free CLOSE_ASAP clients without sending the reply)?

@oranagra
Member Author

I discussed this with Yossi, and realized a few points:

  1. If the client output buffer limit is greater than the OS socket buffer size, it is likely that even in the case of this bug we'll get an incomplete write, and the client application won't be able to read this (full) response. You can argue that it still gets the array-length part, but it's likely that the client library will not pass this partial response to the client app.
  2. We can add an additional numkeys++; next to the break; in keysCommand; this would guarantee that the user will not be able to read a complete response. (It's still arguably a problem if we say the client can still react to the array size, and a false array size is bad.)
  3. We can backport the change to disconnect clients in beforeSleep or beforeNextClient before sending the response down the line, but that's arguably also a breaking change, since when the client uses pipelining, in the past it used to be able to read some responses before the incomplete one, and now it won't.
  4. The last option is to revert the change to keysCommand in 6.2 and 6.0 (and keep only the changes to the RAND commands).

@soloestoy we believe option [4] is the best, please share your thoughts.

@soloestoy
Contributor

I'm OK with option 4

This was referenced Jan 17, 2023
oranagra added a commit that referenced this pull request Jan 17, 2023
Related to the hang reported in #11671
Currently, Redis can disconnect a client for reaching its output buffer limit;
it also stops feeding that buffer with more data, but it keeps running the
loop inside the command (even though the client is already marked for
disconnection).

This PR is an attempt to mitigate the problem for commands that are easy to
abuse, specifically SRANDMEMBER.
The RAND family of commands can take a negative COUNT argument (which is not
bound to the number of elements in the key), so it's enough to create a key
with one field, and then these commands can be used to hang Redis.

NOTICE:
In Redis 7.0 this fix handles KEYS as well, but in this branch it doesn't;
details in #11676
oranagra added a commit that referenced this pull request Jan 17, 2023
In Redis 7.0 this fix covers KEYS as well, but in 6.2 and 6.0 it doesn't.
This is because 7.0 has a mechanism to avoid sending partial replies to the
client, while older releases don't; without it there's a risk that the client
would be able to read what looks like a complete KEYS reply.
enjoy-binbin pushed a commit to enjoy-binbin/redis that referenced this pull request Jul 31, 2023
Related to the hang reported in redis#11671
Currently, Redis can disconnect a client for reaching its output buffer limit;
it also stops feeding that buffer with more data, but it keeps running the
loop inside the command (even though the client is already marked for
disconnection).

This PR is an attempt to mitigate the problem for commands that are easy to
abuse: KEYS, HRANDFIELD, SRANDMEMBER, and ZRANDMEMBER.
The RAND family of commands can take a negative COUNT argument (which is not
bound to the number of elements in the key), so it's enough to create a key
with one field, and then these commands can be used to hang Redis.
For KEYS, the caller can rely on the existing keyspace in Redis (if it is big
enough).