-
Notifications
You must be signed in to change notification settings - Fork 24.4k
Obuf limit, exit during loop in *RAND* commands and KEYS #11676
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@oranagra I think that long commands are something we need to address at some point. it is true that this is an anti pattern but the truth is that it is still widely used and they do cause operational issues (like replica disconnecting and cluster failovers) |
|
I don't think we want to add heartbitting or yielding to core commands, i think we should aim for them to be short. i think what makes this one different is one of two things:
i'm thinking to revert all the changes in this PR and leave just the ones in the RANDMEMBER family. |
zuiderkwast
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. This doesn't solve everything but it solves some things and these if (CLIENT_CLOSE_ASAP) break doesn't make the code harder to read or anything.
If we suspect that RANDMEMBER can take so long and cause OOM , why do we not just limit the count? |
|
Limit how? By which threshold or factor? That would be a breaking change... |
Yes. I agree this is a breaking change (even though I doubt someone uses such high numbers) |
|
I'm not very happy with this suggested mitigation, although it does handle some cases.
I would still rather see is investigate potentially breaking changes in order to better lock down these changes. I agree with @ranshid that this practically won't be a breaking change. |
my concern is that any threshold we'll choose would be too high for some deployments and would not protect them from abuse, and on the other hand too low for other use cases and could be a bad breaking change. the one that did occur to me with regards to the RAND commands is to limit it as a factor of 4 times the size of the set, after that i'll argue that the probability of selecting items at random doesn't matter much, and also it may much better to do that on the client side. the fact is that regardless of this discussion about RAND and KEYS, if redis is left exposed to abuse, it's not hard to overload it with other commands, not to mention scripts. you are right that turning off client replies will render this mitigation useless, but on the bright side, it means the offender is only able to hog CPU and availability of redis, and not also the host memory. |
|
I don't think this stops a malicious attacker. If they get in, they can cause problems using scripts as you say. I believe KEYS is often used in simple admin scripts, taking a backup of all keys or something. It may work for the user but not for the redis adming or hosting service. If that's what we want to solve, then I guess we can add a config for maximum N for these O(N) commands. Then the redis admin can prevent the redis user from doing this. WDYT? |
|
i don't like it. it's important to note that this discussion has two topics that we should maybe keep separate. |
|
I am not sure why the DDOS issue was raised, since Redis has mechanisms to prevent unauthorized execution of these commands. |
This does seem to be the "crux" of the DDOS conversation. I want to write a little about it here, but maybe we should move the conversation elsewhere. There are really two attack vectors we are talking about, the first is someone has a valid authenticated connection to Redis. I honestly don't think we should really care about this vector. This attack vector includes someone writing an arbitrary lua script that causes an infinite loop. I think this could be generalized to just sending a large number of TLS connections as well, which can also overwhelm the server. The second vector is someone is able to inject some data into an application which then sends it verbatim to a downstream Redis server. As was mentioned, I don't think many applications routinely call
I would be okay with this type of breaking change. |
|
In this context, I'm generally not that concerned about a malicious attacker. A Redis instance should already be protected by networking restrictions and authentication/authorization rules. IMO, the bigger concern is for accidental/unintended execution.
We can do something similar with other commands which accept a "count" parameter. Simply limit the count to a "reasonable" value. Who really needs 1,000,000,000 items extracted from a SET? One option for this would be to create a config parameter which can be used to explicitly set these limits. |
|
We discussed this issue in a core-team meeting, and our conclusions were:
|
|
I decided to revert the changes to HGETALL, LRANGE, ZRANGE, etc since going this way would mean we'll need to start treating too many loops in redis code and it's hard to know when to stop (slippery slope).
|
soloestoy
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree this change, but do we really need backport to old versions? I don't think it's a security issue.
Related to the hang reported in #11671 Currently, redis can disconnect a client due to reaching output buffer limit, it'll also avoid feeding that output buffer with more data, but it will keep running the loop in the command (despite the client already being marked for disconnection) This PR is an attempt to mitigate the problem, specifically for commands that are easy to abuse, specifically: KEYS, HRANDFIELD, SRANDMEMBER, ZRANDMEMBER. The RAND family of commands can take a negative COUNT argument (which is not bound to the number of elements in the key), so it's enough to create a key with one field, and then these commands can be used to hang redis. For KEYS the caller can use the existing keyspace in redis (if big enough).
|
it's a harmless bugfix that can prevent a lot of pain, why not backport? |
|
the mechanism we handle clients with in 7.0 we free the but before 7.0, we only free the for example, if we backport to the old version before 7.0, then client may get wrong reply from server before disconnected. it's ok if you backport with |
|
There's something I don't understand. |
|
previous release can only get part of right reply, but if break in say an instance contains 1000 keys, and client's obuf reaches limit when iterated 3 keys. before backport, the client may gets after backport, the client may gets if backport with |
|
ohh, now i get it.. because of the deferring reply which we update when exiting the loop and that part doesn't check the CLOSE_ASAP flag. well, since addReplyBulk returns without adding the item to the output buffer, but we still increment numkeys, the response will still be an incomplete one (i.e. client that tries to read it and ignores the disconnection will hang as before). |
the client will not hang since it can receive the complete response I see you backport to the previous versions but without |
|
it will not be a complete response since it'll be a |
|
the keys iteration breaks after |
|
ohh, i see what i was missing. for some reason i thought that prepareClientToWrite was the one setting the CLOSE_ASAP flag (before adding the reply, thus incrementing @yossigo please advise. |
|
i discussed this with Yossi, and realized a few points:
@soloestoy we believe option [4] is the best, please share your thoughts. |
|
I'm OK with option 4 |
Related to the hang reported in #11671 Currently, redis can disconnect a client due to reaching output buffer limit, it'll also avoid feeding that output buffer with more data, but it will keep running the loop in the command (despite the client already being marked for disconnection) This PR is an attempt to mitigate the problem, specifically for commands that are easy to abuse, specifically: SRANDMEMBER. The RAND family of commands can take a negative COUNT argument (which is not bound to the number of elements in the key), so it's enough to create a key with one field, and then these commands can be used to hang redis. NOTICE: For in Redis 7.0 this fix handles KEYS as well, but in this branch it doesn't, details in #11676
in Redis 7.0 this fix covers KEYS as well, but in 6.2 and 6.0 it doesn't, this is because in 7.0 there's a mechanism to avoid sending partial replies to the client, and in older releases there isn't, and without it there's a risk that the client would be able to read what looks like a complete KEYS command.
Related to the hang reported in redis#11671 Currently, redis can disconnect a client due to reaching output buffer limit, it'll also avoid feeding that output buffer with more data, but it will keep running the loop in the command (despite the client already being marked for disconnection) This PR is an attempt to mitigate the problem, specifically for commands that are easy to abuse, specifically: KEYS, HRANDFIELD, SRANDMEMBER, ZRANDMEMBER. The RAND family of commands can take a negative COUNT argument (which is not bound to the number of elements in the key), so it's enough to create a key with one field, and then these commands can be used to hang redis. For KEYS the caller can use the existing keyspace in redis (if big enough).
Related to the hang reported in #11671
Currently, redis can disconnect a client due to reaching output buffer limit, it'll also avoid feeding that output buffer with more data, but it will keep running the loop in the command (despite the client already being marked for disconnection)
This PR is an attempt to mitigate the problem, specifically for commands that are easy to abuse, specifically: KEYS, HRANDFIELD, SRANDMEMBER, ZRANDMEMBER.
The RAND family of commands can take a negative COUNT argument (which is not bound to the number of elements in the key), so it's enough to create a key with one field, and then these commands can be used to hang redis.
For KEYS the caller can use the existing keyspace in redis (if big enough).
NOTICE: in Redis 7.0 this fix covers KEYS as well, but in 6.2 and 6.0 it doesn't, this is because in 7.0 there's a mechanism to avoid sending partial replies to the client, and in older releases there isn't, and without it there's a risk that the client would be able to read what looks like a complete KEYS command.