Skip to content

Don't write to clients with CLIENT_CLOSE_ASAP flag#7694

Closed
jeffreylovitz wants to merge 1 commit intoredis:unstablefrom
jeffreylovitz:write-to-closing-clients
Closed

Don't write to clients with CLIENT_CLOSE_ASAP flag#7694
jeffreylovitz wants to merge 1 commit intoredis:unstablefrom
jeffreylovitz:write-to-closing-clients

Conversation

@jeffreylovitz
Copy link
Contributor

This PR resolves an issue observed on RedisGraph in which prematurely closing clients that are awaiting replies from Redis can cause a server crash on a broken pipe.

I can consistently reproduce this issue locally, but the code to do so is unfortunately non-trivial. The reproduction steps include instantiating several clients, issuing RedisGraph commands which return data, then disconnecting the clients without waiting for responses.

This causes a crash with the stack trace:

Thread 1 "redis-server" received signal SIGPIPE, Broken pipe.
0x00007ffff76252c7 in __libc_write (fd=16, buf=0x555555a74bc8, nbytes=140) at ../sysdeps/unix/sysv/linux/write.c:27
27      ../sysdeps/unix/sysv/linux/write.c: No such file or directory.

#0  0x00007ffff76252c7 in __libc_write (fd=16, buf=0x555555a74bc8, nbytes=140) at ../sysdeps/unix/sysv/linux/write.c:27
#1  0x000055555564dfef in connSocketWrite (conn=0x555555a1ec50, data=0x555555a74bc8, data_len=140) at connection.c:168
#2  0x00005555555a2af7 in connWrite (conn=0x555555a1ec50, data=0x555555a74bc8, data_len=140) at connection.h:140
#3  0x00005555555a630f in writeToClient (c=0x555555a74970, handler_installed=0) at networking.c:1311
#4  0x00005555555a6780 in handleClientsWithPendingWrites () at networking.c:1426
#5  0x00005555555ab64c in handleClientsWithPendingWritesUsingThreads () at networking.c:3063
#6  0x0000555555590ef5 in beforeSleep (eventLoop=0x555555928230) at server.c:2202
#7  0x000055555558a42a in aeProcessEvents (eventLoop=0x555555928230, flags=27) at ae.c:443
#8  0x000055555558a748 in aeMain (eventLoop=0x555555928230) at ae.c:539
#9  0x00005555555996fe in main (argc=2, argv=0x7fffffffd2a8) at server.c:5325

This issue is reproduce on unstable and 6.0.6, this PR resolves it on both.

@madolson
Copy link
Contributor

Hey, thanks for trying to fix this, but I'm not sure this is the right fix. We should attempt to write out pending data from a client that has been marked as close asap, since there are certain places we do it intentionally. I'm also wondering what RedisGraph is doing, since the SIGPIPE signal should be ignored. You might want to follow up with them:

ref:

signal(SIGPIPE, SIG_IGN);

@oranagra
Copy link
Member

This is indeed odd that you're getting that signal although suppose to be ignored.
@jeffreylovitz is there any chance RedisGraph is doing something that overrides that?
@yossigo maybe you have some other idea why this can happen?

Regarding the fix, there's actually already a PR with a similar fix and a similar title: #7202 (attempting to fix a different issue), it looks like Salvatore was actually ok with this specific change, but not about another one in that other PR.

@madolson i suppose you're referring to clientAcceptHandler, but there we're writing before setting the flag, and anyway, we're writing directly to the socket, not using the write handler.
is there anything i'm missing?

@oranagra
Copy link
Member

oranagra commented Aug 23, 2020

Ohh sorry, i now realize it's not the same change as in the other PR (it's the same as the one Salvatore suggested instead of one of the changes there).

@yossigo
Copy link
Collaborator

yossigo commented Aug 23, 2020

@oranagra no idea, unless it gets overridden elsewhere but it doesn't seem so.

@jeffreylovitz
Copy link
Contributor Author

@oranagra We override the SIGUSR signals and, on some versions, inject our own logs upon receiving the SIGSEGV signal before calling the Redis handler. Other than that, RedisGraph shouldn't interfere at all.

@yossigo
Copy link
Collaborator

yossigo commented Aug 23, 2020

@jeffreylovitz I can confirm creating similar situations where there's a lot of buffered output and a client drops, connSocketWrite() does get an EPIPE response as expected (i.e. SIGPIPE is ignored). Something in RedisGraph must be messing up with the handler...

@jeffreylovitz
Copy link
Contributor Author

@yossigo To get more context, I added an explicit EPIPE signal handler to RedisGraph, and it fails to catch the signal or have any impact on this crash.

@madolson
Copy link
Contributor

@oranagra I was actually thinking about ACLs, but thinking about it more this PR looks correct, since we have another functionality to achieve the mechanism I was considering.

@oranagra
Copy link
Member

oranagra commented Sep 1, 2020

@madolson i guess you meant #7202 right?

@madolson
Copy link
Contributor

@jeffreylovitz I think your change should now pass the tests since we fixed the underlying issue, would you mind doing an empty push to your branch to kickoff the CI again?

@oranagra
Copy link
Member

This was actually already resolved in #7202 (has exactly this change and more. I forgot to close this one).
@madolson FYI, there's a button that can rerun a failed CI, but in this case a rebase was required (not an empty push)

@oranagra oranagra closed this Oct 16, 2020
YaacovHazan pushed a commit that referenced this pull request Jan 26, 2026
**Bug Fixes:**

* [#7219](RediSearch/RediSearch#7219) Fix a
concurrency issue on Reducer in `FT.AGGREGATE`
* [#7255](RediSearch/RediSearch#7255) Fix
`BM25STD` underflow wraparound
* [#7275](RediSearch/RediSearch#7275) Report
used memory as `unsigned long long` to avoid underflows
* [#7264](RediSearch/RediSearch#7264) Fix
`totalDocsLen` updates
* [#6995](RediSearch/RediSearch#6995) Do not
fanout `FT.INFO` to replicas
* [#7350](RediSearch/RediSearch#7350) Fix
`FT.CREATE` failure with LeanVec parameters on non-Intel architectures
* [#7694](RediSearch/RediSearch#7694) Use
asynchronous jobs for GC in SVS to accelerate execution
* [#7384](RediSearch/RediSearch#7384) Fix index
load from RDB temporary memory overhead
* [#7459](RediSearch/RediSearch#7459) Fix Fork
GC potential double-free on error path
* [#7458](RediSearch/RediSearch#7458) Fix a GC
performence regression
* [#7470](RediSearch/RediSearch#7470) Avoid
draining worker thread pool from FLUSH callback to avoid deadlocks
* [#7554](RediSearch/RediSearch#7554) Handle the
case where `SCORE` is sent alone without extra fields (coordinator)
* [#7685](RediSearch/RediSearch#7685) Fix cursor
logical leak
* [#7794](RediSearch/RediSearch#7794) Fix
`cmp_strings()` to correctly handle binary data with embedded NULLs in
TOLIST reducer in FT.AGGREGATE
* [#7873](RediSearch/RediSearch#7873) Handle
warnings in empty `FT.AGGREGATE` replies (cluster)
* [#7886](RediSearch/RediSearch#7886) Remove
non-TEXT fields from the spec keys dictionary
* [#7904](RediSearch/RediSearch#7904) Refactor
keys dictionary handling
* [#7901](RediSearch/RediSearch#7901) Support
multiple warnings in reply
* [#8083](RediSearch/RediSearch#8083) Fix
incorrect FULLTEXT field metric counts
* [#8153](RediSearch/RediSearch#8153) Fix
configuration registration issues

**Improvements:**

* [#7154](RediSearch/RediSearch#7154)
`FT.AGGREGATE` can return Background Indexing OOM warnings
* [#7083](RediSearch/RediSearch#7083) Add the
default text scorer as a configuration option
* [#7341](RediSearch/RediSearch#7341) Rename
`FT.PROFILE` counter fields
* [#7436](RediSearch/RediSearch#7436) Enhance
`FT.PROFILE` with vector search execution details
* [#7435](RediSearch/RediSearch#7435) Ensure
full `FT.PROFILE` output on timeout with RETURN policy
* [#7534](RediSearch/RediSearch#7534) Reduce the
number of worker threads asynchronously to avoid deadlocks during
queries
* [#7614](RediSearch/RediSearch#7614) Track
timeout warnings and errors in INFO
* [#7646](RediSearch/RediSearch#7646) Track
`maxprefixexpansions` warnings and errors in INFO
* [#7577](RediSearch/RediSearch#7577) Track
query syntax/argument errors (basis for query error tracking)
* [#7737](RediSearch/RediSearch#7737) Add
`Internal cursor reads` metric to cluster `FT.PROFILE` output
* [#7759](RediSearch/RediSearch#7759) Extend
indexing metrics
* [#7710](RediSearch/RediSearch#7710) Support
`WITHCOUNT` keyword in `FT.AGGREGATE`
* [#7957](RediSearch/RediSearch#7957) Persist
query warnings across cursor reads
* [#8054](RediSearch/RediSearch#8054) Add
logging for index-related commands
* [#8151](RediSearch/RediSearch#8151) Fix shard
total profile time reporting in `FT.PROFILE`
* [#8103](RediSearch/RediSearch#8103) Output
current thread IndexSpec information on crash
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants