-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Fix #2249: Handle fail on cluster node responses appropriately #2288
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Right now we don't pay attention to fail state (PFAIL == FAIL) and continue trying to connect in the main loop. I don't believe this was intended looking at the code, we just weren't handling the flag appropriately. Added now. Docs at: https://redis.io/commands/cluster-nodes/
slorello89
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @NickCraver - did you intend to check the PFAIL state as well? From the comments above it sounds like it, but this is only looking at the FAIL state, (fail? in the the flags rather than fail?).
| } | ||
|
|
||
| NodeId = parts[0]; | ||
| IsFail = flags.Contains("fail"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps have a separate field for fail? which would indicate a PFAIL? Not sure what the alternative branch would be, perhaps don't retry connection to a node that's in a PFAIL state, since we have some confirmation that it's possibly in a failed state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I intentionally left it off because I didn't want to make assumptions especially say across a failed DC link where the client might be able to see a split brained cluster node but the initial connected node may not. Not set on anything though - we could still have it and not make decisions on it. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think PFAIL is meant to be purely informational "hey this cluster node thinks that node's dead but it doesn't KNOW it's dead, make of it what you will". To your point you could have a situation where cluster node 1 think's it's alive and that cluster node 2 is dead and vice versa. So the question then becomes do you make the decision as to what to do with it in the client, or do you leave it to the library's user. I don't believe there's a canonical behavior here so leaving an informational PFAIL here and letting the user decide what to do about it could be useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a IsPossiblyFail property so it's accessible!
|
|
||
| /// <summary> | ||
| /// Gets whether this node is possibly in a failed state. | ||
| /// Possibly here means the node we're getting status from can't communicate with it, but doesn't doesn't mean it's down for sure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment sentence could use a polish
Right now we don't pay attention to fail state (PFAIL == FAIL) and continue trying to connect in the main loop. I don't believe this was intended looking at the code, we just weren't handling the flag appropriately. Added now.
Docs at: https://redis.io/commands/cluster-nodes/