Fix #2249: Handle fail on cluster node responses appropriately #2288

NickCraver · 2022-10-28T12:16:53Z

Right now we don't pay attention to the fail state (PFAIL == FAIL) and continue trying to connect in the main loop. Looking at the code, I don't believe this was intended; we just weren't handling the flag appropriately. Added now.

Docs at: https://redis.io/commands/cluster-nodes/

slorello89

Hey @NickCraver - did you intend to check the PFAIL state as well? From the comments above it sounds like it, but this only looks at the FAIL state (PFAIL appears as "fail?" in the flags rather than "fail").
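To make the "fail" versus "fail?" distinction concrete, here is a minimal sketch of reading both flags from one line of CLUSTER NODES output. The class and method names (ClusterFlagsSketch, ParseFailFlags) are hypothetical and this is not the library's parsing code; it only assumes the documented line layout, where the third space-separated field is a comma-separated flag list.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only; not StackExchange.Redis code.
public static class ClusterFlagsSketch
{
    // Returns whether a single CLUSTER NODES line carries the FAIL ("fail")
    // and/or PFAIL ("fail?") flag.
    public static (bool IsFail, bool IsPossiblyFail) ParseFailFlags(string line)
    {
        // Fields are space-separated: <id> <ip:port@cport> <flags> <master> ...
        var parts = line.Split(' ');
        if (parts.Length < 3)
        {
            throw new FormatException("Expected at least 3 space-separated fields.");
        }

        // Flags are comma-separated, e.g. "master,fail" or "slave,fail?".
        var flags = parts[2].Split(',');

        // Exact matches against the split array, so "fail?" never counts as "fail".
        return (flags.Contains("fail"), flags.Contains("fail?"));
    }
}
```

Matching against the split array rather than the raw flag string matters here: a substring check for "fail" on "slave,fail?" would report a confirmed failure for a node that is only suspected.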

slorello89 · 2022-11-02T12:32:41Z

src/StackExchange.Redis/ClusterConfiguration.cs

            }

            NodeId = parts[0];
+            IsFail = flags.Contains("fail");


Perhaps have a separate field for "fail?", which would indicate a PFAIL? Not sure what the alternative branch would be; perhaps don't retry the connection to a node that's in a PFAIL state, since we have some indication that it's possibly in a failed state?

NickCraver · 2022-11-02

I intentionally left it off because I didn't want to make assumptions, especially across, say, a failed DC link, where the client might be able to see a split-brained cluster node but the initially connected node may not. I'm not set on anything though: we could still expose it and not make decisions based on it. Thoughts?

slorello89 · 2022-11-02

I think PFAIL is meant to be purely informational: "hey, this cluster node thinks that node's dead, but it doesn't KNOW it's dead, make of it what you will". To your point, you could have a situation where cluster node 1 thinks it's alive and that cluster node 2 is dead, and vice versa. So the question then becomes: do you decide what to do with it in the client, or do you leave it to the library's user? I don't believe there's a canonical behavior here, so leaving an informational PFAIL here and letting the user decide what to do about it could be useful.
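As a rough sketch of what "letting the user decide" could look like on the consuming side: the NodeStatus record, the NodeAction enum, and the Decide policy below are all hypothetical, with only the IsFail and IsPossiblyFail names taken from this PR. The PFAIL branch is one possible choice, not a canonical behavior.

```csharp
using System;

// Hypothetical types for illustration; only the IsFail / IsPossiblyFail
// names come from this PR. This is not StackExchange.Redis code.
public sealed record NodeStatus(string NodeId, bool IsFail, bool IsPossiblyFail);

public enum NodeAction { Connect, ConnectWithCaution, Skip }

public static class FailPolicySketch
{
    public static NodeAction Decide(NodeStatus node)
    {
        // FAIL: the cluster has agreed the node is down, so skip it.
        if (node.IsFail) return NodeAction.Skip;

        // PFAIL: only the reporting node suspects a failure. This sketch
        // still connects but flags the node; a caller could equally choose
        // to skip it or ignore the hint (an assumption, not a rule).
        if (node.IsPossiblyFail) return NodeAction.ConnectWithCaution;

        return NodeAction.Connect;
    }
}
```

Keeping the decision in caller code like this leaves the client free to treat PFAIL as a hint, which fits the split-brain concern raised above: the reporting node's view may differ from what the client itself can reach.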

NickCraver · 2022-11-15

Added an IsPossiblyFail property so it's accessible!

philon-msft · 2022-11-15T15:51:55Z

src/StackExchange.Redis/ClusterConfiguration.cs

+
+        /// <summary>
+        /// Gets whether this node is possibly in a failed state.
+        /// Possibly here means the node we're getting status from can't communicate with it, but doesn't doesn't mean it's down for sure.


This comment sentence could use a polish

NickCraver added 2 commits October 28, 2022 08:16

Add release notes

042a403

NickCraver mentioned this pull request Oct 28, 2022

"master,fail" state not handled correctly #2249

Closed

NickCraver added 🪲 bug ⚙️ area:cluster labels Oct 28, 2022

NickCraver requested review from mgravell and slorello89 October 28, 2022 12:38

slorello89 reviewed Nov 2, 2022

NickCraver added 5 commits November 15, 2022 08:03

Merge remote-tracking branch 'origin/main' into craver/fail-state

dfdc4a7

Merge remote-tracking branch 'origin/main' into craver/fail-state

30422fd

Add IsPossiblyFail and release notes

066b801

Heartbeat tests: wide birth

5ba97b7

Alrighty, you get out of here.

4e15165

NickCraver requested a review from philon-msft November 15, 2022 15:36

philon-msft reviewed Nov 15, 2022

philon-msft approved these changes Nov 15, 2022

NickCraver added 2 commits November 17, 2022 09:29

Fix comment

30dc08a

Fix release notes

01e836a

NickCraver merged commit f3ac74a into main Nov 17, 2022

NickCraver deleted the craver/fail-state branch November 17, 2022 14:44
