Skip to content

Conversation

@NickCraver
Copy link
Collaborator

Right now we don't pay attention to fail state (PFAIL == FAIL) and continue trying to connect in the main loop. I don't believe this was intended looking at the code, we just weren't handling the flag appropriately. Added now.

Docs at: https://redis.io/commands/cluster-nodes/

Right now we don't pay attention to fail state (PFAIL == FAIL) and continue trying to connect in the main loop. I don't believe this was intended looking at the code, we just weren't handling the flag appropriately. Added now.

Docs at: https://redis.io/commands/cluster-nodes/
Copy link
Collaborator

@slorello89 slorello89 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @NickCraver - did you intend to check the PFAIL state as well? From the comments above it sounds like it, but this is only looking at the FAIL state, (fail? in the the flags rather than fail?).

}

NodeId = parts[0];
IsFail = flags.Contains("fail");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps have a separate field for fail? which would indicate a PFAIL? Not sure what the alternative branch would be, perhaps don't retry connection to a node that's in a PFAIL state, since we have some confirmation that it's possibly in a failed state?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I intentionally left it off because I didn't want to make assumptions especially say across a failed DC link where the client might be able to see a split brained cluster node but the initial connected node may not. Not set on anything though - we could still have it and not make decisions on it. Thoughts?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think PFAIL is meant to be purely informational "hey this cluster node thinks that node's dead but it doesn't KNOW it's dead, make of it what you will". To your point you could have a situation where cluster node 1 think's it's alive and that cluster node 2 is dead and vice versa. So the question then becomes do you make the decision as to what to do with it in the client, or do you leave it to the library's user. I don't believe there's a canonical behavior here so leaving an informational PFAIL here and letting the user decide what to do about it could be useful.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a IsPossiblyFail property so it's accessible!


/// <summary>
/// Gets whether this node is possibly in a failed state.
/// Possibly here means the node we're getting status from can't communicate with it, but doesn't doesn't mean it's down for sure.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment sentence could use a polish

@NickCraver NickCraver merged commit f3ac74a into main Nov 17, 2022
@NickCraver NickCraver deleted the craver/fail-state branch November 17, 2022 14:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants