Skip to content

Conversation

@amsoedal
Copy link
Contributor

@amsoedal amsoedal commented Sep 23, 2021

Adding an automatic subscription to the AzureRedisEvents pubsub channel for Azure caches. This channel notifies clients of upcoming maintenance and failover events.

By exposing these events, users will be able to know about maintenance ahead of time, and can implement their own logic (e.g. diverting traffic from the cache to another database for the duration of the maintenance event) in response, with the goal of minimizing downtime and disrupted connections. We also automatically refresh our view of the topology of the cluster in response to certain events.

Here are some of the possible notifications:

// This event gets fired ~20s before maintenance begins
NodeMaintenanceStarting,

// This event gets fired when maintenance is imminent (<5s)
NodeMaintenanceStart,

// Indicates that the node maintenance operation is over
NodeMaintenanceEnded,

// Indicates that a replica has been promoted to primary
NodeMaintenanceFailover

@NickCraver
Copy link
Collaborator

Hey there! Thanks for getting this in :)

I'll try and get eyes on this week, but wanted to scan quick and add some top level thoughts:

  1. I think we can remove the config option here - we're detecting if it's Azure Redis already so we can use that to determine whether to hook up to events or not.
  2. I think we can simplify a lot of the objects here, e.g. MaintenanceNotificationListener being simplified to a single method like AzureMaintenanceEvent.AddListener(ConnectionMultiplexer muxer, string cause) (or whatever overload makes sense).
    • Also applies to ReconfigureAfterMaintenance - we can decide whether parsing the event is needed or not
  3. I think we can simplify the looping on the pipe-delimited message to be span-based for an easier-to-understand iterator.
  4. In testing, doing inline Assert.True() instead of if...return false statements in ValidateAzureMaintenanceEvent will make for easier testing - we'll see exactly which check failed by line number in test output that way. If it's false, it'll blow up and not continue the method so that's alright, IMO.

@amsoedal
Copy link
Contributor Author

@NickCraver Thanks so much for the quick reply! I have a couple questions/comments:

  1. Philo mentioned that we'll automatically subscribe to these events for anyone who is using the Azure.StackExchange.Redis extension package. However since subscribing to the pubsub creates another connection, we want this to be opt-in only in order not to burden folks who might already be short on connections. Thoughts?

  2. Could you point me to a place in the code or a doc that illustrates what span-based iteration looks like? Not sure I'm familiar with that type/concept!

Working on addressing the other feedback now :)

@NickCraver
Copy link
Collaborator

Oooh gotcha - alrighty maybe a mismatch on comms here between me and @philon-msft - let me clarify and see if we agree.

On 1: I'm of the opinion we can go ahead and opt everyone into this in the current package if we think we're connecting to Azure. We could make the method for doing so explicitly public so that any non-detectable/custom endpoints are easily opted in for anyone in that situation. I don't think we have to wait for a consumer package move. It won't create another connection than it does today because we already light up the subscriber connection unconditionally on the ServerEndPoint - ultimately you're looking here - called when we first connect.

On 2: I took a stab at illustrating this and...meh, it's kind of a wash. Here's what I ended up with:

/// <summary>
/// Azure node maintenance event details.
/// </summary>
public class AzureMaintenanceEvent
{
	public AzureMaintenanceEvent(ReadOnlySpan<char> message)
	{
		try
		{
			while (message.Length > 0)
			{
				if (message[0] == '|')
				{
					message = message.Slice(1);
					continue;
				}

				// Grab the next pair
				var key = message.Slice(0, message.IndexOf('|'));
				message = message.Slice(key.Length + 1);

				var valueEnd = message.IndexOf('|');
				var value = valueEnd > -1 ? message.Slice(0, valueEnd) : message;
				message = message.Slice(value.Length);

				if (key.Length > 0 && value.Length > 0)
				{
					switch (key)
					{
						case var _ when key.SequenceEqual(nameof(NotificationType)):
							NotificationType = value.ToString();
							break;
						case var _ when key.SequenceEqual(nameof(StartTimeUtc)) && DateTime.TryParse(value, out DateTime startTime):
							StartTimeUtc = DateTime.SpecifyKind(startTime, DateTimeKind.Utc);
							break;
						case var _ when key.SequenceEqual(nameof(IsReplica)) && bool.TryParse(value, out var isReplica):
							IsReplica = isReplica;
							break;
						case var _ when key.SequenceEqual(nameof(IPAddress)) && IPAddress.TryParse(value, out var ipAddress):
							IpAddress = ipAddress;
							break;
						case var _ when key.SequenceEqual(nameof(SSLPort)) && Int32.TryParse(value, out var port):
							SSLPort = port;
							break;
						case var _ when key.SequenceEqual(nameof(NonSSLPort)) && Int32.TryParse(value, out var nonsslport):
							NonSSLPort = nonsslport;
							break;
						default:
							break;
					}
				}
			}
		}
		catch { }
	}

	/// <summary>
	/// The type of event.
	/// </summary>
	public readonly string NotificationType;

	/// <summary>
	/// The start time of the event.
	/// </summary>
	public readonly DateTime? StartTimeUtc;

	/// <summary>
	/// Whether the event is for a replica node.
	/// </summary>
	public readonly bool IsReplica;

	/// <summary>
	/// IPAddress of the node event is intended for
	/// </summary>
	public readonly IPAddress IpAddress;

	/// <summary>
	/// SSL port
	/// </summary>
	public readonly int SSLPort;

	/// <summary>
	/// Non-SSL Port
	/// </summary>
	public readonly int NonSSLPort;

	/// <summary>
	/// Returns a string representing the maintenance event with all of its properties.
	/// </summary>
	public override string ToString() =>
		$"{nameof(NotificationType)}|{NotificationType}|" +
			$"{nameof(StartTimeUtc)}|{StartTimeUtc:s}|" +
			$"{nameof(IsReplica)}|{IsReplica}|" +
			$"{nameof(IpAddress)}|{IpAddress}|" +
			$"{nameof(SSLPort)}|{SSLPort}|" +
			$"{nameof(NonSSLPort)}|{NonSSLPort}";
}

However, after doing that I don't really know that there's any reason to even parse it for our use case, so maybe we just record the string itself (for logs/events/whatever) and move along? If the only practical use of the event parsing is to have specifics we're turning back into a string for debugging...maybe we don't do any work on it at all. If we can log it as-is and that's equally readable as any .ToString() implementation: how about we simplify? We should be aiming to trigger the same reconfigure either way.

What are y'all thinking on these fronts?

@amsoedal
Copy link
Contributor Author

@NickCraver thanks for explaining!

I removed the configuration option since we're not sparing any connections by having it around, as you pointed out. I also cleaned up the tests and condensed the classes into an AzureMaintenanceEvent.AddListenerAsync() method as suggested, and it's so much cleaner now! 😃

The only thing I had some concerns about is the parsing -- I'm not sure if I understood your response correctly (please do correct me if I'm wrong) -- I think we do want to parse the message into a class, since the user might want to do some additional logic based on which event comes back and when the start time is. As for whether we keep the pipe-based iteration or not, I'm happy to switch to using the span method if it's more intuitive!

Copy link
Collaborator

@NickCraver NickCraver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes! I left some more comments for refinement, but I'm a little fuzzy on if/why we need this to be public and may be missing info there, so take comments with a grain of salt. I see the mismatch on "do we need to parse" after the latest changes - if we expose, I agree we'd need to parse.

Tried to address everything in comments but happy to chat further - looking great!

/// <summary>
/// Azure node maintenance event details
/// </summary>
public class AzureMaintenanceEvent
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, if we do remove the events, let's make this internal to start with so we can change the surface area if needed in a non-breaking way :)

@amsoedal
Copy link
Contributor Author

@NickCraver thanks for the excellent review! I've replied to most of the comments, and updated this PR's description with more context (specifically re: parsing and exposing events). Let me know if it makes sense, and hopefully we can sync up if there's anything else we need to align on! :)

case var _ when key.SequenceEqual(nameof(NotificationType).ToCharArray()):
NotificationType = value.ToString();
break;
case var _ when key.SequenceEqual(nameof(StartTimeInUTC).ToCharArray()) && DateTime.TryParse(value.ToString(), out DateTime startTime):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These .ToString() are avoidable; on modern TFMs, there are span overloads on the types directly, but before that: consider System.Buffers.Text.Utf8Parser.TryParse, which has a range of APIs; also - and this is a very important one: think i18n/l10n: I would say "explicitly specify invariant", but I believe if using Utf8Parser: it is assumed - you should probably check on that, though

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried applying the Utf8Parser, but it deals with ReadOnlySpan instead of ReadOnlySpan, which is what we're working with here. Adding the extra conversion between the two seems like the opposite of what we're trying to do here? Also the documentation doesn't mention anything about cultures, so I think it makes sense to opt for the DateTime.TryParseExact() option for StartTimeUtc so we can specify it explicitly, as per your other comment.
I ended up using the preprocessor directive you mentioned in another comment to get rid of all the .ToString() conversions for the newer frameworks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The DateTime parse is of particular concern to me; I wonder if the format should be explicit/"exact" here (i.e. DateTime.TryParseExact) - or is it sufficient to use the 'u' pattern (which is not the default) on Utf8Parser? Does 'u' act the same as "u"? (open question; I don't know the answer)

Regarding this comment -- on the Azure side we're sending these events with the "s" (sortable) format, but according to VS 'u' looks like this:

image

And 'U' looks like:

image

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair enough; whatever works :) My key point: we need to make sure it is culture-insensitive, and we should probably validate that in a test by changing (and restoring in a finally) the thread's culture and ui-culture to something random and suitably awkward

case var _ when key.SequenceEqual(nameof(IsReplica).ToCharArray()) && bool.TryParse(value.ToString(), out var isReplica):
IsReplica = isReplica;
break;
case var _ when key.SequenceEqual(nameof(IPAddress).ToCharArray()) && IPAddress.TryParse(value.ToString(), out var ipAddress):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: no overload in Utf8Parser here, so this might be a good candidate for:

#if NETSTANDARD2_1_OR_GREATER || NETCOREAPP3_0_OR_GREATER
    && IPAddress.TryParse(value, out var ipAddress):
#else
    && IPAddress.TryParse(value.ToString(), out var ipAddress):
#endif

await sub.SubscribeAsync(PubSubChannelName, (channel, message) =>
{
var newMessage = new AzureMaintenanceEvent(message);
multiplexer.InvokeServerMaintenanceEvent(newMessage);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can avoid the AzureMaintenanceEvent alloc when no subscriber, if we just pass down message, i.e.

multiplexer.InvokeServerMaintenanceEvent(message);
// ...
internal void InvokeServerMaintenanceEvent(string message)
  => AzureServerMaintenanceEvent?.Invoke(this, new AzureMaintenanceEvent(message));

the args on the right here are only evaluated if the ?. check passes, i.e. we have listeners

Copy link
Contributor Author

@amsoedal amsoedal Oct 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even when there's no subscriber, we currently want the AzureMaintenanceEvent handy since we access the properties to see what type of event it is and trigger reconfigure in certain cases. Let me know if you disagree!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unless I'm missing it, all we do is AzureServerMaintenanceEvent?.Invoke with the event; I'm just trying to avoid the overhead if AzureServerMaintenanceEvent turns out to be null, which we can reasonably presume is the default and most common scenario; do we do anything else special in the case when no external consumer has touched the event?

Copy link
Collaborator

@mgravell mgravell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did a review pass; heading in the right direction, but quite a few suggestions; please please feel free to discuss any of them (not trying to be a dictator here)

@mgravell
Copy link
Collaborator

mgravell commented Oct 1, 2021

oh, can we update releasenotes.md too please, in the unreleased section? thanks

@mgravell
Copy link
Collaborator

mgravell commented Oct 1, 2021

The DateTime parse is of particular concern to me; I wonder if the format should be explicit/"exact" here (i.e. DateTime.TryParseExact) - or is it sufficient to use the 'u' pattern (which is not the default) on Utf8Parser? Does 'u' act the same as "u"? (open question; I don't know the answer)

@NickCraver
Copy link
Collaborator

Hey @amsoedal! I figured we'd see you on call today but missed out on syncing up - figured it was easier to talk through some tweaks to make this more extensible to other providers later and not lock ourselves in on the new public types with a few changes. I have the proposals in craver/1865-tweaks but definitely don't want to stomp on PR and credit here. I've granted you write access to this repo - could you please move your branch here and open a new PR (I think all current comments are good to lose). Then we can open a PR from craver/1865-tweaks as a suggested PR to that branch :) Curious on thoughts here, and told @philon-msft I'm happy to hop on a call tomorrow to go through and finish this off if sync helps :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants