Start to flesh out Polly design #58

rynowak · 2018-02-14T01:39:28Z

The makings of some integration for using HttpClientFactory and Polly together.

Right now my focus is on nailing down the basics of the public-facing API:

What are the most common scenarios?
How does a beginner figure out what to do?
Does an expert have enough control?

My goals is to answer those questions (with feedback) along with the the following non-goals:

Don't lock users out of powerful but complicated things
Don't try to address every possible scenario and combination
Don't hide or abstract away Polly's APIs

rynowak · 2018-02-14T01:41:02Z

Pinging everyone who's commented on the issue thread, I welcome your thoughts and feedback. @stevejgordon @theezak @reisenberger and @joelhulen as well as @glennc @javiercn and @anurse for good measure

rynowak · 2018-02-14T01:42:32Z

src/Microsoft.Extensions.Http.Polly/PolicyHttpMessageHandler.cs

+
+namespace Microsoft.Extensions.Http
+{
+    public class PolicyHttpMessageHandler : DelegatingHandler


The key building block here is a message handler that plugs in a policy to the pipeline. Ideally this is as generic as possible.

Not seen here: we'll want logging as well as that's one of the best ways that we add value by providing this. I'd like to defer discussions about that until after the initial API for configuring clients is designed.

👍 to this highly general PolicyHttpMessageHandler

@reisenberger - do you think that we'll end up having a generic and non-generic policy handler? That seems logical to me based on the desire to provide both:

public static IHttpClientBuilder AddPolicyHandler(this IHttpClientBuilder builder, IAsyncPolicy policy) public static IHttpClientBuilder AddPolicyHandler(this IHttpClientBuilder builder, IAsyncPolicy<HttpResponseMessage> policy)

Do you agree with this direction?

Yes.

While only pure IAsyncPolicy<HttpResponseMessage> policies would be needed to govern SendAsync(...), common Polly syntax (for example Policy.TimeoutAsync(10)) results more naturally in a non-generic IAsyncPolicy, so not supporting IAsyncPolicy might frustrate users.

Generic policies are marginally more performant but any difference is in the order of microseconds or less.

rynowak · 2018-02-14T01:51:08Z

samples/HttpClientFactorySample/Program.cs

+            .AddServerErrorPolicyHandler(p => p.RetryAsync())
+
+            // Build a policy that will handle exceptions, 400s, and 500s from the remote server
+            .AddBadRequestPolicyHandler(p => p.RetryAsync())


This is probably the best place to start the discussion.

I understand the requirements for building a policy to be:

what kinds of things do I consider an error?

what do I want to do when I find one?

I think there's some value in us providing guidance about the first question (what should I consider an error), and I think the second question is easy enough for users to answer (p => p.RetryAsync()).

I've defined a few flavors of extension methods here. I think AddServerErrorPolicyHandler is probably going to be the most useful balance between "sane default" and hiding complexity. Most folks will always want to treat a connection failure or a 500 as an error.

If you use AddPolicyHandler to build AddServerErrorPolicyHandler it will end up looking like:

.AddPolicyHandler( Policy .Handle<HttpRequestException>() .OrResult<HttpResponseMessage>(r => r.StatusCode >= HttpStatusCode.InternalServerError) .RetryAsync());

I think it's a good enough for us to find an attractive name for this and include it 👍

A few questions for discussion:

Handlers may be long-lived or short-lived. Is it important we reuse policies? It is important that we don't?

There's a little bit of tension around whether or not things should be IAsyncPolicy or IAsyncPolicy<T> in some of the simple cases, so I planned to support both. Is this a good plan?

I jump in even if not pinged. Policies' lifecycle should be independent from the handler's one because there are policies like the short-circuit one that are designed to be "singleton-ish".

Ideally policies should be attached to freshly generated handler's but it feels like it leads to a pool manager for the policies to be used by a pool manager for the handlers...

@Kralizek thanks!

The factory already provides some pooling and lifecycle management for handlers (docs in progress).

It's easy to make a policy a singleton.

It's also easy to make a policy have the same lifetime as a handler.

It would be a bit tricky to do something different, and I'd like to avoid it if possible

Policies' lifecycle should be independent from the handler's one because there are policies like the short-circuit one that are designed to be "singleton-ish".

This makes intuitive sense to me.

Are there real use cases for making policies that have the same lifetime as the handler? Should they all be singleton?

Responding to @rynowak and @Kralizek on policy lifecycle:

TL;DR (I have assumed...) Policies should be single-instance for each policy configured for use with a given named HttpClient configuration.

For example, if a Polly policy for a given named HttpClient is config'd by supplying a createPolicy() factory Func, that createPolicy() func should be used to create one instance of that policy, and that single policy instance supplied to all subsequently created handlers/HttpClient instances of that named configuration.

Explanation

Two policy types maintain state across calls for their designed function: circuit-breakers (tracking call faults/successes) and bulkheads (limiting parallelism).

If HttpClientFactory were to create fresh instances when recycling handlers, that state would suffer either discontinuity (if new instances replaced old instances 'instantaneously') or fragmentation/multiplicity (if new and old instances co-existed).

eg Bulkhead: Generating fresh BulkheadPolicy instances for new handlers could cause executions to exceed the parallelism configured by the developer, as double/multiple capacity would be available while both newly-spun-up and existing / being-retired instances co-existed.

No other policies maintain state, but all Polly policies are entirely thread-safe and memory-efficient for long lifetime, so there's no real case I can see for shorter lifetimes; it'd just be more allocations and GCs.

Does this sound right to you?

rynowak · 2018-02-14T01:52:41Z

src/Microsoft.Extensions.Http.Polly/Microsoft.Extensions.Http.Polly.csproj

+
+  <PropertyGroup>
+    <Description>
+      The HttpClient factory is a pattern for configuring and retrieving named HttpClients in a composable way. This package provides integration for IHttpClientFactory using the Polly library for exception handling and fault-tolerance.


@reisenberger @joelhulen - feel free to provide feedback about how we're describing your project 😆

Not a bad start :)

I'd perhaps word the second sentence as such:

This package integrates IHttpClientFactory with the Polly library, to add transient-fault-handling and resiliency through fluent policies such as Retry, Circuit Breaker, Timeout, Bulkhead Isolation, and Fallback.

@reisenberger, do you agree?

martincostello · 2018-02-14T08:55:24Z

src/Microsoft.Extensions.Http.Polly/PolicyHttpMessageHandler.cs

+                throw new ArgumentNullException(nameof(request));
+            }
+
+            return _policy.ExecuteAsync(() => base.SendAsync(request, cancellationToken));


Polly supports handing the cancellation token to the policy itself as well so it can be used by its internals (at least that's my understanding), which is passed through into the lambda doing the actual work. Should the cancellation token be used that way here?

Agree, eg:

return _policy.ExecuteAsync(token => base.SendAsync(request, token), cancellationToken);

An example benefit: if the user has configured a wait-and-retry policy, a cancellationToken passed through the policy in this manner also governs the wait phase of wait-and-retry, so that signalling the cancellationToken will cause the policy to abandon the wait immediately.

Yep, this is just something I missed. Will fix in the next iteration.

analogrelay

Looks pretty good to me. Happy to let the Polly folks take the lead on what API makes sense here.

reisenberger · 2018-02-15T00:17:11Z

src/Microsoft.Extensions.Http.Polly/DependencyInjection/PollyHttpClientBuilderExtensions.cs

+            //builder.AddHttpMessageHandler(() => new PolicyHttpMessageHandler(createPolicy()));
+            return builder;
+        }
+


Polly policies divide into reactive policies (Retry; CircuitBreaker; Fallback), and proactive and preemptive policies (Cache; Bulkhead; Timeout). The reactive policies are the ones which flow from PolicyBuilder, so the choice of configuration syntax here with variants around Func<PolicyBuilder..., IAsyncPolicy...> focuses on (only) the Retry, CircuitBreaker and Fallback policies.

I'm drawing this out in case it wasn't intended, but it seems reasonable, as Retry and CircuitBreaker seem likely majority use cases for Polly with HttpClientFactory. And for those who want to go further, the more advanced overloads (eg AddPolicyHandler(this IHttpClientBuilder builder, IAsyncPolicy policy) line 14) do provide.

I don't know if this was deliberate, but I agree with your judgement that Retry and CircuitBreaker are likely to be the most commonly used.

Thanks for the feedback. This will be good info for writing the docs.

reisenberger · 2018-02-15T00:17:39Z

src/Microsoft.Extensions.Http.Polly/DependencyInjection/PollyHttpClientBuilderExtensions.cs

+            return builder;
+        }
+
+        public static IHttpClientBuilder AddExceptionPolicyHandler(this IHttpClientBuilder builder, Func<PolicyBuilder, IAsyncPolicy<HttpResponseMessage>> createPolicy)


Point of detail: this overload is not needed. Polly fluent syntax happens to dictate that PolicyBuilder must already be generic PolicyBuilder<TResult> if the eventual policy is to be generic in TResult. (no configuration overloads flow from PolicyBuilder to IAsyncPolicy<TResult>.)

Great! will remove this in the next iteration.

reisenberger · 2018-02-15T00:34:43Z

src/Microsoft.Extensions.Http.Polly/DependencyInjection/PollyHttpClientBuilderExtensions.cs

+            //builder.AddHttpMessageHandler(() => new PolicyHttpMessageHandler(createPolicy()));
+            return builder;
+        }
+    }


Is there as clear motivation for an AddBadRequestPolicyHandler() covering the 4xx codes generally, as for the other overloads? Where Http status codes in the 4xx range indicate client errors, there may be no point in retrying these (eg if a request is badly formed 400, resubmitting it is usually not going to make it succeed). Likewise, one typically might not want to take client-errors as cause to break a circuit-breaker.

Some specific exceptions:

408 (afaiu) the server signalling giving up waiting for the client's request to arrive complete; it could reflect wider network connectivity which merits a retry.

429 RetryAfter: Polly has retry overloads which can handle this case (extract how long to wait before the next try, from the 429 response), but it remains a fairly specific configuration.

Is there as clear motivation for an AddBadRequestPolicyHandler() covering the 4xx codes generally, as for the other overloads?

Not really. That was just a straw man for us to beat up.

I still want to get more feedback on whether we should have both AddExceptionPolicyHandler and AddServerErrorPolicyHandler.. When might you want to apply a policy to a connection failure but not a 500? Still needs more thought IMO.

[should we] have both AddExceptionPolicyHandler and AddServerErrorPolicyHandler?
When might you want to apply a policy to a connection failure but not a 500?

If the questions asked by users to the Polly team are a relevant measure, then handling them differently seems at least uncommon. We received a lot of requests for policies to handle result codes as well as exceptions, but since adding that, no requests for help how to handle them differently. My feeling (fwiw) is the majority use case defines a common strategy for all kinds of transient fault at an endpoint.

If AddExceptionPolicyHandler and AddServerErrorPolicyHandler were merged, the power overloads you are proposing would still provide, for those who want more nuanced strategies.

Still interested what others may think.

The two enquiries the Polly team handles periodically (I would not say frequently) on behaviour for specific result codes are:

401 Unauthorized: Can I make a policy to re-authenticate, then retry, on 401 Unauthorised?

429 RetryAfter: How do I make a wait-and-retry policy wait for a duration specified in an x-ms-retry-after-ms return header?

We also in the Polly doco recommend retrying on 408s, similar to advice from the Azure team.

I think it's worth considering if we want to follow up on a convenience method for 'retryable' faults. Adding it to my todo list.

Regarding AuthZ, we plan to do that in the future (not this release). I'm not sure that plan is related to Polly, but we're interested in your thoughts as well.

I think it's worth considering if we want to follow up on a convenience method for 'retryable' faults.

Happy to hear more what is the thinking here. Is this around helpers for commoner cases of how one might handle (common retry strategies such as exponential backoff), or something else?

Happy to hear more what is the thinking here. Is this around helpers for commoner cases of how one might handle (common retry strategies such as exponential backoff), or something else?

I think we'd want to provide a more opinionated approach to what we consider a 'retry-able' request. The starting point would be the info that you've already shared with us from your docs/guidance.

I think the polly API for creating different retry policies is already sufficient.

If you have more ideas about this, feel free to open an issue for more focused discussion. Your feedback has been really valuable.

stevejgordon · 2018-02-15T08:46:02Z

I like where this is headed 👍

aristovg · 2018-02-15T12:21:33Z

Sorry for jumping in, but have you considered the HttpClient timeout when doing retries within delegating handler? Consider the following scenario:

HttpClient timeout is set to 10 seconds
Retry policy has 5 attempts
Each attempts takes 5 seconds (either it is set by retry policy or server is responding at that rate).
This will result in only 2 retries being made instead of 5.

Any thoughts on that?

stevejgordon · 2018-02-16T09:25:00Z

I was playing with this briefly last night and thinking about the scenario we'd likely have were we like to register and reuse policies. This is what I ended up with...

NOTE: It's a bit rushed and I didn't have time to / couldn't actually get the sample running so this may very well not work in practice!

Add an extension:

public static IHttpClientBuilder AddPolicyHandler<T>(this IHttpClientBuilder builder, string namedPolicy) where T : IPolicyRegistry<string>
{
	IReadOnlyPolicyRegistry<string> registry;

	using (var sp = builder.Services.BuildServiceProvider())
	{
		registry = sp.GetService<T>();
	}
	
	if (registry != null)
	{
		if (registry.TryGet<IAsyncPolicy<HttpResponseMessage>>(namedPolicy, out var policy))
		{
			builder.AddHttpMessageHandler(() => new PolicyHttpMessageHandler(policy));
		}
	}

	return builder;
}

Basic consumption...

public static void Configure(IServiceCollection services)
{
	services.AddSingleton<PolicyRegistry>(s =>
	{
		var registry = new PolicyRegistry();

		var policy = Policy.Handle<HttpRequestException>().RetryAsync();

		registry.Add("mypolicy", policy);

		return registry;
	});

	services.AddHttpClient("github", c =>
	{
		c.BaseAddress = new Uri("https://api.github.com/");

		c.DefaultRequestHeaders.Add("Accept", "application/vnd.github.v3+json"); // Github API versioning
		c.DefaultRequestHeaders.Add("User-Agent",
			"HttpClientFactory-Sample"); // Github requires a user-agent
	})

	.AddPolicyHandler<PolicyRegistry>("mypolicy");
}

I'm not sure if this is a) a good approach b) useful as part of the main library but interested in thoughts on this style of usage!

rynowak · 2018-02-16T16:02:43Z

Thanks for all of the comments, I'm super excited to see so much interest. I'm going to go through and respond to everyone's feedback, and I'll create a more complete version of this PR over the weekend.

rynowak · 2018-02-16T16:22:25Z

Responding to a comment from @aristovg

Sorry for jumping in, but have you considered the HttpClient timeout when doing retries within delegating handler? Consider the following scenario:

HttpClient timeout is set to 10 seconds
Retry policy has 5 attempts
Each attempts takes 5 seconds (either it is set by retry policy or server is responding at that rate).
This will result in only 2 retries being made instead of 5.
Any thoughts on that?

For reference the default timeout is 100 seconds.

I guess I'd say that with a scheme like this you can configure the overall timeout of the system (including retries) using the timeout on HttpClient. You can configure the timeout on the actual HTTP requests by using another handler (after the policy) with a stricter timeout enforced by a cancellation token.

Polly has policies for timeout as well, so I'd assume that once I address this feedback that would 'just work'.

It's a good point and I'm glad that you brought it up. We will need to document this.

rynowak · 2018-02-16T16:27:09Z

Responding to a comment from @stevejgordon

I was playing with this briefly last night and thinking about the scenario we'd likely have were we like to register and reuse policies. This is what I ended up with...

Yeah, I like the direction of what you came up with. This seems promising. I'm tracking this for follow up here: #8

rynowak · 2018-02-16T18:04:07Z

/cc @sebastienros

reisenberger · 2018-02-16T18:26:05Z

Responding to @aristovg and @rynowak re timeouts:

Sorry for jumping in, but have you considered the HttpClient timeout when doing retries within
delegating handler? Consider the following scenario:

HttpClient timeout is set to 10 seconds
Retry policy has 5 attempts
Each attempts takes 5 seconds (either it is set by retry policy or server is responding at that rate).
This will result in only 2 retries being made instead of 5.
Any thoughts on that?

[...] You can configure the timeout on the actual HTTP requests by using another handler (after the
policy) with a stricter timeout enforced by a cancellation token.
Polly has policies for timeout as well [...] so I'd assume [...] that would 'just work'.

Exactly so. Polly's TimeoutPolicy functions by this same mechanism, so one could:

.AddServerErrorPolicyHandler(p => p.RetryAsync(/* detailed  configuration */))
.AddPolicyHandler(Policy.TimeoutAsync(10)) // impose 10-second timeout on each individual retry

Additionally, rather than only letting the TaskCanceledException or OperationCanceledException propagate, Polly's TimeoutPolicy throws a wrapping Polly.Timeout.TimeoutRejectedException, if specifically distinguishing timeout to be the cause. This helps any wrapping RetryPolicy distinguish timeout cancellation from other cancellation: for instance, one commonly wants to retry on timeout but bail out immediately (not retry) on user/caller cancellation.

Great to have drawn this point out and agree that the role of each timeout merits documentation.

rynowak · 2018-02-21T04:30:42Z

Updated this. I think I've addressed all of the important feedback that isn't tracked somewhere else.

I want to come to an understanding still on https://github.com/aspnet/HttpClientFactory/pull/58/files#r169533010

rynowak · 2018-02-21T04:32:03Z

src/Microsoft.Extensions.Http.Polly/DependencyInjection/PollyHttpClientBuilderExtensions.cs

+            }
+
+            // Important - cache policy instances so that they are singletons per handler.
+            var innerPolicy = policy.WrapAsync(Policy.NoOpAsync<HttpResponseMessage>());


@reisenberger - what do you think of this approach instead of creating two separate code paths?

Having two separate code paths just seems a bit like clutter - and then there's naming to think about 😲..

var innerPolicy = policy.WrapAsync(Policy.NoOpAsync<HttpResponseMessage>());

I like how this simplifies the separate codepaths; that concerned me too.

rynowak · 2018-02-21T23:51:12Z

It sounds like we've pretty much converged at this point, and there are lots of follow up items 👍

I'm going to go ahead and merge this once I do all the other fancy build stuff I need to do in our private repos.

Adds an Microsoft.Extensions.Http.Polly project for convenience functionality to integrate Polly with the IHttpClientFactory. This is just basic building blocks now, and more is coming.

rynowak commented Feb 14, 2018

View reviewed changes

martincostello reviewed Feb 14, 2018

View reviewed changes

analogrelay reviewed Feb 14, 2018

View reviewed changes

reisenberger reviewed Feb 15, 2018

View reviewed changes

rynowak mentioned this pull request Feb 16, 2018

Add Polly #8

Closed

6 tasks

reisenberger mentioned this pull request Feb 17, 2018

Scope of Polly polices in relation to named HttpClient configurations #60

Closed

rynowak commented Feb 21, 2018

View reviewed changes

rynowak added 2 commits February 21, 2018 15:54

Add Polly.Signed dependency

5f01bf1

Add basic Polly integration

8f492c1

Adds an Microsoft.Extensions.Http.Polly project for convenience functionality to integrate Polly with the IHttpClientFactory. This is just basic building blocks now, and more is coming.

rynowak force-pushed the rynowak/polly branch from f88d8cd to 8f492c1 Compare February 21, 2018 23:56

rynowak merged commit 157cb89 into dev Feb 22, 2018

rynowak deleted the rynowak/polly branch February 22, 2018 01:12

reisenberger mentioned this pull request Feb 26, 2018

Add support for policy registry #66

Merged

Start to flesh out Polly design #58

Start to flesh out Polly design #58

Uh oh!

Conversation

rynowak commented Feb 14, 2018

Uh oh!

rynowak commented Feb 14, 2018

Uh oh!

rynowak Feb 14, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rynowak Feb 14, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

reisenberger Feb 17, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Explanation

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

analogrelay left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

reisenberger Feb 21, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

stevejgordon commented Feb 15, 2018

Uh oh!

aristovg commented Feb 15, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

stevejgordon commented Feb 16, 2018

Uh oh!

rynowak commented Feb 16, 2018

Uh oh!

rynowak commented Feb 16, 2018

Uh oh!

rynowak Feb 14, 2018 •

edited

Loading

rynowak Feb 14, 2018 •

edited

Loading

reisenberger Feb 17, 2018 •

edited

Loading

analogrelay left a comment •

edited

Loading

reisenberger Feb 21, 2018 •

edited

Loading

aristovg commented Feb 15, 2018 •

edited

Loading

reisenberger commented Feb 16, 2018 •

edited

Loading

rynowak commented Feb 21, 2018 •

edited

Loading