Need more intelligent re-resolution of names #12295
For future reference, for items like this that are not bugs, please send email to the grpc.io mailing list instead of filing issues. That's a much better forum for this kind of discussion.

I don't know anything about Node or speak JavaScript, so @murgatroid99 will have to help you with that side of things. But I can help answer your questions about how client-side load balancing is supposed to work. I think a similar question came up a while back; see the discussion in #11406.

Glancing at your code, it looks like you're using the wrong name for the channel arg to select the LB policy. Try changing it to `grpc.lb_policy_name`. If that doesn't fix the problem, please let us know what you're trying to do, what you expected to see, and what you're actually seeing.
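For comparison, here is a minimal sketch of selecting the `round_robin` policy in grpc-go (the thread's client is Node, which instead takes the channel arg named above; the target address below is hypothetical):

```go
package main

import (
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	// "dns:///" forces the DNS resolver, which returns every A record;
	// the service config then spreads RPCs across them round-robin.
	conn, err := grpc.Dial(
		"dns:///my-service.default.svc.cluster.local:50051", // hypothetical target
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithDefaultServiceConfig(`{"loadBalancingConfig": [{"round_robin":{}}]}`),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
}
```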
Hello @markdroth! First off, thank you! Secondly, I tried changing the load balancing policy as you suggested. Here's my setup: I run Kubernetes, with a Service for our gRPC server, and a client that is told to use the internal DNS address for that Service.
OK, so this was a Kubernetes-related issue: I had to turn off Kubernetes load balancing for this specific server's Service entry (so the DNS name resolves to the individual instances). However, the next thing I tested was destroying a server randomly. It appears that the client does not periodically re-evaluate the entries for the DNS name provided; it continued to send traffic to the remaining instance it knew of but never detected the new instance (until I restarted the client). How can I have my clients automatically start using a new server instance that came online?
Currently, we only re-resolve DNS names when all subchannels become disconnected. So if you restart both of the servers, the client will re-resolve and then connect to the new addresses. But if you move just one of the servers, we will just stop being able to use that one.

@dgquintas and I have talked about possible ways to address this problem. It's fairly tricky, because we can't really know when it's useful to re-resolve, and we need to avoid slamming the DNS server with constant queries. For example, if a server is crashing every 10 seconds, we don't want every single client to try to re-resolve the name every 10 seconds. And if this is an environment where servers have static addresses, then there's no point in re-resolving in the first place.

One possible solution would be to make the DNS resolver aware of DNS TTLs, so that we can automatically re-resolve after the previous results expire; this would essentially allow the DNS data to determine how often the clients re-resolve. However, while we could probably do this in the C-core gRPC implementation, it's not clear that we have reasonable ways to access DNS TTL information in Java or Go, which would make our clients' behavior inconsistent.

Another possibility is to provide the ability to configure the threshold for what percentage of subchannels need to become disconnected before we re-resolve. The default would be 100%, which would match the current behavior, but it would allow people to reduce the threshold to something more appropriate for their environment. We might also want to provide a way to set the minimum interval between re-resolutions, just to provide some additional safety against slamming the DNS server.

Anyway, we've had a lot of discussions about this but have not yet decided on any particular behavior or scheduled any work on this. But if this is something you'd like, let us know, and we can start figuring out how to prioritize it.
Okay, I understand. I did some testing, and splitting traffic reliably in a CI/CD environment is hit or miss at the moment. Consider the following: you have 2 instances (iA and iB) of service 1 (s1) running, and 2 instances (iC and iD) of service 2 (s2) running. If s1 has a rolling update and shortly after s2 has a rolling update, the timing can work out such that s1's replacement instances resolve s2's DNS name while only one of s2's replacement instances (iC) is up.
In the scenario above, s1's instances (iA and iB) will only ever see 1/2 of s2's instances (iC). This is bad: none of my traffic is round-robin'ed, and one instance is getting slammed.

After reading the solutions you proposed, it kind of feels like all of those things should at least be available. Let the developer decide whether or not they want to take extra DNS traffic. Let the developer decide if they're using a client (in a language) which can access DNS TTLs, and so on. Just thinking out loud here: I think there are really 2 scenarios we care about:

1. Losing an established subchannel. When a subchannel disconnects (2/2 instances becomes 1/2 instances), we could attempt to re-resolve the DNS entry some number (attempt-based or time-based) of times. This gives us a way to keep trying until we're 2/2 again, OR we were unsuccessful in getting back to 2/2 (so we stay at 1/2).

2. Discovery of new instances. Consider a healthy, happy service that was correctly load balancing across 5/5 instances. What if we had autoscaling enabled and a sudden surge of traffic hit our servers? Now we're running 7 instances, but because we only scaled the servers and never restarted the clients, we have no way to serve the surge of traffic (the clients will never refresh the DNS pool), so we stay at 5/7...

One possible solution is giving us a function to call that can refresh the pool of hosts via DNS resolution (see the sketch after this comment). This would give us a way to decide when we want to trigger it ourselves. For example, imagine that we performed a rolling update on a service that had 100 instances; we could publish an event onto our messaging queue (when 100 of 100 are up) that could tell clients to refresh their DNS hosts. Really, we could write whatever logic we want with this and perform the resolution whenever we saw fit; it gives us full control of something that works for us. Anyways... </2cents>
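A minimal grpc-go sketch of the "function to refresh the pool" idea above, assuming a custom resolver wired up through grpc-go's pluggable resolver API (the `lookup` function and the way it is populated are hypothetical):

```go
package lb

import "google.golang.org/grpc/resolver"

// manualResolver only updates its address list when the application
// explicitly asks it to (e.g. from a message-queue event handler).
type manualResolver struct {
	cc     resolver.ClientConn
	lookup func() []string // returns fresh "host:port" addresses (hypothetical)
}

// Refresh re-runs the lookup and pushes the result into the channel,
// so every subsequently started RPC sees the new address set.
func (r *manualResolver) Refresh() {
	var addrs []resolver.Address
	for _, a := range r.lookup() {
		addrs = append(addrs, resolver.Address{Addr: a})
	}
	r.cc.UpdateState(resolver.State{Addresses: addrs})
}

// ResolveNow and Close satisfy the resolver.Resolver interface.
func (r *manualResolver) ResolveNow(resolver.ResolveNowOptions) { r.Refresh() }
func (r *manualResolver) Close()                                {}
```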
I definitely agree that we should do better in the first scenario you mentioned, and some combination of the ideas we've been discussing could address that.

With regard to the second scenario, I think it's worth noting that DNS is fundamentally unsuited to the kind of dynamic environment you're describing, because DNS is a polling-based mechanism, whereas what you really want is a push-based mechanism where the clients are proactively notified when addresses change. While we might be able to work around this with the DNS TTL solution I mentioned above, I think it will never really scale the way it needs to, because DNS wasn't designed for this kind of usage.

A better approach would be to write a new resolver mechanism that subscribes to notifications from some centralized system as the servers move around. For example, I'm not sure what mechanism Kubernetes uses to update DNS, but you could presumably have it also notify some other name service that would allow clients to subscribe to particular names and would proactively send them updates when Kubernetes reports changes to those names. Then your clients would be getting a constant stream of updates and would always have an up-to-date set of addresses.

Given that, I think that any changes we make here will likely be focused on the first scenario, not the second. But we'll have to talk further to decide exactly how we're going to handle this.
@markdroth DNS may not be the perfect solution, but it is ubiquitous and easy to integrate with. I would prefer to set up a DNS poll every 10–20 seconds for my microservices to at least get going with load balancing my gRPC services. When that produces too much load on the DNS servers, then I will start looking at a lookaside balancer. Right now the cost of getting the simple load balancing we are used to with HTTP/1.1 is very high; the alternatives I can see all demand far more investment.
A DNS-based resolver with a refresh interval would be a very low-cost, low-barrier-to-entry solution that lots of developers would be comfortable with, and it would not require a huge investment in either infrastructure or coding.
For anyone encountering issues and looking for a simple solution: https://github.com/grpc/proposal/pull/23/files. Using server-side connection options can cause load to redistribute in a fairly easy manner! Wish I had seen this document 2 weeks ago.
We've recently done some work to make this somewhat better. The round_robin code now re-resolves whenever any individual backend connection fails, and the DNS resolver enforces a minimum time between lookups to ensure that we don't hammer the DNS server when backend connections are flapping. This doesn't address the discovery case, but it does improve the scenario where only a subset of backends fail.
@hollinwilkins can you describe what changes you've made to your setup (in reference to #12295 (comment)) and confirm that the "discovery of new endpoints" problem went away? I am currently facing the exact issue you were facing (losing an established instance is handled correctly, but new instances are not being discovered) while trying to make a simple round-robin LB scenario work out of the box on Kubernetes. Are there any other possible workarounds (like forcing re-resolution of backends)?
Regarding a workaround based on https://github.com/grpc/proposal/blob/master/A9-server-side-conn-mgt.md: I tried setting MAX_CONNECTION_AGE on the server, so that clients are periodically disconnected and forced to reconnect (and therefore re-resolve).
@jtattermusch This is the approach I took. Not ideal, but it works for now.
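For readers who want to try the same thing, a minimal sketch of that server-side setting in grpc-go (the port and the specific durations are hypothetical; tune them for your environment):

```go
package main

import (
	"log"
	"net"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/keepalive"
)

func main() {
	lis, err := net.Listen("tcp", ":50051") // hypothetical port
	if err != nil {
		log.Fatal(err)
	}
	srv := grpc.NewServer(grpc.KeepaliveParams(keepalive.ServerParameters{
		// Close each connection after this age, forcing clients to
		// reconnect and re-resolve DNS (the A9 behavior discussed above).
		MaxConnectionAge: 30 * time.Second,
		// Give in-flight RPCs this long to finish before the close.
		MaxConnectionAgeGrace: 10 * time.Second,
	}))
	// Register your services on srv here, then serve.
	log.Fatal(srv.Serve(lis))
}
```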
The grpc-go client re-resolves DNS every 30 minutes. Could the C++ client do the same, so we can configure the interval?
After having read this thread and some of the linked issues, I'm still not sure I understand why observing the DNS TTL for refresh would be a bad thing. From what I can tell it would just work, be it scaling up or down, on k8s or outside of it. I think its properties would fit the principle of least surprise: I cannot imagine many people selecting a DNS-based round-robin load balancing approach would be surprised by clients having to poll DNS at the TTL interval and that producing load. However, many will be surprised to learn it won't react to changes in DNS.

Load seems to be the most commonly stated reason why observing the DNS TTL would be bad, but I just don't see it. If my DNS service cannot handle the polling load, I can easily trade off with a higher TTL, scale my DNS, and ultimately, once that no longer makes sense, transition to another more scalable LB approach. It is not like DNS round-robin LB in gRPC allows arbitrary scale to begin with, so why pretend it has to? It is the simple solution for the simple cases, and it should work as well as it can inside of those constraints.

Having to use MaxConnectionAge, which just happens to be coupled to re-resolution, to emulate polling behaviour seems like a bad workaround to me. I don't see how making a DNS query to some DNS cache every X seconds would be seen as problematic while an orders-of-magnitude more expensive reconnect plus (encryption) handshake with each of the backends, plus having to regularly refresh DNS anyway, is an acceptable workaround.

Currently all the alternatives I can see are vastly more complex to run and expensive to implement. Why force users to use a service mesh or some custom lookaside load-balancing scheme when there's a way to make what is already supported just work for a lot of cases?
Personally, I tend to agree that the max-connection-age approach is a fairly ugly solution to the problem of forcing re-resolution. However, I'm not sure that everyone on the team agrees with that. I think the main argument against using TTLs is that we want consistent client behavior across languages, but while we would be able to access the TTL information in C-core, we have no reasonable mechanism for doing so in Java or Go. So it's not really a portable solution. I do think we should consider providing a way for the client to be configured to periodically re-resolve at some fixed interval. I'd like to get feedback from @ejona86, @zhangkun83, and @dfawley on this.
I see. Technically, the DNS TTL could be retrieved in any language by using a custom resolver (e.g. something like miekg/dns in Go, or Netty's DnsNameResolver in Java; the latter would also get rid of the JVM's broken built-in DNS caching behaviour). That's basically what using c-ares in C-core amounts to. Whether that's a "reasonable" thing to do everywhere is of course debatable.

In any case, I would definitely prefer a configurable polling interval for DNS to the current MaxConnectionAge approach. Maybe there could even be an opt-in flag that uses the DNS TTL when supported and falls back to the polling interval otherwise? I'm not sure whether such "extensions" are something that's done across the gRPC clients in different languages, but I would be surprised if they were totally equal now. But as I said: just having the configurable DNS polling interval would be a considerable improvement.
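A minimal sketch of reading the TTL with miekg/dns, as suggested above (the service name and DNS server address are hypothetical):

```go
package main

import (
	"fmt"
	"log"

	"github.com/miekg/dns"
)

func main() {
	// Hypothetical names: a headless k8s Service and the cluster DNS server.
	const name = "my-service.default.svc.cluster.local"
	const server = "10.96.0.10:53"

	m := new(dns.Msg)
	m.SetQuestion(dns.Fqdn(name), dns.TypeA)

	resp, _, err := new(dns.Client).Exchange(m, server)
	if err != nil {
		log.Fatal(err)
	}
	for _, rr := range resp.Answer {
		if a, ok := rr.(*dns.A); ok {
			// Hdr.Ttl is the remaining TTL; a TTL-aware resolver could
			// schedule its next lookup for when the smallest TTL expires.
			fmt.Printf("%s -> %s (TTL %ds)\n", name, a.A, a.Hdr.Ttl)
		}
	}
}
```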
It's not quite that simple. When caching DNS resolvers are in place, a single response from the authoritative DNS server can be served to thousands of clients. All those clients will have the TTL expire at the same time (independent of when they originally queried), so they form a "stampeding herd": every time the TTL expires, the entire herd re-requests DNS at the same time. Increasing the TTL would decrease average load but wouldn't reduce peak load. With a limited number of clients, that can be fine. But the DNS resolver would do this in all cases, including in large-scale pick-first cases like googleapis.com. Using a consistent polling frequency doesn't cause herds, but configuration becomes a problem.
I would call it a "functional but non-optimal solution." We have to have MaxConnectionAge for other reasons, so the question is whether the deficiencies are bad enough to warrant another solution for this specific case. Note that one great property of the current solution is that the configuration is under the service's control, and we'd want to avoid losing that property with a new solution. Note also that I don't really consider the solution a "workaround" or "hack," in that most of the web relies on the behavior of re-issuing DNS when reconnecting. The problem for round-robin is that it can refresh too frequently.
TLS session resumption should reduce the cost of the re-handshake to something fairly low. That said, I've not verified that our clients are using resumption, and I think I saw that Java is not; but that's a clearly defined problem that could be resolved. Yes, reconnecting is more expensive than a DNS query, but the cost is small when amortized over the lifetime of the connection. We're not trying to fully optimize this one use case at any expense, eking out every last CPU cycle in code that runs once every O(minutes). We support many use cases, and we want them to work reasonably well at reasonable cost.

So to me, the discussion shouldn't narrowly focus on whether some alternative is more efficient than what we have now. Instead, it should focus on the problems caused by the existing approach. This issue was started in the days when C-core had very poor re-resolution behavior (which changed sometime around March, based on markdroth's comment). The problem then was "older clients virtually never connect to new servers," such that load was woefully underdistributed in normal use cases. That has been resolved.
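A common way to avoid the stampeding herd described above when polling at a fixed frequency is to jitter the interval per client; a minimal sketch of that technique (not something the thread itself proposes):

```go
package lb

import (
	"math/rand"
	"time"
)

// jittered perturbs a base polling interval by ±20% so that clients
// configured with the same frequency don't re-query DNS in lockstep.
func jittered(base time.Duration) time.Duration {
	factor := 0.8 + 0.4*rand.Float64()
	return time.Duration(float64(base) * factor)
}
```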
For anyone else getting stuck trying to implement load balancing the non-optimal (DNS) way: I created this example project originally to help me debug scaling and round-robin load balancing.
I'm still not sure why the round-robin load balancing fails to send requests sequentially to each replica; sometimes it will hit the same replica twice before sending a request to another replica.
The workaround suggested in the issue is to set MAX_CONNECTION_AGE on the server, as described above.
There is no way to transparently move a stream from one server to another; you need to start a new stream for that. In general, clients that use long-running streams need to be prepared to restart their stream any time it fails anyway, so you just need to arrange for the streams to be periodically terminated, and that will cause the streams to move to whatever new connections have been created by the clients.

One way to do this is to set MAX_CONNECTION_AGE and MAX_CONNECTION_AGE_GRACE on the server, as discussed earlier in this thread.

Another option is to handle this at the application level by having your service implementation terminate streams after a certain amount of time. That approach would need to be done individually for each service, so it may be more work if you have a large number of services, but it does allow the service implementation more control over shutting down the stream (e.g., it can send whatever status code or trailing metadata it wants).
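A minimal sketch of that application-level option in Go, assuming a hypothetical generated server-streaming method `Watch` on a `pb.Monitor` service and a hypothetical events channel on the server:

```go
package main

import (
	"time"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"

	pb "example.com/monitor/pb" // hypothetical generated package
)

const maxStreamAge = 10 * time.Minute // hypothetical rotation period

type server struct {
	pb.UnimplementedMonitorServer
	events chan *pb.Event // hypothetical event source
}

func (s *server) Watch(req *pb.WatchRequest, stream pb.Monitor_WatchServer) error {
	rotate := time.After(maxStreamAge)
	for {
		select {
		case <-rotate:
			// End the stream with a distinctive status so clients can
			// treat this as routine rotation, reconnect, and pick up
			// any newly resolved backends.
			return status.Error(codes.Unavailable, "stream rotated; please reconnect")
		case ev := <-s.events:
			if err := stream.Send(ev); err != nil {
				return err
			}
		case <-stream.Context().Done():
			return stream.Context().Err()
		}
	}
}
```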
@markdroth thanks for the response. We're more concerned about whether new streams we initiate from a client pick up the added servers; we'd like to avoid disrupting the existing streams from the client. If our client has streams 1, 2, and 3 connected to server A, then when server B comes up, we'd like the new stream 4 to consider both servers A and B without disrupting streams 1, 2, and 3 on server A.
That will work properly with the approach I've described. Once the client sees a GOAWAY on an existing connection, it will stop using that connection for new streams, even if the connection remains open for a long time for the streams that had already been started on it before the GOAWAY was received. The GOAWAY triggers the client to stop using that connection and to do a DNS re-resolution, at which point it picks up the new addresses, and all of those addresses will be considered for subsequently started streams.
@markdroth if we were to set MAX_CONNECTION_AGE while clients keep long-lived streams open, could the old connections accumulate?
Yes, that could lead to accumulating connections. If that's a concern, it can be ameliorated by either setting MAX_CONNECTION_AGE_GRACE, so that streams outliving the grace period are forcibly terminated, or using the application-level approach described above.
The Go client has this resolver thing: you can import the package and Register a resolver builder or whatever. The client automatically picks it up, so I guess it's a global. My coworker hacked something together to watch Endpoints in the Kubernetes API, and we have perfect load balancing when pods go in and out of service. I got here trying to figure out how to do it with the Node.js client; I guess it's not possible. Although, this repo isn't specific to the Node.js client, and at least one other language client already does this. I'm confused.
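A minimal sketch of that grpc-go mechanism, using a hypothetical "k8s" scheme with hard-coded addresses standing in for a real Kubernetes Endpoints watch (recent grpc-go API assumed):

```go
package k8sresolver

import "google.golang.org/grpc/resolver"

type k8sBuilder struct{}

func (k8sBuilder) Scheme() string { return "k8s" } // hypothetical scheme

func (k8sBuilder) Build(target resolver.Target, cc resolver.ClientConn, opts resolver.BuildOptions) (resolver.Resolver, error) {
	r := &k8sResolver{cc: cc}
	// A real implementation would start a watch on the Kubernetes
	// Endpoints API here and call update() on every change event.
	r.update([]string{"10.0.0.1:50051", "10.0.0.2:50051"}) // hypothetical pod IPs
	return r, nil
}

type k8sResolver struct{ cc resolver.ClientConn }

func (r *k8sResolver) update(addrs []string) {
	state := resolver.State{}
	for _, a := range addrs {
		state.Addresses = append(state.Addresses, resolver.Address{Addr: a})
	}
	r.cc.UpdateState(state)
}

func (*k8sResolver) ResolveNow(resolver.ResolveNowOptions) {}
func (*k8sResolver) Close()                                {}

func init() { resolver.Register(k8sBuilder{}) }

// Clients then dial with grpc.Dial("k8s:///my-service", ...) and
// receive fresh address sets as the watch pushes them.
```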
I'm not reporting an issue. I was trying to help by pointing out that a different client already has a solution to the OP's issue. The issues you linked me to seem to be other clients that, along with the Node.js client, also lack features developed for the Go client.
@mqsoh, oh, okay. Yes, many custom resolvers wouldn't have the problem discussed here, as many naming systems have watch APIs. When one is available in your language and it makes sense for your deployment, such a custom/pluggable name resolver is the preferred solution. And there is (slow) progress on allowing custom name resolvers in more languages; I don't think there is much argument there. I think this discussion is more focused on cases where you don't have a watch API (i.e., DNS), or where a watch API might be available but you aren't in direct control of the clients, so practically you need to use DNS.
From what I understand this still isn't a silver bullet, because the pluggable resolver is only called when a channel (or a subchannel now?) fails. Therefore, if your target service is only scaling up, your client will not distribute load to the new pods.
A project maintainer already marked my comment as irrelevant two and a half years ago. |
I know some people use the keepalive policy to force rediscovery. Since I don't want to do it on the client side, is it safe to just trigger d.watcher (or dnsResolver.ResolveNow) with a timer, every 30s for example? I mean, if I fork dnsResolver and register it manually.
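Something close to that, without forking the DNS resolver itself, can be sketched in grpc-go by wrapping an existing builder and firing ResolveNow on a ticker. Whether `resolver.Get("dns")` returns the default DNS builder depends on your grpc-go version, so treat that part as an assumption; note also that the built-in DNS resolver rate-limits how often re-resolution actually happens:

```go
package lb

import (
	"time"

	"google.golang.org/grpc/resolver"
)

// pollingBuilder wraps an existing resolver.Builder (e.g. the DNS one)
// and forces a ResolveNow on a fixed interval.
type pollingBuilder struct {
	resolver.Builder
	interval time.Duration
}

func (b pollingBuilder) Build(target resolver.Target, cc resolver.ClientConn, opts resolver.BuildOptions) (resolver.Resolver, error) {
	r, err := b.Builder.Build(target, cc, opts)
	if err != nil {
		return nil, err
	}
	stop := make(chan struct{})
	go func() {
		t := time.NewTicker(b.interval)
		defer t.Stop()
		for {
			select {
			case <-t.C:
				r.ResolveNow(resolver.ResolveNowOptions{})
			case <-stop:
				return
			}
		}
	}()
	return &pollingResolver{Resolver: r, stop: stop}, nil
}

type pollingResolver struct {
	resolver.Resolver
	stop chan struct{}
}

func (p *pollingResolver) Close() {
	close(p.stop)
	p.Resolver.Close()
}

// Registration, assuming resolver.Get("dns") returns the DNS builder:
//   resolver.Register(pollingBuilder{Builder: resolver.Get("dns"), interval: 30 * time.Second})
```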
For starters, let me state that this is not a bug. I'm trying to reach out for help because I can't piece together enough information to figure out how to do client-side load balancing properly in the Node.js world of gRPC.
Here is my example code base: https://gist.github.com/carldanley/39d5a0d7f9b1ea865af94481da1e0cac. I deploy that to a Kubernetes environment and use a load balancer to attempt to split the traffic (with no luck)... What am I doing wrong?