As discussed in #11578, the round_robin load-balancer should periodically attempt to reconnect to failed subchannels.
As it stands, if a service lives on n remote servers, all of which eventually undergo downtime/maintenance, then a (long-lived) client gradually loses connectivity to one server after another until only one is left. Only when that last one goes down does the client reconnect to all of them. Not especially great for load-balancing.
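To make the failure mode concrete, here is a minimal toy simulation of the behaviour described above. It is only a sketch and not gRPC's actual implementation: `ToyRoundRobin`, the `retry_failed` flag, `rolling_maintenance` and `simulate` are hypothetical names invented for illustration, and reconnecting is modelled as instantaneous.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRoundRobin:
    """Toy model of a round-robin policy; NOT gRPC's real implementation."""
    servers: list
    retry_failed: bool  # hypothetical flag: retry failed subchannels on every tick
    ready: set = field(default_factory=set)

    def __post_init__(self):
        # Assume every subchannel is connected when the client starts.
        self.ready = set(self.servers)

    def tick(self, up):
        """Advance one step; `up` is the set of servers currently reachable."""
        # Subchannels to servers that went down are lost.
        self.ready &= up
        if self.retry_failed:
            # Requested behaviour: keep retrying every failed subchannel.
            self.ready |= up
        elif not self.ready:
            # Behaviour described in this issue: reconnect only once
            # no ready subchannel is left.
            self.ready = set(up)
        return len(self.ready)


def rolling_maintenance(servers):
    """Yield the reachable servers while each server restarts in turn."""
    for s in servers:
        yield set(servers) - {s}
    yield set(servers)  # maintenance finished, everything is back up


def simulate(retry_failed, servers=("a", "b", "c", "d")):
    """Return the number of ready subchannels after each step."""
    lb = ToyRoundRobin(list(servers), retry_failed)
    return [lb.tick(up) for up in rolling_maintenance(servers)]


if __name__ == "__main__":
    print(simulate(retry_failed=False))  # [3, 2, 1, 3, 3]
    print(simulate(retry_failed=True))   # [3, 3, 3, 3, 4]
```

In this model, without retries the client drops to a single ready subchannel during a rolling restart of four servers and only recovers once that one fails too. With periodic retries it stays at three ready subchannels throughout and returns to four at the end.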
I'm creating this issue for tracking, as advised by @dgquintas. Hopefully this can get fixed for 1.5.