I have several RethinkDB clusters (3 instances per cluster) that frequently get stuck in an infinite loop of Raft election timeouts.
Environment: CoreOS 899.15.0, Docker 1.9.1. Each cluster contains from around 60 to more than 100 tables.
The clusters run on a fairly unstable network. Before the loop started, a few network partitions may have occurred, causing heartbeat timeouts. However, while the loop was happening there were no heartbeat timeouts, and the instances remained connected to each other. I had to restart each instance to stop the loop. For some clusters, restarting did not resolve it; in those cases I had to stop all 3 instances and then start them one by one.