Persistent nukeeper snapshot storage #21425
Shutdown bug :(
It seems that after the query: OOM kills the server.
Hmm, it looks like a bug, but I don't understand where it is and cannot reproduce it. For some reason, when the follower node is down, the leader tries to send a 0-level (fake) snapshot with each
I've added sleeps and, indeed, kazoo just ignores heartbeats:
Wow, it seems that our ZooKeeper client sends redundant heartbeats. Each operation in ZK is considered a heartbeat, so we don't need to send them on a timeout if there are other operations (https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/ZooKeeperServer.java#L1150). Need to change the NuKeeper server logic!
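To illustrate the idea of treating every operation as a heartbeat, here is a minimal sketch in Python. It is not the NuKeeper or ZooKeeper implementation: `SessionTracker`, `on_request`, and `expired_sessions` are hypothetical names, and times are passed in explicitly as milliseconds to keep the example deterministic.

```python
class SessionTracker:
    """Hypothetical session tracker: any request refreshes the session deadline."""

    def __init__(self, session_timeout_ms):
        self.session_timeout_ms = session_timeout_ms
        self.deadlines = {}  # session_id -> expiry time in ms

    def on_request(self, session_id, now_ms):
        # Called for every operation (read, write, or explicit ping),
        # so a busy client never needs to send a separate heartbeat.
        self.deadlines[session_id] = now_ms + self.session_timeout_ms

    def expired_sessions(self, now_ms):
        # Sessions whose deadline has passed without any request.
        return sorted(sid for sid, deadline in self.deadlines.items()
                      if deadline <= now_ms)


tracker = SessionTracker(session_timeout_ms=1000)
tracker.on_request(1, now_ms=0)
tracker.on_request(2, now_ms=0)
tracker.on_request(1, now_ms=900)      # an ordinary operation, not a ping
print(tracker.expired_sessions(1000))  # session 2 is expired, session 1 is not
```

With this approach the server only needs explicit heartbeats from clients that are otherwise idle.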
Stress test: this is a separate problem with hanging queries, but after that we cannot start the server because of:
This replica was created 5 minutes before the shutdown.
No idea how this is possible. I'll upload the coordination dump after the stress test.
I have a bug with snapshots: a child node exists without its parent.
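A check for this kind of inconsistency could look roughly like the sketch below. It assumes the snapshot can be listed as ZooKeeper-style absolute paths; `find_orphans` is a hypothetical helper and does not reflect the actual NuKeeper snapshot format.

```python
def find_orphans(paths):
    """Return paths whose parent path is missing from the given collection."""
    known = set(paths)
    orphans = []
    for path in sorted(known):
        if path == "/":
            continue  # the root has no parent
        # "/a/b" -> "/a"; "/a" -> "" which stands for the root "/"
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in known:
            orphans.append(path)
    return orphans


print(find_orphans(["/", "/a", "/a/b", "/c/d"]))  # "/c/d" has no parent "/c"
```

Running such a check right after loading a snapshot would surface the problem before the storage is used.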
OK, I'll make one more CI run just to be sure.
No related failures. Let's merge and see what happens. And that is all...
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
Changelog category (leave one):