-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Recovery Optimization #1713
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Recovery Optimization #1713
Conversation
…e engine), if healthy, delete data files and join as new
alexmiller-apple
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You appear to have used spaces for indentation, which should be converted to tabs.
I think you can also just run git clang-format to fix this as well.
|
@fdb-tools, ok to test 20190617-132559-mengranwo-49c193ec9d88b498 |
fdbserver/storageserver.actor.cpp
Outdated
| return false; | ||
| } | ||
|
|
||
| ACTOR Future<bool> memoryStoreRecover(IKeyValueStore* store, Reference<ClusterConnectionFile> connFile, UID id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest to return Future<Void> here as this actor either returns true or doesn't return at all...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason why i had the return value as Future<bool> is that i wanted to check the storage engine type at the beginning, if the type != memory, i would return immediately and do nothing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if(store->getType() != KeyValueStoreType::MEMORY || connFile.getPtr() == nullptr) {
return false;
}
this part, to be more specifically
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could also return Never() in this case... It would simplify your code quite a bit, but I am not going to insist
fdbserver/storageserver.actor.cpp
Outdated
| wait(self.storage.commit()); //after a rollback there might be uncommitted changes. | ||
|
|
||
| state Future<bool> dispose = memoryStoreRecover (persistentData, connFile, self.thisServerID); | ||
| wait(self.storage.init()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentation seems to be wrong here.
fdbserver/storageserver.actor.cpp
Outdated
| //after a rollback there might be uncommitted changes. | ||
| //for memory storage engine type, wait until recovery is done before commit | ||
| state Future<Void> committed = self.storage.commit(); | ||
| wait(success(dispose) || success(committed)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if dispose would return a Future<Void> you could instead use a choose-when construct here (and you can get rid of the two state variables):
choose {
when (wait(self.storage.commit())) {}
when (wait(memoryStoreRecover (persistentData, connFile, self.thisServerID))) {
TraceEvent("DisposeStorageServer", self.thisServerID);
throw worker_removed();
}
}
If you do this you also don't need to explicitly cancel the dispose Future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i see. However, one of my concern is, if type != memory, the method memoryStoreRecover() would immediately return Never(), but in this case, we don't want to throw worker_removed(); right?
What is the difference between Never() and Void() then ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never mind, figure it out.
push the changes into branch, thank you!
|
Correctness approves. |
fdbserver/storageserver.actor.cpp
Outdated
| //for memory storage engine type, wait until recovery is done before commit | ||
| wait(self.storage.commit()) {} | ||
|
|
||
| wait(memoryStoreRecover (persistentData, connFile, self.thisServerID)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're missing the whens
mpilman
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This now looks good to me - but I would like the code-formatting to be a bit more consistent
fdbserver/storageserver.actor.cpp
Outdated
|
|
||
| ACTOR Future<Void> memoryStoreRecover(IKeyValueStore* store, Reference<ClusterConnectionFile> connFile, UID id) | ||
| { | ||
| if(store->getType() != KeyValueStoreType::MEMORY || connFile.getPtr() == nullptr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This if-block has a different indentation then the rest of the function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
run git clang-format on storageserver.cpp
|
@etschannen Thank you! |
fdbserver/storageserver.actor.cpp
Outdated
| tr.reset(); | ||
| TraceEvent("RemoveStorageServerRetrying").detail("Count", noCanRemoveCount++).detail("ServerID", id).detail("CanRemove", canRemove); | ||
| } else { | ||
| wait(tr.commit()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing is being written, so there is no need to commit the transaction
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering what's the protocol here.
For the read & write transaction, we need to commit at the end.
But for the read-only transaction, we can skip this step? (To avoid the overhead?)
fdbserver/storageserver.actor.cpp
Outdated
| try { | ||
| wait( tr.onError(e)); | ||
| } catch (Error& e2) { | ||
| return Never(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what non-retryable errors is this second try catch expected to catch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get rid of the try & block
resolved
|
Sorry for taking so long to review this. I posted some very minor code cleanup comments, and will merge after you have looked at them. |
Sorry, had to open a new PR on this issue
Issue we(Wavefront) have:
#53
Solution:
Based on branch mater
During recovery process for memory storage engine, we’ll check periodically whether we can safely remove this storage engine (by calling canRemoveStorageServer() method). If the cluster is healthy then abort recovery and join in as an empty node. Otherwise, keep going until it’s done.