Skip to content

Conversation

@mengranwo
Copy link
Contributor

@mengranwo mengranwo commented Jun 17, 2019

Sorry, had to open a new PR on this issue
Issue we(Wavefront) have:
#53

Solution:
Based on branch mater
During recovery process for memory storage engine, we’ll check periodically whether we can safely remove this storage engine (by calling canRemoveStorageServer() method). If the cluster is healthy then abort recovery and join in as an empty node. Otherwise, keep going until it’s done.

Copy link
Contributor

@alexmiller-apple alexmiller-apple left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You appear to have used spaces for indentation, which should be converted to tabs.

I think you can also just run git clang-format to fix this as well.

@alexmiller-apple
Copy link
Contributor

@fdb-tools, ok to test

20190617-132559-mengranwo-49c193ec9d88b498

return false;
}

ACTOR Future<bool> memoryStoreRecover(IKeyValueStore* store, Reference<ClusterConnectionFile> connFile, UID id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to return Future<Void> here as this actor either returns true or doesn't return at all...

Copy link
Contributor Author

@mengranwo mengranwo Jun 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason why i had the return value as Future<bool> is that i wanted to check the storage engine type at the beginning, if the type != memory, i would return immediately and do nothing.

Copy link
Contributor Author

@mengranwo mengranwo Jun 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

   if(store->getType() != KeyValueStoreType::MEMORY || connFile.getPtr() == nullptr) {
        return false;
    }

this part, to be more specifically

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also return Never() in this case... It would simplify your code quite a bit, but I am not going to insist

wait(self.storage.commit()); //after a rollback there might be uncommitted changes.

state Future<bool> dispose = memoryStoreRecover (persistentData, connFile, self.thisServerID);
wait(self.storage.init());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation seems to be wrong here.

//after a rollback there might be uncommitted changes.
//for memory storage engine type, wait until recovery is done before commit
state Future<Void> committed = self.storage.commit();
wait(success(dispose) || success(committed));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if dispose would return a Future<Void> you could instead use a choose-when construct here (and you can get rid of the two state variables):

choose {
    when (wait(self.storage.commit())) {}
    when (wait(memoryStoreRecover (persistentData, connFile, self.thisServerID))) {
        TraceEvent("DisposeStorageServer", self.thisServerID);
         throw worker_removed();
    }
}

If you do this you also don't need to explicitly cancel the dispose Future.

Copy link
Contributor Author

@mengranwo mengranwo Jun 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see. However, one of my concern is, if type != memory, the method memoryStoreRecover() would immediately return Never(), but in this case, we don't want to throw worker_removed(); right?

What is the difference between Never() and Void() then ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind, figure it out.
push the changes into branch, thank you!

@alexmiller-apple
Copy link
Contributor

20190617-132559-mengranwo-49c193ec9d88b498 branch='recovery-optimization' compressed=True data_size=12929949 duration=2869496 ended=100233 fail_fast=10 max_runs=100000 pass=100000 pr='1713' priority=100 sanity=False started=100234 timeout=5400 username='mengranwo'

Correctness approves.

//for memory storage engine type, wait until recovery is done before commit
wait(self.storage.commit()) {}

wait(memoryStoreRecover (persistentData, connFile, self.thisServerID)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're missing the whens

Copy link
Contributor

@mpilman mpilman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This now looks good to me - but I would like the code-formatting to be a bit more consistent


ACTOR Future<Void> memoryStoreRecover(IKeyValueStore* store, Reference<ClusterConnectionFile> connFile, UID id)
{
if(store->getType() != KeyValueStoreType::MEMORY || connFile.getPtr() == nullptr) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This if-block has a different indentation then the rest of the function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

run git clang-format on storageserver.cpp

@mengranwo
Copy link
Contributor Author

@etschannen
Evan, if you have time, can you take a look at this PR, we talked about the high-level design months ago. Would love to get this one merged into the master branch

Thank you!

tr.reset();
TraceEvent("RemoveStorageServerRetrying").detail("Count", noCanRemoveCount++).detail("ServerID", id).detail("CanRemove", canRemove);
} else {
wait(tr.commit());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing is being written, so there is no need to commit the transaction

Copy link
Contributor Author

@mengranwo mengranwo Jun 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering what's the protocol here.
For the read & write transaction, we need to commit at the end.
But for the read-only transaction, we can skip this step? (To avoid the overhead?)

try {
wait( tr.onError(e));
} catch (Error& e2) {
return Never();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what non-retryable errors is this second try catch expected to catch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get rid of the try & block
resolved

@etschannen
Copy link
Contributor

Sorry for taking so long to review this. I posted some very minor code cleanup comments, and will merge after you have looked at them.

@alexmiller-apple alexmiller-apple merged commit e54eedf into apple:master Jul 1, 2019
@mengranwo mengranwo deleted the mengran/recovery-optimization branch July 7, 2019 23:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants