-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Checkpoint restore with shard #8667
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Checkpoint restore with shard #8667
Conversation
Result of foundationdb-pr-clang-ide on Linux CentOS 7
|
Doxense CI Report for Windows 10
|
Result of foundationdb-pr-clang on Linux CentOS 7
|
Result of foundationdb-pr on Linux CentOS 7
|
Result of foundationdb-pr-clang-ide on Linux CentOS 7
|
Result of foundationdb-pr-cluster-tests on Linux CentOS 7
|
Doxense CI Report for Windows 10
|
Result of foundationdb-pr-cluster-tests on Linux CentOS 7
|
Result of foundationdb-pr on Linux CentOS 7
|
Result of foundationdb-pr-clang on Linux CentOS 7
|
physicalShards are not cleared.
Result of foundationdb-pr-clang-ide on Linux CentOS 7
|
Result of foundationdb-pr-clang on Linux CentOS 7
|
Result of foundationdb-pr on Linux CentOS 7
|
Doxense CI Report for Windows 10
|
Result of foundationdb-pr-cluster-tests on Linux CentOS 7
|
Result of foundationdb-pr-clang-ide on Linux CentOS 7
|
Result of foundationdb-pr-clang on Linux CentOS 7
|
Result of foundationdb-pr on Linux CentOS 7
|
Result of foundationdb-pr-cluster-tests on Linux CentOS 7
|
Doxense CI Report for Windows 10
|
Result of foundationdb-pr on Linux CentOS 7
|
Result of foundationdb-pr-clang-ide on Linux CentOS 7
|
Result of foundationdb-pr-clang on Linux CentOS 7
|
Result of foundationdb-pr-cluster-tests on Linux CentOS 7
|
Result of foundationdb-pr on Linux CentOS 7
|
Result of foundationdb-pr-cluster-tests on Linux CentOS 7
|
Result of foundationdb-pr-cluster-tests on Linux CentOS 7
|
Doxense CI Report for Windows 10
|
Result of foundationdb-pr-clang-ide on Linux CentOS 7
|
Result of foundationdb-pr-macos-m1 on macOS BigSur 11.5.2
|
Result of foundationdb-pr-macos on macOS Monterey 12.x
|
Result of foundationdb-pr-clang on Linux CentOS 7
|
Result of foundationdb-pr on Linux CentOS 7
|
Result of foundationdb-pr-cluster-tests on Linux CentOS 7
|
Doxense CI Report for Windows 10
|
|
|
||
| void addRange(Reference<ShardInfo> shard); | ||
| void removeRange(Reference<ShardInfo> shard); | ||
| void removeRange(KeyRangeRef range); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two removeRange need comments. From what the code does, they are different: The first one seems to remove only if the shard pointers are the same; the second one remove any overlapping shards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comments.
fdbserver/storageserver.actor.cpp
Outdated
| // A SSPhysicalShard represents a physical shard, it contains a list of keyranges. | ||
| class SSPhysicalShard { | ||
| public: | ||
| SSPhysicalShard() : id(0LL) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this constructor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch, removed.
fdbserver/storageserver.actor.cpp
Outdated
| .detail("Range", shard->keys); | ||
|
|
||
| if (shard->notAssigned()) { | ||
| return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this an unexpected situation? Should we add some TraceEvent here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, changed to an ASSERT.
| for (int i = 0; i < this->ranges.size(); ++i) { | ||
| const auto& r = this->ranges[i]; | ||
| if (r.getPtr() == shard.getPtr()) { | ||
| this->ranges[i] = this->ranges.back(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems ranges are modified while iterating through it. That may cause corruption in ranges.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is not using iterators, and the changes are essentially pop this->ranges[I], it should be safe unless I missed something here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I mixed up this one and below. LGTM.
| for (int i = 0; i < this->ranges.size();) { | ||
| const auto& r = this->ranges[i]; | ||
| if (r->keys.intersects(range)) { | ||
| this->ranges[i] = this->ranges.back(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same issue as above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I see why this one doesn't have ++i in the for statement.
Result of foundationdb-pr-clang-ide on Linux CentOS 7
|
Doxense CI Report for Windows 10
|
Result of foundationdb-pr on Linux CentOS 7
|
Result of foundationdb-pr-clang on Linux CentOS 7
|
Result of foundationdb-pr-cluster-tests on Linux CentOS 7
|
| for (int i = 0; i < this->ranges.size();) { | ||
| const auto& r = this->ranges[i]; | ||
| if (r->keys.intersects(range)) { | ||
| this->ranges[i] = this->ranges.back(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I see why this one doesn't have ++i in the for statement.
| for (int i = 0; i < this->ranges.size(); ++i) { | ||
| const auto& r = this->ranges[i]; | ||
| if (r.getPtr() == shard.getPtr()) { | ||
| this->ranges[i] = this->ranges.back(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I mixed up this one and below. LGTM.
* main: (67 commits) Do not set timeout for RocksDB reads in simulation. (apple#8716) Update BlobManager epoc after restore Checkpoint restore with shard (apple#8667) More nit changes around DD Add some comments in DD and fix some nit Mako: Initialize Arguments::num_report_files to 0 (apple#8717) Use a network option for retaining temporary client lib copies instead of a client knob (apple#8630) Fix unintended behavior in Mako (apple#8635) Update TLogServer in main Change ASSERT_WE_THINK to ASSERT when checking that peek reply start version must be greater than latest pop version Apply clang format Fix a race condition between batched peek and pop, where the server removal pop may be lost revert sys.path placement revert sys.path placement fixing build vs with that caused compiling aws sdk to not work properly Fix /GlobalTagThrottler/TagLimit unit test fix update txn options moving fdbblob in sim folder for simulations (apple#8701) Set minimum average cost to 1 page in GlobalTagThrottler Add metrics for read range. (apple#8692) ...
* main: (35 commits) Do not set timeout for RocksDB reads in simulation. (apple#8716) Update BlobManager epoc after restore Checkpoint restore with shard (apple#8667) More nit changes around DD Add some comments in DD and fix some nit Mako: Initialize Arguments::num_report_files to 0 (apple#8717) Use a network option for retaining temporary client lib copies instead of a client knob (apple#8630) Fix unintended behavior in Mako (apple#8635) Update TLogServer in main Change ASSERT_WE_THINK to ASSERT when checking that peek reply start version must be greater than latest pop version Apply clang format Fix a race condition between batched peek and pop, where the server removal pop may be lost revert sys.path placement revert sys.path placement fixing build vs with that caused compiling aws sdk to not work properly Fix /GlobalTagThrottler/TagLimit unit test fix update txn options moving fdbblob in sim folder for simulations (apple#8701) Set minimum average cost to 1 page in GlobalTagThrottler Add metrics for read range. (apple#8692) ...
|
This change introduces a use of an uninitialized ShardInfo::shardId according to valgrind |
Replace this text with your description here...
Code-Reviewer Section
The general pull request guidelines can be found here.
Please check each of the following things and check all boxes before accepting a PR.
For Release-Branches
If this PR is made against a release-branch, please also check the following:
release-branchormainif this is the youngest branch)