Spill-By-Reference TLog Part 5: TLogs That Can Spill Large Amounts Of Data #1275
Conversation
fdbserver/DiskQueue.actor.cpp (outdated)

	hash = UID( (int64_t(part[0])<<32)+part[1], 0xfdb );
}
void updateHash_crc32c() {
	uint32_t checksum = crc32c_append( 0xfdbeefdb, (uint8_t*)&seq, sizeof(Page)-sizeof(hash) );
Great seed choice.
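For context, the page checksums everything after its own hash field, seeded with the constant above. Below is a minimal, self-contained sketch of that idea; the bitwise CRC32C routine and the Page layout are illustrative stand-ins, and FDB's actual crc32c_append may use different seed and finalization conventions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Bitwise CRC32C (Castagnoli, reflected polynomial 0x82F63B78): slow but dependency-free.
static uint32_t crc32c(uint32_t crc, const uint8_t* data, size_t len) {
	crc = ~crc;
	for (size_t i = 0; i < len; ++i) {
		crc ^= data[i];
		for (int k = 0; k < 8; ++k)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return ~crc;
}

// Hypothetical page layout: the stored checksum comes first, so the checksum
// can cover everything from `seq` onward without including itself.
struct Page {
	uint32_t hash;          // the page's stored checksum
	uint64_t seq;           // sequence number of this page within the queue
	uint8_t payload[4084];  // rest of the (illustrative) 4KB page

	uint32_t computeHash() const {
		const uint8_t* body = reinterpret_cast<const uint8_t*>(this) + offsetof(Page, seq);
		return crc32c(0xfdbeefdb, body, sizeof(Page) - offsetof(Page, seq));
	}
	void updateHash() { hash = computeHash(); }
	bool checkHash() const { return hash == computeHash(); }
};

int main() {
	Page p;
	std::memset(&p, 0, sizeof(p));
	p.seq = 42;
	p.updateHash();
	assert(p.checkHash());
	p.payload[0] ^= 1;       // simulate on-disk corruption
	assert(!p.checkHash());
}
```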
| #include "fdbclient/FDBTypes.h" | ||
| #include "fdbserver/IKeyValueStore.h" | ||
|
|
||
| enum class CheckHashes { |
Why not just use a bool?
It's personal preference. I really dislike functions that look like doFoo(true, true, false, false, true), because any bool argument requires me to go look at the declaration to see what each of the trues and falses means, rather than having a name that's meaningful at the callsite.
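As an illustration of the point (with hypothetical names and signatures, not the actual FDB functions):

```cpp
#include <cstdio>

// Boolean parameters are opaque at the callsite:
//   peekMessages(tag, begin, true, false);   // which flag is which?
// Small enum classes give each flag a self-describing name instead.
enum class CheckHashes { NO, YES };
enum class ReturnEmptyIfStopped { NO, YES };

// Hypothetical function, for illustration only.
void peekMessages(int tag, long begin, CheckHashes checkHashes, ReturnEmptyIfStopped returnEmpty) {
	std::printf("tag=%d begin=%ld checkHashes=%d returnEmpty=%d\n",
	            tag, begin, checkHashes == CheckHashes::YES, returnEmpty == ReturnEmptyIfStopped::YES);
}

int main() {
	// The callsite now documents itself, and the compiler rejects
	// accidentally swapping the two flag arguments.
	peekMessages(7, 100, CheckHashes::YES, ReturnEmptyIfStopped::NO);
}
```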
Force-pushed from 0603481 to 3a6c214
I've left "potato" as "potato" because it's filled with trace events that will need to be rebased out anyway. Otherwise, this code is complete as far as I can tell; the correctness run I'll leave going overnight will tell me whether that's actually accurate. The only other work I have planned for spill-by-reference is to add TraceEvents and knobs, which can be done on release-6.1, and I'm sure performance tests will turn up something to fix.

Can I assume it is at least faster than the non-spilling solution? :)

I'm expecting performance tests will reveal things like slow tasks, large packets, OOMs, etc.: the sorts of things that simulation is bad at finding, but saturating workloads on a real cluster are good at finding.
Force-pushed from 738b5e7 to ec17719
Though this format is being deprecated in favor of eventually plumbing TLogVersion through, we should bump it anyway, and also remove the fallback to the OldTLogServer code. It should never be executed, as OldTLogServer_6_0 is relied upon entirely to execute OldTLogServer_4_6.
There are various ASSERT()s that assume firstPages is empty and enforce things about `seq`. Some of these asserts have spuriously passed, since uninitialized pages look like they have a `seq` of 0, which would be the beginning of the disk queue. Now they'll look like the end of the disk queue, which is far easier to fail on.
We don't have a forward-compatibility story for the memory storage engine, so its DiskQueue will stay on hashlittle2 until one exists.
This allows us to do easy upgrades of SpilledData in the future, if the need arises, because we then have a protocol version to compare against.
If a server has its data spilled, then it's behind the 5s window. Feeding it data is less important than committing, so we can hide the extra CPU usage from checksumming the read-amplified disk queue pages.
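Regarding the SpilledData upgrade point above: prefixing the serialized bytes with a protocol version lets a future reader branch on the format. A minimal sketch of that pattern in plain C++ follows; the field names and layout are hypothetical, and FDB's real SpilledData and serializer differ.

```cpp
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical on-disk record describing a spilled batch: where it lives in
// the DiskQueue and how much of it there is.
struct SpilledData {
	uint64_t version;       // commit version of the spilled batch
	uint64_t start;         // DiskQueue location where the batch begins
	uint32_t length;        // bytes to read back
	uint32_t mutationBytes; // payload size, for peek budgeting
};

static const uint64_t CURRENT_PROTOCOL_VERSION = 2;

// Serialize with a leading protocol version, so future format changes can be
// detected and handled instead of silently misparsed.
std::vector<uint8_t> serialize(const SpilledData& d) {
	std::vector<uint8_t> out(sizeof(uint64_t) + sizeof(SpilledData));
	std::memcpy(out.data(), &CURRENT_PROTOCOL_VERSION, sizeof(uint64_t));
	std::memcpy(out.data() + sizeof(uint64_t), &d, sizeof(SpilledData));
	return out;
}

SpilledData deserialize(const std::vector<uint8_t>& in) {
	uint64_t protocolVersion;
	std::memcpy(&protocolVersion, in.data(), sizeof(uint64_t));
	if (protocolVersion > CURRENT_PROTOCOL_VERSION)
		throw std::runtime_error("SpilledData written by a newer version");
	// An older protocolVersion would be handled here by reading the old layout
	// and defaulting any fields that did not exist yet.
	SpilledData d;
	std::memcpy(&d, in.data() + sizeof(uint64_t), sizeof(SpilledData));
	return d;
}
```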
Force-pushed from ec17719 to 540450a
IKeyValueStore* keyValueStoreMemory( std::string const& basename, UID logID, int64_t memoryLimit, std::string ext ) {
	TraceEvent("KVSMemOpening", logID).detail("Basename", basename).detail("MemoryLimit", memoryLimit);
-	IDiskQueue *log = openDiskQueue( basename, ext, logID);
+	IDiskQueue *log = openDiskQueue( basename, ext, logID, DiskQueueVersion::V0 );
I've validated in correctness that flipping this to V1 passes the restarting tests, so it can be set to V1 for 6.2.
Popping the disk queue now potentially requires recovering the location we can pop to from the spilled data itself, and for each tag we must maintain the first location with relevant data. The previous queue we had for representing the ordering, queueOrder, was consumed by spilling and popped once a TLog had been spilled, which means that as soon as a TLog has been fully spilled, we have no idea how it relates in order to other fully spilled TLogs. Instead, use queueOrder to keep track of all the TLog UIDs until they're removed, and use spillOrder to track the order only for spilling.
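A rough sketch of that bookkeeping (simplified types, not the actual TLog data structures): every known TLog's UID stays in queueOrder until that TLog is removed, while spillOrder tracks only the TLogs still eligible for spilling.

```cpp
#include <deque>
#include <set>
#include <string>

using UID = std::string;  // stand-in for FDB's UID type

struct SharedTLogState {
	std::deque<UID> queueOrder;  // every live TLog generation, oldest first; popped only on removal
	std::deque<UID> spillOrder;  // TLogs still eligible for spilling, oldest first
	std::set<UID> removed;

	void addTLog(const UID& id) {
		queueOrder.push_back(id);
		spillOrder.push_back(id);
	}

	// Called when a TLog has been fully spilled: it no longer participates in
	// spilling decisions, but queueOrder still remembers where it sits relative
	// to other (possibly also fully spilled) TLogs.
	void fullySpilled(const UID& id) {
		if (!spillOrder.empty() && spillOrder.front() == id)
			spillOrder.pop_front();
	}

	// Called when a TLog generation is destroyed; only now does it leave queueOrder.
	void removeTLog(const UID& id) {
		removed.insert(id);
		while (!queueOrder.empty() && removed.count(queueOrder.front()))
			queueOrder.pop_front();
	}
};
```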
This time, track in persistent state the location in the DiskQueue up to which data has been spilled, and then feed it back into the DiskQueue before recovery. This also introduces an ASSERT that recovery reads exactly the bytes that it needs to have in memory.
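A sketch of that recovery handshake, with hypothetical interfaces standing in for the real persistent store and IDiskQueue: the spilled-up-to location is written durably when spilling, and handed back to the queue before recovery begins so recovery knows where reading must start.

```cpp
#include <cstdint>
#include <map>
#include <string>

using Location = int64_t;  // stand-in for a byte offset into the DiskQueue

// Hypothetical persistent key/value state.
struct PersistentState {
	std::map<std::string, Location> kv;
	void set(const std::string& k, Location v) { kv[k] = v; }
	Location getOrDefault(const std::string& k, Location def) const {
		auto it = kv.find(k);
		return it == kv.end() ? def : it->second;
	}
};

// Hypothetical disk queue that accepts a starting point before recovery.
struct DiskQueue {
	Location recoverAt = 0;
	// Seed the queue with the already-spilled location, so recovery reads
	// exactly the bytes that still need to be in memory.
	void initializeRecovery(Location spilledUpTo) { recoverAt = spilledUpTo; }
};

// On spill: remember durably how far we have spilled.
void onSpilled(PersistentState& ps, Location spilledUpTo) {
	ps.set("SpilledUpTo", spilledUpTo);
}

// On reboot: feed the persisted location back into the DiskQueue first.
void beforeRecovery(const PersistentState& ps, DiskQueue& dq) {
	dq.initializeRecovery(ps.getOrDefault("SpilledUpTo", 0));
}
```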
Force-pushed from 540450a to 37ea71b
Simulation seems to approve of the recent changes, so this should be good to merge.
Part 5 of probably 5 for #1048
This PR introduces 3 important things:
versionLocation is a map of version to location on disk. Rough back-of-the-envelope math suggested that a 1TB DiskQueue could take 10s of GB of memory for versionLocation, which means it has to be cleaned up.
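To make the memory concern concrete, here is a sketch (simplified types, not the actual TLog code) of a version-to-location index that is trimmed as versions are popped, so it only ever covers data still resident in the DiskQueue:

```cpp
#include <cstdint>
#include <map>
#include <utility>

using Version = int64_t;
using Location = int64_t;  // stand-in for a DiskQueue location

struct VersionIndex {
	// For each version still in the DiskQueue, the [start, end) byte range
	// holding its commit. Without trimming, a 1TB queue could accumulate
	// many gigabytes of entries.
	std::map<Version, std::pair<Location, Location>> versionLocation;

	void recordCommit(Version v, Location start, Location end) {
		versionLocation[v] = {start, end};
	}

	// Where to start reading to peek data at or after `v`; -1 if nothing remains.
	Location peekStart(Version v) const {
		auto it = versionLocation.lower_bound(v);
		return it == versionLocation.end() ? -1 : it->second.first;
	}

	// Once every tag has popped past `v`, the corresponding DiskQueue bytes can
	// be discarded, and so can their index entries.
	void popBefore(Version v) {
		versionLocation.erase(versionLocation.begin(), versionLocation.lower_bound(v));
	}
};
```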