Spill-By-Reference TLog Part 5: TLogs That Can Spill Large Amounts Of Data #1275

alexmiller-apple · 2019-03-11T22:03:21Z

Part 5 of probably 5 for #1048

This PR introduces 3 important things:

1. It checksums pages it reads from the disk queue, which closes an accidental hole through which bitrot could be fed to storage servers.
2. It removes versions from versionLocation when they're spilled, instead of when they're popped. versionLocation is a map of version to location on disk. Rough back-of-the-envelope math suggested that a 1TB DiskQueue could take tens of GB of memory for versionLocation, which means it has to be cleaned up.
3. It restricts recovery to only read the portion of the disk queue that needs to be re-indexed in memory. Anything that has already been spilled won't be read. This means recovering a 1TB disk queue will only read ~2GB of it.
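
The versionLocation estimate in the description can be reproduced with rough arithmetic. The sketch below is not taken from the PR: the average number of queue bytes per version (2 KiB) and the per-entry map overhead (64 bytes) are assumed values, picked only to show how the order of magnitude comes about.

```cpp
#include <cstdint>

// Hypothetical back-of-the-envelope model. None of these constants come from the PR.
constexpr uint64_t kQueueBytes = 1ull << 40;     // a 1 TiB DiskQueue
constexpr uint64_t kAvgBytesPerVersion = 2048;   // ASSUMPTION: average queue bytes per version
constexpr uint64_t kBytesPerMapEntry = 64;       // ASSUMPTION: key + value + tree-node overhead

// One map entry per version that is still present in the queue.
constexpr uint64_t versionLocationEntries(uint64_t queueBytes, uint64_t bytesPerVersion) {
    return queueBytes / bytesPerVersion;
}

// Memory needed if every one of those entries stays resident.
constexpr uint64_t versionLocationBytes(uint64_t queueBytes, uint64_t bytesPerVersion,
                                        uint64_t bytesPerEntry) {
    return versionLocationEntries(queueBytes, bytesPerVersion) * bytesPerEntry;
}
```

Under these assumed numbers the map holds 2^29 entries and occupies 32 GiB, which is in the "tens of GB" range the description mentions. Smaller commits per version or a heavier map node would push the figure higher.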

dongxinEric · 2019-03-11T22:59:13Z

fdbserver/DiskQueue.actor.cpp

 			hash = UID( (int64_t(part[0])<<32)+part[1], 0xfdb );
 		}
+		void updateHash_crc32c() {
+			uint32_t checksum = crc32c_append( 0xfdbeefdb, (uint8_t*)&seq, sizeof(Page)-sizeof(hash) );


Great seed choice.
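
For readers who want to see what a seeded page checksum buys, here is a minimal, self-contained sketch. It is not the DiskQueue code: `crc32cSketch`, `SketchPage`, the 64-byte page size, and the little-endian checksum layout are all hypothetical, and the way the seed is folded in (inverted before and after) is an assumption rather than a statement about how `crc32c_append` behaves. Only the seed value `0xfdbeefdb` is taken from the diff.

```cpp
#include <cstddef>
#include <cstdint>

constexpr uint32_t kCrc32cPoly = 0x82F63B78u;  // reflected Castagnoli polynomial
constexpr uint32_t kSeed = 0xfdbeefdbu;        // seed value shown in the diff

// Bitwise CRC-32C: slow, but short enough to check by eye.
// ASSUMPTION: the seed is inverted on entry and the result inverted on exit.
constexpr uint32_t crc32cSketch(uint32_t seed, const uint8_t* data, size_t len) {
    uint32_t crc = ~seed;
    for (size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit)
            crc = (crc & 1u) ? (crc >> 1) ^ kCrc32cPoly : (crc >> 1);
    }
    return ~crc;
}

constexpr size_t kPageSize = 64;     // hypothetical; real pages are larger
constexpr size_t kChecksumSize = 4;  // bytes[0..3] hold the checksum, little-endian

struct SketchPage { uint8_t bytes[kPageSize]; };

// Checksum everything after the checksum field, as the diff does with
// sizeof(Page) - sizeof(hash).
constexpr uint32_t computeChecksum(const SketchPage& p) {
    return crc32cSketch(kSeed, p.bytes + kChecksumSize, kPageSize - kChecksumSize);
}

constexpr uint32_t storedChecksum(const SketchPage& p) {
    return uint32_t(p.bytes[0]) | (uint32_t(p.bytes[1]) << 8) |
           (uint32_t(p.bytes[2]) << 16) | (uint32_t(p.bytes[3]) << 24);
}

// Write path: stamp the checksum into the page.
constexpr SketchPage seal(SketchPage p) {
    const uint32_t c = computeChecksum(p);
    for (size_t i = 0; i < kChecksumSize; ++i) p.bytes[i] = uint8_t(c >> (8 * i));
    return p;
}

// Read path: a page is trusted only if the stored and recomputed checksums agree.
constexpr bool verify(const SketchPage& p) { return storedChecksum(p) == computeChecksum(p); }

// Simulate bitrot by flipping a single bit.
constexpr SketchPage flipBit(SketchPage p, size_t byteIndex, unsigned bit) {
    p.bytes[byteIndex] ^= uint8_t(1u << bit);
    return p;
}
```

A sealed page passes `verify`, and the same page with any single bit flipped does not, because a CRC detects every single-bit error. That is the property that keeps a rotted page from being handed to a storage server.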

dongxinEric · 2019-03-11T23:06:37Z

fdbserver/IDiskQueue.h

 #include "fdbclient/FDBTypes.h"
 #include "fdbserver/IKeyValueStore.h"

+enum class CheckHashes {


Why not just use a bool?

alexmiller-apple · 2019-03-12

It's personal preference. I really dislike functions that look like doFoo(true, true, false, false, true), because any bool argument requires me to go look at the declaration to see what each of the trues and falses means, rather than just giving them a name that's meaningful at the callsite.
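
To make the call-site argument concrete, here is a small sketch contrasting the two styles. The names are hypothetical: the enumerators of the PR's `CheckHashes` enum are not visible in the excerpt, so `VerifyChecksum`, `AllowPartial`, and both `read...` functions are invented for illustration only.

```cpp
// Hypothetical names for illustration; not the PR's API.
enum class VerifyChecksum { No, Yes };
enum class AllowPartial { No, Yes };

struct ReadOptions {
    bool verify;
    bool allowPartial;
};

// bool style: the call site reads readWithBools(true, false), and the
// arguments can be swapped silently because both are plain bools.
constexpr ReadOptions readWithBools(bool verify, bool allowPartial) {
    return ReadOptions{verify, allowPartial};
}

// enum class style: the call site names each choice, and swapping the two
// arguments is a compile error because the types differ.
constexpr ReadOptions readWithEnums(VerifyChecksum verify, AllowPartial allowPartial) {
    return ReadOptions{verify == VerifyChecksum::Yes, allowPartial == AllowPartial::Yes};
}
```

A call such as `readWithEnums(VerifyChecksum::Yes, AllowPartial::No)` documents itself, whereas `readWithBools(true, false)` sends the reader to the declaration.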

alexmiller-apple · 2019-03-13T12:42:19Z

I've left "potato" as "potato" because it's filled with trace events that will need to be rebased out anyway. Otherwise, this code is complete as far as I can tell. I'll find out from the correctness that I'll leave running overnight if that's actually accurate. The only other work I have planned for spill-by-reference is to add TraceEvents and knobs, which can be done on release-6.1, and I'm sure performance tests will result in something to fix.

xumengpanda · 2019-03-13T16:00:50Z

> I've left "potato" as "potato" because it's filled with trace events that will need to be rebased out anyway. Otherwise, this code is complete as far as I can tell. I'll find out from the correctness that I'll leave running overnight if that's actually accurate. The only other work I have planned for spill-by-reference is to add TraceEvents and knobs, which can be done on release-6.1, and I'm sure performance tests will result in something to fix.

Can I assume it is at least faster than the non-spilling solution? :)
Or are the additional performance tests meant to make it even faster?

alexmiller-apple · 2019-03-13T21:38:39Z

I'm expecting performance tests will reveal things like slow tasks, large packets, OOMs, etc. The sorts of things that simulation is bad at finding, but saturating workloads on a real cluster are good at finding.

Though this format is being deprecated in favor of an eventual plumbing through of TLogVersion, we should probably bump it anyway. And also remove the fallback to OldTLogServer code. It should never be executed, as OldTLogServer_6_0 is entirely relied upon to execute OldTLogServer_4_6.

There are various ASSERT()s that assume firstPages is empty and enforce things about `seq`. Some of these asserts have spuriously passed, since uninitialized pages look like they have a `seq` of 0, which would be the beginning of the disk queue. Now they'll look like the end of the disk queue, which makes those asserts far more likely to fail.

alexmiller-apple · 2019-03-16T04:46:34Z

fdbserver/KeyValueStoreMemory.actor.cpp

 IKeyValueStore* keyValueStoreMemory( std::string const& basename, UID logID, int64_t memoryLimit, std::string ext ) {
 	TraceEvent("KVSMemOpening", logID).detail("Basename", basename).detail("MemoryLimit", memoryLimit);
-	IDiskQueue *log = openDiskQueue( basename, ext, logID);
+	IDiskQueue *log = openDiskQueue( basename, ext, logID, DiskQueueVersion::V0 );


I've validated in correctness that flipping this to V1 passes the restarting tests, so it can be set to V1 for 6.2.

Popping the disk queue now requires potentially recovering the location to which we can pop from the spilled data itself, and for each tag we must maintain the first location with relevant data. The previous queue we had to represent the ordering, queueOrder, was used by spilling, and popped when a TLog had been spilled. This means that as soon as a TLog has been fully spilled, we have no idea how it relates in order to other fully spilled TLogs. Instead, use queueOrder to keep track of all the TLog UIDs until they're removed, and use spillOrder to keep track of the order only for spilling.
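
The split between the two orderings can be sketched with two deques. This is an illustration only: `LogOrdering`, `LogId`, and the method names are hypothetical, and it assumes TLogs are spilled strictly oldest-first, which the commit message does not state explicitly.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <string>

using LogId = std::string;  // stand-in for a TLog UID

struct LogOrdering {
    std::deque<LogId> queueOrder;  // every live TLog, oldest first, kept until removal
    std::deque<LogId> spillOrder;  // only TLogs that still have data left to spill

    void addLog(const LogId& id) {
        queueOrder.push_back(id);
        spillOrder.push_back(id);
    }

    // ASSUMPTION: spilling proceeds oldest-first, so the finished TLog is at the front.
    void markFullySpilled(const LogId& id) {
        assert(!spillOrder.empty() && spillOrder.front() == id);
        spillOrder.pop_front();  // queueOrder is deliberately left untouched
    }

    void removeLog(const LogId& id) {
        queueOrder.erase(std::remove(queueOrder.begin(), queueOrder.end(), id), queueOrder.end());
        spillOrder.erase(std::remove(spillOrder.begin(), spillOrder.end(), id), spillOrder.end());
    }

    // Relative age of two live TLogs, whether or not either is fully spilled.
    bool isOlder(const LogId& a, const LogId& b) const {
        auto ia = std::find(queueOrder.begin(), queueOrder.end(), a);
        auto ib = std::find(queueOrder.begin(), queueOrder.end(), b);
        assert(ia != queueOrder.end() && ib != queueOrder.end());
        return ia < ib;
    }
};
```

With a single queue that is popped on spill, `isOlder` could not be answered for two fully spilled TLogs. Keeping `queueOrder` intact until removal preserves that relationship, while `spillOrder` still tells the spilling code which TLog to work on next.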

This time, track what location in the DiskQueue has been spilled in persistent state, and then feed it back into the disk queue before recovery. This also introduces an ASSERT that recovery only reads exactly the bytes that it needs to have in memory.
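
As a rough model of why recovery reads so little, the sketch below treats DiskQueue locations as plain byte offsets and computes the range that still has to be re-indexed. `ByteRange`, `recoveryReadRange`, and `bytesToRead` are hypothetical names, and the flat-offset model is an assumption; the real locations and persistence format are not shown in this excerpt.

```cpp
#include <cstdint>

// Half-open byte range [begin, end) within the disk queue.
struct ByteRange {
    uint64_t begin;
    uint64_t end;
};

constexpr uint64_t maxOf(uint64_t a, uint64_t b) { return a > b ? a : b; }
constexpr uint64_t minOf(uint64_t a, uint64_t b) { return a < b ? a : b; }

// ASSUMPTION: everything before max(popped, spilled) is either gone or already
// reachable by reference, so recovery only needs the tail of the queue.
constexpr ByteRange recoveryReadRange(uint64_t poppedThrough, uint64_t spilledThrough,
                                      uint64_t queueEnd) {
    return ByteRange{minOf(maxOf(poppedThrough, spilledThrough), queueEnd), queueEnd};
}

constexpr uint64_t bytesToRead(ByteRange r) { return r.end - r.begin; }
```

For example, with a 1 TiB queue whose persisted spilled location sits 2 GiB before the end, `bytesToRead` yields 2 GiB. An assertion of the kind the commit describes would then check that recovery consumed exactly that many bytes, no more and no fewer.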

fdbserver/TLogServer.actor.cpp

alexmiller-apple · 2019-03-18T23:56:12Z

Simulation seems to approve of the recent changes, so this should be good to merge.

dongxinEric reviewed Mar 11, 2019

alexmiller-apple force-pushed the tstlog9 branch from 0603481 to 3a6c214 Compare March 13, 2019 11:54

alexmiller-apple assigned etschannen Mar 13, 2019

alexmiller-apple requested a review from etschannen March 13, 2019 11:57

alexmiller-apple force-pushed the tstlog9 branch from 738b5e7 to ec17719 Compare March 16, 2019 04:00

alexmiller-apple added 8 commits March 15, 2019 21:01

Add LogId to all TLog TraceEvents that have it.

4f98634

Remove verification code from DiskQueue and TLogServer.

686b097

If TLogVersion >= 3, use crc32c for the DiskQueue hash for TLogs.

bf247ee

We don't have a forward compatibility story for the memory storage engine, so its DiskQueue will still be hashlittle2 until one exists.

Persist the protocol version of a TLog instance when it is created.

81c59e8

This allows us to do easy upgrades of SpilledData in the future, if the need arises, because we then have a protocol version to compare against.

Make checking or ignoring checksums part of the IDiskQueue::read API.

ee4721a

Checksum DiskQueue pages on read, but at a lower priority.

7f5bc29

If a server has its data spilled, then it's behind the 5s window. Feeding it data is less important than committing, so we can hide the extra CPU usage from checksumming the read amplified disk queue pages.

alexmiller-apple force-pushed the tstlog9 branch from ec17719 to 540450a Compare March 16, 2019 04:01

alexmiller-apple changed the title ~~Tstlog9~~ Spill-By-Reference TLog Part 5: TLogs That Can Spill Large Amounts Of Data Mar 16, 2019

alexmiller-apple marked this pull request as ready for review March 16, 2019 04:44

alexmiller-apple commented Mar 16, 2019

alexmiller-apple added 2 commits March 18, 2019 15:09

alexmiller-apple force-pushed the tstlog9 branch from 540450a to 37ea71b Compare March 18, 2019 22:10

etschannen suggested changes Mar 18, 2019

Remove random bits of code that were either unneeded or leftover from debugging.

b11ecb3

etschannen approved these changes Mar 19, 2019

etschannen merged commit ddf8e86 into apple:master Mar 19, 2019

etschannen added a commit to etschannen/foundationdb that referenced this pull request Mar 26, 2019

Merge pull request apple#1275 from alexmiller-apple/tstlog9

2bfec2c

Spill-By-Reference TLog Part 5: TLogs That Can Spill Large Amounts Of Data

etschannen added a commit to etschannen/foundationdb that referenced this pull request Mar 26, 2019

Merge pull request apple#1275 from alexmiller-apple/tstlog9

c581e6c

Spill-By-Reference TLog Part 5: TLogs That Can Spill Large Amounts Of Data

alexmiller-apple pushed a commit to etschannen/foundationdb that referenced this pull request Mar 26, 2019

Merge pull request apple#1275 from alexmiller-apple/tstlog9

a92c684

Spill-By-Reference TLog Part 5: TLogs That Can Spill Large Amounts Of Data
