add bytelimit for prefetch #8768
Conversation
Result of foundationdb-pr-clang on Linux CentOS 7
Result of foundationdb-pr-clang-ide on Linux CentOS 7
Result of foundationdb-pr on Linux CentOS 7
Result of foundationdb-pr-macos-m1 on macOS BigSur 11.5.2
Result of foundationdb-pr-macos on macOS Monterey 12.x
Result of foundationdb-pr-cluster-tests on Linux CentOS 7
Doxense CI Report for Windows 10
fdbserver/storageserver.actor.cpp
Outdated
    // create a temporary byte limit for index fetching ONLY, this should be excessive
    // because readRange is cheap when reading additional bytes
    const double fraction = 0.8;
My assumption would actually be that the primary is a much smaller fraction--maybe closer to 20% than 80%. That's just a guess though. Perhaps this should be a knob?
I still think this probably should be a knob, but I suppose the other side would be: if we're assuming that overreading from the primary scan is cheap (because it's a single scan), and we're only really concerned about overreading from the secondary scans (because those are random reads and are potentially larger), then I suppose having this overestimate (as a heuristic) is fine.
agree, will make it a Knob.
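A minimal sketch of what a knob-controlled index-fetch limit might look like. The knob name and the 1-byte floor are assumptions for illustration, not FoundationDB's actual code:

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical knob; in FoundationDB this would be defined in ServerKnobs
// and read via SERVER_KNOBS->..., not a file-local constant.
constexpr double PREFETCH_INDEX_BYTE_LIMIT_FRACTION = 0.8;

// Temporary byte limit used for the index scan only. The 1-byte floor
// (an assumption here) keeps the scan able to return at least one entry
// even for tiny request limits.
long indexByteLimit(long requestByteLimit) {
    long limit = static_cast<long>(requestByteLimit * PREFETCH_INDEX_BYTE_LIMIT_FRACTION);
    return std::max(limit, 1L);
}
```

Making the fraction a knob would just replace the constant with a `SERVER_KNOBS` lookup, so it can be tuned (or BUGGIFY-randomized) without a rebuild.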
    *remainingLimitBytes -= size;
    ++cnt;
    result.data.push_back(result.arena, kvms[i]);
    if (SERVER_KNOBS->STRICTLY_ENFORCE_BYTE_LIMIT && *remainingLimitBytes <= 0) {
@alecgrieser I check the byte limit here and before each batch.
        "storageserver.mapKeyValues.BeforeLoop");
-   for (; offset < sz; offset += SERVER_KNOBS->MAX_PARALLEL_QUICK_GET_VALUE) {
+   for (; offset < sz && *remainingLimitBytes > 0; offset += SERVER_KNOBS->MAX_PARALLEL_QUICK_GET_VALUE) {
Is this condition (remainingLimitBytes > 0) correct with the EXACT streaming mode? It's a relatively rare streaming mode, but in that case, the user specifies that they want the row limit to be exactly matched, and so they specify a byte limit of 0 (indicating "unlimited"). I believe that in that case, we'd effectively want to ignore the remaining limit bytes.
This is a great question; basically, to clarify the case where we have an unlimited byteLimit:
- it is handled here then here, and set to CLIENT_KNOBS->REPLY_BYTE_LIMIT. So if rowLimit wants more, there is nothing we can do anyway.
However, I found a bug -- we actually allow either rowLimit or byteLimit to be empty, rather than only enforcing rowLimit (code).
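A self-contained sketch of that normalization, using a stand-in constant for CLIENT_KNOBS->REPLY_BYTE_LIMIT (the value 80000 and the function name are illustrative assumptions):

```cpp
#include <cassert>

// Stand-in for CLIENT_KNOBS->REPLY_BYTE_LIMIT; the real value comes from
// the client knobs, 80000 here is only an illustrative number.
constexpr int REPLY_BYTE_LIMIT = 80000;

// In the EXACT streaming mode the client passes a byte limit of 0, meaning
// "unlimited". Sketch of normalizing that sentinel before any loop that
// tests `remainingLimitBytes > 0`, so the scan is not cut short.
int effectiveByteLimit(int requestedByteLimit) {
    return requestedByteLimit <= 0 ? REPLY_BYTE_LIMIT : requestedByteLimit;
}
```

With the sentinel folded into a concrete cap up front, the batch loop's `remainingLimitBytes > 0` condition stays valid in every streaming mode.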
fdbserver/storageserver.actor.cpp
Outdated
    result.data.back().key = input.data[i + offset].key;
    result.data.back().value = input.data[i + offset].value;
Do we similarly need to set the result's back() key and value if we terminate early because of the byte limit after a batch (that is, if the STRICTLY_ENFORCE_BYTE_LIMIT knob is false)?
In mapSubQuery(), we set the KV if isBoundary is true. I agree it is a bit complicated here -- we need to set both [0] and [indexResultSize - 1], so we have bool isBoundary = (i + offset) == 0 || (i + offset) == sz - 1; and pass it to mapSubQuery().
So this code is only for "early" termination; the KV will be set in mapSubQuery() on normal termination (i.e. when all entries are returned).
I misunderstood this comment. This is a good catch; I missed it, will fix.
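The boundary logic discussed above can be sketched as a standalone predicate (a hypothetical free function; in the actual code this is an inline expression passed to mapSubQuery()):

```cpp
#include <cassert>

// Sketch of the boundary condition: an entry is a boundary if it is
// entry [0] or entry [sz - 1] of the index results, and only boundary
// entries get their key/value set inside mapSubQuery().
bool isBoundary(int i, int offset, int sz) {
    return (i + offset) == 0 || (i + offset) == sz - 1;
}
```

The reviewer's point is that an early byte-limit exit can end the loop on a non-boundary entry, so the last returned entry's key/value must still be set explicitly on that path.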
fdbserver/storageserver.actor.cpp
Outdated
        wait(mapKeyValues(data, getKeyValuesReply, req.mapper, &req, req.matchIndex, &remainingLimitBytes));
        r = _r;
    } catch (Error& e) {
        // TODO: cannot allow txn_too_old to reach here, need to handle it gracefully, add test
I think transaction_too_old here actually is fine. It'll get propagated up to the user, and it would be (I think) the expected behavior if doing a read-range close to the 5-second limit boundary. That being said, it would be kind of nice if it could return whatever it had already read rather than just throwing an error, but probably not the end of the world.
Thanks for explaining; that makes total sense. It is something I want to get to the bottom of before merging the PR.
    // Map the scanned range to another list of keys and look up.
    GetMappedKeyValuesReply _r =
-       wait(mapKeyValues(data, getKeyValuesReply, req.mapper, &req, tenantPrefix, req.matchIndex));
+       wait(mapKeyValues(data, getKeyValuesReply, req.mapper, &req, req.matchIndex, &remainingLimitBytes));
Does this still need remainingLimitBytes to be a pointer? As I understood it, this was (in the older PR) passed as a pointer because it needed to include the secondary fetches and then decrease that value from the primary scans limit. However, with the current design, I don't think that's a thing any more, as the first scan uses a different limit, and then the secondary scans will keep iterating through batches until they have read up to the original limit (I think)
You are right; it is used for stats and metrics only now... but we might still want to keep it?
c75c183 to 03ea326
    init( QUICK_GET_VALUE_FALLBACK, true );
    init( QUICK_GET_KEY_VALUES_FALLBACK, true );
    init( MAX_PARALLEL_QUICK_GET_VALUE, 50 ); if ( randomize && BUGGIFY ) MAX_PARALLEL_QUICK_GET_VALUE = deterministicRandom()->randomInt(1, 100);
    init( STRICTLY_ENFORCE_BYTE_LIMIT, false); if( randomize && BUGGIFY ) STRICTLY_ENFORCE_BYTE_LIMIT = deterministicRandom()->coinflip() ? false : true;
STRICTLY_ENFORCE_BYTE_LIMIT = deterministicRandom()->coinflip();
* add bytelimit for prefetch

A fraction of byteLimit will be used as the limit to fetch the index. For the indexes fetched, records for them are fetched in batch.

byteLimit always counts the index size; it also counts the record if one exists. At least one index-record entry is returned, and the last entry is always included even though adding it might exceed the limit.

There is a knob, STRICTLY_ENFORCE_BYTE_LIMIT: when it is set, records are discarded once the byteLimit is hit, even though they were already fetched. Otherwise, the whole batch is returned.
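The enforcement behavior described above can be sketched as a self-contained loop. Types and names here are hypothetical: each entry is represented only by its byte size, while the real code works on kvms[]/result.data in storageserver.actor.cpp:

```cpp
#include <cassert>
#include <vector>

// Sketch of the byte-limit behavior: in non-strict mode, everything
// already fetched is returned; in strict mode, the loop stops once the
// limit is crossed, but the crossing entry itself is still kept, so at
// least one entry is always returned.
std::vector<int> applyByteLimit(const std::vector<int>& batch,
                                long byteLimit,
                                bool strictlyEnforce) {
    if (!strictlyEnforce) {
        // Non-strict mode: once fetched, the whole batch is returned.
        return batch;
    }
    std::vector<int> kept;
    long remaining = byteLimit;
    for (int size : batch) {
        remaining -= size;
        kept.push_back(size); // the entry that crosses the limit is included
        if (remaining <= 0) {
            break; // strict mode: discard the rest of the fetched records
        }
    }
    return kept;
}
```

For example, with entries of 100, 200, 300, and 400 bytes and a 250-byte limit, strict mode keeps only the first two entries, while non-strict mode returns all four.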
joshua 20221111-061921-haofu-18a7457297fa0065
Code-Reviewer Section
The general pull request guidelines can be found here.
Please check each of the following things and check all boxes before accepting a PR.
For Release-Branches
If this PR is made against a release-branch, please also check the following:
release-branch or main if this is the youngest branch)