Skip to content

Conversation

@halfprice
Copy link
Contributor

@halfprice halfprice commented Nov 1, 2022

As @sfc-gh-ajbeamon found out, batching empty peek may lost pop generated by storage server removal, so a removed storage server may still live after the storage server removal.

This is not an issue when PEEK_BATCHING_EMPTY_MSG is turned off.

Also add some assertions to catch the invariant that when replying TLogPeek, the begin version must be greater than the popped version.

500K simulation test: 20221102-060846-zhewu_7.1_fix-604d5aec66036f65 compressed=True data_size=31702403 duration=26141298 ended=500000 fail_fast=10 max_runs=500000 pass=500000 priority=100 remaining=0 runtime=2:52:57 sanity=False started=501434 stopped=20221102-090143 submitted=20221102-060846 timeout=5400 username=zhewu_7.1_fix

Code-Reviewer Section

The general pull request guidelines can be found here.

Please check each of the following things and check all boxes before accepting a PR.

  • The PR has a description, explaining both the problem and the solution.
  • The description mentions which forms of testing were done and the testing seems reasonable.
  • Every function/class/actor that was touched is reasonably well documented.

For Release-Branches

If this PR is made against a release-branch, please also check the following:

  • This change/bugfix is a cherry-pick from the next younger branch (younger release-branch or main if this is the youngest branch)
  • There is a good reason why this PR needs to go into a release branch and this reason is documented (either in the description above or in a linked GitHub issue)

@halfprice halfprice marked this pull request as ready for review November 1, 2022 23:07
@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr on Linux CentOS 7

  • Commit ID: 88530f4
  • Duration 0:03:52
  • Result: ❌ FAILED
  • Error: Error while executing command: if [[ $(git diff --shortstat 2> /dev/null | tail -n1) == "" ]]; then echo "CODE FORMAT CLEAN"; else echo "CODE FORMAT NOT CLEAN"; echo; echo "THE FOLLOWING FILES NEED TO BE FORMATTED"; echo; git ls-files -m; echo; exit 1; fi. Reason: exit status 1
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-clang on Linux CentOS 7

  • Commit ID: 88530f4
  • Duration 0:04:47
  • Result: ❌ FAILED
  • Error: Error while executing command: if [[ $(git diff --shortstat 2> /dev/null | tail -n1) == "" ]]; then echo "CODE FORMAT CLEAN"; else echo "CODE FORMAT NOT CLEAN"; echo; echo "THE FOLLOWING FILES NEED TO BE FORMATTED"; echo; git ls-files -m; echo; exit 1; fi. Reason: exit status 1
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

  • Commit ID: 88530f4
  • Duration 0:06:05
  • Result: ❌ FAILED
  • Error: Error while executing command: if [[ $(git diff --shortstat 2> /dev/null | tail -n1) == "" ]]; then echo "CODE FORMAT CLEAN"; else echo "CODE FORMAT NOT CLEAN"; echo; echo "THE FOLLOWING FILES NEED TO BE FORMATTED"; echo; git ls-files -m; echo; exit 1; fi. Reason: exit status 1
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

  • Commit ID: 2f25e2e
  • Duration 0:08:45
  • Result: ❌ FAILED
  • Error: Error while executing command: if [[ $(git diff --shortstat 2> /dev/null | tail -n1) == "" ]]; then echo "CODE FORMAT CLEAN"; else echo "CODE FORMAT NOT CLEAN"; echo; echo "THE FOLLOWING FILES NEED TO BE FORMATTED"; echo; git ls-files -m; echo; exit 1; fi. Reason: exit status 1
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-clang on Linux CentOS 7

  • Commit ID: 2f25e2e
  • Duration 0:09:31
  • Result: ❌ FAILED
  • Error: Error while executing command: if [[ $(git diff --shortstat 2> /dev/null | tail -n1) == "" ]]; then echo "CODE FORMAT CLEAN"; else echo "CODE FORMAT NOT CLEAN"; echo; echo "THE FOLLOWING FILES NEED TO BE FORMATTED"; echo; git ls-files -m; echo; exit 1; fi. Reason: exit status 1
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr on Linux CentOS 7

  • Commit ID: 2f25e2e
  • Duration 0:10:42
  • Result: ❌ FAILED
  • Error: Error while executing command: if [[ $(git diff --shortstat 2> /dev/null | tail -n1) == "" ]]; then echo "CODE FORMAT CLEAN"; else echo "CODE FORMAT NOT CLEAN"; echo; echo "THE FOLLOWING FILES NEED TO BE FORMATTED"; echo; git ls-files -m; echo; exit 1; fi. Reason: exit status 1
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)


state Version endVersion;
state bool onlySpilled;
ASSERT_WE_THINK(reqBegin >= poppedVersion(logData, reqTag));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to use ASSERT instead? ASSERT_WE_THINK is noop in release build. Or, if this conditions triggers, return an error instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea here is that we compute latest popped version using poppedVersion() instead of reading from (cached) poppedVer. So I want to avoid this computation in prod.

Also, I feel this should be an invarant rather than an error, so it's better to be caught in simulation.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is still going to be computed in non-simulation, it just won't trigger the assert. If you want to prevent it from running at all, you would need to do something like:

if (g_network->isSimulated()) { ASSERT(...); }

Also, I think this condition would warrant failing the tlog since we know it shouldn't be happening and if it does it is dangerous, but if it's too expensive to compute then we don't have to have it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

poppedVersion() only does a map lookup, so it should be cheap. I am fine to keep as is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous block handles the case for poppedVer > reqBegin, so it seems this assertion always true after all, when ENABLE_VERSION_VECTOR is not enabled.

Copy link
Collaborator

@sfc-gh-abeamon sfc-gh-abeamon Nov 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's also true when ENABLE_VERSION_VECTOR is enabled since we look up and set poppedVer at the end of that block. I'm in favor of just changing this to the ASSERT then if this is inexpensive, otherwise you can change it to the formulation I described above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All, good suggestion. I change it to ASSERT.

@fdb-windows-ci
Copy link
Collaborator

Doxense CI Report for Windows 10

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr on Linux CentOS 7

  • Commit ID: cc28310
  • Duration 0:04:03
  • Result: ❌ FAILED
  • Error: Error while executing command: if [[ $(git diff --shortstat 2> /dev/null | tail -n1) == "" ]]; then echo "CODE FORMAT CLEAN"; else echo "CODE FORMAT NOT CLEAN"; echo; echo "THE FOLLOWING FILES NEED TO BE FORMATTED"; echo; git ls-files -m; echo; exit 1; fi. Reason: exit status 1
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-clang on Linux CentOS 7

  • Commit ID: cc28310
  • Duration 0:04:50
  • Result: ❌ FAILED
  • Error: Error while executing command: if [[ $(git diff --shortstat 2> /dev/null | tail -n1) == "" ]]; then echo "CODE FORMAT CLEAN"; else echo "CODE FORMAT NOT CLEAN"; echo; echo "THE FOLLOWING FILES NEED TO BE FORMATTED"; echo; git ls-files -m; echo; exit 1; fi. Reason: exit status 1
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-macos-m1 on macOS BigSur 11.5.2

  • Commit ID: cc28310
  • Duration 0:04:59
  • Result: ❌ FAILED
  • Error: Error while executing command: if [[ $(git diff --shortstat 2> /dev/null | tail -n1) == "" ]]; then echo "CODE FORMAT CLEAN"; else echo "CODE FORMAT NOT CLEAN"; echo; echo "THE FOLLOWING FILES NEED TO BE FORMATTED"; echo; git ls-files -m; echo; exit 1; fi. Reason: exit status 1
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-macos on macOS Monterey 12.x

  • Commit ID: cc28310
  • Duration 0:05:05
  • Result: ❌ FAILED
  • Error: Error while executing command: if [[ $(git diff --shortstat 2> /dev/null | tail -n1) == "" ]]; then echo "CODE FORMAT CLEAN"; else echo "CODE FORMAT NOT CLEAN"; echo; echo "THE FOLLOWING FILES NEED TO BE FORMATTED"; echo; git ls-files -m; echo; exit 1; fi. Reason: exit status 1
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

  • Commit ID: cc28310
  • Duration 0:05:09
  • Result: ❌ FAILED
  • Error: Error while executing command: if [[ $(git diff --shortstat 2> /dev/null | tail -n1) == "" ]]; then echo "CODE FORMAT CLEAN"; else echo "CODE FORMAT NOT CLEAN"; echo; echo "THE FOLLOWING FILES NEED TO BE FORMATTED"; echo; git ls-files -m; echo; exit 1; fi. Reason: exit status 1
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

@fdb-windows-ci
Copy link
Collaborator

Doxense CI Report for Windows 10

@halfprice halfprice force-pushed the zhewu/fix-batch-peek-pop-race branch from cc28310 to 1866c57 Compare November 2, 2022 05:41
@fdb-windows-ci
Copy link
Collaborator

Doxense CI Report for Windows 10

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-macos-m1 on macOS BigSur 11.5.2

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-macos on macOS Monterey 12.x

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-clang on Linux CentOS 7

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr on Linux CentOS 7

@halfprice halfprice requested a review from jzhou77 November 2, 2022 06:54
@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

  • Commit ID: 1866c57
  • Duration 1:45:21
  • Result: ❌ FAILED
  • Error: Error while executing command: if $(grep -q -- "--- FAIL:" ${CODEBUILD_SRC_DIR}/fdb-kubernetes-tests/logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" ${CODEBUILD_SRC_DIR}/fdb-kubernetes-tests/logs/*.log; exit 1; fi. Reason: exit status 1
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

jzhou77
jzhou77 previously approved these changes Nov 2, 2022
@halfprice halfprice requested a review from jzhou77 November 2, 2022 17:14
@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-clang on Linux CentOS 7

  • Commit ID: 9aa56ef
  • Duration 0:05:26
  • Result: ❌ FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all packages strip_targets. Reason: exit status 1
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

  • Commit ID: 9aa56ef
  • Duration 0:07:28
  • Result: ❌ FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all packages strip_targets. Reason: exit status 1
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr on Linux CentOS 7

  • Commit ID: 9aa56ef
  • Duration 0:07:33
  • Result: ❌ FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all packages strip_targets. Reason: exit status 1
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

…version must be greater than latest pop version
@halfprice halfprice force-pushed the zhewu/fix-batch-peek-pop-race branch from 9aa56ef to 2b7ed1d Compare November 2, 2022 17:22
@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-clang on Linux CentOS 7

@fdb-windows-ci
Copy link
Collaborator

Doxense CI Report for Windows 10

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr on Linux CentOS 7

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

@jzhou77 jzhou77 merged commit a9b8a4b into apple:release-7.1 Nov 2, 2022
@halfprice halfprice deleted the zhewu/fix-batch-peek-pop-race branch February 20, 2023 23:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants