Skip to content

Conversation

@dlambrig
Copy link
Contributor

@dlambrig dlambrig commented Jul 20, 2022

Add new counter to track when the commit version for an SS is not taken from the version vector. When version vector is enabled there may be a performance hit in that situation. Performance diagnosis is harder without it. The counter should normally be 0.

Joshua:
20220728-195222-henrylambright-45953926d0fe9fe5

Code-Reviewer Section

The general guidelines can be found here.

Please check each of the following things and check all boxes before accepting a PR.

  • The PR has a description, explaining both the problem and the solution.
  • The description mentions which forms of testing were done and the testing seems reasonable.
  • Every function/class/actor that was touched is reasonably well documented.

For Release-Branches

If this PR is made against a release-branch, please also check the following:

  • This change/bugfix is a cherry-pick from the next younger branch (younger release-branch or main if this is the youngest branch)
  • There is a good reason why this PR needs to go into a release branch and this reason is documented (either in the description above or in a linked GitHub issue)

@dlambrig dlambrig requested review from jzhou77 and sbodagala July 20, 2022 22:38
}

if (ssVersionVectorCache.getMaxVersion() != invalidVersion && readVersion > ssVersionVectorCache.getMaxVersion()) {
if (readVersion > ssVersionVectorCache.getMaxVersion()) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this check was redundant per line 241 above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was my bad, actually. I should have removed this redundant check as part of commit c74710f.

@fdb-windows-ci
Copy link
Collaborator

Doxense CI Report for Windows 10

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr on Linux CentOS 7

  • Commit ID: 82624e6
  • Duration 0:59:06
  • Result: ❌ FAILED
  • Error: Error while executing command: if python3 -m joshua.joshua list --stopped | grep ${ENSEMBLE_ID} | grep -q 'pass=10[0-9][0-9][0-9]'; then echo PASS; else echo FAIL && exit 1; fi. Reason: exit status 1
  • Build Logs (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

  • Commit ID: 82624e6
  • Duration 4:17:50
  • Result: ❌ FAILED
  • Error: Build has timed out.
  • Build Logs (available for 30 days)

@dlambrig dlambrig marked this pull request as ready for review July 21, 2022 14:50
@dlambrig dlambrig requested a review from jzhou77 July 21, 2022 14:50
@dlambrig dlambrig self-assigned this Jul 21, 2022
@fdb-windows-ci
Copy link
Collaborator

Doxense CI Report for Windows 10

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-macos on macOS BigSur 11.5.2

  • Commit ID: b01e8b8
  • Duration 0:42:35
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Logs (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr on Linux CentOS 7

  • Commit ID: b01e8b8
  • Duration 1:01:17
  • Result: ❌ FAILED
  • Error: Error while executing command: if python3 -m joshua.joshua list --stopped | grep ${ENSEMBLE_ID} | grep -q 'pass=10[0-9][0-9][0-9]'; then echo PASS; else echo FAIL && exit 1; fi. Reason: exit status 1
  • Build Logs (available for 30 days)

if (!updatedVersionMap) {
TraceEvent(SevDebug, "CommitVersionNotFoundForSS")
.detail("InSSIDMap", ssidTagMapping.find(uid) != ssidTagMapping.end() ? 1 : 0)
.detail("InSSIDMap", iter != ssidTagMapping.end() ? 1 : 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am concerned about the overhead that would be imposed by this event - Is this trace event really needed? What additional actionable information does this provide in production?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we only log SevInfo and higher in production. There is a knob controlling the logging level.

Copy link
Contributor Author

@dlambrig dlambrig Jul 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is SevDebug, so off by default. I think it is useful to have if we see a problem in ckperf that is hard to reproduce elsewhere. In that case we may wish to change the trace severity knob to Debug level (this seems to be a common code pattern in fdb), and can find which of the 3 conditionals is failing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Would you like to log the contents of the version vector too (this may help with identifying special scenarios like post-recovery at which point the version vector is going to be empty)?

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

  • Commit ID: b01e8b8
  • Duration 4:16:47
  • Result: ❌ FAILED
  • Error: Build has timed out.
  • Build Logs (available for 30 days)

jzhou77
jzhou77 previously approved these changes Jul 22, 2022
if (!updatedVersionMap) {
TraceEvent(SevDebug, "CommitVersionNotFoundForSS")
.detail("InSSIDMap", ssidTagMapping.find(uid) != ssidTagMapping.end() ? 1 : 0)
.detail("InSSIDMap", iter != ssidTagMapping.end() ? 1 : 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we only log SevInfo and higher in production. There is a knob controlling the logging level.

sbodagala
sbodagala previously approved these changes Jul 22, 2022
@dlambrig dlambrig dismissed stale reviews from sbodagala and jzhou77 via 850a59b July 22, 2022 14:42
@fdb-windows-ci
Copy link
Collaborator

Doxense CI Report for Windows 10

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-macos on macOS BigSur 11.5.2

  • Commit ID: 850a59b
  • Duration 0:42:34
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Logs (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr on Linux CentOS 7

  • Commit ID: 850a59b
  • Duration 1:00:03
  • Result: ❌ FAILED
  • Error: Error while executing command: if python3 -m joshua.joshua list --stopped | grep ${ENSEMBLE_ID} | grep -q 'pass=10[0-9][0-9][0-9]'; then echo PASS; else echo FAIL && exit 1; fi. Reason: exit status 1
  • Build Logs (available for 30 days)

sbodagala
sbodagala previously approved these changes Jul 22, 2022
.detail("Tag", tag)
.detail("CommitVersion", commitVersion)
.detail("ReadVersion", readVersion)
.detail("VersionVector", ssVersionVectorCache.toString());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be large when VV has many entries and the value may be truncated. So probably need to set the max field and event length.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will use settings I see many places elsewhere. .setMaxEventLength(11000) and .setMaxFieldLength(10000)

jzhou77
jzhou77 previously approved these changes Jul 22, 2022
@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

  • Commit ID: 850a59b
  • Duration 4:17:36
  • Result: ❌ FAILED
  • Error: Build has timed out.
  • Build Logs (available for 30 days)

@dlambrig dlambrig marked this pull request as draft July 22, 2022 20:08
@dlambrig
Copy link
Contributor Author

There is a bug in vv upgrade/downgrade, putting into draft until that is resolved

@dlambrig
Copy link
Contributor Author

dlambrig commented Jul 28, 2022

There is a bug in vv upgrade/downgrade, putting into draft until that is resolved

The problem was the new event occurred too many times and TracedTooManyLines was triggered, because the readVersion = commit Version is a common case and should not be logged.

@dlambrig dlambrig dismissed stale reviews from jzhou77 and sbodagala via 42c23a8 July 28, 2022 20:06
@fdb-windows-ci
Copy link
Collaborator

Doxense CI Report for Windows 10

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-macos on macOS BigSur 11.5.2

  • Commit ID: 42c23a8
  • Duration 0:44:08
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Logs (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr on Linux CentOS 7

  • Commit ID: 42c23a8
  • Duration 1:03:57
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Logs (available for 30 days)

@dlambrig dlambrig marked this pull request as ready for review July 28, 2022 21:19
@dlambrig dlambrig requested a review from jzhou77 July 28, 2022 21:20
@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

  • Commit ID: 42c23a8
  • Duration 1:29:19
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Logs (available for 30 days)

@dlambrig dlambrig merged commit 83d2351 into main Jul 28, 2022
@dlambrig dlambrig deleted the clientCounter branch July 28, 2022 22:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants