-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Add trace event for role recruitment by CC #5370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add trace event for role recruitment by CC #5370
Conversation
AWS CodeBuild CI Report
|
| return Void(); | ||
| } else if (e.code() == error_code_operation_failed || e.code() == error_code_no_more_servers) { | ||
| // recruitment not good enough, try again | ||
| TraceEvent("RecruitFromConfigurationRetry", self->id).error(e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's also add whether goodRecruitmentTime is ready or not in the traceevent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And also how long until it is ready.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I meant was to add
.detail("GoodRecruitmentTimeReady", goodRecruitmentTime.isReady())
Regarding And also how long until it is ready., I was thinking whether we can print something that how long until the goodRecruitmentTime is ready (the amount of time left in the delay), but looking at Future class, I don't see a way to actually get this info, so you can ignore it for now.
AWS CodeBuild CI Report
|
AWS CodeBuild CI Report
|
| return Void(); | ||
| } else if (e.code() == error_code_operation_failed || e.code() == error_code_no_more_servers) { | ||
| // recruitment not good enough, try again | ||
| TraceEvent("RecruitFromConfigurationRetry", self->id).error(e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I meant was to add
.detail("GoodRecruitmentTimeReady", goodRecruitmentTime.isReady())
Regarding And also how long until it is ready., I was thinking whether we can print something that how long until the goodRecruitmentTime is ready (the amount of time left in the delay), but looking at Future class, I don't see a way to actually get this info, so you can ignore it for now.
| try { | ||
| auto rep = self->findWorkersForConfiguration(req); | ||
| req.reply.send(rep); | ||
| TraceEvent("RecruitFromConfigurationDone", self->id).detail("WaitTime", now() - startTime); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can remove this. See my other comment.
AWS CodeBuild CI Report
|
AWS CodeBuild CI Report
|
In some long recovery, the recruiting_transaction_servers step takes abnormally long time.
We want to understand why the recruiting_transaction_servers step takes long time to finish.
A starting point of root cause analysis is to see whether (1) CC repeatedly retry the recruitment; or (2) the recruitment is delay to backend; or (3) some CC error occurs and CC dies.
For case (2) and (3), we already have RecruitFromConfigurationNotAvailable event and RecruitFromConfigurationError event to decide the case.
For case (1), we add RecruitFromConfigurationRetry event by this PR.
The appearance of this new event should be rare since (1) the event happens when no enough available servers in recruitment and (2) the retry (not in backend) lasts at most 1 second (set by WAIT_FOR_GOOD_RECRUITMENT_DELAY).
Passed Joshua correctness-7.1.0 test: 20210824-013255-zhewang-1f4960baeba2f8f6
Code-Reviewer Section
The general guidelines can be found here.
Please check each of the following things and check all boxes before accepting a PR.
For Release-Branches
If this PR is made against a release-branch, please also check the following:
release-branchormasterif this is the youngest branch)