-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Overall framework and first buggify entries #1417
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
fdbclient/NativeAPI.actor.cpp
Outdated
| throw invalid_option_value(); | ||
| } | ||
| break; | ||
| case FDBNetworkOptions::CLIENTBUGGIFY_ENABLE: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove the CLIENTBUGGIFY_DISABLE option and simply default the CLIENTBUGGIFY_ENABLE to false? This is totally non-blocking though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question - I don't know what makes more sense and I don't have strong opinions. My line of thinking was that you might want to be able to enable and disable client buggify multiple times during a single run. But that might be a rare enough use-case so that we might not want to have more options for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm personally ok with having both options, but if anybody disagrees we should have that discussion now rather than try to remove the option from our API later.
Anybody else want to weigh in? @alecgrieser @dongxinEric @alexmiller-apple etc.
|
@mpilman Thanks for working on it. I was thinking of doing something similar in Doc Layer transaction abstraction. But, this is much better and generic. I didn't do code review, but the concept looks good to me. Couple of comments
|
|
Thanks, I'll continue with this then. To answer your question:
|
ajbeamon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another thing we could buggify (though it needn't be part of this PR necessarily) is cluster upgrades via the multi-version client. The idea here would be to cause the multi-version client to "switch" from the current client to the same one while triggering all the usual behavior of failing outstanding requests, etc.
What kind of errors might |
Good point. |
amouehsan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this. This looks great to me. Just a couple of questions for you.
775baf3 to
32585e5
Compare
alecgrieser
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think some of these were discussed in real life, but here are my comments for completeness.
|
Is there anything blocking this from being a regular, non-draft PR? If so, what is it? |
32585e5 to
6ea7571
Compare
fdbclient/NativeAPI.actor.cpp
Outdated
| case FDBNetworkOptions::CLIENT_BUGGIFY_SECTION_ACTIVATED_PROBABILITY: | ||
| validateOptionValue(value, true); | ||
| clearBuggifySections(BuggifyType::Client); | ||
| P_BUGGIFIED_SECTION_ACTIVATED[int(BuggifyType::Client)] = 100.0/double(extractIntOption(value, 0, 100)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be value/100.0 rather than 100.0/value? The same question applies to CLIENT_BUGGIFY_SECTION_FIRED_PROBABILITY as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yes, of course. This is an embarrassing one
fdbclient/vexillographer/fdb.options
Outdated
| description="" /> | ||
| <Option name="client_buggify_section_activated_probability" code="82" | ||
| paramType="Int" paramDescription="probability expressed as a percentage between 0 and 100" | ||
| description="Set the probability of a CLIENT_BUGGIFY section being active for the current execution. Only applies to code paths first traversed AFTER this option is changed." /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this true? It looks like the buggified sections get cleared when this option is set. That leads me to another thought, which is that if it's the case that buggified sections can change during a run, we may want to note that somewhere in case someone tries to write a buggify statement that depends on the section being stable during a run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh right, this is not true anymore (it used to be but I later introduced the whole clear thingy as I thought it will make it easier to understand).
For your second thought: the normal BUGGIFY which we usually use should be stable, this only applies to CLIENT_BUGGIFY. But nonetheless it would be good to keep this in mind. Where do you want me to document this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good question, and I'm not sure there's any perfect place to put it. My best suggestion would be to note this as a comment on the macros CLIENT_BUGGIFY* in NativeAPI.actor.h.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the comment now, it looks good. Don't forget to also update the description of this option to reflect the new behavior of clearing activated sections.
Co-Authored-By: A.J. Beamon <[email protected]>
Co-Authored-By: A.J. Beamon <[email protected]>
Co-Authored-By: A.J. Beamon <[email protected]>
Co-Authored-By: A.J. Beamon <[email protected]>
I just noticed this previous comment, and you had mentioned subsequently that you would add some of these (though for |
|
Good point - I missed the |
Co-Authored-By: A.J. Beamon <[email protected]>
This PR will add a special client side buggify. The overall idea is that a user can enable this client-side buggify to test client libraries and layers.
If enabled, the client will start throwing errors with a configurable probability. This way an application or layer implementer can test that her exception handling works correctly.