roachtest: randomly run with runtime assertions by default#111949
roachtest: randomly run with runtime assertions by default#111949craig[bot] merged 2 commits intocockroachdb:masterfrom
Conversation
49bdf6a to
6a0b6e0
Compare
01c896e to
04c9709
Compare
pkg/cmd/roachtest/tests/kvbench.go
Outdated
| Suites: registry.ManualOnly, | ||
| Tags: registry.Tags("manual"), | ||
| Owner: registry.OwnerKV, | ||
| Benchmark: true, // TODO: Comments imply this is a benchmark test but this was not set |
There was a problem hiding this comment.
The test name and comments imply this is a benchmark test so I assume the missing benchmark spec is a mistake but if there is a reason why it shouldn't be set let me know.
04c9709 to
1d6e2db
Compare
edbf21b to
09f34ee
Compare
09f34ee to
05598c3
Compare
pkg/cmd/roachtest/test_impl.go
Outdated
| cockroach string // path to main cockroach binary | ||
| cockroachShort string // path to cockroach-short binary compiled with --crdb_test build tag | ||
| cockroach string // path to main cockroach binary | ||
| cockroachShort string // path to cockroach-short binary compiled with --crdb_test build tag |
There was a problem hiding this comment.
Nit: cockroachAssertionsEnabled (or similar, it seems elsewhere we use cockroachEA for "enabled assertions") is a better name than cockroachShort. Yes, the comment explains the real difference between these two things, but everyone sees the variable name, not everyone sees the comment.
| roachtest) | ||
| # Roachtest binary. | ||
| bazel_args=(//pkg/cmd/roachtest) | ||
| bazel_args=(//pkg/cmd/roachtest --crdb_test) |
There was a problem hiding this comment.
What benefit does this give exactly?
There was a problem hiding this comment.
Stealing from what Stan said:
Note, to fully support interop between a roachtest runner, (executing the roachtest binary), and a node, executing the cockroach binary compiled with assertions (i.e., crdb_test), the roachtest binary must also be compiled with assertions to establish the same serialization on the wire. E.g., KVNemesisSeq is used on the wire only under crdb_test builds. The corresponding (sequence) Container type is conditional on the binary being compiled under crdb_test. (See pkg/kv/kvpb/api.proto) Thus, a request serialized from the roachtest runner is compatible with the cockroach binary iff both binaries are built under crdb_test.
The solution here was to just compile all roachtest binaries with crdb_test since it touches very little code, rather than try and figure out a way to have team city selectively use roachtest binaries based on what the cockroach binary is compiled with.
I can add a comment though to make it more clear.
05598c3 to
77d45ad
Compare
3c00c4d to
10f3709
Compare
renatolabs
left a comment
There was a problem hiding this comment.
Specific comments on
kv/splittests below
👍 sounds good, the exception seems reasonable.
I want to say the connection failure was just an infra flake since I can't repro it
I think the connection failure is a red herring, the test timed out after 2h.
These are full runs, but are tests running with metamorphic builds all the time? I don't see any custom parameters to force that. Also, when I look at the logs for some tests in that #2 link, I only see the Running with runtime assertions enabled on global seed message on some tests, not all.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @DarrylWong, @herkolategan, @rickystewart, and @srosenberg)
pkg/cmd/roachtest/test_impl.go line 166 at r6 (raw file):
// CI build, but we can't assume it exists in every roachtest call. if path := t.RuntimeAssertionsCockroach(); path != "" { t.l.Printf("Runtime assertions enabled")
For instant knowledge of whether we are running with assertions enabled or not, we could add a log message to both branches.
pkg/cmd/roachtest/tests/asyncpg.go line 54 at r5 (raw file):
Previously, DarrylWong wrote…
The
asyncpgtests drops the entire database several times throughout the test which also drops session vars. So even setting the default session value like inpgjdbcisn't enough.
Got it, thanks. Would it make sense to run these tests with assertions enabled, but disabling metamorphic constants?
c58ba1a to
339f002
Compare
There was a problem hiding this comment.
I think the connection failure is a red herring, the test timed out after 2h.
Yeah after running it 5+ times I finally got it to timeout on creating indexes... Seems like an actual issue and not a flake but seems weird why it doesn't consistently fail on the same seed. Will have to investigate more.
These are full runs, but are tests running with metamorphic builds all the time?
Yeah I did it in an admittedly not very obvious way of just temporarily replacing t.StandardCockroach() with the runtime binary in the code itself partly because I was tired of typing in a flag every time. I could do another run if you want, more testing wouldn't hurt but it will probably time out.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @herkolategan, @renatolabs, @rickystewart, and @srosenberg)
pkg/cmd/roachtest/test_impl.go line 166 at r6 (raw file):
Previously, renatolabs (Renato Costa) wrote…
For instant knowledge of whether we are running with assertions enabled or not, we could add a log message to both branches.
Done.
pkg/cmd/roachtest/tests/asyncpg.go line 54 at r5 (raw file):
Previously, renatolabs (Renato Costa) wrote…
Got it, thanks. Would it make sense to run these tests with assertions enabled, but disabling metamorphic constants?
I was under the impression that just disabling metamorphic constants wasn't easily doable in roachtests since they get enabled on init. If it's simple functionality I'd be happy to try adding, but otherwise I wonder how much benefit it would be over a metamorphic exclusion list .
renatolabs
left a comment
There was a problem hiding this comment.
temporarily replacing
t.StandardCockroach()with the runtime binary in the code itself
Ah, I see it in the commit now!
Seems like an actual issue
Could be, the last we hear from the index creation is
executing declarative schema change PostCommitPhase stage 2 of 7 with 1 BackfillType op (rollback=false) for CREATE INDEX
2h before the test timed out, so it got stuck there for a really long time. We should probably let sql-foundations investigate this issue when it occurs on master, as it doesn't seem related to anything we're doing.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @DarrylWong, @herkolategan, @rickystewart, and @srosenberg)
pkg/cmd/roachtest/tests/asyncpg.go line 54 at r5 (raw file):
Previously, DarrylWong wrote…
I was under the impression that just disabling metamorphic constants wasn't easily doable in roachtests since they get enabled on init. If it's simple functionality I'd be happy to try adding, but otherwise I wonder how much benefit it would be over a metamorphic exclusion list .
You can set COCKROACH_INTERNAL_DISABLE_METAMORPHIC_TESTING=true when starting the cockroach process, and that will disable use of metamorphic constants in CRDB. Runtime checks (gated with buildutil.CrdbTestBuild) will still run.
I'm saying there's still value in running tests with these runtime checks (but without changing any constants) instead of always using "standard cockroach" in cases like asyncpg. But this can be done in follow-up work too.
339f002 to
68e5974
Compare
There was a problem hiding this comment.
We should probably let sql-foundations investigate this issue when it occurs on master,
Sounds good to me
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @herkolategan, @renatolabs, @rickystewart, and @srosenberg)
pkg/cmd/roachtest/tests/asyncpg.go line 54 at r5 (raw file):
Previously, renatolabs (Renato Costa) wrote…
You can set
COCKROACH_INTERNAL_DISABLE_METAMORPHIC_TESTING=truewhen starting the cockroach process, and that will disable use of metamorphic constants in CRDB. Runtime checks (gated withbuildutil.CrdbTestBuild) will still run.I'm saying there's still value in running tests with these runtime checks (but without changing any
constants) instead of always using "standard cockroach" in cases likeasyncpg. But this can be done in follow-up work too.
Yeah that seems easy enough. In that case, we should only be using StandardCockroach for excessive memory usage and timeouts.
Added it in for asyncpg, kv/split and copy/bank/rows=1000000 and made them all t.Cockroach() again. Initial runs seem good.
This changes the semantics of `t.Cockroach()` to use a binary with runtime assertions and metamorphic constants enabled by default. Performance tests (indicated by the benchmark TestSpec) will continue using the standard binary, without runtime assertions, as usual. This commit also opts-out other tests that cannot easily be run with runtime assertions or metamorphic constants enabled, most often due to timeouts and metamorphic constant incompatibility. Resolves cockroachdb#86678. Informs cockroachdb#94986. Epic: none Release note: None
68e5974 to
581e671
Compare
renatolabs
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @DarrylWong, @herkolategan, @rickystewart, and @srosenberg)
renatolabs
left a comment
There was a problem hiding this comment.
Something I was thinking about: I think it shouldn't be too difficult to add a "cluster param" in [1] indicating whether the run used metamorphic builds or not. I think it would be nice to give teams an instant way to know if we used metamorphic builds without even having to click through TC artifacts. Thoughts?
[1]
cockroach/pkg/cmd/roachtest/github.go
Line 207 in cedd409
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @DarrylWong, @herkolategan, @rickystewart, and @srosenberg)
Adds a label indicating that a roachtest failure was on a metamorphic build on github issue posts.
DarrylWong
left a comment
There was a problem hiding this comment.
Done, but unfortunately it's not the cleanest. The binary used is decided at runtime so it's not available through testSpec like the other params are. I had to pass it in as a separate argument which seems a little messy.
It should work but I'm happy to revert the commit and go back to the drawing board if you have any better suggestions.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @herkolategan, @renatolabs, @rickystewart, and @srosenberg)
mikeCRL
left a comment
There was a problem hiding this comment.
Confirmed no impacts to docs dependencies within the relevant files.
No, what you did is kind of what I had in mind originally. There's no elegant way to aggregate random decision made at test run-time and solving that problem is out of scope for this PR. Given the value that this brings (immediately knowing if we used metamorphic binaries), I think this is a good trade-off. |
|
TFTRs! bors r=renatolabs, mikeCRL, rickystewart, herkolategan |
|
Build failed (retrying...): |
|
Build succeeded: |
|
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from 581e671 to blathers/backport-release-23.2-111949: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 23.2.x failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
This changes the semantics of
t.Cockroach()to use a binary with runtime assertions enabled by default. Performance tests (indicated by the benchmark TestSpec) will be able to continue using the standard binary, without runtime assertions or metamorphic constants, as usual.This commit also opts-out other tests that cannot easily be run with runtime assertions or metamorphic constants enabled, most often due to timeouts or metamorphic constant incompatibility.
Resolves #86678.
Informs #94986.
Epic: none
Release note: None