Remove redundant "ok" files in core tests #17709
Conversation
✅ No release notes required
We may be losing some information without the file. It seems it was the only positive signal that the test script didn't fail silently.
So what we need is to understand where the problem is, if it happens. For that matter, these files don't really help, whereas the test framework propagates exceptions to the test outputs anyway. Sometimes it looks ugly, but then it's up to the specific test to improve the situation a bit. In the example above, nothing really changes if the "test.ok" file is being created - the log is still the same. There might be some edge cases where the "test.ok" is handled in some special manner, but I didn't notice anything interesting in that regard, and that can be handled later if needed... But if you notice any particular information we might be missing, then yes, it's worth finding a way to compensate for it.
I think that shim is just for stuff executed in hosted fsi. We also run some stuff out of proc. It's something to investigate.
Alright, so this seems to work - tests seem to be running as they used to. I don't know how much time this saves on CI, since CI times vary a lot anyway - we'll see.
@KevinRansom maybe run your eyes over this one, if any concerns come to your mind :) |
KevinRansom left a comment:
Sweet, I will be glad to see the back of these.
T-Gro left a comment:
With any change like this, the goal is not to see that "tests still pass". Rather the opposite: it is important that they can still fail :-).
What measures have been taken for all types of tests affected by this change to make sure they can still fail?
(in-proc tests, out of proc tests, tests using the exec cfg ... primitives in a longer series...)
Was the change generated with a regex/script that guaranteed the opposite clause/branch would always yield an exit 1, and was it then verified that this always leads to test failure in all forms of test execution, not just a few selected ones? E.g. for topinit/generate.fsx it is not immediately obvious. Has it been tested that it can still fail properly?
Most importantly, the tests are executed in multiple forms, and we should see evidence that each of these forms still fails the test run in an equivalent way.
In the in-process migrated FSharpSuite, this will be mainly a single place to verify - that one would be fine to demonstrate with just a few examples of failure (exit code, exception, runtime never getting to the block of code writing the file,...).
In the out-of-proc Fsharp.Suite, there are more ways in which the tests are run via the DSL (such as FSI_NETFX_STDIN, FSC_NETFX_TEST_ROUNDTRIP_AS_DLL, ...) as well as via custom primitives for checking the file in sequential code.
Typeprovider tests come to mind as an example, as they can have multiple causes of failure, and each of the causes should still lead to a visible test failure after this change.
@T-Gro thanks for the considerations. In general:
So the tests here are coming from different backgrounds. To get some diversity, let's look at a few different tests. The most common type of test is when things are just being compiled and run, like in the case of:

```fsharp
[<Fact>]
let ``subtype-langversion-checknulls`` () =
    let cfg = testConfig "core/subtype"
    fsc cfg "%s -o:test-checknulls.exe -g --checknulls" cfg.fsc_flags ["test.fsx"]
    exec cfg ("." ++ "test-checknulls.exe") ""
```

Let's put a runtime error in the target file then - the test fails. We can still achieve the same just by running the target file in fsi.
Then there are tests like these:

```fsharp
[<Fact>]
let ``SDKTests`` () =
    let cfg = testConfig "SDKTests"
    let FSharpRepositoryPath = Path.GetFullPath(__SOURCE_DIRECTORY__ ++ ".." ++ "..")
    let projectFile = cfg.Directory ++ "AllSdkTargetsTests.proj"
    exec cfg cfg.DotNetExe ($"msbuild {projectFile} /p:Configuration={cfg.BUILD_CONFIG} -property:FSharpRepositoryPath={FSharpRepositoryPath}")
```

To make this one fail, we can bug the project file.
Then there are tests like this:

```fsharp
[<Fact>]
let ``attributes-FSC_OPTIMIZED`` () = singleTestBuildAndRun "core/attributes" FSC_OPTIMIZED
```

Those are similar to the first category (let me remove the …). Then there are some negative baseline tests:
```fsharp
[<Fact>]
let ``state-machines neg-resumable-01`` () =
    let cfg = testConfig "core/state-machines"
    singleVersionedNegTest cfg "preview" "neg-resumable-01"
```

It's enough to put an extra empty line in the target file to get a mismatch between warning ranges, and the test fails. Yet others call other fun stuff, like …. If I mess up the C# code there, I still get the exception propagated through the test failure.
Exit codes were already mostly there; there were just a few places missing them, and I put the exit codes there manually. The topinit tests are a bit beyond my comprehension, but at least if I mess with a file mentioned there, I get errors.
Same here - I messed with some target files and still seem to get errors. As I say, the tests have different backgrounds here and it's not always obvious how they are supposed to pass and fail, but the above is at least some evidence. Importantly, there are these bad exit codes set up everywhere, so it's reasonable to expect that tests fail via natural xUnit means.
Conceptually, the test.ok file was the signal of test success. An exe or a script can always return 0, because it just finished its run one way or another, but that does not necessarily mean it tested what it intended to test and determined it to be ok. We had such situations reported as failures before - "script returned 0 but test.ok not created", or something like this.
Exactly, agree with @majocha. Especially if something is beyond comprehension or has a more difficult setup (e.g. a code-generated test), I don't think this should be used as a reason for modification just for the sake of changing it. Seeing the more complex scenarios, e.g. what was removed for …
The challenge here is to find a channel of communication other than the filesystem that would be as ubiquitous and would work across the whole zoo of different modes of execution that we have for these tests.
For the in-memory ones it gets easier - we can pass in a shim for the FileSystem API, it could store the result in a mutable, and the test engine can then verify it. But for those FSC/FSI/FSC_DLL_ROUNDTRIP modes as well as manual .exe invocations, an in-memory shim would not communicate it back. I actually do want to change Fsharp.Suite (the slowest CI suite) to run on multiple agents (so not multithreading, not even multi-processing, but multiple machines). Manually running the full suite from VS would still be single-threaded and slow, but at least CI times would not suffer. And if your journey to enable multithreaded parallelization could focus just on the ComponentTests suite, for example (at least for me, this is the suite I run locally most of the time), maybe the ROI would improve significantly?
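A minimal sketch of the in-memory idea above, assuming a hypothetical shim type and member names (this is not the actual FileSystem API used by the test framework):

```fsharp
// Hypothetical shim: instead of writing "test.ok" to disk, record the success
// signal in a mutable flag that the test engine can inspect after the
// in-process run completes.
type InMemoryOkSink() =
    let mutable testOkWritten = false
    member _.TestOkWritten = testOkWritten
    member _.WriteAllText(path: string, contents: string) =
        if System.IO.Path.GetFileName(path) = "test.ok" then
            testOkWritten <- true                       // capture the signal in memory
        else
            System.IO.File.WriteAllText(path, contents) // everything else goes to disk
```

A real integration would install something like this only for the hosted-fsi tests; as noted above, the out-of-proc modes would still need a different channel.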
@T-Gro, in my experiment I think the slowest one is actually ComponentTests, both in terms of physical running time and cumulative CPU time (?). Desktop FSharpSuite with no limit on multithreading I can run in 6-7 minutes.
@T-Gro so far the investment has been quite small - I've created the regex, added a few exit codes, and went through the tests, which was useful for getting a picture. I am reluctant to dig deep into all the aspects that you mention, especially with the different types of testing - we don't have this documented properly, and we don't know who uses it and how, so basically I am not entirely sure how to do that, and it would be a much bigger mental and time investment. So maybe, @majocha, we take a different strategy - instead of removing all the ok files, maybe you can come up with the particular tests that get in the way of your parallelization efforts? Then we can adjust only those, and it will be easier to reason about.
ATM I'm experimenting with an idea: since I already have stdout abstracted per test case in my experiment, I can use it instead of the filesystem. Just do printfn "TEST PASSED OK" and then check the captured stdout. This is quite general and works for me in all of the cases but one, which is really black magic: here fsi is being executed as part of a build, I believe with the target at fsharp/tests/fsharp/single-test.fs, line 302 in 767b5ec.
Is there a way to spit out the fsi output along with the rest of the build output? @KevinRansom, you probably know the most about it - is it possible?
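A rough sketch of the stdout-based signal described above; the helper names and the way stdout is captured are assumptions, not the actual test-framework code:

```fsharp
// The test script prints a marker instead of writing test.ok ...
let reportSuccess () = printfn "TEST PASSED OK"

// ... and the harness, which already captures stdout per test case in this
// experiment, checks the captured text after the run.
let assertPassed (capturedStdout: string) =
    if not (capturedStdout.Contains "TEST PASSED OK") then
        failwith "Test finished without reporting success on stdout"
```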
This is now dealt with in my PR, the way I described above - not elegantly, but it should work.
Alright then, thanks Jakub, I'll close this one :) |
Description
This is to help @majocha in his journey for test parallelization here.
We noticed the ubiquitous creation of "test.ok" files in the tests. Most of them are in tests/fsharp (the core tests) - and those are the target of this PR.
Usually the files are created just before the test sends the "exit 0" signal. After this, they are sometimes checked (sometimes not). There is no real need for this, because the test engine fails the test anyway if the exit code is different.
At the same time, the files contaminate the logs and the file system, which turns out to be one of the biggest obstacles to proper test parallelization.
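For context, a hedged sketch of the pattern being removed; the exact scripts vary per test, so this is illustrative only:

```fsharp
// Illustrative only - typical tail of a core test script before this PR:
// write a marker file, then signal success with exit code 0.
System.IO.File.WriteAllText("test.ok", "")
exit 0

// After this PR the marker write is simply dropped: success is exit code 0,
// failure is a non-zero exit code or an uncaught exception, both of which the
// test engine already turns into a failed test.
```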
Take this example, the quotesInMultipleModules tests. They do some checks and then print this "ok" file. Now, let's remove the "ok" file and break the very first assertion so that it no longer holds. The test then fails.
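As a purely hypothetical illustration (the real assertion in quotesInMultipleModules is different), a broken check in such a script surfaces like this:

```fsharp
// Hypothetical check helper: when the condition does not hold, the script
// exits with code 1, which the harness reports as a test failure - no
// "test.ok" file is needed for the failure to become visible.
let check name condition =
    if not condition then
        eprintfn "Test '%s' failed" name
        exit 1

check "first quotation assertion" (1 + 1 = 3)   // deliberately broken
```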
I tested a few more cases just as sanity checks, but in any case, these files are just artifacts of the ancient ways of testing we had here.