
Conversation

@psfinaki
Contributor

This is to help @majocha on his journey toward test parallelization here.

We noticed the ubiquitous creation of these "test.ok" files in the tests. Most of them are in tests/fsharp (the core tests) - and that is the target of this PR.

Usually the files are created just before the test sends the "exit 0" signal. After that, they are sometimes checked (and sometimes not). There is really no need for this, because the test engine would fail the test anyway if the exit code is different.
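A minimal sketch of that idiom, assuming a script that collects its failures into a list (the names here are illustrative, not taken from an actual test file):

```fsharp
// Illustrative sketch of the old idiom (not a specific test file):
// collect failures, write a marker file, then exit with the code
// that the test engine actually checks.
let failures : string list = []   // assertion failures gathered by the script

if List.isEmpty failures then
    System.IO.File.WriteAllText("test.ok", "")  // redundant marker
    exit 0                                      // the engine keys off this anyway
else
    eprintfn "Failures: %A" failures
    exit 1
```

The marker file duplicates what the exit code already conveys, which is why removing it should not change the pass/fail outcome.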

At the same time, the files contaminate the logs and the file system, which turns out to be one of the biggest obstacles to proper test parallelization.

Take the quotesInMultipleModules tests as an example. They run some checks and then print this "ok" file. Now, let's remove the "ok" file and break the very first assertion:

    let a = Module1.Test.bar()
    let b = sprintf "%A" (Module1.Test.run())

    let test1 = (a=b)

so that it's

    let test1 = (a <> b)

The test fails:

(screenshot omitted)

I tested a few more cases just as sanity checks, but in any case, the files are just artifacts of the ancient ways of testing we had here.

@github-actions
Contributor

✅ No release notes required

@majocha
Contributor

majocha commented Sep 12, 2024

Looking here:
https://github.com/dotnet/fsharp/blob/main/tests/FSharp.Test.Utilities/ScriptingShims.fsx

We may be losing some information without the file. It seems it was the only positive signal that the test script didn't fail silently.
At the moment we convert the exit 1 into an exception throw, but we don't translate exit 0 into anything that can be captured as a positive signal of success. OTOH, that's how testing works: we test for failures, not for successes.
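For illustration, a shim along those lines might look roughly like this (a hypothetical sketch; these names are not the actual ScriptingShims.fsx contents):

```fsharp
// Hypothetical shim sketch: a replacement 'exit' that a hosted script
// picks up instead of the real one. A non-zero code becomes an exception
// the host can observe; exit 0 just returns, so no positive success
// signal survives once the marker file is gone.
let exit (code: int) : unit =
    if code <> 0 then
        failwithf "Script requested exit with code %d" code
    // exit 0 falls through: success is not reported anywhere
```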

@psfinaki
Contributor Author

So what we need is to understand where the problem is when one happens. For that matter, these files don't really help, whereas the test framework propagates exceptions to the test outputs anyway. Sometimes it looks ugly, but then it's up to the specific test to improve the situation a bit.

In the example above, nothing really changes if the "test.ok" file is being created - the log is still the same. There might be some edge cases where the "test.ok" is handled in a special manner, but I didn't notice anything interesting in that regard, and that can be handled later if needed... But if you notice any particular information we might be missing, then yes, it's worth finding a way to compensate for it.

@majocha
Contributor

majocha commented Sep 12, 2024

I think that shim is just for stuff executed in hosted fsi. We also run some stuff out of proc. It's something to investigate.
Anyway, this change is for the better. We can add a better signaling mechanism than creating an "ok" file, if needed.

@psfinaki psfinaki marked this pull request as ready for review September 12, 2024 17:21
@psfinaki psfinaki requested a review from a team as a code owner September 12, 2024 17:21
@psfinaki
Contributor Author

Alright, so this seems to work; the tests seem to run as they used to.

Don't know how much time this saves for the CI since the CI times vary a lot anyway - we'll see.

@psfinaki
Contributor Author

@KevinRansom maybe run your eyes over this one, if any concerns come to your mind :)

Contributor

@KevinRansom KevinRansom left a comment


Sweet, I will be glad to see the back of these.

Member

@T-Gro T-Gro left a comment


With any change like this, it is not the goal to see that "tests still pass". Rather the opposite, it is important that they can still fail :-).

What measures have been taken for all types of tests affected by this change to make sure they can still fail?
(in-proc tests, out of proc tests, tests using the exec cfg ... primitives in a longer series...)

Was the change generated with a regex/script that guaranteed the opposite clause/branch always yields an exit 1, and then verified that it always leads to test failure in all forms of test execution, not just a few selected ones? E.g. for topinit/generate.fsx it is not immediately obvious. Has it been tested that it can still fail properly?

Most importantly, the tests are executed in multiple forms, and we should see evidence that each of these forms still fails the test run in an equivalent way.
In the in-process migrated FSharpSuite, this will be mainly a single place to verify - that one would be fine to demonstrate with just a few examples of failure (exit code, exception, runtime never getting to the block of code writing the file,...).

In the out-of-proc Fsharp.Suite, there are more ways on how the tests are run via the DSL (such as FSI_NETFX_STDIN , FSC_NETFX_TEST_ROUNDTRIP_AS_DLL,..) as well as via custom primitives for checking the file in sequential code.
Typeprovider tests come to mind as an example, as they can have multiple causes of failure, and each of the causes should still lead to a visible test failure after this change.

@psfinaki
Contributor Author

@T-Gro thanks for the considerations.

In general:

  • I did some "smoke testing" in that regard, deliberately bugging some tests and seeing them fail
  • @majocha is battle-testing the whole testing infra in Experiment, parallelize some tests #17662, where he incorporated these changes as well; tests behave as expected there too

What measures have been taken for all types of tests affected by this change to make sure they can still fail?
(in-proc tests, out of proc tests, tests using the exec cfg ... primitives in a longer series...)

So the tests here are coming from different backgrounds. To get some diversity, let's look at different tests in tests.fs.

The most common type of test is when things are just compiled and run, as in the case of subtype-langversion-checknulls:

    [<Fact>]
    let ``subtype-langversion-checknulls`` () =
        let cfg = testConfig "core/subtype"
        fsc cfg "%s -o:test-checknulls.exe -g --checknulls" cfg.fsc_flags ["test.fsx"]
        exec cfg ("." ++ "test-checknulls.exe") ""

Let's make core/subtype/test.fsx fail to compile by putting "blah" at the end. We get a failed test, as expected:

(screenshot omitted)

Let's introduce a runtime error then: change test "cenwceoiwe1" ((id |> toConverter |> fromConverter) 6 = 6) to compare against 5. This also still fails:

(screenshot omitted)

We can achieve the same by just running the target file in fsi (fsi core/subtype/test.fsx):

(screenshot omitted)


Then there are the SDKTests; those try to build some projects and run targets on them:

    [<Fact>]
    let ``SDKTests`` () =
        let cfg = testConfig "SDKTests"

        let FSharpRepositoryPath = Path.GetFullPath(__SOURCE_DIRECTORY__ ++ ".." ++ "..")

        let projectFile = cfg.Directory ++ "AllSdkTargetsTests.proj"

        exec cfg cfg.DotNetExe ($"msbuild {projectFile} /p:Configuration={cfg.BUILD_CONFIG} -property:FSharpRepositoryPath={FSharpRepositoryPath}")

To make them fail, we can bug the project file:

(screenshot omitted)


Then there are these singleTestBuildAndRun tests:

    [<Fact>]
    let ``attributes-FSC_OPTIMIZED`` () = singleTestBuildAndRun "core/attributes" FSC_OPTIMIZED

Those are similar to the first category; let me remove the Inline in this line there:

    check "vwlnwer-0wreknj3" (test2()) "Equals: [||], ExtensionMethod: [||], GetHashCode: [||], GetType: [||], ToString: [||], get_Item: [|Inline|], get_P: [||], set_Item: [|Inline|]"

Those also still fail:
(screenshot omitted)


Then there are some negative baseline tests:

    [<Fact>]
    let ``state-machines neg-resumable-01`` () =
        let cfg = testConfig "core/state-machines"
        singleVersionedNegTest cfg "preview" "neg-resumable-01"

It's enough to put an extra empty line in the target file to get a mismatch between warning ranges:

(screenshot omitted)


Yet others are calling other fun stuff like fsi or csc:

    [<Fact>]
    let events () =
        let cfg = testConfig "core/events"
        fsc cfg "%s -a -o:test.dll -g" cfg.fsc_flags ["test.fs"]
        peverify cfg "test.dll"
        csc cfg """/r:"%s" /reference:test.dll /debug+""" cfg.FSCOREDLLPATH ["testcs.cs"]
        peverify cfg "testcs.exe"
        fsi cfg "" ["test.fs"]
        exec cfg ("." ++ "testcs.exe") ""

If I mess up the C# code, I still get the exception propagated through to the test failure:

(screenshot omitted)


Was the change generated with a regex/script that guaranteed the opposite clause/branch always yields an exit 1, and then verified that it always leads to test failure in all forms of test execution, not just a few selected ones? E.g. for topinit/generate.fsx it is not immediately obvious. Has it been tested that it can still fail properly?

Exit codes were already mostly there; just a few places were missing them, so I put the exit codes there manually. The topinit tests are a bit beyond my comprehension, but at least if I mess up a file mentioned there, I get errors.

Typeprovider tests come to mind as an example, as they can have multiple causes of failure, and each of the causes should still lead to a visible test failure after this change.

Same here: I messed up some target files and still seem to get errors:

(screenshot omitted)


As I said, the tests here have different backgrounds and it's not always obvious how they are supposed to pass and fail, but the above is at least some evidence. Importantly, the failing exit codes are set up everywhere, so it's reasonable to expect that tests fail via natural xUnit means.

@majocha
Contributor

majocha commented Sep 18, 2024

Conceptually, the test.ok file was the signal of test success. An exe or a script can always return 0 because it just finished its run one way or another, but that does not necessarily mean it tested what it intended to test and determined it to be ok. We had such situations reported as failures before: "script returned 0 but test.ok not created", or something like that.
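The double check described here could be sketched roughly like this (hypothetical harness code, with `verify` as an illustrative name, not the actual suite primitives):

```fsharp
// Hypothetical harness check (not the actual suite code): require BOTH
// a zero exit code AND the marker file written by the test itself, so a
// script that "finished" without really testing anything still fails.
let verify (exitCode: int) (workDir: string) =
    if exitCode <> 0 then
        failwithf "script returned exit code %d" exitCode
    elif not (System.IO.File.Exists(System.IO.Path.Combine(workDir, "test.ok"))) then
        failwith "script returned 0 but test.ok not created"
```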

@T-Gro
Member

T-Gro commented Sep 18, 2024

Conceptually, the test.ok file was the signal of test success. An exe or a script can always return 0 because it just finished its run one way or another, but that does not necessarily mean it tested what it intended to test and determined it to be ok. We had such situations reported as failures before: "script returned 0 but test.ok not created", or something like that.

Exactly, I agree with @majocha. The ok file was an important assert which is now being removed.
There are several ways in which a test execution could fail to reach the line of code writing the file, and trying "some changes" in "some tests" does not give me confidence that this is covered. Some samples mentioned in my brief scan above were not covered, e.g. the different modes of execution defined for the FsharpSuite framework (each has different characteristics of how it is executed).

Especially if something is beyond comprehension or has a more difficult setup (e.g. a code-generated test), I don't think that should be used as a reason to modify it just for the sake of changing it. Looking at the more complex scenarios, e.g. what was removed for the splitAssembly test within the type provider tests, I fail to see this as proof; at least it is not immediately visible to me.

@majocha
Contributor

majocha commented Sep 18, 2024

The challenge here is to find a channel of communication other than the filesystem that would be as ubiquitous and would work across the whole zoo of different execution modes that we have for these tests.

@T-Gro
Member

T-Gro commented Sep 19, 2024

For the in-memory ones it gets easier: we can pass in a shim for the FileSystem API that stores the result in a mutable, which the test engine then verifies.

But for FSC/FSI/FSC_DLL_ROUNDTRIP as well as manual .exe invocations, an in-memory shim would not communicate it back.
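As a sketch of the in-memory idea, assuming a hypothetical shim type (the names are illustrative, not the actual FileSystem API surface):

```fsharp
// Hypothetical in-memory shim: intercept the marker-file write and
// record success in a mutable cell the test engine can inspect,
// instead of touching the real filesystem.
type ResultRecordingFileSystem() =
    let mutable testPassed = false
    member _.WriteAllText(path: string, contents: string) =
        if System.IO.Path.GetFileName path = "test.ok" then
            testPassed <- true            // capture the success signal
        else
            System.IO.File.WriteAllText(path, contents)
    member _.TestPassed = testPassed
```

The engine would check `TestPassed` after the hosted run, which is exactly what does not work for out-of-proc invocations, since the shim lives only in the host process.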

I actually do want to change Fsharp.Suite (the slowest CI suite) to run on multiple agents (so not multithreading, not even multi-processing, but multiple machines). Manually running the full suite from VS would still be single-threaded and slow, but at least CI times would not suffer.

And if your journey to enable multithreaded parallelization could focus just on the ComponentTests suite, for example (at least for me, this is the suite I run locally most of the time), maybe the ROI would improve significantly?

@majocha
Contributor

majocha commented Sep 19, 2024

Fsharp.Suite (slowest CI suite)

@T-Gro, in my experiment I think the slowest one is actually ComponentTests, both in terms of physical running time and cumulative CPU time (?)

I can run the desktop FSharpSuite with no limit on multithreading in 6-7 minutes:
(screenshot omitted)

ComponentTests with the same settings takes around 13 minutes:
(screenshot omitted)

@psfinaki
Contributor Author

@T-Gro so far the investment has been quite small - I've created the regex, added a few exit codes, and went through the tests, which was useful for getting a picture.

I am reluctant to dig deep into all the aspects that you mention, especially the different types of testing - we don't have this documented properly, and we don't know who uses it and how. So basically, I am not entirely sure how to do that, and it would be a much bigger mental and time investment.

So maybe, @majocha, we take a different strategy - instead of removing all the ok files, maybe you can come up with the particular tests that get in your way in your parallelization efforts? Then we can adjust only those, and it will be easier to reason about.

@majocha
Contributor

majocha commented Sep 19, 2024

ATM I'm experimenting with an idea:

Since I already have stdout abstracted per test case in my experiment, I can use it instead of the filesystem. Just do printfn "TEST PASSED OK" and then check the captured stdout.
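A rough sketch of that stdout-based signal, assuming per-test-case output capture (the helper names here are illustrative):

```fsharp
open System
open System.IO

// Capture a test body's stdout, then look for the success marker
// in the captured text instead of a test.ok file on disk.
let runCapturingStdout (testBody: unit -> unit) =
    let original = Console.Out
    use writer = new StringWriter()
    Console.SetOut writer
    try testBody () finally Console.SetOut original
    writer.ToString()

// The test script prints the marker instead of writing test.ok:
let captured = runCapturingStdout (fun () -> printfn "TEST PASSED OK")
if not (captured.Contains "TEST PASSED OK") then
    failwith "test did not report success"
```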

This is quite general and works for me in all of the cases but one, which is really black magic:

Here fsi is being executed as part of a build, I believe with the target
<UsingTask TaskName="Fsi" AssemblyFile="$(FSharpBuildAssemblyFile)" />

execBothToOut cfg directory buildOutputFile cfg.DotNetExe "build /t:RunFSharpScript"

Is there a way to spit out Fsi output along with the rest of the build output?
@KevinRansom, you probably know the most about it, is it possible?

@majocha
Contributor

majocha commented Sep 19, 2024

This is now dealt with in my PR, in the way I described above; not elegant, but it should work.

@psfinaki
Contributor Author

Alright then, thanks Jakub, I'll close this one :)
