
Conversation

@psfinaki
Contributor

This is to help @majocha on his journey toward test parallelization here.

We noticed the ubiquitous creation of these "test.ok" files in the tests. Most of them are in tests/fsharp (the core tests) - and that is the target of this PR.

Usually the files are created just before the test sends the "exit 0" signal. After that, they are sometimes checked (and sometimes not). There is really no need for this, because the test engine would fail the test anyway if the exit code is different.
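A minimal sketch of that idiom, assuming a script that collects its failures into a list (the names here are illustrative, not taken from an actual test file):

```fsharp
// Illustrative sketch of the old idiom (not a specific test file):
// collect failures, write a marker file, then exit with the code
// that the test engine actually checks.
let failures : string list = []   // assertion failures gathered by the script

if List.isEmpty failures then
    System.IO.File.WriteAllText("test.ok", "")  // redundant marker
    exit 0                                      // the engine keys off this anyway
else
    eprintfn "Failures: %A" failures
    exit 1
```

The marker file duplicates what the exit code already conveys, which is why removing it should not change the pass/fail outcome.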

At the same time, the files contaminate the logs and the file system, which turns out to be one of the biggest obstacles to proper test parallelization.

Take the quotesInMultipleModules tests as an example. They run some checks and then print this "ok" file. Now, let's remove the "ok" file and break the very first assertion:

    let a = Module1.Test.bar()
    let b = sprintf "%A" (Module1.Test.run())

    let test1 = (a=b)

so that it's

    let test1 = (a <> b)

The test fails:

(screenshot omitted)

I tested a few more cases just as sanity checks, but in any case, the files are just artifacts of the ancient ways of testing we had here.

@github-actions
Contributor

✅ No release notes required

@majocha
Contributor

majocha commented Sep 12, 2024

Looking here:
https://github.com/dotnet/fsharp/blob/main/tests/FSharp.Test.Utilities/ScriptingShims.fsx

We may be losing some information without the file. It seems it was the only positive signal that the test script didn't fail silently.
At the moment we convert the exit 1 into an exception throw, but we don't translate exit 0 into anything that can be captured as a positive signal of success. OTOH, that's how testing works: we test for failures, not for successes.
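For illustration, a shim along those lines might look roughly like this (a hypothetical sketch; these names are not the actual ScriptingShims.fsx contents):

```fsharp
// Hypothetical shim sketch: a replacement 'exit' that a hosted script
// picks up instead of the real one. A non-zero code becomes an exception
// the host can observe; exit 0 just returns, so no positive success
// signal survives once the marker file is gone.
let exit (code: int) : unit =
    if code <> 0 then
        failwithf "Script requested exit with code %d" code
    // exit 0 falls through: success is not reported anywhere
```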

@psfinaki
Contributor Author

So what we need is to understand where the problem is when one happens. For that matter, these files don't really help, whereas the test framework propagates exceptions to the test outputs anyway. Sometimes it looks ugly, but then it's up to the specific test to improve the situation a bit.

In the example above, nothing really changes if the "test.ok" file is being created - the log is still the same. There might be some edge cases where the "test.ok" is handled in a special manner, but I didn't notice anything interesting in that regard, and that can be handled later if needed... But if you notice any particular information we might be missing, then yes, it's worth finding a way to compensate for it.

@majocha
Contributor

majocha commented Sep 12, 2024

I think that shim is just for stuff executed in hosted fsi. We also run some stuff out of proc. It's something to investigate.
Anyway, this change is for the better. We can add a better signaling mechanism than creating an "ok" file, if needed.

@psfinaki psfinaki marked this pull request as ready for review September 12, 2024 17:21
@psfinaki psfinaki requested a review from a team as a code owner September 12, 2024 17:21
@psfinaki
Contributor Author

Alright, so this seems to work; the tests seem to run as they used to.

Don't know how much time this saves for the CI since the CI times vary a lot anyway - we'll see.

@psfinaki
Contributor Author

@KevinRansom maybe run your eyes over this one, if any concerns come to your mind :)

Contributor

@KevinRansom KevinRansom left a comment


Sweet, I will be glad to see the back of these.

Member

@T-Gro T-Gro left a comment


With any change like this, it is not the goal to see that "tests still pass". Rather the opposite, it is important that they can still fail :-).

What measures have been taken for all types of tests affected by this change to make sure they can still fail?
(in-proc tests, out of proc tests, tests using the exec cfg ... primitives in a longer series...)

Was the change generated with a regex/script that guaranteed the opposite clause/branch always yields an exit 1, and then verified that it always leads to test failure in all forms of test execution, not just a few selected ones? E.g. for topinit/generate.fsx it is not immediately obvious. Has it been tested that it can still fail properly?

Most importantly, the tests are executed in multiple forms, and we should see evidence that each of these forms still fails the test run in an equivalent way.
In the in-process migrated FSharpSuite, this will be mainly a single place to verify - that one would be fine to demonstrate with just a few examples of failure (exit code, exception, runtime never getting to the block of code writing the file,...).

In the out-of-proc Fsharp.Suite, there are more ways on how the tests are run via the DSL (such as FSI_NETFX_STDIN , FSC_NETFX_TEST_ROUNDTRIP_AS_DLL,..) as well as via custom primitives for checking the file in sequential code.
Typeprovider tests come to mind as an example, as they can have multiple causes of failure, and each of the causes should still lead to a visible test failure after this change.

@psfinaki
Contributor Author

@T-Gro thanks for the considerations.

In general:

  • I did some "smoke testing" in that regard, deliberately bugging some tests and seeing them fail
  • @majocha is battle-testing the whole testing infra in Experiment, parallelize some tests #17662, where he incorporated these changes as well; tests behave as expected there too

What measures have been taken for all types of tests affected by this change to make sure they can still fail?
(in-proc tests, out of proc tests, tests using the exec cfg ... primitives in a longer series...)

So the tests here are coming from different backgrounds. To get some diversity, let's look at different tests in tests.fs.

The most common type of test is when things are just compiled and run, as in the case of subtype-langversion-checknulls:

    [<Fact>]
    let ``subtype-langversion-checknulls`` () =
        let cfg = testConfig "core/subtype"
        fsc cfg "%s -o:test-checknulls.exe -g --checknulls" cfg.fsc_flags ["test.fsx"]
        exec cfg ("." ++ "test-checknulls.exe") ""

Let's make core/subtype/test.fsx fail to compile by putting "blah" at the end. We get a failed test, as expected:

(screenshot omitted)

Let's introduce a runtime error then: change test "cenwceoiwe1" ((id |> toConverter |> fromConverter) 6 = 6) to compare against 5. This also still fails:

(screenshot omitted)

We can achieve the same by just running the target file in fsi (fsi core/subtype/test.fsx):

(screenshot omitted)


Then there are the SDKTests; those try to build some projects and run targets on them:

    [<Fact>]
    let ``SDKTests`` () =
        let cfg = testConfig "SDKTests"

        let FSharpRepositoryPath = Path.GetFullPath(__SOURCE_DIRECTORY__ ++ ".." ++ "..")

        let projectFile = cfg.Directory ++ "AllSdkTargetsTests.proj"

        exec cfg cfg.DotNetExe ($"msbuild {projectFile} /p:Configuration={cfg.BUILD_CONFIG} -property:FSharpRepositoryPath={FSharpRepositoryPath}")

To make them fail, we can bug the project file:

(screenshot omitted)


Then there are these singleTestBuildAndRun tests:

    [<Fact>]
    let ``attributes-FSC_OPTIMIZED`` () = singleTestBuildAndRun "core/attributes" FSC_OPTIMIZED

Those are similar to the first category; let me remove the Inline in this line there:

    check "vwlnwer-0wreknj3" (test2()) "Equals: [||], ExtensionMethod: [||], GetHashCode: [||], GetType: [||], ToString: [||], get_Item: [|Inline|], get_P: [||], set_Item: [|Inline|]"

Those also still fail:
(screenshot omitted)


Then there are some negative baseline tests:

    [<Fact>]
    let ``state-machines neg-resumable-01`` () =
        let cfg = testConfig "core/state-machines"
        singleVersionedNegTest cfg "preview" "neg-resumable-01"

It's enough to put an extra empty line in the target file to get a mismatch between warning ranges:

(screenshot omitted)


Yet others are calling other fun stuff like fsi or csc:

    [<Fact>]
    let events () =
        let cfg = testConfig "core/events"
        fsc cfg "%s -a -o:test.dll -g" cfg.fsc_flags ["test.fs"]
        peverify cfg "test.dll"
        csc cfg """/r:"%s" /reference:test.dll /debug+""" cfg.FSCOREDLLPATH ["testcs.cs"]
        peverify cfg "testcs.exe"
        fsi cfg "" ["test.fs"]
        exec cfg ("." ++ "testcs.exe") ""

If I mess up the C# code, I still get the exception propagated through to the test failure:

(screenshot omitted)


Was the change generated with a regex/script that guaranteed the opposite clause/branch always yields an exit 1, and then verified that it always leads to test failure in all forms of test execution, not just a few selected ones? E.g. for topinit/generate.fsx it is not immediately obvious. Has it been tested that it can still fail properly?

Exit codes were already mostly there; just a few places were missing them, so I put the exit codes there manually. The topinit tests are a bit beyond my comprehension, but at least if I mess up a file mentioned there, I get errors.

Typeprovider tests come to mind as an example, as they can have multiple causes of failure, and each of the causes should still lead to a visible test failure after this change.

Same here: I messed up some target files and still seem to get errors:

(screenshot omitted)


As I said, the tests here have different backgrounds and it's not always obvious how they are supposed to pass and fail, but the above is at least some evidence. Importantly, the failing exit codes are set up everywhere, so it's reasonable to expect that tests fail via natural xUnit means.

@majocha
Contributor

majocha commented Sep 18, 2024

Conceptually, the test.ok file was the signal of test success. An exe or a script can always return 0 because it just finished its run one way or another, but that does not necessarily mean it tested what it intended to test and determined it to be ok. We had such situations reported as failures before: "script returned 0 but test.ok not created", or something like that.
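The double check described here could be sketched roughly like this (hypothetical harness code, with `verify` as an illustrative name, not the actual suite primitives):

```fsharp
// Hypothetical harness check (not the actual suite code): require BOTH
// a zero exit code AND the marker file written by the test itself, so a
// script that "finished" without really testing anything still fails.
let verify (exitCode: int) (workDir: string) =
    if exitCode <> 0 then
        failwithf "script returned exit code %d" exitCode
    elif not (System.IO.File.Exists(System.IO.Path.Combine(workDir, "test.ok"))) then
        failwith "script returned 0 but test.ok not created"
```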

@T-Gro
Member

T-Gro commented Sep 18, 2024

Conceptually, the test.ok file was the signal of test success. An exe or a script can always return 0 because it just finished its run one way or another, but that does not necessarily mean it tested what it intended to test and determined it to be ok. We had such situations reported as failures before: "script returned 0 but test.ok not created", or something like that.

Exactly, I agree with @majocha. The ok file was an important assert which is now being removed.
There are several ways in which a test execution could fail to reach the line of code writing the file, and trying "some changes" in "some tests" does not give me confidence that this is covered. Some samples mentioned in my brief scan above were not covered, e.g. the different modes of execution defined for the FsharpSuite framework (each has different characteristics of how it is executed).

Especially if something is beyond comprehension or has a more difficult setup (e.g. a code-generated test), I don't think that should be used as a reason to modify it just for the sake of changing it. Looking at the more complex scenarios, e.g. what was removed for the splitAssembly test within the type provider tests, I fail to see this as proof; at least it is not immediately visible to me.

@majocha
Contributor

majocha commented Sep 18, 2024

The challenge here is to find a channel of communication other than the filesystem that would be as ubiquitous and would work across the whole zoo of different execution modes that we have for these tests.

@T-Gro
Member

T-Gro commented Sep 19, 2024

For the in-memory ones it gets easier: we can pass in a shim for the FileSystem API that stores the result in a mutable, which the test engine then verifies.

But for FSC/FSI/FSC_DLL_ROUNDTRIP as well as manual .exe invocations, an in-memory shim would not communicate it back.
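As a sketch of the in-memory idea, assuming a hypothetical shim type (the names are illustrative, not the actual FileSystem API surface):

```fsharp
// Hypothetical in-memory shim: intercept the marker-file write and
// record success in a mutable cell the test engine can inspect,
// instead of touching the real filesystem.
type ResultRecordingFileSystem() =
    let mutable testPassed = false
    member _.WriteAllText(path: string, contents: string) =
        if System.IO.Path.GetFileName path = "test.ok" then
            testPassed <- true            // capture the success signal
        else
            System.IO.File.WriteAllText(path, contents)
    member _.TestPassed = testPassed
```

The engine would check `TestPassed` after the hosted run, which is exactly what does not work for out-of-proc invocations, since the shim lives only in the host process.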

I actually do want to change Fsharp.Suite (the slowest CI suite) to run on multiple agents (so not multithreading, not even multi-processing, but multiple machines). Manually running the full suite from VS would still be single-threaded and slow, but at least CI times would not suffer.

And if your journey to enable multithreaded parallelization could focus just on the ComponentTests suite, for example (at least for me, this is the suite I run locally most of the time), maybe the ROI would improve significantly?

@majocha
Contributor

majocha commented Sep 19, 2024

Fsharp.Suite (slowest CI suite)

@T-Gro, in my experiment I think the slowest one is actually ComponentTests, both in terms of physical running time and cumulative CPU time (?)

I can run the desktop FSharpSuite with no limit on multithreading in 6-7 minutes:
(screenshot omitted)

ComponentTests with the same settings takes around 13 minutes:
(screenshot omitted)

@psfinaki
Contributor Author

@T-Gro so far the investment has been quite small - I've created the regex, added a few exit codes, and went through the tests, which was useful for getting a picture.

I am reluctant to dig deep into all the aspects that you mention, especially the different types of testing - we don't have this documented properly, and we don't know who uses it and how. So basically, I am not entirely sure how to do that, and it would be a much bigger mental and time investment.

So maybe, @majocha, we take a different strategy - instead of removing all the ok files, maybe you can come up with the particular tests that get in your way in your parallelization efforts? Then we can adjust only those, and it will be easier to reason about.

@majocha
Contributor

majocha commented Sep 19, 2024

ATM I'm experimenting with an idea:

Since I already have stdout abstracted per test case in my experiment, I can use it instead of the filesystem. Just do printfn "TEST PASSED OK" and then check the captured stdout.
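A rough sketch of that stdout-based signal, assuming per-test-case output capture (the helper names here are illustrative):

```fsharp
open System
open System.IO

// Capture a test body's stdout, then look for the success marker
// in the captured text instead of a test.ok file on disk.
let runCapturingStdout (testBody: unit -> unit) =
    let original = Console.Out
    use writer = new StringWriter()
    Console.SetOut writer
    try testBody () finally Console.SetOut original
    writer.ToString()

// The test script prints the marker instead of writing test.ok:
let captured = runCapturingStdout (fun () -> printfn "TEST PASSED OK")
if not (captured.Contains "TEST PASSED OK") then
    failwith "test did not report success"
```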

This is quite general and works for me in all of the cases but one, which is really black magic:

Here fsi is being executed as part of a build, I believe with the target
<UsingTask TaskName="Fsi" AssemblyFile="$(FSharpBuildAssemblyFile)" />

execBothToOut cfg directory buildOutputFile cfg.DotNetExe "build /t:RunFSharpScript"

Is there a way to spit out Fsi output along with the rest of the build output?
@KevinRansom, you probably know the most about it, is it possible?

@majocha
Contributor

majocha commented Sep 19, 2024

This is now dealt with in my PR, in the way I described above; not elegant, but it should work.

@psfinaki
Contributor Author

Alright then, thanks Jakub, I'll close this one :)
