

@fanyang-mono
Member

@fanyang-mono fanyang-mono commented Sep 29, 2021

Temporarily disable runtime tests running on Android arm64. Will re-enable once dotnet/xharness#663 is resolved.

cc: @greenEkatherine @premun

@ghost

ghost commented Sep 29, 2021

I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label.

@krwq
Member

krwq commented Oct 7, 2021

@fanyang-mono can you please point to the issue in dotnet/runtime related to this PR? We need that to track re-enabling the test once the external issue gets fixed.

Do you think this will disable the tests corresponding to both the JIT.jit64 and baseservices.mono failures in https://dev.azure.com/dnceng/public/_build/results?buildId=1382481&view=results ?

External tracking issue: dotnet/xharness#663 (so it's easier to find that this is connected)

@SamMonoRT
Member

fyi - @premun @greenEkatherine

Member

@SamMonoRT SamMonoRT left a comment


Create a tracking issue in dotnet/runtime as a reminder to enable the tests again once the external issue is fixed.

@fanyang-mono
Member Author

> Create a tracking issue in dotnet/runtime as a reminder to enable the tests again once external issue is fixed

Created #60128

@premun
Member

premun commented Oct 7, 2021

@fanyang-mono is an installation timeout actually still happening? I thought it was just the tests timing out because they now run too long. There were failing installations because apps were left behind and the phones were full, but no actual Android installation timeout, right?

@SamMonoRT, @fanyang-mono I have made a lot of improvements to how we time-scope the commands you run in Helix and how we collect telemetry. The only problem is that Arcade hasn't been updated in runtime for a long time, so we cannot get these changes yet. I'm not sure who is responsible for that (the runtime infra team?), but if we had the newest bits (from last week at least), we could actually measure each operation and see what is happening inside the Xunit runner, which is hard to peek into.

The new updates will also let you re-run the work item on a different Helix machine, but overall I am afraid the real problem now is that your tests run too long.

@premun
Member

premun commented Oct 7, 2021

We looked at the code and there are no installation timeouts. Furthermore, it seems to happen in the same set of apps, such as baseservices.mono (more often for some, less for others). I also pulled Kusto logs and it is not tied to a specific machine; there is no set of machines that would cause this. I also see BCL running many times as many work items as you do without a single timeout.

What we know so far is that the Xunit runner calls XHarness to install the app, and that finishes immediately. Then, after 3 hours, the Xunit runner calls XHarness to run the app, which starts running, and then the Helix work item times out and the process is killed. We don't know yet what happens in those 3 hours.

I am not convinced this is actually an infrastructure issue, and there is no "external issue" currently being worked on that would fix this. It might be in the way the Xunit runner calls things, or somewhere in the Helix SDK, but I think it needs active investigation. We will dig more with @fanyang-mono.

@premun
Member

premun commented Oct 7, 2021

As for the issue @krwq linked, that one can be resolved with the retry mechanism once we have the new Arcade and make changes to request the retries.

@fanyang-mono fanyang-mono added the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Oct 7, 2021
@premun premun removed the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Oct 8, 2021
@premun premun changed the title from "Stop running runtime tests on Android arm64, until the installation timeout gets fixed" to "Stop running runtime tests on Android arm64 in main build" Oct 8, 2021
@premun premun merged commit 2880cc1 into dotnet:main Oct 8, 2021
@premun
Member

premun commented Oct 8, 2021

@fanyang-mono @SamMonoRT I merged this because the queue got heavily backed up with 3-hour work items today.
I think it's the right thing to disable the tests, I was just noting that the reason we are doing this is not what this PR claims.

We should bump Arcade in runtime (it has already been promoted and will soon be in main), open a PR to re-enable these tests, and resolve the 3-hour Xunit runner timeout in that PR.


@ghost ghost locked as resolved and limited conversation to collaborators Nov 7, 2021

4 participants