[integration_test] Enable sharding of integration test suites #101296

@Giuspepe

Use case

When running integration tests normally, the tests are executed sequentially on the connected device.
This is fine for a small integration test suite, but it does not scale well as the suite grows.
In CI/CD pipelines that contain integration tests, those tests then become the bottleneck.
For example, I have a test suite containing 10 integration tests. Running those alone already takes about 12 minutes (only running the tests on the device, excluding the time it takes to build the tests).

Sharding the integration test suite means splitting it into n shards that can run in parallel on n devices, which reduces the time needed to run the whole suite.
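To illustrate what I mean by sharding, here is a minimal Python sketch that distributes a list of tests round-robin over n shards so that shard sizes differ by at most one. The function name `split_into_shards` and the test names are hypothetical, and this is only an illustration of the idea, not how Firebase Test Lab or any other tool actually assigns tests to shards.

```python
from typing import List


def split_into_shards(tests: List[str], num_shards: int) -> List[List[str]]:
    """Distribute tests round-robin so shard sizes differ by at most one."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    shards: List[List[str]] = [[] for _ in range(num_shards)]
    for index, test in enumerate(tests):
        shards[index % num_shards].append(test)
    return shards


# Hypothetical test names; a real suite would list its own test cases.
tests = [f"integration_test_{i}" for i in range(1, 11)]
shards = split_into_shards(tests, 3)
for shard_number, shard in enumerate(shards):
    print(f"shard {shard_number}: {len(shard)} tests")
```

With the 10 tests from my example and 3 devices, no shard is empty and each device runs 3 or 4 tests instead of one device running all 10.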

Firebase Test Lab provides an option, --num-uniform-shards, to do this. See the documentation here and note that this is only available on the beta channel of the gcloud SDK.

However, this option currently does not work with Flutter tests. When using the option, one shard contains all the tests and the other shards are empty. This causes the whole test suite to fail because shards should not be empty.
The Flank team has documented this quite nicely here, with an example project containing integration tests.

Excerpt from Flank's documentation

--num-uniform-shards

Test:

gcloud alpha firebase test android run \
  --project flank-open-source \
  --type instrumentation \
  --app build/app/outputs/apk/debug/app-debug.apk \
  --test build/app/outputs/apk/androidTest/debug/app-debug-androidTest.apk \
  --num-uniform-shards=3 \
  --timeout 5m

Result:

┌─────────┬────────────────────────┬───────────────────────────────┐
│ OUTCOME │    TEST_AXIS_VALUE     │          TEST_DETAILS         │
├─────────┼────────────────────────┼───────────────────────────────┤
│ Failed  │ walleye-27-en-portrait │ 1 test cases failed, 5 passed │
└─────────┴────────────────────────┴───────────────────────────────┘

Expected behaviour

The Flutter example app contains 6 test methods, so according to the documentation, gcloud should create 3 shards, each containing 2 test methods.

Investigation results

  • One shard contains all the tests.
  • The other two shards are empty.

Conclusions

The result differs from the expected behaviour.

Proposal

I would like to be able to shard an integration test suite so it can be run faster by executing the shards in parallel on multiple devices.

More specifically, I would like to be able to use the --num-uniform-shards option of Firebase Test Lab.

Labels

  • P3 (Issues that are less important to the Flutter project)
  • a: tests ("flutter test", flutter_test, or one of our tests)
  • c: new feature (Nothing broken; request for a new capability)
  • c: proposal (A detailed proposal for a change to Flutter)
  • customer: crowd (Affects or could affect many people, though not necessarily a specific customer.)
  • f: integration_test (The flutter/packages/integration_test plugin)
  • framework (flutter/packages/flutter repository. See also f: labels.)
  • platform-android (Android applications specifically)
  • team-android (Owned by Android platform team)
  • triaged-android (Triaged by Android platform team)