Split tests to more sub-types #11402
|
This is an attempt to improve the stability of our tests. I am still trying it out, and it will likely fail at first: I had to move tests from "tests" to the "core" directory, and that will likely cause some more trouble. Still, I think it is going in the right direction - we will have far fewer tests to run per job but many more jobs to run. I think that will be fine because those jobs will generally run much, much faster, and I hope the 137 "errors" will be gone (I also moved the "backfill_job" to Heisentests for now). The next step will be to run only a subset of tests for non-core-related changes, as described in #10507. |
|
The Workflow run is cancelling this PR. Building image for the PR has been cancelled |
|
Hey everyone. I think that with this change we finally have a chance to reach stable tests. I split the tests into multiple jobs, and I think the split is a very nice one. I believe that with it we will hit resource limitations far less frequently (even though we run many more jobs). Each test job runs no longer than ~7 minutes (including pulling the image, which is built once in the separate workflow). Additionally, I've added all the test types to Breeze, so it will be really easy to reproduce every test type locally. If you have the image built locally with Breeze (also built only once), it takes up to 4-5 minutes (or less, depending on your machine type) to re-run the complete failed set for the given test type and reproduce the failure; then you can enter Breeze and reproduce the failures one by one. Additionally, when a test job fails, I print some useful instructions on how to reproduce the failed run locally. In case you have not noticed, it is now very easy to reproduce a failed build from CI using the RUN_ID from GitHub Actions (you can pull the very image that was used to run the tests). This information is now printed when tests fail, so it is immediately visible to the author and committer, and reproducing the failed tests will be a ..... BREEZE. |
|
Some of the process-related tests were flaky because of dropped connections to MySQL/Postgres, so I moved them to Quarantined. However, I have not yet seen a single transient error caused by resource problems (Exit 137), so it looks really good. Paired with the workaround I implemented for the "unknown blob" problem (#11411), we might finally be back to a reasonably stable state of tests. |
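For readers unfamiliar with "Exit 137": by shell convention, an exit status above 128 means the process was terminated by a signal, with the signal number equal to the status minus 128. 137 therefore corresponds to signal 9 (SIGKILL), which is what a process typically receives when the kernel or container runtime kills it for exceeding memory limits. The helper below is only an illustrative sketch; `describe_exit_code` is a hypothetical name and not part of Airflow or Breeze.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain a shell-style exit status (hypothetical helper, for illustration only)."""
    if code == 0:
        return "success"
    if code > 128:
        # Shell convention: status = 128 + number of the signal that killed the process.
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"failed with status {code}"


print(describe_exit_code(137))
```

On Linux this reports SIGKILL for 137, which is consistent with the resource problems described above, although the exit status alone does not prove that memory was the cause.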
We seem to have a problem with running all tests at once, most likely due to resource problems in our CI, so it makes sense to split the tests into more batches. This is not yet a full implementation of selective tests, but it goes in that direction by splitting them into Core/Providers/API/CLI tests. The full selective tests approach will be implemented as part of the apache#10507 issue. This split is possible thanks to apache#10422, which moved building the image to a separate workflow. Each image is built only once and uploaded to a shared registry, from which it is quickly downloaded rather than built by every job separately. This way we can have many more jobs, as there is very little per-job overhead before the tests start running.
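The idea of splitting one large test run into per-type batches can be sketched as below. This is only an illustration, not the logic used in this PR: the directory prefixes, the "Other" fallback group, and the function name `split_by_test_type` are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical mapping from test type to directory prefix (assumed, not taken from the PR).
TEST_TYPE_PREFIXES = {
    "API": "tests/api",
    "CLI": "tests/cli",
    "Providers": "tests/providers",
    "Core": "tests/core",
}


def split_by_test_type(paths):
    """Group test file paths into batches so each batch can run as its own CI job."""
    groups = defaultdict(list)
    for path in paths:
        for test_type, prefix in TEST_TYPE_PREFIXES.items():
            if path.startswith(prefix + "/"):
                groups[test_type].append(path)
                break
        else:
            # Anything that matches no known prefix lands in a catch-all batch.
            groups["Other"].append(path)
    return dict(groups)


batches = split_by_test_type(
    [
        "tests/core/test_a.py",
        "tests/api/test_b.py",
        "tests/providers/x/test_c.py",
        "tests/misc/test_d.py",
    ]
)
for test_type, files in batches.items():
    print(test_type, files)
```

Each resulting batch is smaller than the full suite, which is the property the split relies on: a smaller batch needs fewer resources per job, at the cost of running more jobs.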
|
Success! I got a green build. There are a few follow-up tasks: the Quarantined tests need some love, and full selective tests still need to be implemented (I am running some final tests with it). Then we might be back in the Green CI business. Looking forward to reviews, but I think it's going to be quite a game-changer. |
|
@potiuk Oh heck yes! I was about to suggest exactly this! |
|
Yes, exactly. Before, we couldn't do this because we'd have had to rebuild the image a bunch of times, but I think this will be great for reducing strain on the CI. |
|
Just look out for #11417! This will be the killer one. |
|
BTW. @dimberman -> This is where I wanted to get with CI when I joined the project ~ 2 years ago :). Pretty much ALL the work I've done with Breeze and CI was to reach this very point, where we can do this and massively speed CI up. It was maaaaaaaaany PRs to get us here :). Especially since it will now be so easy and straightforward to reproduce any failure locally. This is what I am especially happy about - when one of those jobs fails for a good reason, it's literally one command to reproduce the failed build and another to enter the container and re-run the test. It should now take just a few minutes to reproduce any failure, and we even show you in the logs how you can do it with Breeze. |
|
Nice one! |
|
I have already run several hundred of those tests without a single intermittent problem so far. I have really high hopes for this one!
We seem to have a problem with running all tests at once, most likely due to resource problems in our CI, so it makes sense to split the tests into more batches. This is not yet a full implementation of selective tests, but it goes in that direction by splitting them into Core/Providers/API/CLI tests. The full selective tests approach will be implemented as part of the #10507 issue. This split is possible thanks to #10422, which moved building the image to a separate workflow. Each image is built only once and uploaded to a shared registry, from which it is quickly downloaded rather than built by every job separately. This way we can have many more jobs, as there is very little per-job overhead before the tests start running. (cherry picked from commit 5bc5994)
We seem to have a problem with running all tests at once, most
likely due to resource problems in our CI, so it makes sense to
split the tests into more batches. This is not yet a full
implementation of selective tests, but it goes in that direction
by splitting them into Core/Providers/API/CLI tests. The full
selective tests approach will be implemented as part of the #10507 issue.
This split is possible thanks to #10422, which moved building the
image to a separate workflow. Each image is built only once and
uploaded to a shared registry, from which it is quickly downloaded
rather than built by every job separately. This way we can have
many more jobs, as there is very little per-job overhead before
the tests start running.