Skip to content

ci: make tests run in parallel#2822

Closed
georglauterbach wants to merge 6 commits intomasterfrom
tests/parallelize
Closed

ci: make tests run in parallel#2822
georglauterbach wants to merge 6 commits intomasterfrom
tests/parallelize

Conversation

@georglauterbach
Copy link
Copy Markdown
Member

@georglauterbach georglauterbach commented Oct 10, 2022

Description

This is a major step forward for our CI. This PR enables the first tests to run in parallel. The Makefile and the GH workflow file generic_test.yml were adjusted. Files under test/ were renamed. 6 files were adjusted to provide a "template" for future PRs that port even more file from running in serial to running in parallel.

The new system works in the following way: files under test/ that begin with "parallel" can be run in parallel. The number that follows is the "execution unit" that they are executed in, i.e. with which other tests they actually run in parallel. For the GH merge request job, this is equivalent to telling which job certain tests run together in. Tests that begin with "serial" run one-after-another. This is either because they actually require running serialzed (serial.setup.sh.bats) or because they were not yet adjusted.

I also added a template (parallel.2.template.bats) which serves as a template for all other tests, especially parallel tests. I will provide a commit that adds documentation about this change, and the template file will be referenced.

Size of This PR

I know, it's a big PR. I know you don't like that. But this is mostly because of the few test files I adjusted and the documentation additions.. the important changes are in Makefile and the GH Action workflow files. I think having these changes in one PR and not in separate ones (this only applied to the first PR of slowly adopting the test suite to run in parallel) is the right way.

Disclaimer

One could consider this a breaking change for tests. This change is not backwards compatible (by default!), because the --jobs x where x > 0 requires GNU parallel to be installed. On most Linux distributions, this is installed by default, but I cannot guarantee it. One can "revert" the parallelization by using PARALLEL_JOBS=1 make ... though (the default is 2).

Effect

We have an "outer" and "inner" parallelization with GH's CI. The "outer" by using matrix, which should, when tests are evenly distributed in the future, speed up running the test suite by a factor of 4. The inner parallelization comes from using BATS with the --jobs flag to run test files themselves in parallel. With --jobs 2, the speedup can be as large as 30% (on my system, maybe larger on more potent systems), with 3, which is possible on GH runners I think, the speedup should be even larger.

This should yield a theoretical speedup of 82.5% (with even distribution of tests into their respective execution units).

Type of change

  • New feature (non-breaking change which adds functionality)
  • Improvement (non-breaking change that does improve existing functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation (README.md or the documentation under docs/)
  • If necessary I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@georglauterbach georglauterbach added area/ci priority/high kind/improvement Improve an existing feature, configuration file or the documentation meta/feature freeze On hold due to upcoming release process labels Oct 10, 2022
@georglauterbach georglauterbach added this to the v12.0.0 milestone Oct 10, 2022
@georglauterbach georglauterbach self-assigned this Oct 10, 2022
@georglauterbach georglauterbach marked this pull request as draft October 10, 2022 13:43
Comment thread test/serial.setup.sh.bats
@@ -0,0 +1,150 @@
load 'test_helper/common'

# ! NEEDS SERIAL EXECTION - DO NOT RUN IN PARALLEL
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The diff for this file is misleading. Nothing changed except for this line.. but because of the renaming, git thinks this is a completely new file. Makes the diff appear large :D

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but because of the renaming, git thinks this is a completely new file

This is going to be annoying, but I'd like to request you avoid this issue by separating out commits of renaming and then a commit for file change, or if it's easier at this point the other way around.

You can probably handle this without making it a pain to address. Try this:

  1. Use a rename tool or script to remove the serial. / parallel. (or similar with the optional numbers) to revert the filename.
  2. Commit just these file renames.
  3. Commit a revert commit that does the opposite.
  4. Try rebase that revert commit (original test names to new test names) so that it's the first commit in this PR. If this works you're done.
  5. If it's not going to go smoothly, revert the existing commits (in GitKraken I can just right-click a commit and create a revert commit, I assume there is a direct git command for this). Then you can revert the revert 😆 and a rebase that should be able to drop original commits and some of the reverts that follow... that should give a clean commit history now. This is fairly easy to do with GitKraken UI for interactive rebasing a range of commits, if you've got the commits pushed here I can handle that part if you like 👍

I remember for when we imported the Github Wiki git history to merge into our git repo history here (via merge commit, which visually interleaved the commit history, but didn't rewrite the main project commit history), we made sure to separate the file renames from content edits to ensure git could follow a file to it's original and diff properly :)


I won't have time to look at this until about 8 hours or so, but I'm curious about the current naming convention choice vs using folders for serial and parallel? Any benefit in prefixing filenames with it?

Perhaps you're using that from one of the BATS ENV or scripts on what to run? (although from our end grouping by folder should be just as easy?) I know that I'd like to eventually have some folders to group tests by topics/relevance.

I was planning for CI to have several runners/jobs that would run a subset of the tests (as a different form of parallelism), a good candidate would be the SSL tests, where one of those takes a huge amount of time to iterate through permutations of RSA + ECDSA and modern / intermediate (legacy) TLS levels.

Copy link
Copy Markdown
Member Author

@georglauterbach georglauterbach Oct 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to create directories, but this caused the load directive in BATS in conjunction with test/test_helper/common.sh to throw errors because of wrong paths when sourcing / loading other scripts. Maybe a future PR can take care of this, for now, we should be fine with a simple naming scheme that is somewhat similar to DNS.


I will take care of the commit history.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-wrote history and I hope it is now easier to review.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-wrote history and I hope it is now easier to review.

Hey @georglauterbach sorry for the late response. Looking over the commit history, I can see a few files in the commits were not just renamed but added changes in that same commit, which breaks the rename detection for git blame history.

It happened in a few commits, if I have your permission I don't mind correct this history and doing a force-push?

Copy link
Copy Markdown
Member Author

@georglauterbach georglauterbach Oct 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-wrote history and I hope it is now easier to review.
It happened in a few commits, if I have your permission I don't mind correct this history and doing a force-push?

Oh, I see. gitui showed them as renaming to me (maybe I did not pay enough attention). Please, go ahead :)


I will resolve merge conflicts with master later.

Comment thread docs/content/contributing/issues-and-pull-requests.md Outdated
@github-actions
Copy link
Copy Markdown
Contributor

Documentation preview for this PR is ready! 🎉

Built with commit: c4d85ae

If `./test/bats/bin/bats --timing test/serial.*.bats` would fail, Bash
  would then try the `|| ...` part which is not desired. I therefore use
  a `if .. then .. else` statement, which solved this problem.
@georglauterbach georglauterbach removed the meta/feature freeze On hold due to upcoming release process label Oct 13, 2022
@polarathene
Copy link
Copy Markdown
Member

polarathene commented Oct 14, 2022

NOTE: To preserve the rename history (better git blame traceability), this PR will use a rebase merge, not a squash merge.

Copy link
Copy Markdown
Member

@polarathene polarathene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Makefile and a few others presently have some conflicts to resolve (due to other PRs being merged, such as introducing ClamAV test file, while this didn't have that to rename into your parallel version at the time).

When you've addressed the feedback, and if I've been granted permission to interactively rebase with a force push, I'll rewrite commit history to accommodate the file history the way I'd like it. Depending on my availability, the rebase may be delayed up until next weekend at the latest (hopefully).

assert_failure
assert_output --partial "Timed out on command"

docker rm -f "${CONTAINER_NAME}"container_has_service_running/wait_for_service
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the need for this line here?

The container should remove itself once done due to the docker run --rm?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, this was due to me testing repeatedly, and I did not want to have that many containers running on my small machine. Can be removed :)

run docker exec "${CONTAINER_NAME}" cat /tmp/marker
assert_output "${CONTAINER_NAME}"

docker rm -f "${CONTAINER_NAME}"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, the container should already remove itself after completion of the sleep?

Are you just wanting to bring it down sooner for cleanup? There is no explicit name assigned, so it should not cause any problems, the container will just run for 100 seconds and be seen in docker ps until then.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread Makefile
Comment thread Makefile
Comment thread Makefile
Comment on lines +49 to +53
test/part/%:
# part/0 => tests run in a serialized manner
# part/x where (x > 0) => tests are run in parallel
@ if [[ $* -eq 0 ]]; then ./test/bats/bin/bats --timing test/serial.*.bats; else \
./test/bats/bin/bats --timing --jobs $(PARALLEL_JOBS) test/parallel.$*.*.bats; fi
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a huge fan of the change here.

You've got a conditional to handle serial vs parallel testing, and supporting commentary to further clarify that.

Let's change this to test/part/0: and after that test/part/%, I assume it's smart enough to only match and handle numbers after 0, if not, we can just have clear commands tests-serial + tests-parallel, both called by make tests. I think that would be clearer?

Can you clarify if I understand this correctly? You are running parallel test files in parallel, but not their individual test cases within a file correct? Default parallel jobs is two, so for local users the resources required may be twice what was needed before? I think that was mostly only an issue for ClamAV which is now it's own serial test, thus probably nothing to worry about.

If a user does want to run tests all in serial, then setting PARALLEL_JOBS=1 will do the trick?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part could get some rework. Depending on your rebase I will tackle it afterwards.

The docs preview explains the new strategy better: https://pullrequest-2822--dms-doc-previews.netlify.app/contributing/general/#understanding-the-test-suite. Locally, individual test files are run in parallel, but only those that have a parallel in their name.


If a user does want to run tests all in serial, then setting PARALLEL_JOBS=1 will do the trick?

Exactly.

Comment on lines +17 to +19
strategy:
matrix:
part: [0, 1, 2, 3]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this only to support the make generate-accounts for tests below?

Is that an issue for running tests locally in this manner too? I'd need to review the Github Actions docs. It's not clear to me if you're configuring the CI here to run the matrix on the same runner, or separate runners per part. I assume it's the latter.

Thus we're splitting the testing out to 4 runners / machines instead of the present 1 runner on a single machine? And then 1 of those will run all serial tests, while the other 3 run two parallel tests at a time?

Our CI has introduced some concurrency here in addition to parallelism? (I think the technically correct term is the CI is providing parallelism, and BATS jobs is providing concurrency, but whatever 🤷‍♂️ 😛 )

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thus we're splitting the testing out to 4 runners / machines instead of the present 1 runner on a single machine? And then 1 of those will run all serial tests, while the other 3 run two parallel tests at a time?

Yes

Our CI has introduced some concurrency here in addition to parallelism?

Indeed :D I think this is not a problem though, as there are no write-after-x conflicts, because we do not write to a cache. Or de we? In the generic-test, there is no cache-to: .., so all is well?


Think the technically correct term is the CI is providing parallelism, and BATS jobs is providing concurrency

This is absolutely correct.

@polarathene
Copy link
Copy Markdown
Member

I know, it's a big PR. I know you don't like that.
I think having these changes in one PR and not in separate ones (this only applied to the first PR of slowly adopting the test suite to run in parallel) is the right way.

It's not likely to be a problem when the history is properly fixed.

Limiting the amount of tests converted for running in parallel in this PR is the correct choice and keeps this concern with diff size under control 👍


This change is not backwards compatible (by default!), because the --jobs x where x > 0 requires GNU parallel to be installed.

Technically we could probably check if parallel is available, otherwise set the ENV to 1. That can be deferred until a contributor expresses any frustration regarding it. I'm sure we're all good with just using PARALLEL_JOBS=1 make tests if it were an issue anyway 👍


- name: 'Run tests'
run: make generate-accounts tests
run: make generate-accounts test/part/${{ matrix.part }}
Copy link
Copy Markdown
Member

@casperklein casperklein Oct 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I right, that make generate-accounts is run 4 times? For each matrix.part? If so, lets change it to:

Suggested change
run: make generate-accounts test/part/${{ matrix.part }}
name: Prepare tests
run: make generate-accounts
name: Run tests
run: make test/part/${{ matrix.part }}

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will wait with this until after @polarathene has taken care of the history :)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, sorry for the delay!

It did not help that much with the review UI on Github (due to a merge commit diff I think), but the history looks like it should be preserved fine with a rebase merge now (Github blame won't likely honour it, but other git blame tooling should).

@georglauterbach
Copy link
Copy Markdown
Member Author

The Makefile and a few others presently have some conflicts to resolve (due to other PRs being merged, such as introducing ClamAV test file, while this didn't have that to rename into your parallel version at the time).

If you interactive rebase does not take care of all the conflicts I can after spare some time to resolve them too, so you don't have too much work with this :)

When you've addressed the feedback, and if I've been granted permission to interactively rebase with a force push, I'll rewrite commit history to accommodate the file history the way I'd like it. Depending on my availability, the rebase may be delayed up until next weekend at the latest (hopefully).

Go ahead and do it, sound very good to me :) And thanks a lot :D

georglauterbach added a commit that referenced this pull request Oct 15, 2022
Got some inspiration from comments in #2822 and decided to clean up the
whole `Makefile`.
@georglauterbach georglauterbach mentioned this pull request Oct 15, 2022
7 tasks
@polarathene
Copy link
Copy Markdown
Member

Go ahead and do it, sound very good to me :) And thanks a lot :D

I'll try get this done today or tomorrow if possible, but definitely by the weekend 😅

@georglauterbach
Copy link
Copy Markdown
Member Author

@polarathene if you have the time figuring out how to put the tests into subdirectories, please go ahead :) The naming scheme that I introduced is just a "workaround" for the import errors in BATS that I did not yet have time to resolve. Possibly, constructing a complete path would solve the problems.. I may be able to have a look today in the evening. I will let you know.

@georglauterbach
Copy link
Copy Markdown
Member Author

Closing in favour of #2850.

georglauterbach added a commit that referenced this pull request Oct 23, 2022
See #2822 (comment)

I also removed `CI: true` because I think GH Runners can now properly
handle colored TTY output.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/ci kind/improvement Improve an existing feature, configuration file or the documentation priority/high

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants