Refactor CI #1693

Merged: 14 commits merged on Aug 14, 2024
@Zerpet (Collaborator) commented Aug 6, 2024

Note to reviewers: remember to look at the commits in this PR and consider whether they can be squashed.

Summary Of Changes

  • Do not notify failed tests in PR
  • Add a test-ci rule branch
  • Split image building into different jobs
  • Install tools as part of unit test
  • Add permissions for steps using GCP
  • [ci] Use local builds in other jobs
  • [ci] Refactor build-test workflow
  • Fix examples
  • Update CMCTL binary
  • Use GitHub token in carvel-setup action
  • Check out code before setting up Go
  • Re-enable GChat triggers

Additional Context

We are moving away from the massive CI image (~1GB) with a catch-all for tools, towards Actions-based
workflows built from small, simple steps that set up the worker as needed and rely on the worker cache.

Another important step is moving away from our GCP project for system tests, towards local system tests
using KinD. This is important because it frees the project from depending on sponsored infrastructure
and gives it more independence to run its test suites.

There's an important fix to the examples job, which previously always passed. Now, any failure in the
examples will also fail the workflow. In PRs, we only dry-run the examples, to verify that they have
correct syntax. Once merged, the examples will run in KinD. The examples job is the heaviest, taking
about 17 minutes to run.
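The PR-versus-merge split described above can be sketched as a small Python helper. This is a hypothetical illustration, not the workflow's actual logic: it assumes each example is a directory of manifests, that the dry run is a client-side `kubectl apply --dry-run=client`, and that the job branches on the event name GitHub Actions exposes as `GITHUB_EVENT_NAME`. The helper name `examples_command` is invented for this sketch.

```python
def examples_command(event_name: str, example_dir: str) -> list[str]:
    """Build the kubectl invocation for one example directory (hypothetical helper).

    On pull requests the manifests are only validated with a client-side dry run;
    on any other event (e.g. a push to main) they are applied to the KinD cluster.
    """
    cmd = ["kubectl", "apply", "-f", example_dir]
    if event_name == "pull_request":
        # Validate only: nothing is persisted in the cluster.
        cmd.append("--dry-run=client")
    return cmd
```

A job could call it as `examples_command(os.environ.get("GITHUB_EVENT_NAME", "push"), path)`, where `path` is whatever example directory is being exercised.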

Zerpet added 11 commits August 6, 2024 12:29
The main reason is that PRs won't have the secret needed to post to the webhook URL.
In addition, someone is looking at PRs, at least the author, and
therefore the failure will be noticed.
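The notification rule in this commit amounts to a simple predicate. The Python below is a hypothetical sketch of that decision, not the workflow's code; `should_notify` and its parameters are invented here. It relies on one documented GitHub Actions behavior: workflows triggered by pull requests from forks do not receive repository secrets, so a webhook URL stored as a secret would be empty there.

```python
from typing import Optional


def should_notify(event_name: str, job_failed: bool, webhook_url: Optional[str]) -> bool:
    """Return True only when a failure should be posted to the chat webhook (hypothetical)."""
    if not job_failed:
        return False
    # PR runs are watched by their author, and fork PRs cannot read the webhook secret.
    if event_name == "pull_request":
        return False
    # An empty or missing secret means there is nowhere to post.
    return bool(webhook_url)
```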
This allows us to test CI changes in a dedicated branch, and keep the
main branch history clean.
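In a workflow file this kind of rule is normally expressed with an `on.push.branches` filter. The Python below only mirrors the intended matching, under the assumption (suggested by this PR's own branch name, `test-ci/drop-gcr`) that CI experiment branches live under a `test-ci/` prefix. `runs_full_ci` is a hypothetical name, and GitHub's glob semantics are richer than this prefix check.

```python
CI_TEST_PREFIX = "test-ci/"  # assumed naming convention for CI experiment branches


def runs_full_ci(branch: str) -> bool:
    """Return True if a push to `branch` should trigger the full workflow (hypothetical rule)."""
    return branch == "main" or branch.startswith(CI_TEST_PREFIX)
```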
This allows them to run in parallel, unblocking other jobs that depend
on the build_operator job.
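The benefit of smaller jobs shows up when the workflow is treated as a dependency graph in which each job's `needs:` list gives its predecessors. This Python sketch uses the standard library's `graphlib.TopologicalSorter` to group jobs into stages that could run concurrently. Apart from `build_operator`, the job names are hypothetical placeholders, not this repository's actual jobs.

```python
from graphlib import TopologicalSorter


def parallel_stages(needs: dict[str, set[str]]) -> list[set[str]]:
    """Group jobs into stages; every job in a stage has all of its `needs:` satisfied."""
    sorter = TopologicalSorter(needs)
    sorter.prepare()  # raises CycleError if jobs depend on each other in a loop
    stages: list[set[str]] = []
    while sorter.is_active():
        ready = set(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages
```

With `{"unit_tests": set(), "build_operator": set(), "system_tests": {"build_operator"}, "examples": {"build_operator"}, "kubectl_plugin_tests": {"build_operator"}}`, the three downstream jobs form the second stage and start as soon as `build_operator` finishes, without waiting on any other image build.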
Unfortunately, we cannot get rid of this step, because we need to
install testenv and the setup-env binary for testing.
We are pushing every commit to a registry. This seems a bit unnecessary,
especially given that we can build locally very quickly. It's more
efficient to build and export to a tarball, and then share this tarball
among other jobs.
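The build-once, share-a-tarball approach can be outlined as the commands each job would run. This is a hedged sketch: the image tag and file name are hypothetical, and moving the tarball between jobs (typically with the `actions/upload-artifact` and `actions/download-artifact` actions) is not shown. `docker save -o` writes an image to a tar archive, and `kind load image-archive` loads such an archive into the nodes of a KinD cluster without any registry.

```python
IMAGE = "rabbitmq/cluster-operator:ci"  # hypothetical local tag, never pushed
TARBALL = "operator.tar"                # hypothetical artifact file name


def producer_commands(image: str = IMAGE, tarball: str = TARBALL) -> list[list[str]]:
    """Commands for the build job: build the image locally and export it to a tarball."""
    return [
        ["docker", "build", "-t", image, "."],
        ["docker", "save", "-o", tarball, image],
    ]


def consumer_commands(cluster: str = "kind", tarball: str = TARBALL) -> list[list[str]]:
    """Commands for a downstream job: load the shared tarball into its KinD cluster."""
    return [["kind", "load", "image-archive", tarball, "--name", cluster]]
```

A non-`latest` tag matters here: with `:latest`, Kubernetes defaults to `imagePullPolicy: Always` and would try to pull the image from a registry instead of using the loaded copy.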
The GCS and manifest part is now a separate job, so it doesn't delay the
execution of other parts of the workflow. kubectl-rabbitmq plugin tests
now use the locally built image instead of building their own. This
should speed up the execution time of this job.
The examples test.sh script was not exiting when an error was returned.
Any error in the test.sh scripts should now stop the execution and
return the exit code.
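The stop-on-first-error behavior this commit restores can be sketched from the caller's side. The Python below is a hypothetical runner, not the repository's tooling: it runs each example's script in order and propagates the first non-zero exit code so the CI step fails. Inside a shell script itself, `set -e` (often `set -euo pipefail`) gives comparable behavior; the commit message does not say which mechanism the fix used.

```python
import subprocess


def run_all(commands: list[list[str]]) -> int:
    """Run each command in order; stop at the first failure and return its exit code."""
    for cmd in commands:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            # Propagate the failure instead of moving on to the next example.
            return result.returncode
    return 0
```

A CI step would then end with something like `sys.exit(run_all([["bash", p] for p in scripts]))`, where `scripts` is a hypothetical list of `test.sh` paths.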

__ Downscale vault-tls example
GitHub runners struggle to run a 3-node RabbitMQ cluster due to
resource constraints.

__ Delete obsolete example
Federation is better achieved using the Messaging Topology Operator.
Another example already covers how to set up TLS, so federation over
TLS amounts to combining the Topology Operator with the TLS example.

__ Fix external secret example
The test.sh file was missing, and this test was not excluded from the CI
runs. Added a short script to validate that the external secret is used
to seed the default user.
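A check of this kind comes down to comparing Secret contents. Kubernetes stores a Secret's `.data` values base64-encoded, so both sides must be decoded first. This Python sketch is hypothetical: it assumes both the external secret and the operator-created default-user Secret expose `username` and `password` keys, and it works on already-fetched `.data` dictionaries (for example from `kubectl get secret NAME -o json`) rather than calling the cluster.

```python
import base64


def decode_secret_data(data: dict[str, str]) -> dict[str, str]:
    """Decode the base64-encoded values of a Secret's .data map."""
    return {key: base64.b64decode(value).decode("utf-8") for key, value in data.items()}


def seeded_from_external(external: dict[str, str], default_user: dict[str, str]) -> bool:
    """True if the default-user credentials match the external secret (assumed key names)."""
    ext = decode_secret_data(external)
    got = decode_secret_data(default_user)
    return all(ext.get(k) is not None and ext.get(k) == got.get(k) for k in ("username", "password"))
```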

__ Removed system tests in GKE
They do not provide more value than local KinD system tests.

__ cmctl now waits watching the correct namespace
Previously, if the kubeconfig's current namespace did not exist,
cmctl would always fail, even when cert-manager was ready.

__ Skip mtls internode example tests

It requires a 3-node cluster, and this setup is not reliable inside a
GitHub runner, given the runner's resource constraints.

Typo in kubectl tests
Starting with cert-manager 1.15.x, the cmctl CLI and cert-manager are
shipped from different repositories, as separate software.
To avoid rate limiting from GitHub
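Authenticating calls to the GitHub API is what avoids the rate limit: unauthenticated REST requests are limited to 60 per hour per IP address, which shared runners can exhaust quickly. Per this PR's commit title, the carvel-setup action is now given a GitHub token; how the action uses it internally is not shown here. The Python below is only an illustration of the general pattern, with `github_api_request` a hypothetical helper that attaches a token (such as the workflow's `GITHUB_TOKEN`) as a bearer credential.

```python
import urllib.request
from typing import Optional


def github_api_request(url: str, token: Optional[str]) -> urllib.request.Request:
    """Build a GitHub REST API request, authenticated when a token is available."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        # Authenticated requests get a much higher rate limit than anonymous ones.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, headers=headers)
```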
This allows the setup-go action to cache our dependencies. In order to
cache our deps, the go.sum file needs to be present before executing the
setup-go action.
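The dependency on `go.sum` can be illustrated with a cache key derived from its contents. This Python sketch is not setup-go's real key format; it only shows why the file must exist before the caching step runs: without a checkout there is no `go.sum` to hash, so no meaningful key (and no cache restore) is possible. Both function names are hypothetical.

```python
import hashlib
from pathlib import Path


def cache_key_from_sum(go_sum: bytes, go_version: str) -> str:
    """Derive a cache key that changes whenever the dependency checksums change."""
    return f"go-{go_version}-{hashlib.sha256(go_sum).hexdigest()}"


def go_cache_key(repo_root: str, go_version: str) -> str:
    """Read go.sum from a checked-out repository; raises FileNotFoundError before checkout."""
    go_sum = Path(repo_root) / "go.sum"
    return cache_key_from_sum(go_sum.read_bytes(), go_version)
```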
@Zerpet Zerpet self-assigned this Aug 6, 2024
Zerpet added 2 commits August 6, 2024 13:20
Log in to Docker Hub; otherwise, the push will fail.

[skip ci]
@Zerpet (Collaborator, Author) commented Aug 8, 2024

@PujaVad @DanielePalaia would you have some bandwidth to review?
I'd like to merge this PR this week, as it is blocking other PRs.

@DanielePalaia (Contributor)

Hey @Zerpet, it looks good to me. I noticed that test_upgrade still uses the GKE cluster. Are we planning to migrate it in a later step, or was it left out because of some limitation you encountered?

@Zerpet (Collaborator, Author) commented Aug 13, 2024

I left it out because I suspect we may have a resource limitation in Actions. The upgrade tests use a 3-node RabbitMQ cluster, which was not deploying reliably when I did this refactor. We can try a couple of things to make those tests work in KinD, but I would prefer to do that in a separate PR.

@Zerpet Zerpet merged commit fb5e375 into main Aug 14, 2024
@Zerpet Zerpet deleted the test-ci/drop-gcr branch August 14, 2024 10:30
@Zerpet Zerpet linked an issue Aug 14, 2024 that may be closed by this pull request
Development: successfully merging this pull request may close the issue "Fix CI".
3 participants