Refactor CI #1693
The main reason is that PRs won't have the secret to post to the webhook URL. In addition, someone is already looking at PRs (at least the author), so the failure will be noticed.
This allows us to test CI changes in a dedicated branch and keep the main branch history clean.
This allows them to run in parallel, unblocking other jobs that depend on the build_operator job.
Unfortunately, we cannot get rid of this step, because we need to install testenv, and the setup-env binary for testing.
We are pushing every commit to a registry. This seems a bit unnecessary, especially given that we can build locally very quickly. It is more efficient to build, export the image to a tarball, and share this tarball among the other jobs.
The GCS and manifest part is now a separate job, so it doesn't delay the execution of other parts of the workflow. The kubectl-rabbitmq plugin tests now use the locally built image instead of building their own. This should speed up the execution time of this job.
The examples test.sh script was not exiting when an error was returned. Any error in a test.sh script now stops the execution and returns its exit code.
Downscale vault-tls example: The GitHub runners struggle to run a 3-node RabbitMQ cluster due to resource constraints.
Delete obsolete example: Federation is better achieved using the Messaging Topology Operator. Another example already covers how to set up TLS, and federation over TLS is simply the Topology Operator combined with that TLS example.
Fix external secret example: The test.sh file was missing, and this test was not excluded from the CI runs. Added a short script to validate that the external secret is used to seed the default user.
Removed system tests in GKE: They do not provide more value than the local KinD system tests.
cmctl now waits watching the correct namespace: Earlier, if the kubeconfig's current namespace did not exist, cmctl would always fail, even when cert-manager was ready.
Skip mtls internode example tests: It requires a 3-node cluster, and this setup is not reliable inside a GitHub runner, given the runner's resource constraints.
Typo in kubectl tests
Starting with cert-manager 1.15.x, the cmctl CLI and cert-manager are shipped from different repositories, as separate pieces of software.
To avoid rate limiting from GitHub.
This allows the setup-go action to cache our dependencies. To cache them, the go.sum file must be present before the setup-go action runs.
b5c3ca7 to 5ce3583
Log in to Docker Hub; otherwise, the push will fail. [skip ci]
@PujaVad @DanielePalaia would you have some bandwidth to review?
Hey @Zerpet, it looks good to me. I noticed that test_upgrade still uses the GKE cluster. Are we planning to migrate it in a later step, or was it left out because of some limitation you encountered?
I left it out because I suspect we may hit a resource limitation in Actions. The upgrade tests use a 3-node RabbitMQ cluster, and those were not fully deploying when I did this refactor. We can try a couple of things to make those tests work in KinD, but I would prefer to do that in a separate PR.
Note to reviewers: remember to look at the commits in this PR and consider whether they can be squashed.
Summary of Changes
Additional Context
We are moving away from a massive (~1 GB) CI image that bundled every tool, towards Actions-based workflows
built from small, simple steps that set up the worker as needed and rely on the worker cache.
Another important step is moving the system tests away from our GCP project and into local system tests
using KinD. This matters because it removes the project's dependency on sponsored infrastructure
and gives it more independence in running its test suites.
There's an important fix to the examples job, which previously always passed. Now, any failure in the
examples also fails the workflow. In PRs, we only dry-run the examples, to verify that their syntax is
correct. Once merged, the examples run in KinD. The examples job is the heaviest, taking
about 17 minutes to run.