Use the latest tag for azure images#6601

Merged
estesp merged 1 commit into containerd:main from gabriel-samfira:set-lates-image-tag
Mar 8, 2022

@gabriel-samfira
Contributor

This change ensures that we use a fairly recent and updated version of
the Windows images used in testing. The latest tag doesn't always point to
the absolute latest image, since some image releases contain only minor
updates, but most of the time it is identical to the most recently updated version.

Signed-off-by: Gabriel Adrian Samfira [email protected]

@k8s-ci-robot

Hi @gabriel-samfira. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Member

@kzys kzys left a comment

Generally speaking, I'd like to stick with specific versions rather than using latest.

  • Is there a way to distinguish a breakage due to the images vs. due to containerd?
  • What is the frequency of the updates?

@theopenlab-ci

theopenlab-ci Bot commented Mar 1, 2022

Build succeeded.

@mikebrow
Member

mikebrow commented Mar 1, 2022

latest can be problematic .. over time we've found latest tags being deleted, not knowing what version is being used, missing changes needed to tests, etc..

@gabriel-samfira
Contributor Author

The latest tag in this case refers to the Azure VM images we use to spin up the instances we run Windows tests on. These are maintained by the Azure team and are usually the latest version of the Windows VM images available for that particular SKU. This means a fully up-to-date (in most cases) image that should mirror what customers should be running in their production environments (bug fixes, security updates, etc.).

We should be running tests against a fully patched version of Windows for several reasons:

  1. Containerd should run well on a fully updated and secure Windows Server system. Any failure will expose issues in the way containerd interacts with systems that we expect to see in customer production environments.

  2. We want to know if a recent update causes issues. If something fails due to an update, the issue is either in the update or in expectations containerd may have with regard to new Windows behavior pulled in by the update. In either case, a change needs to be made in either Windows itself or containerd. After all, we want to be able to run containerd even on fully updated systems, such as those that should be running in production environments.

  3. Windows updates pull in fixes to HCS, which is the subsystem leveraged by hcsshim, which in turn is leveraged by containerd. Testing against an old and potentially buggy version of HCS will yield results that are less relevant. We also want to catch new bugs introduced in HCS that have not been triggered by their respective CIs. For example, release v0.9.2 of hcsshim fixes a bunch of bugs that were triggered by containerd tests, bugs that had a domino effect in all projects that end up calling hcsshim.

Is there a way to distinguish a breakage due to the images vs. due to containerd?

Yes. My main focus is root cause analysis of Windows test flakiness. I am currently attempting to determine the cause of the recent flakiness (broken az cli releases that freeze the CI aside), but after running the tests in a loop for a day on the latest Windows images, I have been unable to reproduce issues other than mcr.microsoft.com resetting the connection and tests failing due to not being able to fetch the image.

  • What is the frequency of the updates?

Windows Server updates usually come out on the second Tuesday of each month (Patch Tuesday), but Azure images have no published schedule. The latest image tag is what gets used by default when using the Azure portal as a normal user.
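
For context, Azure Marketplace VM images are commonly referenced by a URN of the form Publisher:Offer:Sku:Version, where the version field may be the literal `latest`. Below is a minimal Python sketch of telling a floating reference from a pinned one. The helper names (`ImageRef`, `parse_urn`) and the pinned version string are illustrative assumptions, not values taken from this PR.

```python
from typing import NamedTuple


class ImageRef(NamedTuple):
    """The four fields of an Azure image URN."""
    publisher: str
    offer: str
    sku: str
    version: str

    @property
    def is_floating(self) -> bool:
        # "latest" is resolved at deploy time; anything else is a pinned version.
        return self.version.lower() == "latest"


def parse_urn(urn: str) -> ImageRef:
    """Split a 'Publisher:Offer:Sku:Version' URN into its four fields."""
    parts = urn.split(":")
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected Publisher:Offer:Sku:Version, got {urn!r}")
    return ImageRef(*parts)


if __name__ == "__main__":
    urns = (
        "MicrosoftWindowsServer:WindowsServer:2019-Datacenter:latest",
        # Illustrative pinned version, not the one used in the workflow.
        "MicrosoftWindowsServer:WindowsServer:2019-Datacenter:17763.2565.220202",
    )
    for urn in urns:
        ref = parse_urn(urn)
        print(ref.sku, ref.version, "floating" if ref.is_floating else "pinned")
```

A check like this could be used to log which kind of reference a CI run used, which speaks to the concern about not knowing what version is being tested.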

latest can be problematic .. over time we've found latest tags being deleted, not knowing what version is being used, missing changes needed to tests, etc..

Have you seen this happen for Azure VM images? Will ping the MSFT folks, but I expect their images to be fairly well maintained.

@gabriel-samfira
Contributor Author

If you prefer, I can use the latest version number available now for both images, and keep updating them every month or so. But we need to keep the image version as close as possible to the latest one. The current images have a considerable number of updates available.

@mikebrow
Member

mikebrow commented Mar 2, 2022

nod.. thx for the information. Sounds like we can trust the maintenance of :latest for these images for the main branch. Would also be interesting to know if these images are targeted to be backwards compatible.. meaning 1.6.x service, 1.7, main...

@kzys
Member

kzys commented Mar 2, 2022

How about having both hard-coded specific versions and latest?

@gabriel-samfira
Contributor Author

Would also be interesting to know if these images are targeted to be backwards compatible.. meaning 1.6.x service, 1.7, main...

Windows Server is not a rolling release. Updated images for a particular SKU (Windows Server 2019 - 1809, ltsc2022, etc.) are the same OS, but up to date with the latest security updates. It's the same thing you would get if you spin up a VM with the versions currently hard coded in the workflow, and run Windows updates. You still have the same version of Windows, but with updates applied. Microsoft has a great track record in regards to backwards compatibility. Containerd is expected to run on all supported Windows versions, regardless of patch level.

We're tasked to make sure that any issue that comes up and is not an error in either containerd or hcsshim (which is leveraged by containerd) gets reported to MSFT. If it is an error in either project, we usually send PRs to fix those.

When doing root cause analysis, it's usually good practice to try to reproduce the issue on an up to date installation of Windows. This is to eliminate potential bugs in HCS/HNS that were already fixed and released.

How about having both hard-coded specific versions and latest?

Not sure if running the tests on an unpatched Windows and on a patched one is helpful, but I can make the image configurable, and we can set a different one when manually triggering the job, with the default set to either latest or the most recent version available. Would that be acceptable?
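
The "configurable image with a default" idea can be sketched roughly as follows. This is only an illustration in Python, not the workflow change itself: the environment variable name `AZURE_IMG_VERSION` and the default value are assumptions made for the example.

```python
import os

# Hypothetical default: either "latest" or the most recent reviewed version.
DEFAULT_IMAGE_VERSION = "latest"


def resolve_image_version(env=None, default=DEFAULT_IMAGE_VERSION):
    """Return the override from AZURE_IMG_VERSION if set and non-empty, else the default.

    `env` is any mapping of variable names to values; it defaults to os.environ.
    """
    env = os.environ if env is None else env
    override = env.get("AZURE_IMG_VERSION", "").strip()
    return override or default


if __name__ == "__main__":
    # A manually triggered job would set the variable; scheduled runs would not.
    print(resolve_image_version({}))
    print(resolve_image_version({"AZURE_IMG_VERSION": "17763.2565.220202"}))
```

Treating an empty value the same as an unset one keeps a blank manual-trigger input from silently selecting an invalid image version.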

@kzys
Member

kzys commented Mar 4, 2022

Not sure if running the tests on an unpatched Windows and on a patched one is helpful, but I can make the image configurable, and we can set a different one when manually triggering the job, with the default set to either latest or the most recent version available. Would that be acceptable?

Sounds good to me if the default is the most recent version. We generally avoid auto-upgrade even if that means we have to upgrade versions by ourselves (e.g. #6619)
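
If versions are pinned and bumped by hand, a small staleness check can show how far behind the pin is. The following is a rough Python sketch that assumes versions are dotted numeric strings. The function names and the sample versions are hypothetical, and fetching the list of available versions is deliberately left out.

```python
def version_key(version: str) -> tuple:
    """Turn a dotted numeric version into a tuple so it compares numerically."""
    return tuple(int(part) for part in version.split("."))


def newer_versions(pinned: str, available: list) -> list:
    """Return the versions in `available` that are newer than `pinned`, oldest first."""
    pinned_key = version_key(pinned)
    newer = (v for v in available if version_key(v) > pinned_key)
    return sorted(newer, key=version_key)


if __name__ == "__main__":
    # Sample data only; a real check would query the image catalog.
    available = ["1.0.9", "1.0.10", "1.1.0"]
    print(newer_versions("1.0.9", available))
```

Comparing integer tuples rather than raw strings matters here: as strings, "1.0.10" would sort before "1.0.9".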

@gabriel-samfira
Contributor Author

@kzys updated to the latest available version. Apologies for the delay on this.

This updates the Windows test worker images to the latest one available
in Azure. The updated images contain security and bug fixes.

Signed-off-by: Gabriel Adrian Samfira <[email protected]>

@theopenlab-ci

theopenlab-ci Bot commented Mar 8, 2022

Build succeeded.

  • containerd-build-arm64 : RETRY_LIMIT in 3m 55s (non-voting)
  • containerd-test-arm64 : RETRY_LIMIT in 5m 53s (non-voting)
  • containerd-integration-test-arm64 : RETRY_LIMIT in 25m 00s (non-voting)

Member

@estesp estesp left a comment

LGTM

@estesp estesp merged commit b0075c9 into containerd:main Mar 8, 2022