Skip to content

Conversation

@tiborvass
Copy link
Contributor

Signed-off-by: Stefan Scherer [email protected]

Opening on behalf of @StefanScherer

Ref #39691

Copy link
Contributor

@ddebroy ddebroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this PR! Just a minor comment. Otherwise lgtm.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this tag step may not be necessary any more. Do you know if anything fails if the tagging is skipped?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to create the microsoft/windowsservercore tag to make it work on new CI.
I saw changes from @thaJeztah in our internal repos, maybe that's not upstreamed yet.
For this PR I didn't want to mix other PR's into one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw changes from @thaJeztah in our internal repos, maybe that's not upstreamed yet.
For this PR I didn't want to mix other PR's into one.

Perhaps we should upstream those changes before this PR (to reduce merge-conflicts once it gets synced with our internal repositories)

@StefanScherer
Copy link
Contributor

Current status with new CI infra: We can compile, and run the unit tests and integration tests on Windows Server 2016 and 2019.

Errors that appear at the moment in the integration tests:

  • windows-2016 / RS1: Timeout in DockerSuite.TestAPIImagesSaveAndLoad
  • windows-2019 / RS5: DockerSuite.TestRunCleanupCmdOnEntrypoint
    Error: Expected output ContainerAdministrator, got "user manager\containeradministrator". mcr.microsoft.com/windows/servercore:ltsc2019

This should be investigated/fixed in separate PR's.

@tiborvass This PR is no longer a Draft.

@StefanScherer
Copy link
Contributor

Pushed another commit to fix TestRunCleanupCmdOnEntrypoint for new CI.

@tiborvass tiborvass force-pushed the use-new-windows-labels branch 2 times, most recently from 2cdae31 to 8618a58 Compare August 20, 2019 01:30
@StefanScherer StefanScherer force-pushed the use-new-windows-labels branch from 8618a58 to 3091aca Compare August 20, 2019 16:33
@andrewhsu
Copy link
Contributor

I'd really prefer to get the new PR check status green since the legacy PR checks for windowsRS1 and windowsRS5-process are green and we depend on those for merge ready status.

The windows-2016 error seems repeatable, @ddebroy should this test be disabled or fixed?

--- FAIL: Test (605.33s)
 panic: DockerSuite.TestAPIImagesSaveAndLoad test timed out after 10m0s [recovered]

I see a new windows-2019 error:

PANIC: docker_cli_build_test.go:4707: DockerSuite.TestBuildNoNamedVolume

Also, should Windows Defender be disabled?

# Disable Windows Defender
Write-Host -ForegroundColor Green "INFO: Disable Windows Defender to avoid handles on container volume files"
Set-MpPreference -DisableRealtimeMonitoring $true

@ddebroy
Copy link
Contributor

ddebroy commented Aug 21, 2019

Good catch, @andrewhsu . Windows Defender should indeed be disabled. The 2019/rs5 failure signature error removing volume path is due to Windows Defender keeping a handle open for too long for the automated tests to fail.

TestAPIImagesSaveAndLoad should be disabled as well for Windows for now based on #37573

@StefanScherer StefanScherer force-pushed the use-new-windows-labels branch from 3091aca to ad80995 Compare August 21, 2019 07:24
@StefanScherer
Copy link
Contributor

New CI is green, and windows-2016 node was faster than windowsRS1, and windows-2019 was faster than windowsRS5-process. 💚

Jenkinsfile Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use 1 here? It's what we do elsewhere

Suggested change
$env:SKIP_VALIDATION_TESTS="true"
$env:SKIP_VALIDATION_TESTS="1"

Jenkinsfile Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an advantage to setting these environment variables in powershell instead of using something like;

environment {
    DOCKER_BUILDKIT        = '0'
    SKIP_VALIDATION_TESTS  = '1'
    SOURCES_DRIVE          = 'd'
    SOURCES_SUBDIR         = 'gopath'
    TESTRUN_DRIVE          = 'd'
    TESTRUN_SUBDIR         = $(BUILD_NUMBER)
    WINDOWS_BASE_IMAGE     = 'mcr.microsoft.com/windows/servercore'
    WINDOWS_BASE_IMAGE_TAG = 'ltsc2016'
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All right, that looks better. I'll update the Jenkinsfile

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw changes from @thaJeztah in our internal repos, maybe that's not upstreamed yet.
For this PR I didn't want to mix other PR's into one.

Perhaps we should upstream those changes before this PR (to reduce merge-conflicts once it gets synced with our internal repositories)

Jenkinsfile Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same questions here as above

Signed-off-by: Stefan Scherer <[email protected]>
@StefanScherer StefanScherer force-pushed the use-new-windows-labels branch from ad80995 to ca3e230 Compare August 21, 2019 11:13
@thaJeztah
Copy link
Member

@StefanScherer Seeing this on a previous run of RS1; https://ci.docker.com/public/blue/rest/organizations/jenkins/pipelines/moby/branches/PR-39747/runs/7/nodes/213/log/?start=0

[2019-08-21T08:03:50.255Z] powershell.exe : Error: No such object: mcr.microsoft.com/windows/servercore
[2019-08-21T08:03:50.255Z] At D:\gopath\src\github.com\docker\docker@tmp\durable-9ca862d9\powershellWrapper.ps1:3 char:1
[2019-08-21T08:03:50.255Z] + & powershell -NoProfile -NonInteractive -ExecutionPolicy Bypass -Comm ...
[2019-08-21T08:03:50.255Z] + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[2019-08-21T08:03:50.255Z]     + CategoryInfo          : NotSpecified: (Error: No such ...dows/servercore    :String) [], RemoteException
[2019-08-21T08:03:50.255Z]     + FullyQualifiedErrorId : NativeCommandError
[2019-08-21T08:03:50.255Z]  
[2019-08-21T08:03:50.255Z] INFO: Version of mcr.microsoft.com/windows/servercore:ltsc2016 is ''

For RS5 I also see https://ci.docker.com/public/blue/rest/organizations/jenkins/pipelines/moby/branches/PR-39747/runs/7/nodes/183/log/?start=0

Not sure if Version of should be empty?

[2019-08-21T07:48:49.368Z] INFO: Version of mcr.microsoft.com/windows/servercore:ltsc2019 is ''

Build proceeded after that, but looks like there may be something to prevent the error

@thaJeztah
Copy link
Member

Both RS1 and RS5 seem to hit some cleanup step in the middle of tests (why?)

RS1 https://ci.docker.com/public/blue/rest/organizations/jenkins/pipelines/moby/branches/PR-39747/runs/8/nodes/325/log/?start=0

[2019-08-21T13:13:32.614Z] SUCCESS: Specified value was saved.
[2019-08-21T13:13:33.612Z] INFO: Downloading git...
[2019-08-21T13:13:36.590Z] INFO: Downloading go...
[2019-08-21T13:13:39.576Z] INFO: Downloading compiler 1 of 3...
[2019-08-21T13:13:41.809Z] INFO: Downloading compiler 2 of 3...
[2019-08-21T13:13:42.807Z] INFO: Downloading compiler 3 of 3...
[2019-08-21T13:13:43.280Z] INFO: Extracting git...
[2019-08-21T13:13:46.561Z] Sending interrupt signal to process
[2019-08-21T13:13:56.562Z] After 10s process did not stop
[2019-08-21T13:13:58.430Z] jenkins.util.io.CompositeIOException: Unable to delete 'd:\gopath\src\github.com\docker\docker@tmp\durable-f470cb39'. Tried 3 times (of a maximum of 3) waiting 0.1 sec between attempts.

RS5 https://ci.docker.com/public/blue/rest/organizations/jenkins/pipelines/moby/branches/PR-39747/runs/8/nodes/308/log/?start=0

[2019-08-21T13:10:37.969Z] INFO: Pulling mcr.microsoft.com/windows/servercore:ltsc2019 from docker hub into daemon under test. This may take some time...
[2019-08-21T13:10:37.969Z] ltsc2019: Pulling from windows/servercore
[2019-08-21T13:10:38.443Z] 65014b3c3121: Pulling fs layer
[2019-08-21T13:10:38.443Z] e7a74de96ddd: Pulling fs layer
[2019-08-21T13:10:45.667Z] e7a74de96ddd: Verifying Checksum
[2019-08-21T13:10:45.667Z] e7a74de96ddd: Download complete
[2019-08-21T13:10:54.383Z] 65014b3c3121: Verifying Checksum
[2019-08-21T13:10:54.383Z] 65014b3c3121: Download complete
[2019-08-21T13:13:34.199Z] Sending interrupt signal to process
[2019-08-21T13:13:37.126Z] 65014b3c3121: Pull complete
[2019-08-21T13:13:44.199Z] After 10s process did not stop
[2019-08-21T13:13:45.964Z] jenkins.util.io.CompositeIOException: Unable to delete 'd:\gopath\src\github.com\docker\docker@tmp\durable-10074fd8'. Tried 3 times (of a maximum of 3) waiting 0.1 sec between attempts.
[2019-08-21T13:13:45.964Z] 	at jenkins.util.io.PathRemover.forceRemoveRecursive(PathRemover.java:95)

@StefanScherer
Copy link
Contributor

Yes, in the middle of the docker build both agents. Looks like someone accidentally hit the abort button. Sending interrupt signal to process is a message from Jenkins. 🤔

@thaJeztah
Copy link
Member

Jenkins is weird; how can it skip the DCO check??

Screenshot 2019-08-21 at 16 36 31

Screenshot 2019-08-21 at 16 36 16

@andrewhsu
Copy link
Contributor

@StefanScherer the PR check that looked like an abort was actually the 2-hr timelimit being hit. Jenkins will send a signal to the process to bow out.

Copy link
Contributor

@andrewhsu andrewhsu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@andrewhsu
Copy link
Contributor

@tiborvass i think this PR is ready for the effects of mergeability

@tiborvass tiborvass marked this pull request as ready for review August 21, 2019 23:39
@tiborvass tiborvass requested a review from tianon as a code owner August 21, 2019 23:39
@tiborvass
Copy link
Contributor Author

@StefanScherer @andrewhsu this will bring the CI time to over 1h as windows-RS1 took 1h43min. We should probably do something similar to what we do in janky here #39682

Copy link
Contributor Author

@tiborvass tiborvass left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@thaJeztah thaJeztah merged commit 38ab9da into moby:master Aug 26, 2019
@StefanScherer StefanScherer deleted the use-new-windows-labels branch August 26, 2019 09:00
@StefanScherer
Copy link
Contributor

@benhillis I understand you concerns.

We have problems on RS1 with DockerSuite.TestAPIImagesSaveAndLoad -> https://ci.docker.com/public/blue/organizations/jenkins/moby/detail/PR-39747/6/pipeline/222#step-223-log-1244
and problems on RS5 with TestRemoveContainerWithRemovedVolume and TestRemoveContainerWithVolume and maybe others -> https://ci.docker.com/public/blue/organizations/jenkins/moby/detail/PR-39747/6/pipeline/187#step-188-log-89

The old pet machines also have OnAccessProtectionEnabled set to false for a very long time.

I've only added that check to ensure the currently needed settings are given on the nodes and to give a hint how to reach that.

It would be great to have a change/PR, so that the integration tests could also run with realtime protection enabled.

@thaJeztah
Copy link
Member

@StefanScherer if it's only those two tests, could we somehow perform that check from the Go code, and skip those tests if OnAccessProtectionEnabled is enabled? Similar to

skip.If(t, testEnv.DaemonInfo.OSType != "linux")
skip.If(t, testEnv.IsRemoteDaemon())
skip.If(t, !requirement.CgroupNamespacesEnabled())

@benhillis
Copy link

I would prefer skipping tests that do not work correctly rather than failing the entire suite.

@thaJeztah
Copy link
Member

@benhillis @StefanScherer I opened #39804 for now, so that we can continue looking for the better approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants