Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for Flaky test TestServiceWithPredefinedNetwork #36551

Merged
merged 1 commit into from
Mar 22, 2018

Conversation

selansen
Copy link
Contributor

@selansen selansen commented Mar 9, 2018

Fixes #36547

TestServiceWithPredefinedNetwork test case was failing
at times. To fix the issue, added new API to check
for services after we clean up all services. Tested
multiple times and this sould fix flaky issue.

Signed-off-by: selansen [email protected]

- What I did

- How I did it

- How to verify it

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

Copy link
Member

@dnephin dnephin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be a couple problems with this test.

  1. It's not actually checking anything about the service, just that it exists. Shouldn't it be comparing the inspect result to the expected?

  2. It shouldn't be doing cleanup. That is handled by setupTest(t)().

What is the purpose of this test? What is the host network a special case? That is what needs to be checked.


}

func serviceRunningCount(client client.ServiceAPIClient, serviceID string, instances uint64) func(log poll.LogT) poll.Result {
return func(log poll.LogT) poll.Result {
filter := filters.NewArgs()
filter.Add("service", serviceID)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing this is not the correct fix. It should be used as a filter in ServiceList.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. The actual regression issue was we couldnt create service with host network. Thats what I tried to test it to make sure there is no issue in creating service with host network.
  2. For your clean up comment : I try to remove service and make sure service is getting cleared. Thats what I meant by cleanup. I am not doing anything extra. Do you suggest I dont need to do verify the service that I created whether I am able to remove or not?. Pls let me know your feedback on this.
    3)Regarding not adding filter: I am trying to get all the services running. so looked into existing examples of functions calling serviceList API. none of them using filter. Your previous comment on original comment said , I have defined filter but not using it. So i wanted to clean up the code as part of your previous comment. Pls correct me if my understand is wrong.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I try to remove service and make sure service is getting cleared. Thats what I meant by cleanup. I am not doing anything extra. Do you suggest I dont need to do verify the service that I created whether I am able to remove or not?. Pls let me know your feedback on this.

That's right. The daemon which runs this service is going to be shutdown when the test exits (defer d.Stop(t)) so you don't need to spend time doing cleanup.

Regarding not adding filter: I am trying to get all the services running.

Ok, I guess this is ok since you're targeting a daemon that was launched for this test.

@thaJeztah
Copy link
Member

@selansen this needs a minor rebase

@selansen
Copy link
Contributor Author

selansen commented Mar 14, 2018 via email

@codecov
Copy link

codecov bot commented Mar 14, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@4460472). Click here to learn what that means.
The diff coverage is n/a.

@@            Coverage Diff            @@
##             master   #36551   +/-   ##
=========================================
  Coverage          ?   34.76%           
=========================================
  Files             ?      612           
  Lines             ?    45577           
  Branches          ?        0           
=========================================
  Hits              ?    15844           
  Misses            ?    27663           
  Partials          ?     2070

@arm64b
Copy link
Contributor

arm64b commented Mar 15, 2018

should I pull latest moby and apply my change on top of it and submit again?

You should fetch the latest code then make your topic branch rebase to the master.

@selansen
Copy link
Contributor Author

Thats what I did and committed my changes. Looks like all sanity passed.
@thaJeztah , pls let me know if I will have to do anything else.

@arm64b
Copy link
Contributor

arm64b commented Mar 15, 2018

Seems the symptom of TestServiceWithIngressNetwork is very similar as this one...

@selansen
Copy link
Contributor Author

Yes. taken care of that too

@selansen
Copy link
Contributor Author

@thaJeztah , Hope this helps flaky issue that we are seeing on selective target.

@thaJeztah
Copy link
Member

thaJeztah commented Mar 16, 2018

I think for now this looks ok (not entirely, see below), but we need to implement #36581, so that we don't wait for all services to be cleaned up (in case we want the option of running tests in parallel)

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

noticed some things; @selansen can you check?

@@ -159,3 +149,17 @@ func swarmIngressReady(client client.NetworkAPIClient) func(log poll.LogT) poll.
return poll.Success()
}
}

func noServices(client client.ServiceAPIClient) func(log poll.LogT) poll.Result {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Erm, this function doesn't seem to be called?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept the new API so that TestServiceWithIngressNetwork will call it. Let me update it.

@@ -106,9 +101,6 @@ func TestServiceWithIngressNetwork(t *testing.T) {
err = client.ServiceRemove(context.Background(), serviceID)
require.NoError(t, err)

poll.WaitOn(t, serviceIsRemoved(client, serviceID), pollSettings)
poll.WaitOn(t, noTasks(client), pollSettings)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't check if the service (plus tasks?) are all removed here, this test may not be testing what it was created for (i.e.: check that the ingress network is not removed when removing services)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idea was to check service is removed instead task as we are testing service create/remove. Thats why I added new API. Let me update TestServiceWithIngressNetwork. Thanks for pointing it out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW actual test that was written by Chris Telfer was to verify Ingress network absence after service remove. So test will fail if it is not met. Daniel mentioned service clean up will happen automatically when we tear down the swarm . so we dont need to check this explicitly. Anyways I update the with new API that TestServiceWithIngressNetwork with my newly added noServices check to make sure we cover that .

Copy link
Member

@vdemeester vdemeester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🐯

@selansen
Copy link
Contributor Author

Failed test cases are not related to my test cases.

@arm64b
Copy link
Contributor

arm64b commented Mar 19, 2018

Seems DockerSuite.TestPostContainersAttach test case failed is introduced recently, don't know if @thaJeztah concern has been addressed or not, if it's true, we can try to rebuild the 2 failure nodes

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but needs a rebase (looks like because of the gotestyourself changes)

@selansen
Copy link
Contributor Author

@thaJeztah , I just did a rebase. should be good to go.

	TestServiceWithPredefinedNetwork test case was failing
	at times. To fix the issue, added new API to check
	for services after we clean up all services. Tested
	multiple times and this sould fix flaky issue.

Signed-off-by: selansen <[email protected]>
@thaJeztah
Copy link
Member

Thanks! moved it to "status/merge" - fingers crossed if it all goes green 👍

@thaJeztah
Copy link
Member

Hm, more flakiness?

16:53:08 ----------------------------------------------------------------------
16:53:08 FAIL: docker_api_swarm_service_test.go:61: DockerSwarmSuite.TestAPISwarmServicesCreate
16:53:08 
16:53:08 [dee4f3a4df82c] waiting for daemon to start
16:53:08 [dee4f3a4df82c] daemon started
16:53:08 
16:53:08 docker_api_swarm_service_test.go:92:
16:53:08     waitAndAssert(c, defaultReconciliationTimeout, d.CheckActiveContainerCount, checker.Equals, 0)
16:53:08 docker_utils_test.go:452:
16:53:08     c.Assert(v, checker, args...)
16:53:08 ... obtained int = 3
16:53:08 ... expected int = 0
16:53:08 ... output: "dec8242edf45\na6b9f3e67481\nb05bf8c0000a\n"
16:53:08 
16:53:08 [dee4f3a4df82c] exiting daemon

@thaJeztah
Copy link
Member

These tests pass now though;

16:17:58 === RUN   TestServiceWithPredefinedNetwork
16:18:00 --- PASS: TestServiceWithPredefinedNetwork (1.91s)
16:18:00 	daemon.go:283: [d081ae40fa771] waiting for daemon to start
16:18:00 	daemon.go:315: [d081ae40fa771] daemon started
16:18:00 	daemon.go:273: [d081ae40fa771] exiting daemon
16:18:00 === RUN   TestServiceWithIngressNetwork
16:18:11 --- PASS: TestServiceWithIngressNetwork (11.51s)
16:18:11 	daemon.go:283: [de1f27941839a] waiting for daemon to start
16:18:11 	daemon.go:315: [de1f27941839a] daemon started
16:18:11 	daemon.go:273: [de1f27941839a] exiting daemon
16:18:11 PASS

@selansen
Copy link
Contributor Author

Not sure about docker_api_swarm_service_test. They are not related to the fix I did. May be we need to take it as separate issue.

@arm64b
Copy link
Contributor

arm64b commented Mar 22, 2018

docker_api_swarm_service_test.go failure can be tracked by #36501 and #36386.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Flaky test TestServiceWithPredefinedNetwork
6 participants