pull-kubernetes-federation-e2e-gce is flaky #45978

fejta · 2017-05-17T18:14:29Z

from @nikhita on kubernetes/test-infra#2787

I keep hitting this flake: https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/pr-logs/pull/45721/pull-kubernetes-federation-e2e-gce/5138/. Noticed that most of the recently updated PRs are failing on this one. See also: #42072.

I mentioned this in the sig-testing channel as well. I wasn't sure whether I should mention it in the above issue or create a new one in this repo, but thought it would be better to document it here. :)

@kubernetes/sig-federation-test-failures
/kind flake

csbell · 2017-05-17T18:37:10Z

The clusters have been recycled (Jenkins job rebuild) and test runs are passing as of ~10:10 PDT. We'll have to figure out why the base clusters timed out.


madhusudancs · 2017-05-17T21:51:14Z

This was an infrastructure issue. @shashidharatd and @irfanurrehman, who were around when this happened, both noticed it, but they did not have access to Jenkins to manually trigger a new run. We will prioritize moving the federation presubmit deploy job from Jenkins to prow.

Filed an issue here - kubernetes/test-infra#2791

It has been a little over a week since we started reporting the results of this job on all the PRs. We did not have any major issues. We had a couple of minor hiccups: Issues kubernetes/kubernetes#45795 and kubernetes/kubernetes#45978. We had foreseen problems of the first type but the second one was a little surprising. SIG-Federation is starting a buildcop rotation and the buildcops should be able to handle both these types of situations. We don't have all the tooling in place for non-Googlers to handle these issues because it needs access to Jenkins, so they still need to ping a Googler when they see a problem. We are working on moving these jobs out of Jenkins to prow (Issue kubernetes/test-infra#2791). Empirically, these problems have been uncommon and shouldn't affect the submit queue often.

foxish · 2017-05-22T06:11:02Z

The same issue appears to be happening again with #46071 and #46071

madhusudancs · 2017-05-22T06:35:53Z

Thanks! We are debugging this. We have made federation presubmits non-blocking for now. You should be able to merge PRs without that job passing.

0xmichalis · 2017-05-22T14:27:57Z

Can't seem to get past this flake in #46169

0xmichalis · 2017-05-22T14:28:56Z

Ok, just saw @madhusudancs's comment and it seems that the PR is in the queue, sorry for the noise

xiao-zhou · 2017-05-23T09:09:37Z

My PR hit this flaky test as well #46213

pmichali · 2017-05-23T10:52:53Z

In my PR #46138, I see this job fail, along with pull-kubernetes-kubemark-e2e-gce and pull-kubernetes-node-e2e. Not sure if flake issues should be filed for the other two tests.

ncdc · 2017-05-23T15:12:29Z

@pmichali as we discussed on slack, the other 2 failures were most likely related to changes in your PR itself and not actual flakes.

pmichali · 2017-05-25T11:57:32Z

Corrected the other two issues with my latest commit on #46138, but still see this issue.

janetkuo · 2017-05-25T20:00:12Z

Is it flaky or broken? I haven't seen it pass

perotinus · 2017-05-25T23:22:21Z

It's flaky: https://k8s-gubernator.appspot.com/builds/kubernetes-jenkins/pr-logs/directory/pull-kubernetes-federation-e2e-gce

There was a fix yesterday for one issue that was causing flakiness, but it appears that it was not comprehensive. We're looking into this.

perotinus · 2017-05-27T00:41:18Z

This appears to have been fixed by recycling the clusters: the issue that was fixed earlier was merged in on May 25, after the daily cluster recycling. So, some clusters were left in a bad state because of previous failures. Once the clusters were recycled on the morning of the 26th, the tests stopped being flaky.

perotinus · 2017-05-27T00:42:10Z

/assign @perotinus

perotinus · 2017-05-27T00:42:16Z

/close

caesarxuchao · 2017-06-02T03:44:13Z

https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/pr-logs/pull/45184/pull-kubernetes-federation-e2e-gce/8189/

caesarxuchao · 2017-06-02T21:16:04Z

https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/pr-logs/pull/46500/pull-kubernetes-federation-e2e-gce/8463/

caesarxuchao · 2017-06-02T21:17:27Z

I'm reopening the issue. @madhusudancs could you take a look? The error message is "error during ./federation/cluster/federation-up.sh: exit status 255", so it might be a test-infra issue.

madhusudancs · 2017-06-02T21:31:27Z

@caesarxuchao this test ran when we had just fixed the test infra issue and were redeploying things. You shouldn't see this problem if you re-run the test now.

caesarxuchao · 2017-06-03T01:01:58Z

Thanks. Closing.

pmichali · 2017-07-10T12:55:05Z

@madhusudancs I still see this issue (I think it is this) on #46138 and #46874. Can you please advise?

sttts · 2017-07-10T14:07:10Z

Here is a recent log: https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/pr-logs/pull/46138/pull-kubernetes-federation-e2e-gce/13815/

pmichali · 2017-07-10T14:11:10Z

I see a few other people have the same log messages as what I see. The failure seems to occur consistently (I retested twice).

mattmoyer · 2017-07-13T16:04:15Z

There's another recent string of failures for this: https://k8s-gubernator.appspot.com/builds/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-federation/?

For example, the most recent: https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-federation/5221/

W0713 15:10:55.034] 2017/07/13 15:10:54 main.go:191: Something went wrong: error starting federation: error during ./federation/cluster/federation-up.sh: exit status 124
W0713 15:10:
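
For triage, a failure line like the one quoted above can be parsed to pull out the failing script and its exit status. This is only a minimal sketch, assuming the log format shown in this thread; the names `parse_failure` and `looks_like_timeout` are hypothetical, and reading status 124 as a timeout follows the GNU `timeout` convention, which is an assumption about how the job invokes the script rather than something confirmed here.

```python
import re

# Matches harness lines of the form seen in this thread, e.g.
#   "... error during ./federation/cluster/federation-up.sh: exit status 124"
FAILURE_RE = re.compile(r"error during (?P<script>\S+): exit status (?P<status>\d+)")


def parse_failure(line):
    """Return (script, exit_status) from a log line, or None if it does not match."""
    match = FAILURE_RE.search(line)
    if match is None:
        return None
    return match.group("script"), int(match.group("status"))


def looks_like_timeout(status):
    # Assumption: 124 is the status GNU `timeout` returns when the command timed out.
    return status == 124
```

With this, the status 124 failures reported in July could be separated from the status 255 failures reported in June before deciding who should look at them.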

csbell · 2017-07-13T17:10:29Z

Looking into it.


perotinus · 2017-08-04T18:44:14Z

I believe this has been addressed.

perotinus · 2017-08-04T18:44:17Z

/close

k8s-ci-robot added the kind/flake and sig/federation labels May 17, 2017

nikhita mentioned this issue May 17, 2017

pull-kubernetes-federation-e2e-gce test failed #45954

Closed

madhusudancs closed this as completed May 17, 2017

madhusudancs mentioned this issue May 17, 2017

Make the federation presubmit job blocking. kubernetes/test-infra#2792

Merged

madhusudancs reopened this May 22, 2017

madhusudancs mentioned this issue May 22, 2017

pull-kubernetes-federation-e2e-gce flake: PR #45781 #45795

Closed

foxish mentioned this issue May 22, 2017

PDB Max Unavailable Field #45587

Merged

0xmichalis mentioned this issue May 23, 2017

Move PDB controller and type ownership to SIG-Apps #45301

Merged

pmichali mentioned this issue May 23, 2017

IPv6 support for getting IP from default route #46138

Merged

timothysc mentioned this issue May 23, 2017

Update RBAC policy for configmap locked leader leasing. #45966

Merged

pipejakob mentioned this issue May 23, 2017

refactor certificate controller to break it into two parts #45514

Merged

ericchiang mentioned this issue May 23, 2017

oidc client plugin: reduce round trips and fix scopes requested #45317

Merged

This was referenced May 25, 2017

Controller history #45867

Merged

Implement Daemonset history #45924

Merged

janetkuo added the kind/failing-test label May 25, 2017

k8s-ci-robot assigned perotinus May 27, 2017

k8s-ci-robot closed this as completed May 27, 2017

caesarxuchao reopened this Jun 2, 2017

caesarxuchao closed this as completed Jun 3, 2017

sttts reopened this Jul 10, 2017

mattmoyer mentioned this issue Jul 13, 2017

Flake: pull-kubernetes-federation-e2e-gce: error during ./federation/cluster/federation-up.sh: exit status 124 #46690

Closed

mattmoyer mentioned this issue Jul 13, 2017

Automated cherry pick of #48737 #48834

Merged

k8s-ci-robot closed this as completed Aug 4, 2017

ebbeelsborg mentioned this issue Aug 10, 2017

Typed static/mirror pod UID translation #48699

Merged

