ipcache says a pod has identity 1 (GKE, native routing) #16542

@jrajahalme

Description

The following came up in cilium-cli testing:

 [gke_***_us-west2-a_cilium-cilium-cli-936991635] Waiting for Cilium pod kube-system/cilium-9h9vv to have all the pod IPs in eBPF ipcache...
Error: Connectivity test failed: timeout reached waiting for pod IDs in ipcache of Cilium pod kube-system/cilium-9h9vv
error: cannot exec into a container in a completed pod; current phase is Failed
NAMESPACE     NAME                                                             READY   STATUS    RESTARTS   AGE   IP            NODE                                                  NOMINATED NODE   READINESS GATES
cilium-test   client-8655c47fd7-gw9t9                                          1/1     Running   0          12m   10.12.1.26    gke-cilium-cilium-cli-93-default-pool-9c603183-jt1d   <none>           <none>
cilium-test   client2-657df6649d-46lm9                                         1/1     Running   0          12m   10.12.0.1     gke-cilium-cilium-cli-93-default-pool-9c603183-51zv   <none>           <none>
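The timeout above comes from cilium-cli polling each agent's ipcache until every pod IP resolves to a non-reserved identity. A minimal sketch of that wait loop (helper names are hypothetical; the real check lives in cilium-cli's connectivity test and reads `cilium bpf ipcache list` from the agent):

```python
import time

# Cilium reserves low identity numbers (1 = host, etc.); a pod should
# end up with an allocated identity above this range.
RESERVED_MAX = 255

def pod_ips_in_ipcache(pod_ips, ipcache, timeout=60.0, interval=1.0):
    """Poll until every pod IP maps to a non-reserved identity.

    ipcache is a callable returning {ip: identity}; in the real test this
    mapping would come from the agent's eBPF ipcache.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        cache = ipcache()
        if all(cache.get(ip, 0) > RESERVED_MAX for ip in pod_ips):
            return True
        time.sleep(interval)
    return False
```

In the failure above, 10.12.1.26 eventually resolves to a real identity, but 10.12.0.1 stays pinned at identity 1, so a loop like this never succeeds and the test times out.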

Note that client2's IP address is 10.12.0.1.
`cilium bpf endpoint list` on the hosting node claims it is "(localhost)":

IP ADDRESS      LOCAL ENDPOINT INFO
10.12.0.123:0   id=2225  flags=0x0000 ifindex=15  mac=16:8B:35:19:4D:6A nodemac=56:E9:C3:93:D4:EC   
10.12.0.140:0   id=468   flags=0x0000 ifindex=21  mac=92:59:2B:7D:04:6D nodemac=C6:A0:DC:10:D8:68   
10.168.0.25:0   (localhost)                                                                         
10.12.0.1:0     (localhost)                                                                         
10.12.0.98:0    (localhost)                                                                         
10.12.0.169:0   id=2718  flags=0x0000 ifindex=17  mac=6A:8A:A6:E3:E1:22 nodemac=5A:AB:9F:23:44:CA   

`cilium bpf ipcache list` says the same, mapping it to the reserved host identity (1):

10.12.0.1/32        1 0 0.0.0.0           

The CiliumEndpoint (CEP) resource tells a different story:

    external-identifiers:
      container-id: 96fc8c05cfa3b7320b8f7f96c920beb1714ca8030e37d8661a2f21318c4baee5
      k8s-namespace: cilium-test
      k8s-pod-name: client2-657df6649d-46lm9
      pod-name: cilium-test/client2-657df6649d-46lm9
    id: 105
    identity:
      id: 5139
      labels:
      - k8s:io.cilium.k8s.policy.cluster=cilium-cilium-cli-936991635
      - k8s:io.cilium.k8s.policy.serviceaccount=default
      - k8s:io.kubernetes.pod.namespace=cilium-test
      - k8s:kind=client
      - k8s:name=client2
      - k8s:other=client
    networking:
      addressing:
      - ipv4: 10.12.0.1
      node: 10.168.0.25
    state: ready

The other Cilium node (not the one hosting the pod) has the correct entry in its ipcache:

10.12.0.1/32        5139 0 10.168.0.25    

Cilium agent logs for the IP address and the endpoint:

2021-06-14T19:57:05.350225182Z level=info msg="  --native-routing-cidr='10.12.0.0/14'" subsys=daemon
...
2021-06-14T19:57:06.036223076Z level=info msg="Inheriting MTU from external network interface" device=vethfa1ca2a6 ipAddr=10.12.0.1 mtu=1460 subsys=mtu
...
2021-06-14T19:57:07.084221220Z level=info msg="Addressing information:" subsys=daemon
2021-06-14T19:57:07.084226830Z level=info msg="  Cluster-Name: cilium-cilium-cli-936991635" subsys=daemon
2021-06-14T19:57:07.084233530Z level=info msg="  Cluster-ID: 0" subsys=daemon
2021-06-14T19:57:07.084240329Z level=info msg="  Local node-name: gke-cilium-cilium-cli-93-default-pool-9c603183-51zv" subsys=daemon
2021-06-14T19:57:07.084246051Z level=info msg="  Node-IPv6: <nil>" subsys=daemon
2021-06-14T19:57:07.084252017Z level=info msg="  External-Node IPv4: 10.168.0.25" subsys=daemon
2021-06-14T19:57:07.084257553Z level=info msg="  Internal-Node IPv4: 10.12.0.98" subsys=daemon
2021-06-14T19:57:07.084263080Z level=info msg="  IPv4 allocation prefix: 10.12.0.0/24" subsys=daemon
2021-06-14T19:57:07.084268447Z level=info msg="  IPv4 native routing prefix: 10.12.0.0/14" subsys=daemon
2021-06-14T19:57:07.084274334Z level=info msg="  Loopback IPv4: 169.254.42.1" subsys=daemon
2021-06-14T19:57:07.084279828Z level=info msg="  Local IPv4 addresses:" subsys=daemon
2021-06-14T19:57:07.084311624Z level=info msg="  - 10.168.0.25" subsys=daemon
2021-06-14T19:57:07.084318146Z level=info msg="  - 10.12.0.1" subsys=daemon
2021-06-14T19:57:07.084323659Z level=info msg="  - 10.12.0.1" subsys=daemon
2021-06-14T19:57:07.084329084Z level=info msg="  - 10.12.0.1" subsys=daemon
2021-06-14T19:57:07.084334735Z level=info msg="  - 10.12.0.1" subsys=daemon
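Note that 10.12.0.1 appears four times in the agent's "Local IPv4 addresses" list (apparently picked up from a veth, per the MTU log line above), even though it was later assigned to the client2 pod. An illustrative sketch (hypothetical helper, not Cilium code) of flagging such collisions from the two listings:

```python
def host_owned_pod_ips(pod_ips, local_node_ips):
    """Return pod IPs that the agent also considers local to the node.

    Such IPs get pre-populated in the ipcache with the reserved host
    identity (1), which later blocks the k8s watcher from upserting
    the pod's real identity.
    """
    return sorted(set(pod_ips) & set(local_node_ips))
```

Applied to the listings above, only client2's IP collides; client's 10.12.1.26 lives on the other node and is unaffected.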
...
2021-06-14T19:57:34.905555589Z level=info msg="Create endpoint request" addressing="&{10.12.0.1 c310b283-cd4a-11eb-ade5-42010aa80019  }" containerID=96fc8c05cfa3b7320b8f7f96c920beb1714ca8030e37d8661a2f21318c4baee5 datapathConfiguration="&{false true false true 0xc001d317ea}" interface=lxc022e741785fa k8sPodName=cilium-test/client2-657df6649d-46lm9 labels="[]" subsys=daemon sync-build=true
2021-06-14T19:57:34.905689952Z level=info msg="New endpoint" containerID= datapathPolicyRevision=0 desiredPolicyRevision=0 endpointID=105 ipv4= ipv6= k8sPodName=/ subsys=endpoint
2021-06-14T19:57:34.905963746Z level=info msg="Resolving identity labels (blocking)" containerID= datapathPolicyRevision=0 desiredPolicyRevision=0 endpointID=105 identityLabels="k8s:io.cilium.k8s.policy.cluster=cilium-cilium-cli-936991635,k8s:io.cilium.k8s.policy.serviceaccount=default,k8s:io.kubernetes.pod.namespace=cilium-test,k8s:kind=client,k8s:name=client2,k8s:other=client" ipv4= ipv6= k8sPodName=/ subsys=endpoint
2021-06-14T19:57:34.906151677Z level=info msg="Skipped non-kubernetes labels when labelling ciliumidentity. All labels will still be used in identity determination" labels="map[]" subsys=crd-allocator
2021-06-14T19:57:34.914948184Z level=info msg="Allocated new global key" key="k8s:io.cilium.k8s.policy.cluster=cilium-cilium-cli-936991635;k8s:io.cilium.k8s.policy.serviceaccount=default;k8s:io.kubernetes.pod.namespace=cilium-test;k8s:kind=client;k8s:name=client2;k8s:other=client;" subsys=allocator
2021-06-14T19:57:34.915069668Z level=info msg="Identity of endpoint changed" containerID= datapathPolicyRevision=0 desiredPolicyRevision=0 endpointID=105 identity=5139 identityLabels="k8s:io.cilium.k8s.policy.cluster=cilium-cilium-cli-936991635,k8s:io.cilium.k8s.policy.serviceaccount=default,k8s:io.kubernetes.pod.namespace=cilium-test,k8s:kind=client,k8s:name=client2,k8s:other=client" ipv4= ipv6= k8sPodName=/ oldIdentity="no identity" subsys=endpoint
2021-06-14T19:57:34.915084884Z level=info msg="Waiting for endpoint to be generated" containerID= datapathPolicyRevision=0 desiredPolicyRevision=0 endpointID=105 identity=5139 ipv4= ipv6= k8sPodName=/ subsys=endpoint
2021-06-14T19:57:35.718697564Z level=info msg="regenerating all endpoints" reason="one or more identities created or deleted" subsys=endpoint-manager
2021-06-14T19:57:36.014673719Z level=info msg="Rewrote endpoint BPF program" containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=105 identity=5139 ipv4= ipv6= k8sPodName=/ subsys=endpoint
2021-06-14T19:57:36.015919337Z level=info msg="Successful endpoint creation" containerID= datapathPolicyRevision=1 desiredPolicyRevision=1 endpointID=105 identity=5139 ipv4= ipv6= k8sPodName=/ subsys=daemon
2021-06-14T19:57:36.018217418Z level=info msg="API call has been processed" name=endpoint-create processingDuration=1.112696121s subsys=rate totalDuration=1.112817363s uuid=c313ad32-cd4a-11eb-ade5-42010aa80019 waitDurationTotal=0s
...
2021-06-14T19:57:39.126518574Z level=warning msg="Unable to update ipcache map entry on pod add" error="ipcache entry for podIP 10.12.0.1 owned by kvstore or agent" hostIP=10.12.0.1 k8sNamespace=cilium-test k8sPodName=client2-657df6649d-46lm9 podIP=10.12.0.1 podIPs="[{10.12.0.1}]" subsys=k8s-watcher
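The warning suggests ipcache source precedence at work: an entry owned by a stronger source (the local agent or the kvstore) cannot be overwritten by the Kubernetes watcher, so the host-identity entry for 10.12.0.1 sticks. A rough model of that precedence check (source names and function signatures are illustrative, not Cilium's actual API in pkg/ipcache):

```python
# ipcache sources ordered strongest-first (illustrative ordering,
# loosely modeled after Cilium's source precedence).
PRECEDENCE = ["local", "kvstore", "custom-resource", "kubernetes"]

def allow_overwrite(existing_source, new_source):
    """A new source may overwrite an entry only if it is at least as
    strong as the existing owner."""
    return PRECEDENCE.index(new_source) <= PRECEDENCE.index(existing_source)

def upsert(ipcache, ip, identity, source):
    """Insert or update an ipcache entry, honoring source precedence."""
    entry = ipcache.get(ip)
    if entry and not allow_overwrite(entry["source"], source):
        raise ValueError(
            f"ipcache entry for podIP {ip} owned by kvstore or agent")
    ipcache[ip] = {"identity": identity, "source": source}
```

Under this model, the k8s watcher's upsert of identity 5139 for 10.12.0.1 is rejected because the agent already owns that entry with the host identity, matching the warning above.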

General Information

  • Cilium version: 1.9.8
  • Kernel version:

Linux gke-cilium-cilium-cli-93-default-pool-9c603183-51zv 5.4.89+ #1 SMP Sat Feb 13 19:45:14 PST 2021 x86_64 x86_64 x86_64 GNU/Linux

  • Orchestration system version in use:

Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.1-5-g76a04fc", GitCommit:"95881afb5df065c250d98cf7f30ee4bb6d281acf", GitTreeState:"clean", BuildDate:"2021-05-21T04:25:07Z", GoVersion:"go1.15.7", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19+", GitVersion:"v1.19.9-gke.1900", GitCommit:"008fd38bf3dc201bebdd4fe26edf9bf87478309a", GitTreeState:"clean", BuildDate:"2021-04-14T09:22:08Z", GoVersion:"go1.15.8b5", Compiler:"gc", Platform:"linux/amd64"}

cilium-sysdump-out.zip

How to reproduce the issue

Don't know how to reproduce it deterministically; it shows up as a test flake.

Labels

kind/bug: This is a bug in the Cilium logic.