When restoring endpoints on startup, there can be a race between two threads to update the identity. The first thread sets the identity for restored endpoints, while the second recomputes the identity if the pod labels (or the node labels, in the case of the host endpoint) changed. If the second thread registers an identity in the identity manager before the first, it results in the error "removing identity not added to the identity manager!".
This error seems to be more frequent with the host endpoint because the identity update triggered by a label update might happen sooner, via InitHostEndpointLabels().
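The ordering problem described above can be illustrated with a minimal sketch. It is not the actual identity manager code: the identityManager type and its add, remove, and removeOldAddNew methods are hypothetical stand-ins that only track a reference count per identity, and the two threads are replaced by two fixed call orders so the result is deterministic.

```go
package main

import "fmt"

// identityManager is a hypothetical, simplified stand-in for the agent's
// identity manager. It only keeps a reference count per numeric identity.
type identityManager struct {
	refs map[int]int
}

func newIdentityManager() *identityManager {
	return &identityManager{refs: make(map[int]int)}
}

// add registers one more user of the given identity.
func (m *identityManager) add(id int) {
	m.refs[id]++
}

// remove drops one user of the identity. It returns false when the identity
// was never added, which is the condition reported as an error in the log.
func (m *identityManager) remove(id int) bool {
	if m.refs[id] == 0 {
		return false
	}
	m.refs[id]--
	if m.refs[id] == 0 {
		delete(m.refs, id)
	}
	return true
}

// removeOldAddNew models the "removing old and adding new identity" step of
// a label update. It reports whether removing the old identity succeeded.
func (m *identityManager) removeOldAddNew(prev, next int) bool {
	ok := m.remove(prev)
	m.add(next)
	return ok
}

func main() {
	const hostID = 1 // the reserved host identity seen in the log trace

	// Problematic order: the label update runs before the restore path
	// has registered the identity.
	racy := newIdentityManager()
	fmt.Println("label update first, removal ok:", racy.removeOldAddNew(hostID, hostID))
	racy.add(hostID) // the restore path registers the identity only now

	// Expected order: the restore path registers the identity first.
	ordered := newIdentityManager()
	ordered.add(hostID)
	fmt.Println("restore first, removal ok:", ordered.removeOldAddNew(hostID, hostID))
}
```

In this sketch the first call order reports a failed removal and the second does not. In the first order the identity also ends up counted twice, which is a property of this simplified model and not a claim about the real manager.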
The following log trace shows the error happening: the second thread tries to update the host endpoint's identity in the manager (by first removing it) at 22:08:47.03, while the first thread sets the identity only later, at 22:08:57.13.
2021-06-02T22:08:47.037805506Z level=debug msg="Refreshing labels of endpoint" containerID= endpointID=2653 identityLabels="k8s:cilium.io/ci-node=k8s1,k8s:node-role.kubernetes.io/control-plane,k8s:node-role.kubernetes.io/master,reserved:host" infoLabels="k8s:cilium.io/ci-node=k8s1,k8s:node-role.kubernetes.io/control-plane,k8s:node-role.kubernetes.io/master,reserved:host" subsys=endpoint
2021-06-02T22:08:47.037816164Z level=info msg="Resolving identity labels (blocking)" containerID= datapathPolicyRevision=0 desiredPolicyRevision=0 endpointID=2653 identity=1 identityLabels="k8s:cilium.io/ci-node=k8s1,k8s:node-role.kubernetes.io/control-plane,k8s:node-role.kubernetes.io/master,reserved:host" ipv4= ipv6= k8sPodName=/ subsys=endpoint
2021-06-02T22:08:47.037820655Z level=debug msg="Resolving identity for labels" containerID= datapathPolicyRevision=0 desiredPolicyRevision=0 endpointID=2653 identity=1 identityLabels="k8s:cilium.io/ci-node=k8s1,k8s:node-role.kubernetes.io/control-plane,k8s:node-role.kubernetes.io/master,reserved:host" ipv4= ipv6= k8sPodName=/ subsys=endpoint
2021-06-02T22:08:47.037824485Z level=debug msg="Resolving identity" identityLabels="k8s:cilium.io/ci-node=k8s1,k8s:node-role.kubernetes.io/control-plane,k8s:node-role.kubernetes.io/master,reserved:host" subsys=identity-cache
2021-06-02T22:08:47.037828071Z level=debug msg="Resolved reserved identity" identity=host identityLabels="k8s:cilium.io/ci-node=k8s1,k8s:node-role.kubernetes.io/control-plane,k8s:node-role.kubernetes.io/master,reserved:host" isNew=false subsys=identity-cache
2021-06-02T22:08:47.037831528Z level=debug msg="Assigned new identity to endpoint" containerID= datapathPolicyRevision=0 desiredPolicyRevision=0 endpointID=2653 identity=1 identityLabels="k8s:cilium.io/ci-node=k8s1,k8s:node-role.kubernetes.io/control-plane,k8s:node-role.kubernetes.io/master,reserved:host" ipv4= ipv6= k8sPodName=/ subsys=endpoint
2021-06-02T22:08:47.037835485Z level=debug msg="removing old and adding new identity" new=1 old=1 subsys=identitymanager
2021-06-02T22:08:47.037838767Z level=error msg="removing identity not added to the identity manager!" identity=1 subsys=identitymanager
[...]
2021-06-02T22:08:57.133117604Z level=info msg="Restored endpoint" endpointID=2653 ipAddr="[ ]" subsys=endpoint
This error happens regularly in CI because the host firewall tests set and unset a node label.