93 changes: 93 additions & 0 deletions rules/cre-2025-0122/aws-vpc-cni-ip-exhaustion-crisis.yaml
@@ -0,0 +1,93 @@
rules:
- cre:
id: CRE-2025-0122
severity: 0
title: AWS VPC CNI IP Address Exhaustion Crisis
category: networking-problem
author: Prequel
description: |
Critical AWS VPC CNI IP address exhaustion detected. Subnet IP exhaustion cascades into ENI
allocation failures, pod scheduling failures, and ultimately complete service unavailability.
The observed sequence is IP allocation errors, followed by ENI attachment failures, followed by
pod startup failures that block both cluster scaling and new workload deployment.
cause: |
- Subnet IP address pool exhaustion in VPC
- Maximum ENI limit reached per EC2 instance
- Secondary IP allocation failures on existing ENIs
- VPC CNI plugin configuration errors
- Insufficient subnet CIDR block size for cluster scale
- ENI warm pool depletion during traffic spikes
- AWS API rate limiting on EC2 ENI operations
- Security group or NACL blocking ENI operations
- IAM permissions missing for ENI management
- Cross-AZ networking constraints affecting IP allocation
impact: |
- CRITICAL: Complete inability to schedule new pods
- Existing pods fail to restart or scale
- Service degradation due to reduced pod capacity
- Cluster autoscaling failures and node provisioning issues
- Application deployment failures and rollback complications
- Load balancer health check failures due to unreachable pods
- Cascading failures across microservices architecture
- Data plane connectivity loss between pods
- Revenue loss from service unavailability
- Compliance violations for high-availability requirements
impactScore: 10
tags:
- aws
- vpc-cni
- kubernetes
- networking
- ip-exhaustion
- eni-allocation
- pod-scheduling
- cluster-scaling
- high-availability
- service-unavailability
mitigation: |
IMMEDIATE ACTIONS:
- Check available IPs in subnets: `aws ec2 describe-subnets --subnet-ids subnet-xxx`
- List ENIs attached to a node (to compare against the instance type's ENI limit): `aws ec2 describe-network-interfaces --filters Name=attachment.instance-id,Values=i-xxx`
- Monitor VPC CNI logs: `kubectl logs -n kube-system -l app=aws-node`
- Check pod scheduling: `kubectl get pods --all-namespaces | grep Pending`
- Verify CNI configuration: `kubectl get configmap -n kube-system aws-node -o yaml`

RECOVERY STEPS:
1. Add additional subnets with larger CIDR blocks
2. Increase ENI warm pool size: `kubectl set env daemonset aws-node -n kube-system WARM_ENI_TARGET=2`
3. Enable prefix delegation: `kubectl set env daemonset aws-node -n kube-system ENABLE_PREFIX_DELEGATION=true`
4. Scale down non-critical workloads to free IPs
5. Restart VPC CNI daemonset: `kubectl rollout restart daemonset/aws-node -n kube-system`
6. Monitor IP allocation recovery: `kubectl get pods -n kube-system -l app=aws-node`

PREVENTION:
- Implement IP address monitoring and alerting
- Provision subnets with larger CIDR blocks, or add secondary CIDR ranges to the VPC (subnet CIDRs cannot be resized in place)
- Set up VPC CNI metrics monitoring in CloudWatch
- Implement pod density limits per node
- Use prefix delegation for improved IP efficiency
- Regular capacity planning for cluster growth
- Implement network policy optimization
- Set up automated subnet provisioning
references:
- https://docs.aws.amazon.com/eks/latest/userguide/cni-increase-ip-addresses.html
- https://github.com/aws/amazon-vpc-cni-k8s/blob/master/docs/troubleshooting.md
- https://aws.amazon.com/blogs/containers/amazon-vpc-cni-increases-pods-per-node-limits/
- https://docs.aws.amazon.com/eks/latest/userguide/cni-custom-network.html
- https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/
applications:
- name: amazon-vpc-cni-k8s
version: ">= 1.7.0"
- name: kubernetes
version: ">= 1.18.0"
mitigationScore: 6
metadata:
gen: 1
id: 6E7meYDEvC5c6yub5dVgkW
kind: prequel
rule:
set:
event:
source: cre.log.aws-vpc-cni
match:
- regex: "failed to allocate a private IP address.*no available IP addresses|ENI allocation failed.*insufficient IP addresses|failed to assign private IP.*AddressLimitExceeded|pod.*failed.*no available IP|insufficient IP addresses in subnet|failed to create ENI.*AddressLimitExceeded|unable to provision ENI.*IP address limit|failed to allocate IP.*subnet has no available addresses|pod scheduling failed.*insufficient IP addresses|CNI failed to allocate IP.*no free addresses"
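The per-node ceiling behind several of these errors can be estimated with the standard EKS max-pods formula. A minimal sketch (the function name is illustrative, and the prefix-delegation branch is simplified; the official `max-pods-calculator.sh` script additionally applies per-instance-type caps such as 110 pods on smaller instances):

```python
def max_pods(eni_count: int, ipv4_per_eni: int, prefix_delegation: bool = False) -> int:
    """Estimate schedulable pods per node under the AWS VPC CNI.

    Standard mode: each ENI's primary IP is reserved for the node, so pods
    draw from (ipv4_per_eni - 1) addresses per ENI; the +2 accounts for
    host-networked system pods (aws-node, kube-proxy).
    Prefix-delegation mode (simplified): each secondary address slot holds
    a /28 prefix, i.e. 16 usable addresses instead of 1.
    """
    slots_per_eni = ipv4_per_eni - 1
    if prefix_delegation:
        slots_per_eni *= 16
    return eni_count * slots_per_eni + 2

# m5.large: 3 ENIs x 10 IPv4 addresses per ENI
print(max_pods(3, 10))        # 29 pods without prefix delegation
print(max_pods(3, 10, True))  # raw ceiling before EKS's recommended cap
```

This is why `ENABLE_PREFIX_DELEGATION=true` (recovery step 3 above) is such a large lever: it multiplies the usable addresses per ENI slot without consuming more subnet IPs per pod slot.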
21 changes: 21 additions & 0 deletions rules/cre-2025-0122/test.log

@@ -0,0 +1,21 @@
2025/07/02 08:29:03 [ERROR] aws-node-daemonset-xyz: ipamd.go:1234 failed to allocate ENI: AddressLimitExceeded: The maximum number of addresses has been reached.
2025/07/02 08:29:03 [ERROR] aws-node-daemonset-xyz: ipamd.go:1235 no available IP addresses in subnet
2025/07/02 08:29:03 [WARN] aws-node-daemonset-xyz: ipamd.go:1236 insufficient IP addresses available for new pods
2025/07/02 08:29:03 [ERROR] kubelet: event.go:294 FailedScheduling: 0/3 nodes are available: 3 Insufficient IP addresses in subnet
2025/07/02 08:29:03 [ERROR] kubelet: event.go:295 FailedScheduling: pod "test-app-deployment-abc123-xyz" failed to fit in any node
2025/07/02 08:29:03 [ERROR] scheduler: scheduler.go:456 Failed to schedule pod test-app/test-pod-789: Insufficient IP
2025/07/02 08:29:03 [ERROR] aws-node: cni.go:123 failed to assign an IP address to container: no available IP addresses in subnet
2025/07/02 08:29:03 [ERROR] aws-node: eni.go:234 failed to allocate ENI for pod test-pod-456: NetworkInterfaceLimitExceeded
2025/07/02 08:29:03 [ERROR] aws-node: ipam.go:345 IPAM: failed to get IP address from datastore: no available IP addresses
2025/07/02 08:29:03 [ERROR] aws-node: ec2.go:567 EC2 API error: AddressLimitExceeded - The maximum number of addresses has been reached
2025/07/02 08:29:03 [ERROR] aws-node: ec2.go:568 EC2 API error: NetworkInterfaceLimitExceeded - The maximum number of network interfaces has been reached
2025/07/02 08:29:03 [ERROR] aws-node: vpc.go:789 VPC CNI error: insufficient IP addresses in subnet for pod allocation
2025/07/02 08:29:03 [ERROR] cluster-autoscaler: scale_up.go:123 failed to scale up: nodes cannot accommodate new pods due to IP exhaustion in VPC
2025/07/02 08:29:03 [ERROR] karpenter: provisioner.go:234 failed to provision new node: insufficient IP addresses in subnet
2025/07/02 08:29:03 [ERROR] aws-load-balancer-controller: controller.go:345 failed to create target group: no available IP addresses
2025/07/02 08:29:03 [ERROR] deployment-controller: deployment.go:456 Deployment "critical-app" failed: pods cannot be scheduled due to IP exhaustion
2025/07/02 08:29:03 [ERROR] replicaset-controller: replicaset.go:567 ReplicaSet "web-app-rs" failed to create pods: Insufficient IP addresses
2025/07/02 08:29:03 [ERROR] statefulset-controller: statefulset.go:678 StatefulSet "database" stuck: cannot allocate IP addresses for new pods
2025/07/02 08:29:03 [ERROR] service-controller: service.go:789 Service "api-service" endpoints unavailable: pods failed to start due to IP exhaustion
2025/07/02 08:29:03 [ERROR] ingress-controller: ingress.go:890 Ingress "web-ingress" backend unavailable: target pods cannot be scheduled
2025/07/02 08:29:03 [ERROR] dns-controller: dns.go:901 DNS resolution failing: CoreDNS pods cannot be scheduled due to IP exhaustion
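The rule's `match` regex can be sanity-checked against lines like these. A quick sketch using Python's `re` (close enough to the Go-style alternations above for these literal patterns; two sample lines are taken from the test log, plus one invented healthy control line):

```python
import re

# Two alternations copied verbatim from the rule's match regex.
pattern = re.compile(
    r"insufficient IP addresses in subnet"
    r"|failed to create ENI.*AddressLimitExceeded"
)

lines = [
    "[ERROR] aws-node: vpc.go:789 VPC CNI error: insufficient IP addresses in subnet for pod allocation",
    "[ERROR] karpenter: provisioner.go:234 failed to provision new node: insufficient IP addresses in subnet",
    "[INFO] aws-node: ipamd.go:100 successfully assigned IP to pod",  # control: must not match
]

hits = [line for line in lines if pattern.search(line)]
print(len(hits))  # 2
```

Note that the pattern is case-sensitive as written, so test-log lines such as kubelet's "Insufficient IP addresses in subnet" (capital I) would not match; depending on the matcher, a case-insensitive flag or an extra alternation may be needed to cover them.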
6 changes: 6 additions & 0 deletions rules/tags/tags.yaml
@@ -830,3 +830,9 @@ tags:
- name: certificate-verification
displayName: Certificate Verification
description: Issues with SSL/TLS certificate verification including trust chain validation, certificate authority verification, and hostname matching
- name: pod-scheduling
displayName: Pod Scheduling
description: Issues with Kubernetes pod scheduling due to resource constraints or networking problems
- name: cluster-scaling
displayName: Cluster Scaling
description: Problems related to Kubernetes cluster scaling operations and capacity management
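A repository lint step could verify that newly registered tags are actually referenced by a rule. An illustrative sketch (the data below is a hand-copied subset of this PR, not a parser for the repo's YAML files):

```python
# Tags declared by the rule in aws-vpc-cni-ip-exhaustion-crisis.yaml.
rule_tags = {
    "aws", "vpc-cni", "kubernetes", "networking", "ip-exhaustion",
    "eni-allocation", "pod-scheduling", "cluster-scaling",
    "high-availability", "service-unavailability",
}

# The two entries this PR adds to rules/tags/tags.yaml.
new_registry_entries = [
    {"name": "pod-scheduling", "displayName": "Pod Scheduling"},
    {"name": "cluster-scaling", "displayName": "Cluster Scaling"},
]

# Every new registry entry should be used by at least this rule.
unused = [e["name"] for e in new_registry_entries if e["name"] not in rule_tags]
print(unused)  # []
```

Running the same check in the opposite direction (every tag a rule uses must exist in the registry) is what makes additions like these two entries necessary in the first place.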