93 changes: 93 additions & 0 deletions rules/cre-2025-0122/aws-vpc-cni-ip-exhaustion-crisis.yaml
@@ -0,0 +1,93 @@
rules:
- cre:
id: CRE-2025-0122
severity: 0
title: AWS VPC CNI IP Address Exhaustion Crisis
category: networking-problem
author: Prequel
description: |
Critical AWS VPC CNI IP address exhaustion detected. Subnet IP exhaustion cascades into ENI
allocation failures, pod scheduling failures, and ultimately complete service unavailability.
The observed sequence is IP allocation errors, followed by ENI attachment failures, followed by
pod startup failures that block both cluster scaling and new workload deployment.
cause: |
- Subnet IP address pool exhaustion in VPC
- Maximum ENI limit reached per EC2 instance
- Secondary IP allocation failures on existing ENIs
- VPC CNI plugin configuration errors
- Insufficient subnet CIDR block size for cluster scale
- ENI warm pool depletion during traffic spikes
- AWS API rate limiting on EC2 ENI operations
- Security group or NACL blocking ENI operations
- IAM permissions missing for ENI management
- Cross-AZ networking constraints affecting IP allocation
impact: |
- CRITICAL: Complete inability to schedule new pods
- Existing pods fail to restart or scale
- Service degradation due to reduced pod capacity
- Cluster autoscaling failures and node provisioning issues
- Application deployment failures and rollback complications
- Load balancer health check failures due to unreachable pods
- Cascading failures across microservices architecture
- Data plane connectivity loss between pods
- Revenue loss from service unavailability
- Compliance violations for high-availability requirements
impactScore: 10
tags:
- aws
- vpc-cni
- kubernetes
- networking
- ip-exhaustion
- eni-allocation
- pod-scheduling
- cluster-scaling
- high-availability
- service-unavailability
mitigation: |
IMMEDIATE ACTIONS:
- Check available IPs in subnets: `aws ec2 describe-subnets --subnet-ids subnet-xxx`
- List ENIs attached to a node (to compare against the instance type's ENI limit): `aws ec2 describe-network-interfaces --filters Name=attachment.instance-id,Values=i-xxx`
- Monitor VPC CNI logs: `kubectl logs -n kube-system -l app=aws-node`
- Check pod scheduling: `kubectl get pods --all-namespaces | grep Pending`
- Verify CNI configuration: `kubectl get configmap -n kube-system aws-node -o yaml`

RECOVERY STEPS:
1. Add additional subnets with larger CIDR blocks
2. Increase ENI warm pool size: `kubectl set env daemonset aws-node -n kube-system WARM_ENI_TARGET=2`
3. Enable prefix delegation: `kubectl set env daemonset aws-node -n kube-system ENABLE_PREFIX_DELEGATION=true`
4. Scale down non-critical workloads to free IPs
5. Restart VPC CNI daemonset: `kubectl rollout restart daemonset/aws-node -n kube-system`
6. Monitor IP allocation recovery: `kubectl get pods -n kube-system -l app=aws-node`

PREVENTION:
- Implement IP address monitoring and alerting
- Provision subnets with larger CIDR blocks, or add secondary CIDR ranges to the VPC (subnet CIDRs cannot be resized in place)
- Set up VPC CNI metrics monitoring in CloudWatch
- Implement pod density limits per node
- Use prefix delegation for improved IP efficiency
- Regular capacity planning for cluster growth
- Implement network policy optimization
- Set up automated subnet provisioning
references:
- https://docs.aws.amazon.com/eks/latest/userguide/cni-increase-ip-addresses.html
- https://github.com/aws/amazon-vpc-cni-k8s/blob/master/docs/troubleshooting.md
- https://aws.amazon.com/blogs/containers/amazon-vpc-cni-increases-pods-per-node-limits/
- https://docs.aws.amazon.com/eks/latest/userguide/cni-custom-network.html
- https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/
applications:
- name: amazon-vpc-cni-k8s
version: ">= 1.7.0"
- name: kubernetes
version: ">= 1.18.0"
mitigationScore: 6
metadata:
gen: 1
id: 6E7meYDEvC5c6yub5dVgkW
kind: prequel
rule:
set:
event:
source: cre.log.aws-vpc-cni
match:
- regex: "failed to allocate a private IP address.*no available IP addresses|ENI allocation failed.*insufficient IP addresses|failed to assign private IP.*AddressLimitExceeded|pod.*failed.*no available IP|insufficient IP addresses in subnet|failed to create ENI.*AddressLimitExceeded|unable to provision ENI.*IP address limit|failed to allocate IP.*subnet has no available addresses|pod scheduling failed.*insufficient IP addresses|CNI failed to allocate IP.*no free addresses"
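The per-node ceiling behind several of these errors can be estimated with the standard EKS max-pods formula. A minimal sketch (the function name is illustrative, and the prefix-delegation branch is simplified; the official `max-pods-calculator.sh` script additionally applies per-instance-type caps such as 110 pods on smaller instances):

```python
def max_pods(eni_count: int, ipv4_per_eni: int, prefix_delegation: bool = False) -> int:
    """Estimate schedulable pods per node under the AWS VPC CNI.

    Standard mode: each ENI's primary IP is reserved for the node, so pods
    draw from (ipv4_per_eni - 1) addresses per ENI; the +2 accounts for
    host-networked system pods (aws-node, kube-proxy).
    Prefix-delegation mode (simplified): each secondary address slot holds
    a /28 prefix, i.e. 16 usable addresses instead of 1.
    """
    slots_per_eni = ipv4_per_eni - 1
    if prefix_delegation:
        slots_per_eni *= 16
    return eni_count * slots_per_eni + 2

# m5.large: 3 ENIs x 10 IPv4 addresses per ENI
print(max_pods(3, 10))        # 29 pods without prefix delegation
print(max_pods(3, 10, True))  # raw ceiling before EKS's recommended cap
```

This is why `ENABLE_PREFIX_DELEGATION=true` (recovery step 3 above) is such a large lever: it multiplies the usable addresses per ENI slot without consuming more subnet IPs per pod slot.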
21 changes: 21 additions & 0 deletions rules/cre-2025-0122/test.log

@@ -0,0 +1,21 @@
2025/07/02 08:29:03 [ERROR] aws-node-daemonset-xyz: ipamd.go:1234 failed to allocate ENI: AddressLimitExceeded: The maximum number of addresses has been reached.
2025/07/02 08:29:03 [ERROR] aws-node-daemonset-xyz: ipamd.go:1235 no available IP addresses in subnet
2025/07/02 08:29:03 [WARN] aws-node-daemonset-xyz: ipamd.go:1236 insufficient IP addresses available for new pods
2025/07/02 08:29:03 [ERROR] kubelet: event.go:294 FailedScheduling: 0/3 nodes are available: 3 Insufficient IP addresses in subnet
2025/07/02 08:29:03 [ERROR] kubelet: event.go:295 FailedScheduling: pod "test-app-deployment-abc123-xyz" failed to fit in any node
2025/07/02 08:29:03 [ERROR] scheduler: scheduler.go:456 Failed to schedule pod test-app/test-pod-789: Insufficient IP
2025/07/02 08:29:03 [ERROR] aws-node: cni.go:123 failed to assign an IP address to container: no available IP addresses in subnet
2025/07/02 08:29:03 [ERROR] aws-node: eni.go:234 failed to allocate ENI for pod test-pod-456: NetworkInterfaceLimitExceeded
2025/07/02 08:29:03 [ERROR] aws-node: ipam.go:345 IPAM: failed to get IP address from datastore: no available IP addresses
2025/07/02 08:29:03 [ERROR] aws-node: ec2.go:567 EC2 API error: AddressLimitExceeded - The maximum number of addresses has been reached
2025/07/02 08:29:03 [ERROR] aws-node: ec2.go:568 EC2 API error: NetworkInterfaceLimitExceeded - The maximum number of network interfaces has been reached
2025/07/02 08:29:03 [ERROR] aws-node: vpc.go:789 VPC CNI error: insufficient IP addresses in subnet for pod allocation
2025/07/02 08:29:03 [ERROR] cluster-autoscaler: scale_up.go:123 failed to scale up: nodes cannot accommodate new pods due to IP exhaustion in VPC
2025/07/02 08:29:03 [ERROR] karpenter: provisioner.go:234 failed to provision new node: insufficient IP addresses in subnet
2025/07/02 08:29:03 [ERROR] aws-load-balancer-controller: controller.go:345 failed to create target group: no available IP addresses
2025/07/02 08:29:03 [ERROR] deployment-controller: deployment.go:456 Deployment "critical-app" failed: pods cannot be scheduled due to IP exhaustion
2025/07/02 08:29:03 [ERROR] replicaset-controller: replicaset.go:567 ReplicaSet "web-app-rs" failed to create pods: Insufficient IP addresses
2025/07/02 08:29:03 [ERROR] statefulset-controller: statefulset.go:678 StatefulSet "database" stuck: cannot allocate IP addresses for new pods
2025/07/02 08:29:03 [ERROR] service-controller: service.go:789 Service "api-service" endpoints unavailable: pods failed to start due to IP exhaustion
2025/07/02 08:29:03 [ERROR] ingress-controller: ingress.go:890 Ingress "web-ingress" backend unavailable: target pods cannot be scheduled
2025/07/02 08:29:03 [ERROR] dns-controller: dns.go:901 DNS resolution failing: CoreDNS pods cannot be scheduled due to IP exhaustion
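The rule's `match` regex can be sanity-checked against lines like these. A quick sketch using Python's `re` (close enough to the Go-style alternations above for these literal patterns; two sample lines are taken from the test log, plus one invented healthy control line):

```python
import re

# Two alternations copied verbatim from the rule's match regex.
pattern = re.compile(
    r"insufficient IP addresses in subnet"
    r"|failed to create ENI.*AddressLimitExceeded"
)

lines = [
    "[ERROR] aws-node: vpc.go:789 VPC CNI error: insufficient IP addresses in subnet for pod allocation",
    "[ERROR] karpenter: provisioner.go:234 failed to provision new node: insufficient IP addresses in subnet",
    "[INFO] aws-node: ipamd.go:100 successfully assigned IP to pod",  # control: must not match
]

hits = [line for line in lines if pattern.search(line)]
print(len(hits))  # 2
```

Note that the pattern is case-sensitive as written, so test-log lines such as kubelet's "Insufficient IP addresses in subnet" (capital I) would not match; depending on the matcher, a case-insensitive flag or an extra alternation may be needed to cover them.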
6 changes: 6 additions & 0 deletions rules/tags/tags.yaml
@@ -830,3 +830,9 @@ tags:
- name: certificate-verification
displayName: Certificate Verification
description: Issues with SSL/TLS certificate verification including trust chain validation, certificate authority verification, and hostname matching
- name: pod-scheduling
displayName: Pod Scheduling
description: Issues with Kubernetes pod scheduling due to resource constraints or networking problems
- name: cluster-scaling
displayName: Cluster Scaling
description: Problems related to Kubernetes cluster scaling operations and capacity management
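A repository lint step could verify that newly registered tags are actually referenced by a rule. An illustrative sketch (the data below is a hand-copied subset of this PR, not a parser for the repo's YAML files):

```python
# Tags declared by the rule in aws-vpc-cni-ip-exhaustion-crisis.yaml.
rule_tags = {
    "aws", "vpc-cni", "kubernetes", "networking", "ip-exhaustion",
    "eni-allocation", "pod-scheduling", "cluster-scaling",
    "high-availability", "service-unavailability",
}

# The two entries this PR adds to rules/tags/tags.yaml.
new_registry_entries = [
    {"name": "pod-scheduling", "displayName": "Pod Scheduling"},
    {"name": "cluster-scaling", "displayName": "Cluster Scaling"},
]

# Every new registry entry should be used by at least this rule.
unused = [e["name"] for e in new_registry_entries if e["name"] not in rule_tags]
print(unused)  # []
```

Running the same check in the opposite direction (every tag a rule uses must exist in the registry) is what makes additions like these two entries necessary in the first place.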