The agentgateway inference-gateway is provisioned as a public LoadBalancer (port 80, HTTP, unauthenticated). Its allowedSourceRanges defaults to [], which — by deliberate design (#1138) — leaves the Service open to 0.0.0.0/0. The default itself is defensible (a baked-in CIDR would firewall every downstream operator to NVIDIA's network), but the exposure is silent: nothing in aicr bundle output or validation surfaces that the inference gateway is internet-facing. An operator only discovers it by manually inspecting the live Service.
Discovered during GKE H100 inference e2e validation: the deployed inference-gateway had a public IP with loadBalancerSourceRanges empty (0.0.0.0/0), found only by hand.
Proposal (two guardrails — keep open-by-default, remove the silence)
Bundle-time warning — when the agentgateway component is included with an empty allowedSourceRanges, emit a warning in aicr bundle output, e.g.:
⚠ inference-gateway will be provisioned as an internet-facing LoadBalancer open to 0.0.0.0/0. Scope it to trusted networks via agentgateway.allowedSourceRanges (recipe componentRef override). See docs/user/component-catalog.md.
This mirrors the existing kube-prometheus-stack PVC storageClassName warning already printed by the bundler.
Conformance/security check — a validator check that, on a live cluster, flags an inference-gateway (or any agentgateway-managed) LoadBalancer Service whose spec.loadBalancerSourceRanges is empty while type=LoadBalancer. Reports the open exposure as a finding (warn/fail per policy).
Non-goals: (a) baking a CIDR into the shared overlay/default — would lock external operators out of their own gateway; (b) flipping the default to internal/ClusterIP — a stronger posture worth considering, but a separate maintainer decision with broad blast radius.
Pointers
recipes/components/agentgateway/values.yaml (allowedSourceRanges: [])
recipes/components/agentgateway/manifests/inference-gateway.yaml (loadBalancerSourceRanges under AgentgatewayParameters.spec.service.spec)
pkg/bundler output
Addressed by: #1163 (bundle-time warning + conformance finding). Related: #1161 / #1162 (--set-json CLI flag for scoping allowedSourceRanges).
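A sketch of the conformance check proposed above, added here as an illustration; it is not code from this issue or from the aicr project. The function name `OpenLoadBalancers` and the sample Services are assumptions. It checks every LoadBalancer Service rather than only agentgateway-managed ones, and goes slightly beyond the issue by also treating an explicit `0.0.0.0/0` or `::/0` entry as open.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// service is the subset of a Kubernetes Service the check reads; encoding/json
// matches the lower-camel-case JSON keys to these fields case-insensitively.
type service struct {
	Metadata struct{ Name, Namespace string }
	Spec     struct {
		Type                     string
		LoadBalancerSourceRanges []string
	}
}

// OpenLoadBalancers parses `kubectl get svc -A -o json` output and returns
// "namespace/name" for each LoadBalancer Service that accepts any source IP.
func OpenLoadBalancers(serviceList []byte) ([]string, error) {
	var list struct{ Items []service }
	if err := json.Unmarshal(serviceList, &list); err != nil {
		return nil, fmt.Errorf("parse ServiceList: %w", err)
	}
	var findings []string
	for _, s := range list.Items {
		if s.Spec.Type != "LoadBalancer" {
			continue // ClusterIP and NodePort are outside this check
		}
		open := len(s.Spec.LoadBalancerSourceRanges) == 0 // empty means 0.0.0.0/0
		for _, cidr := range s.Spec.LoadBalancerSourceRanges {
			if cidr == "0.0.0.0/0" || cidr == "::/0" {
				open = true // explicit allow-all entry, same exposure
			}
		}
		if open {
			findings = append(findings, s.Metadata.Namespace+"/"+s.Metadata.Name)
		}
	}
	return findings, nil
}

// sample is constructed for this example (all namespaces are made up).
const sample = `{"items":[{"metadata":{"name":"inference-gateway","namespace":"example-ns"},"spec":{"type":"LoadBalancer"}},
{"metadata":{"name":"scoped-gw","namespace":"example-ns"},"spec":{"type":"LoadBalancer","loadBalancerSourceRanges":["10.0.0.0/8"]}},
{"metadata":{"name":"internal","namespace":"example-ns"},"spec":{"type":"ClusterIP"}}]}`

func main() {
	findings, err := OpenLoadBalancers([]byte(sample))
	fmt.Println(findings, err)
}
```

Whether a finding is reported as a warning or a failure is left to the caller, matching the "warn/fail per policy" wording in the proposal.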