feat(bundler): private-by-default agentgateway inference-gateway exposure#1450
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThis PR changes Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related issues
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/bundler/bundler.go`:
- Around line 1213-1216: The warning in the bundler branch that logs explicit
any-source exposure is IPv4-specific even when the matched range is IPv6. Update
the log message in the `slog.Warn` call near the agentgateway source-ranges
handling to be protocol-agnostic, either by saying “any-source CIDR” or by
including the actual matched range value so both `0.0.0.0/0` and `::/0` are
reported correctly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: c89d8354-f82f-4dfd-98c7-dc380a2b2219
📒 Files selected for processing (11)
docs/user/cli-reference.mddocs/user/component-catalog.mdpkg/bundler/bundler.gopkg/bundler/bundler_test.gopkg/netutil/cidr.gopkg/netutil/cidr_test.gotests/chainsaw/ai-conformance/offline/chainsaw-test.yamltests/chainsaw/bundle-templates/agentgateway/chainsaw-test.yamltests/uat/aws/tests/cuj2-inference/chainsaw-test.yamltests/uat/azure/tests/cuj2-inference/chainsaw-test.yamltests/uat/gcp/tests/cuj2-inference/chainsaw-test.yaml
e8404fb to
2e338d9
Compare
Recipe evidence checkNo leaf overlays affected by this PR. This gate is warning-only and never blocks merge. |
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
pkg/bundler/bundler.go (1)
1234-1237: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick winMake the any-source warning log protocol-agnostic.
The
exposureOpenbranch also covers explicit::/0opt-ins, but thisslog.Warnline hardcodes0.0.0.0/0, so IPv6-only exposure is logged as IPv4. Use an "any-source CIDR" phrasing (or include the matched range).Suggested fix
- slog.Warn("agentgateway inference-gateway is explicitly opened to 0.0.0.0/0 (deliberate any-source opt-in)", + slog.Warn("agentgateway inference-gateway is explicitly opened via an any-source CIDR (deliberate public opt-in)", "component", agentgatewayComponentName, "path", agentgatewaySourceRangesPath, )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/bundler/bundler.go` around lines 1234 - 1237, The warning in the `exposureOpen` branch is IPv4-specific because it hardcodes `0.0.0.0/0` even though it also covers `::/0` opt-ins. Update the `slog.Warn` message in the agentgateway exposure logging path to be protocol-agnostic by using an “any-source CIDR” phrasing or by logging the actual matched range, while keeping the existing `agentgatewayComponentName` and `agentgatewaySourceRangesPath` context.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/chainsaw/bundle-templates/agentgateway/chainsaw-test.yaml`:
- Around line 74-78: The negative any-source check in the chainsaw test is too
broad because it scans the entire manifest and can match commented text instead
of the rendered source ranges. Update the assertions around the manifest check
in the test near the loadBalancerSourceRanges validation so that the `0.0.0.0/0`
absence is verified only within the actual `loadBalancerSourceRanges` list
entries, or by reading that YAML path directly, using the existing `MANIFEST`
check block as the locator.
---
Duplicate comments:
In `@pkg/bundler/bundler.go`:
- Around line 1234-1237: The warning in the `exposureOpen` branch is
IPv4-specific because it hardcodes `0.0.0.0/0` even though it also covers `::/0`
opt-ins. Update the `slog.Warn` message in the agentgateway exposure logging
path to be protocol-agnostic by using an “any-source CIDR” phrasing or by
logging the actual matched range, while keeping the existing
`agentgatewayComponentName` and `agentgatewaySourceRangesPath` context.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 513c09e1-41aa-481f-bbea-3fadd5355a94
📒 Files selected for processing (10)
docs/user/cli-reference.mddocs/user/component-catalog.mdpkg/bundler/bundler.gopkg/bundler/bundler_test.gopkg/netutil/cidr.gopkg/netutil/cidr_test.gorecipes/components/agentgateway/manifests/inference-gateway.yamlrecipes/components/agentgateway/values.yamltests/chainsaw/ai-conformance/offline/chainsaw-test.yamltests/chainsaw/bundle-templates/agentgateway/chainsaw-test.yaml
2e338d9 to
7ac34f0
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/user/component-catalog.md`:
- Around line 71-75: The exposure section is inconsistent: it still describes
agentgateway as internet-facing by default even though the new bundling behavior
makes it private by default and validates allowedSourceRanges. Update the
component-catalog exposition around agentgateway, aicr bundle, and the nearby
--set example so they all describe the same default behavior, the opt-in
override path, and the failure/validation case consistently.
In `@pkg/bundler/bundler_test.go`:
- Around line 2073-2145: The test table for classifyAgentgatewaySourceRanges is
missing coverage for two new input branches: a valid []string input path and a
non-string element inside a []any list. Add one case that passes a []string
containing a scoped CIDR to verify the []string contract, and one case that
passes []any{123} to verify non-string list entries are rejected. Keep the
additions alongside the existing allowedSourceRanges scenarios in
bundler_test.go so the behavior stays pinned.
In `@tests/chainsaw/bundle-templates/agentgateway/chainsaw-test.yaml`:
- Around line 72-81: The default-private source range assertions in the chainsaw
test currently exclude only the IPv4 any-source range, so update the rendered
Service checks to also verify that the IPv6 any-source range is absent. In the
same assertion block that uses grep against MANIFEST, add a negative match for
::/0 alongside the existing 0.0.0.0/0 check, keeping the pattern anchored to the
loadBalancerSourceRanges list entries so it does not match comments or unrelated
content.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 09db3849-a510-46f7-bf14-30141ad67da4
📒 Files selected for processing (11)
docs/user/bundling.mddocs/user/cli-reference.mddocs/user/component-catalog.mdpkg/bundler/bundler.gopkg/bundler/bundler_test.gopkg/netutil/cidr.gopkg/netutil/cidr_test.gorecipes/components/agentgateway/manifests/inference-gateway.yamlrecipes/components/agentgateway/values.yamltests/chainsaw/ai-conformance/offline/chainsaw-test.yamltests/chainsaw/bundle-templates/agentgateway/chainsaw-test.yaml
njhensley
left a comment
There was a problem hiding this comment.
✅ Review — private-by-default agentgateway inference-gateway exposure
Method. Reviewed across four lenses (correctness, security, domain/architecture, operability) with an adversarial verification pass; findings re-derived against the resolved code at 7ac34f0.
Legend: 🔴 Blocker · 🟠 Major · 🟡 Minor · 🔵 Nitpick
Overall
The private-by-default design is sound and a clear improvement over an open default: an unset/empty allowedSourceRanges now injects the RFC1918 private set into loadBalancerSourceRanges so the gateway denies the public internet while bundles still generate (no breaking failure). I verified:
resolveAgentgatewayExposuremutates the mergedcomponentValuesin the unset case before render, at the same call site as the prior warning — ordering is correct, so overlay/componentRef/--set-jsonoverrides are all seen first.IsValidCIDRis now strict (netip.ParsePrefix+p == p.Masked()+ rejectIs4In6). This fail-fasts non-canonical CIDRs before a K8s apply failure and closes the IPv4-mapped-IPv6 (::ffff:…) any-source blind spot. No existing in-repo CIDR (all canonical) is broken by the stricter check.- The
bundle-defaults-privatechainsaw step asserts the three RFC1918 entries render and0.0.0.0/0does not, with anchored regex.
No blockers, no Majors. The remaining items are minor/discussion-level (see inline). Recommendation: approve — please consider a release-note callout for the open→private reachability change before merge.
Tier summary
| 🔴 Blocker | 🟠 Major | 🟡 Minor | 🔵 Nitpick |
|---|---|---|---|
| 0 | 0 | 2 | 2 |
Reviewed with a multi-persona + adversarial meta-review pass.
7ac34f0 to
a15308d
Compare
|
@coderabbitai full review |
There was a problem hiding this comment.
♻️ Duplicate comments (1)
recipes/components/agentgateway/manifests/inference-gateway.yaml (1)
42-46: 🔒 Security & Privacy | 🟠 MajorKeep the safe default in the chart, not only in bundler code.
{{- if $agw.allowedSourceRanges }}still leaves directhelm template/ GitOps renders with the old open0.0.0.0/0behavior when the bundler hasn't prefilled values. If bundle-only enforcement is intentional, say so explicitly; otherwise seed the RFC1918 list here too so the chart is safe when rendered standalone.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@recipes/components/agentgateway/manifests/inference-gateway.yaml` around lines 42 - 46, The chart’s default source-range handling is still unsafe when rendered outside the bundler, so the safe RFC1918 restriction needs to live in the chart itself. Update the allowedSourceRanges logic in inference-gateway.yaml so standalone helm template/GitOps renders also seed the private ranges, using the existing $agw.allowedSourceRanges path and related Service rendering, rather than relying only on bundler injection. If bundle-only enforcement is intended, make that explicit in the chart comments and behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@recipes/components/agentgateway/manifests/inference-gateway.yaml`:
- Around line 42-46: The chart’s default source-range handling is still unsafe
when rendered outside the bundler, so the safe RFC1918 restriction needs to live
in the chart itself. Update the allowedSourceRanges logic in
inference-gateway.yaml so standalone helm template/GitOps renders also seed the
private ranges, using the existing $agw.allowedSourceRanges path and related
Service rendering, rather than relying only on bundler injection. If bundle-only
enforcement is intended, make that explicit in the chart comments and behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 7ad7c357-158c-4324-a6ea-7baede73d2a2
📒 Files selected for processing (11)
docs/user/bundling.mddocs/user/cli-reference.mddocs/user/component-catalog.mdpkg/bundler/bundler.gopkg/bundler/bundler_test.gopkg/netutil/cidr.gopkg/netutil/cidr_test.gorecipes/components/agentgateway/manifests/inference-gateway.yamlrecipes/components/agentgateway/values.yamltests/chainsaw/ai-conformance/offline/chainsaw-test.yamltests/chainsaw/bundle-templates/agentgateway/chainsaw-test.yaml
njhensley
left a comment
There was a problem hiding this comment.
✅ Re-approval — a15308d
My prior approval (7ac34f0) was auto-dismissed by the new push. Re-reviewed the incremental diff 7ac34f0..a15308d: the changes are non-substantive improvements, no production-logic change.
pkg/bundler/bundler.goandpkg/netutil/cidr.goare byte-identical to the approved revision — the private-by-default injection, call-site ordering, and strictIsValidCIDR(canonical + rejectIs4In6) are unchanged.bundler_test.go: +2 cases — a[]stringscoped list passes, and a non-string list entry errors. These close the two previously-untested branches inclassifyAgentgatewaySourceRanges. 👍- chainsaw
bundle-defaults-private: +1 assertion that::/0is absent from the default-private manifest. Strengthens the check. component-catalog.md: intro reworded for clarity.
The earlier inline notes (🟡 bundler-only safe default / 🟡 open→private release-note callout / 🔵 RFC1918 breadth) still stand as non-blocking discussion items. Approving.
…sure The agentgateway inference-gateway defaulted open to the internet: when allowedSourceRanges was empty/unset, no loadBalancerSourceRanges was rendered and the LoadBalancer accepted 0.0.0.0/0. aicr bundle only printed a non-blocking warning. Make the gateway private by default. When agentgateway is included and allowedSourceRanges is empty/unset, aicr bundle now injects the private RFC1918 ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16) into the generated Service's loadBalancerSourceRanges and records a bundle note. The deployed gateway is reachable from inside the cluster/VPC but denied to the public internet — it is never emitted open to 0.0.0.0/0 without an explicit opt-in. Kubernetes treats an empty loadBalancerSourceRanges as allow-all, so the safe default has to be a real list, not an empty one. - Empty/unset -> defaulted to RFC1918 private ranges (deny public), surfaced as a bundle note. Generation succeeds, so existing inference bundle pipelines (kwok lanes, runtime-install) keep working with no flag. - Scoped CIDR list -> passes silently. - Explicit any-source opt-in (0.0.0.0/0 / ::/0) -> allowed, logged loudly as a deliberate choice. - A bare-string --set, a non-list, an unparseable CIDR, or a non-canonical CIDR (e.g. 1.2.3.4/24) -> rejected with ErrCodeInvalidRequest. netutil.IsValidCIDR now enforces canonical form via netip.ParsePrefix (rejecting host-bits-set and leading-zero prefixes) so a bundle that generates does not later fail Kubernetes 1.36+ strict CIDR validation at Service apply time. Update the agentgateway chainsaw test to assert the private-by-default render; ai-conformance offline and aws/gcp/azure cuj2-inference tests pass an explicit scoped range. Refresh component-catalog and cli-reference docs. Fixes NVIDIA#1373 Review feedback: make the catalog exposure section internally consistent (the raw LoadBalancer is internet-facing; aicr bundle scopes it private by default), add an IPv6 ::/0 negative assertion to the chainsaw default-private check, and cover the []string and non-string-list-entry branches of classifyAgentgatewaySourceRanges.
a15308d to
f37406f
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/bundler/bundler.go`:
- Around line 1163-1178: The default source-range list in
agentgatewayDefaultSourceRanges only covers IPv4 RFC1918 ranges, so
private-by-default behavior is incomplete for dual-stack or IPv6-only
environments. Update the default allowlist used by resolveAgentgatewayExposure()
to include an appropriate private IPv6 range alongside the existing IPv4
entries, while keeping the public opt-in behavior for ::/0 unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: b8acc835-914f-4a9d-be10-c5a28dbdb603
📒 Files selected for processing (11)
docs/user/bundling.mddocs/user/cli-reference.mddocs/user/component-catalog.mdpkg/bundler/bundler.gopkg/bundler/bundler_test.gopkg/netutil/cidr.gopkg/netutil/cidr_test.gorecipes/components/agentgateway/manifests/inference-gateway.yamlrecipes/components/agentgateway/values.yamltests/chainsaw/ai-conformance/offline/chainsaw-test.yamltests/chainsaw/bundle-templates/agentgateway/chainsaw-test.yaml
Summary
Make
aicr bundleprivate by default for the agentgateway inference-gateway: whenagentgatewayis included andallowedSourceRangesis empty/unset, the bundler injects the private RFC1918 ranges intoloadBalancerSourceRangesinstead of emitting a LoadBalancer open to0.0.0.0/0. The deployed gateway is reachable inside the cluster/VPC but denied to the public internet.Motivation / Context
The inference-gateway defaulted open to the internet — an unset
allowedSourceRangesrendered noloadBalancerSourceRanges, and Kubernetes treats that as allow-all.aicr bundleonly printed a non-blocking warning. An unauthenticated, plaintext-HTTP inference endpoint open to the world is a real exposure (flagged on multiple clusters).Fixes: #1373
Related: #1139, #1160, #1161
Type of Change
Component(s) Affected
pkg/bundler,pkg/component/*)pkg/netutildocs/)Implementation Notes
resolveAgentgatewayExposureclassifiesagentgateway.allowedSourceRangesand resolves it default-then-validate:agentgatewayDefaultSourceRanges(RFC1918:10.0.0.0/8,172.16.0.0/12,192.168.0.0/16) into the merged values + bundle note. Denies the public internet; keeps in-VPC reachability. Generation succeeds, so no-flag inference pipelines keep working.0.0.0.0/0/::/0) → allowed, logged loudly + bundle warning (deliberate opt-in).--set, non-list, unparseable, or non-canonical CIDR) →ErrCodeInvalidRequest.netutil.IsValidCIDRnow usesnetip.ParsePrefixand requires canonical form (p == p.Masked()) and rejects IPv4-mapped IPv6 prefixes (p.Addr().Is4In6()), so a CIDR like1.2.3.4/24,1.2.3.0/024, or::ffff:192.12.2.0/120is rejected at bundle time rather than failing Kubernetes 1.36+ strict CIDR validation at Service apply.The default is a named constant, so switching to a deny-all sentinel (
255.255.255.255/32) later is a one-line change. An internal-LB default (per-cloud annotations) is a possible follow-up (--internal), per #1373.Design choice (private RFC1918 vs fail-closed vs deny-all): Kubernetes has no native deny-all for a LoadBalancer (empty = allow-all), so the default must be a real list. RFC1918 is "secure but not too limited" — no public exposure, gateway usable inside the VPC, and
validate/conformance pass with no flags — without firewalling every deployment to one customer network. Note: the corporate VPN egresses from a public IP (216.228.127.128/30), so it is not covered by the RFC1918 default and still requires the flag — that's the intended private-vs-public separation.Resolves the CI-breakage findings from cross-review: because no-flag bundles now generate (private) rather than error,
runtime-install/generate-bundle.shand the kwok inference lanes are unaffected — no per-workflow--set-jsonneeded.Testing
Manual end-to-end with a freshly built binary on an
eks/h100/ubuntu/inference/dynamorecipe:10.0.0.0/8,172.16.0.0/12,192.168.0.0/16into the inference-gateway Service; bundle note recorded--set-json …='["216.228.127.128/30"]'→ renders only the corp CIDR0.0.0.0/0→ generates with a loud warningVerified on EKS cluster
aicr3: redeployingagentgateway-postfrom a scoped bundle correctly reconcilesloadBalancerSourceRangesonto the liveinference-gatewayService (sentinel-swap test confirmed the field is live-controlled).Updated chainsaw
bundle-templates/agentgatewayto assert the private-by-default render — all source-range checks match the renderedloadBalancerSourceRangeslist entries (not the surrounding comments), confirming RFC1918 present and0.0.0.0/0absent.ai-conformance/offlineanduat/{aws,gcp,azure}/cuj2-inferencenow exercise the default path (the redundant scoped--set-jsonwas removed). Docs updated:component-catalog.md,cli-reference.md,bundling.md, plus the inline comments inrecipes/components/agentgateway/values.yamlandmanifests/inference-gateway.yaml.Coverage:
netutil.IsValidCIDRcovered incidr_test.go(incl. non-canonical rejection);resolveAgentgatewayExposure/classifyAgentgatewaySourceRangescovered by the rewrittenTestResolveAgentgatewayExposuretable (default-injection, scoped, any-source, invalid, non-canonical).Risk Assessment
Rollout notes: No-flag inference bundles now deploy a private gateway instead of a public one. Operators who need public/VPN access pass
--set-json agentgateway:allowedSourceRanges='["<cidr>"]'(corp VPN:216.228.127.128/30) or'["0.0.0.0/0"]'for deliberate public exposure.Release note (behavior change — inference-gateway open → private): The bundled
agentgatewayinference-gateway is now private by default. Operators who currently reach the endpoint from a public IP (corporate VPN egress, a public client) will be blocked on upgrade until they opt back in. Remediation: regenerate the bundle with--set-json agentgateway:allowedSourceRanges='["<your-cidr>"]'(or'["0.0.0.0/0"]'to restore the previous open behavior). This must be called out in the release notes.Checklist
-raceviamake testscope)git commit -S)