Skip to content

sec(indexing): pin DNS resolution in mem_fetch to defeat rebinding#794

Merged
memtomem merged 3 commits intomainfrom
sec/url-fetcher-dns-pin
May 5, 2026
Merged

sec(indexing): pin DNS resolution in mem_fetch to defeat rebinding#794
memtomem merged 3 commits intomainfrom
sec/url-fetcher-dns-pin

Conversation

@memtomem
Copy link
Copy Markdown
Owner

@memtomem memtomem commented May 5, 2026

Summary

PR-4 of the security hardening plan: pin DNS resolution in mem_fetch so a TOCTOU rebinding attack between validate and connect can no longer reach private/internal addresses (CWE-918).

The pre-existing _validate_url resolved the hostname to gate private targets and then dropped the IP — httpx re-resolved at connect time, giving an attacker who controls DNS a window to swing the answer to a private IP. The new _validate_and_pin keeps the IP and the actual TCP connection goes to that IP literal. The original hostname is preserved on the wire via the Host header and the sni_hostname httpx extension, so vhosting + TLS cert validation still work.

Same path also gets:

  • Streaming 50 MB body cap (Content-Length pre-check + per-chunk accumulator) — closes a small memory-exhaustion footgun in the same fetcher.
  • Keepalive disabled — with an IP-literal origin, two requests to different hostnames that resolve to the same IP could reuse a TLS session negotiated with the first SNI, silently serving traffic over the wrong cert. A fresh TCP+TLS per request avoids that.
  • has_redirect_location instead of is_redirectis_redirect is True across the entire 3xx range in httpx 0.28, so 304 / Location-less 3xx previously slipped through as a silent empty-body fetch. Now they raise HTTPStatusError. (Caught by the Codex review pass.)

The legacy _validate_url(url) -> str keeps the old fail-open-on-gaierror behaviour for back-compat with existing tests; the new pinning helper fails closed.

Threat model

Scenario Pre-fix Post-fix
Attacker DNS returns public IP at validate, private at connect Reaches private IP Connection goes to validated IP, hostname preserved
Attacker DNS returns mix of public + private IPs in one answer Could pick private at connect Whole resolution refused
Hostname is itself a decimal-encoded / hex-encoded private IP literal Platform-dependent Blocked at parse + DNS layers
Redirect points to a hostname that resolves to a private IP Followed Each hop re-validated, blocked
Server returns 60 MB chunked response Loaded into memory Aborted mid-stream with size-cap error
Server returns 304 / 302 with no Location Silent empty-body file Raises HTTPStatusError

Tests

tests/test_ssrf_pinning.py (26 cases) covers:

  • DNS rebinding sims (mixed result + second-resolve-flip with httpx.MockTransport).
  • IP-literal edge cases: IPv4-mapped IPv6 (::ffff:127.0.0.1), decimal-encoded IPv4 (http://2130706433/), with paired positive markers (public IPv4/IPv6) so the block isn't a tautology.
  • URL/Host/SNI rewriting (port, IPv6 brackets, userinfo).
  • Redirect-to-private blocked + redirect-to-public followed (positive marker).
  • Response size cap on Content-Length and on streamed body.
  • 304 / 302-no-Location / too-many-redirects branches.
  • Mutation-validation test: stubs _rewrite_to_pinned_ip to identity, then asserts the rebinding test fails — proves the pinning assertion isn't vacuous.

The legacy 28 tests in test_ssrf.py and test_url_fetcher.py continue to pass unchanged.

Test plan

  • uv run pytest packages/memtomem/tests/test_ssrf.py packages/memtomem/tests/test_ssrf_pinning.py packages/memtomem/tests/test_url_fetcher.py -q → 57 passed
  • uv run pytest -m "not ollama" -q → 4106 passed, 11 skipped
  • uv run ruff check packages/memtomem/src packages/memtomem/tests → clean
  • uv run ruff format --check packages/memtomem/src packages/memtomem/tests → clean
  • uv run mypy packages/memtomem/src/memtomem/indexing/url_fetcher.py → clean
  • Live network smoke: await fetch_url("https://example.com/", tmp) → 220 B md with frontmatter
  • Codex review pass — surfaced the 304 / Location-less 3xx regression; fixed in the same branch.

Out of scope

  • PII pattern coverage, multi-agent ACL, .env secret rotation — see plan doc, separate tracks.
  • HTTP/HTTPS proxy semantics: not currently used by mem_fetch; if a proxy is introduced later the pinning recipe will need a re-think (proxy CONNECT to IP vs. hostname).

🤖 Generated with Claude Code

pandas-studio and others added 3 commits May 5, 2026 15:12
`_validate_url` resolved the hostname to gate private-IP targets, then
let httpx re-resolve at connect time — an attacker controlling DNS can
flip the answer to a private IP between those two calls (CWE-918 SSRF
via DNS rebinding).

Pin the resolution: `_validate_and_pin` returns the hostname's first
public IP; the TCP connection goes to that IP literal, with the
original hostname preserved on the wire via the `Host` header and the
`sni_hostname` httpx extension so vhosting + TLS cert validation still
work. Each redirect hop is re-validated and re-pinned, and any
resolution that returns a mix of public + private IPs is refused.

Same path also gets:
- 50 MB streaming response cap (Content-Length pre-check + per-chunk
  accumulator) so an unbounded chunked body can't exhaust memory.
- Connection keepalive disabled — with an IP-literal origin, two
  requests to different hostnames could reuse a TLS session negotiated
  with the first SNI, silently serving traffic over the wrong cert.
- `has_redirect_location` (not `is_redirect`) gates the redirect
  branch. httpx 0.28's `is_redirect` is True for the entire 3xx range,
  so 304 / Location-less 3xx previously slipped through as a silent
  empty-body fetch.

`tests/test_ssrf_pinning.py` adds rebinding sims, IP-literal edge
cases (IPv4-mapped IPv6, decimal-encoded IPv4), redirect-to-private,
size cap, and a mutation-validation test that removes
`_rewrite_to_pinned_ip` to prove the rebinding assertion isn't vacuous.

Co-Authored-By: Claude <[email protected]>
The DNS-pinning rewrite used `urlparse(...).hostname` to build the
Host header, which strips IPv6 brackets — `http://[2001:db8::1]:8080/`
was emitting `Host: 2001:db8::1:8080`, ambiguous between IP+port and a
longer IPv6 and rejected as malformed by RFC 7230 §5.4 strict servers.

Bracket the host whenever it contains a colon (the unambiguous IPv6
marker; DNS hostnames cannot contain `:` per RFC 952/1123). End-to-end
check via MockTransport: hostname input that resolves to IPv6 produces
`Host: hostname:port`, IPv6-literal input produces `Host: [v6]:port`.

Co-Authored-By: Claude <[email protected]>
Two related correctness gaps in the DNS-pinning request builder:

1. `_host_header` returned `urlparse(...).hostname` directly. For IDN
   URLs like `https://bücher.example/`, that yields a Unicode Host
   header — httpx's header normalisation then raises UnicodeEncodeError
   before the request even leaves the client. Same problem for the
   `sni_hostname` extension. RFC 3490 (IDNA) is the standard fix:
   encode DNS hostnames to ASCII (`bücher.example` → `xn--bcher-kva
   .example`) for both Host and SNI. We also IDNA-encode before passing
   to `socket.getaddrinfo` so Unicode IDNs resolve portably.

2. RFC 6066 §3 forbids IP literals as TLS SNI. The previous code
   forced `sni_hostname` for every HTTPS URL, including `https://1.2.
   3.4/` and `https://[2001:db8::1]/`. Now we only override SNI for
   DNS hostnames; for IP-literal URLs we omit the extension and let
   httpcore fall back to the rewritten origin host (the pinned IP),
   which matches what httpx does natively for IP-literal HTTPS URLs.

Tests in `test_ssrf_pinning.py` cover IDN URL (Host + SNI both ASCII),
IDN with port, IPv4-literal HTTPS (no SNI), and IPv6-literal HTTPS
(bracketed Host + no SNI). End-to-end via MockTransport: pre-fix the
IDN case raised UnicodeEncodeError; post-fix the wire form is
xn--bcher-kva.example.

Co-Authored-By: Claude <[email protected]>
@memtomem memtomem merged commit b8630f5 into main May 5, 2026
11 checks passed
@github-actions github-actions Bot locked and limited conversation to collaborators May 5, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants