sec(indexing): pin DNS resolution in mem_fetch to defeat rebinding#794
Merged
sec(indexing): pin DNS resolution in mem_fetch to defeat rebinding#794
Conversation
`_validate_url` resolved the hostname to gate private-IP targets, then let httpx re-resolve at connect time — an attacker controlling DNS can flip the answer to a private IP between those two calls (CWE-918 SSRF via DNS rebinding). Pin the resolution: `_validate_and_pin` returns the hostname's first public IP; the TCP connection goes to that IP literal, with the original hostname preserved on the wire via the `Host` header and the `sni_hostname` httpx extension so vhosting + TLS cert validation still work. Each redirect hop is re-validated and re-pinned, and any resolution that returns a mix of public + private IPs is refused. Same path also gets: - 50 MB streaming response cap (Content-Length pre-check + per-chunk accumulator) so an unbounded chunked body can't exhaust memory. - Connection keepalive disabled — with an IP-literal origin, two requests to different hostnames could reuse a TLS session negotiated with the first SNI, silently serving traffic over the wrong cert. - `has_redirect_location` (not `is_redirect`) gates the redirect branch. httpx 0.28's `is_redirect` is True for the entire 3xx range, so 304 / Location-less 3xx previously slipped through as a silent empty-body fetch. `tests/test_ssrf_pinning.py` adds rebinding sims, IP-literal edge cases (IPv4-mapped IPv6, decimal-encoded IPv4), redirect-to-private, size cap, and a mutation-validation test that removes `_rewrite_to_pinned_ip` to prove the rebinding assertion isn't vacuous. Co-Authored-By: Claude <[email protected]>
The DNS-pinning rewrite used `urlparse(...).hostname` to build the Host header, which strips IPv6 brackets — `http://[2001:db8::1]:8080/` was emitting `Host: 2001:db8::1:8080`, ambiguous between IP+port and a longer IPv6 and rejected as malformed by RFC 7230 §5.4 strict servers. Bracket the host whenever it contains a colon (the unambiguous IPv6 marker; DNS hostnames cannot contain `:` per RFC 952/1123). End-to-end check via MockTransport: hostname input that resolves to IPv6 produces `Host: hostname:port`, IPv6-literal input produces `Host: [v6]:port`. Co-Authored-By: Claude <[email protected]>
Two related correctness gaps in the DNS-pinning request builder: 1. `_host_header` returned `urlparse(...).hostname` directly. For IDN URLs like `https://bücher.example/`, that yields a Unicode Host header — httpx's header normalisation then raises UnicodeEncodeError before the request even leaves the client. Same problem for the `sni_hostname` extension. RFC 3490 (IDNA) is the standard fix: encode DNS hostnames to ASCII (`bücher.example` → `xn--bcher-kva .example`) for both Host and SNI. We also IDNA-encode before passing to `socket.getaddrinfo` so Unicode IDNs resolve portably. 2. RFC 6066 §3 forbids IP literals as TLS SNI. The previous code forced `sni_hostname` for every HTTPS URL, including `https://1.2. 3.4/` and `https://[2001:db8::1]/`. Now we only override SNI for DNS hostnames; for IP-literal URLs we omit the extension and let httpcore fall back to the rewritten origin host (the pinned IP), which matches what httpx does natively for IP-literal HTTPS URLs. Tests in `test_ssrf_pinning.py` cover IDN URL (Host + SNI both ASCII), IDN with port, IPv4-literal HTTPS (no SNI), and IPv6-literal HTTPS (bracketed Host + no SNI). End-to-end via MockTransport: pre-fix the IDN case raised UnicodeEncodeError; post-fix the wire form is xn--bcher-kva.example. Co-Authored-By: Claude <[email protected]>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
PR-4 of the security hardening plan: pin DNS resolution in
mem_fetchso a TOCTOU rebinding attack between validate and connect can no longer reach private/internal addresses (CWE-918).The pre-existing
_validate_urlresolved the hostname to gate private targets and then dropped the IP — httpx re-resolved at connect time, giving an attacker who controls DNS a window to swing the answer to a private IP. The new_validate_and_pinkeeps the IP and the actual TCP connection goes to that IP literal. The original hostname is preserved on the wire via theHostheader and thesni_hostnamehttpx extension, so vhosting + TLS cert validation still work.Same path also gets:
has_redirect_locationinstead ofis_redirect—is_redirectis True across the entire 3xx range in httpx 0.28, so 304 / Location-less 3xx previously slipped through as a silent empty-body fetch. Now they raiseHTTPStatusError. (Caught by the Codex review pass.)The legacy
_validate_url(url) -> strkeeps the old fail-open-on-gaierror behaviour for back-compat with existing tests; the new pinning helper fails closed.Threat model
HTTPStatusErrorTests
tests/test_ssrf_pinning.py(26 cases) covers:httpx.MockTransport).::ffff:127.0.0.1), decimal-encoded IPv4 (http://2130706433/), with paired positive markers (public IPv4/IPv6) so the block isn't a tautology._rewrite_to_pinned_ipto identity, then asserts the rebinding test fails — proves the pinning assertion isn't vacuous.The legacy 28 tests in
test_ssrf.pyandtest_url_fetcher.pycontinue to pass unchanged.Test plan
uv run pytest packages/memtomem/tests/test_ssrf.py packages/memtomem/tests/test_ssrf_pinning.py packages/memtomem/tests/test_url_fetcher.py -q→ 57 passeduv run pytest -m "not ollama" -q→ 4106 passed, 11 skippeduv run ruff check packages/memtomem/src packages/memtomem/tests→ cleanuv run ruff format --check packages/memtomem/src packages/memtomem/tests→ cleanuv run mypy packages/memtomem/src/memtomem/indexing/url_fetcher.py→ cleanawait fetch_url("https://example.com/", tmp)→ 220 B md with frontmatterOut of scope
.envsecret rotation — see plan doc, separate tracks.mem_fetch; if a proxy is introduced later the pinning recipe will need a re-think (proxy CONNECT to IP vs. hostname).🤖 Generated with Claude Code