fix(httputil): fix race in `ResponseChain` and optimize memory usage

This commit addresses a critical regression and improves memory efficiency in the `httputil` package.

Changes:

* Fix `FullResponseBytes`: previously it returned a slice referencing a buffer that was immediately returned to the pool, leading to data corruption (use-after-free). It now allocates a safe, independent copy.
* Make `Close()` idempotent: calling `Close()` multiple times no longer causes a panic.
* Optimize `FullResponseString` to use zero-copy conversion from the safe byte slice.
* Optimize `limitedBuffer` to use a `sync.Pool` for temporary 32KB read chunks (via `chunkSize` = `DefaultChunkSize`).

Signed-off-by: Dwi Siswanto <[email protected]>
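The use-after-free described above follows a common pattern with `sync.Pool`-backed buffers. A minimal sketch (hypothetical names, not the actual `httputil` code) contrasting the buggy lifecycle with the safe copy:

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// unsafeBytes returns a view into a pooled buffer that is
// immediately recycled -- the classic use-after-free: a later
// Get() may hand the same backing array to another caller,
// which then overwrites the bytes the first caller still holds.
func unsafeBytes(data string) []byte {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()
	buf.WriteString(data)
	b := buf.Bytes() // aliases the pooled backing array
	bufPool.Put(buf) // b's memory is now up for reuse -- bug!
	return b
}

// safeBytes allocates an independent copy before returning the
// buffer to the pool, so the result outlives the pool cycle.
func safeBytes(data string) []byte {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()
	buf.WriteString(data)
	out := make([]byte, buf.Len())
	copy(out, buf.Bytes())
	bufPool.Put(buf)
	return out
}

func main() {
	b := safeBytes("hello")
	// Churn the pool; the safe copy is unaffected.
	_ = unsafeBytes("XXXXX")
	fmt.Println(string(b)) // "hello" -- guaranteed, since b is an independent copy
}
```

The extra `make` + `copy` costs one allocation, which the benchmarks below show is the same price paid by manual concatenation anyway.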
```diff
-// The returned slice is valid only until Close() is called.
+// Note: This creates a new buffer internally which is returned to the pool.
+// The returned slice is a copy and remains valid even after Close() is called.
 func (r *ResponseChain) FullResponseBytes() []byte {
```
Here (and in `(*ResponseChain).FullResponseString()`), the inconsistency is intentional -- these now return a safe copy. The previous implementation (#700, yeah, my bad) failed to manage the buffer lifecycle correctly (it returned a pointer into a buffer that was immediately freed back to the pool), which caused the race condition.
And since the `(*ResponseChain).fullResponse` field was removed and headers + body are now stored in separate buffers, we must allocate new memory to concatenate them anyway -- rather than returning a fragile view into a temporary pooled buffer (which is racy).
tl;dr: `(*ResponseChain).FullResponse*` now returns a copy because physics demands it (the source data is non-contiguous).
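Since headers and body live in separate buffers, producing the full response necessarily allocates one new contiguous slice. A sketch of that single-allocation concatenation (illustrative, not the actual implementation):

```go
package main

import "fmt"

// fullResponse joins two non-contiguous buffers (headers and body)
// into one freshly allocated slice. The copy is unavoidable: the
// source data is not contiguous in memory, so no zero-copy view
// over both regions can exist.
func fullResponse(headers, body []byte) []byte {
	out := make([]byte, 0, len(headers)+len(body)) // single allocation
	out = append(out, headers...)
	out = append(out, body...)
	return out
}

func main() {
	h := []byte("HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\n")
	b := []byte("hello")
	full := fullResponse(h, b)
	fmt.Printf("%d bytes\n", len(full))
}
```

Pre-sizing the slice with `make(..., 0, len(h)+len(b))` keeps it to exactly one allocation, matching the `Bytes-Concat` variant benchmarked below.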
Shall we retract v0.7.0, @Mzack9999?
EDIT:
I don’t think we need to retract that since the issues are in the new API(s) anyway.
Bench:

```go
func BenchmarkResponseChain_FullResponse(b *testing.B) {
	body := bytes.Repeat([]byte("B"), 1024*1024) // 1MB
	resp := &http.Response{
		StatusCode: 200,
		Status:     "200 OK",
		Body:       io.NopCloser(bytes.NewReader(body)),
		Header: http.Header{
			"Content-Type": []string{"text/plain"},
			"Server":       []string{"Benchmark"},
		},
		Proto:      "HTTP/1.1",
		ProtoMajor: 1,
		ProtoMinor: 1,
	}
	rc := NewResponseChain(resp, -1)
	_ = rc.Fill()
	defer rc.Close()
	b.Run("String", func(b *testing.B) {
		b.ReportAllocs()
		for b.Loop() {
			_ = rc.FullResponseString()
		}
	})
	b.Run("String-Concat", func(b *testing.B) {
		b.ReportAllocs()
		for b.Loop() {
			_ = rc.HeadersString() + rc.BodyString()
		}
	})
	b.Run("Bytes", func(b *testing.B) {
		b.ReportAllocs()
		for b.Loop() {
			_ = rc.FullResponseBytes()
		}
	})
	b.Run("Bytes-Concat", func(b *testing.B) {
		b.ReportAllocs()
		for b.Loop() {
			h := rc.HeadersBytes()
			bo := rc.BodyBytes()
			buf := make([]byte, len(h)+len(bo))
			copy(buf, h)
			copy(buf[len(h):], bo)
		}
	})
}
```

```console
$ go test -run - -bench=BenchmarkResponseChain_FullResponse -benchmem -count 10 ./http | tee /tmp/fullresp
$ cat /tmp/fullresp | grep "/String-16" | tee /tmp/fullresp-string
$ cat /tmp/fullresp | grep "/Bytes-16" | tee /tmp/fullresp-bytes
$ cat /tmp/fullresp | grep "/String-Concat-16" | sed "s/Concat\-//g" | tee /tmp/fullresp-string-concat
$ cat /tmp/fullresp | grep "/Bytes-Concat-16" | sed "s/Concat\-//g" | tee /tmp/fullresp-bytes-concat
```

```console
$ benchstat fullresp-string fullresp-string-concat
goos: linux
goarch: amd64
pkg: github.com/projectdiscovery/utils/http
cpu: 11th Gen Intel(R) Core(TM) i9-11900H @ 2.50GHz
                                      │ fullresp-string │   fullresp-string-concat   │
                                      │     sec/op      │    sec/op      vs base     │
ResponseChain_FullResponse/String-16      529.3µ ± 24%    598.3µ ± 21%  ~ (p=0.853 n=10)
                                      │ fullresp-string │   fullresp-string-concat   │
                                      │      B/op       │     B/op       vs base     │
ResponseChain_FullResponse/String-16      1.008Mi ± 0%    1.008Mi ± 0%  ~ (p=0.169 n=10)
                                      │ fullresp-string │   fullresp-string-concat   │
                                      │    allocs/op    │  allocs/op     vs base     │
ResponseChain_FullResponse/String-16       1.000 ± 0%      1.000 ± 0%   ~ (p=1.000 n=10) ¹
¹ all samples are equal
$ benchstat fullresp-bytes fullresp-bytes-concat
goos: linux
goarch: amd64
pkg: github.com/projectdiscovery/utils/http
cpu: 11th Gen Intel(R) Core(TM) i9-11900H @ 2.50GHz
                                      │ fullresp-bytes  │   fullresp-bytes-concat    │
                                      │     sec/op      │    sec/op      vs base     │
ResponseChain_FullResponse/Bytes-16       557.9µ ± 10%    564.0µ ± 21%  ~ (p=1.000 n=10)
                                      │ fullresp-bytes  │   fullresp-bytes-concat    │
                                      │      B/op       │     B/op       vs base     │
ResponseChain_FullResponse/Bytes-16       1.008Mi ± 0%    1.008Mi ± 0%  ~ (p=1.000 n=10)
                                      │ fullresp-bytes  │   fullresp-bytes-concat    │
                                      │    allocs/op    │  allocs/op     vs base     │
ResponseChain_FullResponse/Bytes-16        1.000 ± 0%      1.000 ± 0%   ~ (p=1.000 n=10) ¹
¹ all samples are equal
```
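Both variants report exactly one allocation per op because the only allocation is the concatenated slice itself; the `String` path can then convert without a second copy. A sketch of the zero-copy conversion (using `unsafe.String`, Go 1.20+; the `bytesToString` name is illustrative, and the trick is only safe because the freshly allocated slice is never mutated afterwards):

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesToString reinterprets b as a string without copying its
// backing array. This is only safe when b is never modified after
// the conversion -- which holds for a freshly allocated, private
// full-response copy that is handed out exactly once.
func bytesToString(b []byte) string {
	if len(b) == 0 {
		return ""
	}
	return unsafe.String(&b[0], len(b))
}

func main() {
	b := []byte("HTTP/1.1 200 OK")
	s := bytesToString(b)
	fmt.Println(s == "HTTP/1.1 200 OK") // true
}
```

Compared with `string(b)`, this skips the copy the conversion would otherwise make, which is why the String and Bytes benchmarks show identical allocation counts.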
Additional context:
p.s. this (pre-fix) behavior was observed in https://github.com/projectdiscovery/nuclei/actions/runs/19680155957/job/56400995022?pr=6629 -- the perf-regression job hung indefinitely, so I canceled it manually (after 2h+, lol, I didn't notice) -- also confirmed locally:

```console
nuclei> make build-test; timeout 5m ./bin/nuclei.test -test.run - -test.bench=. -test.benchmem ./cmd/nuclei/.
```