Conversation
5c395d7 to b631246
pires left a comment
Thank you for your contribution. Can you please move the test over to protocol_test.go and change it so that it fails if n < payloadSize?
Also, please make sure the linters are happy 🙏🏻
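A minimal sketch of the assertion such a test could make (hedged: conn is the proxyproto-wrapped connection under test and payloadSize the number of payload bytes written by the test peer; both names are assumptions, and the connection setup is elided; imports: net, testing):

func assertSingleReadGetsPayload(t *testing.T, conn net.Conn, payloadSize int) {
	t.Helper()
	// A single Read with room to spare, the way a proxying consumer reads.
	buf := make([]byte, payloadSize+1)
	n, err := conn.Read(buf)
	if err != nil {
		t.Fatal(err)
	}
	if n < payloadSize {
		t.Fatalf("truncated read: got %d bytes, want %d", n, payloadSize)
	}
}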
if n < len(b) {
	nn, err := p.conn.Read(b[n:])
IIUC this blocks waiting for the buffer to be filled, which is somewhat of a violation of the io.Reader contract: "If some data is available but not len(p) bytes, Read conventionally returns what is available instead of waiting for more."
The bufio.Reader handled this: "The bytes are taken from at most one Read on the underlying Reader, hence n may be less than len(p). To read exactly len(p) bytes, use io.ReadFull(b, p)."
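For illustration, a runnable sketch of that difference, using testing/iotest.OneByteReader to simulate a reader that only has partial data available per call:

package main

import (
	"fmt"
	"io"
	"strings"
	"testing/iotest"
)

func main() {
	// OneByteReader returns at most one byte per Read, like a slow socket.
	r := iotest.OneByteReader(strings.NewReader("hello"))
	buf := make([]byte, 5)

	n, err := r.Read(buf) // returns what is available, not len(buf)
	fmt.Println(n, err)   // 1 <nil>

	r = iotest.OneByteReader(strings.NewReader("hello"))
	n, err = io.ReadFull(r, buf) // loops until buf is completely filled
	fmt.Println(n, err)          // 5 <nil>
}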
I don't think it blocks more than what the previous code was doing. With a multireader, assuming you have no buffered data, the MultiReader would have received an EOF from the first reader (bufReader) and would then have called the next reader (reader) with a Read(b) call (which might have blocked as well).
IIUC a bufio.Reader backed by an open net.Conn will only return io.EOF on .Read(b) when the net.Conn is closed.
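A quick sketch backing this up, with an in-memory net.Pipe standing in for a real connection:

package main

import (
	"bufio"
	"fmt"
	"net"
)

func main() {
	client, server := net.Pipe()
	go func() {
		server.Write([]byte("hi")) // blocks until the reader consumes it
		server.Close()             // only now can the other side observe EOF
	}()

	br := bufio.NewReader(client)
	buf := make([]byte, 8)

	n, err := br.Read(buf)
	fmt.Println(n, err) // 2 <nil>: no EOF while the conn is open

	n, err = br.Read(buf)
	fmt.Println(n, err) // 0 EOF: only after the peer closed
}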
	nn, err := p.conn.Read(b[n:])
	return n + nn, err
}
return n, nil
Following my comments above, I think this logic can be improved:
// Drain the buffer.
if p.bufReader != nil && p.bufReader.Buffered() > 0 {
	n, err := p.bufReader.Read(b)
	// If we got ANY data, return immediately. Do not block on conn.
	// We do not return EOF here because the underlying conn might have more.
	if err == io.EOF {
		p.bufReader = nil // Done with buffer
		return n, nil     // Return n, not EOF, so next Read hits p.conn
	}
	return n, err
}
// We're done with the buffer.
// From now on, read directly from the underlying connection.
return p.conn.Read(b)
This version does not fix the bug.
My expectation, e.g. in the ingress-nginx controller, when calling this library with a 16KiB buffer and using conn.Read(b), is that at most 16KiB will be read in a single call and, most importantly, that if say 2KiB are available on the connection (having been previously buffered or not), those 2KiB will end up in my buffer.
I don't want to use io.ReadFull in the upstream code to achieve that, as I don't necessarily need to fill the entire 16KiB buffer.
Perhaps my expectation is incorrect, but I think the following code in ingress-nginx should not need modification:
https://github.com/kubernetes/ingress-nginx/blob/a3088e571b36fee8052d550b7bc8c1c93bf7a1e7/pkg/tcpproxy/tcp.go#L60-L69
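That expectation boils down to a read loop of roughly this shape (a simplified sketch of the pattern, not the actual ingress-nginx source; imports: io):

// pipe forwards whatever each single Read returns, without ever asking
// the source to fill the whole buffer.
func pipe(src io.Reader, dst io.Writer) error {
	buf := make([]byte, 16384)
	for {
		n, err := src.Read(buf) // should return available bytes, up to 16KiB
		if n > 0 {
			if _, werr := dst.Write(buf[:n]); werr != nil {
				return werr
			}
		}
		if err != nil {
			return err // io.EOF once the connection is done
		}
	}
}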
In #142 (comment) you are agreeing with the assessment that conn.Read(b) returns whatever is available and doesn't block until b is full. What happened when the io.MultiReader was implemented was that we stopped piping the underlying conn directly to the library consumer, because the buffer reads would never return EOF, hence the io.MultiReader would never advance to the next io.Reader, namely the underlying conn.
The logic I proposed above is incorrect. Here's the revised one:
// Drain the buffer if it exists and has data.
if p.bufReader != nil {
	if p.bufReader.Buffered() > 0 {
		n, err := p.bufReader.Read(b)
		// Did we empty the buffer?
		// Buffering a net.Conn means the buffer doesn't return io.EOF
		// until the connection returns io.EOF. Therefore, we use
		// Buffered() == 0 to detect if we are done with the buffer.
		if p.bufReader.Buffered() == 0 {
			// Garbage collect the buffer.
			p.bufReader = nil
		}
		// Return immediately. Do not touch p.conn.
		// If err is EOF here, it means the connection is actually closed,
		// so we should return that error to the user anyway.
		return n, err
	}
	// If buffer was empty to begin with (shouldn't happen with the >0 check
	// but good for safety), clear it.
	p.bufReader = nil
}
// From now on, read directly from the underlying connection.
return p.conn.Read(b)
Ok, I see your point about the MultiReader, which never iterated to the underlying conn, so data was always read in 256-byte chunks. That was not what I was bothered by in the first place.
This 2nd version fixes that aspect, but it is still making this library act in a non-transparent way w.r.t. the user.
1b1d58a to 3ca07d1
I've modified the code to add the
What do you think? It's not an easy question 🤔
There's a severe bug in go-proxyproto, making it so that Conns wrapped by this library provide truncated data to the upstream. The core issue is that a MultiReader only concatenates its input Readers when those return an EOF. If an underlying input reader returns a number of bytes (because data was buffered, e.g. after a Peek()), then the MultiReader abstraction will not read data from the other input Readers. This impacts e.g. ingress-nginx v1.14.x and breaks TLS passthrough in some scenarios, typically when the combined PROXY protocol header and TLS header are > 256 bytes. Here's the core issue at play on the Go playground: https://go.dev/play/p/wd2VVCvOPb_7
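A minimal reproduction of that per-Read behavior (a sketch along the lines of the playground link, not its exact code):

package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

func main() {
	// Mimic header parsing: a small bufio.Reader peeks at the stream and
	// is left holding a few buffered payload bytes.
	src := strings.NewReader("HEADERpayload-that-should-follow")
	br := bufio.NewReaderSize(src, 16)
	br.Peek(6)    // inspect the 6-byte header; this buffers 16 bytes
	br.Discard(6) // consume the header

	mr := io.MultiReader(br, src)

	buf := make([]byte, 64) // plenty of room for the whole remaining payload
	n, err := mr.Read(buf)
	// Only the bytes buffered in br come back: MultiReader does not continue
	// into the next reader within a single Read call, because br returned
	// data rather than io.EOF.
	fmt.Printf("n=%d err=%v data=%q\n", n, err, buf[:n]) // n=10 data="payload-th"
}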
Remove the multireader abstraction and modify conn.Read() so that we start by draining buffered data (from the header parsing), then read directly from the underlying connection in a single Read() call. This fixes data loss in TLS passthrough scenarios (e.g. ingress-nginx) when the PROXY header + TLS ClientHello exceeds the 256-byte buffer.
3ca07d1 to 979a453
I've been thinking about that since yesterday, trying to argue for implementing a fix on the user/caller side (nginx-controller) or in this library. Here are my thoughts (with some help from claude.ai for the formulation) for implementing my fix:
TLDR;
I share the same expectations, and that's what I'm proposing in #142 (comment).
_ = conn.SetReadDeadline(time.Now().Add(time.Second))

buf := make([]byte, 16384)
n, _ := conn.Read(buf)
This is where our understanding of the io.Reader.Read(b) contract differs. In my understanding, Read(b) returns available data immediately instead of waiting to fill b. A way to ensure b is filled is to rely on io.ReadFull.
That is the case for a typical reader. Here, however, your library is acting as a transparent "pipe" to an underlying conn from which we abstract away the proxy protocol header. Therefore, I think it's reasonable to also transparently forward the Read() call to the underlying Conn, as if the library hadn't done anything.
Also, if b is a 16KiB buffer and I use conn.Read(b), it's not going to wait until b is filled: if 4KiB are available (a typical Linux socket buffer size), it'll return 4096, nil.
I misled you for some weird reason. The theoretical maximum size of a v2 header is 65,551 bytes (16 bytes fixed + 65,535 bytes variable).
Closed, as we are likely not going to break the io.Reader contract.
This PR removes the multireader abstraction and modifies conn.Read() so that we start by draining buffered data (from the header parsing), then read directly from the underlying connection in a single Read() call, instead of the current behavior where, if a larger buffer is provided, only buffered data is written to that buffer.
It fixes data loss in TLS passthrough scenarios (e.g. ingress-nginx) when the PROXY header + TLS ClientHello exceeds the 256-byte buffer (cf. kubernetes/ingress-nginx#14489).
This PR also adds a test to cover that exact issue.
@pires