Skip to content

fix(conn): ensure ReadTimeout applies to each conn.Read#1612

Closed
GeorgeMac wants to merge 9 commits intoClickHouse:mainfrom
GeorgeMac:gm/read-timeout
Closed

fix(conn): ensure ReadTimeout applies to each conn.Read#1612
GeorgeMac wants to merge 9 commits intoClickHouse:mainfrom
GeorgeMac:gm/read-timeout

Conversation

@GeorgeMac
Copy link
Copy Markdown
Member

@GeorgeMac GeorgeMac commented Jul 20, 2025

Summary

Fixes #1607

This PR addresses the issue of ReadTimeout currently only applying to the first block returned on a query to ClickHouse.

For the Changelog:

Fix `ReadTimeout` so that it applies to each call to `conn.Read` not just the first read block

The first change in this PR introduces lib/proto updates to support some of the missing client encoding and server decoding functions for CH packets.
This was so that I could implement a fake CH server where we can define our own handlers:
https://github.com/GeorgeMac/clickhouse-go/blob/37404af3d6637594d79bf3345a20970ddba01c31/lib/testing/server.go#L57-L85

Then I used this fake server in a couple unit tests. The first a simple happy path SELECT 1, mostly to ensure that the fake server was working as expected alongside the client.

The second is the case which demonstrates the problem in the issue.
When the test suite is run with a timeout before the fix in 299fe0c:

=== RUN   TestConn_Query_ReadTimeout
time=2025-07-20T08:48:50.671+01:00 level=DEBUG msg="Packet type" type=0
time=2025-07-20T08:48:50.671+01:00 level=DEBUG msg="Handling handshake" handshake="clickhouse-go/2.37.2 (lv:go/1.24.4; os:darwin) 2.37.0" auth="{database: username:default password:}"
time=2025-07-20T08:48:50.671+01:00 level=DEBUG msg="Packet type" type=4
time=2025-07-20T08:48:50.671+01:00 level=DEBUG msg="Packet type" type=1
time=2025-07-20T08:48:50.671+01:00 level=DEBUG msg="Handling query" query="&{ID: ClientName:clickhouse-go/2.37.2 (lv:go/1.24.4; os:darwin) ClientVersion:2.37.0 ClientTCPProtocolVersion:54460 Span:{traceID:[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0] spanID:[0 0 0 0 0 0 0 0] traceFlags:0 traceState:{list:[]} remote:false} Body:SELECT 1 QuotaKey: Settings:[] Parameters:[] Compression:false InitialUser: InitialAddress:[::1]:53128}" blocks=[]
=== RUN   TestConn_Query_ReadTimeout/first_row_is_returned
=== RUN   TestConn_Query_ReadTimeout/second_row_timeout
panic: test timed out after 10s
        running tests:
                TestConn_Query_ReadTimeout (10s)
                TestConn_Query_ReadTimeout/second_row_timeout (10s)

Which nicely demonstrates in the panic output where the client blocks on rows.Next():

goroutine 5 [select]:
github.com/ClickHouse/clickhouse-go/v2.(*rows).Next(0x1400001e120)
        /Users/georgemacrorie/github/ClickHouse/clickhouse-go/clickhouse_rows.go:52 +0x118
github.com/ClickHouse/clickhouse-go/v2.TestConn_Query_ReadTimeout.func4(0x14000003340)
        /Users/georgemacrorie/github/ClickHouse/clickhouse-go/conn_test.go:119 +0x34
testing.tRunner(0x14000003340, 0x1400000c060)
        /opt/homebrew/Cellar/go/1.24.4/libexec/src/testing/testing.go:1792 +0xe4
created by testing.(*T).Run in goroutine 38
        /opt/homebrew/Cellar/go/1.24.4/libexec/src/testing/testing.go:1851 +0x374
FAIL    github.com/ClickHouse/clickhouse-go/v2  10.349s
FAIL

Checklist

Delete items not relevant to your PR:

@GeorgeMac GeorgeMac changed the title fix(conn): ensure ReadTimeout applies to each conn.Read and not just the first block fix(conn): ensure ReadTimeout applies to each conn.Read Jul 20, 2025
ctx, cancel := context.WithTimeout(context.Background(), time.Second)
})
t.Run("query which returns in time", func(t *testing.T) {
ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for the reviewer:

This particular context driven deadline is now being checked on each read down in the net stack (it is not being removed after first read) this test became unstable. I tweaked the timeout running the tests with -count 10 until I could get something consistent. Three seconds seemed enough to process all of the response.

@SpencerTorres
Copy link
Copy Markdown
Member

Thank you for reporting and submitting this, this is the correct fix.

However, the test server is A LOT of code to be adding for this one patch. It's a lot of code that will need to be maintained, and I would prefer to find a way to test this through the real ClickHouse server. I fear long term that something will break with the test server and then we have to update both, or vice versa.

Ultimately what we need is a sleepNthBlock style setting or debug function in ClickHouse. I already tried reproducing this behavior through sleep and sleepEachRow, but there's no way to tell the server to send the first block fast and the second block slower.

Considering the fix is incredibly simple, I decided to just merge the related solution, and we can create an issue to improve testing on this later. We already have some code that tests the read timeout setting and context deadline, so it's not like this is a completely untested feature.

Thank you for the solution and detailed comments + PR! I will get this released asap.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ReadTimeout appears to apply to the first read block only

2 participants