Fix possible issues with MySQL client protocol TLS connections #65938
Merged
alexey-milovidov merged 1 commit into ClickHouse:master on Jul 1, 2024
Member
This is an automated comment for commit 9c7db67 with a description of existing statuses. It's updated for the latest CI run.
✅ Successful checks
Occasionally, 02479_mysql_connect_to_self fails on CI [1].

[1]: ClickHouse#50911

The problem was indeed the query profiler and EINTR, but not in the way you might think.

For such failures you may see the following trace in trace_log:

```
contrib/openssl/crypto/bio/bss_sock.c:127::sock_read
contrib/openssl/crypto/bio/bio_meth.c:121::bread_conv
contrib/openssl/crypto/bio/bio_lib.c:285::bio_read_intern
contrib/openssl/crypto/bio/bio_lib.c:311::BIO_read
contrib/openssl/ssl/record/methods/tls_common.c:398::tls_default_read_n
contrib/openssl/ssl/record/methods/tls_common.c:575::tls_get_more_records
contrib/openssl/ssl/record/methods/tls_common.c:1122::tls_read_record
contrib/openssl/ssl/record/rec_layer_s3.c:645::ssl3_read_bytes
contrib/openssl/ssl/s3_lib.c:4527::ssl3_read_internal
contrib/openssl/ssl/s3_lib.c:4551::ssl3_read
contrib/openssl/ssl/ssl_lib.c:2343::ssl_read_internal
contrib/openssl/ssl/ssl_lib.c:2357::SSL_read
contrib/mariadb-connector-c/libmariadb/secure/openssl.c:729::ma_tls_read
contrib/mariadb-connector-c/libmariadb/ma_tls.c:90::ma_pvio_tls_read
contrib/mariadb-connector-c/libmariadb/ma_pvio.c:250::ma_pvio_read
contrib/mariadb-connector-c/libmariadb/ma_pvio.c:297::ma_pvio_cache_read
contrib/mariadb-connector-c/libmariadb/ma_net.c:373::ma_real_read
contrib/mariadb-connector-c/libmariadb/ma_net.c:427::ma_net_read
contrib/mariadb-connector-c/libmariadb/mariadb_lib.c:192::ma_net_safe_read
contrib/mariadb-connector-c/libmariadb/mariadb_lib.c:2138::mthd_my_read_query_result
contrib/mariadb-connector-c/libmariadb/mariadb_lib.c:2212::mysql_real_query
src/Common/mysqlxx/Query.cpp:56::mysqlxx::Query::executeImpl()
src/Common/mysqlxx/Query.cpp:73::mysqlxx::Query::use()
src/Processors/Sources/MySQLSource.cpp:50::DB::MySQLSource::Connection::Connection()
```

After which the connection fails with:

```
Code: 1000. DB::Exception: Received from localhost:9000. DB::Exception: mysqlxx::ConnectionLost: Lost connection to MySQL server during query (127.0.0.1:9004). (POCO_EXCEPTION)
```

But if you take a look at ma_tls_read(), you will see that it has proper retries for SSL_ERROR_WANT_READ (and EINTR is just a special case of it), yet for some reason it still fails.

The reason is the unit of the read/write timeout: in case of SSL_ERROR_WANT_READ, ma_tls_read() calls poll(read_timeout), incorrectly assuming that the timeout is in milliseconds, while it was actually in seconds. This bug has been fixed in [2], and now it works like a charm!

[2]: ClickHouse/mariadb-connector-c#17

I've verified it by patching the openssl library:

```diff
diff --git a/crypto/bio/bss_sock.c b/crypto/bio/bss_sock.c
index 82f7be85ae..3d2f3926a0 100644
--- a/crypto/bio/bss_sock.c
+++ b/crypto/bio/bss_sock.c
@@ -124,7 +124,24 @@ static int sock_read(BIO *b, char *out, int outl)
             ret = ktls_read_record(b->num, out, outl);
         else
 # endif
-            ret = readsocket(b->num, out, outl);
+        {
+            static int i = 0;
+            static int j = 0;
+            if (!(++j % 5))
+            {
+                fprintf(stderr, "sock_read: inject EAGAIN with ret=0\n");
+                ret = 0;
+                errno = EAGAIN;
+            }
+            else if (!(++i % 3))
+            {
+                fprintf(stderr, "sock_read: inject EAGAIN with ret=-1\n");
+                ret = -1;
+                errno = EAGAIN;
+            }
+            else
+                ret = readsocket(b->num, out, outl);
+        }
         BIO_clear_retry_flags(b);
         if (ret <= 0) {
             if (BIO_sock_should_retry(ret))
```

And before this patch (well, not this patch itself, but the referenced patch in mariadb-connector-c), passing read_write_timeout=1 makes it fail:

```sql
SELECT * FROM mysql('127.0.0.1:9004', system, one, 'default', '', SETTINGS connect_timeout = 100, connection_wait_timeout = 100, read_write_timeout=1)
```

```
Code: 1000. DB::Exception: Received from localhost:9000. DB::Exception: mysqlxx::ConnectionLost: Lost connection to MySQL server during query (127.0.0.1:9004). (POCO_EXCEPTION)
```

But after it, it always works:

```
$ ch benchmark -c10 -q "SELECT * FROM mysql('127.0.0.1:9004', system, one, 'default', '', SETTINGS connection_pool_size=1, connect_timeout = 100, connection_wait_timeout = 100, read_write_timeout=1)"
^CStopping launch of queries. SIGINT received.
Queries executed: 478.

localhost:9000, queries: 478, QPS: 120.171, RPS: 120.171, MiB/s: 0.001, result RPS: 120.171, result MiB/s: 0.001.

0.000%      0.014 sec.
10.000%     0.058 sec.
20.000%     0.065 sec.
30.000%     0.073 sec.
40.000%     0.079 sec.
50.000%     0.087 sec.
60.000%     0.089 sec.
70.000%     0.091 sec.
80.000%     0.095 sec.
90.000%     0.100 sec.
95.000%     0.102 sec.
99.000%     0.105 sec.
99.900%     0.140 sec.
99.990%     0.140 sec.
```

Signed-off-by: Azat Khuzhin <[email protected]>
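To make the timeout-unit mismatch concrete, here is a minimal C sketch of the wait step a TLS read loop performs after SSL_ERROR_WANT_READ. It is not the mariadb-connector-c code: `timeout_sec_to_poll_ms()`, `wait_readable()` and `demo()` are hypothetical names, and the sketch assumes the read/write timeout is stored in whole seconds, as described in the commit message. The point it illustrates is that poll(2) takes milliseconds, so a seconds value must be multiplied by 1000; passing it through unchanged turns read_write_timeout=1 into a 1 ms wait.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: the read/write timeout is assumed to be kept in
 * seconds, but poll(2) takes milliseconds. The bug described above was
 * the equivalent of returning timeout_sec unchanged, so a 1 s setting
 * became a 1 ms poll. */
static int timeout_sec_to_poll_ms(int timeout_sec)
{
    if (timeout_sec < 0)
        return -1;                          /* negative: wait indefinitely */
    assert(timeout_sec <= INT_MAX / 1000);  /* guard against int overflow */
    return timeout_sec * 1000;              /* seconds -> milliseconds */
}

/* Wait for fd to become readable, as a TLS read loop does after
 * SSL_ERROR_WANT_READ. Returns 1 if poll() reported an event, 0 on
 * timeout, -1 on error. EINTR (e.g. a profiler signal) restarts the wait. */
static int wait_readable(int fd, int timeout_sec)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc;

    do
        rc = poll(&pfd, 1, timeout_sec_to_poll_ms(timeout_sec));
    while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return -1;
    return rc > 0;
}

/* Hypothetical usage: returns 1 if an empty socket times out and a socket
 * with one pending byte is reported readable, 0 otherwise, -1 on setup
 * failure. */
static int demo(void)
{
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
        return -1;

    int before = wait_readable(fds[0], 0);    /* nothing to read yet */
    int written = (int)write(fds[1], "x", 1); /* peer sends one byte */
    int after = wait_readable(fds[0], 1);     /* now readable */

    close(fds[0]);
    close(fds[1]);
    if (written != 1)
        return -1;
    return before == 0 && after == 1;
}
```

Restarting poll() with the full timeout after EINTR keeps the sketch short; a real implementation would likely track the remaining time so that repeated signals (for example from the query profiler) cannot stretch the overall wait.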
f4dc6aa to 9c7db67
Member, Author
Hmm, so does CI now require the cloud fork sync? And I've seen a couple of false positives from this check recently.
alexey-milovidov approved these changes on Jul 1, 2024
Member, Author
@alexey-milovidov are you sure that all CI checks passed? I'm seeing only 28 checks, while there should be 200+.
Member
That's our problem :)
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Fix possible issues with MySQL client protocol TLS connections
Fixes: #50911