Skip to content

Use 1MB HTTP buffers to avoid frequent send syscalls#65028

Merged
serxa merged 3 commits intomasterfrom
fix-poco-send-http-buffer
Jun 18, 2024
Merged

Use 1MB HTTP buffers to avoid frequent send syscalls#65028
serxa merged 3 commits intomasterfrom
fix-poco-send-http-buffer

Conversation

@serxa
Copy link
Copy Markdown
Member

@serxa serxa commented Jun 9, 2024

Reading from a table that works over an S3 disk is done mostly with syscalls that read 1MB, but writing to this table is done with 8KB of data per syscall. It is limited by HTTP_DEFAULT_BUFFER_SIZE.

There are a few options we can improve it:

  • change default buffer size (this PR);
  • specify 1MB buffer size only in some places to avoid unnecessarily large buffers (harder);
  • eliminate unnecessary buffering during sends, because now every byte is copied at least twice (very hard);
# bpftrace -e 'BEGIN { printf("Tracing send/recv and read/write syscalls from VFSWrite and VFSRead threads. (Press Ctrl-C to stop)\n"); }
        tracepoint:syscalls:sys_exit_sendto /comm=="VFSWrite"/ {
            if (args->ret >= 0) { @send_bytes = sum(args->ret); @send_hist[comm] = hist(args->ret); }
            else { @send_errors = sum(1); }
        }
        tracepoint:syscalls:sys_exit_recvfrom /comm=="VFSRead"/ {
            if (args->ret >= 0)  { @recv_bytes = sum(args->ret); @recv_hist[comm] = hist(args->ret); }
            else { @recv_errors = sum(1); }
        }'
Attaching 2 probes...
Tracing send/recv and read/write syscalls from VFSWrite and VFSRead threads. (Press Ctrl-C to stop)
^C

@recv_bytes: 3986589724

@recv_hist[VFSRead]: 
[256, 512)            28 |                                                    |
[512, 1K)              1 |                                                    |
[1K, 2K)               2 |                                                    |
[2K, 4K)               6 |                                                    |
[4K, 8K)              64 |                                                    |
[8K, 16K)             31 |                                                    |
[16K, 32K)            33 |                                                    |
[32K, 64K)           436 |@@@@@@                                              |
[64K, 128K)          117 |@                                                   |
[128K, 256K)         151 |@@                                                  |
[256K, 512K)         128 |@                                                   |
[512K, 1M)           391 |@@@@@@                                              |
[1M, 2M)            3387 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|

@send_bytes: 2008053316
@send_bytes_cloud: 194594

@send_hist[VFSWrite]: 
[64, 128)            956 |                                                    |
[128, 256)             8 |                                                    |
[256, 512)           254 |                                                    |
[512, 1K)              8 |                                                    |
[1K, 2K)             126 |                                                    |
[2K, 4K)               0 |                                                    |
[4K, 8K)               4 |                                                    |
[8K, 16K)         245104 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

This change was reverted

All HTTP buffers are now 1MB instead of 8KB to avoid too frequent send syscalls

Information about CI checks: https://clickhouse.com/docs/en/development/continuous-integration/

CI Settings (Only check the boxes if you know what you are doing):

  • Allow: All Required Checks
  • Allow: Stateless tests
  • Allow: Stateful tests
  • Allow: Integration Tests
  • Allow: Performance tests
  • Allow: Normal Builds
  • Allow: Special Builds
  • Allow: All NOT Required Checks
  • Allow: batch 1, 2 for multi-batch jobs
  • Allow: batch 3, 4, 5, 6 for multi-batch jobs

  • Exclude: Style check
  • Exclude: Fast test
  • Exclude: All with ASAN
  • Exclude: All with TSAN, MSAN, UBSAN, Coverage
  • Exclude: All with aarch64, release, debug

  • Do not test
  • Upload binaries for special builds
  • Disable merge-commit
  • Disable CI cache

@robot-ch-test-poll3 robot-ch-test-poll3 added the pr-improvement Pull request with some product improvements label Jun 9, 2024
@robot-ch-test-poll3
Copy link
Copy Markdown
Contributor

robot-ch-test-poll3 commented Jun 9, 2024

This is an automated comment for commit 446e28d with description of existing statuses. It's updated for the latest CI running

⏳ Click here to open a full report in a separate page

Check nameDescriptionStatus
CI runningA meta-check that indicates the running CI. Normally, it's in success or pending state. The failed status indicates some problems with the PR⏳ pending
Successful checks
Check nameDescriptionStatus
A SyncIf it fails, ask a maintainer for help✅ success
Docs checkBuilds and tests the documentation✅ success
Fast testNormally this is the first check that is ran for a PR. It builds ClickHouse and runs most of stateless functional tests, omitting some. If it fails, further checks are not started until it is fixed. Look at the report to see which tests fail, then reproduce the failure locally as described here✅ success
Mergeable CheckChecks if all other necessary checks are successful✅ success
PR CheckChecks correctness of the PR's body✅ success
Style checkRuns a set of checks to keep the code style clean. If some of tests failed, see the related log from the report✅ success
Unit testsRuns the unit tests for different release types✅ success

@serxa
Copy link
Copy Markdown
Member Author

serxa commented Jun 9, 2024

Fuzzers build has failed with:

Jun 09 15:30:21 clang++-18: error: unknown argument: '-fprofile-instr-generate -fcoverage-mapping'

@alexey-milovidov alexey-milovidov self-assigned this Jun 10, 2024
@CurtizJ
Copy link
Copy Markdown
Member

CurtizJ commented Jun 11, 2024

Also note that we store http sessions (which are also used in s3 client) which hold this buffer in http connection pool and and we should check that total memory usage dramatically will not increase dramatically. Most likely not because it will add only additional few GBs in the worst case.

@CheSema
Copy link
Copy Markdown
Member

CheSema commented Jun 12, 2024

Also note that we store http sessions (which are also used in s3 client) which hold this buffer in http connection pool and and we should check that total memory usage dramatically will not increase dramatically. Most likely not because it will add only additional few GBs in the worst case.

AFAIK that buffer is reset and freed when session is stored in session pool.

@serxa
Copy link
Copy Markdown
Member Author

serxa commented Jun 18, 2024

Failed test is indeed related, easily reproducable:

FAILED test_checking_s3_blobs_paranoid/test.py::test_when_s3_broken_pipe_at_upload_is_retried - assert 0 == 3

@serxa serxa added this pull request to the merge queue Jun 18, 2024
@serxa serxa removed this pull request from the merge queue due to a manual request Jun 18, 2024
@serxa serxa added this pull request to the merge queue Jun 18, 2024
Merged via the queue into master with commit f1cf175 Jun 18, 2024
@serxa serxa deleted the fix-poco-send-http-buffer branch June 18, 2024 19:07
@robot-ch-test-poll1 robot-ch-test-poll1 added the pr-synced-to-cloud The PR is synced to the cloud repo label Jun 18, 2024
@Algunenano Algunenano added pr-not-for-changelog This PR should not be mentioned in the changelog and removed pr-improvement Pull request with some product improvements labels Jun 20, 2024
@al13n321
Copy link
Copy Markdown
Member

FYI, this made 00763_lock_buffer_long (S3, TSAN) 2x slower, and the revert made it 2x faster again: https://pastila.nl/?0153cd86/d7e6efac0feaa6d6a8a2afe900602933.link (https://pastila.nl/?000a5eb1/a44eacfdacac6e7aca53c02f019ab426.html). The test does DROP+CREATE+INSERT of a MergeTree table with 1 column 500 times.

(Disclaimer: I didn't check this directly, only looked at the timing from the queries above vs commit history.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-not-for-changelog This PR should not be mentioned in the changelog pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants