Skip to content

Send profile events for INSERT queries (previously only SELECT was supported)#37391

Merged
kssenii merged 5 commits intoClickHouse:masterfrom
azat:insert-profile-events-fix
Jun 20, 2022
Merged

Send profile events for INSERT queries (previously only SELECT was supported)#37391
kssenii merged 5 commits intoClickHouse:masterfrom
azat:insert-profile-events-fix

Conversation

@azat
Copy link
Copy Markdown
Member

@azat azat commented May 20, 2022

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Send profile events for INSERT queries (previously only SELECT was supported)

Reproducer:

echo "1" | clickhouse-client --query="insert into function null('foo String') format TSV" --print-profile-events --profile-events-delay-ms=-1

But note, that in case of clickhouse-local they were implicitly supported, but only if query was long enough, i.e.:

# yes | head -n100000 | clickhouse-local --query="insert into function null('foo String') format TSV" --print-profile-events --profile-events-delay-ms=-1
[s1.ch] 2022.05.20 15:20:27 [ 0 ] ContextLock: 10 (increment)
[s1.ch] 2022.05.20 15:20:27 [ 0 ] DiskReadElapsedMicroseconds: 29 (increment)
[s1.ch] 2022.05.20 15:20:27 [ 0 ] IOBufferAllocBytes: 200000 (increment)
[s1.ch] 2022.05.20 15:20:27 [ 0 ] IOBufferAllocs: 1 (increment)
[s1.ch] 2022.05.20 15:20:27 [ 0 ] InsertQuery: 1 (increment)
[s1.ch] 2022.05.20 15:20:27 [ 0 ] InsertedBytes: 1000000 (increment)
[s1.ch] 2022.05.20 15:20:27 [ 0 ] InsertedRows: 100000 (increment)
[s1.ch] 2022.05.20 15:20:27 [ 0 ] MemoryTrackerUsage: 1521975 (gauge)
[s1.ch] 2022.05.20 15:20:27 [ 0 ] OSCPUVirtualTimeMicroseconds: 102148 (increment)
[s1.ch] 2022.05.20 15:20:27 [ 0 ] OSReadChars: 135700 (increment)
[s1.ch] 2022.05.20 15:20:27 [ 0 ] OSWriteChars: 8 (increment)
[s1.ch] 2022.05.20 15:20:27 [ 0 ] Query: 1 (increment)
[s1.ch] 2022.05.20 15:20:27 [ 0 ] RWLockAcquiredReadLocks: 2 (increment)
[s1.ch] 2022.05.20 15:20:27 [ 0 ] ReadBufferFromFileDescriptorRead: 5 (increment)
[s1.ch] 2022.05.20 15:20:27 [ 0 ] ReadBufferFromFileDescriptorReadBytes: 134464 (increment)
[s1.ch] 2022.05.20 15:20:27 [ 0 ] RealTimeMicroseconds: 293747 (increment)
[s1.ch] 2022.05.20 15:20:27 [ 0 ] SoftPageFaults: 382 (increment)
[s1.ch] 2022.05.20 15:20:27 [ 0 ] TableFunctionExecute: 1 (increment)
[s1.ch] 2022.05.20 15:20:27 [ 0 ] UserTimeMicroseconds: 102148 (increment)

Fixes: #37241 (cc @alexey-milovidov @zhicwu )

@robot-ch-test-poll1 robot-ch-test-poll1 added the pr-not-for-changelog This PR should not be mentioned in the changelog label May 20, 2022
@azat azat marked this pull request as draft May 21, 2022 08:58
@kssenii kssenii self-assigned this May 21, 2022
@azat azat changed the title Fix sending profile events for INSERT queries Send profile events for INSERT queries (previously only SELECT was supported) May 24, 2022
@azat azat force-pushed the insert-profile-events-fix branch 2 times, most recently from 312d612 to b3a03e4 Compare May 24, 2022 12:50
@azat azat marked this pull request as ready for review May 24, 2022 12:50
@robot-ch-test-poll robot-ch-test-poll added pr-improvement Pull request with some product improvements and removed pr-not-for-changelog This PR should not be mentioned in the changelog labels May 24, 2022
@azat azat marked this pull request as draft May 25, 2022 07:48
@azat azat marked this pull request as ready for review May 25, 2022 15:44
@azat azat force-pushed the insert-profile-events-fix branch from b3a03e4 to 325e1b1 Compare May 25, 2022 15:44
@azat
Copy link
Copy Markdown
Member Author

azat commented May 25, 2022

Actually the whole idea of profile events was to render progress bar on client (#28364, cc @novikd ), but for INSERT (not INSERT SELECT or input()) it does not makes any sense, since you will not be able to see it anyway, since you will not use interactive clickhouse-client.

This patch adds proper support for ProfileEvents in INSERT, but it looks doubtful to me.

ProfileEvents in INSERT queries has very limited pros:

And also have some caveats/cons:

  • requires protocol change (although it should be 100% compatible)
  • not used by the clickhouse-client for RAM/CPU usage
  • it is better to have determine behavior for this ProfileEvents packet, i.e. send it after each Data block, since otherwise you need to poll for possible packets, like for Log

FWIW right now it does not affect performance.

Details

patched

$ time yes | head -n100000000 | clickhouse-client --query="insert into function null('foo String') format TSV"
real    0m2.049s
user    0m8.036s
sys     0m0.647s
$ time yes | head -n100000000 | clickhouse-client --query="insert into function null('foo String') format TSV"
real    0m2.093s
user    0m8.130s
sys     0m0.591s

upstream

$ time yes | head -n100000000 | clickhouse-client --query="insert into function null('foo String') format TSV"
real    0m2.078s
user    0m8.224s
sys     0m0.595s
$ time yes | head -n100000000 | clickhouse-client --query="insert into function null('foo String') format TSV"
real    0m2.050s
user    0m7.960s
sys     0m0.674s

@kssenii
Copy link
Copy Markdown
Member

kssenii commented May 27, 2022

not used by the clickhouse-client for RAM/CPU usage

but it possibly can, later, so not really a caveat?

@azat
Copy link
Copy Markdown
Member Author

azat commented May 27, 2022

but it possibly can, later, so not really a caveat?

Not sure, usually you pass stdin/--file to the client in this case, and so you cannot use interactive mode.

@kssenii
Copy link
Copy Markdown
Member

kssenii commented May 27, 2022

Not sure, usually you pass stdin/--file to the client in this case, and so you cannot use interactive mode.

but progress (+ RAM/CPU usage) can be shown for non-interactive mode

@azat azat force-pushed the insert-profile-events-fix branch from 325e1b1 to e47fad8 Compare May 31, 2022 08:59
@azat
Copy link
Copy Markdown
Member Author

azat commented May 31, 2022

I'm fine with caveats that I've found.

but progress (+ RAM/CPU usage) can be shown for non-interactive mode

Indeed, and with this patch it will be rendered by the client.
Also I found one issue (with concurrent access) after this patch, that had been fixed now.

Copy link
Copy Markdown
Member

@kssenii kssenii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@azat azat marked this pull request as draft May 31, 2022 13:42
@zhicwu
Copy link
Copy Markdown
Contributor

zhicwu commented May 31, 2022

@azat, will DDL be supported as well? For example: clickhouse-client --query="create database if not exists test" --print-profile-events --profile-events-delay-ms=-1.

It's trivial but I'm wondering if it's been considered or not.

@azat azat force-pushed the insert-profile-events-fix branch from e47fad8 to 4bd5c33 Compare June 1, 2022 07:58
@azat azat marked this pull request as ready for review June 1, 2022 08:00
@azat
Copy link
Copy Markdown
Member Author

azat commented Jun 1, 2022

@azat, will DDL be supported as well? For example: clickhouse-client --query="create database if not exists test" --print-profile-events --profile-events-delay-ms=-1.

No, such queries will not provide ProfileEvents.
But does this makes sense for DDL? What event you are interested in?

@zhicwu
Copy link
Copy Markdown
Contributor

zhicwu commented Jun 1, 2022

No, such queries will not provide ProfileEvents. But does this makes sense for DDL?

Good to know. Yes, it's probably useless for DDL, except ALTER UPDATE/DELETE.

What event you are interested in?

Thanks @azat. As of now, SelectedBytes and SelectedRows for select queries, and InsertedBytes and InsertedRows for inserts, should be good enough. The purpose was to get consistent summary like {"read_rows":"0","read_bytes":"0","written_rows":"0","written_bytes":"0","total_rows_to_read":"0"} returned from http interface.

@azat azat force-pushed the insert-profile-events-fix branch from 4bd5c33 to 0ee65fd Compare June 1, 2022 13:52
@kssenii
Copy link
Copy Markdown
Member

kssenii commented Jun 3, 2022

@Mergifyio update

@mergify
Copy link
Copy Markdown
Contributor

mergify bot commented Jun 3, 2022

update

✅ Branch has been successfully updated

@azat azat force-pushed the insert-profile-events-fix branch 2 times, most recently from 780302b to f53f183 Compare June 7, 2022 11:44
@azat azat force-pushed the insert-profile-events-fix branch from f53f183 to 026d7f7 Compare June 7, 2022 13:45
@kssenii
Copy link
Copy Markdown
Member

kssenii commented Jun 11, 2022

Stateless tests flaky check (address, actions) — Some queries hung, fail: 1, passed: 299

do you know why it hangs?

@azat azat marked this pull request as draft June 13, 2022 10:37
azat added 5 commits June 16, 2022 11:59
…pported)

Reproducer:

    echo "1" | clickhouse-client --query="insert into function null('foo String') format TSV" --print-profile-events --profile-events-delay-ms=-1

However, clickhouse-local is differnt, it does sent the periodically,
but only if query was long enough, i.e.:

    # yes | head -n100000 | clickhouse-local --query="insert into function null('foo String') format TSV" --print-profile-events --profile-events-delay-ms=-1
    [s1.ch] 2022.05.20 15:20:27 [ 0 ] ContextLock: 10 (increment)
    [s1.ch] 2022.05.20 15:20:27 [ 0 ] DiskReadElapsedMicroseconds: 29 (increment)
    [s1.ch] 2022.05.20 15:20:27 [ 0 ] IOBufferAllocBytes: 200000 (increment)
    [s1.ch] 2022.05.20 15:20:27 [ 0 ] IOBufferAllocs: 1 (increment)
    [s1.ch] 2022.05.20 15:20:27 [ 0 ] InsertQuery: 1 (increment)
    [s1.ch] 2022.05.20 15:20:27 [ 0 ] InsertedBytes: 1000000 (increment)
    [s1.ch] 2022.05.20 15:20:27 [ 0 ] InsertedRows: 100000 (increment)
    [s1.ch] 2022.05.20 15:20:27 [ 0 ] MemoryTrackerUsage: 1521975 (gauge)
    [s1.ch] 2022.05.20 15:20:27 [ 0 ] OSCPUVirtualTimeMicroseconds: 102148 (increment)
    [s1.ch] 2022.05.20 15:20:27 [ 0 ] OSReadChars: 135700 (increment)
    [s1.ch] 2022.05.20 15:20:27 [ 0 ] OSWriteChars: 8 (increment)
    [s1.ch] 2022.05.20 15:20:27 [ 0 ] Query: 1 (increment)
    [s1.ch] 2022.05.20 15:20:27 [ 0 ] RWLockAcquiredReadLocks: 2 (increment)
    [s1.ch] 2022.05.20 15:20:27 [ 0 ] ReadBufferFromFileDescriptorRead: 5 (increment)
    [s1.ch] 2022.05.20 15:20:27 [ 0 ] ReadBufferFromFileDescriptorReadBytes: 134464 (increment)
    [s1.ch] 2022.05.20 15:20:27 [ 0 ] RealTimeMicroseconds: 293747 (increment)
    [s1.ch] 2022.05.20 15:20:27 [ 0 ] SoftPageFaults: 382 (increment)
    [s1.ch] 2022.05.20 15:20:27 [ 0 ] TableFunctionExecute: 1 (increment)
    [s1.ch] 2022.05.20 15:20:27 [ 0 ] UserTimeMicroseconds: 102148 (increment)

v2: Proper support ProfileEvents in INSERTs (with protocol change)
v3: Receive profile events on INSERT queries
Signed-off-by: Azat Khuzhin <[email protected]>
In case of all of the above:
- clickhouse-local
- input_format_parallel_parsing=true
- write_progress_on_update=true

It is possible concurrent access to the following:
- writeProgress() (class properties) (guarded with progress_mutex)
- thread_data/host_cpu_usage (guarded with profile_events_mutex)

v2: decrease number of rows for INSERT ProfileEvents test (10 times)
    CI: https://s3.amazonaws.com/clickhouse-test-reports/37391/4bd5c335182279dcc5020aa081b13c3044135951/stateless_tests__debug__actions__[1/3].html
v3: decrease number of rows for INSERT ProfileEvents test (10 times) and add a comment
    CI: https://s3.amazonaws.com/clickhouse-test-reports/37391/026d7f732cb166c90d6c287b02824b6c7fdebf0c/stateless_tests_flaky_check__address__actions_/runlog.log
Signed-off-by: Azat Khuzhin <[email protected]>

f
v2: apply black formatting (sigh)
Signed-off-by: Azat Khuzhin <[email protected]>
@azat
Copy link
Copy Markdown
Member Author

azat commented Jun 16, 2022

do you know why it hangs?

The problem is that now there is mutex for profile events and under thread fuzzer for some of runs 10 mins may not be enough:

2022-06-07 19:34:52 Thread 3 (Thread 0x7ff78de5d700 (LWP 609)):
2022-06-07 19:34:52 #0  __syscall () at ../base/glibc-compatibility/musl/x86_64/syscall.s:14
2022-06-07 19:34:52 #1  0x000000003e5553a6 in clock_nanosleep (clk=<optimized out>, flags=<optimized out>, req=<optimized out>, rem=<optimized out>) at ../base/glibc-compatibility/musl/clock_nanosleep.c:21
2022-06-07 19:34:52 #2  0x00000000368b0454 in sleepForNanoseconds (nanoseconds=<optimized out>) at ../base/base/sleep.cpp:48
2022-06-07 19:34:52 #3  0x000000000d82cda2 in DB::injection (yield_probability=<optimized out>, migrate_probability=<optimized out>, sleep_probability=<optimized out>, sleep_time_us=<optimized out>) at ../src/Common/ThreadFuzzer.cpp:242
2022-06-07 19:34:52 #4  0x000000000d82c6f5 in DB::ThreadFuzzer::signalHandler () at ../src/Common/ThreadFuzzer.cpp:253
2022-06-07 19:34:52 #5  <signal handler called>
2022-06-07 19:34:52 #6  0x00007ff88d23e84b in sched_setaffinity () from /lib/x86_64-linux-gnu/libc.so.6
2022-06-07 19:34:52 #7  0x000000000d82cc6f in DB::injection (yield_probability=<optimized out>, migrate_probability=<optimized out>, sleep_probability=<optimized out>, sleep_time_us=<optimized out>) at ../src/Common/ThreadFuzzer.cpp:232
2022-06-07 19:34:52 #8  0x000000000d82cedd in pthread_mutex_unlock (arg=<optimized out>) at ../src/Common/ThreadFuzzer.cpp:317
2022-06-07 19:34:52 #9  0x000000000d6fdd8a in std::__1::lock_guard<std::__1::mutex>::~lock_guard (this=<optimized out>) at ../contrib/libcxx/include/__mutex_base:97
2022-06-07 19:34:52 #10 DB::ThreadGroupStatus::getProfileEventsCountersAndMemoryForThreads (this=0x0) at ../src/Common/ThreadStatus.cpp:93
2022-06-07 19:34:52 #11 0x000000002d890086 in ProfileEvents::getProfileEvents (server_display_name=..., profile_queue=..., block=..., last_sent_snapshots=...) at ../src/Interpreters/ProfileEventsExt.cpp:120
2022-06-07 19:34:52 #12 0x000000002fea97d7 in DB::TCPHandler::sendProfileEvents (this=<optimized out>) at ../src/Server/TCPHandler.cpp:870
2022-06-07 19:34:52 #13 0x000000002fea1173 in DB::TCPHandler::sendInsertProfileEvents (this=0x0) at ../src/Server/TCPHandler.cpp:896
2022-06-07 19:34:52 #14 DB::TCPHandler::readDataNext (this=<optimized out>) at ../src/Server/TCPHandler.cpp:591
2022-06-07 19:34:52 #15 0x000000002fe9d676 in DB::TCPHandler::processInsertQuery()::$_0::operator()<DB::PushingPipelineExecutor>(DB::PushingPipelineExecutor&) const (executor=..., this=<optimized out>) at ../src/Server/TCPHandler.cpp:649
2022-06-07 19:34:52         "elapsed": 714.65320192,
2022-06-07 19:34:52         "is_cancelled": 0,
2022-06-07 19:34:52         "is_all_data_sent": 0,
2022-06-07 19:34:52         "read_rows": "18754767",
2022-06-07 19:34:52         "read_bytes": "187547670",
2022-06-07 19:34:52         "total_rows_approx": "0",
2022-06-07 19:34:52         "written_rows": "18754767",
2022-06-07 19:34:52         "written_bytes": "187547670",
2022-06-07 19:34:52         "memory_usage": "12593182",
2022-06-07 19:34:52         "peak_memory_usage": "14690334",
2022-06-07 19:34:52         "query": "insert into function null('foo String') format TSV",
2022-06-07 19:34:52         "thread_ids": [
2022-06-07 19:34:52             "609"
2022-06-07 19:34:52         ],

So I've decreased number of rows for this test and rebased.

Note, that w/o thread fuzzer the difference is insignificant:

$ time yes | head -n100000000 | ~/ch/tmp/37391/clickhouse client -q "insert into function null('foo String') format TSV" --progress

real    0m2.133s
user    0m8.374s
sys     0m0.616s

$ time yes | head -n100000000 | ~/ch/tmp/37391/clickhouse client -q "insert into function null('foo String') format TSV" --progress

real    0m2.089s
user    0m8.399s
sys     0m0.642s

$ time yes | head -n100000000 | ~/ch/tmp/upstream/clickhouse client -q "insert into function null('foo String') format TSV" --progress

real    0m2.150s
user    0m8.422s
sys     0m0.734s

$ time yes | head -n100000000 | ~/ch/tmp/upstream/clickhouse client -q "insert into function null('foo String') format TSV" --progress

real    0m2.060s
user    0m8.297s
sys     0m0.724s

@azat azat force-pushed the insert-profile-events-fix branch from 026d7f7 to 68b4f3f Compare June 16, 2022 08:59
@azat azat marked this pull request as ready for review June 16, 2022 08:59
@azat azat requested a review from kssenii June 19, 2022 13:22
@kssenii
Copy link
Copy Markdown
Member

kssenii commented Jun 20, 2022

01183_custom_separated_format_http, 01184_long_insert_values_huge_strings

not related to changes

@kssenii kssenii merged commit a756b4b into ClickHouse:master Jun 20, 2022
@azat azat deleted the insert-profile-events-fix branch June 20, 2022 14:33
azat added a commit to azat/ClickHouse that referenced this pull request Jun 22, 2022
Right now RemoteInserter does not read ProfileEvents for INSERT, it
handles them only after sending the query or on finish.

But ClickHouse#37391 sends them for each INSERT block, but sometimes they can be
no ProfileEvents packet, since it sends only non-empty blocks.

And this adds too much complexity, and anyway ProfileEvents are useless
for the server, so let's send them only if the query is initial (i.e.
send by user).

Note, that it is okay to change the logic of sending ProfileEvents w/o
changing DBMS_TCP_PROTOCOL_VERSION, because there were no public
releases with the original patch included yet.

Fixes: ClickHouse#37391
Refs: ClickHouse#35075
Signed-off-by: Azat Khuzhin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-improvement Pull request with some product improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

clickhouse-client --print-profile-events not working for insert in non-interactive mode?

5 participants