
Improve the observability of INSERT on distributed table #41034

Merged: rschu1ze merged 18 commits into ClickHouse:master from FrankChen021:distributed on Sep 13, 2022

Conversation

@FrankChen021
Contributor

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Improve the observability of INSERT on distributed table

Currently, INSERT on a distributed table is executed asynchronously. If the async insert from the distributed table to the local table fails, users don't know about it unless they check the text logs on the server. For a managed database service, that's not a good way for users to discover such errors.

This PR first adds more dimensions/metrics to the header of the temp files that are going to be sent to remote tables, including:

  • cluster name
  • the database of the distributed table
  • name of the target table
  • which shard the data is written to
  • how many rows and bytes are written to each shard

This information is then written into OpenTelemetry span logs if OpenTelemetry tracing is enabled.
From the log, it is clear how the raw INSERT is split into different INSERTs on different shards.
[screenshot: span log showing the per-shard INSERTs]

The reason we write this info to span logs is that we can easily extract these dimensions/metrics from the span logs with existing tools and then visualize the metrics for users.
The following is a demonstration of how we do this based on the information in the span logs.
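As a sketch of such an extraction, this kind of query can aggregate the per-shard insert metrics directly from ClickHouse's own span log (this assumes the Map-typed `attribute` column of `system.opentelemetry_span_log` and uses the attribute keys added by this PR, such as `clickhouse.rows` and `clickhouse.bytes`; the grouping and filter are illustrative, not part of the PR):

```sql
-- Sketch: aggregate per-shard INSERT metrics from the span log.
-- Attribute keys (clickhouse.distributed, clickhouse.remote,
-- clickhouse.rows, clickhouse.bytes) are the ones added by this PR.
SELECT
    attribute['clickhouse.distributed'] AS distributed_table,
    attribute['clickhouse.remote']      AS remote_table,
    sum(toUInt64OrZero(attribute['clickhouse.rows']))  AS rows_sent,
    sum(toUInt64OrZero(attribute['clickhouse.bytes'])) AS bytes_sent
FROM system.opentelemetry_span_log
WHERE attribute['clickhouse.rows'] != ''
GROUP BY distributed_table, remote_table;
```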
[screenshot: visualization of the extracted metrics]

@robot-clickhouse added the pr-improvement label (Pull request with some product improvements) Sep 6, 2022
@evillique added the can be tested label (Allows running workflows for external contributors) Sep 6, 2022

@rschu1ze rschu1ze self-assigned this Sep 7, 2022

Signed-off-by: Frank Chen <[email protected]>

@FrankChen021
Contributor Author

The problem is that test_cluster_two_shards consists of the same node - 127.1 and 127.2 (which are the same ClickHouse instance) - so it executes the query, but the runner waits until there are two finished hosts.

So you can simply remove the ON CLUSTER queries from your test, since they are not required there, and everything will work.

Thanks for the explanation. I have not dived into the details of the test.
I thought these were two nodes, so I set up this test cluster locally with two nodes. If these two are the same ClickHouse instance, will the INSERT on shard 0 (or shard 1) be executed asynchronously - that is, not calling writeToLocal, but writing to a file first and then sending the file to the remote node? I'm not sure about it.

@azat
Member

azat commented Sep 8, 2022

I thought these are two nodes

This is the cluster that is used - https://github.com/ClickHouse/ClickHouse/blob/master/programs/server/config.xml#L827-L840

so I set up this test cluster on my local with two nodes

You can simply use clickhouse-server -C /path/to/config.xml to use the same (there is also a bunch of overrides in tests/config, but AFAICS they are not important for your test)

If these two are the same ClickHouse instance, will the INSERT on shard 0 (or shard 1) be executed asynchronously - that is, not calling writeToLocal, but writing to a file first and then sending the file to the remote node? I'm not sure about it.

  • 127.1 - detected as localhost
  • 127.2 - not detected as localhost

That means there will be one local node and one remote node, just what your test needs.
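For reference, the linked test_cluster_two_shards definition looks roughly like this (paraphrased from programs/server/config.xml; both replicas point at the same server process, but only 127.0.0.1 is treated as localhost):

```xml
<!-- Sketch of the built-in test cluster: two shards, one replica each,
     both backed by the same clickhouse-server listening on 0.0.0.0. -->
<test_cluster_two_shards>
    <shard>
        <replica>
            <host>127.0.0.1</host>
            <port>9000</port>
        </replica>
    </shard>
    <shard>
        <replica>
            <host>127.0.0.2</host>
            <port>9000</port>
        </replica>
    </shard>
</test_cluster_two_shards>
```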


Signed-off-by: Frank Chen <[email protected]>
Signed-off-by: Frank Chen <[email protected]>
@FrankChen021 FrankChen021 requested a review from azat September 9, 2022 04:20
#
${CLICKHOUSE_CLIENT} -nq "
DROP TABLE IF EXISTS ${CLICKHOUSE_DATABASE}.dist_opentelemetry;
DROP TABLE IF EXISTS ${CLICKHOUSE_DATABASE}.local_opentelemetry;
Member


You can omit CLICKHOUSE_DATABASE in CREATE/DROP, though it is minor.

Contributor Author


Actually, when I remove the database from the CREATE/DROP, the tables are not created under the ${CLICKHOUSE_DATABASE} database but under the default database. We would need to add --database to the command so that the correct database is applied, so I didn't change these queries.

Member


This is done automatically by shell_config.sh -

[ -v CLICKHOUSE_DATABASE ] && CLICKHOUSE_CLIENT_OPT0+=" --database=${CLICKHOUSE_DATABASE} "

So this should work, but like I said this is minor, and if you don't have any other changes you can ignore it. But if you have something else, then please apply this suggestion.

@FrankChen021
Contributor Author

  • 127.1 - detected as localhost
  • 127.2 - not detected as localhost

That means there will be one local node and one remote node, just what your test needs.

Hi @azat

Last question, how does the CI setup the ClickHouse instance to listen on both 127.0.0.1 and 127.0.0.2?

I created a loopback alias 127.0.0.2 using the ifconfig lo0 alias 127.0.0.2 command on my MacBook, but the INSERT on the distributed table treats both 127.0.0.1 and 127.0.0.2 as local nodes. However, from the CI test results, 127.0.0.2 is treated as a remote node.

@azat
Member

azat commented Sep 12, 2022

Last question, how does the CI setup the ClickHouse instance to listen on both 127.0.0.1 and 127.0.0.2?

From https://en.wikipedia.org/wiki/Loopback:

Various Internet Engineering Task Force (IETF) standards reserve the IPv4 address block 127.0.0.0/8, in CIDR notation and the IPv6 address ::1/128 for this purpose. The most common IPv4 address used is 127.0.0.1. Commonly these loopback addresses are mapped to the hostnames localhost or loopback.

So listening on 0.0.0.0/:: is enough.

I created a loopback 127.0.0.2 by using ifconfig lo0 alias 127.0.0.2 command on my MacBook

You don't need to create anything; this should work out of the box.
But apparently on macOS it does not, since the loopback interface has 127.0.0.1, while on Linux it has 127.0.0.1/8.

but the INSERT on the distributed table treats both 127.0.0.1 and 127.0.0.2 as local nodes.

Yes, this is because it finds 127.2 on some local interface and treats it as localhost because of this -

NetworkInterfaces interfaces;
return interfaces.hasAddress(address);

If you have macos setup, then I would use test_cluster_two_shards_localhost and add two queries, with prefer_localhost_replica=0 and with prefer_localhost_replica=1 (default). That way you will cover both cases explicitly.
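The two suggested variants could look roughly like this (a sketch with a hypothetical distributed table `dist_opentelemetry`, mirroring the names used in this PR's test; the setting name prefer_localhost_replica is real, the data is illustrative):

```sql
-- Force the file-based async send path even for the local shard:
INSERT INTO dist_opentelemetry SELECT number FROM numbers(10)
SETTINGS prefer_localhost_replica = 0;

-- Default: the local shard is written directly (writeToLocal),
-- only the remote shard goes through the file-based path:
INSERT INTO dist_opentelemetry SELECT number FROM numbers(10)
SETTINGS prefer_localhost_replica = 1;
```

Running both covers the direct-write and file-send code paths explicitly, regardless of how the host resolves 127.2.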

@FrankChen021
Contributor Author

If you have macos setup, then I would use test_cluster_two_shards_localhost and add two queries, with prefer_localhost_replica=0 and with prefer_localhost_replica=1 (default). That way you will cover both cases explicitly.

Ah, I almost forgot we have this setting prefer_localhost_replica. Thanks for the reminder.

Signed-off-by: Frank Chen <[email protected]>
Signed-off-by: Frank Chen <[email protected]>
Signed-off-by: Frank Chen <[email protected]>
Signed-off-by: Frank Chen <[email protected]>
@FrankChen021
Contributor Author

Looks like the build failure is not related to the changes in this PR.

@rschu1ze
Member

Agree. Also, the stress test failure looks unrelated to me. Should be good to merge once the remaining tests have finished.

thread_trace_context->root_span.addAttribute("clickhouse.distributed", distributed_header.distributed_table);
thread_trace_context->root_span.addAttribute("clickhouse.remote", distributed_header.remote_table);
thread_trace_context->root_span.addAttribute("clickhouse.rows", distributed_header.rows);
thread_trace_context->root_span.addAttribute("clickhouse.bytes", distributed_header.bytes);
Member


You forgot to adjust the code path for distributed_directory_monitor_batch_inserts - processFilesWithBatching, see 00e3c21

Member


Also, I was wondering: maybe it is worth using the clickhouse.distributed_send. namespace for the metrics?
