Skip to content

Add system table with blob storage operations log#52918

Merged
vdimir merged 1 commit intomasterfrom
vdimir/s3_blob_log
Nov 21, 2023
Merged

Add system table with blob storage operations log#52918
vdimir merged 1 commit intomasterfrom
vdimir/s3_blob_log

Conversation

@vdimir
Copy link
Copy Markdown
Member

@vdimir vdimir commented Aug 2, 2023

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

  • Add system table blob_storage_log

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

@robot-clickhouse-ci-2 robot-clickhouse-ci-2 added the pr-feature Pull request with new product feature label Aug 2, 2023
@robot-clickhouse-ci-2
Copy link
Copy Markdown
Contributor

robot-clickhouse-ci-2 commented Aug 2, 2023

This is an automated comment for commit 1523447 with description of existing statuses. It's updated for the latest CI running

❌ Click here to open a full report in a separate page

Successful checks
Check nameDescriptionStatus
AST fuzzerRuns randomly generated queries to catch program errors. The build type is optionally given in parenthesis. If it fails, ask a maintainer for help✅ success
CI runningA meta-check that indicates the running CI. Normally, it's in success or pending state. The failed status indicates some problems with the PR✅ success
ClickHouse build checkBuilds ClickHouse in various configurations for use in further steps. You have to fix the builds that fail. Build logs often has enough information to fix the error, but you might have to reproduce the failure locally. The cmake options can be found in the build log, grepping for cmake. Use these options and follow the general build process✅ success
Compatibility checkChecks that clickhouse binary runs on distributions with old libc versions. If it fails, ask a maintainer for help✅ success
Docker image for serversThe check to build and optionally push the mentioned image to docker hub✅ success
Docs CheckBuilds and tests the documentation✅ success
Fast testNormally this is the first check that is ran for a PR. It builds ClickHouse and runs most of stateless functional tests, omitting some. If it fails, further checks are not started until it is fixed. Look at the report to see which tests fail, then reproduce the failure locally as described here✅ success
Flaky testsChecks if new added or modified tests are flaky by running them repeatedly, in parallel, with more randomization. Functional tests are run 100 times with address sanitizer, and additional randomization of thread scheduling. Integrational tests are run up to 10 times. If at least once a new test has failed, or was too long, this check will be red. We don't allow flaky tests, read the doc✅ success
Install packagesChecks that the built packages are installable in a clear environment✅ success
Mergeable CheckChecks if all other necessary checks are successful✅ success
Performance ComparisonMeasure changes in query performance. The performance test report is described in detail here. In square brackets are the optional part/total tests✅ success
Push to DockerhubThe check for building and pushing the CI related docker images to docker hub✅ success
SQLTestThere's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS✅ success
SQLancerFuzzing tests that detect logical bugs with SQLancer tool✅ success
SqllogicRun clickhouse on the sqllogic test set against sqlite and checks that all statements are passed✅ success
Stateful testsRuns stateful functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc✅ success
Stress testRuns stateless functional tests concurrently from several clients to detect concurrency-related errors✅ success
Style CheckRuns a set of checks to keep the code style clean. If some of tests failed, see the related log from the report✅ success
Unit testsRuns the unit tests for different release types✅ success
Upgrade checkRuns stress tests on server version from last release and then tries to upgrade it to the version from the PR. It checks if the new server can successfully startup without any errors, crashes or sanitizer asserts✅ success
Check nameDescriptionStatus
Integration testsThe integration tests report. In parenthesis the package type is given, and in square brackets are the optional part/total tests❌ failure
Stateless testsRuns stateless functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc❌ failure

@vdimir vdimir force-pushed the vdimir/s3_blob_log branch from d00cb2c to 9dd3c0e Compare August 2, 2023 12:57
@kssenii kssenii self-assigned this Aug 2, 2023
@vdimir vdimir force-pushed the vdimir/s3_blob_log branch 2 times, most recently from bd02979 to c1f5dce Compare August 18, 2023 12:10
namespace DB
{

struct BlobStorageLogElement
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to:

  1. add the compressed size in bytes?
  2. add a new event_type DeleteFailed in case of a blob has been not removed from s3 due to some errors.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add data size for upload operations.
Table will have error_msg column, I'll try to do best to have a records for failed queries. However, it can be some network error, so non-empty error_msg doesn't mean that object was not deleted, because we cannot know for sure if request have reached s3 or not.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vdimir vdimir force-pushed the vdimir/s3_blob_log branch 4 times, most recently from 7fdfe82 to 3372508 Compare October 31, 2023 11:27
@vdimir vdimir changed the title [WIP] Add system table s3_blob_log Add system table with blob storage operations log Oct 31, 2023
@vdimir vdimir force-pushed the vdimir/s3_blob_log branch 7 times, most recently from 64a5446 to 1d32d42 Compare November 6, 2023 15:23
@vdimir vdimir force-pushed the vdimir/s3_blob_log branch from dd633a1 to 27505f9 Compare November 8, 2023 10:10
@vdimir
Copy link
Copy Markdown
Member Author

vdimir commented Nov 8, 2023

Integration tests (asan) [6/6] — fail: 2, passed: 488 Details
Integration tests (asan, analyzer) [6/6] — fail: 2, passed: 487 Details
Integration tests (release) [2/4] — fail: 2, passed: 600 Details
Integration tests (tsan) [6/6] — fail: 2, passed: 489 Details

test_merge_tree_s3/test.py::test_simple_insert_select need to be updated a bit, minor fix, I'll do it with addressing review comments. Other things is ready and should be in order.

@kssenii you may take a look on the PR

@vdimir vdimir marked this pull request as ready for review November 8, 2023 17:02
@vdimir vdimir requested a review from kssenii November 9, 2023 11:08
element.remote_path = remote_path;
element.local_path = local_path_.empty() ? local_path : local_path_;

if (data_size > std::numeric_limits<decltype(element.data_size)>::max())
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is it possible if data_size is size_t and element.data_size is UInt32?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first I aimed to store UInt32 in table to save space, but I don't think it will be significant, changed to UInt64 and removed the check


UInt32 data_size;

Int32 error_code = -1; /// negative if no error
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually no error is code 0

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


auto clients_ = clients.get();

BlobStorageLogWriter blob_storage_log = getBlobStorageLog();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be a way to enable/disable writing to blob_storage_log with a query/profile setting?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it make lot's of sense, since user can configure it in server level config. In that case it can be used to debug some issues connected with missing or unexpected blobs. If logging is disabled for half of the requests it will not be helpful anymore, so probably it may be just disabled in config.

Comment on lines +561 to +567
BlobStorageLogWriter S3ObjectStorage::getBlobStorageLog()
{
#ifndef CLICKHOUSE_KEEPER_STANDALONE_BUILD /// Keeper standalone build doesn't have context
/// Make a copy with local properties like query_id, object path, etc
BlobStorageLogWriter blob_storage_log(Context::getGlobalContextInstance()->getBlobStorageLog());
blob_storage_log.disk_name = disk_name;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be it could be better if getBlobStorageLog returned a pointer instead of an object and it will be nullptr if context->getBlobStorageLog() returns nullptr (or if it is disabled with query setting), this will allow to avoid redundant work with checking query context below and then calling addEvent if the no event will actually be added because log is nullptr. Also there are places where we do addEvent in a loop for many objects, but there is no point in doing it if log is not initialized or disabled.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced BlobStorageLogWriter with BlobStorageLogWriterPtr. Initially I didn't want to check

if (blob_log)
	blob_log->addEvent(...)

each time, but as you mentioned having plain object BlobStorageLogWriter also have some drawbacks, so pointer is good.

@vdimir vdimir force-pushed the vdimir/s3_blob_log branch 3 times, most recently from 00b52d4 to cd6560f Compare November 16, 2023 11:28
@vdimir vdimir requested a review from kssenii November 17, 2023 09:12
@vdimir
Copy link
Copy Markdown
Member Author

vdimir commented Nov 17, 2023

Stateless tests (aarch64) — fail: 1, passed: 5814, skipped: 30 Details
Stateless tests (release) — fail: 1, passed: 5838, skipped: 6Details
Stateless tests (release, analyzer) — fail: 1, passed: 5833, skipped: 11 Details
Stateless tests (release, wide parts enabled) — fail: 1, passed: 5836, skipped: 8 Details

00002_log_and_exception_messages_formatting was broken on master.

@kssenii you may check an updated code

@vdimir vdimir force-pushed the vdimir/s3_blob_log branch 2 times, most recently from 64c6c74 to a65279a Compare November 21, 2023 09:14
@vdimir vdimir force-pushed the vdimir/s3_blob_log branch from a65279a to 1523447 Compare November 21, 2023 09:18
@vdimir
Copy link
Copy Markdown
Member Author

vdimir commented Nov 21, 2023

Known issues:

Integration tests (tsan) [4/6] — fail: 1, passed: 422 Details

test_storage_rabbitmq

Stateless tests (asan) [3/4] — fail: 1, passed: 1420, skipped: 11 Details

01280_ttl_where_group_by

@vdimir vdimir merged commit a139ae9 into master Nov 21, 2023
@vdimir vdimir deleted the vdimir/s3_blob_log branch November 21, 2023 16:40
baibaichen added a commit to Kyligence/gluten that referenced this pull request Nov 22, 2023
binmahone pushed a commit to apache/gluten that referenced this pull request Nov 22, 2023
…22) (#3802)

* [GLUTEN-1632][CH]Daily Update Clickhouse Version (20231122)

* fix build due to ClickHouse/ClickHouse#52918

* [Gluten-3769] Revert Fix due to 20231117(#3756) upgrade

---------

Co-authored-by: kyligence-git <[email protected]>
Co-authored-by: Chang Chen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-feature Pull request with new product feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants