Skip to content

Feat add _etag for object storage#65386

Merged
kssenii merged 16 commits intoClickHouse:masterfrom
skyoct:feat-s3-field
Aug 7, 2024
Merged

Feat add _etag for object storage#65386
kssenii merged 16 commits intoClickHouse:masterfrom
skyoct:feat-s3-field

Conversation

@skyoct
Copy link
Copy Markdown
Contributor

@skyoct skyoct commented Jun 18, 2024

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Add _etag virtual column for S3 table engine. Fixes #65312.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Information about CI checks: https://clickhouse.com/docs/en/development/continuous-integration/

CI Settings (Only check the boxes if you know what you are doing):

  • Allow: All Required Checks
  • Allow: Stateless tests
  • Allow: Stateful tests
  • Allow: Integration Tests
  • Allow: Performance tests
  • Allow: All Builds
  • Allow: All NOT Required Checks
  • Allow: batch 1, 2 for multi-batch jobs
  • Allow: batch 3, 4, 5, 6 for multi-batch jobs

  • Exclude: Style check
  • Exclude: Fast test
  • Exclude: All with ASAN
  • Exclude: All with TSAN, MSAN, UBSAN, Coverage
  • Exclude: All with aarch64, release, debug

  • Do not test
  • Upload binaries for special builds
  • Disable merge-commit
  • Disable CI cache

@kssenii kssenii self-assigned this Jun 18, 2024
@kssenii kssenii added the can be tested Allows running workflows for external contributors label Jun 18, 2024
@robot-ch-test-poll3 robot-ch-test-poll3 added the pr-improvement Pull request with some product improvements label Jun 18, 2024
@robot-ch-test-poll3
Copy link
Copy Markdown
Contributor

robot-ch-test-poll3 commented Jun 18, 2024

This is an automated comment for commit 364e973 with description of existing statuses. It's updated for the latest CI running

✅ Click here to open a full report in a separate page

Successful checks
Check nameDescriptionStatus
Docs checkBuilds and tests the documentation✅ success
Fast testNormally this is the first check that is ran for a PR. It builds ClickHouse and runs most of stateless functional tests, omitting some. If it fails, further checks are not started until it is fixed. Look at the report to see which tests fail, then reproduce the failure locally as described here✅ success
Flaky testsChecks if new added or modified tests are flaky by running them repeatedly, in parallel, with more randomization. Functional tests are run 100 times with address sanitizer, and additional randomization of thread scheduling. Integration tests are run up to 10 times. If at least once a new test has failed, or was too long, this check will be red. We don't allow flaky tests, read the doc✅ success
Integration testsThe integration tests report. In parenthesis the package type is given, and in square brackets are the optional part/total tests✅ success
Stateful testsRuns stateful functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc✅ success
Stateless testsRuns stateless functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc✅ success
Style checkRuns a set of checks to keep the code style clean. If some of tests failed, see the related log from the report✅ success
Unit testsRuns the unit tests for different release types✅ success

@skyoct skyoct marked this pull request as ready for review June 19, 2024 13:26
@skyoct
Copy link
Copy Markdown
Contributor Author

skyoct commented Jun 24, 2024

@kssenii Hi, How can I rerun CI?

@kssenii
Copy link
Copy Markdown
Member

kssenii commented Jun 24, 2024

CI fails for a reason.

ClickHouse build check — 34/35 artifact groups are OK

If you open details, you'll see:

clickhouse_common_io.dir/IO/WriteBufferFromS3.cpp.o -c /build/src/IO/WriteBufferFromS3.cpp
Jun 21 02:43:34 /build/src/IO/S3/getObjectInfo.h:18:12: error: redundant string initialization [readability-redundant-string-init,-warnings-as-errors]
Jun 21 02:43:34    18 |     String etag = "";
Jun 21 02:43:34       |            ^~~~~~~~~
Jun 21 02:43:34       |            etag
Jun 21 02:43:34 48866 warnings generated.

Copy link
Copy Markdown
Member

@kssenii kssenii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@skyoct
Copy link
Copy Markdown
Contributor Author

skyoct commented Jun 25, 2024

@kssenii Ok, Thank you very much.

@skyoct
Copy link
Copy Markdown
Contributor Author

skyoct commented Jun 25, 2024

@kssenii The etag should just be a random string, and I have no way of knowing its value. Can I just check if it is empty here?

@kssenii
Copy link
Copy Markdown
Member

kssenii commented Jun 26, 2024

@kssenii The etag should just be a random string, and I have no way of knowing its value. Can I just check if it is empty here?

Yes, of course.

@skyoct
Copy link
Copy Markdown
Contributor Author

skyoct commented Jul 1, 2024

image
@kssenii Hi, It seems that I can't find what caused the error from this.

@kssenii
Copy link
Copy Markdown
Member

kssenii commented Jul 1, 2024

Stateless tests (release) — fail: 1

00992_system_parts_race_condition_zookeeper_long

-- this test is flaky, do not mind this failure. I will mark this failure is ignored. The tests will continue once I fix Cloud fork sync (only for ClickHouse Inc. employees) — tests failed -- will do now.

@kssenii
Copy link
Copy Markdown
Member

kssenii commented Jul 1, 2024

Docs check — Found errors in docs

Please check this failure it should be related to changes

@kssenii
Copy link
Copy Markdown
Member

kssenii commented Jul 3, 2024

Stateless tests (release) — fail: 1

00992_system_parts_race_condition_zookeeper_long -- flaky, not related to changes

@kssenii
Copy link
Copy Markdown
Member

kssenii commented Jul 8, 2024

@skyoct, could you please fix the build?

@skyoct
Copy link
Copy Markdown
Contributor Author

skyoct commented Jul 9, 2024

@kssenii Okay, I will fix it immediately.

@kssenii
Copy link
Copy Markdown
Member

kssenii commented Jul 11, 2024

Cloud fork sync (only for ClickHouse Inc. employees) — Failed to merge into the sync PR

Fixed it now, @skyoct please don't do merge with master for this PR now, if it is not necessary, because it now breaks the sync each time for this PR.

Could you please also fix style-check?

In /ClickHouse/tests/queries/0_stateless/02270_errors_in_files.sh line 31:
rm "${user_files_path}"/test_02270_1.csv
    ^----------------^ SC2154 (warning): user_files_path is referenced but not assigned (did you mean 'USER_FILES_PATH'?).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw why change this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kssenii I feel that this test case not passing is quite strange. Want to try again.

@skyoct skyoct force-pushed the feat-s3-field branch 2 times, most recently from 79aea8d to 1505cb2 Compare July 12, 2024 03:27
@skyoct
Copy link
Copy Markdown
Contributor Author

skyoct commented Jul 12, 2024

image
@kssenii Is it necessary to rerun the tests after this change?

@skyoct
Copy link
Copy Markdown
Contributor Author

skyoct commented Jul 16, 2024

image @kssenii Is it necessary to rerun the tests after this change?

@kssenii Can you help me with this?

@skyoct
Copy link
Copy Markdown
Contributor Author

skyoct commented Aug 7, 2024

@kssenii Hi, Can you help review?

@kssenii
Copy link
Copy Markdown
Member

kssenii commented Aug 7, 2024

@kssenii Hi, Can you help review?

Sorry, Cloud For Sync gets stuck in pending state for this PR, I restarted it yesterday via github api, but it does not finish still. I will try again to reset it again via github api, and if it does not help again - I will push an empty commit.

@kssenii kssenii enabled auto-merge August 7, 2024 14:19
@kssenii kssenii added this pull request to the merge queue Aug 7, 2024
Merged via the queue into ClickHouse:master with commit 315fd54 Aug 7, 2024
@robot-ch-test-poll3 robot-ch-test-poll3 added the pr-synced-to-cloud The PR is synced to the cloud repo label Aug 7, 2024
@alexey-milovidov
Copy link
Copy Markdown
Member

alexey-milovidov commented Aug 20, 2024

@kssenii, it is very strange to have ETag in double quotes:

:) SELECT _path, _etag FROM s3('s3://ookla-open-data/parquet/performance/type=*/year=*/quarter=*/*.parquet') LIMIT 10

SELECT
    _path,
    _etag
FROM s3('s3://ookla-open-data/parquet/performance/type=*/year=*/quarter=*/*.parquet')
LIMIT 10

Query id: 992f4385-b9e1-4194-8f3a-55843b093615

    ┌─_path─────────────────────────────────────────────────────────────────────────────────────────────────────────┬─_etag─────────────────────────────────┐
 1. │ ookla-open-data/parquet/performance/type=fixed/year=2019/quarter=1/2019-01-01_performance_fixed_tiles.parquet │ "921e7e414a81f5eb426810e73a3ee108-28" │
 2. │ ookla-open-data/parquet/performance/type=fixed/year=2019/quarter=1/2019-01-01_performance_fixed_tiles.parquet │ "921e7e414a81f5eb426810e73a3ee108-28" │
 3. │ ookla-open-data/parquet/performance/type=fixed/year=2019/quarter=1/2019-01-01_performance_fixed_tiles.parquet │ "921e7e414a81f5eb426810e73a3ee108-28" │
 4. │ ookla-open-data/parquet/performance/type=fixed/year=2019/quarter=1/2019-01-01_performance_fixed_tiles.parquet │ "921e7e414a81f5eb426810e73a3ee108-28" │
 5. │ ookla-open-data/parquet/performance/type=fixed/year=2019/quarter=1/2019-01-01_performance_fixed_tiles.parquet │ "921e7e414a81f5eb426810e73a3ee108-28" │
 6. │ ookla-open-data/parquet/performance/type=fixed/year=2019/quarter=1/2019-01-01_performance_fixed_tiles.parquet │ "921e7e414a81f5eb426810e73a3ee108-28" │
 7. │ ookla-open-data/parquet/performance/type=fixed/year=2019/quarter=1/2019-01-01_performance_fixed_tiles.parquet │ "921e7e414a81f5eb426810e73a3ee108-28" │
 8. │ ookla-open-data/parquet/performance/type=fixed/year=2019/quarter=1/2019-01-01_performance_fixed_tiles.parquet │ "921e7e414a81f5eb426810e73a3ee108-28" │
 9. │ ookla-open-data/parquet/performance/type=fixed/year=2019/quarter=1/2019-01-01_performance_fixed_tiles.parquet │ "921e7e414a81f5eb426810e73a3ee108-28" │
10. │ ookla-open-data/parquet/performance/type=fixed/year=2019/quarter=1/2019-01-01_performance_fixed_tiles.parquet │ "921e7e414a81f5eb426810e73a3ee108-28" │
    └───────────────────────────────────────────────────────────────────────────────────────────────────────────────┴───────────────────────────────────────┘

Unless it already was in double quotes in HTTP headers, it looks like a bug in AWS SDK.

@alexey-milovidov
Copy link
Copy Markdown
Member

And I don't see implementation for the URL engine, where it would also be natural.

@alexey-milovidov
Copy link
Copy Markdown
Member

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-improvement Pull request with some product improvements pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add virtual columns _etag, _last_modified, _storage_class for S3 storage.

4 participants