Commit 2db2655 — Merge remote-tracking branch 'origin/master' into pqdec
2 parents: 44384aa + 901b611

534 files changed: +10686 −5064 lines

CHANGELOG.md — 3 additions & 4 deletions

@@ -33,11 +33,10 @@
 * Add new parameter to `S3` table engine and `s3` table function named `storage_class_name`, which allows specifying intelligent tiering supported by AWS. Supported in both key-value and positional (deprecated) formats. [#87122](https://github.com/ClickHouse/ClickHouse/pull/87122) ([alesapin](https://github.com/alesapin)).
 * `ALTER UPDATE` for Iceberg table engine. [#86059](https://github.com/ClickHouse/ClickHouse/pull/86059) ([scanhex12](https://github.com/scanhex12)).
 * Add system table `iceberg_metadata_log` to retrieve Iceberg metadata files during SELECT statements. [#86152](https://github.com/ClickHouse/ClickHouse/pull/86152) ([scanhex12](https://github.com/scanhex12)).
-* ⚠️WTF, fix this! `Iceberg` and `DeltaLake` tables support custom disk configuration. [#86778](https://github.com/ClickHouse/ClickHouse/pull/86778) ([scanhex12](https://github.com/scanhex12)).
+* `Iceberg` and `DeltaLake` tables support custom disk configuration via the storage-level setting `disk`. [#86778](https://github.com/ClickHouse/ClickHouse/pull/86778) ([scanhex12](https://github.com/scanhex12)).
 * Support Azure for data lakes disks. [#87173](https://github.com/ClickHouse/ClickHouse/pull/87173) ([scanhex12](https://github.com/scanhex12)).
 * Support `Unity` catalog on top of Azure blob storage. [#80013](https://github.com/ClickHouse/ClickHouse/pull/80013) ([Smita Kulkarni](https://github.com/SmitaRKulkarni)).
 * Support more formats (`ORC`, `Avro`) in `Iceberg` writes. This closes [#86179](https://github.com/ClickHouse/ClickHouse/issues/86179). [#87277](https://github.com/ClickHouse/ClickHouse/pull/87277) ([scanhex12](https://github.com/scanhex12)).
-* ⚠️WTF, revert this feature before the release! Support table engine `Alias`. [#76569](https://github.com/ClickHouse/ClickHouse/pull/76569) ([RinChanNOW](https://github.com/RinChanNOWWW)).
 * Add a new system table `database_replicas` with information about database replicas. [#83408](https://github.com/ClickHouse/ClickHouse/pull/83408) ([Konstantin Morozov](https://github.com/k-morozov)).
 * Added function `arrayExcept` that subtracts one array as a set from another. [#82368](https://github.com/ClickHouse/ClickHouse/pull/82368) ([Joanna Hulboj](https://github.com/jh0x)).
 * Adds a new `system.aggregated_zookeeper_log` table. The table contains statistics (e.g. number of operations, average latency, errors) of ZooKeeper operations grouped by session id, parent path and operation type, and periodically flushed to disk. [#85102](https://github.com/ClickHouse/ClickHouse/pull/85102) [#87208](https://github.com/ClickHouse/ClickHouse/pull/87208) ([Michael Stetsyuk](https://github.com/mstetsyuk)).

@@ -71,8 +70,8 @@
 * Reduce memory usage in Iceberg writes. [#86544](https://github.com/ClickHouse/ClickHouse/pull/86544) ([scanhex12](https://github.com/scanhex12)).
 
 #### Improvement
-* ⚠️WTF, fix this! Support writing multiple data files in Iceberg in a single insertion. Add new settings, `max_iceberg_data_file_rows` and `max_iceberg_data_file_bytes` to control the limits. [#86275](https://github.com/ClickHouse/ClickHouse/pull/86275) ([scanhex12](https://github.com/scanhex12)).
-* ⚠️WTF, fix this! Add rows/bytes limit for inserted data files in delta lake. Controlled by settings `delta_lake_insert_max_rows_in_data_file` and `delta_lake_insert_max_bytes_in_data_file`. [#86357](https://github.com/ClickHouse/ClickHouse/pull/86357) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Support writing multiple data files in Iceberg in a single insertion. Add new settings, `iceberg_insert_max_rows_in_data_file` and `iceberg_insert_max_bytes_in_data_file`, to control the limits. [#86275](https://github.com/ClickHouse/ClickHouse/pull/86275) ([scanhex12](https://github.com/scanhex12)).
+* Add rows/bytes limit for inserted data files in Delta Lake. Controlled by settings `delta_lake_insert_max_rows_in_data_file` and `delta_lake_insert_max_bytes_in_data_file`. [#86357](https://github.com/ClickHouse/ClickHouse/pull/86357) ([Kseniia Sumarokova](https://github.com/kssenii)).
 * Support more types for partitions in Iceberg writes. This closes [#86206](https://github.com/ClickHouse/ClickHouse/issues/86206). [#86298](https://github.com/ClickHouse/ClickHouse/pull/86298) ([scanhex12](https://github.com/scanhex12)).
 * Make the S3 retry strategy configurable, and allow S3 disk settings to be hot-reloaded when the config XML file changes. [#82642](https://github.com/ClickHouse/ClickHouse/pull/82642) ([RinChanNOW](https://github.com/RinChanNOWWW)).
 * Improved the S3(Azure)Queue table engine to survive ZooKeeper connection loss without potential duplicates. Requires enabling the S3Queue setting `use_persistent_processing_nodes` (changeable via `ALTER TABLE MODIFY SETTING`). [#85995](https://github.com/ClickHouse/ClickHouse/pull/85995) ([Kseniia Sumarokova](https://github.com/kssenii)).
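One entry above adds `arrayExcept`, which subtracts one array as a set from another. The changelog does not spell out the exact semantics; here is a minimal Python sketch of one plausible reading (assumption: every element occurring in the second array is dropped from the first, the remaining order is preserved — not taken from the ClickHouse sources):

```python
def array_except(left, right):
    """Hypothetical model of arrayExcept: drop from `left` every element
    that occurs anywhere in `right`, keeping the remaining order.
    Assumed semantics, for illustration only."""
    exclude = set(right)
    return [x for x in left if x not in exclude]

print(array_except([1, 2, 3, 2, 4], [2, 4]))  # → [1, 3]
```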

SECURITY.md — 1 addition & 0 deletions

@@ -74,3 +74,4 @@ Removal criteria:
 
 Notification process:
 ClickHouse will post notifications within our OSS Trust Center and notify subscribers. Subscribers must log in to the Trust Center to download the notification. The notification will include the timeframe for public disclosure.
+
ci/defs/job_configs.py — 2 additions & 1 deletion

@@ -50,6 +50,7 @@
     "./ci/jobs/functional_tests.py",
     "./ci/jobs/scripts/clickhouse_proc.py",
     "./ci/jobs/scripts/functional_tests_results.py",
+    "./ci/jobs/scripts/functional_tests/setup_log_cluster.sh",
     "./tests/queries",
     "./tests/clickhouse-test",
     "./tests/config",

@@ -786,7 +787,7 @@ class JobConfigs:
         include_paths=[
             "./ci/docker/fuzzer",
             "./tests/ci/ci_fuzzer_check.py",
-            "./tests/ci/ci_fuzzer_check.py",
+            "./ci/jobs/scripts/functional_tests/setup_log_cluster.sh",
             "./ci/jobs/scripts/fuzzer/",
             "./ci/docker/fuzzer",
         ],

ci/docker/integration/runner/Dockerfile — 5 additions & 3 deletions

@@ -61,7 +61,6 @@ RUN curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add - \
     && apt-get clean \
     && dockerd --version; docker --version
 
-
 # kazoo 2.10.0 is broken
 # https://s3.amazonaws.com/clickhouse-test-reports/59337/524625a1d2f4cc608a3f1059e3df2c30f353a649/integration_tests__asan__analyzer__[5_6].html
 COPY requirements.txt /

@@ -75,8 +74,11 @@ RUN curl -fsSL -O https://archive.apache.org/dist/spark/spark-3.5.5/spark-3.5.5-
 # if you change packages, don't forget to update them in tests/integration/helpers/cluster.py
 RUN packages="io.delta:delta-spark_2.12:3.1.0,\
 org.apache.hudi:hudi-spark3.5-bundle_2.12:1.0.1,\
-org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.4.3" \
-    && /spark-3.5.5-bin-hadoop3/bin/spark-shell --packages "$packages" > /dev/null \
+org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.4.3,\
+org.apache.hadoop:hadoop-aws:3.3.4,\
+com.amazonaws:aws-java-sdk-bundle:1.12.262,\
+org.apache.spark:spark-avro_2.12:3.5.1"\
+    && /spark-3.5.5-bin-hadoop3/bin/spark-shell --packages "$packages" \
     && find /root/.ivy2/ -name '*.jar' -exec ln -sf {} /spark-3.5.5-bin-hadoop3/jars/ \;
 
 RUN set -x \

ci/docker/libfuzzer/requirements.txt — 1 addition & 1 deletion

@@ -14,7 +14,7 @@ MarkupSafe==2.1.5
 more-itertools==8.10.0
 oauthlib==3.2.2
 packaging==24.1
-pip==25.0.1
+pip==25.2
 pipdeptree==2.23.0
 PyJWT==2.10.1
 pyparsing==2.4.7

ci/jobs/scripts/check_style/aspell-ignore/en/aspell-dict.txt — 6 additions & 0 deletions

@@ -814,6 +814,7 @@ DiskUnreserved
 DiskUsed
 displayName
 displaySecretsInShowAndSelect
+DistanceTransposed
 distinctdynamictypes
 distinctDynamicTypes
 distinctjsonpaths

@@ -2349,6 +2350,8 @@ py
 PyArrow
 PyCharm
 QATlib
+qbit
+QBit
 QEMU
 qryn
 QTCreator

@@ -2371,6 +2374,7 @@ quantileExactWeighted
 quantileexactweightedinterpolated
 quantileExactWeightedInterpolated
 quantileGK
+quantilePrometheusHistogram
 quantileInterpolatedWeighted
 quantiles
 quantilesExactExclusive

@@ -2840,6 +2844,8 @@ structureToCapnProtoSchema
 structureToProtobufSchema
 studentttest
 studentTTest
+studentTTestOneSample
+studentttestonesample
 subarray
 subarrays
 subBitmap

ci/jobs/scripts/check_style/check_cpp.sh — 6 additions & 0 deletions

@@ -355,6 +355,12 @@ find $ROOT_PATH/{src,programs,utils} -name '*.h' -or -name '*.cpp' | \
     grep -v -F -e \"config.h\" -e \"config_tools.h\" -e \"SQLGrammar.pb.h\" -e \"out.pb.h\" -e \"clickhouse_grpc.grpc.pb.h\" -e \"delta_kernel_ffi.hpp\" | \
     xargs -i echo "Found include with quotes in '{}'. Please use <> instead"
 
+# Forbid using std::shared_mutex and point to the faster alternative
+find ./{src,programs,utils} -name '*.h' -or -name '*.cpp' | \
+    grep -vP $EXCLUDE |
+    xargs grep 'std::shared_mutex' | \
+    xargs -i echo "Found std::shared_mutex '{}'. Please use DB::SharedMutex instead"
+
 # Context.h (and a few similar headers) is included in many parts of the
 # codebase, so any modifications to it trigger a large-scale recompilation.
 # Therefore, it is crucial to avoid unnecessary inclusion of Context.h in
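The new style check above is a find-then-grep pipeline that prints a per-file message for every file containing a forbidden token. A minimal standalone sketch of the same pattern, using a temporary directory and made-up file contents rather than the CI paths:

```shell
# Flag files containing a forbidden token, mirroring the style-check pattern:
# list candidate files, keep those that match, and print a per-file message.
tmp=$(mktemp -d)
echo 'std::shared_mutex m;' > "$tmp/bad.cpp"
echo 'DB::SharedMutex m;'   > "$tmp/good.cpp"
find "$tmp" -name '*.cpp' | \
    xargs grep -l 'std::shared_mutex' | \
    xargs -I{} echo "Found std::shared_mutex in '{}'. Please use DB::SharedMutex instead"
rm -rf "$tmp"
```

Only `bad.cpp` is reported; `grep -l` prints matching file names, and `xargs -I{}` substitutes each one into the message (the real script uses the equivalent deprecated `-i` flag).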

ci/jobs/scripts/clickhouse_proc.py — 7 additions & 2 deletions

@@ -98,6 +98,7 @@ def __init__(
         self.debug_artifacts = []
         self.extra_tests_results = []
         self.logs = []
+        self.log_export_host, self.log_export_password = None, None
 
         Utils.set_env("CLICKHOUSE_CONFIG_DIR", self.ch_config_dir)
         Utils.set_env("CLICKHOUSE_CONFIG", self.config_file)

@@ -320,7 +321,7 @@ def install_vector_search_config(self):
         for command in commands:
             res = res and Shell.check(command, verbose=True)
 
-        with open(f"{temp_dir}/config.xml", 'a') as config_file:
+        with open(f"{temp_dir}/config.xml", "a") as config_file:
             config_file.write(c1)
         return res
 

@@ -358,7 +359,11 @@ def create_log_export_config(self):
 
     def start_log_exports(self, check_start_time):
         print("Start log export")
-        os.environ["CLICKHOUSE_CI_LOGS_CLUSTER"] = CLICKHOUSE_CI_LOGS_CLUSTER
+        if self.log_export_host:
+            os.environ["CLICKHOUSE_CI_LOGS_CLUSTER"] = CLICKHOUSE_CI_LOGS_CLUSTER
+            os.environ["CLICKHOUSE_CI_LOGS_HOST"] = self.log_export_host
+            os.environ["CLICKHOUSE_CI_LOGS_USER"] = CLICKHOUSE_CI_LOGS_USER
+            os.environ["CLICKHOUSE_CI_LOGS_PASSWORD"] = self.log_export_password
         info = Info()
         os.environ["EXTRA_COLUMNS_EXPRESSION"] = (
             f"toLowCardinality('{info.repo_name}') AS repo, CAST({info.pr_number} AS UInt32) AS pull_request_number, '{info.sha}' AS commit_sha, toDateTime('{Utils.timestamp_to_str(check_start_time)}', 'UTC') AS check_start_time, toLowCardinality('{info.job_name}') AS check_name, toLowCardinality('{info.instance_type}') AS instance_type, '{info.instance_id}' AS instance_id"

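The `start_log_exports` change above guards the environment setup on `self.log_export_host`, so export variables are only set when a log-export host was actually configured. A minimal sketch of the same guard pattern (the class, method, and `DEMO_*` variable names here are illustrative, not the CI code):

```python
import os

class ProcDemo:
    """Illustrative stand-in for the CI process wrapper: log-export
    environment variables are set only when a host is configured."""

    def __init__(self, host=None, password=None):
        self.log_export_host, self.log_export_password = host, password

    def start_log_exports(self):
        # Without the guard, unconfigured runs would export None/empty creds.
        if self.log_export_host:
            os.environ["DEMO_LOGS_HOST"] = self.log_export_host
            os.environ["DEMO_LOGS_PASSWORD"] = self.log_export_password
        # ...the rest of the export logic would run either way...

ProcDemo().start_log_exports()                    # no host: env untouched
ProcDemo("logs.example.com", "s3cret").start_log_exports()
print(os.environ.get("DEMO_LOGS_HOST"))           # → logs.example.com
```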
ci/jobs/scripts/functional_tests/setup_log_cluster.sh — 21 additions & 24 deletions

@@ -14,28 +14,27 @@ set -e
 # Pre-configured destination cluster, where to export the data
 CLICKHOUSE_CI_LOGS_CLUSTER=${CLICKHOUSE_CI_LOGS_CLUSTER:-system_logs_export}
 
-[ -n "$EXTRA_COLUMNS_EXPRESSION" ] || { echo "ERROR: EXTRA_COLUMNS_EXPRESSION env must be defined"; exit 1; }
 EXTRA_COLUMNS=${EXTRA_COLUMNS:-"repo LowCardinality(String), pull_request_number UInt32, commit_sha String, check_start_time DateTime('UTC'), check_name LowCardinality(String), instance_type LowCardinality(String), instance_id String, INDEX ix_repo (repo) TYPE set(100), INDEX ix_pr (pull_request_number) TYPE set(100), INDEX ix_commit (commit_sha) TYPE set(100), INDEX ix_check_time (check_start_time) TYPE minmax, "}
-echo "EXTRA_COLUMNS_EXPRESSION=$EXTRA_COLUMNS_EXPRESSION"
+echo "EXTRA_COLUMNS_EXPRESSION=${EXTRA_COLUMNS_EXPRESSION:?}"
 EXTRA_ORDER_BY_COLUMNS=${EXTRA_ORDER_BY_COLUMNS:-"check_name"}
 
 # coverage_log needs more columns for symbolization, but only symbol names (the line numbers are too heavy to calculate)
 EXTRA_COLUMNS_COVERAGE_LOG="${EXTRA_COLUMNS} symbols Array(LowCardinality(String)), "
 EXTRA_COLUMNS_EXPRESSION_COVERAGE_LOG="${EXTRA_COLUMNS_EXPRESSION}, arrayDistinct(arrayMap(x -> demangle(addressToSymbol(x)), coverage))::Array(LowCardinality(String)) AS symbols"
 
 
-function __set_connection_args
+function __set_connection_args()
 {
     # It's impossible to use a generic $CONNECTION_ARGS string, it's unsafe from word splitting perspective.
     # That's why we must stick to the generated option
     CONNECTION_ARGS=(
         --receive_timeout=45 --send_timeout=45 --secure
-        --user "${CLICKHOUSE_CI_LOGS_USER}" --host "${CLICKHOUSE_CI_LOGS_HOST}"
-        --password "${CLICKHOUSE_CI_LOGS_PASSWORD}"
+        --user "${CLICKHOUSE_CI_LOGS_USER:?}" --host "${CLICKHOUSE_CI_LOGS_HOST:?}"
+        --password "${CLICKHOUSE_CI_LOGS_PASSWORD:?}"
     )
 }
 
-function __shadow_credentials
+function __shadow_credentials()
 {
     # The function completely screws the output, it shouldn't be used in normal functions, only in ()
     # The only way to substitute the env as a plain text is using perl 's/\Qsomething\E/another/

@@ -46,37 +45,35 @@ function __shadow_credentials
     ')
 }
 
-function check_logs_credentials
-(
+function check_logs_credentials()
+{
     # The function connects with given credentials, and if it's unable to execute the simplest query, returns exit code
 
-    # First check, if all necessary parameters are set
     set +x
     echo "Check CI Log cluster..."
-    for parameter in CLICKHOUSE_CI_LOGS_HOST CLICKHOUSE_CI_LOGS_USER CLICKHOUSE_CI_LOGS_PASSWORD; do
-        export -p | grep -q "$parameter" || {
-            echo "Credentials parameter $parameter is unset"
-            return 1
-        }
-    done
-
-    __shadow_credentials
     __set_connection_args
+    __shadow_credentials
     local code
     # Catch both success and error to not fail on `set -e`
-    clickhouse-client "${CONNECTION_ARGS[@]}" -q 'SELECT 1 FORMAT Null' && return 0 || code=$?
+    clickhouse-client "${CONNECTION_ARGS[@]:?}" -q 'SELECT 1 FORMAT Null' && return 0 || code=$?
     if [ "$code" != 0 ]; then
        echo 'Failed to connect to CI Logs cluster'
        return $code
     fi
-)
+}
 
-function setup_logs_replication
-(
+function setup_logs_replication()
+{
     # The function is launched in a separate shell instance to not expose the
     # exported values
     set +x
 
+    if [[ -n "$CLICKHOUSE_CI_LOGS_HOST" ]]; then
+        check_logs_credentials
+    else
+        echo 'No CI logs creds found, tables check will be skipped'
+    fi
+
     echo "My hostname is ${HOSTNAME}"
 
     echo 'Create all configured system logs'

@@ -142,7 +139,7 @@ function setup_logs_replication
 
     echo "$statement" | clickhouse-client --database_replicated_initial_query_timeout_sec=10 \
         --distributed_ddl_task_timeout=30 --distributed_ddl_output_mode=throw_only_active \
-        "${CONNECTION_ARGS[@]}" || continue
+        "${CONNECTION_ARGS[@]:?}" || continue
 
     echo "Creating table system.${table}_sender" >&2
 

@@ -166,9 +163,9 @@ function setup_logs_replication
         SELECT ${EXTRA_COLUMNS_EXPRESSION_FOR_TABLE}, * FROM system.${table}
     " || continue
     done
-)
+}
 
-function stop_logs_replication
+function stop_logs_replication()
 {
     echo "Detach all logs replication"
     clickhouse-client --query "select database||'.'||table from system.tables where database = 'system' and (table like '%_sender' or table like '%_watcher')" | {
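Several hunks above replace explicit is-set checks with bash's `${VAR:?}` expansion, which aborts the (sub)shell with a non-zero status and a message on stderr when the variable is unset or empty. A small standalone demo of the idiom (`DEMO_CRED` is a made-up variable, not one of the CI credentials):

```shell
# ${VAR:?message} expands to $VAR when set and non-empty, and otherwise
# aborts the (sub)shell with the given message, so missing values fail fast.
unset DEMO_CRED
if ( : "${DEMO_CRED:?DEMO_CRED must be set}" ) 2>/dev/null; then
    echo "unexpectedly passed"
else
    echo "aborted: DEMO_CRED unset"
fi
DEMO_CRED=s3cret
echo "ok: ${DEMO_CRED:?}"   # prints "ok: s3cret"
```

Running the expansion inside `( ... )` confines the abort to a subshell, which is why the script's `"${CONNECTION_ARGS[@]:?}"` checks are safe to evaluate inline.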

ci/jobs/scripts/workflow_hooks/filter_job.py — 3 additions & 3 deletions

@@ -38,12 +38,12 @@ def only_docs(changed_files):
 
 INTEGRATION_TEST_FLAKY_CHECK_JOBS = [
     "Build (amd_asan)",
-    "Integration tests (asan, flaky check)",
+    "Integration tests (amd_asan, flaky check)",
 ]
 
 FUNCTIONAL_TEST_FLAKY_CHECK_JOBS = [
     "Build (amd_asan)",
-    "Stateless tests (asan, flaky check)",
+    "Stateless tests (amd_asan, flaky check)",
 ]
 
 _info_cache = None

@@ -78,7 +78,7 @@ def should_skip_job(job_name):
         ):
             return (
                 True,
-                f"Skipped, labeled with '{Labels.CI_INTEGRATION_FLAKY}' - run integration test jobs only",
+                f"Skipped, labeled with '{Labels.CI_INTEGRATION_FLAKY}' - run integration test flaky check job only",
             )
 
         if (
