Commit 3a2240a
Merge remote-tracking branch 'origin/master' into workload-memory-scheduling
2 parents: 2a4de30 + eef623a

94 files changed: +3241 / -2683 lines


.claude/skills/review/SKILL.md

Lines changed: 19 additions & 6 deletions

@@ -18,6 +18,10 @@ allowed-tools: Task, Bash, Read, Glob, Grep, WebFetch, AskUserQuestion
 - Fetch PR metadata (title, description, base/head refs, changed files).
 - Fetch the full PR diff.
 - Note the PR title, description, and linked issues
+- Validate PR template metadata against `.github/PULL_REQUEST_TEMPLATE.md`:
+  - `Changelog category` is present, valid, and semantically correct for the actual code change.
+  - `Changelog entry` is present and user-readable when required by the selected category.
+  - `Changelog entry` quality follows ClickHouse expectations: specific user-facing impact, no vague wording, and migration guidance for backward-incompatible changes.
 
 **If a branch name is given:**
 - Get the diff against `master`.

@@ -45,7 +49,7 @@ SCOPE & LANGUAGE
 
 INPUTS YOU WILL RECEIVE
 - PR title, description, motivation
-- PR template fields (`Changelog category`, `Changelog entry`)
+- PR template changelog metadata (`Changelog category`, `Changelog entry`, requirement/sufficiency, and user-facing quality)
 - Diff (file paths, added/removed lines)
 - Linked issues / discussions
 - CI status and logs (if available)

@@ -93,7 +97,8 @@ WHAT TO REVIEW VS WHAT TO IGNORE
 - Scan all changed lines for typos in comments, variable names, string literals, log messages, error messages, and documentation.
 - Report all typos found with suggested corrections.
 - Check that error messages are clear, informative, and help the user understand what went wrong and how to fix it.
-- Review PR template changelog quality: `Changelog category` must match the change, and `Changelog entry` (when required by the PR template) must be present and user-readable.
+- Review PR template changelog quality: `Changelog category` must match the change, and `Changelog entry` (when required by the PR template) must be present, specific, and user-readable.
+- Read the changelog-entry standards from `clickhouse-pr-description` and apply them: avoid vague text (e.g. "fix bug"), describe the exact affected feature/behavior, and for backward-incompatible changes explain old behavior, new behavior, and how to preserve old behavior when possible.
 
 **Explicitly ignore (do not comment on these unless they indicate a bug):**
 - Commented debugging code (completely ignore for draft PR, no more than one message in total)

@@ -192,6 +197,8 @@ CLICKHOUSE RULES (MANDATORY)
   Ensure incremental rollout is feasible in both OSS and Cloud (feature flags, safe defaults, non-disruptive changes).
 - **Compilation time**
   Follow checklist **7) Compilation time & build impact**. Treat violations there as ClickHouse-rule issues.
+- **PR metadata quality**
+  For PR-number reviews, verify PR template metadata against `.github/PULL_REQUEST_TEMPLATE.md`: `Changelog category` correctness, required `Changelog entry` quality, and alignment with `clickhouse-pr-description` changelog guidance (specificity, user impact, and migration details for backward-incompatible changes).
 
 SEVERITY MODEL – WHAT DESERVES A COMMENT
 

@@ -221,11 +228,17 @@ SEVERITY MODEL – WHAT DESERVES A COMMENT
 REQUESTED OUTPUT FORMAT
 Respond with the following sections. Be terse but specific. Include code suggestions as minimal diffs/patches where helpful.
 Focus on problems — do not describe what was checked and found to be fine. Use emojis (❌ ⚠️ ✅ 💡) to make findings scannable.
-**Omit any section entirely if there is nothing notable to report in it** — do not include a section just to say "looks good" or "no concerns". The only mandatory sections are Summary, ClickHouse Compliance, and Final Verdict.
+**Omit any section entirely if there is nothing notable to report in it** — do not include a section just to say "looks good" or "no concerns". The only mandatory sections are Summary, ClickHouse Rules, and Final Verdict.
 
 **Summary**
 - One paragraph explaining what the PR does and your high-level verdict.
 
+**PR Metadata** (omit if no issues found)
+- State whether `Changelog category` is correct for the actual change.
+- State whether `Changelog entry` is required by the chosen category, and whether the provided entry satisfies that requirement.
+- Evaluate `Changelog entry` quality using `clickhouse-pr-description` criteria (specific change, user impact, and migration guidance for backward-incompatible changes).
+- If any item is incorrect, provide the exact replacement text.
+
 **Missing context** (omit if none)
 - Bullet list of critical info you lacked. Prefix each item with ⚠️ (e.g., ⚠️ No CI logs available, ⚠️ No benchmarks provided).
 - If PR motivation/reason is not clear from the title and description, add a ⚠️ item explicitly stating that motivation is unclear.

@@ -237,11 +250,10 @@ Focus on problems — do not describe what was checked and found to be fine. Use
 - **⚠️ Majors**
   - `[File:Line(s)]` Issue + rationale.
   - Suggested fix.
-- **💡 Nits** (only if they reduce bug risk or user confusion)
+- **💡 Nits**
   - `[File:Line(s)]` Issue + quick fix.
-  - Use this section for changelog-template quality issues (`Changelog category` mismatch, missing/unclear required `Changelog entry`).
+  - Use this section for changelog-template quality issues (`Changelog category` mismatch, missing/unclear required `Changelog entry`, or low-quality user-facing `Changelog entry` that is too vague).
 
-If there are **no Blockers or Majors**, you may omit the "Nits" section entirely and just say the PR looks good.
 
 **Tests** (omit if adequate)
 - Only include this section if tests are **missing or insufficient**. Prefix each missing test with ⚠️. Specify which additional tests to add and why.

@@ -261,6 +273,7 @@ Example:
 | No magic constants || |
 | Backward compatibility | ⚠️ | Default changed without `SettingsChangesHistory.cpp` update |
 | `SettingsChangesHistory.cpp` || Not updated |
+| PR metadata quality | ⚠️ | `Changelog category` does not match change type; `Changelog entry` is too vague for users |
 | Safe rollout || |
 | Compilation time || |

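The changelog-metadata checks this skill describes can be sketched as a small validator. The field names (`Changelog category`, `Changelog entry`) come from the skill text; the category list and the vague-wording patterns below are illustrative assumptions, not the authoritative contents of `.github/PULL_REQUEST_TEMPLATE.md`:

```python
import re

# Illustrative subset of categories; the authoritative list lives in
# .github/PULL_REQUEST_TEMPLATE.md.
VALID_CATEGORIES = {
    "New Feature",
    "Improvement",
    "Performance Improvement",
    "Bug Fix (user-visible misbehavior in an official stable release)",
    "Backward Incompatible Change",
    "Not for changelog",
}
# Hypothetical patterns for entries too vague to ship to users.
VAGUE_ENTRY = re.compile(
    r"^\s*(fix(ed)?( a)? bug|minor fix|small change|misc)\s*\.?\s*$", re.IGNORECASE
)


def check_changelog_metadata(category: str, entry: str) -> list[str]:
    """Return review findings for the PR template changelog fields."""
    findings = []
    if category not in VALID_CATEGORIES:
        findings.append(f"Changelog category {category!r} is not a valid template category")
    if category != "Not for changelog":
        if not entry.strip():
            findings.append("Changelog entry is required for this category but is empty")
        elif VAGUE_ENTRY.match(entry):
            findings.append(f"Changelog entry {entry!r} is too vague for users")
    return findings
```

A reviewer agent would feed the parsed template fields into such a check and report each finding under the PR Metadata section.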
.gitignore

Lines changed: 1 addition & 0 deletions
@@ -12,6 +12,7 @@
 /build
 /build_*
 /build-*
+/ci/tmp
 /tests/venv
 /obj-x86_64-linux-gnu/

ci/docker/integration/runner/Dockerfile

Lines changed: 0 additions & 2 deletions
@@ -80,8 +80,6 @@ org.apache.hudi:hudi-spark3.5-bundle_2.12:1.0.1,\
 org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.4.3,\
 org.apache.hadoop:hadoop-aws:3.3.4,\
 com.amazonaws:aws-java-sdk-bundle:1.12.262,\
-org.apache.hadoop:hadoop-azure:3.3.4,\
-com.microsoft.azure:azure-storage:8.6.6,\
 org.apache.spark:spark-avro_2.12:3.5.1"\
 && /spark-3.5.5-bin-hadoop3/bin/spark-shell --packages "$packages" \
 && find /root/.ivy2/ -name '*.jar' -exec ln -sf {} /spark-3.5.5-bin-hadoop3/jars/ \;

ci/jobs/integration_test_job.py

Lines changed: 162 additions & 1 deletion
@@ -1,9 +1,10 @@
 import argparse
 import os
+import re
 import subprocess
 import time
 from pathlib import Path
-from typing import List, Tuple
+from typing import List, Optional, Tuple
 
 from more_itertools import tail
 

@@ -103,6 +104,150 @@ def _start_docker_in_docker():
     print(f"Started docker-in-docker asynchronously with PID {dockerd_proc.pid}")
 
 
+_COMPOSE_DIR = Path("./tests/integration/compose")
+
+# Explicit mapping for with_* flags whose compose file name cannot be derived
+# by simply prepending "docker_compose_" and appending ".yml".
+_WITH_FLAG_TO_COMPOSE: dict[str, List[str]] = {
+    "mysql57": ["docker_compose_mysql.yml"],
+    "mysql8": ["docker_compose_mysql_8_0.yml"],
+    "dremio26": ["docker_compose_dremio_26_0.yml"],
+    "kerberos_kdc": ["docker_compose_kerberos_kdc.yml"],
+    # with_iceberg_catalog can use any of the iceberg catalogs; include them all
+    "iceberg_catalog": [
+        "docker_compose_iceberg_rest_catalog.yml",
+        "docker_compose_iceberg_hms_catalog.yml",
+        "docker_compose_iceberg_lakekeeper_catalog.yml",
+        "docker_compose_iceberg_nessie_catalog.yml",
+    ],
+    "hms_catalog": ["docker_compose_iceberg_hms_catalog.yml"],
+    "glue_catalog": ["docker_compose_glue_catalog.yml"],
+    "prometheus_writer": ["docker_compose_prometheus.yml"],
+    "prometheus_reader": ["docker_compose_prometheus.yml"],
+    "prometheus_receiver": ["docker_compose_prometheus.yml"],
+    # with_odbc_drivers implicitly sets up mysql8 + postgres
+    "odbc_drivers": ["docker_compose_mysql_8_0.yml", "docker_compose_postgres.yml"],
+    # Flags with no separate compose file of their own
+    "jdbc_bridge": [],
+    "net_trics": [],
+}
+
+
+def get_compose_files_for_test_modules(test_modules: List[str]) -> List[Path]:
+    """Return compose files needed by the given test modules.
+
+    Grep every Python source file in each test suite directory for:
+    - `with_X=True` patterns (mapped via `_WITH_FLAG_TO_COMPOSE` or the obvious
+      `docker_compose_{X}.yml` naming convention), and
+    - explicit `docker_compose_*.yml` file name strings (used e.g. via
+      `extra_parameters={"docker_compose_file_name": "..."}` calls).
+    """
+    needed: set[Path] = set()
+    suite_dirs = {m.split("/")[0] for m in test_modules}
+
+    for suite_dir in suite_dirs:
+        suite_path = Path("./tests/integration/") / suite_dir
+        if not suite_path.is_dir():
+            continue
+        for py_file in suite_path.glob("**/*.py"):
+            try:
+                content = py_file.read_text(errors="replace")
+            except OSError:
+                continue
+
+            # 1. with_X=True → compose file via mapping or naming convention
+            for m in re.finditer(r"\bwith_(\w+)\s*=\s*True", content):
+                flag = m.group(1)
+                if flag in _WITH_FLAG_TO_COMPOSE:
+                    for fname in _WITH_FLAG_TO_COMPOSE[flag]:
+                        p = _COMPOSE_DIR / fname
+                        if p.exists():
+                            needed.add(p)
+                else:
+                    p = _COMPOSE_DIR / f"docker_compose_{flag}.yml"
+                    if p.exists():
+                        needed.add(p)
+
+            # 2. Directly named compose files (e.g. in extra_parameters dicts)
+            for m in re.finditer(r"(docker_compose_\w+\.yml)", content):
+                p = _COMPOSE_DIR / m.group(1)
+                if p.exists():
+                    needed.add(p)
+
+    return sorted(needed)
+
+
+def get_images_from_compose_files(compose_files: List[Path]) -> List[str]:
+    """Parse compose files and return a deduplicated list of image references.
+
+    Environment variable placeholders like `${DOCKER_NGINX_DAV_TAG:-latest}` are
+    resolved from `os.environ`. For clickhouse images that appear without a tag
+    (e.g. `clickhouse/integration-test`) the tag is looked up from `IMAGES_ENV`.
+    Images with still-unresolvable variables are silently skipped.
+    """
+    known_image_tags: dict[str, str] = {}
+    for image_name, env_var in IMAGES_ENV.items():
+        tag = os.environ.get(env_var)
+        if tag:
+            known_image_tags[image_name] = tag
+
+    def resolve_image(raw: str) -> Optional[str]:
+        def replace_var(m: re.Match) -> str:
+            var_name = m.group(1)
+            default = m.group(2) if m.group(2) is not None else "latest"
+            return os.environ.get(var_name, default)
+
+        resolved = re.sub(r"\$\{(\w+)(?::-([^}]*))?\}", replace_var, raw)
+        if "${" in resolved:
+            return None  # Still-unresolvable variable — skip
+        # Append the correct tag for tagless known clickhouse images
+        if ":" not in resolved and resolved in known_image_tags:
+            resolved = f"{resolved}:{known_image_tags[resolved]}"
+        return resolved
+
+    images: set[str] = set()
+    for compose_file in compose_files:
+        try:
+            content = compose_file.read_text()
+        except OSError:
+            continue
+        for m in re.finditer(r"^\s+image:\s+(.+)$", content, re.MULTILINE):
+            # Strip inline YAML comments from unquoted values before resolving
+            # (e.g. `coredns/coredns:1.9.3 # :latest broke this test`).
+            raw = re.sub(r"\s+#.*$", "", m.group(1).strip())
+            resolved = resolve_image(raw)
+            if resolved:
+                images.add(resolved)
+
+    return sorted(images)
+
+
+def prefetch_images(
+    images: List[str], retries: int = 3, pull_timeout: int = 300
+) -> bool:
+    """Pull every image in parallel using `ci/prefetch-integration-test-images`.
+
+    Images with no manifest for the current architecture (e.g. amd64-only images
+    on arm64 runners) are silently skipped. Returns True on success, False if any
+    image fails to pull for a real reason.
+    """
+    if not images:
+        print("No images to pre-fetch.")
+        return True
+
+    script = f"{repo_dir}/ci/jobs/scripts/prefetch-integration-test-images"
+    env = {
+        **os.environ,
+        "PULL_RETRIES": str(retries),
+        "PULL_TIMEOUT": str(pull_timeout),
+    }
+    return Shell.check(
+        f"{script} {' '.join(images)}",
+        verbose=True,
+        env=env,
+    )
+
+
 def parse_args():
     parser = argparse.ArgumentParser(description="ClickHouse Build Job")
     parser.add_argument("--options", help="Job parameters: ...")

@@ -617,6 +762,22 @@ def main():
         else:
             assert False, f"No tag found for image [{image_name}]"
 
+    # Pre-fetch all Docker images needed by the selected test suites.
+    # This is done after IMAGES_ENV vars are set so tag resolution works correctly.
+    # Fail fast here rather than discovering missing images mid-test-run.
+    all_test_modules = parallel_test_modules + sequential_test_modules
+    compose_files = get_compose_files_for_test_modules(all_test_modules)
+    print(
+        f"Compose files detected for this batch ({len(compose_files)}): "
+        + ", ".join(str(f.name) for f in compose_files)
+    )
+    images_to_prefetch = get_images_from_compose_files(compose_files)
+    if not prefetch_images(images_to_prefetch):
+        Result.create_from(
+            status=Result.Status.ERROR,
+            info="Failed to pre-pull Docker images needed by the test batch",
+        ).complete_job()
+
     test_env = {
         "CLICKHOUSE_TESTS_BASE_CONFIG_DIR": clickhouse_server_config_dir,
         "CLICKHOUSE_TESTS_SERVER_BIN_PATH": clickhouse_path,
Lines changed: 98 additions & 0 deletions
@@ -0,0 +1,98 @@
+#!/bin/bash
+# Pre-fetches Docker images in parallel before integration tests start.
+#
+# Images that are unavailable for the current architecture (e.g. amd64-only
+# images on arm64 runners) are silently skipped rather than failing the job.
+#
+# Usage:
+#   ci/prefetch-integration-test-images IMAGE [IMAGE ...]
+#
+# Environment variables:
+#   PULL_RETRIES   Number of pull attempts per image (default: 3)
+#   PULL_TIMEOUT   Per-attempt timeout in seconds (default: 300)
+
+set -uo pipefail
+
+PULL_RETRIES="${PULL_RETRIES:-3}"
+PULL_TIMEOUT="${PULL_TIMEOUT:-300}"
+
+images=("$@")
+if [[ ${#images[@]} -eq 0 ]]; then
+    echo "No images to pre-fetch."
+    exit 0
+fi
+
+echo "Pre-fetching ${#images[@]} Docker image(s) in parallel:"
+for img in "${images[@]}"; do
+    echo "  $img"
+done
+
+work_dir="$(mktemp -d)"
+trap 'rm -rf "$work_dir"' EXIT
+
+pull_one()
+{
+    local image="$1"
+    local fail_file="$2"
+    local attempt out start elapsed
+
+    for ((attempt = 1; attempt <= PULL_RETRIES; attempt++)); do
+        echo "Pulling $image (attempt $attempt/$PULL_RETRIES) ..."
+        start=$SECONDS
+        if out=$(timeout "$PULL_TIMEOUT" docker pull "$image" 2>&1); then
+            elapsed=$((SECONDS - start))
+            echo "Pulled $image in ${elapsed}s"
+            return 0
+        fi
+        elapsed=$((SECONDS - start))
+        # Arch-specific image — not a real failure on this runner.
+        if echo "$out" | grep -q "no matching manifest"; then
+            echo "SKIP $image: no manifest for this architecture (${elapsed}s)"
+            return 0
+        fi
+        echo "Pull of $image failed on attempt $attempt (${elapsed}s elapsed):"
+        echo "$out"
+        if ((attempt < PULL_RETRIES)); then
+            sleep 5
+        fi
+    done
+
+    echo "FAILED to pull $image after $PULL_RETRIES attempt(s)"
+    echo "$image" > "$fail_file"
+    return 1
+}
+
+# Launch all pulls in parallel; each writes its image name to a unique file on failure.
+pids=()
+for image in "${images[@]}"; do
+    # Sanitize image name to a safe filename.
+    safe="${image//[^a-zA-Z0-9._-]/_}"
+    pull_one "$image" "$work_dir/fail_${safe}" &
+    pids+=($!)
+done
+
+# Collect exit statuses.
+any_fail=0
+for pid in "${pids[@]}"; do
+    wait "$pid" || any_fail=1
+done
+
+# Report failures.
+failed=()
+for f in "$work_dir"/fail_*; do
+    [[ -s "$f" ]] && failed+=("$(cat "$f")")
+done
+
+if [[ ${#failed[@]} -gt 0 ]]; then
+    echo "ERROR: Failed to pull the following image(s) after $PULL_RETRIES attempt(s): ${failed[*]}"
+    exit 1
+fi
+
+if [[ $any_fail -ne 0 ]]; then
+    # A pull_one returned non-zero without writing a fail file — shouldn't happen,
+    # but guard against it anyway.
+    echo "ERROR: One or more image pulls failed unexpectedly."
+    exit 1
+fi
+
+echo "All images pre-fetched successfully."
