Commit 220603a

Merge origin/master into yarik/case-insensitive-identifiers
Resolved conflicts:
- `IdentifierNode.cpp`: adapted `setQuoteStyles` to use `make_intrusive` instead of `std::make_shared`
- `QueryAnalyzer.cpp`: combined PR's `is_part_double_quoted` tracking with master's `allow_to_resolve_niladic_functions` parameter; added both new extern settings
- `Settings.h`: added both `CaseInsensitiveNames` and `DeduplicateInsertMode` setting types
- `ExpressionElementParsers.cpp`: adapted `setQuoteStyle`/`setQuoteStyles` calls to use `make_intrusive`
2 parents 5dce45b + 299127e commit 220603a

7,517 files changed, +446337 -90400 lines changed

.claude/CLAUDE.md

Lines changed: 140 additions & 14 deletions
Original file line number | Diff line number | Diff line change
@@ -1,30 +1,40 @@
11
When working with a branch, do not use rebase or amend - add new commits instead.
22

3+
Do not commit to the master branch. Create a new branch for every task.
4+
35
When writing text such as documentation, comments, or commit messages, wrap literal names from ClickHouse SQL language, classes and functions, or literal excerpts from log messages inside inline code blocks, such as: `MergeTree`.
46

57
When writing text such as documentation, comments, or commit messages, write names of functions and methods as `f` instead of `f()` - we prefer it for mathematical purity when it refers to a function itself rather than its application.
68

79
When mentioning logical errors, say "exception" instead of "crash", because they don't crash the server in the release build.
810

9-
Links to ClickHouse CI, such as `https://s3.amazonaws.com/clickhouse-test-reports/json.html?...` should be interpreted with a headless browser, e.g., Playwright, because they contain JavaScript. Use the tool at `.claude/tools/fetch_ci_report.js`:
11+
Links to ClickHouse CI should be analyzed using the tool at `.claude/tools/fetch_ci_report.js`, which directly fetches the underlying JSON data without requiring a browser. It accepts GitHub PR URLs (fetches all CI reports) or direct S3/CI HTML URLs.
1012

1113
```bash
12-
# Install playwright if needed (one-time setup)
13-
cd /tmp && npm install playwright && npx playwright install chromium
14+
# Fetch all CI reports for a PR
15+
node .claude/tools/fetch_ci_report.js "https://github.com/ClickHouse/ClickHouse/pull/12345"
16+
17+
# Show only failed tests with CIDB links
18+
node .claude/tools/fetch_ci_report.js "https://github.com/ClickHouse/ClickHouse/pull/12345" --failed --cidb
19+
20+
# Fetch only a specific report from a PR (by index)
21+
node .claude/tools/fetch_ci_report.js "https://github.com/ClickHouse/ClickHouse/pull/12345" --report 2
22+
23+
# Filter by test name, show artifact links
24+
node .claude/tools/fetch_ci_report.js "<url>" --test peak_memory --links
1425

15-
# Fetch and analyze CI report
16-
node /path/to/ClickHouse/.claude/tools/fetch_ci_report.js "<ci-url>" [options]
26+
# Download logs and show failed tests
27+
node .claude/tools/fetch_ci_report.js "<url>" --failed --download-logs
1728

1829
# Options:
19-
# --test <name> Filter tests by name
20-
# --failed Show only failed tests
21-
# --all Show all test results
22-
# --links Show artifact links (logs.tar.gz, etc.)
23-
# --download-logs Download logs.tar.gz to /tmp/ci_logs.tar.gz
24-
25-
# Examples:
26-
node .claude/tools/fetch_ci_report.js "https://s3.amazonaws.com/..." --failed --links
27-
node .claude/tools/fetch_ci_report.js "https://s3.amazonaws.com/..." --test peak_memory --download-logs
30+
# --test <name> Filter tests by name
31+
# --failed Show only failed tests
32+
# --all Show all test results
33+
# --links Show artifact links (logs.tar.gz, etc.)
34+
# --cidb Show CIDB links for failed tests
35+
# --report <number> For PR URLs: fetch only one specific report
36+
# --download-logs Download logs.tar.gz to /tmp/ci_logs.tar.gz
37+
# --credentials <user,password> HTTP Basic Auth for private repositories
2838
```
2939

3040
After downloading logs, extract specific test logs:
@@ -33,6 +43,122 @@ tar -xzf /tmp/ci_logs.tar.gz ci/tmp/pytest_parallel.jsonl
3343
grep "test_name" ci/tmp/pytest_parallel.jsonl | python3 -c "import sys,json; [print(json.loads(l).get('longrepr','')) for l in sys.stdin if 'failed' in l]"
3444
```
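
The `grep` plus `python3 -c` one-liner above can also be written as a small function, which is easier to reuse and test. This is a minimal sketch, assuming each line of `ci/tmp/pytest_parallel.jsonl` is a JSON object that may carry a `longrepr` field (the only field named in the source). The function name `failed_reprs` is hypothetical. Unlike the one-liner, it skips lines that are not valid JSON instead of raising.

```python
import json


def failed_reprs(lines, test_name):
    """Return the `longrepr` text of lines that mention test_name and 'failed'."""
    reprs = []
    for line in lines:
        # Same coarse substring filters as `grep "test_name"` and `if 'failed' in l`.
        if test_name not in line or "failed" not in line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # assumption: truncated or non-JSON lines can be ignored
        if isinstance(record, dict):
            reprs.append(record.get("longrepr", ""))
    return reprs
```

It could be called with an open file object, for example `failed_reprs(open("ci/tmp/pytest_parallel.jsonl"), "test_name")`, printing each returned string.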
3545

46+
To analyze CI performance comparison results (slower/faster queries, unstable queries), use the tool at `.claude/tools/fetch_perf_report.py`. It fetches the machine-readable `all-query-metrics.tsv` from S3 for each performance shard, filters to `client_time`, and classifies queries as changed or unstable using the same thresholds as `compare.sh`.
47+
48+
```bash
49+
# Show performance changes for a PR (default: changed + unstable queries only)
50+
python3 .claude/tools/fetch_perf_report.py "https://github.com/ClickHouse/ClickHouse/pull/12345"
51+
52+
# Filter by architecture
53+
python3 .claude/tools/fetch_perf_report.py "https://github.com/ClickHouse/ClickHouse/pull/12345" --arch amd
54+
55+
# Show only per-shard summary (no individual queries)
56+
python3 .claude/tools/fetch_perf_report.py "https://github.com/ClickHouse/ClickHouse/pull/12345" --summary
57+
58+
# Filter by test name
59+
python3 .claude/tools/fetch_perf_report.py "https://github.com/ClickHouse/ClickHouse/pull/12345" --test group_by
60+
61+
# Show all queries (not just changes)
62+
python3 .claude/tools/fetch_perf_report.py "https://github.com/ClickHouse/ClickHouse/pull/12345" --all --sort times
63+
64+
# JSON output for structured analysis
65+
python3 .claude/tools/fetch_perf_report.py "https://github.com/ClickHouse/ClickHouse/pull/12345" --json
66+
67+
# TSV output for piping
68+
python3 .claude/tools/fetch_perf_report.py "https://github.com/ClickHouse/ClickHouse/pull/12345" --tsv
69+
70+
# Also accepts CI HTML URLs
71+
python3 .claude/tools/fetch_perf_report.py "https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=12345&sha=abc123"
72+
```
73+
74+
Key options: `--arch <amd|arm|all>` to filter architecture, `--metric <name>` to change metric (default `client_time`), `--shard <n>` for a specific shard, `--test <name>` / `--query <text>` for substring filtering, `--sort <diff|times|threshold|test>` for ordering, `--summary` for shard-level overview only, `--json` / `--tsv` for machine-readable output.
75+
76+
To compile and run C++ code snippets against the ClickHouse codebase without modifying any source files, use the tool at `.claude/tools/cppexpr.sh`. This is a wrapper around `utils/c++expr` that auto-detects build directories and handles working directory setup. When asked about the size, layout, or alignment of ClickHouse data structures, or asked to compare performance of code snippets, use this tool to get a definitive answer instead of guessing.
77+
78+
```bash
79+
# Query the size of a ClickHouse data structure
80+
.claude/tools/cppexpr.sh -i Core/Block.h 'OUT(sizeof(DB::Block))'
81+
82+
# Query multiple expressions at once
83+
.claude/tools/cppexpr.sh -i Core/Field.h 'OUT(sizeof(DB::Field)) OUT(sizeof(DB::Array))'
84+
85+
# Use global code for helper functions or custom types
86+
.claude/tools/cppexpr.sh -g 'struct Foo { int a; double b; };' 'OUT(sizeof(Foo)) OUT(alignof(Foo))'
87+
88+
# Benchmark a code snippet (100000 iterations, 5 tests)
89+
.claude/tools/cppexpr.sh -i Common/Stopwatch.h -b 100000 'Stopwatch sw;'
90+
91+
# Standalone mode (no ClickHouse headers, just standard C++)
92+
.claude/tools/cppexpr.sh --plain 'OUT(sizeof(std::string))'
93+
```
94+
95+
Key options: `-i HEADER` to include headers, `-g 'CODE'` for global-scope code, `-b STEPS` for benchmarking, `-l LIB` to link extra libraries, `--plain` for standalone compilation without ClickHouse. The `OUT(expr)` macro prints `expr -> value`.
96+
97+
When asked to analyze assembly, inspect generated code, find register spills, check branch density, compare codegen between builds, or investigate optimization opportunities in compiled functions, use the tool at `.claude/tools/analyze-assembly.py`. It disassembles functions from a compiled binary, builds a CFG, computes metrics (spill/branch/call density), and reports findings. Use it instead of manually running `llvm-objdump` or `llvm-nm`.
98+
99+
```bash
100+
# Basic analysis of a function
101+
python3 .claude/tools/analyze-assembly.py <binary> "<function_name>"
102+
103+
# Search for overloaded/templated functions by regex
104+
python3 .claude/tools/analyze-assembly.py <binary> "insertRangeFrom" --search
105+
106+
# Pick a specific overload from ambiguous results
107+
python3 .claude/tools/analyze-assembly.py <binary> "insertRangeFrom" --search --select 3
108+
109+
# JSON output for structured analysis
110+
python3 .claude/tools/analyze-assembly.py <binary> "<function_name>" --format json
111+
112+
# Source-interleaved disassembly (needs debug info)
113+
python3 .claude/tools/analyze-assembly.py <binary> "<function_name>" --source
114+
115+
# Microarchitectural analysis of loop bodies (--mcpu is required)
116+
python3 .claude/tools/analyze-assembly.py <binary> "<function_name>" --mca --mcpu=znver3
117+
118+
# Profile-weighted analysis (re-ranks findings by runtime impact)
119+
python3 .claude/tools/analyze-assembly.py <binary> "<function_name>" --perf-map tmp/perf.map.jsonl
120+
121+
# Compare codegen between two builds
122+
python3 .claude/tools/analyze-assembly.py --before <old_binary> --after <new_binary> "<function_name>"
123+
124+
# Analyze function at a specific address (useful for heavily-templated symbols)
125+
python3 .claude/tools/analyze-assembly.py <binary> 0x0dc7c780
126+
127+
# Verbose mode to see tool commands
128+
python3 .claude/tools/analyze-assembly.py <binary> "<function_name>" -v
129+
```
130+
131+
Key options: `--search` for regex matching, `--fuzzy` for substring matching, `--select N` to pick from ambiguous results, `--all` to analyze all matches, `--context N` to show surrounding symbols, `--max-instructions N` to control output size, `--mca --mcpu=<model>` for llvm-mca throughput analysis, `--perf-map <file>` for runtime-weighted scoring, `--before`/`--after` for diff mode. Hex addresses (e.g. `0x0dc7c780`) are resolved to the enclosing symbol automatically — useful when symbol names are too long for regex matching. The tool caches symbol tables by build-id for fast repeated queries.
132+
36133
You can build multiple versions of ClickHouse inside `build_*` directories, such as `build`, `build_debug`, `build_asan`, etc.
37134

38135
You can run integration tests as in `tests/integration/README.md` using: `python -m ci.praktika run "integration" --test <selectors>` invoked from the repository root.
136+
137+
When writing tests, do not add "no-*" tags (like "no-parallel") unless strictly necessary.
138+
139+
When writing tests in tests/queries, prefer adding a new test instead of extending existing ones.
140+
141+
When adding a new test, consult `./tests/queries/0_stateless/add-test` to determine the correct name prefix for the new test.
142+
143+
When writing C++ code, always use Allman-style braces (opening brace on a new line). This is enforced by the style check in CI.
144+
145+
Never use sleep in C++ code to fix race conditions - this is stupid and not acceptable!
146+
147+
When writing messages, say ASan, not ASAN, and similar (because there are two words: Address Sanitizer).
148+
149+
When checking the CI status, pay attention to the comment from the robot with the links first. Look at the Praktika reports first. The logs of GitHub Actions usually contain less info.
150+
151+
Do not use `-j` argument with ninja; do not use `nproc` - let it decide automatically.
152+
153+
When building ClickHouse (running ninja), always redirect output to the build log file in the build directory. Always use a subagent to analyze the log and return only a concise summary.
154+
155+
When running tests, always redirect output to a log file in the build directory (e.g. `<build_directory>/test_<test_name>.log`). Use unique file names per test so multiple tests can run in parallel. Always use a subagent to analyze each log and return only a concise summary.
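
One way to follow the per-test log naming rule above is sketched here. It assumes the log name pattern `<build_directory>/test_<test_name>.log` from the rule. The helper names `log_path_for` and `run_logged`, and the replacement of unusual characters with `_`, are illustrative assumptions and not part of any ClickHouse tool.

```python
import re
import subprocess
import sys
from pathlib import Path


def log_path_for(build_dir, test_name):
    """Return `<build_dir>/test_<test_name>.log` with a filesystem-safe test name."""
    # Assumption: characters outside this set are replaced so that names
    # containing `/` or spaces still map to a single file.
    safe = re.sub(r"[^A-Za-z0-9_.-]", "_", test_name)
    return Path(build_dir) / f"test_{safe}.log"


def run_logged(command, build_dir, test_name):
    """Run command with stdout and stderr redirected to the per-test log file."""
    log_path = log_path_for(build_dir, test_name)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with open(log_path, "w", encoding="utf-8") as log:
        result = subprocess.run(
            command, stdout=log, stderr=subprocess.STDOUT, check=False
        )
    return result.returncode, log_path
```

Because each test writes to its own file, several invocations can run in parallel without interleaving output, and each log can be summarized separately.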
156+
157+
If I provided a URL with the CI report, logs, or examples, include it in the commit message.
158+
159+
When creating or updating a pull request, use `.github/PULL_REQUEST_TEMPLATE.md` as the PR body template. The body should contain: a short description of the change and motivation, then the Changelog category (leave one from the list), then the Changelog entry, then the Documentation entry checkbox. Do not invent a custom "## Summary" or "## Test plan" structure — follow the template exactly. The "Bug Fix" category should be used only for real bug fixes, while for fixing CI reports you can use the "CI Fix or improvement" category. Include the URL to CI report I provided if any. If the PR is about a CI failure, search for the corresponding open issues and provide a link in the PR description.
160+
161+
ARM machines in CI are not slow. They are similar to x86 in performance.
162+
163+
Use `tmp` subdirectory in the current directory for temporary files (logs, downloads, scripts, etc.), do not use `/tmp`. Create the directory if needed.
164+

.claude/instructions.md

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
1+
# ClickHouse Development Instructions
2+
3+
## Running Stateless Tests
4+
5+
Stateless tests are located in `tests/queries/0_stateless/`.
6+
7+
### Prerequisites
8+
1. Build ClickHouse: `cd build && ninja clickhouse`
9+
2. Start the server: `./build/programs/clickhouse server --config-file ./programs/server/config.xml`
10+
3. Wait for server to be ready: `./build/programs/clickhouse client -q "SELECT 1"`
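
Step 3 above can be automated by retrying the `SELECT 1` probe until it succeeds. This is a rough sketch, assuming the client binary path from step 2. The function names, the retry count, and the delay are hypothetical choices, not values taken from the ClickHouse test tooling.

```python
import subprocess
import time


def wait_until_ready(probe, attempts=30, delay=1.0):
    """Call probe() until it returns True; report whether it ever succeeded."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt + 1 < attempts:
            time.sleep(delay)  # pause between attempts, not after the last one
    return False


def select_one_probe(client="./build/programs/clickhouse"):
    """Build a probe that runs `clickhouse client -q "SELECT 1"`."""
    def probe():
        try:
            result = subprocess.run(
                [client, "client", "-q", "SELECT 1"],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                check=False,
            )
        except OSError:
            return False  # binary missing or not executable yet
        return result.returncode == 0
    return probe
```

A caller would run `wait_until_ready(select_one_probe())` after starting the server and proceed to the tests only if it returns `True`.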
11+
12+
### Running Tests
13+
Run tests with the correct port environment variables (default config uses TCP=9000, HTTP=8123):
14+
15+
```bash
16+
CLICKHOUSE_PORT_TCP=9000 CLICKHOUSE_PORT_HTTP=8123 ./tests/clickhouse-test <test_name>
17+
```
18+
19+
### Useful Flags
20+
- `--no-random-settings` - Disable settings randomization (useful for deterministic debugging)
21+
- `--no-random-merge-tree-settings` - Disable MergeTree settings randomization
22+
- `--record` - Automatically update `.reference` files when stdout differs
23+
24+
### Test File Extensions
25+
- `.sql` - SQL test (most common)
26+
- `.sql.j2` - Jinja2-templated SQL test
27+
- `.sh` - Shell script test
28+
- `.py` - Python test
29+
- `.expect` - Expect script test
30+
- `.reference` - Expected output (compared against stdout)
31+
- `.gen.reference` - Generated reference for `.j2` tests
32+
33+
### Database Name Normalization
34+
The test runner creates a temporary database with a random name (e.g., `test_abc123`) for each test.
35+
After test execution, the random database name is replaced with `default` in stdout/stderr files before comparison with `.reference`.
36+
This means `.reference` files should use `default` for database names, NOT `${CLICKHOUSE_DATABASE}` or the actual random name.
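
The normalization described above amounts to a text substitution applied before comparison. The sketch below illustrates the idea only. The function names are hypothetical, and the actual test runner may perform the replacement differently.

```python
def normalize_database_name(output, database):
    """Replace the per-test random database name with `default`."""
    return output.replace(database, "default")


def matches_reference(stdout, reference, database):
    """Compare normalized stdout against the contents of a `.reference` file."""
    return normalize_database_name(stdout, database) == reference
```

For example, if a test running in database `test_abc123` prints `test_abc123.t`, the `.reference` file should contain `default.t`.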
37+
38+
### Test Tags
39+
Tests can have tags in the first line as a comment: `-- Tags: no-fasttest, no-parallel`
40+
Common tags: `disabled`, `no-fasttest`, `no-parallel`, `no-random-settings`, `no-random-merge-tree-settings`, `long`
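
To illustrate the tag line format, here is a minimal parser sketch. It handles only the `-- Tags:` form shown above, and other comment prefixes are not covered. The function names are hypothetical and this is not the parser used by `clickhouse-test`.

```python
def parse_tags(first_line):
    """Return the tags from a `-- Tags: a, b` comment, or an empty list."""
    prefix = "-- Tags:"
    stripped = first_line.strip()
    if not stripped.startswith(prefix):
        return []
    return [tag.strip() for tag in stripped[len(prefix):].split(",") if tag.strip()]


def restrictive_tags(tags):
    """Return the `no-*` tags, which deserve a second look during review."""
    return [tag for tag in tags if tag.startswith("no-")]
```

For the example line above, `parse_tags` yields `no-fasttest` and `no-parallel`, and both are reported by `restrictive_tags`.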
41+
42+
### Random Settings Limits
43+
Tests can specify limits for randomized settings: `-- Random settings limits: max_threads=(1, 4); ...`
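
The limits comment can be read as a list of `name=(low, high)` items separated by `;`. The sketch below parses that shape under the assumption that every item follows the `max_threads=(1, 4)` pattern from the example. The bounds are kept as strings because the source does not say which value types are allowed, and the function name is hypothetical.

```python
import re

# One `name=(low, high)` item; bounds are captured as raw text.
_LIMIT_RE = re.compile(
    r"^\s*(\w+)\s*=\s*\(\s*([^,()]+?)\s*,\s*([^,()]+?)\s*\)\s*$"
)


def parse_random_settings_limits(line):
    """Map setting names to (low, high) string pairs from a limits comment."""
    prefix = "-- Random settings limits:"
    stripped = line.strip()
    if not stripped.startswith(prefix):
        return {}
    limits = {}
    for item in stripped[len(prefix):].split(";"):
        match = _LIMIT_RE.match(item)
        if match:  # items that do not fit the pattern are skipped
            name, low, high = match.groups()
            limits[name] = (low, high)
    return limits
```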
44+
45+
### Stopping the Server
46+
Find and kill the server process:
47+
```bash
48+
pgrep -f "clickhouse server" # Get PIDs
49+
kill <pid1> <pid2> # Stop processes
50+
```

.claude/settings.json

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
1+
{
2+
"permissions": {
3+
"allow": [
4+
"Bash(gh pr view:*)",
5+
"Bash(gh issue view:*)",
6+
"Bash(gh pr list:*)",
7+
"Bash(gh issue list:*)",
8+
"Bash(gh pr checks:*)",
9+
"Bash(gh pr diff:*)",
10+
"Bash(gh search:*)",
11+
"WebFetch(domain:github.com)"
12+
]
13+
}
14+
}
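
The `Bash(<prefix>:*)` entries read as command-prefix rules. The sketch below shows one possible interpretation: a command is allowed if it equals the prefix or continues it after a space. This matching rule is an assumption made for illustration and may not match how the tool that consumes `.claude/settings.json` evaluates permissions. Both function names are hypothetical.

```python
import json


def bash_prefixes(settings_text):
    """Extract command prefixes from `Bash(<prefix>:*)` allow rules."""
    allow = json.loads(settings_text)["permissions"]["allow"]
    prefixes = []
    for rule in allow:
        if rule.startswith("Bash(") and rule.endswith(":*)"):
            prefixes.append(rule[len("Bash("):-len(":*)")])
    return prefixes


def is_allowed(command, prefixes):
    """Assumed semantics: exact match, or the prefix followed by a space."""
    return any(command == p or command.startswith(p + " ") for p in prefixes)
```

Under this reading, `gh pr view 123` would be allowed by the rules above while `gh pr merge 123` would not, since only read-only `gh` subcommands are listed.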
