Remove zero byte by alexey-milovidov · Pull Request #85063 · ClickHouse/ClickHouse

alexey-milovidov · 2025-08-04T18:32:52Z

Changelog category (leave one):

Performance Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Remove zero byte. Closes #85062. A few minor bugs were fixed. The functions structureToProtobufSchema and structureToCapnProtoSchema did not write the terminating zero byte correctly and used a newline in its place. This led to a missing newline in the output and could lead to buffer overflows in other functions that depend on the zero byte (such as logTrace, demangle, extractURLParameter, toStringCutToZero, and encrypt/decrypt). The regexp_tree dictionary layout did not support processing strings with zero bytes. The formatRowNoNewline function, when called with the Values format or with any other format without a newline at the end of rows, erroneously cut the last character of the output. The function stem contained an exception-safety error that could lead to a memory leak in a very rare scenario. The initcap function worked incorrectly for FixedString arguments: it did not recognize the start of a word at the start of the string if the previous string in the block ended with a word character. Fixed a security vulnerability in the Apache ORC format that could lead to the exposure of uninitialized memory. Changed the behavior of the function replaceRegexpAll and its alias REGEXP_REPLACE: it can now produce an empty match at the end of the string even if the previous match consumed the whole string, as in the case of ^a*|a*$ or ^|.* - this corresponds to the semantics of JavaScript, Perl, Python, PHP, and Ruby, but differs from the semantics of PostgreSQL. The implementation of many functions has been simplified and optimized. Documentation for several functions was wrong and has now been fixed. Keep in mind that the output of byteSize for String columns, and for complex types that consist of String columns, has changed (from 9 bytes per empty string to 8 bytes per empty string); this is expected.
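
The empty-match rule for replaceRegexpAll can be illustrated with Python, one of the languages the changelog entry names as the reference semantics. This is a minimal sketch of Python's own `re.sub` behavior (Python 3.7 or later), not ClickHouse code; the helper name `replace_all` is hypothetical and exists only for this illustration.

```python
import re


def replace_all(pattern: str, haystack: str, replacement: str) -> str:
    """Replace every match of `pattern` in `haystack` (hypothetical helper)."""
    return re.sub(pattern, replacement, haystack)


# The first alternative, ^a*, consumes the whole string "aaa".
# The second alternative, a*$, then matches the empty string at the end,
# so two replacements are made in total.
result = replace_all(r"^a*|a*$", "aaa", "X")
print(result)  # XX
```

The second `X` comes from the empty match at the end of the string, which is the case the changelog entry describes.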
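
The byteSize change can be checked with simple arithmetic. The sketch below assumes that the 8 remaining bytes per empty string are a fixed per-row overhead (for example, a 64-bit offset) and that the ninth byte was the trailing zero byte; the changelog entry states only the 9 and 8 byte totals, so the breakdown and the names `PER_ROW_OVERHEAD` and `string_byte_size` are assumptions made for illustration.

```python
# Assumed fixed per-row overhead in bytes (not stated in the changelog entry).
PER_ROW_OVERHEAD = 8


def string_byte_size(value: bytes, with_zero_byte: bool) -> int:
    """Estimate the in-memory size of one string row under the stated assumption."""
    terminator = 1 if with_zero_byte else 0
    return PER_ROW_OVERHEAD + len(value) + terminator


before = string_byte_size(b"", with_zero_byte=True)
after = string_byte_size(b"", with_zero_byte=False)
print(before, after)  # 9 8
```

Under this assumption, every string row shrinks by exactly one byte regardless of its length.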

nikitamikhaylov · 2025-08-04T19:51:20Z

FYI @al13n321, it can remove additional memcpy while reading Parquet files.

clickhouse-gh · 2025-08-04T22:28:50Z

Workflow [PR], commit [4fa0665]

Summary: ❌

| job_name | test_name | status |
|---|---|---|
| Stateless tests (amd_binary, old analyzer, s3 storage, DatabaseReplicated, parallel) | | failure |
| | 02428_decimal_in_floating_point_literal | FAIL |
| | 03315_join_temporary_table_names | FAIL |
| | 02504_regexp_dictionary_table_source | FAIL |
| | 00392_enum_nested_alter | FAIL |
| | 00411_long_accurate_number_comparison_int2 | FAIL |
| | Exception in test runner | FAIL |
| | Killed by signal (in clickhouse-server.log or clickhouse-server.err.log) | FAIL |
| | Fatal messages (in clickhouse-server.log or clickhouse-server.err.log) | FAIL |
| Stateless tests (amd_debug, distributed plan, s3 storage, parallel) | | failure |
| | 01086_odbc_roundtrip | FAIL |
| Performance Comparison (arm_release, master_head, 3/3) | | failure |
| | Check Results | failure |

amosbird · 2025-08-05T02:26:03Z

It would be perfect to land this before #82850, so that the new serialization layout can be better aligned and optimized.

alexey-milovidov · 2025-08-16T05:08:28Z

I've sped up hashed_dictionary by adding alignment.
Upd: but it's controversial - reverted.

This reverts commit 856ed0f.

Avogar

LGTM

Avogar · 2025-08-21T11:35:36Z

Stateless tests (amd_debug, distributed plan, s3 storage, parallel) - #85973
Stateless tests (amd_binary, old analyzer, s3 storage, DatabaseReplicated, parallel) - #85861

clickhouse-gh bot added the pr-backward-incompatible (Pull request with backwards incompatible changes) label Aug 4, 2025

alexey-milovidov added 5 commits August 5, 2025 02:21

Remove zero byte

0bd3c24

arrayStringConcat

99b2f1d

Various

adb6335

Various

956ea6a

Various

5c6cbc5

alexey-milovidov force-pushed the remove-zero-byte branch from 8fd1cff to 5c6cbc5 August 5, 2025 00:21

alexey-milovidov added 3 commits August 5, 2025 02:32

Various

ee351dc

Various

277c04a

Various

c937ac4

alexey-milovidov added 17 commits August 5, 2025 16:50

Merge branch 'master' of github.com:ClickHouse/ClickHouse into remove-zero-byte

6d94ee4

Various

f28bfb7

Various

3fbe9f7

Various

08e1964

Various

1b61965

Maybe fix something and improve the code

0852f73

Various

46a6df9

Various

38cbf33

Various

31828bc

Fix bad code

5969775

Fix bad code

f7b1fbd

Various

3d7eedb

Various

803f79d

Various

05ec86b

Various

3c8eff1

Various

ce6c2b3

Various

eb1b585

Fix error in extractURLParameters

2da2e66

alexey-milovidov added 7 commits August 17, 2025 01:32

Update test reference

58bc6c9

Faster IPv4 parsing

4997108

Faster IPv4 parsing

b7d7be2

Faster IPv4 parsing

313cd84

Merge branch 'master' of github.com:ClickHouse/ClickHouse into remove-zero-byte

2058460

Move code to .cpp

4a55287

Revert "Attempt to speedup dictionaries with memory alignment"

8b4ffb3

This reverts commit 856ed0f.

Avogar approved these changes Aug 18, 2025

Add stateful flag to the new test

4fa0665

Avogar added this pull request to the merge queue Aug 21, 2025

Merged via the queue into master with commit d25f875 Aug 21, 2025
119 of 122 checks passed

Avogar deleted the remove-zero-byte branch August 21, 2025 12:03

robot-ch-test-poll3 added the pr-synced-to-cloud (The PR is synced to the cloud repo) label Aug 21, 2025

Algunenano mentioned this pull request Aug 25, 2025

MemorySanitizer: use-of-uninitialized-value in DB::getURLHost #86133

Closed

Avogar mentioned this pull request Aug 25, 2025

MemorySanitizer: use-of-uninitialized-value in DB::ColumnString::sizeAt(long) #86134

Closed

Avogar added a commit to Avogar/ClickHouse that referenced this pull request Aug 26, 2025

Fix use-of-unitialized-value and crash introduced in ClickHouse#85063

3db2390

github-merge-queue bot pushed a commit that referenced this pull request Aug 26, 2025

Merge pull request #86171 from Avogar/fix-unitialized-memory-string

9d39fcf

Fix use-of-unitialized-value and crash introduced in #85063

robot-ch-test-poll1 added a commit that referenced this pull request Aug 26, 2025

Merge pull request #86234 from ClickHouse/cherrypick/25.8/86171

e00587b

Cherry pick #86171 to 25.8: Fix use-of-unitialized-value and crash introduced in #85063

robot-clickhouse added a commit that referenced this pull request Aug 26, 2025

Backport #86171 to 25.8: Fix use-of-unitialized-value and crash introduced in #85063

8570e42

clickhouse-gh bot added a commit that referenced this pull request Aug 26, 2025

Merge pull request #86235 from ClickHouse/backport/25.8/86171

1df1b7f

Backport #86171 to 25.8: Fix use-of-unitialized-value and crash introduced in #85063

This was referenced Aug 27, 2025

Crash in CRC32Hash #86261

Closed

Fix crash with replaceRegex, a FixedString haystack and an empty needle #86270

Merged

PedroTadim mentioned this pull request Aug 27, 2025

ASAN heap buffer overflow with empty JSON object #86279

Closed

UnamedRus mentioned this pull request Sep 9, 2025

approx_top_sum state format changed between 25.7 and 25.8 - null bytes appear in finalized strings #86915

Closed

antonio2368 mentioned this pull request Nov 24, 2025

25.7 querying 25.8 does not finalize aggregation by String columns in some cases #90631

Closed

vitlibar mentioned this pull request Mar 16, 2026

Add a natural sort key function #90322

Merged

Avogar merged 173 commits into master from remove-zero-byte

7 participants
