feat(sql): add length_bytes() function for varchars#6685
feat(sql): add length_bytes() function for varchars#6685bluestreak01 merged 3 commits intomasterfrom
Conversation
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughThis PR introduces a new Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
core/src/test/java/io/questdb/test/std/str/Utf8sTest.java (1)
303-315: Fix codepoint iteration in the fuzz comparator.
The comparator advances by UTF‑16 code units, so surrogate pairs can be misordered relative toUtf8s.compare. Iterate by code point and compare codepoint counts for the length tie‑break.🐛 Proposed fix
- Arrays.sort(strings, (l, r) -> { - int len = Math.min(l.length(), r.length()); - for (int i = 0; i < len; i++) { - int lCodepoint = l.codePointAt(i); - int rCodepoint = r.codePointAt(i); - int diff = lCodepoint - rCodepoint; - if (diff != 0) { - return diff; - } - } - return l.length() - r.length(); - }); + Arrays.sort(strings, (l, r) -> { + int i = 0; + int j = 0; + while (i < l.length() && j < r.length()) { + int lCodepoint = l.codePointAt(i); + int rCodepoint = r.codePointAt(j); + int diff = lCodepoint - rCodepoint; + if (diff != 0) { + return diff; + } + i += Character.charCount(lCodepoint); + j += Character.charCount(rCodepoint); + } + return l.codePointCount(0, l.length()) - r.codePointCount(0, r.length()); + });
🤖 Fix all issues with AI agents
In `@core/src/main/java/io/questdb/griffin/engine/groupby/GroupByUtf8Sink.java`:
- Around line 155-159: The method zeroPaddedSixPrefix() can call Unsafe.getLong
when ptr == 0 causing invalid memory reads; update zeroPaddedSixPrefix() to
first check if ptr == 0 and return 0L (or an appropriate neutral masked value)
before calling Unsafe.getUnsafe().getLong(ptr + HEADER_SIZE), then apply the
existing VARCHAR_INLINED_PREFIX_MASK as before so the unsafe read is only
executed when ptr is non-zero.
core/src/main/java/io/questdb/griffin/engine/groupby/GroupByUtf8Sink.java
Show resolved
Hide resolved
[PR Coverage check]😍 pass : 28 / 28 (100.00%) file detail
|
Includes the following changes:
length_bytes()SQL function. It returns the number of bytes present in the varchar argumentmin(varchar)andmax(varchar)group-by functions by avoiding data vector access when prefixes differ:GroupByUtf8Sink: AddedzeroPaddedSixPrefix(),longAt(),intAt()methods and ensured proper zero-padding for short strings.Benchmarks
Queries run on clickbench's
hitstable, on Ryzen 7900x box: