
feat(sql): Add array_build() function for dynamic array construction#6794

Merged
bluestreak01 merged 5 commits into master from jv/build_array
Feb 21, 2026
Conversation


@javier javier commented Feb 20, 2026

Introduces array_build(nDims, size, filler1, ...) that creates DOUBLE[] or DOUBLE[][] arrays with controlled shape and fill values. The size parameter accepts a scalar integer or a DOUBLE[] (using its cardinality). Each filler can be a scalar (repeated for all elements) or a DOUBLE[] (copied element-by-element, NaN-padded if shorter, truncated if longer).

Docs at https://preview-374--questdb-documentation.netlify.app/docs/query/functions/array/#array_build

Includes 34 unit tests covering all code paths (scalar/array fillers, 1D/2D, NaN padding, truncation, null handling, error conditions, large arrays, multi-dim rejection) and 4 fuzz tests (1D/2D scalar fill, 1D/2D array copy identity) with 100 iterations each.

nDims is validated as a compile-time constant restricted to 1 or 2.
NULL scalar size returns a null array. Multi-dimensional array arguments
(DOUBLE[][] etc.) are rejected. Bulk copy via appendDataToMem is used
when source and target sizes match.
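As a self-contained sketch (assuming nothing about QuestDB's DirectArray or memory API — the class and method names below are illustrative, not QuestDB's), the per-row fill semantics described above behave like this:

```java
import java.util.Arrays;

// Hypothetical standalone model of the fill semantics: a scalar filler is
// repeated for every element; an array filler is copied element-by-element,
// NaN-padded if shorter than the target size and truncated if longer.
public class FillSemanticsSketch {
    static double[] fillScalar(double value, int size) {
        double[] row = new double[size];
        Arrays.fill(row, value); // scalar repeated for all elements
        return row;
    }

    static double[] fillFromArray(double[] src, int size) {
        double[] row = new double[size];
        int copyLen = Math.min(src.length, size); // truncate if src is longer
        System.arraycopy(src, 0, row, 0, copyLen);
        for (int j = copyLen; j < size; j++) {
            row[j] = Double.NaN; // NaN-pad if src is shorter
        }
        return row;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(fillScalar(1.5, 3)));                     // [1.5, 1.5, 1.5]
        System.out.println(Arrays.toString(fillFromArray(new double[]{1, 2}, 4)));   // [1.0, 2.0, NaN, NaN]
        System.out.println(Arrays.toString(fillFromArray(new double[]{1, 2, 3}, 2))); // [1.0, 2.0]
    }
}
```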


coderabbitai bot commented Feb 20, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.



javier commented Feb 20, 2026

@coderabbitai, please review critically. Thanks

javier changed the title from "Add array_build() function for dynamic array construction" to "feat(sql): Add array_build() function for dynamic array construction" on Feb 20, 2026

coderabbitai bot commented Feb 20, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Fetch the PR details and file list
gh pr view 6794 --json files,additions,deletions --jq '.files[] | "\(.path) +\(.additions) -\(.deletions)"'

Length of output: 306


🏁 Script executed:

#!/bin/bash
# Get the diff for the main implementation file
gh pr diff 6794 -- core/src/main/java/io/questdb/griffin/engine/functions/array/BuildArrayFunctionFactory.java

Length of output: 36


🏁 Script executed:

#!/bin/bash
# Get the diff for test files
gh pr diff 6794 -- core/src/test/java/io/questdb/test/griffin/engine/functions/

Length of output: 36


🏁 Script executed:

#!/bin/bash
# Check if there are any related tests
fd -e java -x grep -l "array_build\|BuildArray" {} \; | head -20

Length of output: 288


🏁 Script executed:

#!/bin/bash
# Look at the full implementation file
gh pr diff 6794 --patch | head -1000

Length of output: 36526


🏁 Script executed:

#!/bin/bash
# Read the full implementation file
cat core/src/main/java/io/questdb/griffin/engine/functions/array/BuildArrayFunctionFactory.java

Length of output: 10812


🏁 Script executed:

#!/bin/bash
# Check the test file to understand what's tested
cat core/src/test/java/io/questdb/test/griffin/engine/functions/array/BuildArrayFunctionFactoryTest.java | head -250

Length of output: 8711


🏁 Script executed:

#!/bin/bash
# Look for similar array function implementations for comparison
fd -e java ArrayFunction | head -10

Length of output: 900


🏁 Script executed:

#!/bin/bash
# Check if there's any documentation about ArrayView.DIM_MAX_LEN
rg -n "DIM_MAX_LEN" --type=java -C2

Length of output: 5139


🏁 Script executed:

#!/bin/bash
# Check DirectArray implementation for understanding memory management
rg -n "class DirectArray" --type=java -A30

Length of output: 3103


🏁 Script executed:

#!/bin/bash
# Check how other array functions handle thread safety and mutable state
rg -n "extends ArrayFunction" --type=java -A15 | grep -E "(class|DirectArray|isThreadSafe|private.*array)" | head -40

Length of output: 6677


🏁 Script executed:

#!/bin/bash
# Check if there are bulk memory fill utilities in the codebase
rg -n "setMemory|fillMemory|memset" --type=java | head -20

Length of output: 2314


🏁 Script executed:

#!/bin/bash
# Check for integer overflow protection patterns in array code
rg -n "Math\.multiplyExact|Math\.addExact|overflow" --type=java -C2 | head -40

Length of output: 4011


🏁 Script executed:

#!/bin/bash
# Check what other numeric types are supported in array operations
rg -n "INT\[\]|LONG\[\]|FLOAT\[\]" --type=java | grep -i "array" | head -20

Length of output: 1812


🏁 Script executed:

#!/bin/bash
# Check if there are 3D array examples in the codebase
rg -n "DOUBLE\[\]\[\]\[\]|nDims.*3|dimension.*3" --type=java -C2 | head -30

Length of output: 3549


🏁 Script executed:

#!/bin/bash
# Check if Vect.setMemoryDouble is available and how it's used
rg -n "Vect\.setMemoryDouble" --type=java -B2 -A2

Length of output: 2130


🏁 Script executed:

#!/bin/bash
# Look for examples of 3D and higher dimensional arrays in the system
rg -n "encodeArrayType.*3|DOUBLE\[\]\[\]\[\]" --type=java -C3 | head -50

Length of output: 4620


🏁 Script executed:

#!/bin/bash
# Check if there are multiplication overflow checks in array dimension calculations
rg -n "nDims.*size|Math\.multiplyExact.*size|overflow.*dimension" --type=java -C2

Length of output: 4469


🏁 Script executed:

#!/bin/bash
# See if other functions register themselves - check function registration
rg -n "array_build" --type=java

Length of output: 11411


🏁 Script executed:

#!/bin/bash
# Check how other similar builder functions work in the codebase
rg -n "class.*Build.*Function|array_create|array_make" --type=java -C5 | head -80

Length of output: 5705


@javier, I'll provide a thorough critical review. Let me examine the implementation for potential issues.

Critical Review of array_build() Implementation

🔴 CRITICAL: Integer Overflow Vulnerability (Lines 215-224)

Issue: For 2D arrays, you calculate total elements as nDims * size without overflow protection. With nDims=2 and size approaching DIM_MAX_LEN (268,435,455), you get 2 × 268M = 536M elements × 8 bytes = 4.3GB allocation with no overflow check.

// Lines 215-224: No overflow protection
if (nDims == 1) {
    array.setDimLen(0, size);
} else {
    array.setDimLen(0, nDims);  // Always 2
    array.setDimLen(1, size);   // Could be huge
}
array.applyShape(position);  // Total = 2 * size, unchecked

Attack vector:

SELECT array_build(2, 200_000_000, 1.0, 2.0)  -- 400M elements = 3.2GB, no validation

Required fixes:

  1. Add explicit overflow check before applyShape():
if (nDims == 2 && size > ArrayView.DIM_MAX_LEN / nDims) {
    throw CairoException.nonCritical().position(sizePos)
        .put("2D array size would exceed maximum element count");
}
  2. Add test case for this boundary condition
  3. Consider using Math.multiplyExact() for safety
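A minimal standalone sketch of the suggested guard (using a stand-in constant — not QuestDB's actual `ArrayView.DIM_MAX_LEN` field or exception types) might look like:

```java
// Hypothetical overflow guard: Math.multiplyExact ensures nDims * size can
// never wrap silently, and the product is then checked against the element
// count cap before any shape is applied.
public class OverflowGuardSketch {
    static final int DIM_MAX_LEN = 268_435_455; // stand-in for the real limit

    static int checkedElementCount(int nDims, int size) {
        try {
            int total = Math.multiplyExact(nDims, size); // throws on int overflow
            if (total > DIM_MAX_LEN) {
                throw new IllegalArgumentException("array size would exceed maximum element count");
            }
            return total;
        } catch (ArithmeticException e) {
            throw new IllegalArgumentException("array size overflows int", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(checkedElementCount(2, 1_000)); // 2000
        try {
            checkedElementCount(2, 200_000_000);           // 400M elements
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage()); // rejected: array size would exceed maximum element count
        }
    }
}
```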

🟡 PERFORMANCE: Unoptimized Scalar Fill (Lines 249-252)

Issue: Scalar fill uses naive loop instead of vectorized bulk operation. I found Vect.setMemoryDouble() used in TableUtils.java:1768 for exactly this purpose.

// Current implementation - line 249-252
double val = fillerFunc.getDouble(rec);
for (int j = 0; j < size; j++) {
    mem.putDouble(val);  // Element-by-element
}

Performance impact: For array_build(1, 1_000_000, 42.0), this does 1M individual writes vs. one vectorized memset.

Required fix:

double val = fillerFunc.getDouble(rec);
if (size > 0) {
    long addr = mem.getAddress() + mem.getAppendOffset();
    Vect.setMemoryDouble(addr, val, size);
    mem.skip(size * Double.BYTES);
}

Benchmark requirement: Measure performance delta for arrays with 10K, 100K, 1M elements.


🟡 DESIGN FLAW: Unnecessary setType() Overhead (Line 218)

Issue: You call array.setType(type) on every getArray() invocation, even though type is immutable (set in constructor at line 162).

// Line 218: Called per-row
array.setType(type);  // type never changes!

The comment says "setType reinitializes the shape list after a prior ofNull" but this is inefficient. ofNull() should preserve type.

Performance impact: For queries processing millions of rows, this adds unnecessary overhead.

Required investigation:

  1. Why does ofNull() require shape reinitialization?
  2. Can DirectArray be modified to preserve type across null transitions?
  3. If not, document this trade-off explicitly

🟠 DESIGN LIMITATION: Arbitrary Restrictions Not Justified

Issues:

  1. nDims restricted to 1-2 (Line 77): System supports 3D+ arrays (DOUBLE[][][] exists throughout tests), yet you artificially limit this. Why?

  2. DOUBLE only: Infrastructure supports other numeric types (VARCHAR[], etc.), but function signature (lV) forces DOUBLE. Why not array_build_int(), array_build_long()?

Questions requiring answers:

  • Is the 1-2D limit a temporary MVP scope or permanent design decision?
  • What's the technical blocker for 3D+ support?
  • What's the plan for other numeric types (INT[], LONG[], FLOAT[])?
  • Should this be array_build_double() with sibling functions planned?

Recommendation: Either:

  1. Document these as deliberate MVP limitations with follow-up issues, OR
  2. Remove arbitrary limits and support full dimensionality

🟠 MISSED OPTIMIZATION: Bulk Copy for Truncation (Lines 237-247)

Issue: Bulk copy only when srcLen == size (line 237), but you could also use it when truncating:

// Current - lines 237-247
if (srcLen == size) {
    src.appendDataToMem(mem);  // Fast path
} else {
    int copyLen = Math.min(srcLen, size);
    for (int j = 0; j < copyLen; j++) {  // Slow path even for truncation
        mem.putDouble(src.getDouble(j));
    }
    // ...
}

Fix:

int copyLen = Math.min(srcLen, size);
if (copyLen > 0) {
    // Bulk copy the available portion
    long srcAddr = /* get src address */;
    long dstAddr = mem.getAddress() + mem.getAppendOffset();
    Unsafe.getUnsafe().copyMemory(srcAddr, dstAddr, copyLen * Double.BYTES);
    mem.skip(copyLen * Double.BYTES);
}
// NaN-pad remainder
for (int j = copyLen; j < size; j++) {
    mem.putDouble(Double.NaN);
}

🟢 TEST COVERAGE: Gaps in Edge Cases

87.7% coverage (107/122 lines) = 15 uncovered lines. Based on analysis:

Missing test cases:

  1. ✗ Integer overflow in 2D: array_build(2, 268_435_455, 1.0, 2.0)
  2. ✗ Large 1D near limit: array_build(1, 268_435_455, 1.0)
  3. ✗ Bulk copy path verification (line 237) - fuzz test may cover this, but explicit test needed
  4. ✗ NaN handling in array filler vs scalar filler
  5. ✗ Performance regression test for scalar fill optimization

Required additions:

@Test
public void testErrorOverflowIn2DArray() throws Exception {
    assertException(
        "SELECT array_build(2, 134_217_728, 1.0, 2.0)", // 2 * 134M > 268M
        7,
        "would exceed maximum element count"
    );
}

@Test
public void testMaxSizeArray1D() throws Exception {
    // Should succeed at exact limit
    assertSql("cnt\n268435455\n", 
        "SELECT array_count(array_build(1, 268_435_455, 0.0)) AS cnt");
}

🟢 MINOR: Thread Safety Documentation

Line 262 returns false for isThreadSafe(), consistent with other array functions using DirectArray. However:

Question: Does the query engine guarantee function instance isolation per thread, or can this cause issues in parallel scans?

Required: Document the threading model assumption in class javadoc.


Summary of Required Actions

Before merge:

  1. ✅ Fix integer overflow vulnerability (CRITICAL)
  2. ✅ Add test for overflow edge case
  3. ⚠️ Optimize scalar fill with Vect.setMemoryDouble() OR provide benchmark justification for current approach
  4. ⚠️ Investigate/fix setType() overhead OR document why necessary

Follow-up issues:

  1. Support for 3D+ arrays (or document why limited)
  2. Support for INT[], LONG[], FLOAT[] types
  3. Bulk copy optimization for truncation scenario
  4. Performance regression suite for array operations

What's your justification for the 1-2D limitation and DOUBLE-only restriction? If these are MVP scope decisions, they should be documented in code comments and tracked as follow-up work.


🧠 Learnings used
Learnt from: javier
Repo: questdb/questdb PR: 0
File: :0-0
Timestamp: 2026-02-06T18:22:28.130Z
Learning: In questdb/questdb, for PRs requested by javier, provide line-cited, push-back style reviews with concrete follow-ups (tests, diffs, and performance considerations).

Learnt from: javier
Repo: questdb/questdb PR: 0
File: :0-0
Timestamp: 2026-02-06T18:22:28.130Z
Learning: For questdb/questdb, javier expects a rigorous, push-back style review; always include concrete line-cited findings and propose targeted tests/fixes when reviewing javier’s PRs.

Learnt from: javier
Repo: questdb/questdb PR: 0
File: :0-0
Timestamp: 2026-02-06T18:22:28.130Z
Learning: In questdb/questdb, javier prefers hard‑core, critical reviews that challenge design choices and edge cases; respond with deep, push-back style analyses for javier’s requests.

Learnt from: jerrinot
Repo: questdb/questdb PR: 0
File: :0-0
Timestamp: 2026-02-04T11:18:36.492Z
Learning: For PRs by jerrinot, always use a thorough and critical review mode: question design decisions, challenge assumptions, scrutinize implementations for edge cases and potential issues, and avoid superficial validation. They prefer hard-core reviews that push back on the code.

Learnt from: CR
Repo: questdb/questdb PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-13T18:25:55.544Z
Learning: QuestDB supports multidimensional arrays (e.g., DOUBLE[], DOUBLE[][]). Dimensionality is encoded in the column type itself, so DOUBLE[] and DOUBLE[][] are distinct column types.


coderabbitai bot commented Feb 20, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Fetch the main implementation file
fd -t f "BuildArrayFunctionFactory.java" --exec cat {}

Length of output: 10812


🏁 Script executed:

#!/bin/bash
# Find and examine the test files for array_build
fd -t f -e java . --exec grep -l "array_build\|BuildArray" {} \; | head -20

Length of output: 288


🏁 Script executed:

#!/bin/bash
# Get the PR diff to see what was changed
gh pr diff 6794 --repo questdb/questdb

Length of output: 34851


🏁 Script executed:

#!/bin/bash
# Look for any related array function implementations for comparison
fd -t f "ArrayFunction" --type f -e java | head -10

Length of output: 900


🏁 Script executed:

#!/bin/bash
# Check for performance-related patterns in the implementation
rg -n "appendDataToMem|copyFrom|bulk" --type java -g "*BuildArray*"

Length of output: 303


@javier, I'll provide a critical review as requested. Let me examine the coverage gaps and implementation details more closely.


🏁 Script executed:

#!/bin/bash
# Check which lines are uncovered by examining the test output more carefully
cat core/src/main/java/io/questdb/griffin/engine/functions/array/BuildArrayFunctionFactory.java | wc -l

Length of output: 3


🏁 Script executed:

#!/bin/bash
# Look at similar array function implementations for comparison patterns
cat core/src/main/java/io/questdb/griffin/engine/functions/array/DoubleNegArrayFunctionFactory.java

Length of output: 4966


🏁 Script executed:

#!/bin/bash
# Check if there are any TODOs or FIXMEs in the implementation
rg -n "TODO|FIXME|XXX|HACK" core/src/main/java/io/questdb/griffin/engine/functions/array/BuildArrayFunctionFactory.java

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Check DirectArray implementation to understand reuse patterns
rg -n "class DirectArray" --type java -A 20 | head -40

Length of output: 2006


🏁 Script executed:

#!/bin/bash
# Look for other array construction functions to compare design patterns
fd -t f -e java . --exec grep -l "ArrayFunction.*extends.*implements" {} \; | head -5 | xargs -I {} sh -c 'echo "=== {} ===" && head -100 {}'

Length of output: 19731


@javier, here's my critical review with line-cited concerns and push-back on design decisions:

Critical Design Issues

1. Arbitrary Type Restriction (entire implementation)
Why is this limited to DOUBLE only? QuestDB's array infrastructure supports LONG, INT, BOOLEAN, etc. The PR objectives don't justify this restriction. This forces users to cast all numeric data to DOUBLE, losing precision for integer types and wasting memory.

  • Challenge: What's the rationale for not supporting other numeric types from day one?
  • Impact: Limited utility, forces unnecessary type conversions

2. Artificial Dimensionality Limit (Line 77)

if (nDimsLong < 1 || nDimsLong > 2) {

QuestDB supports higher-dimensional arrays per the learnings. Why arbitrarily cap at 2D? This seems like an artificial limitation without technical justification.

  • Challenge: Why not support 3D+ arrays if the infrastructure exists?
  • Request: Either justify this limitation or remove it

3. Redundant nDims Parameter (Lines 67-79)
The nDims parameter is entirely redundant - you could infer dimensionality from argCount - 2 (total args minus size and one filler per dimension). Forcing users to specify it introduces error potential:

-- User error: says 2D but only provides 1 filler
array_build(2, 10, 1.0)  -- Should fail but caught late
  • Challenge: Why require explicit nDims when it's computable?
  • Design flaw: Violates DRY principle and creates error surface

Performance & Memory Issues

4. Unnecessary Argument Copying (Lines 105-106)

ObjList<Function> allArgs = new ObjList<>(args);
IntList allArgPositions = new IntList(argPositions);

The parameters are already marked @Transient, indicating they're temporary. Why copy them into new collections? This allocates unnecessarily on every function instantiation.

  • Request: Justify this or use the original references directly

5. Inefficient Type Reset (Line 217)

array.setType(type);

You call setType on every getArray() invocation. Looking at the comment, it "reinitializes the shape list after a prior ofNull". But:

  • Is this always necessary? Only after ofNull() returns?
  • Could you track state and only call when needed?
  • Performance: This could add overhead in hot loops

6. Unoptimized Scalar Fill (Lines 249-252)

double val = fillerFunc.getDouble(rec);
for (int j = 0; j < size; j++) {
    mem.putDouble(val);
}

For large arrays, element-by-element putDouble is inefficient. Consider:

  • Bulk fillMemory-style operations if available
  • SIMD-friendly patterns
  • Request: Benchmark this with 10K+ element arrays

7. Missing Integer Overflow Protection
When calculating memory needed: size * nDims * 8 bytes. For nDims=2 and size=134,217,728, you're at exactly the limit. But there's no check for overflow before multiplication.

  • Bug risk: Large sizes could silently overflow and allocate wrong memory
  • Request: Add overflow checks before memory allocation

8. No OOM Handling
Lines 226-254 write to memory without checking allocation success. If startMemoryA() fails or returns insufficient space, undefined behavior.

Test Coverage Gaps

9. 15 Uncovered Lines (87.70% coverage)

  • Request: Which 15 lines are uncovered?
  • Are these error paths that tests don't exercise?
  • Could these be dead code branches?

API & Thread Safety Concerns

10. Thread Safety: False (Line 263)

public boolean isThreadSafe() {
    return false;
}

This limits parallelization in multi-core query execution. The DirectArray instance is reused but not thread-safe.

  • Challenge: Could you use thread-local DirectArray or make instances thread-safe?
  • Impact: Forces single-threaded execution for queries using this function

11. Hardcoded 2D Shape (Lines 219-222)

if (nDims == 1) {
    array.setDimLen(0, size);
} else {
    array.setDimLen(0, nDims);  // Always 2 for 2D
    array.setDimLen(1, size);
}

For 2D arrays, you hardcode the first dimension to nDims (always 2). This creates a rigid [2][size] shape. What if users want [size][2] or [3][100]?

  • Limitation: Inflexible shape construction
  • Request: Justify this API design choice

Comparison with Existing Code

Looking at DoubleNegArrayFunctionFactory (similar array function), it uses:

final var memory = array.copyShapeAndStartMemoryA(arr);

This is cleaner than your manual shape setup. Why not follow established patterns?

Specific Requests

  1. Performance benchmarks: Where are they? Show me numbers for 1K, 10K, 100K element arrays
  2. Justify DOUBLE-only: What's the plan for other types?
  3. Justify 2D limit: Technical reason or TODO?
  4. Coverage report: Show me the 15 uncovered lines
  5. Memory efficiency: Profile memory allocations in tight loops
  6. Thread-local strategy: Have you considered making this thread-safe?

The implementation is solid for its scope, but the design decisions seem overly conservative and the performance characteristics are unverified. I'd like to see justification for the limitations or a roadmap to lift them.


🏁 Script executed:

#!/bin/bash
# Try to see if there's a coverage report with line details
fd -t f "BuildArrayFunctionFactory" -e xml -e html 2>/dev/null | head -5

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Check if there are performance benchmarks for array functions
fd -t f -e java . --exec grep -l "Benchmark.*[Aa]rray.*[Bb]uild\|JMH.*array_build" {} \;

Length of output: 0


12. No Performance Benchmarks
The scripts found zero JMH benchmarks for array_build. For a function handling potentially large memory allocations and bulk operations, this is concerning.

  • Request: Add JMH benchmarks comparing scalar fill vs array copy, 1D vs 2D, various sizes (10, 100, 1K, 10K elements)

13. Coverage Details Unavailable
No detailed coverage reports found to identify which 15 lines are uncovered. This makes it impossible to verify if critical error paths are tested.

  • Request: Provide line-level coverage report or manually identify untested branches

Edge Cases & Error Handling

14. Filler Evaluation Order
Lines 228-254 evaluate fillers in a loop. If fillerFunc.getDouble(rec) or fillerFunc.getArray(rec) has side effects, the order matters but isn't documented.

15. Size Validation Asymmetry (Lines 197-211)
You validate scalar size for negative and max bounds at runtime, but array-derived size gets no validation:

size = sizeArr.getCardinality();  // No bounds check!

What if a corrupted array returns Integer.MAX_VALUE? The subsequent memory allocation would fail catastrophically.

  • Bug: Missing validation for array-derived sizes

16. NaN Semantics (Lines 232-234, 244-246)

for (int j = 0; j < size; j++) {
    mem.putDouble(Double.NaN);
}

You use NaN for padding shorter arrays and null array fillers. This conflates "missing value" with "padding". What if the source array legitimately contains NaN?

  • Semantic issue: Can't distinguish intentional NaN from padding NaN
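The conflation is inherent to IEEE 754 doubles: once written, a padding NaN carries no provenance, so downstream consumers cannot separate it from a NaN that was already in the source data. A tiny sketch:

```java
// Demonstrates that a NaN written as padding is indistinguishable from a NaN
// that originated in the source data: both satisfy isNaN, and NaN never
// compares equal to anything, including itself.
public class NanConflationSketch {
    public static void main(String[] args) {
        double paddingNaN = Double.NaN; // what the padding loop writes
        double dataNaN = 0.0 / 0.0;     // a NaN already present in source data
        System.out.println(Double.isNaN(paddingNaN)); // true
        System.out.println(Double.isNaN(dataNaN));    // true
        System.out.println(paddingNaN == dataNaN);    // false: NaN != NaN
    }
}
```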

Code Quality Issues

17. Inconsistent Naming (Lines 95, 98)

int sizeType = sizeFunc.getType();
boolean sizeIsArray = ColumnType.isArray(sizeType);

vs.

int fillerType = fillerFunc.getType();

Use consistent naming: fillerType not just type.

18. Magic Number 2 (Lines 84, 111)
The number 2 appears in multiple contexts (2D limit, argument offset). Define constants:

private static final int MAX_DIMENSIONS = 2;
private static final int FIRST_FILLER_ARG_INDEX = 2;

Summary: Show Me The Evidence

You've built a functional implementation with good test coverage for happy paths. But:

  1. No benchmarks - How do I know this doesn't regress performance?
  2. Arbitrary limits - DOUBLE-only, 2D max - why?
  3. Unoptimized paths - Scalar fill could be faster
  4. Design redundancy - nDims parameter shouldn't exist
  5. Missing validations - Array-size bounds, overflow checks
  6. Thread-hostile - Can't parallelize
  7. 15 mystery lines - What's not covered?

Concrete requests before merge:

  • Add JMH benchmarks for representative workloads
  • Justify or remove 2D limitation
  • Justify or remove DOUBLE-only restriction
  • Add overflow protection for size calculations
  • Add bounds validation for array-derived sizes
  • Consider removing redundant nDims parameter
  • Show line-level coverage for the 87.70%

This is a good foundation but feels rushed. The design constraints need justification, the performance characteristics need measurement, and the edge cases need hardening.



- Remove the nArrays <= 2 restriction so array_build() can combine
  N arrays into a single DOUBLE[][] (e.g., 4 order book columns
  into one 2D array with 4 rows). nArrays >= 2 always returns
  DOUBLE[][] regardless of how many rows.
- Rename nDims -> nArrays throughout for clarity: the parameter
  controls how many arrays are combined, not dimensionality.
- Add upper-bound validation on nArrays before int cast to prevent
  overflow/truncation with huge literals.
- Add DIM_MAX_LEN bounds check on array-derived size (when size
  comes from an array filler's cardinality).
- Add tests for NaN-padding, truncation, element count overflow,
  null array fillers from columns, nArrays=4 order book use case,
  nArrays overflow, and N-dimensional fuzz testing.
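Abstracting away QuestDB internals, the nArrays >= 2 behaviour described in this commit can be sketched as combining N source arrays into one [N][size] matrix, with each source NaN-padded or truncated to the target row length (the class and method names here are illustrative, not QuestDB's):

```java
import java.util.Arrays;

// Hypothetical model of array_build with nArrays >= 2: N sources become the
// N rows of a 2D array; each row is NaN-padded or truncated to `size`.
public class CombineArraysSketch {
    static double[][] combine(int size, double[]... sources) {
        double[][] out = new double[sources.length][size];
        for (int i = 0; i < sources.length; i++) {
            Arrays.fill(out[i], Double.NaN); // pad by default
            System.arraycopy(sources[i], 0, out[i], 0, Math.min(sources[i].length, size));
        }
        return out;
    }

    public static void main(String[] args) {
        // e.g. 4 order book columns -> one 2D array with 4 rows
        double[][] book = combine(3,
                new double[]{100.1, 100.2, 100.3}, // bid prices
                new double[]{5, 7, 2},             // bid sizes
                new double[]{100.4, 100.5, 100.6}, // ask prices
                new double[]{4, 6});               // ask sizes (short: NaN-padded)
        System.out.println(book.length);              // 4
        System.out.println(Arrays.toString(book[3])); // [4.0, 6.0, NaN]
    }
}
```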

javier commented Feb 20, 2026

@coderabbitai, addressed your comments in latest commit

CodeRabbit review response

Integer overflow (CRITICAL)

Already handled. applyShape() calls MutableArray.resetToDefaultStrides() which uses Math.multiplyExact() and checks the product against maxArrayElementCount. The attack vector array_build(2, 200_000_000, 1.0, 2.0) is caught there. Added testError2dElementCountExceedsMax to prove it.

Additionally, this push adds an upper-bound check on nArraysLong before the (int) cast, and a DIM_MAX_LEN check on the array-derived size path (when size comes from a filler array's cardinality). Both caught at compile time / early execution with clear error messages.

Vect.setMemoryDouble() for scalar fill

Requires DirectArray API changes to expose the underlying address and append offset through MemoryA. Not in scope for this PR - the current element-by-element loop is correct and consistent with how other array functions populate data.

setType() per row

Required for correctness. After ofNull(), the shape list is cleared. setType() reinitializes it so that subsequent setDimLen() calls work. The comment at line 222 already documents this. Cannot be removed without changing DirectArray/MutableArray internals.

nDims restricted to 1-2

Addressed in this push. The cap is removed - nArrays can now be any value >= 1. nArrays == 1 returns DOUBLE[], nArrays >= 2 returns DOUBLE[][] with nArrays rows. The return type is fixed at compile time (function signature resolution), so we cap dimensionality at 2 but allow any number of rows within that 2D array.

True 3D+ would require a different function signature or dynamic return types, which QuestDB's function framework doesn't support.

DOUBLE only

Only DOUBLE is enabled in arrayTypeSet. Extending to INT[], LONG[], etc. is a separate effort.

Bulk copy optimization for truncation

Would need to reach into ArrayView's underlying memory address. The current loop is correct. Not in scope.

Thread safety

Standard pattern. The engine creates per-worker function instances via duplicate(). isThreadSafe() = false is consistent with all other DirectArray-backed functions.

Test coverage

This push adds 7 new tests and converts 1 error test to a success test:

  • test2dArrayFillerNaNPad - NaN padding in 2D
  • test2dArrayFillerTruncate - truncation in 2D
  • testError2dElementCountExceedsMax - overflow via applyShape()
  • testErrorNArraysOverflow - huge nArrays literal caught before int cast
  • testNullArrayFillerFromColumn - per-row null/non-null array filler
  • testOrderBookStyle - nArrays=4 real-world use case
  • testNArraysThreeScalarFillers - nArrays=3 success (was error test)
  • testScalarFillNd (fuzz) - random nArrays 2-6 vs ARRAY literal equivalence


coderabbitai bot commented Feb 20, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Get the latest commit on this PR to see the changes
gh pr view 6794 --json commits --jq '.commits[-1] | {sha: .oid, message: .messageHeadline}'

Length of output: 114


🏁 Script executed:

#!/bin/bash
# Get the diff of the latest commit to see what changed
gh pr view 6794 --json commits --jq '.commits[-1].oid' | xargs -I {} gh api repos/questdb/questdb/commits/{} --jq '.files[] | select(.filename | contains("BuildArray")) | .filename'

Length of output: 282


🏁 Script executed:

#!/bin/bash
# Fetch the current version of the implementation file
cat core/src/main/java/io/questdb/griffin/engine/functions/array/BuildArrayFunctionFactory.java

Length of output: 11278


🏁 Script executed:

#!/bin/bash
# Fetch the current test file to see new tests
cat core/src/test/java/io/questdb/test/griffin/engine/functions/array/BuildArrayFunctionFactoryTest.java

Length of output: 18930


🏁 Script executed:

#!/bin/bash
# Check for the resetToDefaultStrides implementation to verify overflow protection claim
rg -n "resetToDefaultStrides|multiplyExact" --type=java -A 5 | grep -A 5 "multiplyExact"

Length of output: 20829


🏁 Script executed:

#!/bin/bash
# Search for maxArrayElementCount to understand the overflow check
rg -n "maxArrayElementCount" --type=java -C 3

Length of output: 4249


🏁 Script executed:

#!/bin/bash
# Look at MutableArray to understand the overflow protection mechanism
fd -t f "MutableArray.java" --exec cat {}

Length of output: 6735


@javier, excellent work addressing the review. Let me verify your responses point-by-point:

Verified ✅

1. Integer overflow protection
You're absolutely right - I missed the existing protection. MutableArray.resetToDefaultStrides() already uses Math.multiplyExact() and checks stride > maxArrayElementCount, throwing CairoException on overflow. My critical concern was incorrect.

The additional safeguards you added strengthen this further:

  • Lines 77-79: nArrays bounds check before int cast ✅
  • Lines 195-199: DIM_MAX_LEN validation on array-derived size ✅
  • testError2dElementCountExceedsMax: Verified present ✅
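
For readers skimming the thread, the guard these checks rely on amounts to the following (a Python sketch; the Java side uses Math.multiplyExact plus a maxArrayElementCount check, and the default limit here is an assumption borrowed from the DIM_MAX_LEN value discussed below):

```python
def checked_element_count(shape, max_elements=(1 << 28) - 1):
    # Multiply dimension sizes and fail fast if the running product
    # exceeds the limit. Python ints never overflow, so the bound is
    # checked directly instead of via multiplyExact.
    total = 1
    for dim in shape:
        total *= dim
        if total > max_elements:
            raise OverflowError("array element count exceeds limit")
    return total
```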

2. Vect.setMemoryDouble() optimization
Accepted as out-of-scope - would require exposing memory addresses through MemoryA interface. The current element-by-element approach is correct and consistent with other array functions.

3. setType() per row
Line 222 comment confirms this is necessary for shape reinitialization after ofNull(). Accepted.

4. nArrays restriction lifted
Verified the change:

  • nArrays now accepts 1 to DIM_MAX_LEN (268,435,455) ✅
  • Return type: nArrays==1 → DOUBLE[], nArrays≥2 → DOUBLE[][] ✅
  • Explanation about function signature constraints preventing true 3D+ return types is reasonable

5. DOUBLE-only & bulk copy optimization
Both acknowledged as separate efforts. Accepted.

6. Thread safety
Standard pattern confirmed - per-worker instances via duplicate().

7. Test coverage enhanced
All 7 claimed tests verified present:

  • test2dArrayFillerNaNPad
  • test2dArrayFillerTruncate
  • testError2dElementCountExceedsMax
  • testErrorNArraysOverflow
  • testNullArrayFillerFromColumn
  • testOrderBookStyle
  • testNArraysThreeScalarFillers
  • Plus testScalarFillNd fuzz test ✅

Outstanding Questions

  1. Performance: Still no JMH benchmarks. For a function that can allocate multi-million element arrays, do you have any performance characterization? Even informal profiling would be useful.

  2. Naming change: You renamed nDims to nArrays in the latest commit. This is clearer since it represents the number of arrays/rows rather than dimensionality. Good improvement, but worth noting in release notes as it's a user-facing API change.

  3. Coverage percentage: What's the current coverage after adding 7 tests? Should be significantly higher than the initial 87.70%.

Final Assessment

My critical overflow concern was wrong - the protection was already robust. The enhancements you've added (bounds checks, comprehensive tests, nArrays flexibility) significantly strengthen the implementation.

The remaining performance optimization opportunities (vectorized scalar fill, bulk truncation copy) are reasonable to defer given API constraints.

Is this ready to merge from your perspective? The implementation is solid and well-tested.


🧠 Learnings used
Learnt from: javier
Repo: questdb/questdb PR: 0
File: :0-0
Timestamp: 2026-02-06T18:22:28.130Z
Learning: In questdb/questdb, for PRs requested by javier, provide line-cited, push-back style reviews with concrete follow-ups (tests, diffs, and performance considerations).

Learnt from: javier
Repo: questdb/questdb PR: 0
File: :0-0
Timestamp: 2026-02-06T18:22:28.130Z
Learning: For questdb/questdb, javier expects a rigorous, push-back style review; always include concrete line-cited findings and propose targeted tests/fixes when reviewing javier’s PRs.

Learnt from: javier
Repo: questdb/questdb PR: 0
File: :0-0
Timestamp: 2026-02-06T18:22:28.130Z
Learning: In questdb/questdb, javier prefers hard‑core, critical reviews that challenge design choices and edge cases; respond with deep, push-back style analyses for javier’s requests.

Learnt from: jerrinot
Repo: questdb/questdb PR: 0
File: :0-0
Timestamp: 2026-02-04T11:18:36.492Z
Learning: For PRs by jerrinot, always use a thorough and critical review mode: question design decisions, challenge assumptions, scrutinize implementations for edge cases and potential issues, and avoid superficial validation. They prefer hard-core reviews that push back on the code.

Learnt from: CR
Repo: questdb/questdb PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-13T18:25:55.544Z
Learning: QuestDB supports multidimensional arrays (e.g., DOUBLE[], DOUBLE[][]). Dimensionality is encoded in the column type itself, so DOUBLE[] and DOUBLE[][] are distinct column types.

Learnt from: jerrinot
Repo: questdb/questdb PR: 5996
File: core/src/test/java/io/questdb/test/cutlass/http/line/LineHttpSenderTest.java:350-356
Timestamp: 2025-08-04T09:16:27.366Z
Learning: In QuestDB ILP tests, maximum dimension values like `(1 << 28) - 1` are intentionally hard-coded instead of using constants like `ArrayView.DIM_MAX_LEN` to serve as "canaries" that will cause test failures if the API limits change, ensuring developers are aware of breaking changes to the ILP API.

Learnt from: mtopolnik
Repo: questdb/questdb PR: 5997
File: core/src/main/java/io/questdb/griffin/engine/functions/array/DoubleArrayAccessFunctionFactory.java:127-131
Timestamp: 2025-07-31T08:21:17.390Z
Learning: In DoubleArrayAccessFunctionFactory.java, overflow validation for array index casting from long to int is performed in the validateIndexArgs() method rather than at each individual cast site. The validateIndexArgs() method is called early in newInstance() to validate all index arguments, making subsequent casts safe.

javier and others added 2 commits February 20, 2026 19:23
Add two tests for gaps found during code review:
- test2dVariableSize: 2D array_build with column-driven size that
  changes per row, verifying DirectArray resizes correctly.
- test2dZeroSizeArrayFiller: 2D array_build with size=0 and an array
  filler, verifying filler evaluation is skipped gracefully.
Also re-sort test methods alphabetically per project convention.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@glasstiger
Contributor

[PR Coverage check]

😍 pass : 109 / 127 (85.83%)

file detail

| path | covered lines | new lines | coverage |
| --- | --- | --- | --- |
| 🔵 io/questdb/griffin/engine/functions/array/BuildArrayFunctionFactory.java | 109 | 127 | 85.83% |

@bluestreak01 bluestreak01 merged commit 925bd3b into master Feb 21, 2026
44 checks passed
@bluestreak01 bluestreak01 deleted the jv/build_array branch February 21, 2026 02:14