Add arrayCombinations, arrayPermutations, arrayPartialPermutations functions#99280
Add arrayCombinations, arrayPermutations, arrayPartialPermutations functions#99280Onyx2406 wants to merge 6 commits intoClickHouse:masterfrom
Conversation
Add three combinatorics functions for arrays: - arrayCombinations(arr, k): all k-length unordered subsets - arrayPermutations(arr): all full permutations - arrayPartialPermutations(arr, k): all k-length ordered selections Fixes ClickHouse#43175
…size limit - Use used-flags approach instead of swap-based recursion for partial permutations to produce lexicographic output order - k=0 now correctly returns [[]] (one empty selection) matching C(n,0)=1 - Add TOO_LARGE_ARRAY_SIZE guard (1M element cap) to prevent OOM
| { | ||
| if (result > limit / (n - i)) | ||
| return 0; | ||
| result = result * (n - i) / (i + 1); |
There was a problem hiding this comment.
Overly aggressive overflow guard rejects valid combination counts
High Severity
The overflow guard result > limit / (n - i) in combinationCountCapped checks whether the intermediate product result * (n - i) exceeds limit, but the actual computation on the next line divides by (i + 1), which can bring the value back well within range. This causes the function to return 0 (interpreted as "too large") for many valid inputs. For example, C(20, 10) = 184,756 is well under the 1,000,000 limit, but at iteration i=7 the intermediate C(20,7) = 77,520 fails the check against 1,000,000 / 13 = 76,923, causing arrayCombinations(range(20), 10) to throw TOO_LARGE_ARRAY_SIZE incorrectly.
The overflow guard was too strict — checking before division caused valid C(n,k) values to be rejected (e.g. C(20,10)=184756). Now uses GCD cross-cancellation between numerator and denominator factors to keep intermediates small. Added test for C(20,10) success case.
|
@Onyx2406, almost ready! Let's check the review comments. |
Address review comments: 1. Size guard now caps total materialized elements (rows * k) instead of just the number of result rows. Previously, inputs like `arrayCombinations(range(20), 10)` produced 1.8M elements while passing the 1M limit check. 2. Replace O(k^2) GCD cross-cancellation in `combinationCountCapped` with an O(k) incremental formula using 128-bit intermediate arithmetic. This prevents pathological inputs from spending excessive time in the guard before rejection. 3. Add regression tests verifying the total-elements invariant: - `arrayCombinations(range(20), 10)` now correctly rejected - `arrayPermutations(range(9))` now correctly rejected (362880*9 > 1M) 4. Fix CRLF line endings in source and test files.
LLVM Coverage Report
PR changed lines: PR changed-lines coverage: 98.07% (254/259, 0 noise lines excluded) |
|
@alexey-milovidov is this good to merge? |


Changelog category (leave one):
New Feature
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Added
arrayCombinations(arr, k),arrayPermutations(arr), andarrayPartialPermutations(arr, k)functions for generating combinations and permutations of array elements.Documentation entry for user-facing changes
What this PR does
Adds three combinatorics functions for arrays, as proposed in the Intern Tasks 2025/2026:
arrayCombinations(arr, k)arrayCombinations([1,2,3], 2)→[[1,2],[1,3],[2,3]]arrayPermutations(arr)arrayPermutations([1,2,3])→[[1,2,3],[1,3,2],[2,1,3],[2,3,1],[3,1,2],[3,2,1]]arrayPartialPermutations(arr, k)arrayPartialPermutations([1,2,3], 2)→[[1,2],[1,3],[2,1],[2,3],[3,1],[3,2]]Implementation (2 files, ~250 lines)
arrayCombinations.cpp— Iterative index-based algorithm for generating C(n,k) combinationsarrayPermutations.cpp— Template class handling both full (std::next_permutation) and partial (recursive swap-based) permutationsarrayShinglespattern:Array(T)→Array(Array(T))with dual offset managementTest plan
04043_array_combinatorics.sqlwith 12 test cases:Fixes #43175
Note
Medium Risk
Adds new array functions that can generate very large result sets; although capped at 1M elements, mistakes in size estimation or offsets could still cause performance/memory issues.
Overview
Introduces three new array combinatorics functions:
arrayCombinations(arr, k),arrayPermutations(arr), andarrayPartialPermutations(arr, k), each returningArray(Array(T))and validatingkbounds.All functions include result-size guards (1M element cap) to avoid OOM by precomputing combination/permutation counts and throwing
TOO_LARGE_ARRAY_SIZEwhen exceeded.Adds a new stateless query test (
04043_array_combinatorics) covering normal cases (includingk=0, empty arrays), error cases for invalidk, and size-limit behavior.Written by Cursor Bugbot for commit 412859c. This will update automatically on new commits. Configure here.