feat(core): N-dimensional array support #5243
Merged
bluestreak01 merged 856 commits into master on Jun 2, 2025
amunra commented on Jan 30, 2025
This was referenced Feb 16, 2025
# Conflicts: core/src/main/java/io/questdb/cutlass/http/DefaultHttpServerConfiguration.java
# Conflicts: core/src/main/java/io/questdb/cairo/CairoEngine.java
bluestreak01 previously approved these changes on Jun 2, 2025
bluestreak01 approved these changes on Jun 2, 2025
[PR Coverage check] 😍 pass: 4266 / 5062 (84.27%)
An N-dimensional array is a flat array of numbers with a hierarchical addressing scheme applied to it, which makes it usable as an N-dimensional array accessed with N indexes.
Hierarchical subdivision of the flat array:
Formula to access a member with the syntax arr[i, j, k]:
Note that this just sums up the contribution of each dimension, so the terms can be added in any order. This fact makes transposing an array a cheap operation: it reorders the per-dimension strides instead of moving any data.
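A minimal Java sketch of this addressing scheme follows. It rests on stated assumptions: the class StridedArray and its methods are hypothetical and not taken from the QuestDB codebase, indexes are 0-based (the SQL syntax is 1-based), and the default layout is assumed to be row-major, so each stride is the product of the lengths of all later dimensions. For arr[i, j, k] the flat index is offset + i*s0 + j*s1 + k*s2, where s0, s1, s2 are the strides. Transposing reverses the shape and strides, and fixing the first index (like arr[i]) yields a lower-dimensional view, both over the same flat storage.

```java
import java.util.Arrays;

// Hypothetical sketch: a strided N-dimensional view over a flat double[].
// Not taken from the QuestDB codebase; indexes here are 0-based.
final class StridedArray {
    final double[] flat;  // backing storage, shared by all views
    final int offset;     // flat index of the element at [0, 0, ..., 0]
    final int[] shape;    // length of each dimension
    final int[] strides;  // flat-index step for a unit step in each dimension

    StridedArray(double[] flat, int offset, int[] shape, int[] strides) {
        this.flat = flat;
        this.offset = offset;
        this.shape = shape;
        this.strides = strides;
    }

    // Assumed default (row-major) layout: the last dimension is contiguous (stride 1),
    // and each earlier stride is the product of all later dimension lengths.
    static StridedArray rowMajor(double[] flat, int... shape) {
        int[] strides = new int[shape.length];
        int stride = 1;
        for (int d = shape.length - 1; d >= 0; d--) {
            strides[d] = stride;
            stride *= shape[d];
        }
        if (stride != flat.length) {
            throw new IllegalArgumentException("shape does not match flat array length");
        }
        return new StridedArray(flat, 0, shape.clone(), strides);
    }

    // Flat index = offset + sum of index[d] * strides[d]; the summation order does not matter.
    double get(int... index) {
        if (index.length != shape.length) {
            throw new IllegalArgumentException("expected " + shape.length + " indexes");
        }
        int flatIndex = offset;
        for (int d = 0; d < index.length; d++) {
            if (index[d] < 0 || index[d] >= shape[d]) {
                throw new IndexOutOfBoundsException("index out of range in dimension " + d);
            }
            flatIndex += index[d] * strides[d];
        }
        return flat[flatIndex];
    }

    // Transpose by reversing shape and strides; the flat data is neither copied nor moved.
    StridedArray transpose() {
        int n = shape.length;
        int[] newShape = new int[n];
        int[] newStrides = new int[n];
        for (int d = 0; d < n; d++) {
            newShape[d] = shape[n - 1 - d];
            newStrides[d] = strides[n - 1 - d];
        }
        return new StridedArray(flat, offset, newShape, newStrides);
    }

    // Fix the first index (like arr[i] in SQL, but 0-based) to get an (N-1)-dimensional view.
    StridedArray subArray(int i) {
        if (shape.length == 0 || i < 0 || i >= shape[0]) {
            throw new IndexOutOfBoundsException("index out of range in dimension 0");
        }
        return new StridedArray(flat, offset + i * strides[0],
                Arrays.copyOfRange(shape, 1, shape.length),
                Arrays.copyOfRange(strides, 1, strides.length));
    }

    public static void main(String[] args) {
        // 2 x 3 matrix [[1, 2, 3], [4, 5, 6]] stored as one flat array
        StridedArray m = rowMajor(new double[]{1, 2, 3, 4, 5, 6}, 2, 3);
        StridedArray t = m.transpose(); // 3 x 2 view over the same storage
        System.out.println(m.get(1, 2));                 // prints 6.0
        System.out.println(t.get(2, 1));                 // prints 6.0
        System.out.println(Arrays.toString(t.strides));  // prints [1, 3]
        System.out.println(t.subArray(0).get(1));        // prints 4.0 (first column of m)
    }
}
```

Because transpose() and subArray() only build new offset, shape, and stride metadata, their cost grows with the number of dimensions rather than the number of elements, which is the property the paragraph above relies on.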
New SQL syntax
Assuming CREATE TABLE tango (a DOUBLE, b DOUBLE) and CREATE TABLE samba (arr DOUBLE[][]), these work:
SELECT ARRAY[1.0, 2.0, 3.0] -- DOUBLE array literal
SELECT ARRAY[1, 2, 3] -- also a DOUBLE array literal (auto-cast)
SELECT ARRAY[a, b] FROM tango
SELECT ARRAY[[a], [b]] FROM tango
INSERT INTO samba SELECT ARRAY[[a, a], [b, b]] FROM tango
CREATE TABLE foxtrot AS (SELECT ARRAY[[a, a], [b, b]] arr FROM tango)
SELECT arr[1, 2] FROM samba -- returns the value in the DOUBLE[][] array at coordinates (1, 2)
SELECT arr[1] FROM samba -- returns the DOUBLE[] sub-array at index 1
SELECT arr[1][2] FROM samba -- same result and performance as arr[1, 2]
SELECT arr[2:] FROM samba -- a slice without an upper bound takes everything from the lower bound to the end of that dimension. The returned array has the same dimensionality, in this case DOUBLE[][].
SELECT arr[1:100] FROM samba -- a slice with up to 99 elements (the upper bound is exclusive). If the array is shorter, this returns the entire array.
SELECT arr[2:3, 3:4] FROM samba -- selects the slice of the DOUBLE[][] array where the second row meets the third column. The returned array has two dimensions, each of length 1.
SELECT arr[2:3, 3:4][1] FROM samba -- first selects the slice as above, then takes the sub-array at index 1 of the slice (part of the 2nd row of the original array, a DOUBLE[])
SELECT arr[1:, 2] FROM samba -- selects a DOUBLE[] sub-array made of the 2nd element of every row (the 2nd column)
SELECT arr * arr FROM samba -- element-wise multiplication of arrays with the same shape
SELECT arr + arr FROM samba -- element-wise addition of arrays with the same shape
SELECT matmul(arr, transpose(arr)) FROM samba -- multiplies the matrix by its own transpose
Also implemented:
l2price_arr(target, price_array, size_array)
SELECT arr1 = arr2 FROM table, SELECT arr1 != arr2 FROM table
dim_length(arr, 1) -- length of the array's 1st dimension (1-based)
MUST HAVE:
Support in ILP clients:
...
- create table tab(a double[][]); @bluestreak01
- ARRAY columns @bluestreak01
- ARRAY[[1,2,bid],[2,3,ask]] @mtopolnik
- arr[1, 2] @mtopolnik
- arr[1][2] -> arr[1, 2] @mtopolnik
- arr[1:3, 2:4] @mtopolnik
- arr[2:] @mtopolnik
- t(arr) @mtopolnik
- arr1 * arr2 @mtopolnik
- l2price_arr() implemented for an order book stored in arrays. It should take two array parameters: price_arr, size_arr. @mtopolnik
- l2price_arr() to l2price() and make it work alongside the legacy l2price(). Currently there's a signature clash since l2price() is varargs. @mtopolnik
- default interface method
- ArrayFunction into the function framework @bluestreak01
- rnd_array test function @bluestreak01
- create as select and insert as select for arrays, non-WAL @bluestreak01
- create as select and insert as select for arrays, WAL @bluestreak01
- ArrayTypeDriver.arrayToJson() so it works with non-default strides (i.e., a transposed array) @mtopolnik
- ARRAY to the storage fuzz test system (O3 test, column top, drop/create partitions, etc.) @bluestreak01
- ARRAY support to ILP https://github.com/questdb/rfc/discussions/116 @kafka1991
- ARRAY type @bluestreak01
- ARRAY type @bluestreak01
- ARRAY type @jerrinot
- DOUBLE
- array_length() function
- ARRAY[1, 2] and auto-cast to DOUBLE[]
- ARRAY ingress @mtopolnik
- LONG and other types from occurring in an array
- DOUBLE @mtopolnik
Stage 2 - data ingress refinement
- INSERT statement, e.g. send queries as a POST request (optionally) @bluestreak01
Stage 3 - data egress
Stage X - Misc
- approx_percentile overload that accepts an array of percentiles and returns an array of results
Stage 3 - misc