Skip to content

perf(sql): support advanced aggregate functions in parallel GROUP BY#4097

Merged
ideoma merged 39 commits intomasterfrom
puzpuzpuz_parallel_group_by_pt3
Jan 11, 2024
Merged

perf(sql): support advanced aggregate functions in parallel GROUP BY#4097
ideoma merged 39 commits intomasterfrom
puzpuzpuz_parallel_group_by_pt3

Conversation

@puzpuzpuz
Copy link
Copy Markdown
Contributor

@puzpuzpuz puzpuzpuz commented Jan 4, 2024

The core idea is to use a specialized allocator for aggregate functions with additional state. That's to amortize the cost of frequent alloc/free calls. The allocator is thread-safe, supports best-effort free operation, and frees the memory fully only when it's closed. Considering that the aggregate functions grow their state as the power of 2, the max theoretical memory overhead should be around 2x (that's only with very active data structure to thread migration). See GroupByAllocator for more details.

Migrated aggregate functions use new flyweight off-heap data structures backed by the allocator and store the data structure pointers in the GROUP BY map. The data structures can be found in the GroupByCharSink, GroupByLongHashSet and GroupByIntHashSet classes.

Includes parallel GROUP BY migration for min(str)/max(str) functions and count_distinct() function for LONG, INT and IPv4 types.

Also includes the following:

  • Fixes broken designated timestamp push-down when parallel GROUP BY steals nested filter.

ClickBench results on my 4c/8t machine (hot runs):

-- before 33.5s
-- after 10s
-- allocated 1528MB
SELECT * FROM (SELECT REGEXP_REPLACE(Referer, '^https?://(?:www\.)?([^/]+)/.*$', '$1') AS k, AVG(length(Referer)) AS l, COUNT(*) AS c, MIN(Referer) FROM hits WHERE Referer IS NOT NULL GROUP BY k) WHERE c > 100000 ORDER BY l DESC LIMIT 25;

-- before 9.2s
-- after 2.6s
-- allocated 1021MB
SELECT RegionID, count_distinct(UserID) AS u FROM hits GROUP BY RegionID ORDER BY u DESC LIMIT 10;

-- before 10.5s
-- after 3s
-- allocated 1014MB
SELECT RegionID, SUM(AdvEngineID), COUNT(*) AS c, AVG(ResolutionWidth), count_distinct(UserID) FROM hits GROUP BY RegionID ORDER BY c DESC LIMIT 10;

-- before 500ms
-- after 250ms
-- allocated 50MB
SELECT MobilePhoneModel, count_distinct(UserID) AS u FROM hits WHERE MobilePhoneModel IS NOT NULL GROUP BY MobilePhoneModel ORDER BY u DESC LIMIT 10;

-- before 530ms
-- after 270ms
-- allocated 52MB
SELECT MobilePhone, MobilePhoneModel, count_distinct(UserID) AS u FROM hits WHERE MobilePhoneModel IS NOT NULL GROUP BY MobilePhone, MobilePhoneModel ORDER BY u DESC LIMIT 10;

-- before 7.7s
-- after 1.8s
-- allocated 2094MB
SELECT SearchPhrase, count_distinct(UserID) AS u FROM hits WHERE SearchPhrase IS NOT NULL GROUP BY SearchPhrase ORDER BY u DESC LIMIT 10;

-- before 8s
-- after 1.2s
-- allocated 3MB
SELECT SearchPhrase, MIN(URL), MIN(Title), COUNT(*) AS c, count_distinct(UserID) FROM hits WHERE Title LIKE '%Google%' AND URL NOT LIKE '%.google.%' AND SearchPhrase IS NOT NULL GROUP BY SearchPhrase ORDER BY c DESC LIMIT 10;

Microbenchmark results:

Benchmark                                           Mode  Cnt  Score    Error  Units
GroupByLongHashSetBenchmark.baseline                avgt    3  0.005 ±  0.001  us/op
GroupByLongHashSetBenchmark.testFastMap             avgt    3  0.099 ±  0.003  us/op
GroupByLongHashSetBenchmark.testGroupByLongHashSet  avgt    3  0.026 ±  0.001  us/op

@puzpuzpuz puzpuzpuz added SQL Issues or changes relating to SQL execution Performance Performance improvements labels Jan 4, 2024
@puzpuzpuz puzpuzpuz self-assigned this Jan 4, 2024
@puzpuzpuz puzpuzpuz force-pushed the puzpuzpuz_parallel_group_by_pt3 branch 2 times, most recently from 1ebca60 to 1d67bf3 Compare January 4, 2024 12:49
@puzpuzpuz puzpuzpuz force-pushed the puzpuzpuz_parallel_group_by_pt3 branch from 1d67bf3 to 8f187ca Compare January 5, 2024 08:47
@puzpuzpuz puzpuzpuz marked this pull request as ready for review January 5, 2024 09:25
@puzpuzpuz puzpuzpuz marked this pull request as draft January 5, 2024 15:28
@puzpuzpuz puzpuzpuz marked this pull request as ready for review January 8, 2024 08:20
@puzpuzpuz puzpuzpuz marked this pull request as draft January 8, 2024 15:11
@puzpuzpuz puzpuzpuz force-pushed the puzpuzpuz_parallel_group_by_pt3 branch from 4649f88 to 2412ab6 Compare January 10, 2024 15:41
@puzpuzpuz
Copy link
Copy Markdown
Contributor Author

nit: it would be good to explain in the comments the implications of having a parallelizable function that is not thread-safe with an example

Updated Javadoc in ea4e17b

@puzpuzpuz puzpuzpuz force-pushed the puzpuzpuz_parallel_group_by_pt3 branch 2 times, most recently from dd07850 to 35412db Compare January 11, 2024 08:26
@puzpuzpuz puzpuzpuz force-pushed the puzpuzpuz_parallel_group_by_pt3 branch from 35412db to b6660cb Compare January 11, 2024 08:50
ideoma
ideoma previously approved these changes Jan 11, 2024
@puzpuzpuz
Copy link
Copy Markdown
Contributor Author

@ideoma many thanks for the thoughtful review! I've addressed all pending comments. PTAL

@ideoma
Copy link
Copy Markdown
Collaborator

ideoma commented Jan 11, 2024

[PR Coverage check]

😍 pass : 984 / 1242 (79.23%)

file detail

path covered line new line coverage
🔵 io/questdb/griffin/engine/functions/constants/SymbolConstant.java 0 1 00.00%
🔵 io/questdb/griffin/engine/groupby/MapSymbolColumn.java 0 1 00.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastNotNullStrGroupByFunction.java 0 1 00.00%
🔵 io/questdb/griffin/engine/PerWorkerLocks.java 0 1 00.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastStrGroupByFunction.java 0 1 00.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstSymbolGroupByFunction.java 0 3 00.00%
🔵 io/questdb/griffin/engine/functions/test/TestSumDoubleGroupByFunction.java 0 4 00.00%
🔵 io/questdb/griffin/engine/functions/test/TestSumStringGroupByFunction.java 0 4 00.00%
🔵 io/questdb/griffin/engine/functions/groupby/IsLongOrderedGroupByFunction.java 0 13 00.00%
🔵 io/questdb/cairo/CairoConfigurationWrapper.java 0 2 00.00%
🔵 io/questdb/griffin/engine/functions/groupby/NSumDoubleGroupByFunction.java 0 4 00.00%
🔵 io/questdb/griffin/engine/functions/groupby/InterpolationGroupByFunction.java 0 3 00.00%
🔵 io/questdb/griffin/engine/functions/groupby/CountDistinctUuidGroupByFunction.java 1 4 25.00%
🔵 io/questdb/griffin/engine/functions/test/TestSumTDoubleGroupByFunction.java 1 4 25.00%
🔵 io/questdb/griffin/engine/functions/groupby/ApproxPercentileLongPackedGroupByFunction.java 1 4 25.00%
🔵 io/questdb/griffin/engine/functions/groupby/AbstractCovarGroupByFunction.java 1 4 25.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstIntGroupByFunction.java 1 4 25.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstShortGroupByFunction.java 1 4 25.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstFloatGroupByFunction.java 1 4 25.00%
🔵 io/questdb/griffin/engine/functions/groupby/CountDistinctStringGroupByFunction.java 1 4 25.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstCharGroupByFunction.java 1 4 25.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstDoubleGroupByFunction.java 1 4 25.00%
🔵 io/questdb/griffin/engine/functions/groupby/KSumDoubleGroupByFunction.java 1 4 25.00%
🔵 io/questdb/griffin/engine/functions/groupby/ApproxPercentileDoublePackedGroupByFunction.java 1 4 25.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstGeoHashGroupByFunctionLong.java 1 4 25.00%
🔵 io/questdb/griffin/engine/functions/groupby/StringAggGroupByFunction.java 1 4 25.00%
🔵 io/questdb/griffin/engine/functions/groupby/CorrGroupByFunction.java 1 4 25.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstBooleanGroupByFunction.java 1 4 25.00%
🔵 io/questdb/griffin/engine/functions/groupby/CountDistinctSymbolGroupByFunction.java 1 4 25.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstIPv4GroupByFunction.java 1 4 25.00%
🔵 io/questdb/griffin/engine/functions/groupby/ApproxPercentileDoubleGroupByFunction.java 1 4 25.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstLongGroupByFunction.java 1 4 25.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstTimestampGroupByFunction.java 1 4 25.00%
🔵 io/questdb/griffin/engine/functions/groupby/HaversineDistDegreeGroupByFunction.java 1 4 25.00%
🔵 io/questdb/griffin/engine/functions/groupby/MinCharGroupByFunction.java 1 4 25.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstByteGroupByFunction.java 1 4 25.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstGeoHashGroupByFunctionByte.java 1 4 25.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstStrGroupByFunction.java 1 4 25.00%
🔵 io/questdb/griffin/engine/functions/groupby/ApproxPercentileLongGroupByFunction.java 1 4 25.00%
🔵 io/questdb/griffin/engine/functions/groupby/CountLongConstGroupByFunction.java 1 4 25.00%
🔵 io/questdb/griffin/engine/functions/groupby/CountDistinctLong256GroupByFunction.java 1 4 25.00%
🔵 io/questdb/griffin/engine/functions/groupby/AbstractStdDevGroupByFunction.java 1 4 25.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstGeoHashGroupByFunctionShort.java 1 4 25.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstDateGroupByFunction.java 1 4 25.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstGeoHashGroupByFunctionInt.java 1 4 25.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstUuidGroupByFunction.java 1 4 25.00%
🔵 io/questdb/griffin/engine/functions/groupby/VwapDoubleGroupByFunction.java 2 5 40.00%
🔵 io/questdb/griffin/engine/functions/groupby/SumLongGroupByFunction.java 2 5 40.00%
🔵 io/questdb/griffin/engine/functions/groupby/MinIPv4GroupByFunction.java 2 5 40.00%
🔵 io/questdb/griffin/engine/functions/groupby/AvgDoubleGroupByFunction.java 2 5 40.00%
🔵 io/questdb/griffin/engine/functions/groupby/SumLong256GroupByFunction.java 2 5 40.00%
🔵 io/questdb/griffin/engine/functions/groupby/MaxIPv4GroupByFunction.java 2 5 40.00%
🔵 io/questdb/griffin/engine/functions/groupby/MinIntGroupByFunction.java 2 5 40.00%
🔵 io/questdb/griffin/engine/functions/groupby/SumDoubleGroupByFunction.java 2 5 40.00%
🔵 io/questdb/griffin/engine/functions/groupby/SumIntGroupByFunction.java 2 5 40.00%
🔵 io/questdb/griffin/engine/functions/groupby/SumFloatGroupByFunction.java 2 5 40.00%
🔵 io/questdb/griffin/engine/functions/groupby/MaxIntGroupByFunction.java 2 5 40.00%
🔵 io/questdb/griffin/engine/functions/groupby/MaxDoubleGroupByFunction.java 3 6 50.00%
🔵 io/questdb/griffin/engine/functions/groupby/MinTimestampGroupByFunction.java 3 6 50.00%
🔵 io/questdb/griffin/engine/functions/groupby/MaxFloatGroupByFunction.java 3 6 50.00%
🔵 io/questdb/griffin/engine/functions/groupby/MinFloatGroupByFunction.java 3 6 50.00%
🔵 io/questdb/griffin/engine/functions/groupby/MaxCharGroupByFunction.java 3 6 50.00%
🔵 io/questdb/griffin/engine/functions/groupby/MinDoubleGroupByFunction.java 3 6 50.00%
🔵 io/questdb/griffin/engine/functions/groupby/MaxLongGroupByFunction.java 3 6 50.00%
🔵 io/questdb/griffin/engine/functions/groupby/MinDateGroupByFunction.java 3 6 50.00%
🔵 io/questdb/griffin/engine/functions/groupby/MaxTimestampGroupByFunction.java 3 6 50.00%
🔵 io/questdb/griffin/engine/functions/groupby/MaxDateGroupByFunction.java 3 6 50.00%
🔵 io/questdb/griffin/engine/functions/groupby/MinLongGroupByFunction.java 3 6 50.00%
🔵 io/questdb/griffin/engine/functions/groupby/IsIPv4OrderedGroupByFunction.java 9 13 69.23%
🔵 io/questdb/griffin/engine/functions/groupby/CountDistinctIPv4GroupByFunction.java 35 47 74.47%
🔵 io/questdb/griffin/engine/functions/groupby/CountDistinctLongGroupByFunction.java 41 51 80.39%
🔵 io/questdb/griffin/engine/functions/groupby/CountDistinctIntGroupByFunction.java 41 51 80.39%
🔵 io/questdb/griffin/engine/functions/MultiArgFunction.java 5 6 83.33%
🔵 io/questdb/griffin/engine/table/AsyncGroupByRecordCursor.java 9 10 90.00%
🔵 io/questdb/griffin/engine/functions/groupby/MaxStrGroupByFunction.java 33 35 94.29%
🔵 io/questdb/griffin/engine/groupby/GroupByCharSink.java 43 45 95.56%
🔵 io/questdb/std/IntList.java 30 31 96.77%
🔵 io/questdb/griffin/engine/groupby/GroupByLongHashSet.java 92 95 96.84%
🔵 io/questdb/griffin/engine/groupby/GroupByIntHashSet.java 92 95 96.84%
🔵 io/questdb/griffin/engine/groupby/GroupByUtils.java 25 26 96.15%
🔵 io/questdb/griffin/engine/functions/groupby/MinStrGroupByFunction.java 34 35 97.14%
🔵 io/questdb/cairo/map/FastMap.java 61 62 98.39%
🔵 io/questdb/griffin/engine/table/AsyncGroupByNotKeyedRecordCursorFactory.java 2 2 100.00%
🔵 io/questdb/PropServerConfiguration.java 4 4 100.00%
🔵 io/questdb/griffin/engine/functions/columns/SymbolColumn.java 1 1 100.00%
🔵 io/questdb/griffin/engine/groupby/SampleByInterpolateRecordCursorFactory.java 4 4 100.00%
🔵 io/questdb/griffin/engine/groupby/GroupByMergeShardJob.java 4 4 100.00%
🔵 io/questdb/cairo/sql/Function.java 1 1 100.00%
🔵 io/questdb/std/Hash.java 7 7 100.00%
🔵 io/questdb/griffin/engine/functions/bool/OrFunctionFactory.java 1 1 100.00%
🔵 io/questdb/griffin/engine/functions/GroupByFunction.java 1 1 100.00%
🔵 io/questdb/cairo/sql/async/PageFrameReduceJob.java 1 1 100.00%
🔵 io/questdb/griffin/engine/table/AsyncGroupByNotKeyedRecordCursor.java 10 10 100.00%
🔵 io/questdb/griffin/engine/table/AsyncGroupByAtom.java 41 41 100.00%
🔵 io/questdb/griffin/engine/groupby/GroupByRecordCursorFactory.java 4 4 100.00%
🔵 io/questdb/std/AbstractLongHashSet.java 9 9 100.00%
🔵 io/questdb/cairo/DefaultCairoConfiguration.java 2 2 100.00%
🔵 io/questdb/griffin/model/QueryModel.java 3 3 100.00%
🔵 io/questdb/griffin/engine/functions/UnaryFunction.java 1 1 100.00%
🔵 io/questdb/cairo/map/FastMapCursor.java 1 1 100.00%
🔵 io/questdb/std/LongLongHashMap.java 3 3 100.00%
🔵 io/questdb/std/Uuid.java 1 1 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/AbstractCountGroupByFunction.java 5 5 100.00%
🔵 io/questdb/PropertyKey.java 2 2 100.00%
🔵 io/questdb/griffin/engine/table/AsyncGroupByRecordCursorFactory.java 2 2 100.00%
🔵 io/questdb/std/MemoryTag.java 1 1 100.00%
🔵 io/questdb/griffin/engine/table/AsyncJitFilteredRecordCursorFactory.java 9 9 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/CountLong256GroupByFunction.java 1 1 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/StdDevPopGroupByFunction.java 1 1 100.00%
🔵 io/questdb/std/LongList.java 9 9 100.00%
🔵 io/questdb/cairo/sql/NetworkSqlExecutionCircuitBreaker.java 2 2 100.00%
🔵 io/questdb/griffin/engine/groupby/AbstractNoRecordSampleByCursor.java 3 3 100.00%
🔵 io/questdb/griffin/engine/functions/SymbolFunction.java 1 1 100.00%
🔵 io/questdb/griffin/engine/groupby/GroupByAllocator.java 82 82 100.00%
🔵 io/questdb/std/LongLongHashSet.java 1 1 100.00%
🔵 io/questdb/griffin/SqlCodeGenerator.java 50 50 100.00%
🔵 io/questdb/griffin/engine/functions/BinaryFunction.java 1 1 100.00%
🔵 io/questdb/griffin/engine/functions/TernaryFunction.java 1 1 100.00%
🔵 io/questdb/griffin/engine/groupby/GroupByNotKeyedRecordCursorFactory.java 8 8 100.00%
🔵 io/questdb/std/ConcurrentLongHashMap.java 1 1 100.00%
🔵 io/questdb/griffin/engine/table/AsyncGroupByNotKeyedAtom.java 31 31 100.00%
🔵 io/questdb/griffin/engine/table/AsyncFilteredRecordCursorFactory.java 9 9 100.00%
🔵 io/questdb/griffin/SqlUtil.java 4 4 100.00%
🔵 io/questdb/cairo/map/FastMapValue.java 19 19 100.00%
🔵 io/questdb/griffin/engine/functions/bool/AndFunctionFactory.java 1 1 100.00%

@ideoma ideoma merged commit e810c3c into master Jan 11, 2024
@ideoma ideoma deleted the puzpuzpuz_parallel_group_by_pt3 branch January 11, 2024 14:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Performance Performance improvements SQL Issues or changes relating to SQL execution

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants