Group by multi-value columns without aggregation can not work #8019

@MeihanLi

Description

[Pinot SQL] Non-aggregation GROUP BY appears to work only for single-value columns. Grouping by a multi-value column throws an exception with error code 200. The exception message shown in the Pinot Controller UI is redundant: it repeats the first message more than 20 times. The stack trace is also not detailed enough for us to understand what causes the issue.

Example tags_value (String array):
restaurant_professional,restaurant_fast,restaurant_fresh,restaurant_tasty

Example query:
select tags_value from myTable GROUP BY tags_value

Error message from Pinot Controller UI:
{ "errorCode": 200, "message": "QueryExecutionError:\njava.lang.UnsupportedOperationException\n\tat org.apache.pinot.segment.spi.index.reader.ForwardIndexReader.readDictIds(ForwardIndexReader.java:84)\n\tat org.apache.pinot.core.common.DataFetcher$ColumnValueReader.readDictIds(DataFetcher.java:278)\n\tat org.apache.pinot.core.common.DataFetcher.fetchDictIds(DataFetcher.java:88)\n\tat org.apache.pinot.core.common.DataBlockCache.getDictIdsForSVColumn(DataBlockCache.java:99)\n\tat org.apache.pinot.core.operator.docvalsets.ProjectionBlockValSet.getDictionaryIdsSV(ProjectionBlockValSet.java:69)\n\tat org.apache.pinot.core.query.distinct.dictionary.DictionaryBasedSingleColumnDistinctOnlyExecutor.process(DictionaryBasedSingleColumnDistinctOnlyExecutor.java:42)\n\tat org.apache.pinot.core.operator.query.DistinctOperator.getNextBlock(DistinctOperator.java:61)\n\tat org.apache.pinot.core.operator.query.DistinctOperator.getNextBlock(DistinctOperator.java:38)\n\tat org.apache.pinot.core.operator.BaseOperator.nextBlock(BaseOperator.java:49)\n\tat org.apache.pinot.core.operator.combine.BaseCombineOperator.processSegments(BaseCombineOperator.java:150)\n\tat org.apache.pinot.core.operator.combine.BaseCombineOperator$1.runJob(BaseCombineOperator.java:105)\n\tat org.apache.pinot.core.util.trace.TraceRunnable.run(TraceRunnable.java:40)\n\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:266)" },

Error message from the server:
2022-01-10 19:16:24.768 [pqw-10] ERROR org.apache.pinot.core.operator.combine.BaseCombineOperator - Caught exception while executing operator of index: 1 (query: QueryContext{_tableName='restaurant_bi_feedback_OFFLINE', _selectExpressions=[distinct(tags_value)], _aliasList=[null], _filter=rating_value > '0', _groupByExpressions=null, _havingFilter=null, _orderByExpressions=null, _limit=10, _offset=0, _queryOptions={responseFormat=sql, trace=true, groupByMode=sql, timeoutMs=16000}, _debugOptions=null, _brokerRequest=BrokerRequest(querySource:QuerySource(tableName:restaurant_bi_feedback_OFFLINE), pinotQuery:PinotQuery(dataSource:DataSource(tableName:restaurant_bi_feedback_OFFLINE), selectList:[Expression(type:FUNCTION, functionCall:Function(operator:DISTINCT, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:tags_value))]))], filterExpression:Expression(type:FUNCTION, functionCall:Function(operator:GREATER_THAN, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:rating_value)), Expression(type:LITERAL, literal:<Literal longValue:0>)])), groupByList:[], orderByList:[], limit:10, queryOptions:{responseFormat=sql, trace=true, groupByMode=sql, timeoutMs=16000}))})
java.lang.UnsupportedOperationException: null
    at org.apache.pinot.segment.spi.index.reader.ForwardIndexReader.readDictIds(ForwardIndexReader.java:84)
    at org.apache.pinot.core.common.DataFetcher$ColumnValueReader.readDictIds(DataFetcher.java:278)
    at org.apache.pinot.core.common.DataFetcher.fetchDictIds(DataFetcher.java:88)
    at org.apache.pinot.core.common.DataBlockCache.getDictIdsForSVColumn(DataBlockCache.java:99)
    at org.apache.pinot.core.operator.docvalsets.ProjectionBlockValSet.getDictionaryIdsSV(ProjectionBlockValSet.java:69)
    at org.apache.pinot.core.query.distinct.dictionary.DictionaryBasedSingleColumnDistinctOnlyExecutor.process(DictionaryBasedSingleColumnDistinctOnlyExecutor.java:42)
    at org.apache.pinot.core.operator.query.DistinctOperator.getNextBlock(DistinctOperator.java:61)
    at org.apache.pinot.core.operator.query.DistinctOperator.getNextBlock(DistinctOperator.java:38)
    at org.apache.pinot.core.operator.BaseOperator.nextBlock(BaseOperator.java:42)
    at org.apache.pinot.core.operator.combine.BaseCombineOperator.processSegments(BaseCombineOperator.java:150)
    at org.apache.pinot.core.operator.combine.BaseCombineOperator$1.runJob(BaseCombineOperator.java:105)
    at org.apache.pinot.core.util.trace.TraceRunnable.run(TraceRunnable.java:40)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Also, it is weird that aggregation GROUP BY works for multi-value columns. The query below shows this, and if it works, non-aggregation GROUP BY should work for multi-value columns too.
Working query:
select tags_value, count(*) from myTable GROUP BY tags_value
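
To make the expected behavior concrete, here is a minimal plain-Java sketch of the semantics I would expect. It assumes that grouping by a multi-value column treats each array element as its own group, which is what the working aggregation query appears to do. The class and method names (`MultiValueGroupBy`, `countByTag`, `groupsOnly`) are hypothetical and are not Pinot APIs.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration only; this is not Pinot code.
public class MultiValueGroupBy {

    // Mimics: select tags_value, count(*) from myTable GROUP BY tags_value
    // Assumption: every element of a multi-value cell is its own group.
    static Map<String, Long> countByTag(List<List<String>> rows) {
        Map<String, Long> counts = new TreeMap<>();
        for (List<String> tags : rows) {
            for (String tag : tags) {
                counts.merge(tag, 1L, Long::sum);
            }
        }
        return counts;
    }

    // Mimics: select tags_value from myTable GROUP BY tags_value
    // The same groups as above, just without the counts.
    static Set<String> groupsOnly(List<List<String>> rows) {
        return countByTag(rows).keySet();
    }

    public static void main(String[] args) {
        List<List<String>> rows = List.of(
            List.of("restaurant_professional", "restaurant_fast"),
            List.of("restaurant_fast", "restaurant_tasty"));
        System.out.println(countByTag(rows));
        System.out.println(groupsOnly(rows));
    }
}
```

In other words, the non-aggregation query should return the same set of groups as the aggregation query, only without the count column.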
