
[multistage] Projection Not Pushed Down for Agg Queries with No Grouping Sets #10129

@ankitsultana


Repro:

SELECT
  COUNT(*) -- the issue happens even if you have COUNT(A.playerID)
FROM
  baseballStats_OFFLINE AS A
  JOIN baseballStats_OFFLINE AS B ON A.playerID = B.playerID
WHERE
  A.hits > 10 AND B.hits < 5

The issue happens even if you use a sub-query:

SELECT
  COUNT(*)
FROM
  (
    SELECT
      A.playerID
    FROM
      baseballStats_OFFLINE AS A
      JOIN baseballStats_OFFLINE AS B ON A.playerID = B.playerID
    WHERE
      A.hits > 10
      AND B.hits < 5
  )

Both queries above end up reading all the columns of the table in the table-scan stage. The reason is that Calcite does not create a projection node for them, so there is nothing to push down to the scan.
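One way to check this might be to look at the plan for the first repro query. The sketch below assumes `EXPLAIN PLAN FOR` is available with the multistage engine in the build being tested. The output is not shown here because it was not captured in this issue and varies by version. If the diagnosis above holds, the plan should show no project node between the join and the two table scans.

```sql
-- Sketch: prefix the repro query with EXPLAIN PLAN FOR to inspect the plan.
-- Assumes the query is run with the multistage engine enabled.
EXPLAIN PLAN FOR
SELECT
  COUNT(*)
FROM
  baseballStats_OFFLINE AS A
  JOIN baseballStats_OFFLINE AS B ON A.playerID = B.playerID
WHERE
  A.hits > 10 AND B.hits < 5
```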

I am testing out a few approaches for a fix in this PR: #10122

Btw, this is what GPT recommends:

cc: @walterddr

image
