Skip to content

Incorrect precision for FLOAT columns in the multi-stage query engine #13615

@yashmayya

Description

@yashmayya
  • [multistage] disable calcite float to double cast #11518 updated the TypeFactory (that is used to convert between Pinot's data types to Calcite SQL's types) to map Pinot's FLOAT data type to Calcite SQL's DOUBLE type.
  • This is incorrect since Pinot's FLOAT data type uses Java's float primitive type internally which is a 32 bit floating point type and should map to Calcite's REAL type instead (see https://calcite.apache.org/docs/reference.html#data-types).
  • This leads to precision issues where a simple query like SELECT aFloatColumn FROM table returns different values in the v1 and v2 query engines (can be verified via any of the quickstarts). The value returned in the multi-stage query engine is incorrect and won't match the value that was originally ingested.
  • The reason the original change was introduced was for queries like SELECT COUNT(*) FROM table WHERE aFloatColumn = 0.05. However, such queries should be written like SELECT COUNT(*) FROM table WHERE aFloatColumn = CAST(0.05 AS FLOAT) instead of incorrectly mapping Pinot's FLOAT data type. This is consistent with other standard databases like Postgres and MySQL where queries comparing FLOAT / REAL columns to literals need explicit casts on the literal value to return the expected result.
  • In a query like SELECT COUNT(*) FROM table WHERE aFloatColumn = 0.05, the literal 0.05 is inferred as a DECIMAL type with fixed precision and scale. During query compilation, Calcite tries to unify the operand types in an expression using a principle of finding a consistent type across the operands here. For a DECIMAL / REAL combination, this comes out to be DOUBLE here and thus there is an additional cast to DOUBLE added to the FLOAT column in the query plan. This can be avoided by explicitly casting the literal to a FLOAT type instead.
  • Note that similar explicit casts aren't required for DOUBLE column comparisons with literals since the consistent type inferred in this situation is DOUBLE itself and the column won't be auto-casted. Again, this behavior also matches standard databases like Postgres and MySQL where DOUBLE columns don't need explicit casts on comparison literals to get the expected results.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugmulti-stageRelated to the multi-stage query engine

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions