-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Closed
Labels
Description
- [multistage] disable calcite float to double cast #11518 updated the TypeFactory (that is used to convert between Pinot's data types to Calcite SQL's types) to map Pinot's
FLOATdata type to Calcite SQL'sDOUBLEtype. - This is incorrect since Pinot's
FLOATdata type uses Java'sfloatprimitive type internally which is a 32 bit floating point type and should map to Calcite'sREALtype instead (see https://calcite.apache.org/docs/reference.html#data-types). - This leads to precision issues where a simple query like
SELECT aFloatColumn FROM tablereturns different values in the v1 and v2 query engines (can be verified via any of the quickstarts). The value returned in the multi-stage query engine is incorrect and won't match the value that was originally ingested. - The reason the original change was introduced was for queries like
SELECT COUNT(*) FROM table WHERE aFloatColumn = 0.05. However, such queries should be written likeSELECT COUNT(*) FROM table WHERE aFloatColumn = CAST(0.05 AS FLOAT)instead of incorrectly mapping Pinot'sFLOATdata type. This is consistent with other standard databases like Postgres and MySQL where queries comparing FLOAT / REAL columns to literals need explicit casts on the literal value to return the expected result. - In a query like
SELECT COUNT(*) FROM table WHERE aFloatColumn = 0.05, the literal0.05is inferred as aDECIMALtype with fixed precision and scale. During query compilation, Calcite tries to unify the operand types in an expression using a principle of finding a consistent type across the operands here. For aDECIMAL/REALcombination, this comes out to beDOUBLEhere and thus there is an additional cast toDOUBLEadded to theFLOATcolumn in the query plan. This can be avoided by explicitly casting the literal to aFLOATtype instead. - Note that similar explicit casts aren't required for
DOUBLEcolumn comparisons with literals since the consistent type inferred in this situation isDOUBLEitself and the column won't be auto-casted. Again, this behavior also matches standard databases like Postgres and MySQL whereDOUBLEcolumns don't need explicit casts on comparison literals to get the expected results.