Description
What is the problem the feature request solves?
Note: This issue was generated with AI assistance. The specification details have been extracted from Spark documentation and may need verification.
Comet does not currently support the Spark divide_ym_interval function, causing queries using this function to fall back to Spark's JVM execution instead of running natively on DataFusion.
The DivideYMInterval expression divides a year-month interval by a numeric value, returning a new year-month interval. This operation supports division by various numeric types including integral, decimal, and fractional types, with proper rounding using the HALF_UP rounding mode.
Supporting this expression would allow more Spark workloads to benefit from Comet's native acceleration.
Describe the potential solution
Spark Specification
Syntax:
`year_month_interval / numeric_value`
Arguments:
| Argument | Type | Description |
|---|---|---|
| interval | YearMonthIntervalType | The year-month interval to be divided |
| num | NumericType | The numeric divisor (IntegralType, DecimalType, or FractionalType) |
Return Type: Returns `YearMonthIntervalType()`, a year-month interval. The no-argument form is Spark's default YEAR TO MONTH interval type.
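Spark stores a year-month interval as a single signed 32-bit count of months, which is why the overflow check in this issue refers to `Int.MinValue`. The following is a minimal sketch of that representation in Rust. The helper names `to_total_months` and `format_year_month` are hypothetical and are not part of Comet or DataFusion.

```rust
/// Convert a `years-months` pair into a total month count.
/// Hypothetical helper: assumes the result fits in an i32 (no overflow handling).
fn to_total_months(years: i32, months: i32) -> i32 {
    years * 12 + months
}

/// Render a total month count in the `Y-M` form used by interval literals.
/// Hypothetical helper for illustration only.
fn format_year_month(total_months: i32) -> String {
    let sign = if total_months < 0 { "-" } else { "" };
    // Widen before taking the absolute value so i32::MIN cannot overflow.
    let abs = i64::from(total_months).abs();
    format!("{}{}-{}", sign, abs / 12, abs % 12)
}
```

Under this representation, `INTERVAL '2-6' YEAR TO MONTH` is 30 months. Dividing by 2 gives 15 months, which formats as `1-3`.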
Supported Data Types:
Left operand (interval):
- YearMonthIntervalType
Right operand (num):
- LongType
- IntegralType (IntegerType, ShortType, ByteType)
- DecimalType
- FractionalType (DoubleType, FloatType)
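For the `DoubleType` and `FloatType` divisors, one possible native approach is to divide in `f64` and round the quotient half away from zero, which is what HALF_UP means for a real number. The sketch below assumes the interval is held as an `i32` month count and that a `FloatType` divisor is widened to `f64` first. The function name, the error enum, and the choice to report non-finite or out-of-range results as an error are assumptions made for illustration, not Comet's actual API or Spark's confirmed behavior.

```rust
/// Hypothetical error type for this sketch only.
#[derive(Debug, PartialEq)]
enum FractionalDivideError {
    DivideByZero,
    OutOfRange,
}

/// Divide a month count by a floating-point divisor with HALF_UP rounding.
fn divide_months_by_f64(months: i32, divisor: f64) -> Result<i32, FractionalDivideError> {
    // This comparison is true for both 0.0 and -0.0.
    if divisor == 0.0 {
        return Err(FractionalDivideError::DivideByZero);
    }
    // f64::round rounds half-way cases away from zero.
    let rounded = (f64::from(months) / divisor).round();
    // Assumption: NaN, infinities, and values outside the i32 range are errors.
    if !rounded.is_finite() || rounded < f64::from(i32::MIN) || rounded > f64::from(i32::MAX) {
        return Err(FractionalDivideError::OutOfRange);
    }
    Ok(rounded as i32)
}
```

For example, 60 months divided by 2.5 yields 24 months, and 10 months divided by 4.0 (an exact quotient of 2.5) rounds up to 3 months.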
Edge Cases:
- Null handling: Expression is null-intolerant - returns null if either operand is null
- Divide by zero: Throws `QueryExecutionErrors` when divisor is zero
- Overflow behavior:
  - Checks for `Int.MinValue / -1` overflow condition
  - Throws `QueryExecutionErrors.overflowInIntegralDivideError` on overflow
  - Uses `intValueExact()` for Decimal results to ensure no precision loss
- Rounding: All division operations use `RoundingMode.HALF_UP` for consistent behavior
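The integral-divisor edge cases above can be sketched in Rust as follows. This is an illustration under stated assumptions rather than Comet's implementation: the function name and error enum are hypothetical, the interval is assumed to be an `i32` month count, and narrower integral divisors are assumed to be widened to `i64` before the call.

```rust
/// Hypothetical error type for this sketch only.
#[derive(Debug, PartialEq)]
enum IntegralDivideError {
    DivideByZero,
    Overflow,
}

/// Divide a month count by an integral divisor with HALF_UP rounding.
fn divide_months_by_i64(months: i32, divisor: i64) -> Result<i32, IntegralDivideError> {
    if divisor == 0 {
        return Err(IntegralDivideError::DivideByZero);
    }
    // i32::MIN / -1 does not fit in an i32.
    if months == i32::MIN && divisor == -1 {
        return Err(IntegralDivideError::Overflow);
    }
    let m = i64::from(months);
    let quotient = m / divisor; // truncates toward zero
    let remainder = m % divisor;
    // HALF_UP: move away from zero when the discarded fraction is at least 1/2.
    let round_away = remainder != 0 && 2 * remainder.unsigned_abs() >= divisor.unsigned_abs();
    let adjusted = if round_away {
        // The exact quotient is negative when the operand signs differ.
        if (m < 0) != (divisor < 0) { quotient - 1 } else { quotient + 1 }
    } else {
        quotient
    };
    // Defensive narrowing: any value that does not fit is reported as overflow.
    i32::try_from(adjusted).map_err(|_| IntegralDivideError::Overflow)
}
```

Comparing `2 * |remainder|` with `|divisor|` keeps the rounding decision in integer arithmetic, so it does not depend on floating-point precision.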
Examples:
-- Divide a 2-year 6-month interval by 2
SELECT INTERVAL '2-6' YEAR TO MONTH / 2;
-- Result: INTERVAL '1-3' YEAR TO MONTH
-- Divide by decimal value
SELECT INTERVAL '5-0' YEAR TO MONTH / 2.5;
-- Result: INTERVAL '2-0' YEAR TO MONTH
// DataFrame API usage
import org.apache.spark.sql.functions._
import org.apache.spark.sql.catalyst.expressions.DivideYMInterval
df.select(col("year_month_interval") / lit(3))
// Using expression directly
val divideExpr = DivideYMInterval(
interval = col("ym_interval").expr,
num = lit(2).expr
)
Implementation Approach
See the Comet guide on adding new expressions for detailed instructions.
- Scala Serde: Add expression handler in `spark/src/main/scala/org/apache/comet/serde/`
- Register: Add to appropriate map in `QueryPlanSerde.scala`
- Protobuf: Add message type in `native/proto/src/proto/expr.proto` if needed
- Rust: Implement in `native/spark-expr/src/` (check if DataFusion has built-in support first)
Additional context
Difficulty: Medium
Spark Expression Class: org.apache.spark.sql.catalyst.expressions.DivideYMInterval
Related:
- `MultiplyYMInterval` - Multiplication of year-month intervals
- `DivideDTInterval` - Division of day-time intervals
- `ExtractIntervalYears` - Extracting years from intervals
- `ExtractIntervalMonths` - Extracting months from intervals
This issue was auto-generated from Spark reference documentation.