
[Feature] Support Spark expression: divide_ym_interval #3097

@andygrove

What is the problem the feature request solves?

Note: This issue was generated with AI assistance. The specification details have been extracted from Spark documentation and may need verification.

Comet does not currently support the Spark divide_ym_interval function, causing queries using this function to fall back to Spark's JVM execution instead of running natively on DataFusion.

The DivideYMInterval expression divides a year-month interval by a numeric value, returning a new year-month interval. This operation supports division by various numeric types including integral, decimal, and fractional types, with proper rounding using the HALF_UP rounding mode.

Supporting this expression would allow more Spark workloads to benefit from Comet's native acceleration.

Describe the potential solution

Spark Specification

Syntax:

year_month_interval / numeric_value

Arguments:

| Argument | Type | Description |
|----------|------|-------------|
| interval | YearMonthIntervalType | The year-month interval to be divided |
| num | NumericType | The numeric divisor (IntegralType, DecimalType, or FractionalType) |

Return Type: YearMonthIntervalType(), the default YEAR TO MONTH year-month interval type.

Supported Data Types:
Left operand (interval):

  • YearMonthIntervalType

Right operand (num):

  • LongType
  • IntegralType (IntegerType, ShortType, ByteType)
  • DecimalType
  • FractionalType (DoubleType, FloatType)

Edge Cases:

  • Null handling: Expression is null-intolerant - returns null if either operand is null
  • Divide by zero: Throws a divide-by-zero error (raised via QueryExecutionErrors) when the divisor is zero
  • Overflow behavior:
    • Checks for Int.MinValue / -1 overflow condition
    • Throws QueryExecutionErrors.overflowInIntegralDivideError on overflow
    • Uses intValueExact() for Decimal results, so a rounded value that does not fit in an Int raises an error instead of being silently truncated
  • Rounding: All division operations use RoundingMode.HALF_UP for consistent behavior
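The integral-divisor rules above can be sketched in plain Rust. This is only an illustration of the semantics as described in this issue, not Spark's or Comet's actual code: the `DivideError` enum and the function name `divide_ym_interval_by_i64` are hypothetical, and the interval is assumed to be stored as a total number of months in an `i32`.

```rust
/// Hypothetical error type mirroring the two failure modes listed above.
#[derive(Debug, PartialEq)]
enum DivideError {
    DivideByZero,
    Overflow,
}

/// Divide a year-month interval (assumed to be stored as total months)
/// by an integral divisor, rounding HALF_UP (ties go away from zero).
fn divide_ym_interval_by_i64(months: i32, divisor: i64) -> Result<i32, DivideError> {
    if divisor == 0 {
        return Err(DivideError::DivideByZero);
    }
    // Int.MinValue / -1 does not fit in a 32-bit signed integer.
    if months == i32::MIN && divisor == -1 {
        return Err(DivideError::Overflow);
    }
    // Widen so that doubling the remainder cannot overflow.
    let n = months as i128;
    let d = divisor as i128;
    let q = n / d; // truncates toward zero
    let r = n % d; // has the sign of n
    // HALF_UP: move one step away from zero when |r| is at least half of |d|.
    let rounded = if 2 * r.abs() >= d.abs() {
        if (n < 0) != (d < 0) { q - 1 } else { q + 1 }
    } else {
        q
    };
    i32::try_from(rounded).map_err(|_| DivideError::Overflow)
}
```

For example, 30 months divided by 2 gives 15 months, while 7 months divided by 2 gives 4 months because 3.5 rounds away from zero.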

Examples:

-- Divide a 2-year 6-month interval by 2
SELECT INTERVAL '2-6' YEAR TO MONTH / 2;
-- Result: INTERVAL '1-3' YEAR TO MONTH

-- Divide by decimal value
SELECT INTERVAL '5-0' YEAR TO MONTH / 2.5;
-- Result: INTERVAL '2-0' YEAR TO MONTH

// DataFrame API usage (df is assumed to have a year-month interval column)
import org.apache.spark.sql.functions._
import org.apache.spark.sql.catalyst.expressions.DivideYMInterval

df.select(col("year_month_interval") / lit(3))

// Using the Catalyst expression directly
val divideExpr = DivideYMInterval(
  interval = col("ym_interval").expr,
  num = lit(2).expr
)
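The arithmetic behind the SQL examples can be checked with a small Rust sketch. It assumes the interval is held as a total month count and that a floating-point divisor is rounded half away from zero, as the HALF_UP rule above suggests. The helper names `to_months`, `format_ym`, and `divide_by_f64` are hypothetical, and returning `None` stands in for the errors Spark would raise.

```rust
/// Total months for an interval given as (years, months).
fn to_months(years: i32, months: i32) -> i32 {
    years * 12 + months
}

/// Render a total month count in the 'Y-M' form used in the examples.
fn format_ym(total: i32) -> String {
    let sign = if total < 0 { "-" } else { "" };
    let abs = (total as i64).abs();
    format!("{}{}-{}", sign, abs / 12, abs % 12)
}

/// Divide by a floating-point divisor. `f64::round` rounds ties away from
/// zero, which matches HALF_UP. `None` is a placeholder for the
/// divide-by-zero and out-of-range errors.
fn divide_by_f64(months: i32, divisor: f64) -> Option<i32> {
    if divisor == 0.0 {
        return None;
    }
    let rounded = (months as f64 / divisor).round();
    if rounded.is_finite() && rounded >= i32::MIN as f64 && rounded <= i32::MAX as f64 {
        Some(rounded as i32)
    } else {
        None
    }
}
```

With these helpers, `INTERVAL '5-0' YEAR TO MONTH / 2.5` becomes 60 months divided by 2.5, which is 24 months and formats as `2-0`.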

Implementation Approach

See the Comet guide on adding new expressions for detailed instructions.

  1. Scala Serde: Add expression handler in spark/src/main/scala/org/apache/comet/serde/
  2. Register: Add to appropriate map in QueryPlanSerde.scala
  3. Protobuf: Add message type in native/proto/src/proto/expr.proto if needed
  4. Rust: Implement in native/spark-expr/src/ (check if DataFusion has built-in support first)
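As a rough idea of what the Rust step might involve, the sketch below applies the division row by row with null propagation. It is not based on Comet's or DataFusion's actual APIs: a real kernel would presumably operate on Arrow arrays, whereas here slices of `Option<i32>` stand in for nullable columns, and `divide_ym_column` and `divide_half_up` are hypothetical names.

```rust
/// HALF_UP division of a month count by a 32-bit integer divisor.
fn divide_half_up(months: i32, divisor: i32) -> Result<i32, String> {
    if divisor == 0 {
        return Err("interval divided by zero".to_string());
    }
    if months == i32::MIN && divisor == -1 {
        return Err("overflow in integral divide".to_string());
    }
    // Widen to i64 so that doubling the remainder cannot overflow.
    let (n, d) = (months as i64, divisor as i64);
    let (q, r) = (n / d, n % d);
    let rounded = if 2 * r.abs() >= d.abs() {
        if (n < 0) != (d < 0) { q - 1 } else { q + 1 }
    } else {
        q
    };
    Ok(rounded as i32)
}

/// Null-intolerant column kernel: a row is null if either input is null,
/// and the first failing row aborts the whole batch with an error.
fn divide_ym_column(
    intervals: &[Option<i32>],
    divisors: &[Option<i32>],
) -> Result<Vec<Option<i32>>, String> {
    if intervals.len() != divisors.len() {
        return Err("input columns must have the same length".to_string());
    }
    intervals
        .iter()
        .zip(divisors)
        .map(|(m, d)| match (m, d) {
            (Some(m), Some(d)) => divide_half_up(*m, *d).map(Some),
            _ => Ok(None),
        })
        .collect()
}
```

Returning an error for the whole batch is one possible design; whether Comet should fail the query or fall back to Spark in such cases is left open here.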

Additional context

Difficulty: Medium
Spark Expression Class: org.apache.spark.sql.catalyst.expressions.DivideYMInterval

Related:

  • MultiplyYMInterval - Multiplication of year-month intervals
  • DivideDTInterval - Division of day-time intervals
  • ExtractIntervalYears - Extracting years from intervals
  • ExtractIntervalMonths - Extracting months from intervals

This issue was auto-generated from Spark reference documentation.
