Skip to content

Conversation

@stampy88
Copy link
Contributor

@stampy88 stampy88 commented Apr 4, 2024

  • Ran into a nasty issue where I had incorrectly used a time.Duration with a long Avro type and timestamp-millis. Hamba was encoding the entire int64 for this case instead of erroring on an invalid schema. When Spark/SQL tried to convert this into a microseconds (it wants everything as micros), it eventually tries to do this:
def millisToMicros(millis: Long): Long = {
    Math.multiplyExact(millis, MICROS_PER_MILLIS)
}

And Math.multiplyExact detects an overflow because it is trying to multiply the long by 1000, which doesn't fit. This fix will give the user an error if they use the logicalType incorrectly. The error will look something like avro: time.Duration is unsupported for Avro long and logicalType timestamp-micros.

stampy88 added 2 commits April 4, 2024 15:28
- Ran into a nasty issue where I had incorrectly used a `time.Duration` with a `long` Avro type and `timestamp-millis`. Hamba was encoding the entire int64 for this case instead of erroring on an invalid schema. When Spark/SQL tried to [convert](https://github.com/apache/spark/blob/v3.5.0/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala#L142C38-L142C52) this into a microseconds (it wants everything as micros), it eventually tries to do this:
```
def millisToMicros(millis: Long): Long = {
    Math.multiplyExact(millis, MICROS_PER_MILLIS)
}
```
And [Math.multiplyExact](https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/lang/Math.html#multiplyExact(long,int)) detects an overflow because it is trying to multiply the long by 1000, which doesn't fit.
This fix will give the user an error if they use the `logicalType` incorrectly. The error will look something like `avro: time.Duration is unsupported for Avro long and logicalType timestamp-micros`.
@stampy88 stampy88 force-pushed the invalid_logical_type_combination branch from 4c7f21e to bdc185a Compare April 4, 2024 19:41
@nrwiersma nrwiersma merged commit 4894c0b into hamba:main Apr 5, 2024
@stampy88 stampy88 deleted the invalid_logical_type_combination branch April 6, 2024 15:57
@hhromic
Copy link
Contributor

hhromic commented Apr 9, 2024

Hey, I know this has been merged already but I had a question. Sorry to hijack this PR conversation!

I don't see why using a time.Duration is invalid for a long Avro type annotated with either a timestamp-millis or timestamp-micros logical type. While I do agree that it was incorrect for time.Duration to be encoded blindly as int64 (and thus nanos), I think there is no reason for hamba/avro to not do the correct conversion automatically.

  • If logical type is timestamp-millis then convert from Duration.Milliseconds().
  • If logical type is timestamp-micros then convert from Duration.Microseconds().

At least that is what I would expect the encoder to do because it has everything necessary to do the conversion automatically.

Is there any reason why this behaviour is not desired? I can contribute with it if you feel is a good idea.

EDIT: Ah, sorry, I just realized that time.Duration is not a timestamp to begin with, so is invalid no matter the logical type. Apologies for the noise!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants