Skip to content

ARROW-9961: [Rust][DataFusion] Make to_timestamp function parses timestamp without timezone offset as local#8161

Closed
alamb wants to merge 1 commit intoapache:masterfrom
alamb:alamb/local_timezone
Closed

ARROW-9961: [Rust][DataFusion] Make to_timestamp function parses timestamp without timezone offset as local#8161
alamb wants to merge 1 commit intoapache:masterfrom
alamb:alamb/local_timezone

Conversation

@alamb
Copy link
Copy Markdown
Contributor

@alamb alamb commented Sep 10, 2020

The TO_TIMESTAMP function added in #8142 supports parsing timestamps without a specified timezone, such as 2020-09-08T13:42:29.190855

Such timestamps are supposed to be interpreted as in the local timezone, but instead are interpreted as UTC.

This PR corrects that logical error

Here is a query before this PR showing the problem on my machine (in America/New_York timezone) - note the values of the timestamp are the same for both the Z and non Z timestamps:

>  select to_timestamp('2020-09-08T13:42:29.190855'),  to_timestamp('2020-09-08T13:42:29.190855Z') from ts_table limit 1;
+-------------------------------------------------+--------------------------------------------------+
| totimestamp(Utf8("2020-09-08T13:42:29.190855")) | totimestamp(Utf8("2020-09-08T13:42:29.190855Z")) |
+-------------------------------------------------+--------------------------------------------------+
| 1599572549000190855                             | 1599572549190855000                              |
+-------------------------------------------------+--------------------------------------------------+
1 rows in set. Query took 0 seconds.

Here is a query after this PR:

>  select to_timestamp('2020-09-08T13:42:29.190855'),  to_timestamp('2020-09-08T13:42:29.190855Z') from ts_table limit 1;
+-------------------------------------------------+--------------------------------------------------+
| totimestamp(Utf8("2020-09-08T13:42:29.190855")) | totimestamp(Utf8("2020-09-08T13:42:29.190855Z")) |
+-------------------------------------------------+--------------------------------------------------+
| 1599586949000190855                             | 1599572549190855000                              |
+-------------------------------------------------+--------------------------------------------------+
1 rows in set. Query took 0 seconds.

@github-actions
Copy link
Copy Markdown

@alamb
Copy link
Copy Markdown
Contributor Author

alamb commented Sep 11, 2020

FYI @jhorstmann / @andygrove

@andygrove
Copy link
Copy Markdown
Member

@alamb I think you copy and pasted the same result twice in the PR description?

Copy link
Copy Markdown
Member

@jorgecarleitao jorgecarleitao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Thanks a lot for taking this, @alamb; datetime conversions are always nasty.

@alamb
Copy link
Copy Markdown
Contributor Author

alamb commented Sep 12, 2020

@alamb I think you copy and pasted the same result twice in the PR description?

@andygrove -- I don't think so -- the timestamps are (very) subtlety different. Maybe this screenshot helps to show it better
Screen Shot 2020-09-12 at 6 06 11 AM

Thanks a lot for taking this, @alamb; datetime conversions are always nasty.
Thanks @jorgecarleitao -- I agree. I spent far longer on this task than I would have liked to or would like to admit.

@jhorstmann
Copy link
Copy Markdown
Contributor

I guess one choice has to be made, whether to default to local timezone or utc, and it's always possible to set the local timezone to utc to avoid these issues. My only concern is that this could cause tests to fail if they use local timestamps and are then run on machines with different time zones. This can probably only be solved with careful review, and by hopefully having the ci run with different time zones than your local machines.

One (not directly related) issue I noticed while trying this out, is that the local patterns seem to require the millisecond part, while for utc timestamps with "Z" they are optional:

> select to_timestamp('2020-09-12T10:30:00') from test limit 1;
ArrowError(ExternalError(General("Error parsing \'2020-09-12T10:30:00\' as timestamp")))

> select to_timestamp('2020-09-12T10:30:00Z') from test limit 1;
+-------------------------------------------+
| totimestamp(Utf8("2020-09-12T10:30:00Z")) |
+-------------------------------------------+
| 1599906600000000000                       |
+-------------------------------------------+

@alamb
Copy link
Copy Markdown
Contributor Author

alamb commented Sep 12, 2020

Thank you @jhorstmann

My only concern is that this could cause tests to fail if they use local timestamps and are then run on machines with different time zones. This can probably only be solved with careful review, and by hopefully having the ci run with different time zones than your local machines.

Yes -- I agree. I have been bitten by this kind of issue in the past. The other fun one was when the local timezone offset changes due to daylight savings time. The other work around is to use a timestamp with an explicit timezone offset (eg ending in Z).

One (not directly related) issue I noticed while trying this out, is that the local patterns seem to require the millisecond part, while for utc timestamps with "Z" they are optional:

This is a good find and I will fix that in a subsequent PR.

@andygrove andygrove closed this in ad82762 Sep 12, 2020
andygrove pushed a commit that referenced this pull request Sep 16, 2020
…actional seconds

As pointed out on @jhorstmann: #8161 (comment)

> One (not directly related) issue I noticed while trying this out, is that the local patterns seem to require the millisecond part, while for utc timestamps with "Z" they are optional:

```
> select to_timestamp('2020-09-12T10:30:00') from test limit 1;
ArrowError(ExternalError(General("Error parsing \'2020-09-12T10:30:00\' as timestamp")))

> select to_timestamp('2020-09-12T10:30:00Z') from test limit 1;
+-------------------------------------------+
| totimestamp(Utf8("2020-09-12T10:30:00Z")) |
+-------------------------------------------+
| 1599906600000000000                       |
+-------------------------------------------+
```

This PR allows parsing local timestamp patterns without the fractional seconds and tests for same

Closes #8179 from alamb/alamb/ARROW-9986-fractional-secs

Authored-by: alamb <[email protected]>
Signed-off-by: Andy Grove <[email protected]>
@alamb alamb deleted the alamb/local_timezone branch October 26, 2020 20:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants