
Date time64 extended range #9404

Merged
alexey-milovidov merged 76 commits into ClickHouse:master from Enmk:DateTime64_extended_range
Mar 17, 2021

@Enmk (Contributor) commented Feb 27, 2020

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Extended range of DateTime64 to properly support dates from year 1925 to 2283. Improved support of DateTime around zero date (1970-01-01).
...

Detailed description / Documentation draft:
The year 1925 is the starting point because most time zones switched to saner (mostly 15-minute-based) offsets during 1924 or earlier, which significantly simplifies the implementation.

The upper bound of 2283 simplifies the arithmetic for sanitizing LUT index access: there are fewer than 0x1ffff days from 1925 to that point.
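
To illustrate the arithmetic, here is a minimal sketch (not the code from this PR) of why a day range that fits in 0x1ffff lets an index be sanitized with a single bitwise AND. The names `kLutMask`, `kEpochOffsetDays`, `sanitizeIndex`, and `daysFromCivil` are hypothetical, and the layout (slot 0 is 1925-01-01) is an assumption. The day-count helper is the standard proleptic Gregorian calculation.

```cpp
#include <cstdint>

// Hypothetical mask matching the 0x1ffff figure from the description.
constexpr std::uint32_t kLutMask = 0x1ffff;

// Days since 1970-01-01 for a proleptic Gregorian date (standard civil-calendar arithmetic).
constexpr std::int64_t daysFromCivil(std::int64_t y, unsigned m, unsigned d)
{
    y -= m <= 2;
    const std::int64_t era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);       // [0, 399]
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;  // [0, 365]
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;      // [0, 146096]
    return era * 146097 + static_cast<std::int64_t>(doe) - 719468;
}

// Number of days between 1925-01-01 and 1970-01-01 (assumed position of the epoch in the table).
constexpr std::int64_t kEpochOffsetDays = -daysFromCivil(1925, 1, 1);

// Map "days since 1970" (possibly negative) to a table slot.
// The AND keeps any input inside the table without a range check.
constexpr std::uint32_t sanitizeIndex(std::int64_t days_since_epoch)
{
    return static_cast<std::uint32_t>(days_since_epoch + kEpochOffsetDays) & kLutMask;
}

// The start of 2283 is still fewer than 0x1ffff days away from 1925-01-01.
static_assert(daysFromCivil(2283, 1, 1) + kEpochOffsetDays < 0x1ffff, "range must fit the mask");
```

With this layout, `sanitizeIndex(0)` lands on the slot for 1970-01-01, and out-of-range inputs wrap around instead of reading past the table.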

  • Extended DateLUTImpl internal LUT to 0x1ffff items, some of which represent negative (pre-1970) time values.
    As a collateral benefit, Date now correctly supports dates up to 2149 (instead of 2106).
  • Added a new strong typedef ExtendedDayNum, which represents dates before 1970 and after 2149.
  • Functions that used to return DayNum now return ExtendedDayNum.
  • Refactored DateLUTImpl to untie DayNum from its dual role as both a value and an index (needed because of negative time). The index is now a separate type, LUTIndex, with explicit conversion functions from DayNum, time_t, and ExtendedDayNum.
  • Updated DateLUTImpl to properly support values close to epoch start (1970-01-01 00:00), including negative ones.
  • Reduced the resolution of DateLUTImpl::Values::time_at_offset_change to a multiple of 15 minutes, which allows storing a 64-bit time_t in DateLUTImpl::Values while keeping the same size.
  • Minor performance updates to DateLUTImpl when building month LUT by skipping non-start-of-month days.
  • Fixed extractTimeZoneFromFunctionArguments to work correctly with DateTime64.
  • New unit-tests and stateless integration tests for both DateTime and DateTime64.
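
The separation of value and index types described in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual DateLUTImpl code: the struct definitions, `kDaynumOffsetEpoch`, `toLUTIndex`, `toDayNum`, `packQuarterHours`, and `unpackQuarterHours` are hypothetical names, and the table is assumed to start at 1925-01-01.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the strong typedefs named in the description.
struct DayNum         { std::uint16_t value; };  // days since 1970-01-01, never negative
struct ExtendedDayNum { std::int32_t  value; };  // may be negative (pre-1970) or exceed 65535
struct LUTIndex       { std::uint32_t value; };  // position inside the table, never negative

// Assumed layout: slot 0 is 1925-01-01, so 1970-01-01 sits 16436 slots in.
constexpr std::int32_t  kDaynumOffsetEpoch = 16436;
constexpr std::uint32_t kLutMask = 0x1ffff;

// Explicit conversions: a day number is shifted by the epoch offset and masked to stay in bounds.
constexpr LUTIndex toLUTIndex(DayNum d)
{
    return LUTIndex{(static_cast<std::uint32_t>(d.value) + kDaynumOffsetEpoch) & kLutMask};
}

constexpr LUTIndex toLUTIndex(ExtendedDayNum d)
{
    return LUTIndex{static_cast<std::uint32_t>(d.value + kDaynumOffsetEpoch) & kLutMask};
}

// Going back from an index yields the extended type, because the result may be negative.
constexpr ExtendedDayNum toDayNum(LUTIndex i)
{
    return ExtendedDayNum{static_cast<std::int32_t>(i.value) - kDaynumOffsetEpoch};
}

// Storing a time of day in 15-minute units: a day has 96 quarter-hours, so 8 bits are enough.
constexpr std::uint8_t packQuarterHours(std::uint32_t seconds_into_day)
{
    return static_cast<std::uint8_t>(seconds_into_day / 900);
}

constexpr std::uint32_t unpackQuarterHours(std::uint8_t quarters)
{
    return quarters * 900u;
}
```

Because the conversions are explicit functions, a negative ExtendedDayNum cannot be used as an array subscript by accident, and the quarter-hour packing shows how a seconds field can shrink to one byte when every stored value is a multiple of 15 minutes.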

Progress:

  • Larger LUT in DateLUTImpl that supports negative time too
  • Cleanup
  • Add more high-level tests (⚠️ Help needed ⚠️)
  • Fix remaining tests

closes #7316

@Enmk Enmk mentioned this pull request Mar 9, 2020
@Enmk Enmk changed the title WIP: Date time64 extended range Date time64 extended range Mar 31, 2020
@blinkov blinkov added the pr-feature Pull request with new product feature label Apr 1, 2020
@alexey-milovidov alexey-milovidov removed their assignment May 13, 2020
@4ertus2 4ertus2 self-assigned this May 18, 2020
Contributor

static constexpr const unsigned seconds_in_day = 86400; ?

Contributor

Do we only change tabs here?

@alexey-milovidov
Member

The good news is that most DateTime functions are not slowed down.

@alexey-milovidov
Member

Parsing does not work correctly:

SELECT toDateTime64('1971-01-01 00:00:00', 0) - toIntervalYear(45)

Query id: 9387ca2b-6243-4bfa-b615-3cfee663d747

┌─minus(toDateTime64('1971-01-01 00:00:00', 0), toIntervalYear(45))─┐
│                                               1926-01-01 00:00:00 │
└───────────────────────────────────────────────────────────────────┘

1 rows in set. Elapsed: 0.002 sec. 

milovidov-desktop :) SELECT toDateTime64('1926-01-01 00:00:00', 0)

SELECT toDateTime64('1926-01-01 00:00:00', 0)

Query id: 4c999cd5-9300-4d6f-86d7-7057bcf1627d

┌─toDateTime64('1926-01-01 00:00:00', 0)─┐
│                    2062-02-06 07:28:16 │
└────────────────────────────────────────┘

1 rows in set. Elapsed: 0.002 sec.
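
One possible reading of this output, which the thread does not confirm, is that the negative timestamp passed through an unsigned 32-bit integer somewhere during parsing. The sketch below only checks that arithmetic in UTC. `kSeconds1926Utc`, `truncateToUInt32`, and `splitTimestamp` are hypothetical names, and the truncation itself is an assumption about the failure mode.

```cpp
#include <cstdint>

// 1926-01-01 00:00:00 UTC is 16071 days before the Unix epoch.
constexpr std::int64_t kSeconds1926Utc = -16071LL * 86400;  // -1388534400

// Assumed failure mode: the signed value is converted to an unsigned 32-bit integer,
// which adds 2^32 to any negative input in this range.
constexpr std::uint32_t truncateToUInt32(std::int64_t t)
{
    return static_cast<std::uint32_t>(t);
}

struct DayAndSeconds
{
    std::uint32_t days;     // whole days since 1970-01-01
    std::uint32_t seconds;  // seconds into that day
};

constexpr DayAndSeconds splitTimestamp(std::uint32_t t)
{
    return DayAndSeconds{t / 86400, t % 86400};
}
```

The wrapped value is 2906432896, which is day 33639 after the epoch (2062-02-06) plus 23296 seconds (06:28:16 UTC). The date and the 28:16 part match the reported result; the hour depends on the server's time zone, which the thread does not state.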

@alexey-milovidov
Member

Fixed.

@alexey-milovidov
Member

toStartOfDay does not work properly
toRelativeDayNum should be extended from UInt16

This is due to the limitations of DateTime / Date. Won't be fixed in this PR.

@alexey-milovidov
Member

toStartOfTenMinutes is slowed down only in time zones with a fractional offset (Asia/Kolkata), which is acceptable.

@alexey-milovidov
Member

Performance is now almost break-even.

@alexey-milovidov
Member

alexey-milovidov commented Mar 16, 2021

@KrishnaPG I also have an idea: we can make a LUT of 16777216-second intervals (about 194 days each) and assume that there is at most one time transition during each interval (daylight saving time or a global change). In each LUT cell we record the offset from the beginning of the year to the beginning of the interval, the offset to the transition and the amount of the transition (if any), the year number, and a flag indicating whether it is a leap year. Then everything is calculated with simple arithmetic and a few branches.

Unfortunately it looks less efficient than our current LUT by days (but I did not try).

For example, "round time to midnight" runs at about 300 000 000 iterations per second on a single CPU core, and I'm not going to give up this performance.
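
A rough sketch of what such an interval-based cell could look like, assuming the fields listed in the comment. This idea was not implemented in the PR; `IntervalCell`, `cellIndex`, `offsetInCell`, and `shiftAt` are hypothetical names, and using -1 as the "no transition" marker is an assumption.

```cpp
#include <cstdint>

constexpr int kIntervalBits = 24;  // 1 << 24 == 16777216 seconds, about 194 days

// Hypothetical cell layout following the fields listed in the comment.
struct IntervalCell
{
    std::int32_t seconds_since_year_start;  // offset of the interval start from the beginning of its year
    std::int32_t transition_offset;         // seconds from interval start to the transition, -1 if none
    std::int32_t transition_amount;         // seconds added to local time at the transition
    std::int16_t year;                      // year containing the interval start
    bool         is_leap_year;
};

// Which cell a timestamp falls into. For negative t this relies on an arithmetic right shift,
// which is guaranteed since C++20 and is what mainstream compilers do before that.
constexpr std::int64_t cellIndex(std::int64_t t)
{
    return t >> kIntervalBits;
}

// Position of the timestamp inside its cell, always in [0, 16777215].
constexpr std::int64_t offsetInCell(std::int64_t t)
{
    return t & ((std::int64_t{1} << kIntervalBits) - 1);
}

// Local-time shift that applies at offset `off` inside `cell`: one branch, no search.
constexpr std::int32_t shiftAt(const IntervalCell & cell, std::int64_t off)
{
    return (cell.transition_offset >= 0 && off >= cell.transition_offset) ? cell.transition_amount : 0;
}
```

Locating the cell is a shift and a mask, but every further result (day, month, midnight) still has to be derived arithmetically from the cell fields, which is presumably where the extra cost compared to the per-day LUT would come from.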

@alexey-milovidov
Member

alexey-milovidov commented Mar 16, 2021

Another idea is to place a pointer to a virtual table in the LUT cells. Basically, the idea is to make LUTCell a polymorphic class and allocate it in the LUT with placement new. This makes the simple cases very cheap, at the cost of one indirect function call.
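
A minimal sketch of that idea, assuming all cell types share one size so they can live in fixed slots. None of this is ClickHouse code: `LUTCell` here is a hypothetical class, and `TransitionCell`, `CellTable`, and `utcOffset` are invented for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical polymorphic cell: the base class handles the common case.
struct LUTCell
{
    std::int32_t offset;         // UTC offset (seconds) at the start of the day
    std::int32_t offset_after;   // UTC offset after the transition, if any
    std::int64_t transition_at;  // timestamp of the transition, if any

    LUTCell(std::int32_t off, std::int32_t after = 0, std::int64_t at = 0)
        : offset(off), offset_after(after), transition_at(at) {}
    virtual ~LUTCell() = default;

    // Simple case: one offset for the whole day, no branch.
    virtual std::int32_t utcOffset(std::int64_t /*t*/) const { return offset; }
};

// Rare case: the offset changes during the day.
struct TransitionCell final : LUTCell
{
    using LUTCell::LUTCell;
    std::int32_t utcOffset(std::int64_t t) const override
    {
        return t < transition_at ? offset : offset_after;
    }
};

static_assert(sizeof(TransitionCell) == sizeof(LUTCell), "all cell types must fit one slot");

template <std::size_t N>
class CellTable
{
public:
    CellTable()
    {
        for (std::size_t i = 0; i < N; ++i)
            cells[i] = new (storage + i * sizeof(LUTCell)) LUTCell(0);
    }
    ~CellTable()
    {
        for (std::size_t i = 0; i < N; ++i)
            cells[i]->~LUTCell();
    }
    CellTable(const CellTable &) = delete;
    CellTable & operator=(const CellTable &) = delete;

    // Replace slot i with a cell of the given dynamic type, constructed in place.
    template <typename Cell>
    void set(std::size_t i, std::int32_t off, std::int32_t after = 0, std::int64_t at = 0)
    {
        cells[i]->~LUTCell();
        cells[i] = new (storage + i * sizeof(LUTCell)) Cell(off, after, at);
    }

    // One indirect call per lookup, whatever the cell type.
    std::int32_t utcOffset(std::size_t i, std::int64_t t) const { return cells[i]->utcOffset(t); }

private:
    alignas(LUTCell) unsigned char storage[N * sizeof(LUTCell)];
    LUTCell * cells[N];  // pointers returned by placement new; kept only to keep the sketch simple
};
```

For example, after `CellTable<2> table; table.set<TransitionCell>(1, 3600, 7200, 1000);`, slot 0 answers with a constant offset while slot 1 switches from 3600 to 7200 at timestamp 1000. A real table would compute the cell address from the index rather than keep a separate pointer array.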

@alexey-milovidov alexey-milovidov merged commit d02726b into ClickHouse:master Mar 17, 2021
0x4ec7 added a commit to 0x4ec7/clickhouse-driver that referenced this pull request Jun 2, 2021
xzkostyan pushed a commit to mymarilyn/clickhouse-driver that referenced this pull request Jun 4, 2021
Development

Successfully merging this pull request may close these issues.

Support Dates and DateTimes outside of 1970-2105 range.

9 participants