Added functions for working with the Custom week number#5212
Added functions for working with the Custom week number#5212vitlibar merged 16 commits intoClickHouse:masterfrom
Conversation
|
@andyyzh what's the usecase for this? is it somehow related to Lunar calendar? |
A week number computing rule: the week contained January 1st is the first week, contained a day to make configurable.
|
|
I'm trying to figure out the motivation for this implementation. (Sorry, it doesn't look clear for me.) It looks like that the initial motivation is to provide week numbering to start first week from January 1st. But you probably have noticed that the ISO week numbering starts from the first week containing January 4th. And you wanted to generalize this function, so you've added the possibility to start the numbering from any selected day within a year. Is this really needed? I'm asking because it looks like a false generalization. ISO week numbering starts from the first week containing January 4th, because, coincidentally, it is also the first week that contains majority of days (at least 4 of 7) within a year. And the possibility to start from the first week containing January 3rd or January 5th doesn't make much sense. It is still can make sense to start from a week containing January 1st (first week with at least one day within year) or January 7th (first full week within a year). But I wonder if it is motivated by some unusual local calendars... |
|
For week numbering system, at least the following customizations make sense:
For example of (still incomplete) generalization of week numbering system, look at: |
Adding 'CustomDate' parameter is unnecessary, it's to make functions more general.
No special calendar requirement, I plan to delete the 'CustomDate' parameter to make the purpose of the function clearer. Do you want to change the name of the function? called NorthWeek or USWeek? |
Ok. Yes, better to change the name. Better to implement the |
update larst code to fix LFAlloc problem
Merge larst code from master
|
Prepare to add these functions that similar to mysql, and add new mode:
I'll commit it in two weeks. |
|
Ok. |
update laster code
…ickHouse into custom_week_functions update to laster code
|
Implementing functions similar to those in MySQL; week(date[,mode])This function returns the week number for date or datetime. The two-argument form of week() enables you to specify whether the week starts on Sunday or Monday and whether the return value should be in the range from 0 to 53 or from 1 to 53. If the mode argument is omitted, the default mode is 0.
For mode values with a meaning of “with 4 or more days this year,” weeks are numbered according to ISO 8601:1988:
For mode values with a meaning of “contains January 1”, the week contains January 1 is week 1. It doesn't matter how many days in the new year the week contained, even if it contained only one day. Parameters
Example SELECT toDate('2016-12-27') AS date, week(date) AS week0, week(date,1) AS week1, week(date,9) AS week9;yearWeek(date[,mode])Returns year and week for a date. The year in the result may be different from the year in the date argument for the first and the last week of the year. The mode argument works exactly like the mode argument to week(). For the single-argument syntax, a mode value of 0 is used. Example SELECT toDate('2016-12-27') AS date, yearWeek(date) AS yearWeek0, yearWeek(date,1) AS yearWeek1, yearWeek(date,9) AS yearWeek9; |
|
@alexey-milovidov |
Update the latest code
|
@vitlibar , I'm done. please review code, thank you. |
|
|
||
| Converts a date or date with time to a UInt16 number containing the Custom Year number. | ||
| ``` | ||
| toCustomYear(DateTime, [, CustomDate][, Timezone]) |
There was a problem hiding this comment.
It seems your PR doesn't implement the function toCustomYear anymore.
There was a problem hiding this comment.
Yes, it needs to be deleted.
| For mode values with a meaning of “contains January 1”, the week contains January 1 is week 1. It doesn't matter how many days in the new year the week contained, even if it contained only one day. | ||
|
|
||
| ``` | ||
| week(DateTime1, [, mode][, Timezone]) |
There was a problem hiding this comment.
Too complicated name for the documentation. Please rename DateTime1 => date
| toCustomYear(DateTime, [, CustomDate][, Timezone]) | ||
| ``` | ||
|
|
||
| ## week(date[,mode]) |
There was a problem hiding this comment.
Is this function a generalization of toISOWeek() ? If so it would be nice to write about it here.
There was a problem hiding this comment.
Yes, I mentioned it in here.
toISOWeek() is a compatibility function that is equivalent to week(date,3).
Here is the test case for toISOWeek:
SELECT
toDate('2018-12-25') + number AS x,
toDateTime(x) AS x_t,
toISOWeek(x) AS w,
toISOWeek(x_t) AS wt
FROM system.numbers
LIMIT 10;┌──────────x─┬─────────────────x_t─┬──w─┬─wt─┐
│ 2018-12-25 │ 2018-12-25 00:00:00 │ 52 │ 52 │
│ 2018-12-26 │ 2018-12-26 00:00:00 │ 52 │ 52 │
│ 2018-12-27 │ 2018-12-27 00:00:00 │ 52 │ 52 │
│ 2018-12-28 │ 2018-12-28 00:00:00 │ 52 │ 52 │
│ 2018-12-29 │ 2018-12-29 00:00:00 │ 52 │ 52 │
│ 2018-12-30 │ 2018-12-30 00:00:00 │ 52 │ 52 │
│ 2018-12-31 │ 2018-12-31 00:00:00 │ 1 │ 1 │
│ 2019-01-01 │ 2019-01-01 00:00:00 │ 1 │ 1 │
│ 2019-01-02 │ 2019-01-02 00:00:00 │ 1 │ 1 │
│ 2019-01-03 │ 2019-01-03 00:00:00 │ 1 │ 1 │
└────────────┴─────────────────────┴────┴────┘
SELECT
toDate('2018-12-25') + number AS x,
toDateTime(x) AS x_t,
week(x, 3) AS w,
week(x_t, 3) AS wt
FROM system.numbers
LIMIT 10;Returns the same result.
|
|
||
| /// For DateTime, if time zone is specified, attach it to type. | ||
| if (std::is_same_v<ToDataType, DataTypeDateTime>) | ||
| return std::make_shared<DataTypeDateTime>(extractTimeZoneNameFromFunctionArguments(arguments, 1, 0)); |
There was a problem hiding this comment.
It's not very clear. The class template FunctionCustomWeekToSomething is used only to implement the week and yearWeek functions, isn't it? week() returns an integer 0..53, yearWeek returns a string yyyyww (or integer?). So it seems the return type should be DataTypeUInt8 or DataTypeString/DataTypeUInt16, not DataTypeDateTime.
There was a problem hiding this comment.
I'll fix it right away,
week return DataTypeUInt8 yearWeek return DataTypeUInt32
|
All checks have passed. |
| |6|Sunday|1-53|with 4 or more days this year | ||
| |7|Monday|1-53|with a Monday in this year | ||
| |8|Sunday|1-53|contains January 1 | ||
| |9|Monday|1-53|contains January 1 |
There was a problem hiding this comment.
I didn't see the modes 8 and 9 in the MySQL documentation. Did you take them from somewhere else?
There was a problem hiding this comment.
| #endif | ||
| # if !__clang__ | ||
| # pragma GCC diagnostic ignored "-Wmaybe-uninitialized" | ||
| # endif |
There was a problem hiding this comment.
I didn't change it manually, automatically modified by the clang-format.
There was a problem hiding this comment.
clang-format can format only your changes
| #define WEEK_MONDAY_FIRST 1 | ||
| #define WEEK_YEAR 2 | ||
| #define WEEK_FIRST_WEEKDAY 4 | ||
| #define WEEK_NEWYEAR_DAY 8 |
There was a problem hiding this comment.
Macros don't respect namespaces, it's better to avoid them. I think enum class WeakMode (with constants from 0 to 9) here would be better.
| } | ||
|
|
||
| inline unsigned toISOWeek(time_t t) const | ||
| inline unsigned calc_yearWeek(DayNum d, UInt32 week_mode) const |
There was a problem hiding this comment.
It looks like you don't need two functions calc_week() and calc_yearWeek() because you have to calculate year anyway. So it seems it's enough to have only function
std::pair<UInt16 /* year */, UInt8 /* week */> toYearAndWeek(DayNum d, WeekMode week_mode) const
and WeekImpl::execute() and YearWeekImpl::execute() will get from this pair everything they need.
|
|
||
| inline unsigned toRelativeQuarterNum(DayNum d) const | ||
| /* | ||
| Calculate nr of day since year 0 in new date-system (from 1615) |
There was a problem hiding this comment.
What's the purpose of this fuinction? We already have lut and years_lut arrays which allow to perform date/time calculations very quickly. And we don't have to support years from 1615 because date in ClickHouse is stored as uint16 numbers of days since Unix epoch.
There was a problem hiding this comment.
This code is ported from mysql and I'm trying to change it.
| { | ||
| return DayNum(d - (lut[d].day_of_week - 1)); | ||
| } | ||
| inline DayNum toFirstDayNumOfWeek(DayNum d) const { return DayNum(d - (lut[d].day_of_week - 1)); } |
There was a problem hiding this comment.
Not 100% related to your PR, but there are two other functions toFirstDayNumOfWeek() and toStartOfWeekInterval() and it seems they should get the new parameter week_mode too to find out whether weeks start on Sunday or on Monday.
There was a problem hiding this comment.
@vitlibar
The default value of week_mode is 0, which is the first day of the week on Sunday.
toFirstDayNumOfWeek() is called toMonday for users, and if you add new parameter week_mode, it will output Sunday or Monday, which is confusing.
struct ToMondayImpl
{
static constexpr auto name = "toMonday";
static inline UInt16 execute(UInt32 t, const DateLUTImpl & time_zone)
{
return time_zone.toFirstDayNumOfWeek(time_zone.toDayNum(t));
}Does default week_mode value need to be added to the system settings?
The default value is recommended to be 3, so that we can remain the consistency of the original implementation.
toStartOfWeekInterval() itself has several parameters, isn't it too complicated for users to add week_mode? There is no week interval calculation in the document.
https://clickhouse.yandex/docs/en/query_language/functions/date_time_functions/#tostartofinterval-time-or-data-interval-x-unit-time-zone
toStartOfInterval(time_or_data, INTERVAL x unit [, time_zone])
Summary:
- add new function:
toStartOfWeek(t[,mode])to find out whether week start on Sunday or on Monday. - Add a
week_modeconfiguration item to the system settings, default value of week_mode is 3. toStartOfIntervaldoes not change any more.
Can you see if this is ok?
There was a problem hiding this comment.
Ok, I agree about toMonday(). Why do you think the default value for default_week_mode should be 3? In MySQL it's 0.
| SELECT yearWeek(toDate('2000-01-01'),1) AS w2000, yearWeek(toDate('2001-01-01'),1) AS w2001, yearWeek(toDate('2002-01-01'),1) AS w2002,yearWeek(toDate('2003-01-01'),1) AS w2003, yearWeek(toDate('2004-01-01'),1) AS w2004, yearWeek(toDate('2005-01-01'),1) AS w2005, yearWeek(toDate('2006-01-01'),1) AS w2006; | ||
| SELECT yearWeek(toDate('2000-01-06'),1) AS w2000, yearWeek(toDate('2001-01-06'),1) AS w2001, yearWeek(toDate('2002-01-06'),1) AS w2002,yearWeek(toDate('2003-01-06'),1) AS w2003, yearWeek(toDate('2004-01-06'),1) AS w2004, yearWeek(toDate('2005-01-06'),1) AS w2005, yearWeek(toDate('2006-01-06'),1) AS w2006; | ||
| SELECT week(toDate('1998-12-31'),2),week(toDate('1998-12-31'),3), week(toDate('2000-01-01'),2), week(toDate('2000-01-01'),3); | ||
| SELECT week(toDate('2000-12-31'),2),week(toDate('2000-12-31'),3); |
There was a problem hiding this comment.
Almost all the date/time functions in ClickHouse have prefix "to". For consistency it would be nice to rename week() => toWeek(), and yearWeek() => toYearWeek(), and add case-insensitive aliases week, yearWeek (like this) for compatibility with MySQL.
There was a problem hiding this comment.
I was thinking of supporting mysql syntax.
I'm going to change it to week() => toWeek(), and yearWeek() => toYearWeek().
|
All previous tests have passed. I don't think the failing tests are caused by my change. |
update last code
|
@vitlibar |
|
@vitlibar went on vacation. I will reassign this task to someone else. |
ok, thanks! |
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
For changelog. Remove if this is non-significant change.
Category (leave one):
Short description (up to few sentences):
The week number 1 is the first week in year that contains some day, which default to January 1, it can be modify by parameter of function. Week begins at monday.
In this way, users can customize the first week of the year through parameters.
Detailed description (optional):
toCustomWeek
Converts a date or date with time to a UInt8 number containing the Custom Week number.
toCustomWeek(DateTime, [, CustomDate][, Timezone])
toCustomYear
Converts a date or date with time to a UInt16 number containing the Custom Year number.
toCustomYear(DateTime, [, CustomDate][, Timezone])
toStartOfCustomYear
Rounds down a date or date with time to the first day of Custom year. Returns the date.
toStartOfCustomYear(DateTime, [, CustomDate][, Timezone])
Example:
By default, the week contains January 1 is the first week:
SELECT toDate('2016-12-27') AS date, toCustomYear(date) AS year, toCustomWeek(date) AS week, toStartOfCustomYear(date) AS startday;
┌───────date─┬─year─┬─week─┬───startday─┐
│ 2016-12-27 │ 2017 │ 1 │ 2016-12-26 │
└───────────┴─────┴─────┴─────────┘
The week contains January 28 is the first week:
SELECT toDate('2016-12-27') AS date, toCustomYear(date, '01-28') AS year, toCustomWeek(date,'01-28') AS week, toStartOfCustomYear(date,'01-28') AS startday;
┌───────date─┬─year─┬─week─┬───startday─┐
│ 2016-12-27 │ 2016 │ 49 │ 2016-01-25 │
└───────────┴────┴─────┴─────────┘