Skip to content

Added functions for working with the Custom week number#5212

Merged
vitlibar merged 16 commits intoClickHouse:masterfrom
andyyzh:custom_week_functions
Jul 12, 2019
Merged

Added functions for working with the Custom week number#5212
vitlibar merged 16 commits intoClickHouse:masterfrom
andyyzh:custom_week_functions

Conversation

@andyyzh
Copy link
Copy Markdown
Contributor

@andyyzh andyyzh commented May 7, 2019

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

For changelog. Remove if this is non-significant change.

Category (leave one):

  • New Feature
    Short description (up to few sentences):
    The week number 1 is the first week in year that contains some day, which default to January 1, it can be modify by parameter of function. Week begins at monday.
    In this way, users can customize the first week of the year through parameters.

Detailed description (optional):

toCustomWeek

Converts a date or date with time to a UInt8 number containing the Custom Week number.
toCustomWeek(DateTime, [, CustomDate][, Timezone])

toCustomYear

Converts a date or date with time to a UInt16 number containing the Custom Year number.
toCustomYear(DateTime, [, CustomDate][, Timezone])

toStartOfCustomYear

Rounds down a date or date with time to the first day of Custom year. Returns the date.
toStartOfCustomYear(DateTime, [, CustomDate][, Timezone])

Example:

By default, the week contains January 1 is the first week:
SELECT toDate('2016-12-27') AS date, toCustomYear(date) AS year, toCustomWeek(date) AS week, toStartOfCustomYear(date) AS startday;

┌───────date─┬─year─┬─week─┬───startday─┐
│ 2016-12-27 │ 2017 │ 1 │ 2016-12-26 │
└───────────┴─────┴─────┴─────────┘

The week contains January 28 is the first week:
SELECT toDate('2016-12-27') AS date, toCustomYear(date, '01-28') AS year, toCustomWeek(date,'01-28') AS week, toStartOfCustomYear(date,'01-28') AS startday;

┌───────date─┬─year─┬─week─┬───startday─┐
│ 2016-12-27 │ 2016 │ 49 │ 2016-01-25 │
└───────────┴────┴─────┴─────────┘

@alexey-milovidov alexey-milovidov added can be tested pr-feature Pull request with new product feature labels May 7, 2019
@blinkov
Copy link
Copy Markdown
Contributor

blinkov commented May 7, 2019

@andyyzh what's the usecase for this? is it somehow related to Lunar calendar?

@andyyzh
Copy link
Copy Markdown
Contributor Author

andyyzh commented May 8, 2019

@andyyzh what's the usecase for this? is it somehow related to Lunar calendar?

A week number computing rule: the week contained January 1st is the first week, contained a day to make configurable.
It has nothing to do with the lunar calendar.

USA, Canada, most of Latin America, Japan, Israel, South Korea, among others, use a week numbering system (called North American in our Calculator) in which the first week (numbered 1) of any given year is the week which contains January 1st.
https://www.calendar-12.com/week_number

@alexey-milovidov
Copy link
Copy Markdown
Member

I'm trying to figure out the motivation for this implementation. (Sorry, it doesn't look clear for me.)

It looks like that the initial motivation is to provide week numbering to start first week from January 1st.

But you probably have noticed that the ISO week numbering starts from the first week containing January 4th. And you wanted to generalize this function, so you've added the possibility to start the numbering from any selected day within a year. Is this really needed?

I'm asking because it looks like a false generalization. ISO week numbering starts from the first week containing January 4th, because, coincidentally, it is also the first week that contains majority of days (at least 4 of 7) within a year.

And the possibility to start from the first week containing January 3rd or January 5th doesn't make much sense. It is still can make sense to start from a week containing January 1st (first week with at least one day within year) or January 7th (first full week within a year).

But I wonder if it is motivated by some unusual local calendars...

@alexey-milovidov
Copy link
Copy Markdown
Member

For week numbering system, at least the following customizations make sense:

  • start a week from Monday (as in Europe) or Sunday (as in USA) or Saturday;
  • allow numbered week to cross the year boundary (as in ISO-8601); or reset the numbering at January 1st; or use number zero instead of one if a week belongs to previous year;
  • how to select the first week of a year: by first day of year or by majority of days within a year (ISO-8601) or containing completely within a year.

For example of (still incomplete) generalization of week numbering system, look at:
https://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_week
and
https://en.wikipedia.org/wiki/Week#Week_numbering

@andyyzh
Copy link
Copy Markdown
Contributor Author

andyyzh commented May 9, 2019

But you probably have noticed that the ISO week numbering starts from the first week containing January 4th. And you wanted to generalize this function, so you've added the possibility to start the numbering from any selected day within a year. Is this really needed?
toCustomWeek(DateTime, [, CustomDate][, Timezone])

Adding 'CustomDate' parameter is unnecessary, it's to make functions more general.

But I wonder if it is motivated by some unusual local calendars...

No special calendar requirement,

I plan to delete the 'CustomDate' parameter to make the purpose of the function clearer.

Do you want to change the name of the function? called NorthWeek or USWeek?

@alexey-milovidov
Copy link
Copy Markdown
Member

I plan to delete the 'CustomDate' parameter to make the purpose of the function clearer.
Do you want to change the name of the function? called NorthWeek or USWeek?

Ok. Yes, better to change the name.
USWeek is not appropriate, because weeks start at Sunday in US.
NorthWeek is too confusing.

Better to implement the week function from MySQL (at least for compatibility).
https://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_week

andyyzh added 2 commits May 11, 2019 19:51
update larst code to fix LFAlloc problem
Copy link
Copy Markdown
Member

@alexey-milovidov alexey-milovidov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.

@andyyzh
Copy link
Copy Markdown
Contributor Author

andyyzh commented May 20, 2019

Prepare to add these functions that similar to mysql, and add new mode:

WEEK(date[,mode])
YEAR(date[,mode])
YEARWEEK(date[,mode])

I'll commit it in two weeks.

@alexey-milovidov
Copy link
Copy Markdown
Member

Ok.

@vitlibar vitlibar self-assigned this May 31, 2019
@andyyzh
Copy link
Copy Markdown
Contributor Author

andyyzh commented Jun 9, 2019

Implementing functions similar to those in MySQL;

week(date[,mode])

This function returns the week number for date or datetime. The two-argument form of week() enables you to specify whether the week starts on Sunday or Monday and whether the return value should be in the range from 0 to 53 or from 1 to 53. If the mode argument is omitted, the default mode is 0.
The following table describes how the mode argument works.

Mode First day of week Range Week 1 is the first week …
0 Sunday 0-53 with a Sunday in this year
1 Monday 0-53 with 4 or more days this year
2 Sunday 1-53 with a Sunday in this year
3 Monday 1-53 with 4 or more days this year
4 Sunday 0-53 with 4 or more days this year
5 Monday 0-53 with a Monday in this year
6 Sunday 1-53 with 4 or more days this year
7 Monday 1-53 with a Monday in this year
8 Sunday 1-53 contains January 1
9 Monday 1-53 contains January 1

For mode values with a meaning of “with 4 or more days this year,” weeks are numbered according to ISO 8601:1988:

  • If the week containing January 1 has 4 or more days in the new year, it is week 1.

  • Otherwise, it is the last week of the previous year, and the next week is week 1.

For mode values with a meaning of “contains January 1”, the week contains January 1 is week 1. It doesn't matter how many days in the new year the week contained, even if it contained only one day.

week(DateTime1, [, mode][, Timezone])

Parameters

  • DateTime1 – Date or DateTime.
  • mode – Optional parameter, Range of values is [0,9], default is 0.
  • Timezone – Optional parameter, it behaves like any other conversion function.

Example

SELECT toDate('2016-12-27') AS date, week(date) AS week0, week(date,1) AS week1, week(date,9) AS week9;
┌───────date─┬─week0─┬─week1─┬─week9─┐
│ 2016-12-27 │    52 │    52 │     1 │
└────────────┴───────┴───────┴───────┘

yearWeek(date[,mode])

Returns year and week for a date. The year in the result may be different from the year in the date argument for the first and the last week of the year.

The mode argument works exactly like the mode argument to week(). For the single-argument syntax, a mode value of 0 is used.

Example

SELECT toDate('2016-12-27') AS date, yearWeek(date) AS yearWeek0, yearWeek(date,1) AS yearWeek1, yearWeek(date,9) AS yearWeek9;
┌───────date─┬─yearWeek0─┬─yearWeek1─┬─yearWeek9─┐
│ 2016-12-27 │    201652 │    201652 │    201701 │
└────────────┴───────────┴───────────┴───────────┘

@andyyzh
Copy link
Copy Markdown
Contributor Author

andyyzh commented Jun 9, 2019

@andyyzh
Copy link
Copy Markdown
Contributor Author

andyyzh commented Jun 13, 2019

@vitlibar , I'm done. please review code, thank you.


Converts a date or date with time to a UInt16 number containing the Custom Year number.
```
toCustomYear(DateTime, [, CustomDate][, Timezone])
Copy link
Copy Markdown
Member

@vitlibar vitlibar Jun 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems your PR doesn't implement the function toCustomYear anymore.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it needs to be deleted.

For mode values with a meaning of “contains January 1”, the week contains January 1 is week 1. It doesn't matter how many days in the new year the week contained, even if it contained only one day.

```
week(DateTime1, [, mode][, Timezone])
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too complicated name for the documentation. Please rename DateTime1 => date

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

toCustomYear(DateTime, [, CustomDate][, Timezone])
```

## week(date[,mode])
Copy link
Copy Markdown
Member

@vitlibar vitlibar Jun 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this function a generalization of toISOWeek() ? If so it would be nice to write about it here.

Copy link
Copy Markdown
Contributor Author

@andyyzh andyyzh Jun 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I mentioned it in here.

toISOWeek() is a compatibility function that is equivalent to week(date,3).
Here is the test case for toISOWeek:

SELECT 
    toDate('2018-12-25') + number AS x, 
    toDateTime(x) AS x_t, 
    toISOWeek(x) AS w, 
    toISOWeek(x_t) AS wt
FROM system.numbers 
LIMIT 10;
┌──────────x─┬─────────────────x_t─┬──w─┬─wt─┐
│ 2018-12-25 │ 2018-12-25 00:00:00 │ 52 │ 52 │
│ 2018-12-26 │ 2018-12-26 00:00:00 │ 52 │ 52 │
│ 2018-12-27 │ 2018-12-27 00:00:00 │ 52 │ 52 │
│ 2018-12-28 │ 2018-12-28 00:00:00 │ 52 │ 52 │
│ 2018-12-29 │ 2018-12-29 00:00:00 │ 52 │ 52 │
│ 2018-12-30 │ 2018-12-30 00:00:00 │ 52 │ 52 │
│ 2018-12-31 │ 2018-12-31 00:00:00 │  1 │  1 │
│ 2019-01-01 │ 2019-01-01 00:00:00 │  1 │  1 │
│ 2019-01-02 │ 2019-01-02 00:00:00 │  1 │  1 │
│ 2019-01-03 │ 2019-01-03 00:00:00 │  1 │  1 │
└────────────┴─────────────────────┴────┴────┘
SELECT 
    toDate('2018-12-25') + number AS x, 
    toDateTime(x) AS x_t, 
    week(x, 3) AS w, 
    week(x_t, 3) AS wt
FROM system.numbers 
LIMIT 10;

Returns the same result.


/// For DateTime, if time zone is specified, attach it to type.
if (std::is_same_v<ToDataType, DataTypeDateTime>)
return std::make_shared<DataTypeDateTime>(extractTimeZoneNameFromFunctionArguments(arguments, 1, 0));
Copy link
Copy Markdown
Member

@vitlibar vitlibar Jun 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not very clear. The class template FunctionCustomWeekToSomething is used only to implement the week and yearWeek functions, isn't it? week() returns an integer 0..53, yearWeek returns a string yyyyww (or integer?). So it seems the return type should be DataTypeUInt8 or DataTypeString/DataTypeUInt16, not DataTypeDateTime.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll fix it right away,
week return DataTypeUInt8 yearWeek return DataTypeUInt32

@andyyzh
Copy link
Copy Markdown
Contributor Author

andyyzh commented Jun 14, 2019

All checks have passed.

|6|Sunday|1-53|with 4 or more days this year
|7|Monday|1-53|with a Monday in this year
|8|Sunday|1-53|contains January 1
|9|Monday|1-53|contains January 1
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't see the modes 8 and 9 in the MySQL documentation. Did you take them from somewhere else?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks

#endif
# if !__clang__
# pragma GCC diagnostic ignored "-Wmaybe-uninitialized"
# endif
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you change this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't change it manually, automatically modified by the clang-format.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-format can format only your changes

#define WEEK_MONDAY_FIRST 1
#define WEEK_YEAR 2
#define WEEK_FIRST_WEEKDAY 4
#define WEEK_NEWYEAR_DAY 8
Copy link
Copy Markdown
Member

@vitlibar vitlibar Jun 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Macros don't respect namespaces, it's better to avoid them. I think enum class WeakMode (with constants from 0 to 9) here would be better.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I'll fix it.

}

inline unsigned toISOWeek(time_t t) const
inline unsigned calc_yearWeek(DayNum d, UInt32 week_mode) const
Copy link
Copy Markdown
Member

@vitlibar vitlibar Jun 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you don't need two functions calc_week() and calc_yearWeek() because you have to calculate year anyway. So it seems it's enough to have only function

std::pair<UInt16 /* year */, UInt8 /* week */> toYearAndWeek(DayNum d, WeekMode week_mode) const

and WeekImpl::execute() and YearWeekImpl::execute() will get from this pair everything they need.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I'll fix it.


inline unsigned toRelativeQuarterNum(DayNum d) const
/*
Calculate nr of day since year 0 in new date-system (from 1615)
Copy link
Copy Markdown
Member

@vitlibar vitlibar Jun 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of this fuinction? We already have lut and years_lut arrays which allow to perform date/time calculations very quickly. And we don't have to support years from 1615 because date in ClickHouse is stored as uint16 numbers of days since Unix epoch.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is ported from mysql and I'm trying to change it.

{
return DayNum(d - (lut[d].day_of_week - 1));
}
inline DayNum toFirstDayNumOfWeek(DayNum d) const { return DayNum(d - (lut[d].day_of_week - 1)); }
Copy link
Copy Markdown
Member

@vitlibar vitlibar Jun 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not 100% related to your PR, but there are two other functions toFirstDayNumOfWeek() and toStartOfWeekInterval() and it seems they should get the new parameter week_mode too to find out whether weeks start on Sunday or on Monday.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me do it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vitlibar
The default value of week_mode is 0, which is the first day of the week on Sunday.
toFirstDayNumOfWeek() is called toMonday for users, and if you add new parameter week_mode, it will output Sunday or Monday, which is confusing.

https://clickhouse.yandex/docs/en/query_language/functions/date_time_functions/#tomonday

struct ToMondayImpl
{
    static constexpr auto name = "toMonday";

    static inline UInt16 execute(UInt32 t, const DateLUTImpl & time_zone)
    {
        return time_zone.toFirstDayNumOfWeek(time_zone.toDayNum(t));
    }

Does default week_mode value need to be added to the system settings?
The default value is recommended to be 3, so that we can remain the consistency of the original implementation.

toStartOfWeekInterval() itself has several parameters, isn't it too complicated for users to add week_mode? There is no week interval calculation in the document.
https://clickhouse.yandex/docs/en/query_language/functions/date_time_functions/#tostartofinterval-time-or-data-interval-x-unit-time-zone
toStartOfInterval(time_or_data, INTERVAL x unit [, time_zone])

Summary:

  1. add new function:toStartOfWeek(t[,mode]) to find out whether week start on Sunday or on Monday.
  2. Add a week_mode configuration item to the system settings, default value of week_mode is 3.
  3. toStartOfInterval does not change any more.

Can you see if this is ok?

Copy link
Copy Markdown
Member

@vitlibar vitlibar Jul 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I agree about toMonday(). Why do you think the default value for default_week_mode should be 3? In MySQL it's 0.

SELECT yearWeek(toDate('2000-01-01'),1) AS w2000, yearWeek(toDate('2001-01-01'),1) AS w2001, yearWeek(toDate('2002-01-01'),1) AS w2002,yearWeek(toDate('2003-01-01'),1) AS w2003, yearWeek(toDate('2004-01-01'),1) AS w2004, yearWeek(toDate('2005-01-01'),1) AS w2005, yearWeek(toDate('2006-01-01'),1) AS w2006;
SELECT yearWeek(toDate('2000-01-06'),1) AS w2000, yearWeek(toDate('2001-01-06'),1) AS w2001, yearWeek(toDate('2002-01-06'),1) AS w2002,yearWeek(toDate('2003-01-06'),1) AS w2003, yearWeek(toDate('2004-01-06'),1) AS w2004, yearWeek(toDate('2005-01-06'),1) AS w2005, yearWeek(toDate('2006-01-06'),1) AS w2006;
SELECT week(toDate('1998-12-31'),2),week(toDate('1998-12-31'),3), week(toDate('2000-01-01'),2), week(toDate('2000-01-01'),3);
SELECT week(toDate('2000-12-31'),2),week(toDate('2000-12-31'),3);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost all the date/time functions in ClickHouse have prefix "to". For consistency it would be nice to rename week() => toWeek(), and yearWeek() => toYearWeek(), and add case-insensitive aliases week, yearWeek (like this) for compatibility with MySQL.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of supporting mysql syntax.
I'm going to change it to week() => toWeek(), and yearWeek() => toYearWeek().

@andyyzh
Copy link
Copy Markdown
Contributor Author

andyyzh commented Jun 19, 2019

All previous tests have passed. I don't think the failing tests are caused by my change.
@alexey-milovidov If you have time, take a look.

@andyyzh
Copy link
Copy Markdown
Contributor Author

andyyzh commented Jun 24, 2019

@vitlibar
All the review issues have been fixed except toStartOfWeekInterval() function was not added.
Please review again.

@alexey-milovidov
Copy link
Copy Markdown
Member

@vitlibar went on vacation. I will reassign this task to someone else.

@andyyzh
Copy link
Copy Markdown
Contributor Author

andyyzh commented Jul 4, 2019

@vitlibar went on vacation. I will reassign this task to someone else.

ok, thanks!

@vitlibar vitlibar merged commit fd89a8b into ClickHouse:master Jul 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-feature Pull request with new product feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants