Skip to content

Sample by 1M align to calendar with negative offset groups data incorrectly #6301

@jonas-debeuk

Description

@jonas-debeuk

To reproduce

It seems the OFFSET option is interacting incorrectly with the alignment to calendar for Month timeframe.
See the two minimal replication examples

-- Query 2: POSITIVE OFFSET -> will return the expected 6 rows of data

SELECT 
    date,
    COUNT(*) AS row_count,
    SUM(value) AS total_value
FROM (
    SELECT cast('2024-01-15T12:00:00.000000Z' as timestamp) as date, 100.0 as value FROM long_sequence(1)
    UNION ALL
    SELECT cast('2024-02-15T12:00:00.000000Z' as timestamp), 200.0 FROM long_sequence(1)
    UNION ALL
    SELECT cast('2024-03-15T12:00:00.000000Z' as timestamp), 300.0 FROM long_sequence(1)
    UNION ALL
    SELECT cast('2024-04-15T12:00:00.000000Z' as timestamp), 400.0 FROM long_sequence(1)
    UNION ALL
    SELECT cast('2024-05-15T12:00:00.000000Z' as timestamp), 500.0 FROM long_sequence(1)
    UNION ALL
    SELECT cast('2024-06-15T12:00:00.000000Z' as timestamp), 600.0 FROM long_sequence(1)
    ORDER BY date
)
SAMPLE BY 1M ALIGN TO CALENDAR WITH OFFSET '00:05';

-- Returns 
-- date	row_count	total_value
-- 2024-01-01T00:05:00.000000Z	1	100.0
-- 2024-02-01T00:05:00.000000Z	1	200.0
-- 2024-03-01T00:05:00.000000Z	1	300.0
-- 2024-04-01T00:05:00.000000Z	1	400.0
-- 2024-05-01T00:05:00.000000Z	1	500.0
-- 2024-06-01T00:05:00.000000Z	1	600.0

Query 1: NEGATIVE OFFSET -> this will return only 3 rows of data (1 every 2 months) where it groups the data of two months together

SELECT 
    date,
    COUNT(*) AS row_count,
    SUM(value) AS total_value
FROM (
    SELECT cast('2024-01-15T12:00:00.000000Z' as timestamp) as date, 100.0 as value FROM long_sequence(1)
    UNION ALL
    SELECT cast('2024-02-15T12:00:00.000000Z' as timestamp), 200.0 FROM long_sequence(1)
    UNION ALL
    SELECT cast('2024-03-15T12:00:00.000000Z' as timestamp), 300.0 FROM long_sequence(1)
    UNION ALL
    SELECT cast('2024-04-15T12:00:00.000000Z' as timestamp), 400.0 FROM long_sequence(1)
    UNION ALL
    SELECT cast('2024-05-15T12:00:00.000000Z' as timestamp), 500.0 FROM long_sequence(1)
    UNION ALL
    SELECT cast('2024-06-15T12:00:00.000000Z' as timestamp), 600.0 FROM long_sequence(1)
    ORDER BY date
)
SAMPLE BY 1M ALIGN TO CALENDAR WITH OFFSET '-00:05';

-- Returns
-- date	row_count	total_value
-- 2024-01-31T23:55:00.000000Z	2	300.0
-- 2024-03-31T23:55:00.000000Z	2	700.0
-- 2024-05-31T23:55:00.000000Z	2	1100.0

I've not seen any mention of this bug being fixed in any more recent versions in the release notes, so figured worth posting here.

QuestDB version:

8.1.4
UPDATE also true in version 9.1.2

OS, in case of Docker specify Docker and the Host OS:

Docker version 20.10.27, Container-Optimized OS from Google version 101

File System, in case of Docker specify Host File System:

unsure

Full Name:

Jonas De Beukelaer

Affiliation:

Prediko

Have you followed Linux, MacOs kernel configuration steps to increase Maximum open files and Maximum virtual memory areas limit?

  • Yes, I have

Additional context

No response

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions