Add support for Fractional limits and offsets#88755

Merged
nihalzp merged 121 commits intoClickHouse:masterfrom
0xgouda:add-fractional-limit-offset
Nov 8, 2025
Conversation

0xgouda (Contributor) commented Oct 17, 2025

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Add support for fractional LIMIT and OFFSET for selecting a fraction of a table. Closes #81892.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Details

Add support for fractional limits and offsets to retrieve percentages of a table, for both the new and old analyzers.

WITH TIES is also supported.

Examples:

-- get the median of the table
SELECT * FROM table ORDER BY id LIMIT 1 OFFSET 0.5;

-- returns the first 20% of the result
SELECT * FROM numbers(10) LIMIT 0.2;
   ┌─number─┐
1. │      0 │
2. │      1 │
   └────────┘

-- returns the 3rd quartile of the result
SELECT * FROM numbers(12) LIMIT 0.25 OFFSET 0.5;
   ┌─number─┐
1. │      6 │
2. │      7 │
3. │      8 │
   └────────┘

-- returns the first 10% of the result with ties
SELECT * FROM test ORDER BY id LIMIT 0.1 WITH TIES;
   ┌─id─┐
1. │  0 │
2. │  1 │
3. │  1 │
4. │  1 │
   └────┘
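The mapping from a fraction to a concrete row count is implicit in the examples above. A minimal sketch of one interpretation consistent with them (the fraction is applied to the total row count and truncated toward zero; `fractional_slice` is a hypothetical helper, not ClickHouse code):

```python
def fractional_slice(rows, limit_frac=None, offset_frac=None):
    """Apply a fractional LIMIT/OFFSET to a materialized row list.

    Assumes fraction * total_rows, truncated toward zero; this is an
    illustration of the observed semantics, not the actual implementation.
    """
    total = len(rows)
    start = int(total * offset_frac) if offset_frac else 0
    length = int(total * limit_frac) if limit_frac is not None else total
    return rows[start:start + length]

# SELECT * FROM numbers(10) LIMIT 0.2            -> [0, 1]
print(fractional_slice(list(range(10)), limit_frac=0.2))
# SELECT * FROM numbers(12) LIMIT 0.25 OFFSET 0.5 -> [6, 7, 8]
print(fractional_slice(list(range(12)), limit_frac=0.25, offset_frac=0.5))
```

Note that a fractional OFFSET needs the total row count before it can skip anything, which is the source of the buffering concern discussed below in the review.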

Closes #81892

…or limit and offset

- Used `DataTypeDecimal32(2,1)` for type conversion for now because
for some reason it doesn't recognise `DataTypeFloat32()`.
@0xgouda 0xgouda force-pushed the add-fractional-limit-offset branch from a5d615e to 0761e21 Compare November 6, 2025 19:19
nihalzp (Member) left a comment

I noticed that during the transforms, the algorithm first caches all chunks; only after all pulling is done does it start processing them. The issue is that this is slow, has high memory usage, and fails outright if the table is bigger than available memory.

For example:

SELECT *
FROM numbers(1e9)
OFFSET 0.99999999
10 rows in set. Elapsed: 3.646 sec. Processed 990.88 million rows, 7.93 GB (271.76 million rows/s., 2.17 GB/s.)
Peak memory usage: 7.45 GiB.

Here we are caching the entire table of 1e9 rows and that's why memory usage is high. If we used a bigger table whose size is bigger than available RAM, the query would fail.

But if we keep evicting front chunks that can never be part of the final result, we get a very low memory footprint:

SELECT *
FROM numbers(1e9)
LIMIT -10
10 rows in set. Elapsed: 0.328 sec. Processed 919.59 million rows, 7.36 GB (2.80 billion rows/s., 22.41 GB/s.)
Peak memory usage: 17.27 KiB.

Offset version:

SELECT *
FROM numbers(1e9)
OFFSET 1e9 - 10
10 rows in set. Elapsed: 0.320 sec. Processed 943.98 million rows, 7.55 GB (2.95 billion rows/s., 23.61 GB/s.)
Peak memory usage: 9.23 KiB.

For context, we can do some inference about already-cached chunks and either discard them or push them downstream to free up memory. For example, suppose we have a query with OFFSET 0.5. At one point in time, before we have pulled all the data, the cached chunks total 100 rows. Even though we do not yet know the total number of rows in the table, we can safely infer that the first 50 rows will be skipped, so any cached chunk lying entirely within those first 50 rows can be discarded.

Similarly for LIMIT: instead of discarding, we would push those chunks to the output ports.

This will require some extra bookkeeping but it can make the feature really high quality and blazingly fast!
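The eviction idea above can be illustrated with a toy streaming model (hypothetical names and a simplified chunk representation; the real transform works on ClickHouse `Chunk`s). Since `int(offset_frac * rows_seen)` can only grow as more rows arrive, any cached chunk that already ends before that lower bound is safe to drop immediately:

```python
from collections import deque

def fractional_offset_stream(chunks, offset_frac):
    """Toy model of the proposed FractionalOffsetTransform optimization.

    Chunks are lists of rows. While pulling, we evict cached chunks that
    lie entirely within the first int(offset_frac * rows_seen) rows,
    because the final skip count can only be larger than that bound.
    """
    cache = deque()          # (start_row_index, chunk) pairs still undecided
    rows_seen = 0
    for chunk in chunks:
        cache.append((rows_seen, chunk))
        rows_seen += len(chunk)
        lower_bound = int(offset_frac * rows_seen)  # monotonically grows
        # Evict chunks that end before the guaranteed-skipped prefix.
        while cache and cache[0][0] + len(cache[0][1]) <= lower_bound:
            cache.popleft()
    # All input pulled: the exact offset is now known.
    final_offset = int(offset_frac * rows_seen)
    out = []
    for start, chunk in cache:
        skip = max(0, final_offset - start)
        out.extend(chunk[skip:])
    return out

chunks = [list(range(i, i + 4)) for i in range(0, 20, 4)]  # rows 0..19
print(fractional_offset_stream(chunks, 0.9))   # keeps only the last 10%
```

At any moment the cache holds roughly the trailing `(1 - offset_frac)` share of the rows seen so far, which is why peak memory stays small for offsets close to 1.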


nihalzp commented Nov 7, 2025

It could be very tricky for FractionalLimitTransform, and it would not help much there anyway: for a fractional limit, especially a small one, we would need to keep most of the table in memory regardless.

However, I think optimizing FractionalOffsetTransform should be a simple change. For this initial version, I think this is okay; we could optimize FractionalLimitTransform later in a separate PR.

Let me know what you think.

- add an additional `getLimitLengthAndOffset()` parsing step to ensure the
fractional limit and offset are zero before adding a prelimit.
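The commit note describes a guard around the preliminary-LIMIT optimization. A minimal sketch of the idea (`should_add_preliminary_limit` is a hypothetical name; the actual ClickHouse logic lives in the query planner and differs in detail):

```python
def should_add_preliminary_limit(limit, offset):
    """Only apply the preliminary-LIMIT optimization when neither value
    has a fractional part: a fractional limit/offset needs the total row
    count, so truncating the stream early would be incorrect.
    (Illustrative sketch only.)
    """
    return float(limit).is_integer() and float(offset).is_integer()

print(should_add_preliminary_limit(10, 0))    # integer limit: prelimit is safe
print(should_add_preliminary_limit(0.2, 0))   # fractional limit: skip prelimit
```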
@0xgouda 0xgouda force-pushed the add-fractional-limit-offset branch from c458be7 to 7bd3b4a Compare November 7, 2025 13:49
0xgouda (Contributor, Author)

0xgouda commented Nov 7, 2025

I think optimizing for FractionalOffsetTransform should be simple change

I am not sure optimizing FractionalOffsetTransform is going to be that simple. We would remove the push phase entirely, apply the offset every couple of chunks during the pull phase as you proposed, and push all remaining chunks at the very end. But the point is (I am not very sure) there may be scenarios where we still need to apply part of the offset within those remaining chunks, and maintaining the residual offset integer until that final phase is going to be tricky, I guess.

I would prefer working on these, plus some more optimizations like limit pushdown, in a separate PR. This one already got too big.

0xgouda (Contributor, Author)

0xgouda commented Nov 7, 2025

I really enjoyed working on this. You are a very good reviewer, thanks.

I was planning to implement WITH TIES for the negative limit processor next, but I think I should go for the optimizations you proposed first.


nihalzp commented Nov 7, 2025

I see what you mean. I underestimated the complexity. Okay, let's go with your plan then!


nihalzp commented Nov 7, 2025

I was planning to implement WITH TIES for the negative limit processor next, but I think I should go for the optimizations you proposed first.

Anything that interests you would be very much welcome :)

nihalzp (Member) left a comment

Looks good to me!

Just a few small doc additions; I thought it would be better if we were more precise.

@0xgouda 0xgouda force-pushed the add-fractional-limit-offset branch from 2dfeda3 to 635e83e Compare November 7, 2025 15:51
@0xgouda 0xgouda force-pushed the add-fractional-limit-offset branch from f073ef0 to f3ca2bc Compare November 8, 2025 07:22

nihalzp commented Nov 8, 2025

@nihalzp nihalzp enabled auto-merge November 8, 2025 12:20
@nihalzp nihalzp disabled auto-merge November 8, 2025 12:23
@nihalzp nihalzp enabled auto-merge November 8, 2025 12:40
@nihalzp nihalzp added this pull request to the merge queue Nov 8, 2025
Merged via the queue into ClickHouse:master with commit f24b4ec Nov 8, 2025
126 of 131 checks passed
@robot-ch-test-poll1 robot-ch-test-poll1 added the pr-synced-to-cloud The PR is synced to the cloud repo label Nov 8, 2025
Labels

  • can be tested: Allows running workflows for external contributors
  • pr-feature: Pull request with new product feature
  • pr-synced-to-cloud: The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

RFC: Fractional LIMIT

5 participants