Closed
Description
We would like to introduce a new format for raw forward indexes that does not require a constant number of documents per chunk and instead partitions columns into chunks based on uncompressed size. We expect this design to lead to:
- lower memory consumption when a raw column contains large values
- fewer chunks than when the number of documents per chunk is derived from column statistics
- more balanced chunk sizes than when the number of documents per chunk is derived from column statistics
- support for realtime segments, because chunk sizing no longer depends on column statistics
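To illustrate the difference, here is a minimal sketch that compares the two chunking strategies on the same data. It assumes a simple greedy rule: a chunk is closed as soon as the next value would push it past a target uncompressed size. The function names, the 64,000-byte target, and the "derive documents per chunk from the longest value" baseline are hypothetical choices made for illustration, not the actual writer implementation.

```python
from typing import List


def partition_by_size(values: List[bytes], target_chunk_bytes: int) -> List[List[bytes]]:
    """Greedily pack values into chunks of roughly target_chunk_bytes uncompressed.

    Only the running size of the open chunk is tracked, so no column-wide
    statistics (such as the maximum value length) are needed up front.
    A single value larger than the target gets a chunk of its own.
    """
    if target_chunk_bytes <= 0:
        raise ValueError("target_chunk_bytes must be positive")
    chunks: List[List[bytes]] = []
    current: List[bytes] = []
    current_size = 0
    for value in values:
        # Close the open chunk if this value would overflow it.
        if current and current_size + len(value) > target_chunk_bytes:
            chunks.append(current)
            current, current_size = [], 0
        current.append(value)
        current_size += len(value)
    if current:
        chunks.append(current)
    return chunks


def partition_by_doc_count(values: List[bytes], target_chunk_bytes: int) -> List[List[bytes]]:
    """Hypothetical baseline: a constant documents-per-chunk derived from the longest value."""
    max_len = max((len(v) for v in values), default=1)
    docs_per_chunk = max(1, target_chunk_bytes // max(1, max_len))
    return [values[i:i + docs_per_chunk] for i in range(0, len(values), docs_per_chunk)]


# 1,000 small values followed by three large outliers.
values = [b"x" * 10] * 1000 + [b"y" * 50_000] * 3
size_chunks = partition_by_size(values, target_chunk_bytes=64_000)
count_chunks = partition_by_doc_count(values, target_chunk_bytes=64_000)
print(len(size_chunks), len(count_chunks))  # prints: 3 1003
```

With a few large outliers, the derived documents-per-chunk collapses to 1, producing 1,003 chunks of wildly different sizes (10 bytes to 50,000 bytes), while size-based packing yields 3 chunks of 60,000, 50,000 and 50,000 bytes.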
The format would be opt-in for the foreseeable future.
kishoreg