
Raw forward index enhancement proposal #7616

@richardstartin

Description


We would like to introduce a new format for raw forward indexes that does not require a constant number of documents per chunk, and instead partitions columns into chunks based on uncompressed size. We expect this design to lead to:

  • lower memory consumption when a raw column contains large values
  • fewer chunks than when the number of documents per chunk is derived from column statistics
  • more balanced chunk sizes than when the number of documents per chunk is derived
  • support for realtime segments, by breaking the dependency on column statistics for chunk sizing
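To make the contrast concrete, here is a minimal Python sketch of the two chunking strategies. It rests on stated assumptions rather than on Pinot's code. The "derived" baseline is assumed to compute documents per chunk as the target chunk size divided by the longest value. The size-based writer closes a chunk once the next value would push its uncompressed size past the target. Because chunks no longer hold a fixed number of documents, a reader can no longer locate a chunk as `docId / docsPerChunk`. The sketch assumes it instead records each chunk's starting docId and binary-searches that list. All function names and the 1 MiB target are hypothetical, chosen for illustration only.

```python
import bisect
from typing import List, Tuple


def fixed_doc_chunks(values: List[bytes], target_chunk_size: int) -> List[List[bytes]]:
    """Assumed baseline: documents per chunk derived from the longest value."""
    longest = max((len(v) for v in values), default=0)
    docs_per_chunk = max(1, target_chunk_size // max(1, longest))
    return [values[i:i + docs_per_chunk] for i in range(0, len(values), docs_per_chunk)]


def size_based_chunks(values: List[bytes], target_chunk_size: int) -> List[List[bytes]]:
    """Hypothetical size-based writer: flush when the next value would exceed the target."""
    chunks: List[List[bytes]] = []
    current: List[bytes] = []
    current_size = 0
    for value in values:
        # Flush the open chunk first; a value larger than the target
        # ends up alone in its own chunk rather than being split.
        if current and current_size + len(value) > target_chunk_size:
            chunks.append(current)
            current, current_size = [], 0
        current.append(value)
        current_size += len(value)
    if current:
        chunks.append(current)
    return chunks


def chunk_start_doc_ids(chunks: List[List[bytes]]) -> List[int]:
    """Record the first docId of every chunk (assumed per-chunk metadata)."""
    starts: List[int] = []
    doc_id = 0
    for chunk in chunks:
        starts.append(doc_id)
        doc_id += len(chunk)
    return starts


def locate(starts: List[int], doc_id: int) -> Tuple[int, int]:
    """Binary-search the chunk holding doc_id; return (chunk index, offset in chunk)."""
    chunk_index = bisect.bisect_right(starts, doc_id) - 1
    return chunk_index, doc_id - starts[chunk_index]
```

For example, with 1,000 ten-byte values followed by one 2,000,000-byte value and a target of `1 << 20` bytes, the derived strategy falls to one document per chunk and produces 1,001 chunks. The size-based strategy produces 2 chunks, holding 10,000 and 2,000,000 bytes. The cost of variable-length chunks is the extra start-docId list and an O(log n) lookup per read in place of a division.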

The format would be opt-in for the foreseeable future.

Design document
