Subcolumn storage for Map #78373

@EmeraldShift

Description

Company or project name

No response

Use case

Improved storage compression and query performance for map-structured data

Describe the solution you'd like

It would be nice for the underlying storage of, e.g., Map(LowCardinality(String), LowCardinality(String)) to be split into multiple subcolumns, where each subcolumn contains only the values corresponding to a single key. This can lead to much better compression when there are long runs of identical values (for example, common key-value labels attached to observability data). A reasonable limit should be in place to prevent an explosion of columns on disk.
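A minimal sketch of the split described above, in plain Python rather than ClickHouse internals. The cap on dedicated subcolumns and the overflow fallback are illustrative assumptions, not a description of how ClickHouse would implement it:

```python
def split_into_subcolumns(rows, max_subcolumns=2):
    """Split map-typed rows into per-key value streams, with a cap.

    Keys beyond the cap stay in a shared overflow map, so the number of
    on-disk columns stays bounded (an assumption for illustration).
    """
    # Choose which keys get dedicated subcolumns (here: first-seen order).
    chosen = []
    for row in rows:
        for k in row:
            if k not in chosen and len(chosen) < max_subcolumns:
                chosen.append(k)
    # One value stream per chosen key; None marks "key absent in this row".
    subcolumns = {k: [row.get(k) for row in rows] for k in chosen}
    # Remaining keys fall back to a shared map, as in a plain Map column.
    overflow = [{k: v for k, v in row.items() if k not in chosen} for row in rows]
    return subcolumns, overflow

rows = [
    {"env": "prod", "region": "us-east-1"},
    {"env": "prod", "region": "us-east-1", "team": "obs"},
    {"env": "dev"},
]
subcolumns, overflow = split_into_subcolumns(rows)
print(subcolumns)  # {'env': ['prod', 'prod', 'dev'], 'region': ['us-east-1', 'us-east-1', None]}
print(overflow)    # [{}, {'team': 'obs'}, {}]
```

Each per-key stream (e.g. the 'env' stream above) is exactly the kind of long run of identical values that compresses well.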

Describe alternatives you've considered

There are plenty of alternatives, here are some thoughts on each of them:

  • Plain Map
    • As I understand it, the current storage for a map splats out each key and value consecutively in a single file, like (key1, value1, key2, value2, ...). This is okay for compression, but less performant when aggregating on a single key (like my_map['my-key']) because the m[k] syntax scans the whole map, so lookup is O(n) in the size of the map.
  • Parallel arrays
    • Very similar in concept: you can store two arrays, my_keys and my_values, which always have the same length. In my experience this compresses better than a map, because the keys tend to change less frequently and you get more long runs of identical values in both arrays. But the syntax is much worse, something like my_values[indexOf(my_keys, 'key')]. It's a lot less intuitive for users, and I suspect it still requires reading both entire arrays (now coordinating across two files).
  • JSON type
    • This type has the ideal storage: great compression of each key's values, while accommodating a dynamic number of keys. In one sense, this feature request is asking to decouple the special subcolumn storage from the JSON front-end, allowing the optimal storage to be used for more types. For example, a Map can be stricter than general JSON by allowing only a fixed level of nesting and constraining the value types. Maybe it's possible to combine the flexibility of Maps (especially with the -Map aggregation combinator) with the good storage of JSON, without special-casing the JSON format. Practically, I can probably just use JSON while being mindful of parsing edge cases like Output of toJSONString is not recognized as valid JSON #75017 and waiting for it to be possible to set the field type for all fields (to avoid Dynamic and Variant indirection).
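The access-cost difference behind the first two alternatives can be sketched as follows (illustrative Python, not ClickHouse code): interleaved (key, value) pairs must be scanned row by row, while a per-key subcolumn is read directly.

```python
def map_lookup(pairs, key):
    # Plain Map / parallel-array style: O(n) scan over the row's pairs.
    for k, v in pairs:
        if k == key:
            return v
    return None

# Two rows stored as interleaved (key, value) pairs.
rows_as_pairs = [
    [("env", "prod"), ("region", "us-east-1")],
    [("env", "dev"), ("region", "eu-west-1")],
]

# Subcolumn style: the values for one key already live in their own stream,
# so aggregating on my_map['env'] never touches other keys' data.
env_subcolumn = ["prod", "dev"]

scanned = [map_lookup(pairs, "env") for pairs in rows_as_pairs]
print(scanned)        # ['prod', 'dev']
print(env_subcolumn)  # ['prod', 'dev'], same result without any per-row scan
```

The per-row scan is cheap for small maps, but for wide observability label sets the subcolumn read skips all unrelated keys and values entirely.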

Additional context

No response
