
Improve the performance of "DictionaryValue" row encoding #4712

@alamb

Description


Is your feature request related to a problem or challenge? Please describe what you are trying to do.

We are considering using the "don't use dictionary interning" mode in DataFusion for high-cardinality columns: apache/datafusion#7200 (comment)

@tustvold mentioned that this mode could be made faster.

Describe the solution you'd like
Review and optimize Codec::DictionaryValues in arrow-row:

https://github.com/apache/arrow-rs/blob/b810e8f207bbc70294b01acba4be32153c18a6ab/arrow-row/src/lib.rs#L437C14-L437C14

Perhaps this match arm could be made faster (I am not sure):

```rust
Codec::DictionaryValues(converter, _) => {
```
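The linked match arm is only partially quoted above, so the following is a hypothetical, std-only Rust sketch of what value-based ("no interning") row encoding of a dictionary column involves. The byte format produced by `encode_value` is a toy order-preserving scheme assumed purely for illustration (nulls sort first); it is not arrow-row's row format, and all function names are hypothetical. The sketch contrasts re-encoding the looked-up value for every row with encoding each distinct dictionary value once and copying the cached bytes per key. Which of these arrow-row's Codec::DictionaryValues currently does should be checked against the linked source.

```rust
// Hypothetical, std-only sketch: value-based ("no interning") row encoding
// for a dictionary-encoded string column. Not arrow-row's actual code.

/// Toy order-preserving encoding of an optional string (nulls sort first):
/// 0x00 for null; otherwise 0x01, the bytes with 0x00 escaped as 0x00 0xFF,
/// then a 0x00 0x01 terminator.
fn encode_value(value: Option<&str>, out: &mut Vec<u8>) {
    match value {
        None => out.push(0x00),
        Some(s) => {
            out.push(0x01);
            for &b in s.as_bytes() {
                if b == 0x00 {
                    out.extend_from_slice(&[0x00, 0xFF]);
                } else {
                    out.push(b);
                }
            }
            out.extend_from_slice(&[0x00, 0x01]);
        }
    }
}

/// Baseline: look up and re-encode the dictionary value for every row.
fn encode_per_row(keys: &[Option<usize>], values: &[&str]) -> Vec<Vec<u8>> {
    keys.iter()
        .map(|&key| {
            let mut row = Vec::new();
            encode_value(key.map(|k| values[k]), &mut row);
            row
        })
        .collect()
}

/// Encode each distinct dictionary value once, then copy the cached bytes
/// for each row's key, so per-row work is a copy instead of a re-encode.
fn encode_cached(keys: &[Option<usize>], values: &[&str]) -> Vec<Vec<u8>> {
    let encoded: Vec<Vec<u8>> = values
        .iter()
        .map(|v| {
            let mut buf = Vec::new();
            encode_value(Some(*v), &mut buf);
            buf
        })
        .collect();
    let mut null_row = Vec::new();
    encode_value(None, &mut null_row);
    keys.iter()
        .map(|&key| match key {
            Some(k) => encoded[k].clone(),
            None => null_row.clone(),
        })
        .collect()
}

fn main() {
    // Dictionary values and per-row keys; `None` marks a null row.
    let values = ["b", "a", "ab"];
    let keys = [Some(0usize), None, Some(1), Some(2), Some(0)];

    let rows = encode_cached(&keys, &values);
    assert_eq!(rows, encode_per_row(&keys, &values));

    // Byte-wise comparison of rows matches Option<&str> ordering.
    let logical: Vec<Option<&str>> = keys.iter().map(|&k| k.map(|i| values[i])).collect();
    for i in 0..rows.len() {
        for j in 0..rows.len() {
            assert_eq!(rows[i].cmp(&rows[j]), logical[i].cmp(&logical[j]));
        }
    }
    println!("{} rows encoded", rows.len());
}
```

For clarity each row here is a separate `Vec<u8>`; a real row format would append into one contiguous buffer with offsets, which changes where the per-row cost actually goes.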

Describe alternatives you've considered
There may not be any way to make this faster, but I wanted to file this ticket as a follow-on to the meeting.

Additional context

Metadata


Assignees: No one assigned
Labels: arrow (Changes to the arrow crate), arrow-flight (Changes to the arrow-flight crate), enhancement (Any new improvement worthy of a entry in the changelog)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
