core: reduce tag index memory overhead by brharrington · Pull Request #1878 · Netflix/atlas

brharrington · 2026-02-27T15:57:39Z

Reduce the initial capacity of the temporary HashSets used during index construction. Based on production data, the number of unique keys is typically under 10k and the number of unique values is roughly 1/6th of the number of items, so using items.length as the initial capacity significantly oversized both sets.
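The following is a minimal sketch of the sizing change. It assumes the temporary sets are java.util.HashSet instances. The object CapacitySketch, the helpers keySetCapacity and valueSetCapacity, and the fixed 10k key estimate are hypothetical illustrations of the heuristics described above, not the code from this PR.

```scala
import java.util.{HashSet => JHashSet}

// Hypothetical sketch: presize temporary sets from expected distinct counts
// instead of from the total number of items.
object CapacitySketch {
  // Hypothetical estimate: production data suggests fewer than ~10k unique keys.
  val ExpectedMaxKeys: Int = 10000

  // Keys can never outnumber items, so cap the hint at numItems.
  def keySetCapacity(numItems: Int): Int =
    math.max(16, math.min(numItems, ExpectedMaxKeys))

  // Hypothetical estimate: unique values are roughly 1/6th of the item count.
  def valueSetCapacity(numItems: Int): Int =
    math.max(16, numItems / 6)

  def main(args: Array[String]): Unit = {
    val numItems = 16000000
    // Before: both sets would have been presized to numItems (items.length).
    // After: each set is presized to its expected distinct count.
    val keys = new JHashSet[String](keySetCapacity(numItems))
    val values = new JHashSet[String](valueSetCapacity(numItems))
    keys.add("name")
    values.add("sps")
    println(s"key capacity hint:   ${keySetCapacity(numItems)}")
    println(s"value capacity hint: ${valueSetCapacity(numItems)}")
  }
}
```

java.util.HashSet is backed by a HashMap, which rounds the requested capacity up to a power of two and resizes once the size exceeds capacity × 0.75 (the default load factor). A modest underestimate therefore costs only a resize, while an overestimate reserves table slots that are never used.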

Flatten itemTags from Array[Array[Int]] to a pair of flat arrays (offsets + data). This eliminates one Array[Int] object per item, saving ~24 bytes of JVM object overhead each. For a production index with ~16M items, this saves roughly 390MB.
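The following sketch shows the kind of offsets + data layout described above, under the assumption that each item's tags are stored as Int ids. The names FlatTagsSketch, FlatItemTags, flatten, and foreachTag are hypothetical and do not come from the Atlas codebase. The tags for item i occupy data(offsets(i) until offsets(i + 1)), so one shared data array replaces the per-item Array[Int] objects.

```scala
// Hypothetical sketch of a flattened item -> tag-id layout (offsets + data).
object FlatTagsSketch {

  // Tags for item i are data(offsets(i)) .. data(offsets(i + 1) - 1).
  final class FlatItemTags(val offsets: Array[Int], val data: Array[Int]) {
    // Number of items stored.
    def size: Int = offsets.length - 1

    // Number of tag ids stored for item i.
    def length(i: Int): Int = offsets(i + 1) - offsets(i)

    // Calls f for each tag id of item i without allocating a per-item array.
    def foreachTag(i: Int)(f: Int => Unit): Unit = {
      var j = offsets(i)
      val end = offsets(i + 1)
      while (j < end) {
        f(data(j))
        j += 1
      }
    }
  }

  // Converts the nested Array[Array[Int]] form into offsets + data.
  def flatten(itemTags: Array[Array[Int]]): FlatItemTags = {
    // Prefix sums of per-item lengths; offsets(n) is the total tag count.
    val offsets = new Array[Int](itemTags.length + 1)
    var i = 0
    while (i < itemTags.length) {
      offsets(i + 1) = offsets(i) + itemTags(i).length
      i += 1
    }
    // Copy each item's tags into its slice of the shared data array.
    val data = new Array[Int](offsets(itemTags.length))
    i = 0
    while (i < itemTags.length) {
      System.arraycopy(itemTags(i), 0, data, offsets(i), itemTags(i).length)
      i += 1
    }
    new FlatItemTags(offsets, data)
  }

  def main(args: Array[String]): Unit = {
    val nested = Array(Array(1, 5, 9), Array(2), Array(3, 4))
    val flat = flatten(nested)
    println(flat.offsets.mkString("[", ", ", "]")) // [0, 3, 4, 6]
    println(flat.data.mkString("[", ", ", "]"))    // [1, 5, 9, 2, 3, 4]
    flat.foreachTag(2)(t => print(s"$t "))         // 3 4
    println()
  }
}
```

In this sketch the offsets array holds one Int per item plus one, and Int offsets limit the total tag count across all items to Int.MaxValue. As a rough check of the figure above, ~16,000,000 items × ~24 bytes ≈ 384 MB, which is in line with the ~390MB estimate given that the item count is approximate.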

brharrington added this to the 1.9.0 milestone Feb 27, 2026

brharrington merged commit 4e671a4 into Netflix:main Feb 27, 2026
5 checks passed

brharrington deleted the idx-flatten branch February 27, 2026 16:05
brharrington merged 1 commit into Netflix:main from brharrington:idx-flatten