Skip to content

ref(mem,perf): Turning well-known headers into a Set#3351

Merged
RobinTail merged 6 commits intomasterfrom
well-known-headers-set
Apr 29, 2026
Merged

ref(mem,perf): Turning well-known headers into a Set#3351
RobinTail merged 6 commits intomasterfrom
well-known-headers-set

Conversation

@RobinTail
Copy link
Copy Markdown
Owner

@RobinTail RobinTail commented Apr 25, 2026

This should solve two problems:

  1. Unnecessary memory consumption (2k) by the large array that is statically imported while only used by Documentation (not API runtime);
  2. Array::includes() is O(n) while Set::has is O(1): in current situation it could give about 5.5x performance improvement in worst scenario (finding last entry)

The solution, however, requires a lazy creation of the Set because it uses even more memory, so:

  • R.once() is involved, which is also reducing the expected performance boost down to 3.58x,
  • module cache with nullish coalescing assignment used giving 4.8x boost (better than R.once)

Summary by CodeRabbit

  • Performance Improvements

    • Header lookup now uses a memoized set for faster, lower-memory checks.
  • Tests

    • Added benchmarks measuring header lookup performance.
    • Added unit tests validating the memoized header collection.
    • Removed an obsolete queue-operations benchmark.
  • Chores

    • Updated generated tooling output to use the new header lookup approach.
  • Changelog

    • Recorded release v27.2.5 noting memory and lookup improvements.

@RobinTail RobinTail added the refactoring The better way to achieve the same result label Apr 25, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 25, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 36050ff0-0ef2-49bd-b6bf-48f46bc585ff

📥 Commits

Reviewing files that changed from the base of the PR and between e1f1868 and 9da827a.

📒 Files selected for processing (1)
  • CHANGELOG.md
✅ Files skipped from review due to trivial changes (1)
  • CHANGELOG.md

📝 Walkthrough

Walkthrough

Replaces a default-exported header array with a named getWellKnownHeaders() function that returns a cached Set<string>, updates consumers to use .has(), adds a headers lookup benchmark, and removes an obsolete queue benchmark file.

Changes

Cohort / File(s) Summary
Header lookup implementation
express-zod-api/src/well-known-headers.ts, express-zod-api/tools/headers.ts, express-zod-api/src/documentation-helpers.ts
Replaced default string[] export with a named getWellKnownHeaders(): Set<string> (lazy-cached module-scoped Set). Tooling now emits getWellKnownHeaders; consumer code updated from .includes() to .has().
Benchmarks & tests
express-zod-api/bench/headers-lookup.bench.ts, express-zod-api/bench/queue-ops.bench.ts, express-zod-api/tests/well-known-headers.spec.ts
Added headers-lookup.bench.ts comparing Array .includes() vs Set .has() for "x-frame-options", removed queue-ops.bench.ts, and added a test confirming getWellKnownHeaders() returns a memoized Set with >200 entries.
Changelog
CHANGELOG.md
Added v27.2.5 entry describing deferred population of well-known headers and improved header lookup performance.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

prevention, documentation, coverage

Poem

🐇 I nibble headers in a sunny spot,
A cached Set remembers every dot,
Hop once to fetch, then stay out of sight,
Fast has-checks make my mornings bright,
Hooray — memoized and light! 🥕

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately reflects the main change: converting well-known headers from an array to a Set for memory and performance optimization.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch well-known-headers-set

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coveralls-official
Copy link
Copy Markdown

coveralls-official Bot commented Apr 25, 2026

Coverage Status

coverage: 100.0%. remained the same — well-known-headers-set into master

@RobinTail RobinTail marked this pull request as ready for review April 25, 2026 19:50
Copy link
Copy Markdown
Contributor

@pullfrog pullfrog Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed — no issues found.

Task list (4/4 completed)
  • Read the PR diff to understand the changes
  • Read the changed source files for context
  • Check for impact on consumers (exports, tests, docs)
  • Draft and submit review

Pullfrog  | View workflow run | Using Claude Opus𝕏

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
express-zod-api/bench/headers-lookup.bench.ts (1)

7-15: Benchmark composition note.

Calling getWellKnownHeaders() inside the Set.has bench (line 14) while hoisting headersArray outside (line 7) is intentional per the PR description — it captures the realistic call site in defaultIsHeader (incl. R.once invocation overhead) and explains the reported ~3.58× vs. the raw 5.5× from the POC. Worth a one-line comment in the file so future readers don't "fix" it by hoisting the Set.

♻️ Optional clarifying comment
   const headersArray = Array.from(getWellKnownHeaders());
 
   bench("Array.includes", () => {
     headersArray.includes(target);
   });
 
+  // Intentionally not hoisting the Set: this mirrors `defaultIsHeader`'s real call pattern
+  // (R.once-memoized accessor invoked on every check).
   bench("Set.has", () => {
     getWellKnownHeaders().has(target);
   });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@express-zod-api/bench/headers-lookup.bench.ts` around lines 7 - 15, Add a
one-line comment above the Set.has benchmark explaining that calling
getWellKnownHeaders() inside that bench is intentional (unlike the hoisted
headersArray) to replicate the real defaultIsHeader call site including R.once
invocation overhead, and to justify the observed ~3.58× vs POC raw 5.5× result;
reference the getWellKnownHeaders() call and headersArray constant so future
readers don't hoist the Set and change the benchmark semantics.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@express-zod-api/bench/headers-lookup.bench.ts`:
- Around line 7-15: Add a one-line comment above the Set.has benchmark
explaining that calling getWellKnownHeaders() inside that bench is intentional
(unlike the hoisted headersArray) to replicate the real defaultIsHeader call
site including R.once invocation overhead, and to justify the observed ~3.58× vs
POC raw 5.5× result; reference the getWellKnownHeaders() call and headersArray
constant so future readers don't hoist the Set and change the benchmark
semantics.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 45ab854c-8034-439d-ad9c-778136a0c5b2

📥 Commits

Reviewing files that changed from the base of the PR and between 40a559a and 1472fa2.

📒 Files selected for processing (5)
  • express-zod-api/bench/headers-lookup.bench.ts
  • express-zod-api/bench/queue-ops.bench.ts
  • express-zod-api/src/documentation-helpers.ts
  • express-zod-api/src/well-known-headers.ts
  • tools/headers.ts
💤 Files with no reviewable changes (1)
  • express-zod-api/bench/queue-ops.bench.ts

… R.once() - up to 4.8x speed over Array::includes().
@RobinTail RobinTail changed the title exp(Documentation): Turning well-known headers into a Set chore(mem,perf): Turning well-known headers into a Set Apr 26, 2026
Copy link
Copy Markdown
Contributor

@pullfrog pullfrog Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No new issues. Reviewed the following changes:

  • Replaced R.once memoization with a module-level cache variable and ??= nullish coalescing assignment, removing the Ramda dependency from this module
  • Updated the code generator in tools/headers.ts to emit the new memoization pattern

Pullfrog  | View workflow run | Using Claude Opus𝕏

@RobinTail
Copy link
Copy Markdown
Owner Author

@pullfrog evaluate the approach thoughtfully: explain the impact on memory consumption and performance both for API runtime and when running Documentation generator.

@pullfrog
Copy link
Copy Markdown
Contributor

pullfrog Bot commented Apr 26, 2026

Posted a detailed analysis of the approach's impact on memory consumption and performance. Key findings: the lazy Set pattern saves ~5 KB at API runtime (avoids eager array allocation), the 3.55x lookup speedup is real but absolute gains are negligible in practice (sub-microsecond per call), and the bundle size is effectively unchanged (+18 bytes). Also identified that errors.ts uses a value import of OpenAPIContext (an interface) which forces the entire documentation-helpers chain into the runtime bundle.

Task list (5/5 completed)
  • Checkout PR branch and review the diff
  • Analyze memory consumption impact (API runtime vs Documentation)
  • Analyze performance impact (Set vs Array lookup)
  • Review lazy initialization pattern and its tradeoffs
  • Post analysis comment on the PR

Pullfrog  | View workflow run | via Pullfrog | Using Claude Opus𝕏

@pullfrog
Copy link
Copy Markdown
Contributor

pullfrog Bot commented Apr 26, 2026

Analysis: Impact on Memory and Performance

The import chain problem

The PR description mentions the array is "statically imported while only used by Documentation (not API runtime)." This is true at the source level — only documentation-helpers.ts imports well-known-headers.ts, and defaultIsHeader is only called during OpenAPI spec generation.

However, the well-known headers module cannot be excluded from the API runtime because the build produces a single bundle (dist/index.js), and the import graph forces inclusion regardless:

index.ts → errors.ts → documentation-helpers.ts → well-known-headers.ts

The bridge is errors.ts:3:

import { OpenAPIContext } from "./documentation-helpers";

This is a value import of an interface (could be import type), which forces the bundler to include the entire documentation-helpers.ts module and all its transitive dependencies. The header strings end up in the bundle either way.

Memory consumption

Both before and after this PR, the minified bundle contains the same ~2.7 KB string of dot-separated header names:

Branch Bundle representation Runtime memory at module load
master `a-im.accept...`.split(`.`) → eagerly creates an Array of 204 strings ~2 KB for the string + ~5 KB for the array (204 pointers + header objects)
PR cache ??= new Set(....split(.)) → deferred to first getWellKnownHeaders() call ~2 KB for the string literal only (the Set is not constructed until called)

The lazy pattern does save memory at API runtime startup — the Set (which uses ~8–10 KB due to hash table overhead) is never constructed unless Documentation is instantiated. On master, the array (~5 KB) is created eagerly at module load.

Net savings for API-only runtime: ~5 KB (the array is never allocated). This is real but modest.

When Documentation is used: The Set costs more memory than the array (~8–10 KB vs ~5 KB due to hash table overhead), but the array is no longer created at all, so the net difference is roughly +3–5 KB. This is negligible for a documentation generation process.

Lookup performance

The benchmark confirms Set.has is ~3.55x faster than Array.includes for the worst-case lookup (x-frame-options, near the end of the 204-element list). This is consistent with O(1) vs O(n) for n=204.

In practice, defaultIsHeader is called once per input property per endpoint during Documentation construction. For a typical API with, say, 50 endpoints averaging 5 input properties each, that's ~250 calls. At ~0.0002 ms per Set.has call vs ~0.0002 ms per Array.includes call (both are sub-microsecond at this scale), the total wall-clock difference for the entire documentation generation is on the order of microseconds — unmeasurable in real use.

The performance gain is real in relative terms (3.55x) but the absolute improvement is negligible because documentation generation is dominated by JSON Schema processing, not header lookups.

The lazy initialization pattern

The let cache + cache ??= new Set(...) pattern is clean and works correctly:

  • No extra dependency (R.once was rightly discarded)
  • The module-level let is safe because ESM modules are singletons
  • The ??= ensures the Set is constructed at most once

Summary

Aspect Impact
API runtime memory Saves ~5 KB (avoids eagerly allocating the array)
Documentation memory Costs ~3–5 KB more (Set overhead vs array) — negligible
API runtime performance ZerodefaultIsHeader is never called at runtime
Documentation performance Negligible improvement (~microseconds saved total)
Bundle size +18 bytes (66,523 vs 66,505) — effectively identical

The approach is sound. The primary benefit is avoiding eager allocation of data that only the Documentation generator needs. The lookup speedup is a nice theoretical improvement but won't be measurable in practice.

A follow-up worth considering: changing errors.ts:3 to import type { OpenAPIContext } would break the runtime import chain from errors.ts into documentation-helpers.ts, which could let a future code-splitting setup fully exclude documentation code from the API runtime bundle.

Pullfrog  | View workflow run | via Pullfrog | Using Claude Opus𝕏

Copy link
Copy Markdown
Contributor

@pullfrog pullfrog Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No new issues. Reviewed the following changes:

  • Added unit test for getWellKnownHeaders() covering return type (Set), entry count, and memoization (referential equality)

Pullfrog  | View workflow run | Using Claude Opus𝕏

@RobinTail
Copy link
Copy Markdown
Owner Author

this may accompany another Set lookup in #3344

@RobinTail RobinTail changed the title chore(mem,perf): Turning well-known headers into a Set ref(mem,perf): Turning well-known headers into a Set Apr 27, 2026
@RobinTail RobinTail merged commit 7757d69 into master Apr 29, 2026
14 checks passed
@RobinTail RobinTail deleted the well-known-headers-set branch April 29, 2026 06:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

refactoring The better way to achieve the same result

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant