Skip to content

Add AAD as a first-class OperationParams variant in BDN benchmarks#1859

Merged
tiagonapoli merged 6 commits into
mainfrom
tiagonapoli/bdn-add-aad-param
Jun 6, 2026
Merged

Add AAD as a first-class OperationParams variant in BDN benchmarks#1859
tiagonapoli merged 6 commits into
mainfrom
tiagonapoli/bdn-add-aad-param

Conversation

@tiagonapoli

@tiagonapoli tiagonapoli commented Jun 5, 2026

Copy link
Copy Markdown
Collaborator

Problem

Garnet's continuous BDN benchmarks (published to https://microsoft.github.io/garnet/charts/) cover the None / ACL / AOF matrix for Operations.* — but no variant exercises the AAD authentication code path. Regressions in GarnetAadAuthenticator.IsAuthorized() cannot be caught by these charts and only surface when someone runs a production CPU profile.

A recent example: DateTime.UtcNow accounted for ~62.7% exclusive CPU in a cluster under AAD auth (fixed in #1855). That regression would have shown up immediately on the InlinePing_AAD chart had this variant existed.

Fix

Add AAD as a 4th first-class OperationParams variant alongside None / ACL / AOF:

  • OperationParams — gains useAad bool + "AAD" (or "ACL+AAD", etc.) suffix in ToString().
  • OperationsBase.OperationParamsProvider — yields the new (false, false, useAad: true) variant.
  • OperationsBase.GlobalSetup — when Params.useAad:
    • Builds an in-process AadAuthenticationSettings backed by a symmetric signing key, a 12-hour JWT, and a stub IssuerSigningTokenProvider (no OpenID metadata refresh, authority="").
    • Sets opts.AuthSettings = aadSettings.
    • After session creation, sends a one-shot RESP AUTH default <jwt> so the session is fully authenticated before any benchmark iteration runs.
  • Self-contained: no external IdP, no network IO. Exercises the same GarnetAadAuthenticator code path production runs.

Result

Every existing Operations.* BDN benchmark now runs a 4th time with AAD enabled. The hot path includes the IsAuthenticatedIsAuthorized() check on every RESP command (twice per command, matching production), so any change in AAD-auth overhead immediately moves the _AAD chart line.

Local verification (Operations.BasicOperations.InlinePing, .NET 10, i7-12700K, 100 PINGs/op)

Params Mean Δ vs None
None 954.2 ns
ACL 1,014.1 ns +60 ns
AOF 986.1 ns +32 ns
AAD 2,594.8 ns +1,641 ns

The +1641 ns is the existing DateTime.UtcNow cost on the AAD path. Once #1855 merges, this drops to ~110 ns (a ~15× improvement on the AAD overhead specifically).

Costs / caveats

  • CI time: adds a 4th BDN variant to every Operations.* test (BasicOperations, ObjectOperations, HashObjectOperations, SortedSetOperations, CustomOperations, RawStringOperations, ScriptOperations, ModuleOperations, JsonOperations, PubSubOperations, SetOperations, TxnOperations). Roughly ~33% more time per matrix cell in ci-bdnbenchmark.yml.
  • BDN_Benchmark_Config.json: intentionally not updated. There are no expected baselines for the new _AAD entries yet, and github-action-benchmark picks up the new chart lines from the full BDN report regardless. Gates can be added once baselines settle (recommend WARN-ON-FAIL_* initially).
  • Lua / Network / Cluster benchmarks: unchanged. Different Params model; can be extended in a follow-up if desired.

Testing

  • dotnet build benchmark/BDN.benchmark — clean on net8.0 and net10.0.
  • Ran Operations.BasicOperations end-to-end: all 4 variants (None, ACL, AOF, AAD) execute and produce results. AAD-authenticated PING returns successfully, confirming the AUTH command + JWT validation are wired correctly.

Related

Copilot AI review requested due to automatic review settings June 5, 2026 21:55
@tiagonapoli tiagonapoli force-pushed the tiagonapoli/bdn-add-aad-param branch from 572b963 to 9ad76a9 Compare June 5, 2026 21:58
Tiago Napoli added 2 commits June 5, 2026 14:59
Per review feedback: extend OperationsBase so every Operations.* BDN benchmark runs a 4th variant alongside None / ACL / AOF that exercises GarnetAadAuthenticator on the service hot path.

- OperationParams gains a useAad flag (mutually exclusive with useACLs) and a corresponding _AAD suffix in ToString().

- OperationsBase.OperationParamsProvider yields the new variant.

- On AAD setup: build an in-process AadAuthenticationSettings backed by a symmetric signing key + a 12h JWT, set opts.AuthSettings, and send a one-shot RESP AUTH command immediately after the embedded session is created so subsequent benchmark iterations exercise IsAuthenticated on every command (twice per command via the AAD code path).

- Self-contained: no external IdP, no network IO; uses the same GarnetAadAuthenticator code path production runs.

- BDN_Benchmark_Config.json intentionally not updated yet (no baselines for the new variant); github-action-benchmark will pick up the new _AAD chart lines automatically. Gates can be added later.

Verified on Operations.BasicOperations.InlinePing: None 977ns, ACL 969ns, AOF 1000ns, AAD 1072ns (cost-of-AAD ~1ns per command — matches the microbenchmark in IsAuthorizedBenchmark).
@tiagonapoli tiagonapoli force-pushed the tiagonapoli/bdn-add-aad-param branch from 9ad76a9 to 2862019 Compare June 5, 2026 21:59

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends the benchmark/BDN.benchmark “Operations.*” BenchmarkDotNet suite to include an additional, self-contained AAD-authenticated run variant, so continuous benchmark charts can catch performance regressions in Garnet’s AAD authentication hot path (IsAuthenticatedGarnetAadAuthenticator.IsAuthorized()).

Changes:

  • Added a new OperationParams.useAad flag and included AAD in OperationParams.ToString() output naming.
  • Extended OperationsBase.OperationParamsProvider() to yield a 4th parameter variant with useAad: true.
  • Implemented in-process AAD auth setup in OperationsBase.GlobalSetup() by building AadAuthenticationSettings (symmetric key + stub signing token provider) and sending a one-shot RESP AUTH default <jwt> after session creation.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
benchmark/BDN.benchmark/Operations/OperationsBase.cs Adds AAD benchmark variant wiring, creates in-process AAD settings, and issues an initial AUTH to put the session into authenticated state before benchmarking.
benchmark/BDN.benchmark/Operations/OperationParams.cs Adds useAad to benchmark parameter struct and updates parameter naming (ToString) to reflect AAD-enabled runs.

Comment thread benchmark/BDN.benchmark/Operations/OperationParams.cs
Comment thread benchmark/BDN.benchmark/Operations/OperationsBase.cs
Tiago Napoli added 3 commits June 5, 2026 15:18
…ts/formatting

- Add EmbeddedNetworkSender.GetLastResponse() that returns the bytes written by the most recent SendResponse(int, int) call.

- EmbeddedRespServer.GetRespSession now accepts an optional sender so callers can hold a reference for response inspection.

- OperationsBase.GlobalSetup, on the AAD path, asserts the AUTH response is exactly '+OK\\r\\n' after sending the AUTH command. A failed AUTH would otherwise silently leave the session unauthenticated and the benchmark would measure auth-rejection overhead instead of the authenticated hot path. Verified by deliberately corrupting the JWT: BDN aborts with InvalidOperationException and the AAD result becomes NA — failure is loud, not silent.

- Drop the redundant 'matches the production GarnetAadAuthenticator code path exactly' comment and collapse two multi-line constructor invocations to one line per repo formatting preference.
- OperationParams.ctor now throws ArgumentException when both useACLs and useAad are true. Per Copilot review (#1859): useAad was documented as mutually exclusive with useACLs but nothing enforced it, so a caller passing (true, false, true) would silently get the ACL path with a confusing 'ACL+AAD' ToString label.

- Drop the redundant comment on BuildAadAuthSettings and collapse two multi-line constructor invocations to one line per repo formatting preference (lost in the revert of fb9464c).
tiagonapoli pushed a commit that referenced this pull request Jun 5, 2026
- OperationParams.ctor now throws ArgumentException when both useACLs and useAad are true. Per Copilot review (#1859): useAad was documented as mutually exclusive with useACLs but nothing enforced it, so a caller passing (true, false, true) would silently get the ACL path with a confusing 'ACL+AAD' ToString label.

- Drop the redundant comment on BuildAadAuthSettings and collapse two multi-line constructor invocations to one line per repo formatting preference (lost in the revert of fb9464c).
@tiagonapoli tiagonapoli merged commit c5ec666 into main Jun 6, 2026
144 of 145 checks passed
@tiagonapoli tiagonapoli deleted the tiagonapoli/bdn-add-aad-param branch June 6, 2026 00:19
badrishc pushed a commit that referenced this pull request Jun 6, 2026
…#1863)

* Revert "charts: group all Params variants into one chart per benchmark (#1861)"

This reverts commit b89cc9e.

* charts: add AAD variants for Operations.* benchmarks to Featured

PR #1859 added Params: AAD as a 4th first-class OperationParams variant. The Featured Benchmarks view didn't surface the new AAD charts because FEATURED_BENCHMARKS only listed the (Params: None) entry for each benchmark.

Add the 8 corresponding (Params: AAD) entries for the Operations.* benchmarks that gained an AAD variant. Network.* and Lua.* are unchanged since OperationsBase's useAad flag only applies to the Operations.* hierarchy.

---------

Co-authored-by: Tiago Napoli <[email protected]>
tiagonapoli pushed a commit that referenced this pull request Jun 8, 2026
- OperationParams.ctor now throws ArgumentException when both useACLs and useAad are true. Per Copilot review (#1859): useAad was documented as mutually exclusive with useACLs but nothing enforced it, so a caller passing (true, false, true) would silently get the ACL path with a confusing 'ACL+AAD' ToString label.

- Drop the redundant comment on BuildAadAuthSettings and collapse two multi-line constructor invocations to one line per repo formatting preference (lost in the revert of fb9464c).
tiagonapoli added a commit that referenced this pull request Jun 9, 2026
* Cache UTC time in GarnetAadAuthenticator.IsAuthorized to remove syscall from hot path

IsAuthorized() is checked twice per command (AdminCommands.cs:32 and :131)
via the IsAuthenticated property. Under AAD auth, DateTime.UtcNow showed
~60% exclusive CPU in cluster traces because it issues a clock_gettime
syscall via vDSO on every call.

Introduce Garnet.common.CachedTime, a process-wide UTC time cache refreshed
every 100ms by a background Timer (modeled after Redis's mstime). Reads are
a pure memory load via Volatile.Read of a static long.

GarnetAadAuthenticator now precomputes the token validity window in ticks
once in Authenticate() and compares against CachedTime.UtcNowTicks in
IsAuthorized(). 100ms granularity is operationally invisible against AAD
token windows measured in hours.

Co-authored-by: Copilot <[email protected]>

* Fix misleading Redis hz=10 reference in CachedTime doc

Redis's server.mstime is refreshed both from serverCron (hz floor, default
100ms) and from beforeSleep (every event loop iteration), so under load it
updates far more often than 100ms. Rephrase as an illustrative example
rather than a precise cadence match.

Co-authored-by: Copilot <[email protected]>

* Drop underscore prefix on CachedTime private fields

libs/common consistently uses bare camelCase for private fields; align.

Co-authored-by: Copilot <[email protected]>

* Drop unnecessary tick-cached validity fields

GarnetAadAuthenticator is per-RespServerSession (single-threaded), so
_validFrom/_validateTo are never accessed cross-thread; there's no
DateTime atomicity concern and DateTime comparison compiles to the same
mask+compare as long comparison. Strip the fields and inline
CachedTime.UtcNow into the original DateTime check.

Co-authored-by: Copilot <[email protected]>

* Extract Timer callback into named UpdateCachedUtc method; tidy stale docs

Co-authored-by: Copilot <[email protected]>

* Collapse Timer ctor to single line; document benign overlap race

Co-authored-by: Copilot <[email protected]>

* Drop Redis reference from RefreshPeriodMs doc

Co-authored-by: Copilot <[email protected]>

* Use block-scoped namespace in CachedTime (matches libs/common convention)

Co-authored-by: Copilot <[email protected]>

* Drop platform syscall names from CachedTime summary

Co-authored-by: Copilot <[email protected]>

* Note CachedTime staleness can exceed RefreshPeriodMs under high load

Co-authored-by: Copilot <[email protected]>

* Drop Timer comment from CachedTime

Co-authored-by: Copilot <[email protected]>

* Rename CachedTime to CoarseUtcNow; switch accessors to .Value/.Ticks

Co-authored-by: Copilot <[email protected]>

* Correct CoarseUtcNow doc: DateTime.UtcNow is vDSO/KUSER_SHARED_DATA, not syscall

Co-authored-by: Copilot <[email protected]>

* Add BDN benchmark for IsAuthorized hot path: DateTime.UtcNow vs CoarseUtcNow

Mirrors GarnetAadAuthenticator.IsAuthorized() body verbatim plus the raw
clock primitives for context. Results on i7-12700K / .NET 10:

  IsAuthorized_DateTimeUtcNow : 19.34 ns (baseline)
  IsAuthorized_CoarseUtcNow   :  1.35 ns (0.070x — ~14.4x faster)
  Raw_DateTimeUtcNow          : 18.06 ns
  Raw_CoarseUtcNowValue       :  0.51 ns

Co-authored-by: Copilot <[email protected]>

* Rename CoarseUtcNow to CoarseDateTime (UtcNow/UtcNowTicks); use Ticks in AAD hot path

CoarseDateTime exposes both UtcNow (DateTime) and UtcNowTicks (long) accessors.
GarnetAadAuthenticator now caches token validity as long ticks and compares against
CoarseDateTime.UtcNowTicks, eliminating DateTime's per-call Kind-bit masking on the
hot path. BDN benchmark shows 1.35 ns -> 0.32 ns vs the DateTime variant
(~60x speedup vs DateTime.UtcNow baseline).

Co-authored-by: Copilot <[email protected]>

* Drop raw clock primitive benchmarks; keep only IsAuthorized variants

Co-authored-by: Copilot <[email protected]>

* Add blank lines around tick-fields block in GarnetAadAuthenticator

Co-authored-by: Copilot <[email protected]>

* Refactor coarse time to TimeProvider; inject into AAD authenticator; add tests + TickCount64 benchmark

Addresses PR review feedback:

1. CoarseDateTime becomes a thin static wrapper around the new CoarseTimeProvider (instance) which wraps an arbitrary TimeProvider and uses TimeProvider.CreateTimer for cache refresh. CoarseTimeProvider.System is the process-wide singleton backed by TimeProvider.System -- production behaviour unchanged.

2. GarnetAadAuthenticator now takes a CoarseTimeProvider (default = .System). Hot path is identical in production and tests; tests inject a FakeTimeProvider-backed CoarseTimeProvider so advancing the fake fires the same Timer-refresh callback the production cache uses.

3. AadAuthenticationSettings exposes the public TimeProvider parameter, internally owning/disposing the CoarseTimeProvider lifecycle (reuses .System singleton when no override).

4. New GarnetAadAuthenticatorTests covers: default fallback, valid window, expiry after Advance(), staying valid across Advances, malformed token, failed reauthentication clears state.

5. IsAuthorizedBenchmark adds the Environment.TickCount64 variant suggested in PR review (Tornhoof) -- monotonic ms-since-boot, validity window projected at Authenticate() time.

* CoarseTimeProvider extends TimeProvider; drop CoarseDateTime facade

- CoarseTimeProvider now inherits TimeProvider: overrides GetUtcNow to return the cached coarse value; delegates GetTimestamp / TimestampFrequency / LocalTimeZone / CreateTimer to the wrapped provider. Adds an explicit fast UtcNowTicks accessor (single Volatile.Read) that the AAD hot path keeps using.

- Dispose() no-ops on the .System singleton so callers can dispose unconditionally; removes the _ownsCoarseTime bool from AadAuthenticationSettings.

- Drop the CoarseDateTime static facade -- redundant now that CoarseTimeProvider.System exposes UtcNow/UtcNowTicks directly. Benchmark renamed (CoarseTimeProvider / CoarseTimeProviderTicks).

- libs/common naming convention: drop underscore prefix on new private fields in CoarseTimeProvider (matches the rest of libs/common).

* Tighten CoarseTimeProvider API; add Stopwatch.GetTimestamp benchmark

- Make CoarseTimeProvider ctor private; CoarseTimeProvider.Create is now the only entry point, structurally preventing accidental Timer duplication.

- Replace RefreshPeriodMs (int) with RefreshPeriod (static readonly TimeSpan); pass directly to CreateTimer.

- Add IsAuthorized_StopwatchTimestamp benchmark for completeness.

- Test updated to use CoarseTimeProvider.Create(fakeProvider).

* Verify AAD AUTH succeeded in BDN OperationsBase setup; tighten comments/formatting

- Add EmbeddedNetworkSender.GetLastResponse() that returns the bytes written by the most recent SendResponse(int, int) call.

- EmbeddedRespServer.GetRespSession now accepts an optional sender so callers can hold a reference for response inspection.

- OperationsBase.GlobalSetup, on the AAD path, asserts the AUTH response is exactly '+OK\\r\\n' after sending the AUTH command. A failed AUTH would otherwise silently leave the session unauthenticated and the benchmark would measure auth-rejection overhead instead of the authenticated hot path. Verified by deliberately corrupting the JWT: BDN aborts with InvalidOperationException and the AAD result becomes NA — failure is loud, not silent.

- Drop the redundant 'matches the production GarnetAadAuthenticator code path exactly' comment and collapse two multi-line constructor invocations to one line per repo formatting preference.

* Revert "Verify AAD AUTH succeeded in BDN OperationsBase setup; tighten comments/formatting"

This reverts commit fb9464c.

* Reject ACL+AAD combo in OperationParams ctor; collapse formatting

- OperationParams.ctor now throws ArgumentException when both useACLs and useAad are true. Per Copilot review (#1859): useAad was documented as mutually exclusive with useACLs but nothing enforced it, so a caller passing (true, false, true) would silently get the ACL path with a confusing 'ACL+AAD' ToString label.

- Drop the redundant comment on BuildAadAuthSettings and collapse two multi-line constructor invocations to one line per repo formatting preference (lost in the revert of fb9464c).

* dotnet format: strip trailing newlines per .editorconfig insert_final_newline=false

* Clarify in ClusterAadAuthTests that the tenant ID is a public identifier

Co-authored-by: Copilot <[email protected]>

* Bump CoarseTimeProvider.RefreshPeriod from 100ms to 1s

Coarseness is already documented as ~RefreshPeriod best-effort; AAD token
validity windows are measured in hours, so a 1s lag is operationally invisible
and reduces Timer wakeups 10x.

Co-authored-by: Copilot <[email protected]>

* Add the same not-a-secret comment to GarnetAadAuthenticatorTests

Co-authored-by: Copilot <[email protected]>

* Simplify CoarseTimeProvider; switch authenticator to TimeProvider abstraction

CoarseTimeProvider is no longer parameterized on a wrapped TimeProvider: it
always caches TimeProvider.System, the refresh Timer is process-wide static,
and instances are stateless (Create/Dispose and the wrapping ctor are gone).
A new Instance singleton replaces the former System field.

GarnetAadAuthenticator and AadAuthenticationSettings now hold an abstract
TimeProvider (defaulting to CoarseTimeProvider.Instance) instead of a
concrete CoarseTimeProvider. IsAuthorized reads ticks via
_timeProvider.GetUtcNow().UtcTicks so tests can inject a FakeTimeProvider
directly, and production keeps the cached hot-path read through Instance.

Tests updated accordingly, plus a dedicated CoarseTimeProviderTests fixture.

Co-authored-by: Copilot <[email protected]>

* Cache DateTimeOffset directly in CoarseTimeProvider via Snapshot wrapper

Switch the cache from long utcTicks (reconstructed per read) to a
heap-allocated Snapshot holding the immutable DateTimeOffset. Readers
become a single Volatile.Read on a reference + field load, avoiding the
DateTimeOffset constructor's range-checks (ticks bounds, offset bounds)
on the hot path. The Timer allocates one Snapshot per refresh tick
(~24 B/s, gen-0).

Co-authored-by: Copilot <[email protected]>

* Simplify CoarseTimeProvider: drop UtcNowTicks cache and redundant overrides

- Remove the parallel long-ticks Volatile cache (UtcNowTicks). Benchmarks showed the heap-allocated Snapshot<DateTimeOffset> reaches parity with direct ticks under PGO, so the dual cache is unnecessary complexity.

- Drop the four delegating TimeProvider overrides (GetTimestamp, TimestampFrequency, LocalTimeZone, CreateTimer). The base class defaults already match TimeProvider.System, so the overrides are pure noise.

Co-authored-by: Copilot <[email protected]>

* Drop spurious allocation-rate estimate in Snapshot comment

Co-authored-by: Copilot <[email protected]>

* Fix DateTimeOffset size in Snapshot comment (it's 16 bytes, not 12)

Co-authored-by: Copilot <[email protected]>

* Rename Snapshot to DateTimeOffsetSnapshot for clarity

Co-authored-by: Copilot <[email protected]>

* Use volatile field for snapshot instead of Volatile.Read/Write

The volatile keyword gives reference fields the same acquire/release semantics as Volatile.Read/Write at the call sites, but reads cleaner. The two writes (init + timer tick) and two reads (GetUtcNow + UtcNow) now look like ordinary field access.

Co-authored-by: Copilot <[email protected]>

* Move DateTimeOffsetSnapshot to end and trim comment verbosity

Co-authored-by: Copilot <[email protected]>

* Trim verbose comments in GarnetAadAuthenticator

Co-authored-by: Copilot <[email protected]>

* Revert TimeProvider passthrough in AadAuthenticationSettings

GarnetAadAuthenticator's ctor already defaults TimeProvider to CoarseTimeProvider.Instance when null, so the settings-level passthrough is unnecessary in production.

Co-authored-by: Copilot <[email protected]>

* Remove RefreshPeriod XML doc

Co-authored-by: Copilot <[email protected]>

* Drop quantified speedup from tick comparison comment

Co-authored-by: Copilot <[email protected]>

* Remove _timeProvider comment

Co-authored-by: Copilot <[email protected]>

* Make UTC explicit in token validity tick projection

DateTime.Ticks ignores Kind, so .Ticks on a non-UTC DateTime would silently miscompare against UtcTicks. Construct DateTimeOffset with TimeSpan.Zero so UTC interpretation is explicit at the assignment, and a Local-kind DateTime would throw instead of silently mismatching.

Also trim the verbose XML doc on GarnetAadAuthenticatorTests.

Co-authored-by: Copilot <[email protected]>

* Comment UTC contract on token validity tick projection

Co-authored-by: Copilot <[email protected]>

* Drop redundant CoarseTimeProvider/default-clock tests

Co-authored-by: Copilot <[email protected]>

* Update IsAuthorized BDN to match CoarseTimeProvider API

Co-authored-by: Copilot <[email protected]>

* Trim IsAuthorized BDN to DateTime vs CoarseTimeProvider

Drop the Stopwatch.GetTimestamp variant and a verbose test comment.

Co-authored-by: Copilot <[email protected]>

* Apply dotnet format

Co-authored-by: Copilot <[email protected]>

* Make CoarseTimeProvider constructor private

Singleton via Instance is the only intended use; private ctor prevents accidental allocations.

Co-authored-by: Copilot <[email protected]>

* Trim CoarseTimeProvider.Instance xmldoc

Co-authored-by: Copilot <[email protected]>

* Add minimal CoarseTimeProvider tests

Co-authored-by: Copilot <[email protected]>

* Apply dotnet format

Co-authored-by: Copilot <[email protected]>

---------

Co-authored-by: Tiago Napoli <[email protected]>
Co-authored-by: Copilot <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants