
[metrics] fix concurrency panic in processing duration #106

Merged
harveyxia merged 3 commits into main from fix-processing-duration-metric
Jan 20, 2026


@harveyxia
Collaborator

@harveyxia harveyxia commented Jan 20, 2026

The processing duration metric implementation uses google/btree, which is not safe to mutate during iteration.

The fix is simple: accumulate the items during iteration, then mutate the tree after the iteration has finished.
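A minimal sketch of that collect-then-mutate pattern follows. It uses only the standard library: `orderedSet`, `entry`, and `removeUpTo` are hypothetical stand-ins invented for illustration, not google/btree's API and not the code changed in this PR. The stand-in only mimics the shape of a descending callback iterator.

```go
package main

import "sort"

// entry is a hypothetical record keyed by an integer generation.
type entry struct {
	name       string
	generation int
}

// orderedSet is a stand-in for an ordered container with callback iteration.
// It is NOT google/btree; like a B-tree, it is unsafe to mutate while a
// traversal is in progress because deletions shift the underlying storage.
type orderedSet struct {
	items []entry // kept sorted ascending by generation
}

// Insert adds e, replacing any existing entry with the same generation.
func (s *orderedSet) Insert(e entry) {
	i := sort.Search(len(s.items), func(i int) bool {
		return s.items[i].generation >= e.generation
	})
	if i < len(s.items) && s.items[i].generation == e.generation {
		s.items[i] = e
		return
	}
	s.items = append(s.items, entry{})
	copy(s.items[i+1:], s.items[i:])
	s.items[i] = e
}

// Delete removes the entry with the given generation, if present.
func (s *orderedSet) Delete(generation int) bool {
	i := sort.Search(len(s.items), func(i int) bool {
		return s.items[i].generation >= generation
	})
	if i == len(s.items) || s.items[i].generation != generation {
		return false
	}
	s.items = append(s.items[:i], s.items[i+1:]...)
	return true
}

// DescendLessOrEqual visits entries with generation <= pivot, highest first,
// until visit returns false.
func (s *orderedSet) DescendLessOrEqual(pivot int, visit func(entry) bool) {
	i := sort.Search(len(s.items), func(i int) bool {
		return s.items[i].generation > pivot
	})
	for i--; i >= 0; i-- {
		if !visit(s.items[i]) {
			return
		}
	}
}

// removeUpTo deletes every entry with generation <= pivot in two phases.
func removeUpTo(s *orderedSet, pivot int) []entry {
	var collected []entry

	// Phase 1: read-only traversal. The callback never touches s.
	s.DescendLessOrEqual(pivot, func(e entry) bool {
		collected = append(collected, e)
		return true
	})

	// Phase 2: mutate only after the traversal has returned.
	for _, e := range collected {
		s.Delete(e.generation)
	}
	return collected
}
```

With generations 1 through 5 inserted, `removeUpTo(s, 3)` returns the entries for 3, 2, and 1 in that order and leaves 4 and 5 in the set. The cost is one extra slice holding the matched items for the duration of the call.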

I confirmed the root cause by reproducing it in a unit test (the failure is non-deterministic, so the test may need to be run a few times), where I saw the exact same index out of range panic:

panic: runtime error: index out of range [47] with length 31

goroutine 21 [running]:
github.com/google/btree.(*node[...]).iterate(0x104a953a0, 0x0, {{{0x140004ac230?, 0x4}, {0x140004ac218, 0x6}, 0x23c, {0x0, 0x0, 0x0}, ...}, ...}, ...)
	/Users/harvey.xia/.gvm/pkgsets/go1.24/global/pkg/mod/github.com/google/[email protected]/btree_generic.go:541 +0x998
github.com/google/btree.(*node[...]).iterate(0x104a953a0, 0x14000295ce8, {{{0x140004ac230?, 0x4}, {0x140004ac218, 0x6}, 0x23c, {0x0, 0x0, 0x0}, ...}, ...}, ...)
	/Users/harvey.xia/.gvm/pkgsets/go1.24/global/pkg/mod/github.com/google/[email protected]/btree_generic.go:560 +0x940
github.com/google/btree.(*BTreeG[...]).DescendLessOrEqual(0x0?, {{0x140004ac230?, 0x4}, {0x140004ac218, 0x6}, 0x23c, {0x0, 0x0, 0x0}, 0x0}, ...)
	/Users/harvey.xia/.gvm/pkgsets/go1.24/global/pkg/mod/github.com/google/[email protected]/btree_generic.go:797 +0x120
github.com/reddit/achilles-sdk/pkg/fsm/metrics/internal.(*ProcessingStartTimes).SetRangeFailed(0x1400007c620, {0x140004ac218, 0x6}, {0x140004ac230, 0x4}, 0x23c)
	/Users/harvey.xia/Code/achilles-sdk/pkg/fsm/metrics/internal/processing_duration.go:143 +0x134
github.com/reddit/achilles-sdk/pkg/fsm/metrics/internal.Test_ProcessingStartTimes_Concurrency.func2(0x0?)
	/Users/harvey.xia/Code/achilles-sdk/pkg/fsm/metrics/internal/processing_duration_test.go:599 +0x164
created by github.com/reddit/achilles-sdk/pkg/fsm/metrics/internal.Test_ProcessingStartTimes_Concurrency in goroutine 7
	/Users/harvey.xia/Code/achilles-sdk/pkg/fsm/metrics/internal/processing_duration_test.go:585 +0x320

@harveyxia harveyxia requested a review from a team as a code owner January 20, 2026 15:20
@harveyxia harveyxia merged commit 7cb8935 into main Jan 20, 2026
1 check passed
@harveyxia harveyxia deleted the fix-processing-duration-metric branch January 20, 2026 16:02