Skip to content

Comments

Fixing longTermMax stats corruption in Regulator#1365

Merged
cchafe merged 1 commit intodevfrom
bugfix/regulator-stats-corruption
Dec 23, 2024
Merged

Fixing longTermMax stats corruption in Regulator#1365
cchafe merged 1 commit intodevfrom
bugfix/regulator-stats-corruption

Conversation

@mikedickey
Copy link
Collaborator

When auto headroom is updated dynamically, it could cause a race condition where calcAuto() was called before any measurements were made (ctr == 0). This would cause a recording of "max" with a negative value (the reset() value of -999999.0). This is such a large (or small?) value that it corrupted the longTermMax rolling average for a long duration of time.

The result of longTermMax corruption is that tolerance would be reset to a minimum value equal to the duration of a single audio buffer. This resulted in audio received from the participant being very garbled until longTermMax was finally able to recover.

It's a pretty nasty bug. A bit hard to find and unwind, but the fix is pretty clear and obvious. I just added some extra sanity checks to calcAuto to ensure that max (and min) are only used with sane values.

The "good" news is that I think this was only recently introduced by adding the ability to adjust headroom dynamically (via OSC). The "bad" news is that we currently have this running in production and I'm pretty sure vs-agent is calling that method at least once on startup for all studio sessions.

When auto headroom is updated dynamically, it could cause a race
condition where calcAuto() was called before any measurements
were made (ctr == 0). This would cause a recording of "max" with
a negative value (the reset() value of -999999.0). This is such
a large (or small?) value that it corrupted the longTermMax
rolling average for a long duration of time.

The result of longTermMax corruption is that tolerance would be
reset to a minimum value equal to the duration of a single audio
buffer. This resulted in audio received from the participant being
very garbled until longTermMax was finally able to recover.

It's a pretty nasty bug. A bit hard to find and unwind, but the fix
is pretty clear and obvious. I just added some extra sanity checks
to calcAuto to ensure that max (and min) are only used with sane
values.

The "good" news is that I think this was only recently introduced
by adding the ability to adjust headroom dynamically (via OSC).
The "bad" news is that we currently have this running in production
and I'm pretty sure vs-agent is calling that method at least once
on startup for all studio sessions.
@mikedickey mikedickey requested review from cchafe and nwang92 December 23, 2024 20:15
@mikedickey
Copy link
Collaborator Author

I'm actually not entirely sure yet how this gets triggered. I've been able to reproduce it even without dynamically updating headroom. I thought tat was making it easier to repro at least, but it may have been a cooincidence. I think it may have more to do with peer having a different buffer size because I haven't been able to repro (at least yet) when they are the same. So impact surface may not be all that big.. Whatever the specific triggers may be, I'm still confident in the fix.

Copy link
Collaborator

@cchafe cchafe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch and fix!

@cchafe cchafe merged commit 42b57fa into dev Dec 23, 2024
19 checks passed
@mikedickey mikedickey deleted the bugfix/regulator-stats-corruption branch December 24, 2024 18:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants