Skip to content

Buffer fixes and dict improvement#419

Merged
lenzo-ka merged 7 commits intomainfrom
kal-fixes-20250716
Jul 29, 2025
Merged

Buffer fixes and dict improvement#419
lenzo-ka merged 7 commits intomainfrom
kal-fixes-20250716

Conversation

@lenzo-ka
Copy link
Contributor

@lenzo-ka lenzo-ka commented Jul 16, 2025

Fix Buffer and Performance Issues

This addresses buffer issues and a performance tweak on reading dicts

Summary of Changes

1. Buffer Overflow Protection (CVE-worthy)

  • Fixed: Unsafe strcpy() usage in feature warping modules that could cause crashes with long parameter strings
  • Files: fe_warp_affine.c, fe_warp_inverse_linear.c, fe_warp_piecewise_linear.c
  • Solution: Use strncpy() with proper bounds checking and null termination

2. NULL Pointer Dereference Prevention

  • Fixed: Missing NULL check after malloc() in WebRTC VAD module
  • File: webrtc_vad.c
  • Solution: Add NULL check to prevent potential crashes

3. Dictionary Write Performance Optimization

  • Fixed: O(n²) string concatenation causing slow dictionary writes
  • File: dict.c
  • Solution: Replace repeated strcat() with single-pass memcpy()
  • Performance: 1.55x speedup for full CMU dictionary (304ms → 196ms)

Testing

All fixes include comprehensive unit tests:

  • test_fe_warp_overflow - Validates buffer overflow protection
  • test_vad_alloc - Tests memory allocation safety
  • test_dict_strcat - Verifies dictionary output correctness

All 95 tests pass. No breaking changes.

Impact

  • Security: Prevents potential crashes and exploits from buffer overflows
  • Reliability: Eliminates NULL pointer dereference risk
  • Performance: 35% faster dictionary writes, more significant gains with longer pronunciations (up to 2.58x)

lenzo-ka added 3 commits July 16, 2025 13:54
- Replace strcpy with strncpy in fe_warp_{affine,inverse_linear,piecewise_linear}.c
- Add bounds checking and ensure null termination for 256-byte buffers
- Add test_fe_warp_overflow to verify fix handles strings up to 511 chars
- Prevents potential memory corruption and crashes with long parameter strings
- Check malloc return value in WebRtcVad_Create to prevent NULL dereference
- Add test_vad_alloc to verify memory allocation safety
- Test includes 100 create/destroy cycles and NULL handling
- Prevents segfault on memory allocation failure
- Replace repeated strcat() calls with single-pass memcpy() approach
- Changes time complexity from O(n²) to O(n)
- Add test_dict_strcat to verify correctness of optimization
- Performance improvement: 1.24x-2.58x speedup (avg 1.70x)
- CMU dictionary write time reduced from 304ms to 196ms (~35% faster)
@lenzo-ka lenzo-ka requested a review from dhdaines July 16, 2025 18:13
@lenzo-ka lenzo-ka changed the title Buffer fixes and dict read improvement Buffer fixes and dict improvement Jul 16, 2025
@dhdaines
Copy link
Contributor

Oh dear! Thank you for this and sorry about the delay for reviewing (review to follow shortly)

Copy link
Contributor

@dhdaines dhdaines left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved (the comment on CI is not critical)

uses: actions/checkout@v4
- name: Wait for dpkg lock
run: |
while sudo fuser /var/lib/dpkg/lock-frontend >/dev/null 2>&1; do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a possibility that this could block forever? Perhaps it's better to fail fast...

VadInst* WebRtcVad_Create(void) {
VadInstT* self = (VadInstT*)malloc(sizeof(VadInstT));

if (self == NULL) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. This should either be reported upstream (to Mozilla and/or Google) or we should update the WebRTC code from upstream... no longer sure where it lives exactly

phones = ckd_calloc(1, phlen);
pos = phones;
for (j = 0; j < dict_pronlen(dict, i); ++j) {
strcat(phones, dict_ciphone_str(dict, i, j));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. This is actually safe (because of the calculation of phlen) but it's really inefficient, and strcat should just be banished in any case...

return;
}
is_neutral = NO;
strcpy(temp_param_str, param_str);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hoo boy ... yes this is some CMU legacy code ...

@lenzo-ka lenzo-ka merged commit 158afa3 into main Jul 29, 2025
21 checks passed
@dhdaines dhdaines added this to the 5.1.0 milestone Aug 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants