feat: add process memory telemetry and heap snapshot hooks #19494

Open
sjawhar wants to merge 3 commits into anomalyco:dev from sjawhar:feat/memory-telemetry

@sjawhar commented Mar 28, 2026

Issue for this PR

Fixes #16697

Type of change

  • Bug fix
  • New feature
  • Refactor / code improvement
  • Documentation

What does this PR do?

Adds process memory telemetry and signal-triggered heap snapshot support to diagnose memory leaks and growth patterns.

Features:

  • SIGUSR1 triggers heap snapshot capture
  • SIGUSR2 triggers one-shot memory sample
  • Periodic RSS/heap/JSC sampling with configurable intervals
  • Wired into all entry points (server, TUI, workspace serve)
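
For readers unfamiliar with the pattern, the wiring described above might look roughly like the following. This is a minimal sketch and not the code in this PR: `startTelemetry`, `SignalSource`, `TelemetryOptions`, and `MemorySample` are hypothetical names, the signal source and memory reader are injected so the sketch runs without a real process, and JSC-specific stats are left out.

```typescript
// Illustrative sketch only; all names below are hypothetical, not the PR's actual API.
interface MemorySample {
  at: number;            // wall-clock timestamp in ms
  rssBytes: number;
  heapUsedBytes: number;
  heapTotalBytes: number;
}

// Minimal shape of something that can deliver signals (assumed to resemble `process`).
interface SignalSource {
  on(signal: string, handler: () => void): unknown;
  off(signal: string, handler: () => void): unknown;
}

interface TelemetryOptions {
  signals: SignalSource;
  readMemory: () => { rss: number; heapUsed: number; heapTotal: number };
  intervalMs: number;                          // periodic sampling interval
  onSample: (sample: MemorySample) => void;    // sink for periodic and one-shot samples
  onSnapshotRequest: () => void;               // invoked when a heap snapshot is requested
}

// Starts periodic sampling and registers both signal handlers.
// Returns a function that undoes all of it.
function startTelemetry(opts: TelemetryOptions): () => void {
  const sample = () => {
    const m = opts.readMemory();
    opts.onSample({
      at: Date.now(),
      rssBytes: m.rss,
      heapUsedBytes: m.heapUsed,
      heapTotalBytes: m.heapTotal,
    });
  };

  const timer = setInterval(sample, opts.intervalMs);
  opts.signals.on("SIGUSR1", opts.onSnapshotRequest); // SIGUSR1: heap snapshot
  opts.signals.on("SIGUSR2", sample);                 // SIGUSR2: one-shot memory sample

  return () => {
    clearInterval(timer);
    opts.signals.off("SIGUSR1", opts.onSnapshotRequest);
    opts.signals.off("SIGUSR2", sample);
  };
}
```

In a real process, `signals` would presumably be backed by `process` and `readMemory` by `process.memoryUsage()`. A one-shot sample could then be requested from a shell with `kill -USR2 <pid>`.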

This enables production diagnostics for memory issues such as the 187 GB RSS incident.

How did you verify your code works?

  • Telemetry startup wired into runtime entry points
  • Signal handlers tested locally
  • Rebase conflicts in telemetry wiring resolved to preserve branch behavior

Checklist

  • I have tested my changes locally
  • I have not included unrelated changes in this PR

@github-actions bot added the needs:compliance label (this means the issue will auto-close after 2 hours) Mar 28, 2026
@github-actions bot removed the needs:compliance label Mar 28, 2026
@github-actions
Contributor

Thanks for updating your PR! It now meets our contributing guidelines. 👍

@sjawhar force-pushed the feat/memory-telemetry branch from e98d77a to 5975b95 on March 30, 2026 21:36
sjawhar added 3 commits April 2, 2026 00:29
- Remove SIGTERM snapshot handler (serve.ts owns shutdown)
- Switch on-demand snapshots from SIGUSR2 to SIGUSR1
- Add RSS size guard (>10GB) and disk space check before snapshots
- Add snapshot-in-flight guard against concurrent signals
- Use Bun.gc(false) for regular sampling, Bun.gc(true) only for snapshots
- Add 60s startup grace period for growth alerts
- Add stderr fallback for critical telemetry events
- Wire telemetry into web.ts, acp.ts, workspace-serve.ts
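
As a rough illustration of the guards listed in these commit notes (snapshot-in-flight, RSS size, disk space, startup grace period), here is a minimal sketch. It is not taken from the PR: `createSnapshotGate`, `guardedSnapshot`, and `shouldAlertOnGrowth` are hypothetical names. Two details are assumptions: the 10GB limit is read as binary gigabytes, and the disk check requires free space at least equal to current RSS. The actual capture call is injected rather than shown.

```typescript
// Illustrative sketch only; names and threshold details are hypothetical.
const MAX_SNAPSHOT_RSS_BYTES = 10 * 1024 ** 3; // ">10GB" per the notes; binary GiB is an assumption
const STARTUP_GRACE_MS = 60_000;               // 60s startup grace period per the notes

type SnapshotDecision = "ok" | "in-flight" | "rss-too-large" | "low-disk";

interface SnapshotGate {
  tryAcquire(): SnapshotDecision;
  release(): void;
}

// Decides whether a snapshot may start; at most one may be in flight at a time.
function createSnapshotGate(deps: {
  rssBytes: () => number;
  freeDiskBytes: () => number;
}): SnapshotGate {
  let inFlight = false;
  return {
    tryAcquire() {
      if (inFlight) return "in-flight";
      const rss = deps.rssBytes();
      if (rss > MAX_SNAPSHOT_RSS_BYTES) return "rss-too-large";
      // Assumption: the snapshot file could be roughly as large as RSS.
      if (deps.freeDiskBytes() < rss) return "low-disk";
      inFlight = true;
      return "ok";
    },
    release() {
      inFlight = false;
    },
  };
}

// Runs `capture` only if the gate allows it, and always releases the gate afterwards.
async function guardedSnapshot(
  gate: SnapshotGate,
  capture: () => Promise<void>,
): Promise<SnapshotDecision> {
  const decision = gate.tryAcquire();
  if (decision !== "ok") return decision;
  try {
    await capture();
    return "ok";
  } finally {
    gate.release();
  }
}

// Growth alerts are suppressed until the process has been up for the grace period.
function shouldAlertOnGrowth(
  uptimeMs: number,
  growthBytes: number,
  thresholdBytes: number,
): boolean {
  return uptimeMs >= STARTUP_GRACE_MS && growthBytes >= thresholdBytes;
}
```

Because `tryAcquire` sets the flag synchronously, a second signal arriving while a capture is still pending is rejected with `"in-flight"` and does not start another snapshot.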
@sjawhar force-pushed the feat/memory-telemetry branch from 1f30205 to c597fe2 on April 2, 2026 13:45

Development

Successfully merging this pull request may close these issues.

Multiple memory leaks cause unbounded RAM growth during extended TUI usage