
Fix stale tlmcnt Redis keys causing interface disconnect loops #2857

Merged
mcosgriff merged 5 commits into main from
2855-stale-tlmcnt-redis-keys-cause-interface-disconnect-loops
Feb 24, 2026


@mcosgriff
Contributor

Stale TELEMETRYCNTS Redis keys left over from removed packet definitions caused RuntimeError in init_tlm_packet_counts and sync_tlm_packet_counts, which propagated through handle_packet() into handle_connection_lost(), triggering a continuous disconnect/reconnect loop.
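
The failure mode described above can be illustrated with a toy model. This is a minimal sketch and not OpenC3 code: `FakeTelemetry`, `sync_counts_unguarded`, and `run_interface` are hypothetical stand-ins, and the only behaviour taken from the description is that looking up an unknown packet name raises RuntimeError.

```python
class FakeTelemetry:
    """Hypothetical stand-in for the packet definition lookup."""

    def __init__(self, known_packets):
        self.known_packets = set(known_packets)

    def packet(self, target_name, packet_name):
        if packet_name not in self.known_packets:
            # Assumed behaviour: unknown packet names raise RuntimeError.
            raise RuntimeError(
                f"Telemetry packet '{target_name} {packet_name}' does not exist"
            )
        return packet_name


def sync_counts_unguarded(telemetry, target_name, redis_counts):
    """Pre-fix shape: a single stale key aborts the whole sync."""
    counts = {}
    for packet_name, count in redis_counts.items():
        telemetry.packet(target_name, packet_name)  # raises on a stale key
        counts[packet_name] = int(count)
    return counts


def run_interface(telemetry, redis_counts, max_attempts=3):
    """Toy connection loop: an error while handling a packet drops the link."""
    disconnects = 0
    for _ in range(max_attempts):
        try:
            sync_counts_unguarded(telemetry, "INST", redis_counts)
            return disconnects  # packet handled, interface stays connected
        except RuntimeError:
            # The stale key is still in Redis, so every reconnect fails again.
            disconnects += 1
    return disconnects
```

With one stale key present, for example `run_interface(FakeTelemetry(["HEALTH_STATUS"]), {"HEALTH_STATUS": "5", "OLD_PACKET": "9"})`, every attempt in this model ends in a disconnect, because nothing ever removes or skips the stale key.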

Changes

  • init_tlm_packet_counts and sync_tlm_packet_counts now catch RuntimeError for unknown packet names and skip stale keys rather than raising.
  • A WARN is logged once per stale key per interface restart. The warned-key set is cleared on each init_tlm_packet_counts call so memory is bounded to the current connection epoch.
  • Unit tests added for init_tlm_packet_counts and sync_tlm_packet_counts (target_model) and an integration test confirming the interface stays connected when stale keys are present.
  • just test now accepts optional arguments, so `just test <file>` works alongside the bare `just test`.
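
The skip-and-warn-once behaviour in the first two bullets can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged implementation: the `PacketCounts` class, its method names, and the logger name are hypothetical, and the real `init_tlm_packet_counts` and `sync_tlm_packet_counts` signatures are not shown in this conversation.

```python
import logging

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("stale_tlmcnt_sketch")


class PacketCounts:
    """Hypothetical holder for per-packet counts loaded from Redis."""

    def __init__(self, known_packets):
        self.known_packets = set(known_packets)
        self.counts = {}
        self.warned_keys = set()

    def _lookup(self, target_name, packet_name):
        # Stand-in for the packet lookup; unknown names raise RuntimeError.
        if packet_name not in self.known_packets:
            raise RuntimeError(f"Unknown packet '{target_name} {packet_name}'")

    def _load(self, target_name, redis_counts):
        for packet_name, count in redis_counts.items():
            try:
                self._lookup(target_name, packet_name)
            except RuntimeError:
                key = (target_name, packet_name)
                if key not in self.warned_keys:
                    # Warn once per stale key until the next init.
                    self.warned_keys.add(key)
                    logger.warning(
                        "Skipping stale count key %s %s", target_name, packet_name
                    )
                continue  # skip the stale key instead of raising
            self.counts[packet_name] = int(count)

    def init_counts(self, target_name, redis_counts):
        # Called on each interface restart: reset the warned set so its
        # size is bounded by the stale keys seen in the current epoch.
        self.warned_keys.clear()
        self.counts = {}
        self._load(target_name, redis_counts)

    def sync_counts(self, target_name, redis_counts):
        # Periodic sync: stale keys already warned about stay quiet.
        self._load(target_name, redis_counts)
```

In this sketch, calling `init_counts` followed by any number of `sync_counts` calls logs one warning per stale key, and a later `init_counts` logs it once more because the warned set was cleared.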

Closes #2855

mcosgriff and others added 3 commits February 20, 2026 13:21
Demonstrates that stale TELEMETRYCNTS Redis keys left over after a
plugin upgrade cause RuntimeError in both init_tlm_packet_counts and
sync_tlm_packet_counts, which propagates through handle_packet() and
triggers an interface disconnect/reconnect loop.

Also updates the just test recipe to accept optional args so that
`just test <file>` works alongside the bare `just test`.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Wrap System.telemetry.packet() calls in init_tlm_packet_counts and
sync_tlm_packet_counts with RuntimeError handling so stale TELEMETRYCNTS
keys left over after a plugin upgrade are skipped rather than propagating
an exception through handle_packet() into the interface disconnect logic.

A WARN is logged once per stale key per interface restart so operators
can identify leftover keys without flooding the log. The warned-keys set
is reset on each init_tlm_packet_counts call (each interface restart),
bounding memory to the stale keys seen in the current epoch only.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
@mcosgriff mcosgriff requested a review from ryanmelt February 20, 2026 20:49
@mcosgriff mcosgriff self-assigned this Feb 20, 2026
@codecov

codecov bot commented Feb 20, 2026

Codecov Report

❌ Patch coverage is 55.55556% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.82%. Comparing base (640782d) to head (1a59117).
⚠️ Report is 8 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| openc3/lib/openc3/models/target_model.rb | 55.55% | 12 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2857      +/-   ##
==========================================
+ Coverage   78.74%   78.82%   +0.08%     
==========================================
  Files         667      667              
  Lines       54497    54542      +45     
  Branches      731      731              
==========================================
+ Hits        42913    42993      +80     
+ Misses      11504    11469      -35     
  Partials       80       80              

| Flag | Coverage Δ |
|---|---|
| python | 80.84% <ø> (+0.12%) ⬆️ |
| ruby-api | 80.26% <ø> (+0.04%) ⬆️ |
| ruby-backend | 82.14% <55.55%> (+0.06%) ⬆️ |

@mcosgriff mcosgriff marked this pull request as ready for review February 24, 2026 19:37
Member

@jmthomas jmthomas left a comment


This same exact code is in Ruby. Please make the equivalent changes there.


@mcosgriff
Contributor Author

> This same exact code is in Ruby. Please make the equivalent changes there.

Updated the Ruby implementation to match the changes in Python.

@mcosgriff mcosgriff requested a review from jmthomas February 24, 2026 23:36
@mcosgriff mcosgriff merged commit 2d83173 into main Feb 24, 2026
31 of 32 checks passed
@mcosgriff mcosgriff deleted the 2855-stale-tlmcnt-redis-keys-cause-interface-disconnect-loops branch February 24, 2026 23:56
jmthomas pushed a commit that referenced this pull request Mar 21, 2026
* Add failing tests for stale tlmcnt Redis keys (issue #2855)

Demonstrates that stale TELEMETRYCNTS Redis keys left over after a
plugin upgrade cause RuntimeError in both init_tlm_packet_counts and
sync_tlm_packet_counts, which propagates through handle_packet() and
triggers an interface disconnect/reconnect loop.

Also updates the just test recipe to accept optional args so that
`just test <file>` works alongside the bare `just test`.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

* Fix stale tlmcnt Redis keys causing interface disconnect loops (#2855)

Wrap System.telemetry.packet() calls in init_tlm_packet_counts and
sync_tlm_packet_counts with RuntimeError handling so stale TELEMETRYCNTS
keys left over after a plugin upgrade are skipped rather than propagating
an exception through handle_packet() into the interface disconnect logic.

A WARN is logged once per stale key per interface restart so operators
can identify leftover keys without flooding the log. The warned-keys set
is reset on each init_tlm_packet_counts call (each interface restart),
bounding memory to the stale keys seen in the current epoch only.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

* Ran formatter

* Same issue was in the Ruby implementation. Wrapping System.telemetry.packet to protect against stale redis keys

---------

Co-authored-by: Claude Sonnet 4.6 <[email protected]>

Development

Successfully merging this pull request may close these issues.

Stale tlmcnt Redis Keys cause Interface Disconnect Loops

2 participants