Add throughput testing infrastructure and fix Python telemetry performance #2742
Conversation
Throughput Server (`examples/throughput_server/`):
- Standalone TCP/IP server for measuring COSMOS command/telemetry throughput
- Dual-port operation for INST (7778) and INST2 (7780) targets
- CCSDS packet encoding/decoding with configurable streaming rates
- Time-compensated streaming to maintain accurate rates up to 100 kHz
- Pre-allocated buffers for minimal allocation in hot paths
- Raw TCP rate test scripts (Ruby/Python) achieving ~300-500k cmd/s

DEMO Plugin Changes:
- Add THROUGHPUT_STATUS telemetry packet with rate/count metrics
- Add throughput commands: START_STREAM, STOP_STREAM, GET_STATS, RESET_STATS
- Add throughput_test procedures for INST (Ruby) and INST2 (Python)
- Add throughput screen for real-time monitoring
- Add plugin variables to toggle between simulator and throughput server
- Configure LengthProtocol for CCSDS packet framing when using the throughput server

Co-Authored-By: Claude Opus 4.5 <[email protected]>
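The time-compensated streaming loop can be sketched as follows. This is a minimal illustration, assuming nothing about the actual server in `examples/throughput_server/`; `stream_packets` and `send_fn` are hypothetical names. The key idea is scheduling against absolute deadlines so sleep jitter does not accumulate into rate drift:

```python
import time

def stream_packets(send_fn, rate_hz, duration_s):
    """Send packets at rate_hz, scheduling against absolute deadlines so
    that per-iteration sleep jitter does not accumulate into rate drift."""
    interval = 1.0 / rate_hz
    start = time.monotonic()
    sent = 0
    while time.monotonic() - start < duration_s:
        send_fn(sent)
        sent += 1
        # Sleep until the next absolute deadline; if we are behind, skip sleeping
        delay = (start + sent * interval) - time.monotonic()
        if delay > 0:
            time.sleep(delay)
    return sent

# At 1 kHz for 0.1 s this should send roughly 100 packets
count = stream_packets(lambda i: None, 1000, 0.1)
```

Because each sleep targets `start + sent * interval` rather than a fixed `interval`, an iteration that runs late is followed by a shorter (or skipped) sleep, keeping the long-run rate accurate.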
- Add fire-and-forget mode to CommandTopic.send_command when timeout <= 0 to skip ACK waiting for high-throughput command scenarios
- Add thread-safe packet caching in TargetModel with a 10-second timeout to reduce Redis lookups for repeated packet access
- Cache is automatically invalidated when set_packet is called
- Add unit tests for packet caching in both Ruby and Python

Co-Authored-By: Claude Opus 4.5 <[email protected]>
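A minimal sketch of the thread-safe, TTL-based caching pattern described above. The `TtlCache` class and its method names are illustrative, not the actual TargetModel implementation:

```python
import threading
import time

class TtlCache:
    """Minimal thread-safe cache with a per-entry time-to-live."""

    def __init__(self, ttl=10.0):
        self.ttl = ttl
        self._lock = threading.Lock()
        self._store = {}  # key -> (value, expiry_time)

    def get(self, key, loader):
        """Return the cached value, or call loader() and cache the result."""
        now = time.monotonic()
        with self._lock:
            entry = self._store.get(key)
            if entry is not None and entry[1] > now:
                return entry[0]
        value = loader()  # e.g. the expensive Redis lookup
        with self._lock:
            self._store[key] = (value, now + self.ttl)
        return value

    def invalidate(self, key):
        """Drop an entry, e.g. when a packet definition is updated."""
        with self._lock:
            self._store.pop(key, None)

cache = TtlCache(ttl=10.0)
calls = []
cache.get(("INST", "HEALTH_STATUS"), lambda: calls.append(1) or {"id": 1})
cache.get(("INST", "HEALTH_STATUS"), lambda: calls.append(1) or {"id": 1})
# The second get hits the cache, so the loader ran only once
```

Invalidating on writes (as set_packet does here) keeps the cache from serving a stale definition for up to a full TTL after an update.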
Root cause: `jsonpath_ng.parse()` was recompiling JSONPath expressions on every call, taking ~2.7 ms per call. When identifying packets in unique_id_mode (required for INST2 due to mixed CCSDS/JSON packet types), this caused ~5.7 ms overhead per packet, limiting throughput to ~320 Hz.

Changes:
- Add lru_cache to JSONPath parsing in JsonAccessor (103x speedup)
- Add orjson as an optional dependency for faster JSON parsing
- Fix missing self.queued initialization in the Python interface_microservice
- Update throughput test scripts for both Ruby and Python

Results:
- Python telemetry: 320 Hz → 3,545 Hz (10x improvement)
- Python now outperforms Ruby at high rates (3,545 Hz vs 2,627 Hz)
- Zero packet loss maintained at all tested rates

Co-Authored-By: Claude Opus 4.5 <[email protected]>
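Since orjson is optional, a typical pattern (sketched here; not necessarily the exact code in this PR) is to fall back to the stdlib `json` module when orjson is not installed:

```python
try:
    import orjson

    def json_loads(data):
        return orjson.loads(data)

    def json_dumps(obj):
        # orjson returns bytes; decode for a str-based API
        return orjson.dumps(obj).decode("utf-8")

except ImportError:
    import json

    def json_loads(data):
        return json.loads(data)

    def json_dumps(obj):
        return json.dumps(obj)

round_tripped = json_loads(json_dumps({"rate": 3545, "loss": 0.0}))
```

Callers use `json_loads`/`json_dumps` uniformly and transparently get the faster parser when it is available.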
Revert the bytearray optimization in Python protocols to maintain backward compatibility for custom protocol implementations. The change from bytes to bytearray could break user code that:
- Type checks self.data expecting bytes
- Relies on the immutability of self.data
- Uses bytes-specific operations

The JSONPath caching fix (the real 10x performance improvement) remains intact.

Also adds tests for the UPDATE_INTERVAL option in both the Ruby and Python interface_microservice to verify the queued writes functionality.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
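The three breakage modes listed above are easy to demonstrate in plain Python:

```python
data_bytes = b"\x00\x01"
data_array = bytearray(b"\x00\x01")

# 1. Type checks expecting bytes fail for bytearray
is_bytes = isinstance(data_array, bytes)          # False

# 2. Immutability no longer holds: bytearray can be mutated in place
data_array[0] = 0xFF                              # fine for bytearray
try:
    data_bytes[0] = 0xFF                          # TypeError: bytes are immutable
    bytes_mutable = True
except TypeError:
    bytes_mutable = False

# 3. bytes-specific operations differ, e.g. hashing (usable as dict keys)
try:
    hash(data_array)                              # TypeError: unhashable type
    array_hashable = True
except TypeError:
    array_hashable = False
```

Any custom protocol that keyed a dict on `self.data`, or assumed a packet buffer could not change underneath it, would have silently changed behavior, which is why the revert is the safe choice.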
Codecov Report ❌ Patch coverage is
Additional details and impacted files:

```
@@           Coverage Diff           @@
##             main    #2742   +/-  ##
=======================================
  Coverage   78.68%   78.69%
=======================================
  Files         671      671
  Lines       54738    54796    +58
  Branches      731      731
=======================================
+ Hits        43072    43122    +50
- Misses      11586    11594     +8
  Partials       80       80
```

Flags with carried forward coverage won't be shown. View full report in Codecov by Sentry.
```diff
 @interface.options.each do |option_name, option_values|
-  if option_name.upcase == 'OPTIMIZE_THROUGHPUT'
+  # OPTIMIZE_THROUGHPUT was changed to UPDATE_INTERVAL to better represent the setting
+  if option_name.upcase == 'UPDATE_INTERVAL' or option_name.upcase == 'OPTIMIZE_THROUGHPUT'
```
This was just a bug ... a missing keyword that we had already changed in Python and in the docs.
```diff
 while True:  # Loop until we get some data
     try:
-        data = self.read_socket.recv(4096, socket.MSG_DONTWAIT)
+        data = self.read_socket.recv(65535, socket.MSG_DONTWAIT)
```
Not sure how much of an optimization this is, but it matches Ruby.
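For context, the larger buffer mainly reduces the number of `recv` syscalls when data arrives faster than the loop drains it. A standalone sketch of the non-blocking read pattern (the `read_available` helper is illustrative, not the actual interface code; `MSG_DONTWAIT` is POSIX-only):

```python
import socket

def read_available(sock, bufsize=65535):
    """Drain whatever is currently readable from a socket without blocking."""
    chunks = []
    while True:
        try:
            data = sock.recv(bufsize, socket.MSG_DONTWAIT)
            if not data:             # an empty read means the peer closed
                break
            chunks.append(data)
        except BlockingIOError:      # EWOULDBLOCK: nothing more to read right now
            break
    return b"".join(chunks)

# Demonstrate with a local socket pair
a, b = socket.socketpair()
a.sendall(b"hello")
received = read_available(b)
a.close()
b.close()
```

With a 4096-byte buffer, a burst of packets can require many loop iterations (and syscalls) to drain; 65535 bytes lets a full burst come back in one call.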
Results calculated on my MacBook Pro M3 Max with 36 GB RAM, with Docker limited to 10 CPUs and 16 GB of memory.

Ruby results:

Python results:
```python
# This provides a ~1000x speedup for repeated accesses (2.7ms -> 2.7µs)
@lru_cache(maxsize=256)
def _parse_jsonpath(path):
    return parse(path)
```
This was the huge win for Python telemetry performance ... it only affects JSON and CBOR, but it affects everything if you have at least one JSON or CBOR packet defined.
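The caching effect is easy to observe with `functools.lru_cache` and `cache_info()`. This sketch uses a cheap stand-in for the expensive `jsonpath_ng.parse()`, since the exact call pattern depends on the accessor:

```python
from functools import lru_cache

parse_calls = 0

@lru_cache(maxsize=256)
def _parse_jsonpath(path):
    global parse_calls
    parse_calls += 1          # stands in for the expensive jsonpath_ng.parse()
    return ("compiled", path)

# Simulate 1000 packet identifications against the same two expressions
for _ in range(500):
    _parse_jsonpath("$.packet.id")
    _parse_jsonpath("$.packet.type")

info = _parse_jsonpath.cache_info()
# Only 2 misses (the first call per expression); the other 998 calls are hits
```

Because telemetry streams identify packets against a small fixed set of expressions, the hit rate in practice is effectively 100% after warm-up.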
The packet caching I added to target_model has a measurable impact on commanding. Python sees a 30-80% improvement and Ruby shows a 15-20% improvement. The caching optimization primarily benefits command operations where repeated packet lookups occur during burst sending.
Comparing these changes (plus other previous changes) from 6.10.4 to now:

With only binary CCSDS telemetry (no JSON, CBOR, XML, HTML):

Full telemetry (JSON, CBOR, XML, HTML):
clayandgen left a comment:
Going to run it myself shortly; in the meantime, a reminder to remove the cache statistics data!
Converting the Throughput scripts to a suite could be nice for the "Test Results" formatting!
Apple M4 Max, 36 GB RAM results. Python commanding seems consistently higher performance than on the M3 Max; Ruby results are comparable 👍
```
============================================================
SUMMARY (Ruby)
============================================================
Command Throughput:
  100 cmd burst:  421.4 cmd/s
  500 cmd burst:  455.5 cmd/s
  1000 cmd burst: 486.5 cmd/s
Telemetry Throughput:
  100 Hz target:  100.0 Hz (0.0% loss)
  1000 Hz target: 1000.8 Hz (0.0% loss)
  2000 Hz target: 1959.8 Hz (0.0% loss)
  3000 Hz target: 2958.2 Hz (9.99% loss)
  4000 Hz target: 2940.6 Hz (0% loss)
  5000 Hz target: 2840.4 Hz (0% loss)
============================================================
SUMMARY (Python)
============================================================
Command Throughput:
  100 cmd burst:  705.4 cmd/s
  500 cmd burst:  664.8 cmd/s
  1000 cmd burst: 724.0 cmd/s
Telemetry Throughput:
  100 Hz target:  99.0 Hz (0% loss)
  1000 Hz target: 976.4 Hz (0% loss)
  2000 Hz target: 1963.6 Hz (0% loss)
  3000 Hz target: 2949.6 Hz (0% loss)
  4000 Hz target: 3895.6 Hz (0% loss)
  5000 Hz target: 3788.0 Hz (0% loss)
```
1. Start the throughput server:

   ```bash
   python throughput_server.py
   ```
nit: `python examples/throughput_server/throughput_server.py`
```ruby
packets_received = final_cosmos_count - initial_cosmos_count

# Calculate actual rate from test data (more accurate than the server's TLM_SENT_RATE, which is stale)
actual_rate = packets_sent.to_f / duration
```
One observation is that the Command test uses the actual system clock time (Time.now) as part of the calculation, whereas the Telemetry test uses the measured duration. These are probably arbitrarily close, but I thought I'd note the discrepancy.
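For reference, timing both tests against a monotonic clock sidesteps wall-clock adjustments (NTP steps, DST) entirely. A Python sketch of the pattern (the `measure_rate` helper is illustrative, not code from the PR):

```python
import time

def measure_rate(send_fn, count):
    """Measure command rate against a monotonic clock, which is immune to
    the wall-clock adjustments that can skew time.time()/Time.now."""
    start = time.monotonic()
    for i in range(count):
        send_fn(i)
    elapsed = time.monotonic() - start
    return count / elapsed

rate = measure_rate(lambda i: None, 10000)
```

Ruby's equivalent is `Process.clock_gettime(Process::CLOCK_MONOTONIC)`, which would make the two tests directly comparable.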
Add throughput testing infrastructure and fix Python telemetry performance




Summary
This PR adds comprehensive throughput testing infrastructure and fixes a critical Python telemetry performance bottleneck, improving Python throughput by 10x.
Key Changes

- Caching JSONPath parsing in `JsonAccessor` improves throughput from ~320 Hz to 3,545 Hz

Performance Results

Python now outperforms Ruby at high telemetry rates while maintaining zero packet loss.

Root Cause Analysis

The Python performance issue was caused by `jsonpath_ng.parse()` recompiling JSONPath expressions on every call (~2.7 ms per call). When identifying packets in `unique_id_mode`, this caused ~5.7 ms overhead per packet. Adding `lru_cache` to cache parsed expressions reduced this to ~0.12 µs (23,000x speedup per call).

Files Changed

Performance Fixes:
- `openc3/python/openc3/accessors/json_accessor.py` - Add JSONPath caching with `lru_cache`
- `openc3/python/openc3/microservices/interface_microservice.py` - Fix missing `self.queued` initialization
- `openc3/python/pyproject.toml` - Add orjson as an optional dependency

Throughput Testing Infrastructure:
- `examples/throughput_server/` - New standalone throughput testing server
- `openc3-cosmos-demo/targets/INST/procedures/throughput_test.rb` - Ruby throughput test
- `openc3-cosmos-demo/targets/INST2/procedures/throughput_test.py` - Python throughput test
- `openc3-cosmos-demo/targets/*/screens/throughput.txt` - Throughput monitoring screens

Command/Telemetry Optimizations:
- `openc3/lib/openc3/topics/command_topic.rb` - Fire-and-forget mode
- `openc3/python/openc3/topics/command_topic.py` - Fire-and-forget mode
- `openc3/lib/openc3/models/target_model.rb` - Packet caching
- `openc3/python/openc3/models/target_model.py` - Packet caching

Test Coverage:
- `openc3/spec/microservices/interface_microservice_spec.rb` - UPDATE_INTERVAL test
- `openc3/python/test/microservices/test_interface_microservice.py` - UPDATE_INTERVAL tests
- `openc3/spec/models/target_model_spec.rb` - Packet caching tests
- `openc3/python/test/models/test_target_model.py` - Packet caching tests

Test plan
🤖 Generated with Claude Code