Shutdown statsbeat after failure threshold is met #1127
lzchen merged 9 commits into census-instrumentation:master from
        not state.get_statsbeat_initial_success():
    # If ingestion threshold during statsbeat initialization is reached, return back code to shut it down
    if _statsbeat_failed_to_ingest():
        return -2
It would be nice to use a named constant instead of the raw value here. -2 is the shutdown signal, but looking at the code here I have no clue what -1 means.
-1 is the exception signal for telemetry in general; -2 is the shutdown signal for the statsbeat exporter only. I agree a constant would be better, but that would probably require a refactor of all the return signals, which I would prefer to leave to a different PR.
Using raw numbers like these instead of enums or constants is usually considered bad practice in other languages, since the code becomes harder for other developers to understand and maintain. Maybe this is the way to go in Python; just my two cents.
Is -2 introduced in this PR? If so, how much work would it take to refactor?
@hectorhdzg @heyams
Created new issue to track this refactor #1128
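For readers following the constants discussion, the refactor could look roughly like the sketch below. The `TransmitResult` enum, its member names, and the `should_shutdown` helper are hypothetical and not part of the exporter; only the values -1 (general telemetry exception) and -2 (statsbeat shutdown) come from this thread.

```python
from enum import IntEnum


class TransmitResult(IntEnum):
    """Hypothetical names for the raw return codes discussed in this thread."""

    EXCEPTION = -1           # exception signal for telemetry in general
    STATSBEAT_SHUTDOWN = -2  # shutdown signal for the statsbeat exporter only


def should_shutdown(result: int, is_stats_exporter: bool) -> bool:
    # Mirrors the `self._is_stats_exporter() and result == -2` check,
    # but compares against a named member instead of a bare literal.
    return is_stats_exporter and result == TransmitResult.STATSBEAT_SHUTDOWN
```

Because `IntEnum` members compare equal to plain integers, any existing caller that still checks `result == -2` would keep working during such a refactor.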
    pass

if self._is_stats_exporter() and \
        not state.get_statsbeat_shutdown() and \
I can see you check whether shutdown was called in several places. Is the exporter process for Statsbeat expected to keep running after shutdown?
This is for very specific race conditions in which multiple threads could be accessing the same piece of "check if we need to shut down" logic. It also serves as a good sanity check to prevent any statsbeat logic from executing if the statsbeat exporter is already shut down.
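A minimal sketch of the kind of guard described above, assuming the shutdown flag lives in a shared state object protected by a lock. The `StatsbeatState` class and its `try_shutdown` method are hypothetical illustrations; the real `state` module in this PR may be structured differently.

```python
import threading


class StatsbeatState:
    """Hypothetical stand-in for the shared statsbeat state module."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._shutdown = False

    def get_statsbeat_shutdown(self) -> bool:
        # Read the flag under the lock so callers see a consistent value.
        with self._lock:
            return self._shutdown

    def try_shutdown(self) -> bool:
        """Flip the flag once; return True only for the caller that flipped it."""
        with self._lock:
            if self._shutdown:
                return False
            self._shutdown = True
            return True
```

With this shape, several threads can reach the "do we need to shut down" logic at once, but only the first one gets `True` back and performs the actual shutdown; the others see `False` and skip it.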
@lzchen you're tagging the wrong Helen.
contrib/opencensus-ext-azure/opencensus/ext/azure/common/transport.py
batch = self.apply_telemetry_processors(batch)
result = self._transmit(batch)
# If statsbeat exporter and received signal to shutdown
if self._is_stats_exporter() and result == -2:
_statsbeat_failed_to_ingest above could return the counter, and the code here could just check whether the counter is >= 3. -2 seems so random.
_statsbeat_failed_to_ingest is a private function used only within transport to track the failure count and determine whether the threshold is reached. Returning the result code back to the exporter is by design. I agree the codes are a bit arbitrary (-1, -2, etc.), but changing them can be part of a different PR. See my response here as well.
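To illustrate the "count and check in one place" design described above, here is a rough sketch. The `FailureCounter` class is hypothetical and is not the actual `_statsbeat_failed_to_ingest` implementation; the threshold of 3 is assumed from the ">= 3" mentioned in this thread.

```python
import threading

FAILURE_THRESHOLD = 3  # assumed from the ">= 3" mentioned in this thread


class FailureCounter:
    """Hypothetical helper; not the real `_statsbeat_failed_to_ingest`."""

    def __init__(self, threshold: int = FAILURE_THRESHOLD) -> None:
        self._threshold = threshold
        self._count = 0
        self._lock = threading.Lock()

    def failed_to_ingest(self) -> bool:
        """Record one failure and report whether the threshold is reached."""
        with self._lock:
            self._count += 1
            return self._count >= self._threshold
```

Keeping the increment and the comparison behind one call means the caller only ever sees a yes/no answer, which is what allows the transport layer to translate it into a single shutdown return code.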
Following specs
Similar to Node.js, retry only occurs on successes (200), so shutdown occurs only when 3 attempts are reached.
@hectorhdzg @heyams