This repository was archived by the owner on Sep 17, 2025. It is now read-only.

Shutdown statsbeat after failure threshold is met #1127

Merged
lzchen merged 9 commits into census-instrumentation:master from lzchen:stats
Jun 14, 2022
Conversation

Contributor

@lzchen lzchen commented Jun 7, 2022

Following specs

Similar to Node.js, retry only occurs on successes (200), so shutdown occurs only when 3 attempts are reached.

@hectorhdzg @heyams

@lzchen lzchen requested review from a team, aabmass, hectorhdzg and songy23 as code owners June 7, 2022 20:10
not state.get_statsbeat_initial_success():
# If ingestion threshold during statsbeat initialization is reached, return back code to shut it down
if _statsbeat_failed_to_ingest():
return -2

It would be nice to use some kind of constant instead of the raw value here. -2 is the shutdown signal, but looking at the code here I have no clue what -1 means.

Contributor Author

@lzchen lzchen Jun 8, 2022

-1 is the exception signal for telemetry in general. -2 is the shutdown signal for the statsbeat exporter only. I agree a constant would be better, but that would probably require a refactor of all the return signals, which I would prefer to leave to a different PR.

Well, using these kinds of numbers instead of enumerators or constants is usually considered pretty bad practice in other languages: the code is harder for other developers to understand and maintain. Maybe this is the way to go in Python; just my two cents here.

Is -2 introduced in this PR? If so, how much work would it take to refactor?

Contributor Author

@hectorhdzg @heyams
Created new issue to track this refactor #1128
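
For illustration only, the named-constant refactor requested in this thread could look roughly like the sketch below. The enum name `TransportStatus`, its member names, and the helper `should_shutdown_statsbeat` are hypothetical and are not taken from the repository; only the values -1 and -2 and their meanings come from the discussion above.

```python
from enum import IntEnum


class TransportStatus(IntEnum):
    """Hypothetical names for the integer signals discussed in this thread."""
    EXCEPTION = -1           # exception signal for telemetry in general
    STATSBEAT_SHUTDOWN = -2  # shutdown signal for the statsbeat exporter only


def should_shutdown_statsbeat(result, is_stats_exporter):
    """Return True only when a statsbeat exporter received the shutdown signal."""
    return bool(is_stats_exporter) and result == TransportStatus.STATSBEAT_SHUTDOWN
```

Because `IntEnum` members compare equal to plain integers, call sites that still return or compare against the raw -1 and -2 would keep working while the refactor is rolled out gradually.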

pass

if self._is_stats_exporter() and \
not state.get_statsbeat_shutdown() and \

I can see you check if shutdown was called in several places, is the exporter process for Statsbeat expected to keep running after shutdown?

Contributor Author

This is for very specific race conditions in which multiple threads could be accessing the same piece of "check if we need to shut down" logic. It also serves as a good sanity check to prevent any statsbeat logic from executing if the statsbeat exporter is already shut down.
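
A minimal sketch of the kind of guard described above, assuming a lock-protected flag. The class `StatsbeatState` and its methods are hypothetical stand-ins, not the repository's actual `state` module; the point is only that repeated "is it already shut down?" checks are cheap and that exactly one thread performs the shutdown.

```python
import threading


class StatsbeatState:
    """Hypothetical shared state: a shutdown flag guarded by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._shutdown = False

    def get_shutdown(self):
        # Safe to call from any thread before running statsbeat logic.
        with self._lock:
            return self._shutdown

    def set_shutdown(self):
        """Mark shutdown. Return True for the first caller only."""
        with self._lock:
            if self._shutdown:
                return False  # another thread already shut it down
            self._shutdown = True
            return True
```

With a guard like this, several threads that reach the shutdown decision at the same time can all call `set_shutdown()`, and only the one that receives `True` goes on to tear down the exporter.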


heyams commented Jun 8, 2022

@lzchen you're tagging the wrong helen.

batch = self.apply_telemetry_processors(batch)
result = self._transmit(batch)
# If statsbeat exporter and received signal to shutdown
if self._is_stats_exporter() and result == -2:

@heyams heyams Jun 8, 2022

_statsbeat_failed_to_ingest above can return the counter and here just check if the counter is >= 3. -2 seems so random.

Contributor Author

@lzchen lzchen Jun 8, 2022

_statsbeat_failed_to_ingest is a private function used only within transport to handle the count as well as determining whether it is reached. The returning of the result code back to the exporter is by design. I agree the codes are a bit random (-1, -2, etc) but changing them can be part of a different PR. See my response here as well.
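
To make the design described above concrete, here is a rough sketch of a private counter that tracks failures and reports whether the threshold has been reached, with the transport passing a result code back to the caller. Everything here is an assumption for illustration: the names `FAILURE_THRESHOLD`, `_record_failure_and_check`, and `transmit`, the use of 0 for success, and the use of -1 for a failure below the threshold. Only the threshold of 3 and the -2 shutdown value come from this PR's discussion.

```python
import threading

FAILURE_THRESHOLD = 3  # from the PR description: shut down after 3 attempts
SHUTDOWN_SIGNAL = -2   # value discussed in this review
EXCEPTION_SIGNAL = -1  # assumed here for a failure below the threshold
SUCCESS = 0            # assumed success code for this sketch

_lock = threading.Lock()
_failure_count = 0


def _record_failure_and_check():
    """Increment the private counter and report whether the threshold is met."""
    global _failure_count
    with _lock:
        _failure_count += 1
        return _failure_count >= FAILURE_THRESHOLD


def transmit(send_ok):
    """Hypothetical transport: the caller sees only the result code."""
    if send_ok:
        return SUCCESS
    if _record_failure_and_check():
        return SHUTDOWN_SIGNAL
    return EXCEPTION_SIGNAL
```

Keeping the counter private means the exporter reacts only to the returned code, which is the separation the author argues for; the alternative suggested by the reviewer would instead expose the count and move the `>= 3` comparison into the exporter.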

@hectorhdzg hectorhdzg left a comment

LGTM

@lzchen lzchen merged commit 7cbf82f into census-instrumentation:master Jun 14, 2022
@lzchen lzchen deleted the stats branch June 14, 2022 17:21
@lzchen lzchen added the azure Microsoft Azure label Nov 9, 2022
inirudebwoy pushed a commit to inirudebwoy/opencensus-python that referenced this pull request Jan 11, 2023