Skip to content

Pub/Sub Subscriber does not catch & retry UNAVAILABLE errors #4234

@kir-titievsky

Description

@kir-titievsky

A basic Pub/Sub message consumer stops consuming messages after a retryable error (see stack trace below, but in short _Rendezvous: <_Rendezvous of RPC that terminated with (StatusCode.UNAVAILABLE, The service was unable to fulfill your request. Please try again. [code=8a75])>). The app does not crash but the stream never recovers and continue to receive messages. Interesting observations;

  • If I simply turn off WiFi on my laptop and run the same code, it keeps retrying until the machine is connected to the network and functions as expected. This tells me that this is a reaction to the specific StatusCode
  • The exception sometimes happens on startup sometimes mid-stream.

Expected behavior:

  • The application code would continue retrying to build the streamingPull connection and eventually recover and receive messages.
  • This would be handled and surfaced as a warning, rather than a thread-killing exception.

This might be the same issue as 2683. This comment, in particular, seems like the solution that I would expect the client library to implement.

Answers to standard questions:

  1. OS type and version
    MacOS Sierra 10.12.6
  2. Python version and virtual environment information python --version
    Python 2.7.10 (running in virtualenv)
  3. google-cloud-python version pip show google-cloud, pip show google-<service> or pip freeze
$ pip show google-cloud
Name: google-cloud
Version: 0.27.0
pip show google-cloud-pubsub
Name: google-cloud-pubsub
Version: 0.28.4
  1. Stacktrace if available
Exception in thread Consumer helper: consume bidirectional stream:
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/Users/kir/cloud/env/lib/python2.7/site-packages/google/cloud/pubsub_v1/subscriber/_consumer.py", line 248, in _blocking_consume
    self._policy.on_exception(exc)
  File "/Users/kir/cloud/env/lib/python2.7/site-packages/google/cloud/pubsub_v1/subscriber/policy/thread.py", line 140, in on_exception
    raise exception
_Rendezvous: <_Rendezvous of RPC that terminated with (StatusCode.UNAVAILABLE, The service was unable to fulfill your request. Please try again. [code=8a75])>
  1. Steps to reproduce
  • I was not able to reproduce this consistently. But it would happen ~1 in 10 times I ran the code.
  1. Code example
import time, datetime, sys
from google.cloud import pubsub_v1 as pubsub

subscription_name = "projects/%s/subscriptions/%s"%(sys.argv[1], sys.argv[2])
sleep_time_ms = 0
try:
    sleep_time_ms = int(sys.argv[3])
except Exception:
    print "Could not parse custom sleep time."
print "Using sleep time %g ms"%sleep_time_ms

def callback(message):
    t = time.time()
    time.sleep(float(sleep_time_ms)/1000)
    print "Message " + message.data + " acked in %g second"%(time.time() - t)
    message.ack()

subscription = pubsub.SubscriberClient().subscribe(subscription_name).open(callback=callback)
time.sleep(10000)

Metadata

Metadata

Assignees

Labels

api: pubsubIssues related to the Pub/Sub API.priority: p1Important issue which blocks shipping the next release. Will be fixed prior to next release.type: bugError or flaw in code with unintended results or allowing sub-optimal usage patterns.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions