DefaultCredentialsProvider caches failure for flaky Compute Engine credential lookup #109

@lukecwik

Looking up application default credentials on a GCE VM can fail because the VM metadata server is unavailable during VM launch. This is rare, but Google Cloud Dataflow customers hit it once or twice a month because of the sheer number of VMs. GCE attempted to mitigate the metadata server unavailability but was only able to reduce it by an order of magnitude, so the client needs to retry. Additionally, when contacting the GCE VM metadata server, we should use its fixed IP address to avoid the nameserver lookup (another potential point of failure).

Problem area in the code:
https://github.com/google/google-auth-library-java/blob/b94f8e4d02bf6917af2e2f7ef8d7114a51dbcfa8/oauth2_http/java/com/google/auth/oauth2/DefaultCredentialsProvider.java#L261

Note that the code in this library and the Apiary auth support code are very similar. The fix was already made in the Apiary auth code (note the use of the static IP address and the fixed number of retries):
https://github.com/google/google-api-java-client/blob/4fc8c099d9db5646770868cc1bc9a33c9225b3c7/google-api-client/src/main/java/com/google/api/client/googleapis/auth/oauth2/OAuth2Utils.java#L74
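
To make the request concrete, here is a minimal sketch of the two mitigations (fixed IP address plus a bounded number of retries), using only the JDK. It is not the code from either library: the class name `MetadataProbe`, the `Ping` interface, the attempt count, and the timeout are all assumptions chosen for illustration. The link-local address `169.254.169.254` and the `Metadata-Flavor: Google` header are the documented way to reach and recognize the GCE metadata server.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

class MetadataProbe {
    /** Link-local IP of the GCE metadata server; using it skips the DNS lookup. */
    static final String METADATA_SERVER_URL = "http://169.254.169.254";
    /** Illustrative values only (assumptions, not taken from either library). */
    static final int MAX_ATTEMPTS = 3;
    static final int TIMEOUT_MS = 500;

    /** One attempt to contact the metadata server. Hypothetical seam so a fake can be injected. */
    interface Ping {
        boolean run() throws IOException;
    }

    /** Real ping: GET the metadata root and check the Metadata-Flavor response header. */
    static boolean pingMetadataServer() throws IOException {
        HttpURLConnection conn =
            (HttpURLConnection) URI.create(METADATA_SERVER_URL).toURL().openConnection();
        conn.setConnectTimeout(TIMEOUT_MS);
        conn.setReadTimeout(TIMEOUT_MS);
        conn.setRequestProperty("Metadata-Flavor", "Google");
        try {
            conn.getResponseCode(); // forces the request to be sent
            return "Google".equals(conn.getHeaderField("Metadata-Flavor"));
        } finally {
            conn.disconnect();
        }
    }

    /**
     * Retries the ping up to maxAttempts times. An I/O failure is treated as
     * transient and retried; a completed ping (true or false) is returned as-is.
     * Returns false only after every attempt has failed with an IOException
     * or when the server answered without the expected header.
     */
    static boolean runningOnComputeEngine(Ping ping, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return ping.run();
            } catch (IOException e) {
                // Metadata server may not be up yet during VM launch; try again.
            }
        }
        return false;
    }
}
```

A caller would invoke `MetadataProbe.runningOnComputeEngine(MetadataProbe::pingMetadataServer, MetadataProbe.MAX_ATTEMPTS)`. Given the caching behavior named in the title, it may also be worth caching a negative result only after all attempts are exhausted, so that a single transient failure is not remembered for the lifetime of the process.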

After those fixes were deployed, we received no further customer contacts about this issue.
