Esoteric Bug with regards to authentication via requests headers= argument being overwritten by .netrc #337
Description
Describe the bug
It's an odd one that set me back around a day: I hope to save people a similar amount of frustration.
When running a python model on an all purpose cluster, I would get this error:
10:54:28 Error getting status of cluster.
10:54:28 b'{"error_code": "403", "message": "Invalid access token."}'
despite putting a brand new token into the profiles.yml for the target corresponding to the all-purpose cluster.
It turns out that my .netrc file was to blame: it had an expired token for the same host. It also turns out that the requests library overrides authentication passed through the headers= argument with the contents of the .netrc file when no auth= argument is given, as stated here: https://requests.readthedocs.io/en/latest/user/authentication/#netrc-authentication
To be 100% clear, my .netrc file looked like this (replacing sensitive info below):
.netrc:
machine <my-workspace>.cloud.databricks.com
login token
password <expired_token>
My profiles.yml contained a target like:
all_purpose:
  type: databricks
  host: <my-workspace>.cloud.databricks.com
  http_path: <sql_path_for_all_purpose_cluster>
  catalog: <catalog>
  schema: <schema>
  token: <valid_token>
  threads: 1
Then the weird part:
- requests.py does pass the <valid_token> in to the headers argument in the get_cluster_status call.
- With that said, requests.py would give me an error from this method because it defaulted to my .netrc token: https://github.com/databricks/dbt-databricks/blob/main/dbt/adapters/databricks/python_submissions.py#L245
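The override can be reproduced without any network call by preparing a request and inspecting the header that would be sent. This is a minimal sketch, not the dbt-databricks code: the host and tokens are made-up placeholders, and it assumes a requests version that honors the NETRC environment variable so that a throwaway file can stand in for ~/.netrc.

```python
import os
import tempfile

import requests

# Hypothetical host and tokens, for illustration only.
HOST = "example-workspace.cloud.databricks.com"

# Write a throwaway .netrc and point requests at it via the NETRC env var.
with tempfile.NamedTemporaryFile("w", suffix=".netrc", delete=False) as fh:
    fh.write(f"machine {HOST}\nlogin token\npassword expired-token\n")
    netrc_path = fh.name
os.environ["NETRC"] = netrc_path

try:
    session = requests.Session()
    request = requests.Request(
        "GET",
        f"https://{HOST}/",
        headers={"Authorization": "Bearer valid-token"},  # what the caller intended
    )
    # prepare_request builds the outgoing request but does not send it.
    prepared = session.prepare_request(request)
    sent_auth = prepared.headers["Authorization"]
    print(sent_auth)
finally:
    # Clean up the temporary file and environment variable.
    os.environ.pop("NETRC", None)
    os.remove(netrc_path)
```

With no auth= argument, the prepared header should come from the .netrc entry as Basic credentials rather than the Bearer value passed in headers.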
Steps To Reproduce
Fairly easy to trigger:
- Put an expired or invalid token into the .netrc file.
  - Note that using a .netrc file is encouraged in the Databricks API documentation
- Put a valid token in the ~/.dbt/profiles.yml for an all-purpose cluster that we want to use to run a Python model.
- Run the model, get the error from here: https://github.com/databricks/dbt-databricks/blob/main/dbt/adapters/databricks/python_submissions.py#L255
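To check whether a machine is exposed to this before running the model, Python's standard netrc module can report whether a .netrc file holds an entry for the workspace host. This is a rough sketch; netrc_entry_for is a hypothetical helper name, not something from dbt-databricks or requests.

```python
import netrc
import os
from typing import Optional


def netrc_entry_for(host: str, path: Optional[str] = None) -> Optional[str]:
    """Return the login stored in .netrc for host, or None if there is no entry.

    Note: netrc falls back to a "default" entry if the file defines one.
    """
    path = path or os.path.expanduser("~/.netrc")
    if not os.path.exists(path):
        return None
    # authenticators() returns (login, account, password) or None.
    entry = netrc.netrc(path).authenticators(host)
    return entry[0] if entry else None
```

Calling netrc_entry_for("<my-workspace>.cloud.databricks.com") with the real host and getting anything other than None suggests the .netrc credentials may be picked up instead of the token in profiles.yml.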
Expected behavior
I'd expect the token provided in my profiles.yml to take precedence.
Screenshots and log output
Not needed IMO, already know the cause.
System information
Relevant versions are:
dbt-databricks==1.5.0
requests==2.28.1
Additional context
I would say the solution is to ensure the dbt-databricks code uses the auth argument instead. It seems a "bearer token" auth method doesn't come included in requests (which seems odd but okay). There is a good Stack Overflow solution for this: https://stackoverflow.com/a/58055668
I would also highlight that while this feels like it might only be a me problem, Databricks has lots of docs around using a .netrc file, as linked before: Databricks API documentation. It doesn't seem too far-fetched that someone else will fall into this trap, and it's a hard one to get out of.
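A custom auth class along those lines might look like the sketch below. BearerAuth is a hypothetical helper (it is not shipped with requests, and this is not the actual dbt-databricks fix); the host and token are placeholders. Because an explicit auth object is supplied, requests should skip the .netrc lookup.

```python
import requests
from requests.auth import AuthBase


class BearerAuth(AuthBase):
    """Attach a bearer token to each request (hypothetical helper)."""

    def __init__(self, token: str) -> None:
        self.token = token

    def __call__(self, r):
        # Called by requests while preparing the request.
        r.headers["Authorization"] = f"Bearer {self.token}"
        return r


# Placeholder host and token; prepare_request does not send anything.
request = requests.Request(
    "GET",
    "https://example-workspace.cloud.databricks.com/",
    auth=BearerAuth("valid-token"),
)
prepared = requests.Session().prepare_request(request)
bearer_header = prepared.headers["Authorization"]
print(bearer_header)
```

In real calls the same object would be passed as requests.get(url, auth=BearerAuth(token)) instead of setting the Authorization header by hand.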