-
Notifications
You must be signed in to change notification settings - Fork 47
Description
Issue
The API key and User-Agent are currently passed as GET parameters, ideally they should be passed as headers. User-Agents are commonly passed as headers, and for the key this is actually preferred.
It also is slightly more secure, as the dataverse API will return the full request URL on error, possibly exposing API keys in logs:
{"status":"ERROR","code":404,"message":"API endpoint does not exist on this server. Please check your code for typos, or consult our API guide at http://guides.dataverse.org.","requestUrl":"http://localhost:8080/api/v1/access/datafile/:persistentId/?persistentId=23&User-Agent=pydataverse&key=<the API token>","requestMethod":"GET"}(taken from datalad/datalad-dataverse#310, where I found out about this; it also contains a little bit of context)
I am not sure if this request is a bug or feature, so I decided to open a normal issue. The pyDataverse version I use is 0.3.3, and the code doing this is in pyDataverse/api.py:
pyDataverse/pyDataverse/api.py
Lines 103 to 138 in 3b6fe06
| def get_request(self, url, params=None, auth=False): | |
| """Make a GET request. | |
| Parameters | |
| ---------- | |
| url : str | |
| Full URL. | |
| params : dict | |
| Dictionary of parameters to be passed with the request. | |
| Defaults to `None`. | |
| auth : bool | |
| Should an api token be sent in the request. Defaults to `False`. | |
| Returns | |
| ------- | |
| class:`requests.Response` | |
| Response object of requests library. | |
| """ | |
| params = {} | |
| params["User-Agent"] = "pydataverse" | |
| if self.api_token: | |
| params["key"] = str(self.api_token) | |
| if self.client is None: | |
| return self._sync_request( | |
| method=httpx.get, | |
| url=url, | |
| params=params, | |
| ) | |
| else: | |
| return self._async_request( | |
| method=self.client.get, | |
| url=url, | |
| params=params, | |
| ) |
The same is done for post_request, etc.
A possible change would be to use headers and pass them as headers:
headers = {}
headers["User-Agent"] = "pydataverse"
if self.api_token:
headers["X-Dataverse-key"] = str(self.api_token)
if self.client is None:
return self._sync_request(
method=httpx.get,
url=url,
headers=headers,
)Additionally, I noticed that neither the params nor the auth attributes are used.
For the params a fix could be, especially after moving the key and user-agent to the headers:
self._sync_request(
method=httpx.get,
url=url,
headers=headers,
params=params or {}, # maybe httpx.get even handles params=None, in that case they could simply be passed through
)For the auth parameter it's a bit more difficult, since it currently does not affect the behavior although it should.
One possible way would be to update the if statement before adding the token to the header:
if auth and self.api_token:
However, it might make sense to change the default value to True in this case, to keep the behavior stable. However, I am not 100% sure what is the best way to handle this. Of course, one option would also be to remove the auth argument completely.
Lastly, the docstring of the function mentions the requests.Response, while it's now a httpx.Response. While they do have a similar interface, they have some minor differences; e.g., requests has an .ok property and uses iter_content, while httpx does not have .ok and uses iter_bytes.
I think I could try to "fix" this and draft a PR.
Metadata
Metadata
Assignees
Labels
Type
Projects
Status