Describe the bug, including details regarding any error messages, version, and platform.
I'm writing integration tests against a local GCS instance using fake-gcs-server. Writing a file and fetching its metadata both work, but reading the file back fails:
➜ python git:(fd-gcs) ✗ ipython
Python 3.9.17 (main, Jun 20 2023, 18:00:22)
Type 'copyright', 'credits' or 'license' for more information
IPython 8.14.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: from pyarrow.fs import GcsFileSystem
   ...: from datetime import datetime
   ...:
   ...: fs = GcsFileSystem(
   ...:     access_token='anon',
   ...:     credential_token_expiration=datetime(2023, 8, 2, 16, 30, 4),
   ...:     scheme='http',
   ...:     endpoint_override='0.0.0.0:4443'
   ...: )
In [2]: location = 'warehouse/vo.txt'
   ...:
   ...: with fs.open_output_stream(location) as f:
   ...:     print(f.write(b"foo"))
3
In [3]: print(fs.get_file_info(location))
<FileInfo for 'warehouse/vo.txt': type=FileType.File, size=3>
In [4]: with fs.open_input_file(location) as f:
   ...:     print(f.read())
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
Cell In[4], line 1
----> 1 with fs.open_input_file(location) as f:
      2     print(f.read())
File ~/Library/Python/3.9/lib/python/site-packages/pyarrow/_fs.pyx:763, in pyarrow._fs.FileSystem.open_input_file()
File ~/Library/Python/3.9/lib/python/site-packages/pyarrow/error.pxi:144, in pyarrow.lib.pyarrow_internal_check_status()
File ~/Library/Python/3.9/lib/python/site-packages/pyarrow/error.pxi:113, in pyarrow.lib.check_status()
FileNotFoundError: [Errno 2] google::cloud::Status(NOT_FOUND: Permanent error in Read(): ). Detail: [errno 2] No such file or directory
Can be reproduced using:
from pyarrow.fs import GcsFileSystem
from datetime import datetime

fs = GcsFileSystem(
    access_token='anon',
    credential_token_expiration=datetime(2023, 8, 2, 16, 30, 4),
    scheme='http',
    endpoint_override='0.0.0.0:4443'
)

location = 'warehouse/vo.txt'

with fs.open_output_stream(location) as f:
    print(f.write(b"foo"))

print(fs.get_file_info(location))

with fs.open_input_file(location) as f:
    print(f.read())

Failing calls with PyArrow:
time="2023-08-02T14:35:57Z" level=info msg="172.19.0.1 - - [02/Aug/2023:14:35:57 +0000] \"GET /storage/v1/b/warehouse/o/1bc68628-a1d3-4081-b3f1-9d69224ddd5c.txt HTTP/1.1\" 404 59"
time="2023-08-02T14:35:57Z" level=info msg="172.19.0.1 - - [02/Aug/2023:14:35:57 +0000] \"GET /storage/v1/b/warehouse/o?prefix=1bc68628-a1d3-4081-b3f1-9d69224ddd5c.txt%2F&pageToken= HTTP/1.1\" 200 27"
time="2023-08-02T14:35:57Z" level=info msg="172.19.0.1 - - [02/Aug/2023:14:35:57 +0000] \"POST /upload/storage/v1/b/warehouse/o?uploadType=resumable&name=1bc68628-a1d3-4081-b3f1-9d69224ddd5c.txt HTTP/1.1\" 200 335"
time="2023-08-02T14:35:57Z" level=info msg="172.19.0.1 - - [02/Aug/2023:14:35:57 +0000] \"PUT /upload/storage/v1/b/warehouse/o?uploadType=resumable&name=1bc68628-a1d3-4081-b3f1-9d69224ddd5c.txt&upload_id=43a8ec7bc33a15592b750fc916790750 HTTP/1.1\" 200 570"
time="2023-08-02T14:35:57Z" level=info msg="172.19.0.1 - - [02/Aug/2023:14:35:57 +0000] \"GET /storage/v1/b/warehouse/o/1bc68628-a1d3-4081-b3f1-9d69224ddd5c.txt HTTP/1.1\" 200 570"
time="2023-08-02T14:35:57Z" level=info msg="172.19.0.1 - - [02/Aug/2023:14:35:57 +0000] \"GET /warehouse/1bc68628-a1d3-4081-b3f1-9d69224ddd5c.txt HTTP/1.1\" 404 10"
The last call (GET /warehouse/<object>) is the one returning the 404. Unlike the earlier, successful calls, its path is missing the /storage/v1/b/ prefix and the /o/ segment.
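To make the odd request easier to spot in a longer log, here is a minimal sketch that parses fake-gcs-server access-log lines and flags any request whose path does not start with one of the prefixes used by the successful calls above. The helper names (`parse_request`, `uses_json_api_prefix`) and the regular expression are illustrative assumptions, not part of PyArrow or fake-gcs-server, and the log format is assumed to match the lines shown in this report.

```python
import re

# Assumed log shape: ... msg="<ip> - - [<date>] \"METHOD PATH HTTP/1.1\" STATUS SIZE"
# The quotes around the request are backslash-escaped in the log output.
REQUEST_RE = re.compile(
    r'\\"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+\\" (?P<status>\d{3})'
)

# Path prefixes seen on the successful calls in the log above.
JSON_API_PREFIXES = ("/storage/v1/b/", "/upload/storage/v1/b/")


def parse_request(line):
    """Return (method, path, status) for one log line, or None if no request is found."""
    match = REQUEST_RE.search(line)
    if match is None:
        return None
    return match["method"], match["path"], int(match["status"])


def uses_json_api_prefix(path):
    """True if the path starts with one of the prefixes listed above."""
    return path.startswith(JSON_API_PREFIXES)


# The last two PyArrow requests from the log above, copied verbatim.
log_lines = [
    r'time="2023-08-02T14:35:57Z" level=info msg="172.19.0.1 - - [02/Aug/2023:14:35:57 +0000] \"GET /storage/v1/b/warehouse/o/1bc68628-a1d3-4081-b3f1-9d69224ddd5c.txt HTTP/1.1\" 200 570"',
    r'time="2023-08-02T14:35:57Z" level=info msg="172.19.0.1 - - [02/Aug/2023:14:35:57 +0000] \"GET /warehouse/1bc68628-a1d3-4081-b3f1-9d69224ddd5c.txt HTTP/1.1\" 404 10"',
]

# Collect every parsed request whose path lacks the expected prefix.
flagged = []
for line in log_lines:
    request = parse_request(line)
    if request is not None and not uses_json_api_prefix(request[1]):
        flagged.append(request)

for method, path, status in flagged:
    print(f"{method} {path} -> {status}")
```

Run against the two lines above, only the final GET to /warehouse/... should be flagged, which matches the request that produces the 404.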
The equivalent code using GCSSpec:
time="2023-08-02T14:36:10Z" level=info msg="172.19.0.1 - - [02/Aug/2023:14:36:10 +0000] \"GET /storage/v1/b/warehouse/o/d3057e83-52ab-4ce4-b16f-d55af7ba3525.txt HTTP/1.1\" 404 59"
time="2023-08-02T14:36:10Z" level=info msg="172.19.0.1 - - [02/Aug/2023:14:36:10 +0000] \"GET /storage/v1/b/warehouse/o?delimiter=/&prefix=d3057e83-52ab-4ce4-b16f-d55af7ba3525.txt/ HTTP/1.1\" 200 27"
time="2023-08-02T14:36:10Z" level=info msg="172.19.0.1 - - [02/Aug/2023:14:36:10 +0000] \"GET /storage/v1/b/warehouse/o/d3057e83-52ab-4ce4-b16f-d55af7ba3525.txt HTTP/1.1\" 404 59"
time="2023-08-02T14:36:10Z" level=info msg="172.19.0.1 - - [02/Aug/2023:14:36:10 +0000] \"POST /upload/storage/v1/b/warehouse/o?uploadType=resumable HTTP/1.1\" 200 335"
time="2023-08-02T14:36:10Z" level=info msg="172.19.0.1 - - [02/Aug/2023:14:36:10 +0000] \"POST /upload/storage/v1/b/warehouse/o?uploadType=resumable&name=d3057e83-52ab-4ce4-b16f-d55af7ba3525.txt&upload_id=2b6f8d48acf8dd87cc86d1e51bd3120e HTTP/1.1\" 200 570"
time="2023-08-02T14:36:10Z" level=info msg="172.19.0.1 - - [02/Aug/2023:14:36:10 +0000] \"GET /storage/v1/b/warehouse/o/d3057e83-52ab-4ce4-b16f-d55af7ba3525.txt HTTP/1.1\" 200 570"
This only seems to happen when endpoint_override is set.
Component(s)
Python