handle some broken redirects to pre-signed S3 URLs #438

Merged
drewda merged 3 commits into main from s3-redirect
Apr 25, 2025
@drewda drewda commented Apr 24, 2025

This is a change to support https://www.transit.land/feeds/f-9tb-valleymetro

Currently tlib fails to follow the redirect to the pre-signed S3 URL:

✗ transitland validate http://www.phoenixopendata.com/dataset/3eae9a4a-98b9-40c8-8df7-8c00c1756235/resource/28ccc0a5-49c8-495c-b91f-193de5ce2cb7/download/googletransit.zip
2025-04-24T11:22:17-07:00 [INFO ] Validating: http://www.phoenixopendata.com/dataset/3eae9a4a-98b9-40c8-8df7-8c00c1756235/resource/28ccc0a5-49c8-495c-b91f-193de5ce2cb7/download/googletransit.zip
Error: could not open reader 'http://www.phoenixopendata.com/dataset/3eae9a4a-98b9-40c8-8df7-8c00c1756235/resource/28ccc0a5-49c8-495c-b91f-193de5ce2cb7/download/googletransit.zip': response status code: 40

The problem is that the redirect URL includes an explicit :443 port:

➜  transitland git:(s3-redirect) ✗ curl -v https://www.phoenixopendata.com/dataset/3eae9a4a-98b9-40c8-8df7-8c00c1756235/resource/28ccc0a5-49c8-495c-b91f-193de5ce2cb7/download/googletransit.zip
* Host www.phoenixopendata.com:443 was resolved.
* IPv6: (none)
* IPv4: 104.21.16.1, 104.21.32.1, 104.21.48.1, 104.21.64.1, 104.21.80.1, 104.21.96.1, 104.21.112.1
*   Trying 104.21.16.1:443...
* ALPN: curl offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384 / x25519 / id-ecPublicKey
* ALPN: server accepted h2
* Server certificate:
*  subject: CN=phoenixopendata.com
*  start date: Apr 10 08:35:44 2025 GMT
*  expire date: Jul  9 09:33:54 2025 GMT
*  subjectAltName: host "www.phoenixopendata.com" matched cert's "*.phoenixopendata.com"
*  issuer: C=US; O=Google Trust Services; CN=WE1
*  SSL certificate verify ok.
*   Certificate level 0: Public key type EC/prime256v1 (256/128 Bits/secBits), signed using ecdsa-with-SHA256
*   Certificate level 1: Public key type EC/prime256v1 (256/128 Bits/secBits), signed using ecdsa-with-SHA384
*   Certificate level 2: Public key type EC/secp384r1 (384/192 Bits/secBits), signed using ecdsa-with-SHA384
* Connected to www.phoenixopendata.com (104.21.16.1) port 443
* using HTTP/2
* [HTTP/2] [1] OPENED stream for https://www.phoenixopendata.com/dataset/3eae9a4a-98b9-40c8-8df7-8c00c1756235/resource/28ccc0a5-49c8-495c-b91f-193de5ce2cb7/download/googletransit.zip
* [HTTP/2] [1] [:method: GET]
* [HTTP/2] [1] [:scheme: https]
* [HTTP/2] [1] [:authority: www.phoenixopendata.com]
* [HTTP/2] [1] [:path: /dataset/3eae9a4a-98b9-40c8-8df7-8c00c1756235/resource/28ccc0a5-49c8-495c-b91f-193de5ce2cb7/download/googletransit.zip]
* [HTTP/2] [1] [user-agent: curl/8.13.0]
* [HTTP/2] [1] [accept: */*]
> GET /dataset/3eae9a4a-98b9-40c8-8df7-8c00c1756235/resource/28ccc0a5-49c8-495c-b91f-193de5ce2cb7/download/googletransit.zip HTTP/2
> Host: www.phoenixopendata.com
> User-Agent: curl/8.13.0
> Accept: */*
>
* Request completely sent off
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
< HTTP/2 302
< date: Thu, 24 Apr 2025 18:21:16 GMT
< content-type: text/html; charset=utf-8
< location: https://s3.amazonaws.com:443/og-production-open-data-phoenixaz-892364687672/resources/28ccc0a5-49c8-495c-b91f-193de5ce2cb7/googletransit.zip?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAJJIENTAPKHZMIPXQ%2F20250424%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20250424T182116Z&X-Amz-Expires=86400&X-Amz-SignedHeaders=host&X-Amz-Signature=6ccc25a1639ee82593c75fe2e6b61a901d7f0eda5f483d2ee22ecaabe4ff937f
< server: cloudflare
< cache-control: max-age=28800
< expires: Thu, 24 Apr 2025 18:21:16 GMT
< strict-transport-security: max-age=31536000; includeSubdomains
< x-content-type-options: nosniff
< x-xss-protection: 1; mode=block
< cf-cache-status: EXPIRED
< cf-ray: 93579e8f7fde2b61-LAX
< alt-svc: h3=":443"; ma=86400
<
<!doctype html>
<html lang=en>
<title>Redirecting...</title>
<h1>Redirecting...</h1>
<p>You should be redirected automatically to the target URL: <a href="https://s3.amazonaws.com:443/og-production-open-data-phoenixaz-892364687672/resources/28ccc0a5-49c8-495c-b91f-193de5ce2cb7/googletransit.zip?X-Amz-Algorithm=AWS4-HMAC-SHA256&amp;X-Amz-Credential=AKIAJJIENTAPKHZMIPXQ%2F20250424%2Fus-east-1%2Fs3%2Faws4_request&amp;X-Amz-Date=20250424T182116Z&amp;X-Amz-Expires=86400&amp;X-Amz-SignedHeaders=host&amp;X-Amz-Signature=6ccc25a1639ee82593c75fe2e6b61a901d7f0eda5f483d2ee22ecaabe4ff937f">https://s3.amazonaws.com:443/og-production-open-data-phoenixaz-892364687672/resources/28ccc0a5-49c8-495c-b91f-193de5ce2cb7/googletransit.zip?X-Amz-Algorithm=AWS4-HMAC-SHA256&amp;X-Amz-Credential=AKIAJJIENTAPKHZMIPXQ%2F20250424%2Fus-east-1%2Fs3%2Faws4_request&amp;X-Amz-Date=20250424T182116Z&amp;X-Amz-Expires=86400&amp;X-Amz-SignedHeaders=host&amp;X-Amz-Signature=6ccc25a1639ee82593c75fe2e6b61a901d7f0eda5f483d2ee22ecaabe4ff937f</a>. If not, click the link.
* Connection #0 to host www.phoenixopendata.com left intact

It turns out that Go's standard library net/http keeps the explicit :443 port in the Host header of the redirected request, even though 443 is the default port for HTTPS. The pre-signed URL signs the host header (note X-Amz-SignedHeaders=host in the redirect above), and S3 isn't expecting the port there when it checks the signature, so signature validation fails. In contrast, clients like curl strip the port from the Host header when it's the default for the scheme. So we'll modify our code to mimic that behavior.

@drewda drewda merged commit fac0bec into main Apr 25, 2025
11 checks passed
@drewda drewda deleted the s3-redirect branch April 25, 2025 21:54
drewda commented Apr 28, 2025

Now working
