Skip to content

feat: HTTP retrieval by block CID#946

Merged
LexLuthr merged 9 commits intononsense/piece-storefrom
feat/http-block-retrieval
Nov 16, 2022
Merged

feat: HTTP retrieval by block CID#946
LexLuthr merged 9 commits intononsense/piece-storefrom
feat/http-block-retrieval

Conversation

@LexLuthr
Copy link
Collaborator

No description provided.

@willscott
Copy link
Contributor

Where is the logic here that would prevent serving blocks that should not be retrievable by provider policy?

}

func (s *HttpServer) blockBasePath() string {
return s.path + "/block/"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets have a spec for this interface?

In particular, there are a set of semantics that have become fairly precise around how the IPFS gateways serve a block over HTTP.

We should, unless we have very strong reason otherwise, attempt to replicate those semantics.

Otherwise, we end up having to maintain support for this alternative endpoint with slightly different semantics. that we haven't fully documented anywhere.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we should replicate the gateway semantics - could you point us to any existing docs about gateway semantics that we can take a look at?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it seems like what's missing is:

  1. Set Content-Type header to application/vnd.ipld.raw
  2. Set Content-Disposition header to attachment; filename="<cid>.car"

Is there anything else I'm missing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should consider path semantics of /ipfs/<cid> rather than /block/<cid> as well

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I'm not familiar enough with the spec to know what you mean by path semantics - can you point us to a specific section?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably need "X-Content-Type-Options" set as "nosniff" as well

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

data is requested by CID, only supported path is /ipfs/{cid}

just that the implemented handler is at the path "/block/" and the gateway spec is expecting it to be at "/ipfs/"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see - sure that seems like an easy change to make

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All requested changes done. @willscott and @dirkmc Can you review again?

@dirkmc
Copy link
Contributor

dirkmc commented Nov 14, 2022

We need to implement provider policy for booster-http, but that's true of all endpoints, so I think it's outside the scope of this PR

@dirkmc dirkmc force-pushed the feat/piece-store-http branch 3 times, most recently from 7b31cb5 to 3b9cf50 Compare November 14, 2022 14:12
@LexLuthr LexLuthr requested a review from willscott November 14, 2022 14:40
@LexLuthr LexLuthr force-pushed the feat/http-block-retrieval branch from 230da21 to 387b25c Compare November 14, 2022 14:58
Comment on lines 246 to 253
filename := r.URL.Query().Get("filename")
if filename == "" {
filename = blockCidStr
}

utf8Name := url.PathEscape(filename)
asciiName := url.PathEscape(regexp.MustCompile("[[:^ascii:]]").ReplaceAllLiteralString(filename, "_"))
w.Header().Set("Content-Disposition", fmt.Sprintf("%s; filename=\"%s\"; filename*=UTF-8''%s", "attachment", asciiName, utf8Name))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how strongly do we feel we need this additional logic?

this diverges from the headers the gateway spec details, and means you can make a url that downloads as an attacker-controlled filename / includes data not in the block itself.

Copy link
Collaborator Author

@LexLuthr LexLuthr Nov 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are talking about the filename logic in case user did not specify one? Or filename within the header itself?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean. Let me remove the filename input altogether to avoid this. Filename would be same as block Cid to mitigate this.

utf8Name := url.PathEscape(filename)
asciiName := url.PathEscape(regexp.MustCompile("[[:^ascii:]]").ReplaceAllLiteralString(filename, "_"))
w.Header().Set("Content-Disposition", fmt.Sprintf("%s; filename=\"%s\"; filename*=UTF-8''%s", "attachment", asciiName, utf8Name))
w.Header().Set("Content-Type", "application/vnd.ipld.raw")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the Content-Type gets overridden in the parameter passed to serveContent()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@LaurenSpiegel
Copy link
Contributor

Could you please do a related PR to the docs so we can see how this appears from a user perspective along with the piece cid endpoints and query params?

@LexLuthr
Copy link
Collaborator Author

Could you please do a related PR to the docs so we can see how this appears from a user perspective along with the piece cid endpoints and query params?

I will update the docs once this PR is merged and a release has the new code. Targeted release for this feature will be v1.5.1-rc1

Comment on lines 246 to 247
utf8Name := url.PathEscape(blockCidStr)
asciiName := url.PathEscape(regexp.MustCompile("[[:^ascii:]]").ReplaceAllLiteralString(blockCidStr, "_"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of doing all the escaping, can you just use blockCid.String()?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be wrong but I think this due to how Go treats a string as slice of bytes(runes). It would still try to use the special characters so forcing them to escape would be a safer choice.
BTW, I picked this directly from Kubo.
https://github.com/ipfs/kubo/blob/1d5e46ac97dbd9862818b191517a4dc04186a26e/core/corehttp/gateway_handler.go#L688-L692

Base automatically changed from feat/piece-store-http to nonsense/piece-store November 16, 2022 08:05
@dirkmc
Copy link
Contributor

dirkmc commented Nov 16, 2022

@LexLuthr some other PRs got merged into nonsense/piece-store. Could you please resolve the conflicts and merge when ready.

@LexLuthr LexLuthr force-pushed the feat/http-block-retrieval branch from 6d1bf3a to 3e2126b Compare November 16, 2022 10:43
@LexLuthr LexLuthr merged commit f5e1948 into nonsense/piece-store Nov 16, 2022
@LexLuthr LexLuthr deleted the feat/http-block-retrieval branch November 16, 2022 10:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants