Skip to content

Comments

Implement options to handle IDENTITY CIDs gracefully#239

Merged
masih merged 1 commit intomasterfrom
masih/graceful-id-cid-opts
Sep 20, 2021
Merged

Implement options to handle IDENTITY CIDs gracefully#239
masih merged 1 commit intomasterfrom
masih/graceful-id-cid-opts

Conversation

@masih
Copy link
Member

@masih masih commented Sep 17, 2021

Implement two additional options that allow a CARv2 file to 1) include
IDENTNTIY CIDs, and 2) specify a maximum allowed CID length with default
of 2KiB as a sufficiently large default.

Configure ReadWrite blockstore to persist given blocks with IDENTITY
CIDs.

Introduce a new Characteristics filed that signalls whether an index in
a CAR file contains a full catalog of CIDs for backward compatibility
purposes. Note, this is a new addition and will need to be added to the
spec in a separate PR.

Relates to #215

@masih masih force-pushed the masih/graceful-id-cid-opts branch 2 times, most recently from f1622e4 to 9f9f687 Compare September 17, 2021 21:09
// Assert the id is indexed.
r, err := carv2.OpenReader(path)
require.NoError(t, err)
t.Cleanup(func() { require.NoError(t, r.Close()) })
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to check read-only close errors 😉

records = append(records, index.Record{Cid: c, Offset: uint64(sectionOffset)})

if c.Prefix().MhType != multihash.IDENTITY || o.IncludeIdentityCIDs {
if uint64(cidLen) > o.MaxIndexCidSize {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

coverage of this case might be nice - it could even fit in one of the existing tests here without needing a special test just for it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a test for large CIDs already in this PR here. Do you think it's missing something?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, it's not using the default (common) path but setting an unrealistic max to break, that was my thought
but this is a nice to have, not a demand for busy-work if you're comfortable with this as is

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out; captured #241

@rvagg
Copy link
Member

rvagg commented Sep 20, 2021

The maximum of 2K is somewhat arbitrary isn't it? Not disagreeing with it, it feels arbitrary no matter how you slice it, but if we have a rationale then that might be good to include in a note.

By locking it in a default here as 2K we're probably going to be pushing that fairly broadly through the stack if the CARv2 writer interface becomes the default for creating CARs. Which is probably good we have a proper limiting condition, just with possible ramifications.

I might have to consider getting something similar in to the JavaScript CAR because we now have a lot of user data being generated that way, it's possible we get large identity CIDs slipping in there and only being caught if/when they hit this code in Filecoin.

@masih
Copy link
Member Author

masih commented Sep 20, 2021

The maximum of 2K is somewhat arbitrary isn't it? Not disagreeing with it, it feels arbitrary no matter how you slice it, but if we have a rationale then that might be good to include in a note.

By locking it in a default here as 2K we're probably going to be pushing that fairly broadly through the stack if the CARv2 writer interface becomes the default for creating CARs. Which is probably good we have a proper limiting condition, just with possible ramifications.

I might have to consider getting something similar in to the JavaScript CAR because we now have a lot of user data being generated that way, it's possible we get large identity CIDs slipping in there and only being caught if/when they hit this code in Filecoin.

It is mostly arbitrary indeed; the rationale was to pick something that is somewhat future proof and unlikely to be hit in reasonable use cases today. Any suggestions for a better value is more than welcomed. I know @aschmahmann pointed out a limit of 80 in KAD DHT? The key thing to point out here though, is that, this is an option default value and not a hard limit; i.e. option can be overridden and set to all sorts of unreasonable values.

Thoughts?

@masih masih force-pushed the masih/graceful-id-cid-opts branch 2 times, most recently from bdfd5f3 to 92e3c89 Compare September 20, 2021 11:53
@masih masih force-pushed the masih/graceful-id-cid-opts branch from 92e3c89 to 023c842 Compare September 20, 2021 12:59
Copy link
Contributor

@mvdan mvdan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

@masih masih marked this pull request as ready for review September 20, 2021 13:13
@masih masih requested a review from ribasushi September 20, 2021 13:15
Copy link

@ribasushi ribasushi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid wok overall, left a number of comments regarding naming/clarifications

None of the remarks are showstoppers, take them or leave them: it's good either way.

Implement two additional options that allow a CARv2 file to 1) include
IDENTNTIY CIDs, and 2) specify a maximum allowed CID length with default
of 2KiB as a sufficiently large default.

Configure ReadWrite blockstore to persist given blocks with IDENTITY
CIDs.

Introduce a new Characteristics filed that signalls whether an index in
a CAR file contains a full catalog of CIDs for backward compatibility
purposes. Note, this is a new addition and will need to be added to the
spec in a separate PR.

Relates to #215
@masih masih force-pushed the masih/graceful-id-cid-opts branch from 023c842 to 29405e4 Compare September 20, 2021 14:40
@masih masih merged commit f35d88c into master Sep 20, 2021
@masih masih deleted the masih/graceful-id-cid-opts branch September 20, 2021 14:44
masih added a commit to ipld/ipld that referenced this pull request Sep 20, 2021
Update CARv2 specification to include a format for MultihashIndexSorted
and fully-indexed characteristic bitfield.

Relates to:
- ipld/go-car#239
- ipld/go-car#217
masih added a commit to ipld/ipld that referenced this pull request Sep 20, 2021
Update CARv2 specification to include a format for MultihashIndexSorted
and fully-indexed characteristic bitfield.

Relates to:
- ipld/go-car#239
- ipld/go-car#217
masih added a commit to ipld/ipld that referenced this pull request Sep 20, 2021
Update CARv2 specification to include a format for MultihashIndexSorted
and fully-indexed characteristic bitfield.

Relates to:
- ipld/go-car#239
- ipld/go-car#217
masih added a commit to ipld/ipld that referenced this pull request Sep 20, 2021
Update CARv2 specification to include a format for MultihashIndexSorted
and fully-indexed characteristic bitfield.

Relates to:
- ipld/go-car#239
- ipld/go-car#217
masih added a commit to ipld/ipld that referenced this pull request Sep 21, 2021
Update CARv2 specification to include a format for MultihashIndexSorted
and fully-indexed characteristic bitfield.

Relates to:
- ipld/go-car#239
- ipld/go-car#217
masih added a commit to ipld/ipld that referenced this pull request Sep 24, 2021
Update CARv2 specification to include a format for MultihashIndexSorted
and fully-indexed characteristic bitfield.

Relates to:
- ipld/go-car#239
- ipld/go-car#217
masih added a commit to ipld/ipld that referenced this pull request Sep 27, 2021
Update CARv2 specification to include a format for MultihashIndexSorted
and fully-indexed characteristic bitfield.

Relates to:
- ipld/go-car#239
- ipld/go-car#217
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants