Skip to content

Add hash() and get_index() to phf_shared.#62

Merged
sfackler merged 1 commit intorust-phf:masterfrom
SimonSapin:string-cache
Aug 4, 2015
Merged

Add hash() and get_index() to phf_shared.#62
sfackler merged 1 commit intorust-phf:masterfrom
SimonSapin:string-cache

Conversation

@SimonSapin
Copy link
Copy Markdown
Contributor

In https://github.com/servo/string-cache, we currently use a phf::OrderedSet with its get_index method to get an identified stored in an Atom, and index to get a string back from that identifier.

However, the extra inderection of OrderedSet of Set is not necessary. We don’t care about the order, only about getting numeric identifiers.

Additionally, when get_index returns None, we hash the input string again to find it in table of dynamic atoms. With this chang, we can reuse the phf hash instead: servo/string-cache#103

At first I tried adding hash and index access to phf::Map, but the API got messy quickly. Do you think that is worth pursuing more, rather than having string-cache use phf internals?

In https://github.com/servo/string-cache, we currently use a `phf::OrderedSet`
with its `get_index` method to get an identified stored in an `Atom`,
and `index` to get a string back from that identifier.

However, the extra inderection of `OrderedSet` of `Set` is not necessary.
We don’t care about the order, only about getting numeric identifiers.

Additionally, when `get_index` returns `None`,
we hash the input string again to find it in table of dynamic atoms.
With this chang, we can reuse the phf hash instead:

servo/string-cache#103

At first I tried adding hash and index access to `phf::Map`,
but the API got messy quickly.
@sfackler
Copy link
Copy Markdown
Collaborator

sfackler commented Aug 4, 2015

I think this makes sense as-is. I see hash and index as basically implementation details of the current algorithm so it's somewhat fitting for them to be hidden a bit from the "main" APIs.

sfackler added a commit that referenced this pull request Aug 4, 2015
Add hash() and get_index() to phf_shared.
@sfackler sfackler merged commit 6f59718 into rust-phf:master Aug 4, 2015
@SimonSapin SimonSapin deleted the string-cache branch August 4, 2015 06:17
@SimonSapin
Copy link
Copy Markdown
Contributor Author

So you’re in favor of string-cache using these details from phf_shared rather than adding them to the phf crate API?

Could you publish this to crates.io? Thanks!

@sfackler
Copy link
Copy Markdown
Collaborator

sfackler commented Aug 4, 2015

Yep, I think that strategy makes sense. Will publish now.

bors-servo pushed a commit to servo/string-cache that referenced this pull request Sep 1, 2015
Reuse phf hash and remove phf::OrderedSet indirection

<s>Do not merge yet.</s> This depends on rust-phf/rust-phf#62

Use the `phf_shared` and `phf_generator` crates directly instead of `phf`. This allows us to re-use the phf hash in the dynamic table and avoid hashing the same string again.

Also remove the indirection of `phf::OrderedSet` compared to `phf::Set`: we don’t care about the order, only about getting numeric indices. (The optimization mentioned in a comment of using a bit map of the first 64 atoms in the html5ever tree builder was never implemented. If we want it, the indirection and order preservation can be added back while preserving the hash reuse.)

Fixes #38.

<!-- Reviewable:start -->
[<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/servo/string-cache/103)
<!-- Reviewable:end -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants