0% found this document useful (0 votes)
34 views1 page

04 - Hash Tables and Hash Functions - en

Uploaded by

ali79hm2
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as TXT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
34 views1 page

04 - Hash Tables and Hash Functions - en

Uploaded by

ali79hm2
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as TXT, PDF, TXT or read online on Scribd

So let us say you have

several data items and you want to group them into buckets
by some kind of similarity. One bucket can hold more than one item and each item is
always assigned
to the same bucket. So the results would be these blue
ovals and up in bucket number one, these gray rectangles. And up in bucket number
two, and these magenta triangles
are assigned to buckets three. Now let's think about how we'd
like to do this with word vectors. First, let's assume that the word
vectors have just 1 dimension instead the 300 dimensions. So each word is
represented
by a single number such as 100, 14 ,17, 10 and 97. You need to find a way to give
each vector
a hash value which is a key that tells it which bucket it's assigned to. A function
that assigns a hash
value is called a hash function. In this example, here is a hash
table which is a set of buckets. In this case,
the hash table has ten buckets. Notice how the word vectors 100 and
10 are assigned to bucket 0. The word vector 14 is
assigned to buckets 4 and the word vectors 17 and
97 are assigned to bucket 7. Do you notice a pattern? This formula here is the hash
function that's being used to assign the word vectors to
their respective buckets. The modulo operator takes
the remainder after dividing by ten. The remainder is the hash value that tells
us where the word vector should be stored. For example, 14 divided by 10 has
a remainder of 4, so it goes to buckets 4. Now let's build a basic
hash table in code. Here is a definition of a function
that takes in a list of values. You can think of each value
as a one-dimensional vector. It also takes in the number of pockets. Define the
hash function
used in the modulo operator. Then you create the hash table, notice
that this is a dictionary comprehension. The key is an integer and
the value is an empty list, which you'll use as a bucket for storage. For each word
vector,
calculate its hash value, then append it to the appropriate list. The hash table
that is returned can be
seen in the notebook that's goes with this vector. You will see that the hash table
is the same as what you saw in the previous slide. Now let's take another look
at this basic hash table. Recall that your original goal was to put
similar word vectors into the same bucket, but here it doesn't look like
numbers that are close to each other are in the same bucket. For instance, 10, 14
and
17 are in different buckets. Ideally, you want to
have a hash function that puts similar word vectors in
the same bucket like this. To do this you'll need to use what's
called locality sensitive hashing. Locality is another word for location,
sensitive is another word for caring. So locality sensitive hashing is
a hashing method that's cares very deeply about assigning items based on where
they're located in vector space. You'll learn about locality
sensitive hashing next.

You might also like