0% found this document useful (0 votes)
89 views8 pages

Algo Ds Bloom Typed

1) Bloom filters are a space-efficient data structure that allows for fast inserts and lookups but cannot delete items or store associated objects. They have a small false positive probability where an item may be reported as inserted when it was not. 2) Bloom filters have applications like early spellcheckers, lists of forbidden passwords, and network routers where memory is limited and speed is critical. 3) A Bloom filter uses a bit array and k hash functions. To insert an item, its k hash values are computed and the corresponding bit positions in the array are set to 1. Lookup checks if all k bit positions for an item are set to 1.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
89 views8 pages

Algo Ds Bloom Typed

1) Bloom filters are a space-efficient data structure that allows for fast inserts and lookups but cannot delete items or store associated objects. They have a small false positive probability where an item may be reported as inserted when it was not. 2) Bloom filters have applications like early spellcheckers, lists of forbidden passwords, and network routers where memory is limited and speed is critical. 3) A Bloom filter uses a bit array and k hash functions. To insert an item, its k hash values are computed and the corresponding bit positions in the array are set to 1. Lookup checks if all k bit positions for an item are set to 1.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 8

Data Structures

Bloom Filters
Design and Analysis
of Algorithms I

Bloom Filters: Supported Operations


Raison Dtre: fast Inserts and Lookups.
Comparison to Hash Tables:
Pros: more space efficient.
Cons:
1) cant store an associated object
2) No deletions
3) Small false positive probability
(i.e., might say x has been inserted even though it hasnt been)

Tim Roughgarden

Bloom Filters: Applications


Original: early spellcheckers.
Canonical: list of forbidden passwords
Modern: network routers.
- Limited memory, need to be super-fast

Tim Roughgarden

Bloom Filter: Under the Hood


Ingredients: 1) array of n bits (

2) k hash functions h1,..,hk (k = small constant)


Insert(x): for i = 1,2,,k
(whether or not bit already set ot 1)
set A[hi(x)]=1
Lookup(x): return TRUE  A[hi(x)] = 1 for every I = 1,2,.,k.
Note: no false negatives. (if x was inserted, Lookup (x) guaranteed to
succeed)
But: false positive if all k hi(x)s already set to 1 by other insertions.

Tim Roughgarden

Heuristic Analysis
Intuition: should be a trade-off between space and error (false
positive) probability.
Assume: [not justified] all hi(x)s uniformly random and independent
(across different is and xs).
Setup: n bits, insert data set S into bloom filter.
Note: for each bit of A, the probability its been set to I is (under above
assumption):

Tim Roughgarden

Under the heuristic assumption, what is the probability that a


given bit of the bloom filter (the first bit, say) has been set to 1
after the data set S has been inserted?
prob 1st bit = 0
prob 1st bit = 1

Heuristic Analysis
Intuition: should be a trade-off between space and error (false
positive) probability.
Assume: [not justified] all hi(x)s uniformly random and independent
(across different is and xs).
Setup: n bits, insert data set S into bloom filter.
Note: for each bit of A, the probability its been set to I is (under above
assumption):
b = # of
bits per
object
(n/|S|)
Tim Roughgarden

Heuristic Analysis (cond)


Story so far: probability a given bit is 1 is
So: under assumption, for x not in S, false positive probability is
,
where b = # of bits per object.
error rank
How to set k?: for fixed b, is minimized by setting
Plugging back in:
or

(exponentially
small in b )

Ex: with b = 8, choose k = 5 or 6 , error probability only approximately 2%.

Tim Roughgarden

You might also like