Streaming Algorithms Explained with Use Cases
1. Misra-Gries Algorithm
Purpose: Identify elements in a data stream that occur frequently (above a given threshold).
Real-Life Use: Detecting most common queries in a search engine, finding trending topics on Twitter.
Pseudocode:
Initialize an empty dictionary C
Set k (maximum number of counters)
For each element x in the stream:
If x in C:
C[x] += 1
Else if len(C) < k - 1:
C[x] = 1
Else:
Decrease all counts in C by 1
Remove entries with count 0
2. Lossy Counting
Purpose: Track frequent items approximately with error tolerance.
Real-Life Use: Online ad clickstream analysis, monitoring frequently accessed web pages.
Pseudocode:
Set epsilon (error parameter), N = 0, empty dict C
For each x:
N += 1
If x in C: C[x][0] += 1
Else: C[x] = [1, current_bucket - 1]
Every bucket_width items: remove entries if C[key][0] + C[key][1] <= current_bucket
3. Reservoir Sampling
Streaming Algorithms Explained with Use Cases
Purpose: Random sampling from a stream of unknown size.
Real-Life Use: Select random user sessions, logs, or tweets from a firehose for analysis.
Pseudocode:
Fill reservoir of size k with first k items
For each i > k, replace item at random with probability k/i
4. KMV (K Minimum Values)
Purpose: Estimate number of distinct elements (cardinality).
Real-Life Use: Estimate unique website visitors, count distinct IP addresses in logs.
Pseudocode:
Hash elements to [0,1]; maintain k smallest values
Estimate = (k - 1) / max(k smallest hash values)
5. Boyer-Moore Majority Vote
Purpose: Find majority element (>50% frequency).
Real-Life Use: Detect most dominant behavior in logs, top-voted answer in feedback.
Pseudocode:
candidate = None, count = 0
For each x:
If count == 0: candidate = x
If x == candidate: count += 1 else: count -= 1
6. Space-Saving Algorithm
Purpose: Find most frequent elements using fixed memory.
Real-Life Use: Top-k search terms, most active users on a platform.
Streaming Algorithms Explained with Use Cases
Pseudocode:
Keep k counters. If x is new and full, replace min entry and increase count.
7. Count Sketch
Purpose: Approximate frequency counts with negative noise correction.
Real-Life Use: Network traffic monitoring, approximate counting in DBMS.
Pseudocode:
Use d hash + sign functions. Update multiple counters using signs.
Estimate = median of (counter * sign) for each row.
8. Count-Min Sketch
Purpose: Estimate frequencies (overestimate only).
Real-Life Use: Spam detection, streaming logs, cache eviction policies.
Pseudocode:
Update count in d hash buckets; estimate = min(counts across all hashes)
9. Bloom Filter
Purpose: Test membership in a set with false positives.
Real-Life Use: Caches, databases (avoid unnecessary disk lookups), network security.
Pseudocode:
Hash x to k bit positions and set them to 1
To query: check all k bits are 1 (else not in set)
10. Sliding Window Model
Purpose: Analyze most recent data (e.g., last 1 min/hour).
Streaming Algorithms Explained with Use Cases
Real-Life Use: Real-time alerts, CPU/memory usage, fraud detection.
Pseudocode:
Maintain deque of last N elements, slide as new ones arrive
11. HyperLogLog (HLL)
**Purpose:** Estimate the number of distinct elements in a stream using limited memory.
**Simple Usage:** Count the number of unique visitors to a website without storing all IPs.
**Real-World Use Cases:**
- Analytics platforms like Google Analytics, Mixpanel
- Network monitoring tools for unique flows
- Databases (e.g., Redis, Postgres) for cardinality estimation
**Pseudocode:**
1. Use a good hash function to hash each element to a binary string.
2. Divide hash space into `m` buckets using the first few bits.
3. For each bucket, track the maximum number of leading zeros in the remaining bits.
4. Estimate the cardinality using the formula:
`Estimate = alpha * m^2 / sum(2^-R[i])`
**Go Implementation (simplified):**
```go
import "hash/fnv"
func leadingZeros(x uint32) int {
n := 1
for x >>= 1; x > 0; x >>= 1 {
n++
Streaming Algorithms Explained with Use Cases
return 32 - n
func HyperLogLog(stream []string, m int) float64 {
buckets := make([]int, m)
for _, item := range stream {
h := fnv.New32a()
h.Write([]byte(item))
hash := h.Sum32()
idx := int(hash % uint32(m))
val := hash >> uint32(32 - 5)
buckets[idx] = max(buckets[idx], leadingZeros(val))
sum := 0.0
for _, r := range buckets {
sum += 1.0 / math.Pow(2, float64(r))
alpha := 0.7213 / (1 + 1.079/float64(m))
return alpha * float64(m*m) / sum
```
12. Flajolet-Martin
**Purpose:** Early method to estimate number of unique elements in a stream.
**Simple Usage:** Estimate how many distinct IP addresses hit a server.
**Real-World Use Cases:**
- Web analytics
Streaming Algorithms Explained with Use Cases
- Spam detection (unique message signatures)
- DNS query monitoring
**Pseudocode:**
1. Hash each incoming element.
2. Count the position of the least significant 1 in the binary form.
3. Track the maximum such position R.
4. Estimated count = 2^R
**C++ Implementation:**
```cpp
#include <iostream>
#include <bitset>
#include <string>
#include <functional>
int trailingZeros(uint32_t x) {
int count = 0;
while ((x & 1) == 0 && count < 32) {
x >>= 1;
count++;
return count;
int flajoletMartin(std::vector<std::string>& stream) {
int R = 0;
for (auto& s : stream) {
std::hash<std::string> hasher;
uint32_t hash = hasher(s);
Streaming Algorithms Explained with Use Cases
R = std::max(R, trailingZeros(hash));
return 1 << R;
```
13. Counting Bloom Filter
**Purpose:** Like Bloom Filter but supports deletions using counters.
**Simple Usage:** Membership check for keys in cache with support for removal.
**Real-World Use Cases:**
- Cache invalidation in CDNs
- Malware filtering
- Router forwarding table
**Pseudocode:**
1. Use k hash functions.
2. For each item, increment counters at k positions.
3. For delete, decrement those counters.
4. Query if all k counters > 0.
**Go Implementation (simplified):**
```go
type CountingBloomFilter struct {
counters []int
k int
func (cbf *CountingBloomFilter) Add(item string) {
Streaming Algorithms Explained with Use Cases
for i := 0; i < cbf.k; i++ {
idx := hash(item, i) % len(cbf.counters)
cbf.counters[idx]++
func (cbf *CountingBloomFilter) Remove(item string) {
for i := 0; i < cbf.k; i++ {
idx := hash(item, i) % len(cbf.counters)
cbf.counters[idx]--
func (cbf *CountingBloomFilter) Query(item string) bool {
for i := 0; i < cbf.k; i++ {
idx := hash(item, i) % len(cbf.counters)
if cbf.counters[idx] <= 0 {
return false
return true
```
14. DGIM Algorithm
**Purpose:** Count 1s in a binary stream over a sliding window using little space.
**Simple Usage:** Count how many times a user clicked 'Yes' in last 100 responses.
**Real-World Use Cases:**
Streaming Algorithms Explained with Use Cases
- Monitoring recent events in sliding windows (alerts, success/failures)
- Packet monitoring in network
**Pseudocode:**
1. Each bucket = (timestamp, size)
2. Merge old buckets of same size.
3. Keep at most 2 buckets of each size.
4. Estimate count from all bucket sizes; halve last one.
**Go Implementation (simplified):**
```go
type Bucket struct {
timestamp int
size int
func estimate(buckets []Bucket, windowSize int) int {
total := 0
for i, b := range buckets {
if i == len(buckets)-1 {
total += b.size / 2
} else {
total += b.size
return total
```
15. Exponential Histogram
Streaming Algorithms Explained with Use Cases
**Purpose:** Approximate count over a sliding window with error bound.
**Simple Usage:** Count how many messages arrived in the last 10 minutes.
**Real-World Use Cases:**
- Real-time dashboards
- IoT time-window stats
**Pseudocode:**
1. Buckets have (timestamp, count), in powers of 2.
2. When too many buckets of same size, merge.
3. Estimate count = sum of bucket sizes.
**Go Implementation (simplified):**
```go
type EHBucket struct {
timestamp int
size int
func addBucket(buckets *[]EHBucket, ts int) {
*buckets = append(*buckets, EHBucket{ts, 1})
for len(*buckets) > 2 && (*buckets)[len(*buckets)-1].size == (*buckets)[len(*buckets)-2].size {
b2 := (*buckets)[len(*buckets)-1]
b1 := (*buckets)[len(*buckets)-2]
*buckets = (*buckets)[:len(*buckets)-2]
*buckets = append(*buckets, EHBucket{b1.timestamp, b1.size + b2.size})
```