0% found this document useful (0 votes)
31 views48 pages

CS246 Final Exam Review Session

big-data

Uploaded by

Trương Đình
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
31 views48 pages

CS246 Final Exam Review Session

big-data

Uploaded by

Trương Đình
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 48

We will host a final exam review session on

March 7, 6:30-8:30 pm PT (check our website for


zoom link)
¡ The review session will cover solutions to CS246
exam in 2019.

3/3/22 Jure Leskovec & Mina Ghashami, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 1
Note to other teachers and users of these slides: We would be delighted if you found our
material useful for giving your own lectures. Feel free to use these slides verbatim, or to
modify them to fit your own needs. If you make use of a significant portion of these slides
in your own lecture, please include this message, or a link to our web site: http://www.mmds.org

CS246: Mining Massive Datasets


Jure Leskovec, Stanford University
Mina Ghashami, Amazon
http://cs246.stanford.edu
¡ Classic model of algorithms
§ You get to see the entire input, then compute
some function of it
§ In this context, “offline algorithm”

¡ Online Algorithms
§ You get to see the input one piece at a time, and
need to make irrevocable decisions along the way
§ Similar to the data stream model

3/3/22 Jure Leskovec & Mina Ghashami, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 4
¡ Query-to-advertiser graph:
query

advertiser

[Andersen, Lang: Communities from seed sets, 2006]

3/3/22 Jure Leskovec & Mina Ghashami, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 5
Advertiser Opportunity to Which advertiser
show an ad gets picked
1 a
(1,a)
2 b (2,b)
c
(3,d)
3

4 d

Advertiser X wants
to show an ad for
topic/query Y

This is an online problem: We have to make decisions


as queries/topics show up. We do not know what
topics will show up in the future.
3/3/22 Jure Leskovec & Mina Ghashami, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 6
1 a

2 b

3 c

Boys 4 d Girls

Nodes: Boys and Girls; Links: Preferences


Goal: Match boys to girls so that the most
preferences are satisfied Note: edges are only
preferences with no
weight or order.
3/3/22 Jure Leskovec & Mina Ghashami, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 8
1 a

2 b

3 c

Boys 4 d Girls

M = {(1,a),(2,b),(3,d)} is a matching
Cardinality of matching = |M| = 3

Matching means that we are not using any vertex twice


3/3/22 Jure Leskovec & Mina Ghashami, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 9
1 a

2 b

3 c

Boys 4 d Girls

M = {(1,c),(2,b),(3,d),(4,a)} is a
perfect matching
Perfect matching … all vertices of the graph are matched
Maximum matching … matching that contains the largest possible number of matches
3/3/22 Jure Leskovec & Mina Ghashami, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 10
¡ Problem: Find a maximum matching for a
given bipartite graph
§ A perfect one if it exists

¡ There is a polynomial-time offline algorithm


based on augmenting paths (Hopcroft & Karp 1973,
see http://en.wikipedia.org/wiki/Hopcroft-Karp_algorithm)

¡ But what if we do not know the entire


graph upfront?

3/3/22 Jure Leskovec & Mina Ghashami, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 11
¡ Initially, we are given the set boys
¡ In each round, one girl’s choices are revealed
§ That is, the girl’s edges are revealed
¡ At that time, we have to decide to either:
§ Pair the girl with a boy
§ Do not pair the girl with any boy

¡ Example of application:
Assigning tasks to servers

Note: Matching means that we are not using any girl or boy twice
3/3/22 Jure Leskovec & Mina Ghashami, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 12
1 a
(1,a)
2 b (2,b)
c
(3,d)
3

4 d

3/3/22 Jure Leskovec & Mina Ghashami, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 13
¡ Greedy algorithm for the online graph
matching problem:
§ Pair the new girl with any eligible boy
§ If there is none, do not pair the girl

¡ How good is the algorithm?

3/3/22 Jure Leskovec & Mina Ghashami, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 14
¡ For input I, suppose greedy produces
matching Mgreedy while an optimal
matching is Mopt

Competitive ratio =
minall possible inputs I (|Mgreedy|/|Mopt|)
(what is greedy’s worst performance over all possible inputs I)

3/3/22 Jure Leskovec & Mina Ghashami, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 15
¡ Consider a case: Mgreedy≠ Mopt 1
Mopt
a
¡ Consider the set G of girls 2
M gr e
edy

b
matched in Mopt but not in Mgreedy 3 c

¡ (1) By definition of G: 4 d

|Mopt| £ |Mgreedy| + |G| B={ } G={


Edge color indicates matching in
}
Mopt (blue) vs. Mgreedy(red).

¡ (2) Define set B of boys linked to girls in G


§ Notice boys in B are already matched in Mgreedy. Why?
§ If there would exist such non-matched (by Mgreedy) boy
adjacent to a non-matched girl then greedy would have
matched them
So: |Mgreedy|≥ |B|
3/3/22 Jure Leskovec & Mina Ghashami, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 16
Mopt
¡ Summary so far: 1
M gr e
edy
a

§ Girls G matched in Mopt but not in Mgreedy2 b


3
§ Boys B adjacent to girls in G c

§ (1) |Mopt| £ |Mgreedy| + |G| 4 d


B={ } G={ }
§ (2) |Mgreedy|≥ |B| Edge color indicates matching in
Mopt (blue) vs. Mgreedy(red).

¡ Optimal matches all girls in G to (some) boys in B


§ (3) |G| £ |B|

¡ Combining (2) and (3):


§ (4) |G| £ |B| £ |Mgreedy|
3/3/22 Jure Leskovec & Mina Ghashami, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 17
Mopt
¡ So we have: 1
M gr e
edy
a

§ (1) |Mopt| £ |Mgreedy| + |G| 2 b


3
§ (4) |G| £ |B| £ |Mgreedy| c

4 d

Combining (1) and (4): B={ } G={ }


¡ Edge color indicates matching in
Mopt (blue) vs. Mgreedy(red).

§ Worst case is when |G| = |B| = |Mgreedy|


§ |Mopt| £ |Mgreedy| + |Mgreedy|
§ Then |Mgreedy|/|Mopt| ³ 1/2

3/3/22 Jure Leskovec & Mina Ghashami, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 18
1 a
(1,a)
2 b (2,b)
3 c

4 d

3/3/22 Jure Leskovec & Mina Ghashami, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 21
¡ Banner ads (1995-2001)
§ Initial form of web advertising
§ Popular websites charged
$X for every 1,000
“impressions” of the ad
§ Called “CPM” rate
CPM…cost per mille
(Cost per thousand impressions) Mille…thousand in Latin
§ Modeled similar to TV, magazine ads
§ From untargeted to demographically targeted
§ Low click-through rates
§ Low ROI for advertisers
3/3/22 Jure Leskovec & Mina Ghashami, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 23
¡ Introduced by Overture around 2000
§ Advertisers bid on search keywords
§ When someone searches for that keyword, the
highest bidder’s ad is shown
§ Advertiser is charged only if the ad is clicked on

¡ Similar model adopted by Google with some


changes around 2002
§ Called Adwords

3/3/22 Jure Leskovec & Mina Ghashami, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 24
3/3/22 Jure Leskovec & Mina Ghashami, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 25
¡ Performance-based advertising works!
§ Multi-billion-dollar industry

¡ Interesting problem:
Which ads to show for a given query?
§ (Today’s lecture)

¡ If I am an advertiser, which search terms


should I bid on and how much should I bid?
§ (Not focus of today’s lecture)

3/3/22 Jure Leskovec & Mina Ghashami, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 26
¡ A stream of queries arrives at the search
engine: q1, q2, …
¡ Several advertisers bid on each query
¡ When query qi arrives, search engine must
pick a subset of advertisers to show their ads
¡ Goal: Maximize search engine’s revenues
§ Simple solution: Instead of raw bids, use the
“expected revenue per click” (i.e., Bid*CTR)
¡ Clearly we need an online algorithm!

3/3/22 Jure Leskovec & Mina Ghashami, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 27
Advertiser Bid CTR Bid * CTR

A $1.00 1% 1 cent

B $0.75 2% 1.5 cents

C $0.50 2.5% 1.25 cents


Click through Expected
rate revenue

3/3/22 Jure Leskovec & Mina Ghashami, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 28
Advertiser Bid CTR Bid * CTR

B $0.75 2% 1.5 cents

C $0.50 2.5% 1.25 cents

A $1.00 1% 1 cent

Instead of sorting advertisers by bid, sort by expected revenue

3/3/22 Jure Leskovec & Mina Ghashami, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 29
Instead of sorting advertisers by bid, sort by expected revenue

Advertiser Bid CTR Bid * CTR

B $0.75 2% 1.5 cents

C $0.50 2.5% 1.25 cents

A $1.00 1% 1 cent

Challenges:
¡ CTR of an ad is unknown
¡ Advertisers have limited budgets and bid on
multiple queries
3/3/22 Jure Leskovec & Mina Ghashami, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 30
¡ Two complications:
§ Budget
§ CTR of an ad is unknown

1) Budget: Each advertiser has a limited budget


§ Search engine guarantees that the advertiser
will not be charged more than their daily budget

3/3/22 Jure Leskovec & Mina Ghashami, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 31
¡ 2) CTR (Click-Through Rate): Each ad-query
pair has a different likelihood of being clicked
§ Advertiser 1 bids $2 on query A,
click probability = 0.1
§ Advertiser 2 bids $1 on query B,
click probability = 0.5
¡ CTR is predicted or measured historically
§ Averaged over a time period

¡ Some complications we will not cover:


§ 1) CTR is position dependent:
§ Ad #1 is clicked more than Ad #2
3/3/22 Jure Leskovec & Mina Ghashami, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 32
¡ Some complications we will cover
(next lecture):
§ 2) Exploration vs. exploitation
Exploit: Should we keep showing an ad for which
we have good estimates of click-through rate?
or
Explore: Shall we show a brand new ad to get a
better sense of its click-through rate?

3/3/22 Jure Leskovec & Mina Ghashami, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 33
¡ Given:
§ 1. A set of bids by advertisers for search queries
§ 2. A click-through rate for each advertiser-query pair
§ 3. A budget for each advertiser (say for 1 month)
§ 4. A limit on the number of ads to be displayed with
each search query
¡ Respond to each search query with a set of
advertisers such that:
§ 1. The size of the set is no larger than the limit on the
number of ads per query
§ 2. Each advertiser has bid on the search query
§ 3. Each advertiser has enough budget left to pay for
3/3/22
the ad if it is clicked upon
Jure Leskovec & Mina Ghashami, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 35
¡ Our setting: Simplified environment
§ There is 1 ad shown for each query
§ All advertisers have the same budget B
§ All ads are equally likely to be clicked
§ Bid value of each ad is the same (=$1)

¡ Simplest algorithm is greedy:


§ For a query pick any advertiser who has
bid 1 for that query
§ Competitive ratio of greedy is 1/2

3/3/22 Jure Leskovec & Mina Ghashami, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 36
¡ Two advertisers A and B
§ A bids on query x, B bids on x and y
§ Both have budgets of $4
¡ Query stream: x x x x y y y y
§ Worst case greedy choice: B B B B _ _ _ _
§ Optimal: A A A A B B B B
§ Competitive ratio = ½
¡ This is the worst case!
§ Note: Greedy algorithm is deterministic – it always
resolves draws in the same way

3/3/22 Jure Leskovec & Mina Ghashami, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 37
¡ BALANCE Algorithm by Mehta, Saberi,
Vazirani, and Vazirani
§ For each query, pick the advertiser with the
largest unspent budget
§ Break ties arbitrarily (but in a deterministic way)

3/3/22 Jure Leskovec & Mina Ghashami, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 38
¡ Two advertisers A and B
§ A bids on query x, B bids on x and y
§ Both have budgets of $4

¡ Query stream: x x x x y y y y

¡ BALANCE choice: A B A B B B _ _
§ Optimal: A A A A B B B B

¡ In general: For BALANCE on 2 advertisers


Competitive ratio = ¾
3/3/22 Jure Leskovec & Mina Ghashami, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 39
¡ Consider simple case (w.l.o.g.):
§ 2 advertisers, A1 and A2, each with budget B (³1)
§ Optimal solution exhausts both advertisers’ budgets
¡ BALANCE must exhaust at least one budget:
§ If not, we can allocate more queries
§ Whenever BALANCE makes a mistake (both advertisers bid
on the query), advertiser’s unspent budget only decreases
§ Since optimal exhausts both budgets, one will for sure get
exhausted
§ Assume BALANCE exhausts A2’s budget,
but allocates x queries fewer than the optimal
§ So revenue of BALANCE = 2B – x (where OPT is 2B)
§ Let’s work out what x is!
3/3/22 Jure Leskovec & Mina Ghashami, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 40
Queries allocated to A1 in optimal solution
B
Queries allocated to A2 in optimal solution
Opt revenue = 2B
A1 A2

Balance revenue = 2B-x = B+y


x
B
We claim y > x (next slide).
y x Balance revenue is minimum for x=y=B/2.
Minimum Balance revenue = 3B/2.
A1 A2 Not used
Competitive Ratio = 3/4.
Balance allocation

3/3/22 Jure Leskovec & Mina Ghashami, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 41
Queries allocated to A1 in the optimal solution
B
Queries allocated to A2 in the optimal solution

Optimal revenue = 2B
A1 A2 Assume Balance gives revenue = 2B-x = B+y
Assume we exhausted A2’s budget

x Notice: Unassigned queries should be assigned


B
to A2 (since if we could assign to A1 we would since we still have
y x the budget)
Goal: Show we have y ³ B/2
A1 A2 Not
used Case 1) BALANCE assigns ≥B/2 blue queries to
A1.
Then trivially, 𝒚 ≥ 𝑩/𝟐

3/3/22 Jure Leskovec & Mina Ghashami, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 42
Queries allocated to A1 in the optimal solution
B
Queries allocated to A2 in the optimal solution
Optimal revenue = 2B
Assume Balance gives revenue = 2B-x = B+y
A1 A2
Assume we exhausted A2’s budget
Unassigned queries should be assigned to A2
x (if we could assign to A1 we would since we still have the budget)
B Goal: Show we have y ³ B/2
y x
Balance revenue is minimum for 𝒙 = 𝒚 = 𝑩/𝟐
A1 A2 Not Minimum Balance revenue = 𝟑𝑩/𝟐
used Competitive Ratio: BAL/OPT = 3/4
Case 2) BALANCE assigns ≥B/2 blue queries to A2.
Consider the last blue query assigned to A2.
At that time, A2’s unspent budget must have been at least as big as A1’s.
That means at least as many queries have been assigned to A1 as to A2.
At this point, we have already assigned at least B/2 queries to A2.
So, 𝒙 ≤ 𝑩/𝟐 and 𝒙 + 𝒚 = 𝑩 then 𝒚 > 𝑩/𝟐
3/3/22 Jure Leskovec & Mina Ghashami, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 43
¡ In the general case, worst competitive ratio
of BALANCE is 1–1/e = approx. 0.63
§ e = 2.7182
§ Interestingly, no online algorithm has a better
competitive ratio!

¡ Let’s see the worst case example that gives


this ratio

3/3/22 Jure Leskovec & Mina Ghashami, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 44
¡ N advertisers: A1, A2, … AN
§ Each with budget B > N
¡ Queries:
§ N·B queries appear in N rounds of B queries each
¡ Bidding:
§ Round 1 queries: bidders A1, A2, …, AN
§ Round 2 queries: bidders A2, A3, …, AN
§ Round i queries: bidders Ai, …, AN
¡ Optimum allocation:
Allocate all round i queries to Ai
§ Optimum revenue N·B
3/3/22 Jure Leskovec & Mina Ghashami, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 45
: Budget
Advertiser’s budget

spent in rounds
1,2, 3, …

… B/(N-2)
B/(N-1)
B/N
A1 A2 A3 AN-1 AN

BALANCE assigns each of the queries in round 1 to N advertisers.


After k rounds, sum of allocations 𝑺𝒌 to each of advertisers
𝒌 𝑩

Ak,…,AN is 𝑺𝒌 = 𝑺𝒌-𝟏 = ⋯ = 𝑺𝑵 = 𝒊1𝟏 𝑵2(𝒊2𝟏)

If we find the smallest k such that Sk ³ B, then after k rounds


we cannot allocate any queries to any advertiser
3/3/22 Jure Leskovec & Mina Ghashami, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 46
B/1 B/2 B/3 … B/(N-(k-1)) … B/(N-1) B/N
S1
S2

Sk = B
Can divide everything by B:
1/1 1/2 1/3 … 1/(N-(k-1)) … 1/(N-1) 1/N
S1
S2

Sk = 1

3/3/22 Jure Leskovec & Mina Ghashami, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 47
¡ Fact: 𝑯𝒏 = ∑𝒏𝒊#𝟏 𝟏/𝒊 ≈ 𝐥𝐧 𝒏 for large n
§ Result due to Euler
1/1 1/2 1/3 … 1/(N-(k-1)) … 1/(N-1) 1/N
ln(N)

ln(N)-1 Sk = 1
𝑵
¡ 𝑺𝒌 = 𝟏 implies: 𝑯𝑵'𝒌 = 𝒍𝒏(𝑵) − 𝟏 = 𝒍𝒏( )
𝒆
¡ We also know: 𝑯𝑵'𝒌 = 𝒍𝒏(𝑵 − 𝒌)
𝑵
¡ So: 𝑵 − 𝒌 = N terms sum to ln(N).
𝒆 Last k terms sum to 1.
𝟏
¡ Then: 𝒌 = 𝑵(𝟏 − ) First N-k terms sum
𝒆 to ln(N-k) but also to ln(N)-1
3/3/22 Jure Leskovec & Mina Ghashami, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 48
¡ So after the first k=N(1-1/e) rounds, we
cannot allocate a query to any advertiser

¡ Revenue = B·N (1-1/e)

¡ Competitive ratio = 1-1/e

¡ Note: So far we assumed:


§ All advertisers have the same budget B
§ All advertisers bid 1 for the ad
§ (but each advertiser can bid on any subset of ads)
3/3/22 Jure Leskovec & Mina Ghashami, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 49
¡ Arbitrary bids and arbitrary budgets!
¡ Consider we have 1 query q, advertiser i
§ Bid = xi
§ Budget = bi
¡ In a general setting BALANCE can be terrible
§ Consider two advertisers A1 and A2
§ A1: x1 = 1, b1 = 110
§ A2: x2 = 10, b2 = 100
§ Consider we see 10 instances of q
§ BALANCE always selects A1 and earns 10
§ Optimal earns 100
3/3/22 Jure Leskovec & Mina Ghashami, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 50
¡ Arbitrary bids: consider query q, bidder i
§ Bid = xi
§ Budget = bi
§ Amount spent so far = mi
§ Fraction of budget left over fi = 1-mi/bi
§ Define yi(q) = xi(1-e-fi)

¡ Allocate query q to bidder i with largest


value of yi(q)
¡ Same competitive ratio (1-1/e) = 0.63
3/3/22 Jure Leskovec & Mina Ghashami, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 51

You might also like