Week 2
• The two sectors provide returns of f1(x) and f2(x) respectively for investment ‘x’
• I need to decide how to divide my budget in the two sectors to maximize the net return
Linear Regression: squared-error loss function (h(x) - y)^2, where h(x) = W.X + b
Optimal values of (W, b): argmin_(W,b) ∑i (W.Xi + b - yi)^2
Optimal values can be calculated by equating the derivatives to 0
For vector x, optimization for each dimension can be decoupled from the rest!
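The optimal (W, b) can be obtained in closed form by setting both partial derivatives to zero. A minimal 1-D sketch with hypothetical data (variable names are illustrative):

```python
# Least-squares fit by equating derivatives to zero (1-D sketch, assumed data).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.0, 9.1]  # roughly y = 2x + 1

n = len(xs)
sx = sum(xs); sy = sum(ys)
sxx = sum(x * x for x in xs); sxy = sum(x * y for x, y in zip(xs, ys))

# d/dw sum (w*x + b - y)^2 = 0 and d/db sum (w*x + b - y)^2 = 0 give:
#   w*sxx + b*sx = sxy
#   w*sx  + b*n  = sy
w = (n * sxy - sx * sy) / (n * sxx - sx * sx)
b = (sy - w * sx) / n
print(round(w, 2), round(b, 2))  # -> 2.01 1.0
```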
Gradient Descent based Optimization
Consider f(x) = x^2 - 4x + 4; f'(x) = 2x - 4
Set initial point x=5, learning rate a = 0.1
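A minimal sketch of the update x ← x - a·f'(x) on this example:

```python
# Gradient descent on f(x) = x^2 - 4x + 4, starting at x = 5, rate a = 0.1.
def fprime(x):
    return 2 * x - 4

x = 5.0
for _ in range(100):
    x = x - 0.1 * fprime(x)  # x_{t+1} = x_t - a * f'(x_t)
print(round(x, 4))  # converges toward the minimizer x = 2
```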
Gradient Descent based Optimization
• Does it always converge?
- Depends on the "learning rate"
- Low learning rate: slow convergence
- High learning rate: may oscillate around the minimum!
• Does it always give the optimal solution?
- Yes, if the function is convex (has a unique minimum)
- Otherwise, it converges to some local minimum
Source: Wikipedia
Numerical methods can converge to any local minimum (typically the one closest to the initial point)
Solution: i) start the descent from multiple initial points
ii) find the local minimum reached from each of them
iii) take the best (lowest) of these local minima!
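The multi-start recipe above can be sketched as follows (the non-convex function here is an assumed example, not from the lecture):

```python
# Multi-start gradient descent on an assumed non-convex function:
# f(x) = x^4 - 3x^2 + x has two local minima; f'(x) = 4x^3 - 6x + 1.
def f(x):
    return x ** 4 - 3 * x ** 2 + x

def fprime(x):
    return 4 * x ** 3 - 6 * x + 1

def descend(x, rate=0.01, steps=2000):
    for _ in range(steps):
        x -= rate * fprime(x)
    return x

# i) several starting points, ii) local minimum from each, iii) best of those
starts = [-3.0, -1.0, 0.5, 3.0]
local_minima = [descend(x0) for x0 in starts]
best = min(local_minima, key=f)  # the global minimum is near x = -1.3
```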
Multi-objective Optimization
Pareto-Optimal Solutions
• In multi-objective optimization, goodness of a solution is defined by “Dominance”
• A solution ‘x1’ dominates another solution ‘x2’ if (assuming minimization problem for all objectives)
i. fi(x1) <= fi(x2) for every objective ‘i’
ii. fj(x1) < fj(x2) for at least one objective ‘j’
• Given two solutions (x1, x2), it is possible that neither dominates the other!
• A solution ‘x’ is Pareto-optimal if it is not dominated by any other solution!
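The dominance test and the Pareto-optimal set can be sketched directly from the two conditions above (the solution vectors are hypothetical):

```python
# Dominance check for a minimization problem (sketch; hypothetical solutions).
def dominates(a, b):
    # a dominates b: no worse in every objective, strictly better in at least one
    return (all(ai <= bi for ai, bi in zip(a, b))
            and any(ai < bi for ai, bi in zip(a, b)))

def pareto_front(solutions):
    # keep every solution not dominated by any other
    return [s for s in solutions
            if not any(dominates(t, s) for t in solutions if t != s)]

sols = [[3, 4, 8], [2, 5, 7], [4, 5, 8]]
print(pareto_front(sols))  # [4, 5, 8] is dominated by [3, 4, 8]
```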
• Pareto Improvements in Trade: International trade can lead to Pareto improvements by allowing
countries to specialize in producing goods where they have a comparative advantage and then trade for
goods produced more efficiently by other countries.
• Income Redistribution: A transfer of resources from a wealthy individual to a poorer individual can be
Pareto-improving if it increases overall social welfare without making the wealthy individual worse off.
• Social Welfare Maximization: A social welfare function that achieves Pareto optimality
ensures that no feasible reallocation of resources can make any individual better off
without making someone else worse off.
Lecture – 07
Topic: Constrained Optimization
• Linear Program
• Lagrange Multipliers
• Original problem:
• Min f(x) such that h1(x) = 0, h2(x) = 0, ..., hk(x) = 0
• Alternative problem: Min F(x, λ1, λ2, ..., λk) = f(x) + λ1·h1(x) + λ2·h2(x) + ... + λk·hk(x)
• The idea: if λ1, λ2, etc. are high enough, then the optimal solution must automatically have low values of h1(x), h2(x), etc.!
• Approach: find the stationary points of F with respect to all the variables,
i.e. the partial derivatives of F should be 0 w.r.t. all variables
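A worked sketch of the approach on an assumed toy problem, minimize x² + y² subject to x + y = 1:

```python
# Worked Lagrange-multiplier sketch (assumed example):
# minimize f(x, y) = x^2 + y^2 subject to h(x, y) = x + y - 1 = 0.
# F(x, y, lam) = x^2 + y^2 + lam * (x + y - 1); set all partials to zero:
#   dF/dx   = 2x + lam  = 0
#   dF/dy   = 2y + lam  = 0
#   dF/dlam = x + y - 1 = 0
# The first two give x = y = -lam/2; substituting into the third gives lam = -1.
lam = -1.0
x = y = -lam / 2
print(x, y, lam)  # 0.5 0.5 -1.0
```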
• All inequality constraints are converted to equality constraints with slack variables
• Lagrange multipliers are added as before
If all of these conditions are satisfied by x*, then it is the optimal solution to the original problem
Interior Point Methods
• Basic Approach: start with a point within the feasible set (need not be optimal w.r.t. objective)
• Perturb the solution towards the optimum (e.g., by gradient descent)
• Check if constraints are violated – limit the perturbation accordingly!
• Implemented using barrier functions – very low inside the feasible region, very high at and beyond the border!
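A sketch of the barrier idea on an assumed 1-D problem; for robustness the inner minimization here uses bisection on the gradient rather than gradient descent:

```python
# Log-barrier sketch (assumed toy problem): minimize f(x) = (x - 3)^2
# subject to x <= 1. The barrier term -mu*log(1 - x) is nearly flat well
# inside the feasible region and blows up at the border x = 1.
def barrier_grad(x, mu):
    # derivative of (x - 3)^2 - mu * log(1 - x)
    return 2 * (x - 3) + mu / (1 - x)

def inner_minimize(mu, lo=-10.0, hi=1.0 - 1e-12):
    # the gradient is increasing on (-inf, 1), so bisect for its unique zero
    for _ in range(200):
        mid = (lo + hi) / 2
        if barrier_grad(mid, mu) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

mu, path = 1.0, []
for _ in range(30):
    path.append(inner_minimize(mu))  # minimizer of the barrier objective
    mu *= 0.5                        # relax the barrier
# the iterates stay strictly feasible and approach the constrained optimum x = 1
```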
Exterior Point Methods
• Basic approach: start at initial point outside feasible region
• Optimize the objective function plus penalty terms for the constraints
• Gradually increase the penalty weights until the constraints are satisfied
• Automatically pushes solution towards the feasible set
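A sketch of the penalty idea on an assumed 1-D problem; the inner minimizer is available in closed form here, so no descent loop is needed:

```python
# Exterior-point (penalty) sketch on an assumed toy problem:
# minimize f(x) = (x - 3)^2 subject to x <= 1, starting outside the
# feasible region. Penalized objective: (x - 3)^2 + w * max(0, x - 1)^2.
def penalized_min(w):
    # setting the derivative 2(x - 3) + 2w(x - 1) to zero (for x > 1)
    # gives the minimizer x = (3 + w) / (1 + w), still slightly infeasible
    return (3 + w) / (1 + w)

w, x = 1.0, 3.0   # the initial point x = 3 violates the constraint
for _ in range(30):
    x = penalized_min(w)  # re-optimize under a heavier constraint weight
    w *= 10
# x is pushed towards the feasible set; the limit is the optimum x = 1
```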
Lecture – 08
Topic: Heuristic Search Techniques
• Graph Search
• Best-First Approach: A*
Breadth First Search: whenever we reach a vertex, note down its unexplored neighbors, visit them in order (queue)
Depth First Search: whenever we reach a vertex, pick one of its unexplored neighbors and explore it (stack)
[Figure: visit orders of BFS and DFS on an example graph]
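The two traversals can be sketched with a queue and a stack respectively (the adjacency list is an assumed example):

```python
from collections import deque

# BFS and DFS sketches on a small assumed adjacency-list graph.
graph = {
    'A': ['B', 'C'],
    'B': ['D'],
    'C': ['D', 'E'],
    'D': ['F'],
    'E': ['F'],
    'F': [],
}

def bfs(start):
    order, seen, queue = [], {start}, deque([start])
    while queue:
        v = queue.popleft()          # FIFO queue: explore level by level
        order.append(v)
        for w in graph[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return order

def dfs(start):
    order, seen, stack = [], set(), [start]
    while stack:
        v = stack.pop()              # LIFO stack: dive into one branch
        if v in seen:
            continue
        seen.add(v)
        order.append(v)
        for w in reversed(graph[v]): # reversed: visit neighbors in listed order
            if w not in seen:
                stack.append(w)
    return order
```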
Dijkstra’s Algorithm: finds shortest paths from a fixed origin to all other vertices
The Floyd-Warshall Algorithm: finds shortest paths between all pairs of vertices!
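A minimal sketch of Dijkstra's algorithm on an assumed weighted graph:

```python
import heapq

# Dijkstra sketch: shortest distances from a fixed origin (assumed graph).
graph = {
    'A': [('B', 4), ('C', 2)],
    'B': [('D', 5)],
    'C': [('B', 1), ('D', 8)],
    'D': [],
}

def dijkstra(source):
    dist = {v: float('inf') for v in graph}
    dist[source] = 0
    heap = [(0, source)]             # min-heap ordered by tentative distance
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue                 # stale heap entry, skip
        for w, c in graph[v]:
            if d + c < dist[w]:      # relax edge v -> w
                dist[w] = d + c
                heapq.heappush(heap, (dist[w], w))
    return dist
```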
Heuristic Search
• BFS and DFS involve backtracking (revisiting vertices)
• Shortest-path algorithms assume the entire graph is revealed up front
• These may not be possible in an actual graph-search scenario
• We may have to choose one vertex at each step, and may not be able to backtrack!
• Our final aim: to reach a “goal vertex” at least cost!
• E.g., chess!
[Figure: example graph with edge costs, a START vertex, and two GOAL vertices]
Heuristic Search
• We need to have an estimate – how far is a goal vertex from a given vertex?
• From the current vertex, we move to the neighbouring vertex through which the shortest path seems to pass (based on estimates)!
• Heuristic function h(x): the estimated shortest-path cost from vertex x to any goal vertex (available for each vertex)
[Figure: the example graph annotated with heuristic values h(x) at each vertex]
A* Algorithm
• Set x = START vertex; add x to the OPEN list; set f(x) = h(x), g(x) = 0, g(y) = INFINITY for every other vertex y; CLOSED list: {}
• Repeat:
• 1) At the current vertex x, if x = GOAL, stop and report g(x)
• 2) Else, for each neighbour y of x (that is not in the CLOSED list):
Add y to the OPEN list (if not already present)
If g(y) > g(x) + c(x, y)   [c(x, y): weight of edge x -> y]
Set g(y) = g(x) + c(x, y), f(y) = g(y) + h(y), and Prev(y) = x
• 3) Add x to the CLOSED list, move to the vertex in the OPEN list with the least value of f, and GOTO 1!
[Figure: A* trace on the example graph – a table of h(x), g(x), f(x), and Prev(x), updated as vertices are expanded]
Shortest path found: A -> D -> E -> G (cost 8)
Note: A -> C -> E -> G also has cost 8, but wasn’t chosen!
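The steps above can be sketched as follows; the graph and heuristic are hypothetical, loosely modeled on the lecture's example, with h never overestimating the true cost:

```python
import heapq

# A* sketch (hypothetical graph and admissible heuristic).
graph = {
    'A': [('C', 2), ('D', 5)],
    'C': [('E', 4)],
    'D': [('E', 1)],
    'E': [('G', 2)],
    'G': [],
}
h = {'A': 7, 'C': 5, 'D': 1, 'E': 1, 'G': 0}

def a_star(start, goal):
    g = {v: float('inf') for v in graph}
    g[start] = 0
    prev = {}
    open_heap = [(h[start], start)]   # OPEN list as a heap of (f, vertex)
    closed = set()
    while open_heap:
        f, x = heapq.heappop(open_heap)
        if x == goal:
            return g[x], prev         # stop and report g(x)
        if x in closed:
            continue
        closed.add(x)
        for y, c in graph[x]:
            if y not in closed and g[x] + c < g[y]:
                g[y] = g[x] + c       # relax and record predecessor
                prev[y] = x
                heapq.heappush(open_heap, (g[y] + h[y], y))
    return float('inf'), prev

cost, prev = a_star('A', 'G')  # cost 8 via A -> D -> E -> G
```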
Guarantees of A* Algorithm
• At any point, f(x) indicates the cost of the shortest path from START to a GOAL via x, given the vertices visited so far
• Guaranteed to find the shortest path if h(x) is ADMISSIBLE
• Admissible heuristic function: never OVER-estimates the least cost to any goal state!
• Otherwise, we may choose the wrong vertex and never come back to the correct one!
• If h(x) is too low, we may have to search the entire graph (equivalent to Breadth First Search)
• Simple choice of h(x): minimum cost to any of its neighbors!
• Supply Chain Optimization: Companies often need to optimize their supply chains to minimize
costs or maximize efficiency. Heuristic graph search algorithms can be used to find the most
efficient routes for transportation and distribution, taking into account factors like
distance, time, and costs.
• Graph algorithms like Dijkstra’s Algorithm can be used if the full graph is known
• For limited-horizon search, we need to estimate distances to the goal state using a heuristic function
• Too low a value of the heuristic function slows down the search, exploring unnecessary vertices!
Lecture – 09
Topic: Multi-objective Heuristic Search and Game Trees
• Game Tree
• Alpha-Beta Pruning
chesscoach.com
Why is Chess So Hard?
• At each step, many possibilities are available, and there are many goal states
• We need a heuristic to estimate the distance from a state to a goal state (how?)
• It is difficult to think ahead beyond 2-3 moves!
• Plus, the opponent also moves!
• Tic-tac-toe has a similar structure with far fewer configurations!
Game Tree Formulation
• Can be used to represent any two- or multi-player contest in which players move in turns
• A vertex represents a state of the contest, labelled with a heuristic value
• Alternate levels of the tree represent the moves of each player
• At each level, the player can move from the current state to another by choosing an action
• Each action has a cost; each player aims to reach a goal state at minimum total cost
• In some cases, whoever reaches a goal state first wins!
• Each state has a heuristic function, indicating “goodness” of state to one player
Game Tree Formulation
• At each step, P1 tries to reach a vertex with the least value of the heuristic function
• At the next step, P2 tries to reach a vertex with the maximum value of the heuristic function!
• Each leaf node is characterized by a value
• Low values are preferred by P1, high values by P2
[Figure: example game tree with heuristic values at each vertex; leaf nodes marked GOAL]
P2 wins the game, but pays a higher total cost (12) than P1 (11)!
Optimal Path Identification
• Each leaf vertex already has a “utility” measure
• Use it to update the heuristic values at the remaining vertices
• Consider each pre-leaf vertex
• If P2 makes the move there, they will surely move to the leaf with the higher value
• Copy that value to the pre-leaf vertex!
[Figure: leaf utilities backed up level by level towards the root]
Optimal Path Identification
• But if P2 moves first, the result can be quite different!
[Figure: the same tree evaluated with P2 moving first]
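The backup procedure can be sketched as a recursive minimax over a small assumed tree (P1 minimizes, P2 maximizes):

```python
# Minimax value backup on a small assumed game tree.
# A node is either a leaf utility (number) or a list of child subtrees.
def minimax(node, p1_to_move):
    if not isinstance(node, list):
        return node                                 # leaf: utility value
    values = [minimax(child, not p1_to_move) for child in node]
    # P1 picks the child with the least backed-up value, P2 the greatest
    return min(values) if p1_to_move else max(values)

tree = [[3, 6], [5, 1], [4, 8]]  # P1 moves at the root, P2 at the next level
print(minimax(tree, True))       # backed-up values at depth 1: 6, 5, 8
```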
Alpha-Beta Pruning
• Is it really necessary to evaluate the entire game tree?
• Sometimes we already know that a particular move can never give the best result!
• Prune the unnecessary parts!
• Alpha-pruning: prune subtrees with high values!
• Beta-pruning: prune subtrees with low values!
Source: Wikipedia
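A sketch of alpha-beta pruning on the same kind of tree; it returns the same value as full minimax while skipping branches that cannot affect the result:

```python
# Alpha-beta pruning sketch on an assumed game tree (P1 minimizes, P2 maximizes).
# A node is either a leaf utility (number) or a list of child subtrees.
def alphabeta(node, p1_to_move, alpha=float('-inf'), beta=float('inf')):
    if not isinstance(node, list):
        return node
    if p1_to_move:                        # minimizing player
        value = float('inf')
        for child in node:
            value = min(value, alphabeta(child, False, alpha, beta))
            beta = min(beta, value)
            if value <= alpha:            # cut: P2 will never allow this branch
                break
        return value
    value = float('-inf')                 # maximizing player
    for child in node:
        value = max(value, alphabeta(child, True, alpha, beta))
        alpha = max(alpha, value)
        if value >= beta:                 # cut: P1 will never allow this branch
            break
    return value

tree = [[3, 6], [5, 1], [4, 8]]
print(alphabeta(tree, True))  # same root value as plain minimax
```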
Multi-objective Heuristic Search
• Edge cost c(x,y) is vector-valued!
• Heuristic function h(x), estimated path costs f(x) and g(x) also vector-valued!
• Need to change A* algorithm accordingly!
• When comparing f(y) values in the OPEN list, we may find a ‘non-dominated’ set of vertices
• E.g. f(y1) = [3 4 8], f(y2) = [2 5 7], f(y3) = [4 5 8]: {y1, y2} are mutually non-dominated!
[Figure: multi-objective A* example – expansion of vertices n2, n4, n7; max-dominance, min-dominance, and non-dominance cases]
• In behavioral economics, game trees allow economists to predict how rational agents
will behave in strategic situations, such as oligopoly competition, bargaining
situations, or negotiations where each player's decision depends on their
expectations of others' actions.
• Agglomerative Clustering
• K-means Clustering
Single linkage: minimum distance between two points from the two sets
Complete linkage: maximum distance between two points from the two sets
Average linkage: mean distance between two points from the two sets
Agglomerative Clustering (Threshold: 5)
[Figure: clusters merged step by step; when the smallest inter-cluster distance exceeds the threshold (“TOO HIGH!”), the process terminates]
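A single-linkage sketch with a distance threshold, on assumed 1-D points:

```python
# Agglomerative clustering sketch (single linkage, assumed 1-D data):
# repeatedly merge the two closest clusters until the smallest inter-cluster
# distance exceeds the threshold.
def single_linkage(c1, c2):
    # minimum distance between two points from the two sets
    return min(abs(a - b) for a in c1 for b in c2)

def agglomerate(points, threshold):
    clusters = [[p] for p in points]          # start with singleton clusters
    while len(clusters) > 1:
        pairs = [(single_linkage(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        d, i, j = min(pairs)                  # closest pair of clusters
        if d > threshold:
            break                             # all remaining gaps too high
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

print(agglomerate([1, 2, 3, 10, 11, 12], 5))
```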
Prototype-based Clustering
K-means Clustering (MacQueen, 1967)
Calculate Distances to Initial Clusters
Assignment to Clusters
Re-estimate Cluster Centers
Stopping Criteria
• No change of cluster assignment for any data point
• No (or negligible) change of cluster centers
• Intra-cluster distance at a local minimum
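The assignment/update loop with the first stopping criterion can be sketched as follows (data and initial centers are assumed):

```python
# K-means sketch (1-D, assumed data and initial centers), stopping when
# no data point changes its cluster assignment.
def kmeans(points, centers, max_iter=100):
    assign = None
    for _ in range(max_iter):
        # assignment step: each point goes to its nearest center
        new_assign = [min(range(len(centers)),
                          key=lambda k: abs(p - centers[k]))
                      for p in points]
        if new_assign == assign:
            break                                    # converged
        assign = new_assign
        # update step: re-estimate each center as the mean of its points
        for k in range(len(centers)):
            members = [p for p, a in zip(points, assign) if a == k]
            if members:
                centers[k] = sum(members) / len(members)
    return centers, assign

centers, assign = kmeans([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], [0.0, 5.0])
```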
Heuristics
• K-means finds only a local optimum
• The success of K-means clustering often depends on the choice of initial clusters
• The choice of K (number of clusters) is empirical
Applications of Clustering in Economics
• Market Segmentation: The aim here is to identify groups of consumers, who may
behave as a collective. Such groups may be formed on the basis of multiple attributes
including age, gender, demographics etc.
• Labour Market Analysis: Clustering of workers in labour force based on skills, education,
experience, and other relevant factors. This helps in understanding labor market trends,
wage differentials, and workforce mobility, also identifying sets of workers to advertise
specific job opportunities.
• Clustering: partition examples into coherent and distinct groups