MST-based Dependency Parsing
Pawan Goyal
CSE, IIT Kharagpur
September 27th - October 3rd, 2016
Pawan Goyal (IIT Kharagpur) MST-based Dependency Parsing September 27th - October 3rd, 2016 1 / 24
Maximum Spanning Tree Based
Pawan Goyal (IIT Kharagpur) MST-based Dependency Parsing September 27th - October 3rd, 2016 2 / 24
Maximum Spanning Tree Based
Basic Idea
Starting from all possible connections, find the maximum spanning tree.
Pawan Goyal (IIT Kharagpur) MST-based Dependency Parsing September 27th - October 3rd, 2016 2 / 24
Maximum Spanning Tree Based
Basic Idea
Starting from all possible connections, find the maximum spanning tree.
Pawan Goyal (IIT Kharagpur) MST-based Dependency Parsing September 27th - October 3rd, 2016 2 / 24
Some Graph Theory Reminders
A graph G = (V, A) is a set of vertices V and arcs (i, j) A where i, j V .
Undirected graphs: (i, j) A (j, i) A
Directed graphs (digraphs) : (i, j) A ; (j, i) A
Pawan Goyal (IIT Kharagpur) MST-based Dependency Parsing September 27th - October 3rd, 2016 3 / 24
Multi-Digraphs
A multi-digraph is a digraph where multiple arcs between vertices are
possible
(i, j, k) A represents the kth arc from vertex i to vertex j.
Pawan Goyal (IIT Kharagpur) MST-based Dependency Parsing September 27th - October 3rd, 2016 4 / 24
Directed Spanning Trees
A directed spanning tree of a (multi-)digraph G = (V, A) is a subgraph
G0 = (V 0 , A0 ) such that:
I V0 = V
I A0 A, and |A0 | = |V 0 | 1
I G0 is a tree (acyclic)
Pawan Goyal (IIT Kharagpur) MST-based Dependency Parsing September 27th - October 3rd, 2016 5 / 24
Directed Spanning Trees
A directed spanning tree of a (multi-)digraph G = (V, A) is a subgraph
G0 = (V 0 , A0 ) such that:
I V0 = V
I A0 A, and |A0 | = |V 0 | 1
I G0 is a tree (acyclic)
A spanning tree of the following (multi-)digraphs
Pawan Goyal (IIT Kharagpur) MST-based Dependency Parsing September 27th - October 3rd, 2016 5 / 24
Weighted Directed Spanning Trees
Assume we have a weight function for each arc in a multi-digraph
G = (V, A).
Define wij k 0 to be the weight of (i, j, k) A for a multi-digraph
Define the weight of directed spanning tree G0 of graph G as
X
w(G0 ) = wij k
(i,j,k)G0
Pawan Goyal (IIT Kharagpur) MST-based Dependency Parsing September 27th - October 3rd, 2016 6 / 24
Maximum Spanning Trees (MST)
Let T(G) be the set of all spanning trees for graph G
Pawan Goyal (IIT Kharagpur) MST-based Dependency Parsing September 27th - October 3rd, 2016 7 / 24
Maximum Spanning Trees (MST)
Let T(G) be the set of all spanning trees for graph G
The MST problem
Find the spanning tree G0 of the graph G that has the highest weight
X
G0 = arg max w(G0 ) = arg max wij k
G0 T(G) G0 T(G) (i,j,k)G0
Pawan Goyal (IIT Kharagpur) MST-based Dependency Parsing September 27th - October 3rd, 2016 7 / 24
Finding MST
Directed Graph
For each sentence x, define the directed graph Gx = (Vx , Ex ) given by
Vx = {x0 = root, x1 , . . . , xn }
Ex = {(i, j) : i , j, (i, j) [0 : n] [1 : n]}
Pawan Goyal (IIT Kharagpur) MST-based Dependency Parsing September 27th - October 3rd, 2016 8 / 24
Finding MST
Directed Graph
For each sentence x, define the directed graph Gx = (Vx , Ex ) given by
Vx = {x0 = root, x1 , . . . , xn }
Ex = {(i, j) : i , j, (i, j) [0 : n] [1 : n]}
Gx is a graph with
the sentence words and the dummy root symbol as vertices and
a directed edge between every pair of distinct words and
a directed edge from the root symbol to every word
Pawan Goyal (IIT Kharagpur) MST-based Dependency Parsing September 27th - October 3rd, 2016 8 / 24
Chu-Liu-Edmonds Algorithm
Chu-Liu-Edmonds Algorithm
Each vertex in the graph greedily selects the incoming edge with the
highest weight.
If a tree results, it must be a maximum spanning tree.
If not, there must be a cycle.
I Identify the cycle and contract it into a single vertex.
I Recalculate edge weights going into and out of the cycle.
Pawan Goyal (IIT Kharagpur) MST-based Dependency Parsing September 27th - October 3rd, 2016 9 / 24
Chu-Liu-Edmonds Algorithm
x = John saw Mary
Build the directed graph
September 27th - October 3rd, 2016 10 /
Pawan Goyal (IIT Kharagpur) MST-based Dependency Parsing 24
Chu-Liu-Edmonds Algorithm
Find the highest scoring incoming arc for each vertex
If this is a tree, then we have found MST.
September 27th - October 3rd, 2016 11 /
Pawan Goyal (IIT Kharagpur) MST-based Dependency Parsing 24
Chu-Liu-Edmonds Algorithm
If not a tree, identify cycle and contract
Recalculate arc weights into and out-of cycle
September 27th - October 3rd, 2016 12 /
Pawan Goyal (IIT Kharagpur) MST-based Dependency Parsing 24
Chu-Liu-Edmonds Algorithm
Outgoing arc weights
Equal to the max of outgoing arc over all vertices in cycle
e.g., John Mary is 3 and saw Mary is 30.
September 27th - October 3rd, 2016 13 /
Pawan Goyal (IIT Kharagpur) MST-based Dependency Parsing 24
Chu-Liu-Edmonds Algorithm
Incoming arc weights
Equal to the weight of best spanning tree that includes head of incoming
arc and all nodes in cycle
root saw John is 40
root John saw is 29
September 27th - October 3rd, 2016 14 /
Pawan Goyal (IIT Kharagpur) MST-based Dependency Parsing 24
Chu-Liu-Edmonds Algorithm
Calling the algorithm again on the contracted graph:
This is a tree and the MST for the contracted graph
Go back up the recursive call and reconstruct final graph
September 27th - October 3rd, 2016 15 /
Pawan Goyal (IIT Kharagpur) MST-based Dependency Parsing 24
Chu-Liu-Edmonds Algorithm
The edge from wjs to Mary was from saw
The edge from root to wjs represented a tree from root to saw to John.
September 27th - October 3rd, 2016 16 /
Pawan Goyal (IIT Kharagpur) MST-based Dependency Parsing 24
Arc weights as linear classifiers
wij k = w.f (i, j, k)
September 27th - October 3rd, 2016 17 /
Pawan Goyal (IIT Kharagpur) MST-based Dependency Parsing 24
Arc weights as linear classifiers
wij k = w.f (i, j, k)
Arc weights are a linear combination of features of the arc f (i, j, k) and a
corresponding weight vector w
What arc features?
September 27th - October 3rd, 2016 17 /
Pawan Goyal (IIT Kharagpur) MST-based Dependency Parsing 24
Arc Features f (i, j, k)
Features
Identities of the words wi and wj for a label lk
head = saw & dependent=with
September 27th - October 3rd, 2016 18 /
Pawan Goyal (IIT Kharagpur) MST-based Dependency Parsing 24
Arc Features f (i, j, k)
Features
Part-of-speech tags of the words wi and wj for a label lk
head-pos = Verb & dependent-pos=Preposition
September 27th - October 3rd, 2016 19 /
Pawan Goyal (IIT Kharagpur) MST-based Dependency Parsing 24
Arc Features f (i, j, k)
Features
Part-of-speech of words surrounding and between wi and wj
inbetween-pos = Noun
inbetween-pos = Adverb
dependent-pos-right = Pronoun
head-pos-left=Noun
September 27th - October 3rd, 2016 20 /
Pawan Goyal (IIT Kharagpur) MST-based Dependency Parsing 24
Arc Features f (i, j, k)
Features
Number of words between wi and wj , and their orientation
arc-distance = 3
arc-direction = right
September 27th - October 3rd, 2016 21 /
Pawan Goyal (IIT Kharagpur) MST-based Dependency Parsing 24
Arc Features f (i, j, k)
Features
Combinations
head-pos=Verb & dependent-pos=Preposition & arc-label=PP
head-pos=Verb & dependent=with & arc-distance=3
No limit : any feature over arc (i, j, k) or input x
September 27th - October 3rd, 2016 22 /
Pawan Goyal (IIT Kharagpur) MST-based Dependency Parsing 24
Learning the parameters
Re-write the inference problem
X
G = arg max wij k
GT(Gx ) (i,j,k)G
X
= arg max w f (i, j, k)
GT(Gx ) (i,j,k)G
= arg max w f (G)
GT(Gx )
September 27th - October 3rd, 2016 23 /
Pawan Goyal (IIT Kharagpur) MST-based Dependency Parsing 24
Learning the parameters
Re-write the inference problem
X
G = arg max wij k
GT(Gx ) (i,j,k)G
X
= arg max w f (i, j, k)
GT(Gx ) (i,j,k)G
= arg max w f (G)
GT(Gx )
Which can be plugged into learning algorithms
September 27th - October 3rd, 2016 23 /
Pawan Goyal (IIT Kharagpur) MST-based Dependency Parsing 24
Inference-based Learning
|T|
Training data: T = {(xt , Gt )}t=1
1. w(0) = 0; i = 0
2. for n : 1..N
3. for t : 1..|T|
4. Let G0 = argmaxG0 w(i) .f (G0 )
5. if G0 , G
6. w(i+1) = w(i) + f (Gt ) f (G0 )
7. i = i+1
8. return wi
September 27th - October 3rd, 2016 24 /
Pawan Goyal (IIT Kharagpur) MST-based Dependency Parsing 24