CYK PARSING
• CYK algorithm is a parsing algorithm for context free grammar.
• It was independently developed by three Russian scientists
named Cocke, Younger, and Kasami, hence the name CYK!
• t is used to check if a particular string can be derived from the language
generated by a given grammar. It is also called the membership algorithm as it
tells whether the given string is a member of the given grammar or not.
• To apply CYK algorithm to a grammar, it must be in Chomsky Normal Form.
• It uses a dynamic programming algorithm to tell whether a string is in the
language of a grammar.
Basics
In a CYK algorithm, the structure of grammar should be in Chomsky normal form.
In addition, the CYK algorithm uses the dynamic programming or table filling algorithm.
CNF
• Chomsky Normal Form (CNF) is a way to simplify context-free grammars (CFGs)
so that all production rules follow specific patterns.
• In CNF, each rule either produces two non-terminal symbols, a single terminal
symbol, or, in some cases, the empty string.
• Converting a CFG to CNF is an important step in many parsing algorithms, like
the CYK algorithm, and helps in understanding the structure of languages.
CNF Rules
• It has starting symbol
• It has LHS and RHS
• LHS will have non-terminal
• RHS can contain:
▪ Single terminal
▪ Two non-terminal
• RHS should not contain:
▪ Combination of terminal and non-terminal
▪ Single non-terminal
The grammar will be in CNF if each rule has one of the following forms:
• A→BC (at most two variables on the right-hand side)
• A→ a (a single terminal on the right-hand side)
• S→Ø (null string)
If the given Grammar is not in the CNF form, convert it to the CNF form before applying
the CYK Algorithm.
Step 1:
Original Grammar
S → NP VP
NP → Det N | N
VP → V NP
Det → 'the' | 'a'
N → 'cat' | 'dog'
V → 'chased'
Step 2: Convert to CNF
S → NP VP
NP → Det N
VP → V NP
Det → 'the' | 'a'
N → 'cat' | 'dog'
V → 'chased'
The above is already in CNF form
CYK Parsing Table Construction
We construct a CYK table where row i, column j represents the non-terminals that
can derive the substring from position iii to jjj.
Sentence: "The cat chased a dog"
Count the number of words in the sentence and construct a matrix accordingly.
i.e., 5 x 5 matrix (rows from 1 to 5 and columns from 0 to 4)
0 the 1 cat 2 chased 3 a 4 dog 5
1 2 3 4 5
[0,1] [1,2]
0 Det
1 N
2
3
4
Check the CNF rule and find if there are any non-terminal contains Det and V;
Yes, we have :
NP → Det N
1 2 3 4 5
[0,1] [1,2]
0 Det NP
1
N
2
Next, we have ‘chased’ comes in between 2 and 3
1 2 3 4 5
[0,1] [1,2] [2,3]
0 Det NP
1
N
2 V
4
Next , we have ‘a’ which occurs between 3 and 4 and similarly ‘dog’ occurs between 4
and 5:
1 2 3 4 5
[0,1] [1,2] [2,3] [3,4] [4,5]
0 Det NP
1
N
2 V
3 Det NP
4
N
Finally go up one by one, V and NP is there in the rule, so we can combine
1 2 3 4 5
[0,1] [1,2] [2,3] [3,4] [4,5]
0 Det NP
1
N
2 V VP
3 Det NP
4
N
Now, check VP and NP in the rule, which is there in S→ NP VP
1 2 3 4 5
[0,1] [1,2] [2,3] [3,4] [4,5]
0 Det NP S
1
N
2 V VP
3 Det NP
4
N
Finally S is obtained, so can end here.
Advantages:
• Guaranteed Completeness – Ensures all valid parses are found.
• Efficient (O(n³) Complexity) – Uses dynamic programming for faster parsing.
• Handles Ambiguous Grammars – Finds all possible parses for a sentence.
• Supports Probabilistic Parsing (PCFGs) – Determines the most probable
parse.
• Foundation for Modern Parsers – Used in NLP and machine learning models.
• Useful in Compiler Design – Helps in syntax analysis and syntax tree
generation.
• Works Well for Large Sentences – More efficient than top-down approaches.