0% found this document useful (0 votes)
152 views7 pages

Formal Grammar and Chomsky Hierarchy

This document discusses different types of language models, including grammar-based and statistical models. It focuses on formal grammar-based models, explaining the key components of a formal grammar including terminals, nonterminals, rules, and a start symbol. It then covers the Chomsky hierarchy, which categorizes formal grammars from Type-0 (recursively enumerable) to Type-3 (regular) based on the complexity of the rules. Higher types allow more complex patterns in rule right-hand sides. The distinction matters as more complex grammar types require more computational resources to parse.

Uploaded by

psn.cse
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
152 views7 pages

Formal Grammar and Chomsky Hierarchy

This document discusses different types of language models, including grammar-based and statistical models. It focuses on formal grammar-based models, explaining the key components of a formal grammar including terminals, nonterminals, rules, and a start symbol. It then covers the Chomsky hierarchy, which categorizes formal grammars from Type-0 (recursively enumerable) to Type-3 (regular) based on the complexity of the rules. Higher types allow more complex patterns in rule right-hand sides. The distinction matters as more complex grammar types require more computational resources to parse.

Uploaded by

psn.cse
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 7

Language models

Human Communication 1
Lecture 6
• Two kinds of language models
• Grammar-based models
• Statistical models
• We still look at grammar-based models
today

1 2

Grammar Formal grammar


• More technically, a formal grammar consists of
• a finite set of terminal symbols
• Symbols • a finite set of nonterminal symbols

• Rules • a set of rules (also called production rules)

• Procedure of rule application • with a left- and a right-handed side


• each consisting of a word
• a start symbol

3 4
Special symbols Terminology
• Alphabet: A set of (terminal and
nonterminal) symbols
• Formal grammars usually have two special • Word: A string of symbols from an alphabet
symbols (what we also called ‘sentence’)
• S: the start symbol • Grammar: A set of rules defined on a
• ε: the empty string (sometimes: λ) alphabet
• Language: The set of words defined by a
grammar

5 6

Representing formal
Formal definition
grammar
A grammar G =〈Φ, ∑, R, S〉consists of

1. An alphabet Φ of nonterminal symbols, • Nonterminals are usually represented by


upper-case letters {S, A, B}
2. An alphabet ∑ of terminal symbols,
• Terminals by lower case letters {a, b, c}
3. A set R ⊆ Γ*╳Γ* of rules (where Γ = Φ ∪ ∑),
• The start symbols by S
4. A start symbol S ∈ Φ.

7 8
Example Solution
• Grammar • Apply rewrite rules until the result contains only symbols
from the alphabet.
• Alphabet: a, b
• We can rewrite
• Start symbol: S
• S to aSb by replacing S with aSb (rule 1);
• Rules: • aSb to aaSbb (rule 1)

1. S → aSb • aSb to abab (rule 2)

2. S → ba • For example: S → aSb → aaSbb → aababb


• The language of this grammar consists of the words
• What words are covered by this grammar? anbabn (where n are 0 or more occurrences of the
symbol, but the number of a’s and b’s is the same)
9 10

Chomsky hierarchy Type-3: regular

• 4 types of grammars (Type-0 to Type-3) • LHS: 1 nonterminal


• Type-0: recursively enumerable • RHS: 1 terminal and 0 or 1 nonterminals
• Type-1: context sensitive • Pattern:
• Type-2: context free (CFG) N→t
N1 → t N2 OR N1→ N2 t
• Type-3: regular
11 12
Type-2: context free Type-1: context
(CFG) sensitive
• LHS: 1 nonterminal • LHS: at least 1 nonterminal
• RHS: terminals and nonterminals • RHS: terminals and nonterminals
• Pattern: • Pattern:
αNβ → αγβ
N→γ
(where N is a nonterminal; α, β, γ are
(where N is a nonterminal; γ is a string of
strings of terminals and nonterminals; γ is
terminals and nonterminals)
not empty)

13 14

Type-1: context Type-0: recursively


sensitive enumerable
• All grammars and languages
• (Those that can be recognised by a Turing
• αNβ → αγβ Machine.)

• α and β are the context in which N can be • Pattern:


replaced by γ α→β
(where α and β are any string of terminals
and nonterminals, including the empty
string)

15 16
Chomsky hierarchy What type?
Type-0 · recursively enumerable S → aS
Type-1 · context sensitive S → abS
S→c
Type-2 · context free

Type-3 · regular

17 18

What type? What type?


S → aS S → aS
S → abS S → abS
S → Sb S → Bb
S→c B→ε
S → cngw

19 20
What type? What type?
S → aS S → aS
S → aABb S → aABb
A → Aa aA → Aa
A→a Aa → a
B → Bb B → Bb
B→b B→b

21 22

What type? Resource


S → aS
S → aABb
• The Wikipedia page on Chomsky
aAb → Aa Hierarchies is a good starting point for
formal grammars:
Aa → a http://en.wikipedia.org/wiki/
B → Bb Chomsky_grammar
Bb → b

23 24
Why the distinction? Human grammar
• Why is this distinction relevant?
• What type of grammar is human grammar?
• Answer: different computational complexity
• Probably in between context free and context
• This means: different amount of resources sensitive: mildly context-sensitive grammars
(MCSG)
needed
• This means in particular: different execution • MCSGs
times
• allow certain kinds of context dependencies
• Generally: The simpler the grammar type, the
• have low computational complexity (they
faster the parsing and generation
have polynomial complexity)
25 26

Parsing algorithms What is the problem?


• In addition to complexity of the grammar, • The simple parsing models have difficulties
there are also different parsing algorithms
• Parallel instead of serial processing: • Serial model requires too much time
If multiple rules apply, investigate all • Parallel models requires too much
possibilities at once. storage
• Problem: • Solution: Often a combination of serial and
A large number of possible analyses must be parallel parsing as well as top-down and
stored (probably exponentially many: length- bottom-up is used
of-sentencesome-constant).
27 28

You might also like