05 Context-Free

The document discusses context-free languages, focusing on context-free grammars, parsing, and ambiguity. It defines context-free grammars, provides examples, and explains derivations, including leftmost and rightmost derivations, as well as derivation trees. Additionally, it addresses parsing methods and the challenges of exhaustive search parsing, including efficiency and termination issues.

Context-Free Languages

Week 5

Formal Language and Automata Theory, SeoulTech

1
Topics

• Context-Free Grammars
• Parsing and Ambiguity
• Context-Free Grammars and Programming Languages

2
Context-Free Grammars

3
Context-Free Grammars
Definition 5.1

• A grammar G = (V, T, S, P) is said to be context-free if all productions in P


have the form

• A → x,
• where A ∈ V and x ∈ (V ∪ T)*.
• A language L is said to be context-free iff. there is a context-free grammar G
s.t. L = L(G).

4
Context-Free Grammars
Example 5.1

• The grammar G = ({S}, {a, b}, S, P), with productions


• S → aSa, S → bSb, S → λ.
• is context-free.
• Typical derivation in this grammar is
• S ⇒ aSa ⇒ aaSaa ⇒ aabSbaa ⇒ aabbaa.
• L(G) = {ww^R : w ∈ {a, b}*}.
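As a quick sanity check (not part of the original slides), the derivations of this grammar can be enumerated mechanically, and every terminal string produced has the form ww^R. A minimal Python sketch, assuming we cap the terminal length to keep the search finite:

```python
from collections import deque

def generate(max_len):
    """Enumerate terminal strings of L(G) for S -> aSa | bSb | lambda by
    breadth-first search over sentential forms (an illustrative sketch)."""
    results = set()
    queue = deque(["S"])
    while queue:
        form = queue.popleft()
        if "S" not in form:
            results.add(form)          # no variables left: a terminal string
            continue
        if len(form) - 1 > max_len:    # even S -> lambda is already too long
            continue
        for rhs in ("aSa", "bSb", ""):
            queue.append(form.replace("S", rhs, 1))
    return results

strings = generate(6)
# Every generated string splits into w followed by its reverse
assert all(s[:len(s) // 2] == s[len(s) // 2:][::-1] for s in strings)
```

Since each production adds the same symbol on both ends, the first half of every generated string is the reverse of the second half.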

5
Context-Free Grammars
Example 5.2

• The grammar G, with productions


• S → abB,
• B → bbAa,
• A → aaBb,

• A → λ.
• is context-free.
• L(G) = {ab(bbaa)^n bba(ba)^n : n ≥ 0}.
6
Context-Free Grammars
Example 5.3

• The language L = {a^n b^m : n ≠ m} is context-free.
• To show this, we can produce a context-free grammar. (Def. 5.1)
• If n ≠ m, then either n > m or n < m.
• On top of a string with the same number of a's and b's,
• for n > m, we need to add extra a's on the left, and for n < m, extra b's
on the right.

• We will construct productions for each case, and combine them together.
7
Context-Free Grammars
Example 5.3

• For n > m, • For n < m,


• S → AS1, • S → S2B,
• S1 → aS1b | λ, • S2 → aS2b | λ,
• A → aA | a. • B → bB | b.
• S → AS1 | S1B, S1 → aS1b | λ, A → aA | a, B → bB | b.
• The resulting grammar is context-free, but it is not linear. Can we find a linear
grammar for it? (Exercise 28, Section 5.1)
8
Context-Free Grammars
Example 5.4

• Consider the grammar with productions


• S → aSb | SS | λ.
• This grammar is context-free, but not linear.
• L(G) = {w ∈ {a, b}* : na(w) = nb(w) and na(v) ≥ nb(v),
where v is any prefix of w}. (5.1)

• Replace a, b with parentheses: S → (S) | SS | λ.


• The language contains strings such as (()), ()(), which represent properly nested
parentheses in programming languages.
9
Leftmost and Rightmost Derivations

• For non-linear grammars, a derivation may have sentential forms with more than one
variable.

• e.g.) S ⇒ AB ⇒ aaB ⇒ aab.


• We have a choice of which variable to replace first.
• For instance, the grammar G = ({A, B, S}, {a, b}, S, P) with productions
• S → AB,
• A → aaA, A → λ,
• B → Bb, B → λ.
10
Leftmost and Rightmost Derivations

• (1) S → AB,
• (2) A → aaA, (3) A → λ,
• (4) B → Bb, (5) B → λ.
• The grammar generates the language L(G) = {a^(2n) b^m : n ≥ 0, m ≥ 0}.
• S ⇒ AB ⇒ aaAB ⇒ aaB ⇒ aaBb ⇒ aab (productions 1, 2, 3, 4, 5).
• S ⇒ AB ⇒ ABb ⇒ aaABb ⇒ aaAb ⇒ aab (productions 1, 4, 2, 5, 3).
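The choice of which occurrence of a variable to replace can be made explicit in code. A small Python sketch (an illustration, not from the slides) replays a leftmost and a rightmost derivation of aab using the productions above:

```python
def step(form, var, rhs, leftmost=True):
    """Replace one occurrence of `var` in the sentential form: the
    leftmost occurrence if leftmost=True, otherwise the rightmost."""
    i = form.find(var) if leftmost else form.rfind(var)
    assert i >= 0, f"{var} does not occur in {form}"
    return form[:i] + rhs + form[i + 1:]

# Leftmost derivation of aab (productions 1, 2, 3, 4, 5):
form = "S"
for var, rhs in [("S", "AB"), ("A", "aaA"), ("A", ""), ("B", "Bb"), ("B", "")]:
    form = step(form, var, rhs, leftmost=True)
assert form == "aab"

# A rightmost derivation of the same string:
form = "S"
for var, rhs in [("S", "AB"), ("B", "Bb"), ("B", ""), ("A", "aaA"), ("A", "")]:
    form = step(form, var, rhs, leftmost=False)
assert form == "aab"
```

Both derivations reach the same string; only the order of replacements differs.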
11
Leftmost and Rightmost Derivations
Definition 5.2

• A derivation is said to be leftmost if in each step the leftmost variable in the


sentential form is replaced.

• If in each step the rightmost variable is replaced, we call the derivation


rightmost.

12
Leftmost and Rightmost Derivations
Example 5.5

• Consider the grammar with productions


• S → aAB,
• A → bBb,
• B → A | λ.
• For string abbbb,
• S ⇒ aAB ⇒ abBbB ⇒ abAbB ⇒ abbBbbB ⇒ abbbbB ⇒ abbbb (leftmost).
• S ⇒ aAB ⇒ aA ⇒ abBb ⇒ abAb ⇒ abbBbb ⇒ abbbb (rightmost).
13
Leftmost and Rightmost Derivations
Example 5.5

• Recall that more than one derivation may exist for the same string.
• To complete a derivation, you have to be careful about the choice of
productions.

• S → aAB, A → bBb, B → A | λ.
• For string abbbb, two leftmost derivations
• S ⇒ aAB ⇒ abBbB ⇒ abAbB ⇒ abbBbbB ⇒ abbbbB ⇒ abbbb,
• S ⇒ aAB ⇒ abBbB ⇒ abbB ⇒ abbA ⇒ abbbBb ⇒ abbbb.
14
Derivation Trees Figure 5.1

• A Derivation Tree or Parse Tree.


• A way of showing derivations, independent of the order in which
productions are used.

• It is an ordered tree in which nodes are labeled with the left side of productions,
and in which the children of a node represent its corresponding right sides.

• A → abABc.
• Beginning with the root (start symbol) and ending in leaf nodes (terminals), it
shows how each variable is replaced in the derivation.

15
Derivation Trees
Definition 5.3

• Let G = (V, T, S, P) be a context-free grammar. An ordered tree is a derivation tree for G iff. it has the following
properties.

1. The root is labeled S.

2. Every leaf has a label from T ∪ {λ}.


3. Every interior vertex (a vertex that is not a leaf) has a label from V.

4. If a vertex has label A ∈ V, and its children are labeled (from left to right) a1, a2, ⋯, an, then P must contain
a production of the form A → a1a2⋯an.

5. A leaf labeled λ has no siblings, that is, a vertex with a child labeled λ can have no other children.

• A tree w/ properties 3, 4, 5, but not necessarily 1 and with '2a. Every leaf has a label from V ∪ T ∪ {λ}',
• is said to be a partial derivation tree.
16
Derivation Trees Figure 5.2

Example 5.6

• Consider the grammar G, with productions


• S → aAB,
• A → bBb,
• B → A | λ.
• The yield of a tree is the string of symbols
on leaves from left to right,

• in the order visited during a depth-first traversal,

• taking the leftmost unexplored branch.


Figure 5.3

17
Relation btw. Sentential Forms and Derivation Trees
Theorem 5.1

• Let G = (V, T, S, P) be a context-free grammar.


• Then for every w ∈ L(G), there exists a derivation tree of G whose yield is w.
• Conversely, the yield of any derivation tree is in L(G).
• Also, if tG is any partial derivation tree for G whose root is labeled S, then the
yield of tG is a sentential form of G.

18
Relation btw. Sentential Forms and Derivation Trees
Theorem 5.1

• Proof:
• A sentential form ➞ a partial derivation tree.
• Proof by induction on the number of steps in the derivation.
• (Basis) The claim is true for a one-step derivation.
• S ⇒ u indicates S → u in G.
• We have a partial derivation tree for a production (property 4).

19
Relation btw. Sentential Forms and Derivation Trees
Theorem 5.1

• (Inductive Assumption) For every sentential form with n steps, there is a corresponding partial
derivation tree.

• (Inductive Step) Now consider any w derivable in n + 1 steps. Then


• S ⇒* xAy, x, y ∈ (V ∪ T)*, A ∈ V, in n steps, and
• xAy ⇒ xa1a2⋯am y = w, ai ∈ V ∪ T.
• By the inductive assumption, there is a partial derivation tree with yield xAy.
• Also, there must be a production A → a1a2⋯am.
• Hence we can expand the leaf labeled A to get a partial derivation tree with yield xa1a2⋯am y.
20
Relation btw. Sentential Forms and Derivation Trees
Theorem 5.1

• A partial derivation tree ➞ a sentential form.


• (Basis) A partial derivation tree of height 1.
• There must be a production with the root's label on the left side and all the children (the yield) on
the right side.

• (Inductive Assumption) For every partial derivation tree with height n, there exists a
corresponding sentential form identical to the yield of the tree.

• (Inductive Step) Now think about a partial derivation tree with height n + 1.
• Suppose the yield of this tree is of the form x1x2⋯xia1a2⋯ak y1y2⋯yj, where ai are the
symbols from leaf nodes at level n.

21
Relation btw. Sentential Forms and Derivation Trees
Theorem 5.1

• The yield of this tree is of the form x1x2⋯xia1a2⋯ak y1y2⋯yj, where ai are the symbols from
leaf nodes at level n.

• Now let p1, p2, ⋯, pt be the labels of the parent nodes whose children are the leaves labeled with
the ai's.

• We can replace a1a2⋯ak with p1p2⋯pt, to obtain a yield of a partial derivation tree with height
n.
• By definition, there must be productions which have p's on the left side, and a's on the right
side.

• Applying productions to a sentential form yields another sentential form, which proves the claim
by induction.

22
Parsing and Ambiguity

23
Parsing and Membership

• Given a string w of terminals, decide whether or not w is in L(G).


• If so, can we find a derivation of w?
• An algorithm which answers the first question is a membership algorithm.
• For the second, parsing describes finding a sequence of productions by
which a w ∈ L(G) is derived.

24
Parsing and Membership

• Given a string w in L(G), we can parse it in an obvious fashion:


• Construct all possible (leftmost) derivations of the grammar, and see whether any of
them match w.

• Specifically, we start at round one by looking at all productions of the form


• S→x
• finding all x that can be derived from S in one step.
• If none of these results matches w, we go to the next round, applying all applicable
productions to the leftmost variable of every x.

25
Parsing and Membership

• If w ∈ L(G), then it must have a leftmost derivation of finite length (Exercise


27, Section 5.1).

• Since we have a derivation tree for w, we can always traverse the tree to
obtain a derivation in finitely many steps.

• We will call this exhaustive search parsing or brute force parsing.


• This is a form of top-down parsing, which we can view as the construction of
a derivation tree from the root down.

26
Exhaustive Search Parsing
Example 5.7

• Consider the grammar S → SS | aSb | bSa | λ, and the string w = aabb.


• Is w in the language generated by the grammar?
• Round one gives us S ⇒ SS, S ⇒ aSb, S ⇒ bSa, S ⇒ λ.
• Round two yields the following by replacing the leftmost S.
• S ⇒ SS ⇒ SSS, S ⇒ SS ⇒ aSbS, S ⇒ SS ⇒ bSaS, S ⇒ SS ⇒ S,
• S ⇒ aSb ⇒ aSSb, S ⇒ aSb ⇒ aaSbb, S ⇒ aSb ⇒ abSab, S ⇒ aSb ⇒ ab, ...
• On the third round, we find the actual derivation for the string.
• S ⇒ aSb ⇒ aaSbb ⇒ aabb.
• We found a derivation, hence w is in the language generated by the grammar.
27
Exhaustive Search Parsing

• Exhaustive search parsing has two serious problems.


• First, it is not efficient.
• More importantly, it may never terminate for a string not in a given language.
• e.g.) S → SS | aSb | bSa | λ, w = abb
• The non-termination problem can be resolved if we restrict the form of productions.
• e.g.) S → λ decreases the length of sentential forms in the previous example.
• A production such as S → A leaves the length of the sentential form unchanged.
• We may prohibit such productions in the grammar.
28
Exhaustive Search Parsing
Example 5.8

• The grammar S → SS | aSb | bSa | ab | ba satisfies the given requirements.


• It generates the language in Example 5.7 w/o the empty string.
• Given any w ∈ {a, b}^+, the exhaustive search parsing method will always
terminate in no more than | w | rounds.

• The length of the sentential form grows by at least one symbol in each
round.

• After | w | rounds, we either have a parsing or w ∉ L(G).
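The bounded search over leftmost derivations can be sketched in Python for the grammar of Example 5.8 (the data structures and function name here are illustrative assumptions, not part of the course material):

```python
GRAMMAR = {"S": ["SS", "aSb", "bSa", "ab", "ba"]}  # Example 5.8

def exhaustive_parse(w):
    """Exhaustive (brute-force) top-down search over leftmost derivations.
    With no A -> lambda or A -> B rules every step grows the sentential
    form, so |w| rounds suffice (a sketch)."""
    forms = {"S": ["S"]}  # sentential form -> derivation that reached it
    for _ in range(len(w)):
        next_forms = {}
        for form, hist in forms.items():
            # position of the leftmost variable (uppercase symbol)
            i = next((k for k, c in enumerate(form) if c.isupper()), None)
            if i is None:
                continue  # all-terminal form other than w: dead end
            for rhs in GRAMMAR[form[i]]:
                new = form[:i] + rhs + form[i + 1:]
                if len(new) <= len(w) and new not in next_forms:
                    next_forms[new] = hist + [new]
            if w in next_forms:
                return next_forms[w]  # a successful leftmost derivation
        forms = next_forms
    return None  # no parsing is possible: w is not in L(G)

assert exhaustive_parse("aabb") == ["S", "aSb", "aabb"]
```

Sentential forms longer than | w | are pruned, which is safe precisely because no production can shrink the form later.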

29
Exhaustive Search Parsing
Theorem 5.2

• Suppose that G = (V, T, S, P) is a context-free grammar that does not have


rules of the form

• A → λ or A → B,
• where A, B ∈ V.
• Then the exhaustive search parsing method can be made into an algorithm
that, for any w ∈ Σ*, either produces a parsing of w or tells us that no
parsing is possible.

30
Exhaustive Search Parsing
Theorem 5.2

• Proof:
• For each sentential form, consider both its length and the number of terminal symbols.
• Each step in the derivation increases at least one of these, because there are no rules of the form
A → λ or A → B.
• In a derivation of w, neither the length of a sentential form nor its number of terminal symbols can
exceed | w | ; once either exceeded | w | , it could never be reduced, since we do not have A → λ.
• Hence a derivation cannot involve more than 2 | w | rounds, at which time we either have a successful
parsing or w cannot be generated by the grammar.
• In each step, either we gain one more terminal symbol (at most | w | of them), or the length
increases by one (at most | w | times).

31
Exhaustive Search Parsing
Efficiency

• How many sentential forms are generated?


• We may have a rough upper bound.
• Let's consider leftmost derivations only.
• We can have no more than | P | sentential forms after one round.
• After the next round, no more than | P |^2, and so on.
• Parsing cannot involve more than 2 | w | rounds.
• M = | P | + | P |^2 + ⋯ + | P |^(2|w|) = O( | P |^(2|w|+1) ).
32
Exhaustive Search Parsing
Theorem 5.3

• For every context-free grammar there exists an algorithm that parses any w ∈ L(G) in a
number of steps proportional to | w |^3 (CYK algorithm).

• This is not efficient enough for real use.


• What we want is a linear time parsing method, which is proportional to | w | .
• There is no such method for all context-free languages in general.
• However, we can do better with bottom-up parsing algorithms, for a certain subset of
context-free languages.

• e.g.) an LR parser has O(n) time complexity for parsing, but the details are beyond this
course.
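For reference, the CYK table-filling idea can be sketched in a few lines of Python. CYK requires a grammar in Chomsky normal form; the grammar below is our own hand-converted normal form of S → aSb | SS | λ from Example 5.4 (the conversion and all variable names are assumptions, not from the slides):

```python
# CNF rules deriving the nonempty strings of Example 5.4:
# S -> AX | AB | SS, X -> SB, A -> a, B -> b
TERM = {"a": {"A"}, "b": {"B"}}
BIN = {("A", "X"): {"S"}, ("A", "B"): {"S"},
       ("S", "S"): {"S"}, ("S", "B"): {"X"}}

def cyk(w):
    """CYK membership test: fill a table over all substrings, O(|w|^3)."""
    n = len(w)
    if n == 0:
        return True  # lambda is in the original language of Example 5.4
    # table[i][l] = set of variables deriving the substring w[i:i+l]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, c in enumerate(w):
        table[i][1] = set(TERM.get(c, set()))
    for l in range(2, n + 1):          # substring length
        for i in range(n - l + 1):     # start position
            for k in range(1, l):      # split point inside the substring
                for left in table[i][k]:
                    for right in table[i + k][l - k]:
                        table[i][l] |= BIN.get((left, right), set())
    return "S" in table[0][n]

assert cyk("aabb") and cyk("abab") and not cyk("abb")
```

The three nested loops over length, start, and split point are where the | w |^3 bound comes from.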
33
Simple Grammars
Definition 5.4

• A context-free grammar G = (V, T, S, P) is said to be a simple grammar or


s-grammar if all its productions are of the form

• A → ax,
• where A ∈ V, a ∈ T, x ∈ V*, and any pair (A, a) occurs at most once in
P.
• Basically it indicates that for each (variable, terminal) pair there is at most one
applicable production.

• Also, the condition is quite similar to that of a linear grammar.


34
Simple Grammars
Example 5.9

• The grammar
• S → aS | bSS | c
• is an s-grammar.
• The grammar
• S → aS | bSS | aSS | c
• is not an s-grammar.
• The pair (S, a) occurs twice in S → aS and S → aSS.
35
Simple Grammars

• If G is an s-grammar, then any string w in L(G) can be parsed with an effort proportional to | w | .
• Consider the string w = a1a2⋯an, parsing it with exhaustive search method.
• We have only one choice for (S, a1), hence a leftmost derivation must begin with
• S ⇒ a1A1A2⋯Am.
• Next, we replace A1, and again since we have only one choice for A1,
• S ⇒* a1a2B1B2⋯A2⋯Am.
• From these, we can see that each step produces only one terminal symbol and the whole process
must be completed in no more than | w | steps.
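This deterministic procedure is easy to sketch for the s-grammar of Example 5.9 (a Python illustration; the slides do not prescribe an implementation):

```python
RULES = {("S", "a"): "aS", ("S", "b"): "bSS", ("S", "c"): "c"}  # Example 5.9

def s_parse(w):
    """Deterministic parse for an s-grammar: the pair (leftmost variable,
    next input symbol) selects at most one production, so each input
    symbol costs one step and the effort is proportional to |w|."""
    stack = ["S"]                      # variables of the sentential form
    for c in w:
        if not stack:
            return False               # input remains after derivation ended
        rhs = RULES.get((stack.pop(0), c))
        if rhs is None:
            return False               # no production for this (variable, symbol)
        stack = list(rhs[1:]) + stack  # push the variables after the terminal
    return not stack                   # accept iff every variable was consumed

assert s_parse("abcc") and s_parse("c") and not s_parse("ab")
```

Because every right side starts with exactly one terminal and continues with variables only, each loop iteration consumes one input symbol, giving the | w | bound stated above.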

36
Ambiguity
Definition 5.5

• A context-free grammar G is said to be ambiguous if there exists some


w ∈ L(G) that has at least two distinct derivation trees.
• Alternatively, ambiguity implies the existence of two or more leftmost or
rightmost derivations.

37
Ambiguity
Example 5.10

• The grammar in Example 5.4, with productions S → aSb | SS | λ, is


ambiguous.

• The sentence aabb has the two derivation trees shown in Figure 5.4.
Figure 5.4

38
Ambiguity
Example 5.11

• Consider the grammar G = (V, T, E, P) with


• V = {E, I}, T = {a, b, c, + , * , (, )},
• and productions
• E → I,
• E → E + E | E * E | (E),
• I → a | b | c.

39
Ambiguity
Example 5.11

• The strings (a + b) * c and a * b + c are in L(G).
• The grammar generates a
restricted subset of arithmetic
expressions in PLs.

• The grammar is ambiguous.


• We have two derivation trees
for a + b * c.
Figure 5.5

40
Resolving Ambiguity

• To resolve ambiguity, programming languages often employ precedence rules.


• e.g.) * before +.
• However, this solution is completely outside of the grammar.
• Syntactic analysis doesn't provide the answer.
• We can rewrite the grammar,
• so that only one parse tree is possible.

41
Resolving Ambiguity
Example 5.12

• To rewrite the grammar in Example 5.11, we introduce new variables,


• taking V as {E, T, F, I},
• and replacing the productions with
• E → T | E + T,
• T → F | T * F,
• F → I | (E),
• I → a | b | c.
42
Resolving Ambiguity
Example 5.12

• E → T | E + T,
• T → F | T * F,
• F → I | (E),
• I → a | b | c.
• The grammar is unambiguous, and equivalent to
the grammar in Example 5.11.
Figure 5.6
• However, it is not easy to show that two context-
free grammars are equivalent in general.
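The unique parse can be demonstrated with a small recursive-descent parser for this grammar (a sketch: the left-recursive rules E → E + T and T → T * F are handled by iteration, a standard transformation not shown in the slides):

```python
def parse_expr(s):
    """Recursive-descent sketch for the unambiguous grammar of Example 5.12.
    Returns a nested tuple showing the single possible parse."""
    pos = 0

    def peek():
        return s[pos] if pos < len(s) else None

    def expr():                        # E -> T | E + T
        nonlocal pos
        node = term()
        while peek() == "+":
            pos += 1
            node = ("+", node, term())
        return node

    def term():                        # T -> F | T * F
        nonlocal pos
        node = factor()
        while peek() == "*":
            pos += 1
            node = ("*", node, factor())
        return node

    def factor():                      # F -> I | (E)
        nonlocal pos
        if peek() == "(":
            pos += 1
            node = expr()
            assert peek() == ")", "unbalanced parentheses"
            pos += 1
            return node
        assert peek() in ("a", "b", "c"), "expected an identifier"
        node = peek()                  # I -> a | b | c
        pos += 1
        return node

    tree = expr()
    assert pos == len(s), "trailing input"
    return tree

# * binds tighter than +, so only one tree exists for a + b * c:
assert parse_expr("a+b*c") == ("+", "a", ("*", "b", "c"))
```

Placing * in T (below + in E) is exactly what encodes the precedence rule inside the grammar itself.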
43
Ambiguous and Unambiguous Languages
Definition 5.6

• If L is a context-free language for which there exists an unambiguous


grammar, then L is said to be unambiguous.

• If every grammar that generates L is ambiguous, then the language is called


inherently ambiguous.

44
Inherently Ambiguous Languages
Example 5.13

• The language L = {a^n b^n c^m} ∪ {a^n b^m c^m},
• with n and m nonnegative, is an inherently ambiguous context-free language.
• It's easy to show that L is context-free.
• L = L1 ∪ L2, where L1 is generated by
• S1 → S1c | A, A → aAb | λ, and
• L2 is generated by
• S2 → aS2 | B, B → bBc | λ.
• Then we can add one more production S → S1 | S2 to combine them together.
45
Inherently Ambiguous Languages
Example 5.13

• The grammar is ambiguous.


• The string a^n b^n c^n has two distinct derivations.
• One starting with S ⇒ S1, the other with S ⇒ S2.
• L = {a^n b^n c^m} ∪ {a^n b^m c^m}.
• It does not imply that L is inherently ambiguous, since there might be an unambiguous
grammar for it.

• However, there are conflicting requirements on a's and b's in L1, and b's and c's in L2.
• It's impossible to satisfy both with a single set of rules that cover n = m uniquely.
46
Context-Free Grammars and
Programming Languages

47
Context-Free Grammars and PLs

• One of the most important uses of formal language theory is in


• the definition of programming languages, and
• the construction of interpreters and compilers for them.
• We define a programming language precisely,
• and use it to write efficient, reliable translation programs.

48
Backus Naur Form

• Backus Naur Form (BNF) is a convention for specifying grammars for


programming languages.

• Variables are enclosed in < >.


• Terminal symbols are written w/o special marking.
• ::= for →.
• <expression> ::= <term> | <expression> + <term>,

• <term> ::= <factor> | <term> * <factor>, and so on.

49
Context-Free Grammars and PLs

• Many parts of C-like programming languages are susceptible to definition by restricted


forms of context-free grammars.

• <while_statement> ::= while <expression><statement>.

• This form is similar to an s-grammar, which makes it easier to parse.


• However, not all features of PLs can be expressed by an s-grammar.
• Those aspects which can be modeled by a context-free grammar are usually referred to as the
syntax of a programming language.

• https://docs.python.org/3/reference/grammar.html
• https://spec.dart.dev/DartLangSpecDraft.pdf
50
Summary

• You should remember what a context-free grammar/language and an s-grammar are,


• and be able to distinguish such grammars and languages.
• You need to get familiar with BNF, and how you can represent grammars with it.
• You should understand the meaning of ambiguous, unambiguous, and inherently
ambiguous.

• What is parsing and leftmost/rightmost derivation.


• Exhaustive search method
• How to apply and what are the issues?
51
