0% found this document useful (0 votes)
555 views16 pages

UNIT - III - Context Free Grammars

The document describes context free grammars (CFGs). It defines the key components of a CFG as a finite set of non-terminal symbols (Vn), terminal symbols (Vt), a start symbol (S), and a finite set of productions (P). Productions have the form A->β where A is a non-terminal and β is a string of terminals and non-terminals. Several examples are provided of designing CFGs to generate specific languages.

Uploaded by

stalin1227
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
555 views16 pages

UNIT - III - Context Free Grammars

The document describes context free grammars (CFGs). It defines the key components of a CFG as a finite set of non-terminal symbols (Vn), terminal symbols (Vt), a start symbol (S), and a finite set of productions (P). Productions have the form A->β where A is a non-terminal and β is a string of terminals and non-terminals. Several examples are provided of designing CFGs to generate specific languages.

Uploaded by

stalin1227
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd

UNIT–III: Context Free Grammars

Formal Languages, Grammars


All we know that grammar is nothing but the set of rules to define valid sentences in
anylanguage,which generates context-freegrammar-language
Context free languages have great practical significance in defining programming language and in
simplifying the translation for programming language.

The original motivation for grammars was the description of natural languages. We can write
rules for the grammar of natural languages as follows:

<sentence> -> <nounpharse><verbpharse>


<nounpharse> -> <article><noun>

According to above set of rules “The girl eats” is valid sentence.

It can be easily observed that,some words in the grammar works as terminator for the sentence,
these symbols are called terminal symbol. Rest of the symbols of the vocabulary are called non-
terminals. So it is clear that terminals and non-terminals form vocabulary for any language.

Let us now formal is ethe idea of agrammar and how it is used.There are four important
components in a grammatical description of a language:

1. There is a finite set of symbols that form the strings of the language being defined.We
call these alphabets terminal symbols, represented by Vt.

2. There is a finite set of variables, also called some times non-terminals or syntactic
categories.Each variable represents a language;i.e.,a set of strings, represented byVn.

3. One of the variables represents the language being defined;it is called the start sym-
bol.Generally it is denoted by S.

4. There is a finiteset of rules or productions that represents the recursive


definitination of a language.Each production consists of:

a) Avariablethatbeing(partially)definedbytheproduction.This
variableisoftencalledheadoftheproduction.

b) The production symbol->

1|Page
c) Astringofzeroormoreterminalsandvariables.Thisstring,
calledthebodyoftheproduction,representsonewaytoform
stringsinthelanguageofthevariableofthehead.So,weleave
terminalsunchangedandsubstituteforeachvariableofthebody
andastringthatknowntobeinthelanguageofthatvariables.

Context Free Grammar


Mathematically Context-Free Grammar is defined as follows:
“AgrammarG=(Vn,Vt,P,S)issaidtobecontext-free” Where
Vn:Afinitesetofnon-terminals,generallyrepresentedbycapitalletters, like A,B,C,D,......
Vt:Afinitesetofterminals,generallyrepresentedbysmallletterslike a,b,c,d,.......
S:Startingnon-terminal,calledstartsymbolofthegrammar.SbelongtoVn. P: Set of rules or
productions in CFG.
G is context-free and all productions in P have the form
a->þ
aϵVn and þϵ(VtUVn)*
where
Every regular grammar is context-free, so a regular language is also a context-
freeone.Itisalreadyprovedbypumpinglemmathatlanguage{anbn/n>0}isnotregular,butitispossibletod
esignacontext-freegrammar for these language. So now it is very clear that “the family of regular
languageisapropersubsetofthefamilyofcontext-freelanguage”

Example: Terminal: a Non-


terminal: S
Production: S->As
S->ε
Is a simple CFG that defines L(G)=a* where Vn={S} Vt={A}

Example:ConsideragrammarG=(Vn,Vt,P,S)whereVn={S},Vt={a,b}andsetof
productionPisgivenbyP={S->aSb,S->ab}.

Solution:Here S is the only non-terminal which is the starting symbol for the
grammar;‘a’and‘b’areterminals.TherearetwoproductionsS->aSb and S->ab .Now we will show
how the strings a2b2 can be derived.
S=>aSb
=>aabb
=>a2b2
Here we need to apply the first production then second production.
By applying first production n-1times,followed by an application of second production, we get
S=>aSb
=>aaSbb
=>:
=>:
=>a Sbn-1
n-1
=>anbn
HerewecansaythatlanguagefortheabovegrammarisL(G)={anbn/n>=1}

Example:FollowingisaCFGforthelanguageL={wcw R/Wϵ(a,b)*}
Solution:LetGbeCFGforlanguageL={wcwR/Wϵ(a,b)*} Here
Vn={S}
Vt={a,b,c}
And P is given by
S->aSa
S->bSb
S->c
LetuscheckthatabbcbbacanbederivedfromtheCFG.
S=>aSa (usethes->aSa)
=>abSba (usetheS->bSb)
=>abbSbba (usetheS->bSb)
=>abbcbba (use theS->c)

Example: write a CFG, which generates palindrome for binary numbers.

Solution:Grammarwillgeneratepalindromeforbinarynumbers,thatis
0,1,00,11,010,11,101,11100111,.....

LetCFGbeG=(Vn,Vt,P,S)Vn=setofnon-terminal={S},Vt=setofterminals{0,1}
andproductionrulePisdefinedas

S->0S0/1S1
S->0/1/ε
Obviouslythisgrammargeneratespalindromeforbinarynumbers,itcanbeseen by the
followingderivation.
S=>0S0 (usetheS->0S0)
=>01S10(usetheS->1S1)
=>010S010 (use theS->0S0)
=>0101010 (use thes->1)

Example: write a CFG for the regular expression


r= 0*1(0+1)*
Solution:Clearlyregularexpressionisthesetofstringswhichstartswith
anynumberof0's,followedbyaoneandendwithanycombinationof0'sand 1's.

LetCFGbeG=(Vn,Vt,P,S)whereVn={S,A,B},Vt={0,1}andsetofproductionPis defined as
S-->A1B
A-->0A|ϵ
B-->0B|1B|ϵ
Letusseethederivationofthestring00101
S=>A1B
=>0A10B
=>00A101B
=>00A101
So clearly G is CFG for regular expression r.

Example: Design CFG for the regular expression


r= (a+b)*aa(a+b)*
Solution:LetCFGbeG=(Vn,Vt,P,S) whereVn={S,T},Vt={a,b}ProductionPare definedas
S->TaaT
T ->aT
T->bT
T->ϵ
Lastthreeproductioni.e.,P2,P3andP4allowsustogenerateanywordwe wantfromterminalT.Ifthenon-
terminalTappearsinanyworkingstring,we can apply productions to turn in to any string.

For example to generate baabaab, we can proceed as follows:


S=>TaaT
=>bTaaT
=>baTaaT
=>baaTaaT
=>baabTaaT
=>baabϵaaT
=>baabaaT
=>baabaabT
=>baabaabϵ
=>baabaab
There are other sequences that can also derive the word baabaab.

Example: Design CFG for theLanguage


L={anbm:n≠m}
Solution:Ifn≠mthenthereareonlytwocasesarepossible.
Case 1: n>m
Let us say the language L on condition n>m is L={anbm:n>m}.
LetussayG1betheCFGforthelanguageL1,G1=(V1n,V1t,P1,S1) where
V1n={S1,A,S1},V1t={a,b}ProductionParedefinedas
S1->AS1
S1->aS1b|ϵ A-
>aA|a
Case 2: n<m
Let us say the language L on condition n>m is L2={anbm:n<m}.
LetussayG2betheCFGforthelanguageL2,G2=(V2n,V2t,P2,S2) where
2 2 2
V n={S2,B,S },V t={a,b}ProductionParedefinedas
S2->S2B
S1->aS2b|ϵ A-
>bB|b
bythehelp(bycombiningG1andG2)ofG1andG2wecanwritetheCFGforL asS-->S1|S2
Where S is the start symbol of CFG for L.

Example: Design CFG for theLanguage


L={0n1n|nS0}U{1n0n|nS0}
Solution:WecanassureLas L =
L1UL2
L1={0n1n|nS0}
Let say G1be CFG for the language L1
G1=(V1n,V1t,P1,S1) whereV1n={S1},V1t={0,1}ProductionParedefinedas
S1-->0S11|ϵ
n n
and L2={1 0 |nS0}
Let say G2be CFG for the language L2

LetsayG2=(V2n,V2t,P2,S2) whereV2n={S2},V2t={0,1}ProductionParedefinedas
S2-->1S20|ϵ
SinceL=L1UL2,supposeGbeCFGforlanguageL,withstartingnon-terminalS. Then productions in G
willbe
S->S1|S2
S1->0S11|ϵ S2-
>1S20|ϵ
Example:DesignCFGforΣ={a,b}thatgeneratesthesetof
a) all strings with exactly onea.
b) allstringswithatleastonea.
c) allstringswithatleastthreea's.
Solution:
a) Let CFG be XG1
G1=(Vn,Vt,P,S) whereVn={S,A},Vt={a,b}ProductionParedefinedas S-->AaA
A-->bA|ϵ
Let us derive a stringbbaab
S=>AaA
=>bAaA
=>bbAaA
=>bbAabA
=>bbAabϵ
howwedon'thaveanychoiceforderivesecondGsostringbbaabcannot be derive from G1
b) Let CFG be XG2
G2=(Vn,Vt,P,S) whereVn={S,A},Vt={a,b}ProductionParedefinedas
S->AaA A-
>aA|bA|ϵ
Letusderivestringbaab
S=>AaA
=>bAaA
=>baAaA
=>baϵAaA
=>baaA
=>baabA
=>baabϵ
=>baab

c) Let CFG be XG3


G3=(Vn,Vt,P,S) whereVn={S,A},Vt={a,b}ProductionParedefinedas
S-->AaAaAaA
A-->aA|bA|ϵ

Example:WriteaCFGforlanguageL={0i1j2k|kSiorkSj}
Solution:LetusassumeL=L1UL2whereL1={0
i j k
1 2 |kSi}
L2={0i1j2k|kSj}
LetusassumethatCFGforL1={0i1j2k|kSi} Let say G1be
CFG for the language L1
G1=(Vn,Vt,P,S) whereVn={S1,X,Y},Vt={0,1}ProductionP1isdefinedas
S1->0S22|0Y2
Y->C|0Y|ϵ C-
>1C|ϵ
Let us assume that CFG for L2is G2
G2=(Vn,Vt,P,S) whereVn={S2,A,B},Vt={0,1}ProductionP2isdefinedas
S2->AB
A->0A|ε
B->1B2|1B|ϵ
WecandefineCFGforL(SinceL=L1UL2) Let us
assume that CFG for L is G
G=(Vn,Vt,P,S) whereVn={S1,S2,A,B,Y},Vt={0,1}ProductionP2isdefinedas
S->S1|S2S1-
>0S22|0Y2
Y->C|0Y|ϵ
C->1C|ϵ
S2->AB
A->0A|ε
B->1B2|1B|ϵ
Example: Write the CFG for the language L = {0i1j2k|i=j or j=k}
Solution:LetusassumeL=L1UL2where
L1={0i1j2k| i=j }
L2={0i1j2k| j=k }
LetusassumethatCFGforL1={0i1j2k|i=j} Let say G1be
CFG for the language L1
G1=(Vn,Vt,P,S) whereVn={S1,A,B},Vt={0,1,2}ProductionP1isdefinedas
S1->AB
A->0A1|ϵ B-
>2B|ϵ
Let us assume that CFG for L2is G2
G2=(Vn,Vt,P,S) whereVn={S2,C,D},Vt={0,1,2}ProductionP2isdefinedas
S2->CD
C->0C|ε B-
>1D2|ϵ
WecandefineCFGforL(SinceL=L1UL2) Let us
assume that CFG for L is G
G=(Vn,Vt,P,S) whereVn={S1,S2,A,B,C,D},Vt={0,1,2}ProductionP2isdefinedas
S->S1|S2S1->AB
A->0A1|
ϵ B-
>2B|ϵ
S2->CD
C->0C|ε B-
>1D2|ϵ
which is required CFG.

Classification of Grammars -- Chomsky Hierarchy Theorem


Types of Grammars – Chomsky Hierarchy
Linguist Noam Chomsky defined a Hierarchy of languages, in terms of complexity. This four-
level hierarchy, called the Chomsky hierarchy,
correspondstofourclassesofmachines.Eachhigherlevelinthehierarchy
incorporatesthelowerlevels,thatis,anythingthatcanbecomputedbya machineatthelowest
levelcanalso becomputedbya achineat thenext highest level.

TheChomskyHierarchyclassifiesgrammarsaccordingtotheformoftheir productions into the


followinglevels:

(a) Type0Grammars–UnrestrictedGrammars(URG):These grammars include


All formal grammer.InURG,alltheproductionsareoftheforma
->þwhereaandþmayhaveanynumberofterminalsandnon-
terminalsi.e.,norestrictionsoneithersideofproductions.Everygrammaris
includedinitifithasatleastonenon-terminalonthelefthand side
(b) Type1Grammars–ContextSensitiveGrammars(CSG):Thesegrammars
definethecontextsensitivelanguages.InCSG,alltheproductions
oftheforma->þwhere│a│S│þ│,aandþmayhaveanynumberof
terminals and non-terminals.

ThesegrammarscanhaverulesoftheformaAþ->aγþwithAasnon-
terminalanda,þandγarestringsofterminals.WecanreplaceAby
γwhereAliesbetweenaandþ.Hencethenamecontextsensitivegrammar.Thestringsa
andþmaybeempty,butγmustbenon-empty.ItcannotincludetheruleS-
>ε.Theselanguagesareexactlyall
languagesthatcanberecognizedbyaLinearBoundAutomata.

(c) Type2Grammars–ContextFreeGrammars(CFG):Thesegrammarsdefine
thecontextfreelanguages.Thesearedefinedbyrulesoftheforma
->þwhere│a│=1andisanonterminalandþisastringofterminals andnon-
terminals.Wecanreplaceabyþregardlessofwhereit
appears.Hencethename,context-freegrammars.Theselanguagesare
exactlythoselanguagesthatcanberecognizedbyanon-deterministic
pushdownautomaton.Contextfreelanguagesdefinethesyntaxofall programming
language.

(d) Type3Grammars–RegularGrammar(RG):Thesegrammarsgeneratethe
regularlanguages.Suchgrammarrestrictsitsrulestoasinglenon- terminal on the left hand
side. The right hand side consists of
eitherasingleterminalorstringofterminalswithsinglenon-
terminalonleftorrightend.HererulescanbeoftheformA->aB|aorA->Ba|a.TheruleS-
>εisalsoallowedhere.These
languagesareexactlythoselanguagesthatcanbedecidedbyafinite
stateautomaton.Thisfamilyofformallanguagescanbeobtainedby
regularexpressionsalso.Regularlanguagesareusedtodefinesearch
patternsandthelexicalstructureofprogramminglanguages.

ThefollowingtablesummariseseachChomsky’sfourtypesofgrammars,theclassoflanguageitgenerates,t
hetypeofautomationthatrecognizesit, andtheformofrulesitmusthave.

Thehierarchyoflanguagesandthemachinethatcanrecoginzethemcanbe shown as below.


Everyrg is contextfree,Every cfg is context sensitive and everycsg is
Unrestricted.Sofinallyofregularlanguagescanberecognizedbyany
Machine.CFL’sarerecognizedbypushdownautomata,Linearboundautomataandunrestricted Languages Are
Recognized Onlybyturing Machine.

Leftmost and RightmostDerivations


Theleftmostnonterminalinaworkingstringisthefirstnon-
terminalsthatweencounterwhenwescanthestringfromleftto right.

Forexample,inthestringbbabXbaYSbXbY,theleftmostnonterminalis X.

DefinitionIfawordwisgeneratedbysCFGbyacertainderivation
andateachstepinthederivation,andaruleofproductionis
appliedtotheleftmostnonterminalintheworkingstring,thenthis
derivationiscalledaleftmostderivation(LMD).

Practically,wheneverwereplacetheleftmostvariablefirstina
string,thentheresultingderivationisleftmostderivation.
Similarly,replacingtherightmostvariablefirstateverystepgives rightmost derivation RMD.

Example:ConsidertheCFG:({S,X},{a,b},P,S)Whereproductionare: S->baXaS|ab
X->Xab|aa
Find LMD and RMD for a string w=baaaababaab.

Solution: The following is aLMD:


S=->baXaS ->baXabaS ->baXababaS ->baaaababaS ->baaaababaab The

following is aRMD:

S=->baXaS->baXaab->baXabaab ->baXababaab ->baaaababaab

Anywordthatcangeneratedbya givenCFGcanhaveLMD|RMD.
Example:ConsidertheCFG:({S,A,B},{a,b},P,S)Whereproductionare: S-> aB|Ba
A->a|aS|bAA
B->b|bS|aBB
Find LMD and RMD for a string w=aabbabba.

Solution: The following is aLMD:


S ->aB->aaBB->aabSB->aabbAB->aabbaB->aabbabS->aabbabbA
->aabbabba

The following is aRMD:


S->aB ->aaBB->aaBbS->aaBbA->aaBbba->aabSbba->aabbAbba
->aaabbabba

Parse Tree (or) DerivationTree:


This derivation process can be shown pictorially as a tree to
illustratehowawordisderivedfromaCFG.Thesetreesarecalled syntax trees, parse tree or derivation
trees. These trees showus clearlyhowthesymbolsofterminalstringaregroupedintosubstring, each
of which belongs to the language of one of the variables of grammar.

For constructing a parse tree for a grammar G = (V,T,P,S)


 Root is startsymbol
 EachinteriornodeismarkedbyvariableinV.
 Eachvariableislabelledbyeitheravariable,a
terminal, or ε.
 IfainteriornodeislabelledA,anditschildrenare
labelled X1,X2,….,Xk is a production in P.
CFG:
Terminals: a,b Non-
Terminal:S,A
Production S->AAA|AA A-
>AA|aA|Ab|a|b
String “baaba” has derivationtree:

Whenweconcatenatetheleavesofanyparsetreefromtheleftweget
stringwhichisknownasyieldofthetree.Theyieldisastringthat
isalwaysderivedfromtherootvariable.Thereareparsetreeswhose
yieldsareinthelanguageoftheunderlyinggrammar.

The importance of such treeis:


 Theyieldisaterminalstring.Allleavesarelabelledwith
terminal or a ε.
 Therootislabelledbystartsymbol.
 Atanyintermediatestepifweconcatenatetheelementswegeta string in sententialform.

Ambiguous Grammar
A CFG is ambiguous if there exists more than one parse tree or
equivalently,morethanoneleftmostderivationandonerightmost derivation for at least one word
in its CFL.

Whenagrammarfailstoprovideuniquestructures,itissometimes
possibletoredesignthegrammartomakethestructureuniqueforeach
stringinthelanguage.Unfortunately,sometimeswecannotdoso.That
is,therearesomeCFL’sthatare“inherentlyambiguous”;everygrammarforthelanguageputsmorethanonestru
ctureforsomestringsinthe language.
Definition:“ACFGiscalledambiguousifforatleastonewordinthe
languagethatitgenerates,therearetwopossiblederivationsofthe word that correspond to different
syntax tress. I f a CFG is not ambiguous, it is calledunambiguous.”

Example:Showthefollowinggrammarisambiguous. E->id|E+E|
E*E|E-E
Solution: LMD: for string id + id * id
E->E+E ->id+E ->id+ E*E ->id+ id*E ->id + id *id
RMD: for string id + id * id
E->E*E ->E*id ->E+E*id ->E+id*id ->id+id*id
ParseTreesrepresentedbyaboveLMDandRMDareasfollows:

Astherearemorethanoneparsetree,grammarisambiguous.

Example:Considerthegrammar‘G’withterminals:aNon-terminals: S
Productions:S->aS|Sa|a
ShowthatGisambiguous.

Solution:Theword‘aa’canbegeneratedbytwodifferenttrees:
Therefore, this grammar isambiguous.

Example:Considerthegrammar‘G’withterminals:a,bNon-terminals:
S,X
Productions:S->aS|aSb|X X->Xa|a
Show that G isambiguous.

Solution:Theword‘aa’hastwodifferentderivationsthatcorrespondto different syntaxtrees.

S->X->Xa->aa S->as->ax->aa

Hence grammar isambiguous.

Example:Considerthegrammar‘G’for‘PALINDROMES’isterminal:a,bNon-terminals: S
Productions: S->aSa|bSb|a|b|ε
Show that G isambiguous.

Solution:Thegrammarcangeneratetheword‘babbab’asfollows:
S->bSb ->baSab->babSbab->babbab

Which has derivationtree:


Since there is only one parse tree, the grammar is unambiguous.

Example:Checkwhetherthegivengrammarisambiguousornot Productions: S-
>iCtS|iCtSeS|a
C->b
Show that G isambiguous.

Solution:Tochecktheambiguitiestakeaninputstring.Letussay
‘ibtibtaea”letusdrawthederivationtrees.

Simplification of Context FreeGrammars


Simplificationofthegrammarinvolvesremovingalltheseunnecessary symbols.
1. Elimination of Uselesssymbols.
2. Elimination of εProduction
3. EliminationofUnitProductionoftheformA->B

Definition: A CFG are not always optimised. That means grammarmay


containsomeextrasymbols(unnecessarysymbols).Thesewillincrease the length of the grammar.
Simplification of the grammar involves removing all these unnecessarysymbols.

ForExample,lookatfollowinggrammar: S->AB A-
>a B->b|C E->c|ε
Here C never defines anyterminals.
EandCdonotappearinanysententialform. E->ε is a null
production
B->C simply replaces B byC

Hereifwesimplifythegrammarasfollows S->AB A->a


B->b

Example:Eliminateuselesssymbolsandproductionsfromthefollowing grammr
S->ABa|BC A->aC|BCC, C->a, B->bcc, D->E, E->d, F->e
Solution:
Step1:Eliminate non-generating symbols: All variables are generating.
Step2:Eliminate of non-reachable varaibles:Draw the dependency graph.
 D,E and F are non-reachable from S.
 After removing uselesssymbols
S->ABa|BC A->aC|BCC, C->a,B->bcc

Example:EliminateuselesssymbolsinG S->AB|CA,
S->BC|AB, A->a,C->aB|b

Solution: Here B is not defined hence useless.


CandAarereachableandaredervingterminals.Henceuseful.So reduced grammar is, one
without useless symbols.
S->CA, A->a,C->b

Example:EliminateuselesssymbolsinG S->aAa,A-
>bBB, B->ab,C->ab

Solution:HereCisuselessasitisnotreachablefromstartsymbol. so reduced grammar is, one


without useless symbols.
S->aAa,A->bBB, B->ab
Example:EliminateuselesssymbolsinG S->aS|A|
BC,A->a, B->aa,C->aCb

Solution:HereCisuselessasitisnotderivinganystring.Bisnot
reachable.Soreducedgrammarisone,withoutuselesssymbols.
S->aS|A, A->a

Elimination of ε Production

You might also like