Applications of Context
Free Grammars
Ca uae)
YRCoie ec]
anh LTT)
ed 2
Applications of Context Free Grammars o)
Parsers nl
Ue Core ede SF SES
Rea treeorTer ey nT
AU Re eS ara)
ACC CCR CR Lr elt) Pee)Context Free Grammars
eo eee On uae
eC aD ee anon
Context free grammar G can be defined by four tuples as:
ra)
ao
G describes the grammar
See ee ete nc
Sheen eee
ree ee
ere assis!
> In CRG, the start symbol is used to derive the string. You
can derive the string by repeatedly replacing a non-
een g eee eter re eee
Pee eee ee ce
eee een
a)
)
}
Pee clad
ae eR een ee teed
head of the production.
Poa mnoes
A string of zero or more terminals and
SM ee snes ee
production, represents one way to form
Poe ata arog
head.Example
ae RCs Rec ee)
ers . .
eT DCs et
cary
coe ete nee
Sis aroun
Soe
> Now check that abbebba string can be derived from the given CFG
coe
eer
een
ee
eet ee eee ee eee ng
SS a ee te toe eeDerivations Using a Grammar
Inferred
200)
00)Leftmost and Rightmost Deri
aR enrages RTD
Reporter enter one eaten eices cinta ose y
PAREN See CO rece Reece a SEY
the current string is replaced with a production rule at each
Ae RUS Ee Rhea ee CoM ns meet
terminal symbol in the current string is replaced with a
Peete eee encom
Pa UD aes eee ORR ee eee on
Ret eee Ace ce mrs Restos Mae Ns
non-terminal symbols are replaced is different.
Pa Ro ee ee ese ein occas erat
ea med Ocue et tee e Geter tet)
Retr tee st Sase et een nots ssnRYailcoash @ deyurely
> Ina Context-Free Grammar (CFG), a sentential form isa string of terminal and
non-terminal symbols that can be generated from the start symbol of the
‘grammar using a sequence of production rule applications.
pe eee eon een ee Cee sv my ed
eee Seer ee enue ence nT
symbols represent the basic elements of the language being generated.
> Sentential forms are generated by repeatedly applying production rules to non-
terminal symbols in the current form, until no non-terminal symbols remain.
> Sentential forms are important in the theory of formal languages and automata,
SOc ace en ne ned eee Ce
> The analysis of sentential forms is also important in CFGs to determine
properties such as ambiguity or whether the CFG generates a regular language.
> Sentential forms can be used to represent intermediate steps in the process of
generating a language from a CFG, and to analyze the structure of the language
eoParse Trees
> In the context of Context-Free Grammars (CFG), a parse tree is a graphical
Seen ace ee ea tee ome cus
> A parse tree isa tree where the internal nodes represent non-terminal symbols
and the leaf nodes represent terminal symbols.
> Each non-terminal node in the parse tree corresponds to a production rule used
in the derivation of the string, and the children of a non-terminal node represent
the symbols generated by applying the corresponding production rule,
> Parse trees are used to visualize the syntactic structure of strings generated by
the grammar, where the root of the tree represents the entire string and the leaf
Fe ee eRe cn
> Parse trees are important in the analysis of CFCs, as they provide a way to
Se ener ete na ec
> Parse trees are also used in parsing algorithms, such as top-down and bottom-up
parsing, to generate the parse tree from an input string,Qc meen sc
> Context-free grammars (CFGs) were originally developed by Noam
Sea et ec eon omen)
practical applications in computer science.
> One practical application of CFGs is in the description of
programming languages and the mechanical generation of parsers
ee aaa Reece een
en a aR oe ey
computer science were put into practice.
> Another application of CFGs is in the development of XML, which
‘uses a Document Type Definition (DTD) that is essentially a CFG to
Cre Seen Casa get
eR oon acy cy
formatting, allowing for more specific and structured interpretations
Cero
> For example, using the "PHONE" tag to surround a sequence of
See a caSen anceBe Vactos a)
Le See Renae Oncor
it into the corresponding Intermediate Representation(IR). And as there are so}
programming languages that cannot be represented by regular expressions alone’
recited
DCB
St oso ene ene co noe
> we must be able to match some left parenthesis against a right parenthesis that appears immediately to its 1
eee eRe eee
> Ifwe eventually eliminate all the parentheses then the string was balanced and if we cannot match parentheses
Pog rece
> Examples of strings of balanced parentheses are ()), 00) (00) and e,, while ( and (() are not.
A grammar Gbal = ({B},{()},P.B) generates all and only the strings
of balanced parentheses, where P consists of the productions.
ere
|b BB |(B)|« | Ee re nee enc crts
Poe eee nics?
Then according to this grammar,(* a)" isa Cd — ee ew Le Pais of Lerimnted
= eo around a balanced string, then the result is
ae Emel
SeesLacan eee ,
et ee eC es
ee et eC
Tee see ee tc
ee eRe
s input
aoa een ey es rs tet
aa Seen ee sre cco
ee et een es
ereeen ts
SO en ee ee eet
corresponds to a C program whose structure is like the
Pee nee
if (Condition
if (Condition) Statements
iF (Condizéon)The YACC ParserGenerator
ee eae EL Ineo
> YACC takes a Context-Free Grammar (CFG) as input to generate a parser function.
> The parser function can create parse trees from source programs that follow the
Ps
a Cr cn Cone
> The action is a fragment of C code executed whenever a node of the parse tree
corresponding to the production and its children is created.
Pa ecco egos
ar een ye EO Ue ey Rae eee eC
CeresAn example of a grammar in the YACC notation
PSS too es
eee eee es ee eno
Perea Sa ot
The list of bodies for a given head ends with a semicolon.
Terminals are enclosed in single quotes, and it is possible to include
several characters within a single pair of quotes.
YACC also allows the definition of symbolic terminals. The
Peete eet Rectan ecstasy
eee tree cen ne een te
the lexical analyzer.
ite eed ei See eRe res
Ree ee eee ees
eee en tS en tc)Markup Languages
Nee en ee gee eR Set
that convey semantic information.
Dane ner a cr aS CS eT coe
eres ee Rte Rc re
eran
‘The structure of HTML can be simplified, and a CFG could be used to
describe legal HTML documents and guide their processing for display on a
monitor or printer. Let us see how with an Example :~
People who die to sin in the fast ane
0) The testContinued ...
eae ea eS ose Ree ae)
Eee od sen ec ee ere eae
SR Ce ee Reon een ec Nac acest a nD
tags. An example of a Text element in Fig (a) is "Moldy bread.”
2. Charis any string consisting of a single character that is legal in HTML text.
Note that blanks are included as characters.
Se Roe Ree a Oa
Cerne em hacer RR cnc eo de
4. Blement is either a Text string, or a pair of matching tags and the document
Pert mC Oia Race
See erS et Seer sO oa omer Pees Ss ese
ee aaContinued
With the conditions we create a CFG that describes as much of the
Pte eg SSeS eet oc eect ee
RO eee Tener eC ee ena ene
Pee eee et ne LL eens
Char Text
fee
) explains that "Text" in HTML can be either an empty string or any
legal character followed by more text, withthe exception of < and >, which
eee ree tree ene ie cened
Element Doe
Line (3) defines a document in HTML as a sequence of zero or more
cee
Oe en eee ee ee cy
ee eee en ert
possibility of additional productions for other tag types in HTML.
Line (6) defines a list tem as the
tag followed by any document.
POR ee eee es
‘And with these productions, we have defined our CFG for the given
Pete tan sneXML and Document Type Definitions,
The role of context-free grammars (CFGs) in markup languages
HTML uses CFGs to describe formatting, while XML uses them to
cer ears
In XML, tags surround phrases that represent elements, ane a document
type definition (DTD) is used to clarify the different kinds of tagsand
structures that can appear between them
IDOCTYPE name-of-D
The DTD language uses a CFG notation to describe variables and list of element
tens)
Element descriptions in a DTD are regular expressions, which can
include other element names, the special term PCDATA, union,
concatenation, and closure operators
of the elenent)
Peer eaten te CNet ee een ec a set cs
The above mentioned points show the importance of CFGs in describing,
en naDocument type definition (DTD)
Oe ce CC ea neue
ctr The form of 2 DTD
De ee SE ey
ee ees ee ea ete
1. Other element names, representing the fact that elements of one type can appear within elements of another type,
eee me oreo eee ere es
2. The special term #PCDATA, standing for any text that does not involve XML. tags.
The allowed operators are:
Deore
eee enters
Peek ent eet es
Seen ae etd
ey
ee eet
> Parentheses may group operators her arguments otherwise te usul precedence ofegular- expression operators ;
applies.Example
eee ee ae ee ee cre eeeT
DTD that they can use to publish. on the Web, descriptions of the various
PCs that they currently sell
Each description of a PC will have a model number, and details about the
features of the model, e.g, the amount of RAM, number and size of disks, and maereoree
ELEMENT PCS (PCe
ELEMENT PC (MODEL, PRICE, PROCE
OEE ene ene at ELEMENT MODEL. (PCDATA)
ELEMENT PRICE. (#PCDATA)
ELEMENT PROCESSOR (HAN.
ELEMENT WAN. (47CDATA):
erent ee ete an ace ELEMENT MODEL (WPODATA)
constituents are other elements for model, price, processor type, and RAM. ELEMENT SPEED (#PODATA)>
Brie ELEMENT RIK (¥PCDATA
The first element is PCS, which is lke the start symbol of a CFG. PC* is
dlofined as zero or more PC entries
ach of these constituents must appear once in that order. The lst ELEMENT DISK (HARDDISK | 0D | O¥D)
eee See ret ete ene renee Peet ELEMENT HARDDISK (WANE, MODEL, SIZE
Rom coer Nn ren ELEMENT SIZE @PCOITH
ELEMENT (SPE:
De eee ee ee ee eee
pone
The DISK entry isthe most complex, A disk can be a hard disk, CD, or
DVD. The element DISK is the “OR of three other elements,
Hard disks have a structure in which the manufacturer, model, and size
eee en ae eee er? 5An example of an XML document that conforms {6 DED PeSpecs
Within the illustrated entry, we can easily see that the model number is}
Pee ee OL Vee cena
256Mb of RAM, a 30.5Gb Maxtor Diamond hard disk, and a 32x CD-ROM
cores
What is important is not that we can read these facts, but that a program could read
the document, and guided by the grammar in the DTD.
Sora ee ee eee ened Tc
ee eee eee et ee ta eee
Pose
Oe ee
HARDDISK
Dase
where the "body" has a closure operator within it. The solution is to replace DISK+ by a
gem ee sec ome ee eee ere oe eed
of the variable Disk. The equivalent productions are thus:
Lear Rar ears nd
Dore eee a aeA general technique for converting a CFG wit
Derr imo ce lnic lad
If the body is the concatenation of elements, then the production is already in the legal form for CFG's, so we do nol
De ee eee oe ge Ry acetone
1. The production is ofthe form A — El, E2, where El andl E2 are expressions permitted in the DTD language. This isthe
concatenation case. Introduce two new variables, B and C, that appear nowhere else in the grammar. Replace A+ El, E2 by the
eres
ret
oa
coR
2. The production is ofthe form A —+ E1 | F2. For this union operator, replace this production by the pair of productions:
reas
ess)
3, The production is ofthe form A — (B1)*- Introduce a new variable B that appears nowhere else, and replace this production by:
rear
ra
peal
Pe eee eee ee ee ee eee
rer
AB
peat
5. The production is of the form A — (E1)?, Replace this production by:
ras
eenA general technique for converting a CFG with
Derm elni cag
Lot us consider how to convert the DTD rule to legal CFG productions.
See gee Rec ee el Sea ee
PROCESSOR, RAM and the second of which is DISK+. If we create variables for these two subexpressions, say A
and B, respectively, then we can use the productions:
ee)
eed
cannes
2. Only the last of these is not in legal form. We introduce another variable C and the productions:
creak
rns
3. In this special case, because the expression that A derives is just a concatenation of variables, and Disk is a single
variable, we actually have no need for the variables A or C. We could use the following productions instead:
Pe — Model Price Processor Ram B
Peo pens