
Compiler Phases and Lexical Analysis

The document discusses the different phases of compilation in detail. It describes lexical analysis, syntactic analysis, semantic analysis, intermediate code generation, code optimization, and code generation phases. The main tasks of each phase are also explained.

Uploaded by

mayur1000.m
Copyright
© All Rights Reserved

Introduction

Compiler: a program that is used to convert a program from a source language into an equivalent machine-level target language.

Source Pgm → Compiler → Target Pgm

Interpreter: executes the program, using one of the following strategies:
1. Execute the source code directly.
2. Translate the source code into some efficient intermediate representation.

Advantages: ease of implementation, portability, fast edit-compile-run cycle.

Language processing system:

source pgm → Preprocessor → modified source pgm → Compiler → target assembly pgm → Assembler → relocatable code → Linker/Loader → target code


* The source program may be divided into modules stored in separate files; collecting all these modules is the job of the "pre-processor". It also expands shorthands (macros).
* The modified pgm is given to the compiler to get the assembly language pgm.
* The assembler processes the assembly language and produces relocatable machine code.
* The linker links all the relocatable code and object files into an executable, and the loader loads it into memory for execution.

The Different Phases of a Compiler

Lexical analysis
This phase of a compiler reads the sequence of characters of the source program and collects them into units called lexemes. For each lexeme it produces a token in the format
<token-name, attribute-value>
token-name: an abstract symbol used to represent a token.
attribute-value: points to an entry in the symbol table.

e.g. position = initial + rate * 60
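The token format above can be illustrated with a small sketch. It assumes three hypothetical token classes (id, number, single-character operators) and a symbol table kept as a simple list; none of these names come from the notes, and characters that match no pattern are silently skipped here.

```python
import re

# Token patterns tried in order; the names are illustrative assumptions.
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("id",     r"[A-Za-z_][A-Za-z0-9_]*"),
    ("op",     r"[=+\-*/]"),
    ("skip",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Return (tokens, symbol_table) for a one-line statement."""
    symbol_table = []   # entry i-1 holds the i-th distinct identifier
    tokens = []
    for m in MASTER.finditer(source):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "skip":
            continue    # white space produces no token
        if kind == "id":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            # attribute value = 1-based position of the symbol table entry
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        elif kind == "number":
            tokens.append(("number", int(lexeme)))
        else:
            tokens.append((lexeme, None))   # operators need no attribute
    return tokens, symbol_table

tokens, table = tokenize("position = initial + rate * 60")
print(tokens)
print(table)
```

Each identifier token carries the position of its symbol table entry, which corresponds to tokens such as <id, 1>, <id, 2> and <id, 3> for position, initial and rate.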

The lexical analyzer may also perform deletion of white space, newlines, comments and tabs.


Semantic analysis
This phase uses the syntax tree and the information in the symbol table to check the program for semantic consistency, e.g. variable declarations and function type declarations.
It also performs type checking.

Intermediate code generation
This phase generates the intermediate code in the form of "3-address code", which is an assembly-level instruction with 3 operands.
It creates temporary names to hold the values computed by the 3-address code.
e.g. t1 = inttofloat(60)   (Refer class notes.)

Code optimization
This phase tries to improve the intermediate code to get faster target code. The optimization should not change the meaning of the original code.
e.g. id1 = id2 + t1


Code generation
It takes its input from the code optimizer and generates the target code in machine language. Registers or memory locations are selected for each variable.
e.g.
LDF R2, id3
MULF R2, R2, #60.0
LDF R1, id2
ADDF R1, R1, R2
STF id1, R1
[Figure: translation of the statement position = initial + rate * 60 through the lexical analyzer, syntax analyzer, semantic analyzer, intermediate code generator, code optimizer and code generator, together with the symbol table.]

Syntax analysis
In this phase the tokens obtained from lexical analysis are used for syntax analysis, which determines the structure of the source program.
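The main task of lexical analysis: read the characters of the source program, group them into lexemes and produce a sequence of tokens as output.
When the lexical analyzer discovers a lexeme that is an identifier, it enters this lexeme into the symbol table along with all the related information.

e.g. <id, ptr> <assign_op> <id, ptr> <mult_op> <const, 2>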

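The interaction between the lexical analyzer and the parser is shown below.

source program → Lexical analyzer → token → Parser → to semantic analysis
(the parser sends getNextToken to the lexical analyzer; both use the Symbol Table)

Whenever the parser calls getNextToken, it is a request to the lexical analyzer. This enables the lexical analyzer to read the input characters until it finds the next token.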

The lexical analyzer consists of 2 phases:
i) Scanning: this removes all white space and comments.
ii) Lexical analysis proper: this produces tokens from the output of the scanner.

Lexical analysis versus parsing
The reasons behind separating lexical analysis from parsing:
1. Simplicity: this reduces the overhead of the parser, which would otherwise have to deal with comments and white space; these are already removed in the lexical phase.
2. Efficiency: the compiler efficiency can be improved by adding some advanced techniques to lexical analysis.
3. Portability: the compiler portability can be enhanced.

Tokens, Patterns and Lexemes

Token: it is a logical unit of the program that encapsulates all the information associated with valid lexemes. It is a pair consisting of
(token-name, attribute-value)
where the attribute value is a pointer to the symbol table.

Pattern: a regular expression, i.e. a rule used to describe the token.
e.g. id = [a-zA-Z][a-zA-Z0-9]*   ... (1)
     num = [0-9]+

Lexeme: a text that matches the pattern and is recognized as a token.
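A minimal sketch of matching lexemes against patterns follows. The two regular expressions are an assumed reading of the id and num patterns in these notes, and classify is a hypothetical helper name.

```python
import re

# Assumed patterns: an identifier is a letter followed by letters or digits,
# a number is one or more digits.
PATTERNS = {
    "id":  re.compile(r"[A-Za-z][A-Za-z0-9]*"),
    "num": re.compile(r"[0-9]+"),
}

def classify(lexeme):
    """Return the name of the first pattern that the whole lexeme matches, else None."""
    for name, pattern in PATTERNS.items():
        if pattern.fullmatch(lexeme):
            return name
    return None

for text in ["t123", "3142", "1Qait"]:
    print(text, "->", classify(text))
```

A text that matches no pattern (the last one here) is not a valid lexeme, which is the situation a scanner reports as an error.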

NOTE: Consider the text
int a;
The scanner classifies the lexemes of the above text as valid by matching them with the patterns. It also determines the category (i.e. keyword or identifier), and these are passed as tokens to the parser.

Consider a text
1Qait
Now the scanner did not find a matching pattern, so this is termed an invalid lexeme.

NOTE: Consider the statement
printf("value = %d", a);
printf and a are lexemes matching the pattern (1), id.

Token        Description                              Sample lexemes
number       Any numeric constant                     3142
id           Letter followed by letters and digits    score, a, i, t123
comparison   Any comparison operator                  <, >, <=, >=, ==, !=
operator     Any operator                             +, -, /, *

Attribute values are required because, when more than one lexeme matches a pattern, the lexical analyzer has to give more information about the lexeme.

e.g. consider the below code,

11  int a, b = 10, c;
12  float d = 50;

Symbol Table:
Symbol   Type    Line no   Initialized
a        int     11        no
b        int     11        yes
c        int     11        no
d        float   12        yes

In the above e.g., the lexical analyzer finds 3 lexemes (a, b and c) which belong to the same category, i.e. identifier.
∴ The lexical analyzer updates the symbol table with information about the 3 lexemes, i.e. their data type, line no, whether each is initialized or not, etc.
Write the token name and attribute value for the below statement.
a = a + b;

Sol:
<if> <open_brace> <id, ptr> <less_than> <id, ptr> <close_brace>
<id, ptr> <assign> <id, ptr> <plus> <id, ptr>

Write the token name and attribute value for the below statement.

Sol:
<id, pointer> <assign_op> <id, pointer> <plus_op> <id, pointer> <mult_op> <number, integer value>

Lexical errors
A character sequence that cannot be scanned into any valid token is called a "lexical error".
e.g.
main()
int ...   (invalid character sequence)
Hence lexical error.

Lexical errors are unpredictable, and they can be handled by the simplest recovery strategy, i.e. panic mode recovery.
Consider
fi (a == f(x)) ...
In this, the lexical analyzer cannot tell whether fi is a misspelling of the keyword if or not; fi is a valid lexeme, so it is treated as an identifier.

Panic mode recovery
Delete successive characters from the remaining input until a valid token is found. It may perform one of the following actions:
1) Delete one character from the remaining input.
2) Insert a missing character.
3) Replace a character by another character.
4) Transpose two adjacent characters.
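The delete-and-retry form of panic mode recovery can be sketched as follows. The token pattern and the function name scan_with_panic_mode are assumptions made for this illustration; only the first recovery action (deleting one character) is shown.

```python
import re

# Assumed token pattern: identifiers, integers and a few one-character symbols.
TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9]*|[0-9]+|[=+\-*/();{}<>]")

def scan_with_panic_mode(text):
    """Return (tokens, errors); errors lists (position, character) pairs that were deleted."""
    tokens, errors = [], []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = TOKEN.match(text, pos)
        if m:
            tokens.append(m.group())
            pos = m.end()
        else:
            # Panic mode: no token starts here, so delete one character and retry.
            errors.append((pos, text[pos]))
            pos += 1
    return tokens, errors

print(scan_with_panic_mode("a = b @# + 1"))
```

Scanning continues after each error, so one bad character does not hide the tokens that follow it.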

Input buffering
The input buffering technique is used to improve the speed of the compiler in reading the program. This technique reduces the overhead required for the compiler to process a single input character.

It involves two buffers of the same size, N bytes each, which are alternately loaded as shown below.

[Figure: buffer pair with the lexemeBegin and forward pointers]

To recognize a token, i.e. a lexeme, two pointers are used:
1. lexemeBegin ptr: it sets the beginning of the current lexeme.
2. forward ptr: it scans the next characters until a pattern is matched.
Once the lexeme is recognized, the forward ptr is set to the character at its right end, i.e. the end of the lexeme just found. Once the token is returned, lexemeBegin is set to the character immediately after the lexeme just found.

The above scheme has to check for the end of the buffer on every advance of forward; this is an overhead for the compiler while reloading the other buffer.
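Sentinels (eof), which indicate the end
* It is a special character added at each buffer end.
* This character cannot be part of the source program.
* If this character (eof) occurs other than at the end of a buffer, it means the input is at an end.
* This is helpful in alternately loading the two buffers with the source program.

Lookahead code with sentinels:
switch (*forward++) {
    case eof:
        if (forward is at the end of the first buffer) {
            reload the second buffer;
            forward = beginning of the second buffer;
        }
        else if (forward is at the end of the second buffer) {
            reload the first buffer;
            forward = beginning of the first buffer;
        }
        else   /* eof within a buffer marks the end of input */
            terminate lexical analysis;
        break;
    cases for the other characters
}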
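The buffer-pair scheme with sentinels can be sketched as below. The class name BufferPair, the use of "\0" as the sentinel and the tiny buffer size are assumptions chosen for illustration; a real scanner would use a much larger N and also track lexemeBegin.

```python
import io

EOF = "\0"   # sentinel character; assumed never to occur in the source text

class BufferPair:
    def __init__(self, stream, n=8):
        self.stream, self.n = stream, n
        # Two halves of n characters, each followed by one sentinel slot.
        self.buf = [EOF] * (2 * n + 2)
        self._load(0)
        self.forward = 0

    def _load(self, half):
        start = half * (self.n + 1)
        chunk = self.stream.read(self.n)
        for i, ch in enumerate(chunk):
            self.buf[start + i] = ch
        # Sentinel goes right after the data: in the end slot if the half is
        # full, earlier if the input ran out.
        self.buf[start + len(chunk)] = EOF

    def next_char(self):
        """Return the next character, or None at the real end of input."""
        while True:
            ch = self.buf[self.forward]
            self.forward += 1
            if ch != EOF:                     # the only test in the common case
                return ch
            if self.forward - 1 == self.n:    # sentinel ending the first half
                self._load(1)                 # forward already points at the second half
            elif self.forward - 1 == 2 * self.n + 1:   # sentinel ending the second half
                self._load(0)
                self.forward = 0
            else:                             # sentinel inside a half: input is exhausted
                self.forward -= 1
                return None

bp = BufferPair(io.StringIO("position = initial + rate * 60"), n=8)
chars = []
while (c := bp.next_char()) is not None:
    chars.append(c)
print("".join(chars))
```

Only one comparison is made per ordinary character; the buffer-end checks run only when a sentinel is actually seen.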

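Regular expressions are an important notation for specifying the patterns of tokens.

Alphabet: a finite set of symbols.
e.g. Σ = {0, 1} is the binary alphabet.

String: a finite sequence of symbols chosen from the alphabet.
e.g. strings over the binary alphabet are 01, 0001, 0101.

Operations
In lexical analysis, the most important operations on languages are union, concatenation and closure.

1. Union: if L and M are languages, then L ∪ M = { s | s is in L or s is in M }.

2. Concatenation: if L and M are languages, then LM = { st | s is in L and t is in M }.

3. Closure: L* is a language over Σ, the set of all strings obtained by concatenating zero or more strings of L, including ε.
L* = L^0 ∪ L^1 ∪ L^2 ∪ ...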
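The three operations can be tried out on small finite languages. This is a sketch with hypothetical helper names; since L* is infinite in general, closure here only enumerates strings up to an assumed length bound max_len.

```python
def union(L, M):
    """Strings that are in L or in M."""
    return L | M

def concat(L, M):
    """Every string of L followed by every string of M."""
    return {s + t for s in L for t in M}

def closure(L, max_len):
    """Strings of L* whose length is at most max_len."""
    result = {""}        # the empty string ε is always in L*
    frontier = {""}
    while frontier:
        # extend the newest strings by one more string of L
        frontier = {s + t for s in frontier for t in L
                    if len(s + t) <= max_len} - result
        result |= frontier
    return result

L, M = {"a", "b"}, {"c"}
print(sorted(union(L, M)))
print(sorted(concat(L, M)))
print(sorted(closure({"0", "1"}, 2)))
```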

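Regular Expression
It is a shorthand notation to describe the pattern of a token.
If R is an R.E, then the language accepted by R is L(R).
* ε is an R.E, and L(ε) = {ε}.
* If a is a symbol in Σ, then a is an R.E and L(a) = {a}.
* If r and s are R.Es, then (r)|(s), (r)(s) and (r)* are R.Es, and the languages accepted by them are L(r) ∪ L(s), L(r)L(s) and (L(r))*.
Algebraic laws: refer Text Book.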
Rejes

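Regular Definition
Let Σ be an alphabet; then a regular definition is a sequence of definitions of the form
d_1 → r_1
d_2 → r_2
...
d_n → r_n
where each d_i is unique (a new symbol not in Σ) and each r_i is a regular expression over Σ ∪ {d_1, ..., d_(i-1)}.

Identifiers can be described by the following regular definitions:
letter → A | B | ... | Z | a | b | ... | z
digit → 0 | 1 | ... | 9
id → letter (letter | digit)*

NOTE: each d_i can depend on all the previous d's. Each d_i cannot depend on itself or on a later d; this is the restriction.
In the above, id depends on the letter and digit productions.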

Regular definition for integers and floating point numbers such as 12.34, 0.05678, 1.56E-5:

digit → 0 | 1 | 2 | ... | 9
digits → digit digit*
optionalFraction → . digits | ε
optionalExponent → ( E ( + | - | ε ) digits ) | ε
number → digits optionalFraction optionalExponent
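The regular definition for numbers can be written as a sketch with Python's re module, building each named piece from the earlier ones just as a regular definition does. The constant names mirror the definition; is_number is a hypothetical helper.

```python
import re

# Each name is defined only in terms of earlier names.
DIGIT = r"[0-9]"
DIGITS = rf"{DIGIT}+"                          # digit digit*
OPTIONAL_FRACTION = rf"(?:\.{DIGITS})?"        # . digits | ε
OPTIONAL_EXPONENT = rf"(?:E[+-]?{DIGITS})?"    # ( E (+|-|ε) digits ) | ε
NUMBER = re.compile(DIGITS + OPTIONAL_FRACTION + OPTIONAL_EXPONENT)

def is_number(text):
    """True if the whole text matches the number definition."""
    return NUMBER.fullmatch(text) is not None

for sample in ["12.34", "0.05678", "1.56E-5", "60", "1.", ".5"]:
    print(sample, is_number(sample))
```

Under this definition at least one digit is needed on both sides of the decimal point, so the last two samples are not numbers.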

a Regulag cejn ol qL - (512) -59 -1234


digit o|| --- 19
digit digi
tigit a
dtgit 3
cg:t 4
Tombu digiti- (dgita')' _ digt3-t cligit4

Ragulat defo abc @ait ac. in

Sol: letea alb


dentjia letent

NOTE: In all the above problems, each production depends only on its previous productions, i.e. it is a regular definition.

Transition diagrams
In lexical analysis, the patterns are converted into flowcharts called transition diagrams.
* It has a collection of nodes called states.
* It has only one initial state and may have many accepting states.
* For each input symbol, it makes a transition from one state to another along a directed edge labelled with the given symbol.

The transition diagram that recognizes the lexemes matching the token relop:
malchrg
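[Figure: transition diagram for relop. From the start state, < followed by = gives return(relop, LE); < followed by > gives return(relop, NE); < followed by any other character gives return(relop, LT) at a state marked *; = gives return(relop, EQ); > followed by = gives return(relop, GE); > followed by any other character gives return(relop, GT) at a state marked *.]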

* (star) at the accepting state indicates that the forward pointer must be retracted to the previous position.

Consider the example a < b:
* By consuming <, it makes a transition from the start state, and the forward ptr is incremented.
* By consuming b, it moves along the "other" edge, and the forward ptr is incremented again.
* It returns the token LT.

Here the lexical analyzer does not treat b as part of the operator; it retracts the forward pointer by one position.
Now the lexical analyzer recognizes b as an identifier by using some other transition diagram.


When one word has more than one meaning, it is called lexical ambiguity.
e.g. the lexical analyzer can treat int either as a keyword or as an identifier.

[Figure: transition diagrams for the keyword int and for identifiers (letter followed by letters or digits)]

Both of the above transition diagrams recognize int and return a token, so the lexical analyzer cannot decide whether int is a keyword or an identifier.

There are two ways to handle this ambiguity:
1. Install the reserved words in the symbol table.
2. Set a higher priority for keywords than for identifiers.

1. First, install all the reserved words in the symbol table, as shown below.

Symbol Table:
Lexeme    Type
do        keyword
double    keyword

When the lexical analyzer finds an identifier, it installs it in the symbol table by calling installID(). This function returns a pointer to the symbol table entry; if this is a new entry, it is installed first. It then calls getToken(), which returns the keyword token, or id if this is an identifier, along with the pointer, to the lexical analyzer.
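The reserved-word approach can be sketched as follows. The class SymbolTable, its dictionary layout and the upper-case keyword token names are assumptions for illustration; install_id and get_token play the roles of the two functions described above.

```python
class SymbolTable:
    def __init__(self, reserved_words):
        # Reserved words are installed first, each with its own token name
        # (hypothetical names such as DO, DOUBLE).
        self.entries = {word: word.upper() for word in reserved_words}

    def install_id(self, lexeme):
        """Enter lexeme as an identifier unless it is already present; return its key."""
        self.entries.setdefault(lexeme, "id")
        return lexeme

    def get_token(self, lexeme):
        """Return the token name recorded for lexeme (a keyword name or 'id')."""
        return self.entries[lexeme]

table = SymbolTable(["do", "double", "int"])
for lexeme in ["int", "count", "double"]:
    ptr = table.install_id(lexeme)
    print(lexeme, "->", table.get_token(ptr))
```

Because the reserved words are already in the table, installing int or double does not overwrite them, and only genuinely new lexemes come back as id.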

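2. Consider the two transition diagrams shown below.

[Figure: TD 1 for the keyword; TD 2 for identifiers, which returns (id, ptr) on "other"]

Both of the above diagrams match the input, so it could be treated as either. But in this approach the lexical analyzer treats it as a keyword, because keywords have higher precedence than identifiers.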

Consider the two transition diagrams below, for Do and Double.

The lexical analyzer can treat the input Double as either Do or Double, because both diagrams reach a final state. Only the longest match is taken, hence Double is considered.
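Architecture of a Transition-Diagram-Based Lexical Analyzer

Consider the transition diagram of the relational operators; its implementation is shown below.

TOKEN getRelop()
{
    TOKEN retToken = new(RELOP);
    while (1) {   /* repeat until a return or a failure occurs */
        switch (state) {
        case 0: c = nextChar();
                if (c == '<') state = 1;
                else if (c == '=') state = 5;
                else if (c == '>') state = 6;
                else fail();   /* lexeme is not a relop */
                break;
        case 1: ...
        ...
        case 5: retToken.attribute = EQ;
                return (retToken);
        ...
        case 8: retract();
                retToken.attribute = GT;
                return (retToken);
        }
    }
}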
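A runnable sketch of the same state machine is given below in Python. The function name get_relop, the (token, new position) return shape and the use of None for failure are assumptions; the accepting states of the diagram are folded into the return statements rather than numbered separately.

```python
def get_relop(text, pos=0):
    """Return (("relop", attribute), new_pos), or None if no relop starts at pos."""
    state, forward = 0, pos
    while True:
        # Past the end of the text, the empty string acts as the 'other' character.
        c = text[forward] if forward < len(text) else ""
        forward += 1
        if state == 0:
            if c == "<":
                state = 1
            elif c == "=":
                return ("relop", "EQ"), forward
            elif c == ">":
                state = 6
            else:
                return None                        # lexeme is not a relop
        elif state == 1:
            if c == "=":
                return ("relop", "LE"), forward
            if c == ">":
                return ("relop", "NE"), forward
            return ("relop", "LT"), forward - 1    # retract one character
        else:                                      # state 6
            if c == "=":
                return ("relop", "GE"), forward
            return ("relop", "GT"), forward - 1    # retract one character

print(get_relop("a<b", 1))
print(get_relop("<="))
```

The two retracting returns correspond to the starred states of the diagram: the character that ended the operator is left in the input for the next token.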
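The transition diagram for unsigned numbers:
[Figure: states connected by digit edges, with "other" edges leading to accepting states]

The transition diagram for whitespace:
[Figure: start state, delim edge, loop on delim, "other" edge to an accepting state]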
From the above we can conclude that a finite automaton has states, input symbols and an initial state.

Finite Automata
The deterministic finite automaton (DFA) is a five-tuple (quintuple)

M = (Q, Σ, δ, q0, F)

where
Q is a finite set of states,
Σ is a finite set of input symbols,
δ is the transition function, δ : Q × Σ → Q,
q0 ∈ Q is the initial state,
F is the set of final or accepting states, where F ⊆ Q (a subset of Q).

Transition table: the conventional tabular representation of the function δ, which takes two arguments (a state and an input symbol) and returns a value (a state).

The transition table of a finite automaton is shown below.

ingot

The deleamioishi Fioit aulomala,

M=(to, 9,2}, {o, 13, 6, o i4!)

6(o, 0) o 6(9,, o) 2 sC2,,0 ) -


6(9,, ) = ,
Aulomala aapt the 49:Ah 10
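A DFA given as a five-tuple can be simulated directly from its transition table. The automaton below is a hypothetical example (binary strings ending in 10), not necessarily the one drawn in these notes; the state names q0, q1, q2 and the helper accepts are assumptions.

```python
# Hypothetical DFA M = (Q, SIGMA, DELTA, START, FINAL) accepting strings that end in "10".
Q = {"q0", "q1", "q2"}
SIGMA = {"0", "1"}
DELTA = {                       # transition table: (state, symbol) -> state
    ("q0", "0"): "q0", ("q0", "1"): "q1",
    ("q1", "0"): "q2", ("q1", "1"): "q1",
    ("q2", "0"): "q0", ("q2", "1"): "q1",
}
START = "q0"
FINAL = {"q2"}

def accepts(word):
    """Run the DFA on word; True if it stops in a final state."""
    state = START
    for symbol in word:
        if symbol not in SIGMA:
            return False        # symbol outside the alphabet
        state = DELTA[(state, symbol)]
    return state in FINAL

for w in ["10", "110", "100", "1010", ""]:
    print(repr(w), accepts(w))
```

Because δ returns exactly one state for every (state, symbol) pair, the simulation needs only a single current state.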
conshclon DEA:

. Consc to alapt

orth o.
Cnsbuet

Londbuct DEA

DFA
aapt
Conshict
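Defn: a non-deterministic finite automaton (NFA) is a 5-tuple
M = (Q, Σ, q0, δ, F), where
Q is a finite set of states,
Σ is a finite set of input symbols,
q0 ∈ Q is the initial state,
F is the set of final states, F ⊆ Q, and
δ is the transition function, which returns a subset of the states Q in the case of an NFA.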


Convert the below NFA to a DFA using subset construction.

Let D = (Q_D, Σ, δ_D, {q0}, F_D) be the DFA and N = (Q_N, Σ, δ_N, q0, F_N) be the NFA such that L(D) = L(N).

Each DFA state is a set of NFA states, and its transitions are the union of the NFA transitions of its members, e.g.
δ_D({q0, q1}, 0) = δ_N(q0, 0) ∪ δ_N(q1, 0)
δ_D({q0, q1}, 1) = δ_N(q0, 1) ∪ δ_N(q1, 1)

Convert NFA to DFA.

Let D = (Q_D, Σ, δ_D, {q0}, F_D) be the DFA and N = (Q_N, Σ, δ_N, q0, F_N) be the NFA such that L(D) = L(N).

δ_D({q0, q2}, 0) = δ_N(q0, 0) ∪ δ_N(q2, 0)
δ_D({q0, q2}, 1) = δ_N(q0, 1) ∪ δ_N(q2, 1)
Automata with Epsilon-Transitions (NFA-ε):

A non-deterministic finite automaton with ε-transitions is a 5-tuple (Q, Σ, q0, F, δ), where
Q is a finite set of states,
Σ is a finite set of input symbols,
q0 ∈ Q is the initial state,
F ⊆ Q is the set of final states, and
δ : Q × (Σ ∪ {ε}) → 2^Q is the transition function, which returns a set of states.

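Epsilon-closure: E(S)

Let M = (Q, Σ, q0, F, δ) be an NFA-ε and let S be any subset of Q. The ε-closure of S is the set E(S) defined as follows:
1. Every element of S is an element of E(S).
2. For any q ∈ E(S), every element of δ(q, ε) is in E(S).
3. Repeat step 2 until no new elements are added to E(S).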
fun as,
The

he yeund
the below NFA-E to

Korshuclo.
1,0

Son
Find the 6- cloue,

8C{903) o2, 3
ghe stast Alali
EC{963) : 1262,3

Slap 3 Suboet Cons lbvchon algo

{262,39623 i962,3

6ol i9%2,} o) =¬6n (9o,0) U Sn (Q,, o))

Solt962,!, ) :¬(6n(%0, 1) U Sn (91,1))

RFA Ls,
Convert the below NFA-ε to a DFA using subset construction.

Step 1: Find the ε-closures.

E({0}) = {0, 1, 2, 4, 7}
E({3}) = {3, 6, 7, 1, 2, 4}
E({5}) = {5, 6, 7, 1, 2, 4}
E({6}) = {6, 7, 1, 2, 4}

Step 2: Find the start state of the DFA.

E({0}) = {0, 1, 2, 4, 7}

Step 3: Apply the subset construction algorithm. The DFA states obtained include

{1, 2, 3, 4, 6, 7, 8} and {1, 2, 4, 5, 6, 7}
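The three steps can be sketched in code. The NFA-ε below is an assumption: it is the standard textbook automaton for (a|b)*abb with states 0 to 10, chosen because its ε-closures agree with the sets listed above; the diagram in these notes may differ. The helper names are hypothetical.

```python
# Assumed NFA-ε for (a|b)*abb: ε-moves, labelled moves and the accepting state.
EPS = {0: {1, 7}, 1: {2, 4}, 3: {6}, 5: {6}, 6: {1, 7}}
MOVE = {(2, "a"): {3}, (4, "b"): {5}, (7, "a"): {8}, (8, "b"): {9}, (9, "b"): {10}}
ACCEPT = {10}

def closure(states):
    """ε-closure of a set of NFA states."""
    result, stack = set(states), list(states)
    while stack:
        for r in EPS.get(stack.pop(), ()):
            if r not in result:
                result.add(r)
                stack.append(r)
    return result

def subset_construction(start, alphabet):
    """Return (start_set, dfa_states, dtran) for the equivalent DFA."""
    start_set = frozenset(closure({start}))
    seen, todo, dtran = {start_set}, [start_set], {}
    while todo:
        S = todo.pop()
        for a in alphabet:
            moved = set().union(*(MOVE.get((q, a), set()) for q in S))
            T = frozenset(closure(moved))
            dtran[(S, a)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    return start_set, seen, dtran

START, STATES, DTRAN = subset_construction(0, "ab")

def dfa_accepts(word):
    """Run the constructed DFA on a word over {a, b}."""
    S = START
    for ch in word:
        S = DTRAN[(S, ch)]
    return bool(S & ACCEPT)

print(len(STATES), sorted(START))
print(dfa_accepts("abb"), dfa_accepts("ab"))
```

Each DFA state is a frozenset of NFA states, so it can be used directly as a dictionary key in the transition table.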
Regular expressions and finite automata
A regular expression over an alphabet describes a regular language, built up using the operations union, concatenation and Kleene closure (star).
Precedence of the operators: Kleene closure is highest, then concatenation, then union.
Problem: convert a regular expression to an NFA-ε.
