Apache HTTP Server Log Analysis
Business Analytics Using R Project
Business Analytics Using R Project Apache Log Analysis
Objectives
This program enables the participants to review the learnings of the
Business Analytics Using R Workshop.
The primary objective of the project is to enhance the participants'
knowledge of R & develop exploratory analysis & visualization skills.
Procedure
View Apache Sample Log
Refer apache_sample.pdf
Understand Apache Logs
Refer apache_desc.pdf
Source: http://httpd.apache.org/docs/2.2/logs.html
Use Dataset as given
Refer section Apache Data Sets
Parse & Analyze
Refer section Procedure
Analytics Requirement
Refer section Analytics Requirement
Generate Project Report
Refer section Project Report
Apache Data Sets
apache_http.log - small apache log to create your prototype
usask_access_log.gz - compressed file containing
"UofS_access_log"; an apache log of approx 233 MB
Note:
"UofS_access_log" to be renamed as "apache_dataset.log"
Site: Web logs from NASA Web Site
Source: http://ita.ee.lbl.gov/html/contrib/NASA-HTTP.html
Procedure
Copy log_file to a working directory of your choice
Parse log file & read into data frame
Store csv format data in a working directory
The csv file should have
date field in yyyy-mm-dd format (time zone to be ignored)
time field in hh:mm:ss format (time zone to be ignored)
protocol, page visited & http-version should be separate columns
Provide analysis results as per section Analytics Requirement below
Copy results to local file system or local MySQL as per requirement.
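The parsing steps above can be sketched in base R as follows. This is a minimal sketch, not a reference solution: it assumes the log follows the Apache Common Log Format, it interprets "protocol" as the request method (for example GET), and the function name parse_apache_log, the column names, the output file name and the sample lines are all made up for illustration. The month is converted through the built-in month.abb constant so the result does not depend on the locale.

```r
# Sketch only: assumes Common Log Format lines of the form
#   host ident user [dd/Mon/yyyy:hh:mm:ss zone] "METHOD page HTTP/x.y" status bytes
# parse_apache_log and all column names are hypothetical.
parse_apache_log <- function(lines) {
  pattern <- paste0(
    '^(\\S+) \\S+ \\S+ ',                                              # host, ident, user
    '\\[(\\d{2})/(\\w{3})/(\\d{4}):(\\d{2}:\\d{2}:\\d{2})[^\\]]*\\] ', # day, month, year, time; zone dropped
    '"(\\S+) (\\S+) ?(\\S*)" ',                                        # method, page, HTTP version
    '(\\d{3}) (\\S+)$'                                                 # status, bytes
  )
  m  <- regmatches(lines, regexec(pattern, lines, perl = TRUE))
  ok <- lengths(m) == 11              # full match plus 10 capture groups
  if (!any(ok)) return(data.frame())  # nothing parseable
  parts <- do.call(rbind, m[ok])      # malformed lines are silently dropped
  bytes <- suppressWarnings(as.numeric(parts[, 11]))
  bytes[is.na(bytes)] <- 0            # assumption: "-" means no bytes were sent
  data.frame(
    host         = parts[, 2],
    date         = sprintf("%s-%02d-%s", parts[, 5],
                           match(parts[, 4], month.abb), parts[, 3]),  # yyyy-mm-dd
    time         = parts[, 6],                                         # hh:mm:ss
    protocol     = parts[, 7],
    page         = parts[, 8],
    http_version = parts[, 9],
    status       = as.integer(parts[, 10]),
    bytes        = bytes,
    stringsAsFactors = FALSE
  )
}

# Made-up sample lines; for the real data use
#   lines <- readLines("apache_dataset.log")
sample_lines <- c(
  'host1.example.com - - [01/Jul/1995:00:00:01 -0400] "GET /index.html HTTP/1.0" 200 1024',
  'host2.example.com - - [02/Aug/1995:13:45:10 -0400] "GET /images/logo.gif HTTP/1.0" 304 -',
  'this is not a log line'
)
log_df <- parse_apache_log(sample_lines)

# Written to a temporary directory here; replace with your working directory.
csv_path <- file.path(tempdir(), "apache_log.csv")
write.csv(log_df, csv_path, row.names = FALSE)
head(log_df, 10)
```

For the 233 MB data set, reading the whole file with readLines may need a fair amount of memory; reading it in chunks from a file connection is one alternative.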
Analytics Requirement
Required As Data Frame
For each month, how many times has each individual host connected
to our server? Store data month wise & highest count first.
For each month, how many times has each individual page been
requested from our server? Store data month wise & highest count
first.
For each month, how much data has been downloaded by each
individual host that has connected to our server? Store data month
wise & by highest count first.
How much data was sent out as each individual page was
downloaded from our server? Store data month wise & by highest
count first.
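One way to build the four data frames above is a single helper that sums a value per month and per key, then sorts month wise with the highest total first. The sketch below is only an illustration under stated assumptions: the rows are made up, the column names (host, page, date, bytes) are hypothetical, and monthly_total is an invented helper, not part of any package.

```r
# Made-up rows standing in for the parsed log; column names are assumptions.
log_df <- data.frame(
  host  = c("a.example.com", "b.example.com", "a.example.com",
            "a.example.com", "b.example.com"),
  page  = c("/index.html", "/index.html", "/about.html",
            "/index.html", "/about.html"),
  date  = c("1995-07-01", "1995-07-02", "1995-07-02",
            "1995-08-01", "1995-08-03"),
  bytes = c(1024, 1024, 2048, 1024, 2048),
  stringsAsFactors = FALSE
)
log_df$month <- substr(log_df$date, 1, 7)  # "yyyy-mm"

# Hypothetical helper: sum `value` per month and per `key` column,
# then order by month (ascending) and total (descending).
monthly_total <- function(df, key, value) {
  out <- aggregate(value, by = list(month = df$month, key = df[[key]]), FUN = sum)
  names(out) <- c("month", key, "total")
  out <- out[order(out$month, -out$total), ]
  rownames(out) <- NULL
  out
}

ones <- rep(1L, nrow(log_df))  # one hit per log line

hits_by_host  <- monthly_total(log_df, "host", ones)          # connections per host
hits_by_page  <- monthly_total(log_df, "page", ones)          # requests per page
bytes_by_host <- monthly_total(log_df, "host", log_df$bytes)  # data downloaded per host
bytes_by_page <- monthly_total(log_df, "page", log_df$bytes)  # data sent per page

head(hits_by_host, 10)
```

Counting and summing share one code path here because a count is just a sum of ones; separate calls to table() and tapply() would work equally well.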
Required As Visualization
For each data set generated above, prepare suitable visualization
(giving reason why the graph is chosen). Limit data to suitable
significant number if graph is looking too cluttered.
Time Series Graph for total hits per day with each month being
shown as separate line.
Time Series Graph for total download size per day with each month
being shown as separate line.
How would you show top 10 most popular pages per day as Time
Series Graph with each month being shown as separate line. Explain
how you find top 10 most popular pages
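A possible shape for the "hits per day, one line per month" graph is sketched below with base R graphics. It is only an illustration: the dates are made up, plot_daily_by_month is an invented helper name, and putting the day of month on the x axis so that months overlay each other is one interpretation of the requirement. The plot is written to a PDF in a temporary directory so the sketch also runs non-interactively.

```r
# Made-up request dates standing in for the date column of the parsed log.
dates <- as.Date(c("1995-07-01", "1995-07-01", "1995-07-02",
                   "1995-08-01", "1995-08-02", "1995-08-02", "1995-08-02"))
day   <- as.integer(format(dates, "%d"))  # day of month, shared x axis
month <- format(dates, "%Y-%m")           # one line per month

# Rows: day of month, columns: month, cells: number of hits.
hits <- unclass(table(day, month))

# Hypothetical helper: draw one line per column of `mat`.
plot_daily_by_month <- function(mat, ylab) {
  days <- as.integer(rownames(mat))
  cols <- seq_len(ncol(mat))
  matplot(days, mat, type = "l", lty = 1, col = cols,
          xlab = "Day of month", ylab = ylab)
  legend("topright", legend = colnames(mat), lty = 1, col = cols)
}

out_pdf <- file.path(tempdir(), "hits_per_day.pdf")
pdf(out_pdf)
plot_daily_by_month(hits, "Total hits")
dev.off()
```

The same helper could be reused for total download size per day by filling the matrix with summed bytes instead of counts.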
Answers Required
Using the above results and also carrying out any other analysis as
may be required, provide answers to the following questions.
Which host has connected the maximum number of times to our
server? Give the host name & count of connections from that host.
Which page has been requested the maximum number of times
from our server? Give the page name & count of the times the page
was requested.
How many unique hosts have connected to our server? Give counts.
How many unique pages have been requested from our server? Give
counts.
Which host has caused maximum data transfer from our server?
Give host name & the data transfer for the host.
Which page has caused maximum data transfer from our server?
Give page name & the data transfer for the page.
Which page has maximum download size from our server? Give page
name & the size for the page.
What is the download count of the page that has maximum download
size from our server? Give page name & download count.
Which page has minimum download size from our server? Give page
name & the size for the page.
What is the download count of the page that has minimum download size
from our server? Give page name & the size for the page.
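Most of the questions above reduce to a table() or tapply() call followed by which.max() or which.min(). The sketch below shows the pattern on made-up rows with hypothetical column names (host, page, bytes). It also assumes that "download size" of a page means the largest single response recorded for that page; other readings are possible.

```r
# Made-up rows standing in for the parsed log; column names are assumptions.
log_df <- data.frame(
  host  = c("a.example.com", "b.example.com", "a.example.com",
            "a.example.com", "b.example.com"),
  page  = c("/index.html", "/index.html", "/about.html",
            "/index.html", "/big.zip"),
  bytes = c(1024, 1024, 2048, 1024, 50000),
  stringsAsFactors = FALSE
)

# Host with the maximum number of connections, and its count.
host_hits     <- sort(table(log_df$host), decreasing = TRUE)
top_host      <- names(host_hits)[1]
top_host_hits <- host_hits[[1]]

# Number of unique hosts and unique pages.
n_hosts <- length(unique(log_df$host))
n_pages <- length(unique(log_df$page))

# Host and page that caused the maximum data transfer.
bytes_by_host  <- tapply(log_df$bytes, log_df$host, sum)
bytes_by_page  <- tapply(log_df$bytes, log_df$page, sum)
top_bytes_host <- names(which.max(bytes_by_host))
top_bytes_page <- names(which.max(bytes_by_page))

# Assumption: a page's download size is its largest single response.
size_by_page  <- tapply(log_df$bytes, log_df$page, max)
biggest_page  <- names(which.max(size_by_page))
smallest_page <- names(which.min(size_by_page))

# Download counts of the largest and smallest pages.
biggest_page_hits  <- sum(log_df$page == biggest_page)
smallest_page_hits <- sum(log_df$page == smallest_page)
```

When several hosts or pages tie, which.max() and which.min() return only the first one, so ties may need to be reported separately.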
Project Report Using RMD
Project Overview
Commands / Code Section
Results Section
Summary - How you used R for Data Analytics
Project Overview
Brief Overview Of The Project
Learning Objective
Commands / Code Section
Should contain all R commands used to transfer logs to data-frames
and sample data-frame using head(data-frame, 10)
Should contain all R commands used for visualization of data-frames
and the output as desired.
Should contain all R commands used to answer the specific queries
raised in problem definition along with the output as desired.
Summary
Describe your experience of using R for Data Analytics.