Mapping Visitors' Behavior To Business Goals Through Click Stream Analysis

This document analyzes clickstream data from a website that facilitates online partner matching. The analysis aims to understand user behavior and identify strategies to maximize revenue. Clickstream data from the site over 3 days is preprocessed to identify user sessions and actions. Sessions with 50-100 clicks are analyzed. Analysis of hourly usage and user paths is conducted to understand bounces, engagement, and how users progress toward paying membership. Insights can guide the site to improve profitability and usability.

Vishnuprasad Nagadevara
Jacob Paracka
Sudarshan K T
Indian Institute of Management Bangalore, India

Introduction

The past decade has witnessed tremendous growth in web-based businesses. E-commerce
businesses routinely capture large amounts of data through their web pages, and the analysis of
this data can provide insights into the working of the business and the behavior of its customers.
Every click on the website is recorded and provides important information about the movement
of visitors through the different pages of the website. Using the click stream data, it is thus
possible to trace the path that visitors have taken to reach the end point. The analysis of the
different paths taken by different visitors can provide very valuable insights into behavioral
patterns and help e-commerce sites maximize revenue by guiding potential customers along the
desired path.

What is becoming more important on the internet is the ability of search engines to identify
and list the website. If the search engine finds the right website, it helps the user locate the
right data quickly and easily. A user who cannot easily find the required data on a website is
likely to exit the site, so a poorly organized website is of little value to its users. The secret
of a successful website lies in its ability to cater to user needs and to appear readily in search
engine results. Web analytics retrieves data from websites and plays an important role in
providing insights to understand the behavior of customers, uncover the problems that exist, and
solve them so as to improve the profitability and usability of the website.
Objectives
The objectives of this study are to
a. Explore Web analytics and its usefulness to web based business.
b. Identify the techniques used in click stream analysis.
c. Identify the application of click stream analysis through analyzing click stream data
obtained from a particular website using appropriate click stream analysis techniques.
Methodology
This study analyzes the click stream data obtained from a web site that specializes in an
online information exchange service to facilitate the identification of suitable partners, in India
and other countries. The site has an unusual revenue model. Visitors are allowed to browse the
site and look at the profiles of prospective partners free of charge. They have to become members
by making a one-time payment only when they need to contact the prospective brides or grooms.
Users can search for profiles through advanced search options on various preferences, ranging
from basic details of the preferred partner to lifestyle, career, education, profession etc.
Members can make initial contact with each other through services available via chat, SMS, and
e-mail. Registration and profile creation are free of cost, and users are assured of exclusive
privacy and confidentiality. The website allows users to create their profiles, search for other
profiles, express interest in other profiles and contact others. Registered users can become paid
members, which allows them to contact others, view contact details of other members, write
personalized messages, initiate chats and let other members view their contact details. Paid
memberships are provided for a specified duration.
The click stream data is analyzed to identify different paths taken by the visitors and the
sequence of pages that lead to payment of membership fee. Based on this analysis, specific
strategies are recommended to maximize the revenue for the website.
Preprocessing of Data
Data is obtained from the site in the form of click stream records. Each record describes one
click by a visitor and contains the following details:
Server IP
Client IP
Time stamp with date
Status: HTTP status code
URL requested: has three subfields, namely the request method, the resource requested and the
protocol used
No. of bytes transferred
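The three subfields of the URL requested field can be separated with a few lines of code. The sketch below is only illustrative: it assumes the field looks like a standard HTTP request line (method, resource, protocol separated by spaces), which the data description suggests but does not state. The function name `split_request` and the sample resource are hypothetical.

```python
from typing import NamedTuple


class Request(NamedTuple):
    method: str    # e.g. GET or POST
    resource: str  # path and query string that were requested
    protocol: str  # e.g. HTTP/1.1


def split_request(request: str) -> Request:
    """Split a 'URL requested' field into method, resource and protocol.

    Assumption: the field is a space-separated HTTP request line.
    """
    parts = request.split()
    if len(parts) < 3:
        raise ValueError(f"unexpected request field: {request!r}")
    # First token is the method, last token is the protocol; anything in
    # between is treated as the resource (it may itself contain spaces).
    return Request(parts[0], " ".join(parts[1:-1]), parts[-1])
```

Treating everything between the first and last token as the resource keeps the split robust to resources that contain unescaped spaces.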
The country of origin for a specific request is identified using the IP address. The URL is
used to identify the information or web page browsed by the visitor, and the time stamp of each
click is used to sequence the movement of visitors across different pages in the website.
Identifying a unique user session is an important step in the analysis of click stream data.
Inactivity for more than 30 minutes is considered a break of session. In order to identify
different user sessions, we determine the time between consecutive clicks from a single IP
address. If the time between two hits is more than 30 minutes, the subsequent click is
considered the start of a new session. This is an approximation, since there could be multiple
users accessing the site from the same IP, or the same user accessing it from different IPs.
Since no further identifying data is available, we treat all hits from a unique IP within a
session as belonging to a single user.
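The 30-minute rule described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the input layout (pairs of client IP and time stamp) and the names `SESSION_GAP` and `assign_sessions` are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)  # inactivity threshold from the text


def assign_sessions(clicks):
    """Label each click with a session number.

    clicks: iterable of (client_ip, timestamp) pairs in any order.
    Returns a list of (client_ip, timestamp, session_id) tuples, grouped
    by IP and sorted by time within each IP.
    """
    by_ip = defaultdict(list)
    for ip, ts in clicks:
        by_ip[ip].append(ts)

    labelled = []
    session_id = 0
    for ip in sorted(by_ip):
        previous = None
        for ts in sorted(by_ip[ip]):
            # The first click of an IP, or a gap of more than 30 minutes,
            # starts a new session.
            if previous is None or ts - previous > SESSION_GAP:
                session_id += 1
            labelled.append((ip, ts, session_id))
            previous = ts
    return labelled
```

A gap of exactly 30 minutes does not start a new session here, which follows the "more than 30 minutes" wording.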
Based on the URL structure data derived from navigating through various pages on the website,
we identify key words in the URL string to identify the different actions that the user is
performing. This action is then captured against each click.
The website uses PHP (Hypertext Preprocessor) scripting to create dynamic web pages. All the
actions performed by the users result in a .php file being returned by the server. The name of
the .php file that is returned is indicative of the action taken by the user. This information is
extracted from the URL string and captured in a field called “web action”. All records where no
.php file is returned are ignored.
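One possible way to derive the “web action” field is to take the name of the .php file from the path of the requested resource. The sketch below assumes that the action is the file name without its extension; the function name `web_action` and the sample paths are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Matches the last path component when it ends in ".php".
PHP_FILE = re.compile(r"([^/]+)\.php$", re.IGNORECASE)


def web_action(resource):
    """Return the .php file name (without extension) or None.

    resource: the 'resource requested' part of a click record, possibly
    with a query string. Records that return None would be ignored.
    """
    path = urlsplit(resource).path  # drop the query string and fragment
    match = PHP_FILE.search(path)
    return match.group(1).lower() if match else None
```

For example, a resource such as `/search/viewprofile.php?id=42` would yield `viewprofile`, while a request for an image file would yield `None` and be dropped.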
Once the different sessions are identified and the actions performed in each click are identified,
the information is consolidated to represent the unique characteristics of the session. The
following information is captured for every session.
1. Session ID
2. Client IP
3. Start time of the session
4. End time of the session
5. Duration of the session
6. Country of client IP
7. Number of clicks in the session
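The consolidation of clicks into one record per session can be sketched as below. The input layout, the field names and the `country_of` lookup table are assumptions made for illustration; the study does not describe how the IP-to-country mapping was implemented.

```python
from datetime import datetime


def summarize_sessions(clicks, country_of):
    """Build one summary record per session.

    clicks: iterable of (session_id, client_ip, timestamp) tuples.
    country_of: dict mapping client IP to country (hypothetical lookup).
    Returns a dict keyed by session_id.
    """
    sessions = {}
    for session_id, ip, ts in clicks:
        record = sessions.get(session_id)
        if record is None:
            sessions[session_id] = {
                "session_id": session_id,
                "client_ip": ip,
                "start": ts,
                "end": ts,
                "country": country_of.get(ip),  # None when the IP is unknown
                "clicks": 1,
            }
        else:
            record["start"] = min(record["start"], ts)
            record["end"] = max(record["end"], ts)
            record["clicks"] += 1
    for record in sessions.values():
        record["duration"] = record["end"] - record["start"]
    return sessions
```

With such records in hand, a filter on the `clicks` field is enough to select sessions within a given click range.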
Data is available for 3 days. The volume of data in terms of clicks for each day is fairly large.
In addition, we do not have user profile information, and our user identification is based on
unique IPs. Taking these into consideration, we have to reduce the data that is processed for
this study.
In order to reduce the size of the data, we take only those sessions that contain between 50 and
100 hits. These sessions are collected across the 3 days. A summary of the sample data is
provided below.

Day      Number of sessions    Number of clicks
Day 1    23,440                460,211
Day 2    22,717                453,977
Day 3    24,694                461,518


This consolidated data is then used to analyze user behavior, usage patterns and hot spots in
the website. Data mining tools are applied to this information to reveal useful information that
can support the business objectives.

Bounces are defined as visits to the website where the user has viewed only one or two pages
and then exited. These are users who have probably come to the website by mistake or left the
site early because they did not find what they were looking for. We can identify serious users
on the website by looking at the number of pages viewed by the user. For ease of analysis, we
use the number of clicks as an approximation of the number of pages visited.
The data on the number of clicks per unique IP is extracted to identify the bounces and the
serious users, as shown in Figure 1. This gives an idea of the number of genuine users coming
to the website as against casual visitors.

[Chart: Bouncers and Serious Users. Vertical axis from 0 to 35,000; series "Clicks per IP";
horizontal bins 5, 20, 50, 100, 200, 500, 1000 and More.]

Figure 1: Bouncers and Serious Users – Number of clicks per IP on a single day
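The separation of bounces from serious users can be illustrated with a short sketch. It assumes that the bin labels of Figure 1 are inclusive upper bounds (so that "5" collects IPs with up to 5 clicks) and that a bounce is an IP with at most two clicks, following the definition above. The names `BIN_EDGES`, `histogram` and `bounce_share` are hypothetical.

```python
from bisect import bisect_left
from collections import Counter

BIN_EDGES = [5, 20, 50, 100, 200, 500, 1000]  # bin labels of Figure 1
BOUNCE_MAX_CLICKS = 2  # "one or two pages", with clicks standing in for pages


def clicks_per_ip(client_ips):
    """Count clicks per IP from a sequence with one client IP per click."""
    return Counter(client_ips)


def histogram(counts):
    """Number of IPs per bin; each bin label is an inclusive upper bound."""
    labels = [str(edge) for edge in BIN_EDGES] + ["More"]
    bins = {label: 0 for label in labels}
    for n in counts.values():
        bins[labels[bisect_left(BIN_EDGES, n)]] += 1
    return bins


def bounce_share(counts):
    """Fraction of IPs that viewed at most BOUNCE_MAX_CLICKS pages."""
    bounces = sum(1 for n in counts.values() if n <= BOUNCE_MAX_CLICKS)
    return bounces / len(counts)
```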

Analysis
The data pre-processed as described above is analyzed to identify the patterns in terms of
usage by the hour of the day, country of origin, various actions performed etc., in order to
understand user behavior.

The number of hits to the website is categorized by the time of the day. This provides an
insight into the peak usage hours of the website, as shown in Figure 2. The usage by the hour
of the day can be measured either by the number of clicks per hour or by the number of unique
IPs per hour, as shown in Figure 3.
[Chart: Number of clicks by hour of day. Vertical axis: number of clicks, 0 to 500,000;
horizontal axis: hour of day, 1 to 23.]

Figure 2: Number of clicks by hour of day

[Chart: Number of IPs per hour. Vertical axis: number of IPs, 0 to 14,000; horizontal axis:
hour of day, 1 to 23.]

Figure 3: Number of IPs by hour of day
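Both hourly measures, clicks per hour and unique IPs per hour, can be computed in a single pass over the click records. The sketch below assumes each click is available as a pair of client IP and time stamp; the function name `usage_by_hour` is hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def usage_by_hour(clicks):
    """Return (clicks_per_hour, unique_ips_per_hour).

    clicks: iterable of (client_ip, timestamp) pairs.
    Both results are dicts keyed by hour of day (0 to 23).
    """
    clicks_per_hour = defaultdict(int)
    ips_per_hour = defaultdict(set)
    for ip, ts in clicks:
        clicks_per_hour[ts.hour] += 1
        ips_per_hour[ts.hour].add(ip)  # a set counts each IP once per hour
    unique_ips = {hour: len(ips) for hour, ips in ips_per_hour.items()}
    return dict(clicks_per_hour), unique_ips
```

Comparing the two measures shows whether a busy hour is driven by many visitors or by a few very active ones.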

Usage by country
Another interesting aspect is to look at the countries from where the users are accessing the
website. Figure 4 below provides the information with respect to countries from where users are
accessing the web site. It is not surprising that most of the users originate from India. The site is
specifically customized to cater to the needs of Indians living in India or abroad. The next
largest country of origin is USA followed by UK and Australia. These countries have a fairly
large Indian population and hence contribute to a significant proportion of traffic to the website.
[Chart: usage per hour of day by country. Vertical axis from 0 to 500,000; horizontal axis:
hour of day, 1 to 23; series: US, UK, Singapore, NZ, NULL, India, Europe, Australia,
Asia Pacific.]

Figure 4: Country wise usage per hour of day


Exit points on the website
Analysis of the last action performed in each session provides information on where users exit
the website. Figure 5 gives the summary of the last actions performed by users before ending the
session.

[Chart: Last action performed in a session. Vertical axis from 0 to 7,000; horizontal axis:
web action.]

Figure 5. Last action performed in a session
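The exit point of a session is simply the web action of its latest click. A minimal sketch, assuming each click is available as a (session ID, time stamp, web action) triple; the function name `exit_actions` is hypothetical:

```python
from collections import Counter


def exit_actions(clicks):
    """Count how often each web action is the last one in a session.

    clicks: iterable of (session_id, timestamp, web_action) in any order.
    Timestamps only need to be comparable with each other.
    """
    last = {}
    for session_id, ts, action in clicks:
        # Keep the click with the latest time stamp for every session.
        if session_id not in last or ts >= last[session_id][0]:
            last[session_id] = (ts, action)
    return Counter(action for _, action in last.values())
```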


Analysis based on web pages accessed per session
The data collected was divided into sessions. Based on the pages that are available in the
website and the number of times a particular page was accessed, a data set was created with the
count of pages accessed per session. The data set was created for sessions with up to 100
clicks. Figure 6 below shows the distribution of clicks over the top 20 pages accessed by users
in a day. Viewprofile appears to be the top action that users perform on the site, followed by
sending a contact request and photocheck.

Figure 6. Count of number of pages accessed
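A data set with the count of pages accessed per session, and the ranking of the most accessed pages, can be built along the following lines. The input layout and the function names are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def action_counts_per_session(clicks):
    """Count how often each web action occurs in each session.

    clicks: iterable of (session_id, web_action) pairs.
    Returns a dict mapping session_id to a Counter of actions.
    """
    table = defaultdict(Counter)
    for session_id, action in clicks:
        table[session_id][action] += 1
    return table


def top_actions(table, n=20):
    """Return the n most frequent actions over all sessions."""
    total = Counter()
    for counts in table.values():
        total.update(counts)
    return total.most_common(n)
```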


It can be seen from Figure 6 that the most frequent activity is ViewProfile. The activity of
logging in is much less frequent. This could be because the web site allows users to access and
view different profiles without registering and logging in. Payment is not a very frequent
activity. This is understandable because payment is only a one-time activity, and it is not
even required in order to register on the site. Payment is required only when the user wishes
to make contact with other users. In addition, once the payment is made, members can access and
contact other users and potential partners as many times as needed within the specified time
limit.
Given the revenue model of the site, it is interesting to see how the activity of payment is
connected to other activities. Two different techniques are used to analyze this aspect of the
behaviour. The first is to develop web diagrams and examine the linkages with the activity of
payment. Figure 7 presents the web diagram of activities with a very high frequency (greater
than or equal to 19,000). Only four important activities emerge from this web diagram, namely
view profile, single contact, photo check and show message. These are in addition to the usual
activities such as login, logout etc.

Figure 7. Web Diagram with Frequency ≥ 19,000
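The links in a web diagram can be approximated by counting how often two activities occur together and keeping only the pairs above a frequency threshold. The sketch below assumes that the frequency of a link is the number of sessions containing both activities; the study does not state how its tool computed link frequency, so this is only one plausible reading. The function name `cooccurrence_links` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_links(session_actions, min_count):
    """Return pairs of actions that co-occur in at least min_count sessions.

    session_actions: iterable of action sequences, one per session.
    Returns a dict mapping (action_a, action_b) to the number of sessions
    containing both, with action_a < action_b alphabetically.
    """
    pair_counts = Counter()
    for actions in session_actions:
        # Each pair is counted once per session, however often it repeats.
        for pair in combinations(sorted(set(actions)), 2):
            pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_count}
```

Lowering `min_count` corresponds to generating diagrams with progressively lower frequency thresholds, at which less frequent activities begin to appear.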


Other web diagrams with lower frequencies were also generated. Figure 8 shows the web diagram
with low frequency activities, that is, activities with a frequency of 1000 or more. The payment
activity makes its first appearance in this web diagram. The diagram shows that payment is
linked to very few activities, such as match alert and show message.
In order to ascertain the antecedent activities of payment, an Apriori association analysis is
carried out on the activities. This is similar to market basket analysis, where we determine
whether a particular set of actions will lead to another action. The antecedents in the table
below lead to the consequent with the support level and the confidence mentioned. Some of the
association rules generated are shown in Table 1. The table reveals that the most important
activities that precede payment are member comparison and photo request.
Figure 8. Web diagram with frequency ≥ 1000

Table 1. Association rules

Consequent    Antecedent 1      Antecedent 2      Antecedent 3      Antecedent 4    Support %    Confidence %
Payment = T   Photorequest=T    memcomp=T                                           100          73.1
Payment = T   Country = India   Photorequest=T    memcomp=T                         80           73
Payment = T   Login=T           Photorequest=T    memcomp=T                         60           73
Payment = T   ViewProfile=T     Photorequest=T    memcomp=T                         90           72.8
Payment = T   ViewProfile=T     Login=T           Photorequest=T    memcomp=T       60           72.5
Payment = T   Country = India   ViewProfile=T     Photorequest=T    memcomp=T       70           71.4
Payment = T   mmshowmsg         Photorequest=T    memcomp=T                         50           67.2
Payment = T   ViewProfile=T     mmshowmsg         Photorequest=T    memcomp=T       50           66.4
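The support and confidence of a single rule can be computed directly from the session data. The sketch below is illustrative and does not reproduce the figures in Table 1. The source does not say which definition of support its tool reports, so both common variants are returned: the share of sessions containing all antecedents, and the share containing the antecedents together with the consequent. The function name `rule_metrics` is hypothetical.

```python
def rule_metrics(sessions, antecedents, consequent):
    """Support and confidence of the rule antecedents -> consequent.

    sessions: list of sets, each holding the actions seen in one session.
    antecedents: iterable of actions that must all be present.
    consequent: the single action the rule predicts.
    """
    antecedents = set(antecedents)
    n_total = len(sessions)
    n_antecedent = sum(1 for s in sessions if antecedents <= s)
    n_rule = sum(1 for s in sessions if antecedents <= s and consequent in s)
    return {
        # Share of sessions in which all antecedents occur.
        "antecedent_support": n_antecedent / n_total,
        # Share of sessions in which antecedents and consequent all occur.
        "rule_support": n_rule / n_total,
        # Of the sessions with the antecedents, the share with the consequent.
        "confidence": n_rule / n_antecedent if n_antecedent else 0.0,
    }
```

An Apriori implementation would evaluate these measures for many candidate rules and prune candidates whose support falls below a chosen minimum.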
Summary and Conclusions
The main challenge in click stream analysis is the collection and processing of data. Data has
to be set up in different ways for different types of analysis. The data volume is huge;
therefore, if the analysis is to span longer periods, there should be some mechanism to filter
out the relevant data easily.
The following information was retrieved from the click stream data:
• Usage of the website by time of the day. This helps identify the busy hours, indicates the
server capacity required for the website, and shows when a maintenance window can be
scheduled.
• Usage of the website from different geographic locations. This provides data on the
distribution of users across geographical locations.
• Exit screens provide information on where users exit from the website. This input can help
redesign the website, since it shows which pages are breaking the flow of the user session.
• Most accessed and least accessed pages. This information can be used for variable pricing of
advertisements on the web page. It can also be used for better user interface design and space
utilization, by removing or repositioning the links that are infrequently accessed.
• Associations provide information on unique actions on the website and the sequence in which
the user has performed these actions. This can be used for better user interface design.
• Web diagrams give information on the co-occurrence of actions on the website and their
significance. This also provides inputs for user interface design.