Web Usage Mining
The document discusses the three categories of web mining: Web Content Mining, Web Structure Mining, and Web Usage Mining, each focusing on different aspects of knowledge discovery from web data. Web Content Mining extracts useful information from web page content, Web Structure Mining analyzes hyperlink structures, and Web Usage Mining predicts user behavior based on interaction patterns. The document also highlights challenges, techniques, advantages, and ethical concerns associated with web mining.
Web Content vs Web Structure vs Web Usage Mining - Javatpoint
Difference between Web Content, Web Structure, and Web Usage Mining
Web mining is the application of data mining techniques to extract knowledge from web
data, including web documents, hyperlinks between documents, usage logs of websites,
etc. Like classic data mining, web mining aims to discover and retrieve useful and
interesting patterns from large data sets; here, the data sets are big data drawn from the
web. Web data includes information, documents, structure, and profiles. Web mining is
based on two concepts: process-based and data-driven. In general, web mining
typically involves several steps, such as collecting data, selecting the data before processing,
knowledge discovery, and analysis.
The internet has become a crucial part of our lives nowadays, [...]
extract data on the web are an interesting area of research. [...]
extract knowledge from Web data, in which at least one [...]
data is used in the mining process (with or without other [...]). [...]
mining tasks can be classified into three categories:
1. Web content mining
2. Web structure mining
3. Web usage mining
All three categories focus on the process of discovering implicit, previously
unknown, and potentially useful information from the web. Each of them focuses on
different mining objects of the web. Let's look at all three categories briefly for a better
understanding.
What is Web Content Mining?
Web content mining is the mining of useful data, information, and
knowledge from web page content. It scans and mines the text, images, and groups of
web pages according to the content of the input query, as search engines do when they
display a list of results.
Web content mining is also quite different from data mining because web data are mainly
semi-structured or unstructured, while data mining deals primarily with structured data.
It is also different from text mining because of the semi-structured nature of the web, while
text mining focuses on unstructured texts. Thus, web content mining requires creative
applications of data mining and text mining techniques, as well as its own unique approaches.
In the past few years, there has been a rapid expansion [...]
mining area. This is not surprising because of the phenom[...]
the significant economic benefit of such mining. However, [...]
the lack of structure of web data, automated discovery [...]
knowledge information still presents many challenging r[...]
mining could be differentiated from two approaches, such as:
1. Agent-based Approach
This approach involves intelligent systems. It aims to improve information finding and
filtering, and it usually relies on autonomous agents that can identify relevant websites.
These agents fall into the following three categories:
• Intelligent Search Agents: These agents search for relevant information using
domain characteristics and user profiles to organize and interpret the discovered
information.
• Information Filtering or Categorization: These agents use information retrieval
techniques and characteristics of open hypertext Web documents to automatically
retrieve, filter, and categorize them.
• Personalized Web Agents: These agents learn user preferences and discover
Web information based on the preferences of other users with similar interests.
2. Data-based Approach
The data-based approach is used to organize semi-structured data present on the internet into
structured data. It aims to model the web data in a more structured form so that
standard database querying mechanisms and data mining applications can be used to analyze it.
Web Content Mining Challenges
Web content mining has the following problems or challenges,
such as:
• Data Extraction: Extraction of structured data from Web pages, such as
products and search results. Extracting such data allows one to provide services.
Two main types of techniques, machine learning and automatic extraction, are
used to solve this problem.
• Web Information Integration and Schema Matching: Although the Web
contains a huge amount of data, each website (or even page) represents similar
information differently. Identifying or matching semantically similar data is an
important problem with many practical applications.
• Opinion extraction from online sources: There are many online opinion sources,
e.g., customer reviews of products, forums, blogs, and chat rooms. Mining
opinions is of great importance for marketing intelligence and product
benchmarking.
• Knowledge synthesis: Concept hierarchies or ontologies are useful in many
applications. However, generating them manually is very time-consuming. The
main application is to synthesize and organize the pieces of information on the
web to give the user a coherent picture of the topic domain. A few existing
methods that explore the web's information redundancy will be presented.
• Segmenting Web pages and detecting noise: In many Web applications, one
only wants the main content of the Web page without advertisements,
navigation links, and copyright notices. Automatically segmenting Web pages to
extract the pages' main content is an interesting problem.
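To make the noise-detection challenge concrete, here is a minimal sketch that keeps only the text outside a few container tags. The list of noise tags and the sample page are assumptions made for illustration; real pages rarely mark their boilerplate this cleanly, and practical segmenters rely on much richer cues.

```python
from html.parser import HTMLParser


class MainTextExtractor(HTMLParser):
    """Collects text that is not nested inside an assumed set of noise tags."""

    # Assumption: these tags wrap navigation, ads, or notices on the sample page.
    NOISE_TAGS = {"nav", "footer", "aside", "header", "script", "style"}

    def __init__(self):
        super().__init__()
        self.noise_depth = 0  # how many noise containers we are currently inside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.NOISE_TAGS:
            self.noise_depth += 1

    def handle_endtag(self, tag):
        if tag in self.NOISE_TAGS and self.noise_depth > 0:
            self.noise_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.noise_depth == 0:
            self.chunks.append(text)


def extract_main_text(html):
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page used only for this example.
sample_page = (
    "<html><body><nav><a href='/'>Home</a></nav>"
    "<article><h1>Laptop review</h1><p>Battery life is great.</p></article>"
    "<footer>Copyright notice</footer></body></html>"
)
main_text = extract_main_text(sample_page)
print(main_text)
```

The navigation link and the copyright notice are dropped, and only the article text is kept.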
What is Web Structure Mining?
The challenge for Web structure mining is to deal with the structure of the hyperlinks
within the web itself. Link analysis is an old area of research. However, with the growing
interest in Web mining, research on structure analysis has increased. These efforts
resulted in a newly emerging research area called Link Mining, which is located at the
intersection of the work in link analysis, hypertext, web mining, [...]
logic programming, and graph mining.
Web structure mining uses graph theory to analyze a [...]
structure. According to the type of web structural data, [...]
divided into two kinds:
• Extracting patterns from hyperlinks in the web: a hyperlink is a structural
component that connects the web page to a different location.
• Mining the document structure: analysis of the tree-like structure of pages
to describe HTML or XML tag usage.
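As an illustration of the second kind, the following sketch walks the tag tree of a page and reports how often each tag is used and how deeply tags are nested. It uses Python's standard html.parser module; the sample page and the short list of void tags are assumptions chosen for the example, not part of the original tutorial.

```python
from collections import Counter
from html.parser import HTMLParser


class TagUsageParser(HTMLParser):
    """Counts tag usage and tracks the maximum nesting depth of a page."""

    # Assumption: a small subset of void elements, which never have a closing tag.
    VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.tag_counts = Counter()
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] += 1
        if tag not in self.VOID_TAGS:
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag not in self.VOID_TAGS:
            self.depth = max(0, self.depth - 1)


def tag_usage(html):
    parser = TagUsageParser()
    parser.feed(html)
    parser.close()
    return parser.tag_counts, parser.max_depth


# Hypothetical page used only for this example.
sample_page = (
    "<html><body><h1>Title</h1>"
    "<ul><li><a href='/a'>A</a></li><li><a href='/b'>B</a></li></ul>"
    "<img src='x.png'></body></html>"
)
counts, depth = tag_usage(sample_page)
print(counts.most_common(3), depth)
```

Tag frequencies and nesting depth of this kind are simple structural features; a real system would compare them across many pages.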
The web contains a variety of objects with almost no unifying structure, with differences in
authoring style and content much greater than in traditional collections of text
documents. The objects in the WWW are web pages, and the links between them are in-links,
out-links, and co-citations (two pages linked to by the same page). Attributes include HTML
tags, word appearances, and anchor texts. Web structure mining uses the following terminology:
• Web graph: directed graph representing the web.
• Node: web page in the graph.
• Edge: hyperlinks.
• In degree: the number of links pointing to a particular node.
• Out degree: the number of links generated from a particular node.
An example of a technique of web structure mining is the PageRank algorithm used by
Google to rank search results. A page's rank is decided by the number and quality of links
pointing to the target node.
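The terminology and the ranking idea can be sketched on a toy web graph. The code below computes in-degree and out-degree and then runs the textbook power-iteration form of PageRank; it is a simplified illustration, not the ranking Google runs in production. The four-page graph, the damping factor of 0.85, and the iteration count are assumptions for the example.

```python
from collections import Counter


def degrees(graph):
    """Return (in_degree, out_degree) counters for a {page: [linked pages]} dict."""
    in_degree, out_degree = Counter(), Counter()
    for page, targets in graph.items():
        out_degree[page] += len(targets)
        for target in targets:
            in_degree[target] += 1
    return in_degree, out_degree


def pagerank(graph, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank; ranks always sum to 1."""
    pages = sorted(set(graph) | {t for ts in graph.values() for t in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = graph.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without out-links spreads its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical web graph: each key links to the pages in its list.
web_graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
in_degree, out_degree = degrees(web_graph)
ranks = pagerank(web_graph)
best_page = max(ranks, key=ranks.get)
print(in_degree["C"], out_degree["A"], best_page)
```

Page C has the highest in-degree and also ends up with the highest rank, while A ranks close behind because it receives all of C's rank. This shows how the quality of incoming links matters, not just their number.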
Link mining has changed how some traditional data mining tasks are approached. Below we
summarize some of the link mining tasks that are applicable in Web
structure mining:
1. Link-based Classification: The most recent upgrade of a classic data mining task to linked
domains. The task is to predict the category of a web page based on words that occur on the
page, links between pages, anchor text, HTML tags, and other possible attributes found on the
web page.
2. Link-based Cluster Analysis: The data is segmented into [...]
grouped together, and dissimilar objects are grouped [...]
previous task, link-based cluster analysis is unsupervised [...]
patterns from [...]
3. Link Type: There is a wide range of tasks concerning pred[...]
predicting the type of link between two entities or predict[...]
4. Link Strength: Links could be associated with weights.
5. Link Cardinality: The main task is to predict the number of links between objects.
[...] page categorization used to [...]
• Finding related pages.
• Finding duplicated websites and finding out the similarity between them.
What is Web Usage Mining?
Web Usage Mining focuses on techniques that could predict the behavior of users while
they are interacting with the WWW. It discovers user navigation
patterns from web data, trying to extract useful information from the secondary data
derived from users' interactions while surfing the web. Web usage mining collects data
from Weblog records to discover user access patterns of web pages. Several available
research projects and commercial tools analyze those patterns for different purposes. The
resulting knowledge could be utilized in personalization, system improvement, site
modification, business intelligence, and usage characterization.
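Before access patterns can be mined, weblog records are usually grouped into per-user sessions. The sketch below shows one common heuristic: a new session starts when a user is idle for longer than a timeout. The record layout (user, Unix time, page), the 30-minute timeout, and the sample log are all assumptions for illustration; real server logs need parsing and cleaning first.

```python
from collections import defaultdict

# Assumption: 30 minutes of inactivity ends a session (a widely used heuristic).
SESSION_TIMEOUT = 30 * 60


def sessionize(records, timeout=SESSION_TIMEOUT):
    """Group (user, unix_time, page) records into a list of (user, [pages])."""
    visits_by_user = defaultdict(list)
    for user, timestamp, page in records:
        visits_by_user[user].append((timestamp, page))

    sessions = []
    for user in sorted(visits_by_user):
        visits = sorted(visits_by_user[user])
        current = [visits[0][1]]
        for (previous_time, _), (timestamp, page) in zip(visits, visits[1:]):
            if timestamp - previous_time > timeout:
                # The gap is too long: close this session and start a new one.
                sessions.append((user, current))
                current = []
            current.append(page)
        sessions.append((user, current))
    return sessions


# Hypothetical, already-parsed log records.
log = [
    ("u1", 0, "/home"), ("u1", 60, "/products"), ("u1", 4000, "/home"),
    ("u2", 10, "/home"), ("u2", 20, "/cart"),
]
sessions = sessionize(log)
for user, pages in sessions:
    print(user, pages)
```

User u1 ends up with two sessions because more than 30 minutes pass between the second and third request.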
The only information left behind by many users visiting a Web site is the path through the
pages they have accessed. Most of the Web information [...]
information, while they ignore the link information that c[...]
there are mainly four kinds of data mining techniques ap[...]
to discover the user navigation pattern, such as:
1. Association Rule Mining
Association rule mining is one of the most basic data mining methods and is used more than
other methods in web usage mining. It enables a website to organize its content more
efficiently or to provide recommendations for effective cross-selling of products.
These rules are statements in the form X => Y, where X and Y are sets of available items
in a series of transactions. The rule X => Y states that transactions that contain the items in X
may also include the items in Y. Association rules in web usage mining are used to find
relationships between pages that frequently appear together in user sessions.
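A rule X => Y is usually judged by its support (the share of sessions containing both X and Y) and its confidence (the share of sessions containing X that also contain Y). The sketch below computes both for single-page rules by brute force. It is not the Apriori algorithm, and the sessions and thresholds are assumptions picked for the example.

```python
from itertools import permutations


def pair_rules(sessions, min_support=0.5, min_confidence=0.7):
    """Return (x, y, support, confidence) for every rule x => y above both thresholds."""
    page_sets = [set(session) for session in sessions]
    total = len(page_sets)
    pages = sorted(set().union(*page_sets))
    rules = []
    for x, y in permutations(pages, 2):
        count_x = sum(1 for s in page_sets if x in s)
        count_xy = sum(1 for s in page_sets if x in s and y in s)
        support = count_xy / total
        confidence = count_xy / count_x  # count_x >= 1: x came from a session
        if support >= min_support and confidence >= min_confidence:
            rules.append((x, y, support, confidence))
    return rules


# Hypothetical user sessions (the pages visited in each one).
sessions = [
    ["/home", "/laptops", "/bags"],
    ["/home", "/laptops", "/bags"],
    ["/home", "/laptops"],
    ["/home", "/phones"],
]
rules = pair_rules(sessions)
for x, y, support, confidence in rules:
    print(f"{x} => {y}  support={support:.2f}  confidence={confidence:.2f}")
```

Here "/bags => /laptops" passes with confidence 1.0, while the reverse rule "/laptops => /bags" is rejected because only two of the three laptop sessions include bags. That asymmetry is what makes such rules useful for cross-selling suggestions.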
2. Sequential Patterns
Sequential patterns are used to discover the subsequences [...]
data. In web usage mining, sequential patterns are used [...]
that frequently appear in sessions. The sequential patterns [...]
rules. But sequential patterns also include time, [...]
events that occurred is defined in sequential patterns. [...]
association rules can also be used to generate sequential patterns. Two types of algorithms
are used for sequential pattern mining.
• The first type of algorithm is based on association rule mining. Many common
sequential pattern mining algorithms have been adapted from association rule
mining. For example, GSP and AprioriAll are two variants of the
Apriori algorithm, which is used to extract association rules. However, some
researchers believe that association rule mining algorithms do not perform well
enough when mining long sequential patterns.
• The second type of sequential pattern mining algorithm uses
a tree structure and Markov chains to represent navigation
patterns. For example, in one of these algorithms called WAP-mine, a tree
structure called the WAP-tree is used to explore access patterns to the web.
Evaluation results show that its performance is higher than that of an algorithm such as
GSP.
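What sets a sequential pattern apart from an association rule is that order matters. The sketch below counts, for each ordered pair of pages, the share of sessions in which the first page is visited at some point before the second. It is a deliberately small length-2 simplification, not GSP, AprioriAll, or WAP-mine, and the sessions and threshold are assumptions for the example.

```python
def frequent_ordered_pairs(sessions, min_support=0.5):
    """Map (a, b) to the share of sessions where a is visited before b."""
    total = len(sessions)
    counts = {}
    for session in sessions:
        seen = set()  # count each ordered pair at most once per session
        for i, first in enumerate(session):
            for later in session[i + 1:]:
                if first != later:
                    seen.add((first, later))
        for pair in seen:
            counts[pair] = counts.get(pair, 0) + 1
    return {
        pair: count / total
        for pair, count in counts.items()
        if count / total >= min_support
    }


# Hypothetical sessions, each an ordered list of visited pages.
sessions = [
    ["/home", "/search", "/item", "/cart"],
    ["/home", "/item", "/cart"],
    ["/search", "/home", "/item"],
    ["/cart", "/home"],
]
patterns = frequent_ordered_pairs(sessions)
for (first, later), support in sorted(patterns.items()):
    print(f"{first} -> {later}  support={support:.2f}")
```

The pair ("/home", "/cart") is frequent, but ("/cart", "/home") is not, even though both pages appear together in the same sessions. An unordered association rule could not make that distinction.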
3. Clustering
Clustering techniques identify groups of similar items among high volumes of data. This is
done based on distance functions which measure the degree of similarity between
different items. Clustering in web usage mining is used for grouping similar sessions.
What is important in this type of search is the contrast between the user and individual
groups. Two types of interesting clustering can be found in this area: user clustering and
page clustering.
Clustering of user records is commonly used in web mining and web analytics tasks.
The knowledge derived from clustering is also used to partition the market in e-commerce.
Different methods and techniques are used for clustering, including:
• Using the similarity graph and the amount of time spent viewing a page to
estimate the similarity of sessions.
• Using genetic algorithms and user feedback.
• Clustering matrix.
• K-means algorithm, which is the most classic clust[...]
The repetitive patterns are first extracted from the user's s[...]
other clustering methods. Then, these patterns are used to construct a graph where the
nodes are the visited pages. The edges of the graph connect two or more pages. If these
pages exist in an extracted pattern, a weight is assigned to the edges that shows the
relationship between the nodes. Then, for clustering, this graph is recursively divided until
user behavior groups are detected.
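As a small illustration of distance-based session clustering, the sketch below turns each session into a binary page vector and groups the vectors with a bare-bones k-means. It shows the k-means idea from the list above rather than the graph-partitioning procedure just described. The sessions, the choice of two clusters, and the use of two sessions as starting centroids are assumptions that keep the example deterministic.

```python
def to_vector(session, pages):
    """Binary vector: 1.0 if the page was visited in the session, else 0.0."""
    visited = set(session)
    return [1.0 if page in visited else 0.0 for page in pages]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, initial_centroids, iterations=10):
    """Plain k-means: assign each vector to its nearest centroid, then re-average."""
    centroids = [list(c) for c in initial_centroids]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(vector, centroids[k]))
            for vector in vectors
        ]
        for k in range(len(centroids)):
            members = [v for v, label in zip(vectors, labels) if label == k]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical sessions: two shopping visits and two visits to informational pages.
sessions = [
    ["/home", "/laptops", "/bags"],
    ["/home", "/laptops"],
    ["/blog", "/about"],
    ["/blog", "/about", "/contact"],
]
pages = sorted({page for session in sessions for page in session})
vectors = [to_vector(session, pages) for session in sessions]
# Assumption: seed the two clusters with the first and third sessions.
labels = kmeans(vectors, initial_centroids=[vectors[0], vectors[2]])
print(labels)
```

The shopping sessions land in one cluster and the informational sessions in the other. With real data, results depend heavily on the number of clusters and on how the starting centroids are chosen.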
4. Classification Mining
Discovering classification rules allows one to develop a profile of items belonging to a
particular group according to their common attributes. This profile can then be used to
classify new data items added to the database. In Web mining, classification techniques allow
one to develop a profile for clients who access particular server files based on demographic
information available on those clients or their navigation patterns.
Advantages
Web usage mining has many advantages, making this technology attractive to
corporations and government agencies.
• This technology has enabled e-commerce to do personalized marketing,
resulting in higher trade volumes. Government agencies are using this
technology to classify threats and fight against terrorism.
• Companies can establish better customer relationships [...]
customer's needs better and reacting to custo[...]
increase profitability by target pricing based on [...]
even find customers who might default to a comp[...]
retain the customer by providing promotional offers to the specific customer,
thus reducing the risk of losing a customer or customers.
• More benefits of web usage mining, particularly personalization, are outlined in
specific frameworks such as the probabilistic latent semantic analysis model, which
offers additional features for modeling user behavior and access patterns. This is because
the process provides the user with more relevant content through collaborative
recommendations.
• There are also elements unique to web usage mining that show the technology's
benefits. These include the way semantic knowledge is applied when
interpreting, analyzing, and reasoning about usage patterns during the mining
phase.
Disadvantages
Web usage mining by itself does not create issues, but when used on data of a personal
nature, this technology might cause concerns.
• The most criticized ethical issue involving web usage mining is the invasion of
privacy. Privacy is considered lost when information concerning an individual is
obtained, used, or disseminated, especially if this occurs without the individual's
knowledge or consent. The obtained data will be analyzed, made anonymous,
and then clustered to form anonymous profiles.
• These applications de-individualize users by judging them by their mouse clicks
rather than by identifying information. De-individualization, in general, can be
defined as a tendency to judge and treat people based on group characteristics
instead of on their individual characteristics and merits.
• The companies collecting the data for a specific purpose might use the data for
totally different purposes, violating the user's interests.
Web Usage Mining Applications
The main objective of web usage mining is to collect [...]
patterns. This information can improve the Web sites in th[...]
applications of this mining, such as:
1. Personalization of web content
Web usage mining techniques can be used to personalize content for web users. For
example, a user's behavior can be predicted immediately by comparing her current navigation
patterns with those extracted from the log files. Recommendation systems with a real
application in this area suggest links that direct the user to his or her favorite pages. Some
sites also organize their product catalogs based on the predicted interests of a specific user
and present them accordingly.
2. Pre-fetching
The results of web usage mining can be used to improve the performance of Web servers
and Web-based applications. Web usage mining can inform pre-fetching and caching
strategies and thus reduce the response time of Web servers.
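One simple way usage data could guide fetching pages ahead of a request is a first-order Markov model: count which page most often follows the current one in past sessions and load that page in advance. The sketch below is a minimal illustration under that assumption; the sessions are made up, and the function names are hypothetical rather than taken from any caching library.

```python
from collections import Counter, defaultdict


def build_transitions(sessions):
    """Count how often each page is immediately followed by each other page."""
    transitions = defaultdict(Counter)
    for session in sessions:
        for current_page, next_page in zip(session, session[1:]):
            transitions[current_page][next_page] += 1
    return transitions


def predict_next(transitions, page):
    """Return (most likely next page, estimated probability), or None if unseen."""
    followers = transitions.get(page)
    if not followers:
        return None
    next_page, count = followers.most_common(1)[0]
    return next_page, count / sum(followers.values())


# Hypothetical sessions reconstructed from a server log.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products", "/reviews"],
    ["/home", "/products", "/cart"],
    ["/home", "/blog"],
]
transitions = build_transitions(sessions)
print(predict_next(transitions, "/home"))
print(predict_next(transitions, "/products"))
```

A server could pre-load "/products" whenever "/home" is requested, since it followed the home page in three of four sessions. Whether that pays off depends on the cost of a wrong guess, so a probability threshold is a sensible addition.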
3. Improvement of Web site design
Usability is one of the most important issues in designing and implementing websites. The
results of web usage mining can help improve the design of websites. Adaptive
websites are an application of this type of mining: website content and structure are
dynamically reorganized based on data derived from user behavior on these sites.
Difference between Web Content, Web Structure, and Web Usage Mining
Here are the differences between web content, web structure, and web usage
mining:
| Criterion | Web Content Mining (IR view) | Web Content Mining (DB view) | Web Structure Mining | Web Usage Mining |
|---|---|---|---|---|
| View of data | Unstructured; Structured | Semi-structured; Website as DB | Link structure | Interactivity |
| Main data | Text documents; Hypertext documents | Hypertext documents | Link structure | Server logs; Browser logs |
| Method | Machine learning; Statistical (including NLP) | Proprietary algorithm; Association rules | Proprietary algorithm | Machine learning; Statistical; Association rules |
| Representation | Bag of words, n-gram terms; Phrases, concepts, or ontology; Relational | Edge-labeled graph; Relational | Graph | Relational table; Graph |
| Application categories | Categorization; Clustering; Finding extract rules; Finding patterns in text | Finding [...] substructures; Web site schema discovery | Clustering | [...] adaptation and [...] |