Project Progress Report
Personalized Search Engine
Under the Guidance of Mr. Rinkaj Goyal Asst. Prof. USIT GGSIPU
Made by: Akshat Agrawal, M.Tech (IT) 1st Year, 01816405309
1. Introduction:
Around 300 million people use the Google search engine every day. The Web has become a huge repository of information and keeps growing exponentially under no editorial control. Providing people with access to information is not the problem. The problem is that people with varying needs and preferences navigate through large Web structures, often missing the goal of their inquiry. Web personalization is one of the most promising approaches for alleviating this information overload: its objective is to provide users with what they want or need, without requiring them to ask for it explicitly.
1.1 Personalization in Search
Consider a user who is an expert in some subject. If they type a basic keyword from that subject into a search engine, they do not want introductory information about the topic; they want advanced knowledge. Today such users have to try many different keywords and spend a lot of time before reaching the advanced or valuable information they actually want.
1.2 Contemporary personalized search engines
Google plays a very intelligent role in the personalization of search: it examines every step of your search activity, especially if you use Google services such as Gmail, AdSense and Google Analytics. Yahoo also offers personalization in search. When we search on Google while logged in with a Gmail id, everything we search for is saved and monitored by Google. The links we follow and the number of pages we visit for that search are also stored in our web history, along with the time and date. Then, when we repeat the same search often, the pages we previously visited are shown at the top of our results.
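Google's actual ranking algorithm is proprietary, but the re-ranking behaviour described above can be sketched in a few lines. This is a minimal illustration with made-up names and data (the function `personalized_rank` is hypothetical); it simply promotes pages the user has visited before:

```python
# Illustrative sketch (not Google's actual algorithm): re-rank search
# results so pages the user frequently revisited appear first.
from collections import Counter

def personalized_rank(results, click_history):
    """results: list of URLs from the base ranking (best first).
    click_history: list of URLs the user visited previously."""
    visits = Counter(click_history)
    # Stable sort: pages with more past visits move up; ties keep
    # the original (base) ranking order.
    return sorted(results, key=lambda url: -visits[url])

history = ["wiki.org/python", "docs.python.org", "wiki.org/python"]
base = ["blog.example/python", "wiki.org/python", "docs.python.org"]
print(personalized_rank(base, history))
# → ['wiki.org/python', 'docs.python.org', 'blog.example/python']
```

Because the sort is stable, users with no history see the unchanged base ranking, which matches the behaviour of searching while logged out.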
2. Concept behind the personalization of search engine
Our main interest is in building web applications that take into account the input and behaviour of every user in the system, over time, as well as any other potentially useful information that may be available. Let's say that you start using a web application to order food, and every Wednesday you order paneer. You'd have a much better experience if, on Wednesdays, the application asked you "Would you like paneer today?" instead of "What would you like to order today?" In the first case, the application somehow realized that you like paneer on Wednesdays. In the second case, the application remains oblivious to this fact, and the data created by your interaction with the site doesn't affect how the application chooses the content of a page or how it is presented. Asking a question based on the user's prior selections introduces a new kind of interactivity between the website and its users; we could say that such websites have a learning capacity. Personalization therefore needs a website that is intelligent about our behaviour and the nature of our work, a concept known as the intelligent web.
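The Wednesday-paneer scenario can be sketched as follows. The class and method names (`OrderHistory`, `record`, `greeting`) are hypothetical, used only to illustrate how a site could learn from its own interaction data:

```python
# Sketch of a "learning" web application: remember what a user ordered
# on each weekday and suggest their most frequent past choice.
from collections import Counter, defaultdict

class OrderHistory:
    def __init__(self):
        # weekday -> Counter of dishes ordered on that weekday
        self.by_weekday = defaultdict(Counter)

    def record(self, weekday, dish):
        self.by_weekday[weekday][dish] += 1

    def greeting(self, weekday):
        counts = self.by_weekday[weekday]
        if counts:
            dish, _ = counts.most_common(1)[0]
            return f"Would you like {dish} today?"
        return "What would you like to order today?"

h = OrderHistory()
for _ in range(3):
    h.record("Wednesday", "paneer")
h.record("Wednesday", "dal")
print(h.greeting("Wednesday"))  # → Would you like paneer today?
print(h.greeting("Monday"))     # → What would you like to order today?
```

The oblivious application of the second case corresponds to never calling `record`: with no stored interaction data, every visit gets the generic question.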
2.1 Personalized process decomposed
(a) Data acquisition → (b) Data analysis → (c) Personalized output
(a) Data acquisition. In the large majority of cases, Web personalization is a data-intensive task that is based on three general types of data: data about the user, data about the Website usage, and data about the software and hardware available on the user's side. User data denotes information about the personal characteristics of the user. Several such types of data have been used in personalization applications,
such as demographics, the user's knowledge, skills and capabilities, interests and preferences, and goals and plans. Usage data and environment data form the other two general categories.
(b) Data analysis. User profiling dramatically affects the kinds of analysis that can be applied after the data acquisition phase in order to accomplish more sophisticated personalization. The techniques that may be applied for further analyzing and expanding user profiles so as to derive inferences vary, and come from numerous scientific areas comprising artificial intelligence, machine learning, statistics, and information retrieval.
Data preparation and preprocessing. The objective of this phase is to derive a set of server sessions from raw usage data, as recorded in the form of Web server logs. Before proceeding with a more detailed description of data preparation, it is necessary to provide a set of data abstractions, introduced by the W3C (World Wide Web Consortium), for describing Web usage. A server session is defined as the set of page views served due to a series of HTTP requests from a single user to a single Web server.
Pattern discovery. Pattern discovery aims to detect interesting patterns in the pre-processed Web usage data by deploying statistical and data mining methods.
Pattern analysis. In this final phase the objective is to convert the discovered rules, patterns and statistics into knowledge or insight about the Website being analyzed. Knowledge here is an abstract notion that in essence describes the transformation from information to understanding; it is thus highly dependent on the human performing the analysis and reaching conclusions.
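The sessionization step of data preparation can be sketched as follows. This assumes time-ordered requests for a single user and a 30-minute inactivity timeout, which is a common heuristic rather than something mandated by the W3C definitions:

```python
# Sketch of sessionization: group one user's page requests into server
# sessions, starting a new session after a 30-minute inactivity gap.
from datetime import datetime, timedelta

def sessionize(requests, timeout=timedelta(minutes=30)):
    """requests: time-ordered list of (timestamp, url) for one user.
    Returns a list of sessions, each a list of URLs."""
    sessions, last_time = [], None
    for ts, url in requests:
        if last_time is None or ts - last_time > timeout:
            sessions.append([])  # inactivity gap -> new session
        sessions[-1].append(url)
        last_time = ts
    return sessions

log = [
    (datetime(2002, 7, 14, 0, 13), "/index.html"),
    (datetime(2002, 7, 14, 0, 15), "/courses.html"),
    (datetime(2002, 7, 14, 9, 2),  "/index.html"),  # later visit
]
print(sessionize(log))
# → [['/index.html', '/courses.html'], ['/index.html']]
```

Real log preparation also has to identify the user in the first place (via IP address, user agent or cookies), which is omitted here for brevity.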
(c) Personalized output.
After gathering the appropriate input data (about the user, the usage and/or the usage environment), storing them using an adequate representation and analyzing them for reaching secondary inferences, what remains is to explore and decide upon the kind of adaptations the Website will deploy in order to personalize itself.
The adaptations may concern: content, structure, and presentation and media format.
3. Intelligent web traffic mining and analysis
With the rapidly increasing popularity of the WWW, Websites are playing a crucial role in conveying knowledge and information to end users. Discovering hidden and meaningful information about Web users' usage patterns is critical for determining effective marketing strategies and for optimizing Web server usage to accommodate future growth. Most of the currently available Web server analysis tools provide only explicit, statistical information, without real, useful knowledge for Web managers. The task of mining useful information becomes more challenging when the Web traffic volume is enormous and keeps growing. The World Wide Web is continuously growing, both in the volume of information transactions from Web servers and in the number of requests from Web users. Providing Web administrators with meaningful information about users' access behaviour and usage patterns has become a necessity for improving the quality of Web information service performance. As such, the hidden knowledge obtained from mining Web server traffic and user access patterns can be applied directly to the marketing and management of E-business, E-services, E-searching, E-education and so on.
3.1 Web mining
(a) Web usage mining (b) Web content mining (c) Web structure mining
3.2 Web analysis
(a) past usage patterns (b) degree of shared content (c) inter-memory associative link structures
4. Traffic and data analysis
To examine this practically, I will follow a research paper by Xiaozhe Wang and Kate A. Smith (School of Business Systems, Faculty of Information Technology, Monash University, Clayton, Victoria 3800, Australia) and Ajith Abraham (Department of Computer Science, Oklahoma State University, 700 N Greenwood Avenue, Tulsa, OK 74106-0700, USA).
In that paper the researchers take the Monash University Web server data and analyze it to see the users' usage patterns, find out how many users want the same query answered, and perform a full traffic analysis. The typical Web traffic patterns of Monash University in Fig. 1(a) and (b) show the daily and hourly Web traffic (request volume and page volume) on the main server site for the week from 14-Jul-2002, 00:13 A.M. to 20-Jul-2002, 12:22 A.M. Generally, in a week, Monash University's main Web server receives over 7 million hits (Server Usage Statistics).
4.1 Hybrid neuro-fuzzy approach for Web traffic mining and prediction
The hybrid framework combines a Self-Organizing Map (SOM) and a Fuzzy Inference System (FIS) operating in a concurrent environment, as shown in Fig. 2. In this concurrent model, the neural network continuously assists the fuzzy system in determining the required parameters, especially when certain input variables cannot be measured directly. Such combinations do not optimize the fuzzy system but only aid in improving the performance of the overall system.
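As an illustration of the SOM half of this hybrid, here is a minimal one-dimensional SOM trainer in pure Python. All parameter choices (map size, learning rate, cooling schedule) and the toy data are illustrative, not those used in the paper:

```python
# Minimal 1-D Self-Organizing Map: competitive learning with a
# shrinking Gaussian neighborhood over the map units.
import math, random

def bmu(units, x):
    """Index of the best-matching unit (nearest weight vector)."""
    return min(range(len(units)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(units[i], x)))

def train_som(data, n_units=4, epochs=50, lr=0.3, seed=0):
    """data: list of equal-length feature vectors, e.g. normalized
    (request volume, page volume) pairs. Returns trained unit weights."""
    rng = random.Random(seed)
    dim = len(data[0])
    units = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        # neighborhood radius cools from n_units/2 down to 1
        radius = max(1.0, (n_units / 2) * (1 - epoch / epochs))
        for x in data:
            winner = bmu(units, x)
            for i, w in enumerate(units):
                # Gaussian kernel: nearby units on the map move too
                h = math.exp(-((i - winner) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
    return units

# Two obvious traffic clusters: low-volume hours, high-volume hours.
data = [[0.1, 0.2], [0.15, 0.1], [0.9, 0.8], [0.85, 0.95]]
units = train_som(data)
```

After training, low-traffic and high-traffic inputs map to different ends of the unit array, which is the clustering behaviour the paper's SOM stage relies on before the FIS is applied.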
4.2 Data clustering and experimental analysis using SOM
Web usage mining normally comprises four processing stages: data collection, data preprocessing, pattern discovery and pattern analysis. The data source selected for this approach is the Web traffic data generated by the Analog Web access log file analyzer.
Through this kind of data and traffic analysis we can understand what users want: how many users have the same query, how many users want the same application, and how many users send requests to different applications.
5. Databionic ESOM Tool
The Databionic Emergent Self-Organizing Map (ESOM) tool lets us perform the same kind of data analysis, and much more, on any data repository, so we can easily measure user behaviour and usage patterns. The Databionic ESOM Tools is a suite of programs for performing data mining tasks such as clustering, visualization, and classification with Emergent Self-Organizing Maps (ESOM). Features include training of ESOM with different initialization methods, training algorithms, distance functions, parameter cooling strategies, ESOM grid topologies, and neighborhood kernels.
The Databionic tool analyzes the data and user patterns using the following steps:
1. Preprocessing
2. Training
3. Visualization
4. Data analysis
5. Clustering
6. Projection
7. Classification
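Step 1 (preprocessing) can be sketched as simple min-max normalization of the usage features before they are handed to the map, since raw request counts and page volumes sit on very different scales. The feature values below are made up for illustration:

```python
# Sketch of a preprocessing step: min-max normalize each feature of
# the usage records to [0, 1] so no single feature dominates training.
def min_max_normalize(rows):
    """rows: list of equal-length numeric feature vectors."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)]
            for row in rows]

# (requests per hour, pages per hour) — illustrative values
raw = [[120, 40], [7000, 2100], [560, 300]]
print(min_max_normalize(raw))
```

Constant columns are mapped to 0.0 to avoid division by zero; other cooling and initialization choices are left to the tool itself.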
By using this tool we will obtain all the user patterns and usage information, and on that basis we will develop a specific algorithm for personalizing the search engine.
6. Problems of personalization in search
This personalization will work only if users feel free to provide their personal information, since personalization is based on knowledge about the likes and dislikes of users. These techniques require users to input personal information about their interests, needs and/or preferences. In many cases this poses a big obstacle, since Web users are not usually cooperative in revealing these types of data. Despite such privacy problems, personalization rests on the assumption that we can find clues about how to personalize information.
References:
1. Anthony Scime, Web Mining: Applications and Techniques (www.ideagroup.com)
2. Haralambos Marmanis and Dmitry Babenko, Algorithms of the Intelligent Web
3. Xiaozhe Wang, Ajith Abraham and Kate A. Smith (School of Business Systems, Faculty of Information Technology, Monash University, Clayton, Victoria 3800, Australia; Department of Computer Science, Oklahoma State University, 700 N Greenwood Avenue, Tulsa, OK, USA)
4. databionic-esom.sourceforge.net