Project Progress Report
Personalized Search Engine
Under the Guidance of Mr. Rinkaj Goyal Asst. Prof. USIT GGSIPU
Made by: Akshat Agrawal, M.Tech (IT) 1st Year, 01816405309
1. Introduction:
Around 300 million people use the Google search engine every day. The Web has become a huge repository of information and keeps growing exponentially under no editorial control. Providing people with access to information is not the problem. The problem is that people with varying needs and preferences navigate through large Web structures, often missing the goal of their inquiry. Web personalization is one of the most promising approaches for alleviating this information overload: its objective is to provide users with what they want or need, without requiring them to ask for it explicitly.
1.1 Personalization in Search
Consider a user who is an expert in some subject. If they type a basic keyword from that subject into a search engine, they do not want introductory information about the topic; they want advanced knowledge. Today such users have to try many different keywords and spend a lot of time before reaching the advanced or valuable information they actually want.
1.2 Contemporary personalized search engines
Google plays a very intelligent role in the personalization of search: it examines every step of your search activity, especially if you use Google services such as Gmail, AdSense and Google Analytics. Yahoo also offers personalization in search. When we search on Google while logged in with a Gmail id, everything we search for is saved and monitored by Google. The links we follow and the number of pages we visit for that search are also stored in our web history, along with the time and date. Then, when we repeat the same search often, the pages we previously visited are shown at the top of our results.
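Google's actual ranking algorithm is proprietary, but the re-ranking behaviour described above can be sketched in a few lines. This is a minimal illustration with made-up names and data (the function `personalized_rank` is hypothetical); it simply promotes pages the user has visited before:

```python
# Illustrative sketch (not Google's actual algorithm): re-rank search
# results so pages the user frequently revisited appear first.
from collections import Counter

def personalized_rank(results, click_history):
    """results: list of URLs from the base ranking (best first).
    click_history: list of URLs the user visited previously."""
    visits = Counter(click_history)
    # Stable sort: pages with more past visits move up; ties keep
    # the original (base) ranking order.
    return sorted(results, key=lambda url: -visits[url])

history = ["wiki.org/python", "docs.python.org", "wiki.org/python"]
base = ["blog.example/python", "wiki.org/python", "docs.python.org"]
print(personalized_rank(base, history))
# → ['wiki.org/python', 'docs.python.org', 'blog.example/python']
```

Because the sort is stable, users with no history see the unchanged base ranking, which matches the behaviour of searching while logged out.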
2. Concept behind the personalization of search engine
Our main interest is in building web applications that take into account the input and behaviour of every user in the system, over time, as well as any other potentially useful information that may be available. Let's say that you start using a web application to order food, and every Wednesday you order paneer. You'd have a much better experience if, on Wednesdays, the application asked you "Would you like paneer today?" instead of "What would you like to order today?" In the first case, the application somehow realized that you like paneer on Wednesdays. In the second case, the application remains oblivious to this fact, and the data created by your interaction with the site doesn't affect how the application chooses the content of a page or how it is presented. Asking a question based on the user's prior selections introduces a new kind of interactivity between the website and its users; we could say that such websites have a learning capacity. Personalization therefore needs a website that is intelligent about our behaviour and the nature of our work, a concept known as the intelligent web.
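The Wednesday-paneer scenario can be sketched as follows. The class and method names (`OrderHistory`, `record`, `greeting`) are hypothetical, used only to illustrate how a site could learn from its own interaction data:

```python
# Sketch of a "learning" web application: remember what a user ordered
# on each weekday and suggest their most frequent past choice.
from collections import Counter, defaultdict

class OrderHistory:
    def __init__(self):
        # weekday -> Counter of dishes ordered on that weekday
        self.by_weekday = defaultdict(Counter)

    def record(self, weekday, dish):
        self.by_weekday[weekday][dish] += 1

    def greeting(self, weekday):
        counts = self.by_weekday[weekday]
        if counts:
            dish, _ = counts.most_common(1)[0]
            return f"Would you like {dish} today?"
        return "What would you like to order today?"

h = OrderHistory()
for _ in range(3):
    h.record("Wednesday", "paneer")
h.record("Wednesday", "dal")
print(h.greeting("Wednesday"))  # → Would you like paneer today?
print(h.greeting("Monday"))     # → What would you like to order today?
```

The oblivious application of the second case corresponds to never calling `record`: with no stored interaction data, every visit gets the generic question.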
2.1 Personalized process decomposed
(a) Data acquisition → (b) Data analysis → (c) Personalized output
(a) Data acquisition. In the large majority of cases, Web personalization is a data-intensive task that is based on three general types of data: data about the user, data about the Website usage, and data about the software and hardware available on the user's side. User data denotes information about the personal characteristics of the user. Several such types of data have been used in personalization applications,
such as demographics, the user's knowledge, skills and capabilities, interests and preferences, and goals and plans. Usage data and environment data form the other two general categories.
(b) Data analysis. User profiling dramatically affects the kinds of analysis that can be applied after the data acquisition phase in order to accomplish more sophisticated personalization. The techniques that may be applied for further analyzing and expanding user profiles so as to derive inferences vary, and come from numerous scientific areas comprising artificial intelligence, machine learning, statistics, and information retrieval.
Data preparation and preprocessing. The objective of this phase is to derive a set of server sessions from raw usage data, as recorded in the form of Web server logs. Before proceeding with a more detailed description of data preparation, it is necessary to provide a set of data abstractions, introduced by the W3C (World Wide Web Consortium), for describing Web usage. A server session is defined as the set of page views served due to a series of HTTP requests from a single user to a single Web server.
Pattern discovery. Pattern discovery aims to detect interesting patterns in the pre-processed Web usage data by deploying statistical and data mining methods.
Pattern analysis. In this final phase the objective is to convert the discovered rules, patterns and statistics into knowledge or insight about the Website being analyzed. Knowledge here is an abstract notion that in essence describes the transformation from information to understanding; it is thus highly dependent on the human performing the analysis and reaching conclusions.
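The sessionization step of data preparation can be sketched as follows. This assumes time-ordered requests for a single user and a 30-minute inactivity timeout, which is a common heuristic rather than something mandated by the W3C definitions:

```python
# Sketch of sessionization: group one user's page requests into server
# sessions, starting a new session after a 30-minute inactivity gap.
from datetime import datetime, timedelta

def sessionize(requests, timeout=timedelta(minutes=30)):
    """requests: time-ordered list of (timestamp, url) for one user.
    Returns a list of sessions, each a list of URLs."""
    sessions, last_time = [], None
    for ts, url in requests:
        if last_time is None or ts - last_time > timeout:
            sessions.append([])  # inactivity gap -> new session
        sessions[-1].append(url)
        last_time = ts
    return sessions

log = [
    (datetime(2002, 7, 14, 0, 13), "/index.html"),
    (datetime(2002, 7, 14, 0, 15), "/courses.html"),
    (datetime(2002, 7, 14, 9, 2),  "/index.html"),  # later visit
]
print(sessionize(log))
# → [['/index.html', '/courses.html'], ['/index.html']]
```

Real log preparation also has to identify the user in the first place (via IP address, user agent or cookies), which is omitted here for brevity.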
(c) Personalized output.
After gathering the appropriate input data (about the user, the usage and/or the usage environment), storing them using an adequate representation and analyzing them for reaching secondary inferences, what remains is to explore and decide upon the kind of adaptations the Website will deploy in order to personalize itself.
The adaptations may concern: content, structure, and presentation and media format.
3. Intelligent web traffic mining and analysis
With the rapidly increasing popularity of the WWW, Websites are playing a crucial role in conveying knowledge and information to end users. Discovering hidden and meaningful information about Web users' usage patterns is critical for determining effective marketing strategies and for optimizing Web server usage to accommodate future growth. Most of the currently available Web server analysis tools provide only explicit, statistical information, without real, useful knowledge for Web managers. The task of mining useful information becomes more challenging when the Web traffic volume is enormous and keeps growing. The World Wide Web is continuously growing, both in the volume of information transactions from Web servers and in the number of requests from Web users. Providing Web administrators with meaningful information about users' access behaviour and usage patterns has become a necessity for improving the quality of Web information service performance. As such, the hidden knowledge obtained from mining Web server traffic and user access patterns can be applied directly to the marketing and management of E-business, E-services, E-searching, E-education and so on.
3.1 Web mining
(a) Web usage mining (b) Web content mining (c) Web structure mining
3.2 Web analysis
(a) past usage patterns (b) degree of shared content (c) inter-memory associative link structures
4. Traffic and data analysis
To examine this practically, I will follow a research paper by Xiaozhe Wang and Kate A. Smith (School of Business Systems, Faculty of Information Technology, Monash University, Clayton, Victoria 3800, Australia) and Ajith Abraham (Department of Computer Science, Oklahoma State University, 700 N Greenwood Avenue, Tulsa, OK 74106-0700, USA).
In that paper the researchers take the Monash University Web server data and analyze it to see the users' usage patterns, find out how many users want the same query answered, and perform a full traffic analysis. The typical Web traffic patterns of Monash University in Fig. 1(a) and (b) show the daily and hourly Web traffic (request volume and page volume) on the main server site for the week from 14-Jul-2002, 00:13 A.M. to 20-Jul-2002, 12:22 A.M. Generally, in a week, Monash University's main Web server receives over 7 million hits (Server Usage Statistics).
4.1 Hybrid neuro-fuzzy approach for Web traffic mining and prediction
The hybrid framework combines a Self-Organizing Map (SOM) and a Fuzzy Inference System (FIS) operating in a concurrent environment, as shown in Fig. 2. In this concurrent model, the neural network continuously assists the fuzzy system in determining the required parameters, especially when certain input variables cannot be measured directly. Such combinations do not optimize the fuzzy system but only aid in improving the performance of the overall system.
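As an illustration of the SOM half of this hybrid, here is a minimal one-dimensional SOM trainer in pure Python. All parameter choices (map size, learning rate, cooling schedule) and the toy data are illustrative, not those used in the paper:

```python
# Minimal 1-D Self-Organizing Map: competitive learning with a
# shrinking Gaussian neighborhood over the map units.
import math, random

def bmu(units, x):
    """Index of the best-matching unit (nearest weight vector)."""
    return min(range(len(units)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(units[i], x)))

def train_som(data, n_units=4, epochs=50, lr=0.3, seed=0):
    """data: list of equal-length feature vectors, e.g. normalized
    (request volume, page volume) pairs. Returns trained unit weights."""
    rng = random.Random(seed)
    dim = len(data[0])
    units = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        # neighborhood radius cools from n_units/2 down to 1
        radius = max(1.0, (n_units / 2) * (1 - epoch / epochs))
        for x in data:
            winner = bmu(units, x)
            for i, w in enumerate(units):
                # Gaussian kernel: nearby units on the map move too
                h = math.exp(-((i - winner) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
    return units

# Two obvious traffic clusters: low-volume hours, high-volume hours.
data = [[0.1, 0.2], [0.15, 0.1], [0.9, 0.8], [0.85, 0.95]]
units = train_som(data)
```

After training, low-traffic and high-traffic inputs map to different ends of the unit array, which is the clustering behaviour the paper's SOM stage relies on before the FIS is applied.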
4.2 Data clustering and experimental analysis using SOM
Web usage mining normally comprises four processing stages: data collection, data preprocessing, pattern discovery and pattern analysis. The data source selected for this approach is the Web traffic data generated by the Analog Web access log file analyzer.
Through this kind of data and traffic analysis we can understand what users want: how many users have the same query, how many users want the same application, and how many users send requests to different applications.
5. Databionic ESOM Tool
The Databionic Emergent Self-Organizing Map (ESOM) tool lets us perform the same kind of data analysis, and much more, on any data repository, so we can easily measure user behaviour and usage patterns. The Databionic ESOM Tools is a suite of programs for performing data mining tasks such as clustering, visualization, and classification with Emergent Self-Organizing Maps (ESOM). Features include training of ESOM with different initialization methods, training algorithms, distance functions, parameter cooling strategies, ESOM grid topologies, and neighborhood kernels.
The Databionic tool analyzes the data and user patterns using the following steps:
1. Preprocessing
2. Training
3. Visualization
4. Data analysis
5. Clustering
6. Projection
7. Classification
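Step 1 (preprocessing) can be sketched as simple min-max normalization of the usage features before they are handed to the map, since raw request counts and page volumes sit on very different scales. The feature values below are made up for illustration:

```python
# Sketch of a preprocessing step: min-max normalize each feature of
# the usage records to [0, 1] so no single feature dominates training.
def min_max_normalize(rows):
    """rows: list of equal-length numeric feature vectors."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)]
            for row in rows]

# (requests per hour, pages per hour) — illustrative values
raw = [[120, 40], [7000, 2100], [560, 300]]
print(min_max_normalize(raw))
```

Constant columns are mapped to 0.0 to avoid division by zero; other cooling and initialization choices are left to the tool itself.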
By using this tool we will obtain all the user patterns and usage information, and on that basis we will develop a specific algorithm for personalizing the search engine.
6. Problems of personalization in search
This personalization will work only if users feel free to provide their personal information, since personalization is based on knowledge about the likes and dislikes of users. These techniques require users to input personal information about their interests, needs and/or preferences. In many cases this poses a big obstacle, since Web users are not usually cooperative in revealing these types of data. Despite such privacy problems, personalization rests on the assumption that we can find clues about how to personalize information.
References:
1. Anthony Scime, Web Mining: Applications and Techniques (www.ideagroup.com)
2. Haralambos Marmanis and Dmitry Babenko, Algorithms of the Intelligent Web
3. Xiaozhe Wang, Ajith Abraham and Kate A. Smith (School of Business Systems, Faculty of Information Technology, Monash University, Clayton, Victoria 3800, Australia; Department of Computer Science, Oklahoma State University, 700 N Greenwood Avenue, Tulsa, OK, USA)
4. databionic-esom.sourceforge.net