Classification of Web page content is essential to many tasks in Web information retrieval such as maintaining Web directories and focused crawling. The uncontrolled nature of Web content presents additional challenges to Web page classification as compared to traditional text classification, but the interconnected nature of hypertext also provides features that can assist the process. As we review work in Web page classification, we note the importance of these Web-specific features and algorithms, describe state-of-the-art practices, and track the underlying assumptions behind the use of information from neighboring pages.
International Journal of Data Mining, Modelling and Management, 2013
The boom in the use of the Web and its exponential growth are now well known. The amount of textual data available on the Web is estimated to be on the order of one terabyte, in addition to images, audio and video. This imposes additional challenges on Web directories, which help users search the Web by classifying selected Web documents into subject categories. Manual classification of web pages by human experts also suffers from the exponential increase in the number of Web documents. Instead of using the entire web page to classify it, this article emphasizes the need for automatic web page classification using a minimum number of its features. A method for generating such an optimum number of features for web pages is also proposed. Machine learning classifiers are modeled using these optimum features. Experiments with these machine learning classifiers on benchmark data sets have shown promising improvement in classification accuracy.
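The abstract does not give the feature-selection method itself, but a common way to pick a minimal set of discriminative features is chi-square scoring of term/category counts. The sketch below is a minimal illustration under that assumption; the counts and terms are made up for demonstration.

```python
def chi_square(n11, n10, n01, n00):
    """Chi-square score for one (term, category) pair from a 2x2
    contingency table: n11 = docs in the category containing the term,
    n10 = docs outside the category containing the term, and n01 / n00
    likewise for docs that lack the term."""
    n = n11 + n10 + n01 + n00
    num = n * (n11 * n00 - n10 * n01) ** 2
    den = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    return num / den if den else 0.0

def select_features(term_stats, k):
    """Keep the k terms with the highest chi-square score.
    term_stats maps term -> (n11, n10, n01, n00)."""
    ranked = sorted(term_stats,
                    key=lambda t: chi_square(*term_stats[t]),
                    reverse=True)
    return ranked[:k]

# toy counts: 'football' is strongly associated with the category,
# while 'the' appears uniformly everywhere
stats = {
    "football": (40, 5, 10, 45),
    "the":      (45, 44, 5, 6),
}
print(select_features(stats, 1))  # ['football']
```

Ranking terms this way lets a classifier work with a small, informative vocabulary instead of the full page text.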
In recent years the Internet has seen massive growth of data stored in various forms. Innovative and effective technologies are needed to help a multidisciplinary crew find and use this valuable information and knowledge. The data is not static; it is dynamically increasing and varying. In order to utilize Web information better, people pursue the latest technology that can effectively organize and use online information. Classification of Web page content is important to many tasks in Web information retrieval, such as maintaining Web directories and focused crawling. The uncontrolled nature of Web content presents additional challenges to Web page classification as compared to traditional text classification, but the interconnected nature of hypertext also provides features that can assist the process. This paper reviews Web page classification and indicates the importance of these Web-specific features and algorithms.
EURASIA-ICT 2002 Proceedings of the Workshop, 2002
Web page classification is significantly different from traditional text classification because of the presence of some additional information, provided by the HTML structure and by the presence of hyperlinks. In this paper we analyze these peculiarities and try to exploit them for representing web pages in order to improve categorization accuracy. We conduct various experiments on a corpus of 8000 documents belonging to 10 Yahoo! categories, using Kernel Perceptron and Naive Bayes classifiers. Our experiments show the usefulness of dimensionality reduction and of a new, structure-oriented weighting technique. We also introduce a new method for representing linked pages using local information that makes hypertext categorization feasible for real-time applications. Finally, we observe that the combination of the usual representation of web pages using local words with a hypertextual one can improve classification performance.
International Journal of Computer Applications, 2018
Classification of Web pages is one of the challenging and important tasks, as the number of web pages on the Internet increases day by day. There are many ways of classifying web pages based on different approaches and features. This paper explains some of the approaches and algorithms used for the classification of web pages. In Web page classification, web pages are allocated to predetermined categories, mainly according to their content. Web page classification is an important technique for web mining, because classifying the web pages of an interesting class is the initial step of data mining. The agenda of this paper is first to introduce the concepts related to web mining and then to provide a comprehensive review of different classification techniques.
With the continuous growth of the World Wide Web, the need arises for indexing and classifying Web documents for fast retrieval of the relevant information accessible through it. Traditionally, classification has been accomplished manually. A recent study revealed that there existed about 29.7 billion pages on the Web in February 2007, which means that manual classification would be infeasible and reflects the need for automated techniques to accomplish this task. Though Web documents should follow the basic definitions of the Hypertext Markup Language, they are known to be unstructured or semi-structured, which imposes new challenges on Web classification, especially in the area of feature selection. The objective of this paper is to investigate Web document classification approaches and to compare recent techniques that have proved promising in the literature within this field. Traditionally, automatic classification is performed by extracting information for representing a web document from th...
IEEE International Conference on Data Mining, 2002
Automatic classification of web pages is an effective way to deal with the difficulty of retrieving information from the Internet. Although many automatic classification algorithms and systems have been proposed, most of them ignore the conflict between the fixed number of categories and the growing number of web pages entering the system. They also require searching
2013
Classification of web pages greatly helps in making search engines more efficient by providing relevant results to users' queries. In most of the prevailing algorithms in the literature, the classification/categorization depends solely on features extracted from the text content of the web pages. But since most web pages nowadays are predominantly filled with images and contain little text information, which may even be false and erroneous, classifying those web pages using the information present in them alone often leads to misclassification. To solve this problem, this paper proposes an algorithm for automatically categorizing web pages with little text content, based on features extracted both from the URLs present in a web page and from its own text content. Experiments on the benchmark data set "WebKB" using K-NN, SVM and Naive Bayes machine learning algorithms show the effectiveness of the proposed appr...
Proceedings of the 13th international World Wide Web …, 2004
Uniform resource locators (URLs), which mark the address of a resource on the World Wide Web, are often human-readable and can hint at the category of the resource. This paper explores the use of URLs for web page categorization via a two-phase pipeline of word segmentation/expansion and classification. We quantify its performance against document-based methods, which require the retrieval of the source document.
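The first phase the abstract describes is breaking a URL into word tokens before classifying it. A minimal sketch of that idea, using only simple delimiter-based splitting (the paper's actual segmentation/expansion is more sophisticated), and with made-up keyword lists standing in for a trained classifier:

```python
import re
from urllib.parse import urlparse

def url_tokens(url):
    """Split a URL into lowercase word tokens: the hostname, path and
    query are broken on any non-letter characters."""
    parsed = urlparse(url)
    raw = " ".join([parsed.netloc, parsed.path, parsed.query])
    return [t for t in re.split(r"[^a-z]+", raw.lower()) if len(t) > 1]

# hypothetical per-category keyword sets, standing in for a classifier
# learned from labelled URLs
CATEGORY_HINTS = {
    "sports": {"sport", "sports", "football", "scores"},
    "news":   {"news", "politics", "world"},
}

def categorize(url):
    """Assign the category whose hint set overlaps the URL's tokens most."""
    tokens = set(url_tokens(url))
    best = max(CATEGORY_HINTS, key=lambda c: len(tokens & CATEGORY_HINTS[c]))
    return best if tokens & CATEGORY_HINTS[best] else None

print(url_tokens("http://www.example.com/sports/football-scores?page=2"))
print(categorize("http://www.example.com/sports/football-scores?page=2"))  # sports
```

The appeal of the approach, as the abstract notes, is that no source document needs to be fetched: the address alone carries the signal.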
2014 47th Hawaii International Conference on System Sciences, 2014
Computer Engineering and Intelligent Systems, 2014
The World Wide Web (WWW) is growing at an uncontrollable rate; hundreds of thousands of web sites appear every day, with the added challenge of keeping web directories up to date. Further, the uncontrolled nature of the web presents difficulties for web page classification. The proposed system uses a neural network technique for automatically classifying online web pages according to their domain. The system provides the ability to classify web pages online, which makes it sensitive to any change that happens to a website.
The web is growing very fast; it has a very large amount of information of different types. This necessitates ways to arrange and organize this vast amount of data. One of these ways is automatic Web page classification, which is used in many other applications. In this paper, a comparison between various page structural elements used in the Web page classification task is presented. Classification rules and word-weighting algorithms have been used as the significance criteria for page structural elements in web page classification. The obtained results showed that the page title proved its significance, giving better accuracy over all categories, with accuracy ranging between 84.69% and 93.85% (average 90.52%). Finally, the results also proved that the word-weighting algorithm improved accuracy over the classification rules algorithm for all other classification criteria (title, body text, header and URL). Pages: 164-171
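Word-weighting by structural element can be sketched as follows: terms are counted with a multiplier depending on where in the page they appear, so that a title term contributes more than a body term. The weight values below are illustrative assumptions, not the ones used in the paper (which only reports that the title was the strongest signal):

```python
from collections import Counter

# illustrative weights; actual values would come from experiments
FIELD_WEIGHTS = {"title": 4.0, "header": 2.0, "url": 2.0, "body": 1.0}

def weighted_term_vector(fields):
    """Build a term -> weight vector, adding each occurrence of a term
    multiplied by the weight of the structural element it appears in.
    fields maps element name -> list of tokens from that element."""
    vec = Counter()
    for field, tokens in fields.items():
        w = FIELD_WEIGHTS.get(field, 1.0)
        for tok in tokens:
            vec[tok] += w
    return vec

page = {
    "title": ["python", "tutorial"],
    "body": ["python", "code", "examples", "code"],
}
v = weighted_term_vector(page)
print(v["python"])  # 4.0 (title) + 1.0 (body) = 5.0
```

A classifier trained on such vectors effectively treats a title occurrence as several body occurrences, which is one way the title's significance can translate into higher accuracy.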
Web page classification is one of the common problems of today's Internet. In this paper, an automatic Web page classification system is introduced. The proposed system tries to increase the accuracy of web page classification by combining the well-known Naïve Bayesian algorithm, Support Vector Machine and K-Nearest Neighbor. The experimental results show that classifying web pages with the hybrid of the Naïve Bayesian classifier, Support Vector Machine and K-Nearest Neighbor performs better than using Naïve Bayesian, K-Nearest Neighbor or Support Vector Machine alone, reducing the false positive rate and achieving the highest accuracy. The experimental results, applied on 10,000 web pages (30% for the training process and 70% for the testing process), showed high efficiency, with a false positive rate of (on average) 0%, a true positive rate of (on average) 1, an F-measure of (on average) 1 and an overall accuracy rate of (on average) 99.98%.
IOSR Journal of Computer Engineering, 2012
The World Wide Web is growing at an uncontrollable rate. Hundreds of thousands of web sites appear every day, with the added challenge of keeping the web directories up to date. Further, the uncontrolled nature of the web presents difficulties for web page classification. As the number of Internet users grows, so does the need for classifying web pages with greater precision in order to present users with web pages of their desired class. However, web page classification has been accomplished mostly by using textual categorization methods. Herein, we propose a novel approach for web page classification that uses the HTML information present in a web page for its classification. There are many ways of achieving classification of web pages into various domains. This paper proposes an entirely new dimension toward web page classification using Artificial Neural Networks (ANN).
Advances in Artificial Intelligence, 2011
Virtual integration systems require a crawler to navigate through web sites automatically, looking for relevant information. This process is online, so whilst the system is looking for the required information, the user is waiting for a response. Therefore, downloading a minimum number of irrelevant pages is mandatory to improve the crawler efficiency. Most crawlers need to download a page to determine its relevance, which results in a high number of irrelevant pages downloaded. In this paper, we propose a classifier that helps crawlers to efficiently navigate through web sites. This classifier is able to determine if a web page is relevant by analysing exclusively its URL, minimising the number of irrelevant pages downloaded, improving crawling efficiency and reducing used bandwidth, making it suitable for virtual integration systems.
2015
Nowadays, when a keyword is provided, a search engine can return a large number of web pages, which makes it difficult for people to find the right information. Web page classification is a technology that can help us to make a relevant and quick selection of information that we are looking for. Moreover, web page classification is important for companies that provide marketing and analytics platforms, because it can help them to build
1999
Assistance in retrieving documents on the World Wide Web is provided either by search engines, through keyword-based queries, or by catalogues, which organize documents into hierarchical collections. Maintaining catalogues manually is becoming increasingly difficult, due to the sheer amount of material on the Web; it is thus becoming necessary to resort to techniques for the automatic classification of documents. Automatic classification is traditionally performed by extracting the information for representing a document ("indexing") from the document itself. The paper describes the novel technique of categorization by context, which instead extracts useful information for classifying a document from the context where a URL referring to it appears. We present the results of experimenting with Theseus, a classifier that exploits this technique.
The Web is the largest collection of electronically accessible documents, making it the richest source of information in the world. The problem with the Web is that this information is not structured and organized well enough to be easily retrieved. Web page classification is used for managing and extracting relevant information from Web content, in order to effectively use the knowledge available on the Web. In this dissertation a rule-based system is used to construct the classifier that solves online Web classification by assigning each scanned HTML document to its class. The aim of this thesis is to design and implement an HTML document classification system that is able to classify HTML documents according to their class (category). The proposed system is designed to solve and improve the web page classification problem using a Rule-Based Classifier that checks the HTML content of each entered URL address for occurrences of the system's rules. The proposed system enhances other web page classification systems by making the system work online.
Knowledge-Based Systems, 2014
Unsupervised URL-Based Web Page Classification refers to the problem of clustering the URLs in a web site so that each cluster includes a set of pages that can be classified using a unique class. The existing proposals to perform URL-based classification suffer from a number of drawbacks: they are supervised, which requires the user to provide labelled training data and makes them difficult to scale; they are language or domain dependent, since they require the user to provide dictionaries of words; or they require extensive crawling, which is time and resource consuming. In this article, we propose a new statistical technique to mine URL patterns that are able to classify Web pages. Our proposal is unsupervised, language and domain independent, and does not require extensive crawling. We have evaluated our proposal on 45 real-world web sites, and the results confirm that it can achieve a mean precision of 98% and a mean recall of 91%, and that its performance is comparable to that of a supervised classification technique, while it does not require labelling large sets of sample pages. Furthermore, we propose a novel application that helps to extract the underlying model from non-semantic-web sites.
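The core intuition behind URL pattern mining is that pages of the same class tend to share a URL template. A crude stand-in for the paper's statistical technique is to collapse each URL to a template (here, by replacing digit runs with a wildcard) and cluster URLs sharing a template; the rule and the example URLs are illustrative assumptions:

```python
import re
from collections import defaultdict

def url_pattern(url):
    """Reduce a URL to a template by replacing runs of digits with '[N]'.
    This is a deliberately simple proxy for mined URL patterns."""
    return re.sub(r"\d+", "[N]", url)

def cluster_by_pattern(urls):
    """Group URLs that share the same template; each resulting cluster
    would be a candidate class of pages."""
    clusters = defaultdict(list)
    for u in urls:
        clusters[url_pattern(u)].append(u)
    return dict(clusters)

urls = [
    "http://example.com/post/101",
    "http://example.com/post/202",
    "http://example.com/about",
]
clusters = cluster_by_pattern(urls)
print(sorted(clusters))
# ['http://example.com/about', 'http://example.com/post/[N]']
```

No page content is fetched at any point, which is what makes such URL-only clustering cheap compared with crawling-based classification.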
Lecture Notes in Computer Science, 2001
This paper describes automatic Web page classification using machine learning methods. Recently, the importance of portal site services, including search engine functionality on the World Wide Web, has been increasing. In particular, portal sites such as Yahoo!, whose service hierarchically classifies Web pages into many categories, are becoming popular. However, the classification of Web pages into each category relies exclusively on manpower, which costs much time and care. To alleviate this problem, we propose techniques to generate attributes using co-occurrence analysis and to classify Web pages automatically based on machine learning. We apply these techniques to Web pages on Yahoo! JAPAN and construct decision trees that determine the appropriate category for each Web page. The performance of the proposed method is evaluated in terms of error rate, recall, and precision. The experimental evaluation demonstrates that this method provides high accuracy in classifying Web pages into the top-level categories on Yahoo! JAPAN.