Web Page Classification: A Soft Computing Approach

Ribeiro, Angela; Fresno, Víctor; Garcia-Alegre, María C.; Guinea, Domingo

2003

10 pages

Abstract

The Internet makes it possible to share and manipulate a vast quantity of information efficiently and effectively, but the rapid and chaotic growth of the Net has produced a poorly organized environment that hinders the sharing and mining of useful data. Meaningful web-page classification techniques are therefore urgently needed. This paper describes a novel approach to web-page classification based on a fuzzy representation of web pages. Each page is represented by doublets that associate a weight with each of the most representative words of the web document, so as to characterize the word's relevance in the document. This weight is derived by taking advantage of the characteristics of the HTML language. A fuzzy-rule-based classifier is then generated by a supervised learning process that uses a genetic algorithm to search for the minimum fuzzy-rule set that best covers the training examples. The proposed system has been demonstrated with two significantly different classes of web pages.
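As a rough illustration of the doublet idea, the sketch below builds (word, weight) pairs in which words inside certain HTML tags count for more. It is not the paper's weighting scheme: the tag weights, the `WordWeigher` class, and the `doublets` function are all hypothetical names and values chosen only to make the idea concrete.

```python
from collections import defaultdict
from html.parser import HTMLParser
import re

# Hypothetical tag weights, chosen for illustration only (not from the paper).
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "b": 1.5, "strong": 1.5, "em": 1.5}
SKIPPED_TAGS = {"script", "style"}


class WordWeigher(HTMLParser):
    """Accumulates a score per word, boosted by the enclosing HTML tags."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.scores = defaultdict(float)

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Close the most recent matching tag, tolerating unclosed inner tags.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i] == tag:
                del self.open_tags[i:]
                break

    def handle_data(self, data):
        if any(t in SKIPPED_TAGS for t in self.open_tags):
            return
        # Use the strongest boost among the enclosing tags (default 1.0).
        boost = max((TAG_WEIGHTS.get(t, 1.0) for t in self.open_tags), default=1.0)
        for word in re.findall(r"[a-z]+", data.lower()):
            self.scores[word] += boost


def doublets(html, top_n=5):
    """Return the top_n (word, weight) pairs, weights scaled to [0, 1]."""
    parser = WordWeigher()
    parser.feed(html)
    parser.close()
    top = max(parser.scores.values(), default=1.0)
    ranked = sorted(parser.scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(word, score / top) for word, score in ranked[:top_n]]


page = ("<html><head><title>Fuzzy classifier</title></head>"
        "<body><p>A fuzzy <b>rule</b> set. A rule.</p></body></html>")
for word, weight in doublets(page):
    print(word, weight)
```

A real system would also remove stop words and apply stemming before ranking; both are omitted here to keep the sketch short.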

Victor Fresno

This paper addresses the problem of finding an adequate representation of a web page for subsequent classification and data mining. The approach focuses on the textual part of web pages, which is represented by a two-dimensional vector. The vector components are sorted by the relevance of each word in the text. Two approaches to computing word relevance, analytical and fuzzy, are presented; both take advantage of characteristics of the HTML language. The two models are compared in learning and classification tasks to evaluate the suitability of each approach. The experiments show a clear improvement of the fuzzy method over the analytical one. Both approaches are general, in the sense that any characteristic of web pages could easily be integrated without additional cost.
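The contrast between an analytical and a fuzzy relevance measure can be sketched as follows. This is a minimal sketch under stated assumptions, not the models evaluated in the paper: the three input criteria (`freq`, `title`, `emphasis`, each already scaled to [0, 1]), the linear coefficients, the ramp membership functions, and the three rules are all invented for illustration. The fuzzy variant uses min for AND, max for OR, and a weighted average of constant rule outputs.

```python
def high(x):
    """Membership in the fuzzy set 'high' for x in [0, 1] (linear ramp)."""
    return min(1.0, max(0.0, x))


def low(x):
    """Membership in the fuzzy set 'low', the complement of 'high'."""
    return 1.0 - high(x)


def analytical_relevance(freq, title, emphasis, coeffs=(0.5, 0.3, 0.2)):
    """Analytical variant: a fixed linear combination (hypothetical coefficients)."""
    a, b, c = coeffs
    return a * freq + b * title + c * emphasis


def fuzzy_relevance(freq, title, emphasis):
    """Fuzzy variant: three hypothetical rules with constant outputs."""
    # Rule 1: IF title is high OR (freq is high AND emphasis is high)
    #         THEN relevance is 1.0
    r1 = max(high(title), min(high(freq), high(emphasis)))
    # Rule 2: IF freq is low AND title is low THEN relevance is 0.0
    r2 = min(low(freq), low(title))
    # Rule 3: IF freq is high AND emphasis is low THEN relevance is 0.5
    r3 = min(high(freq), low(emphasis))

    rules = [(r1, 1.0), (r2, 0.0), (r3, 0.5)]
    total = sum(strength for strength, _ in rules)
    if total == 0.0:
        return 0.0
    # Weighted average of rule outputs by firing strength.
    return sum(strength * out for strength, out in rules) / total


# A frequent word that appears neither in the title nor emphasized.
print(analytical_relevance(1.0, 0.0, 0.0))
print(fuzzy_relevance(1.0, 0.0, 0.0))
```

Adding a new page characteristic means adding one more term in the analytical variant, or one more membership input and rule in the fuzzy variant, which is one way to read the generality claim above.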
