2010, 2010 10th International Conference on Hybrid Intelligent Systems
6 pages
This work integrates a fuzzy method with text mining so that the classification of text documents comes closer to the user's needs. The aim is to develop a mechanism that reduces the high dimensionality of the attribute-value matrix obtained from the documents and then uses fuzzy rules to manage imprecision and uncertainty when classifying text documents. Experiments were run on different domains to validate the proposed approach and to compare its results with those of the IBk, J48, Naive Bayes and OneR classification methods. The advantages of the method, the experiments and the results obtained are discussed.
mirlabs.org
In this work, we present a method that generates fuzzy rules from text documents; the rules are used to classify documents and to improve information retrieval. With this method, we address the dimensionality problem of text documents in information retrieval. We also present a comparative analysis of the proposed method and well-known machine learning methods for classification. The aim of our work is to develop a mechanism that reduces the high dimensionality of the attribute-value matrix obtained from the documents and, consequently, allows the proposed classifier to scale up. Experiments were run on different domains to validate the proposed approach and to compare its results with those of the OneR, K-Nearest Neighbor, C4.5, Multi-variable Naive Bayes, and SVM methods. The experiments and the results obtained show that this is a promising approach to the dimensionality problem of documents in information retrieval.
Text categorization is mostly used to label documents automatically with a predefined set of topics, and a large number of advanced machine learning algorithms have been applied to it. This work proposes a fuzzy rule combined with a Bayesian classification method for automatic text categorization using class-specific features. The proposed method selects a particular feature subset for each class and then applies these class features to the classification. To achieve this, Baggenstoss's PDF Projection Theorem is used to reconstruct the PDF in the raw data space from the class-specific PDF in the low-dimensional feature space and to build the fuzzy-based Bayes classification rule. A notable advantage of this method is that most feature selection criteria, such as information gain and maximum discrimination, can be easily incorporated into it. The classification performance of the proposed method is evaluated on different datasets and compared with that of different feature selection methods. The experimental results illustrate the effectiveness of the proposed method and indicate that it is widely applicable in text categorization.
Advances in Intelligent Systems and Computing, 2015
In this paper, a supervised automatic text documents classification using the fuzzy decision trees technique is proposed. Whatever the algorithm used in the fuzzy decision trees, there must be a criterion for the choice of discriminating attribute at the nodes to partition. For fuzzy decision trees two heuristics are usually used to select the discriminating attribute at the node to partition. In the field of text documents classification there is a heuristic that has not yet been tested. This paper tested this heuristic.
International Journal of Data Mining & Knowledge Management Process, 2012
In the current era of rapid technological advancement, efficient and effective classification of text documents into mutually exclusive categories is a challenging and much-needed area of work. Fuzzy similarity provides a way to measure the similarity of features among documents. This paper gives a technical review of various models based on fuzzy similarity. The models are discussed and compared to outline their use and necessity. A tour of different methodologies built on fuzzy similarity is provided, showing how text and web documents are categorized efficiently into different categories. Various experimental results of these models are also discussed. The technical comparisons of each model's parameters are shown in the form of a 3-D chart. This study and technical review provide a strong base for research on fuzzy similarity based text document categorization.
International Journal of Mathematical Sciences and Computing, 2015
Document clustering is an integral and important part of text mining. There are two types of clustering: hard clustering and soft clustering. In hard clustering, a data item belongs to only one cluster, whereas in soft clustering a data point may fall into more than one cluster. Soft clustering thus leads to fuzzy clustering, in which each data point is associated with a membership function that expresses the degree to which the point belongs to each cluster. Accuracy is desired in information retrieval, and the work argues that fuzzy clustering can help achieve it. In the work presented here, a fuzzy approach to text classification is used to assign documents to appropriate clusters using the Fuzzy C Means (FCM) clustering algorithm. The Enron email dataset is used for the experiments. Using the FCM clustering algorithm, emails are grouped into different clusters. The results obtained are compared with the output produced by the k-means clustering algorithm. The comparative study showed that the fuzzy clusters are more appropriate than hard clusters.
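The membership idea behind Fuzzy C Means can be illustrated with a minimal sketch. This is not the implementation used in the paper above: the function name `fuzzy_c_means`, the fuzzifier `m = 2.0`, the random initialisation and the toy two-dimensional points are all assumptions made for illustration, and real document vectors would be far higher-dimensional.

```python
import random

def fuzzy_c_means(points, c=2, m=2.0, iters=100, seed=0):
    """Illustrative FCM: returns (centers, memberships).

    points: list of equal-length numeric vectors.
    c: number of clusters; m: fuzzifier (> 1), an assumed default.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each row is normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Centers are membership-weighted means of the points.
        centers = []
        for j in range(c):
            weights = [u[i][j] ** m for i in range(n)]
            wsum = sum(weights)
            centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / wsum
                for d in range(dim)
            ])
        # Memberships are recomputed from relative distances to the centers.
        for i, p in enumerate(points):
            dists = [
                max(sum((a - b) ** 2 for a, b in zip(p, ctr)) ** 0.5, 1e-12)
                for ctr in centers
            ]
            for j in range(c):
                u[i][j] = 1.0 / sum(
                    (dists[j] / dk) ** (2.0 / (m - 1.0)) for dk in dists
                )
    return centers, u

# Toy data: two well-separated groups of points.
toy_points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
toy_centers, toy_u = fuzzy_c_means(toy_points, c=2)
toy_labels = [row.index(max(row)) for row in toy_u]
```

Unlike k-means, every point keeps a degree of membership in every cluster; taking the largest membership, as `toy_labels` does, recovers a hard assignment when one is needed.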
Journal of Computer Science, 2016
The ever-increasing amount of information on the Web is organized as structured, semi-structured and unstructured data. Text classification systems capable of handling such different structures may facilitate important tasks such as indexing and information retrieval in search engines. The objective of this research is to develop a method for classifying documents into multiple categories with fuzzy logic. The method was built from a pattern recognition process and uses two variables, called similarity and accuracy, which express how a document compares against a database of terms. The database of terms is generated from a collection of documents pre-classified into categories of interest. The documents, processed according to their similarity and accuracy with respect to the database of terms, compose a training set, also called the knowledge base. From this knowledge base, it is possible to identify a pattern that specifies a set of rules through a knowledge discovery process, which involves data mining of the knowledge base. Thus, it was possible to define a general model that is used to create the rules and membership functions of the fuzzy model for classifying documents into multiple categories. The general model of the rules identified in the data mining process and implemented in the fuzzy model considers the most significant variables and also contributes to the specification of the membership functions, such as the definition of the linguistic terms of the fuzzy sets. Thus, it was possible to implement a more deterministic approach regarding the input, membership functions and inference rules of the fuzzy model. The authors consider the results of the proposed method relevant because it achieves a satisfactory accuracy rate.
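A fuzzy rule over the two variables named in this abstract can be sketched as follows. The abstract does not give the membership functions or the rules, so everything here is hypothetical: the ramp-shaped "high" set, its breakpoints 0.4 and 0.8, the single rule "IF similarity is high AND accuracy is high THEN the document belongs to the category", and the 0.5 cut-off.

```python
def ramp_up(x, low, high):
    """Membership in a 'high' fuzzy set: 0 below `low`, 1 above `high`."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)

def rule_strength(similarity, accuracy):
    """Firing strength of the assumed rule, using min as the fuzzy AND."""
    return min(ramp_up(similarity, 0.4, 0.8), ramp_up(accuracy, 0.4, 0.8))

def classify(scores, cutoff=0.5):
    """Return every category whose rule fires at or above `cutoff`.

    scores: dict mapping category -> (similarity, accuracy), hypothetical inputs.
    """
    return sorted(
        cat for cat, (sim, acc) in scores.items()
        if rule_strength(sim, acc) >= cutoff
    )

# Hypothetical scores of one document against two categories.
doc_scores = {"sports": (0.9, 0.7), "politics": (0.5, 0.9)}
doc_categories = classify(doc_scores)
```

Because each category is tested independently, a document can be assigned to several categories at once, which matches the multi-category goal described above.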
2014
Text categorization is the task of classifying text documents into one or more predefined categories based on their contents. The proposed system consists of three main steps: text document representation, classifier construction and performance evaluation. In the first step, a set of pre-classified text documents is provided. Each text document is first preprocessed to split it into features; these features are weighted by their frequency in that document, and the non-informative features are eliminated. The remaining features are then standardized by reducing each feature to its root through stemming. Because the number of features remains large even after non-informative feature removal and stemming, the proposed system applies specific thresholds to extract the distinct features that represent the text document. In the second step, the text categorization model (classifier) is built by learning the distinct features that represent all the pre-classified text documents for each sub-category of the main categories; this is achieved with a supervised categorization technique, rough set theory. The model then uses a pair of precise concepts from that theory, the lower and upper approximations, to classify any test text document into one or more of the main categories and sub-categories. In the final step, the performance of the proposed system is evaluated. It achieved good results, up to 96%, when applied to a number of test text documents for each sub-category of the main categories.
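The lower and upper approximations mentioned above can be illustrated with a short sketch. It assumes, for illustration only, that two documents are indiscernible when they share exactly the same set of extracted features; the function name, the document ids and the feature signatures are made up and are not taken from the paper.

```python
from collections import defaultdict

def approximations(signatures, target):
    """Rough-set lower and upper approximations of a category.

    signatures: dict mapping doc id -> hashable feature signature.
    target: set of doc ids known to belong to the category.
    """
    # Group documents that cannot be told apart by their features.
    classes = defaultdict(set)
    for doc, sig in signatures.items():
        classes[sig].add(doc)
    lower, upper = set(), set()
    for members in classes.values():
        if members <= target:   # whole class inside the category
            lower |= members
        if members & target:    # class overlaps the category
            upper |= members
    return lower, upper

# Hypothetical toy collection: d1 and d2 share the same features.
toy_signatures = {
    "d1": ("goal", "match"),
    "d2": ("goal", "match"),
    "d3": ("vote",),
    "d4": ("goal",),
}
sports = {"d1", "d4"}
sports_lower, sports_upper = approximations(toy_signatures, sports)
```

Here `d4` certainly belongs to the category (lower approximation), while `d1` and `d2` only possibly belong (upper approximation), because their features do not distinguish them and only one of them is labelled as sports.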
Lecture Notes in Computer Science, 2006
In this paper we develop the general framework for text representation based on fuzzy set theory.
Arxiv preprint arXiv:1009.4994, 2010
As the amount of online text increases, the demand for text categorization to aid the analysis and management of text is increasing. Text is cheap, but information, in the form of knowing what classes a text belongs to, is expensive. Automatic categorization of text can provide ...
In this paper, fuzzy association rules are used in a text framework. Text transactions are defined based on the concept of fuzzy association rules considering each attribute as a term of a collection. The purpose of the use of text mining technologies presented in this paper is to assist users to find relevant information. The system helps the user to formulate queries by including related terms to the query using fuzzy association rules. The list of possible candidate terms extracted from the rules can be added automatically to the original query or can be shown to the user who selects the most relevant for her/his preferences in a semi-automatic process.
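One common way to define fuzzy association rules over text can be sketched as follows; it is not necessarily the formulation used in the paper. The sketch assumes each document is a "text transaction" mapping terms to membership degrees in [0, 1], takes the support of a term set as the average of the minimum membership per document, and defines confidence as a ratio of supports. The weights, the threshold of 0.6 and the function names are hypothetical.

```python
def support(docs, terms):
    """Fuzzy support: mean over documents of the minimum term membership."""
    return sum(min(d.get(t, 0.0) for t in terms) for d in docs) / len(docs)

def confidence(docs, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent (0 if unsupported)."""
    base = support(docs, antecedent)
    if base == 0:
        return 0.0
    return support(docs, antecedent | consequent) / base

def expand_query(docs, query, threshold=0.6):
    """Add every term implied by some query term with enough confidence."""
    vocabulary = set().union(*docs)
    added = {
        cand
        for term in query
        for cand in vocabulary - query
        if confidence(docs, {term}, {cand}) >= threshold
    }
    return query | added

# Hypothetical text transactions with made-up membership degrees.
toy_docs = [
    {"fuzzy": 1.0, "logic": 0.8},
    {"fuzzy": 0.6, "logic": 0.6, "set": 0.4},
    {"mining": 1.0, "text": 0.9},
]
expanded = expand_query(toy_docs, {"fuzzy"})
```

In the semi-automatic mode described above, the candidate terms would be shown to the user instead of being merged into the query directly.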
Computational Science and Its …, 2004
Ninth IEEE International Conference on Fuzzy Systems. FUZZ- IEEE 2000 (Cat. No.00CH37063), 2000
International Journal of Intelligent Systems and Applications