Data mining is used extensively in many sectors today, viz., business, health, security, and informatics. The successful application of data mining algorithms can be seen in marketing, retail, and other sectors of industry. The aim of this paper is to present readers with data mining algorithms that have wide applications. This paper focuses on four data mining algorithms: K-NN, the Naïve Bayes classifier, decision trees, and C4.5. The four algorithms are compared on the basis of their theory, their advantages and disadvantages, and their applications. After studying these algorithms in detail, we conclude that the accuracy of these techniques depends on several characteristics, such as the type of problem, the dataset, and the performance metric.
International Journal of Applied Information Systems, 2014
Web search users usually submit short and ambiguous queries to specify their requirements. Query expansion is used to improve retrieval performance for such queries: it is an effective way to improve the performance of information retrieval systems by adding relevant terms to the original query. As a search engine is used, a large amount of data accumulates, including the queries that were used to retrieve documents; this data is stored as a query log. Query logs provide valuable information for extracting relationships between queries and documents that can be used in query expansion. This paper proposes a method that first identifies ambiguous queries using the Kullback-Leibler distance model, which measures the difference between two probability distributions. Second, the most suitable expansion terms are selected from the documents by analyzing the relation between queries and documents. This relation can be evaluated by calculating a frequency coefficient with respect to the document and the document collection.
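As a rough illustration of the Kullback-Leibler measure mentioned in the abstract, the sketch below computes the divergence between two discrete probability distributions. The abstract does not say which distributions the paper compares or how they are estimated, so the function name `kl_divergence` and the example values (a query-term distribution against a collection distribution) are assumptions made only for illustration.

```python
import math


def kl_divergence(p, q):
    """Return D_KL(P || Q) = sum_i p_i * log(p_i / q_i), in nats.

    p and q are sequences of probabilities over the same events.
    Terms with p_i == 0 contribute nothing; if q_i == 0 while
    p_i > 0, the divergence is infinite.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue  # 0 * log(0 / q) is taken as 0 by convention
        if q_i == 0.0:
            return math.inf
        total += p_i * math.log(p_i / q_i)
    return total


# Hypothetical term distributions, invented for this example.
query_dist = [0.7, 0.2, 0.1]
collection_dist = [0.4, 0.4, 0.2]
divergence = kl_divergence(query_dist, collection_dist)
```

The measure is not symmetric, so `kl_divergence(p, q)` and `kl_divergence(q, p)` generally differ, and it is zero only when the two distributions are identical. How a divergence value is mapped to "ambiguous" or "not ambiguous" depends on the paper's model and is not shown here.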
Papers by Lynette Lopes