Clustering plays an important role in data mining. Its main job is to divide data into groups: similar data are placed in one cluster and dissimilar data in another. A major problem in clustering, however, is handling outliers. Outliers occur because of mechanical faults, changes in system behaviour, human error, or natural deviations. Outlier detection refers to the problem of finding patterns in data that do not conform to expected normal behaviour. A variety of algorithms are used to address the problem of outliers, and they are the subject of this paper. This paper explores the behaviour of some clustering algorithms on different types of datasets, along with methods for handling outliers.
Data mining, in general, deals with the discovery of non-trivial, hidden and interesting knowledge from different types of data. With the development of information technologies, the number of databases, as well as their dimension and complexity, grows rapidly. Automated analysis of large amounts of information is therefore necessary. The analysis results are then used by a human or a program to make decisions. One of the basic problems of data mining is outlier detection. The outlier detection problem is in some cases similar to the classification problem. For example, the main concern of clustering-based outlier detection algorithms is to find clusters and outliers, where outliers are often regarded as noise that should be removed to make the clustering more reliable. In this thesis, the ability to detect outliers is improved by combining the perspectives of outlier detection and cluster identification. The proposed work compares four methods: k-means, k-medoids, iterative k-means and a density-based method. Unlike traditional clustering-based methods, the proposed algorithm provides more efficient outlier detection and data clustering in the presence of outliers, which is why the comparison has been made. The purpose of our method is not only to cluster the data but also to find outliers in the resulting clusters. The goal is to model an unknown nonlinear function based on observed input-output pairs. The whole simulation of the proposed work was carried out in the MATLAB environment.
2014
Outlier detection is a fundamental issue in data mining; specifically, it has been used to detect and remove anomalous objects from data. Outliers arise due to mechanical faults, changes in system behaviour, fraudulent behaviour, network intrusions or human errors. First, this thesis presents a theoretical overview of outlier detection approaches. A novel outlier detection method, called the Clustering Outlier Removal (COR) algorithm, is then proposed and analyzed. It provides efficient outlier detection and data clustering in the presence of outliers and is based on filtering the data after the clustering process. The algorithm is divided into two stages. The first stage runs the k-means process. The main objective of the second stage is the iterative removal of objects that are far away from their cluster centroids. The removal occurs according to a chosen threshold. Finally, we provide experimental results from the application of our algori...
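The two stages described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' COR implementation: the abstract does not state the removal rule, so the sketch assumes that a point is removed when its distance to its centroid exceeds `threshold` times the mean distance within its cluster. The function names `kmeans` and `remove_far_points` and all parameters are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd-style k-means on tuples; returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its members (keep it if empty).
        centroids = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
    return centroids, clusters


def remove_far_points(points, k, threshold=3.0, max_rounds=10):
    """Stage 1: cluster. Stage 2: iteratively drop points far from centroids.

    Assumed rule (not from the source): drop a point when its distance to
    its centroid exceeds `threshold` times the cluster's mean distance.
    """
    kept, removed = list(points), []
    for _ in range(max_rounds):
        if len(kept) <= k:
            break
        centroids, clusters = kmeans(kept, k)
        flagged = []
        for centroid, cluster in zip(centroids, clusters):
            if len(cluster) < 2:
                continue
            dists = [math.dist(p, centroid) for p in cluster]
            mean_dist = sum(dists) / len(dists)
            flagged += [p for p, d in zip(cluster, dists)
                        if mean_dist > 0 and d > threshold * mean_dist]
        if not flagged:
            break  # nothing left to remove
        removed += flagged
        kept = [p for p in kept if p not in flagged]
    return kept, removed
```

Calling `remove_far_points(data, k=2)` returns the retained points and the removed ones. Re-clustering after each removal round is a design choice of this sketch, made so that the centroids are no longer pulled toward points that have already been discarded.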
Applied Mechanics and Materials, 2014
Outlier detection is one of the major issues that has been studied in depth within the data mining domain. It has been used to detect observations that are dissimilar from the rest of the data under consideration. Detecting outliers helps to recognize system faults and thereby helps administrators take preventive measures before the faults escalate. In this paper, we present a comprehensive survey of outlier detection. We anticipate that this survey will support a better understanding of the various directions in which experimental work can be done on this topic.
2011 Second International Conference on Emerging Applications of Information Technology, 2011
In this paper we propose a clustering-based method to capture outliers. We apply the K-means clustering algorithm to divide the data set into clusters. Points lying near the centroid of a cluster are not probable outlier candidates, so we prune such points from each cluster. Next, we calculate a distance-based outlier score for the remaining points. The pruning considerably reduces the computation needed to calculate the outlier scores. Based on the outlier score, we declare the top n points with the highest scores as outliers. Experimental results on a real data set demonstrate that, even though the number of computations is lower, the proposed method performs better than the existing method.
Data mining is the extraction of hidden predictive information from large databases. It is a technology with the potential to study and analyze useful information present in data. Data objects that do not fit the general behavior of the data are termed outliers. Outlier detection in databases has numerous applications, such as fraud detection, customized marketing, and the search for terrorist activity. By definition, outliers are rare occurrences and hence represent a small portion of the data. However, using outlier detection for these purposes is not an easy task. This research proposes a modified PAM (Partitioning Around Medoids) for detecting outliers. The proposed technique has been implemented in Java. The results produced by the proposed technique are better than those of the existing technique in terms of outliers detected and time complexity.
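The prune-then-score procedure in this abstract can be sketched in Python as follows. The sketch assumes the clusters have already been produced by some k-means run and are passed in as lists of points. Two details are assumptions, since the abstract does not give them: a point is pruned when it lies closer to its centroid than the cluster's mean distance, and the outlier score is the mean distance to the `n_neighbors` nearest points in the whole data set. The name `top_outliers` is hypothetical.

```python
import math


def top_outliers(clusters, n_top=1, n_neighbors=3):
    """Prune points near each centroid, score the rest, return the top n.

    `clusters` is a list of clusters, each a list of coordinate tuples,
    and the data set is assumed to contain at least two points.
    """
    points = [p for cluster in clusters for p in cluster]
    candidates = []  # indices into `points` that survive pruning
    start = 0
    for cluster in clusters:
        if not cluster:
            continue
        centroid = tuple(sum(col) / len(cluster) for col in zip(*cluster))
        dists = [math.dist(p, centroid) for p in cluster]
        radius = sum(dists) / len(dists)  # assumed pruning radius
        candidates += [start + i for i, d in enumerate(dists) if d >= radius]
        start += len(cluster)

    # Assumed score: mean distance to the n_neighbors nearest other points.
    scores = {}
    for i in candidates:
        nearest = sorted(
            math.dist(points[i], q) for j, q in enumerate(points) if j != i
        )[:n_neighbors]
        scores[i] = sum(nearest) / len(nearest)

    ranked = sorted(candidates, key=scores.get, reverse=True)
    return [points[i] for i in ranked[:n_top]]
```

Because the neighbour search runs only for the unpruned candidates, the number of distance computations shrinks roughly in proportion to the fraction of points pruned, which is the saving the abstract refers to.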
International Journal of Computer Applications, 2013
Data mining is used to extract useful information from a collection of databases or data warehouses. In recent years, data mining has become an important field. This paper surveys data mining and the techniques, such as clustering, that are used to extract useful information, as well as the techniques used to detect outliers. It also presents various techniques used by different researchers to detect outliers and present efficient results to the user.
This research paper deals with outliers, understood as unusual behavior of an object relative to its surroundings. Outlier detection can be employed for both anomaly detection and the identification of abnormal observations. An outlier is identified by comparison with the other members of the same data set. The deviation of an outlier can be quantified by measuring attributes such as range, size, and activity. Detecting outliers makes it easier to reject adverse observations in the field. For instance, in healthcare, the health condition of a person can be determined from his latest health report or his regular activity. If the person is found to be inactive, there is a chance that the person is sick. Two approaches are used in this paper for detecting outliers: 1) a centroid-based approach built on the K-Means and hierarchical clustering algorithms and 2) a clustering-based approach. The latter may help in detecting outliers by grouping all similar elements in the same group; the clustering method provides the means for this grouping. This research paper is based on these two approaches.
International Journal of Computer Applications, 2012
Outlier detection is a fundamental issue in data mining; specifically, it has been used to detect and remove anomalous objects from data. The proposed approach to detecting outliers comprises three steps: clustering, pruning and computing an outlier score. For clustering, the k-means algorithm is used, which partitions the dataset into a given number of clusters. In pruning, based on some distance measure, points that are close to the centroid of each cluster are pruned. For the unpruned points, the local distance-based outlier factor (LDOF) is calculated. LDOF tells how much a point deviates from its neighbors. A high LDOF value indicates that the point deviates more from its neighbors and is probably an outlier.
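As a rough illustration of the LDOF measure mentioned in this abstract, the Python sketch below uses the commonly cited definition: the average distance from a point to its k nearest neighbours, divided by the average pairwise distance among those neighbours. The abstract itself does not spell out the formula, so this definition, the function name `ldof`, and its parameters are assumptions of the sketch.

```python
import math
from itertools import combinations


def ldof(points, index, k):
    """LDOF of points[index] with respect to its k nearest neighbours (k >= 2)."""
    if k < 2:
        raise ValueError("k must be at least 2 to form neighbour pairs")
    p = points[index]
    others = [q for j, q in enumerate(points) if j != index]
    neighbors = sorted(others, key=lambda q: math.dist(p, q))[:k]

    # Average distance from p to its k nearest neighbours.
    knn_dist = sum(math.dist(p, q) for q in neighbors) / len(neighbors)

    # Average pairwise distance among the neighbours themselves.
    pairs = list(combinations(neighbors, 2))
    inner_dist = sum(math.dist(a, b) for a, b in pairs) / len(pairs)
    if inner_dist == 0:
        return math.inf  # neighbours coincide; treat p as maximally deviating
    return knn_dist / inner_dist
```

A value close to 1 means the point sits about as far from its neighbours as they sit from each other, while a value well above 1 marks a point that stands apart from its neighbourhood.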
International Journal of Advance Research and Innovative Ideas in Education, 2017
Outlier detection is a crucial task in data mining that aims to detect outliers in a given data set. A data item is said to be an outlier if it appears inconsistent with the remaining data. Outliers are generated by improper measurements, data entry errors, or data arriving from different sources than the remaining data. Outlier detection is the technique that discovers such data in the given data set. Several outlier detection techniques have been introduced that require input parameters from the user, such as a distance threshold or a density threshold. The goal of the proposed work is to partition the input data set into a number of clusters using the Unsupervised Extreme Learning Machine algorithm. The clusters are then given as input to the outlier detection methods, namely the cluster based outlier detection algorithm and outlier detection algorithm. These methods detect outliers in each cluster. This work aims at studying cluster based out...
International Journal of Computer Applications, 2013
International Journal of Computer Applications, 2015
International Journal of Modern Education and Computer Science, 2015
WSEAS Transactions on …, 2010
International Journal of Engineering Sciences and Research Technology, 2016
PeerJ Computer Science
World Journal of Computer Application and Technology
International Journal of Database Theory and Application, 2016