2006, IRAQI JOURNAL OF STATISTICAL SCIENCES
…
18 pages
In this paper, we attempt to identify outliers and pinpoint their source using the Box-Whisker plot technique, an effective approach for detecting and treating outliers. The researchers conclude that the Box-Whisker plot is the most effective of the methods used in this research, which confirms the hypothesis of the paper.
2016
Tukey's boxplot is a very popular tool for detecting outliers. It reveals the location, spread, and skewness of the data, and it works well for outlier detection when the data are symmetric. When the data are skewed, its fence lies far beyond the whisker on the compressed side, while it declares erroneous outliers on the extended side of the distribution. Hubert and Vandervieren (2008) adjusted Tukey's technique to overcome this problem. However, another problem arises: the adjusted boxplot can construct an interval of critical values that extends beyond the extremes of the data, in which case it is unable to detect outliers. This paper proposes a solution to this problem, and the proposed approach detects outliers properly. The validity of the technique has been checked by constructing fences around the true 95% values of different distributions. A simulation study was carried out by drawing samples of different sizes from chi-square, beta, and lognormal distributions. The fences constructed by the modified technique are closer to the true 95% values than those of the adjusted boxplot, which indicates its superiority over the existing technique.
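The fence rules discussed in this abstract can be sketched in a few lines. This is a minimal illustration, not the paper's modified technique: the function names are hypothetical, the quartiles use Python's "inclusive" quantile method (one convention among several), and the medcouple `mc` is assumed to be computed elsewhere and passed in. The exponential factors are the ones commonly given for the Hubert and Vandervieren adjustment; the abstract itself does not state them.

```python
import math
import statistics


def quartiles(data):
    """Return (Q1, Q3) using the inclusive quantile method (an assumption)."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q1, q3


def tukey_fences(data, k=1.5):
    """Classical Tukey fences: [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def adjusted_fences(data, mc, k=1.5):
    """Skewness-adjusted fences; mc is the medcouple, supplied by the caller."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    if mc >= 0:
        # Right-skewed data: pull the lower fence in, push the upper fence out.
        return (q1 - k * math.exp(-4 * mc) * iqr,
                q3 + k * math.exp(3 * mc) * iqr)
    # Left-skewed data: the mirror image of the case above.
    return (q1 - k * math.exp(-3 * mc) * iqr,
            q3 + k * math.exp(4 * mc) * iqr)


def outside(data, fences):
    """Return the observations that fall outside the given (low, high) fences."""
    low, high = fences
    return [x for x in data if x < low or x > high]


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(tukey_fences(data))
print(outside(data, tukey_fences(data)))
```

With a medcouple of zero the adjusted fences reduce to Tukey's fences, so the two rules agree on symmetric data. For a strongly skewed sample the outer fence grows exponentially in the medcouple, which is how it can end up beyond the largest observation, the situation the abstract describes.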
2021
Many real-world phenomena generate data sets with outliers, i.e., extreme observations that lie far from the bulk of the data. The presence of outliers may invalidate an analysis by violating the conventional assumptions of regression models. Hence the identification of outliers is of significant importance in data analysis. This study reviews various outlier labeling methods and compares their detection of outliers by applying them to several real data sets with small to large sample sizes and low to high levels of skewness. Some graphical and formal methods of univariate outlier detection are also applied. For symmetric data, no labeling method except the adjusted boxplot detected any outliers. For slightly skewed distributions, the Z-score, 3SD method, and 3IQR were resistant for both small and large sample sizes, whereas the adjusted boxplot was resistant in large data sets only. In the case of mildly skewed data with a large sample size, the 2Median Absolute Deviation method shown...
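Two of the labeling rules named in this abstract can be sketched as follows. This is an illustrative sketch rather than the study's own code: the function names are hypothetical, the 3SD rule is taken to flag points farther than three sample standard deviations from the mean, and the 2MAD rule is assumed to flag points farther than two scaled MADs from the median, with the usual consistency factor 1.4826 (the abstract does not spell out its exact definition).

```python
import statistics


def sd_rule(data, k=3.0):
    """Flag points farther than k sample standard deviations from the mean."""
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)
    return [x for x in data if abs(x - mean) > k * sd]


def mad_rule(data, k=2.0, scale=1.4826):
    """Flag points farther than k scaled MADs from the median.

    The scale factor 1.4826 makes the MAD comparable to the standard
    deviation under normality; it is an assumed convention here.
    """
    med = statistics.median(data)
    mad = statistics.median([abs(x - med) for x in data])
    return [x for x in data if abs(x - med) > k * scale * mad]


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(sd_rule(data))
print(mad_rule(data))
```

On this small sample the value 100 inflates both the mean and the standard deviation enough that the 3SD rule flags nothing, while the median-based rule flags it. That is one reason a comparison across sample sizes and skewness levels, as in this study, can rank the methods differently.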
International Journal of Engineering Sciences and Research Technology, 2016
Data mining refers to the extraction of interesting patterns from massive data sets. Outlier detection is an important part of data mining that finds the observations that deviate from normal expected behavior. Outlier detection and analysis is sometimes known as outlier mining. In this paper, we have attempted to give a broad and comprehensive literature survey of outliers and outlier detection techniques under one roof, to explain the richness and complexity associated with each outlier detection technique. Furthermore, we have also given a broad comparison of the various methods within the different outlier techniques. Outliers are the points that are different from or inconsistent with the rest of the data. They can be novel, new, abnormal, unusual, or noisy data. Outliers are sometimes more interesting than the majority of the data. The principle di...
2014
Outlier detection has many applications. An outlier is an instance of the data set that has exceptional characteristics compared to other instances and exhibits unusual behavior of the system. There are many methods for detecting outliers, and every method has its advantages and limitations. In this review paper, a relative comparison of a few statistical methods is carried out, showing which method is more efficient at detecting outliers. Keywords: outlier detection methods; mean; standard deviation; median absolute deviation; clever variance and clever mean
2016
This paper discusses two well-known procedures for detecting a single outlier as well as multiple outliers in linear regression in some considered situations. The procedures are the Rn statistic for a single outlier and the Marasinghe procedure for multiple outliers. We calculated the power of these test statistics using Monte Carlo simulation, and conclusions are drawn from the calculated results.
International Journal of Engineering Research and Technology (IJERT), 2013
https://www.ijert.org/outlier-detection-methods-an-analysis https://www.ijert.org/research/outlier-detection-methods-an-analysis-IJERTV2IS90377.pdf An outlier is an extreme observation that is considerably dissimilar from the rest of the objects. Outlier detection is helpful in many applications such as data cleaning, network intrusion detection, credit card fraud detection, telecom fraud detection, customer segmentation, and medical analysis. Outliers behave very differently from the rest of the observations in the data set. Outliers are mostly removed to improve the accuracy of predictions, but the presence of an outlier can also be meaningful. In our work we compare outlier detection techniques based on statistical, density-based, distance-based, and deviation-based methods. Keywords: outlier detection, statistical method, density based method, deviation based method, distance based method, artificial intelligence, fuzzy logic, neural network.
International Journal of Computer Applications, 2013
Outlier detection is an extremely important problem with direct application in a wide variety of domains. A key challenge with outlier detection is that it is not a well-formulated problem like clustering. This paper discusses different techniques and then compares them by analyzing different aspects, chiefly time complexity. Every unique problem formulation entails a different approach, resulting in a huge literature on outlier detection techniques. Several techniques have been proposed to target a particular application domain. Classifying outlier detection techniques by the applied knowledge discipline gives an idea of the research done by different communities and also highlights unexplored research avenues for the outlier detection problem. The behavior of the different techniques is discussed in this paper with respect to their nature. The feasibility of a technique in a particular problem setting also depends on other constraints. For example, statistical techniques assume knowledge about the underlying distribution characteristics of the data. Distance-based techniques are typically expensive and hence are not applied in scenarios where computational complexity is an important issue.
2015
This paper compares two approaches to identifying outliers in multivariate data sets: Mahalanobis distance (MD) and robust distance (RD). MD is known to suffer from masking and swamping effects, and RD is an approach developed to overcome the problems that arise with MD. The paper has two purposes: first, to identify outliers using MD and RD, and second, to show that RD performs better than MD in identifying outliers. An observation is classified as an outlier if its MD or RD is larger than a cut-off value. An outlier-generating model is used to generate a set of data, and MD and RD are computed from this set. The results showed that RD identifies outliers better than MD. However, on data without outliers the performance of both approaches is similar. The results for RD also showed that RD identifies multivariate outliers much better when the number of dimensions is large.
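The classical Mahalanobis distance described in this abstract can be sketched for the bivariate case without any external library. This is an assumption-laden illustration, not the paper's procedure: the function names are hypothetical, only two dimensions are handled, the cut-off is assumed to be a chi-square quantile with 2 degrees of freedom (the abstract only says "a cut-off value"), and the robust distance is not implemented because it needs a robust estimate of location and scatter.

```python
import math
import statistics


def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each (x, y) point from the sample mean,
    using the classical sample covariance matrix (divisor n - 1)."""
    n = len(points)
    mx = statistics.fmean(p[0] for p in points)
    my = statistics.fmean(p[1] for p in points)
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2  # determinant of the 2x2 covariance matrix
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Quadratic form with the closed-form inverse of a 2x2 matrix.
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances


def chi2_cutoff_2df(p=0.975):
    """Chi-square quantile for 2 degrees of freedom, which has a closed form."""
    return -2.0 * math.log(1.0 - p)


points = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5), (20, -5)]
d2 = mahalanobis_sq_2d(points)
cutoff = chi2_cutoff_2df()
flagged = [p for p, d in zip(points, d2) if d > cutoff]
print(d2, cutoff, flagged)
```

Because the outlying point is included in the mean and covariance estimates, it inflates them and shrinks its own distance. With only seven points the classical squared distance cannot exceed (n - 1)^2 / n, which is below the cut-off used here, so nothing is flagged. This is the masking effect that motivates the robust distance.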
2014
In many applications, outlier detection is an important task. In the process of Knowledge Discovery in Databases, isolating outlying data is important. This isolation improves the quality of the data and reduces the impact of outlying data on the existing values. Numerous methods are available for detecting outliers in univariate data sets, and most of them handle one outlier at a time. In this paper, Grubbs' statistic, the sigma rule, and the fence rules are used to deal with more than one outlier at a time. In general, when multiple outliers are present, some outliers prevent the detection of others. Hence, removing outliers as soon as they are found is an important task. Multiple outliers are evaluated on different data sets, and the results are shown to be effective. Separate procedures are used for detecting outliers in continuous and discrete data. Experimental results show that our method works well for different data.