2010, ACM Transactions on Knowledge Discovery from Data
Given a spatial dataset placed on an n × n grid, our goal is to find the rectangular regions within which subsets of the dataset exhibit anomalous behavior. We develop algorithms that, given any user-supplied arbitrary likelihood function, conduct a likelihood ratio hypothesis test (LRT) over each rectangular region in the grid, rank all of the rectangles based on the computed LRT statistics, and return the top few most interesting rectangles. To speed this process, we develop methods to prune rectangles without computing their associated LRT statistics.
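The rectangle-ranking step described above can be sketched as a brute-force scan. This is only an illustration under stated assumptions: the abstract allows any user-supplied likelihood function, so the Poisson (Kulldorff-style) log-likelihood ratio used here is a stand-in, the function names are hypothetical, and none of the paper's pruning is reproduced.

```python
import math

def prefix_sums(grid):
    """2D prefix sums so that any rectangle total costs O(1)."""
    n, m = len(grid), len(grid[0])
    ps = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(m):
            ps[i + 1][j + 1] = grid[i][j] + ps[i][j + 1] + ps[i + 1][j] - ps[i][j]
    return ps

def rect_sum(ps, r0, c0, r1, c1):
    """Sum over rows r0..r1 and columns c0..c1 (inclusive)."""
    return ps[r1 + 1][c1 + 1] - ps[r0][c1 + 1] - ps[r1 + 1][c0] + ps[r0][c0]

def poisson_llr(c, b, C, B):
    """Assumed example statistic: Poisson log-likelihood ratio for an
    elevated rate inside the region (count c, baseline b) versus the
    whole grid (count C, baseline B). Returns 0 when the rate is not elevated."""
    if b <= 0 or B - b <= 0 or c <= 0:
        return 0.0
    inside, outside = c / b, (C - c) / (B - b)
    if inside <= outside:
        return 0.0
    llr = c * math.log(inside) - C * math.log(C / B)
    if C - c > 0:
        llr += (C - c) * math.log(outside)
    return llr

def top_rectangles(counts, baselines, k=3, score=poisson_llr):
    """Score every axis-aligned rectangle and return the k best as
    (statistic, (r0, c0, r1, c1)) tuples. No pruning is attempted."""
    n, m = len(counts), len(counts[0])
    pc, pb = prefix_sums(counts), prefix_sums(baselines)
    C = rect_sum(pc, 0, 0, n - 1, m - 1)
    B = rect_sum(pb, 0, 0, n - 1, m - 1)
    results = []
    for r0 in range(n):
        for r1 in range(r0, n):
            for c0 in range(m):
                for c1 in range(c0, m):
                    c = rect_sum(pc, r0, c0, r1, c1)
                    b = rect_sum(pb, r0, c0, r1, c1)
                    results.append((score(c, b, C, B), (r0, c0, r1, c1)))
    results.sort(key=lambda t: t[0], reverse=True)
    return results[:k]
```

Passing a different `score` callable swaps in another likelihood model. The exhaustive loop visits O(n^4) rectangles on an n × n grid, which is the cost that motivates pruning.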
2009
Given a spatial data set placed on an n × n grid, our goal is to find the rectangular regions within which subsets of the data set exhibit anomalous behavior. We develop algorithms that, given any user-supplied arbitrary likelihood function, conduct a likelihood ratio hypothesis test (LRT) over each rectangular region in the grid, rank all of the rectangles based on the computed LRT statistics, and return the top few most interesting rectangles. To speed this process, we develop methods to prune rectangles without computing their associated LRT statistics.
Knowledge and Information Systems, 2012
A spatial anomaly captures a phenomenon occurring in a region whose behavior deviates strongly from that of the other, normal observations. In reality, however, such an anomaly may affect other phenomena in the region across multiple domains. For example, crime is often linked to other sociopolitical factors or phenomena such as poverty and education. Similarly, accidents in the region may be linked to environmental factors such as weather and surface condition. Finding anomalies across multiple domains is therefore important in various applications. In this paper, we propose an approach for finding such a tangible anomalous window across multiple domains, where a window refers to a set of contiguous points in space; since the window is multi-domain, there are several overlapping windows in the same space across domains. Our approach for finding an anomalous window across the domains comprises the following steps: (1) single-domain anomaly detection: discovering the anomalous window in each domain; (2) association rule mining: discovering relationships between the anomalous windows across domains using association rule mining; and (3) validation: validating the result using (a) Monte Carlo simulation, (b) correlation using lift, and (c) ground truth evaluation. In addition, we provide a probabilistic framework to evaluate the relationships between the spatial nodes as a postprocessing step. Finally, we provide a visualization technique for viewing the multi-domain anomalous window and the probabilistic relationships between the nodes. We provide detailed experimental results and comparisons with other approaches using real-world health ranking [51] and transportation datasets [50] with known ground truth windows. The results show that our approach is effective in finding anomalies in multiple domains as compared to other approaches.
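The lift measure mentioned in the validation step can be sketched as follows. This is a minimal illustration of the standard definition, lift(A → B) = P(A and B) / (P(A) · P(B)); the representation of each spatial node as a set of domain labels and the function name are assumptions, not the paper's data model.

```python
def lift(transactions, antecedent, consequent):
    """Lift of the rule antecedent -> consequent over a list of sets.

    Each transaction is assumed (hypothetically) to hold the domains in
    which one spatial node falls inside an anomalous window.
    """
    n = len(transactions)
    a = sum(1 for t in transactions if antecedent <= t)
    b = sum(1 for t in transactions if consequent <= t)
    ab = sum(1 for t in transactions if (antecedent | consequent) <= t)
    if n == 0 or a == 0 or b == 0:
        return 0.0
    return (ab / n) / ((a / n) * (b / n))
```

A lift above 1 suggests the two anomalous windows co-occur more often than independence would predict, and a lift near 1 suggests no association.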
International Journal on Artificial Intelligence Tools, 2004
A spatial outlier is a spatially referenced object whose non-spatial attribute values are significantly different from the values of its neighborhood. Identification of spatial outliers can lead to the discovery of unexpected, interesting, and useful spatial patterns for further analysis. Previous work in spatial outlier detection focuses on detecting spatial outliers with a single attribute. In this paper, we propose two approaches to discover spatial outliers with multiple attributes. We formulate the multi-attribute spatial outlier detection problem in a general way, provide two effective detection algorithms, and analyze their computational complexity. In addition, using real-world census data, we demonstrate that our approaches can effectively identify local abnormality in large spatial data sets.
The ever-increasing volume of spatial data has greatly challenged our ability to extract useful but implicit knowledge from them. As an important branch of spatial data mining, spatial outlier detection aims to discover the objects whose non-spatial attribute values are significantly different from the values of their spatial neighbors. These objects, called spatial outliers, may reveal important phenomena in a number of applications including traffic control, satellite image analysis, weather forecasting, and medical diagnosis. Most existing spatial outlier detection algorithms focus on identifying single-attribute outliers and could potentially misclassify normal objects as outliers when their neighborhoods contain real spatial outliers with very large or small attribute values. In addition, many spatial applications contain multiple non-spatial attributes which should be processed together to identify outliers. To address these two issues, we formulate the spatial outlier detection problem in a general way, design two robust detection algorithms, one for a single attribute and the other for multiple attributes, and analyze their computational complexities. Experiments were conducted on a real-world data set, West Nile virus data, to validate the effectiveness of the proposed algorithms.
Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems - GIS '10, 2010
A spatial outlier is a spatially referenced object whose nonspatial attributes are very different from those of its spatial neighbors. Spatial outlier detection has been an important part of spatial data mining and has attracted attention in the past decades. Numerous SOD (Spatial Outlier Detection) approaches have been proposed. However, these techniques suffer from the problems of masking and swamping. That is, some spatial outliers can escape identification, and normal objects can be erroneously identified as outliers. In this paper, two random-walk-based approaches, RW-BP (Random Walk on Bipartite Graph) and RW-EC (Random Walk on Exhaustive Combination), are proposed to detect spatial outliers. First, two different weighted graphs, a BP (Bipartite graph) and an EC (Exhaustive Combination), are modeled based on the spatial and/or non-spatial attributes of the spatial objects. Then, random walk techniques are applied to the graphs to compute the relevance scores between the spatial objects. Using the analysis results, the outlier scores are computed for each object and the top k objects are recognized as outliers. Experiments conducted on synthetic and real datasets demonstrated the effectiveness of the proposed approaches.
Data Mining and Knowledge Discovery, 2014
The last decade has witnessed an unprecedented growth in the availability of data having spatio-temporal characteristics. Given the scale and richness of such data, finding spatio-temporal patterns that demonstrate significantly different behavior from their neighbors could be of interest for various application scenarios such as weather modeling, analyzing the spread of disease outbreaks, monitoring traffic congestion, and so on. In this paper, we propose an automated approach for exploring and discovering such anomalous patterns irrespective of the underlying domain from which the data is recovered. Our approach differs significantly from traditional methods of spatial outlier detection, and employs two phases: (i) discovering homogeneous regions, and (ii) evaluating these regions as anomalies based on their statistical difference from a generalized neighborhood. We evaluate the quality of our approach and distinguish it from existing techniques via an extensive experimental evaluation.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018
Automatic detection of anomalies in space- and time-varying measurements is an important tool in several fields, e.g., fraud detection, climate analysis, or healthcare monitoring. We present an algorithm for detecting anomalous regions in multivariate spatiotemporal time-series, which allows for spotting the interesting parts in large amounts of data, including video and text data. In contrast to existing techniques for detecting isolated anomalous data points, we propose the "Maximally Divergent Intervals" (MDI) framework for unsupervised detection of coherent spatial regions and time intervals characterized by a high Kullback-Leibler divergence compared with all other data given. In this regard, we define an unbiased Kullback-Leibler divergence that allows for ranking regions of different size and show how to enable the algorithm to run on large-scale data sets in reasonable time using an interval proposal technique. Experiments on both synthetic and real data from various domains, such as climate analysis, video surveillance, and text forensics, demonstrate that our method is widely applicable and a valuable tool for finding interesting events in different types of data.
Proceedings of the 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, 2011
Spatial Categorical Outlier Detection (SCOD) has attracted considerable attention from the areas of spatial data mining and geological analysis. When encountering an SCOD problem, some researchers propose using Spatial Numerical Outlier Detection measures by mapping categorical attributes to continuous ones. However, such approaches fail to capture the special properties of spatial categorical data and are therefore prone to the masking and swamping issues. In this paper, we model spatial dependencies between spatial categorical observations and propose a Pair Correlation Function (PCF) based method to detect SCOs. First, a new metric, named Pair Correlation Ratio (PCR), is estimated for each pair of categorical combinations based on their co-occurrence frequency at different spatial distances. Then the discrete PCRs are fitted to a continuous function of distance. The outlier score is computed using the average PCRs between the referenced object and its spatial neighbors. Observations with the lowest PCRs are labeled as potential SCOs. Extensive experiments demonstrated that the PCF-based method outperformed existing approaches.
Abstract: An outlier is any object which is inconsistent with the remaining objects in a database. In data mining, outlier detection plays an interesting and important role because the removal of false outliers may greatly affect the mined results if they carry important information needed for analysis. Spatial outliers are locations which are significantly different from their neighborhoods even though they do not deviate much from the entire population. Detecting them helps in finding the local instabilities of objects when compared with other objects in spatial data. Spatial exploration is important because it is applied in many applications such as weather prediction, clinical traits, and geospatial information processing. The detection of spatial outliers is necessary for analysis in this area. This paper presents a survey and study of spatial outliers, their approaches, detection methods, and algorithms with their complexity, along with their pros and cons. Key words: Outliers, approaches, methods, algorithms
Fourth IEEE International Conference on Data Mining (ICDM'04), 2004
We propose a measure, the Spatial Local Outlier Measure (SLOM), which captures the local behaviour of data points in their spatial neighborhood. With the help of SLOM we are able to discern local spatial outliers which are usually missed by global techniques like "three standard deviations away from the mean". Furthermore, the measure takes into account the local stability around a data point and suppresses the reporting of outliers in highly unstable areas, where data is too heterogeneous and the notion of outliers is not meaningful. We prove several properties of SLOM and report experiments on synthetic and real data sets which show that our approach is novel and scalable to large data sets.
GeoInformatica, 2013
Spatial outlier detection is an important research problem that has received much attention in recent years. Most existing approaches are designed for numerical attributes, but are not applicable to categorical ones (e.g., binary, ordinal, and nominal) that are common in many applications. The main challenges are the modeling of spatial categorical dependency as well as computational efficiency. This paper presents the first outlier detection framework for spatial categorical data. Specifically, a new metric, named Pair Correlation Ratio (PCR), is measured for each pair of category sets based on their co-occurrence frequencies at specific spatial distance ranges. The relevance among spatial objects is then calculated using PCR values with regard to their spatial distances. The outlierness of each object is defined as the inverse of the average relevance between the object and its spatial neighbors. Those objects with the highest outlier scores are returned as spatial categorical outliers. A set of algorithms is further designed for single-attribute and multi-attribute spatial categorical datasets. Extensive experimental evaluations on both simulated and real datasets demonstrated the effectiveness and efficiency of our proposed approaches.
2010
This paper presents a novel modification to an existing algorithm for spatial anomaly detection in binary labeled point data sets, using the Bernoulli version of the Spatial Scan Statistic. We identify a potential ambiguity in p-values produced by Monte Carlo testing, which (by the selection of the most conservative p-value) can lead to sub-optimal power. When such ambiguity occurs, the modification uses a very inexpensive secondary test to suggest a less conservative p-value. Using benchmark tests, we show that this appears to restore power to the expected level, whilst having similar retest variance to the original. The modification also appears to produce a small but significant improvement in overall detection performance when multiple anomalies are present.
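To make the p-value ambiguity concrete, the sketch below shows one way it could arise, namely ties between the observed statistic and simulated replicas. That source of ambiguity is an assumption here (the abstract does not state it), the function name is hypothetical, and the paper's secondary test is not reproduced.

```python
def monte_carlo_p_bounds(observed, simulated):
    """Return (least conservative, most conservative) Monte Carlo p-values.

    The usual rank-based estimate is (1 + #replicas at least as extreme)
    divided by (R + 1). When replicas tie with the observed statistic,
    counting ties as "more extreme" or not gives two different answers.
    """
    r = len(simulated)
    greater = sum(1 for s in simulated if s > observed)
    ties = sum(1 for s in simulated if s == observed)
    lower = (1 + greater) / (r + 1)          # ties counted as less extreme
    upper = (1 + greater + ties) / (r + 1)   # ties counted as more extreme
    return lower, upper
```

When there are no ties the two bounds coincide and there is nothing to resolve. Reporting `upper` corresponds to always picking the most conservative value.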
Proceedings of the 2004 ACM symposium on Applied computing - SAC '04, 2004
The behavior of spatial objects is under the influence of nearby spatial processes. Therefore, in order to perform any type of spatial analysis, we need to take into account not only the spatial relationships among objects but also the underlying spatial processes and other spatial features in the vicinity that influence the behavior of a given spatial object. In this paper, we address outlier detection by refining the concept of the neighborhood of an object, which essentially groups similarly behaving objects into one neighborhood. This similarity is quantified in terms of the spatial relationships among the objects and other semantic relationships based on the spatial processes and spatial features in their vicinity. These spatial features could be natural, such as a stream or vegetation, or man-made, such as a bridge, railroad, or chemical factory. The paper also addresses the identification of spatio-temporal outliers in high dimensions, in their neighborhood.
Intelligent Data Analysis, 2006
We propose a new method for detecting groups of anomalies in categorical datasets. Our approach is a generalization of the spatial scan statistic, a commonly used method for detecting clusters of increased counts in spatial data. We extend this framework to non-spatial datasets with discrete valued attributes, where the degree of anomalousness of each record depends on its attribute values and we wish to find self-similar groups of anomalous records. We model the relationship between the attributes using a probabilistic model (e.g. Bayesian network), define a likelihood ratio statistic in terms of the pseudo-likelihoods for the null and alternative hypotheses, and maximize this statistic over all subsets of records. Since an exhaustive search over all such groups is computationally infeasible, we propose an efficient (but approximate) search heuristic. We show that this algorithm is able to accurately detect anomalous groups in real-world hospital, container shipping and network connections data.
International Journal on Artificial Intelligence Tools, 2011
Spatial outliers are the spatial objects whose nonspatial attribute values are quite different from those of their spatial neighbors. Identification of spatial outliers is an important task for data mining researchers and geographers. A number of algorithms have been developed to detect spatial anomalies in meteorological images, transportation systems, and contagious disease data. In this paper, we propose a set of graph-based algorithms to identify spatial outliers. Our method first constructs a graph based on k-nearest neighbor relationship in spatial domain, assigns the differences of nonspatial attribute as edge weights, and continuously cuts high-weight edges to identify isolated points or regions that are much dissimilar to their neighboring objects. The proposed algorithms have three major advantages compared with other existing spatial outlier detection methods: accurate in detecting both point and region outliers, capable of avoiding false outliers, and capable of computin...
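The graph-based idea described above (k-nearest-neighbor graph, attribute differences as edge weights, cutting high-weight edges) can be sketched as follows. This is a simplified illustration, not the paper's algorithms: it assumes a single numeric attribute, replaces the continuous cutting procedure with one fixed weight threshold, and the function and parameter names (`max_weight`, `max_size`) are hypothetical.

```python
import math

def knn_edges(points, k):
    """Undirected edges linking each point to its k nearest neighbors."""
    edges = set()
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for _, j in dists[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges

def components(n, edges):
    """Connected components via a small union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)
    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return list(groups.values())

def graph_cut_outliers(points, values, k=2, max_weight=5.0, max_size=1):
    """Cut edges whose attribute difference exceeds max_weight, then report
    members of components with at most max_size nodes (assumed criterion)."""
    kept = [(i, j) for i, j in knn_edges(points, k)
            if abs(values[i] - values[j]) <= max_weight]
    return sorted(v for comp in components(len(points), kept)
                  if len(comp) <= max_size for v in comp)
```

Raising `max_size` above 1 lets the same routine report small isolated regions as well as single points.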
Journal of Computational and Graphical Statistics, 1999
In this article we suggest a unified approach to the exploratory analysis of spatial data. Our technique is based on a forward search algorithm that orders the observations from those most in agreement with a specified autocorrelation model to those least in agreement with it. This leads to the identification of spatial outliers (that is, extreme observations with respect to their neighboring values) and of nonstationary pockets. In particular, the focus of our analysis is on spatial prediction models. We show that standard deletion diagnostics for prediction are affected by masking and swamping problems when multiple outliers are present. The effectiveness of the suggested method in detecting masked multiple outliers, and more generally in ordering spatial data, is shown by means of a number of simulated datasets. These examples clearly reveal the power of our method in getting inside the data in a way that is simpler and more powerful than standard diagnostic procedures. Furthermore, the behavior of our algorithm under the null hypothesis of no outliers is investigated through a Monte Carlo experiment. Such simulations are also used to build envelopes for the forward search.
2008
Detection of traffic anomalies is an important problem that has been the focus of considerable research. Recent work has shown the utility of spatial detection of anomalies via crosslink traffic comparisons. In this paper we identify three advances that are needed to make such methods more useful and practical for network operators. First, anomaly detection methods should avoid global communication and centralized decision making. Second, nonparametric anomaly detection methods are needed to augment current ...
Proceedings of SIAM Conference on Data Mining, 2006
Spatial outliers are spatial objects whose features are distinct from those of their surrounding neighbors. Detection of spatial outliers helps reveal valuable information from large spatial data sets. In many real applications, spatial objects cannot simply be abstracted as isolated points. They have different boundaries, sizes, volumes, and locations. These spatial properties affect the impact of a spatial object on its neighbors and should be taken into consideration. In this paper, we propose two spatial outlier detection methods which integrate the impact of spatial properties into the outlierness measurement. Experimental results on a real data set demonstrate the effectiveness of the proposed algorithms.
2004
Outlier detection, as a data mining task, aims to identify a small set of data that is considerably dissimilar to or inconsistent with the remainder of the data. Spatial outliers are spatially referenced objects whose nonspatial attribute values are (or whose distribution is) significantly different from those of their neighbors. Identification of spatial outliers can lead to the discovery of unexpected, interesting spatial patterns for further investigation. In this paper, bipartite methods are used to detect spatial outliers based on the concepts of spatial point estimation and spatial statistical theory. Two point estimation methods are introduced to estimate the values of spatial points. The concept of the Z-score is used to evaluate the deviation of the ratios of estimated values to true values from the average ratio in the study space. Two algorithms are proposed to identify spatial outliers using different methods. These algorithms are applied to the New Mexico Produced Water Chemistry Database (PWCD). Results show that outlier detection can aid in bad data checking and the analysis of problems related to produced water (in oil and gas production).
2006
The discovery of interesting regions in spatial datasets is an important data mining task. In particular, we are interested in identifying disjoint, contiguous regions that are unusual with respect to the distribution of a given class; i.e. a region that contains an unusually low or high number of instances of a particular class. This paper centers on the discussion of techniques, methodologies, and algorithms to discover such regions. A measure of interestingness and a supervised clustering framework are introduced for this purpose. Moreover, three supervised clustering algorithms are proposed in the paper: an agglomerative hierarchical supervised clustering named SCAH, an agglomerative, grid-based clustering method named SCHG, and lastly an algorithm named SCMRG which searches a multi-resolution grid structure top down for interesting regions. Finally, experimental results of applying the proposed framework and algorithms to the problem of identifying hotspots in spatial datasets are discussed.