Distributed clustering algorithm for spatial data mining

Mohand Kechadi

Distributed clustering algorithm for spatial data mining

Mohand Kechadi

2015, 2015 2nd IEEE International Conference on Spatial Data Mining and Geographical Knowledge Services (ICSDM)

Sign up for access to the world's latest research

checkGet notified about relevant papers

checkSave papers to use in your research

checkJoin the discussion with peers

checkTrack your impact

Abstract

Distributed data mining techniques and mainly distributed clustering are widely used in last decade because they deal with very large and heterogeneous datasets which cannot be gathered centrally. Current distributed clustering approaches are normally generating global models by aggregating local results that are obtained on each site. While this approach analyses the datasets on their locations the aggregation phase is complex, time consuming and may produce incorrect and ambiguous global clusters and therefore incorrect knowledge. In this paper we propose a new clustering approach for very large spatial datasets that are heterogeneous and distributed. The approach is based on K-means Algorithm but it generates the number of global clusters dynamically. It is not necessary to fix the number of clusters. Moreover, this approach uses a very sophisticated aggregation phase. The aggregation phase is designed in such away that the final clusters are compact and accurate while the overall process is efficient in time and memory allocation. Preliminary results show that the proposed approach scales up well in terms of running time, and result quality, we also compared it to two other clustering algorithms BIRCH and CURE and we show clearly this approach is much more efficient than the two algorithms.

Mohand Kechadi

2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW), 2016

In this paper, we present a new approach of distributed clustering for spatial datasets, based on an innovative and efficient aggregation technique. This distributed approach consists of two phases: 1) local clustering phase, where each node performs a clustering on its local data, 2) aggregation phase, where the local clusters are aggregated to produce global clusters. This approach is characterised by the fact that the local clusters are represented in a simple and efficient way. And The aggregation phase is designed in such a way that the final clusters are compact and accurate while the overall process is efficient in both response time and memory allocation. We evaluated the approach with different datasets and compared it to well-known clustering techniques. The experimental results show that our approach is very promising and outperforms all those algorithms.

Log In

Distributed clustering algorithm for spatial data mining

Sign up for access to the world's latest research

Abstract

Related papers

Related topics