A distributed data clustering algorithm in P2P networks

Hedieh Sajedi

2017, Applied Soft Computing

Abstract

Clustering is an important data mining task, especially for the analysis of large and distributed data. Distributed computing environments such as Peer-to-Peer (P2P) networks involve scattered data sources distributed among the peers. Owing to the unpredictable growth and dynamic nature of P2P networks, the data held by peers are constantly changing. Because of the high computation and communication costs and privacy concerns, such data should be processed in a distributed way, without central management. Today, most applications of P2P systems focus on unstructured P2P systems. In unstructured P2P networks, gossiping is a simple and efficient method of communication that can adapt to the dynamic conditions of these networks. Recently, several algorithms with different pros and cons have been proposed for data clustering in P2P networks. In this paper, a Gossip Based Distributed Clustering algorithm for P2P networks, called GBDC-P2P, is proposed; it combines a novel method for extracting representative data, a gossip-based protocol, and a new centralized clustering method. The GBDC-P2P algorithm is suitable for data clustering in unstructured P2P networks and adapts to the dynamic conditions of these networks. In the GBDC-P2P algorithm, peers cluster data in a distributed manner solely by communicating with their neighbours. GBDC-P2P does not rely on a central server and runs asynchronously. Evaluation results demonstrate the superior performance of the GBDC-P2P algorithm. In addition, a comparative analysis with other well-established methods illustrates the efficiency of the proposed method.
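The abstract does not spell out the steps of GBDC-P2P, so the following is only a generic illustration of the gossip idea it mentions, not the authors' algorithm. It is a minimal sketch under stated assumptions: each peer summarises its local data with a single representative (here simply the coordinate-wise mean, a stand-in for the paper's unspecified extraction method), and in every round each peer contacts one random neighbour and the two merge what they know. The names `local_representative`, `gossip_round`, and `run_gossip` are hypothetical.

```python
import random


def local_representative(points):
    """Stand-in summary (assumption): coordinate-wise mean of a peer's points."""
    n = len(points)
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / n for d in range(dim))


def gossip_round(views, neighbours, rng):
    """Each peer contacts one random neighbour; both end up with the
    union of their current views (push-pull exchange)."""
    for peer in views:
        partner = rng.choice(neighbours[peer])
        merged = views[peer] | views[partner]
        views[peer] = merged
        views[partner] = set(merged)


def run_gossip(local_data, neighbours, rounds, seed=0):
    """Simulate gossip with no central server: peers only talk to neighbours."""
    rng = random.Random(seed)
    # Every peer starts out knowing only its own (peer id, representative) pair.
    views = {
        peer: {(peer, local_representative(points))}
        for peer, points in local_data.items()
    }
    for _ in range(rounds):
        gossip_round(views, neighbours, rng)
    return views


# Two-peer toy network: each peer's only neighbour is the other one.
local_data = {"a": [(0.0, 0.0), (2.0, 2.0)], "b": [(10.0, 10.0)]}
neighbours = {"a": ["b"], "b": ["a"]}
views = run_gossip(local_data, neighbours, rounds=1)
```

In this toy setup both peers hold both representatives after a single round. On a larger connected network, repeated rounds spread the representatives further, and each peer could then run any clustering routine locally on the view it has collected.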

Peer-to-peer systems and applications have attracted much attention because they are more scalable than traditional client-server ones. To provide efficient communication among nodes in the network, node clustering can be used to avoid flooding messages. In this paper, a distributed node clustering algorithm is proposed that adopts a new way of choosing originators; the ns-2 simulator is then used to evaluate the proposed clustering algorithm. Experimental results show that the proposed algorithm achieves better clustering accuracy than existing algorithms for different types of network topologies. More importantly, it requires fewer messages for clustering than the compared algorithms.
