2017, Applied Soft Computing
…
Clustering is one of the important data mining problems, especially for large-scale and distributed data analysis. Distributed computing environments such as Peer-to-Peer (P2P) networks involve separate, scattered data sources distributed among the peers. Owing to the unpredictable growth and dynamic nature of P2P networks, the peers' data change constantly. Because of the high volume of computation and communication, and because of privacy concerns, such data should be processed in a distributed way, without central management. Today, most P2P applications are built on unstructured P2P systems. In unstructured P2P networks, gossip spreading is a simple and efficient method of communication that can adapt to the dynamic conditions of these networks. Recently, several algorithms with different advantages and drawbacks have been proposed for data clustering in P2P networks. In this paper, by combining a novel method for extracting representative data, a gossip-based protocol, and a new centralized clustering method, we propose a Gossip Based Distributed Clustering algorithm for P2P networks, called GBDC-P2P. The GBDC-P2P algorithm is suitable for data clustering in unstructured P2P networks and adapts to their dynamic conditions. In GBDC-P2P, peers perform the clustering operation in a distributed manner, communicating only with their neighbours. GBDC-P2P does not rely on a central server and runs asynchronously. Evaluation results demonstrate the superior performance of the GBDC-P2P algorithm, and a comparative analysis with other well-established methods illustrates the efficiency of the proposed method.
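The abstract does not describe how GBDC-P2P extracts representative data or which centralized clustering method it uses, so the following is only a minimal sketch of the generic gossip primitive such a design can build on: pairwise push-pull averaging between neighbours. It assumes (hypothetically) that each peer summarises one cluster as a vector `[sum_x, sum_y, count]`; because averaging preserves the network-wide sum, every peer's ratio `sum / count` converges to the global centroid without any central server. The function name and data layout are illustrative, not taken from the paper.

```python
import random


def gossip_average(values, neighbours, rounds, seed=0):
    """Pairwise push-pull gossip averaging (a generic sketch).

    values:     dict peer -> list of floats (the peer's local statistics)
    neighbours: dict peer -> list of neighbouring peers
    In every round each peer, in random order, contacts one random neighbour
    and both replace their vectors with the element-wise mean. Each exchange
    preserves the sum over all peers, so on a connected overlay every vector
    converges to the network-wide average.
    """
    rng = random.Random(seed)
    state = {p: list(v) for p, v in values.items()}
    peers = list(state)
    for _ in range(rounds):
        rng.shuffle(peers)
        for p in peers:
            q = rng.choice(neighbours[p])
            avg = [(a + b) / 2 for a, b in zip(state[p], state[q])]
            state[p], state[q] = avg, list(avg)
    return state


# Hypothetical example: each peer summarises the points it assigns to one
# cluster as [sum_x, sum_y, count]; peer C holds no points for this cluster.
ring = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "A"]}
local = {"A": [2.0, 0.0, 2.0], "B": [4.0, 4.0, 1.0],
         "C": [0.0, 0.0, 0.0], "D": [6.0, 2.0, 1.0]}
for peer, (sx, sy, n) in gossip_average(local, ring, rounds=50).items():
    print(peer, round(sx / n, 6), round(sy / n, 6))  # each peer's centroid estimate
```

In this toy ring the four peers jointly hold points summing to (12, 6) over 4 points, so every peer's estimate approaches the global centroid (3.0, 1.5) using only neighbour-to-neighbour messages.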
Peer-to-peer systems and applications have attracted much attention because they are more scalable than traditional client-server systems. To provide efficient communication among nodes in the network, node clustering can be used to avoid flooding messages. In this paper, we propose a distributed node clustering algorithm that adopts a new way of choosing originators, and we evaluate it with the ns-2 simulator. Experimental results show that the proposed algorithm can achieve better clustering accuracy than existing algorithms across different types of network topologies. More importantly, it requires fewer messages for clustering than the compared algorithms.
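The abstract does not say how the new originators are chosen, so this sketch simply takes them as given and shows one common way originator-based node clustering can work: each originator's message travels a bounded number of hops, and every node joins the originator whose message reaches it first (a hop-limited multi-source breadth-first search). The function name, the `max_hops` parameter, and the tie rule are assumptions for illustration only.

```python
from collections import deque


def cluster_from_originators(adj, originators, max_hops):
    """Hop-limited multi-source BFS: every node joins the nearest originator.

    adj:         dict node -> list of neighbour nodes
    originators: nodes that start clusters (how they are chosen is assumed
                 to happen elsewhere)
    max_hops:    how far an originator's message may travel
    Ties go to the originator listed first; nodes not reached within
    max_hops hops keep the label None.
    """
    label = {node: None for node in adj}
    dist = {}
    queue = deque()
    for o in originators:
        if o not in dist:
            label[o], dist[o] = o, 0
            queue.append(o)
    while queue:
        u = queue.popleft()
        if dist[u] == max_hops:
            continue  # message has used up its hop budget
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                label[v] = label[u]
                queue.append(v)
    return label


path = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
print(cluster_from_originators(path, [0, 6], max_hops=3))
# {0: 0, 1: 0, 2: 0, 3: 0, 4: 6, 5: 6, 6: 6}
```

Each node forwards a given originator's message at most once, which keeps the message count bounded by the number of edges rather than growing as in unrestricted flooding.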
Proceedings of the Twenty-First …, 2010
Several algorithms have recently been developed for distributed data clustering; they are applied when data cannot be concentrated on a single machine, for instance because of privacy concerns, network bandwidth limitations, or the huge amount of distributed ...
IEEE Internet Computing, 2000
Distributed data mining deals with the problem of data analysis in environments with distributed data, computing nodes, and users. Peer-to-peer computing is emerging as a new distributed computing paradigm for many novel applications that involve exchange of information among a large number of peers with little centralized coordination. Peer-to-peer file sharing, peer-to-peer electronic commerce, and peer-to-peer monitoring based on a network of sensors are some examples. This paper offers an overview of distributed data mining applications and algorithms for peer-to-peer environments. It describes both exact and approximate distributed data mining algorithms that work in a decentralized manner. It illustrates these approaches for the problem of computing and monitoring clusters in the data residing at the different nodes of a peer-to-peer network.
Connectivity-based node clustering has wide-ranging applications in decentralized peer-to-peer (P2P) networks such as P2P file sharing systems, mobile ad-hoc networks, P2P sensor networks, and so forth. This paper describes a Connectivity-based Distributed Node Clustering scheme (CDC). This scheme presents a scalable and efficient solution for discovering connectivity-based clusters in peer networks. In contrast to centralized graph clustering algorithms, the CDC scheme is completely decentralized and assumes only knowledge of neighbor nodes, instead of requiring global knowledge of the network (graph) to be available. An important feature of the CDC scheme is its ability to cluster the entire network automatically or to discover clusters around a given set of nodes. To cope with the typical dynamics of P2P networks, we provide mechanisms that allow new nodes to be incorporated into appropriate existing clusters and that gracefully handle the departure of nodes from the clusters. These mechanisms make the CDC scheme extensible and adaptable, in the sense that the clustering structure of the network adjusts automatically as nodes join or leave the system. We provide detailed experimental evaluations of the CDC scheme, addressing its effectiveness in discovering good-quality clusters and handling node dynamics. We further study the types of topologies that benefit most from connectivity-based distributed clustering algorithms like CDC. Our experiments show that utilizing the message-based connectivity structure can considerably reduce messaging cost and provide better utilization of resources, which in turn improves the quality of service of applications running over decentralized peer-to-peer networks.
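The abstract mentions join and departure mechanisms without specifying them, so the following is a hypothetical sketch, not CDC's actual rules. It assumes each cluster is labelled with the id of its originator node. A joining node adopts the label held by most of its already-clustered neighbours; when a node leaves, cluster members that can no longer reach their originator through same-cluster nodes become unclustered so they can re-join elsewhere. Both function names and the tie rule are illustrative assumptions.

```python
from collections import Counter, deque


def join_cluster(labels, neighbours_of_new):
    """Hypothetical join rule: adopt the label held by most clustered
    neighbours (ties -> smallest label); None if no neighbour is clustered."""
    counts = Counter(labels[n] for n in neighbours_of_new if labels.get(n) is not None)
    if not counts:
        return None
    best = max(counts.values())
    return min(label for label, cnt in counts.items() if cnt == best)


def handle_departure(adj, labels, leaving):
    """Remove `leaving` from the overlay. Assumes each cluster label is the id
    of its originator; members that can no longer reach the originator
    through same-cluster nodes become unclustered (None) so they can re-join."""
    cluster = labels.pop(leaving)
    for v in adj.pop(leaving):
        adj[v].remove(leaving)
    if cluster is None:
        return labels
    members = [n for n, lab in labels.items() if lab == cluster]
    reached = set()
    if cluster in labels:  # the originator itself is still present
        reached.add(cluster)
        queue = deque([cluster])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in reached and labels[v] == cluster:
                    reached.add(v)
                    queue.append(v)
    for n in members:
        if n not in reached:
            labels[n] = None
    return labels


adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
labels = {0: 0, 1: 0, 2: 0, 3: 0, 4: 4}   # clusters led by originators 0 and 4
handle_departure(adj, labels, 2)          # node 2 leaves; node 3 is cut off
labels[3] = join_cluster(labels, adj[3])  # node 3 re-joins via neighbour 4
print(labels)  # {0: 0, 1: 0, 3: 4, 4: 4}
```

Both operations touch only the departing or joining node's neighbourhood and its own cluster, which matches the abstract's emphasis on neighbour-only knowledge.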
2008
Most social networks exhibit community structures, in which nodes are tightly connected to each other within a community but only loosely connected to nodes in other communities. Research on community mining has received a lot of attention; however, most approaches are based on a centralized system model and are thus not applicable to the distributed model of P2P networks. In this paper, we propose a distributed community mining algorithm, the Asynchronous Clustering and Merging scheme (ACM), for P2P computing environments. Because of the dynamic and distributed nature of P2P networks, ACM employs an asynchronous strategy in which local clustering is executed without requiring an expensive global clustering to be performed synchronously. Experimental results show that ACM discovers high-quality community structures and outperforms existing approaches.
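The ACM scheme itself is not described in enough detail here to reproduce, so this sketch swaps in a different, well-known decentralized technique to illustrate asynchronous community mining: a variant of asynchronous label propagation (Raghavan et al., 2007). Each node needs only its neighbours' current labels, and nodes update one at a time rather than in a synchronized global step. The tie-keeping rule and the `max_sweeps` cap are choices made for this sketch.

```python
import random
from collections import Counter


def async_label_propagation(adj, max_sweeps=100, seed=0):
    """Asynchronous label propagation (a variant of Raghavan et al., 2007).

    Every node starts in its own community; nodes are visited one at a time
    in random order and adopt the label most frequent among their neighbours,
    using labels already updated in the current sweep. The loop stops when a
    full sweep changes nothing.
    """
    rng = random.Random(seed)
    labels = {v: v for v in adj}
    nodes = list(adj)
    for _ in range(max_sweeps):
        rng.shuffle(nodes)
        changed = False
        for v in nodes:
            if not adj[v]:
                continue
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            candidates = sorted(lab for lab, cnt in counts.items() if cnt == best)
            if labels[v] in candidates:
                continue  # keep the current label on ties so the process settles
            labels[v] = rng.choice(candidates)
            changed = True
        if not changed:
            break
    return labels


two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
print(async_label_propagation(two_triangles))
```

Each triangle settles on a single label drawn from its own members; on real graphs label propagation can merge weakly separated communities, which is one reason schemes such as ACM add explicit merging logic.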
2011 11th International Conference on Intelligent Systems Design and Applications, 2011
Due to the dramatic increase of data volumes in different applications, it is becoming infeasible to keep these data on one centralized machine, and it is becoming more and more natural to deal with distributed databases and networks. That is why distributed data mining techniques have been introduced. One of the most important data mining problems is data clustering. While many clustering algorithms exist for centralized databases, there is a lack of efficient algorithms for distributed databases. In this paper, an efficient algorithm is proposed for clustering distributed databases. The proposed methodology employs an iterative optimization technique to achieve a better clustering objective. The experimental results reported in this paper show the superiority of the proposed technique over a recently proposed algorithm based on a distributed version of the well-known K-Means algorithm (Datta et al. 2009) [1].
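As background for the K-Means baseline the abstract compares against, here is a minimal sketch of one synchronous distributed K-Means iteration: each site sends only per-cluster sums and counts, never raw points, and the aggregate update equals a centralized K-Means step. This is the generic idea, not the paper's iterative optimization technique and not Datta et al.'s exact protocol; the function names are hypothetical.

```python
def local_stats(points, centroids):
    """Per-centroid (sum vector, count) for one site's points."""
    k, d = len(centroids), len(centroids[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    for p in points:
        # index of the nearest centroid (squared Euclidean distance)
        j = min(range(k), key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
        counts[j] += 1
        for t in range(d):
            sums[j][t] += p[t]
    return sums, counts


def distributed_kmeans_step(sites, centroids):
    """One K-Means iteration where only (sums, counts) leave each site."""
    k, d = len(centroids), len(centroids[0])
    total_sums = [[0.0] * d for _ in range(k)]
    total_counts = [0] * k
    for points in sites:
        sums, counts = local_stats(points, centroids)
        for j in range(k):
            total_counts[j] += counts[j]
            for t in range(d):
                total_sums[j][t] += sums[j][t]
    # an empty cluster keeps its previous centroid
    return [[s / total_counts[j] for s in total_sums[j]] if total_counts[j] else list(centroids[j])
            for j in range(k)]


sites = [[(0, 0), (1, 0)], [(10, 10)], [(0, 1), (11, 10)]]
start = [[0.0, 0.0], [10.0, 10.0]]
print(distributed_kmeans_step(sites, start))
# [[0.3333333333333333, 0.3333333333333333], [10.5, 10.0]]
```

The communication per site is O(k·d) numbers per iteration regardless of how many points the site holds, which is the main attraction of this family of methods.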
The emerging, widespread use of Peer-to-Peer computing is making P2P Data Mining a natural choice when data sets are distributed over such systems. The huge amount of data stored at the nodes of P2P networks, and the growing number of applications that deal with it, such as P2P file sharing, P2P chat, and P2P electronic commerce, are moving the spotlight onto this challenging field. In this paper we give an overview of two different approaches for implementing primitives for P2P Data Mining and highlight their differences and similarities. The first is based on the definition of local algorithms; the second relies on the Newscast model of computation.
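To make the Newscast model of computation more concrete, here is a simplified sketch of a Newscast-style membership exchange: each peer keeps a small cache of (peer, timestamp) entries, periodically swaps caches with a random peer from its cache, and both sides keep only the freshest `c` entries. Cache size, tie-breaking, and the function names are assumptions for illustration; the real protocol also carries application data in each entry, which is omitted here.

```python
import random


def newscast_exchange(caches, a, b, now, c):
    """Simplified Newscast-style exchange between peers a and b.

    caches: dict peer -> {known_peer: timestamp}. Both peers contribute a
    fresh entry for themselves, the two caches are merged keeping the newest
    timestamp per peer, and each side keeps the c freshest entries, never
    including itself.
    """
    merged = {}
    for entries in (caches[a], caches[b], {a: now, b: now}):
        for pid, ts in entries.items():
            if ts > merged.get(pid, float("-inf")):
                merged[pid] = ts
    for me in (a, b):
        others = [(pid, ts) for pid, ts in merged.items() if pid != me]
        others.sort(key=lambda e: (-e[1], e[0]))  # newest first, ties by id
        caches[me] = dict(others[:c])


def newscast_round(caches, now, c, rng):
    """Every peer initiates one exchange with a random peer from its cache."""
    for a in list(caches):
        known = [p for p in sorted(caches[a]) if p in caches]
        if known:
            newscast_exchange(caches, a, rng.choice(known), now, c)


rng = random.Random(1)
caches = {i: {(i + 1) % 10: 0} for i in range(10)}  # start from a ring
for t in range(1, 21):
    newscast_round(caches, t, 3, rng)
print(caches)
```

Because entries age out and are continually replaced, the caches form a constantly changing random overlay, which is what lets Newscast-based data mining primitives (for example, aggregate averaging) run without any fixed topology.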
International Journal of Peer to Peer Networks, 2013
This paper proposes a peer clustering scheme for unstructured Peer-to-Peer (P2P) systems. The proposed scheme consists of the identification of critical links, the local reconfiguration of incident links, and a retaliation rule. The simulation results indicate that the proposed scheme improves on the performance of previous schemes and that a peer taking cooperative actions receives a higher profit than selfish peers.
2011
Peer-to-Peer (P2P) networks are distributed systems in which nodes of equal roles and capabilities exchange information and services directly with one another. In recent years, they have become a popular way to share large amounts of data. Such architectures, however, complicate the process of knowledge discovery and data mining, since algorithms must deal with distributed (and often dynamic) sources of data and computing. In this paper, we present a distributed algorithm for learning linear classifiers in P2P networks. The problem is posed as a linear program in which each peer has its own constraints but needs to solve a global objective function. A randomized gossip-based approximate algorithm is presented that significantly reduces communication cost in the network while ensuring convergence of the algorithm.
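The paper formulates learning as a linear program, which is not reproduced here. As a loosely related illustration of how gossip can couple local learning with a shared model, this sketch swaps in a different technique: consensus subgradient descent on a regularised hinge loss (a linear SVM objective). Each peer takes a subgradient step on its own data, then averages its weight vector with one random neighbour. The learning rate, regularisation strength, round count, and function names are all assumptions.

```python
import random


def dot(w, x):
    return sum(a * b for a, b in zip(w, x))


def gossip_linear_svm(local_data, neighbours, rounds=50, lr=0.1, lam=0.01, seed=0):
    """Consensus subgradient sketch for a linear classifier.

    local_data: dict peer -> list of (x, y) with y in {-1, +1}
    neighbours: dict peer -> list of neighbouring peers
    """
    rng = random.Random(seed)
    dim = len(next(iter(local_data.values()))[0][0])
    w = {p: [0.0] * dim for p in local_data}
    peers = list(local_data)
    for _ in range(rounds):
        # 1) local subgradient step on lam/2*||w||^2 + mean hinge loss
        for p in peers:
            n = len(local_data[p])
            grad = [lam * wi for wi in w[p]]
            for x, y in local_data[p]:
                if y * dot(w[p], x) < 1:  # margin violated
                    grad = [g - y * xi / n for g, xi in zip(grad, x)]
            w[p] = [wi - lr * g for wi, g in zip(w[p], grad)]
        # 2) one randomized gossip averaging exchange per peer
        rng.shuffle(peers)
        for p in peers:
            q = rng.choice(neighbours[p])
            avg = [(a + b) / 2 for a, b in zip(w[p], w[q])]
            w[p], w[q] = avg, list(avg)
    return w


data = {
    "A": [((2.0, 1.0), 1), ((-1.0, -2.0), -1)],
    "B": [((1.0, 2.0), 1)],
    "C": [((-2.0, -1.0), -1), ((3.0, 3.0), 1), ((-3.0, -3.0), -1)],
}
nbrs = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
weights = gossip_linear_svm(data, nbrs)
print(weights)
```

Only weight vectors travel between peers, never training examples; peer B, which holds a single positive example, still ends up with a model that separates all peers' data in this toy setting.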
Proceedings of the 2007 SIAM International Conference on Data Mining, 2007
In distributed data mining models, adopting a flat node distribution model can affect scalability. To address the problems of modularity, flexibility, and scalability, we propose a hierarchically distributed peer-to-peer architecture and algorithm for data clustering (HP2PC). The architecture is based on a multi-layer overlay network of peer neighborhoods. Supernodes, which act as representatives of neighborhoods, are recursively grouped to form higher-level neighborhoods. Peers at a certain level of the hierarchy cooperate within their respective neighborhoods to perform clustering. Using this model, we can partition the clustering problem in a modular way, solve each part individually, and then successively combine clusterings up the hierarchy, where increasingly global solutions are computed. The algorithm was applied to a distributed document clustering problem and achieved a decent speedup with clustering quality comparable to that of the centralized approach.
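The abstract does not specify which clustering algorithm HP2PC runs at each level, so this is only a minimal sketch of the "solve locally, combine up the hierarchy" idea under an assumed k-means-style design. Each peer summarises its points as weighted centroids, and each neighbourhood (supernode) clusters its children's weighted centroids, passing only those summaries upward. The tree encoding `("leaf", points)` / `("group", children)`, the farthest-point seeding, and the function names are hypothetical; leaves are assumed to be non-empty.

```python
def sqdist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def weighted_kmeans(items, k, iters=20):
    """Lloyd's k-means on (point, weight) pairs with farthest-point seeding.
    Returns the non-empty clusters as (centroid, total_weight) pairs."""
    centroids = [list(items[0][0])]
    while len(centroids) < k:
        far, _ = max(items, key=lambda it: min(sqdist(it[0], c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        sums = [[0.0] * len(centroids[0]) for _ in range(k)]
        weights = [0.0] * k
        for p, wt in items:
            j = min(range(k), key=lambda i: sqdist(p, centroids[i]))
            weights[j] += wt
            sums[j] = [s + wt * a for s, a in zip(sums[j], p)]
        centroids = [[s / weights[j] for s in sums[j]] if weights[j] else centroids[j]
                     for j in range(k)]
    return [(centroids[j], weights[j]) for j in range(k) if weights[j] > 0]


def summarize(tree, k):
    """tree is ("leaf", points) for a peer or ("group", children) for a
    neighbourhood; each level passes only weighted centroids upward."""
    kind, payload = tree
    if kind == "leaf":
        items = [(p, 1.0) for p in payload]
    else:
        items = [s for child in payload for s in summarize(child, k)]
    return weighted_kmeans(items, min(k, len(items)))


tree = ("group", [
    ("group", [("leaf", [(0, 0), (0, 1), (10, 10)]), ("leaf", [(1, 0), (10, 11)])]),
    ("leaf", [(11, 10), (1, 1)]),
])
print(summarize(tree, 2))
```

Because each summary carries its total weight, a weighted mean of child centroids equals the mean of the underlying points whenever the lower levels cluster cleanly, so the top level here recovers the centroids (0.5, 0.5) and (31/3, 31/3) without any peer shipping raw data.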
IEEE Transactions on Knowledge and Data Engineering, 2009
Information Sciences, 2006
Proceedings of the 6th …, 2006
Proceedings of the 15th International Conference on Enterprise Information Systems, 2013
Journal of Physics: Conference Series
… of the 7th Conference on 7th …, 2007
International Journal of Computer Applications, 2013
arXiv preprint arXiv: …, 2011
20th International Conference on Advanced Information Networking and Applications - Volume 1 (AINA'06), 2006
2007 IEEE International Conference on Communications, 2007