2008, Proceedings of the 7th Wseas International Conference on Software Engineering Parallel and Distributed Systems
For a peer-to-peer (P2P) content-sharing network to be scalable, it is imperative to route queries through the network efficiently. Semantic-based search should, with as few messages (as little traffic) as possible, return just the relevant documents stored throughout the network, thus achieving precision and recall values comparable to those of a corresponding centralized system. In this article we propose protocols for a self-organizing P2P network that arranges links between peers according to the peers' content. In the proposed network, peers organize themselves into "semantic communities" without losing links to other semantic communities. The proposed network has no prior knowledge of the semantics of the documents to be stored in the system.
IEEE Transactions on Knowledge and Data Engineering, 2004
Most existing Peer-to-Peer (P2P) systems support only title-based searches and are limited in functionality when compared to today's search engines. In this paper, we present the design of a distributed P2P information sharing system that supports semantic-based content searches of relevant documents. First, we propose a general and extensible framework for searching similar documents in P2P network. The framework is based on the novel concept of Hierarchical Summary Structure. Second, based on the framework, we develop our efficient document searching system, by effectively summarizing and maintaining all documents within the network with different granularity. Finally, an experimental study is conducted on a real P2P prototype, and a large-scale network is further simulated. The results show the effectiveness, efficiency and scalability of the proposed system.
2008
We present iCluster, a self-organizing peer-to-peer overlay network for supporting full-fledged information retrieval in a dynamic environment. iCluster works by organizing peers sharing common interests into clusters and by exploiting clustering information at query time for achieving low network traffic and high recall. We define the criteria for peer similarity and peer selection, and we present the protocols for organizing the peers into clusters and for searching within the clustered organization of peers. iCluster is evaluated on a realistic peer-to-peer environment using real-world data and queries. The results demonstrate significant performance improvements (in terms of clustering efficiency, communication load and retrieval accuracy) over a state-of-the-art peer-to-peer clustering method. Compared to exhaustive search by flooding, iCluster exchanged a small loss in retrieval accuracy for much less message flow.
2005
The peer-to-peer computing paradigm is an intriguing alternative to Google-style search engines for querying and ranking Web content. In a network with many thousands or millions of peers the storage and access load requirements per peer are much lighter than for a centralized Google-like server farm; thus more powerful techniques from information retrieval, statistical learning, computational linguistics, and ontological reasoning can be employed on each peer's local search engine for boosting the quality of search results. In addition, peers can dynamically collaborate on advanced and particularly difficult queries. Moreover, a peer-to-peer setting is ideally suited to capture local user behavior, like query logs and click streams, and disseminate and aggregate this information in the network, at the discretion of the corresponding user, in order to incorporate richer cognitive models. This paper gives an overview of ongoing work in the EU Integrated Project DELIS that aims to develop foundations for a peer-to-peer search engine with Google-or-better scale, functionality, and quality, which will operate in a completely decentralized and self-organizing manner. The paper presents the architecture of such a system and the Minerva prototype testbed, and it discusses various core pieces of the approach: efficient execution of top-k ranking queries, strategies for query routing when a search request needs to be forwarded to other peers, maintaining a self-organizing semantic overlay network, and exploiting and coping with user and community behavior.
2004
Gnutella, a well-known P2P system, uses resources inefficiently when directly applied to information retrieval problems. In this paper we propose an efficient search mechanism that extends the standard Gnutella protocol to support content-based retrieval in P2P networks. The idea is to estimate locally the relevance of peers when they receive query messages. Only those peers estimated as relevant will retrieve the query and send response messages back to the source.
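The local relevance estimate described in this abstract can be illustrated with a minimal sketch. The abstract does not say how relevance is computed, so the cosine similarity between query terms and a peer's term-frequency profile, the `should_answer` name, and the `threshold` value are all assumptions made for illustration, not the authors' mechanism.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def should_answer(query_terms, peer_profile: Counter, threshold: float = 0.2) -> bool:
    """Hypothetical local decision: respond only if the peer looks relevant.

    peer_profile is an assumed summary of the terms in the peer's own documents.
    """
    return cosine(Counter(query_terms), peer_profile) >= threshold


profile = Counter({"p2p": 3, "search": 2})
print(should_answer(["p2p", "search"], profile))
print(should_answer(["cooking"], profile))
```

A peer that fails the check simply stays silent, which is where the traffic saving over plain flooding would come from in a scheme of this kind.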
2010
Over the last few years, peer-to-peer (p2p) networks have become widely used tools for sharing any kind of information, from multimedia data to text documents. The vast amount of shared information makes it difficult to find relevant information over p2p networks. Existing p2p file search and information retrieval techniques are based on file names, which is insufficient when searching for relevant documents. In this paper we present a method to perform semantic information retrieval over p2p networks. Our method semantically inspects the content of shared data in peers to generate conceptual information about documents and general information about the peer.
Computer Networks, 2010
Recent years have brought a dramatic increase in the popularity of collaborative Web 2.0 sites. According to recent evaluations, this phenomenon accounts for a large share of Internet traffic and significantly augments the load on the end-servers of Web 2.0 sites. In this paper, we show how collaborative classifications extracted from Web 2.0-like sites can be leveraged in the design of a self-organizing peer-to-peer network in order to distribute data in a scalable manner while preserving high content locality. We propose Affinity P2P (AP2P), a novel cluster-based locality-aware self-organizing peer-to-peer network. AP2P self-organizes in order to improve content locality using a novel affinity-based metric for estimating the distance between clusters of nodes sharing similar content. Searches in AP2P are directed to the cluster of interests, where a logarithmic-time parallel flooding algorithm provides high recall, low latency, and low communication overhead. The order of clusters is periodically changed using a greedy cluster placement algorithm, which reorganizes clusters based on affinity in order to increase the locality of related content. The experimental and analytical results demonstrate that the locality-aware cluster-based organization of content offers substantial benefits, achieving an average latency improvement of 45% and up to a 12% increase in search recall.
Journal of Computers, 2007
In this paper we present a formal description of PROSA, a P2P resource management system heavily inspired by social networks. Social networks have been deeply studied in the last two decades in order to understand how communities of people arise and grow. It is a widely known result that networks of social relationships usually evolve into small-worlds, i.e. networks where nodes are strongly connected to neighbours and separated from all other nodes by a small number of hops. This work shows that the algorithms implemented in PROSA make it possible to obtain an efficient small-world P2P network. We also show how, by taking advantage of the PROSA structure, it is possible to answer queries effectively. In particular, the so-called query recall for PROSA is estimated and compared to that obtained in SETS [1] and GES [2].
… technology, 2007. iccit …, 2007
Searching is an important factor in p2p networks for content retrieval. Most searches in p2p systems are title-based and therefore limited in functionality. Without knowing the unique filename, we can't retrieve the content of a file in a title-based search. Here a super-peer p2p network is designed that supports content-based search for relevant documents. First, a general and extensible framework is proposed, based on a hierarchical summary structure, for searching similar documents in a p2p network. The summary structure is formed by Vector Space Model (VSM), Latent Semantic Indexing (LSI) and Singular Value Decomposition (SVD) techniques. Then an effective document search is developed by summarizing and maintaining all documents within the network with different factors. Finally, the experimental results are verified on a real p2p prototype and a large-scale network. The results show the effectiveness, efficiency and scalability of the proposed system.
2005
We initiate a study on the effect of the network topology on the performance of Peer-to-Peer (P2P) information retrieval systems. The emerging P2P model has become a very powerful and attractive paradigm for developing Internet-scale systems for sharing resources, including files, or documents. We show that the performance of Information Retrieval algorithms can be significantly improved through the use of fully distributed topologically aware overlay network construction techniques. Our empirical results, using the Peerware middleware infrastructure, show that the approach we propose is both efficient and practical.
Lecture Notes in Computer Science
Resource searching in current peer-to-peer (P2P) applications is mainly based on keyword matching. However, more and more P2P applications require efficient semantic searching based on content. In this paper, we propose a novel scalable semantic searching algorithm named SemSearch for unstructured P2P networks. For the consistency and flexibility of semantic analysis, we integrate global and local semantic information to perform the semantic analysis. Moreover, SemSearch forwards search requests to the peers whose shared resources are more semantically similar in order to implement semantic searching. We further evaluate the performance of SemSearch through simulation experiments.
2013
Peer-to-peer computing is emerging as a new distributed computing paradigm for many novel applications that involve exchange of information among a large number of peers with little centralized coordination. Scalability is without doubt the foremost requirement for a peer-to-peer system. To obtain a high factor of scalability, we partition network search space using a global ontology. Our proposed system takes the form of a semantic layer that can be superimposed on top of any P2P infrastructure. This layer is subdivided as semantic categories through a Hilbert curve which has the merit of good preservation of locality. We choose HyperCup structure to support semantic category in order to increase fault tolerance and overcome bottleneck problem. HyperCup was also selected, because of its efficient broadcast algorithm. Classification within a semantic category will be conducted through multidimensional data analysis. The classification process will be recursively repeated with ontolo...
2005
Existing cluster-based searching schemes in unstructured peer-to-peer (P2P) networks employ flooding/random forwarding on connected dominating sets (CDS) of networks. There exists no upper bound on the size of the CDS of a network. Both flooding and CDS hinder query efficiency. Random forwarding worsens the recall ratio. In this paper, we propose a cluster-based searching scheme that intelligently forwards queries on the maximum independent sets (MIS) of networks. Our approach partitions the entire network into disjoint clusters with one clusterhead (CH) per cluster. CHs form an MIS and are connected through gateway nodes. Each node takes one role: a CH, a gateway, or an ordinary node. A CH looks up the data for the entire cluster using data summaries of cluster members, which are represented by Bloom filters. Between clusters, CHs intelligently forward queries via gateways to the best neighbor CHs that are most likely to return query results. The experimental results demonstrate that our scheme greatly improves the query efficiency without degrading the quality of the query results, compared to existing approaches.
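To make the clusterhead lookup concrete, here is a minimal sketch of how per-member Bloom filter summaries could be consulted. The filter size, the number of hash functions, the SHA-256-based hashing, and the `candidate_members` helper are illustrative assumptions; the abstract does not specify any of them.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter; sizes here are arbitrary illustrative choices."""

    def __init__(self, num_bits: int = 256, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit array stored as a Python integer

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting the item with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def candidate_members(summaries: dict, key: str) -> list:
    """Hypothetical clusterhead step: members whose summary may hold key."""
    return [member for member, bf in summaries.items() if bf.might_contain(key)]


summaries = {"n1": BloomFilter(), "n2": BloomFilter()}
summaries["n1"].add("report.pdf")
print(candidate_members(summaries, "report.pdf"))
```

Because a Bloom filter can return false positives but never false negatives, a clusterhead using summaries like these may occasionally contact a member that lacks the data, but would not miss one that has it.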
2007
Information Retrieval (IR) in distributed and decentralized environments has become an active field of research during the last years. Recently, Peer-to-Peer (P2P) networks have emerged as an attractive architectural paradigm for IR, both for technical and economic reasons. P2P networks are distributed and self-organizing systems that support resource sharing.
Peer-to-Peer (P2P) systems are application layer networks which enable networked hosts to share resources in a distributed manner. An important problem in such networks is to be able to efficiently search the contents of other peers. In this paper we present a survey of search techniques for information retrieval in P2P networks, including recent techniques proposed by the authors. We also present a realistic experimental evaluation and comparison of these techniques, using a distributed middleware infrastructure we have designed and implemented.
Efficient discovery of information, based on partial knowledge, is a challenging problem faced by many large scale distributed systems. This paper presents a peer-to-peer search protocol that addresses this problem. The proposed system provides an efficient mechanism for advertising a binary pattern, and discovering it using any subset of its 1-bits. A pattern (e.g., Bloom filter) summarizes the properties (e.g., keywords or service description) associated with a shared object (e.g., document or service).
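The matching rule implied here, that a pattern can be found from any subset of its 1-bits, reduces to a bitwise containment test. The sketch below shows only that test over an in-memory table; the `matches` and `discover` names and the example patterns are hypothetical, and the distributed advertisement and routing that the protocol itself provides are not modelled.

```python
def matches(pattern: int, query: int) -> bool:
    """True if every 1-bit of query is also set in pattern."""
    return (pattern & query) == query


def discover(advertised: dict, query: int) -> list:
    """Names of advertised objects whose pattern contains all query bits."""
    return sorted(name for name, p in advertised.items() if matches(p, query))


# Assumed example: patterns would normally come from a Bloom filter of keywords.
advertised = {"doc1": 0b1011, "doc2": 0b0110}
print(discover(advertised, 0b0010))
print(discover(advertised, 0b1001))
```

A query with fewer 1-bits is less selective, so it can only match the same objects or more of them than a query that adds further bits.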
Information Systems, 2005
An important problem in unstructured peer-to-peer (P2P) networks is the efficient content-based retrieval of documents shared by other peers. However, existing search mechanisms do not scale well, either because they flood the network with queries or because they require some form of global knowledge.
Information Systems, 2009
Recent progress in peer-to-peer (P2P) search algorithms has presented viable structured and unstructured approaches for full-text search. We posit that these existing approaches are each best suited for different types of queries. We present PHIRST, the first system to facilitate effective full-text search within P2P databases. PHIRST works by effectively leveraging the relative strengths of these approaches. Similar to structured approaches, agents first publish terms within their stored documents. However, frequent terms are quickly identified and not exhaustively stored, resulting in a significant reduction in the system's storage requirements. During query lookup, agents use unstructured search to compensate for the lack of fully published terms. Additionally, they explicitly weigh the costs involved in structured and unstructured approaches, allowing for a significant reduction in query costs. Finally, we address how node failures can be effectively handled by storing multiple copies of selected data. We evaluated the effectiveness of our approach using both real-world and artificial queries. We found that in most situations our approach yields near-perfect recall. We discuss the limitations of our system, as well as possible compensatory strategies.
Peer-to-Peer (P2P) systems are very large computer networks, where peers collaborate to provide a common service. Providing large-scale Information Retrieval (IR), e.g. for searching the World Wide Web, is an appealing application for P2P systems. The research community has presented several proposals for P2P-IR. However, so far the concepts of P2P and of IR have been intermingled. In this paper, we propose an architecture to structure P2P-IR systems. We differentiate between concepts belonging to the construction and maintenance of a P2P overlay network, and those belonging to IR. Furthermore, we distinguish basic P2P-IR concepts, which are likely to be needed in all P2P-IR systems, and advanced P2P-IR concepts, which depend on the flavor of the system. This decomposition of the P2P retrieval process is an important step towards a structured implementation of such systems. Furthermore, it allows a systematic sharing of methods and resources needed to perform retrieval. The next generation of global information retrieval systems will combine these distributed resources in new ways to provide more efficient web search.
Journal of Parallel and Distributed Computing, 2008
The past few years have seen tremendous advances in distributed storage infrastructure. Unstructured and structured overlay networks have been successfully used in a variety of applications, ranging from file-sharing to scientific data repositories. While unstructured networks benefit from low maintenance overhead, the associated search costs are high. On the other hand, structured networks have higher maintenance overheads, but facilitate bounded time search of installed keywords. When dealing with typical data sets, though, it is infeasible to install every possible search term as a keyword into the structured overlay.