2008
This paper describes hybrid search, a search method supporting both document and knowledge retrieval via the flexible combination of ontology-based search and keyword-based matching. Hybrid search smoothly copes with the lack of semantic coverage of document content, which is one of the main limitations of current semantic search methods. In this paper we define hybrid search formally, discuss its compatibility with the current semantic trends and present a reference implementation: K-Search. We then show how the method outperforms both keyword-based search and pure semantic search in terms of precision and recall in a set of experiments performed on a collection of about 18,000 technical documents. Experiments carried out with professional users show that users understand the paradigm and consider it very powerful and reliable. K-Search has been ported to two applications released at Rolls-Royce plc for searching technical documentation about jet engines.
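The combination described in this abstract can be illustrated with a toy sketch. This is not the K-Search algorithm or its formal definition, which the abstract does not give; it only assumes that a query may mix plain keywords with constraints on ontology-based annotations, and that a document must satisfy both parts. The names `Doc`, `keyword_match`, `semantic_match` and `hybrid_search`, and the sample data, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    doc_id: str
    text: str
    # Hypothetical annotation model: a set of (property, value) pairs.
    annotations: set = field(default_factory=set)


def keyword_match(doc, keywords):
    """True if every keyword occurs as a whitespace-separated token in the text."""
    words = set(doc.text.lower().split())
    return all(k.lower() in words for k in keywords)


def semantic_match(doc, constraints):
    """True if every (property, value) constraint is among the annotations."""
    return all(c in doc.annotations for c in constraints)


def hybrid_search(docs, keywords=(), constraints=()):
    """Return ids of documents satisfying both the keyword and semantic parts."""
    return [
        d.doc_id
        for d in docs
        if keyword_match(d, keywords) and semantic_match(d, constraints)
    ]


docs = [
    Doc("d1", "blade crack found during inspection", {("component", "blade")}),
    Doc("d2", "crack in casing", {("component", "casing")}),
    Doc("d3", "blade worn", set()),  # no annotations: no semantic coverage
]

mixed = hybrid_search(docs, keywords=["crack"], constraints=[("component", "blade")])
keyword_only = hybrid_search(docs, keywords=["blade"])
```

In this sketch the unannotated document `d3` is still reachable through the keyword part of a query, which is the kind of fallback the abstract refers to when it mentions coping with missing semantic coverage.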
2011
Keyword search suffers from a number of issues: ambiguity, synonymy, and an inability to handle semantic constraints. Semantic search helps resolve these issues but is limited by the quality of annotations which are likely to be incomplete or imprecise. Hybrid search, a search technique that combines the merits of both keyword and semantic search, appears to be a promising solution. In this paper we describe and evaluate HyKSS, a hybrid search system driven by extraction ontologies for both annotation creation and query interpretation. For displaying results, HyKSS uses a dynamic ranking algorithm. We show that over data sets of short topical documents, the HyKSS ranking algorithm outperforms both keyword and semantic search in isolation, as well as a number of other non-HyKSS hybrid approaches to ranking.
1 Introduction
Keyword search for documents on the web works well, often surprisingly well. Can semantic search, added to keyword search, make the search for relevant documents even better? Clearly, the answer should be yes, and researchers are pursuing this initiative (e.g., [1]). The real question, however, is not whether adding semantic search might help, but rather how can we, in a cost-effective way, identify the semantics both in documents in the search space and in the free-form queries users wish to ask. Keyword search has a number of limitations: (1) Polysemy: Ambiguous keywords may result in the retrieval of irrelevant documents. (2) Synonymy: Document publishers may use words that are synonymous with, but not identical to, terms in user queries, causing relevant documents to be missed. (3) Constraint satisfaction: Keyword search is incapable of recognizing semantic constraints.
If a query specifies "Hondas for under 12 grand", a keyword search will treat each word as a keyword (or stopword) despite the fact that many, if not most, relevant documents likely do not contain any of these words, not even "Hondas", since the plural is relatively rare in relevant documents. Semantic search can resolve polysemy by placing words in context, synonymy by allowing for alternatives, and constraint satisfaction by recognizing specified conditions. Thus, for example, semantic search can interpret the query "Hondas
Aslib Proceedings, 2011
Purpose: This paper seeks to describe the preliminary studies (on both users and data), the design and evaluation of the K‐Search system for searching legacy documents in aerospace engineering. Real‐world reports of jet engine maintenance challenge the current indexing practice, while real users' tasks require retrieving the information in the proper context. K‐Search is currently in use in Rolls‐Royce plc and has evolved to include other tools for knowledge capture and management. Design/methodology/approach: Semantic Web techniques have been used to automatically extract information from the reports while maintaining the original context, allowing a more focused retrieval than with more traditional techniques. The paper combines semantic search with classical information retrieval to increase search effectiveness. An innovative user interface has been designed to take advantage of this hybrid search technique. The interface is designed to allow a flexible and personal approach to s...
2005
This paper develops a modular approach to improving the effectiveness of searching documents for information by reusing and integrating mature software components such as Lucene APIs, WORDNET, LSA techniques, and domain-specific controlled vocabulary. To evaluate the practical benefits, the prototype was used to query the MEDLINE database, and to locate domain-specific controlled vocabulary terms in Materials and Process Specifications. Its extensibility has been demonstrated by incorporating a spell-checker for the input query, and by structuring the retrieved output into hierarchical collections for quicker assimilation. It is also being used to experimentally explore the relationship between LSA and document clustering using 20-mini-newsgroups and Reuters data. In future, this prototype will be used as an experimental testbed for expressive, context-aware and scalable searches.
Information Systems, 2012
9th International Conference on Ontologies, DataBases, and Applications of Semantics (ODBASE '10)
This paper describes GoNTogle, a framework for document annotation and retrieval, built on top of Semantic Web and IR technologies. GoNTogle supports ontology-based annotation for documents of several formats, in a fully collaborative environment. It provides both manual and automatic annotation mechanisms. Automatic annotation is based on a learning method that exploits user annotation history and textual information to automatically suggest annotations for new documents. GoNTogle also provides search facilities beyond the traditional keyword-based search. A flexible combination of keyword-based and semantic-based search over documents is proposed in conjunction with advanced ontology-based search operations. The proposed methods are implemented in a fully functional tool and their effectiveness is experimentally validated.
Keyword search is an intuitive paradigm for searching linked data sources on the web. We propose to route keywords only to relevant sources to reduce the high cost of processing keyword search queries over all sources. We propose a novel method for computing top-k routing plans based on their potentials to contain results for a given keyword query. We employ a keyword-element relationship summary that compactly represents relationships between keywords and the data elements mentioning them. A multilevel scoring mechanism is proposed for computing the relevance of routing plans based on scores at the level of keywords, data elements, element sets, and subgraphs that connect these elements. Experiments carried out using 150 publicly available sources on the web showed that valid plans (precision@1 of 0.92) that are highly relevant (mean reciprocal rank of 0.89) can be computed in 1 second on average on a single PC. Further, we show routing greatly helps to improve the performance of keyword search, without compromising its result quality.
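The two figures quoted in this abstract, precision@1 and mean reciprocal rank, are standard ranking metrics. A minimal sketch of their usual textbook definitions follows; the abstract does not say how the authors judged a routing plan "valid" or "relevant", so the boolean relevance judgments and the function names here are assumptions for illustration only.

```python
def reciprocal_rank(ranked_relevance):
    """1 / rank of the first relevant item, or 0.0 if none is relevant.

    `ranked_relevance` is a list of booleans in rank order.
    """
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over a non-empty list of ranked result lists."""
    return sum(reciprocal_rank(r) for r in runs) / len(runs)


def precision_at_1(runs):
    """Fraction of queries whose top-ranked item is relevant."""
    return sum(1 for r in runs if r and r[0]) / len(runs)


# Hypothetical judgments for three queries (True = relevant).
runs = [
    [True, False, False],   # first hit at rank 1
    [False, True, False],   # first hit at rank 2
    [False, False, True],   # first hit at rank 3
]
mrr = mean_reciprocal_rank(runs)   # (1 + 1/2 + 1/3) / 3
p_at_1 = precision_at_1(runs)      # 1 of 3 queries has a relevant top result
```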
2011
This paper describes the preliminary studies (on both users and data), the design and evaluation of the K-Search system for searching legacy documents in aerospace engineering. Real-world reports of jet engine maintenance challenge the current indexing practice, while real users' tasks require retrieving the information in the proper context. K-Search is currently in use in Rolls-Royce plc. and has evolved to include other tools for knowledge capture and management. Design: Semantic Web techniques have been used to automatically extract information from the reports while maintaining the original context, allowing a more focussed retrieval than with more traditional techniques. We combine semantic search with classical information retrieval to increase search effectiveness. An innovative user interface has been designed to take advantage of this hybrid search technique. The interface is designed to allow a flexible and personal approach to searching legacy data. Findings: The user evaluation showed the system is effective and well received by users. It also shows different people look at the same data in different ways and make different use of the same system depending on their individual needs, influenced by their job profile and personal attitude. Research limitations: This study focuses on a specific case of an enterprise working in aerospace engineering. Although the findings are likely to be shared with other engineering domains, e.g. mechanical, electronic, the study did not expand the evaluation to different settings. Value: The study shows how real context of use can provide new and unexpected challenges to researchers and how effective solutions can be then adopted and used in organizations.
… Web Applications and …, 2008
The traditional strategy performed by Information Retrieval (IR) systems is ranked keyword search: for a given query, a list of documents, ordered by relevance, is returned. Relevance computation is primarily driven by a basic string-matching operation. To date, several attempts have been made to deviate from the traditional keyword search paradigm, often by introducing some techniques to capture word meanings in documents and queries. The general feeling is that dealing explicitly with only semantic information does not improve significantly the performance of text retrieval systems. This paper presents SENSE (SEmantic N-levels Search Engine), an IR system that tries to overcome the limitations of the ranked keyword approach, by introducing semantic levels which integrate (and not simply replace) the lexical level represented by keywords. Semantic levels provide information about word meanings, as described in a reference dictionary, and named entities. We show how SENSE is able to manage documents indexed at three separate levels, keywords, word meanings, and entities, as well as to combine keyword search with semantic information provided by the two other indexing levels.
ACM SIGIR Forum, 1989
Knowledge-based search tactics are discussed in terms of their role in the functioning of a semantically-based search system for bibliographic information retrieval. This prototype system, EP-X, actively assists users in defining or refining their topics of interest. It does so by applying search tactics to a knowledge-base describing topics in a particular domain and a database describing the contents of individual documents. This paper reviews the empirical studies that led to the two central concepts implemented in EP-X: semantically-based search and knowledge-based search tactics.
2011
Abstract. Keyword search is receiving a lot of attention not only in Web contexts but also in the database area. It is an easy way to allow inexperienced users to query systems without needing to know any specific language or how data is structured. The amount of data available, on the Web as well as in other systems, is constantly increasing. And, with the improvement and simplification of the technology, the number of people accessing such information is growing too. Providing simple, yet effective tools that allow inexperienced users to quickly discover desired information is a big challenge in modern times. The prevalent approaches build on dedicated indexing techniques as well as search algorithms aiming at finding substructures
The Scientific World Journal, 2015
Ontology is the process of growth and elucidation of concepts of an information domain being common for a group of users. Establishing ontology into information retrieval is a normal method to develop searching effects of relevant information users require. Keywords matching process with historical or information domain is significant in recent calculations for assisting the best match for specific input queries. This research presents a better querying mechanism for information retrieval which integrates ontology queries with keyword search. The ontology-based query is changed into a primary order to predicate logic uncertainty which is used for routing the query to the appropriate servers. Matching algorithms are an active area of research in computer science and artificial intelligence. In text matching, it is more dependable to study the semantic model and query for conditions of semantic matching. This research develops the semantic matching results between input queries and information in the ontology field. The contributed algorithm is a hybrid method that is based on matching extracted instances from the queries and the information field. The queries and information domain are focused on semantic matching, to discover the best match and to progress the executive process. In conclusion, the hybrid ontology in semantic web is sufficient to retrieve the documents when compared to standard ontology.
2014
We present an innovative system for semantic search, based on an ontology, called Search Ontology. The Search Ontology contains search terms, for which synonymous labels can be defined, and search concepts, specified by rules, that determine how search terms are combined with abstract NEAR or Boolean operators to describe corresponding concepts in documents. A search query can be generated from the ontological specification and executed on an information retrieval system such as Lucene afterwards. This approach has the advantage that the user can create powerful and complex queries by ontological specifications only, with minimal effort and without knowing the query syntax. The ontology itself is easily adaptable, extensible and reusable. No information contained in the ontology is used while preprocessing and indexing the documents, since the ontology is being constantly expanded by users of the system and changes in the ontology should not trigger new indexing and analysis for the...
Journal of Systems Integration, 2001
This paper presents a knowledge-based approach to effective document retrieval. This approach is based on a dual document model that consists of a document type hierarchy and a folder organization. A predicate-based document query language is proposed to enable users to precisely and accurately specify the search criteria and their knowledge about the documents to be retrieved. A guided search tool is developed as an intelligent natural language oriented user interface to assist users in formulating queries. Supported by an intelligent question generator, an inference engine, a question base, and a predicate-based query composer, the guided search collects the most important information known to the user to retrieve the documents that satisfy users' particular interests. A knowledge-based query processing and search engine is devised as the core component in this approach. Algorithms are developed for the search engine to effectively and efficiently retrieve the documents that match the query.
The Semantic Web, 2007
Current information retrieval (IR) approaches do not formally capture the explicit meaning of a keyword query but provide a comfortable way for the user to specify information needs on the basis of keywords. Ontology-based approaches allow for sophisticated semantic search but impose a query syntax more difficult to handle. In this paper, we present an approach for translating keyword queries to DL conjunctive queries using background knowledge available in ontologies. We present an implementation which shows that this interpretation of keywords can then be used for both exploration of asserted knowledge and for a semantics-based declarative query answering process. We also present an evaluation of our system and a discussion of the limitations of the approach with respect to our underlying assumptions which directly points to issues for future work.
Semantic Search, 2008
2014 International Conference on Intelligent Computing Applications, 2014
Ontology-based semantic search will lead to a new generation of search based on the meaning of keywords rather than the keywords themselves, and helps in finding correct information on the web. Here, ontology provides an explicit specification of conceptualization which helps to connect the information on existing web pages with background knowledge. Ontology-based search overcomes the semantic gap between the keywords found in documents and those in the query. This survey provides an introduction to ontology-based semantic search, reviews the details of selected ontology-based search approaches, and compares them by means of classification criteria. Based on this comparison, this survey attempts to identify possible directions for future research.
Karbala International Journal of Modern Science, 2019
Traditional search mechanisms are based on keyword search, which does not consider the semantic links between different concepts. This leads to the loss of relevant documents due to inaccurate query formulation or the use of contextually close words and concepts in the query. To solve the problems of formulating user queries and the interdisciplinarity of concepts, it is suggested to use semantic search. The proposed method for implementing semantic search is applicable to large volumes of text data and is based on a genetic algorithm. Unlike standard methods for information search, the suggested method allows us to consider the semantics of interrelationships between concepts and to handle interdisciplinary concepts correctly. With the aid of semantic tagging, documents contain concepts that are not present in the user's initial query but are semantically close to the requested concepts. Semantic tagging is performed for each document separately, which provides parallel tagging in several subject areas. By the time the formation of the document's ontological profile is completed, all semantic distances between pairs of distinguished concepts have been calculated. Concepts are considered contextually close if their semantic proximity value is above a certain threshold value that is specified in the search parameters. Building a document's ontological profile is a multicriteria task, since it depends on many characteristics, so genetic algorithms can be used to solve it effectively. The developed genetic algorithm is intended for more accurate distribution of weight coefficients and estimation of the semantic proximity of concepts.
Information Systems, 2009
We seek to leverage an expert user's knowledge about how information is organized in a domain and how information is presented in typical documents within a particular domain-specific collection, to effectively and efficiently meet the expert's targeted information needs. We have developed the semantic components model to describe important semantic content within documents. The semantic components model for a given collection (based on a general understanding of the type of information needs expected) consists of a set of document classes, where each class has an associated set of semantic components. Each semantic component instance consists of segments of text about a particular aspect of the main topic of the document and may not correspond to structural elements in the document. The semantic components model represents document content in a manner that is complementary to full text and keyword indexing. This paper describes how the semantic components model can be used to improve an information retrieval system. We present experimental evidence from a large interactive searching study that compared the use of semantic components in a system with full text and keyword indexing, where we extended the query language to allow users to search using semantic components, to a base system that did not have semantic components. We evaluate the systems from a system perspective, where semantic components were shown to improve document ranking for precision-oriented searches, and from a user perspective. We also evaluate the systems from a session-based perspective, evaluating not only the results of individual queries but also the results of multiple queries during a single interactive query session.
Keyword-based search engines are not able to provide relevant search results because they do not know the meaning of the terms and expressions used in web pages or the relationships between them. This paper compares the semantic search performance of both keyword-based and semantic web based search engines. Initially, two keyword-based search engines (Google and Yahoo) and three semantic search engines (Hakia, DuckDuckGo and Bing) are selected to compare their search performance on the basis of precision ratio and how they handle natural language queries. Ten queries from various topics were run on each search engine, and the first twenty documents of each retrieval output were classified as "relevant" or "non-relevant". Afterwards, precision ratios were calculated for the first 20 documents retrieved to evaluate the performance of these search engines. A comparison of some popular semantic search engines and their features is also provided.
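The evaluation procedure in this abstract (judge the first twenty results per query, then compute a precision ratio) corresponds to the usual precision-at-k measure. The sketch below assumes that reading; the function names and the sample judgments are hypothetical and do not reproduce any figures from the study.

```python
def precision_at_k(relevance, k=20):
    """Fraction of the first k retrieved documents judged relevant.

    `relevance` is a list of booleans in rank order (True = "relevant").
    The count is divided by k, so a result list shorter than k is
    penalized for the missing positions.
    """
    return sum(relevance[:k]) / k


def mean_precision(per_query_relevance, k=20):
    """Average precision-at-k over all queries run on one search engine."""
    scores = [precision_at_k(r, k) for r in per_query_relevance]
    return sum(scores) / len(scores)


# Hypothetical judgments for two queries on one engine.
query_a = [True] * 13 + [False] * 7    # 13 of the first 20 relevant
query_b = [True] * 5 + [False] * 15    # 5 of the first 20 relevant
p_a = precision_at_k(query_a)
engine_score = mean_precision([query_a, query_b])
```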
Proceedings of the …, 2008
We are happy to see that this workshop succeeded in attracting a large number of high quality paper submissions, all of which are targeting one or, most often, multiple of these questions. Overall, the workshop program committee has selected 10 submissions for oral presentation and inclusion in these proceedings.