This book introduces the principles and advancements in the field of information retrieval, focusing on web search engines and their evolution. It highlights the shift in user preferences towards digital information sources over traditional ones and outlines the educational framework for understanding core concepts, including inverted indexes, term weighting, and ranking algorithms. Aimed at graduate students and advanced undergraduates, the work provides a structured approach to mastering the basics of information retrieval along with its practical applications.
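The core concepts named above (inverted indexes, term weighting, and ranking) can be illustrated with a minimal sketch. This is not the book's implementation: the function names `build_inverted_index` and `rank`, the toy documents, and the choice of plain tf × idf weighting with whitespace tokenization are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def build_inverted_index(docs):
    """Map each term to a postings dict {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Whitespace tokenization is an assumption; real systems normalize more.
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index


def rank(query, index, n_docs):
    """Score documents by the sum of tf * idf over the query terms."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in the collection
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical three-document collection.
docs = {
    "d1": "web search engines rank web pages",
    "d2": "inverted index maps terms to documents",
    "d3": "search engines build an inverted index",
}
index = build_inverted_index(docs)
ranking = rank("inverted index search", index, len(docs))
print([doc_id for doc_id, _ in ranking])
```

In this toy collection, `d3` ranks first because it is the only document containing all three query terms.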
Web Data Mining, 2011
Information retrieval is the process of searching within a document collection for information most relevant to a user's query. However, the type of document collection significantly affects the methods and algorithms used to process queries. In this chapter, we distinguish between two types of document collections: traditional and Web collections. Traditional information retrieval is search within small, controlled, nonlinked collections (e.g., a collection of medical or legal documents), whereas Web information retrieval is search within the world's largest and linked document collection. In spite of the proliferation of the Web, more traditional nonlinked collections still exist, and there is still a place for the older methods of information retrieval.
In this paper we review studies of the growth of the Internet and technologies that are useful for information search and retrieval on the Web. We present data on the Internet from several different sources, e.g., current as well as projected number of users, hosts, and Web sites. Although numerical figures vary, overall trends cited by the sources are consistent and point to exponential growth in the past and in the coming decade. Hence it is not surprising that about 85% of Internet users surveyed claim to use search engines and search services to find specific information. The same surveys show, however, that users are not satisfied with the performance of the current generation of search engines; slow retrieval speed, communication delays, and poor quality of retrieved results (e.g., noise and broken links) are commonly cited problems. We discuss the development of new techniques targeted to resolve some of the problems associated with Web-based information retrieval, and speculate on future trends.
Decision Support Systems, 1999
Internet-based information systems have limited reliability. For example, Internet search engines provide results of limited reliability, and the data available on the Internet is itself of limited reliability. The purpose of this paper is therefore to elicit the sources of this lack of reliability, develop a model that can be used to study the impact of reliability, and propose some solutions to mitigate reliability issues. The model couches Internet data as an "intermediary report." For example, use of a search engine will generate an intermediary "report" providing a list of relevant universal resource locators (URLs) and a corresponding brief description that may or may not correctly describe the label being searched. This "report" structure is used to model Internet information and retrieval systems as an intermediate step between users of the system and the original or expected information. The basic model of information relevance in the information retrieval process is reviewed, where the precision is a function, in part, of the recall and fallout rate. Reliability is found to have an impact on precision and fallout rates. Alternatives are proposed to mitigate the impact of this lack of reliability.
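The statement that precision depends in part on recall and fallout can be made concrete with the standard identity that also involves generality (the fraction of the collection that is relevant): precision = recall × G / (recall × G + fallout × (1 − G)). The sketch below illustrates that textbook relation, not the paper's reliability model; the function name `precision_from_rates` and the counts are hypothetical.

```python
def precision_from_rates(recall, fallout, generality):
    """Precision implied by recall, fallout, and generality.

    recall     = retrieved relevant / all relevant
    fallout    = retrieved non-relevant / all non-relevant
    generality = all relevant / collection size
    """
    retrieved_relevant = recall * generality
    retrieved_nonrelevant = fallout * (1.0 - generality)
    total_retrieved = retrieved_relevant + retrieved_nonrelevant
    if total_retrieved == 0:
        raise ValueError("nothing retrieved; precision is undefined")
    return retrieved_relevant / total_retrieved


# Hypothetical search: 1000 documents, 50 relevant; the engine returns
# 40 relevant and 60 non-relevant documents.
n_docs, n_relevant = 1000, 50
hits, false_hits = 40, 60

recall = hits / n_relevant                      # 0.8
fallout = false_hits / (n_docs - n_relevant)    # 60 / 950
generality = n_relevant / n_docs                # 0.05

precision = precision_from_rates(recall, fallout, generality)
direct_precision = hits / (hits + false_hits)   # computed from raw counts
print(round(precision, 6), round(direct_precision, 6))
```

Both routes give the same precision, which shows why a rise in fallout (for instance, from unreliable result descriptions) lowers precision even when recall is unchanged.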
From very early on, the human species has felt the need to keep records of its activity so that they can easily be consulted in the future. Our own evolution depends, to a large extent, on this iterative process in which each iteration builds on these records. The emergence of the Web and its success significantly increased the availability of information, which quickly became ubiquitous. However, the absence of editorial control gives rise to great heterogeneity in several respects. Traditional information retrieval techniques prove insufficient for this new medium. Web information retrieval is the natural evolution of the field of information retrieval to the Web medium. In this article we present a retrospective and, we hope, comprehensive analysis of this area of human knowledge.
2010
Search Engines: Information Retrieval in Practice. W. Bruce Croft (University of Massachusetts, Amherst), Donald Metzler (Yahoo! Research), Trevor Strohman (Google Inc.). Contents: Search Engines and Information Retrieval; 1.1 What Is Information Retrieval?
Computing in Science and Engineering, 2004
The first Web information services were based on traditional information retrieval algorithms, which were originally developed for smaller, more coherent collections than the Web. Due to the Web's continued growth, today's Web searches require new techniques, for example ones that exploit or extend linkages among Web pages.
The Internet is one of the main sources of information for millions of people. One can find information on practically any subject on the Internet. Moreover, a search for a particular topic may return thousands of Web pages related to it. The main concern is to find the relevant Web pages within that collection. This paper discusses how information is retrieved from the Web and the effort this retrieval requires from both the system and its users.
2008
Searching for information is commonly an individual task that aims at satisfying an information need. To do that, one may go to a library or surf the Web in order to find relevant information. Indeed, due to the large number of available documents, the Web has become a favorite information source for solving daily information needs. An issue remains: the Web is in perpetual evolution, so the problem is less the existence of relevant information than the way users find it. One may compare searching for information on the Web with "looking for a needle in a haystack." Searching the Web thus suffers from many limits that can be reduced by using a search assistant, which helps the user to find relevant information on the Web. At the beginning, those assistants principally helped each user individually. Nowadays, we are witnessing the rise of social approaches in such systems. These systems help users to find relevant information by drawing on other users' experience and shared information, so that each user is helped by the crowd. This chapter traces this evolution of search assistants and is organized as follows. Section 1 introduces the underlying concepts and limits of the traditional information search process and its application to the Web. Section 2 explains the search assistant concept by detailing its evolution from individual to social approaches. Sections 3 to 5 present current approaches that search assistants may use to help any user to query and browse the Web as well as to improve search-related activities. To conclude, future trends for Web information assistants are discussed.
Contemporary Issues
Since the 19th century, the world has witnessed an exponential growth in the number and variety of information products, sources, and services. This development has resulted in technological innovations for faster and more efficient processing and storage of information, as individuals and organisations strive to keep up with increasing demands. The value of information organisation cannot be overemphasized. The volume of information generated, transmitted and stored is of such immense proportion that without adequate organisation, the retrieval process would be cumbersome and frustrating. This chapter will highlight and describe the roles of an information retrieval system and the context of information organisation in several institutions. It will also discuss the various information retrieval tools and the different models used in the information retrieval process. The ultimate goal of this chapter is to enable students, practicing librarians, and others interested in information services to understand the concepts, principles, and tools behind information organisation and retrieval. The conclusion of the chapter will emphasize the need for continuous evaluation of these principles and tools for sustained improvement.
2007
Since its inception in the late 1950s, the field of Information Retrieval (IR) has developed tools that help people find, organize, and analyze information. The key early influences on the field are well-known. Among them are H. P. Luhn's pioneering work, the development of the vector space retrieval model by Salton and his students, Cleverdon's development of the Cranfield experimental methodology, Spärck Jones' development of idf, and a series of probabilistic retrieval models by Robertson and Croft.
University of Surrey, 2005
The World Wide Web contains large sets of information. This characteristic, however, can become a real pain for users who seek sources that are both high-quality and relevant to their information needs. In this final year project we examine some information retrieval methods over Web-stored information. The main focus is on whether and how software agents could enhance the information retrieval process. Another topic that we examine in this final year project is the requirements, phases and evaluation process that are necessary in the software design and production process.
IEEE Internet Computing, 1997
The World Wide Web is a very large distributed digital information space. From its origins in 1991 as an organization-wide collaborative environment at CERN for sharing research documents in nuclear physics, the Web has grown to encompass diverse information resources: personal home pages; online digital libraries; virtual museums; product and service catalogs; government information for public dissemination; research publications; and Gopher, FTP, Usenet news, and mail servers. Some estimates suggest that the Web currently includes about 150 million pages and that this number doubles every four months.
ACM SIGIR Forum, 2012
During a three-day workshop in February 2012, 45 Information Retrieval researchers met to discuss long-range challenges and opportunities within the field. The result of the workshop is a diverse set of research directions, project ideas, and challenge areas. This report describes the workshop format, provides summaries of broad themes that emerged, includes brief descriptions of all the ideas, and provides detailed discussion of six proposals that were voted "most interesting" by the participants. Key themes include the need to: move beyond ranked lists of documents to support richer dialog and presentation, represent the context of search and searchers, provide richer support for information seeking, enable retrieval of a wide range of structured and unstructured content, and develop new evaluation methodologies.
Without growing sources of new retrievable academic and practical knowledge, society will become poorer. This paper asks: what future model is suitable for retrieval from the digital library? What kind of Information Retrieval (IR) education does research and teaching in Higher Education demand?
2012
This paper describes a brief history of the research and development of information retrieval systems starting with the creation of electromechanical searching devices, through to the early adoption of computers to search for items that are relevant to a user's query. The advances achieved by information retrieval researchers from the 1950s through to the present day are detailed next, focusing on the process of locating relevant information. The paper closes with speculation on where the future of information retrieval lies.
Bulletin of the Medical Library Association, 1960
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '99, 1999
Efforts to improve Web search facilities call for improved understanding of user characteristics. We investigated the types of knowledge that are relevant for web-based information seeking, along with the knowledge structures and related strategies. In an exploratory field experiment, 12 established Internet experts were first interviewed about search strategies and then performed a series of realistic search tasks on the WWW. Based on this preliminary study a model of information searching on the WWW was derived and tested in a second study. In the second experiment two classes of potentially relevant types of knowledge were directly compared. Using a series of search tasks in an economics-related domain (introduction of the EURO currency) we investigated the effects of Web experience and domain-specific background knowledge on search strategies. We found independent and combined effects of both Web experience and domain knowledge, hinting at the importance of considering both types of expertise as cognitive factors in web-based searches.
2016
The World-Wide Web is developing very fast. Currently, finding useful information on the Web is a time-consuming process. The search is still potentially combinatorially explosive, so we put a resource limitation on search activity. This limit is expressed as a maximum number of accesses to non-local Web nodes per minute. Current information retrieval tools mostly use keyword search, which is an unsatisfactory option because of its low precision and recall.
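A limit expressed as a maximum number of accesses per minute could be enforced with a sliding-window counter, as in the minimal sketch below. The abstract does not describe its mechanism, so the class name `AccessBudget`, the 60-second window, and passing timestamps in explicitly (to keep the example deterministic) are assumptions.

```python
from collections import deque


class AccessBudget:
    """Allow at most `max_per_minute` remote accesses in any 60-second window."""

    def __init__(self, max_per_minute):
        self.max_per_minute = max_per_minute
        self._times = deque()  # timestamps (seconds) of recent granted accesses

    def try_access(self, now):
        """Record and allow an access at time `now`, or refuse it."""
        # Forget accesses that have left the 60-second window.
        while self._times and now - self._times[0] >= 60.0:
            self._times.popleft()
        if len(self._times) >= self.max_per_minute:
            return False  # budget exhausted; the search should stay local or wait
        self._times.append(now)
        return True


# Hypothetical budget of two non-local accesses per minute.
budget = AccessBudget(max_per_minute=2)
results = [budget.try_access(t) for t in (0.0, 10.0, 20.0, 61.0)]
print(results)
```

The third request is refused because two accesses already fall within the window; the fourth is allowed because the access at time 0 has expired by then.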