1999
Although search over World Wide Web pages has recently received much academic and commercial attention, surprisingly little research has been done on how to search the web pages within large, diverse intranets. Intranets contain the information associated with the internal workings of an organization.
2000
We developed a user interface that organizes Web search results into hierarchical categories. Text classification algorithms were used to automatically classify arbitrary search results into an existing category structure on the fly. A user study compared our new category interface with the typical ranked list interface of search results. The study showed that the category interface is superior in both objective and subjective measures. Subjects liked the category interface much better than the list interface, and they were 50% faster at finding information that was organized into categories. Organizing search results allows users to focus on items in categories of interest rather than having to browse through all the results sequentially.
1993
Users of Web search engines are often forced to sift through the long ordered list of document "snippets" returned by the engines. The IR community has explored document clustering as an alternative method of organizing retrieval results, but clustering has yet to be deployed on most major search engines. The NorthernLight search engine organizes its output into "custom folders" based on pre-computed document labels, but does not reveal how the folders are generated or how well they correspond to users' interests. In this paper, we introduce Grouper, an interface to the results of the HuskySearch meta-search engine, which dynamically groups the search results into clusters labeled by phrases extracted from the snippets. In addition, we report on the first empirical comparison of user Web search behavior on a standard ranked-list presentation versus a clustered presentation. By analyzing HuskySearch logs, we are able to demonstrate substantial differences in the number of documents followed, and in the amount of time and effort expended by users accessing search results through these two interfaces.
2001
I am deeply grateful to my advisor James Allan for his support and guidance through all my work. To paraphrase a Russian proverb, he helped me to see the forest behind the trees: he helped me to stay aware of and appreciate the bigger picture behind the small details of everyday research problems.
2006
While searching the web, the user is often confronted by a great number of results, generally displayed in a list sorted by relevance. Facing the limits of this approach, we propose to explore new organizations and presentations of search results, as well as new types of interactions with the results, to make their exploration more intuitive and efficient. The main topic of this paper is the processing of the results coming from an information retrieval system. Although relevance depends on the quality of the results, effective processing of the results offers an alternative way to improve relevance for the user. Given current expectations, this processing consists of an organization step and a visualization step. The proposed approach organizes the results according to their meaning using a Kohonen Self-Organizing Map (SOM) and visualizes them in a 3D scene to increase the representation space. The 3D metaphor proposed here is a city.
Information Processing & Management, 1993
We have developed a collaborative, reuse hypertext system that has novel browsing and retrieval characteristics. The system, called Many Using and Creating Hypertext (MUCH), has been implemented on a network of UNIX workstations and used extensively in our group. This paper ...
Information Processing and Management, 1996
The paper describes the design and implementation of TACHIR, a tool for the automatic construction of hypertexts for Information Retrieval. Through the use of an authoring methodology employing a set of well-known Information Retrieval techniques, TACHIR automatically builds up a hypertext from a document collection. The structure of the hypertext reflects a three-level conceptual model that has proved to be quite effective for Information Retrieval. Using this model it is possible to navigate among documents, index terms, and concepts using automatically determined links. The hypertext is implemented in HTML, the markup language of the World Wide Web project. It can be distributed across different sites and different machines over the Internet, and it can be navigated using any of the interfaces developed in the framework of the World Wide Web project, for example Netscape.
Proceedings of the first ACM/IEEE-CS joint conference on Digital libraries - JCDL '01, 2001
The potential of automatically generated indexes for information access has been recognized for several decades, but the quantity of text and the ambiguity of natural language processing have made progress at this task more difficult than was originally foreseen. Recently, a body of work on the development of interactive systems to support phrase browsing has begun to emerge. This paper considers two issues related to the use of automatically identified phrases as index terms in a dynamic text browser (DTB), a user-centered system for navigating and browsing index terms: (1) What criteria are useful for assessing the usefulness of automatically identified index terms? (2) Is the quality of the terms identified by automatic indexing such that they provide useful access to document content? The terms this paper focuses on have been identified by LinkIT, a software tool for identifying significant topics in text. Over 90% of the terms identified by LinkIT are coherent and therefore merit inclusion in the dynamic text browser. Terms identified by LinkIT are input to Intell-Index, a prototype DTB that supports interactive navigation of index terms. The distinction between phrasal heads (the most important words in a coherent term) and modifiers serves as the basis for a hierarchical organization of terms. This linguistically motivated structure helps users to efficiently browse and disambiguate terms. The paper concludes that the approach to information access discussed is very promising, and that there is much room for further research. In the meantime, this research is a contribution to the establishment of a solid foundation for assessing the usability of terms in phrase browsing. (Contains 25 references.) (Author/AEF)
1997
With the explosive growth of the Web, one of the biggest challenges in exploiting the wealth of available information is to locate the relevant documents. Search engines play a crucial role in addressing this problem by precompiling a large index of available information to quickly produce a set of possibly relevant documents in response to a query. While most Web users make extensive use of the Internet search engines, few people have more than a vague idea of how these systems work.
International Journal of Computer …, 2010
The World Wide Web is considered the most valuable place for Information Retrieval and Knowledge Discovery. When retrieving information in response to user queries, a search engine returns a large and unmanageable collection of documents. Web mining tools are used to classify, cluster and order the documents so that users can easily navigate through the search results and find the desired information content. A more efficient way to organize the documents can be a combination of clustering and ranking, where clustering groups the documents and ranking orders the pages within each cluster. Based on this approach, this paper proposes a mechanism that provides ordered results in the form of clusters in accordance with the user's query. An efficient page ranking method is also proposed that orders the results according to both the relevancy and the importance of documents. This approach helps users restrict their search to the top documents in the particular clusters that interest them.
International Journal of Human-Computer Studies, 2008
Web directories organize voluminous information into hierarchical structures, helping users to quickly locate relevant information and to support decision-making. The development of existing ontologies and Web directories either relies on expert participation that may not be available or uses automatic approaches that lack precision. As more users access the Web in their native languages, better approaches to organizing and developing non-English Web directories are needed. In this paper, we have proposed a semi-automatic framework, which consists of anchor directory boosting, meta-searching, and heuristic filtering, to construct domain-specific Web directories. Using the framework, we have built a Web directory in the Spanish business (SBiz) domain. Experimental results show that the SBiz Web directory achieved significantly better recall, F-value, efficiency, and satisfaction rating than the benchmark directory. Subjects provided favorable comments on the SBiz Web directory. This research thus contributes to developing a useful framework for organizing domain-specific information on the Web and to providing empirical findings and useful insights for end-users, system developers, and researchers of Web information seeking and knowledge management.
Journal of Information Science, 2007
Context has long been considered very useful to help the user assess the actual relevance of a document. In web searching, context can help assess the relevance of a web page by showing how the page is related to other pages in the same web site, for example. Such information is very difficult to convey and visualize in a user friendly way. In this paper we present the design, implementation and evaluation of a graphical visualization tool aimed at helping users to determine the relevance of a web page by displaying the structure of the web site the page belongs to. The results of an initial evaluation suggest that this visualization technique helps the user navigate large web sites and find useful information in an effective way, without increasing the cognitive load of the user.
Computer Networks, 1999
Users of Web search engines are often forced to sift through the long ordered list of document 'snippets' returned by the engines. The IR community has explored document clustering as an alternative method of organizing retrieval results, but clustering has yet to be deployed on most major search engines. The NorthernLight search engine organizes its output into 'custom folders' based on pre-computed document labels, but does not reveal how the folders are generated or how well they correspond to users' interests. In this paper, we introduce Grouper, an interface to the results of the HuskySearch meta-search engine, which dynamically groups the search results into clusters labeled by phrases extracted from the snippets. In addition, we report on the first empirical comparison of user Web search behavior on a standard ranked-list presentation versus a clustered presentation. By analyzing HuskySearch logs, we are able to demonstrate substantial differences in the number of documents followed, and in the amount of time and effort expended by users accessing search results through these two interfaces.
IEEE Internet Computing, 1997
The World Wide Web is a very large distributed digital information space. From its origins in 1991 as an organization-wide collaborative environment at CERN for sharing research documents in nuclear physics, the Web has grown to encompass diverse information resources: personal home pages; online digital libraries; virtual museums; product and service catalogs; government information for public dissemination; research publications; and Gopher, FTP, Usenet news, and mail servers. Some estimates suggest that the Web currently includes about 150 million pages and that this number doubles every four months.
2003
In the Web search process, people often think that the hardest work is done by the search engines or by the directories that are entrusted with finding the Web pages. While this is partially true, a no less important part of the work is done by the user, who has to decide which pages are relevant among the huge set of retrieved pages. In this paper we present a graphical visualisation tool aimed at helping users to determine the relevance of a Web page with respect to its structure.
Computer, 1997
A Tool for Organizing Web Information. The physical and logical differences among information sources on the Internet complicate information retrieval in several ways.
Proceedings of the American Society for …, 2005
By examining the log files from a corporate intranet search engine, we have analysed the actual web searching behaviour of real users in a real business environment. While building on previous research on public search engines, we apply an alternative session definition that we argue is more appropriate. Our results regarding session length, query construction and result page viewing confirm some of the findings from similar studies carried out on public search engines but further our understanding of web searching by presenting details on corporate users' activities. In particular, we suggest that search sessions are shorter than previously suggested, search queries have fewer terms than observed for public search engines, and the number of examined result pages is smaller than reported in other research. More research on how corporate intranet users search for information is needed.
2001
We developed and evaluated seven interfaces for integrating semantic category information with Web search results. List interfaces were based on the familiar ranked listing of search results, sometimes augmented with a category name for each result. Category interfaces also showed page titles and/or category names, but re-organized the search results so that items in the same category were grouped together visually. Our user studies show that all Category interfaces were more effective than List interfaces, even when lists were augmented with category names for each result. The best category performance was obtained when both category names and individual page titles were presented. Either alone is better than a list presentation, but both together provide the most effective means for allowing users to quickly examine search results. These results provide a better understanding of the perceptual and cognitive factors underlying the advantage of category groupings and provide some practical guidance to Web search interface designers.
IEEE Transactions on Knowledge and Data Engineering, 2002
Classifier for the Internet Resource Discovery (ACIRD), which uses machine learning techniques to organize and retrieve Internet documents. ACIRD consists of a knowledge acquisition process, a document classifier and a two-phase search engine. The knowledge acquisition process of ACIRD automatically learns classification knowledge from classified Internet documents. The document classifier applies learned classification knowledge to classify newly collected Internet documents into one or more classes.
Information processing & management, 1996