The article addresses: (1) The design of an information retrieval (IR) tool, the Multilingual Information Retrieval Tool Hierarchy (MIRTH), which works with virtual corpora on the World Wide Web (also known as the Web or WWW). It is motivated by the desire to create a search engine that retrieves information by accessing a virtual corpus. (2) The implementation of a general model of multilingual retrieval for Web searching. It copes with both English and Chinese information retrieval techniques. The paper first addresses some problems of the World Wide Web relating to information retrieval, then introduces some existing information retrieval tools on the Web. The need to create a multilingual search engine is discussed. Next, a general hierarchy of the MIRTH search engine is illustrated. Furthermore, techniques to set up a MIRTH search engine are explored. These include building up data files, the structure of the search engine [Gilster, 1996], and constraints on query syntax. In addition, the means of creating a MIRTH multilingual search engine for Chinese (English) information retrieval are dealt with, and some examples of using the MIRTH search engine are given.
1997
Information processing & …, 2000
In this paper, we present the system MULINEX, a fully implemented system which supports cross-lingual search of the WWW. Users can formulate, expand and disambiguate queries, filter the search results and read the retrieved documents by using only their native language. This multilingual functionality is achieved by the use of dictionary-based query translation, multilingual document categorisation and automatic translation of summaries and documents. The system supports French, German and English and has been installed and tested in the online services of two European internet content and service provider companies. This paper focuses on the techniques and algorithms used in the MULINEX system, explaining how each component works and how it contributes to the overall functionality of the integrated system. The primary system functionalities are outlined from the user perspective, followed by a description of the document database used in the system. The technologies and linguistic resources used in the various system components are then described in detail.
Language-independent information retrieval is one of the major issues in web access for regional populations. This paper addresses the design and implementation of such an information retrieval system. In this system the user can pose a query in any language and retrieve information in any other specified language. The approach involves the design and implementation of a software_morph_parser, which applies natural language processing principles to retrieve information efficiently. The software_morph_parser divides the input search text into individual words and identifies the keywords. The keywords are converted into their root forms by removing all inflections, and the corresponding root words are translated into the target language. The multilingual web database is dynamically indexed by a dyn_crawler, and a search engine is invoked which searches the indexed database and ranks the pages by their relevance to the keyword.
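The query-side steps this abstract describes (split the text into words, pick out keywords, reduce them to roots, translate the roots) can be illustrated with a minimal sketch. It is not the paper's software_morph_parser: the stopword list, the suffix rules, the English-to-French toy lexicon, and the function names are all assumptions made for illustration.

```python
# Hypothetical resources, chosen only for this example.
STOPWORDS = {"the", "a", "of", "in", "for", "and"}
SUFFIXES = ("ing", "s")  # naive inflection endings, tried in order
LEXICON = {"search": "recherche", "engine": "moteur"}  # toy English-to-French dictionary


def to_root(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def translate_query(text):
    """Tokenize, drop stopwords, reduce to roots, then look up each root."""
    words = text.lower().split()
    keywords = [w for w in words if w not in STOPWORDS]
    roots = [to_root(w) for w in keywords]
    # Roots missing from the lexicon are passed through untranslated.
    return [LEXICON.get(root, root) for root in roots]


print(translate_query("Searching the engines"))
```

A real morphological parser would need language-specific rules rather than plain suffix stripping; the sketch only shows the order of the steps.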
Providing the user with relevant information is the ultimate goal of information retrieval. As information on the web is available in many languages, retrieval need not be restricted to a single language. We can pose a query in any language and search for information in documents written in any language. In this paper, different flavors of retrieval with respect to language, such as monolingual, bilingual, cross-lingual and multilingual, are summarized. Different types of resources available to perform the search are also described.
Indonesian Journal of Electrical Engineering and Computer Science
Cross-language information retrieval (CLIR) is a retrieval process in which the user issues queries in one language to retrieve information in another language. The diversity of information and language barriers are serious obstacles to communication and cultural exchange across the world. To overcome such barriers, cross-language information retrieval systems are now in strong demand. CLIR is a subset of Information Retrieval (IR). Information Retrieval deals with finding useful information in a large collection of unstructured, structured and semi-structured data in response to a user query, where the query is a set of keywords. Information Retrieval can be classified into different classes such as monolingual information retrieval, bilingual information retrieval, multilingual information retrieval and cross-language information retrieval. This paper focuses on the various IR variants and techniques used in CLIR systems. Further, based on available literature, a numb...
Information Retrieval, 12(3):227-229, 2009. ISSN 1386-4564. DOI 10.1007/s10791-009-9091-2, 2009
World conference on …, 2008
This paper describes a multilingual semantic search assistant (MSSA) that could help users find proper keywords in FAO full-text search engines. The MSSA implements four independent functions, each associated with the AGROVOC multilingual thesaurus, to assist users in choosing better keywords: a Venn diagram-based Boolean search interface, an animated AGROVOC concept browser, cross-language query expansion for five official FAO languages, and domain-specific synonym expansion. A usability study of cross-language support and domain-specific synonym expansion was performed through a comparative evaluation of precision and recall in the WAICENT Information Finder. The cross-language support functionality yielded a significant increase in recall without any decrease in precision when compared to a simple search. Domain-specific synonym expansion, on the other hand, made no significant difference in precision and recall. We suggest some ways to improve the performance of MSSA.
Advances in Computer- …, 2010
One of the first Multi-Language Information Retrieval (MLIR) systems was implemented in 1969 by Gerard Salton, who enhanced his SMART system to retrieve multilingual documents in two languages, English and German. However, the research field of MLIR is still struggling, since the majority of information retrieval systems are monolingual and, more precisely, English-based, even though English is the native language of only 6% of the world's population. This paper presents a Multi-Language Information Retrieval (MLIR) approach that falls into the area of Domain Specific Information Retrieval (E-learning being the domain). The approach we followed is a synergistic approach between (1) a Thesaurus-based Approach and (2) a Corpus-based Approach. This research has been implemented on a real platform called HyperManyMedia at Western Kentucky University.
2012
Bilingual corpora together with machine learning technology can be used to solve problems in natural language processing. In addition, bilingual corpora are useful for mapping linguistic tags of less popular languages, such as Vietnamese, and for studying comparative linguistics. However, Vietnamese corpora still have some shortcomings, especially English-Vietnamese bilingual corpora. This paper focuses on a searching method for bilingual Internet materials to support establishing an English-Vietnamese bilingual corpus. Based on the benefit of natural language processing toolkits, the system concentrates on using them as a solution for the problem of searching any Internet English-Vietnamese bilingual document without the need for any rules. We propose a method for extracting the main content of webpages without the need for frame of website or source of website before processing. Several other natural language processing tools included in our system are English-Vietnamese machine translation, extracting Vietnamese keywords, search engines, and comparing similar documents. Our experiments show several valuable auto-searching results for the US Embassy and Australian Embassy websites.
The number of web users accessing information over the net is growing rapidly with each passing day. A massive quantity of information in different languages is available on the internet and can be accessed by anyone. Information retrieval (IR) is the technology that deals with locating useful information in a large collection of data, specifically unstructured, structured and semi-structured data. Information retrieval can be classified into different classes, which include monolingual information retrieval, cross-language information retrieval (CLIR) and multilingual information retrieval (MLIR). The world has now become a global village, and the diversity of information and language barriers are the principal obstacles to communication and cultural exchange internationally. To remedy such problems and remove those barriers, cross-language information retrieval (CLIR) systems are in strong demand nowadays. CLIR is the IR system in which the query and the documents may appear in different languages. In this paper we provide an overview of the new application areas of CLIR. We also review the approaches used in CLIR research for query and document translation. Furthermore, we try to identify some of the challenges and issues in CLIR systems.
Journal of the American Society for Information Science, 2000
Language barrier is the major problem that people face in searching for, retrieving, and understanding multilingual collections on the Internet. This paper deals with query translation and document translation in a Chinese-English information retrieval system called MTIR. Bilingual dictionary and monolingual corpus-based approaches are adopted to select suitable translated query terms. A machine transliteration algorithm is introduced to resolve proper name searching. We consider several design issues for document translation, including which material is translated, what roles the HTML tags play in translation, what the tradeoff is between the speed performance and the translation performance, and what form the translated result is presented in. About 100,000 Web pages translated in the last four months of 1997 are used for quantitative study of online and real-time Web page translation.
In this paper, we present a meaning-based search engine that can be used as a multilingual platform for all sorts of search queries. We have used the Universal Networking Language (UNL) as the underlying communicator. We try to overcome the language barrier at the World Wide Web (WWW) level. The WWW is the largest repository of knowledge known, and a language gap here is obviously a big drawback. However, we strongly believe that this gap is surmountable, and the search engine is an early effort in this direction.
International Journal of Innovative Technology and Exploring Engineering
In the era of globalization, the internet, being accessible and affordable, has gained huge popularity and is widely used almost everywhere by governments, private organizations, companies, banks, etc., as well as by individuals. It has empowered its users to contribute to the creation of information on the web in their native languages, which has drastically increased the volume of web-accessible documents available in languages other than English. This exponential growth of information on the internet has also posed several challenges for information retrieval systems. Most present monolingual information retrieval systems can retrieve documents only in the language of the query, missing information in other languages that may be more relevant to the user. The need for information retrieval systems to become multilingual has given rise to research in Cross Language Information Retrieval (CLIR) which can cross the language barriers and ret...
QUILT (Query User Interface with Light Translations) is a prototype implementation of a complete cross-language text retrieval system that takes English queries and produces English gloss translations of Spanish documents. The system indexes the Spanish documents in Spanish, but converts the English query into a Spanish equivalent set through a novel combination of lexical methods and parallel-corpus disambiguation. Similar methods are applied to the returned document to produce a simple translation that can be examined by non-Spanish speakers to gauge the relevance of the document to the original English query. The system integrates traditional, glossary-based machine translation technology with information retrieval approaches and demonstrates that relatively simple term substitution and disambiguation approaches can be viable for cross-language text retrieval. Components of QUILT have been used to build a CLTR interface to WWW-based search services.
1996
Information retrieval in a foreign language requires modification to text and user interfaces. Stemming, word boundary identification, punctuation and stopword identification must all be modified; appropriate input and presentation methods must be provided. But once these interface issues are resolved, the retrieval model and enhancement techniques operate equally effectively in all the languages we have worked with.
Web search is becoming essential for everyday life, where a major need arises for extracting relevant knowledge from the enormous amounts of available data. In modern information retrieval systems, data is modeled as a term-by-document matrix. The user query is represented as a vector, and database search becomes a simple vector operation. The Latent Semantic Indexing (LSI) method reduces the size of the term-by-document matrix and improves the performance of the information retrieval system. The great majority of these systems are based on the English language. Although these systems are applicable to documents in other languages, they can suffer from incomplete term recognition. We focus on languages with a complex set of grammar rules, where improvement can be achieved by giving the indexing system basic knowledge of the language and the ability to recognize different forms of the same word. Using this technique, the original matrix can be reduced by an order of magnitude and important term-document connections strengthened. We are developing a web indexing engine with local language support using Ispell dictionary files. As part of this effort, Croatian language dictionary files have been developed.
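The term-by-document model mentioned in this abstract can be sketched in a few lines. The example below is an illustration under stated assumptions, not the authors' engine: the whitespace tokenizer, the raw term counts, the cosine scoring, and all function names are choices made here, and the LSI dimensionality reduction step is omitted.

```python
import math
from collections import Counter


def build_matrix(docs):
    """Return (vocab, matrix) where matrix[i][j] counts term i in document j."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    counts = [Counter(tokens) for tokens in tokenized]
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix


def query_vector(query, vocab):
    """Represent the query in the same term space as the documents."""
    counts = Counter(query.lower().split())
    return [counts[term] for term in vocab]


def cosine(u, v):
    """Cosine similarity; zero when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(query, docs):
    """Return document indices ordered from most to least similar."""
    vocab, matrix = build_matrix(docs)
    q = query_vector(query, vocab)
    columns = list(zip(*matrix))  # one column per document
    scores = [cosine(q, col) for col in columns]
    return sorted(range(len(docs)), key=lambda j: scores[j], reverse=True)
```

Because matching here is on exact surface forms, inflected variants of a word land in different rows; mapping them to one root, as the abstract proposes, would merge those rows and shrink the matrix.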
IEEE Internet Computing, 1997
The World Wide Web is a very large distributed digital information space. From its origins in 1991 as an organization-wide collaborative environment at CERN for sharing research documents in nuclear physics, the Web has grown to encompass diverse information resources: personal home pages; online digital libraries; virtual museums; product and service catalogs; government information for public dissemination; research publications; and Gopher, FTP, Usenet news, and mail servers. Some estimates suggest that the Web currently includes about 150 million pages and that this number doubles every four months.
Int. J. Comput. Linguistics Chin. Lang. Process., 2000
Electronically available multilingual information can be divided into two major categories: (1) alphabetic language information (English-like alphabetic languages) and (2) ideographic language information (Chinese-like ideographic languages). The information available in non-English alphabetic languages as well as in ideographic languages (especially Japanese and Chinese) has been growing at an incredibly high rate in recent years. Due to the ideographic nature of Japanese and Chinese, complicated by the existence of several encoding standards in use, efficient processing (representation, indexing, retrieval, etc.) of such information has become a tedious task. In this paper, we propose a Han Character (Kanji) oriented Interlingua model of indexing and retrieving Japanese and Chinese information. We report the results of mono- and cross-language information retrieval on a Kanji space where documents and queries are represented in terms of Kanji-oriented vectors. We also employ a dimensionality reduction technique to compute a Kanji Conceptual Space (KCS) from the initial Kanji space, which can facilitate conceptual retrieval of both mono- and cross-language information for these languages. Similar indexing approaches for multiple European languages through term association (e.g., latent semantic indexing) or through conceptual mapping (using a lexical ontology such as WordNet) are being intensively explored. The Interlingua approach investigated here with the Japanese and Chinese languages and the term (or concept) association model investigated with the European languages are similar, and these approaches can be easily integrated. Therefore, the proposed Interlingua model can pave the way for handling multilingual information access and retrieval efficiently and uniformly.
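The idea of representing documents and queries as Han-character vectors can be illustrated with a rough sketch. It is not the paper's model: it assumes the text is already decoded to Unicode (sidestepping the encoding standards the abstract mentions), treats only the basic CJK Unified Ideographs block as Han characters, uses raw counts with cosine similarity, and leaves out the Kanji Conceptual Space reduction. The function names are hypothetical.

```python
import math
from collections import Counter


def is_han(ch):
    """True for characters in the basic CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def han_vector(text):
    """Count Han characters only; kana, Latin letters and punctuation are skipped."""
    return Counter(ch for ch in text if is_han(ch))


def cosine(u, v):
    """Cosine similarity between two sparse character-count vectors."""
    dot = sum(count * v[ch] for ch, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# A query matches a Japanese document through the characters they share.
query = han_vector("研究")
japanese_doc = han_vector("情報検索の研究")
print(cosine(query, japanese_doc) > 0)
```

Characters whose Japanese and Chinese forms have different code points (for example 検 and 检) do not match in this sketch, which is one reason a mapping or reduction step would be needed on top of raw counts.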