A LANGUAGE INDEPENDENT APPROACH TO DEVELOP URDU IR SYSTEM

Iram Siraj; MOHD SHAHID HUSAIN

Abstract

This is the era of information technology, and what matters most today is getting the right information at the right time. More and more data repositories are being made available online. Information retrieval (IR) systems, or search engines, are used to access electronic information available on the internet. These systems depend on the available tools and techniques to retrieve content efficiently in response to a user's query. Over the last few years, a wide range of information in Indian regional languages such as Hindi, Urdu, Bengali, Oriya, Tamil and Telugu has been made available on the web as e-data. However, access to these repositories remains low because few efficient search engines or retrieval systems support these languages. We have developed a language-independent system for efficient retrieval of information available in Urdu; the same system can be used for other languages as well. The system achieves a precision of 0.63 and a recall of 0.8.
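
For readers unfamiliar with the two reported measures, the sketch below shows how precision and recall are conventionally computed for a single query. It is a minimal illustration of the standard definitions, not the authors' evaluation code; the function name `precision_recall` and the document IDs in the usage note are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: IDs of the documents the system returned (hypothetical input)
    relevant:  IDs of the documents judged relevant (hypothetical input)
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were returned
    # Precision: share of returned documents that are relevant.
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall: share of relevant documents that were returned.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

For example, if a system returns documents 1, 2, 3 and 4 and the relevant set is 2, 4 and 6, then two of the four returned documents are relevant (precision 0.5) and two of the three relevant documents were found (recall about 0.67).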

Figures (4)

Table 2: Different options for considering inverse document frequency. The options for the document length factor, C, are:

Table 1: Different options for considering term frequency. The options for the inverse document frequency factor, B, are:

Computer Science & Information Technology (CS & IT)

Fig. 1: A vector space model (VSM) representing three documents and a query. The IR system ranks the documents by the closeness of their vectors to the query vector, then returns the top-ranked documents to the user.
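
The ranking step described in this caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes documents and queries are already split into tokens (for example on whitespace, which keeps the code language-independent), it uses raw term frequency with a `log(N/df) + 1` inverse document frequency as one possible weighting, and it measures closeness by cosine similarity. The names `tfidf_vectors`, `cosine` and `rank` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build a sparse tf-idf vector (term -> weight) for each tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Assumed weighting: idf = log(N / df) + 1 (one of several common options).
    idf = {term: math.log(n_docs / count) + 1.0 for term, count in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(docs, query, top_k=3):
    """Return up to top_k (doc_index, score) pairs, most similar first."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms that never occur in the collection get weight 0.
    q_vec = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query).items()}
    scored = [(cosine(q_vec, d_vec), i) for i, d_vec in enumerate(vectors)]
    # Sort by descending score; ties are broken by document order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(i, score) for score, i in scored[:top_k]]
```

A call such as `rank([d.split() for d in documents], query.split())` would return the indices of the closest documents; because tokenization is the only language-specific step, the same code applies to Urdu text or any other whitespace-delimited language.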

Table 4: Dataset specification used for Urdu IR
