Papers by Somnath Banerjee
Human-Computer Interaction with Mobile …, Jan 1, 2003
Online Transcoding of Web Pages for Mobile Devices. Somnath Banerjee, Arobinda Gupta, and Anupam Basu. In L. Chittaro (Ed.): Mobile HCI 2003, LNCS 2795, pp. 271-285. Springer-Verlag Berlin Heidelberg, 2003.

… workshop: Learning to …, Jan 1, 2009
Learning to score, and thereby order, items represented as feature vectors x_i ∈ R^d, with the goal of minimizing various ranking loss functions, is now a major topic in Information Retrieval. Training data consists of relevant and irrelevant documents identified for a number of queries. Many systems train a model w ∈ R^d such that the score of an item x_i is w^T x_i. However, queries are diverse: navigational, informational, transactional, etc. Recently, there has been interest in local learning: customizing a model to each test query. While intuitively appealing, these proposals have either resulted in modest gains or entailed excessive computational burden at test time. In this paper we propose a local learning algorithm based on a new similarity measure between queries. The proposed local learning algorithm does not depend on a fixed query classification scheme. First, we represent the (relevant and irrelevant) document vectors for a query as a point cloud. Second, we define an intuitive notion of similarity between the shapes of two point clouds, based on principal component analysis (PCA). Our local learning algorithm clusters queries at training time using the PCA-based measure of query similarity. At test time, we simply locate the nearest training cluster and use the model trained for that cluster. Very few clusters are adequate to give a substantial boost to test accuracy. Our test time is small, training time is reasonable, and our accuracy beats several recent local learning approaches, as tested on the well-known LETOR dataset.
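The abstract describes the method only at a high level; the exact point-cloud distance and clustering procedure are not given here. Below is a minimal, hypothetical sketch of one plausible reading, in which query similarity is taken as the agreement between the top principal components of two document clouds and queries are grouped by agglomerative clustering. The number of components, the distance function, and the cluster count are illustrative assumptions, not the paper's choices.

```python
# Hypothetical sketch: PCA-based "shape" similarity between per-query document
# clouds, then clustering queries so one ranking model can be trained per cluster.
# The distance function and hyperparameters are assumptions, not the paper's.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import AgglomerativeClustering

def top_components(doc_vectors, k=3):
    """Principal directions of a query's (relevant + irrelevant) document cloud."""
    k = min(k, doc_vectors.shape[0], doc_vectors.shape[1])
    return PCA(n_components=k).fit(doc_vectors).components_   # (k, d), unit-norm rows

def shape_distance(comps_a, comps_b):
    """1 - mean |cosine| between matched principal components of two clouds."""
    k = min(len(comps_a), len(comps_b))
    cos = np.abs(np.sum(comps_a[:k] * comps_b[:k], axis=1))
    return 1.0 - cos.mean()

def cluster_queries(query_clouds, n_clusters=5):
    """query_clouds: list of (n_docs_q, d) arrays, one per training query."""
    comps = [top_components(c) for c in query_clouds]
    n = len(comps)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = shape_distance(comps[i], comps[j])
    labels = AgglomerativeClustering(n_clusters=n_clusters, metric="precomputed",
                                     linkage="average").fit_predict(dist)
    return labels, comps

# At test time: compute the test query's components, find the nearest training
# cluster under shape_distance, and score documents with that cluster's model.
```

One ranker (e.g. a linear model w per cluster) would then be trained on the queries assigned to each cluster, matching the train/test procedure described in the abstract.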

Proceedings of the 32nd …, Jan 1, 2009
Web search is increasingly exploiting named entities like persons, places, businesses, addresses and dates. Entity ranking is also of current interest at INEX and TREC. Numerical quantities are an important class of entities, especially in queries about prices and features related to products, services and travel. We introduce Quantity Consensus Queries (QCQs), where each answer is a tight quantity interval distilled from evidence of relevance in thousands of snippets. Entity search and factoid question answering have benefited from aggregating evidence from multiple promising snippets, but these techniques do not readily apply to quantities. Here we propose two new algorithms that learn to aggregate information from multiple snippets. We show that typical signals used in entity ranking, like rarity of query words and their lexical proximity to candidate quantities, are very noisy. Our algorithms learn to score and rank quantity intervals directly, combining snippet quantity and snippet text information. We report on experiments using hundreds of QCQs with ground truth taken from TREC QA, Wikipedia Infoboxes, and other sources, leading to tens of thousands of candidate snippets and quantities. Our algorithms yield about 20% better MAP and NDCG compared to the best-known collective rankers, and are 35% better than scoring snippets independently of each other.
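The abstract describes learning to score and rank quantity intervals by aggregating evidence across snippets, but does not spell out the features or the learned ranker. As a purely illustrative stand-in, the sketch below enumerates tight intervals around extracted candidate quantities and scores them by relevance-weighted support with a width penalty; the interval construction and scoring rule are assumptions, not the paper's learned model.

```python
# Hypothetical sketch of quantity-interval aggregation for a Quantity Consensus Query.
# A simple unsupervised consensus score stands in for the learned collective ranker.
from dataclasses import dataclass

@dataclass
class Snippet:
    text: str
    quantity: float   # candidate quantity extracted from the snippet
    relevance: float  # e.g. a retrieval score for the snippet

def candidate_intervals(snippets, width_ratio=0.1):
    """Enumerate a tight interval centred on each candidate quantity."""
    for s in snippets:
        half = abs(s.quantity) * width_ratio / 2
        yield (s.quantity - half, s.quantity + half)

def consensus_score(interval, snippets):
    """Relevance-weighted mass of quantities covered, penalising wide intervals."""
    lo, hi = interval
    support = sum(s.relevance for s in snippets if lo <= s.quantity <= hi)
    width_penalty = 1.0 + (hi - lo) / max(abs(hi), abs(lo), 1e-9)
    return support / width_penalty

def best_interval(snippets):
    return max(candidate_intervals(snippets),
               key=lambda iv: consensus_score(iv, snippets))

# Toy usage: two snippets agree on roughly 4 hours; one off-topic quantity is noise.
snips = [Snippet("lasts about 4 hours", 4.0, 0.9),
         Snippet("roughly 4.2 hours of use", 4.2, 0.8),
         Snippet("weighs 300 grams", 300.0, 0.3)]
print(best_interval(snips))   # prints a tight interval around 4 hours
```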
… of the 17th international conference on …, Jan 1, 2008
collaborative filtering, skewed dataset, pLSA
Proceedings of the 31st annual international ACM …, Jan 1, 2008
The World Wide Web has many document repositories that can act as valuable sources of additional data for various machine learning tasks. In this paper, we propose a method of improving text classification accuracy by using such an additional corpus that can easily be obtained ...

Machine Learning and Applications, 2007. ICMLA …, Jan 1, 2007
Inductive transfer is the application of knowledge learned on one set of tasks to improve the performance of learning a new task; for example, it is used to improve generalization on a classification task using models learned on related tasks. In this paper, we show a method of making inductive transfer for text classification more effective using Wikipedia. We map the text documents of the different tasks to a feature space created using Wikipedia, thereby providing some background knowledge of the contents of the documents. We observe that when the classifiers are built using the features generated from Wikipedia, they become more effective in transferring knowledge. An evaluation on the daily classification task on the Reuters RCV1 corpus shows that our method can significantly improve the performance of inductive transfer. Our method was also able to successfully overcome a major obstacle observed in a recent work on a similar setting.
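The abstract does not detail how documents are mapped to the Wikipedia-derived feature space. One common construction for such a mapping is an explicit-semantic-analysis-style projection onto Wikipedia concepts; the sketch below assumes that construction, so the projection, the classifier, and all variable names are illustrative rather than the paper's method.

```python
# Hypothetical sketch: project documents from different tasks into a shared
# Wikipedia "concept" feature space, then train a classifier there so that
# knowledge can transfer across tasks. The ESA-style projection is an assumption.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def build_concept_space(wikipedia_articles):
    """Fit TF-IDF over Wikipedia article texts; each article acts as one concept."""
    vectorizer = TfidfVectorizer(stop_words="english", max_features=50000)
    concept_matrix = vectorizer.fit_transform(wikipedia_articles)  # (n_concepts, n_terms)
    return vectorizer, concept_matrix

def to_concept_features(docs, vectorizer, concept_matrix):
    """Represent each document by its similarity to every Wikipedia concept."""
    doc_term = vectorizer.transform(docs)        # (n_docs, n_terms)
    return doc_term @ concept_matrix.T           # (n_docs, n_concepts)

def transfer_example(wiki_texts, src_docs, src_labels, tgt_docs, tgt_labels):
    """Train on a source task in concept space, evaluate on a target task."""
    vec, concepts = build_concept_space(wiki_texts)
    clf = LogisticRegression(max_iter=1000)
    clf.fit(to_concept_features(src_docs, vec, concepts), src_labels)
    return clf.score(to_concept_features(tgt_docs, vec, concepts), tgt_labels)
```

In this framing, the Wikipedia concept space plays the role of the shared background representation the abstract refers to: because source-task and target-task documents are expressed over the same concepts, a model fitted on one task remains meaningful on the other.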
Proceedings of the 30th annual …, Jan 1, 2007