2011, Proceedings of the 16th international conference on Intelligent user interfaces
A fundamental problem in image retrieval is how to improve text-based retrieval systems, a challenge known as "bridging the semantic gap". Relying on visual similarity to judge semantic similarity can be problematic because of the semantic gap between low-level content and higher-level concepts. One way to overcome this problem, and thus increase retrieval performance, is to incorporate user feedback in an interactive scenario. In our approach, a user issues a query and is then presented with a set of (hopefully) relevant images, from which she selects those that are most relevant. The system then refines its results after each iteration, using late fusion methods, and allows the user to dynamically tune the amount of textual and visual information used to retrieve similar images. We describe how our approach fits into a real-world setting and discuss an evaluation of the results.
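The abstract does not spell out its fusion formula; a minimal late-fusion sketch, assuming normalized per-image scores and a user-tunable weight `alpha` (both names hypothetical), could look like:

```python
def late_fusion(text_scores, visual_scores, alpha=0.5):
    """Combine per-image textual and visual relevance scores.

    alpha = 1.0 relies only on textual evidence, alpha = 0.0 only on
    visual evidence; the user may tune alpha between feedback rounds.
    """
    fused = {}
    for img in set(text_scores) | set(visual_scores):
        t = text_scores.get(img, 0.0)
        v = visual_scores.get(img, 0.0)
        fused[img] = alpha * t + (1 - alpha) * v
    # Return image ids ranked by fused score, best first.
    return sorted(fused, key=fused.get, reverse=True)
```

Sliding `alpha` toward 1.0 reproduces a purely textual ranking, toward 0.0 a purely visual one, which matches the dynamic text/visual trade-off the abstract describes.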
Tools in Artificial Intelligence, 2008
We propose in this paper the specification of an image retrieval architecture based on a relevance feedback framework which operates on high-level image descriptions instead of their extracted low-level features. This framework features a conceptual model which integrates visual semantics as well as symbolic relational characterizations and operates on image objects, abstractions of visual entities within a physical image. It also manipulates a rich query language, consisting of both Boolean and quantification operators, which leads to optimized user interaction and increased retrieval performance. Let us first introduce the context of our research. In order to cope with the storage and retrieval of ever-growing digital image collections, the first retrieval systems (cf. [Smeulders et al. 00] for a review of the state of the art), known as content-based, proposed fully automatic processing methods based on low-level signal features (color, texture, shape...). Although they allow the fast processing of queries, they do not make it possible to search for images based on their semantic content and consider, for example, red apples and Ferraris to be the same entities simply because they have the same color distribution. Failing to relate low-level features to semantic characterization (also known as the semantic gap) has slowed down the development of such solutions since, as shown in [Hollink 04], taking into account aspects related to the image content is of prime importance for efficient retrieval. Also, users are more skilled at defining their information needs using language-based descriptors and would therefore rather be given the possibility to differentiate between red roses and red cars. In order to overcome the semantic gap, approaches developed within the European Fermi project proposed to model the semantic and signal contents of images through a rigorous process of human-assisted indexing [Mechkour 95] [Meghini et al. 01].
These approaches, based on elaborate knowledge-based representation models, provide satisfactory results in terms of retrieval quality but are not easily usable on large collections of images because of the human intervention required for indexing. Automated systems which attempt to deal with the semantics/signal integration (e.g. iFind [Lu et al. 00] and the prototype presented in [Zhou & Huang 02]) propose solutions based on textual annotations to characterize semantics and on a relevance feedback (RF) scheme operating on low-level features. RF techniques are based on an interaction with a user providing judgment on displayed images as to whether, and to what extent, they are relevant or irrelevant to his need. For each loop of the interaction, these images are learnt and the system tries to display images close in similarity to the ones targeted by the user. Like any learning process, it requires a large number of training images to achieve reasonable performance. The user is therefore solicited through several tedious and time-consuming loops to provide feedback for the system in real time, which penalizes user interaction and involves costly computations over the whole set of images. Moreover, starting from a textual query on semantics, these state-of-the-art systems are only able to manage opaque RF (i.e. a user selects relevant and/or non-relevant documents and is then proposed a revised ranking without being given the possibility to 'understand' how his initial query was transformed), since it operates on extracted low-level features. Finally, these systems do not take into account the relational spatial information between visual entities, which affects the quality of the retrieval results. Our RF process is a specific case of state-of-the-art RF frameworks that reduces the user's burden, since it involves a single loop returning the relevant images.
Moreover, as opposed to the opacity of state-of-the-art RF frameworks, it holds the advantage of being transparent (i.e. the system displays the query generated from the selected documents) and penetrable (i.e. the generated query can be modified before processing), which increases the quality of retrieval results. Through the use of a symbolic representation, the user is indeed able to visualize and comprehend the intelligible query being processed. We manage transparent and penetrable interactions by considering a conceptual representation of images and model their conveyed visual semantics and relational information through a high-level and expressive representation formalism. Given a user's feedback (i.e. judgment of relevance or irrelevance), our RF process, operating on both visual semantics and relational spatial characterization, is therefore able to first generate and then display a query for possible further modification by the user. It enforces computational efficiency by generating a symbolic query instead of dealing with costly learning algorithms, and optimizes user interaction by displaying this 'readable' symbolic query instead of operating on hidden low-level features. As opposed to state-of-the-art loosely-coupled solutions penalizing user interaction and retrieval performance with an opaque RF framework operating on low-level features, our architecture combines a keyword-based module with a transparent and penetrable RF process which refines the former's retrieval results. Moreover, we offer a rich query language consisting of several Boolean operators. At the core of our work is the notion of image objects (IOs), abstract structures representing visual entities within an image. Their specification is an attempt to operate beyond simple low-level signal features, since IOs convey both semantic and relational information.
In the remainder, we first detail, in section 2, the processes that abstract extracted low-level features into high-level relational descriptions. Section 3 deals with the visual semantic characterization. We specify in section 4 the image model and develop its conceptual instantiation integrating visual semantics and relational (spatial) features. Section 5 is dedicated to the presentation of the RF framework. Taking into account spatial relations between semantically-defined visual entities is crucial in the framework of an image retrieval system since it enriches the index structures and
This chapter describes several approaches for information fusion that have been used in ImageCLEF over the past seven years. In this context, the fusion of information mainly means combining textual and visual retrieval. Data fusion techniques from 116 papers (62% of ImageCLEF working notes) are categorized, described and discussed. Three general approaches were observed, which can be categorized by the system level at which modalities are combined: 1) at the input of the system, with inter-media query expansion; 2) internally to the system, with early fusion; and 3) at the output of the system, with late fusion, which is by far the most widely used fusion strategy.
2002
As current methods for content-based retrieval are incapable of capturing the semantics of images, we experiment with using spectral methods to infer a semantic space from users' relevance feedback, so that our system will gradually improve its retrieval performance through accumulated user interactions. In addition to the long-term learning process, we also model the traditional approaches to query refinement using relevance feedback as a short-term learning process. The proposed short- and long-term learning frameworks have been integrated into an image retrieval system. Experimental results on a large collection of images have shown the effectiveness and robustness of our proposed algorithms.
IEEE Transactions on Multimedia, 2000
The main goal of this work is to show the improvement obtained by combining textual pre-filtering with image re-ranking in a Multimedia Information Retrieval task. The defined three-step retrieval process and a well-selected combination of visual and textual techniques help the developed Multimedia Information Retrieval System overcome the semantic gap for a given query. In the paper, five different late semantic fusion approaches are discussed and evaluated in a realistic multimedia retrieval scenario, such as the one provided by the publicly available ImageCLEF Wikipedia Collection.
2003
A major bottleneck in content-based image retrieval (CBIR) systems or search engines is the large gap between the low-level image features used to index images and the high-level semantic contents of images. One solution to this bottleneck is to apply relevance feedback to refine the query or similarity measures in the image search process. In this paper, we first address the key issues involved in relevance feedback for CBIR systems and present a brief overview of a set of commonly used relevance feedback algorithms. We then present a framework of relevance feedback and semantic learning in CBIR; almost all previously proposed methods fit well into this framework. In this framework, low-level features and keyword annotations are integrated in image retrieval and in feedback processes to improve retrieval performance. We have also extended the framework to a content-based web image search engine, in which hosting web pages are used to collect relevant annotations for images and users' feedback logs are used to refine annotations. A prototype system has been developed to evaluate our proposed schemes, and our experimental results indicate that our approach outperforms traditional CBIR systems and relevance feedback approaches.
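One commonly used family of relevance feedback algorithms of the kind this overview refers to is feature re-weighting: the query moves to the mean of the relevant examples, and each feature dimension is weighted by the inverse of its spread among them. This is a generic sketch of that technique, not the paper's own method; the function names are hypothetical.

```python
import math

def reweight(relevant_feats):
    """Given feature vectors of images marked relevant by the user,
    move the query to their mean and weight each feature dimension by
    the inverse of its standard deviation: dimensions that are stable
    across relevant images are judged more important."""
    n = len(relevant_feats)
    dim = len(relevant_feats[0])
    query = [sum(f[d] for f in relevant_feats) / n for d in range(dim)]
    weights = []
    for d in range(dim):
        var = sum((f[d] - query[d]) ** 2 for f in relevant_feats) / n
        weights.append(1.0 / (math.sqrt(var) + 1e-6))  # epsilon avoids /0
    return query, weights

def weighted_distance(query, weights, feat):
    # Weighted squared Euclidean distance used to re-rank the collection.
    return sum(w * (q - x) ** 2 for w, q, x in zip(weights, query, feat))
```

After each feedback round the collection is re-ranked by `weighted_distance`, so a dimension on which all relevant images agree dominates the similarity measure.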
IEEE Transactions on Circuits and Systems for Video Technology, 2003
International Journal of Computer and Electrical Engineering, 2010
In this paper, a new system of fuzzy relevance feedback for image retrieval is introduced. In conventional CBIR systems, users are restricted to binary labeling of the retrieval results, a determination that is difficult for semantically rich images. In the proposed system, we accumulate user interactions using a soft feedback model to construct a Fuzzy Transaction Repository (FTR). The repository remembers the user's intent and therefore provides a better representation of each image in the database in terms of its semantic meaning. To best exploit the benefits of user feedback, we improved the proposed system so that the repository remembers the user's intent in a suitable manner (as a structure-based fuzzy transaction repository) and provides an accurate representation of each image in the database. The semantic similarity between the query and each database image can then be computed using the current feedback and the semantic values in the FTR. Furthermore, feature re-weighting is applied to the session-term feedback in order to learn the weights of low-level features. These two similarity measures are normalized and combined to form the overall similarity measure. Our experimental results show that the average precision of the proposed systems exceeds 83% after three iterations.
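The abstract's final combination step (normalize two similarity measures, then merge them) can be sketched as a min-max normalization followed by a weighted sum. This is a generic illustration under assumed names (`w_sem` is hypothetical), not the paper's exact formula:

```python
def overall_similarity(semantic_sim, feature_sim, w_sem=0.6):
    """Min-max normalize each similarity list to [0, 1], then combine
    them with a weighted sum; w_sem weights the semantic component."""
    def norm(xs):
        lo, hi = min(xs), max(xs)
        if hi == lo:              # all values equal: no ranking signal
            return [0.0] * len(xs)
        return [(x - lo) / (hi - lo) for x in xs]
    s, f = norm(semantic_sim), norm(feature_sim)
    return [w_sem * a + (1 - w_sem) * b for a, b in zip(s, f)]
```

Normalizing before combining matters because the semantic values from the repository and the low-level feature similarities live on different scales.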
2013
Content-Based Image Retrieval (CBIR) systems allow users to perform searches in large image repositories, and CBIR has become one of the most active research areas in the past few years. In CBIR, images are retrieved based on color, texture and shape (low-level perception). The gap between user semantics (high-level perception/concepts) and low-level perception is called the 'semantic gap'. Relevance Feedback (RF) learns the association between high-level semantics and low-level features. While existing research efforts establish the basis of CBIR, the usefulness of the proposed approaches is limited; specifically, they have relatively ignored two distinct characteristics of CBIR systems: the semantic gap and the subjectivity of human perception of visual content. In this paper, we address several aspects of the system. First, we analyze the nature of the RF problem in a continuous representation space in the context of image retrieval. Second, we propose an RF-based interactive retrieval approach which effectively takes these two characteristics into account. During the retrieval process, the user's high-level query and perception subjectivity are captured by weights that are dynamically updated from the user's feedback. Finally, in the proposed system the user can view and understand the relevance level of the retrieved images with respect to the given query image. The proposed approach greatly reduces the user's effort in composing a query, captures the user's information need more precisely, and reduces user intervention in the CBIR system.
IEEE Multimedia, 2002
We're interested in using keywords and visual content together in image retrieval. We used a seamless joint querying and relevance feedback scheme based on keywords and low-level visual content, incorporating keyword similarities. We developed an algorithm for a learned word similarity matrix and conducted experiments that validated our approach.
2006
Relevance feedback (RF) has been extensively studied in the content-based image retrieval community.
2006
Although relevance feedback has been extensively studied in content-based image retrieval in academia, no commercial web image search engine has employed the idea. There are several obstacles for web image search engines in applying relevance feedback. To overcome these obstacles, we propose an efficient implicit relevance feedback mechanism. The proposed mechanism has advantages over traditional relevance feedback methods in three aspects. First, instead of forcing users to make explicit judgments on the results, our method treats users' click-through data as implicit relevance feedback, which relieves users of this burden. Second, a hierarchical clustering algorithm for image search results is proposed to semantically organize the results; using the clustering results as features, our relevance feedback scheme can capture and reflect users' search intentions precisely. Last, unlike traditional relevance feedback interfaces, which abruptly substitute refined results for previous ones, our method employs friendly recommendation rather than substitution to let the user narrow down to the refined images. To evaluate the implicit relevance feedback mechanism, comprehensive user studies were performed.
2006
In this paper an effective context-based approach for interactive similarity queries is presented. By exploiting the notion of image "context", it is possible to associate different meanings to the same query image. This is indeed necessary to model complex query concepts that, due to their nature, cannot be effectively represented without contextualizing the target image. The context model is simple yet effective and consists of a set of significant images (possibly not relevant to the query) that describe the semantic meaning the user is interested in. When feedback is present, the query context assumes a dynamic nature, changing over time depending on the actual retrieved images judged as relevant by the user for her current search task. Moreover, the proposed approach is able to complement the role of relevance feedback by persistently maintaining the query parameters determined through user interaction over time and ensuring search efficiency. Experimental results on a database of about 10,000 images show the high quality contribution of the proposed approach.
In multimedia information retrieval, late semantic fusion is used to combine textual pre-filtering with image re-ranking. The retrieval process consists of three steps, combining visual and textual techniques to help the developed Multimedia Information Retrieval System minimize the semantic gap for a given query. In the paper, different late semantic fusion approaches (Product, Enrich, MaxMerge and FilterN) are used, and experiments are carried out on the publicly available ImageCLEF Wikipedia Collection.
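The pre-filter-then-re-rank pipeline described here and in the earlier ImageCLEF Wikipedia abstract can be sketched in two steps: keep only the top-k images by textual score, then reorder just that shortlist by visual similarity. A minimal sketch, with hypothetical names and assuming higher scores mean more relevant:

```python
def prefilter_then_rerank(text_scores, visual_scores, k=100):
    """Step 1: textual pre-filter keeps the top-k images by text score.
    Step 2: visual re-ranking reorders only that shortlist."""
    shortlist = sorted(text_scores, key=text_scores.get, reverse=True)[:k]
    return sorted(shortlist,
                  key=lambda img: visual_scores.get(img, 0.0),
                  reverse=True)
```

The textual filter restricts the candidate set to images that are at least topically plausible, so the visual re-ranker cannot promote images that merely look similar but are semantically unrelated, which is the sense in which this pipeline narrows the semantic gap.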
2000
The need to retrieve visual information from large image and video collections is shared by many application domains. This paper describes the main features of our image search engine, Quicklook. Quicklook allows the user to query image and video databases with the aid of example images or a user-made sketch, and to progressively refine the system's response by indicating the relevance or non-relevance of the retrieved items.
2002
Database search engines are generally used in a one-shot fashion in which a user provides query information to the system and, in return, the system provides a number of database instances to the user. A relevance feedback system allows the user to indicate to the system which of these instances are desirable, or relevant, and which are not. Based on this feedback, the system modifies its retrieval mechanism in an attempt to return a more desirable instance set to the user.
Intelligent Systems Reference Library, 2013
Pattern Recognition (ICPR), …, 2010
In this paper, classical approaches such as the maximum of scores (combMAX), the sum of scores (combSUM) and the sum of scores multiplied by the number of non-zero scores (combMNZ) were employed, and the trade-off between two fusion effects (the chorus and dark horse effects) was studied based on the sum of n maximums. Various normalization strategies were tried out. The fusion algorithms are evaluated using the best four visual and textual runs of the ImageCLEF medical image retrieval task of 2008 and 2009. The results show that fused runs outperform the best original runs and that multimodality fusion statistically outperforms single-modality fusion. The logarithmic rank penalization proves to be the most stable normalization. The dark horse effect competes with the chorus effect, and either can produce the best fusion performance depending on the nature of the input data.
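The three classical combination rules named above are standard data fusion operators; a compact sketch (function name hypothetical, scores assumed non-negative) is:

```python
def comb_fuse(runs, method="combSUM"):
    """Fuse several per-document score dicts (one dict per retrieval run).

    combMAX: maximum score across runs.
    combSUM: sum of scores across runs.
    combMNZ: combSUM multiplied by the number of runs that returned
             the document with a non-zero score.
    """
    docs = set().union(*runs)
    fused = {}
    for doc in docs:
        scores = [run.get(doc, 0.0) for run in runs]
        nonzero = sum(1 for s in scores if s > 0)
        if method == "combMAX":
            fused[doc] = max(scores)
        elif method == "combSUM":
            fused[doc] = sum(scores)
        elif method == "combMNZ":
            fused[doc] = sum(scores) * nonzero
        else:
            raise ValueError(f"unknown method: {method}")
    return fused
```

combMNZ rewards documents retrieved by many runs (the chorus effect), whereas combMAX lets a single run that scores a document very highly dominate (the dark horse effect), which is exactly the trade-off the chapter studies.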
Advances in neural information processing …, 1999
IEEE Transactions on Circuits and Systems for Video Technology, 1998
Content-Based Image Retrieval (CBIR) has become one of the most active research areas in the past few years. Many visual feature representations have been explored and many systems built. While these research efforts establish the basis of CBIR, the usefulness of the proposed approaches is limited. Specifically, these efforts have relatively ignored two distinct characteristics of CBIR systems: (1) the gap between high-level concepts and low-level features; (2) the subjectivity of human perception of visual content. This paper proposes a relevance feedback based interactive retrieval approach, which effectively takes into account the above two characteristics in CBIR. During the retrieval process, the user's high-level query and perception subjectivity are captured by dynamically updated weights based on the user's feedback. The experimental results over more than 70,000 images show that the proposed approach greatly reduces the user's effort of composing a query and captures the user's information need more precisely. Keywords: Content-Based Image Retrieval, interactive multimedia processing, relevance feedback