MITSUBISHI ELECTRIC RESEARCH LABORATORIES

[Link]

Digital Video Transcoding

Jun Xin, Chia-Wen Lin, Ming-Ting Sun

TR2005-006 January 2005

Abstract
Video transcoding, due to its high practical value for a wide range of networked video applications, has become an active research topic. In this review paper, we outline the technical issues and research results related to video transcoding. We also discuss techniques for reducing the complexity, and techniques for improving the video quality, by exploiting the information extracted from the input video bitstream.

Proceedings of the IEEE

This work may not be copied or reproduced in whole or in part for any commercial purpose. Permission to copy in whole or in part
without payment of fee is granted for nonprofit educational and research purposes provided that all such whole or partial copies include
the following: a notice that such copying is by permission of Mitsubishi Electric Research Laboratories, Inc.; an acknowledgment of
the authors and individual contributions to the work; and all applicable portions of the copyright notice. Copying, reproduction, or
republishing for any other purpose shall require a license with payment of fee to Mitsubishi Electric Research Laboratories, Inc. All
rights reserved.

Copyright © Mitsubishi Electric Research Laboratories, Inc., 2005


201 Broadway, Cambridge, Massachusetts 02139
Published in Proceedings of the IEEE, vol. 93, no. 1, pp. 148-158, Jan. 2005.

Video Adaptation: Concepts, Technologies, and Open Issues

Shih-Fu Chang, Fellow, IEEE, and Anthony Vetro, Senior Member, IEEE

Abstract — Video adaptation is an emerging field that offers a rich body of techniques for answering challenging questions in pervasive media applications. It transforms the input video(s) to an output in video or augmented multimedia form by utilizing manipulations at multiple levels (signal, structural, or semantic) in order to meet diverse resource constraints and user preferences while optimizing the overall utility of the video. There has been a vast amount of activity in research and standards development in this area. This paper first presents a general framework that defines the fundamental entities, important concepts (i.e., adaptation, resource, and utility), and the formulation of video adaptation as constrained optimization problems. A taxonomy is used to classify different types of adaptation techniques. The state of the art in several active research areas is reviewed, with open challenging issues identified. Finally, support for video adaptation from related international standards is discussed.

Index Terms — Video Adaptation, Universal Multimedia Access, Pervasive Media, Transcoding, Summarization, MPEG-7, MPEG-21

I. INTRODUCTION

In pervasive media environments, users may access and interact with multimedia content on different types of terminals and networks. Such an environment includes a rich variety of multimedia terminals such as PCs, TVs, PDAs, or cellular phones. One critical need in such a ubiquitous environment is the ability to handle the huge variation of resource constraints such as bandwidth, display capability, CPU speed, power, etc. The problem is further compounded by the diversity of user tasks – ranging from active information seeking and interactive communication to passive consumption of media content. Different tasks induce different user preferences in presentation styles and formats.

Video adaptation is an emerging field that includes a body of knowledge and techniques responding to the above challenges. A video adaptation tool or system adapts one or more video programs to generate a new presentation, in video or multimedia form, that meets user needs in customized situations. Fig. 1 shows the role of video adaptation in pervasive media environments. It takes into account information about content characteristics, usage environments, user preferences, and digital rights conditions. Its objective is to maximize the utility of the final presentation while satisfying various constraints. Utility represents users' satisfaction with the final presentation and is defined based on application contexts and user preferences.

Video adaptation differs from video coding in its scope and intended application locations. There is a wide variety of adaptation approaches – signal-level vs. structural-level vs. semantic-level, transcoding vs. selection vs. summarization, or bandwidth- vs. power- vs. time-constrained. Adaptation typically takes a coded video as input and produces a different coded video or an augmented multimedia presentation. Another difference is that adaptation is typically deployed at intermediate locations such as a proxy between server and client, although it may be included in the servers or clients in some applications.

There have been many research activities and advances in this field. Earlier work such as [1][2] has explored interesting aspects of adaptation like bandwidth reduction, format conversion, and modality replacement for Web browsing applications. Recently, international standards such as MPEG-7 [20], MPEG-21 [21][24], W3C [21], and TV-Anytime [22] have developed related tools and protocols to support the development and deployment of video adaptation applications.

Despite the burgeoning interest and advances, video adaptation is still a relatively loosely defined field. There has not been a coherent set of concepts, terminologies, or issues defined over well-formulated problems. This paper serves as a preliminary attempt at establishing part of the foundation that can be used to unify and explore various issues and approaches in this field.

Specifically, in Section II we present a general conceptual framework to define the entity, the concepts (resource, utility, and adaptation), and their relations from the perspective of video adaptation. Based on the framework, we present a straightforward but systematic procedure for designing video adaptation solutions, as well as a taxonomy of different classes of adaptation technologies. In Section III, we review current active research areas in video adaptation, with important open issues discussed in Section IV. Support from related international standards is discussed in Section V. Finally, conclusions are presented in Section VI.

Manuscript received January XX, 2004; revised May XX, 2004.
S.-F. Chang is with the Department of Electrical Engineering, Columbia University, New York, NY 10027 USA (sfchang@[Link]).
A. Vetro is with Mitsubishi Electric Research Labs, Cambridge, MA 02139 USA (avetro@[Link]).

Fig. 1 Role of video adaptation in pervasive media environments to support heterogeneous terminals and networks.

Fig. 2 A general conceptual framework for video adaptation and associated concepts of resources and utility.
II. A UNIFIED CONCEPTUAL FRAMEWORK AND TECHNOLOGY TAXONOMY

Design of video adaptation systems involves many complex issues. In this section, we first present a general conceptual framework to clarify and unify the various interrelated issues, as illustrated in Fig. 2. The framework is based on the one we presented in [3], with an extended description of a systematic design procedure and a taxonomy for classifying different adaptation techniques.

First, "entity" is defined to refer to the basic unit of video that undergoes the adaptation process. Entities may exist at different levels, such as pixel, object, frame, shot, scene, and syntactic as well as semantic components. Different adaptation operators can be defined for different types of entities. For example, a video frame can be reduced in resolution or spatial quality, or skipped altogether, in order to reduce the overall bandwidth. A semantic component (such as a story in a news program) can be summarized in a visual or textual form. A subset of shots in a sequence may be removed in order to generate a condensed version of the video, i.e., video skims.

Complex entities can be defined by using additional properties. For example, syntactic entities like recurrent anchor shots in news, pitching shots in baseball, and structured dialog shot sequences in films can be defined by syntactic relations among elements in the video. Semantic entities like scoring events in sports and news stories are caused by real-world events, created by the producer, or formed by the expectations of the viewers. Affective entities are those defined by affect attributes (such as emotion and mood) conveyed by the video elements.

The space of feasible adaptations for a given video entity is called the adaptation space. Note we use the term "space" in a loose way – the coordinates in each dimension represent particular adaptation operations, and a point in the space represents a combination of operations from different dimensions. For example, a popular method for transcoding inter-frame transform-coded video includes two dimensions: (1) dropping a subset of transform coefficients in each frame and (2) skipping a subset of frames in the video sequence.

Each entity is associated with certain resource requirements and utility values. An adaptation operation transforms the entity into a new one and thus changes the associated resources and utility values. Like the adaptation space, there are multiple dimensions in the resource space and the utility space. Resources may include transmission bandwidth (i.e., bit rate), display capabilities (e.g., resolution, color depth), processor speed, power, and memory. Here we focus on the resources available in the usage environment or the delivery network. The information describing the usage environment resources (e.g., maximal channel capacity) can be used to derive implicit constraints and limitations for determining acceptable adaptation operations.

The utility value represents the quality of the video content or users' satisfaction with it. Utility can be measured at different levels – the objective level (e.g., peak signal-to-noise ratio, PSNR), the subjective level (e.g., subjective scores), and the comprehension level. The comprehension-level utility measures viewers' capability in comprehending the semantic information contained in a video. Measurement of such semantic-level comprehension is difficult, as it depends on many factors including users' knowledge, tasks, and domain contexts. In some restricted scenarios, however, it might be possible to come up with measures of generic comprehensibility without deep understanding of the content. Such generic semantics may include generic location (indoor vs. outdoor), people (portrait vs. crowd), time (day vs. night), etc. Again, we use the term utility space to represent the multi-dimensional characteristics of video utility measures.

The utility value of a video entity is not fixed and is heavily affected by user preferences. This is particularly true for the subjective and semantic-level utilities. The subjective relevance of a video entity depends on the user's needs in the current task. The user preferences may also be used to set explicit constraints on the feasible adaptation operations, in addition to the implicit constraints set by the resource limitations described above. For example, if the user prefers to receive video summaries no longer than a certain length, temporal condensation operations will be needed. Or the user may prefer to view videos within a window no larger than a fraction of the screen size, even though the actual display resolution is not a limiting factor for the full-sized video.
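To make these concepts concrete in code, the following is a minimal sketch (not from the paper): the adaptation space is the two-dimensional coefficient-dropping/frame-skipping example above, and the resource and utility models are hypothetical toy formulas, standing in for the estimation models a real system would use.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class AdaptationPoint:
    """A point in a 2-D adaptation space: transform-coefficient
    dropping combined with frame skipping (hypothetical levels)."""
    coeff_drop: int   # 0 = keep all coefficients, 3 = drop most
    frame_skip: int   # 0 = keep all frames, 2 = skip 2 of every 3

def resource(p: AdaptationPoint) -> float:
    """Estimated bit rate (kbps) of the adapted entity (toy model)."""
    base_kbps = 1000.0
    return base_kbps * (1 - 0.2 * p.coeff_drop) / (1 + p.frame_skip)

def utility(p: AdaptationPoint) -> float:
    """Estimated utility (a PSNR-like score, toy model)."""
    return 40.0 - 2.5 * p.coeff_drop - 4.0 * p.frame_skip

# Enumerate the adaptation space and the induced resource/utility values.
space = [AdaptationPoint(c, f) for c, f in product(range(4), range(3))]
for p in space[:3]:
    print(p, round(resource(p)), round(utility(p), 1))
```

Each point in `space` maps to one point in the resource space and one in the utility space, which is exactly the multi-dimensional mapping that Fig. 2 depicts.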

Fig. 3 Taxonomy of video adaptation operations: (a) Transcode, (b) Select/Reduce, (c) Replace, (d) Synthesize.

Given a video entity, the relationships among the adaptation space, the resource space, and the utility space represent critical information for designing content adaptation solutions. For example, in Fig. 2 the shaded cube in the resource space represents the resource constraints imposed by the usage environment. There exist multiple adaptation solutions that satisfy the constraints – we refer to these solutions as the resource-constrained permissible adaptation set. Similarly, different adaptation operators may result in the same utility value; such operators are said to form an equal-utility adaptation set. It is this multi-option situation that makes the adaptation problem interesting – our objective is to choose the optimal operator, with the highest utility or the minimal resource requirement, while satisfying the constraints.

A. Systematic Procedure for Designing Video Adaptation Technologies

The above conceptual framework can be used to guide the design process of practical adaptation solutions. Below, we discuss a systematic procedure that utilizes the concepts and relations of adaptation, resource, and utility.
1. Identify the adequate entities for adaptation, e.g., frame, shot, sequence of shots, etc.
2. Identify the feasible adaptation operators, e.g., requantization, frame dropping, shot dropping, replacement, etc., and their associated parameters.
3. Develop models for measuring and estimating the resource and utility values associated with video entities undergoing the identified operators.
4. Given user preferences and constraints on resource or utility, develop strategies to find the optimal adaptation operator(s) satisfying the constraints.

With the above procedure, many video adaptation problems can be formulated as follows. Given a content entity (e), user preferences, and resource constraints (CR), find the optimal adaptation operation, aopt, within the feasible adaptation region so that the utility of the adapted entity e' is maximized. Similarly, we can formulate other problems in a symmetric way – exploring the utility-constrained permissible adaptation set to find the optimal adaptation operator that satisfies the utility constraints while requiring minimal resources.

B. Video Adaptation Taxonomy

Many interesting adaptation operations have been reported in the literature. To help readers develop a coherent view of the different solutions, we present a simple taxonomy based on the type of manipulation performed. Fig. 3 shows illustrative examples of each class of adaptation.

1) Format Transcoding: A basic adaptation process is to transcode video from one format to another, in order to make the video compatible with the new usage environment. This is not surprising, since many different formats still prevail in different application sectors such as broadcasting, consumer electronics, and Internet streaming. One straightforward implementation is to concatenate the decoder of one format with the encoder of the new format. However, such implementations may not always be feasible due to potentially excessive computational complexity or quality degradation. Alternative solutions and complexity-reduction techniques can be found in [9].

2) Selection/Reduction: In resource-constrained situations, a popular adaptation approach is to trade some components of the entity for savings in some resources. Such schemes are usually implemented by selection and reduction of elements in a video entity, like shots and frames in a video clip, pixels in an image frame, bit planes in pixels, frequency components in a transformed representation, etc. Some of these schemes are also typically considered forms of transcoding – changing the bit rate, frame rate, or resolution of an existing coded video stream. Reduction involves a selection step to determine which specific components are to be deleted. Uniform decimation is sometimes sufficient, while sophisticated methods further exploit the unequal importance of different components based on psychophysical or high-level semantic models. For example, in several video summarization systems, key events (such as scoring in sports) are defined based on user preferences or domain knowledge. During adaptation, such highlight events are used to produce condensed video skims.
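The constrained-optimization formulation of Section II-A can be sketched as a small search problem. The following is an illustrative sketch only: the candidate operators and their (resource, utility) estimates are hypothetical numbers, and exhaustive search stands in for the model-based strategies of steps 3 and 4.

```python
# Exhaustive search over a small, discrete adaptation space.
# Each candidate operation maps to (resource, utility) estimates;
# the numbers below are hypothetical, for illustration only.
candidates = {
    "none":                (1000.0, 40.0),   # (kbps, utility score)
    "requantize":          (600.0, 35.0),
    "drop_half_frames":    (550.0, 33.0),
    "requant+drop_frames": (350.0, 29.0),
    "slide_show":          (80.0, 20.0),
}

def optimal_adaptation(c_r: float) -> str:
    """a_opt: maximize utility subject to resource <= C_R."""
    feasible = {a: (r, u) for a, (r, u) in candidates.items() if r <= c_r}
    if not feasible:
        raise ValueError("no feasible adaptation under resource constraint")
    return max(feasible, key=lambda a: feasible[a][1])

def minimal_resource_adaptation(c_u: float) -> str:
    """Symmetric form: minimize resource subject to utility >= C_U."""
    feasible = {a: (r, u) for a, (r, u) in candidates.items() if u >= c_u}
    if not feasible:
        raise ValueError("no feasible adaptation under utility constraint")
    return min(feasible, key=lambda a: feasible[a][0])

print(optimal_adaptation(c_r=600.0))          # highest utility within 600 kbps
print(minimal_resource_adaptation(c_u=30.0))  # cheapest with utility >= 30
```

In practice the adaptation space is much larger and the resource/utility values come from estimation models rather than a lookup table, but the two symmetric formulations are the same.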

3) Replacement: This class of adaptation replaces selected elements in a video entity with less expensive counterparts, while aiming to preserve the overall perceived utility. For instance, a video sequence may be replaced with still frames (e.g., key frames or representative visuals) and associated narratives to produce a slide-show presentation. The overall bandwidth requirement can thus be dramatically reduced. If bandwidth reduction is not a major concern, such adaptation methods can be used to provide efficient browsing aids, in which still visuals serve both as visual summaries and as efficient indexes to important points in the original video. Note the replacement content does not have to be extracted from the original video: representative visuals that capture the salient information in the video (e.g., landmark photos of a scene) can also be used.

4) Synthesis: Synthesis adaptation goes beyond the aforementioned classes by synthesizing new content presentations based on analysis results. The goal is to provide a more comprehensive experience or a more efficient tool for navigation. For example, visual mosaics (or panoramas) can be produced by motion analysis and scene construction. The extended view provides an enhanced experience in comprehending the spatio-temporal relations of objects in a scene. In addition, transmission of the synthesized stream usually requires much less bandwidth than the original video sequence, since redundant information in the background does not have to be transmitted. Another example of adaptation by synthesis is the hierarchical summary of video, as shown in Fig. 3(d). Key frames corresponding to highlight segments in a video sequence are organized in a hierarchical structure to facilitate efficient browsing. The structures in the hierarchy can be based on temporal decomposition or semantic classification.

In practical applications of adaptation, various combinations of the above classes can be used. Selected elements of content may be replaced with counterparts of different modalities, encoded with reduced resolutions, synthesized according to practical application requirements, and finally transcoded to a different format.

III. ACTIVE RESEARCH AREAS

In this section, we review several active research areas of video adaptation and show how the proposed resource-utility framework can be used, explicitly or implicitly, to help formulate the optimization of adaptation processes at different levels – semantic, structural, and signal. The chosen areas are not meant to be exhaustive; many interesting combinations or variations exist.

A. Semantic Event-Based Adaptation

Detecting semantic highlights or events in video has attracted much interest in many applications, such as personal multimedia information agents, video archive management, and security monitoring. In the context of video adaptation, the important events are usually defined by the content providers or derived from user preferences – for example, the scoring points in sports video, the breaking news in broadcast programs, and the security-breach events in surveillance video. Following the framework defined in Section II, we can interpret such events as the segments in the video that have the highest semantic-level utilities.

Video analysis for event detection has been an active research area in the image processing, computer vision, and multimedia communities. In [4], information in metadata streams (e.g., closed captions and sports statistics) is combined with video analysis to detect important events and players. Sports statistics are provided by commercially available services. Such data have specific information about the scores, player names, and outcomes of events. However, they may not give complete information about the content shown in the video. Recognition of scenes and objects in the audio-visual streams adds complementary information and, more importantly, helps detect the precise start/end times of events reported in the statistics streams.

In [6], canonical views in sports (e.g., pitching in baseball and serving in tennis) were recognized through joint feature-layout modeling. Because of the fixed conventions used in production syntax, major events in some sports domains usually start with the canonical views, detection of which can be used to find the event boundaries. Semantic labels of the detected events were further extracted by recognizing the score text box embedded in the image [7], resulting in the development of a video summarization system that automatically captures all the highlight points in the video, such as scoring and the last pitch for each player. The above two systems serve as excellent examples of the necessity of combining multi-modality information in detecting high-level semantic events in video.

Results of video event analysis can be utilized to produce different forms of adaptation. In live video applications such as sports broadcast, detected event information can be used to dynamically determine the optimal encoding and transmission formats of the video. [6] demonstrated a real-time adaptive streaming system in which non-important segments of the video were replaced with still visuals, text summaries, and/or audio only. Fig. 4 illustrates the concept of adaptive streaming. Such replacements facilitate great savings in bandwidth or condensation of the total viewing duration. Important segments (those showing key events) in the video can be encoded with high quality or delivered as alerts, depending on the user preferences. Because of the variable bit rate used in live video streaming, special transmission scheduling and buffer management methods are needed in order to handle the bursty traffic.
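The segment-replacement idea behind event-based adaptive streaming can be sketched with a toy bandwidth calculation. The segment list, bit rates, and resulting saving below are hypothetical, not measurements from [6]:

```python
# Sketch of event-based adaptive streaming: important segments keep
# full video; non-important segments are replaced with a still frame
# plus audio and captions. Segment data and bit rates are hypothetical.
FULL_VIDEO_KBPS = 800.0
STILL_AUDIO_KBPS = 80.0   # still frame + audio + captions

segments = [
    # (duration_sec, is_important), e.g., from a canonical-view detector
    (30, False), (12, True), (45, False), (8, True), (60, False),
]

def adaptive_stream_kbits(segs):
    """Total kilobits when non-important segments are replaced."""
    return sum(
        dur * (FULL_VIDEO_KBPS if important else STILL_AUDIO_KBPS)
        for dur, important in segs
    )

def full_stream_kbits(segs):
    """Total kilobits when everything is sent as full video."""
    return sum(dur * FULL_VIDEO_KBPS for dur, _ in segs)

saving = 1 - adaptive_stream_kbits(segments) / full_stream_kbits(segments)
print(f"bandwidth saving: {saving:.0%}")
```

With these hypothetical numbers, non-important segments dominate the duration, so the saving is large, consistent with the observation in the text that such segments often occupy more than half of a sports stream.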

Fig. 4 Event-based adaptive streaming of videos over bandwidth-limited links (important segments: video mode; non-important segments: still frame + audio + captions).

The performance gains of the above event-adaptive streaming scheme depend on the actual video content, e.g., the percentage of important segments in the whole stream. In an experiment using baseball videos, we found that non-important segments occupy more than 50% of the duration. Such a significant ratio provides large room for bandwidth reduction or time condensation. The speed and accuracy also depend on the complexity of events in each domain. For canonical views in sports, we realized a real-time software implementation with detection accuracy higher than 90% [6].

Fig. 5 Synopsis mosaic as visual summary of baseball events (from [8]).

B. Structural-Level Adaptation

Video is a linear medium capturing the real-world events and scenes that occur in space and time. The structures in video arise from event occurrence orders, camera control patterns, and the final editing process. Exploring the relations among structural elements provides great potential for video adaptation. Such adaptations differ from those described in the previous subsection in the utility measure used – structural vs. semantic.

First, representative frames, or key frames, in each shot can be used to summarize the information in the shot. There has been a lot of work on key frame extraction based on detection of feature discontinuity, statistical characteristics, or syntactic rules. The adaptation process takes the original full-length video as input and produces a sequence of key frames, which can be sequentially played along with audio as a slide show, or organized in a hierarchical interface as a navigation aid. In practical designs, there is a tradeoff between the number of key frames and information completeness. In addition, ideal positions of key frames are usually difficult to determine – leaving the evaluation to subjective criteria. The utility-optimization design procedure proposed in Section II offers a systematic solution – given the constraints on the transmission bandwidth or the screen real estate in the user interface, determine the optimal set of key frames adaptively so that the largest amount of information utility is achieved.

Another interesting technique for video adaptation at the structural level is mosaicing, which transforms image frame sequences captured by continuous camera takes (usually pan and zoom) into a panoramic view [8]. Background pixels captured in different frames are aligned and "stitched" together by estimating camera motions and pixel correspondences. The foreground moving objects are detected, and their moving trajectories are shown on top of the mosaiced background to highlight the long-term movement of the objects. An example of a video mosaic for soccer video from [8] is shown in Fig. 5.

C. Transcoding

Below the semantic and structural levels comes signal-level adaptation, involving various manipulations of coded representations and issues of bit allocation. As mentioned in the adaptation taxonomy, the most straightforward form of transcoding is to decode video from one format to another, usually with a change of bit rate as well. In applications that involve real-time transcoding of live videos for multiple users, design of the video transcoding system requires novel architectural- and algorithm-level solutions in order to reduce the hardware complexity and improve video quality (see a companion paper in this special issue on transcoding [9]).

In addition to format and basic bit-rate transcoding, signal-level video adaptation may involve manipulation of video signals in the following dimensions:
• Spatial – change the spatial resolution, i.e., frame size
• Precision – change the bit-plane depth, color depth, or the step size for quantizing the transform coefficients
• Temporal – change the frame rate
• Object – transmit a subset of objects

Multiple dimensions of adaptation form a rich adaptation space, as described earlier in Section II. For example, Fig. 6(a) illustrates a system that combines frame dropping (FD) and coefficient dropping (CD), which can be implemented in most compression standards such as MPEG-2 and MPEG-4 [10]. Fig. 6(b) shows another example that varies the frame rates for encoding different objects in a scene according to their importance [11]. Both methods can be used to meet tight bandwidth or storage constraints while optimizing the overall utility of the adapted video. If the spatio-temporal resolution of the video is unchanged, conventional quality measures such as PSNR can be measured at a fixed resolution. But if the spatio-temporal resolutions are different, perceptual-level quality measures are needed. In [5], we conduct user studies to compare the subjective quality of videos transcoded at different spatio-temporal rates. We find distinctive patterns in users' preferences for the temporal rate under different bandwidth conditions and content types.

Support of multi-dimensional video transcoding may be readily available if the source video is already encoded in a scalable format, i.e., a single encoded stream that can be truncated at different points to generate compatible substreams.
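The FD and CD operations discussed above can be sketched on toy data. This is an illustrative simplification, not an implementation of [10]: real coefficient dropping operates on entropy-coded transform blocks and must respect rate-control and drift constraints.

```python
# Sketch of two signal-level adaptation dimensions on toy data:
# coefficient dropping (keep only the first k coefficients of each
# block in scan/zigzag order) and frame dropping (keep 1 of every n
# frames). Block contents and parameters are hypothetical.

def drop_coefficients(block_scan, keep):
    """Zero all but the first `keep` coefficients of a block that is
    already in scan (zigzag) order, as a coefficient-dropping (CD)
    step would."""
    return [c if i < keep else 0 for i, c in enumerate(block_scan)]

def drop_frames(frames, keep_every):
    """Frame dropping (FD): keep one frame out of every `keep_every`."""
    return frames[::keep_every]

block = [310, -24, 16, -9, 5, 3, -2, 1]   # toy scan-ordered coefficients
print(drop_coefficients(block, keep=4))    # -> [310, -24, 16, -9, 0, 0, 0, 0]
frames = list(range(10))                   # toy frame indices
print(drop_frames(frames, keep_every=3))   # -> [0, 3, 6, 9]
```

A point in the combined FD/CD adaptation space is then simply a pair (`keep`, `keep_every`), and the transcoder searches that space under the bit-rate constraint.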

The truncation results in different spatio-temporal rates and different bandwidths. Fixed-layer scalable coding usually consists of a small number of layers, targeting typical usage scenarios. On the other hand, continuous scalable coding provides much higher flexibility by allowing arbitrary truncation points. Interested readers are referred to a companion paper in this special issue on scalable video coding [12].

Fig. 6 (a) Video transcoding using a combination of frame dropping and coefficient dropping. (b) Video transcoding that varies the frame rate of different objects according to their importance (from [11]). (VOP: instance of a video object at a given time.)

D. Rapid Fast-Forward – Drastic Temporal Condensation

Rapid fast-forward, sometimes referred to as video skimming [13][14], is a very useful adaptation tool when users' preferred viewing time is severely limited, while other resources may not be restricted. For example, users may want to complete viewing of a 10-min video within 1 minute. Such a function resembles the typical fast-forward feature of a VCR player. However, here we are interested in a much higher time reduction ratio (e.g., 10x) compared to that of typical fast-forward (e.g., 2x to 3x).

Due to the drastic time condensation, simply increasing the frame rate of the player is not feasible, and neither is uniform subsampling of the frames in the original sequence. The former requires a playback throughput beyond the player's capability and will make the audio track unrecognizable. The latter will result in poor perceptual quality (e.g., important video frames skipped and audio content unrecognizable).

Instead of uniform frame subsampling, key frames, as described in the previous section, can be extracted to form a much shorter image sequence. However, with such a frame-based subsampling scheme, we will lose the synchronization between the video and the associated audio track.

An alternative approach to drastic video condensation is intelligent dropping of a subset of continuous segments of video, like shots or parts of shots, from the whole sequence. Simple heuristic rules, like dropping from the end of each shot or random dropping of shots, do not work because the perceptual quality will be severely undermined. In [14], a theoretical approach based on the utility-based conceptual framework discussed in Section II was developed to find the optimal condensation scheme. First, video shots and syntactic structural units are identified as adaptation entities. Adaptation operations include length trimming or dropping of individual shots. The problem was formulated as constrained optimization, using the target viewing time as the main constraint. Other constraints were also used to take account of important production syntax used in films. For example, establishing shots at the beginning and syntactically critical structures such as dialogs cannot be changed. At emphasis points (e.g., key phrases or key audio-visual events), synchronization between audio and visual content cannot be altered. In addition, psychophysical models based on subjective studies were used to estimate the relation between perceptual quality and audio-visual segment length. The subjective experiments confirmed the user preference for the optimized fast-forward schemes over the alternatives using fixed subsampling methods.

IV. OPEN ISSUES

Despite the many exciting advances discussed in the previous section, many open issues require future investigation in order for video adaptation to become a viable field. Some of the issues identified below are related to the analytical foundation, while others mainly address practical aspects.

A. Define Utility Measures and User Preferences

The most challenging part of quantitative analysis of video adaptation is to define adequate measures or methods for estimating utility. Conventional signal-level measures like PSNR need to be modified when video quality is compared at different spatio-temporal resolutions. In [15], the signal-level distortion for videos coded at different spatio-temporal scales is computed at the full resolution, while weighting factors are incorporated to account for perceptual effects. Similarly, weights that account for motion masking effects are discussed in [16]. However, signal-level measures are often inadequate, since the adaptation space involves many high-level operations such as shot removal, modality replacement, etc. Such operations cause complex changes to the content beyond the signal level and thus affect quality at other levels (such as the perceptual, semantic, and comprehension levels). Each level of quality may also involve multiple dimensions; for example, the perceptual level may involve spatial, temporal, or resolution dimensions.
Published in Proc of IEEE, vol. 93, no. 1, pp. 148-158, Jan 2005. 7

or resolution dimensions. The second approach is to allow for an ambiguity margin


Given the complex nature of utility, it will be difficult to tolerating implementation variations, and estimate the bound
define a universal measure for different levels or dimensions. of the variations in resource and utility. Theoretical estimate
In practice, input from user preferences can be used to set of such bounds is hard if not impossible. But assuming there
multiple constraints and optimization objectives. For example, exists some consistence among implementations, empirical
a practical approach is to find an adaptation solution bounds of such variations may be obtained. For example, it
maximizing the comprehension-level utility while keeping the can be reasonably assumed that shot segmentation tools are
signal-level utility (e.g., SNR) above some threshold. relatively mature and bounds of shot boundary variations from
However, asking users to unambiguously specify their different detection algorithms can be estimated through
preferences of some dimensions (e.g., temporal) over others empirical simulations. Imposing further restrictions on
(e.g., spatial) is impractical. In addition, user preferences often implementations can tighten the bounds. For example, in the
vary with content, task, and usage environment. case of transform coefficient dropping, a uniform dropping
One possible alternative is to infer user preferences based policy can be used to restrict each block in a frame to drop the
on the usage history. Analysis of such data can be used to same percentage of coefficients.
predict user preferences in similar contexts. Sharing of Third, in some applications, the absolute values of resource
analysis results among different participating users may also and utility of each adapted entity are not important. Instead,
be used to derive common criteria in collaborative filtering, the relative ranking of such values among different adaptation
provided that privacy concerns are adequately addressed. options are critical. In such cases, the chance of achieving
Another direction is to correlate subjective preferences with ranking consistence is higher than consistence in individual
content characteristics. In [10], we assume users have similar values.
preferences of transcoding options (e.g., spatial vs. temporal
C. Relations Among Adaptation, Utility, and Resource
scaling) for videos of similar characteristics. Based on this,
automatic tools were developed to extract content features, Relations among adaptation, resource, and utility are often
cluster video data, and predict the utility values of each complex, as described in section II. The complexity is
transcoding option, and thus automatically select the optimal especially high when the dimensionality of each space is high.
transcoding option satisfying the resource constraints. The Choices of the representation schemes for such complex
prediction accuracy was promising – about 90% of time the relations will greatly affect flexibility and efficiency of the
optimal transcoding option was correctly predicted. design of video adaptation.
Despite several potential approaches mentioned above, One potential approach to tackling such complexity is to
understanding what factors contribute to the overall video sample the adaptation space and store the corresponding
utility before and after adaptation and what components are resource and utility values as multi-variable high-dimensional
computable/predictable still remains as a wide open issue. In tables. If a certain scanning scheme is adopted in the sampling
[17], a relevant, broad concept called universal media process, elements of the tables can be represented by a one-
experience is proposed to emphasize the need of optimizing dimensional sequence.
the overall user experience instead of just enhancing the Another option is to decompose the adaptation space into
accessibility of content as in most existing UMA systems. low-dimensional spaces and sample each subspace separately.
However, such schemes may lose the chance of exploring
B. Resolve Ambiguity in Specifying Adaptation Operation correlations among different dimensions.
Due to the flexible formulation and implementation, some Adequate representations vary with and depend on actual
adaptation operations are not unambiguously defined. For applications. For example, in a case that the adaptation space
example, an operation “remove the second half of each shot” has a single dimension of varying quantization step size, the
appears to be clear. But in practice, the shot boundaries may classical representation of rate-distortion curves is appropriate
not be exactly defined because of the use of automatic, and has proven to be powerful. If the application only requires
imperfect shot detection tools. For another example, an the information about ranking among adaptation operations
operation “drop 10% of transform coefficients” does not satisfying certain resource (or utility) constraints, then
specify the exact set of coefficients to be dropped. Different sampling in the resource (or utility) space and representing the
implementations may choose different sets and result in ranking among feasible adaptation options is an adequate
inconsistent resource and utility values. solution.
There are several possible ways to address this problem.
D. Search Optimal Solutions in Large Spaces
First, we can restrict adaptation operations to be based on
unambiguous representation formats. For example, some Exploration of the above multi-space relations often leads
scalable compression formats, such as JPEG-2000 and to formulation of constrained optimization, some of which
MPEG-4 fine-grained scalable schemes, provide analytical solutions exist. For example, in most video coding
unambiguously defined scalable layers. Subsets of the layers systems, the rate-distortion (R-D) models have been used to
can be truncated in a consistent way as long as the codecs are represent resource-utility relations of video signals and
compliant with standards. achieve optimal coding performance. Such models are usually
Published in Proc of IEEE, vol. 93, no. 1, pp. 148-158, Jan 2005. 8

used for low-dimensional cases, e.g., quantization in the video adaptation system. Such descriptions may address
adaptation space, bit rate in the resource space, and SNR in content adaptability, adaptation options, usage environment,
the utility space. Joint optimization in multi-dimensional and user preferences. In addition, standards are needed for
adaptation space including spatial, temporal, and SNR describing information related to media rights management. In
adaptation dimensions has been addressed in [10][15]. In the the next section, we will briefly review several international
general cases, each space may have high dimensionality and standards that are closely related to video adaptation.
the relations across spaces may be complex. It remains a
challenging issue to find analytically optimal solutions or V. SUPPORT OF ADAPTATION IN INTERNATIONAL STANDARDS
efficient search strategies under such complex conditions. Recognizing the importance of media adaptation
E. Design End-to-End Integrated Systems applications, several international bodies have recently
developed standards to facilitate deployment and
Design of effective video adaptation solutions requires joint
interoperability of adaptation technologies. Most notable ones
consideration of the adaptation subsystem with other
include MPEG-7 [19][20], MPEG-21 [21][24], W3C [22], and
subsystems such as content analysis, transmission, or usage
TV-Anytime [23]. Different standards are targeted at different
environment monitoring. For example, many adaptation
applications. For example, TV-Anytime focuses on adaptation
methods require recognition of structural elements or semantic
of content consumption in high-volume digital storage in
events in the video. How do we design robust adaptation
consumer platforms such as PVRs. W3C and IETF focus on
systems to accommodate the inconsistent, imperfect results
facilitating server/proxy to make decisions on content
from content analysis? Or sometimes it might be desirable to
adaptation and delivery. Its approach is based on a profile
include users in the loop and use semi-automatic recognition
framework, called composite capabilities/preferences profile
methods in lieu of fully automatic ones. Adaptation solutions
(CC/PP), and is mainly used to describe terminal capabilities
are often designed to satisfy various constraints or user
and user preferences. In the following, we focus on a select set
preferences, which may be dynamically varying. What are the
of tools provided by the MPEG-7 and MPEG-21 standards,
mechanisms and protocols for acquiring and monitoring such
and illustrate how these tools could be used together in a
dynamic conditions? How should the adaptation process be
standardized adaptation framework that is consistent with the
designed in order to tolerate imprecise or imperfect
concepts put forward in this paper.
information about usage environments?
In some applications that require live adaptation of A. MPEG-7 Content Descriptions
embedded implementation, the computational resources are MPEG-7 has standardized a comprehensive set of
limited. We need to optimize resource allocation not only description tools, i.e., descriptors (Ds) and description
among components of adaptation but also between adaptation schemes (DSs) to describe information about the content (such
and other subsystems mentioned above. In such cases, the as program title and creation date) and information present in
utility-resource framework described earlier offers an the audio-visual content (such as low-level features, mid-level
adequate conceptual basis that can be extended to address features and structures, and high-level semantics). Such Ds
multi-subsystem resource allocation. and DSs are encoded using an extensible language based on
Another critical issue that affects the feasibility of video XML and XML schema. In the area of video adaptation,
adaptation is related to the rights management. Many MPEG-7 provides comprehensive support by specifying a
adaptation applications are hindered in practice today due to wide variety of tools for describing the segmentation,
the restriction imposed by content owners on video content transcoding hints, variations, and summaries of multimedia
altering. Such restrictions may be placed through the use of content. An excellent review of such components along with
proprietary formats or explicit legal limitations on some application scenarios is presented in [18]. We include a
manipulating the video content. brief summary of the tools and their use for adaptation here.
The first partial response to the above issues is to adopt MPEG-7 provides tools for describing user preferences and
modular designs of subsystems and provide well-defined usage history, which can be combined with description about
abstraction of requirements and performance of each content in personal content filtering/selecting applications.
subsystem. For example, each content recognition subsystem Specifically, the usage history DS consists of lists of actions
can be abstracted in terms of the detection accuracy, the input performed by the user over some periods of time. A variety of
content format, and the implementation complexity, etc. actions (e.g., PlayStream, Record, etc) have been defined in an
Similarly, each usage monitoring subsystem is abstracted extensible dictionary. The UserPreferences DS describes user
based on the accuracy, the complexity, and the frequency of preferences related to different categories of attributes such as
the measurement. With such modular abstraction, system-level creation (creators, time periods, locations, etc), classification
integration and performance optimization can be made more (genre, language, etc), dissemination (delivery type, source,
tractable. and disseminator), media format, and format of navigation or
Another potential solution is to adopt international summarization. Each preference attribute may be associated
standards that define protocols and tools for describing with a numerical weight, indicating the relative importance of
important attributes required for designing an end-to-end each attribute compared to others.
Published in Proc of IEEE, vol. 93, no. 1, pp. 148-158, Jan 2005. 9
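The weighted-preference matching described above can be illustrated with a small sketch. The attribute names, values, and weights below are hypothetical placeholders and do not follow the actual MPEG-7 XML schema; the sketch only shows how numerical weights let a filtering application rank content items against a user's preferences:

```python
# Illustrative sketch (not MPEG-7 syntax): rank content items by how well
# their attributes match a user's weighted preference attributes.

def preference_score(item, preferences):
    """Sum the weights of all preferred (category, value) pairs the item matches."""
    score = 0.0
    for (category, value), weight in preferences.items():
        if item.get(category) == value:
            score += weight
    return score

# Hypothetical preference weights, loosely mirroring UserPreferences categories.
prefs = {
    ("genre", "documentary"): 0.8,
    ("language", "en"): 0.5,
    ("deliveryType", "streaming"): 0.3,
}

items = [
    {"title": "A", "genre": "documentary", "language": "en", "deliveryType": "download"},
    {"title": "B", "genre": "news", "language": "en", "deliveryType": "streaming"},
]

ranked = sorted(items, key=lambda it: preference_score(it, prefs), reverse=True)
print([it["title"] for it in ranked])  # prints ['A', 'B']
```

In a real system the preferences would be parsed from a UserPreferences XML description and the item attributes from MPEG-7 content descriptions; the selection logic, however, reduces to this kind of weighted match.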

MPEG-7 also provides summary descriptions that define the summary content, its relation to the original content, and the way the summary content is used to synthesize the final summary presented to the user. Summary content specifies the parts or components of the source content, such as key segments or key frames of video or audio. The final synthesized form of summaries can be based on hierarchical organization of key components, sequential display of key components, or some customized presentation defined by practical applications.

The variation description is used to describe alternative versions derived from the original version. The type of derivation process is specified by the variation relationship attribute. General types of processing may include revision by editing/post-processing, substitution, or data compression. Transcoding types of processing involve reduction of bit rate, spatio-temporal resolution, spatial detail, or color depth, or change of color format. Other processing types include summarization, abstraction, extraction, and modality conversion. Each variation is given a fidelity value and a priority value: the former indicates the quality of the alternative version of the content compared to the original version, while the latter indicates the relative importance of the variation compared to other options.

In many applications of transcoding, low delay, low complexity, and quality preservation are required. To facilitate satisfaction of such requirements, MPEG-7 defines transcoding hints to provide metadata for guiding practical transcoding implementations. Such descriptions contain specifications of the importance, priority, and value of segments, objects, and regions in audio-visual content, as well as descriptions of the behaviors of transcoding methods. Some examples are motion hints (for guiding motion-based transcoding methods), difficulty hints (for bit rate control), semantic importance hints (for guiding rate control), spatial resolution hints (for specifying the maximum allowable spatial resolution reduction), etc. Transcoding hint descriptions are associated with compressed videos and can be stored on the server or transmitted to proxies where the transcoding operations take place.

B. MPEG-21 Digital Item Adaptation

An extended scope of issues related to adaptation of digital multimedia content is addressed by Part 7 of the MPEG-21 standard, Digital Item Adaptation (DIA) [24]. In the following, specific tools related to the adaptation conceptual framework presented in section II are briefly outlined and discussed.

Given that adaptation always aims to satisfy a set of constraints, tools that describe the usage environment in a standardized way are essential. As a result, the DIA standard specifies tools that could be used to describe a wide array of user characteristics, terminal capabilities, network characteristics, and natural environment characteristics. As a whole, this set of usage environment descriptions (UEDs) comprises the resource space discussed in section II.

User characteristics include several tools imported from MPEG-7 (e.g., user preferences), as well as a number of newly developed tools. Among the new tools are presentation preferences, which describe preferences related to audio-visual rendering or to the format/modality a user prefers to receive; accessibility characteristics, which enable one to adapt content according to certain auditory or visual impairments of a user; and location characteristics, which describe the mobility and destination of a user. Terminal capabilities include encoding and decoding capabilities, display and audio output capabilities, as well as power, storage, and input-output characteristics of a device. Network characteristics include static capabilities of a network, such as its maximum capacity, as well as dynamic conditions of a network, such as the available bandwidth, error, and delay. The natural environment pertains to physical environmental conditions, such as the lighting condition or auditory noise level, or a circumstance such as the time and location at which content is consumed or processed.

While the usage environment description tools may be used in a standalone manner to convey implicit constraints to a server or proxy, they may also be used to provide a richer form of expression through the Universal Constraints Description (UCD) tool. With the UCD tool it is possible to formulate explicit limitation and optimization constraints. In this way, additional guidance is provided to an adaptation engine in a standardized way, so that a more satisfactory adaptation can be provided and/or the space of feasible adaptations can be limited, reducing the effort required to search for an optimal solution. As an example, consider an input image to be adapted according to the following: maximize the adapted image quality, such that (i) the output rate is less than the average available network bandwidth, (ii) the adapted width is greater than 50% of the display width, and (iii) the aspect ratio of the adapted image is equal to that of the input image.

It should be noted that such expressions may be provided not only by the user, but by the content provider as well, to enforce some level of control over how their content is adapted and the form in which it is ultimately delivered. As part of ongoing work in DIA, the link between such constraints and adaptation rights and other digital rights management tools is being explored.

Also worth noting in the above example is that descriptions of both the usage environment, such as the network bandwidth, and the media characteristics, such as the output rate of the source, are required to describe both ends of the system. This reinforces the inherent dependency between MPEG-7 and MPEG-21 in solving UMA-related problems.

To complete the adaptation framework, the DIA standard also specifies a means to describe the relationship between the above constraints, the feasible adaptation operations satisfying these constraints, and the associated utilities that result from adaptation. The tool enabling this is referred to as the AdaptationQoS tool. The relations that this tool describes can be specified at various levels of granularity (e.g., frame, group-of-pictures), which is consistent with the concept of an entity and the adaptation-resource-utility relations introduced in section II. With this information, the adaptation problem becomes a well-defined mathematical problem to which the optimal adaptation strategies described earlier can be applied.

C. Standardized Adaptation Framework

Fig. 7 illustrates how the above concepts fit together into a standardized form of the conceptual adaptation framework presented in this paper. Several inputs are provided to an adaptation decision engine, including media characteristics as given by MPEG-7, along with the constraints and relations as given by the UED/UCD and AdaptationQoS tools of MPEG-21. It is the function of the adaptation decision engine to use this input to find an optimal set of adaptation parameters that satisfy all the given constraints. These parameters are then passed to a bitstream adaptation engine, where the actual adaptation of the input bitstream occurs.

From the above, it is clear that both MPEG-7 and MPEG-21 are well aligned with the conceptual adaptation framework presented in this paper and could provide solutions that address some of the end-to-end design concerns raised in section IV.E.

[Fig. 7 diagram: Content Descriptions (MPEG-7), Relations (AdaptationQoS), and Constraints (UED/UCD) feed the Adaptation Decision Engine; the resulting Adaptation Parameters drive the Bitstream Adaptation Engine, which transforms the Input Bitstream into the Adapted Bitstream.]

Fig. 7 Diagram illustrating the adaptation framework according to MPEG-7/21 description tools.

VI. CONCLUSIONS

Video adaptation is an emerging field that encompasses a wide variety of useful technologies and tools for responding to the need to transmit and consume multimedia content in diverse types of usage environments and contexts. Different from video compression or transcoding, adaptation offers a broader spectrum of operations at multiple levels, ranging from the signal and perceptual levels to the semantic level. Recently, exploration of various adaptation techniques has facilitated the development of many exciting applications such as event-adaptive streaming, personalized media variation and summarization, and multi-level multi-dimensional transcoding. Several international standards such as MPEG-7 and MPEG-21 also include tools to describe various information about the content, user, and usage environment, which is necessary for video adaptation.

Despite the burgeoning activities and advances, this field is in need of an analytical foundation and solutions to many challenging open issues. This paper offers a preliminary framework that characterizes fundamental entities and important concepts related to video adaptation. Introduction of such a framework allows for systematic formulation of many practical problems as resource-utility tradeoff optimization.

Critical open issues that call for further investigation include the development of effective measures and estimation methods for utility (i.e., video quality in a general sense), adequate representation of the relationships among concepts (i.e., adaptation, resource, and utility), efficient search methods for optimal solutions satisfying diverse constraints, and, finally, systematic methodologies for designing complex end-to-end adaptation systems. The first issue, related to utility measurement, is of foremost importance for theoretical development in the field. In view of the difficulty in establishing universal computable metrics for utility, potential solutions may be derived by exploring the description, analysis, and prediction of user preferences among different adaptation options in each practical application setting.

It is worthwhile to note that solutions to most of the above identified open issues require joint consideration of adaptation with several other closely related issues, such as analysis of video content, understanding and modeling of users and environments, and rights management of digital content. Such cross-disciplinary exploration is critical to innovation and advancement of video adaptation technologies for next-generation pervasive media applications.

VII. ACKNOWLEDGEMENTS

The authors are grateful to the anonymous reviewers for their valuable input, and to Yong Wang and Anita Huang for their editorial help.

REFERENCES

[1] A. Fox and E. A. Brewer, "Reducing WWW Latency and Bandwidth Requirements by Real-Time Distillation," Proc. Intl. WWW Conf., Paris, France, May 1996.
[2] J. R. Smith, R. Mohan, and C. Li, "Scalable Multimedia Delivery for Pervasive Computing," Proc. ACM Multimedia, Orlando, FL, Oct.-Nov. 1999.
[3] S.-F. Chang, "Optimal Video Adaptation and Skimming Using a Utility-Based Framework," Proc. Tyrrhenian Intl. Workshop on Digital Communications, Capri Island, Italy, Sept. 2002.
[4] B. Li, J. Errico, H. Pan, and I. Sezan, "Bridging the Semantic Gap in Sports Video Retrieval and Summarization," Journal of Visual Communication and Image Representation, Special Issue on Multimedia Database Management, 2004.
[5] Y. Wang, S.-F. Chang, and A. C. Loui, "Subjective Preference of Spatio-Temporal Rate in Video Adaptation Using Multi-Dimensional Scalable Coding," Proc. IEEE Int'l Conf. Multimedia and Expo, Taipei, Taiwan, June 2004.
[6] S.-F. Chang, D. Zhong, and R. Kumar, "Real-Time Content-Based Adaptive Streaming of Sports Video," Proc. IEEE Workshop on Content-Based Access to Video/Image Library, IEEE CVPR Conference, Hawaii, Dec. 2001.

[7] D. Zhang and S.-F. Chang, "Event Detection in Baseball Video Using Superimposed Caption Recognition," Proc. ACM Multimedia, Juan Les Pins, France, Dec. 2002.
[8] M. Irani and P. Anandan, "Video indexing based on mosaic representation," Proceedings of the IEEE, vol. 86, no. 5, pp. 905-921, May 1998.
[9] Transcoding paper in this special issue.
[10] Y. Wang, J.-G. Kim, and S.-F. Chang, "Content-based utility function prediction for real-time MPEG-4 transcoding," Proc. IEEE Int'l Conf. Image Processing, Barcelona, Spain, Sept. 2003.
[11] A. Vetro, T. Haga, K. Sumi, and H. Sun, "Object-Based Coding for Long-Term Archive of Surveillance Video," Proc. IEEE Int'l Conf. Multimedia and Expo, Baltimore, MD, July 2003.
[12] Scalable coding paper in this special issue.
[13] M. A. Smith and T. Kanade, "Video Skimming for Quick Browsing Based on Audio and Image Characterization," Carnegie Mellon University, Technical Report CMU-CS-95-186, July 1995.
[14] H. Sundaram, L. Xie, and S.-F. Chang, "A Utility Framework for the Automatic Generation of Audio-Visual Skims," Proc. ACM Multimedia, Juan Les Pins, France, Dec. 2002.
[15] E. C. Reed and J. S. Lim, "Optimal multidimensional bit-rate control for video communications," IEEE Trans. Image Processing, vol. 11, no. 8, pp. 873-885, Aug. 2002.
[16] J. Lee and B. W. Dickinson, "Temporally adaptive motion interpolation exploiting temporal masking in visual perception," IEEE Trans. Image Processing, vol. 3, pp. 513-526, Sept. 1994.
[17] F. Pereira and I. Burnett, "Universal Multimedia Experiences for Tomorrow," IEEE Signal Processing Magazine, vol. 20, pp. 63-73, March 2003.
[18] P. van Beek, J. R. Smith, T. Ebrahimi, T. Suzuki, and J. Askelof, "Metadata-Driven Multimedia Access," IEEE Signal Processing Magazine, pp. 40-52, March 2003.
[19] Public MPEG documents, [Link]
[20] MPEG-7 Overview v.9, ISO/IEC JTC1/SC29/WG11/N5525, March 2003.
[21] MPEG-21 Overview v.5, ISO/IEC JTC1/SC29/WG11/N5231, Oct. 2002.
[22] W3C CC/PP, "Exchange protocol based on HTTP extension framework" [Online]. Available: [Link]CCPPexchange
[23] TV-Anytime Forum. Available: [Link]
[24] Information Technology – Multimedia Framework (MPEG-21) – Part 7: Digital Item Adaptation, ISO/IEC 21000-7 FDIS, ISO/IEC JTC1/SC29/WG11/N6168, Dec. 2003.

Shih-Fu Chang (M'93-SM'02-F'04) is a Professor in the Department of Electrical Engineering of Columbia University. He directs the Digital Video/Multimedia Lab ([Link]) and the ADVENT industry-university consortium at Columbia University, conducting research in video analysis, multimedia indexing, universal media access, and media authentication. Systems developed by his group have been widely used, including VisualSEEk, VideoQ, and WebSEEk for image/video search, WebClip for networked video editing, and Sari for online image authentication. He has led major cross-disciplinary projects, such as a medical video digital library funded by the NSF DLI initiative, an art image education project with Teachers College, and a large video search engine with stock footage companies in NYC. Through collaboration with industry partners, his group has made major contributions to the development of the MPEG-7 multimedia description standard.
Prof. Chang served as a general co-chair of the 8th ACM Multimedia Conference 2000, and will participate as a conference Co-Chair of the IEEE Multimedia Conference 2004. He has also consulted at several media technology companies. He has been a Distinguished Lecturer of the IEEE Circuits and Systems Society, 2001-2002, and a recipient of a Navy ONR Young Investigator Award, an IBM Faculty Development Award, an NSF CAREER Award, and three best paper awards from IEEE, ACM, and SPIE in the areas of multimedia indexing and manipulation. In recent years, his students have also received recognition with several best student paper awards in peer-reviewed publications.

Anthony Vetro (S'92-M'96-SM'04) received the B.S., M.S., and Ph.D. degrees in Electrical Engineering from Polytechnic University, Brooklyn, NY. He joined Mitsubishi Electric Research Labs, Cambridge, MA, in 1996, and is currently a Team Leader and Senior Principal Member of the Technical Staff. His current research interests are related to the encoding and transport of multimedia content, with emphasis on video transcoding, rate-distortion modeling, and optimal bit allocation. He has published more than 80 papers in these areas and holds 18 U.S. patents. Since 1997, he has been an active participant in MPEG, contributing to the development of the MPEG-4 and MPEG-7 standards. Most recently, he served as editor for Part 7 of MPEG-21, Digital Item Adaptation.
Dr. Vetro has been a member of the Technical Program Committee for the International Conference on Consumer Electronics since 1998 and has served the conference in various capacities. He has been a member of the Publications Committee of the IEEE TRANSACTIONS ON CONSUMER ELECTRONICS since 2002 and was elected to the AdCom of the IEEE Consumer Electronics Society from 2001-2003. He is a member of the Technical Committee on Visual Signal Processing and Communications of the IEEE Circuits and Systems Society, and was a member of the Editorial Board of the Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology from 2001-2004. He served as Guest Editor (with C. Christopoulos and T. Ebrahimi) for the special issue on Universal Multimedia Access of IEEE Signal Processing Magazine. He has also received several awards for his work on transcoding, including the 2003 IEEE Circuits and Systems CSVT Transactions Best Paper Award.
