Buffer Constrained Proactive Dynamic Voltage Scaling For Video Decoding Systems
Buffer Constrained Proactive Dynamic Voltage Scaling For Video Decoding Systems
complexity modeling, energy saving Finish Job 1 Finish Job 2 Finish Job 3
Start Job 2 Start Job 3
1. INTRODUCTION T1 T3
T2
Power-frequency reconfigurable processors are already JOB 1 JOB 2 JOB 3
available in wireless and portable devices. Recently, hardware Tmax Tmax Tmax
components are being designed to support multiple power Figure 1: The top panel is no-DVS, middle panel is conventional DVS and
modes that enable trading off execution time for energy the bottom panel is proposed DVS method.
savings. For example, some mobile processors can change the As shown in the top panel of Figure 1, the first job needs T1
speed (i.e., frequency) and energy consumption at runtime [1]. time units whereas the second job needs T2 time units. The key
Significant energy savings can be achieved by adapting the idea behind existing adaptive systems is to adapt the operating
voltage and processing speed for the tasks where early frequency such that the job is processed exactly in 4MAX time
completion does not provide any gains. The energy spent on units (see the middle panel of Figure 1). Clearly, this shows the
one process can be reduced by decreasing the voltage, which reactive and greedy nature of the existing adaptation process –
will correspondingly increase the delay [2].The main goal of all the resources are optimized within a job for a fixed time
DVS algorithms is utilizing this energy-delay trade off for tasks allocation 4MAX . However, the DVS gains can be substantially
whose jobs’ completion times are immaterial as long as they improved when the allocated times (T1, T2, and T3) and
are completed before their processing deadline An example is operating levels (power-frequency pairs) are optimized jointly
real-time video decoding, where an early completion of frame (inter-job optimization) as shown in the bottom panel, Figure 1.
decoding does not provide any benefit as long as the display Assume we have three jobs with complexities
deadline for that frame is met. A DVS algorithm essentially C # MAX , C # MAX , C # MAX . From now on, we
assigns the operating level (i.e., power and frequency) for each use the term complexity to represent the number of execution
job given the estimated cycle requirement (i.e., the required cycles. With “no-DVS”, processing is performed considering
complexity) and the job deadline [2]. the worst case scenario: at the maximum frequency for each job
Recently, several researchers have addressed the problem of &MAX and corresponding maximum power 0MAX . For
efficient DVS for multimedia applications. In [3], rather than “conventional DVS”, frequencies are adjusted to finish each
using the worst case complexity, a worst case estimate
Authorized licensed use limited to: Viettel Group. Downloaded on November 26,2020 at 09:25:25 UTC from IEEE Xplore. Restrictions apply.
job just-in-time, F &MAX , F &MAX , F &MAX . If
P2
we assume a power-frequency relationship of 0 r F [2] then P
P1
the power spent on the various jobs will be I B3
P 0MAX , P 0MAX , P 0MAX .For the “proactive I B2 B3 B4 B2
optimization” with modified deadlines, the total complexity is B1
B1
C C C # MAX , and frequencies are kept
constant: F F F &MAX . The total energy Job 1 Job 2 Job 3 Job 4 Job 5 Job 1 Job 2 Job 2
spent on the group of jobs equals %TOTAL PI CI F I . Figure 2: Directed acyclic dependence graphs for a) Dependencies between
Hence, the total energy spent for the “no-DVS” case I-B1-B2-P1-B3-B4-P2 frames b) Hierarchical B Pictures, I-B1-B2-B3-P
is %NO DVS % , for the “conventional DVS” equals In predictive coding, frames are encoded with
%DVS % and for the “proactive DVS” case equals
interdependencies that can be represented by a directed acyclic
% PROACTIVE ? DVS % , where % 0MAX # MAX &MAX . In this
dependence graph (DAG). Examples are shown in Figure 2 for
example, conventional DVS provides 35% energy reduction two different GOP structures: the conventional I-B-B-P-B-B-P
with respect to no-DVS, whereas proactive DVS provides 66% and the hierarchical B pictures. Decoding frame I is a job and
energy reduction with respect to no-DVS. decoding frames P1 and B1 jointly represent another job.
As mentioned before, none of the previous studies consider Prediction structures using hierarchical B pictures as in the
proactively changing the time allocations and frequency; H.264/AVC standard lead to the following sizes (S),
instead, they aim at adapting the frequency to fixed time complexities (C), and deadlines (D): the first job represents the
allocations in a greedy fashion. We propose a novel DVS decoding of the I-frame ( S D %T ) the second job
algorithm that adapts jobs deadlines by buffering the decoded consists of decoding frames P, B1 and B2
frames before display. By utilizing this post-decoding buffer, ( S D %T ) and the last job is the decoding of frame
we study the DVS problem into the context of the buffer B3 ( S D %T ). It is important to notice that, both
constrained optimization problem, similar to well studied the second and the third job can be viewed from a high level
problems of rate-control with buffer constraints. We also perspective as decoding a B frame. However, the job
propose an optimal and several low-complexity suboptimal parameters are substantially different, thereby highlighting the
solutions for the buffer-constrained DVS problem. need for encoder-specific complexity estimation.
This paper is organized as follows: Section II describes the
compression aware job definitions. The proposed DVS 3. BUFFER CONSTRAINED DVS
algorithms are explained in Section III. Section IV presents the Let us assume there is a discrete set of operating levels with
comparative results and Section V concludes the paper. corresponding frequency and power levels which can be used
2. COMPRESSION AWARE JOB DEFINITIONS in our frequency/voltage adaptation. Each level has a different
power consumption and different frequency
The state of the art video encoders deploy complex temporal
0 & [P F P. F. \ P P. F F. ] . Assume there
predictions from both past and/or future frames which must be
are a total of - jobs with deadlines
decoded much before their display deadline. Hence, each frame
$ [D D - \ D D - ] ,sizes 3 [S S - ]
has a decoding deadline that is determined by the temporal
decomposition structure (temporal dependencies). This and complexity estimates # [C C- ] . The DVS
deadline is different from the play-back (display) deadline problem attempts to find the set of operating levels (power and
determined by the frame rate fr. Let }N be the set of frames frequency tuple) for each job, as follows:
for which frame n is used as a reference. Then, the display and DVS Problem:
decoding deadlines for frame n can be written: -
VI - 478
Authorized licensed use limited to: Viettel Group. Downloaded on November 26,2020 at 09:25:25 UTC from IEEE Xplore. Restrictions apply.
the parameters of M jobs and buffer occupancy B(j)Å. For each Hence, before any optimization, the power-frequency values
job, the complexity estimates are updated and based on the which do not provide complex E-D points should be pruned,
buffer occupancy, a new operating level is assigned. We define i.e., a convex hull of E-D points, from a possible set of E-D
the buffer occupancy for jobÅJ as "J recursively points should be generated. Since the slopes are identical for all
" J MAX" J S J ¢ T J FR ± and " "INITIAL jobs (Proposition II), this pruning is done only once for all jobs.
where fr denotes the frame rate, "INITIAL ,the initial state of the Proposition IV: The optimal operating level assignment
buffer, depends on the initial playback delay(which may be will result in equal slopes for every job. Proof: See [7].
zero if no delay is tolerable), and T J is the time that job j Buffer constrained DVS problem is analogous to buffer
takes which can be written as T J C J F J .Then, the DVS constrained R-D optimal bit allocation problem [5]. Hence,
problem becomes the problem of minimizing total energy under similar to rate (buffer)-control problems in video transmission,
buffer constraints. we aim to keep the buffer state in equilibrium ( "MAX ) for a
Buffer Constrained DVS Problem: group of W jobs. Then, using Propositions II and IV, the
-
optimal frequency considering a look-ahead window of W jobs:
find F OPT P OPT ARG MIN
0 &
% I (energy consumption) J 7
I FR CM
MJ
s.t. b " J b "MAX (buffer constraints) F OPT J J 7
"MAX
We need to guarantee the buffer never underflows, such that no " J
SM
MJ
frame freeze occurs. Also, the buffer state can not grow
indefinitely because it is a finite physical buffer. If we assume Algorithm 1 performs the delay-energy optimization for every
the maximum buffer size is "MAX , we need to guarantee that job considering a look ahead window of W jobs.
the buffer occupancy at any state is lower than "MAX .
Table I: Algorithm 1
The optimal solution can be found by dynamic programming 1. Choose the set of frequency-power tuples
methods, which explicitly consider every possible frequency- which creates convex E-D points
power pair for each job and check the buffer state for overflow 2. For each job J ( b J b - ),find the
or underflow. A trellis is created with given complexity optimum frequency
3. Proceed to job J ( Step 2)
estimates of the job and possible power-frequency assignments.
At every stage, the paths reaching the same buffer occupancy at A fast extension of this algorithm, Algorithm 2, is to perform
a higher energy cost are pruned. this optimization and assign the same frequency for 7 jobs
Although there is a finite number of operating levels, and re-perform this optimization only when the buffer level
intermediate operating levels are achievable by changing the gets lower or higher than the specified thresholds.
frequency/power within the job similar to the approach in [2]
[3]. Figure 4 shows the energy-delay curve for jobs j and j+1 Table II: Algorithm 2
Intermediate levels are achieved by frequency changes within 1. Choose the set of frequency-power tuples
the job. Greater values of the slope means greater energy spent which creates convex E-D points
with smaller delay yielding a higher frequency-power choice.
Energy
2. For each look-ahead window of size 7
D1,E1 D E
x 1, 1 3. For job J within the window find the
x M M
x D2,E2 optimum frequency
x D2,E2 P P
Job J 4. Execute the job with the assigned
D3,E3
D3,E3
x F F
x
x
D4,E4 frequency and check buffer state " J
D4,E4
x D5,E5
Job J x D6,E6
5.
D5,E5
x
D6,E6
x
x
If " J b C or " J p C change the
M Delay frequency (Step 3) else continue with
M M M same frequency (Step 4)
Figure 4: Operational Energy-Delay curve of jobs j and j+1 with D,E pairs A least complex approximation, Algorithm 3, is changing the
corresponds to different frequency-power levels of the processor. frequency only when there is a risk of overflow or underflow.
Proposition I: If we neglect the transition cost from one
Table III: Algorithm 3
frequency to another, the frequency change within the job
Choose the set of frequency-power tuples
corresponds to piece-wise bilinear interpolation power- 1.
which creates convex E-D points.
frequency points as shown in Figure 4. Proof: See [7]. 2. For job J find the optimum frequency
Proposition II: The slope between two E-D points only 3.
Execute the job with the assigned
depends on the power/frequency values but not on frequency and check the buffer state
complexities. Proof: See [7]. Set J J . If " J b C or " J p C change
Proposition III: From a set of given power-frequency 4. the frequency (Step-2)else continue
with same frequency(Step 3)
points, only the ones that generate a convex set of E-D points
should be used for proactive DVS. Proof: See [7].
VI - 479
Authorized licensed use limited to: Viettel Group. Downloaded on November 26,2020 at 09:25:25 UTC from IEEE Xplore. Restrictions apply.
4. COMPARATIVE RESULTS method, utilizes a statistical worst case complexity estimate
denoted here as Conventional [3]. Another method is the
In this section we compare the proposed DVS method to
optimum conventional DVS for benchmark purposes which is
conventional DVS methods. We used four different test
named as Conventional Lower Bound. We assumed the
sequences, foreman, mobile, coastguard and silence, 256
complexities of each job are known apriori and find the
frames at CIF resolution and frame rate 30fps, encoded at two
optimum theoretical DVS gain achievable by conventional
different compression specifications (low complexity and high
DVS methods. Optimal Proactive is the optimal dynamic
complexity), and decoded at 2 different rates, 512kbps and
programming based algorithm assuming an exact knowledge of
1024kbps. To obtain statistically meaningful results, we
complexities before the actual decoding is performed. Optimal
concatenated the resulting 16 videos in 12 different orders,
Proactive DVS shows the limit of energy savings given the
resulting in a set of 12 long videos with 3072 frames each. We
buffer size.
present the average results of 12 videos with different decoding
traces. Power and frequency values are taken from [6] . Table IV: Comparison of different DVS algorithms in terms of scaled
energy spent and number of frequency changes per job. Benchmark (%100
220 energy spent case) is no-DVS.
"MAX 7 "MAX 7
buffer occupancy in number of frames
16
200
14
180 Scaled # freq. Scaled #of freq.
12
Energy changes Energy changes
freqeuncy in Mhz
160
10 Conv. Bound 45.21 1519.8 45.21 1519.8
140
8
120 Conventional 60.43 1688.5 60.43 1688.5
6
100 Optimal Proactive 36.93 1753.7 36.63 1747.3
80
4 Algorithm 1 40.36 655.91 40.23 545.67
60
2 Algorithm 2 40.60 161.0 40.58 82.75
0 500 1000 1500 2000
0
0 500 1000 1500 2000
Algorithm 3 42.35 121.17 41.66 57.92
number of jobs number of jobs
As the results show, all of the proposed methods
Figure 5 : Frequency assignment (a) and buffer fullness (b) for Algorithm 1
significantly out-perform the conventional algorithms in terms
In the following, we compare the fast algorithms, Algorithm of energy savings and the number of frequency changes. Note
1, 2, and 3 in terms of the buffer utilization, the number of that, the proposed method provides energy savings exceeding
frequency transitions and energy savings. The highest number the upper bound of the conventional methods which is based
of frequency changes occurs in Algorithm 1 compared to other on the exact knowledge of the complexity. This result clearly
algorithms whereas the least buffer level variation is observed, shows the superiority of the proposed proactive DVS method.
as shown in Figure 5. Algorithm 3 is the most conservative in
terms of the number of frequency changes and as expected 5. CONCLUSION
provides the least energy savings among the proposed three In this paper we proposed a novel DVS method for video
algorithms. The performance of Algorithm 2 is between that of decoding which achieves energy savings exceeding the upper
Algorithm 1 and 3 in the sense of energy savings, buffer bound of the conventional DVS methods. We explored the
utilization, and the number of frequency changes. fundamental energy vs. delay trade-off for video decoding
220
16
systems and proposed new trade-offs such as buffer size vs.
buffer occupancy in number of frames
200
14
energy savings and the number of frequency changes that the
180
12 platform can handle.
freqeuncy in Mhz
160
10
140 6. REFERENCES
8
120 [1] Intel Inc, “Intel XScale Technology,” Available online
6
100 http://www.intel.com/ design/intelxscale
4
80 [2] T. Ishihara and H. Yasuura, “Voltage scheduling problem for
2
60 dynamically variable voltage processors,” in Proceedings of Intern.
0
0 500 1000 1500 2000 0 500 1000 1500 2000 Sym. on Low-Power Electronics and Design. Monterey, CA 1998.
number of jobs number of jobs
[3] W. Yuan et.al., “GRACE: Cross-layer adaptation for multimedia
Figure 6: Frequency assignment (a) and buffer fullness (b) for Algorithm 3 quality and battery energy,” IEEE Trans. on Mobile Comp.,to
appear
As the buffer size and the look-ahead window size
[4] M. van der Schaar and Y. Andreopoulos, “Rate-distortion-
increases, energy savings of the proposed buffer controlled complexity modeling for network and receiver aware adaptation,”
DVS algorithms increase as expected intuitively. The gap IEEE Trans. on Multimedia, vol. 7, no. 3, pp. 471-479, June 2005.
between the energy savings of Algorithm 1 and Algorithm 3 [5] A. Ortega, K. Ramchandran, and M. Vetterli, “Optimal trellis-
decrease as the buffer size gets large. This result shows the based buffered compression and fast approximations,” IEEE Trans.
trade off between the buffer size, energy savings and number of on Image Processing, Vol. 3, No. 1, Jan. 1994.
[6] A. Sinha and A. P. Chandrakasan, “Jouletrack: A web based tool
frequency changes. If the buffer size is large, energy savings for software energy profiling,” in Proc. IEEE/ACM DAC, 2001.
close to the savings of Algorithm 1 can be achieved with less [7] E. Akyol, M. van der Schaar, “Complexity model based proactive
frequency transitions using Algorithm 3. Conversely, when the dynamic voltage scaling for video decoding systems ”, IEEE Trans.
buffer size is small, the difference in energy savings of on Multimedia, to appear 2007
Algorithm 3 and Algorithm 1 can be significant. We also
simulated the conventional DVS method in two ways. The first
VI - 480
Authorized licensed use limited to: Viettel Group. Downloaded on November 26,2020 at 09:25:25 UTC from IEEE Xplore. Restrictions apply.