Multimedia Communications

Module 4
Video compression standards: H.261, H.263, MPEG, MPEG-1, MPEG-2, MPEG-4 and Reversible VLCs. Standards for
multimedia communications: reference models, standards relating to interpersonal communications.
H.261:
 The H.261 video compression standard has been defined by the ITU-T for the provision of video telephony and
videoconferencing services over an integrated-services digital network (ISDN).
 Thus the standard assumes that the network offers transmission channels of multiples of 64 kbps.
 The standard is also known, therefore, as p x 64 where p can be 1 through 30.
 The digitization format used is either the common intermediate format (CIF) or the quarter CIF (QCIF).
Normally, the CIF is used for videoconferencing and the QCIF for video telephony.
 Each frame is divided into macroblocks of 16 x 16 pixels for compression; the horizontal resolution is reduced
from 360 to 352 pixels to produce an integral number (22) of macroblocks.
 Hence, since both formats use subsampling of the two chrominance signals at half the rate used for the
luminance signal, the spatial resolution of each format is:
CIF: Y= 352 x 288, Cb = Cr = 176 x 144
QCIF: Y= 176 x 144, Cb = Cr = 88 x 72
 Progressive (non-interlaced) scanning is used with a frame refresh rate of 30 fps for the CIF and either 15 or 7.5 fps
for the QCIF.
 Only I- and P-frames are used in H.261, with three P-frames between each pair of I-frames.
 The encoding of each of the six 8 x 8 pixel blocks that make up each macroblock in both I- and P-frames (four blocks
for Y and one each for Cb and Cr) is carried out.
 The format of each encoded macroblock is shown in outline in the figure below.

 For each macroblock:

1. The address field is used for identification purposes.
2. The type field indicates whether the macroblock has been encoded independently (intracoded) or with reference
to a macroblock in a preceding frame (intercoded).
3. The quantization value is the threshold value that has been used to quantize all the DCT coefficients in the
macroblock.
4. The motion vector field contains the encoded vector, if one is present.
5. The coded block pattern indicates which of the six 8 x 8 pixel blocks that make up the macroblock are present, if
any, and, for those present, the JPEG-encoded DCT coefficients are given for each block.
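The fields listed above can be pictured with a simple data structure. The following Python sketch is purely illustrative; the field names, types and the bit ordering of the coded block pattern are assumptions made for readability, not the bitstream syntax defined in H.261:

from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class MacroblockHeader:
    """Illustrative view of the fields carried with each H.261 macroblock."""
    address: int                               # position of the macroblock (identification)
    mb_type: str                               # "intra" or "inter"
    quant: Optional[int]                       # quantization threshold for the DCT coefficients
    motion_vector: Optional[Tuple[int, int]]   # (dx, dy), only if the macroblock is inter-coded
    coded_block_pattern: int                   # 6-bit mask: which of the 4 Y, Cb and Cr blocks are coded

    def coded_blocks(self) -> List[int]:
        """Indices (0-5) of the 8 x 8 blocks that are actually present."""
        return [i for i in range(6) if self.coded_block_pattern & (1 << (5 - i))]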
The format of each complete frame is shown in the figure below


 The picture start code indicates the start of each new (encoded) video frame/picture.
 The temporal reference field is a time stamp that enables the decoder to synchronize each video block with
an associated audio block containing the same time stamp.
 The picture type field indicates whether the frame is an I- or P-frame.
 Although the encoding operation is carried out on individual macroblocks, a larger data structure known as a
group of (macro)blocks (GOB) is also defined. This is a matrix of 11 x 3 macroblocks, the size of which has
been chosen so that both the CIF and QCIF comprise an integral number of GOBs: 12 in the case of the CIF (2 x 6)
and 3 in the case of the QCIF (1 x 3), which allows interworking between the two formats.
 At the head of each GOB is a unique start code which is chosen so that no valid sequence of variable-length
codewords from the table of codewords used in the entropy encoding stage can produce the same code.
 In the event of a transmission error affecting a GOB, the decoder simply searches the received bitstream for this
code, which signals the start of the next GOB. For this reason the start code is also known as the resynchronization
marker.

 Each GOB has a group number associated with it which allows for a string of GOBs to be missing from a
particular frame.
 This may be necessary, for example, if the amount of (compressed) information to be transmitted is temporarily
greater than the bandwidth of the transmission channel.
 With motion estimation, the amount of information produced during the compression operation varies.
 However, since the transmission bandwidth that is available with the target applications of the H.261 standard is
fixed (64 kbps or multiples of this), in order to optimize the use of this bandwidth it is necessary to convert the
variable bit rate produced by the basic encoder into a constant bit rate.
 This is achieved by first passing the encoded bitstream output by the encoder through a first-in, first-out (FIFO)
buffer prior to transmission and then providing a feedback path from this buffer to the quantizer unit within
the encoder.
The role of the FIFO buffer is shown in the figure below.


Role of FIFO Buffer:


 The output bit rate produced by the encoder is determined by the quantization threshold values that are used; the
higher the threshold the lower the accuracy and hence the lower is the output bit rate.
 Hence, since the same compression technique is used for all macroblocks in the video encoder, it is possible to
obtain a constant output bit rate from the encoder by dynamically varying the quantization threshold used. This is
the role of the FIFO buffer.
FIFO Buffer:
 As the name implies, the order of the output from a FIFO buffer is the same as that on input.
 However, since the output rate from the buffer is a constant determined by the (constant) bit rate of the
transmission channel, if the input rate temporarily exceeds the output rate then the buffer will start to fill.
 Conversely, if the input rate falls below the output rate then the buffer contents will decrease. In order to
exploit this property, two threshold levels are defined:
 The low threshold and the high threshold. The amount of information in the buffer is continuously monitored
and, should the contents fall below the low threshold, then the quantization threshold is reduced, thereby
increasing the output rate from the encoder.
 Conversely, should the contents increase beyond the high threshold, then the quantization threshold is increased
in order to reduce the output rate from the encoder.
 Normally, the control procedure operates at the GOB level rather than at the macroblock level.
 Hence, should the high threshold be reached, first the quantization value associated with the GOB is increased
and, if this is not sufficient, GOBs are dropped until the overload subsides.
 Of course, any adjustments to the quantization threshold values that are made must be made also to those used in
the matching dequantizer.
 In addition, the standard also allows complete frames to be missing in order to match the frame rate to the level
of transmission bandwidth that is available.
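The feedback control described above can be summarized in a short sketch. This is a simplified illustration of the idea rather than the rate control algorithm defined by H.261; the threshold values, step size and GOB sizes are invented for the example:

def adjust_quantizer(buffer_occupancy, quant, low=8_000, high=24_000, q_min=1, q_max=31):
    """Return an updated quantization threshold based on FIFO occupancy (in bits).

    Occupancy above the high threshold -> coarser quantization (fewer bits out);
    occupancy below the low threshold  -> finer quantization (more bits out).
    """
    if buffer_occupancy > high:
        quant = min(q_max, quant + 2)   # reduce the encoder output rate
    elif buffer_occupancy < low:
        quant = max(q_min, quant - 2)   # increase the encoder output rate
    return quant

# Example: the buffer drains at the channel rate while GOBs of varying size arrive.
channel_bits_per_gob_period = 10_000
occupancy, quant = 0, 8
for gob_bits in [9_000, 14_000, 20_000, 6_000, 4_000]:
    occupancy = max(0, occupancy + gob_bits - channel_bits_per_gob_period)
    quant = adjust_quantizer(occupancy, quant)
    print(f"occupancy={occupancy:6d} bits  quant={quant}")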

H.263:
 The H.263 video compression standard has been defined by the ITU-T for use in a range of video applications over wireless
networks and the PSTN.
 The applications include video telephony, videoconferencing, security surveillance, interactive games playing,
and so on, all of which require the output of the video encoder to be transmitted across the network connection
in real time as it is output by the encoder.
 The access circuits to the PSTN operate in an analog mode, so transmitting a digital signal over these circuits
requires a modem.
 Typical maximum bit rates over switched connections range from 28.8 kbps through to 56 kbps and hence
the requirement of the video encoder is to compress the video associated with these applications down to very
low bit rates.
 The basic structure is based on that used in the H.261 standard.
 At bit rates lower than 64kbps, however, the H.261 encoder gives a relatively poor picture quality.
 Since it uses only I- and P- frames at low bit rates it has to revert to using a high quantization threshold and
relatively low frame rate.
 The high quantization threshold leads to what are known as blocking artifacts, which are caused by the
macroblocks encoded using high thresholds differing visibly from those quantized using lower thresholds.
 The use of a low frame rate can also lead to jerky movements.
 In order to minimize these effects, the H.263 standard has a number of advanced coding options compared with
those used in an H.261 encoder.
Digitization Formats:
 The two formats are the QCIF and the sub-QCIF (S-QCIF).
 Each frame is divided into macroblocks of 16 x 16 pixels for compression; the horizontal resolution is reduced
from 180 to 176 pixels to produce an integral number (11) of macroblocks.
 Hence, since subsampling of the two chrominance signals is used, the two alternative spatial resolutions are:
QCIF: Y = 176 x 144, Cb = Cr = 88 x 72
S-QCIF: Y = 128 x 96, Cb = Cr = 64 x 48
Progressive scanning is used with a frame refresh rate of either 15 or 7.5 fps.

 The support of both formats is mandatory only for the decoder; the encoder need support only one of them.
 The motion estimation unit is not required in the decoder and hence is less expensive to implement than the
encoder.
 The additional cost of having two alternative decoders is only small.
 However, by having a choice for the encoder, a low-cost encoder design based on the S-QCIF can be used for
applications such as games playing, or a more sophisticated design based on the QCIF can be used for
videoconferencing.
 The decoder can be the same in both cases as it supports both formats.
Frame Types:
 In order to obtain the higher levels of compression that are needed, H.263 uses I-, P-, and B-frames.
 Also, in order to use as high a frame rate as possible, neighboring pairs of P- and B-frames can optionally be
encoded as a single entity; the resulting frame is known as a PB-frame. Because of the much reduced encoding
overheads that are required, its use enables a higher frame rate to be obtained with a given transmission channel.
 A PB-frame comprises a B-frame and the immediately succeeding P-frame.
 The encoded information for the macroblocks in both these frames is interleaved, with the information for the P-
frame preceding that of the B-frame.
 Hence at the decoder, as the encoded information is received, the macroblock for the P-frame is reconstructed
first using the received information relating to the P-macroblock and the retained contents of the preceding P-
frame.
 The contents of the reconstructed P-macroblock are then used, together with the received encoded information
relating to the macroblock in the B-frame and the retained contents of the preceding P-frame, to bidirectionally
predict the decoded contents of the B-macroblock.
 When the decoding of both frames is complete, the B-frame is played out first, followed by the P-frame.

Unrestricted Motion Vectors

 The motion vectors associated with predicted macro-blocks are restricted to defined areas in the reference frame
around the location in the target frame of the macro-block being encoded.
 The search area is restricted to the edge of the frame.
 This means that should a small portion of a potential close- match macro-block fall outside of the frame
boundary, then the target macro-block is automatically encoded as for an I-frame.
 This occurs even though the portion of the macro-block within the frame area is a close match.
 To overcome this limitation, in the unrestricted motion vector mode, for those pixels of a potential close-match
macroblock that fall outside of the frame boundary, the edge pixels themselves are used instead. Should the
resulting macroblock produce a close match, then the motion vector, if necessary, is allowed to point outside of
the frame area.
 This is particularly useful with the small digitized frame formats that are used with the H.263 standard.
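The effect of this mode can be seen in the way a candidate block is read from the reference frame: any sample that falls outside the frame simply reuses the nearest edge pixel. The sketch below is a minimal illustration, assuming the reference frame is a list of rows of 8-bit luminance values and that the block-matching cost is the sum of absolute differences:

def ref_pixel(frame, x, y):
    """Reference-frame pixel at (x, y); coordinates outside the frame are
    clamped so that the nearest edge pixel is used instead."""
    h, w = len(frame), len(frame[0])
    x = min(max(x, 0), w - 1)
    y = min(max(y, 0), h - 1)
    return frame[y][x]

def sad(ref_frame, target_block, top_left_x, top_left_y):
    """Sum of absolute differences between a 16 x 16 target macroblock and a
    candidate block whose corner may lie (partly) outside the frame boundary."""
    total = 0
    for dy in range(16):
        for dx in range(16):
            total += abs(target_block[dy][dx] -
                         ref_pixel(ref_frame, top_left_x + dx, top_left_y + dy))
    return total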
Error Resilience:

 The target network for the H.263 standard is a wireless network or PSTN.
 With this type of network there is a relatively high probability that transmission bit errors will be present in the
bitstream received by the decoder.
 Normally such errors are characterized by periods when a string of error-free frames is received followed by a
short burst of errors that typically corrupts a string of macroblocks within a frame.
 In practice, it is not possible to identify the specific macroblocks that are corrupted but rather that the related
group of blocks (GOB) contains one or more macroblocks that are in error.
 Also, because the contents of many frames are predicted from information in other frames, it is highly probable
that the same GOB in each of the following frames that is derived from the GOB in error will also contain errors.
 This means that when an error in a GOB occurs, the error will persist for a number of frames, hence making the
errors more apparent to the viewer.
 When an error in GOB is detected, the decoder skips the remaining macro-blocks in the affected GOB and
searches for the resynchronization marker (start code) at the head of the GOB. It then recommences decoding
from the start of this GOB.
 In order to mask the error from the viewer, an error concealment scheme is incorporated into the decoder.
 Since a PSTN provides only a relatively low bit rate transmission channel, to conserve bandwidth, intra coded
(I) frames are inserted at relatively infrequent intervals.
 Hence, in applications such as video telephony in which the video and audio are being transmitted in real time,
the lack of I-frames has the effect that errors within a GOB may propagate to other regions of the frame due
to the resulting errors in the motion estimation vectors and motion compensation information.
 With digitization formats such as the QCIF the resulting effect can be very annoying to the viewer, since the
initial errors spread to neighboring GOBs. The schemes defined to counter this are error tracking, independent
segment decoding, and reference picture selection.
Error Tracking

 In video telephony, a two-way communication channel is required for the exchange of the compressed audio and
video information generated by the codec in each terminal.
 This means that there is always a return channel from the receiving terminal back to the sending terminal, and
this is used in all three schemes by the decoder in order to inform the related encoder that an error in a GOB has
been detected.
 Errors are detected in a number of ways including:
 1 or more out-of-range motion vectors
 1 or more invalid variable-length codewords
 1 or more out-of-range DCT coefficients
 An excessive number of DCT coefficients within a macroblock
 In the error tracking scheme, the encoder retains what is known as error prediction information for all the GOBs
in each of the most recently transmitted frames; i.e., the likely spatial and temporal effects on the macroblocks
in the following frames that will result if a specific GOB in a frame is corrupted.
 When an error is detected, the return channel is used by the decoder to send a negative acknowledgment (NAK)
message back to the encoder in the source codec containing both the frame number and the location of the GOB
in the frame that is in error.
 The encoder then uses the error prediction information relating to this GOB to identify the macroblocks in the
GOBs of later frames that are likely to be affected.
 It then proceeds to transmit the macroblocks in these frames in their intracoded form. This is shown in the figure below.
 H.263 error tracking scheme: (a) example error propagation

 H.263 error tracking scheme: (b) same example with error tracking applied.
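A rough sketch of the encoder-side bookkeeping is given below. The data structure used to hold the error prediction information (a map from a GOB of a transmitted frame to the macroblocks of later frames that are predicted from it) is an assumption made for illustration; the standard does not prescribe how this information is stored:

class ErrorTrackingEncoder:
    """Illustrative encoder-side bookkeeping for the H.263 error tracking scheme."""

    def __init__(self):
        # error_prediction[frame_no][gob_no] -> set of (frame_no, macroblock_no)
        # pairs in later frames that are predicted from that GOB.
        self.error_prediction = {}
        self.forced_intra = set()       # macroblocks scheduled to be sent intracoded

    def record_dependency(self, src_frame, src_gob, dst_frame, dst_mb):
        """Note that macroblock dst_mb of frame dst_frame is predicted from src_gob of src_frame."""
        gobs = self.error_prediction.setdefault(src_frame, {})
        gobs.setdefault(src_gob, set()).add((dst_frame, dst_mb))

    def on_nak(self, frame_no, gob_no):
        """Called when a NAK arrives identifying the frame and GOB in error."""
        affected = self.error_prediction.get(frame_no, {}).get(gob_no, set())
        self.forced_intra |= affected   # transmit these macroblocks in intracoded form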


Independent Segment Decoding:

 The aim of this scheme is not to overcome errors that occur within a GOB but rather to prevent these errors from
affecting neighboring GOBs in succeeding frames.
 To achieve this, each GOB is treated as a separate sub-video which is independent of the other GOBs in the
frame.
 Thus the motion estimation and compensation is limited to the boundary pixels of each GOB rather than of the frame.
 The operation is shown in parts (a) and (b) of the figure below.
 In part (a), although when an error in a GOB occurs the same GOB in each successive frame is affected (until a
new intracoded GOB is sent by the encoder), neighboring GOBs are not affected.
 Clearly, however, a limitation of this scheme is that the efficiency of the motion estimation is reduced by the search
area being limited to a single GOB.
 The scheme is not normally used on its own but in conjunction with either the error tracking scheme or the
reference picture selection scheme.
 Independent segment decoding: (a) effect of a GOB being corrupted.

 Independent segment decoding: (b) when used with error tracking.


Reference Picture Selection


 This is similar to error tracking in that it aims to stop errors propagating, with the decoder returning
acknowledgment messages when an error in a GOB is detected.
 The scheme can be operated in 2 different modes:
1) NAK mode and 2) ACK mode
 NAK mode: only GOBs in error are signaled by the decoder returning a NAK message. Normally frames
are encoded using an intracoded (I) frame as the initial reference frame.
 However, during encoding, a copy of each decoded frame is retained by the
encoder.
 Therefore, the encoder can select any of these as the reference: for example, when the NAK relating to frame 2 is
received, the encoder selects GOB 3 of frame 1 as the reference to encode GOB 3 of the next frame.
 With this scheme, the GOB in error will propagate for a number of frames, the number being determined by the
round-trip delay between the NAK being sent by the decoder and an intercoded frame derived from an error-free
reference frame being received.
 Reference picture selection with independent segment decoding: (a) NAK mode

 ACK Mode: all frames received without errors are acknowledged by the decoder returning an ACK message.
 Only frames that have been acknowledged are used as reference frames.
 Hence the lack of an ACK for frame 3 means that frame 2 must be used to encode frame 6 in addition to frame 5.
 At this point the ACK for frame 4 is received and hence the encoder then uses this to encode frame 7
 The effect of using a reference frame which is distant from the frame being encoded is to reduce the encoding
efficiency for the frame.
 The ACK mode performs best when the round-trip delay of the communication channel is short and less than the
time the encoder takes to encode each frame.
 Reference picture selection with independent segment decoding: (b) ACK mode.
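The ACK mode can be pictured with a short sketch in which the encoder keeps the locally decoded copies of recent frames and always predicts the next frame from the most recent frame that the decoder has acknowledged. The class and method names are assumptions made for illustration only:

class AckModeReferenceSelector:
    """Illustrative reference-frame bookkeeping for the ACK mode of reference picture selection."""

    def __init__(self):
        self.decoded_copies = {}      # frame number -> locally decoded frame
        self.acked = set()            # frame numbers acknowledged by the decoder

    def store(self, frame_no, decoded_frame):
        self.decoded_copies[frame_no] = decoded_frame

    def on_ack(self, frame_no):
        self.acked.add(frame_no)

    def reference_for(self, next_frame_no):
        """Most recent acknowledged frame earlier than the frame being encoded.
        With a long round-trip delay this reference may be several frames old,
        which reduces the encoding efficiency, as noted above."""
        candidates = [n for n in self.acked if n < next_frame_no]
        return self.decoded_copies[max(candidates)] if candidates else None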


MPEG: Moving Picture Experts Group


 Formed by the ISO to formulate a set of standards relating to a range of multimedia applications that involve the
use of video with sound.
 The outcome is a set of 3 standards which relate to either the recording or the transmission of integrated audio
and video streams.
 Each targeted at particular application domain and describes how the audio and video are compressed and
integrated together.
 The 3 standards, which use different video resolutions, are MPEG-1, MPEG-2 and MPEG-4.
MPEG-1
 This is defined in a series of documents which are all subsets of ISO Recommendation 11172.
 The video resolution is based on the source intermediate digitization format(SIF) with resolution of up to 352 X
288 pixels.
 The standard is intended for the storage of VHS – quality audio and video on CD-ROM at bit rates upto
1.5Mbps.
 Normally, higher bit rates of multiples of this are more common in order to provide faster access to the stored
material.
 The MPEG-1 and MPEG-2 video standards use a similar video compression technique to H.261.
 The digitization format used with MPEG-1 is the SIF. Each frame is divided into macroblocks of 16 x 16 pixels
for compression; the horizontal resolution is reduced from 360 to 352 pixels to produce an integral number (22)
of macroblocks.
 Hence, since the two chrominance signals are subsampled at half the rate of the luminance signal, the spatial resolutions
for the two types of video source are:
 NTSC: Y = 352 X 240, Cb = Cr = 176 X120
 PAL: Y = 352 X 288, Cb = Cr = 176 X144
 Progressive scanning is used with refresh rate of 30Hz for NTSC and 25Hz for PAL.
 The standard allows the use of I-frames only, I- and P-frames only, or I-, P-, and B-frames. No D-frames
are supported in any of the MPEG standards and hence, in the case of MPEG-1, I-frames must be used for the
various random access functions associated with VCRs.
 The accepted maximum random-access time is 0.5 seconds and so this is the main factor, along with video
quality, that influences the maximum separation of I-frames in the frame sequence used.
 Two example sequences are: IBBPBBPBBI………………………… and IBBPBBPBBPBBI…………….
 The first being the original sequence proposed for use with PAL (which has slower frame refresh rate) and the
second for use with NTSC.
 The compression algorithm used is based on the H.261 standard.
 Hence each macroblock is made up of 16 x 16 pixels in the Y plane and 8 x 8 pixels in the Cb and Cr planes.
 However there are 2 main differences.
 The first is that time stamps can be inserted within a frame to enable the decoder to resynchronize more quickly in
the event of one or more corrupted or missing macroblocks.
 The number of macroblocks between two time stamps is known as a slice, and a slice can comprise from 1 through to the
maximum number of macroblocks in a frame.
 Typically a slice is made equal to 22, which is the number of macroblocks in a line.
 The second difference arises because the introduction of B-frames increases the time interval between I- and P-
frames.
 To allow for the resulting increase in the separation of moving objects between P-frames, the search window in the
reference frame is increased.
 Also, to improve the accuracy of the motion vectors, a finer resolution is used.
 Typical compression ratios vary from about 10:1 for I-frames, 20:1 for P-frames and 50:1 for B-frames.
 At the top level, the completely compressed video is known as a sequence, which in turn consists of a string of
groups of pictures (GOPs), each comprising a string of I, P or B pictures/frames in the defined sequence.
 Each picture/frame is made up of N slices, each of which comprises multiple macroblocks, and so on down to an
8 x 8 pixel block.
 Hence, in order for the decoder to decompress the received bitstream, each data structure must be clearly
identified within the bitstream.
 MPEG-1 video bitstream structure: (a) composition.

 MPEG-1 video bitstream structure: (b) format.


 The start of the sequence is indicated by a sequence start code. This is followed by 3 parameters, each of which
applies to the complete video sequence.
The video parameters specify the screen size and aspect ratio, the bitstream parameters indicate the bit rate and the
size of the memory/frame buffers that are required, and the quantization parameters contain the contents of the
quantization tables that are to be used for the various frame/picture types. These are followed by the encoded video
stream, which is in the form of a string of GOPs.
 Each GOP is separated by a start code followed by a time stamp for synchronization purposes and a parameters
field that defines the particular sequence of frame types used in each GOP.
 This is then followed by the string of encoded pictures/frames in each GOP.
 Each is separated by a picture start code and is followed by a type field (I, P, or B), buffer parameters, which
indicate how full the memory buffer should be before the decoding operation starts, and encode parameters,
which indicate the resolution used for the motion vectors. This is followed by a string of slices, each comprising
a string of macroblocks.
 Each slice is separated by a slice start code followed by a vertical position field, which defines the scan line the
slice relates to, and a quantization parameter that indicates the scaling factor that applies to this slice. This is then
followed by a string of macroblocks, each of which is encoded in the same way as for H.261.
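The nesting just described (sequence, GOP, picture, slice, macroblock) can be summarized with a few illustrative data classes. The field selection is a simplification for readability and is not the full MPEG-1 syntax:

from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Macroblock:
    mb_type: str                              # intra- or inter-coded
    coefficients: List[int] = field(default_factory=list)

@dataclass
class Slice:
    vertical_position: int                    # scan line the slice relates to
    quant_scale: int                          # scaling factor applying to the whole slice
    macroblocks: List[Macroblock] = field(default_factory=list)

@dataclass
class Picture:
    picture_type: str                         # "I", "P" or "B"
    slices: List[Slice] = field(default_factory=list)

@dataclass
class GroupOfPictures:
    time_stamp: float
    pictures: List[Picture] = field(default_factory=list)

@dataclass
class Sequence:
    screen_size: Tuple[int, int]              # video parameters
    bit_rate: int                             # bitstream parameters
    gops: List[GroupOfPictures] = field(default_factory=list)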

MPEG-2
 This is defined in a series of documents which are all subsets of ISO Recommendation 13818.
 Intended for the recording and transmission of studio – quality audio and video.
 The standard covers 4 levels of video resolutions
 LOW: based on SIF digitization format with a resolution of up to 352 X 288 Pixels. It is compatible with the
MPEG-1 standard and produces VHS quality video. The audio is of CD quality and the target bit rate is upto
4 Mbps
 MAIN: based on the 4:2:0 digitization format with a resolution of up to 720 x 576 pixels. This produces studio-
quality video and the audio allows for multiple CD-quality audio channels. The target bit rate is up to
15 Mbps, or 20 Mbps with the 4:2:2 format.
 HIGH 1440: based on the 4:2:0 digitization format with a resolution of 1440 x 1152 pixels. Intended for
HDTV at bit rates of up to 60 Mbps, or 80 Mbps with the 4:2:2 format.
 HIGH: based on the 4:2:0 digitization format with a resolution of 1920 x 1152 pixels. It is intended for wide-
screen HDTV at a bit rate of up to 80 Mbps, or 100 Mbps with the 4:2:2 format.
 In addition there are 5 profiles associated with each level; simple, main, spatial resolution, quantization accuracy
and high .
 These have been defined so that the 4 levels and 5 profiles collectively form a 2D table which acts as a framework for
all standards activities associated with MPEG-2.
 The low level of MPEG-2 is compatible with MPEG-1; the main profile at the main level (MP@ML) is used for
digital television broadcasting, and the two high levels relate to HDTV.

MP@ML
 This is the standard for digital TV broadcasting. Interlaced scanning is used, resulting in a frame refresh rate of either
30 Hz (NTSC) or 25 Hz (PAL), and the 4:2:0 digitization format is used with a resolution of either 720 x 480 pixels at
30 Hz or 720 x 576 pixels at 25 Hz.
 The output bit rate from the multiplexer can range from 4 Mbps through to 15 Mbps, the actual rate used being determined
by the bandwidth available with the broadcast channel.
 The video coding scheme is similar to that used in MPEG-1; the main difference is that interlaced scanning is used.
Interlaced scanning means that each frame is made up of two fields, and the DCT blocks can be derived in either
field mode or frame mode. The choice of mode is determined by the amount of motion present in the video. If a
large amount of motion is present, it is better to perform the DCT encoding operation on the lines in each field
separately, since this will produce a higher compression ratio owing to the shorter time interval between
successive fields.
 Alternatively, if there is little movement, the frame mode can be used since the longer time interval between successive
frames is less important. Hence in this case the macroblocks/DCT blocks are derived from the lines in each
complete frame. The standard allows either mode to be used, the choice being determined by the type of video. For
example, a live sports event is likely to be encoded using the field mode and a studio-based program using the frame
mode.
 In relation to the motion estimation associated with the encoding of macro-blocks in P and B frames, 3 different
modes are possible; field, frame, mixed.
 Field mode: the motion vector for each macroblock is computed using the search window around the
corresponding macroblock in the immediately preceding (I or P) field for P-frames and B-frames and, for B-
frames, also the immediately succeeding (P or I) field. The motion vectors therefore relate to the amount of
movement that has taken place in the time to scan one field.
 Frame mode: a macroblock in an odd field is encoded relative to that in the preceding/succeeding odd fields,
and similarly for the macroblocks in even fields. The motion vectors relate to the amount of movement that has
taken place in the time to scan two fields; that is, the time to scan a complete frame.
 Mixed mode: the motion vectors for both field and frame modes are computed and the one with the smallest
mean value is selected.
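The mixed mode decision can be expressed as a small sketch: compute the prediction produced by the field estimate and by the frame estimate and keep whichever fits the macroblock better. The cost measure used here (mean absolute difference) is an assumption for illustration:

def mean_abs_error(block_a, block_b):
    """Mean absolute difference between two equally sized pixel blocks."""
    n = len(block_a) * len(block_a[0])
    return sum(abs(a - b) for row_a, row_b in zip(block_a, block_b)
                          for a, b in zip(row_a, row_b)) / n

def choose_motion_mode(target_mb, field_prediction, frame_prediction):
    """Mixed mode: evaluate both the field-based and the frame-based prediction
    for a macroblock and select whichever gives the smaller mean error."""
    field_cost = mean_abs_error(target_mb, field_prediction)
    frame_cost = mean_abs_error(target_mb, frame_prediction)
    return ("field", field_cost) if field_cost <= frame_cost else ("frame", frame_cost)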

HDTV
 There are 3 standards associated with HDTV:
i. Advanced Television (ATV) in North America
ii. Digital Video Broadcasting (DVB) in Europe
iii. Multiple sub-Nyquist Sampling Encoding (MUSE) in Japan and the rest of Asia
 All 3 standards define the digitization format and the audio and video compression schemes used, and also how the
resulting digital bitstreams are transmitted over the different types of broadcast network.
 There is an ITU-R HDTV specification concerned with the digital TV equipment used in TV studios for the
production of HDTV programs and also for the international exchange of programs; this defines a 16/9
aspect ratio with 1920 samples per line and 1152 lines per frame.
 Currently, interlaced scanning is used with the 4:2:0 digitization format. In the future it is expected that progressive
scanning will be introduced with the 4:2:2 format.
 The ATV standard was formulated by an alliance of a large number of TV manufacturers known as the Grand
Alliance. It includes the ITU-R HDTV specification and a second, lower-resolution format. This also uses a
16/9 aspect ratio but with a resolution of 1280 x 720.
 The video compression algorithm in both cases is based on the main profile at the high level (MP@HL) of
MPEG-2 and the audio compression standard is based on Dolby AC-3.

 The DVB HDTV standard is based on a 4/3 aspect ratio and defines a resolution of 1440 samples per line and 1152 lines
per frame. This is exactly twice the resolution of the lower-definition PAL digitization format of 720 x 576.
 The video compression algorithm is based on the SSP@H1440 (spatially scalable profile at high 1440) of
MPEG-2, similar to that used with MP@HL. The audio compression standard is MPEG audio layer 2.
 The MUSE standard is based on a 16/9 aspect ratio with a digitization format of 1920 samples per line and 1035 lines
per frame. The video compression algorithm used is MP@HL.
 MPEG-2 DCT block derivation with I-frames: (a) effect of interlaced scanning.

 MPEG-2 DCT block derivation with I-frames: (b) field mode;

 MPEG-2 DCT block derivation with I-frames: (c) frame mode.

MPEG-4

 Initially this standard was concerned with a similar range of applications to those of H.263.
 Each runs over very low bit rate channels ranging from 4.8 to 64 kbps.
 Later its scope was expanded to embrace a wide range of interactive multimedia applications over the Internet and the
various types of entertainment networks.
 The first 3 MPEG standards are in 3 parts: video, audio and system.
 The video and audio are concerned with the way each is compressed and how the resulting bitstream are
formatted.
 The system part is concerned with how the two streams are integrated together to produce a synchronized output
stream.
 The standard contains features to enable a user not only to passively access a video sequence but also to access and
manipulate the individual elements that make up each scene within the sequence / video.
 If the accessed video is computer-generated (a cartoon, for example), the user may be given the capability by the creator of the
video to reposition, delete, or alter the movements of the individual characters within a scene.
 Because of its high coding efficiency with scenes like those in video telephony, the standard is also used in low bit rate
networks, where for such applications it is an alternative to the H.263 standard.
Scene Composition:
 MPEG-4 has a number of what are called content-based functionalities. Before being compressed, each scene is defined in the
form of a background and one or more foreground audio-visual objects (AVOs). Each AVO is in turn defined in the form of one
or more video objects and/or audio objects.
 For example, a stationary car in a scene may be defined using just a single video object, while a person who is talking
may be defined using both an audio and a video object.
 In a similar way, each video and audio object may itself be defined as being made up of a number of sub-objects. For
example, a person's face may be defined in the form of three sub-objects: 1) head, 2) eyes, 3) mouth. Once this has been done,
the encoding of the background and each AVO is carried out separately.
 An AVO consisting of both audio and video objects has additional timing information relating to it to enable the
receiver to synchronize the various objects and sub-objects together before they are decoded.
 Each AV object has a separate object descriptor associated with it which allows the object (providing the creator of the AVO
permits it) to be manipulated by the viewer prior to it being decoded and played out. The language used to describe and
modify objects is called the binary format for scenes (BIFS). This has commands to delete an object and, in the case of a video
object, to change its shape and appearance (for example its color) and to animate the object in real time.
 Audio objects have a similar set of commands, for example to change their volume. It is possible to have multiple versions of an AVO,
the first containing the base-level compressed AV streams and the others various levels of enhancement.
 This type of compression is called scaling and allows the encoded contents of an AVO to be played out at a rate and
resolution that matches those of the interactive terminal being used. At a higher level, the composition of a scene
is defined in a separate scene descriptor. This defines the way the various AVOs are related to each other in
the context of the complete scene.
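The composition described above can be pictured with a small data model: a scene descriptor holding a background and a list of AVOs, each of which may have sub-objects, optional enhancement layers and its own object descriptor. This is an illustrative sketch only and is not the BIFS syntax:

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class AVObject:
    name: str                                  # e.g. "car", "person"
    video_object: Optional[str] = None
    audio_object: Optional[str] = None
    sub_objects: List["AVObject"] = field(default_factory=list)   # e.g. head, eyes, mouth
    enhancement_layers: int = 0                # scaling: base layer plus optional enhancements

@dataclass
class SceneDescriptor:
    background: str
    avos: List[AVObject] = field(default_factory=list)

scene = SceneDescriptor(
    background="street",
    avos=[
        AVObject("car", video_object="car_vop"),
        AVObject("person", video_object="person_vop", audio_object="speech",
                 sub_objects=[AVObject("head"), AVObject("eyes"), AVObject("mouth")]),
    ],
)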
AV compression
 Each AVO is compressed using one of a number of algorithms, depending on the available bit rate of the transmission channel and
the quality required.
 As shown in part (a) of the figure below, deriving each video object plane (VOP) is a difficult image processing task. It involves
identifying regions of a frame that have similar properties such as color, texture, or brightness.
 Each of the resulting object shapes is then bounded by a rectangle to form the related VOP. A VOP which has no motion
associated with it produces a minimum of compressed information. Also, since objects which move often occupy only a small
portion of the scene/frame, the bit rate of the multiplexed video stream is much lower than that obtained with other
standards.
 For applications that support interactions with particular VOP, a number of advanced coding algorithms are
available to perform the shape coding functions.
 MPEG-4 coding principles: (a) encoder/decoder schematics:


 VOP encoder schematic.

Transmission format


 MPEG-4 Part 14, or MP4, is a digital multimedia container format most commonly used to store video and audio,
but it can also be used to store other data such as subtitles and still images. Like most modern container formats, it
allows streaming over the Internet.
 All information relating to a frame/scene encoded in MPEG-4 is transmitted over a network in the form of a
transport stream consisting of a multiplexed stream of packetized elementary streams (PESs).
 The compressed audio/video information relating to each AVO in the scene is called an elementary stream (ES) and this is
carried in the payload field of a PES packet. Each PES packet contains a type field in the packet header and this
is used by the FlexMux layer to identify and route the PES to the related synchronization block in the
synchronization layer. The compressed audio/video associated with each AVO is carried in a separate ES.
Exercise: A digitized video is to be compressed using the MPEG-1 standard. Assuming a frame sequence of:
IBBPBBPBBPBBI... and average compression ratios of 10:1 (I), 20:1 (P) and 50:1 (B), derive the average bit
rate that is generated by the encoder for both the NTSC and PAL digitization formats.
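One way to work the exercise is sketched below. It assumes 4:2:0 chrominance subsampling (as given for MPEG-1 above), 8 bits per sample, and a 12-frame GOP containing 1 I-, 3 P- and 8 B-frames:

def avg_bit_rate(y_w, y_h, fps, ratios=(10, 20, 50), gop=(1, 3, 8)):
    """Average MPEG-1 encoder output for a 4:2:0 source at 8 bits per sample."""
    samples = y_w * y_h + 2 * (y_w // 2) * (y_h // 2)     # Y + Cb + Cr samples per frame
    raw_bits = samples * 8                                 # bits per uncompressed frame
    n_i, n_p, n_b = gop
    frames = n_i + n_p + n_b
    avg_frame_bits = (n_i * raw_bits / ratios[0] +
                      n_p * raw_bits / ratios[1] +
                      n_b * raw_bits / ratios[2]) / frames
    return avg_frame_bits * fps

print(f"NTSC: {avg_bit_rate(352, 240, 30) / 1e6:.2f} Mbps")   # about 1.04 Mbps
print(f"PAL:  {avg_bit_rate(352, 288, 25) / 1e6:.2f} Mbps")   # about 1.04 Mbps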


Reversible variable length codes (RVLCs) were proposed to facilitate the bidirectional decoding of a source-encoded bit
stream, which moderates the visual effects of transmission errors when synchronization between the encoder and decoder is
lost, for example in wireless video telephony. RVLCs are accepted as a substitute for Huffman codes in recent video coding
standards such as H.263+, H.263++ and MPEG-4 to improve the resulting error resilience capability. Traditionally, variable
length codes (VLCs) have been used for entropy coding in many image coding standards such as JPEG and video coding
standards such as H.261, H.263, MPEG-1 and MPEG-2. An example of a VLC is the Huffman code, which is well known to give
the optimal code with minimum redundancy. However, recent standards use RVLCs because VLCs have the problem of error
propagation: even a single bit error will cause many of the following codewords to be misinterpreted, which is a serious problem
in an error-prone environment.

RVLCs are not only prefix-free codes but also suffix-free codes. A code is called prefix free if no codeword is a prefix of any
other codeword; similarly, a code is called suffix free if no codeword is a suffix of any other codeword. Therefore, RVLCs can
be decoded in both the forward and backward directions, which provides error-resilient transmission over a noisy channel.
RVLCs can also be applied to speed up the searching of encoded data, because with an RVLC the encoded data can be searched
in the forward and backward directions at the same time. This can drastically reduce the search time, and this kind of searching
is impossible when ordinary VLCs are used. There are two types of RVLCs, symmetrical and asymmetrical. Symmetrical
RVLCs share the same coding table when decoding in the forward direction and when decoding in the backward direction,
because each codeword is symmetrical. In the case of asymmetrical RVLCs, two separate coding tables are necessary, one for
the forward direction and one for the backward direction. For this reason symmetrical RVLCs are simpler than asymmetrical
RVLCs, and the memory requirement of a symmetrical RVLC is less than that of an asymmetrical RVLC. However,
asymmetrical RVLCs always provide better coding efficiency than symmetrical RVLCs because a more flexible code selection
is allowed.

USE OF RVLC IN RECENT VIDEO CODING STANDARDS

Variable length codes are widely used for data compression in many different applications. However, as with other data
compression techniques, variable length codes are vulnerable to channel errors: a single bit error in the encoded bit stream may
lead to a large number of decoding errors. Many approaches have been proposed to make variable length codes more robust in
the presence of channel errors, at the cost of slightly increasing the average codeword length. Some authors suggested the use of
RVL codes to facilitate the location of errors in the encoded message. They have both the prefix and suffix properties and can be
decoded immediately in both the forward and backward directions. One of the error resilience tools in MPEG-4 uses RVL codes
rather than Huffman or arithmetic codes to limit the number of bits in a video frame that are corrupted. The decoding procedure
for RVL codes is as follows. The RVLC-encoded RLL values, i.e. Run, Level and Last, are decoded between two
resynchronization markers in the forward direction. An RLL value is a representation of a segment that is composed of some
zeros followed by a non-zero DCT coefficient, in video compression methods using the DCT transformation.

The run value represents the number of successive zeros in a segment; the level value represents the magnitude of the non-zero
coefficient following the run of zeros; and the last value is a flag (one bit) that indicates whether the current non-zero coefficient is the
last non-zero coefficient in the block. In the decoding procedure, if an error is detected, forward decoding is stopped and the next
resynchronization marker is found. The same data block is then decoded in the backward direction from the next
resynchronization marker, as shown in figure (1). If an error has occurred in the data block, then when forward decoding is performed
the error is detected to the right of the actual position of the error, whereas in the case of backward decoding the error is
detected to the left of the error's actual position. The data between the two detected error positions is
discarded, whereas the rest of the data is correctly decoded. In table 1, the average code length of the symmetrical RVLC C14 is
shorter than those of C12 and C13, where C11 is a list of Huffman codes, C12 is a symmetrical RVLC generated by Takishima et al.'s algorithm
and C13 is a symmetrical RVLC generated by Tsai and Wu's algorithm.
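The bidirectional decoding property can be demonstrated with a small symmetrical code table. The table below is invented for the example (it is not one of the published C11-C14 tables); the point is simply that every codeword is a palindrome and the set is both prefix free and suffix free, so a single table serves both decoding directions:

# Illustrative symmetrical RVLC: every codeword is a palindrome,
# and the set is both prefix free and suffix free.
CODE = {"a": "0", "b": "11", "c": "101", "d": "1001"}
DECODE = {v: k for k, v in CODE.items()}
MAX_LEN = max(len(v) for v in CODE.values())

def decode_forward(bits):
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in DECODE:
            out.append(DECODE[buf])
            buf = ""
        elif len(buf) > MAX_LEN:
            break                     # error detected: stop and resume from a marker
    return out

def decode_backward(bits):
    # Decode from the end of the segment; the symmetric codewords let us reuse DECODE.
    out, buf = [], ""
    for b in reversed(bits):
        buf = b + buf
        if buf in DECODE:
            out.append(DECODE[buf])
            buf = ""
        elif len(buf) > MAX_LEN:
            break
    return list(reversed(out))

encoded = "".join(CODE[s] for s in "abcd")    # "0111011001"
print(decode_forward(encoded))                # ['a', 'b', 'c', 'd']
print(decode_backward(encoded))               # ['a', 'b', 'c', 'd']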

Standards for multimedia communications

• A range of application-level standards have been defined that are concerned with how the integrated information streams
associated with the various applications are structured.
• Standards are necessary because it is essential that the two or more items of equipment that are used for the application
interpret the integrated information stream in the same way.
• It is necessary also to ensure that both communicating parties utilize the same standards for detecting the presence of bit errors
in the received information stream.
• Aspects of communication protocol:
• Detecting the presence of bit errors in the received information stream and requesting retransmission.
• The initiation and clearing of a communications session between 2 communicating applications
• The setting up and clearing of a connection through the particular network being used.

Reference model
The standards associated with the 3 types of basic applications have a common structure.
Functionality of set of standards:
Application standards: provide users, through an appropriate interface, with access to a range of multimedia communication
applications.
Network interface standards: different types of network operate in different modes and each type of network has a different set
of standards for interfacing to it.
Internal network standards: deal with the internal operation of the network.


In practice, associated with each standard is the set of procedures that are to be used to perform the particular function such as
How to format the source information stream
How to detect transmission errors
How to handle errors
For each function, both communicating parties must adhere to the same set of procedures and collectively these form the
communications protocol relating to that function.
The implementation of a communication system is based on a layered architecture.
The protocol layers that are normally used are based on what is called the TCP/IP reference model.
TCP/IP reference model
A reference model is simply a common framework for defining the specific set of protocols to be used with a particular
application/network combination.
The resulting set of protocols is then known as the protocol stack for that application/network combination.


Physical layer
The physical layer is concerned with how the binary information stream associated with an application is transmitted over the
access circuit to the network interface
Link layer
The more usual form of representing the source information stream is in the form of a contiguous stream of blocks with each
block containing the integrated media stream associated with the application.
The role of the link layer is to indicate the start and end of each block within the source bitstream and, in a packet-switched
network, to add error check bits to the information bitstream for error detection and/or error correction purposes.
Network layer
The network layer is concerned with how the source information stream gets from one end system to another across the total
network.
Examples:
Connection-oriented network : how to set up/clear connection, exchange information
Connectionless network: how to format a packet
There are different network layer protocols for different types of network
Transport layer
The role of the transport layer is to mask the differences between the service offered by the various network types from the
application layer and instead, to provide the application with a network-independent information interchange service.
Application layer
The application layer provides the user, through a suitable interface, with access to a range of multimedia communication
services.
The application layer in an end system contains a selection of application protocols, each providing a particular service

Standards relating to interpersonal communications


Interpersonal communications such as telephony, video telephony, data conferencing, and videoconferencing can be provided
both by circuit-mode networks and packet- mode networks.
Most of the standards relating to these applications have been defined by the ITU-T and there are separate standards for different
types of network.

Circuit-mode networks
The network interface standards relate primarily to the physical connection to the network termination and with the procedures
followed to set up and clear a connection.
The basic transport layer function is provided by the multiplexer/demultiplexer.
The multiplexer merges (1) the source information from the 3 application streams - audio, video and user data - and (2) the
system control application into a single stream for transmission over the constant bit rate channel provided by the connection


The system control application is concerned with negotiating and agreeing on the operational parameters to be used with the
call/session.
In a multiparty conference call, it involves each end system communicating with a multipoint control unit (MCU).
The audio and video codecs each use a particular compression algorithm which is appropriate for the application and within the
bandwidth limits provided by the network.
If the user data is shared between the various members of a conference, the application uses the services provided by a protocol
known as a multipoint communications service (MCS).
A system -level standard embraces a number of additional standards for the various component functions such as audio and
video compression
Table 5.1 Summary of the standards used with the different types of circuit-mode network

H.320 (ISDN): audio codec G.711*, G.722, G.728; video codec H.261; user data application T.120; multiplexer/demultiplexer H.221; system control H.242; call setup (signaling) Q.931
H.324 (PSTN): audio codec G.723.1*, G.729; video codec H.261*, H.263*; user data application T.120; multiplexer/demultiplexer H.223; system control H.245; call setup (signaling) V.25
H.321 (B-ISDN): audio codec G.711*, G.722, G.728; video codec H.261; user data application T.120; multiplexer/demultiplexer H.221; system control H.242; call setup (signaling) Q.931
H.310 (B-ISDN): audio codec G.711*, G.722, G.728, MPEG-1*; video codec H.261*, MPEG-2*; user data application T.120; multiplexer/demultiplexer H.221/H.222; system control H.245; call setup (signaling) Q.931
H.322 (guaranteed-bandwidth LANs): audio codec G.711*, G.722, G.728; video codec H.261; user data application T.120; multiplexer/demultiplexer H.221; system control H.242; call setup (signaling) Q.931

* Mandatory
H.320
The H.320 standard is intended for use in end systems that support a range of multimedia applications over an ISDN
Audio
Options: G.711 (64kbps), G.722 (64kbps) and G.728 (16kbps)
Determined primarily by the amount of available bandwidth.

Video

H.261
video resolution: QCIF or CIF, negotiable
a constant bit rate is maintained by varying the quantization threshold.

User data
Based on T.120 standard
Application-specific recommendations that support the sharing of various media types
T.124: text
T.126: still-image and whiteboard
T.127: file contents (text and binary)
T.128: text documents and spreadsheets
Communications-related recommendations
T.122: multipoint control unit (MCU) procedures
T.125: multipoint communication services (MCS) procedures
T.123: a series of network-specific transport protocols for providing a reliable transport service.
To use non-standard protocols is negotiable.

System control/call setup


The call setup (signaling) procedure associated with an ISDN is defined in Recommendation Q.931.
It involves the exchange of messages over a separate 16kbps channel known as the signaling channel.
The bandwidth associated with the audio, video and data streams are negotiated and fixed at the start of a conference.
Recommendation H.242 is concerned primarily with the negotiation of the bandwidth/bit rate to be used for each stream.
Once an end system has set up a connection to the MCU, it informs the MCU of its capabilities. The MCU then negotiates and
agrees a minimum set of capabilities so that all members of the conference can participate
Multiplexing
It is defined in Recommendation H.221 and describes how the audio, video, and data streams are multiplexed together for
transmission over the network.
Time division multiplexing (TDM) technique is normally used.

H.324
The H.324 standard is intended for use in a PSTN.

Video
Options: H.261 or H.263

Audio
Options: either G.723.1 (5.3/6.3kbps) or G.729

User data:
Basically the same set of protocols as are used in an H.320-compliant terminal except for the network- specific transport
protocol T.123
Multiplexing
Streams are not allocated fixed portions of the available bandwidth but rather these are negotiated using the
H.245 system control protocol.
The total channel bandwidth is divided into a number of separate logical channels each of which is identified by means of a
logical channel number (LCN).
The allocation of LCNs is controlled by the transmitter.
A bit-oriented protocol is used to merge streams that are currently present into the available channel


Each M-PDU is delimited by the flag byte 01111110.

An M-PDU consists of a header and an information field.
The information field contains data from a number of logical channels.
Each channel carries a separate media stream or control information.
The header contains a 4-bit multiplex code which is used to specify the particular pattern of logical channels carried in the M-PDU.
The multiplex table can be modified by the transmitter if necessary.
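A rough sketch of how an M-PDU might be assembled is given below. The header here is simplified to a single byte carrying the 4-bit multiplex code, and the multiplex table entries are invented for the example; the real H.223 header also carries error protection, and byte/bit stuffing of the payload is omitted:

FLAG = 0b01111110        # M-PDU delimiter

# Invented multiplex table: multiplex code -> ordered list of (logical channel, byte count)
MULTIPLEX_TABLE = {
    0: [("LCN1_audio", 20)],                      # audio only
    1: [("LCN1_audio", 20), ("LCN2_video", 60)],  # audio followed by video
}

def build_mpdu(mc, channel_data):
    """Assemble one illustrative M-PDU: flag, a 1-byte header holding the 4-bit
    multiplex code, then the logical-channel bytes in the pattern the code names."""
    header = mc & 0x0F
    payload = b""
    for lcn, nbytes in MULTIPLEX_TABLE[mc]:
        payload += channel_data[lcn][:nbytes]
    return bytes([FLAG, header]) + payload + bytes([FLAG])

pdu = build_mpdu(1, {"LCN1_audio": bytes(20), "LCN2_video": bytes(60)})
print(len(pdu))   # 1 + 1 + 20 + 60 + 1 = 83 bytes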
Adaptation
Additional bytes can be added by the transmitter for error detection/correction purposes.
The adaptation layer specified in H.223 standard supports 3 different schemes:
AL1: supports retransmission, for user data applications
AL2: retransmission is optional, for audio and video streams
AL3: supports retransmission, for video applications
All schemes support error detection.

Multipoint conferencing
The H.324 standard supports multipoint conferencing via an MCU.
MCU negotiates an agreed minimum bit rate with all the participants by the exchange of system control messages.
Internetworking between an H.324 terminal and an
H.320 terminal can be supported.
In such a case, transcoding for audio stream may be necessary

System control
The H.245 system control standard is concerned with the overall control of the end system.
Functions involved:
Exchange of messages for the negotiation of capabilities
Opening and closing of logical channels
Transmission of the contents of the multiplex table
Choice of adaptation layers
Packet-switched networks
Two alternative sets of protocols have been defined for providing interpersonal communication services over packet-switched
networks:
ITU recommendation H.323
IETF standards
H.323
Unlike the H.322 standard, which relates to LANs that offer a guaranteed bandwidth/QoS, the H.323 standard is intended for use
with LANs that provide a non-guaranteed QoS which, in practice, applies to the majority of LANs.
The standard comprises components for the packetization and synchronization of the audio and video streams, an admission
control procedure for end systems to join a conference, multipoint conference control, and interworking with terminals that are
connected to the different types of circuit-switched networks.
The standard is independent of the underlying transport and network interface protocols and hence can be used with any type of
LAN

Audio and video coding


Audio :
Options: G.711 or G.728 when work with H.320- compliant terminals
G.723.1 or G.729 when work with H.324-compliant terminals.

Video:
Options: either H.261 or H.263, negotiated prior to transmission
The compressed audio and video streams are formatted into packets for transfer over the network using the real-time transport
protocol (RTP).
RTP is for the transfer of real-time information.
At the head of each RTP packet, there is a format specification which defines how the packet contents are structured.
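For illustration, the 12-byte fixed header that starts every RTP packet can be built with a few lines of Python. The field layout (version, padding, extension, CSRC count, marker, payload type, sequence number, timestamp, SSRC) is the standard one; the payload type value and SSRC used below are arbitrary example numbers:

import struct

def rtp_header(seq, timestamp, ssrc, payload_type=96, marker=False):
    """Build the 12-byte fixed RTP header (version 2, no padding, extension or CSRC)."""
    byte0 = 2 << 6                                    # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)

hdr = rtp_header(seq=1, timestamp=90_000, ssrc=0x1234ABCD)
print(hdr.hex())   # 80 60 0001 00015f90 1234abcd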
A sending end system does the following with the real-time transport control protocol (RTCP):
Send information to allow the receiving end system to synchronize the audio and video streams
Send information such as the transmitted packet rate, the packet transmission delay, the percentage of packets lost/corrupted and
the interarrival jitter such that the corresponding end systems can use them to optimize the number and size of receiver buffers
and to determine if the retransmission of lost packets is feasible.

Call setup
LANs do not provide a guaranteed QoS and have no procedures to limit the number of calls/sessions that are using the LAN
concurrently.
In order to limit the number of concurrent calls that involve multimedia, a device called an H.323 gatekeeper can (optionally) be
used.
To set up a call or request additional bandwidth, each end system must first obtain permission from the gatekeeper
The messages exchanged with the gatekeeper concerned with the 2 end systems obtaining permission to set up a call are part of
the registration, admission and status (RAS) protocol.

Interworking
The H.323 standard also defines how interworking with end systems that are attached to a circuit-mode network is achieved
through an H.323 gateway.
The role of a gateway is to provide translations between the different procedures associated with each network type.
In order to minimize the amount of transcoding required in the gateway, the same audio and video codec standards are used
whenever possible.
A second function associated with a gateway relates to address translation.
Different types of network use different addressing schemes. (e.g. IP address in a LAN using the TCP/IP protocol set and
telephone number in a PSTN.)
The gatekeeper performs the necessary translation between the different address types during the call setup procedure
