Distributed Arithmetic and Offset Binary Coding

Uploaded by

akshat28 mittal
Distributed Arithmetic

Purpose
• Bit-level hardware acceleration for vector-vector multiplications.
Key Applications
• Convolution operations
• Discrete Cosine Transform (DCT) in video compression.
Core Idea
• Reorder and mix multiplications to "distribute" arithmetic operations.
• Replace multipliers with ROMs and shift-accumulate units.
Conventional Distributed Arithmetic
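Consider the inner product of two length-N vectors C and X:

$$Y = C \cdot X = \sum_{i=0}^{N-1} c_i x_i$$

where the $c_i$ are M-bit constants and the $x_i$ are W-bit 2's complement numbers, written in fractional form as

$$x_i = -x_{i,W-1} + \sum_{j=1}^{W-1} x_{i,W-1-j}\,2^{-j}$$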


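If we define $C_{W-1-j}$ as

$$C_{W-1-j} = \sum_{i=0}^{N-1} c_i\,x_{i,W-1-j} \quad (j \neq 0), \qquad C_{W-1} = -\sum_{i=0}^{N-1} c_i\,x_{i,W-1},$$

then the output Y can be written as

$$Y = \sum_{j=0}^{W-1} C_{W-1-j}\,2^{-j}$$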
Since the term $C_j$ depends only on the bits $x_{i,j}$ and has only $2^N$ possible values, these values can be precomputed and stored in a read-only memory (ROM). An input set of N bits $(x_{0,j}, x_{1,j}, \ldots, x_{N-1,j})$ is used as an address to retrieve the corresponding $C_j$ value. These intermediate results are accumulated over W clock cycles to produce one Y value. This leads to a multiplier-free realization of vector multiplication.
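The ROM lookup and accumulation described above can be sketched in Python. This is a minimal illustration, not the hardware itself: the function names `build_rom` and `da_inner_product` are hypothetical, the inputs are treated as W-bit two's complement integers (the fractional value is the integer divided by $2^{W-1}$), and each ROM word is weighted by $2^j$ directly instead of right-shifting the accumulator.

```python
def build_rom(c):
    """ROM word at address a = sum of c[i] over every set bit i of a (2^N words)."""
    n = len(c)
    return [sum(c[i] for i in range(n) if (a >> i) & 1) for a in range(1 << n)]


def da_inner_product(c, x, w):
    """Bit-serial distributed arithmetic, LSB first.

    c: integer constants, x: W-bit two's complement integers (assumption
    of this sketch), w: word length W.
    """
    rom = build_rom(c)
    mask = (1 << w) - 1
    acc = 0
    for j in range(w):
        # Bit j of every input word forms the ROM address.
        addr = 0
        for i, xi in enumerate(x):
            addr |= (((xi & mask) >> j) & 1) << i
        term = rom[addr]
        if j == w - 1:
            term = -term  # S = 1: the sign-bit term is subtracted
        acc += term * (1 << j)
    return acc


c = [3, -5, 7, 2]      # example constants, chosen arbitrarily
x = [5, -3, -8, 7]     # 4-bit two's complement inputs
print(da_inner_product(c, x, 4), sum(ci * xi for ci, xi in zip(c, x)))
```

Only table lookups, additions and shifts appear in the loop; the multiplications by $c_i$ are absorbed into the precomputed ROM contents.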
Content of the ROM for N = 4
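As an illustration of how such a ROM is filled, the sketch below enumerates the 16 addresses for N = 4 together with the symbolic sum stored at each one. The helper name `rom_table` and the address ordering (the bit of $x_0$ listed first) are assumptions made for this example.

```python
def rom_table(n):
    """Return (address bits, symbolic content) rows for an N-input DA ROM."""
    rows = []
    for a in range(1 << n):
        # The address string lists the bit of x0 first and x(N-1) last.
        bits = format(a, f"0{n}b")
        terms = [f"c{i}" for i, b in enumerate(bits) if b == "1"]
        rows.append((bits, " + ".join(terms) if terms else "0"))
    return rows


for bits, content in rom_table(4):
    print(bits, content)
```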
Architecture for computing the inner product of two length-N vectors using distributed arithmetic
• The shift-accumulator is a bit-parallel carry-propagate adder that adds the ROM content to the previous accumulated result.
• The inverter and the MUX invert the output of the ROM in order to compute $C_{W-1}$; the control signal S is 1 when j = W−1 and 0 otherwise.
• The computation runs from j = 0 to j = W−1, and the result is available in bit-parallel format after W clock cycles.
• This approach corresponds to bit-serial distributed arithmetic.
• More than one bit can be processed per clock cycle. For example, if J consecutive bits are processed in a single clock cycle using J ROMs, then the input words are processed in W/J clock cycles.
• In that case, a multi-input shift-accumulator adds the contents of the J ROMs and the previous accumulated result, and generates the output in bit-parallel format.
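The J-bits-per-cycle variant can be sketched the same way. The function name `da_digit_serial` is hypothetical, W is assumed to be a multiple of J, and the J ROMs are modelled as J lookups into one shared table, which is a software simplification.

```python
def da_digit_serial(c, x, w, j_bits):
    """Process j_bits consecutive bits per cycle; returns (result, cycles)."""
    if w % j_bits:
        raise ValueError("this sketch assumes W is a multiple of J")
    n = len(c)
    rom = [sum(c[i] for i in range(n) if (a >> i) & 1) for a in range(1 << n)]
    mask = (1 << w) - 1
    acc, cycles = 0, 0
    for base in range(0, w, j_bits):       # one iteration = one clock cycle
        partial = 0
        for k in range(j_bits):            # the J ROMs read in the same cycle
            j = base + k
            addr = sum((((xi & mask) >> j) & 1) << i for i, xi in enumerate(x))
            term = rom[addr]
            if j == w - 1:
                term = -term               # sign-bit term is subtracted
            partial += term * (1 << k)
        acc += partial * (1 << base)       # multi-input shift-accumulate
        cycles += 1
    return acc, cycles


print(da_digit_serial([3, -5, 7, 2], [5, -3, -8, 7], 4, 2))
```

With W = 4 and J = 2 the same inner product is produced in 2 cycles instead of 4, at the cost of two ROM reads and a wider adder per cycle.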
Distributed Arithmetic with Offset Binary Coding
Offset binary coding (OBC) reduces the ROM size by a factor of 2, from $2^N$ to $2^{N-1}$ words.
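Let $d_{i,j} \in \{-1, 1\}$ be defined as follows, where $\bar{x}_{i,j}$ is the complement of bit $x_{i,j}$:

$$d_{i,j} = \begin{cases} x_{i,j} - \bar{x}_{i,j}, & j \neq W-1 \\ -(x_{i,W-1} - \bar{x}_{i,W-1}), & j = W-1 \end{cases}$$

Therefore, using $x_i = \tfrac{1}{2}\,[x_i - (-x_i)]$,

$$x_i = \frac{1}{2}\left[\sum_{j=0}^{W-1} d_{i,W-1-j}\,2^{-j} - 2^{-(W-1)}\right]$$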
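Let

$$D_j = \sum_{i=0}^{N-1} \frac{1}{2}\,c_i\,d_{i,j} \quad (0 \le j \le W-1) \qquad \text{and} \qquad D_{\text{extra}} = -\frac{1}{2}\sum_{i=0}^{N-1} c_i$$

Therefore, we get

$$Y = \sum_{j=0}^{W-1} D_{W-1-j}\,2^{-j} + D_{\text{extra}}\,2^{-(W-1)}$$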
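The offset binary form of a single word can be checked numerically. The sketch below assumes the fractional two's complement convention (a W-bit integer pattern X represents the value $X / 2^{W-1}$) and takes $d_{i,j} = 2x_{i,j} - 1$ for every bit except the sign bit, whose digit is negated. The helper names `obc_digits` and `obc_value` are hypothetical.

```python
from fractions import Fraction


def obc_digits(x_int, w):
    """Digits d[j] in {-1, +1} for a W-bit two's complement integer pattern."""
    mask = (1 << w) - 1
    bits = [((x_int & mask) >> j) & 1 for j in range(w)]
    d = [2 * b - 1 for b in bits]   # bit minus its complement
    d[w - 1] = -d[w - 1]            # the sign-bit digit carries the opposite sign
    return d


def obc_value(d, w):
    """Evaluate (1/2) * [ sum_j d[W-1-j] * 2^-j  -  2^-(W-1) ] exactly."""
    s = sum(Fraction(d[w - 1 - j], 2 ** j) for j in range(w))
    return (s - Fraction(1, 2 ** (w - 1))) / 2


W = 4
for x_int in range(-8, 8):
    assert obc_value(obc_digits(x_int, W), W) == Fraction(x_int, 2 ** (W - 1))
print("offset binary form matches two's complement for all 4-bit words")
```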
• The $D_j$ values exhibit a mirrored pattern: complementing every address bit $x_{i,j}$ negates $D_j$.
• Because of this mirroring, only $2^{N-1}$ of the values selected by the $x_{i,j}$ bits need to be stored; the other half are their negatives.
• Consequently, the memory (ROM) required can be reduced by half.
• Computation typically begins with the least significant bit (LSB) of $x_i$.
• Logic gates are employed for address decoding within the system.
• A multiplexer provides an initial value to the accumulator.
• Another multiplexer is used to invert the output of the ROM under a specific condition.
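A minimal Python sketch of the half-size ROM scheme follows. Several details are assumptions of this example, not taken from the slides: the ROM is addressed by the bits of $x_1, \ldots, x_{N-1}$ XORed with the bit of $x_0$, the ROM word is negated when the bit of $x_0$ XOR S is 1, the stored words are doubled so that everything stays in integers, and the accumulator starts at minus the sum of the constants. The function names are hypothetical.

```python
def build_obc_rom(c):
    """2^(N-1) words, each holding sum_i c[i]*d[i] with the x0 bit taken as 0."""
    n = len(c)
    rom = []
    for a in range(1 << (n - 1)):
        word = -c[0]                        # x0 bit = 0 gives d0 = -1
        for i in range(1, n):
            word += c[i] if (a >> (i - 1)) & 1 else -c[i]
        rom.append(word)
    return rom


def obc_inner_product(c, x, w):
    """Bit-serial DA with offset binary coding on W-bit two's complement integers."""
    rom = build_obc_rom(c)
    mask = (1 << w) - 1
    acc = -sum(c)                           # initial accumulator value (doubled)
    for j in range(w):
        bits = [((xi & mask) >> j) & 1 for xi in x]
        addr = 0
        for i in range(1, len(x)):
            addr |= (bits[i] ^ bits[0]) << (i - 1)   # XOR address decoding
        term = rom[addr]
        s = 1 if j == w - 1 else 0          # S = 1 on the sign bit
        if bits[0] ^ s:
            term = -term                    # mirrored half, or sign-bit cycle
        acc += term * (1 << j)
    return acc // 2                         # undo the doubling (acc is even)


c = [3, -5, 7, 2]      # example constants, chosen arbitrarily
x = [5, -3, -8, 7]     # 4-bit two's complement inputs
print(len(build_obc_rom(c)), obc_inner_product(c, x, 4))
```

For N = 4 the table has 8 words instead of 16; the missing half is recovered by negating the stored word whenever the bit of $x_0$ is 1.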

Content of the ROM with OBC (N = 4)

Content of the ROM (reduced size) with OBC (N = 4)

Architecture for computing the inner product of two length-N vectors using distributed arithmetic with OBC
