Distributed Arithmetic and Offset Binary Coding
Distributed Arithmetic
Purpose
• Bit-level hardware acceleration for vector-vector multiplications.
Key Applications
• Convolution operations
• Discrete Cosine Transform (DCT) in video compression.
Core Idea
• Reorder the bit-level operations of the multiplications so that the arithmetic is "distributed" across the bits of the inputs rather than being performed word by word.
• Replace multipliers with ROMs and shift-accumulate units.
Conventional Distributed Arithmetic
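Consider an inner product between two length-N vectors C and X, where each $x_i$ is a W-bit two's-complement fraction with sign bit $x_{i,0}$ and lsb $x_{i,W-1}$:
$$Y = \mathbf{C}^T\mathbf{X} = \sum_{i=0}^{N-1} c_i x_i, \qquad x_i = -x_{i,0} + \sum_{j=1}^{W-1} x_{i,j}\,2^{-j}$$
Substituting for $x_i$ and exchanging the order of summation gives
$$Y = -\sum_{i=0}^{N-1} c_i x_{i,0} + \sum_{j=1}^{W-1}\Big(\sum_{i=0}^{N-1} c_i x_{i,j}\Big)2^{-j} = -C_0 + \sum_{j=1}^{W-1} C_j\,2^{-j}, \qquad C_j = \sum_{i=0}^{N-1} c_i x_{i,j}$$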
Since the term $C_j$ depends only on the bits $x_{i,j}$ and therefore has only $2^N$ possible values, it can be precomputed and stored in a read-only memory (ROM). An input set of N bits $(x_{0,j}, x_{1,j}, \ldots, x_{N-1,j})$ is used as an address to retrieve the corresponding $C_j$ value. These intermediate results are accumulated over W clock cycles to produce one Y value. This leads to a multiplier-free realization of vector multiplication.
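The ROM-plus-shift-accumulate scheme can be sketched as a small behavioral model in Python. This is a minimal sketch under stated assumptions, not a hardware description: the coefficients $c_i$ are assumed to be integers, each input is a W-bit two's-complement integer read as the fraction $x_i = \text{int}/2^{W-1}$, the address layout (with $x_{0,j}$ as the most significant address bit) is an assumption, and `fractions.Fraction` stands in for a fixed-point accumulator so that the right shifts are exact. The function names `build_rom`, `bit`, `da_inner_product` and `direct_inner_product` are hypothetical.

```python
from fractions import Fraction


def build_rom(c):
    """Precompute all 2^N sums C_j = sum_i c_i * x_{i,j}.
    Assumed layout: address bit (N-1-i) holds x_{i,j}, so x_{0,j} is the MSB."""
    n = len(c)
    return [sum(c[i] for i in range(n) if (addr >> (n - 1 - i)) & 1)
            for addr in range(1 << n)]


def bit(x, j, w):
    """Bit x_{i,j} of a W-bit two's-complement integer: j = 0 is the sign bit, j = W-1 the lsb."""
    return ((x & ((1 << w) - 1)) >> (w - 1 - j)) & 1


def da_inner_product(c, x, w):
    """Multiplier-free Y = -C_0 + sum_{j>=1} C_j 2^-j, processed lsb first."""
    n = len(c)
    rom = build_rom(c)
    acc = Fraction(0)
    for j in range(w - 1, 0, -1):              # j = W-1 (lsb) down to 1
        addr = 0
        for i in range(n):
            addr = (addr << 1) | bit(x[i], j, w)
        acc = (acc + rom[addr]) / 2            # add C_j, then shift right by one
    addr = 0
    for i in range(n):
        addr = (addr << 1) | bit(x[i], 0, w)
    return acc - rom[addr]                     # sign-bit cycle: subtract C_0


def direct_inner_product(c, x, w):
    """Reference result: sum_i c_i * x_i with x_i = int / 2^(W-1)."""
    return sum(ci * Fraction(xi, 1 << (w - 1)) for ci, xi in zip(c, x))


if __name__ == "__main__":
    c, x, w = [3, -5, 7, 2], [5, -8, 3, -1], 4
    print(da_inner_product(c, x, w), direct_inner_product(c, x, w))  # prints "37/4 37/4"
```

The loop performs W − 1 add-and-shift steps followed by one subtraction of $C_0$ for the sign bit, i.e., one ROM lookup per clock cycle for W cycles; the only arithmetic is addition, subtraction and shifting.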
Content of the ROM for $N = 4$
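The ROM contents follow directly from the definition $C_j = \sum_i c_i x_{i,j}$: each address selects the sum of the coefficients whose input bit is 1. The sketch below lists them symbolically for $N = 4$. The row order (with $x_{0,j}$ as the leftmost, most significant address bit) is an assumption, and the function name `rom_table` is hypothetical.

```python
def rom_table(n=4):
    """Symbolic ROM rows: (address bits x_{0,j}..x_{N-1,j}, stored sum C_j)."""
    rows = []
    for addr in range(1 << n):
        # Assumed layout: x_{0,j} is the most significant address bit.
        bits = [(addr >> (n - 1 - i)) & 1 for i in range(n)]
        terms = [f"c{i}" for i in range(n) if bits[i]]
        rows.append(("".join(map(str, bits)), "+".join(terms) or "0"))
    return rows


if __name__ == "__main__":
    for address, content in rom_table(4):
        print(address, content)  # first rows: "0000 0", "0001 c3", "0010 c2", "0011 c2+c3"
```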
Therefore,
Distributed Arithmetic with Offset Binary Coding
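Write each input as $x_i = \frac{1}{2}\left[x_i - (-x_i)\right]$, where in two's complement $-x_i = -\bar{x}_{i,0} + \sum_{j=1}^{W-1} \bar{x}_{i,j}\,2^{-j} + 2^{-(W-1)}$. Let
$$d_{i,j} = x_{i,j} - \bar{x}_{i,j} \ \ (j \neq 0) \quad \text{and} \quad d_{i,0} = -\left(x_{i,0} - \bar{x}_{i,0}\right), \qquad d_{i,j} \in \{-1, +1\},$$
so that $x_i = \frac{1}{2}\left[\sum_{j=0}^{W-1} d_{i,j}\,2^{-j} - 2^{-(W-1)}\right]$. Therefore, we get
$$Y = \sum_{j=0}^{W-1} D_j\,2^{-j} + D_{\text{extra}}\,2^{-(W-1)}, \qquad D_j = \frac{1}{2}\sum_{i=0}^{N-1} c_i d_{i,j}, \qquad D_{\text{extra}} = -\frac{1}{2}\sum_{i=0}^{N-1} c_i$$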
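As a numerical sanity check of the offset-binary form, the sketch below evaluates $Y = \sum_j D_j 2^{-j} + D_{\text{extra}} 2^{-(W-1)}$ directly from the $d_{i,j}$ values and compares it with the ordinary inner product. Assumptions: integer coefficients, W-bit two's-complement integer inputs read as fractions $\text{int}/2^{W-1}$, and exact `fractions.Fraction` arithmetic; the function names are hypothetical.

```python
from fractions import Fraction


def bit(x, j, w):
    """Bit x_{i,j} of a W-bit two's-complement integer: j = 0 is the sign bit, j = W-1 the lsb."""
    return ((x & ((1 << w) - 1)) >> (w - 1 - j)) & 1


def d_bit(x, j, w):
    """d_{i,j} in {-1, +1}: x_{i,j} minus its complement, with the sign flipped for j = 0."""
    b = bit(x, j, w)
    v = b - (1 - b)              # +1 if the bit is 1, -1 if it is 0
    return -v if j == 0 else v


def obc_inner_product(c, x, w):
    """Evaluate Y = sum_j D_j 2^-j + D_extra 2^-(W-1) exactly."""
    y = Fraction(0)
    for j in range(w):
        d_j = Fraction(sum(ci * d_bit(xi, j, w) for ci, xi in zip(c, x)), 2)  # D_j
        y += d_j / (1 << j)
    d_extra = -Fraction(sum(c), 2)                                            # D_extra
    return y + d_extra / (1 << (w - 1))


def direct_inner_product(c, x, w):
    """Reference result: sum_i c_i * x_i with x_i = int / 2^(W-1)."""
    return sum(ci * Fraction(xi, 1 << (w - 1)) for ci, xi in zip(c, x))


if __name__ == "__main__":
    c, x, w = [3, -5, 7, 2], [5, -8, 3, -1], 4
    print(obc_inner_product(c, x, w), direct_inner_product(c, x, w))  # prints "37/4 37/4"
```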
• The $D_j$ values exhibit a mirrored pattern: because every $d_{i,j} \in \{-1, +1\}$, complementing all N address bits negates $D_j$.
• This mirroring means that only $2^{N-1}$ of the $2^N$ values of $D_j$ need to be stored; the other half are their negatives.
• Consequently, the ROM size is halved, from $2^N$ to $2^{N-1}$ words.
• Computation typically begins with the least significant bit (lsb) of $x_i$, so that the accumulator can shift right after each addition.
• Logic gates (e.g., XOR gates combining one input bit with the other N−1 bits) are employed for address decoding.
• A multiplexer provides the initial value, $D_{\text{extra}}$, to the accumulator.
• Another multiplexer negates the ROM output when the addressed entry lies in the mirrored half of the table that is not stored.
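The half-size ROM architecture can be modeled behaviorally as follows. This sketch rests on several assumptions that the text does not spell out: the ROM stores $D_j$ for $x_{0,j} = 0$ (using $d = 2x - 1$); the $(N-1)$-bit address is formed by XORing $x_{0,j}$ with the other bits; the output is negated when $x_{0,j} = 1$ (mirrored half) or in the sign-bit cycle $j = 0$, where $d_{i,0}$ carries the opposite sign; coefficients are integers; and `fractions.Fraction` replaces a fixed-point accumulator. Function names are hypothetical.

```python
from fractions import Fraction


def bit(x, j, w):
    """Bit x_{i,j} of a W-bit two's-complement integer: j = 0 is the sign bit, j = W-1 the lsb."""
    return ((x & ((1 << w) - 1)) >> (w - 1 - j)) & 1


def build_half_rom(c):
    """2^(N-1) words: D value for x_{0,j} = 0 and x_{i,j} = a_i (i >= 1), with d = 2x - 1."""
    n = len(c)
    rom = []
    for a in range(1 << (n - 1)):
        total = -c[0]                              # x_{0,j} = 0 contributes d = -1
        for i in range(1, n):
            a_i = (a >> (n - 1 - i)) & 1           # assumed layout: a_1 is the address MSB
            total += c[i] * (2 * a_i - 1)
        rom.append(Fraction(total, 2))
    return rom


def obc_da_inner_product(c, x, w):
    """Half-ROM offset-binary DA, processed lsb first."""
    n = len(c)
    rom = build_half_rom(c)
    acc = -Fraction(sum(c), 2)                     # first multiplexer: initial value D_extra
    for j in range(w - 1, -1, -1):                 # j = W-1 (lsb) down to 0 (sign bit)
        x0 = bit(x[0], j, w)
        addr = 0
        for i in range(1, n):
            addr = (addr << 1) | (bit(x[i], j, w) ^ x0)   # XOR address decoding
        value = rom[addr]
        if x0 ^ (j == 0):                          # mirrored half, or the sign-bit cycle
            value = -value                         # second multiplexer: negate ROM output
        acc += value
        if j > 0:
            acc /= 2                               # shift right, except after the sign bit
    return acc


def direct_inner_product(c, x, w):
    """Reference result: sum_i c_i * x_i with x_i = int / 2^(W-1)."""
    return sum(ci * Fraction(xi, 1 << (w - 1)) for ci, xi in zip(c, x))


if __name__ == "__main__":
    c, x, w = [3, -5, 7, 2], [5, -8, 3, -1], 4
    print(len(build_half_rom(c)), obc_da_inner_product(c, x, w))  # prints "8 37/4"
```

For the example above the model reaches the same result as the direct inner product while using an 8-word ROM, half of the 16 words that conventional DA needs for $N = 4$.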