
AES Algorithm and Implementation

This document describes and compares different architectures for implementing the Advanced Encryption Standard (AES) algorithm. It discusses compact iterative and fully pipelined implementations. A fully pipelined architecture runs the key expansion in parallel with the round transformations to enable unique keys per data block with no penalty to throughput. It achieves this by dividing key expansion into four incremental blocks that generate each word of the expanded key over four clock cycles synchronized with the round transformations.

Uploaded by

Sowmya Madhavan

Prepared By

Sowmya Madhavan
Associate Professor, Dept. of ECE
NMIT, Bangalore
• The objective of this chapter is to describe a number of AES architectures
and to analyze the various trade-offs relative to performance versus area.

• AES is a symmetric, secret-key cipher that maps a 128-bit block of
plaintext data to a 128-bit block of ciphertext.

• The key length is 128, 192, or 256 bits and determines the level of
security (longer key = larger key space = more security).

• The transformations in the AES algorithm consist of four components
organized as distinct modules: Sub Bytes (bit mapping), Shift Rows
(swapping), Mix-Column [transformation over GF(2^8)], and Add Round
Key [addition of round key with bitwise operations in the field GF(2)].
• These transformations make up a “round,” and the number of rounds is
determined by the key size (128 bits, 10 rounds; 192 bits, 12 rounds;
256 bits, 14 rounds).

• The round key for each round is unique. These round keys are derived
from the original key through the key expansion. The key expansion is
one of the architectural focal points.
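
The key-size-to-round-count relationship listed above can be sketched as a small lookup. This is a minimal illustration; the function names are hypothetical, and the closed form (key size in 32-bit words plus 6) is taken from FIPS 197 rather than from these slides.

```python
# Round count per key size, as listed above (128 -> 10, 192 -> 12, 256 -> 14).
ROUNDS_BY_KEY_BITS = {128: 10, 192: 12, 256: 14}


def num_rounds(key_bits: int) -> int:
    """Return the number of AES rounds for a supported key size."""
    if key_bits not in ROUNDS_BY_KEY_BITS:
        raise ValueError(f"unsupported AES key size: {key_bits}")
    return ROUNDS_BY_KEY_BITS[key_bits]


def num_rounds_formula(key_bits: int) -> int:
    """Equivalent closed form: (key size in 32-bit words) + 6."""
    return key_bits // 32 + 6
```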
• The key expansion, which runs parallel to the data path, takes the cipher
key and creates a unique key for each transformation round.

• Let a word = 32 bits and Nk = Keysize/Wordsize (128, 192, or 256 divided
by 32, giving Nk = 4, 6, or 8).

• The first Nk words of the expanded key are filled with the cipher key.

• Every subsequent 32-bit word in the expanded key is the XOR
(Exclusive-OR) of the previous 32-bit word and the 32-bit word Nk
words previous to the current word.

• For words whose position is a multiple of Nk, the previous word undergoes
a transformation prior to the XOR operation, followed by an XOR with a
round constant.
• The transformation consists of a cyclic permutation, followed by an 8-bit
mapping (the S-box) applied to each of the four bytes in the 32-bit word.

• The round constant is defined by FIPS 197 as the values given by
[x^(i-1), {00}, {00}, {00}], with x^(i-1) being powers of x, where x is
denoted as {02} in the field GF(2^8).
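
The key expansion steps above can be sketched in software as follows. This is a minimal reference model, not the hardware described in these slides: the function names are hypothetical, the S-box is computed by brute force instead of being stored in a ROM, and the extra substitution step for 256-bit keys comes from FIPS 197 and is not mentioned in the bullets.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def build_sbox() -> list:
    """Build the 256-entry S-box: field inverse followed by the affine map."""
    sbox = []
    for value in range(256):
        # Brute-force multiplicative inverse; 0 has none and maps to 0.
        inv = next((c for c in range(1, 256) if gf_mul(value, c) == 1), 0)
        out = 0x63
        for shift in range(5):  # inv XOR its left rotations by 1..4 bits
            out ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(out)
    return sbox


SBOX = build_sbox()


def sub_word(word: int) -> int:
    """Apply the S-box to each of the four bytes of a 32-bit word."""
    return int.from_bytes(bytes(SBOX[b] for b in word.to_bytes(4, "big")), "big")


def expand_key(key: bytes) -> list:
    """Return the expanded key as a list of 32-bit words."""
    nk = len(key) // 4
    if len(key) % 4 or nk not in (4, 6, 8):
        raise ValueError("key must be 16, 24, or 32 bytes")
    total_words = 4 * (nk + 6 + 1)  # four words per round key, Nr + 1 keys
    words = [int.from_bytes(key[4 * i:4 * i + 4], "big") for i in range(nk)]
    rcon = 0x01  # x^0; multiplied by x ({02}) after each use
    for i in range(nk, total_words):
        temp = words[i - 1]
        if i % nk == 0:
            temp = ((temp << 8) | (temp >> 24)) & 0xFFFFFFFF  # cyclic permutation
            temp = sub_word(temp) ^ (rcon << 24)              # S-box, then Rcon
            rcon = gf_mul(rcon, 0x02)
        elif nk == 8 and i % nk == 4:
            temp = sub_word(temp)  # FIPS 197 extra step for 256-bit keys
        words.append(words[i - nk] ^ temp)
    return words
```

For a 128-bit key this yields 44 words, that is, the cipher key followed by ten round keys of four words each.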
• Sub-bytes is implemented as a look-up table due to the iterative nature of
the algorithms implemented by it as well as the relatively small map space.

• Thus, an 8-bit to 8-bit mapping would be efficiently implemented as a
synchronous 8 × 256 (2^8) ROM with a single pipeline stage.
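
As a rough behavioural model of the look-up approach above, the sketch below fills a 256-entry table and reads it through a one-cycle registered output. The class name `SyncRom`, its `clock` method, and the reset value of 0 are assumptions made for illustration; the table contents are computed here by brute force, whereas a real design would typically store them as constants.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def build_sbox() -> list:
    """Compute the 256 S-box entries (field inverse, then affine map)."""
    table = []
    for value in range(256):
        inv = next((c for c in range(1, 256) if gf_mul(value, c) == 1), 0)
        out = 0x63
        for shift in range(5):
            out ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        table.append(out)
    return table


class SyncRom:
    """Behavioural model of a ROM with a registered (one-cycle) output."""

    def __init__(self, contents):
        self._contents = list(contents)
        self._out = 0  # assumed reset value of the output register

    def clock(self, address: int) -> int:
        """Simulate one clock cycle.

        Returns the data visible during this cycle (looked up from the
        address given on the previous call) and registers the new lookup.
        """
        visible = self._out
        self._out = self._contents[address]
        return visible


sbox_rom = SyncRom(build_sbox())
```

Each call to `clock` returns the result for the address presented one call earlier, which mirrors the single pipeline stage mentioned above.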
• The Shift Rows stage simply rearranges the bytes within the rows of the
data block, so no logic is used here.

• Thus, another pipeline stage at this point would create an imbalance of
logic around the pipeline stages and thus decrease maximum frequency
and total throughput.
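
To illustrate why this stage needs no logic, Shift Rows can be written as a fixed index permutation, which in hardware amounts to wiring only. The sketch assumes the usual FIPS 197 layout in which the 16-byte state is stored column by column; the names are hypothetical.

```python
# State layout assumption: byte index = row + 4 * column (column-major).
# Row r is rotated left by r positions, so output (r, c) takes input (r, c + r).
SHIFT_ROWS_MAP = [r + 4 * ((c + r) % 4) for c in range(4) for r in range(4)]


def shift_rows(state):
    """Return the state with each row cyclically shifted; pure re-indexing."""
    return [state[i] for i in SHIFT_ROWS_MAP]
```

Because the mapping is constant, no gates are needed to implement it, which is consistent with the point that a pipeline register here would unbalance the stages.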
• The Mix-Column stage has the most logic out of all four Round stages and
is thus the best place to add the additional pipeline stage.
• Mix-Column uses a module called Map-Column as a building block.

• Map-Column uses a block called Poly-Mult X2 (polynomial ×2
multiplier) as a building block.
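
The hierarchy above (Mix-Column built from Map-Column, which is built from Poly-Mult X2) can be sketched as three small functions. The function names mirror the block names but are otherwise hypothetical, and the split assumed here, with Map-Column handling one 4-byte column, is an interpretation of the slides. The coefficients {02}, {03}, {01}, {01} are those of FIPS 197.

```python
def poly_mult_x2(byte: int) -> int:
    """Multiply one byte by x ({02}) in GF(2^8), reducing modulo 0x11B."""
    byte <<= 1
    if byte & 0x100:
        byte ^= 0x11B
    return byte


def map_column(column):
    """Transform one 4-byte column using only x2 multiplies and XORs."""
    a0, a1, a2, a3 = column

    def x3(b):
        # {03} * b = ({02} * b) XOR b
        return poly_mult_x2(b) ^ b

    return [
        poly_mult_x2(a0) ^ x3(a1) ^ a2 ^ a3,
        a0 ^ poly_mult_x2(a1) ^ x3(a2) ^ a3,
        a0 ^ a1 ^ poly_mult_x2(a2) ^ x3(a3),
        x3(a0) ^ a1 ^ a2 ^ poly_mult_x2(a3),
    ]


def mix_columns(state):
    """Apply map_column to each of the four columns of a 16-byte state."""
    out = []
    for c in range(4):
        out.extend(map_column(state[4 * c:4 * c + 4]))
    return out
```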
• The first implementation under consideration is a compact implementation
designed to iteratively reuse logic resources.

• Initially, the incoming data and key are added together in the Initial Round
module, and the result is registered before entering the encryption loop.

• The data is then applied to the Sub Bytes, Shift Rows, Mix-Column, and
Add Round Key modules in the specified order.

• At the end of each round, the new data is registered. These operations are
repeated according to the number of rounds.
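
The control flow of the iterative loop described above can be sketched as follows. This is a structural sketch only, not the code under Figure 4.7: the three transformations are passed in as hypothetical callables so the loop stays self-contained, and the omission of Mix-Column in the final round follows FIPS 197 rather than these slides.

```python
from typing import Callable, List, Sequence

State = List[int]
Transform = Callable[[State], State]


def add_round_key(state: Sequence[int], round_key: Sequence[int]) -> State:
    """XOR the state with the round key, byte by byte."""
    return [s ^ k for s, k in zip(state, round_key)]


def encrypt_iterative(
    block: Sequence[int],
    round_keys: Sequence[Sequence[int]],
    sub_bytes: Transform,
    shift_rows: Transform,
    mix_columns: Transform,
) -> State:
    """Reuse one set of round logic for every round, as a compact core would."""
    num_rounds = len(round_keys) - 1
    # Initial Round: data and key are added, then "registered" in `state`.
    state = add_round_key(block, round_keys[0])
    for rnd in range(1, num_rounds + 1):
        state = sub_bytes(state)
        state = shift_rows(state)
        if rnd != num_rounds:  # FIPS 197: the final round skips Mix-Column
            state = mix_columns(state)
        # End of round: the new data replaces the registered state.
        state = add_round_key(state, round_keys[rnd])
    return state
```

Each pass through the loop body corresponds to one trip through the shared hardware, so the same logic is reused for every round.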
Compact Implementation
(See Code under Figure 4.7)
• An AES round is completed in 11 to 14 clock cycles depending on the
key size.

• There will be multiple instantiations of the data path core which can be
used to create a pipelined design.

• Key expansion is performed in a static fashion (static pipelining)


Refer to the Pipelining document.
• A problem arises in these architectures if one were to introduce
new keys at a rate faster than the encryption speed.

• The surrounding system would have to be smart enough to wait for the
pipe to empty before introducing the new data block along with the new
key.

• This information has to be fed back to the outside system that is
providing the information and the corresponding keys so that they can be
buffered and held appropriately.

• In the worst case where a new key is required for every block of
data, the pipelined architecture would have a throughput equivalent to that
of the iterative architecture and would be a massive waste of space.
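
The worst-case argument above can be made concrete with a simple throughput model. This is a simplified sketch under stated assumptions: one 128-bit block enters the pipeline per clock when the key is unchanged, the pipe must drain completely before a new key is introduced, and the time for the static key expansion itself is ignored. The function and parameter names are hypothetical, as are any clock rates passed in.

```python
BLOCK_BITS = 128


def throughput_bps(clock_hz: float, cycles_per_block: float) -> float:
    """Bits per second when one block completes every `cycles_per_block` clocks."""
    return BLOCK_BITS * clock_hz / cycles_per_block


def iterative_throughput_bps(clock_hz: float, cycles_per_encryption: int) -> float:
    """Iterative core: a new block starts only after the previous one finishes."""
    return throughput_bps(clock_hz, cycles_per_encryption)


def pipelined_throughput_bps(
    clock_hz: float, pipeline_latency_cycles: int, new_key_every_block: bool
) -> float:
    """Pipelined core with static key expansion (simplified model)."""
    # With a fixed key, one block leaves the pipe per clock. With a new key
    # per block, the pipe is drained each time, so the full latency is paid.
    cycles = pipeline_latency_cycles if new_key_every_block else 1
    return throughput_bps(clock_hz, cycles)
```

Under this model, a pipelined core whose latency equals the iterative core's cycle count has exactly the iterative throughput when every block carries a new key, while occupying far more area.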
• The term fully pipelined refers to an architecture for the key expansion
that runs in parallel to the round transformation pipeline, where
corresponding stages in the pipeline provide each other with the exact
information at just the right time.

• In other words, the round key for any particular stage and any particular
data block is valid for only one clock cycle and is used by the
corresponding round at that time. This occurs in parallel for every pipeline
stage.

• Thus, a unique key may be used for potentially every block of data, with
no penalization in terms of latency or wait states.

• The maximum throughput of the round transformation pipeline is
always achieved independent of the topology of the key set.
• A single iteration through the Key Expansion function (expansion of
four 32-bit words of the key) happens fully synchronously with the
round preceding the round in which the generated key is used.

• Also, the Key Expansion block must have a clock latency equal to
that of the Round block, typically 1–4 clocks.

• For the round key at any arbitrary key expansion block to arrive at its
corresponding Round block appropriately, and on potentially every
clock pulse, the timing must be very precise.

• Specifically, each key expansion block must generate a round key in
exactly the same number of clock cycles that the Round block can
generate its corresponding data. Also, the latency must be such that
each key is valid when presented to the add-round-key sub-block.
• To handle these requirements, each key expansion block is divided into
four incremental expansion blocks.

• Each incremental expansion block generates a single word
(128/4 = 32 bits) for the key. This is as shown in the figure below.
• As mentioned above, each Key-Exp1 block generates a
single word (32 bits) of the expanded key. The stages at which
pipelining is added are shown in the figure below.

• The S-box can be implemented as a synchronous 8 × 256 ROM, and to
preserve latency timing accuracy, a pipeline stage must also be added
to the R-CON calculation.
• To ensure that the latency timing between the key pipeline and the
data propagation pipeline is accurate, the keys must be generated one
clock cycle earlier than the round data is completed. This is because the
round key is necessary for the add-round-key block that clocks the XOR
operation into its final register.

• Clock 4 of the key expansion block must be synchronous with clock 3
of the corresponding Round block. This is handled by the initial key
addition at the beginning of the key expansion process. This is shown in
the figure below.
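
The division of one key expansion block into four incremental steps, each producing one 32-bit word, can be modelled in software as below. This is a sketch for 128-bit keys only and does not model the clock-level timing or the pipeline registers discussed above. The function names are hypothetical, and the S-box is computed by brute force instead of being read from a ROM.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def build_sbox() -> list:
    """Compute the 256 S-box entries (field inverse, then affine map)."""
    table = []
    for value in range(256):
        inv = next((c for c in range(1, 256) if gf_mul(value, c) == 1), 0)
        out = 0x63
        for shift in range(5):
            out ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        table.append(out)
    return table


SBOX = build_sbox()


def first_word_step(prev_first: int, prev_last: int, rcon: int) -> int:
    """Step 1: rotate, S-box, and Rcon applied to the last word of the old key."""
    rotated = ((prev_last << 8) | (prev_last >> 24)) & 0xFFFFFFFF
    substituted = int.from_bytes(
        bytes(SBOX[b] for b in rotated.to_bytes(4, "big")), "big"
    )
    return prev_first ^ substituted ^ (rcon << 24)


def key_expansion_block(prev_key, rcon: int) -> list:
    """Produce the next round key one word per incremental step."""
    new_words = []
    for step in range(4):
        if step == 0:
            word = first_word_step(prev_key[0], prev_key[3], rcon)
        else:
            # Steps 2-4: old word in this position XOR the word just produced.
            word = prev_key[step] ^ new_words[step - 1]
        new_words.append(word)
    return new_words
```

Each step depends only on the previous round key and the word produced one step earlier, which is what allows the four steps to be spread across successive clock cycles.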
• LUTs: This represents the logic utilization that the AES core
consumes inside the FPGA.

• ASIC gates: This is the number of logic gates that is
consumed by the AES core in an ASIC.

• Best possible throughput: This is the maximum number of
data bits that can be processed per second in the best-case
scenario. “Best case” refers to the situation where there is the
least amount of penalty delay due to expanding new keys.

• Worst-case throughput: “Worst case” here refers to the
situation where there is the greatest amount of penalty delay
due to expanding new keys. This situation arises when every
data block has a unique key.
