
Module 2

2 Internal CPU organisation and implementation

Module outline
2.1 Overview of CPU and its central role in a computer system
2.2 Components of Internal CPU Organization
2.3 Comparison of Complex Instruction Set Computing (CISC) and Reduced Instruction Set Computing
(RISC) architectures
2.4 CPU Pipelining

Introduction
This module explores the internal organization and implementation of the CPU, the architecture at the heart of
every computing system. The opening section, "Overview of CPU and its Central Role in a Computer System,"
examines the pivotal role the Central Processing Unit (CPU) plays in executing instructions and in the overall
functioning of a computer. The focus then shifts to the "Components of Internal CPU Organization," covering
core elements such as the Arithmetic and Logic Unit (ALU), the Control Unit, and the registers, and explaining
their functions and their collaborative role in information processing. Next comes a critical comparison of
Complex Instruction Set Computing (CISC) and Reduced Instruction Set Computing (RISC) architectures,
clarifying the design principles that influence CPU organization and performance. Finally, attention turns to
CPU pipelining, examining the benefits and challenges of this technique for streamlining instruction execution.
Working through these topics, students will gain not only a fundamental understanding of the CPU's internal
organization but also insight into the strategic choices and trade-offs inherent in implementing efficient,
high-performance computing architectures.

Module learning outcomes


Upon completing this module, you should be able to:

MLO 1. Explain the fundamental role of the CPU in a computer system.

MLO 2. Analyze the components of internal CPU organization to articulate the functions of the ALU, Control
Unit, and registers.

MLO 3. Evaluate the differences between Complex Instruction Set Computing (CISC) and Reduced Instruction
Set Computing (RISC) architectures, demonstrating a critical understanding of their implications for CPU
organization and overall system performance.

MLO 4. Analyze the benefits and challenges associated with CPU pipelining.

2.1 Overview of CPU and its central role in a computer system
The central processing unit (CPU), also called a central processor or main processor, is the most important
processor in a given computer. Its electronic circuitry executes the instructions of a computer program, carrying
out arithmetic, logic, control, and input/output (I/O) operations. This role contrasts with that of external
components, such as main memory and I/O circuitry, and specialized coprocessors such as graphics processing
units (GPUs).

The form, design, and implementation of CPUs have changed over time, but their fundamental operation
remains almost unchanged. Principal components of a CPU include the arithmetic–logic unit (ALU) that performs
arithmetic and logic operations, processor registers that supply operands to the ALU and store the results of
ALU operations, and a control unit that orchestrates the fetching (from memory), decoding and execution (of
instructions) by directing the coordinated operations of the ALU, registers, and other components. Modern CPUs
devote a lot of semiconductor area to caches, instruction-level parallelism and privileged modes to support
operating systems and virtualization.

Most modern CPUs are implemented on integrated circuit (IC) microprocessors, with one or more CPUs on a
single IC chip. Microprocessor chips with multiple CPUs are multi-core processors. The individual physical
CPUs, processor cores, can also be multithreaded to support CPU-level multithreading. An IC that contains a
CPU may also contain memory, peripheral interfaces, and other components of a computer; such integrated
devices are variously called microcontrollers or systems on a chip (SoC).

The CPU, being the core of a computer, plays a critical role in this process. One can't build or understand
computer systems without a thorough knowledge of the CPU and its components. A well-designed CPU ensures
that a computer operates efficiently and effectively, while poor design can significantly impact the system's
performance.

The Control Unit (CU), a vital component of the CPU, plays a critical role in the processing and execution of
program instructions within the computer system. The CU is responsible for various essential functions,
including the following:

i. Instruction Fetching: The CU fetches program instructions from main memory and stores them in an
Instruction Register (IR) within the CPU.
ii. Instruction Decoding: Each instruction comprises an operation code (opcode) and operand(s). The
CU analyses and decodes the opcode to determine the specific operation required and identifies the
corresponding operand(s).
iii. Instruction Execution: The CU directs the appropriate CPU component(s) to execute the instruction,
either through the Arithmetic Logic Unit (ALU) for mathematical and logical operations or through
other components for specific tasks, such as memory access or input/output operations.
iv. Sequencing and Control: The CU ensures that instructions are executed in the correct sequence and
manages control signals among various CPU components, maintaining proper synchronisation and
communication between them.
v. Error Handling: The CU is also involved in error handling, ensuring that instructions are executed
correctly and identifying any issues that may arise during processing.
The CU, as the central command centre of the CPU, relies on a series of control signals, micro-operations,
and a control store to ensure the precise execution of these functions. It is this meticulous orchestration that
enables the CPU to process instructions swiftly and efficiently.

2.1.1 CPU operation


The fundamental operation of most CPUs, regardless of the physical form they take, is to execute a sequence
of stored instructions that is called a program. The instructions to be executed are kept in some kind of computer
memory. Nearly all CPUs follow the fetch, decode and execute steps in their operation, which are collectively
known as the instruction cycle.
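The fetch, decode, and execute steps above can be sketched as a minimal interpreter loop for a hypothetical toy machine. The opcodes, the accumulator register, and the `run` function below are all illustrative assumptions, not any real instruction set:

```python
# Minimal sketch of the fetch-decode-execute cycle for a hypothetical
# toy machine whose instructions are (opcode, operand) pairs in memory.
def run(program):
    memory = list(program)   # the program is kept in memory
    acc = 0                  # accumulator register
    pc = 0                   # program counter
    while pc < len(memory):
        opcode, operand = memory[pc]   # fetch the instruction at PC
        pc += 1                        # increment PC past the instruction
        if opcode == "LOAD":           # decode and execute
            acc = operand
        elif opcode == "ADD":
            acc += operand
        elif opcode == "JUMP":         # a jump modifies the PC directly
            pc = operand
        elif opcode == "HALT":
            break
    return acc

result = run([("LOAD", 5), ("ADD", 7), ("HALT", 0)])
print(result)  # 12
```

Note how the `JUMP` case overwrites the program counter, exactly the behaviour described in the next paragraph for jump instructions.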

After the execution of an instruction, the entire process repeats, with the next instruction cycle normally fetching
the next-in-sequence instruction because of the incremented value in the program counter. If a jump instruction
was executed, the program counter will be modified to contain the address of the instruction that was jumped to
and program execution continues normally. In more complex CPUs, multiple instructions can be fetched,
decoded and executed simultaneously. This section describes what is generally referred to as the "classic RISC
pipeline", which is quite common among the simple CPUs used in many electronic devices (often called
microcontrollers). It largely ignores the important role of CPU cache, and therefore the access stage of the
pipeline.

Some instructions manipulate the program counter rather than producing result data directly; such instructions
are generally called "jumps" and facilitate program behavior like loops, conditional program execution (through
the use of a conditional jump), and existence of functions. In some processors, some other instructions change
the state of bits in a "flags" register. These flags can be used to influence how a program behaves, since they
often indicate the outcome of various operations. For example, in such processors a "compare" instruction
evaluates two values and sets or clears bits in the flags register to indicate which one is greater or whether they
are equal; one of these flags could then be used by a later jump instruction to determine program flow.

Fetch
Fetch involves retrieving an instruction (which is represented by a number or sequence of numbers) from
program memory. The instruction's location (address) in program memory is determined by the program counter
(PC; called the "instruction pointer" in Intel x86 microprocessors), which stores a number that identifies the
address of the next instruction to be fetched. After an instruction is fetched, the PC is incremented by the length
of the instruction so that it will contain the address of the next instruction in the sequence. Often, the instruction
to be fetched must be retrieved from relatively slow memory, causing the CPU to stall while waiting for the
instruction to be returned. This issue is largely addressed in modern processors by caches and pipeline
architectures (see below).

Decode
The instruction that the CPU fetches from memory determines what the CPU will do. In the decode step,
performed by binary decoder circuitry known as the instruction decoder, the instruction is converted into signals
that control other parts of the CPU.

The way in which the instruction is interpreted is defined by the CPU's instruction set architecture (ISA). Often,
one group of bits (that is, a "field") within the instruction, called the opcode, indicates which operation is to be
performed, while the remaining fields usually provide supplemental information required for the operation, such
as the operands. Those operands may be specified as a constant value (called an immediate value), or as the
location of a value that may be a processor register or a memory address, as determined by some addressing
mode. In some CPU designs the instruction decoder is implemented as a hardwired, unchangeable binary
decoder circuit. In others, a microprogram is used to translate instructions into sets of CPU configuration signals
that are applied sequentially over multiple clock pulses. In some cases the memory that stores the microprogram
is rewritable, making it possible to change the way in which the CPU decodes instructions.

Execute
After the fetch and decode steps, the execute step is performed. Depending on the CPU architecture, this may
consist of a single action or a sequence of actions. During each action, control signals electrically enable or
disable various parts of the CPU so they can perform all or part of the desired operation. The action is then
completed, typically in response to a clock pulse. Very often the results are written to an internal CPU register
for quick access by subsequent instructions. In other cases results may be written to slower, but less expensive
and higher capacity main memory.

For example, if an instruction that performs addition is to be executed, registers containing operands (numbers
to be summed) are activated, as are the parts of the arithmetic logic unit (ALU) that perform addition. When the
clock pulse occurs, the operands flow from the source registers into the ALU, and the sum appears at its output.
On subsequent clock pulses, other components are enabled (and disabled) to move the output (the sum of the
operation) to storage (e.g., a register or memory). If the resulting sum is too large (i.e., it is larger than the ALU's
output word size), an arithmetic overflow flag will be set, influencing the next operation.
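The addition example above, including the flag set on overflow, can be modelled as a sketch for a hypothetical 8-bit ALU (the word size and function name are assumptions for illustration):

```python
# Sketch of an 8-bit ALU addition that sets an overflow flag when the
# true sum does not fit in the ALU's output word size.
WORD_BITS = 8
WORD_MASK = (1 << WORD_BITS) - 1   # 0xFF for an 8-bit word

def alu_add(a, b):
    raw = a + b
    result = raw & WORD_MASK       # sum truncated to the word size
    overflow = raw > WORD_MASK     # flag set if the sum did not fit
    return result, overflow

print(alu_add(100, 27))    # (127, False): the sum fits in 8 bits
print(alu_add(200, 100))   # (44, True): 300 wraps around, flag is set
```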

2.2 Components of Internal CPU Organization


The general components upon which the central processing unit is built include:

2.2.1 Bus
A bus is a bundle of wires grouped together to serve a single purpose. The main purpose of the bus is to
transfer data from one device to another. The processor's interface to the bus includes connections used to
pass data, connections to represent the address in which the processor is interested, and control lines
to manage and synchronize the transaction. The three major buses are Data, Address and Control buses.
There are internal buses that the processor uses to move data, instructions, configuration, and status
between its subsystems.

The Data Bus provides a path for moving data among system modules. The data bus may consist of 32,
64, 128, or even more separate lines, the number of lines being referred to as the width of the data bus.
Because each line can carry only 1 bit at a time, the number of lines determines how many bits can be
transferred at a time. The width of the data bus is a key factor in determining overall system performance.
A narrower bus width means that it will take more time to communicate a quantity of data as compared to a
wider bus. For example, if the data bus is 32 bits wide and each instruction is 64 bits long, then the
processor must access the memory module twice during each instruction cycle.

The Address Bus is used to designate the source or destination of the data on the data bus. For example,
if the processor wishes to read a word (8, 16, or 32 bits) of data from memory, it puts the address of the
desired word on the address lines. Clearly, the width of the address bus determines the maximum possible
memory capacity of the system. Address space refers to the maximum amount of memory and I/O that a
microprocessor can directly address.

If a microprocessor has a 16-bit address bus, it can address up to 2^16 = 65,536 bytes. Therefore, it has a
64 KB address space, since:

1 byte = 8 bits
1,024 bytes = 1 KB
65,536 bytes = 64 KB
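The address-space arithmetic in the 16-bit example can be checked with a few lines (the helper function is illustrative):

```python
# Address-space size for a given address-bus width: a bus of n lines
# can select 2**n distinct addresses, one byte each in this example.
def address_space_bytes(bus_width_bits):
    return 2 ** bus_width_bits

print(address_space_bytes(16))          # 65536 bytes
print(address_space_bytes(16) // 1024)  # 64, i.e. a 64 KB address space
print(address_space_bytes(32))          # 4294967296 bytes (4 GB)
```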
Furthermore, the address lines are generally also used to address I/O ports. Note that the address bus is
unidirectional (the microprocessor asserts requested addresses to the various devices), and the data bus is
bidirectional (the microprocessor asserts data on a write and the devices assert data on reads).

The Control Bus is used to control the access to and the use of the data and address lines. Because the
data and address lines are shared by all components, there must be a means of controlling their use.
Control signals transmit both command and timing information among system modules. Timing signals
indicate the validity of data and address information. Command signals specify operations to be performed.
Typical control lines include:

Memory write: Causes data on the bus to be written into the addressed location

Memory read: Causes data from the addressed location to be placed on the bus

I/O write: Causes data on the bus to be output to the addressed I/O port

I/O read: Causes data from the addressed I/O port to be placed on the bus

Transfer ACK: Indicates that data have been accepted from or placed on the bus

Bus request: Indicates that a module needs to gain control of the bus

Bus grant: Indicates that a requesting module has been granted control of the bus

Interrupt request: Indicates that an interrupt is pending

Interrupt ACK: Acknowledges that the pending interrupt has been recognized

Clock: Is used to synchronize operations

Reset: Initializes all modules

2.2.2 Registers
Registers are temporary storage locations in the CPU. A register stores a binary value using a group of
latches. Although variables and pointers used in a program are all stored in memory, they are moved to
registers during periods in which they are the focus of operation. This is so that they can be manipulated
quickly. Once the processor shifts its focus, it stores the values it doesn't need any longer back in memory.
Registers may be used for several operations. Discussion on types and usage of registers will follow in
Module III of this document.

2.2.3 Buffers
A processor does not operate in isolation. Typically there are multiple processors supporting the operation
of the main processor. These include video processors, the keyboard and mouse interface processor, and the
processors providing data from hard drives and CDROMs. There are also processors to control
communication interfaces such as USB, and Ethernet networks. These processors all operate
independently, and therefore one may finish an operation before a second processor is ready to receive
the results.

If one processor is faster than another, or if one processor is tied up with a process prohibiting it from receiving
data from a second process, then there needs to be a mechanism in place so that data is not lost. This
mechanism takes the form of a block of memory that can hold data until it is ready to be picked up. This block
of memory is called a buffer. Figure 2 presents the basic block diagram of a system that incorporates a
buffer. Instead of processor A passing data directly to processor B, processor A stores data into the buffer
(a memory queue) as needed, and processor B reads data from the buffer when it is ready; the effects of
unbalanced throughput are thus eased by the buffer.

Figure 2: Block Diagram of a System Incorporating a Buffer

The concept of buffers is presented here because the internal structure of a processor often relies on buffers
to store data while waiting for an external device to become available.
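The producer/consumer behaviour of the buffer in Figure 2 can be sketched with a simple FIFO queue. The function names and the data values are illustrative assumptions:

```python
from collections import deque

# Sketch of Figure 2: processor A deposits results into the buffer at
# its own rate; processor B picks them up whenever it becomes ready.
buffer = deque()

def processor_a_store(data):
    buffer.append(data)        # A writes without waiting for B

def processor_b_read():
    if buffer:                 # B reads when it is ready
        return buffer.popleft()
    return None                # nothing pending

# A finishes three results before B is ready to receive any of them.
for value in (10, 20, 30):
    processor_a_store(value)

print(processor_b_read())  # 10 -- no data lost despite the speed mismatch
```

A first-in-first-out queue preserves the order in which results were produced, which is why buffers between processors are often called memory queues.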

2.2.4 The Stack


During the course of normal operation, there will be a number of times when the processor needs to use a temporary
memory, a place where it can store a number for a while until it is ready to use it again. For example, every processor
has a finite number of registers. If an application needs more registers than are available, the register values that are not
needed immediately can be stored in this temporary memory. When a processor needs to jump to a subroutine or
function, it needs to remember the instruction it jumped from so that it can pick back up where it left off when the
subroutine is completed. The return address is stored in this temporary memory. The stack is a block of memory locations
reserved to function as temporary memory. It operates much like the stack of plates at the start of a restaurant buffet
line. When a plate is put on top of an existing stack of plates, the plate that was on top is now hidden, one position lower
in the stack. It is not accessible until the top plate is removed. There are two main operations that the processor can
perform on the stack: it can either store the value of a register to the top of the stack or remove the top piece of data
from the stack and place it in a register. Storing data to the stack is referred to as "pushing" while removing the top piece
of data is called "popping". The LIFO nature of the stack makes it so that applications must remove data items in the
opposite order from which they were placed on the stack. For example, assume that a processor needs to store values
from registers A, B, and C onto the stack. If it pushes register A first, B second, and C last, then to restore the registers
it must pull in order C, then B, then A.

Assume registers A, B, and C of a processor contain 25, 83, and 74 respectively. If the processor pushes them onto the
stack in the order A, then B, then C then pulls them off the stack in the order B, then A, then C, what values do the
registers contain afterwards? The solution is explained as follows. First, let's see what the stack looks like after the
values from registers A, B, and C have been pushed. The data from register A is pushed first placing it at the bottom
of the stack of three data items. B is pushed next followed by C which sits at the top of the stack. In the stack, there is
no reference identifying which register each piece of data came from.
When the values are pulled from the stack, B is pulled first and it receives the value from the top of the stack,
i.e., 74. Next, A is pulled. Since the 74 was removed and placed in B, A gets the next piece of data, 83.
Last, 25 is placed in register C.
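The worked example above can be verified with a short script; a Python list models the LIFO stack directly:

```python
# Verify the stack example: push A, B, C (25, 83, 74), then pop in the
# order B, A, C, and observe which values the registers end up holding.
A, B, C = 25, 83, 74
stack = []
stack.append(A)   # push A -- ends up at the bottom of the stack
stack.append(B)   # push B
stack.append(C)   # push C -- top of the stack

B = stack.pop()   # top of stack (74) goes into B
A = stack.pop()   # next item (83) goes into A
C = stack.pop()   # last item (25) goes into C

print(A, B, C)    # 83 74 25, matching the text's solution
```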

2.2.5 I/O Ports


Input/output ports or I/O ports refer to any connections that exist between the processor and its external
devices. A USB printer or scanner, for example, is connected to the computer system through an I/O port.
The computer can issue commands and send data to be printed through this port or receive the device's status
or scanned images. Some I/O devices are connected directly to the memory bus and act just like memory
devices. Sending data to the port is done by storing data to a memory address and retrieving data from the
port is done by reading from a memory address. If the device is incorporated into the processor, then
communication with the port is done by reading and writing to registers. This is sometimes the case for simple
serial and parallel interfaces such as a printer port or keyboard and mouse interface.

2.3 Comparison of Complex Instruction Set Computing (CISC) and Reduced Instruction Set
Computing (RISC) architectures
One of the key features used to categorize a microprocessor is whether it supports reduced instruction set
computing (RISC) or complex instruction set computing (CISC). The distinction lies in how complex the
individual instructions are and in how many variants exist for the same basic instruction. In practical terms, this distinction
directly relates to the complexity of a microprocessor’s instruction decoding logic; a more complex instruction
set requires more complex decoding logic.

RISC stands for Reduced Instruction Set Computer. If the control unit contains separate electronic circuitry
for each instruction, producing all the necessary signals directly, this approach to the design of the control
section of the processor is called RISC design. It is also called the hard-wired approach.

Examples of RISC processors:

IBM RS6000, MC88100, DEC’s Alpha 21064, 21164 and 21264 processors

Features of RISC Processors: The standard features of RISC processors are listed below:

i. RISC processors use a small and limited number of instructions.
ii. RISC machines mostly use a hardwired control unit.
iii. RISC processors consume less power and deliver high performance.
iv. Each instruction is very simple and consistent.
v. RISC processors use simple addressing modes.
vi. RISC instructions are of uniform, fixed length.
CISC stands for Complex Instruction Set Computer. If the control unit contains a number of micro-electronic
circuits that generate sets of control signals, with each circuit activated by a microcode, this design approach
is called CISC design.

Examples of CISC processors are:

Intel 386, 486, Pentium, Pentium Pro, Pentium II, Pentium III, Motorola’s 68000, 68020, 68040, etc.

Features of CISC Processors: The standard features of CISC processors are listed below:

i. CISC chips have a large number of different and complex instructions.
ii. CISC machines generally make use of complex addressing modes.
iii. Different machine programs can be executed on a CISC machine.
iv. CISC machines use a micro-programmed control unit.
v. CISC processors have a limited number of registers.

2.3.1 Differences between CISC and RISC


In CISC, instructions and addressing modes are complex, and hence the instruction decode logic is complex.
In RISC, the decode logic is simple, since there are few instructions to decode and operand handling is
straightforward.

CISC processors are complex, which makes it increasingly difficult to sustain a high clock rate, because
complex computations must complete within a single clock period. RISC processors use simple,
single-operation instructions, which keeps the work per clock period small.

A single CISC instruction may embed many operations, e.g. fetch, add, increment, and store all in one
instruction, while RISC has a separate instruction for each operation. This reduces complexity and speeds up
the instructions that are used most frequently.

Not all instructions in a CISC microprocessor are used with the same frequency; only a core set is called most
of the time. RISC removes the instructions that are not frequently used so as to simplify the microprocessor's
control logic. The system can therefore perform faster, with faster execution of programs, improved throughput
for the commonly used instructions, and better overall performance.

In CISC, the instructions that are used less often impose a burden on the entire system, because they increase
the permutations that the decode logic must handle in a given clock cycle. RISC reduces the permutations of
the decode logic, since the instruction set is smaller and only a few memory read/write operations occur.
IFT212 – Computer Architecture and Organization

2.4 CPU Pipelining


Microprocessor designers, in an attempt to squeeze every last bit of performance from their
designs, try to make sure that every circuit of the CPU is doing something productive at all
times. The most common application of this practice applies to the execution of instructions.
It is based on the fact that there are steps to the execution of an instruction, each of which
uses entirely different components of the CPU.

Assuming that the execution of a machine code instruction can be broken into three stages:

i. Fetch – get the next instruction to execute from its location in memory.

ii. Decode – determine which circuits to energize in order to execute the fetched instruction.

iii. Execute – use the ALU and the processor to memory interface to execute the instruction.

By comparing the definitions of the different components of the CPU shown with the needs of
these three different stages or cycles, it can be seen that three different circuits are used for
these three tasks.

i. The internal data bus and the instruction pointer perform the fetch.

ii. The instruction decoder performs the decode cycle.

iii. The ALU and CPU registers are responsible for the execute cycle.

Once the logic that controls the internal data bus is done fetching the current instruction, what's
to keep it from fetching the next instruction? It may have to guess what the next instruction is,
but if it guesses right, then a new instruction will be available to the instruction decoder
immediately after it finishes decoding the previous one. Once the instruction decoder has
finished telling the ALU what to do to execute the current instruction, what's to keep it from
decoding the next instruction while it's waiting for the ALU to finish? If the internal data bus
logic guessed right about what the next instruction is, then the ALU won't have to wait for a
fetch and subsequent decode in order to execute the next instruction.

This process of creating a queue of fetched, decoded, and executed instructions is called
pipelining, and it is a common method for improving the performance of a processor. A fast
processor can therefore be built by increasing the rate at which instructions are executed,
which is achieved by increasing the number of instructions that can be in progress
simultaneously. Some CPUs break the fetch-decode-execute cycle down into smaller steps,
some of which can be performed in parallel. This overlapping speeds up execution: the CPU
fetches and executes simultaneously.

This method, used by all current CPUs, is known as pipelining. It is a process whereby the
CPU fetches and executes at the same time, achieved by splitting the microprocessor into
two parts: (1) the bus interface unit (BIU) and (2) the execution unit (EU). It is a way of
improving the processing power of the CPU. The BIU accesses memory and peripherals,
while the EU executes instructions. The idea of pipelining is to have more than one
instruction being processed by the processor at the same time.


Figure 3 shows the time-line sequence of the execution of five instructions on a non-pipelined
processor. Notice how a full fetch-decode-execute cycle must be performed on instruction 1
before instruction 2 can be fetched. This sequential execution of instructions allows for a very
simple CPU hardware, but it leaves each portion of the CPU idle for 2 out of every 3 cycles.
During the fetch cycle, the instruction decoder and ALU are idle; during the decode cycle, the
bus interface and the ALU are idle; and during the execute cycle, the bus interface and the
instruction decoder are idle.

Figure 4 on the other hand shows the time-line sequence for the execution of five instructions
using a pipelined processor. Once the bus interface has fetched instruction 1 and passed it to
the instruction decoder for decoding, it can begin its fetch of instruction 2. Notice that the first
cycle in the figure only has the fetch operation. The second cycle has both the fetch and the
decode cycle happening at the same time. By the third cycle, all three operations are
happening in parallel.

Figure 3: Non-Pipelined Execution of Five Instructions

Figure 4: Pipelined Execution of Five Instructions

Without pipelining, five instructions take 15 cycles to execute. In a pipelined architecture, those
same five instructions take only 7 cycles, a saving of over 50%. In general, the number of
cycles it takes for a non-pipelined architecture that uses three cycles per instruction is equal
to three times the number of instructions.

Num. of cycles (non-pipelined) = 3 × number of instructions …(1)


For the pipelined architecture, it takes two cycles to "fill the pipe" so that all three CPU
components are fully occupied. Once this occurs, then an instruction is executed once every
cycle. Therefore, the formula used to determine the number of cycles used by a pipelined
processor to execute a specific number of instructions is:

Num. of cycles (pipelined) = 2 + number of instructions …(2)

As the number of instructions grows, the number of cycles required of a pipelined architecture
approaches 1/3 that of the non-pipelined.
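Equations (1) and (2) are easy to tabulate with a short script (the function names are illustrative):

```python
# Cycle counts from equations (1) and (2) for a three-stage pipeline.
def cycles_non_pipelined(n):
    return 3 * n    # three cycles per instruction, no overlap

def cycles_pipelined(n):
    return 2 + n    # two cycles to fill the pipe, then one per instruction

for n in (5, 50, 1000):
    print(n, cycles_non_pipelined(n), cycles_pipelined(n))
# As n grows, the pipelined count approaches one third of the
# non-pipelined count: e.g. for n = 1000, 1002 cycles versus 3000.
```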

Example

Compare the number of cycles required to execute 50 instructions between a non-pipelined


processor and a pipelined processor.

Solution

Using equations (1) and (2), we can determine the number of cycles necessary for both the
non-pipelined and the pipelined CPUs.

number of cycles (non-pipelined) = 3 × 50 = 150 cycles
number of cycles (pipelined) = 2 + 50 = 52 cycles

By taking the difference, we see that the pipelined architecture will execute 50 instructions in
98 fewer cycles.
