Ministry of Higher Education and Scientific Research
University
College
Student name:
University:
Department:
Stage:
Section:
CPU
A central processing unit (CPU) is the electronic circuitry within
a computer that carries out the instructions of a computer program by
performing the basic arithmetic, logical, control and input/output (I/O)
operations specified by the instructions. The computer industry has used the
term "central processing unit" at least since the early 1960s. Traditionally,
the term "CPU" refers to a processor, more specifically to its processing unit
and control unit (CU), distinguishing these core elements of a computer
from external components such as main memory and I/O circuitry.
The form, design, and implementation of CPUs have changed over the
course of their history, but their fundamental operation remains almost
unchanged. Principal components of a CPU include the arithmetic logic
unit (ALU) that performs arithmetic and logic operations, processor
registers that supply operands to the ALU and store the results of ALU
operations, and a control unit that orchestrates the fetching (from memory)
and execution of instructions by directing the coordinated operations of the
ALU, registers and other components.
Most modern CPUs are microprocessors, meaning they are contained on a
single integrated circuit (IC) chip. An IC that contains a CPU may also contain
memory, peripheral interfaces, and other components of a computer; such
integrated devices are variously called microcontrollers or systems on a
chip (SoC). Some computers employ a multi-core processor, which is a single
chip containing two or more CPUs called "cores"; in that context, one can
speak of such single chips as "sockets". Array processors or vector
processors have multiple processors that operate in parallel, with no unit
considered central. There also exists the concept of virtual CPUs, which are
an abstraction of dynamically aggregated computational resources.
Transistor CPUs
The design complexity of CPUs increased as various technologies facilitated
building smaller and more reliable electronic devices. The first such
improvement came with the advent of the transistor. Transistorized CPUs
during the 1950s and 1960s no longer had to be built out of bulky,
unreliable, and fragile switching elements like vacuum tubes and relays.
With this improvement more complex and reliable CPUs were built onto one
or several printed circuit boards containing discrete (individual) components.
In 1964, IBM introduced its IBM System/360 computer architecture that was
used in a series of computers capable of running the same programs with
different speed and performance. This was significant at a time when most
electronic computers were incompatible with one another, even those made
by the same manufacturer. To facilitate this improvement, IBM utilized the
concept of a microprogram (often called "microcode"), which still sees
widespread usage in modern CPUs. The System/360 architecture was so
popular that it dominated the mainframe computer market for decades and
left a legacy that is still continued by similar modern computers like the
IBM zSeries. In 1965, Digital Equipment Corporation (DEC) introduced
another influential computer aimed at the scientific and research markets,
the PDP-8.
Fujitsu board with SPARC64 VIIIfx processors
Transistor-based computers had several distinct advantages over their
predecessors. Aside from facilitating increased reliability and lower power
consumption, transistors also allowed CPUs to operate at much higher
speeds because of the short switching time of a transistor in comparison to
a tube or relay. Thanks to the increased reliability and dramatically
increased speed of the switching elements (which were almost exclusively
transistors by this time), CPU clock rates in the tens of megahertz were
easily obtained during this period. Additionally, while discrete transistor
and IC CPUs were in heavy
usage, new high-performance designs like SIMD (Single Instruction Multiple
Data) vector processors began to appear. These early experimental designs
later gave rise to the era of specialized supercomputers like those made
by Cray Inc. and Fujitsu Ltd.
Small-scale integration CPUs
CPU, core memory, and external bus interface of a DEC PDP-8/I, made of medium-scale integrated circuits
During this period, a method of manufacturing many interconnected
transistors in a compact space was developed. The integrated circuit (IC)
allowed a large number of transistors to be manufactured on a
single semiconductor-based die, or "chip". At first, only very basic non-
specialized digital circuits such as NOR gates were miniaturized into ICs.
CPUs based on these "building block" ICs are generally referred to as "small-
scale integration" (SSI) devices. SSI ICs, such as the ones used in the Apollo
guidance computer, usually contained up to a few dozen transistors. To
build an entire CPU out of SSI ICs required thousands of individual chips, but
still consumed much less space and power than earlier discrete transistor
designs.
IBM's System/370, follow-on to the System/360, used SSI ICs rather
than Solid Logic Technology discrete-transistor modules. DEC's PDP-8/I and
KI10 PDP-10 also switched from the individual transistors used by the PDP-8
and PDP-10 to SSI ICs, and their extremely popular PDP-11 line was originally
built with SSI ICs but was eventually implemented with LSI components once
these became practical.
Large-scale integration CPUs
Lee Boysel published influential articles, including a 1967 "manifesto", which
described how to build the equivalent of a 32-bit mainframe computer from
a relatively small number of large-scale integration circuits (LSI). At the time,
the only way to build LSI chips, which are chips with a hundred or more
gates, was to build them using a MOS process (i.e., PMOS logic, NMOS logic,
or CMOS logic). However, some companies continued to build processors
out of bipolar chips because bipolar junction transistors were so much faster
than MOS chips; for example, Datapoint built processors out of transistor–
transistor logic (TTL) chips until the early 1980s. At the time, MOS ICs were
so slow that they were considered useful only in a few niche applications
that required low power.
As microelectronic technology advanced, an increasing number of
transistors were placed on ICs, decreasing the number of individual ICs
needed for a complete CPU. MSI and LSI ICs increased transistor counts to
hundreds, and then thousands. By 1968, the number of ICs required to build
a complete CPU had been reduced to 24 ICs of eight different types, with
each IC containing roughly 1000 MOSFETs. In stark contrast with its SSI and
MSI predecessors, the first LSI implementation of the PDP-11 contained a
CPU composed of only four LSI integrated circuits.
Microprocessors
Since the introduction of the first commercially available microprocessor,
the Intel 4004 in 1971, and the first widely used microprocessor, the Intel
8080 in 1974, this class of CPUs has almost completely overtaken all other
central processing unit implementation methods. Mainframe and
minicomputer manufacturers of the time launched proprietary IC
development programs to upgrade their older computer architectures, and
eventually produced instruction-set-compatible microprocessors that were
backward-compatible with their older hardware and software. Combined
with the advent and eventual success of the ubiquitous personal computer,
the term CPU is now applied almost exclusively to microprocessors. Several
CPUs (denoted cores) can be combined in a single processing chip.
Operation
The fundamental operation of most CPUs, regardless of the physical form
they take, is to execute a sequence of stored instructions that is called a
program. The instructions to be executed are kept in some kind of computer
memory. Nearly all CPUs follow the fetch, decode and execute steps in their
operation, which are collectively known as the instruction cycle.
After the execution of an instruction, the entire process repeats, with the
next instruction cycle normally fetching the next-in-sequence instruction
because of the incremented value in the program counter. If a jump
instruction was executed, the program counter will be modified to contain
the address of the instruction that was jumped to and program execution
continues normally. In more complex CPUs, multiple instructions can be
fetched, decoded, and executed simultaneously. This section describes what
is generally referred to as the "classic RISC pipeline", which is quite common
among the simple CPUs used in many electronic devices (often called
microcontrollers). It largely ignores the important role of CPU cache, and
therefore the access stage of the pipeline.
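The fetch, decode and execute steps described above can be sketched as a toy interpreter. The instruction set below (LOAD, ADD, STORE, JUMP, HALT) and the accumulator-style design are hypothetical, chosen purely for illustration, not drawn from any real ISA:

```python
# A toy machine illustrating the fetch-decode-execute cycle.
# The instruction encoding here is hypothetical, not a real ISA.
def run(program, memory):
    pc = 0          # program counter: address of the next instruction
    acc = 0         # a single accumulator register
    while pc < len(program):
        # Fetch: read the instruction at the address held in the PC.
        opcode, operand = program[pc]
        pc += 1     # now points at the next-in-sequence instruction
        # Decode and execute.
        if opcode == "LOAD":
            acc = memory[operand]
        elif opcode == "ADD":
            acc += memory[operand]
        elif opcode == "STORE":
            memory[operand] = acc
        elif opcode == "JUMP":
            pc = operand      # a jump overrides the incremented PC
        elif opcode == "HALT":
            break
    return memory

mem = {0: 2, 1: 3, 2: 0}
run([("LOAD", 0), ("ADD", 1), ("STORE", 2), ("HALT", None)], mem)
# mem[2] now holds 5
```

Note how JUMP simply replaces the program counter, which is exactly the mechanism the next paragraph describes for loops and conditional execution.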
Some instructions manipulate the program counter rather than producing
result data directly; such instructions are generally called "jumps" and
facilitate program behavior like loops, conditional program execution
(through the use of a conditional jump), and the existence of functions. In some
processors, some other instructions change the state of bits in a "flags"
register. These flags can be used to influence how a program behaves, since
they often indicate the outcome of various operations. For example, in such
processors a "compare" instruction evaluates two values and sets or clears
bits in the flags register to indicate which one is greater or whether they are
equal; one of these flags could then be used by a later jump instruction to
determine program flow.
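The compare-then-conditional-jump pattern can be sketched as follows; the two flag bits chosen here (zero and negative) are a simplified, hypothetical flags register, not any particular processor's:

```python
# Hypothetical "compare" semantics: evaluate a - b without storing the
# result, setting flag bits that a later conditional jump can test.
def compare(a, b):
    flags = {
        "zero": a == b,       # set when the two values are equal
        "negative": a < b,    # set when a - b would be negative
    }
    return flags

flags = compare(3, 7)
# A later "jump if negative" instruction would take the branch here,
# because 3 < 7 sets the negative flag.
take_branch = flags["negative"]
```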
Fetch
The first step, fetch, involves retrieving an instruction (which is represented
by a number or sequence of numbers) from program memory. The
instruction's location (address) in program memory is determined by a
program counter (PC), which stores a number that identifies the address of
the next instruction to be fetched. After an instruction is fetched, the PC is
incremented by the length of the instruction so that it will contain the
address of the next instruction in the sequence. Often, the instruction to be
fetched must be retrieved from relatively slow memory, causing the CPU to
stall while waiting for the instruction to be returned. This issue is largely
addressed in modern processors by caches and pipeline architectures (see
below).
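The fetch step, including incrementing the PC by the instruction's length, can be sketched for a machine with variable-length instructions. The opcodes and byte lengths below are invented for illustration:

```python
# Fetch sketch for a machine with variable-length instructions.
# Opcodes 0x01/0x02 and their byte lengths are hypothetical.
def fetch(memory, pc, length_of):
    opcode = memory[pc]
    length = length_of[opcode]            # instruction length in bytes
    instruction = memory[pc:pc + length]  # retrieve the full instruction
    pc += length        # PC now addresses the next instruction in sequence
    return instruction, pc

mem = [0x01, 0x0A, 0x02, 0x0B, 0x0C]
lengths = {0x01: 2, 0x02: 3}
inst, pc = fetch(mem, 0, lengths)   # inst = [0x01, 0x0A], pc = 2
```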
Decode
The instruction that the CPU fetches from memory determines what the
CPU will do. In the decode step, performed by the circuitry known as
the instruction decoder, the instruction is converted into signals that control
other parts of the CPU.
The way in which the instruction is interpreted is defined by the CPU's
instruction set architecture (ISA). Often, one group of bits (that is, a "field")
within the instruction, called the opcode, indicates which operation is to be
performed, while the remaining fields usually provide supplemental
information required for the operation, such as the operands. Those
operands may be specified as a constant value (called an immediate value),
or as the location of a value that may be a processor register or a memory
address, as determined by some addressing mode.
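Extracting the opcode and operand fields during decode amounts to masking and shifting bits. The 16-bit layout below is a hypothetical format invented for illustration, not a real instruction set:

```python
# Decoding a hypothetical 16-bit instruction word:
#   bits 12-15: opcode          bits 8-11: destination register
#   bits 4-7:   source register bits 0-3:  immediate value
def decode(word):
    return {
        "opcode": (word >> 12) & 0xF,  # which operation to perform
        "rd":     (word >> 8) & 0xF,   # destination register number
        "rs":     (word >> 4) & 0xF,   # source register number
        "imm":    word & 0xF,          # immediate (constant) operand
    }

fields = decode(0x1234)
# opcode=1, rd=2, rs=3, imm=4
```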
In some CPU designs the instruction decoder is implemented as a hardwired,
unchangeable circuit. In others, a microprogram is used to translate
instructions into sets of CPU configuration signals that are applied
sequentially over multiple clock pulses. In some cases the memory that
stores the microprogram is rewritable, making it possible to change the way
in which the CPU decodes instructions.
Execute
After the fetch and decode steps, the execute step is performed. Depending
on the CPU architecture, this may consist of a single action or a sequence of
actions. During each action, various parts of the CPU are electrically
connected so they can perform all or part of the desired operation and then
the action is completed, typically in response to a clock pulse. Very often the
results are written to an internal CPU register for quick access by subsequent
instructions. In other cases results may be written to slower, but less
expensive and higher capacity main memory.
For example, if an addition instruction is to be executed, the arithmetic logic
unit (ALU) inputs are connected to a pair of operand sources (numbers to be
summed), the ALU is configured to perform an addition operation so that
the sum of its operand inputs will appear at its output, and the ALU output is
connected to storage (e.g., a register or memory) that will receive the sum.
When the clock pulse occurs, the sum will be transferred to storage and, if
the resulting sum is too large (i.e., it is larger than the ALU's output word
size), an arithmetic overflow flag will be set.
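This addition example, including the overflow condition, can be sketched for a hypothetical 8-bit ALU (the word size is chosen only for illustration):

```python
# 8-bit ALU addition sketch: the sum is truncated to the output word
# size, and an overflow (carry-out) flag records the lost bit.
WORD_MASK = 0xFF  # 8-bit output word, i.e. values 0..255

def alu_add(a, b):
    full = a + b
    result = full & WORD_MASK     # keep only the bits that fit the word
    overflow = full > WORD_MASK   # set when the true sum is too large
    return result, overflow

alu_add(200, 100)   # 300 does not fit in 8 bits: wraps to 44, flag set
```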