
Rashtreeya Sikshana Samithi Trust's
Rashtreeya Vidyalaya Institute of Technology and Management (RVITM), Bengaluru

DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING

NOTES: Module 2

SUBJECT: Computer Organization & ARM Microcontrollers
SUB CODE: 21EC52
SEMESTER: V
Module 2 Memory system

Module 2                                                              RBT Level

Memory System: Basic Concepts, Semiconductor RAM Memories, Read Only          L1, L2, L3
Memories, Speed, Size, and Cost, Cache Memories – Mapping Functions,
Replacement Algorithms, Performance Considerations.
Text book 1: Chapter 5 – 5.1 to 5.4, 5.5 (5.5.1, 5.5.2), 5.6
Basic Processing Unit: Some Fundamental Concepts, Execution of a Complete
Instruction, Multiple Bus Organization, Hard-wired Control, Microprogrammed
Control. Basic concepts of pipelining.
Text book 1: Chapter 7, Chapter 8 – 8.1
1. Carl Hamacher, Zvonko Vranesic, Safwat Zaky, Computer Organization, 5th Edition, Tata McGraw-Hill, 2002.

2.1 Basic Concepts


• The maximum size of the memory that can be used in any computer is determined by its addressing scheme.

• If the MAR is k bits long and the MDR is n bits long, then

→ the memory may contain up to 2^k addressable locations &
→ n bits of data are transferred between the memory and the processor.
• The data transfer takes place over the processor-bus (Figure 8.1).
• The processor-bus has
1) Address-lines
2) Data-lines &
3) Control-lines (R/W`, MFC – Memory Function Completed).
• The Control-Line is used for coordinating data-transfer.
• The processor reads the data from the memory by
→ loading the address of the required memory-location into MAR and
→ setting the R/W` line to 1.
• The memory responds by
→ placing the data from the addressed-location onto the data-lines and
→ confirms this action by asserting MFC signal.
• Upon receipt of MFC signal, the processor loads the data from the data-lines into MDR.
• The processor writes the data into the memory-location by
→ loading the address of this location into MAR &
→ setting the R/W` line to 0.

• Memory Access Time: the time that elapses between
→ the initiation of an operation &
→ the completion of that operation.
• Memory Cycle Time: the minimum time delay required between the initiation of two
successive memory-operations.
RAM (Random Access Memory)
• In RAM, any location can be accessed for a Read/Write-operation in a fixed amount of time.
Cache Memory
➢ It is a small, fast memory that is inserted between
→ Larger slower main-memory and
→ Processor.
➢ It holds the currently active segments of a program and their data.
Virtual Memory
➢ The address generated by the processor is referred to as a virtual/logical address.
➢ The virtual-address-space is mapped onto the physical-memory where data are actually
stored.
➢ The mapping-function is implemented by MMU. (MMU = memory management unit).
➢ Only the active portion of the address-space is mapped into locations in the physical-memory.
➢ The remaining virtual-addresses are mapped onto bulk storage devices such as magnetic
disks.
➢ As the active portion of the virtual-address-space changes during program execution, the
MMU
→ changes the mapping-function &
→ transfers the data between disk and memory.
➢ During every memory-cycle, MMU determines whether the addressed-page is in the memory.
➢ If the page is in the memory, then the proper word is accessed and execution proceeds.
Otherwise, a page containing the desired word is transferred from disk to memory.
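The per-access decision made by the MMU can be sketched in Python. This is a toy model: the page size, the page-table dictionary, and the function name are illustrative assumptions, not any real MMU interface.

```python
PAGE_SIZE = 4096   # assumed page size for this sketch

def translate(virtual_addr, page_table):
    """Map a virtual address to a physical address, or report a page fault."""
    vpn, offset = divmod(virtual_addr, PAGE_SIZE)   # virtual page number + offset
    frame = page_table.get(vpn)                     # None => page not in memory
    if frame is None:
        return None            # page fault: the page must be fetched from disk
    return frame * PAGE_SIZE + offset

page_table = {0: 5, 1: 9}      # active virtual pages -> physical frames
print(translate(4100, page_table))           # page 1, offset 4 -> 9*4096 + 4 = 36868
print(translate(3 * PAGE_SIZE, page_table))  # page 3 not mapped -> None
```

Only the active pages appear in the table; an address outside them triggers the page-fault path, mirroring the disk transfer described above.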

2.2 Semiconductor RAM Memories

2.2.1 INTERNAL ORGANIZATION OF MEMORY-CHIPS


• Memory-cells are organized in the form of an array (Figure 8.2).
• Each cell is capable of storing 1 bit of information.
• Each row of cells forms a memory-word.
• All cells of a row are connected to a common line called the Word-Line.
• The cells in each column are connected to a Sense/Write circuit by two bit-lines.
• The Sense/Write circuits are connected to the data-input or output lines of the chip.
• During a write-operation, the Sense/Write circuits
→ receive input information &
→ store it in the cells of the selected word.

• The data-input and data-output of each Sense/Write circuit are connected to a single bidirectional
data-line.
• Data-line can be connected to a data-bus of the computer.
• The following 2 control lines are also used:
1) R/W` → specifies the required operation.
2) CS → Chip Select input; selects a given chip in a multi-chip memory-system.

2.2.2 STATIC MEMORIES


• Memories that consist of circuits capable of retaining their state as long as power is applied
are known as static memories.

• Two inverters are cross-connected to form a latch (Figure 8.4).

• The latch is connected to two bit-lines by transistors T1 and T2.
• The transistors act as switches that can be opened/closed under the control of the word-line.
• When the word-line is at ground level, the transistors are turned off and the latch retains its state.
Read Operation
• To read the state of the cell, the word-line is activated to close switches T1 and T2.
• If the cell is in state 1, the signal on bit-line b is high and the signal on bit-line b' is low.
• Thus, b and b' are complements of each other.
• Sense/Write circuit
→ monitors the state of b & b’ and
→ sets the output accordingly.
Write Operation
• The state of the cell is set by
→ placing the appropriate value on bit-line b and its complement on b’ and
→ then activating the word-line. This forces the cell into the corresponding state.
• The required signal on the bit-lines is generated by Sense/Write circuit.

CMOS Cell
• Transistor pairs (T3, T5) and (T4, T6) form the inverters in the latch (Figure 8.5).
• In state 1, the voltage at point X is maintained high by having T3 and T6 on, while T4 and T5 are off.
• Thus, if T1 and T2 are turned on (closed), bit-lines b and b' will have high and low signals respectively.
• Advantages:
1) It has low power consumption, because current flows in the cell only when the cell is being accessed.
2) Static RAMs can be accessed quickly; their access time is a few nanoseconds.
• Disadvantage: SRAMs are volatile memories, because their contents are lost when power is
interrupted.

2.2.3 ASYNCHRONOUS DRAMS


• Less expensive RAMs can be implemented if simpler cells are used.
• Such cells cannot retain their state indefinitely. Hence they are called Dynamic RAMs (DRAMs).
• The information is stored in a dynamic memory-cell in the form of a charge on a capacitor.
• This charge can be maintained only for tens of milliseconds.
• The contents must be periodically refreshed by restoring the capacitor charge to its full value.

• In order to store information in the cell, the transistor T is turned on (Figure 8.6).
• The appropriate voltage is applied to the bit-line, which charges the capacitor.
• After the transistor is turned off, the capacitor begins to discharge.
• Hence, the information stored in the cell can be retrieved correctly only if it is read before the
charge on the capacitor drops below a threshold value.
• During a read-operation,
→ the transistor is turned on &
→ a sense amplifier detects whether the charge on the capacitor is above the threshold value.
➢ If (charge on capacitor) > (threshold value) → the bit-line will have logic value '1'.
➢ If (charge on capacitor) < (threshold value) → the bit-line will be set to logic value '0'.

ASYNCHRONOUS DRAMS DESCRIPTION


• The 4096 cells in each row are divided into 512 groups of 8 (Figure 8.7).
• A 21-bit address is needed to access a byte in the memory:
i.e. 12 bits → to select a row
9 bits → to specify the group of 8 bits in the selected row.
A20-9 → Row-address of a byte
A8-0 → Column-address of a byte

Fig 8.7 Internal organization of 2M×8 dynamic memory chip.
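The 21-bit address split for this chip can be checked with a short sketch; the bit positions follow the 12-bit row / 9-bit column division described above (the function name is illustrative).

```python
def split_dram_address(addr):
    """Split a 21-bit byte address for the 2M x 8 chip into row and column."""
    row = (addr >> 9) & 0xFFF   # high-order 12 bits (A20-9) select one of 4096 rows
    col = addr & 0x1FF          # low-order 9 bits (A8-0) select one of 512 byte groups
    return row, col

row, col = split_dram_address(0b101010101010_110011001)
print(row == 0b101010101010, col == 0b110011001)   # True True
```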

• During a Read/Write-operation,
→ the row-address is applied first.
→ The row-address is loaded into the row-latch in response to a signal pulse on the RAS input of the chip.
(RAS = Row-Address Strobe; CAS = Column-Address Strobe)
• When a Read-operation is initiated, all cells on the selected row are read and refreshed.
• Shortly after the row-address is loaded, the column-address is
→ applied to the address pins &
→ loaded into the column-latch under control of the CAS signal.
• The information in the latch is decoded.
• The appropriate group of 8 Sense/Write circuits is selected.
R/W` = 1 (read-operation) → output values of the selected circuits are transferred to data-lines D0-D7.
R/W` = 0 (write-operation) → information on D0-D7 is transferred to the selected circuits.
• RAS & CAS are active-low, so that they cause latching of addresses when they change from high to low.
• To ensure that the contents of DRAMs are maintained, each row of cells is accessed periodically.
• A special memory-circuit provides the necessary control signals RAS & CAS that govern the timing.
• The processor must take into account the delay in the response of the memory.
Fast Page Mode
➢ Transferring the bytes in sequential order is achieved by applying the consecutive sequence
of column-address under the control of successive CAS signals.
➢ This scheme allows transferring a block of data at a faster rate.
➢ The block of transfer capability is called as Fast Page Mode.

2.2.4 SYNCHRONOUS DRAM


• The operations are directly synchronized with clock signal (Figure8.8).
• The address and data connections are buffered by means of registers.
• The output of each sense amplifier is connected to a latch.
• A Read-operation causes the contents of all cells in the selected row to be loaded in these latches.

• Data held in latches that correspond to selected columns are transferred into data-output register.
• Thus, data become available on the data-output pins.

• First, the row-address is latched under control of the RAS signal (Figure 8.9).


• The memory typically takes 2 or 3 clock cycles to activate the selected row.
• Then, the column-address is latched under the control of the CAS signal.
• After a delay of one clock cycle, the first set of data bits is placed on the data-lines.
• The SDRAM automatically increments the column-address to access the next 3 sets of bits in the selected row.

LATENCY & BANDWIDTH


A good indication of performance is given by 2 parameters: 1) Latency 2) Bandwidth.
Latency:
It refers to the amount of time it takes to transfer a word of data to or from the memory.
For a transfer of a single word, the latency provides a complete indication of memory performance.
For a block transfer, the latency denotes the time it takes to transfer the first word of data.
Bandwidth:
It is defined as the number of bits or bytes that can be transferred in one second.
Bandwidth mainly depends on the speed of access to the stored data & the number of bits that can be
accessed in parallel.

DOUBLE DATA RATE SDRAM (DDR-SDRAM)


The standard SDRAM performs all actions on the rising edge of the clock signal.
The DDR-SDRAM transfers data on both edges of the clock (rising edge and falling edge).
The bandwidth of DDR-SDRAM is doubled for long burst transfers.
To make it possible to access the data at a high rate, the cell array is organized into two banks.
Each bank can be accessed separately.
Consecutive words of a given block are stored in different banks.
Such interleaving of words allows simultaneous access to two words.
The two words are transferred on successive edges of the clock.

2.2.5 STRUCTURE OF LARGER MEMORIES

Static memory system



Dynamic Memory System


The physical implementation is done in the form of memory-modules.
If a large memory is built by placing DRAM chips directly on the motherboard, it will occupy a large
amount of space on the board.
These packaging considerations have led to the development of larger memory units known as SIMMs
& DIMMs.
SIMM → Single In-line Memory Module
DIMM → Dual In-line Memory Module
A SIMM/DIMM consists of many memory-chips on a small board that plugs into a socket on the motherboard.

2.2.6 MEMORY-SYSTEM CONSIDERATIONS


MEMORY CONTROLLER
• To reduce the number of pins, dynamic memory-chips use multiplexed address inputs.
• The address is divided into 2 parts:
1) High-Order Address Bits
➢ Select a row in the cell array.
➢ They are provided first and latched into the memory-chips under the control of the RAS signal.
2) Low-Order Address Bits
➢ Select a column.
➢ They are provided on the same address pins and latched using the CAS signal.
• The multiplexing of address bits is usually done by a Memory Controller Circuit (Figure 5.11).

• The Controller accepts a complete address & R/W` signal from the processor.
• A Request signal indicates that a memory access operation is needed.
• Then, the Controller
→ forwards the row & column portions of the address to the memory.
→ generates RAS & CAS signals &
→ sends R/W` & CS signals to the memory.

REFRESH OVERHEAD
• All dynamic memories have to be refreshed.
• In DRAMs, the period for refreshing all rows is 16 ms, whereas it is 64 ms in SDRAMs.
Eg: Given a cell array of 8K (8192) rows,
Clock cycles to refresh one row = 4
Clock rate = 133 MHz
Number of cycles to refresh all rows = 8192 × 4 = 32,768
Time needed to refresh all rows = 32768 / (133 × 10^6)
= 246 × 10^-6 s
= 0.246 ms
Refresh Overhead = 0.246 / 64
Refresh Overhead = 0.0038
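The refresh-overhead arithmetic above can be verified directly (all values taken from the example):

```python
rows = 8192                # 8K rows in the cell array
cycles_per_row = 4         # clock cycles needed to refresh one row
clock_hz = 133e6           # 133 MHz clock
refresh_period_s = 64e-3   # all rows must be refreshed every 64 ms (SDRAM)

refresh_time_s = rows * cycles_per_row / clock_hz   # about 246 microseconds
overhead = refresh_time_s / refresh_period_s        # fraction of time spent refreshing
print(round(overhead, 4))  # -> 0.0038
```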

2.2.7 RAMBUS MEMORY


• The usage of a wide bus is expensive.
• Rambus developed an implementation using a narrow bus.
• Rambus technology is a fast signaling method used to transfer information between chips.
• The signals consist of much smaller voltage swings around a reference voltage, Vref.
• The reference voltage is about 2 V.
• The two logical values are represented by 0.3 V swings above and below Vref.
• This type of signaling is generally known as Differential Signalling.
• Rambus provides a complete specification for the design of a communication link called the Rambus Channel.
• Rambus memory has a clock frequency of 400 MHz.
• The data are transmitted on both edges of the clock, so that the effective data-transfer rate is 800 MHz.
• The circuitry needed to interface to the Rambus channel is included on the chip. Such chips are called
RDRAMs (RDRAM = Rambus DRAM).
• Rambus channel has:
1) 9 Data-lines (lines 1-8 transfer the data; the 9th line is used for parity checking).
2) Control-lines &
3) Power lines.
• A two-channel Rambus has 18 data-lines and no separate address-lines.
• Communication between processor and RDRAM modules is carried out by means of packets
transmitted on the data-lines.
• There are 3 types of packets:
1) Request
2) Acknowledge&
3) Data.

2.3 Read Only Memories


• Both SRAM and DRAM chips are volatile, i.e. they lose the stored information if power is turned off.
• Many applications require non-volatile memory which retains the stored information if power is
turned off.
• For ex:
OS software has to be loaded from disk to memory i.e. it requires non-volatile memory.
• Non-volatile memory is used in embedded system.
• Since the normal operation involves only reading of stored data, a memory of this type is called ROM.
➢ At logic value '0' → transistor (T) is connected to the ground point (P);
the transistor switch is closed & the voltage on the bit-line drops nearly to zero (Figure 8.11).
➢ At logic value '1' → the transistor switch is open.
The bit-line remains at a high voltage.

• To read the state of the cell, the word-line is activated.


• A Sense circuit at the end of the bit-line generates the proper output value.

TYPES OF ROM
• Different types of non-volatile memory are
1) PROM
2) EPROM
3) EEPROM &
4) Flash Memory (Flash Cards & Flash Drives)

PROM (PROGRAMMABLE ROM)


• PROM allows the data to be loaded by the user.
• Programmability is achieved by inserting a 'fuse' at point P in a ROM cell.
• Before the PROM is programmed, the memory contains all 0's.
• User can insert 1’s at required location by burning-out fuse using high current-pulse.
• This process is irreversible.
• Advantages:
1) It provides flexibility.
2) It is faster.
3) It is less expensive because it can be programmed directly by the user.

EPROM (ERASABLE REPROGRAMMABLE ROM)


• EPROM allows
→ stored data to be erased and
→ new data to be loaded.
• In the cell, a connection to ground is always made at 'P', and a special transistor is used.
• The transistor has the ability to function as
→ a normal transistor or
→ a disabled transistor that is always turned off.
• The transistor can be programmed to behave as a permanently open switch by injecting charge into it.
• Erasure requires dissipating the charges trapped in the transistor of memory-cells.
This can be done by exposing the chip to ultra-violet light.
• Advantages:
1) It provides flexibility during the development-phase of digital-system.
2) It is capable of retaining the stored information for a long time.
• Disadvantages:
1) The chip must be physically removed from the circuit for reprogramming.
2) The entire contents need to be erased by exposure to UV light.

EEPROM (ELECTRICALLY ERASABLE ROM)


• Advantages:
1) It can be both programmed and erased electrically.
2) It allows cell contents to be erased selectively.
• Disadvantage: It requires different voltage for erasing, writing and reading the stored data.

FLASH MEMORY
• In an EEPROM, it is possible to read & write the contents of a single cell.
• In a Flash device, it is possible to read the contents of a single cell, but only to write the entire contents of a block.
• Prior to writing, the previous contents of the block are erased.
Eg: In an MP3 player, the flash memory stores the data that represents sound.
• Single flash chips cannot provide sufficient storage capacity for embedded-systems.
Advantages:
1) Flash drives have greater density which leads to higher capacity & low cost per bit.
2) It requires single power supply voltage & consumes less power.

• There are 2 methods for implementing larger memory: 1) Flash Cards & 2) Flash Drives
1) Flash Cards
➢ One way of constructing larger module is to mount flash-chips on a small card.
➢ Such flash-cards have a standard interface.
➢ The card is simply plugged into a conveniently accessible slot.
➢ The memory-size of the card can be 8, 32 or 64 MB.
➢ Eg: A minute of music can be stored in about 1 MB of memory. Hence a 64 MB flash card can store
about an hour of music.
2) Flash Drives
➢ Larger flash memories can be developed to replace hard disk drives.
➢ Flash drives are designed to fully emulate hard disks.
➢ Flash drives are solid-state electronic devices that have no movable parts.
Advantages:
1) They have shorter seek & access times, which results in a faster response.
2) They have low power consumption; hence they are attractive for battery-driven
applications.
3) They are insensitive to vibration.
Disadvantages:
1) The capacity of a flash drive (<1 GB) is less than that of a hard disk (>1 GB).
2) It leads to higher cost per bit.
3) Flash memory will weaken after it has been written a number of times (typically
at least 1 million times).

2.4 Speed, Size, and Cost

• The main-memory can be built with DRAMs (Figure 8.14).


• Thus, SRAMs are used in smaller units where speed is of the essence.
• The cache-memory is of 2 types:
1) Primary/Processor Cache (Level 1 or L1 cache)
➢ It is always located on the processor-chip.
2) Secondary Cache (Level 2 or L2 cache)
➢ It is placed between the primary-cache and the rest of the memory.
• The main memory is implemented using dynamic components (SIMM, RIMM, DIMM).
• The access time for main-memory is about 10 times longer than the access time for the L1 cache.

• Fastest access is to the data held in processor registers. Registers are at the top of the memory
hierarchy.
• Relatively small amount of memory that can be implemented on the processor chip. This is
processor cache.
• Two levels of cache. Level 1 (L1) cache is on the processor chip. Level 2 (L2) cache is in between
main memory and processor.

• Next level is main memory, implemented as SIMMs. Much larger, but much slower than cache
memory.
• Next level is magnetic disks. Huge amount of inexpensive storage.
• Since the speed of memory access is critical, the idea is to bring instructions and data that will be
used in the near future as close to the processor as possible.

2.5 Cache Memories – Mapping Functions


CACHE MEMORIES
• The effectiveness of cache mechanism is based on the property of ‘Locality of Reference’.
Locality of Reference
• Many instructions in localized areas of a program are executed repeatedly during some time period.
• The remainder of the program is accessed relatively infrequently (Figure 8.15).
• There are 2 types:
1) Temporal
➢ A recently executed instruction is likely to be executed again very soon.
2) Spatial
➢ Instructions in close proximity to a recently executed instruction are also likely to be executed soon.
• If the active segments of a program are placed in cache-memory, the total execution time can be reduced.
• Block refers to a set of contiguous address locations of some size.
• The term cache-line is also used to refer to a cache-block.

• The Cache-memory stores a reasonable number of blocks at a given time.


• This number of blocks is small compared to the total number of blocks available in main-memory.
• The correspondence between main-memory-blocks and blocks in the cache-memory is specified by a
mapping-function.
• The cache control hardware decides which block should be removed to create space for a new block.
• The collection of rules for making this decision is called the Replacement Algorithm.
• The cache control-circuit determines whether the requested word currently exists in the cache.
• The write-operation is done in 2 ways: 1) Write-through protocol & 2) Write-back protocol.
Write-Through Protocol
➢ Here the cache-location and the main-memory-locations are updated simultaneously.
Write-Back Protocol
➢ This technique is to
→ Update only the cache-location &
→ Mark the cache-location with associated flag bit called Dirty/Modified Bit.
➢ The word in memory will be updated later, when the marked-block is removed from cache.
During Read-operation
• If the requested word does not currently exist in the cache, a read miss occurs.
• To handle a read miss, the Load-Through (Early Restart) protocol is used.
Load-Through Protocol
➢ The block of words that contains the requested word is copied from the memory into the cache.
➢ After the entire block is loaded into the cache, the requested word is forwarded to the processor.
During Write-operation
• If the requested word does not exist in the cache, a write miss occurs.
1) If the Write-Through protocol is used, the information is written directly into main-memory.
2) If the Write-Back protocol is used,
→ the block containing the addressed word is first brought into the cache &
→ then the desired word in the cache is overwritten with the new information.
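The two write policies can be contrasted with a toy sketch. The `ToyCache` class and its fields are illustrative assumptions (blocks are modeled as single dictionary entries), not an implementation from the text.

```python
class ToyCache:
    def __init__(self, write_back):
        self.write_back = write_back
        self.data = {}       # cached copies of blocks
        self.dirty = set()   # blocks marked with the dirty/modified bit
        self.memory = {}     # simulated main memory

    def write(self, addr, value):
        self.data[addr] = value
        if self.write_back:
            self.dirty.add(addr)        # memory will be updated later, on eviction
        else:
            self.memory[addr] = value   # write-through: update memory at once

    def evict(self, addr):
        if addr in self.dirty:          # write-back: flush the marked block
            self.memory[addr] = self.data[addr]
            self.dirty.discard(addr)
        self.data.pop(addr, None)

wb = ToyCache(write_back=True)
wb.write(10, 99)
print(10 in wb.memory)    # False: memory is stale until the block is evicted
wb.evict(10)
print(wb.memory[10])      # 99
```

With `write_back=False` the same `write` call updates `memory` immediately, which is exactly the trade-off the two protocols describe.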

2.5.1 MAPPING-FUNCTION
DIRECT MAPPING
• Block j of the main-memory maps onto block (j modulo 128) of the cache (Figure 8.16).
• Thus, when memory-blocks 0, 128, & 256 are loaded into the cache, they are stored in cache-block 0.
Similarly, memory-blocks 1, 129, 257 are stored in cache-block 1.
• Contention may arise
1) when the cache is full, or
2) when more than one memory-block is mapped onto a given cache-block position.
• The contention is resolved by
allowing the new block to overwrite the currently resident block.
• The memory-address determines the placement of a block in the cache.

• The memory-address is divided into 3 fields:

1) Low-order 4-bit field:
➢ Selects one of 16 words in a block.
2) 7-bit cache-block field:
➢ These 7 bits determine the cache position in which the new block must be stored.
3) 5-bit Tag field:
➢ The high-order 5 bits of the memory-address of the block are stored in 5 tag-bits associated with
the cache-location.
• As execution proceeds,
the 5-bit tag field of the memory-address is compared with the tag-bits associated with the cache-location.
If they match, then the desired word is in that block of the cache.
Otherwise, the block containing the required word must first be read from the memory,
and then the word must be loaded into the cache.
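The 16-bit address split above (5-bit tag, 7-bit block, 4-bit word) can be sketched as:

```python
def split_address(addr):
    word = addr & 0xF             # low-order 4 bits: word within the block
    block = (addr >> 4) & 0x7F    # next 7 bits: cache-block position
    tag = (addr >> 11) & 0x1F     # high-order 5 bits: tag
    return tag, block, word

# Memory blocks 0 and 128 both map to cache-block 0 but differ in their tags
# (each block is 16 words, so block j starts at address j*16):
print(split_address(0 * 16))     # (0, 0, 0)
print(split_address(128 * 16))   # (1, 0, 0)
```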

ASSOCIATIVE MAPPING
• A memory-block can be placed into any cache-block position (Figure 8.17).
• 12 tag-bits identify a memory-block when it is resident in the cache.
• The tag-bits of an address received from the processor are compared to the tag-bits of each block of the cache.
• This comparison is done to see if the desired block is present.

• It gives complete freedom in choosing the cache-location.

• A new block that has to be brought into the cache has to replace an existing block if the cache is full.
• The memory has to determine whether a given block is in the cache.
• Advantage: It is more flexible than the direct-mapping technique.
• Disadvantage: Its cost is high.

SET-ASSOCIATIVE MAPPING
• It is a combination of direct and associative mapping (Figure 8.18).
• The blocks of the cache are grouped into sets.
• The mapping allows a block of the main-memory to reside in any block of a specified set.
• The cache has 2 blocks per set, so memory-blocks 0, 64, 128, ..., 4032 map into cache set 0.
• A memory-block can occupy either of the two block positions within this set.
6-bit Set field
➢ Determines which set of the cache contains the desired block.
6-bit Tag field
➢ The tag field of the address is compared to the tags of the two blocks of the set.
➢ This comparison is done to check if the desired block is present.

• A cache which contains 1 block per set is a direct-mapped cache.
• A cache that has 'k' blocks per set is called a k-way set-associative cache.
• Each block contains a control-bit called a valid-bit.
• The valid-bit indicates whether the block contains valid data.
• The dirty-bit indicates whether the block has been modified during its cache
residency.
Valid-bit = 0 → when power is initially applied to the system.
Valid-bit = 1 → when the block is loaded from main-memory for the first
time.
• If a main-memory-block is updated by a source and the block also exists in the cache,
then the valid-bit of the cache-block is cleared to '0'.
• Ensuring that the processor & DMA always see identical copies of the data is known as
the Cache Coherence Problem.
• Advantages:
1) The contention problem of direct mapping is solved by having a few choices for block
placement.
2) The hardware cost is decreased by reducing the size of the associative search.
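The set-associative placement above can be sketched for the configuration in Figure 8.18 (64 sets of 2 blocks); the function name and default are illustrative:

```python
def set_and_tag(block_number, num_sets=64):
    """Memory block j maps to set (j mod num_sets); the tag distinguishes candidates."""
    return block_number % num_sets, block_number // num_sets

# Blocks 0, 64, 128, ..., 4032 all fall in set 0 with distinct tags:
print(set_and_tag(0))      # (0, 0)
print(set_and_tag(64))     # (0, 1)
print(set_and_tag(4032))   # (0, 63)
```

Setting `num_sets=128` with 1 block per set reproduces direct mapping, and `num_sets=1` reproduces fully associative mapping, which is why set-associative mapping is described as a combination of the two.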

2.5.2 Replacement Algorithms


• In the direct-mapping method,
the position of each block is pre-determined, so there is no need for a replacement
strategy.
• In the associative & set-associative methods,
the block position is not pre-determined.
If the cache is full and new blocks are brought into the cache,
then the cache-controller must decide which of the old blocks has to be
replaced.
• When a block is to be overwritten, the block that has gone the longest time without being referenced
is overwritten.
• This block is called the Least Recently Used (LRU) block & the technique is called the LRU
algorithm.
• The cache-controller tracks references to all blocks with the help of block-counters.
• The performance of LRU is improved by introducing a small amount of randomness in deciding which
block is to be overwritten.

Eg:
Consider 4 blocks/set in a set-associative cache.
➢ A 2-bit counter can be used for each block.
➢ When a 'hit' occurs, the counter of the referenced block is set to 0; counters with values
originally lower than the referenced one are incremented by 1, & all others remain
unchanged.
➢ When a 'miss' occurs & the set is full, the block with counter value 3 is
removed, the new block is put in its place with its counter set to 0, and the other
block counters are incremented by 1.
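The 2-bit counter scheme in the example can be sketched for one 4-block set (the function names are illustrative):

```python
def on_hit(counters, i):
    """Block i was referenced: it becomes most recent (counter 0)."""
    ref = counters[i]
    for j in range(len(counters)):
        if counters[j] < ref:      # blocks more recent than i age by one
            counters[j] += 1
    counters[i] = 0

def on_miss_full(counters):
    """Set is full: replace the block whose counter is 3 (the LRU block)."""
    victim = counters.index(3)
    for j in range(len(counters)):
        counters[j] += 1
    counters[victim] = 0           # new block occupies the victim's slot
    return victim

counters = [0, 1, 2, 3]
on_hit(counters, 2)
print(counters)                # [1, 2, 0, 3]
print(on_miss_full(counters))  # 3 (the block with counter value 3 is replaced)
print(counters)                # [2, 3, 1, 0]
```

Note that the counter values within a set always remain a permutation of 0..3, which is what lets 2 bits per block encode a full LRU ordering of 4 blocks.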

2.6 Performance Considerations


• Two key factors in commercial success are 1) performance & 2) cost.
• In other words, the best possible performance at the lowest cost.
• A common measure of success is the Price/Performance ratio.
• Performance depends on
→ how fast the machine instructions are brought to the processor &
→ how fast the machine instructions are executed.
• To achieve parallelism, interleaving is used.
• Parallelism means that both slow and fast units are accessed in the same manner.

2.6.1 INTERLEAVING
• The main-memory of a computer is structured as a collection of physically separate
modules.
• Each module has its own
1) ABR (Address Buffer Register) &
2) DBR (Data Buffer Register).
• So, memory access operations may proceed in more than one module at the same time
(Figure 5.25).
• Thus, the aggregate rate of transmission of words to/from the main-memory can be
increased.

• The low-order k bits of the memory-address select a module,

while the high-order m bits name a location within the module.
In this way, consecutive addresses are located in successive
modules.
• Thus, any component of the system can keep several modules busy at any one time.
• This results in both
→ faster access to a block of data and
→ higher average utilization of the memory-system as a whole.
• To implement the interleaved structure, there must be 2^k modules;
otherwise, there will be gaps of non-existent locations in the address
space.
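The interleaved address split can be sketched as follows (k = 2, i.e. 4 modules, chosen purely for illustration):

```python
K = 2   # low-order k bits select the module; there are 2^k = 4 modules

def module_and_offset(addr, k=K):
    """Return (module number, location within module) for an address."""
    return addr & ((1 << k) - 1), addr >> k

# Consecutive addresses rotate through successive modules:
print([module_and_offset(a)[0] for a in range(6)])   # [0, 1, 2, 3, 0, 1]
```

A sequential block transfer therefore keeps all four modules busy in turn, which is the source of the higher aggregate transfer rate described above.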

2.6.2 Hit Rate & Miss Penalty


• The number of hits stated as a fraction of all attempted accesses is called the Hit Rate.
• The extra time needed to bring the desired information into the cache is called the Miss
Penalty.
• High hit rates well over 0.9 are essential for high-performance computers.
• Performance is adversely affected by the actions that need to be taken when a miss
occurs.
• A performance penalty is incurred because
of the extra time needed to bring a block of data from a slower unit to a
faster unit.
• During that period, the processor is stalled waiting for instructions or data.
• We refer to the total access time seen by the processor when a miss occurs as the miss
penalty.
• Let h be the hit rate, M the miss penalty, and C the time to access information in the
cache. Thus, the average access time experienced by the processor is
tavg = hC + (1 − h)M
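Evaluating the formula for sample numbers (h, C, M below are illustrative values, not from the text):

```python
h, C, M = 0.95, 1.0, 20.0     # hit rate, cache access time (ns), miss penalty (ns)
t_avg = h * C + (1 - h) * M   # tavg = hC + (1 - h)M
print(round(t_avg, 2))        # -> 1.95 (ns per access, on average)
```

Even a 5% miss rate with a 20x slower main memory roughly doubles the effective access time, which is why hit rates well over 0.9 matter.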

2.6.3 Caches on the processor chip


◾ In high-performance processors, 2 levels of caches are normally used.
◾ The average access time in a system with 2 levels of caches is
tave = h1C1 + (1 − h1)h2C2 + (1 − h1)(1 − h2)M

Where
h1 is the hit rate in L1 cache
h2 is the hit rate in L2 cache
C1 is the time to access the information in L1 cache
C2 is the time to access the information in L2 cache
M is the time to access the information in the main memory
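The two-level formula can be evaluated the same way (all hit rates and access times below are assumed for the sketch):

```python
h1, h2 = 0.96, 0.90            # L1 and L2 hit rates (assumed)
C1, C2, M = 1.0, 10.0, 100.0   # L1, L2, and main-memory access times in ns (assumed)

# tave = h1*C1 + (1 - h1)*h2*C2 + (1 - h1)*(1 - h2)*M
t_avg = h1 * C1 + (1 - h1) * h2 * C2 + (1 - h1) * (1 - h2) * M
print(round(t_avg, 2))         # -> 1.72
```

Only 0.4% of accesses reach main memory here, yet they still contribute 0.4 ns of the 1.72 ns average, showing how heavily the slowest level weighs on performance.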

2.6.4 Other Performance Enhancements


Write buffer

◾ Write-through:
• Each write operation involves writing to the main memory.
• If the processor has to wait for the write operation to be complete, it
slows down the processor.
• Processor does not depend on the results of the write operation.
• Write buffer can be included for temporary storage of write requests.
• Processor places each write request into the buffer and continues
execution.
• If a subsequent Read request references data which is still in the write
buffer, then this data is referenced in the write buffer.
◾ Write-back:
• Block is written back to the main memory when it is replaced.
• If the processor waits for this write to complete, before reading the new
block, it is slowed down.
• A fast write buffer can hold the block to be written, and the new
block can be read first.
Prefetching
• New data are brought into the processor when they are first needed.
• Processor has to wait before the data transfer is complete.
• Prefetch the data into the cache before they are actually needed, or a
before a Read miss occurs.
• Prefetching can be accomplished through software by including a special
instruction in the machine language of the processor.
▪ Inclusion of prefetch instructions increases the length of the
programs.
• Prefetching can also be accomplished using hardware:
▪ Circuitry that attempts to discover patterns in memory references
and then prefetches according to this pattern.

Lockup-Free Cache
• Prefetching scheme does not work if it stops other accesses to the cache
until the prefetch is completed.
• A cache of this type is said to be “locked” while it services a miss.
• Cache structure which supports multiple outstanding misses is called a
lockup free cache.
• Since only one miss can be serviced at a time, a lockup free cache must
include circuits that keep track of all the outstanding misses.
• Special registers may hold the necessary information about these misses.

Problem 1:
Consider the dynamic memory cell. Assume that C = 30 femtofarads (10−15 F) and that the
leakage current through the transistor is about 0.25 picoamperes (10−12 A). The voltage
across the capacitor when it is fully charged is 1.5 V. The cell must be refreshed before
this voltage drops below 0.9 V. Estimate the minimum refresh rate.
Solution:
The minimum refresh interval is the time taken for the cell voltage to drop by
1.5 − 0.9 = 0.6 V:
t = C × ΔV / I = (30 × 10^−15 × 0.6) / (0.25 × 10^−12) = 72 × 10^−3 s
Therefore, each row has to be refreshed every 72 ms.
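The arithmetic of this solution can be reproduced directly (t = C·ΔV/I, the time for the leakage current to remove the charge corresponding to the allowed voltage drop):

```python
C = 30e-15        # cell capacitance: 30 femtofarads
I = 0.25e-12      # leakage current: 0.25 picoamperes
dV = 1.5 - 0.9    # allowed voltage drop before refresh, in volts

t = C * dV / I    # time for the charge C*dV to leak away
print(round(t * 1e3, 1), "ms")  # 72.0 ms
```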

Problem 2:
Consider a main-memory built with SDRAM chips. Data are transferred in bursts & the
burst length is 8. Assume that 32 bits of data are transferred in parallel. If a 400-MHz clock
is used,how much time does it take to transfer:
(a) 32 bytes of data
(b) 64 bytes of data
What is the latency in each case?
Solution:
(a) It takes 5 + 8 = 13 clock cycles. At 400 MHz each cycle is 2.5 ns, so the transfer takes 13 × 2.5 = 32.5 ns.

(b) It takes twice as long (65 ns) to transfer 64 bytes, because two independent 32-byte
transfers have to be made. The latency is the same in each case: the initial 5 clock cycles, i.e. 12.5 ns.
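The timing arithmetic, assuming the 5-cycle initial latency used in the solution, can be sketched as:

```python
clock_mhz = 400
cycle_ns = 1000 / clock_mhz           # 2.5 ns per clock cycle at 400 MHz
latency_cycles = 5                    # cycles before the first data word (assumed, as in the solution)
burst_cycles = 8                      # burst of 8 transfers, 32 bits each = 32 bytes

t32 = (latency_cycles + burst_cycles) * cycle_ns   # one 32-byte burst
t64 = 2 * t32                                      # two independent 32-byte bursts
latency_ns = latency_cycles * cycle_ns
print(t32, t64, latency_ns)           # 32.5 65.0 12.5
```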

Problem 3:
Give a critique of the following statement: “Using a faster processor chip results in a
corresponding increase in performance of a computer even if the main-memory speed
remains the same.”
Solution:
A faster processor chip will result in increased performance, but the amount of
increase will not be directly proportional to the increase in processor speed,
because the cache miss penalty will remain the same if the main-memory speed
is not improved.

Problem 4:
A block-set-associative cache consists of a total of 64 blocks, divided into 4-block sets.
The main-memory contains 4096 blocks, each consisting of 128 words. Assuming that the
memory is word-addressable,
(a) how many bits are there in main-memory address
(b) how many bits are there in each of the Tag, Set, and Word fields?
Solution:
(a) 4096 blocks of 128 words each require 12+7 = 19 bits for the main-memory
address.
(b) TAG field is 8 bits. SET field is 4 bits. WORD field is 7 bits.
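The field widths can also be computed programmatically; the helper below assumes a word-addressable memory, as in the solution:

```python
import math

def set_assoc_fields(mm_blocks, words_per_block, cache_blocks, blocks_per_set):
    """Return (tag, set, word) bit widths for a word-addressable memory."""
    word_bits = int(math.log2(words_per_block))
    set_bits = int(math.log2(cache_blocks // blocks_per_set))
    addr_bits = int(math.log2(mm_blocks)) + word_bits
    return addr_bits - set_bits - word_bits, set_bits, word_bits

print(set_assoc_fields(4096, 128, 64, 4))  # (8, 4, 7)
```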

Problem 5:
The cache block size in many computers is in the range of 32 to 128 bytes. What would

be the main advantages and disadvantages of making the size of cache blocks larger
or smaller?
Solution:
Larger size
➢ Fewer misses if most of the data in the block are actually used
➢ Wasteful if much of the data are not used before the cache
block is ejected from the cache
Smaller size
➢ More misses

Problem 6
A block set associative cache consists of a total of 64 blocks divided into 4 block sets.
The MM contains 4096 blocks each containing 128 words.
How many bits are there in MM address?
How many bits are there in each of the TAG, SET & word fields

Solution:
Number of sets = 64/4 = 16, so Set bits = 4 (2^4 = 16)
Number of words per block = 128, so Word bits = 7 (2^7 = 128)
a) MM capacity: 4096 × 128 words (2^12 × 2^7 = 2^19)

Number of bits in memory address = 19 bits

b) Address format:

 8     4     7
TAG   SET   WORD

TAG bits = 19 − (7 + 4) = 8 bits.

Problem 7
A computer system has a MM capacity of a total of 1M 16 bits words. It also has a 4K
words cache organized in the block set associative manner, with 4 blocks per set & 64
words per block. Calculate the number of bits in each of the TAG, SET & WORD fields
of MM address format.

Solution:
MM capacity: 1M words (2^20 = 1M)
Number of words per block = 64, so Word bits = 6 (2^6 = 64)
Number of blocks in cache = 4K/64 = 64
Number of sets = 64/4 = 16, so Set bits = 4 (2^4 = 16)
TAG bits = 20 − (6 + 4) = 10 bits
MM address format: 10 tag bits, 4 set bits and 6 word bits

Problem 8
Suppose main memory consists of 32 blocks of 4 words each and cache consists of 4
blocks. How many bits required for main memory address? How many bits are there in
each of Tag, block/set and word fields for different mapping techniques?

Solution: main memory – 32 blocks of 4 words each => 2^5 blocks × 2^2 words
=> 2^7 words
=> memory address is 7 bits

1) Direct mapping –
Word bits = 2 (4 = 2^2)
Block bits = 2 (4 blocks = 2^2)
Tag bits = 7 − (2 + 2) = 3

2) Associative mapping –
Word bits = 2 (4 = 2^2)
Tag bits = total bits − word bits = 7 − 2 = 5

3) Set-associative mapping (assume 2 cache blocks per set):
Word bits = 2 (4 = 2^2)
Number of sets = 4/2 = 2, so Set bits = 1 (2^1 = 2)
Tag bits = 7 − (2 + 1) = 4
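The three mappings can be compared side by side; the 2-way set size in the last case is the same assumption made above:

```python
import math

MM_BLOCKS, WORDS_PER_BLOCK, CACHE_BLOCKS = 32, 4, 4
word = int(math.log2(WORDS_PER_BLOCK))            # 2 word bits
addr = int(math.log2(MM_BLOCKS)) + word           # 7-bit memory address

block = int(math.log2(CACHE_BLOCKS))              # direct mapping: 2 block bits
print("direct      tag =", addr - block - word)   # 3

print("associative tag =", addr - word)           # 5

WAYS = 2                                          # assumed: 2 blocks per set
set_bits = int(math.log2(CACHE_BLOCKS // WAYS))   # 1 set bit
print("set-assoc   tag =", addr - set_bits - word)  # 4
```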

MODULE 2: BASIC PROCESSING UNIT


2.1 SOME FUNDAMENTAL CONCEPTS
• To execute an instruction, the processor has to perform the following 3 steps:
1) Fetch the contents of the memory-location pointed to by the PC. The content of this location is the instruction
to be executed. The instruction is loaded into the IR. Symbolically, this operation is written as:
IR ← [[PC]]
2) Increment the PC by 4:
PC ← [PC] + 4
3) Carry out the actions specified by instruction (in the IR).
• The first 2 steps are referred to as Fetch Phase.
Step 3 is referred to as Execution Phase.
• The operation specified by an instruction can be carried out by performing one or more of the
following actions:
1) Read the contents of a given memory-location and load them into a register.
2) Read data from one or more registers.
3) Perform an arithmetic or logic operation and place the result into a register.
4) Store data from a register into a given memory-location.

SINGLE BUS ORGANIZATION


• ALU and all the registers are interconnected via a Single Common Bus (Figure7.1).
• Data & address lines of the external memory-bus is connected to the internal processor-bus via MDR
& MAR respectively. (MDR→ Memory Data Register, MAR →Memory Address Register).
• MDR has 2 inputs and 2 outputs. Data may be loaded
→ into MDR either from memory-bus (external) or
→ from processor-bus (internal).
• MAR’s input is connected to internal-bus;
MAR’s output is connected to external-bus.
• Instruction Decoder & Control Unit is responsible for
→ issuing the control-signals to all the units inside the processor.
→ implementing the actions specified by the instruction (loaded in the IR).
• Register R0 through R(n-1) are the Processor Registers.
The programmer can access these registers for general-purpose use.
• Only the processor can access the 3 registers Y, Z & Temp for temporary storage during program-execution.
The programmer cannot access these 3 registers.
• In ALU, 1) ‘A’ input gets the operand from the output of the multiplexer(MUX).
2) ‘B’ input gets the operand directly from the processor-bus.
• There are 2 options provided for the ‘A’ input of the ALU.
• MUX is used to select one of the 2 inputs:
→ the output of Y, or
→ the constant value 4 (which is used to increment the PC content).

• An instruction is executed by performing one or more of the following operations:


1) Transfer a word of data from one register to another or to the ALU.
2) Perform arithmetic or a logic operation and store the result in a register.
3) Fetch the contents of a given memory-location and load them into a register.
4) Store a word of data from a register into a given memory-location.
• Disadvantage: Only one data-word can be transferred over the bus in a clock cycle.
Solution: Provide multiple internal-paths. Multiple paths allow several data-transfers to take place in
parallel.

2.1.1 REGISTER TRANSFERS


• Instruction execution involves a sequence of steps in which data are transferred from one register to
another.
• For each register, two control-signals are used: Riin & Riout. These are called Gating Signals.
• Riin=1 → data on the bus is loaded into Ri.
Riout=1 → the contents of Ri are placed on the bus.
Riout=0 → the bus can be used for transferring data from other registers.
• For example, Move R1, R2 transfers the contents of register R1 to register R2. This can be
accomplished as follows:
1) Enable the output of register R1 by setting R1out to 1 (Figure 7.2).
This places the contents of R1 on the processor-bus.
2) Enable the input of register R2 by setting R2in to 1.
This loads the data from the processor-bus into register R2.
• All operations and data transfers within the processor take place within time-periods defined by the
processor-clock.
• The control-signals that govern a particular transfer are asserted at the start of the clock cycle.

Input & Output Gating for one Register Bit


• A 2-input multiplexer is used to select the data applied to the input of an edge-triggered D flip-flop.
• Riin=1 → mux selects data on bus. This data will be loaded into flip-flop at rising-edge of
clock. Riin=0 → mux feeds back the value currently stored in flip-flop (Figure7.3).
• Q output of flip-flop is connected to bus via a tri-state gate.
Riout=0 →gate's output is in the high-impedance state.
Riout=1 →the gate drives the bus to 0 or 1, depending on the value of Q.

2.1.2 PERFORMING AN ARITHMETIC OR LOGIC OPERATION


• The ALU performs arithmetic operations on the 2 operands applied to its A and B inputs.
• One of the operands is output of MUX;
And, the other operand is obtained directly from processor-bus.
• The result (produced by the ALU) is stored temporarily in register Z.
• The sequence of operations for [R3][R1]+[R2] is as follows:
3) R1out,Yin
4) R2out, Select Y, Add, Zin
5) Zout,R3in
• Instruction execution proceeds as follows:
Step 1 --> Contents from register R1 are loaded into register Y.
Step 2 --> Contents from Y and from register R2 are applied to the A and B inputs of the ALU;
Addition is performed & the Result is stored in the Z register.
Step 3 --> The contents of the Z register are stored in the R3 register.


• The signals are activated for the duration of the clock cycle corresponding to that step. All other
signals are inactive.
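The three control steps can be mimicked with a toy Python model of the single-bus datapath (the register contents and the single "bus" variable are purely illustrative):

```python
regs = {"R1": 5, "R2": 7, "R3": 0, "Y": 0, "Z": 0}

# Step 1: R1out, Yin -- R1 drives the bus, Y latches the value
bus = regs["R1"]; regs["Y"] = bus
# Step 2: R2out, SelectY, Add, Zin -- ALU adds Y (via the MUX) and the bus
bus = regs["R2"]; regs["Z"] = regs["Y"] + bus
# Step 3: Zout, R3in -- the result moves from Z to R3
bus = regs["Z"]; regs["R3"] = bus

print(regs["R3"])  # 12
```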

Fetching a word from memory


• The MDR register has 4 control-signals (Figure 7.4):
1) MDRin & MDRout control the connection to the internal processor data bus &
2) MDRinE & MDRoutE control the connection to the memory data bus.
• The MAR register has 2 control-signals:
1) MARin controls the connection to the internal processor address bus &
2) MARout controls the connection to the memory address bus.

• To fetch instruction/data from memory, processor transfers required address to


MAR. At the same time, processor issues Read signal on control-lines of
memory-bus.
• When requested-data are received from memory, they are stored in MDR. From MDR, they are
transferred to other registers.
• The response time of each memory access varies (based on cache miss, memory-mapped I/O). To
accommodate this, MFC is used. (MFC →Memory Function Completed).
• MFC is a signal sent from addressed-device to the processor. MFC informs the processor that the
requested operation has been completed by addressed-device.
• Consider the instruction Move (R1),R2. The sequence of steps is (Figure7.5):

➢ MAR ← [R1]
➢ Start a Read operation on the memory bus
➢ Wait for the MFC response from the memory
➢ Load MDR from the memory bus
➢ R2← [MDR]
The signals used to perform the read operation are
1) R1out, MARin, Read ;desired address is loaded into MAR & Read command issued.
2) MDRinE,WMFC ;load MDR from memory-bus & Wait for MFC response from memory.
3) MDRout,R2in ;load R2 from MDR.
where WMFC = control-signal that causes the processor's control circuitry to wait for the
arrival of the MFC signal.

Storing a Word in Memory


• Consider the instruction Move R2,(R1). This requires the following sequence:
1) R1out,MARin ;desired address is loaded into MAR.
2) R2out,MDRin,Write ;data to be written are loaded into MDR & Write command issued.
3) MDRoutE,WMFC ;load data into memory-location pointed by R1 from MDR.

2.2 EXECUTION OF A COMPLETE INSTRUCTION


Consider the instruction
• Add (R3),R1
which adds the contents of a memory-location pointed to by R3 to register R1. Executing this instruction
requires the following actions:
1) Fetch the instruction.
2) Fetch the first operand.
3) Perform the addition &
4) Load the result into R1.

• Instruction execution proceeds as follows:


Step1--> The instruction-fetch operation is initiated by
→ loading contents of PC into MAR &
→ sending a Read request to memory.
The Select signal is set to Select4, which causes the Mux to select constant 4. This value
is added to operand at input B (PC’s content), and the result is stored in Z.
Step2--> Updated value in Z is moved to PC. This completes the PC increment operation and
PC will now point to next instruction.
Step3--> Fetched instruction is moved into MDR and then to IR.
The step 1 through 3 constitutes the Fetch Phase.
At the beginning of step 4, the instruction decoder interprets the contents of the IR. This
enables the control circuitry to activate the control-signals for steps 4 through 7.
The step 4 through 7 constitutes the Execution Phase.
Step4--> Contents of R3 are loaded into MAR & a memory read signal is issued.
Step5--> Contents of R1 are transferred to Y to prepare for addition.
Step6--> When Read operation is completed, memory-operand is available in MDR, and the
addition is performed.
Step7--> Sum is stored in Z, then transferred to R1.The End signal causes a new instruction
fetch cycle to begin by returning to step1.

2.2.1 BRANCHING INSTRUCTIONS


• Control sequence for an unconditional branch instruction is as follows:

• Instruction execution proceeds as follows:


Step 1-3-->The processing starts & the fetch phase ends in step3.
Step 4-->The offset-value is extracted from IR by instruction-decoding circuit.
Since the updated value of PC is already available in register Y, the offset X is gated onto
the bus, and an addition operation is performed.
Step 5--> the result, which is the branch-address, is loaded into the PC.
• The branch instruction loads the branch target address in PC so that PC will fetch the next instruction
from the branch target address.
• The branch target address is usually obtained by adding the offset in the contents of PC.
• The offset X is usually the difference between the branch target-address and the address
immediately following the branch instruction.
• In case of conditional branch,
we have to check the status of the condition-codes before loading a new value into the PC.
e.g.: Offset-field-of-IRout, Add, Zin, If N=0 then End
If N=0, processor returns to step 1 immediately after step 4.
If N=1, step 5 is performed to load a new value into PC.

2.3 MULTIPLE BUS ORGANIZATION


• Disadvantage of Single-bus organization: Only one data-word can be transferred over the bus in
a clock cycle. This increases the steps required to complete the execution of the instruction
Solution: To reduce the number of steps, most processors provide multiple internal-paths. Multiple
paths enable several transfers to take place in parallel.
• As shown in fig 7.8, three buses can be used to connect registers and the ALU of the processor.
• All general-purpose registers are grouped into a single block called the Register File.
• Register-file has 3 ports:
1) Two output-ports allow the contents of 2 different registers to be simultaneously placed on
buses A & B.
2) A third input-port allows the data on bus C to be loaded into a third register during the same
clock-cycle.
• Buses A and B are used to transfer source-operands to A & B inputs of ALU.
• The result is transferred to destination over bus C.
• The Incrementer Unit is used to increment the PC by 4.

• Instruction execution proceeds as follows:
Step 1 --> Contents of PC are
→ passed through the ALU using the R=B control-signal &
→ loaded into MAR to start a memory Read operation. At the same time, PC is incremented by 4.
Step 2 --> Processor waits for the MFC signal from memory.
Step 3 --> Processor loads the requested data into MDR, and then transfers them to IR.
Step 4 --> The instruction is decoded and the add operation takes place in a single step.

2.4 HARDWIRED CONTROL


• To execute instructions, the processor must have some means of generating the control signals
needed in the proper sequence.
• Two categories:
– Hardwired control
– Microprogrammed control
• Hardwired system can operate at high speed; but with little flexibility.

• The required control signals are determined by the following information:


• Contents of the control step counter
• Contents of the instruction register
• Contents of the condition code flags
• External input signals, such as MFC and interrupt requests.
• The decoder/encoder block in Figure 7.10 is a combinational circuit that generates the required control
outputs, depending on the state of all its inputs.
• By separating the decoding and encoding functions, we obtain the more detailed block diagram in
Figure 7.11.

To execute instructions, the processor must have some means of generating the control-signals. There
are two approaches for this purpose:
1) Hardwired control and 2) Micro programmed control.
Hardwired control
• Hardwired control is a method of control unit design (Figure7.11).
• The control-signals are generated by using logic circuits such as gates, flip-flops, decoders, etc.
• Decoder/Encoder Block is a combinational-circuit that generates required control-outputs
depending on state of all its inputs.
• Instruction Decoder
➢ It decodes the instruction loaded in the IR.
➢ If IR is an 8-bit register, then the instruction decoder generates 2^8 = 256 output lines; one for each
instruction.
➢ It consists of a separate output-lines INS1 through INSm for each machine instruction.
➢ According to the code in the IR, one of the output-lines INS1 through INSm is set to 1, and all
other lines are set to 0.
• Step-Decoder provides a separate signal line for each step in the control sequence.
• Encoder
➢ It gets the input from instruction decoder, step decoder, external inputs and condition codes.
➢ It uses all these inputs to generate individual control-signals: Yin, PCout, Add, End and so on.
➢ For example (Figure 7.12), Zin = T1 + T6·ADD + T4·BR

; This signal is asserted during time-slot T1 for all instructions,
during T6 for an Add instruction, and
during T4 for an unconditional branch instruction.
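Viewed as code, the encoder's expression for Zin is a simple Boolean function of the step count and the decoded instruction (the instruction names are just the two cases from the example):

```python
def zin(step, instr):
    """Zin = T1 + T6.ADD + T4.BR  (hardwired encoder logic)."""
    return step == 1 or (instr == "ADD" and step == 6) or (instr == "BR" and step == 4)

print(zin(1, "ADD"), zin(6, "ADD"), zin(4, "BR"), zin(5, "ADD"))
# True True True False
```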

• When RUN=1, the counter is incremented by 1 at the end of every clock cycle.
When RUN=0, the counter stops counting.
• After execution of each instruction, end signal is generated. End signal resets step counter.
• The End signal starts a new instruction fetch cycle by resetting the control step counter to its starting value.
• The sequence of operations carried out by this machine is determined by the wiring of the logic circuits, hence the
name “hardwired”.

• Advantage: Can operate at high speed.


• Disadvantages:
1) Since no. of instructions/control-lines is often in hundreds, the complexity of control unit is
very high.
2) It is costly and difficult to design.
3) The control unit is inflexible because it is difficult to change the design.

2.4.1 COMPLETE PROCESSOR

• This has separate processing-units to deal with integer data and floating-point data.
Integer Unit → To process integer data (Figure 7.14).
Floating-Point Unit → To process floating-point data.
• Data-Cache is inserted between these processing-units & main-memory.
The integer and floating unit gets data from data cache.
• Instruction-Unit fetches instructions
→ from an instruction-cache or
→ from main-memory when desired instructions are not already in cache.
• Processor is connected to system-bus&
hence to the rest of the computer by means of a Bus Interface.
• Using separate caches for instructions & data is common practice in many processors today.
• A processor may include several units of each type to increase the potential for concurrent operations.
• The 80486 processor has a single 8-Kbyte cache for both instructions and data,
whereas the Pentium processor has two separate 8-Kbyte caches for instructions and data.

HARDWIRED CONTROL VS MICROPROGRAMMED CONTROL


Definition:
Hardwired control is a control mechanism that generates control-signals by using gates, flip-flops, decoders, and other digital circuits.
Microprogrammed control is a control mechanism that generates control-signals by using a memory called the control store (CS), which contains the control-signals.
Speed: Hardwired is fast; microprogrammed is slow.
Control functions: Hardwired control functions are implemented in hardware; microprogrammed control functions are implemented in software.
Flexibility: Hardwired control is not flexible; accommodating new system specifications or new instructions requires a redesign. Microprogrammed control is more flexible; new specifications or instructions are easier to accommodate.
Ability to handle large or complex instruction sets: Difficult for hardwired; easier for microprogrammed.
Ability to support operating systems & diagnostic features: Very difficult for hardwired; easy for microprogrammed.
Design process: Complicated for hardwired; orderly and systematic for microprogrammed.
Applications: Hardwired is used mostly in RISC microprocessors; microprogrammed in mainframes and some microprocessors.
Instruction set size: Usually under 100 instructions for hardwired; usually over 100 for microprogrammed.
ROM size: None for hardwired; 2K to 10K of 20-400-bit microinstructions for microprogrammed.
Chip area efficiency: Hardwired uses the least area; microprogrammed uses more area.

2.5 MICROPROGRAMMED CONTROL


• Microprogramming is a method of control unit design (Figure7.16).
• Control-signals are generated by a program similar to machine language programs.
• Control Word(CW) is a word whose individual bits represent various control-signals (like Add, PCin).
• Each of the control-steps in control sequence of an instruction defines a unique combination of 1s &
0s in CW.
• Individual control-words in micro routine are referred to as micro instructions (Figure7.15).
• A sequence of CWs corresponding to the control-sequence of a machine instruction constitutes the
microroutine.
• The micro routines for all instructions in the instruction-set of a computer are stored in a special
memory called the Control Store(CS).
• Control-unit generates control-signals for any instruction by sequentially reading CWs of
corresponding micro routine from CS.
• µPC is used to read CWs sequentially from CS. (µPC→ Micro program Counter).
• Every time new instruction is loaded into IR, o/p of Starting Address Generator is loaded into µPC.
• Then, µPC is automatically incremented by clock;
causing successive microinstructions to be read from CS.
Hence, control-signals are delivered to various parts of processor in correct sequence.

Advantages
• It simplifies the design of the control unit. Thus it is both cheaper and less error-prone to implement.
• Control functions are implemented in software rather than hardware.
• The design process is orderly and systematic.
• More flexible, can be changed to accommodate new system specifications or to correct the design
errors quickly and cheaply.
• Complex function such as floating point arithmetic can be realized efficiently.
Disadvantages
• A microprogrammed control unit is somewhat slower than the hardwired control unit, because time
is required to access the microinstructions from the control store.
• The flexibility is achieved at some extra hardware cost due to the control memory and its access
circuitry.

ORGANIZATION OF MICROPROGRAMMED CONTROL UNIT TO SUPPORT CONDITIONAL BRANCHING
• Drawback of previous Micro program control:
➢ It cannot handle the situation when the control unit is required to check the status of the
condition codes or external inputs to choose between alternative courses of action.
Solution:
➢ Use conditional branch microinstruction.
• In case of conditional branching, microinstructions specify which of the external inputs, condition-
codes should be checked as a condition for branching to take place.
• Starting and Branch Address Generator Block loads a new address into µPC when a
microinstruction instructs it to do so (Figure7.18).
• To allow implementation of a conditional branch, inputs to this block consist of
→ external inputs and condition-codes &
→ contents of IR.
• µPC is incremented every time a new microinstruction is fetched from micro program memory except
in following situations:
1) When a new instruction is loaded into IR, µPC is loaded with starting-address of micro
routine for that instruction.
2) When a Branch microinstruction is encountered and branch condition is satisfied, µPC is
loaded with branch-address.
3) When an End microinstruction is encountered, µPC is loaded with address of first CW in
micro routine for instruction fetch cycle.

2.5.1 MICROINSTRUCTIONS
• A simple way to structure microinstructions is to assign one bit position to each control-signal
required in the CPU.
• There are 42 signals and hence each microinstruction will have 42 bits.
• Drawbacks of micro programmed control:
1) Assigning individual bits to each control-signal results in long micro instructions
because the number of required signals is usually large.
2) Available bit-space is poorly used because
only a few bits are set to 1 in any given microinstruction.
• Solution: Signals can be grouped because
1) Most signals are not needed simultaneously.
2) Many signals are mutually exclusive. E.g. only 1 function of ALU can be activated at a
time. For ex: Gating signals: IN and OUT signals (Figure7.19).
Control-signals: Read, Write.
ALU signals: Add, Sub, Mul, Div, Mod.
• Grouping control-signals into fields requires a little more hardware because
decoding-circuits must be used to decode bit patterns of each field into individual control-signals.
• Advantage: This method results in a smaller control-store (only 20 bits are needed to store the
patterns for the 42 signals).
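The saving can be illustrated by comparing one bit per signal with encoded fields; the field names and sizes below are illustrative assumptions, not the exact grouping used in the text:

```python
import math

# Number of mutually exclusive control-signals in each (assumed) field.
fields = {"reg_out": 10, "reg_in": 10, "alu_op": 16, "mem_ctrl": 3, "misc": 3}

horizontal_bits = sum(fields.values())            # one bit per signal: 42 bits
encoded_bits = sum(math.ceil(math.log2(n + 1))    # +1 code for "none active"
                   for n in fields.values())
print(horizontal_bits, encoded_bits)              # 42 17
```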

TECHNIQUES OF GROUPING OF CONTROL-SIGNALS


• The grouping of control-signal can be done either by using
1) Vertical organization&
2) Horizontal organization.

Vertical Organization Horizontal Organization


Highly encoded schemes that use compact The minimally encoded scheme in which many
codes to specify only a small number of control resources can be controlled with a single
functions in each microinstruction are referred Micro instruction is called a horizontal
to as a vertical organization. organization.
Slower operating-speeds. Useful when higher operating-speed is desired.
Short formats. Long formats.
Limited ability to express parallel Ability to express a high degree of parallelism.
micro operations.
Considerable encoding of the control Little encoding of the control information.
information.

2.5.2 MICROPROGRAM SEQUENCING


• The task of micro program sequencing is done by micro program sequencer.
• Two important factors must be considered while designing the micro program sequencer:
1) The size of the microinstruction &
2) The address generation time.
• The size of the microinstruction should be minimum so that the size of control memory required to
store microinstructions is also less.
• This reduces the cost of control memory.
• With less address generation time, a microinstruction can be executed in less time, resulting in better
throughput.
• During execution of a micro program the address of the next microinstruction to be executed has 3
sources:
1) Determined by instruction register.
2) Next sequential address&
3) Branch.
• Microinstructions can be shared using micro instruction branching.
• Disadvantage of micro programmed branching:
1) Having a separate micro routine for each machine instruction results in a large total number
of microinstructions and a large control-store.
2) Execution time is longer because it takes more time to carry out the required branches.
• Consider the instruction Add src,Rdst;which adds the source-operand to the contents of Rdst and
places the sum in Rdst.
• Let source-operand can be specified in following addressing modes (Figure7.20):
a) Indexed
b) Auto increment
c) Auto decrement
d) Register indirect&
e) Register direct
• Each box in the chart corresponds to a microinstruction that controls the transfers and operations
indicated within the box.
• The microinstruction is located at the address indicated by the octal number (001, 002).

BRANCH ADDRESS MODIFICATION USING BIT-ORING


• The branch address is determined by ORing a particular bit or bits with the current address of the
microinstruction.
• E.g.: If the current address is 170 and the branch address is 171, then the branch address can be
generated by ORing 01 (bit 0) with the current address.
• Consider the point labeled in the figure. At this point, it is necessary to choose between direct and
indirect addressing modes.
• If indirect-mode is specified in the instruction, then the microinstruction in location 170 is performed
to fetch the operand from the memory.
If direct-mode is specified, this fetch must be bypassed by branching immediately to location 171.
• The most efficient way to bypass microinstruction 170 is to have bit-ORing of
→ the current address 170 &
→ the branch address 171.

2.5.3 WIDE BRANCH ADDRESSING


• The instruction-decoder (InstDec) generates the starting-address of the micro routine that
implements the instruction that has just been loaded into the IR.
• Here, register IR contains the Add instruction, for which the instruction decoder generates the
microinstruction address 101. (However, this address cannot be loaded as is into the μPC).
• The source-operand can be specified in any of several addressing-modes. The bit-ORing technique can
be used to modify the starting-address generated by the instruction-decoder to reach the appropriate
path.
Use of WMFC
• The WMFC signal is issued at location 112, which causes a branch to the microinstruction in location 171.
• WMFC signal means that the microinstruction may take several clock cycles to complete. If the branch
is allowed to happen in the first clock cycle, the microinstruction at location 171 would be fetched and
executed prematurely. To avoid this problem, WMFC signal must inhibit any change in the contents of
the μ PC during the waiting-period.

Detailed Examination of Add (Rsrc)+,Rdst


• Consider Add (Rsrc)+,Rdst; which adds Rsrc content to Rdst content, then stores the sum in Rdst
and finally increments Rsrc by 4 (i.e. auto-increment mode).
• In bits 10 and 9, the bit-patterns 11, 10, 01 and 00 denote indexed, auto-decrement, auto-increment and
register modes respectively. For each of these modes, bit 8 is used to specify the indirect version.
• The processor has 16 registers that can be used for addressing purposes; each specified using a 4-
bit-code (Figure7.21).
• There are 2 stages of decoding:
1) The microinstruction field must be decoded to determine that an Rsrc or Rdst register is
involved.
2) The decoded output is then used to gate the contents of the Rsrc or Rdst fields in the IR into
a second decoder, which produces the gating-signals for the actual registers R0 to R15.

2.5.4 MICROINSTRUCTIONS WITH NEXT-ADDRESS FIELDS



Drawback of previous organization:


➢ The micro program requires several branch microinstructions which perform no useful
operation. Thus, they detract from the operating-speed of the computer.
Solution:
➢ Include an address-field as a part of every microinstruction to indicate the location of the
next microinstruction to be fetched. (Thus, every microinstruction becomes a branch
microinstruction).
• The flexibility of this approach comes at the expense of additional bits for the address-field (Figure 7.22).
• Advantage: Separate branch microinstructions are virtually eliminated (Figures 7.23-24).
• Disadvantage: Additional bits for the address field (around 1/6 of the microinstruction length).
• There is no need for a counter to keep track of sequential address. Hence, μ PC is replaced with μ AR.
• The next-address bits are fed through the OR gate to the μ AR, so that the address can be modified
on the basis of the data in the IR, external inputs and condition-codes.
• The decoding circuits generate the starting-address of a given micro routine on the basis of the op
code in the IR. (μ AR→ Micro instruction Address Register).
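A toy sequencer with next-address fields might look like the following sketch (the control-store contents and the OR-gate wiring are invented for illustration):

```python
# Sketch: every microinstruction carries the address of its successor,
# and selected bits of that address can be ORed with bits derived from
# the IR (e.g. the addressing-mode bits), replacing separate branch
# microinstructions.
ustore = {
    0: {"signals": ["PCout", "MARin", "Read"], "next": 1},
    1: {"signals": ["WMFC"],                   "next": 2},
    2: {"signals": ["MDRout", "IRin"],         "next": 3, "or_with_ir": True},
    # address 3 | (mode_bits << 2) selects the microroutine for the mode
}

def next_uar(uinst, ir_mode_bits):
    """Compute the next microinstruction address (the new uAR contents)."""
    addr = uinst["next"]
    if uinst.get("or_with_ir"):
        addr |= ir_mode_bits << 2   # modify the address via the OR gates
    return addr

assert next_uar(ustore[0], 0b01) == 1   # plain sequential successor
assert next_uar(ustore[2], 0b01) == 7   # dispatch on the addressing mode
```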

2.5.5 PREFETCHING MICROINSTRUCTIONS


• Disadvantage of Microprogrammed Control: Slower operating speed because of the time
it takes to fetch microinstructions from the control-store.
Solution: Faster operation is achieved if the next microinstruction is pre-fetched while
the current one is being executed.
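The benefit of prefetching can be quantified with a small sketch, assuming one clock cycle each for a fetch and an execute (an idealized assumption made for illustration):

```python
# Sketch: without prefetching, each microinstruction costs a fetch
# cycle plus an execute cycle; with prefetching, every fetch after the
# first is hidden behind the previous execute.
def total_cycles(n, fetch=1, execute=1, prefetch=False):
    if prefetch:
        return fetch + n * execute   # only the first fetch is exposed
    return n * (fetch + execute)

assert total_cycles(10) == 20                  # serial fetch/execute
assert total_cycles(10, prefetch=True) == 11   # fetches overlapped
```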
2.5.6 Emulation
• The main function of microprogrammed control is to provide a means for simple, flexible
and relatively inexpensive execution of machine instructions.
• Its flexibility in using a machine's resources allows diverse classes of instructions to be implemented.
• Suppose we add to the instruction repertoire of a given computer M1 an entirely new set
of instructions that is in fact the instruction-set of a different computer M2.
• Programs written in the machine language of M2 can then be run on computer M1, i.e.
M1 emulates M2.
• Emulation allows us to replace obsolete equipment with more up-to-date machines.
• If the replacement computer fully emulates the original one, then no software changes have to
be made to run existing programs.
• Emulation is easiest when the machines involved have similar architectures.

Problem 1:
Why is the Wait-for-memory-function-completed step needed for reading from or writing to the main
memory?
Solution:
The WMFC step is needed to synchronize the operation of the processor and the main memory.

Problem 2:
For the single bus organization, write the complete control sequence for the instruction: Move (R1), R2
Solution:
1) PCout, MARin, Read, Select4, Add, Zin
2) Zout, PCin, Yin, WMFC
3) MDRout, IRin
4) R1out, MARin, Read
5) MDRinE, WMFC
6) MDRout, R2in, End
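The six control steps above can be traced with a toy register-transfer simulation (the memory and register contents are made up, and the instruction word stored at address 100 is an arbitrary placeholder):

```python
# Sketch: simulating the Move (R1), R2 control sequence step by step.
mem = {100: 4000, 2000: 77}            # address: contents (made up)
regs = {"PC": 100, "R1": 2000, "R2": 0}

# Step 1: PCout, MARin, Read, Select4, Add, Zin
regs["MAR"] = regs["PC"]; regs["Z"] = regs["PC"] + 4
# Step 2: Zout, PCin, Yin, WMFC
regs["PC"] = regs["Z"]; regs["Y"] = regs["Z"]
# Step 3: MDRout, IRin  (instruction word arrives in MDR after WMFC)
regs["MDR"] = mem[regs["MAR"]]; regs["IR"] = regs["MDR"]
# Step 4: R1out, MARin, Read
regs["MAR"] = regs["R1"]
# Step 5: MDRinE, WMFC  (operand read from memory into MDR)
regs["MDR"] = mem[regs["MAR"]]
# Step 6: MDRout, R2in, End
regs["R2"] = regs["MDR"]

assert regs["R2"] == 77    # R2 now holds the word addressed by R1
assert regs["PC"] == 104   # PC was advanced past the fetched word
```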

Problem 3:
Write the sequence of control steps required for the single bus organization in each of the following instructions:
a) Add the immediate number NUM to register R1.
b) Add the contents of memory-location NUM to register R1.
c) Add the contents of the memory-location whose address is at memory-location NUM to
register R1.
Assume that each instruction consists of two words. The first word specifies the operation and the
addressing mode, and the second word contains the number NUM.
Solution:

Problem 4:
Show the control steps for the Branch on Negative instruction for a processor with three-bus organization of
the data path.
Solution:
