250324digital System Design - Memory
Memory
CPU Memory Interface
CPU Memory Interface usually consists of:
❑ unidirectional address bus
❑ bidirectional data bus
❑ read control line
[Figure: CPU connected to Memory by an address bus, a data bus, and Read, Write, and Ready control lines]
Memory Hierarchy
The design constraints on a computer memory can be summed up by three
questions: (i) how much? (ii) how fast? (iii) how expensive?
There is a trade-off among these three key characteristics: capacity, speed, and cost.
A variety of technologies are used to implement memory systems.
The dilemma facing the designer is clear: we want large capacity, high speed, and low cost, all at once.
Solution: employ a memory hierarchy.
[Figure: memory hierarchy with registers at the top, then cache, then main memory; cost per bit rises toward the top and capacity grows toward the bottom]
Memory
[Figure: block diagram of memory in a PC system. Labels: CPU, cache memory, cache controller, dynamic RAM, DRAM controller, co-processor, local CPU/memory bus, PCI (Peripheral Component Interconnect) bus, EISA/PC bus, SCSI bus, PC Card 1, PC Card 2, PC Card 3.]
ENG3640 Fall 2012 5
Memory Classification:
Classifications: Key Design Metrics
The Memory Hierarchy
Let’s look at numbers for an Intel Pentium 4, 3.2 GHz Server.
Component | Access Speed (Time for data to be returned) | Size of Component
Registers | 1 cycle = 0.3 nanoseconds | 8 registers
Chapter 6: Memory 8
Basic Organization
Memory Cell Operation
Memory sizes
• We refer to this as a 2^k x n memory.
– There are k address lines, which can specify one of 2^k addresses.
– Each address contains an n-bit word.
[Figure: block diagram of a 2^k x n memory with a k-bit ADRS input, an n-bit DATA input, CS and WR control inputs, and an n-bit OUT output]
• For example, a 2^24 x 16 RAM contains 2^24 = 16M words, each 16 bits long.
– The RAM would need 24 address lines.
– The total storage capacity is 2^24 x 16 = 2^28 bits.
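The sizing arithmetic for a 2^k x n memory can be checked with a few lines of Python. This is only a sketch; the function name memory_capacity is made up for illustration.

```python
def memory_capacity(k: int, n: int) -> dict:
    """Capacity of a 2^k x n memory: k address lines, n bits per word."""
    words = 2 ** k              # one word per address
    bits = words * n            # total storage in bits
    return {"words": words, "bits": bits, "bytes": bits // 8}


# The 2^24 x 16 RAM from the example above.
cap = memory_capacity(24, 16)
print(cap["words"] == 16 * 2 ** 20)  # 16M words
print(cap["bits"] == 2 ** 28)        # 2^28 bits
print(cap["bytes"] == 2 ** 25)       # 2^25 bytes, i.e. 32 MB in base 2 units
```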
Size matters!
• Memory sizes are usually specified in numbers of bytes (8 bits).
• The 2^28-bit memory on the previous page translates into:
2^28 bits / 8 bits per byte = 2^25 bytes
• With the abbreviations below, this is equivalent to 32 megabytes.
• To confuse you, RAM size is measured in base 2 units, while hard drive size
is measured in base 10 units.
– In this class, we’ll only concern ourselves with the base 2 units.
Typical memory sizes
• Some typical memory capacities:
– PCs usually come with 128-512MB RAM.
– PDAs have 8-64MB of memory.
– Digital cameras and MP3 players can
have 32MB or more of storage.
• Many operating systems implement
virtual memory, which makes the
memory seem larger than it really is.
– Most systems allow up to 32-bit
addresses. This works out to 2^32, or
about four billion, different possible
addresses.
– With a data size of one byte, the result is
apparently a 4GB memory!
– The operating system uses hard disk
space as a substitute for “real” memory.
Reading RAM
• To read from this RAM, the controlling circuit
must:
– Enable the chip by ensuring CS = 1.
– Select the read operation, by setting WR = 0.
– Send the desired address to the ADRS input.
– The contents of that address appear on OUT after
a little while.
• Notice that the DATA input is unused for read
operations.
[Figure: block diagram of a 2^k x n memory with a k-bit ADRS input, an n-bit DATA input, CS and WR control inputs, and an n-bit OUT output]
SRAM Memory Timing for Read Accesses
Writing RAM
• To write to this RAM, you need to:
– Enable the chip by setting CS = 1.
– Select the write operation, by setting WR = 1.
– Send the desired address to the ADRS input.
– Send the word to store to the DATA input.
• The output OUT is not needed for memory
write operations.
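The read and write steps can be captured in a small behavioral model. This is a sketch only: the class name SimpleRAM and the access method are invented for illustration, timing (the "little while" before OUT is valid) is ignored, and None stands in for an undriven OUT.

```python
class SimpleRAM:
    """Behavioral model of a 2^k x n RAM with CS, WR, ADRS, DATA and OUT."""

    def __init__(self, k: int, n: int):
        self.k, self.n = k, n
        self.cells = [0] * (2 ** k)

    def access(self, cs: int, wr: int, adrs: int, data: int = 0):
        """Return the value on OUT, or None when OUT is not driven."""
        if cs != 1:
            return None                      # chip disabled: nothing happens
        if not 0 <= adrs < 2 ** self.k:
            raise ValueError("address out of range")
        if wr == 1:
            # Write: store the low n bits of DATA; OUT is not needed.
            self.cells[adrs] = data & ((1 << self.n) - 1)
            return None
        return self.cells[adrs]              # Read: DATA is ignored


ram = SimpleRAM(k=4, n=8)
ram.access(cs=1, wr=1, adrs=3, data=0xAB)    # write 0xAB to address 3
print(hex(ram.access(cs=1, wr=0, adrs=3)))   # read it back
```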
[Figure: block diagram of a 2^k x n memory with a k-bit ADRS input, an n-bit DATA input, CS and WR control inputs, and an n-bit OUT output]
SRAM Memory Timing for Write Accesses
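PIC To SRAM (‘#’ means low true)
[Figure: PIC wired to an 8Kx8 SRAM. RA[7:0] drives A[7:0], RB[4:0] drives A[12:8], RC[7:0] connects to IO[7:0], CE2 is tied to Vdd, RD0 drives OE#, and RD1 drives WE#. The SRAM also has a CE1# input.]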
To read: Address on RA, RB. RC port is all inputs; RD0 = ‘0’, RD1 = ‘1’.
To write: Address on RA, RB. RC port is all outputs; RD0 = ‘1’, RD1 = ‘0’.
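The pin settings for a read or write can be worked out from the 13-bit address. The sketch below only computes the values; the function name pic_sram_pins and the dictionary layout are made up for illustration and are not PIC register code.

```python
def pic_sram_pins(address: int, write: bool) -> dict:
    """Port values for one access to the 8Kx8 SRAM (13-bit address)."""
    if not 0 <= address < 8 * 1024:
        raise ValueError("address must fit in 13 bits")
    return {
        "RA": address & 0xFF,                  # A[7:0]
        "RB": (address >> 8) & 0x1F,           # A[12:8] on RB[4:0]
        "RC_direction": "output" if write else "input",
        "RD0": 1 if write else 0,              # OE#: low (asserted) for a read
        "RD1": 0 if write else 1,              # WE#: low (asserted) for a write
    }


print(pic_sram_pins(0x1234, write=False))
```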
Expanding Memory
[Figure: PIC wired to two 8Kx8 SRAMs, RAM0 and RAM1. Both chips share RA[7:0] on A[7:0], RB[4:0] on A[12:8], RC[7:0] on IO[7:0], RD0 on OE#, and RD1 on WE#. RB5 drives CE2 on RAM1 and CE1# on RAM0.]
RAM1 accessed when RB5 = 1
RAM0 accessed when RB5 = 0
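With two chips, RB5 acts as an extra address bit that picks the chip. The sketch below assumes RB5 is treated as address bit 13, giving one contiguous 16K address space; the function name select_ram is hypothetical.

```python
def select_ram(address: int) -> tuple:
    """Map a 14-bit address onto (chip name, RB5 value, 13-bit offset)."""
    if not 0 <= address < 16 * 1024:
        raise ValueError("address must fit in 14 bits")
    rb5 = (address >> 13) & 1           # assumed: RB5 carries address bit 13
    chip = "RAM1" if rb5 else "RAM0"    # RAM1 when RB5 = 1, RAM0 when RB5 = 0
    return chip, rb5, address & 0x1FFF  # low 13 bits go to A[12:0] of both chips


print(select_ram(0x0000))
print(select_ram(0x2000))
```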
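More Memory
[Figure: PIC wired to RAM0 and RAM1 over a shared 13-bit address bus (RB[4:0], RA[7:0] on A[12:0]) and a shared 8-bit data bus (RC[7:0] on IO[7:0]). RD0 drives OE# and RD1 drives WE# on both chips; RB[6:5] is used for the CE1# chip enables.]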
[Figure: timing diagram showing the delay until data is available]
DRAM Evolution (Synchronous)
• SRAM Cell
[Figure: SRAM cell schematic; label: word line]
Read Only Memory (ROM)
• Permanent storage
– Nonvolatile
• Microprogramming (will address later)
• Library subroutines
• Systems programs (BIOS)
• Function tables
• Controllers
Types of ROM
• ROM: Written during manufacture
– Very expensive for small runs
• Read “mostly”
– EPROM: Erasable Programmable
• Erased by UV (All of chip!)
– Flash memory
• Whole blocks of memory stored/changed electrically
– EEPROM: Electrically Erasable
• Takes much longer to write than read (lower density)
EPROM
Semiconductor Memory
16Mbit DRAM
256kByte Module Organisation (256K x 1)
Typical 16 Mb DRAM (4M x 4)
1MByte Module Organization (1Meg x 8 bits)
Refreshing
• Refresh circuit is included on the chip
• Count through rows
• Read & Write back
• Chip must be disabled during refresh
• Takes time
• Occurs asynchronously
• Slows down apparent performance
Improvements in memory
RAM – continually gets denser.
Source/destination address can be configured to be unchanged, incremented, or decremented after each transfer
DMA Transfer Modes
• Six transfer modes
– Single transfer, block transfer, burst-block transfer, repeated single transfer,
repeated block transfer, repeated burst-block transfer
• Single transfer
– Each transfer requires a separate trigger; DMA is disabled after the transfer
• DMA must be re-enabled before it can receive another trigger
– Repeated single transfer: DMA remains enabled
• Another trigger starts another transfer
• Block transfer
– A complete block is transferred after one trigger; DMA is disabled after the transfer
– Repeated block transfer: DMA remains enabled
• Another trigger starts another transfer
• Burst-block transfer
– Block transfers with CPU activity interleaved
– Repeated burst-block transfer: DMA remains enabled
• Keeps transferring
• CPU executes at 20% capacity
Appendix
Introduction
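[Figure: a 6-bit address A5..A0 drives two 3-to-8 decoders; one decodes A5 A4 A3 and the other decodes A2 A1 A0, and the cell where their active outputs meet is selected]
If A = 110010, the first decoder receives 110 and the second receives 010, so exactly one cell is selected.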
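The two-decoder selection can be sketched in Python. Which decoder drives rows and which drives columns is an assumption here, and the names decode_3to8 and select_cell are made up for illustration.

```python
def decode_3to8(value: int) -> list:
    """3-to-8 decoder: exactly one of the eight outputs is 1."""
    return [1 if i == value else 0 for i in range(8)]


def select_cell(address: int) -> tuple:
    """Split a 6-bit address into the two 3-bit decoder inputs."""
    row = (address >> 3) & 0b111   # A5 A4 A3 (assumed to pick the row)
    col = address & 0b111          # A2 A1 A0 (assumed to pick the column)
    return row, col, decode_3to8(row), decode_3to8(col)


row, col, row_lines, col_lines = select_cell(0b110010)
print(row, col)       # decoder inputs 110 and 010 as integers
print(row_lines)      # one-hot outputs of the first decoder
print(col_lines)      # one-hot outputs of the second decoder
```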
Address wires Data wires
Capacity
• 1 address bit: 2 locations, 0-------1
• 2 address bits: 4 locations, 0------3
• 3 address bits: 8 locations, 0------7
• 4 address bits: 16 locations, 0------15 (F)
• 5 address bits: 32 locations, 0------31(1F)
• 6 address bits: 64 locations, 0------63(3F)
• 7 address bits: 128 locations, 0------127(7F)
• 8 address bits: 256 locations, 0------255(FF)
• 9 address bits: 512 locations, 0------511(1FF)
• 10 address bits: 1024 locations, 0------1023(3FF) (1K)
• 11 address bits: 2048 locations, 0------2047(7FF)(2K)
• 12 address bits: 4096 locations, 0------4095(FFF)(4K)
• 13 address bits: 8K locations, 0------1FFF
• 14 address bits: 16K locations, 0------3FFF
• 15 address bits: 32K locations, 0------7FFF
• 16 address bits: 64K locations, 0------FFFF
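Every row of this table follows from one rule: k address bits give 2^k locations, numbered 0 to 2^k - 1. A short sketch (the function name address_range is made up) regenerates the table:

```python
def address_range(bits: int) -> tuple:
    """Return (number of locations, last address, last address in hex)."""
    locations = 2 ** bits
    last = locations - 1
    return locations, last, format(last, "X")


for bits in range(1, 17):
    locations, last, last_hex = address_range(bits)
    print(f"{bits:2d} address bits: {locations:5d} locations, 0 to {last} ({last_hex})")
```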
Notes on address: 1 Kilo = 2^10 = 1024. Notes on data: 8 bits = 1 byte.
Classification of memory
Classes of memories:
• RAM
– SRAM
– DRAM
– DDR-RAM
– SDRAM
• ROM
– ROM
– PROM
– EPROM
– EEPROM
Mother board
Random access memory
• Sequential circuits all depend upon the presence of memory.
– A flip-flop can store one bit of information.
– A register can store a single “word,” typically 32-64 bits.
• Random access memory, or RAM, allows us to store even larger amounts
of data. We’ll see:
– The basic interface to memory.
– How you can implement static RAM chips hierarchically.
• This is the last piece we need to put together a computer!
Static memory
• How can you implement the memory chip?
• There are many different kinds of RAM.
– We’ll start off discussing static memory, which is most commonly
used in caches and video cards.
– Later we mention a little about dynamic memory, which forms the
bulk of a computer’s main memory.
• Static memory is modeled using one latch for each bit of storage.
• Why use latches instead of flip flops?
– A latch can be made with only two NAND or two NOR gates, but a
flip-flop requires at least twice that much hardware.
– In general, smaller is faster, cheaper and requires less power.
– The tradeoff is that getting the timing exactly right is a pain.
Starting with latches
• To start, we can use one latch to store each bit. A one-bit RAM cell is
shown here.
Those funny triangles
• The triangle represents a three-state buffer.
• Unlike regular logic gates, the output can be one of three different
possibilities, as shown in the table.
Connecting three-state buffers together
• You can connect several three-state
buffer outputs together if you can
guarantee that only one of them is
enabled at any time.
• The easiest way to do this is to use a
decoder!
• If the decoder is disabled, then all the
three-state buffers will appear to be
disconnected, and OUT will also appear
disconnected.
• If the decoder is enabled, then exactly
one of its outputs will be true, so only
one of the tri-state buffers will be
connected and produce an output.
• The net result is we can save some wire
and gate costs. We also get a little more
flexibility in putting circuits together.
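The decoder-plus-three-state-buffer arrangement can be modeled in a few lines. This is a sketch: None stands in for the disconnected (high-impedance) state, and the names decoder and shared_bus are made up for illustration.

```python
from typing import List, Optional


def decoder(select: int, enable: bool, outputs: int) -> List[bool]:
    """Decoder with enable: at most one output is true."""
    return [enable and i == select for i in range(outputs)]


def shared_bus(inputs: List[int], enables: List[bool]) -> Optional[int]:
    """Value on a wire driven by several three-state buffers."""
    driving = [value for value, en in zip(inputs, enables) if en]
    if len(driving) > 1:
        raise RuntimeError("bus contention: more than one buffer enabled")
    return driving[0] if driving else None   # None models "disconnected"


cells = [1, 0, 1, 1]                                             # four stored bits
print(shared_bus(cells, decoder(1, enable=True, outputs=4)))     # cell 1 drives OUT
print(shared_bus(cells, decoder(1, enable=False, outputs=4)))    # OUT disconnected
```

Because the enables come from a decoder, the contention branch can never trigger; it is there to show what the "only one enabled at any time" guarantee protects against.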
Bigger and better
• Here is the 4 x 1 RAM
once again.
• How can we make a
“wider” memory with
more bits per word, like
maybe a 4 x 4 RAM?
• Duplicate the stuff in
the blue box!
A 4 x 4 RAM
• DATA and OUT are now each four bits long, so you can read and write
four-bit words.
Bigger RAMs from smaller RAMs
• We can use small RAMs as building blocks for making larger memories,
by following the same principles as in the previous examples.
• As an example, suppose we have some 64K x 8 RAMs to start with:
– 64K = 2^6 x 2^10 = 2^16, so there are 16 address lines.
– There are 8 data lines.
[Figure: 64K x 8 RAM block with a 16-bit address input and 8-bit data input and output]
Making a larger memory
• We can put four 64K x 8 chips together to make a 256K x 8 memory.
• For 256K words, we need 18 address lines.
– The two most significant address lines go to the decoder, which selects one of the four 64K x 8 RAM chips.
– The other 16 address lines are shared by the 64K x 8 chips.
• The 64K x 8 chips also share WR and DATA inputs.
• This assumes the 64K x 8 chips have three-state outputs.
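The address split for the 256K x 8 memory can be written out directly. This is a sketch with made-up function names (decode_256k, chip_range); chips are numbered 0 to 3 by the value on the two most significant address lines.

```python
def decode_256k(address: int) -> tuple:
    """Split an 18-bit address into (chip number, 16-bit address within the chip)."""
    if not 0 <= address < 2 ** 18:
        raise ValueError("address must fit in 18 bits")
    chip = address >> 16          # two most significant lines go to the decoder
    offset = address & 0xFFFF     # remaining 16 lines are shared by all chips
    return chip, offset


def chip_range(chip: int) -> tuple:
    """First and last 18-bit address handled by one 64K x 8 chip."""
    start = chip << 16
    return start, start + 0xFFFF


for chip in range(4):
    start, end = chip_range(chip)
    print(f"chip {chip}: {start:05X} to {end:05X}")
```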
Analyzing the 256K x 8 RAM
• There are 256K words of memory, spread out among the four smaller 64K x 8 RAM chips.
Address ranges
Making a wider memory
• You can also combine smaller chips to make wider memories, with the
same number of addresses but more bits per word.
• Here is a 64K x 16 RAM, created from two 64K x 8 chips.
– The left chip contains the most significant 8 bits of the data.
– The right chip contains the lower 8 bits of the data.
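A wider memory can be sketched as two byte-wide arrays that share one address. The class name Wide64Kx16 is made up for illustration; plain Python lists stand in for the two 64K x 8 chips.

```python
class Wide64Kx16:
    """64K x 16 memory built from two 64K x 8 chips sharing the address lines."""

    def __init__(self):
        self.high = [0] * 65536   # left chip: most significant 8 bits
        self.low = [0] * 65536    # right chip: lower 8 bits

    def write(self, address: int, word: int) -> None:
        self.high[address] = (word >> 8) & 0xFF
        self.low[address] = word & 0xFF

    def read(self, address: int) -> int:
        return (self.high[address] << 8) | self.low[address]


mem = Wide64Kx16()
mem.write(0x1234, 0xBEEF)
print(hex(mem.high[0x1234]), hex(mem.low[0x1234]), hex(mem.read(0x1234)))
```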
Input==output
Other memories
• Last time we showed how to build arbitrarily-large static memories
from single-bit RAM cells.
• Today we’ll look at some other kinds of memories.
– Dynamic RAM is used for the bulk of computer memory.
– Read-only memories and PLAs are two “programmable logic devices,”
which can be considered as special types of memories.
Dynamic memory in a nutshell
• Dynamic memory is built with capacitors.
– A stored charge on the capacitor represents a logical 1.
– No charge represents a logic 0.
• However, capacitors lose their charge after a few milliseconds. The
memory requires constant refreshing to recharge the capacitors.
(That’s what’s “dynamic” about it.)
• Dynamic RAMs tend to be physically smaller than static RAMs.
– A single bit of data can be stored with just one capacitor and one
transistor, while static RAM cells typically require 4-6 transistors.
– This means dynamic RAM is cheaper and denser—more bits can be
stored in the same physical area.
SDRAM
• Synchronous DRAM, or SDRAM, is one of
the most common types of PC memory now.
• Memory chips are organized into “modules”
that are connected to the CPU via a 64-bit
(8-byte) bus.
• Speeds are rated in megahertz: PC66, PC100
and PC133 memory run at 66MHz, 100MHz
and 133MHz respectively.
• The memory bandwidth can be computed by
multiplying the number of transfers per
second by the size of each transfer.
– PC100 can transfer up to 800MB per
second (100MHz x 8 bytes/cycle).
– PC133 can get over 1 GB per second.
DDR-RAM
• A newer type of memory is Double Data Rate, or DDR-RAM.
• It’s very similar to regular SDRAM, except data can be transferred on
both the positive and negative clock edges. For 100-133MHz buses, the
effective memory speeds appear to be 200-266MHz.
• This memory is confusingly called PC1600 and PC2100 RAM, because
– 200MHz x 8 bytes/cycle = 1600MB/s
– 266MHz x 8 bytes/cycle = 2100MB/s.
• DDR-RAM has lower power consumption, using 2.5V instead of 3.3V like
SDRAM. This makes it good for notebooks and other mobile devices.
RDRAM
• Another new type of memory called RDRAM
is used in the Playstation 2 as well as some
Pentium 4 computers.
• The data bus is only 16 bits wide.
• But the memory runs at 400MHz, and data
can be transferred on both the positive and
negative clock edges.
– That works out to a maximum transfer
rate of 1.6GB per second.
– You can also implement two “channels”
of memory, resulting in up to 3.2GB/s of
bandwidth.
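The bandwidth figures quoted for SDRAM, DDR-RAM and RDRAM all come from the same product: clock rate x transfers per clock x bytes per transfer. A sketch follows; the function name peak_bandwidth_mb is made up, and 1 MB is taken as 10^6 bytes, as in the quoted figures.

```python
def peak_bandwidth_mb(clock_mhz: float, bus_bytes: int, transfers_per_cycle: int = 1) -> float:
    """Peak transfer rate in MB/s (1 MB = 10^6 bytes)."""
    return clock_mhz * transfers_per_cycle * bus_bytes


print(peak_bandwidth_mb(100, 8))        # PC100 SDRAM: 64-bit bus, one transfer per clock
print(peak_bandwidth_mb(133, 8))        # PC133 SDRAM
print(peak_bandwidth_mb(100, 8, 2))     # DDR on a 100 MHz bus (PC1600)
print(peak_bandwidth_mb(400, 2, 2))     # RDRAM: 16-bit bus, both clock edges
print(2 * peak_bandwidth_mb(400, 2, 2)) # two RDRAM channels
```

These are peak rates only; they say nothing about latency.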
Dynamic vs. static memory
• In practice, dynamic RAM is used for a computer’s main memory, since
it’s cheap and you can pack a lot of storage into a small space.
– These days you can buy 256MB of memory for as little as $60.
– You can also load a system with 1.5GB or more of memory.
• The disadvantage of dynamic RAM is its speed.
– Transfer rates are 800MHz at best, which can be much slower than
the processor itself.
– You also have to consider latency, or the time it takes data to travel
from RAM to the processor.
• Real systems augment dynamic memory with small but fast sections of
static memory called caches.
– Typical processor caches range in size from 128KB to 320KB.
– That’s small compared to a 128MB main memory, but it’s enough to
significantly increase a computer’s overall speed.
Read-only memory
[Figure: block diagram of a 2^k x n ROM with a k-bit ADRS input, a CS input, and an n-bit OUT output]
Memories and functions
• ROMs are actually combinational devices, not
sequential ones!
– You can’t store arbitrary data into a ROM,
so the same address will always contain the
same data.
– You can think of a ROM as a combinational
circuit that takes an address as input, and
produces some data as the output.
• A ROM table is basically just a truth table.
– The table shows what data is stored at each
ROM address.
– You can generate that data combinationally,
using the address as the input.
Decoders
• We can already convert truth tables to circuits easily, with decoders.
• For example, you can think of this old circuit as a memory that “stores”
the sum and carry outputs from the truth table on the right.
ROM setup
• ROMs are based on this decoder implementation of functions.
– A blank ROM just provides a decoder and several OR gates.
– The connections between the decoder and the OR gates are
“programmable,” so different functions can be implemented.
• To program a ROM, you just make the desired connections between the
decoder outputs and the OR gate inputs.
ROM example
• Here are three functions, V2V1V0, implemented with an 8 x 3 ROM.
• Blue crosses (X) indicate connections between decoder outputs and OR
gates. Otherwise there is no connection.
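[Figure: 8 x 3 ROM with address inputs A2, A1, A0; crosses mark the connections between decoder outputs and OR gates]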
The same example again
• Here is an alternative presentation of the same 8 x 3 ROM, using
“abbreviated” OR gates to make the diagram neater.
[Figure: the same 8 x 3 ROM with inputs A2, A1, A0 and outputs V2, V1, V0]
V2 = Σm(1,2,3,4)
V1 = Σm(2,6,7)
V0 = Σm(4,6,7)
Why is this a “memory”?
• This combinational circuit can be considered a read-only memory.
– It stores eight words of data, each consisting of three bits.
– The decoder inputs form an address, which refers to one of the
eight available words.
– So every input combination corresponds to an address, which is
“read” to produce a 3-bit data output.
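The 8 x 3 ROM from the example can be written as a lookup table built from the three minterm lists, V2 = Σm(1,2,3,4), V1 = Σm(2,6,7), V0 = Σm(4,6,7). This is a sketch with made-up names (build_rom, read_rom); each stored word packs V2 V1 V0 with V2 as the most significant bit.

```python
MINTERMS = {"V2": [1, 2, 3, 4], "V1": [2, 6, 7], "V0": [4, 6, 7]}


def build_rom(minterms: dict, outputs=("V2", "V1", "V0"), words: int = 8) -> list:
    """One word per address; a bit is 1 when the address is a minterm of that output."""
    rom = []
    for address in range(words):
        word = 0
        for name in outputs:
            word = (word << 1) | (1 if address in minterms[name] else 0)
        rom.append(word)
    return rom


ROM = build_rom(MINTERMS)


def read_rom(a2: int, a1: int, a0: int) -> int:
    """Reading is just indexing: the inputs A2 A1 A0 form the address."""
    return ROM[(a2 << 2) | (a1 << 1) | a0]


for address, word in enumerate(ROM):
    print(f"{address:03b} -> {word:03b}")
```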
ROMs vs. RAMs
• There are some important differences between ROM and RAM.
– ROMs are “non-volatile”—data is preserved even without power. On
the other hand, RAM contents disappear once power is lost.
– ROMs require special (and slower) techniques for writing, so they’re
considered to be “read-only” devices.
• Some newer types of ROMs do allow for easier writing, although the
speeds still don’t compare with regular RAMs.
– MP3 players, digital cameras and other toys use CompactFlash,
Secure Digital, or MemoryStick cards for non-volatile storage.
– Many devices allow you to upgrade programs stored in “flash ROM.”
Programmable logic arrays
• A ROM is potentially inefficient because it uses a decoder, which
generates all possible minterms. No circuit minimization is done.
• Using a ROM to implement an n-input function requires:
– An n-to-2^n decoder, with n inverters and 2^n n-input AND gates.
– An OR gate with up to 2^n inputs.
– The number of gates roughly doubles for each additional ROM input.
• A programmable logic array, or PLA, makes the decoder part of the
ROM “programmable” too. Instead of generating all minterms, you can
choose which products (not necessarily minterms) to generate.
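The growth in ROM hardware can be tabulated from the counts given above. The sketch below just evaluates those expressions; the function name rom_decoder_cost is made up for illustration.

```python
def rom_decoder_cost(n: int) -> dict:
    """Hardware for an n-input ROM decoder, using the counts listed above."""
    return {
        "inverters": n,
        "and_gates": 2 ** n,       # one n-input AND gate per minterm
        "and_gate_inputs": n,
        "max_or_inputs": 2 ** n,   # each output OR gate may see every minterm
    }


for n in range(2, 7):
    cost = rom_decoder_cost(n)
    print(n, cost["and_gates"], cost["max_or_inputs"])
```

Each extra input doubles the AND-gate count, which is the inefficiency a PLA avoids by generating only the product terms that are needed.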
PLA
PROM
Programmable read only memory
• Non volatile.
• Can be programmed - written into - only once.
• Programming is done electrically and can be done after manufacturing.
• Special equipment is needed for the programming process.
– Uses fuses instead of diodes.
• Fuses that need to be removed are “vaporized” during the
programming process using a high voltage pulse (10 – 30 V).
EPROM
• Non volatile.
• More expensive than PROM.
EEPROM
• Non volatile.
• Updatable in place.
• More expensive and less dense than EPROM.
Flash Memory
Increasing the capacity of the memory
• Procedure:
– Find the capacity of the memory that can be used by the
microprocessor
– Find the capacity and the characteristics of the available memories
– List these memory chips in a table, in ascending order
– Find the range of addresses for each chip (beginning address and
final address)
– Find the expression of CS
– Connect with the microprocessor
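The address-range step of this procedure can be sketched as follows. Everything specific here is an assumption for illustration: the chip names and sizes are hypothetical, the chips are sorted by size and packed contiguously from address 0, and the microprocessor is assumed to have 16 address lines.

```python
def build_address_map(chips: list, cpu_address_bits: int = 16) -> list:
    """Assign each chip a (name, first address, last address) range.

    chips: list of (name, number of locations) pairs.
    """
    limit = 2 ** cpu_address_bits          # capacity the microprocessor can address
    address_map = []
    start = 0
    for name, size in sorted(chips, key=lambda chip: chip[1]):   # ascending by size
        end = start + size - 1
        if end >= limit:
            raise ValueError("chips exceed the addressable space")
        address_map.append((name, start, end))
        start = end + 1
    return address_map


# Hypothetical chips, not taken from the slides.
chips = [("RAM_8K", 8 * 1024), ("ROM_2K", 2 * 1024), ("RAM_4K", 4 * 1024)]
for name, first, last in build_address_map(chips):
    print(f"{name}: {first:04X} to {last:04X}")
```

The CS expression for each chip then follows from the address bits that stay constant across its range. Real designs often align each chip to a multiple of its own size so that this expression stays simple; the sketch does not enforce that.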
…Example: