
Advanced Computer Architecture I

Virtual Memory
Looking Ahead: Multicore Cache
Hierarchies

• Shared vs Private L2
• Shared: data distributed across all cores' L2 (capacity)
• Private: data local to L1 (locality + no interference)
• Uniform vs Non-uniform
• For shared, does any hit take the same amount of time?
• Centralized vs Distributed?
• Centralized physical structure, or on-chip network?
• Maintain Coherence? (all copies of one cache line are the same)
• Maintain Consistency? (semantics of loads and stores with shared memory)
17
Today: Virtual Memory
• The operating system (OS)
• A super-application (system software)
• Hardware support for an OS
• Virtual memory
• Page tables and address translation
• TLBs and memory hierarchy issues
[Figure: apps running on system software, on top of Mem/CPU/I/O hardware]

18
A Computer System: Hardware
• CPUs and memories
• Connected by memory bus
• I/O peripherals: storage, input, display, network, …
• With separate or built-in DMA
• Connected by system bus (which is connected to memory bus)

[Figure: CPUs/caches and memory on the memory bus; a bridge connects it to the system (I/O) bus, which hosts DMA engines, an I/O controller, keyboard, display, NIC, and disk]

19
A Computer System: + App Software
• Application software: computer must do something
• Pain to just use hardware directly – how to share?

[Figure: the same hardware diagram, with application software layered on top]
20
A Computer System: + OS
• Operating System (OS): virtualizes hardware for apps
• Abstraction: provides services (e.g., threads, files, etc.)
+ Simplifies app programming model, raw hardware is nasty
• Isolation: gives each app illusion of private CPU, memory, I/O
+ Simplifies app programming model
+ Increases hardware resource utilization
[Figure: multiple applications running on the OS, which runs on the hardware diagram from before]
21
Operating System (OS) and User Apps
• Sane system development requires a split
• Hardware itself facilitates/enforces this split

• Operating System (OS): a super-privileged process


• Manages hardware resource allocation/revocation for all processes
• Has direct access to resource allocation features
• Aware of many low level hardware details
• Aware of other processes
• Talks directly to input/output devices (device driver software)

• User-level apps: ignorance is bliss


• Unaware of most nasty hardware details
• Unaware of other apps (and OS)
• Explicitly denied access to resource allocation features

22
System Calls
• Controlled transfers to/from OS

• System Call: a user-level app “function call” to OS


• Leave description of what you want done in registers
• SYSCALL instruction (also called TRAP or INT)
• Entry point into the OS for privileged operations
• Basic operation
• Processor jumps to requested handler
• Sets privileged mode
• OS code performs operation
• OS does a “return from system call”
• Unsets privileged mode
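A minimal sketch of the same request made two ways on a Linux/glibc system (the file descriptor, message, and use of SYS_write are illustrative assumptions): write() is the usual libc wrapper, while syscall() issues the trap with an explicit system call number.

#include <unistd.h>        /* write(), syscall() */
#include <sys/syscall.h>   /* SYS_write */

int main(void) {
    const char msg[] = "hello from user level\n";

    /* Usual path: the libc wrapper loads the arguments and executes the
       SYSCALL/TRAP instruction for you. */
    write(1, msg, sizeof msg - 1);

    /* Same request made explicitly: leave the description of what you want
       done (call number + arguments) and trap into the OS. */
    syscall(SYS_write, 1, msg, sizeof msg - 1);
    return 0;
}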

23
Interrupts – OS wants your attention
• Exceptions: synchronous, generated by running app
• E.g., illegal insn, divide by zero, etc.
• Interrupts: asynchronous events generated externally
• E.g., timer, I/O request/reply, etc.
• E.g., timer: programmable on-chip interrupt
• Initialize with some number of micro-seconds
• Timer counts down and interrupts when reaches zero
• “Interrupt” handling: same mechanism for both
• “Interrupts” are on-chip signals/bits
• Either internal (e.g., timer, exceptions) or from I/O devices
• Processor continuously monitors interrupt status, when one is high …
• Hardware jumps to some preset address in OS code (interrupt vector)
• Like an asynchronous, non-programmatic SYSCALL

24
Virtualizing Processors
• How do multiple apps (and OS) share the processors?
• Goal: applications think there are an infinite # of processors

• Solution: time-share the resource


• Trigger a context switch at a regular interval (~5ms)
• Pre-emptive: app doesn’t yield CPU, OS forcibly takes it
+ Stops greedy apps from starving others
• Architected state: PC, registers
• Save and restore them on context switches
• Memory state?
• Non-architected state: caches, predictor tables, etc.
• Ignore or flush
• OS responsible to handle context switching
• Hardware support is just a timer interrupt
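A sketch of what the timer-interrupt handler does with architected state; the structure layout, register count, and round-robin policy are assumptions for illustration, not any particular OS's code.

#include <string.h>

struct arch_state { unsigned long pc; unsigned long regs[32]; };  /* hypothetical layout */

struct arch_state hw;        /* stands in for the real CPU registers           */
struct arch_state proc[4];   /* saved architected state, one slot per process  */
int cur = 0;

/* Called from the timer interrupt (pre-emptive, ~5ms interval). */
void on_timer_interrupt(void) {
    memcpy(&proc[cur], &hw, sizeof hw);   /* save PC + registers                  */
    cur = (cur + 1) % 4;                  /* pick the next process (round robin)  */
    memcpy(&hw, &proc[cur], sizeof hw);   /* restore its architected state        */
    /* non-architected state (caches, predictor tables) is ignored or flushed    */
}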

25
Virtualizing Main Memory
• How do multiple apps (and the OS) share main memory?
• Goal: each application thinks it has infinite memory
• Biggest memory we have: Disk (so slooooow)
• Faster memory DRAM (so expensive!)
• One app may want more memory than is in main memory
• App’s insn/data footprint > DRAM
• Requires main memory to act like a cache
• With disk as next level in memory hierarchy (slow)
• Write-back, write-allocate, large blocks or “pages”
• Solution:
• Part #1: treat memory as a “cache”
• Store the overflowed blocks in “swap” space on disk
• Part #2: add a level of indirection (address translation)

27
Virtual Memory
• Programs use virtual addresses (VA)
• 0…2^N–1
• VA size also referred to as machine size
• E.g., Pentium4 is 32-bit, SPARC is 64-bit
• Memory uses physical addresses (PA)
• 0…2^M–1 (typically M<N, especially if N=64)
• 2^M is most physical memory the machine supports
• VA→PA at page granularity (VP→PP)
• Small granularity too expensive!
• By "system"
• Mapping need not preserve contiguity
• VP need not be mapped to any PP
• Unmapped VPs live on disk (swap)
[Figure: a program's code/heap/stack virtual pages mapped to main memory physical pages or to disk]
28
How to manage our memory as a cache?
• 1st/2nd levels: SRAM caches (I$, D$, L2)
• Set associative / hardware managed
• 3rd level: main memory (DRAM)
• Made of DRAM
• Managed in software (how?)
• Block size? 4KB+
• Associativity? Fully associative
• Replacement policy? In software
• Write-back vs. write-through? Write-back
• Write-allocate vs. write-non-allocate? Write-allocate
• 4th level: disk (swap space)
• SSD/HDD/etc. -- SLOW!
[Figure: CPU → I$/D$ → L2 → main memory → disk]
30
Low %miss At All Costs
• For a memory component: thit vs. %miss tradeoff

• Upper components (I$, D$) emphasize low thit


• Frequent access → minimal thit important
• tmiss is not bad → minimal %miss less important
• Low capacity/associativity, write-back or write-thru

• Moving down (L2) emphasis turns to %miss


• Infrequent access → minimal thit less important
• tmiss is bad → minimal %miss important
• High capacity/associativity, write-back

• For memory, emphasis entirely on %miss


• tmiss is disk access time (measured in ms, not ns)

31
Memory Organization Parameters
Parameter I$/D$ L2 Main Memory
thit <1ns 2-10ns 100ns
tmiss 2-10ns 100ns 10ms (10M ns)
Capacity 8–64KB 256KB–20MB Up to 1TB
Block size 16–64B 64–256B 4+KB
Associativity 1–8 4–16 Full
Replacement Policy LRU/NMRU LRU/NMRU working set
Write-through? Either No No
Write-allocate? Either Yes Yes
Victim buffer? Yes Maybe No
Prefetching? Either Yes Software

32
Uses of Virtual Memory
• Virtual memory is quite a useful feature
• Automatic, transparent memory management just one use
• “Functionality problems are solved by adding levels of indirection”
• Example: program isolation and multiprogramming
• Each process thinks it has 2N bytes of address space
• Each thinks its stack starts at address 0xFFFFFFFF
• System maps VPs from different processes to different PPs
+ Prevents processes from reading/writing each other’s memory

[Figure: virtual pages of Program1 and Program2 mapped to disjoint physical pages]

33
More Uses of Virtual Memory
• Protection
• Piggy-back mechanism to implement page-level protection
• Map virtual page to physical page
… and to Read/Write/Execute protection bits in page table
• Attempt an illegal access, to execute data, or to write read-only data?
• Exception → OS terminates program
• Inter-process communication
• Map virtual pages in different processes to same physical page
• Share files via the UNIX mmap() call
[Figure: OS, App1, and App2 virtual pages mapped to the same physical page]

• Don’t duplicate shared libraries!


34
Address Translation
virtual address[31:0]  = VPN[31:16] | POFS[15:0]
                           translate    don't touch
physical address[27:0] = PPN[27:16] | POFS[15:0]

• VA→ PA mapping called address translation


• Split VA into virtual page number (VPN) and page offset (POFS)
• Translate VPN into physical page number (PPN)
• POFS is not translated
• VA→ PA = [VPN, POFS] → [PPN, POFS]

• Example above
• 64KB pages → 16-bit POFS
• 32-bit machine → 32-bit VA → 16-bit VPN
• Maximum 256MB memory → 28-bit PA → 12-bit PPN
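A small sketch of the bit manipulation for the 64KB-page example above; the page-table lookup that produces the PPN is shown on the next slide, so it is simply passed in here.

#include <stdint.h>

#define PAGE_BITS 16   /* 64KB pages -> 16-bit POFS */

uint32_t va_to_pa(uint32_t va, uint32_t ppn /* result of the VPN lookup */) {
    uint32_t pofs = va & ((1u << PAGE_BITS) - 1);  /* "don't touch"        */
    uint32_t vpn  = va >> PAGE_BITS;               /* what gets translated */
    (void)vpn;                                     /* lookup not shown     */
    return (ppn << PAGE_BITS) | pofs;              /* [PPN, POFS]          */
}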

35
Mechanics of Address Translation
• How are addresses translated?
• In software (now) but with hardware acceleration (a little later)
• Each process allocated a page table (PT)
• Managed by the operating system

• Maps VPs to PPs or to disk (swap) addresses


• VP entries empty if page never referenced

[Figure: the page table (PT), indexed by VPN; each entry maps a VP to a PP in main memory or to a block on disk (swap)]

struct PTE {
    union { int ppn; int disk_block; };
    bool is_valid, is_dirty;
};
struct PTE pt[NUM_VIRTUAL_PAGES];

int translate(int vpn) {
    if (pt[vpn].is_valid)
        return pt[vpn].ppn;
    /* otherwise: page is on disk or unmapped (handled by the OS) */
}

36
Page Table Size
• How big is a page table on the following machine?
• 4B page table entries (PTEs)
• 32-bit machine
• 4KB pages

• 32-bit machine → 32-bit VA → 4GB virtual memory


• 4GB virtual memory / 4KB page size → 1M VPs
• 1M VPs * 4B PTE → 4MB (per process)

• How big would the page table be with 64KB pages?


• 256KB
• How big would it be for a 64-bit machine?
• 18,014,398,509,481,984 bytes, i.e. 16PB (oops)
• Page tables can get big
• There are ways of making them smaller
• PA = f(VA): many different data structures possible
• Want a compact representation that has fast lookups!
37
Here’s an interesting datastructure!
• What is a "trie" data structure?
• Also called a "prefix tree"
• What is it used for?
• What properties does it have?
• How is it different from a binary tree?
• How is it different than a hash table?

Better worst case # accesses compared to hash table!

[Figure: a trie rooted at "root"; each node branches on the next symbol ("a"–"d") of the key]
38
Multi-Level Page Table
• One way: multi-level page tables
• Tree of page tables
• Lowest-level tables hold PTEs
• Upper-level tables hold pointers to lower-level tables
• Different parts of VPN used to index different levels

• Example: two-level page table


• Compute number of pages needed for lowest-level (PTEs)
• 4KB pages / 4B PTEs → 1K PTEs/page
• 1M PTEs / (1K PTEs/page) → 1K pages
• Compute number of pages needed for upper-level (pointers)
• 1K lowest-level pages → 1K pointers
• 1K pointers * 4B each (32-bit) → 4KB → 1 upper-level page

39
Multi-Level Page Table
• 20-bit VPN
• Upper 10 bits (VPN[19:10]) index the 1st-level table
• Lower 10 bits (VPN[9:0]) index a 2nd-level table
[Figure: pt "root" holds the 1st-level pointers; each pointer selects a page of 2nd-level PTEs]

struct PTE {
    union { int ppn; int disk_block; };
    bool is_valid, is_dirty;
};

struct L2PT {
    struct PTE ptes[1024];
};
struct L2PT *pt[1024];

int translate(int vpn) {
    struct L2PT *l2pt = pt[vpn >> 10];
    if (l2pt && l2pt->ptes[vpn & 1023].is_valid)
        return l2pt->ptes[vpn & 1023].ppn;
    /* otherwise: 2nd-level table absent or page not resident */
}
40
Multi-Level Page Table (PT)
• Have we saved any space?
• Isn’t total size of 2nd level tables same as single-level
table (i.e., 4MB)?
• Yes, but …

• Large virtual address regions unused


• Corresponding 2nd-level tables need not exist
• Corresponding 1st-level pointers are null

• Example: 2MB code, 64KB stack, 16MB heap


• Each 2nd-level table maps 4MB of virtual addresses
• 1 for code, 1 for stack, 4 for heap, (+1 1st-level)
• 7 total pages = 28KB (much less than 4MB)

41
Translation in X86

[Figure: the application issues a virtual address (VA); hardware walks the four-level x86 page table rooted at CR3 (set by the OS) to produce the physical address (PA)]
Courtesy Jayneel Gandhi (Agile Paging)
At most memory accesses per translation = 4
42
Alternative: Inverted/Hashed Page Tables
[Figure: a hash of (VPN, PID) plus the table base gives the address of an inverted page table entry (IPTE); each entry holds a VPN, PID, and PTE, one entry per physical page]

• Size of the inverted page table only needs to be proportional to the size of the physical memory
• Each VPN can only be mapped to a small set of entries according to a hash function
• To translate a VPN, check all allowed table entries for a matching VPN and PID
• How many memory lookups per translation?

43
Address Translation Mechanics
• Three remaining questions
• Who performs translation?
• When do you translate?
• Where does page table reside?

• Conceptual view:
• Who: OS
• Disallow program from modifying its own page table entries
• When: Translate virtual address before every cache access
• Walk the page table for every load/store/instruction-fetch
• Where: Memory

• Actual approach:
• Cache translations in a “translation cache” to avoid repeated lookup

44
Translation Lookaside Buffer
• Functionality problem? Add indirection
• Performance problem? Add a cache
• Address translation too slow?
• Cache translations in a translation lookaside buffer (TLB)
• Small cache: 16–512 entries
• Small TLBs often fully associative (<64)
+ Exploits temporal locality in the page table (PT)
• What if an entry isn't found in the TLB?
• Invoke TLB miss handler
[Figure: CPU presents a VA to the TLB; on a hit, the PA goes to I$/D$, L2, and main memory; each TLB entry is a VPN "tag" paired with a PPN "data" field]
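A minimal sketch of a fully associative TLB lookup; the entry count and the linear search are illustrative (real hardware compares all tags at once).

#include <stdint.h>
#include <stdbool.h>

#define TLB_ENTRIES 64
struct tlb_entry { uint32_t vpn; uint32_t ppn; bool valid; };
struct tlb_entry tlb[TLB_ENTRIES];

/* Hit: return the PPN. Miss: return -1 and invoke the TLB miss handler,
   which walks the page table and refills an entry. */
int tlb_lookup(uint32_t vpn) {
    for (int i = 0; i < TLB_ENTRIES; i++)
        if (tlb[i].valid && tlb[i].vpn == vpn)
            return (int)tlb[i].ppn;
    return -1;
}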

45
TLB Misses and Miss Handling
• TLB miss: requested PTE not in TLB, search page table
• Software routine, e.g., Alpha, SPARC, MIPS
• Special instructions for accessing TLB directly
• Latency: one or two memory accesses + trap
• Hardware finite state machine (FSM), e.g., x86
• “Page table walker”
• Store page table root in hardware register
• Page table root and table pointers are physical addresses
+ Latency: saves cost of OS call
• In both cases, reads use the standard cache hierarchy
+ Allows caches to help speed up search of the page table

• Nested TLB miss: miss handler itself misses in the TLB


• Solution #1: Allow recursive TLB misses (very tricky)
• Solution #2: Lock TLB entries for page table into TLB
• Solution #3: Avoid the problem by using physical addresses in the page table
46
TLB Performance
• TLB Reach = # TLB entries * page size = 64 * 4KB = 256KB << L2 cache size
• Solution #1: Big pages (e.g., 4MB)
• TLB Reach = 256MB, but internal fragmentation
• How to support both big and small pages?
• Solution #2: Two-level TLB
• L1: 64-128 entries, L2: 512-2048 entries
• Solution #3: Software TLB (aka TSB)
• In-memory TLB: 32K entries (or more)
• Low associativity (e.g., 2-way), longer hit time
• Much faster than page table access

47
Page Faults
• Page fault: PTE not in page table
• Page is simply not in memory
• Starts out as a TLB miss, detected by OS handler/hardware FSM

• OS routine
• Choose a physical page to replace
• “Working set”: more refined software version of LRU
• Tries to see which pages are actively being used
• Balances needs of all current running applications
• If dirty, write to disk
• Read missing page from disk
• Takes so long (~10ms), OS schedules another task
• Treat like a normal TLB miss from here

48
Physical (Address) Caches
• Memory hierarchy so far: physical caches
• Indexed and tagged by PAs
• Physically Indexed (PI), Physically Tagged (PT)
• Translate to PA at the outset
+ Cached inter-process communication works
• Single copy indexed by PA
– Slow: adds at least one cycle to thit
[Figure: CPU → TLB (VA→PA) → I$/D$ → L2 → main memory, all indexed and tagged by PA]

49
Virtual Address Caches (VI/VT)
• Alternative: virtual caches
• Indexed and tagged by VAs (VI and VT)
• Translate to PAs only to access L2
+ Fast: avoids translation latency in common case
– Problem: the same VA in different processes names distinct physical locations (with different values)
  (homonyms: same address, different meaning)
• What to do on process switches?
• Flush caches? Slow
• Add process IDs to cache tags
• Does inter-process communication work?
• Synonyms: multiple VAs map to same PA
• Can't allow same PA in the cache twice
• Can be handled, but very complicated (bonus)
[Figure: CPU → I$/D$ indexed and tagged by VA → TLB → L2 and main memory accessed by PA]

50
Parallel TLB/Cache Access (VI/PT)

• Compromise: access TLB in parallel
• In small caches, the index bits of the VA and PA are the same
• VI == PI
• Use the VA to index the cache
• Tagged by PAs
• Cache access and address translation in parallel
+ No context-switching/aliasing problems
+ Fast: no additional thit cycles
• Common organization in processors today
[Figure: CPU sends the VA to the TLB and to the I$/D$ index in parallel; the TLB's PA is compared against the cache tags; misses go to L2 and main memory by PA]

Cache view:    tag [31:12] | index [11:5] | off [4:0]
Virtual view:  VPN [31:16] | page offset [15:0]
Physical view: PPN [31:16] | page offset [15:0]

51
Parallel Cache/TLB Access

Fully associative TLB


==
• Parallel cache/TLB … ==
• If address translation
doesn’t change index ==
• VPN/index don’t overlap

TLB ==

TLB hit/miss— — cache


cache hit/miss
virtual tag [31:12] index [11:5] [4:0]
address VPN [31:16] page offset [15:0]

data
52
Cache Size And Page Size

[Figure: the cache index bits must lie within the page offset; here tag [31:12] | index [11:5] | [4:0] against VPN [31:16] | page offset [15:0]]

• Relationship between page size and L1 cache size


• Forced by non-overlap between VPN and IDX portions of VA
• Which is required for TLB access
• Rule: (cache size) / (associativity) ≤ page size
• Result: associativity increases allowable cache sizes
• Example: Pentium 4, 4KB pages, 8KB, 2-way SA L1 data cache
• If cache is too big, same issues as virtually-indexed caches
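The rule above for the Pentium 4 example, written out as a check (numbers taken from the slide):

#include <assert.h>

int main(void) {
    unsigned page_size  = 4 * 1024;   /* 4KB pages              */
    unsigned cache_size = 8 * 1024;   /* 8KB L1 data cache      */
    unsigned assoc      = 2;          /* 2-way set associative  */
    /* index bits fit inside the page offset only if this holds: */
    assert(cache_size / assoc <= page_size);   /* 4KB <= 4KB: OK */
    return 0;
}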

53
TLB Organization
• Like caches: TLBs also have ABCs
• Capacity
• Associativity (At least 4-way associative, fully-associative common)
• What does it mean for a TLB to have a block size of two?
• Two consecutive VPs share a single tag

• Like caches: there can be L2 TLBs


• Why? Think about this …

• Rule of thumb: TLB should “cover” L2 contents (maybe LLC)


• In other words: (#PTEs in TLB) * page size ≥ L2 size
• Why? Think about relative miss latency in each …

54
What addresses in the pipeline?
• What type of address for the LSQ?
• (synonyms within a process -> same physical address)
[Figure: load queue (LQ) and store queue (SQ), each with head and tail pointers; every load's address is compared (==) against older stores, with age logic deciding forwarding, store position, and flushes; addresses come from the D$/TLB]
55
Intel Core i7 Memory System
[Figure: processor package with 4 cores. Each core: instruction fetch, MMU (address translation), and a hardware page table walker;
 L1 d-cache: 32 KB, 8-way; L1 i-cache: 32 KB, 8-way; L1 d-TLB: 64 entries, 4-way; L1 i-TLB: 128 entries, 4-way;
 private L2 unified cache: 256 KB, 8-way; private L2 unified TLB: 512 entries, 4-way.
 Shared L3 unified cache: 8 MB, 16-way (shared by all cores); QuickPath interconnect: 4 links @ 25.6 GB/s each, to other cores and the I/O bridge;
 DDR3 memory controller: 3 x 64 bit @ 10.66 GB/s, 32 GB/s total (shared by all cores), to main memory]
56
End-to-end Core i7 Address Translation
[Figure: the CPU issues a 48-bit virtual address (36-bit VPN + 12-bit VPO). The VPN is split into a TLB tag (TLBT, 32 bits) and TLB index (TLBI, 4 bits) for the L1 TLB (16 sets, 4 entries/set). On a TLB miss, hardware walks the four-level page table rooted at CR3 (VPN1–VPN4, 9 bits each, selecting successive PTEs). The resulting physical address (PA) is 52 bits (40-bit PPN + 12-bit PPO), split into cache tag/index/offset (CT 40, CI 6, CO 6 bits) for the L1 d-cache (64 sets, 8 lines/set); L1 misses go to L2, L3, and main memory, and the 32/64-bit result returns to the CPU]
57
Alternative to VM: base/bound registers
• Each process is given a non-overlapping, contiguous
physical memory region

• When a process is swapped in, the OS sets base to the start of the process's memory region and bound to the end of the region
• On memory references, HW does translation & a protection check
• PA = EA + base
• allowed provided (PA < bound), else a violation
• base and bound are privileged control registers
[Figure: base and bound delimit the active process's region in physical memory; other processes occupy other regions]
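A sketch of the per-reference check; the register names are placeholders (in a real machine base/bound are privileged control registers, not C globals).

#include <stdint.h>

uint32_t base, bound;   /* set by the OS when the process is swapped in */

/* Hardware translation + protection check on every memory reference. */
uint32_t bb_translate(uint32_t ea, int *violation) {
    uint32_t pa = ea + base;
    *violation = !(pa < bound);   /* outside the process's region -> trap */
    return pa;
}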

58
Also Segmented Address Space
• segment == a base and bound pair
• segmented addressing gives each process multiple segments
• initially, separate code and data segments
  - 2 sets of base-and-bound registers, for instruction and data fetch
  - allowed sharing code segments
• became more and more elaborate: code, data, stack, etc.
• also (ab)used as a way for an ISA with a small EA space to address a larger physical memory space
• segment tables must be privileged data structures and private/unique to each process
[Figure: SEG # selects a segment-table entry; (1) PA = EA + base, (2) PA is checked against bound (okay?)]

Cheaper translation, but fragmentation?


59
Virtual Memory
• Virtual memory ubiquitous today
• Certainly in general-purpose (in a computer) processors
• But even many embedded (in non-computer) processors support it

• Several forms of virtual memory


• Paging (aka flat memory): equal sized translation blocks
• Most systems do this
• Segmentation: variable sized (overlapping?) translation blocks
• x86 used segmentation (rather than wider addresses) to break the 16-bit (64KB) limit
• Memory allocation not fun … (fragmentation?)
• Paged segments: too many indirections …
• Today: X86-64 does not use segmentation in 64-bit mode
• forces all segment registers (CS, SS, DS, ES) to a flat 0–2^64 range

60
Memory Protection and Isolation
• Important role of virtual memory today

• Virtual memory protects applications from one another


• OS uses indirection to isolate applications
• One buggy program should not corrupt the OS or other programs
+ Comes “for free” with translation
– However, the protection is limited

– What about protection from …


• Viruses and worms?
• Stack smashing
• Malicious/buggy services?
• Other applications with which you want to communicate

61
Stack Smashing via Buffer Overflow
int i = 0;
char buf[128];
while ((buf[i++] = getchar()) != '\n') ;
return;

• Stack smashing via buffer overflow
• Oldest trick in the virus book
• Exploits stack frame layout and …
• Sloppy code: length-unchecked copy to stack buffer
• "Attack string": code (128B) + &buf[0] (4B)
• Caller return address (ra) replaced with pointer to attack code, &buf[0]
• Caller return …
• … executes attack code at caller's privilege level
[Figure: stack frame with buf[128] just below the saved return address (ra); the overflow fills buf with attack code and overwrites ra with &buf[0]]
62
Page-Level Protection
struct PTE {
    union { int ppn; int disk_block; };
    int is_valid, is_dirty, permissions;
};

• Page-level protection
• Piggy-backs on translation infrastructure
• Each PTE associated with permission bits: Read, Write, eXecute
• Read/execute (RX): for code
• Read (R): read-only data
• Read/write (RW): read-write data
• TLB access traps on illegal operations (e.g., write to RX page)
• To defeat stack-smashing? Set stack permissions to RW
• Will trap if you try to execute &buf[0]

– Unfortunately, hackers have many other tricks (return-oriented programming)
63
Bonus

64
(not really VM, but lets talk about it)

• Criticality Aware Cache Hierarchies


• Problem?
• Insight?
• Approach?
• Benefits?

65
66
Background: μArch Dependence Graphs
[Fields, ISCA 2001]
• Represent execution as a graph: one node per pipeline event per instruction (Fetch F, Dispatch D, Execute E, Complete P, Commit C)
• Edges capture in-order dependencies and latencies: fetch-dispatch latency, dispatch-before (in-order fetch/dispatch/commit), execution latency, complete-before
[Figure: F→D→E→P→C chains for Inst. 1–4, with 0-weight in-order edges linking the F, D, and C nodes of consecutive instructions]
67
Background: μArch Dependence Graphs (cont.)
[Figure: the same F/D/E/P/C graph for the four instructions below, annotated with edge weights; highlighted edges show a branch misprediction penalty, data dependences (e.g., the load feeding the multiply), execution latencies, and resource dependences; the critical path (total execution time) is the longest weighted path]
bgz r1;  r2 = ld[r3];  r4 = r2 * 2;  r5 = r6 *
68
• Reduced complexity model with only
• D: Dispatch
• E: Execute
• C: Commit

69
Claim: Only critical L2 Hits matter

70
Intuition: Latency matters much more for L1 cache! 71
Intuition: L2 cache least sensitive to non-critical access latency 72
Criticality Aware Tiered Cache Hierarchy
(CATCH)

• Track critical load PCs


• Served from non-L1 on-die
caches

• Prefetch critical loads into


L1
• Accelerate the critical path

73
Criticality detection in hardware

(graph is backwards now …

74
TACT: Data Prefetchers

75
76
Backups

Slide History/Attribution Diagram:


UW Madison UPenn UW Madison
UCLA
Hill, Sohi, Amir Roth, Hill, Sohi, Wood,
Nowatzki
Smith, Wood Milo Martin Various Universities Sankaralingam, Sinclair
Asanovic, Falsafi, Hoe, Lipasti,
Shen, Smith, Vijaykumar
Safe and Efficient Services
• Scenario: module (application) A wants service B provides
• A doesn’t “trust” B and vice versa (e.g., B is kernel)
• How is service provided?
• Option I: conventional call in same address space
+ Can easily pass data back and forth (pass pointers)
– Untrusted module can corrupt your data
• Option II: trap or cross address space call
– Copy data across address spaces: slow, hard if data uses pointers
+ Data is not vulnerable
• Page-level protection helps somewhat, but …
• Page-level protection can be too coarse grained
• If modules share address space, both can change protections

78
Itanium Prevalidated tags
[Figure: the VA is looked up in the TLB and the I$ in parallel]
• I$ tag is a bit vector (one bit per TLB entry), not an address tag
• a cache hit requires a match against the hitting TLB location
• TLB miss → I$ miss
• TLB size = tag size (32 entries / 32 bits in Itanium 2)

79
RAM

• RAM: large storage arrays
• Basic structure
• MxN array of bits (M N-bit words); the example array is 4x2
• Bits in a word connected by a wordline
• Bits in a position connected by a bitline
• Operation
• Address decodes into M wordlines
• High wordline → word driven onto bitlines
• Bit/bitline connection → read/write
• Access latency: ~ #ports * √#bits
[Figure: 4x2 bit array; the address decoder drives wordline0–wordline3, and bitline0/bitline1 carry the data]

80
SRAM

• SRAM: static RAM
• Bits stored as cross-coupled inverters (CCI)
– Four transistors per bit
– More transistors for ports
• "Static" means
• Inverters connected to pwr/gnd
+ Bits naturally/continuously "refreshed"
• Designed for speed
[Figure: SRAM array; each cell is a pair of cross-coupled inverters selected by the address decoder and read/written over the bitlines]

81
DRAM
• DRAM: dynamic RAM
• Bits stored as capacitors
+ Single transistors as ports
+ One transistor per bit/port
• "Dynamic" means
• Capacitors not connected to pwr/gnd
– Stored charge decays over time
– Must be explicitly refreshed
• Designed for density
[Figure: DRAM array; each cell is a capacitor behind a single pass transistor, selected by the address decoder and read over the bitlines]

82
Moore’s Law
Year Capacity $/MB Access time
1980 64Kb $1500 250ns
1988 4Mb $50 120ns
1996 64Mb $10 60ns
2004 1Gb $0.5 35ns

• Commodity DRAM parameters


• 16X every 8 years is 2X every 2 years
• Not quite 2X every 18 months but still close

83
DRAM Operation I
• Read: similar to cache read
• Phase I: pre-charge bitlines to 0.5V
• Phase II: decode address, enable wordline
• Capacitor swings bitline voltage up (down)
• Sense-amplifier interprets swing as 1 (0)
– Destructive read: word bits now discharged
• Write: similar to cache write
• Phase I: decode address, enable wordline
• Phase II: enable bitlines
• High bitlines charge corresponding capacitors
– What about leakage over time?
[Figure: DRAM array with sense amps (sa) on the bitlines; a write enable drives the bitlines with the data]


84
DRAM Operation II
• Solution: add a set of D-latches (row buffer)
• Read: two steps
• Step I: read selected word into row buffer
• Step IIA: read row buffer out to pins
• Step IIB: write row buffer back to selected word
+ Solves "destructive read" problem
• Write: two steps
• Step IA: read selected word into row buffer
• Step IB: write data into row buffer
• Step II: write row buffer back to selected word
[Figure: DRAM array, sense amps, and a row of D-latches (DL) between the array and the data pins]

85
DRAM Refresh
• DRAM periodically refreshes all contents
• Loops through all words
• Reads word into row buffer
• Writes row buffer back into DRAM array
• 1–2% of DRAM time occupied by refresh
[Figure: the same array / sense-amp / row-buffer structure is reused for refresh]

86
DRAM Parameters
• DRAM parameters
• Large capacity: e.g., 64–256Mb
• Arranged as a square
+ Minimizes wire length
+ Maximizes refresh efficiency
• Narrow data interface: 1–16 bit
• Cheap packages → few bus pins
• Narrow address interface: N/2 bits
• A 16Mb DRAM has a 12-bit address bus
• How does that work?
[Figure: DRAM bit array with an address input, a row buffer, and a narrow data interface]

87
Two-Level Addressing
• Two-level addressing
• Row decoder and column muxes share the address lines ([23:12] and [11:2])
• Two strobes (RAS, CAS) signal which part of the address is currently on the bus
• Asynchronous access
• Level 1: RAS high
• Upper address bits on address bus
• Read row into row buffer
• Level 2: CAS high
• Lower address bits on address bus
• Mux row buffer onto data bus
[Figure: a 12-to-4K row decoder selects one row of a 4K x 4K bit array into the row buffer; four 1K-to-1 column muxes drive the data pins; RAS latches the row address, CAS the column address]

88
Access Latency and Cycle Time
• DRAM access much slower than SRAM
• More bits → longer wires
• Buffered access with two-level addressing
• SRAM access latency: <1ns
• DRAM access latency: 30–50ns

• DRAM cycle time also longer than access time


• Cycle time: time between start of consecutive accesses
• SRAM: cycle time = access time
• Begin second access as soon as first access finishes
• DRAM: cycle time = 2 * access time
• Why? Can’t begin new access while DRAM is refreshing row

89
DRAM Latency and Power Derivations
• Same basic form as SRAM
• Most of the equations are geometrically derived
• Same structure for decoders, wordlines, muxes
• Some differences
• Somewhat different pre-charge/sensing scheme
• Array access represents smaller part of total access
• Arrays not multi-ported

90
Building a Memory System
• How to build an efficient main memory out of standard DRAM chips?
• How many DRAM chips?
• What width/speed (data) bus to use?
• Assume a separate address bus
• Main memory interface: L2 miss blocks
• What do you want tmiss-L2 to be?
[Figure: CPU → I$/D$ → L2 → main memory → disk (swap)]

91
An Example Memory System
• Parameters
• 32-bit machine
• L2 with 32B blocks
• 4Mx16b DRAMs, 20ns access time, 40ns cycle time
• 100MHz (10ns period) data bus
• 100MHz, 32-bit address bus

• How many DRAM chips?


• How wide to make the data bus?

92
First Memory System Design
• 1 DRAM chip (4M x 2B) + 16b data bus
• Access time: 630ns (not including address)
• Cycle time: 640ns (when the DRAM is ready to handle another miss)

T (ns)   DRAM       Data bus (2B)
10       [31:30]
20       [31:30]
30       refresh    [31:30]
40       refresh
50       [29:28]
60       [29:28]
70       refresh    [29:28]
80       refresh
…        …          …
600      refresh
610      [1:0]
620      [1:0]
630      refresh    [1:0]
640      refresh

93
Second Memory System Design
• 1 DRAM chip + 4b data bus
• With one DRAM chip, we don't need a 16b bus
• Balanced system → match bandwidths
• DRAM: 2B / 40ns → 4b / 10ns
• Access time: 660ns (30ns longer, 4%)
• Cycle time: 640ns (same)
+ Much cheaper

T (ns)   DRAM       Bus (4b)
10       [31:30]
20       [31:30]
30       refresh    [31H]
40       refresh    [31L]
50       [29:28]    [30H]
60       [29:28]    [30L]
70       refresh    [29H]
80       refresh    [29L]
…        …          …
600      [1:0]      [2H]
610      [1:0]      [2L]
620      refresh    [1H]
640      refresh    [1L]
650                 [0H]
660                 [0L]

94
Third Memory System Design
• How fast can we go?
• 16 DRAM chips (4M x 2B each, chips 0–15) + 32B bus
• Stripe data across chips: byte M in chip (M/2)%16
• Access time: 30ns
• Cycle time: 40ns
– 32B bus is very expensive
– 128MB of memory isn't, but you may not want that much

T (ns)   DRAM0      DRAM1      …   DRAM15     Bus (32B)
10       [31:30]    [29:28]    …   [1:0]
20       [31:30]    [29:28]    …   [1:0]
30       refresh    refresh    …   refresh    [31:0]
40       refresh    refresh    …   refresh

95
Latency and Bandwidth
• In general, given bus parameters …
• Find smallest number of chips that minimizes cycle time
• Approach: match bandwidths

96
Fourth Memory System Design
• 2B bus, 4 DRAM chips (0–3)
• Bus b/w: 2B/10ns; DRAM b/w: 2B/40ns per chip → 4 chips match the bus
• Access time: 180ns
• Cycle time: 160ns

T (ns)   DRAM0      DRAM1      DRAM2      DRAM3      Bus (2B)
10       [31:30]    [29:28]    [27:26]    [25:24]
20       [31:30]    [29:28]    [27:26]    [25:24]
30       refresh    refresh    refresh    refresh    [31:30]
40       refresh    refresh    refresh    refresh    [29:28]
50       [23:22]    [21:20]    [19:18]    [17:16]    [27:26]
60       [23:22]    [21:20]    [19:18]    [17:16]    [25:24]
…        …          …          …          …          …
110      refresh    refresh    refresh    refresh    [15:14]
120      refresh    refresh    refresh    refresh    [13:12]
130      [7:6]      [5:4]      [3:2]      [1:0]      [11:10]
140      [7:6]      [5:4]      [3:2]      [1:0]      [9:8]
150      refresh    refresh    refresh    refresh    [7:6]
160      refresh    refresh    refresh    refresh    [5:4]
170                                                  [3:2]
180                                                  [1:0]

97
More Bandwidth From One DRAM
• EDO: extended data out
• Multiple row buffer reads/writes
• Send only column addresses

• SDRAM: synchronous DRAM


• Read/write row buffer chunks on clock edge
• No need to send column addresses at all
• DDR SDRAM: double-data rate SDRAM
• Read/write on both clock edges
• Popular these days

• RDRAM: aka RAMBUS


• Multiple row buffers, “split” transactions, other complex behaviors
• Very expensive, high end systems only

98
Memory Access and Clock Frequency
• Nominal clock frequency applies to CPU and caches
• Memory bus has its own clock, typically much slower
• DRAM has no clock (SDRAM operates on bus clock)

• Careful when doing calculations


• Clock frequency increases don’t reduce memory or bus latency
• May make misses come out faster
• At some point memory bandwidth may become a bottleneck
• Further increases in clock speed won’t help at all

99
Memory/Clock Frequency Example
• Parameters
• 1GHz CPU, base CPI = 1
• I$: 1% miss rate, 32B blocks (ignore D$, L2)
• Data bus: 100MHz, 8B (ignore address bus)
• DRAM: 10ns access, 20ns cycle, #chips to match bus bandwidth

• What are CPI and MIPS including memory latency?


• Bus: frequency = 100MHz → latency = 10ns (for 8B)
• Memory system cycle time = bus latency to transfer 32B = 40ns
• Memory system access time = 50ns (10ns DRAM access + bus)
• 1GHz clock → 50ns = 50 cycles
• CPI+memory = 1 + (0.01*50) = 1 + 0.5 = 1.5
• MIPS+memory = 1GHz / 1.5 CPI = 1000MHz / 1.5 CPI = 667

100
Memory/Clock Frequency Example
• What are CPI and MIPS if clock speed is doubled?
• Memory parameters same: 50ns access, 40ns cycle
• 2GHz clock → 50ns = 100 cycles
• CPI+memory = 1 + (0.01*100) = 1 + 1 = 2
• MIPS+memory = 2GHz / 2 CPI = 2000MHz / 2 CPI = 1000

• What is the peak MIPS if we can only change clock?


• Available bandwidth: 32B/40ns = 0.8B/ns
• Needed bandwidth: 0.01*32B/cycle = 0.32B/cycle * X cycle/ns
• Memory is a bottleneck at 0.8/0.32 cycle/ns = 2.5GHz
• No sustained speedup possible after that point
• 2.5GHz clock → 50ns = 125 cycles
• CPI+memory = 1 + (0.01*125) = 1 + 1.25 = 2.25
• MIPS+memory = 2.5GHz / 2.25 CPI = 2500MHz / 2.25 CPI = 1111
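The arithmetic from these two slides, reproduced as a quick sanity check (parameters copied from above):

#include <stdio.h>

int main(void) {
    double miss_rate = 0.01, mem_ns = 50.0, base_cpi = 1.0;
    double clk_ghz[] = {1.0, 2.0, 2.5};
    for (int i = 0; i < 3; i++) {
        double miss_cycles = mem_ns * clk_ghz[i];        /* 50ns in cycles */
        double cpi  = base_cpi + miss_rate * miss_cycles;
        double mips = clk_ghz[i] * 1000.0 / cpi;
        printf("%.1fGHz: CPI = %.2f, MIPS = %.0f\n", clk_ghz[i], cpi, mips);
    }
    return 0;   /* prints 1.5/667, 2.0/1000, 2.25/1111 */
}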

101
Digital Rights Management
• Digital rights management
• Question: how to enforce digital copyright?
• Electronically, not legally
• “Trying to make bits un-copiable is like trying to make water un-wet”

• Suppose you have some piece of copyrighted material ©…


– You can easily make a copy of ©
• But, what if © is encrypted?
• In order to use ©, you must also have the decryptor
– Can hack decryptor to spit out unencrypted ©
– Or hack OS to look at decryptor’s physical memory

102
Aside: Public-Key Cryptography
• Public-key cryptography
• Asymmetric: pair of keys
• Kpub : used for encryption, published
• Kpriv : used for decryption, secret
• acrypt(acrypt(M, Kpub), Kpriv) = acrypt(acrypt(M, Kpriv), Kpub) = M
• Well-known example: RSA
• Two uses
• Encryption
• Someone sends you encrypted message M: C = acrypt(M, Kpub)
• You are the only one that can decrypt it
• Authentication/Digital Signature
• You send someone a chosen plaintext M
• They “sign” it by sending back DS = acrypt(M, Kpriv)
• If acrypt(DS, Kpub) = M, then they are who Kpub says they are

103
Research: XOM
[Figure: the vendor encrypts © with the chip's public key Kpub; the encrypted © passes through memory and the L2 fill/spill path and is decrypted on-chip with Kpriv, which never leaves the CPU]
• eXecute Only Memory (XOM)
• Stanford research project [Lie+, ASPLOS’00]
• Two registers: Kpriv, Kpub different for every chip (Flash program)
• Software can get at Kpub, but Kpriv is hardware’s secret
• Hardware encryption/decryption engine on L2 fill/spill path
• Vendor sells you acrypt(©, Kpub)
+ Even if someone copies it, they won’t have Kpriv to decrypt it
• Plaintext © only exists on-chip
+ Even OS can never see plaintext ©

104
XOM: Not Quite
• Performance consideration
• Asymmetric en-/de-cryption is slow, symmetric (one key) faster
• E.g., DES, AES (Rijndael)
– Problem: can’t publish encryption key without also...
• XOM Take II
• Vendor chooses random symmetric key Ksym
• Sells you scrypt(©, Ksym) + acrypt(Ksym, Kpub)
• Two-stage decryption
• Decrypt Ksym using Kpriv : slow (but for one piece of data)
• Decrypt © using Ksym : fast
• Note: SSL does the same thing
• Uses asymmetric cryptography to choose symmetric session key

105
Error Detection: Parity
• Parity: simplest scheme
• f(dataN–1…0) = XOR(dataN–1, …, data1, data0)
+ Single-error detect: detects a single bit flip (common case)
• Will miss two simultaneous bit flips …
• But what are the odds of that happening?
– Zero-error correct: no way to tell which bit flipped
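A one-function sketch of even parity over a data word (word width is arbitrary here):

#include <stdint.h>

/* XOR of all data bits; stored alongside the word. A recomputed parity that
   differs from the stored one flags a single-bit flip, but cannot locate it. */
unsigned parity(uint64_t data) {
    unsigned p = 0;
    while (data) { p ^= (unsigned)(data & 1); data >>= 1; }
    return p;
}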

106
Error Correction: Hamming Codes
• Hamming Code
• H(A,B) = number of 1’s in A^B (number of bits that differ)
• Called “Hamming distance”
• Use D data bits + C check bits to construct a set of "codewords"
• Check bits are parities on different subsets of data bits
• For all codewords A, B: H(A,B) ≥ α
• No combination of α–1 bit flips transforms one codeword into another
• For simple parity: α = 2
• Errors of δ bits (or fewer) can be detected if α = δ + 1
• Errors of β bits or fewer can be corrected if α = 2β + 1
• Errors of δ bits can be detected and errors of β bits can be
corrected if α = β + δ + 1

107
SEC Hamming Code
• SEC: single-error correct
• C = log2D + 1
+ Relative overhead decreases as D grows

• Example: D = 4 → C = 3
• d1 d2 d3 d4 c1 c2 c3 → c1 c2 d1 c3 d2 d3 d4
• c1 = d1 ^ d2 ^ d4 , c2 = d1 ^ d3 ^ d4 , c3 = d2 ^ d3 ^ d4
• Syndrome: ci ^ c’i = 0 ? no error : points to flipped-bit
• Working example
• Original data = 0110 → c1 = 1, c2 = 1, c3 = 0
• Flip d2 = 0010 → c’1 = 0, c’2 = 1, c’3 = 1
• Syndrome = 101 (binary 5) → 5th bit? D2
• Flip c2 → c’1 = 1, c’2 = 0, c’3 = 0
• Syndrome = 010 (binary 2) → 2nd bit? c2
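A sketch of the SEC Hamming(7,4) code above; packing the codeword into an unsigned is my own convention, but the layout (positions 1..7 hold c1 c2 d1 c3 d2 d3 d4) and check-bit equations follow the slide.

unsigned hamming74_encode(unsigned d) {   /* d1..d4 in bits 0..3 of d */
    unsigned d1 = d & 1, d2 = (d >> 1) & 1, d3 = (d >> 2) & 1, d4 = (d >> 3) & 1;
    unsigned c1 = d1 ^ d2 ^ d4, c2 = d1 ^ d3 ^ d4, c3 = d2 ^ d3 ^ d4;
    /* pack positions 1..7 into bits 0..6 */
    return c1 | (c2 << 1) | (d1 << 2) | (c3 << 3) | (d2 << 4) | (d3 << 5) | (d4 << 6);
}

/* Syndrome: XOR together the positions of all 1 bits. A valid codeword XORs
   to 0; a single flipped bit makes the result equal to its position. */
unsigned hamming74_syndrome(unsigned cw) {   /* 7-bit codeword, possibly corrupted */
    unsigned s = 0;
    for (unsigned pos = 1; pos <= 7; pos++)
        if ((cw >> (pos - 1)) & 1) s ^= pos;
    return s;
}
/* Example: encode data 0110 (d1=0, d2=1, d3=1, d4=0), flip bit position 5 (d2),
   and hamming74_syndrome() returns 5, matching the worked example above. */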

108
SECDED Hamming Code
• SECDED: single error correct, double error detect
• C = log2D + 2
• Additional parity bit to detect additional error
• Example: D = 4 → C = 4
• d1 d2 d3 d4 c1 c2 c3 → c1 c2 d1 c3 d2 d3 d4 c4
• c4 = c1 ^ c2 ^ d1 ^ c3 ^ d2 ^ d3 ^ d4
• Syndrome == 0 and c’4 == c4 → no error
• Syndrome != 0 and c’4 != c4 → 1-bit error
• Syndrome != 0 and c’4 == c4 → 2-bit error
• Syndrome == 0 and c’4 != c4 → c4 error
• Many machines today use 64-bit SECDED code
• C = 8 (one additional byte, 12% overhead)
• ChipKill - correct any aligned 4-bit error
• If an entire DRAM chips dies, the system still works!

109
Bonus

110
111
This Unit: Main Memory
• Memory hierarchy review
• Virtual memory
• Address translation and page tables
• Virtual memory's impact on caches
• Page-based protection
• Organizing a memory system
• Bandwidth matching
• Error correction
[Figure: system stack from Application, OS, Compiler, and Firmware down through CPU/Memory/I/O, Digital Circuits, and Gates & Transistors]

112
Static Random Access Memory
- Read sequence
  1. address decode
  2. drive row select
  3. selected bit-cells drive bitlines
  4. differential sensing and column select
  5. precharge all bitlines
- Access latency dominated by steps 2 and 3
- Cycle time dominated by steps 2, 3 and 5
- Step 2 proportional to 2^m
- Steps 3 and 5 proportional to 2^n
- Usually encapsulated by synchronous (sometimes pipelined) interface logic
[Figure: 2^n-row x 2^m-column bit-cell array (n ≈ m to minimize overall latency); row select drives a row, bitline/_bitline pairs feed 2^m differential sense amps and a column mux that outputs 1 bit]
113
Dynamic Random Access Memory
- Bits stored as charge on a node capacitance (non-restorative)
  - bit cell loses charge when read
  - bit cell loses charge over time
- Read sequence
  1~3 same as SRAM
  4. a "flip-flopping" sense amp amplifies and regenerates the bitline; the data bit is mux'ed out
  5. precharge all bitlines
- A DRAM controller must periodically, either distributed or in a burst, read all rows within the allowed refresh time (10s of ms)
- synchronous interfaces
- various hacks to allow faster repeated reads to the same row
[Figure: 2^n-row x 2^m-column bit-cell array (n ≈ m to minimize overall latency), addressed with RAS/CAS through sense amps and a mux; a DRAM die comprises multiple such arrays]
114
Brief History of DRAM
• DRAM (memory): a major force behind computer industry
• Modern DRAM came with introduction of IC (1970)
• Preceded by magnetic “core” memory (1950s)
• Each cell was a small magnetic “donut”
• And by mercury delay lines before that (ENIAC)
• Re-circulating vibrations in mercury tubes

“the one single development that put computers on their feet was the
invention of a reliable form of memory, namely the core memory … Its
cost was reasonable, it was reliable, and because it was reliable it
could in due course be made large”
Maurice Wilkes
Memoirs of a Computer Pioneer, 1985

115
DRAM Basics [Jacob and Wang]
• Precharge and Row Access

116
DRAM Basics, cont.
• Column Access

117
DRAM Basics, cont.
• Data Transfer

118
Open v. Closed Pages
• Open Page
• Row stays active until another row needs to be accessed
• Acts as memory-level cache to reduce latency
• Variable access latency complicates memory controller
• Higher power dissipation (sense amps remain active)
• Closed Page
• Immediately deactivate row after access
• All accesses become Activate Row, Read/Write, Precharge

• Complex power v. performance trade off

119
DRAM Bandwidth
• Use multiple DRAM chips to increase bandwidth
• Recall, accesses are the same size as a second-level cache block
• Example, 16 2-byte wide chips for 32B access

• DRAM density increasing faster than demand


• Result: number of memory chips per system decreasing

• Need to increase the bandwidth per chip


• Especially important in game consoles
• SDRAM → DDR → DDR2 → FBDIMM (→ DDR3)
• Rambus - high-bandwidth memory
• Used by several game consoles

120
DRAM Evolution
• Survey by Cuppu et al.
1. Early Asynchronous Interface
2. Fast Page Mode/Nibble Mode/Static Column (skip)
3. Extended Data Out
4. Synchronous DRAM & Double Data Rate
5. Rambus & Direct Rambus
6. FB-DIMM

121
Old 64MbitDRAM Example from Micron
[Figure: Micron 64Mb SDRAM block diagram, including clock recovery and the multiplexed ROW ADDRESS / COLUMN ADDRESS paths]
122
Extended Data Out (EDO)
[Figure: timing diagram: RAS' asserts once with the row address; CAS' pulses repeatedly, each with a new column address, and each data word appears overlapped with the next column address assert]

• Similar to Fast Page Mode
• But overlaps the column address assert with Data Out
123
Synchronous DRAM (SDRAM)

[Figure: timing diagram: with a clock, a single RAS'/CAS' pair with row and column addresses yields multiple data transfers on successive clock edges]

• Adds a clock and wider data!
• Also multiple transfers per RAS/CAS
124
Enhanced SDRAM & DDR
• Evolutionary Enhancements on SDRAM:
1. ESDRAM (Enhanced): Overlap row buffer access with
refresh

2. DDR (Double Data Rate): Transfer on both clock edges


3. DDR2’s small improvements
lower voltage, on-chip termination, driver calibration
prefetching, conflict buffering
4. DDR3, more small improvements
lower voltage, 2X speed, 2X prefetching,
2X banks, “fly-by topology”, automatic calibration

125
FB-DIMM

126
Wide v. Narrow Interfaces
• High frequency → short wavelength → data skew issues
• Balance wire lengths
[Figure: DDR-2 serpentine board routing vs. FB-DIMM board routing]

127
Rambus RDRAM
• High-frequency, narrow channel
• Time-multiplexed "bus" → dynamic point-to-point channels
• ~40 pins → 1.6GB/s
• Proprietary solution
• Never gained industry-wide acceptance (cost and power)
• Used in some game consoles (e.g., PS2)
[Figure: CPU/memory controller connected to a chain of RDRAM chips over a 16-bit data bus @ 800 MHz, with from_clock/to_clock signals]

128
DRAM Reliability
• One last thing about DRAM technology … errors
• DRAM bits can flip from 0→1 or 1→0
• Small charge stored per bit
• Energetic α-particle strikes disrupt stored charge
• Many more bits
• Modern DRAM systems: built-in error detection/correction
• Today all servers; most new desktop and laptops
• Key idea: checksum-style redundancy
• Main DRAM chips store data, additional chips store f(data)
• |f(data)| < |data|
• On read: re-compute f(data), compare with stored f(data)
• Different ? Error …
• Option I (detect): kill program
• Option II (correct): enough information to fix error? fix and go on

129
DRAM Error Detection and Correction
[Figure: the memory controller spreads data across DRAM chips 0–3 (4M x 2B each) plus an additional chip that stores f(data) for error detection/correction]

• Performed by memory controller (not the DRAM chip)


• Error detection/correction schemes distinguished by …
• How many (simultaneous) errors they can detect
• How many (simultaneous) errors they can correct

130
Interleaved Main Memory
• Divide memory into M banks and “interleave” addresses
across them, so word A is
• in bank (A mod M)
• at word (A div M)
Bank 0 Bank 1 Bank 2 Bank n
word 0 word 1 word 2 word n-1
word n word n+1 word n+2 word 2n-1
word 2n word 2n+1 word 2n+2 word 3n-1

PA fields: [doubleword in bank | bank | word in doubleword]

• Interleaved memory increases memory BW without a wider bus
• Use parallelism in memory banks to hide memory latency
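The mapping above, written out (M = 4 banks is chosen just for the example):

#define M 4   /* number of banks */

/* word A lives in bank (A mod M), at word (A div M) within that bank */
unsigned bank_of(unsigned A)      { return A % M; }
unsigned word_in_bank(unsigned A) { return A / M; }

/* consecutive words land in different banks, so a block fill can be
   overlapped across all M banks to hide DRAM latency */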
Copyright © 2002 Falsafi, from Hill,
Smith, Sohi, Vijaykumar, and Wood 131
Block interleaved memory systems
• Cache blocks map to separate memory controllers
• Interleave across DRAMs w/i a MC
• Interleave across intra-DRAM banks w/i a DRAM

[Figure: CPU connected over the data bus to four memory controllers (MCs), each with its own DRAMs; consecutive cache blocks B, B+64, B+128, B+192 map to different MCs]

132
Research: Processing in Memory
• Processing in memory
• Embed some ALUs in DRAM
• Picture is logical, not physical
• Do computation in DRAM rather than …
• Move data from DRAM to CPU, compute on CPU, move data from CPU back to DRAM
• Will come back to this in "vectors" unit
• E.g., IRAM: intelligent RAM
• Berkeley research project [Patterson+, ISCA'97]
[Figure: DRAM bit array with ALUs attached to the row buffer]
133
Memory Hierarchy Review
• Storage: registers, memory, disk
• Memory is the fundamental element

• Memory component performance


• tavg = thit + %miss * tmiss
• Can’t get both low thit and %miss in a single structure

• Memory hierarchy
• Upper components: small, fast, expensive
• Lower components: big, slow, cheap
• tavg of hierarchy is close to thit of upper (fastest) component
• 10/90 rule: 90% of stuff found in fastest component
• Temporal/spatial locality: automatic up-down data movement

134
Software Managed Memory
• Isn’t full associativity difficult to implement?
• Yes … in hardware
• Implement fully associative memory in software

• Let’s take a step back…

135
