CS:APP Chapter 4
Computer Architecture
Instruction Set
Architecture
Randal E. Bryant
Carnegie Mellon University
http://csapp.cs.cmu.edu
CS:APP2e
Instruction Set Architecture
Assembly Language View
Processor state Application
Registers, memory, … Program
Instructions
addq, pushq, ret, … Compiler OS
How instructions are encoded
ISA
as bytes
CPU
Layer of Abstraction Design
Above: how to program machine
Processor executes instructions Circuit
Design
in a sequence
Below: what needs to be built Chip
Use variety of tricks to make it Layout
run fast
E.g., execute multiple
–2– instructions simultaneously CS:APP3e
Y86-64 Processor State
RF: Program CC: Stat: Program status
registers Condition
codes
%rax %rsp %r8 %r12
%rcx %rbp %r9 %r13 ZF SF OF DMEM: Memory
%rdx %rsi %r10 %r14 PC
%rbx %rdi %r11
Program Registers
15 registers (omit %r15). Each 64 bits
Condition Codes
Single-bit flags set by arithmetic or logical instructions
» ZF: Zero SF:Negative OF:
Overflow
Program Counter
Indicates address of next instruction
Program Status
Indicates either normal operation or some error condition
Memory
–3– Byte-addressable storage array CS:APP3e
Y86-64 Instruction Set #1
Byte 0 1 2 3 4 5 6 7 8 9
halt 0 0
nop 1 0
cmovXX rA, rB 2 fn rA rB
irmovq V, rB 3 0 F rB V
rmmovq rA, D(rB) 4 0 rA rB D
mrmovq D(rB), rA 5 0 rA rB D
OPq rA, rB 6 fn rA rB
jXX Dest 7 fn Dest
call Dest 8 0 Dest
ret 9 0
pushq rA A 0 rA F
popq rA B 0 rA F
–4– CS:APP3e
Y86-64 Instructions
Format
1–10 bytes of information read from memory
Can determine instruction length from first byte
Not as many instruction types, and simpler encoding than with
x86-64
Each accesses and modifies some part(s) of the program
state
–5– CS:APP3e
Y86-64 Instruction Set #2 rrmovq 2 0
Byte 0 1 2 3 4 5 6 7 8 9
cmovle 2 1
halt 0 0
cmovl 2 2
nop 1 0
cmove 2 3
cmovXX rA, rB 2 fn rA rB
cmovne 2 4
irmovq V, rB 3 0 F rB V
cmovge 2 5
rmmovq rA, D(rB) 4 0 rA rB D
cmovg 2 6
mrmovq D(rB), rA 5 0 rA rB D
OPq rA, rB 6 fn rA rB
jXX Dest 7 fn Dest
call Dest 8 0 Dest
ret 9 0
pushq rA A 0 rA F
popq rA B 0 rA F
–6– CS:APP3e
Y86-64 Instruction Set #3
Byte 0 1 2 3 4 5 6 7 8 9
halt 0 0
nop 1 0
cmovXX rA, rB 2 fn rA rB
irmovq V, rB 3 0 F rB V
rmmovq rA, D(rB) 4 0 rA rB D
addq 6 0
mrmovq D(rB), rA 5 0 rA rB D
subq 6 1
OPq rA, rB 6 fn rA rB
andq 6 2
jXX Dest 7 fn Dest
xorq 6 3
call Dest 8 0 Dest
ret 9 0
pushq rA A 0 rA F
popq rA B 0 rA F
–7– CS:APP3e
Y86-64 Instruction Set #4
Byte 0 1 2 3 4 5 6 7 jmp
8 97 0
halt 0 0 jle 7 1
nop 1 0 jl 7 2
cmovXX rA, rB 2 fn rA rB je 7 3
irmovq V, rB 3 0 F rB V jne 7 4
rmmovq rA, D(rB) 4 0 rA rB D jge 7 5
mrmovq D(rB), rA 5 0 rA rB D jg 7 6
OPq rA, rB 6 fn rA rB
jXX Dest 7 fn Dest
call Dest 8 0 Dest
ret 9 0
pushq rA A 0 rA F
popq rA B 0 rA F
–8– CS:APP3e
Encoding Registers
Each register has 4-bit ID
%rax 0 %r8 8
%rcx 1 %r9 9
%rdx 2 %r10 A
%rbx 3 %r11 B
%rsp 4 %r12 C
%rbp 5 %r13 D
%rsi 6 %r14 E
%rdi 7 No Register F
Same encoding as in x86-64
Register ID 15 (0xF) indicates “no register”
Will use this in our hardware design in multiple places
–9– CS:APP3e
Instruction Example
Addition Instruction
Generic Form
Encoded Representation
addq rA, rB 6 0 rA rB
Add value in register rA to that in register rB
Store result in register rB
Note that Y86-64 only allows addition to be applied to register
data
Set condition codes based on result
e.g., addq %rax,%rsi Encoding: 60 06
Two-byte encoding
First indicates instruction type
Second gives source and destination registers
– 10 – CS:APP3e
Arithmetic and Logical Operations
Instruction Code Function Code
Add Refer to generically as
“OPq”
addq rA, rB 6 0 rA rB Encodings differ only by
“function code”
Subtract (rA from rB) Low-order 4 bytes in first
subq rA, rB 6 1 rA rB instruction word
Set condition codes as
And side effect
andq rA, rB 6 2 rA rB
Exclusive-Or
xorq rA, rB 6 3 rA rB
– 11 – CS:APP3e
Move Operations
Register Register
rrmovq rA, rB 2 0
Immediate Register
irmovq V, rB 3 0 F rB V
Register Memory
rmmovq rA, D(rB) 4 0 rA rB D
Memory Register
mrmovq D(rB), rA 5 0 rA rB D
Like the x86-64 movq instruction
Simpler format for memory addresses
Give different names to keep them distinct
– 12 – CS:APP3e
Move Instruction Examples
X86-64 Y86-64
movq $0xabcd, %rdx irmovq $0xabcd, %rdx
Encoding: 30 f2 cd ab 00 00 00 00 00 00
movq %rsp, %rbx rrmovq %rsp, %rbx
Encoding: 20 43
movq -12(%rbp),%rcx mrmovq -12(%rbp),%rcx
Encoding: 50 15 f4 ff ff ff ff ff ff ff
movq %rsi,0x41c(%rsp) rmmovq %rsi,0x41c(%rsp)
Encoding: 40 64 1c 04 00 00 00 00 00 00
– 13 – CS:APP3e
Conditional Move Instructions
Move Unconditionally
rrmovq rA, rB 2 0 rA rB Refer to generically as
Move When Less or Equal “cmovXX”
Encodings differ only by
cmovle rA, rB 2 1 rA rB
“function code”
Move When Less
Based on values of
cmovl rA, rB 2 2 rA rB
condition codes
Move When Equal Variants of rrmovq
cmove rA, rB 2 3 rA rB instruction
Move When Not Equal (Conditionally) copy value
2 4 rA rB
from source to destination
cmovne rA, rB
register
Move When Greater or Equal
cmovge rA, rB 2 5 rA rB
Move When Greater
cmovg rA, rB 2 6 rA rB
– 14 – CS:APP3e
Jump Instructions
Jump (Conditionally)
jXX Dest 7 fn Dest
Refer to generically as “jXX”
Encodings differ only by “function code” fn
Based on values of condition codes
Same as x86-64 counterparts
Encode full destination address
Unlike PC-relative addressing seen in x86-64
– 15 – CS:APP3e
Jump Instructions
Jump Unconditionally
jmp Dest 7 0 Dest
Jump When Less or Equal
jle Dest 7 1 Dest
Jump When Less
jl Dest 7 2 Dest
Jump When Equal
je Dest 7 3 Dest
Jump When Not Equal
jne Dest 7 4 Dest
Jump When Greater or Equal
jge Dest 7 5 Dest
Jump When Greater
jg Dest 7 6 Dest
– 16 – CS:APP3e
Y86-64 Program Stack
Stack
“Bottom” Region of memory holding
program data
Used in Y86-64 (and x86-64) for
supporting procedure calls
Stack top indicated by %rsp
Address of top stack element
•
Increasing • Stack grows toward lower
Addresses addresses
•
Top element is at highest
address in the stack
When pushing, must first
decrement stack pointer
After popping, increment stack
%rsp
pointer
Stack “Top”
– 17 – CS:APP3e
Stack Operations
pushq rA A 0 rA F
Decrement %rsp by 8
Store word from rA to memory at %rsp
Like x86-64
popq rA B 0 rA F
Read word from memory at %rsp
Save in rA
Increment %rsp by 8
Like x86-64
– 18 – CS:APP3e
Subroutine Call and Return
call Dest 8 0 Dest
Push address of next instruction onto stack
Start executing instructions at Dest
Like x86-64
ret 9 0
Pop value from stack
Use as address for next instruction
Like x86-64
– 19 – CS:APP3e
Miscellaneous Instructions
nop 1 0
Don’t do anything
halt 0 0
Stop executing instructions
x86-64 has comparable instruction, but can’t execute it
in user mode
We will use it to stop the simulator
Encoding ensures that program hitting memory
initialized to zero will halt
– 20 – CS:APP3e
Status Conditions
Mnemonic Code Normal operation
AOK 1
Halt instruction encountered
Mnemonic Code
HLT 2
Bad address (either instruction or data)
Mnemonic Code encountered
ADR 3
Invalid instruction encountered
Mnemonic Code
INS 4
Desired Behavior
If AOK, keep going
Otherwise, stop program execution
– 21 – CS:APP3e
Writing Y86-64 Code
Try to Use C Compiler as Much as Possible
Write code in C
Compile for x86-64 with gcc –Og –S
Transliterate into Y86-64
Modern compilers make this more difficult
Coding Example
Find number of elements in null-terminated list
int len1(int a[]);
a 5043
6125 3
7395
0
– 22 – CS:APP3e
Y86-64 Code Generation Example
First Try
Problem
Write to
Hard typical
do array
array
indexing
code on Y86-64
Since don’t have scaled addressing modes
/* Find number of elements in
null-terminated list */
long len(long a[]) L3:
{ addq $1,%rax
long len; cmpq $0, (%rdi,%rax,8)
for (len = 0; a[len]; len++) jne L3
;
return len;
}
Compile with gcc -Og -S
– 23 – CS:APP3e
Y86-64 Code Generation Example #2
Second Try
Result
Compiler
Write C code
generates
that mimics
exact expected
same codeY86-64
as before!
code
Compiler converts both versions into same intermediate form
long len2(long *a)
{
long ip = (long) a;
long val = *(long *) ip;
long len = 0;
while (val) {
ip += sizeof(long);
len++;
val = *(long *) ip;
}
return len;
}
– 24 – CS:APP3e
Y86-64 Code Generation Example #3
len:
irmovq $1, %r8 # Constant 1
irmovq $8, %r9 # Constant 8
irmovq $0, %rax # len = 0
mrmovq (%rdi), %rdx # val = *a Register Use
andq %rdx, %rdx # Test val
%rdi a
je Done # If zero, goto Done
Loop: %rax len
addq %r8, %rax # len++ %rdx val
addq %r9, %rdi # a++
mrmovq (%rdi), %rdx # val = *a %r8 1
andq %rdx, %rdx # Test val %r9 8
jne Loop # If !0, goto Loop
Done:
ret
– 25 – CS:APP3e
Y86-64 Sample Program Structure #1
init: # Initialization
. . . Program starts at
call Main address 0
halt
Must set up stack
.align 8 # Program data Where located
array: Pointer values
. . . Make sure don’t
overwrite code!
Main: # Main function Must initialize data
. . .
call len . . .
len: # Length function
. . .
.pos 0x100 # Placement of stack
Stack:
– 26 – CS:APP3e
Y86-64 Program Structure #2
init:
# Set up stack pointer Program starts at
irmovq Stack, %rsp address 0
# Execute main program
Must set up stack
call Main
# Terminate Must initialize data
halt Can use symbolic
names
# Array of 4 elements + terminating 0
.align 8
Array:
.quad 0x000d000d000d000d
.quad 0x00c000c000c000c0
.quad 0x0b000b000b000b00
.quad 0xa000a000a000a000
.quad 0
– 27 – CS:APP3e
Y86-64 Program Structure #3
Main:
irmovq array,%rdi
# call len(array)
call len
ret
Set up call to len
Follow x86-64 procedure conventions
Push array address as argument
– 28 – CS:APP3e
Assembling Y86-64 Program
unix> yas len.ys
Generates “object code” file len.yo
Actually looks like disassembler output
0x054: | len:
0x054: 30f80100000000000000 | irmovq $1, %r8 # Constant 1
0x05e: 30f90800000000000000 | irmovq $8, %r9 # Constant 8
0x068: 30f00000000000000000 | irmovq $0, %rax # len = 0
0x072: 50270000000000000000 | mrmovq (%rdi), %rdx # val = *a
0x07c: 6222 | andq %rdx, %rdx # Test val
0x07e: 73a000000000000000 | je Done # If zero, goto Done
0x087: | Loop:
0x087: 6080 | addq %r8, %rax # len++
0x089: 6097 | addq %r9, %rdi # a++
0x08b: 50270000000000000000 | mrmovq (%rdi), %rdx # val = *a
0x095: 6222 | andq %rdx, %rdx # Test val
0x097: 748700000000000000 | jne Loop # If !0, goto Loop
0x0a0: | Done:
0x0a0: 90 | ret
– 29 – CS:APP3e
Simulating Y86-64 Program
unix> yis len.yo
Instruction set simulator
Computes effect of each instruction on processor state
Prints changes in state from original
Stopped in 33 steps at PC = 0x13. Status 'HLT', CC Z=1 S=0 O=0
Changes to registers:
%rax: 0x0000000000000000 0x0000000000000004
%rsp: 0x0000000000000000 0x0000000000000100
%rdi: 0x0000000000000000 0x0000000000000038
%r8: 0x0000000000000000 0x0000000000000001
%r9: 0x0000000000000000 0x0000000000000008
Changes to memory:
0x00f0: 0x0000000000000000 0x0000000000000053
0x00f8: 0x0000000000000000 0x0000000000000013
– 30 – CS:APP3e
CISC Instruction Sets
Complex Instruction Set Computer
IA32 is example
Stack-oriented instruction set
Use stack to pass arguments, save program counter
Explicit push and pop instructions
Arithmetic instructions can access memory
addq %rax, 12(%rbx,%rcx,8)
requires memory read and write
Complex address calculation
Condition codes
Set as side effect of arithmetic and logical instructions
Philosophy
Add instructions to perform “typical” programming tasks
– 31 – CS:APP3e
RISC Instruction Sets
Reduced Instruction Set Computer
Internal project at IBM, later popularized by Hennessy
(Stanford) and Patterson (Berkeley)
Fewer, simpler instructions
Might take more to get given task done
Can execute them with small and fast hardware
Register-oriented instruction set
Many more (typically 32) registers
Use for arguments, return pointer, temporaries
Only load and store instructions can access memory
Similar to Y86-64 mrmovq and rmmovq
No Condition codes
Test instructions return 0/1 in register
– 32 – CS:APP3e
MIPS Registers
$0 $0 Constant 0 $16 $s0
$1 $at Reserved Temp. $17 $s1
$2 $v0 $18 $s2
Return Values Callee Save
$3 $v1 $19 $s3 Temporaries:
$4 $a0 $20 $s4 May not be
$5 $a1 $21 $s5 overwritten by
Procedure arguments called procedures
$6 $a2 $22 $s6
$7 $a3 $23 $s7
$8 $t0 $24 $t8
Caller Save Temp
$9 $t1 $25 $t9
$10 $t2 Caller Save $26 $k0 Reserved for
$11 $t3 Temporaries: $27 $k1 Operating Sys
May be overwritten by
$12 $t4 called procedures $28 $gp Global Pointer
$13 $t5 $29 $sp Stack Pointer
$14 $t6 $30 $s8 Callee Save Temp
$15 $t7 $31 $ra Return Address
– 33 – CS:APP3e
MIPS Instruction Examples
R-R
Op Ra Rb Rd 00000 Fn
addu $3,$2,$1 # Register add: $3 = $2+$1
R-I
Op Ra Rb Immediate
addu $3,$2, 3145 # Immediate add: $3 = $2+3145
sll $3,$2,2 # Shift left: $3 = $2 << 2
Branch
Op Ra Rb Offset
beq $3,$2,dest # Branch when $3 = $2
Load/Store
Op Ra Rb Offset
lw $3,16($2) # Load Word: $3 = M[$2+16]
sw $3,16($2) # Store Word: M[$2+16] = $3
– 34 – CS:APP3e
CISC vs. RISC
Original Debate
Strong opinions!
CISC proponents---easy for compiler, fewer code bytes
RISC proponents---better for optimizing compilers, can make
run fast with simple chip design
Current Status
For desktop processors, choice of ISA not a technical issue
With enough hardware, can make anything run fast
Code compatibility more important
x86-64 adopted many RISC features
More registers; use them for argument passing
For embedded processors, RISC makes sense
Smaller, cheaper, less power
Most cell phones use ARM processor
– 35 – CS:APP3e
Summary
Y86-64 Instruction Set Architecture
Similar state and instructions as x86-64
Simpler encodings
Somewhere between CISC and RISC
How Important is ISA Design?
Less now than before
With enough hardware, can make almost anything go fast
– 36 – CS:APP3e