Instructions (354 Review) Forecast
Instructions are the “words” of a computer Basics
Instruction set architecture (ISA) is its “vocabulary” Registers and ALU ops
With a few other things, this defines the interface of computers Memory and load/store
But implementations vary Branches and jumps
We use MIPS ISA: simple, sensable, used in games & supercomp And more . . .
Most common: x86 (IA-32): Intel Pentium* & PC-compatible proc.
Others: PowerPC (Mac), SPARC (Sun), Alpha (Compaq), ...
We won’t write programs
© 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 3 1 © 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 3 2
Basics Basics
C statement Opcode: Specifies the kind of operation (mnemonic)
• f = (g + h) - (i + j) Operands: input and output data (source/destination)
MIPS instructions Operands t0 & t1 are temps
• add t0, g, h
One operation, two inputs, one output
• add t1, i, j
Multiple instructions for one C statement
• sub f, t0, t1
Opcodes/mnemonic, operands, source/destination
© 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 3 3 © 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 3 4
Why not have bigger instructions? Registers and ALU ops
Why not have “f = (g + h) - (i + j)” as one instruction? Ok, I lied!
Church’s thesis: A very primitive computer can compute anything Operands must be registers, not variables
that a fancy computer can compute - need only logical functions,
read and write memory and data-dependent decisions • add $8, $17, $18
• add $9, $19, $20
Therefore, ISA selected for practical reasons
• sub $16, $8, $9
• performance and cost, not computability
• MIPS has 32 registers $0-$31 (figure next slide)
Regularity tends to improve both $8 & $9 are temps, $16 is f, $17 g, $18 h, $19 i, & $20 j
• E.g., H/W to handle arbitrary number of operands MIPS also allows one constant called “immediate”
• complex and slow and NOT NECESSARY • later we will see immediate is 16 bits [-32768, 32767]
© 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 3 5 © 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 3 6
Registers and ALU ALU ops
Processor Some ALU ops:
• add, addi, addu, addiu (immediate, unsigned)
$0
Registers
• sub . . .
• mul, div - weird!
• and, andi
• or, ori
ALU
• sll, srl, . . .
$31
Why registers? fit in instructions, smaller is faster
Are registers enough?
© 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 3 7 © 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 3 8
Memory and load/store Memory and load/store
But need more than 32 words of storage Processor Memory
An array of locations M[addr], indexed by addr (figure next slide) $0 0
Registers
1
Data movement (on words or integers)
2
• load word for reg <-- memory 3
• lw $17, 1002 # get input g
• store word for reg --> memory ALU
• sw $16, 1001 # save output f; Note: destination last! $31 1001 f
1002 g
maxmem
© 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 3 9 © 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 3 10
Memory and load/store Memory and load/store
I lied again! Processor Memory
• We need address bytes for character strings $0 0
Registers
• Words take 4 bytes 1
• Therefore, word addresses must be multiples of 4 2
3
• Figure next slide
ALU
$31 4004 f
4008 g
maxmem
© 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 3 11 © 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 3 12
Memory and load/store Memory and load/stor4
Important for arrays Processor Memory
• A[i] = A[i] + h (figure next slide) $0 0
Registers
• # $8 - temp, $18 - h, $21 - (i x 4)
• Astart is 8000 4004 f
• lw $8, Astart($21) # or 8000($21) 4008 g
• add $8, $18, $8 8000 A[0]
ALU 8004 A[1]
• sw $8, Astart($21)
$31 8008 A[2]
MIPS has other load/store for bytes and halfwords
maxmem
© 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 3 13 © 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 3 14
Aside on “Endian” Branches and Jumps
Big endian: MSB at address xxxxxx00 While ( i != j) {
• e.g., IBM, SPARC j= j + i;
i= i + 1;
Little endian: MSB at address xxxxxx11
• e.g., Intel x86 (Windows NT requires it) } # $8 is i, $9 is j, $10 is k
Loop: beq $8, $9, Exit # not !=
Mode selectable
• e.g., PowerPC, MIPS add $9, $9, $8
addi $8, $8 , 1
j Loop
Exit:
© 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 3 15 © 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 3 16
Branches and Jumps Branches and Jumps
Better yet: What does bne do really?
beq $8, $9, Exit # not != read $8; read $9, compare
Loop: add $9, $9, $8 set PC = PC + 4 or PC = Target
addi $8, $8 , 1 To do compares other than = or !=
bne $8, $9, Loop • e.g., blt $8, $9, Target # assembler pseudo-instr
• expanded to
Exit:
• slt $1, $8, $9 # $1 == ($8<$9) == ($8-$9 < 0)
Let compilers worry about such optimizations
• bne $1, $0, Target # $0 is always 0
© 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 3 17 © 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 3 18
Branches and Jumps Layers of Software
Other MIPS branches/jumps Notation - program: input data -> output data
beq $8, $9, imm # if ($8 == $9) PC = PC + imm<<2 else PC += 4 • executable: input data -> output data
• loader: executable file -> executable in memory
bne . . .
• linker: object files -> executable file
slt, sle, sgt, sge
• assembler: assembly file -> object file
• with immediate, unsigned
• compiler: HLL file -> assembly file
j addr # PC = addr • editor: editor commands -> HLL file
jr $12 # PC = $12
Only possible because programs can be manipulated as data
jal addr # $31 = PC+4; PC = addr; used for procedure calls
© 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 3 19 © 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 3 20
MIPS Machine Language Instruction Format
All 32-bit instructions R-format
A.L. add $1, $2, $3 opcode rs rt rd shamt funct
33222222222211111111110000000000 6 5 5 5 5 6
10987654321098765432109876543210
Digression:
M.L. 00000000010000110000100000010000
How do you store the number 4,392,976?
000000 00010 00011 00001 00000 010000
• Same as add $1, $2, $3
alu-rr 2 3 1 zero add/signed
Stored program: instructions are represented as numbers
• programs can be read/written in memory like numbers
© 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 3 21 © 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 3 22
Instruction Format Instruction Format
Other R-format: addu, sub, subi, etc I-format also used for ALU ops with immediates
A.L. lw $1, 100($2) addi $1, $2, 100
M.L. 100011 00010 00001 0000000001100100 001000 00010 00001 0000000001100100
lw 2 1 100 (in binary) What about constants larger than 16 bits = [-32768, 32767]
I-format 1100 0000 0000 0000 1111?
opcode rs rt address/immediate lui $4,12 # $4 == 0000 0000 1100 0000 0000 0000 0000
6 5 5 16 ori $4,$4,15 # $4 == 0000 0000 1100 0000 0000 0000 1111
All loads and stores use I-format
© 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 3 23 © 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 3 24
Instruction Format Summary: Instruction Formats
beq $1, $2, 7 R-format: opcode rs rt rd shamt funct
000100 00001 00010 0000 0000 0000 0111 6 5 5 5 5 6
PC = PC + (0000 0111 << 2) # word offset I-format: opcode rs rt address/immediate
Finally, J-format 6 5 5 16
j address J-format: opcode addr
opcode addr
6 26
6 26
Instr decode - Theory: Inst bits -> identify instrs -> control signals
addr is weird in MIPS: 4 MSB of PC // addr // 00
Practice: Instruction bits -> control signals
© 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 3 25 © 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 3 26
Procedure Calls Procedure Calls
See section 3.6 for more details An important data structure is the stack
• save registers Proc: save more registers Stack grows from larger to smaller addresses
• set up parameters do function
$29 is the stack pointer, it points just beyond valid data
• call procedure set up results
Push $2: Pop $2:
• get results restore more registers
• addi $29, $29, -4 lw $2, 4($29)
• restore registers return
• sw $2, 4($29) addi $29, $29, 4
jal is only special instruction: the rest is software convention • the order cannot be changed. why? interrupts
© 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 3 27 © 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 3 28
Procedure Example Addressing modes
swap(int v[], ink) { There are many ways of getting data
int temp;
temp = v[k]; Register addressing
v[k] = v[k+1];
v[k+1] = temp; add $1, $2, $3
}
# $4 is v[] & $5 is k -- 1st & 2nd incoming argument op rs rt rd ... funct
# $8, $9 & $10 are temporaties that callee can use w/o saving
swap: add $9,$5,$5 # $9 = k+k
add $9,$9,$9 # $9 = k*4 register
add $9,$4,$9 # $9 = v + k*4 = &(v[k])
lw $8,0($9) # $8 = temp = v[k]
lw $10,4($9) # $10 = v[k+1]
sw $10,0($9) # v[k] = v[k+1]
sw $8,4($9) # v[k+1] = temp
jr $31 # return
© 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 3 29 © 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 3 30
Addressing Modes Addressing Modes
Base addressing (aka displacement) Immediate addressing
lw $1, 100($2) # $2 == 400, M[500] == 42 addi $1, $2, 100
op rs rt address op rs rt immediate
register 100
400 Memory
Effective
42
address
© 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 3 31 © 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 3 32
Addressing Modes Addressing Modes
PC relative addressing Not found in MIPS
beq $1, $2, 100 # if ($1 == $2) PC = PC + 100 Indexed: add two registers - base + index
op rs rt address Indirect: M[M[addr]] - two memory references
Autoincrement/decrement: add data size
PC
Autoupdate - found in IBM PowerPC, HP PA-RISC
Memory • like displacement but update register
Effective
address
© 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 3 33 © 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 3 34
Addressing Modes Addressing Modes
Autoupdate for (i = 0, I < N, i +=1)
sum += A[i];
lwupdate $1, 24[$2] # $1 = M[$2+24]; $2 = $2+24
$7 - sum, $8 - address of a[i], $9 - N, $2 - tmp, $3 - i*4
op rs rt address
inner: new inner:
register lw $2, 0($8) lwupdate $2, 4($8)
addi $8, $8, 4
Memory add add $7, $7, $2
Delay
Effective
Any problems with new inner ? before the loop: sub $8, $8, 4
address
© 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 3 35 © 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 3 36
How to Choose ISA Intel x-86 (IA-32)
Minimize what? Year CPU Comments
• Instrs/prog x cycles/instr x sec/cycle
1978 8086 16-bit with 8-bit bus from 8080; selected for
IBM PC - golden handcuffs!
In 1985-1995 technology, simple modes like MIPS good.
1980 8087 Floating Point Unit
As technology changes, computer design options change
1982 80286 24-bit addresses, memory-map, protection
1985 80386 32-bit registers and addresses, paging
For memory is small, dense instructions important
1989 80486
For high speed, pipelining important 1992 Pentium
1995 Pentium Pro few changes; 1997 MMX
© 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 3 37 © 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 3 38
Intel 80386 Registers & Memory Intel 80386 ISA
Registers Two register machine: src1/dst src2
• 8 32-bit registers (but backward 16 & 8b: EAX, AX, AH, AL) • reg - reg, reg - immed, reg - mem, mem - reg, mem - imm
• 4 special registers: stack (ESP) & frame (EBP) pointers, ...
Examples
• Condition codes: overflow, sign, zero, parity, & carry mov EAX, 23 # places 32b 2SC imm 23 in reg EAX
• Floating point uses a 8-element stack (re-used by MMX) neg [EAX+4] # M[EAX+4] = -M[EAX+4]
Memory faddp ST(7),ST # ST = ST + ST(7)
• Flat 32-bits or segmented (rarely used, but must support) jle label # PC = label if Sign Flag or Zero Flag set
• Effective address =
base_reg + (index_reg x scaling_factor) + displacement
© 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 3 39 © 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 3 40
Intel 80386 ISA, cont. Current Approach
Decoding nightmare Current technique in Pentium Pro
• instructions 1 to 17 bytes • Instruction decode logic translates into “RISCy Ops”
• prefixes, postfixes • Execution unit runs RISCy ops
• crazy “formats” - e.g., register specifiers move around + Backward compatibility
• but key 32-b 386 instructions not terrible – Complex decoding
• yet got to make all work correctly + Execution unit as fast as RISC
We work with MIPS to keep it simple and clean
Learn x86 on the job!
© 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 3 41 © 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 3 42
Complex Instructions Concluding Remarks
More powerful instructions not necessarily faster execution Simple and regular
E.g. - string copy • same length instructions, fields in same place
option 1: move with repeat prefix for memory to memory move Small and fast
• special-purpose • Small number of registers
option 2: use loads into register and then stores Compromises inevitable
• generic instructions • Pipelining (buffet concept) should not be hindered
option 2 faster on the same machine! Common case fast
© 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 3 43 © 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 3 44