ECE222 - Notes

Uploaded by

Ankit D

Lecture 1 4ᵗʰ Sept 2024

Instruction Set Architecture (ISA): how hardware executes software

What is a computer?
  Just a machine that does whatever the software tells it to do.
  Computer -> digital circuits -> logic gates -> transistors.
  Software is just a series of instructions.
The five classic components of a computer:
  CPU: Central Arithmetic (ALU) and Central Control
  Memory System
  Input and Output

But here's a more realistic view

[Figure: the CPU and its cache connect over the memory bus to an I/O bridge; the bridge connects to main memory and, over the I/O bus, to the disk controller (disks), graphics, and network.]
Architecture vs Microarchitecture
Processor architecture (the interface)
  Functional appearance to software (compilers, in this case): the ISA
  Exactly what instructions does it have?
  Number of memory storage locations it has
Processor microarchitecture (the implementation)
  Logical structure that implements the architecture
  Number of functional units, interconnection, control
  Size of the caches
  Not visible to the software

Memory Hierarchy (trade-off between speed, cost and size; fastest and smallest first)
  CPU (processor)
  CPU Cache
  Physical Memory (RAM)
  Solid State Memory (SSD, USB stick)
  Virtual Memory (hard drive)

Moore's Law: every 18-24 months
  2x transistors on the same chip area
  2x processor speed
  2x memory capacity
  energy/power consumption
  Technically not a law, but it's a self-fulfilling prophecy
Memory Wall: every 2 years
  2x processor speed (instructions/second)
  2x memory capacity
  only about 1.1x better memory latency
  Growing disparity between processor and memory performance
  Significant effort goes into reducing/hiding memory latency

Latency: time from start to finish (aka response time)
Throughput: number of tasks completed per time unit (aka bandwidth)
  Throughput can exploit parallelism, but latency cannot

Improving the latency of a component never hurts overall system throughput, and improves it whenever that component is the bottleneck
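Speedup: "x is speedup times faster than y"
  $$\text{Speedup} = \frac{\text{Performance}_x}{\text{Performance}_y}$$
  This can be broken down into throughput or latency:
  $$\text{Speedup} = \frac{\text{throughput}_x}{\text{throughput}_y} \quad\text{or}\quad \text{Speedup} = \frac{\text{latency}_y}{\text{latency}_x}$$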

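Iron Law of Performance
  $$\text{CPU Time} = \frac{\text{instructions}}{\text{program}} \times \frac{\text{cycles}}{\text{instruction}} \times \frac{\text{seconds}}{\text{cycle}}$$
  cycles/instruction is aka CPI
  IPC (instructions per cycle) is its inverse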
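
A small Python sketch of the Iron Law. The instruction count, CPI and clock rate below are made-up illustrative numbers, not values from the lecture.

```python
def cpu_time(instructions, cpi, clock_hz):
    """Iron Law: instructions * (cycles/instruction) * (seconds/cycle)."""
    seconds_per_cycle = 1.0 / clock_hz
    return instructions * cpi * seconds_per_cycle

def ipc(cpi):
    """IPC is simply the inverse of CPI."""
    return 1.0 / cpi

# Hypothetical program: 1 billion instructions, CPI of 2, 2 GHz clock.
t = cpu_time(1_000_000_000, 2.0, 2_000_000_000)
print(t)          # about 1 second
print(ipc(2.0))   # 0.5 instructions per cycle
```

Halving any one of the three factors halves the CPU time, which is why the law is a handy way to reason about where an optimisation acts.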
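Amdahl's Law
  $$\text{speedup} = \frac{1}{(1 - f) + \frac{f}{s}}$$
  where f = fraction of time the enhancement applies, s = speedup of that fraction

Example
  Enhancement 1: speedup of 20 on 10% of time
  $$\frac{1}{(1 - 0.1) + \frac{0.1}{20}} = \frac{1}{0.9 + 0.005} = 1.105$$
  Enhancement 2: speedup of 1.6 on 80% of time
  $$\frac{1}{(1 - 0.8) + \frac{0.8}{1.6}} = \frac{1}{0.2 + 0.5} = 1.43$$
  ∴ the second enhancement is better: even though the first enhancement has a speedup of 20, it's only helpful 10% of the time

  Make the common case fast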
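
The two enhancements can be checked with a few lines of Python. This is only a sketch; the function name amdahl_speedup and its parameter names are my own.

```python
def amdahl_speedup(fraction, speedup):
    """Overall speedup when `fraction` of the time is sped up by `speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / speedup)

e1 = amdahl_speedup(0.10, 20)    # enhancement 1: 20x on 10% of the time
e2 = amdahl_speedup(0.80, 1.6)   # enhancement 2: 1.6x on 80% of the time
print(round(e1, 3), round(e2, 2))  # 1.105 1.43
```

Even an infinite speedup on 10% of the time could never beat 1/0.9, about 1.11, which is the point of "make the common case fast".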

Power consumption: transistors consume (dissipate) power in two ways
Dynamic Power: consumed by activity in a circuit (switching transistors)
  $$P = a \, C \, V^2 f$$
  where C = capacitance (proportional to chip area)
        V = power supply voltage
        f = clock frequency
        a = activity factor
Static Power: consumed when powered on but idle (leaking current)
  As voltage decreases, leakage increases
  $$P_{total} = P_{dynamic} + P_{static}$$
  We strive for the sweet spot between static and dynamic power
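
To see how strongly dynamic power depends on supply voltage, here is a short Python sketch of P = a C V^2 f. The capacitance, frequency and activity factor are placeholder values chosen only for illustration.

```python
def dynamic_power(a, c, v, f):
    """Dynamic power: activity factor * capacitance * voltage^2 * frequency."""
    return a * c * v ** 2 * f

# Hypothetical chip: a = 0.5, C = 1 nF, f = 2 GHz.
base = dynamic_power(a=0.5, c=1e-9, v=1.0, f=2e9)     # at 1.0 V
scaled = dynamic_power(a=0.5, c=1e-9, v=0.9, f=2e9)   # at 0.9 V
print(scaled / base)  # roughly 0.81: a 10% voltage drop saves about 19% dynamic power
```

The quadratic term is why lowering the supply voltage was the main lever for power, until leakage (static power) started to dominate.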

Power Wall: supply voltage has generally been decreasing over time
  Emphasis on power starting around 2000
  Since 2000 we've hit a wall and aren't making as much progress

What is an ISA?
  Functional and precise specification of a computer
  An ISA is a contract between the software and the hardware
  Specifies what the hardware promises to do when it sees certain instructions, but not how it does it

CISC vs RISC
  Early trend was to add more and more instructions to new CPUs to do elaborate operations
    CISC: Complex Instruction Set Computing
  Now we keep the instruction set small and simple, and let the software do the complicated operations by composing simpler ones
    This makes it easier to build fast hardware
    RISC: Reduced Instruction Set Computing

Instruction Set
  Arithmetic and logic: add, subtract, multiply, divide, AND, OR, XOR
  Data movement: move, load, store
  Control flow: branch and jump
  System: privileged, used to manage processor state, handle exceptions, etc
  Each instruction has a specific format and encoding

Registers
  General Purpose Registers: can be used for various purposes, as determined by the programmer or compiler
  Special Purpose Registers: have specific roles, such as
    Program Counter (PC): holds the address of the next instruction to be executed
    Stack Pointer (SP): points to the top of the stack in memory
    Status Register: holds flags that indicate the results of operations (eg zero flag)
Memory Model (this is virtual memory, not physical)
  Address Space: defines the range of memory locations that can be addressed
    Eg a 32-bit address space can address 2^32 bytes = 4 GB
  Byte Ordering: specifies whether multi-byte values are stored with the most significant byte first (big endian) or last (little endian)
  Alignment Requirements: some ISAs require data to be aligned in memory in specific ways
    Eg 32-bit values may need to be stored at addresses that are multiples of four
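
Byte ordering and alignment are easy to see in Python with the standard struct module. This is just a sketch; the value 0x12345678 and the helper is_aligned are made up for the illustration.

```python
import struct

value = 0x12345678

big = struct.pack(">I", value)      # big endian: most significant byte first
little = struct.pack("<I", value)   # little endian: least significant byte first
print(big.hex(), little.hex())      # 12345678 78563412

def is_aligned(address, size=4):
    """True if `address` is a multiple of `size` (4 bytes for a 32-bit word)."""
    return address % size == 0

print(is_aligned(0x1000), is_aligned(0x1002))  # True False
```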

Addressing Modes
  Immediate: the operand is included directly in the instruction itself
    Often used for small constants and quick arithmetic operations
  Register: the operand is stored in a CPU register
    Often used for frequently accessed variables and intermediate results
  Direct: the instruction contains the memory address where the operand is located
    Often used for accessing static variables or fixed memory locations
  Indirect: the instruction specifies a register that contains the memory address of the operand
    Often used for accessing pointer-based structures
RISC-V
  New, open-source and license-free ISA spec
  Appropriate for all levels of computing system, from microcontrollers to supercomputers
  Simple and elegant
  Designed with modularity and extensibility in mind
  Many optional extensions targeting different use cases
    M: integer multiplication and division
    A: atomic instructions
    F: single-precision floating point
    D: double-precision floating point
    C: compressed instructions
    V: vector operations
  Eg RV64IMC would be a RISC-V 64-bit implementation (base integer ISA, I) with the extensions M and C

There are 32 registers in RISC-V, numbered from 0 to 31 (x0-x31)
  x0 is special: it always holds the value zero (note: x0 is also called zero)
  Each register is 32 bits wide (in RV32)
  Some registers have conventional uses
    x1: return address (ra)
    x2: stack pointer (sp)
    x10-x17: function arguments / return values (a0-a7)
  Since registers are close to the processor, they're very fast (faster than 0.25 ns)
RISC-V arithmetic instruction syntax

    opname rd, rs1, rs2

  opname: operation by name
  rd: destination, the operand getting the result
  rs1: source 1, the 1st operand for the operation
  rs2: source 2, the 2nd operand for the operation

  Eg add x1, x2, x3
    a = b + c, where a is stored in register x1, b in register x2 and c in register x3

Example: how would we implement the following statement?
  a = b + c + d - e   (a is x10, b is x1, c is x2, d is x3 and e is x4)

    add x10, x1, x2     # a = b + c
    add x10, x10, x3    # a = a + d
    sub x10, x10, x4    # a = a - e

Example: f = (g + h) - (i + j), where f is x19, g is x20, h is x21, i is x22 and j is x23
  We'll have to use intermediate temporary registers

    add x5, x20, x21    # temp1 = g + h
    add x6, x22, x23    # temp2 = i + j
    sub x19, x5, x6     # f = temp1 - temp2

Immediates: numerical constants
  Add immediate instruction: similar to the add instruction, but the last operand must be a number, not a register

    addi rd, rs1, number

  Eg add 6 to the value in register x3

    addi x3, x3, 6      # x3 = x3 + 6

  RISC philosophy is to reduce the operations to an absolute minimum
  There is no subtract immediate instruction in RISC-V: just pass a negative number

Register zero (x0): RISC-V hardwires the register x0 to the value 0
  Eg add x3, x4, x0       # f = g (if f is x3 and g is x4)
  Eg addi x3, x0, 0xff    # f = 0xff
  ∴ the instruction add x0, x1, x2 won't do anything
  Also, x0 is useful for clearing a register or negating a number
RISC-V is a load-store architecture
  Load-store (aka register-register) architecture: memory access is limited to load and store instructions. All other instructions (arithmetic, logical, etc) operate only on registers

RISC-V Load Instructions Syntax

    l{size} rd, imm(rs1)

  l: stands for load
  size: specifies the size of the data to load (w for word, h for halfword, b for byte)
  rd: destination register
  imm: immediate offset value
  rs1: the base register containing a memory address

Example: translate the following code to RISC-V

    int A[100];
    g = h + A[3];

  x15 is the base register (pointer to A[0]); 12 is the offset in bytes (3 words x 4 bytes)

    lw  x10, 12(x15)     # x10 gets A[3]
    add x11, x12, x10    # g = h + A[3] (g in x11, h in x12)

RISC-V Store Instructions Syntax

    s{size} rs2, imm(rs1)

  s: stands for store
  size: specifies the size of the data to store (w for word, h for halfword, b for byte)
  rs2: the source register containing the data to store
  imm: an immediate offset value
  rs1: the base register containing a memory address

Example: translate the following code to RISC-V

    int A[100];
    A[10] = h + A[3];

  x10 is used as a temp register; x15 points to A[0]

    lw  x10, 12(x15)     # temp gets A[3]
    add x10, x12, x10    # temp = h + temp = h + A[3]
    sw  x10, 40(x15)     # store temp in A[10]

Loading and Storing Bytes
  Using lb and sb
  On a load, the most significant bit of the byte is extended to fill the rest of the word (aka sign-extend the byte)

Example: what does lb x10, 3(x11) do?
  The contents of the memory location with address 3 + contents of register x11 are copied to the low byte position of register x10

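Example: what will be in x12 after these instructions?

    addi x11, x0, 0x3F5
    sw   x11, 0(x5)
    lb   x12, 1(x5)

  1. x11 becomes 00 | 00 | 03 | F5 (this is in hex for simplicity; each 2 digits represent a byte; 4 bytes is a word)
  2. The word at address x5 becomes 00 | 00 | 03 | F5, with the low byte F5 at offset 0 (little endian)
  3. lb skips the first byte (F5) because the offset is 1, so x12 becomes 00 | 00 | 00 | 03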
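
The same example can be replayed in Python. This is a sketch that assumes little-endian byte order (as RISC-V uses) and models memory as a bytes object; load_byte is a made-up helper standing in for lb.

```python
import struct

def load_byte(memory, offset):
    """Model of lb: fetch one byte and sign-extend it."""
    byte = memory[offset]
    return byte - 0x100 if byte & 0x80 else byte

x11 = 0x3F5                          # addi x11, x0, 0x3F5
memory = struct.pack("<I", x11)      # sw x11, 0(x5): bytes F5 03 00 00
x12 = load_byte(memory, 1)           # lb x12, 1(x5): skips the F5 byte
print(hex(x12))                      # 0x3

# Loading offset 0 instead picks up 0xF5, whose top bit is set,
# so sign extension turns it into a negative number.
print(load_byte(memory, 0))          # -11
```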

Pseudo-instructions: instructions that are not actually implemented in hardware, but are recognized by the assembler and translated into one or more real hardware instructions

Decision making: based on a computation, do or don't do something different (eg if-else)
  Assembly instructions that support decision making are called branches
  Branch: change of control flow
  Conditional Branch: change control flow based on the outcome of a comparison
    beq, bne, blt, bge, bltu, bgeu
  Unconditional Branch: always branch
    j, jal, jr

RISC-V Conditional Branch Instructions Syntax

    b{cond} rs1, rs2, L

  b: stands for branch
  cond: specifies the condition of the branch (eq for equal, ne for not equal, etc)
  rs1, rs2: registers containing the 2 operands for comparison
  L: a label (symbolic address for another position in the code)

Example: translate the following code to RISC-V instructions

    if (i == j)
        f = g + h;

  Assume i is x13, j is x14, f is x10, g is x11 and h is x12

    bne x13, x14, Exit    # if i != j, branch to Exit
    add x10, x11, x12     # i == j, so f = g + h
    Exit:                 # Exit label with nothing after it

  Note: it's common to need to negate the if statement

Example: translate the following code to RISC-V instructions

    if (i == j) f = g + h;
    else f = g - h;

  Assume i is x13, j is x14, f is x10, g is x11 and h is x12

    bne x13, x14, Else    # if i != j, branch to Else
    add x10, x11, x12     # i == j, so f = g + h
    j Exit                # end of the i == j case
    Else: sub x10, x11, x12   # i != j, so f = g - h
    Exit:                 # end

General programs need to test < and > as well

RISC-V Magnitude Comparison Instruction Syntax

    blt reg1, reg2, L

  blt: stands for branch on less than (branch if reg1 < reg2)
  reg1, reg2: registers containing the operands to compare
  L: label to branch to

  There is also bltu, which acts exactly like blt but treats both operands as unsigned (it does not consider negative numbers)
    bltu: branch on less than, unsigned

Loops in Assembly: while, do-while, for

Example: translate the C code to RISC-V instructions

    int A[20];
    int sum = 0;
    for (int i = 0; i < 20; i++)
        sum += A[i];

  Assume x8 is &A[0], x10 is sum and x11 is i

    add  x9, x8, x0       # copy &A[0] to x9
    add  x10, x0, x0      # sum = 0
    add  x11, x0, x0      # int i = 0
    addi x13, x0, 20      # x13 = 20 (loop bound)
    Loop:
    bge  x11, x13, Done   # end if i >= 20
    lw   x12, 0(x9)       # load A[i] into x12
    add  x10, x10, x12    # sum += A[i]
    addi x9, x9, 4        # &A[i+1] in x9
    addi x11, x11, 1      # i++
    j    Loop             # loop again
    Done:                 # end

Logical Operators

    Operation          C     RISC-V
    Bit-by-bit AND     &     and
    Bit-by-bit OR      |     or
    Bit-by-bit XOR     ^     xor
    Shift left         <<    sll
    Shift right        >>    srl

  Useful for moving, extracting and inserting groups of bits

Bitwise AND is used for masks
  andi with 0x000000FF isolates the least significant byte
  and with 0xFF000000 isolates the most significant byte
XOR of x with 1 gives x̅ (NOT x)
  There is no logical NOT instruction in RISC-V because it can be implemented using XOR
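
A quick Python sketch of masking and of NOT built from XOR. The word 0x12345678 is an arbitrary example value, and MASK32 is used only to keep Python's unbounded integers to 32 bits.

```python
MASK32 = 0xFFFFFFFF          # keeps results within 32 bits
word = 0x12345678            # arbitrary example value

low_byte = word & 0x000000FF        # isolates the least significant byte
high_byte = word & 0xFF000000       # isolates the most significant byte (still in place)
inverted = word ^ MASK32            # XOR with all ones flips every bit (NOT)

print(hex(low_byte), hex(high_byte), hex(inverted))
# 0x78 0x12000000 0xedcba987
```

Shifting high_byte right by 24 would bring the isolated byte down to the low position.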
Shift Left Logical (sll) and immediate (slli)

    slli rd, rs1, number

  slli: stands for shift left logical immediate
  rd: destination register
  rs1: source register
  number: amount to shift

  Example: slli x11, x12, 2
    Store in x11 the value from x12 shifted by 2 bits to the left, inserting 0s on the right
    The bits on the left disappear (they do not wrap around)
    before: 0000 0010
    after:  0000 1000

  srl and srli do the exact opposite shift

Arithmetic Shifting
  Shift right arithmetic (sra, srai)
    Moves n bits to the right
    Inserts the high-order sign bit into the empty bits

    srai rd, rs1, number

  srai: stands for shift right arithmetic immediate
  rd: destination register
  rs1: source register
  number: amount to shift

  Example: srai x10, x10, 4
    Replace the contents of register x10 by shifting it 4 bits to the right, keeping the same sign
    before: 1111 1111 1111 1111 1111 1111 1110 0111 = -25
    after:  1111 1111 1111 1111 1111 1111 1111 1110 = -2

  sra is NOT the same as dividing by 2^n: C division rounds toward zero (-25 / 16 = -1), but srai gives -2
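
The difference between the two right shifts, and between shifting and dividing, can be checked in Python. This is a sketch; srai32 and srli32 are made-up helper names modelling 32-bit registers.

```python
def srai32(value, shamt):
    """Arithmetic right shift of a 32-bit two's-complement value (sign bit copied in)."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:        # reinterpret as a negative number
        value -= 1 << 32
    return value >> shamt         # Python's >> on ints is arithmetic

def srli32(value, shamt):
    """Logical right shift of a 32-bit value (zeros copied in)."""
    return (value & 0xFFFFFFFF) >> shamt

print(srai32(-25, 4))             # -2
print(int(-25 / 16))              # -1, C-style division truncates toward zero
print(hex(srli32(-25, 4)))        # 0xffffffe
```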

Six Fundamental Steps in Calling a Function
  1. Put arguments in a place where the function can access them
  2. Transfer control to the function
  3. Acquire the local storage resources needed for the function
  4. Perform the desired task of the function
  5. Put the return value in a place where the calling code can access it, and restore any registers you used (release local storage)
  6. Return control to the point of origin, since a function can be called from several points in a program

Registers are faster than memory, so use them
  a0-a7 (x10-x17): 8 argument registers to pass parameters, and two return values (a0-a1)
  ra (x1): one return address register to return to the point of origin
  s0-s1 (x8-x9), s2-s11 (x18-x27): saved registers

Jump and Link (jal) Instruction
  "Link" means form an address, or link, that points to the calling site, to allow the function to return to the proper address
  Jumps to the address and simultaneously saves the address of the following instruction in register ra
  Eg jal rd, FunctionLabel

Return from function: jump register (jr) instruction
  Unconditional jump to the address specified in a register: jr ra
  Eg jalr rd, rs, imm
Stack: last-in-first-out (LIFO) queue
  It's in memory, so we need a register to point to it
  This is called the stack pointer (sp, x2)
  Convention is to grow the stack down, from high to low addresses
  push: places data onto the stack (decrements sp)
  pop: removes data from the stack (increments sp)
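
A toy Python model of a downward-growing stack. It is a sketch under simple assumptions: memory is a dict keyed by address, every item is one 4-byte word, and the starting address 0x1000 is arbitrary.

```python
class Stack:
    """Toy stack that grows from high to low addresses, one 4-byte word per item."""

    def __init__(self, top=0x1000):
        self.sp = top          # stack pointer starts at the (arbitrary) top address
        self.mem = {}          # address -> word

    def push(self, word):
        self.sp -= 4           # push decrements sp, then stores
        self.mem[self.sp] = word

    def pop(self):
        word = self.mem.pop(self.sp)
        self.sp += 4           # pop loads, then increments sp
        return word

stack = Stack()
stack.push(11)
stack.push(22)
print(hex(stack.sp))              # 0xff8 after two pushes
print(stack.pop(), stack.pop())   # 22 11 (last in, first out)
```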

Stack Frame
  Includes
    return instruction address
    parameters (arguments)
    space for other local variables
  Stack frames are contiguous blocks of memory
  sp tells where the bottom of the stack frame is
  When a procedure is called, a new stack frame opens
  When a procedure returns, the stack frame collapses

Example: translate this C code to RISC-V

    int Leaf(int g, int h, int i, int j) {
        int f;
        f = (g + h) - (i + j);
        return f;
    }

  g, h, i and j are in a0, a1, a2 and a3, and f is in s0

    Leaf:
    addi sp, sp, -8     # adjust stack for 2 items
    sw   s1, 4(sp)      # save s1 for later
    sw   s0, 0(sp)      # save s0 for later
    add  s0, a0, a1     # s0 = g + h
    add  s1, a2, a3     # s1 = i + j
    sub  a0, s0, s1     # return value f = s0 - s1
    lw   s0, 0(sp)      # restore s0 for caller
    lw   s1, 4(sp)      # restore s1 for caller
    addi sp, sp, 8      # adjust stack to remove the 2 items
    jr   ra             # jump back to caller

Register Conventions
  Caller: the calling function
  Callee: the function being called
  When the callee returns from executing, the caller needs to know which registers may have changed and which are guaranteed to be unchanged
  Register conventions are a set of generally accepted rules as to which registers will be unchanged after a procedure call (jal)

  Preserved across a function call (callee saved)
    Caller can rely on the values being unchanged
    sp, gp, tp, s0-s11 (s0 is also fp)
  Not preserved across a function call (caller saved)
    Caller cannot rely on the values being unchanged
    a0-a7, ra, t0-t6

Non-Leaf Procedures
  Procedures that call other procedures (nested): both a caller and a callee
  For a nested call, the caller needs to save on the stack
    1. its return address
    2. any arguments and temporary registers needed after the call
  Restore from the stack after the call

RISC-V's 32-bit instruction words are divided into fields
  Each field tells the processor something about the instruction

Six basic types of instruction formats
  R-Format: register-register arithmetic operations
  I-Format: register-immediate arithmetic operations, loads
  S-Format: stores
  B-Format: branches (minor variant of S-Format)
  U-Format: 20-bit upper immediate instructions
  J-Format: jumps (minor variant of U-Format)
R-Format Instruction Layout: opname rd, rs1, rs2

    bits:   31-25   24-20  19-15  14-12   11-7  6-0
    field:  funct7  rs2    rs1    funct3  rd    opcode

  funct7, funct3 and opcode describe what operation to perform
  All R-Format instructions have opcode 0110011
  Register fields (rs1, rs2, rd) hold 5-bit unsigned integers (0-31) corresponding to a register (x0-x31)

All 10 RV32 R-Format instructions: add, sub, sll, slt, sltu, xor, srl, sra, or, and
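
As a sketch of how the R-Format fields pack into a 32-bit word, here is a small Python encoder. The function name encode_r is my own; the field positions and the funct7/funct3 values for add and sub are those of the standard RV32I encoding.

```python
def encode_r(funct7, rs2, rs1, funct3, rd, opcode=0b0110011):
    """Pack the six R-Format fields into one 32-bit instruction word."""
    return ((funct7 << 25) | (rs2 << 20) | (rs1 << 15)
            | (funct3 << 12) | (rd << 7) | opcode)

# add x6, x7, x8  ->  funct7 = 0, funct3 = 0
add_word = encode_r(funct7=0b0000000, rs2=8, rs1=7, funct3=0b000, rd=6)
# sub x6, x7, x8  ->  same fields, but funct7 = 0100000
sub_word = encode_r(funct7=0b0100000, rs2=8, rs1=7, funct3=0b000, rd=6)

print(f"{add_word:#010x}")   # 0x00838333
print(f"{sub_word:#010x}")   # 0x40838333
```

add and sub differ only in one funct7 bit, which shows how funct7 and funct3 select the operation while the opcode stays the same.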

But immediates need to be wider ∴ the I-Format

    bits:   31-20      19-15  14-12   11-7  6-0
    field:  imm[11:0]  rs1    funct3  rd    opcode

  imm[11:0] holds 12-bit-wide immediate values, in the range [-2048, +2047]
  The CPU sign-extends to 32 bits before use in an arithmetic operation

All 9 RV32 I-Format arithmetic instructions: addi, slti, sltiu, xori, ori, andi, slli, srli, srai

Load instructions are also I-Format

All 5 RV32 load instructions: lb, lh, lw, lbu, lhu
  lb (load byte), lh (load halfword, 16 bits): sign-extend to fill the upper bits of the 32-bit register
  lbu, lhu: same as the previous but unsigned, zero-extend to fill the upper bits of the 32-bit register
  Note: there is no lwu instruction in RV32

S-Format Instruction Layout

    bits:   31-25      24-20  19-15  14-12   11-7      6-0
    field:  imm[11:5]  rs2    rs1    funct3  imm[4:0]  opcode

  The immediate is split up because RISC-V prioritizes keeping register fields in the same place
  Store address = Base Register + Immediate Offset
  A store needs an immediate and two read registers, but doesn't need a destination register

All 3 RV32 store instructions: sb, sh, sw

PC-Relative Addressing: supply a signed offset to update the program counter (PC)
  Position-Independent Code: if all of the code moves, relative offsets don't change
  Branches generally change the PC by a small amount, therefore we encode relative offsets as signed immediates

Contrast with Absolute Addressing: supply a new address to overwrite the PC
  Use sparingly: brittle to code movement

RISC-V scales the branch offset by 2 bytes

B-Format Instruction Layout

    bits:   31-25         24-20  19-15  14-12   11-7         6-0
    field:  imm[12|10:5]  rs2    rs1    funct3  imm[4:1|11]  opcode

  All conditional branch instructions have opcode 1100011
  Immediate represents a relative offset in increments of 2 bytes (half-words)
  new PC = PC + byte offset
  12 immediate bits imply ±2^10 32-bit instructions
    1 bit for 2's complement (to allow ± offsets)
    1 bit for half-word (16-bit) instruction support
  If imm[12|10:5] = a bbbbbb and imm[4:1|11] = cccc d, then the byte offset is the 13-bit value a d bbbbbb cccc 0, sign-extended from a
  Instruction bit 31 is always the sign bit (the highest bit, used to sign-extend the immediate)

All 6 RV32 B-Format instructions: beq, bne, blt, bge, bltu, bgeu

B-Format has limited range: ±2^10 32-bit instructions from the current instruction
  If the destination is further away, use an unconditional jump
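
The scrambled B-Format immediate is easier to follow in code. Below is a Python sketch that splits a byte offset into the two instruction fields, imm[12|10:5] and imm[4:1|11], and reassembles it; the function names are my own.

```python
def encode_b_imm(offset):
    """Split a branch byte offset into (imm[12|10:5], imm[4:1|11])."""
    if offset % 2 != 0 or not -4096 <= offset <= 4094:
        raise ValueError("offset must be even and fit in 13 signed bits")
    imm = offset & 0x1FFF                                    # 13-bit two's complement
    hi = (((imm >> 12) & 1) << 6) | ((imm >> 5) & 0x3F)      # bit 12, then bits 10:5
    lo = (((imm >> 1) & 0xF) << 1) | ((imm >> 11) & 1)       # bits 4:1, then bit 11
    return hi, lo

def decode_b_imm(hi, lo):
    """Rebuild the signed byte offset from the two fields (bit 0 is always 0)."""
    imm = ((((hi >> 6) & 1) << 12) | ((lo & 1) << 11)
           | ((hi & 0x3F) << 5) | (((lo >> 1) & 0xF) << 1))
    return imm - (1 << 13) if imm & (1 << 12) else imm

print(encode_b_imm(8))                     # (0, 8)
print(decode_b_imm(*encode_b_imm(-8)))     # -8
```

The range check matches the ±2^10 instruction reach: offsets from -4096 to +4094 bytes, always even.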

J-Format Instruction Layout: jal rd, label

    bits:   31-12                  11-7  6-0
    field:  imm[20|10:1|11|19:12]  rd    opcode

  Immediate represents a relative offset in increments of 2 bytes
  PC = PC + byte offset
  20 immediate bits imply ±2^18 32-bit instructions reachable
    (1 bit for 2's complement and 1 bit for half-word, 16-bit, instruction support)
  rd gets the return address: rd = PC + 4

But what if we still want to jump further?

U-Format Instruction Layout: opname rd, immed   (Upper Immediate)

    bits:   31-12       11-7  6-0
    field:  imm[31:12]  rd    opcode

  Immediate represents the upper 20 bits of a 32-bit immediate
  imm = immed << 12

lui instruction (Load Upper Immediate): lui rd, immed
  Writes a 20-bit immediate value into the upper 20 bits of register rd and clears the lower 12 bits
  rd = immed << 12

lui together with addi (to set the lower 12 bits) can create any 32-bit value in a register

Example

    lui  x10, 0x87654        # x10 = 0x87654000
    addi x10, x10, 0x321     # x10 = 0x87654321

The li (load immediate) pseudo-instruction resolves to lui + addi as needed, eg li x10, 0x87654321

However, there's an edge case because addi sign-extends its 12-bit immediate
  Solution: if the 12-bit immediate is negative, just add one to the upper 20-bit load
  Or just use the li pseudo-instruction; it automatically handles this
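
The edge case is easier to see by computing the split. The Python sketch below models what an assembler might do for li; split_li and lui_addi are made-up names, not part of any real toolchain.

```python
def split_li(value):
    """Split a 32-bit value into (upper 20 bits for lui, signed 12 bits for addi)."""
    value &= 0xFFFFFFFF
    lower = value & 0xFFF
    upper = value >> 12
    if lower & 0x800:                   # addi would sign-extend this as negative...
        lower -= 0x1000
        upper = (upper + 1) & 0xFFFFF   # ...so add one to the upper 20-bit load
    return upper, lower

def lui_addi(upper, lower):
    """What lui followed by addi leaves in the register."""
    return ((upper << 12) + lower) & 0xFFFFFFFF

print([hex(x) for x in split_li(0x87654321)])   # ['0x87654', '0x321']
upper, lower = split_li(0xDEADBEEF)             # lower 12 bits 0xEEF look negative
print(hex(upper), lower)                        # 0xdeadc -273
print(hex(lui_addi(upper, lower)))              # 0xdeadbeef
```

Without the +1 correction, lui 0xDEADB followed by addi with 0xEEF would produce 0xDEADAEEF, because addi adds -273 rather than +3823.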

auipc (Add Upper Immediate to PC): auipc rd, immed
  rd = PC + (immed << 12)

  Example: auipc x5, 0xABCDE     # x5 = PC + 0xABCDE000
  In practice: Label: auipc x5, 0     # x5 = address of Label

  Note: unlike jal (relative to the PC), jalr addresses are relative to rs1, which is modifiable by arithmetic instructions ∴ jalr lets us do bigger jumps

That's the end of the RISC-V ISA

Translator: converts a program from the source language to an equivalent program in another language
  Note: translating to lower-level languages almost always means higher efficiency and performance

Interpreter: directly executes the program in the source language
  Note: easier to debug and port to different platforms

Compiler
  Input: high-level language code (eg foo.c)
  Output: assembly language code (eg foo.s)
  Note: output may include pseudo-instructions

Assembler
  Input: assembly language code (eg foo.s)
  Output: machine language module, object file (eg foo.o)
  Reads and uses directives
  Replaces pseudo-instructions with true assembly
  Directives: give directions to the assembler
    Often generated by the compiler
    Directives do not produce machine instructions. Rather, they inform how to build different parts of the object file
    .text: subsequent items are put in the user Text segment (machine code)
    .data: subsequent items are put in the user Data segment (source file data in binary)
    .globl sym: declares sym global, so it can be referenced from other files
    .string str: store the string str in memory and null-terminate it
    .word w1 ... wn: store the n 32-bit quantities in successive memory words

Object File Format
  1. Header
  2. Text segment
  3. Data segment
  4. Symbol table
  5. Relocation information
  6. Debugging information

1. Object file header
  Size and position of the other pieces of the object file

2. Text segment
  Machine code
  Simple case (all necessary info is already in the instruction): arithmetic, logical, shifts, etc
  PC-relative branches and jumps: once pseudo-instructions are replaced, all PC-relative addressing can be computed
  Take two passes over the program
    Pass 1: remember the positions of labels (store them in the symbol table)
    Pass 2: use the label positions to generate machine code
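
The two passes can be sketched in Python. This toy version only resolves labels into PC-relative byte offsets and does not emit real machine code; it assumes every instruction is 4 bytes, that a label sits alone on its line ending in a colon, and that no label shares a name with a register.

```python
def two_pass(lines):
    """Toy two-pass assembler: returns (symbol table, instructions with labels resolved)."""
    symbols, instrs = {}, []
    pc = 0
    for line in lines:                       # Pass 1: record label positions
        line = line.strip()
        if line.endswith(":"):
            symbols[line[:-1]] = pc
        elif line:
            instrs.append((pc, line))
            pc += 4                          # every instruction is 4 bytes
    resolved = []
    for pc, text in instrs:                  # Pass 2: replace labels by PC-relative offsets
        op, *args = text.replace(",", " ").split()
        args = [str(symbols[a] - pc) if a in symbols else a for a in args]
        resolved.append((pc, op, args))
    return symbols, resolved

program = [
    "bne x13, x14, Else",
    "add x10, x11, x12",
    "j Exit",
    "Else:",
    "sub x10, x11, x12",
    "Exit:",
]
symbols, resolved = two_pass(program)
print(symbols)        # {'Else': 12, 'Exit': 16}
print(resolved[0])    # (0, 'bne', ['x13', 'x14', '12'])
```

Two passes are needed because the bne refers forward to Else, whose position is unknown until the whole program has been scanned once.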

3. Data segment
  However, some references (such as to other files or to static data) cannot yet be determined ∴ the assembler jots them down in the Relocation Information and the Symbol Table

4. Symbol table
  List of items in this file
  Instruction labels
    Used to compute machine code for PC-relative addressing in branches, function calling, etc
    .globl directive: labels can be referenced by other files
  Data: anything in the .data section

5. Relocation table
  Lines of code to fix later (by the linker)
  List of items whose address this file needs
    Any external label jumped to
    Any piece of data in the static section

Linker
  Input: object files (eg foo.o)
  Output: executable machine code (eg a.out)
  Enables separate compilation of files
    A change to one file does not require recompilation of the entire program
  Puts together the text segments from each .o file
  Puts together the data segments from each .o file and concatenates them to the end of the text segment
  Resolves references
    ie go through the relocation table and fill in all absolute addresses
  Note: don't forget that B-type instructions don't need editing, as PC-relative addressing is preserved even if the text is relocated

So far we have described the traditional method of statically linked libraries

The alternative: dynamically linked libraries (DLL)
  Less disk space, but needs extra time to link
  Uses machine code as the lowest common denominator
  Overall, makes the compiler, linker and OS more complex, but provides many benefits that often outweigh these complexities
Loader
  Input: executable code (eg a.out), stored on disk
  Output: runs the program
  When an executable is run, the loader loads it into memory and starts it
  Basically, the loader is the operating system (OS)

  In more detail
    Loads the program into a newly created address space
      Reads the executable's file header for the sizes of the text and data segments
      Creates a new address space for the program, large enough to hold the text and data segments along with a stack segment
      Copies instructions and data from the executable file into the new address space
      Copies arguments passed to the program onto the stack
    Initializes machine registers
      Most registers are cleared; the stack pointer is assigned the address of the first free stack location
    Jumps to a start-up routine, which does the following
      Copies program arguments from the stack to registers
      Sets the PC
      If the main routine returns, terminates the program with the exit system call

[Figure: foo.c -> compiler -> foo.s -> assembler -> foo.o -> linker -> a.out -> loader -> memory]

Example: when are the machine code bits determined for the following assembly instructions?
  a) add x6, x7, x8
    After assembly, as add is a simple instruction with all necessary info built in
  b) jal x1, fprintf
    After linking, as the linker must resolve the absolute address of fprintf

Conclusion
  Compiler: converts a single high-level language file into a single assembly language file
  Assembler: removes pseudo-instructions
