Embedded System Design: Hardware & Software
(ICE 5113)
ARM Cortex-M3 Addressing Modes, Instruction Set, Programming Examples
Bipin Krishna
Assistant Professor (Sr.)
ICE Department
Manipal Institute of Technology
MAHE, Karnataka, India
ARM Cortex-M Assembly Language
• Syntax:
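• Assembly language instructions have four fields, separated by spaces or tabs: label, opcode, operands, and comment.
• Func MOV R0,#100 ; label Func, opcode MOV, operands R0,#100, and this comment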
Instruction Set
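• When describing assembly instructions we will use the following symbols:
• Rd, Rn, Rm, Ra: registers (Rd is the destination); {Rd,} means the destination register is optional
• #im12: a 12-bit constant, 0 to 4095
• <op2>: the flexible second operand (a constant, or a register with an optional shift)
• {cond}: an optional condition, such as EQ, NE, HS, or LO
• {S}: an optional suffix that makes the instruction update the condition code bits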
Addressing Modes
• The addressing mode is the format the instruction uses to specify the
memory location to read or write data.
• The addressing mode is associated more specifically with the operands, and a single instruction can use a different addressing mode for each of its operands.
• Register addressing
• Immediate addressing
• Indexed addressing
• PC-relative addressing
Addressing Modes
• Register addressing:
• Most instructions operate on the registers.
• In general, data flows toward the opcode (right to left). In other words, the register closest to the opcode gets the result of the operation.
• ADD R2,R0,R1 ; R2 = R0+R1, all three operands use register addressing
• Register list. The stack push and stack pop instructions can operate on one register or on a list of registers, e.g., PUSH {R4,R5,LR}.
Addressing Modes
• Immediate addressing:
• With immediate addressing mode, the data itself is contained in the
instruction.
• Once the instruction is fetched, no additional memory access cycles are required to get the data.
• Immediate addressing is only used to get, load, or read data.
• It will never be used with an instruction that stores to memory.
• MOV R0,#100 ; R0 = 100, the constant 100 is part of the instruction
Addressing Modes
• Indexed addressing:
• With indexed addressing mode, the data is in memory and a register
will contain a pointer to the data.
• Once the instruction is fetched, one or more additional memory
access cycles are required to read or write the data.
• LDR R0,[R1] ; R0= value pointed to by R1 (R1 is not modified by this instruction)
• LDR R0,[R1,#4] ; R0= word pointed to by R1+4 (R1 itself is not modified by this instruction)
• LDR R0,[R1,#4]! ; first R1=R1+4, then R0= word pointed to by R1
• LDR R0,[R1],#4 ; R0= word pointed to by R1, then R1=R1+4
• LDR R0,[R1,R2] ; R0= word pointed to by R1+R2
• LDR R0,[R1,R2, LSL #2] ; R0= word pointed to by R1+4*R2
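The indexed forms above map closely onto C pointer arithmetic. The following is a host-side C sketch, not Cortex-M code. R1 is modeled as a `const uint32_t *`, so a 4-byte offset becomes one array element. The function names are hypothetical, and the unscaled `LDR R0,[R1,R2]` form, whose offset is in bytes, is not modeled.

```c
#include <stdint.h>

/* Hypothetical C model of the indexed addressing modes above.
   R1 is a pointer to 32-bit words, so "+4 bytes" is "+1 element". */

/* LDR R0,[R1,#4]   : offset addressing, R1 is not modified */
uint32_t ldr_offset(const uint32_t *r1) {
    return r1[1];
}

/* LDR R0,[R1,#4]!  : pre-indexed, first R1 = R1+4, then load */
uint32_t ldr_pre_indexed(const uint32_t **r1) {
    *r1 += 1;
    return **r1;
}

/* LDR R0,[R1],#4   : post-indexed, load first, then R1 = R1+4 */
uint32_t ldr_post_indexed(const uint32_t **r1) {
    uint32_t r0 = **r1;
    *r1 += 1;
    return r0;
}

/* LDR R0,[R1,R2,LSL #2] : address R1 + 4*R2, i.e. word element R2 */
uint32_t ldr_scaled(const uint32_t *r1, uint32_t r2) {
    return r1[r2];
}
```

The scaled form computes R1+4*R2, so it fits indexing an array of 32-bit words, which is the case `ldr_scaled` mirrors.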
Addressing Modes
• PC-relative addressing:
• PC-relative addressing is indexed addressing mode using the PC as the
pointer.
• The PC always points to the instruction that will be fetched next, so
changing the PC will cause the program to branch.
• A simple example of PC-relative addressing is the unconditional
branch.
• B Location ; jump to Location, using PC-relative addressing
• BL Subroutine ; call Subroutine, using PC-relative addressing
• Typically, it takes two instructions to access data in RAM or I/O.
• The first instruction uses PC relative addressing to create a pointer to
the object, and the second instruction accesses the memory using the
pointer.
• LDR R1,=Count ; R1 points to variable Count, using PC-relative addressing
• LDR R0,[R1] ; R0= value pointed to by R1
Addressing Modes
• Flexible second operand <op2>:
• <op2> can be a constant or a register with optional shift.
• ADD Rd, Rn, #constant ;Rd = Rn+constant
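• We can also specify a flexible second operand in the form Rm {,shift}. If Rd is missing, Rn is also the destination. For example:
• ADD Rd, Rn, Rm {,shift} ;Rd = Rn+Rm
• ADD Rn, Rm {,shift} ;Rn = Rn+Rm
• where Rm is the register holding the data for the second operand, and shift is an optional shift to be applied to Rm. The optional shift can be one of these five formats:
• ASR #n ;arithmetic shift right n bits, 1 ≤ n ≤ 32
• LSL #n ;logical shift left n bits, 1 ≤ n ≤ 31
• LSR #n ;logical shift right n bits, 1 ≤ n ≤ 32
• ROR #n ;rotate right n bits, 1 ≤ n ≤ 31
• RRX ;rotate right one bit, with extend (through the C bit)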
• ADD R0,R1,LSL #4 ; R0 = R0 + R1*16 (R1 unchanged)
• ADD R0,R1,R2,ASR #4 ; signed R0 = R1 + R2/16, with the division rounded toward minus infinity (R2 unchanged)
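The shift options can be modeled in C on a host to check results like the two examples above. This is a sketch that assumes shift counts from 1 to 31 and omits RRX. The helper names are illustrative only. ASR is written with unsigned operations because right-shifting a negative signed integer is implementation-defined in C.

```c
#include <stdint.h>

/* Host-side model of the <op2> shift options, valid for n = 1..31.
   Helper names are illustrative, not a real API. */
uint32_t op2_lsl(uint32_t x, unsigned n) { return x << n; }
uint32_t op2_lsr(uint32_t x, unsigned n) { return x >> n; }

uint32_t op2_asr(uint32_t x, unsigned n) {
    uint32_t r = x >> n;
    if (x & 0x80000000u) {            /* negative: fill vacated bits with 1s */
        r |= ~(0xFFFFFFFFu >> n);
    }
    return r;
}

uint32_t op2_ror(uint32_t x, unsigned n) {
    return (x >> n) | (x << (32u - n));
}

/* ADD R0,R1,LSL #4     ->  R0 = R0 + (R1 << 4) */
uint32_t add_r0_r1_lsl4(uint32_t r0, uint32_t r1) {
    return r0 + op2_lsl(r1, 4u);
}

/* ADD R0,R1,R2,ASR #4  ->  R0 = R1 + (R2 >> 4), arithmetic (signed) */
uint32_t add_r1_r2_asr4(uint32_t r1, uint32_t r2) {
    return r1 + op2_asr(r2, 4u);
}
```

For example, `op2_asr((uint32_t)-17, 4)` yields the bit pattern of -2, while C's integer division `-17/16` gives -1.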
Memory Access Instructions
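• There are four types of memory objects, and typically we use a specific register to access each type:
• constants in code space: PC
• local variables on the stack: SP
• global variables in RAM: R0–R12
• I/O ports: R0–R12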
• The ADR instruction uses PC-relative addressing and is a handy way to
generate a pointer to a constant in code space or an address within the
program.
• The general form for ADR is:
• ADR{cond} Rd, label
• where {cond} is an optional condition (see Table 1), Rd is the destination register, and label is a label in code space within the range of -4095 to +4095 bytes from the address in the PC.
Memory Access Instructions
• We use the LDR instruction to load data from memory into a register, and the STR instruction to store data from a register to memory.
• There is a special form of LDR that instructs the assembler to load a constant or address into a register.
• This form is LDR{cond} Rd, =number or LDR{cond} Rd, =label; the assembler typically places the value in a nearby literal pool and loads it using PC-relative addressing.
• On the TM4C family, the Port A data register is at address 0x4000.43FC.
Input LDR R5,=0x400043FC ;R5=0x400043FC, R5 points to PortA
LDR R6,[R5] ;Input from PortA into R6
; ...
BX LR
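A C compiler emits this same two-instruction access for a read through a pointer. What follows is a minimal sketch that assumes a `volatile` pointer to the memory-mapped register. `PORTA_DATA_ADDR` and `port_read` are hypothetical names, and the address cast is only meaningful on the TM4C target.

```c
#include <stdint.h>

/* Port A data register address on the TM4C, as given in the text. */
#define PORTA_DATA_ADDR 0x400043FCu

/* C counterpart of:  LDR R5,=0x400043FC ; LDR R6,[R5]
   `volatile` forces a real memory read on every call, because an
   input port can change without the program writing to it. */
uint32_t port_read(const volatile uint32_t *port) {
    return *port;
}
```

On the target, the call would be `port_read((const volatile uint32_t *)PORTA_DATA_ADDR)`. On a host, an ordinary variable can stand in for the port.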
Memory Access Instructions
The move instructions get their data from the machine instruction itself or from within the processor, so they do not require additional memory access cycles. For example, MOV R0,#100 sets R0 to 100, and MOV R1,R0 copies R0 into R1.
Logical Operations
• All logical instructions place the result into the destination register Rd. If Rd is omitted, the result is placed into Rn, which is the register holding the first operand.
• If the optional S suffix is specified, the N and Z condition code bits are updated based on the result of the operation.
• A flexible second operand that includes a shift may also update the C bit.
• These logical instructions leave the V bit unchanged.
Logical Operations
• For example, assume R1 is 0x12345678 and R2 is 0x87654321.
• The instruction ORR R0,R1,R2 performs a bitwise OR of R1 and R2, placing the result 0x97755779 in R0.
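The ORR example can be checked in C. The same inputs are also run through AND, EOR, and BIC, the other Cortex-M logical instructions (BIC is bit clear, Rn AND NOT op2). This sketch also models how the S suffix sets N and Z. The `NZ` struct and function names are hypothetical, and the C-bit effect of a shifted <op2> is not modeled.

```c
#include <stdint.h>
#include <stdbool.h>

/* N and Z as updated by the S suffix; V is untouched by logical ops. */
typedef struct { bool n, z; } NZ;

static uint32_t set_nz(uint32_t r, NZ *f) {
    f->n = (r & 0x80000000u) != 0u;   /* N = bit 31 of the result */
    f->z = (r == 0u);                 /* Z = result is zero */
    return r;
}

uint32_t ands(uint32_t a, uint32_t b, NZ *f) { return set_nz(a & b, f); }   /* ANDS */
uint32_t orrs(uint32_t a, uint32_t b, NZ *f) { return set_nz(a | b, f); }   /* ORRS */
uint32_t eors(uint32_t a, uint32_t b, NZ *f) { return set_nz(a ^ b, f); }   /* EORS */
uint32_t bics(uint32_t a, uint32_t b, NZ *f) { return set_nz(a & ~b, f); }  /* BICS: clear bits set in b */
```

With R1 = 0x12345678 and R2 = 0x87654321, ANDS gives 0x02244220, EORS gives 0x95511559, and BICS gives 0x10101458. ORRS gives 0x97755779 with N=1, because bit 31 of the result is set.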
Shift Operations
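• When shifting right, use logical shifts (LSR) for unsigned numbers and arithmetic shifts (ASR) for signed numbers.
• LSL{S}{cond} Rd, Rm, Rs ;logical shift left Rd=Rm<<Rs (unsigned or signed)
• LSL{S}{cond} Rd, Rm, #n ;logical shift left Rd=Rm<<n (unsigned or signed)
• LSR{S}{cond} Rd, Rm, Rs ;logical shift right Rd=Rm>>Rs (unsigned)
• LSR{S}{cond} Rd, Rm, #n ;logical shift right Rd=Rm>>n (unsigned)
• ASR{S}{cond} Rd, Rm, Rs ;arithmetic shift right Rd=Rm>>Rs (signed)
• ASR{S}{cond} Rd, Rm, #n ;arithmetic shift right Rd=Rm>>n (signed)
• ROR{S}{cond} Rd, Rm, Rs ;rotate right Rm by Rs bits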
Arithmetic Operations
• ADD{S}{cond} {Rd,} Rn, <op2> ;Rd = Rn + op2
• ADD{S}{cond} {Rd,} Rn, #im12 ;Rd = Rn + im12
• SUB{S}{cond} {Rd,} Rn, <op2> ;Rd = Rn - op2
• SUB{S}{cond} {Rd,} Rn, #im12 ;Rd = Rn - im12
• RSB{S}{cond} {Rd,} Rn, <op2> ;Rd = op2 - Rn
• RSB{S}{cond} {Rd,} Rn, #im12 ;Rd = im12 - Rn
• CMP{cond} Rn, <op2> ;Rn - op2
• CMN{cond} Rn, <op2> ;Rn - (-op2)
• If the optional S suffix is present, addition and subtraction set the
condition code bits as shown in Table
Arithmetic Operations
• The addition and subtraction instructions work for both signed and
unsigned values. As designers, we must know in advance whether we have
signed or unsigned numbers.
• The computer cannot tell from the binary which type it is, so it sets both C
and V.
• Our job as programmers is to look at the C bit if the values are unsigned
and look at the V bit if the values are signed.
Arithmetic Operations
• If the two inputs to an addition operation are considered as unsigned, then
the C bit (carry) will be set if the result does not fit.
• In other words, after an unsigned addition, the C bit is set if the answer is
wrong.
• If the two inputs to a subtraction operation are considered as unsigned, then the C bit (carry) will be clear if the result does not fit. On ARM, C acts as a NOT-borrow flag after subtraction: it is set when no borrow occurs.
• If the two inputs to an addition or subtraction operation are considered as
signed, then the V bit (overflow) will be set if the result does not fit.
• In other words, after a signed addition, the V bit is set if the answer is
wrong.
• If the result is unsigned, N=1 means the result is greater than or equal to 2^31.
• Conversely, if the result is signed, N=1 means the result is negative.
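The C and V rules above translate directly into C. This is a host-side sketch of one common way to compute the flags for ADDS and SUBS on 32-bit values. The `Flags` struct and function names are hypothetical. For subtraction, C follows the ARM convention of being set when no borrow occurs.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct { bool n, z, c, v; } Flags;

/* ADDS Rd,Rn,<op2>: flags for an addition modulo 2^32 */
uint32_t adds(uint32_t a, uint32_t b, Flags *f) {
    uint32_t r = a + b;
    f->n = (r >> 31) & 1u;                          /* sign bit of result */
    f->z = (r == 0u);
    f->c = (r < a);                                 /* unsigned result did not fit */
    f->v = ((~(a ^ b) & (a ^ r)) >> 31) & 1u;       /* same-sign inputs, sign flipped */
    return r;
}

/* SUBS Rd,Rn,<op2>: C = NOT borrow, as on ARM */
uint32_t subs(uint32_t a, uint32_t b, Flags *f) {
    uint32_t r = a - b;
    f->n = (r >> 31) & 1u;
    f->z = (r == 0u);
    f->c = (a >= b);                                /* clear when the unsigned result does not fit */
    f->v = (((a ^ b) & (a ^ r)) >> 31) & 1u;        /* different-sign inputs, result sign != a */
    return r;
}
```

For example, 0x7FFFFFFF + 1 sets V but clears C, which is a signed overflow only. Conversely, 0xFFFFFFFF + 1 sets C but clears V: it is an unsigned overflow only, since -1 + 1 = 0 is a correct signed result.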
Arithmetic Operations
Example 1: Write code that reads from variable N, adds 10, and stores the result in variable M. Both variables are 32-bit.
• First, we perform a 32-bit read, bringing N into Register R1.
• Second, we add 10, and lastly we store the result into M.
• LDR R3, =N ; R3 = &N (R3 points to N)
• LDR R1, [R3] ; R1 = N
• ADD R0, R1, #10 ; R0 = N+10
• LDR R2, =M ; R2 = &M (R2 points to M)
• STR R0, [R2] ; M = N+10
Arithmetic Operations
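• Multiply (MUL), multiply with accumulate (MLA), and multiply with subtract (MLS) use 32-bit operands and produce a 32-bit result (the low 32 bits of the product).
• UDIV and SDIV divide 32-bit unsigned and signed values, respectively, rounding the quotient toward zero.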
• MUL{S}{cond} {Rd,} Rn, Rm ;Rd = Rn * Rm
• MLA{cond} Rd, Rn, Rm, Ra ;Rd = Ra + Rn*Rm
• MLS{cond} Rd, Rn, Rm, Ra ;Rd = Ra - Rn*Rm
• UDIV{cond} {Rd,} Rn, Rm ;Rd = Rn/Rm unsigned
• SDIV{cond} {Rd,} Rn, Rm ;Rd = Rn/Rm signed
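The 32-bit multiply and divide instructions can be modeled in C as shown below. This is a sketch with hypothetical function names. It assumes the DIV_0_TRP bit in the Configuration and Control Register is clear (the reset default), so a divide by zero returns 0. With that bit set, a divide by zero raises a UsageFault instead.

```c
#include <stdint.h>

/* Low 32 bits of the product, as MUL keeps */
uint32_t mul32(uint32_t n, uint32_t m) { return (uint32_t)((uint64_t)n * m); }

/* MLA Rd,Rn,Rm,Ra : Rd = Ra + Rn*Rm */
uint32_t mla32(uint32_t n, uint32_t m, uint32_t a) { return a + mul32(n, m); }

/* MLS Rd,Rn,Rm,Ra : Rd = Ra - Rn*Rm */
uint32_t mls32(uint32_t n, uint32_t m, uint32_t a) { return a - mul32(n, m); }

/* UDIV: quotient rounded toward zero; 0 on divide by zero (DIV_0_TRP clear) */
uint32_t udiv32(uint32_t n, uint32_t m) { return (m == 0u) ? 0u : n / m; }

/* SDIV: rounds toward zero like C99 division */
int32_t sdiv32(int32_t n, int32_t m) {
    if (m == 0) return 0;                          /* DIV_0_TRP clear */
    if (n == INT32_MIN && m == -1) return INT32_MIN; /* hardware wraps; C would overflow */
    return n / m;
}
```

UDIV followed by MLS is a common way to get a remainder. For example, UDIV R2,R0,R1 then MLS R3,R2,R1,R0 leaves R0 mod R1 in R3, which is what `mls32(udiv32(17, 5), 5, 17)` computes.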
Arithmetic Operations
• The following four multiply instructions use 32-bit operands and produce a 64-bit result.
• The two registers RdLo and RdHi contain the least significant and most significant parts, respectively, of the 64-bit result, signified as Rd.
• These multiply instructions do not set condition code flags.
• UMULL{cond} RdLo, RdHi, Rn, Rm ;Rd = Rn * Rm unsigned
• SMULL{cond} RdLo, RdHi, Rn, Rm ;Rd = Rn * Rm signed
• UMLAL{cond} RdLo, RdHi, Rn, Rm ;Rd = Rd + Rn*Rm unsigned
• SMLAL{cond} RdLo, RdHi, Rn, Rm ;Rd = Rd + Rn*Rm signed
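A C model of the long multiplies shows how the 64-bit product is split across RdLo and RdHi. The function names are hypothetical. SMLAL would follow the same pattern as UMLAL, but with signed operands.

```c
#include <stdint.h>

/* UMULL RdLo,RdHi,Rn,Rm : unsigned 32x32 -> 64 */
void umull64(uint32_t *lo, uint32_t *hi, uint32_t n, uint32_t m) {
    uint64_t r = (uint64_t)n * m;
    *lo = (uint32_t)r;            /* least significant 32 bits */
    *hi = (uint32_t)(r >> 32);    /* most significant 32 bits */
}

/* SMULL RdLo,RdHi,Rn,Rm : signed 32x32 -> 64 */
void smull64(uint32_t *lo, uint32_t *hi, int32_t n, int32_t m) {
    uint64_t r = (uint64_t)((int64_t)n * m);   /* two's-complement bit pattern */
    *lo = (uint32_t)r;
    *hi = (uint32_t)(r >> 32);
}

/* UMLAL RdLo,RdHi,Rn,Rm : RdHi:RdLo = RdHi:RdLo + Rn*Rm (unsigned) */
void umlal64(uint32_t *lo, uint32_t *hi, uint32_t n, uint32_t m) {
    uint64_t acc = ((uint64_t)*hi << 32) | *lo;
    acc += (uint64_t)n * m;
    *lo = (uint32_t)acc;
    *hi = (uint32_t)(acc >> 32);
}
```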
Programming Examples
Example 2: MOV, LDR, STR instructions
• #1
• MOV R0,#10
• LDR R1,=NUM1 ; R1 = address of label NUM1, here 0x00000108
• NUM1 DCD 0x12345678 ; the word at 0x00000108 is 0x12345678
• #2
• MOV R0,#10
• MOV R1,#11
• LDR R2,=NUM1 ; R2 = address of label NUM1, here 0x0000010C
• LDR R3,=NUM2 ; R3 = address of label NUM2, here 0x00000110
• NUM1 DCD 0x12345678 ; the word at 0x0000010C is 0x12345678
• NUM2 DCD 0x87654321 ; the word at 0x00000110 is 0x87654321
Programming Examples
Example 2: MOV, LDR, STR instructions
• #3
• MOV R0,#10
• LDR R1,=0x10000000; R1 points to the location given, R1=0x10000000
• STR R0,[R1]; location 0x10000000=0x0A
• #4
• MOV R0,#10
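• LDR R1,NUM1 ; load the word stored at label NUM1 (PC-relative), R1=0x10000000
• STR R0,[R1] ; location 0x10000000=0x0A
• NUM1 DCD 0x10000000 ; data word, placed after the code so it is never executed as an instruction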
Programming Examples
Example 3: Write code to set bit 0 in a 32-bit variable called N.
• First, we perform a 32-bit read, bringing N into Register R0.
• Second, we perform a logical OR that sets bit 0, and lastly we store the result back into N.
• N DCD 0x12345678
• LDR R1, =N; R1=0x0000010C
• LDR R0, [R1]; R0=0x12345678
• ORR R0, R0, #1; R0=0x12345679
• STR R0, [R1]; N = 0x12345679 (N must be in writable RAM for this store to take effect)
Programming Examples
Example 4: Write code that reads from variable N, multiplies by 5, adds 25, and stores the result in variable M. Both variables are 32-bit.
• First, we perform a 32-bit read, bringing N into Register R1.
• Second we multiply by 5 and add 25,
• and lastly we store the result into M.
• Since the value gets larger, overflow could occur. This solution ignores the
overflow error.
• LDR R3, =N ; R3 = &N (R3 points to N)
• LDR R1, [R3] ; R1 = N
• MOV R0, #5 ; R0 = 5
• MUL R1, R0, R1 ; R1 = 5*N
• MOV R0, #25 ; R0 = 25
• ADD R0, R0, R1 ; R0 = 25+5*N
• LDR R2, =M ; R2 = &M (R2 points to M)
• STR R0, [R2] ; M = 25+5*N
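Example 4 ignores overflow. One way to detect it, sketched in C, is to form 5*N+25 in 64 bits and check that it fits in 32 bits. The sketch assumes N and M hold unsigned values. `N`, `M`, and `compute_m` are hypothetical stand-ins for the assembly variables and code.

```c
#include <stdint.h>
#include <stdbool.h>

uint32_t N, M;   /* hypothetical globals standing in for the 32-bit variables */

/* M = 5*N + 25, computed in 64 bits so an unsigned overflow can be detected.
   Returns false, leaving M unchanged, if the result does not fit in 32 bits. */
bool compute_m(void) {
    uint64_t result = 5u * (uint64_t)N + 25u;
    if (result > UINT32_MAX) {
        return false;
    }
    M = (uint32_t)result;
    return true;
}
```

In assembly, a comparable check could use UMULL to form 5*N (RdHi must be 0 for the product to fit), followed by ADDS and a test of the C bit.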
Programming Examples
Example 5: Assume we have three 8-bit variables named High, Low, and Result. High and Low have 4 bits of data; each is a number from 0 to 15. Take these two 4-bit nibbles and combine them into one 8-bit value, storing the combination in Result.
LDR R2, =High ; R2 = &High
LDR R3, =Low ; R3 = &Low
LDR R4, =Result ; R4 = &Result
LDRB R1, [R2] ; R1 = High
LSL R0, R1, #4 ; R0 = R1<<4 = High<<4
LDRB R1, [R3] ; R1 = Low
ORR R0, R0, R1 ; R0 =(High<<4)|Low
STRB R0, [R4] ; Result =(High<<4)|Low
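For comparison, the C statement that Example 5 implements is a single shift-and-OR. The globals and `combine_nibbles` are hypothetical stand-ins for the three 8-bit variables. The code assumes High and Low are already in the range 0 to 15, as the example states.

```c
#include <stdint.h>

uint8_t High, Low, Result;   /* hypothetical 8-bit globals */

/* Same operation as the LDRB/LSL/ORR/STRB sequence: Result = (High<<4)|Low */
void combine_nibbles(void) {
    Result = (uint8_t)((High << 4) | Low);
}
```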
Branch and Control Instructions
B{cond} label ;branch to label
BX{cond} Rm ;branch indirect to location specified by Rm
BL{cond} label ;branch to subroutine at label
BLX{cond} Rm ;branch to subroutine indirect specified by Rm
• In assembly language, we use the term subroutine for all subprograms, whether or not they return a value.
• The last instruction in a subroutine is typically BX LR, which returns from the subroutine.
• We use the BL instruction to call a subroutine; BL saves the return address in LR.
Branch and Control Instructions
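• Example: a simple main program that calls the subroutine Change in an endless loop; each call adds 25 to the variable Num. The numbers in parentheses show the order in which the instructions execute.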
main LDR R1,=Num ; (1) R1 = &Num
MOV R0,#0 ; (2) R0 = 0
STR R0,[R1] ; (3) Num = 0
loop BL Change ; (4) function call
B loop ; (10) repeat
Change LDR R1,=Num ; (5) R1 = &Num
LDR R0,[R1] ; (6) R0 = Num
ADD R0,R0,#25 ; (7) R0 = Num+25
STR R0,[R1] ; (8) Num = Num+25
BX LR ; (9) return
Branch and Control Instructions
• Decision making is an important aspect of software programming.
• Two values are compared and certain blocks of program are executed or skipped
depending on the results of the comparison.
The following program illustrates an if-then structure that tests for unsigned greater than or equal to: Num is incremented only if it is less than 25600.
Change LDR R1,=Num ; R1 = &Num
LDR R0,[R1] ; R0 = Num
CMP R0,#25600 ; compare Num with 25600
BHS skip ; if Num >= 25600 (unsigned), skip the increment
ADD R0,R0,#1 ; R0 = Num+1
STR R0,[R1] ; Num = Num+1
skip BX LR ; return
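The if-then program corresponds to the C below. The assembly branches on the opposite condition: BHS jumps over the body when Num >= 25600, so the body runs when Num < 25600. `Num` is a hypothetical global standing in for the assembly variable.

```c
#include <stdint.h>

uint32_t Num;   /* hypothetical global standing in for the assembly variable Num */

/* The unsigned compare matches BHS (branch if higher or same, unsigned). */
void Change(void) {
    if (Num < 25600u) {
        Num = Num + 1u;
    }
}
```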
Stack Operations
• The stack is a last-in-first-out temporary storage.
• To create a stack, a block of RAM is allocated for this temporary storage.
• On the ARM® Cortex™-M processor, the stack always operates on 32-bit data.
• The stack pointer (SP) points to the 32-bit data on the top of the stack.
• The stack grows downwards in memory as we push data onto it, so although we refer to the most recent item as the “top of the stack”, it is actually the item stored at the lowest address!
• To push data on the stack, the stack pointer is first decremented by 4, and
then the 32-bit information is stored at the address specified by SP.
• To pop data from the stack, the 32-bit information pointed to by SP is first
retrieved, and then the stack pointer is incremented by 4.
• SP points to the last item pushed, which will also be the next item to be
popped.
Stack Operations
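• The figure below illustrates how the stack is used to push the contents of Registers R0, R1, and R2, in that order, and then pop them into R3, R4, and R5.
• Assume Register R0 initially contains the value 1, R1 contains 2, and R2 contains 3.
• PUSH {R0} ; SP = SP-4, then store 1
• PUSH {R1} ; SP = SP-4, then store 2
• PUSH {R2} ; SP = SP-4, then store 3 (now the top of the stack)
• POP {R3} ; R3 = 3, then SP = SP+4
• POP {R4} ; R4 = 2, then SP = SP+4
• POP {R5} ; R5 = 1, then SP = SP+4 (stack empty again)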
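The push/pop sequence can be simulated in C to confirm that R3, R4, and R5 receive 3, 2, and 1. This sketch models a small descending stack in an array. `STACK_WORDS`, `Stack`, and the functions are hypothetical, and SP is kept as a word index rather than a byte address. The overflow and underflow checks correspond to rule 2 in the list of stack rules.

```c
#include <stdint.h>
#include <stdbool.h>

#define STACK_WORDS 8u   /* hypothetical size of the allocated stack area */

typedef struct {
    uint32_t mem[STACK_WORDS];
    uint32_t sp;   /* index of the last item pushed; STACK_WORDS means empty */
} Stack;

void stack_init(Stack *s) { s->sp = STACK_WORDS; }   /* like SP = end of block */

/* PUSH: first decrement SP, then store (rule 4). false on overflow. */
bool stack_push(Stack *s, uint32_t value) {
    if (s->sp == 0u) return false;        /* would write below the allocated area */
    s->sp -= 1u;
    s->mem[s->sp] = value;
    return true;
}

/* POP: first read, then increment SP (rule 5). false on underflow. */
bool stack_pop(Stack *s, uint32_t *value) {
    if (s->sp == STACK_WORDS) return false;   /* more pops than pushes */
    *value = s->mem[s->sp];
    s->sp += 1u;
    return true;
}
```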
Stack Operations
• Rules to follow when using the stack:
1. Functions should have an equal number of pushes and pops
2. Stack accesses (push or pop) should not be performed outside the allocated area
3. Stack reads and writes should not be performed within the free area
4. Stack push should first decrement SP, then store the data
5. Stack pop should first read the data, and then increment SP
• Functions that violate rule number 1 will probably crash when incorrect data are
popped off at a later time.
• Violations of rule number 2 can be caused by a stack underflow or overflow.
• Overflow occurs when the number of elements becomes larger than the
allocated space.
• Stack underflow is caused when there are more pops than pushes, and it is
always the result of a software bug.
• Violations of rules 3, 4, and 5 will cause erratic behavior when operating with
interrupts.
• Rules 4 and 5 are followed automatically by the PUSH and POP instructions.
Stack Operations
• First, we will consider the situation where the allocated stack area is placed at the
beginning of RAM.
• For example, assume we allocate 4096 bytes for the stack from 0x2000.0000 to
0x2000.0FFF.
• The SP is initialized to 0x2000.1000, and the stack is considered empty.
• If the SP becomes less than 0x2000.0000 a stack overflow has occurred.
• The stack overflow will cause a bus fault because there is nothing at address 0x1FFF.FFFC.
• If the software tries to read from or write to any location greater than or equal to
0x2000.1000 then a stack underflow has occurred.
Stack Operations
• Next, we will consider the situation where the allocated stack area is placed at the
end of RAM.
• The TM4C123 has 32 KiB of RAM from 0x2000.0000 to 0x2000.7FFF.
• So in this case we allocate the 4096 bytes for the stack from 0x2000.7000 to 0x2000.7FFF.
• The SP is initialized to 0x2000.8000, and the stack is considered empty.
• If the SP becomes less than 0x2000.7000 a stack overflow has occurred.
• The stack overflow will not cause a bus fault because there is memory at address
0x2000.6FFC. Stack overflow in this case is a very difficult bug to recognize.
• If the software tries to read from or write to any location greater than or equal to 0x2000.8000, then a stack underflow has occurred. In this case, stack underflow will cause a bus fault, because there is no memory at 0x2000.8000.
References
• Jonathan W. Valvano, Embedded Systems: Introduction to Arm® Cortex™-M Microcontrollers, 5th Edition, 2014.
• Joseph Yiu, The Definitive Guide to the ARM Cortex-M3, 2nd Edition, Newnes (Elsevier), 2010.