Module 1 and 2 NOTES-MC
Microcontroller applications can be grouped by complexity into low-level, medium-level, and
high-level applications.
Low-Level Applications
Low-level applications of microcontrollers involve simple, basic tasks that require minimal
processing power and limited peripheral interfacing. These applications are typically found in
consumer appliances and everyday electronic devices. Examples include automatic door openers,
temperature controllers in home appliances, basic remote controls, and simple LED-based
display systems. Such systems use microcontrollers primarily for on/off control, basic
automation, and sensor-based adjustments without complex computations.
Medium-Level Applications
A microprocessor (MPU) is the central unit of a computing system that performs arithmetic and
logic operations. Unlike a microcontroller, a microprocessor does not have built-in memory or
input/output peripherals and requires external components for full functionality. It is used in
more complex computing tasks that demand higher processing power.
Examples of Microprocessors
Microprocessors come in various architectures and are used in different applications. Some
common examples include:
1. Intel x86 Series (Intel Core i3, i5, i7, i9, Xeon, Pentium)
- Used in personal computers, workstations, and servers.
2. AMD Ryzen and EPYC Series (Ryzen 3, 5, 7, 9, Threadripper, EPYC)
- Found in gaming PCs, high-performance computing, and cloud servers.
3. ARM-Based Processors (Apple M1, Qualcomm Snapdragon, NVIDIA Tegra,
Samsung Exynos)
- Commonly used in smartphones, tablets, and embedded AI applications.
4. RISC-V Processors (SiFive, Kendryte K210)
- Emerging in open-source computing and IoT applications.
5. IBM Power and SPARC Processors (IBM Power9, Oracle SPARC)
- Used in enterprise servers, high-performance computing, and data centers.
Low-Level Applications
Low-level applications of microprocessors involve simple computing tasks that require minimal
processing power. These applications include calculators, basic electronic cash registers, and
simple point-of-sale (POS) systems. Such systems primarily rely on basic arithmetic and logic
operations without requiring high-speed data processing or multitasking capabilities.
Medium-Level Applications
Medium-level applications involve tasks that require real-time data processing, multitasking, and
moderate computational power. Examples include personal computers, ATMs, industrial
automation systems, and medical imaging devices such as ultrasound machines. Microprocessors
in these applications handle tasks such as running an operating system, processing multiple
input/output operations, and managing real-time user interactions.
High-Level Applications
RISC (Reduced Instruction Set Computer) processors follow a simple and efficient approach to
executing instructions. Unlike CISC (Complex Instruction Set Computer) processors, which use
many complex instructions that take multiple cycles to execute, RISC processors use a small
set of simple instructions, each designed to complete in a single clock cycle. Figure 1.1
illustrates these major differences.
Simple Instructions – RISC processors use a small set of simple instructions, each
designed to execute in one clock cycle. In contrast, CISC processors have complex
instructions that take multiple cycles.
Fixed Instruction Size – All RISC instructions are of the same size, allowing the processor
to fetch and decode instructions efficiently. CISC instructions, being variable in size,
make execution slower.
Load-Store Architecture – RISC processors separate memory access from computation,
requiring data to be loaded into registers before processing. This reduces memory delays,
unlike CISC processors that access memory directly.
Pipelining for Speed – The uniform instruction size in RISC makes pipelining more
effective, allowing multiple instructions to be executed simultaneously at different stages.
CISC processors struggle with pipelining due to variable execution times.
More Registers, Less Memory Access – RISC processors have more registers to store
frequently used data, reducing slow memory accesses. CISC processors rely more on
external memory, making them slower in comparison.
Comparison Between CISC and RISC
RISC is preferred for power-efficient and high-speed processing, while CISC is used for
compatibility and handling complex instructions efficiently.
Power Efficiency for Portable Devices – ARM processors are designed to be small and
power-efficient, making them ideal for battery-powered devices like mobile phones
and tablets. Lower power consumption helps in extending battery life, which is
essential for portable embedded systems.
High Code Density for Limited Memory – Since embedded systems often have
limited memory, ARM processors provide high code density, ensuring that more
instructions can fit into a smaller memory space. This is crucial for applications like
mass storage devices and mobile phones, where compact memory is a key requirement.
Cost-Effective Design – Embedded systems are cost-sensitive, and ARM processors
support slow and low-cost memory devices, making them suitable for high-volume
applications like digital cameras and industrial controllers. This helps in reducing
overall manufacturing costs.
Compact Die Size for Integration – ARM processors are designed to take up less space
on the chip (die), allowing manufacturers to add more specialized peripherals on the
same chip. This results in lower production costs and enables single-chip solutions for
various applications.
Enhanced Debugging for Faster Development – ARM includes hardware debugging
technology, allowing software engineers to monitor and analyze the processor’s
execution in real-time. This improves troubleshooting, speeds up development time,
and helps bring products to market faster.
Balanced RISC Approach for Embedded Systems – While ARM is based on RISC
architecture, it is not a pure RISC processor. It balances efficiency and performance,
ensuring low power consumption without compromising system performance, which is
essential for modern embedded applications.
Some Instructions Take More Time – Unlike basic RISC processors, not all ARM
instructions take just one cycle. Some, like load-store multiple, transfer data to several
registers at once, making memory access faster and saving space in the code.
Barrel Shifter for Better Performance – ARM has a barrel shifter that modifies data
before processing, helping to perform complex tasks faster without needing extra
instructions.
Thumb 16-bit Instruction Set – ARM can use both 16-bit and 32-bit instructions. The
16-bit Thumb instructions help save memory and make programs smaller and more
efficient.
Conditional Execution Saves Time – ARM allows instructions to run only if a
condition is met, reducing the need for extra jump instructions and making programs
faster.
Special DSP Instructions for Faster Processing – ARM includes extra instructions for
fast math operations, making it good for digital signal processing (DSP) without
needing a separate DSP chip.
These features make ARM one of the most popular processors for embedded systems
worldwide.
Embedded systems are used in many devices, from small sensors in factories to real-time
control systems in space missions. These systems combine software and hardware, with each
part designed for efficiency and sometimes future upgrades. Figure 1.2 shows a typical
embedded device based on an ARM core. Each box represents a feature or function, and the
lines connecting the boxes are the buses carrying data. The device can be separated into four
main hardware components:
ARM Processor – This is the brain of the system, processing instructions and handling
data. Different versions of the ARM processor are available, depending on the needs of
the device. It includes a core for execution and extra components like memory
management and caches to improve performance.
Controllers – These manage key parts of the system. Common types include the
interrupt controller, which handles signals from different parts of the system, and the
memory controller, which manages data storage and retrieval.
Peripherals – These are the input and output devices that allow the embedded system
to interact with the outside world. They make each device unique by adding special
functions, such as displays, sensors, and communication interfaces.
Bus – The bus is a pathway that connects all parts of the system, allowing data to move
between components efficiently.
Each part of an embedded system is carefully selected to ensure smooth performance, low
power consumption, and cost-effectiveness.
Embedded systems use a different bus system than PCs. Unlike the PCI bus in computers,
which connects external devices, embedded systems use an on-chip bus to link peripherals with
the ARM core.
Bus Masters & Slaves – The ARM processor is a bus master, meaning it starts data
transfers. Peripherals act as bus slaves, responding to requests from the master.
Two Levels of Bus Architecture – The physical level defines the bus width (16, 32, or
64 bits), while the protocol level sets rules for communication between the processor and
peripherals.
The Advanced Microcontroller Bus Architecture (AMBA) was introduced in 1996 for ARM
processors. It allows easy reuse of peripherals across different projects, improving
compatibility and speed.
These buses improve data transfer speed, efficiency, and system performance, making ARM-
based embedded systems more powerful and flexible.
1.8.3 Memory
Embedded systems require memory to store and execute code. The choice of memory impacts
price, performance, and power consumption. Important characteristics to consider include
memory hierarchy, width, and type. If memory needs to operate at a higher speed to maintain
bandwidth requirements, power consumption may increase.
1.8.3.1 Hierarchy
1.8.3.2 Width
Memory width defines the number of bits retrieved per access—typically 8, 16, 32, or 64 bits.
Memory width significantly influences performance and cost.
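As a rough illustration of why width matters, the sketch below counts how many memory accesses are needed to fetch one instruction of a given size. The function name fetches_per_instruction is hypothetical, and the model is deliberately simple: it ignores caches, burst transfers, and wait states.

```python
def fetches_per_instruction(instruction_bits: int, memory_width_bits: int) -> int:
    """Return how many memory accesses one instruction fetch needs."""
    if instruction_bits <= 0 or memory_width_bits <= 0:
        raise ValueError("sizes must be positive")
    # Ceiling division: a partially used memory word still costs a full access.
    return -(-instruction_bits // memory_width_bits)


if __name__ == "__main__":
    for width in (8, 16, 32):
        count = fetches_per_instruction(32, width)
        print(f"32-bit instruction from {width}-bit memory: {count} access(es)")
    print("16-bit instruction from 16-bit memory:",
          fetches_per_instruction(16, 16), "access(es)")
```

Under this model, a 32-bit instruction needs two accesses from 16-bit memory, while a 16-bit instruction needs only one. This is one reason narrower, cheaper memory can pair well with a 16-bit instruction set.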
1.8.3.3 Types
1.8.4 Peripherals
A peripheral is a device that helps a processor communicate with the outside world. It can send
and receive data from sensors, displays, or other electronic devices.
In ARM systems, all peripherals are memory-mapped. This means they are controlled using
special memory locations. Each peripheral has a set of registers (small storage spaces) that the
processor can read or write to.
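The idea of memory-mapped registers can be sketched with a toy model in Python. Everything here is an assumption made for illustration: the ToyBus and ToyLed classes, the single data register, and the address 0x40000000 do not come from any real device. The point is only that an ordinary load or store to a particular address ends up controlling a peripheral.

```python
class ToyBus:
    """Toy address decoder: routes loads and stores to mapped registers."""

    def __init__(self):
        self._registers = {}  # address -> (read_fn, write_fn)

    def map_register(self, address, read_fn, write_fn):
        self._registers[address] = (read_fn, write_fn)

    def load(self, address):
        read_fn, _ = self._registers[address]
        return read_fn() & 0xFFFFFFFF

    def store(self, address, value):
        _, write_fn = self._registers[address]
        write_fn(value & 0xFFFFFFFF)


class ToyLed:
    """Hypothetical peripheral with one data register; bit 0 drives the LED."""

    def __init__(self):
        self.is_on = False

    def read_data(self):
        return int(self.is_on)

    def write_data(self, value):
        self.is_on = bool(value & 1)


LED_DATA = 0x40000000  # made-up address, not taken from any real device

if __name__ == "__main__":
    bus, led = ToyBus(), ToyLed()
    bus.map_register(LED_DATA, led.read_data, led.write_data)
    bus.store(LED_DATA, 1)  # an ordinary store turns the LED on
    print("LED on:", led.is_on, "| register reads:", bus.load(LED_DATA))
```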
Some special peripherals are called controllers because they manage important tasks. Two
important controllers are:
Memory Controllers
A memory controller connects different types of memory (like RAM or Flash memory) to the
processor.
When the system starts, some memory is already active so the startup code can run.
Other types of memory, like DRAM, need to be set up by software before they can be
used.
The memory controller makes sure the processor can read and write data from memory correctly.
Interrupt Controllers
When a peripheral or device needs attention, it sends a signal to the processor. This signal is
called an interrupt because it stops the processor from its regular work to handle the
important event.
An embedded system needs software to function properly. Figure 1.4 shows four typical
software components required to control an embedded device.
This software is made up of different parts, each with a specific role in controlling the hardware
and managing tasks. The four main parts of embedded system software are:
Initialization Code: The first code that runs when the system starts. It sets up basic
hardware like memory and processor settings before handing control to the operating
system or another program.
Operating System: Manages system resources like memory, processing power, and
devices. Some embedded systems use a full operating system, while others use a simple
task scheduler.
Device Drivers: Act as a link between the hardware and software, helping the system
control devices like sensors, displays, and communication modules.
Applications: Perform specific tasks, like running a diary app on a mobile phone. Some
systems run multiple applications at the same time, managed by the operating system.
When an embedded system starts, the boot code runs first. It takes the processor from a reset
state to a working state where it can execute programs. The main tasks of boot code include:
In some cases, boot code also runs hardware tests to check if all components are working
before loading the main software. If the system does not have a full operating system, the boot
code may start a simple task scheduler that controls when tasks run, or a debug monitor to help
developers check the system.
After initialization, the operating system (OS) takes control. It organizes system resources,
manages memory, and ensures multiple tasks can run smoothly. Some embedded systems use
full operating systems, while others rely on simpler mechanisms.
An operating system makes sure the embedded system runs efficiently. The choice of OS
depends on the needs of the application, whether it requires real-time performance or supports
general-purpose tasks.
1.9.3 Applications
Applications are programs that perform specific tasks on an embedded system. The operating
system manages these applications, ensuring they run smoothly. Some embedded systems run a
single application, while others support multiple applications running simultaneously.
ARM processors are used in many industries, including networking, automotive, mobile
devices, consumer electronics, storage, and imaging. For example:
In networking, ARM processors power home gateways, DSL modems, and 802.11
wireless devices.
In mobile devices, ARM is dominant in smartphones and tablets, making this the largest
application area.
In storage and imaging, ARM processors are found in hard drives and inkjet printers,
where cost efficiency and high production volumes are key.
The ARM processor consists of different functional units connected by data buses, which allow
information to move between components. These functional units work together to process
instructions and data efficiently. Figure 2.1 shows a Von Neumann implementation of the
ARM—data items and instructions share the same bus.
Data Flow and Architecture
Data enters the ARM processor through the data bus, which carries both instructions and data
in a Von Neumann architecture. In contrast, the Harvard architecture separates these into two
different buses, allowing for faster data access.
Inside the processor, an instruction decoder translates instructions before execution. ARM
processors use a load-store architecture, meaning:
Unlike some processors, the ARM does not process data directly in memory. Instead, all
calculations happen inside the processor using registers and the ALU (Arithmetic Logic Unit).
One special feature of the ARM processor is the barrel shifter, which can modify values before
they enter the ALU. This allows efficient computation of complex expressions and memory
addresses.
After processing, results are stored back into registers through the result bus. For memory
operations, an incrementer updates the address register so the next memory location can be
accessed automatically. The processor continues executing instructions until an exception or
interrupt occurs.
Registers are small, fast storage units inside the processor. The ARM core is a 32-bit
processor, so most instructions operate on 32-bit signed or unsigned values. If an 8-bit or 16-
bit value is loaded from memory, the hardware extends it to 32 bits before storing it in a register.
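The extension step can be sketched as follows. This is a minimal Python model, not a description of the hardware: the function names are hypothetical, and whether a value is sign-extended or zero-extended is assumed to depend on the load instruction used (signed versus unsigned load).

```python
MASK32 = 0xFFFFFFFF


def zero_extend(value: int, bits: int) -> int:
    """Keep the low `bits` bits; the upper register bits become 0."""
    return value & ((1 << bits) - 1)


def sign_extend(value: int, bits: int) -> int:
    """Copy the sign bit of a `bits`-wide value into the upper register bits."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):   # sign bit set: the value is negative
        value -= 1 << bits
    return value & MASK32           # 32-bit two's-complement pattern


if __name__ == "__main__":
    print(hex(zero_extend(0xFF, 8)))    # byte treated as unsigned 255
    print(hex(sign_extend(0xFF, 8)))    # byte treated as signed -1
    print(hex(sign_extend(0x8000, 16)))
```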
Data flows from the register file into the ALU or MAC (Multiply-Accumulate) unit, which
performs calculations. After processing, results are either stored back in a register or used to
generate a memory address for load/store operations.
This efficient design allows ARM processors to execute instructions quickly while keeping
power consumption low, making them ideal for embedded systems.
2.1 Registers
Registers are small, fast storage units inside the ARM processor that hold either data or
memory addresses. They are identified by the letter "r" followed by a number (e.g., r4 refers to
register 4). Figure 2.2 shows the active registers available in user mode.
The ARM processor has up to 18 active registers, consisting of 16 data registers and 2
processor status registers. These registers are 32-bit in size and vary in function based on the
processor mode.
r13 (Stack Pointer, sp): Holds the top of the stack in the current processor mode.
r14 (Link Register, lr): Stores the return address when calling a subroutine.
r15 (Program Counter, pc): Contains the memory address of the next instruction to
be fetched and executed.
The stack pointer (r13) and link register (r14) can sometimes be used as general-purpose
registers. However, this is not recommended in an operating system environment because the
stack pointer must always reference a valid stack frame.
In ARM state, registers r0 to r13 are orthogonal, meaning that most instructions can use any of
them interchangeably. However, r14 (lr) and r15 (pc) have unique roles and are treated
differently by some instructions.
In addition to the 16 data registers, the ARM processor includes two program status registers
(PSRs):
cpsr (Current Program Status Register): Stores the current state of the processor.
spsr (Saved Program Status Register): Holds a backup of cpsr when switching
between processor modes.
The register file contains all registers available for programming. However, the number of
visible registers depends on the current mode of the processor. The ARM processor supports
multiple operating modes, such as user mode (for applications) and privileged modes (for
handling system-level tasks), which determine which registers can be accessed at a given time.
The Current Program Status Register (CPSR) is a special 32-bit register in the ARM
processor that helps monitor and control internal operations. It is part of the register file and
plays a crucial role in managing the processor's state. Figure 2.3 shows the basic layout of a
generic program status register.
Structure of CPSR
The CPSR is divided into four main sections, each 8 bits wide:
1. Flags Field – Stores condition flags that indicate the outcome of operations.
2. Status Field – Reserved for future use in newer ARM designs.
3. Extension Field – Also reserved for potential future updates.
4. Control Field – Contains essential information such as:
Processor mode (User mode, Supervisor mode, etc.)
State of the processor (ARM or Thumb state)
Interrupt mask bits (To enable or disable specific interrupts)
Special Bits in CPSR
Some ARM processors have additional special bits that serve unique purposes. For example, the
J bit is found in processors with Jazelle technology, which allows the execution of 8-bit Java
instructions. Future ARM processors may introduce more bits to enhance monitoring and
control capabilities.
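To make the field layout concrete, here is a small decoding sketch. The bit positions and mode encodings used (N, Z, C, V in bits 31 to 28; I, F, T in bits 7, 6, 5; mode in bits 4 to 0) follow the classic ARM program status register layout and are not stated in the text above, so treat them as an assumption to check against the reference manual for a specific core. The function name decode_cpsr is hypothetical.

```python
MODES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11011: "Undefined",
    0b11111: "System",
}


def decode_cpsr(cpsr: int) -> dict:
    """Split a 32-bit CPSR value into condition flags and control bits."""
    cpsr &= 0xFFFFFFFF
    return {
        # Flags field (top byte): condition flags
        "N": (cpsr >> 31) & 1,
        "Z": (cpsr >> 30) & 1,
        "C": (cpsr >> 29) & 1,
        "V": (cpsr >> 28) & 1,
        # Control field (bottom byte): interrupt masks, state, mode
        "irq_masked": bool((cpsr >> 7) & 1),
        "fiq_masked": bool((cpsr >> 6) & 1),
        "state": "Thumb" if (cpsr >> 5) & 1 else "ARM",
        "mode": MODES.get(cpsr & 0x1F, "unknown"),
    }


if __name__ == "__main__":
    print(decode_cpsr(0x600000D3))
```

For the value 0x600000D3, this sketch reports Z and C set, both interrupt types masked, ARM state, and Supervisor mode.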
2.3 Pipeline
A pipeline is a technique used by RISC (Reduced Instruction Set Computing) processors like
ARM to speed up instruction execution. It allows multiple instructions to be processed
simultaneously by breaking the execution into stages, just like an assembly line in a car factory.
Each instruction moves through the pipeline, allowing new instructions to enter before the
previous ones finish execution, so several instructions are in progress at once, each at a
different stage. Figure 2.8 illustrates the pipeline using a simple example.
Three Instructions in the Pipeline
This process is called "filling the pipeline." Once filled, the processor can execute one
instruction per cycle, improving speed.
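The filling behaviour can be sketched with a short simulation. It assumes a classic three-stage pipeline (fetch, decode, execute) in which every instruction spends exactly one cycle per stage; the instruction names passed in are only labels, and the function name pipeline_timeline is hypothetical.

```python
STAGES = ("fetch", "decode", "execute")


def pipeline_timeline(instructions):
    """Return one dict per cycle mapping each stage to an instruction or None."""
    total_cycles = len(instructions) + len(STAGES) - 1
    timeline = []
    for cycle in range(total_cycles):
        row = {}
        for depth, stage in enumerate(STAGES):
            index = cycle - depth  # instruction occupying this stage, if any
            in_range = 0 <= index < len(instructions)
            row[stage] = instructions[index] if in_range else None
        timeline.append(row)
    return timeline


if __name__ == "__main__":
    for cycle, row in enumerate(pipeline_timeline(["ADD", "SUB", "CMP"]), start=1):
        print(f"cycle {cycle}: {row}")
```

In this model the pipeline is full in the third cycle, when the first instruction is executing, the second is being decoded, and the third is being fetched.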
In an ARM7 pipeline, an instruction is not considered executed until it completes the execute
stage. This means that for a three-stage pipeline, an instruction is fully executed only when the
fourth instruction is fetched. Figure 2.11 shows an instruction sequence on an ARM7 pipeline.
Example of Execution
This shows that pipeline execution affects when instructions take effect in the system.
2.4 Exceptions, Interrupts, and the Vector Table
When an exception or interrupt occurs, the processor jumps to a specific memory address in
a special section called the vector table. This table contains instructions that direct the processor
to the correct routine for handling the event (see Table 2.6).
1. Reset Vector → Runs first when the processor starts, leading to initialization.
2. Undefined Instruction Vector → Activated when the processor encounters an unknown
instruction.
3. Software Interrupt (SWI) Vector → Triggered by a SWI instruction, often used to call
OS routines.
4. Prefetch Abort Vector → Occurs if an instruction is fetched from an unauthorized
address.
5. Data Abort Vector → Triggered when an instruction tries to access restricted data
memory.
6. Interrupt Request (IRQ) Vector → Used by external hardware to interrupt execution.
Each vector contains a branch instruction that points to the start of the specific handling
routine.
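As an illustration, the vector table can be written out as a lookup of offsets. The offsets below follow the standard layout of the classic ARM vector table, which also contains a reserved entry and a fast interrupt request (FIQ) entry in addition to the vectors listed above. The optional high-vector base of 0xFFFF0000 exists only on some cores, so it is treated here as an assumption. The function name vector_address is hypothetical.

```python
VECTOR_OFFSETS = {
    "Reset": 0x00,
    "Undefined instruction": 0x04,
    "Software interrupt (SWI)": 0x08,
    "Prefetch abort": 0x0C,
    "Data abort": 0x10,
    "Reserved": 0x14,
    "Interrupt request (IRQ)": 0x18,
    "Fast interrupt request (FIQ)": 0x1C,
}


def vector_address(name: str, high_vectors: bool = False) -> int:
    """Return the address the processor jumps to for the named exception."""
    base = 0xFFFF0000 if high_vectors else 0x00000000
    return base + VECTOR_OFFSETS[name]


if __name__ == "__main__":
    for name in VECTOR_OFFSETS:
        print(f"{name:30s} 0x{vector_address(name):08X}")
```

Each entry is one word (4 bytes) apart, which is why a vector normally holds a single branch instruction rather than the handler itself.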
Core Extensions in ARM Processors
ARM processors include extra hardware features to boost performance, manage memory
efficiently, and add more functionality. These features vary across different ARM processor
families but generally include:
Cache is a small, high-speed memory located between the processor and main memory. It stores
frequently accessed data and instructions, allowing the processor to work faster by reducing
delays caused by slow external memory. Many ARM-based embedded systems use a single-level
cache inside the processor, though smaller systems may not need it (see Figure 2.13). For
simplicity, the glue logic that connects the memory system to the AMBA bus is labeled
"logic and control" in the figure.
Types of Cache in ARM Processors
1. Von Neumann Architecture – Uses a single unified cache for both instructions and data.
2. Harvard Architecture – Has separate caches for instructions and data, improving efficiency.
While cache improves speed, it does not guarantee predictable execution times, which is
essential for real-time systems.
For real-time applications, Tightly Coupled Memory (TCM) is used. TCM is fast SRAM
located close to the processor, ensuring fixed, predictable access times. Unlike cache, TCM
does not rely on automatic fetching; instead, it is directly accessed by the processor, appearing as
part of the memory system.
Memory Management
Embedded systems often use different types of memory. To keep everything organized and
protect the system from errors, memory management hardware is used. ARM processors have
three types of memory management:
1. No Memory Protection – Simple systems use fixed memory without security features.
This works for small devices that don’t need protection from faulty applications.
2. Memory Protection Unit (MPU) – The MPU divides memory into regions and assigns
specific access permissions to each. It is useful for systems that need some protection
but don’t have a complex memory structure.
3. Memory Management Unit (MMU) – The MMU provides full memory protection by
using translation tables stored in memory. These tables manage virtual-to-physical
addresses and control access permissions, making MMUs ideal for advanced systems.
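The region-based checking performed by an MPU can be sketched as below. This is a simplified model under stated assumptions: the Region fields, the read/write permission flags, and the rule that an address outside every region is rejected are all illustrative, and the regions are assumed not to overlap. Real MPUs define their own region attributes and priority rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical MPU region: an address range with access permissions."""
    base: int
    size: int
    readable: bool
    writable: bool

    def contains(self, address: int) -> bool:
        return self.base <= address < self.base + self.size


def access_allowed(regions, address: int, write: bool) -> bool:
    """Return True if the access is permitted by the region covering it."""
    for region in regions:
        if region.contains(address):
            return region.writable if write else region.readable
    return False  # no region covers the address: treated as a fault here


if __name__ == "__main__":
    regions = [
        Region(base=0x0000, size=0x1000, readable=True, writable=False),  # code
        Region(base=0x2000, size=0x1000, readable=True, writable=True),   # data
    ]
    print(access_allowed(regions, 0x0010, write=True))   # write to code region
    print(access_allowed(regions, 0x2004, write=True))   # write to data region
```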
Coprocessors
Coprocessors can be added to an ARM processor to expand its capabilities. They do this by
adding new instructions or providing configuration registers. Multiple coprocessors can be
connected through the coprocessor interface.
Coprocessors are controlled using special ARM instructions that work like load and store
operations. For example, coprocessor 15 manages the cache, tightly coupled memory (TCM),
and memory management.
Some coprocessors add new instructions to the ARM instruction set. For example, special
instructions for vector floating-point (VFP) operations improve performance in mathematical
calculations. When the ARM processor decodes an instruction, it checks if a coprocessor should
handle it. If the coprocessor is missing or doesn’t recognize the instruction, the processor triggers
an undefined instruction exception, allowing the operation to be simulated in software.
MODULE 2: Introduction to the ARM Instruction Set
The ARM (Advanced RISC Machine) architecture has emerged as one of the most widely used
processor architectures in modern computing. Designed with power efficiency and high
performance in mind, ARM processors power a vast array of devices, from embedded systems
and mobile phones to high-performance computing platforms. The success of ARM can be
attributed to its Reduced Instruction Set Computing (RISC) architecture, which enables
streamlined instruction execution, lower power consumption, and enhanced processing
efficiency compared to traditional Complex Instruction Set Computing (CISC) architectures.
This chapter explores the ARM instruction set, providing a foundational understanding of its
design principles, instruction types, and execution mechanisms. The ARM instruction set follows
a load-store architecture, meaning that data manipulation occurs primarily between registers,
with memory access limited to specific instructions. This design significantly improves
processing speed and reduces complexity in instruction decoding.
Uniform and Fixed-Length Instructions – Most ARM instructions are 32-bit, ensuring
consistent execution efficiency. Modern ARM variants also support a 16-bit Thumb
instruction set, enabling compact code for memory-constrained applications.
Conditional Execution – Unlike conventional architectures, ARM supports conditional
execution for nearly all instructions, reducing the need for branch instructions and improving
efficiency.
Barrel Shifter Integration – ARM’s unique instruction set allows efficient data
manipulation through integrated shifting operations within arithmetic and logical
instructions.
Multiple Addressing Modes – The architecture provides various addressing modes,
including register-based, immediate, and indexed addressing, offering flexibility in data
handling.
Power Efficiency and Scalability – ARM processors optimize power consumption through
an efficient pipeline structure, making them ideal for battery-powered devices.
Throughout this chapter, we will delve into the structure of ARM instructions, classify them into
categories such as data processing, memory access, and control flow instructions, and
analyze their operational significance in real-world applications. By the end of this chapter,
readers will have a solid grasp of the ARM instruction set, enabling them to write efficient
assembly programs and understand the underlying mechanisms of ARM-based computing
platforms.
2.2 Data Processing Instructions
Move instructions in the ARM architecture provide a fundamental way to transfer data between
registers or load immediate values into registers. These instructions are essential for initializing
variables, handling constants, and facilitating data movement within the processor.
Syntax:
<instruction>{<cond>}{S} Rd, N
Where:
MOV (Move): Transfers a 32-bit value (either from a register or an immediate constant) into the
destination register.
Rd = N
MVN (Move Not): Loads the bitwise complement (NOT operation) of the given 32-bit value into
the destination register.
Rd = ~N
The values allowed for operand N vary depending on the instruction. Typically, N can be a
register (Rm) or an immediate constant prefixed with #.
The following example demonstrates a basic move operation where the value from register r5 is
copied into register r7.
Pre-condition:
r5 = 5
r7 = 8
Instruction:
MOV r7, r5 ; Copy the value of r5 into r7
Post-condition:
r5 = 5
r7 = 5
This operation overwrites the previous value of r7 with the value from r5. Move instructions are
widely used in arithmetic operations, control logic, and register manipulations in ARM-based
systems.
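The effect of MOV and MVN on 32-bit register values can be modelled in a few lines of Python. This is a sketch of the arithmetic only; the helper names mov and mvn and the dictionary standing in for the register file are illustrative assumptions.

```python
MASK32 = 0xFFFFFFFF  # registers hold 32-bit values


def mov(n: int) -> int:
    """Rd = N"""
    return n & MASK32


def mvn(n: int) -> int:
    """Rd = ~N (bitwise complement, kept to 32 bits)"""
    return ~n & MASK32


if __name__ == "__main__":
    regs = {"r5": 5, "r7": 8}
    regs["r7"] = mov(regs["r5"])   # MOV r7, r5
    print(regs)                    # r7 now holds the value copied from r5
    print(hex(mvn(regs["r5"])))    # result of MVN applied to the value 5
```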
In the previous example, we used the MOV instruction where N was a simple register. However,
N can also be a register (Rm) that undergoes preprocessing using the barrel shifter before being
used in a data processing instruction. The ARM processor has a unique feature that allows
shifting the 32-bit binary value in a register left or right by a specific number of positions
before it is processed by the arithmetic logic unit (ALU). This shifting mechanism enhances the
efficiency and flexibility of many operations. Not all data processing instructions use the barrel
shifter. Some examples that do not involve shifting include:
The shift operation happens within the same cycle as the instruction, making it highly efficient.
This feature is particularly useful when loading constants into registers or performing fast
multiplication or division by powers of 2.
To understand the barrel shifter, let us modify the previous example by adding a shift operation.
In this case, register Rn is used directly in the ALU without any preprocessing, while another
register passes through the barrel shifter before entering the ALU. Figure 2.1 illustrates how data
moves between the ALU and the barrel shifter.
Example 3.2
We apply a logical shift left (LSL) to register Rm before transferring the result to the destination
register. This is similar to using the shift operator (<<) in C programming. The MOV instruction
then stores the shifted value in Rd.
Pre-condition:
r4 = 3
r6 = 10
Instruction:
MOV r6, r4, LSL #3 ; Shift r4 left by 3 bits and store the result in r6
Post-condition:
r4 = 3
r6 = 24
This operation shifts the value in register r4 left by three positions (multiplying by 8) and stores
the result in r6.
The five different types of shift operations available in the barrel shifter are summarized in Table
3.2.
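The five barrel shifter operations (LSL, LSR, ASR, ROR, and RRX) can be modelled as below. This is a sketch under simplifying assumptions: shift amounts are taken to be in the range 0 to 31, and the carry-out that the real shifter can produce is ignored. The function names simply mirror the mnemonics.

```python
MASK32 = 0xFFFFFFFF


def lsl(x: int, n: int) -> int:
    """Logical shift left: vacated low bits are filled with 0."""
    return (x << n) & MASK32


def lsr(x: int, n: int) -> int:
    """Logical shift right: vacated high bits are filled with 0."""
    return (x & MASK32) >> n


def asr(x: int, n: int) -> int:
    """Arithmetic shift right: vacated high bits copy the sign bit."""
    x &= MASK32
    if x & 0x80000000:
        x -= 1 << 32          # reinterpret as a negative number
    return (x >> n) & MASK32


def ror(x: int, n: int) -> int:
    """Rotate right: bits leaving the low end re-enter at the high end."""
    x &= MASK32
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK32


def rrx(x: int, carry: int) -> int:
    """Rotate right extended: shift right by 1, carry flag enters bit 31."""
    return ((carry & 1) << 31) | ((x & MASK32) >> 1)


if __name__ == "__main__":
    print(lsl(3, 3))                  # same arithmetic as 3 << 3 in C
    print(hex(asr(0x80000000, 4)))
    print(hex(ror(0x12345678, 8)))
```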
1. ADC (Add with Carry)
This instruction adds two 32-bit values and includes the carry flag in the addition.
PRE:
r0 = 0x00000000
r1 = 0x00000002
r2 = 0x00000003
Carry = 1
Instruction:
ADC r0, r1, r2
POST:
r0 = 0x00000006 (2 + 3 + 1)
2. ADD (Addition)
This instruction adds two 32-bit values and stores the result in the destination register.
PRE:
r0 = 0x00000000
r1 = 0x00000004
r2 = 0x00000002
Instruction:
ADD r0, r1, r2
POST:
r0 = 0x00000006 (4 + 2)
3. RSB (Reverse Subtract)
This instruction subtracts the first operand from the second operand and stores the result.
PRE:
r0 = 0x00000000
r1 = 0x00000005
r2 = 0x0000000A
Instruction:
RSB r0, r1, r2
POST:
r0 = 0x00000005 (10 - 5)
4. RSC (Reverse Subtract with Carry)
This instruction subtracts the first operand from the second operand and also subtracts the
inverted carry flag.
PRE:
r0 = 0x00000000
r1 = 0x00000006
r2 = 0x0000000C
Carry = 1
Instruction:
RSC r0, r1, r2
POST:
r0 = 0x00000006 (12 - 6 - !(1) = 12 - 6 - 0)
5. SBC (Subtract with Carry)
This instruction subtracts the second operand from the first and also subtracts the inverted
carry flag.
PRE:
r0 = 0x00000000
r1 = 0x00000009
r2 = 0x00000004
Carry = 1
Instruction:
SBC r0, r1, r2
POST:
r0 = 0x00000005 (9 - 4 - !(1) = 9 - 4 - 0)
6. SUB (Subtraction)
This instruction subtracts the second value from the first value.
PRE:
r0 = 0x00000000
r1 = 0x00000007
r2 = 0x00000003
Instruction:
SUB r0, r1, r2
POST:
r0 = 0x00000004 (7 - 3)
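The arithmetic instructions in this section can be summarized as a set of small Python functions. This is a sketch of the result values only: condition flags are not updated, the carry flag is passed in as a plain 0 or 1, and the function names simply mirror the mnemonics. Here rn stands for the first source operand and n for the second.

```python
MASK32 = 0xFFFFFFFF  # results wrap around at 32 bits


def add(rn: int, n: int) -> int:
    return (rn + n) & MASK32


def adc(rn: int, n: int, carry: int) -> int:
    return (rn + n + carry) & MASK32


def sub(rn: int, n: int) -> int:
    return (rn - n) & MASK32


def sbc(rn: int, n: int, carry: int) -> int:
    return (rn - n - (1 - carry)) & MASK32   # subtracts NOT(carry)


def rsb(rn: int, n: int) -> int:
    return (n - rn) & MASK32                 # operands reversed


def rsc(rn: int, n: int, carry: int) -> int:
    return (n - rn - (1 - carry)) & MASK32   # reversed, minus NOT(carry)


if __name__ == "__main__":
    print("ADC:", adc(2, 3, 1))
    print("RSB:", rsb(5, 10))
    print("RSC:", rsc(6, 12, 1))
    print("SBC:", sbc(9, 4, 1))
    print("SUB wraps:", hex(sub(0, 1)))
```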
Logical Instructions
Logical instructions perform bitwise operations on two registers and store the result in a
destination register.
1. AND (Bitwise AND)
PRE:
r0 = 0x00000000
r1 = 0x0000000F (0000 1111 in binary)
r2 = 0x00000006 (0000 0110 in binary)
Instruction:
AND r0, r1, r2
POST:
r0 = 0x00000006 (0000 0110 in binary)
Explanation:
The AND operation keeps only the bits that are 1 in both numbers.
2. ORR (Bitwise OR)
PRE:
r0 = 0x00000000
r1 = 0x0000000F (0000 1111 in binary)
r2 = 0x00000006 (0000 0110 in binary)
Instruction:
ORR r0, r1, r2
POST:
r0 = 0x0000000F (0000 1111 in binary)
Explanation:
The OR operation sets a bit to 1 if either of the values has a 1 in that position.
3. EOR (Bitwise Exclusive OR)
PRE:
r0 = 0x00000000
r1 = 0x0000000F (0000 1111 in binary)
r2 = 0x00000006 (0000 0110 in binary)
Instruction:
EOR r0, r1, r2
POST:
r0 = 0x00000009 (0000 1001 in binary)
Explanation:
XOR sets a bit to 1 only if the bits in the two values are different.
This instruction clears specific bits using the AND NOT operation.
PRE:
r0 = 0x00000000
r1 = 0x0000000F (0000 1111 in binary)
r2 = 0x00000006 (0000 0110 in binary)
Instruction:
BIC r0, r1, r2
POST:
r0 = 0x00000009 (0000 1001 in binary)
Explanation:
The BIC instruction clears (sets to 0) every bit of the first number (r1) that is 1 in the second
number (r2).
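The four logical operations can be compared side by side using the same register values as the examples above. This is a minimal Python sketch for illustration only. The names r1, r2, results, and MASK32 are assumptions made here, and Python's bitwise operators stand in for the ARM instructions.

```python
MASK32 = 0xFFFFFFFF  # keep NOT results within 32 bits

r1 = 0x0000000F  # 0000 1111
r2 = 0x00000006  # 0000 0110

results = {
    "AND": r1 & r2,            # bits set in both values
    "ORR": r1 | r2,            # bits set in either value
    "EOR": r1 ^ r2,            # bits that differ
    "BIC": r1 & ~r2 & MASK32,  # bits of r1 with the r2 bits cleared
}

for name, value in results.items():
    print(f"{name}: 0x{value:08X} ({value:08b})")
```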
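Summary Table
Instruction   Operation            Result for r1 = 0x0F, r2 = 0x06
AND           r0 = r1 AND r2       0x00000006
ORR           r0 = r1 OR r2        0x0000000F
EOR           r0 = r1 XOR r2       0x00000009
BIC           r0 = r1 AND NOT r2   0x00000009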
Comparison instructions compare or test a register with a 32-bit value but do not store the
result. Instead, they update the condition flags in the CPSR (Current Program Status Register).
These flags are then used to control program flow (e.g., deciding whether to jump to another
part of the program).
1. CMN (Compare Negated - Adds Two Values and Sets Flags)
This instruction adds a register value to another number and updates the flags.
PRE:
r1 = 0x00000003 (3 in decimal)
r2 = 0x00000005 (5 in decimal)
Instruction:
CMN r1, r2
Explanation:
This checks the result of r1 + r2 without storing it, just setting the flags.
2. CMP (Compare - Subtracts Two Values and Sets Flags)
This instruction subtracts one value from another and updates the flags.
PRE:
r1 = 0x00000007 (7 in decimal)
r2 = 0x00000007 (7 in decimal)
Instruction:
CMP r1, r2
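Explanation:
Because r1 - r2 = 0, the Zero Flag (Z) is set, which indicates that the two values are equal.
3. TEQ (Test Equivalence - Bitwise XOR and Sets Flags)
PRE: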
Instruction:
TEQ r1, r2
Explanation:
XOR (Exclusive OR) of two equal values gives 0, so the Zero Flag (Z) is set.
4. TST (Test Bits - Bitwise AND and Sets Flags)
PRE:
Instruction:
TST r1, r2
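Explanation:
TST performs a bitwise AND of r1 and r2 and only updates the flags. The Zero Flag (Z) is set
when the two values have no 1 bits in common.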
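How the four comparison instructions set the flags can be sketched in Python. This is a simplified model under stated assumptions: the function names are invented for illustration, only the N, Z, and C flags are modelled (the overflow flag V is left out), and the values used for TEQ and TST in the printed lines are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def nz(result):
    # N copies bit 31 of the result; Z is 1 when the result is zero.
    return {"N": (result >> 31) & 1, "Z": int(result == 0)}


def cmp_flags(rn, op2):
    # CMP: flags from Rn - Op2 (result is discarded)
    flags = nz((rn - op2) & MASK32)
    flags["C"] = int(rn >= op2)  # C = 1 means no borrow occurred
    return flags


def cmn_flags(rn, op2):
    # CMN: flags from Rn + Op2 (result is discarded)
    total = rn + op2
    flags = nz(total & MASK32)
    flags["C"] = int(total > MASK32)  # C = 1 means unsigned carry out
    return flags


def teq_flags(rn, op2):
    # TEQ: flags from Rn XOR Op2
    return nz(rn ^ op2)


def tst_flags(rn, op2):
    # TST: flags from Rn AND Op2
    return nz(rn & op2)


print(cmp_flags(7, 7))            # equal values set Z
print(cmn_flags(3, 5))
print(teq_flags(5, 5))            # hypothetical values
print(tst_flags(0b1010, 0b0010))  # hypothetical values
```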
Summary Table
Instruction   Flags are set from the result of
CMN           r1 + r2
CMP           r1 - r2
TEQ           r1 XOR r2
TST           r1 AND r2
1. MUL (Multiply)
The MUL instruction multiplies two 32-bit registers and stores the result in a destination
register.
Example:
Multiply r1 and r2, store the result in r0.
Pre-condition:
r1 = 3
r2 = 4
r0 = 0
Instruction:
MUL r0, r1, r2
Post-condition:
r0 = 12 (3 × 4 = 12)
2. MLA (Multiply and Accumulate)
The MLA instruction multiplies two 32-bit values and adds a third value before storing the
result in a destination register.
Example:
Multiply r1 and r2, then add r3, and store the result in r0.
Pre-condition:
r1 = 3
r2 = 4
r3 = 2
r0 = 0
Instruction:
MLA r0, r1, r2, r3
Post-condition:
r0 = 14 (3 × 4 + 2 = 14)
3. UMULL (Unsigned Multiply Long)
The UMULL instruction performs unsigned 32-bit multiplication and stores the full 64-bit
result in two registers.
4. UMLAL (Unsigned Multiply Accumulate Long)
The UMLAL instruction performs unsigned multiplication and adds the product to an existing
64-bit value before storing the result in two registers.
5. SMULL (Signed Multiply Long)
The SMULL instruction performs signed 32-bit multiplication and stores the full 64-bit result
in two registers. This is useful when dealing with negative numbers.
6. SMLAL (Signed Multiply Accumulate Long)
The SMLAL instruction performs signed multiplication and adds the product to an existing
64-bit value before storing the final result in two registers.
SMULL and SMLAL are particularly useful when handling signed numbers.
MLA and UMLAL are ideal for cases where an additional value needs to be accumulated into the
multiplication result.
These instructions are widely used in mathematical computations, digital signal processing,
and high-performance applications.
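The long multiply instructions described above have no worked example, so the following Python sketch shows how a 64-bit result is split across two 32-bit registers. It is an illustration only: the function names mirror the mnemonics, the helpers to_signed32 and split64 are assumptions introduced here, and each function returns the pair (RdHi, RdLo). The operand values in the printed lines are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed32(value):
    # Interpret a 32-bit pattern as a two's-complement signed number.
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def split64(value):
    # Split a 64-bit result into (RdHi, RdLo).
    value &= MASK64
    return value >> 32, value & MASK32


def umull(rm, rs):
    return split64(rm * rs)


def smull(rm, rs):
    return split64(to_signed32(rm) * to_signed32(rs))


def umlal(rd_hi, rd_lo, rm, rs):
    accumulator = (rd_hi << 32) | rd_lo
    return split64(accumulator + rm * rs)


def smlal(rd_hi, rd_lo, rm, rs):
    accumulator = (rd_hi << 32) | rd_lo
    return split64(accumulator + to_signed32(rm) * to_signed32(rs))


hi, lo = umull(0xF0000002, 2)
print(f"UMULL: RdHi = 0x{hi:08X}, RdLo = 0x{lo:08X}")
hi, lo = smull(0xFFFFFFFF, 2)  # 0xFFFFFFFF is -1 when treated as signed
print(f"SMULL: RdHi = 0x{hi:08X}, RdLo = 0x{lo:08X}")
```

The same bit pattern 0xFFFFFFFF gives different high words for UMULL and SMULL, which is why the signed and unsigned variants both exist.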
2.1.4 Branch Instructions
Branch instructions change the flow of a program or call a subroutine. They are essential for
creating loops, conditional statements (if-then-else), and function calls. When a branch
instruction is executed, it updates the program counter (pc) to a new address, redirecting the
program’s execution.
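Syntax:
B{<cond>} label
BL{<cond>} label
BX{<cond>} Rm
BLX{<cond>} label | Rm
Explanation:
The label is stored in the instruction as a signed offset relative to pc, so the target must be
within approximately 32 MB of the branch instruction.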
The T bit in the cpsr (Current Program Status Register) determines whether the processor
executes ARM or Thumb instructions. The BX and BLX instructions load the T bit from bit 0 of
the register Rm.
1) If T = 1, the processor switches to Thumb mode.
2) If T = 0, it remains in ARM mode.
This allows efficient execution of both 32-bit ARM and 16-bit Thumb instructions.
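The 32 MB limit follows from the size of the offset field. A B or BL instruction in ARM mode holds a 24-bit signed word offset, measured from the branch address plus 8, because pc reads two instructions ahead. The Python sketch below illustrates this calculation. The function names and the example addresses are assumptions made for illustration.

```python
def encode_branch_offset(instr_addr, target_addr):
    """Return the 24-bit offset field for an ARM-mode B/BL instruction."""
    # pc reads as the address of the branch instruction + 8.
    byte_offset = target_addr - (instr_addr + 8)
    if byte_offset % 4 != 0:
        raise ValueError("ARM branch targets must be word aligned")
    word_offset = byte_offset >> 2  # offset is stored in words, not bytes
    if not -(1 << 23) <= word_offset < (1 << 23):
        raise ValueError("target is outside the +/-32 MB branch range")
    return word_offset & 0xFFFFFF


def decode_branch_target(instr_addr, imm24):
    """Recover the target address from the 24-bit offset field."""
    if imm24 & 0x800000:  # sign-extend the 24-bit field
        imm24 -= 1 << 24
    return (instr_addr + 8 + (imm24 << 2)) & 0xFFFFFFFF


# A branch to itself at the hypothetical address 0x8000
field = encode_branch_offset(0x8000, 0x8000)
print(f"offset field = 0x{field:06X}")
print(f"decoded target = 0x{decode_branch_target(0x8000, field):X}")
```

A 24-bit signed word offset covers 2^23 words in each direction, which is 2^25 bytes, or 32 MB.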
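LOOP:
MOV R0, #5 ; R0 = 5
B LOOP ; Branch back to LOOP
Explanation:
B LOOP → Jumps back to the label LOOP, so the MOV instruction is executed repeatedly.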
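BL FUNCTION ; Call FUNCTION and store the return address in LR
STOP:
B STOP ; Execution continues here after FUNCTION returns
FUNCTION: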
MOV R0, #10 ; R0 = 10
BX LR ; Return to the caller
Explanation:
BL FUNCTION → Jumps to FUNCTION and stores the return address in the link
register (LR).
MOV R0, #10 → Assigns 10 to R0 inside the function.
BX LR → Returns to the instruction after BL FUNCTION using the stored address in
LR.
The BL (Branch with Link) instruction is used for function calls, allowing the program to
return after execution.
Load-store instructions move data between memory and processor registers. These instructions
help the processor read from and write to memory. There are three types:
1. Single-register transfer
2. Multiple-register transfer
3. Swap
Single-register transfer instructions move a single data item between memory and a register.
The data can be 32-bit words, 16-bit halfwords, or 8-bit bytes.
Fetches the value stored at the memory address in r3 and places it into r2.
LDR r2, [r3] ; Load r2 with the value from memory at address in r3
Saves the value from r2 into the memory location whose address is stored in r3.
STR r2, [r3] ; Store r2 to memory at the address in r3
ARM supports different ways to access memory, known as addressing modes. These determine
how the memory address is calculated during load-store operations.
1. Preindex with Writeback: The address is calculated as base + offset, and the base register is
updated.
Example: LDR r0, [r1, #4]! (Loads data from r1 + 4 and updates r1 to r1 + 4.)
2. Preindex: The address is base + offset, but the base register remains unchanged.
Example: LDR r0, [r1, #4] (Loads data from r1 + 4, but r1 is not updated.)
3. Postindex: The address is initially base, and then the base register is updated.
Example: LDR r0, [r1], #4 (Loads data from r1, then updates r1 to r1 + 4.)
These methods allow efficient memory access and control in ARM processors.
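The difference between the three addressing modes is easiest to see by running each one from the same starting state. The Python sketch below is an illustrative model only. Registers and memory are plain dictionaries, the function names are invented here, and the base address 0x9000 and the memory contents are hypothetical.

```python
def ldr_preindex_writeback(regs, mem, rd, rn, offset):
    # LDR rd, [rn, #offset]!  -> update base first, then load from it
    regs[rn] = regs[rn] + offset
    regs[rd] = mem[regs[rn]]


def ldr_preindex(regs, mem, rd, rn, offset):
    # LDR rd, [rn, #offset]   -> load from base + offset, base unchanged
    regs[rd] = mem[regs[rn] + offset]


def ldr_postindex(regs, mem, rd, rn, offset):
    # LDR rd, [rn], #offset   -> load from base, then update base
    regs[rd] = mem[regs[rn]]
    regs[rn] = regs[rn] + offset


def demo(load):
    # Same hypothetical starting state for every mode.
    regs = {"r0": 0, "r1": 0x9000}
    mem = {0x9000: 0x01010101, 0x9004: 0x02020202}
    load(regs, mem, "r0", "r1", 4)
    return regs["r0"], regs["r1"]


for mode in (ldr_preindex_writeback, ldr_preindex, ldr_postindex):
    r0, r1 = demo(mode)
    print(f"{mode.__name__}: r0 = 0x{r0:08X}, r1 = 0x{r1:08X}")
```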
3.3.3 Multiple-Register Transfer
ARM provides load-store multiple instructions that move multiple registers between memory
and the processor in a single instruction. These instructions are useful for efficiently transferring
blocks of data, saving/restoring processor states, and managing stacks.
1️⃣ Preindex with Writeback – The address is calculated before loading, and the base register is
updated.
2️⃣ Preindex – The address is calculated before loading, but the base register is not updated.
LDR r0, [r1, #0x4] → Loads r0 from mem[r1 + 4], r1 remains the same.
LDR r0, [r1, r2] → Loads r0 from mem[r1 + r2], r1 remains unchanged.
LDR r0, [r1, -r2, LSR #0x4] → Loads r0 from mem[r1 - (r2 >> 4)], r1 remains
unchanged.
3️⃣ Postindex – The address is used first, and then the base register is updated.
These multiple-register transfer instructions improve efficiency in memory operations and are
widely used in stack management and function calls.
In ARM, stack operations use Load Multiple (LDM) for popping values from the stack and
Store Multiple (STM) for pushing values onto the stack. The stack can grow up (ascending) or
down (descending) and can be full or empty, resulting in different addressing modes.
Example (Full Ascending stack):
STMIB sp!, {r0, r1} ; Push r0 and r1 onto the stack (increment before storing)
LDMDA sp!, {r0, r1} ; Pop r0 and r1 from the stack (decrement after loading)
Example (Full Descending stack):
STMDB sp!, {r0, r1} ; Push r0 and r1 onto the stack (decrement before storing)
LDMIA sp!, {r0, r1} ; Pop r0 and r1 from the stack (increment after loading)
Example (Empty Ascending stack):
STMIA sp!, {r0, r1} ; Push r0 and r1 onto the stack (increment after storing)
LDMDB sp!, {r0, r1} ; Pop r0 and r1 from the stack (decrement before loading)
Example (Empty Descending stack):
STMDA sp!, {r0, r1} ; Push r0 and r1 onto the stack (decrement after storing)
LDMIB sp!, {r0, r1} ; Pop r0 and r1 from the stack (increment before loading)
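As an illustration of how a push and pop pair works, the Python sketch below models the STMDB / LDMIA pair (a stack that grows downward, with sp pointing at the last item pushed). It is a simplified model: the function names, the dictionary-based registers and memory, and the starting stack pointer 0x80018 are assumptions made here, and the register list is expected in ascending register order.

```python
def stmdb(regs, mem, reglist):
    # STMDB sp!, {reglist}: decrement sp first, then store the registers.
    sp = regs["sp"] - 4 * len(reglist)
    for i, name in enumerate(reglist):
        mem[sp + 4 * i] = regs[name]  # lowest register at lowest address
    regs["sp"] = sp                   # writeback (the "!" suffix)


def ldmia(regs, mem, reglist):
    # LDMIA sp!, {reglist}: load the registers, then increment sp.
    sp = regs["sp"]
    for i, name in enumerate(reglist):
        regs[name] = mem[sp + 4 * i]
    regs["sp"] = sp + 4 * len(reglist)


regs = {"r0": 1, "r1": 2, "sp": 0x80018}  # hypothetical starting state
mem = {}

stmdb(regs, mem, ["r0", "r1"])            # push
print(f"after push: sp = 0x{regs['sp']:X}")

regs["r0"] = regs["r1"] = 0               # overwrite the registers
ldmia(regs, mem, ["r0", "r1"])            # pop restores them
print(f"after pop:  r0 = {regs['r0']}, r1 = {regs['r1']}, sp = 0x{regs['sp']:X}")
```

The push and pop must use opposite directions (decrement before, increment after) so that the pop reads back exactly the locations the push wrote.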
3.3.4 Swap Instruction (SWP)
The SWP (Swap) instruction is a special type of load-store operation that exchanges data
between a memory location and a register in a single atomic operation. This means no other
instruction or process can access that memory location while the swap is taking place, ensuring
data consistency.
Syntax:
SWP{B}{<cond>} Rd, Rm, [Rn]
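Where:
Rd = destination register that receives the old value from memory
Rm = source register whose value is written to memory
Rn = base register that holds the memory address
B = optional suffix that swaps a byte instead of a word
How It Works:
SWP r0, r1, [r2] ; Load r0 from memory at address r2, then store r1 to that address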
Explanation:
Suppose:
1. r1 = 0x12345678 (New value to store in memory)
2. Memory at [r2] = 0xABCDEF00
After execution:
1. r0 = 0xABCDEF00 (Old memory value loaded into r0).
2. Memory at [r2] = 0x12345678 (New value from r1 stored in memory).
SWPB r0, r1, [r2] ; Swap a byte between memory and a register
Suppose:
1. r1 = 0xFF (New byte to store in memory).
2. Memory at [r2] = 0xAB
After execution:
1. r0 = 0xAB (Old byte loaded into r0).
2. Memory at [r2] = 0xFF (New byte stored).
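The two swap examples above can be reproduced with a short model. The Python sketch below is illustrative only: the function name swp, the dictionary-based registers and memory, and the address 0x9000 are assumptions introduced here. It shows the order of the load and the store but cannot model the atomic bus behaviour of the real instruction.

```python
def swp(regs, mem, rd, rm, rn, byte=False):
    # SWP{B} rd, rm, [rn]
    address = regs[rn]
    mask = 0xFF if byte else 0xFFFFFFFF
    old_value = mem[address] & mask   # 1. read the old memory value
    mem[address] = regs[rm] & mask    # 2. write the new value from rm
    regs[rd] = old_value              # 3. place the old value in rd


# Word swap with the values from the SWP example
regs = {"r0": 0, "r1": 0x12345678, "r2": 0x9000}
mem = {0x9000: 0xABCDEF00}
swp(regs, mem, "r0", "r1", "r2")
print(f"r0 = 0x{regs['r0']:08X}, memory = 0x{mem[0x9000]:08X}")

# Byte swap with the values from the SWPB example
regs = {"r0": 0, "r1": 0xFF, "r2": 0x9000}
mem = {0x9000: 0xAB}
swp(regs, mem, "r0", "r1", "r2", byte=True)
print(f"r0 = 0x{regs['r0']:02X}, memory = 0x{mem[0x9000]:02X}")
```

Because the old value is read before the new one is written, using the same register for Rd and Rm exchanges the register and the memory location.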
A Software Interrupt (SWI) is a special instruction that allows a program to request services
from the operating system (OS), such as printing a message or reading user input.
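How It Works
1. The program executes SWI with a number that identifies the requested service.
2. The processor raises a software interrupt exception, switches to Supervisor (SVC) mode, saves
the return address in the link register, and branches to the SWI exception vector.
3. The OS handler performs the service and then returns control to the program.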
Syntax
SWI <SWI_number>
Task:
Example Code
MOV r0, #1 ; File descriptor for standard output (screen)
LDR r1, =message ; Load address of message into r1
MOV r2, #13 ; Length of the message
MOV r7, #4 ; System call number for printing (Linux ARM)
SWI 0 ; Call operating system to print
MOV r7, #1 ; System call number for exit
SWI 0 ; Call OS to terminate the program
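Explanation
r0, r1, and r2 pass the arguments (output file descriptor, message address, and message length),
r7 selects the system call, and SWI 0 hands control to the operating system. The second SWI 0,
with r7 = 1, terminates the program.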
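The Program Status Register (PSR) stores important information about the processor, such as the
condition flags (N, Z, C, V), the interrupt disable bits (I, F), the Thumb bit (T), and the
processor mode bits. Two instructions access it: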
1. MRS (Move PSR to Register): Copies PSR (CPSR or SPSR) into a general-purpose register.
2. MSR (Move Register to PSR): Writes a register value or an immediate value into PSR.
Syntax (Simple)
MRS Rd, <CPSR | SPSR> ; Copy PSR to a register
MSR <CPSR | SPSR>_<field>, Rm ; Move a register value to PSR
MSR <CPSR | SPSR>_<field>, #immediate ; Move an immediate value to PSR
Code
MOV r0, #0x1F ; Load System Mode value
MSR cpsr_c, r0 ; Set CPSR to System Mode
Simple Breakdown
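The _c suffix in cpsr_c selects the control field, which is the lowest 8 bits of the PSR (interrupt masks, T bit, and mode bits). Only the selected field is overwritten and the other bits are kept. The Python sketch below illustrates this masking. The function names, the FIELD_MASKS and MODE_NAMES tables, and the starting cpsr value 0x600000D3 are assumptions chosen for illustration.

```python
# Each field letter selects one byte of the 32-bit PSR.
FIELD_MASKS = {
    "c": 0x000000FF,  # control: I, F, T and mode bits
    "x": 0x0000FF00,  # extension
    "s": 0x00FF0000,  # status
    "f": 0xFF000000,  # flags: N, Z, C, V
}

MODE_NAMES = {
    0x10: "User", 0x11: "FIQ", 0x12: "IRQ", 0x13: "Supervisor",
    0x17: "Abort", 0x1B: "Undefined", 0x1F: "System",
}


def msr(psr, fields, value):
    # Replace only the selected fields of psr with the bits from value.
    mask = 0
    for field in fields:
        mask |= FIELD_MASKS[field]
    return (psr & ~mask & 0xFFFFFFFF) | (value & mask)


def mode_of(psr):
    return MODE_NAMES.get(psr & 0x1F, "unknown")


cpsr = 0x600000D3             # hypothetical: Z and C set, Supervisor mode
cpsr = msr(cpsr, "c", 0x1F)   # MSR cpsr_c, r0 with r0 = 0x1F
print(f"cpsr = 0x{cpsr:08X}, mode = {mode_of(cpsr)}")
```

Writing 0x1F to the control field also clears the I and F bits, because they share the same byte as the mode bits.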
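Coprocessor instructions extend the ARM instruction set by allowing specialized operations
such as floating-point arithmetic and system control (for example, cache and memory
management through coprocessor 15, CP15).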
Syntax
CDP cp, opcode1, Cd, Cn, Cm, opcode2 ; Perform an operation in the coprocessor
MRC cp, opcode1, Rd, Cn, Cm, opcode2 ; Read a coprocessor register into a CPU register
MCR cp, opcode1, Rd, Cn, Cm, opcode2 ; Write a CPU register to a coprocessor register
LDC cp, Cd, [address] ; Load a coprocessor register from memory
STC cp, Cd, [address] ; Store a coprocessor register to memory
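This example reads the processor identification number from CP15 register c0 and stores it
in r10.
MRC p15, 0, r10, c0, c0, 0 ; Read CP15 register c0 (processor ID) into r10
The next example enables the cache by setting a bit in CP15 control register c1.
MRC p15, 0, r1, c1, c0, 0 ; Read CP15 control register c1 into r1
ORR r1, r1, #0x4 ; Set bit 2 to enable the cache
MCR p15, 0, r1, c1, c0, 0 ; Write the modified value back to CP15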
ARM does not have a direct instruction to move an arbitrary 32-bit constant into a register
because its instructions are also 32 bits long. Instead, the assembler provides two
pseudoinstructions, LDR and ADR.
Syntax
LDR Rd, =constant ; Load a 32-bit constant into Rd
ADR Rd, label ; Load an address into Rd
Key Takeaways
LDR → Loads a constant into a register.
ADR → Loads an address into a register.
If the constant cannot be encoded directly in a MOV or MVN instruction, the assembler stores
it in memory (in a literal pool) and loads it using a pc-relative LDR.
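Whether a constant "can be encoded directly" depends on the ARM immediate format: an 8-bit value rotated right by an even number of bits. The Python sketch below checks this rule. It is an illustration, not an assembler: the function names rol32 and is_arm_immediate are invented here, and the check covers MOV only (it does not try the inverted value that MVN could encode).

```python
def rol32(value, amount):
    # Rotate a 32-bit value left by amount bits.
    value &= 0xFFFFFFFF
    amount %= 32
    return ((value << amount) | (value >> (32 - amount))) & 0xFFFFFFFF


def is_arm_immediate(value):
    # An immediate is imm8 rotated right by an even amount, so rotating
    # the constant left by that amount must leave a value of 8 bits or less.
    return any(rol32(value, rot) <= 0xFF for rot in range(0, 32, 2))


for constant in (0xFF, 0xFF000000, 0x104, 0x101, 0x12345678):
    kind = "fits in MOV" if is_arm_immediate(constant) else "needs a literal pool load"
    print(f"0x{constant:08X}: {kind}")
```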