
Module 1 and 2 NOTES-MC

The document provides an overview of microcontrollers and microprocessors, detailing their definitions, components, applications, and differences. It categorizes applications into low, medium, and high-level for both microcontrollers and microprocessors, highlighting their use in various fields such as IoT, automotive, and embedded systems. Additionally, it discusses design philosophies like RISC and ARM, along with memory types and bus technologies relevant to embedded systems.

Uploaded by akashgadde05
Copyright © All Rights Reserved

MODULE 1

1.1 What is a Microcontroller?

A microcontroller (MCU) is a compact integrated circuit designed to perform specific tasks in
an embedded system. It contains a processor (CPU), memory (RAM, ROM/Flash), and
input/output (I/O) peripherals on a single chip. Unlike general-purpose computers,
microcontrollers are optimized for real-time control applications and embedded systems.

Here are some commonly used microcontrollers:

1. ARM Cortex-M Series (STM32, NXP LPC, TI MSP432)
   - Used in industrial automation, robotics, and IoT devices.
2. AVR Microcontrollers (ATmega328, ATtiny85)
   - Popular in Arduino boards; used in DIY electronics and robotics.
3. PIC Microcontrollers (PIC16F877A, PIC18F4550)
   - Used in automotive, medical, and consumer electronics.
4. ESP Series (ESP8266, ESP32)
   - Widely used in IoT applications, home automation, and smart devices.
5. 8051 Microcontrollers (Intel 8051, Atmel 89C51)
   - Used in legacy embedded systems and basic automation projects.

1.2 Applications of Microcontrollers

Microcontrollers can be categorized based on their complexity and application levels into low-
level, medium-level, and high-level applications.

Low-Level Applications

Low-level applications of microcontrollers involve simple, basic tasks that require minimal
processing power and limited peripheral interfacing. These applications are typically found in
consumer appliances and everyday electronic devices. Examples include automatic door openers,
temperature controllers in home appliances, basic remote controls, and simple LED-based
display systems. Such systems use microcontrollers primarily for on/off control, basic
automation, and sensor-based adjustments without complex computations.

Medium-Level Applications

Medium-level applications involve microcontrollers performing more complex tasks, often
requiring real-time processing and interaction with multiple sensors and actuators. These
applications are commonly found in industrial automation, automotive systems, and medical
devices. Examples include motor speed controllers in manufacturing plants, electronic
dashboards in vehicles, digital blood pressure monitors, and smart home automation systems.
These microcontrollers handle tasks such as pulse-width modulation (PWM) for motor control,
communication through serial protocols like UART or I2C, and real-time decision-making based
on sensor inputs.
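The PWM duty-cycle arithmetic mentioned above can be sketched in C. This is a minimal sketch, assuming a hypothetical up-counting timer that counts from 0 to a `period` value and holds its output high while the count is below a compare value; real timer peripherals differ in register names and counting modes, and `pwm_compare_value` and `pwm_frequency_hz` are illustrative helpers, not a vendor API.

```c
#include <stdint.h>

/* Hypothetical helper: compare value for a desired duty cycle (0-100 %).
 * Assumes an up-counting timer whose counter runs from 0 to `period`
 * (period + 1 ticks per PWM cycle, period < UINT32_MAX) and whose output
 * is high while counter < compare. Duty cycles above 100 % are clamped. */
uint32_t pwm_compare_value(uint32_t period, uint32_t duty_percent)
{
    if (duty_percent > 100u) {
        duty_percent = 100u;
    }
    uint64_t ticks_per_cycle = (uint64_t)period + 1u;
    return (uint32_t)((ticks_per_cycle * duty_percent) / 100u);
}

/* PWM output frequency for a timer clocked at timer_hz. */
uint32_t pwm_frequency_hz(uint32_t timer_hz, uint32_t period)
{
    return (uint32_t)((uint64_t)timer_hz / ((uint64_t)period + 1u));
}
```

For example, with a 16 MHz timer clock and `period` = 999, the PWM frequency is 16 kHz and a 25 % duty cycle needs a compare value of 250.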

High-Level Applications

High-level applications require microcontrollers with high processing capabilities, multiple
communication interfaces, and advanced real-time processing. These are typically found in
complex embedded systems, Internet of Things (IoT) devices, robotics, aerospace systems, and
artificial intelligence-based embedded applications. Examples include autonomous drones,
robotic arms used in precision surgery, advanced automotive control units such as anti-lock
braking systems (ABS) and adaptive cruise control, and smart IoT-based security surveillance
systems. In such applications, microcontrollers interact with multiple peripherals, process large
volumes of data, support wireless communication (Wi-Fi, Bluetooth, LoRa), and execute
complex algorithms for decision-making and automation.

1.3 What is a Microprocessor?

A microprocessor (MPU) is the central unit of a computing system that performs arithmetic and
logic operations. Unlike a microcontroller, a microprocessor does not have built-in memory or
input/output peripherals and requires external components for full functionality. It is used in
more complex computing tasks that demand higher processing power.

Examples of Microprocessors

Microprocessors come in various architectures and are used in different applications. Some
common examples include:

1. Intel x86 Series (Intel Core i3, i5, i7, i9, Xeon, Pentium)
   - Used in personal computers, workstations, and servers.
2. AMD Processors (Ryzen 3, 5, 7, 9, Threadripper, EPYC)
   - Found in gaming PCs, high-performance computing, and cloud servers.
3. ARM-Based Processors (Apple M1, Qualcomm Snapdragon, NVIDIA Tegra,
   Samsung Exynos)
   - Commonly used in smartphones, tablets, and embedded AI applications.
4. RISC-V Processors (SiFive, Kendryte K210)
   - Emerging in open-source computing and IoT applications.
5. IBM Power and SPARC Processors (IBM Power9, Oracle SPARC)
   - Used in enterprise servers, high-performance computing, and data centers.

1.4 Applications of Microprocessors

Microprocessors can be categorized based on their level of complexity and computational
requirements.

Low-Level Applications

Low-level applications of microprocessors involve simple computing tasks that require minimal
processing power. These applications include calculators, basic electronic cash registers, and
simple point-of-sale (POS) systems. Such systems primarily rely on basic arithmetic and logic
operations without requiring high-speed data processing or multitasking capabilities.

Medium-Level Applications

Medium-level applications involve tasks that require real-time data processing, multitasking, and
moderate computational power. Examples include personal computers, ATMs, industrial
automation systems, and medical imaging devices such as ultrasound machines. Microprocessors
in these applications handle tasks such as running an operating system, processing multiple
input/output operations, and managing real-time user interactions.

High-Level Applications

High-level applications require microprocessors with extensive computational capabilities,
high-speed processing, and complex multitasking. These applications include supercomputers, cloud
computing servers, artificial intelligence (AI) accelerators, and autonomous vehicle control
systems. Examples include AI-powered medical diagnostics, high-frequency trading systems in
finance, aerospace navigation systems, and deep learning-based robotics. These microprocessors
support large-scale parallel processing, high-speed data transfer, and advanced machine learning
algorithms for real-time decision-making.

1.5 Comparison Between Microcontroller and Microprocessor

| Feature | Microcontroller (MCU) | Microprocessor (MPU) |
|---|---|---|
| Definition | A compact integrated circuit with CPU, memory, and I/O peripherals on a single chip. | A processing unit that requires external memory, I/O, and other components for functionality. |
| Components | Includes CPU, RAM, ROM/Flash, timers, and I/O ports within a single chip. | Only contains the CPU; requires external RAM, ROM, and I/O devices. |
| Purpose | Designed for specific embedded applications with real-time processing. | Used for general-purpose computing, multitasking, and high-performance applications. |
| Processing Power | Lower compared to microprocessors, optimized for control-based tasks. | Higher processing power, suitable for complex computing tasks. |
| Memory | Limited, built-in RAM and ROM/Flash memory. | Requires external RAM and ROM for program execution. |
| Power Consumption | Low power consumption, ideal for battery-operated devices. | Higher power consumption due to complex operations. |
| Cost | Generally low-cost due to integration of peripherals. | Higher cost due to external components and higher performance. |
| Applications | Used in embedded systems, IoT devices, industrial automation, and home appliances. | Used in computers, servers, smartphones, AI systems, and high-performance computing. |
| Examples | STM32, ATmega328 (Arduino), ESP32, PIC16F877A | Intel Core i7, AMD Ryzen, ARM Cortex-A, Apple M1 |

1.6 The RISC Design Philosophy

RISC (Reduced Instruction Set Computer) processors follow a simple and efficient approach to
executing instructions. Unlike CISC (Complex Instruction Set Computer) processors, which use
many complex instructions that take multiple cycles to execute, RISC processors use a small
set of simple instructions, each designed to complete in a single clock cycle. Figure 1.1
illustrates these major differences.

• Simple Instructions – RISC processors use a small set of simple instructions, each
designed to execute in one clock cycle. In contrast, CISC processors have complex
instructions that take multiple cycles.
• Fixed Instruction Size – All RISC instructions are the same size, so the processor can
fetch and decode them efficiently. CISC instructions vary in size, which makes fetching
and decoding more complex and slower.
• Load-Store Architecture – RISC processors separate memory access from computation:
data must be loaded into registers before it is processed. This reduces memory delays,
unlike CISC processors, which can operate on memory directly.
• Pipelining for Speed – The uniform instruction size in RISC makes pipelining more
effective, allowing several instructions to be in progress at once, each at a different stage.
CISC processors struggle with pipelining due to variable execution times.
• More Registers, Less Memory Access – RISC processors have more registers to store
frequently used data, reducing slow memory accesses. CISC processors rely more on
external memory, making them slower in comparison.
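To see why a fixed instruction size helps fetch and decode, compare how the processor finds the start of the next instruction. With a fixed 4-byte size, the next address is simply the current address plus 4, known before any decoding. With a variable-length encoding, the length of every earlier instruction must be decoded first. The sketch below uses a made-up length rule (the low two bits of the first byte give the length) purely for illustration; it is not the encoding of x86 or any real CISC instruction set, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed-length (RISC-style): instruction n starts at base + 4 * n, so the
 * fetch unit never has to look inside an instruction first. */
uint32_t fixed_instr_addr(uint32_t base, uint32_t n)
{
    return base + 4u * n;
}

/* Hypothetical variable-length encoding, for illustration only:
 * the low two bits of the first byte give (length - 1), i.e. 1-4 bytes. */
static size_t var_instr_len(uint8_t first_byte)
{
    return (size_t)(first_byte & 0x3u) + 1u;
}

/* Variable-length (CISC-style): to locate instruction n, the length of
 * every instruction before it must be decoded in sequence. Returns the
 * byte offset of instruction n, or (size_t)-1 if the code runs out. */
size_t var_instr_offset(const uint8_t *code, size_t code_len, size_t n)
{
    size_t offset = 0;
    for (size_t i = 0; i < n; i++) {
        if (offset >= code_len) {
            return (size_t)-1;
        }
        offset += var_instr_len(code[offset]);
    }
    return offset < code_len ? offset : (size_t)-1;
}
```

The fixed-length case is a single multiply-add; the variable-length case is a loop whose cost grows with `n`, which is the kind of sequential dependency that makes CISC front ends harder to pipeline.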

Comparison Between CISC and RISC

| Feature | CISC (Complex Instruction Set Computing) | RISC (Reduced Instruction Set Computing) |
|---|---|---|
| Instruction Set | Large and complex | Small and simple |
| Instruction Length | Variable | Fixed |
| Execution Time | Takes multiple clock cycles | Mostly completes in one clock cycle |
| Number of Instructions | Many instructions, some performing multiple tasks | Few instructions, each performing a single task |
| Memory Usage | Uses less program memory, since one complex instruction can replace several simple ones | Requires more program memory, as more simple instructions are needed |
| Registers | Fewer registers, relies more on memory access | More registers, reduces memory access |
| Performance | Often slower due to complex operations | Often faster due to simpler, optimized operations |
| Design Complexity | More complex hardware | Simpler hardware design |
| Power Consumption | Higher due to complex execution | Lower, making it more efficient |
| Examples | Intel and AMD x86 processors | ARM, MIPS, PowerPC |

RISC is preferred for power-efficient and high-speed processing, while CISC is used for
compatibility and handling complex instructions efficiently.

1.7 The ARM Design Philosophy

• Power Efficiency for Portable Devices – ARM processors are designed to be small and
power-efficient, making them ideal for battery-powered devices like mobile phones
and tablets. Lower power consumption helps in extending battery life, which is
essential for portable embedded systems.
• High Code Density for Limited Memory – Since embedded systems often have
limited memory, ARM processors provide high code density, ensuring that more
instructions can fit into a smaller memory space. This is crucial for applications like
mass storage devices and mobile phones, where compact memory is a key requirement.
• Cost-Effective Design – Embedded systems are cost-sensitive, and ARM processors
support slow and low-cost memory devices, making them suitable for high-volume
applications like digital cameras and industrial controllers. This helps in reducing
overall manufacturing costs.
• Compact Die Size for Integration – ARM processors are designed to take up less space
on the chip (die), allowing manufacturers to add more specialized peripherals on the
same chip. This results in lower production costs and enables single-chip solutions for
various applications.
• Enhanced Debugging for Faster Development – ARM includes hardware debugging
technology, allowing software engineers to monitor and analyze the processor’s
execution in real time. This improves troubleshooting, speeds up development time,
and helps bring products to market faster.
• Balanced RISC Approach for Embedded Systems – While ARM is based on RISC
architecture, it is not a pure RISC processor. It balances efficiency and performance,
ensuring low power consumption without compromising system performance, which is
essential for modern embedded applications.

1.7.1 Instruction Set for Embedded Systems

• Some Instructions Take More Time – Unlike a pure RISC processor, not every ARM
instruction executes in a single cycle. For example, the number of cycles taken by a
load-store-multiple instruction depends on how many registers it transfers. These
transfers use sequential memory addresses, which are often faster to access than random
ones, and one instruction replaces several, which saves code space.
• Barrel Shifter for Better Performance – ARM has a barrel shifter that can shift or rotate
one input operand before the ALU uses it, so many complex expressions need no extra
instructions.
• Thumb 16-bit Instruction Set – ARM can use both 16-bit and 32-bit instructions. The
16-bit Thumb instructions save memory and make programs smaller.
• Conditional Execution Saves Time – An ARM instruction can be made to execute only
when a condition is met, reducing the need for extra branch instructions and making
programs faster.
• Special DSP Instructions for Faster Processing – ARM includes extra instructions for
fast math operations, making it suitable for digital signal processing (DSP) without
needing a separate DSP chip.
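The barrel shifter described above can be modelled in C. This simplified sketch applies one of the four basic shift types (LSL, LSR, ASR, ROR) to the second operand before an addition, the way an instruction such as `ADD r0, r1, r2, LSL #2` computes `r1 + (r2 << 2)` in a single instruction. The function names are hypothetical, and the model ignores the carry-out and special shift cases that a real core handles.

```c
#include <stdint.h>

/* The four basic shift types of the ARM barrel shifter. */
typedef enum { SHIFT_LSL, SHIFT_LSR, SHIFT_ASR, SHIFT_ROR } shift_type;

/* Simplified barrel shifter: preprocess a 32-bit operand before the ALU.
 * The shift amount is taken modulo 32; carry-out, shifts by 32 and RRX
 * are not modelled. */
uint32_t barrel_shift(uint32_t value, shift_type type, unsigned amount)
{
    amount &= 31u;
    if (amount == 0u) {
        return value;
    }
    switch (type) {
    case SHIFT_LSL:
        return value << amount;                      /* logical shift left */
    case SHIFT_LSR:
        return value >> amount;                      /* logical shift right, zero fill */
    case SHIFT_ASR: {
        uint32_t result = value >> amount;           /* arithmetic shift right: */
        if (value & 0x80000000u) {                   /* copy the sign bit into */
            result |= ~(0xFFFFFFFFu >> amount);      /* the vacated top bits   */
        }
        return result;
    }
    case SHIFT_ROR:
        return (value >> amount) | (value << (32u - amount)); /* rotate right */
    }
    return value;
}

/* Model of "ADD rd, rn, rm, <shift> #amount": shift rm, then add to rn,
 * all within one instruction (e.g. base + index * 4 for word arrays). */
uint32_t add_shifted(uint32_t rn, uint32_t rm, shift_type type, unsigned amount)
{
    return rn + barrel_shift(rm, type, amount);
}
```

For example, `add_shifted(0x1000, 3, SHIFT_LSL, 2)` gives 0x100C, the address of element 3 in an array of 32-bit words starting at 0x1000.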

These features make ARM one of the most popular processors for embedded systems
worldwide.

1.8 Embedded System Hardware

Embedded systems are used in many devices, from small sensors in factories to real-time
control systems in space missions. These systems combine software and hardware, with each
part designed for efficiency and sometimes for future upgrades. Figure 1.2 shows a typical
embedded device based on an ARM core. Each box represents a feature or function, and the
lines connecting the boxes are the buses carrying data. The device can be separated into
four main hardware components:
• ARM Processor – This is the brain of the system, processing instructions and handling
data. Different versions of the ARM processor are available, depending on the needs of
the device. It includes a core for execution and extra components like memory
management and caches to improve performance.
• Controllers – These manage key parts of the system. Common types include the
interrupt controller, which handles signals from different parts of the system, and the
memory controller, which manages data storage and retrieval.
• Peripherals – These are the input and output devices that allow the embedded system
to interact with the outside world. They make each device unique by adding special
functions, such as displays, sensors, and communication interfaces.
• Bus – The bus is a pathway that connects all parts of the system, allowing data to move
between components efficiently.

Each part of an embedded system is carefully selected to ensure smooth performance, low
power consumption, and cost-effectiveness.

1.8.1 ARM Bus Technology

Embedded systems use a different bus system than PCs. Unlike the PCI bus in computers,
which connects external devices, embedded systems use an on-chip bus to link peripherals with
the ARM core.
• Bus Masters & Slaves – The ARM processor is a bus master, meaning it starts data
transfers. Peripherals act as bus slaves, responding to requests from the master.
• Two Levels of Bus Architecture – The physical level defines the bus width (16, 32, or
64 bits), while the protocol level sets rules for communication between the processor and
peripherals.
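The master/slave relationship can be pictured as an address decoder: the master (the ARM core) starts every transfer by issuing an address, and only the slave whose address range contains it responds. The sketch below is a software model under assumed conditions; the address map, the `bus_slave` type, and the handlers are hypothetical and do not represent AMBA signal-level behaviour.

```c
#include <stddef.h>
#include <stdint.h>

/* One bus slave: an address range plus a read handler. */
typedef struct {
    uint32_t base;
    uint32_t size;
    uint32_t (*read)(uint32_t offset);   /* offset from base, word reads only */
} bus_slave;

/* Hypothetical slaves for illustration. */
static uint32_t ram_words[64];                                         /* 256 bytes of "RAM" */
static uint32_t ram_read(uint32_t offset)   { return ram_words[offset / 4u]; }
static uint32_t timer_read(uint32_t offset) { return 0x1000u + offset; } /* fake timer value */

/* Hypothetical address map; a real map comes from the chip's datasheet. */
static const bus_slave slaves[] = {
    { 0x00000000u, sizeof ram_words, ram_read   },
    { 0x40000000u, 0x100u,           timer_read },
};

/* The master starts the transfer; the decoder selects the one slave whose
 * range contains addr. Returns 0 on success, or -1 if no slave responds
 * (a real bus would signal an error response instead). */
int bus_read(uint32_t addr, uint32_t *out)
{
    for (size_t i = 0; i < sizeof slaves / sizeof slaves[0]; i++) {
        if (addr >= slaves[i].base && addr - slaves[i].base < slaves[i].size) {
            *out = slaves[i].read(addr - slaves[i].base);
            return 0;
        }
    }
    return -1;
}
```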

1.8.2 AMBA Bus Protocol

The Advanced Microcontroller Bus Architecture (AMBA) was introduced in 1996 for ARM
processors. It allows easy reuse of peripherals across different projects, improving
compatibility and speed.

Types of AMBA Buses:

1. ASB & APB – Early ARM system and peripheral buses.
2. AHB (Advanced High-Performance Bus) – A faster, high-bandwidth bus that supports 64-bit
and 128-bit widths.
3. Multi-layer AHB – Allows multiple bus masters to be active at the same time, enabling
parallel transfers.
4. AHB-Lite – A simpler subset of AHB limited to a single bus master, ideal for smaller designs.

These buses improve data transfer speed, efficiency, and system performance, making ARM-
based embedded systems more powerful and flexible.

1.8.3 Memory

Embedded systems require memory to store and execute code. The choice of memory impacts
price, performance, and power consumption. Important characteristics to consider include
memory hierarchy, width, and type. If memory needs to operate at a higher speed to maintain
bandwidth requirements, power consumption may increase.

1.8.3.1 Hierarchy

Memory in computer systems is arranged hierarchically. Some embedded devices support
external off-chip memory, while others integrate cache memory within the processor to improve
performance. Figure 1.3 illustrates the memory trade-offs: the fastest cache memory is closest to
the processor core, while slower secondary memory is located further away.
• Cache Memory: Positioned between the processor core and main memory, cache speeds
up data transfer. However, while it improves general performance, it introduces
unpredictability in execution time, making it less suitable for real-time systems. Many
small embedded systems do not require a cache.
• Flash ROM: Used to store firmware and long-term data, flash ROM can be written to but
at a slower rate than read operations. It is software-controlled and does not require
additional hardware circuitry, making it cost-effective. It is increasingly used as an
alternative for secondary storage.
• DRAM (Dynamic RAM): The most common type of RAM, DRAM is cost-effective but
requires periodic refreshing. A DRAM controller must be set up before usage to handle
memory refresh cycles.
• SRAM (Static RAM): Faster than DRAM, SRAM does not require refreshing. It offers
shorter access times but is more expensive and consumes more silicon space. As a result,
it is mainly used for fast memory and cache storage.
• SDRAM (Synchronous DRAM): A subtype of DRAM, SDRAM operates at higher
clock speeds and synchronizes with the processor bus. It pipelines data access for
efficient bursts, outperforming traditional DRAM in performance.

1.8.3.2 Width

Memory width defines the number of bits retrieved per access—typically 8, 16, 32, or 64 bits.
Memory width significantly influences performance and cost.
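Memory width determines how many accesses are needed per fetch. For example, fetching a 32-bit ARM instruction from 16-bit-wide memory takes two accesses, while a 16-bit Thumb instruction takes one. The helper functions below are a hypothetical sketch of that arithmetic, together with peak bandwidth computed as width times access rate.

```c
#include <stdint.h>

/* Accesses needed to fetch one item of item_bits through memory that
 * delivers width_bits per access (width_bits must be non-zero). */
unsigned accesses_per_fetch(unsigned item_bits, unsigned width_bits)
{
    return (item_bits + width_bits - 1u) / width_bits;   /* round up */
}

/* Peak bandwidth in bytes per second for a memory of width_bits that
 * completes accesses_per_s accesses every second. */
uint64_t peak_bandwidth(unsigned width_bits, uint64_t accesses_per_s)
{
    return (uint64_t)(width_bits / 8u) * accesses_per_s;
}
```

With these helpers, a 32-bit-wide memory completing 100 million accesses per second has a peak bandwidth of 400 million bytes per second, while an 8-bit-wide memory needs four accesses for every 32-bit instruction.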

1.8.3.3 Types

There are various types of memory in ARM-based embedded systems:

• ROM (Read-Only Memory): ROM is permanent and cannot be reprogrammed after
production. It is used for high-volume applications that do not require updates.
• Flash ROM: Unlike traditional ROM, Flash ROM is writable and used for firmware
storage and non-volatile data retention.
• DRAM (Dynamic RAM): Requires periodic refreshing but offers a low-cost, high-capacity
solution for system memory.
• SRAM (Static RAM): Faster than DRAM, SRAM is primarily used for caching and
high-speed tasks.
• SDRAM (Synchronous DRAM): A high-performance memory type that synchronizes
with the processor clock for efficient data transfer.

1.8.4 Peripherals

A peripheral is a device that helps a processor communicate with the outside world. It can send
and receive data from sensors, displays, or other electronic devices.

Peripherals can be:

• Simple, like a device that sends data through a USB port.
• Complex, like a Wi-Fi module that connects to the internet.

In ARM systems, all peripherals are memory-mapped. This means they are controlled using
special memory locations. Each peripheral has a set of registers (small storage spaces) that the
processor can read or write to.
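Memory-mapped access is usually written in C as a pointer to a struct of `volatile` registers placed at the peripheral's base address. In this minimal sketch the GPIO register layout and the address in the comment are hypothetical, and the registers are simulated in RAM so the code can run on a PC.

```c
#include <stdint.h>

/* Hypothetical register layout of a simple GPIO peripheral. */
typedef struct {
    volatile uint32_t DATA;   /* pin levels, one bit per pin */
    volatile uint32_t DIR;    /* 1 = output, 0 = input */
} gpio_regs;

/* On real hardware the base address comes from the datasheet, e.g.
 *   #define GPIO ((gpio_regs *)0x40020000u)    (hypothetical address)
 * To keep this sketch runnable on a PC, the "registers" live in RAM. */
static gpio_regs simulated_gpio;
#define GPIO (&simulated_gpio)

void gpio_make_output(unsigned pin)
{
    GPIO->DIR |= (1u << pin);        /* read-modify-write of one register bit */
}

void gpio_write(unsigned pin, int high)
{
    if (high) {
        GPIO->DATA |= (1u << pin);
    } else {
        GPIO->DATA &= ~(1u << pin);
    }
}

int gpio_read(unsigned pin)
{
    return (int)((GPIO->DATA >> pin) & 1u);
}
```

The `volatile` qualifier tells the compiler that every read and write must actually reach the register, which matters when the hardware can change a value on its own.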

Some special peripherals are called controllers because they manage important tasks. Two
important controllers are:

1. Memory Controller – Helps the processor use different types of memory.
2. Interrupt Controller – Helps the processor handle important signals from devices.

Memory Controllers

A memory controller connects different types of memory (like RAM or Flash memory) to the
processor.

• When the system starts, some memory is already active so the startup code can run.
• Other types of memory, like DRAM, need to be set up by software before they can be
used.

The memory controller makes sure the processor can read and write data from memory correctly.

Interrupt Controllers

When a device needs the processor's attention, it sends a signal to the processor. This signal
is called an interrupt because it stops the processor from its regular work to handle the
important event.

An interrupt controller helps the processor decide:

• Which device is sending the interrupt?
• Which interrupts are most important?

There are two types of interrupt controllers in ARM processors:

1. Standard Interrupt Controller
   • Sends a signal when a device needs attention.
   • The processor checks a register to find out which device sent the signal.
2. Vector Interrupt Controller (VIC)
   • It prioritizes interrupts, meaning urgent ones are handled first.
   • It can also send the processor directly to the correct interrupt handler (a special
     function that deals with the interrupt).
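The difference between the two controller types can be modelled in software. In this sketch the sources, handlers, and priority ranking are hypothetical: `standard_dispatch` plays the role of an interrupt handler that reads a status register to find the requesting device, while `vic_vector` models a VIC that selects the highest-priority pending source and supplies its handler directly.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_SOURCES 4

typedef void (*isr_fn)(void);

/* Hypothetical interrupt sources 0-3 and their handlers. */
static int last_handled = -1;
static void timer_isr(void)  { last_handled = 0; }
static void uart_isr(void)   { last_handled = 1; }
static void button_isr(void) { last_handled = 2; }
static void adc_isr(void)    { last_handled = 3; }

static const isr_fn handlers[NUM_SOURCES] = { timer_isr, uart_isr, button_isr, adc_isr };

/* Standard controller: the core only learns that "some device" interrupted.
 * Software reads the status register (bit n set = source n pending) and
 * picks a handler itself; here it simply takes the lowest bit set.
 * Returns the source serviced, or -1 if nothing was pending. */
int standard_dispatch(uint32_t status)
{
    for (int src = 0; src < NUM_SOURCES; src++) {
        if (status & (1u << src)) {
            handlers[src]();
            return src;
        }
    }
    return -1;
}

/* VIC-style controller: each source has a priority and a handler address.
 * This function models the hardware choosing the most urgent pending source
 * and handing its handler straight to the core, with no software scan.
 * priority_order[0] is the most urgent source (hypothetical ranking). */
static const int priority_order[NUM_SOURCES] = { 2, 1, 0, 3 };

isr_fn vic_vector(uint32_t status)
{
    for (int i = 0; i < NUM_SOURCES; i++) {
        int src = priority_order[i];
        if (status & (1u << src)) {
            return handlers[src];
        }
    }
    return NULL;
}
```

With sources 1 and 2 both pending (status 0x6), the standard handler services source 1 because it happens to scan from bit 0, while the VIC model returns the handler for source 2, which was ranked most urgent.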

1.9 Embedded System Software

An embedded system needs software to function properly. Figure 1.4 shows four typical
software components required to control an embedded device.

This software is made up of different parts, each with a specific role in controlling the hardware
and managing tasks. The four main parts of embedded system software are:
• Initialization Code: The first code that runs when the system starts. It sets up basic
hardware like memory and processor settings before handing control to the operating
system or another program.
• Operating System: Manages system resources like memory, processing power, and
devices. Some embedded systems use a full operating system, while others use a simple
task scheduler.
• Device Drivers: Act as a link between the hardware and software, helping the system
control devices like sensors, displays, and communication modules.
• Applications: Perform specific tasks, like running a diary app on a mobile phone. Some
systems run multiple applications at the same time, managed by the operating system.

1.9.1 Initialization (Boot) Code

When an embedded system starts, the boot code runs first. It takes the processor from a reset
state to a working state where it can execute programs. The main tasks of boot code include:

• Setting up the memory controller to manage memory operations.
• Initializing the processor cache to improve performance.
• Configuring important hardware devices like storage or communication ports.

In some cases, boot code also runs hardware tests to check if all components are working
before loading the main software. If the system does not have a full operating system, the boot
code may start a simple task scheduler that controls when tasks run, or a debug monitor to help
developers check the system.
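A simple task scheduler of the kind mentioned above can be as small as a table of periodic functions called from a timer tick. This is a minimal cooperative sketch, not the design of any particular RTOS; the tasks, their periods, and `scheduler_tick` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* A task is a function that should run every period_ticks ticks. */
typedef struct {
    void (*run)(void);
    uint32_t period_ticks;
    uint32_t next_due;            /* tick at which the task runs next */
} task;

/* Hypothetical tasks for illustration. */
static unsigned blink_count, sensor_count;
static void blink_led(void)   { blink_count++; }
static void read_sensor(void) { sensor_count++; }

static task tasks[] = {
    { blink_led,   5u, 0u },
    { read_sensor, 2u, 0u },
};

/* Called once per timer tick (from a periodic interrupt or a polling loop).
 * Runs every task that is due and schedules its next run. Tasks must return
 * quickly: this cooperative scheduler cannot pre-empt them. */
void scheduler_tick(uint32_t now)
{
    for (size_t i = 0; i < sizeof tasks / sizeof tasks[0]; i++) {
        /* Signed difference keeps the comparison correct when ticks wrap. */
        if ((int32_t)(now - tasks[i].next_due) >= 0) {
            tasks[i].run();
            tasks[i].next_due = now + tasks[i].period_ticks;
        }
    }
}
```

Calling `scheduler_tick` for ticks 0 to 9 runs `blink_led` twice (ticks 0 and 5) and `read_sensor` five times (ticks 0, 2, 4, 6, and 8). Because tasks are never interrupted, one slow task delays all the others.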

1.9.2 Operating System

After initialization, the operating system (OS) takes control. It organizes system resources,
manages memory, and ensures multiple tasks can run smoothly. Some embedded systems use
full operating systems, while others rely on simpler mechanisms.

Operating systems for embedded systems can be classified into:

1. Real-Time Operating Systems (RTOS): Designed for quick responses to events.
   A. Hard real-time OS: Used in critical applications like airbags, where timing is
      crucial.
   B. Soft real-time OS: Used in applications like digital music players, where minor
      delays are acceptable.
2. Platform Operating Systems: These support more complex applications and require
   more memory. Linux is a common example.

An operating system makes sure the embedded system runs efficiently. The choice of OS
depends on the needs of the application, whether it requires real-time performance or supports
general-purpose tasks.

1.9.3 Applications

Applications are programs that perform specific tasks on an embedded system. The operating
system manages these applications, ensuring they run smoothly. Some embedded systems run a
single application, while others support multiple applications running simultaneously.

ARM processors are used in many industries, including networking, automotive, mobile
devices, consumer electronics, storage, and imaging. For example:

• In networking, ARM processors power home gateways, DSL modems, and 802.11
wireless devices.
• In mobile devices, ARM is dominant in smartphones and tablets, making this the largest
application area.
• In storage and imaging, ARM processors are found in hard drives and inkjet printers,
where cost efficiency and high production volumes are key.

1.10 ARM Processor Fundamentals

The ARM processor consists of different functional units connected by data buses, which allow
information to move between components. These functional units work together to process
instructions and data efficiently. Figure 2.1 shows a Von Neumann implementation of the
ARM—data items and instructions share the same bus.

Data Flow and Architecture

Data enters the ARM processor through the data bus, which carries both instructions and data
in a Von Neumann architecture. In contrast, a Harvard architecture uses two separate buses,
one for instructions and one for data, so an instruction fetch and a data access can happen
at the same time.

Inside the processor, an instruction decoder translates instructions before execution. ARM
processors use a load-store architecture, meaning:

• Load instructions move data from memory into registers.
• Store instructions move data from registers back into memory.

Unlike some processors, the ARM does not process data directly in memory. Instead, all
calculations happen inside the processor using registers and the ALU (Arithmetic Logic Unit).
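The load-store rule can be illustrated with a toy register machine in C. This hypothetical model (arrays `reg` and `mem`, functions `load_word`, `store_word`, and `add_regs`) shows that adding two values held in memory takes separate load, add, and store steps, because the ALU operates only on registers.

```c
#include <stdint.h>

/* A toy load-store machine: 16 registers and 64 words of memory.
 * Addresses are byte addresses (0-255) and are assumed word-aligned. */
static uint32_t reg[16];
static uint32_t mem[64];

static void load_word(int rd, uint32_t addr)  { reg[rd] = mem[addr / 4u]; }     /* like LDR: memory -> register */
static void store_word(int rd, uint32_t addr) { mem[addr / 4u] = reg[rd]; }     /* like STR: register -> memory */
static void add_regs(int rd, int rn, int rm)  { reg[rd] = reg[rn] + reg[rm]; }  /* like ADD: registers only */

/* mem[a] = mem[a] + mem[b]. The "ALU" never touches memory: both operands
 * are loaded into registers, added, and the result is stored back, roughly
 * the sequence LDR, LDR, ADD, STR on an ARM core. */
void add_memory_words(uint32_t addr_a, uint32_t addr_b)
{
    load_word(0, addr_a);
    load_word(1, addr_b);
    add_regs(0, 0, 1);
    store_word(0, addr_a);
}
```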

One special feature of the ARM processor is the barrel shifter, which can modify values before
they enter the ALU. This allows efficient computation of complex expressions and memory
addresses.

After processing, results are written back to the register file through the result bus. For
load and store instructions, an incrementer updates the address register so the next memory
location can be accessed automatically. Instructions continue to execute in this way until an
exception or interrupt changes the normal flow of execution.

Registers in the ARM Processor

Registers are small, fast storage units inside the processor. The ARM core is a 32-bit
processor, so most instructions operate on 32-bit signed or unsigned values. If an 8-bit or 16-
bit value is loaded from memory, the hardware extends it to 32 bits before storing it in a register.
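The two ways of widening a narrow value can be shown directly in C. ARM's unsigned byte and halfword loads (LDRB, LDRH) zero-extend, while the signed versions (LDRSB, LDRSH) sign-extend; the helper functions below are illustrative, not part of any library.

```c
#include <stdint.h>

/* Zero-extension, as done by an unsigned byte/halfword load (LDRB/LDRH):
 * the upper bits of the 32-bit register are filled with 0. */
uint32_t zero_extend8(uint8_t v)   { return (uint32_t)v; }
uint32_t zero_extend16(uint16_t v) { return (uint32_t)v; }

/* Sign-extension, as done by a signed load (LDRSB/LDRSH): bit 7 (or bit 15)
 * is copied into all upper bits so the value keeps its sign. */
uint32_t sign_extend8(uint8_t v)
{
    return (v & 0x80u) ? ((uint32_t)v | 0xFFFFFF00u) : (uint32_t)v;
}

uint32_t sign_extend16(uint16_t v)
{
    return (v & 0x8000u) ? ((uint32_t)v | 0xFFFF0000u) : (uint32_t)v;
}
```

For example, the byte 0xF0 becomes 0x000000F0 when zero-extended but 0xFFFFFFF0 (that is, -16) when sign-extended.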

Key registers include:

• Rn and Rm: Source registers that hold data for calculations.
• Rd: The destination register where the result is stored.

Data flows from the register file into the ALU or MAC (Multiply-Accumulate) unit, which
performs calculations. After processing, results are either stored back in a register or used to
generate a memory address for load/store operations.

This efficient design allows ARM processors to execute instructions quickly while keeping
power consumption low, making them ideal for embedded systems.

2.1 Registers

Registers are small, fast storage units inside the ARM processor that hold either data or
memory addresses. They are identified by the letter "r" followed by a number (e.g., r4 refers to
register 4). Figure 2.2 shows the active registers available in user mode.

General-Purpose and Special-Purpose Registers

The ARM processor has up to 18 active registers, consisting of 16 data registers and 2
processor status registers. These registers are 32-bit in size and vary in function based on the
processor mode.

Among these, three registers have specific tasks:

• r13 (Stack Pointer, sp): Holds the top of the stack in the current processor mode.
• r14 (Link Register, lr): Stores the return address when calling a subroutine.
• r15 (Program Counter, pc): Contains the memory address of the next instruction to
be fetched and executed.

The stack pointer (r13) and link register (r14) can sometimes be used as general-purpose
registers. However, this is not recommended in an operating system environment because the
stack pointer must always reference a valid stack frame.

Register Behavior in ARM State

In ARM state, registers r0 to r13 are orthogonal, meaning that most instructions can use any of
them interchangeably. However, r14 (lr) and r15 (pc) have unique roles and are treated
differently by some instructions.

In addition to the 16 data registers, the ARM processor includes two program status registers
(PSRs):

• cpsr (Current Program Status Register): Stores the current state of the processor.
• spsr (Saved Program Status Register): Holds a backup of cpsr when switching
between processor modes.

Register Availability and Processor Modes

The register file contains all registers available for programming. However, the number of
visible registers depends on the current mode of the processor. The ARM processor supports
multiple operating modes, such as user mode (for applications) and privileged modes (for
handling system-level tasks), which determine which registers can be accessed at a given time.

2.2 Current Program Status Register (CPSR)

The Current Program Status Register (CPSR) is a special 32-bit register in the ARM
processor that helps monitor and control internal operations. It is part of the register file and
plays a crucial role in managing the processor's state. Figure 2.3 shows the basic layout of a
generic program status register.

Structure of CPSR

The CPSR is divided into four main sections, each 8 bits wide:

1. Flags Field – Stores condition flags that indicate the outcome of operations.
2. Status Field – Reserved for future use in newer ARM designs.
3. Extension Field – Also reserved for potential future updates.
4. Control Field – Contains essential information such as:
   • Processor mode (User mode, Supervisor mode, etc.)
   • State of the processor (ARM or Thumb mode)
   • Interrupt mask bits (to enable or disable specific interrupts)

Special Bits in CPSR

Some ARM processors have additional special bits that serve unique purposes. For example, the
J bit is found in processors with Jazelle technology, which allows the execution of 8-bit Java
instructions. Future ARM processors may introduce more bits to enhance monitoring and
control capabilities.
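The CPSR fields can be decoded with bit masks. This sketch assumes the classic 32-bit ARM layout: condition flags N, Z, C, and V in bits 31 to 28, the J bit in bit 24, the I, F, and T bits in bits 7 to 5, and the processor mode in bits 4 to 0; check the reference manual of a specific core before relying on it. The condition-checking functions also show how conditional execution tests the flags field. All names are illustrative.

```c
#include <stdint.h>

/* Bit positions in the classic ARM program status register. */
#define PSR_N (1u << 31)    /* negative result */
#define PSR_Z (1u << 30)    /* zero result */
#define PSR_C (1u << 29)    /* carry */
#define PSR_V (1u << 28)    /* signed overflow */
#define PSR_J (1u << 24)    /* Jazelle state (Jazelle-capable cores only) */
#define PSR_I (1u << 7)     /* 1 = IRQ interrupts disabled */
#define PSR_F (1u << 6)     /* 1 = FIQ interrupts disabled */
#define PSR_T (1u << 5)     /* 1 = Thumb state, 0 = ARM state */
#define PSR_MODE_MASK 0x1Fu

/* Mode field encodings. */
#define MODE_USR 0x10u
#define MODE_FIQ 0x11u
#define MODE_IRQ 0x12u
#define MODE_SVC 0x13u
#define MODE_ABT 0x17u
#define MODE_UND 0x1Bu
#define MODE_SYS 0x1Fu

uint32_t psr_mode(uint32_t psr) { return psr & PSR_MODE_MASK; }

const char *psr_mode_name(uint32_t psr)
{
    switch (psr_mode(psr)) {
    case MODE_USR: return "user";
    case MODE_FIQ: return "fiq";
    case MODE_IRQ: return "irq";
    case MODE_SVC: return "supervisor";
    case MODE_ABT: return "abort";
    case MODE_UND: return "undefined";
    case MODE_SYS: return "system";
    default:       return "unknown";
    }
}

/* Conditional execution: an instruction with a condition runs only if its
 * test on the N, Z, C, V flags passes. A few common conditions: */
int cond_eq(uint32_t psr) { return (psr & PSR_Z) != 0; }                  /* EQ: equal */
int cond_ne(uint32_t psr) { return (psr & PSR_Z) == 0; }                  /* NE: not equal */
int cond_ge(uint32_t psr) { return !(psr & PSR_N) == !(psr & PSR_V); }    /* GE: N == V */
int cond_lt(uint32_t psr) { return !(psr & PSR_N) != !(psr & PSR_V); }    /* LT: N != V */
```

For example, the value 0x600000D3 decodes as Z and C set, IRQ and FIQ disabled, ARM state, and supervisor mode, so an instruction conditioned on EQ would execute.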

2.3 Pipeline

A pipeline is a technique used by RISC (Reduced Instruction Set Computing) processors like
ARM to speed up instruction execution. It allows multiple instructions to be processed
simultaneously by breaking the execution into stages, just like an assembly line in a car factory.

Three-Stage Pipeline in ARM7

Figure 2.7 shows a three-stage pipeline.

A basic ARM pipeline has three main stages:

1. Fetch – The processor retrieves an instruction from memory.
2. Decode – The instruction is identified and prepared for execution.
3. Execute – The processor performs the operation and stores the result.

Each instruction moves through the pipeline one stage at a time, so a new instruction can enter
before the previous one has finished executing. Once the pipeline is full, the processor completes
one instruction per cycle.

Example of Instruction Execution in a Pipeline

Figure 2.8 illustrates the pipeline using a simple example.

Three Instructions in the Pipeline

Imagine we have three instructions:

1. ADD – Adds two numbers.
2. SUB – Subtracts one number from another.
3. CMP – Compares two numbers.

These instructions enter the three-stage pipeline step by step:

• Cycle 1: The ADD instruction is fetched from memory.
• Cycle 2: The SUB instruction is fetched, while the ADD instruction is decoded.
• Cycle 3: The CMP instruction is fetched, the SUB instruction is decoded, and the ADD
instruction is executed.

This process is called "filling the pipeline." Once filled, the processor can execute one
instruction per cycle, improving speed.
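A tiny C model reproduces the fill pattern above. It is only a sketch: it assumes an ideal three-stage pipeline with no stalls, branches, or memory waits. `pipeline_slot` and `print_pipeline` are hypothetical helper names.

```c
#include <stdio.h>

/* Stage indices for this hypothetical model, in pipeline order. */
enum { FETCH = 0, DECODE = 1, EXECUTE = 2, NUM_STAGES = 3 };

/* Index of the instruction in `stage` during `cycle` (cycles count from 1),
   or -1 if that stage is still empty. Each stage lags the one before it by a cycle. */
int pipeline_slot(int cycle, int stage, int num_instructions) {
    int index = cycle - 1 - stage;
    if (index < 0 || index >= num_instructions) return -1;
    return index;
}

/* Prints what every stage holds for cycles 1..cycles. */
void print_pipeline(const char *const program[], int n, int cycles) {
    static const char *const stage_names[NUM_STAGES] = {"fetch", "decode", "execute"};
    for (int cycle = 1; cycle <= cycles; cycle++) {
        printf("Cycle %d:", cycle);
        for (int s = 0; s < NUM_STAGES; s++) {
            int i = pipeline_slot(cycle, s, n);
            printf("  %s=%s", stage_names[s], i < 0 ? "-" : program[i]);
        }
        printf("\n");
    }
}
```

Call print_pipeline with the program {"ADD", "SUB", "CMP"}, n = 3 and cycles = 3. It prints one line per cycle, and on cycle 3 the fetch, decode and execute stages hold CMP, SUB and ADD. This matches the steps listed above.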

Pipeline Execution in ARM7

In an ARM7 pipeline, an instruction is not considered executed until it completes the execute
stage. This means that for a three-stage pipeline, an instruction is fully executed only when the
fourth instruction is fetched. Figure 2.11 shows an instruction sequence on an ARM7 pipeline.

Example of Execution

• The MSR instruction enables IRQ interrupts by clearing the I bit in the cpsr.
• The change takes effect only after MSR completes the execute stage.
• Once the ADD instruction in the sequence reaches the execute stage, IRQ interrupts are
enabled.

This shows that the pipeline determines when an instruction's effects become visible to the
rest of the system.

2.4 Exceptions, Interrupts, and the Vector Table

When an exception or interrupt occurs, the processor jumps to a specific memory address in
a special section called the vector table. This table contains instructions that direct the processor
to the correct routine for handling the event (see Table 2.6).

Vector Table Location

• The vector table normally starts at memory address 0x00000000.
• Some processors allow it to be placed at 0xffff0000 instead (the "high vectors" option).
• Operating systems such as Linux and Microsoft's embedded products take advantage of this
option.

Types of Exception Vectors

1. Reset Vector → Runs first when the processor starts, leading to initialization.
2. Undefined Instruction Vector → Activated when the processor encounters an unknown
instruction.
3. Software Interrupt (SWI) Vector → Triggered by a SWI instruction, often used to call
OS routines.
4. Prefetch Abort Vector → Occurs if an instruction is fetched from an unauthorized
address.
5. Data Abort Vector → Triggered when an instruction tries to access restricted data
memory.
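6. Interrupt Request (IRQ) Vector → Used by external hardware to interrupt the normal flow of
execution.
7. Fast Interrupt Request (FIQ) Vector → Used by hardware that requires a faster response than
IRQ.

Each vector contains a form of branch instruction that points to the start of the specific
handling routine.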
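Each vector is a single 4-byte instruction, so the address for each exception is the table base plus four times its position. The C sketch below assumes the standard ARM ordering, which runs from reset at offset 0x00 to FIQ at 0x1C with 0x14 reserved. It also assumes the optional high-vector base of 0xffff0000. The enum and function names are illustrative.

```c
#include <stdint.h>

/* Exception vectors in table order; each entry holds one 4-byte ARM instruction. */
typedef enum {
    VEC_RESET = 0,          /* 0x00 */
    VEC_UNDEFINED,          /* 0x04 */
    VEC_SWI,                /* 0x08 */
    VEC_PREFETCH_ABORT,     /* 0x0C */
    VEC_DATA_ABORT,         /* 0x10 */
    VEC_RESERVED,           /* 0x14, not used */
    VEC_IRQ,                /* 0x18 */
    VEC_FIQ                 /* 0x1C */
} arm_vector;

/* Address the core jumps to for exception v, using low (0x00000000)
   or high (0xffff0000) vectors. */
uint32_t vector_address(arm_vector v, int high_vectors) {
    uint32_t base = high_vectors ? 0xFFFF0000u : 0x00000000u;
    return base + 4u * (uint32_t)v;
}
```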
Core Extensions in ARM Processors

ARM processors include extra hardware features to boost performance, manage memory
efficiently, and add more functionality. These features vary across different ARM processor
families but generally include:

1. Cache and Tightly Coupled Memory (TCM)
2. Memory Management
3. Coprocessor Interface

Cache and Tightly Coupled Memory (TCM)

Cache is a small, high-speed memory located between the processor and main memory. It stores
frequently accessed data and instructions, allowing the processor to work faster by reducing
delays caused by slow external memory. Many ARM-based embedded systems use a single level of
cache inside the processor, though smaller systems may not need one. Figure 2.13 shows a
simplified memory system with a cache; for simplicity, the glue logic that connects the memory
system to the AMBA bus is labeled logic and control.

Types of Cache in ARM Processors

1. Von Neumann Architecture – Uses a single unified cache for both instructions and data.
2. Harvard Architecture – Has separate caches for instructions and data, improving efficiency.

While cache improves average speed, it does not guarantee predictable execution times, which
real-time systems require.

Tightly Coupled Memory (TCM)

For real-time applications, Tightly Coupled Memory (TCM) is used. TCM is fast SRAM
located close to the processor, ensuring fixed, predictable access times. Unlike cache, TCM
does not rely on automatic fetching; instead, it is directly accessed by the processor, appearing as
part of the memory system.

Memory Management

Embedded systems often use different types of memory. To keep everything organized and
protect the system from errors, memory management hardware is used. ARM processors have
three types of memory management:

1. No Memory Protection – Simple systems use fixed memory without security features.
This works for small devices that don’t need protection from faulty applications.
2. Memory Protection Unit (MPU) – The MPU divides memory into regions and assigns
specific access permissions to each. It is useful for systems that need some protection
but don’t have a complex memory structure.
3. Memory Management Unit (MMU) – The MMU provides full memory protection using
translation tables stored in memory. These tables map virtual addresses to physical addresses
and control access permissions, making MMUs ideal for advanced systems.
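The following simplified C sketch illustrates the idea of an MPU region check. The `mpu_region` structure and `mpu_check` function are hypothetical. Real MPUs encode regions in coprocessor registers with size and alignment restrictions and apply their own priority rules when regions overlap. None of that is modeled here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region descriptor: start address, size in bytes, permissions. */
typedef struct {
    uint32_t base;
    uint32_t size;
    bool readable;
    bool writable;
} mpu_region;

/* Returns true if the access is allowed; false models an abort.
   In this sketch the first region containing the address decides. */
bool mpu_check(const mpu_region *regions, size_t count,
               uint32_t addr, bool is_write) {
    for (size_t i = 0; i < count; i++) {
        const mpu_region *r = &regions[i];
        /* Unsigned wrap-around makes this true only when base <= addr < base + size. */
        if (addr - r->base < r->size) {
            return is_write ? r->writable : r->readable;
        }
    }
    return false;  /* address not covered by any region: treat as a fault */
}
```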

Coprocessors

Coprocessors can be added to an ARM processor to expand its capabilities. They do this by
adding new instructions or providing configuration registers. Multiple coprocessors can be
connected through the coprocessor interface.

Coprocessors are controlled using special ARM instructions that work like load and store
operations. For example, coprocessor 15 manages the cache, tightly coupled memory (TCM),
and memory management.

Some coprocessors add new instructions to the ARM instruction set. For example, special
instructions for vector floating-point (VFP) operations improve performance in mathematical
calculations. When the ARM processor decodes an instruction, it checks whether a coprocessor
should handle it. If the coprocessor is missing or doesn't recognize the instruction, the processor
takes an undefined instruction exception, which allows the coprocessor's behavior to be emulated
in software.

MODULE 2: Introduction to the ARM Instruction Set

2.1 Introduction to the ARM Instruction Set

The ARM (Advanced RISC Machine) architecture has emerged as one of the most widely used
processor architectures in modern computing. Designed with power efficiency and high
performance in mind, ARM processors power a vast array of devices, from embedded systems
and mobile phones to high-performance computing platforms. The success of ARM can be
attributed to its Reduced Instruction Set Computing (RISC) architecture, which enables
streamlined instruction execution, lower power consumption, and enhanced processing
efficiency compared to traditional Complex Instruction Set Computing (CISC) architectures.

This chapter explores the ARM instruction set, providing a foundational understanding of its
design principles, instruction types, and execution mechanisms. The ARM instruction set follows
a load-store architecture: data processing instructions operate only on values held in registers,
and only dedicated load and store instructions access memory. This separation simplifies
instruction decoding and helps keep the pipeline efficient.

Key features of the ARM instruction set include:

• Uniform and Fixed-Length Instructions – In ARM state every instruction is 32 bits long, which
keeps decoding simple. Many ARM cores also support the 16-bit Thumb instruction set, enabling
compact code for memory-constrained applications.
• Conditional Execution – Unlike most other architectures, ARM allows nearly every instruction
to be executed conditionally, reducing the need for branch instructions and improving
efficiency.
• Barrel Shifter Integration – The second operand of many arithmetic and logical instructions
can be shifted or rotated by the barrel shifter within the same instruction, allowing efficient
data manipulation.
• Multiple Addressing Modes – The architecture provides various addressing modes,
including register-based, immediate, and indexed addressing, offering flexibility in data
handling.
• Power Efficiency and Scalability – A simple, efficient pipeline keeps power consumption low,
making ARM processors well suited to battery-powered devices.

Throughout this chapter, we will delve into the structure of ARM instructions, classify them into
categories such as data processing, memory access, and control flow instructions, and
analyze their operational significance in real-world applications. By the end of this chapter,
readers will have a solid grasp of the ARM instruction set, enabling them to write efficient
assembly programs and understand the underlying mechanisms of ARM-based computing
platforms.

2.2 Data Processing Instructions

2.2.1 Move Instructions

Move instructions in the ARM architecture provide a fundamental way to transfer data between
registers or load immediate values into registers. These instructions are essential for initializing
variables, handling constants, and facilitating data movement within the processor.

Syntax:
<instruction>{<cond>}{S} Rd, N

Where:

• <instruction> specifies the operation (e.g., MOV or MVN).
• <cond> is an optional condition for conditional execution.
• {S} is an optional suffix that updates the condition flags.
• Rd is the destination register where the result is stored.
• N can be a register or an immediate value.

Types of Move Instructions

• MOV (Move): Transfers a 32-bit value (either from a register or an immediate constant) into the
destination register.
o Rd = N
• MVN (Move Not): Loads the bitwise complement (NOT operation) of the given 32-bit value into
the destination register.
o Rd = ~N

The values allowed for operand N vary depending on the instruction. Typically, N can be a
register (Rm) or an immediate constant prefixed with #.
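Not every 32-bit value can be used as an immediate operand. An ARM data processing instruction encodes the immediate as an 8-bit constant rotated right by an even number of bit positions (0, 2, ..., 30). The C sketch below checks whether a constant fits that scheme. `arm_imm_encodable` is an illustrative name, not an assembler API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (n taken modulo 32). */
static uint32_t rotl32(uint32_t x, unsigned n) {
    n &= 31u;
    return n == 0 ? x : (x << n) | (x >> (32u - n));
}

/* True if value == imm8 ROR (2 * rot) for some imm8 in 0..255 and rot in 0..15.
   Undoing a right rotation means rotating left by the same amount. */
bool arm_imm_encodable(uint32_t value) {
    for (unsigned rot = 0; rot < 16; rot++) {
        if (rotl32(value, 2u * rot) <= 0xFFu) return true;
    }
    return false;
}
```

For instance, 0xFF, 0x104 and 0xF000000F are encodable. 0x101 is not, because its set bits span nine positions and cannot fit in an 8-bit field. A constant that does not fit must be built from several instructions or loaded from memory.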

Example: Move Instruction Usage

The following example demonstrates a basic move operation where the value from register r5 is
copied into register r7.

Pre-condition:
r5 = 5
r7 = 8

Instruction:
MOV r7, r5 ; Copy the value of r5 into r7

Post-condition:
r5 = 5
r7 = 5

This operation overwrites the previous value of r7 with the value from r5. Move instructions are
widely used in arithmetic operations, control logic, and register manipulations in ARM-based
systems.

2.2.2 Barrel Shifter

In the previous example, we used the MOV instruction where N was a simple register. However,
N can also be a register (Rm) that undergoes preprocessing using the barrel shifter before being
used in a data processing instruction. The ARM processor has a unique feature that allows
shifting the 32-bit binary value in a register left or right by a specific number of positions
before it is processed by the arithmetic logic unit (ALU). This shifting mechanism enhances the
efficiency and flexibility of many operations. Not all data processing instructions use the barrel
shifter. Some examples that do not involve shifting include:

• MUL (Multiply) – Used for multiplication operations.
• CLZ (Count Leading Zeros) – Determines the number of leading zero bits in a register.
• QADD (Signed Saturated Addition) – Performs 32-bit signed addition with saturation.

The shift operation happens within the same cycle as the instruction, making it highly efficient.
This feature is particularly useful when loading constants into registers or performing fast
multiplication or division by powers of 2.

Figure 2.1 Barrel shifter and ALU.

Figure 2.1 illustrates how data moves between the barrel shifter and the ALU. Register Rn enters
the ALU directly without any preprocessing, while the second operand, register Rm, can pass
through the barrel shifter before entering the ALU. To see the barrel shifter in action, let us
modify the previous example by adding a shift operation.

Example 3.2

We apply a logical shift left (LSL) to register Rm before transferring the result to the destination
register. This is similar to using the shift operator (<<) in C programming. The MOV instruction
then stores the shifted value in Rd.

Pre-condition:

r4 = 3
r6 = 10

Instruction:

MOV r6, r4, LSL #3 ; let r6 = r4 * 8 = (r4 << 3)

Post-condition:

r4 = 3
r6 = 24

This operation shifts the value in register r4 left by three positions (multiplying by 8) and stores
the result in r6.

The five different types of shift operations available in the barrel shifter are summarized in Table
3.2.
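As a complement to Table 3.2, the following C sketch models the five barrel shifter operations (LSL, LSR, ASR, ROR and RRX) on 32-bit values. It assumes shift amounts of 1 to 31; 0 is also safe for LSL, LSR and ASR. It ignores the carry-out that the real shifter can pass to the flags. The function names are illustrative.

```c
#include <stdint.h>

/* Logical shift left: vacated low bits become 0 (x << n in C). */
uint32_t shift_lsl(uint32_t x, unsigned n) { return x << n; }

/* Logical shift right: vacated high bits become 0. */
uint32_t shift_lsr(uint32_t x, unsigned n) { return x >> n; }

/* Arithmetic shift right: vacated high bits copy the sign bit. */
uint32_t shift_asr(uint32_t x, unsigned n) {
    uint32_t r = x >> n;
    if (x & 0x80000000u)              /* negative value: fill with 1s */
        r |= ~(0xFFFFFFFFu >> n);
    return r;
}

/* Rotate right: bits shifted out at the bottom re-enter at the top (n must be 1..31). */
uint32_t shift_ror(uint32_t x, unsigned n) { return (x >> n) | (x << (32u - n)); }

/* Rotate right extended: a 33-bit rotate by one, with the carry flag entering bit 31. */
uint32_t shift_rrx(uint32_t x, unsigned carry_in) {
    return ((uint32_t)(carry_in & 1u) << 31) | (x >> 1);
}
```

shift_lsl(3, 3) returns 24, which matches the MOV r6, r4, LSL #3 example above.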

2.2.3 Arithmetic Instructions

Arithmetic Instructions in ARM Controller (Simple Explanation)


Table of Arithmetic Instructions

Instruction  Operation                    Explanation                                               Example Syntax
ADC          Add with Carry               Adds two 32-bit values and includes the carry flag.       ADC R0, R1, R2 → R0 = R1 + R2 + Carry
ADD          Addition                     Adds two 32-bit values.                                   ADD R0, R1, R2 → R0 = R1 + R2
RSB          Reverse Subtract             Subtracts the first value from the second.                RSB R0, R1, R2 → R0 = R2 - R1
RSC          Reverse Subtract with Carry  Subtracts the first value from the second with carry.     RSC R0, R1, R2 → R0 = R2 - R1 - !(Carry)
SBC          Subtract with Carry          Subtracts two values and includes the carry flag.         SBC R0, R1, R2 → R0 = R1 - R2 - !(Carry)
SUB          Subtraction                  Subtracts two 32-bit values.                              SUB R0, R1, R2 → R0 = R1 - R2

Arithmetic Instructions in ARM Controller

1. ADC (Add with Carry)

This instruction adds two 32-bit values and includes the carry flag in the addition.

PRE:

• r0 = 0x00000000
• r1 = 0x00000002
• r2 = 0x00000003
• Carry = 1

Instruction:
ADC r0, r1, r2

POST:

• r0 = 0x00000006 (2 + 3 + 1)

2. ADD (Addition)

This instruction adds two 32-bit values and stores the result in the destination register.

PRE:

• r0 = 0x00000000
• r1 = 0x00000004
• r2 = 0x00000002

Instruction:
ADD r0, r1, r2

POST:

• r0 = 0x00000006 (4 + 2)

3. RSB (Reverse Subtract)

This instruction subtracts the first value from the second value and stores the result.

PRE:

• r0 = 0x00000000
• r1 = 0x00000005
• r2 = 0x0000000A

Instruction:
RSB r0, r1, r2

POST:

• r0 = 0x00000005 (10 - 5)

4. RSC (Reverse Subtract with Carry)

This instruction subtracts the first value from the second value and includes the inverted carry
flag.

PRE:

• r0 = 0x00000000
• r1 = 0x00000006
• r2 = 0x0000000C
• Carry = 1

Instruction:
RSC r0, r1, r2

POST:

• r0 = 0x00000006 (12 - 6 - !(1) = 12 - 6 - 0)

5. SBC (Subtract with Carry)

This instruction subtracts two values and includes the inverted carry flag.

PRE:

• r0 = 0x00000000
• r1 = 0x00000009
• r2 = 0x00000004
• Carry = 1

Instruction:
SBC r0, r1, r2

POST:

• r0 = 0x00000005 (9 - 4 - !(1) = 9 - 4 - 0)

6. SUB (Subtraction)

This instruction subtracts the second value from the first value.

PRE:

• r0 = 0x00000000
• r1 = 0x00000007
• r2 = 0x00000003

Instruction:
SUB r0, r1, r2

POST:

• r0 = 0x00000004 (7 - 3)
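A short C model can check the carry rules in the examples above. In this sketch `a` is the first operand (Rn), `b` is the second operand, and `c` is the carry flag (0 or 1); the function names are illustrative. The model also shows a classic use of ADC: adding 64-bit numbers held as two 32-bit halves. An ADDS on the low words produces the carry, and an ADC on the high words consumes it.

```c
#include <stdint.h>

/* For subtraction ARM uses an inverted borrow: C = 1 means "no borrow",
   so SBC and RSC subtract (1 - c). All arithmetic wraps modulo 2^32. */
uint32_t arm_add(uint32_t a, uint32_t b)             { return a + b; }
uint32_t arm_adc(uint32_t a, uint32_t b, unsigned c) { return a + b + c; }
uint32_t arm_sub(uint32_t a, uint32_t b)             { return a - b; }
uint32_t arm_sbc(uint32_t a, uint32_t b, unsigned c) { return a - b - (1u - c); }
uint32_t arm_rsb(uint32_t a, uint32_t b)             { return b - a; }
uint32_t arm_rsc(uint32_t a, uint32_t b, unsigned c) { return b - a - (1u - c); }

/* 64-bit addition from 32-bit halves, the way ADDS (low words) then ADC (high words) does it. */
void add64(uint32_t a_lo, uint32_t a_hi, uint32_t b_lo, uint32_t b_hi,
           uint32_t *r_lo, uint32_t *r_hi) {
    uint32_t lo = a_lo + b_lo;
    unsigned carry = lo < a_lo;          /* carry out of bit 31 of the low-word add */
    *r_lo = lo;
    *r_hi = arm_adc(a_hi, b_hi, carry);
}
```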
Logical Instructions in ARM Controller (Simple Explanation)

Logical instructions perform bitwise operations on two registers and store the result in a
destination register.

1. AND (Bitwise AND)

This instruction performs a bitwise AND operation between two values.

PRE:

• r0 = 0x00000000
• r1 = 0x0000000F (0000 1111 in binary)
• r2 = 0x00000006 (0000 0110 in binary)

Instruction:
AND r0, r1, r2

POST:

• r0 = 0x00000006 (0000 1111 & 0000 0110 = 0000 0110)

Explanation:

• The AND operation keeps only the bits that are 1 in both numbers.

2. ORR (Bitwise OR)

This instruction performs a bitwise OR operation between two values.

PRE:

• r0 = 0x00000000
• r1 = 0x0000000F (0000 1111 in binary)
• r2 = 0x00000006 (0000 0110 in binary)

Instruction:
ORR r0, r1, r2

POST:

• r0 = 0x0000000F (0000 1111 | 0000 0110 = 0000 1111)

Explanation:

• The OR operation sets a bit to 1 if either of the values has a 1 in that position.

3. EOR (Bitwise Exclusive OR - XOR)

This instruction performs a bitwise XOR operation between two values.

PRE:

• r0 = 0x00000000
• r1 = 0x0000000F (0000 1111 in binary)
• r2 = 0x00000006 (0000 0110 in binary)

Instruction:
EOR r0, r1, r2

POST:

• r0 = 0x00000009 (0000 1111 ⊕ 0000 0110 = 0000 1001)

Explanation:

• XOR sets a bit to 1 only if the bits in the two values are different.

4. BIC (Bit Clear - AND NOT)

This instruction clears specific bits using the AND NOT operation.

PRE:

• r0 = 0x00000000
• r1 = 0x0000000F (0000 1111 in binary)
• r2 = 0x00000006 (0000 0110 in binary)

Instruction:
BIC r0, r1, r2

POST:

• r0 = 0x00000009 (0000 1111 & ~(0000 0110) = 0000 1001)

Explanation:

• The BIC instruction clears (sets to 0) the bits that are 1 in the second number.

Summary Table

Instruction  Operation        Example Calculation        Result
AND          Bitwise AND      0000 1111 & 0000 0110      0000 0110 (6)
ORR          Bitwise OR       0000 1111 | 0000 0110      0000 1111 (15)
EOR          Bitwise XOR      0000 1111 ⊕ 0000 0110      0000 1001 (9)
BIC          Bitwise AND NOT  0000 1111 & ~(0000 0110)   0000 1001 (9)
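The four logical operations map directly onto C's bitwise operators. This sketch reproduces the summary table, and its function names are illustrative. BIC is simply AND with the complement of the second operand.

```c
#include <stdint.h>

uint32_t arm_and(uint32_t a, uint32_t b) { return a & b; }   /* bits set in both */
uint32_t arm_orr(uint32_t a, uint32_t b) { return a | b; }   /* bits set in either */
uint32_t arm_eor(uint32_t a, uint32_t b) { return a ^ b; }   /* bits that differ */
uint32_t arm_bic(uint32_t a, uint32_t b) { return a & ~b; }  /* clear bits of a that are set in b */
```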

Comparison Instructions in ARM Controller

Comparison instructions compare or test a register with a 32-bit value but do not store the
result. Instead, they update the condition flags in the CPSR (Current Program Status Register).
These flags are then used to control program flow (e.g., deciding whether to jump to another
part of the program).

1. CMN (Compare Negated - Adds Two Values and Sets Flags)

This instruction adds a register value to another number and updates the flags.

PRE:

• r1 = 0x00000003 (3 in decimal)
• r2 = 0x00000005 (5 in decimal)

Instruction:
CMN r1, r2

POST (Flags Updated):

• Z (Zero Flag) = 0 (the result, 8, is not zero)
• N (Negative Flag) = 0 (bit 31 of the result is clear)
• C (Carry Flag) = 0 (no carry out of bit 31)
• V (Overflow Flag) = 0 (no signed overflow)

Explanation:

• This computes r1 + r2 = 8 without storing the result; only the flags are updated.

2. CMP (Compare - Subtracts Two Values and Sets Flags)

This instruction subtracts one value from another and updates the flags.

PRE:

• r1 = 0x00000007 (7 in decimal)
• r2 = 0x00000007 (7 in decimal)

Instruction:
CMP r1, r2

POST (Flags Updated):

• Z (Zero Flag) = 1 (Result is zero)
• N (Negative Flag) = 0 (No negative result)
• C (Carry Flag) = 1 (No borrow needed)

Explanation:

• Since 7 - 7 = 0, the Zero Flag (Z) is set, indicating equality.

3. TEQ (Test for Equality - Bitwise XOR and Sets Flags)

This instruction performs a bitwise XOR and sets flags.

PRE:

• r1 = 0x0000000F (0000 1111 in binary)
• r2 = 0x0000000F (0000 1111 in binary)

Instruction:
TEQ r1, r2

POST (Flags Updated):

• Z (Zero Flag) = 1 (Result is zero)
• N (Negative Flag) = 0

Explanation:

• XOR (Exclusive OR) of two equal values gives 0, so the Zero Flag (Z) is set.

4. TST (Test Bits - Bitwise AND and Sets Flags)

This instruction performs a bitwise AND and sets flags.

PRE:

• r1 = 0x0000000F (0000 1111 in binary)
• r2 = 0x00000003 (0000 0011 in binary)

Instruction:
TST r1, r2

POST (Flags Updated):

• Z (Zero Flag) = 0 (Result is not zero)
• N (Negative Flag) = 0

Explanation:

• Bitwise AND keeps common bits:
o 0000 1111 & 0000 0011 = 0000 0011 (Not zero, so Z = 0)

Summary Table

Instruction  Operation          Example Calculation  Flags Updated
CMN          Compare Negated    r1 + r2              Updates flags based on sum
CMP          Compare            r1 - r2              Updates flags based on difference
TEQ          Test for Equality  r1 ⊕ r2              Updates flags based on XOR
TST          Test Bits          r1 & r2              Updates flags based on AND
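The C sketch below shows exactly how the flags in these examples are derived. It computes N, Z, C and V the way CMP (Rn - N) and CMN (Rn + N) set them. The `nzcv` structure and the function names are illustrative. For subtraction, ARM sets C to 1 when no borrow occurs.

```c
#include <stdint.h>

typedef struct { unsigned n, z, c, v; } nzcv;

/* Flags as CMP would set them: computes a - b and discards the result. */
nzcv flags_cmp(uint32_t a, uint32_t b) {
    uint32_t r = a - b;
    nzcv f;
    f.n = r >> 31;                        /* sign bit of the result */
    f.z = (r == 0u);
    f.c = (a >= b);                       /* ARM carry = NOT borrow */
    f.v = ((a ^ b) & (a ^ r)) >> 31;      /* operands differ in sign and the result's sign flipped */
    return f;
}

/* Flags as CMN would set them: computes a + b and discards the result. */
nzcv flags_cmn(uint32_t a, uint32_t b) {
    uint32_t r = a + b;
    nzcv f;
    f.n = r >> 31;
    f.z = (r == 0u);
    f.c = (r < a);                        /* unsigned carry out of bit 31 */
    f.v = (~(a ^ b) & (a ^ r)) >> 31;     /* same-sign operands, different-sign result */
    return f;
}
```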

Multiply Instructions in ARM Controller

The multiply instructions in the ARM architecture perform multiplication operations on
registers. These instructions can produce 32-bit or 64-bit results, depending on whether they use
standard multiplication or long multiplication. Some instructions also support accumulation,
which adds an extra value to the multiplication result.

Table: Multiply Instructions Summary

Instruction  Description                                         Formula                                  Registers Used
MUL          Multiply two 32-bit values                          Rd = Rm × Rs                             1 result register
MLA          Multiply two 32-bit values and add another value    Rd = (Rm × Rs) + Rn                      1 result register
UMULL        Unsigned multiply long (64-bit result)              [RdHi, RdLo] = Rm × Rs                   2 result registers
UMLAL        Unsigned multiply and accumulate long               [RdHi, RdLo] = (Rm × Rs) + [RdHi, RdLo]  2 result registers
SMULL        Signed multiply long (64-bit result)                [RdHi, RdLo] = Rm × Rs                   2 result registers
SMLAL        Signed multiply and accumulate long                 [RdHi, RdLo] = (Rm × Rs) + [RdHi, RdLo]  2 result registers

1. MUL (Multiply)

The MUL instruction multiplies two 32-bit registers and stores the result in a destination
register.

Example:
Multiply r1 and r2, store the result in r0.
Pre-condition:
r1 = 3
r2 = 4
r0 = 0

Instruction:
MUL r0, r1, r2

Post-condition:
r0 = 12 (3 × 4 = 12)

2. MLA (Multiply and Add)

The MLA instruction multiplies two 32-bit values and adds a third value before storing the
result in a destination register.
Example:
Multiply r1 and r2, then add r3, and store the result in r0.
Pre-condition:
r1 = 3
r2 = 4
r3 = 2
r0 = 0
Instruction:
MLA r0, r1, r2, r3
Post-condition:
r0 = 14 (3 × 4 + 2 = 14)

3. UMULL (Unsigned Multiply Long)

The UMULL instruction performs unsigned 32-bit multiplication but stores the full 64-bit
result in two registers.

4. UMLAL (Unsigned Multiply and Accumulate Long)

The UMLAL instruction performs unsigned multiplication and adds an existing 64-bit value
before storing the result in two registers.

5. SMULL (Signed Multiply Long)

The SMULL instruction performs signed 32-bit multiplication and stores the full 64-bit result
in two registers. This is useful when dealing with negative numbers.

6. SMLAL (Signed Multiply and Accumulate Long)

The SMLAL instruction performs signed multiplication and adds the result to an existing 64-
bit value before storing the final result in two registers.

• SMULL and SMLAL are particularly useful when handling signed numbers.
• MLA, UMLAL, and SMLAL are ideal for cases where an additional value needs to be accumulated
into the multiplication result.

These instructions are widely used in mathematical computations, digital signal processing,
and high-performance applications.
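The long multiply instructions above have no worked examples, so this C sketch models their results. `reg_pair` stands in for the [RdHi, RdLo] register pair and is a hypothetical helper type. The functions model only the arithmetic, not flags or register-usage restrictions.

```c
#include <stdint.h>

typedef struct { uint32_t lo, hi; } reg_pair;  /* models [RdHi, RdLo] */

/* Splits a 64-bit bit pattern into its low and high words. */
static reg_pair split64(uint64_t x) {
    reg_pair r = { (uint32_t)x, (uint32_t)(x >> 32) };
    return r;
}

static uint64_t join64(reg_pair p) { return ((uint64_t)p.hi << 32) | p.lo; }

reg_pair umull(uint32_t rm, uint32_t rs) { return split64((uint64_t)rm * rs); }

reg_pair umlal(reg_pair acc, uint32_t rm, uint32_t rs) {
    return split64(join64(acc) + (uint64_t)rm * rs);
}

/* Signed versions: the 64-bit two's-complement product is stored as two words. */
reg_pair smull(int32_t rm, int32_t rs) {
    return split64((uint64_t)((int64_t)rm * rs));
}

reg_pair smlal(reg_pair acc, int32_t rm, int32_t rs) {
    /* Modular 64-bit addition gives the same bits as signed addition. */
    return split64(join64(acc) + (uint64_t)((int64_t)rm * rs));
}
```

smull(-2, 3) returns hi = 0xFFFFFFFF and lo = 0xFFFFFFFA, which is -6 as a 64-bit value. An unsigned multiply of the same bit patterns would give a different high word. That difference is why the signed and unsigned long forms are separate instructions.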
2.2.4 Branch Instructions

Branch instructions change the flow of a program or call a subroutine. They are essential for
creating loops, conditional statements (if-then-else), and function calls. When a branch
instruction is executed, it updates the program counter (pc) to a new address, redirecting the
program’s execution.

The ARMv5E instruction set includes four types of branch instructions:

• B (Branch): Jumps to a new address.
• BL (Branch with Link): Jumps to a new address and saves the return address in the link register
(lr).
• BX (Branch Exchange): Jumps to an address stored in a register and can switch between ARM and
Thumb state.
• BLX (Branch Exchange with Link): Like BX, but also saves the return address in lr, so it can be
used for function calls that may change state.

Syntax:

• B{condition} label → Jump to the given label.
• BL{condition} label → Jump to the label and store the return address in lr.
• BX{condition} Rm → Jump to the address in register Rm, switching between ARM and Thumb
state if needed.
• BLX{condition} label | Rm → Jump to a label or to the address in Rm, storing the return address
in lr.

Explanation:

• The label is encoded in the instruction as a signed offset relative to the pc, so the target must
be within approximately ±32 MB of the branch instruction.
• The T bit in the cpsr (Current Program Status Register) records whether the processor is in
ARM or Thumb state. BX and BLX with a register operand set T from bit 0 of Rm:
1) If T = 1, the processor switches to Thumb state.
2) If T = 0, it remains in ARM state.

This lets a program mix 32-bit ARM code and 16-bit Thumb code.
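The ±32 MB limit comes from the encoding. In ARM state, B and BL store a 24-bit signed word offset, which the processor shifts left by 2 and adds to the pc. Because of the three-stage pipeline, the pc reads as the branch instruction's address plus 8. The C sketch below computes the target from the offset field and checks whether a target is reachable; the function names are illustrative. It ignores address wrap-around at the ends of the 4 GB space.

```c
#include <stdint.h>

/* Target of an ARM-state B/BL located at instr_addr whose 24-bit offset field is imm24. */
uint32_t branch_target(uint32_t instr_addr, uint32_t imm24) {
    uint32_t offset = imm24 & 0x00FFFFFFu;
    if (offset & 0x00800000u)            /* negative offset: sign-extend to 32 bits */
        offset |= 0xFF000000u;
    /* pc reads as instr_addr + 8; the offset counts words, so scale it by 4. */
    return instr_addr + 8u + (offset << 2);
}

/* Returns 1 and stores the offset field in *imm24 if target is reachable, else 0. */
int encode_branch(uint32_t instr_addr, uint32_t target, uint32_t *imm24) {
    int64_t delta = (int64_t)target - (int64_t)instr_addr - 8;
    if (delta % 4 != 0) return 0;                               /* must be word aligned */
    int64_t words = delta / 4;
    if (words < -(1 << 23) || words >= (1 << 23)) return 0;     /* about ±32 MB */
    *imm24 = (uint32_t)words & 0x00FFFFFFu;
    return 1;
}
```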

Example of Branch Instruction

B LOOP ; Jump to LOOP

LOOP:
MOV R0, #5 ; R0 = 5

Explanation:

• B LOOP → Jumps to the LOOP label.
• MOV R0, #5 → Assigns 5 to R0 when execution reaches LOOP.

Key Concept: The B (Branch) instruction changes the program’s execution flow by jumping to
a new location.

Example of BL (Branch with Link) Instruction

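BL FUNCTION ; Call FUNCTION and save return address in LR
ADD R1, R0, #1 ; Execution resumes here after the return (R1 = 11)
B DONE ; Skip over the function body so it is not entered again

FUNCTION:
MOV R0, #10 ; R0 = 10
BX LR ; Return to the caller

DONE:

Explanation:

• BL FUNCTION → Jumps to FUNCTION and stores the return address (the address of the ADD
instruction) in the link register (LR).
• MOV R0, #10 → Assigns 10 to R0 inside the function.
• BX LR → Returns to the instruction after BL FUNCTION using the address stored in LR.
• ADD R1, R0, #1 and B DONE → Run after the return; the branch to DONE prevents execution
from falling into FUNCTION a second time.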

The BL (Branch with Link) instruction is used for function calls, allowing the program to
return after execution.

3.3 Load-Store Instructions

Load-store instructions move data between memory and processor registers. These instructions
help the processor read from and write to memory. There are three types:

1. Single-register transfer – Moves one data item.
2. Multiple-register transfer – Moves multiple registers at once.
3. Swap – Exchanges data between a register and memory.

3.3.1 Single-Register Transfer

These instructions transfer a single data item between memory and a register. The data can be
32-bit words, 16-bit halfwords, or 8-bit bytes.

Common Single-Register Transfer Instructions:

Instruction  Operation                          Description
LDR          Rd ← mem[address]                  Load a word (32-bit) into a register.
STR          mem[address] ← Rd                  Store a word (32-bit) from a register to memory.
LDRB         Rd ← mem8[address]                 Load a byte (8-bit) into a register.
STRB         mem8[address] ← Rd                 Store a byte (8-bit) from a register to memory.
LDRH         Rd ← mem16[address]                Load a halfword (16-bit) into a register.
STRH         mem16[address] ← Rd                Store a halfword (16-bit) from a register to memory.
LDRSB        Rd ← SignExtend(mem8[address])     Load a signed byte (8-bit) and sign-extend it to 32 bits.
LDRSH        Rd ← SignExtend(mem16[address])    Load a signed halfword (16-bit) and sign-extend it to 32 bits.

Load-Store Instruction Example

➡ Load Data from Memory into a Register

• Fetches the value stored at the memory address in r3 and places it into r2.

LDR r2, [r3] ; Load r2 with the value from memory at address in r3

➡ Store Data from a Register into Memory

• Saves the value from r2 into the memory location whose address is in r3.

STR r2, [r3] ; Store the value of r2 into memory at address in r3

Explanation:

• LDR (Load Register) → Moves data from memory to a register.
• STR (Store Register) → Moves data from a register to memory.

3.3.2 Single-Register Load-Store Addressing Modes

ARM supports different ways to access memory, known as addressing modes. These determine
how the memory address is calculated during load-store operations.

1. Preindex with writeback – The address is calculated as base + offset, and the base register is
updated to that address.

• Example: LDR r0, [r1, #4]! (Loads data from r1 + 4 and updates r1 to r1 + 4.)

2. Preindex – The address is base + offset, but the base register remains unchanged.

• Example: LDR r0, [r1, #4] (Loads data from r1 + 4; r1 is not updated.)

3. Postindex – The address used is the base itself; afterwards the base register is updated to
base + offset.

• Example: LDR r0, [r1], #4 (Loads data from r1, then updates r1 to r1 + 4.)

These methods allow efficient memory access and control in ARM processors.
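The three indexing modes differ only in which address is used and whether the base register is written back. This C sketch models that difference for an immediate offset. `addr_mode` and `effective_address` are illustrative names. The base register is passed by pointer so that writeback is visible to the caller.

```c
#include <stdint.h>

typedef enum { PREINDEX, PREINDEX_WRITEBACK, POSTINDEX } addr_mode;

/* Returns the address the load uses and updates *base as the mode requires. */
uint32_t effective_address(addr_mode mode, uint32_t *base, int32_t offset) {
    uint32_t old_base = *base;
    uint32_t with_offset = old_base + (uint32_t)offset;  /* wraps like the 32-bit adder */
    switch (mode) {
    case PREINDEX:                       /* [r1, #off]   : base unchanged */
        return with_offset;
    case PREINDEX_WRITEBACK:             /* [r1, #off]!  : base = base + off */
        *base = with_offset;
        return with_offset;
    case POSTINDEX:                      /* [r1], #off   : use old base, then update */
        *base = with_offset;
        return old_base;
    }
    return old_base;                     /* not reached for valid modes */
}
```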
Examples of LDR Instructions with Different Addressing Modes

1. Preindex with writeback – The address is calculated before loading, and the base register is
updated.

• LDR r0, [r1, #0x4]! → Loads r0 from mem[r1 + 4] and updates r1 = r1 + 4.
• LDR r0, [r1, r2]! → Loads r0 from mem[r1 + r2] and updates r1 = r1 + r2.
• LDR r0, [r1, r2, LSR #0x4]! → Loads r0 from mem[r1 + (r2 >> 4)] and updates r1
= r1 + (r2 >> 4).

2. Preindex – The address is calculated before loading, but the base register is not updated.

• LDR r0, [r1, #0x4] → Loads r0 from mem[r1 + 4]; r1 remains unchanged.
• LDR r0, [r1, r2] → Loads r0 from mem[r1 + r2]; r1 remains unchanged.
• LDR r0, [r1, -r2, LSR #0x4] → Loads r0 from mem[r1 - (r2 >> 4)]; r1 remains
unchanged.

3. Postindex – The base address is used first, and then the base register is updated.

• LDR r0, [r1], #0x4 → Loads r0 from mem[r1], then updates r1 = r1 + 4.
• LDR r0, [r1], r2 → Loads r0 from mem[r1], then updates r1 = r1 + r2.
• LDR r0, [r1], r2, LSR #0x4 → Loads r0 from mem[r1], then updates r1 = r1 + (r2 >> 4).

3.3.3 Multiple-Register Transfer

ARM provides load-store multiple instructions (LDM and STM) that move several registers
between memory and the processor in a single instruction. These instructions are useful for
efficiently transferring blocks of data, saving and restoring processor state, and managing
stacks, which makes them widely used in function calls.

Stack Addressing Modes in ARM

In ARM, stack operations use Load Multiple (LDM) for popping values from the stack and
Store Multiple (STM) for pushing values onto the stack. The stack can grow up (ascending) or
down (descending) and can be full or empty, resulting in different addressing modes.

1. Full Ascending (FA)

• Stack grows upward (toward higher memory addresses).
• SP points to the last used location (full stack).

Example:

STMIB sp!, {r0, r1} ; Push r0 and r1 onto the stack (increment before storing)
LDMDA sp!, {r0, r1} ; Pop r0 and r1 from the stack (decrement after loading)

2. Full Descending (FD) (Most common in ARM systems)

• Stack grows downward (toward lower memory addresses).
• SP points to the last used location (full stack).

Example:

STMDB sp!, {r0, r1} ; Push r0 and r1 onto the stack (decrement before storing)
LDMIA sp!, {r0, r1} ; Pop r0 and r1 from the stack (increment after loading)

3. Empty Ascending (EA)

• Stack grows upward (toward higher memory addresses).
• SP points to the first unused location (empty stack).

Example:

STMIA sp!, {r0, r1} ; Push r0 and r1 onto the stack (increment after storing)
LDMDB sp!, {r0, r1} ; Pop r0 and r1 from the stack (decrement before loading)

4. Empty Descending (ED)

• Stack grows downward (toward lower memory addresses).
• SP points to the first unused location (empty stack).

Example:

STMDA sp!, {r0, r1} ; Push r0 and r1 onto the stack (decrement after storing)
LDMIB sp!, {r0, r1} ; Pop r0 and r1 from the stack (increment before loading)
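The full descending push and pop above can be modeled in C to show how sp moves. The sketch uses a word-addressed array as memory. Like STMDB and LDMIA, it stores the lowest-numbered register at the lowest address. `fd_stack`, `fd_push` and `fd_pop` are hypothetical names, and the sketch performs no overflow or underflow checks.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS 64

typedef struct {
    uint32_t mem[STACK_WORDS];  /* word-addressed memory for this sketch */
    size_t sp;                  /* index of the last used word (full stack) */
} fd_stack;

/* An empty full descending stack: sp sits just past the highest word. */
void fd_init(fd_stack *s) { s->sp = STACK_WORDS; }

/* Like STMDB sp!, {regs}: decrement sp first, lowest register at lowest address. */
void fd_push(fd_stack *s, const uint32_t *regs, size_t n) {
    s->sp -= n;
    for (size_t i = 0; i < n; i++) s->mem[s->sp + i] = regs[i];
}

/* Like LDMIA sp!, {regs}: load upward from sp, then increment sp. */
void fd_pop(fd_stack *s, uint32_t *regs, size_t n) {
    for (size_t i = 0; i < n; i++) regs[i] = s->mem[s->sp + i];
    s->sp += n;
}
```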
3.3.4 Swap Instruction (SWP)

The SWP (Swap) instruction is a special type of load-store operation that exchanges data
between a memory location and a register in a single atomic operation. This means no other
instruction or process can access that memory location while the swap is taking place, ensuring
data consistency.

Table: SWP vs. SWPB (Word vs. Byte Swap)

Instruction  Operation                                            Example
SWP          Swaps a 32-bit word between memory and a register    SWP r0, r1, [r2]
SWPB         Swaps a single byte between memory and a register    SWPB r0, r1, [r2]

Syntax:
SWP{B}{<cond>} Rd, Rm, [Rn]

Where:

• SWP → Swap a 32-bit word (4 bytes).
• SWPB → Swap a single byte (8-bit).
• Rd → Destination register (gets the original memory value).
• Rm → Register containing the new value to store in memory.
• Rn → Register holding the address of the memory location involved in the swap.

How It Works:

1. Loads the memory value at address [Rn] into a temporary register.
2. Stores the value from Rm into memory at [Rn].
3. Moves the temporary value into Rd.

Example: (Swapping a word between memory and a register)

SWP r0, r1, [r2] ; r0 = old word at address r2, then that word = r1

Explanation:

• Suppose:
1. r1 = 0x12345678 (New value to store in memory)
2. Memory at [r2] = 0xABCDEF00
• After execution:
1. r0 = 0xABCDEF00 (Old memory value loaded into r0).
2. Memory at [r2] = 0x12345678 (New value from r1 stored in memory).

Example: (Swapping a byte instead of a full word using SWPB)

SWPB r0, r1, [r2] ; Swap a byte between memory and a register

• Suppose:
1. r1 = 0xFF (New byte to store in memory).
2. Memory at [r2] = 0xAB
• After execution:
1. r0 = 0xAB (Old byte loaded into r0).
2. Memory at [r2] = 0xFF (New byte stored).
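In C, the closest portable equivalent of SWP is C11's `atomic_exchange`. It stores a new value and returns the old one as a single atomic operation. The sketch below uses it to mimic SWP and to build a simple spinlock, a classic use of the swap instruction. `swap_word`, `spin_lock` and `spin_unlock` are illustrative names. SWP and SWPB are deprecated from ARMv6 onward in favor of LDREX/STREX, so a compiler will typically not emit SWP for this code on newer cores.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Like SWP Rd, Rm, [Rn]: returns the old memory value and stores new_value, atomically. */
uint32_t swap_word(_Atomic uint32_t *mem, uint32_t new_value) {
    return atomic_exchange(mem, new_value);
}

/* Spinlock: 0 = free, 1 = taken. Swapping in 1 and getting 0 back means we acquired it. */
void spin_lock(_Atomic uint32_t *lock) {
    while (swap_word(lock, 1u) != 0u) {
        /* another thread holds the lock: keep retrying */
    }
}

void spin_unlock(_Atomic uint32_t *lock) { atomic_store(lock, 0u); }
```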

3.4 Software Interrupt Instruction (SWI)

A Software Interrupt (SWI) is a special instruction that allows a program to request services
from the operating system (OS), such as printing a message or reading user input.

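How It Works

1. The processor saves the return address in lr_svc and the current CPSR in SPSR_svc, masks IRQ interrupts, and switches to Supervisor mode (SVC).
2. It jumps to address 0x08 in the exception vector table.
3. The SWI handler (part of the OS) reads the SWI number to decide which operation to perform.
4. The handler executes the requested function and returns to the instruction after the SWI, typically with MOVS pc, lr, which also restores CPSR from SPSR_svc.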
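The exception entry in steps 1 and 2, and the way a handler recovers the SWI number, can be sketched as follows. The dict-based state and the function names are assumptions for illustration; the constants (Supervisor mode 0x13, I bit 7, T bit 5, vector 0x08) and the 24-bit comment field in bits [23:0] follow the ARM architecture for ARM-state SWIs. In ARM state the handler loads the SWI instruction from lr_svc - 4 and masks off the top 8 bits.

```python
SVC_MODE = 0x13      # Supervisor mode bits M[4:0]
MODE_MASK = 0x1F
I_BIT = 1 << 7       # IRQ disable
T_BIT = 1 << 5       # Thumb state
SWI_VECTOR = 0x08    # SWI entry in the exception vector table


def take_swi(state):
    """Model the core's actions when an ARM-state SWI executes.

    state["pc"] holds the address of the SWI instruction itself.
    """
    state["lr_svc"] = state["pc"] + 4      # return address: the next instruction
    state["spsr_svc"] = state["cpsr"]      # save the caller's CPSR
    cpsr = (state["cpsr"] & ~MODE_MASK) | SVC_MODE
    cpsr |= I_BIT                          # IRQs are masked on entry
    cpsr &= ~T_BIT                         # the handler runs in ARM state
    state["cpsr"] = cpsr
    state["pc"] = SWI_VECTOR               # jump to the vector
    return state


def swi_number(instruction_word):
    """What a handler does with the word it loads from lr_svc - 4."""
    return instruction_word & 0x00FFFFFF   # 24-bit comment field
```

For example, take_swi({"pc": 0x8000, "cpsr": 0x10}) models a SWI issued from User mode with IRQs enabled: afterwards the CPSR is 0x93 (Supervisor mode, IRQs masked), SPSR_svc holds 0x10, and lr_svc holds 0x8004.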

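Syntax
SWI{<cond>} <SWI_number>

• SWI_number is a 24-bit value encoded in the instruction itself; the handler decodes it to decide which function to perform. (On Linux with the EABI convention used in the example below, the number is always 0 and the system call is selected by the value in r7.)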

Example: Printing a Message

Task:

Use SWI to print "Hello, World!" on the screen.

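Example Code (GNU assembler syntax for 32-bit ARM Linux, where @ starts a comment)
        .global _start
        .text
_start:
        MOV r0, #1          @ File descriptor for standard output (screen)
        LDR r1, =message    @ Load address of message into r1
        MOV r2, #13         @ Length of the message in bytes
        MOV r7, #4          @ System call number for write (Linux ARM)
        SWI 0               @ Call operating system to print
        MOV r0, #0          @ Exit status 0 (r0 still holds write's result)
        MOV r7, #1          @ System call number for exit
        SWI 0               @ Call OS to terminate the program

        .data
message:
        .ascii "Hello, World!"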
Explanation

• MOV r0, #1 → Selects standard output (file descriptor 1, the screen).
• LDR r1, =message → Loads the address of the message.
• MOV r2, #13 → Specifies the length of the message ("Hello, World!" is 13 bytes).
• MOV r7, #4 → Selects system call 4 (write).
• SWI 0 → Traps into the OS, which performs the write.

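After printing, the program exits with system call 1 (exit). The exit status goes in r0 and
should be set first (here to 0); otherwise r0 still holds the return value of write:

MOV r0, #0
MOV r7, #1
SWI 0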

3.5 Program Status Register (PSR) Instructions

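The Program Status Register (PSR) stores important information about the processor. It is
divided into four 8-bit fields:

• Flags (f, bits [31:24]): condition flags Negative (N), Zero (Z), Carry (C), and Overflow (V).
• Status (s, bits [23:16]) and Extension (x, bits [15:8]): reserved on ARMv4 and ARMv5 cores.
• Control (c, bits [7:0]): interrupt mask bits (I for IRQ, F for FIQ), the Thumb state bit (T), and the processor mode bits (User, Supervisor, etc.).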

To read or modify the PSR, ARM provides two instructions:

1. MRS (Move PSR to Register): Copies PSR (CPSR or SPSR) into a general-purpose register.
2. MSR (Move Register to PSR): Writes a register value or an immediate value into PSR.

Syntax (Simple)
MRS Rd, <CPSR | SPSR> ; Copy PSR to a register
MSR <CPSR | SPSR>_<fields>, Rm ; Move a register value to PSR
MSR <CPSR | SPSR>_<fields>, #immediate ; Move an immediate value to PSR

• The field suffixes (c, x, s, f) select which bytes of the PSR are written; several can be combined (e.g., cpsr_fc), and bytes not named are left unchanged.

Example 1: Enabling IRQ Interrupts

This example turns on IRQ interrupts by clearing the I bit (bit 7) in CPSR. It must run in a
privileged mode; in User mode, MSR can change only the flags field.

Code
MRS r1, cpsr ; Copy CPSR to register r1
BIC r1, r1, #0x80 ; Clear bit 7 (IRQ disable bit)
MSR cpsr_c, r1 ; Write updated value back to CPSR

Effect: IRQ interrupts are enabled.
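MSR writes only the PSR bytes named by its field suffix. The sketch below models that masking in Python; FIELD_MASKS, msr, and mrs are hypothetical helper names, and the model ignores privilege (in User mode, real hardware lets MSR change only the flags field). The last lines replay the IRQ-enable example above, assuming a starting CPSR of 0xD3 (Supervisor mode, IRQ and FIQ masked).

```python
FIELD_MASKS = {
    "f": 0xFF000000,  # flags: N, Z, C, V
    "s": 0x00FF0000,  # status
    "x": 0x0000FF00,  # extension
    "c": 0x000000FF,  # control: I, F, T, mode
}


def mrs(psr):
    """MRS Rd, psr: a plain 32-bit copy."""
    return psr


def msr(psr, value, fields):
    """MSR psr_<fields>, value: only the named bytes are replaced."""
    mask = 0
    for field in fields:
        mask |= FIELD_MASKS[field]
    return (psr & ~mask & 0xFFFFFFFF) | (value & mask)


# Replay of the IRQ-enable example, starting in Supervisor mode with IRQ and FIQ masked.
cpsr = 0xD3
r1 = mrs(cpsr) & ~0x80        # MRS r1, cpsr ; BIC r1, r1, #0x80
cpsr = msr(cpsr, r1, "c")     # MSR cpsr_c, r1 (I bit is now clear)
```

After the replay, cpsr is 0x53: the I bit is clear while the F bit and the Supervisor mode bits are untouched.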

Example 2: Changing Processor Mode to System Mode

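This example switches the processor to System mode (mode bits 0x1F). It reads CPSR first and
changes only the mode bits, because writing #0x1F straight into cpsr_c would also clear the I,
F, and T bits and so enable IRQ and FIQ interrupts as a side effect.

Code
MRS r0, cpsr ; Copy CPSR to r0
BIC r0, r0, #0x1F ; Clear the mode bits [4:0]
ORR r0, r0, #0x1F ; Select System mode (0x1F)
MSR cpsr_c, r0 ; Write the control field back to CPSR

Effect: The processor now runs in System mode, with its interrupt mask bits unchanged.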
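Writing an immediate such as #0x1F to cpsr_c replaces the entire control byte, including the I, F, and T bits, not just the mode bits. The Python sketch below contrasts that with a read-modify-write (MRS/BIC/ORR/MSR) mode change; the function names are hypothetical and the starting value 0xD2 (IRQ mode with IRQ and FIQ masked) is an assumed example.

```python
MODE_MASK = 0x1F
SYS_MODE = 0x1F
IRQ_MODE = 0x12
I_BIT, F_BIT = 1 << 7, 1 << 6


def write_control_direct(cpsr, imm):
    """MSR cpsr_c, #imm: the whole control byte (I, F, T, mode) is replaced."""
    return (cpsr & ~0xFF) | (imm & 0xFF)


def change_mode_rmw(cpsr, mode):
    """MRS / BIC / ORR / MSR cpsr_c: only the mode bits [4:0] change."""
    r0 = cpsr                 # MRS r0, cpsr
    r0 &= ~MODE_MASK          # BIC r0, r0, #0x1F
    r0 |= mode                # ORR r0, r0, #mode
    return (cpsr & ~0xFF) | (r0 & 0xFF)   # MSR cpsr_c, r0


start = IRQ_MODE | I_BIT | F_BIT                 # IRQ mode, IRQ and FIQ masked (0xD2)
direct = write_control_direct(start, SYS_MODE)   # 0x1F: I and F cleared
preserved = change_mode_rmw(start, SYS_MODE)     # 0xDF: I and F kept
```

Both reach System mode, but only the read-modify-write version leaves interrupts masked as they were.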

Simple Breakdown

• MRS copies PSR → register (e.g., MRS r1, cpsr).
• MSR moves register → PSR (e.g., MSR cpsr_c, r1).
• MSR with an immediate writes the named PSR fields directly (e.g., MSR cpsr_c, #0x1F, which replaces the entire control byte).


Figure 3.9 PSR byte fields.

3.5.1 Coprocessor Instructions

Coprocessor instructions extend the ARM instruction set by allowing specialized operations
such as:

• Additional computation (e.g., floating-point operations).
• Memory management (e.g., controlling caches and memory subsystems).

These instructions work only if the addressed coprocessor is present; if no coprocessor accepts
the instruction, the core takes an undefined instruction exception.


Types of Coprocessor Instructions

1. CDP (Coprocessor Data Processing)
 • Performs an operation inside a coprocessor.
2. MRC / MCR (Coprocessor Register Transfer)
 • MRC: Reads a value from a coprocessor register into a general-purpose register.
 • MCR: Writes a value from a general-purpose register into a coprocessor register.
3. LDC / STC (Coprocessor Memory Transfer)
 • LDC: Loads one or more words from memory into a coprocessor.
 • STC: Stores one or more words from a coprocessor to memory.

Syntax
CDP cp, opcode1, Cd, Cn, Cm {, opcode2} ; Perform an operation in a coprocessor
MRC cp, opcode1, Rd, Cn, Cm {, opcode2} ; Read a coprocessor register into a CPU register
MCR cp, opcode1, Rd, Cn, Cm {, opcode2} ; Write a CPU register to a coprocessor register
LDC cp, Cd, [address] ; Load memory into a coprocessor register
STC cp, Cd, [address] ; Store a coprocessor register to memory

• cp: Coprocessor number, p0 to p15 (e.g., p15 for system control).
• Rd: ARM general-purpose register.
• Cn, Cm, Cd: Coprocessor registers, c0 to c15.
• opcode1, opcode2: Operation codes (their meaning is specific to each coprocessor).

Example 1: Reading the Processor ID from CP15

This example reads the processor identification number from CP15 register c0 and stores it
in r10.

MRC p15, 0, r10, c0, c0, 0 ; Read processor ID from CP15

Effect: The processor ID is now stored in r10.
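The operands of MRC and MCR map directly onto bit fields of the 32-bit instruction. The Python sketch below assembles those fields; encode_mrc_mcr is a hypothetical helper, and the layout follows the ARM-state encoding with the always-execute condition (AL = 0xE) as the default. For the processor-ID read above it produces 0xEE10AF10.

```python
def encode_mrc_mcr(load, cp, opcode1, rd, crn, crm, opcode2=0, cond=0xE):
    """Assemble an ARM-state MRC (load=True) or MCR (load=False) word.

    Layout: cond[31:28] 1110[27:24] opcode1[23:21] L[20] CRn[19:16]
            Rd[15:12] cp[11:8] opcode2[7:5] 1[4] CRm[3:0]
    """
    fields = [("cond", cond, 4), ("cp", cp, 4), ("opcode1", opcode1, 3),
              ("rd", rd, 4), ("crn", crn, 4), ("crm", crm, 4),
              ("opcode2", opcode2, 3)]
    for name, value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
    return ((cond << 28) | (0b1110 << 24) | (opcode1 << 21)
            | (int(load) << 20) | (crn << 16) | (rd << 12)
            | (cp << 8) | (opcode2 << 5) | (1 << 4) | crm)


# MRC p15, 0, r10, c0, c0, 0 (read the processor ID)
word = encode_mrc_mcr(True, cp=15, opcode1=0, rd=10, crn=0, crm=0, opcode2=0)
```

The only difference between MRC and MCR is bit 20 (L), which selects the transfer direction.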

Example 2: Enabling Cache via CP15

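This example enables the cache by setting the C bit (bit 2) of CP15 control register c1. It
reads the register first so that the other control bits are preserved.

MRC p15, 0, r1, c1, c0, 0 ; Read CP15 control register c1 into r1
ORR r1, r1, #0x4 ; Set bit 2 (C bit) to enable the data/unified cache
MCR p15, 0, r1, c1, c0, 0 ; Write modified value back to CP15

Effect: The data (or unified) cache is enabled.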
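The read-modify-write on CP15 c1 can be modeled on a plain integer. DCACHE_BIT (bit 2) comes from the example; ICACHE_BIT (bit 12) is the instruction-cache enable on ARMv4/ARMv5 cores with separate instruction and data caches. enable_caches and the test values are hypothetical, and on a real core the MMU or protection unit may also need to be configured before caching takes effect.

```python
DCACHE_BIT = 1 << 2    # C bit: data (or unified) cache enable
ICACHE_BIT = 1 << 12   # I bit: instruction cache enable


def enable_caches(c1, dcache=True, icache=False):
    """Model MRC / ORR / MCR on CP15 c1; the result is what MCR writes back."""
    value = c1                  # MRC p15, 0, r1, c1, c0, 0
    if dcache:
        value |= DCACHE_BIT     # ORR r1, r1, #0x4
    if icache:
        value |= ICACHE_BIT     # ORR r1, r1, #0x1000
    return value                # MCR p15, 0, r1, c1, c0, 0
```

Because the value is ORed rather than overwritten, bits that were already set (for example an enabled MMU) survive the update.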


Key Takeaways

• MRC → Coprocessor → CPU register (read from coprocessor).
• MCR → CPU register → Coprocessor (write to coprocessor).
• CDP → Perform calculations inside a coprocessor.
• LDC/STC → Transfer data between memory and a coprocessor.

3.6 Loading Constants

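ARM (in the ARMv4 and ARMv5 architectures) has no single instruction that moves an arbitrary
32-bit constant into a register. Each instruction is itself 32 bits long, so MOV has room for
only a 12-bit immediate field: an 8-bit value rotated right by an even number of bit positions.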
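A data-processing immediate is an 8-bit value rotated right by an even number of bits (0 to 30). The Python sketch below searches for such an encoding; encode_arm_immediate and rotate_left32 are hypothetical helper names. A value like 0x12345678 has no encoding, which is why a single MOV cannot load it.

```python
def rotate_left32(value, amount):
    """Rotate a 32-bit value left by amount bits."""
    amount %= 32
    return ((value << amount) | (value >> (32 - amount))) & 0xFFFFFFFF


def encode_arm_immediate(value):
    """Return (imm8, rot) such that value == imm8 rotated right by 2*rot.

    Returns None when no such encoding exists.
    """
    value &= 0xFFFFFFFF
    for rot in range(16):
        # Undoing a right rotation by 2*rot is a left rotation by 2*rot.
        imm8 = rotate_left32(value, 2 * rot)
        if imm8 <= 0xFF:
            return imm8, rot
    return None
```

For instance, 0xFF000000 encodes as imm8 = 0xFF with rot = 4 (rotate right by 8), while 0x101 fails because its set bits are 9 positions apart and cannot fit in 8 bits.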

To handle this, ARM assemblers provide two pseudoinstructions:

1. LDR (Load Constant) → Loads any 32-bit constant into a register.
2. ADR (Load Address) → Loads a PC-relative address into a register.

Syntax
LDR Rd, =constant ; Load a 32-bit constant into Rd
ADR Rd, label ; Load the address of label into Rd

• Rd: Destination register
• constant: Any 32-bit value
• label: A label close enough to the ADR instruction that its PC-relative offset can be encoded as an immediate

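Example 1: Loading a 32-bit Constant

LDR r0, =0x12345678 ; Load 0x12345678 into r0

Effect: r0 now holds 0x12345678. This value cannot be encoded as a rotated 8-bit immediate, so
the assembler stores it in a literal pool and loads it with a PC-relative LDR.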

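Example 2: Loading an Address

ADR r1, my_label ; Load the address of 'my_label' into r1

Effect: r1 now holds the memory address of my_label. The assembler turns this into a single ADD
(or SUB) instruction that adds the offset of my_label to pc.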
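ADR is resolved by the assembler into a single ADD or SUB on pc. In ARM state, pc reads as the address of the current instruction plus 8, so the offset is label - (ADR address + 8). The sketch below computes that expansion; adr_expansion and is_arm_immediate are hypothetical helpers, and the sketch raises ValueError when the offset has no rotated-immediate encoding (a real assembler would report an error there).

```python
def is_arm_immediate(value):
    """True if value is an 8-bit constant rotated right by an even amount."""
    value &= 0xFFFFFFFF
    return any(
        (((value << r) | (value >> (32 - r))) & 0xFFFFFFFF) <= 0xFF
        for r in range(0, 32, 2)
    )


def adr_expansion(adr_address, label_address):
    """Return (mnemonic, offset) for ADD/SUB Rd, pc, #offset in ARM state."""
    offset = label_address - (adr_address + 8)   # pc reads 8 bytes ahead
    mnemonic = "ADD" if offset >= 0 else "SUB"
    if not is_arm_immediate(abs(offset)):
        raise ValueError(f"offset {offset:#x} cannot be encoded for ADR")
    return mnemonic, abs(offset)
```

For an ADR at 0x8000 and a label at 0x8100, the offset is 0x8100 - 0x8008 = 0xF8, giving ADD Rd, pc, #0xF8.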

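Key Takeaways
• LDR Rd, =constant → Loads a constant into a register.
• ADR → Loads a PC-relative address into a register.
• If the constant fits a MOV (or MVN) immediate, the assembler emits that instruction; otherwise it stores the constant in a literal pool in memory and loads it with a PC-relative LDR.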
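The choice the assembler makes for LDR Rd, =constant can be sketched as three cases: a MOV if the value is a valid rotated immediate, an MVN if its bitwise inverse is, and otherwise a literal-pool load. This is a simplified model for ARMv4/ARMv5 targets; ldr_equals_expansion and is_arm_immediate are hypothetical names, and assemblers targeting newer architectures may choose other instructions (e.g., MOVW/MOVT).

```python
def is_arm_immediate(value):
    """True if value is an 8-bit constant rotated right by an even amount."""
    value &= 0xFFFFFFFF
    return any(
        (((value << r) | (value >> (32 - r))) & 0xFFFFFFFF) <= 0xFF
        for r in range(0, 32, 2)
    )


def ldr_equals_expansion(value):
    """Return how LDR Rd, =value could be expanded (simplified model)."""
    value &= 0xFFFFFFFF
    if is_arm_immediate(value):
        return ("MOV", value)              # MOV Rd, #value
    inverted = ~value & 0xFFFFFFFF
    if is_arm_immediate(inverted):
        return ("MVN", inverted)           # MVN Rd, #inverted
    return ("LDR literal", value)          # LDR Rd, [pc, #offset] from a literal pool
```

So LDR r0, =0xFFFFFF00 costs no memory access (MVN r0, #0xFF), while LDR r0, =0x12345678 needs a literal-pool entry.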
