An ARM processor is one of a family of CPUs based on the RISC (reduced instruction set
computer) architecture developed by Advanced RISC Machines (ARM). ARM designs 32-bit and
64-bit RISC multi-core processors and licenses the designs to other companies for manufacture.
• The ARM Cortex is a family of microprocessors presented in 2005 by ARM Holdings
and based on the ARMv7 instruction set.
• The Cortex family is made up of a series of functional blocks that can be connected to
each other in order to meet customer needs, so a specific Cortex processor does not
necessarily have all the functional units of the family.
• Cortex processors are available in single core or multicore configurations and for each
family there are multiple cores with different performance.
• The SecureCore series is derived from the M series and is used for security applications,
such as smart cards.
CORTEX-A
Features of CORTEX-A
Modified Harvard Architecture
A number of key points are common to the Cortex-A family of devices:
• 32-bit RISC core, with 16 × 32-bit visible registers with mode-based register banking.
• Modified Harvard Architecture (separate, concurrent access to instructions and data).
• Load/Store Architecture.
• Thumb-2 technology as standard.
• VFP and NEON options.
• Backward compatibility with code from previous ARM processors.
• 4GB of virtual address space and a minimum of 4GB of physical address space.
• Hardware translation table walking for virtual to physical address translation.
• Virtual page sizes of 4KB, 64KB, 1MB and 16MB. Cacheability attributes and access permissions can be
set on a per-page basis.
• Big-endian and little-endian data access support.
• Unaligned access support for basic load/store instructions.
• Symmetric Multi-processing (SMP) support on MPCore™ variants, that is, multi-core versions of the
Cortex-A series processors, with full data coherency at the L1 cache level. Automatic cache and
Translation Lookaside Buffer (TLB) maintenance propagation provides high efficiency SMP operation.
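As a rough illustration of the page sizes listed above, the following sketch splits a 32-bit virtual address into a page base and an in-page offset for each size. The helper name `split_address` and the sample address are assumptions made for illustration only; the real translation is done by the hardware translation table walk, which is not modelled here.

```python
from typing import Tuple

# Page sizes named in the feature list above, in bytes.
PAGE_SIZES = {
    "4KB": 4 * 1024,
    "64KB": 64 * 1024,
    "1MB": 1024 * 1024,
    "16MB": 16 * 1024 * 1024,
}


def split_address(vaddr: int, page_size: int) -> Tuple[int, int]:
    """Return (page_base, offset) for a 32-bit virtual address.

    Hypothetical helper: it only shows which address bits select the
    page and which bits select the byte inside that page.
    """
    if not 0 <= vaddr <= 0xFFFFFFFF:
        raise ValueError("address must fit in 32 bits")
    if page_size not in PAGE_SIZES.values():
        raise ValueError("unsupported page size")
    offset_mask = page_size - 1  # valid because page sizes are powers of two
    page_base = vaddr & ~offset_mask & 0xFFFFFFFF
    return page_base, vaddr & offset_mask


if __name__ == "__main__":
    for name, size in PAGE_SIZES.items():
        base, offset = split_address(0x12345678, size)
        print(f"{name:>4}: base=0x{base:08X} offset=0x{offset:06X}")
```

Larger pages leave more low-order bits in the offset, so one translation entry covers more memory.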
ARM Cortex-A series applications (high performance and high efficiency):
• Networking solutions
• Digital TV and smart TVs
• Smartphones and tablets
• Home computing
• Digital cameras
• Embedded computing
• Home networking
• Storage
Unified Assembly Language (UAL) provides a common syntax for two instruction sets:
– ARM instruction set (32-bit)
– Thumb instruction set (mixed 16/32-bit)
ARM Exclusive Features - NEON
• The Advanced SIMD instructions perform packed SIMD operations:
- Registers are considered as vectors of elements of the same data type.
- Instructions perform the same operation in all lanes.
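The lane model described above can be sketched in a few lines. This is only a software illustration of "same operation in all lanes", assuming four 16-bit lanes and wrap-around (non-saturating) addition; the function name `lanewise_add` is hypothetical and does not correspond to any NEON intrinsic.

```python
from typing import List


def lanewise_add(a: List[int], b: List[int], lane_bits: int = 16) -> List[int]:
    """Add two vectors lane by lane, wrapping each lane at lane_bits.

    Each list element stands for one lane of a vector register; every
    lane gets the same operation, independently of its neighbours.
    """
    if len(a) != len(b):
        raise ValueError("both vectors need the same number of lanes")
    mask = (1 << lane_bits) - 1
    return [(x + y) & mask for x, y in zip(a, b)]


if __name__ == "__main__":
    # Four 16-bit lanes; the last lane overflows and wraps within its lane only.
    print(lanewise_add([1, 2, 3, 0xFFFF], [10, 20, 30, 1]))
```

Note that the overflow in the last lane does not carry into any other lane, which is what distinguishes a packed SIMD add from an ordinary wide integer add.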
ARM Floating Point architecture (VFP)
• VFP provides hardware support for floating-point operations in ARM Cortex™-A series
processors.
• VFP architecture v3 is an enhancement to v2:
- Double the number of double-precision registers
- Instructions for conversion between fixed-point and floating-point
The ARM Cortex-R is a family of 32-bit RISC ARM processor cores licensed by Arm Holdings. The
cores are optimized for hard real-time and safety-critical applications. Cores in this family implement
the ARM Real-time (R) profile, which is one of three architecture profiles, the other two being the
Application (A) profile implemented by the Cortex-A family and the Microcontroller (M) profile
implemented by the Cortex-M family. The ARM Cortex-R family of microprocessors currently consists of
ARM Cortex-R4(F), ARM Cortex-R5(F), ARM Cortex-R7(F), ARM Cortex-R8(F), and ARM Cortex-R52(F).
Applications
The Cortex-R is suitable for use in computer-controlled systems where very low latency and/or a
high level of safety is required. An example of a hard real-time, safety critical application would be a
modern electronic braking system in an automobile. The system not only needs to be fast and
responsive to a plethora of sensor data input, but is also responsible for human safety. A failure of
such a system could lead to severe injury or loss of life.
Other examples of hard real-time and/or safety critical applications include:
• Medical devices
• Programmable logic controllers (PLC)
• Electronic control units (ECU) for a wide variety of applications
• Robotics
• Avionics
• Motion control
• Automotive braking systems, powertrains, etc.
Real time and safety critical features added include:
• Non overlapping memory regions
• Tightly coupled memory
• Increased exception handling in hardware
• Hardware division instructions
• Memory protection unit (MPU)
• Deterministic interrupt handling as well as fast non-maskable interrupts
• ECC on L1 cache and buses
• Dual-core lockstep for CPU fault tolerance
• 8-stage dual issue pipeline with instruction pre-fetch and branch prediction
• The FPU supports all single-precision data-processing instructions and data types described
in the ARM Architecture
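The idea behind dual-core lockstep, listed among the features above, can be sketched in software: the same work is done twice and the results are compared, with any disagreement treated as a fault. This is a conceptual sketch only; real lockstep compares core outputs in hardware, and the names `run_in_lockstep` and `LockstepMismatch` are assumptions made for illustration.

```python
from typing import Callable


class LockstepMismatch(Exception):
    """Raised when the two redundant results disagree."""


def run_in_lockstep(primary: Callable[[int], int],
                    checker: Callable[[int], int],
                    value: int) -> int:
    """Run the same computation on two 'cores' and compare the results."""
    out_primary = primary(value)
    out_checker = checker(value)
    if out_primary != out_checker:
        # In a real system this would signal a fault to the safety logic.
        raise LockstepMismatch(f"{out_primary} != {out_checker}")
    return out_primary


def brake_force(pedal: int) -> int:
    """Stand-in for a safety-critical calculation (hypothetical)."""
    return pedal * 3


def faulty_brake_force(pedal: int) -> int:
    """Same calculation with a simulated single-bit fault in the result."""
    return brake_force(pedal) ^ 0x4
```

For example, `run_in_lockstep(brake_force, brake_force, 10)` returns the agreed result, while pairing `brake_force` with `faulty_brake_force` raises `LockstepMismatch`.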
Memory protection is a way to control memory access rights on a computer, and is a part of most
modern instruction set architectures and operating systems. The main purpose of memory protection
is to prevent a process from accessing memory that has not been allocated to it. This prevents a bug
or malware within a process from affecting other processes, or the operating system itself. Protection
may encompass all accesses to a specified area of memory, write accesses, or attempts to execute
the contents of the area. An attempt to access unowned memory results in a hardware fault, called
a segmentation fault or storage violation exception, generally causing abnormal termination of the
offending process. Memory protection for computer security includes additional techniques such
as address space layout randomization and executable space protection.
Methods of memory protection include:
• Segmentation
• Paged virtual memory
• Protection keys
• Simulated segmentation
• Capability-based addressing
• Dynamic tainting
Tightly-coupled memory
The purpose of the Tightly-Coupled Memory (TCM) is to provide low-latency memory
that the processor can use without the unpredictability that is a feature of caches.
You can use TCM to hold critical routines, such as interrupt handling routines or
real-time tasks where the indeterminacy of a cache is highly undesirable. In addition you can use
it to hold scratch pad data, data types whose locality properties are not well suited to
caching, and critical data structures such as interrupt stacks.
The size of each TCM can be selected independently from a minimum of 4KB to a maximum
of 256KB, in powers of 2.
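The sizing rule above (a power of two between 4KB and 256KB) is easy to check mechanically. The sketch below is a minimal illustration; the function names are hypothetical and the limits are taken directly from the sentence above.

```python
from typing import List

KB = 1024
TCM_MIN = 4 * KB     # minimum TCM size stated in the text
TCM_MAX = 256 * KB   # maximum TCM size stated in the text


def is_valid_tcm_size(size_bytes: int) -> bool:
    """True if size_bytes is a power of two within the allowed range."""
    is_power_of_two = size_bytes > 0 and (size_bytes & (size_bytes - 1)) == 0
    return is_power_of_two and TCM_MIN <= size_bytes <= TCM_MAX


def valid_tcm_sizes() -> List[int]:
    """List every allowed TCM size in bytes, smallest first."""
    sizes = []
    size = TCM_MIN
    while size <= TCM_MAX:
        sizes.append(size)
        size *= 2
    return sizes
```

Doubling from 4KB up to 256KB gives seven possible sizes per TCM.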
Year  Core            Bit width
2011  Cortex-R4(F)    32
2011  Cortex-R5(F)    32
2011  Cortex-R7(F)    32
2016  Cortex-R8(F)    32
2016  Cortex-R52(F)   32
The ARM Cortex-M (used for control applications in microcontrollers) is a group of 32-bit RISC
ARM processor cores licensed by Arm Holdings. They are intended for microcontroller use, and
have been shipped in tens of billions of devices. The cores consist of the Cortex-M0,
Cortex-M0+, Cortex-M1, Cortex-M3, Cortex-M4, Cortex-M7, Cortex-M23, Cortex-M33, and
Cortex-M35P.
• Energy-efficiency
– Lower energy cost, longer battery life
• Smaller code
– Lower silicon costs
• Ease of use
– Faster software development and reuse
Applications include microcontrollers in embedded applications:
• mixed-signal devices
• smart sensors
• automotive body electronics and airbags
• smart metering
• human interface devices
• automotive and industrial control systems
• consumer products and medical instrumentation
• more recently, IoT
DWT functional description
A full DWT contains four comparators that you can configure as a hardware watchpoint, an
ETM trigger, a PC sampler event trigger, or a data address sampler event trigger.
The first comparator, DWT_COMP0, can also compare against the clock cycle counter,
CYCCNT. You can also use the second comparator, DWT_COMP1, as a data comparator.
A reduced DWT contains one comparator that you can use as a watchpoint or as a trigger. It
does not support data matching.
The DWT, if present, contains counters for:
• Clock cycles (CYCCNT).
• Folded instructions.
• Load Store Unit (LSU) operations.
• Sleep cycles.
• CPI, that is, all instruction cycles except for the first cycle.
• Interrupt overhead.
Announced year  Core
2004            Cortex-M3
2007            Cortex-M1
2009            Cortex-M0
2010            Cortex-M4(F)
2012            Cortex-M0+
2014            Cortex-M7(F)
2016            Cortex-M23
2016            Cortex-M33(F)
2018            Cortex-M35P(F)
About the MPU (Memory Protection Unit)
The MPU enforces privilege rules, separates processes, and enforces access rules to memory.
The MPU is an optional component and supports the standard ARMv7 Protected Memory System
Architecture model.
The MPU provides full support for:
• Protection regions.
• Overlapping protection regions, with ascending region priority:
o 7 = highest priority.
o 0 = lowest priority.
• Access permissions.
• Exporting memory attributes to the system.
MPU mismatches and permission violations invoke the programmable-priority
MemManage fault handler. See the ARM®v7-M Architecture Reference
Manual for more information.
You can use the MPU to:
• Enforce privilege rules.
• Separate processes.
• Enforce access rules.
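The effect of overlapping regions with ascending priority can be sketched as follows: when an address falls inside several regions, the highest-numbered region decides the access permission. This is a simplified model, not the MPU register interface; the `Region` class and function names are hypothetical, and an address that matches no region is simply treated as a violation here, which is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Region:
    number: int      # 0..7; a higher number wins where regions overlap
    base: int
    size: int
    writable: bool

    def contains(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size


def matching_region(regions: List[Region], addr: int) -> Optional[Region]:
    """Return the highest-priority region that covers addr, if any."""
    hits = [r for r in regions if r.contains(addr)]
    return max(hits, key=lambda r: r.number) if hits else None


def write_allowed(regions: List[Region], addr: int) -> bool:
    """True if a write to addr is permitted under this simplified model."""
    region = matching_region(regions, addr)
    return region is not None and region.writable


# Region 0 makes 64KB of RAM writable; region 7 overlays a read-only 1KB window.
EXAMPLE_REGIONS = [
    Region(number=0, base=0x20000000, size=0x10000, writable=True),
    Region(number=7, base=0x20000000, size=0x400, writable=False),
]
```

With `EXAMPLE_REGIONS`, a write to 0x20000100 is refused because region 7 overrides region 0 there, while a write to 0x20001000 is allowed because only region 0 covers it.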
Data endianness: Little-endian or big-endian. Unlike legacy ARM cores, the Cortex-M is
permanently fixed in silicon as one of these choices.
Interrupts: 1 to 32 (M0/M0+/M1), 1 to 240 (M3/M4/M7/M23), 1 to 480 (M33/M35P).
Wake-up interrupt controller: Optional.
Vector Table Offset Register: Optional. (not available for M0).
Instruction fetch width: 16-bit only, or mostly 32-bit.
User/privilege support: Optional.
Reset all registers: Optional.
Single-cycle I/O port: Optional. (M0+/M23).
Debug Access Port (DAP): None, SWD, JTAG and SWD. (optional for all Cortex-M cores)
Halting debug support: Optional.
Number of watchpoint comparators: 0 to 2 (M0/M0+/M1), 0 to 4 (M3/M4/M7/M23/M33/M35P).
Number of breakpoint comparators: 0 to 4 (M0/M0+/M1/M23), 0 to 8 (M3/M4/M7/M33/M35P).
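To make the data endianness option in the list above concrete, the sketch below shows how the same 32-bit word is laid out in memory under each choice. It uses Python's standard `struct` module purely as an illustration; the function names are hypothetical.

```python
import struct


def word_to_bytes(value: int, big_endian: bool) -> bytes:
    """Lay out a 32-bit word as it would appear in memory, lowest address first."""
    fmt = ">I" if big_endian else "<I"
    return struct.pack(fmt, value)


def bytes_to_word(data: bytes, big_endian: bool) -> int:
    """Read a 32-bit word back from four bytes of memory."""
    fmt = ">I" if big_endian else "<I"
    return struct.unpack(fmt, data)[0]


if __name__ == "__main__":
    print(word_to_bytes(0x12345678, big_endian=False).hex())
    print(word_to_bytes(0x12345678, big_endian=True).hex())
```

Reading bytes with the wrong endianness yields a different word, which is why data shared with a device of the other byte order has to be byte-swapped.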
Selected Cortex-M processors include the Instrumentation
Trace Macrocell (ITM) to help understand system behaviour.
Although it can provide other types of trace, the ITM is
commonly associated with printf() output and event tracing
from applications and operating systems. Historically, Fast
Model systems have used semihosting or UART models to
provide character and file I/O when running software on
models. Starting with version 11.1, Fast Models for Cortex-M
provide the option of using the ITM for output and event
tracing. This makes software development equivalent for
models and boards.
ITM benefits
The primary benefit of the ITM support in Fast Models is the ability to use the
same software images for virtual prototypes and FPGA prototypes. In Cortex-M
projects, software engineers often move back and forth between models and
boards. They value the ability to use the same software for both as this
reduces complexity when switching platforms and saves time. The additional
maintenance related to source code modifications, based on the type of target,
is undesirable for developers.
Using the ITM is also faster than using a UART since writing to the ITM is an
internal 32-bit register write in the Cortex-M processor. Writing to a UART takes
additional time on the bus and has more impact on system performance. This
has little impact on models because Fast Models provide functional software
execution and do not model cycle-level timing details. Similarly, semihosting
works well on models, but is slower on an FPGA target because the CPU is
stopped by the semihosting exception while a data transfer takes place.
Semihosting is also toolchain dependent as the procedure for using it with
different debuggers and compilers is different.
The ARM architecture profiles are:
Application profile (Cortex-A)
Application profiles implement a traditional ARM architecture with multiple modes and support a
virtual memory system architecture based on an MMU. These profiles support both ARM and
Thumb instruction sets.
Real-time profile (Cortex-R)
Real-time profiles implement a traditional ARM architecture with multiple modes and support a
protected memory system architecture based on an MPU.
Microcontroller profile (Cortex-M)
Microcontroller profiles implement a programmers' model designed for fast interrupt processing,
with hardware stacking of registers and support for writing interrupt handlers in high-level
languages. The processor is designed for integration into an FPGA and is ideal for use in very low
power applications.
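The hardware stacking mentioned for the Microcontroller profile can be pictured with a small model. On exception entry the processor itself pushes eight registers (R0 to R3, R12, LR, PC and xPSR), which is what allows an interrupt handler to be an ordinary high-level-language function. The sketch below models only this basic frame and ignores stack alignment padding and floating-point context; the function name is hypothetical.

```python
from typing import Dict, List, Tuple

# Registers saved automatically on exception entry, lowest address first.
FRAME_ORDER = ["r0", "r1", "r2", "r3", "r12", "lr", "pc", "xpsr"]
WORD_BYTES = 4


def stack_exception_frame(sp: int, regs: Dict[str, int]) -> Tuple[int, List[int]]:
    """Return the new stack pointer and the stacked words (lowest address first).

    Simplified model: a full descending stack, eight words, no padding.
    """
    frame = [regs[name] & 0xFFFFFFFF for name in FRAME_ORDER]
    new_sp = (sp - WORD_BYTES * len(frame)) & 0xFFFFFFFF
    return new_sp, frame
```

Starting from a stack pointer of 0x20001000, the model moves it down by 32 bytes to 0x20000FE0, with R0 at the new stack pointer and xPSR at the highest address of the frame.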
Open Multimedia Application Platform (Open multimedia access point or port)
(OMAP- ARM core processor plus one or more specialized processors)
The OMAP processor was introduced for mobile and multimedia applications. It includes a
general-purpose ARM core processor plus one or more specialized processors.
Open Multimedia Application Platform is a series of image/video processors developed
by Texas Instruments.
OMAP processors are a category of proprietary SoCs for portable and mobile multimedia
applications.
OMAP devices include:
1. a general-purpose ARM architecture processor core, plus
2. one or more specialized processors.
The advanced OMAP architecture provides a system solution for the wireless
market. It seamlessly integrates, on the same piece of silicon:
• a software infrastructure,
• an ARM RISC processor,
• a high-performance, low-power TI TMS320C55x generation digital signal
processor, and
• a shared memory architecture.
The OMAP software infrastructure includes:
• Support for advanced operating systems and applications through standard APIs.
• TI’s unique DSP/BIOS Bridge, which allows the developer to optimally partition tasks between
the RISC and the DSP to maximize performance without sacrificing battery power.
OMAP PROCESSORS
The OMAP family consists of three product groups classified by performance and intended
applications:
1. High performance application processors
2. Basic multimedia application processors
3. Integrated modem and applications processors
High performance application processors: These are used in smartphones and are powerful
enough to run significant operating systems such as Linux, Android, and Symbian, support
connectivity to personal computers, and support various audio and video applications. They
include several OMAP series (OMAP1, OMAP2, OMAP3, OMAP4, OMAP5).
OMAP1:
OMAP171x 220 MHz, ARM926EJ-S + C55x DSP, low-voltage 90 nm technology
OMAP161x 204 MHz, ARM926EJ-S + C55x DSP, 130 nm technology
OMAP2:
OMAP2431 330 MHz ARM1136 + 220 MHz C64x DSP
OMAP2420 330 MHz ARM1136 + 220 MHz C55x DSP + PowerVR MBX GPU, 90 nm technology
OMAP3:
OMAP model  Fabrication  CPU        Freq (MHz)
OMAP3430    65 nm        Cortex-A8  600
OMAP3440    65 nm        Cortex-A8  800
OMAP3530    65 nm        Cortex-A8  720
OMAP3640    45 nm        Cortex-A8  1200
OMAP4:
OMAP model  CPU        Cores  Freq (GHz)
OMAP4430    Cortex-A9  2      1.0-1.2
OMAP4460    Cortex-A9  2      1.2-1.5
OMAP4470    Cortex-A9  2      1.3-1.5
OMAP5:
OMAP5432 Cortex-A15 and Cortex-M4, 1.5, 1.7 GHz, PowerVR SGX544MP2 + dedicated TI 2D graph
Basic multimedia application processors: These are marketed only to handset manufacturers
and are intended to be highly integrated, low-cost chips for consumer products. The OMAP-DM
series are intended to be used as digital media coprocessors for mobile devices with
high-megapixel digital still and video cameras. The image signal processor is used to accelerate
processing of camera images. The specifications of various basic multimedia application OMAP
processors are given below:
OMAP-DM270 ARM7+C54x DSP
OMAP-DM299 ARM7+Image signal processor + stacked mDDR SDRAM
OMAP-DM510 ARM926+ISP+128MB stacked mDDR SDRAM
OMAP-DM515 ARM926+ISP+256MB stacked mDDR SDRAM
Integrated modem and applications processors: These are marketed only to handset
manufacturers. Many of the newer versions are highly integrated for use in very low-cost cell
phones. The specifications of various integrated modem and applications OMAP processors are
given below:
OMAPV1030 EDGE digital baseband
OMAP850 200 MHz ARM926EJ-S + GSM/GPRS digital baseband + stacked EDGE co-processor
OMAP730 200 MHz ARM926EJ-S + GSM/GPRS digital baseband + SDRAM memory support
The DSP/BIOS Bridge is the key to OMAP architecture functionality and ease of use. It provides
the application software developer a seamless, easy-to-use interface to the DSP. It allows the
developer on the RISC to access and control the DSP runtime environment using a standardized
application programming interface (API).
In the OMAP platform, the RISC OS kernel serves the same function as it does in a system using
RISC alone, but the DSP/BIOS Bridge allows software developers to reroute the processing-
intensive functions to the DSP, where they run asynchronously with no load on the RISC kernel
scheduling. In some cases, the RISC processor exercises only a limited number of command and
control functions, while the DSP provides the processing muscle needed for the application. A
media file will run asynchronously on the DSP without the interrupts and latencies inherent when
a RISC processor performs signal processing, resulting in a more robust, user-acceptable
implementation.
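The division of labour described above, where the RISC side issues commands and the DSP does the heavy processing asynchronously, can be sketched with a worker thread standing in for the DSP. This is a conceptual analogy only and does not use or imitate the DSP/BIOS Bridge API; every name in it (`decode_block`, `run_with_offload`, the control steps) is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def decode_block(samples: List[int]) -> int:
    """Stand-in for a processing-intensive signal-processing task."""
    return sum(s * s for s in samples)


def run_with_offload(samples: List[int]) -> Tuple[int, List[str]]:
    """Submit the heavy task to a 'DSP' worker while control work continues."""
    control_log = []
    with ThreadPoolExecutor(max_workers=1) as dsp:
        pending = dsp.submit(decode_block, samples)  # runs asynchronously
        # Meanwhile the 'RISC' side only performs command and control work.
        control_log.append("handle user input")
        control_log.append("update display")
        result = pending.result()                    # collect the DSP result
    return result, control_log
```

The caller blocks only when it finally needs the result, which mirrors the way control code and signal processing proceed independently on the two processors.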
In addition, the OMAP DSP/BIOS Bridge supports the OSE operating system for 3G mobile
phones.
TI also aggressively supports Java™ in the OMAP architecture and will make the DSP/BIOS
Bridge API accessible to developers of Java media players and the applications that make use of
these players. The OMAP architecture substantially overcomes the performance degradation that
is inherent in Java by accelerating the processing of Java byte code.
OMAP- ARCHITECTURE