ARM: a mandatory primer
Advanced RISC Machines is a name so commonplace
that almost everyone in the embedded systems space
has come across it at some point in time
by ASHOK RAO / SENIOR SUPPORT ENGINEER, FARNELL ELEMENT14
As the saying goes, if you stand at Times Square
in NYC, within 15 minutes
somebody you know will cross your
path. The case with ARM is
similar: every person in today's embedded industry will come across ARM at
some point in their career.
In this initial article, we discuss a few
(of the many) of ARM's IPs (Intellectual
Properties), the underlying technologies, their use in layman's terms and
some of the foreseeable future of
these technologies.
26 ELEMENT14 SUMMER 2014
ARM's IP is divided into categories such as processor cores, graphics engines, system IP (for on-chip communication and data capture), memory
IP, interface IP and POP (Processor
Optimization Pack) IP.
The processor IP is subdivided into
the Cortex, ARM7 and ARM9 series, among others.
The graphics engines are subdivided
into the Mali mid-range and high-end
graphics, Mali video and Mali display series. The system IP consists
of CoreLink and AMBA (Advanced
Microcontroller Bus Architecture),
debug and trace IP, system and
memory controllers, and CoreSight. In
this article we take a detailed look
at the processors and GPUs; the other
IPs will be covered in the follow-up
article in the next edition of element14
Tech Journal.
www.arm.com/products/processors
ARM IP and technologies
Cortex-M0
Best suited for 8 or 16-bit type applications, low cost
and simple to program, 3-stage pipeline and 30%
smaller code size than 8-bit devices.
Cortex-M0+
2-stage pipeline, most energy efficient processor
due to various sleep and deep sleep mode options.
Cortex-M1
Currently available only for FPGA platforms.
Cortex-M3
The most popular of all. Suitable for 32-bit
applications, 3-stage pipeline, optional ETM,
ITM and data trace.
Cortex-M4
Specifically designed for DSP and DSC (digital signal control), 8/16-bit SIMD
arithmetic, hardware divide, single precision FPU.
Cortex-R4
8-stage pipeline, branch prediction, ARMv7-R
architecture, tightly coupled memory interfaces (used
for highly deterministic and low latency applications
where caching may not be particularly useful), optional
FPU and MPU, typically operates at up to 600 MHz.
Cortex-R5
Increased efficiency and reliability, enhanced error
management in dependable real-time systems.
Operating frequency usually above 600 MHz.
Cortex-R7
11-stage pipeline, ARMv7-R architecture with
DSP components, optional tightly coupled memory
interface, FPU and error correction.
Cortex-A5
Minimum operating frequency of 530 MHz, and can
go upwards of 1 GHz. Lowest power consumption
among the A-class processors. Application areas
include low cost smartphones, tablets and digital
TV. Provides a gateway to the ARMv7-A architecture
based processors and an easy migration path. TrustZone
technology for reliable implementation of security
critical applications.
Cortex-A7
More useful as a multicore platform. Utilizes the big.LITTLE
configuration to provide optimum energy efficiency
along with high performance. The Global Task
Scheduling software, which is used for switching
control and instruction execution between the big
and the little processors on the chip, is available
as open source at the following repository:
git.linaro.org/gitweb?p=arm/big.LITTLE/mp.git.
Cortex-A8
Available only as a single core platform. NEON,
TrustZone and Jazelle technology for Java
acceleration, 2-level cache memory.
Cortex-A9
1-4 cores, Thumb / Thumb2 instruction set,
Jazelle, DSP extension, optional SIMD and FPU,
ARMv7-A architecture.
Cortex-A12
All other features of previous A-series processors, plus
a 1 TB addressable memory space (future proof) and 1-4
cores within a single processor cluster. Multiple such
clusters are possible.
Cortex-A15
Max operating frequency up to 2.5 GHz. Applications:
smartphones, digital home entertainment, wireless
infrastructure, home and web servers.
Cortex-A17
Targeted at the mobile and consumer market as an
improvement over the Cortex-A9, with a 60% performance
increase over the Cortex-A9 and all the usual goodies of
the Cortex-A class.
Cortex-A50 series
(Cortex-A53 and Cortex-A57)
64-bit ARMv8 architecture. This is targeted at
applications such as augmented reality, gesture
control, mobile gaming and Web 2.0.
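The Cortex-M4 entry above mentions 8/16-bit SIMD arithmetic. As a rough illustration of the idea, the Python sketch below emulates a packed add of two 16-bit lanes held in one 32-bit word: each half is added on its own, and a carry out of the low lane never spills into the high lane. The names `add16x2` and `pack` and the sample values are hypothetical and chosen only for this illustration. On real hardware the operation is a single instruction rather than a software routine.

```python
MASK16 = 0xFFFF  # selects one 16-bit lane


def pack(hi: int, lo: int) -> int:
    """Pack two 16-bit values into one 32-bit word (hi lane, lo lane)."""
    return ((hi & MASK16) << 16) | (lo & MASK16)


def add16x2(a: int, b: int) -> int:
    """Emulate a packed add of two independent 16-bit lanes.

    Each lane wraps around modulo 2**16; no carry crosses between lanes.
    """
    lo = ((a & MASK16) + (b & MASK16)) & MASK16
    hi = (((a >> 16) & MASK16) + ((b >> 16) & MASK16)) & MASK16
    return (hi << 16) | lo


x = pack(1000, 0xFFFF)  # low lane is at its maximum value
y = pack(2000, 1)       # adding 1 makes the low lane wrap to 0

simd_sum = add16x2(x, y)           # lanes stay independent
plain_sum = (x + y) & 0xFFFFFFFF   # ordinary 32-bit add lets the carry leak

print(f"packed add: high lane = {simd_sum >> 16}, low lane = {simd_sum & MASK16}")
print(f"plain add:  high lane = {plain_sum >> 16}, low lane = {plain_sum & MASK16}")
```

With these sample values the packed add leaves the high lane at 3000, while the ordinary 32-bit add lets the carry from the low lane turn it into 3001. Keeping the lanes separate is what allows one instruction to process two audio samples or four 8-bit pixels at once.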
Processor IP
The ARM name also refers to the
series of processor cores themselves.
The Cortex series of processors is
divided into the Cortex-A, Cortex-R and
Cortex-M categories. Each of these has
its own advantages, target applications
and complexities. The Cortex-A series is further subdivided into 32-bit cores (Cortex-A5,
A7, A8, A9, A12, A15 and A17) and 64-bit cores
(A53 and A57). The Cortex-R series is
subdivided into Cortex-R4, R5 and R7.
Last but definitely not least, the
Cortex-M series is subdivided into Cortex-M0,
M0+, M1 (only for FPGAs), M3 and M4.
Some of the advantages common to all
the above processors are listed below:
• All ARM processor cores are designed for low power consumption and high efficiency.
• All ARM processor cores (from the same series) are upward compatible, meaning migration from one family to another is as easy as slicing through warm butter.
• Better code reuse, increasing connectivity, smaller code size, ease of use and high performance.
Some of the target applications for the
Cortex-M series of processors include
smart metering, human interface devices, automotive and industrial control systems, white goods, consumer products
and medical instrumentation. The Cortex-R
series is mainly targeted at automotive,
networking, HDDs and other such areas.
The Cortex-A series is mainly targeted at
smartphones, tablet PCs, set-top boxes
and medical systems.
Let us now very briefly explore the
various advantages of each of these
classes of processors.
[Figure: ARM milestones timeline, 1985 to 2001. Labels shown: first RISC processor, Acorn ARM processor, first ARM RISC core, ARM6, ARM7, Thumb, ARM810, JavaOS support, ARM7TDMI, SecurCore, ARMv6.]
Mali-300
Graphics acceleration platform based on the Khronos
OpenGL ES 2.0, 8KB L2 cache. It also offers support
for OpenVG 1.1 and OpenGL ES 1.1. Very low power
consumption, which greatly improves power
efficiency.
Mali-400
OpenGL ES 2.0 compliant multicore GPU providing
2D through OpenVG 1.1 and 3D through OpenGL
ES 1.1 and 2.0. It also has a scalable architecture
from 1-4 cores. A single driver stack for multicore
configurations greatly simplifies application porting,
system integration and maintenance.
Mali-450 MP
Similar to the Mali-400 but up to 8 cores possible.
Performance and other architectural optimisations
also made.
Mali-T720
Cost effective version of the old T600 family of
GPUs. It is designed to provide complex graphics
processing for low to mid-range Android smartphones
and tablets. Over 150% increase in efficiency and
over 50% increase in performance over previous
Mali range of GPUs. Average throughput is close to
5.6Gpixels/sec.
Mali-T604
Common APIs supported are OpenGL ES 1.1, OpenGL
ES 2.0, OpenGL ES 3.0, DirectX 11 and OpenCL 1.1.
A single driver stack for multicore combinations
provides a deadly combination of performance and
low power consumption.
Mali-T622
Similar to the T604 but can be scaled from 1-2
cores only. It provides IEEE double precision floating
point arithmetic in hardware enabling full profile and
embedded OpenCL support.
Mali-T624
Similar in all respects to the T622 and T604 but can
be scaled from 1-4 cores. An advancement is that
the multicore scheduling and performance scaling
is fully handled in the GPU. This greatly reduces the
application development activity eliminating any
considerations for these.
Mali-T628
Similar to the Mali-T624 but can be scaled from 1-8
cores, providing almost double the performance of the
T624. It also provides on-board hardware support
for 64-bit scalar and vector, integer and floating point
data types, which are elementary for complex graphics
acceleration and algorithms.
Mali-T678
The highest performing T600 series processor to
date. The number of GPU cores and the number of
arithmetic pipelines is doubled thereby increasing
the performance drastically. It is mainly targeted at
applications such as 3D graphics, visual computing,
augmented reality and voice recognition.
Mali-T760
The highest offering in the Mali high-end GPUs. It is
almost 400% more energy efficient than the T604. It can
be scaled up to 16 cores. The throughput is close
to 11.2 Gpixels/sec, targeting applications such as
computational photography, gesture recognition and
image stabilization.
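To put the pixel throughput figures quoted above into perspective, the short Python calculation below converts them into the number of times a GPU could fill a whole screen within one frame. The 1920 x 1080 display and the 60 frames-per-second target are assumptions chosen for illustration, and the helper name `fills_per_frame` is hypothetical. Quoted throughput is an upper bound, so real workloads will achieve less.

```python
# Throughput figures as quoted in the table above (pixels per second).
fill_rates = {
    "Mali-T720": 5.6e9,
    "Mali-T760": 11.2e9,
}

# Assumed display and frame rate, for illustration only.
WIDTH, HEIGHT = 1920, 1080
TARGET_FPS = 60


def fills_per_frame(pixels_per_second: float,
                    width: int = WIDTH,
                    height: int = HEIGHT,
                    fps: int = TARGET_FPS) -> float:
    """Return how many full-screen fills fit into one frame."""
    pixels_per_frame_budget = pixels_per_second / fps
    return pixels_per_frame_budget / (width * height)


for gpu, rate in fill_rates.items():
    print(f"{gpu}: roughly {fills_per_frame(rate):.0f} full-screen fills per frame "
          f"at {WIDTH}x{HEIGHT}, {TARGET_FPS} fps")
```

Under these assumptions the quoted figures work out to roughly 45 and 90 full-screen fills per frame. That headroom is what gets spent on overdraw, multiple render passes and post-processing effects.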
www.arm.com/products/multimedia
Graphics Processing Engines (GPU) IP
Mali mid-range graphics
This series of GPUs is subdivided
into the Mali-300, Mali-400, Mali-450 and
Mali-T720 processors.
Mali high-end graphics
This series of GPUs is subdivided
into the Mali-T604, Mali-T622, Mali-T624,
Mali-T628, Mali-T678 and Mali-T760
processors. These are designed to meet
the needs of general purpose computing
on a GPU. Such computations usually
involve the CPU, which introduces
the overhead of switching between the
GPU and the CPU. The T600 series of
GPUs avoids this: an on-board job manager offloads task management
from the CPU to the GPU and enables
better load balancing, thereby increasing performance.
[Figure: ARM milestones timeline, 2002 to 2014. Labels shown: ARM11, TrustZone, ARMv7, NEON, Cortex-M1, Cortex-M3, Cortex-R4/5/7, Cortex-A5, A7, A8, A9, A12, A15 and A17, Cortex-A53, Cortex-A57, Cortex-M0, Cortex-M4, Cortex-M0+, + Keil, + Mali, big.LITTLE, ARMv8, SBSA. Total chips shipped: 1 billion, 10 billion, 50 billion.]
What does the future look like?
What does the future look like with open source
trying to make its presence felt in embedded systems?
With all these classes of processors, the future only looks brighter
With an ever-expanding
portfolio of complex MCU and
GPU cores, applications such as
augmented reality, gesture recognition, seamless internet connectivity and ultra high speed communications are all possible. With
64-bit architectures slowly taking
shape in the embedded world, we
can be sure that many more complex devices and applications can
be easily realized. With software
development tools such as the
ever advancing DS-5, MDK and
our very own CooCox IDE optimised for Cortex-M class processors, software and application
development is also becoming
a better, faster and more
productive activity.
Silicon manufacturers such
as TI, Freescale and NXP, with
their open source offerings
such as BeagleBone (TI Sitara),
mbed (NXP LPC series based on
Cortex-M3) and Sabre/RIoT platforms (Freescale i.MX series), are
beginning to realise the potential
of open source in the embedded
space. Application libraries and
source code provided in repos
or the cloud for use by other
developers are certainly the way
forward. Connected communities such as the element14 community, the ARM connected community, the mbed forums and the
Freescale discussion forums are
all places where developers usually turn first to
explore similar issues and potential solutions. Open source is certainly the way to go, with more
developers switching to these
for quick and effective solutions
and discussions.
CONCLUSION
In this initial article we have only
touched upon the various processor
and GPU IPs from ARM. There is a
lot more to discuss and know. More
detailed information on all of these can
be found on the ARM.com website.
With advanced technologies such as
instruction pipelining, DSP blocks, TCMI
(Tightly Coupled Memory Interface),
big.LITTLE processing, TrustZone, multicore
architectures, the job manager and support
for various graphics APIs, the
ideas of today can soon be turned into
the technologies of tomorrow.
We shall discuss the other parts
of the system IP such as CoreSight,
Debug and Trace, AMBA, etc. in the next
part of this article. Watch for the next
edition of element14 Tech Journal.