Embedded Computing Platform Design

This document discusses embedded computing platforms and memory devices. It covers CPU buses and how they connect the CPU to memory and I/O devices using protocols like handshaking. It describes different types of memory like DRAM and SDRAM. It also discusses topics like DMA, multiple bus systems, AMBA architecture, and memory organization.

Uploaded by

Satish Kumar

Unit – II – Embedded Computing Platform Design

Syllabus:

The CPU bus – Memory devices and systems – Designing with computing platforms – Consumer electronics architecture – Platform-level performance analysis – Components for embedded programs – Models of programs – Assembly, linking and loading – Compilation techniques – Program-level performance analysis – Software performance optimization – Program-level energy and power analysis and optimization – Analysis and optimization of program size – Program validation and testing.

Introduction:

 In this chapter, we concentrate on bus-based computer systems created using microprocessors, I/O devices, and memory components.

 The microprocessor is an important element of the embedded computing system, but it cannot do useful work without memory and I/O devices.

 Hardware platforms for embedded systems are therefore built around a microprocessor together with memory and I/O devices.

CPU BUS:

 The bus is the mechanism by which the CPU communicates with memory and
devices.

 A bus is, at a minimum, a collection of wires, but the bus also defines a
protocol by which the CPU, memory, and devices communicate.

 One of the major roles of the bus is to provide an interface to memory and I/O
devices.

Types of Buses:

1. Data Bus

2. Address Bus

3. Control Bus

4. System Bus

Bus Protocols:

 A protocol is the set of rules that governs data communication on the bus.

 The basic building block of most bus protocols is the four-cycle handshake.

 The handshake ensures that when two devices want to communicate, one is ready to transmit and the other is ready to receive.

EC6703 – ERTS Class Notes – Prepared by R.SARAVANAN – AP / ECE - PSNACET Page 1

 The handshake uses a pair of wires dedicated to the handshake:

Enq (meaning enquiry)

Ack (meaning acknowledge).

 Extra wires are used for the data transmitted during the handshake

Four Cycles of Handshake:

 Device 1 raises its output to signal an enquiry, which tells device 2 that it
should get ready to listen for data

 When device 2 is ready to receive, it raises its output to signal an

acknowledgment. At this point, devices 1 and 2 can transmit or receive.

 Once the data transfer is complete, device 2 lowers its output, signalling that it
has received the data.

 After seeing that ack has been released, device 1 lowers its output
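The four cycles above can be sketched as a small C model. This is only an illustration: the `HandshakeBus` structure and the `handshake_transfer` function are hypothetical names introduced here, and both devices are simulated inside one function rather than running as separate hardware.

```c
#include <stdbool.h>

/* Hypothetical model of the two handshake wires plus the data wires. */
typedef struct {
    bool enq;       /* enquiry, driven by device 1 */
    bool ack;       /* acknowledge, driven by device 2 */
    unsigned data;  /* extra wires that carry the data */
} HandshakeBus;

/* Runs one four-cycle handshake that moves `value` from device 1 to
   device 2 and returns the value that device 2 received. */
unsigned handshake_transfer(HandshakeBus *bus, unsigned value)
{
    unsigned received;

    /* Cycle 1: device 1 raises enq to tell device 2 to get ready. */
    bus->enq = true;

    /* Cycle 2: device 2 is ready, so it raises ack; data can now move. */
    bus->ack = true;
    bus->data = value;     /* device 1 drives the data wires */
    received = bus->data;  /* device 2 latches the data */

    /* Cycle 3: device 2 lowers ack to signal that it has the data. */
    bus->ack = false;

    /* Cycle 4: device 1 sees ack released and lowers enq. */
    bus->enq = false;

    return received;
}
```

After the call, both wires are low again, so the bus is ready for the next handshake.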

Timing Diagram:

Microprocessor Buses:

 Microprocessor buses build on the handshake for communication between

the CPU and other system components.

 The term bus is used in two ways.

 The most basic use is as a set of related wires,

 It also means a protocol for communicating between components.

 The fundamental bus operations are reading and writing.

Major Components:

 Clock provides synchronization to the bus components,

 R/W is true when the bus is reading and false when the bus is writing,

 Address is an a-bit bundle of signals that transmits the address for an access,

 Data is an n-bit bundle of signals that can carry data to or from the CPU, and

 Data ready signals when the values on the data bundle are valid.

Timing Diagram:

 The behavior of a bus is most often specified as a timing diagram. A timing

diagram shows how the signals on a bus vary over time.

 A’s value is known at all times, so it is shown as a standard waveform that

changes between zero and one.

 B and C alternate between changing and stable states.

 A stable signal has a stable value that could be measured by an oscilloscope.

 The exact value of a stable signal does not matter for the timing diagram: an address or data bus is shown as stable whenever a valid value is present, whatever that value is.

State Diagram:

A state diagram for the bus transaction is a helpful complement to the timing diagram.

DMA (Direct Memory Access):

 Direct memory access (DMA) is a bus operation that allows reads and writes
not controlled by the CPU.

 A DMA transfer is controlled by a DMA controller, which requests control of

the bus from the CPU.

 After gaining control, the DMA controller performs read and write operations
directly between devices and memory.

 The DMA requires the CPU to provide two additional bus signals:

 The bus request is an input to the CPU through which DMA controllers ask for
ownership of the bus.

 The bus grant signals that the bus has been granted to the DMA controller.

 The DMA controller uses these two signals to gain control of the bus using a
classic four-cycle handshake.

 The bus request is asserted by the DMA controller when it wants to control the
bus, and the bus grant is asserted by the CPU when the bus is ready.

 The CPU will finish all pending bus transactions before granting control of the
bus to the DMA controller. When it does grant control, it stops driving the
other bus signals: R/W, addresses, and so on.

 Once the DMA controller is bus master, it can perform reads and writes using
the same bus protocol as with any CPU-driven bus transaction

 After the transaction is finished, the DMA controller returns the bus to the CPU
by deasserting the bus request
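The request/grant sequence described above can be sketched in C as follows. This is a simplified software model under stated assumptions, not a description of any particular DMA controller: the `Bus` structure, `cpu_step`, and `dma_copy` are hypothetical names, and `memcpy` stands in for the reads and writes the DMA controller would perform as bus master.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { MASTER_CPU, MASTER_DMA } BusMaster;

typedef struct {
    bool bus_request;  /* asserted by the DMA controller */
    bool bus_grant;    /* asserted by the CPU */
    int cpu_pending;   /* CPU bus transactions still to finish */
    BusMaster master;  /* who currently drives the bus */
} Bus;

/* One CPU step: finish pending transactions first, then react to the
   bus request line. */
void cpu_step(Bus *bus)
{
    if (bus->cpu_pending > 0) {
        bus->cpu_pending--;          /* complete a pending transaction */
        return;
    }
    if (bus->bus_request && !bus->bus_grant) {
        bus->bus_grant = true;       /* CPU stops driving R/W, address, ... */
        bus->master = MASTER_DMA;
    } else if (!bus->bus_request && bus->bus_grant) {
        bus->bus_grant = false;      /* request released: CPU takes over */
        bus->master = MASTER_CPU;
    }
}

/* Copies n bytes as bus master. Returns how many CPU steps the DMA
   controller waited before the bus was granted. */
int dma_copy(Bus *bus, unsigned char *dst, const unsigned char *src, size_t n)
{
    int waited = 0;

    bus->bus_request = true;         /* ask for ownership of the bus */
    while (!bus->bus_grant) {
        cpu_step(bus);
        waited++;
    }
    memcpy(dst, src, n);             /* DMA-driven reads and writes */
    bus->bus_request = false;        /* return the bus to the CPU */
    cpu_step(bus);
    return waited;
}
```

With two CPU transactions pending, the grant arrives on the third CPU step, which mirrors the rule that the CPU finishes pending bus transactions before giving up the bus.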

System Bus Configuration:

A microprocessor system often has more than one bus. High-speed devices
may be connected to a high-performance bus, while lower-speed devices are
connected to a different bus. A small block of logic known as a bridge allows the
buses to connect to each other.

There are several good reasons to use multiple buses and bridges.

 Higher-speed buses may provide wider data connections.

 A high-speed bus usually requires more expensive circuits and connectors.

 The cost of low-speed devices can be held down by using a lower-speed,

lower-cost bus.

 The bridge may allow the buses to operate independently, thereby providing
some parallelism in I/O operations

AMBA Bus (Advanced Microcontroller Bus Architecture):

Since the ARM CPU is manufactured by many different vendors, the bus
provided off-chip can vary from chip to chip. ARM has created a separate bus
specification for single-chip systems. The AMBA bus [ARM99A] supports CPUs,
memories, and peripherals integrated in a system-on-silicon.

 The AMBA high-performance bus (AHB) is optimized for high-speed

transfers and is directly connected to the CPU. It supports several high-
performance features: pipelining, burst transfers, split transactions and
multiple bus masters.

 A bridge can be used to connect the AHB to an AMBA peripherals bus (APB). This bus is designed to be simple and easy to implement; it also consumes relatively little power.

 The APB assumes that all peripherals act as slaves, simplifying the logic required in both the peripherals and the bus controller. It also does not perform pipelined operations, which simplifies the bus logic.

Memory Device Organization:

The most basic way to characterize a memory is by its capacity, such as 256 Mb (megabits). However, manufacturers usually make several versions of a memory of a given size, each with a different data width.

For example, a 256-Mb memory may be available in two versions:

 As a 64 M × 4-bit array, a single memory access obtains a 4-bit data item.

 As a 32 M × 8-bit array, a single memory access obtains an 8-bit data item.

The height/width ratio of a memory is known as its aspect ratio. The best
aspect ratio depends on the amount of memory required.

 Internally, the data are stored in a two-dimensional array of memory cells. The
n-bit address received by the chip is split into a row and a column address
(with n =r+ c). The row and column select a particular memory cell.
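The row/column split can be illustrated with a few lines of C. This is a minimal sketch that assumes the row occupies the upper r bits and the column the lower c bits of the address; real devices may order the fields differently. `CellAddress` and `split_address` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t row;  /* upper r bits of the address */
    uint32_t col;  /* lower c bits of the address */
} CellAddress;

/* Splits an n-bit address (n = r + c) into its row and column parts.
   col_bits is c; the remaining upper bits form the row. */
CellAddress split_address(uint32_t addr, unsigned col_bits)
{
    CellAddress cell;

    assert(col_bits < 32u);  /* keep the shifts well defined */
    cell.row = addr >> col_bits;
    cell.col = addr & ((UINT32_C(1) << col_bits) - 1u);
    return cell;
}
```

For example, with c = 4 the address 0x2A5 selects row 0x2A and column 0x5.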

Random-Access Memories:

 Random-access memories can be both read and written. They are called
random access because, unlike magnetic disks, addresses can be read in any
order

 Most bulk memory in modern systems is dynamic RAM (DRAM).

 DRAM is very dense; it does, however, require that its values be refreshed
periodically since the values inside the memory cells decay over time

SDRAM Operation

 The dominant form of dynamic RAM today is synchronous DRAM (SDRAM), which uses a clock to improve DRAM performance.

 SDRAMs use Row Address Select (RAS) and Column Address Select (CAS)
signals to break the address into two parts, which select the proper row and
column in the RAM array.

 SDRAMs use a separate refresh signal to control refreshing

 SDRAMs include registers that control the mode in which the SDRAM
operates.

 SDRAMs support burst modes that allow several sequential addresses to be

accessed by sending only one address

SIMMs and DIMMs

 Memory for PCs is generally purchased as single in-line memory modules (SIMMs) or dual in-line memory modules (DIMMs).

 A SIMM or DIMM is a small circuit board that fits into a standard memory
socket.

Read Only Memory:

Read-only memories (ROMs) are preprogrammed with fixed data. They are also less sensitive to radiation-induced errors.

Types of ROM:

Flash is the dominant form of field-programmable ROM.

 Electrically erasable, but must be erased in blocks.

 Random access, but write/erase is much slower than read.

 NOR flash is more flexible.

 NAND flash is more dense.

 Flash memory uses standard system voltages for erasing and programming, which allows it to be reprogrammed inside a typical system.

 Most flash memories today allow certain blocks to be protected.

 A common application is to keep the boot-up code in a protected block but allow updates to other memory blocks on the device. Such a device is called boot-block flash.

Designing With Computing Platforms:

The computing platform of the embedded system application is mainly designed with

 System Architecture

 Hardware Design

 PC as a Platform

 Development Environment

 Debugging

System Architecture:

 Architecture is a set of elements and the relationships between them that

together form a single unit. The architecture of an embedded computing system
is the blueprint for implementing that system.

 The architecture of an embedded computing system includes both hardware

and software elements. Some software is very hardware-dependent.

Hardware platform architecture

It contains several elements:

 CPU: An embedded computing system clearly contains a microprocessor.

 Bus: It is an integral part of the microprocessor.

 Memory: RAM and ROM used in the hardware.

 I/O devices: Timers, counters, ADC, DAC, RTC, networking, sensors, actuators, etc.

Evaluation boards:

 Designed by CPU manufacturer or others.

 Includes CPU, memory, some I/O devices.

 May include prototyping section.

 CPU manufacturer often gives out evaluation board net list---can be used as
starting point for your custom board design.

Hardware and software architectures

Hardware and software are intimately related:

 Software doesn’t run without hardware;

 How much hardware you need is determined by the software requirements:

 Speed;

 Memory.

Adding logic to a board:

 Programmable logic devices (PLDs) provide low/medium density logic.

 Field-programmable gate arrays (FPGAs) provide more logic and multi-level

logic.

 Application-specific integrated circuits (ASICs) are manufactured for a single

purpose.

The PC as a platform:

Advantages:

 Cheap and easy to get;

 Rich and familiar software environment.

Disadvantages:

 Requires a lot of hardware resources;

 Not well-adapted to real-time.

Typical PC hardware platform

Typical busses:

• PCI (Peripheral Component Interconnect): standard for high-speed interfacing

 33 or 66 MHz.

 PCI Express.

• USB (Universal Serial Bus) : relatively low-cost serial interface with high
speed.

Software elements

• The IBM PC uses a BIOS (Basic Input/Output System) to implement low-level functions:

 Boot-up;

 Minimal device drivers.

• BIOS has become a generic term for the lowest-level system software.

Development Environment

 Software development is typically done on a PC or workstation known as the host.

 The hardware on which the code will finally run is known as the target.

 The host and target are frequently connected by a USB link, but a higher-speed link such as Ethernet can also be used.

• The host should be able to do the following:

 load programs into the target,

 start and stop program execution on the target, and

 examine memory and CPU registers

Host-based tools:

1. Cross compiler:

 Compiles code on the host for the target system.

 It runs on one type of machine but generates code for another.

 After compilation, the code is downloaded to the target system over a serial link.

2. Cross debugger:

 Displays target state, allows target system to be controlled.

Debugging:

 Debugging is the process of finding and correcting errors in the embedded code, which is developed on the host system and runs on the target device.

Debugging Techniques:

 Debugging means checking for errors and correcting them.

 It can be performed on two sides: the software side and the hardware side.

 Many debugging tools are available for both sides.

Types of Software Debugging Tools

Two types of software debugging tools are available.

 Serial port tool

 Break Point tool

Serial Port Tool:

 The serial port is one of the most important debugging tools.

 It can be used for debugging from the earliest stages of the embedded system design.

 The port can be used not only for development debugging but also for diagnosing problems in the field.

Break point Tool:

 Another important debugging tool is the breakpoint.

 The simplest form of a breakpoint is for the user to specify an address at which
the program’s execution is to break.

 Once the PC reaches that address, control is returned to the monitor program.

 From the monitor program, the user can examine and/or modify CPU registers,
after which execution can be continued.

Advantage:

 Implementing breakpoints does not require using exceptions or external devices.

Types of Hardware Debugging Tools:

When the software tools are insufficient to debug the system, hardware tools are used.

 Microprocessor In circuit Emulators

 Logic Analyzer

Microprocessor In-circuit emulators

 A microprocessor in-circuit emulator is a specialized hardware tool that helps debug software in a working embedded system.

 It allows you to stop execution, examine the CPU state, and modify registers.

 It provides as much debugging functionality as a debugger within a monitor program, but does not take up any memory.

Drawbacks:

 Specific to a particular microprocessor or microcontroller

 Very expensive

Logic analyzer architecture:

• It can sample many different signals simultaneously but can display only 0, 1, or changing values for each.

• It records the values of the signals into an internal memory and then shows the results on its display once the memory is full or the run is aborted.

Modes of Logic Analyzer:

1. State Mode:

 State mode and timing mode are two different ways of sampling the values.

 State mode uses the system's own clock to control the sampling, so it samples each signal once per clock cycle.

2. Timing Mode:

 Timing mode uses an internal clock that is fast enough to take several samples per clock period in a typical system.

Consumer Electronics Architecture

 It is an example for complex embedded systems and the platform that supports
them.

 Not all the devices have all features, depending upon the way the device is to
be used, but most devices select features from common menus.

 Similarly, there is no single platform for consumer electronic devices, but

architecture in use is organized around some common themes.

Consumer Use cases:

1. Multimedia:

 The media may be audio, still images, or video.

 They are stored in compressed form and decompressed for playback or viewing.

 A large and growing number of standards have been developed for multimedia compression.

 E.g., MP3 and Dolby Digital for audio; JPEG for images; MPEG-2, MPEG-4, and H.264 for video.

2. Data storage and management

 The device stores multimedia objects and keeps track of them so that the user can select what to save or play.

3. Communication:

 Communication may be relatively simple, such as a USB interface to a host computer, or more sophisticated, such as an Ethernet port or a cellular telephone link.

Use case for Playing Multimedia

Non-functional requirements for CE

 Often battery-operated, with a strict power budget.

 E.g., a typical battery for a portable device provides only about 75 mW, which must support the processors, the display, and the radio.

 Must be very inexpensive yet provide very high performance.

 The user interface must be capable but inexpensive.

CE devices and hosts

 A typical use case is connecting the device to a host. The connection may be over USB or over the Internet.

 Many devices talk to a host system.

 The PC host does things that are hard to do on the device.

Platforms and operating systems:

 Many CE devices use a DSP for signal processing and a RISC CPU for other
tasks.

 I/O devices include buttons, screen, USB.

Platform-Level Performance Analysis

 Bus-based systems add another layer of complication to performance analysis.

 Platform-level performance involves much more than the CPU.

 The CPU, the bus, and the memory or I/O devices all act as independent elements that operate in parallel.

 We often focus on the CPU because it processes instructions, but any part of
the system can affect total system performance.

 More precisely, the CPU provides an upper bound on performance, but any
other part of the system can slow down the CPU.

 Performance depends on all the elements of the system:

 CPU.

 Cache.

 Bus.

 Main memory.

 I/O device.

Simple System

Consider the simple system as shown in Figure. We want to move data from
memory to the CPU to process it. To get the data from memory to the CPU we must:

 read from the memory;

 transfer over the bus to the cache; and

 transfer from the cache to the CPU

Bandwidth as performance

 Bandwidth applies to several components:

 Memory.

 Bus.

 CPU fetches.

 Different parts of the system run at different clock rates. Different components
may have different widths (bus, memory).

Let T: # bus cycles; P: time/bus cycle.

Total time for transfer: t = TP.

D: data payload length.

O1 + O2 = overhead O.

Bus burst transfer bandwidth

T: # bus cycles; P: time/bus cycle.

Total time for transfer: t = TP.

D: data payload length.

O1 + O2 = overhead O.
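The symbols above can be combined into transfer-time estimates. The notes give only the definitions, so the formulas in this sketch are an assumption taken from a common textbook model: a basic transfer of N data units over a bus W units wide takes (D + O)N/W cycles, and a burst of length B takes (BD + O)N/(BW) cycles. The function names, and the symbols N, W and B, are introduced here for illustration.

```c
/* Cycles for N data units using basic transfers:
   T_basic = (D + O) * N / W   (assumed model, see lead-in) */
double t_basic(double d, double o, double n, double w)
{
    return (d + o) * n / w;
}

/* Cycles for N data units using bursts of length B:
   T_burst = (B * D + O) * N / (B * W)   (assumed model, see lead-in) */
double t_burst(double b, double d, double o, double n, double w)
{
    return (b * d + o) * n / (b * w);
}

/* Total time t = T * P, where P is the time per bus cycle in seconds. */
double transfer_seconds(double cycles, double p)
{
    return cycles * p;
}
```

With D = 1, O = 3, W = 1 and N = 1024, the basic transfer needs 4096 cycles, while a burst of length 4 needs 1792 cycles, because the overhead is paid once per burst instead of once per data unit.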

Parallelism:

 Computer systems have multiple components.

 When the hardware and software are properly designed, those systems can
operate independently for at least part of the time.

 When different components of the system operate in parallel, we can get more
work done in a given amount of time.

DMA:

 Direct memory access is a prime example of parallelism.

 DMA was designed to off-load memory transfers from the CPU. The CPU can
do other useful work while the DMA transfer is running

 Speed things up by running several units at once.

 DMA provides parallelism if CPU doesn’t need the bus:

 DMA + bus.

 CPU.

Sequential and parallel schedules in a bus-based system

Components for Embedded Programs:

• In this section, we consider code for three structures or components that are
commonly used in embedded software:

 the state machine,

 the circular buffer, and

 the queue.

 State machines are well suited to reactive systems such as user interfaces;

 circular buffers and queues are useful in digital signal processing

State Machines:

 When inputs appear intermittently rather than as periodic samples, it is often

convenient to think of the system as reacting to those inputs.

 The reaction of most systems can be characterized in terms of the input

received and the current state of the system.

 This leads naturally to a finite-state machine style of describing the reactive

system’s behavior.

 The state machine style of programming is also an efficient implementation of

such computations.
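A state machine is usually coded as a function that maps the current state and the inputs to the next state. The sketch below uses a hypothetical seat-belt warning controller as the reactive system; the states, inputs, and transition rules are assumptions chosen for illustration.

```c
#include <stdbool.h>

typedef enum { IDLE, SEATED, BELTED, BUZZER } State;

typedef struct {
    bool seat;   /* someone is sitting in the seat */
    bool belt;   /* the belt is fastened */
    bool timer;  /* the grace period has expired */
} Inputs;

/* Returns the next state for the given current state and inputs. */
State next_state(State s, Inputs in)
{
    switch (s) {
    case IDLE:
        return in.seat ? SEATED : IDLE;
    case SEATED:
        if (!in.seat) return IDLE;
        if (in.belt)  return BELTED;
        if (in.timer) return BUZZER;   /* time is up, belt still open */
        return SEATED;
    case BELTED:
        if (!in.seat) return IDLE;
        if (!in.belt) return SEATED;
        return BELTED;
    case BUZZER:
        if (!in.seat) return IDLE;
        if (in.belt)  return BELTED;   /* fastening the belt stops the buzzer */
        return BUZZER;
    }
    return IDLE;  /* defensive default for an invalid state value */
}
```

The caller keeps the current state in a variable and calls `next_state` each time an input changes, which is why this style suits inputs that arrive intermittently.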

Circular Buffers:

 The data stream style makes sense for data that comes in regularly and must be processed on the fly.

 For example, for each sample, a digital filter must emit one output that depends on the values of the last n inputs.

 In a typical workstation application, we would process the samples over a

given interval by reading them all in from a file and then computing the results
all at once in a batch process

 The circular buffer is a data structure that lets us handle streaming data in
an efficient way.

 At each point in time, the algorithm needs a subset of the data stream that
forms a window into the stream

 The window slides with time as we throw out old values no longer needed and
add new values.

 Since the size of the window does not change, we can use a fixed-size buffer to
hold the current data
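The sliding window can be implemented with a fixed-size array and a moving index. The following is a minimal sketch, assuming a window of four integer samples and a simple filter that forms a weighted sum of the window; `CircBuf`, `circ_put`, `circ_get`, and `fir_step` are hypothetical names.

```c
#define WINDOW 4  /* assumed window size */

typedef struct {
    int samples[WINDOW];  /* the last WINDOW inputs */
    int head;             /* index of the most recent sample */
} CircBuf;

/* Clears the buffer so that missing history reads as zero. */
void circ_init(CircBuf *cb)
{
    for (int i = 0; i < WINDOW; i++)
        cb->samples[i] = 0;
    cb->head = 0;
}

/* Adds a new sample, overwriting the oldest one. */
void circ_put(CircBuf *cb, int x)
{
    cb->head = (cb->head + 1) % WINDOW;
    cb->samples[cb->head] = x;
}

/* Reads a sample by age: 0 is the newest, WINDOW - 1 the oldest. */
int circ_get(const CircBuf *cb, int age)
{
    return cb->samples[(cb->head - age + WINDOW) % WINDOW];
}

/* Accepts one input and emits one output that depends on the last
   WINDOW inputs: the sum of coeff[i] times the sample of age i. */
int fir_step(CircBuf *cb, const int coeff[WINDOW], int x)
{
    int sum = 0;

    circ_put(cb, x);
    for (int i = 0; i < WINDOW; i++)
        sum += coeff[i] * circ_get(cb, i);
    return sum;
}
```

No data is ever shifted inside the array; only the index moves, which is what makes the structure efficient for streaming data.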

Queues:

 Queues are also used in signal processing and event processing.

 Queues are used whenever data may arrive and depart at somewhat
unpredictable times or when variable amounts of data may arrive.

 A queue is often referred to as an elastic buffer.
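A queue can be built on a fixed array with separate head and tail indices. This is a minimal sketch of an array-based FIFO, assuming integer items and a capacity of eight; the names `Queue`, `enqueue`, and `dequeue` are chosen here for illustration.

```c
#include <stdbool.h>

#define QSIZE 8  /* assumed capacity */

typedef struct {
    int items[QSIZE];
    int head;   /* index of the next item to remove */
    int tail;   /* index of the next free slot */
    int count;  /* number of items currently stored */
} Queue;

void queue_init(Queue *q)
{
    q->head = 0;
    q->tail = 0;
    q->count = 0;
}

/* Adds an item at the tail. Returns false if the queue is full. */
bool enqueue(Queue *q, int value)
{
    if (q->count == QSIZE)
        return false;
    q->items[q->tail] = value;
    q->tail = (q->tail + 1) % QSIZE;
    q->count++;
    return true;
}

/* Removes the oldest item into *out. Returns false if the queue is empty. */
bool dequeue(Queue *q, int *out)
{
    if (q->count == 0)
        return false;
    *out = q->items[q->head];
    q->head = (q->head + 1) % QSIZE;
    q->count--;
    return true;
}
```

Unlike the circular buffer, the number of stored items varies over time, which is why the queue behaves as an elastic buffer between a producer and a consumer.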

Models of Programs:

 In this section, we develop models for programs that are more general than
source code.

 Once we have such a model, we can perform many useful analyses on the
model more easily than we could on the source code. It can be done by

 Data Flow Graph

 Control / Data Flow Graph

Data Flow Graph:

 A data flow graph is a model of a program with no conditionals.

 In a high-level programming language, a code segment with no conditionals (more precisely, with only one entry and exit point) is known as a basic block.

 Describes the minimal ordering requirements on operations

Single Assignment Form:

Original basic block in C:

w = a + b;
x = a - c;
y = x + d;
x = a + c;
z = y + e;

Single Assignment Form:

w = a + b;
x1 = a - c;
y = x1 + d;
x2 = a + c;
z = y + e;

Control-data flow graph:

• CDFG: represents control and data. Uses data flow graphs as components.

• Two types of nodes:

 Decision;

 Data flow.

Data flow node

Encapsulates a data flow graph:

Write operations in basic block form for simplicity.

Control Node:

CDFG Example:

if (cond1) bb1();
else bb2();
bb3();
switch (test1) {
case c1: bb4(); break;
case c2: bb5(); break;
case c3: bb6(); break;
}

Assembly and Linking:

 Assembly and linking are the last steps in the compilation process. They turn a
list of instructions into an image of the program’s bits in memory.

 Most compilers do not directly generate machine code, but instead create the instruction-level program in the form of human-readable assembly language.

 The assembler’s job is to translate symbolic assembly language statements into

bit-level representations of instructions known as object code

 The assembler takes care of instruction formats and does part of the job of
translating labels into addresses.

 The final steps in determining the addresses of instructions and data are
performed by the linker, which produces an executable binary file.

 That file may not necessarily be located in the CPU’s memory, however, unless
the linker happens to create the executable directly in RAM.

 The program that brings the program into memory for execution is called a
loader

 Programs may be composed from several files.

 Addresses become more specific during processing:

 Relative addresses are measured relative to the start of a module;

 Absolute addresses are measured relative to the start of the CPU address
space.

Assemblers:

 When translating assembly code into object code, the assembler must translate opcodes and format the bits in each instruction, and translate labels into addresses.

 Labels make the assembly process more complex, but they are the most
important abstraction provided by the assembler

Labels:

 Label processing requires making two passes through the assembly source code
as follows:

 The first pass scans the code to determine the address of each label.

 The second pass assembles the instructions using the label values computed in
the first pass
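The two passes can be sketched in C. This is a simplified illustration, not a real assembler: it assumes every instruction is 4 bytes long, it resolves label references without encoding any instruction bits, and `SourceLine`, `Symbol`, and `assemble_labels` are hypothetical names.

```c
#include <string.h>

#define MAX_LABELS 16
#define INSTR_SIZE 4u  /* assumed fixed instruction width in bytes */

typedef struct {
    const char *label;   /* label defined on this line, or NULL */
    const char *opcode;  /* mnemonic, kept only for readability */
    const char *target;  /* label referenced by this line, or NULL */
} SourceLine;

typedef struct {
    const char *name;
    unsigned address;
} Symbol;

/* Returns the address of a label, or -1 if it is not in the table. */
static int lookup(const Symbol *table, int n, const char *name)
{
    for (int i = 0; i < n; i++)
        if (strcmp(table[i].name, name) == 0)
            return (int)table[i].address;
    return -1;
}

/* Resolves label references for n source lines placed at `origin`.
   resolved[i] receives the target address of line i, or -1 if the line
   references no label. Returns 0 on success, -1 on an undefined label
   or a full symbol table. */
int assemble_labels(const SourceLine *src, int n, unsigned origin,
                    int *resolved)
{
    Symbol table[MAX_LABELS];
    int nsyms = 0;
    unsigned plc = origin;  /* program location counter */

    /* Pass 1: scan the code and record the address of each label. */
    for (int i = 0; i < n; i++) {
        if (src[i].label != NULL) {
            if (nsyms == MAX_LABELS)
                return -1;
            table[nsyms].name = src[i].label;
            table[nsyms].address = plc;
            nsyms++;
        }
        plc += INSTR_SIZE;
    }

    /* Pass 2: assemble, using the label values found in pass 1. */
    for (int i = 0; i < n; i++) {
        resolved[i] = -1;
        if (src[i].target != NULL) {
            resolved[i] = lookup(table, nsyms, src[i].target);
            if (resolved[i] < 0)
                return -1;  /* undefined label */
        }
    }
    return 0;
}
```

A branch to a label defined later in the file (a forward reference) is the reason a single pass is not enough: the address of that label is not yet known when the branch is first seen.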

Basic Compilation Techniques:

• It is useful to understand how a high-level language program is translated into instructions.

• Implementing an embedded computing system often requires controlling

 the instruction sequences used to handle interrupts,

 the placement of data and instructions in memory,

and so on, so understanding how the compiler works helps you know when you cannot rely on it.

Compilation:

• Compilation strategy (Wirth):

Compilation = translation + optimization

• Compiler determines quality of code:

 use of CPU resources;

 memory access scheduling;

 code size.

 Compilation begins with high-level language code such as C and generally

produces assembly code.

 The high-level language program is parsed to break it into statements and

expressions.

 In addition, a symbol table is generated, which includes all the named objects
in the program.

 Some compilers may then perform higher-level optimizations that can be

viewed as modifying the high-level language program input without reference
to instructions.

 Simplifying arithmetic expressions is one example of a machine-independent

optimization.

 Not all compilers do such optimizations, and compilers can vary widely
regarding which combinations of machine-independent optimizations they do
perform.

 Instruction-level optimizations are aimed at generating code.

 They may work directly on real instructions or on a pseudo-instruction format

that is later mapped onto the instructions of the target CPU.

 This level of optimization also helps modularize the compiler by allowing code
generation to create simpler code that is later optimized

Example 1: Arithmetic expressions:

Expression: a*b + 5*(c - d)

Data Flow Graph:

Assembly Language Program

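ADR r4, a        ; get address of a
LDR r1, [r4]     ; load a
ADR r4, b        ; get address of b
LDR r2, [r4]     ; load b
MUL r3, r1, r2   ; r3 = a*b
ADR r4, c        ; get address of c
LDR r1, [r4]     ; load c
ADR r4, d        ; get address of d
LDR r5, [r4]     ; load d
SUB r6, r1, r5   ; r6 = c - d
MOV r9, #5       ; MUL takes register operands only
MUL r7, r6, r9   ; r7 = 5*(c - d)
ADD r8, r7, r3   ; r8 = a*b + 5*(c - d)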

Example 2: Control code generation:

if (a+b > 0)

x = 5;

else x = 7;

Data Flow Graph:

Assembly Language Program:

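ADR r5,a          ; get address of a
LDR r1,[r5]       ; load a
ADR r5,b          ; get address of b
LDR r2,[r5]       ; load b
ADDS r3,r1,r2     ; r3 = a + b, setting the condition flags
BLE label3        ; a + b <= 0: go to the false block
MOV r3,#5         ; true block: x = 5
ADR r5,x
STR r3,[r5]
B stmtent         ; skip the false block
label3 MOV r3,#7  ; false block: x = 7
ADR r5,x
STR r3,[r5]
stmtent ...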

Procedure linkage:

Another major code generation problem is the creation of procedures. It needs

the code to:

 call and return;

 Pass parameters and results.

 Procedure stacks are typically built to grow down from high addresses.

 A stack pointer (sp) defines the end of the current frame, while a frame pointer
(fp) defines the end of the last frame.

Procedure Stack:

ARM procedure linkage:

• APCS (ARM Procedure Call Standard):

 r0-r3 pass parameters into the procedure. Extra parameters are put on the stack frame.

 r0 holds the return value.

 r4-r7 hold register variables.

 r11 is frame pointer, r13 is stack pointer.

 r10 holds limiting address on stack size to check for stack overflows.

Program-Level Performance Analysis:

• Need to understand performance in detail:

 Real-time behavior, not just typical.

 On complex platforms.

• Program performance ≠ CPU performance:

 The pipeline and cache are windows into the program.

 We must analyze the entire program.

Execution Time:

 Execution time is a global property of a program.

 The execution time of a program often varies with the input data values.

 The cache has a major effect on program performance.

 Execution times may vary even at the instruction level.

E.g., floating-point operations are the most sensitive to data values, more so than
normal integer operations.

Program Performance:

 Some microprocessor manufacturers supply simulators for their CPUs. A simulator
takes as input an executable for the microprocessor along with input data, and
simulates the execution of that program.

 A timer connected to the microprocessor bus can be used to measure the
performance of executing sections of code.

 A logic analyzer can be connected to the microprocessor bus to measure the

start and stop times of a code segment
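
The timer-based approach can be sketched in portable C. This is only an approximation under stated assumptions: the standard library function clock() stands in for reading a hardware timer on the microprocessor bus, and code_under_test is a hypothetical workload, not code from the notes.

```c
#include <time.h>

/* Result is stored here so the loop is not optimized away. */
volatile unsigned long sink;

/* Hypothetical workload standing in for the code section under test. */
void code_under_test(void)
{
    unsigned long acc = 0;
    for (unsigned long i = 0; i < 1000UL; i++)
        acc += i;
    sink = acc;
}

/* Returns elapsed processor time in seconds for one call of fn.
   clock() stands in for reading a hardware timer before and after. */
double measure_seconds(void (*fn)(void))
{
    clock_t start = clock();
    fn();
    clock_t stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

On a real target the two clock() calls would be replaced by reads of a timer register, and the section would usually be run many times so that the measured interval is well above the timer resolution.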

Program performance metrics:

 Average-case execution time.

 Typically used in application programming.

 Worst-case execution time.

 A component that takes too long and misses its deadline makes the system
unsatisfactory, so this metric matters most for systems that must meet deadlines.

 Best-case execution time.

 This measure can be important in multirate real-time systems.

Elements of program performance:

 Basic program execution time formula:

 execution time = program path + instruction timing

 The path is the sequence of instructions executed by the program

 The instruction timing is determined based on the sequence of instructions

traced by the program path

 Solving these problems independently helps simplify analysis.

 Easier to separate on simpler CPUs.

 Accurate performance analysis requires:

 Assembly/binary code.

 Execution platform.

Instruction timing:

 Not all instructions take the same amount of time.

 Multi-cycle instructions.

 Fetches.

 Execution times of instructions are not independent.

 Pipeline interlocks.

 Cache effects.

 Execution times may vary with operand value.

 Floating-point operations.

 Some multi-cycle integer operations.

Example: Data-dependent paths in an if statement

Truth Table:

Inputs    Tests and assignments executed
0 0 0     T1=F, T3=F: no assignments
0 0 1     T1=F, T3=T: A4
0 1 0     T1=T, T2=F: A2, A3
0 1 1     T1=T, T2=T: A1, A3
1 0 0     T1=T, T2=F: A2, A3
1 0 1     T1=T, T2=T: A1, A3
1 1 0     T1=T, T2=F: A2, A3
1 1 1     T1=T, T2=T: A1, A3
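
The source code for this example is not reproduced here. The following sketch is one nested if statement that is consistent with the truth table, assuming the three input columns are conditions named a, b and c, that test T1 is a || b, and that T2 and T3 both test c. The assignment bodies are replaced by flags, since only the path matters. All names are hypothetical.

```c
/* Bit flags recording which assignments (A1..A4) a path executes. */
enum { A1 = 1, A2 = 2, A3 = 4, A4 = 8 };

/* Assumed structure: T1 tests (a || b); T2 and T3 both test c. */
unsigned path_taken(int a, int b, int c)
{
    unsigned executed = 0;
    if (a || b) {            /* T1 */
        if (c)               /* T2 */
            executed |= A1;
        else
            executed |= A2;
        executed |= A3;      /* runs whenever T1 is true */
    } else {
        if (c)               /* T3 */
            executed |= A4;
    }
    return executed;
}
```

Each distinct return value corresponds to a different instruction sequence, which is why the execution time of even this small fragment depends on its input data.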

Measurement-driven performance analysis:

 The most direct way to determine the execution time of a program is by

measuring it.

 Not as easy as it sounds:

 Must actually have access to the CPU.

 Must know data inputs that give worst/best case performance.

 Must make state visible

Feeding the program:

 Need to know the desired input values.

 May need to write software scaffolding to generate the input values.

 Software scaffolding may also need to examine outputs to generate feedback-

driven inputs.

Trace-driven measurement:

 Trace-driven:

 Instrument the program with monitoring code.

 Save information about the path.

 Requires modifying the program.

 Trace files are large.

 Widely used for cache analysis.
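
A minimal sketch of instrumentation, assuming a small in-memory buffer and hypothetical block numbers: the if statement from the control code example earlier in this unit is modified so that every basic block records its identifier, which saves the path taken.

```c
#include <stddef.h>

#define TRACE_CAPACITY 64

/* Hypothetical in-memory trace buffer; a real tool would usually
   write records to a file, which is why trace files grow large. */
int trace_buf[TRACE_CAPACITY];
size_t trace_len;

void trace_record(int block_id)
{
    if (trace_len < TRACE_CAPACITY)
        trace_buf[trace_len++] = block_id;
}

/* Instrumented version of: if (a + b > 0) x = 5; else x = 7; */
int instrumented(int a, int b)
{
    int x;
    trace_record(0);         /* entry block */
    if (a + b > 0) {
        trace_record(1);     /* true branch */
        x = 5;
    } else {
        trace_record(2);     /* false branch */
        x = 7;
    }
    trace_record(3);         /* exit block */
    return x;
}
```

The added calls change the timing of the program, so a trace of this kind is better suited to recovering the path than to measuring time directly.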

Physical measurement:

 In-circuit emulator allows tracing.

 Affects execution timing.

 Logic analyzer can measure behavior at pins.

 Address bus can be analyzed to look for events.

 Code can be modified to make events visible.

 Particularly important for real-world input streams.

Software Performance Optimization

1. Loop Optimizations:

 Loops are important targets for optimization because programs with loops tend
to spend a lot of time executing those loops.

 There are three important techniques in optimizing loops:

 code motion,

 induction variable elimination, and

 strength reduction.

Code motion:

 Code motion lets us move unnecessary code out of a loop.

 If a computation’s result does not depend on operations performed in the loop

body, then we can safely move it out of the loop

Example:

for (i=0; i<N*M; i++)
    z[i] = a[i] + b[i];
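
In this loop the product N*M appears in the loop test, yet neither N nor M changes inside the loop. A minimal sketch of the loop before and after code motion follows, assuming the arrays hold at least N*M ints. The function names and the variable limit are hypothetical.

```c
/* Before: the bound N*M is written inside the loop test and, without
   optimization, is recomputed on every iteration. */
void add_before(int *z, const int *a, const int *b, int N, int M)
{
    for (int i = 0; i < N * M; i++)
        z[i] = a[i] + b[i];
}

/* After code motion: the loop-invariant product is computed once. */
void add_after(int *z, const int *a, const int *b, int N, int M)
{
    int limit = N * M;       /* hoisted out of the loop */
    for (int i = 0; i < limit; i++)
        z[i] = a[i] + b[i];
}
```

Both versions produce the same results; the second simply avoids repeating the multiplication.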

Induction variable elimination:

 An induction variable is a variable whose value is derived from the loop

iteration variable’s value.

 The compiler often introduces induction variables to help it implement the loop

 Consider loop:

for (i=0; i<N; i++)
    for (j=0; j<M; j++)
        z[i][j] = b[i][j];

 Rather than recomputing i*M+j for each array in each iteration, share one
induction variable between the arrays and increment it at the end of the loop body.
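
The transformation can be sketched as follows, assuming the two-dimensional arrays are stored in row-major order as flat arrays of N*M ints. The function names and the index variable idx are hypothetical. Replacing the multiplication i*M+j with an increment is also the kind of substitution that strength reduction performs.

```c
/* Before: each access recomputes the flattened index i*M + j. */
void copy_before(int *z, const int *b, int N, int M)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < M; j++)
            z[i * M + j] = b[i * M + j];
}

/* After: one induction variable, shared by both arrays, is
   incremented at the end of the loop body. */
void copy_after(int *z, const int *b, int N, int M)
{
    int idx = 0;             /* hypothetical name for the shared index */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < M; j++) {
            z[idx] = b[idx];
            idx++;
        }
}
```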

Cache Optimizations:

 Loop nest: a set of loops, one inside another.

 Perfect loop nest: no conditionals in the nest.

 Because loops use large quantities of data, cache conflicts are common.

Example:

for (j = 0; j < M; j++)
    for (i = 0; i < N; i++)
        a[j][i] = b[j][i] * c;

Performance optimization hints:

 Use registers efficiently.

 Use page mode memory accesses.

 Analyze cache behavior:

 instruction conflicts can be handled by rewriting code or rescheduling;

 conflicting scalar data can easily be moved;

 conflicting array data can be moved or padded.
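
To see why padding helps, the sketch below models only the address-to-line mapping of a direct-mapped cache. The geometry (16-byte lines, 64 lines) and the base addresses used in the usage note are assumed values chosen for illustration, not measurements from any real system.

```c
#include <stddef.h>

/* Hypothetical direct-mapped cache geometry (assumed values). */
#define LINE_BYTES  16u
#define NUM_LINES   64u

/* Cache line that a byte address maps to in a direct-mapped cache. */
unsigned cache_line(size_t addr)
{
    return (unsigned)((addr / LINE_BYTES) % NUM_LINES);
}

/* Count the indices i for which a[i] and b[i] map to the same line,
   given the base addresses of two int arrays of n elements. */
unsigned count_conflicts(size_t a_base, size_t b_base, size_t n)
{
    unsigned conflicts = 0;
    for (size_t i = 0; i < n; i++) {
        size_t off = i * sizeof(int);
        if (cache_line(a_base + off) == cache_line(b_base + off))
            conflicts++;
    }
    return conflicts;
}
```

With this geometry the cache holds 1024 bytes. If array a starts at address 0 and array b at address 1024, count_conflicts(0, 1024, 256) reports that every pair a[i], b[i] shares a line. Padding b by one line, so that it starts at 1040, gives count_conflicts(0, 1040, 256) equal to zero.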

Energy/power optimization

 Energy: ability to do work.

 Most important in battery-powered systems.

 Power: energy per unit time.

 Important even in wall-plug systems: power becomes heat.

Opportunities for saving power:

 We may be able to replace the algorithms with others that do things in clever
ways that consume less power.

 Memory accesses are a major component of power consumption in many

applications.

 By optimizing memory accesses we may be able to significantly reduce power.

 We may be able to turn off parts of the system, such as subsystems of the
CPU or chips in the system, when we do not need them in order to save power.

Measuring energy consumption for a piece of code:

Factors that contribute to the energy consumption of a program:

 Energy consumption varies somewhat from instruction to instruction.

 The sequence of instructions has some influence.

 The opcode and the locations of the operands also matter

Cache Behaviour:

 Caches are an important factor in energy consumption.

 On the one hand, a cache hit saves a costly main memory access,

 On the other, the cache itself is relatively power hungry because it is built from
SRAM, not DRAM

 Energy consumption has a sweet spot as cache size changes:

 cache too small: program thrashes, burning energy on external memory

accesses;

 Cache too large: cache itself burns too much power.

 Li and Henkel [Li98] measured the influence of caches on energy consumption.

 Their study breaks down the energy consumption of a computer running MPEG
(a video encoder) into several components:

 software running on the CPU,

 main memory,

 data cache and instruction cache

Cache Sweet Spot

Optimizing for energy:

 First-order optimization:

high performance = low energy

 Use registers efficiently.

 Identify and eliminate cache conflicts.

 Moderate loop unrolling eliminates some loop overhead instructions.

 Eliminate pipeline stalls.

 Inlining procedures may help: reduces linkage, but may increase cache
thrashing.
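
As an illustration of moderate loop unrolling, the sketch below sums an array with a rolled loop and with a loop unrolled by a factor of four. The function names and the unrolling factor are assumptions chosen for the example.

```c
/* Rolled loop: one compare-and-branch per element. */
long sum_rolled(const int *a, int n)
{
    long s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Unrolled by 4: one compare-and-branch per four elements,
   plus a cleanup loop for the remaining n % 4 elements. */
long sum_unrolled4(const int *a, int n)
{
    long s = 0;
    int i = 0;
    for (; i + 3 < n; i += 4)
        s += (long)a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    for (; i < n; i++)
        s += a[i];
    return s;
}
```

The unrolled version executes fewer loop overhead instructions but is larger, which is why only moderate unrolling is suggested: too much code growth can hurt instruction cache behavior.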

Program Validation & Testing:

 Complex systems need testing to ensure that they work as they are intended.

 But bugs can be subtle, particularly in embedded systems, where specialized

hardware and real-time responsiveness make programming more challenging.

 Fortunately, there are many available techniques for software testing that can
help us generate a comprehensive set of tests to ensure that our system works
properly

The two major types of testing strategies:

 Black-box testing: generates tests without looking at the internal structure
of the program.

 Clear-box (also known as white-box) testing: generates tests based on the
program structure.

Clear Box Testing:

 The control/data flow graph extracted from a program’s source code is an

important tool in developing clear-box tests for the program.

 To test the program, we must exercise both its control and data operations.

 In order to execute and evaluate these tests, we must be able to control

variables in the program and observe the results of computations

 In general, we may need to modify the program to make it more testable.

 By adding new inputs and outputs, we can usually substantially reduce the
effort required to find and execute the test.

We must accomplish the following three things in a test:

 Provide the program with inputs that exercise the test we are interested in.

 Execute the program to perform the test.

 Examine the outputs to determine whether the test was successful
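
The three steps can be sketched as a small test harness. The code under test is the if statement from the control code example earlier in this unit, wrapped in a hypothetical function select_x so that its inputs can be controlled and its result observed. The helper names and the chosen input values are assumptions.

```c
/* Code under test, from the control code example:
   if (a + b > 0) x = 5; else x = 7; */
int select_x(int a, int b)
{
    if (a + b > 0)
        return 5;
    return 7;
}

/* One test = provide inputs, execute, examine the output.
   Returns 1 on pass, 0 on fail. */
int run_test(int a, int b, int expected)
{
    int actual = select_x(a, b);   /* execute */
    return actual == expected;     /* examine */
}

/* Clear-box test set: one input pair per branch, plus the boundary. */
int run_all_tests(void)
{
    int passed = 0;
    passed += run_test(3, 4, 5);    /* a + b > 0: true branch  */
    passed += run_test(-3, -4, 7);  /* a + b < 0: false branch */
    passed += run_test(2, -2, 7);   /* a + b == 0: boundary    */
    return passed;                  /* number of tests passed  */
}
```

The inputs were chosen by looking at the structure of the code, so that each branch of the conditional is exercised at least once.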

Black Box Testing:

 Complements clear-box testing.

 May require a large number of tests.

 Tests software in different ways.

 Black-box tests are generated without knowledge of the code being tested

 Tests should be created that provide specified inputs and evaluate whether the
outputs satisfy the specification.

Black-box test vectors:

 Random tests.

 May weight distribution based on software specification.

 Regression tests.

 Tests of previous versions, bugs, etc.

 May be clear-box tests of previous versions.
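
A minimal sketch of random test vectors follows. The unit under test, abs_diff, and the two properties checked (the result is non-negative and symmetric in its arguments) are hypothetical stand-ins for a real specification. The inputs are drawn uniformly from an assumed range, with no weighting.

```c
#include <stdlib.h>

/* Hypothetical unit under test: absolute difference of two ints. */
int abs_diff(int a, int b)
{
    return (a > b) ? a - b : b - a;
}

/* Generate `count` random input pairs in [-1000, 1000] and check two
   properties taken from the (assumed) specification:
   the result is non-negative and symmetric in its arguments.
   Returns the number of failing vectors. */
int random_tests(unsigned seed, int count)
{
    int failures = 0;
    srand(seed);                       /* fixed seed: repeatable run */
    for (int k = 0; k < count; k++) {
        int a = rand() % 2001 - 1000;
        int b = rand() % 2001 - 1000;
        int r = abs_diff(a, b);
        if (r < 0 || r != abs_diff(b, a))
            failures++;
    }
    return failures;
}
```

Because the seed is fixed, the same vectors are generated on every run, so the set can be kept and rerun against later versions as a regression test.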

