ARM® Compiler

Version 5.06u3

armcc User Guide

Copyright © 2010-2016 ARM. All rights reserved.

ARM DUI0472M
Release Information

Document History

Issue  Date               Confidentiality   Change

A      28 May 2010        Non-Confidential  ARM v4.1 Release
B      30 September 2010  Non-Confidential  Update 1 for ARM v4.1
C      28 January 2011    Non-Confidential  Update 2 for ARM v4.1 Patch 3
D      30 April 2011      Non-Confidential  ARM v5.0 Release
E      29 July 2011       Non-Confidential  Update 1 for ARM v5.0
F      30 September 2011  Non-Confidential  ARM v5.01 Release
G      29 February 2012   Non-Confidential  Document update 1 for ARM v5.01 Release
H      27 July 2012       Non-Confidential  ARM v5.02 Release
I      31 January 2013    Non-Confidential  ARM v5.03 Release
J      27 November 2013   Non-Confidential  ARM v5.04 Release
K      10 September 2014  Non-Confidential  ARM v5.05 Release
L      29 July 2015       Non-Confidential  ARM v5.06 Release
M      31 May 2016        Non-Confidential  Update 3 for ARM v5.06 Release

Non-Confidential Proprietary Notice

This document is protected by copyright and other related rights and the practice or implementation of the information contained in
this document may be protected by one or more patents or pending patent applications. No part of this document may be
reproduced in any form by any means without the express prior written permission of ARM. No license, express or implied, by
estoppel or otherwise to any intellectual property rights is granted by this document unless specifically stated.

Your access to the information in this document is conditional upon your acceptance that you will not use or permit others to use
the information for the purposes of determining whether implementations infringe any third party patents.

THIS DOCUMENT IS PROVIDED “AS IS”. ARM PROVIDES NO REPRESENTATIONS AND NO WARRANTIES,
EXPRESS, IMPLIED OR STATUTORY, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY, SATISFACTORY QUALITY, NON-INFRINGEMENT OR FITNESS FOR A PARTICULAR PURPOSE
WITH RESPECT TO THE DOCUMENT. For the avoidance of doubt, ARM makes no representation with respect to, and has
undertaken no analysis to identify or understand the scope and content of, third party patents, copyrights, trade secrets, or other
rights.

This document may include technical inaccuracies or typographical errors.

TO THE EXTENT NOT PROHIBITED BY LAW, IN NO EVENT WILL ARM BE LIABLE FOR ANY DAMAGES,
INCLUDING WITHOUT LIMITATION ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, PUNITIVE, OR
CONSEQUENTIAL DAMAGES, HOWEVER CAUSED AND REGARDLESS OF THE THEORY OF LIABILITY, ARISING
OUT OF ANY USE OF THIS DOCUMENT, EVEN IF ARM HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH
DAMAGES.

This document consists solely of commercial items. You shall be responsible for ensuring that any use, duplication or disclosure of
this document complies fully with any relevant export laws and regulations to assure that this document or any portion thereof is
not exported, directly or indirectly, in violation of such export laws. Use of the word “partner” in reference to ARM’s customers is
not intended to create or refer to any partnership relationship with any other company. ARM may make changes to this document at
any time and without notice.

ARM DUI0472M Copyright © 2010-2016 ARM. All rights reserved. 2

Non-Confidential

If any of the provisions contained in these terms conflict with any of the provisions of any signed written agreement covering this
document with ARM, then the signed written agreement prevails over and supersedes the conflicting provisions of these terms.
This document may be translated into other languages for convenience, and you agree that if there is any conflict between the
English version of this document and any translation, the terms of the English version of the Agreement shall prevail.

Words and logos marked with ® or ™ are registered trademarks or trademarks of ARM Limited or its affiliates in the EU and/or
elsewhere. All rights reserved. Other brands and names mentioned in this document may be the trademarks of their respective
owners. Please follow ARM’s trademark usage guidelines at http://www.arm.com/about/trademark-usage-guidelines.php

Copyright © 2010-2016, ARM Limited or its affiliates. All rights reserved.

ARM Limited. Company 02557590 registered in England.

110 Fulbourn Road, Cambridge, England CB1 9NJ.

LES-PRE-20349
Additional Notices
Some material in this document is based on IEEE 754-1985 IEEE Standard for Binary Floating-Point Arithmetic. The IEEE
disclaims any responsibility or liability resulting from the placement and use in the described manner.
Confidentiality Status
This document is Non-Confidential. The right to use, copy and disclose this document may be subject to license restrictions in
accordance with the terms of the agreement entered into by ARM and the party that ARM delivered this document to.

Unrestricted Access is an ARM internal classification.

Product Status
The information in this document is Final, that is, for a developed product.
Web Address
http://www.arm.com

Contents
ARM® Compiler armcc User Guide

Preface
About this book

Chapter 1 Overview of the Compiler

1.1 The compiler
1.2 Source language modes of the compiler
1.3 Language extensions
1.4 Language compliance
1.5 The C and C++ libraries

Chapter 2 Getting Started with the Compiler

2.1 Compiler command-line syntax
2.2 Compiler command-line options listed by group
2.3 Default compiler behavior
2.4 Order of compiler command-line options
2.5 Using stdin to input source code to the compiler
2.6 Directing output to stdout
2.7 Filename suffixes recognized by the compiler
2.8 Compiler output files
2.9 Factors influencing how the compiler searches for header files
2.10 Compiler command-line options and search paths
2.11 Compiler search rules and the current place
2.12 The ARMCC5INC environment variable
2.13 Code compatibility between separately compiled and assembled modules
2.14 Using GCC fallback when building Linux applications
2.15 Linker feedback during compilation
2.16 Unused function code
2.17 Minimizing code size by eliminating unused functions during compilation
2.18 Compilation build time

Chapter 3 Using the NEON Vectorizing Compiler

3.1 NEON technology
3.2 The NEON unit
3.3 Methods of writing code for NEON
3.4 Generating NEON instructions from C or C++ code
3.5 NEON C extensions
3.6 Automatic vectorization
3.7 Data references within a vectorizable loop
3.8 Stride patterns and data accesses
3.9 Factors affecting NEON vectorization performance
3.10 NEON vectorization performance goals
3.11 Recommended loop structure for vectorization
3.12 Data dependency conflicts when vectorizing code
3.13 Carry-around scalar variables and vectorization
3.14 Reduction of a vector to a scalar
3.15 Vectorization on loops containing pointers
3.16 Nonvectorization on loops containing pointers and indirect addressing
3.17 Nonvectorization on conditional loop exits
3.18 Vectorizable loop iteration counts
3.19 Indicating loop iteration counts to the compiler with __promise(expr)
3.20 Grouping structure accesses for vectorization
3.21 Vectorization and struct member lengths
3.22 Nonvectorization of function calls to non-inline functions from within loops
3.23 Conditional statements and efficient vectorization
3.24 Vectorization diagnostics to tune code for improved performance
3.25 Vectorizable code example
3.26 DSP vectorizable code example
3.27 What can limit or prevent automatic vectorization

Chapter 4 Compiler Features

4.1 Compiler intrinsics
4.2 Performance benefits of compiler intrinsics
4.3 ARM assembler instruction intrinsics
4.4 Generic intrinsics
4.5 Compiler intrinsics for controlling IRQ and FIQ interrupts
4.6 Compiler intrinsics for inserting optimization barriers
4.7 Compiler intrinsics for inserting native instructions
4.8 Compiler intrinsics for Digital Signal Processing (DSP)
4.9 Compiler support for European Telecommunications Standards Institute (ETSI) basic operations
4.10 Overflow and carry status flags for C and C++ code
4.11 Texas Instruments (TI) C55x intrinsics for optimizing C code
4.12 NEON intrinsics provided by the compiler
4.13 Using NEON intrinsics
4.14 Compiler support for accessing registers using named register variables
4.15 Pragmas recognized by the compiler
4.16 Compiler and processor support for bit-banding
4.17 Compiler type attribute, __attribute__((bitband))
4.18 --bitband compiler command-line option
4.19 How the compiler handles bit-band objects placed outside bit-band regions
4.20 Compiler support for thread-local storage
4.21 Compiler support for literal pools
4.22 Compiler eight-byte alignment features
4.23 Using compiler and linker support for symbol versions
4.24 Precompiled Header (PCH) files
4.25 Automatic Precompiled Header (PCH) file processing
4.26 Precompiled Header (PCH) file processing and the header stop point
4.27 Precompiled Header (PCH) file creation requirements
4.28 Compilation with multiple Precompiled Header (PCH) files
4.29 Obsolete Precompiled Header (PCH) files
4.30 Manually specifying the filename and location of a Precompiled Header (PCH) file
4.31 Selectively applying Precompiled Header (PCH) file processing
4.32 Suppressing Precompiled Header (PCH) file processing
4.33 Message output during Precompiled Header (PCH) processing
4.34 Performance issues with Precompiled Header (PCH) files
4.35 Default compiler options that are affected by optimization level

Chapter 5 Compiler Coding Practices

5.1 The compiler as an optimizing compiler
5.2 Compiler optimization for code size versus speed
5.3 Compiler optimization levels and the debug view
5.4 Selecting the target processor at compile time
5.5 Enabling NEON and FPU for bare-metal
5.6 Optimization of loop termination in C code
5.7 Loop unrolling in C code
5.8 Compiler optimization and the volatile keyword
5.9 Code metrics
5.10 Code metrics for measurement of code size and data size
5.11 Stack use in C and C++
5.12 Benefits of reducing debug information in objects and libraries
5.13 Methods of reducing debug information in objects and libraries
5.14 Guarding against multiple inclusion of header files
5.15 Methods of minimizing function parameter passing overhead
5.16 Returning structures from functions through registers
5.17 Functions that return the same result when called with the same arguments
5.18 Comparison of pure and impure functions
5.19 Recommendation of postfix syntax when qualifying functions with ARM function modifiers
5.20 Inline functions
5.21 Compiler decisions on function inlining
5.22 Automatic function inlining and static functions
5.23 Inline functions and removal of unused out-of-line functions at link time
5.24 Automatic function inlining and multifile compilation
5.25 Restriction on overriding compiler decisions about function inlining
5.26 Compiler modes and inline functions
5.27 Inline functions in C++ and C90 mode
5.28 Inline functions in C99 mode
5.29 Inline functions and debugging
5.30 Types of data alignment
5.31 Advantages of natural data alignment
5.32 Compiler storage of data objects by natural byte alignment
5.33 Relevance of natural data alignment at compile time
5.34 Unaligned data access in C and C++ code
5.35 The __packed qualifier and unaligned data access in C and C++ code
5.36 Unaligned fields in structures
5.37 Performance penalty associated with marking whole structures as packed
5.38 Unaligned pointers in C and C++ code
5.39 Unaligned Load Register (LDR) instructions generated by the compiler
5.40 Comparisons of an unpacked struct, a __packed struct, and a struct with individually __packed fields, and of a __packed struct and a #pragma packed struct
5.41 Compiler support for floating-point arithmetic
5.42 Default selection of hardware or software floating-point support
5.43 Example of hardware and software support differences for floating-point arithmetic
5.44 Vector Floating-Point (VFP) architectures
5.45 Limitations on hardware handling of floating-point arithmetic
5.46 Implementation of Vector Floating-Point (VFP) support code
5.47 Compiler and library support for half-precision floating-point numbers
5.48 Half-precision floating-point number format
5.49 Compiler support for floating-point computations and linkage
5.50 Types of floating-point linkage
5.51 Compiler options for floating-point linkage and computations
5.52 Floating-point linkage and computational requirements of compiler options
5.53 Processors and their implicit Floating-Point Units (FPUs)
5.54 Integer division-by-zero errors in C code
5.55 Software floating-point division-by-zero errors in C code
5.56 About trapping software floating-point division-by-zero errors
5.57 Identification of software floating-point division-by-zero errors
5.58 Software floating-point division-by-zero debugging
5.59 New language features of C99
5.60 New library features of C99
5.61 // comments in C99 and C90
5.62 Compound literals in C99
5.63 Designated initializers in C99
5.64 Hexadecimal floating-point numbers in C99
5.65 Flexible array members in C99
5.66 __func__ predefined identifier in C99
5.67 inline functions in C99
5.68 long long data type in C99 and C90
5.69 Macros with a variable number of arguments in C99
5.70 Mixed declarations and statements in C99
5.71 New block scopes for selection and iteration statements in C99
5.72 _Pragma preprocessing operator in C99
5.73 Restricted pointers in C99
5.74 Additional <math.h> library functions in C99
5.75 Complex numbers in C99
5.76 Boolean type and <stdbool.h> in C99
5.77 Extended integer types and functions in <inttypes.h> and <stdint.h> in C99
5.78 <fenv.h> floating-point environment access in C99
5.79 <stdio.h> snprintf family of functions in C99
5.80 <tgmath.h> type-generic math macros in C99
5.81 <wchar.h> wide character I/O functions in C99
5.82 How to prevent uninitialized data from being initialized to zero

Chapter 6 Compiler Diagnostic Messages

6.1 Severity of compiler diagnostic messages
6.2 Options that change the severity of compiler diagnostic messages
6.3 Controlling compiler diagnostic messages with pragmas
6.4 Prefix letters in compiler diagnostic messages
6.5 Compiler exit status codes and termination messages
6.6 Compiler data flow warnings

Chapter 7 Using the Inline and Embedded Assemblers of the ARM Compiler

7.1 Compiler support for inline assembly language
7.2 Inline assembler support in the compiler
7.3 Restrictions on inline assembler support in the compiler
7.4 Inline assembly language syntax with the __asm keyword in C and C++
7.5 Inline assembly language syntax with the asm keyword in C++
7.6 Inline assembler rules for compiler keywords __asm and asm
7.7 Restrictions on inline assembly operations in C and C++ code
7.8 Inline assembler register restrictions in C and C++ code
7.9 Inline assembler processor mode restrictions in C and C++ code
7.10 Inline assembler Thumb instruction set restrictions in C and C++ code
7.11 Inline assembler Vector Floating-Point (VFP) restrictions in C and C++ code
7.12 Inline assembler instruction restrictions in C and C++ code
7.13 Miscellaneous inline assembler restrictions in C and C++ code
7.14 Inline assembler and register access in C and C++ code
7.15 Inline assembler and the # constant expression specifier in C and C++ code
7.16 Inline assembler and instruction expansion in C and C++ code
7.17 Expansion of inline assembler instructions that use constants
7.18 Expansion of inline assembler load and store instructions
7.19 Inline assembler effect on processor condition flags in C and C++ code
7.20 Inline assembler expression operands in C and C++ code
7.21 Inline assembler register list operands in C and C++ code
7.22 Inline assembler intermediate operands in C and C++ code
7.23 Inline assembler function calls and branches in C and C++ code
7.24 Inline assembler branches and labels in C and C++ code
7.25 Inline assembler and virtual registers
7.26 Embedded assembler support in the compiler
7.27 Embedded assembler syntax in C and C++
7.28 Effect of compiler ARM and Thumb states on embedded assembler
7.29 Restrictions on embedded assembly language functions in C and C++ code
7.30 Compiler generation of embedded assembly language functions
7.31 Access to C and C++ compile-time constant expressions from embedded assembler ...
................................................................................................................................ 7-297
7.32 Differences between expressions in embedded assembler and C or C++ ............ 7-298
7.33 Manual overload resolution in embedded assembler ............................................ 7-299
7.34 __offsetof_base keyword for related base classes in embedded assembler ........ 7-300
7.35 Compiler-supported keywords for calling class member functions in embedded
assembler .............................................................................................................. 7-301
7.36 __mcall_is_virtual(D, f) .......................................................................................... 7-302
7.37 __mcall_is_in_vbase(D, f) .......................................... .......................................... 7-303
7.38 __mcall_offsetof_vbase(D, f) ........................................ ........................................ 7-304
7.39 __mcall_this_offset(D, f) ........................................................................................ 7-305
7.40 __vcall_offsetof_vfunc(D, f) ......................................... ......................................... 7-306
7.41 Calling nonstatic member functions in embedded assembler ............... ............... 7-307
7.42 Calling a nonvirtual member function .................................................................... 7-308
7.43 Calling a virtual member function .......................................................................... 7-309
7.44 Accessing sp (r13), lr (r14), and pc (r15) ............................... ............................... 7-310
7.45 Differences in compiler support for inline and embedded assembly code ...... ...... 7-311

Chapter 8 Compiler Command-line Options

8.1 -Aopt ........ 8-317
8.2 --allow_fpreg_for_nonfpdata, --no_allow_fpreg_for_nonfpdata ........ 8-318
8.3 --allow_null_this, --no_allow_null_this ........ 8-319
8.4 --alternative_tokens, --no_alternative_tokens ........ 8-320
8.5 --anachronisms, --no_anachronisms ........ 8-321
8.6 --apcs=qualifier...qualifier ........ 8-322
8.7 --arm ........ 8-326
8.8 --arm_linux ........ 8-327
8.9 --arm_linux_config_file=path ........ 8-329
8.10 --arm_linux_configure ........ 8-330
8.11 --arm_linux_paths ........ 8-332
8.12 --arm_only ........ 8-334
8.13 --asm ........ 8-335
8.14 --asm_dir=directory_name ........ 8-336
8.15 --autoinline, --no_autoinline ........ 8-337
8.16 --bigend ........ 8-338
8.17 --bitband ........ 8-339
8.18 --branch_tables, --no_branch_tables ........ 8-340
8.19 --brief_diagnostics, --no_brief_diagnostics ........ 8-342
8.20 --bss_threshold=num ........ 8-343
8.21 -c ........ 8-344
8.22 -C ........ 8-345
8.23 --c90 ........ 8-346
8.24 --c99 ........ 8-347
8.25 --code_gen, --no_code_gen ........ 8-348
8.26 --comment_section, --no_comment_section ........ 8-349
8.27 --common_functions, --no_common_functions ........ 8-350
8.28 --compatible=name ........ 8-351
8.29 --compile_all_input, --no_compile_all_input ........ 8-353
8.30 --conditionalize, --no_conditionalize ........ 8-354
8.31 --configure_cpp_headers=path ........ 8-355
8.32 --configure_extra_includes=paths ........ 8-356
8.33 --configure_extra_libraries=paths ........ 8-357
8.34 --configure_gas=path ........ 8-358
8.35 --configure_gcc=path ........ 8-359
8.36 --configure_gcc_version=version ........ 8-360
8.37 --configure_gld=path ........ 8-361
8.38 --configure_sysroot=path ........ 8-362
8.39 --cpp ........ 8-363
8.40 --cpp11 ........ 8-364
8.41 --cpp_compat ........ 8-365
8.42 --cpu=list ........ 8-367
8.43 --cpu=name ........ 8-368
8.44 --create_pch=filename ........ 8-371
8.45 -Dname[(parm-list)][=def] ........ 8-372
8.46 --data_reorder, --no_data_reorder ........ 8-373
8.47 --debug, --no_debug ........ 8-374
8.48 --debug_macros, --no_debug_macros ........ 8-375
8.49 --default_definition_visibility=visibility ........ 8-376
8.50 --default_extension=ext ........ 8-377
8.51 --dep_name, --no_dep_name ........ 8-378
8.52 --depend=filename ........ 8-379
8.53 --depend_dir=directory_name ........ 8-380
8.54 --depend_format=string ........ 8-381
8.55 --depend_single_line, --no_depend_single_line ........ 8-383
8.56 --depend_system_headers, --no_depend_system_headers ........ 8-384
8.57 --depend_target=target ........ 8-385
8.58 --diag_error=tag[,tag,...] ........ 8-386
8.59 --diag_remark=tag[,tag,...] ........ 8-387
8.60 --diag_style=arm|ide|gnu compiler option ........ 8-388
8.61 --diag_suppress=tag[,tag,...] ........ 8-389
8.62 --diag_suppress=optimizations ........ 8-390
8.63 --diag_warning=tag[,tag,...] ........ 8-391
8.64 --diag_warning=optimizations ........ 8-392
8.65 --dllexport_all, --no_dllexport_all ........ 8-393
8.66 --dllimport_runtime, --no_dllimport_runtime ........ 8-394
8.67 --dollar, --no_dollar ........ 8-395
8.68 --dwarf2 ........ 8-396
8.69 --dwarf3 ........ 8-397
8.70 -E ........ 8-398
8.71 --echo ........ 8-399
8.72 --emit_frame_directives, --no_emit_frame_directives ........ 8-400
8.73 --enum_is_int ........ 8-401
8.74 --errors=filename ........ 8-402
8.75 --exceptions, --no_exceptions ........ 8-403
8.76 --exceptions_unwind, --no_exceptions_unwind ........ 8-404
8.77 --execstack, --no_execstack ........ 8-405
8.78 --execute_only ........ 8-406
8.79 --export_all_vtbl, --no_export_all_vtbl ........ 8-407
8.80 --export_defs_implicitly, --no_export_defs_implicitly ........ 8-408
8.81 --extended_initializers, --no_extended_initializers ........ 8-409
8.82 --feedback=filename ........ 8-410
8.83 --float_literal_pools, --no_float_literal_pools ........ 8-411
8.84 --force_new_nothrow, --no_force_new_nothrow ........ 8-412
8.85 --forceinline ........ 8-413
8.86 --fp16_format=format ........ 8-414
8.87 --fpmode=model ........ 8-415
8.88 --fpu=list ........ 8-417
8.89 --fpu=name ........ 8-418
8.90 --friend_injection, --no_friend_injection ........ 8-421
8.91 -g ........ 8-422
8.92 --global_reg=reg_name[,reg_name,...] ........ 8-423
8.93 --gnu ........ 8-424
8.94 --gnu_defaults ........ 8-425
8.95 --gnu_instrument, --no_gnu_instrument ........ 8-426
8.96 --gnu_version=version ........ 8-427
8.97 --guiding_decls, --no_guiding_decls ........ 8-428
8.98 --help ........ 8-429
8.99 --hide_all, --no_hide_all ........ 8-430
8.100 -Idir[,dir,...] ........ 8-431
8.101 --ignore_missing_headers ........ 8-432
8.102 --implicit_include, --no_implicit_include ........ 8-433
8.103 --implicit_include_searches, --no_implicit_include_searches ........ 8-434
8.104 --implicit_key_function, --no_implicit_key_function ........ 8-435
8.105 --implicit_typename, --no_implicit_typename ........ 8-436
8.106 --import_all_vtbl ........ 8-437
8.107 --info=totals ........ 8-438
8.108 --inline, --no_inline ........ 8-439
8.109 --integer_literal_pools, --no_integer_literal_pools ........ 8-440
8.110 --interface_enums_are_32_bit ........ 8-441
8.111 --interleave ........ 8-442
8.112 -Jdir[,dir,...] ........ 8-443
8.113 --kandr_include ........ 8-444
8.114 -Lopt ........ 8-445
8.115 --library_interface=lib ........ 8-446
8.116 --library_type=lib ........ 8-448
8.117 --link_all_input, --no_link_all_input ........ 8-449
8.118 --list ........ 8-450
8.119 --list_dir=directory_name ........ 8-452
8.120 --list_macros ........ 8-453
8.121 --littleend ........ 8-454
8.122 --locale=lang_country ........ 8-455
8.123 --long_long ........ 8-456
8.124 --loop_optimization_level=opt ........ 8-457
8.125 --loose_implicit_cast ........ 8-458
8.126 --lower_ropi, --no_lower_ropi ........ 8-459
8.127 --lower_rwpi, --no_lower_rwpi ........ 8-460
8.128 -M ........ 8-461
8.129 --md ........ 8-462
8.130 --message_locale=lang_country[.codepage] ........ 8-463
8.131 --min_array_alignment=opt ........ 8-464
8.132 --mm ........ 8-465
8.133 --multibyte_chars, --no_multibyte_chars ........ 8-466
8.134 --multifile, --no_multifile ........ 8-467
8.135 --multiply_latency=cycles ........ 8-468
8.136 --narrow_volatile_bitfields ........ 8-469
8.137 --nonstd_qualifier_deduction, --no_nonstd_qualifier_deduction ........ 8-470
8.138 -o filename ........ 8-471
8.139 -Onum ........ 8-473
8.140 --old_specializations, --no_old_specializations ........ 8-476
8.141 --old_style_preprocessing ........ 8-477
8.142 --ool_section_name, --no_ool_section_name ........ 8-478
8.143 -Ospace ........ 8-479
8.144 -Otime ........ 8-480
8.145 --output_dir=directory_name ........ 8-481
8.146 -P ........ 8-482
8.147 --parse_templates, --no_parse_templates ........ 8-483
8.148 --pch ........ 8-484
8.149 --pch_dir=dir ........ 8-485
8.150 --pch_messages, --no_pch_messages ........ 8-486
8.151 --pch_verbose, --no_pch_verbose ........ 8-487
8.152 --pending_instantiations=n ........ 8-488
8.153 --phony_targets ........ 8-489
8.154 --pointer_alignment=num ........ 8-490
8.155 --preinclude=filename ........ 8-491
8.156 --preprocess_assembly ........ 8-492
8.157 --preprocessed ........ 8-493
8.158 --protect_stack, --no_protect_stack ........ 8-494
8.159 --reassociate_saturation, --no_reassociate_saturation ........ 8-495
8.160 --reduce_paths, --no_reduce_paths ........ 8-496
8.161 --relaxed_ref_def, --no_relaxed_ref_def ........ 8-497
8.162 --remarks ........ 8-498
8.163 --remove_unneeded_entities, --no_remove_unneeded_entities ........ 8-499
8.164 --restrict, --no_restrict ........ 8-500
8.165 --retain=option ........ 8-501
8.166 --rtti, --no_rtti ........ 8-502
8.167 --rtti_data, --no_rtti_data ........ 8-503
8.168 -S ........ 8-504
8.169 --share_inlineable_strings, --no_share_inlineable_strings ........ 8-505
8.170 --shared ........ 8-507
8.171 --show_cmdline ........ 8-508
8.172 --signed_bitfields, --unsigned_bitfields ........ 8-509
8.173 --signed_chars, --unsigned_chars ........ 8-510
8.174 --split_ldm ........ 8-511
8.175 --split_sections ........ 8-512
8.176 --strict, --no_strict ........ 8-513
8.177 --strict_warnings ........ 8-514
8.178 --string_literal_pools, --no_string_literal_pools ........ 8-515
8.179 --sys_include ........ 8-517
8.180 --thumb ........ 8-518
8.181 --translate_g++ ........ 8-519
8.182 --translate_gcc ........ 8-521
8.183 --translate_gld ........ 8-523
8.184 --trigraphs, --no_trigraphs ........ 8-525
8.185 --type_traits_helpers, --no_type_traits_helpers ........ 8-526
8.186 -Uname ........ 8-527
8.187 --unaligned_access, --no_unaligned_access ........ 8-528
8.188 --use_frame_pointer, --no_use_frame_pointer ........ 8-530
8.189 --use_gas ........ 8-531
8.190 --use_pch=filename ........ 8-532
8.191 --using_std, --no_using_std ........ 8-533
8.192 --vectorize, --no_vectorize ........ 8-534
8.193 --version_number ........ 8-535
8.194 --vfe, --no_vfe ........ 8-536
8.195 --via=filename ........ 8-537
8.196 --visibility_inlines_hidden ........ 8-538
8.197 --vla, --no_vla ........ 8-539
8.198 --vsn ........ 8-540
8.199 -W ........ 8-541
8.200 -Warmcc,option[,option,...] ........ 8-542
8.201 -Warmcc,--gcc_fallback ........ 8-543
8.202 --wchar, --no_wchar ........ 8-544
8.203 --wchar16 ........ 8-545
8.204 --wchar32 ........ 8-546
8.205 --whole_program ........ 8-547
8.206 --wrap_diagnostics, --no_wrap_diagnostics ........ 8-548

Chapter 9 Language Extensions

9.1 Preprocessor extensions ........ 9-551
9.2 #assert ........ 9-552
9.3 #include_next ........ 9-553
9.4 #unassert ........ 9-554
9.5 #warning ........ 9-555
9.6 C99 language features available in C90 ........ 9-556
9.7 // comments ........ 9-557
9.8 Subscripting struct ........ 9-558
9.9 Flexible array members ........ 9-559
9.10 C99 language features available in C++ and C90 ........ 9-560
9.11 Variadic macros ........ 9-561
9.12 long long ........ 9-562
9.13 restrict ........ 9-563
9.14 Hexadecimal floats ........ 9-564
9.15 Standard C language extensions ........ 9-565
9.16 Constant expressions ........ 9-566
9.17 Array and pointer extensions ........ 9-567
9.18 Block scope function declarations ........ 9-568
9.19 Dollar signs in identifiers ........ 9-569
9.20 Top-level declarations ........ 9-570
9.21 Benign redeclarations ........ 9-571
9.22 External entities ........ 9-572
9.23 Function prototypes ........ 9-573
9.24 Standard C++ language extensions ........ 9-574
9.25 ? operator ........ 9-575
9.26 Declaration of a class member ........ 9-576
9.27 friend ........ 9-577
9.28 Read/write constants ........ 9-578
9.29 Scalar type constants ........ 9-579
9.30 Specialization of nonmember function templates ........ 9-580
9.31 Type conversions ........ 9-581
9.32 Standard C and Standard C++ language extensions ........ 9-582
9.33 Address of a register variable ........ 9-583
9.34 Arguments to functions ........ 9-584
9.35 Anonymous classes, structures and unions ........ 9-585
9.36 Assembler labels ........ 9-586
9.37 Empty declaration ........ 9-587
9.38 Hexadecimal floating-point constants ........ 9-588
9.39 Incomplete enums ........ 9-589
9.40 Integral type extensions ........ 9-590
9.41 Label definitions ........ 9-591
9.42 Long float ........ 9-592
9.43 Nonstatic local variables ........ 9-593
9.44 Structure, union, enum, and bitfield extensions ........ 9-594
9.45 GNU extensions to the C and C++ languages ........ 9-595

Chapter 10 Compiler-specific Features

10.1 Keywords and operators ...................................................................................... 10-600
10.2 __align ................................................................................................................ 10-601
10.3 __ALIGNOF__ .................................................................................................... 10-602
10.4 __alignof__ .......................................................................................................... 10-603
10.5 __asm .................................................................................................................. 10-604
10.6 __forceinline ........................................................................................................ 10-605
10.7 __global_reg ........................................................................................................ 10-606
10.8 __inline ................................................................................................................ 10-608
10.9 __int64 ................................................................................................................ 10-609
10.10 __INTADDR__ .................................................................................................... 10-610
10.11 __irq .................................................................................................................... 10-611
10.12 __packed ............................................................................................................ 10-612
10.13 __pure ................................................................................................................ 10-614
10.14 __smc .................................................................................................................. 10-615
10.15 __softfp ................................................................................................................ 10-616
10.16 __svc .................................................................................................................. 10-617
10.17 __svc_indirect ...................................................................................................... 10-618
10.18 __svc_indirect_r7 ................................................................................................ 10-619
10.19 __value_in_regs .................................................................................................. 10-620
10.20 __weak ................................................................................................................ 10-621
10.21 __writeonly .......................................................................................................... 10-623
10.22 __declspec attributes .......................................................................................... 10-624
10.23 __declspec(dllexport) .......................................................................................... 10-625
10.24 __declspec(dllimport) .......................................................................................... 10-627
10.25 __declspec(noinline) ............................................................................................ 10-628
10.26 __declspec(noreturn) .......................................................................................... 10-629
10.27 __declspec(nothrow) ............................................................................................ 10-630
10.28 __declspec(notshared) ........................................................................................ 10-631
10.29 __declspec(thread) .............................................................................................. 10-632
10.30 Function attributes .............................................................................................. 10-633
10.31 __attribute__((alias)) function attribute ................................................................ 10-635
10.32 __attribute__((always_inline)) function attribute .................................................. 10-637
10.33 __attribute__((const)) function attribute .............................................................. 10-638
10.34 __attribute__((constructor[(priority)])) function attribute ...................................... 10-639
10.35 __attribute__((deprecated)) function attribute ...................................................... 10-640
10.36 __attribute__((destructor[(priority)])) function attribute ........................................ 10-641
10.37 __attribute__((format)) function attribute ............................................................ 10-642
10.38 __attribute__((format_arg(string-index))) function attribute ................................ 10-643
10.39 __attribute__((malloc)) function attribute ............................................................ 10-644
10.40 __attribute__((noinline)) function attribute .......................................................... 10-645
10.41 __attribute__((no_instrument_function)) function attribute .................................. 10-646
10.42 __attribute__((nomerge)) function attribute ........................................................ 10-647
10.43 __attribute__((nonnull)) function attribute ............................................................ 10-648
10.44 __attribute__((noreturn)) function attribute .......................................................... 10-649
10.45 __attribute__((notailcall)) function attribute ........................................................ 10-650
10.46 __attribute__((nothrow)) function attribute .......................................................... 10-651
10.47 __attribute__((pcs("calling_convention"))) function attribute .............................. 10-652
10.48 __attribute__((pure)) function attribute ................................................................ 10-653
10.49 __attribute__((section("name"))) function attribute .............................................. 10-654
10.50 __attribute__((sentinel)) function attribute .......................................................... 10-655
10.51 __attribute__((unused)) function attribute ............................................................ 10-656
10.52 __attribute__((used)) function attribute ................................................................ 10-657
10.53 __attribute__((visibility("visibility_type"))) function attribute ................................ 10-658
10.54 __attribute__((warn_unused_result)) .................................................................. 10-659
10.55 __attribute__((weak)) function attribute .............................................................. 10-660
10.56 __attribute__((weakref("target"))) function attribute ............................................ 10-661
10.57 Type attributes .................................................................................................... 10-662
10.58 __attribute__((bitband)) type attribute .................................................................. 10-663
10.59 __attribute__((aligned)) type attribute .................................................................. 10-664
10.60 __attribute__((packed)) type attribute .................................................................. 10-665
10.61 __attribute__((transparent_union)) type attribute ................................................ 10-666
10.62 Variable attributes ................................................................................................ 10-667
10.63 __attribute__((alias)) variable attribute ................................................................ 10-668
10.64 __attribute__((at(address))) variable attribute ...................................................... 10-669
10.65 __attribute__((aligned)) variable attribute ............................................................ 10-670
10.66 __attribute__((deprecated)) variable attribute ...................................................... 10-671
10.67 __attribute__((noinline)) constant variable attribute ............................................ 10-672
10.68 __attribute__((packed)) variable attribute ............................................................ 10-673
10.69 __attribute__((section("name"))) variable attribute .............................................. 10-674
10.70 __attribute__((unused)) variable attribute ............................................................ 10-675
10.71 __attribute__((used)) variable attribute ................................................................ 10-676
10.72 __attribute__((visibility("visibility_type"))) variable attribute ................................ 10-677
10.73 __attribute__((weak)) variable attribute .............................................................. 10-678
10.74 __attribute__((weakref("target"))) variable attribute ............................................ 10-679
10.75 __attribute__((zero_init)) variable attribute .......................................................... 10-680
10.76 Pragmas .............................................................................................................. 10-681
10.77 #pragma anon_unions, #pragma no_anon_unions .............................................. 10-682
10.78 #pragma arm ........................................................................................................ 10-683
10.79 #pragma arm section [section_type_list] .............................................................. 10-684
10.80 #pragma diag_default tag[,tag,...] ........................................................................ 10-686
10.81 #pragma diag_error tag[,tag,...] ............................................................................ 10-687
10.82 #pragma diag_remark tag[,tag,...] .......................................................................... 10-688
10.83 #pragma diag_suppress tag[,tag,...] .................................................................... 10-689
10.84 #pragma diag_warning tag[, tag, ...] .................................................................... 10-690
10.85 #pragma exceptions_unwind, #pragma no_exceptions_unwind ............................ 10-691
10.86 #pragma GCC system_header ............................................................................ 10-692
10.87 #pragma hdrstop .................................................................................................. 10-693
10.88 #pragma import symbol_name ............................................................................ 10-694
10.89 #pragma import(__use_full_stdio) ........................................................................ 10-695
10.90 #pragma import(__use_smaller_memcpy) .......................................................... 10-696
10.91 #pragma inline, #pragma no_inline ...................................................................... 10-697
10.92 #pragma no_pch .................................................................................................. 10-698
10.93 #pragma Onum .................................................................................................... 10-699
10.94 #pragma once ...................................................................................................... 10-700
10.95 #pragma Ospace .................................................................................................. 10-701
10.96 #pragma Otime .................................................................................................... 10-702
10.97 #pragma pack(n) .................................................................................................. 10-703
10.98 #pragma pop ........................................................................................................ 10-705
10.99 #pragma push ...................................................................................................... 10-706
10.100 #pragma softfp_linkage, #pragma no_softfp_linkage .......................................... 10-707
10.101 #pragma thumb .................................................................................................. 10-708
10.102 #pragma unroll [(n)] ............................................................................................ 10-709
10.103 #pragma unroll_completely .................................................................................. 10-711
10.104 #pragma weak symbol, #pragma weak symbol1 = symbol2 .............................. 10-712
10.105 Instruction intrinsics ............................................................................................ 10-713
10.106 __breakpoint intrinsic .......................................................................................... 10-714
10.107 __cdp intrinsic ...................................................................................................... 10-715
10.108 __clrex intrinsic .................................................................................................... 10-716
10.109 __clz intrinsic ...................................................................................................... 10-717
10.110 __current_pc intrinsic .......................................................................................... 10-718
10.111 __current_sp intrinsic .......................................................................................... 10-719
10.112 __disable_fiq intrinsic .......................................................................................... 10-720
10.113 __disable_irq intrinsic .......................................................................................... 10-721
10.114 __dmb intrinsic .................................................................................................... 10-723
10.115 __dsb intrinsic ...................................................................................................... 10-724
10.116 __enable_fiq intrinsic .......................................................................................... 10-725
10.117 __enable_irq intrinsic .......................................................................................... 10-726
10.118 __fabs intrinsic .................................................................................................... 10-727
10.119 __fabsf intrinsic .................................................................................................... 10-728
10.120 __force_loads intrinsic ........................................................................................ 10-729
10.121 __force_stores intrinsic ........................................................................................ 10-730
10.122 __isb intrinsic ...................................................................................................... 10-731
10.123 __ldrex intrinsic .................................................................................................... 10-732
10.124 __ldrexd intrinsic .................................................................................................. 10-734
10.125 __ldrt intrinsic ...................................................................................................... 10-735
10.126 __memory_changed intrinsic .............................................................................. 10-736
10.127 __nop intrinsic .................................................................................................... 10-737
10.128 __pld intrinsic ...................................................................................................... 10-739
10.129 __pldw intrinsic .................................................................................................... 10-740
10.130 __pli intrinsic ........................................................................................................ 10-741
10.131 __promise intrinsic .............................................................................................. 10-742
10.132 __qadd intrinsic .................................................................................................... 10-743
10.133 __qdbl intrinsic .................................................................................................... 10-744
10.134 __qsub intrinsic .................................................................................................... 10-745
10.135 __rbit intrinsic ...................................................................................................... 10-746
10.136 __rev intrinsic ...................................................................................................... 10-747
10.137 __return_address intrinsic .................................................................................. 10-748
10.138 __ror intrinsic ...................................................................................................... 10-749
10.139 __schedule_barrier intrinsic ................................................................................ 10-750
10.140 __semihost intrinsic ............................................................................................ 10-751
10.141 __sev intrinsic ...................................................................................................... 10-753
10.142 __sqrt intrinsic .................................................................................................... 10-754
10.143 __sqrtf intrinsic .................................................................................................... 10-755
10.144 __ssat intrinsic .................................................................................................... 10-756
10.145 __strex intrinsic .................................................................................................... 10-757
10.146 __strexd intrinsic .................................................................................................. 10-759
10.147 __strt intrinsic ...................................................................................................... 10-761
10.148 __swp intrinsic .................................................................................................... 10-762
10.149 __usat intrinsic .................................................................................................... 10-763
10.150 __wfe intrinsic ...................................................................................................... 10-764
10.151 __wfi intrinsic ...................................................................................................... 10-765
10.152 __yield intrinsic .................................................................................................... 10-766
10.153 ARMv6 SIMD intrinsics ........................................................................................ 10-767
10.154 ETSI basic operations .......................................................................................... 10-768
10.155 C55x intrinsics .................................................................................................... 10-770
10.156 VFP status intrinsic .............................................................................................. 10-771
10.157 __vfp_status intrinsic .......................................................................................... 10-772
10.158 Fused Multiply Add (FMA) intrinsics .................................................................... 10-773
10.159 Named register variables .................................................................................... 10-774
10.160 GNU built-in functions .......................................................................................... 10-778
10.161 Predefined macros .............................................................................................. 10-786
10.162 Built-in function name variables .......................................................................... 10-792

Chapter 11 C and C++ Implementation Details

11.1 Character sets and identifiers in ARM C and C++ .............................................. 11-794
11.2 Basic data types in ARM C and C++ .................................................................... 11-796
11.3 Operations on basic data types in ARM C and C++ ............................................ 11-798
11.4 Structures, unions, enumerations, and bitfields in ARM C and C++ .................... 11-799
11.5 Using the ::operator new function in ARM C++ .................................................. 11-805
11.6 Tentative arrays in ARM C++ .............................................................................. 11-806
11.7 Old-style C parameters in ARM C++ functions .................................................... 11-807
11.8 Anachronisms in ARM C++ .................................................................................. 11-808
11.9 Template instantiation in ARM C++ .................................................................... 11-809
11.10 Namespaces in ARM C++ .................................................................................. 11-810
11.11 C++ exception handling in ARM C++ .................................................................. 11-812
11.12 Extern inline functions in ARM C++ .................................................................... 11-813
11.13 C++11 supported features .................................................................................. 11-814

Chapter 12 ARMv6 SIMD Instruction Intrinsics

12.1 ARMv6 SIMD intrinsics by prefix .......................................................................... 12-819
12.2 ARMv6 SIMD intrinsics, summary descriptions, byte lanes, affected flags .......... 12-821
12.3 ARMv6 SIMD intrinsics, compatible processors and architectures ...................... 12-824
12.4 ARMv6 SIMD instruction intrinsics and APSR GE flags ...................................... 12-825
12.5 __qadd16 intrinsic ................................................................................................ 12-827
12.6 __qadd8 intrinsic .................................................................................................. 12-828
12.7 __qasx intrinsic .................................................................................................... 12-829
12.8 __qsax intrinsic .................................................................................................... 12-830
12.9 __qsub16 intrinsic ................................................................................................ 12-831
12.10 __qsub8 intrinsic .................................................................................................. 12-832
12.11 __sadd16 intrinsic ................................................................................................ 12-833
12.12 __sadd8 intrinsic .................................................................................................. 12-834
12.13 __sasx intrinsic .................................................................................................... 12-835
12.14 __sel intrinsic ...................................................................................................... 12-836
12.15 __shadd16 intrinsic .............................................................................................. 12-837
12.16 __shadd8 intrinsic ................................................................................................ 12-838
12.17 __shasx intrinsic .................................................................................................. 12-839
12.18 __shsax intrinsic .................................................................................................. 12-840
12.19 __shsub16 intrinsic .............................................................................................. 12-841
12.20 __shsub8 intrinsic ................................................................................................ 12-842
12.21 __smlad intrinsic .................................................................................................. 12-843
12.22 __smladx intrinsic ................................................................................................ 12-844
12.23 __smlald intrinsic .................................................................................................. 12-845
12.24 __smlaldx intrinsic ................................................................................................ 12-846
12.25 __smlsd intrinsic .................................................................................................. 12-847
12.26 __smlsdx intrinsic ................................................................................................ 12-848
12.27 __smlsld intrinsic .................................................................................................. 12-849
12.28 __smlsldx intrinsic ................................................................................................ 12-850
12.29 __smuad intrinsic .................................................................................................. 12-851
12.30 __smuadx intrinsic ................................................................................................ 12-852
12.31 __smusd intrinsic .................................................................................................. 12-853
12.32 __smusdx intrinsic ................................................................................................ 12-854
12.33 __ssat16 intrinsic .................................................................................................. 12-855
12.34 __ssax intrinsic .................................................................................................... 12-856
12.35 __ssub16 intrinsic ................................................................................................ 12-857
12.36 __ssub8 intrinsic .................................................................................................. 12-858
12.37 __sxtab16 intrinsic ................................................................................................ 12-859
12.38 __sxtb16 intrinsic .................................................................................................. 12-860
12.39 __uadd16 intrinsic ................................................................................................ 12-861
12.40 __uadd8 intrinsic .................................................................................................. 12-862
12.41 __uasx intrinsic .................................................................................................... 12-863
12.42 __uhadd16 intrinsic .............................................................................................. 12-864
12.43 __uhadd8 intrinsic ................................................................................................ 12-865
12.44 __uhasx intrinsic .................................................................................................. 12-866
12.45 __uhsax intrinsic .................................................................................................. 12-867
12.46 __uhsub16 intrinsic .............................................................................................. 12-868
12.47 __uhsub8 intrinsic ................................................................................................ 12-869
12.48 __uqadd16 intrinsic .............................................................................................. 12-870
12.49 __uqadd8 intrinsic ................................................................................................ 12-871
12.50 __uqasx intrinsic .................................................................................................. 12-872
12.51 __uqsax intrinsic .................................................................................................. 12-873
12.52 __uqsub16 intrinsic .............................................................................................. 12-874
12.53 __uqsub8 intrinsic ................................................................................................ 12-875
12.54 __usad8 intrinsic .................................................................................................. 12-876
12.55 __usada8 intrinsic ................................................................................................ 12-877
12.56 __usat16 intrinsic .................................................................................................. 12-878
12.57 __usax intrinsic .................................................................................................... 12-879
12.58 __usub16 intrinsic ................................................................................................ 12-880
12.59 __usub8 intrinsic .................................................................................................. 12-881
12.60 __uxtab16 intrinsic ................................................................................................ 12-882
12.61 __uxtb16 intrinsic .................................................................................................. 12-883

Chapter 13 Via File Syntax

13.1 Overview of via files .............................................................................................. 13-885
13.2 Via file syntax rules .............................................................................................. 13-886

Chapter 14 Summary Table of GNU Language Extensions

14.1 Supported GNU extensions .................................................................................. 14-888

Chapter 15 Standard C Implementation Definition

15.1 Implementation definition .................................................................................... 15-892
15.2 Translation .......................................................................................................... 15-893
15.3 Environment ........................................................................................................ 15-894
15.4 Identifiers ............................................................................................................ 15-896
15.5 Characters ............................................................................................................ 15-897
15.6 Integers ................................................................................................................ 15-899
15.7 Floating-point ...................................................................................................... 15-900
15.8 Arrays and pointers .............................................................................................. 15-901
15.9 Registers .............................................................................................................. 15-902
15.10 Structures, unions, enumerations, and bitfields .................................................. 15-903
15.11 Qualifiers ............................................................................................................ 15-907
15.12 Expression evaluation .......................................................................................... 15-908
15.13 Preprocessing directives ...................................................................................... 15-909
15.14 Library functions .................................................................................................. 15-910
15.15 Behaviors considered undefined by the ISO C Standard .................................... 15-911

Chapter 16 Standard C++ Implementation Definition

16.1 Integral conversion .............................................................................................. 16-913
16.2 Calling a pure virtual function .............................................................................. 16-914
16.3 Major features of language support .................................. .................................. 16-915
16.4 Standard C++ library implementation definition ......................... ......................... 16-916

Chapter 17 C and C++ Compiler Implementation Limits

17.1 C++ ISO/IEC standard limits ....................................... ....................................... 17-918
17.2 Limits for integral numbers .................................................................................. 17-920
17.3 Limits for floating-point numbers .................................... .................................... 17-921

ARM DUI0472M Copyright © 2010-2016 ARM. All rights reserved. 19

Non-Confidential
Chapter 18 Using NEON Support
18.1 Introduction to NEON intrinsics ........ 18-925
18.2 Vector data types ........ 18-926
18.3 NEON intrinsics ........ 18-927
18.4 NEON intrinsics for addition ........ 18-929
18.5 NEON intrinsics for multiplication ........ 18-931
18.6 NEON intrinsics for subtraction ........ 18-933
18.7 NEON intrinsics for comparison ........ 18-935
18.8 NEON intrinsics for absolute difference ........ 18-937
18.9 NEON intrinsics for maximum and minimum ........ 18-938
18.10 NEON intrinsics for pairwise addition ........ 18-939
18.11 NEON intrinsics for folding maximum ........ 18-940
18.12 NEON intrinsics for folding minimum ........ 18-941
18.13 NEON intrinsics for reciprocal and sqrt ........ 18-942
18.14 NEON intrinsics for shifts by signed variable ........ 18-943
18.15 NEON intrinsics for shifts by a constant ........ 18-945
18.16 NEON intrinsics for shifts with insert ........ 18-949
18.17 NEON intrinsics for loading a single vector or lane ........ 18-951
18.18 NEON intrinsics for storing a single vector or lane ........ 18-953
18.19 NEON intrinsics for loading an N-element structure ........ 18-955
18.20 NEON intrinsics for extracting lanes from a vector into a register ........ 18-964
18.21 NEON intrinsics for loading a single lane of a vector from a literal ........ 18-965
18.22 NEON intrinsics for initializing a vector from a literal bit pattern ........ 18-966
18.23 NEON intrinsics for setting all lanes to the same value ........ 18-967
18.24 NEON intrinsics for combining vectors ........ 18-969
18.25 NEON intrinsics for splitting vectors ........ 18-970
18.26 NEON intrinsics for converting vectors ........ 18-971
18.27 NEON intrinsics for table look up ........ 18-972
18.28 NEON intrinsics for extended table look up ........ 18-973
18.29 NEON intrinsics for operations with a scalar value ........ 18-974
18.30 NEON intrinsics for vector extraction ........ 18-978
18.31 NEON intrinsics for reversing vector elements (swap endianness) ........ 18-979
18.32 NEON intrinsics for other single operand arithmetic ........ 18-980
18.33 NEON intrinsics for logical operations ........ 18-982
18.34 NEON intrinsics for transposition operations ........ 18-984
18.35 NEON intrinsics for vector cast operations ........ 18-985
18.36 NEON instructions without equivalent intrinsics ........ 18-986

Appendix A Compiler Document Revisions

A.1 Revisions for armcc Compiler User Guide ........ Appx-A-989

List of Figures
ARM® Compiler armcc User Guide

Figure 2-1 GCC fallback process diagram .............................................................................................. 2-58
Figure 5-1 Half-precision floating-point format ...................................................................................... 5-211
Figure 10-1 Nonpacked structure S ...................................................................................................... 10-703
Figure 10-2 Packed structure SP .......................................................................................................... 10-703
Figure 11-1 Conventional nonpacked structure example ...................................................................... 11-800
Figure 11-2 Bitfield allocation 1 ............................................................................................................. 11-802
Figure 11-3 Bitfield allocation 2 ............................................................................................................. 11-802
Figure 15-1 Conventional nonpacked structure example ..................................................................... 15-904
Figure 15-2 Bitfield allocation 1 ............................................................................................................. 15-906
Figure 15-3 Bitfield allocation 2 ............................................................................................................. 15-906

List of Tables

Table 2-1 Filename suffixes recognized by the compiler ....................................................................... 2-49
Table 2-2 Include file search paths ........................................................................................................ 2-53
Table 3-1 Array A ................................................................................................................................... 3-70
Table 3-2 Array B ................................................................................................................................... 3-70
Table 3-3 Result .................................................................................................................................... 3-70
Table 3-4 Vectorizable and nonvectorizable loops ................................................................................ 3-87
Table 3-5 Factors that limit or prevent automatic vectorization ........................................................... 3-101
Table 5-1 C code for incrementing and decrementing loops ............................................................... 5-159
Table 5-2 Disassembly for incrementing and decrementing loops .................................................. 5-159
Table 5-3 C code for rolled and unrolled bit-counting loops ................................................................ 5-161
Table 5-4 Disassembly for rolled and unrolled bit-counting loops ....................................................... 5-162
Table 5-5 C code for nonvolatile and volatile buffer loops ................................................................... 5-163
Table 5-6 Disassembly for nonvolatile and volatile buffer loops ............................................................ 5-164
Table 5-7 C code for pure and impure functions ................................................................................. 5-175
Table 5-8 Disassembly for pure and impure functions ........................................................................ 5-175
Table 5-9 Compiler storage of data objects by byte alignment ............................................................ 5-190
Table 5-10 C code for an unpacked struct, a packed struct, and a struct with individually packed fields ........ 5-198
Table 5-11 Disassembly for an unpacked struct, a packed struct, and a struct with individually packed fields ........ 5-198
Table 5-12 C code for a packed struct and a pragma packed struct ..................................................... 5-199
Table 5-13 Compiler options for floating-point linkage and floating-point computations ....................... 5-214
Table 5-14 FPU-option capabilities and requirements ........................................................................... 5-216
Table 5-15 Implicit FPUs of processors ................................................................................................. 5-218
Table 6-1 Severity of diagnostic messages ........................................................................................ 6-254
Table 6-2 Identifying diagnostic messages .......................................................................................... 6-259
Table 7-1 Differences between inline and embedded assembler ........................................................ 7-311
Table 8-1 Compiling with the --asm option .......................................................................................... 8-335
Table 8-2 Compatible processor or architecture combinations ........................................................... 8-351
Table 8-3 Supported ARM architectures ............................................................................................. 8-368
Table 8-4 Compiling with the --interleave option ................................................................................. 8-442
Table 8-5 Compiling with the -o option ................................................................................................ 8-471
Table 8-6 Compiling without the -o option ........................................................................................... 8-471
Table 9-1 Behavior of constant value initializers in comparison with ISO Standard C ........................ 9-566
Table 10-1 Keyword extensions that the ARM compiler supports ....................................................... 10-600
Table 10-2 __declspec attributes that the compiler supports, and their equivalents ........................... 10-624
Table 10-3 Function attributes that the compiler supports, and their equivalents ............................... 10-633
Table 10-4 Type attributes that the compiler supports, and their equivalents ..................................... 10-662
Table 10-5 Variable attributes that the compiler supports, and their equivalents ................................ 10-667
Table 10-6 Pragmas that the compiler supports .................................................................................. 10-681
Table 10-7 Instruction intrinsics that the ARM compiler supports ........................................................ 10-713
Table 10-8 Access widths that the __ldrex intrinsic supports .............................................................. 10-732
Table 10-9 Access widths that the __ldrexd intrinsic supports .............................................................. 10-734
Table 10-10 Access widths that the __ldrt intrinsic supports ................................................................. 10-735
Table 10-11 Access widths that the __strex intrinsic supports .............................................................. 10-757
Table 10-12 Access widths that the __strexd intrinsic supports ............................................................ 10-759
Table 10-13 Access widths that the __strt intrinsic supports ................................................................. 10-761
Table 10-14 Access widths that the __swp intrinsic supports ............................................................... 10-762
Table 10-15 ETSI basic operations that the ARM compilation tools support ......................................... 10-768
Table 10-16 ETSI status flags exposed in the ARM compilation tools .................................................. 10-768
Table 10-17 TI C55x intrinsics that the compilation tools support ......................................................... 10-770
Table 10-18 Modifying the FPSCR flags ............................................................................................... 10-772
Table 10-19 Named registers available on ARM architecture-based processors .................................. 10-775
Table 10-20 Named registers available on targets with floating-point hardware ................................... 10-775
Table 10-21 Predefined macros ............................................................................................................ 10-786
Table 10-22 Thumb architecture versions in relation to ARM architecture versions ............................. 10-791
Table 10-23 Built-in variables ................................................................................................................. 10-792
Table 11-1 Character escape codes .................................................................................................... 11-794
Table 11-2 Size and alignment of data types ....................................................................................... 11-796
Table 12-1 ARMv6 SIMD intrinsics by prefix ....................................................................................... 12-819
Table 12-2 ARMv6 SIMD intrinsics, summary descriptions, byte lanes, affected flags ....................... 12-821
Table 12-3 ARMv6 SIMD intrinsics, compatible processors and architectures ................................... 12-824
Table 12-4 ARMv6 SIMD instruction intrinsics and APSR GE flags .................................................... 12-825
Table 14-1 Supported GNU extensions ............................................................................................... 14-888
Table 15-1 Character escape codes .................................................................................................... 15-897
Table 16-1 Major feature support for language .................................................................................. 16-915
Table 17-1 Implementation limits ......................................................................................................... 17-918
Table 17-2 Integer ranges ................................................................................................................... 17-920
Table 17-3 Floating-point limits ........................................................................................................... 17-921
Table 17-4 Other floating-point characteristics ................................................................................... 17-921
Table 18-1 Vector data types ............................................................................................................... 18-926
Table 18-2 NEON instructions without equivalent intrinsics ................................................................ 18-986
Table A-1 Differences between issue L and issue M ................................................................. Appx-A-989
Table A-2 Differences between issue K and issue L .................................................................. Appx-A-989
Table A-3 Differences between issue J and issue K .................................................................. Appx-A-989
Table A-4 Differences between issue I and issue J ................................................................... Appx-A-991
Table A-5 Differences between issue H and issue I ................................................................... Appx-A-992
Table A-6 Differences between issue G and issue H ................................................................. Appx-A-993
Table A-7 Differences between issue F and issue G ................................................................. Appx-A-994
Table A-8 Differences between issue E and issue F .................................................................. Appx-A-995
Table A-9 Differences between issue D and issue E ................................................................. Appx-A-996
Table A-10 Differences between issue C and issue D ................................................................. Appx-A-997
Table A-11 Differences between issue B and issue C ............................................................... Appx-A-1000
Table A-12 Differences between issue A and issue B ............................................................... Appx-A-1000

Preface

This preface introduces the ARM® Compiler armcc User Guide.

It contains the following:
• About this book on page 26.

Preface
About this book

About this book

The ARM® Compiler armcc User Guide provides user information for the ARM compiler, armcc. armcc is
an optimizing C and C++ compiler that compiles Standard C and Standard C++ source code into machine
code for ARM architecture-based processors.

Using this book

This book is organized into the following chapters:
Chapter 1 Overview of the Compiler
Gives an overview of the ARM compiler, the languages and extensions it supports, and the
provided libraries.
Chapter 2 Getting Started with the Compiler
Introduces some of the more common ARM compiler command-line options.
Chapter 3 Using the NEON Vectorizing Compiler
Introduces the NEON unit and explains how to take advantage of automatic vectorizing features.
Chapter 4 Compiler Features
Provides an overview of ARM-specific features of the compiler.
Chapter 5 Compiler Coding Practices
Describes programming techniques and practices to help you increase the portability, efficiency
and robustness of your C and C++ source code.
Chapter 6 Compiler Diagnostic Messages
Describes the format of compiler diagnostic messages and how to control the output during
compilation.
Chapter 7 Using the Inline and Embedded Assemblers of the ARM Compiler
Describes the optimizing inline assembler and non-optimizing embedded assembler of the ARM
compiler, armcc.
Chapter 8 Compiler Command-line Options
Describes the armcc compiler command-line options.
Chapter 9 Language Extensions
Describes the language extensions that the compiler supports.
Chapter 10 Compiler-specific Features
Describes compiler-specific features including ARM extensions to the C and C++ Standards,
ARM-specific pragmas and intrinsics, and predefined macros.
Chapter 11 C and C++ Implementation Details
Describes the language implementation details for the compiler. Some language implementation
details are common to both C and C++, while others are specific to C++.
Chapter 12 ARMv6 SIMD Instruction Intrinsics
Describes the ARMv6 SIMD instruction intrinsics. SIMD instructions allow the processor to
operate on packed 8-bit or 16-bit values in 32-bit registers.
Chapter 13 Via File Syntax
Describes the syntax of via files accepted by armcc.
Chapter 14 Summary Table of GNU Language Extensions
Describes ARM compiler support for GNU extensions to the C and C++ languages.
Chapter 15 Standard C Implementation Definition
Provides information required by the ISO C standard for conforming C implementations.

Chapter 16 Standard C++ Implementation Definition

Lists the C++ language features defined in the ISO/IEC standard for C++, and states whether or
not ARM C++ supports that language feature.
Chapter 17 C and C++ Compiler Implementation Limits
Describes the implementation limits when using the ARM compiler to compile C and C++.
Chapter 18 Using NEON Support
Describes NEON intrinsics support in this release of the ARM compilation tools.
Appendix A Compiler Document Revisions
Describes the technical changes that have been made to the armcc Compiler User Guide.

Glossary
The ARM Glossary is a list of terms used in ARM documentation, together with definitions for those
terms. The ARM Glossary does not contain terms that are industry standard unless the ARM meaning
differs from the generally accepted meaning.
See the ARM Glossary for more information.

Typographic conventions
italic
Introduces special terminology, and denotes cross-references and citations.
bold
Highlights interface elements, such as menu names. Denotes signal names. Also used for terms
in descriptive lists, where appropriate.
monospace
Denotes text that you can enter at the keyboard, such as commands, file and program names,
and source code.
monospace underline
Denotes a permitted abbreviation for a command or option. You can enter the underlined text
instead of the full command or option name.
monospace italic
Denotes arguments to monospace text where the argument is to be replaced by a specific value.
monospace bold
Denotes language keywords when used outside example code.
< and >
Encloses replaceable terms for assembler syntax where they appear in code or code fragments.
For example:
MRC p15, 0, <Rd>, <CRn>, <CRm>, <Opcode_2>

SMALL CAPITALS
Used in body text for a few terms that have specific technical meanings, that are defined in the
ARM glossary. For example, IMPLEMENTATION DEFINED, IMPLEMENTATION SPECIFIC, UNKNOWN, and
UNPREDICTABLE.

Feedback

Feedback on this product

If you have any comments or suggestions about this product, contact your supplier and give:
• The product name.
• The product revision or version.
• An explanation with as much information as you can provide. Include symptoms and diagnostic
procedures if appropriate.

Feedback on content
If you have comments on content then send an e-mail to [email protected]. Give:
• The title ARM® Compiler armcc User Guide.
• The number ARM DUI0472M.
• If applicable, the page number(s) to which your comments refer.
• A concise explanation of your comments.
ARM also welcomes general suggestions for additions and improvements.
Note
ARM tests the PDF only in Adobe Acrobat and Acrobat Reader, and cannot guarantee the quality of the
represented document when used with any other PDF reader.

Other information
• ARM Information Center.
• ARM Technical Support Knowledge Articles.
• Support and Maintenance.
• ARM Glossary.

Chapter 1
Overview of the Compiler

Gives an overview of the ARM compiler, the languages and extensions it supports, and the provided
libraries.
It contains the following sections:
• 1.1 The compiler on page 1-30.
• 1.2 Source language modes of the compiler on page 1-31.
• 1.3 Language extensions on page 1-33.
• 1.4 Language compliance on page 1-34.
• 1.5 The C and C++ libraries on page 1-35.

1 Overview of the Compiler
1.1 The compiler

1.1 The compiler

The compiler, armcc, is an optimizing C and C++ compiler that compiles Standard C and Standard C++
source code into machine code for ARM architecture-based processors.
Command-line options enable you to control the level of optimization.
The compiler compiles the following different varieties of C and C++ source code into ARM and
Thumb® code:
• ISO Standard C:1990 source.
• ISO Standard C:1999 source.
• ISO Standard C++:2003 source.
• ISO Standard C++:2011 source.
Publications on the C and C++ standards are available from national standards bodies, for example
AFNOR in France and ANSI in the USA.
The compiler also provides a vectorization mode for ARM processors that have NEON™ technology,
enabling use of the ARM Advanced Single Instruction Multiple Data (SIMD) extension. Vectorization
involves the compiler generating NEON vector instructions directly from C or C++ code.
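The kind of loop that vectorization targets can be sketched as follows. This is an illustrative example, not code taken from this guide: the function name vec_add is hypothetical, and whether armcc actually emits NEON instructions for it depends on the target processor and on the vectorization options described in Chapter 3. The loop has independent iterations and unit stride, and the restrict qualifiers tell the compiler that the arrays do not overlap, since possible pointer aliasing can otherwise limit vectorization.

```c
#include <stddef.h>

/* Element-wise addition of two float arrays (hypothetical example).
   Each iteration is independent of the others, so a vectorizing
   compiler can process several elements per NEON instruction.
   restrict promises that dst, a, and b do not alias. */
void vec_add(float *restrict dst, const float *restrict a,
             const float *restrict b, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        dst[i] = a[i] + b[i];
    }
}
```

The result is the same whether or not the loop is vectorized; only the generated instructions differ.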
armcc complies with the Base Standard Application Binary Interface for the ARM Architecture (BSABI).
In particular, the compiler:
• Generates output objects in ELF format.
• Generates Debug With Arbitrary Record Format Debugging Standard Version 3 (DWARF 3) debug
information and contains support for DWARF 2 debug tables.
• Uses the Edison Design Group (EDG) front end.
Many features of the compiler are designed to take advantage of the target processor or architecture that
your code is designed to run on, so knowledge of your target processor or architecture is useful, and in
some cases, essential, when working with the compiler.
Note
Be aware of the following:
• Generated code might be different between two ARM® Compiler releases.
• For a feature release, there might be significant code generation differences.

Note
The command-line option descriptions and related information in the individual ARM Compiler tools
documents describe all the features that ARM Compiler supports. Any features not documented are not
supported and are used at your own risk. You are responsible for making sure that any generated code
using unsupported features is operating correctly.

Related concepts
3.1 NEON technology on page 3-69.

Related information
The DWARF Debugging Standard, http://dwarfstd.org/.
Application Binary Interface (ABI) for the ARM Architecture.

1.2 Source language modes of the compiler

The compiler can compile different varieties of C and C++ source code.
ISO C90
The compiler compiles C as defined by the 1990 C standard and addenda:
• ISO/IEC 9899:1990. The 1990 International Standard for C.
• ISO/IEC 9899 AM1. The 1995 Normative Addendum 1, adding international character
support through wchar.h and wctype.h.
ISO C99
The compiler compiles C as defined by the 1999 C standard and addenda:
• ISO/IEC 9899:1999. The 1999 International Standard for C.
• ISO/IEC 9899:1999/Cor 2:2004. Technical Corrigendum 2.
ISO C++03
The compiler compiles C++ as defined by the 2003 standard, excepting export templates:
• ISO/IEC 14882:2003. The 2003 International Standard for C++.
ISO C++11
The compiler compiles supported features of C++11 as defined by the 2011 standard:
• ISO/IEC 14882:2011. The 2011 International Standard for C++.
The compiler provides support for numerous extensions to the C and C++ languages. For example, it
supports some GNU compiler extensions. The compiler has several modes in which compliance with a
source language is either enforced or relaxed:
Strict mode
In strict mode the compiler enforces compliance with the language standard relevant to the
source language.
To compile in strict mode, use the command-line option --strict.
GNU mode
In GNU mode all the GNU compiler extensions to the relevant source language are available.
To compile in GNU mode, use the compiler option --gnu.
Throughout this document, the term:
C90
Means ISO C90, together with the ARM extensions.
Use the compiler option --c90 to compile C90 code. This is the default.
Strict C90
Means C as defined by the 1990 C standard and addenda.
Use the compiler options --c90 --strict to enforce strict C90 code. Because C90 is the
default, you can omit --c90.
C99
Means ISO C99, together with the ARM and GNU extensions.
Use the compiler option --c99 to compile C99 code.
Strict C99
Means C as defined by the 1999 C standard and addenda.
Use the compiler options --c99 --strict to compile strict C99 code.
Standard C
Means C90 or C99 as appropriate.

C
Means any of C90, strict C90, C99, strict C99, and Standard C.
C++03
Means ISO C++03, excepting export templates, either with or without the ARM extensions.
Use the compiler option --cpp to compile C++03 code.
Use the compiler options --cpp --cpp_compat to maximize binary compatibility with C++03
code compiled using older compiler versions.
Strict C++03
Means ISO C++03, excepting export templates.
Use the compiler options --cpp --strict to compile strict C++03 code.
C++11
Means ISO C++11, either with or without the ARM extensions.
Use the compiler option --cpp11 to compile C++11 code.
Use the compiler options --cpp11 --cpp_compat to compile a subset of C++11 code that
maximizes compatibility with code compiled to the C++ 2003 standard.
Strict C++11
Means ISO C++11.
Use the compiler options --cpp11 --strict to compile strict C++11 code.
Standard C++
Means strict C++03 or strict C++11 as appropriate.
C++
Means any of C++03, strict C++03, C++11, strict C++11.
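One way to check at compile time which of these source language modes is in effect is to test the macros that the C and C++ standards themselves require. The function below is a hypothetical sketch that uses only those standard macros. It assumes, as the standards specify, that __STDC_VERSION__ is undefined or below 199901L in C90 mode and that __cplusplus is at least 201103L in C++11 mode. ARM-specific predefined macros (see Table 10-21) are not used here.

```c
/* Hypothetical helper: reports the language standard that this
   translation unit is compiled against, using only standard macros. */
const char *language_mode(void)
{
#if defined(__cplusplus)
#  if __cplusplus >= 201103L
    return "C++11 or later";
#  else
    return "C++03";
#  endif
#elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
    return "C99 or later";
#else
    /* C90, including C90 with Amendment 1 (__STDC_VERSION__ == 199409L). */
    return "C90";
#endif
}
```

Under the options listed above, --c90 would select the last branch, --c99 the second, and --cpp or --cpp11 the first.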

Related concepts
5.59 New language features of C99 on page 5-228.
5.64 Hexadecimal floating-point numbers in C99 on page 5-234.

Related references
1.3 Language extensions on page 1-33.
1.4 Language compliance on page 1-34.
11.13 C++11 supported features on page 11-814.
15.1 Implementation definition on page 15-892.
16.4 Standard C++ library implementation definition on page 16-916.

1.3 Language extensions

The compiler supports numerous extensions to its various source languages.
These language extensions are categorized as follows:
C99 features
The compiler makes some language features of C99 available:
• As extensions to strict C90, for example, //-style comments.
• As extensions to both Standard C++ and strict C90, for example, restrict pointers.
Standard C extensions
The compiler supports numerous extensions to strict C99, for example, function prototypes that
override old-style nonprototype definitions.
These extensions to Standard C are also available in C90.
Standard C++ extensions
The compiler supports numerous extensions to strict C++, for example, qualified names in the
declaration of class members.
These extensions are not available in either Standard C or C90.
Standard C and Standard C++ extensions
The compiler supports some extensions specific to strict C++ and strict C90, for example,
anonymous classes, structures, and unions.
GNU extensions
The compiler supports some GNU extensions.
ARM-specific extensions
The compiler supports a range of extensions specific to the ARM compiler, for example,
instruction intrinsics and other built-in functions.
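The following sketch, written as C99, illustrates two of the categories above: C99 features (restrict pointers and //-style comments) and an anonymous union, which is listed under Standard C and Standard C++ extensions. The names copy_ints and struct variant are hypothetical. The options needed to accept each construct in C90 or C++ mode are covered in the related references; strict C90 rejects the //-style comment.

```c
#include <stddef.h>

/* Copies n ints. The restrict qualifiers (a C99 feature) promise the
   compiler that dst and src do not overlap. */
void copy_ints(int *restrict dst, const int *restrict src, size_t n)
{
    for (size_t i = 0; i < n; ++i)  // C99-style comment
        dst[i] = src[i];
}

/* A structure containing an anonymous union: the union members are
   accessed directly, for example v.as_int, without naming the union. */
struct variant {
    int is_float;
    union {
        int   as_int;
        float as_float;
    };
};
```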

Related references
9.6 C99 language features available in C90 on page 9-556.
9.10 C99 language features available in C++ and C90 on page 9-560.
9.15 Standard C language extensions on page 9-565.
9.24 Standard C++ language extensions on page 9-574.
9.32 Standard C and Standard C++ language extensions on page 9-582.
1.4 Language compliance on page 1-34.
9.45 GNU extensions to the C and C++ languages on page 9-595.
Chapter 14 Summary Table of GNU Language Extensions on page 14-887.

1.4 Language compliance

The compiler provides several command-line options for either enforcing or relaxing compliance with
the available source languages.
Strict mode
In strict mode the compiler enforces compliance with the language standard relevant to the
source language. For example, the use of //-style comments results in an error when compiling
strict C90.
To compile in strict mode, use the command-line option --strict.
GNU mode
In GNU mode all the GNU compiler extensions to the relevant source language are available.
For example, in GNU mode:
• Case ranges in switch statements are available when the source language is any of C90, C99
or nonstrict C++.
• C99-style designated initializers are available when the source language is either C90 or
nonstrict C++.
To compile in GNU mode, use the compiler option --gnu.
Note
Some GNU extensions are also available when you are in a nonstrict mode.
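As a minimal sketch, the following C code uses the two GNU-mode features named above: a case range in a switch statement and a C99-style designated initializer. Case ranges are a GNU extension rather than standard C, so a compiler enforcing strict ISO C rejects them. The function names char_class and weight_at are hypothetical.

```c
/* Classify a character using GNU case ranges (write spaces around "..."). */
int char_class(int c)
{
    switch (c) {
    case '0' ... '9':
        return 1;               /* digit */
    case 'a' ... 'z':
    case 'A' ... 'Z':
        return 2;               /* letter */
    default:
        return 0;               /* anything else */
    }
}

/* Designated initializer: elements that are not named are zero. */
static const int weights[8] = { [1] = 10, [4] = 40 };

int weight_at(int i)
{
    return weights[i];
}
```

With armcc, this file would need GNU mode (--gnu) when the source language is C90 or C++, as described above.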

Examples
The following examples illustrate combining source language modes with language compliance modes:
• Compiling a .cpp file with the command-line option --strict compiles Standard C++03.
• Compiling a C source file with the command-line option --gnu compiles GNU mode C90.
• Compiling a .c file with the command-line options --strict and --gnu is an error.

Related references
8.93 --gnu on page 8-424.
8.176 --strict, --no_strict on page 8-513.
9.45 GNU extensions to the C and C++ languages on page 9-595.
2.7 Filename suffixes recognized by the compiler on page 2-49.
Chapter 14 Summary Table of GNU Language Extensions on page 14-887.

1.5 The C and C++ libraries

ARM provides a number of runtime C and C++ libraries, including the ARM C libraries, the Rogue
Wave Standard C++ Library, and support libraries.
The following runtime C and C++ libraries are provided:
The ARM C libraries
The ARM C libraries provide standard C functions, and helper functions used by the C and C++
libraries. The C libraries also provide target-dependent functions that implement the standard C
library functions such as printf() in a semihosted environment. The C libraries are structured
so that you can redefine target-dependent functions in your own code to remove semihosting
dependencies.
The ARM libraries comply with:
• The C Library ABI for the ARM Architecture (CLIBABI).
• The C++ ABI for the ARM Architecture (CPPABI).
Rogue Wave Standard C++ Library
The Rogue Wave Standard C++ Library, as supplied by Rogue Wave Software, Inc., provides
Standard C++ functions and objects such as cout. It includes data structures and algorithms
known as the Standard Template Library (STL). The C++ libraries use the C libraries to provide
target-specific support. The Rogue Wave Standard C++ Library is provided with C++
exceptions enabled.
For more information on the Rogue Wave libraries, see the Rogue Wave HTML documentation.
These manuals might be installed with the documentation of your ARM product. If they are not
installed, you can view them at Rogue Wave Standard C++ Library Documentation.
Support libraries
The ARM C libraries provide additional components to enable support for C++ and to compile
code for different architectures and processors.
The C and C++ libraries are provided as binaries only. There is a variant of the 1990 ISO Standard C
library for each combination of major build options, such as the byte order of the target system, whether
interworking is selected, and whether floating-point support is selected.

Related information
ARM DS-5 License Management Guide.
Application Binary Interface (ABI) for the ARM Architecture.
Compliance with the Application Binary Interface (ABI) for the ARM architecture.
The ARM C and C++ Libraries.

Chapter 2
Getting Started with the Compiler

Introduces some of the more common ARM compiler command-line options.

It contains the following sections:
• 2.1 Compiler command-line syntax on page 2-37.
• 2.2 Compiler command-line options listed by group on page 2-38.
• 2.3 Default compiler behavior on page 2-44.
• 2.4 Order of compiler command-line options on page 2-45.
• 2.5 Using stdin to input source code to the compiler on page 2-46.
• 2.6 Directing output to stdout on page 2-48.
• 2.7 Filename suffixes recognized by the compiler on page 2-49.
• 2.8 Compiler output files on page 2-51.
• 2.9 Factors influencing how the compiler searches for header files on page 2-52.
• 2.10 Compiler command-line options and search paths on page 2-53.
• 2.11 Compiler search rules and the current place on page 2-54.
• 2.12 The ARMCC5INC environment variable on page 2-55.
• 2.13 Code compatibility between separately compiled and assembled modules on page 2-56.
• 2.14 Using GCC fallback when building Linux applications on page 2-57.
• 2.15 Linker feedback during compilation on page 2-59.
• 2.16 Unused function code on page 2-60.
• 2.17 Minimizing code size by eliminating unused functions during compilation on page 2-61.
• 2.18 Compilation build time on page 2-62.

2.1 Compiler command-line syntax

Use the armcc command from the command line to invoke the compiler. Specify the source files you
want to compile, together with any options you need to control compiler behavior.
The command for invoking the compiler is:
armcc [options] [source]

where:
options
are compiler command-line options that affect the behavior of the compiler.
source
provides the filenames of one or more text files containing C or C++ source code. By default,
the compiler looks for source files and creates output files in the current directory.
If a source file is an assembly file, that is, one with an extension of .s, the compiler activates the
ARM assembler to process the source file.
When you invoke the compiler, you normally specify one or more source files. However, a
minority of compiler command-line options do not require you to specify a source file. For
example, armcc --version_number.
The compiler accepts one or more input files, for example:
armcc -c [options] input_file_1 ... input_file_n

Specifying a dash - for an input file causes the compiler to read from stdin. To specify that all
subsequent arguments are treated as filenames, not as command switches, use the POSIX option --.
The -c option instructs the compiler to perform the compilation step, but not the link step.
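To make the argument conventions above concrete, here is a hypothetical sketch (not armcc's implementation) of how a command-line tool might classify its arguments: a lone dash names stdin, and after -- every argument is a filename even if it begins with a dash. The names arg_kind and classify_args are invented for illustration.

```c
#include <string.h>

/* Hypothetical argument kinds used only by this sketch. */
enum arg_kind { ARG_OPTION, ARG_FILE, ARG_STDIN };

/* Classify argv[1] .. argv[argc-1] into kinds[1] .. kinds[argc-1].
 *  - A lone "-" names stdin (this sketch treats it that way even after "--").
 *  - "--" ends option processing; it is itself recorded as an option.
 *  - Before "--", anything else starting with '-' is an option.
 *  - Everything else, and everything after "--", is a filename. */
void classify_args(int argc, const char *argv[], enum arg_kind kinds[])
{
    int options_done = 0;
    int i;

    for (i = 1; i < argc; i++) {
        const char *a = argv[i];

        if (strcmp(a, "-") == 0) {
            kinds[i] = ARG_STDIN;
        } else if (!options_done && strcmp(a, "--") == 0) {
            kinds[i] = ARG_OPTION;
            options_done = 1;
        } else if (!options_done && a[0] == '-') {
            kinds[i] = ARG_OPTION;
        } else {
            kinds[i] = ARG_FILE;
        }
    }
}
```

The sketch does not model options that take a separate argument, such as -o filename, where the following word would otherwise be classified as a file.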

Related concepts
2.2 Compiler command-line options listed by group on page 2-38.

Related references
8.21 -c on page 8-344.

Related information
Rules for specifying command-line options.
Toolchain environment variables.

2.2 Compiler command-line options listed by group

This topic lists the compiler command-line options, ordered by functional group.

Note
The following characters are interchangeable:
• Nonprefix hyphens and underscores. For example, --version_number and --version-number.
• Equals signs and spaces. For example, armcc --cpu=list and armcc --cpu list.
This applies to all tools provided with the compiler.
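A hypothetical sketch of the hyphen and underscore rule: it rewrites nonprefix hyphens in the option name as underscores so that --version-number and --version_number compare equal. It assumes, as an illustration only, that characters after the first '=' belong to the value and are left unchanged, so --cpu=Cortex-M3 keeps its hyphen. The equals-or-space rule is not modeled, because it concerns how the command line is split into words. The names canonical_option and same_option are invented.

```c
#include <stddef.h>
#include <string.h>

/* Write a canonical spelling of option `in` to `out` (size outsz).
 * Leading '-' characters (the prefix) are copied unchanged; in the option
 * name, every other '-' becomes '_'. Nothing after the first '=' changes.
 * Returns 0 on success, or -1 if `out` is too small. */
int canonical_option(const char *in, char *out, size_t outsz)
{
    size_t i = 0;
    int in_value = 0;

    if (outsz == 0)
        return -1;

    /* Copy the "-" or "--" prefix as-is. */
    while (in[i] == '-') {
        if (i + 1 >= outsz)
            return -1;
        out[i] = in[i];
        i++;
    }

    for (; in[i] != '\0'; i++) {
        if (i + 1 >= outsz)
            return -1;          /* no room for this character plus '\0' */
        if (in[i] == '=')
            in_value = 1;
        out[i] = (!in_value && in[i] == '-') ? '_' : in[i];
    }
    out[i] = '\0';
    return 0;
}

/* Two spellings name the same option if their canonical forms match. */
int same_option(const char *a, const char *b)
{
    char ca[128], cb[128];

    if (canonical_option(a, ca, sizeof ca) != 0 ||
        canonical_option(b, cb, sizeof cb) != 0)
        return 0;
    return strcmp(ca, cb) == 0;
}
```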

The compiler command-line options are as follows:

Help
• --echo
• --help
• --show_cmdline
• --version_number
• --vsn

Source languages
• --c90
• --c99
• --compile_all_input, --no_compile_all_input
• --cpp
• --cpp11
• --cpp_compat
• --gnu
• --strict, --no_strict
• --strict_warnings
Search paths
• -Idir[,dir,...]
• -Jdir[,dir,...]
• --kandr_include
• --preinclude=filename
• --reduce_paths, --no_reduce_paths
• --sys_include
• --ignore_missing_headers
Precompiled headers
• --create_pch=filename
• --pch
• --pch_dir=dir
• --pch_messages, --no_pch_messages
• --pch_verbose, --no_pch_verbose
• --use_pch=filename

Preprocessor
• -C
• --code_gen, --no_code_gen
• -Dname[(parm-list)][=def]
• -E
• -M
• --old_style_preprocessing
• -P
• --preprocess_assembly
• --preprocessed
• -Uname
C++
• --allow_null_this
• --anachronisms, --no_anachronisms
• --dep_name, --no_dep_name
• --export_all_vtbl, --no_export_all_vtbl
• --force_new_nothrow, --no_force_new_nothrow
• --friend_injection, --no_friend_injection
• --guiding_decls, --no_guiding_decls
• --implicit_include, --no_implicit_include
• --implicit_include_searches, --no_implicit_include_searches
• --implicit_typename, --no_implicit_typename
• --nonstd_qualifier_deduction, --no_nonstd_qualifier_deduction
• --old_specializations, --no_old_specializations
• --parse_templates, --no_parse_templates
• --pending_instantiations=n
• --rtti, --no_rtti
• --rtti_data
• --type_traits_helpers
• --using_std, --no_using_std
• --vfe, --no_vfe

Output format
• --asm
• --asm_dir
• -c
• --default_extension=ext
• --depend=filename
• --depend_dir
• --depend_format=string
• --depend_single_line
• --depend_system_headers, --no_depend_system_headers
• --depend_target
• --errors
• --info=totals
• --interleave
• --list
• --list_dir
• --list_macros
• --md
• --mm
• -o filename
• --output_dir
• --phony_targets
• -S
• --split_sections
Target architectures and processors
• --arm
• --arm_only
• --compatible=name
• --cpu=list
• --cpu=name
• --fpu=list
• --fpu=name
• --thumb
Floating-point support
• --fp16_format=format
• --fpmode=model
• --fpu=list
• --fpu=name
Debug
• --debug, --no_debug
• --debug_macros, --no_debug_macros
• --dwarf2
• --dwarf3
• -g
• --remove_unneeded_entities, --no_remove_unneeded_entities
• --emit_frame_directives

Code generation
• --allow_fpreg_for_nonfpdata, --no_allow_fpreg_for_nonfpdata
• --alternative_tokens, --no_alternative_tokens
• --bigend
• --bitband
• --branch_tables
• --bss_threshold=num
• --conditionalize, --no_conditionalize
• --default_definition_visibility
• --dllexport_all, --no_dllexport_all
• --dllimport_runtime, --no_dllimport_runtime
• --dollar, --no_dollar
• --enum_is_int
• --exceptions, --no_exceptions
• --exceptions_unwind, --no_exceptions_unwind
• --execute_only
• --float_literal_pools
• --export_all_vtbl, --no_export_all_vtbl
• --export_defs_implicitly, --no_export_defs_implicitly
• --extended_initializers, --no_extended_initializers
• --global_reg
• --gnu_defaults
• --gnu_instrument
• --gnu_version
• --hide_all, --no_hide_all
• --implicit_key_function
• --import_all_vtbl
• --integer_literal_pools
• --interface_enums_are_32_bit
• --littleend
• --locale=lang_country
• --long_long
• --loose_implicit_cast
• --message_locale=lang_country[.codepage]
• --min_array_alignment=opt
• --multibyte_chars, --no_multibyte_chars
• --multiply_latency
• --narrow_volatile_bitfields
• --pointer_alignment=num
• --protect_stack, --no_protect_stack
• --restrict, --no_restrict
• --relaxed_ref_def
• --share_inlineable_strings
• --signed_bitfields, --unsigned_bitfields
• --signed_chars, --unsigned_chars
• --split_ldm
• --string_literal_pools
• --trigraphs
• --unaligned_access, --no_unaligned_access
• --use_frame_pointer
• --vectorize, --no_vectorize
• --visibility_inlines_hidden
• --vla, --no_vla
• --wchar
• --wchar16
• --wchar32

Optimization
• --autoinline, --no_autoinline
• --data_reorder, --no_data_reorder
• --forceinline
• --fpmode=model
• --inline, --no_inline
• --library_interface=lib
• --library_type=lib
• --loop_optimization_level=opt
• --lower_ropi, --no_lower_ropi
• --lower_rwpi, --no_lower_rwpi
• --multifile, --no_multifile
• -Onum
• -Ospace
• -Otime
• --reassociate_saturation
• --retain=option
• --whole_program

Note
Optimization options can limit the debug information generated by the compiler.

Diagnostics
• --brief_diagnostics, --no_brief_diagnostics
• --diag_error=tag[,tag,...]
• --diag_remark=tag[,tag,...]
• --diag_style={arm|ide|gnu}
• --diag_suppress=tag[,tag,...]
• --diag_suppress=optimizations
• --diag_warning=tag[,tag,...]
• --diag_warning=optimizations
• --errors=filename
• --link_all_input
• --remarks
• -W
• --wrap_diagnostics, --no_wrap_diagnostics
Command-line options in a text file
• --via=filename
Linker feedback
• --feedback=filename
Procedure call standard
• --apcs=qualifier...qualifier
Passing options to other tools
• -Aopt
• -Lopt

ARM Linux
• --arm_linux
• --arm_linux_configure
• --arm_linux_config_file=path
• --arm_linux_paths
• --configure_gas
• --configure_gcc=path
• --configure_gcc_version
• --configure_gld=path
• --configure_sysroot=path
• --configure_cpp_headers=path
• --configure_extra_includes=paths
• --configure_extra_libraries=paths
• --execstack
• --shared
• --translate_g++
• --translate_gcc
• --translate_gld
• --use_gas
• -Warmcc,option[,option,...]
• -Warmcc,--gcc_fallback

Related concepts
2.4 Order of compiler command-line options on page 2-45.

Related references
Chapter 8 Compiler Command-line Options on page 8-312.

2.3 Default compiler behavior

By default, the compiler determines the source language by examining the source filename extension.
For example, filename.c indicates C, and filename.cpp indicates C++03. The command-line options
--c90, --c99, --cpp, and --cpp11 let you override this.
The default compiler target instruction set depends on the target processor (--cpu=name):
• For processors that support ARM instructions, the default instruction set is ARM. Use the --thumb
command-line option to specify Thumb®.
• For processors that do not support ARM instructions, the default instruction set is Thumb.
When you compile multiple files with a single command, all files must be of the same type, either C or
C++. The compiler cannot switch languages from one file to the next based on the file extension. The
following example produces an error because the specified source files have different languages:
armcc -c test1.c test2.cpp

If you specify files with conflicting file extensions, you can force the compiler to compile both files as C
or as C++, regardless of file extension. For example:
armcc -c --cpp test1.c test2.cpp
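The same-language rule can be sketched in C as a small decision function. This is a hypothetical illustration, not the compiler's code: it recognizes only .c and .cpp, and the forced parameter stands in for an explicit language option such as --cpp. The names lang and common_language are invented.

```c
#include <string.h>

/* Hypothetical language tags used only by this sketch. */
enum lang { LANG_UNKNOWN, LANG_C, LANG_CPP };

/* A deliberately small subset of the suffix rules: .c means C and
 * .cpp means C++. Anything else is treated as unknown here. */
static enum lang lang_from_suffix(const char *file)
{
    const char *dot = strrchr(file, '.');

    if (dot == NULL)
        return LANG_UNKNOWN;
    if (strcmp(dot, ".c") == 0)
        return LANG_C;
    if (strcmp(dot, ".cpp") == 0)
        return LANG_CPP;
    return LANG_UNKNOWN;
}

/* Decide which language to compile all n files as. `forced` stands in for
 * --c90/--c99 (LANG_C) or --cpp/--cpp11 (LANG_CPP) and wins over the
 * suffixes. LANG_UNKNOWN means "report an error", for example because the
 * files imply different languages. */
enum lang common_language(const char *files[], int n, enum lang forced)
{
    enum lang first;
    int i;

    if (forced != LANG_UNKNOWN)
        return forced;
    if (n <= 0)
        return LANG_UNKNOWN;

    first = lang_from_suffix(files[0]);
    for (i = 1; i < n; i++) {
        if (lang_from_suffix(files[i]) != first)
            return LANG_UNKNOWN;    /* mixed languages: error */
    }
    return first;
}
```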

Where an unrecognized extension begins with .c, for example, filename.cmd, an error message is
generated.
Support for processing Precompiled Header (PCH) files is not available when you specify multiple
source files in a single compilation. If you request PCH processing and specify more than one primary
source file, the compiler issues an error message, and aborts the compilation.

Note
Support for Precompiled Header (PCH) files is deprecated from ARM Compiler 5.05 onwards on all
platforms. Note that ARM Compiler on Windows 8 never supported PCH files.

armcc can in turn invoke armasm and armlink. For example, if your source code contains embedded
assembly code, armasm is called. armcc searches for the armasm and armlink binaries in the following
locations, in this order:
1. The same location as armcc.
2. The PATH locations.

Related concepts
4.24 Precompiled Header (PCH) files on page 4-134.
2.4 Order of compiler command-line options on page 2-45.
2.9 Factors influencing how the compiler searches for header files on page 2-52.
2.11 Compiler search rules and the current place on page 2-54.
2.12 The ARMCC5INC environment variable on page 2-55.
2.2 Compiler command-line options listed by group on page 2-38.
2.1 Compiler command-line syntax on page 2-37.

Related tasks
2.5 Using stdin to input source code to the compiler on page 2-46.

Related references
2.7 Filename suffixes recognized by the compiler on page 2-49.
2.8 Compiler output files on page 2-51.

2.4 Order of compiler command-line options

In general, compiler command-line options can appear in any order in a single compiler invocation.
However, the effects of some options depend on the order they appear in the command line and how they
are combined with other related options.
The compiler enables you to use multiple options even where these might conflict. This means that you
can append new options to an existing command line, for example, in a makefile or a via file.
Where options override previous options on the same command line, the last option specified always
takes precedence. For example:
armcc -O1 -O2 -Ospace -Otime ...

is executed by the compiler as:

armcc -O2 -Otime

You can use the environment variable ARMCC5_CCOPT to specify compiler command-line options. Options
specified on the command line take precedence over options specified in the environment variable.
To see how the compiler has processed the command line, use the --show_cmdline option. This shows
nondefault options that the compiler used. The contents of any via files are expanded. In the example
used here, although the compiler executes armcc -O2 -Otime, the output from --show_cmdline does
not include -O2. This is because -O2 is the default optimization level, and --show_cmdline does not
show options that apply by default.
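The "last option wins" rule can be modeled by applying options left to right. This hypothetical sketch handles only -O0 to -O3, -Ospace, and -Otime. It starts from -O2, the default stated above, and assumes -Ospace as the default optimization goal. The names opt_settings and resolve_opts are invented.

```c
#include <string.h>

/* Effective optimization settings after processing the command line. */
struct opt_settings {
    int level;      /* 0..3, as in -O0..-O3 */
    int for_time;   /* 1 after -Otime, 0 after -Ospace */
};

/* Apply options left to right, so a later option of the same kind
 * overrides an earlier one. */
struct opt_settings resolve_opts(int n, const char *opts[])
{
    struct opt_settings s = { 2, 0 };   /* assumed defaults: -O2 -Ospace */
    int i;

    for (i = 0; i < n; i++) {
        const char *o = opts[i];

        if (strcmp(o, "-Otime") == 0)
            s.for_time = 1;
        else if (strcmp(o, "-Ospace") == 0)
            s.for_time = 0;
        else if (o[0] == '-' && o[1] == 'O' &&
                 o[2] >= '0' && o[2] <= '3' && o[3] == '\0')
            s.level = o[2] - '0';
    }
    return s;
}
```

For the command line above, resolve_opts yields level 2 with for_time set, matching armcc -O2 -Otime. One way to model the precedence of the command line over ARMCC5_CCOPT in this sketch is to place the environment-variable options first in opts[]. That is an assumption about the mechanism, not documented behavior.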

Related concepts
2.2 Compiler command-line options listed by group on page 2-38.

2.5 Using stdin to input source code to the compiler

Instead of creating a file for your source code, you can use stdin to input source code directly on the
command line.
This is useful if you want to test a short piece of code without having to create a file for it.

Procedure
1. Invoke the compiler with the command-line options you want to use. The default compiler mode is C.
Use the minus character (-) as the source filename to instruct the compiler to take input from stdin.
For example:
armcc --bigend -c -

If you want an object file to be written, use the -o option. If you want preprocessor output to be sent
to the output stream, use the -E option. If you want the output to be sent to stdout, use the -o-
option. If you want an assembly listing of the keyboard input to be sent to the output stream after
input has been terminated, use none of these options.
2. You cannot type source code on the same line as the minus character. Press the Return key if you
have not already done so.
The command prompt waits for you to enter more input.
3. Enter your input. For example:
#include <stdio.h>
int main(void)
{
    printf("Hello world\n");
    return 0;
}

4. Terminate your input:

• Press Ctrl+Z then Return on Microsoft Windows systems.
• Press Ctrl+D on Red Hat Linux systems.
An assembly listing for the keyboard input is sent to the output stream after input has been terminated if
both the following are true:
• No output file is specified.
• No preprocessor-only option is specified, for example -E.
Otherwise, an object file is created or preprocessor output is sent to the standard output stream,
depending on whether you used the -o option or the -E option.
The compiler accepts source code from the standard input stream in combination with other files when it
performs a link step. For example, the following are permitted:
• armcc -o output.axf - object.o mylibrary.a
• armcc -o output.axf --c90 source.c -
Executing the following command compiles the source code you provide on standard input, and links it
into test.axf:
armcc -o test.axf -

You can only combine standard input with other source files when you are linking code. If you attempt to
combine standard input with other source files when not linking, the compiler generates an error.
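The output rules for stdin input can be summarized as a small decision function. This is a hypothetical sketch of the rules described above, not the compiler's logic. It assumes -E takes priority if both -E and -o are given, which the text does not specify. The names stdin_output_kind and stdin_combination_allowed are invented.

```c
/* What the compiler produces when it reads source from stdin. */
enum stdin_output { OUT_ASM_LISTING, OUT_OBJECT_FILE, OUT_PREPROCESSED };

/* has_o: an output file was given with -o; has_E: -E (preprocess only). */
enum stdin_output stdin_output_kind(int has_o, int has_E)
{
    if (has_E)
        return OUT_PREPROCESSED;   /* preprocessor output only */
    if (has_o)
        return OUT_OBJECT_FILE;    /* an object file is created */
    return OUT_ASM_LISTING;        /* neither: assembly listing to the output stream */
}

/* Standard input can be combined with other inputs only when the
 * invocation also links (for example, -c suppresses the link step). */
int stdin_combination_allowed(int links, int other_inputs)
{
    return other_inputs == 0 || links;
}
```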

Related concepts
2.1 Compiler command-line syntax on page 2-37.
2.2 Compiler command-line options listed by group on page 2-38.

Related information
Rules for specifying command-line options.
Toolchain environment variables.

2.6 Directing output to stdout

If you want output to be sent to the standard output stream, use the -o- option.
For example:
armcc -c -o- hello.c

This outputs an assembly listing of the source code to stdout.

To send preprocessor output to stdout, use the -E option.

Related concepts
2.1 Compiler command-line syntax on page 2-37.
2.2 Compiler command-line options listed by group on page 2-38.

Related information
Rules for specifying command-line options.
Toolchain environment variables.

2.7 Filename suffixes recognized by the compiler

The compiler uses filename suffixes to identify the classes of file involved in compilation and in the link
stage.
The filename suffixes recognized by the compiler are described in the following table.
Note
C and C++ source file suffixes imply a default language variety. For example, the .c suffix implies
--c90. Explicitly specifying --c90, --c99, --cpp, or --cpp11 overrides the implied defaults for these
filename suffixes.

Table 2-1 Filename suffixes recognized by the compiler

Suffix | Description | Usage notes
.c | C source file | Implies --c90.
.C | C or C++ source file | On UNIX platforms, implies --cpp. On non-UNIX platforms, implies --c90.
.cpp | C++ source file | Implies --cpp.
.c++, .cxx, .cc, .CC | C++ source file | Implies --cpp. The compiler uses the suffixes .cc and .CC to identify files for implicit inclusion.
.d | Dependency list file | .d is the default output filename suffix for files output using the --md option.
.h | C or C++ header file | -
.i | C or C++ source file | A C or C++ file that has already been preprocessed, and is to be compiled without additional preprocessing.
.ii | C++ source file | A C++ file that has already been preprocessed, and is to be compiled without additional preprocessing.
.lst | Error and warning list file | .lst is the default output filename suffix for files output using the --list option.
.a, .lib, .o, .obj, .so | ARM, Thumb, or mixed ARM and Thumb object file or library | -
.pch | Precompiled header file | .pch is the default output filename suffix for files output using the --pch option. Note: Support for Precompiled Header (PCH) files is deprecated from ARM Compiler 5.05 onwards on all platforms. ARM Compiler on Windows 8 never supported PCH files.
.s | ARM, Thumb, or mixed ARM and Thumb assembly language source file | For files in the input file list suffixed with .s, the compiler invokes the assembler, armasm, to assemble the file. .s is the default output filename suffix for files output using either the option -S or --asm.
.S | ARM, Thumb, or mixed ARM and Thumb assembly language source file | On UNIX platforms, for files in the input file list suffixed with .S, the compiler preprocesses the assembly source before passing that source to the assembler. On non-UNIX platforms, .S is equivalent to .s. That is, preprocessing is not performed.
.sx | ARM, Thumb, or mixed ARM and Thumb assembly language source file | For files in the input file list suffixed with .sx, the compiler preprocesses the assembly source before passing that source to the assembler.
.txt | Text file | .txt is the default output filename suffix for files output using the -S or --asm option in combination with the --interleave option.
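A hypothetical C sketch of how the input-file rows of Table 2-1 map a suffix to an action. The action names are invented, the .i and .ii rows are omitted, and the sketch assumes that object files and libraries are passed to the link step. The on_unix flag selects the platform-dependent meaning of .C and .S.

```c
#include <string.h>

/* Hypothetical actions summarizing the input-file rows of Table 2-1. */
enum action { ACT_UNKNOWN, ACT_COMPILE_C, ACT_COMPILE_CPP,
              ACT_ASSEMBLE, ACT_PREPROCESS_THEN_ASSEMBLE, ACT_LINK };

enum action action_for(const char *file, int on_unix)
{
    const char *dot = strrchr(file, '.');

    if (dot == NULL)
        return ACT_UNKNOWN;
    if (strcmp(dot, ".c") == 0)
        return ACT_COMPILE_C;
    if (strcmp(dot, ".C") == 0)                 /* platform dependent */
        return on_unix ? ACT_COMPILE_CPP : ACT_COMPILE_C;
    if (strcmp(dot, ".cpp") == 0 || strcmp(dot, ".c++") == 0 ||
        strcmp(dot, ".cxx") == 0 || strcmp(dot, ".cc") == 0 ||
        strcmp(dot, ".CC") == 0)
        return ACT_COMPILE_CPP;
    if (strcmp(dot, ".s") == 0)
        return ACT_ASSEMBLE;                    /* armasm, no preprocessing */
    if (strcmp(dot, ".S") == 0)                 /* platform dependent */
        return on_unix ? ACT_PREPROCESS_THEN_ASSEMBLE : ACT_ASSEMBLE;
    if (strcmp(dot, ".sx") == 0)
        return ACT_PREPROCESS_THEN_ASSEMBLE;
    if (strcmp(dot, ".o") == 0 || strcmp(dot, ".obj") == 0 ||
        strcmp(dot, ".a") == 0 || strcmp(dot, ".lib") == 0 ||
        strcmp(dot, ".so") == 0)
        return ACT_LINK;                        /* assumed: passed to the linker */
    return ACT_UNKNOWN;
}
```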

Related references
8.7 --arm on page 8-326.
8.111 --interleave on page 8-442.
8.118 --list on page 8-450.
8.129 --md on page 8-462.
8.148 --pch on page 8-484.
8.168 -S on page 8-504.
8.29 --compile_all_input, --no_compile_all_input on page 8-353.
11.9 Template instantiation in ARM C++ on page 11-809.

2.8 Compiler output files

By default, output files created by the compiler are located in the current directory. Object files are
written in ARM ELF.

Related information
ELF for the ARM Architecture.

2.9 Factors influencing how the compiler searches for header files
Several factors influence how the compiler searches for #include header files and source files.
• The value of the environment variable ARMCC5INC.
• The value of the environment variable ARMINC.
• The -I and -J compiler options.
• The --kandr_include and --sys_include compiler options.
• Whether the filename is an absolute filename or a relative filename.
• Whether the filename is between angle brackets or double quotes.

Related concepts
2.12 The ARMCC5INC environment variable on page 2-55.
2.11 Compiler search rules and the current place on page 2-54.

Related references
2.10 Compiler command-line options and search paths on page 2-53.
8.100 -Idir[,dir,...] on page 8-431.
8.112 -Jdir[,dir,...] on page 8-443.
8.113 --kandr_include on page 8-444.
8.179 --sys_include on page 8-517.

Related information
Toolchain environment variables.

2.10 Compiler command-line options and search paths

The following table shows how the specified compiler command-line options affect the search path used
by the compiler when it searches for header and source files.

Table 2-2 Include file search paths

Compiler option | <include> search order | "include" search order
Neither -Idir[,dir,...] nor -Jdir[,dir,...] | 1. ARMCC5INC 2. ARMINC 3. ../include | 1. The current place 2. ARMCC5INC 3. ARMINC 4. ../include
-Idir[,dir,...] | 1. ARMCC5INC 2. ARMINC 3. ../include 4. The directory or directories specified by -Idir[,dir,...] | 1. The current place 2. The directory or directories specified by -Idir[,dir,...] 3. ARMCC5INC 4. ARMINC 5. ../include
-Jdir[,dir,...] | The directory or directories specified by -Jdir[,dir,...] | 1. The current place 2. The directory or directories specified by -Jdir[,dir,...]
Both -Idir[,dir,...] and -Jdir[,dir,...] | 1. The directory or directories specified by -Jdir[,dir,...] 2. The directory or directories specified by -Idir[,dir,...] | 1. The current place 2. The directory or directories specified by -Idir[,dir,...] 3. The directory or directories specified by -Jdir[,dir,...]
--sys_include | No effect. | Removes the current place from the search path.
--kandr_include | No effect. | Uses Kernighan and Ritchie search rules.
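The rows of Table 2-2 can be encoded as a function that builds the ordered search list. This hypothetical sketch uses descriptive labels instead of real paths and does not model --kandr_include. The names search_path, search_order, and position_of are invented.

```c
#include <string.h>

#define MAX_SEARCH_DIRS 8

/* Ordered list of search locations (labels, not real paths). */
struct search_path {
    const char *dirs[MAX_SEARCH_DIRS];
    int count;
};

static void add_dir(struct search_path *p, const char *label)
{
    if (p->count < MAX_SEARCH_DIRS)
        p->dirs[p->count++] = label;
}

/* angle != 0 selects the <include> column, otherwise the "include" column. */
struct search_path search_order(int angle, int has_I, int has_J, int sys_include)
{
    struct search_path p;
    p.count = 0;

    if (angle) {
        if (has_J) {                    /* -J replaces the default locations */
            add_dir(&p, "-J dirs");
            if (has_I)
                add_dir(&p, "-I dirs");
            return p;
        }
        add_dir(&p, "ARMCC5INC");
        add_dir(&p, "ARMINC");
        add_dir(&p, "../include");
        if (has_I)
            add_dir(&p, "-I dirs");     /* -I comes last for <include> */
        return p;
    }

    if (!sys_include)
        add_dir(&p, "current place");   /* --sys_include removes this entry */
    if (has_I)
        add_dir(&p, "-I dirs");
    if (has_J) {
        add_dir(&p, "-J dirs");
        return p;
    }
    add_dir(&p, "ARMCC5INC");
    add_dir(&p, "ARMINC");
    add_dir(&p, "../include");
    return p;
}

/* Position of a label in the search order (0 = searched first), or -1. */
int position_of(const struct search_path *p, const char *label)
{
    int i;
    for (i = 0; i < p->count; i++)
        if (strcmp(p->dirs[i], label) == 0)
            return i;
    return -1;
}
```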

Related concepts
2.12 The ARMCC5INC environment variable on page 2-55.
2.11 Compiler search rules and the current place on page 2-54.
2.9 Factors influencing how the compiler searches for header files on page 2-52.

Related references
8.100 -Idir[,dir,...] on page 8-431.
8.112 -Jdir[,dir,...] on page 8-443.
8.113 --kandr_include on page 8-444.
8.179 --sys_include on page 8-517.

2.11 Compiler search rules and the current place

By default, the compiler uses Berkeley UNIX search rules, so source files and #include header files are
searched for relative to the current place. The current place is the directory containing the source or
header file currently being processed by the compiler.
When a file is found relative to an element of the search path, the directory containing that file becomes
the new current place. When the compiler has finished processing that file, it restores the previous
current place. At each instant there is a stack of current places corresponding to the stack of nested
#include directives. For example, if the current place is the include directory ...\include, and the
compiler is seeking the include file sys\defs.h, it locates ...\include\sys\defs.h if it exists. When
the compiler begins to process defs.h, the current place becomes ...\include\sys. Any file included
by defs.h that is not specified with an absolute path name is searched for relative to ...\include\sys.
The original current place ...\include is restored only when the compiler has finished processing
defs.h.

You can disable the stacking of current places by using the compiler option --kandr_include. This
option makes the compiler use Kernighan and Ritchie search rules whereby each nonrooted user
#include is searched for relative to the directory containing the source file that is being compiled.
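The stack of current places can be modeled with a small array-based stack. This hypothetical sketch only builds the candidate path for a nonrooted #include "name" and does not check whether the file exists. Berkeley rules use the innermost current place, and the kandr flag models --kandr_include by always using the primary source file's directory. It uses / as the separator, whereas the example above uses Windows-style paths. All names are invented for illustration.

```c
#include <stdio.h>
#include <string.h>

#define MAX_DEPTH    16
#define MAX_PATH_LEN 256

/* dirs[0] is the directory of the primary source file; each deeper entry
 * is the directory of a file currently being processed. */
struct place_stack {
    char dirs[MAX_DEPTH][MAX_PATH_LEN];
    int depth;
};

/* Enter a file: its directory becomes the new current place. */
int enter_file(struct place_stack *s, const char *dir)
{
    size_t len = strlen(dir);

    if (s->depth >= MAX_DEPTH || len >= MAX_PATH_LEN)
        return -1;
    memcpy(s->dirs[s->depth], dir, len + 1);
    s->depth++;
    return 0;
}

/* Finish a file: the previous current place is restored. */
void leave_file(struct place_stack *s)
{
    if (s->depth > 0)
        s->depth--;
}

/* Build the candidate path for #include "name" into out (size outsz). */
int candidate_path(const struct place_stack *s, const char *name, int kandr,
                   char *out, size_t outsz)
{
    const char *base;
    int n;

    if (s->depth == 0)
        return -1;
    base = kandr ? s->dirs[0] : s->dirs[s->depth - 1];
    n = snprintf(out, outsz, "%s/%s", base, name);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```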

Related concepts
2.12 The ARMCC5INC environment variable on page 2-55.
2.9 Factors influencing how the compiler searches for header files on page 2-52.

2.12 The ARMCC5INC environment variable

The ARMCC5INC environment variable points to the location of the included header and source files that
are provided with the compilation tools.
This variable might be initialized with the correct path to the header files when the ARM compilation
tools are installed or when configured with server modules. You can change this variable, but you must
ensure that any changes you make do not break the installation.
The list of directories specified by the ARMCC5INC environment variable is comma separated.
If you want to include files from other locations, use the -I and -J command-line options as required.
When searching for user include files (#include "file"), the compiler searches the directories specified
by ARMCC5INC immediately after the directories specified by the -I option.
If you use the -J option, ARMCC5INC is ignored.
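Because ARMCC5INC holds a comma-separated list, a tool that reads it has to split the value into directories. This hypothetical sketch does only the splitting. It skips empty entries and entries longer than MAX_DIR_LEN, which are simplifications of this sketch rather than documented compiler behavior. The names split_dirs and armcc5inc_dirs are invented.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_INC_DIRS 8
#define MAX_DIR_LEN  256

/* Split a comma-separated directory list into out[0..max-1].
 * Returns the number of directories stored. */
int split_dirs(const char *list, char out[][MAX_DIR_LEN], int max)
{
    int count = 0;
    const char *p = list;

    while (count < max) {
        const char *comma = strchr(p, ',');
        size_t len = comma != NULL ? (size_t)(comma - p) : strlen(p);

        if (len > 0 && len < MAX_DIR_LEN) {
            memcpy(out[count], p, len);
            out[count][len] = '\0';
            count++;
        }
        if (comma == NULL)
            break;
        p = comma + 1;
    }
    return count;
}

/* Read ARMCC5INC from the environment; if it is unset, there are no directories. */
int armcc5inc_dirs(char out[][MAX_DIR_LEN], int max)
{
    const char *value = getenv("ARMCC5INC");
    return value != NULL ? split_dirs(value, out, max) : 0;
}
```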

Related concepts
2.11 Compiler search rules and the current place on page 2-54.
2.9 Factors influencing how the compiler searches for header files on page 2-52.

Related information
Toolchain environment variables.

2.13 Code compatibility between separately compiled and assembled modules

By writing code that adheres to the ARM Architecture Procedure Call Standard (AAPCS), you can
ensure that separately compiled and assembled modules can work together.
The AAPCS forms part of the Base Standard Application Binary Interface for the ARM Architecture
specification.
Interworking qualifiers associated with the --apcs compiler command-line option control interworking.
Position independence qualifiers, also associated with the --apcs compiler command-line option, control
position independence, and affect the creation of reentrant and thread-safe code.
Note
Modules do not have to be compiled with the same --apcs command-line options to work together.
However, you must be familiar with the AAPCS to ensure that the options you choose are compatible.

Related references
8.6 --apcs=qualifier...qualifier on page 8-322.

Related information
Procedure Call Standard for the ARM Architecture.
ARM C libraries and multithreading.
BPABI and SysV Shared Libraries and Executables.
Interworking ARM and Thumb.

2.14 Using GCC fallback when building Linux applications

When building Linux applications developed to build with GCC, there might be cases when the ARM
Compiler toolchain cannot complete the build successfully, because of unsupported GCC-specific
functionality. For such cases, GCC fallback can invoke the GCC toolchain to complete the build.
GCC fallback supports the CodeSourcery 2010Q1 GNU toolchain release.
You cannot use GCC fallback to build the Linux kernel, only user applications.
To specify GCC fallback, include the compiler option -Warmcc,--gcc_fallback. GCC is invoked with
the same GCC-style command-line options that are given to armcc. Therefore, GCC fallback has the
same effect as the following shell script:
armcc $myflags
if found-gcc-specific-coding; then
    gcc $myflags
fi

The whole build is still driven by the build script, makefile, or other infrastructure you are using, and that
does not change. For example, a single compile step might fail when armcc tries to compile that step.
armcc then attempts to perform that single compile step with gcc. If a link step fails, armcc attempts to
perform the link with the GCC toolchain, using GNU ld. When armcc performs a compile or link step,
the include paths, library paths, and Linux libraries it uses are identified in the ARM Linux configuration
file. For fallback, you must either:
• Use the --arm_linux_config_file compiler option to produce the configuration file by configuring
armcc against an existing gcc.
• Provide an explicit path to gcc if you are specifying other configuration options manually.
The GCC toolchain used for fallback is the one that the configuration was created against. Therefore, the
paths and libraries used by armcc and gcc must be equivalent.
If armcc invokes GCC fallback, a warning message is displayed. If gcc also fails, an additional error is
displayed; otherwise, you get a message indicating that gcc succeeded. You also see the original error
messages from armcc, which tell you which source file or files failed to compile and the cause of the
problem.
Note
• There is no change to what the ARM Compiler tools link with when using GCC fallback. That is, the
tools only link with whatever gcc links with, as identified in the configuration file generated with the
--arm_linux_config_file compiler option. Therefore, it is your responsibility to ensure that
licenses are adhered to, and in particular to check what you are linking with. If necessary, you can
override this explicitly by including the GNU options -nostdinc, -nodefaultlibs, and -nostdlib on
the armcc command line.
• armcc invokes the GNU tools in a separate process.
• armcc does not optimize any code in any GCC intermediate representations.

To see the commands that are invoked during GCC fallback, specify the -Warmcc,--echo command-line
option.
The following figure shows a high-level view of the GCC fallback process:

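[Figure: source files for compilation (.c, .cpp, *.s) and object files and libraries for linking (.o, .a, *.so) are passed to the armcc driver. The ARM Compiler toolchain performs the compile step and link step. On error, the command line is passed across to the GCC toolchain driver, which performs its own compile step and link step; if GCC also reports an error, the build stops. The output is object files and an executable image.]

Figure 2-1 GCC fallback process diagram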

Related references
8.9 --arm_linux_config_file=path on page 8-329.
8.10 --arm_linux_configure on page 8-330.
8.71 --echo on page 8-399.
8.200 -Warmcc,option[,option,...] on page 8-542.
8.201 -Warmcc,--gcc_fallback on page 8-543.

Related information
GNU Compiler Collection, http://gcc.gnu.org.

2.15 Linker feedback during compilation

The compiler can use feedback files produced by the linker to optimize code generation.
Feedback from the linker to the compiler enables:
• Efficient elimination of unused functions.
• Reduction of compilation required for interworking.

Related concepts
2.16 Unused function code on page 2-60.

Related tasks
2.17 Minimizing code size by eliminating unused functions during compilation on page 2-61.

2.16 Unused function code

Unused function code can unnecessarily increase code size. Feedback from the linker to the compiler can
remove unused function code, minimizing code size.
Unused function code might occur in the following situations.
• Where you have legacy functions that are no longer used in your source code. Rather than manually
removing the unused function code from your source code, you can use linker feedback to remove the
unused object code automatically from the final image.
• Where a function is inlined. Where an inlined function is not declared as static, the out-of-line
function code is still present in the object file, but there is no longer a call to that code.
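As a hypothetical illustration of the second situation, consider the following C file. The function names are invented for this example, and whether armcc actually inlines a particular call depends on the optimization level and the compiler's own heuristics.

```c
/* square() has external linkage. Even if every call to it in this file is
 * inlined, the compiler still emits an out-of-line copy in the object file,
 * because code in another file might call it. If nothing else calls it,
 * linker feedback can identify that copy as unused. */
int square(int x)
{
    return x * x;
}

/* A caller in which square() is a likely candidate for inlining. */
int sum_of_squares(int a, int b)
{
    return square(a) + square(b);
}

/* A static helper is visible only in this file, so once all calls are
 * inlined the compiler can typically omit the out-of-line copy itself. */
static int cube(int x)
{
    return x * x * x;
}

int sum_of_cubes(int a, int b)
{
    return cube(a) + cube(b);
}
```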
In addition, the linker can detect when an ARM function is being called from a Thumb state, and when a
Thumb function is being called from an ARM state. You can use feedback from the linker to avoid
compiling functions for interworking that are never used in an interworking context.
Note
Reduction of compilation required for interworking is only applicable to ARMv4T architectures.
ARMv5T and later processors can interwork without penalty.

The linker option --feedback=filename creates a feedback file, and the --feedback_type option
controls the different types of feedback generated.

Related tasks
2.17 Minimizing code size by eliminating unused functions during compilation on page 2-61.

Related references
2.15 Linker feedback during compilation on page 2-59.

2.17 Minimizing code size by eliminating unused functions during compilation

Feedback from the linker to the compiler enables efficient elimination of unused functions.

Procedure
1. Compile your source code.
2. Use the linker option --feedback=filename to create a feedback file.
3. Use the linker option --feedback_type to control which feedback the linker generates.
By default, the linker generates feedback to eliminate unused functions. This is equivalent to
--feedback_type=unused,noiw. The linker can also generate feedback to avoid compiling functions
for interworking that are never used in an interworking context. Use the linker option
--feedback_type=unused,iw to eliminate both types of unused function.
Note
Reduction of compilation required for interworking is only applicable to ARMv4T architectures.
ARMv5T and later processors can interwork without penalty.

4. Re-compile using the compiler option --feedback=filename to feed the feedback file to the
compiler.
The compiler uses the feedback file generated by the linker to compile the source code in a way that
enables the linker to subsequently discard the unused functions.
Note
To obtain maximum benefit from linker feedback, do a full compile and link at least twice. A single
compile and link using feedback from a previous build is normally sufficient to obtain some benefit.

Note
Always ensure that you perform a full clean build immediately before using the linker feedback file. This
minimizes the risk of the feedback file becoming out of date with the source code it was generated from.

You can specify the --feedback=filename option even when no feedback file exists. This enables you
to use the same build commands or makefile regardless of whether a feedback file exists, for example:
armcc -c --feedback=unused.txt test.c -o test.o
armlink --feedback=unused.txt test.o -o test.axf

The first time you build the application, it compiles normally but the compiler warns you that it cannot
read the specified feedback file because it does not exist. The link command then creates the feedback
file and builds the image. Each subsequent compilation step uses the feedback file from the previous link
step to remove any unused functions that are identified.
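The following fragment is a hypothetical test.c that could be used with the commands above. It is a sketch only: the function names are invented, and main() and the rest of the application are omitted.

```c
/* test.c (fragment): legacy_checksum() is no longer called anywhere in the
 * application. Because it has external linkage, the compiler still emits
 * it, so after the first link the feedback file can list it as unused.
 * On the next compilation with --feedback=unused.txt, the compiler can
 * compile it in a way that lets the linker discard it. */
unsigned legacy_checksum(const unsigned char *data, unsigned len)
{
    unsigned sum = 0;
    while (len-- > 0)
        sum += *data++;
    return sum;
}

/* A function that the rest of the application still uses. */
int scale(int value)
{
    return value * 3;
}
```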

Related concepts
2.16 Unused function code on page 2-60.

Related references
2.15 Linker feedback during compilation on page 2-59.
8.82 --feedback=filename on page 8-410.

Related information
--feedback_type=type linker option.
About linker feedback.
Interworking ARM and Thumb.

2.18 Compilation build time

Compilation build time is affected by the compiler optimizations you use and the applications running on
your host platform.
This section contains the following subsections:
• 2.18.1 Compilation build time on page 2-62.
• 2.18.2 Minimizing compilation build time on page 2-63.
• 2.18.3 Minimizing compilation build time with a single armcc invocation on page 2-64.
• 2.18.4 Effect of --multifile on compilation build time on page 2-64.
• 2.18.5 Minimizing compilation build time with parallel make on page 2-65.
• 2.18.6 Compilation build time and operating system choice on page 2-65.

2.18.1 Compilation build time

Modern software applications can comprise many thousands of source code files. These files can take a
considerable amount of time to compile. The many different techniques that the ARM compilation tools
use to optimize for small code size and high performance can also increase build time.
When you invoke the compiler, the following steps occur:
1. The compiler loads and begins to execute.
2. The compiler tries to obtain a license.
3. The compiler compiles your code.
Loading and beginning to execute the compiler normally takes a fixed period of time.
The time taken to obtain a license does not generally vary if a license is available. However, if a floating
license is being used, the time taken to obtain a license depends on network traffic and whether or not a
license is free on the server. In most cases, rather than terminating with an error if a license is not
immediately available, the compiler waits for a license to become available.
The process of obtaining a floating license is more involved than obtaining a node-locked license. With a
node-locked license, the compiler only has to parse the file to check that there is a valid license. With a
floating license, the compiler has to check where the license is, send a message through the TCP/IP
stacks over the network to the server, then wait for a response. When the compiler receives the response,
it then has to check whether or not it has been granted a license. When the compilation is complete, the
license has to be returned to the server.
Floating licenses provide flexibility, but at the cost of speed. If speed is your priority, consider obtaining
node-locked licenses for your build machines, or some node-locked licenses locked to USB network
cards that can be moved between machines as required.
Setting the environment variable TCP_NODELAY to 1 improves FlexNet license server system
performance when processing license requests. However, you must use this with caution, because it
might cause an increase in network traffic.
The time taken to compile your code depends on the size and complexity of the file being compiled.
Compiling a small number of large files might be quicker than compiling a larger number of small files.
This is because the longer compilation time per file might be offset by the smaller amount of time spent
loading and unloading the compiler and obtaining licenses.

Related tasks
2.18.2 Minimizing compilation build time on page 2-63.
2.18.6 Compilation build time and operating system choice on page 2-65.
5.13 Methods of reducing debug information in objects and libraries on page 5-170.

Related information
Optimizing license checkouts from a floating license server.
Licensed features of ARM Compiler.

2.18.2 Minimizing compilation build time

There are a number of actions you can take to minimize how long the compiler takes to compile your
source code.
These actions include:
• Avoid compiling at -O3 level. -O3 gives maximum optimization in the code that is generated, but can
result in longer build times to achieve such results.
• Minimize the amount of debug information the compiler generates.
• Guard against multiple inclusion of header files.
• Use the restrict keyword if you can safely do so, to avoid the compiler having to do compile-time
checks for pointer aliasing.
• Try to keep the number of include paths to a minimum. If you have many include paths, ensure that
the files you include most often exist in directories near the start of the include search path.
• Try compiling a small number of large files instead of a large number of small files. The longer
compilation time per file might be offset by less time spent loading and unloading the compiler and
obtaining licenses, particularly if using floating licenses.
• Try compiling multiple files within a single invocation of armcc (and single license checkout),
instead of multiple armcc invocations.
• Floating licenses provide flexibility, but at the cost of speed. Consider obtaining node-locked licenses
for your build machines, or some node-locked licenses locked to USB network cards that can be
moved between machines as required.
• Consider using or avoiding --multifile compilation, depending on the resulting build time.
Note
— In RVCT 4.0, if you compile with -O3, --multifile is enabled by default.
— In ARM Compiler 4.1 and later, --multifile is disabled by default, regardless of the
optimization level.

• If you are using a makefile-based build environment, consider using a make tool that can apply some
form of parallelism.
• Consider your choice of operating system for cross-compilation. Linux generally gives better build
speed than Windows, but there are general performance-tuning techniques you can apply on
Windows that might help improve build times.
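Two of the actions in the list above, guarding against multiple inclusion and using restrict, can be sketched in C as follows. The header name vecops.h and the function add_arrays are hypothetical, and the header and source are shown in one block for brevity. The restrict keyword is part of C99; in other source modes armcc accepts it only when enabled with the --restrict option, so check the --restrict, --no_restrict reference for the defaults that apply to your source language.

```c
/* vecops.h (hypothetical header): the include guard means that a second
 * #include of this header in the same translation unit expands to nothing,
 * so its declarations are processed only once. */
#ifndef VECOPS_H
#define VECOPS_H
void add_arrays(int *restrict dst, const int *restrict a,
                const int *restrict b, int n);
#endif /* VECOPS_H */

/* vecops.c: restrict promises that dst, a, and b do not overlap, so the
 * compiler does not have to allow for pointer aliasing between them. */
void add_arrays(int *restrict dst, const int *restrict a,
                const int *restrict b, int n)
{
    int i;
    for (i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```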

Related concepts
2.18.1 Compilation build time on page 2-62.
4.24 Precompiled Header (PCH) files on page 4-134.
5.14 Guarding against multiple inclusion of header files on page 5-171.
3.15 Vectorization on loops containing pointers on page 3-84.

Related references
2.18.3 Minimizing compilation build time with a single armcc invocation on page 2-64.
2.18.4 Effect of --multifile on compilation build time on page 2-64.
2.18.5 Minimizing compilation build time with parallel make on page 2-65.
2.18.6 Compilation build time and operating system choice on page 2-65.
5.13 Methods of reducing debug information in objects and libraries on page 5-170.
8.44 --create_pch=filename on page 8-371.
8.134 --multifile, --no_multifile on page 8-467.
8.139 -Onum on page 8-473.
8.148 --pch on page 8-484.
8.149 --pch_dir=dir on page 8-485.
8.164 --restrict, --no_restrict on page 8-500.

Related information
Licensed features of ARM Compiler.

2.18.3 Minimizing compilation build time with a single armcc invocation

Using a single armcc invocation rather than multiple invocations helps minimize compilation build time.
The following type of script incurs multiple loads and unloads of the compiler and multiple license
checkouts:
armcc file1.c ...
armcc file2.c ...
armcc file3.c ...

Instead, you can try modifying your script to compile multiple files within a single invocation of armcc.
For example, armcc file1.c file2.c file3.c ...
For convenience, you can also list all your .c files in a single via file invoked with
armcc -via sources.txt.

Although this mechanism can dramatically reduce license checkouts and loading and unloading of the
compiler to give significant improvements in build time, the following limitations apply:
• All files are compiled with the same options.
• Converting existing build systems could be difficult.
• Usability depends on source file structure and dependencies.
• An IDE might be unable to report which file had compilation errors.
• After detecting an error, the compiler does not compile subsequent files.

Related concepts
2.18.1 Compilation build time on page 2-62.

Related tasks
2.18.2 Minimizing compilation build time on page 2-63.

Related references
8.134 --multifile, --no_multifile on page 8-467.
8.139 -Onum on page 8-473.
8.195 --via=filename on page 8-537.

Related information
Licensed features of ARM Compiler.

2.18.4 Effect of --multifile on compilation build time

When compiling with --multifile, the compiler might generate code with additional optimizations by
compiling across several source files to produce a single object file. These additional cross-source
optimizations can increase compilation time.
Conversely, if there is little additional optimization to apply, and only small amounts of code to check for
possible optimizations, then using --multifile to generate a single object file instead of several might
reduce compilation time, because less time is spent creating (opening and closing) multiple object
files.
Note
• In RVCT 4.0, if you compile with -O3, --multifile is enabled by default.
• In ARM Compiler 4.1 and later, --multifile is disabled by default, regardless of the optimization
level.

Related concepts
2.18.1 Compilation build time on page 2-62.

Related tasks
2.18.2 Minimizing compilation build time on page 2-63.

Related references
8.134 --multifile, --no_multifile on page 8-467.
8.139 -Onum on page 8-473.

Related information
Licensed features of ARM Compiler.

2.18.5 Minimizing compilation build time with parallel make

If you are using a makefile-based build environment, you could consider using a make tool that can
apply some form of parallelism to minimize compilation build time.
For example, with GNU make you can typically use make -j N, where N is the number of compile
processes you want to have running in parallel.
Even on a single machine with a single processor, a performance boost can be achieved. This is because
running processes in parallel can hide the effects of network delays and general I/O accesses such as
loading and saving files to disk, by fully utilizing the processor during these times with another
compilation process.
If you have machines with multiple processors, you can extend the use of parallelism by running make -j
with a job count of N multiplied by M, where M is the number of processors.

Related concepts
2.18.1 Compilation build time on page 2-62.

Related tasks
2.18.2 Minimizing compilation build time on page 2-63.

2.18.6 Compilation build time and operating system choice

Your choice of operating system can affect compilation build time.
Linux generally gives better build speeds than Windows.
However, if you are using Windows, there are ways to tune the performance of the operating system at a
general level. This might help with increasing the percentage of processor time that is being used for
your build.
At a simple level, turning off virus checking software can help, but an Internet search for "tune windows
performance" provides plenty of information.

Related concepts
2.18.1 Compilation build time on page 2-62.

Related tasks
2.18.2 Minimizing compilation build time on page 2-63.

Related information
On what platforms will my ARM development tools work?.

Chapter 3
Using the NEON Vectorizing Compiler

Introduces the NEON unit and explains how to take advantage of automatic vectorizing features.
It contains the following sections:
• 3.1 NEON technology on page 3-69.
• 3.2 The NEON unit on page 3-70.
• 3.3 Methods of writing code for NEON on page 3-72.
• 3.4 Generating NEON instructions from C or C++ code on page 3-73.
• 3.5 NEON C extensions on page 3-74.
• 3.6 Automatic vectorization on page 3-75.
• 3.7 Data references within a vectorizable loop on page 3-76.
• 3.8 Stride patterns and data accesses on page 3-77.
• 3.9 Factors affecting NEON vectorization performance on page 3-78.
• 3.10 NEON vectorization performance goals on page 3-79.
• 3.11 Recommended loop structure for vectorization on page 3-80.
• 3.12 Data dependency conflicts when vectorizing code on page 3-81.
• 3.13 Carry-around scalar variables and vectorization on page 3-82.
• 3.14 Reduction of a vector to a scalar on page 3-83.
• 3.15 Vectorization on loops containing pointers on page 3-84.
• 3.16 Nonvectorization on loops containing pointers and indirect addressing on page 3-85.
• 3.17 Nonvectorization on conditional loop exits on page 3-86.
• 3.18 Vectorizable loop iteration counts on page 3-87.
• 3.19 Indicating loop iteration counts to the compiler with __promise(expr) on page 3-89.
• 3.20 Grouping structure accesses for vectorization on page 3-91.
• 3.21 Vectorization and struct member lengths on page 3-92.
• 3.22 Nonvectorization of function calls to non-inline functions from within loops on page 3-93.
• 3.23 Conditional statements and efficient vectorization on page 3-94.
• 3.24 Vectorization diagnostics to tune code for improved performance on page 3-95.
• 3.25 Vectorizable code example on page 3-97.
• 3.26 DSP vectorizable code example on page 3-99.
• 3.27 What can limit or prevent automatic vectorization on page 3-101.

3.1 NEON technology

ARM NEON technology is the implementation of the Advanced SIMD architecture extension. It is a 64-bit
and 128-bit hybrid SIMD technology targeted at advanced media and signal processing applications and
embedded processors.
NEON instructions are available in both ARM and Thumb code.
Note
Not all ARM processors support NEON technology. In particular, there is no NEON support for
architectures before ARMv7.

Note
The NEON register bank is shared with the VFP register bank.

Related concepts
3.2 The NEON unit on page 3-70.

3.2 The NEON unit

The NEON unit has a register bank of thirty-two 64-bit vector registers that can be operated on in
parallel.
The NEON unit can view the register bank as either:
• Sixteen 128-bit quadword registers, Q0 to Q15.
• Thirty-two 64-bit doubleword registers, D0 to D31.
These registers can then be operated on in parallel in the NEON unit. For example, in one vector add
instruction you can add eight 16-bit integers to eight other 16-bit integers to produce eight 16-bit results.
This is known as vectorization (or more specifically for NEON, Single Instruction Multiple Data (SIMD)
vectorization).
The NEON unit supports 8-bit, 16-bit, and 32-bit integer operations, and some 64-bit operations, in
addition to single-precision (32-bit) floating-point operations. It can operate on elements in groups of 2,
4, 8, or 16. (The Cortex-A9 processor also supports conversion to and from 16-bit floating-point
values, which the compiler supports when --fp16_format is specified, in RVCT 4.0 and later,
and ARM Compiler 4.1 and later.)
Note
Vectorization of floating-point code does not always occur automatically. For example, loops that require
re-association only vectorize when compiled with --fpmode fast. Compiling with --fpmode fast
enables the compiler to perform some transformations that could affect the result.
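The following C function is a minimal sketch of a loop that requires re-association to vectorize. The function name sum_floats is hypothetical. Vectorizing it typically means keeping several partial sums in separate vector lanes and combining them at the end, which changes the order in which the additions are performed.

```c
/* A floating-point sum reduction. Because floating-point addition is not
 * associative, adding the elements in a different order (as a vectorized
 * version with several partial sums would) can change the result. */
float sum_floats(const float *x, int n)
{
    float sum = 0.0f;
    int i;
    for (i = 0; i < n; i++)
        sum += x[i];
    return sum;
}
```

For small integer-valued inputs the result is exact in either order; differences appear when rounding occurs, which is why the compiler needs --fpmode fast before it re-associates such a loop.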

The NEON unit is classified as a vector Single Instruction Multiple Data (SIMD) unit that operates on
multiple elements in a vector register by using one instruction.
For example, array A is a 16-bit integer array with 8 elements.

Table 3-1 Array A

1 2 3 4 5 6 7 8

Array B has the following 8 elements:

Table 3-2 Array B

80 70 60 50 40 30 20 10

To add these arrays together, fetch each vector into a vector register and use one vector SIMD instruction
to obtain the result.

Table 3-3 Result

81 72 63 54 45 36 27 18
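Written in C, the addition shown in these tables is a simple loop that the compiler can consider for automatic vectorization. The function name add_s16 is hypothetical, and whether the loop is actually vectorized depends on the target and options described in 3.4 Generating NEON instructions from C or C++ code.

```c
#include <stdint.h>

/* Adds two arrays of 16-bit integers element by element. Eight int16_t
 * values fill one 128-bit Q register, so with a NEON target and
 * --vectorize the compiler can add eight elements per vector instruction. */
void add_s16(int16_t *result, const int16_t *a, const int16_t *b, int n)
{
    int i;
    for (i = 0; i < n; i++)
        result[i] = (int16_t)(a[i] + b[i]);
}
```

Called with arrays A and B from Table 3-1 and Table 3-2, result holds the values in Table 3-3.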

The NEON unit can only deal with vectors that are stored consecutively in memory, so it is not possible
to vectorize indirect addressing.
When writing structures, be aware that NEON structure loads require the structure to contain equal-sized
members.
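The following hypothetical C functions contrast the two cases: the first reads consecutive elements, and the second uses indirect addressing through an index array. The names are invented for this sketch.

```c
/* Contiguous access: src[i] and dst[i] are consecutive in memory, so a run
 * of elements can be loaded into a vector register at once. */
void scale_contiguous(int *dst, const int *src, int factor, int n)
{
    int i;
    for (i = 0; i < n; i++)
        dst[i] = src[i] * factor;
}

/* Indirect addressing: the element read on each iteration depends on the
 * contents of idx[], so the loaded values are not known to be consecutive
 * in memory. As described above, this cannot be vectorized. */
void scale_indirect(int *dst, const int *src, const int *idx,
                    int factor, int n)
{
    int i;
    for (i = 0; i < n; i++)
        dst[i] = src[idx[i]] * factor;
}
```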

Related concepts
3.3 Methods of writing code for NEON on page 3-72.

Related tasks
3.4 Generating NEON instructions from C or C++ code on page 3-73.

Related references
8.86 --fp16_format=format on page 8-414.
8.87 --fpmode=model on page 8-415.
8.192 --vectorize, --no_vectorize on page 8-534.

Related information
Introducing NEON Development Article.

3.3 Methods of writing code for NEON

You can use a number of different methods to write code for NEON.
These methods are as follows:
• Write in assembly language, or use embedded assembly language in C, and use the NEON
instructions directly.
• Write in C or C++ using the NEON C language extensions.
• Call a library routine that has been optimized to use NEON instructions.
• Use automatic vectorization to get loops vectorized for NEON.
Optimizing for performance requires an understanding of where in the program most of the time is spent.
To gain maximum performance benefits you might also have to use profiling and benchmarking of the
code under realistic conditions.

Related concepts
3.2 The NEON unit on page 3-70.
3.6 Automatic vectorization on page 3-75.
3.8 Stride patterns and data accesses on page 3-77.
3.9 Factors affecting NEON vectorization performance on page 3-78.
3.12 Data dependency conflicts when vectorizing code on page 3-81.
3.13 Carry-around scalar variables and vectorization on page 3-82.
3.14 Reduction of a vector to a scalar on page 3-83.
3.15 Vectorization on loops containing pointers on page 3-84.
3.16 Nonvectorization on loops containing pointers and indirect addressing on page 3-85.
3.17 Nonvectorization on conditional loop exits on page 3-86.
3.18 Vectorizable loop iteration counts on page 3-87.
3.19 Indicating loop iteration counts to the compiler with __promise(expr) on page 3-89.
3.20 Grouping structure accesses for vectorization on page 3-91.
3.21 Vectorization and struct member lengths on page 3-92.
3.22 Nonvectorization of function calls to non-inline functions from within loops on page 3-93.
3.23 Conditional statements and efficient vectorization on page 3-94.
3.24 Vectorization diagnostics to tune code for improved performance on page 3-95.
3.25 Vectorizable code example on page 3-97.
3.26 DSP vectorizable code example on page 3-99.

Related tasks
3.4 Generating NEON instructions from C or C++ code on page 3-73.

Related references
8.192 --vectorize, --no_vectorize on page 8-534.
3.5 NEON C extensions on page 3-74.
3.7 Data references within a vectorizable loop on page 3-76.
3.10 NEON vectorization performance goals on page 3-79.
3.11 Recommended loop structure for vectorization on page 3-80.
3.27 What can limit or prevent automatic vectorization on page 3-101.

3.4 Generating NEON instructions from C or C++ code

To generate NEON instructions from C or C++ code, specify the following compiler options:
• A target --cpu that has NEON capability, for example Cortex-A7, Cortex-A8, Cortex-A9,
Cortex-A12, or Cortex-A15.
• --vectorize to enable NEON vectorization.
• -O2 (default) or -O3 optimization level.
• -Otime to optimize for performance instead of code size.

You can also use --diag_warning=optimizations to obtain useful diagnostics from the compiler on
what it can and cannot optimize or vectorize. For example:
armcc --cpu Cortex-A8 --vectorize -O3 -Otime --diag_warning=optimizations source.c

Note
To run code that contains NEON instructions, you must enable both the FPU and NEON.

Related concepts
5.5 Enabling NEON and FPU for bare-metal on page 5-158.

Related tasks
5.4 Selecting the target processor at compile time on page 5-157.

Related references
8.192 --vectorize, --no_vectorize on page 8-534.
8.42 --cpu=list on page 8-367.
8.43 --cpu=name on page 8-368.
3.5 NEON C extensions on page 3-74.

Related information
Licensed features of ARM Compiler.

3.5 NEON C extensions

The NEON C extensions are a set of new data types and intrinsic functions defined by ARM to enable
access to the NEON unit from C.
Most of the vector functions map directly to vector instructions available in the NEON unit and are
compiled inline by the NEON-enhanced ARM C compiler. With these extensions, performance at C level
can be comparable to performance obtained with assembly language coding.
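As a sketch of what code using the NEON C extensions looks like, the following function performs a 16-bit array addition with intrinsics. The function name add_s16_neon is hypothetical. The sketch assumes that the compiler defines __ARM_NEON or __ARM_NEON__ when NEON is enabled; check the predefined macros for your compiler version. On targets without NEON, only the scalar loop is compiled. Chapter 18 Using NEON Support describes the intrinsics themselves.

```c
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Adds two int16_t arrays. With NEON, eight elements at a time are loaded
 * into 128-bit vectors (int16x8_t), added with one vector add, and stored.
 * The scalar loop handles any remaining elements, and all elements on
 * targets without NEON. */
void add_s16_neon(int16_t *result, const int16_t *a, const int16_t *b, int n)
{
    int i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    for (; i + 8 <= n; i += 8) {
        int16x8_t va = vld1q_s16(a + i);
        int16x8_t vb = vld1q_s16(b + i);
        vst1q_s16(result + i, vaddq_s16(va, vb));
    }
#endif
    for (; i < n; i++)
        result[i] = (int16_t)(a[i] + b[i]);
}
```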

Related concepts
3.3 Methods of writing code for NEON on page 3-72.

Related references
Chapter 18 Using NEON Support on page 18-923.

3.6 Automatic vectorization

Automatic vectorization involves the high-level analysis of loops in your code. This is the most efficient
way to map the majority of typical code onto the functionality of the NEON unit.
For most code, the gains that can be made with algorithm-dependent parallelism on a smaller scale are
very small relative to the cost of automatic analysis of such opportunities. For this reason, the NEON
unit is designed as a target for loop-based parallelism.
Vectorization is carried out in a way that ensures that optimized code gives the same results as
nonvectorized code. In certain cases, to avoid the possibility of an incorrect result, vectorization of a loop
is not carried out. This can lead to suboptimal code, and you might have to manually tune your code to
make it more suitable for automatic vectorization.
Automatic vectorization can also often be impeded by earlier manual optimization attempts, for example,
manual loop unrolling in the source code, or complex array accesses. For optimal results, it is best to
write code using simple loops, enabling the compiler to perform the optimization. For hand-optimized
legacy code, it can be easier to rewrite critical portions of the code based on the original algorithm using
simple loops.
By writing vectorizable loops instead of explicit NEON instructions, you preserve code portability
between processors, and you achieve performance levels similar to those of hand-coded vectorization
with less effort.
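
To illustrate the advice above, the sketch below shows a hypothetical hand-unrolled legacy loop and the simple loop it could be rewritten as. Both compute dst[i] = src[i] * k. The function names are made up for this example, and no claim is made about exactly how a particular compiler version treats the unrolled form.

```c
#include <assert.h>

/* Legacy hand-unrolled form: four statements per pass plus a separate
   tail loop. The extra structure can hide the simple access pattern. */
void scale_unrolled(int *dst, const int *src, int n, int k)
{
    int i;
    assert(n >= 0);
    for (i = 0; i + 3 < n; i += 4)
    {
        dst[i]     = src[i]     * k;
        dst[i + 1] = src[i + 1] * k;
        dst[i + 2] = src[i + 2] * k;
        dst[i + 3] = src[i + 3] * k;
    }
    for (; i < n; i++)
        dst[i] = src[i] * k;
}

/* Simple rewrite of the same algorithm: one statement, stride-one
   accesses, iteration count fixed at loop entry. */
void scale_simple(int *dst, const int *src, int n, int k)
{
    int i;
    assert(n >= 0);
    for (i = 0; i < n; i++)
        dst[i] = src[i] * k;
}
```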

Related concepts
3.8 Stride patterns and data accesses on page 3-77.
3.9 Factors affecting NEON vectorization performance on page 3-78.
3.12 Data dependency conflicts when vectorizing code on page 3-81.
3.13 Carry-around scalar variables and vectorization on page 3-82.
3.14 Reduction of a vector to a scalar on page 3-83.
3.15 Vectorization on loops containing pointers on page 3-84.
3.16 Nonvectorization on loops containing pointers and indirect addressing on page 3-85.
3.17 Nonvectorization on conditional loop exits on page 3-86.
3.18 Vectorizable loop iteration counts on page 3-87.
3.19 Indicating loop iteration counts to the compiler with __promise(expr) on page 3-89.
3.20 Grouping structure accesses for vectorization on page 3-91.
3.21 Vectorization and struct member lengths on page 3-92.
3.22 Nonvectorization of function calls to non-inline functions from within loops on page 3-93.
3.23 Conditional statements and efficient vectorization on page 3-94.
3.25 Vectorizable code example on page 3-97.
3.26 DSP vectorizable code example on page 3-99.

Related references
8.192 --vectorize, --no_vectorize on page 8-534.
3.7 Data references within a vectorizable loop on page 3-76.
3.10 NEON vectorization performance goals on page 3-79.
3.11 Recommended loop structure for vectorization on page 3-80.
3.27 What can limit or prevent automatic vectorization on page 3-101.

3.7 Data references within a vectorizable loop

To vectorize, the compiler has to identify variables with a vector access pattern. It also has to ensure that
there are no data dependencies between different iterations of the loop.
Data references in your code can be classified as one of three types:
Scalar
A single value that does not change throughout all of the loop iterations.
Index
An integer quantity that increments by a constant amount each pass through the loop.
Vector
A range of memory locations with a constant stride between consecutive elements.
The following example shows the classification of variables in a loop:
i,j
index variables
a,b
vectors
n,x
scalar
float *a, *b;
int i, j, n, x;
...
for (i = 0; i < n; i++)
{
*(a+j) = x + b[i];
j += 2;
};

Related concepts
3.8 Stride patterns and data accesses on page 3-77.

Related references
8.192 --vectorize, --no_vectorize on page 8-534.

3.8 Stride patterns and data accesses

The stride pattern of data accesses in a loop is the pattern of accesses to data elements between sequential
loop iterations.
For example, a loop that linearly accesses each element of an array has a stride pattern, or a stride, of
one. A loop that accesses an array with a constant offset between each element used has a constant stride.
float *a, *b;
int i, j=0, n, x;
...
for (i = 0; i < n; i++)
{
/* a is accessed with a stride of 2. */
/* b is accessed with a stride of 1. */
*(a+j) = x + b[i];
j += 2;
};
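
The index variable j in the loop above always equals 2*i, so the same accesses can also be written with the stride stated directly in the subscript. The sketch below, with hypothetical function names and x passed as a float parameter for simplicity, shows both forms. They write the same elements of a and leave the odd-numbered elements untouched.

```c
#include <assert.h>

/* Original form: j is an index variable advanced by 2 on each pass. */
void stride_index(float *a, const float *b, int n, float x)
{
    int i, j = 0;
    assert(n >= 0);
    for (i = 0; i < n; i++)
    {
        a[j] = x + b[i];   /* a: stride 2, b: stride 1 */
        j += 2;
    }
}

/* Equivalent form with the stride written directly in terms of i. */
void stride_direct(float *a, const float *b, int n, float x)
{
    int i;
    assert(n >= 0);
    for (i = 0; i < n; i++)
        a[2 * i] = x + b[i];
}
```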

Related references
3.7 Data references within a vectorizable loop on page 3-76.

3.9 Factors affecting NEON vectorization performance

The automatic vectorization process and the performance of the generated code are affected by a
number of criteria:
The way loops are organized
For best performance, the innermost loop in a loop nest must access arrays with a stride of one.
The way the data is structured
The data type dictates how many data elements can be held in a NEON register, and therefore
how many operations can be performed in parallel.
The iteration counts of loops
Longer iteration counts are generally better, because the loop overhead is reduced over more
iterations. Tiny iteration counts, such as two or three elements, can be faster to process with
nonvector instructions.
The data type of arrays
For example, NEON does not improve performance when double-precision floating-point arrays
are used, because the NEON unit does not support double-precision floating-point operations.
The use of memory hierarchy
In most current processors, memory bandwidth is low relative to processing capacity. For example,
a loop that performs relatively few arithmetic operations on a large data set retrieved from main
memory is limited by the memory bandwidth of the system, so vectorizing it might gain little.
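
The effect of the data type can be worked out from the register width. Assuming the 128-bit NEON quadword (Q) registers, the hypothetical helper below computes how many elements of a given size fit in one register: 16 8-bit, 8 16-bit, 4 32-bit (int or float) or 2 64-bit elements.

```c
#include <assert.h>
#include <stddef.h>

/* Width of a NEON quadword (Q) register in bytes: 128 bits. */
#define NEON_Q_BYTES 16u

/* Number of elements of elem_bytes bytes each that fit in one Q register. */
unsigned lanes_per_q(size_t elem_bytes)
{
    assert(elem_bytes > 0 && NEON_Q_BYTES % elem_bytes == 0);
    return (unsigned)(NEON_Q_BYTES / elem_bytes);
}
```

On ARM targets char, short, int and float are 8, 16, 32 and 32 bits wide, so a loop over short data can process up to twice as many elements per instruction as the same loop over int data.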

Related references
8.192 --vectorize, --no_vectorize on page 8-534.
3.10 NEON vectorization performance goals on page 3-79.
3.11 Recommended loop structure for vectorization on page 3-80.

3.10 NEON vectorization performance goals

Most applications require tuning to gain the best performance from vectorization. Because there is always
some overhead, the theoretical maximum performance cannot be reached.
For example, the NEON unit can process four single-precision floats at one time. This means that the
theoretical maximum performance for a floating-point application is a factor of four over the original
scalar nonvectorized code.

Related concepts
3.17 Nonvectorization on conditional loop exits on page 3-86.
3.16 Nonvectorization on loops containing pointers and indirect addressing on page 3-85.
3.15 Vectorization on loops containing pointers on page 3-84.
3.14 Reduction of a vector to a scalar on page 3-83.
3.13 Carry-around scalar variables and vectorization on page 3-82.
3.12 Data dependency conflicts when vectorizing code on page 3-81.
3.9 Factors affecting NEON vectorization performance on page 3-78.

Related references
8.192 --vectorize, --no_vectorize on page 8-534.
3.11 Recommended loop structure for vectorization on page 3-80.

3.11 Recommended loop structure for vectorization

The overall structure of a loop is important for obtaining the best performance from vectorization.
Generally, it is best to write simple loops with iteration counts that are fixed at the start of the loop, and
that do not contain complex conditional statements or conditional exits. You might have to rewrite your
loops to improve the vectorization performance of the code.
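
One common way for an iteration count not to be fixed at the start of a loop is a bound that is read through a pointer that the loop might also write. The sketch below, with hypothetical function names, copies the bound into a local variable before the loop. The two functions are only equivalent when data does not overlap *len.

```c
#include <assert.h>

/* Bound reloaded on every pass: because data[i] could alias *len, the
   compiler cannot assume the iteration count is fixed at loop entry. */
void add_k_reload(int *data, const int *len, int k)
{
    int i;
    for (i = 0; i < *len; i++)
        data[i] += k;
}

/* Bound copied into a local first: the count is fixed when the loop starts. */
void add_k_fixed(int *data, const int *len, int k)
{
    int i;
    const int n = *len;
    assert(n >= 0);
    for (i = 0; i < n; i++)
        data[i] += k;
}
```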

Related references
3.10 NEON vectorization performance goals on page 3-79.
8.192 --vectorize, --no_vectorize on page 8-534.

3.12 Data dependency conflicts when vectorizing code

A loop that has results from one iteration feeding back into a future iteration of the same loop is said to
have a data dependency conflict.
The conflicting values might be array elements or a scalar such as an accumulated sum.
Loops containing data dependency conflicts might not be completely optimized. Detecting data
dependencies involving arrays or pointers requires extensive analysis of the arrays used in each loop
nest. It also involves examination of the offset and stride of accesses to elements along each dimension
of arrays that are both used and stored in a loop. If there is a possibility of the usage and storage of arrays
overlapping on different iterations of a loop, then there is a data dependency problem. A loop cannot be
safely vectorized if the vector order of operations can change the results. In these cases, the compiler
detects the problem and leaves the loop in its original form or carries out a partial vectorization of the
loop. This type of data dependency must be avoided in your code to achieve the best performance.
In the loop shown below, the references to a[i-1] and a[i-2] at the top of the loop conflict with the
store into a[i] at the bottom. Vectorizing this loop would give a result that differs from the result
obtained without vectorization, so the compiler leaves it in its original form.
float a[99], b[99], t;
int i;
for (i = 3; i < 99; i++)
{
t = a[i-1] + a[i-2];
b[i] = t + 3.0 + a[i];
a[i] = sqrt(b[i]) - 5.0;
};
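
To see why the order of operations matters, the sketch below uses a simpler recurrence than the loop above, a[i] = a[i-1] + 1, and compares normal scalar order with a simulated vector order that loads four old values before storing four new ones, as a vector load, add and store sequence would. The four-lane width and the function names are assumptions for illustration.

```c
#include <assert.h>

#define LANES 4  /* assumption: simulated vector width */

/* Scalar order: each pass reads the value stored by the previous pass. */
void recur_scalar(int *a, int n)
{
    int i;
    for (i = 1; i < n; i++)
        a[i] = a[i - 1] + 1;
}

/* Simulated vector order: read LANES old values first, then compute and
   store LANES new values, as a vector load/add/store sequence would. */
void recur_vector_order(int *a, int n)
{
    int i, lane;
    assert(n >= 1);
    for (i = 1; i + LANES <= n; i += LANES)
    {
        int tmp[LANES];
        for (lane = 0; lane < LANES; lane++)   /* "vector load" of a[i-1 .. i+2] */
            tmp[lane] = a[i - 1 + lane];
        for (lane = 0; lane < LANES; lane++)   /* "vector add and store" to a[i .. i+3] */
            a[i + lane] = tmp[lane] + 1;
    }
    for (; i < n; i++)                         /* scalar fix-up */
        a[i] = a[i - 1] + 1;
}
```

Starting from a zero-filled array of nine elements, recur_scalar produces 0, 1, 2, ..., 8, while recur_vector_order produces 0, 1, 1, 1, 1, 2, 1, 1, 1.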

Information from other array subscripts is used as part of the dependency analysis. The loop in the
following example vectorizes because the first subscripts of the two references to array a, n and n+1,
can never be equal. The loop therefore reads and writes different rows of a, so the references do not
share data and there is no feedback between iterations.
float a[99][99], b[99], c[99];
int i, n;
...
for (i = 1; i < 99; i++)
{
a[n][i] = a[n+1][i-1] * b[i] + c[i];
}

Related references
3.11 Recommended loop structure for vectorization on page 3-80.
3.10 NEON vectorization performance goals on page 3-79.
8.192 --vectorize, --no_vectorize on page 8-534.

3.13 Carry-around scalar variables and vectorization

Scalar variables that are used and then set in a loop can cause problems for vectorization.
A scalar variable that is used but not set in a loop is replicated in each lane of a vector register, and
the replicated value is used in the vector calculation.
A scalar that is set and then used in a loop is promoted to a vector. Such variables generally hold
temporary values; after promotion, each one holds a vector of temporary values instead of a single
scalar value. In the following example, x is a used scalar and y is a promoted scalar.
Vectorizable loop:
float a[99], b[99], x, y;
int i, n;
...
for (i = 0; i < n; i++)
{
y = x + b[i];
a[i] = y + 1/y;
};

A scalar that is used and then set in a loop is called a carry-around scalar. These variables are a problem
for vectorization because the value computed in one pass of the loop is carried forward into the next pass.
In the following example, x is a carry-around scalar.
Nonvectorizable loop:
float a[99], b[99], x;
int i, n;
...
for (i = 0; i < n; i++)
{
a[i] = x + b[i];
x = a[i] + 1/x;
};
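
The carry-around scalar in the example above cannot be removed, because each new x depends nonlinearly on the previous one. When a carry-around scalar advances by a constant amount, however, it can often be rewritten in closed form in terms of the loop index, which turns it into an ordinary index expression. The following sketch, with hypothetical names, shows such a rewrite for a step of 3.

```c
#include <assert.h>

/* Carry-around form: x is used, then updated, so each pass depends on
   the value left by the previous pass. */
void ramp_carry(int *a, int n, int x)
{
    int i;
    for (i = 0; i < n; i++)
    {
        a[i] = x;
        x += 3;
    }
}

/* Closed form: the value for pass i is computed from i alone. */
void ramp_closed(int *a, int n, int x)
{
    int i;
    assert(n >= 0);
    for (i = 0; i < n; i++)
        a[i] = x + 3 * i;
}
```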

Related references
3.11 Recommended loop structure for vectorization on page 3-80.
3.10 NEON vectorization performance goals on page 3-79.
8.192 --vectorize, --no_vectorize on page 8-534.

3.14 Reduction of a vector to a scalar

A special category of scalar use within loops is reduction operations. This category involves the
reduction of a vector of values down to a scalar result.
The most common reduction is the summation of all elements of a vector. Other reductions include:
• The dot product of two vectors.
• The maximum value in a vector.
• The minimum value in a vector.
• The product of all vector elements.
• The index of the maximum or minimum element of a vector.
The following example shows a dot product reduction where x is a reduction scalar.
float a[99], b[99], x;
int i, n;
...
for (i = 0; i < n; i++) x += a[i] * b[i];

Reduction operations are worth vectorizing because they occur so often. In general, reduction operations
are vectorized by creating a vector of partial reductions that is then reduced into the final resulting scalar.
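
The partial-reduction strategy described above can be modeled in plain C. In this sketch, four partial sums stand in for the four 32-bit lanes of a NEON register. They are combined into one scalar at the end, and a scalar fix-up loop handles leftover elements. The function name and lane count are illustrative assumptions. Integer data is used so that the reordered sum is exact; with floating-point data, reordering the additions can change the rounding of the result.

```c
#include <assert.h>

#define LANES 4  /* assumption: four 32-bit lanes, as in one NEON Q register */

/* Dot product organized like a vectorized reduction: LANES partial
   sums, a final reduction to one scalar, and a scalar fix-up loop. */
int dot_partial(const int *a, const int *b, int n)
{
    int part[LANES] = {0};
    int i, lane, sum = 0;
    assert(n >= 0);
    for (i = 0; i + LANES <= n; i += LANES)
        for (lane = 0; lane < LANES; lane++)
            part[lane] += a[i + lane] * b[i + lane];  /* one "vector" multiply-accumulate */
    for (lane = 0; lane < LANES; lane++)              /* reduce partial sums to a scalar */
        sum += part[lane];
    for (; i < n; i++)                                /* scalar fix-up: leftover elements */
        sum += a[i] * b[i];
    return sum;
}
```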

Related concepts
3.17 Nonvectorization on conditional loop exits on page 3-86.
3.16 Nonvectorization on loops containing pointers and indirect addressing on page 3-85.
3.15 Vectorization on loops containing pointers on page 3-84.
3.13 Carry-around scalar variables and vectorization on page 3-82.
3.12 Data dependency conflicts when vectorizing code on page 3-81.
3.9 Factors affecting NEON vectorization performance on page 3-78.

Related references
3.11 Recommended loop structure for vectorization on page 3-80.
3.10 NEON vectorization performance goals on page 3-79.
8.192 --vectorize, --no_vectorize on page 8-534.

3.15 Vectorization on loops containing pointers

When accessing arrays, the compiler can often prove that memory accesses do not overlap. When using
pointers, this is less likely to be possible, and either requires a runtime test, or requires you to use the
restrict keyword.

The compiler is able to vectorize loops containing pointers if it can determine that the loop is safe. Both
array references and pointer references in loops are analyzed to see if there is any vector access to
memory. In some cases, the compiler creates a run-time test, and executes a vector version or scalar
version of the loop depending on the result of the test.
Function arguments are often passed as pointers. If several pointer variables are passed to a function,
they might point to overlapping sections of memory. At runtime this is usually not the case, but the
compiler must follow the safe approach: it cannot simply vectorize a loop that has pointers on both the
left and right sides of an assignment operator, because the pointers might overlap. For example, consider
the following function.
void func (int *pa, int *pb, int x)
{
int i;
for (i = 0; i < 100; i++)
{
*(pa + i) = *(pb + i) + x;
}
};

In this example, if pa and pb overlap in memory in a way that causes results from one loop pass to feed
back into a subsequent loop pass, then vectorizing the loop can give incorrect results. For example, if the
function is called with the following arguments, each pass reads the element written by the previous pass:
int *a;

func (a, a-1);

The compiler performs a runtime test to see if pointer aliasing occurs. If pointer aliasing does not occur,
it executes a vectorized version of the code. If pointer aliasing occurs, the original nonvectorized code
executes instead. This leads to a small cost in runtime efficiency and code size.
In practice, it is very rare for data dependence to exist because of function arguments. Programs that pass
overlapping pointers are very hard to understand and debug, apart from any vectorization concerns.
In the example above, adding restrict to pa is sufficient to avoid the runtime test.
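
As a minimal sketch, assuming C99 (with armcc, the --c99 or --restrict option, or the __restrict keyword in C90 mode), the function above with restrict added to pa looks as follows. The name func_restrict is hypothetical. Calling it with overlapping arguments, such as func_restrict(a, a-1, x), is undefined behavior, so the qualifier is only correct when callers never pass overlapping pointers.

```c
#include <assert.h>

/* func from above with pa qualified as restrict (C99): within this
   function, the ints modified through pa are not accessed through any
   other pointer, including pb. */
void func_restrict(int *restrict pa, const int *pb, int x)
{
    int i;
    assert(pa != pb);  /* cheap sanity check; it cannot detect partial overlap */
    for (i = 0; i < 100; i++)
    {
        *(pa + i) = *(pb + i) + x;
    }
}
```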

Related concepts
3.17 Nonvectorization on conditional loop exits on page 3-86.
3.16 Nonvectorization on loops containing pointers and indirect addressing on page 3-85.
3.14 Reduction of a vector to a scalar on page 3-83.
3.13 Carry-around scalar variables and vectorization on page 3-82.
3.12 Data dependency conflicts when vectorizing code on page 3-81.
3.9 Factors affecting NEON vectorization performance on page 3-78.

Related references
3.11 Recommended loop structure for vectorization on page 3-80.
3.10 NEON vectorization performance goals on page 3-79.
8.192 --vectorize, --no_vectorize on page 8-534.
8.164 --restrict, --no_restrict on page 8-500.
9.13 restrict on page 9-563.

3.16 Nonvectorization on loops containing pointers and indirect addressing

Indirect addressing is not vectorizable with the NEON unit.
Indirect addressing occurs when an array is accessed by a vector of values. If the array is being fetched
from memory, the operation is called a gather. If the array is being stored into memory, the operation is
called a scatter.
In the following example, a is being scattered and b is being gathered.
float a[99], b[99];
int ia[99], ib[99], i, n, j;
...
for (i = 0; i < n; i++) a[ia[i]] = b[j + ib[i]];

Indirect addressing is not vectorizable with the NEON unit because the NEON unit can only handle vectors
that are stored consecutively in memory. If a loop contains both indirect addressing and significant
calculations, it might be more efficient to move the indirect addressing into a separate nonvector loop.
This enables the calculations to vectorize efficiently.
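
A minimal sketch of that restructuring, assuming a gather-only loop with hypothetical function names and a caller-supplied temporary buffer tmp: the first loop performs only the indirect loads into contiguous storage and stays scalar, and the second loop then does the arithmetic with stride-one accesses that are candidates for vectorization.

```c
#include <assert.h>

/* Original shape: the gather b[ib[i]] and the arithmetic share one loop. */
void gather_compute_mixed(float *a, const float *b, const int *ib, int n)
{
    int i;
    for (i = 0; i < n; i++)
        a[i] = b[ib[i]] * 2.0f + 1.0f;
}

/* Split shape: a scalar loop performs only the gather into the contiguous
   buffer tmp, then a stride-one loop performs the arithmetic. */
void gather_compute_split(float *a, const float *b, const int *ib, int n, float *tmp)
{
    int i;
    assert(n >= 0);
    for (i = 0; i < n; i++)   /* gather: indirect loads, stays scalar */
        tmp[i] = b[ib[i]];
    for (i = 0; i < n; i++)   /* contiguous accesses: vectorization candidate */
        a[i] = tmp[i] * 2.0f + 1.0f;
}
```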

Related concepts
3.17 Nonvectorization on conditional loop exits on page 3-86.
3.15 Vectorization on loops containing pointers on page 3-84.
3.14 Reduction of a vector to a scalar on page 3-83.
3.13 Carry-around scalar variables and vectorization on page 3-82.
3.12 Data dependency conflicts when vectorizing code on page 3-81.
3.9 Factors affecting NEON vectorization performance on page 3-78.

Related references
3.11 Recommended loop structure for vectorization on page 3-80.
3.10 NEON vectorization performance goals on page 3-79.
8.192 --vectorize, --no_vectorize on page 8-534.

3.17 Nonvectorization on conditional loop exits

For vectorization purposes, it is best to write loops that do not contain conditional exits from the loop.
The following example is nonvectorizable because it contains a conditional exit from the loop. In cases
like this, you must rewrite the loop, if possible, for vectorization to succeed.
int a[99], b[99], c[99], i, n;
...
for (i = 0; i < n; i++)
{
a[i] = b[i] + c[i];
if (a[i] > 5) break;
};
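
One possible rewrite, sketched below with hypothetical names, splits the work into two loops. A search loop, which still has a conditional exit and stays scalar, finds how many elements the original loop would write; a second loop, with an iteration count fixed at its start, does the computation. This assumes that a does not overlap b or c. In this small example the computation is as cheap as the exit test, so the rewrite only illustrates the structure; it pays off when the computation is more expensive than the test.

```c
#include <assert.h>

/* Original form: conditional exit after each element. Returns the index
   at which the loop stopped. */
int add_until_scalar(int *a, const int *b, const int *c, int n)
{
    int i;
    for (i = 0; i < n; i++)
    {
        a[i] = b[i] + c[i];
        if (a[i] > 5) break;
    }
    return i;
}

/* Two-pass form: search first, then a loop with a fixed count and no exit. */
int add_until_split(int *a, const int *b, const int *c, int n)
{
    int i, stop = n, limit = n;
    for (i = 0; i < n; i++)          /* search loop: still scalar */
    {
        if (b[i] + c[i] > 5)
        {
            stop = i;                /* the original loop breaks here ... */
            limit = i + 1;           /* ... after writing a[i] */
            break;
        }
    }
    for (i = 0; i < limit; i++)      /* fixed count: vectorization candidate */
        a[i] = b[i] + c[i];
    return stop;
}
```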

Related concepts
3.16 Nonvectorization on loops containing pointers and indirect addressing on page 3-85.
3.15 Vectorization on loops containing pointers on page 3-84.
3.14 Reduction of a vector to a scalar on page 3-83.
3.13 Carry-around scalar variables and vectorization on page 3-82.
3.12 Data dependency conflicts when vectorizing code on page 3-81.
3.9 Factors affecting NEON vectorization performance on page 3-78.
3.23 Conditional statements and efficient vectorization on page 3-94.

Related references
3.11 Recommended loop structure for vectorization on page 3-80.
3.10 NEON vectorization performance goals on page 3-79.
8.192 --vectorize, --no_vectorize on page 8-534.

3.18 Vectorizable loop iteration counts

If a loop has a fixed iteration count, automatic vectorization is possible. The iteration count must be
known at the start of the loop.
In the vectorizable loop in the example below, the iteration count is n. The value of n does not change
during the loop, so this loop can be automatically vectorized.
If a loop does not have a fixed iteration count, automatic vectorization is not possible.
In the nonvectorizable loop in the example below, i changes by a data-dependent amount (i += a[i]) on
each pass, so the iteration count cannot be determined at the start of the loop, and the loop cannot be
automatically vectorized.

Table 3-4 Vectorizable and nonvectorizable loops

Vectorizable loop:

/* myprog1.c */
int a[99], b[99], c[99], i, n;
...
for (i = 0; i < n; i++)
{
a[i] = b[i] + c[i];
}

Compiled with:
armcc --cpu=Cortex-A8 -O3 -Otime --vectorize myprog1.c -o-

Generated code:
ARM
REQUIRE8
PRESERVE8

AREA ||.text||, CODE, READONLY, ALIGN=2

f PROC
PUSH {r4-r6}
MOV r0,#0
LDR r4,|L1.160|
STR r0,[r4,#0] ; i
LDR r12,[r4,#4] ; n
CMP r12,#0
BLE |L1.152|
ASR r0,r12,#31
LDR r1,|L1.164|
ADD r0,r12,r0,LSR #30
ADD r2,r1,#0x18c
ASRS r0,r0,#2
SUB r3,r2,#0x318
BEQ |L1.80|

|L1.56|
VLD1.32 {d0,d1},[r1]!
SUBS r0,r0,#1
VLD1.32 {d2,d3},[r2]!
VADD.I32 q0,q0,q1
VST1.32 {d0,d1},[r3]!
BNE |L1.56|

|L1.80|
AND r0,r12,#3
CMP r0,#0
BLE |L1.144|
SUB r0,r12,r0
CMP r0,r12
BGE |L1.144|
LDR r1,|L1.164|
ADD r2,r1,#0x18c
SUB r3,r2,#0x318

|L1.116|
LDR r5,[r1,r0,LSL #2]
LDR r6,[r2,r0,LSL #2]
ADD r5,r5,r6
STR r5,[r3,r0,LSL #2]
ADD r0,r0,#1
CMP r0,r12
BLT |L1.116|

|L1.144|
LDR r0,[r4,#4] ; n
STR r0,[r4,#0] ; i

|L1.152|
POP {r4-r6}
BX lr
ENDP

Nonvectorizable loop:

/* myprog2.c */
int a[99], b[99], c[99], i, n;
...
while (i < n)
{
a[i] = b[i] + c[i];
i += a[i];
};

Compiled with:
armcc --cpu=Cortex-A8 -O3 -Otime --vectorize myprog2.c -o-

Generated code:
ARM
REQUIRE8
PRESERVE8

AREA ||.text||, CODE, READONLY, ALIGN=2

||foo|| PROC
LDR r3,|L1.76|
LDR r0,[r3,#0] ; i, n
LDR r2,[r3,#4]
CMP r0,r2
BXGE lr
PUSH {r4-r6}
LDR r12,|L1.80|
ADD r4,r12,#0x18c
SUB r5,r12,#0x18c

|L1.36|
LDR r1,[r12,r0,LSL #2]
LDR r6,[r4,r0,LSL #2]
ADD r1,r1,r6
STR r1,[r5,r0,LSL #2]
ADD r0,r0,r1
CMP r0,r2
STR r0,[r3,#0] ; i
BLT |L1.36|
POP {r4-r6}
BX lr
ENDP

Related concepts
3.19 Indicating loop iteration counts to the compiler with __promise(expr) on page 3-89.

Related references
8.192 --vectorize, --no_vectorize on page 8-534.

3.19 Indicating loop iteration counts to the compiler with __promise(expr)

The __promise intrinsic lets you indicate to the compiler that a loop iteration count is, for example,
always divisible by 8. This enables the compiler to generate smaller and faster code by reducing the
overhead of runtime iteration count tests.
The NEON unit can operate on elements in groups of 2, 4, 8, or 16. Where the iteration count at the start
of the loop is unknown, the compiler might add a runtime test to check whether the iteration count is a
multiple of the number of lanes available for the data type in a NEON register. This increases code size
because additional nonvectorized code is generated to execute any remaining loop iterations.
The overhead added by the runtime test is typically insignificant compared with the performance
increase that arises from the vectorized code, although corner cases do exist. For example, an iteration
count of 17 gives a group of 16 elements to operate on in parallel, with 1 iteration left over as
nonvectorized code, whereas an iteration count of 3 gives a group of only 2 elements to operate on in
parallel. In the latter case, the overhead of the runtime test is proportionally greater in comparison with
the vectorized code.
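
The arithmetic behind these cases is an integer division: the vectorized body handles n / lanes full groups, and the scalar fix-up handles the n % lanes leftover iterations. The sketch below models that code shape in plain C, assuming four 32-bit lanes for an int loop. The function names are hypothetical, and the inner lane loop stands in for what would be single vector instructions.

```c
#include <assert.h>

/* Splits n iterations into full groups of `lanes` elements (handled by
   the vectorized loop body) and leftover iterations (handled by the
   scalar fix-up loop). */
void split_iterations(int n, int lanes, int *groups, int *leftover)
{
    assert(n >= 0 && lanes > 0);
    *groups = n / lanes;
    *leftover = n % lanes;
}

/* Plain C model of the generated code shape for x[i]++ over n ints. */
void increment_all(int *x, int n)
{
    int groups, leftover, g, lane, i;
    split_iterations(n, 4, &groups, &leftover);
    for (g = 0; g < groups; g++)          /* models the vectorized body */
        for (lane = 0; lane < 4; lane++)
            x[4 * g + lane]++;
    for (i = n - leftover; i < n; i++)    /* scalar fix-up loop */
        x[i]++;
}
```

For the two cases in the text, split_iterations(17, 16, ...) gives one group and one leftover iteration, and split_iterations(3, 2, ...) also gives one group and one leftover iteration, but the leftover is a much larger share of the work in the second case.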
If you know that the iteration count is divisible by the number of elements that the NEON unit can
operate on in parallel, you can indicate this to the compiler using the __promise intrinsic, for example:
/* Promise the compiler that the loop iteration count is divisible by 16 */
__promise((k % 16) == 0);
for (i = 0; i < k; i++)
{
...
}

Use the __promise intrinsic when the loop iteration count is not known at the start of the loop but you
can guarantee that the promised condition always holds. This reduces the size of the generated code and
can give a performance improvement.
The disassembled output of the example code below illustrates the difference that __promise makes.
Without a promise, the compiler generates additional nonvectorized code to execute any loop iterations
beyond the last whole multiple of the number of lanes available for the data type in a NEON register.
This additional code is known as a scalar fix-up loop. With __promise(expr), the scalar fix-up loop is
removed, and the disassembly is reduced to a simple vectorized loop.
/* promise.c */
void f(int *x, int n)
{
int i;
__promise((n > 0) && ((n & 7) == 0));
for (i=0; i < n; i++) x[i]++;
}
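
If the same source must also build with other compilers, one approach is to wrap __promise in a macro. The sketch below assumes that, when compiling with armcc, __ARMCC_VERSION is defined and __clang__ is not; the PROMISE macro and the f_portable function are hypothetical. With other compilers, the promise becomes a runtime assert() check, which is removed when NDEBUG is defined and gives no vectorization hint.

```c
#include <assert.h>  /* for __promise() under armcc, assert() elsewhere */

#if defined(__ARMCC_VERSION) && !defined(__clang__)
#define PROMISE(e) __promise(e)   /* ARM Compiler 5 (armcc) intrinsic */
#else
#define PROMISE(e) assert(e)      /* runtime check unless NDEBUG is defined */
#endif

/* Same loop as promise.c, with the promise wrapped in PROMISE. */
void f_portable(int *x, int n)
{
    int i;
    PROMISE((n > 0) && ((n & 7) == 0));  /* n is a positive multiple of 8 */
    for (i = 0; i < n; i++) x[i]++;
}
```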

When compiling for a processor that supports NEON, the disassembled output might be similar to the
following, for example:
ARM
REQUIRE8
PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2
f PROC
VMOV.I32 q0,#0x1
ASR r1,r1,#2
|L0.8|
VLD1.32 {d2,d3},[r0]
SUBS r1,r1,#1
VADD.I32 q1,q1,q0
VST1.32 {d2,d3},[r0]!
BNE |L0.8|
BX lr
ENDP

Related concepts
3.18 Vectorizable loop iteration counts on page 3-87.

Related references
8.192 --vectorize, --no_vectorize on page 8-534.
8.164 --restrict, --no_restrict on page 8-500.
9.13 restrict on page 9-563.
10.131 __promise intrinsic on page 10-742.

3.20 Grouping structure accesses for vectorization

Writing loops to use all parts of a structure together is important for vectorization. Each part of the
structure must be accessed within the same loop.
The following examples show how loop organization can affect vectorization.
Structure access resulting in a nonvectorizable loop:
for (...) { buffer[i].a = ....; }
for (...) { buffer[i].b = ....; }
for (...) { buffer[i].c = ....; }

Structure access resulting in a vectorizable loop:

for (...)
{
buffer[i].a = ....;
buffer[i].b = ....;
buffer[i].c = ....;
}
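
A concrete version of the vectorizable form, with a hypothetical struct sample type and fill_samples function: every member of buffer[i] is written in the same loop, so each iteration stores one whole structure. This is the access pattern that NEON interleaving stores such as VST3 handle.

```c
#include <assert.h>

/* Hypothetical record type with three members of the same length. */
struct sample
{
    int a;
    int b;
    int c;
};

/* Writes all three members of buffer[i] in a single loop. */
void fill_samples(struct sample *buffer, const int *src, int n)
{
    int i;
    assert(n >= 0);
    for (i = 0; i < n; i++)
    {
        buffer[i].a = src[i];
        buffer[i].b = src[i] * 2;
        buffer[i].c = src[i] + 1;
    }
}
```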

Related concepts
3.21 Vectorization and struct member lengths on page 3-92.

Related references
8.192 --vectorize, --no_vectorize on page 8-534.

3.21 Vectorization and struct member lengths

NEON structure loads require that all members of a structure are of the same length.
In the example code below, the compiler does not attempt to use vector loads because of the inconsistent
structure member lengths.
struct foo
{
short a;
int b;
short c;
} n[10];

This code could be rewritten for vectorization by using the same data type throughout the structure. For
example, if the variable b is to be of type int, consider making variables a and c of type int rather than
short.
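
A sketch of that rewrite, with hypothetical type and function names: struct foo_uniform uses int for all three members. On ABIs where int is 32 bits and 4-byte aligned, such as the AAPCS, the original layout is already padded to 12 bytes, so in this particular case widening a and c does not increase the structure size.

```c
#include <assert.h>

/* Mixed member lengths, as in struct foo above. */
struct foo_mixed
{
    short a;
    int b;
    short c;
};

/* Rewritten with a single member length throughout. */
struct foo_uniform
{
    int a;
    int b;
    int c;
};

/* Sums the members of each element; every load is a 32-bit int. */
void sum_members(int *out, const struct foo_uniform *p, int count)
{
    int i;
    assert(count >= 0);
    for (i = 0; i < count; i++)
        out[i] = p[i].a + p[i].b + p[i].c;
}
```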

Related concepts
3.20 Grouping structure accesses for vectorization on page 3-91.

Related references
8.192 --vectorize, --no_vectorize on page 8-534.

3.22 Nonvectorization of function calls to non-inline functions from within loops

Calls to non-inline functions from within a loop inhibit vectorization.
Splitting complex operations into several functions to aid clarity is common practice. However, if such
functions are to be considered for vectorization, they must be marked with the __inline or
__forceinline keywords if they are called from within any loops. These functions are then expanded
inline for vectorization to take place.
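
A minimal sketch of the pattern, with hypothetical names: the small helper clamp255 is marked static inline (C99) so that the compiler can expand it into the calling loop. With armcc in its default C90 mode you would use __inline or __forceinline instead, as described above. Whether the loop then vectorizes depends on the compiler and the options used.

```c
#include <assert.h>

/* Small helper: clamps v to the range 0..255. static inline (C99) makes
   it a candidate for inline expansion into the calling loop. */
static inline int clamp255(int v)
{
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}

/* Loop that calls the helper; once clamp255 is expanded inline, the loop
   body contains no function call. */
void clamp_all(int *dst, const int *src, int n)
{
    int i;
    assert(n >= 0);
    for (i = 0; i < n; i++)
        dst[i] = clamp255(src[i]);
}
```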

Related references
8.192 --vectorize, --no_vectorize on page 8-534.
10.8 __inline on page 10-608.
10.6 __forceinline on page 10-605.

3.23 Conditional statements and efficient vectorization

For efficient vectorization, loops must contain mostly assignment statements and must limit the use of if
and switch statements.
Loop invariant conditions are simple conditions that do not change between iterations of the loop. The
compiler can move loop invariant conditions before the loop so that they are executed once, rather than
on each loop iteration.
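
The movement of a loop-invariant condition, a transformation usually called loop unswitching, can be pictured as the two equivalent functions below (hypothetical names). In the first, the test on mode is evaluated in every iteration even though its result never changes; the second makes the test once and then runs one of two branch-free loops.

```c
#include <assert.h>

/* The test on mode is loop invariant: it has the same result in every iteration. */
void apply_inside(float *a, const float *b, int n, int mode)
{
    int i;
    for (i = 0; i < n; i++)
    {
        if (mode) a[i] = b[i] * 2.0f;
        else      a[i] = b[i] + 1.0f;
    }
}

/* The same computation with the invariant test moved before the loop. */
void apply_hoisted(float *a, const float *b, int n, int mode)
{
    int i;
    assert(n >= 0);
    if (mode)
    {
        for (i = 0; i < n; i++)
            a[i] = b[i] * 2.0f;
    }
    else
    {
        for (i = 0; i < n; i++)
            a[i] = b[i] + 1.0f;
    }
}
```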
The compiler can vectorize more complex conditional operations by computing all pathways in vector
mode and merging the results. If there is significant conditional computation, then performance might
suffer.
The following example uses conditional statements in a way that is acceptable for vectorization.
float a[99], b[99], c[99];
int i, n;
...
for (i = 0; i < n; i++)
{
if (c[i] > 0) a[i] = b[i] - 5.0f;
else a[i] = b[i] * 2.0f;
};
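
The compute-both-pathways approach described above can be written out explicitly. In this sketch (hypothetical function name), both candidate results are computed for every element and the condition then selects one of them. This is the shape that a vector compare followed by a bitwise select, such as the NEON VBSL instruction, can implement.

```c
#include <assert.h>

/* Computes both pathways for every element, then selects one result. */
void select_both(float *a, const float *b, const float *c, int n)
{
    int i;
    assert(n >= 0);
    for (i = 0; i < n; i++)
    {
        float then_val = b[i] - 5.0f;              /* result if c[i] > 0 */
        float else_val = b[i] * 2.0f;              /* result otherwise */
        a[i] = (c[i] > 0) ? then_val : else_val;   /* merge step */
    }
}
```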

Related concepts
3.17 Nonvectorization on conditional loop exits on page 3-86.

Related references
8.192 --vectorize, --no_vectorize on page 8-534.
3.11 Recommended loop structure for vectorization on page 3-80.

3.24 Vectorization diagnostics to tune code for improved performance

The compiler can provide diagnostic information to indicate where vectorization optimizations were
successfully applied and where it failed to apply vectorization.
The command-line options that provide this information are --diag_warning=optimizations and
--remarks.

The following example shows two functions that implement a simple sum operation on an array. This
code does not vectorize.
int addition(int a, int b)
{
return a + b;
}
void add_int(int *pa, int *pb, unsigned int n, int x)
{
unsigned int i;
for(i = 0; i < n; i++) *(pa + i) = addition(*(pb + i),x);
/* Function calls cannot be vectorized */
}

Using the --diag_warning=optimizations option produces an optimization warning message for the
addition() function:

armcc -O3 -Otime --vectorize --diag_warning=optimizations test.c

Using the --remarks option produces the same messages.

Adding the __inline qualifier to the definition of addition() enables this code to vectorize. However,
it is still not optimal. Using the --diag_warning=optimizations option again produces optimization
warning messages indicating that the loop vectorizes but that there is a potential pointer aliasing
problem.
The compiler must generate a runtime test for aliasing and output both vectorized and scalar copies of the
code. If you know that the pointers are not aliased, you can use the restrict keyword to avoid the
runtime test and improve vectorization performance:
__inline int addition(int a, int b)
{
return a + b;
}
void add_int(int * __restrict pa, int * __restrict pb, unsigned int n, int x)
{
unsigned int i;
for(i = 0; i < n; i++) *(pa + i) = addition(*(pb + i),x);
}

The final improvement you can make is to indicate the number of loop iterations. In the previous
example, the number of iterations is not fixed and might not be a multiple of the number of elements
that fit in a NEON register. This means that the compiler must test for remaining iterations and execute
them using nonvectorized code. If you know that your iteration count is divisible by the number of
elements that the NEON unit can operate on in parallel, you can indicate this to the compiler using the
__promise intrinsic. The following example shows the final code that obtains the best performance from
vectorization.
__inline int addition(int a, int b)
{
return a + b;
}
void add_int(int * __restrict pa, int * __restrict pb, unsigned int n, int x)
{
unsigned int i;
__promise((n % 4) == 0);
/* n is a multiple of 4 */
for(i = 0; i < (n & ~3); i++) *(pa + i) = addition(*(pb + i),x);
}

Related concepts
3.25 Vectorizable code example on page 3-97.
3.26 DSP vectorizable code example on page 3-99.

Related references
8.192 --vectorize, --no_vectorize on page 8-534.
8.61 --diag_suppress=tag[,tag,...] on page 8-389.
8.63 --diag_warning=tag[,tag,...] on page 8-391.
8.164 --restrict, --no_restrict on page 8-500.
9.13 restrict on page 9-563.

3.25 Vectorizable code example

The following is a complete example that uses vectorizable code.
The options required to build this example are listed within the introductory source code comments. The
--cpu=name option must name a processor that has NEON technology, such as Cortex-A7, Cortex-A8,
Cortex-A9, Cortex-A12, or Cortex-A15.
You can use --diag_warning=optimizations to view where vectorization optimization is applied.
The use of __promise enables the compiler to generate smaller and faster code. The code still works and
vectorizes without these promises, but is then larger and slower.
/* Vectorizable example code.
 * Copyright 2006 ARM. All rights reserved.
 *
 * Includes embedded assembly to initialize cpu; link using '--entry=init_cpu'.
 *
 * Build using:
 * armcc --vectorize -c vector_example.c --cpu Cortex-A8 -Otime -O3 -DNDEBUG
 * armlink -o vector_example.axf vector_example.o --entry=init_cpu
 */
#include <stdio.h>
#include <assert.h> /* for __promise() */

void fir(short *__restrict y, const short *x, const short *h, int n_out, int n_coefs)
{
    int n;
    /* I promise 'n_out is always a positive multiple of 8' */
    __promise(0 < n_out && (n_out % 8) == 0);
    for (n = 0; n < n_out; n++)
    {
        int k, sum = 0;
        /* I promise 'n_coefs is always a positive multiple of 4' */
        __promise(0 < n_coefs && (n_coefs % 4) == 0);
        for (k = 0; k < n_coefs; k++)
        {
            sum += h[k] * x[n - n_coefs + 1 + k];
        }
        y[n] = ((sum>>15) + 1) >> 1;
    }
}
int main()
{
static const short x[128+7] =
{
0x0000, 0x0647, 0x0c8b, 0x12c8, 0x18f8, 0x1f19, 0x2528, 0x2b1f,
0x30fb, 0x36ba, 0x3c56, 0x41ce, 0x471c, 0x4c3f, 0x5133, 0x55f5,
0x5a82, 0x5ed7, 0x62f2, 0x66cf, 0x6a6d, 0x6dca, 0x70e2, 0x73b5,
0x7641, 0x7884, 0x7a7d, 0x7c29, 0x7d8a, 0x7e9d, 0x7f62, 0x7fd8,
0x8000, 0x7fd8, 0x7f62, 0x7e9d, 0x7d8a, 0x7c29, 0x7a7d, 0x7884,
0x7641, 0x73b5, 0x70e2, 0x6dca, 0x6a6d, 0x66cf, 0x62f2, 0x5ed7,
0x5a82, 0x55f5, 0x5133, 0x4c3f, 0x471c, 0x41ce, 0x3c56, 0x36ba,
0x30fb, 0x2b1f, 0x2528, 0x1f19, 0x18f8, 0x12c8, 0x0c8b, 0x0647,
0x0000, 0xf9b9, 0xf375, 0xed38, 0xe708, 0xe0e7, 0xdad8, 0xd4e1,
0xcf05, 0xc946, 0xc3aa, 0xbe32, 0xb8e4, 0xb3c1, 0xaecd, 0xaa0b,
0xa57e, 0xa129, 0x9d0e, 0x9931, 0x9593, 0x9236, 0x8f1e, 0x8c4b,
0x89bf, 0x877c, 0x8583, 0x83d7, 0x8276, 0x8163, 0x809e, 0x8028,
0x8000, 0x8028, 0x809e, 0x8163, 0x8276, 0x83d7, 0x8583, 0x877c,
0x89bf, 0x8c4b, 0x8f1e, 0x9236, 0x9593, 0x9931, 0x9d0e, 0xa129,
0xa57e, 0xaa0b, 0xaecd, 0xb3c1, 0xb8e4, 0xbe32, 0xc3aa, 0xc946,
0xcf05, 0xd4e1, 0xdad8, 0xe0e7, 0xe708, 0xed38, 0xf375, 0xf9b9,
0x0000, 0x0647, 0x0c8b, 0x12c8, 0x18f8, 0x1f19, 0x2528
};
static const short coeffs[8] =
{
0x0800, 0x1000, 0x2000, 0x4000,
0x4000, 0x2000, 0x1000, 0x0800
};
short y[128];
static const short expected[128] =
{
0x1474, 0x1a37, 0x1fe9, 0x2588, 0x2b10, 0x307d, 0x35cc, 0x3afa,
0x4003, 0x44e5, 0x499d, 0x4e27, 0x5281, 0x56a9, 0x5a9a, 0x5e54,
0x61d4, 0x6517, 0x681c, 0x6ae1, 0x6d63, 0x6fa3, 0x719d, 0x7352,
0x74bf, 0x6de5, 0x66c1, 0x5755, 0x379e, 0x379e, 0x5755, 0x66c1,
0x6de5, 0x74bf, 0x7352, 0x719d, 0x6fa3, 0x6d63, 0x6ae1, 0x681c,
0x6517, 0x61d4, 0x5e54, 0x5a9a, 0x56a9, 0x5281, 0x4e27, 0x499d,
0x44e5, 0x4003, 0x3afa, 0x35cc, 0x307d, 0x2b10, 0x2588, 0x1fe9,
0x1a37, 0x1474, 0x0ea5, 0x08cd, 0x02f0, 0xfd10, 0xf733, 0xf15b,
0xeb8c, 0xe5c9, 0xe017, 0xda78, 0xd4f0, 0xcf83, 0xca34, 0xc506,
0xbffd, 0xbb1b, 0xb663, 0xb1d9, 0xad7f, 0xa957, 0xa566, 0xa1ac,
0x9e2c, 0x9ae9, 0x97e4, 0x951f, 0x929d, 0x905d, 0x8e63, 0x8cae,
0x8b41, 0x8a1b, 0x893f, 0x88ab, 0x8862, 0x8862, 0x88ab, 0x893f,
0x8a1b, 0x8b41, 0x8cae, 0x8e63, 0x905d, 0x929d, 0x951f, 0x97e4,
0x9ae9, 0x9e2c, 0xa1ac, 0xa566, 0xa957, 0xad7f, 0xb1d9, 0xb663,
0xbb1b, 0xbffd, 0xc506, 0xca34, 0xcf83, 0xd4f0, 0xda78, 0xe017,
0xe5c9, 0xeb8c, 0xf15b, 0xf733, 0xfd10, 0x02f0, 0x08cd, 0x0ea5,
};
int i, ok = 1;
fir(y, x + 7, coeffs, 128, 8);
for (i = 0; i < sizeof(y)/sizeof(*y); ++i)
{
if (y[i] != expected[i])
{
printf("mismatch: y[%d] = 0x%04x; expected[%d] = 0x%04x\n", i, y[i], i,
expected[i]);
ok = 0;
break;
}
}
if (ok) printf("** TEST PASSED OK **\n");
return ok ? 0 : 1;
}
#ifdef __TARGET_ARCH_7_A
__asm void init_cpu()
{
    // Set up processor state
    MRC p15,0,r4,c1,c0,0
    ORR r4,r4,#0x00400000 // enable unaligned mode (U=1)
    BIC r4,r4,#0x00000002 // disable alignment faults (A=0)
    // MMU not enabled: no page tables
    MCR p15,0,r4,c1,c0,0
#ifdef __BIG_ENDIAN
    SETEND BE
#endif
    MRC p15,0,r4,c1,c0,2 // Enable VFP access in the CAR -
    ORR r4,r4,#0x00f00000 // must be done before any VFP instructions
    MCR p15,0,r4,c1,c0,2
    MOV r4,#0x40000000 // Set EN bit in FPEXC
    MSR FPEXC,r4
    IMPORT __main
    B __main
}
#endif
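The last statement of fir(), y[n] = ((sum>>15) + 1) >> 1;, divides the accumulator by 2^16 and rounds to the nearest integer, with halves rounded up. The sketch below uses hypothetical function names to compare this expression with the more familiar add-half-then-shift form. Both versions assume that >> on a negative int is an arithmetic shift. The C standard leaves this implementation-defined, but armcc performs an arithmetic shift. A plausible reason for the two-step form is that it cannot overflow, whereas sum + 0x8000 overflows when sum is within 0x8000 of INT_MAX.

```c
/* Rounding used by fir(): floor(sum / 2^15), plus 1, then halve. */
static int round_shift16(int sum)
{
    return ((sum >> 15) + 1) >> 1;
}

/* Equivalent "add half, then shift" form (can overflow near INT_MAX). */
static int round_shift16_add_half(int sum)
{
    return (sum + 0x8000) >> 16;
}
```

For example, round_shift16(98304), which is 1.5 * 2^16, returns 2. round_shift16(-98304) returns -1, because -1.5 rounds up to -1.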

Related concepts
3.26 DSP vectorizable code example on page 3-99.
3.24 Vectorization diagnostics to tune code for improved performance on page 3-95.
7.26 Embedded assembler support in the compiler on page 7-291.
3.25 Vectorizable code example on page 3-97.

Related references
8.192 --vectorize, --no_vectorize on page 8-534.
8.43 --cpu=name on page 8-368.
8.61 --diag_suppress=tag[,tag,...] on page 8-389.
8.63 --diag_warning=tag[,tag,...] on page 8-391.
10.161 Predefined macros on page 10-786.
8.164 --restrict, --no_restrict on page 8-500.
9.13 restrict on page 9-563.

Related information
--entry=location linker option.

3.26 DSP vectorizable code example

The following example shows a complete Digital Signal Processing (DSP) example that uses
vectorizable code.
The options required to build this example are listed within the introductory source code comments. The
--cpu=name option must name a processor that has NEON technology, such as Cortex-A7, Cortex-A8,
Cortex-A9, Cortex-A12, or Cortex-A15.
You can use --diag_warning=optimizations to view where vectorization optimization is applied.
/* DSP vectorizable example code.
 * Copyright 2006 ARM. All rights reserved.
 *
 * Includes embedded assembly to initialize cpu; link using '--entry=init_cpu'.
 *
 * Build using:
 * armcc -c dsp_vector_example.c --cpu Cortex-A8 -O3 -Otime --vectorize -DNDEBUG
 * armlink -o dsp_vector_example.axf dsp_vector_example.o --entry=init_cpu
 */
#include <stdio.h>
#include <dspfns.h>
#include <assert.h> /* for __promise() */

void fn(short *__restrict r, int n, const short *__restrict a, const short *__restrict b)
{
    int i;
    /* I promise 'n is always a positive multiple of 8' */
    __promise(0 < n && (n % 8) == 0);
    for (i = 0; i < n; ++i)
    {
        r[i] = add(a[i], b[i]);
    }
}
int main()
{
static const short x[128] =
{
0x0000, 0x0647, 0x0c8b, 0x12c8, 0x18f8, 0x1f19, 0x2528, 0x2b1f,
0x30fb, 0x36ba, 0x3c56, 0x41ce, 0x471c, 0x4c3f, 0x5133, 0x55f5,
0x5a82, 0x5ed7, 0x62f2, 0x66cf, 0x6a6d, 0x6dca, 0x70e2, 0x73b5,
0x7641, 0x7884, 0x7a7d, 0x7c29, 0x7d8a, 0x7e9d, 0x7f62, 0x7fd8,
0x8000, 0x7fd8, 0x7f62, 0x7e9d, 0x7d8a, 0x7c29, 0x7a7d, 0x7884,
0x7641, 0x73b5, 0x70e2, 0x6dca, 0x6a6d, 0x66cf, 0x62f2, 0x5ed7,
0x5a82, 0x55f5, 0x5133, 0x4c3f, 0x471c, 0x41ce, 0x3c56, 0x36ba,
0x30fb, 0x2b1f, 0x2528, 0x1f19, 0x18f8, 0x12c8, 0x0c8b, 0x0647,
0x0000, 0xf9b9, 0xf375, 0xed38, 0xe708, 0xe0e7, 0xdad8, 0xd4e1,
0xcf05, 0xc946, 0xc3aa, 0xbe32, 0xb8e4, 0xb3c1, 0xaecd, 0xaa0b,
0xa57e, 0xa129, 0x9d0e, 0x9931, 0x9593, 0x9236, 0x8f1e, 0x8c4b,
0x89bf, 0x877c, 0x8583, 0x83d7, 0x8276, 0x8163, 0x809e, 0x8028,
0x8000, 0x8028, 0x809e, 0x8163, 0x8276, 0x83d7, 0x8583, 0x877c,
0x89bf, 0x8c4b, 0x8f1e, 0x9236, 0x9593, 0x9931, 0x9d0e, 0xa129,
0xa57e, 0xaa0b, 0xaecd, 0xb3c1, 0xb8e4, 0xbe32, 0xc3aa, 0xc946,
0xcf05, 0xd4e1, 0xdad8, 0xe0e7, 0xe708, 0xed38, 0xf375, 0xf9b9,
};
static const short y[128] =
{
0x8000, 0x7fd8, 0x7f62, 0x7e9d, 0x7d8a, 0x7c29, 0x7a7d, 0x7884,
0x7641, 0x73b5, 0x70e2, 0x6dca, 0x6a6d, 0x66cf, 0x62f2, 0x5ed7,
0x5a82, 0x55f5, 0x5133, 0x4c3f, 0x471c, 0x41ce, 0x3c56, 0x36ba,
0x30fb, 0x2b1f, 0x2528, 0x1f19, 0x18f8, 0x12c8, 0x0c8b, 0x0647,
0x0000, 0xf9b9, 0xf375, 0xed38, 0xe708, 0xe0e7, 0xdad8, 0xd4e1,
0xcf05, 0xc946, 0xc3aa, 0xbe32, 0xb8e4, 0xb3c1, 0xaecd, 0xaa0b,
0xa57e, 0xa129, 0x9d0e, 0x9931, 0x9593, 0x9236, 0x8f1e, 0x8c4b,
0x89bf, 0x877c, 0x8583, 0x83d7, 0x8276, 0x8163, 0x809e, 0x8028,
0x8000, 0x8028, 0x809e, 0x8163, 0x8276, 0x83d7, 0x8583, 0x877c,
0x89bf, 0x8c4b, 0x8f1e, 0x9236, 0x9593, 0x9931, 0x9d0e, 0xa129,
0xa57e, 0xaa0b, 0xaecd, 0xb3c1, 0xb8e4, 0xbe32, 0xc3aa, 0xc946,
0xcf05, 0xd4e1, 0xdad8, 0xe0e7, 0xe708, 0xed38, 0xf375, 0xf9b9,
0x0000, 0x0647, 0x0c8b, 0x12c8, 0x18f8, 0x1f19, 0x2528, 0x2b1f,
0x30fb, 0x36ba, 0x3c56, 0x41ce, 0x471c, 0x4c3f, 0x5133, 0x55f5,
0x5a82, 0x5ed7, 0x62f2, 0x66cf, 0x6a6d, 0x6dca, 0x70e2, 0x73b5,
0x7641, 0x7884, 0x7a7d, 0x7c29, 0x7d8a, 0x7e9d, 0x7f62, 0x7fd8,
};
short r[128];
static const short expected[128] =
{
0x8000, 0x7fff, 0x7fff, 0x7fff, 0x7fff, 0x7fff, 0x7fff, 0x7fff,
0x7fff, 0x7fff, 0x7fff, 0x7fff, 0x7fff, 0x7fff, 0x7fff, 0x7fff,
0x7fff, 0x7fff, 0x7fff, 0x7fff, 0x7fff, 0x7fff, 0x7fff, 0x7fff,
0x7fff, 0x7fff, 0x7fff, 0x7fff, 0x7fff, 0x7fff, 0x7fff, 0x7fff,
0x8000, 0x7991, 0x72d7, 0x6bd5, 0x6492, 0x5d10, 0x5555, 0x4d65,
0x4546, 0x3cfb, 0x348c, 0x2bfc, 0x2351, 0x1a90, 0x11bf, 0x08e2,
0x0000, 0xf71e, 0xee41, 0xe570, 0xdcaf, 0xd404, 0xcb74, 0xc305,
0xbaba, 0xb29b, 0xaaab, 0xa2f0, 0x9b6e, 0x942b, 0x8d29, 0x866f,
0x8000, 0x8000, 0x8000, 0x8000, 0x8000, 0x8000, 0x8000, 0x8000,
0x8000, 0x8000, 0x8000, 0x8000, 0x8000, 0x8000, 0x8000, 0x8000,
0x8000, 0x8000, 0x8000, 0x8000, 0x8000, 0x8000, 0x8000, 0x8000,
0x8000, 0x8000, 0x8000, 0x8000, 0x8000, 0x8000, 0x8000, 0x8000,
0x8000, 0x866f, 0x8d29, 0x942b, 0x9b6e, 0xa2f0, 0xaaab, 0xb29b,
0xbaba, 0xc305, 0xcb74, 0xd404, 0xdcaf, 0xe570, 0xee41, 0xf71e,
0x0000, 0x08e2, 0x11bf, 0x1a90, 0x2351, 0x2bfc, 0x348c, 0x3cfb,
0x4546, 0x4d65, 0x5555, 0x5d10, 0x6492, 0x6bd5, 0x72d7, 0x7991,
};
int i, ok = 1;
fn(r, sizeof(r)/sizeof(*r), x, y);
for (i = 0; i < sizeof(r)/sizeof(*r); ++i)
{
if (r[i] != expected[i])
{
printf("mismatch: r[%d] = 0x%04x; expected[%d] = 0x%04x\n", i, r[i], i,
expected[i]);
ok = 0;
break;
}
}
if (ok) printf("** TEST PASSED OK **\n");
return ok ? 0 : 1;
}
#ifdef __TARGET_ARCH_7_A
__asm void init_cpu()
{
    // Set up processor state
    MRC p15,0,r4,c1,c0,0
    ORR r4,r4,#0x00400000 // enable unaligned mode (U=1)
    BIC r4,r4,#0x00000002 // disable alignment faults (A=0)
    // MMU not enabled: no page tables
    MCR p15,0,r4,c1,c0,0
#ifdef __BIG_ENDIAN
    SETEND BE
#endif
    MRC p15,0,r4,c1,c0,2 // Enable VFP access in the CAR -
    ORR r4,r4,#0x00f00000 // must be done before any VFP instructions
    MCR p15,0,r4,c1,c0,2
    MOV r4,#0x40000000 // Set EN bit in FPEXC
    MSR FPEXC,r4
    IMPORT __main
    B __main
}
#endif
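The expected[] table in this example follows from the saturating behavior of the ETSI add() operation declared in dspfns.h. The 16-bit sum is clamped to the range -32768 to 32767 instead of wrapping. The sketch below is a hypothetical portable model of that behavior that you can use to check such tables on a host. sat_add16 is not a dspfns.h name, and this is not the dspfns.h implementation.

```c
/* Model of a saturating 16-bit add, assuming 16-bit short and 32-bit int. */
static short sat_add16(short a, short b)
{
    int sum = a + b;        /* operands promote to int, so this cannot overflow */

    if (sum > 32767)
        sum = 32767;        /* clamp positive overflow to 0x7fff */
    else if (sum < -32768)
        sum = -32768;       /* clamp negative overflow to 0x8000 */
    return (short)sum;
}
```

For instance, x[1] + y[1] is 0x0647 + 0x7fd8, which saturates to 0x7fff. Similarly, x[33] + y[33] is 0x7fd8 + 0xf9b9 (-1607), which gives 0x7991. Both results match the corresponding expected[] entries.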

Related information
--entry=location linker option.

3.27 What can limit or prevent automatic vectorization

The following table summarizes what can limit or prevent automatic vectorization of loops.

Table 3-5 Factors that limit or prevent automatic vectorization

Inhibiting factor | Extent to which it applies

Not having a valid NEON compiler license. | You might require a valid NEON compiler license to generate NEON instructions, depending on your compiler version. RVCT 3.1 or later, and ARM Compiler 4.1, require a valid NEON compiler license. ARM Compiler 5.01 and later do not require a separate NEON compiler license.

Source code without loops. | Automatic vectorization involves loop analysis. Without loops, automatic vectorization cannot apply.

Target processor. | The target processor (--cpu) must have NEON capability if NEON instructions are to be generated. For example, Cortex-A7, Cortex-A8, Cortex-A9, Cortex-A12, or Cortex-A15.

Floating-point code. | Vectorization of floating-point code does not always occur automatically. For example, loops that require re-association only vectorize when compiled with --fpmode=fast.

--no_vectorize by default. | By default, generation of NEON vector instructions directly from C or C++ code is disabled, and must be enabled with --vectorize.

-Otime not specified. | -Otime must be specified to reduce execution time and enable loops to vectorize.

-Onum not set high enough. | The optimization level you set must be -O2 or -O3. Loops do not vectorize at -O0 or -O1.

Risk of incorrect results. | If there is a risk of an incorrect result, vectorization is not applied where that risk occurs. You might have to manually tune your code to make it more suitable for automatic vectorization.

Earlier manual optimization attempts. | Automatic vectorization can be impeded by earlier manual optimization attempts, for example, manual loop unrolling in the source code, or complex array accesses.

No vector access pattern. | If variables in a loop lack a vector access pattern, the compiler cannot automatically vectorize the loop.

Data dependencies between different iterations of a loop. | Where there is a possibility of the use and storage of arrays overlapping on different iterations of a loop, there is a data dependency problem. A loop cannot be safely vectorized if the vector order of operations can change the results, so the compiler leaves the loop in its original form or only partially vectorizes the loop.

Memory hierarchy. | Performing relatively few arithmetic operations on large data sets retrieved from main memory is limited by the memory bandwidth of the system. Most processors are relatively unbalanced between memory bandwidth and processor capacity. This can adversely affect the automatic vectorization process.

Iteration count not fixed at start of loop. | For automatic vectorization, it is generally best to write simple loops with iteration counts that are fixed at the start of the loop. If a loop does not have a fixed iteration count, automatic addressing is not possible.

Conditional loop exits. | It is best to write loops that do not contain conditional exits from the loop.

Carry-around scalar variables. | Carry-around scalar variables are a problem for automatic vectorization because the value computed in one pass of the loop is carried forward into the next pass.

__promise(expr) not used. | Failure to use __promise(expr) where it could make a difference to automatic vectorization can limit automatic vectorization.

Pointer aliasing. | Possible pointer aliasing can prevent the use of automatically vectorized code, or force the compiler to add a runtime aliasing test and keep a scalar copy of the loop.

Indirect addressing. | Indirect addressing is not vectorizable because the NEON unit can only deal with vectors stored consecutively in memory.

Separating access to different parts of a structure into separate loops. | Each part of a structure must be accessed within the same loop for automatic vectorization to occur.

Inconsistent length of members within a structure. | If members of a structure are not all the same length, the compiler does not attempt to use vector loads.

Calls to non-inline functions. | Calls to non-inline functions from within a loop inhibit vectorization. If such functions are to be considered for vectorization, they must be marked with the __inline or __forceinline keywords.

if and switch statements. | Extensive use of if and switch statements can affect the efficiency of automatic vectorization.

You can use --diag_warning=optimizations to obtain compiler diagnostics on what can and cannot
be vectorized.
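Several rows of Table 3-5 depend on whether each iteration can be computed independently. The sketch below contrasts two loops, and the function names are illustrative only. In scale(), out[i] depends only on in[i], which is the pattern that the vectorizer looks for. In prefix_sum(), the carry-around scalar acc links every iteration to the one before it, so computing four elements at once would require results that are not yet available. Whether a particular compiler version vectorizes either loop is best checked with --diag_warning=optimizations rather than assumed.

```c
/* Independent iterations: out[i] needs only in[i]. */
void scale(int *__restrict out, const int *__restrict in, unsigned int n, int k)
{
    unsigned int i;
    for (i = 0; i < n; i++)
        out[i] = in[i] * k;
}

/* Carry-around scalar: acc from iteration i feeds iteration i + 1. */
void prefix_sum(int *__restrict out, const int *__restrict in, unsigned int n)
{
    unsigned int i;
    int acc = 0;
    for (i = 0; i < n; i++) {
        acc += in[i];
        out[i] = acc;
    }
}
```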

Related concepts
3.9 Factors affecting NEON vectorization performance on page 3-78.
3.12 Data dependency conflicts when vectorizing code on page 3-81.
3.17 Nonvectorization on conditional loop exits on page 3-86.
3.13 Carry-around scalar variables and vectorization on page 3-82.
3.16 Nonvectorization on loops containing pointers and indirect addressing on page 3-85.
3.18 Vectorizable loop iteration counts on page 3-87.
3.19 Indicating loop iteration counts to the compiler with __promise(expr) on page 3-89.
3.20 Grouping structure accesses for vectorization on page 3-91.
3.21 Vectorization and struct member lengths on page 3-92.
3.22 Nonvectorization of function calls to non-inline functions from within loops on page 3-93.
3.23 Conditional statements and efficient vectorization on page 3-94.
3.24 Vectorization diagnostics to tune code for improved performance on page 3-95.

Related references
8.192 --vectorize, --no_vectorize on page 8-534.
3.7 Data references within a vectorizable loop on page 3-76.
3.11 Recommended loop structure for vectorization on page 3-80.
9.13 restrict on page 9-563.
8.42 --cpu=list on page 8-367.
8.43 --cpu=name on page 8-368.
8.63 --diag_warning=tag[,tag,...] on page 8-391.
10.6 __forceinline on page 10-605.
8.87 --fpmode=model on page 8-415.
10.8 __inline on page 10-608.
8.139 -Onum on page 8-473.
8.144 -Otime on page 8-480.
8.164 --restrict, --no_restrict on page 8-500.
10.131 __promise intrinsic on page 10-742.

Chapter 4
Compiler Features

Provides an overview of ARM-specific features of the compiler.

It contains the following sections:
• 4.1 Compiler intrinsics on page 4-105.
• 4.2 Performance benefits of compiler intrinsics on page 4-106.
• 4.3 ARM assembler instruction intrinsics on page 4-107.
• 4.4 Generic intrinsics on page 4-108.
• 4.5 Compiler intrinsics for controlling IRQ and FIQ interrupts on page 4-109.
• 4.6 Compiler intrinsics for inserting optimization barriers on page 4-110.
• 4.7 Compiler intrinsics for inserting native instructions on page 4-112.
• 4.8 Compiler intrinsics for Digital Signal Processing (DSP) on page 4-113.
• 4.9 Compiler support for European Telecommunications Standards Institute (ETSI) basic operations
on page 4-114.
• 4.10 Overflow and carry status flags for C and C++ code on page 4-116.
• 4.11 Texas Instruments (TI) C55x intrinsics for optimizing C code on page 4-117.
• 4.12 NEON intrinsics provided by the compiler on page 4-118.
• 4.13 Using NEON intrinsics on page 4-119.
• 4.14 Compiler support for accessing registers using named register variables on page 4-121.
• 4.15 Pragmas recognized by the compiler on page 4-124.
• 4.16 Compiler and processor support for bit-banding on page 4-126.
• 4.17 Compiler type attribute, __attribute__((bitband)) on page 4-127.
• 4.18 --bitband compiler command-line option on page 4-128.
• 4.19 How the compiler handles bit-band objects placed outside bit-band regions on page 4-129.
• 4.20 Compiler support for thread-local storage on page 4-130.
• 4.21 Compiler support for literal pools on page 4-131.
• 4.22 Compiler eight-byte alignment features on page 4-132.
• 4.23 Using compiler and linker support for symbol versions on page 4-133.
• 4.24 Precompiled Header (PCH) files on page 4-134.
• 4.25 Automatic Precompiled Header (PCH) file processing on page 4-136.
• 4.26 Precompiled Header (PCH) file processing and the header stop point on page 4-137.
• 4.27 Precompiled Header (PCH) file creation requirements on page 4-139.
• 4.28 Compilation with multiple Precompiled Header (PCH) files on page 4-141.
• 4.29 Obsolete Precompiled Header (PCH) files on page 4-142.
• 4.30 Manually specifying the filename and location of a Precompiled Header (PCH) file
on page 4-143.
• 4.31 Selectively applying Precompiled Header (PCH) file processing on page 4-144.
• 4.32 Suppressing Precompiled Header (PCH) file processing on page 4-145.
• 4.33 Message output during Precompiled Header (PCH) processing on page 4-146.
• 4.34 Performance issues with Precompiled Header (PCH) files on page 4-147.
• 4.35 Default compiler options that are affected by optimization level on page 4-148.

4.1 Compiler intrinsics

Compiler intrinsics are functions provided by the compiler. They enable you to easily incorporate
domain-specific operations in C and C++ source code without resorting to complex implementations in
assembly language.
The C and C++ languages are suited to a wide variety of tasks, but they do not provide built-in support
for specific areas of application, for example, Digital Signal Processing (DSP).
Within a given application domain, there is usually a range of domain-specific operations that have to be
performed frequently. However, these operations often cannot be implemented efficiently in C or C++. A
typical example is the saturated add of two 32-bit signed two’s complement integers, commonly used in
DSP programming. The following example shows a C implementation of a saturated add operation:
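#include <limits.h>

int L_add(const int a, const int b)
{
    int c;
    /* Add as unsigned to avoid signed-overflow undefined behavior. Converting the
       wrapped result back to int is implementation-defined; ARM compilers wrap
       (two's complement). */
    c = (int)((unsigned int)a + (unsigned int)b);
    if (((a ^ b) & INT_MIN) == 0)   /* a and b have the same sign */
    {
        if ((c ^ a) & INT_MIN)      /* result sign differs from a: overflow */
        {
            c = (a < 0) ? INT_MIN : INT_MAX;
        }
    }
    return c;
}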

Using compiler intrinsics, you can achieve more complete coverage of target architecture instructions
than you would from the instruction selection of the compiler.
An intrinsic function has the appearance of a function call in C or C++, but is replaced during
compilation by a specific sequence of low-level instructions. For example, when implemented using an
intrinsic, the saturated add function from the previous example has the form:
#include <dspfns.h> /* Include ETSI intrinsics */
...
int a, b, result;
...
result = L_add(a, b); /* Saturated add of a and b */
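If the same source must also build with compilers that do not provide dspfns.h, one option is to fall back to a plain C implementation when armcc is not in use. The sketch below assumes that testing the predefined __ARMCC_VERSION macro is an adequate way to detect armcc, and that int is 32 bits. The fallback computes the sum in long long and then clamps it. This is an alternative to the sign-test approach shown earlier, not the dspfns.h implementation.

```c
#include <limits.h>

#if defined(__ARMCC_VERSION)
#include <dspfns.h>   /* ETSI L_add: saturated 32-bit add */
#else
/* Portable fallback: compute the exact sum in a wider type, then clamp. */
static int L_add(int a, int b)
{
    long long sum = (long long)a + (long long)b;

    if (sum > INT_MAX)
        return INT_MAX;
    if (sum < INT_MIN)
        return INT_MIN;
    return (int)sum;
}
#endif
```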

Related concepts
4.5 Compiler intrinsics for controlling IRQ and FIQ interrupts on page 4-109.
4.12 NEON intrinsics provided by the compiler on page 4-118.
4.9 Compiler support for European Telecommunications Standards Institute (ETSI) basic operations
on page 4-114.
4.11 Texas Instruments (TI) C55x intrinsics for optimizing C code on page 4-117.

Related references
4.2 Performance benefits of compiler intrinsics on page 4-106.
4.3 ARM assembler instruction intrinsics on page 4-107.
10.154 ETSI basic operations on page 10-768.
10.105 Instruction intrinsics on page 10-713.
10.155 C55x intrinsics on page 10-770.
Chapter 18 Using NEON Support on page 18-923.

4.2 Performance benefits of compiler intrinsics

The use of compiler intrinsics offers a number of performance benefits:
• The low-level instructions substituted for an intrinsic might be more efficient than corresponding
implementations in C or C++, resulting in both reduced instruction and cycle counts. To implement
the intrinsic, the compiler automatically generates the best sequence of instructions for the specified
target architecture. For example, the L_add intrinsic maps directly to the ARM assembly language
instruction QADD:
QADD r0, r0, r1 /* Assuming r0 = a, r1 = b on entry */
• More information is given to the compiler than the underlying C and C++ language is able to convey.
This enables the compiler to perform optimizations and to generate instruction sequences that it could
not otherwise have performed.
These performance benefits can be significant for real-time processing applications. However, care is
required because the use of intrinsics can decrease code portability.

Related concepts
4.1 Compiler intrinsics on page 4-105.
4.5 Compiler intrinsics for controlling IRQ and FIQ interrupts on page 4-109.
4.12 NEON intrinsics provided by the compiler on page 4-118.
4.9 Compiler support for European Telecommunications Standards Institute (ETSI) basic operations
on page 4-114.
4.11 Texas Instruments (TI) C55x intrinsics for optimizing C code on page 4-117.

4.3 ARM assembler instruction intrinsics

The compiler provides a range of instruction intrinsics for generating ARM assembly language
instructions from within your C or C++ code.
Collectively, these intrinsics enable you to emulate inline assembly code using a combination of C code
and instruction intrinsics.
ARM provides the following types of compiler intrinsics:
• Generic intrinsics.
• Compiler intrinsics for controlling IRQ and FIQ interrupts.
• Compiler intrinsics for inserting optimization barriers.
• Compiler intrinsics for inserting native instructions.
• Compiler intrinsics for Digital Signal Processing (DSP).

Related references
4.4 Generic intrinsics on page 4-108.
4.7 Compiler intrinsics for inserting native instructions on page 4-112.

4.4 Generic intrinsics

The compiler provides a number of generic intrinsics, that is, intrinsics not targeted towards any
particular area of application.
The following generic intrinsics are ARM language extensions to the ISO C and C++ standards:
• __breakpoint intrinsic.
• __current_pc intrinsic.
• __current_sp intrinsic.
• __nop intrinsic.
• __return_address intrinsic.
• __semihost intrinsic.
Implementations of these intrinsics are available across all architectures.

Related references
10.106 __breakpoint intrinsic on page 10-714.
10.110 __current_pc intrinsic on page 10-718.
10.111 __current_sp intrinsic on page 10-719.
10.127 __nop intrinsic on page 10-737.
10.137 __return_address intrinsic on page 10-748.
10.140 __semihost intrinsic on page 10-751.
10.160 GNU built-in functions on page 10-778.

4.5 Compiler intrinsics for controlling IRQ and FIQ interrupts

The intrinsics __disable_irq, __enable_irq, __disable_fiq and __enable_fiq control IRQ and FIQ
interrupts.
You cannot use these intrinsics to change any other CPSR bits, including the mode, state, and imprecise
data abort setting. This means that the intrinsics can be used only if the processor is already in a
privileged mode, because the control bits of the CPSR and SPSR cannot be changed in User mode.
These intrinsics are available for all processor architectures in both ARM and Thumb state, as follows:
• If you are compiling for processors that support ARMv6 (or later), a CPS instruction is generated
inline for these functions, for example:
CPSID i
• If you are compiling for processors that support ARMv4 or ARMv5 in ARM state, the compiler
inlines a sequence of MRS and MSR instructions, for example:
MRS r0, CPSR
ORR r0, r0, #0x80
MSR CPSR_c, r0
• If you are compiling for processors that support ARMv4 or ARMv5 in Thumb state, or if
--compatible is being used, the compiler calls a helper function, for example:
BL __ARM_disable_irq
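A typical use of these intrinsics is a short critical section around a read-modify-write of data that an interrupt handler also updates. The sketch below wraps the intrinsics in hypothetical irq_disable() and irq_enable() macros. These macros become no-ops when __ARMCC_VERSION is not defined, so you can exercise the logic on a host. The sketch assumes privileged execution, as described above. It also assumes that IRQs are enabled on entry, because it re-enables them unconditionally. Code that can be called with IRQs already disabled must not re-enable them on exit.

```c
#if defined(__ARMCC_VERSION)
/* Any return value from __disable_irq() is deliberately ignored here. */
#define irq_disable() ((void)__disable_irq())
#define irq_enable()  __enable_irq()
#else
#define irq_disable() ((void)0)   /* host build: no interrupts to mask */
#define irq_enable()  ((void)0)
#endif

/* Shared with an interrupt handler (assumed for this sketch). */
static volatile unsigned int event_count;

void record_event(void)
{
    irq_disable();                   /* for example, CPSID i on ARMv6 and later */
    event_count = event_count + 1;   /* load, add, store cannot be split by an IRQ */
    irq_enable();                    /* IRQs back on */
}
```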

Related concepts
4.1 Compiler intrinsics on page 4-105.
4.12 NEON intrinsics provided by the compiler on page 4-118.
4.9 Compiler support for European Telecommunications Standards Institute (ETSI) basic operations
on page 4-114.
4.11 Texas Instruments (TI) C55x intrinsics for optimizing C code on page 4-117.

Related references
4.2 Performance benefits of compiler intrinsics on page 4-106.
4.3 ARM assembler instruction intrinsics on page 4-107.
10.112 __disable_fiq intrinsic on page 10-720.
10.113 __disable_irq intrinsic on page 10-721.
10.116 __enable_fiq intrinsic on page 10-725.
10.117 __enable_irq intrinsic on page 10-726.

4.6 Compiler intrinsics for inserting optimization barriers

The optimization barrier intrinsics __schedule_barrier, __force_stores, __force_loads, and
__memory_changed let you override compiler optimizations by disabling instruction re-ordering and
forcing memory updates.
The compiler can perform a range of optimizations, including re-ordering instructions and merging some
operations. In some cases, such as system level programming where memory is being accessed
concurrently by multiple processes, it might be necessary to disable instruction re-ordering and force
memory to be updated.
The optimization barrier intrinsics __schedule_barrier, __force_stores, __force_loads and
__memory_changed do not generate code, but they can result in slightly increased code size and extra
memory accesses.
Note
On some systems, the optimization barrier intrinsics might not be sufficient to ensure memory
consistency. For example, the __memory_changed intrinsic forces values held in registers to be written
out to memory. However, if the destination for the data is held in a region that can be buffered it might
wait in a write buffer. In this case, you might also have to write to CP15 or use a memory barrier
instruction to drain the write buffer. See the Technical Reference Manual for your ARM processor for
more information.

Memory barrier intrinsics

Memory barrier intrinsics insert the following instructions into the instruction stream:

Memory barrier intrinsic Instruction

__dsb() DSB

__dmb() DMB

__isb() ISB

Each memory barrier intrinsic also implicitly adds an optimization barrier intrinsic, and applies an
operand to the inserted instruction. The argument passed to __dsb() or __dmb() defines which
optimization barrier is added, and which operand is applied.

Intrinsic argument Operand Optimization barrier intrinsic

1 OSH __force_loads()

2 OSHST __force_stores()

3 OSH __memory_changed()

5 NSH __force_loads()

6 NSHST __force_stores()

7 NSH __memory_changed()

9 ISH __force_loads()

10 ISHST __force_stores()

11 ISH __memory_changed()

13 SY __force_loads()

14 ST __force_stores()

15 SY __memory_changed()

Example
__dsb(5) inserts DSB NSH into the instruction stream, and implicitly adds the __force_loads()
optimization barrier intrinsic.
Note
• For __isb(), the only supported operand is SY.
• When compiling for an ARMv7-M target, all intrinsic arguments passed to __isb(), __dmb() and
__dsb() emit the SY operand.
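The argument table has a regular structure. Bits [3:2] of the argument select the shareability domain (OSH, NSH, ISH, or full system). Bits [1:0] select which optimization barrier is implied and whether the store-only (ST) form of the operand is used. The hypothetical decode_barrier_arg() helper below reproduces the table from that observation. It is only a reading aid derived from the rows listed above, and it is not part of the compiler. The helper rejects arguments that the table does not list (0, 4, 8, and 12).

```c
#include <string.h>   /* strcmp(), for comparing the decoded strings */

struct barrier_info {
    const char *operand;   /* operand applied to the DMB or DSB instruction */
    const char *barrier;   /* optimization barrier intrinsic implicitly added */
};

/* Returns 1 and fills *out for arguments listed in the table, otherwise 0. */
static int decode_barrier_arg(unsigned int arg, struct barrier_info *out)
{
    static const char *const full_op[4]  = { "OSH", "NSH", "ISH", "SY" };
    static const char *const store_op[4] = { "OSHST", "NSHST", "ISHST", "ST" };
    unsigned int domain = (arg >> 2) & 3u;  /* bits [3:2] */
    unsigned int kind = arg & 3u;           /* bits [1:0] */

    if (arg > 15u || kind == 0u)
        return 0;
    if (kind == 1u) {
        out->operand = full_op[domain];
        out->barrier = "__force_loads()";
    } else if (kind == 2u) {
        out->operand = store_op[domain];
        out->barrier = "__force_stores()";
    } else {
        out->operand = full_op[domain];
        out->barrier = "__memory_changed()";
    }
    return 1;
}
```

For example, decode_barrier_arg(5, &info) yields NSH and __force_loads(). This matches the __dsb(5) example above.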

Related references
10.121 __force_stores intrinsic on page 10-730.
10.126 __memory_changed intrinsic on page 10-736.
10.139 __schedule_barrier intrinsic on page 10-750.
10.120 __force_loads intrinsic on page 10-729.
10.122 __isb intrinsic on page 10-731.
10.114 __dmb intrinsic on page 10-723.
10.115 __dsb intrinsic on page 10-724.

Related information
DMB.
DSB.
ISB.

4.7 Compiler intrinsics for inserting native instructions

The compiler provides a number of intrinsics that insert ARM processor instructions into the instruction
stream generated by the compiler.
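One intrinsic of this kind is __rev, which reverses the byte order of a 32-bit word. This is useful when reading big-endian data on a little-endian system. The sketch below uses __rev when __ARMCC_VERSION is defined and a portable shift-and-mask fallback otherwise. The names BSWAP32 and bswap32_portable are hypothetical.

```c
#include <stdint.h>

/* Portable byte reversal: 0xAABBCCDD becomes 0xDDCCBBAA. */
static uint32_t bswap32_portable(uint32_t x)
{
    return (x >> 24)
         | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u)
         | (x << 24);
}

#if defined(__ARMCC_VERSION)
#define BSWAP32(x) ((uint32_t)__rev(x))      /* REV instruction where available */
#else
#define BSWAP32(x) bswap32_portable(x)
#endif
```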

Related references
10.107 __cdp intrinsic on page 10-715.
10.108 __clrex intrinsic on page 10-716.
10.123 __ldrex intrinsic on page 10-732.
10.125 __ldrt intrinsic on page 10-735.
10.128 __pld intrinsic on page 10-739.
10.130 __pli intrinsic on page 10-741.
10.135 __rbit intrinsic on page 10-746.
10.136 __rev intrinsic on page 10-747.
10.138 __ror intrinsic on page 10-749.
10.141 __sev intrinsic on page 10-753.
10.145 __strex intrinsic on page 10-757.
10.147 __strt intrinsic on page 10-761.
10.148 __swp intrinsic on page 10-762.
10.150 __wfe intrinsic on page 10-764.
10.151 __wfi intrinsic on page 10-765.
10.152 __yield intrinsic on page 10-766.

4.8 Compiler intrinsics for Digital Signal Processing (DSP)

The compiler provides intrinsics that assist in the implementation of DSP algorithms.
These intrinsics introduce the appropriate target instructions for:
• ARM, on architectures from ARMv5TE onwards.
• Thumb, on architectures with Thumb-2 technology.
Not every instruction has its own intrinsic. The compiler can combine several intrinsics, or intrinsics
and C operators, to generate more powerful instructions. For example, the ARMv5TE QDADD
instruction is generated by a combination of __qadd and __qdbl.
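The following sketch shows the QDADD combination described above. Under armcc, __qadd(a, __qdbl(b)) computes a + 2b with saturation at each step, and the compiler can emit it as a single QDADD. The function name add_doubled_sat() is hypothetical. The non-armcc branch is an assumed portable model of the same saturating arithmetic, so that the behavior can be checked on a host with a 32-bit int.

```c
#include <stdint.h>
#include <limits.h>

#if defined(__CC_ARM)
/* armcc: this combination can be emitted as a single QDADD */
int add_doubled_sat(int a, int b)
{
    return __qadd(a, __qdbl(b));
}
#else
/* Assumption: portable model of the saturating arithmetic for host testing */
static int sat32(int64_t v)
{
    if (v > INT_MAX) return INT_MAX;
    if (v < INT_MIN) return INT_MIN;
    return (int)v;
}

int add_doubled_sat(int a, int b)
{
    int doubled = sat32(2 * (int64_t)b);   /* models __qdbl(b) */
    return sat32((int64_t)a + doubled);    /* models __qadd(a, doubled) */
}
#endif
```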

Related references
10.109 __clz intrinsic on page 10-717.
10.118 __fabs intrinsic on page 10-727.
10.119 __fabsf intrinsic on page 10-728.
10.132 __qadd intrinsic on page 10-743.
10.133 __qdbl intrinsic on page 10-744.
10.134 __qsub intrinsic on page 10-745.
10.142 __sqrt intrinsic on page 10-754.
10.143 __sqrtf intrinsic on page 10-755.
10.144 __ssat intrinsic on page 10-756.
10.149 __usat intrinsic on page 10-763.
10.153 ARMv6 SIMD intrinsics on page 10-767.

4.9 Compiler support for European Telecommunications Standards Institute (ETSI) basic operations

ARM Compiler 4.1 and later provide support for the ETSI basic operations to help implement coding of
speech.
ETSI has produced several recommendations for the coding of speech, for example, the G.723.1 and
G.729 recommendations. These recommendations include source code and test sequences for reference
implementations of the codecs.
Model implementations of speech codecs supplied by ETSI are based on a collection of C functions
known as the ETSI basic operations. The ETSI basic operations include 16-bit, 32-bit and 40-bit
operations for saturated arithmetic, 16-bit and 32-bit logical operations, and 16-bit and 32-bit operations
for data type conversion.

Note
Version 2.0 of the ETSI collection of basic operations, as described in the ITU-T Software Tool Library
2005 User's manual, introduces new 16-bit, 32-bit and 40-bit operations. These operations are not
supported in the ARM compilation tools.

The ETSI basic operations serve as a set of primitives for developers publishing codec algorithms, rather
than as a library for use by developers implementing codecs in C or C++.
ARM Compiler 4.1 and later provide support for the ETSI basic operations through the header file
dspfns.h. The dspfns.h header file contains definitions of the ETSI basic operations as a combination
of C code and intrinsics.
See dspfns.h for a complete list of the ETSI basic operations supported in ARM Compiler 4.1 and later.
ARM Compiler 4.1 and later support the original ETSI family of basic operations as described in the
ETSI G.729 recommendation Coding of speech at 8 kbit/s using conjugate-structure algebraic-code-
excited linear prediction (CS-ACELP), including:
• 16-bit and 32-bit saturated arithmetic operations, such as add and sub. For example, add(v1, v2)
adds two 16-bit numbers v1 and v2 together, with overflow control and saturation, returning a 16-bit
result.
• 16-bit and 32-bit multiplication operations, such as mult and L_mult. For example, mult(v1, v2)
multiplies two 16-bit numbers v1 and v2 together, returning a scaled 16-bit result.
• 16-bit arithmetic shift operations, such as shl and shr. For example, the saturating left shift operation
shl(v1, v2) arithmetically shifts the 16-bit input v1 left by v2 positions. A negative shift count shifts
v1 right by -v2 positions.
• 16-bit data conversion operations, such as extract_l, extract_h, and round. For example,
round(L_v1) rounds the lower 16 bits of the 32-bit input L_v1 into the most significant 16 bits with
saturation.
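To make the saturation and scaling rules above concrete, the following is a minimal host-side model of add() and mult(), written from the descriptions in this section. It is not the dspfns.h implementation. The names etsi_add_model() and etsi_mult_model() are hypothetical. The model assumes that right-shifting a negative int32_t is an arithmetic shift, as it is with common compilers.

```c
#include <stdint.h>

/* Clamp a 32-bit intermediate result to the 16-bit range */
static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Hypothetical model of add(v1, v2): 16-bit addition with saturation */
int16_t etsi_add_model(int16_t v1, int16_t v2)
{
    return sat16((int32_t)v1 + v2);
}

/* Hypothetical model of mult(v1, v2): the Q15 product scaled back to
   16 bits, with saturation (assumes arithmetic >> on negative values) */
int16_t etsi_mult_model(int16_t v1, int16_t v2)
{
    return sat16(((int32_t)v1 * v2) >> 15);
}
```

In Q15 terms, etsi_mult_model(16384, 16384) represents 0.5 * 0.5 and yields 8192 (0.25). The only product that saturates is -32768 * -32768, because +1.0 is not representable in Q15.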
Note
Both the dspfns.h header file and the ISO C99 header file math.h define (different versions of) the
function round(). Take care to avoid this potential conflict.

Related concepts
4.1 Compiler intrinsics on page 4-105.
4.5 Compiler intrinsics for controlling IRQ and FIQ interrupts on page 4-109.
4.12 NEON intrinsics provided by the compiler on page 4-118.
4.11 Texas Instruments (TI) C55x intrinsics for optimizing C code on page 4-117.
4.10 Overflow and carry status flags for C and C++ code on page 4-116.

Related references
4.2 Performance benefits of compiler intrinsics on page 4-106.
4.3 ARM assembler instruction intrinsics on page 4-107.
10.154 ETSI basic operations on page 10-768.

4.10 Overflow and carry status flags for C and C++ code
The implementation of the European Telecommunications Standards Institute (ETSI) basic operations in
dspfns.h exposes the status flags Overflow and Carry.

These flags are available as global variables for use in your own C or C++ programs. For example:
#include <dspfns.h> /* include ETSI basic operations */
#include <stdio.h>

#define BUFLEN 255
int a[BUFLEN], b[BUFLEN], c[BUFLEN];

void add_buffers(void) /* hypothetical wrapper function */
{
    int i;
    Overflow = 0; /* clear overflow flag */
    for (i = 0; i < BUFLEN; ++i) {
        c[i] = L_add(a[i], b[i]); /* saturated add of a[i] and b[i] */
    }
    if (Overflow)
    {
        fprintf(stderr, "Overflow on saturated addition\n");
    }
}

Generally, saturating functions have a sticky effect on overflow. That is, the overflow flag remains set
until it is explicitly cleared.
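The sticky behavior can be sketched in portable C as follows. The flag sticky_overflow and the function sat_add32_sticky() are hypothetical stand-ins, modeled on the Overflow variable and L_add(). The function only ever sets the flag, so the flag stays set until the caller clears it.

```c
#include <stdint.h>

/* Hypothetical sticky flag, modeled on the dspfns.h Overflow variable */
int sticky_overflow = 0;

/* 32-bit saturating add that sets, but never clears, the sticky flag */
int32_t sat_add32_sticky(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + b;   /* exact sum, cannot overflow */
    if (sum > INT32_MAX) {
        sticky_overflow = 1;
        return INT32_MAX;
    }
    if (sum < INT32_MIN) {
        sticky_overflow = 1;
        return INT32_MIN;
    }
    return (int32_t)sum;
}
```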

Related concepts
4.9 Compiler support for European Telecommunications Standards Institute (ETSI) basic operations
on page 4-114.

4.11 Texas Instruments (TI) C55x intrinsics for optimizing C code

The ARM compilation tools support the emulation of selected TI C55x intrinsics.
The TI C55x compiler recognizes a number of intrinsics for the optimization of C code. The ARM
compilation tools support the emulation of selected TI C55x intrinsics through the header file, c55x.h.
c55x.h gives a complete list of the TI C55x intrinsics that are emulated by the ARM compilation tools.

TI C55x intrinsics that are emulated in c55x.h include:

• Intrinsics for addition, subtraction, negation and absolute value, such as _sadd and _ssub. For
example, _sadd(v1, v2) returns the 16-bit saturated sum of v1 and v2.
• Intrinsics for multiplication and shifting, such as _smpy and _sshl. For example, _smpy(v1, v2)
returns the saturated fractional-mode product of v1 and v2.
• Intrinsics for rounding, saturation, bitcount and extremum, such as _round and _count. For example,
_round(v1) returns the value v1 rounded by adding 2^15 (0x8000) using unsaturated arithmetic, and
then clearing the lower 16 bits.
• Associative variants of intrinsics for addition and multiply-and-accumulate. This includes all TI C55x
intrinsics prefixed with _a_, for example, _a_sadd and _a_smac.
• Rounding variants of intrinsics for multiplication and shifting, for example, _smacr and _smasr.
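The _round() description above can be modeled on a host as follows. c55x_round_model() is a hypothetical name and is not the c55x.h implementation. It uses unsigned arithmetic so that the unsaturated addition wraps without undefined behavior, and it assumes two's-complement conversion back to int32_t.

```c
#include <stdint.h>

/* Hypothetical model of _round(v1): add 2^15 using unsaturated
   (wrapping) arithmetic, then clear the lower 16 bits */
int32_t c55x_round_model(int32_t v1)
{
    /* unsigned arithmetic wraps on overflow instead of invoking undefined behavior */
    uint32_t r = ((uint32_t)v1 + 0x8000u) & 0xFFFF0000u;
    return (int32_t)r; /* assumes two's-complement conversion back to signed */
}
```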
The following TI C55x intrinsics are not supported in c55x.h:
• All long long variants of intrinsics. This includes all TI C55x intrinsics prefixed with _ll, for
example, _llsadd and _llshl. long long variants of intrinsics are not supported in the ARM
compilation tools because they operate on 40-bit data.
• All arithmetic intrinsics with side effects. For example, the TI C55x intrinsics _firs and _lms are not
defined in c55x.h.
• Intrinsics for ETSI support functions, such as L_add_c and L_sub_c.
Note
An exception is the ETSI support function for saturating division, divs. This intrinsic is supported in
c55x.h.

Related references
4.2 Performance benefits of compiler intrinsics on page 4-106.
4.3 ARM assembler instruction intrinsics on page 4-107.

Related information
Texas Instruments, http://www.ti.com.

4.12 NEON intrinsics provided by the compiler

As an alternative to automatic compiler vectorization, NEON intrinsics provide an intermediate step
between a vectorizing compiler and writing assembly language for SIMD code generation.
This feature makes it easier to write code that takes advantage of the NEON architecture when compared
to writing assembly language directly.
NEON intrinsics are defined in the header file arm_neon.h. The header file defines both the intrinsics
and a set of vector types.

Related concepts
4.1 Compiler intrinsics on page 4-105.
4.5 Compiler intrinsics for controlling IRQ and FIQ interrupts on page 4-109.
4.9 Compiler support for European Telecommunications Standards Institute (ETSI) basic operations
on page 4-114.
4.11 Texas Instruments (TI) C55x intrinsics for optimizing C code on page 4-117.
3.2 The NEON unit on page 3-70.
3.3 Methods of writing code for NEON on page 3-72.

Related tasks
4.13 Using NEON intrinsics on page 4-119.

Related references
4.2 Performance benefits of compiler intrinsics on page 4-106.
4.3 ARM assembler instruction intrinsics on page 4-107.
Chapter 18 Using NEON Support on page 18-923.

4.13 Using NEON intrinsics

Describes how to build an example program that uses NEON intrinsics.

Procedure
1. Create the following example C program source code:
/* neon_example.c - Neon intrinsics example program */
#include <stdint.h>
#include <stdio.h>
#include <assert.h>
#include <arm_neon.h>
/* fill array with increasing integers beginning with 0 */
void fill_array(int16_t *array, int size)
{
int i;
for (i = 0; i < size; i++)
{
array[i] = i;
}
}
/* return the sum of all elements in an array. This works by calculating 4 totals (one
for each lane) and adding those at the end to get the final total */
int sum_array(int16_t *array, int size)
{
/* initialize the accumulator vector to zero */
int16x4_t acc = vdup_n_s16(0);
int32x2_t acc1;
int64x1_t acc2;
/* this implementation assumes the size of the array is a multiple of 4 */
assert((size % 4) == 0);
/* counting backwards gives better code */
for (; size != 0; size -= 4)
{
int16x4_t vec;
/* load 4 values in parallel from the array */
vec = vld1_s16(array);
/* advance the array pointer to the next group of four elements */
array += 4;
/* add the vector to the accumulator vector */
acc = vadd_s16(acc, vec);
}
/* calculate the total */
acc1 = vpaddl_s16(acc);
acc2 = vpaddl_s32(acc1);
/* return the total as an integer */
return (int)vget_lane_s64(acc2, 0);
}
/* main function */
int main(void)
{
int16_t my_array[100];
fill_array(my_array, 100);
printf("Sum was %d\n", sum_array(my_array, 100));
return 0;
}

2. Compile the example source code with the following options:

armcc -c --debug --cpu=Cortex-A8 neon_example.c
3. Link the resulting object file using the following command:
armlink neon_example.o -o neon_example.axf
4. Use a compatible debugger to load and run the resulting image neon_example.axf.
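You can use a plain C reference version of sum_array() to check the value that the example prints on a host without NEON. sum_array_scalar() is a hypothetical helper and is not part of the example. For the 100-element array filled by fill_array(), the expected sum is 0 + 1 + ... + 99 = 4950.

```c
#include <stdint.h>

/* Hypothetical scalar reference for sum_array(), for cross-checking */
int sum_array_scalar(const int16_t *array, int size)
{
    int i;
    int total = 0;
    for (i = 0; i < size; i++)
    {
        total += array[i];   /* accumulate in a full-width int */
    }
    return total;
}
```

This reference differs from the NEON version. The NEON version accumulates in 16-bit lanes (int16x4_t), and vadd_s16() does not saturate. For larger arrays or values, its lane totals can therefore wrap before the final widening additions.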

Related concepts
4.12 NEON intrinsics provided by the compiler on page 4-118.

Related references
8.21 -c on page 8-344.
8.43 --cpu=name on page 8-368.

8.47 --debug, --no_debug on page 8-374.
Chapter 18 Using NEON Support on page 18-923.

4.14 Compiler support for accessing registers using named register variables
You can use named register variables to access registers of an ARM architecture-based processor.
Named register variables are declared by combining the register keyword with the __asm keyword.
The __asm keyword takes one parameter, a character string, that names the register. For example, the
following declaration declares R0 as a named register variable for the register r0:
register int R0 __asm("r0");

Any type of the same size as the register being named can be used in the declaration of a named register
variable. The type can be a structure, but bitfield layout is sensitive to endianness.

Note
Writing to the current stack pointer, "r13" or "sp", can give unpredictable results at either compile-time
or run-time.

You must declare core registers as global rather than local named register variables. Your program might
still compile if you declare them locally, but you risk unexpected runtime behavior if you do. There is no
restriction on the scope of named register variables for other registers.
Note
A global named register variable is global to the source file in which it is declared, not global to the
program. It has no effect on other files, unless you use multifile compilation or you declare it in a header
file.

A typical use of named register variables is to access bits in the Application Program Status Register
(APSR). The following example shows how to use named register variables to set the saturation flag Q in
the APSR.
#ifndef __BIG_ENDIAN // bitfield layout of APSR is sensitive to endianness
typedef union
{
struct
{
int mode:5;
int T:1;
int F:1;
int I:1;
int _dnm:19;
int Q:1;
int V:1;
int C:1;
int Z:1;
int N:1;
} b;
unsigned int word;
} PSR;
#else /* __BIG_ENDIAN */
typedef union
{
struct
{
int N:1;
int Z:1;
int C:1;
int V:1;
int Q:1;
int _dnm:19;
int I:1;
int F:1;
int T:1;
int mode:5;
} b;
unsigned int word;
} PSR;
#endif /* __BIG_ENDIAN */
/* Declare PSR as a register variable for the "apsr" register */
register PSR apsr __asm("apsr");

void set_Q(void)
{
apsr.b.Q = 1;
}

The following example shows how to use a named register variable to clear the Q flag in the APSR.
register unsigned int _apsr __asm("apsr");
void ClearQFlag(void)
{
_apsr = _apsr & ~0x08000000; // clear Q flag
}

Compiling this example using --cpu=7-M results in the following assembly code:
ClearQFlag
MRS r0,APSR ; formerly CPSR
BIC r0,r0,#0x8000000
MSR APSR_nzcvq,r0; formerly CPSR_f
BX lr
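The little-endian PSR layout above places Q at bit 27, which is why ClearQFlag() uses the mask 0x08000000. The following host-side sketch checks this by setting only the Q bitfield in an ordinary variable instead of the APSR. PSR_model and q_flag_mask() are hypothetical. The sketch uses unsigned bitfields so that assigning 1 is well defined. It assumes a little-endian ABI that allocates bitfields from the least significant bit, as AAPCS does.

```c
#include <stdint.h>

/* Host-side copy of the little-endian PSR layout, with unsigned bitfields */
typedef union
{
    struct
    {
        unsigned int mode:5;   /* bits 0-4 */
        unsigned int T:1;      /* bit 5 */
        unsigned int F:1;      /* bit 6 */
        unsigned int I:1;      /* bit 7 */
        unsigned int _dnm:19;  /* bits 8-26 */
        unsigned int Q:1;      /* bit 27 */
        unsigned int V:1;      /* bit 28 */
        unsigned int C:1;      /* bit 29 */
        unsigned int Z:1;      /* bit 30 */
        unsigned int N:1;      /* bit 31 */
    } b;
    uint32_t word;
} PSR_model;

/* Returns the word value produced by setting only the Q bit */
uint32_t q_flag_mask(void)
{
    PSR_model p;
    p.word = 0;
    p.b.Q = 1;
    return p.word;
}
```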

The following example shows how to use named register variables to set up stack pointers.
register unsigned int _control __asm("control");
register unsigned int _msp __asm("msp");
register unsigned int _psp __asm("psp");
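void init(void)
{
_msp = 0x30000000; // set up Main Stack Pointer
_psp = 0x40000000; // set up Process Stack Pointer while still privileged
_control = _control | 3; // switch to unprivileged Thread mode with Process Stack
}

Compiling this example using --cpu=7-M results in the following assembly code:
init
MOV r0,#0x30000000
MSR MSP,r0
MOV r0,#0x40000000
MSR PSP,r0
MRS r0,CONTROL
ORR r0,r0,#3
MSR CONTROL,r0
BX lr

The Process Stack Pointer is set before CONTROL is written. Once the write to CONTROL makes the
processor unprivileged, writes to PSP using MSR are ignored.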

You can also use named register variables to access registers within a coprocessor. The string syntax
within the declaration corresponds to how you intend to use the variable. For example, to declare a
variable that you intend to use with the MCR instruction, look up the instruction syntax for this instruction
and use this syntax when you declare your variable. The following example shows how to use a named
register variable to set bits in a coprocessor register.
register unsigned int PMCR __asm("cp15:0:c9:c12:0");
void __reset_cycle_counter(void)
{
PMCR = 4;
}

Compiling this example using --cpu=7-A results in the following assembly code:
__reset_cycle_counter PROC
MOV r0,#4
MCR p15,#0x0,r0,c9,c12,#0 ; move from r0 to c9
BX lr
ENDP

In the above example, PMCR is declared as a register variable of type unsigned int. It is associated
with the cp15 coprocessor, with CRn = c9, CRm = c12, opcode1 = 0, and opcode2 = 0 in an MCR or MRC
instruction. The string "cp15:0:c9:c12:0" lists the coprocessor, opcode1, CRn, CRm, and opcode2 in the
same order as the MCR operands in the disassembly. The physical coprocessor register is specified by a
combination of the two register numbers, CRn and CRm, and the two opcode numbers. This combination
maps to a single physical register.

The same principle applies if you want to manipulate individual bits in a register. You write normal
variable arithmetic in C, and the compiler performs a read-modify-write of the coprocessor register. The
following example shows how to manipulate bits in a coprocessor register using a named register
variable:
register unsigned int SCTLR __asm("cp15:0:c1:c0:0");
/* Set bit 11 of the system control register */
void enable_branch_prediction(void)
{
SCTLR |= (1 << 11);
}

Compiling this example using --cpu=7-A results in the following assembly code:
enable_branch_prediction PROC
MRC p15,#0x0,r0,c1,c0,#0
ORR r0,r0,#0x800
MCR p15,#0x0,r0,c1,c0,#0
BX lr
ENDP

Related references
10.5 __asm on page 10-604.
10.159 Named register variables on page 10-774.

Related information
Application Program Status Register.
MRC and MRC2.

4.15 Pragmas recognized by the compiler

The compiler recognizes a number of pragmas, used to instruct the compiler to use particular features.
The compiler recognizes the following pragmas:

Pragmas for saving and restoring the pragma state

• #pragma pop
• #pragma push

Pragmas controlling optimization goals

• #pragma Onum
• #pragma Ospace
• #pragma Otime

Pragmas controlling code generation

• #pragma arm
• #pragma thumb
• #pragma exceptions_unwind, #pragma no_exceptions_unwind

Pragmas controlling loop unrolling

• #pragma unroll [(n)]
• #pragma unroll_completely

Pragmas controlling Precompiled Header (PCH) processing

• #pragma hdrstop
• #pragma no_pch

Pragmas controlling anonymous structures and unions

• #pragma anon_unions, #pragma no_anon_unions

Pragmas controlling diagnostic messages

• #pragma diag_default tag[,tag,...]
• #pragma diag_error tag[,tag,...]
• #pragma diag_remark tag[,tag,...]
• #pragma diag_suppress tag[,tag,...]
• #pragma diag_warning tag[, tag, ...]

Miscellaneous pragmas

• #pragma arm section [section_type_list]
• #pragma GCC system_header
• #pragma import(__use_full_stdio)
• #pragma import(__use_smaller_memcpy)
• #pragma inline, #pragma no_inline
• #pragma once
• #pragma pack(n)
• #pragma softfp_linkage, #pragma no_softfp_linkage
• #pragma import symbol_name
• #pragma weak symbol, #pragma weak symbol1 = symbol2
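The following sketch shows how #pragma push and #pragma pop can confine #pragma Otime and #pragma unroll to a single function. dot_product() is a hypothetical example function. Other compilers might warn about or ignore these pragmas. Loop unrolling pragmas are honored only under the conditions given in the #pragma unroll [(n)] reference, so treat the sketch as illustrative.

```c
/* Save the current pragma state, then optimize this region for speed */
#pragma push
#pragma Otime

/* Hypothetical example: dot product of two arrays of 'n' shorts */
int dot_product(const short *a, const short *b, int n)
{
    int i;
    int acc = 0;
#pragma unroll(4)   /* request unrolling of the following loop by 4 */
    for (i = 0; i < n; i++)
    {
        acc += a[i] * b[i];
    }
    return acc;
}

/* Restore the pragma state saved by #pragma push */
#pragma pop
```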

Related references
10.77 #pragma anon_unions, #pragma no_anon_unions on page 10-682.
10.78 #pragma arm on page 10-683.
10.79 #pragma arm section [section_type_list] on page 10-684.
10.80 #pragma diag_default tag[,tag,...] on page 10-686.
10.81 #pragma diag_error tag[,tag,...] on page 10-687.
10.82 #pragma diag_remark tag[,tag,...] on page 10-688.
10.83 #pragma diag_suppress tag[,tag,...] on page 10-689.
10.84 #pragma diag_warning tag[, tag, ...] on page 10-690.
10.85 #pragma exceptions_unwind, #pragma no_exceptions_unwind on page 10-691.
10.86 #pragma GCC system_header on page 10-692.
10.87 #pragma hdrstop on page 10-693.
10.88 #pragma import symbol_name on page 10-694.
10.89 #pragma import(__use_full_stdio) on page 10-695.
10.90 #pragma import(__use_smaller_memcpy) on page 10-696.
10.91 #pragma inline, #pragma no_inline on page 10-697.
10.92 #pragma no_pch on page 10-698.
10.93 #pragma Onum on page 10-699.
10.94 #pragma once on page 10-700.
10.95 #pragma Ospace on page 10-701.
10.96 #pragma Otime on page 10-702.
10.97 #pragma pack(n) on page 10-703.
10.98 #pragma pop on page 10-705.
10.99 #pragma push on page 10-706.
10.100 #pragma softfp_linkage, #pragma no_softfp_linkage on page 10-707.
10.101 #pragma thumb on page 10-708.
10.102 #pragma unroll [(n)] on page 10-709.
10.103 #pragma unroll_completely on page 10-711.
10.104 #pragma weak symbol, #pragma weak symbol1 = symbol2 on page 10-712.

4.16 Compiler and processor support for bit-banding

The compiler supports bit-banding for processors that provide the feature.
The compiler supports bit-banding in the following ways:
• __attribute__((bitband)) language extension.
• --bitband command-line option.
Bit-banding is a feature of the Cortex-M3 and Cortex-M4 processors (--cpu=Cortex-M3 and
--cpu=Cortex-M4) and some derivatives (for example, --cpu=Cortex-M3-rev0). This functionality is
not available on other ARM processors.

Related concepts
4.17 Compiler type attribute, __attribute__((bitband)) on page 4-127.
4.18 --bitband compiler command-line option on page 4-128.
4.19 How the compiler handles bit-band objects placed outside bit-band regions on page 4-129.

Related references
10.58 __attribute__((bitband)) type attribute on page 10-663.
10.64 __attribute__((at(address))) variable attribute on page 10-669.
8.17 --bitband on page 8-339.

4.17 Compiler type attribute, __attribute__((bitband))

__attribute__((bitband)) is a type attribute that lets you bit-band type definitions of structures.

In the following example, the unplaced bit-banded objects must be relocated into the bit-band region.
You can do this either by using an appropriate scatter-loading description file or by using the
--rw_base linker command-line option.

/* foo.c */
typedef struct {
int i : 1;
int j : 2;
int k : 3;
} BB __attribute__((bitband));
BB value; // Unplaced object
void update_value(void)
{
value.i = 1;
value.j = 0;
}
/* end of foo.c */

Alternatively, you can use __attribute__((at())) to place bit-banded objects at a particular address in
the bit-band region, as in the following example:
/* foo.c */
typedef struct {
int i : 1;
int j : 2;
int k : 3;
} BB __attribute__((bitband));
BB value __attribute__((at(0x20000040))); // Placed object
void update_value(void)
{
value.i = 1;
value.j = 0;
}
/* end of foo.c */

Related concepts
4.16 Compiler and processor support for bit-banding on page 4-126.
4.18 --bitband compiler command-line option on page 4-128.
4.19 How the compiler handles bit-band objects placed outside bit-band regions on page 4-129.

Related references
10.58 __attribute__((bitband)) type attribute on page 10-663.
10.64 __attribute__((at(address))) variable attribute on page 10-669.
8.17 --bitband on page 8-339.

Related information
Scatter-loading Features.
--rw_base=address linker option.

4.18 --bitband compiler command-line option

The --bitband command-line option bit-bands all non-const global structure objects.
In the following example, when --bitband is applied to foo.c, the write to value.i is bit-banded. That
is, the value 0x00000001 is written to the bit-band alias word that value.i maps to in the bit-band
region.
Accesses to value.j and value.k are not bit-banded.
/* foo.c */
typedef struct {
int i : 1;
int j : 2;
int k : 3;
} BB;
BB value __attribute__((at(0x20000040))); // Placed object
void update_value(void)
{
value.i = 1;
value.j = 0;
}
/* end of foo.c */
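In the placed-object example above, value.i is bit 0 of the word at 0x20000040. This assumes that the compiler allocates the first bitfield at the least significant bit, as it does for little-endian AAPCS targets. The alias word for a bit is alias_base + (byte_offset * 32) + (bit_number * 4). The following sketch computes that address. It assumes the standard Cortex-M3/Cortex-M4 memory map, with 1MB SRAM and peripheral bit-band regions at 0x20000000 and 0x40000000 and alias regions at 0x22000000 and 0x42000000. bitband_alias() is a hypothetical helper.

```c
#include <stdint.h>

/* Assumed standard Cortex-M3/Cortex-M4 bit-band regions and aliases */
#define SRAM_BB_BASE    0x20000000u
#define SRAM_ALIAS_BASE 0x22000000u
#define PERI_BB_BASE    0x40000000u
#define PERI_ALIAS_BASE 0x42000000u
#define BB_REGION_SIZE  0x00100000u  /* 1MB */

/* Returns the alias word address for bit 'bit' at 'addr', or 0 if 'addr'
   is not in a bit-band region. Unsigned subtraction wraps for addresses
   below a region base, so those fail the range check. */
uint32_t bitband_alias(uint32_t addr, unsigned int bit)
{
    if (addr - SRAM_BB_BASE < BB_REGION_SIZE)
        return SRAM_ALIAS_BASE + (addr - SRAM_BB_BASE) * 32u + bit * 4u;
    if (addr - PERI_BB_BASE < BB_REGION_SIZE)
        return PERI_ALIAS_BASE + (addr - PERI_BB_BASE) * 32u + bit * 4u;
    return 0;
}
```

For value.i, bitband_alias(0x20000040, 0) gives 0x22000800. Writing 0x00000001 to that alias word therefore sets only that bit in a single store.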

armcc supports the bit-banding of objects accessed through absolute addresses. When --bitband is
applied to foo.c in the following example, the access to rts is bit-banded.
/* foo.c */
typedef struct {
int rts : 1;
int cts : 1;
unsigned int data;
} uart;
#define com2 (*((volatile uart *)0x20002000))
void put_com2(int n)
{
com2.rts = 1;
com2.data = n;
}
/* end of foo.c */

Related concepts
4.16 Compiler and processor support for bit-banding on page 4-126.
4.17 Compiler type attribute, __attribute__((bitband)) on page 4-127.
4.19 How the compiler handles bit-band objects placed outside bit-band regions on page 4-129.

Related references
10.58 __attribute__((bitband)) type attribute on page 10-663.
10.64 __attribute__((at(address))) variable attribute on page 10-669.
8.17 --bitband on page 8-339.

4.19 How the compiler handles bit-band objects placed outside bit-band regions
Bit-band objects must not be placed outside bit-band regions.
If you inadvertently place a bit-band object outside a bit-band region, either by using the at attribute or
by using an integer pointer to a particular address, the compiler responds as follows:
• If the bitband attribute is applied to an object type and --bitband is not specified on the command
line, the compiler generates an error.
• If the bitband attribute is applied to an object type and --bitband is specified on the command line,
the compiler generates a warning, and ignores the request to bit-band.
• If the bitband attribute is not applied to an object type and --bitband is specified on the command
line, the compiler ignores the request to bit-band.

Related concepts
4.16 Compiler and processor support for bit-banding on page 4-126.
4.17 Compiler type attribute, __attribute__((bitband)) on page 4-127.
4.18 --bitband compiler command-line option on page 4-128.

Related references
10.58 __attribute__((bitband)) type attribute on page 10-663.
10.64 __attribute__((at(address))) variable attribute on page 10-669.
8.17 --bitband on page 8-339.

4.20 Compiler support for thread-local storage

Thread-local storage is a class of static storage that, like the stack, is private to each thread of execution.
Each thread in a process is given a location where it can store thread-specific data. Variables are
allocated so that there is one instance of the variable for each existing thread.
When a thread terminates, its thread-local storage is released, and any pointers to thread-local
variables in that thread become invalid.
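The following host-side sketch shows one variable with a separate instance in each thread. It uses __declspec(thread) when __CC_ARM is defined. Otherwise it assumes a C11 compiler with _Thread_local. The POSIX threads calls are for host testing only and are not part of the compiler's thread-local storage support. The names call_count, count_call(), and count_in_new_thread() are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

#if defined(__CC_ARM)
#define THREAD_LOCAL __declspec(thread)   /* armcc spelling */
#else
#define THREAD_LOCAL _Thread_local        /* assumption: C11 host compiler */
#endif

THREAD_LOCAL int call_count;   /* one instance per thread, initially 0 */

/* Increments and returns the calling thread's own counter */
int count_call(void)
{
    return ++call_count;
}

/* Thread body: calls count_call() three times in its own thread */
static void *worker(void *arg)
{
    int *result = arg;
    count_call();
    count_call();
    *result = count_call();
    return NULL;
}

/* Runs worker() in a new thread; returns its final count, or -1 on error */
int count_in_new_thread(void)
{
    pthread_t t;
    int result = -1;
    if (pthread_create(&t, NULL, worker, &result) != 0)
        return -1;
    pthread_join(t, NULL);
    return result;
}
```

A new thread starts with its own call_count of 0. The worker's third call therefore returns 3 regardless of how many times the main thread has called count_call().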

Related references
10.29 __declspec(thread) on page 10-632.

4.21 Compiler support for literal pools

Literal pools are areas of constant data in a code section.
Not every 32-bit constant can be generated by a single instruction, so the compiler generates code that
loads such constants from a literal pool.
In the following example, the compiler generates code that loads the integer constant 0xdeadbeef from a
literal pool (marked with ***).
int f(void) {
return 0xdeadbeef;
}

** Section #1 '.text' (SHT_PROGBITS) [SHF_ALLOC + SHF_EXECINSTR]

Size : 12 bytes (alignment 4)
Address: 0x00000000

$a
.text
f
0x00000000: e59f0000 .... LDR r0,[pc,#0] ; [0x8] = 0xdeadbeef
0x00000004: e12fff1e ../. BX lr
$d
0x00000008: deadbeef .... DCD 3735928559 ***

An alternative to using literal pools is to generate the constant in a register with a MOVW/MOVT instruction
pair:
** Section #1 '.text' (SHT_PROGBITS) [SHF_ALLOC + SHF_EXECINSTR]
Size : 12 bytes (alignment 4)
Address: 0x00000000

$a
.text
f
0x00000000: e30b0eef .... MOV r0,#0xbeef
0x00000004: e34d0ead ..M. MOVT r0,#0xdead
0x00000008: e12fff1e ../. BX lr
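Whether a single instruction can build a constant depends on the instruction set. In ARM state, a data-processing immediate is an 8-bit value rotated right by an even number of bits, which is why 0xdeadbeef needs a literal pool or the MOVW/MOVT pair shown above. The following sketch tests only that single-MOV form. It is a hypothetical helper, not a model of how the compiler chooses a sequence. The compiler can also use other sequences, and Thumb-2 uses a different immediate encoding.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by 'n' bits (0 <= n < 32) */
static uint32_t rotl32(uint32_t v, unsigned int n)
{
    return n == 0 ? v : (v << n) | (v >> (32u - n));
}

/* Returns 1 if 'value' can be encoded as an ARM-state data-processing
   immediate: an 8-bit constant rotated right by an even amount (0-30) */
int is_arm_modified_immediate(uint32_t value)
{
    unsigned int rot;
    for (rot = 0; rot < 32; rot += 2)
    {
        /* undo a right-rotation of 'rot' bits and check for an 8-bit value */
        if (rotl32(value, rot) <= 0xFFu)
            return 1;
    }
    return 0;
}
```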

In most cases, generating literal pools improves performance and code size. However, in some specific
cases you might prefer to generate code without literal pools.
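The following compiler options control literal pools. Each has a corresponding --no_ form, for example
--no_integer_literal_pools, that disables the feature: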
• --integer_literal_pools.
• --string_literal_pools.
• --branch_tables.
• --float_literal_pools.

Related references
8.109 --integer_literal_pools, --no_integer_literal_pools on page 8-440.
8.178 --string_literal_pools, --no_string_literal_pools on page 8-515.
8.18 --branch_tables, --no_branch_tables on page 8-340.
8.83 --float_literal_pools, --no_float_literal_pools on page 8-411.

4.22 Compiler eight-byte alignment features

The compiler has the following eight-byte alignment features:
• The Procedure Call Standard for the ARM Architecture (AAPCS) requires that the stack is eight-byte
aligned at all external interfaces. The compiler and C libraries preserve the eight-byte alignment of
the stack. In addition, the default C library memory model maintains eight-byte alignment of the
heap.
• Code is compiled in a way that requires and preserves the eight-byte alignment constraints at external
interfaces.
• If you have assembly language files, or legacy objects, or libraries in your project, it is your
responsibility to check that they preserve eight-byte stack alignment, and correct them if required.
• In RVCT v2.0 and later, and in ARM Compiler 4.1 and later, double and long long data types are
eight-byte aligned for compliance with the Application Binary Interface for the ARM Architecture
(AEABI). This enables efficient use of the LDRD and STRD instructions in ARMv5TE and later.
• The default implementations of malloc(), realloc(), and calloc() maintain an eight-byte aligned
heap.
• The default implementation of alloca() returns an eight-byte aligned block of memory.
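A runtime check such as the following can help when you check assembly language files, legacy objects, or libraries for alignment problems. is_eight_byte_aligned() and malloc_is_eight_byte_aligned() are hypothetical helpers. They test the low three bits of an address, which are zero exactly when the address is a multiple of eight.

```c
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 if 'p' is aligned to an eight-byte boundary */
int is_eight_byte_aligned(const void *p)
{
    return ((uintptr_t)p & 7u) == 0;
}

/* Hypothetical check: allocate 'n' bytes and report whether the block
   is eight-byte aligned; returns -1 if the allocation fails */
int malloc_is_eight_byte_aligned(size_t n)
{
    void *p = malloc(n);
    int ok;
    if (p == NULL)
        return -1;
    ok = is_eight_byte_aligned(p);
    free(p);
    return ok;
}
```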

Related information
Procedure Call Standard for the ARM Architecture.
Application Binary Interface (ABI) for the ARM Architecture.
Alignment restrictions in load and store element and structure instructions.
alloca().
Section alignment with the linker.

4.23 Using compiler and linker support for symbol versions

The compiler and the linker both support the GNU-extended symbol versioning model.
To create a function with a symbol version in C or C++ code, you must use the assembly label GNU
extension. Use this extension to rename the function symbol into a symbol that has either of the
following names:
• function@@ver for a default ver of function.
• function@ver for a nondefault ver of function.
For example, to define a default version:
int new_function(void) __asm__("versioned_fun@@ver2");
int new_function(void)
{
return 2;
}

To define a nondefault version:

int old_function(void) __asm__("versioned_fun@ver1");
int old_function(void)
{
return 1;
}

Related references
9.36 Assembler labels on page 9-586.

Related information
Symbol versioning for BPABI models.

4.24 Precompiled Header (PCH) files

Precompiled Header files can help reduce compilation time when the same header file is used by
multiple source files.

Note
Support for Precompiled Header (PCH) files is deprecated from ARM Compiler 5.05 onwards on all
platforms. Note that ARM Compiler on Windows 8 never supported PCH files.

When you compile source files, the included header files are also compiled. If a header file is included in
more than one source file, it is recompiled when each source file is compiled. Also, the header files you
include might introduce many lines of code even though the primary source files that include them are
relatively small. Therefore, it is often desirable to avoid recompiling a set of header files by precompiling
them. The precompiled results are referred to as PCH files.
The compiler can precompile and use PCH files automatically with the --pch option, or you can use the
--create_pch and --use_pch options to manually control the use of PCH files.

By default, when the compiler creates a PCH file, it:

• Takes the name of the primary source file and replaces the suffix with .pch.
• Creates the file in the same directory as the primary source file.

Note
Support for PCH processing is not available when you specify multiple source files in a single
compilation. In such a case, the compiler issues an error message and aborts the compilation.

Note
Do not assume that if a PCH file is available, it is used by the compiler. In some cases, system
configuration issues mean that the compiler might not always be able to use the PCH file. Address Space
Randomization on Red Hat Enterprise Linux 3 (RHE3) is one example of a possible system
configuration issue.

Related concepts
2.4 Order of compiler command-line options on page 2-45.
4.25 Automatic Precompiled Header (PCH) file processing on page 4-136.
4.26 Precompiled Header (PCH) file processing and the header stop point on page 4-137.
4.27 Precompiled Header (PCH) file creation requirements on page 4-139.
4.28 Compilation with multiple Precompiled Header (PCH) files on page 4-141.
4.29 Obsolete Precompiled Header (PCH) files on page 4-142.
4.30 Manually specifying the filename and location of a Precompiled Header (PCH) file on page 4-143.
4.31 Selectively applying Precompiled Header (PCH) file processing on page 4-144.
4.32 Suppressing Precompiled Header (PCH) file processing on page 4-145.
4.33 Message output during Precompiled Header (PCH) processing on page 4-146.
4.34 Performance issues with Precompiled Header (PCH) files on page 4-147.

Related references
8.150 --pch_messages, --no_pch_messages on page 8-486.
10.87 #pragma hdrstop on page 10-693.
10.92 #pragma no_pch on page 10-698.
8.148 --pch on page 8-484.
8.149 --pch_dir=dir on page 8-485.
8.151 --pch_verbose, --no_pch_verbose on page 8-487.

4.25 Automatic Precompiled Header (PCH) file processing

The --pch command-line option enables automatic PCH file processing.

Note
Support for Precompiled Header (PCH) files is deprecated from ARM Compiler 5.05 onwards on all
platforms. Note that ARM Compiler on Windows 8 never supported PCH files.

Automatic PCH file processing means that the compiler automatically looks for a qualifying PCH file,
and reads it if found. Otherwise, the compiler creates one for use on a subsequent compilation.
When the compiler creates a PCH file, it takes the name of the primary source file and replaces the suffix
with .pch. The PCH file is created in the directory of the primary source file unless the --pch_dir
option is specified.
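The naming rule above can be modeled with a small C function. This is a sketch of the documented behavior only, not the compiler's implementation; auto_pch_path() is a hypothetical name, and the sketch assumes '/' as the path separator.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper modeling the automatic PCH naming rule.
   Writes "<dir>/<base>.pch" into out, where <base> is the source file
   name with its suffix removed and <dir> is pch_dir if it is non-NULL,
   otherwise the directory of the source file.
   Returns 0 on success, -1 if out is too small. */
int auto_pch_path(const char *source, const char *pch_dir,
                  char *out, size_t out_size)
{
    const char *slash = strrchr(source, '/');
    const char *name = slash ? slash + 1 : source;   /* file name part */
    const char *dot = strrchr(name, '.');            /* start of suffix */
    size_t base_len = dot ? (size_t)(dot - name) : strlen(name);
    int n;

    if (pch_dir != NULL)            /* --pch_dir given: use that directory */
        n = snprintf(out, out_size, "%s/%.*s.pch",
                     pch_dir, (int)base_len, name);
    else if (slash != NULL)         /* keep the source file's directory */
        n = snprintf(out, out_size, "%.*s%.*s.pch",
                     (int)(name - source), source, (int)base_len, name);
    else                            /* source is in the current directory */
        n = snprintf(out, out_size, "%.*s.pch", (int)base_len, name);

    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

For example, src/myprog.c gives src/myprog.pch, or build/pch/myprog.pch when pch_dir is "build/pch".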

Related concepts
4.24 Precompiled Header (PCH) files on page 4-134.
4.26 Precompiled Header (PCH) file processing and the header stop point on page 4-137.
4.27 Precompiled Header (PCH) file creation requirements on page 4-139.
4.28 Compilation with multiple Precompiled Header (PCH) files on page 4-141.
4.29 Obsolete Precompiled Header (PCH) files on page 4-142.
4.30 Manually specifying the filename and location of a Precompiled Header (PCH) file on page 4-143.
4.31 Selectively applying Precompiled Header (PCH) file processing on page 4-144.
4.32 Suppressing Precompiled Header (PCH) file processing on page 4-145.
4.33 Message output during Precompiled Header (PCH) processing on page 4-146.
4.34 Performance issues with Precompiled Header (PCH) files on page 4-147.

Related references
8.148 --pch on page 8-484.
8.149 --pch_dir=dir on page 8-485.

4.26 Precompiled Header (PCH) file processing and the header stop point
The PCH file contains a snapshot of all the code that precedes a header stop point.

Note
Support for Precompiled Header (PCH) files is deprecated from ARM Compiler 5.05 onwards on all
platforms. Note that ARM Compiler on Windows 8 never supported PCH files.

Typically, the header stop point is the first token in the primary source file that does not belong to a
preprocessing directive. In the following example, the header stop point is int and the PCH file contains
a snapshot that reflects the inclusion of xxx.h and yyy.h:
#include "xxx.h"
#include "yyy.h"
int i;

You can manually specify the header stop point with #pragma hdrstop. If you use this pragma, it must
appear before the first token that does not belong to a preprocessing directive. In this example, it must be
placed before int, as follows:
#include "xxx.h"
#include "yyy.h"
#pragma hdrstop
int i;

If a conditional directive block (#if, #ifdef, or #ifndef) encloses the first non-preprocessor token or
#pragma hdrstop, the header stop point is the outermost enclosing conditional directive.

For example:
#include "xxx.h"
#ifndef YYY_H
#define YYY_H 1
#include "yyy.h"
#endif
#if TEST /* Header stop point lies immediately before #if TEST */
int i;
#endif

In this example, the first token that does not belong to a preprocessing directive is int, but the header
stop point is the start of the #if block containing it. The PCH file reflects the inclusion of xxx.h and,
conditionally, the definition of YYY_H and inclusion of yyy.h. It does not contain the state produced by
#if TEST.
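The rules for locating the header stop point can be expressed as a simplified C model that works on the lines of the primary source file. header_stop_line() and its helpers are hypothetical, not a compiler API. The model ignores comments, line splicing, and tokens that follow a directive on the same line, so treat it as an illustration only.

```c
#include <ctype.h>
#include <string.h>

/* Returns a pointer to the first character of s that is not a space or tab. */
static const char *skip_blanks(const char *s)
{
    while (*s == ' ' || *s == '\t')
        s++;
    return s;
}

/* If line is a preprocessing directive, returns a pointer to the
   directive name (the text after '#'); otherwise returns NULL. */
static const char *directive_name(const char *line)
{
    const char *p = skip_blanks(line);
    return (*p == '#') ? skip_blanks(p + 1) : NULL;
}

/* Returns 1 if the text at p is exactly word, followed by the end of
   the line or by whitespace. */
static int name_is(const char *p, const char *word)
{
    size_t n = strlen(word);
    return strncmp(p, word, n) == 0 &&
           (p[n] == '\0' || isspace((unsigned char)p[n]));
}

/* Returns the 0-based index of the line where the header stop point lies:
   the first line that is neither blank nor a directive, or a
   "#pragma hdrstop" line, whichever comes first. If that line is inside
   #if/#ifdef/#ifndef blocks, the index of the outermost enclosing
   conditional directive is returned instead. #else and #elif leave the
   nesting depth unchanged, so they need no special handling.
   Returns nlines if no stop point is found. */
size_t header_stop_line(const char *const lines[], size_t nlines)
{
    size_t depth = 0, outermost = 0, i;

    for (i = 0; i < nlines; i++) {
        const char *name = directive_name(lines[i]);

        if (name == NULL) {
            if (*skip_blanks(lines[i]) == '\0')
                continue;                      /* blank line */
            return depth > 0 ? outermost : i;  /* first non-directive token */
        }
        if (name_is(name, "if") || name_is(name, "ifdef") ||
            name_is(name, "ifndef")) {
            if (depth++ == 0)
                outermost = i;                 /* start of outermost block */
        } else if (name_is(name, "endif")) {
            if (depth > 0)
                depth--;
        } else if (name_is(name, "pragma") &&
                   name_is(skip_blanks(name + 6), "hdrstop")) {
            return depth > 0 ? outermost : i;  /* manual stop point */
        }
    }
    return nlines;
}
```

For the third example above, the function returns 5, the index of the #if TEST line.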

Related concepts
4.24 Precompiled Header (PCH) files on page 4-134.
4.25 Automatic Precompiled Header (PCH) file processing on page 4-136.
4.27 Precompiled Header (PCH) file creation requirements on page 4-139.
4.28 Compilation with multiple Precompiled Header (PCH) files on page 4-141.
4.29 Obsolete Precompiled Header (PCH) files on page 4-142.
4.30 Manually specifying the filename and location of a Precompiled Header (PCH) file on page 4-143.
4.31 Selectively applying Precompiled Header (PCH) file processing on page 4-144.
4.32 Suppressing Precompiled Header (PCH) file processing on page 4-145.
4.33 Message output during Precompiled Header (PCH) processing on page 4-146.
4.34 Performance issues with Precompiled Header (PCH) files on page 4-147.

Related references
8.148 --pch on page 8-484.
8.149 --pch_dir=dir on page 8-485.
10.87 #pragma hdrstop on page 10-693.

4.27 Precompiled Header (PCH) file creation requirements

A PCH file is produced only if the header stop point and the code preceding it, mainly the header files,
meet specific requirements.

Note
Support for Precompiled Header (PCH) files is deprecated from ARM Compiler 5.05 onwards on all
platforms. Note that ARM Compiler on Windows 8 never supported PCH files.

These requirements are as follows:

• The header stop point must appear at file scope. It must not be within an unclosed scope established
by a header file. For example, a PCH file is not created in this case:
// xxx.h
class A
{
// xxx.c
#include "xxx.h"
int i;
};
• The header stop point must not be inside a declaration that is started within a header file. Also, in
C++, it must not be part of a declaration list of a linkage specification. For example, in the following
case the header stop point is int, but because it is not the start of a new declaration, no PCH file is
created:
// yyy.h
static
// yyy.c
#include "yyy.h"
int i;
• The header stop point must not be inside a #if block or a #define that is started within a header file.
• The processing that precedes the header stop point must not have produced any errors.
Note
Warnings and other diagnostics are not reproduced when the PCH file is reused.

• No references to the predefined macros __DATE__ or __TIME__ must appear.
• No instances of the #line preprocessing directive must appear.
• #pragma no_pch must not appear.
• The code preceding the header stop point must have introduced a sufficient number of declarations to
justify the overhead associated with precompiled headers.
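Some of these requirements are purely textual and can be checked mechanically. The following C sketch checks only the __DATE__, __TIME__, #line, and #pragma no_pch requirements; pch_prefix_problem() is a hypothetical helper, not part of the compiler. It uses a plain substring search, so it also reports matches inside comments or string literals, and it misses spellings such as # line.

```c
#include <string.h>

/* Hypothetical helper: checks the text that precedes the header stop
   point, including the contents of the included headers, for constructs
   that prevent PCH file creation. Returns a description of the first
   problem found, or NULL if none of these checks fail. The scope,
   error-free, and "sufficient declarations" requirements are not
   modeled. */
const char *pch_prefix_problem(const char *prefix)
{
    static const struct {
        const char *text;    /* substring to search for */
        const char *reason;  /* description returned on a match */
    } checks[] = {
        { "__DATE__",       "references the predefined macro __DATE__" },
        { "__TIME__",       "references the predefined macro __TIME__" },
        { "#line",          "contains a #line directive" },
        { "#pragma no_pch", "contains #pragma no_pch" },
    };
    size_t i;

    for (i = 0; i < sizeof checks / sizeof checks[0]; i++)
        if (strstr(prefix, checks[i].text) != NULL)
            return checks[i].reason;
    return NULL;
}
```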

Related concepts
4.29 Obsolete Precompiled Header (PCH) files on page 4-142.

Related references
8.148 --pch on page 8-484.
8.149 --pch_dir=dir on page 8-485.
8.150 --pch_messages, --no_pch_messages on page 8-486.
8.151 --pch_verbose, --no_pch_verbose on page 8-487.
10.87 #pragma hdrstop on page 10-693.
10.92 #pragma no_pch on page 10-698.

4.28 Compilation with multiple Precompiled Header (PCH) files

More than one PCH file might apply to a given compilation. If so, the compiler uses the largest PCH file.

Note
Support for Precompiled Header (PCH) files is deprecated from ARM Compiler 5.05 onwards on all
platforms. Note that ARM Compiler on Windows 8 never supported PCH files.

That is, the compiler uses the PCH file representing the most preprocessing directives from the primary
source file.
For example, a primary source file might begin with:
#include "xxx.h"
#include "yyy.h"
#include "zzz.h"

If there is one PCH file for xxx.h and a second for xxx.h and yyy.h, the latter PCH file is selected,
assuming that both apply to the current compilation. Additionally, after the PCH file for the first two
headers is read in and the third is compiled, a new PCH file for all three headers is created if the
requirements for PCH file creation are met.
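The selection rule can be sketched as follows. struct pch_candidate and select_pch() are hypothetical; the sketch assumes that applicability and the number of directives represented are already known for each candidate, and that a tie keeps the earlier candidate, which the text above does not specify. Creation of the new, larger PCH file is not modeled.

```c
#include <stddef.h>

/* Hypothetical description of one candidate PCH file. */
struct pch_candidate {
    const char *path;
    int applicable;        /* nonzero if usable for this compilation */
    unsigned directives;   /* directives from the primary source it covers */
};

/* Returns the index of the applicable candidate that represents the most
   preprocessing directives from the primary source file, or -1 if no
   candidate applies. On a tie, the earlier candidate is kept. */
int select_pch(const struct pch_candidate c[], size_t n)
{
    int best = -1;
    size_t i;

    for (i = 0; i < n; i++)
        if (c[i].applicable &&
            (best < 0 || c[i].directives > c[best].directives))
            best = (int)i;
    return best;
}
```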

Related concepts
4.24 Precompiled Header (PCH) files on page 4-134.
4.25 Automatic Precompiled Header (PCH) file processing on page 4-136.
4.26 Precompiled Header (PCH) file processing and the header stop point on page 4-137.
4.27 Precompiled Header (PCH) file creation requirements on page 4-139.
4.29 Obsolete Precompiled Header (PCH) files on page 4-142.
4.30 Manually specifying the filename and location of a Precompiled Header (PCH) file on page 4-143.
4.31 Selectively applying Precompiled Header (PCH) file processing on page 4-144.
4.32 Suppressing Precompiled Header (PCH) file processing on page 4-145.
4.33 Message output during Precompiled Header (PCH) processing on page 4-146.
4.34 Performance issues with Precompiled Header (PCH) files on page 4-147.

4.29 Obsolete Precompiled Header (PCH) files

In automatic PCH processing mode the compiler identifies and deletes obsolete PCH files.

Note
Support for Precompiled Header (PCH) files is deprecated from ARM Compiler 5.05 onwards on all
platforms. Note that ARM Compiler on Windows 8 never supported PCH files.

The compiler indicates that a PCH file is obsolete, and deletes it, under the following circumstances:
• If the PCH file is based on at least one out-of-date header file but is otherwise applicable for the
current compilation.
• If the PCH file has the same base name as the source file being compiled, for example, xxx.pch and
xxx.c, but is not applicable for the current compilation, for example, because you have used different
command-line options.
These describe some common cases. You must delete other PCH files as required.
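The two documented cases can be written as a predicate. struct pch_state and pch_is_obsolete() are hypothetical, and how the compiler determines each field is not modeled. A PCH file that matches neither case is not deleted automatically.

```c
/* Hypothetical summary of what is known about one PCH file. */
struct pch_state {
    int applicable;          /* usable for this compilation, ignoring header dates */
    int header_out_of_date;  /* a header it was built from has changed */
    int same_base_name;      /* for example, xxx.pch when compiling xxx.c */
};

/* Returns nonzero if, under the two documented rules, the PCH file is
   treated as obsolete and deleted in automatic PCH processing mode. */
int pch_is_obsolete(const struct pch_state *s)
{
    /* Case 1: otherwise applicable, but based on an out-of-date header. */
    if (s->applicable && s->header_out_of_date)
        return 1;
    /* Case 2: same base name as the source file, but not applicable,
       for example because different command-line options were used. */
    if (s->same_base_name && !s->applicable)
        return 1;
    return 0;
}
```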

Related concepts
4.24 Precompiled Header (PCH) files on page 4-134.
4.25 Automatic Precompiled Header (PCH) file processing on page 4-136.
4.26 Precompiled Header (PCH) file processing and the header stop point on page 4-137.
4.27 Precompiled Header (PCH) file creation requirements on page 4-139.
4.28 Compilation with multiple Precompiled Header (PCH) files on page 4-141.
4.30 Manually specifying the filename and location of a Precompiled Header (PCH) file on page 4-143.
4.31 Selectively applying Precompiled Header (PCH) file processing on page 4-144.
4.32 Suppressing Precompiled Header (PCH) file processing on page 4-145.
4.33 Message output during Precompiled Header (PCH) processing on page 4-146.
4.34 Performance issues with Precompiled Header (PCH) files on page 4-147.

4.30 Manually specifying the filename and location of a Precompiled Header (PCH) file
You can manually specify the filename and location of PCH files for the compiler to create and use.

Note
Support for Precompiled Header (PCH) files is deprecated from ARM Compiler 5.05 onwards on all
platforms. Note that ARM Compiler on Windows 8 never supported PCH files.

Use the following compiler command-line options to specify PCH filenames and locations:
• --create_pch=filename
• --pch_dir=directory
• --use_pch=filename

If you use --create_pch or --use_pch with the --pch_dir option, the indicated filename is appended
to the directory name, unless the filename is an absolute path name.
Note
If multiple options are specified on the same command line, the following rules apply:
• --use_pch takes precedence over --pch.
• --create_pch takes precedence over all other PCH file options.
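The path and precedence rules above can be sketched in C. resolve_pch_name(), effective_pch_mode(), and enum pch_mode are hypothetical names. The absolute-path test is a simplification that treats a leading '/' or a drive letter followed by ':' as absolute, and using the filename unchanged when no --pch_dir is given is an assumption.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: resolves the filename given to --create_pch or
   --use_pch against the --pch_dir directory (pch_dir may be NULL).
   Absolute names are used unchanged; other names are appended to the
   directory. Returns 0 on success, -1 if out is too small. */
int resolve_pch_name(const char *pch_dir, const char *filename,
                     char *out, size_t out_size)
{
    int absolute = filename[0] == '/' ||
                   (filename[0] != '\0' && filename[1] == ':');
    int n;

    if (pch_dir == NULL || absolute)
        n = snprintf(out, out_size, "%s", filename);
    else
        n = snprintf(out, out_size, "%s/%s", pch_dir, filename);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}

enum pch_mode { PCH_NONE, PCH_AUTO, PCH_USE, PCH_CREATE };

/* Models the documented precedence when several PCH options appear on
   one command line: --create_pch beats all others, and --use_pch beats
   --pch. Each argument is nonzero if that option was given. */
enum pch_mode effective_pch_mode(int pch, int use_pch, int create_pch)
{
    if (create_pch)
        return PCH_CREATE;
    if (use_pch)
        return PCH_USE;
    if (pch)
        return PCH_AUTO;
    return PCH_NONE;
}
```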

Related concepts
4.24 Precompiled Header (PCH) files on page 4-134.
4.25 Automatic Precompiled Header (PCH) file processing on page 4-136.
4.26 Precompiled Header (PCH) file processing and the header stop point on page 4-137.
4.27 Precompiled Header (PCH) file creation requirements on page 4-139.
4.28 Compilation with multiple Precompiled Header (PCH) files on page 4-141.
4.29 Obsolete Precompiled Header (PCH) files on page 4-142.
4.31 Selectively applying Precompiled Header (PCH) file processing on page 4-144.
4.32 Suppressing Precompiled Header (PCH) file processing on page 4-145.
4.33 Message output during Precompiled Header (PCH) processing on page 4-146.
4.34 Performance issues with Precompiled Header (PCH) files on page 4-147.

Related references
8.44 --create_pch=filename on page 8-371.
8.148 --pch on page 8-484.
8.149 --pch_dir=dir on page 8-485.
8.190 --use_pch=filename on page 8-532.

4.31 Selectively applying Precompiled Header (PCH) file processing

You can selectively include and exclude header files for PCH file processing, even if you are using
automatic PCH file processing.

Note
Support for Precompiled Header (PCH) files is deprecated from ARM Compiler 5.05 onwards on all
platforms. Note that ARM Compiler on Windows 8 never supported PCH files.

Use the #pragma hdrstop directive to insert a manual header stop point in the primary source file. Insert
it before the first token that does not belong to a preprocessing directive. This enables you to specify
where the set of header files that is subject to precompilation ends. For example:
#include "xxx.h"
#include "yyy.h"
#pragma hdrstop
#include "zzz.h"

In this example, the PCH file includes the processing state for xxx.h and yyy.h but not for zzz.h. This
is useful if you decide that the information following the #pragma hdrstop does not justify the creation
of another PCH file.

Related concepts
4.24 Precompiled Header (PCH) files on page 4-134.
4.25 Automatic Precompiled Header (PCH) file processing on page 4-136.
4.26 Precompiled Header (PCH) file processing and the header stop point on page 4-137.
4.27 Precompiled Header (PCH) file creation requirements on page 4-139.
4.28 Compilation with multiple Precompiled Header (PCH) files on page 4-141.
4.29 Obsolete Precompiled Header (PCH) files on page 4-142.
4.30 Manually specifying the filename and location of a Precompiled Header (PCH) file on page 4-143.
4.32 Suppressing Precompiled Header (PCH) file processing on page 4-145.
4.33 Message output during Precompiled Header (PCH) processing on page 4-146.
4.34 Performance issues with Precompiled Header (PCH) files on page 4-147.

Related references
10.87 #pragma hdrstop on page 10-693.
10.92 #pragma no_pch on page 10-698.

4.32 Suppressing Precompiled Header (PCH) file processing

To suppress PCH file processing, use the #pragma no_pch directive in the primary source file.

Note
Support for Precompiled Header (PCH) files is deprecated from ARM Compiler 5.05 onwards on all
platforms. Note that ARM Compiler on Windows 8 never supported PCH files.

You do not have to place this directive at the beginning of the file for it to take effect. For example, no
PCH file is created if you compile the following source code with armcc --create_pch=foo.pch
myprog.c:

#include "xxx.h"
#pragma no_pch
#include "zzz.h"

If you want to enable PCH processing selectively, for example, to subject xxx.h to PCH file processing
but not zzz.h, replace #pragma no_pch with #pragma hdrstop, as follows:
#include "xxx.h"
#pragma hdrstop
#include "zzz.h"

Related concepts
4.24 Precompiled Header (PCH) files on page 4-134.
4.25 Automatic Precompiled Header (PCH) file processing on page 4-136.
4.26 Precompiled Header (PCH) file processing and the header stop point on page 4-137.
4.27 Precompiled Header (PCH) file creation requirements on page 4-139.
4.28 Compilation with multiple Precompiled Header (PCH) files on page 4-141.
4.29 Obsolete Precompiled Header (PCH) files on page 4-142.
4.30 Manually specifying the filename and location of a Precompiled Header (PCH) file on page 4-143.
4.31 Selectively applying Precompiled Header (PCH) file processing on page 4-144.
4.33 Message output during Precompiled Header (PCH) processing on page 4-146.
4.34 Performance issues with Precompiled Header (PCH) files on page 4-147.

Related references
10.87 #pragma hdrstop on page 10-693.
10.92 #pragma no_pch on page 10-698.

4.33 Message output during Precompiled Header (PCH) processing

Whenever the compiler creates or uses a PCH file, it displays a message. You can suppress these
messages or make them more verbose.

Note
Support for Precompiled Header (PCH) files is deprecated from ARM Compiler 5.05 onwards on all
platforms. Note that ARM Compiler on Windows 8 never supported PCH files.

When the compiler creates or uses a PCH file, it displays the following kind of message:
test.c: creating precompiled header file test.pch

You can suppress this message with the compiler command-line option --no_pch_messages.
The --pch_verbose option enables verbose mode. In verbose mode, the compiler displays a message for
each PCH file that it considers but does not use, giving the reason why it cannot be used.

Related concepts
4.24 Precompiled Header (PCH) files on page 4-134.
4.25 Automatic Precompiled Header (PCH) file processing on page 4-136.
4.26 Precompiled Header (PCH) file processing and the header stop point on page 4-137.
4.27 Precompiled Header (PCH) file creation requirements on page 4-139.
4.28 Compilation with multiple Precompiled Header (PCH) files on page 4-141.
4.29 Obsolete Precompiled Header (PCH) files on page 4-142.
4.30 Manually specifying the filename and location of a Precompiled Header (PCH) file on page 4-143.
4.31 Selectively applying Precompiled Header (PCH) file processing on page 4-144.
4.32 Suppressing Precompiled Header (PCH) file processing on page 4-145.
4.34 Performance issues with Precompiled Header (PCH) files on page 4-147.

Related references
8.150 --pch_messages, --no_pch_messages on page 8-486.
8.151 --pch_verbose, --no_pch_verbose on page 8-487.

4.34 Performance issues with Precompiled Header (PCH) files

Typically, the overhead of creating and reading a PCH file is small, even for reasonably large header
files. If the PCH file is used, there is typically a significant decrease in compilation time. However, PCH
files can range in size from about 250 KB to several megabytes or more, so you might not want to create
many PCH files.

Note
Support for Precompiled Header (PCH) files is deprecated from ARM Compiler 5.05 onwards on all
platforms. Note that ARM Compiler on Windows 8 never supported PCH files.

PCH processing might not always be appropriate, for example, where you have an arbitrary set of files
with non-uniform initial sequences of preprocessing directives.
The benefits of PCH processing occur when several source files can share the same PCH file. The more
sharing, the less disk space is consumed. Sharing minimizes the disadvantage of large PCH files, without
giving up the advantage of a significant decrease in compilation times.
Therefore, to take full advantage of header file precompilation, you might have to re-order the #include
sections of your source files, or group #include directives within a commonly used header file.
Different environments and different projects might have differing requirements. Be aware, however, that
making the best use of PCH support might require some experimentation and probably some minor
changes to source code.
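As an illustration of a uniform include prefix, the following C source file begins with the same #include sequence that every other file in a hypothetical project would share, and ends that prefix with #pragma hdrstop. The choice of standard headers and the count_char() function are assumptions for the example. Compilers other than armcc might warn about the unknown pragma.

```c
/* Shared prefix: in this hypothetical project, every source file starts
   with exactly this sequence of #include directives, so one PCH file can
   serve all of them. In practice these lines might be grouped in a
   commonly used header file. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#pragma hdrstop   /* end of the shared, precompilable prefix */

/* File-specific code starts here and is compiled normally. */

/* Returns the number of occurrences of ch in the string s. */
size_t count_char(const char *s, char ch)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (*s == ch)
            n++;
    return n;
}
```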

Related concepts
4.24 Precompiled Header (PCH) files on page 4-134.
4.25 Automatic Precompiled Header (PCH) file processing on page 4-136.
4.26 Precompiled Header (PCH) file processing and the header stop point on page 4-137.
4.27 Precompiled Header (PCH) file creation requirements on page 4-139.
4.28 Compilation with multiple Precompiled Header (PCH) files on page 4-141.
4.29 Obsolete Precompiled Header (PCH) files on page 4-142.
4.30 Manually specifying the filename and location of a Precompiled Header (PCH) file on page 4-143.
4.31 Selectively applying Precompiled Header (PCH) file processing on page 4-144.
4.32 Suppressing Precompiled Header (PCH) file processing on page 4-145.
4.33 Message output during Precompiled Header (PCH) processing on page 4-146.

4.35 Default compiler options that are affected by optimization level

In general, optimization levels are independent of the default behavior of command-line options.
However, there are a small number of exceptions where the level of optimization you use changes the
default option.
These exceptions are:
• --autoinline, --no_autoinline.
• --data_reorder, --no_data_reorder.
Depending on the value of -Onum you use (-O0, -O1, -O2, or -O3), the default option changes as
specified. See the individual command-line option reference descriptions for more information.

Related references
8.15 --autoinline, --no_autoinline on page 8-337.
8.46 --data_reorder, --no_data_reorder on page 8-373.
8.139 -Onum on page 8-473.

Chapter 5
Compiler Coding Practices

Describes programming techniques and practices to help you increase the portability, efficiency and
robustness of your C and C++ source code.
It contains the following sections:
• 5.1 The compiler as an optimizing compiler on page 5-152.
• 5.2 Compiler optimization for code size versus speed on page 5-153.
• 5.3 Compiler optimization levels and the debug view on page 5-154.
• 5.4 Selecting the target processor at compile time on page 5-157.
• 5.5 Enabling NEON and FPU for bare-metal on page 5-158.
• 5.6 Optimization of loop termination in C code on page 5-159.
• 5.7 Loop unrolling in C code on page 5-161.
• 5.8 Compiler optimization and the volatile keyword on page 5-163.
• 5.9 Code metrics on page 5-165.
• 5.10 Code metrics for measurement of code size and data size on page 5-166.
• 5.11 Stack use in C and C++ on page 5-167.
• 5.12 Benefits of reducing debug information in objects and libraries on page 5-169.
• 5.13 Methods of reducing debug information in objects and libraries on page 5-170.
• 5.14 Guarding against multiple inclusion of header files on page 5-171.
• 5.15 Methods of minimizing function parameter passing overhead on page 5-172.
• 5.16 Returning structures from functions through registers on page 5-173.
• 5.17 Functions that return the same result when called with the same arguments on page 5-174.
• 5.18 Comparison of pure and impure functions on page 5-175.
• 5.19 Recommendation of postfix syntax when qualifying functions with ARM function modifiers
on page 5-176.
• 5.20 Inline functions on page 5-177.
• 5.21 Compiler decisions on function inlining on page 5-178.
• 5.22 Automatic function inlining and static functions on page 5-179.
• 5.23 Inline functions and removal of unused out-of-line functions at link time on page 5-180.
• 5.24 Automatic function inlining and multifile compilation on page 5-181.
• 5.25 Restriction on overriding compiler decisions about function inlining on page 5-182.
• 5.26 Compiler modes and inline functions on page 5-183.
• 5.27 Inline functions in C++ and C90 mode on page 5-184.
• 5.28 Inline functions in C99 mode on page 5-185.
• 5.29 Inline functions and debugging on page 5-187.
• 5.30 Types of data alignment on page 5-188.
• 5.31 Advantages of natural data alignment on page 5-189.
• 5.32 Compiler storage of data objects by natural byte alignment on page 5-190.
• 5.33 Relevance of natural data alignment at compile time on page 5-191.
• 5.34 Unaligned data access in C and C++ code on page 5-192.
• 5.35 The __packed qualifier and unaligned data access in C and C++ code on page 5-193.
• 5.36 Unaligned fields in structures on page 5-194.
• 5.37 Performance penalty associated with marking whole structures as packed on page 5-195.
• 5.38 Unaligned pointers in C and C++ code on page 5-196.
• 5.39 Unaligned Load Register (LDR) instructions generated by the compiler on page 5-197.
• 5.40 Comparisons of an unpacked struct, a __packed struct, and a struct with individually __packed
fields, and of a __packed struct and a #pragma packed struct on page 5-198.
• 5.41 Compiler support for floating-point arithmetic on page 5-200.
• 5.42 Default selection of hardware or software floating-point support on page 5-202.
• 5.43 Example of hardware and software support differences for floating-point arithmetic
on page 5-203.
• 5.44 Vector Floating-Point (VFP) architectures on page 5-205.
• 5.45 Limitations on hardware handling of floating-point arithmetic on page 5-207.
• 5.46 Implementation of Vector Floating-Point (VFP) support code on page 5-208.
• 5.47 Compiler and library support for half-precision floating-point numbers on page 5-210.
• 5.48 Half-precision floating-point number format on page 5-211.
• 5.49 Compiler support for floating-point computations and linkage on page 5-212.
• 5.50 Types of floating-point linkage on page 5-213.
• 5.51 Compiler options for floating-point linkage and computations on page 5-214.
• 5.52 Floating-point linkage and computational requirements of compiler options on page 5-216.
• 5.53 Processors and their implicit Floating-Point Units (FPUs) on page 5-218.
• 5.54 Integer division-by-zero errors in C code on page 5-221.
• 5.55 Software floating-point division-by-zero errors in C code on page 5-223.
• 5.56 About trapping software floating-point division-by-zero errors on page 5-224.
• 5.57 Identification of software floating-point division-by-zero errors on page 5-225.
• 5.58 Software floating-point division-by-zero debugging on page 5-227.
• 5.59 New language features of C99 on page 5-228.
• 5.60 New library features of C99 on page 5-230.
• 5.61 // comments in C99 and C90 on page 5-231.
• 5.62 Compound literals in C99 on page 5-232.
• 5.63 Designated initializers in C99 on page 5-233.
• 5.64 Hexadecimal floating-point numbers in C99 on page 5-234.
• 5.65 Flexible array members in C99 on page 5-235.
• 5.66 __func__ predefined identifier in C99 on page 5-236.
• 5.67 inline functions in C99 on page 5-237.
• 5.68 long long data type in C99 and C90 on page 5-238.
• 5.69 Macros with a variable number of arguments in C99 on page 5-239.
• 5.70 Mixed declarations and statements in C99 on page 5-240.
• 5.71 New block scopes for selection and iteration statements in C99 on page 5-241.
• 5.72 _Pragma preprocessing operator in C99 on page 5-242.
• 5.73 Restricted pointers in C99 on page 5-243.
• 5.74 Additional <math.h> library functions in C99 on page 5-244.
• 5.75 Complex numbers in C99 on page 5-245.
• 5.76 Boolean type and <stdbool.h> in C99 on page 5-246.
• 5.77 Extended integer types and functions in <inttypes.h> and <stdint.h> in C99 on page 5-247.
• 5.78 <fenv.h> floating-point environment access in C99 on page 5-248.
• 5.79 <stdio.h> snprintf family of functions in C99 on page 5-249.
• 5.80 <tgmath.h> type-generic math macros in C99 on page 5-250.
• 5.81 <wchar.h> wide character I/O functions in C99 on page 5-251.
• 5.82 How to prevent uninitialized data from being initialized to zero on page 5-252.

5.1 The compiler as an optimizing compiler

The compiler is a highly optimizing compiler. It applies a range of optimization techniques to generate
small, high-performance code.
The compiler performs optimizations common to other optimizing compilers, for example, data-flow
optimizations such as common sub-expression elimination, and loop optimizations such as loop
combining and distribution.
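As a rough illustration of common sub-expression elimination, the two functions below compute the same value. The second shows, written out by hand, the kind of rewrite the compiler can perform on the first; you do not normally need to write it yourself. The function names are hypothetical and chosen only for this sketch.

```c
/* Before: the sub-expression (a + b) appears twice in the source. */
int scaled_sum_naive(int a, int b, int c)
{
    return (a + b) * c + (a + b);
}

/* After: the effect of common sub-expression elimination.
   The shared value is computed once and reused. */
int scaled_sum_cse(int a, int b, int c)
{
    int t = a + b;    /* common sub-expression, evaluated once */
    return t * c + t;
}
```

Both functions return 25 for the arguments 2, 3, and 4, because (2 + 3) * 4 + (2 + 3) = 25.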
In addition, the compiler performs a range of optimizations specific to ARM architecture-based
processors.
Although the compiler performs a number of architecture-independent optimizations, you can often
significantly improve the performance of your C or C++ code by selecting the correct optimization criteria
and the correct target processor and architecture.
Note
Optimization options can limit debug information generated by the compiler.

Related concepts
5.2 Compiler optimization for code size versus speed on page 5-153.
5.3 Compiler optimization levels and the debug view on page 5-154.
5.6 Optimization of loop termination in C code on page 5-159.
5.8 Compiler optimization and the volatile keyword on page 5-163.

Related tasks
5.4 Selecting the target processor at compile time on page 5-157.

Related references
8.15 --autoinline, --no_autoinline on page 8-337.
8.43 --cpu=name on page 8-368.
8.46 --data_reorder, --no_data_reorder on page 8-373.
8.85 --forceinline on page 8-413.
8.87 --fpmode=model on page 8-415.
8.108 --inline, --no_inline on page 8-439.
8.115 --library_interface=lib on page 8-446.
8.116 --library_type=lib on page 8-448.
8.126 --lower_ropi, --no_lower_ropi on page 8-459.
8.127 --lower_rwpi, --no_lower_rwpi on page 8-460.
8.134 --multifile, --no_multifile on page 8-467.
8.139 -Onum on page 8-473.
8.143 -Ospace on page 8-479.
8.144 -Otime on page 8-480.
8.165 --retain=option on page 8-501.

5.2 Compiler optimization for code size versus speed

The compiler can optimize for either code size or performance.
The following options control whether the compiler optimizes for code size or performance:

-Ospace
This option causes the compiler to optimize mainly for code size. This is the default option.
-Otime
This option causes the compiler to optimize mainly for speed.
For best results, you must build your application using the most appropriate command-line option.
Note
These command-line options instruct the compiler to use optimizations that deliver the effect wanted in
the vast majority of cases. However, it is not guaranteed that -Otime always generates faster code, or that
-Ospace always generates smaller code.

Related references
8.143 -Ospace on page 8-479.
8.144 -Otime on page 8-480.

5.3 Compiler optimization levels and the debug view

The precise optimizations performed by the compiler depend both on the level of optimization chosen,
and whether you are optimizing for performance or code size.
The compiler supports the following optimization levels:

0
Minimum optimization. Turns off most optimizations. When debugging is enabled, this option
gives the best possible debug view because the structure of the generated code directly
corresponds to the source code. All optimization that interferes with the debug view is disabled.
In particular:
• Breakpoints can be set on any reachable point, including dead code.
• The value of a variable is available everywhere within its scope, except where it is
uninitialized.
• Backtrace gives the stack of open function activations that is expected from reading the
source.

Note
Although the debug view produced by -O0 corresponds most closely to the source code, users
might prefer the debug view produced by -O1 because this improves the quality of the code
without changing the fundamental structure.

Note
Dead code includes reachable code that has no effect on the result of the program, for example
an assignment to a local variable that is never used. Unreachable code is specifically code that
cannot be reached via any control flow path, for example code that immediately follows a return
statement.

1
Restricted optimization. The compiler only performs optimizations that can be described by
debug information. Removes unused inline functions and unused static functions. Turns off
optimizations that seriously degrade the debug view. If used with --debug, this option gives a
generally satisfactory debug view with good code density.
The differences in the debug view from -O0 are:
• Breakpoints cannot be set on dead code.
• Values of variables might not be available within their scope after they have been initialized,
for example, if their assigned location has been reused.
• Functions with no side-effects might be called out of sequence, or might be omitted if the
result is not needed.
• Backtrace might not give the stack of open function activations that is expected from reading
the source, because of the presence of tail calls.
The optimization level -O1 produces good correspondence between source code and object
code, especially when the source code contains no dead code. The generated code can be
significantly smaller than the code at -O0, which can simplify analysis of the object code.

2
High optimization. If used with --debug, the debug view might be less satisfactory because the
mapping of object code to source code is not always clear. The compiler might perform
optimizations that cannot be described by debug information.
This is the default optimization level.
The differences in the debug view from -O1 are:
• The source code to object code mapping might be many to one, because multiple source code
locations can map to one point in the object code, and because of more aggressive
instruction scheduling.
• Instruction scheduling is allowed to cross sequence points. This can lead to mismatches
between the reported value of a variable at a particular point, and the value you might expect
from reading the source code.
• The compiler automatically inlines functions.
3
Maximum optimization. When debugging is enabled, this option typically gives a poor debug
view. ARM recommends debugging at lower optimization levels.
If you use -O3 and -Otime together, the compiler performs extra optimizations that are more
aggressive, such as:
• High-level scalar optimizations, including loop unrolling. This can give significant
performance benefits at a small code size cost, but at the risk of a longer build time.
• More aggressive inlining and automatic inlining.
These optimizations effectively rewrite the input source code, resulting in object code with the
lowest correspondence to source code and the worst debug view. The
--loop_optimization_level=option option controls the amount of loop optimization performed at
-O3 -Otime. The higher the amount of loop optimization, the worse the correspondence between
source and object code.
Use of the --vectorize option also lowers the correspondence between source and object code.
For extra information about the high-level transformations performed on the source code at
-O3 -Otime, use the --remarks command-line option.
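The distinction between dead code and unreachable code, and the tail calls mentioned for level 1, can all be seen in one small function. This is a sketch with hypothetical names; whether a given line keeps its breakpoint, or whether twice() is inlined rather than tail-called, depends on the optimization level and on the compiler's own decisions.

```c
/* Hypothetical helper, used only to show a tail call. */
static int twice(int x)
{
    return 2 * x;
}

int classify(int x)
{
    int scratch = x * 3;  /* dead code: reachable, but the value is never used,
                             so at -O1 and above it might have no breakpoint */
    if (x < 0)
        return 0;
    return twice(x);      /* tail call: unless twice() is inlined, this can become
                             a plain branch, so classify() might be missing from a
                             backtrace taken inside twice() */
    x = 42;               /* unreachable code: no control-flow path reaches it */
}
```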

Because optimization affects the mapping of object code to source code, the choice of optimization level
with -Ospace and -Otime generally impacts the debug view.
The option -O0 is the best option to use if a simple debug view is required. Selecting -O0 typically
increases the size of the ELF image by 7 to 15%. To reduce the size of your debug tables, use the
--remove_unneeded_entities option.

Related concepts
5.12 Benefits of reducing debug information in objects and libraries on page 5-169.

Related references
5.13 Methods of reducing debug information in objects and libraries on page 5-170.
8.47 --debug, --no_debug on page 8-374.
8.48 --debug_macros, --no_debug_macros on page 8-375.
8.68 --dwarf2 on page 8-396.
8.69 --dwarf3 on page 8-397.
8.139 -Onum on page 8-473.
8.143 -Ospace on page 8-479.
8.144 -Otime on page 8-480.
8.163 --remove_unneeded_entities, --no_remove_unneeded_entities on page 8-499.

Related information
ELF for the ARM Architecture.

5.4 Selecting the target processor at compile time

You can often significantly improve the performance of your C or C++ code by selecting the appropriate
target processor at compile time.
Each new version of the ARM architecture typically supports extra instructions, extra modes of
operation, pipeline differences, and register renaming.

Procedure
1. Decide whether the compiled program is to run on a specific ARM architecture-based processor or on
different ARM processors.
2. Obtain the name, or names, of the target processors recognized by the compiler using the following
compiler command-line option:
--cpu=list

3. If the compiled program is to run on a specific ARM architecture-based processor, having obtained
the name of the processor with the --cpu=list option, select the target processor using the
--cpu=name compiler command-line option.
For example, to compile code to run on a Cortex-A9 processor:
armcc --cpu=Cortex-A9 myprog.c

Alternatively, if the compiled program is to run on different ARM processors, choose the lowest
common denominator architecture appropriate for the application and then specify that architecture in
place of the processor name. For example, to compile code for processors supporting the ARMv6
architecture:
armcc --cpu=6 myprog.c

Selecting the target processor using the --cpu=name command-line option lets the compiler:
• Make full use of all available instructions for that particular processor.
• Perform processor-specific optimizations such as instruction scheduling.
--cpu=list lists all the processors and architectures that the compiler supports.

Related references
8.42 --cpu=list on page 8-367.
8.43 --cpu=name on page 8-368.

5.5 Enabling NEON and FPU for bare-metal

If the compiler knows that an FPU or NEON is available, for example if you use the --cpu option to
specify a processor with an FPU, then the compiler might introduce FPU or NEON instructions into your
code.
These instructions can be introduced even if you are not deliberately performing any floating-point
operations.
If you want to build an image that does not use any FPU or NEON instructions, and does not require that
the FPU and NEON be enabled, you can use the --fpu=none option when building all your source files.
When targeting bare-metal and compiling for a processor with an FPU or NEON, you must enable the
FPU and NEON in your startup code before you can execute FPU or NEON instructions. See the
Technical Reference Manual for your processor.
For example, the following startup code enables NEON and FPU hardware for a Cortex-A8 processor:
__asm void StartHere(void)
{
    MRC p15,0,r0,c1,c0,2      // Read CP Access register
    ORR r0,r0,#0x00f00000     // Enable full access to NEON/VFP (Coprocessors 10 and 11)
    MCR p15,0,r0,c1,c0,2      // Write CP Access register
    ISB
    MOV r0,#0x40000000        // Switch on the VFP and NEON hardware
    MSR FPEXC,r0              // Set EN bit in FPEXC
    IMPORT __main
    B __main                  // Enter normal C run-time environment & library start-up
}

To compile this code:

armcc -c --cpu=Cortex-A8 main.c
armlink --entry=StartHere main.o
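The two constants in the startup code are bit masks. The sketch below derives them, assuming the ARMv7-A layout of the Coprocessor Access Control Register (CPACR), where coprocessor n has a 2-bit access field at bits [2n+1:2n] and 0b11 means full access, and of FPEXC, where EN is bit 30. Check the Technical Reference Manual for your processor. The macro and function names are illustrative only.

```c
#include <stdint.h>

/* CPACR: 2-bit access field for coprocessor n at bits [2n+1:2n].
   The value 0b11 grants full (privileged and user) access. */
#define CPACR_FULL_ACCESS(n)  (UINT32_C(3) << (2u * (n)))

/* FPEXC.EN, bit 30, switches on the VFP and NEON unit. */
#define FPEXC_EN              (UINT32_C(1) << 30)

/* NEON and VFP are accessed through coprocessors 10 and 11. */
uint32_t cpacr_neon_vfp_mask(void)
{
    return CPACR_FULL_ACCESS(10) | CPACR_FULL_ACCESS(11);
}
```

CPACR_FULL_ACCESS(10) is 0x00300000 and CPACR_FULL_ACCESS(11) is 0x00C00000, so their combination is the 0x00f00000 used by the ORR instruction, and FPEXC_EN is the 0x40000000 written to FPEXC.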

Related tasks
5.4 Selecting the target processor at compile time on page 5-157.
3.4 Generating NEON instructions from C or C++ code on page 3-73.

Related references
8.43 --cpu=name on page 8-368.
8.89 --fpu=name on page 8-418.

Related information
--startup=symbol, --no_startup linker option.

5.6 Optimization of loop termination in C code

Loops are a common construct in most programs. Because a significant amount of execution time is
often spent in loops, it is worthwhile paying attention to time-critical loops.
The loop termination condition can cause significant overhead if written without caution. Where
possible:
• Use simple termination conditions.
• Write count-down-to-zero loops.
• Use counters of type unsigned int.
• Test for equality against zero.
Following any or all of these guidelines, separately or in combination, is likely to result in better code.
The following table shows two sample implementations of a routine to calculate n! that together
illustrate loop termination overhead. The first implementation calculates n! using an incrementing loop,
while the second routine calculates n! using a decrementing loop.

Table 5-1 C code for incrementing and decrementing loops

Incrementing loop:

int fact1(int n)
{
    int i, fact = 1;
    for (i = 1; i <= n; i++)
        fact *= i;
    return (fact);
}

Decrementing loop:

int fact2(int n)
{
    unsigned int i, fact = 1;
    for (i = n; i != 0; i--)
        fact *= i;
    return (fact);
}

The following table shows the corresponding disassembly of the machine code produced by the compiler
for each of the sample implementations above, where the C code for both implementations has been
compiled using the options -O2 -Otime.

Table 5-2 C Disassembly for incrementing and decrementing loops

Incrementing loop:

fact1 PROC
    MOV r2, r0
    MOV r0, #1
    CMP r2, #1
    MOV r1, r0
    BXLT lr
|L1.20|
    MUL r0, r1, r0
    ADD r1, r1, #1
    CMP r1, r2
    BLE |L1.20|
    BX lr
    ENDP

Decrementing loop:

fact2 PROC
    MOVS r1, r0
    MOV r0, #1
    BXEQ lr
|L1.12|
    MUL r0, r1, r0
    SUBS r1, r1, #1
    BNE |L1.12|
    BX lr
    ENDP

Comparing the disassemblies shows that the ADD and CMP instruction pair in the incrementing loop
has been replaced with a single SUBS instruction in the decrementing loop. This is possible because
SUBS sets the condition flags as it decrements, so the loop can test the result against zero without a
separate compare.
In addition to saving an instruction in the loop, the decrementing loop does not have to keep n in a
register across the loop. This frees a register and eases register allocation.
The technique is even more important if the original termination condition involves a function call,
because the call is otherwise made on every iteration unless the compiler can prove that its result does
not change. For example:
for (...; i < get_limit(); ...);

The technique of initializing the loop counter to the number of iterations required, and then decrementing
down to zero, also applies to while and do statements.
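The same count-down pattern for while and do statements can be sketched as follows. Here get_limit() is a hypothetical stand-in for the function call in the for loop above; calling it once to initialize the counter avoids re-evaluating it on every iteration. The other function names are also hypothetical.

```c
/* Hypothetical stand-in for the get_limit() call in the text. */
static unsigned int get_limit(void)
{
    return 10u;
}

/* while form: call get_limit() once, then count down to zero. */
unsigned int sum_down_while(void)
{
    unsigned int i = get_limit();   /* evaluated once, not on every iteration */
    unsigned int total = 0;

    while (i != 0)                  /* simple test for equality against zero */
    {
        total += i;
        i--;
    }
    return total;
}

/* do form over an array; the caller must guarantee n > 0. */
int sum_array_down(const int *a, unsigned int n)
{
    int total = 0;

    do
    {
        n--;                        /* decrement first, then use as the index */
        total += a[n];
    } while (n != 0);
    return total;
}
```

With the limit of 10, sum_down_while() returns 1 + 2 + ... + 10 = 55. Note that the do form always executes its body at least once, which is why the caller must not pass n equal to zero.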

Related concepts
5.7 Loop unrolling in C code on page 5-161.

5.7 Loop unrolling in C code

Loops are a common construct in most programs. Because a significant amount of execution time is
often spent in loops, it is worthwhile paying attention to time-critical loops.
Small loops can be unrolled for higher performance, with the disadvantage of increased code size. When
a loop is unrolled, a loop counter needs to be updated less often and fewer branches are executed. If the
loop iterates only a few times, it can be fully unrolled so that the loop overhead completely disappears.
The compiler unrolls loops automatically at -O3 -Otime. Otherwise, any unrolling must be done in
source code.
Note
Manual unrolling of loops might hinder the automatic re-rolling of loops and other loop optimizations by
the compiler.
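Full unrolling of a loop with a small, fixed trip count can be sketched with a four-element dot product written both ways. The function names are hypothetical, and at -O3 -Otime the compiler might perform this transformation on the rolled version itself.

```c
/* Rolled: a counter update, a compare, and a branch on every iteration. */
int dot4_rolled(const int *a, const int *b)
{
    int i, sum = 0;
    for (i = 0; i < 4; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Fully unrolled: the trip count is fixed at four, so the loop
   overhead disappears completely. */
int dot4_unrolled(const int *a, const int *b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}
```

For a = {1, 2, 3, 4} and b = {5, 6, 7, 8}, both return 5 + 12 + 21 + 32 = 70.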

The advantages and disadvantages of loop unrolling can be illustrated using the two sample routines
shown in the following table. Both routines count the set bits in n by testing the low bits of n, counting
those that are set, and then shifting the tested bits out.
The first implementation uses a loop to count bits. The second routine is the first implementation
unrolled four times, with an additional optimization that combines the four shifts of n into one shift.
Unrolling frequently provides new opportunities for optimization.

Table 5-3 C code for rolled and unrolled bit-counting loops

Bit-counting loop:

int countbit1(unsigned int n)
{
    int bits = 0;
    while (n != 0)
    {
        if (n & 1) bits++;
        n >>= 1;
    }
    return bits;
}

Unrolled bit-counting loop:

int countbit2(unsigned int n)
{
    int bits = 0;
    while (n != 0)
    {
        if (n & 1) bits++;
        if (n & 2) bits++;
        if (n & 4) bits++;
        if (n & 8) bits++;
        n >>= 4;
    }
    return bits;
}

The following table shows the corresponding disassembly of the machine code produced by the compiler
for each of the sample implementations above, where the C code for each implementation has been
compiled using the option -O2.

Table 5-4 Disassembly for rolled and unrolled bit-counting loops

Bit-counting loop:

countbit1 PROC
    MOV r1, #0
    B |L1.20|
|L1.8|
    TST r0, #1
    ADDNE r1, r1, #1
    LSR r0, r0, #1
|L1.20|
    CMP r0, #0
    BNE |L1.8|
    MOV r0, r1
    BX lr
    ENDP

Unrolled bit-counting loop:

countbit2 PROC
    MOV r1, r0
    MOV r0, #0
    B |L1.48|
|L1.12|
    TST r1, #1
    ADDNE r0, r0, #1
    TST r1, #2
    ADDNE r0, r0, #1
    TST r1, #4
    ADDNE r0, r0, #1
    TST r1, #8
    ADDNE r0, r0, #1
    LSR r1, r1, #4
|L1.48|
    CMP r1, #0
    BNE |L1.12|
    BX lr
    ENDP

On an ARM9 processor, the bit-counting loop in countbit1 takes six cycles to check each bit, but its
code size is only nine instructions. The unrolled version in countbit2 checks four bits per loop
iteration, taking on average only three cycles per bit. However, the cost is a larger code size of
fifteen instructions.
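Because unrolling by hand rewrites the loop body, a quick host-side check that the rolled and unrolled versions agree is cheap insurance. A minimal sketch, reproducing countbit1() and countbit2() from Table 5-3 and adding a hypothetical helper, countbits_agree(), that compares them over a range of inputs:

```c
int countbit1(unsigned int n)
{
    int bits = 0;
    while (n != 0)
    {
        if (n & 1) bits++;
        n >>= 1;
    }
    return bits;
}

int countbit2(unsigned int n)
{
    int bits = 0;
    while (n != 0)
    {
        if (n & 1) bits++;
        if (n & 2) bits++;
        if (n & 4) bits++;
        if (n & 8) bits++;
        n >>= 4;
    }
    return bits;
}

/* Returns 1 if both versions agree for every n in [0, limit], else 0. */
int countbits_agree(unsigned int limit)
{
    unsigned int n = 0;
    for (;;)
    {
        if (countbit1(n) != countbit2(n))
            return 0;
        if (n == limit)
            return 1;       /* stop before n++ could wrap around */
        n++;
    }
}
```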

Related concepts
5.6 Optimization of loop termination in C code on page 5-159.

5.8 Compiler optimization and the volatile keyword

Higher optimization levels can reveal problems in some programs that are not apparent at lower
optimization levels, for example, missing volatile qualifiers.
This can manifest itself in a number of ways. Code might become stuck in a loop while polling hardware,
multi-threaded code might exhibit strange behavior, or optimization might result in the removal of code
that implements deliberate timing delays. In such cases, it is possible that some variables are required to
be declared as volatile.
The declaration of a variable as volatile tells the compiler that the variable can be modified at any time
externally to the implementation, for example, by the operating system, by another thread of execution
such as an interrupt routine or signal handler, or by hardware. Because the value of a volatile-qualified
variable can change at any time, the actual variable in memory must always be accessed whenever the
variable is referenced in code. This means the compiler cannot perform optimizations on the variable, for
example, caching its value in a register to avoid memory accesses. Similarly, when used in the context of
implementing a sleep or timer delay, declaring a variable as volatile tells the compiler that a specific
type of behavior is intended, and that such code must not be optimized in such a way that it removes the
intended functionality.
In contrast, when a variable is not declared as volatile, the compiler can assume its value cannot be
modified in unexpected ways. Therefore, the compiler can perform optimizations on the variable.
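The timing-delay case can be sketched as a busy-wait loop whose counter is declared volatile. Without the qualifier, an optimizing compiler might remove the loop entirely, because it has no other observable effect. The function name is hypothetical, and the delay it produces depends on the processor and memory system; a hardware timer is usually a more reliable way to measure time.

```c
/* Busy-wait delay. Every read and write of count is a real memory access,
   so the compiler must keep all the iterations. Returns the final count. */
unsigned int delay_loop(unsigned int iterations)
{
    volatile unsigned int count;

    for (count = 0; count < iterations; count++)
    {
        /* intentionally empty */
    }
    return count;
}
```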
The use of the volatile keyword is illustrated in the two sample routines of the following table. Both of
these routines loop, counting iterations, until a status flag buffer_full is set to true. The state of
buffer_full can change asynchronously with program flow.

The two versions of the routine differ only in the way that buffer_full is declared. The first routine
version is incorrect. Notice that the variable buffer_full is not qualified as volatile in this version. In
contrast, the second version of the routine shows the same loop where buffer_full is correctly qualified
as volatile.

Table 5-5 C code for nonvolatile and volatile buffer loops

Nonvolatile version of buffer loop:

int buffer_full;

int read_stream(void)
{
    int count = 0;
    while (!buffer_full)
    {
        count++;
    }
    return count;
}

Volatile version of buffer loop:

volatile int buffer_full;

int read_stream(void)
{
    int count = 0;
    while (!buffer_full)
    {
        count++;
    }
    return count;
}

The following table shows the corresponding disassembly of the machine code produced by the compiler
for each of the examples above, where the C code for each implementation has been compiled using the
option -O2.

Table 5-6 Disassembly for nonvolatile and volatile buffer loop

Nonvolatile version of buffer loop:

read_stream PROC
    LDR r1, |L1.28|
    MOV r0, #0
    LDR r1, [r1, #0]
|L1.12|
    CMP r1, #0
    ADDEQ r0, r0, #1
    BEQ |L1.12|            ; infinite loop
    BX lr
    ENDP
|L1.28|
    DCD ||.data||
    AREA ||.data||, DATA, ALIGN=2
buffer_full
    DCD 0x00000000

Volatile version of buffer loop:

read_stream PROC
    LDR r1, |L1.28|
    MOV r0, #0
|L1.8|
    LDR r2, [r1, #0]       ; buffer_full
    CMP r2, #0
    ADDEQ r0, r0, #1
    BEQ |L1.8|
    BX lr
    ENDP
|L1.28|
    DCD ||.data||
    AREA ||.data||, DATA, ALIGN=2
buffer_full
    DCD 0x00000000

In the disassembly of the nonvolatile version of the buffer loop in the above table, the statement
LDR r1, [r1, #0] loads the value of buffer_full into register r1 outside the loop labeled |L1.12|. Because
buffer_full is not declared as volatile, the compiler assumes that its value cannot be modified
outside the program. Having already read the value of buffer_full into r1, the compiler omits
reloading the variable inside the loop when optimizations are enabled, because it assumes the value
cannot change. If buffer_full is zero on entry, the result is the infinite loop labeled |L1.12|.
In contrast, in the disassembly of the volatile version of the buffer loop, the compiler assumes the value
of buffer_full can change outside the program and does not optimize accesses to it. Consequently, the
value of buffer_full is loaded into register r2 inside the loop labeled |L1.8|. As a result, the loop |L1.8| is
implemented correctly in assembly code.
To avoid optimization problems caused by changes to program state external to the implementation, you
must declare variables as volatile whenever their values can change unexpectedly in ways unknown to
the implementation.
In practice, you must declare a variable as volatile whenever you are:
• Accessing memory-mapped peripherals.
• Sharing global variables between multiple threads.
• Accessing global variables in an interrupt routine or signal handler.
The compiler does not optimize away accesses to variables that you have declared as volatile.
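The signal-handler case in the list above can be sketched on a hosted system as follows. The flag is written by the handler and polled by the loop, so it is declared volatile, using the standard sig_atomic_t type that C guarantees can be accessed as an atomic entity even in the presence of asynchronous interrupts. raise() stands in for a truly asynchronous event such as an interrupt; the function run_until_signal() and its trigger_at parameter are hypothetical. This sketch covers the signal-handler case only.

```c
#include <signal.h>

/* Written by the handler, read by the polling loop. volatile forces a
   fresh load of the flag on every test of the loop condition. */
static volatile sig_atomic_t stop_requested = 0;

static void on_sigint(int sig)
{
    (void)sig;
    stop_requested = 1;
}

/* Polls the flag until the handler sets it. raise() simulates the external
   event after trigger_at iterations; trigger_at must be greater than 0. */
long run_until_signal(long trigger_at)
{
    long iterations = 0;

    stop_requested = 0;
    signal(SIGINT, on_sigint);
    while (!stop_requested)
    {
        iterations++;
        if (iterations == trigger_at)
            raise(SIGINT);          /* handler runs before raise() returns */
    }
    signal(SIGINT, SIG_DFL);        /* restore the default action */
    return iterations;
}
```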

5.9 Code metrics

Code metrics provide a means of objectively evaluating code quality. The compiler and linker provide
several facilities for generating simple code metrics and improving code quality.
In particular, you can:
• Measure code and data sizes.
• Generate dynamic callgraphs.
• Measure stack use.

Related concepts
5.10 Code metrics for measurement of code size and data size on page 5-166.
5.11 Stack use in C and C++ on page 5-167.

Related information
--info=topic[,topic,...] fromelf option.
--info=topic[,topic,...] linker option.
--map, --no_map linker option.
--callgraph, --no_callgraph linker option.

5.10 Code metrics for measurement of code size and data size
The compiler, linker, and fromelf image converter let you measure code and data size.
Use the following command-line options:
• --info=sizes (armlink and fromelf).
• --info=totals (armcc, armlink, and fromelf).
• --map (armlink).

Related references
8.107 --info=totals on page 8-438.

Related information
--info=topic[,topic,...] fromelf option.
--info=topic[,topic,...] linker option.
--map, --no_map linker option.

5.11 Stack use in C and C++

C and C++ both use the stack intensively.
For example, the stack holds:
• The return address of functions.
• Registers that must be preserved, as determined by the ARM Architecture Procedure Call Standard
(AAPCS), for instance, when register contents are saved on entry into subroutines.
• Local variables, including local arrays, structures, unions, and in C++, classes.
Some stack usage is not obvious, such as:
• Local integer or floating point variables are allocated stack memory if they are spilled (that is, not
allocated to a register).
• Structures are normally allocated to the stack. A space equivalent to sizeof(struct) padded to a
multiple of four bytes is reserved on the stack. Where possible, the compiler tries to allocate structures
to registers instead.
• If the size of an array is known at compile time, the compiler allocates memory for it on the stack.
Again, a space equivalent to sizeof(array) padded to a multiple of four bytes is reserved on the
stack.
Note
Memory for variable length arrays is allocated at runtime, on the heap.

• Several optimizations can introduce new temporary variables to hold intermediate results. These
optimizations include Common Subexpression Elimination (CSE), live range splitting, and structure
splitting. The compiler tries to allocate these temporary variables to registers. If it cannot, it spills them
to the stack.
• Generally, code compiled for processors that support only 16-bit encoded Thumb instructions makes
more use of the stack than ARM code and code compiled for processors that support 32-bit encoded
Thumb instructions. This is because 16-bit encoded Thumb instructions have only eight registers
available for allocation, compared to fourteen for ARM code and 32-bit encoded Thumb instructions.
• The AAPCS requires that some function arguments are passed through the stack instead of the
registers, depending on their type, size, and order.
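As a rough illustration of the four-byte rounding rule for structures and fixed-size arrays, the following C sketch computes the stack reservation for a given size. The helper stack_reservation() and the struct rgb type are hypothetical, and the sketch ignores the cases where the compiler keeps a structure in registers instead.

```c
#include <stddef.h>

/* Hypothetical helper: round a size up to the next multiple of four bytes,
   matching the stack reservation rule for structures and fixed-size arrays. */
size_t stack_reservation(size_t size)
{
    return (size + 3u) & ~(size_t)3u;
}

/* Hypothetical three-byte structure (no padding is needed between chars). */
struct rgb { unsigned char r, g, b; };

size_t rgb_reservation(void)
{
    return stack_reservation(sizeof(struct rgb));   /* 3 rounds up to 4 */
}

size_t char_array_reservation(void)
{
    return stack_reservation(sizeof(char[10]));     /* 10 rounds up to 12 */
}
```

For example, a three-byte struct rgb reserves four bytes, and a char[10] array reserves 12 bytes.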

Methods of estimating stack usage

Stack use is difficult to estimate because it is code dependent, and can vary between runs depending on
the code path that the program takes on execution. However, it is possible to manually estimate the
extent of stack utilization using the following methods:
• Link with --callgraph to produce a static callgraph. This shows information on all functions,
including stack use.
• Link with --info=stack or --info=summarystack to list the stack usage of all global symbols.
• Use the debugger to set a watchpoint on the last available location in the stack and see if the
watchpoint is ever hit.
Note
Running your program under a debug monitor like a Real-Time System Model (RTSM), in DS-5
Debugger or RealView Debugger, has a severe performance penalty, because the watched address is
checked for every instruction. Using DSTREAM or RealView ICE and RealView Trace has no such
penalty.

• Use the debugger, and:

1. Allocate space in memory for the stack that is much larger than you expect to require.
2. Fill the stack space with copies of a known value, for example, 0xDEADDEAD.
3. Run your application, or a fixed portion of it. Aim to use as much of the stack space as possible in
the test run. For example, try to execute the most deeply nested function calls and the worst case
path found by the static analysis. Where appropriate, try to generate interrupts so that their stack use is
included in the measurement.

4. After your application has finished executing, examine the stack region in memory to see how
many of the known values have been overwritten. The used part of the region contains garbage, and
the remainder still contains the known value.
5. Count the number of garbage values and multiply by four to give their size in bytes.
The result of the calculation shows how far the stack has grown, in bytes.
• Use an RTSM with a memory map file that defines a region directly below your stack where access is
not allowed. If the stack overflows into this forbidden region, a data abort occurs, which the debugger
can trap.
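The stack-painting steps above (fill with a known value, run, then count the overwritten words) can be sketched in C on an ordinary array. This is a simulation under stated assumptions: it assumes a full descending stack, so untouched words remain at the lowest addresses of the region, and the names stack_paint(), stack_bytes_used(), and stack_paint_demo() are hypothetical. On a real target the region would be the stack memory itself, typically inspected from the debugger.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL_PATTERN 0xDEADDEADu   /* the known value from step 2 */

/* Step 2: fill the stack region with copies of the known value. */
void stack_paint(uint32_t *region, size_t words)
{
    for (size_t i = 0; i < words; i++)
        region[i] = STACK_FILL_PATTERN;
}

/* Steps 4 and 5: with a full descending stack the used words are at the
   top (highest addresses), so count the untouched words from the bottom,
   then multiply the remaining (overwritten) words by four. */
size_t stack_bytes_used(const uint32_t *region, size_t words)
{
    size_t untouched = 0;
    while (untouched < words && region[untouched] == STACK_FILL_PATTERN)
        untouched++;
    return (words - untouched) * sizeof(uint32_t);
}

/* Simulation of step 3 (hypothetical): pretend the program pushed five
   words at the top of a 16-word region. */
size_t stack_paint_demo(void)
{
    uint32_t region[16];
    stack_paint(region, 16);
    for (size_t i = 11; i < 16; i++)
        region[i] = (uint32_t)i;          /* "garbage" left by the program */
    return stack_bytes_used(region, 16);  /* 5 words * 4 bytes = 20 */
}
```

Scanning up from the lowest address to the first overwritten word gives the high-water mark, which is the figure of interest. It also counts any used word that happens to hold the fill value, which is one reason to choose a distinctive value such as 0xDEADDEAD.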

Methods of reducing stack usage

In general, you can lower the stack requirements of your program by:
• Writing small functions that only require a small number of variables.
• Avoiding the use of large local structures or arrays.
• Avoiding recursion, for example, by using an alternative algorithm.
• Minimizing the number of variables that are in use at any given time at each point in a function.
• Using C block scope to declare variables only where they are required. This minimizes stack use by
allowing the memory required by distinct scopes to overlap.
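Two of these techniques, replacing recursion with an alternative algorithm and using block scope, are sketched below in C. The functions are hypothetical examples, and whether the two block-scoped arrays actually share stack memory is a decision left to the compiler.

```c
/* Recursive version: each level of recursion needs its own stack frame, so
   worst-case stack use grows with n (unless the compiler happens to
   transform the recursion into a loop). */
unsigned long sum_to_recursive(unsigned n)
{
    return (n == 0u) ? 0ul : n + sum_to_recursive(n - 1u);
}

/* Alternative algorithm: a loop, so stack use does not depend on n. */
unsigned long sum_to_iterative(unsigned n)
{
    unsigned long total = 0ul;
    while (n > 0u)
        total += n--;
    return total;
}

/* Block scope: 'squares' and 'cubes' are never live at the same time, so
   the compiler is free to place them in the same stack memory. */
long sum_of_powers(int n, int use_cubes)
{
    long result = 0;
    if (n > 32)
        n = 32;
    if (!use_cubes) {
        long squares[32];
        for (int i = 0; i < n; i++)
            squares[i] = (long)i * i;
        for (int i = 0; i < n; i++)
            result += squares[i];
    } else {
        long cubes[32];
        for (int i = 0; i < n; i++)
            cubes[i] = (long)i * i * i;
        for (int i = 0; i < n; i++)
            result += cubes[i];
    }
    return result;
}
```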
Note
Locating the stack in fast (zero wait-state), on-chip, 32-bit RAM gives the best code performance. The
ARM (LDMFD and STMFD) and Thumb (PUSH and POP) stack access instructions both push and pop a
number of 32-bit registers on or off the stack. If the stack is in 32-bit memory, each register access takes
one cycle. However, if the stack is in 16-bit memory, each register access takes two cycles, reducing
overall performance.

Related information
Getting Started with DS-5, ARM DS-5 Product Overview, About Fixed Virtual Platform (FVP).
ARM DS-5 Using the Debugger.
ARM DS-5 EB FVP Reference Guide.
Fixed Virtual Platforms VE and MPS FVP Reference Guide.
Procedure Call Standard for the ARM Architecture.
--info=topic[,topic,...] fromelf option.
--info=topic[,topic,...] linker option.
--callgraph, --no_callgraph linker option.

5.12 Benefits of reducing debug information in objects and libraries

Reducing the amount of debug information in objects and libraries has a number of code size and
performance benefits.
Reducing the level of debug information:
• Reduces the size of objects and libraries, thereby reducing the amount of disk space required to store
them.
• Speeds up link time. In the compilation cycle, most of the link time is consumed by reading in all the
debug sections and eliminating the duplicates.
• Minimizes the size of the final image. This facilitates the fast loading and processing of debug
symbols by a debugger.

Related concepts
5.3 Compiler optimization levels and the debug view on page 5-154.

Related references
5.13 Methods of reducing debug information in objects and libraries on page 5-170.

5.13 Methods of reducing debug information in objects and libraries

There are a number of ways to reduce the amount of debug information being generated per source file.
For example, you can:
• Avoid conditional use of #define in header files. Conditional definitions can make it more difficult for
the linker to eliminate duplicate information.
• Modify your C or C++ source files so that header files are #included in the same order.
• Partition header information into smaller blocks. That is, use a larger number of smaller header files
rather than a smaller number of larger header files. This helps the linker to eliminate more of the
common blocks.
• Only include a header file in a C or C++ source file if it is really required.
• Guard against the multiple inclusion of header files. Place multiple-inclusion guards inside the header
file, rather than around the #include statement. For example, if you have a header file foo.h, add:
#ifndef foo_h
#define foo_h
...
// rest of header file as before
...
#endif /* foo_h */

You can use the compiler option --remarks to warn about unguarded header files.
• Compile your code with the --no_debug_macros command-line option to discard preprocessor
macro definitions from debug tables.
• Consider using (or not using) --remove_unneeded_entities.
Caution
Although --remove_unneeded_entities can help to reduce the amount of debug information
generated per file, it has the disadvantage of reducing the number of debug sections that are common
to many files. This reduces the number of common debug sections that the linker is able to remove at
final link time, and can result in a final debug image that is larger than necessary. For this reason, use
--remove_unneeded_entities only when necessary.

Related concepts
2.18.1 Compilation build time on page 2-62.
5.12 Benefits of reducing debug information in objects and libraries on page 5-169.
5.3 Compiler optimization levels and the debug view on page 5-154.

Related tasks
2.18.2 Minimizing compilation build time on page 2-63.

Related references
8.48 --debug_macros, --no_debug_macros on page 8-375.
8.162 --remarks on page 8-498.
8.163 --remove_unneeded_entities, --no_remove_unneeded_entities on page 8-499.

5.14 Guarding against multiple inclusion of header files

Guarding against multiple inclusion of header files has a number of benefits.
Specifically, guarding against multiple inclusion of header files:
• Improves compilation time.
• Reduces the size of object files generated using the -g compiler command-line option, which can
speed up link time.
• Avoids compilation errors that arise from including the same code multiple times.
For example:
/* foo.h */
#ifndef FOO_H
#define FOO_H 1
...
#endif
/* bar.c */
#ifndef FOO_H
#include "foo.h"
#endif

Related references
8.91 -g on page 8-422.

5.15 Methods of minimizing function parameter passing overhead

There are a number of ways in which you can minimize the overhead of passing parameters to functions.
For example:
• Ensure that functions take four or fewer arguments if each argument is a word or less in size. In C++,
ensure that nonstatic member functions take three or fewer arguments because of the implicit this
pointer argument that is usually passed in R0.
• Ensure that a function does a significant amount of work if it requires more than four arguments, so
that the cost of passing the stacked arguments is outweighed.
• Put related arguments in a structure, and pass a pointer to the structure in any function call. This
reduces the number of parameters and increases readability.
• Minimize the number of long long parameters, because these take two argument words that have to
be aligned on an even register index.
• Minimize the number of double parameters when using software floating-point.
• Avoid functions with a variable number of parameters. Functions taking a variable number of
arguments effectively pass all their arguments on the stack.
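A minimal C sketch of grouping related arguments into a structure, assuming AAPCS-style passing in which the first four word-sized arguments travel in R0-R3. The function and structure names are hypothetical.

```c
/* Six word-sized parameters: only the first four fit in R0-R3, so the
   remaining two are passed on the stack. */
int blend_args(int r, int g, int b, int a, int weight, int shift)
{
    return ((r + g + b + a) * weight) >> shift;
}

/* The same data grouped into a structure. */
struct blend_params {
    int r, g, b, a;
    int weight;
    int shift;
};

/* Only one argument, a pointer, is passed (in R0). */
int blend_struct(const struct blend_params *p)
{
    return ((p->r + p->g + p->b + p->a) * p->weight) >> p->shift;
}
```

The trade-off is that blend_struct() loads its fields from memory, so this approach tends to pay off most when the structure already exists in the caller, or when the same parameter block is passed to several functions.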

Related concepts
5.16 Returning structures from functions through registers on page 5-173.

5.16 Returning structures from functions through registers

The compiler allows functions to return structures containing multiple values through the registers, rather
than the stack.
In C and C++, one way of returning multiple values from a function is to use a structure. Normally,
structures are returned on the stack, with all the associated expense this entails.
To reduce memory traffic and reduce code size, the compiler enables functions to return multiple values
through the registers. A function can return up to four words in a struct by qualifying the function with
__value_in_regs. For example:

typedef struct s_coord { int x; int y; } coord;

coord reflect(int x1, int y1) __value_in_regs;

You can use __value_in_regs wherever multiple values have to be returned from a function.
Examples include:
• Returning multiple values from C and C++ functions.
• Returning multiple values from embedded assembly language functions.
• Making supervisor calls.
• Re-implementing __user_initial_stackheap.
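The following sketch expands the reflect() declaration into a complete function. It assumes that the __value_in_regs keyword is available only in armcc 5 (detected here with the __ARMCC_VERSION predefined macro), so the VALUE_IN_REGS macro expands to nothing on other compilers, and the structure is then returned using the normal calling convention. The body, which reflects a point in the line y = x, is hypothetical, and placing the keyword after the parameter list of the definition follows the pattern of the __pure definition in Table 5-7.

```c
/* Assumption: only armcc 5 (version below 6000000) supports the keyword. */
#if defined(__ARMCC_VERSION) && (__ARMCC_VERSION < 6000000)
#define VALUE_IN_REGS __value_in_regs
#else
#define VALUE_IN_REGS
#endif

typedef struct s_coord { int x; int y; } coord;

/* Declaration in the postfix form used above. */
coord reflect(int x1, int y1) VALUE_IN_REGS;

/* Hypothetical body: reflect the point in the line y = x. With armcc 5,
   x and y are returned in registers rather than through memory. */
coord reflect(int x1, int y1) VALUE_IN_REGS
{
    coord result;
    result.x = y1;
    result.y = x1;
    return result;
}
```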

Related concepts
5.15 Methods of minimizing function parameter passing overhead on page 5-172.

Related references
10.19 __value_in_regs on page 10-620.

5.17 Functions that return the same result when called with the same arguments
A function that always returns the same result when called with the same arguments, and does not
change any global data, is referred to as a pure function.
By definition, it is sufficient to evaluate any particular call to a pure function only once. Because the
result of a call to the function is guaranteed to be the same for any identical call, each subsequent call to
the function in code can be replaced with the result of the original call.
Using the keyword __pure when declaring a function indicates that the function is a pure function.
By definition, pure functions cannot have side effects. For example, a pure function cannot read or write
global state by using global variables or indirecting through pointers, because accessing global state can
violate the rule that the function must return the same value each time when called twice with the same
parameters. Therefore, you must use __pure carefully in your programs. Where functions can be
declared __pure, however, the compiler can often perform powerful optimizations, such as Common
Subexpression Eliminations (CSEs).
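The following C sketch contrasts a function that can safely be declared pure with one that cannot. The PURE_FN macro is an assumption made for portability: it expands to __pure under armcc 5 and, on GCC-compatible compilers, to __attribute__((const)), which makes a similar promise. The function names and the global variable gain are hypothetical.

```c
#if defined(__ARMCC_VERSION) && (__ARMCC_VERSION < 6000000)
#define PURE_FN __pure                    /* armcc 5 keyword */
#elif defined(__GNUC__)
#define PURE_FN __attribute__((const))    /* assumed GCC/Clang equivalent */
#else
#define PURE_FN                           /* no hint: correct, just not optimized */
#endif

/* Pure: the result depends only on x, and no global state is read or written.
   The prefix position is used because GCC and Clang do not accept attributes
   after the parameter list of a function definition. */
PURE_FN int square(int x)
{
    return x * x;
}

/* Because square() is pure, the compiler may evaluate square(v) once and
   reuse the result for the second call (CSE). */
int twice_square(int v)
{
    return square(v) + square(v);
}

/* Not pure: the result also depends on the global 'gain', which can change
   between two calls with the same argument, so it must not be marked __pure. */
int gain = 3;

int amplify(int x)
{
    return x * gain;
}
```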

Related references
10.33 __attribute__((const)) function attribute on page 10-638.
10.48 __attribute__((pure)) function attribute on page 10-653.
5.18 Comparison of pure and impure functions on page 5-175.
10.13 __pure on page 10-614.

5.18 Comparison of pure and impure functions

The two sample routines in the following table illustrate the use of the __pure keyword.
Both routines call a function fact() to calculate the sum of n! and n!. fact() depends only on its input
argument n to compute n!. Therefore, fact() is a pure function.
The first routine shows a naive implementation of the function fact(), where fact() is not declared
__pure. In the second implementation, fact() is qualified as __pure to indicate to the compiler that it is
a pure function.

Table 5-7 C code for pure and impure functions

A pure function not declared pure:

int fact(int n)
{
    int f = 1;
    while (n > 0)
        f *= n--;
    return f;
}
int foo(int n)
{
    return fact(n)+fact(n);
}

A pure function declared pure:

int fact(int n) __pure
{
    int f = 1;
    while (n > 0)
        f *= n--;
    return f;
}
int foo(int n)
{
    return fact(n)+fact(n);
}

Table 5-8 Disassembly for pure and impure functions

A pure function not declared pure:

fact PROC
    ...
foo PROC
    MOV r3, r0
    PUSH {lr}
    BL fact
    MOV r2, r0
    MOV r0, r3
    BL fact
    ADD r0, r0, r2
    POP {pc}
    ENDP

A pure function declared pure:

fact PROC
    ...
foo PROC
    PUSH {lr}
    BL fact
    LSL r0,r0,#1
    POP {pc}
    ENDP

In the disassembly where fact() is not qualified as __pure, fact() is called twice because the compiler
does not know that the function is a candidate for Common Subexpression Elimination (CSE). In
contrast, in the disassembly where fact() is qualified as __pure, fact() is called only once, instead of
twice, because the compiler has been able to perform CSE when adding fact(n) + fact(n).

Related concepts
5.17 Functions that return the same result when called with the same arguments on page 5-174.

Related references
10.13 __pure on page 10-614.

5.19 Recommendation of postfix syntax when qualifying functions with ARM function modifiers
You can use function modifiers such as __pure either as a prefix, before the function declaration, or as a
postfix, after the parameter list. ARM recommends using the more precise postfix syntax.
Many ARM keyword extensions modify the behavior or calling sequence of a function. For example,
__pure, __irq, __svc, __svc_indirect, __softfp, and __value_in_regs all behave in this way.

These function modifiers all have a common syntax. A function modifier such as __pure can qualify a
function declaration either:
• Before the function declaration. For example:
__pure int foo(int);
• After the closing parenthesis on the parameter list. For example:
int foo(int) __pure;

For simple function declarations, each syntax is unambiguous. However, for a function whose return type
or arguments are function pointers, the prefix syntax is imprecise. For example, the following function
returns a function pointer, but it is not clear whether __pure modifies the function itself or its returned
pointer type:
__pure int (*foo(int)) (int); /* declares 'foo' as a (pure?) function
that returns a pointer to a (pure?)
function.
It is ambiguous which of the two
function types is pure. */

In fact, the single __pure keyword at the front of the declaration of foo modifies both foo itself and the
function pointer type returned by foo.
In contrast, the postfix syntax enables clear distinction between whether __pure applies to the argument,
the return type, or the base function, when declaring a function whose argument and return types are
function pointers. For example:
int (*foo1(int) __pure) (int); /* foo1 is a pure function
returning a pointer to
a normal function */
int (*foo2(int)) (int) __pure; /* foo2 is a function
returning a pointer to
a pure function */
int (*foo3(int) __pure) (int) __pure; /* foo3 is a pure function
returning a pointer to
a pure function */

In this example:
• foo1 and foo3 are modified themselves.
• foo2 and foo3 return a pointer to a modified function.
• The functions foo3 and foo are identical.
Because the postfix syntax is more precise than the prefix syntax, ARM recommends that, where
possible, you make use of the postfix syntax when qualifying functions with ARM function modifiers.

Related references
10.11 __irq on page 10-611.
10.13 __pure on page 10-614.
10.15 __softfp on page 10-616.
10.16 __svc on page 10-617.
10.17 __svc_indirect on page 10-618.
10.19 __value_in_regs on page 10-620.

5.20 Inline functions

Inline functions offer a trade-off between code size and performance. By default, the compiler decides
for itself whether to inline code or not.
As a general rule, when compiling with -Ospace, the compiler makes sensible decisions about inlining
with a view to producing code of minimal size. This is because code size for embedded systems is of
fundamental importance. When compiling with -Otime, the compiler inlines in most cases, but still
avoids large code growth. When compiling for NEON, calls to non-inlined functions from within a loop inhibit
vectorization; for the loop to be vectorized, you must explicitly indicate that these functions are to be inlined.
In most circumstances, the decision to inline a particular function is best left to the compiler. However,
you can give the compiler a hint that a function is required to be inlined by using the appropriate inline
keyword.
Functions that are qualified with the __inline, inline, or __forceinline keywords are called inline
functions. In C++, member functions that are defined inside a class, struct, or union, are also inline
functions.
The compiler also offers a range of other facilities for modifying its behavior with respect to inlining.
There are several factors you must take into account when deciding whether to use these facilities, or
more generally, whether to inline a function at all.
The linker is able to apply some degree of function inlining to functions that are very short.
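The following C sketch shows the difference between an inlining hint and a stronger request. The HINT_INLINE and FORCE_INLINE macros are assumptions: they map to __inline and __forceinline under armcc 5, and to inline and inline __attribute__((always_inline)) on GCC-compatible compilers. The function names are hypothetical, and in every case the compiler still makes the final decision.

```c
#if defined(__ARMCC_VERSION) && (__ARMCC_VERSION < 6000000)
#define HINT_INLINE  __inline
#define FORCE_INLINE __forceinline
#elif defined(__GNUC__)
#define HINT_INLINE  inline
#define FORCE_INLINE inline __attribute__((always_inline))
#else
#define HINT_INLINE  inline
#define FORCE_INLINE inline
#endif

/* A hint: the compiler inlines this only if it judges it practical. */
static HINT_INLINE int clamp_u8(int v)
{
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}

/* A stronger request: inlined whenever it is possible to do so. */
static FORCE_INLINE int add_sat_u8(int a, int b)
{
    return clamp_u8(a + b);
}

/* Small helpers called from inside a loop are typical inlining candidates. */
void brighten(unsigned char *pixels, int count, int delta)
{
    for (int i = 0; i < count; i++)
        pixels[i] = (unsigned char)add_sat_u8(pixels[i], delta);
}
```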

Related concepts
5.21 Compiler decisions on function inlining on page 5-178.
5.22 Automatic function inlining and static functions on page 5-179.
5.23 Inline functions and removal of unused out-of-line functions at link time on page 5-180.
5.24 Automatic function inlining and multifile compilation on page 5-181.
5.26 Compiler modes and inline functions on page 5-183.
5.27 Inline functions in C++ and C90 mode on page 5-184.
5.28 Inline functions in C99 mode on page 5-185.
5.29 Inline functions and debugging on page 5-187.

Related references
5.25 Restriction on overriding compiler decisions about function inlining on page 5-182.
8.15 --autoinline, --no_autoinline on page 8-337.
8.85 --forceinline on page 8-413.
10.6 __forceinline on page 10-605.
10.8 __inline on page 10-608.
8.108 --inline, --no_inline on page 8-439.

Related information
--inline, --no_inline linker option.

5.21 Compiler decisions on function inlining

When function inlining is enabled, the compiler uses a complex decision tree to decide if a function is to
be inlined.
The following simplified algorithm is used:
1. If the function is qualified with __forceinline, the function is inlined if it is possible to do so.
2. If the function is qualified with __inline and the option --forceinline is selected, the function is
inlined if it is possible to do so.
If the function is qualified with __inline and the option --forceinline is not selected, the function
is inlined if it is practical to do so.
3. If the optimization level is -O2 or higher, or --autoinline is specified, the compiler automatically
inlines functions if it is practical to do so, even if you do not explicitly give a hint that function
inlining is wanted.
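The three steps can be expressed as a small decision function. This is only a toy model of the simplified algorithm described above, not the compiler's implementation: all names are hypothetical, and the outcomes INLINE_IF_PRACTICAL and INLINE_IF_POSSIBLE stand in for the heuristics listed below.

```c
/* Toy model only: "practical" and "possible" are left to the compiler's
   own heuristics, which this sketch does not attempt to reproduce. */
enum fn_qualifier { QUAL_NONE, QUAL_INLINE, QUAL_FORCEINLINE };

enum inline_decision {
    NOT_CONSIDERED,        /* no inlining requested or enabled */
    INLINE_IF_PRACTICAL,   /* inlined only if the heuristics agree */
    INLINE_IF_POSSIBLE     /* inlined if it is possible to do so */
};

struct build_options {
    int opt_level;         /* N in -ON */
    int forceinline_opt;   /* nonzero if --forceinline is given */
    int autoinline_opt;    /* nonzero if --autoinline is given */
};

enum inline_decision decide_inlining(enum fn_qualifier q,
                                     const struct build_options *o)
{
    if (q == QUAL_FORCEINLINE)                     /* step 1 */
        return INLINE_IF_POSSIBLE;
    if (q == QUAL_INLINE)                          /* step 2 */
        return o->forceinline_opt ? INLINE_IF_POSSIBLE : INLINE_IF_PRACTICAL;
    if (o->opt_level >= 2 || o->autoinline_opt)    /* step 3 */
        return INLINE_IF_PRACTICAL;
    return NOT_CONSIDERED;
}
```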
When deciding if it is practical to inline a function, the compiler takes into account several other criteria,
such as:
• The size of the function, and how many times it is called.
• The current optimization level.
• Whether it is optimizing for speed (-Otime) or size (-Ospace).
• Whether the function has external or static linkage.
• How many parameters the function has.
• Whether the return value of the function is used.
Ultimately, the compiler can decide not to inline a function, even if the function is qualified with
__forceinline. As a general rule:
• Smaller functions stand a better chance of being inlined.
• Compiling with -Otime increases the likelihood that a function is inlined.
• Large functions are not normally inlined because this can adversely affect code density and
performance.
A recursive function is never inlined into itself, even if __forceinline is used.

Related concepts
5.20 Inline functions on page 5-177.
5.22 Automatic function inlining and static functions on page 5-179.
5.23 Inline functions and removal of unused out-of-line functions at link time on page 5-180.
5.24 Automatic function inlining and multifile compilation on page 5-181.
5.26 Compiler modes and inline functions on page 5-183.
5.27 Inline functions in C++ and C90 mode on page 5-184.
5.28 Inline functions in C99 mode on page 5-185.
5.29 Inline functions and debugging on page 5-187.

5.22 Automatic function inlining and static functions

At -O2 and -O3 levels of optimization, or when --autoinline is specified, the compiler can
automatically inline functions if it is practical and possible to do so, even if the functions are not declared
as __inline or inline.
This works best for static functions, because if all use of a static function can be inlined, no out-of-line
copy is required. Unless a function is explicitly declared as static (or __inline), the compiler has to
retain the out-of-line version of it in the object file in case it is called from some other module.
It is best to mark all non-inline functions as static if they are not used outside the translation unit where
they are defined (a translation unit being the preprocessed output of a source file together with all of the
headers and source files included as a result of the #include directive). Typically, you do not want to
place definitions of non-inline functions in header files.
If you fail to declare functions that are never called from outside a module as static, code can be
adversely affected. In particular, you might have:
• A larger code size, because out-of-line versions of functions are retained in the image.
When a function is automatically inlined, both the in-line version and an out-of-line version of the
function might end up in the final image, unless the function is declared as static. This might
increase code size.
• An unnecessarily complicated debug view, because there are both inline versions and out-of-line
versions of functions to display.
Retaining both inline and out-of-line copies of a function in code can sometimes be confusing when
setting breakpoints or single-stepping in a debug view. The debugger has to display both in-line and
out-of-line versions in its interleaved source view so that you can see what is happening when
stepping through either the in-line or out-of-line version.
Because of these problems, declare non-inline functions as static when you are sure that they can never
be called from another module.
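A minimal sketch of this advice, using a hypothetical module: the helper that is used only within the translation unit is declared static, while the function that forms the module's interface keeps external linkage.

```c
/* Used only inside this translation unit, so declared static. If every call
   is inlined, no out-of-line copy has to be kept in the object file.
   'shift' is assumed to be at least 1. */
static int round_shift(int value, int shift)
{
    return (value + (1 << (shift - 1))) >> shift;
}

/* Part of the module's interface: external linkage, so an out-of-line copy
   is retained in case another module calls it. */
int average_rounded(int a, int b)
{
    return round_shift(a + b, 1);
}
```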

Related concepts
5.20 Inline functions on page 5-177.
5.21 Compiler decisions on function inlining on page 5-178.
5.23 Inline functions and removal of unused out-of-line functions at link time on page 5-180.
5.24 Automatic function inlining and multifile compilation on page 5-181.
5.26 Compiler modes and inline functions on page 5-183.
5.27 Inline functions in C++ and C90 mode on page 5-184.
5.28 Inline functions in C99 mode on page 5-185.
5.29 Inline functions and debugging on page 5-187.

Related references
5.25 Restriction on overriding compiler decisions about function inlining on page 5-182.
8.15 --autoinline, --no_autoinline on page 8-337.
8.139 -Onum on page 8-473.

5.23 Inline functions and removal of unused out-of-line functions at link time
The linker cannot remove unused out-of-line functions from an object unless you place the unused out-
of-line functions in their own sections.
Use one of the following methods to place unused out-of-line functions in their own sections:
• --split_sections.
• __attribute__((section("name"))).
• #pragma arm section [section_type_list].
• Linker feedback.
--feedback is typically an easier method of enabling unused function removal.
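As an illustration of the __attribute__((section("name"))) method, the following sketch places two rarely used functions in their own named sections, so that the linker can remove a section if nothing refers to it. It assumes an ELF toolchain (armcc, or GCC or Clang targeting ELF, which define __ELF__); the section names and functions are hypothetical. Compiling with --split_sections gives the same per-function placement without editing the source.

```c
/* Assumption: the section attribute is only applied on ELF toolchains. */
#if defined(__ARMCC_VERSION) || defined(__ELF__)
#define OWN_SECTION(name) __attribute__((section(name)))
#else
#define OWN_SECTION(name)   /* no-op elsewhere in this sketch */
#endif

/* Each function gets its own section, so the linker can drop either one
   independently if it is never referenced. */
OWN_SECTION(".text.debug_dump")
int debug_dump_checksum(const unsigned char *data, int len)
{
    int sum = 0;
    for (int i = 0; i < len; i++)
        sum += data[i];
    return sum;
}

OWN_SECTION(".text.legacy_scale")
int legacy_scale(int x)
{
    return x * 10;
}
```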

Related concepts
5.20 Inline functions on page 5-177.
5.21 Compiler decisions on function inlining on page 5-178.
5.22 Automatic function inlining and static functions on page 5-179.
5.24 Automatic function inlining and multifile compilation on page 5-181.
5.26 Compiler modes and inline functions on page 5-183.
5.27 Inline functions in C++ and C90 mode on page 5-184.
5.28 Inline functions in C99 mode on page 5-185.
5.29 Inline functions and debugging on page 5-187.

Related references
5.25 Restriction on overriding compiler decisions about function inlining on page 5-182.
8.82 --feedback=filename on page 8-410.
8.175 --split_sections on page 8-512.
10.69 __attribute__((section("name"))) variable attribute on page 10-674.
10.79 #pragma arm section [section_type_list] on page 10-684.

5.24 Automatic function inlining and multifile compilation

If you are compiling with the --multifile option, the compiler can perform automatic inlining for calls
to functions that are defined in other translation units.
In RVCT 4.0 the --multifile option is enabled by default at -O3 level.
In ARM Compiler 4.1 and later the --multifile option is disabled by default, regardless of the
optimization level.
For --multifile, both translation units must be compiled in the same invocation of the compiler.

Related concepts
5.20 Inline functions on page 5-177.
5.21 Compiler decisions on function inlining on page 5-178.
5.22 Automatic function inlining and static functions on page 5-179.
5.23 Inline functions and removal of unused out-of-line functions at link time on page 5-180.
5.26 Compiler modes and inline functions on page 5-183.
5.27 Inline functions in C++ and C90 mode on page 5-184.
5.28 Inline functions in C99 mode on page 5-185.
5.29 Inline functions and debugging on page 5-187.

Related references
5.25 Restriction on overriding compiler decisions about function inlining on page 5-182.
8.15 --autoinline, --no_autoinline on page 8-337.
8.108 --inline, --no_inline on page 8-439.
8.134 --multifile, --no_multifile on page 8-467.
8.139 -Onum on page 8-473.

5.25 Restriction on overriding compiler decisions about function inlining

You can enable and disable function inlining, but you cannot override decisions the compiler makes
about when it is practical to inline a function.
For example, you cannot force a function to be inlined if the compiler thinks it is not sensible to do so.
Even if you use --forceinline or __forceinline, the compiler only inlines functions if it is possible
to do so.

Related concepts
5.20 Inline functions on page 5-177.
5.21 Compiler decisions on function inlining on page 5-178.
5.22 Automatic function inlining and static functions on page 5-179.
5.23 Inline functions and removal of unused out-of-line functions at link time on page 5-180.
5.24 Automatic function inlining and multifile compilation on page 5-181.
5.26 Compiler modes and inline functions on page 5-183.
5.27 Inline functions in C++ and C90 mode on page 5-184.
5.28 Inline functions in C99 mode on page 5-185.
5.29 Inline functions and debugging on page 5-187.

Related references
8.15 --autoinline, --no_autoinline on page 8-337.
8.85 --forceinline on page 8-413.
10.6 __forceinline on page 10-605.
10.8 __inline on page 10-608.
8.108 --inline, --no_inline on page 8-439.

5.26 Compiler modes and inline functions

Compiler modes affect the behavior of inline functions.
ARM provides information about inline functions in C++, C90, and C99 modes.
The GNU Compiler Collection (GCC) web site provides information about inline functions in GNU C90
mode.

Related concepts
5.20 Inline functions on page 5-177.
5.21 Compiler decisions on function inlining on page 5-178.
5.22 Automatic function inlining and static functions on page 5-179.
5.23 Inline functions and removal of unused out-of-line functions at link time on page 5-180.
5.24 Automatic function inlining and multifile compilation on page 5-181.
5.27 Inline functions in C++ and C90 mode on page 5-184.
5.28 Inline functions in C99 mode on page 5-185.
5.29 Inline functions and debugging on page 5-187.

Related references
5.25 Restriction on overriding compiler decisions about function inlining on page 5-182.

Related information
GNU Compiler Collection, http://gcc.gnu.org.

5.27 Inline functions in C++ and C90 mode

The inline keyword is not available in C90. The effect of __inline in C90, and __inline and inline
in C++, is identical.
When declaring an extern function to be inline, you must define it in every translation unit that it is used
in. You must ensure that you use the same definition in each translation unit.
The requirement of defining the function in every translation unit applies even though it has external
linkage.
If an inline function is used by more than one translation unit, its definition is typically placed in a header
file.
ARM does not recommend placing definitions of non-inline functions in header files, because this can
result in the creation of a separate function in each translation unit. If the non-inline function is an
extern function, this leads to duplicate symbols at link time. If the non-inline function is static, this
can lead to unwanted code duplication.
Member functions defined within a C++ structure, class, or union declaration are implicitly inline. They
are treated as if they are declared with the inline or __inline keyword.
Inline functions have extern linkage unless they are explicitly declared static. If an inline function is
declared to be static, any out-of-line copies of the function must be unique to their translation unit, so
declaring an inline function to be static could lead to unwanted code duplication.
The compiler generates a regular call to an out-of-line copy of a function when it cannot inline the
function, or when it decides not to inline it.
Because a function must be defined in every translation unit it is used in, the compiler is not required
to emit out-of-line copies of all extern inline functions. When the compiler does emit out-of-line copies
of an extern inline function, it uses Common Groups, so that the linker eliminates duplicates, keeping at
most one copy of the same out-of-line function from different object files.

Related concepts
5.20 Inline functions on page 5-177.
5.21 Compiler decisions on function inlining on page 5-178.
5.22 Automatic function inlining and static functions on page 5-179.
5.23 Inline functions and removal of unused out-of-line functions at link time on page 5-180.
5.24 Automatic function inlining and multifile compilation on page 5-181.
5.26 Compiler modes and inline functions on page 5-183.
5.28 Inline functions in C99 mode on page 5-185.
5.29 Inline functions and debugging on page 5-187.

Related references
5.25 Restriction on overriding compiler decisions about function inlining on page 5-182.
8.108 --inline, --no_inline on page 8-439.
10.8 __inline on page 10-608.

Related information
Elimination of common groups or sections.

5.28 Inline functions in C99 mode

The rules for C99 inline functions with external linkage differ from those of C++.
C99 distinguishes between inline definitions and external definitions. Within a given translation unit
where the inline function is defined, if the inline function is always declared with inline and never with
extern, it is an inline definition. Otherwise, it is an external definition. These inline definitions do not
generate out-of-line copies, even when --no_inline is used.
Each use of an inline function might be inlined using a definition from the same translation unit (that
might be an inline definition or an external definition), or it might become a call to an external definition.
If an inline function is used, it must have exactly one external definition in some translation unit. This is
the same rule that applies to using any external function. In practice, if all uses of an inline function are
inlined, no error occurs if the external definition is missing. If you use --no_inline, only external
definitions are used.
Typically, you put inline functions with external linkage into header files as inline definitions, using
inline, and not using extern. There is also an external definition in one source file. For example:

/* example_header.h */
inline int my_function (int i)
{
    return i + 42; // inline definition
}

/* file1.c */
#include "example_header.h"
... // uses of my_function()

/* file2.c */
#include "example_header.h"
... // uses of my_function()

/* myfile.c */
#include "example_header.h"
extern inline int my_function(int); // causes external definition.

This is the same strategy that is typically used for C++, but in C++ there is no special external definition,
and no requirement for it.
The definitions of inline functions can be different in different translation units. However, in typical use,
as in the above example, they are identical.
When compiling with --multifile, calls in one translation unit might be inlined using the external
definition in another translation unit.
C99 places some restrictions on inline definitions: they cannot define modifiable objects with static
storage duration (such as modifiable local static variables), and they cannot reference identifiers with
internal linkage.
In C99 mode, as with all other modes, the effects of __inline and inline are identical.
Inline functions with static linkage have the same behavior in C99 as in C++.
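For experimentation, the header-plus-source layout above can be folded into one translation unit. The following is a minimal sketch, assuming a C99 (or later) compiler: the inline definition stands in for example_header.h, and the extern declaration is what myfile.c contributes, so this file supplies the single external definition of my_function. The helper call_through_pointer is hypothetical; it calls my_function through a function pointer, a use that needs the out-of-line external definition.

```c
/* Stands in for example_header.h: inline, never extern, so on its own
   this would be only an inline definition. */
inline int my_function(int i)
{
    return i + 42;
}

/* Stands in for myfile.c: this declaration uses extern, so the definition
   above becomes the external definition for the whole program. */
extern inline int my_function(int);

/* Hypothetical helper: a call through a pointer cannot be inlined, so it
   relies on the external definition being emitted. */
int call_through_pointer(int i)
{
    int (*fp)(int) = my_function;
    return fp(i);
}
```

If the extern declaration were removed, the file would contain only an inline definition, and a call that the compiler does not inline (such as the call through fp) could fail to link because no external definition exists.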

Related concepts
5.20 Inline functions on page 5-177.
5.21 Compiler decisions on function inlining on page 5-178.
5.22 Automatic function inlining and static functions on page 5-179.
5.23 Inline functions and removal of unused out-of-line functions at link time on page 5-180.
5.24 Automatic function inlining and multifile compilation on page 5-181.
5.26 Compiler modes and inline functions on page 5-183.
5.27 Inline functions in C++ and C90 mode on page 5-184.
5.29 Inline functions and debugging on page 5-187.

Related references
5.25 Restriction on overriding compiler decisions about function inlining on page 5-182.
8.108 --inline, --no_inline on page 8-439.
8.134 --multifile, --no_multifile on page 8-467.
10.8 __inline on page 10-608.

5.29 Inline functions and debugging

The debug view generated for inline functions is generally good. However, debugging is sometimes
clearer if functions are not inlined.
You can enable and disable the inlining of functions using the --no_inline, --inline, --autoinline
and --no_autoinline command-line options.
The debug view can also be adversely affected by retaining both inline and out-of-line copies of a
function when out-of-line copies are not required. Functions that are never called from outside a module
can be declared as static functions to avoid an unnecessarily complicated debug view.

Related concepts
5.20 Inline functions on page 5-177.
5.21 Compiler decisions on function inlining on page 5-178.
5.22 Automatic function inlining and static functions on page 5-179.
5.23 Inline functions and removal of unused out-of-line functions at link time on page 5-180.
5.24 Automatic function inlining and multifile compilation on page 5-181.
5.26 Compiler modes and inline functions on page 5-183.
5.27 Inline functions in C++ and C90 mode on page 5-184.
5.28 Inline functions in C99 mode on page 5-185.

Related information
--inline, --no_inline linker option.

5.30 Types of data alignment

Accesses to data in memory fall into the following categories:
• Natural alignment, for example, a word at address 0x1004. The ARM compiler normally aligns
variables and pads structures so that these items are accessed efficiently using LDR and STR
instructions.
• Known but non-natural alignment, for example, a word at address 0x1001. This type of alignment
commonly occurs when structures are packed to remove unnecessary padding. In C and C++, the
__packed qualifier or the #pragma pack(n) pragma lets you signify that a structure is packed.
• Unknown alignment, for example, a word at an arbitrary address. This type of alignment commonly
occurs when defining a pointer that can point to a word at any address. In C and C++, the __packed
qualifier lets you signify that a pointer can access a word on a non-natural alignment boundary.

Related concepts
5.31 Advantages of natural data alignment on page 5-189.
5.34 Unaligned data access in C and C++ code on page 5-192.
5.35 The __packed qualifier and unaligned data access in C and C++ code on page 5-193.

Related references
5.32 Compiler storage of data objects by natural byte alignment on page 5-190.
5.33 Relevance of natural data alignment at compile time on page 5-191.
10.97 #pragma pack(n) on page 10-703.

5.31 Advantages of natural data alignment

The various C data types are aligned on specific byte boundaries to maximize storage potential and to
provide for fast, efficient memory access with the ARM instruction set.
For example, the ARM architecture can access a four-byte variable using only one instruction when the
object is stored at an address divisible by four, so four-byte objects are located on four-byte boundaries.
ARM and Thumb processors are designed to efficiently access naturally aligned data, that is,
doublewords that lie on addresses that are multiples of eight, words that lie on addresses that are
multiples of four, halfwords that lie on addresses that are multiples of two, and single bytes that lie at any
byte address. Such data is located on its natural size boundary.
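The definition of natural alignment above reduces to a divisibility test on the address. The sketch below uses a hypothetical helper, is_naturally_aligned, and assumes the object size is a power of two (1, 2, 4, or 8 bytes), which lets the modulo operation be replaced by a mask.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if an object of 'size' bytes at 'addr' lies on
   its natural size boundary. Assumes size is 1, 2, 4, or 8. */
bool is_naturally_aligned(uintptr_t addr, size_t size)
{
    /* For a power-of-two size, addr % size == 0 is the same test as
       (addr & (size - 1)) == 0. */
    return (addr & (uintptr_t)(size - 1)) == 0;
}
```

For example, a word at 0x1004 is naturally aligned, a word at 0x1001 is not, and a single byte is naturally aligned at any address.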

Related concepts
5.30 Types of data alignment on page 5-188.
5.34 Unaligned data access in C and C++ code on page 5-192.
5.35 The __packed qualifier and unaligned data access in C and C++ code on page 5-193.

Related references
5.32 Compiler storage of data objects by natural byte alignment on page 5-190.
5.33 Relevance of natural data alignment at compile time on page 5-191.

5.32 Compiler storage of data objects by natural byte alignment

C data types are aligned on specific byte boundaries, depending on their type.
By default, the compiler stores data objects by byte alignment as shown in the following table.

Table 5-9 Compiler storage of data objects by byte alignment

Type Bytes Alignment

char, bool, _Bool 1 Located at any byte address.

short, wchar_t 2 Located at any address that is evenly divisible by 2.

float, int, long, pointer 4 Located at an address that is evenly divisible by 4.

long long, double, long double 8 Located at an address that is evenly divisible by 8.

Related concepts
5.30 Types of data alignment on page 5-188.
5.31 Advantages of natural data alignment on page 5-189.
5.34 Unaligned data access in C and C++ code on page 5-192.
5.35 The __packed qualifier and unaligned data access in C and C++ code on page 5-193.

Related references
5.33 Relevance of natural data alignment at compile time on page 5-191.

5.33 Relevance of natural data alignment at compile time

Data alignment becomes relevant when the compiler allocates memory locations to variables.
For example, in the following structure, a three-byte gap is required between bmem and cmem.
struct example_st {
    int amem;
    char bmem;
    int cmem;
};
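The three-byte gap can be observed with offsetof. This sketch assumes, as in Table 5-9, that int is four bytes and four-byte aligned; gap_after_bmem is a hypothetical helper name.

```c
#include <stddef.h>

struct example_st {
    int amem;   /* offset 0 */
    char bmem;  /* offset 4 */
    int cmem;   /* offset 8: padded up to the next 4-byte boundary */
};

/* Hypothetical helper: padding bytes inserted between bmem and cmem. */
size_t gap_after_bmem(void)
{
    return offsetof(struct example_st, cmem)
         - (offsetof(struct example_st, bmem) + sizeof(char));
}
```

Under these assumptions the structure occupies 12 bytes, 3 of which are padding.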

Related references
5.32 Compiler storage of data objects by natural byte alignment on page 5-190.

5.34 Unaligned data access in C and C++ code

It can be necessary to access unaligned data in memory, for example, when porting legacy code from a
CISC architecture where instructions are available to directly access unaligned data in memory.
On ARMv4 and ARMv5 architectures, and on the ARMv6 architecture depending on how it is
configured, care is required when accessing unaligned data in memory to avoid unexpected results. For
example, when C or C++ source code uses a conventional pointer to read a word, the ARM compiler
generates assembly language code that reads the word using an LDR instruction. This works as expected
when the address is a multiple of four, that is, when it lies on a word boundary. However, if the address
is not a multiple of four, the LDR instruction returns a rotated result rather than performing a true
unaligned word load. Generally, this rotation is not what the programmer expects.
On ARMv6 and later architectures, unaligned access is fully supported.
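To make the difference concrete, the following C model contrasts the two behaviors. It is a simulation, not compiler output, and assumes a little-endian memory system and the legacy rule that an unaligned LDR loads the word at the word-aligned address immediately below the given address and rotates it right by 8 bits for each byte of misalignment. The function names legacy_ldr and true_unaligned_load are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Little-endian 32-bit value from four consecutive bytes. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Model of an unaligned LDR on a legacy core: load the aligned word that
   contains addr, then rotate right by 8 bits per byte of misalignment. */
uint32_t legacy_ldr(const uint8_t *mem, size_t addr)
{
    uint32_t word = le32(mem + (addr & ~(size_t)3));
    unsigned rot = (unsigned)(addr & 3) * 8;
    return rot ? (word >> rot) | (word << (32 - rot)) : word;
}

/* What the programmer usually expects: the four bytes starting at addr. */
uint32_t true_unaligned_load(const uint8_t *mem, size_t addr)
{
    return le32(mem + addr);
}
```

With the bytes 0x11 to 0x88 at addresses 0 to 7, legacy_ldr(mem, 1) gives 0x11443322, whereas a true unaligned load from address 1 gives 0x55443322.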

Related concepts
5.30 Types of data alignment on page 5-188.
5.31 Advantages of natural data alignment on page 5-189.
5.35 The __packed qualifier and unaligned data access in C and C++ code on page 5-193.

Related references
5.32 Compiler storage of data objects by natural byte alignment on page 5-190.
5.33 Relevance of natural data alignment at compile time on page 5-191.

5.35 The __packed qualifier and unaligned data access in C and C++ code
The __packed qualifier sets the alignment of any valid type to 1.
This enables objects of packed type to be read or written using unaligned access.
Examples of objects that can be packed include:
• Structures.
• Unions.
• Pointers.

Related concepts
5.30 Types of data alignment on page 5-188.
5.31 Advantages of natural data alignment on page 5-189.
5.34 Unaligned data access in C and C++ code on page 5-192.
5.36 Unaligned fields in structures on page 5-194.
5.37 Performance penalty associated with marking whole structures as packed on page 5-195.

Related references
5.32 Compiler storage of data objects by natural byte alignment on page 5-190.
5.33 Relevance of natural data alignment at compile time on page 5-191.
10.12 __packed on page 10-612.
10.97 #pragma pack(n) on page 10-703.

5.36 Unaligned fields in structures

You can use the __packed qualifier to create unaligned fields in structures. This saves space because the
compiler does not need to pad fields to their natural size boundary.
For efficiency, fields in a structure are positioned on their natural size boundary. This means that the
compiler often inserts padding between fields to ensure that they are naturally aligned.
When space is at a premium, you can use the __packed qualifier to create structures without padding
between fields. Structures can be packed in the following ways:
• The entire struct can be declared as __packed. For example:
__packed struct mystruct
{
    char c;
    short s;
}; // not recommended

Each field of the structure inherits the __packed qualifier.

Declaring an entire struct as __packed typically incurs a penalty both in code size and performance.
• Individual non-aligned fields within the struct can be declared as __packed. For example:
struct mystruct
{
    char c;
    __packed short s; // recommended
};

This is the recommended approach to packing structures.

Note
The same principles apply to unions. You can declare either an entire union as __packed, or use the
__packed qualifier to identify components of the union that are unaligned in memory.
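The space saving can be checked with sizeof and offsetof. Because __packed is specific to the ARM compiler, this sketch substitutes the GNU __attribute__((packed)) type and variable attributes (which this guide also documents for armcc), on the assumption that they produce the same layout as the two __packed forms shown above. The structure names are hypothetical.

```c
#include <stddef.h>

/* Natural layout: one padding byte after c so that s is 2-byte aligned. */
struct plain_s {
    char c;
    short s;
};

/* Whole structure packed: intended to match __packed struct mystruct. */
struct __attribute__((packed)) whole_packed_s {
    char c;
    short s;
};

/* Only the unaligned field packed: intended to match __packed short s. */
struct field_packed_s {
    char c;
    short s __attribute__((packed));
};
```

On a typical ABI where short is two bytes and two-byte aligned, plain_s occupies 4 bytes with s at offset 2, while both packed variants occupy 3 bytes with s at offset 1.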

Related concepts
5.35 The __packed qualifier and unaligned data access in C and C++ code on page 5-193.
5.37 Performance penalty associated with marking whole structures as packed on page 5-195.
5.40 Comparisons of an unpacked struct, a __packed struct, and a struct with individually __packed
fields, and of a __packed struct and a #pragma packed struct on page 5-198.

Related references
10.12 __packed on page 10-612.
10.97 #pragma pack(n) on page 10-703.

5.37 Performance penalty associated with marking whole structures as packed

Reading from and writing to whole structures qualified with __packed requires unaligned accesses and
can therefore incur a performance penalty.
When optimizing a struct that is packed, the compiler tries to deduce the alignment of each field, to
improve access. However, it is not always possible for the compiler to deduce the alignment of each field
in a __packed struct. In contrast, when individual fields in a struct are declared as __packed, fast
access is guaranteed to naturally aligned members within the struct. Therefore, when the use of a
packed structure is required, ARM recommends that you always pack individual fields of the structure,
rather than the entire structure itself.
Note
Declaring individual non-aligned fields of a struct as __packed also has the advantage of making it
clearer to the programmer which fields of the struct are not naturally aligned.

Related concepts
5.35 The __packed qualifier and unaligned data access in C and C++ code on page 5-193.
5.36 Unaligned fields in structures on page 5-194.
5.40 Comparisons of an unpacked struct, a __packed struct, and a struct with individually __packed
fields, and of a __packed struct and a #pragma packed struct on page 5-198.

Related references
10.12 __packed on page 10-612.
10.97 #pragma pack(n) on page 10-703.

5.38 Unaligned pointers in C and C++ code

If you want to define a pointer that can point to a word at any address, you must specify the __packed
qualifier.
By default, the compiler expects conventional C and C++ pointers to point to naturally aligned words in
memory because this enables the compiler to generate more efficient code.
For example, to specify an unaligned pointer:
__packed int *pi; // pointer to unaligned int

When a pointer is declared as __packed, the compiler generates code that correctly accesses the
dereferenced value of the pointer, regardless of its alignment. The generated code consists of a sequence
of byte accesses, or variable alignment-dependent shifting and masking instructions, rather than a simple
LDR instruction. Consequently, declaring a pointer as __packed incurs a performance and code size
penalty.
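The byte-access strategy described above can be written out in portable C. The sketch below assumes a little-endian word layout and uses the hypothetical name read_word_bytewise. It shows the kind of work a __packed dereference implies, four single-byte loads and three shift-and-OR steps in place of one LDR, but it is illustrative and is not the exact sequence the compiler emits.

```c
#include <stdint.h>

/* Hypothetical helper: read a little-endian 32-bit word from any address
   using only byte loads, so the alignment of p does not matter. */
uint32_t read_word_bytewise(const void *p)
{
    const uint8_t *b = (const uint8_t *)p;
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}
```

For example, reading from one byte past the start of the bytes 0x00, 0x78, 0x56, 0x34, 0x12 yields 0x12345678, even though that address is not word-aligned.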

Related concepts
5.39 Unaligned Load Register (LDR) instructions generated by the compiler on page 5-197.

Related references
10.12 __packed on page 10-612.
8.187 --unaligned_access, --no_unaligned_access on page 8-528.

5.39 Unaligned Load Register (LDR) instructions generated by the compiler

In some circumstances, where it is legal to do so, the compiler might intentionally generate unaligned
LDR instructions.

In particular, the compiler can do this to load halfwords from memory, even where the architecture
supports dedicated halfword load instructions.
For example, to access an unaligned short within a __packed structure, the compiler might load the
required halfword into the top half of a register and then shift it down to the bottom half. This operation
requires only one memory access, whereas performing the same operation using LDRB instructions
requires two memory accesses, plus instructions to merge the two bytes.
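The data movement in this example can be modeled in C under stated assumptions: a little-endian system, a halfword at addr with addr >= 2, and the four bytes from addr-2 to addr+1 all readable, so that a word load from addr-2 leaves the wanted halfword in the top half of the register. This is a hypothetical model of where the bytes end up, not of how many memory accesses occur (the C code itself reads bytes individually), and the function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Two byte loads plus a merge, as an LDRB-based sequence would do. */
uint16_t load_half_bytes(const uint8_t *mem, size_t addr)
{
    return (uint16_t)(mem[addr] | (mem[addr + 1] << 8));
}

/* One word spanning addr-2..addr+1: on little-endian, the halfword at addr
   sits in bits 16..31, so a right shift by 16 moves it to the bottom half.
   Assumes addr >= 2 and that all four bytes may legally be read. */
uint16_t load_half_via_word(const uint8_t *mem, size_t addr)
{
    const uint8_t *w = mem + addr - 2;
    uint32_t word = (uint32_t)w[0] | ((uint32_t)w[1] << 8) |
                    ((uint32_t)w[2] << 16) | ((uint32_t)w[3] << 24);
    return (uint16_t)(word >> 16);
}
```

Both functions return the same halfword; the second mirrors the single wide load followed by a shift.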

Related concepts
5.38 Unaligned pointers in C and C++ code on page 5-196.

Related references
10.12 __packed on page 10-612.
8.187 --unaligned_access, --no_unaligned_access on page 8-528.

5.40 Comparisons of an unpacked struct, a __packed struct, and a struct with individually __packed
fields, and of a __packed struct and a #pragma packed struct
These comparisons illustrate the differences between the methods of packing structures.

Comparison of an unpacked struct, a __packed struct, and a struct with individually __packed fields
The differences between not packing a struct, packing an entire struct, and packing individual fields
of a struct are illustrated by the three implementations of a struct shown in the following table.

Table 5-10 C code for an unpacked struct, a packed struct, and a struct with individually packed fields

Unpacked struct:

struct foo
{
    char one;
    short two;
    char three;
    int four;
} c;

Packed struct:

__packed struct foo
{
    char one;
    short two;
    char three;
    int four;
} c;

Packed fields:

struct foo
{
    char one;
    __packed short two;
    char three;
    int four;
} c;

In the first implementation, the struct is not packed. In the second implementation, the entire structure
is qualified as __packed. In the third implementation, the __packed qualifier is removed from the
structure and the individual field that is not naturally aligned is declared as __packed.
The following table shows the corresponding disassembly of the machine code produced by the compiler
for each of the sample implementations of the preceding table, where the C code for each implementation
has been compiled using the option -O2.

Table 5-11 Disassembly for an unpacked struct, a packed struct, and a struct with individually packed fields

Unpacked struct:

; r0 contains address of c
; char one
LDRB r1, [r0, #0]
; short two
LDRSH r2, [r0, #2]
; char three
LDRB r3, [r0, #4]
; int four
LDR r12, [r0, #8]

Packed struct:

; r0 contains address of c
; char one
LDRB r1, [r0, #0]
; short two
LDRB r2, [r0, #1]
LDRSB r12, [r0, #2]
ORR r2, r12, r2, LSL #8
; char three
LDRB r3, [r0, #3]
; int four
ADD r0, r0, #4
BL __aeabi_uread4

Packed fields:

; r0 contains address of c
; char one
LDRB r1, [r0, #0]
; short two
LDRB r2, [r0, #1]
LDRSB r12, [r0, #2]
ORR r2, r12, r2, LSL #8
; char three
LDRB r3, [r0, #3]
; int four
LDR r12, [r0, #4]

Note
The -Ospace and -Otime compiler options control whether accesses to unaligned elements are made
inline or through a function call. Using -Otime results in inline unaligned accesses. Using -Ospace
results in unaligned accesses made through function calls.

In the disassembly of the unpacked struct example above, the compiler always accesses data on aligned
word or halfword addresses. The compiler is able to do this because the struct is padded so that every
member of the struct lies on its natural size boundary.
In the disassembly of the __packed struct example above, fields one and three are aligned on their
natural size boundaries by default, so the compiler makes aligned accesses. The compiler always carries
out aligned word or halfword accesses for fields it can identify as being aligned. For the unaligned field
two, the compiler uses multiple aligned memory accesses (in this example, the byte loads LDRB and
LDRSB), combined with fixed shifting and masking, to access the correct bytes in memory. To access
field four, the compiler calls the ARM Embedded Application Binary Interface (AEABI) runtime routine
__aeabi_uread4, which reads an unsigned word at an unknown alignment, because it cannot determine
that the field lies on its natural size boundary.
In the disassembly of the struct with individually packed fields example above, fields one, two, and
three are accessed in the same way as in the case where the entire struct is qualified as __packed. In
contrast to the situation where the entire struct is packed, however, the compiler makes a word-aligned
access to the field four. This is because the presence of the __packed short within the structure helps
the compiler to determine that the field four lies on its natural size boundary.
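The offsets in the disassembly follow from the layouts of the three structures, which can be reproduced with offsetof and the C11 _Alignof operator. As in the earlier sketch, this substitutes the GNU __attribute__((packed)) attributes for __packed and assumes a two-byte short and a four-byte int with natural alignment; the structure names are hypothetical.

```c
#include <stddef.h>

/* Unpacked: two at 2, three at 4, four at 8 (matches LDR r12, [r0, #8]). */
struct foo_unpacked {
    char one;
    short two;
    char three;
    int four;
};

/* Entire structure packed: no padding, structure alignment drops to 1. */
struct __attribute__((packed)) foo_packed {
    char one;
    short two;
    char three;
    int four;
};

/* Only 'two' packed: 'four' still lands on offset 4 and keeps its
   natural alignment, so the structure stays 4-byte aligned. */
struct foo_packed_fields {
    char one;
    short two __attribute__((packed));
    char three;
    int four;
};
```

The field-packed structure keeps an alignment of 4 because int four is still naturally aligned, which is consistent with the compiler using a single LDR at offset 4 for four. The fully packed structure has alignment 1, so the compiler cannot assume that four, also at offset 4, is word-aligned in memory.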

Comparison of a __packed struct and a #pragma packed struct

The differences between a __packed struct and a #pragma packed struct are illustrated by the two
implementations of a struct shown in the following table.

Table 5-12 C code for a packed struct and a pragma packed struct

__packed struct:

__packed struct foobar
{
    char x;
    short y[10];
};
short get_y0(struct foobar *s)
{
    // Unaligned-capable load
    return *s->y;
}
short *get_y(struct foobar *s)
{
    return s->y; // Compile error
}

#pragma packed struct:

#pragma push
#pragma pack(1)
struct foobar
{
    char x;
    short y[10];
};
#pragma pop
short get_y0(struct foobar *s)
{
    // Unaligned-capable load
    return *s->y;
}
short *get_y(struct foobar *s)
{
    return s->y; // No error
    // Potentially illegal unaligned load,
    // depending on use of result
}

In the first implementation, taking the address of a field in a __packed struct or a __packed field in a
struct yields a __packed pointer, and the compiler generates a type error if you try to implicitly cast
this to a non-__packed pointer. In the second implementation, in contrast, taking the address of a field in
a #pragma packed struct does not yield a __packed-qualified pointer. However, the field might not be
properly aligned for its type, and dereferencing such an unaligned pointer results in Undefined behavior.

Related concepts
5.36 Unaligned fields in structures on page 5-194.
5.37 Performance penalty associated with marking whole structures as packed on page 5-195.

Related references
8.143 -Ospace on page 8-479.
8.144 -Otime on page 8-480.
10.12 __packed on page 10-612.
10.60 __attribute__((packed)) type attribute on page 10-665.
10.68 __attribute__((packed)) variable attribute on page 10-673.
10.97 #pragma pack(n) on page 10-703.

Related information
Application Binary Interface (ABI) for the ARM Architecture.

5.41 Compiler support for floating-point arithmetic

The compiler provides many features for managing floating-point arithmetic both in hardware and in
software.
For example, you can specify software or hardware support for floating-point, particular hardware
architectures, and the level of conformance to IEEE floating-point standards.
The floating-point options you select determine the trade-off between floating-point performance,
system cost, and system flexibility, so choose them to suit the performance, cost, and flexibility
requirements of your system.
Floating-point arithmetic can be supported either:
• In software, through the floating-point library fplib. This library provides functions that can be
called to implement floating-point operations using no additional hardware.
• In hardware, using a Vector Floating-Point (VFP) coprocessor with the ARM processor to provide the
required floating-point operations. VFP is a coprocessor architecture that implements IEEE
floating-point and supports single and double precision, but not extended precision.
Note
In practice, floating-point arithmetic in the VFP is implemented using a combination of hardware,
that executes the common cases, and software, that deals with the uncommon cases, and cases
causing exceptions.

Code that uses hardware support for floating-point arithmetic is more compact and offers better
performance than code that performs floating-point arithmetic in software. However, hardware support
for floating-point arithmetic requires a VFP coprocessor.

Related concepts
5.42 Default selection of hardware or software floating-point support on page 5-202.
5.43 Example of hardware and software support differences for floating-point arithmetic on page 5-203.
5.44 Vector Floating-Point (VFP) architectures on page 5-205.
5.45 Limitations on hardware handling of floating-point arithmetic on page 5-207.
5.46 Implementation of Vector Floating-Point (VFP) support code on page 5-208.
5.47 Compiler and library support for half-precision floating-point numbers on page 5-210.
5.48 Half-precision floating-point number format on page 5-211.
5.49 Compiler support for floating-point computations and linkage on page 5-212.
5.50 Types of floating-point linkage on page 5-213.
5.51 Compiler options for floating-point linkage and computations on page 5-214.

Related references
5.52 Floating-point linkage and computational requirements of compiler options on page 5-216.
5.53 Processors and their implicit Floating-Point Units (FPUs) on page 5-218.
10.100 #pragma softfp_linkage, #pragma no_softfp_linkage on page 10-707.
8.43 --cpu=name on page 8-368.
10.118 __fabs intrinsic on page 10-727.
8.86 --fp16_format=format on page 8-414.
8.87 --fpmode=model on page 8-415.
8.88 --fpu=list on page 8-417.
8.89 --fpu=name on page 8-418.
10.142 __sqrt intrinsic on page 10-754.
10.160 GNU built-in functions on page 10-778.
10.161 Predefined macros on page 10-786.
10.156 VFP status intrinsic on page 10-771.
9.14 Hexadecimal floats on page 9-564.
9.38 Hexadecimal floating-point constants on page 9-588.
17.3 Limits for floating-point numbers on page 17-921.

Related information
Institute of Electrical and Electronics Engineers.
Floating-point Support.
ARM and Thumb floating-point build options (ARMv6 and earlier).
ARM and Thumb floating-point build options (ARMv7 and later).

5.42 Default selection of hardware or software floating-point support

The default target FPU architecture is derived from use of the --cpu option.
If the processor specified with --cpu has a VFP coprocessor, the default target FPU architecture is the
VFP architecture for that processor. For example, the option --cpu ARM1136JF-S implies the option
--fpu vfpv2.

If you are building ARM Linux applications using --arm_linux or --arm_linux_paths, the default is
always software floating-point linkage. Even if you specify a processor that implies an FPU (for
example, --cpu=ARM1136JF-S), the compiler still defaults to --fpu=softvfp+vfp, not --fpu=vfp.
If the selected target has a VFP coprocessor, the compiler generates VFP instructions. If it has no VFP
coprocessor, the compiler generates code that calls the software floating-point library fplib to carry out
floating-point operations. fplib is available as part of the standard distribution of the ARM compilation
tools suite of C libraries.

Related concepts
5.41 Compiler support for floating-point arithmetic on page 5-200.
5.43 Example of hardware and software support differences for floating-point arithmetic on page 5-203.
5.44 Vector Floating-Point (VFP) architectures on page 5-205.
5.45 Limitations on hardware handling of floating-point arithmetic on page 5-207.
5.46 Implementation of Vector Floating-Point (VFP) support code on page 5-208.
5.47 Compiler and library support for half-precision floating-point numbers on page 5-210.
5.48 Half-precision floating-point number format on page 5-211.
5.49 Compiler support for floating-point computations and linkage on page 5-212.
5.50 Types of floating-point linkage on page 5-213.
5.51 Compiler options for floating-point linkage and computations on page 5-214.

Related references
5.52 Floating-point linkage and computational requirements of compiler options on page 5-216.
5.53 Processors and their implicit Floating-Point Units (FPUs) on page 5-218.

Related information
Floating-point Support.

5.43 Example of hardware and software support differences for floating-point arithmetic

This example shows how the compiler handles floating-point arithmetic for processors that support
either hardware or software floating-point arithmetic.
The following C function performs floating-point arithmetic:
float foo(float num1, float num2)
{
float temp, temp2;
temp = num1 + num2;
temp2 = num2 * num2;
return temp2 - temp;
}

When the example C code is compiled with the command-line options --cpu 5TE and --fpu softvfp,
the compiler produces machine code with the disassembly shown below. In this case, floating-point
arithmetic is performed in software through calls to library routines such as __aeabi_fmul.
||foo|| PROC
PUSH {r4-r6, lr}
MOV r4, r1
BL __aeabi_fadd
MOV r5, r0
MOV r1, r4
MOV r0, r4
BL __aeabi_fmul
MOV r1, r5
POP {r4-r6, lr}
B __aeabi_fsub
ENDP

However, when the example C code is compiled with the command-line option --fpu vfp, the compiler
produces machine code with the disassembly shown below. In this case, floating-point arithmetic is
performed in hardware through floating-point arithmetic instructions such as VMUL.F32.
||foo|| PROC
VADD.F32 s2, s0, s1
VMUL.F32 s0, s1, s1
VSUB.F32 s0, s0, s2
BX lr
ENDP
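To relate the software floating-point disassembly back to the C source, the following sketch restates foo() as the sequence of helper calls that the --fpu softvfp code makes. The functions model_fadd(), model_fmul() and model_fsub() are hypothetical stand-ins for __aeabi_fadd, __aeabi_fmul and __aeabi_fsub. They use native float arithmetic so that the sketch runs on any host, whereas the real helpers in fplib emulate each operation in software.

```c
/* Hypothetical stand-ins for the run-time ABI helpers. The real helpers
 * receive their float arguments as raw bit patterns in r0 and r1 and
 * return the result in r0; these simply use native float arithmetic. */
static float model_fadd(float a, float b) { return a + b; }
static float model_fmul(float a, float b) { return a * b; }
static float model_fsub(float a, float b) { return a - b; } /* a - b */

/* foo() as the --fpu softvfp code computes it: one helper call per
 * arithmetic operation. num2 is kept in r4 and the sum in r5 across
 * the multiply call. */
float foo_softvfp_model(float num1, float num2)
{
    float temp  = model_fadd(num1, num2);  /* BL __aeabi_fadd             */
    float temp2 = model_fmul(num2, num2);  /* BL __aeabi_fmul             */
    return model_fsub(temp2, temp);        /* B  __aeabi_fsub (tail call) */
}
```

In the hardware version, each of these three calls collapses into a single instruction (VADD.F32, VMUL.F32 and VSUB.F32), and the arguments arrive in s0 and s1 instead of r0 and r1.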

Related concepts
5.41 Compiler support for floating-point arithmetic on page 5-200.
5.42 Default selection of hardware or software floating-point support on page 5-202.
5.44 Vector Floating-Point (VFP) architectures on page 5-205.
5.45 Limitations on hardware handling of floating-point arithmetic on page 5-207.
5.46 Implementation of Vector Floating-Point (VFP) support code on page 5-208.
5.47 Compiler and library support for half-precision floating-point numbers on page 5-210.
5.48 Half-precision floating-point number format on page 5-211.
5.49 Compiler support for floating-point computations and linkage on page 5-212.
5.50 Types of floating-point linkage on page 5-213.
5.51 Compiler options for floating-point linkage and computations on page 5-214.

Related references
5.52 Floating-point linkage and computational requirements of compiler options on page 5-216.
5.53 Processors and their implicit Floating-Point Units (FPUs) on page 5-218.
8.43 --cpu=name on page 8-368.
8.42 --cpu=list on page 8-367.
8.88 --fpu=list on page 8-417.
8.89 --fpu=name on page 8-418.

Related information
Application Binary Interface (ABI) for the ARM Architecture.

5.44 Vector Floating-Point (VFP) architectures

ARM supports several versions of the VFP architecture, implemented in different ARM architectures.
VFP architectures provide both single-precision and double-precision operations, and many operations
can take place in either scalar or vector form. The supported versions include:
• VFPv2, implemented in:
— VFP10 revision 1, as provided by the ARM10200E processor.
— VFP9-S, available as a separately licensable option for the ARM926E, ARM946E and ARM966E
processors.
— VFP11, as provided in the ARM1136JF-S, ARM1176JZF-S and ARM11 MPCore processors.
• VFPv3, implemented on ARM architecture v7 and later, for example, the Cortex-A8 processor.
VFPv3 is backwards compatible with VFPv2, except that it cannot trap floating-point exceptions. It
requires no software support code. VFPv3 has 32 double-precision registers.
• VFPv3_fp16, VFPv3 with half-precision extensions. These extensions provide conversion functions
between half-precision floating-point numbers and single-precision floating-point numbers, in both
directions. They can be implemented with any Advanced SIMD and VFP implementation that
supports single-precision floating-point numbers.
• VFPv3-D16, an implementation of VFPv3 that provides 16 double-precision registers. It is
implemented on ARM architecture v7 processors that support VFP without NEON technology.
• VFPv3U, an implementation of VFPv3 that can trap floating-point exceptions. It requires software
support code.
• VFPv4, implemented on ARM architecture v7 and later, for example, the Cortex-A7 processor.
VFPv4 has 32 double-precision registers. VFPv4 adds both half-precision extensions and fused
multiply-add instructions to the features of VFPv3.
• VFPv4-D16, an implementation of VFPv4 that provides 16 double-precision registers. It is
implemented on ARM architecture v7 processors that support VFP without NEON technology.
• VFPv4U, an implementation of VFPv4 that can trap floating-point exceptions. It requires software
support code.
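The differences between the variants listed above can be summarized as data. This sketch encodes only the properties that the list states: register count, half-precision extensions, fused multiply-add, and the ability to trap floating-point exceptions. The VFPv2 register count is taken from Table 5-14. The trapping variants VFPv3U and VFPv4U are omitted because their register counts are not given here. The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Properties of each VFP variant as stated in the list above. */
struct vfp_variant {
    const char *name;
    int d_registers;     /* number of double-precision registers */
    int half_precision;  /* half-precision conversion extensions */
    int fused_mac;       /* fused multiply-add instructions */
    int can_trap;        /* can trap floating-point exceptions */
};

static const struct vfp_variant vfp_variants[] = {
    /* name         D regs  fp16  FMA  traps */
    { "VFPv2",      16,     0,    0,   1 },
    { "VFPv3",      32,     0,    0,   0 },
    { "VFPv3_fp16", 32,     1,    0,   0 },
    { "VFPv3-D16",  16,     0,    0,   0 },
    { "VFPv4",      32,     1,    1,   0 },
    { "VFPv4-D16",  16,     1,    1,   0 },
};

/* Returns the entry for name, or NULL if name is not in the table. */
const struct vfp_variant *find_vfp_variant(const char *name)
{
    size_t i;
    for (i = 0; i < sizeof vfp_variants / sizeof vfp_variants[0]; i++) {
        if (strcmp(vfp_variants[i].name, name) == 0)
            return &vfp_variants[i];
    }
    return NULL;
}
```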

Note
Particular implementations of the VFP architecture might provide additional implementation-specific
functionality. For example, the VFP coprocessor hardware might include extra registers for describing
exceptional conditions. This extra functionality is known as sub-architecture functionality.

Related concepts
5.41 Compiler support for floating-point arithmetic on page 5-200.
5.42 Default selection of hardware or software floating-point support on page 5-202.
5.43 Example of hardware and software support differences for floating-point arithmetic on page 5-203.
5.45 Limitations on hardware handling of floating-point arithmetic on page 5-207.
5.46 Implementation of Vector Floating-Point (VFP) support code on page 5-208.
5.47 Compiler and library support for half-precision floating-point numbers on page 5-210.
5.48 Half-precision floating-point number format on page 5-211.
5.49 Compiler support for floating-point computations and linkage on page 5-212.
5.50 Types of floating-point linkage on page 5-213.
5.51 Compiler options for floating-point linkage and computations on page 5-214.

Related references
5.52 Floating-point linkage and computational requirements of compiler options on page 5-216.
5.53 Processors and their implicit Floating-Point Units (FPUs) on page 5-218.

Related information
ARM Application Note 133 - Using VFP with RVDS.

5.45 Limitations on hardware handling of floating-point arithmetic

ARM Vector Floating-Point (VFP) coprocessors are optimized to process well-defined floating-point
code in hardware. Arithmetic operations that occur too rarely, or that are too complex, are not handled in
hardware.
Instead, these cases must be handled in software. This approach minimizes the amount of coprocessor
hardware required and reduces costs.
Code provided to handle cases that the VFP hardware is unable to process is known as VFP support code.
When the VFP hardware is unable to deal with a situation directly, it bounces the case to VFP support
code for further processing. For example, VFP support code might be called to process any of the
following:
• Floating-point operations involving NaNs.
• Floating-point operations involving denormals.
• Floating-point overflow.
• Floating-point underflow.
• Inexact results.
• Division-by-zero errors.
• Invalid operations.
When support code is in place, the VFP supports a fully IEEE 754-compliant floating-point model.
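The first two cases in the list depend only on the operands, so they can be recognized before an operation is performed. The following sketch uses the C99 fpclassify() macro to flag NaN and denormal (subnormal) single-precision operands. Whether a particular VFP implementation actually bounces such an operand depends on the implementation and its mode; for example, RunFast mode flushes denormals to zero instead. The function name is hypothetical.

```c
#include <math.h>

/* Returns nonzero if x is a NaN or a denormal (subnormal) value, the two
 * operand-dependent cases in the list above. This only classifies the
 * operand; it does not predict what a particular VFP implementation does. */
int is_uncommon_operand(float x)
{
    int category = fpclassify(x);
    return category == FP_NAN || category == FP_SUBNORMAL;
}
```

The remaining cases (overflow, underflow, inexact results, division by zero and invalid operations) depend on the result of an operation. They correspond to the IEEE 754 exception flags, which C99 exposes through <fenv.h>.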

Related concepts
5.41 Compiler support for floating-point arithmetic on page 5-200.
5.42 Default selection of hardware or software floating-point support on page 5-202.
5.43 Example of hardware and software support differences for floating-point arithmetic on page 5-203.
5.44 Vector Floating-Point (VFP) architectures on page 5-205.
5.46 Implementation of Vector Floating-Point (VFP) support code on page 5-208.
5.47 Compiler and library support for half-precision floating-point numbers on page 5-210.
5.48 Half-precision floating-point number format on page 5-211.
5.49 Compiler support for floating-point computations and linkage on page 5-212.
5.50 Types of floating-point linkage on page 5-213.
5.51 Compiler options for floating-point linkage and computations on page 5-214.

Related references
5.52 Floating-point linkage and computational requirements of compiler options on page 5-216.
5.53 Processors and their implicit Floating-Point Units (FPUs) on page 5-218.

Related information
Institute of Electrical and Electronics Engineers.

5.46 Implementation of Vector Floating-Point (VFP) support code

For convenience, an implementation of VFP support code that can be used in your system is provided
with your installation of the ARM compilation tools.
The support code comprises:
• The libraries vfpsupport.l and vfpsupport.b for emulating VFP operations bounced by the
hardware.
These files are located in the \lib\armlib subdirectory of your installation.
• C source code and assembly language source code implementing top-level, second-level and user-
level interrupt handlers.
These files can be found in the vfpsupport subdirectory of the Examples directory of your ARM
compilation tools distribution at install_directory\Examples\...\vfpsupport.
These files might require modification to integrate VFP support with your operating system.
• C source code and assembly language source code for accessing sub-architecture functionality of VFP
coprocessors.
These files are located in the vfpsupport subdirectory of the Examples directory of your ARM
compilation tools distribution at install_directory\Examples\...\vfpsupport.
When the VFP coprocessor bounces an instruction, an Undefined Instruction exception is signaled to the
processor and the VFP support code is entered through the Undefined Instruction vector. The top-level
and second-level interrupt handlers perform some initial processing of the signal, for example, ensuring
that the exception is not caused by an illegal instruction. The user-level interrupt handler then calls the
appropriate library function in the library vfpsupport.l or vfpsupport.b to emulate the VFP operation
in software.
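The initial check made by the top-level handler can be illustrated in C. In ARM state, VFP instructions are coprocessor instructions that use coprocessor numbers 10 and 11, so a handler can inspect the faulting instruction word before passing it on. The following is a simplified sketch under that assumption only. It ignores Thumb state, the function name is hypothetical, and it is not the handler code supplied in the vfpsupport examples.

```c
#include <stdint.h>

/* Returns nonzero if the ARM-state instruction word insn is a coprocessor
 * instruction addressed to coprocessor 10 or 11 (VFP). */
int is_vfp_instruction(uint32_t insn)
{
    uint32_t cond   = insn >> 28;
    uint32_t coproc = (insn >> 8) & 0xFu;   /* coprocessor number, bits[11:8] */
    int is_coproc;

    if (cond == 0xFu)                       /* unconditional space: not handled here */
        return 0;

    /* LDC/STC/MCRR/MRRC: bits[27:25] == 0b110
     * CDP/MCR/MRC:       bits[27:24] == 0b1110 */
    is_coproc = ((insn >> 25) & 0x7u) == 0x6u ||
                ((insn >> 24) & 0xFu) == 0xEu;

    return is_coproc && (coproc == 10u || coproc == 11u);
}
```

For example, 0xEE301A20 (VADD.F32 s2, s0, s1, as in the earlier disassembly) is accepted, while 0xE1A00001 (MOV r0, r1) is rejected.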
Note
You do not have to use VFP support code:
• When building with --fpmode=std.
• When no trapping of uncommon or exceptional cases is required.
• When the VFP coprocessor is operating in RunFast mode.
• When the hardware coprocessor is VFPv3-based, other than VFPv3U, which can trap floating-point
exceptions and requires support code.

Related concepts
5.41 Compiler support for floating-point arithmetic on page 5-200.
5.42 Default selection of hardware or software floating-point support on page 5-202.
5.43 Example of hardware and software support differences for floating-point arithmetic on page 5-203.
5.44 Vector Floating-Point (VFP) architectures on page 5-205.
5.45 Limitations on hardware handling of floating-point arithmetic on page 5-207.
5.47 Compiler and library support for half-precision floating-point numbers on page 5-210.
5.48 Half-precision floating-point number format on page 5-211.
5.49 Compiler support for floating-point computations and linkage on page 5-212.
5.50 Types of floating-point linkage on page 5-213.
5.51 Compiler options for floating-point linkage and computations on page 5-214.

Related information
ARM Application Note 133 - Using VFP with RVDS.

5.47 Compiler and library support for half-precision floating-point numbers

Half-precision is a floating-point format that occupies 16 bits.
Half-precision floating-point numbers are provided by:
• The Vector Floating-Point (VFP) Version 4 architecture.
• An optional extension to the VFPv3 architecture.
If a VFP coprocessor is not available, or if a VFPv3 coprocessor is used that does not have the extension,
half-precision floating-point numbers are supported through the floating-point library fplib.
Half-precision floating-point numbers can only be used when selected with the --fp16_format=format
compiler command-line option.
The C++ name mangling for the half-precision data type is specified in the C++ generic Application
Binary Interface (ABI).

Related concepts
5.41 Compiler support for floating-point arithmetic on page 5-200.
5.42 Default selection of hardware or software floating-point support on page 5-202.
5.43 Example of hardware and software support differences for floating-point arithmetic on page 5-203.
5.44 Vector Floating-Point (VFP) architectures on page 5-205.
5.45 Limitations on hardware handling of floating-point arithmetic on page 5-207.
5.46 Implementation of Vector Floating-Point (VFP) support code on page 5-208.
5.48 Half-precision floating-point number format on page 5-211.
5.49 Compiler support for floating-point computations and linkage on page 5-212.
5.50 Types of floating-point linkage on page 5-213.
5.51 Compiler options for floating-point linkage and computations on page 5-214.

Related information
C++ ABI for the ARM Architecture.
Floating-point Support.

5.48 Half-precision floating-point number format

The half-precision floating-point formats available are ieee and alternative. In both formats, the basic
layout of the 16-bit number is the same.
The half-precision floating-point format is as follows:

Bits:   [15]   [14:10]   [9:0]
Field:   S        E        T

Figure 5-1 Half-precision floating-point format

Where:
S (bit[15]): Sign bit
E (bits[14:10]): Biased exponent
T (bits[9:0]): Mantissa.

The meanings of these fields depend on the format that is selected.

The IEEE half-precision format is as follows:
IF E==31:
IF T==0: Value = Signed infinity
IF T!=0: Value = NaN
T[9] determines Quiet or Signalling:
0: Signalling NaN
1: Quiet NaN
IF 0<E<31:
Value = (-1)^S x 2^(E-15) x (1 + (2^(-10) x T))
IF E==0:
IF T==0: Value = Signed zero
IF T!=0: Value = (-1)^S x 2^(-14) x (0 + (2^(-10) x T))

The alternative half-precision format is as follows:

IF 0<E<32:
Value = (-1)^S x 2^(E-15) x (1 + (2^(-10) x T))
IF E==0:
IF T==0: Value = Signed zero
IF T!=0: Value = (-1)^S x 2^(-14) x (0 + (2^(-10) x T))
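The two sets of formulas above can be checked with a small decoder. This sketch, with the hypothetical function name half_to_double(), extracts S, E and T from a 16-bit value and applies either the IEEE formulas or the alternative formulas. It does not distinguish quiet and signalling NaNs. Powers of two are applied by repeated doubling or halving so that the sketch does not need the maths library.

```c
#include <math.h>
#include <stdint.h>

/* Multiplies x by 2^e using exact doubling/halving steps. */
static double scale_by_pow2(double x, int e)
{
    while (e > 0) { x *= 2.0; e--; }
    while (e < 0) { x *= 0.5; e++; }
    return x;
}

/* Decodes a half-precision value. alternative == 0 selects the IEEE
 * format; nonzero selects the alternative format (no infinities/NaNs). */
double half_to_double(uint16_t h, int alternative)
{
    int s = (h >> 15) & 0x1;   /* S: bit[15]      */
    int e = (h >> 10) & 0x1F;  /* E: bits[14:10]  */
    int t = h & 0x3FF;         /* T: bits[9:0]    */
    double sign = s ? -1.0 : 1.0;

    if (e == 31 && !alternative)
        return (t == 0) ? sign * INFINITY : NAN;   /* signed infinity or NaN */
    if (e == 0)                                    /* signed zero or denormal */
        return sign * scale_by_pow2(t / 1024.0, -14);
    /* 0 < E < 31 (IEEE) or 0 < E < 32 (alternative): normalized value */
    return sign * scale_by_pow2(1.0 + t / 1024.0, e - 15);
}
```

For example, 0x7C00 decodes to +infinity in the IEEE format but to 65536 in the alternative format, because the alternative format uses E==31 for ordinary numbers.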

Related concepts
5.41 Compiler support for floating-point arithmetic on page 5-200.
5.42 Default selection of hardware or software floating-point support on page 5-202.
5.43 Example of hardware and software support differences for floating-point arithmetic on page 5-203.
5.44 Vector Floating-Point (VFP) architectures on page 5-205.
5.45 Limitations on hardware handling of floating-point arithmetic on page 5-207.
5.46 Implementation of Vector Floating-Point (VFP) support code on page 5-208.
5.47 Compiler and library support for half-precision floating-point numbers on page 5-210.
5.49 Compiler support for floating-point computations and linkage on page 5-212.
5.50 Types of floating-point linkage on page 5-213.
5.51 Compiler options for floating-point linkage and computations on page 5-214.

Related information
Institute of Electrical and Electronics Engineers.

5.49 Compiler support for floating-point computations and linkage

It is important to understand the difference between floating-point computations and floating-point
linkage.
Floating-point computations are performed by hardware coprocessor instructions or by library functions.
Floating-point linkage is concerned with how floating-point arguments and return values are passed
between functions.

Related concepts
5.41 Compiler support for floating-point arithmetic on page 5-200.
5.42 Default selection of hardware or software floating-point support on page 5-202.
5.43 Example of hardware and software support differences for floating-point arithmetic on page 5-203.
5.44 Vector Floating-Point (VFP) architectures on page 5-205.
5.45 Limitations on hardware handling of floating-point arithmetic on page 5-207.
5.46 Implementation of Vector Floating-Point (VFP) support code on page 5-208.
5.47 Compiler and library support for half-precision floating-point numbers on page 5-210.
5.48 Half-precision floating-point number format on page 5-211.
5.50 Types of floating-point linkage on page 5-213.
5.51 Compiler options for floating-point linkage and computations on page 5-214.

5.50 Types of floating-point linkage

Different types of floating-point linkage provide different benefits.
The types of floating-point linkage are:
• Software floating-point linkage.
• Hardware floating-point linkage.
Software floating-point linkage means that the parameters and return value for a function are passed
using the ARM integer registers r0 to r3 and the stack.
Hardware floating-point linkage uses the Vector Floating-Point (VFP) coprocessor registers to pass the
arguments and return value.
The benefit of software floating-point linkage is that the resulting code does not depend on the presence
of a VFP coprocessor, so it can run on a processor with or without one.
The benefit of hardware floating-point linkage is that it is more efficient than software floating-point
linkage, because floating-point arguments and return values stay in the VFP registers used by the
floating-point instructions instead of being copied to and from the integer registers at each call.
However, it requires a VFP coprocessor.
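With software floating-point linkage, a floating-point argument is passed in integer registers as its raw IEEE 754 bit pattern: a float in one register, such as r0, and a double in a pair of registers, such as r0 and r1. With hardware floating-point linkage, the same argument would be passed in a VFP register such as s0 or d0. The following sketch shows the bit patterns involved. The function names are hypothetical, and it assumes a little-endian target, where the low-order word of a double goes in the lower-numbered register.

```c
#include <stdint.h>
#include <string.h>

/* The 32-bit pattern that carries a float in a core register. */
uint32_t float_bits_in_core_register(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* well-defined type punning */
    return bits;
}

/* The two 32-bit words that carry a double in a core register pair.
 * Assumes a little-endian target: lo goes in the lower-numbered register. */
void double_words_in_core_registers(double d, uint32_t *lo, uint32_t *hi)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    *lo = (uint32_t)bits;
    *hi = (uint32_t)(bits >> 32);
}
```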

Related concepts
5.41 Compiler support for floating-point arithmetic on page 5-200.
5.42 Default selection of hardware or software floating-point support on page 5-202.
5.43 Example of hardware and software support differences for floating-point arithmetic on page 5-203.
5.44 Vector Floating-Point (VFP) architectures on page 5-205.
5.45 Limitations on hardware handling of floating-point arithmetic on page 5-207.
5.46 Implementation of Vector Floating-Point (VFP) support code on page 5-208.
5.47 Compiler and library support for half-precision floating-point numbers on page 5-210.
5.48 Half-precision floating-point number format on page 5-211.
5.49 Compiler support for floating-point computations and linkage on page 5-212.
5.51 Compiler options for floating-point linkage and computations on page 5-214.

Related references
5.52 Floating-point linkage and computational requirements of compiler options on page 5-216.
5.53 Processors and their implicit Floating-Point Units (FPUs) on page 5-218.

Related information
Procedure Call Standard for the ARM Architecture.

5.51 Compiler options for floating-point linkage and computations

Compiler options determine the type of floating-point linkage and floating-point computations.
Use the following table to find the compiler command-line options that provide the combination of
floating-point linkage and floating-point computations you require.

Table 5-13 Compiler options for floating-point linkage and floating-point computations

Linkage                       Computations
Hardware FP    Software FP    Hardware FP    Software FP        Compiler options
linkage        linkage        coprocessor    library (fplib)

No             Yes            No             Yes                --fpu=softvfp
                                                                --apcs=/softfp

No             Yes            Yes            No                 --fpu=softvfp+vfpv2
                                                                --fpu=softvfp+vfpv3
                                                                --fpu=softvfp+vfpv3_fp16
                                                                --fpu=softvfp+vfpv3_d16
                                                                --fpu=softvfp+vfpv3_d16_fp16
                                                                --fpu=softvfp+vfpv4
                                                                --fpu=softvfp+vfpv4_d16
                                                                --fpu=softvfp+fpv4-sp
                                                                --apcs=/softfp

Yes            No             Yes            No                 --fpu=vfp
                                                                --fpu=vfpv2
                                                                --fpu=vfpv3
                                                                --fpu=vfpv3_fp16
                                                                --fpu=vfpv3_d16
                                                                --fpu=vfpv3_d16_fp16
                                                                --fpu=vfpv4
                                                                --fpu=vfpv4_d16
                                                                --fpu=fpv4-sp
                                                                --apcs=/hardfp

softvfp specifies software floating-point linkage. When software floating-point linkage is used, either:

• The calling function and the called function must be compiled using one of the options --fpu softvfp,
--fpu softvfp+vfpv2, --fpu softvfp+vfpv3, --fpu softvfp+vfpv3_fp16, --fpu softvfp+vfpv3_d16,
--fpu softvfp+vfpv3_d16_fp16, --fpu softvfp+vfpv4, --fpu softvfp+vfpv4_d16, or
--fpu softvfp+fpv4-sp.
• The calling function and the called function must be declared using the __softfp keyword.
Each of these --fpu softvfp options specifies software floating-point linkage across the whole file. In
contrast, the __softfp keyword enables software floating-point linkage to be specified on a
function-by-function basis.
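Because __softfp is specific to armcc, source that uses it on individual functions might also need to build with other compilers. The following sketch wraps the keyword in a hypothetical macro, SOFTFP_LINKAGE, that expands to __softfp only when __ARMCC_VERSION is defined and __clang__ is not (armclang defines both). It assumes that __softfp is written before the return type, and it applies the keyword to both the declaration and the definition so that callers and the callee agree on the linkage. The function name scale_value() is also hypothetical.

```c
/* SOFTFP_LINKAGE: hypothetical helper macro. Expands to the armcc
 * __softfp keyword under armcc and to nothing under other compilers. */
#if defined(__ARMCC_VERSION) && !defined(__clang__)
#define SOFTFP_LINKAGE __softfp
#else
#define SOFTFP_LINKAGE
#endif

/* Declaration seen by callers: uses software floating-point linkage
 * under armcc, whatever --fpu option the rest of the file uses. */
SOFTFP_LINKAGE float scale_value(float x, float factor);

/* Definition: must use the same linkage as the declaration. */
SOFTFP_LINKAGE float scale_value(float x, float factor)
{
    return x * factor;
}
```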
Note
Rather than having separate compiler options to select the type of floating-point linkage you require and
the type of floating-point computations you require, you use one compiler option, --fpu, to select both.

For example, --fpu=softvfp+vfpv2 selects software floating-point linkage, and a hardware coprocessor
for the computations. Whenever you use softvfp, you are specifying software floating-point linkage.

If you use the --fpu option, you must know the VFP architecture version implemented in the target
processor. An alternative to --fpu=softvfp+... is --apcs=/softfp. This gives software linkage with
whatever VFP architecture version is implied by --cpu. --apcs=/softfp and --apcs=/hardfp are
alternative ways of requesting the integer or floating-point variant of the Procedure Call Standard for the
ARM Architecture (AAPCS).
To use hardware floating-point linkage when targeting ARM Linux, you must explicitly specify a --fpu
option that implies hardware floating-point linkage, for example, --fpu=vfpv3, or compile with
--apcs=/hardfp. However, the ARM Linux ABI does not support hardware floating-point linkage, so the
compiler issues a warning when you do this.

Related concepts
5.41 Compiler support for floating-point arithmetic on page 5-200.
5.42 Default selection of hardware or software floating-point support on page 5-202.
5.43 Example of hardware and software support differences for floating-point arithmetic on page 5-203.
5.44 Vector Floating-Point (VFP) architectures on page 5-205.
5.45 Limitations on hardware handling of floating-point arithmetic on page 5-207.
5.46 Implementation of Vector Floating-Point (VFP) support code on page 5-208.
5.47 Compiler and library support for half-precision floating-point numbers on page 5-210.
5.48 Half-precision floating-point number format on page 5-211.
5.49 Compiler support for floating-point computations and linkage on page 5-212.
5.50 Types of floating-point linkage on page 5-213.

Related references
5.52 Floating-point linkage and computational requirements of compiler options on page 5-216.
5.53 Processors and their implicit Floating-Point Units (FPUs) on page 5-218.
8.6 --apcs=qualifier...qualifier on page 8-322.
8.89 --fpu=name on page 8-418.
8.115 --library_interface=lib on page 8-446.
10.15 __softfp on page 10-616.
10.100 #pragma softfp_linkage, #pragma no_softfp_linkage on page 10-707.

Related information
Procedure Call Standard for the ARM Architecture.

5.52 Floating-point linkage and computational requirements of compiler options

There are various valid combinations of FPU options and processors.
The following table sets out the FPU options, and their capabilities and requirements.

Table 5-14 FPU-option capabilities and requirements

FPU name                Hardware    d0-d15      d16-d31     VFP           Half        Single      Double
                        FP linkage  registers   registers   instructions  precision   precision   precision

softvfp                 No          No          No          No            No          No          No
softvfp+vfpv2           No          Yes         No          Yes           No          Yes         Yes
softvfp+vfpv3           No          Yes         Yes         Yes           No          Yes         Yes
softvfp+vfpv3_fp16      No          Yes         Yes         Yes           Yes         Yes         Yes
softvfp+vfpv3_d16       No          Yes         No          Yes           No          Yes         Yes
softvfp+vfpv3_d16_fp16  No          Yes         No          Yes           Yes         Yes         Yes
softvfp+vfpv3_sp_d16    No          Yes         No          Yes           Yes         Yes         No
softvfp+vfpv4           No          Yes         Yes         Yes           Yes         Yes         Yes
softvfp+vfpv4_d16       No          Yes         No          Yes           Yes         Yes         Yes
softvfp+vfpv4_sp_d16    No          Yes         No          Yes           Yes         Yes         No
softvfp+fpv4-sp         No          Yes         No          Yes           Yes         Yes         No
vfp                     Yes         Yes         No          Yes           No          Yes         Yes
vfpv2                   Yes         Yes         No          Yes           No          Yes         Yes
vfpv3                   Yes         Yes         Yes         Yes           No          Yes         Yes
vfpv3_fp16              Yes         Yes         Yes         Yes           Yes         Yes         Yes
vfpv3_d16               Yes         Yes         No          Yes           No          Yes         Yes
vfpv3_d16_fp16          Yes         Yes         No          Yes           Yes         Yes         Yes
vfpv3_sp_d16            Yes         Yes         No          Yes           Yes         Yes         No
vfpv4                   Yes         Yes         Yes         Yes           Yes         Yes         Yes
vfpv4_d16               Yes         Yes         No          Yes           Yes         Yes         Yes
vfpv4_sp_d16            Yes         Yes         No          Yes           Yes         Yes         No
fpv4-sp                 Yes         Yes         No          Yes           Yes         Yes         No

Note
You can specify the floating-point linkage, independently of the VFP architecture, with --apcs.
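Table 5-14 can also be encoded as data, for example in a build script or test that checks option combinations. This sketch holds a few rows of the table and uses them to decide whether objects built with two different --fpu options agree on floating-point linkage. The type and function names are hypothetical, and the check deliberately ignores the __softfp keyword and whether the target actually implements the instructions that each option allows.

```c
#include <stddef.h>
#include <string.h>

/* A few rows of Table 5-14; 0/1 mirror the No/Yes entries. */
struct fpu_option {
    const char *name;      /* value passed to --fpu */
    int hard_linkage;      /* hardware FP linkage */
    int d16_d31;           /* d16-d31 registers available */
    int vfp_instructions;  /* VFP instructions generated */
    int half;              /* half-precision support */
    int dbl;               /* double-precision support */
};

static const struct fpu_option fpu_options[] = {
    /* name              link  d16-31 VFP  half  double */
    { "softvfp",          0,    0,     0,   0,    0 },
    { "softvfp+vfpv2",    0,    0,     1,   0,    1 },
    { "softvfp+vfpv4",    0,    1,     1,   1,    1 },
    { "softvfp+fpv4-sp",  0,    0,     1,   1,    0 },
    { "vfpv3_d16",        1,    0,     1,   0,    1 },
    { "vfpv3_fp16",       1,    1,     1,   1,    1 },
    { "vfpv4",            1,    1,     1,   1,    1 },
    { "fpv4-sp",          1,    0,     1,   1,    0 },
};

/* Returns the row for name, or NULL if name is not in this subset. */
const struct fpu_option *find_fpu_option(const char *name)
{
    size_t i;
    for (i = 0; i < sizeof fpu_options / sizeof fpu_options[0]; i++) {
        if (strcmp(fpu_options[i].name, name) == 0)
            return &fpu_options[i];
    }
    return NULL;
}

/* Returns 1 if --fpu=a and --fpu=b imply the same kind of floating-point
 * linkage, 0 if they do not or if either name is unknown. */
int same_fp_linkage(const char *a, const char *b)
{
    const struct fpu_option *x = find_fpu_option(a);
    const struct fpu_option *y = find_fpu_option(b);
    return x != NULL && y != NULL && x->hard_linkage == y->hard_linkage;
}
```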

Related concepts
5.41 Compiler support for floating-point arithmetic on page 5-200.

5.42 Default selection of hardware or software floating-point support on page 5-202.
5.43 Example of hardware and software support differences for floating-point arithmetic on page 5-203.
5.44 Vector Floating-Point (VFP) architectures on page 5-205.
5.45 Limitations on hardware handling of floating-point arithmetic on page 5-207.
5.46 Implementation of Vector Floating-Point (VFP) support code on page 5-208.
5.47 Compiler and library support for half-precision floating-point numbers on page 5-210.
5.48 Half-precision floating-point number format on page 5-211.
5.49 Compiler support for floating-point computations and linkage on page 5-212.
5.50 Types of floating-point linkage on page 5-213.
5.51 Compiler options for floating-point linkage and computations on page 5-214.

Related references
5.53 Processors and their implicit Floating-Point Units (FPUs) on page 5-218.
8.6 --apcs=qualifier...qualifier on page 8-322.
8.89 --fpu=name on page 8-418.

5.53 Processors and their implicit Floating-Point Units (FPUs)

Not every ARM processor has an FPU, but every one has an implicit --fpu option.
The following table lists the implicit --fpu option for each processor --cpu option.

Table 5-15 Implicit FPUs of processors

Processor name FPU name

ARM processors designed by ARM Limited

ARM7EJ-S SoftVFP

ARM7TDMI SoftVFP

ARM7TDMI-S SoftVFP

ARM720T SoftVFP

ARM9E-S SoftVFP

ARM9TDMI SoftVFP

ARM920T SoftVFP

ARM922T SoftVFP

ARM926EJ-S SoftVFP

ARM946E-S SoftVFP

ARM966E-S SoftVFP

ARM1020E SoftVFP

ARM1136J-S SoftVFP

ARM1136J-S-rev1 SoftVFP

ARM1136JF-S VFPv2

ARM1136JF-S-rev1 VFPv2

ARM1156T2-S SoftVFP

ARM1176JZ-S SoftVFP

ARM1176JZF-S VFPv2

Cortex-A5 SoftVFP

Cortex-A5.vfp VFPv4_D16

Cortex-A5.neon VFPv4

Cortex-A7 VFPv4

Cortex-A7.no_neon VFPv4_D16

Cortex-A7.no_neon.no_vfp SoftVFP

Cortex-A8 VFPv3

Cortex-A8.no_neon SoftVFP

Cortex-A8NoNeon SoftVFP

Cortex-A9 VFPv3_FP16

Cortex-A9.no_neon VFPv3_D16_FP16

Cortex-A9.no_neon.no_vfp SoftVFP

Cortex-A12 VFPv4

Cortex-A12.no_neon.no_vfp SoftVFP

Cortex-A15 VFPv4

Cortex-A15.no_neon VFPv4_D16

Cortex-A15.no_neon.no_vfp SoftVFP

Cortex-A17 VFPv4

Cortex-A17.no_neon.no_vfp SoftVFP

Cortex-M0 SoftVFP

Cortex-M0plus SoftVFP

Cortex-M1 SoftVFP

Cortex-M1.os_extension SoftVFP

Cortex-M1.no_os_extension SoftVFP

Cortex-M3 SoftVFP

Cortex-M3-rev0 SoftVFP

Cortex-M4 SoftVFP

Cortex-M4.fp.sp FPv4-SP

Cortex-M7 SoftVFP

Cortex-M7.fp.sp FPv5-SP

Cortex-M7.fp.dp FPv5_D16

Cortex-R4 SoftVFP

Cortex-R4F VFPv3_D16

Cortex-R5 SoftVFP

Cortex-R5-rev1 SoftVFP

Cortex-R5F VFPv3_D16

Cortex-R5F-rev1 VFPv3_D16

Cortex-R5F-rev1.sp VFPv3_SP_D16

Cortex-R7 VFPv3_D16_FP16

Cortex-R7.no_vfp SoftVFP

Cortex-R8 VFPv3_D16_FP16

Cortex-R8.no_vfp SoftVFP

MPCore VFPv2

MPCore.no_vfp SoftVFP

MPCoreNoVFP SoftVFP

SC000 SoftVFP

SC300 SoftVFP

ARM processors designed by ARM licensees

PJ4 VFPv3_D16

PJ4.no_vfp SoftVFP

QSP VFPv3_FP16

QSP.no_neon VFPv3_FP16

QSP.no_neon.no_vfp SoftVFP

Note
You can:
• Specify a different FPU with --fpu.
• Specify the floating-point linkage, independently of the FPU architecture, with --apcs.
• Display the complete expanded command line, including the FPU, with --echo.
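Table 5-15 is effectively a lookup from --cpu name to implicit FPU. The following sketch encodes a few of its rows. The function name is hypothetical, the --cpu names are matched exactly as written in the table, and --echo remains the authoritative way to see what the compiler selected.

```c
#include <stddef.h>
#include <string.h>

/* A subset of Table 5-15: --cpu name and its implicit FPU. */
static const struct {
    const char *cpu;
    const char *fpu;
} implicit_fpus[] = {
    { "ARM926EJ-S",      "SoftVFP"    },
    { "ARM1136JF-S",     "VFPv2"      },
    { "Cortex-A8",       "VFPv3"      },
    { "Cortex-A9",       "VFPv3_FP16" },
    { "Cortex-A15",      "VFPv4"      },
    { "Cortex-M4",       "SoftVFP"    },
    { "Cortex-M4.fp.sp", "FPv4-SP"    },
    { "Cortex-R5F",      "VFPv3_D16"  },
};

/* Returns the implicit FPU for cpu, or NULL if cpu is not in this subset. */
const char *implicit_fpu(const char *cpu)
{
    size_t i;
    for (i = 0; i < sizeof implicit_fpus / sizeof implicit_fpus[0]; i++) {
        if (strcmp(implicit_fpus[i].cpu, cpu) == 0)
            return implicit_fpus[i].fpu;
    }
    return NULL;
}
```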

Related concepts
5.41 Compiler support for floating-point arithmetic on page 5-200.
5.42 Default selection of hardware or software floating-point support on page 5-202.
5.43 Example of hardware and software support differences for floating-point arithmetic on page 5-203.
5.44 Vector Floating-Point (VFP) architectures on page 5-205.
5.45 Limitations on hardware handling of floating-point arithmetic on page 5-207.
5.46 Implementation of Vector Floating-Point (VFP) support code on page 5-208.
5.47 Compiler and library support for half-precision floating-point numbers on page 5-210.
5.48 Half-precision floating-point number format on page 5-211.
5.49 Compiler support for floating-point computations and linkage on page 5-212.
5.50 Types of floating-point linkage on page 5-213.
5.51 Compiler options for floating-point linkage and computations on page 5-214.

Related references
5.52 Floating-point linkage and computational requirements of compiler options on page 5-216.
8.6 --apcs=qualifier...qualifier on page 8-322.
8.71 --echo on page 8-399.
8.89 --fpu=name on page 8-418.

5.54 Integer division-by-zero errors in C code

For targets that do not support hardware division instructions (for example, SDIV and UDIV), you can trap
and identify integer division-by-zero errors with the appropriate C library helper functions,
__aeabi_idiv0() and __rt_raise().

Trapping integer division-by-zero errors with __aeabi_idiv0()

You can trap integer division-by-zero errors with the C library helper function __aeabi_idiv0() so that
division by zero returns some standard result, for example, zero.
Integer division is implemented in code through the C library helper functions __aeabi_idiv() and
__aeabi_uidiv(). Both functions check for division by zero.

When either function detects division by zero, it branches to __aeabi_idiv0(). Therefore, to trap
division by zero, you only have to place a breakpoint on __aeabi_idiv0().
The library provides two implementations of __aeabi_idiv0(). The default one does nothing, so if
division by zero is detected, the division function returns zero. However, if you use signal handling, an
alternative implementation is selected that calls __rt_raise(SIGFPE, DIVBYZERO).
If you provide your own version of __aeabi_idiv0(), then the division functions call this function. The
function prototype for __aeabi_idiv0() is:
int __aeabi_idiv0(void);

If __aeabi_idiv0() returns a value, that value is used as the quotient returned by the division function.
On entry into __aeabi_idiv0(), the link register LR contains the address of the instruction after the call
to the __aeabi_idiv() or __aeabi_uidiv() division routine in your application code.
You can identify the offending line in the source code by looking up, in the debugger, the line of C code
at the address given by LR.
If you want to examine parameters and save them for postmortem debugging when trapping
__aeabi_idiv0, you can use the $Super$$ and $Sub$$ mechanism:
1. Prefix __aeabi_idiv0() with $Super$$ to identify the original unpatched function
__aeabi_idiv0().
2. Use __aeabi_idiv0() prefixed with $Super$$ to call the original function directly.
3. Prefix __aeabi_idiv0() with $Sub$$ to identify the new function to be called in place of the
original version of __aeabi_idiv0().
4. Use __aeabi_idiv0() prefixed with $Sub$$ to add processing before or after the original function
__aeabi_idiv0().

The following example shows how to intercept __aeabi_idiv0() using the $Super$$ and $Sub$$
mechanism.
extern int $Super$$__aeabi_idiv0(void);
/* this function is called instead of the original __aeabi_idiv0() */
int $Sub$$__aeabi_idiv0(void)
{
    // insert code to process a divide by zero here,
    // for example, save parameters for postmortem debugging

    // call the original __aeabi_idiv0 function and return its result,
    // which the division function uses as the quotient
    return $Super$$__aeabi_idiv0();
}

Trapping integer division-by-zero errors with __rt_raise()

By default, integer division by zero returns zero. If you want to intercept division by zero, you can re-
implement the C library helper function __rt_raise().
The function prototype for __rt_raise() is:
void __rt_raise(int signal, int type);

If you re-implement __rt_raise(), the library automatically selects the signal-handling version of
__aeabi_idiv0(), which calls __rt_raise(), and includes that version of __aeabi_idiv0() in the final
image.
In that case, when a divide-by-zero error occurs, __aeabi_idiv0() calls __rt_raise(SIGFPE,
DIVBYZERO). Therefore, if you re-implement __rt_raise(), you must check (signal == SIGFPE) &&
(type == DIVBYZERO) to determine if division by zero has occurred.
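The following minimal sketch shows a re-implementation of __rt_raise() that performs this check. It assumes that DIVBYZERO is defined by the ARM C library headers. The fallback definition, the counter div_by_zero_count, and the policy of returning on division by zero while stopping on every other signal are illustrative assumptions, not library behavior. Decide for yourself whether returning from __rt_raise() is acceptable in your application.

```c
#include <signal.h>

/* DIVBYZERO comes from the ARM C library headers. The fallback value is
 * a placeholder (assumption) so that the sketch also compiles on a host
 * toolchain; do not rely on its numeric value. */
#ifndef DIVBYZERO
#define DIVBYZERO 2
#endif

/* Hypothetical counter that records reported integer divisions by zero. */
volatile int div_by_zero_count = 0;

void __rt_raise(int sig, int type)
{
    if ((sig == SIGFPE) && (type == DIVBYZERO)) {
        /* Integer division by zero: record it and return
         * (assumption: returning is acceptable for this signal). */
        div_by_zero_count++;
        return;
    }
    /* Any other run-time error: this sketch stops here. A real
     * application might log the error or reset the system instead. */
    for (;;) {
    }
}
```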

Related information
Run-time ABI for the ARM Architecture.

5.55 Software floating-point division-by-zero errors in C code

Floating-point division-by-zero errors in software can be trapped and identified using a combination of
intrinsics and C library helper functions.
Specifically:
• The __ieee_status intrinsic lets you trap floating-point division-by-zero errors.
• Placing a breakpoint on _fp_trapveneer() lets you identify software floating-point division-by-zero
errors.
• Intercepting _fp_trapveneer() using the $Super$$ and $Sub$$ mechanism lets you save
parameters for debugging.

Related concepts
5.56 About trapping software floating-point division-by-zero errors on page 5-224.
5.57 Identification of software floating-point division-by-zero errors on page 5-225.
5.58 Software floating-point division-by-zero debugging on page 5-227.

5.56 About trapping software floating-point division-by-zero errors

Software floating-point division-by-zero errors can be trapped with the __ieee_status intrinsic.
__ieee_status(FE_IEEE_MASK_ALL_EXCEPT, FE_IEEE_MASK_DIVBYZERO);

This traps any division-by-zero errors in code, and untraps all other exceptions, as illustrated in the
following example:
#include <stdio.h>
#include <fenv.h>
int main(void)
{
    float a, b, c;
    // Trap the Division by Zero exception and untrap all other
    // exceptions:
    __ieee_status(FE_IEEE_MASK_ALL_EXCEPT, FE_IEEE_MASK_DIVBYZERO);
    b = 1.0f;
    c = 0;
    a = b / c;    // trap division-by-zero error
    printf("b / c = %f, ", a);
    return 0;
}

Related concepts
5.55 Software floating-point division-by-zero errors in C code on page 5-223.
5.57 Identification of software floating-point division-by-zero errors on page 5-225.
5.58 Software floating-point division-by-zero debugging on page 5-227.

Related information
__ieee_status().

5.57 Identification of software floating-point division-by-zero errors

You can use the C library helper function _fp_trapveneer() to identify the location of a software
floating-point division-by-zero error.
_fp_trapveneer() is called whenever an exception occurs. On entry into this function, the state of the
registers is unchanged from when the exception occurred. Therefore, to find the address of the function
in the application code that contains the arithmetic operation that resulted in the exception, place a
breakpoint on _fp_trapveneer() and inspect LR.
For example, consider the following C code:
#include <stdio.h>
#include <fenv.h>
int main(void)
{
    float a, b, c;
    // Trap the Division by Zero exception and untrap all other
    // exceptions:
    __ieee_status(FE_IEEE_MASK_ALL_EXCEPT, FE_IEEE_MASK_DIVBYZERO);
    c = 0;
    b = 5.366789;
    a = b / c;    // trap division-by-zero error
    printf("b / c = %f, ", a);
    return 0;
}

This example code is compiled with the following command:

armcc --fpmode ieee_full

The compiled example disassembles to the following code:

main:
0x000080E0 : PUSH {r4,lr}
0x000080E4 : MOV r1,#0x200
0x000080E8 : MOV r0,#0x9f00
0x000080EC : BL __ieee_status ; 0xB9B8
0x000080F0 : MOV r4,#0
0x000080F4 : LDR r0,[pc,#40] ; [0x8124] = 0x891E2153
0x000080F8 : LDR r1,[pc,#40] ; [0x8128] = 0x40157797
0x000080FC : BL __aeabi_d2f ; 0xA948
0x00008100 : MOV r1,r4
0x00008104 : BL __aeabi_fdiv ; 0xB410
0x00008108 : BL __aeabi_f2d ; 0xB388
0x0000810C : MOV r2,r0
0x00008110 : MOV r3,r1
0x00008114 : ADR r0,{pc}+0x18 ; 0x812c
0x00008118 : BL __2printf ; 0x813C
0x0000811C : MOV r0,#0
0x00008120 : POP {r4,pc}
0x00008124 : DCD 0x891E2153
0x00008128 : DCD 0x40157797
0x0000812C : DCD 0x202F2062
0x00008130 : DCD 0x203D2063
0x00008134 : DCD 0x202C6625
0x00008138 : DCD 0x00000000

Placing a breakpoint on _fp_trapveneer() and running the program in the debugger
produces:
> run
Execution stopped at breakpoint 1: S:0x0000BAC8
In _fp_trapveneer (no debug info)
S:0x0000BAC8 PUSH {r12,lr}

Then, inspection of the registers shows:

r0: 0x40ABBCBC r1: 0x00000000 r2: 0x00000000 r3: 0x00000000
r4: 0x0000C1DC r5: 0x0000BD44 r6: 0x00000000 r7: 0x00000000
r8: 0x00000000 r9: 0x00000000 r10: 0x0000BC1C r11: 0x00000000
r12: 0x08000004 SP: 0x0FFFFFF8 LR: 0x00008108 PC: 0x0000BAC8
CPSR: 0x000001D3

The link register LR contains 0x8108, the address of the instruction after BL __aeabi_fdiv, the call that
resulted in the exception.

Related concepts
5.55 Software floating-point division-by-zero errors in C code on page 5-223.
5.56 About trapping software floating-point division-by-zero errors on page 5-224.
5.58 Software floating-point division-by-zero debugging on page 5-227.

5.58 Software floating-point division-by-zero debugging

Parameters for postmortem debugging can be saved by intercepting _fp_trapveneer().
You can use the $Super$$ and $Sub$$ mechanism to intercept all calls to _fp_trapveneer().
For example:
        AREA    foo, CODE
        IMPORT  |$Super$$_fp_trapveneer|
        EXPORT  |$Sub$$_fp_trapveneer|
|$Sub$$_fp_trapveneer|
        ;; Add code to save whatever registers you require here
        ;; Take care not to corrupt any needed registers
        B       |$Super$$_fp_trapveneer|
        END

Related concepts
5.55 Software floating-point division-by-zero errors in C code on page 5-223.
5.56 About trapping software floating-point division-by-zero errors on page 5-224.
5.57 Identification of software floating-point division-by-zero errors on page 5-225.

Related information
Use of $Super$$ and $Sub$$ to patch symbol definitions.

5.59 New language features of C99

The C99 standard (ISO/IEC 9899:1999) introduces several new language features.
These new features include:
• Some features similar to extensions to C90 offered in the GNU compiler, for example, macros with a
variable number of arguments.
Note
The implementations of extensions to C90 in the GNU compiler are not always compatible with the
implementations of similar features in C99.

• Some features available in C++, such as // comments and the ability to mix declarations and
statements.
• Some entirely new features, for example complex numbers, restricted pointers and designated
initializers.
• New keywords and identifiers.
• Extended syntax for the existing C90 language.
A selection of the new C99 language features that might be of interest to developers using them for the
first time is documented in the related topics.
Note
C90 is compatible with Standard C++ in the sense that the language specified by the standard is a subset
of C++, except for a few special cases. New features in the C99 standard mean that C99 is no longer
compatible with C++ in this sense.

Some examples of special cases where the language specified by the C90 standard is not a subset of C++
include support for // comments and merging of the typedef and structure tag namespaces. For example,
in C90 the following code expands to x = a / b - c; because /* hello world */ is deleted, but in C++
and C99 it expands to x = a - c; because everything from // to the end of the first line is deleted:
x = a //* hello world */ b
- c;
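To make the difference observable, the following sketch wraps that expression in a function. It assumes compilation in C99 (or later) mode, where // starts a comment. The function name comment_example is hypothetical. For a = 10, b = 2 and c = 3 it returns 10 - 3 = 7, whereas a strict C90 compiler would compute 10 / 2 - 3 = 2.

```c
/* Hypothetical helper wrapping the expression from the example. */
int comment_example(int a, int b, int c)
{
    int x;
    (void)b;  /* b is commented out when // comments are recognized */
    x = a //* hello world */ b
        - c;
    return x;
}
```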

The following code demonstrates how typedef and the structure tag are treated differently in C (C90 and
C99) and in C++, where the two namespaces are merged:
#include <stdio.h>
typedef int a;
void f(void)
{
    struct a { int x, y; };
    printf("%d\n", (int)sizeof(a));
}

In C90 and C99, this code defines two types with separate names: a is a typedef for int, and struct a is
a structure type containing two int members. sizeof(a) evaluates to sizeof(int).

In C++, a structure type can be addressed using only its tag. This means that when the definition of
struct a is in scope, the name a used on its own refers to the structure type rather than the typedef, so
in C++ sizeof(a) is greater than sizeof(int).

Related concepts
5.61 // comments in C99 and C90 on page 5-231.
5.62 Compound literals in C99 on page 5-232.
5.63 Designated initializers in C99 on page 5-233.
5.64 Hexadecimal floating-point numbers in C99 on page 5-234.
5.65 Flexible array members in C99 on page 5-235.
5.66 __func__ predefined identifier in C99 on page 5-236.
5.67 inline functions in C99 on page 5-237.
5.68 long long data type in C99 and C90 on page 5-238.
5.69 Macros with a variable number of arguments in C99 on page 5-239.
5.70 Mixed declarations and statements in C99 on page 5-240.
5.71 New block scopes for selection and iteration statements in C99 on page 5-241.
5.72 _Pragma preprocessing operator in C99 on page 5-242.
5.73 Restricted pointers in C99 on page 5-243.
5.75 Complex numbers in C99 on page 5-245.

5.60 New library features of C99

The C99 standard introduces several new library features of interest to programmers.
These new features include:
• Some features similar to extensions to the C90 standard libraries offered in UNIX standard libraries,
for example, the snprintf family of functions.
• Some entirely new library features, for example, the standardized floating-point environment offered
in <fenv.h>.
• New libraries, and new macros and functions for existing C90 libraries.
A selection of the new C99 library features that might be of interest to developers using them for the
first time is documented in the related topics.
Note
C90 is compatible with Standard C++ in the sense that the language specified by the standard is a subset
of C++, except for a few special cases. New features in the C99 standard mean that C99 is no longer
compatible with C++ in this sense.

Many library features that are new to C99 are available in C90 and C++. Some require macros such as
USE_C99_ALL or USE_C99_MATH to be defined before the #include.

Related concepts
5.74 Additional <math.h> library functions in C99 on page 5-244.
5.75 Complex numbers in C99 on page 5-245.
5.76 Boolean type and <stdbool.h> in C99 on page 5-246.
5.77 Extended integer types and functions in <inttypes.h> and <stdint.h> in C99 on page 5-247.
5.78 <fenv.h> floating-point environment access in C99 on page 5-248.
5.79 <stdio.h> snprintf family of functions in C99 on page 5-249.
5.80 <tgmath.h> type-generic math macros in C99 on page 5-250.
5.81 <wchar.h> wide character I/O functions in C99 on page 5-251.

5.61 // comments in C99 and C90

In C99 you can use // to indicate the start of a one-line comment, like in C++. In C90 mode you can
use // comments providing you do not specify --strict.

Related concepts
5.59 New language features of C99 on page 5-228.
5.62 Compound literals in C99 on page 5-232.
5.63 Designated initializers in C99 on page 5-233.
5.64 Hexadecimal floating-point numbers in C99 on page 5-234.
5.65 Flexible array members in C99 on page 5-235.
5.66 __func__ predefined identifier in C99 on page 5-236.
5.67 inline functions in C99 on page 5-237.
5.68 long long data type in C99 and C90 on page 5-238.
5.69 Macros with a variable number of arguments in C99 on page 5-239.
5.70 Mixed declarations and statements in C99 on page 5-240.
5.71 New block scopes for selection and iteration statements in C99 on page 5-241.
5.72 _Pragma preprocessing operator in C99 on page 5-242.
5.73 Restricted pointers in C99 on page 5-243.
5.75 Complex numbers in C99 on page 5-245.

Related references
8.176 --strict, --no_strict on page 8-513.
9.7 // comments on page 9-557.

5.62 Compound literals in C99

ISO C99 supports compound literals. A compound literal looks like a cast followed by an initializer.
Its value is an object of the type specified in the cast, containing the elements specified in the initializer.
It is an lvalue.
For example:
int *y = (int []) {1, 2, 3};
int *z = (int [3]) {1};

Note
int *y = (int []) {1, 2, 3}; is accepted by the compiler, but int y[] = (int []) {1, 2, 3};
is not accepted as a high-level (global) initialization.

In the following example source code, the compound literals are:

• (struct T) { 43, "world"}
• &(struct T) {.b = "hello", .a = 47}
• &(struct T) {43, "bye"}
• (int[]){1, 2, 3}
#include <string.h>
struct T
{
    int a;
    char *b;
} t2;
void g(const struct T *t);
void f(void)
{
    int x[10];
    t2 = (struct T) {43, "world"};
    g(&(struct T) {.b = "hello", .a = 47});
    g(&(struct T) {43, "bye"});
    memcpy(x, (int[]){1, 2, 3}, 3 * sizeof(int));
}
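The following self-contained sketch shows two further properties, using hypothetical function names. First, a compound literal can be passed directly where a pointer is expected, so no named temporary is needed. Second, because a compound literal is an lvalue, its elements can be modified. It assumes a C99 compiler.

```c
#include <stddef.h>

/* Sums n ints starting at p. */
static int sum_ints(const int *p, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += p[i];
    return total;
}

/* Passes an unnamed array directly. Inside a function body the
 * compound literal has automatic storage duration. */
int sum_of_literal(void)
{
    return sum_ints((int[]){1, 2, 3}, 3);
}

/* A compound literal is an lvalue, so its elements can be modified. */
int modified_literal(void)
{
    int *q = (int[]){10, 20, 30};
    q[1] = 25;
    return q[0] + q[1] + q[2];
}
```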

Related concepts
5.59 New language features of C99 on page 5-228.
5.61 // comments in C99 and C90 on page 5-231.
5.63 Designated initializers in C99 on page 5-233.
5.64 Hexadecimal floating-point numbers in C99 on page 5-234.
5.65 Flexible array members in C99 on page 5-235.
5.66 __func__ predefined identifier in C99 on page 5-236.
5.67 inline functions in C99 on page 5-237.
5.68 long long data type in C99 and C90 on page 5-238.
5.69 Macros with a variable number of arguments in C99 on page 5-239.
5.70 Mixed declarations and statements in C99 on page 5-240.
5.71 New block scopes for selection and iteration statements in C99 on page 5-241.
5.72 _Pragma preprocessing operator in C99 on page 5-242.
5.73 Restricted pointers in C99 on page 5-243.
5.75 Complex numbers in C99 on page 5-245.

5.63 Designated initializers in C99

In C90, there is no way to initialize specific members of arrays, structures, or unions. C99 supports the
initialization of specific members of an array, structure, or union by either name or subscript through the
use of designated initializers.
For example:
typedef struct
{
char *name;
int rank;
} data;
data vars[10] = { [0].name = "foo", [0].rank = 1,
[1].name = "bar", [1].rank = 2,
[2].name = "baz",
[3].name = "gazonk" };

Members of an aggregate that are not explicitly initialized are initialized to zero by default.
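The zero-initialization rule can be checked directly. The following sketch reuses the data type and vars array from the example, and adds a hypothetical days array to show that array designators can appear in any order. It assumes a C99 compiler.

```c
#include <stddef.h>

typedef struct
{
    char *name;
    int rank;
} data;

/* Only some members are named. The rest are zero-initialized, so
 * vars[2].rank is 0 and vars[4].name is a null pointer. */
data vars[10] = { [0].name = "foo", [0].rank = 1,
                  [1].name = "bar", [1].rank = 2,
                  [2].name = "baz",
                  [3].name = "gazonk" };

/* Designators in any order; elements 0, 1, 3 and 5 are zero. */
int days[6] = { [4] = 29, [2] = 15 };
```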

Related concepts
5.59 New language features of C99 on page 5-228.
5.61 // comments in C99 and C90 on page 5-231.
5.62 Compound literals in C99 on page 5-232.
5.64 Hexadecimal floating-point numbers in C99 on page 5-234.
5.65 Flexible array members in C99 on page 5-235.
5.66 __func__ predefined identifier in C99 on page 5-236.
5.67 inline functions in C99 on page 5-237.
5.68 long long data type in C99 and C90 on page 5-238.
5.69 Macros with a variable number of arguments in C99 on page 5-239.
5.70 Mixed declarations and statements in C99 on page 5-240.
5.71 New block scopes for selection and iteration statements in C99 on page 5-241.
5.72 _Pragma preprocessing operator in C99 on page 5-242.
5.73 Restricted pointers in C99 on page 5-243.
5.75 Complex numbers in C99 on page 5-245.

5.64 Hexadecimal floating-point numbers in C99

C99 supports floating-point numbers that can be written in hexadecimal format.
For example:
float hex_floats(void)
{
return 0x1.fp3; // 1 15/16 * 2^3
}

In hexadecimal format, the exponent is a decimal number that indicates the power of two by which the
significand is multiplied. Therefore 0x1.fp3 = 1.9375 * 8 = 15.5.
C99 also adds the %a and %A conversion specifiers for printf().
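The following sketch, with hypothetical function names and assuming a C99 library, formats a value with %a and parses it back with strtod(), which also accepts hexadecimal floating-point text. The exact spelling that %a produces, for example the choice of leading hexadecimal digit, can vary between libraries. For that reason, only the round-tripped value is meaningful here.

```c
#include <stdio.h>
#include <stdlib.h>

/* 0x1.fp3: significand 0x1.f = 1 + 15/16 = 1.9375, times 2^3 = 15.5. */
double hex_value(void)
{
    return 0x1.fp3;
}

/* Writes v in C99 hexadecimal floating-point form into buf. */
int format_hex(char *buf, size_t size, double v)
{
    return snprintf(buf, size, "%a", v);
}

/* Parses hexadecimal (or decimal) floating-point text. */
double parse_back(const char *s)
{
    return strtod(s, NULL);
}
```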

Related concepts
5.59 New language features of C99 on page 5-228.
5.61 // comments in C99 and C90 on page 5-231.
5.62 Compound literals in C99 on page 5-232.
5.63 Designated initializers in C99 on page 5-233.
5.65 Flexible array members in C99 on page 5-235.
5.66 __func__ predefined identifier in C99 on page 5-236.
5.67 inline functions in C99 on page 5-237.
5.68 long long data type in C99 and C90 on page 5-238.
5.69 Macros with a variable number of arguments in C99 on page 5-239.
5.70 Mixed declarations and statements in C99 on page 5-240.
5.71 New block scopes for selection and iteration statements in C99 on page 5-241.
5.72 _Pragma preprocessing operator in C99 on page 5-242.
5.73 Restricted pointers in C99 on page 5-243.
5.75 Complex numbers in C99 on page 5-245.

5.65 Flexible array members in C99

In a struct with more than one member, the last member of the struct can have incomplete array type.
Such a member is called a flexible array member of the struct.

Note
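When a struct has a flexible array member, sizeof the struct does not include any storage for the array, so you must allocate that storage yourself, typically together with the struct in a single call to malloc().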

Flexible array members enable you to mimic dynamic type specification in C in the sense that you can
defer the specification of the array size to runtime. For example:
#include <stdlib.h>
extern const int n;
typedef struct
{
    int len;
    char p[];
} str;
void foo(void)
{
    size_t str_size = sizeof(str); // does not include storage for p
    str *s = malloc(str_size + (sizeof(char) * n));
    if (s != NULL)
        s->len = n;
}
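A complete allocation with a flexible array member might look like the following sketch. The helper name make_str is hypothetical. It assumes that the array holds a null-terminated copy of a string, so it allocates one extra byte for the terminator. The caller releases the whole object with a single free().

```c
#include <stdlib.h>
#include <string.h>

typedef struct
{
    int len;
    char p[];   /* flexible array member: no storage in sizeof(str) */
} str;

/* Allocates a str large enough for text plus its terminating null and
 * copies text into it. Returns NULL if allocation fails. */
str *make_str(const char *text)
{
    size_t len = strlen(text);
    str *s = malloc(sizeof(str) + len + 1);
    if (s == NULL)
        return NULL;
    s->len = (int)len;
    memcpy(s->p, text, len + 1);   /* copy including the null */
    return s;
}
```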

5.66 __func__ predefined identifier in C99

The __func__ predefined identifier provides a means of obtaining the name of the current function.
For example, the function:
#include <stdio.h>
void foo(void)
{
    printf("This function is called '%s'.\n", __func__);
}

prints:
This function is called 'foo'.
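A common use of __func__ is inside a tracing macro. The following sketch uses a hypothetical macro TRACE_ENTRY and hypothetical functions. Because __func__ is a predefined identifier rather than a macro, it names whichever function contains the expanded macro. The sketch writes into a caller-supplied buffer instead of printing, so that the result can be checked.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical tracing macro: __func__ refers to the function that
 * contains the call site after TRACE_ENTRY is expanded. */
#define TRACE_ENTRY(buf, size) snprintf((buf), (size), "enter %s", __func__)

/* Hypothetical function that records its own entry in buf. */
void parse_header(char *buf, size_t size)
{
    TRACE_ENTRY(buf, size);
}

/* __func__ behaves like a static const char array, so returning a
 * pointer to it is safe. */
const char *current_name(void)
{
    return __func__;
}
```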

Related references
10.162 Built-in function name variables on page 10-792.

5.67 inline functions in C99

The C99 keyword inline hints to the compiler that invocations of a function qualified with inline are
to be expanded inline.
For example:
inline int max(int a, int b)
{
return (a > b) ? a : b;
}

The compiler inlines a function qualified with inline only if it is reasonable to do so. It is free to ignore
the hint if inlining the function adversely affects performance.
Note
The __inline keyword is available in C90.

Note
The semantics of inline in C99 are different from the semantics of inline in Standard C++.
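One visible difference is how an external definition is provided. In C99, if every file-scope declaration of a function in a translation unit uses inline without extern, the definition there is only an inline definition and does not supply an externally visible function. If the compiler then decides not to inline a call, linking can fail. The following sketch, using the hypothetical name max_int and assuming a C99 compiler, shows one common pattern. The inline definition can live in a header, and exactly one .c file adds an extern declaration.

```c
/* Inline definition, for example placed in a header file. */
inline int max_int(int a, int b)
{
    return (a > b) ? a : b;
}

/* In exactly one translation unit: this non-inline declaration makes
 * the definition above the external definition for the program. */
extern int max_int(int a, int b);
```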

5.68 long long data type in C99 and C90

C99 supports the integral data type long long.
This type is 64 bits wide in the ARM compilation tools.
For example:
long long int j = 25902068371200LL;             // light-day, in meters
unsigned long long int i = 9460730472580800ULL; // light-year, in meters

long long is also available in C90 when not using --strict.

__int64 is a synonym for long long. __int64 is always available.
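The constants in the example can be derived with 64-bit arithmetic. The following sketch uses a hypothetical macro SPEED_OF_LIGHT_M_PER_S (299792458 m/s, exact by definition of the metre) and a Julian year of 365.25 days, which is 31557600 s. Both products exceed the range of a 32-bit long, so the operands are written as long long constants.

```c
/* Speed of light in meters per second (hypothetical macro name). */
#define SPEED_OF_LIGHT_M_PER_S 299792458LL

/* Light-day: 86400 seconds at the speed of light. */
long long light_day_m(void)
{
    return SPEED_OF_LIGHT_M_PER_S * 86400LL;
}

/* Light-year, using a Julian year of 31557600 seconds (about 9.46e15 m). */
unsigned long long light_year_m(void)
{
    return (unsigned long long)SPEED_OF_LIGHT_M_PER_S * 31557600ULL;
}
```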

Related references
9.12 long long on page 9-562.

5.69 Macros with a variable number of arguments in C99

You can declare a macro in C99 that accepts a variable number of arguments.
The syntax for defining such a macro is similar to that of a function. For example:
#include <stdio.h>
#define debug(format, ...) fprintf (stderr, format, __VA_ARGS__)
void Variadic_Macros_0(void)
{
    debug ("a test string is printed out along with %x %x %x\n", 12, 14, 20);
}
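With debug(format, ...), C99 requires at least one argument after format. A call such as debug("done\n") would expand to fprintf (stderr, "done\n", ), which does not compile. One common workaround is to fold the format string into __VA_ARGS__. The following sketch uses a hypothetical macro format_msg that writes into a caller-supplied buffer instead of stderr, so that the output can be checked.

```c
#include <stdio.h>
#include <string.h>

/* The format string is part of __VA_ARGS__, so a call with no
 * arguments after the format is valid C99. */
#define format_msg(buf, size, ...) snprintf((buf), (size), __VA_ARGS__)
```

For example, format_msg(b, sizeof b, "%x %x %x", 12, 14, 20) stores "c e 14" in b. A call with only a format string is accepted too.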

5.70 Mixed declarations and statements in C99

C99 enables you to mix declarations and statements within compound statements, like in C++.
For example:
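#include <math.h>
void foo(float i)
{
    i = (i < 0) ? -i : i;   // make i non-negative before taking the square root
    float j = sqrt(i);      // declaration after a statement: illegal in C90
}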

5.71 New block scopes for selection and iteration statements in C99
In a for loop, the first expression can be a declaration, like in C++. The scope of the declaration extends
to the body of the loop only.
For example:
extern int max;
for (int n = max - 1; n >= 0; n--)
{
// body of loop
}

is equivalent to:
extern int max;
{
int n = max - 1;
for (; n >= 0; n--)
{
// body of loop
}
}

Note
Unlike in C++, you cannot introduce new declarations in a for-test, if-test or switch-expression.

5.72 _Pragma preprocessing operator in C99

C90 does not permit a #pragma directive to be produced as the result of a macro expansion. However, the
C99 _Pragma operator enables a macro expansion to produce a pragma, so you can build pragma directives
with preprocessor macros.
_Pragma is permitted in C90 if --strict is not specified.

For example:
# define RWDATA(X) PRAGMA(arm section rwdata=#X)
# define PRAGMA(X) _Pragma(#X)
RWDATA(foo) // same as #pragma arm section rwdata="foo"
int y = 1; // y is placed in section "foo"

5.73 Restricted pointers in C99

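The C99 keyword restrict indicates to the compiler that an object that is accessed through a
restrict-qualified pointer, and modified, is not also accessed through any other pointer. In practice,
this means that restrict-qualified object pointers and function parameter arrays do not point to
overlapping regions of memory.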
This enables the compiler to perform optimizations that might otherwise be prevented because of
possible aliasing.
In the following example, pointer a does not, and must not, point to the same region of memory as
pointer b:
void copy_array(int n, int *restrict a, int *restrict b)
{
while (n-- > 0)
*a++ = *b++;
}
void test(void)
{
extern int array[100];
copy_array(50, array + 50, array); // valid
copy_array(50, array + 1, array); // undefined behavior
}

Pointers qualified with restrict can however point to different arrays, or to different regions within an
array.
It is your responsibility to ensure that restrict-qualified pointers do not point to overlapping regions of
memory.
__restrict, permitted in C90 and C++, is a synonym for restrict.

--restrict enables restrict to be used in C90 and C++.
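The following sketch shows a typical use of restrict, with the hypothetical function add_arrays. All three pointers are restrict-qualified, so the compiler may assume that stores through out never change the values read through a or b. This allows it to keep loaded values in registers or to vectorize the loop. Callers are responsible for passing non-overlapping arrays.

```c
#include <stddef.h>

/* out must not overlap a or b; the restrict qualifiers promise this. */
void add_arrays(size_t n, int *restrict out,
                const int *restrict a, const int *restrict b)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}
```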

Related references
8.164 --restrict, --no_restrict on page 8-500.

5.74 Additional <math.h> library functions in C99

C99 supports additional macros, types, and functions in the standard header <math.h> that are not found
in the corresponding C90 standard header.
New macros found in C99 that are not found in C90 include:
INFINITY // positive infinity
NAN // IEEE not-a-number

New generic function macros found in C99 that are not found in C90 include:
#define isinf(x) // non-zero only if x is positive or negative infinity
#define isnan(x) // non-zero only if x is NaN
#define isless(x, y) // 1 only if x < y and x and y are not NaN, and 0 otherwise
#define isunordered(x, y) // 1 only if either x or y is NaN, and 0 otherwise

New mathematical functions found in C99 that are not found in C90 include:
double acosh(double x); // hyperbolic arccosine of x
double asinh(double x); // hyperbolic arcsine of x
double atanh(double x); // hyperbolic arctangent of x
double erf(double x); // returns the error function of x
double round(double x); // returns x rounded to the nearest integer
double tgamma(double x); // returns the gamma function of x

C99 supports the new mathematical functions for all real floating-point types.
Single precision versions of all existing <math.h> functions are also supported.
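The following sketch uses the classification and comparison macros described above, with hypothetical helper names. It assumes a C99 <math.h>. Because these are macros, they accept any real floating-point type. Unlike the built-in relational operators, isless() and isunordered() are quiet comparisons: they do not raise the Invalid Operation exception for quiet NaN operands, they just return 0 or 1.

```c
#include <math.h>

/* Nonzero only if x is neither a NaN nor an infinity. */
int is_finite_number(double x)
{
    return !isnan(x) && !isinf(x);
}

/* Quiet "x < y": returns 0 when either operand is a NaN. */
int safe_less(double x, double y)
{
    return isless(x, y);
}
```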

Related concepts
5.60 New library features of C99 on page 5-230.
5.75 Complex numbers in C99 on page 5-245.
5.76 Boolean type and <stdbool.h> in C99 on page 5-246.
5.77 Extended integer types and functions in <inttypes.h> and <stdint.h> in C99 on page 5-247.
5.78 <fenv.h> floating-point environment access in C99 on page 5-248.
5.79 <stdio.h> snprintf family of functions in C99 on page 5-249.
5.80 <tgmath.h> type-generic math macros in C99 on page 5-250.
5.81 <wchar.h> wide character I/O functions in C99 on page 5-251.

Related information
Institute of Electrical and Electronics Engineers.

5.75 Complex numbers in C99

In C99 mode, the compiler supports complex and imaginary numbers. In GNU mode, the compiler
supports complex numbers only.
For example:
#include <stdio.h>
#include <complex.h>
int main(void)
{
complex float z = 64.0 + 64.0*I;
printf("z = %f + %fI\n", creal(z), cimag(z));
return 0;
}

The complex types are:

• float complex.
• double complex.
• long double complex.
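Complex arithmetic uses the ordinary operators. The following sketch, with hypothetical function names and assuming a C99 <complex.h>, multiplies two values and extracts the parts with creal() and cimag(). For (1 + 2i)(3 + 4i) = 3 + 4i + 6i + 8i^2, the result is -5 + 10i.

```c
#include <complex.h>

/* Complex multiplication with the ordinary * operator. */
double complex multiply(double complex z1, double complex z2)
{
    return z1 * z2;
}

/* Returns the real and imaginary parts of z through pointers. */
void split(double complex z, double *re, double *im)
{
    *re = creal(z);
    *im = cimag(z);
}
```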

Related concepts
5.59 New language features of C99 on page 5-228.
5.61 // comments in C99 and C90 on page 5-231.
5.62 Compound literals in C99 on page 5-232.
5.63 Designated initializers in C99 on page 5-233.
5.64 Hexadecimal floating-point numbers in C99 on page 5-234.
5.65 Flexible array members in C99 on page 5-235.
5.66 __func__ predefined identifier in C99 on page 5-236.
5.67 inline functions in C99 on page 5-237.
5.68 long long data type in C99 and C90 on page 5-238.
5.69 Macros with a variable number of arguments in C99 on page 5-239.
5.70 Mixed declarations and statements in C99 on page 5-240.
5.71 New block scopes for selection and iteration statements in C99 on page 5-241.
5.72 _Pragma preprocessing operator in C99 on page 5-242.
5.73 Restricted pointers in C99 on page 5-243.
5.60 New library features of C99 on page 5-230.
5.74 Additional <math.h> library functions in C99 on page 5-244.
5.76 Boolean type and <stdbool.h> in C99 on page 5-246.
5.77 Extended integer types and functions in <inttypes.h> and <stdint.h> in C99 on page 5-247.
5.78 <fenv.h> floating-point environment access in C99 on page 5-248.
5.79 <stdio.h> snprintf family of functions in C99 on page 5-249.
5.80 <tgmath.h> type-generic math macros in C99 on page 5-250.
5.81 <wchar.h> wide character I/O functions in C99 on page 5-251.

5.76 Boolean type and <stdbool.h> in C99

C99 introduces the native type _Bool.
The associated standard header <stdbool.h> introduces the macros bool, true and false for Boolean
tests. For example:
#include <stdio.h>
#include <stdbool.h>
bool foo(FILE *str)
{
    bool err = false;
    // ...
    if (fflush(str) != 0)   // fflush() returns nonzero on failure
    {
        err = true;
    }
    // ...
    return err;
}

Note
The C99 semantics for bool are intended to match those of C++.
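One property of bool (_Bool) worth knowing is that converting any nonzero scalar value to bool yields 1, and converting zero or a null pointer yields 0. A conversion to a narrow integer type such as unsigned char can instead truncate the value, so 256 would become 0. The following sketch, with hypothetical function names, assumes a C99 compiler.

```c
#include <stdbool.h>

/* Any nonzero int converts to true (1); zero converts to false (0). */
bool to_bool(int v)
{
    return v;   /* implicit conversion to _Bool */
}

/* A null pointer converts to false; any other pointer to true. */
bool pointer_is_set(const void *p)
{
    return p;
}
```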

Related concepts
5.60 New library features of C99 on page 5-230.
5.74 Additional <math.h> library functions in C99 on page 5-244.
5.75 Complex numbers in C99 on page 5-245.
5.77 Extended integer types and functions in <inttypes.h> and <stdint.h> in C99 on page 5-247.
5.78 <fenv.h> floating-point environment access in C99 on page 5-248.
5.79 <stdio.h> snprintf family of functions in C99 on page 5-249.
5.80 <tgmath.h> type-generic math macros in C99 on page 5-250.
5.81 <wchar.h> wide character I/O functions in C99 on page 5-251.

5.77 Extended integer types and functions in <inttypes.h> and <stdint.h> in C99
In C90, the long data type can serve both as the largest integral type, and as a 32-bit container. C99
removes this ambiguity through the new standard library header files <inttypes.h> and <stdint.h>.
The header file <stdint.h> introduces the new types:
• intmax_t and uintmax_t, that are maximum width signed and unsigned integer types.
• intptr_t and uintptr_t, which are integer types capable of holding signed and unsigned object
pointers.
The header file <inttypes.h> provides library functions for manipulating values of type intmax_t,
including:
intmax_t imaxabs(intmax_t x); // absolute value of x
imaxdiv_t imaxdiv(intmax_t x, intmax_t y); // returns the quotient and remainder
// of x / y

These header files are also available in C90 and C++.
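The following sketch combines imaxdiv() with PRIdMAX, the <inttypes.h> printf conversion macro for intmax_t. With this macro, the format string does not depend on how wide intmax_t is on the target. The function name describe_division is hypothetical. As with the / and % operators, the quotient is truncated toward zero, so -17 divided by 5 gives -3 remainder -2.

```c
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Formats "quotient r remainder" for x / y into buf. */
int describe_division(char *buf, size_t size, intmax_t x, intmax_t y)
{
    imaxdiv_t r = imaxdiv(x, y);   /* quotient and remainder together */
    return snprintf(buf, size, "%" PRIdMAX " r %" PRIdMAX, r.quot, r.rem);
}
```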

Related concepts
5.60 New library features of C99 on page 5-230.
5.74 Additional <math.h> library functions in C99 on page 5-244.
5.75 Complex numbers in C99 on page 5-245.
5.76 Boolean type and <stdbool.h> in C99 on page 5-246.
5.78 <fenv.h> floating-point environment access in C99 on page 5-248.
5.79 <stdio.h> snprintf family of functions in C99 on page 5-249.
5.80 <tgmath.h> type-generic math macros in C99 on page 5-250.
5.81 <wchar.h> wide character I/O functions in C99 on page 5-251.

5.78 <fenv.h> floating-point environment access in C99

The C99 standard header file <fenv.h> provides access to an IEEE 754-compliant floating-point
environment for numerical programming.
The library introduces two types and numerous macros and functions for managing and controlling
floating-point state.
The new types supported are:
• fenv_t, representing the entire floating-point environment.
• fexcept_t, representing the floating-point status flags.

New macros supported include:

• FE_DIVBYZERO, FE_INEXACT, FE_INVALID, FE_OVERFLOW and FE_UNDERFLOW for managing floating-
point exceptions.
• FE_DOWNWARD, FE_TONEAREST, FE_TOWARDZERO and FE_UPWARD, representing the supported rounding
directions.
• FE_DFL_ENV, representing the default floating-point environment.
New functions include:
int feclearexcept(int ex); // clear floating-point exceptions selected by ex
int feraiseexcept(int ex); // raise floating-point exceptions selected by ex
int fetestexcept(int ex); // test floating-point exceptions selected by ex
int fegetround(void); // return the current rounding mode
int fesetround(int mode); // set the current rounding mode given by mode
int fegetenv(fenv_t *penv); // store the current floating-point environment in penv
int fesetenv(const fenv_t *penv); // set the floating-point environment to penv

Related concepts
5.60 New library features of C99 on page 5-230.
5.74 Additional <math.h> library functions in C99 on page 5-244.
5.75 Complex numbers in C99 on page 5-245.
5.76 Boolean type and <stdbool.h> in C99 on page 5-246.
5.77 Extended integer types and functions in <inttypes.h> and <stdint.h> in C99 on page 5-247.
5.79 <stdio.h> snprintf family of functions in C99 on page 5-249.
5.80 <tgmath.h> type-generic math macros in C99 on page 5-250.
5.81 <wchar.h> wide character I/O functions in C99 on page 5-251.

Related information
Institute of Electrical and Electronics Engineers.

5.79 <stdio.h> snprintf family of functions in C99

Using the sprintf family of functions found in the C90 standard header <stdio.h> can be dangerous.
In the statement:
sprintf(buffer, "Error %d: Cannot open file '%s'", errno, filename);

the full output of the formatting operation is written into buffer regardless of whether there is enough
space to hold it. Consequently, more characters can be output than might fit in the memory allocated to
the string.
The snprintf functions found in the C99 version of <stdio.h> are safe versions of the sprintf
functions that prevent buffer overrun. In the statement:
snprintf(buffer, size, "Error %d: Cannot open file '%s'", errno, filename);

the variable size specifies the maximum number of characters, including the terminating null character,
that can be written to buffer. The buffer cannot be overrun, provided that it is at least size bytes long.
Note
The C standard does not define what should happen if buffer + size exceeds 4GB (the limit of the 32-
bit address space). In this scenario, the ARM implementation of snprintf does not write any data to the
buffer (to prevent wrapping the buffer around the address space) and returns the number of bytes that
would have been written.
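Here is a minimal sketch of a bounded formatting helper. snprintf returns the number of characters that the full output would have needed, excluding the terminating null character. Comparing that value with the buffer size therefore detects truncation. The function name format_error() and its parameters are hypothetical. err stands in for the errno value used in the statements above.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats an error message into buf, which holds
   capacity bytes. Returns 1 if the complete message fitted, or 0 if it
   was truncated or an encoding error occurred. When capacity > 0, buf
   is always null-terminated. */
int format_error(char *buf, size_t capacity, int err, const char *filename)
{
    int needed = snprintf(buf, capacity,
                          "Error %d: Cannot open file '%s'", err, filename);

    /* needed excludes the terminating null, so the output fitted only
       if needed is strictly less than capacity. */
    return needed >= 0 && (size_t)needed < capacity;
}
```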

Related concepts
5.60 New library features of C99 on page 5-230.
5.74 Additional <math.h> library functions in C99 on page 5-244.
5.75 Complex numbers in C99 on page 5-245.
5.76 Boolean type and <stdbool.h> in C99 on page 5-246.
5.77 Extended integer types and functions in <inttypes.h> and <stdint.h> in C99 on page 5-247.
5.78 <fenv.h> floating-point environment access in C99 on page 5-248.
5.80 <tgmath.h> type-generic math macros in C99 on page 5-250.
5.81 <wchar.h> wide character I/O functions in C99 on page 5-251.

5.80 <tgmath.h> type-generic math macros in C99

The new standard header <tgmath.h> defines type-generic macros for several families of mathematical
functions. Each macro is overloaded on floating-point types: the type of the argument selects the float,
double, or long double version of the function.
For example, the trigonometric function cos works as if it has the overloaded declaration:
extern float cos(float x);
extern double cos(double x);
extern long double cos(long double x);
...

A statement such as:

p = cos(0.78539f); // p = cos(pi / 4)

calls the single-precision version of the cos function, as determined by the type of the literal 0.78539f.
Note
Type-generic families of mathematical functions can be defined in C++ using the operator overloading
mechanism. The semantics of type-generic families of functions defined using operator overloading in
C++ are different from the semantics of the corresponding families of type-generic functions defined in
<tgmath.h>.
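The following minimal sketch shows the type-selection rule. sizeof does not evaluate its operand, so these hypothetical helper functions only report which variant of cos the type-generic macro selects. No math library function is called. Under the C99 rules, an integer argument selects the double version.

```c
#include <stddef.h>
#include <tgmath.h>

/* sizeof does not evaluate its operand, so cos is never actually called;
   only the result type chosen by the type-generic macro is inspected. */
size_t cos_size_float(void)       { return sizeof(cos(0.78539f)); } /* cosf */
size_t cos_size_double(void)      { return sizeof(cos(0.78539)); }  /* cos */
size_t cos_size_long_double(void) { return sizeof(cos(0.78539L)); } /* cosl */
size_t cos_size_int(void)         { return sizeof(cos(1)); }        /* integer argument: double cos */
```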

Related concepts
5.60 New library features of C99 on page 5-230.
5.74 Additional <math.h> library functions in C99 on page 5-244.
5.75 Complex numbers in C99 on page 5-245.
5.76 Boolean type and <stdbool.h> in C99 on page 5-246.
5.77 Extended integer types and functions in <inttypes.h> and <stdint.h> in C99 on page 5-247.
5.78 <fenv.h> floating-point environment access in C99 on page 5-248.
5.79 <stdio.h> snprintf family of functions in C99 on page 5-249.
5.81 <wchar.h> wide character I/O functions in C99 on page 5-251.

5.81 <wchar.h> wide character I/O functions in C99

Wide character I/O functions have been incorporated into C99. These enable you to read wide characters
from a file, and write them to a file, in much the same way as normal characters.
The ARM C Library supports all of the C99 functions defined in <wchar.h>.
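Here is a minimal sketch of wide-character file I/O. The hypothetical function roundtrip_wide() writes a wide string to a temporary file with fputws(), rewinds, and reads it back one wide character at a time with fgetwc(). The first wide-character operation on the stream makes it wide-oriented. Only characters that are representable in the current locale can be written, and the sketch runs in the default C locale, so it is intended for plain ASCII text.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: writes text to a temporary file, reads it back,
   and returns the number of wide characters read, or -1 on failure. */
long roundtrip_wide(const wchar_t *text)
{
    FILE *f = tmpfile();
    long count = 0;
    wint_t c;

    if (f == NULL)
        return -1;
    if (fputws(text, f) < 0) {   /* fputws returns EOF on a write error */
        fclose(f);
        return -1;
    }
    rewind(f);                   /* back to the start; orientation is kept */
    while ((c = fgetwc(f)) != WEOF)
        ++count;
    fclose(f);
    return count;
}
```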

Related concepts
5.60 New library features of C99 on page 5-230.
5.74 Additional <math.h> library functions in C99 on page 5-244.
5.75 Complex numbers in C99 on page 5-245.
5.76 Boolean type and <stdbool.h> in C99 on page 5-246.
5.77 Extended integer types and functions in <inttypes.h> and <stdint.h> in C99 on page 5-247.
5.78 <fenv.h> floating-point environment access in C99 on page 5-248.
5.79 <stdio.h> snprintf family of functions in C99 on page 5-249.
5.80 <tgmath.h> type-generic math macros in C99 on page 5-250.

5.82 How to prevent uninitialized data from being initialized to zero

The ANSI C specification states that static data that is not explicitly initialized is to be initialized to
zero.
Therefore, by default, the compiler puts both zero-initialized and uninitialized data into the same data
section, which is populated with zeroes at runtime by the C library initialization code. The data section
can be either an RW data section (.data) or a ZI data section (.bss). When optimizing, the compiler might
put small global ZI data items in RW data sections. Specifying --bss_threshold=0 prevents this
behavior, and puts small global ZI data items in ZI data sections.
You can prevent uninitialized data from being initialized to zero by placing that data in a different
section. This can be achieved using #pragma arm section, or with the GNU compiler extension
__attribute__((section("name"))).

The following example shows how to keep uninitialized data using #pragma arm section:
#pragma arm section zidata = "non_initialized"
int i, j; // uninitialized data in non_initialized section (without the pragma,
// would be in .bss section by default)
#pragma arm section zidata // back to default (.bss section)
int k = 0, l = 0; // zero-initialized data in .bss section

Specify --bss_threshold=0 when compiling this example code, to ensure that k and l are placed in a ZI
data section. If --bss_threshold=0 is not used, section name rwdata must be used instead of zidata.
The non_initialized section is placed into its own UNINIT execution region, as follows:
LOAD_1 0x0
{
EXEC_1 +0
{
* (+RO)
* (+RW)
* (+ZI) ; ZI data gets initialized to zero
}
EXEC_2 +0 UNINIT
{
* (non_initialized) ; ZI data does not get initialized to zero
}
}
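The same placement can also be sketched with the GNU section attribute instead of #pragma arm section. This minimal sketch reuses the section name non_initialized, so the scatter file above applies unchanged. The variable warm_reset_count and the function note_reset() are hypothetical. How armcc treats a section-attributed variable that has no initializer can depend on the compiler version and on options such as --bss_threshold. Check the __attribute__((section("name"))) reference and the linker map before relying on this.

```c
/* Hypothetical example: a counter intended to survive a warm reset because
   its section is placed in an UNINIT execution region by the scatter file.
   Its value is indeterminate after a cold power-up. */
unsigned int warm_reset_count __attribute__((section("non_initialized")));

/* Hypothetical helper: increments and returns the counter. */
unsigned int note_reset(void)
{
    return ++warm_reset_count;
}
```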

Related references
8.93 --gnu on page 8-424.
10.79 #pragma arm section [section_type_list] on page 10-684.
10.69 __attribute__((section("name"))) variable attribute on page 10-674.
8.20 --bss_threshold=num on page 8-343.

Related information
Execution region attributes.

Chapter 6
Compiler Diagnostic Messages

Describes the format of compiler diagnostic messages and how to control the output during compilation.
The compiler issues messages about potential portability problems and other hazards. It is possible to:
• Turn off specific messages. For example, warnings can be turned off if you are in the early stages of
porting a program written in old-style C. In general, however, it is better to check the code than to
turn off messages.
• Change the severity of specific messages.
It contains the following sections:
• 6.1 Severity of compiler diagnostic messages on page 6-254.
• 6.2 Options that change the severity of compiler diagnostic messages on page 6-255.
• 6.3 Controlling compiler diagnostic messages with pragmas on page 6-257.
• 6.4 Prefix letters in compiler diagnostic messages on page 6-259.
• 6.5 Compiler exit status codes and termination messages on page 6-260.
• 6.6 Compiler data flow warnings on page 6-261.

6.1 Severity of compiler diagnostic messages

Diagnostic messages have an associated severity.
The following table describes each of the different severities.

Table 6-1 Severity of diagnostic messages

Severity Description

Internal fault Internal faults indicate an internal problem with the compiler. Contact your supplier with feedback.

Error Errors indicate problems that cause the compilation to stop. These errors include command line errors, internal errors,
missing include files, and violations in the syntactic or semantic rules of the C or C++ language. If multiple source files
are specified, no more source files are compiled.

Warning Warnings indicate unusual conditions in your code that might indicate a problem. Compilation continues, and object
code is generated unless any more problems with an Error severity are detected.

Remark Remarks indicate common, but sometimes unconventional, use of C or C++. These diagnostics are not displayed by
default. Compilation continues, and object code is generated unless any more problems with an Error severity are
detected.

Related concepts
6.2 Options that change the severity of compiler diagnostic messages on page 6-255.
6.3 Controlling compiler diagnostic messages with pragmas on page 6-257.
6.4 Prefix letters in compiler diagnostic messages on page 6-259.
6.5 Compiler exit status codes and termination messages on page 6-260.
6.6 Compiler data flow warnings on page 6-261.

6.2 Options that change the severity of compiler diagnostic messages

You can change the diagnostic severity of all remarks and warnings, and a limited number of errors.
These options let you change severities:
--diag_error=tag[, tag, ...]
Sets the diagnostic messages that have the specified tag, or tags, to Error severity.
--diag_error=warning
Upgrades all warning messages to Error severity.
--diag_remark=tag[, tag, ...]
Sets the diagnostic messages that have the specified tag, or tags, to Remark severity.
--diag_suppress=tag[, tag, ...]
Suppresses the diagnostic messages that have the specified tag, or tags.
--diag_suppress=optimizations
Suppresses diagnostic messages for high-level optimizations.
--diag_warning=tag[, tag, ...]
Sets the diagnostic messages that have the specified tag, or tags, to Warning severity.
--diag_warning=error
Sets all downgradable error messages to Warning severity.
The format tag[, tag, ...] indicates a comma-separated list of the diagnostic messages that you want to
change. For example, you might want to change a warning message with the number 1293 to Remark
severity, because remarks are not displayed by default. To do this, use the following command:

armcc --diag_remark=1293 ...

Note
tag is the four-digit number, nnnn, optionally with the tool letter prefix, but without the letter suffix
indicating the severity.

Only errors with a suffix of -D following the error number can be downgraded by changing them into
warnings or remarks.

Note
These options also have pragma equivalents.

The following diagnostic messages can be changed:

• Messages with the number format #nnnn-D.
• Warning messages with the number format CnnnnW.
It is also possible to apply changes to optimization messages as a group. For example,
--diag_warning=optimizations. By default, optimization messages are remarks.
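Because these options have pragma equivalents, the same settings can be applied from inside a source file. This minimal sketch uses the armcc #pragma diag_suppress listed under Related references. debug_hook() and scale_by_four() are hypothetical names. Unlike --diag_suppress=177 on the command line, the pragma takes effect only from the point where it appears in the file. It is guarded by __CC_ARM, which armcc predefines, so that other compilers do not see it.

```c
/* armcc-specific pragma: the in-source counterpart of --diag_suppress=177.
   There is no #pragma pop, so the setting is still in force when message
   177 is generated at the end of the compilation. */
#if defined(__CC_ARM)
#pragma diag_suppress 177
#endif

/* Hypothetical debugging helper that is never referenced; without the
   pragma, armcc would normally report #177-D for it. */
static int debug_hook(void)
{
    return 0;
}

/* Hypothetical function that the rest of the program uses. */
int scale_by_four(int x)
{
    return x * 4;
}
```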

Related concepts
6.3 Controlling compiler diagnostic messages with pragmas on page 6-257.
6.4 Prefix letters in compiler diagnostic messages on page 6-259.
6.5 Compiler exit status codes and termination messages on page 6-260.
6.6 Compiler data flow warnings on page 6-261.

Related references
6.1 Severity of compiler diagnostic messages on page 6-254.
10.80 #pragma diag_default tag[,tag,...] on page 10-686.
10.81 #pragma diag_error tag[,tag,...] on page 10-687.

10.82 #pragma diag_remark tag[,tag,...] on page 10-688.
10.83 #pragma diag_suppress tag[,tag,...] on page 10-689.
10.84 #pragma diag_warning tag[, tag, ...] on page 10-690.
10.98 #pragma pop on page 10-705.
10.99 #pragma push on page 10-706.
8.58 --diag_error=tag[,tag,...] on page 8-386.
8.59 --diag_remark=tag[,tag,...] on page 8-387.
8.60 --diag_style=arm|ide|gnu compiler option on page 8-388.
8.61 --diag_suppress=tag[,tag,...] on page 8-389.
8.62 --diag_suppress=optimizations on page 8-390.
8.63 --diag_warning=tag[,tag,...] on page 8-391.
8.64 --diag_warning=optimizations on page 8-392.

6.3 Controlling compiler diagnostic messages with pragmas

Pragmas let you suppress, enable, or change the severity of specific diagnostic messages from within
your code.
For example, you can suppress a particular diagnostic message when compiling one specific function.
Note
You can alternatively use command-line options to suppress or change the severity of messages, but the
change applies to the entire compilation.

Examples
The following example shows three identical functions, foo1(), foo2(), and foo3(), all of which would
normally provoke diagnostic message #177-D: variable "x" was declared but never
referenced.

For foo1(), the current pragma state is pushed to the stack and #pragma diag_suppress suppresses the
message. The message is re-enabled by #pragma pop before compiling foo2(). In foo3(), the message
is not suppressed because the #pragma push and #pragma pop do not enclose the full scope responsible
for the generation of the message:
#pragma push
#pragma diag_suppress 177
void foo1( void )
{
/* Here we do not expect a diagnostic, because we suppressed it. */
int x;
}
#pragma pop

void foo2( void )
{
/* Here we do, because the suppression was inside push/pop. */
int x;
}

void foo3( void )
{
#pragma push
#pragma diag_suppress 177
/* Here, the suppression fails because the push/pop must enclose the whole function. */
int x;
#pragma pop
}

Diagnostic messages use the pragma state in place at the time they are generated. If you use pragmas to
control a message in your code, you must be aware of when that message is generated. For example, the
following code is intended to suppress the diagnostic message #177-D: function "dummy" was
declared but never referenced:

#include <stdio.h>
#pragma push
#pragma diag_suppress 177
static int dummy(void)
{
printf("This function is never called.");
return 1;
}
#pragma pop
int main(void){
printf("Hello world!\n");
return 0;
}

However, message 177 is only generated after all functions have been processed. Therefore, the message
is generated after #pragma pop has restored the pragma state, and message 177 is not suppressed.
Removing #pragma push and #pragma pop would suppress message 177 for dummy(), but would also
suppress it for every other unreferenced function, not only for dummy().
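If the goal is to silence message 177 for dummy() alone, one alternative is to mark that function instead of relying on the pragma state. This minimal sketch uses the GNU unused function attribute in place of pragmas. It assumes that your armcc version accepts __attribute__((unused)) on functions, so check the function attribute reference for your release. The name greet() is hypothetical and stands in for main() from the example above.

```c
#include <stdio.h>

/* The unused attribute marks only dummy() as intentionally unreferenced,
   so no pragma state needs to be in force when message 177 is generated. */
static int dummy(void) __attribute__((unused));

static int dummy(void)
{
    printf("This function is never called.");
    return 1;
}

/* Hypothetical stand-in for main() from the example above. */
int greet(void)
{
    return printf("Hello world!\n");   /* returns the number of characters written */
}
```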

Related concepts
6.2 Options that change the severity of compiler diagnostic messages on page 6-255.
6.4 Prefix letters in compiler diagnostic messages on page 6-259.
6.5 Compiler exit status codes and termination messages on page 6-260.
6.6 Compiler data flow warnings on page 6-261.

Related references
6.1 Severity of compiler diagnostic messages on page 6-254.
10.80 #pragma diag_default tag[,tag,...] on page 10-686.
10.81 #pragma diag_error tag[,tag,...] on page 10-687.
10.82 #pragma diag_remark tag[,tag,...] on page 10-688.
10.83 #pragma diag_suppress tag[,tag,...] on page 10-689.
10.84 #pragma diag_warning tag[, tag, ...] on page 10-690.
10.98 #pragma pop on page 10-705.
10.99 #pragma push on page 10-706.
8.58 --diag_error=tag[,tag,...] on page 8-386.
8.59 --diag_remark=tag[,tag,...] on page 8-387.
8.60 --diag_style=arm|ide|gnu compiler option on page 8-388.
8.61 --diag_suppress=tag[,tag,...] on page 8-389.
8.62 --diag_suppress=optimizations on page 8-390.
8.63 --diag_warning=tag[,tag,...] on page 8-391.
8.64 --diag_warning=optimizations on page 8-392.

6.4 Prefix letters in compiler diagnostic messages

The compilation tools automatically add an identification letter to diagnostic messages.
The following table shows the prefix letters used by the compilation tools. Using these prefix letters
enables the tools to use overlapping message ranges.

Table 6-2 Identifying diagnostic messages

Prefix letter Tool

C armcc

A armasm

L armlink or armar

Q fromelf

The following rules apply:

• All of the compilation tools act on a message number without a prefix.
• A message number with a prefix is only acted on by the tool with the matching prefix.
• A tool does not act on a message with a non-matching prefix.
Therefore, the compiler prefix C can be used with --diag_error, --diag_remark, and
--diag_warning, or when suppressing messages, for example:

armcc --diag_suppress=C1287,C4017 ...

Use the prefix letters to control options that are passed from the compiler to other tools, for example,
include the prefix letter L to specify linker message numbers.

Related concepts
6.2 Options that change the severity of compiler diagnostic messages on page 6-255.
6.3 Controlling compiler diagnostic messages with pragmas on page 6-257.
6.5 Compiler exit status codes and termination messages on page 6-260.
6.6 Compiler data flow warnings on page 6-261.

Related references
6.1 Severity of compiler diagnostic messages on page 6-254.

6.5 Compiler exit status codes and termination messages

If the compiler detects any warnings or errors during compilation, it writes the messages to stderr.
At the end of the messages, a summary message of the following form gives the total number of each
type of message:
filename: n warnings, n errors

where n indicates the number of warnings or errors detected.

Note
Remarks are not displayed by default. To display remarks, use the --remarks compiler option. No
summary message is displayed if only remark messages are generated.

The signals SIGINT (caused by a user interrupt, like ^C) and SIGTERM (caused by a UNIX kill
command) are trapped by the compiler and cause abnormal termination.
On completion, the compiler returns a value greater than zero if an error is detected. If no error is
detected, a value of zero is returned.

Related references
6.1 Severity of compiler diagnostic messages on page 6-254.

6.6 Compiler data flow warnings

The compiler performs data flow analysis as part of its optimization process. This information can help
identify potential problems in your code, for example, issuing warnings about the use of uninitialized
variables.
The data flow analysis can only warn about local variables that are held in processor registers, not global
variables held in memory or variables or structures that are placed on the stack.
Be aware that:
• In ARM Compiler 5.04 and later, data flow warnings are suppressed by default. To output them, use
the --diag_warning=4017 option. In RealView Compiler Tools (RVCT) v2.0 and earlier, data flow
warnings are issued only if you specify the -fa option.
• Data flow analysis is disabled at optimization level -O0, even if you specify --diag_warning=4017.
For example, the following code produces the warning C4017W: i may be used before being set, if
you have enabled it, when compiling at -O1 and above:
int f(void)
{
int i;
return i++;
}

The results of the analysis vary with the level of optimization used. This means that higher optimization
levels might produce a number of warnings that do not appear at lower levels.
The data flow analysis cannot reliably identify faulty code and any C4017W warnings issued by the
compiler are intended only as an indication of possible problems. For a full analysis of your code, use an
appropriate third-party analysis tool, for example Lint.
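The minimal fix is to give every local variable a defined value before it is read. The version of f() below initializes i, so no C4017W warning is expected for it. The struct example uses the hypothetical names struct point and sum_point(). It reflects the limitation stated above: the analysis does not cover structures placed on the stack, so initialize them explicitly instead of relying on a warning.

```c
/* Corrected version of f(): i is initialized before it is read. */
int f(void)
{
    int i = 0;
    return i++;   /* returns the old value, 0; the increment is discarded */
}

/* Hypothetical example: explicitly initialize every member of a local
   structure, because a read of an unset member might not be diagnosed. */
struct point { int x; int y; };

int sum_point(void)
{
    struct point p = { 0, 0 };
    p.x = 3;
    return p.x + p.y;
}
```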

Related references
6.1 Severity of compiler diagnostic messages on page 6-254.

Chapter 7
Using the Inline and Embedded Assemblers of the ARM Compiler

Describes the optimizing inline assembler and non-optimizing embedded assembler of the ARM
compiler, armcc.

Note
Using intrinsics is generally preferable to using inline or embedded assembly language.

It contains the following sections:

• 7.1 Compiler support for inline assembly language on page 7-264.
• 7.2 Inline assembler support in the compiler on page 7-265.
• 7.3 Restrictions on inline assembler support in the compiler on page 7-266.
• 7.4 Inline assembly language syntax with the __asm keyword in C and C++ on page 7-267.
• 7.5 Inline assembly language syntax with the asm keyword in C++ on page 7-268.
• 7.6 Inline assembler rules for compiler keywords __asm and asm on page 7-269.
• 7.7 Restrictions on inline assembly operations in C and C++ code on page 7-270.
• 7.8 Inline assembler register restrictions in C and C++ code on page 7-271.
• 7.9 Inline assembler processor mode restrictions in C and C++ code on page 7-272.
• 7.10 Inline assembler Thumb instruction set restrictions in C and C++ code on page 7-273.
• 7.11 Inline assembler Vector Floating-Point (VFP) restrictions in C and C++ code on page 7-274.
• 7.12 Inline assembler instruction restrictions in C and C++ code on page 7-275.
• 7.13 Miscellaneous inline assembler restrictions in C and C++ code on page 7-276.
• 7.14 Inline assembler and register access in C and C++ code on page 7-277.
• 7.15 Inline assembler and the # constant expression specifier in C and C++ code on page 7-279.
• 7.16 Inline assembler and instruction expansion in C and C++ code on page 7-280.
• 7.17 Expansion of inline assembler instructions that use constants on page 7-281.
• 7.18 Expansion of inline assembler load and store instructions on page 7-282.
• 7.19 Inline assembler effect on processor condition flags in C and C++ code on page 7-283.
• 7.20 Inline assembler expression operands in C and C++ code on page 7-284.
• 7.21 Inline assembler register list operands in C and C++ code on page 7-285.
• 7.22 Inline assembler intermediate operands in C and C++ code on page 7-286.
• 7.23 Inline assembler function calls and branches in C and C++ code on page 7-287.
• 7.24 Inline assembler branches and labels in C and C++ code on page 7-289.
• 7.25 Inline assembler and virtual registers on page 7-290.
• 7.26 Embedded assembler support in the compiler on page 7-291.
• 7.27 Embedded assembler syntax in C and C++ on page 7-292.
• 7.28 Effect of compiler ARM and Thumb states on embedded assembler on page 7-293.
• 7.29 Restrictions on embedded assembly language functions in C and C++ code on page 7-294.
• 7.30 Compiler generation of embedded assembly language functions on page 7-295.
• 7.31 Access to C and C++ compile-time constant expressions from embedded assembler on page 7-297.
• 7.32 Differences between expressions in embedded assembler and C or C++ on page 7-298.
• 7.33 Manual overload resolution in embedded assembler on page 7-299.
• 7.34 __offsetof_base keyword for related base classes in embedded assembler on page 7-300.
• 7.35 Compiler-supported keywords for calling class member functions in embedded assembler on page 7-301.
• 7.36 __mcall_is_virtual(D, f) on page 7-302.
• 7.37 __mcall_is_in_vbase(D, f) on page 7-303.
• 7.38 __mcall_offsetof_vbase(D, f) on page 7-304.
• 7.39 __mcall_this_offset(D, f) on page 7-305.
• 7.40 __vcall_offsetof_vfunc(D, f) on page 7-306.
• 7.41 Calling nonstatic member functions in embedded assembler on page 7-307.
• 7.42 Calling a nonvirtual member function on page 7-308.
• 7.43 Calling a virtual member function on page 7-309.
• 7.44 Accessing sp (r13), lr (r14), and pc (r15) on page 7-310.
• 7.45 Differences in compiler support for inline and embedded assembly code on page 7-311.

7.1 Compiler support for inline assembly language

The compiler provides an inline assembler that enables you to write optimized assembly language
routines, and to access features of the target processor not available from C or C++.

Related concepts
7.2 Inline assembler support in the compiler on page 7-265.
7.3 Restrictions on inline assembler support in the compiler on page 7-266.
7.4 Inline assembly language syntax with the __asm keyword in C and C++ on page 7-267.
7.5 Inline assembly language syntax with the asm keyword in C++ on page 7-268.
7.6 Inline assembler rules for compiler keywords __asm and asm on page 7-269.
7.7 Restrictions on inline assembly operations in C and C++ code on page 7-270.
7.14 Inline assembler and register access in C and C++ code on page 7-277.
7.15 Inline assembler and the # constant expression specifier in C and C++ code on page 7-279.
7.16 Inline assembler and instruction expansion in C and C++ code on page 7-280.
7.19 Inline assembler effect on processor condition flags in C and C++ code on page 7-283.
7.20 Inline assembler expression operands in C and C++ code on page 7-284.
7.21 Inline assembler register list operands in C and C++ code on page 7-285.
7.22 Inline assembler intermediate operands in C and C++ code on page 7-286.
7.23 Inline assembler function calls and branches in C and C++ code on page 7-287.
7.24 Inline assembler branches and labels in C and C++ code on page 7-289.
7.45 Differences in compiler support for inline and embedded assembly code on page 7-311.

Related references
10.159 Named register variables on page 10-774.

Related information
armasm User Guide.
Mixing C, C++, and Assembly Language.

7.2 Inline assembler support in the compiler

The inline assembler supports ARM assembly language for all architectures, and Thumb assembly
language in ARMv6T2, ARMv6-M, and ARMv7.
For ARMv7, the inline assembler supports:
• Most ARM instructions.
• Most Thumb instructions.
For ARMv6T2, the inline assembler supports most Thumb instructions.
For ARMv6, the inline assembler supports most ARM instructions, including the complete set of
ARMv6 Single Instruction Multiple Data (SIMD) instructions.
For ARMv5, the inline assembler supports most ARM instructions, including generic coprocessor
instructions.
For ARMv4, the inline assembler supports most ARM instructions, including generic coprocessor
instructions.
VFPv2 instructions are supported in the inline assembler.

7.3 Restrictions on inline assembler support in the compiler

The inline assembler in the compiler does not support a number of instructions.
Specifically, the inline assembler does not support:
• Thumb assembly language in processors without Thumb-2 technology.
• VFP instructions that were added in VFPv3 or higher.
• NEON instructions.
• The ARMv6 SETEND instruction and some of the system extensions.
• ARMv5 BX, BLX, and BXJ instructions.

7.4 Inline assembly language syntax with the __asm keyword in C and C++
The inline assembler is invoked with the assembler specifier, __asm, and is followed by a list of
assembler instructions inside braces or parentheses.
You can specify inline assembly code using the following formats:
• On a single line, for example:
__asm("instruction[;instruction]");
__asm{instruction[;instruction]}

You cannot include comments.

• Using multiple adjacent strings, for example:
__asm("ADD x, x, #1\n"
"MOV y, x\n");

This enables you to use macros to generate inline assembly, for example:
#define ADDLSL(x, y, shift) __asm ("ADD " #x ", " #y ", LSL " #shift)
• On multiple lines, for example:
__asm
{
...
instruction
...
}

You can use C or C++ comments anywhere in an inline assembly language block.
You can use an __asm statement wherever a statement is expected.
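The following minimal sketch shows the multiple-line format. The hypothetical function add_shifted() computes x + (y << 2) in an __asm block, using C variables as operands and a C++-style comment inside the block. The block is guarded by __CC_ARM, which armcc predefines, and a plain C fallback is provided so that the file also builds with other compilers. That guard is a portability convention, not something the inline assembler requires.

```c
/* Hypothetical example: returns x + (y << 2). */
int add_shifted(int x, int y)
{
    int result;
#if defined(__CC_ARM)
    __asm
    {
        ADD result, x, y, LSL #2   // operands are C variables
    }
#else
    result = x + (y << 2);         /* portable C fallback; assumes y >= 0 and no overflow */
#endif
    return result;
}
```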

7.5 Inline assembly language syntax with the asm keyword in C++
When compiling C++, the compiler supports the asm syntax proposed in the ISO C++ Standard.
You can specify inline assembly code using the following formats:
• On a single line, for example:
asm("instruction[;instruction]");
asm{instruction[;instruction]}

You cannot include comments.

• Using multiple adjacent strings, for example:
asm("ADD x, x, #1\n"
"MOV y, x\n");

This enables you to use macros to generate inline assembly, for example:
#define ADDLSL(x, y, shift) asm ("ADD " #x ", " #y ", LSL " #shift)
• On multiple lines, for example:
asm
{
...
instruction
...
}

You can use C or C++ comments anywhere in an inline assembly language block.
You can use an asm statement wherever a statement is expected.

7.6 Inline assembler rules for compiler keywords __asm and asm
There are a number of rules that apply to the __asm and asm keywords.
These rules are as follows:
• Multiple instructions on the same line must be separated with a semicolon (;).
• If an instruction requires more than one line, line continuation must be specified with the backslash
character (\).
• For the multiple line format, C and C++ comments are permitted anywhere in the inline assembly
language block. However, comments cannot be embedded in a line that contains multiple
instructions.
• The comma (,) is used as a separator in assembly language, so C expressions with the comma
operator must be enclosed in parentheses to distinguish them:
__asm
{
ADD x, y, (f(), z)
}
• Labels must be followed by a colon, :, like C and C++ labels.
• An asm statement must be inside a C++ function. An asm statement can be used anywhere a C++
statement is expected.
• Register names in inline assembly code are treated as C or C++ variables. They do not necessarily
relate to the physical register of the same name. If the register is not declared as a C or C++ variable,
the compiler generates a warning.
• Registers must not be saved and restored in inline assembly code. The compiler does this for you.
Also, the inline assembler does not provide direct access to the physical registers. However, indirect
access is provided through variables that act as virtual registers.
If registers other than APSR, CPSR, and SPSR are read without being written to, an error message is
issued. For example:
int f(int x)
{
__asm
{
STMFD sp!, {r0} // save r0 - illegal: read before write
ADD r0, x, 1
EOR x, r0, x
LDMFD sp!, {r0} // restore r0 - not needed.
}
return x;
}

The function must be written as:

int f(int x)
{
int r0;
__asm
{
ADD r0, x, 1
EOR x, r0, x
}
return x;
}
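This minimal sketch applies three of the rules above. The hypothetical function double_sum() declares r as a C variable so that it acts as a virtual register. It puts two instructions on one line, separated by a semicolon. Its comments go on separate lines, because a line that contains multiple instructions cannot carry a comment. The __CC_ARM guard and the C fallback exist only so that the sketch also builds with compilers other than armcc.

```c
/* Hypothetical example: returns (a + b) * 2. */
int double_sum(int a, int b)
{
    int r;   /* declared as a C variable, so it acts as a virtual register */
#if defined(__CC_ARM)
    __asm
    {
        // Add a and b, then double the result by adding it to itself.
        // The next line holds two instructions, so it carries no comment.
        ADD r, a, b ; ADD r, r, r
    }
#else
    r = (a + b) * 2;   /* portable C fallback */
#endif
    return r;
}
```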

7.7 Restrictions on inline assembly operations in C and C++ code

There are a number of restrictions on the operations that can be performed in inline assembly code.
These restrictions provide a measure of safety, and ensure that the assumptions in compiled C and C++
code are not violated in the assembled assembly code.

Related concepts
7.8 Inline assembler register restrictions in C and C++ code on page 7-271.
7.9 Inline assembler processor mode restrictions in C and C++ code on page 7-272.
7.10 Inline assembler Thumb instruction set restrictions in C and C++ code on page 7-273.
7.11 Inline assembler Vector Floating-Point (VFP) restrictions in C and C++ code on page 7-274.
7.12 Inline assembler instruction restrictions in C and C++ code on page 7-275.
7.13 Miscellaneous inline assembler restrictions in C and C++ code on page 7-276.

7.8 Inline assembler register restrictions in C and C++ code

Registers such as r0-r3, sp, lr, and the NZCV flags in the CPSR must be used with caution.
If you use C or C++ expressions as operands, the compiler might use these registers as temporary
registers when evaluating the expressions, and might corrupt the NZCV flags.
The pc, lr, and sp registers cannot be explicitly read or modified using inline assembly code because
there is no direct access to any physical registers. However, you can use the intrinsics __current_pc,
__current_sp, and __return_address to read these registers.

Related concepts
7.7 Restrictions on inline assembly operations in C and C++ code on page 7-270.
7.9 Inline assembler processor mode restrictions in C and C++ code on page 7-272.
7.10 Inline assembler Thumb instruction set restrictions in C and C++ code on page 7-273.
7.11 Inline assembler Vector Floating-Point (VFP) restrictions in C and C++ code on page 7-274.
7.12 Inline assembler instruction restrictions in C and C++ code on page 7-275.
7.13 Miscellaneous inline assembler restrictions in C and C++ code on page 7-276.
7.14 Inline assembler and register access in C and C++ code on page 7-277.

Related references
10.110 __current_pc intrinsic on page 10-718.
10.111 __current_sp intrinsic on page 10-719.
10.137 __return_address intrinsic on page 10-748.

7.9 Inline assembler processor mode restrictions in C and C++ code

ARM strongly recommends that you do not change processor modes or modify coprocessor states in
inline assembly code.

Caution
The compiler does not recognize such changes.

Instead of attempting to change processor modes or coprocessor states from within inline assembly code,
see if there are any intrinsics available that provide what you require. If no such intrinsics are available,
use embedded assembly code if absolutely necessary.

Related concepts
7.7 Restrictions on inline assembly operations in C and C++ code on page 7-270.
7.8 Inline assembler register restrictions in C and C++ code on page 7-271.
7.10 Inline assembler Thumb instruction set restrictions in C and C++ code on page 7-273.
7.11 Inline assembler Vector Floating-Point (VFP) restrictions in C and C++ code on page 7-274.
7.12 Inline assembler instruction restrictions in C and C++ code on page 7-275.
7.13 Miscellaneous inline assembler restrictions in C and C++ code on page 7-276.
4.1 Compiler intrinsics on page 4-105.
7.26 Embedded assembler support in the compiler on page 7-291.

Related information
Processor modes, and privileged and unprivileged software execution.

7.10 Inline assembler Thumb instruction set restrictions in C and C++ code
The inline assembler supports Thumb state in ARM architectures v6T2, v6M, and v7. There are a
number of Thumb-specific restrictions.
These restrictions are as follows:
1. TBB, TBH, CBZ, and CBNZ instructions are not supported.
2. In some cases, the compiler can replace IT blocks with branched code.
3. The instruction width specifier .N denotes a preference, but not a requirement, to the compiler. This is
because, in rare cases, optimizations and register allocation can make it inefficient to generate a 16-
bit encoding.
For ARMv6 and lower architectures, the inline assembler does not assemble any Thumb instructions.
Instead, on finding inline assembly while in Thumb state, the compiler switches to ARM state
automatically. Code that relies on this switch is currently supported, but this practice is deprecated. For
ARMv6T2 and higher, the automatic switch from Thumb to ARM state is made if the code is valid ARM
assembly but not Thumb.
ARM state can be set deliberately. Inline assembly language can be included in a source file that contains
code to be compiled for Thumb in ARMv6 and lower, by enclosing the functions containing inline
assembly code between #pragma arm and #pragma thumb statements. For example:
... // Thumb code
#pragma arm // ARM code. Switch code generation to the ARM instruction set so
// that the inline assembler is available for Thumb in ARMv6 and lower.
int add(int i, int j)
{
int res;
__asm
{
ADD res, i, j // add here
}
return res;
}
#pragma thumb // Thumb code. Switch back to the Thumb instruction set.
// The inline assembler is no longer available for Thumb in ARMv6 and
// lower.

The code must also be compiled using the --apcs=/interwork compiler command-line option.

Related concepts
7.7 Restrictions on inline assembly operations in C and C++ code on page 7-270.
7.8 Inline assembler register restrictions in C and C++ code on page 7-271.
7.9 Inline assembler processor mode restrictions in C and C++ code on page 7-272.
7.11 Inline assembler Vector Floating-Point (VFP) restrictions in C and C++ code on page 7-274.
7.12 Inline assembler instruction restrictions in C and C++ code on page 7-275.
7.13 Miscellaneous inline assembler restrictions in C and C++ code on page 7-276.

Related references
8.6 --apcs=qualifier...qualifier on page 8-322.
10.76 Pragmas on page 10-681.

Related information
Instruction width specifiers.
IT.
TBB and TBH.
CBZ and CBNZ.

7.11 Inline assembler Vector Floating-Point (VFP) restrictions in C and C++ code
The inline assembler provides direct support for VFPv2 instructions.
For example:
float foo(float f, float g)
{
float h;
__asm
{
VADD h, f, 0.5*g; // h = f + 0.5*g
}
return h;
}

In inline assembly code you cannot use the VFP instruction VMOV to transfer between an ARM register
and half of a doubleword extension register (NEON scalar). Instead, you can use the instruction VMOV to
transfer between an ARM register and a single-precision VFP register.
If you change the FPSCR register using inline assembly code, it produces runtime effects on the inline
VFP code and on subsequent compiler-generated VFP code.
Note
• Do not use inline assembly code to change VFP vector mode. VFP vector mode is deprecated.
• ARM strongly discourages the use of inline assembly coprocessor instructions to interact with VFP in
any way.

Related concepts
7.7 Restrictions on inline assembly operations in C and C++ code on page 7-270.
7.8 Inline assembler register restrictions in C and C++ code on page 7-271.
7.9 Inline assembler processor mode restrictions in C and C++ code on page 7-272.
7.10 Inline assembler Thumb instruction set restrictions in C and C++ code on page 7-273.
7.12 Inline assembler instruction restrictions in C and C++ code on page 7-275.
7.13 Miscellaneous inline assembler restrictions in C and C++ code on page 7-276.
5.41 Compiler support for floating-point arithmetic on page 5-200.

Related information
VMOV (between an ARM register and a NEON scalar).
VMOV (between one ARM register and single precision VFP).

7.12 Inline assembler instruction restrictions in C and C++ code

There are a number of instructions that the inline assembler does not support.
Specifically, the following instructions are not supported:
• BKPT, BX, BXJ, and BLX instructions.
Note
You can insert a BKPT instruction in C and C++ code by using the __breakpoint() intrinsic.

• LDR Rn, =expression pseudo-instruction. Use MOV Rn, expression instead. (This can generate a
load from a literal pool.)
• LDRT, LDRBT, STRT, and STRBT instructions.
• MUL, MLA, UMULL, UMLAL, SMULL, and SMLAL flag-setting instructions.
• MOV or MVN flag-setting instructions where the second operand is a constant.
• The special LDM instructions used in system or supervisor mode to load the user-mode banked
registers, written with a ^ after the register list, such as:
LDMIA sp!, {r0-r12, lr, pc}^
• ADR and ADRL pseudo-instructions.
Note
You can use MOV Rn, &expression instead of the ADR and ADRL pseudo-instructions.

• ARM recommends not using the LDREX and STREX instructions. This is because the compiler might
generate loads and stores between LDREX and STREX, potentially clearing the exclusive monitor set by
LDREX. This recommendation also applies to the byte, halfword, and doubleword variants LDREXB,
STREXB, LDREXH, STREXH, LDREXD, and STREXD.

Related concepts
7.7 Restrictions on inline assembly operations in C and C++ code on page 7-270.
7.8 Inline assembler register restrictions in C and C++ code on page 7-271.
7.9 Inline assembler processor mode restrictions in C and C++ code on page 7-272.
7.10 Inline assembler Thumb instruction set restrictions in C and C++ code on page 7-273.
7.11 Inline assembler Vector Floating-Point (VFP) restrictions in C and C++ code on page 7-274.
7.13 Miscellaneous inline assembler restrictions in C and C++ code on page 7-276.

Related references
10.106 __breakpoint intrinsic on page 10-714.

7.13 Miscellaneous inline assembler restrictions in C and C++ code

Compared with armasm or embedded assembly language, the inline assembler has a number of
restrictions.
Specifically, these restrictions are as follows:
• The inline assembler is a high-level assembler, and the code it generates might not always be exactly
what you write. Do not use it in an attempt to generate more efficient code than the compiler generates.
Use the embedded assembler or the ARM assembler armasm for this purpose.
• Some low-level features that are available in the ARM assembler armasm, such as writing to PC, are
not supported.
• Label expressions are not supported.
• You cannot get the address of the current instruction using dot notation (.) or {PC}.
• You cannot use the & operator to denote hexadecimal constants. Use the 0x prefix instead. For
example:
__asm { AND x, y, 0xF00 }
• The notation to specify the actual rotation of an 8-bit constant is not available in inline assembly
language. This means that where an 8-bit shifted constant is used, the C flag must be regarded as
corrupted if the NZCV flags are updated.
• You must not modify the stack pointer. This is not necessary because the compiler automatically
stacks and restores any working registers as required. The compiler does not permit you to explicitly
stack and restore work registers.

Related concepts
7.7 Restrictions on inline assembly operations in C and C++ code on page 7-270.
7.8 Inline assembler register restrictions in C and C++ code on page 7-271.
7.9 Inline assembler processor mode restrictions in C and C++ code on page 7-272.
7.10 Inline assembler Thumb instruction set restrictions in C and C++ code on page 7-273.
7.11 Inline assembler Vector Floating-Point (VFP) restrictions in C and C++ code on page 7-274.
7.12 Inline assembler instruction restrictions in C and C++ code on page 7-275.

7.14 Inline assembler and register access in C and C++ code

The inline assembler provides no direct access to the physical registers of an ARM processor. If an ARM
register name is used as an operand in an inline assembler instruction it becomes a reference to a variable
of the same name, and not the physical ARM register.
The variable can be thought of as a virtual register.
The compiler declares variables for physical registers as appropriate during optimization and code
generation. However, the physical register used in the assembled code might be different from the one specified
in the instruction, or it might be stored on the stack. You can explicitly declare variables representing
physical registers as normal C or C++ variables. The compiler implicitly declares registers R0 to R12 and
r0 to r12 as auto signed int local variables, regardless of whether or not they are used. If you want to
declare them to be of a different data type, you can do so. For example, in the following code, the
compiler does not implicitly declare r1 and r2 as auto signed int because they are explicitly declared
as char and float types respectively:
void bar(float *);
int add(int x)
{
int a = 0;
char r1 = 0;
float r2 = 0.0;
bar(&r2);
__asm
{
ADD r1, a, #100
}
...
return r1;
}

The compiler does not implicitly declare variables for any other registers, so you must explicitly declare
variables for registers other than R0 to R12 and r0 to r12 in your C or C++ code. No variables are
declared for the sp (r13), lr (r14), and pc (r15) registers, and they cannot be read or directly modified
in inline assembly code.
There is no virtual Processor Status Register (PSR). Any references to the PSR are always to the physical
PSR.

Each of these variables is the same size as a physical register (32 bits).

The compiler-declared variables have function local scope, that is, within a single function, multiple asm
statements or declarations that reference the same variable name access the same virtual register.
Existing inline assembly code that conforms to previously documented guidelines continues to perform
the same function as in previous versions of the compiler, although the actual registers used in each
instruction might be different.
The initial value in each variable representing a physical register is UNKNOWN. You must write to these
variables before reading them. The compiler generates an error if you attempt to read such a variable
before writing to it, for example, if you attempt to read the variable associated with the physical register
r1.

Any variables that you use in inline assembly code to refer to registers must be explicitly declared in
your C or C++ code, unless they are implicitly declared by the compiler. However, it is better to
explicitly declare them in your C or C++ code. You do not have to declare them to be of the same data
type as the implicit declarations. For example, although the compiler implicitly declares register R0 to be
of type signed int, you can explicitly declare R0 as an unsigned integer variable if required.
It is also better to use C or C++ variables as instruction operands. The compiler generates a warning the
first time a physical register name is used as an operand, regardless of whether the corresponding variable
is implicitly or explicitly declared, and generates it only once for each translation unit. For example, if
you use register r3 without declaring it, a warning is displayed. You can suppress the warning with
--diag_suppress.

Related concepts
7.18 Expansion of inline assembler load and store instructions on page 7-282.
7.8 Inline assembler register restrictions in C and C++ code on page 7-271.

Related references
8.61 --diag_suppress=tag[,tag,...] on page 8-389.

7.15 Inline assembler and the # constant expression specifier in C and C++ code
The constant expression specifier # is optional. If it is used, the expression following it must be a
constant.

7.16 Inline assembler and instruction expansion in C and C++ code

An ARM instruction in inline assembly code might be expanded into several instructions in the compiled
object.
The expansion depends on the instruction, the number of operands specified in the instruction, and the
type and value of each operand.

Related concepts
7.17 Expansion of inline assembler instructions that use constants on page 7-281.
7.18 Expansion of inline assembler load and store instructions on page 7-282.
7.1 Compiler support for inline assembly language on page 7-264.

7.17 Expansion of inline assembler instructions that use constants

A constant operand specified in an instruction is not limited to the values permitted by the instruction.
Instead, the compiler might translate the instruction into a sequence of instructions with the same effect.
For example:
ADD r0,r0,#1023

might be translated into:

ADD r0,r0,#1024
SUB r0,r0,#1

Another instruction that might be expanded is:

MOV rn,0x12345678

With the exception of coprocessor instructions, all ARM instructions with a constant operand support
instruction expansion. In addition, the MUL instruction can be expanded into a sequence of adds and shifts
when the third operand is a constant.
The effect of updating the CPSR by an expanded instruction is:
• Arithmetic instructions set the NZCV flags correctly.
• Logical instructions:
— Set the NZ flags correctly.
— Do not change the V flag.
— Corrupt the C flag.

Related concepts
7.16 Inline assembler and instruction expansion in C and C++ code on page 7-280.

7.18 Expansion of inline assembler load and store instructions

The LDM, STM, LDRD, and STRD instructions might be replaced by equivalent ARM instructions.
In this case the compiler outputs a warning message informing you that it might expand instructions. The
warning can be suppressed with --diag_suppress.
Inline assembly code must be written in such a way that it does not depend on the number of expected
instructions or on the expected execution time for each specified instruction.
Instructions that normally place constraints on pairs of operand registers, such as LDRD and STRD, are
replaced by a sequence of instructions with equivalent functionality and without the constraints.
However, these might be recombined into LDRD and STRD instructions.
All LDM and STM instructions are expanded into a sequence of LDR and STR instructions with equivalent
effect. However, the compiler might subsequently recombine the separate instructions into an LDM or STM
during optimization.

Related concepts
7.14 Inline assembler and register access in C and C++ code on page 7-277.
7.16 Inline assembler and instruction expansion in C and C++ code on page 7-280.

Related references
8.61 --diag_suppress=tag[,tag,...] on page 8-389.

7.19 Inline assembler effect on processor condition flags in C and C++ code
An inline assembly language instruction might explicitly or implicitly attempt to update the processor
condition flags.
Inline assembly language instructions that involve only virtual register operands or simple expression
operands have predictable behavior. The condition flags are set by the instruction if either an implicit or
an explicit update is specified. The condition flags are unchanged if no update is specified.
If any of the instruction operands are not simple operands, then the condition flags might be corrupted
unless the instruction updates them.
In general, the compiler cannot easily diagnose potential corruption of the condition flags. However, for
operands that require the construction and subsequent destruction of C++ temporaries the compiler gives
a warning if the instruction attempts to update the condition flags. This is because the destruction might
corrupt the condition flags.

Related concepts
7.20 Inline assembler expression operands in C and C++ code on page 7-284.
7.21 Inline assembler register list operands in C and C++ code on page 7-285.

7.20 Inline assembler expression operands in C and C++ code

Function arguments, C or C++ variables, and other C or C++ expressions can be specified as register
operands in an inline assembly language instruction.
The type of an expression used in place of an ARM integer register must be either an integral type (that
is, char, short, int or long), excluding long long, or a pointer type. No sign extension is performed
on char or short types. You must perform sign extension explicitly for these types. The compiler might
add code to evaluate these expressions and allocate them to registers.
When an expression operand is modified by the instruction, for example when it is used as a destination
register or as a base register with base-register update, the expression must be a modifiable lvalue.
For an instruction containing more than one expression operand, the order that expression operands are
evaluated is unspecified.
An expression operand of a conditional instruction is only evaluated if the conditions for the instruction
are met.
A C or C++ expression that is used as an inline assembly code operand might result in the instruction
being expanded into several instructions. This happens if the value of the expression does not meet the
constraints set out for the instruction operands in the ARM Architecture Reference Manual.
If an expression used as an operand creates a temporary that requires destruction, then the destruction
occurs after the inline assembly instruction is executed. This is analogous to the C++ rules for
destruction of temporaries.
A simple expression operand is one of the following:
• A variable value.
• The address of a variable.
• The dereferencing of a pointer variable.
• A compile-time constant.
Any expression containing one of the following is not a simple expression operand:
• An implicit function call, such as for division, or explicit function call.
• The construction of a C++ temporary.
• An arithmetic or logical operation.

Related concepts
7.21 Inline assembler register list operands in C and C++ code on page 7-285.
7.22 Inline assembler intermediate operands in C and C++ code on page 7-286.

Related information
ARM Architecture Reference Manual.

7.21 Inline assembler register list operands in C and C++ code

A register list can contain a maximum of 16 operands. These operands can be virtual registers or
expression register operands.
The order that virtual registers and expression operands are specified in a register list is significant. The
register list operands are read or written in left-to-right order. The first operand uses the lowest address,
and subsequent operands use addresses formed by incrementing the previous address by four. This
behavior is in contrast to the usual operation of the LDM or STM instructions where the lowest numbered
physical register is always stored to the lowest memory address. This difference in behavior is a
consequence of the virtualization of registers.
An expression operand or virtual register can appear more than once in a register list and is used each
time it is specified.
The base register is updated, if specified. The update overwrites any value loaded into the base register
during a memory load operation.
The inline assembler does not support operating on User mode registers when in a privileged mode, by
specifying ^ after a register list.

Related concepts
7.20 Inline assembler expression operands in C and C++ code on page 7-284.
7.22 Inline assembler intermediate operands in C and C++ code on page 7-286.

7.22 Inline assembler intermediate operands in C and C++ code

A C or C++ constant expression of an integral type might be used as an immediate value in an inline
assembly language instruction.
A constant expression that specifies an immediate shift must have a value that lies in the range defined in
the ARM Architecture Reference Manual, as appropriate for the shift operation.
A constant expression that specifies an immediate offset for a memory or coprocessor data transfer
instruction must have a value with suitable alignment.

Related concepts
7.20 Inline assembler expression operands in C and C++ code on page 7-284.
7.21 Inline assembler register list operands in C and C++ code on page 7-285.

7.23 Inline assembler function calls and branches in C and C++ code
The BL and SVC instructions of the inline assembler enable you to specify three optional lists following
the normal instruction fields.
These instructions have the following format:
SVC{cond} svc_num, {input_param_list}, {output_value_list}, {corrupt_reg_list}
BL{cond} function, {input_param_list}, {output_value_list}, {corrupt_reg_list}

Note
RVCT v3.0 renamed the SWI instruction to SVC. The inline assembler still accepts SWI in place of SVC.

If you are compiling for architecture 5TE or later, the linker converts BL function instructions to BLX
function instructions if appropriate. However, you cannot use BLX function instructions directly
within inline assembly code.
• input_param_list specifies the expressions or variables that are the input parameters to the function
call or SVC instruction, and the physical registers that contain the expressions or variables. They are
specified as assignments to physical registers or as physical register names. A single list can contain
both types of input register specification.
The inline assembler ensures that the correct values are present in the specified physical registers
before the BL or SVC instruction is entered. A physical register name that is specified without
assignment ensures that the value in the virtual register of the same name is present in the physical
register. This ensures backwards compatibility with existing inline assembly language code.
For example, the instruction:
BL foo, { r0=expression1, r1=expression2, r2 }

generates the following pseudocode:

MOV (physical) r0, expression1
MOV (physical) r1, expression2
MOV (physical) r2, (virtual) r2
BL foo

By default, if you do not specify any input_param_list input parameters, registers r0 to r3 are used
as input parameters.
Note
It is not possible to specify the lr, sp, or pc registers in the input parameter list.

• output_value_list specifies the physical registers that contain the output values from the BL or SVC
instruction and where they must be stored. The output values are specified as assignments from
physical registers to modifiable lvalue expressions or as single physical register names.
The inline assembler takes the values from the specified physical registers and assigns them into the
specified expressions. A physical register name specified without assignment causes the virtual
register of the same name to be updated with the value from the physical register.
For example, the instruction:
BL foo, { }, { result1=r0, r1 }

generates the following pseudocode:

BL foo
MOV result1, (physical) r0
MOV (virtual) r1, (physical) r1

By default, if you do not specify any output_value_list output values, register r0 is used for the
output value.
Note
It is not possible to specify the lr, sp, or pc registers in the output value list.

• corrupt_reg_list specifies the physical registers that are corrupted by the called function. If the
condition flags are modified by the called function, you must specify the PSR in the corrupted register
list.
The BL and SVC instructions always corrupt lr.
By default, if you do not specify any corrupt_reg_list registers, r0 to r3, lr (r14), and the PSR can be
corrupted.
Note
It is not possible to specify the lr, sp, or pc registers in the corrupt register list.

If you do not specify any lists, then:

• r0-r3 are used as input parameters.
• r0 is used for the output value and can be corrupted.
• r0-r3, r14, and the PSR can be corrupted.

Note
• The BX, BLX, and BXJ instructions are not supported in the inline assembler.
• Only the branch instruction, B, can jump to labels within a single C or C++ function.
• It is not possible to specify the lr, sp, or pc registers in any of the input, output, or corrupted register
lists.
• The sp register must not be changed by any SVC instruction or function call.

7.24 Inline assembler branches and labels in C and C++ code

Labels defined in inline assembly code can be used as targets for branches or C and C++ goto
statements.
They must be followed by a colon, :, like C and C++ labels, and they must be defined within the same
function that they are referenced from.
Labels defined in C and C++ can be used as targets by branch instructions in inline assembly code, in the
form:
B{cond} label

For example:
int foo(int x, int y)
{
__asm
{
SUBS x,x,y
BEQ end
}
return 1;
end:
return 0;
}

7.25 Inline assembler and virtual registers

Inline assembly code for the compiler always specifies virtual registers.
The compiler chooses the physical registers to be used for each instruction during code generation. This
enables the compiler to fully optimize the assembly code and surrounding C or C++ code.
The pc (r15), lr (r14), and sp (r13) registers cannot be accessed at all. An error message is generated
when these registers are accessed.
The initial values of virtual registers are undefined. Therefore, you must write to virtual registers before
reading them. The compiler generates an error if code reads a virtual register before writing to it. The
compiler also generates these diagnostics for legacy code that relies on particular values in physical
registers at the beginning of inline assembly code, for example:
int add(int i, int j)
{
int res;
__asm
{
ADD res, r0, r1 // relies on i passed in r0 and j passed in r1
}
return res;
}

This code generates warning and error messages.

The errors are generated because virtual registers r0 and r1 are read before writing to them. The
warnings are generated because r0 and r1 must be defined as C or C++ variables. The corrected code is:
int add(int i, int j)
{
int res;
__asm
{
ADD res, i, j
}
return res;
}

Related concepts
7.14 Inline assembler and register access in C and C++ code on page 7-277.

7.26 Embedded assembler support in the compiler

The compiler enables you to define one or more C or C++ functions whose bodies are written entirely in
assembly code. This code is assembled out-of-line, separately from the C or C++ code.
Embedded assembly code provides unrestricted, low-level access to the target processor, enables you to
use the C and C++ preprocessor directives, and gives easy access to structure member offsets. The
embedded assembler supports ARM and Thumb states.

Related concepts
7.27 Embedded assembler syntax in C and C++ on page 7-292.
7.28 Effect of compiler ARM and Thumb states on embedded assembler on page 7-293.
7.29 Restrictions on embedded assembly language functions in C and C++ code on page 7-294.
7.30 Compiler generation of embedded assembly language functions on page 7-295.
7.31 Access to C and C++ compile-time constant expressions from embedded assembler on page 7-297.
7.32 Differences between expressions in embedded assembler and C or C++ on page 7-298.
7.33 Manual overload resolution in embedded assembler on page 7-299.
7.34 __offsetof_base keyword for related base classes in embedded assembler on page 7-300.
7.35 Compiler-supported keywords for calling class member functions in embedded assembler
on page 7-301.
7.41 Calling nonstatic member functions in embedded assembler on page 7-307.
7.42 Calling a nonvirtual member function on page 7-308.
7.43 Calling a virtual member function on page 7-309.

Related information
armasm User Guide.
Mixing C, C++, and Assembly Language.

7.27 Embedded assembler syntax in C and C++

An embedded assembly language function definition is marked by the __asm function qualifier in C and
C++, or the asm function qualifier in C++.
The __asm and asm function qualifiers can be used on:
• Member functions.
• Non-member functions.
• Template functions.
• Template class member functions.
Functions declared with __asm or asm can take arguments and have a return type. They are called from C
and C++ in the same way as normal C and C++ functions. The syntax of an embedded assembly language
function is:
__asm return-type function-name(parameter-list)
{
    // ARM/Thumb assembly code
    instruction{;comment is optional}
    ...
    instruction
}

Note
Argument names are permitted in the parameter list, but they cannot be used in the body of the embedded
assembly function. For example, the following function uses integer i in the body of the function, but
this is not valid in assembly:
__asm int f(int i)
{
    ADD i, i, #1 // error
}

Because the AAPCS passes the first integer argument in r0, you can use r0 instead of i.

The following example shows a string copy routine written as a simple, unoptimized embedded assembler function.
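#include <stdio.h>

__asm void my_strcpy(const char *src, char *dst)
{
loop                      ; r0 = src, r1 = dst
    LDRB r2, [r0], #1     ; load a byte from src, then increment src
    STRB r2, [r1], #1     ; store the byte to dst, then increment dst
    CMP r2, #0            ; was it the terminating NUL?
    BNE loop              ; if not, copy the next byte
    BX lr                 ; return to the caller
}

int main(void)
{
    const char *a = "Hello world!";
    char b[20];
    my_strcpy(a, b);
    printf("Original string: '%s'\n", a);
    printf("Copied string: '%s'\n", b);
    return 0;
}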
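For comparison, a minimal C sketch of what the loop in my_strcpy does; my_strcpy_c is a hypothetical name. Under the AAPCS, src arrives in r0 and dst in r1. Each iteration loads a byte and post-increments the source pointer, stores the byte and post-increments the destination pointer, and the loop ends only after the terminating NUL has been copied. Unlike the standard strcpy(), which takes the destination first, both versions take the source first.

```c
/* C equivalent of the embedded my_strcpy loop (hypothetical name).
   Arguments are in the same order as the __asm version: source first. */
void my_strcpy_c(const char *src, char *dst)
{
    char c;
    do {
        c = *src++;          /* LDRB r2, [r0], #1 */
        *dst++ = c;          /* STRB r2, [r1], #1 */
    } while (c != '\0');     /* CMP r2, #0 and BNE loop */
}
```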

Related concepts
7.26 Embedded assembler support in the compiler on page 7-291.

7.28 Effect of compiler ARM and Thumb states on embedded assembler

The initial state of the embedded assembler, ARM or Thumb state, is determined by the initial state of
the compiler, as specified on the command line.
This means that:
• If the compiler starts in ARM state, the embedded assembler uses --arm.
• If the compiler starts in Thumb state, the embedded assembler uses --thumb.
The embedded assembler state at the start of each function is as set by the invocation of the compiler, as
modified by #pragma arm and #pragma thumb pragmas.
You can change the state of the embedded assembler within a function by using explicit ARM, THUMB, or
CODE16 directives in the embedded assembler function. Such a directive within an __asm function does
not affect the ARM or Thumb state of subsequent __asm functions.
If you are compiling for a processor that supports 32-bit Thumb instructions, you can use both 32-bit
and 16-bit encoded Thumb instructions when in Thumb state.
If you are compiling for a processor that supports only 16-bit Thumb instructions, you can use only
16-bit encoded Thumb instructions when in Thumb state.

Related concepts
7.26 Embedded assembler support in the compiler on page 7-291.

7.29 Restrictions on embedded assembly language functions in C and C++ code

A number of restrictions apply to embedded assembly language functions.
Specifically:
• After preprocessing, __asm functions can only contain assembly code, with the exception of the
following embedded assembler built-ins:
__cpp(expr)
__offsetof_base(D, B)
__mcall_is_virtual(D, f)
__mcall_is_in_vbase(D, f)
__mcall_offsetof_vbase(D, f)
__mcall_this_offset(D, f)
__vcall_offsetof_vfunc(D, f)
• No return instructions are generated by the compiler for an __asm function. If you want to return from
an __asm function, you must include the return instructions, in assembly code, in the body of the
function.
Note
This makes it possible to fall through to the next function, because the embedded assembler
guarantees to emit the __asm functions in the order you define them. However, inlined and template
functions behave differently. Do not assume that code execution falls out of an inline or template
function into another embedded assembly function.

• __asm functions do not change the ARM Architecture Procedure Call Standard (AAPCS) rules that
apply. This means that all calls between an __asm function and a normal C or C++ function must
adhere to the AAPCS, even though there are no restrictions on the assembly code that an __asm
function can use (for example, it can change state).

Related concepts
7.26 Embedded assembler support in the compiler on page 7-291.
7.34 __offsetof_base keyword for related base classes in embedded assembler on page 7-300.
7.35 Compiler-supported keywords for calling class member functions in embedded assembler
on page 7-301.
7.30 Compiler generation of embedded assembly language functions on page 7-295.
7.31 Access to C and C++ compile-time constant expressions from embedded assembler on page 7-297.

7.30 Compiler generation of embedded assembly language functions

The bodies of all the __asm functions in a translation unit are assembled as if they are concatenated into a
single file that is then passed to the ARM assembler.
The order of __asm functions in the assembly language file that is passed to the assembler is guaranteed
to be the same order as in the source file, except for functions that are generated using a template
instantiation.

Note
This means that it is possible for control to pass from one __asm function to another by falling off the end
of the first function into the next __asm function in the file, if the return instruction is omitted.

When you invoke armcc, the object file that the assembler produces is combined with the object file that
the compiler produces, by a partial link that creates a single object file.
The compiler generates an AREA directive for each __asm function, as in the following example:
#include <cstddef>

struct X
{
    int x, y;
    void addto_y(int);
};

__asm void X::addto_y(int)
{
    LDR r2, [r0, #__cpp(offsetof(X, y))]
    ADD r1, r2, r1
    STR r1, [r0, #__cpp(offsetof(X, y))]
    BX lr
}

For this function, the compiler generates:

    AREA ||.emb_text||, CODE, READONLY
    EXPORT |_ZN1X7addto_yEi|
#line num "file"
|_ZN1X7addto_yEi| PROC
    LDR r2, [r0, #4]
    ADD r1, r2, r1
    STR r1, [r0, #4]
    BX lr
    ENDP
    END

The use of offsetof must be inside __cpp() because it is the normal offsetof macro from the cstddef
header file.
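For struct X, the #4 in the generated code is the byte offset of y, which follows a single int member. A minimal C sketch of the same computation, assuming a 32-bit int with no padding between x and y (as under the AAPCS); offset_of_y and addto_y_c are hypothetical names, and the explicit address arithmetic mirrors the [r0, #offset] addressing used by the LDR and STR:

```c
#include <stddef.h>

/* C counterpart of struct X, without the member function. */
struct X {
    int x, y;
};

/* The value that __cpp(offsetof(X, y)) is replaced with. */
size_t offset_of_y(void)
{
    return offsetof(struct X, y);
}

/* Mirrors X::addto_y: load from [this + offset], add n, store back. */
void addto_y_c(struct X *self, int n)
{
    int *py = (int *)((char *)self + offsetof(struct X, y));
    *py = *py + n;
}
```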
Ordinary __asm functions are put in an ELF section with the name .emb_text, and embedded assembly
functions are never inlined. However, implicitly instantiated template functions and out-of-line
copies of inline functions are placed in an area with a name that is derived from the name of the function,
and an extra attribute that marks them as common. This ensures that the special semantics of these kinds
of functions are maintained.
Note
Because of the special naming of the area for out-of-line copies of inline functions and template
functions, these functions are not in the order of definition, but in an arbitrary order. Therefore, do not
assume that code execution falls out of an inline or template function and into another __asm function.

Related concepts
7.26 Embedded assembler support in the compiler on page 7-291.
7.29 Restrictions on embedded assembly language functions in C and C++ code on page 7-294.

Related information
ELF for the ARM Architecture.

7.31 Access to C and C++ compile-time constant expressions from embedded assembler

You can use the __cpp keyword to access C and C++ compile-time constant expressions, including the
addresses of data or functions with external linkage, from the assembly code.
The expression inside the __cpp must be a constant expression suitable for use as a C++ static
initialization. See 3.6.2 Initialization of non-local objects and 5.19 Constant expressions in ISO/IEC
14882:2003.
In each of the following instructions, the compiler replaces __cpp(expr) with a constant:
    LDR r0, =__cpp(&some_variable)
    LDR r1, =__cpp(some_function)
    BL __cpp(some_function)
    MOV r0, #__cpp(some_constant_expr)

Names in the __cpp expression are looked up in the C++ context of the __asm function. Any names in
the result of a __cpp expression are mangled as required and automatically have IMPORT statements
generated for them.
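The names some_variable, some_function, and some_constant_expr in the example are placeholders. A minimal C sketch of declarations they might refer to, together with static initializations that use the same expressions; the definitions and values are hypothetical. These particular initializers are valid in both C and C++, which is the property that the __cpp expression requires.

```c
/* Hypothetical definitions for the placeholder names (external linkage). */
int some_variable = 42;

int some_function(int n)
{
    return n + 1;
}

enum { some_constant_expr = 3 * 8 };

/* Each expression is usable as a static initializer:
   the address of an object, the address of a function,
   and an integer constant expression. */
static int *const var_addr = &some_variable;       /* like =__cpp(&some_variable) */
static int (*const fn_addr)(int) = some_function;   /* like =__cpp(some_function) */
static const int imm = some_constant_expr;          /* like #__cpp(some_constant_expr) */
```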

Related concepts
7.26 Embedded assembler support in the compiler on page 7-291.
7.29 Restrictions on embedded assembly language functions in C and C++ code on page 7-294.
7.33 Manual overload resolution in embedded assembler on page 7-299.
7.32 Differences between expressions in embedded assembler and C or C++ on page 7-298.

7.32 Differences between expressions in embedded assembler and C or C++

There are a number of differences between embedded assembly and C or C++.
Specifically:
• Assembly expressions are always unsigned. The same expression might have different values
between assembly and C or C++. For example:
      MOV r0, #(-33554432 / 2)          // result is 0x7f000000
      MOV r0, #__cpp(-33554432 / 2)     // result is 0xff000000
• Assembly numbers with leading zeros are still decimal. For example:
      MOV r0, #0700                     // decimal 700
      MOV r0, #__cpp(0700)              // octal 0700 == decimal 448
• Assembly operator precedence differs from C and C++. For example:
      MOV r0, #(0x23 :AND: 0xf + 1)     // ((0x23 & 0xf) + 1) => 4
      MOV r0, #__cpp(0x23 & 0xf + 1)    // (0x23 & (0xf + 1)) => 0
• Assembly strings are not NUL-terminated:
      DCB "Hello world!"                // 12 bytes (no trailing NUL)
      DCB __cpp("Hello world!")         // 13 bytes (trailing NUL)

Note
The embedded assembly rules apply outside __cpp, and the C or C++ rules apply inside __cpp.
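The C or C++ side of each pair of instructions above can be checked with any C compiler. A minimal sketch in which uint32_t models the assembler's unsigned 32-bit arithmetic and explicit parentheses model the grouping that the assembler uses; the variable names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* 1. Division: unsigned 32-bit in the assembler, signed int in C. */
static const uint32_t asm_div = (uint32_t)-33554432 / 2u;   /* 0xFE000000 / 2 = 0x7F000000 */
static const uint32_t c_div   = (uint32_t)(-33554432 / 2);  /* -16777216 = 0xFF000000 */

/* 2. Leading zero: decimal outside __cpp, octal inside __cpp. */
static const int asm_leading_zero = 700;   /* #0700 */
static const int c_leading_zero   = 0700;  /* #__cpp(0700) == 448 */

/* 3. Grouping: the assembler groups as (0x23 :AND: 0xf) + 1,
      C binds + more tightly than &. */
static const int asm_prec = (0x23 & 0xf) + 1;   /* 3 + 1 = 4 */
static const int c_prec   = 0x23 & 0xf + 1;     /* 0x23 & 0x10 = 0 */

/* 4. Strings: DCB "..." emits no NUL, a C string literal includes one. */
static const size_t asm_dcb_bytes = sizeof("Hello world!") - 1;  /* 12 */
static const size_t c_dcb_bytes   = sizeof("Hello world!");      /* 13 */
```

A compiler might warn about the precedence in c_prec; that surprise is exactly the point of the example.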

Related concepts
7.26 Embedded assembler support in the compiler on page 7-291.
7.31 Access to C and C++ compile-time constant expressions from embedded assembler on page 7-297.

7.33 Manual overload resolution in embedded assembler

The following example shows the use of C++ casts to do overload resolution for nonvirtual function
calls.
void g(int);
void g(long);

struct T
{
    int mf(int);
    int mf(int, int);
};

__asm void f(T*, int, int)
{
    BL __cpp(static_cast<int (T::*)(int, int)>(&T::mf)) // calls T::mf(int, int)
    BL __cpp(static_cast<void (*)(int)>(g))             // calls g(int)
    BX lr
}

Related concepts
7.26 Embedded assembler support in the compiler on page 7-291.
7.31 Access to C and C++ compile-time constant expressions from embedded assembler on page 7-297.

7.34 __offsetof_base keyword for related base classes in embedded assembler

The __offsetof_base keyword enables you to determine the offset from the beginning of an object to a
base class sub-object within it.
__offsetof_base(D, B)

B must be an unambiguous, nonvirtual base class of D.

Returns the offset from the beginning of a D object to the start of the B base subobject within it. The
result might be zero. The following example shows the offset (in bytes) that must be added to a D* p to
implement the equivalent of static_cast<B*>(p).
__asm B* my_static_base_cast(D* /*p*/)  // equivalent to:
                                        // return static_cast<B*>(p)
{
    if __offsetof_base(D, B) <> 0       // optimize zero offset case
        ADD r0, r0, #__offsetof_base(D, B)
    endif
    BX lr
}

The __offsetof_base, __mcall_*, and __vcall_offsetof_vfunc keywords are converted into integer
or logical constants in the assembly source code. You can only use them in __asm functions, not in __cpp
expressions.
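In C++, a nonzero base offset typically arises with multiple inheritance, where not every base can start at the beginning of the object. The pointer arithmetic that my_static_base_cast performs can be sketched in C by embedding the "base" as a member at a nonzero offset. This is an analogy under that assumption, not how the compiler lays out C++ classes; the struct and function names are hypothetical.

```c
#include <stddef.h>

/* C analogue of a class D with a base B at a nonzero offset. */
struct B { int b; };
struct D {
    int d;          /* member that precedes the base sub-object */
    struct B base;  /* plays the role of the B base class */
};

/* Equivalent of my_static_base_cast: add the base offset to the D pointer.
   offsetof(struct D, base) stands in for __offsetof_base(D, B). */
struct B *my_static_base_cast_c(struct D *p)
{
    return (struct B *)((char *)p + offsetof(struct D, base));
}
```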

Related concepts
7.26 Embedded assembler support in the compiler on page 7-291.
7.29 Restrictions on embedded assembly language functions in C and C++ code on page 7-294.

7.35 Compiler-supported keywords for calling class member functions in embedded assembler

The following embedded assembler built-ins facilitate the calling of virtual and nonvirtual member
functions from an __asm function.
Those beginning with __mcall can be used for both virtual and nonvirtual functions. Those beginning
with __vcall can be used only with virtual functions. They do not particularly help in calling static
member functions.
• __mcall_is_virtual(D, f).
• __mcall_is_in_vbase(D, f).
• __mcall_offsetof_vbase(D, f).
• __mcall_this_offset(D, f).
• __vcall_offsetof_vfunc(D, f).

Related concepts
7.26 Embedded assembler support in the compiler on page 7-291.
7.36 __mcall_is_virtual(D, f) on page 7-302.
7.37 __mcall_is_in_vbase(D, f) on page 7-303.
7.38 __mcall_offsetof_vbase(D, f) on page 7-304.
7.39 __mcall_this_offset(D, f) on page 7-305.
7.40 __vcall_offsetof_vfunc(D, f) on page 7-306.
7.41 Calling nonstatic member functions in embedded assembler on page 7-307.
7.29 Restrictions on embedded assembly language functions in C and C++ code on page 7-294.

7.36 __mcall_is_virtual(D, f)
Results in {TRUE} if f is a virtual member function found in D, or a base class of D, otherwise {FALSE}.
If it returns {TRUE} the call can be done using virtual dispatch, otherwise the call must be done directly.

Related concepts
7.35 Compiler-supported keywords for calling class member functions in embedded assembler
on page 7-301.
7.42 Calling a nonvirtual member function on page 7-308.
7.43 Calling a virtual member function on page 7-309.

7.37 __mcall_is_in_vbase(D, f)
Results in {TRUE} if f is a nonstatic member function found in a virtual base class of D, otherwise
{FALSE}.

If it returns {TRUE} the this adjustment must be done using __mcall_offsetof_vbase(D, f),
otherwise it must be done with __mcall_this_offset(D, f).

7.38 __mcall_offsetof_vbase(D, f)
Returns the offset of the vtable slot that holds the base offset, measured downward from the value of the
vtable pointer. The base offset is the offset from the beginning of a D object to the start of the base in
which f is defined.
D must be a class type and f must be a nonstatic member function defined in a virtual base class of D. In
other words, __mcall_is_in_vbase(D, f) must return {TRUE}.
The base offset is the this adjustment necessary when making a call to f with a pointer to a D.
Note
The offset is returned as a positive number, which must then be subtracted from the value of the vtable
pointer.

7.39 __mcall_this_offset(D, f)
Returns the offset from the beginning of a D object to the start of the base in which f is defined.
This is the this adjustment necessary when making a call to f with a pointer to a D. It is either zero if f
is found in D or the same as __offsetof_base(D,B), where B is a nonvirtual base class of D that contains
f.

D must be a class type and f must be a nonstatic member function defined in D or in a nonvirtual base
class of D.
If __mcall_this_offset(D, f) is used when f is found in a virtual base class of D, it returns an arbitrary
value designed to cause an assembly error if used. This allows such invalid uses of
__mcall_this_offset to appear in sections of assembly code that are skipped, such as the branch of an
if...else...endif block that is not selected.

7.40 __vcall_offsetof_vfunc(D, f)
Returns the offset of the slot in the vtable that holds the pointer to the virtual function, f.
D must be a class and f must be a virtual function defined in D or in a base class of D.
If __vcall_offsetof_vfunc(D, f) is used when f is not a virtual member function, it returns an
arbitrary value designed to cause an assembly error if used.

7.41 Calling nonstatic member functions in embedded assembler

You can use keywords beginning with __mcall and __vcall to call nonvirtual and virtual functions from
__asm functions.

There is no __mcall_is_static to detect static member functions because static member functions have
different parameters (that is, no this), so call sites are likely to already be specific to calling a static
member function.

Related concepts
7.26 Embedded assembler support in the compiler on page 7-291.
7.35 Compiler-supported keywords for calling class member functions in embedded assembler
on page 7-301.
7.42 Calling a nonvirtual member function on page 7-308.
7.43 Calling a virtual member function on page 7-309.

7.42 Calling a nonvirtual member function

The following example shows code that calls a nonvirtual function in either a virtual or nonvirtual base.
    // rp contains a D* and we want to do the equivalent of rp->f() where f is a
    // nonvirtual function
    // all arguments other than the this pointer are already set up
    // assumes f does not return a struct
    if __mcall_is_in_vbase(D, f)
        LDR r12, [rp]                                   // fetch vtable pointer
        LDR r0, [r12, #-__mcall_offsetof_vbase(D, f)]   // fetch the vbase offset
        ADD r0, r0, rp                                  // do this adjustment
    else
        ADD r0, rp, #__mcall_this_offset(D, f)          // set up and adjust this
                                                        // pointer for D*
    endif
    BL __cpp(&D::f)                                     // call D::f
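The two branches compute the adjusted this pointer in different ways. A C sketch of that arithmetic, assuming a simplified object whose first word is the vtable pointer, and a vtable whose virtual-base offset slot sits below the address that the vtable pointer holds (as described for __mcall_offsetof_vbase). The names struct obj, vbase_this, and nonvbase_this are hypothetical, and the real layout is defined by the C++ ABI, not by this model.

```c
#include <stddef.h>

/* Hypothetical object model: the first word is the vtable pointer. */
struct obj {
    const char *vptr;   /* address held in the vtable pointer */
    char data[32];      /* rest of the object, including base sub-objects */
};

/* if branch: read the adjustment from the vtable at vptr - neg_off,
   where neg_off plays the role of __mcall_offsetof_vbase(D, f). */
void *vbase_this(struct obj *rp, size_t neg_off)
{
    const char *vptr = rp->vptr;                           /* LDR r12, [rp] */
    ptrdiff_t off = *(const ptrdiff_t *)(vptr - neg_off);  /* LDR r0, [r12, #-neg_off] */
    return (char *)rp + off;                               /* ADD r0, r0, rp */
}

/* else branch: the adjustment is the constant __mcall_this_offset(D, f). */
void *nonvbase_this(struct obj *rp, size_t this_off)
{
    return (char *)rp + this_off;                          /* ADD r0, rp, #this_off */
}
```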

Related concepts
7.26 Embedded assembler support in the compiler on page 7-291.
7.41 Calling nonstatic member functions in embedded assembler on page 7-307.
7.43 Calling a virtual member function on page 7-309.

7.43 Calling a virtual member function

The following example shows code that calls a virtual function in either a virtual or nonvirtual base.
    // rp contains a D* and we want to do the equivalent of rp->f() where f is a
    // virtual function
    // all arguments other than the this pointer are already set up
    // assumes f does not return a struct
    if __mcall_is_in_vbase(D, f)
        LDR r12, [rp]                                   // fetch vtable pointer
        LDR r0, [r12, #-__mcall_offsetof_vbase(D, f)]   // fetch the base offset
        ADD r0, r0, rp                                  // do this adjustment
        LDR r12, [r0]                                   // fetch vbase vtable pointer
    else
        MOV r0, rp                                      // set up this pointer for D*
        LDR r12, [rp]                                   // fetch vtable pointer
        ADD r0, r0, #__mcall_this_offset(D, f)          // do this adjustment
    endif
    MOV lr, pc                                          // prepare lr
    LDR pc, [r12, #__vcall_offsetof_vfunc(D, f)]        // calls rp->f()
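The final instruction loads the function pointer from the slot located __vcall_offsetof_vfunc(D, f) bytes into the vtable and branches to it, with the adjusted this pointer in r0. A C sketch of that dispatch, assuming a hand-built vtable of function pointers and, for simplicity, a zero this adjustment. The types and functions are hypothetical; a real C++ vtable also holds other ABI-defined entries.

```c
#include <stddef.h>

/* Each virtual function receives the this pointer first, as in r0. */
typedef int (*vfunc)(void *self);

/* A hand-built vtable with two slots. */
struct vtable {
    vfunc other;   /* some other virtual function */
    vfunc f;       /* the slot that __vcall_offsetof_vfunc(D, f) would locate */
};

struct object {
    const struct vtable *vptr;  /* first word of the object */
    int value;
};

/* Example virtual function bodies. */
int object_other(void *self) { (void)self; return 0; }
int object_f(void *self) { return ((struct object *)self)->value; }

/* Mirrors LDR r12, [rp] followed by LDR pc, [r12, #slot_offset]. */
int call_virtual(struct object *rp, size_t slot_offset)
{
    const char *vtbl = (const char *)rp->vptr;          /* fetch vtable pointer */
    vfunc fn = *(const vfunc *)(vtbl + slot_offset);    /* fetch function pointer */
    return fn(rp);                                      /* call with this = rp */
}
```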

Related concepts
7.26 Embedded assembler support in the compiler on page 7-291.
7.41 Calling nonstatic member functions in embedded assembler on page 7-307.
7.42 Calling a nonvirtual member function on page 7-308.

7.44 Accessing sp (r13), lr (r14), and pc (r15)

The following methods enable you to access the sp, lr, and pc registers correctly in your source code.
The first method uses the compiler intrinsics in inline assembly, for example:
#include <stdio.h>

void printReg(void)
{
    unsigned int spReg, lrReg, pcReg;
    __asm
    {
        MOV spReg, __current_sp()
        MOV pcReg, __current_pc()
        MOV lrReg, __return_address()
    }
    printf("SP = 0x%X\n", spReg);
    printf("PC = 0x%X\n", pcReg);
    printf("LR = 0x%X\n", lrReg);
}

The second method uses embedded assembly to access physical ARM registers from within a C or C++
source file, for example:
__asm void func()
{
    MOV r0, lr
    ...
    BX lr
}

This enables the return address of a function to be captured and displayed, for example, for debugging
purposes, to show the call tree.
Note
The compiler might also inline a function into its caller. If a function is inlined, the return
address is the return address of the function that calls the inlined function. Also, a function might be tail
called, in which case lr holds the return address of the function that made the tail call.

Related concepts
7.26 Embedded assembler support in the compiler on page 7-291.

Related references
10.137 __return_address intrinsic on page 10-748.
10.110 __current_pc intrinsic on page 10-718.
10.111 __current_sp intrinsic on page 10-719.

7.45 Differences in compiler support for inline and embedded assembly code
There are differences between the ways inline and embedded assembly are compiled.
Specifically:
• Inline assembly code uses a high level of processor abstraction, and is integrated with the C and C++
code during code generation. Therefore, the compiler optimizes the C and C++ code and the
assembly code together.
• Unlike inline assembly code, embedded assembly code is assembled separately from the C and C++
code to produce a compiled object that is then combined with the object from the compilation of the
C or C++ source.
• Inline assembly code can be inlined by the compiler, but embedded assembly code cannot be inlined,
either implicitly or explicitly.
The following table summarizes the main differences between inline assembler and embedded assembler.

Table 7-1 Differences between inline and embedded assembler

Instruction set
    Embedded assembler: ARM and Thumb.
    Inline assembler: ARM on all processors. Thumb on processors with Thumb-2 technology.

ARM assembler directives
    Embedded assembler: All supported.
    Inline assembler: None supported.

ARMv6 instructions
    Embedded assembler: All supported.
    Inline assembler: Supports most instructions, with some exceptions, for example SETEND and some
    of the system extensions. The complete set of ARMv6 SIMD instructions is supported.

ARMv7 instructions
    Embedded assembler: All supported.
    Inline assembler: Supports most instructions.

VFP and NEON instructions
    Embedded assembler: All supported.
    Inline assembler: VFPv2 only.

C/C++ expressions
    Embedded assembler: Constant expressions only.
    Inline assembler: Full C/C++ expressions.

Optimization of assembly code
    Embedded assembler: No optimization.
    Inline assembler: Full optimization.

Inlining
    Embedded assembler: Never.
    Inline assembler: Possible.

Register access
    Embedded assembler: Specified physical registers are used. You can also use PC, LR and SP.
    Inline assembler: Uses virtual registers. Using sp (r13), lr (r14), and pc (r15) gives an error.

Return instructions
    Embedded assembler: You must add them in your code.
    Inline assembler: Generated automatically. (The BX, BXJ, and BLX instructions are not supported.)

BKPT instruction
    Embedded assembler: Supported directly.
    Inline assembler: Not supported.

Chapter 8
Compiler Command-line Options

Describes the armcc compiler command-line options.

It contains the following sections:
• 8.1 -Aopt on page 8-317.
• 8.2 --allow_fpreg_for_nonfpdata, --no_allow_fpreg_for_nonfpdata on page 8-318.
• 8.3 --allow_null_this, --no_allow_null_this on page 8-319.
• 8.4 --alternative_tokens, --no_alternative_tokens on page 8-320.
• 8.5 --anachronisms, --no_anachronisms on page 8-321.
• 8.6 --apcs=qualifier...qualifier on page 8-322.
• 8.7 --arm on page 8-326.
• 8.8 --arm_linux on page 8-327.
• 8.9 --arm_linux_config_file=path on page 8-329.
• 8.10 --arm_linux_configure on page 8-330.
• 8.11 --arm_linux_paths on page 8-332.
• 8.12 --arm_only on page 8-334.
• 8.13 --asm on page 8-335.
• 8.14 --asm_dir=directory_name on page 8-336.
• 8.15 --autoinline, --no_autoinline on page 8-337.
• 8.16 --bigend on page 8-338.
• 8.17 --bitband on page 8-339.
• 8.18 --branch_tables, --no_branch_tables on page 8-340.
• 8.19 --brief_diagnostics, --no_brief_diagnostics on page 8-342.
• 8.20 --bss_threshold=num on page 8-343.
• 8.21 -c on page 8-344.
• 8.22 -C on page 8-345.
• 8.23 --c90 on page 8-346.
• 8.24 --c99 on page 8-347.
• 8.25 --code_gen, --no_code_gen on page 8-348.
• 8.26 --comment_section, --no_comment_section on page 8-349.
• 8.27 --common_functions, --no_common_functions on page 8-350.
• 8.28 --compatible=name on page 8-351.
• 8.29 --compile_all_input, --no_compile_all_input on page 8-353.
• 8.30 --conditionalize, --no_conditionalize on page 8-354.
• 8.31 --configure_cpp_headers=path on page 8-355.
• 8.32 --configure_extra_includes=paths on page 8-356.
• 8.33 --configure_extra_libraries=paths on page 8-357.
• 8.34 --configure_gas=path on page 8-358.
• 8.35 --configure_gcc=path on page 8-359.
• 8.36 --configure_gcc_version=version on page 8-360.
• 8.37 --configure_gld=path on page 8-361.
• 8.38 --configure_sysroot=path on page 8-362.
• 8.39 --cpp on page 8-363.
• 8.40 --cpp11 on page 8-364.
• 8.41 --cpp_compat on page 8-365.
• 8.42 --cpu=list on page 8-367.
• 8.43 --cpu=name on page 8-368.
• 8.44 --create_pch=filename on page 8-371.
• 8.45 -Dname[(parm-list)][=def] on page 8-372.
• 8.46 --data_reorder, --no_data_reorder on page 8-373.
• 8.47 --debug, --no_debug on page 8-374.
• 8.48 --debug_macros, --no_debug_macros on page 8-375.
• 8.49 --default_definition_visibility=visibility on page 8-376.
• 8.50 --default_extension=ext on page 8-377.
• 8.51 --dep_name, --no_dep_name on page 8-378.
• 8.52 --depend=filename on page 8-379.
• 8.53 --depend_dir=directory_name on page 8-380.
• 8.54 --depend_format=string on page 8-381.
• 8.55 --depend_single_line, --no_depend_single_line on page 8-383.
• 8.56 --depend_system_headers, --no_depend_system_headers on page 8-384.
• 8.57 --depend_target=target on page 8-385.
• 8.58 --diag_error=tag[,tag,...] on page 8-386.
• 8.59 --diag_remark=tag[,tag,...] on page 8-387.
• 8.60 --diag_style=arm|ide|gnu compiler option on page 8-388.
• 8.61 --diag_suppress=tag[,tag,...] on page 8-389.
• 8.62 --diag_suppress=optimizations on page 8-390.
• 8.63 --diag_warning=tag[,tag,...] on page 8-391.
• 8.64 --diag_warning=optimizations on page 8-392.
• 8.65 --dllexport_all, --no_dllexport_all on page 8-393.
• 8.66 --dllimport_runtime, --no_dllimport_runtime on page 8-394.
• 8.67 --dollar, --no_dollar on page 8-395.
• 8.68 --dwarf2 on page 8-396.
• 8.69 --dwarf3 on page 8-397.
• 8.70 -E on page 8-398.
• 8.71 --echo on page 8-399.
• 8.72 --emit_frame_directives, --no_emit_frame_directives on page 8-400.
• 8.73 --enum_is_int on page 8-401.
• 8.74 --errors=filename on page 8-402.
• 8.75 --exceptions, --no_exceptions on page 8-403.
• 8.76 --exceptions_unwind, --no_exceptions_unwind on page 8-404.
• 8.77 --execstack, --no_execstack on page 8-405.
• 8.78 --execute_only on page 8-406.
• 8.79 --export_all_vtbl, --no_export_all_vtbl on page 8-407.
• 8.80 --export_defs_implicitly, --no_export_defs_implicitly on page 8-408.
• 8.81 --extended_initializers, --no_extended_initializers on page 8-409.
• 8.82 --feedback=filename on page 8-410.
• 8.83 --float_literal_pools, --no_float_literal_pools on page 8-411.
• 8.84 --force_new_nothrow, --no_force_new_nothrow on page 8-412.
• 8.85 --forceinline on page 8-413.
• 8.86 --fp16_format=format on page 8-414.
• 8.87 --fpmode=model on page 8-415.
• 8.88 --fpu=list on page 8-417.
• 8.89 --fpu=name on page 8-418.
• 8.90 --friend_injection, --no_friend_injection on page 8-421.
• 8.91 -g on page 8-422.
• 8.92 --global_reg=reg_name[,reg_name,...] on page 8-423.
• 8.93 --gnu on page 8-424.
• 8.94 --gnu_defaults on page 8-425.
• 8.95 --gnu_instrument, --no_gnu_instrument on page 8-426.
• 8.96 --gnu_version=version on page 8-427.
• 8.97 --guiding_decls, --no_guiding_decls on page 8-428.
• 8.98 --help on page 8-429.
• 8.99 --hide_all, --no_hide_all on page 8-430.
• 8.100 -Idir[,dir,...] on page 8-431.
• 8.101 --ignore_missing_headers on page 8-432.
• 8.102 --implicit_include, --no_implicit_include on page 8-433.
• 8.103 --implicit_include_searches, --no_implicit_include_searches on page 8-434.
• 8.104 --implicit_key_function, --no_implicit_key_function on page 8-435.
• 8.105 --implicit_typename, --no_implicit_typename on page 8-436.
• 8.106 --import_all_vtbl on page 8-437.
• 8.107 --info=totals on page 8-438.
• 8.108 --inline, --no_inline on page 8-439.
• 8.109 --integer_literal_pools, --no_integer_literal_pools on page 8-440.
• 8.110 --interface_enums_are_32_bit on page 8-441.
• 8.111 --interleave on page 8-442.
• 8.112 -Jdir[,dir,...] on page 8-443.
• 8.113 --kandr_include on page 8-444.
• 8.114 -Lopt on page 8-445.
• 8.115 --library_interface=lib on page 8-446.
• 8.116 --library_type=lib on page 8-448.
• 8.117 --link_all_input, --no_link_all_input on page 8-449.
• 8.118 --list on page 8-450.
• 8.119 --list_dir=directory_name on page 8-452.
• 8.120 --list_macros on page 8-453.
• 8.121 --littleend on page 8-454.
• 8.122 --locale=lang_country on page 8-455.
• 8.123 --long_long on page 8-456.
• 8.124 --loop_optimization_level=opt on page 8-457.
• 8.125 --loose_implicit_cast on page 8-458.
• 8.126 --lower_ropi, --no_lower_ropi on page 8-459.
• 8.127 --lower_rwpi, --no_lower_rwpi on page 8-460.
• 8.128 -M on page 8-461.
• 8.129 --md on page 8-462.
• 8.130 --message_locale=lang_country[.codepage] on page 8-463.
• 8.131 --min_array_alignment=opt on page 8-464.
• 8.132 --mm on page 8-465.
• 8.133 --multibyte_chars, --no_multibyte_chars on page 8-466.
• 8.134 --multifile, --no_multifile on page 8-467.
• 8.135 --multiply_latency=cycles on page 8-468.
• 8.136 --narrow_volatile_bitfields on page 8-469.
• 8.137 --nonstd_qualifier_deduction, --no_nonstd_qualifier_deduction on page 8-470.
• 8.138 -o filename on page 8-471.
• 8.139 -Onum on page 8-473.
• 8.140 --old_specializations, --no_old_specializations on page 8-476.
• 8.141 --old_style_preprocessing on page 8-477.
• 8.142 --ool_section_name, --no_ool_section_name on page 8-478.
• 8.143 -Ospace on page 8-479.
• 8.144 -Otime on page 8-480.
• 8.145 --output_dir=directory_name on page 8-481.
• 8.146 -P on page 8-482.
• 8.147 --parse_templates, --no_parse_templates on page 8-483.
• 8.148 --pch on page 8-484.
• 8.149 --pch_dir=dir on page 8-485.
• 8.150 --pch_messages, --no_pch_messages on page 8-486.
• 8.151 --pch_verbose, --no_pch_verbose on page 8-487.
• 8.152 --pending_instantiations=n on page 8-488.
• 8.153 --phony_targets on page 8-489.
• 8.154 --pointer_alignment=num on page 8-490.
• 8.155 --preinclude=filename on page 8-491.
• 8.156 --preprocess_assembly on page 8-492.
• 8.157 --preprocessed on page 8-493.
• 8.158 --protect_stack, --no_protect_stack on page 8-494.
• 8.159 --reassociate_saturation, --no_reassociate_saturation on page 8-495.
• 8.160 --reduce_paths, --no_reduce_paths on page 8-496.
• 8.161 --relaxed_ref_def, --no_relaxed_ref_def on page 8-497.
• 8.162 --remarks on page 8-498.
• 8.163 --remove_unneeded_entities, --no_remove_unneeded_entities on page 8-499.
• 8.164 --restrict, --no_restrict on page 8-500.
• 8.165 --retain=option on page 8-501.
• 8.166 --rtti, --no_rtti on page 8-502.
• 8.167 --rtti_data, --no_rtti_data on page 8-503.
• 8.168 -S on page 8-504.
• 8.169 --share_inlineable_strings, --no_share_inlineable_strings on page 8-505.
• 8.170 --shared on page 8-507.
• 8.171 --show_cmdline on page 8-508.
• 8.172 --signed_bitfields, --unsigned_bitfields on page 8-509.
• 8.173 --signed_chars, --unsigned_chars on page 8-510.
• 8.174 --split_ldm on page 8-511.
• 8.175 --split_sections on page 8-512.
• 8.176 --strict, --no_strict on page 8-513.
• 8.177 --strict_warnings on page 8-514.
• 8.178 --string_literal_pools, --no_string_literal_pools on page 8-515.
• 8.179 --sys_include on page 8-517.
• 8.180 --thumb on page 8-518.
• 8.181 --translate_g++ on page 8-519.
• 8.182 --translate_gcc on page 8-521.
• 8.183 --translate_gld on page 8-523.
• 8.184 --trigraphs, --no_trigraphs on page 8-525.
• 8.185 --type_traits_helpers, --no_type_traits_helpers on page 8-526.
• 8.186 -Uname on page 8-527.
• 8.187 --unaligned_access, --no_unaligned_access on page 8-528.
• 8.188 --use_frame_pointer, --no_use_frame_pointer on page 8-530.
• 8.189 --use_gas on page 8-531.
• 8.190 --use_pch=filename on page 8-532.
• 8.191 --using_std, --no_using_std on page 8-533.
• 8.192 --vectorize, --no_vectorize on page 8-534.
• 8.193 --version_number on page 8-535.
• 8.194 --vfe, --no_vfe on page 8-536.
• 8.195 --via=filename on page 8-537.
• 8.196 --visibility_inlines_hidden on page 8-538.
• 8.197 --vla, --no_vla on page 8-539.
• 8.198 --vsn on page 8-540.
• 8.199 -W on page 8-541.
• 8.200 -Warmcc,option[,option,...] on page 8-542.
• 8.201 -Warmcc,--gcc_fallback on page 8-543.
• 8.202 --wchar, --no_wchar on page 8-544.
• 8.203 --wchar16 on page 8-545.
• 8.204 --wchar32 on page 8-546.
• 8.205 --whole_program on page 8-547.
• 8.206 --wrap_diagnostics, --no_wrap_diagnostics on page 8-548.

8.1 -Aopt
Specifies command-line options to pass to the assembler when it is invoked by the compiler to assemble
either .s input files or embedded assembly language functions.

Syntax
-Aopt

Where:
opt
is a command-line option to pass to the assembler.
Note
Some compiler command-line options are passed to the assembler automatically whenever it is
invoked by the compiler. For example, if the option --cpu is specified on the compiler
command line, then this option is passed to the assembler whenever it is invoked to assemble .s
files or embedded assembly code.
To see the compiler command-line options passed by the compiler to the assembler, use the
compiler command-line option -A--show_cmdline.

Restrictions
If an unsupported option is passed through using -A, an error is generated by the assembler.

Example
armcc -A--predefine="NEWVERSION SETL {TRUE}" main.c
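The following C++ sketch shows the -Aopt rule in isolation. A driver separates -Aopt arguments from the rest of the command line and strips the -A prefix before forwarding opt to the assembler. The names SplitOptions and split_assembler_options are hypothetical and this is not armcc's implementation. The real compiler also forwards some options, such as --cpu, automatically.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Result of splitting a command line: arguments kept by the compiler and
// arguments forwarded to the assembler.
struct SplitOptions {
    std::vector<std::string> compiler;
    std::vector<std::string> assembler;
};

// Hypothetical helper: an argument of the form -Aopt contributes opt to the
// assembler list; every other argument stays with the compiler.
SplitOptions split_assembler_options(const std::vector<std::string>& args)
{
    SplitOptions out;
    for (const std::string& arg : args) {
        if (arg.size() > 2 && arg.compare(0, 2, "-A") == 0)
            out.assembler.push_back(arg.substr(2));  // strip the -A prefix
        else
            out.compiler.push_back(arg);
    }
    return out;
}
```

For the example command above, a POSIX shell has already removed the quotes. The driver therefore sees the single argument -A--predefine=NEWVERSION SETL {TRUE}, and the assembler receives --predefine=NEWVERSION SETL {TRUE}.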

Related references
8.114 -Lopt on page 8-445.
8.171 --show_cmdline on page 8-508.
8.7 --arm on page 8-326.
8.28 --compatible=name on page 8-351.
8.42 --cpu=list on page 8-367.
8.43 --cpu=name on page 8-368.

8.2 --allow_fpreg_for_nonfpdata, --no_allow_fpreg_for_nonfpdata

Enables and disables the use of VFP and NEON registers and data transfer instructions for non-VFP and
non-NEON data.

Usage
--allow_fpreg_for_nonfpdata enables the compiler to use VFP and NEON registers and instructions
for data transfer operations on non-VFP and non-NEON data. This is useful when demand for integer
registers is high. For the compiler to use the VFP or NEON registers, the default options for the
processor or the specified options must enable the hardware.
--no_allow_fpreg_for_nonfpdata prevents VFP and NEON registers from being used for non-VFP
and non-NEON data. When this option is specified, the compiler uses VFP and NEON registers for VFP
and NEON data only. This is useful when you want to confine the number of places in your code where
the compiler generates VFP or NEON instructions.

Default
The default is --no_allow_fpreg_for_nonfpdata.

Related references
8.87 --fpmode=model on page 8-415.
8.88 --fpu=list on page 8-417.
8.89 --fpu=name on page 8-418.

Related information
Extension register bank mapping.
NEON views of the register bank.
VFP views of the extension register bank.

8.3 --allow_null_this, --no_allow_null_this

Allows and disallows null this pointers in C++.

Usage
Allowing null this pointers gives well-defined behavior when a nonvirtual member function is called on
a null object pointer.
Disallowing null this pointers conforms to the C++ standard and enables the compiler to perform
optimizations that assume this is never null.

Default
The default is --no_allow_null_this.

Related references
8.94 --gnu_defaults on page 8-425.

8.4 --alternative_tokens, --no_alternative_tokens

Enables and disables the recognition of alternative tokens in C and C++.

Usage
In C and C++, use this option to control recognition of digraphs, for example <: and :> as
alternative spellings of [ and ]. In C++, it also controls recognition of operator keywords such as
and (for &&) and bitand (for &).

Default
The default is --alternative_tokens.
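The following standard C++ fragment is written with alternative tokens: digraphs for brackets and braces, and the operator keywords and and bitand. With the default --alternative_tokens it would be expected to compile. With --no_alternative_tokens these spellings are not recognized, so it would be expected to fail. The function name masked_nonzero is illustrative only.

```cpp
#include <cassert>

// Returns true when enabled is set and values[index] shares a bit with mask.
// Spelled with alternative tokens:
//   <: :>  are digraphs for [ ]
//   <% %>  are digraphs for { }
//   and    is the operator keyword for &&
//   bitand is the operator keyword for &
bool masked_nonzero(const unsigned values<: :>, int index, unsigned mask, bool enabled)
<%
    return enabled and ((values<:index:> bitand mask) != 0u);
%>
```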

8.5 --anachronisms, --no_anachronisms

Enables and disables anachronisms in C++.

Mode
This option is effective only if the source language is C++.

Default
The default is --no_anachronisms.

Example
typedef enum { red, white, blue } tricolor;
inline tricolor operator++(tricolor c, int)
{
    int i = static_cast<int>(c) + 1;
    return static_cast<tricolor>(i);
}
void foo(void)
{
    tricolor c = red;
    c++; // okay
    ++c; // anachronism
}

Compiling this code with the option --anachronisms generates a warning message.
Compiling this code without the option --anachronisms generates an error message.
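To compile the example without --anachronisms, declare a prefix operator++ alongside the postfix one, so that ++c no longer relies on the anachronistic use of the postfix form. In the sketch below, both operators modify c in place, which is the usual convention. The original example does not do this. The wrap-around from blue back to red is an assumption made for illustration and is not part of the original example.

```cpp
#include <cassert>

typedef enum { red, white, blue } tricolor;

// Prefix form: advances c in place (blue wraps round to red) and returns it.
inline tricolor& operator++(tricolor& c)
{
    c = static_cast<tricolor>((static_cast<int>(c) + 1) % 3);
    return c;
}

// Postfix form: advances c in place but returns the value it had before.
inline tricolor operator++(tricolor& c, int)
{
    tricolor old = c;
    ++c;  // uses the prefix operator above
    return old;
}

void foo(void)
{
    tricolor c = red;
    c++;  // okay: postfix operator
    ++c;  // okay: a prefix operator is now declared, so no anachronism
}
```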

Related references
8.39 --cpp on page 8-363.
8.176 --strict, --no_strict on page 8-513.
8.177 --strict_warnings on page 8-514.
11.8 Anachronisms in ARM C++ on page 11-808.

8.6 --apcs=qualifier...qualifier
Controls interworking and position independence when generating code.
By specifying qualifiers to the --apcs command-line option, you can define the variant of the Procedure
Call Standard for the ARM architecture (AAPCS) used by the compiler.

Syntax
--apcs=qualifier...qualifier

Where qualifier...qualifier denotes a list of qualifiers. There must be:

• At least one qualifier present.
• No spaces separating individual qualifiers in the list.
Each instance of qualifier must be one of:
/interwork
/nointerwork
Generates code with or without ARM/Thumb interworking support. The default
is /nointerwork, except for ARMv5T and later where the default is /interwork.
/ropi
/noropi
Enables or disables the generation of Read-Only Position-Independent (ROPI) code. The default
is /noropi.
/[no]pic is an alias for /[no]ropi.

/rwpi
/norwpi
Enables or disables the generation of Read/Write Position-Independent (RWPI) code. The
default is /norwpi.
/[no]pid is an alias for /[no]rwpi.

/fpic
/nofpic
Enables or disables the generation of read-only position-independent code where relative
address references are independent of the location where your program is loaded.
/hardfp
/softfp
Requests hardware or software floating-point linkage. This enables the procedure call standard
to be specified separately from the version of the floating-point hardware available through the
--fpu option. It is still possible to specify the procedure call standard by using the --fpu option,
but ARM recommends that you use --apcs instead.

Note
The / prefix is optional for the first qualifier, but must be present to separate subsequent qualifiers in the
same --apcs option. For example, --apcs=/nointerwork/noropi/norwpi is equivalent to
--apcs=nointerwork/noropi/norwpi.

You can specify multiple qualifiers using either a single --apcs option or multiple --apcs options. For
example, --apcs=/nointerwork/noropi/norwpi is equivalent to --apcs=/nointerwork
--apcs=noropi/norwpi.
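The qualifier syntax can be modelled as follows. Skip an optional leading /, split each --apcs value on /, and concatenate the results across options. The C++ sketch below does this. parse_apcs is a hypothetical helper written for illustration. It does not validate qualifier names or detect conflicting pairs such as /interwork and /nointerwork.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: collects the qualifiers from one or more --apcs values.
// For the two options --apcs=/nointerwork --apcs=noropi/norwpi, pass
// {"/nointerwork", "noropi/norwpi"}. Qualifier names are not validated.
std::vector<std::string> parse_apcs(const std::vector<std::string>& values)
{
    std::vector<std::string> qualifiers;
    for (const std::string& value : values) {
        // Qualifiers must not be separated by spaces.
        assert(value.find(' ') == std::string::npos);

        // The '/' before the first qualifier is optional, so skip it if present.
        std::string::size_type pos = (!value.empty() && value[0] == '/') ? 1 : 0;
        for (;;) {
            std::string::size_type slash = value.find('/', pos);
            std::string::size_type len =
                (slash == std::string::npos) ? std::string::npos : slash - pos;
            std::string qualifier = value.substr(pos, len);
            if (!qualifier.empty())
                qualifiers.push_back(qualifier);
            if (slash == std::string::npos)
                break;
            pos = slash + 1;  // continue after the separating '/'
        }
    }
    return qualifiers;
}
```

All three spellings used in the examples above yield the same list: nointerwork, noropi, norwpi.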

Default
If you do not specify an --apcs option, the compiler assumes
--apcs=/nointerwork/noropi/norwpi/nofpic, except that /interwork is assumed when the target is
ARMv5T or later.

Usage
/interwork
/nointerwork
By default, code is generated:
• Without interworking support, that is /nointerwork, unless you specify a --cpu option that
corresponds to architecture ARMv5T or later.
• With interworking support, that is /interwork, on ARMv5T and later. ARMv5T and later
architectures provide direct support for interworking through instructions such as BLX and
instructions that load to the program counter.
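The architecture-dependent interwork default can be expressed as a small function. This C++ sketch assumes the target architecture is reduced to a major version number and a flag for the Thumb (T) variant. That is a simplification, because armcc derives the architecture from --cpu. The names Arch, is_v5t_or_later, and default_apcs are hypothetical. The sketch also ignores /hardfp and /softfp, which depend on the floating-point options.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified description of a target architecture:
// the major version (4 for ARMv4T, 5 for ARMv5TE, 7 for ARMv7-A, ...)
// and whether the architecture includes Thumb ("T").
struct Arch {
    int major;
    bool thumb;
};

// True for ARMv5T and later, which provide BLX and interworking loads to the PC.
bool is_v5t_or_later(const Arch& a)
{
    return a.major > 5 || (a.major == 5 && a.thumb);
}

// Qualifier list assumed when no --apcs option is given.
std::string default_apcs(const Arch& a)
{
    std::string interwork = is_v5t_or_later(a) ? "/interwork" : "/nointerwork";
    return interwork + "/noropi/norwpi/nofpic";
}
```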
/ropi
/noropi
If you select the /ropi qualifier to generate ROPI code, the compiler:
• Addresses read-only code and data PC-relative.
• Sets the Position Independent (PI) attribute on read-only output sections.

Note
--apcs=/ropi is not supported when compiling C++.

/rwpi
/norwpi
If you select the /rwpi qualifier to generate RWPI code, the compiler:
• Addresses writable data using offsets from the static base register sb. This means that:
— The base address of the RW data region can be fixed at runtime.
— Data can have multiple instances.
— Data can be, but does not have to be, position-independent.
• Sets the PI attribute on read/write output sections.

Note
Because the --lower_rwpi option is the default, code that is not RWPI is automatically
transformed into equivalent code that is RWPI. For example, a static initializer that takes the
address of RW data cannot be resolved at link time. That initialization is therefore done at
runtime by the C++ constructor mechanism, even for C.
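The following C++ code is a rough model of why sb-relative addressing allows multiple instances of the RW data. If every access to writable data is expressed as a base plus a fixed offset, the same code can operate on a different copy of the data when it is given a different base. In this model an explicit pointer parameter stands in for the sb register. Under /rwpi the compiler does this implicitly. The names RwData and bump_counter are hypothetical.

```cpp
#include <cassert>

// Stand-in for one copy of a module's RW data region. The offset of each
// variable within the region is fixed; only the base (sb) varies.
struct RwData {
    int counter;
    int last_value;
};

// Models compiled code: every access to writable data goes through sb + offset.
// Here 'sb' is an explicit parameter; with /rwpi the compiler uses the sb register.
int bump_counter(RwData* sb, int value)
{
    sb->last_value = value;  // store at sb + offsetof(RwData, last_value)
    return ++sb->counter;    // load and store at sb + offsetof(RwData, counter)
}
```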

/fpic
/nofpic
If you select /fpic, the compiler:
• Accesses all static data using PC-relative addressing.
• Accesses all imported or exported read-write data using a Global Offset Table (GOT) entry
created by the linker.
• Accesses all read-only data relative to the PC.
You must compile your code with /fpic if it uses shared objects. This is because relative
addressing is only implemented when your code makes use of System V shared libraries.
You do not have to compile with /fpic if you are building either a static image or static library.
The use of /fpic is supported when compiling C++. In this case, virtual function tables and
typeinfo are placed in read-write areas so that they can be accessed relative to the location of
the PC.
Note
When building a System V or ARM Linux shared library, use --apcs=/fpic together with
--no_hide_all.

/hardfp
If you use /hardfp, the compiler generates code for hardware floating-point linkage. Hardware
floating-point linkage uses the FPU registers to pass the arguments and return values.

The /hardfp and /softfp qualifiers are mutually exclusive.

/hardfp interacts with or overrides explicit or implicit use of --fpu as follows:
• If floating-point support is not permitted (for example, because --fpu=none is specified, or
because it is disabled by other means), /hardfp is ignored.
• If floating-point support is permitted, but without floating-point hardware
(--fpu=softvfp), /hardfp gives an error.
• If floating-point hardware is available and the hardfp calling convention is used
(--fpu=vfp...), /hardfp is ignored.
• If floating-point hardware is available and the softfp calling convention is used
(--fpu=softvfp+vfp...), /hardfp gives an error.
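The four /hardfp rules above can be read as a decision table. The C++ sketch below encodes them directly. The FpuConfig enumeration is a hypothetical grouping of --fpu settings into the four cases the text distinguishes. It is not an armcc data type.

```cpp
#include <cassert>

// Hypothetical grouping of --fpu settings into the four cases listed above.
enum class FpuConfig {
    NotPermitted,    // for example --fpu=none
    SoftwareOnly,    // --fpu=softvfp
    HardwareHardfp,  // --fpu=vfp...: hardware present, hardfp calling convention
    HardwareSoftfp   // --fpu=softvfp+vfp...: hardware present, softfp calling convention
};

enum class HardfpOutcome { Ignored, Error };

// Effect of adding /hardfp to --apcs for a given floating-point configuration.
HardfpOutcome apply_hardfp(FpuConfig fpu)
{
    switch (fpu) {
    case FpuConfig::NotPermitted:   return HardfpOutcome::Ignored;
    case FpuConfig::SoftwareOnly:   return HardfpOutcome::Error;
    case FpuConfig::HardwareHardfp: return HardfpOutcome::Ignored;
    case FpuConfig::HardwareSoftfp: return HardfpOutcome::Error;
    }
    assert(false && "unknown FpuConfig value");
    return HardfpOutcome::Error;
}
```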
/softfp
If you use /softfp, software floating-point linkage is used. Software floating-point linkage
means that the parameters and return value for a function are passed using the ARM integer
registers r0 to r3 and the stack.
/softfp interacts with or overrides explicit or implicit use of --fpu as follows: