Version 5.06u3
ARM® Compiler
armcc User Guide
Copyright © 2010-2016 ARM. All rights reserved.
Release Information
Document History
Your access to the information in this document is conditional upon your acceptance that you will not use or permit others to use
the information for the purposes of determining whether implementations infringe any third party patents.
THIS DOCUMENT IS PROVIDED “AS IS”. ARM PROVIDES NO REPRESENTATIONS AND NO WARRANTIES,
EXPRESS, IMPLIED OR STATUTORY, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY, SATISFACTORY QUALITY, NON-INFRINGEMENT OR FITNESS FOR A PARTICULAR PURPOSE
WITH RESPECT TO THE DOCUMENT. For the avoidance of doubt, ARM makes no representation with respect to, and has
undertaken no analysis to identify or understand the scope and content of, third party patents, copyrights, trade secrets, or other
rights.
TO THE EXTENT NOT PROHIBITED BY LAW, IN NO EVENT WILL ARM BE LIABLE FOR ANY DAMAGES,
INCLUDING WITHOUT LIMITATION ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, PUNITIVE, OR
CONSEQUENTIAL DAMAGES, HOWEVER CAUSED AND REGARDLESS OF THE THEORY OF LIABILITY, ARISING
OUT OF ANY USE OF THIS DOCUMENT, EVEN IF ARM HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH
DAMAGES.
This document consists solely of commercial items. You shall be responsible for ensuring that any use, duplication or disclosure of
this document complies fully with any relevant export laws and regulations to assure that this document or any portion thereof is
not exported, directly or indirectly, in violation of such export laws. Use of the word “partner” in reference to ARM’s customers is
not intended to create or refer to any partnership relationship with any other company. ARM may make changes to this document at
any time and without notice.
If any of the provisions contained in these terms conflict with any of the provisions of any signed written agreement covering this
document with ARM, then the signed written agreement prevails over and supersedes the conflicting provisions of these terms.
This document may be translated into other languages for convenience, and you agree that if there is any conflict between the
English version of this document and any translation, the terms of the English version of the Agreement shall prevail.
Words and logos marked with ® or ™ are registered trademarks or trademarks of ARM Limited or its affiliates in the EU and/or
elsewhere. All rights reserved. Other brands and names mentioned in this document may be the trademarks of their respective
owners. Please follow ARM’s trademark usage guidelines at http://www.arm.com/about/trademark-usage-guidelines.php
LES-PRE-20349
Additional Notices
Some material in this document is based on IEEE 754-1985 IEEE Standard for Binary Floating-Point Arithmetic. The IEEE
disclaims any responsibility or liability resulting from the placement and use in the described manner.
Confidentiality Status
This document is Non-Confidential. The right to use, copy and disclose this document may be subject to license restrictions in
accordance with the terms of the agreement entered into by ARM and the party that ARM delivered this document to.
Preface
About this book ..................................................... 26
Chapter 7 Using the Inline and Embedded Assemblers of the ARM Compiler
7.1 Compiler support for inline assembly language .......................................... 7-264
7.2 Inline assembler support in the compiler .................................................... 7-265
7.3 Restrictions on inline assembler support in the compiler ............................ 7-266
7.4 Inline assembly language syntax with the __asm keyword in C and C++ .... 7-267
7.5 Inline assembly language syntax with the asm keyword in C++ ................. 7-268
7.6 Inline assembler rules for compiler keywords __asm and asm ................... 7-269
7.7 Restrictions on inline assembly operations in C and C++ code .................. 7-270
7.8 Inline assembler register restrictions in C and C++ code ............................ 7-271
7.9 Inline assembler processor mode restrictions in C and C++ code .............. 7-272
7.10 Inline assembler Thumb instruction set restrictions in C and C++ code ...... 7-273
7.11 Inline assembler Vector Floating-Point (VFP) restrictions in C and C++ code ... 7-274
7.12 Inline assembler instruction restrictions in C and C++ code ........................ 7-275
7.13 Miscellaneous inline assembler restrictions in C and C++ code .................. 7-276
7.14 Inline assembler and register access in C and C++ code ........................... 7-277
7.15 Inline assembler and the # constant expression specifier in C and C++ code ... 7-279
7.16 Inline assembler and instruction expansion in C and C++ code .................. 7-280
7.17 Expansion of inline assembler instructions that use constants .................... 7-281
7.18 Expansion of inline assembler load and store instructions .......................... 7-282
7.19 Inline assembler effect on processor condition flags in C and C++ code .... 7-283
7.20 Inline assembler expression operands in C and C++ code ......................... 7-284
7.21 Inline assembler register list operands in C and C++ code ......................... 7-285
7.22 Inline assembler intermediate operands in C and C++ code ....................... 7-286
7.23 Inline assembler function calls and branches in C and C++ code ............... 7-287
7.24 Inline assembler branches and labels in C and C++ code .......................... 7-289
7.25 Inline assembler and virtual registers ........................................................... 7-290
7.26 Embedded assembler support in the compiler ............................................. 7-291
7.27 Embedded assembler syntax in C and C++ ................................................. 7-292
7.28 Effect of compiler ARM and Thumb states on embedded assembler .......... 7-293
7.29 Restrictions on embedded assembly language functions in C and C++ code ... 7-294
7.30 Compiler generation of embedded assembly language functions ............... 7-295
Glossary
The ARM Glossary is a list of terms used in ARM documentation, together with definitions for those
terms. The ARM Glossary does not contain terms that are industry standard unless the ARM meaning
differs from the generally accepted meaning.
See the ARM Glossary for more information.
Typographic conventions
italic
Introduces special terminology, denotes cross-references, and citations.
bold
Highlights interface elements, such as menu names. Denotes signal names. Also used for terms
in descriptive lists, where appropriate.
monospace
Denotes text that you can enter at the keyboard, such as commands, file and program names,
and source code.
monospace underline
Denotes a permitted abbreviation for a command or option. You can enter the underlined text
instead of the full command or option name.
monospace italic
Denotes arguments to monospace text where the argument is to be replaced by a specific value.
monospace bold
Denotes language keywords when used outside example code.
< and >
Encloses replaceable terms for assembler syntax where they appear in code or code fragments.
For example:
MRC p15, 0, <Rd>, <CRn>, <CRm>, <Opcode_2>
SMALL CAPITALS
Used in body text for a few terms that have specific technical meanings and that are defined in the
ARM Glossary. For example, IMPLEMENTATION DEFINED, IMPLEMENTATION SPECIFIC, UNKNOWN, and
UNPREDICTABLE.
Feedback
Feedback on content
If you have comments on content then send an e-mail to errata@arm.com. Give:
• The title ARM® Compiler armcc User Guide.
• The number ARM DUI0472M.
• If applicable, the page number(s) to which your comments refer.
• A concise explanation of your comments.
ARM also welcomes general suggestions for additions and improvements.
Note
ARM tests the PDF only in Adobe Acrobat and Acrobat Reader, and cannot guarantee the quality of the
represented document when used with any other PDF reader.
Other information
• ARM Information Center.
• ARM Technical Support Knowledge Articles.
• Support and Maintenance.
• ARM Glossary.
Gives an overview of the ARM compiler, the languages and extensions it supports, and the provided
libraries.
It contains the following sections:
• 1.1 The compiler on page 1-30.
• 1.2 Source language modes of the compiler on page 1-31.
• 1.3 Language extensions on page 1-33.
• 1.4 Language compliance on page 1-34.
• 1.5 The C and C++ libraries on page 1-35.
Note
The command-line option descriptions and related information in the individual ARM Compiler tools
documents describe all the features that ARM Compiler supports. Any features not documented are not
supported and are used at your own risk. You are responsible for making sure that any generated code
using unsupported features is operating correctly.
Related concepts
3.1 NEON technology on page 3-69.
Related information
The DWARF Debugging Standard, http://dwarfstd.org/.
Application Binary Interface (ABI) for the ARM Architecture.
C
Means any of C90, strict C90, C99, strict C99, and Standard C.
C++03
Means ISO C++03, excepting export templates, either with or without the ARM extensions.
Use the compiler option --cpp to compile C++03 code.
Use the compiler options --cpp --cpp_compat to maximize binary compatibility with C++03
code compiled using older compiler versions.
Strict C++03
Means ISO C++03, excepting export templates.
Use the compiler options --cpp --strict to compile strict C++03 code.
C++11
Means ISO C++11, either with or without the ARM extensions.
Use the compiler option --cpp11 to compile C++11 code.
Use the compiler options --cpp11 --cpp_compat to compile a subset of C++11 code that
maximizes compatibility with code compiled to the C++ 2003 standard.
Strict C++11
Means ISO C++11.
Use the compiler options --cpp11 --strict to compile strict C++11 code.
Standard C++
Means strict C++03 or strict C++11 as appropriate.
C++
Means any of C++03, strict C++03, C++11, strict C++11.
Related concepts
5.59 New language features of C99 on page 5-228.
5.64 Hexadecimal floating-point numbers in C99 on page 5-234.
Related references
1.3 Language extensions on page 1-33.
1.4 Language compliance on page 1-34.
1.3 Language extensions on page 1-33.
1.4 Language compliance on page 1-34.
11.13 C++11 supported features on page 11-814.
15.1 Implementation definition on page 15-892.
16.4 Standard C++ library implementation definition on page 16-916.
Related references
9.6 C99 language features available in C90 on page 9-556.
9.10 C99 language features available in C++ and C90 on page 9-560.
9.15 Standard C language extensions on page 9-565.
9.24 Standard C++ language extensions on page 9-574.
9.32 Standard C and Standard C++ language extensions on page 9-582.
1.4 Language compliance on page 1-34.
9.45 GNU extensions to the C and C++ languages on page 9-595.
Chapter 14 Summary Table of GNU Language Extensions on page 14-887.
Examples
The following examples illustrate combining source language modes with language compliance modes:
• Compiling a .cpp file with the command-line option --strict compiles Standard C++03.
• Compiling a C source file with the command-line option --gnu compiles GNU mode C90.
• Compiling a .c file with the command-line options --strict and --gnu is an error.
Related references
8.93 --gnu on page 8-424.
8.176 --strict, --no_strict on page 8-513.
9.45 GNU extensions to the C and C++ languages on page 9-595.
2.7 Filename suffixes recognized by the compiler on page 2-49.
Chapter 14 Summary Table of GNU Language Extensions on page 14-887.
Related information
ARM DS-5 License Management Guide.
Application Binary Interface (ABI) for the ARM Architecture.
Compliance with the Application Binary Interface (ABI) for the ARM architecture.
The ARM C and C++ Libraries.
where:
options
are compiler command-line options that affect the behavior of the compiler.
source
provides the filenames of one or more text files containing C or C++ source code. By default,
the compiler looks for source files and creates output files in the current directory.
If a source file is an assembly file, that is, one with an extension of .s, the compiler activates the
ARM assembler to process the source file.
When you invoke the compiler, you normally specify one or more source files. However, a
minority of compiler command-line options do not require you to specify a source file. For
example, armcc --version_number.
The compiler accepts one or more input files, for example:
armcc -c [options] input_file_1 ... input_file_n
Specifying a dash - for an input file causes the compiler to read from stdin. To specify that all
subsequent arguments are treated as filenames, not as command switches, use the POSIX option --.
The -c option instructs the compiler to perform the compilation step, but not the link step.
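As an illustration (the filenames and options here are examples, not taken from the original text), a typical
compile-then-link sequence might be:
armcc -c --cpu=Cortex-A8 -O2 main.c -o main.o
armlink main.o -o image.axf
The -c option stops armcc after compilation, and armlink then links the resulting object file into an image.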
Related concepts
2.2 Compiler command-line options listed by group on page 2-38.
Related references
8.21 -c on page 8-344.
Related information
Rules for specifying command-line options.
Toolchain environment variables.
Note
The following characters are interchangeable:
• Nonprefix hyphens and underscores. For example, --version_number and --version-number.
• Equals signs and spaces. For example, armcc --cpu=list and armcc --cpu list.
This applies to all tools provided with the compiler.
Source languages
• --c90
• --c99
• --compile_all_input, --no_compile_all_input
• --cpp
• --cpp11
• --cpp_compat
• --gnu
• --strict, --no_strict
• --strict_warnings
Search paths
• -Idir[,dir,...]
• -Jdir[,dir,...]
• --kandr_include
• --preinclude=filename
• --reduce_paths, --no_reduce_paths
• --sys_include
• --ignore_missing_headers
Precompiled headers
• --create_pch=filename
• --pch
• --pch_dir=dir
• --pch_messages, --no_pch_messages
• --pch_verbose, --no_pch_verbose
• --use_pch=filename
Preprocessor
• -C
• --code_gen, --no_code_gen
• -Dname[(parm-list)][=def]
• -E
• -M
• --old_style_preprocessing
• -P
• --preprocess_assembly
• --preprocessed
• -Uname
C++
• --allow_null_this
• --anachronisms, --no_anachronisms
• --dep_name, --no_dep_name
• --export_all_vtbl, --no_export_all_vtbl
• --force_new_nothrow, --no_force_new_nothrow
• --friend_injection, --no_friend_injection
• --guiding_decls, --no_guiding_decls
• --implicit_include, --no_implicit_include
• --implicit_include_searches, --no_implicit_include_searches
• --implicit_typename, --no_implicit_typename
• --nonstd_qualifier_deduction, --no_nonstd_qualifier_deduction
• --old_specializations, --no_old_specializations
• --parse_templates, --no_parse_templates
• --pending_instantiations=n
• --rtti, --no_rtti
• --rtti_data
• --type_traits_helpers
• --using_std, --no_using_std
• --vfe, --no_vfe
Output format
• --asm
• --asm_dir
• -c
• --default_extension=ext
• --depend=filename
• --depend_dir
• --depend_format=string
• --depend_single_line
• --depend_system_headers, --no_depend_system_headers
• --depend_target
• --errors
• --info=totals
• --interleave
• --list
• --list_dir
• --list_macros
• --md
• --mm
• -o filename
• --output_dir
• --phony_targets
• -S
• --split_sections
Target architectures and processors
• --arm
• --arm_only
• --compatible=name
• --cpu=list
• --cpu=name
• --fpu=list
• --fpu=name
• --thumb
Floating-point support
• --fp16_format=format
• --fpmode=model
• --fpu=list
• --fpu=name
Debug
• --debug, --no_debug
• --debug_macros, --no_debug_macros
• --dwarf2
• --dwarf3
• -g
• --remove_unneeded_entities, --no_remove_unneeded_entities
• --emit_frame_directives
Code generation
• --allow_fpreg_for_nonfpdata, --no_allow_fpreg_for_nonfpdata
• --alternative_tokens, --no_alternative_tokens
• --bigend
• --bitband
• --branch_tables
• --bss_threshold=num
• --conditionalize, --no_conditionalize
• --default_definition_visibility
• --dllexport_all, --no_dllexport_all
• --dllimport_runtime, --no_dllimport_runtime
• --dollar, --no_dollar
• --enum_is_int
• --exceptions, --no_exceptions
• --exceptions_unwind, --no_exceptions_unwind
• --execute_only
• --float_literal_pools
• --export_all_vtbl, --no_export_all_vtbl
• --export_defs_implicitly, --no_export_defs_implicitly
• --extended_initializers, --no_extended_initializers
• --global_reg
• --gnu_defaults
• --gnu_instrument
• --gnu_version
• --hide_all, --no_hide_all
• --implicit_key_function
• --import_all_vtbl
• --integer_literal_pools
• --interface_enums_are_32_bit
• --littleend
• --locale=lang_country
• --long_long
• --loose_implicit_cast
• --message_locale=lang_country[.codepage]
• --min_array_alignment=opt
• --multibyte_chars, --no_multibyte_chars
• --multiply_latency
• --narrow_volatile_bitfields
• --pointer_alignment=num
• --protect_stack, --no_protect_stack
• --restrict, --no_restrict
• --relaxed_ref_def
• --share_inlineable_strings
• --signed_bitfields, --unsigned_bitfields
• --signed_chars, --unsigned_chars
• --split_ldm
• --string_literal_pools
• --trigraphs
• --unaligned_access, --no_unaligned_access
• --use_frame_pointer
• --vectorize, --no_vectorize
• --visibility_inlines_hidden
• --vla, --no_vla
• --wchar
• --wchar16
• --wchar32
Optimization
• --autoinline, --no_autoinline
• --data_reorder, --no_data_reorder
• --forceinline
• --fpmode=model
• --inline, --no_inline
• --library_interface=lib
• --library_type=lib
• --loop_optimization_level=opt
• --lower_ropi, --no_lower_ropi
• --lower_rwpi, --no_lower_rwpi
• --multifile, --no_multifile
• -Onum
• -Ospace
• -Otime
• --reassociate_saturation
• --retain=option
• --whole_program
Note
Optimization options can limit the debug information generated by the compiler.
Diagnostics
• --brief_diagnostics, --no_brief_diagnostics
• --diag_error=tag[,tag,...]
• --diag_remark=tag[,tag,...]
• --diag_style={arm|ide|gnu}
• --diag_suppress=tag[,tag,...]
• --diag_suppress=optimizations
• --diag_warning=tag[,tag,...]
• --diag_warning=optimizations
• --errors=filename
• --link_all_input
• --remarks
• -W
• --wrap_diagnostics, --no_wrap_diagnostics
Command-line options in a text file
• --via=filename
Linker feedback
• --feedback=filename
Procedure call standard
• --apcs=qualifier...qualifier
Passing options to other tools
• -Aopt
• -Lopt
ARM Linux
• --arm_linux
• --arm_linux_configure
• --arm_linux_config_file=path
• --arm_linux_paths
• --configure_gas
• --configure_gcc=path
• --configure_gcc_version
• --configure_gld=path
• --configure_sysroot=path
• --configure_cpp_headers=path
• --configure_extra_includes=paths
• --configure_extra_libraries=paths
• --execstack
• --shared
• --translate_g++
• --translate_gcc
• --translate_gld
• --use_gas
• -Warmcc,option[,option,...]
• -Warmcc,--gcc_fallback
Related concepts
2.4 Order of compiler command-line options on page 2-45.
Related references
Chapter 8 Compiler Command-line Options on page 8-312.
If you specify files with conflicting file extensions you can force the compiler to compile both files for C
or for C++, regardless of file extension. For example:
armcc -c --cpp test1.c test2.cpp
Where an unrecognized extension begins with .c, for example, filename.cmd, an error message is
generated.
Support for processing Precompiled Header (PCH) files is not available when you specify multiple
source files in a single compilation. If you request PCH processing and specify more than one primary
source file, the compiler issues an error message, and aborts the compilation.
Note
Support for Precompiled Header (PCH) files is deprecated from ARM Compiler 5.05 onwards on all
platforms. Note that ARM Compiler on Windows 8 never supported PCH files.
armcc can in turn invoke armasm and armlink. For example, if your source code contains embedded
assembly code, armasm is called. armcc searches for the armasm and armlink binaries in the following
locations, in this order:
1. The same location as armcc.
2. The PATH locations.
Related concepts
4.24 Precompiled Header (PCH) files on page 4-134.
2.4 Order of compiler command-line options on page 2-45.
2.9 Factors influencing how the compiler searches for header files on page 2-52.
2.11 Compiler search rules and the current place on page 2-54.
2.12 The ARMCC5INC environment variable on page 2-55.
2.2 Compiler command-line options listed by group on page 2-38.
2.1 Compiler command-line syntax on page 2-37.
Related tasks
2.5 Using stdin to input source code to the compiler on page 2-46.
Related references
2.7 Filename suffixes recognized by the compiler on page 2-49.
2.8 Compiler output files on page 2-51.
You can use the environment variable ARMCC5_CCOPT to specify compiler command-line options. Options
specified on the command line take precedence over options specified in the environment variable.
To see how the compiler has processed the command line, use the --show_cmdline option. This shows
nondefault options that the compiler used. The contents of any via files are expanded. In the example
used here, although the compiler executes armcc -O2 -Otime, the output from --show_cmdline does
not include -O2. This is because -O2 is the default optimization level, and --show_cmdline does not
show options that apply by default.
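For illustration only (the filename test.c and the Unix-like shell syntax are assumptions, not from the
original text):
export ARMCC5_CCOPT=-Otime
armcc -c -O2 --show_cmdline test.c
Because -O2 is the default optimization level, the command line reported by --show_cmdline includes
-Otime but not -O2.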
Related concepts
2.2 Compiler command-line options listed by group on page 2-38.
Procedure
1. Invoke the compiler with the command-line options you want to use. The default compiler mode is C.
Use the minus character (-) as the source filename to instruct the compiler to take input from stdin.
For example:
armcc --bigend -c -
If you want an object file to be written, use the -o option. If you want preprocessor output to be sent
to the output stream, use the -E option. If you want the output to be sent to stdout, use the -o-
option. If you want an assembly listing of the keyboard input to be sent to the output stream after
input has been terminated, use none of these options.
2. You cannot enter source code on the same line as the minus character. Press the Return key if you
have not already done so.
The command prompt waits for you to enter more input.
3. Enter your input. For example:
#include <stdio.h>
int main(void)
{ printf("Hello world\n"); }
You can only combine standard input with other source files when you are linking code. If you attempt to
combine standard input with other source files when not linking, the compiler generates an error.
Related concepts
2.1 Compiler command-line syntax on page 2-37.
2.2 Compiler command-line options listed by group on page 2-38.
Related information
Rules for specifying command-line options.
Toolchain environment variables.
Related concepts
2.1 Compiler command-line syntax on page 2-37.
2.2 Compiler command-line options listed by group on page 2-38.
Related information
Rules for specifying command-line options.
Toolchain environment variables.
.C
C or C++ source file. On UNIX platforms, implies --cpp. On non-UNIX platforms, implies --c90.
.d
Dependency list file. .d is the default output filename suffix for files output using the --md option.
.pch
Precompiled header file. .pch is the default output filename suffix for files output using the --pch option.
Note
Support for Precompiled Header (PCH) files is deprecated from ARM Compiler 5.05
onwards on all platforms. Note that ARM Compiler on Windows 8 never supported PCH
files.
.s
ARM, Thumb, or mixed ARM and Thumb assembly language source file. For files in the input file list
suffixed with .s, the compiler invokes the assembler, armasm, to assemble the file. .s is the default output
filename suffix for files output using either the -S or --asm option.
.S
ARM, Thumb, or mixed ARM and Thumb assembly language source file. On UNIX platforms, for files in
the input file list suffixed with .S, the compiler preprocesses the assembly source before passing that source
to the assembler. On non-UNIX platforms, .S is equivalent to .s. That is, preprocessing is not performed.
.sx
ARM, Thumb, or mixed ARM and Thumb assembly language source file. For files in the input file list
suffixed with .sx, the compiler preprocesses the assembly source before passing that source to the assembler.
.txt
Text file. .txt is the default output filename suffix for files output using the -S or --asm option in
combination with the --interleave option.
Related references
8.7 --arm on page 8-326.
8.111 --interleave on page 8-442.
8.118 --list on page 8-450.
8.129 --md on page 8-462.
8.148 --pch on page 8-484.
8.168 -S on page 8-504.
8.29 --compile_all_input, --no_compile_all_input on page 8-353.
11.9 Template instantiation in ARM C++ on page 11-809.
Related information
ELF for the ARM Architecture.
2.9 Factors influencing how the compiler searches for header files
Several factors influence how the compiler searches for #include header files and source files.
• The value of the environment variable ARMCC5INC.
• The value of the environment variable ARMINC.
• The -I and -J compiler options.
• The --kandr_include and --sys_include compiler options.
• Whether the filename is an absolute filename or a relative filename.
• Whether the filename is between angle brackets or double quotes.
Related concepts
2.12 The ARMCC5INC environment variable on page 2-55.
2.11 Compiler search rules and the current place on page 2-54.
Related references
2.10 Compiler command-line options and search paths on page 2-53.
8.100 -Idir[,dir,...] on page 8-431.
8.112 -Jdir[,dir,...] on page 8-443.
8.113 --kandr_include on page 8-444.
8.179 --sys_include on page 8-517.
Related information
Toolchain environment variables.
-Jdir[,dir,...]
Filenames in angle brackets: The directory or directories specified by -Jdir[,dir,...].
Filenames in double quotes: 1. The current place. 2. The directory or directories specified by
-Jdir[,dir,...].
Both -Idir[,dir,...] and -Jdir[,dir,...]
Filenames in angle brackets: 1. The directory or directories specified by -Jdir[,dir,...]. 2. The directory
or directories specified by -Idir[,dir,...].
Filenames in double quotes: 1. The current place. 2. The directory or directories specified by
-Idir[,dir,...]. 3. The directory or directories specified by -Jdir[,dir,...].
--sys_include
Filenames in angle brackets: No effect.
Filenames in double quotes: Removes the current place from the search path.
--kandr_include
Filenames in angle brackets: No effect.
Filenames in double quotes: Uses Kernighan and Ritchie search rules.
Related concepts
2.12 The ARMCC5INC environment variable on page 2-55.
2.11 Compiler search rules and the current place on page 2-54.
2.9 Factors influencing how the compiler searches for header files on page 2-52.
Related references
8.100 -Idir[,dir,...] on page 8-431.
8.112 -Jdir[,dir,...] on page 8-443.
8.113 --kandr_include on page 8-444.
8.179 --sys_include on page 8-517.
You can disable the stacking of current places by using the compiler option --kandr_include. This
option makes the compiler use Kernighan and Ritchie search rules whereby each nonrooted user
#include is searched for relative to the directory containing the source file that is being compiled.
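As a sketch of the difference (all of the file and directory names here are hypothetical): suppose src/main.c
contains
#include "inc/a.h"
and src/inc/a.h itself contains
#include "b.h"
With the default stacked rules, b.h is first searched for in src/inc/, the current place while a.h is being
processed. With --kandr_include, b.h is instead searched for relative to src/, the directory containing the
source file being compiled.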
Related concepts
2.12 The ARMCC5INC environment variable on page 2-55.
2.9 Factors influencing how the compiler searches for header files on page 2-52.
Related references
2.10 Compiler command-line options and search paths on page 2-53.
8.100 -Idir[,dir,...] on page 8-431.
8.112 -Jdir[,dir,...] on page 8-443.
8.113 --kandr_include on page 8-444.
8.179 --sys_include on page 8-517.
Related concepts
2.11 Compiler search rules and the current place on page 2-54.
2.9 Factors influencing how the compiler searches for header files on page 2-52.
Related references
2.10 Compiler command-line options and search paths on page 2-53.
8.100 -Idir[,dir,...] on page 8-431.
8.112 -Jdir[,dir,...] on page 8-443.
8.113 --kandr_include on page 8-444.
8.179 --sys_include on page 8-517.
Related information
Toolchain environment variables.
Related references
8.6 --apcs=qualifier...qualifier on page 8-322.
Related information
Procedure Call Standard for the ARM Architecture.
ARM C libraries and multithreading.
BPABI and SysV Shared Libraries and Executables.
Interworking ARM and Thumb.
The whole build is still driven by the build script, makefile, or other infrastructure you are using, and that
does not change. For example, if a single compile step fails when armcc tries to compile it, armcc then
attempts to perform that compile step with gcc. If a link step fails, armcc attempts to
perform the link with the GCC toolchain, using GNU ld. When armcc performs a compile or link step,
the include paths, library paths, and Linux libraries it uses are identified in the ARM Linux configuration
file. For fallback, you must either:
• Use the --arm_linux_config_file compiler option to produce the configuration file by configuring
armcc against an existing gcc.
• Provide an explicit path to gcc if you are specifying other configuration options manually.
The GCC toolchain used for fallback is the one that the configuration was created against. Therefore, the
paths and libraries used by armcc and gcc must be equivalent.
If armcc invokes GCC fallback, a warning message is displayed. If gcc also fails, an additional error is
displayed; otherwise, you get a message indicating that gcc succeeded. You also see the original error
messages from armcc to inform you of the source file or files that failed to compile, and the cause of the
problem.
Note
• There is no change to what the ARM Compiler tools link with when using GCC fallback. That is, the
tools only link with whatever gcc links with, as identified in the configuration file generated with the
--arm_linux_config_file compiler option. Therefore, it is your responsibility to ensure that
licenses are adhered to, and in particular to check what you are linking with. You might have to
explicitly override this if necessary. To do this, include the GNU options -nostdinc,
-nodefaultlibs, and -nostdlib on the armcc command line.
• armcc invokes the GNU tools in a separate process.
• armcc does not optimize any code in any GCC intermediate representations.
To see the commands that are invoked during GCC fallback, specify the -Warmcc,--echo command-line
option.
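For illustration (the source and configuration filenames are examples; see the option descriptions for the
exact behavior), a fallback-enabled compile might look like:
armcc --arm_linux_paths --arm_linux_config_file=linux_config -Warmcc,--gcc_fallback -Warmcc,--echo -c test.c
Here -Warmcc,--gcc_fallback requests the fallback behavior and -Warmcc,--echo prints the commands that
are passed across to the GCC toolchain.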
The following figure shows a high-level view of the GCC fallback process:
[Figure: GCC fallback process. Source files for compilation, and object files and libraries for linking, are
passed to the armcc driver in the ARM Compiler toolchain. If a compile or link step fails, the command line
is passed across to the GCC toolchain driver, which retries the step; if that also fails, the build stops with an
error. Successful steps produce object files and, finally, the executable image.]
Related references
8.9 --arm_linux_config_file=path on page 8-329.
8.10 --arm_linux_configure on page 8-330.
8.71 --echo on page 8-399.
8.200 -Warmcc,option[,option,...] on page 8-542.
8.201 -Warmcc,--gcc_fallback on page 8-543.
Related information
GNU Compiler Collection, http://gcc.gnu.org.
Related concepts
2.16 Unused function code on page 2-60.
Related tasks
2.17 Minimizing code size by eliminating unused functions during compilation on page 2-61.
The linker option --feedback=filename creates a feedback file, and the --feedback_type option
controls the different types of feedback generated.
Related tasks
2.17 Minimizing code size by eliminating unused functions during compilation on page 2-61.
Related references
2.15 Linker feedback during compilation on page 2-59.
Procedure
1. Compile your source code.
2. Use the linker option --feedback=filename to create a feedback file.
3. Use the linker option --feedback_type to control which feedback the linker generates.
By default, the linker generates feedback to eliminate unused functions. This is equivalent to
--feedback_type=unused,noiw. The linker can also generate feedback to avoid compiling functions
for interworking that are never used in an interworking context. Use the linker option
--feedback_type=unused,iw to eliminate both types of unused function.
Note
Reduction of compilation required for interworking is only applicable to ARMv4T architectures.
ARMv5T and later processors can interwork without penalty.
4. Re-compile using the compiler option --feedback=filename to feed the feedback file to the
compiler.
The compiler uses the feedback file generated by the linker to compile the source code in a way that
enables the linker to subsequently discard the unused functions.
Note
To obtain maximum benefit from linker feedback, do a full compile and link at least twice. A single
compile and link using feedback from a previous build is normally sufficient to obtain some benefit.
Note
Always ensure that you perform a full clean build immediately before using the linker feedback file. This
minimizes the risk of the feedback file becoming out of date with the source code it was generated from.
You can specify the --feedback=filename option even when no feedback file exists. This enables you
to use the same build commands or makefile regardless of whether a feedback file exists, for example:
armcc -c --feedback=unused.txt test.c -o test.o
armlink --feedback=unused.txt test.o -o test.axf
The first time you build the application, it compiles normally but the compiler warns you that it cannot
read the specified feedback file because it does not exist. The link command then creates the feedback
file and builds the image. Each subsequent compilation step uses the feedback file from the previous link
step to remove any unused functions that are identified.
Related concepts
2.16 Unused function code on page 2-60.
Related references
2.15 Linker feedback during compilation on page 2-59.
8.82 --feedback=filename on page 8-410.
Related information
--feedback_type=type linker option.
About linker feedback.
Interworking ARM and Thumb.
Related tasks
2.18.2 Minimizing compilation build time on page 2-63.
Related references
2.18.3 Minimizing compilation build time with a single armcc invocation on page 2-64.
2.18.4 Effect of --multifile on compilation build time on page 2-64.
2.18.5 Minimizing compilation build time with parallel make on page 2-65.
2.18.6 Compilation build time and operating system choice on page 2-65.
5.13 Methods of reducing debug information in objects and libraries on page 5-170.
Related information
Optimizing license checkouts from a floating license server.
Licensed features of ARM Compiler.
• If you are using a makefile-based build environment, consider using a make tool that can apply some
form of parallelism.
• Consider your choice of operating system for cross-compilation. Linux generally gives better build
speed than Windows, but there are general performance-tuning techniques you can apply on
Windows that might help improve build times.
Related concepts
2.18.1 Compilation build time on page 2-62.
4.24 Precompiled Header (PCH) files on page 4-134.
5.14 Guarding against multiple inclusion of header files on page 5-171.
3.15 Vectorization on loops containing pointers on page 3-84.
Related references
2.18.3 Minimizing compilation build time with a single armcc invocation on page 2-64.
2.18.4 Effect of --multifile on compilation build time on page 2-64.
2.18.5 Minimizing compilation build time with parallel make on page 2-65.
2.18.6 Compilation build time and operating system choice on page 2-65.
5.13 Methods of reducing debug information in objects and libraries on page 5-170.
8.44 --create_pch=filename on page 8-371.
Related information
Licensed features of ARM Compiler.
Instead, you can try modifying your script to compile multiple files within a single invocation of armcc.
For example, armcc file1.c file2.c file3.c ...
For convenience, you can also list all your .c files in a single via file and invoke it with
armcc --via sources.txt.
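For example, sources.txt might contain one argument per line (the filenames are placeholders):
file1.c
file2.c
file3.c
A via file can also contain additional command-line options that then apply to every file it lists.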
Although this mechanism can dramatically reduce license checkouts and loading and unloading of the
compiler to give significant improvements in build time, the following limitations apply:
• All files are compiled with the same options.
• Converting existing build systems could be difficult.
• Usability depends on source file structure and dependencies.
• An IDE might be unable to report which file had compilation errors.
• After detecting an error, the compiler does not compile subsequent files.
Related concepts
2.18.1 Compilation build time on page 2-62.
Related tasks
2.18.2 Minimizing compilation build time on page 2-63.
Related references
8.134 --multifile, --no_multifile on page 8-467.
8.139 -Onum on page 8-473.
8.195 --via=filename on page 8-537.
Related information
Licensed features of ARM Compiler.
reduce compilation time as a result of time recovered from creating (opening and closing) multiple object
files.
Note
• In RVCT 4.0, if you compile with -O3, --multifile is enabled by default.
• In ARM Compiler 4.1 and later, --multifile is disabled by default, regardless of the optimization
level.
Related concepts
2.18.1 Compilation build time on page 2-62.
Related tasks
2.18.2 Minimizing compilation build time on page 2-63.
Related references
8.134 --multifile, --no_multifile on page 8-467.
8.139 -Onum on page 8-473.
Related information
Licensed features of ARM Compiler.
Related concepts
2.18.1 Compilation build time on page 2-62.
Related tasks
2.18.2 Minimizing compilation build time on page 2-63.
Related concepts
2.18.1 Compilation build time on page 2-62.
Related tasks
2.18.2 Minimizing compilation build time on page 2-63.
Related information
On what platforms will my ARM development tools work?.
Introduces the NEON unit and explains how to take advantage of automatic vectorizing features.
It contains the following sections:
• 3.1 NEON technology on page 3-69.
• 3.2 The NEON unit on page 3-70.
• 3.3 Methods of writing code for NEON on page 3-72.
• 3.4 Generating NEON instructions from C or C++ code on page 3-73.
• 3.5 NEON C extensions on page 3-74.
• 3.6 Automatic vectorization on page 3-75.
• 3.7 Data references within a vectorizable loop on page 3-76.
• 3.8 Stride patterns and data accesses on page 3-77.
• 3.9 Factors affecting NEON vectorization performance on page 3-78.
• 3.10 NEON vectorization performance goals on page 3-79.
• 3.11 Recommended loop structure for vectorization on page 3-80.
• 3.12 Data dependency conflicts when vectorizing code on page 3-81.
• 3.13 Carry-around scalar variables and vectorization on page 3-82.
• 3.14 Reduction of a vector to a scalar on page 3-83.
• 3.15 Vectorization on loops containing pointers on page 3-84.
• 3.16 Nonvectorization on loops containing pointers and indirect addressing on page 3-85.
• 3.17 Nonvectorization on conditional loop exits on page 3-86.
• 3.18 Vectorizable loop iteration counts on page 3-87.
• 3.19 Indicating loop iteration counts to the compiler with __promise(expr) on page 3-89.
• 3.20 Grouping structure accesses for vectorization on page 3-91.
• 3.21 Vectorization and struct member lengths on page 3-92.
• 3.22 Nonvectorization of function calls to non-inline functions from within loops on page 3-93.
• 3.23 Conditional statements and efficient vectorization on page 3-94.
• 3.24 Vectorization diagnostics to tune code for improved performance on page 3-95.
• 3.25 Vectorizable code example on page 3-97.
• 3.26 DSP vectorizable code example on page 3-99.
• 3.27 What can limit or prevent automatic vectorization on page 3-101.
Note
The NEON register bank is shared with the VFP register bank.
Related concepts
3.2 The NEON unit on page 3-70.
The NEON unit is classified as a vector Single Instruction Multiple Data (SIMD) unit that operates on
multiple elements in a vector register by using one instruction.
For example, arrays A and B are 16-bit integer arrays, each with eight elements:
A: 1 2 3 4 5 6 7 8
B: 80 70 60 50 40 30 20 10
To add these arrays together, fetch each vector into a vector register and use one vector SIMD instruction
to obtain the result:
A + B: 81 72 63 54 45 36 27 18
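In C, this operation is simply an element-by-element loop. The following sketch is an illustration added
here (it is not code from the original text); with a NEON-capable target and the --vectorize option, the
compiler can implement the eight 16-bit additions as a single vector add:
short a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
short b[8] = {80, 70, 60, 50, 40, 30, 20, 10};
short r[8];
int i;
for (i = 0; i < 8; i++)
{
    r[i] = a[i] + b[i];   /* eight 16-bit additions, one NEON vector add */
}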
The NEON unit can only deal with vectors that are stored consecutively in memory, so it is not possible
to vectorize indirect addressing.
When writing structures, be aware that NEON structure loads require the structure to contain equal-sized
members.
Related concepts
3.3 Methods of writing code for NEON on page 3-72.
Related tasks
3.4 Generating NEON instructions from C or C++ code on page 3-73.
Related references
8.86 --fp16_format=format on page 8-414.
8.87 --fpmode=model on page 8-415.
8.192 --vectorize, --no_vectorize on page 8-534.
Related information
Introducing NEON Development Article.
Related concepts
3.2 The NEON unit on page 3-70.
3.6 Automatic vectorization on page 3-75.
3.8 Stride patterns and data accesses on page 3-77.
3.9 Factors affecting NEON vectorization performance on page 3-78.
3.12 Data dependency conflicts when vectorizing code on page 3-81.
3.13 Carry-around scalar variables and vectorization on page 3-82.
3.14 Reduction of a vector to a scalar on page 3-83.
3.15 Vectorization on loops containing pointers on page 3-84.
3.16 Nonvectorization on loops containing pointers and indirect addressing on page 3-85.
3.17 Nonvectorization on conditional loop exits on page 3-86.
3.18 Vectorizable loop iteration counts on page 3-87.
3.19 Indicating loop iteration counts to the compiler with __promise(expr) on page 3-89.
3.20 Grouping structure accesses for vectorization on page 3-91.
3.21 Vectorization and struct member lengths on page 3-92.
3.22 Nonvectorization of function calls to non-inline functions from within loops on page 3-93.
3.23 Conditional statements and efficient vectorization on page 3-94.
3.24 Vectorization diagnostics to tune code for improved performance on page 3-95.
3.25 Vectorizable code example on page 3-97.
3.26 DSP vectorizable code example on page 3-99.
Related tasks
3.4 Generating NEON instructions from C or C++ code on page 3-73.
Related references
8.192 --vectorize, --no_vectorize on page 8-534.
3.5 NEON C extensions on page 3-74.
3.7 Data references within a vectorizable loop on page 3-76.
3.10 NEON vectorization performance goals on page 3-79.
3.11 Recommended loop structure for vectorization on page 3-80.
3.27 What can limit or prevent automatic vectorization on page 3-101.
You can also use --diag_warning=optimizations to obtain useful diagnostics from the compiler on
what it can and cannot optimize or vectorize. For example:
armcc --cpu Cortex-A8 --vectorize -O3 -Otime --diag_warning=optimizations source.c
Note
To run code that contains NEON instructions, you must enable both the FPU and NEON.
Related concepts
5.5 Enabling NEON and FPU for bare-metal on page 5-158.
Related tasks
5.4 Selecting the target processor at compile time on page 5-157.
Related references
8.192 --vectorize, --no_vectorize on page 8-534.
8.42 --cpu=list on page 8-367.
8.43 --cpu=name on page 8-368.
3.5 NEON C extensions on page 3-74.
Related information
Licensed features of ARM Compiler.
Related concepts
3.3 Methods of writing code for NEON on page 3-72.
Related references
Chapter 18 Using NEON Support on page 18-923.
Related concepts
3.8 Stride patterns and data accesses on page 3-77.
3.9 Factors affecting NEON vectorization performance on page 3-78.
3.12 Data dependency conflicts when vectorizing code on page 3-81.
3.13 Carry-around scalar variables and vectorization on page 3-82.
3.14 Reduction of a vector to a scalar on page 3-83.
3.15 Vectorization on loops containing pointers on page 3-84.
3.16 Nonvectorization on loops containing pointers and indirect addressing on page 3-85.
3.17 Nonvectorization on conditional loop exits on page 3-86.
3.18 Vectorizable loop iteration counts on page 3-87.
3.19 Indicating loop iteration counts to the compiler with __promise(expr) on page 3-89.
3.20 Grouping structure accesses for vectorization on page 3-91.
3.21 Vectorization and struct member lengths on page 3-92.
3.22 Nonvectorization of function calls to non-inline functions from within loops on page 3-93.
3.23 Conditional statements and efficient vectorization on page 3-94.
3.25 Vectorizable code example on page 3-97.
3.26 DSP vectorizable code example on page 3-99.
Related references
8.192 --vectorize, --no_vectorize on page 8-534.
3.7 Data references within a vectorizable loop on page 3-76.
3.10 NEON vectorization performance goals on page 3-79.
3.11 Recommended loop structure for vectorization on page 3-80.
3.27 What can limit or prevent automatic vectorization on page 3-101.
Related concepts
3.8 Stride patterns and data accesses on page 3-77.
Related references
8.192 --vectorize, --no_vectorize on page 8-534.
Related references
3.7 Data references within a vectorizable loop on page 3-76.
Related concepts
3.17 Nonvectorization on conditional loop exits on page 3-86.
3.16 Nonvectorization on loops containing pointers and indirect addressing on page 3-85.
3.15 Vectorization on loops containing pointers on page 3-84.
3.14 Reduction of a vector to a scalar on page 3-83.
3.13 Carry-around scalar variables and vectorization on page 3-82.
3.12 Data dependency conflicts when vectorizing code on page 3-81.
Related references
8.192 --vectorize, --no_vectorize on page 8-534.
3.10 NEON vectorization performance goals on page 3-79.
3.11 Recommended loop structure for vectorization on page 3-80.
3.10 NEON vectorization performance goals on page 3-79.
Related concepts
3.17 Nonvectorization on conditional loop exits on page 3-86.
3.16 Nonvectorization on loops containing pointers and indirect addressing on page 3-85.
3.15 Vectorization on loops containing pointers on page 3-84.
3.14 Reduction of a vector to a scalar on page 3-83.
3.13 Carry-around scalar variables and vectorization on page 3-82.
3.12 Data dependency conflicts when vectorizing code on page 3-81.
3.9 Factors affecting NEON vectorization performance on page 3-78.
Related references
8.192 --vectorize, --no_vectorize on page 8-534.
3.11 Recommended loop structure for vectorization on page 3-80.
Related concepts
3.17 Nonvectorization on conditional loop exits on page 3-86.
3.16 Nonvectorization on loops containing pointers and indirect addressing on page 3-85.
3.15 Vectorization on loops containing pointers on page 3-84.
3.14 Reduction of a vector to a scalar on page 3-83.
3.13 Carry-around scalar variables and vectorization on page 3-82.
3.12 Data dependency conflicts when vectorizing code on page 3-81.
3.9 Factors affecting NEON vectorization performance on page 3-78.
3.23 Conditional statements and efficient vectorization on page 3-94.
3.17 Nonvectorization on conditional loop exits on page 3-86.
Related references
3.10 NEON vectorization performance goals on page 3-79.
8.192 --vectorize, --no_vectorize on page 8-534.
Information from other array subscripts is used as part of the analysis of dependencies. The loop in the
following example vectorizes because the nonvector subscripts of the references to array a can never be
equal. They can never be equal because n is not equal to n+1 and so gives no feedback between
iterations. The references to array a use two different pieces of the array, so they do not share data.
float a[99][99], b[99], c[99];
int i, n;
...
for (i = 1; i < 99; i++)
{
    a[n][i] = a[n+1][i-1] * b[i] + c[i];
}
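By contrast, the following counter-example is added here for illustration (it is not from the original text): if
both references use the same row n, the value written in one iteration is read in the next, so there is feedback
between iterations and the loop does not vectorize:
for (i = 1; i < 99; i++)
{
    a[n][i] = a[n][i-1] * b[i] + c[i];   /* a[n][i-1] was written by the previous iteration */
}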
Related concepts
3.17 Nonvectorization on conditional loop exits on page 3-86.
3.16 Nonvectorization on loops containing pointers and indirect addressing on page 3-85.
3.15 Vectorization on loops containing pointers on page 3-84.
3.14 Reduction of a vector to a scalar on page 3-83.
3.13 Carry-around scalar variables and vectorization on page 3-82.
3.9 Factors affecting NEON vectorization performance on page 3-78.
Related references
3.11 Recommended loop structure for vectorization on page 3-80.
3.10 NEON vectorization performance goals on page 3-79.
8.192 --vectorize, --no_vectorize on page 8-534.
A scalar that is used and then set in a loop is called a carry-around scalar. These variables are a problem
for vectorization because the value computed in one pass of the loop is carried forward into the next pass.
In the following example, x is a carry-around scalar.
Nonvectorizable loop
float a[99], b[99], x;
int i, n;
...
for (i = 0; i < n; i++)
{
    a[i] = x + b[i];
    x = a[i] + 1/x;
};
Related concepts
3.17 Nonvectorization on conditional loop exits on page 3-86.
3.16 Nonvectorization on loops containing pointers and indirect addressing on page 3-85.
3.15 Vectorization on loops containing pointers on page 3-84.
3.14 Reduction of a vector to a scalar on page 3-83.
3.12 Data dependency conflicts when vectorizing code on page 3-81.
3.9 Factors affecting NEON vectorization performance on page 3-78.
Related references
3.11 Recommended loop structure for vectorization on page 3-80.
3.10 NEON vectorization performance goals on page 3-79.
8.192 --vectorize, --no_vectorize on page 8-534.
Reduction operations are worth vectorizing because they occur so often. In general, reduction operations
are vectorized by creating a vector of partial reductions that is then reduced into the final resulting scalar.
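A typical reduction is summing the elements of an array. The following sketch is an illustration added here,
not code from the original text; the compiler can vectorize the loop by keeping several partial sums in a
NEON register and combining them into the final scalar after the loop:
int vsum(const int *v, int n)
{
    int s = 0;
    int i;
    for (i = 0; i < n; i++)
    {
        s += v[i];   /* reduction of a vector of values to a single scalar */
    }
    return s;
}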
Related concepts
3.17 Nonvectorization on conditional loop exits on page 3-86.
3.16 Nonvectorization on loops containing pointers and indirect addressing on page 3-85.
3.15 Vectorization on loops containing pointers on page 3-84.
3.13 Carry-around scalar variables and vectorization on page 3-82.
3.12 Data dependency conflicts when vectorizing code on page 3-81.
3.9 Factors affecting NEON vectorization performance on page 3-78.
Related references
3.11 Recommended loop structure for vectorization on page 3-80.
3.10 NEON vectorization performance goals on page 3-79.
8.192 --vectorize, --no_vectorize on page 8-534.
The compiler is able to vectorize loops containing pointers if it can determine that the loop is safe. Both
array references and pointer references in loops are analyzed to see if there is any vector access to
memory. In some cases, the compiler creates a run-time test, and executes a vector version or scalar
version of the loop depending on the result of the test.
Often, function arguments are passed as pointers. If several pointer variables are passed to a function, it
is possible that they point to overlapping sections of memory. Often, at runtime, this is not the
case, but the compiler always follows the safe method and avoids optimizing loops that involve pointers
appearing on both the left and right sides of an assignment operator. For example, consider the following
function.
void func (int *pa, int *pb, int x)
{
    int i;
    for (i = 0; i < 100; i++)
    {
        *(pa + i) = *(pb + i) + x;
    }
};
In this example, if pa and pb overlap in memory in a way that causes results from one loop pass to feed
back to a subsequent loop pass, then vectorization of the loop can give incorrect results. Whether the
pointers overlap, for example because both are derived from the same underlying array through a pointer
such as int *a, cannot in general be determined at compile time, so whether vectorization is safe is ambiguous.
The compiler performs a runtime test to see if pointer aliasing occurs. If pointer aliasing does not occur,
it executes a vectorized version of the code. If pointer aliasing occurs, the original nonvectorized code
executes instead. This leads to a small cost in runtime efficiency and code size.
In practice, it is very rare for data dependence to exist because of function arguments. Programs that pass
overlapping pointers are very hard to understand and debug, apart from any vectorization concerns.
In the example above, adding restrict to pa is sufficient to avoid the runtime test.
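A sketch of that change is shown below; it is an illustration added here, and the restrict keyword
documentation describes the exact spelling to use in each source language mode:
void func (int * __restrict pa, int *pb, int x)
{
    int i;
    for (i = 0; i < 100; i++)
    {
        *(pa + i) = *(pb + i) + x;   /* pa is promised not to alias pb, so no runtime aliasing test is needed */
    }
}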
Related concepts
3.17 Nonvectorization on conditional loop exits on page 3-86.
3.16 Nonvectorization on loops containing pointers and indirect addressing on page 3-85.
3.14 Reduction of a vector to a scalar on page 3-83.
3.13 Carry-around scalar variables and vectorization on page 3-82.
3.12 Data dependency conflicts when vectorizing code on page 3-81.
3.9 Factors affecting NEON vectorization performance on page 3-78.
Related references
3.11 Recommended loop structure for vectorization on page 3-80.
3.10 NEON vectorization performance goals on page 3-79.
8.192 --vectorize, --no_vectorize on page 8-534.
8.164 --restrict, --no_restrict on page 8-500.
9.13 restrict on page 9-563.
Indirect addressing is not vectorizable with the NEON unit because it can only deal with vectors that are
stored consecutively in memory. If there is indirect addressing and significant calculations in a loop, it
might be more efficient for you to move the indirect addressing into a separate nonvector loop. This
enables the calculations to vectorize efficiently.
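As an illustration of that restructuring (this sketch is an addition, and the array names and sizes are
hypothetical), the gather through the index array is kept in its own scalar loop, and the arithmetic is left in a
loop that uses only consecutive accesses:
int a[256], b[256], c[256], tmp[256], idx[256], i, n;
...
/* Original form: a[i] += b[idx[i]] * c[i]; the access b[idx[i]] is indirect and prevents vectorization. */
for (i = 0; i < n; i++)
{
    tmp[i] = b[idx[i]];      /* separate nonvector loop: gathers through the index array */
}
for (i = 0; i < n; i++)
{
    a[i] += tmp[i] * c[i];   /* consecutive accesses only: this loop can vectorize */
}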
Related concepts
3.17 Nonvectorization on conditional loop exits on page 3-86.
3.15 Vectorization on loops containing pointers on page 3-84.
3.14 Reduction of a vector to a scalar on page 3-83.
3.13 Carry-around scalar variables and vectorization on page 3-82.
3.12 Data dependency conflicts when vectorizing code on page 3-81.
3.9 Factors affecting NEON vectorization performance on page 3-78.
Related references
3.11 Recommended loop structure for vectorization on page 3-80.
3.10 NEON vectorization performance goals on page 3-79.
8.192 --vectorize, --no_vectorize on page 8-534.
Related concepts
3.16 Nonvectorization on loops containing pointers and indirect addressing on page 3-85.
3.15 Vectorization on loops containing pointers on page 3-84.
3.14 Reduction of a vector to a scalar on page 3-83.
3.13 Carry-around scalar variables and vectorization on page 3-82.
3.12 Data dependency conflicts when vectorizing code on page 3-81.
3.9 Factors affecting NEON vectorization performance on page 3-78.
3.23 Conditional statements and efficient vectorization on page 3-94.
Related references
3.11 Recommended loop structure for vectorization on page 3-80.
3.10 NEON vectorization performance goals on page 3-79.
8.192 --vectorize, --no_vectorize on page 8-534.
3.11 Recommended loop structure for vectorization on page 3-80.
/* myprog1.c */
int a[99], b[99], c[99], i, n;
...
for (i = 0; i < n; i++)
{
    a[i] = b[i] + c[i];
}

/* myprog2.c */
int a[99], b[99], c[99], i, n;
...
while (i < n)
{
    a[i] = b[i] + c[i];
    i += a[i];
};

Compiling myprog1.c with
armcc --cpu=Cortex-A8 -O3 -Otime --vectorize myprog1.c -o-
produces output similar to the following:
ARM
REQUIRE8
PRESERVE8
    AREA ||.text||, CODE, READONLY, ALIGN=2
|L1.56|
    VLD1.32  {d0,d1},[r1]!
    SUBS     r0,r0,#1
    VLD1.32  {d2,d3},[r2]!
    VADD.I32 q0,q0,q1
    VST1.32  {d0,d1},[r3]!
    BNE      |L1.56|
|L1.80|
    AND      r0,r12,#3
    CMP      r0,#0
    BLE      |L1.144|
    SUB      r0,r12,r0
    CMP      r0,r12
    BGE      |L1.144|
    LDR      r1,|L1.164|
    ADD      r2,r1,#0x18c
    SUB      r3,r2,#0x318
|L1.116|
    LDR      r5,[r1,r0,LSL #2]
    LDR      r6,[r2,r0,LSL #2]
    ADD      r5,r5,r6
    STR      r5,[r3,r0,LSL #2]
    ADD      r0,r0,#1
    CMP      r0,r12
    BLT      |L1.116|
|L1.144|
    LDR      r0,[r4,#4]  ; n
    STR      r0,[r4,#0]  ; i
|L1.152|
    POP      {r4-r6}
    BX       lr
    ENDP

Compiling myprog2.c with
armcc --cpu=Cortex-A8 -O3 -Otime --vectorize myprog2.c -o-
produces output similar to the following:
ARM
REQUIRE8
PRESERVE8
    AREA ||.text||, CODE, READONLY, ALIGN=2
|L1.36|
    LDR      r1,[r12,r0,LSL #2]
    LDR      r6,[r4,r0,LSL #2]
    ADD      r1,r1,r6
    STR      r1,[r5,r0,LSL #2]
    ADD      r0,r0,r1
    CMP      r0,r2
    STR      r0,[r3,#0]  ; i
    BLT      |L1.36|
    POP      {r4-r6}
    BX       lr
    ENDP
Related concepts
3.19 Indicating loop iteration counts to the compiler with __promise(expr) on page 3-89.
Related references
8.192 --vectorize, --no_vectorize on page 8-534.
The __promise intrinsic helps to enable efficient vectorization when the loop iteration count is not known at
the start of the loop, provided that the promise you make is in fact true.
This reduces the size of the generated code and can give a performance improvement.
The disassembled output of the example code below illustrates the difference that __promise makes. The
disassembly is reduced to a simple vectorized loop with the removal of nonvectorized code that would
otherwise have been required for possible additional loop iterations. That is, loop iterations beyond those
that are a multiple of the lanes that can be used for the appropriate data type in a NEON register. (The
additional nonvectorized code is known as a scalar fix-up loop. With the use of the __promise(expr)
intrinsic, the scalar fix-up loop is removed.)
/* promise.c */
void f(int *x, int n)
{
    int i;
    __promise((n > 0) && ((n & 7) == 0));
    for (i = 0; i < n; i++) x[i]++;
}
When compiling for a processor that supports NEON, the disassembled output might be similar to the
following, for example:
ARM
REQUIRE8
PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2
f PROC
VMOV.I32 q0,#0x1
ASR r1,r1,#2
|L0.8|
VLD1.32 {d2,d3},[r0]
SUBS r1,r1,#1
VADD.I32 q1,q1,q0
VST1.32 {d2,d3},[r0]!
BNE |L0.8|
BX lr
ENDP
Related concepts
3.18 Vectorizable loop iteration counts on page 3-87.
Related references
8.192 --vectorize, --no_vectorize on page 8-534.
8.164 --restrict, --no_restrict on page 8-500.
9.13 restrict on page 9-563.
10.131 __promise intrinsic on page 10-742.
Related concepts
3.21 Vectorization and struct member lengths on page 3-92.
Related references
8.192 --vectorize, --no_vectorize on page 8-534.
This code could be rewritten for vectorization by using the same data type throughout the structure. For
example, if the variable b is to be of type int, consider making variables a and c of type int rather than
short.
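A minimal sketch of this kind of rewrite follows. The original structure is not reproduced in this section, so the
layout shown here is only illustrative, using the member names a, b, and c from the text:
/* Before: members of different lengths, so the compiler does not use vector loads */
struct sample_mixed
{
    short a;
    int   b;
    short c;
};

/* After: all members are the same length, which allows vector loads */
struct sample_uniform
{
    int a;
    int b;
    int c;
};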
Related concepts
3.20 Grouping structure accesses for vectorization on page 3-91.
Related references
8.192 --vectorize, --no_vectorize on page 8-534.
Related references
8.192 --vectorize, --no_vectorize on page 8-534.
10.8 __inline on page 10-608.
10.6 __forceinline on page 10-605.
Related concepts
3.17 Nonvectorization on conditional loop exits on page 3-86.
Related references
8.192 --vectorize, --no_vectorize on page 8-534.
3.11 Recommended loop structure for vectorization on page 3-80.
The following example shows two functions that implement a simple sum operation on an array. This
code does not vectorize.
int addition(int a, int b)
{
    return a + b;
}

void add_int(int *pa, int *pb, unsigned int n, int x)
{
    unsigned int i;
    for (i = 0; i < n; i++) *(pa + i) = addition(*(pb + i), x);
    /* Function calls cannot be vectorized */
}
Using the --diag_warning=optimizations option produces an optimization warning message for the call to the
addition() function, because the function call prevents the loop from being vectorized.
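The intermediate improvements are not reproduced here. As a sketch, marking addition() with __inline removes
the function call from the loop, and qualifying the pointer parameters with __restrict tells the compiler that the
stores through pa cannot overlap the loads through pb:
__inline int addition(int a, int b)
{
    return a + b;
}

void add_int(int * __restrict pa, int * __restrict pb, unsigned int n, int x)
{
    unsigned int i;
    /* The call is now inlined and the pointers are known not to alias */
    for (i = 0; i < n; i++) *(pa + i) = addition(*(pb + i), x);
}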
The final improvement you can make is to indicate the number of loop iterations. In the previous
example, the number of iterations is not fixed and might not be a multiple of the number of elements that
fit into a NEON register. This means that the compiler must test for remaining iterations and execute them
using nonvectorized code. If you know that your iteration count is divisible by the number of elements that the
NEON unit can operate on in parallel, you can indicate this to the compiler using the __promise
intrinsic. The following example shows the final code that obtains the best performance from
vectorization.
__inline int addition(int a, int b)
{
    return a + b;
}

void add_int(int * __restrict pa, int * __restrict pb, unsigned int n, int x)
{
    unsigned int i;
    __promise((n % 4) == 0);    /* n is a multiple of 4 */
    for (i = 0; i < (n & ~3); i++) *(pa + i) = addition(*(pb + i), x);
}
Related concepts
3.25 Vectorizable code example on page 3-97.
3.26 DSP vectorizable code example on page 3-99.
Related references
8.192 --vectorize, --no_vectorize on page 8-534.
8.61 --diag_suppress=tag[,tag,...] on page 8-389.
8.63 --diag_warning=tag[,tag,...] on page 8-391.
8.164 --restrict, --no_restrict on page 8-500.
9.13 restrict on page 9-563.
Related concepts
3.26 DSP vectorizable code example on page 3-99.
3.24 Vectorization diagnostics to tune code for improved performance on page 3-95.
7.26 Embedded assembler support in the compiler on page 7-291.
3.25 Vectorizable code example on page 3-97.
Related references
8.192 --vectorize, --no_vectorize on page 8-534.
8.43 --cpu=name on page 8-368.
8.61 --diag_suppress=tag[,tag,...] on page 8-389.
8.63 --diag_warning=tag[,tag,...] on page 8-391.
10.161 Predefined macros on page 10-786.
8.164 --restrict, --no_restrict on page 8-500.
9.13 restrict on page 9-563.
Related information
--entry=location linker option.
• Not having a valid NEON compiler license. You might require a valid NEON compiler license to generate NEON
instructions, depending on your compiler version. RVCT 3.1 or later, and ARM Compiler 4.1, require a valid NEON
compiler license. ARM Compiler 5.01 and later do not require a separate NEON compiler license.
• Source code without loops. Automatic vectorization involves loop analysis. Without loops, automatic vectorization
cannot apply.
• Target processor. The target processor (--cpu) must have NEON capability if NEON instructions are to be
generated. For example, Cortex-A7, Cortex-A8, Cortex-A9, Cortex-A12, or Cortex-A15.
• Floating-point code. Vectorization of floating-point code does not always occur automatically. For example, loops
that require re-association only vectorize when compiled with --fpmode fast.
• --no_vectorize by default. By default, generation of NEON vector instructions directly from C or C++ code is
disabled, and must be enabled with --vectorize.
• -Otime not specified. -Otime must be specified to reduce execution time and enable loops to vectorize.
• -Onum not set high enough. The optimization level you set must be -O2 or -O3. Loops do not vectorize at -O0 or -O1.
• Risk of incorrect results. If there is a risk of an incorrect result, vectorization is not applied where that risk
occurs. You might have to manually tune your code to make it more suitable for automatic vectorization.
• Earlier manual optimization attempts. Automatic vectorization can be impeded by earlier manual optimization
attempts, for example, manual loop unrolling in the source code, or complex array accesses.
• No vector access pattern. If variables in a loop lack a vector access pattern, the compiler cannot automatically
vectorize the loop.
• Data dependencies between different iterations of a loop. Where there is a possibility of the use and storage of
arrays overlapping on different iterations of a loop, there is a data dependency problem. A loop cannot be safely
vectorized if the vector order of operations can change the results, so the compiler leaves the loop in its original
form or only partially vectorizes the loop.
• Memory hierarchy. Performing relatively few arithmetic operations on large data sets retrieved from main memory
is limited by the memory bandwidth of the system. Most processors are relatively unbalanced between memory
bandwidth and processor capacity. This can adversely affect the automatic vectorization process.
• Iteration count not fixed at start of loop. For automatic vectorization, it is generally best to write simple loops
with iterations that are fixed at the start of the loop. If a loop does not have a fixed iteration count, automatic
addressing is not possible.
• Conditional loop exits. It is best to write loops that do not contain conditional exits from the loop.
• Carry-around scalar variables. Carry-around scalar variables are a problem for automatic vectorization because the
value computed in one pass of the loop is carried forward into the next pass.
• __promise(expr) not used. Failure to use __promise(expr) where it could make a difference can limit automatic
vectorization.
• Pointer aliasing. Pointer aliasing prevents the use of automatically vectorized code.
• Indirect addressing. Indirect addressing is not vectorizable because the NEON unit can only deal with vectors
stored consecutively in memory.
• Separating access to different parts of a structure into separate loops. Each part of a structure must be accessed
within the same loop for automatic vectorization to occur.
• Inconsistent length of members within a structure. If members of a structure are not all the same length, the
compiler does not attempt to use vector loads.
• Calls to non-inline functions. Calls to non-inline functions from within a loop inhibit vectorization. If such
functions are to be considered for vectorization, they must be marked with the __inline or __forceinline keywords.
• if and switch statements. Extensive use of if and switch statements can affect the efficiency of automatic
vectorization.
You can use --diag_warning=optimizations to obtain compiler diagnostics on what can and cannot
be vectorized.
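For example, a command line of the following form requests these diagnostics while attempting vectorization (the
source filename is illustrative):
armcc --cpu=Cortex-A8 -O3 -Otime --vectorize --diag_warning=optimizations -c myprog.c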
Related concepts
3.9 Factors affecting NEON vectorization performance on page 3-78.
3.12 Data dependency conflicts when vectorizing code on page 3-81.
3.17 Nonvectorization on conditional loop exits on page 3-86.
3.13 Carry-around scalar variables and vectorization on page 3-82.
3.16 Nonvectorization on loops containing pointers and indirect addressing on page 3-85.
3.18 Vectorizable loop iteration counts on page 3-87.
3.19 Indicating loop iteration counts to the compiler with __promise(expr) on page 3-89.
3.20 Grouping structure accesses for vectorization on page 3-91.
3.21 Vectorization and struct member lengths on page 3-92.
3.22 Nonvectorization of function calls to non-inline functions from within loops on page 3-93.
3.23 Conditional statements and efficient vectorization on page 3-94.
3.24 Vectorization diagnostics to tune code for improved performance on page 3-95.
Related references
8.192 --vectorize, --no_vectorize on page 8-534.
3.7 Data references within a vectorizable loop on page 3-76.
3.11 Recommended loop structure for vectorization on page 3-80.
9.13 restrict on page 9-563.
8.42 --cpu=list on page 8-367.
8.43 --cpu=name on page 8-368.
8.63 --diag_warning=tag[,tag,...] on page 8-391.
10.6 __forceinline on page 10-605.
8.87 --fpmode=model on page 8-415.
10.8 __inline on page 10-608.
8.139 -Onum on page 8-473.
8.144 -Otime on page 8-480.
8.164 --restrict, --no_restrict on page 8-500.
10.131 __promise intrinsic on page 10-742.
• 4.23 Using compiler and linker support for symbol versions on page 4-133.
• 4.24 Precompiled Header (PCH) files on page 4-134.
• 4.25 Automatic Precompiled Header (PCH) file processing on page 4-136.
• 4.26 Precompiled Header (PCH) file processing and the header stop point on page 4-137.
• 4.27 Precompiled Header (PCH) file creation requirements on page 4-139.
• 4.28 Compilation with multiple Precompiled Header (PCH) files on page 4-141.
• 4.29 Obsolete Precompiled Header (PCH) files on page 4-142.
• 4.30 Manually specifying the filename and location of a Precompiled Header (PCH) file
on page 4-143.
• 4.31 Selectively applying Precompiled Header (PCH) file processing on page 4-144.
• 4.32 Suppressing Precompiled Header (PCH) file processing on page 4-145.
• 4.33 Message output during Precompiled Header (PCH) processing on page 4-146.
• 4.34 Performance issues with Precompiled Header (PCH) files on page 4-147.
• 4.35 Default compiler options that are affected by optimization level on page 4-148.
Using compiler intrinsics, you can achieve more complete coverage of target architecture instructions
than you would from the instruction selection of the compiler.
An intrinsic function has the appearance of a function call in C or C++, but is replaced during
compilation by a specific sequence of low-level instructions. For example, when implemented using an
intrinsic, the saturated add function of the previous example has the form:
#include <dspfns.h> /* Include ETSI intrinsics */
...
int a, b, result;
...
result = L_add(a, b); /* Saturated add of a and b */
Related concepts
4.5 Compiler intrinsics for controlling IRQ and FIQ interrupts on page 4-109.
4.12 NEON intrinsics provided by the compiler on page 4-118.
4.9 Compiler support for European Telecommunications Standards Institute (ETSI) basic operations
on page 4-114.
4.11 Texas Instruments (TI) C55x intrinsics for optimizing C code on page 4-117.
Related references
4.2 Performance benefits of compiler intrinsics on page 4-106.
4.3 ARM assembler instruction intrinsics on page 4-107.
10.154 ETSI basic operations on page 10-768.
10.105 Instruction intrinsics on page 10-713.
10.155 C55x intrinsics on page 10-770.
Chapter 18 Using NEON Support on page 18-923.
Related concepts
4.1 Compiler intrinsics on page 4-105.
4.5 Compiler intrinsics for controlling IRQ and FIQ interrupts on page 4-109.
4.12 NEON intrinsics provided by the compiler on page 4-118.
4.9 Compiler support for European Telecommunications Standards Institute (ETSI) basic operations
on page 4-114.
4.11 Texas Instruments (TI) C55x intrinsics for optimizing C code on page 4-117.
Related concepts
4.1 Compiler intrinsics on page 4-105.
4.5 Compiler intrinsics for controlling IRQ and FIQ interrupts on page 4-109.
4.12 NEON intrinsics provided by the compiler on page 4-118.
4.9 Compiler support for European Telecommunications Standards Institute (ETSI) basic operations
on page 4-114.
4.11 Texas Instruments (TI) C55x intrinsics for optimizing C code on page 4-117.
4.6 Compiler intrinsics for inserting optimization barriers on page 4-110.
4.8 Compiler intrinsics for Digital Signal Processing (DSP) on page 4-113.
Related references
4.4 Generic intrinsics on page 4-108.
4.7 Compiler intrinsics for inserting native instructions on page 4-112.
Related references
10.106 __breakpoint intrinsic on page 10-714.
10.110 __current_pc intrinsic on page 10-718.
10.111 __current_sp intrinsic on page 10-719.
10.127 __nop intrinsic on page 10-737.
10.137 __return_address intrinsic on page 10-748.
10.140 __semihost intrinsic on page 10-751.
10.160 GNU built-in functions on page 10-778.
Related concepts
4.1 Compiler intrinsics on page 4-105.
4.12 NEON intrinsics provided by the compiler on page 4-118.
4.9 Compiler support for European Telecommunications Standards Institute (ETSI) basic operations
on page 4-114.
4.11 Texas Instruments (TI) C55x intrinsics for optimizing C code on page 4-117.
Related references
4.2 Performance benefits of compiler intrinsics on page 4-106.
4.3 ARM assembler instruction intrinsics on page 4-107.
10.112 __disable_fiq intrinsic on page 10-720.
10.113 __disable_irq intrinsic on page 10-721.
10.116 __enable_fiq intrinsic on page 10-725.
10.117 __enable_irq intrinsic on page 10-726.
Intrinsic    Instruction
__dsb()      DSB
__dmb()      DMB
__isb()      ISB
The memory barrier intrinsic also implicitly adds an optimization barrier intrinsic, and applies an
operand to the inserted instruction. The argument passed to either __dsb(), or __dmb() defines which
optimization barrier is added, and which operand is applied.
Argument    Operand    Implicit optimization barrier
1           OSH        __force_loads()
2           OSHST      __force_stores()
3           OSH        __memory_changed()
5           NSH        __force_loads()
6           NSHST      __force_stores()
7           NSH        __memory_changed()
9           ISH        __force_loads()
10          ISHST      __force_stores()
11          ISH        __memory_changed()
13          SY         __force_loads()
14          ST         __force_stores()
15          SY         __memory_changed()
Example
__dsb(5) inserts DSB NSH into the instruction stream, and implicitly adds the __force_loads()
optimization barrier intrinsic.
Note
• For __isb(), the only supported operand is SY.
• When compiling for an ARMv7-M target, any argument passed to __isb(), __dmb(), or __dsb() results in the
SY operand being emitted.
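As a brief illustration (the function and parameter names are hypothetical), passing the argument 2 to __dmb()
inserts DMB OSHST and implicitly adds the __force_stores() optimization barrier:
void publish_flag(volatile int *flag)
{
    *flag = 1;    /* store that must be observed before any later stores */
    __dmb(2);     /* DMB OSHST, with an implicit __force_stores() */
}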
Related references
10.121 __force_stores intrinsic on page 10-730.
10.126 __memory_changed intrinsic on page 10-736.
10.139 __schedule_barrier intrinsic on page 10-750.
10.120 __force_loads intrinsic on page 10-729.
10.122 __isb intrinsic on page 10-731.
10.114 __dmb intrinsic on page 10-723.
10.115 __dsb intrinsic on page 10-724.
Related information
DMB.
DSB.
ISB.
Related references
10.107 __cdp intrinsic on page 10-715.
10.108 __clrex intrinsic on page 10-716.
10.123 __ldrex intrinsic on page 10-732.
10.125 __ldrt intrinsic on page 10-735.
10.128 __pld intrinsic on page 10-739.
10.130 __pli intrinsic on page 10-741.
10.135 __rbit intrinsic on page 10-746.
10.136 __rev intrinsic on page 10-747.
10.138 __ror intrinsic on page 10-749.
10.141 __sev intrinsic on page 10-753.
10.145 __strex intrinsic on page 10-757.
10.147 __strt intrinsic on page 10-761.
10.148 __swp intrinsic on page 10-762.
10.150 __wfe intrinsic on page 10-764.
10.151 __wfi intrinsic on page 10-765.
10.152 __yield intrinsic on page 10-766.
Related references
10.109 __clz intrinsic on page 10-717.
10.118 __fabs intrinsic on page 10-727.
10.119 __fabsf intrinsic on page 10-728.
10.132 __qadd intrinsic on page 10-743.
10.133 __qdbl intrinsic on page 10-744.
10.134 __qsub intrinsic on page 10-745.
10.142 __sqrt intrinsic on page 10-754.
10.143 __sqrtf intrinsic on page 10-755.
10.144 __ssat intrinsic on page 10-756.
10.149 __usat intrinsic on page 10-763.
10.153 ARMv6 SIMD intrinsics on page 10-767.
Note
Version 2.0 of the ETSI collection of basic operations, as described in the ITU-T Software Tool Library
2005 User's manual, introduces new 16-bit, 32-bit, and 40-bit operations. These operations are not
supported in the ARM compilation tools.
The ETSI basic operations serve as a set of primitives for developers publishing codec algorithms, rather
than as a library for use by developers implementing codecs in C or C++.
ARM Compiler 4.1 and later provide support for the ETSI basic operations through the header file
dspfns.h. The dspfns.h header file contains definitions of the ETSI basic operations as a combination
of C code and intrinsics.
See dspfns.h for a complete list of the ETSI basic operations supported in ARM Compiler 4.1 and later.
ARM Compiler 4.1 and later support the original ETSI family of basic operations as described in the
ETSI G.729 recommendation Coding of speech at 8 kbit/s using conjugate-structure algebraic-code-
excited linear prediction (CS-ACELP), including:
• 16-bit and 32-bit saturated arithmetic operations, such as add and sub. For example, add(v1, v2)
adds two 16-bit numbers v1 and v2 together, with overflow control and saturation, returning a 16-bit
result.
• 16-bit and 32-bit multiplication operations, such as mult and L_mult. For example, mult(v1, v2)
multiplies two 16-bit numbers v1 and v2 together, returning a scaled 16-bit result.
• 16-bit arithmetic shift operations, such as shl and shr. For example, the saturating left shift operation
shl(v1, v2) arithmetically shifts the 16-bit input v1 left v2 positions. A negative shift count shifts
v1 right v2 positions.
• 16-bit data conversion operations, such as extract_l, extract_h, and round. For example,
round(L_v1) rounds the lower 16 bits of the 32-bit input L_v1 into the most significant 16 bits with
saturation.
Note
Be aware that the dspfns.h header file and the ISO C99 header file math.h both define (different
versions of) the function round(). Take care to avoid this potential conflict.
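As a brief sketch of these operations in use (the function is hypothetical, and the types are shown as int for
simplicity; dspfns.h supplies the exact prototypes):
#include <dspfns.h>    /* ETSI basic operations */

int q15_scale(int v1, int v2, int shift)
{
    int p = mult(v1, v2);    /* scaled 16-bit result of v1 * v2 */
    return shl(p, shift);    /* arithmetic shift left with saturation */
}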
Related concepts
4.1 Compiler intrinsics on page 4-105.
4.5 Compiler intrinsics for controlling IRQ and FIQ interrupts on page 4-109.
4.12 NEON intrinsics provided by the compiler on page 4-118.
4.11 Texas Instruments (TI) C55x intrinsics for optimizing C code on page 4-117.
4.10 Overflow and carry status flags for C and C++ code on page 4-116.
Related references
4.2 Performance benefits of compiler intrinsics on page 4-106.
4.3 ARM assembler instruction intrinsics on page 4-107.
10.154 ETSI basic operations on page 10-768.
4.10 Overflow and carry status flags for C and C++ code
The implementation of the European Telecommunications Standards Institute (ETSI) basic operations in
dspfns.h exposes the status flags Overflow and Carry.
These flags are available as global variables for use in your own C or C++ programs. For example:
#include <dspfns.h>    /* include ETSI intrinsics */
#include <stdio.h>
...
const int BUFLEN = 255;
int a[BUFLEN], b[BUFLEN], c[BUFLEN];
...
Overflow = 0;                      /* clear overflow flag */
for (i = 0; i < BUFLEN; ++i) {
    c[i] = L_add(a[i], b[i]);      /* saturated add of a[i] and b[i] */
}
if (Overflow)
{
    fprintf(stderr, "Overflow on saturated addition\n");
}
Generally, saturating functions have a sticky effect on overflow. That is, the overflow flag remains set
until it is explicitly cleared.
Related concepts
4.9 Compiler support for European Telecommunications Standards Institute (ETSI) basic operations
on page 4-114.
Related concepts
4.1 Compiler intrinsics on page 4-105.
4.5 Compiler intrinsics for controlling IRQ and FIQ interrupts on page 4-109.
4.12 NEON intrinsics provided by the compiler on page 4-118.
4.9 Compiler support for European Telecommunications Standards Institute (ETSI) basic operations
on page 4-114.
Related references
4.2 Performance benefits of compiler intrinsics on page 4-106.
4.3 ARM assembler instruction intrinsics on page 4-107.
Related information
Texas Instruments, http://www.ti.com.
Related concepts
4.1 Compiler intrinsics on page 4-105.
4.5 Compiler intrinsics for controlling IRQ and FIQ interrupts on page 4-109.
4.9 Compiler support for European Telecommunications Standards Institute (ETSI) basic operations
on page 4-114.
4.11 Texas Instruments (TI) C55x intrinsics for optimizing C code on page 4-117.
3.2 The NEON unit on page 3-70.
3.3 Methods of writing code for NEON on page 3-72.
Related tasks
4.13 Using NEON intrinsics on page 4-119.
Related references
4.2 Performance benefits of compiler intrinsics on page 4-106.
4.3 ARM assembler instruction intrinsics on page 4-107.
Chapter 18 Using NEON Support on page 18-923.
Procedure
1. Create the following example C program source code:
/* neon_example.c - Neon intrinsics example program */
#include <stdint.h>
#include <stdio.h>
#include <assert.h>
#include <arm_neon.h>

/* fill array with increasing integers beginning with 0 */
void fill_array(int16_t *array, int size)
{
    int i;
    for (i = 0; i < size; i++)
    {
        array[i] = i;
    }
}

/* return the sum of all elements in an array. This works by calculating 4 totals
   (one for each lane) and adding those at the end to get the final total */
int sum_array(int16_t *array, int size)
{
    /* initialize the accumulator vector to zero */
    int16x4_t acc = vdup_n_s16(0);
    int32x2_t acc1;
    int64x1_t acc2;
    /* this implementation assumes the size of the array is a multiple of 4 */
    assert((size % 4) == 0);
    /* counting backwards gives better code */
    for (; size != 0; size -= 4)
    {
        int16x4_t vec;
        /* load 4 values in parallel from the array */
        vec = vld1_s16(array);
        /* increment the array pointer to the next element */
        array += 4;
        /* add the vector to the accumulator vector */
        acc = vadd_s16(acc, vec);
    }
    /* calculate the total */
    acc1 = vpaddl_s16(acc);
    acc2 = vpaddl_s32(acc1);
    /* return the total as an integer */
    return (int)vget_lane_s64(acc2, 0);
}

/* main function */
int main()
{
    int16_t my_array[100];
    fill_array(my_array, 100);
    printf("Sum was %d\n", sum_array(my_array, 100));
    return 0;
}
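The compile command is not reproduced in this section. To build the example, a command of the following form can
be used, where the processor name is only one example of a NEON-capable target:
armcc --cpu=Cortex-A8 -c neon_example.c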
Related concepts
4.12 NEON intrinsics provided by the compiler on page 4-118.
Related references
8.21 -c on page 8-344.
8.43 --cpu=name on page 8-368.
4.14 Compiler support for accessing registers using named register variables
You can use named register variables to access registers of an ARM architecture-based processor.
Named register variables are declared by combining the register keyword with the __asm keyword.
The __asm keyword takes one parameter, a character string, that names the register. For example, the
following declaration declares R0 as a named register variable for the register r0:
register int R0 __asm("r0");
Any type of the same size as the register being named can be used in the declaration of a named register
variable. The type can be a structure, but bitfield layout is sensitive to endianness.
Note
Writing to the current stack pointer, "r13" or "sp", can give unpredictable results at either compile-time
or run-time.
You must declare core registers as global rather than local named register variables. Your program might
still compile if you declare them locally, but you risk unexpected runtime behavior if you do. There is no
restriction on the scope of named register variables for other registers.
Note
A global named register variable is global to the source file in which it is declared, not global to the
program. It has no effect on other files, unless you use multifile compilation or you declare it in a header
file.
A typical use of named register variables is to access bits in the Application Program Status Register
(APSR). The following example shows how to use named register variables to set the saturation flag Q in
the APSR.
#ifndef __BIG_ENDIAN   // bitfield layout of APSR is sensitive to endianness
typedef union
{
    struct
    {
        int mode:5;
        int T:1;
        int F:1;
        int I:1;
        int _dnm:19;
        int Q:1;
        int V:1;
        int C:1;
        int Z:1;
        int N:1;
    } b;
    unsigned int word;
} PSR;
#else /* __BIG_ENDIAN */
typedef union
{
    struct
    {
        int N:1;
        int Z:1;
        int C:1;
        int V:1;
        int Q:1;
        int _dnm:19;
        int I:1;
        int F:1;
        int T:1;
        int mode:5;
    } b;
    unsigned int word;
} PSR;
#endif /* __BIG_ENDIAN */

/* Declare PSR as a register variable for the "apsr" register */
register PSR apsr __asm("apsr");

void set_Q(void)
{
    apsr.b.Q = 1;
}
The following example shows how to use a named register variable to clear the Q flag in the APSR.
register unsigned int _apsr __asm("apsr");

void ClearQFlag(void)
{
    _apsr = _apsr & ~0x08000000;   // clear Q flag
}
Compiling this example using --cpu=7-M results in the following assembly code:
ClearQFlag
        MRS      r0,APSR           ; formerly CPSR
        BIC      r0,r0,#0x8000000
        MSR      APSR_nzcvq,r0     ; formerly CPSR_f
        BX       lr
The following example shows how to use named register variables to set up stack pointers.
register unsigned int _control __asm("control");
register unsigned int _msp __asm("msp");
register unsigned int _psp __asm("psp");

void init(void)
{
    _msp = 0x30000000;          // set up Main Stack Pointer
    _control = _control | 3;    // switch to User Mode with Process Stack
    _psp = 0x40000000;          // set up Process Stack Pointer
}
Compiling this example using --cpu=7-M results in the following assembly code:
init
        MOV      r0,#0x30000000
        MSR      MSP,r0
        MRS      r0,CONTROL
        ORR      r0,r0,#3
        MSR      CONTROL,r0
        MOV      r0,#0x40000000
        MSR      PSP,r0
        BX       lr
You can also use named register variables to access registers within a coprocessor. The string syntax
within the declaration corresponds to how you intend to use the variable. For example, to declare a
variable that you intend to use with the MCR instruction, look up the instruction syntax for this instruction
and use this syntax when you declare your variable. The following example shows how to use a named
register variable to set bits in a coprocessor register.
register unsigned int PMCR __asm("cp15:0:c9:c12:0");

void __reset_cycle_counter(void)
{
    PMCR = 4;
}
Compiling this example using --cpu=7-A results in the following assembly code:
__reset_cycle_counter PROC
        MOV      r0,#4
        MCR      p15,#0x0,r0,c9,c12,#0  ; move from r0 to c9
        BX       lr
        ENDP
In the above example, PMCR is declared as a register variable of type unsigned int, that is associated
with the cp15 coprocessor, with CRn = c9, CRm = c12, opcode1 = 0, and opcode2 = 0 in an MCR or MRC
instruction. The MCR encoding in the disassembly corresponds with the register variable declaration.
The physical coprocessor register is specified with a combination of the two register numbers, CRn and
CRm, and two opcode numbers. This maps to a single physical register.
The same principle applies if you want to manipulate individual bits in a register: you write normal
variable arithmetic in C, and the compiler does a read-modify-write of the coprocessor register. The
following example shows how to manipulate bits in a coprocessor register using a named register
variable:
register unsigned int SCTLR __asm("cp15:0:c1:c0:0");

/* Set bit 11 of the system control register */
void enable_branch_prediction(void)
{
    SCTLR |= (1 << 11);
}
Compiling this example using --cpu=7-A results in the following assembly code:
enable_branch_prediction PROC
        MRC      p15,#0x0,r0,c1,c0,#0
        ORR      r0,r0,#0x800
        MCR      p15,#0x0,r0,c1,c0,#0
        BX       lr
        ENDP
Related references
10.5 __asm on page 10-604.
10.159 Named register variables on page 10-774.
Related information
Application Program Status Register.
MRC and MRC2.
Miscellaneous pragmas
• #pragma arm section [section_type_list]
• #pragma import(__use_full_stdio)
• #pragma inline, #pragma no_inline
• #pragma once
• #pragma pack(n)
• #pragma softfp_linkage, #pragma no_softfp_linkage
• #pragma import symbol_name
Related references
10.77 #pragma anon_unions, #pragma no_anon_unions on page 10-682.
10.78 #pragma arm on page 10-683.
10.79 #pragma arm section [section_type_list] on page 10-684.
10.80 #pragma diag_default tag[,tag,...] on page 10-686.
10.81 #pragma diag_error tag[,tag,...] on page 10-687.
10.82 #pragma diag_remark tag[,tag,...] on page 10-688.
10.83 #pragma diag_suppress tag[,tag,...] on page 10-689.
Related concepts
4.17 Compiler type attribute, __attribute__((bitband)) on page 4-127.
4.18 --bitband compiler command-line option on page 4-128.
4.19 How the compiler handles bit-band objects placed outside bit-band regions on page 4-129.
Related references
10.58 __attribute__((bitband)) type attribute on page 10-663.
10.64 __attribute__((at(address))) variable attribute on page 10-669.
8.17 --bitband on page 8-339.
In the following example, the unplaced bit-banded objects must be relocated into the bit-band region.
This can be achieved by either using an appropriate scatter-loading description file or by using the
--rw_base linker command-line option.
/* foo.c */
typedef struct {
    int i : 1;
    int j : 2;
    int k : 3;
} BB __attribute__((bitband));

BB value;    // Unplaced object

void update_value(void)
{
    value.i = 1;
    value.j = 0;
}
/* end of foo.c */
Alternatively, you can use __attribute__((at())) to place bit-banded objects at a particular address in
the bit-band region, as in the following example:
/* foo.c */
typedef struct {
    int i : 1;
    int j : 2;
    int k : 3;
} BB __attribute__((bitband));

BB value __attribute__((at(0x20000040)));    // Placed object

void update_value(void)
{
    value.i = 1;
    value.j = 0;
}
/* end of foo.c */
Related concepts
4.16 Compiler and processor support for bit-banding on page 4-126.
4.18 --bitband compiler command-line option on page 4-128.
4.19 How the compiler handles bit-band objects placed outside bit-band regions on page 4-129.
Related references
10.58 __attribute__((bitband)) type attribute on page 10-663.
10.64 __attribute__((at(address))) variable attribute on page 10-669.
8.17 --bitband on page 8-339.
Related information
Scatter-loading Features.
--rw_base=address linker option.
armcc supports the bit-banding of objects accessed through absolute addresses. When --bitband is
applied to foo.c in the following example, the access to rts is bit-banded.
/* foo.c */
typedef struct {
    int rts : 1;
    int cts : 1;
    unsigned int data;
} uart;

#define com2 (*((volatile uart *)0x20002000))

void put_com2(int n)
{
    com2.rts = 1;
    com2.data = n;
}
/* end of foo.c */
Related concepts
4.16 Compiler and processor support for bit-banding on page 4-126.
4.17 Compiler type attribute, __attribute__((bitband)) on page 4-127.
4.19 How the compiler handles bit-band objects placed outside bit-band regions on page 4-129.
Related references
10.58 __attribute__((bitband)) type attribute on page 10-663.
10.64 __attribute__((at(address))) variable attribute on page 10-669.
8.17 --bitband on page 8-339.
4.19 How the compiler handles bit-band objects placed outside bit-band regions
Bit-band objects must not be placed outside bit-band regions.
If you do inadvertently place a bit-band object outside a bit-band region, either using the at attribute, or
using an integer pointer to a particular address, the compiler responds as follows:
• If the bitband attribute is applied to an object type and --bitband is not specified on the command
line, the compiler generates an error.
• If the bitband attribute is applied to an object type and --bitband is specified on the command line,
the compiler generates a warning, and ignores the request to bit-band.
• If the bitband attribute is not applied to an object type and --bitband is specified on the command
line, the compiler ignores the request to bit-band.
Related concepts
4.16 Compiler and processor support for bit-banding on page 4-126.
4.17 Compiler type attribute, __attribute__((bitband)) on page 4-127.
4.18 --bitband compiler command-line option on page 4-128.
Related references
10.58 __attribute__((bitband)) type attribute on page 10-663.
10.64 __attribute__((at(address))) variable attribute on page 10-669.
8.17 --bitband on page 8-339.
Related references
10.29 __declspec(thread) on page 10-632.
$a
.text
f
0x00000000: e59f0000 .... LDR r0,[pc,#0] ; [0x8] = 0xdeadbeef
0x00000004: e12fff1e ../. BX lr
$d
0x00000008: deadbeef .... DCD 3735928559 ***
An alternative to using literal pools is to generate the constant in a register with a MOVW/MOVT instruction
pair:
** Section #1 '.text' (SHT_PROGBITS) [SHF_ALLOC + SHF_EXECINSTR]
Size : 12 bytes (alignment 4)
Address: 0x00000000
$a
.text
f
0x00000000: e30b0eef .... MOV r0,#0xbeef
0x00000004: e34d0ead ..M. MOVT r0,#0xdead
0x00000008: e12fff1e ../. BX lr
In most cases, generating literal pools improves performance and code size. However, in some specific
cases you might prefer to generate code without literal pools.
The following compiler options control literal pools:
• --integer_literal_pools.
• --string_literal_pools.
• --branch_tables.
• --float_literal_pools.
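The C source that produces the listings above is not reproduced in this section. A minimal sketch that matches the
constant shown is:
/* f.c - illustrative only */
unsigned int f(void)
{
    return 0xdeadbeef;
}
Compiling this with --no_integer_literal_pools (for example, armcc --cpu=7-A -O2 --no_integer_literal_pools -c f.c)
asks the compiler to avoid integer literal pools, so a constant like this is constructed with a MOVW/MOVT pair on
targets that support those instructions.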
Related references
8.109 --integer_literal_pools, --no_integer_literal_pools on page 8-440.
8.178 --string_literal_pools, --no_string_literal_pools on page 8-515.
8.18 --branch_tables, --no_branch_tables on page 8-340.
8.83 --float_literal_pools, --no_float_literal_pools on page 8-411.
Related information
Procedure Call Standard for the ARM Architecture.
Application Binary Interface (ABI) for the ARM Architecture.
Alignment restrictions in load and store element and structure instructions.
alloca().
Section alignment with the linker.
Related references
9.36 Assembler labels on page 9-586.
Related information
Symbol versioning for BPABI models.
Note
Support for Precompiled Header (PCH) files is deprecated from ARM Compiler 5.05 onwards on all
platforms. Note that ARM Compiler on Windows 8 never supported PCH files.
When you compile source files, the included header files are also compiled. If a header file is included in
more than one source file, it is recompiled when each source file is compiled. Also, you might include
header files that introduce many lines of code, but the primary source files that include them are
relatively small. Therefore, it is often desirable to avoid recompiling a set of header files by precompiling
them. These are referred to as PCH files.
The compiler can precompile and use PCH files automatically with the --pch option, or you can use the
--create_pch and --use_pch options to manually control the use of PCH files.
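For example, with illustrative filenames, the first command below relies on automatic PCH processing, the second
explicitly creates common.pch, and the third reuses it:
armcc -c --pch myprog.c
armcc -c --create_pch=common.pch first.c
armcc -c --use_pch=common.pch second.c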
Note
Support for PCH processing is not available when you specify multiple source files in a single
compilation. In such a case, the compiler issues an error message and aborts the compilation.
Note
Do not assume that if a PCH file is available, it is used by the compiler. In some cases, system
configuration issues mean that the compiler might not always be able to use the PCH file. Address Space
Randomization on Red Hat Enterprise Linux 3 (RHE3) is one example of a possible system
configuration issue.
Related concepts
2.4 Order of compiler command-line options on page 2-45.
4.25 Automatic Precompiled Header (PCH) file processing on page 4-136.
4.26 Precompiled Header (PCH) file processing and the header stop point on page 4-137.
4.27 Precompiled Header (PCH) file creation requirements on page 4-139.
4.28 Compilation with multiple Precompiled Header (PCH) files on page 4-141.
4.29 Obsolete Precompiled Header (PCH) files on page 4-142.
4.30 Manually specifying the filename and location of a Precompiled Header (PCH) file on page 4-143.
4.31 Selectively applying Precompiled Header (PCH) file processing on page 4-144.
4.32 Suppressing Precompiled Header (PCH) file processing on page 4-145.
4.33 Message output during Precompiled Header (PCH) processing on page 4-146.
4.34 Performance issues with Precompiled Header (PCH) files on page 4-147.
Related references
8.150 --pch_messages, --no_pch_messages on page 8-486.
10.87 #pragma hdrstop on page 10-693.
10.92 #pragma no_pch on page 10-698.
8.148 --pch on page 8-484.
8.149 --pch_dir=dir on page 8-485.
8.151 --pch_verbose, --no_pch_verbose on page 8-487.
Note
Support for Precompiled Header (PCH) files is deprecated from ARM Compiler 5.05 onwards on all
platforms. Note that ARM Compiler on Windows 8 never supported PCH files.
Automatic PCH file processing means that the compiler automatically looks for a qualifying PCH file,
and reads it if found. Otherwise, the compiler creates one for use on a subsequent compilation.
When the compiler creates a PCH file, it takes the name of the primary source file and replaces the suffix
with .pch. The PCH file is created in the directory of the primary source file unless the --pch_dir
option is specified.
Related concepts
4.24 Precompiled Header (PCH) files on page 4-134.
4.26 Precompiled Header (PCH) file processing and the header stop point on page 4-137.
4.27 Precompiled Header (PCH) file creation requirements on page 4-139.
4.28 Compilation with multiple Precompiled Header (PCH) files on page 4-141.
4.29 Obsolete Precompiled Header (PCH) files on page 4-142.
4.30 Manually specifying the filename and location of a Precompiled Header (PCH) file on page 4-143.
4.31 Selectively applying Precompiled Header (PCH) file processing on page 4-144.
4.32 Suppressing Precompiled Header (PCH) file processing on page 4-145.
4.33 Message output during Precompiled Header (PCH) processing on page 4-146.
4.34 Performance issues with Precompiled Header (PCH) files on page 4-147.
Related references
8.148 --pch on page 8-484.
8.149 --pch_dir=dir on page 8-485.
4.26 Precompiled Header (PCH) file processing and the header stop point
The PCH file contains a snapshot of all the code that precedes a header stop point.
Note
Support for Precompiled Header (PCH) files is deprecated from ARM Compiler 5.05 onwards on all
platforms. Note that ARM Compiler on Windows 8 never supported PCH files.
Typically, the header stop point is the first token in the primary source file that does not belong to a
preprocessing directive. In the following example, the header stop point is int and the PCH file contains
a snapshot that reflects the inclusion of xxx.h and yyy.h:
#include "xxx.h"
#include "yyy.h"
int i;
You can manually specify the header stop point with #pragma hdrstop. If you use this pragma, it must
appear before the first token that does not belong to a preprocessing directive. In this example, it must be
placed before int, as follows:
#include "xxx.h"
#include "yyy.h"
#pragma hdrstop
int i;
If a conditional directive block (#if, #ifdef, or #ifndef) encloses the first non-preprocessor token or
#pragma hdrstop, the header stop point is the outermost enclosing conditional directive.
For example:
#include "xxx.h"
#ifndef YYY_H
#define YYY_H 1
#include "yyy.h"
#endif
#if TEST /* Header stop point lies immediately before #if TEST */
int i;
#endif
In this example, the first token that does not belong to a preprocessing directive is int, but the header
stop point is the start of the #if block containing it. The PCH file reflects the inclusion of xxx.h and,
conditionally, the definition of YYY_H and inclusion of yyy.h. It does not contain the state produced by
#if TEST.
Related concepts
4.24 Precompiled Header (PCH) files on page 4-134.
4.25 Automatic Precompiled Header (PCH) file processing on page 4-136.
4.27 Precompiled Header (PCH) file creation requirements on page 4-139.
4.28 Compilation with multiple Precompiled Header (PCH) files on page 4-141.
4.29 Obsolete Precompiled Header (PCH) files on page 4-142.
4.30 Manually specifying the filename and location of a Precompiled Header (PCH) file on page 4-143.
4.31 Selectively applying Precompiled Header (PCH) file processing on page 4-144.
4.32 Suppressing Precompiled Header (PCH) file processing on page 4-145.
4.33 Message output during Precompiled Header (PCH) processing on page 4-146.
4.34 Performance issues with Precompiled Header (PCH) files on page 4-147.
Related references
8.148 --pch on page 8-484.
8.149 --pch_dir=dir on page 8-485.
10.87 #pragma hdrstop on page 10-693.
Note
Support for Precompiled Header (PCH) files is deprecated from ARM Compiler 5.05 onwards on all
platforms. Note that ARM Compiler on Windows 8 never supported PCH files.
Related concepts
4.24 Precompiled Header (PCH) files on page 4-134.
4.25 Automatic Precompiled Header (PCH) file processing on page 4-136.
4.26 Precompiled Header (PCH) file processing and the header stop point on page 4-137.
4.28 Compilation with multiple Precompiled Header (PCH) files on page 4-141.
4.29 Obsolete Precompiled Header (PCH) files on page 4-142.
4.30 Manually specifying the filename and location of a Precompiled Header (PCH) file on page 4-143.
4.31 Selectively applying Precompiled Header (PCH) file processing on page 4-144.
4.32 Suppressing Precompiled Header (PCH) file processing on page 4-145.
4.33 Message output during Precompiled Header (PCH) processing on page 4-146.
4.34 Performance issues with Precompiled Header (PCH) files on page 4-147.
Related references
8.148 --pch on page 8-484.
8.149 --pch_dir=dir on page 8-485.
8.150 --pch_messages, --no_pch_messages on page 8-486.
8.151 --pch_verbose, --no_pch_verbose on page 8-487.
10.87 #pragma hdrstop on page 10-693.
10.92 #pragma no_pch on page 10-698.
Note
Support for Precompiled Header (PCH) files is deprecated from ARM Compiler 5.05 onwards on all
platforms. Note that ARM Compiler on Windows 8 never supported PCH files.
When more than one applicable PCH file is available, the compiler uses the PCH file that represents the most
preprocessing directives from the primary source file.
For example, a primary source file might begin with:
#include "xxx.h"
#include "yyy.h"
#include "zzz.h"
If there is one PCH file for xxx.h and a second for xxx.h and yyy.h, the latter PCH file is selected,
assuming that both apply to the current compilation. Additionally, after the PCH file for the first two
headers is read in and the third is compiled, a new PCH file for all three headers is created if the
requirements for PCH file creation are met.
Related concepts
4.24 Precompiled Header (PCH) files on page 4-134.
4.25 Automatic Precompiled Header (PCH) file processing on page 4-136.
4.26 Precompiled Header (PCH) file processing and the header stop point on page 4-137.
4.27 Precompiled Header (PCH) file creation requirements on page 4-139.
4.29 Obsolete Precompiled Header (PCH) files on page 4-142.
4.30 Manually specifying the filename and location of a Precompiled Header (PCH) file on page 4-143.
4.31 Selectively applying Precompiled Header (PCH) file processing on page 4-144.
4.32 Suppressing Precompiled Header (PCH) file processing on page 4-145.
4.33 Message output during Precompiled Header (PCH) processing on page 4-146.
4.34 Performance issues with Precompiled Header (PCH) files on page 4-147.
Note
Support for Precompiled Header (PCH) files is deprecated from ARM Compiler 5.05 onwards on all
platforms. Note that ARM Compiler on Windows 8 never supported PCH files.
The compiler indicates that a PCH file is obsolete, and deletes it, under the following circumstances:
• If the PCH file is based on at least one out-of-date header file but is otherwise applicable for the
current compilation.
• If the PCH file has the same base name as the source file being compiled, for example, xxx.pch and
xxx.c, but is not applicable for the current compilation, for example, because you have used different
command-line options.
These describe some common cases. You must delete other PCH files as required.
Related concepts
4.24 Precompiled Header (PCH) files on page 4-134.
4.25 Automatic Precompiled Header (PCH) file processing on page 4-136.
4.26 Precompiled Header (PCH) file processing and the header stop point on page 4-137.
4.27 Precompiled Header (PCH) file creation requirements on page 4-139.
4.28 Compilation with multiple Precompiled Header (PCH) files on page 4-141.
4.30 Manually specifying the filename and location of a Precompiled Header (PCH) file on page 4-143.
4.31 Selectively applying Precompiled Header (PCH) file processing on page 4-144.
4.32 Suppressing Precompiled Header (PCH) file processing on page 4-145.
4.33 Message output during Precompiled Header (PCH) processing on page 4-146.
4.34 Performance issues with Precompiled Header (PCH) files on page 4-147.
4.30 Manually specifying the filename and location of a Precompiled Header (PCH)
file
You can manually specify the filename and location of PCH files for the compiler to create and use.
Note
Support for Precompiled Header (PCH) files is deprecated from ARM Compiler 5.05 onwards on all
platforms. Note that ARM Compiler on Windows 8 never supported PCH files.
Use the following compiler command-line options to specify PCH filenames and locations:
• --create_pch=filename
• --pch_dir=directory
• --use_pch=filename
If you use --create_pch or --use_pch with the --pch_dir option, the indicated filename is appended
to the directory name, unless the filename is an absolute path name.
Note
If multiple options are specified on the same command line, the following rules apply:
• --use_pch takes precedence over --pch.
• --create_pch takes precedence over all other PCH file options.
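For example, with an illustrative directory and filename, the following command creates the PCH file as
pch/common.pch, because the relative filename is appended to the directory given by --pch_dir:
armcc -c --pch_dir=pch --create_pch=common.pch myprog.c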
Related concepts
4.24 Precompiled Header (PCH) files on page 4-134.
4.25 Automatic Precompiled Header (PCH) file processing on page 4-136.
4.26 Precompiled Header (PCH) file processing and the header stop point on page 4-137.
4.27 Precompiled Header (PCH) file creation requirements on page 4-139.
4.28 Compilation with multiple Precompiled Header (PCH) files on page 4-141.
4.29 Obsolete Precompiled Header (PCH) files on page 4-142.
4.31 Selectively applying Precompiled Header (PCH) file processing on page 4-144.
4.32 Suppressing Precompiled Header (PCH) file processing on page 4-145.
4.33 Message output during Precompiled Header (PCH) processing on page 4-146.
4.34 Performance issues with Precompiled Header (PCH) files on page 4-147.
Related references
8.44 --create_pch=filename on page 8-371.
8.148 --pch on page 8-484.
8.149 --pch_dir=dir on page 8-485.
8.190 --use_pch=filename on page 8-532.
Note
Support for Precompiled Header (PCH) files is deprecated from ARM Compiler 5.05 onwards on all
platforms. Note that ARM Compiler on Windows 8 never supported PCH files.
Use the #pragma hdrstop directive to insert a manual header stop point in the primary source file. Insert
it before the first token that does not belong to a preprocessing directive. This enables you to specify
where the set of header files that is subject to precompilation ends. For example,
#include "xxx.h"
#include "yyy.h"
#pragma hdrstop
#include "zzz.h"
In this example, the PCH file includes the processing state for xxx.h and yyy.h but not for zzz.h. This
is useful if you decide that the information following the #pragma hdrstop does not justify the creation
of another PCH file.
Related concepts
4.24 Precompiled Header (PCH) files on page 4-134.
4.25 Automatic Precompiled Header (PCH) file processing on page 4-136.
4.26 Precompiled Header (PCH) file processing and the header stop point on page 4-137.
4.27 Precompiled Header (PCH) file creation requirements on page 4-139.
4.28 Compilation with multiple Precompiled Header (PCH) files on page 4-141.
4.29 Obsolete Precompiled Header (PCH) files on page 4-142.
4.30 Manually specifying the filename and location of a Precompiled Header (PCH) file on page 4-143.
4.32 Suppressing Precompiled Header (PCH) file processing on page 4-145.
4.33 Message output during Precompiled Header (PCH) processing on page 4-146.
4.34 Performance issues with Precompiled Header (PCH) files on page 4-147.
Related references
10.87 #pragma hdrstop on page 10-693.
10.92 #pragma no_pch on page 10-698.
Note
Support for Precompiled Header (PCH) files is deprecated from ARM Compiler 5.05 onwards on all
platforms. Note that ARM Compiler on Windows 8 never supported PCH files.
You do not have to place the #pragma no_pch directive at the beginning of the file for it to take effect. For
example, no PCH file is created if you compile the following source code with armcc --create_pch=foo.pch
myprog.c:
#include "xxx.h"
#pragma no_pch
#include "zzz.h"
If you want to selectively enable PCH processing, for example, subject xxx.h to PCH file processing, but
not zzz.h, replace #pragma no_pch with #pragma hdrstop, as follows:
#include "xxx.h"
#pragma hdrstop
#include "zzz.h"
Related concepts
4.24 Precompiled Header (PCH) files on page 4-134.
4.25 Automatic Precompiled Header (PCH) file processing on page 4-136.
4.26 Precompiled Header (PCH) file processing and the header stop point on page 4-137.
4.27 Precompiled Header (PCH) file creation requirements on page 4-139.
4.28 Compilation with multiple Precompiled Header (PCH) files on page 4-141.
4.29 Obsolete Precompiled Header (PCH) files on page 4-142.
4.30 Manually specifying the filename and location of a Precompiled Header (PCH) file on page 4-143.
4.31 Selectively applying Precompiled Header (PCH) file processing on page 4-144.
4.33 Message output during Precompiled Header (PCH) processing on page 4-146.
4.34 Performance issues with Precompiled Header (PCH) files on page 4-147.
Related references
10.87 #pragma hdrstop on page 10-693.
10.92 #pragma no_pch on page 10-698.
Note
Support for Precompiled Header (PCH) files is deprecated from ARM Compiler 5.05 onwards on all
platforms. Note that ARM Compiler on Windows 8 never supported PCH files.
When the compiler creates or uses a PCH file, it displays the following kind of message:
test.c: creating precompiled header file test.pch
You can suppress this message with the compiler command-line option --no_pch_messages.
The --pch_verbose option enables verbose mode. In verbose mode, the compiler displays a message for
each PCH file that it considers but does not use, giving the reason why it cannot be used.
Related concepts
4.24 Precompiled Header (PCH) files on page 4-134.
4.25 Automatic Precompiled Header (PCH) file processing on page 4-136.
4.26 Precompiled Header (PCH) file processing and the header stop point on page 4-137.
4.27 Precompiled Header (PCH) file creation requirements on page 4-139.
4.28 Compilation with multiple Precompiled Header (PCH) files on page 4-141.
4.29 Obsolete Precompiled Header (PCH) files on page 4-142.
4.30 Manually specifying the filename and location of a Precompiled Header (PCH) file on page 4-143.
4.31 Selectively applying Precompiled Header (PCH) file processing on page 4-144.
4.32 Suppressing Precompiled Header (PCH) file processing on page 4-145.
4.34 Performance issues with Precompiled Header (PCH) files on page 4-147.
Related references
8.150 --pch_messages, --no_pch_messages on page 8-486.
8.151 --pch_verbose, --no_pch_verbose on page 8-487.
Note
Support for Precompiled Header (PCH) files is deprecated from ARM Compiler 5.05 onwards on all
platforms. Note that ARM Compiler on Windows 8 never supported PCH files.
PCH processing might not always be appropriate, for example, where you have an arbitrary set of files
with non-uniform initial sequences of preprocessing directives.
The benefits of PCH processing occur when several source files can share the same PCH file. The more
sharing, the less disk space is consumed. Sharing minimizes the disadvantage of large PCH files, without
giving up the advantage of a significant decrease in compilation times.
Therefore, to take full advantage of header file precompilation, you might have to re-order the #include
sections of your source files, or group #include directives within a commonly used header file.
Different environments and different projects might have differing requirements. Be aware, however, that
making the best use of PCH support might require some experimentation and probably some minor
changes to source code.
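As a sketch of this grouping, using placeholder project header names, a commonly used prefix header might look like this:
/* common.h: groups the #include directives used by most source files.
   The project-specific header names here are only placeholders.        */
#include <stdio.h>
#include <string.h>
#include "project_types.h"
#include "project_config.h"

/* file1.c, file2.c, ... then all begin with the same initial sequence
   of preprocessing directives, so they can share a single PCH file:    */
#include "common.h"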
Related concepts
4.24 Precompiled Header (PCH) files on page 4-134.
4.25 Automatic Precompiled Header (PCH) file processing on page 4-136.
4.26 Precompiled Header (PCH) file processing and the header stop point on page 4-137.
4.27 Precompiled Header (PCH) file creation requirements on page 4-139.
4.28 Compilation with multiple Precompiled Header (PCH) files on page 4-141.
4.29 Obsolete Precompiled Header (PCH) files on page 4-142.
4.30 Manually specifying the filename and location of a Precompiled Header (PCH) file on page 4-143.
4.31 Selectively applying Precompiled Header (PCH) file processing on page 4-144.
4.32 Suppressing Precompiled Header (PCH) file processing on page 4-145.
4.33 Message output during Precompiled Header (PCH) processing on page 4-146.
Related references
8.15 --autoinline, --no_autoinline on page 8-337.
8.46 --data_reorder, --no_data_reorder on page 8-373.
8.139 -Onum on page 8-473.
This chapter describes programming techniques and practices to help you increase the portability,
efficiency, and robustness of your C and C++ source code.
It contains the following sections:
• 5.1 The compiler as an optimizing compiler on page 5-152.
• 5.2 Compiler optimization for code size versus speed on page 5-153.
• 5.3 Compiler optimization levels and the debug view on page 5-154.
• 5.4 Selecting the target processor at compile time on page 5-157.
• 5.5 Enabling NEON and FPU for bare-metal on page 5-158.
• 5.6 Optimization of loop termination in C code on page 5-159.
• 5.7 Loop unrolling in C code on page 5-161.
• 5.8 Compiler optimization and the volatile keyword on page 5-163.
• 5.9 Code metrics on page 5-165.
• 5.10 Code metrics for measurement of code size and data size on page 5-166.
• 5.11 Stack use in C and C++ on page 5-167.
• 5.12 Benefits of reducing debug information in objects and libraries on page 5-169.
• 5.13 Methods of reducing debug information in objects and libraries on page 5-170.
• 5.14 Guarding against multiple inclusion of header files on page 5-171.
• 5.15 Methods of minimizing function parameter passing overhead on page 5-172.
• 5.16 Returning structures from functions through registers on page 5-173.
• 5.17 Functions that return the same result when called with the same arguments on page 5-174.
• 5.18 Comparison of pure and impure functions on page 5-175.
• 5.19 Recommendation of postfix syntax when qualifying functions with ARM function modifiers
on page 5-176.
• 5.20 Inline functions on page 5-177.
• 5.21 Compiler decisions on function inlining on page 5-178.
Related concepts
5.2 Compiler optimization for code size versus speed on page 5-153.
5.3 Compiler optimization levels and the debug view on page 5-154.
5.6 Optimization of loop termination in C code on page 5-159.
5.8 Compiler optimization and the volatile keyword on page 5-163.
Related tasks
5.4 Selecting the target processor at compile time on page 5-157.
Related references
8.15 --autoinline, --no_autoinline on page 8-337.
8.43 --cpu=name on page 8-368.
8.46 --data_reorder, --no_data_reorder on page 8-373.
8.85 --forceinline on page 8-413.
8.87 --fpmode=model on page 8-415.
8.108 --inline, --no_inline on page 8-439.
8.115 --library_interface=lib on page 8-446.
8.116 --library_type=lib on page 8-448.
8.126 --lower_ropi, --no_lower_ropi on page 8-459.
8.127 --lower_rwpi, --no_lower_rwpi on page 8-460.
8.134 --multifile, --no_multifile on page 8-467.
8.139 -Onum on page 8-473.
8.143 -Ospace on page 8-479.
8.144 -Otime on page 8-480.
8.165 --retain=option on page 8-501.
-Ospace
This option causes the compiler to optimize mainly for code size. This is the default option.
-Otime
This option causes the compiler to optimize mainly for speed.
For best results, you must build your application using the most appropriate command-line option.
Note
These command-line options instruct the compiler to use optimizations that deliver the effect wanted in
the vast majority of cases. However, it is not guaranteed that -Otime always generates faster code, or that
-Ospace always generates smaller code.
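For example, to build the same source file optimized first for size and then for speed:
armcc -c -Ospace myprog.c
armcc -c -Otime myprog.c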
Related references
8.143 -Ospace on page 8-479.
8.144 -Otime on page 8-480.
0
Minimum optimization. Turns off most optimizations. When debugging is enabled, this option
gives the best possible debug view because the structure of the generated code directly
corresponds to the source code. All optimization that interferes with the debug view is disabled.
In particular:
• Breakpoints can be set on any reachable point, including dead code.
• The value of a variable is available everywhere within its scope, except where it is
uninitialized.
• Backtrace gives the stack of open function activations that is expected from reading the
source.
Note
Although the debug view produced by -O0 corresponds most closely to the source code, users
might prefer the debug view produced by -O1 because this improves the quality of the code
without changing the fundamental structure.
Note
Dead code includes reachable code that has no effect on the result of the program, for example
an assignment to a local variable that is never used. Unreachable code is specifically code that
cannot be reached via any control flow path, for example code that immediately follows a return
statement.
1
Restricted optimization. The compiler only performs optimizations that can be described by
debug information. Removes unused inline functions and unused static functions. Turns off
optimizations that seriously degrade the debug view. If used with --debug, this option gives a
generally satisfactory debug view with good code density.
The differences in the debug view from -O0 are:
• Breakpoints cannot be set on dead code.
• Values of variables might not be available within their scope after they have been initialized, for
example, if their assigned location has been reused.
• Functions with no side-effects might be called out of sequence, or might be omitted if the
result is not needed.
• Backtrace might not give the stack of open function activations that is expected from reading
the source because of the presence of tailcalls.
The optimization level -O1 produces good correspondence between source code and object
code, especially when the source code contains no dead code. The generated code can be
significantly smaller than the code at -O0, which can simplify analysis of the object code.
2
High optimization. If used with --debug, the debug view might be less satisfactory because the
mapping of object code to source code is not always clear. The compiler might perform
optimizations that cannot be described by debug information.
This is the default optimization level.
The differences in the debug view from -O1 are:
• The source code to object code mapping might be many to one, because of the possibility of
multiple source code locations mapping to one point of the file, and more aggressive
instruction scheduling.
• Instruction scheduling is allowed to cross sequence points. This can lead to mismatches
between the reported value of a variable at a particular point, and the value you might expect
from reading the source code.
• The compiler automatically inlines functions.
3
Maximum optimization. When debugging is enabled, this option typically gives a poor debug
view. ARM recommends debugging at lower optimization levels.
If you use -O3 and -Otime together, the compiler performs extra optimizations that are more
aggressive, such as:
• High-level scalar optimizations, including loop unrolling. This can give significant
performance benefits at a small code size cost, but at the risk of a longer build time.
• More aggressive inlining and automatic inlining.
These optimizations effectively rewrite the input source code, resulting in object code with the
lowest correspondence to source code and the worst debug view. The
--loop_optimization_level=opt option controls the amount of loop optimization performed at
-O3 -Otime. The higher the amount of loop optimization, the worse the correspondence between
source and object code.
Use of the --vectorize option also lowers the correspondence between source and object code.
For extra information about the high-level transformations performed on the source code at
-O3 -Otime, use the --remarks command-line option.
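For example, a build that enables these aggressive optimizations and reports the high-level transformations might look like this:
armcc -c --cpu=Cortex-A9 -O3 -Otime --remarks myprog.c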
Because optimization affects the mapping of object code to source code, the choice of optimization level
with -Ospace and -Otime generally impacts the debug view.
The option -O0 is the best option to use if a simple debug view is required. Selecting -O0 typically
increases the size of the ELF image by 7 to 15%. To reduce the size of your debug tables, use the
--remove_unneeded_entities option.
Related concepts
5.12 Benefits of reducing debug information in objects and libraries on page 5-169.
Related references
5.13 Methods of reducing debug information in objects and libraries on page 5-170.
8.47 --debug, --no_debug on page 8-374.
8.48 --debug_macros, --no_debug_macros on page 8-375.
8.68 --dwarf2 on page 8-396.
8.69 --dwarf3 on page 8-397.
8.139 -Onum on page 8-473.
8.143 -Ospace on page 8-479.
8.144 -Otime on page 8-480.
8.163 --remove_unneeded_entities, --no_remove_unneeded_entities on page 8-499.
Related information
ELF for the ARM Architecture.
Procedure
1. Decide whether the compiled program is to run on a specific ARM architecture-based processor or on
different ARM processors.
2. Obtain the name, or names, of the target processors recognized by the compiler using the following
compiler command-line option:
--cpu=list
3. If the compiled program is to run on a specific ARM architecture-based processor, having obtained
the name of the processor with the --cpu=list option, select the target processor using the
--cpu=name compiler command-line option.
For example, to compile code to run on a Cortex-A9 processor:
armcc --cpu=Cortex-A9 myprog.c
Alternatively, if the compiled program is to run on different ARM processors, choose the lowest
common denominator architecture appropriate for the application and then specify that architecture in
place of the processor name. For example, to compile code for processors supporting the ARMv6
architecture:
armcc --cpu=6 myprog.c
Selecting the target processor using the --cpu=name command-line option lets the compiler:
• Make full use of all available instructions for that particular processor.
• Perform processor-specific optimizations such as instruction scheduling.
--cpu=list lists all the processors and architectures that the compiler supports.
Related references
8.42 --cpu=list on page 8-367.
8.43 --cpu=name on page 8-368.
Related tasks
5.4 Selecting the target processor at compile time on page 5-157.
3.4 Generating NEON instructions from C or C++ code on page 3-73.
Related references
8.43 --cpu=name on page 8-368.
8.89 --fpu=name on page 8-418.
Related information
--startup=symbol, --no_startup linker option.
The following table shows the corresponding disassembly of the machine code produced by the compiler
for each of the sample implementations above, where the C code for both implementations has been
compiled using the options -O2 -Otime.
Comparing the disassemblies shows that the ADD and CMP instruction pair in the incrementing loop
disassembly has been replaced with a single SUBS instruction in the decrementing loop disassembly.
This is because the subtraction sets the condition flags, so no separate comparison with zero is needed.
In addition to saving an instruction in the loop, the variable n does not have to be saved across the loop,
so the use of a register is also saved in the decrementing loop disassembly. This eases register allocation.
It is even more important if the original termination condition involves a function call. For example:
for (...; i < get_limit(); ...);
The technique of initializing the loop counter to the number of iterations required, and then decrementing
down to zero, also applies to while and do statements.
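In outline, the pattern being compared is the following; this is a sketch with hypothetical checksum routines, not the exact routines from the table referred to above:
/* Incrementing loop: needs an ADD and a CMP against n in each iteration. */
int checksum_inc(int *data, unsigned int n)
{
    int sum = 0;
    unsigned int i;
    for (i = 0; i < n; i++)
    {
        sum += data[i];
    }
    return sum;
}

/* Decrementing loop: the SUBS that decrements n also sets the flags,
   so the loop terminates on a free comparison with zero.               */
int checksum_dec(int *data, unsigned int n)
{
    int sum = 0;
    for (; n != 0; n--)
    {
        sum += *data++;
    }
    return sum;
}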
Related concepts
5.7 Loop unrolling in C code on page 5-161.
The advantages and disadvantages of loop unrolling can be illustrated using the two sample routines
shown in the following table. Both routines efficiently test a single bit by extracting the lowest bit and
counting it, after which the bit is shifted out.
The first implementation uses a loop to count bits. The second routine is the first implementation
unrolled four times, with an optimization applied by combining the four shifts of n into one shift.
Unrolling frequently provides new opportunities for optimization.
The following table shows the corresponding disassembly of the machine code produced by the compiler
for each of the sample implementations above, where the C code for each implementation has been
compiled using the option -O2.
On the ARM9 processor, checking a single bit takes six cycles in the disassembly of the bit-counting
loop shown in the leftmost column. The code size is only nine instructions. The unrolled version of the
bit-counting loop checks four bits at a time per loop iteration, taking on average only three cycles per bit.
However, the cost is the larger code size of fifteen instructions.
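A sketch of the kind of bit-counting loop and its four-times-unrolled form being compared (function names are placeholders):
/* Loop version: tests one bit per iteration. */
int countbits_loop(unsigned int n)
{
    int bits = 0;
    while (n != 0)
    {
        if (n & 1)
            bits++;
        n >>= 1;
    }
    return bits;
}

/* Unrolled four times, with the four shifts of n combined into one. */
int countbits_unrolled(unsigned int n)
{
    int bits = 0;
    while (n != 0)
    {
        if (n & 1) bits++;
        if (n & 2) bits++;
        if (n & 4) bits++;
        if (n & 8) bits++;
        n >>= 4;
    }
    return bits;
}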
Related concepts
5.6 Optimization of loop termination in C code on page 5-159.
The two versions of the routine differ only in the way that buffer_full is declared. The first routine
version is incorrect. Notice that the variable buffer_full is not qualified as volatile in this version. In
contrast, the second version of the routine shows the same loop where buffer_full is correctly qualified
as volatile.
The following table shows the corresponding disassembly of the machine code produced by the compiler
for each of the examples above, where the C code for each implementation has been compiled using the
option -O2.
In the disassembly of the nonvolatile version of the buffer loop in the above table, the statement LDR r0,
[r0, #0] loads the value of buffer_full into register r0 outside the loop labeled |L1.12|. Because
buffer_full is not declared as volatile, the compiler assumes that its value cannot be modified
outside the program. Having already read the value of buffer_full into r0, the compiler omits
reloading the variable when optimizations are enabled, because its value cannot change. The result is the
infinite loop labeled |L1.12|.
In contrast, in the disassembly of the volatile version of the buffer loop, the compiler assumes the value
of buffer_full can change outside the program and performs no optimizations. Consequently, the value
of buffer_full is loaded into register r0 inside the loop labeled |L1.8|. As a result, the loop |L1.8| is
implemented correctly in assembly code.
To avoid optimization problems caused by changes to program state external to the implementation, you
must declare variables as volatile whenever their values can change unexpectedly in ways unknown to
the implementation.
In practice, you must declare a variable as volatile whenever you are:
• Accessing memory-mapped peripherals.
• Sharing global variables between multiple threads.
• Accessing global variables in an interrupt routine or signal handler.
The compiler does not optimize the variables you have declared as volatile.
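In outline, the pattern discussed above is the following sketch; the variable and function names follow the discussion, and only the presence or absence of the volatile qualifier differs between the two versions:
int buffer_full;              /* first version: incorrect, not volatile           */
/* volatile int buffer_full;     second version: correct                          */

int read_stream(void)
{
    int count = 0;
    while (!buffer_full)      /* without volatile, the load can be hoisted out    */
    {                         /* of the loop, producing an infinite loop at -O2   */
        count++;
    }
    return count;
}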
Related concepts
5.10 Code metrics for measurement of code size and data size on page 5-166.
5.11 Stack use in C and C++ on page 5-167.
Related information
--info=topic[,topic,...] fromelf option.
--info=topic[,topic,...] linker option.
--map, --no_map linker option.
--callgraph, --no_callgraph linker option.
5.10 Code metrics for measurement of code size and data size
The compiler, linker, and fromelf image converter let you measure code and data size.
Use the following command-line options:
• --info=sizes (armlink and fromelf).
• --info=totals (armcc, armlink, and fromelf).
• --map (armlink).
Related references
8.107 --info=totals on page 8-438.
Related information
--info=topic[,topic,...] fromelf option.
--info=topic[,topic,...] linker option.
--map, --no_map linker option.
• Several optimizations can introduce new temporary variables to hold intermediate results. The
optimizations include common subexpression elimination (CSE), live range splitting, and structure
splitting. The compiler tries to allocate these temporary variables to registers. If it cannot, it spills them to the stack.
• Generally, code compiled for processors that support only 16-bit encoded Thumb instructions makes
more use of the stack than ARM code and code compiled for processors that support 32-bit encoded
Thumb instructions. This is because 16-bit encoded Thumb instructions have only eight registers
available for allocation, compared to fourteen for ARM code and 32-bit encoded Thumb instructions.
• The AAPCS requires that some function arguments are passed through the stack instead of the
registers, depending on their type, size, and order.
4. After your application has finished executing, examine the stack region of memory to see how
many of the known values have been overwritten. The used part of the region contains overwritten
values, and the remainder still holds the known values.
5. Count the number of overwritten values and multiply by four to give their size, in bytes.
The result of the calculation shows how much the stack has grown, in bytes.
• Use an RTSM and a map file to define a region of memory, directly below your stack, where access is
not allowed. If the stack overflows into this forbidden region, a data abort occurs, which the debugger
can trap.
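A minimal sketch of the watermarking technique described in the steps above. The region symbols follow the common ARM_LIB_STACK naming and are assumptions that depend on your scatter file; __current_sp() is the compiler intrinsic that returns the current stack pointer, and is used so that only the unused part of the stack is filled:
/* Linker-generated bounds of the stack region (assumed names). */
extern unsigned int Image$$ARM_LIB_STACK$$ZI$$Base;
extern unsigned int Image$$ARM_LIB_STACK$$ZI$$Limit;

#define WATERMARK 0xDEADDEADu

/* Fill the unused part of the stack, below the current stack pointer,
   with a known value. Call this early, for example at the start of main(). */
void stack_fill_watermark(void)
{
    unsigned int *p  = &Image$$ARM_LIB_STACK$$ZI$$Base;
    unsigned int *sp = (unsigned int *)__current_sp();
    while (p < sp)
    {
        *p++ = WATERMARK;
    }
}

/* After the application has run, count the watermark words that survive.
   Stack use in bytes = region size - (untouched words * 4).                */
unsigned int stack_untouched_words(void)
{
    unsigned int *p = &Image$$ARM_LIB_STACK$$ZI$$Base;
    unsigned int untouched = 0;
    while (p < &Image$$ARM_LIB_STACK$$ZI$$Limit && *p == WATERMARK)
    {
        untouched++;
        p++;
    }
    return untouched;
}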
Related information
Getting Started with DS-5, ARM DS-5 Product Overview, About Fixed Virtual Platform (FVP).
ARM DS-5 Using the Debugger.
ARM DS-5 EB FVP Reference Guide.
Fixed Virtual Platforms VE and MPS FVP Reference Guide.
Procedure Call Standard for the ARM Architecture.
--info=topic[,topic,...] fromelf option.
--info=topic[,topic,...] linker option.
--callgraph, --no_callgraph linker option.
Related concepts
5.3 Compiler optimization levels and the debug view on page 5-154.
Related references
5.13 Methods of reducing debug information in objects and libraries on page 5-170.
You can use the compiler option --remarks to warn about unguarded header files.
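A conventionally guarded header looks like the following; the header and macro names are placeholders:
/* timer.h */
#ifndef TIMER_H
#define TIMER_H

extern void timer_init(void);
extern unsigned int timer_read(void);

#endif /* TIMER_H */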
• Compile your code with the --no_debug_macros command-line option to discard preprocessor
macro definitions from debug tables.
• Consider using (or not using) --remove_unneeded_entities.
Caution
Although --remove_unneeded_entities can help to reduce the amount of debug information
generated per file, it has the disadvantage of reducing the number of debug sections that are common
to many files. This reduces the number of common debug sections that the linker is able to remove at
final link time, and can result in a final debug image that is larger than necessary. For this reason, use
--remove_unneeded_entities only when necessary.
Related concepts
2.18.1 Compilation build time on page 2-62.
5.12 Benefits of reducing debug information in objects and libraries on page 5-169.
5.3 Compiler optimization levels and the debug view on page 5-154.
Related tasks
2.18.2 Minimizing compilation build time on page 2-63.
Related references
8.48 --debug_macros, --no_debug_macros on page 8-375.
8.162 --remarks on page 8-498.
8.163 --remove_unneeded_entities, --no_remove_unneeded_entities on page 8-499.
Related references
8.91 -g on page 8-422.
Related concepts
5.16 Returning structures from functions through registers on page 5-173.
You can use __value_in_regs wherever multiple values have to be returned from a function.
Examples include:
• Returning multiple values from C and C++ functions.
• Returning multiple values from embedded assembly language functions.
• Making supervisor calls.
• Re-implementing __user_initial_stackheap.
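For example, a function returning a quotient and remainder pair in registers might look like the following sketch; the structure and function names are assumptions:
typedef struct
{
    int quot;
    int rem;
} qr_pair;

__value_in_regs qr_pair my_divmod(int num, int den)
{
    qr_pair result;
    result.quot = num / den;
    result.rem  = num % den;
    return result;            /* returned in registers rather than through memory */
}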
Related concepts
5.15 Methods of minimizing function parameter passing overhead on page 5-172.
Related references
10.19 __value_in_regs on page 10-620.
5.17 Functions that return the same result when called with the same arguments
A function that always returns the same result when called with the same arguments, and does not
change any global data, is referred to as a pure function.
By definition, it is sufficient to evaluate any particular call to a pure function only once. Because the
result of a call to the function is guaranteed to be the same for any identical call, each subsequent call to
the function in code can be replaced with the result of the original call.
Using the keyword __pure when declaring a function indicates that the function is a pure function.
By definition, pure functions cannot have side effects. For example, a pure function cannot read or write
global state by using global variables or indirecting through pointers, because accessing global state can
violate the rule that the function must return the same value whenever it is called with the same
parameters. Therefore, you must use __pure carefully in your programs. Where functions can be
declared __pure, however, the compiler can often perform powerful optimizations, such as Common
Subexpression Elimination (CSE).
Related references
10.33 __attribute__((const)) function attribute on page 10-638.
10.48 __attribute__((pure)) function attribute on page 10-653.
5.18 Comparison of pure and impure functions on page 5-175.
10.13 __pure on page 10-614.
The following table shows the corresponding disassembly of the machine code produced by the compiler
for each of the sample implementations above, where the C code for each implementation has been
compiled using the option -O2, and inlining has been suppressed.
In the disassembly where fact() is not qualified as __pure, fact() is called twice because the compiler
does not know that the function is a candidate for Common Subexpression Elimination (CSE). In
contrast, in the disassembly where fact() is qualified as __pure, fact() is called only once, instead of
twice, because the compiler has been able to perform CSE when adding fact(n) + fact(n).
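A sketch of the kind of code being discussed, using the postfix syntax recommended later in this chapter; omitting the __pure qualifier gives the first, unoptimized case:
int fact(int n) __pure
{
    int f = 1;
    while (n > 0)
    {
        f *= n--;
    }
    return f;
}

int foo(int n)
{
    return fact(n) + fact(n); /* with __pure, the compiler can evaluate fact(n)
                                 once and reuse the result (CSE)                */
}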
Related concepts
5.17 Functions that return the same result when called with the same arguments on page 5-174.
Related references
10.13 __pure on page 10-614.
These function modifiers all have a common syntax. A function modifier such as __pure can qualify a
function declaration either:
• Before the function declaration. For example:
__pure int foo(int);
• After the closing parenthesis on the parameter list. For example:
int foo(int) __pure;
For simple function declarations, each syntax is unambiguous. However, for a function whose return type
or arguments are function pointers, the prefix syntax is imprecise. For example, the following function
returns a function pointer, but it is not clear whether __pure modifies the function itself or its returned
pointer type:
__pure int (*foo(int)) (int);  /* declares 'foo' as a (pure?) function that
                                  returns a pointer to a (pure?) function.
                                  It is ambiguous which of the two function
                                  types is pure. */
In fact, the single __pure keyword at the front of the declaration of foo modifies both foo itself and the
function pointer type returned by foo.
In contrast, the postfix syntax enables clear distinction between whether __pure applies to the argument,
the return type, or the base function, when declaring a function whose argument and return types are
function pointers. For example:
int (*foo1(int) __pure) (int);         /* foo1 is a pure function returning
                                          a pointer to a normal function   */
int (*foo2(int)) (int) __pure;         /* foo2 is a function returning
                                          a pointer to a pure function     */
int (*foo3(int) __pure) (int) __pure;  /* foo3 is a pure function returning
                                          a pointer to a pure function     */
In this example:
• foo1 and foo3 are modified themselves.
• foo2 and foo3 return a pointer to a modified function.
• The functions foo3 and foo are identical.
Because the postfix syntax is more precise than the prefix syntax, ARM recommends that, where
possible, you make use of the postfix syntax when qualifying functions with ARM function modifiers.
Related references
10.11 __irq on page 10-611.
10.13 __pure on page 10-614.
10.15 __softfp on page 10-616.
10.16 __svc on page 10-617.
10.17 __svc_indirect on page 10-618.
10.19 __value_in_regs on page 10-620.
Related concepts
5.21 Compiler decisions on function inlining on page 5-178.
5.22 Automatic function inlining and static functions on page 5-179.
5.23 Inline functions and removal of unused out-of-line functions at link time on page 5-180.
5.24 Automatic function inlining and multifile compilation on page 5-181.
5.26 Compiler modes and inline functions on page 5-183.
5.27 Inline functions in C++ and C90 mode on page 5-184.
5.28 Inline functions in C99 mode on page 5-185.
5.29 Inline functions and debugging on page 5-187.
Related references
5.25 Restriction on overriding compiler decisions about function inlining on page 5-182.
8.15 --autoinline, --no_autoinline on page 8-337.
8.85 --forceinline on page 8-413.
10.6 __forceinline on page 10-605.
10.8 __inline on page 10-608.
8.108 --inline, --no_inline on page 8-439.
Related information
--inline, --no_inline linker option.
Related concepts
5.20 Inline functions on page 5-177.
5.22 Automatic function inlining and static functions on page 5-179.
5.23 Inline functions and removal of unused out-of-line functions at link time on page 5-180.
5.24 Automatic function inlining and multifile compilation on page 5-181.
5.26 Compiler modes and inline functions on page 5-183.
5.27 Inline functions in C++ and C90 mode on page 5-184.
5.28 Inline functions in C99 mode on page 5-185.
5.29 Inline functions and debugging on page 5-187.
Related references
5.25 Restriction on overriding compiler decisions about function inlining on page 5-182.
8.15 --autoinline, --no_autoinline on page 8-337.
8.85 --forceinline on page 8-413.
10.6 __forceinline on page 10-605.
10.8 __inline on page 10-608.
8.108 --inline, --no_inline on page 8-439.
8.139 -Onum on page 8-473.
8.143 -Ospace on page 8-479.
8.144 -Otime on page 8-480.
Related concepts
5.20 Inline functions on page 5-177.
5.21 Compiler decisions on function inlining on page 5-178.
5.23 Inline functions and removal of unused out-of-line functions at link time on page 5-180.
5.24 Automatic function inlining and multifile compilation on page 5-181.
5.26 Compiler modes and inline functions on page 5-183.
5.27 Inline functions in C++ and C90 mode on page 5-184.
5.28 Inline functions in C99 mode on page 5-185.
5.29 Inline functions and debugging on page 5-187.
Related references
5.25 Restriction on overriding compiler decisions about function inlining on page 5-182.
8.15 --autoinline, --no_autoinline on page 8-337.
8.139 -Onum on page 8-473.
5.23 Inline functions and removal of unused out-of-line functions at link time
The linker cannot remove unused out-of-line functions from an object unless you place the unused out-
of-line functions in their own sections.
Use one of the following methods to place unused out-of-line functions in their own sections:
• --split_sections.
• __attribute__((section("name"))).
• #pragma arm section [section_type_list].
• Linker feedback.
--feedback is typically an easier method of enabling unused function removal.
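For example, a sketch in which the section and function names are placeholders, functions can be given their own sections either for the whole file or for a selected region of code:
/* Whole file: armcc -c --split_sections file.c places each function
   in its own section, so the linker can remove any that are unused.   */

/* Selected region: */
#pragma arm section code="optional_helpers"
int helper_a(int x) { return x + 1; }   /* placed in section optional_helpers */
int helper_b(int x) { return x - 1; }
#pragma arm section code                /* revert to the default code section */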
Related concepts
5.20 Inline functions on page 5-177.
5.21 Compiler decisions on function inlining on page 5-178.
5.22 Automatic function inlining and static functions on page 5-179.
5.24 Automatic function inlining and multifile compilation on page 5-181.
5.26 Compiler modes and inline functions on page 5-183.
5.27 Inline functions in C++ and C90 mode on page 5-184.
5.28 Inline functions in C99 mode on page 5-185.
5.29 Inline functions and debugging on page 5-187.
Related references
5.25 Restriction on overriding compiler decisions about function inlining on page 5-182.
8.82 --feedback=filename on page 8-410.
8.175 --split_sections on page 8-512.
10.69 __attribute__((section("name"))) variable attribute on page 10-674.
10.79 #pragma arm section [section_type_list] on page 10-684.
Related concepts
5.20 Inline functions on page 5-177.
5.21 Compiler decisions on function inlining on page 5-178.
5.22 Automatic function inlining and static functions on page 5-179.
5.23 Inline functions and removal of unused out-of-line functions at link time on page 5-180.
5.26 Compiler modes and inline functions on page 5-183.
5.27 Inline functions in C++ and C90 mode on page 5-184.
5.28 Inline functions in C99 mode on page 5-185.
5.29 Inline functions and debugging on page 5-187.
Related references
5.25 Restriction on overriding compiler decisions about function inlining on page 5-182.
8.15 --autoinline, --no_autoinline on page 8-337.
8.108 --inline, --no_inline on page 8-439.
8.134 --multifile, --no_multifile on page 8-467.
8.139 -Onum on page 8-473.
Related concepts
5.20 Inline functions on page 5-177.
5.21 Compiler decisions on function inlining on page 5-178.
5.22 Automatic function inlining and static functions on page 5-179.
5.23 Inline functions and removal of unused out-of-line functions at link time on page 5-180.
5.24 Automatic function inlining and multifile compilation on page 5-181.
5.26 Compiler modes and inline functions on page 5-183.
5.27 Inline functions in C++ and C90 mode on page 5-184.
5.28 Inline functions in C99 mode on page 5-185.
5.29 Inline functions and debugging on page 5-187.
Related references
8.15 --autoinline, --no_autoinline on page 8-337.
8.85 --forceinline on page 8-413.
10.6 __forceinline on page 10-605.
10.8 __inline on page 10-608.
8.108 --inline, --no_inline on page 8-439.
Related concepts
5.20 Inline functions on page 5-177.
5.21 Compiler decisions on function inlining on page 5-178.
5.22 Automatic function inlining and static functions on page 5-179.
5.23 Inline functions and removal of unused out-of-line functions at link time on page 5-180.
5.24 Automatic function inlining and multifile compilation on page 5-181.
5.27 Inline functions in C++ and C90 mode on page 5-184.
5.28 Inline functions in C99 mode on page 5-185.
5.29 Inline functions and debugging on page 5-187.
Related references
5.25 Restriction on overriding compiler decisions about function inlining on page 5-182.
Related information
GNU Compiler Collection, http://gcc.gnu.org.
Related concepts
5.20 Inline functions on page 5-177.
5.21 Compiler decisions on function inlining on page 5-178.
5.22 Automatic function inlining and static functions on page 5-179.
5.23 Inline functions and removal of unused out-of-line functions at link time on page 5-180.
5.24 Automatic function inlining and multifile compilation on page 5-181.
5.26 Compiler modes and inline functions on page 5-183.
5.28 Inline functions in C99 mode on page 5-185.
5.29 Inline functions and debugging on page 5-187.
Related references
5.25 Restriction on overriding compiler decisions about function inlining on page 5-182.
8.108 --inline, --no_inline on page 8-439.
10.8 __inline on page 10-608.
Related information
Elimination of common groups or sections.
/* example_header.h */
inline int my_function (int i)
{
return i + 42; // inline definition
}
/* file1.c */
#include "example_header.h"
... // uses of my_function()
/* file2.c */
#include "example_header.h"
... // uses of my_function()
/* myfile.c */
#include "example_header.h"
extern inline int my_function(int); // causes external definition.
This is the same strategy that is typically used for C++, but in C++ there is no special external definition,
and no requirement for it.
The definitions of inline functions can be different in different translation units. However, in typical use,
as in the above example, they are identical.
When compiling with --multifile, calls in one translation unit might be inlined using the external
definition in another translation unit.
C99 places some restrictions on inline definitions. They cannot define modifiable local static objects.
They cannot reference identifiers with static linkage.
In C99 mode, as with all other modes, the effects of __inline and inline are identical.
Inline functions with static linkage have the same behavior in C99 as in C++.
Related concepts
5.20 Inline functions on page 5-177.
5.21 Compiler decisions on function inlining on page 5-178.
5.22 Automatic function inlining and static functions on page 5-179.
5.23 Inline functions and removal of unused out-of-line functions at link time on page 5-180.
5.24 Automatic function inlining and multifile compilation on page 5-181.
5.26 Compiler modes and inline functions on page 5-183.
5.27 Inline functions in C++ and C90 mode on page 5-184.
5.29 Inline functions and debugging on page 5-187.
Related references
5.25 Restriction on overriding compiler decisions about function inlining on page 5-182.
8.108 --inline, --no_inline on page 8-439.
Related concepts
5.20 Inline functions on page 5-177.
5.21 Compiler decisions on function inlining on page 5-178.
5.22 Automatic function inlining and static functions on page 5-179.
5.23 Inline functions and removal of unused out-of-line functions at link time on page 5-180.
5.24 Automatic function inlining and multifile compilation on page 5-181.
5.26 Compiler modes and inline functions on page 5-183.
5.27 Inline functions in C++ and C90 mode on page 5-184.
5.28 Inline functions in C99 mode on page 5-185.
Related references
5.25 Restriction on overriding compiler decisions about function inlining on page 5-182.
8.15 --autoinline, --no_autoinline on page 8-337.
8.85 --forceinline on page 8-413.
10.6 __forceinline on page 10-605.
10.8 __inline on page 10-608.
8.108 --inline, --no_inline on page 8-439.
Related information
--inline, --no_inline linker option.
Related concepts
5.31 Advantages of natural data alignment on page 5-189.
5.34 Unaligned data access in C and C++ code on page 5-192.
5.35 The __packed qualifier and unaligned data access in C and C++ code on page 5-193.
Related references
5.32 Compiler storage of data objects by natural byte alignment on page 5-190.
5.33 Relevance of natural data alignment at compile time on page 5-191.
10.97 #pragma pack(n) on page 10-703.
Related concepts
5.30 Types of data alignment on page 5-188.
5.34 Unaligned data access in C and C++ code on page 5-192.
5.35 The __packed qualifier and unaligned data access in C and C++ code on page 5-193.
Related references
5.32 Compiler storage of data objects by natural byte alignment on page 5-190.
5.33 Relevance of natural data alignment at compile time on page 5-191.
Data of type long long, double, or long double has a natural alignment of 8 bytes and is located at an address that is evenly divisible by 8.
Related concepts
5.30 Types of data alignment on page 5-188.
5.31 Advantages of natural data alignment on page 5-189.
5.34 Unaligned data access in C and C++ code on page 5-192.
5.35 The __packed qualifier and unaligned data access in C and C++ code on page 5-193.
Related references
5.33 Relevance of natural data alignment at compile time on page 5-191.
Related concepts
5.30 Types of data alignment on page 5-188.
5.31 Advantages of natural data alignment on page 5-189.
5.34 Unaligned data access in C and C++ code on page 5-192.
5.35 The __packed qualifier and unaligned data access in C and C++ code on page 5-193.
Related references
5.32 Compiler storage of data objects by natural byte alignment on page 5-190.
Related concepts
5.30 Types of data alignment on page 5-188.
5.31 Advantages of natural data alignment on page 5-189.
5.35 The __packed qualifier and unaligned data access in C and C++ code on page 5-193.
Related references
5.32 Compiler storage of data objects by natural byte alignment on page 5-190.
5.33 Relevance of natural data alignment at compile time on page 5-191.
5.35 The __packed qualifier and unaligned data access in C and C++ code
The __packed qualifier sets the alignment of any valid type to 1.
This enables objects of packed type to be read or written using unaligned access.
Examples of objects that can be packed include:
• Structures.
• Unions.
• Pointers.
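For example, a minimal sketch with placeholder names:
__packed struct message        /* whole structure packed: alignment is 1       */
{
    char tag;
    int  value;                /* might be stored at an unaligned address      */
};

__packed int *pvalue;          /* pointer to a potentially unaligned int;
                                  dereferences use unaligned-capable code      */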
Related concepts
5.30 Types of data alignment on page 5-188.
5.31 Advantages of natural data alignment on page 5-189.
5.34 Unaligned data access in C and C++ code on page 5-192.
5.36 Unaligned fields in structures on page 5-194.
5.37 Performance penalty associated with marking whole structures as packed on page 5-195.
Related references
5.32 Compiler storage of data objects by natural byte alignment on page 5-190.
5.33 Relevance of natural data alignment at compile time on page 5-191.
10.12 __packed on page 10-612.
10.97 #pragma pack(n) on page 10-703.
Note
The same principles apply to unions. You can declare either an entire union as __packed, or use the
__packed attribute to identify components of the union that are unaligned in memory.
Related concepts
5.35 The __packed qualifier and unaligned data access in C and C++ code on page 5-193.
5.37 Performance penalty associated with marking whole structures as packed on page 5-195.
5.40 Comparisons of an unpacked struct, a __packed struct, and a struct with individually __packed
fields, and of a __packed struct and a #pragma packed struct on page 5-198.
Related references
10.12 __packed on page 10-612.
10.97 #pragma pack(n) on page 10-703.
Related concepts
5.35 The __packed qualifier and unaligned data access in C and C++ code on page 5-193.
5.36 Unaligned fields in structures on page 5-194.
5.40 Comparisons of an unpacked struct, a __packed struct, and a struct with individually __packed
fields, and of a __packed struct and a #pragma packed struct on page 5-198.
Related references
10.12 __packed on page 10-612.
10.97 #pragma pack(n) on page 10-703.
When a pointer is declared as __packed, the compiler generates code that correctly accesses the
dereferenced value of the pointer, regardless of its alignment. The generated code consists of a sequence
of byte accesses, or variable alignment-dependent shifting and masking instructions, rather than a simple
LDR instruction. Consequently, declaring a pointer as __packed incurs a performance and code size
penalty.
Related concepts
5.39 Unaligned Load Register (LDR) instructions generated by the compiler on page 5-197.
Related references
10.12 __packed on page 10-612.
8.187 --unaligned_access, --no_unaligned_access on page 8-528.
In particular, the compiler can use unaligned LDR instructions to load halfwords from memory, even
where the architecture supports dedicated halfword load instructions.
For example, to access an unaligned short within a __packed structure, the compiler might load the
required halfword into the top half of a register and then shift it down to the bottom half. This operation
requires only one memory access, whereas performing the same operation using LDRB instructions
requires two memory accesses, plus instructions to merge the two bytes.
Related concepts
5.38 Unaligned pointers in C and C++ code on page 5-196.
Related references
10.12 __packed on page 10-612.
8.187 --unaligned_access, --no_unaligned_access on page 8-528.
Table 5-10 C code for an unpacked struct, a packed struct, and a struct with individually packed fields
In the first implementation, the struct is not packed. In the second implementation, the entire structure
is qualified as __packed. In the third implementation, the __packed attribute is removed from the
structure and the individual field that is not naturally aligned is declared as __packed.
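Representative versions of the three implementations look like the following sketch; the type and field names are placeholders rather than the exact code from the table:
/* 1. Not packed: padding keeps every field on its natural boundary.          */
struct example
{
    char  one;
    short two;
    char  three;
    int   four;
};

/* 2. The entire structure qualified as __packed: no padding is inserted.     */
__packed struct example_packed
{
    char  one;      /* offset 0                                               */
    short two;      /* offset 1: not naturally aligned                        */
    char  three;    /* offset 3                                               */
    int   four;     /* offset 4: happens to fall on its natural boundary      */
};

/* 3. Only the field that is not naturally aligned is declared __packed.      */
struct example_field
{
    char  one;
    __packed short two;   /* may be placed at an unaligned address            */
    char  three;
    int   four;
};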
The following table shows the corresponding disassembly of the machine code produced by the compiler
for each of the sample implementations of the preceding table, where the C code for each implementation
has been compiled using the option -O2.
Table 5-11 Disassembly for an unpacked struct, a packed struct, and a struct with individually packed fields
Note
The -Ospace and -Otime compiler options control whether accesses to unaligned elements are made
inline or through a function call. Using -Otime results in inline unaligned accesses. Using -Ospace
results in unaligned accesses made through function calls.
In the disassembly of the unpacked struct example above, the compiler always accesses data on aligned
word or halfword addresses. The compiler is able to do this because the struct is padded so that every
member of the struct lies on its natural size boundary.
In the disassembly of the __packed struct example above, fields one and three are aligned on their
natural size boundaries by default, so the compiler makes aligned accesses. The compiler always carries
out aligned word or halfword accesses for fields it can identify as being aligned. For the unaligned field
Table 5-12 C code for a packed struct and a pragma packed struct
In the first implementation, taking the address of a field in a __packed struct or a __packed field in a
struct yields a __packed pointer, and the compiler generates a type error if you try to implicitly cast
this to a non-__packed pointer. In the second implementation, in contrast, taking the address of a field in
a #pragma packed struct does not yield a __packed-qualified pointer. However, the field might not be
properly aligned for its type, and dereferencing such an unaligned pointer results in undefined behavior.
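A sketch illustrating this difference; the names are placeholders, and the exact diagnostics depend on the compiler version:
__packed struct p_rec { char c; int i; };
struct p_rec a;

#pragma pack(1)
struct g_rec { char c; int i; };
#pragma pack(8)                      /* assume 8 is the default alignment */
struct g_rec b;

void take_addresses(void)
{
    __packed int *p1 = &a.i;   /* &a.i has type '__packed int *'                  */
 /* int *bad = &a.i; */        /* type error: implicit cast to a non-__packed
                                  pointer is rejected                             */
    int *p2 = &b.i;            /* accepted, but b.i might be unaligned, so
                                  dereferencing p2 is undefined behavior          */
}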
Related concepts
5.36 Unaligned fields in structures on page 5-194.
5.37 Performance penalty associated with marking whole structures as packed on page 5-195.
Related references
8.143 -Ospace on page 8-479.
8.144 -Otime on page 8-480.
10.12 __packed on page 10-612.
10.60 __attribute__((packed)) type attribute on page 10-665.
10.68 __attribute__((packed)) variable attribute on page 10-673.
10.97 #pragma pack(n) on page 10-703.
Related information
Application Binary Interface (ABI) for the ARM Architecture.
Code that uses hardware support for floating-point arithmetic is more compact and offers better
performance than code that performs floating-point arithmetic in software. However, hardware support
for floating-point arithmetic requires a VFP coprocessor.
Related concepts
5.42 Default selection of hardware or software floating-point support on page 5-202.
5.43 Example of hardware and software support differences for floating-point arithmetic on page 5-203.
5.44 Vector Floating-Point (VFP) architectures on page 5-205.
5.45 Limitations on hardware handling of floating-point arithmetic on page 5-207.
5.46 Implementation of Vector Floating-Point (VFP) support code on page 5-208.
5.47 Compiler and library support for half-precision floating-point numbers on page 5-210.
5.48 Half-precision floating-point number format on page 5-211.
5.49 Compiler support for floating-point computations and linkage on page 5-212.
5.50 Types of floating-point linkage on page 5-213.
5.51 Compiler options for floating-point linkage and computations on page 5-214.
Related references
5.52 Floating-point linkage and computational requirements of compiler options on page 5-216.
5.53 Processors and their implicit Floating-Point Units (FPUs) on page 5-218.
10.100 #pragma softfp_linkage, #pragma no_softfp_linkage on page 10-707.
8.43 --cpu=name on page 8-368.
10.118 __fabs intrinsic on page 10-727.
8.86 --fp16_format=format on page 8-414.
8.87 --fpmode=model on page 8-415.
8.88 --fpu=list on page 8-417.
8.89 --fpu=name on page 8-418.
10.142 __sqrt intrinsic on page 10-754.
10.160 GNU built-in functions on page 10-778.
10.161 Predefined macros on page 10-786.
Related information
Institute of Electrical and Electronics Engineers.
Floating-point Support.
ARM and Thumb floating-point build options (ARMv6 and earlier).
ARM and Thumb floating-point build options (ARMv7 and later).
If you are building ARM Linux applications using --arm_linux or --arm_linux_paths, the default is
always software floating-point linkage. Even if you specify a processor that implies an FPU (for
example, --cpu=ARM1136JF-S), the compiler still defaults to --fpu=softvfp+vfp, not --fpu=vfp.
If a VFP coprocessor is present, VFP instructions are generated. If there is no VFP coprocessor, the
compiler generates code that makes calls to the software floating-point library fplib to carry out
floating-point operations. fplib is available as part of the standard distribution of the ARM compilation
tools suite of C libraries.
Related concepts
5.41 Compiler support for floating-point arithmetic on page 5-200.
5.43 Example of hardware and software support differences for floating-point arithmetic on page 5-203.
5.44 Vector Floating-Point (VFP) architectures on page 5-205.
5.45 Limitations on hardware handling of floating-point arithmetic on page 5-207.
5.46 Implementation of Vector Floating-Point (VFP) support code on page 5-208.
5.47 Compiler and library support for half-precision floating-point numbers on page 5-210.
5.48 Half-precision floating-point number format on page 5-211.
5.49 Compiler support for floating-point computations and linkage on page 5-212.
5.50 Types of floating-point linkage on page 5-213.
5.51 Compiler options for floating-point linkage and computations on page 5-214.
Related references
5.52 Floating-point linkage and computational requirements of compiler options on page 5-216.
5.53 Processors and their implicit Floating-Point Units (FPUs) on page 5-218.
Related information
Floating-point Support.
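A function consistent with both of the following disassemblies is sketched below; the function name matches the disassembly label, and the parameter and local variable names are assumptions:
float foo(float num1, float num2)
{
    float temp, temp2;
    temp  = num1 + num2;   /* __aeabi_fadd, or VADD.F32 with --fpu vfp */
    temp2 = num2 * num2;   /* __aeabi_fmul, or VMUL.F32                */
    return temp2 - temp;   /* __aeabi_fsub, or VSUB.F32                */
}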
When the example C code is compiled with the command-line options --cpu 5TE and --fpu softvfp,
the compiler produces machine code with the disassembly shown below. In this case, floating-point
arithmetic is performed in software through calls to library routines such as __aeabi_fmul.
||foo|| PROC
PUSH {r4-r6, lr}
MOV r4, r1
BL __aeabi_fadd
MOV r5, r0
MOV r1, r4
MOV r0, r4
BL __aeabi_fmul
MOV r1, r5
POP {r4-r6, lr}
B __aeabi_fsub
ENDP
However, when the example C code is compiled with the command-line option --fpu vfp, the compiler
produces machine code with the disassembly shown below. In this case, floating-point arithmetic is
performed in hardware through floating-point arithmetic instructions such as VMUL.F32.
||foo|| PROC
VADD.F32 s2, s0, s1
VMUL.F32 s0, s1, s1
VSUB.F32 s0, s0, s2
BX lr
ENDP
Related concepts
5.41 Compiler support for floating-point arithmetic on page 5-200.
5.42 Default selection of hardware or software floating-point support on page 5-202.
5.44 Vector Floating-Point (VFP) architectures on page 5-205.
5.45 Limitations on hardware handling of floating-point arithmetic on page 5-207.
5.46 Implementation of Vector Floating-Point (VFP) support code on page 5-208.
5.47 Compiler and library support for half-precision floating-point numbers on page 5-210.
5.48 Half-precision floating-point number format on page 5-211.
5.49 Compiler support for floating-point computations and linkage on page 5-212.
5.50 Types of floating-point linkage on page 5-213.
5.51 Compiler options for floating-point linkage and computations on page 5-214.
Related references
5.52 Floating-point linkage and computational requirements of compiler options on page 5-216.
5.53 Processors and their implicit Floating-Point Units (FPUs) on page 5-218.
8.43 --cpu=name on page 8-368.
8.42 --cpu=list on page 8-367.
8.88 --fpu=list on page 8-417.
8.89 --fpu=name on page 8-418.
Related information
Application Binary Interface (ABI) for the ARM Architecture.
Note
Particular implementations of the VFP architecture might provide additional implementation-specific
functionality. For example, the VFP coprocessor hardware might include extra registers for describing
exceptional conditions. This extra functionality is known as sub-architecture functionality.
Related concepts
5.41 Compiler support for floating-point arithmetic on page 5-200.
5.42 Default selection of hardware or software floating-point support on page 5-202.
5.43 Example of hardware and software support differences for floating-point arithmetic on page 5-203.
5.45 Limitations on hardware handling of floating-point arithmetic on page 5-207.
5.46 Implementation of Vector Floating-Point (VFP) support code on page 5-208.
5.47 Compiler and library support for half-precision floating-point numbers on page 5-210.
5.48 Half-precision floating-point number format on page 5-211.
5.49 Compiler support for floating-point computations and linkage on page 5-212.
5.50 Types of floating-point linkage on page 5-213.
5.51 Compiler options for floating-point linkage and computations on page 5-214.
Related references
5.52 Floating-point linkage and computational requirements of compiler options on page 5-216.
5.53 Processors and their implicit Floating-Point Units (FPUs) on page 5-218.
Related information
ARM Application Note 133 - Using VFP with RVDS.
Related concepts
5.41 Compiler support for floating-point arithmetic on page 5-200.
5.42 Default selection of hardware or software floating-point support on page 5-202.
5.43 Example of hardware and software support differences for floating-point arithmetic on page 5-203.
5.44 Vector Floating-Point (VFP) architectures on page 5-205.
5.46 Implementation of Vector Floating-Point (VFP) support code on page 5-208.
5.47 Compiler and library support for half-precision floating-point numbers on page 5-210.
5.48 Half-precision floating-point number format on page 5-211.
5.49 Compiler support for floating-point computations and linkage on page 5-212.
5.50 Types of floating-point linkage on page 5-213.
5.51 Compiler options for floating-point linkage and computations on page 5-214.
Related references
5.52 Floating-point linkage and computational requirements of compiler options on page 5-216.
5.53 Processors and their implicit Floating-Point Units (FPUs) on page 5-218.
Related information
Institute of Electrical and Electronics Engineers.
Related concepts
5.41 Compiler support for floating-point arithmetic on page 5-200.
5.42 Default selection of hardware or software floating-point support on page 5-202.
5.43 Example of hardware and software support differences for floating-point arithmetic on page 5-203.
5.44 Vector Floating-Point (VFP) architectures on page 5-205.
5.45 Limitations on hardware handling of floating-point arithmetic on page 5-207.
5.47 Compiler and library support for half-precision floating-point numbers on page 5-210.
5.48 Half-precision floating-point number format on page 5-211.
5.49 Compiler support for floating-point computations and linkage on page 5-212.
5.50 Types of floating-point linkage on page 5-213.
5.51 Compiler options for floating-point linkage and computations on page 5-214.
Related references
5.52 Floating-point linkage and computational requirements of compiler options on page 5-216.
5.53 Processors and their implicit Floating-Point Units (FPUs) on page 5-218.
8.87 --fpmode=model on page 8-415.
Related information
ARM Application Note 133 - Using VFP with RVDS.
Related concepts
5.41 Compiler support for floating-point arithmetic on page 5-200.
5.42 Default selection of hardware or software floating-point support on page 5-202.
5.43 Example of hardware and software support differences for floating-point arithmetic on page 5-203.
5.44 Vector Floating-Point (VFP) architectures on page 5-205.
5.45 Limitations on hardware handling of floating-point arithmetic on page 5-207.
5.46 Implementation of Vector Floating-Point (VFP) support code on page 5-208.
5.48 Half-precision floating-point number format on page 5-211.
5.49 Compiler support for floating-point computations and linkage on page 5-212.
5.50 Types of floating-point linkage on page 5-213.
5.51 Compiler options for floating-point linkage and computations on page 5-214.
Related references
5.52 Floating-point linkage and computational requirements of compiler options on page 5-216.
5.53 Processors and their implicit Floating-Point Units (FPUs) on page 5-218.
8.86 --fp16_format=format on page 8-414.
Related information
C++ ABI for the ARM Architecture.
Floating-point Support.
The half-precision floating-point format is a 16-bit value with the following fields:
S (bit[15]): Sign bit.
E (bits[14:10]): Biased exponent.
T (bits[9:0]): Mantissa.
Related concepts
5.41 Compiler support for floating-point arithmetic on page 5-200.
5.42 Default selection of hardware or software floating-point support on page 5-202.
5.43 Example of hardware and software support differences for floating-point arithmetic on page 5-203.
5.44 Vector Floating-Point (VFP) architectures on page 5-205.
5.45 Limitations on hardware handling of floating-point arithmetic on page 5-207.
5.46 Implementation of Vector Floating-Point (VFP) support code on page 5-208.
5.47 Compiler and library support for half-precision floating-point numbers on page 5-210.
5.49 Compiler support for floating-point computations and linkage on page 5-212.
5.50 Types of floating-point linkage on page 5-213.
5.51 Compiler options for floating-point linkage and computations on page 5-214.
Related references
5.52 Floating-point linkage and computational requirements of compiler options on page 5-216.
5.53 Processors and their implicit Floating-Point Units (FPUs) on page 5-218.
8.86 --fp16_format=format on page 8-414.
Related information
Institute of Electrical and Electronics Engineers.
Related concepts
5.41 Compiler support for floating-point arithmetic on page 5-200.
5.42 Default selection of hardware or software floating-point support on page 5-202.
5.43 Example of hardware and software support differences for floating-point arithmetic on page 5-203.
5.44 Vector Floating-Point (VFP) architectures on page 5-205.
5.45 Limitations on hardware handling of floating-point arithmetic on page 5-207.
5.46 Implementation of Vector Floating-Point (VFP) support code on page 5-208.
5.47 Compiler and library support for half-precision floating-point numbers on page 5-210.
5.48 Half-precision floating-point number format on page 5-211.
5.50 Types of floating-point linkage on page 5-213.
5.51 Compiler options for floating-point linkage and computations on page 5-214.
Related references
5.52 Floating-point linkage and computational requirements of compiler options on page 5-216.
5.53 Processors and their implicit Floating-Point Units (FPUs) on page 5-218.
10.47 __attribute__((pcs("calling_convention"))) function attribute on page 10-652.
10.15 __softfp on page 10-616.
Related concepts
5.41 Compiler support for floating-point arithmetic on page 5-200.
5.42 Default selection of hardware or software floating-point support on page 5-202.
5.43 Example of hardware and software support differences for floating-point arithmetic on page 5-203.
5.44 Vector Floating-Point (VFP) architectures on page 5-205.
5.45 Limitations on hardware handling of floating-point arithmetic on page 5-207.
5.46 Implementation of Vector Floating-Point (VFP) support code on page 5-208.
5.47 Compiler and library support for half-precision floating-point numbers on page 5-210.
5.48 Half-precision floating-point number format on page 5-211.
5.49 Compiler support for floating-point computations and linkage on page 5-212.
5.51 Compiler options for floating-point linkage and computations on page 5-214.
Related references
5.52 Floating-point linkage and computational requirements of compiler options on page 5-216.
5.53 Processors and their implicit Floating-Point Units (FPUs) on page 5-218.
Related information
Procedure Call Standard for the ARM Architecture.
Table 5-13 Compiler options for floating-point linkage and floating-point computations
softvfp specifies software floating-point linkage. When software floating-point linkage is used, either:
• The calling function and the called function must be compiled using one of the options --fpu=softvfp,
--fpu=softvfp+vfpv2, --fpu=softvfp+vfpv3, --fpu=softvfp+vfpv3_fp16, --fpu=softvfp+vfpv3_d16,
--fpu=softvfp+vfpv3_d16_fp16, --fpu=softvfp+vfpv4, --fpu=softvfp+vfpv4_d16, or --fpu=softvfp+fpv4-sp.
• The calling function and the called function must be declared using the __softfp keyword.
Each of the options --fpu=softvfp, --fpu=softvfp+vfpv2, --fpu=softvfp+vfpv3,
--fpu=softvfp+vfpv3_fp16, --fpu=softvfp+vfpv3_d16, --fpu=softvfp+vfpv3_d16_fp16,
--fpu=softvfp+vfpv4, --fpu=softvfp+vfpv4_d16, and --fpu=softvfp+fpv4-sp specifies software floating-point
linkage across the whole file. In contrast, the __softfp keyword enables software floating-point linkage
to be specified on a function-by-function basis.
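For example, the following sketch (the function name scale is illustrative) applies software floating-point
linkage to a single function with the __softfp keyword:
__softfp double scale(double x)  /* callable from code built with software floating-point linkage */
{
    return x * 2.0;
}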
Note
Rather than having separate compiler options to select the type of floating-point linkage you require and
the type of floating-point computations you require, you use one compiler option, --fpu, to select both.
For example, --fpu=softvfp+vfpv2 selects software floating-point linkage, and a hardware coprocessor
for the computations. Whenever you use softvfp, you are specifying software floating-point linkage.
If you use the --fpu option, you must know the VFP architecture version implemented in the target
processor. An alternative to --fpu=softvfp+... is --apcs=/softfp. This gives software linkage with
whatever VFP architecture version is implied by --cpu. --apcs=/softfp and --apcs=/hardfp are
alternative ways of requesting the integer or floating-point variant of the Procedure Call Standard for the
ARM Architecture (AAPCS).
To use hardware floating-point linkage when targeting ARM Linux, you must explicitly specify a --fpu
option that implies hardware floating-point linkage, for example --fpu=vfpv3, or compile with
--apcs=/hardfp. Because the ARM Linux ABI does not support hardware floating-point linkage, the
compiler issues a warning to indicate this.
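For example, the following command lines (the source file name calc.c is illustrative) compile for the
same processor but request different linkage:
armcc --cpu=Cortex-A9 --fpu=softvfp+vfpv3 -c calc.c
selects software floating-point linkage with computations performed by VFPv3 hardware, whereas:
armcc --cpu=Cortex-A9 --apcs=/hardfp -c calc.c
selects hardware floating-point linkage with the FPU implied by --cpu.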
Related concepts
5.41 Compiler support for floating-point arithmetic on page 5-200.
5.42 Default selection of hardware or software floating-point support on page 5-202.
5.43 Example of hardware and software support differences for floating-point arithmetic on page 5-203.
5.44 Vector Floating-Point (VFP) architectures on page 5-205.
5.45 Limitations on hardware handling of floating-point arithmetic on page 5-207.
5.46 Implementation of Vector Floating-Point (VFP) support code on page 5-208.
5.47 Compiler and library support for half-precision floating-point numbers on page 5-210.
5.48 Half-precision floating-point number format on page 5-211.
5.49 Compiler support for floating-point computations and linkage on page 5-212.
5.50 Types of floating-point linkage on page 5-213.
Related references
5.52 Floating-point linkage and computational requirements of compiler options on page 5-216.
5.53 Processors and their implicit Floating-Point Units (FPUs) on page 5-218.
8.6 --apcs=qualifier...qualifier on page 8-322.
8.89 --fpu=name on page 8-418.
8.115 --library_interface=lib on page 8-446.
10.15 __softfp on page 10-616.
10.100 #pragma softfp_linkage, #pragma no_softfp_linkage on page 10-707.
Related information
Procedure Call Standard for the ARM Architecture.
softvfp No No No No No No No
softvfp+vfpv2 No Yes No Yes No Yes Yes
softvfp+vfpv3 No Yes Yes Yes No Yes Yes
softvfp+vfpv3_fp16 No Yes Yes Yes Yes Yes Yes
Note
You can specify the floating-point linkage, independently of the VFP architecture, with --apcs.
Related concepts
5.41 Compiler support for floating-point arithmetic on page 5-200.
Related references
5.53 Processors and their implicit Floating-Point Units (FPUs) on page 5-218.
8.6 --apcs=qualifier...qualifier on page 8-322.
8.89 --fpu=name on page 8-418.
ARM7TDMI SoftVFP
ARM7TDMI-S SoftVFP
ARM720T SoftVFP
ARM9E-S SoftVFP
ARM9TDMI SoftVFP
ARM920T SoftVFP
ARM922T SoftVFP
ARM926EJ-S SoftVFP
ARM946E-S SoftVFP
ARM966E-S SoftVFP
ARM1020E SoftVFP
ARM1136J-S SoftVFP
ARM1136J-S-rev1 SoftVFP
ARM1136JF-S VFPv2
ARM1136JF-S-rev1 VFPv2
ARM1156T2-S SoftVFP
ARM1176JZ-S SoftVFP
ARM1176JZF-S VFPv2
Cortex-A5 SoftVFP
Cortex-A5.vfp VFPv4_D16
Cortex-A5.neon VFPv4
Cortex-A7 VFPv4
Cortex-A7.no_neon VFPv4_D16
Cortex-A7.no_neon.no_vfp SoftVFP
Cortex-A8 VFPv3
Cortex-A8.no_neon SoftVFP
Cortex-A8NoNeon SoftVFP
Cortex-A9 VFPv3_FP16
Cortex-A9.no_neon VFPv3_D16_FP16
Cortex-A9.no_neon.no_vfp SoftVFP
Cortex-A12 VFPv4
Cortex-A12.no_neon.no_vfp SoftVFP
Cortex-A15 VFPv4
Cortex-A15.no_neon VFPv4_D16
Cortex-A15.no_neon.no_vfp SoftVFP
Cortex-A17 VFPv4
Cortex-A17.no_neon.no_vfp SoftVFP
Cortex-M0 SoftVFP
Cortex-M0plus SoftVFP
Cortex-M1 SoftVFP
Cortex-M1.os_extension SoftVFP
Cortex-M1.no_os_extension SoftVFP
Cortex-M3 SoftVFP
Cortex-M3-rev0 SoftVFP
Cortex-M4 SoftVFP
Cortex-M4.fp.sp FPv4-SP
Cortex-M7 SoftVFP
Cortex-M7.fp.sp FPv5-SP
Cortex-M7.fp.dp FPv5_D16
Cortex-R4 SoftVFP
Cortex-R4F VFPv3_D16
Cortex-R5 SoftVFP
Cortex-R5-rev1 SoftVFP
Cortex-R5F VFPv3_D16
Cortex-R5F-rev1 VFPv3_D16
Cortex-R5F-rev1.sp VFPv3_SP_D16
Cortex-R7 VFPv3_D16_FP16
Cortex-R7.no_vfp SoftVFP
Cortex-R8 VFPv3_D16_FP16
Cortex-R8.no_vfp SoftVFP
MPCore VFPv2
MPCore.no_vfp SoftVFP
MPCoreNoVFP SoftVFP
SC000 SoftVFP
SC300 SoftVFP
PJ4.no_vfp SoftVFP
QSP VFPv3_FP16
QSP.no_neon VFPv3_FP16
QSP.no_neon.no_vfp SoftVFP
Note
You can:
• Specify a different FPU with --fpu.
• Specify the floating-point linkage, independently of the FPU architecture, with --apcs.
• Display the complete expanded command line, including the FPU, with --echo.
Related concepts
5.41 Compiler support for floating-point arithmetic on page 5-200.
5.42 Default selection of hardware or software floating-point support on page 5-202.
5.43 Example of hardware and software support differences for floating-point arithmetic on page 5-203.
5.44 Vector Floating-Point (VFP) architectures on page 5-205.
5.45 Limitations on hardware handling of floating-point arithmetic on page 5-207.
5.46 Implementation of Vector Floating-Point (VFP) support code on page 5-208.
5.47 Compiler and library support for half-precision floating-point numbers on page 5-210.
5.48 Half-precision floating-point number format on page 5-211.
5.49 Compiler support for floating-point computations and linkage on page 5-212.
5.50 Types of floating-point linkage on page 5-213.
5.51 Compiler options for floating-point linkage and computations on page 5-214.
Related references
5.52 Floating-point linkage and computational requirements of compiler options on page 5-216.
8.6 --apcs=qualifier...qualifier on page 8-322.
8.71 --echo on page 8-399.
8.89 --fpu=name on page 8-418.
When integer division by zero is detected, a branch to __aeabi_idiv0() is made. To trap the division by
zero, therefore, you only have to place a breakpoint on __aeabi_idiv0().
The library provides two implementations of __aeabi_idiv0(). The default one does nothing, so if
division by zero is detected, the division function returns zero. However, if you use signal handling, an
alternative implementation is selected that calls __rt_raise(SIGFPE, DIVBYZERO).
If you provide your own version of __aeabi_idiv0(), then the division functions call this function. The
function prototype for __aeabi_idiv0() is:
int __aeabi_idiv0(void);
If __aeabi_idiv0() returns a value, that value is used as the quotient returned by the division function.
On entry into __aeabi_idiv0(), the link register LR contains the address of the instruction after the call
to the __aeabi_uidiv() division routine in your application code.
The offending line in the source code can be identified by looking up the line of C code in the debugger
at the address given by LR.
If you want to examine parameters and save them for postmortem debugging when trapping
__aeabi_idiv0, you can use the $Super$$ and $Sub$$ mechanism:
1. Prefix __aeabi_idiv0() with $Super$$ to identify the original unpatched function
__aeabi_idiv0().
2. Use __aeabi_idiv0() prefixed with $Super$$ to call the original function directly.
3. Prefix __aeabi_idiv0() with $Sub$$ to identify the new function to be called in place of the
original version of __aeabi_idiv0().
4. Use __aeabi_idiv0() prefixed with $Sub$$ to add processing before or after the original function
__aeabi_idiv0().
The following example shows how to intercept __aeabi_div0 using the $Super$$ and $Sub$$
mechanism.
extern void $Super$$__aeabi_idiv0(void);
/* this function is called instead of the original __aeabi_idiv0() */
void $Sub$$__aeabi_idiv0()
{
// insert code to process a divide by zero
...
// call the original __aeabi_idiv0 function
$Super$$__aeabi_idiv0();
}
If you re-implement __rt_raise(), the library automatically selects the signal-handling version of
__aeabi_idiv0(), which calls __rt_raise(), and that library version of __aeabi_idiv0() is included
in the final image.
In that case, when a divide-by-zero error occurs, __aeabi_idiv0() calls __rt_raise(SIGFPE,
DIVBYZERO). Therefore, if you re-implement __rt_raise(), you must check (signal == SIGFPE) &&
(type == DIVBYZERO) to determine if division by zero has occurred.
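A minimal sketch of such a check in a re-implemented __rt_raise() follows. The body of each branch is
illustrative; SIGFPE comes from <signal.h>, and DIVBYZERO is the subcode described above.
#include <signal.h>
void __rt_raise(int sig, int type)
{
    if ((sig == SIGFPE) && (type == DIVBYZERO))
    {
        /* a division by zero occurred: record or handle the error here */
    }
    /* handle other signals and subcodes as required */
}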
Related information
Run-time ABI for the ARM Architecture.
Related concepts
5.56 About trapping software floating-point division-by-zero errors on page 5-224.
5.57 Identification of software floating-point division-by-zero errors on page 5-225.
5.58 Software floating-point division-by-zero debugging on page 5-227.
This traps any division-by-zero errors in code, and untraps all other exceptions, as illustrated in the
following example:
#include <stdio.h>
#include <fenv.h>
int main(void)
{ float a, b, c;
// Trap the Divide-by-Zero exception and untrap all other
// exceptions:
__ieee_status(FE_IEEE_MASK_ALL_EXCEPT, FE_IEEE_MASK_DIVBYZERO);
c = 0;
a = b / c;
printf("b / c = %f, ", a); // trap division-by-zero error
return 0;
}
Related concepts
5.55 Software floating-point division-by-zero errors in C code on page 5-223.
5.57 Identification of software floating-point division-by-zero errors on page 5-225.
5.58 Software floating-point division-by-zero debugging on page 5-227.
Related information
__ieee_status().
Placing a breakpoint on _fp_trapveneer(), then running the image and examining the disassembly in the
debugger, produces:
> run
Execution stopped at breakpoint 1: S:0x0000BAC8
In _fp_trapveneer (no debug info)
S:0x0000BAC8 PUSH {r12,lr}
The address contained in the link register LR is set to 0x8108, the address of the instruction after the
instruction BL __aeabi_fdiv that resulted in the exception.
Related concepts
5.55 Software floating-point division-by-zero errors in C code on page 5-223.
5.56 About trapping software floating-point division-by-zero errors on page 5-224.
5.58 Software floating-point division-by-zero debugging on page 5-227.
Related concepts
5.55 Software floating-point division-by-zero errors in C code on page 5-223.
5.56 About trapping software floating-point division-by-zero errors on page 5-224.
5.57 Identification of software floating-point division-by-zero errors on page 5-225.
Related information
Use of $Super$$ and $Sub$$ to patch symbol definitions.
• Some features available in C++, such as // comments and the ability to mix declarations and
statements.
• Some entirely new features, for example complex numbers, restricted pointers and designated
initializers.
• New keywords and identifiers.
• Extended syntax for the existing C90 language.
A selection of the new features in C99 that might be of interest to developers using them for the first time
is documented here.
Note
C90 is compatible with Standard C++ in the sense that the language specified by the standard is a subset
of C++, except for a few special cases. New features in the C99 standard mean that C99 is no longer
compatible with C++ in this sense.
Some examples of special cases where the language specified by the C90 standard is not a subset of C++
include support for // comments and merging of the typedef and structure tag namespaces. For example,
in C90 the following code expands to x = a / b - c; because /* hello world */ is deleted, but in C++ and
C99 it expands to x = a - c; because everything from // to the end of the first line is deleted:
x = a //* hello world */ b
- c;
The following code demonstrates how a typedef name and a structure tag are treated differently in C
(both C90 and C99) and in C++, because C++ merges the typedef and structure tag namespaces:
#include <stdio.h>
typedef int a;
void f(void)
{
    struct a { int x, y; };
    printf("%d\n", sizeof(a));
}
In C90 and C99, this code defines two types with separate names: a is a typedef for int, and struct a is
a structure type containing two int members. sizeof(a) evaluates to sizeof(int).
In C++, a structure type can be addressed using only its tag. This means that when the definition of
struct a is in scope, the name a used on its own refers to the structure type rather than the typedef, so
in C++ sizeof(a) is greater than sizeof(int).
Related concepts
5.61 // comments in C99 and C90 on page 5-231.
5.62 Compound literals in C99 on page 5-232.
5.63 Designated initializers in C99 on page 5-233.
5.64 Hexadecimal floating-point numbers in C99 on page 5-234.
5.65 Flexible array members in C99 on page 5-235.
5.66 __func__ predefined identifier in C99 on page 5-236.
Many library features that are new to C99 are available in C90 and C++. Some require macros such as
__USE_C99_ALL or __USE_C99_MATH to be defined before the #include.
Related concepts
5.74 Additional <math.h> library functions in C99 on page 5-244.
5.75 Complex numbers in C99 on page 5-245.
5.76 Boolean type and <stdbool.h> in C99 on page 5-246.
5.77 Extended integer types and functions in <inttypes.h> and <stdint.h> in C99 on page 5-247.
5.78 <fenv.h> floating-point environment access in C99 on page 5-248.
5.79 <stdio.h> snprintf family of functions in C99 on page 5-249.
5.80 <tgmath.h> type-generic math macros in C99 on page 5-250.
5.81 <wchar.h> wide character I/O functions in C99 on page 5-251.
Related concepts
5.59 New language features of C99 on page 5-228.
5.62 Compound literals in C99 on page 5-232.
5.63 Designated initializers in C99 on page 5-233.
5.64 Hexadecimal floating-point numbers in C99 on page 5-234.
5.65 Flexible array members in C99 on page 5-235.
5.66 __func__ predefined identifier in C99 on page 5-236.
5.67 inline functions in C99 on page 5-237.
5.68 long long data type in C99 and C90 on page 5-238.
5.69 Macros with a variable number of arguments in C99 on page 5-239.
5.70 Mixed declarations and statements in C99 on page 5-240.
5.71 New block scopes for selection and iteration statements in C99 on page 5-241.
5.72 _Pragma preprocessing operator in C99 on page 5-242.
5.73 Restricted pointers in C99 on page 5-243.
5.75 Complex numbers in C99 on page 5-245.
Related references
8.176 --strict, --no_strict on page 8-513.
9.7 // comments on page 9-557.
Note
int *y = (int []) {1, 2, 3}; is accepted by the compiler, but int y[] = (int []) {1, 2, 3};
is not accepted as a top-level (global) initialization.
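A brief sketch of a typical compound literal, passing an unnamed structure object to a function (the names
point and draw are illustrative):
struct point { int x, y; };
extern void draw(struct point p);
void f(void)
{
    draw((struct point){ .x = 1, .y = 2 });  /* compound literal creates an unnamed struct point object */
}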
Related concepts
5.59 New language features of C99 on page 5-228.
5.61 // comments in C99 and C90 on page 5-231.
5.63 Designated initializers in C99 on page 5-233.
5.64 Hexadecimal floating-point numbers in C99 on page 5-234.
5.65 Flexible array members in C99 on page 5-235.
5.66 __func__ predefined identifier in C99 on page 5-236.
5.67 inline functions in C99 on page 5-237.
5.68 long long data type in C99 and C90 on page 5-238.
5.69 Macros with a variable number of arguments in C99 on page 5-239.
5.70 Mixed declarations and statements in C99 on page 5-240.
5.71 New block scopes for selection and iteration statements in C99 on page 5-241.
5.72 _Pragma preprocessing operator in C99 on page 5-242.
5.73 Restricted pointers in C99 on page 5-243.
5.75 Complex numbers in C99 on page 5-245.
Members of an aggregate that are not explicitly initialized are initialized to zero by default.
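For example, a minimal sketch of designated initializers:
int a[6] = { [4] = 29, [2] = 15 };          /* elements 0, 1, 3, and 5 are zero */
struct point { int x, y; } p = { .y = 3 };  /* p.x is zero */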
Related concepts
5.59 New language features of C99 on page 5-228.
5.61 // comments in C99 and C90 on page 5-231.
5.62 Compound literals in C99 on page 5-232.
5.64 Hexadecimal floating-point numbers in C99 on page 5-234.
5.65 Flexible array members in C99 on page 5-235.
5.66 __func__ predefined identifier in C99 on page 5-236.
5.67 inline functions in C99 on page 5-237.
5.68 long long data type in C99 and C90 on page 5-238.
5.69 Macros with a variable number of arguments in C99 on page 5-239.
5.70 Mixed declarations and statements in C99 on page 5-240.
5.71 New block scopes for selection and iteration statements in C99 on page 5-241.
5.72 _Pragma preprocessing operator in C99 on page 5-242.
5.73 Restricted pointers in C99 on page 5-243.
5.75 Complex numbers in C99 on page 5-245.
In hexadecimal format, the exponent is a decimal number that indicates the power of two by which the
significand is multiplied. Therefore 0x1.fp3 = 1.9375 * 8 = 1.55e1.
C99 also adds the %a and %A conversion specifiers for printf().
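For example, a minimal sketch:
#include <stdio.h>
int main(void)
{
    double d = 0x1.fp3;   /* 1.9375 * 2^3 = 15.5 */
    printf("%a\n", d);    /* hexadecimal form, for example 0x1.fp+3 */
    printf("%f\n", d);    /* 15.500000 */
    return 0;
}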
Related concepts
5.59 New language features of C99 on page 5-228.
5.61 // comments in C99 and C90 on page 5-231.
5.62 Compound literals in C99 on page 5-232.
5.63 Designated initializers in C99 on page 5-233.
5.65 Flexible array members in C99 on page 5-235.
5.66 __func__ predefined identifier in C99 on page 5-236.
5.67 inline functions in C99 on page 5-237.
5.68 long long data type in C99 and C90 on page 5-238.
5.69 Macros with a variable number of arguments in C99 on page 5-239.
5.70 Mixed declarations and statements in C99 on page 5-240.
5.71 New block scopes for selection and iteration statements in C99 on page 5-241.
5.72 _Pragma preprocessing operator in C99 on page 5-242.
5.73 Restricted pointers in C99 on page 5-243.
5.75 Complex numbers in C99 on page 5-245.
Note
When a struct has a flexible array member, the flexible array member itself has incomplete array type,
and the size of the struct does not include the size of that member.
Flexible array members enable you to mimic dynamic type specification in C in the sense that you can
defer the specification of the array size to runtime. For example:
#include <stdlib.h>   // for malloc() and size_t
extern const int n;
typedef struct
{
    int len;
    char p[];
} str;
void foo(void)
{
    size_t str_size = sizeof(str); // equivalent to offsetof(str, p)
    str *s = malloc(str_size + (sizeof(char) * n));
}
Related concepts
5.59 New language features of C99 on page 5-228.
5.61 // comments in C99 and C90 on page 5-231.
5.62 Compound literals in C99 on page 5-232.
5.63 Designated initializers in C99 on page 5-233.
5.64 Hexadecimal floating-point numbers in C99 on page 5-234.
5.66 __func__ predefined identifier in C99 on page 5-236.
5.67 inline functions in C99 on page 5-237.
5.68 long long data type in C99 and C90 on page 5-238.
5.69 Macros with a variable number of arguments in C99 on page 5-239.
5.70 Mixed declarations and statements in C99 on page 5-240.
5.71 New block scopes for selection and iteration statements in C99 on page 5-241.
5.72 _Pragma preprocessing operator in C99 on page 5-242.
5.73 Restricted pointers in C99 on page 5-243.
5.75 Complex numbers in C99 on page 5-245.
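A minimal sketch of the kind of function being described, assuming the function is named foo:
#include <stdio.h>
void foo(void)
{
    printf("This function is called '%s'.\n", __func__);
}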
prints:
This function is called 'foo'.
Related concepts
5.59 New language features of C99 on page 5-228.
5.61 // comments in C99 and C90 on page 5-231.
5.62 Compound literals in C99 on page 5-232.
5.63 Designated initializers in C99 on page 5-233.
5.64 Hexadecimal floating-point numbers in C99 on page 5-234.
5.65 Flexible array members in C99 on page 5-235.
5.67 inline functions in C99 on page 5-237.
5.68 long long data type in C99 and C90 on page 5-238.
5.69 Macros with a variable number of arguments in C99 on page 5-239.
5.70 Mixed declarations and statements in C99 on page 5-240.
5.71 New block scopes for selection and iteration statements in C99 on page 5-241.
5.72 _Pragma preprocessing operator in C99 on page 5-242.
5.73 Restricted pointers in C99 on page 5-243.
5.75 Complex numbers in C99 on page 5-245.
Related references
10.162 Built-in function name variables on page 10-792.
The compiler inlines a function qualified with inline only if it is reasonable to do so. It is free to ignore
the hint if inlining the function adversely affects performance.
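For example, a minimal sketch using static inline, which avoids the need for a separate external
definition:
static inline int max_int(int a, int b)
{
    return a > b ? a : b;   /* the compiler might inline calls to max_int, but is not required to */
}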
Note
The __inline keyword is available in C90.
Note
The semantics of inline in C99 are different to the semantics of inline in Standard C++.
Related concepts
5.59 New language features of C99 on page 5-228.
5.61 // comments in C99 and C90 on page 5-231.
5.62 Compound literals in C99 on page 5-232.
5.63 Designated initializers in C99 on page 5-233.
5.64 Hexadecimal floating-point numbers in C99 on page 5-234.
5.65 Flexible array members in C99 on page 5-235.
5.66 __func__ predefined identifier in C99 on page 5-236.
5.68 long long data type in C99 and C90 on page 5-238.
5.69 Macros with a variable number of arguments in C99 on page 5-239.
5.70 Mixed declarations and statements in C99 on page 5-240.
5.71 New block scopes for selection and iteration statements in C99 on page 5-241.
5.72 _Pragma preprocessing operator in C99 on page 5-242.
5.73 Restricted pointers in C99 on page 5-243.
5.75 Complex numbers in C99 on page 5-245.
5.20 Inline functions on page 5-177.
Related concepts
5.59 New language features of C99 on page 5-228.
5.61 // comments in C99 and C90 on page 5-231.
5.62 Compound literals in C99 on page 5-232.
5.63 Designated initializers in C99 on page 5-233.
5.64 Hexadecimal floating-point numbers in C99 on page 5-234.
5.65 Flexible array members in C99 on page 5-235.
5.66 __func__ predefined identifier in C99 on page 5-236.
5.67 inline functions in C99 on page 5-237.
5.69 Macros with a variable number of arguments in C99 on page 5-239.
5.70 Mixed declarations and statements in C99 on page 5-240.
5.71 New block scopes for selection and iteration statements in C99 on page 5-241.
5.72 _Pragma preprocessing operator in C99 on page 5-242.
5.73 Restricted pointers in C99 on page 5-243.
5.75 Complex numbers in C99 on page 5-245.
Related references
9.12 long long on page 9-562.
Related concepts
5.59 New language features of C99 on page 5-228.
5.61 // comments in C99 and C90 on page 5-231.
5.62 Compound literals in C99 on page 5-232.
5.63 Designated initializers in C99 on page 5-233.
5.64 Hexadecimal floating-point numbers in C99 on page 5-234.
5.65 Flexible array members in C99 on page 5-235.
5.66 __func__ predefined identifier in C99 on page 5-236.
5.67 inline functions in C99 on page 5-237.
5.68 long long data type in C99 and C90 on page 5-238.
5.70 Mixed declarations and statements in C99 on page 5-240.
5.71 New block scopes for selection and iteration statements in C99 on page 5-241.
5.72 _Pragma preprocessing operator in C99 on page 5-242.
5.73 Restricted pointers in C99 on page 5-243.
5.75 Complex numbers in C99 on page 5-245.
Related concepts
5.59 New language features of C99 on page 5-228.
5.61 // comments in C99 and C90 on page 5-231.
5.62 Compound literals in C99 on page 5-232.
5.63 Designated initializers in C99 on page 5-233.
5.64 Hexadecimal floating-point numbers in C99 on page 5-234.
5.65 Flexible array members in C99 on page 5-235.
5.66 __func__ predefined identifier in C99 on page 5-236.
5.67 inline functions in C99 on page 5-237.
5.68 long long data type in C99 and C90 on page 5-238.
5.69 Macros with a variable number of arguments in C99 on page 5-239.
5.71 New block scopes for selection and iteration statements in C99 on page 5-241.
5.72 _Pragma preprocessing operator in C99 on page 5-242.
5.73 Restricted pointers in C99 on page 5-243.
5.75 Complex numbers in C99 on page 5-245.
5.71 New block scopes for selection and iteration statements in C99
In a for loop, the first expression can be a declaration, like in C++. The scope of the declaration extends
to the body of the loop only.
For example:
extern int max;
for (int n = max - 1; n >= 0; n--)
{
// body of loop
}
is equivalent to:
extern int max;
{
int n = max - 1;
for (; n >= 0; n--)
{
// body of loop
}
}
Note
Unlike in C++, you cannot introduce new declarations in a for-test, if-test or switch-expression.
Related concepts
5.59 New language features of C99 on page 5-228.
5.61 // comments in C99 and C90 on page 5-231.
5.62 Compound literals in C99 on page 5-232.
5.63 Designated initializers in C99 on page 5-233.
5.64 Hexadecimal floating-point numbers in C99 on page 5-234.
5.65 Flexible array members in C99 on page 5-235.
5.66 __func__ predefined identifier in C99 on page 5-236.
5.67 inline functions in C99 on page 5-237.
5.68 long long data type in C99 and C90 on page 5-238.
5.69 Macros with a variable number of arguments in C99 on page 5-239.
5.70 Mixed declarations and statements in C99 on page 5-240.
5.72 _Pragma preprocessing operator in C99 on page 5-242.
5.73 Restricted pointers in C99 on page 5-243.
5.75 Complex numbers in C99 on page 5-245.
For example:
#define RWDATA(X) PRAGMA(arm section rwdata=#X)
#define PRAGMA(X) _Pragma(#X)
RWDATA(foo) // same as #pragma arm section rwdata="foo"
int y = 1; // y is placed in section "foo"
Related concepts
5.59 New language features of C99 on page 5-228.
5.61 // comments in C99 and C90 on page 5-231.
5.62 Compound literals in C99 on page 5-232.
5.63 Designated initializers in C99 on page 5-233.
5.64 Hexadecimal floating-point numbers in C99 on page 5-234.
5.65 Flexible array members in C99 on page 5-235.
5.66 __func__ predefined identifier in C99 on page 5-236.
5.67 inline functions in C99 on page 5-237.
5.68 long long data type in C99 and C90 on page 5-238.
5.69 Macros with a variable number of arguments in C99 on page 5-239.
5.70 Mixed declarations and statements in C99 on page 5-240.
5.71 New block scopes for selection and iteration statements in C99 on page 5-241.
5.73 Restricted pointers in C99 on page 5-243.
5.75 Complex numbers in C99 on page 5-245.
Pointers qualified with restrict can, however, point to different arrays, or to different regions within an
array.
It is your responsibility to ensure that restrict-qualified pointers do not point to overlapping regions of
memory.
__restrict, permitted in C90 and C++, is a synonym for restrict.
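For example, a minimal sketch (the function name add_arrays is illustrative):
void add_arrays(int n, float * restrict dst,
                const float * restrict a, const float * restrict b)
{
    for (int i = 0; i < n; i++)
    {
        dst[i] = a[i] + b[i];   /* dst, a, and b are asserted not to overlap */
    }
}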
Related concepts
5.59 New language features of C99 on page 5-228.
5.61 // comments in C99 and C90 on page 5-231.
5.62 Compound literals in C99 on page 5-232.
5.63 Designated initializers in C99 on page 5-233.
5.64 Hexadecimal floating-point numbers in C99 on page 5-234.
5.65 Flexible array members in C99 on page 5-235.
5.66 __func__ predefined identifier in C99 on page 5-236.
5.67 inline functions in C99 on page 5-237.
5.68 long long data type in C99 and C90 on page 5-238.
5.69 Macros with a variable number of arguments in C99 on page 5-239.
5.70 Mixed declarations and statements in C99 on page 5-240.
5.71 New block scopes for selection and iteration statements in C99 on page 5-241.
5.72 _Pragma preprocessing operator in C99 on page 5-242.
5.75 Complex numbers in C99 on page 5-245.
Related references
8.164 --restrict, --no_restrict on page 8-500.
New generic function macros found in C99 that are not found in C90 include:
#define isinf(x) // non-zero only if x is positive or negative infinity
#define isnan(x) // non-zero only if x is NaN
#define isless(x, y) // 1 only if x < y and x and y are not NaN, and 0 otherwise
#define isunordered(x, y) // 1 only if either x or y is NaN, and 0 otherwise
New mathematical functions found in C99 that are not found in C90 include:
double acosh(double x); // hyperbolic arccosine of x
double asinh(double x); // hyperbolic arcsine of x
double atanh(double x); // hyperbolic arctangent of x
double erf(double x); // returns the error function of x
double round(double x); // returns x rounded to the nearest integer
double tgamma(double x); // returns the gamma function of x
C99 supports the new mathematical functions for all real floating-point types.
Single-precision versions of all existing <math.h> functions are also supported.
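For example, a minimal sketch:
#include <math.h>
#include <stdio.h>
int main(void)
{
    double z = 0.0;
    printf("%f\n", round(2.5));      /* 3.000000 */
    printf("%f\n", asinh(1.0));      /* inverse hyperbolic sine of 1.0 */
    printf("%d\n", isnan(z / z));    /* non-zero, because 0.0/0.0 produces a NaN */
    return 0;
}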
Related concepts
5.60 New library features of C99 on page 5-230.
5.75 Complex numbers in C99 on page 5-245.
5.76 Boolean type and <stdbool.h> in C99 on page 5-246.
5.77 Extended integer types and functions in <inttypes.h> and <stdint.h> in C99 on page 5-247.
5.78 <fenv.h> floating-point environment access in C99 on page 5-248.
5.79 <stdio.h> snprintf family of functions in C99 on page 5-249.
5.80 <tgmath.h> type-generic math macros in C99 on page 5-250.
5.81 <wchar.h> wide character I/O functions in C99 on page 5-251.
Related information
Institute of Electrical and Electronics Engineers.
Related concepts
5.59 New language features of C99 on page 5-228.
5.61 // comments in C99 and C90 on page 5-231.
5.62 Compound literals in C99 on page 5-232.
5.63 Designated initializers in C99 on page 5-233.
5.64 Hexadecimal floating-point numbers in C99 on page 5-234.
5.65 Flexible array members in C99 on page 5-235.
5.66 __func__ predefined identifier in C99 on page 5-236.
5.67 inline functions in C99 on page 5-237.
5.68 long long data type in C99 and C90 on page 5-238.
5.69 Macros with a variable number of arguments in C99 on page 5-239.
5.70 Mixed declarations and statements in C99 on page 5-240.
5.71 New block scopes for selection and iteration statements in C99 on page 5-241.
5.72 _Pragma preprocessing operator in C99 on page 5-242.
5.73 Restricted pointers in C99 on page 5-243.
5.60 New library features of C99 on page 5-230.
5.74 Additional <math.h> library functions in C99 on page 5-244.
5.76 Boolean type and <stdbool.h> in C99 on page 5-246.
5.77 Extended integer types and functions in <inttypes.h> and <stdint.h> in C99 on page 5-247.
5.78 <fenv.h> floating-point environment access in C99 on page 5-248.
5.79 <stdio.h> snprintf family of functions in C99 on page 5-249.
5.80 <tgmath.h> type-generic math macros in C99 on page 5-250.
5.81 <wchar.h> wide character I/O functions in C99 on page 5-251.
Note
The C99 semantics for bool are intended to match those of C++.
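For example, a minimal sketch (the function name is_even is illustrative):
#include <stdbool.h>
bool is_even(int n)   /* bool, true, and false are provided by <stdbool.h> */
{
    return (n % 2) == 0;
}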
Related concepts
5.60 New library features of C99 on page 5-230.
5.74 Additional <math.h> library functions in C99 on page 5-244.
5.75 Complex numbers in C99 on page 5-245.
5.77 Extended integer types and functions in <inttypes.h> and <stdint.h> in C99 on page 5-247.
5.78 <fenv.h> floating-point environment access in C99 on page 5-248.
5.79 <stdio.h> snprintf family of functions in C99 on page 5-249.
5.80 <tgmath.h> type-generic math macros in C99 on page 5-250.
5.81 <wchar.h> wide character I/O functions in C99 on page 5-251.
5.77 Extended integer types and functions in <inttypes.h> and <stdint.h> in C99
In C90, the long data type can serve both as the largest integral type, and as a 32-bit container. C99
removes this ambiguity through the new standard library header files <inttypes.h> and <stdint.h>.
The header file <stdint.h> introduces the new types:
• intmax_t and uintmax_t, that are maximum width signed and unsigned integer types.
• intptr_t and uintptr_t, that are integer types capable of holding signed and unsigned object
pointers.
The header file <inttypes.h> provides library functions for manipulating values of type intmax_t,
including:
intmax_t imaxabs(intmax_t x); // absolute value of x
imaxdiv_t imaxdiv(intmax_t x, intmax_t y); // returns the quotient and remainder
// of x / y
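For example, a minimal sketch using the format macros from <inttypes.h>:
#include <inttypes.h>
#include <stdio.h>
int main(void)
{
    intmax_t big = INTMAX_C(1234567890123);
    printf("%" PRIdMAX "\n", imaxabs(-big));   /* PRIdMAX formats an intmax_t */
    return 0;
}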
Related concepts
5.60 New library features of C99 on page 5-230.
5.74 Additional <math.h> library functions in C99 on page 5-244.
5.75 Complex numbers in C99 on page 5-245.
5.76 Boolean type and <stdbool.h> in C99 on page 5-246.
5.78 <fenv.h> floating-point environment access in C99 on page 5-248.
5.79 <stdio.h> snprintf family of functions in C99 on page 5-249.
5.80 <tgmath.h> type-generic math macros in C99 on page 5-250.
5.81 <wchar.h> wide character I/O functions in C99 on page 5-251.
Related concepts
5.60 New library features of C99 on page 5-230.
5.74 Additional <math.h> library functions in C99 on page 5-244.
5.75 Complex numbers in C99 on page 5-245.
5.76 Boolean type and <stdbool.h> in C99 on page 5-246.
5.77 Extended integer types and functions in <inttypes.h> and <stdint.h> in C99 on page 5-247.
5.79 <stdio.h> snprintf family of functions in C99 on page 5-249.
5.80 <tgmath.h> type-generic math macros in C99 on page 5-250.
5.81 <wchar.h> wide character I/O functions in C99 on page 5-251.
Related information
Institute of Electrical and Electronics Engineers.
With the C90 sprintf() family of functions, the full output of the formatting operation is written into the
buffer regardless of whether there is enough space to hold it. Consequently, more characters can be output
than might fit in the memory allocated to the string.
The snprintf functions found in the C99 version of <stdio.h> are safe versions of the sprintf
functions that prevent buffer overrun. In the statement:
snprintf(buffer, size, "Error %d: Cannot open file '%s'", errno, filename);
the variable size specifies the maximum number of characters, including the terminating null, that can be
written to buffer. The buffer can never be overrun, provided that it is at least size characters long.
Note
The C standard does not define what should happen if buffer + size exceeds 4GB (the limit of the 32-
bit address space). In this scenario, the ARM implementation of snprintf does not write any data to the
buffer (to prevent wrapping the buffer around the address space) and returns the number of bytes that
would have been written.
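For example, a minimal sketch:
#include <stdio.h>
int main(void)
{
    char buffer[16];
    int needed = snprintf(buffer, sizeof(buffer), "Error %d: %s", 2, "Cannot open file");
    /* buffer holds at most 15 characters plus the terminating null;
       needed is the length that the complete string would have required */
    printf("%s (%d)\n", buffer, needed);
    return 0;
}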
Related concepts
5.60 New library features of C99 on page 5-230.
5.74 Additional <math.h> library functions in C99 on page 5-244.
5.75 Complex numbers in C99 on page 5-245.
5.76 Boolean type and <stdbool.h> in C99 on page 5-246.
5.77 Extended integer types and functions in <inttypes.h> and <stdint.h> in C99 on page 5-247.
5.78 <fenv.h> floating-point environment access in C99 on page 5-248.
5.80 <tgmath.h> type-generic math macros in C99 on page 5-250.
5.81 <wchar.h> wide character I/O functions in C99 on page 5-251.
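The following type-generic call, shown as a minimal sketch with an illustrative wrapper function:
#include <tgmath.h>
float f(void)
{
    return cos(0.78539f);
}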
calls the single-precision version of the cos function, as determined by the type of the literal 0.78539f.
Note
Type-generic families of mathematical functions can be defined in C++ using the operator overloading
mechanism. The semantics of type-generic families of functions defined using operator overloading in
C++ are different from the semantics of the corresponding families of type-generic functions defined in
<tgmath.h>.
Related concepts
5.60 New library features of C99 on page 5-230.
5.74 Additional <math.h> library functions in C99 on page 5-244.
5.75 Complex numbers in C99 on page 5-245.
5.76 Boolean type and <stdbool.h> in C99 on page 5-246.
5.77 Extended integer types and functions in <inttypes.h> and <stdint.h> in C99 on page 5-247.
5.78 <fenv.h> floating-point environment access in C99 on page 5-248.
5.79 <stdio.h> snprintf family of functions in C99 on page 5-249.
5.81 <wchar.h> wide character I/O functions in C99 on page 5-251.
Related concepts
5.60 New library features of C99 on page 5-230.
5.74 Additional <math.h> library functions in C99 on page 5-244.
5.75 Complex numbers in C99 on page 5-245.
5.76 Boolean type and <stdbool.h> in C99 on page 5-246.
5.77 Extended integer types and functions in <inttypes.h> and <stdint.h> in C99 on page 5-247.
5.78 <fenv.h> floating-point environment access in C99 on page 5-248.
5.79 <stdio.h> snprintf family of functions in C99 on page 5-249.
5.80 <tgmath.h> type-generic math macros in C99 on page 5-250.
The following example shows how to keep uninitialized data using #pragma arm section:
#pragma arm section zidata = "non_initialized"
int i, j; // uninitialized data in non_initialized section (without the pragma,
// would be in .bss section by default)
#pragma arm section zidata // back to default (.bss section)
int k = 0, l = 0; // zero-initialized data in .bss section
Specify --bss_threshold=0 when compiling this example code, to ensure that k and l are placed in a ZI
data section. If --bss_threshold=0 is not used, section name rwdata must be used instead of zidata.
The non_initialized section is placed into its own UNINIT execution region, as follows:
LOAD_1 0x0
{
EXEC_1 +0
{
* (+RO)
* (+RW)
* (+ZI) ; ZI data gets initialized to zero
}
EXEC_2 +0 UNINIT
{
* (non_initialized) ; ZI data does not get initialized to zero
}
}
Related references
8.93 --gnu on page 8-424.
10.79 #pragma arm section [section_type_list] on page 10-684.
10.69 __attribute__((section("name"))) variable attribute on page 10-674.
8.20 --bss_threshold=num on page 8-343.
Related information
Execution region attributes.
Describes the format of compiler diagnostic messages and how to control the output during compilation.
The compiler issues messages about potential portability problems and other hazards. It is possible to:
• Turn off specific messages. For example, warnings can be turned off if you are in the early stages of
porting a program written in old-style C. In general, however, it is better to check the code than to
turn off messages.
• Change the severity of specific messages.
It contains the following sections:
• 6.1 Severity of compiler diagnostic messages on page 6-254.
• 6.2 Options that change the severity of compiler diagnostic messages on page 6-255.
• 6.3 Controlling compiler diagnostic messages with pragmas on page 6-257.
• 6.4 Prefix letters in compiler diagnostic messages on page 6-259.
• 6.5 Compiler exit status codes and termination messages on page 6-260.
• 6.6 Compiler data flow warnings on page 6-261.
Severity        Description
Internal fault  Internal faults indicate an internal problem with the compiler. Contact your supplier with feedback.
Error           Errors indicate problems that cause the compilation to stop. These errors include command-line errors, internal errors, missing include files, and violations of the syntactic or semantic rules of the C or C++ language. If multiple source files are specified, no more source files are compiled.
Warning         Warnings indicate unusual conditions in your code that might indicate a problem. Compilation continues, and object code is generated unless any problems with Error severity are detected.
Remark          Remarks indicate common, but sometimes unconventional, use of C or C++. These diagnostics are not displayed by default. Compilation continues, and object code is generated unless any problems with Error severity are detected.
Related concepts
6.2 Options that change the severity of compiler diagnostic messages on page 6-255.
6.3 Controlling compiler diagnostic messages with pragmas on page 6-257.
6.4 Prefix letters in compiler diagnostic messages on page 6-259.
6.5 Compiler exit status codes and termination messages on page 6-260.
6.6 Compiler data flow warnings on page 6-261.
Note
tag is the four-digit number, nnnn, with the tool letter prefix, but without the letter suffix indicating the
severity.
Only errors with a suffix of -D following the error number can be downgraded by changing them into
warnings or remarks.
Note
These options also have pragma equivalents.
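For example, armcc --diag_warning=nnnn -c main.c, where nnnn is the number of a message that the
compiler reports with a -D suffix, downgrades that error to a warning. The file name main.c is illustrative.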
Related concepts
6.3 Controlling compiler diagnostic messages with pragmas on page 6-257.
6.4 Prefix letters in compiler diagnostic messages on page 6-259.
6.5 Compiler exit status codes and termination messages on page 6-260.
6.6 Compiler data flow warnings on page 6-261.
Related references
6.1 Severity of compiler diagnostic messages on page 6-254.
10.80 #pragma diag_default tag[,tag,...] on page 10-686.
10.81 #pragma diag_error tag[,tag,...] on page 10-687.
Examples
The following example shows three identical functions, foo1(), foo2(), and foo3(), all of which would
normally provoke diagnostic message #177-D: variable "x" was declared but never
referenced.
For foo1(), the current pragma state is pushed to the stack and #pragma diag_suppress suppresses the
message. The message is re-enabled by #pragma pop before compiling foo2(). In foo3(), the message
is not suppressed because the #pragma push and #pragma pop do not enclose the full scope responsible
for the generation of the message:
#pragma push
#pragma diag_suppress 177
void foo1( void )
{
/* Here we do not expect a diagnostic, because we suppressed it. */
int x;
}
#pragma pop
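A sketch of the foo2() and foo3() parts of this example, consistent with the description above (the
comments are illustrative):
void foo2( void )
{
    /* Here we expect a diagnostic, because #pragma pop re-enabled the message. */
    int x;
}
void foo3( void )
{
#pragma push
#pragma diag_suppress 177
    /* A diagnostic still appears for this x: the message is generated at the end
       of the function, after #pragma pop has already restored the pragma state. */
    int x;
#pragma pop
}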
Diagnostic messages use the pragma state in place at the time they are generated. If you use pragmas to
control a message in your code, you must be aware of when that message is generated. For example, the
following code is intended to suppress the diagnostic message #177-D: function "dummy" was
declared but never referenced:
#include <stdio.h>
#pragma push
#pragma diag_suppress 177
static int dummy(void)
{
printf("This function is never called.");
return 1;
}
#pragma pop
int main(void){
printf("Hello world!\n");
}
However, message 177 is only generated after all functions have been processed. Therefore, the message
is generated after pragma pop restores the pragma state, and message 177 is not suppressed.
Removing pragma push and pragma pop would correctly suppress message 177, but would suppress
messages for all unreferenced functions rather than for only the dummy() function.
Related concepts
6.2 Options that change the severity of compiler diagnostic messages on page 6-255.
6.4 Prefix letters in compiler diagnostic messages on page 6-259.
6.5 Compiler exit status codes and termination messages on page 6-260.
6.6 Compiler data flow warnings on page 6-261.
Related references
6.1 Severity of compiler diagnostic messages on page 6-254.
10.80 #pragma diag_default tag[,tag,...] on page 10-686.
10.81 #pragma diag_error tag[,tag,...] on page 10-687.
10.82 #pragma diag_remark tag[,tag,...] on page 10-688.
10.83 #pragma diag_suppress tag[,tag,...] on page 10-689.
10.84 #pragma diag_warning tag[, tag, ...] on page 10-690.
10.98 #pragma pop on page 10-705.
10.99 #pragma push on page 10-706.
8.58 --diag_error=tag[,tag,...] on page 8-386.
8.59 --diag_remark=tag[,tag,...] on page 8-387.
8.60 --diag_style=arm|ide|gnu compiler option on page 8-388.
8.61 --diag_suppress=tag[,tag,...] on page 8-389.
8.62 --diag_suppress=optimizations on page 8-390.
8.63 --diag_warning=tag[,tag,...] on page 8-391.
8.64 --diag_warning=optimizations on page 8-392.
Prefix letter   Tool
C               armcc
A               armasm
L               armlink or armar
Q               fromelf
Use the prefix letters to control options that are passed from the compiler to other tools, for example,
include the prefix letter L to specify linker message numbers.
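For example (the message number nnnn and the file names are illustrative):
armcc --diag_suppress=Lnnnn main.c -o image.axf
passes the suppression of message number nnnn to armlink when the compiler invokes the linker.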
Related concepts
6.2 Options that change the severity of compiler diagnostic messages on page 6-255.
6.3 Controlling compiler diagnostic messages with pragmas on page 6-257.
6.5 Compiler exit status codes and termination messages on page 6-260.
6.6 Compiler data flow warnings on page 6-261.
Related references
6.1 Severity of compiler diagnostic messages on page 6-254.
The signals SIGINT (caused by a user interrupt, like ^C) and SIGTERM (caused by a UNIX kill
command) are trapped by the compiler and cause abnormal termination.
On completion, the compiler returns a value greater than zero if an error is detected. If no error is
detected, a value of zero is returned.
Related concepts
6.2 Options that change the severity of compiler diagnostic messages on page 6-255.
6.3 Controlling compiler diagnostic messages with pragmas on page 6-257.
6.4 Prefix letters in compiler diagnostic messages on page 6-259.
6.6 Compiler data flow warnings on page 6-261.
Related references
6.1 Severity of compiler diagnostic messages on page 6-254.
The results of the analysis vary with the level of optimization used. This means that higher optimization
levels might produce a number of warnings that do not appear at lower levels.
The data flow analysis cannot reliably identify faulty code and any C4017W warnings issued by the
compiler are intended only as an indication of possible problems. For a full analysis of your code, use an
appropriate third-party analysis tool, for example Lint.
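For example, a minimal sketch of the kind of code that data flow analysis typically flags (whether a
C4017W warning appears depends on the optimization level):
int f(void)
{
    int x;          /* x is never assigned a value */
    return x + 1;   /* x may be used before being set */
}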
Related concepts
6.2 Options that change the severity of compiler diagnostic messages on page 6-255.
6.3 Controlling compiler diagnostic messages with pragmas on page 6-257.
6.4 Prefix letters in compiler diagnostic messages on page 6-259.
6.5 Compiler exit status codes and termination messages on page 6-260.
Related references
6.1 Severity of compiler diagnostic messages on page 6-254.
Describes the optimizing inline assembler and non-optimizing embedded assembler of the ARM
compiler, armcc.
Note
Using intrinsics is generally preferable to using inline or embedded assembly language.
• 7.17 Expansion of inline assembler instructions that use constants on page 7-281.
• 7.18 Expansion of inline assembler load and store instructions on page 7-282.
• 7.19 Inline assembler effect on processor condition flags in C and C++ code on page 7-283.
• 7.20 Inline assembler expression operands in C and C++ code on page 7-284.
• 7.21 Inline assembler register list operands in C and C++ code on page 7-285.
• 7.22 Inline assembler intermediate operands in C and C++ code on page 7-286.
• 7.23 Inline assembler function calls and branches in C and C++ code on page 7-287.
• 7.24 Inline assembler branches and labels in C and C++ code on page 7-289.
• 7.25 Inline assembler and virtual registers on page 7-290.
• 7.26 Embedded assembler support in the compiler on page 7-291.
• 7.27 Embedded assembler syntax in C and C++ on page 7-292.
• 7.28 Effect of compiler ARM and Thumb states on embedded assembler on page 7-293.
• 7.29 Restrictions on embedded assembly language functions in C and C++ code on page 7-294.
• 7.30 Compiler generation of embedded assembly language functions on page 7-295.
• 7.31 Access to C and C++ compile-time constant expressions from embedded assembler
on page 7-297.
• 7.32 Differences between expressions in embedded assembler and C or C++ on page 7-298.
• 7.33 Manual overload resolution in embedded assembler on page 7-299.
• 7.34 __offsetof_base keyword for related base classes in embedded assembler on page 7-300.
• 7.35 Compiler-supported keywords for calling class member functions in embedded assembler
on page 7-301.
• 7.36 __mcall_is_virtual(D, f) on page 7-302.
• 7.37 __mcall_is_in_vbase(D, f) on page 7-303.
• 7.38 __mcall_offsetof_vbase(D, f) on page 7-304.
• 7.39 __mcall_this_offset(D, f) on page 7-305.
• 7.40 __vcall_offsetof_vfunc(D, f) on page 7-306.
• 7.41 Calling nonstatic member functions in embedded assembler on page 7-307.
• 7.42 Calling a nonvirtual member function on page 7-308.
• 7.43 Calling a virtual member function on page 7-309.
• 7.44 Accessing sp (r13), lr (r14), and pc (r15) on page 7-310.
• 7.45 Differences in compiler support for inline and embedded assembly code on page 7-311.
Related concepts
7.2 Inline assembler support in the compiler on page 7-265.
7.3 Restrictions on inline assembler support in the compiler on page 7-266.
7.4 Inline assembly language syntax with the __asm keyword in C and C++ on page 7-267.
7.5 Inline assembly language syntax with the asm keyword in C++ on page 7-268.
7.6 Inline assembler rules for compiler keywords __asm and asm on page 7-269.
7.7 Restrictions on inline assembly operations in C and C++ code on page 7-270.
7.14 Inline assembler and register access in C and C++ code on page 7-277.
7.15 Inline assembler and the # constant expression specifier in C and C++ code on page 7-279.
7.19 Inline assembler effect on processor condition flags in C and C++ code on page 7-283.
7.20 Inline assembler expression operands in C and C++ code on page 7-284.
7.21 Inline assembler register list operands in C and C++ code on page 7-285.
7.22 Inline assembler intermediate operands in C and C++ code on page 7-286.
7.45 Differences in compiler support for inline and embedded assembly code on page 7-311.
7.23 Inline assembler function calls and branches in C and C++ code on page 7-287.
7.24 Inline assembler branches and labels in C and C++ code on page 7-289.
7.16 Inline assembler and instruction expansion in C and C++ code on page 7-280.
Related references
10.159 Named register variables on page 10-774.
Related information
armasm User Guide.
Mixing C, C++, and Assembly Language.
7.4 Inline assembly language syntax with the __asm keyword in C and C++
The inline assembler is invoked with the assembler specifier, __asm, and is followed by a list of
assembler instructions inside braces or parentheses.
You can specify inline assembly code using the following formats:
• On a single line, for example:
__asm("instruction[;instruction]");
__asm{instruction[;instruction]}
This enables you to use macros to generate inline assembly, for example:
#define ADDLSL(x, y, shift) __asm ("ADD " #x ", " #y ", LSL " #shift)
• On multiple lines, for example:
__asm
{
...
instruction
...
}
You can use C or C++ comments anywhere in an inline assembly language block.
You can use an __asm statement wherever a statement is expected.
7.5 Inline assembly language syntax with the asm keyword in C++
When compiling C++, the compiler supports the asm syntax proposed in the ISO C++ Standard.
You can specify inline assembly code using the following formats:
• On a single line, for example:
asm("instruction[;instruction]");
asm{instruction[;instruction]}
This enables you to use macros to generate inline assembly, for example:
#define ADDLSL(x, y, shift) asm ("ADD " #x ", " #y ", LSL " #shift)
• On multiple lines, for example:
asm
{
...
instruction
...
}
You can use C or C++ comments anywhere in an inline assembly language block.
You can use an asm statement wherever a statement is expected.
7.6 Inline assembler rules for compiler keywords __asm and asm
There are a number of rules that apply to the __asm and asm keywords.
These rules are as follows:
• Multiple instructions on the same line must be separated with a semicolon (;).
• If an instruction requires more than one line, line continuation must be specified with the backslash
character (\).
• For the multiple line format, C and C++ comments are permitted anywhere in the inline assembly
language block. However, comments cannot be embedded in a line that contains multiple
instructions.
• The comma (,) is used as a separator in assembly language, so C expressions with the comma
operator must be enclosed in parentheses to distinguish them:
__asm
{
ADD x, y, (f(), z)
}
• Labels must be followed by a colon, :, like C and C++ labels.
• An asm statement must be inside a C++ function. An asm statement can be used anywhere a C++
statement is expected.
• Register names in inline assembly code are treated as C or C++ variables. They do not necessarily
relate to the physical register of the same name. If the register is not declared as a C or C++ variable,
the compiler generates a warning.
• Registers must not be saved and restored in inline assembly code. The compiler does this for you.
Also, the inline assembler does not provide direct access to the physical registers. However, indirect
access is provided through variables that act as virtual registers.
If registers other than APSR, CPSR, and SPSR are read without being written to, an error message is
issued. For example:
int f(int x)
{
__asm
{
STMFD sp!, {r0} // save r0 - illegal: read before write
ADD r0, x, 1
EOR x, r0, x
LDMFD sp!, {r0} // restore r0 - not needed.
}
return x;
}
Related concepts
7.8 Inline assembler register restrictions in C and C++ code on page 7-271.
7.9 Inline assembler processor mode restrictions in C and C++ code on page 7-272.
7.10 Inline assembler Thumb instruction set restrictions in C and C++ code on page 7-273.
7.11 Inline assembler Vector Floating-Point (VFP) restrictions in C and C++ code on page 7-274.
7.12 Inline assembler instruction restrictions in C and C++ code on page 7-275.
7.13 Miscellaneous inline assembler restrictions in C and C++ code on page 7-276.
Related concepts
7.7 Restrictions on inline assembly operations in C and C++ code on page 7-270.
7.9 Inline assembler processor mode restrictions in C and C++ code on page 7-272.
7.10 Inline assembler Thumb instruction set restrictions in C and C++ code on page 7-273.
7.11 Inline assembler Vector Floating-Point (VFP) restrictions in C and C++ code on page 7-274.
7.12 Inline assembler instruction restrictions in C and C++ code on page 7-275.
7.13 Miscellaneous inline assembler restrictions in C and C++ code on page 7-276.
7.14 Inline assembler and register access in C and C++ code on page 7-277.
Related references
10.110 __current_pc intrinsic on page 10-718.
10.111 __current_sp intrinsic on page 10-719.
10.137 __return_address intrinsic on page 10-748.
Caution
The compiler does not recognize such changes.
Instead of attempting to change processor modes or coprocessor states from within inline assembly code,
see if there are any intrinsics available that provide what you require. If no such intrinsics are available,
use embedded assembly code if absolutely necessary.
Related concepts
7.7 Restrictions on inline assembly operations in C and C++ code on page 7-270.
7.8 Inline assembler register restrictions in C and C++ code on page 7-271.
7.10 Inline assembler Thumb instruction set restrictions in C and C++ code on page 7-273.
7.11 Inline assembler Vector Floating-Point (VFP) restrictions in C and C++ code on page 7-274.
7.12 Inline assembler instruction restrictions in C and C++ code on page 7-275.
7.13 Miscellaneous inline assembler restrictions in C and C++ code on page 7-276.
4.1 Compiler intrinsics on page 4-105.
7.26 Embedded assembler support in the compiler on page 7-291.
Related information
Processor modes, and privileged and unprivileged software execution.
7.10 Inline assembler Thumb instruction set restrictions in C and C++ code
The inline assembler supports Thumb state in ARM architectures v6T2, v6M, and v7. There are a
number of Thumb-specific restrictions.
These restrictions are as follows:
1. TBB, TBH, CBZ, and CBNZ instructions are not supported.
2. In some cases, the compiler can replace IT blocks with branched code.
3. The instruction width specifier .N denotes a preference, but not a requirement, to the compiler. This is
because, in rare cases, optimizations and register allocation can make it inefficient to generate a 16-
bit encoding.
For ARMv6 and lower architectures, the inline assembler does not assemble any Thumb instructions.
Instead, on finding inline assembly while in Thumb state, the compiler switches to ARM state
automatically. Code that relies on this switch is currently supported, but this practice is deprecated. For
ARMv6T2 and higher, the automatic switch from Thumb to ARM state is made if the code is valid ARM
assembly but not Thumb.
ARM state can be set deliberately. Inline assembly language can be included in a source file that contains
code to be compiled for Thumb in ARMv6 and lower, by enclosing the functions containing inline
assembly code between #pragma arm and #pragma thumb statements. For example:
... // Thumb code
#pragma arm // ARM code. Switch code generation to the ARM instruction set so
// that the inline assembler is available for Thumb in ARMv6 and lower.
int add(int i, int j)
{
int res;
__asm
{
ADD res, i, j // add here
}
return res;
}
#pragma thumb // Thumb code. Switch back to the Thumb instruction set.
// The inline assembler is no longer available for Thumb in ARMv6 and
// lower.
The code must also be compiled using the --apcs /interwork compiler command-line option.
Related concepts
7.7 Restrictions on inline assembly operations in C and C++ code on page 7-270.
7.8 Inline assembler register restrictions in C and C++ code on page 7-271.
7.9 Inline assembler processor mode restrictions in C and C++ code on page 7-272.
7.11 Inline assembler Vector Floating-Point (VFP) restrictions in C and C++ code on page 7-274.
7.12 Inline assembler instruction restrictions in C and C++ code on page 7-275.
7.13 Miscellaneous inline assembler restrictions in C and C++ code on page 7-276.
Related references
8.6 --apcs=qualifier...qualifier on page 8-322.
10.76 Pragmas on page 10-681.
Related information
Instruction width specifiers.
IT.
TBB and TBH.
CBZ and CBNZ.
7.11 Inline assembler Vector Floating-Point (VFP) restrictions in C and C++ code
The inline assembler provides direct support for VFPv2 instructions.
For example:
float foo(float f, float g)
{
float h;
__asm
{
VADD h, f, 0.5*g; // h = f + 0.5*g
}
return h;
}
In inline assembly code you cannot use the VFP instruction VMOV to transfer between an ARM register
and half of a doubleword extension register (NEON scalar). Instead, you can use the instruction VMOV to
transfer between an ARM register and a single-precision VFP register.
If you change the FPSCR register using inline assembly code, it produces runtime effects on the inline
VFP code and on subsequent compiler-generated VFP code.
Note
• Do not use inline assembly code to change VFP vector mode. VFP vector mode is deprecated.
• ARM strongly discourages the use of inline assembly coprocessor instructions to interact with VFP in
any way.
Related concepts
7.7 Restrictions on inline assembly operations in C and C++ code on page 7-270.
7.8 Inline assembler register restrictions in C and C++ code on page 7-271.
7.9 Inline assembler processor mode restrictions in C and C++ code on page 7-272.
7.10 Inline assembler Thumb instruction set restrictions in C and C++ code on page 7-273.
7.12 Inline assembler instruction restrictions in C and C++ code on page 7-275.
7.13 Miscellaneous inline assembler restrictions in C and C++ code on page 7-276.
5.41 Compiler support for floating-point arithmetic on page 5-200.
Related information
VMOV (between an ARM register and a NEON scalar).
VMOV (between one ARM register and single precision VFP).
7.12 Inline assembler instruction restrictions in C and C++ code
There are restrictions on the instructions that you can use in inline assembly code:
• LDR Rn, =expression pseudo-instruction. Use MOV Rn, expression instead. (This can generate a
load from a literal pool.)
• LDRT, LDRBT, STRT, and STRBT instructions.
• MUL, MLA, UMULL, UMLAL, SMULL, and SMLAL flag setting instructions.
• MOV or MVN flag-setting instructions where the second operand is a constant.
• The special LDM instructions used in system or supervisor mode to load the user-mode banked
registers, written with a ^ after the register list, such as:
LDMIA sp!, {r0-r12, lr, pc}^
• ADR and ADRL pseudo-instructions.
Note
You can use MOV Rn, expression instead of the ADR and ADRL pseudo-instructions.
• ARM recommends not using the LDREX and STREX instructions. This is because the compiler might
generate loads and stores between LDREX and STREX, potentially clearing the exclusive monitor set by
LDREX. This recommendation also applies to the byte, halfword, and doubleword variants LDREXB,
STREXB, LDREXH, STREXH, LDREXD, and STREXD.
Related concepts
7.7 Restrictions on inline assembly operations in C and C++ code on page 7-270.
7.8 Inline assembler register restrictions in C and C++ code on page 7-271.
7.9 Inline assembler processor mode restrictions in C and C++ code on page 7-272.
7.10 Inline assembler Thumb instruction set restrictions in C and C++ code on page 7-273.
7.11 Inline assembler Vector Floating-Point (VFP) restrictions in C and C++ code on page 7-274.
7.13 Miscellaneous inline assembler restrictions in C and C++ code on page 7-276.
Related references
10.106 __breakpoint intrinsic on page 10-714.
Related concepts
7.7 Restrictions on inline assembly operations in C and C++ code on page 7-270.
7.8 Inline assembler register restrictions in C and C++ code on page 7-271.
7.9 Inline assembler processor mode restrictions in C and C++ code on page 7-272.
7.10 Inline assembler Thumb instruction set restrictions in C and C++ code on page 7-273.
7.11 Inline assembler Vector Floating-Point (VFP) restrictions in C and C++ code on page 7-274.
7.12 Inline assembler instruction restrictions in C and C++ code on page 7-275.
The compiler does not implicitly declare variables for any other registers, so you must explicitly declare
variables for registers other than R0 to R12 and r0 to r12 in your C or C++ code. No variables are
declared for the sp (r13), lr (r14), and pc (r15) registers, and they cannot be read or directly modified
in inline assembly code.
There is no virtual Processor Status Register (PSR). Any references to the PSR are always to the physical
PSR.
Any variables that you use in inline assembly code to refer to registers must be declared in your C or
C++ code, unless the compiler implicitly declares them. Even when a register variable is implicitly
declared, it is better to declare it explicitly in your C or C++ code. You do not have to declare it with the
same data type as the implicit declaration. For example, although the compiler implicitly declares
register R0 to be of type signed int, you can explicitly declare R0 as an unsigned integer variable if
required.
It is also better to use C or C++ variables, rather than register names, as instruction operands. The
compiler generates a warning the first time a variable or physical register name is used as an operand,
regardless of whether it is implicitly or explicitly declared. The warning is issued only once for each
translation unit. For example, if you use register r3 without declaring it, a warning is displayed. You can
suppress the warning with --diag_suppress.
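For example, the following sketch (the function and variable names are illustrative) uses C variables,
rather than register names, as inline assembler operands, so no register variables need to be declared and
no warning is generated:
int add_one(int value)
{
    int result;             // an ordinary C variable used as an instruction operand
    __asm
    {
        ADD result, value, #1
    }
    return result;
}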
Related concepts
7.18 Expansion of inline assembler load and store instructions on page 7-282.
7.8 Inline assembler register restrictions in C and C++ code on page 7-271.
Related references
8.61 --diag_suppress=tag[,tag,...] on page 8-389.
7.15 Inline assembler and the # constant expression specifier in C and C++ code
The constant expression specifier # is optional. If it is used, the expression following it must be a
constant.
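For example, in the following sketch (illustrative only) the two ORR instructions are equivalent because
the # specifier is optional; in both cases the operand must be a constant:
int set_low_bit(int x)
{
    int res;
    __asm
    {
        ORR res, x, #1      // with the optional # specifier
        ORR res, res, 1     // without the # specifier
    }
    return res;
}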
Related concepts
7.17 Expansion of inline assembler instructions that use constants on page 7-281.
7.18 Expansion of inline assembler load and store instructions on page 7-282.
7.1 Compiler support for inline assembly language on page 7-264.
With the exception of coprocessor instructions, all ARM instructions with a constant operand support
instruction expansion. In addition, the MUL instruction can be expanded into a sequence of adds and shifts
when the third operand is a constant.
The effect of updating the CPSR by an expanded instruction is:
• Arithmetic instructions set the NZCV flags correctly.
• Logical instructions:
— Set the NZ flags correctly.
— Do not change the V flag.
— Corrupt the C flag.
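For example, in the following sketch (illustrative only; the constant is chosen so that it cannot be
encoded in a single instruction) the flag-setting logical instruction is expanded. The N and Z flags are set
correctly, the V flag is unchanged, and the C flag might be corrupted:
int mask_value(int x)
{
    int res;
    __asm
    {
        ANDS res, x, #0x12345678    // constant operand forces instruction expansion
    }
    return res;
}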
Related concepts
7.16 Inline assembler and instruction expansion in C and C++ code on page 7-280.
Related concepts
7.14 Inline assembler and register access in C and C++ code on page 7-277.
7.16 Inline assembler and instruction expansion in C and C++ code on page 7-280.
Related references
8.61 --diag_suppress=tag[,tag,...] on page 8-389.
7.19 Inline assembler effect on processor condition flags in C and C++ code
An inline assembly language instruction might explicitly or implicitly attempt to update the processor
condition flags.
Inline assembly language instructions that involve only virtual register operands or simple expression
operands have predictable behavior. The condition flags are set by the instruction if either an implicit or
an explicit update is specified. The condition flags are unchanged if no update is specified.
If any of the instruction operands are not simple operands, then the condition flags might be corrupted
unless the instruction updates them.
In general, the compiler cannot easily diagnose potential corruption of the condition flags. However, for
operands that require the construction and subsequent destruction of C++ temporaries the compiler gives
a warning if the instruction attempts to update the condition flags. This is because the destruction might
corrupt the condition flags.
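For example, in the following sketch (illustrative only) all the operands are simple, so the flags set by
SUBS are reliably available to the conditional MOVMI instruction that follows:
int clamped_diff(int x, int y)
{
    int res;
    __asm
    {
        SUBS res, x, y      // explicit flag update with simple operands
        MOVMI res, #0       // uses the N flag set by SUBS; result is 0 if x < y
    }
    return res;
}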
Related concepts
7.20 Inline assembler expression operands in C and C++ code on page 7-284.
7.21 Inline assembler register list operands in C and C++ code on page 7-285.
Related concepts
7.21 Inline assembler register list operands in C and C++ code on page 7-285.
7.22 Inline assembler intermediate operands in C and C++ code on page 7-286.
Related information
ARM Architecture Reference Manual.
Related concepts
7.20 Inline assembler expression operands in C and C++ code on page 7-284.
7.22 Inline assembler intermediate operands in C and C++ code on page 7-286.
Related concepts
7.20 Inline assembler expression operands in C and C++ code on page 7-284.
7.21 Inline assembler register list operands in C and C++ code on page 7-285.
7.23 Inline assembler function calls and branches in C and C++ code
The BL and SVC instructions of the inline assembler enable you to specify three optional lists following
the normal instruction fields.
These instructions have the following format:
SVC{cond} svc_num, {input_param_list}, {output_value_list}, {corrupt_reg_list}
BL{cond} function, {input_param_list}, {output_value_list}, {corrupt_reg_list}
Note
RVCT v3.0 renamed the SWI instruction to SVC. The inline assembler still accepts SWI in place of SVC.
If you are compiling for architecture 5TE or later, the linker converts BL function instructions to BLX
function instructions if appropriate. However, you cannot use BLX function instructions directly
within inline assembly code.
• input_param_list specifies the expressions or variables that are the input parameters to the function
call or SVC instruction, and the physical registers that contain the expressions or variables. They are
specified as assignments to physical registers or as physical register names. A single list can contain
both types of input register specification.
The inline assembler ensures that the correct values are present in the specified physical registers
before the BL or SVC instruction is entered. A physical register name that is specified without
assignment ensures that the value in the virtual register of the same name is present in the physical
register. This ensures backwards compatibility with existing inline assembly language code.
For example, the instruction:
BL foo, { r0=expression1, r1=expression2, r2 }
places the value of expression1 in r0 and the value of expression2 in r1, and ensures that the value
of the virtual register r2 is present in physical register r2 before the call.
By default, if you do not specify any input_param_list input parameters, registers r0 to r3 are used
as input parameters.
Note
It is not possible to specify the lr, sp, or pc registers in the input parameter list.
• output_value_list specifies the physical registers that contain the output values from the BL or SVC
instruction and where they must be stored. The output values are specified as assignments from
physical registers to modifiable lvalue expressions or as single physical register names.
The inline assembler takes the values from the specified physical registers and assigns them into the
specified expressions. A physical register name specified without assignment causes the virtual
register of the same name to be updated with the value from the physical register.
For example, the instruction:
BL foo, { }, { result1=r0, r1 }
assigns the value returned in physical register r0 to result1, and updates the virtual register r1 with the
value returned in physical register r1.
By default, if you do not specify any output_value_list output values, register r0 is used for the
output value.
Note
It is not possible to specify the lr, sp, or pc registers in the output value list.
• corrupt_reg_list specifies the physical registers that are corrupted by the called function. If the
condition flags are modified by the called function, you must specify the PSR in the corrupted register
list.
The BL and SVC instructions always corrupt lr.
By default, if you do not specify a corrupt_reg_list, then for BL and SVC the registers r0 to r3, lr
(r14), and the PSR are assumed to be corrupted.
Note
It is not possible to specify the lr, sp, or pc registers in the corrupt register list.
Note
• The BX, BLX, and BXJ instructions are not supported in the inline assembler.
• It is not possible to specify the lr, sp, or pc registers in any of the input, output, or corrupted register
lists.
• The sp register must not be changed by any SVC instruction or function call.
Only the branch instruction, B, can jump to labels, and only to labels within the same C or C++ function.
For example:
int foo(int x, int y)
{
__asm
{
SUBS x,x,y
BEQ end
}
return 1;
end:
return 0;
}
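The following sketch (the function add_fn and the variable names are illustrative, not taken from this
manual) shows the extended BL syntax with input, output, and corrupted register lists:
extern int add_fn(int a, int b);    // an ordinary C function defined elsewhere
int call_add(int x, int y)
{
    int sum;
    __asm
    {
        // pass x and y in r0 and r1, take the result from r0, and list the
        // registers that the AAPCS allows the callee to corrupt
        BL add_fn, { r0=x, r1=y }, { sum=r0 }, { r1, r2, r3, r12, PSR }
    }
    return sum;
}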
Related concepts
7.14 Inline assembler and register access in C and C++ code on page 7-277.
Related concepts
7.27 Embedded assembler syntax in C and C++ on page 7-292.
7.28 Effect of compiler ARM and Thumb states on embedded assembler on page 7-293.
7.29 Restrictions on embedded assembly language functions in C and C++ code on page 7-294.
7.30 Compiler generation of embedded assembly language functions on page 7-295.
7.31 Access to C and C++ compile-time constant expressions from embedded assembler on page 7-297.
7.32 Differences between expressions in embedded assembler and C or C++ on page 7-298.
7.33 Manual overload resolution in embedded assembler on page 7-299.
7.34 __offsetof_base keyword for related base classes in embedded assembler on page 7-300.
7.35 Compiler-supported keywords for calling class member functions in embedded assembler
on page 7-301.
7.41 Calling nonstatic member functions in embedded assembler on page 7-307.
7.42 Calling a nonvirtual member function on page 7-308.
7.43 Calling a virtual member function on page 7-309.
Related information
armasm User Guide.
Mixing C, C++, and Assembly Language.
Note
Argument names are permitted in the parameter list, but they cannot be used in the body of the embedded
assembly function. For example, the following function uses integer i in the body of the function, but
this is not valid in assembly:
__asm int f(int i)
{
ADD i, i, #1 // error
}
The following example shows a simple string copy routine written as an embedded assembler function. It
is not an optimal implementation.
#include <stdio.h>
__asm void my_strcpy(const char *src, char *dst)
{
loop
LDRB r2, [r0], #1
STRB r2, [r1], #1
CMP r2, #0
BNE loop
BX lr
}
int main(void)
{
const char *a = "Hello world!";
char b[20];
my_strcpy (a, b);
printf("Original string: '%s'\n", a);
printf("Copied string: '%s'\n", b);
return 0;
}
Related concepts
7.26 Embedded assembler support in the compiler on page 7-291.
Related concepts
7.26 Embedded assembler support in the compiler on page 7-291.
• __asm functions do not change the ARM Architecture Procedure Call Standard (AAPCS) rules that
apply. This means that all calls between an __asm function and a normal C or C++ function must
adhere to the AAPCS, even though there are no restrictions on the assembly code that an __asm
function can use (for example, it can change state).
Related concepts
7.26 Embedded assembler support in the compiler on page 7-291.
7.34 __offsetof_base keyword for related base classes in embedded assembler on page 7-300.
7.35 Compiler-supported keywords for calling class member functions in embedded assembler
on page 7-301.
7.30 Compiler generation of embedded assembly language functions on page 7-295.
7.31 Access to C and C++ compile-time constant expressions from embedded assembler on page 7-297.
7.35 Compiler-supported keywords for calling class member functions in embedded assembler
on page 7-301.
Note
If the return instruction is omitted, it is possible for control to pass from one __asm function into the
next __asm function in the file by falling off the end of the first function.
When you invoke armcc, the object file produced by the assembler is combined with the object file of the
compiler by a partial link that produces a single object file.
The compiler generates an AREA directive for each __asm function. The following example uses the
__cpp keyword to access a C++ compile-time constant expression, the offset of a structure member,
from embedded assembly code:
#include <cstddef>
struct X
{
int x,y;
void addto_y(int);
};
__asm void X::addto_y(int)
{
LDR r2, [r0, #__cpp(offsetof(X, y))]
ADD r1, r2, r1
STR r1, [r0, #__cpp(offsetof(X, y))]
BX lr
}
The use of offsetof must be inside __cpp() because it is the normal offsetof macro from the cstddef
header file.
Ordinary __asm functions are put in an ELF section with the name .emb_text. That is, embedded
assembly functions are never inlined. However, implicitly instantiated template functions and out-of-line
copies of inline functions are placed in an area with a name that is derived from the name of the function,
and an extra attribute that marks them as common. This ensures that the special semantics of these kinds
of functions are maintained.
Note
Because of the special naming of the area for out-of-line copies of inline functions and template
functions, these functions are not in the order of definition, but in an arbitrary order. Therefore, do not
assume that code execution falls out of an inline or template function and into another __asm function.
Related concepts
7.26 Embedded assembler support in the compiler on page 7-291.
7.29 Restrictions on embedded assembly language functions in C and C++ code on page 7-294.
Related information
ELF for the ARM Architecture.
Names in the __cpp expression are looked up in the C++ context of the __asm function. Any names in
the result of a __cpp expression are mangled as required and automatically have IMPORT statements
generated for them.
Related concepts
7.26 Embedded assembler support in the compiler on page 7-291.
7.29 Restrictions on embedded assembly language functions in C and C++ code on page 7-294.
7.33 Manual overload resolution in embedded assembler on page 7-299.
7.32 Differences between expressions in embedded assembler and C or C++ on page 7-298.
Note
The embedded assembly rules apply outside __cpp, and the C or C++ rules apply inside __cpp.
Related concepts
7.26 Embedded assembler support in the compiler on page 7-291.
7.31 Access to C and C++ compile-time constant expressions from embedded assembler on page 7-297.
Related concepts
7.26 Embedded assembler support in the compiler on page 7-291.
7.31 Access to C and C++ compile-time constant expressions from embedded assembler on page 7-297.
7.34 __offsetof_base keyword for related base classes in embedded assembler
__offsetof_base(D, B) returns the offset from the beginning of a D object to the start of the B base
subobject within it. The result might be zero. The following example shows the offset (in bytes) that
must be added to a D* p to implement the equivalent of static_cast<B*>(p).
__asm B* my_static_base_cast(D* /*p*/) // equivalent to:
// return static_cast<B*>(p)
{
IF __offsetof_base(D, B) <> 0 // optimize zero offset case
ADD r0, r0, #__offsetof_base(D, B)
ENDIF
BX lr
}
The __offsetof_base, __mcall_*, and __vcall_offsetof_vfunc keywords are converted into integer
or logical constants in the assembly source code. You can only use them in __asm functions, not in
__cpp expressions.
Related concepts
7.26 Embedded assembler support in the compiler on page 7-291.
7.29 Restrictions on embedded assembly language functions in C and C++ code on page 7-294.
Related concepts
7.26 Embedded assembler support in the compiler on page 7-291.
7.36 __mcall_is_virtual(D, f) on page 7-302.
7.37 __mcall_is_in_vbase(D, f) on page 7-303.
7.38 __mcall_offsetof_vbase(D, f) on page 7-304.
7.39 __mcall_this_offset(D, f) on page 7-305.
7.40 __vcall_offsetof_vfunc(D, f) on page 7-306.
7.41 Calling nonstatic member functions in embedded assembler on page 7-307.
7.29 Restrictions on embedded assembly language functions in C and C++ code on page 7-294.
7.36 __mcall_is_virtual(D, f)
Results in {TRUE} if f is a virtual member function found in D, or a base class of D, otherwise {FALSE}.
If it returns {TRUE} the call can be done using virtual dispatch, otherwise the call must be done directly.
Related concepts
7.35 Compiler-supported keywords for calling class member functions in embedded assembler
on page 7-301.
7.42 Calling a nonvirtual member function on page 7-308.
7.43 Calling a virtual member function on page 7-309.
7.37 __mcall_is_in_vbase(D, f)
Results in {TRUE} if f is a nonstatic member function found in a virtual base class of D, otherwise
{FALSE}.
If it returns {TRUE} the this adjustment must be done using __mcall_offsetof_vbase(D, f),
otherwise it must be done with __mcall_this_offset(D, f).
Related concepts
7.35 Compiler-supported keywords for calling class member functions in embedded assembler
on page 7-301.
7.42 Calling a nonvirtual member function on page 7-308.
7.43 Calling a virtual member function on page 7-309.
7.38 __mcall_offsetof_vbase(D, f)
Returns the offset, below the value of the vtable pointer, of the vtable slot that holds the base offset (the
offset from the beginning of a D object to the start of the base class in which f is defined).
Where D is a class type and f is a nonstatic member function defined in a virtual base class of D, in other
words __mcall_is_in_vbase(D,f) returns {TRUE}.
The base offset is the this adjustment necessary when making a call to f with a pointer to a D.
Note
The keyword returns a positive number that must then be subtracted from the value of the vtable pointer.
Related concepts
7.35 Compiler-supported keywords for calling class member functions in embedded assembler
on page 7-301.
7.42 Calling a nonvirtual member function on page 7-308.
7.43 Calling a virtual member function on page 7-309.
7.39 __mcall_this_offset(D, f)
Returns the offset from the beginning of a D object to the start of the base in which f is defined.
This is the this adjustment necessary when making a call to f with a pointer to a D. It is either zero if f
is found in D or the same as __offsetof_base(D,B), where B is a nonvirtual base class of D that contains
f.
Where D is a class type and f is a nonstatic member function defined in D or a nonvirtual base class of D.
If __mcall_this_offset(D,f) is used when f is found in a virtual base class of D, it returns an arbitrary
value that is designed to cause an assembly error if it is used. This allows invalid uses of
__mcall_this_offset to appear in sections of assembly code that are conditionally skipped.
Related concepts
7.35 Compiler-supported keywords for calling class member functions in embedded assembler
on page 7-301.
7.42 Calling a nonvirtual member function on page 7-308.
7.43 Calling a virtual member function on page 7-309.
7.40 __vcall_offsetof_vfunc(D, f)
Returns the offset of the slot in the vtable that holds the pointer to the virtual function, f.
Where D is a class and f is a virtual function defined in D, or a base class of D.
If __vcall_offsetof_vfunc(D, f) is used when f is not a virtual member function it returns an
arbitrary value designed to cause an assembly error if used.
Related concepts
7.35 Compiler-supported keywords for calling class member functions in embedded assembler
on page 7-301.
7.42 Calling a nonvirtual member function on page 7-308.
7.43 Calling a virtual member function on page 7-309.
There is no __mcall_is_static to detect static member functions because static member functions have
different parameters (that is, no this), so call sites are likely to already be specific to calling a static
member function.
Related concepts
7.26 Embedded assembler support in the compiler on page 7-291.
7.35 Compiler-supported keywords for calling class member functions in embedded assembler
on page 7-301.
7.42 Calling a nonvirtual member function on page 7-308.
7.43 Calling a virtual member function on page 7-309.
Related concepts
7.26 Embedded assembler support in the compiler on page 7-291.
7.41 Calling nonstatic member functions in embedded assembler on page 7-307.
7.43 Calling a virtual member function on page 7-309.
Related concepts
7.26 Embedded assembler support in the compiler on page 7-291.
7.41 Calling nonstatic member functions in embedded assembler on page 7-307.
7.42 Calling a nonvirtual member function on page 7-308.
The second method uses embedded assembly to access physical ARM registers from within a C or C++
source file, for example:
__asm void func()
{
MOV r0, lr
...
BX lr
}
This enables the return address of a function to be captured and displayed, for example, for debugging
purposes, to show the call tree.
Note
The compiler might also inline a function into its caller. If a function is inlined, the return address that is
captured is the return address of the function that calls the inlined function. A function might also be tail
called, which similarly affects the captured return address.
Related concepts
7.26 Embedded assembler support in the compiler on page 7-291.
Related references
10.137 __return_address intrinsic on page 10-748.
10.110 __current_pc intrinsic on page 10-718.
10.111 __current_sp intrinsic on page 10-719.
7.45 Differences in compiler support for inline and embedded assembly code
There are differences between the ways inline and embedded assembly are compiled.
Specifically:
• Inline assembly code uses a high level of processor abstraction, and is integrated with the C and C++
code during code generation. Therefore, the compiler optimizes the C and C++ code and the
assembly code together.
• Unlike inline assembly code, embedded assembly code is assembled separately from the C and C++
code to produce a compiled object that is then combined with the object from the compilation of the
C or C++ source.
• Inline assembly code can be inlined by the compiler, but embedded assembly code cannot be inlined,
either implicitly or explicitly.
The following table summarizes the main differences between inline assembler and embedded assembler.
ARMv6 instructions
Embedded assembler: All supported.
Inline assembler: Supports most instructions, with some exceptions, for example SETEND and some of
the system extensions. The complete set of ARMv6 SIMD instructions is supported.
Register access
Embedded assembler: Specified physical registers are used. You can also use PC, LR, and SP.
Inline assembler: Uses virtual registers. Using sp (r13), lr (r14), and pc (r15) gives an error.
Return instructions
Embedded assembler: You must add them in your code.
Inline assembler: Generated automatically. (The BX, BXJ, and BLX instructions are not supported.)
8.1 -Aopt
Specifies command-line options to pass to the assembler when it is invoked by the compiler to assemble
either .s input files or embedded assembly language functions.
Syntax
-Aopt
Where:
opt
is a command-line option to pass to the assembler.
Note
Some compiler command-line options are passed to the assembler automatically whenever it is
invoked by the compiler. For example, if the option --cpu is specified on the compiler
command line, then this option is passed to the assembler whenever it is invoked to assemble .s
files or embedded assembly code.
To see the compiler command-line options passed by the compiler to the assembler, use the
compiler command-line option -A--show_cmdline.
Restrictions
If an unsupported option is passed through using -A, an error is generated by the assembler.
Example
armcc -A--predefine="NEWVERSION SETL {TRUE}" main.c
Related references
8.114 -Lopt on page 8-445.
8.171 --show_cmdline on page 8-508.
8.7 --arm on page 8-326.
8.28 --compatible=name on page 8-351.
8.42 --cpu=list on page 8-367.
8.43 --cpu=name on page 8-368.
Usage
--allow_fpreg_for_nonfpdata enables the compiler to use VFP and NEON registers and instructions
for data transfer operations on non-VFP and non-NEON data. This is useful when demand for integer
registers is high. For the compiler to use the VFP or NEON registers, the default options for the
processor or the specified options must enable the hardware.
--no_allow_fpreg_for_nonfpdata prevents VFP and NEON registers from being used for non-VFP
and non-NEON data. When this option is specified, the compiler uses VFP and NEON registers for VFP
and NEON data only. This is useful when you want to limit the number of places in your code where
the compiler generates VFP or NEON instructions.
Default
The default is --no_allow_fpreg_for_nonfpdata.
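Example
The following command line is a sketch; the processor choice and file name are illustrative:
armcc --cpu=Cortex-A8 --allow_fpreg_for_nonfpdata -c main.c
This enables the compiler to use the VFP and NEON registers of the Cortex-A8 for data transfer
operations on non-VFP and non-NEON data.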
Related references
8.87 --fpmode=model on page 8-415.
8.88 --fpu=list on page 8-417.
8.89 --fpu=name on page 8-418.
Related information
Extension register bank mapping.
NEON views of the register bank.
VFP views of the extension register bank.
Usage
Allowing null this pointers gives well-defined behavior when a nonvirtual member function is called on
a null object pointer.
Disallowing null this pointers enables the compiler to perform optimizations, and conforms with the
C++ standard.
Default
The default is --no_allow_null_this.
Related references
8.94 --gnu_defaults on page 8-425.
Usage
In C and C++, use this option to control recognition of the digraphs. In C++, use this option to control
recognition of operator keywords, for example, and and bitand.
Default
The default is --alternative_tokens.
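Example
The following sketch (not taken from this manual) compiles only when alternative tokens are recognized.
The <% and %> digraphs stand for { and }, and bitand is a C++ operator keyword:
int all_bits_set(unsigned a, unsigned b)
<%
    return (a bitand b) == b;
%>
With --no_alternative_tokens, these tokens are not recognized and the code does not compile.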
Mode
This option is effective only if the source language is C++.
Default
The default is --no_anachronisms.
Example
typedef enum { red, white, blue } tricolor;
inline tricolor operator++(tricolor c, int)
{
int i = static_cast<int>(c) + 1;
return static_cast<tricolor>(i);
}
void foo(void)
{
tricolor c = red;
c++; // okay
++c; // anachronism
}
Compiling this code with the option --anachronisms generates a warning message.
Compiling this code without the option --anachronisms generates an error message.
Related references
8.39 --cpp on page 8-363.
8.176 --strict, --no_strict on page 8-513.
8.177 --strict_warnings on page 8-514.
11.8 Anachronisms in ARM C++ on page 11-808.
8.6 --apcs=qualifier...qualifier
Controls interworking and position independence when generating code.
By specifying qualifiers to the --apcs command-line option, you can define the variant of the Procedure
Call Standard for the ARM architecture (AAPCS) used by the compiler.
Syntax
--apcs=qualifier...qualifier
/rwpi
/norwpi
Enables or disables the generation of Read/Write Position-Independent (RWPI) code. The
default is /norwpi.
/[no]pid is an alias for /[no]rwpi.
/fpic
/nofpic
Enables or disables the generation of read-only position-independent code where relative
address references are independent of the location where your program is loaded.
/hardfp
/softfp
Requests hardware or software floating-point linkage. This enables the procedure call standard
to be specified separately from the version of the floating-point hardware available through the
--fpu option. It is still possible to specify the procedure call standard by using the --fpu option,
but ARM recommends that you use --apcs instead.
Note
The / prefix is optional for the first qualifier, but must be present to separate subsequent qualifiers in the
same --apcs option. For example, --apcs=/nointerwork/noropi/norwpi is equivalent to
--apcs=nointerwork/noropi/norwpi.
You can specify multiple qualifiers using either a single --apcs option or multiple --apcs options. For
example, --apcs=/nointerwork/noropi/norwpi is equivalent to --apcs=/nointerwork
--apcs=noropi/norwpi.
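For example, the following command line is a sketch; the processor and file name are illustrative. It
selects interworking and hardware floating-point linkage with a single --apcs option:
armcc --cpu=Cortex-A9 --apcs=/interwork/hardfp -c main.c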
Default
If you do not specify an --apcs option, the compiler assumes
--apcs=/nointerwork/noropi/norwpi/nofpic.
Usage
/interwork
/nointerwork
By default, code is generated:
• Without interworking support, that is /nointerwork, unless you specify a --cpu option that
corresponds to architecture ARMv5T or later.
• With interworking support, that is /interwork, on ARMv5T and later. ARMv5T and later
architectures provide direct support for interworking, using instructions such as BLX and loads to
the program counter.
/ropi
/noropi
If you select the /ropi qualifier to generate ROPI code, the compiler:
• Addresses read-only code and data PC-relative.
• Sets the Position Independent (PI) attribute on read-only output sections.
Note
--apcs=/ropi is not supported when compiling C++.
/rwpi
/norwpi
If you select the /rwpi qualifier to generate RWPI code, the compiler:
• Addresses writable data using offsets from the static base register sb. This means that:
— The base address of the RW data region can be fixed at runtime.
— Data can have multiple instances.
— Data can be, but does not have to be, position-independent.
• Sets the PI attribute on read/write output sections.
Note
Because the --lower_rwpi option is the default, code that is not RWPI is automatically
transformed into equivalent code that is RWPI. This static initialization is done at runtime by the
C++ constructor mechanism, even for C.
/fpic
/nofpic
If you select this option, the compiler:
• Accesses all static data using PC-relative addressing.
• Accesses all imported or exported read-write data using a Global Offset Table (GOT) entry
created by the linker.
• Accesses all read-only data relative to the PC.
You must compile your code with /fpic if it uses shared objects. This is because relative
addressing is only implemented when your code makes use of System V shared libraries.
You do not have to compile with /fpic if you are building either a static image or static library.
The use of /fpic is supported when compiling C++. In this case, virtual function tables and
typeinfo are placed in read-write areas so that they can be accessed relative to the location of
the PC.
Note
When building a System V or ARM Linux shared library, use --apcs /fpic together with
--no_hide_all.
/hardfp
If you use /hardfp, the compiler generates code for hardware floating-point linkage. Hardware
floating-point linkage uses the FPU registers to pass the arguments and return values.
/hardfp interacts with or overrides explicit or implicit use of --fpu as follows: