C++ Annotations Guide for Programmers
Frank B. Brokken
Center of Information Technology,
University of Groningen
Nettelbosje 1,
P.O. Box 11044,
9700 CA Groningen
The Netherlands
Published at the University of Groningen
ISBN 90 367 0470 7
1994 - 2010
Abstract
This document is intended for knowledgeable users of C (or any other language using a C-like grammar,
like Perl or Java) who would like to know more about, or make the transition to, C++. This document is
the main textbook for Frank’s C++ programming courses, which are organized yearly at the University
of Groningen. The C++ Annotations do not cover all aspects of C++, though. In particular, C++’s basic
grammar is not covered where it is identical to C’s grammar. Any basic book on C may be consulted to refresh
that part of C++’s grammar.
If you want a hard-copy version of the C++ Annotations: printable versions are available in
postscript, pdf and other formats in
http://sourceforge.net/projects/cppannotations/,
in files having names starting with cplusplus (A4 paper size). Files having names starting with
‘cplusplusus’ are intended for the US legal paper size.
The latest version of the C++ Annotations in html-format can be browsed at:
http://cppannotations.sourceforge.net/
and/or at
http://www.icce.rug.nl/documents/
Contents
2 Introduction
3.1 Extensions to C
3.1.1 Namespaces
3.1.2 The scope resolution operator ::
3.3.1 References
4 Name Spaces
4.1 Namespaces
5.2.1 Initializers
5.2.2 Iterators
5.2.3 Operators
6.4 Output
6.5 Input
7 Classes
9 Exceptions
13 Inheritance
14 Polymorphism
15 Friends
Chapter 1
Overview of the Chapters
• Chapter 7: The ‘class’ concept: structs having functions. The ‘object’ concept: variables of a class.
• Chapter 8: Allocation and returning unused memory: new, delete, and the function
set_new_handler().
• Chapter 9: Exceptions: handle errors where appropriate, rather than where they occur.
• Chapter 10: Give your own meaning to operators.
• Chapter 11: Static data and functions: members of a class not bound to objects.
• Chapter 12: Abstract Containers to put stuff into.
• Chapter 13: Building classes upon classes: setting up class hierarchies.
• Chapter 14: Changing the behavior of member functions accessed through base class pointers.
• Chapter 15: Gaining access to private parts: friend functions and classes.
• Chapter 16: Classes having pointers to members: pointing to locations inside objects.
• Chapter 17: Constructing classes and enums within classes.
• Chapter 18: The Standard Template Library.
• Chapter 19: The STL generic algorithms.
• Chapter 20: Function templates: using molds for type independent functions.
• Chapter 21: Class templates: using molds for type independent classes.
• Chapter 22: Advanced Template Use: programming the compiler.
• Chapter 23: Several examples of programs written in C++.
Chapter 2
Introduction
This document offers an introduction to the C++ programming language. It is a guide for the C/C++
programming courses presented yearly by Frank at the University of Groningen. This document is not
a complete C/C++ handbook, as much of the C background of C++ is not covered. Other sources should
be consulted for that (e.g., the Dutch book De programmeertaal C, Brokken and Kubat, University of
Groningen, 1996, or the on-line book [1] suggested to me by George Danchev (danchev at spnet dot
net)).
The reader should be forewarned that extensive knowledge of the C programming language is
assumed. The C++ Annotations continue where topics of the C programming language end, such as
pointers, basic flow control and the construction of functions.
The version number of the C++ Annotations (currently 8.2.0) is updated when the contents of the
document change. The first number is the major number, and will probably not be changed for some
time: it indicates a major rewriting. The middle number is increased when new information is added to
the document. The last number only indicates small changes; it is increased when, e.g., a series of typos
is corrected.
This document is published by the Computing Center, University of Groningen, the Netherlands, under
the GNU General Public License [2].
The C++ Annotations were typeset using the yodl [3] formatting system.
Frank B. Brokken
Center of Information Technology,
University of Groningen
Nettelbosje 1,
P.O. Box 11044,
9700 CA Groningen
The Netherlands
In this chapter an overview of C++’s defining features is presented. A few extensions to C are reviewed
and the concepts of object based and object oriented programming (OOP) are briefly introduced.
[1] http://publications.gbdirect.co.uk/c_book/
[2] http://www.gnu.org/licenses/
[3] http://yodl.sourceforge.net
2.1 What’s new in the C++ Annotations
This section is modified when the first or second part of the version number changes (and sometimes
for the third part as well).
• Version 8.2.0 adds a section about casting shared_ptrs (section 18.4.5) and about sharing arrays
of objects (18.4.6).
• Version 8.1.0 was released following a complete overhaul of the C++ Annotations, with two
pre-releases in between. Many inconsistencies that had crept into the text and examples were removed,
streamlining the text and synchronizing examples with the text. All of the code examples have
received a work-over, replacing endl by '\n', making virtual functions private, etc. The
sections labeled C++0x were improved and sections showing C++0x now also mention the g++
version in which the new feature will be made available, using ? if this is as yet unknown.
No version is shown if the feature is already available in g++ 4.3 (or in one of its subreleases,
like 4.3.3). I received a host of suggestions from Francesco Poli (thanks, Francesco (and several
others), for all the effort you’ve put into sending me those corrections).
• Version 8.0.0 was released as a result of the upcoming new C++ standard [4] becoming (partially)
available in the Gnu g++ compiler [5]. Not all new elements of the new standard (informally called
the C++0x standard) are available right now, and new subreleases of the C++ Annotations will
appear once more elements become implemented in the g++ compiler. In section 2.2.3 the way
to activate the new standard is shown, and new sections covering elements of the new standard
show C++0x in their section-titles.
Furthermore, two new chapters were added: the STL chapter is now split in two. The STL chapter
now covers the STL except for the Generic Algorithms which are now discussed in a separate
chapter. Name spaces, originally covered by the introductory chapter, are now also covered in a
separate chapter.
• Version 7.3.0 adds a section about overloading operators outside of the common context of classes
(section 10.11).
• Version 7.2.0 describes the implementation of polymorphism for classes inheriting from multiple
base classes defining virtual member functions (section 14.9) and adds two new sections in the
concrete examples chapter: section 23.6 discusses how to distinguish lvalues from
rvalues with operator[](); section 23.8.3 discusses, in the context of the Bisonc++ parser
generator, how to use polymorphism instead of a union to define different types of semantic values.
As usual, several typos were repaired and various other improvements were made.
• Version 7.1.0 adds a description of the type_info::before() member (cf. section 14.5.2). Fur-
thermore, several typographical corrections were made.
• Version 7.0.1 was released shortly after releasing version 7.0.0, as the result of very extensive
feedback received from Eric S. Raymond (esr at thyrsus dot com) and Edward Welbourne (eddy
at chaos dot org dot uk). Considering the extent of the received feedback, it’s appropriate to
mention the sub-sub-release explicitly here. Many textual changes were made and section 5.2.4
was completely reorganized.
• Version 7.0.0 comes with a new chapter discussing advanced template applications. Moreover,
the general terminology used with templates has evolved. ‘Templates’ are now considered a core
concept, which is reflected by the use of ‘templates’ as a noun, rather than an adjective. So, from
now on it is ‘class template’ rather than ‘template class’. The addition of another chapter, together
with the addition of several new sections to existing chapters as well as various rewrites of
existing sections made it appropriate to upgrade to the next major release. The newly added chapter
does not aim at concrete examples of templates. Instead it discusses possibilities of templates
beyond the basic function and class templates. In addition to this new chapter, several new sections
were added: section 7.7 introduces local classes; section 8.1.5 discusses the placement new
operator; section 13.6.1 discusses how to make available some members of privately inherited classes
and section 13.8 discusses how objects created by new[] can be initialized by non-default
constructors. In addition to all this, Elwin Dijck (e dot dijck at gmail dot com), one of the students of
the 2006-2007 edition of the C++ course, did a magnificent job by converting all images to vector
graphics (in the process prompting me to start using vector graphics as well :-). Thanks, Elwin,
for a job well done!
[4] http://en.wikipedia.org/wiki/C++0x
[5] http://gcc.gnu.org/projects/cxx0x.html
• Version 6.5.0 changed unsigned into size_t where appropriate, and explicitly mentioned int-
derived types like int16_t. In-class member function definitions were moved out of (below) their
class definitions as inline defined members. A paragraph about implementing pure virtual
member functions was added. Various bugs and compilation errors were fixed.
• Version 6.4.0 added a new section (22.1.1) further discussing the use of the template keyword
to distinguish types nested under class templates from template members. Furthermore, Sergio
Bacchi (s dot bacchi at gmail dot com) did an impressive job when translating the
Annotations into Portuguese. His translation (which may lag a distribution or two behind the latest
version of the Annotations) may also be retrieved from the contributions/ subdirectory in the
c++-annotations_X.Y.Z.tar.gz archive at
http://sourceforge.net/projects/cppannotations/
• Version 6.3.0 added new sections about anonymous objects (section 7.5.1) and type resolution
with class templates (section 22.1.2). Also the description of the template parameter deduction
algorithm was rewritten (cf. section 20.3.4) and numerous modifications required because of the
compiler’s closer adherence to the C++ standard were realized, among which exception rethrowing
from constructor and destructor function try blocks. Also, all textual corrections received from
readers since version 6.2.4 were processed.
• In version 6.2.4 many textual improvements were realized. I received extensive lists of typos and
suggestions for clarifications of the text, in particular from Nathan Johnson and from Jakob van
Bethlehem. Equally valuable were suggestions I received from various other readers of the C++
annotations: all were processed in this release. The C++ content matter of this release was not
substantially modified, compared to version 6.2.2.
• Version 6.2.2 offers improved implementations of the configurable class templates (removed in
8.1.0, superseded by lambda functions).
• Version 6.2.0 was released as an Annual Update, by the end of May, 2005. Apart from the usual
typo corrections several new sections were added and some were removed: in the Exception chap-
ter (9) a section was added covering the standard exceptions and their meanings; in the chapter
covering static members (11) a section was added discussing static const data members; and
the final chapter (23) covers configurable class templates using local context structs (replacing the
previous ForEach, UnaryPredicate and BinaryPredicate classes). Furthermore, the final
section (covering a C++ parser generator) now uses bisonc++, rather than the old (and somewhat
outdated) bison++ program.
• Version 6.1.0 was released shortly after releasing 6.0.0. Following suggestions received from
Leo Razoumov and Paulo Tribolet, and after receiving many,
many useful suggestions and extensive help from Leo, navigable .pdf files are from now on
distributed with the C++ Annotations. Also, some sections were slightly adapted.
• Version 6.0.0 was released after a full update of the text, removing many inconsistencies and typos.
Since the update affected the Annotations’ full text an upgrade to a new major version seemed
appropriate. Several new sections were added: overloading binary operators (section 10.6); throw-
ing exceptions in constructors and destructors (section 9.11); function try-blocks (section 9.10);
calling conventions of static and global functions (section 11.2.1) and virtual constructors (section
14.11). The chapter on templates was completely rewritten and split into two separate chapters:
chapter 20 discusses the syntax and use of template functions; chapter 21 discusses template
classes. Various concrete examples were modified; new examples were included as well (chapter
23).
• In version 5.2.4 the description of the random_shuffle generic algorithm (section 19.1.39) was
modified.
6 CHAPTER 2. INTRODUCTION
• In version 5.2.3 section 2.5.11 on local variables was extended and section 2.5.4 on function over-
loading was modified by explicitly discussing the effects of the const modifier with overloaded
functions. Also, the description of the compare() function in chapter 5 contained an error, which
was repaired.
• In version 5.2.2 a leftover in section 10.4 from a former version was removed and the correspond-
ing text was updated. Also, some minor typos were corrected.
• In version 5.2.1 various typos were repaired, and some paragraphs were further clarified. Fur-
thermore, a section was added to the template chapter (chapter 20), about creating several iterator
types. This topic was further elaborated in chapter 23, where the section about the construction
of a reverse iterator (section 23.7) was completely rewritten. In the same chapter, a universal
text-to-anything converter is discussed (section 23.5). Also, LaTeX, PostScript and PDF versions
fitting the US-letter paper size are now available as cplusplusus versions: cplusplusus.latex,
cplusplusus.ps and cplusplus.pdf. The A4-paper size is of course kept, and remains
available in the cplusplus.latex, cplusplus.ps and cpluspl.pdf files.
• Version 5.2.0 was released after adding a section about the mutable keyword (section 7.8), and
after thoroughly changing the discussion of the Fork() abstract base class (section 23.3). All
examples should now be up-to-date with respect to the use of the std namespace.
• However, in the meantime the Gnu g++ compiler version 3.2 was released [6]. In this version
extensions to the abstract containers (see chapter 12) like the hash_map were placed in a separate
namespace, __gnu_cxx. This namespace should be used when using these containers. However,
this may break compilations of sources with g++, version 3.0. In that case, a compilation can be
performed conditionally to the 3.2 and the 3.0 compiler version, defining __gnu_cxx for the 3.2
version. Alternatively, the dirty trick
can be placed just before header files in which the __gnu_cxx namespace is used. This might
eventually result in name-collisions, and it’s a dirty trick by any standards, so please don’t tell
anybody I wrote this down.
• Version 5.1.1 was released after modifying the sections related to the fork() system call in chap-
ter 23. Under the ANSI/ISO standard many of the previously available extensions (like procbuf,
and vform) applied to streams were discontinued. Starting with version 5.1.1, ways of constructing
these facilities under the ANSI/ISO standard are discussed in the C++ Annotations. I consider
the involved subject sufficiently complex to warrant the upgrade to a new subversion.
• With the advent of the Gnu g++ compiler version 3.00, a more strict implementation of the
ANSI/ISO C++ standard became available. This resulted in version 5.1.0 of the Annotations,
appearing shortly after version 5.0.0. In version 5.1.0 chapter 6 was modified and several cos-
metic changes took place (e.g., removing class from template type parameter lists, see chapter
20). Intermediate versions (like 5.0.0a, 5.0.0b) were not further documented, but were mere in-
termediate releases while approaching version 5.1.0. Code examples will gradually be adapted to
the new release of the compiler.
In the meantime the reader should be prepared to insert using namespace std; in
many code examples, just beyond the #include preprocessor directives as a temporary
measure to make the example accepted by the compiler.
• New insights develop all the time, resulting in version 5.0.0 of the Annotations. In this version a
lot of old code was cleaned up and typos were repaired. According to the current standard, namespaces
are required in C++ programs, so they are now introduced very early (in section 3.1.1) in the
Annotations. A new section about using external programs was added to the Annotations (and
removed again in version 5.1.0), and the new stringstream class, replacing the strstream
class, is now covered too (sections 6.4.3 and 6.5.3). Actually, the chapter on input and output
was completely rewritten. Furthermore, the operators new and delete are now discussed in
chapter 8, where they fit better than in a chapter on classes, where they previously were discussed.
Chapters were moved, split and reordered, so that subjects could generally be introduced without
forward references. Finally, the html, PostScript and pdf versions of the C++ Annotations now
contain an index (sigh of relief?). All in all, considering the volume and nature of the modifications,
it seemed right to upgrade to a full major version. So here it is.
Considering the volume of the Annotations, I’m sure there will be typos found every now and
then. Please do not hesitate to send me mail containing any mistakes you find or corrections you
would like to suggest.
[6] http://www.gnu.org
• In release 4.4.1b the pagesize in the LaTeX file was defined to be din A4. In countries where
other pagesizes are standard the default pagesize might be a better choice. In that case, re-
move the a4paper,twoside option from cplusplus.tex (or cplusplus.yo if you have yodl
installed), and reconstruct the Annotations from the TeX-file or Yodl-files.
The Annotations mailing list was discontinued at release 4.4.1d. From this point on only minor
modifications were expected, which are no longer generally announced.
At some point, I considered version 4.4.1 to be the final version of the C++ Annotations. How-
ever, a section on special I/O functions was added to cover unformatted I/O, and the section about
the string datatype had its layout improved and was, due to its volume, given a chapter of its
own (chapter 5). All this eventually resulted in version 4.4.2.
Version 4.4.1 again contains new material, and reflects the ANSI/ISO [7] standard (well, I try
to have it reflect the ANSI/ISO standard). In version 4.4.1 several new sections and chapters
were added, among which a chapter about the Standard Template Library (STL) and generic
algorithms.
Version 4.4.0 (and subletters) was a mere construction version and was never made available.
The version 4.3.1a is a precursor of 4.3.2. In 4.3.1a most of the typos I’ve received since the
last update have been processed. In version 4.3.2 extra attention was paid to the syntax for
function addresses and pointers to member functions.
The decision to upgrade from version 4.2.* to 4.3.* was made after realizing that the lexical scan-
ner function yylex() can be defined in the scanner class that is derived from yyFlexLexer.
Under this approach the yylex() function can access the members of the class derived from
yyFlexLexer as well as the public and protected members of yyFlexLexer. The result of all
this is a clean implementation of the rules defined in the flex++ specification file.
The upgrade from version 4.1.* to 4.2.* was the result of the inclusion of section 3.4.1 about the
bool data type in chapter 3. The distinction between differences between C and C++ and extensions
of the C programming language is (albeit a bit fuzzy) reflected in the introduction chapter
and the chapter on first impressions of C++: The introduction chapter covers some differences
between C and C++, whereas the chapter about first impressions of C++ covers some extensions
of the C programming language as found in C++.
Major version 4 is a major rewrite of the previous version 3.4.14. The document was rewritten
from SGML to Yodl and many new sections were added. All sections got a tune-up. The distribu-
tion basis, however, hasn’t changed: see the introduction.
Modifications in versions 1.*.*, 2.*.*, and 3.*.* (replace the stars by any applicable number) were
not logged.
Subreleases like 4.4.2a etc. contain bugfixes and typographical corrections.
2.2 C++’s history
The first implementation of C++ was developed in the 1980s at the AT&T Bell Labs, where the Unix
operating system was created.
C++ was originally a ‘pre-compiler’, similar to the preprocessor of C, converting special constructions in
its source code to plain C. Back then this code was compiled by a standard C compiler. The ‘pre-code’,
which was read by the C++ pre-compiler, was usually located in a file with the extension .cc, .C or
.cpp. This file would then be converted to a C source file with the extension .c, which was thereupon
compiled and linked.
[7] ftp://research.att.com/dist/c++std/WP/
The nomenclature of C++ source files remains: the extensions .cc and .cpp are still used. However,
the preliminary work of a C++ pre-compiler is nowadays usually performed during the actual compi-
lation process. Often compilers determine the language used in a source file from its extension. This
holds true for Borland’s and Microsoft’s C++ compilers, which assume a C++ source for an extension
.cpp. The Gnu compiler g++, which is available on many Unix platforms, assumes for C++ the exten-
sion .cc.
That C++ used to be compiled into C code is also reflected in C++ being a superset of C (apart from
a few differences): C++ offers the full C grammar and supports all C-library functions, and adds to this features of
its own. This makes the transition from C to C++ quite easy. Programmers familiar with C may start
‘programming in C++’ by using source files having extensions .cc or .cpp instead of .c, and may then
comfortably slip into all the possibilities offered by C++. No abrupt change of habits is required.
The original version of the C++ Annotations was written by Frank Brokken and Karel Kubat in Dutch
using LaTeX. After some time, Karel rewrote the text and converted the guide to a more suitable format
and (of course) to English in September 1994.
The first version of the guide appeared on the net in October 1994. By then it had been converted to SGML.
Gradually new chapters were added, and the contents were modified and further improved (thanks to
countless readers who sent us their comments).
In major version four Frank added new chapters and converted the document from SGML to yodl [8].
The C++ Annotations are freely distributable. Be sure to read the legal notes [9].
Reading the annotations beyond this point implies that you are aware of these
notes and that you agree with them.
If you like this document, tell your friends about it. Even better, let us know by sending email to
Frank.
[8] http://yodl.sourceforge.net
[9] legal.shtml
On the Internet, many useful hyperlinks to C++ exist. Without even suggesting completeness (and
without their being checked regularly for existence: they might have died by the time you read this), the
following might be worth visiting:
Prospective C++ programmers should realize that C++ is not a perfect superset of C. There are some
differences you might encounter when you simply rename a file to a file having the extension .cc and
run it through a C++ compiler:
• In C, sizeof('c') equals sizeof(int), 'c' being any ASCII character. The underlying
philosophy is probably that chars, when passed as arguments to functions, are passed as integers
anyway. Furthermore, the C compiler handles a character constant like 'c' as an integer
constant. Hence, in C, the function calls
putchar(10);
and
putchar('\n');
are synonymous.
By contrast, in C++, sizeof('c') is always 1 (but see also section 3.4.2). An int is still an int,
though. As we shall see later (section 2.5.4), the two function calls
somefunc(10);
and
somefunc('\n');
may be handled by different functions: C++ distinguishes functions not only by their names, but
also by their argument types, which are different in these two calls: the former uses an int
argument, the latter a char.
• C++ requires very strict prototyping of external functions. E.g., in C a prototype like
void func();
means that a function func() exists, returning no value. The declaration doesn’t specify which
arguments (if any) are accepted by the function.
However, in C++ the above declaration means that the function func() does not accept any
arguments at all. Any arguments passed to it will result in a compile-time error.
Note that the keyword extern is not required when declaring functions. A function definition
becomes a function declaration simply by replacing a function’s body by a semicolon. The keyword
extern is required, though, when declaring variables.
To compile a C++ program, a C++ compiler is required. Considering the free nature of this document,
it won’t come as a surprise that a free compiler is suggested here. The Free Software Foundation (FSF)
provides at http://www.gnu.org a free C++ compiler which is, among other places, also part of the
Debian (http://www.debian.org) distribution of Linux (http://www.linux.org).
The upcoming C++0x standard has not yet been fully implemented in the g++ compiler. Unless
indicated otherwise, all features of the C++0x standard covered by the C++ Annotations are available
in g++ 4.4. To use these features the compiler flag -std=c++0x must
currently be provided. It is assumed that this flag is used when compiling the examples given by the
Annotations. The features of the C++0x standard may or may not be available in g++ versions older
than 4.4.
When visiting the above URL to obtain a free g++ compiler, click on install now. This will download
the file setup.exe, which can be run to install cygwin. The software to be installed can be downloaded
by setup.exe from the internet. There are alternatives (e.g., using a CD-ROM), which are described
on the Cygwin page. Installation proceeds interactively. The offered defaults are sensible and should
be accepted unless you have reasons to deviate.
The most recent Gnu g++ compiler can be obtained from http://gcc.gnu.org. If the compiler that
is made available in the Cygnus distribution lags behind the latest version, the sources of the latest
version can be downloaded after which the compiler can be built using an already available compiler.
The compiler’s webpage (mentioned above) contains detailed instructions on how to proceed. In our
experience building a new compiler within the Cygnus environment works flawlessly.
Generally the following command can be used to compile a C++ source file ‘source.cc’:
g++ source.cc
This produces a binary program (a.out or a.exe). If the default name is inappropriate, the name of
the executable can be specified using the -o flag (here producing the program source):
g++ -o source source.cc
If a mere compilation is required, the compiled module can be produced using the -c flag:
g++ -c source.cc
This generates the file source.o, which can later on be linked to other modules. As pointed out,
provide the compiler option --std=c++0x (note: two dashes; the single-dash form -std=c++0x
mentioned earlier is equivalent) to activate the features of the C++0x standard.
C++ programs quickly become too complex to maintain ‘by hand’. With all serious programming
projects program maintenance tools are used. Usually the standard make program is used to
maintain C++ programs, but good alternatives exist, like the icmake [11] program maintenance utility,
ccbuild [12] or lake [13].
It is strongly advised to start using maintenance utilities early in the study of C++.
[11] http://icmake.sourceforge.net/
[12] http://ccbuild.sourceforge.net/
[13] http://nl.logilogi.org/MetaLogi/LaKe
2.3 C++: advantages and claims
Often it is said that programming in C++ leads to ‘better’ programs. Some of the claimed advantages
of C++ are:
• New programs would be developed in less time because old code can be reused.
• The memory management under C++ would be easier and more transparent.
• Programs would be less bug-prone, as C++ uses a stricter syntax and type checking.
• ‘Data hiding’, the usage of data by one program part while other program parts cannot access the
data, would be easier to implement with C++.
Which of these allegations are true? Originally, our impression was that the C++ language was some-
what overrated; the same holding true for the entire object-oriented programming (OOP) approach. The
enthusiasm for the C++ language resembles the once uttered allegations about Artificial-Intelligence
(AI) languages like Lisp and Prolog: these languages were supposed to solve the most difficult AI-
problems ‘almost without effort’. New languages are often oversold: in the end, each problem can be
coded in any programming language (say BASIC or assembly language). The advantages and disadvan-
tages of a given programming language aren’t in ‘what you can do with them’, but rather in ‘which tools
the language offers to implement an efficient and understandable solution to a programming problem’.
Often these tools take the form of syntactic restrictions that enforce or promote certain constructions,
or that simply suggest intentions when such syntactic forms are applied or ‘embraced’. Rather than a
long list of plain assembly instructions we now use flow control statements, functions, objects or even
(with C++) so-called templates to structure and organize code and to express oneself ‘eloquently’ in the
language of one’s choice.
• The development of new programs while existing code is reused can also be implemented in C by,
e.g., using function libraries. Functions can be collected in a library and need not be re-invented
with each new program. C++, however, offers specific syntax possibilities for code reuse, apart
from function libraries (see chapters 13 and 20).
• Creating and using new data types is certainly possible in C; e.g., by using structs, typedefs,
etc. From these types other types can be derived, thus leading to structs containing structs
and so on. In C++ these facilities are augmented by defining data types which are completely
‘self supporting’, taking care of, e.g., their memory management automatically (without having to
resort to an independently operating memory management system as used in, e.g., Java).
• Concerning ‘bug proneness’ we can say that C++ indeed uses stricter type checking than C. How-
ever, most modern C compilers implement ‘warning levels’; it is then the programmer’s choice
to disregard or get rid of the warnings. In C++ many of such warnings become fatal errors (the
compilation stops).
• As far as ‘data hiding’ is concerned, C does offer some tools. E.g., where possible, local or static
variables can be used and special data types such as structs can be manipulated by dedicated
functions. Using such techniques, data hiding can be implemented even in C; though it must be
admitted that C++ offers special syntactic constructions, making it far easier to implement ‘data
hiding’ (and more in general: ‘encapsulation’) in C++ than in C.
C++ in particular (and OOP in general) is of course not the solution to all programming problems.
However, the language does offer various new and elegant facilities which are worth investigating. On
the downside, the level of grammatical complexity of C++ has increased significantly as compared to
C. This may be considered a serious drawback of the language. Although we got used to this increased
level of complexity over time, the transition was neither fast nor painless.
With the C++ Annotations we hope to help the reader make the transition from C to C++ by focusing on
the additions of C++ as compared to C and by leaving out plain C. It is our hope that you like this
document and may benefit from it.
In contrast (or maybe better: in addition) to this, an object-based approach identifies the keywords
used in a problem statement. These keywords are then depicted in a diagram where arrows are drawn
between those keywords to depict an internal hierarchy. The keywords become the objects in the im-
plementation and the hierarchy defines the relationship between these objects. The term object is used
here to describe a limited, well-defined structure, containing all information about an entity: data types
and functions to manipulate the data. As an example of an object oriented approach, an illustration
follows:
The employees and owner of a car dealer and auto garage company are paid as follows. First,
mechanics who work in the garage are paid a certain sum each month. Second, the owner of
the company receives a fixed amount each month. Third, there are car salesmen who work
in the showroom and receive their salary each month plus a bonus per sold car. Finally, the
company employs second-hand car purchasers who travel around; these employees receive
their monthly salary, a bonus per bought car, and a restitution of their travel expenses.
When representing the above salary administration, the keywords could be mechanics, owner, salesmen
and purchasers. The properties of such units are: a monthly salary, sometimes a bonus per purchase
or sale, and sometimes restitution of travel expenses. When analyzing the problem in this manner we
arrive at the following representation:
• The owner and the mechanics can be represented by identical types, receiving a given salary per
month. The relevant information for such a type would be the monthly amount. In addition this
object could contain data such as the name, address and social security number.
• Car salesmen who work in the showroom can be represented as the same type as above but with
some extra functionality: the number of transactions (sales) and the bonus per transaction.
In the hierarchy of objects we would define the dependency between the first two objects by letting
the car salesmen be ‘derived’ from the owner and mechanics.
• Finally, there are the second-hand car purchasers. These share the functionality of the salesmen,
but in addition receive a restitution of their travel expenses. The additional functionality would
therefore consist of the expenses made and this type would be derived from the salesmen.
The hierarchy of the identified objects is further illustrated in Figure 2.1.
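The hierarchy described above might be sketched in C++ as follows. This is only an illustration under assumptions: the type names Employee, Salesman and Purchaser, their data members and the function pay are hypothetical, and derivation itself is only covered in later chapters.

```cpp
#include <string>

// Owner and mechanics: a fixed monthly amount (all names are hypothetical).
struct Employee
{
    std::string name;
    double monthlySalary = 0;

    double pay() const
    {
        return monthlySalary;
    }
};

// Salesmen: an Employee plus a bonus per sold car.
struct Salesman: Employee
{
    int transactions = 0;
    double bonusPerTransaction = 0;

    double pay() const
    {
        return Employee::pay() + transactions * bonusPerTransaction;
    }
};

// Purchasers: a Salesman plus restitution of travel expenses.
struct Purchaser: Salesman
{
    double travelExpenses = 0;

    double pay() const
    {
        return Salesman::pay() + travelExpenses;
    }
};
```

Each derived type only adds what distinguishes it from the type it is derived from, which mirrors the three items of the analysis.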
2.5. DIFFERENCES BETWEEN C AND C++ 13
The overall process in the definition of a hierarchy such as the above starts with the description of the
most simple type. Traditionally (and still in vogue with some popular object oriented languages) more
complex types are then derived from the basic set, with each derivation adding a little extra functional-
ity. From these derived types, more complex types can be derived ad infinitum, until a representation
of the entire problem can be made. Over the years, however, this approach has become less popular in
C++ as it typically results in overly tight coupling, which in turn reduces rather than enhances the
understanding, maintainability and testability of complex programs. In C++ object oriented programming
more and more favors small, easy to understand hierarchies, limited coupling and a developmental
process where design patterns (cf. Gamma et al. (1995)) play a central role.
Nonetheless, in C++ classes are frequently used to define the characteristics of objects. Classes contain
the necessary functionality to do useful things. Classes generally do not offer all their functionality
(and typically none of their data) to objects of other classes. As we will see, classes tend to hide their
properties in such a way that they are not directly modifiable by the outside world. Instead, dedicated
functions are used to reach or modify the properties of objects. Thus class-type objects are able to
uphold their own integrity. The core concept here is encapsulation of which data hiding is just an
example. These concepts will be further explained in chapter 7.
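As a first impression of what such data hiding may look like, consider the following minimal sketch. The class Account and its members are hypothetical and merely illustrate the idea; the syntax is explained in chapter 7.

```cpp
// A hypothetical class guarding its own integrity: the balance can only be
// changed through member functions, which refuse invalid amounts.
class Account
{
    long d_balance = 0;             // hidden: not accessible from outside

    public:
        long balance() const        // dedicated function to reach the data
        {
            return d_balance;
        }

        bool deposit(long amount)   // dedicated function to modify the data
        {
            if (amount <= 0)
                return false;       // invalid request: the object is unchanged
            d_balance += amount;
            return true;
        }
};
```

Code outside of the class cannot assign to d_balance directly, so an Account object can never end up holding a balance that was not produced by deposit.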
In this section some examples of C++ code are shown. Some differences between C and C++ are high-
lighted.
In C++ there are but two variants of the function main: int main() and int main(int argc,
char **argv). Notes:
• It is not required to use an explicit return statement at the end of main. If omitted main returns
0.
• The ‘third char **envp parameter’ is not defined by the C++ standard and should be avoided.
Instead, the global variable extern char **environ should be declared providing access to the
program’s environment variables. Its final element has the value 0.
According to the ANSI/ISO definition, ‘end of line comment’ is implemented in the syntax of C++. This
comment starts with // and ends at the end-of-line marker. The standard C comment, delimited by /*
and */ can still be used in C++:
int main()
{
// this is end-of-line comment
// one comment per line
/*
this is standard-C comment, covering
multiple lines
*/
}
14 CHAPTER 2. INTRODUCTION
Despite the example, it is advised not to use C type comment inside the body of C++ functions. Some-
times existing code must temporarily be suppressed, e.g., for testing purposes. In those cases it’s very
practical to be able to use standard C comment. If such suppressed code itself contains such comment,
it would result in nested comment-lines, resulting in compiler errors. Therefore, the rule of thumb is
not to use C type comment inside the body of C++ functions (alternatively, a pair of #if 0 and #endif
preprocessor directives could of course also be used).
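The preprocessor alternative might be sketched as follows (the function answer is hypothetical). Code between #if 0 and #endif is skipped by the preprocessor, even if it contains standard C comment itself:

```cpp
int answer()
{
    int value = 42;         // end-of-line comment

#if 0                       // temporarily suppressed, e.g., for testing
    value = 0;              /* C comment inside suppressed code is harmless */
#endif

    return value;
}
```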
C++ uses very strict type checking. A prototype must be known for each function before it is called, and
the call must match the prototype. The program
int main()
{
printf("Hello World\n");
}
often compiles under C, albeit with a warning that printf() is an unknown function. But C++ compilers
(should) fail to produce code in such cases. The error is of course caused by the missing #include
<stdio.h> directive (which in C++ is more commonly written as #include <cstdio>).
And while we’re at it: as we’ve seen in C++ main always uses the int return value. Although it is
possible to define int main() without explicitly defining a return statement, within main it is not
possible to use a return statement without an explicit int-expression. For example:
int main()
{
return; // won’t compile: expects int expression, e.g.
// return 1;
}
In C++ it is possible to define functions having identical names but performing different actions. The
functions must differ in their parameter lists (and/or in their const attribute). An example is given
below:
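#include <stdio.h>

void show(int val)          { printf("Integer: %d\n", val); }
void show(double val)       { printf("Double: %lf\n", val); }
void show(char const *val)  { printf("String: %s\n", val); }

int main()
{
    show(12);
    show(3.1415);
    show("Hello World!\n");
}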
In the above program three functions show are defined, only differing in their parameter lists, expecting
an int, double and char *, respectively. The functions have identical names. Functions having
identical names but different parameter lists are called overloaded. The act of defining such functions
is called ‘function overloading’.
The C++ compiler implements function overloading in a rather simple way. Although the functions
share their names (in this example show), the compiler (and hence the linker) use quite different names.
The conversion of a name in the source file to an internally used name is called ‘name mangling’. E.g.,
the C++ compiler might convert the prototype void show (int) to the internal name VshowI, while
an analogous function having a char * argument might be called VshowCP. The actual names that are
used internally depend on the compiler and are not relevant for the programmer, except where these
names show up in e.g., a listing of the contents of a library.
• Do not use function overloading for functions doing conceptually different tasks. In the example
above, the functions show are still somewhat related (they print information to the screen).
However, it is also quite possible to define two functions lookup, one of which would find a name
in a list while the other would determine the video mode. In this case the behavior of those two
functions has nothing in common. It would therefore be more practical to use names which
suggest their actions; say, findname and videoMode.
• C++ does not allow identically named functions to differ only in their return values, as it is always
the programmer’s choice to either use or ignore a function’s return value. E.g., the fragment
printf("Hello World!\n");
provides no information about the return value of the function printf. Two functions printf
which only differ in their return types would therefore not be distinguishable to the compiler.
• In chapter 7 the notion of const member functions is introduced (cf. section 7.5). Here it is
merely mentioned that classes normally have so-called member functions associated with them
(see, e.g., chapter 5 for an informal introduction to the concept). Apart from overloading member
functions using different parameter lists, it is then also possible to overload member functions
by their const attributes. In those cases, classes may have pairs of identically named member
functions, having identical parameter lists. Then, these functions are overloaded by their const
attribute. In such cases exactly one of these functions must have the const attribute.
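Overloading by the const attribute might look as sketched below. The class Label and the function length are hypothetical; the details are covered in chapter 7.

```cpp
#include <cstddef>
#include <string>

// Hypothetical class: two members named text, having identical (empty)
// parameter lists, overloaded only by their const attribute.
class Label
{
    std::string d_text;

    public:
        std::string &text()                 // selected for modifiable objects
        {
            return d_text;
        }

        std::string const &text() const     // selected for const objects
        {
            return d_text;
        }
};

// label is const here, so the const member is the one that is called.
std::size_t length(Label const &label)
{
    return label.text().length();
}
```

With a modifiable Label object the statement label.text() = "abc"; uses the first member, while length can only use the second.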
In C++ it is possible to provide ‘default arguments’ when defining a function. These arguments are
supplied by the compiler when they are not specified by the programmer. For example:
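#include <stdio.h>

void showstring(char const *str = "Hello World!\n");

int main()
{
    showstring("Here's an explicit argument.\n");
    showstring();       // in fact this says:
                        // showstring("Hello World!\n");
}

void showstring(char const *str)
{
    printf("%s", str);
}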
The possibility to omit arguments in situations where default arguments are defined is just a nice
touch: it is the compiler who supplies the lacking argument unless it is explicitly specified at the call.
The code of the program will neither be shorter nor more efficient when default arguments are used.
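#include <stdio.h>

void two_ints(int a = 1, int b = 4);    // declaration: defaults go here

int main()
{
    two_ints();         // arguments: 1, 4
    two_ints(20);       // arguments: 20, 4
    two_ints(20, 5);    // arguments: 20, 5
}

void two_ints(int a, int b)             // definition: no defaults
{
    printf("%d, %d\n", a, b);
}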
When the function two_ints is called, the compiler supplies one or two arguments whenever necessary.
A statement like two_ints(,6) is, however, not allowed: when arguments are omitted they must be
on the right-hand side.
Default arguments must be known at compile-time since at that moment arguments are supplied to
functions. Therefore, the default arguments must be mentioned at the function’s declaration, rather
than at its implementation:
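A minimal sketch of this arrangement, using a hypothetical function scaled. In practice the declaration would be found in a header file and the definition in a separate source file; they are shown together here to keep the sketch self-contained.

```cpp
// Declaration, as it would appear in a header file: the default is given here.
int scaled(int value, int factor = 10);

// Definition, as it would appear in a source file: the default is not repeated.
int scaled(int value, int factor)
{
    return value * factor;
}
```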
When default arguments have been supplied in a function's declaration it is an error to repeat them in
the function's definition. Default arguments that are only mentioned in the definition are of little use:
when the function is used by other sources the compiler reads the header file rather than the function
definition, and consequently has no way to determine the values of the default function arguments.
Compilers generate compile-time errors when default arguments are specified both in the declaration
and in the definition.
In C++ all zero values are coded as 0. In C NULL is often used in the context of pointers. This difference
is purely stylistic, though one that is widely adopted. In C++ NULL should be avoided (as it is a macro,
and macros can –and therefore should– easily be avoided in C++). Instead 0 can almost always be used.
Almost always, but not always. As C++ allows function overloading (cf. section 2.5.4) the programmer
might be confronted with an unexpected function selection in the situation shown in that section:
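#include <stdio.h>

void show(int val)
{
    printf("Integer: %d\n", val);
}
void show(double val)
{
    printf("Double: %lf\n", val);
}
void show(char const *val)
{
    printf("String: %s\n", val);
}
int main()
{
    show(12);
    show(3.1415);
    show("Hello World!\n");
}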
In this situation a programmer intending to call show(char const *) might call show(0). But this
doesn’t work, as 0 is interpreted as int and so show(int) is called. But calling show(NULL) doesn’t
work either, as C++ usually defines NULL as 0, rather than ((void *)0). So, show(int) is called
once again. To solve these kinds of problems the C++11 standard introduces the keyword nullptr
representing the 0 pointer. In the current example the programmer should call show(nullptr) to
avoid the selection of the wrong function. The nullptr value can also be used to initialize pointer
variables. E.g.,
    int *ip = nullptr;      // ip is initialized to the 0 pointer
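The selection of the overloaded function can be verified using a small sketch. Here two hypothetical overloads of show return a description of themselves instead of printing their argument:

```cpp
#include <string>

// Two hypothetical overloads, each reporting that it was selected.
std::string show(int)
{
    return "show(int)";
}

std::string show(char const *)
{
    return "show(char const *)";
}
```

Calling show(0) returns "show(int)", whereas show(nullptr) returns "show(char const *)".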
In C, a function prototype with an empty parameter list, such as
    void func();
means that the argument list of the declared function is not prototyped: the compiler will not warn
against calling func with any set of arguments. In C the keyword void is used when it is the explicit
intent to declare a function with no arguments at all, as in:
void func(void);
As C++ enforces strict type checking, in C++ an empty parameter list indicates the total absence of
parameters. The keyword void is thus omitted.
Each C++ compiler which conforms to the ANSI/ISO standard defines the symbol __cplusplus: it is
as if each source file were prefixed with the preprocessor directive #define __cplusplus.
We shall see examples of the usage of this symbol in the following sections.
Normal C functions, e.g., functions which are compiled and collected in a run-time library, can also be
used in C++ programs. Such functions, however, must be declared as C functions.
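For instance, assuming a C library offering a (hypothetical) function c_add, a C++ source might declare it as sketched below. The definition is only added to keep the sketch self-contained; normally it would be found in the compiled C library.

```cpp
// The declaration as a C++ source would see it (c_add is a hypothetical name).
extern "C" int c_add(int lhs, int rhs);

// Stand-in for the implementation that would normally live in the C library.
extern "C" int c_add(int lhs, int rhs)
{
    return lhs + rhs;
}
```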
This declaration is analogous to a declaration in C, except that the prototype is prefixed with extern
"C". A slightly different way to declare C functions is the following:
extern "C"
{
// C-declarations go in here
}
It is also possible to place preprocessor directives at the location of the declarations. E.g., a C header
file myheader.h which declares C functions can be included in a C++ source file as follows:
extern "C"
{
#include <myheader.h>
}
Although these two approaches may be used, they are actually seldom encountered in C++ sources. We
will encounter a more frequently used method to declare external C functions in the next section.
The combination of the predefined symbol __cplusplus and the possibility to define extern "C"
functions offers the ability to create header files for both C and C++. Such a header file might, e.g.,
declare a group of functions which are to be used in both C and C++ programs.
#ifdef __cplusplus
extern "C"
{
#endif

    /* declarations of C data and functions are inserted here */

#ifdef __cplusplus
}
#endif
Using this setup, a normal C header file is enclosed by extern "C" { which occurs near the top of the
file and by }, which occurs near the bottom of the file. The #ifdef directives test for the type of the
compilation: C or C++. The ‘standard’ C header files, such as stdio.h, are built in this manner and
are therefore usable for both C and C++.
In addition C++ headers should support include guards. In C++ it is usually undesirable to include
the same header file twice in the same source file. Such multiple inclusions can easily be avoided by
including an #ifndef directive in the header file. For example:
#ifndef MYHEADER_H_
#define MYHEADER_H_
    // declarations of the header file are inserted here
#endif
When this file is initially scanned by the preprocessor, the symbol MYHEADER_H_ is not yet defined. The
#ifndef condition succeeds and all declarations are scanned. In addition, the symbol MYHEADER_H_
is defined.
When this file is scanned next while compiling the same source file, the symbol MYHEADER_H_ has been
defined and consequently all information between the #ifndef and #endif directives is skipped by
the compiler.
In this context the symbol name MYHEADER_H_ serves only for recognition purposes. E.g., the name of
the header file can be used for this purpose, in capitals, with an underscore character instead of a dot.
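Combining the include guard with the extern "C" setup, the contents of a header file usable from both C and C++ might be sketched as follows. The function my_sum is hypothetical, and its definition is only added so that the sketch is self-contained; normally it would be part of the compiled C code.

```cpp
#ifndef MYHEADER_H_
#define MYHEADER_H_

#ifdef __cplusplus
extern "C"
{
#endif

int my_sum(int lhs, int rhs);       /* hypothetical C function */

#ifdef __cplusplus
}
#endif

#endif

// Stand-in for the C implementation, which would normally be compiled
// separately by a C compiler.
extern "C" int my_sum(int lhs, int rhs)
{
    return lhs + rhs;
}
```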
Apart from all this, the custom has evolved to give C header files the extension .h, and to give C++
header files no extension. For example, the standard iostreams cin, cout and cerr are available
after including the header file iostream, rather than iostream.h. In the Annotations this convention
is used with the standard C++ header files, but not necessarily everywhere else.
There is more to be said about header files. Section 7.9 provides an in-depth discussion of the preferred
organization of C++ header files.
In C (prior to the C99 standard) local variables can only be defined at the top of a function or at the beginning of a nested block. In
C++ local variables can be created at any position in the code, even between statements.
Furthermore, local variables can be defined within some statements, just prior to their usage. A typical
example is the for statement:
#include <stdio.h>
int main()
{
for (int i = 0; i < 20; ++i)
printf("%d\n", i);
}
In this program the variable i is created in the initialization section of the for statement. According
to the ANSI-standard, the variable does not exist prior to the for-statement and not beyond the
for-statement. With some older compilers, the variable continues to exist after the execution of the
for-statement, but nowadays using it beyond the for-statement results in a warning like
warning: name lookup of ‘i’ changed for new ANSI ‘for’ scoping using obsolete binding at ‘i’
or in a compilation error.
The implication seems clear: define a variable just before the for-statement if it is to be used beyond
that statement. Otherwise the variable should be defined inside the for-statement itself. This reduces
its scope as much as possible, which is a very desirable characteristic.
Defining local variables when they’re needed requires a little getting used to. However, eventually it
tends to produce more readable, maintainable and often more efficient code than defining variables
at the beginning of compound statements. We suggest the following rules of thumb for defining local
variables:
• Local variables should be created at ‘intuitively right’ places, such as in the example above. This
does not only entail the for-statement, but also all situations where a variable is only needed,
say, half-way through the function.
• More generally, variables should be defined in such a way that their scope is as limited and
localized as possible. Where possible, local variables should not be defined at the beginning of
functions but rather where they’re first used.
• It is considered good practice to avoid global variables. It is fairly easy to lose track of which global
variable is used for what purpose. In C++ global variables are seldom required, and by localizing
variables the well known phenomenon of using the same variable for multiple purposes, thereby
invalidating each individual purpose of the variable, can easily be prevented.
If considered appropriate, nested blocks can be used to localize auxiliary variables. However, situations
exist where local variables are considered appropriate inside nested statements. The just mentioned
for statement is of course a case in point, but local variables can also be defined within the condition
clauses of if-else statements, within selection clauses of switch statements and condition clauses of
while statements. Variables thus defined will be available to the full statement, including its nested
statements. For example, consider the following switch statement:
#include <stdio.h>

int main()
{
    switch (int c = getchar())
    {
        case 'a':
        case 'e':
        case 'i':
        case 'o':
        case 'u':
            printf("Saw vowel %c\n", c);
        break;

        case EOF:
            printf("Saw EOF\n");
        break;

        default:
            printf("Saw other character, hex value 0x%2x\n", c);
    }
}
Note the location of the definition of the variable c: it is defined in the expression part of the switch
statement. This implies that c is available only to the switch statement itself, including its nested
(sub)statements, but not outside the scope of the switch.
The same approach can be used with if and while statements: a variable that is defined in the
condition part of an if and while statement is available in their nested statements. There are some
caveats, though:
• The variable definition must result in a variable which is initialized to a numeric or logical value;
• The variable definition cannot be nested (e.g., using parentheses) within a more complex expres-
sion.
The first caveat should come as no big surprise: in order to be able to evaluate the logical
condition of an if or while statement, the value of the variable must be interpretable as either
zero (false) or non-zero (true). Usually this is no problem, but in C++ objects (like objects of the type
std::string (cf. chapter 5)) are often returned by functions. Such objects may or may not be
interpretable as numeric values. If not (as is the case with std::string objects), then such variables
cannot be defined in the condition or expression clauses of condition or repetition statements. The
following example will therefore not compile:
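A sketch of such a situation follows. The functions getString and process are hypothetical, and the offending statement is shown as a comment so that the remainder of the sketch compiles:

```cpp
#include <cstddef>
#include <string>

std::string getString()             // hypothetical function returning an object
{
    return "hello";
}

std::size_t process()
{
    // if (std::string text = getString())  // won't compile: a std::string
    //     ...                              // is not interpretable as true/false

    std::string text = getString();         // define the object first,
    if (!text.empty())                      // then test one of its properties
        return text.length();
    return 0;
}
```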
The second caveat requires additional clarification. Often a variable can profitably be given local scope,
but an extra check is required immediately following its initialization. The initialization and the test
cannot both be combined in one expression. Instead two nested statements are required. Consequently,
the following example won’t compile either:
If such a situation occurs, either use two nested if statements, or localize the definition of int c using
a nested compound statement:
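Both alternatives might be sketched as follows. The function names are hypothetical, the parameter input stands in for a character just read (e.g., by getchar()), and the combined form that does not compile is shown as a comment:

```cpp
#include <cstring>

bool isVowelNested(int input)
{
    // if ((int c = input) && strchr("aeiou", c))   // won't compile

    if (int c = input)                      // first if: defines and tests c
    {
        if (std::strchr("aeiou", c))        // second if: the extra check
            return true;
    }
    return false;
}

bool isVowelBlock(int input)
{
    {                                       // nested compound statement
        int c = input;                      // localizing c
        if (c && std::strchr("aeiou", c))
            return true;
    }
    return false;
}
```

In both versions c is tested for being non-zero first; this matters here as strchr also finds the terminating 0 of the string it searches.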
The keyword typedef is still allowed in C++, but is not required anymore when defining union,
struct or enum types. This is illustrated in the following example:
struct somestruct
{
int a;
double d;
char string[80];
};
When a struct, union or other compound type is defined, the tag of this type can be used as type
name (this is somestruct in the above example):
somestruct what;
what.d = 3.1415;
In C++ we may define functions as members of structs. Here we encounter the first concrete example
of an object: as previously described (see section 2.4), an object is a structure containing data while
specialized functions exist to manipulate those data.
A definition of a struct Point is provided by the code fragment below. In this structure, two int
data fields and one function draw are declared.
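Such a definition might look as sketched below. Only the names Point and draw and the presence of two int fields are given by the text; the field names x and y are assumptions, chosen to match the way the structure is used further on.

```cpp
struct Point                // definition of a screen dot
{
    int x;                  // the coordinates:
    int y;                  // x and y
    void draw();            // the drawing function: declared only
};
```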
A similar structure could be part of a painting program and could, e.g., represent a pixel. With respect
to this struct it should be noted that:
• The function draw mentioned in the struct definition is a mere declaration. The actual code
of the function defining the actions performed by the function is found elsewhere (the concept of
functions inside structs is further discussed in section 3.2).
• The size of the struct Point is equal to the size of its two ints. A function declared inside the
structure does not affect its size. The compiler implements this behavior by allowing the function
draw to be available only in the context of a Point.
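Point a = {10, 10};         // a Point at (10, 10)
Point b;                    // and another one
a.draw();                   // draw the first Point
b = a;                      // copy a to b
b.y = 20;                   // redefine y-coord
b.draw();                   // and draw it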
As shown in the above example a function that is part of the structure may be selected using the dot (.)
operator (the arrow (->) operator is used when pointers to objects are available). This is therefore
identical to the way data fields of structures are selected.
The idea behind this syntactic construction is that several types may contain functions having identical
names. E.g., a structure representing a circle might contain three int values: two values for the
coordinates of the center of the circle and one value for the radius. Analogously to the Point structure,
a Circle may now have a function draw to draw the circle.
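A sketch of two such types follows. The field names are assumptions, and the draw functions return a description rather than performing actual drawing, merely to keep the sketch small and verifiable:

```cpp
#include <string>

struct Point
{
    int x;
    int y;
    std::string draw() const        // Point's own draw function
    {
        return "point";
    }
};

struct Circle
{
    int x;                          // the center of the circle
    int y;
    int radius;
    std::string draw() const        // Circle's draw: same name, other type
    {
        return "circle";
    }
};
```

Which draw function is called depends on the type of the object it is selected from, so the identical names do not conflict.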
Chapter 3
A First Impression of C++
In this chapter C++ is further explored. The possibility to declare functions in structs is illustrated
in various examples; the concept of a class is introduced; casting is covered in detail; many new types
are introduced and several important notational extensions to C are discussed.
3.1 Extensions to C
Before we continue with the ‘real’ object-approach to programming, we first introduce some extensions
to the C programming language: not mere differences between C and C++, but syntactic constructs
and keywords not found in C.
3.1.1 Namespaces
C++ introduces the notion of a namespace: all symbols are defined in a larger context, called a names-
pace. Namespaces are used to avoid name conflicts that could arise when a programmer would like to
define a function like sin operating on degrees, but does not want to lose the capability of using the
standard sin function, operating on radians.
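A minimal sketch of this situation follows. The namespace name Degrees is hypothetical; the standard function remains available as std::sin.

```cpp
#include <cmath>

namespace Degrees                   // hypothetical namespace
{
    // sin operating on degrees, implemented using the standard sin function
    double sin(double degrees)
    {
        double const pi = 3.14159265358979323846;
        return std::sin(degrees * pi / 180);
    }
}
```

The two functions can now be used side by side: Degrees::sin(90) and std::sin(1.57) refer to different functions.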
Namespaces are covered extensively in chapter 4. For now it should be noted that most compilers
require the explicit declaration of a standard namespace: std. So, unless otherwise indicated, it is
stressed that all examples in the Annotations now implicitly use the
    using namespace std;
declaration. So, if you actually intend to compile examples given in the C++ Annotations, make sure
that the sources start with the above using declaration.
3.1.2 The scope resolution operator ::
C++ introduces several new operators, among which the scope resolution operator (::). This operator
can be used in situations where a global variable exists having the same name as a local variable:
#include <stdio.h>

int counter = 50;                       // global variable

int main()
{
    for (int counter = 1;               // this refers to the
         counter < 10;                  // local variable
         counter++)
    {
        printf("%d\n",
                ::counter               // global variable
                /                       // divided by
                counter);               // local variable
    }
}
In the above program the scope operator is used to address a global variable instead of the local variable
having the same name. In C++ the scope operator is used extensively, but it is seldom used to reach
a global variable shadowed by an identically named local variable. Its main purpose is described in
chapter 7.
Even though the keyword const is part of the C grammar, its use is more important and much more
common in C++ than it is in C.
The const keyword is a modifier stating that the value of a variable or of an argument may not be
modified. In the following example the intent is to change the value of a variable ival, which fails:
int main()
{
    int const ival = 3;     // a constant int
                            // initialized to 3
    ival = 4;               // assignment produces
                            // an error message
}
This example shows how ival may be initialized to a given value in its definition; attempts to change
the value later (in an assignment) are not permitted.
Variables that are declared const can, in contrast to C, be used to specify the size of an array, as in the
following example:
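A minimal sketch (the names bufferSize, buffer and bufferBytes are chosen for illustration only):

```cpp
#include <cstddef>

int const bufferSize = 20;          // a constant int, known at compile time
char buffer[bufferSize];            // an array of 20 chars

std::size_t bufferBytes()           // reports the size of the array
{
    return sizeof(buffer);
}
```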
Another use of the keyword const is seen in the declaration of pointers, e.g., in pointer-arguments. In
the declaration
    char const *buf;
buf is a pointer variable pointing to chars. Whatever is pointed to by buf may not be changed through
buf: the chars are declared as const. The pointer buf itself however may be changed. A statement
like *buf = 'a'; is therefore not allowed, while ++buf is.
In the declaration
    char *const buf;
buf itself is a const pointer which may not be changed. Whatever chars are pointed to by buf may
be changed at will.
Finally, the declaration
    char const *const buf;
is also possible; here, neither the pointer nor what it points to may be changed.
The rule of thumb for the placement of the keyword const is the following: whatever occurs to the left
of the keyword may not be changed.
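The rule can be tried out using the following sketch. The function constDemo and its variables are hypothetical; statements that the compiler rejects are shown as comments.

```cpp
// Returns the number of checks that hold (3 if all of them do).
int constDemo()
{
    char text[] = "hello";
    char other[] = "world";

    char const *first = text;           // char is const, the pointer is not
    ++first;                            // OK: the pointer may change
    // *first = 'a';                    // error: the chars may not change

    char *const second = text;          // the pointer is const, char is not
    *second = 'j';                      // OK: the chars may change
    // second = other;                  // error: the pointer may not change

    char const *const third = other;    // neither may be changed
    // ++third;                         // error
    // *third = 'a';                    // error

    return (*first == 'e') + (text[0] == 'j') + (*third == 'w');
}
```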
Although simple, this rule of thumb is often used. For example, Bjarne Stroustrup states (in http://www.research.att.co
But applying the simple ‘before’ placement rule for the keyword const may produce unexpected (i.e.,
unwanted) results, as we will shortly see below. Furthermore, the
‘idiomatic’ before-placement also conflicts with the notion of const functions, which we will encounter
in section 7.5. With const functions the keyword const is also placed behind rather than before the
name of the function.
The definition or declaration (whether or not containing const) should always be read from the variable
or function identifier back to the type identifier. E.g., char const *const buf is read as:
    buf is a const pointer to const chars
This rule of thumb is especially useful in cases where confusion may occur. In examples of C++ code
published in other places one often encounters the reverse: const preceding what should not be altered.
That this may result in sloppy code is indicated by the following declaration:
    char const *buf;
What must remain constant here? According to the sloppy interpretation, the pointer cannot be altered
(as const precedes the pointer). In fact, the char values are the constant entities here, as becomes
clear when we try to compile the following program:
int main()
{
    char const *buf = "hello";

    ++buf;                  // accepted by the compiler
    *buf = 'u';             // rejected by the compiler
}
Compilation fails on the statement *buf = 'u'; and not on the statement ++buf.
Marshall Cline’s C++ FAQ (http://www.parashift.com/c++-faq-lite/const-correctness.html) gives the
same rule (paragraph 18.5) in a similar context:
[18.5] What’s the difference between "const Fred* p", "Fred* const p" and "const Fred* const
p"?
You have to read pointer declarations right-to-left.
Marshall Cline’s advice might be improved, though: you should start to read pointer definitions (and
declarations) at the variable name, reading as far as possible to the definition’s end. Once you see
a closing parenthesis, read backwards (right to left) from the initial point, until you find the matching
open-parenthesis or the very beginning of the definition. For example, consider the following complex
declaration:
Here, we see:
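As an illustration of this reading order, a sketch using a hypothetical declaration follows (the name ip and the array sizes 2 and 3 are assumptions). The comment spells out the reading steps, and the type aliases rebuild the same type step by step so that the compiler can confirm the reading:

```cpp
#include <type_traits>

// Read from the name outwards:
//   ip             the variable ip is
//   (*ip)          a pointer to
//   (*ip)[2]       an array of 2
//   * const        const pointers to
//   (...)[3]       arrays of 3
//   char const *   pointers to const chars
char const *(* const (*ip)[2])[3] = nullptr;

// The same type, built up step by step.
using Inner    = char const *[3];       // array of 3 pointers to const chars
using ConstPtr = Inner *const;          // const pointer to such an array
using Outer    = ConstPtr (*)[2];       // pointer to an array of 2 of those

static_assert(std::is_same<decltype(ip), Outer>::value,
              "both spellings denote the same type");
```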
Analogous to C, C++ defines standard input- and output streams which are available when a program
is executed. The streams are:
• cout, analogous to stdout,
• cin, analogous to stdin,
• cerr, analogous to stderr.
Syntactically these streams are not used as functions: instead, data are written to streams or read
from them using the operators <<, called the insertion operator and >>, called the extraction operator.
This is illustrated in the next example:
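#include <iostream>

using namespace std;

int main()
{
    int ival;
    char sval[30];          // room for a word of at most 29 chars

    cout << "Enter a number:\n";
    cin >> ival;
    cout << "And now a string:\n";
    cin >> sval;

    cout << "The number is: " << ival << "\n"
            "And the string is: " << sval << '\n';
}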
This program reads a number and a string from the cin stream (usually the keyboard) and prints these
data to cout. With respect to streams, please note:
• The standard streams are declared in the header file iostream. In the examples in the C++
Annotations this header file is often not mentioned explicitly. Nonetheless, it must be included
(either directly or indirectly) when these streams are used. Comparable to the use of the using
namespace std; clause, the reader is expected to #include <iostream> with all the examples
in which the standard streams are used.
• The streams cout, cin and cerr are variables of so-called class-types. Such variables are com-
monly called objects. Classes are discussed in detail in chapter 7 and are used extensively in
C++.
• The stream cin extracts data from a stream and copies the extracted information to variables
(e.g., ival in the above example) using the extraction operator (two consecutive > characters:
>>). We will describe later how operators in C++ can perform quite different actions than what
they are defined to do by the language, as is the case here. Function overloading has already
been mentioned. In C++ operators can also have multiple definitions, which is called operator
overloading.
• The operators which manipulate cin, cout and cerr (i.e., >> and <<) also manipulate variables
of different types. In the above example cout << ival results in the printing of an integer value,
whereas cout << "Enter a number" results in the printing of a string. The actions of the
operators therefore depend on the types of supplied variables.
• The extraction operator (>>) performs a so called type safe assignment to a variable by ‘extracting’
its value from a text stream. Normally, the extraction operator skips all white space characters
preceding the values to be extracted.
• Special symbolic constants are used for special situations. Normally a line is terminated by
inserting "\n" or '\n'. But when inserting the endl symbol the line is terminated followed by
the flushing of the stream’s internal buffer. Thus, endl can usually be avoided in favor of '\n'
resulting in somewhat more efficient code.
The stream objects cin, cout and cerr are not part of the C++ grammar proper. The streams are part
of the definitions in the header file iostream. This is comparable to functions like printf that are not
part of the C grammar, but were originally written by people who considered such functions important
and collected them in a run-time library.
A program may still use the old-style functions like printf and scanf rather than the new-style
streams. The two styles can even be mixed. But streams offer several clear advantages and in many
C++ programs have completely replaced the old-style C functions. Some advantages of using streams
are:
• Using insertion and extraction operators is type-safe. The format strings which are used with
printf and scanf can define wrong format specifiers for their arguments, for which the compiler
sometimes can’t warn. In contrast, argument checking with cin, cout and cerr is performed by
the compiler. Consequently it isn’t possible to err by providing an int argument in places where,
according to the format string, a string argument should appear. With streams there are no
format strings.
• The functions printf and scanf (and other functions using format strings) in fact implement a
mini-language which is interpreted run-time. In contrast, with streams the C++ compiler knows
exactly which in- or output action to perform given the arguments used. No mini-language here.
• In addition the possibilities of the insertion and extraction operators may be extended allowing
objects of classes that didn’t exist when the streams were originally designed to be inserted into
or extracted from streams. Mini languages as used with printf cannot be extended.
• The usage of the left-shift and right-shift operators in the context of the streams illustrates yet
another capability of C++: operator overloading, which allows us to redefine the actions an operator
performs in certain contexts. Coming from C, operator overloading takes some getting used to,
but after a short while these overloaded operators feel rather comfortable.
• Streams are independent of the media they operate upon. This (at this point somewhat abstract)
notion means that the same code can be used without any modification at all to interface your code
to any kind of device. The code using streams can be used when the device is a file on disk; an
Internet connection; a digital camera; a DVD device; a satellite link; and much more: you name
it. Streams allow your code to be decoupled (independent) of the devices your code is supposed to
operate on, which eases maintenance and allows reuse of the same code in new situations.
The iostream library has a lot more to offer than just cin, cout and cerr. In chapter 6 iostreams
will be covered in greater detail. Even though printf and friends can still be used in C++ programs,
streams have practically replaced the old-style C I/O functions like printf. If you think you still
need to use printf and related functions, think again: in that case you’ve probably not yet completely
grasped the possibilities of stream objects.
Earlier it was mentioned that functions can be part of structs (see section 2.5.13). Such functions are
called member functions. This section briefly discusses how to define such functions.
The code fragment below shows a struct having data fields for a person’s name and address. A
function print is included in the struct’s definition:
struct Person
{
char name[80];
char address[80];
void print();
};
When defining the member function print the structure’s name (Person) and the scope resolution
operator (::) are used:
void Person::print()
{
    cout << "Name: " << name << "\n"
            "Address: " << address << '\n';
}
The implementation of Person::print shows how the fields of the struct can be accessed without
using the structure’s type name. Here the function Person::print prints a variable name. Since
Person::print is itself a part of struct Person, the variable name implicitly refers to the same
type.
The struct Person can be used as follows:
Person person;
strcpy(person.name, "Karel");
strcpy(person.address, "Marskramerstraat 33");
person.print();
The advantage of member functions is that the called function automatically accesses the data fields
of the structure for which it was invoked. In the statement person.print() the object person is the
‘substrate’: the variables name and address that are used in the code of print refer to the data stored
in the person object.
As mentioned before (see section 2.3), C++ contains specialized syntactic possibilities to implement
data hiding. Data hiding is the capability of sections of a program to hide their data from other sections.
This results in very clean data definitions. It also allows these sections to enforce the integrity of their
data.
C++ has three keywords that are related to data hiding: private, protected and public. These
keywords can be used in the definition of structs. The keyword public allows all subsequent fields of
a structure to be accessed by all code; the keyword private only allows code that is part of the struct
itself to access subsequent fields. The keyword protected is discussed in chapter 13, and is somewhat
outside of the scope of the current discussion.
In a struct all fields are public, unless explicitly stated otherwise. Using this knowledge we can
expand the struct Person:
struct Person
{
private:
char d_name[80];
char d_address[80];
public:
void setName(char const *n);
void setAddress(char const *a);
void print();
char const *name();
char const *address();
};
As the data fields d_name and d_address are in a private section they are only accessible to the
member functions which are defined in the struct: these are the functions setName, setAddress
etc.. As an illustration consider the following code:
Person fbb;
Data integrity is implemented as follows: the actual data of a struct Person are mentioned in the
structure definition. The data are accessed by the outside world using special functions that are also
part of the definition. These member functions control all traffic between the data fields and other parts
of the program and are therefore also called ‘interface’ functions. The thus implemented data hiding
is illustrated in Figure 3.1. The members setName and setAddress are declared with char const
* parameters. This indicates that the functions will not alter the strings which are supplied as their
arguments. Analogously, the members name and address return char const *s: the compiler will
prevent callers of those members from modifying the information made accessible through the return
values of those members.
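Figure 3.1: Private data and public interface functions of the class Person.
Two examples of member functions of the struct Person are shown below:
void Person::setName(char const *n)
{
    strncpy(d_name, n, 79);     // copy at most 79 characters
    d_name[79] = 0;             // always terminate the string
}
char const *Person::name()
{
    return d_name;
}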
The power of member functions and of the concept of data hiding results from the abilities of member
functions to perform special tasks, e.g., checking the validity of the data. In the above example setName
copies only up to 79 characters from its argument to the data member d_name, thereby avoiding a buffer
overflow.
Another illustration of the concept of data hiding is the following. As an alternative to member func-
tions that keep their data in memory a library could be developed featuring member functions storing
data on file. To convert a program which stores Person structures in memory to one that stores the
data on disk no special modifications would be required. After recompilation and linking the program to
a new library it will have converted from storage in memory to storage on disk. This example illustrates
a broader concept than data hiding; it illustrates encapsulation. Data hiding is a kind of encapsulation.
Encapsulation in general results in reduced coupling of different sections of a program. This in turn
greatly enhances reusability and maintainability of the resulting software. By having the structure
encapsulate the actual storage medium the program using the structure becomes independent of the
actual storage medium that is used.
Though data hiding can be implemented using structs, more often (almost always) classes are used
instead. A class is a kind of struct, except that a class uses private access by default, whereas structs
use public access by default. The definition of a class Person is therefore identical to the one shown
above, except for the fact that the keyword class has replaced struct while the initial private:
clause can be omitted. Our typographic suggestion for class names (and other type names defined by
the programmer) is to start with a capital character to be followed by the remainder of the type name
using lower case letters (e.g., Person).
In this section we’ll discuss an important difference between C and C++ structs and (member) func-
tions. In C it is common to define several functions to process a struct, which then require a pointer
to the struct as one of their arguments. An imaginary C header file showing this concept is:
/* print information */
void print(PERSON const *p);
/* etc.. */
In C++, the declarations of the involved functions are put inside the definition of the struct or class.
The argument denoting which struct is involved is no longer needed.
class Person
{
char d_name[80];
char d_address[80];
public:
void initialize(char const *nm, char const *adr);
void print();
// etc..
};
In C++ the struct parameter is not used. A C function call such as:
PERSON x;
initialize(&x, "some name", "some address");
becomes in C++:
Person x;
x.initialize("some name", "some address");
3.3.1 References
In addition to the common ways to define variables (plain variables or pointers) C++ introduces refer-
ences defining synonyms for variables. A reference to a variable is like an alias; the variable and the
reference can both be used in statements involving the variable:
int int_value;
int &ref = int_value;
In the above example a variable int_value is defined. Subsequently a reference ref is defined, which
(due to its initialization) refers to the same memory location as int_value. In the definition of ref, the
reference operator & indicates that ref is not itself an int but a reference to one. The two statements
++int_value;
++ref;
have the same effect: they increment int_value’s value. Whether that location is called int_value
or ref does not matter.
References serve an important function in C++ as a means to pass modifiable arguments to functions.
E.g., in standard C, a function that increases the value of its argument by five and returns nothing
needs a pointer parameter:
void increase(int *valp) { *valp += 5; }    // expects a pointer to an int
int main()
{
    int x = 0;
    increase(&x);                           // pass x's address
}
This construction can also be used in C++ but the same effect is also achieved using a reference:
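void increase(int &valr) { valr += 5; }     // expects a reference to an int
int main()
{
    int x = 0;
    increase(x);                            // x itself is passed and modified
}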
It is arguable whether code such as the above should be preferred over C’s method, though. The
statement increase (x) suggests that not x itself but a copy is passed. Yet the value of x changes
because of the way increase() is defined. However, references can also be used to pass objects that
are only inspected (without the need for a copy or a const *) or to pass objects whose modification is an
accepted side-effect of their use. In those cases using references is strongly preferred over existing
alternatives like copy by value or passing pointers.
Behind the scenes references are usually implemented using pointers. So, as far as the compiler is
concerned, references in C++ are much like const pointers. With references, however, the programmer does not need
to know or to bother about levels of indirection. An important distinction between plain pointers and
references is of course that with references no indirection takes place. For example:
• In those situations where a function does not alter its parameters of a built-in or pointer type,
value parameters can be used:
void show(int val)
{
    cout << val << '\n';
}
int main()
{
    int x = 0;
    show(x);    // a copy is passed: x is not changed
}
• When a function explicitly must change the values of its arguments, a pointer parameter is pre-
ferred. These pointer parameters should preferably be the function’s initial parameters. This is
called return by argument.
• When a function doesn’t change the value of its class- or struct-type arguments, or if the modi-
fication of the argument is a trivial side-effect (e.g., the argument is a stream) references can be
used. Const-references should be used if the function does not modify the argument:
int main ()
{
int x = 7;
by_pointer(&x); // a pointer is passed
// x might be changed
string str("hello");
by_reference(str); // str is not altered
}
References play an important role in cases where the argument is not changed by the function
but where it is undesirable to copy the argument to initialize the parameter. Such a situation
occurs when a large object is passed as argument, or is returned by the function. In these cases
the copying operation tends to become a significant factor, as the entire object must be copied. In
these cases references are preferred.
If the argument isn’t modified by the function, or if the caller shouldn’t modify the returned
information, the const keyword should be used. Consider the following example:
                            // printperson expects a
                            // reference to a structure
                            // but won't change it
void printperson(Person const &p)
{
    cout << "Name: " << p.name << '\n' <<
            "Address: " << p.address << '\n';
}
                            // get a person by index value; person is
                            // assumed to be a global array of Person objects
Person const &personAt(int index)
{
    return person[index];   // a reference is returned,
}                           // not a copy of person[index]
int main()
{
    Person boss;
    printperson(boss);          // no pointer is passed and boss is
                                // not altered by the function
    printperson(personAt(5));   // references, not copies, are passed
}
• Furthermore, note that there is yet another reason for using references when passing objects as
function arguments. When passing a reference to an object, the activation of a so called copy
constructor is avoided. Copy constructors will be covered in chapter 8.
References could result in extremely ‘ugly’ code. A function may return a reference to a variable, as in
the following example:
int &func()
{
static int value;
return value;
}
This allows constructions like the following:
func() = 20;
func() += func();
It is probably superfluous to note that such constructions should normally not be used. Nonetheless,
there are situations where it is useful to return a reference. We have actually already seen an example
of this phenomenon in our previous discussion of streams. In a statement like cout << "Hello" <<
'\n'; the insertion operator returns a reference to cout. So, in this statement first the "Hello" is
inserted into cout, producing a reference to cout. Through this reference the '\n' is then inserted in
the cout object, again producing a reference to cout, which is then ignored.
Several differences between pointers and references are pointed out in the list below:
• A reference cannot exist by itself, i.e., without something to refer to. A declaration of a reference
like
int &ref;
is not allowed: ref would have nothing to refer to.
• References may exist as parameters of functions: they are initialized when the function is called.
• References may be used in the return types of functions. In those cases the function determines
what the return value will refer to.
• References may be used as data members of classes. We will return to this usage later.
• Pointers are variables by themselves. They point at something concrete or just “at nothing”.
• References are aliases for other variables and cannot be re-aliased to another variable. Once a
reference is defined, it refers to its particular variable.
• Pointers (except for const pointers) can be reassigned to point to different variables.
• When an address-of operator & is used with a reference, the expression yields the address of the
variable to which the reference applies. In contrast, ordinary pointers are variables themselves,
so the address of a pointer variable has nothing to do with the address of the variable pointed to.
In C++, temporary (rvalue) values are indistinguishable from const & types. The C++0x standard
adds a new reference type called an rvalue reference, defined as typename &&.
The name rvalue reference is derived from assignment statements, where the variable to the left of the
assignment operator is called an lvalue and the expression to the right of the assignment operator is
called an rvalue. Rvalues are often temporary (or anonymous) values, like values returned by functions.
In this parlance the C++ reference should be considered an lvalue reference (using the notation typename
&). They can be contrasted to rvalue references (using the notation typename &&).
The key to understanding rvalue references is the concept of the anonymous variable. An anonymous variable has no
name and this is the distinguishing feature for the compiler to associate it automatically with an rvalue
reference if it has a choice. Before introducing some interesting and new constructions that weren’t
available before C++0x let’s first have a look at some distinguishing applications of lvalue references.
The following function returns a temporary (anonymous) value:
int intVal()
{
return 5;
}
Although the return value of intVal can be assigned to an int variable it requires a copying operation,
which might become prohibitive when a function does not return an int but instead some large object.
A reference or pointer cannot be used either to collect the anonymous return value as the return value
won’t survive beyond that. So the following is illegal (as noted by the compiler):
Apparently it is not possible to modify the temporary returned by intVal. But now consider the next
function:
int main()
{
receive(18);
int value = 5;
receive(value);
receive(intVal());
}
It shows the compiler selecting receive(int &&value) in all cases where it receives an anony-
mous int as its argument. Note that this includes receive(18): a value 18 has no name and thus
receive(int &&value) is called. Internally, it actually uses a temporary variable to store the 18, as
is shown by the following example which modifies receive:
Contrasting receive(int &value) with receive(int &&value) has nothing to do with int &value
not being a const reference. If receive(int const &value) is used the same results are obtained.
Bottom line: the compiler selects the overloaded function using the rvalue reference if the function is
passed an anonymous value.
The compiler runs into problems if void receive(int &value) is replaced by void receive(int
value), though. When confronted with the choice between a value parameter and a reference param-
eter (either lvalue or rvalue) it cannot make a decision and reports an ambiguity. In practical contexts
this is not a problem. Rvalue references were added to the language in order to be able to distinguish the
two forms of references: named values (for which lvalue references are used) and anonymous values
(for which rvalue references are used).
It is this distinction that allows the implementation of move semantics and perfect forwarding. At
this point the concept of move semantics cannot yet fully be discussed (but see section 8.6 for a more
thorough discussion) but it is very well possible to illustrate the underlying ideas.
Consider the situation where a function returns a struct Data containing a pointer to dynamically al-
located characters. Moreover, the struct defines a member function copy(Data const &other) that
takes another Data object and copies the other’s data into the current object. The (partial) definition of
the struct Data might look like this (see footnote 2):
struct Data
{
    char *text;
    size_t size;
    void copy(Data const &other)
    {
        text = strdup(other.text);
        size = strlen(text);
    }
};
Data dataFactory(char const *txt)
{
    Data ret = {strdup(txt), strlen(txt)};
    return ret;
}
int main()
{
    Data d1 = {strdup("hello"), strlen("hello")};
    Data d2;
    d2.copy(d1);                        // 1 (see text)
    Data d3;
    d3.copy(dataFactory("hello"));      // 2
}
Footnote 2: To the observant reader: in this example the memory leak that results from using Data::copy() should be ignored.
At (1) d2 appropriately receives a copy of d1’s text. But at (2) d3 receives a copy of the text stored in
the temporary returned by the dataFactory function. As the temporary ceases to exist after the call
to copy() two related and unpleasant consequences are observed:
• The return value is a temporary object: its only reason for existence is to pass its data on to d3.
Now d3 copies the temporary’s data which clearly is somewhat overdone.
• The temporary Data object is lost following the call to copy(). Unfortunately its dynamically
allocated data is lost as well resulting in a memory leak.
In cases like these an rvalue reference should be used. By overloading the copy member with a member
copy(Data &&other) the compiler is able to distinguish situations (1) and (2). It now calls the initial
copy() member in situation (1) and the newly defined overloaded copy() member in situation (2):
struct Data
{
    char *text;
    size_t size;
    void copy(Data const &other)
    {
        text = strdup(other.text);
        size = other.size;
    }
    void copy(Data &&other)
    {
        text = other.text;
        size = other.size;
        other.text = 0;
    }
};
Note that the overloaded copy() function merely moves the other.text pointer to the current object’s
text pointer followed by reassigning 0 to other.text. Struct Data suddenly has become move-
aware and implements move semantics, removing the drawbacks of the previously shown approach:
• Instead of making a deep copy (which is required in situation (1)), the pointer value is simply
moved to its new owner;
• Since the other.text doesn’t point to dynamically allocated memory anymore the memory leak
is prevented.
Rvalue references for *this and initialization of class objects by rvalues are not yet supported by the
g++ compiler.
Standard ASCII-C strings are delimited by double quotes, supporting escape sequences like \n, \\
and \". In some cases it is useful to avoid escaping strings (e.g., in the context of XML). To this end,
the C++0x standard offers raw string literals.
Raw string literals start with an R, followed by a double quote, followed by an optional label (which is an arbitrary
sequence of characters not equal to [), followed by [. The raw string ends at the closing bracket ],
followed by the label which is in turn followed by a double quote. In the standard as eventually
published (C++11) the parentheses ( and ) are used instead of the brackets [ and ]. Example:
In the first case, everything between "[ and ]" is part of the string. Escape sequences aren’t supported
so \ " defines three characters: a backslash, a blank character and a double quote. The second example
shows a raw string defined between the markers "delimiter[ and ]delimiter".
Enumeration values in C++ are in fact int values, thereby bypassing type safety. E.g., values of
different enumeration types may be compared for (in)equality, albeit through a (static) type cast.
Another problem with the current enum type is that the names of its values are not restricted to the enum type
name itself, but to the scope where the enumeration is defined. As a consequence, two enumerations
having the same scope cannot define values having identical names.
In the C++0x standard these problems are solved by defining enum classes. An enum class can be
defined as in the following example:
Enum classes use int values by default, but the used value type can easily be changed using the :
type notation, as in:
To use a value defined in an enum class its enumeration name must be provided as well. E.g., OK is not
defined, CharEnum::OK is.
Using the data type specification (noting that it defaults to int) it is possible to use enum class forward
declarations. E.g.,
The C language defines the initializer list as a list of values enclosed by curly braces, possibly them-
selves containing initializer lists. In C these initializer lists are commonly used to initialize arrays and
structs.
C++ extends this concept in the C++0x standard by introducing the type initializer_list<Type>
where Type is replaced by the type name of the values used in the initializer list. Initializer lists
in C++ are, like their counterparts in C, recursive, so they can also be used with multi-dimensional
arrays, structs and classes.
Like in C, initializer lists consist of a list of values surrounded by curly braces. But unlike C, functions
can define initializer list parameters. E.g.,
The initializer list appears as an argument which is a list of values surrounded by curly braces. Due to
the recursive nature of initializer lists a two-dimensional series of values can also be passed, as shown
in the next example:
values2({{1, 2}, {2, 3}, {3, 5}, {4, 7}, {5, 11}, {6, 13}});
The values in an initializer list are constant and cannot be modified. However, the size and values of a list may
be retrieved using its size, begin, and end members as follows:
Initializer lists can also be used to initialize objects of classes (cf. section 7.3).
A special use of the keyword auto is defined by the C++0x standard allowing the compiler to determine
the type of a variable automatically rather than requiring the software engineer to define a variable’s
type explicitly.
In parallel, the use of auto as a storage class specifier is no longer supported in the C++0x standard.
According to that standard a variable definition like auto int var results in a compilation error.
Type deduction using auto can be very useful in situations where it is very hard to determine the variable’s type in advance.
These situations occur, e.g., in the context of templates, topics covered in chapters 18 through 22.
At this point in the Annotations only simple examples can be given, and some hints will be provided
about more general uses of the auto keyword.
When defining and initializing a variable int variable = 5 the type of the initializing expression
is well known: it’s an int, and unless the programmer’s intentions are different this could be used
to define variable’s type (although it shouldn’t in normal circumstances as it reduces rather than
improves the clarity of the code):
auto variable = 5;
Here are some examples where using auto is useful. In chapter 5 the iterator concept is introduced
(see also chapters 12 and 18). Iterators sometimes have long type definitions, like
std::vector<std::string>::const_reverse_iterator
Functions may return types like this. Since the compiler knows the types returned by functions we
may exploit this knowledge using auto. Assuming that a function begin() is declared as follows:
std::vector<std::string>::const_reverse_iterator begin();
Rather than writing the verbose variable definition (at // 1) a much shorter definition (at // 2) may
be used:
It’s easy to define additional variables of this type. When initializing those variables using iter the
auto keyword can be used again:
If start can’t be initialized immediately using an existing variable the type of a well known variable
or function can be used in combination with the decltype keyword, as in:
decltype(iter) start;
decltype(begin()) spare;
The keyword decltype may also receive an expression as its argument. This feature is already avail-
able in the C++0x standard implementation in g++ 4.3. E.g., decltype(3 + 5) represents an int,
decltype(3 / double(3)) represents double.
The auto keyword can also be used to postpone the definition of a function’s return type. The declara-
tion of a function intArrPtr returning a pointer to an array of 10 ints looks like this:
int (*intArrPtr())[10];
Such a declaration is fairly complex. E.g., among other complexities it requires ‘protection of the
pointer’ using parentheses in combination with the function’s parameter list. In situations like these
the specification of the return type can be postponed using the auto return type, followed by the speci-
fication of the function’s return type after any other specification the function might receive (e.g., as a
const member (cf. section 7.5) or following its exception throw list (cf. section 9.6)).
The auto keyword can also be used to define types that are related to the actual auto associated type.
Here are some examples:
vector<int> vi;
auto iter = vi.begin();     // standard: auto is vector<int>::iterator
auto &&rref = vi.begin();   // rref is an rvalue ref. to the iterator type
auto *ptr = &iter;          // ptr is a pointer to the iterator type
auto *ptr2 = &rref;         // same
In many cases, however, the initialization, condition and increment parts are fairly obvious as in situ-
ations where all elements of an array or vector must be processed. Many languages offer the foreach
statement for that and C++ offers the std::for_each generic algorithm (cf. section 19.1.17).
The C++0x standard adds a new for statement syntax to this. The new syntax can be used to process
each element of a range. Three types of ranges are distinguished:
In these cases the C++0x standard offers the following additional for-statement syntax:
Here an int &element is defined whose lifetime and scope are restricted to the for-statement.
It refers to each of the subsequent elements of array at each new iteration of the for-statement,
starting with the first element of the range.
In C the following built-in data types are available: void, char, short, int, long, float and
double. C++ extends these built-in types with several additional built-in types: the types bool,
wchar_t, long long and long double (Cf. ANSI/ISO draft (1995), par. 27.6.2.4.1 for examples
of these very long types). The type long long is an integral type that is at least as wide as long. The type long
double is a floating point type that is at least as precise as double. These built-in types as well as pointer variables are
called primitive types in the C++ Annotations.
Except for these built-in types the class-type string is available for handling character strings. The
datatypes bool, and wchar_t are covered in the following sections, the datatype string is covered in
chapter 5. Note that recent versions of C may also have adopted some of these newer data types (no-
tably bool and wchar_t). Traditionally, however, C doesn’t support them, hence they are mentioned
here.
Now that these new types are introduced, let’s refresh your memory about letters that can be used in
literal constants of various types. They are:
• E or e: the exponentiation character in floating point literal values. For example: 1.23E+3.
Here, E should be pronounced (and interpreted) as: times 10 to the power. Therefore, 1.23E+3
represents the value 1230.
• F can be used as postfix to a non-integral numeric constant to indicate a value of type float,
rather than double, which is the default. For example: 12.F (the dot transforms 12 into a
floating point value); 1.23E+3F (see the previous example. 1.23E+3 is a double value, whereas
1.23E+3F is a float value).
• L can be used as prefix to indicate a character string whose elements are wchar_t-type characters.
For example: L"hello world".
• L can be used as postfix to an integral value to indicate a value of type long, rather than
int, which is the default. Note that there is no letter indicating a short type. For that a
static_cast<short>() must be used.
• p, to specify the power in hexadecimal floating point numbers. E.g. 0x10p4. The exponent itself is
read as a decimal constant and can therefore not start with 0x. The exponent part is interpreted
as a power of 2. So 0x10p2 is (decimal) equal to 64: 16 * 2^2.
• U can be used as postfix to an integral value to indicate an unsigned value, rather than an int.
It may also be combined with the postfix L to produce an unsigned long int value.
And, of course: the x and a until f characters can be used to specify hexadecimal constants (optionally
using capital letters).
The type bool represents boolean (logical) values, for which the (now reserved) constants true and
false may be used. Except for these reserved values, integral values may also be assigned to vari-
ables of type bool, which are then implicitly converted to true and false according to the following
conversion rules (assume intValue is an int-variable, and boolValue is a bool-variable):
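The conversion rules can be sketched as follows. This is a minimal illustration, assuming the standard C++ rule that zero converts to false and any other value converts to true; the helper names toBool and toInt are hypothetical and merely spell out what the implicit conversions do.

```cpp
#include <cassert>

// Hypothetical helper: what 'boolValue = intValue' does implicitly.
bool toBool(int intValue)
{
    return intValue != 0;       // zero: false, anything else: true
}

// Hypothetical helper: what 'intValue = boolValue' does implicitly.
int toInt(bool boolValue)
{
    return boolValue ? 1 : 0;   // true: 1, false: 0
}

// A few checks documenting the rules.
void checkConversions()
{
    assert(toBool(0) == false);
    assert(toBool(-3) == true);
    assert(toInt(true) == 1);
    assert(toInt(false) == 0);
}
```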
Furthermore, when bool values are inserted into streams then true is represented by 1, and false
is represented by 0. Consider the following example:
cout << "A true value: " << true << "\n"
"A false value: " << false << '\n';
The bool data type is found in other programming languages as well. Pascal has its type Boolean;
Java has a boolean type. Different from these languages, C++’s type bool acts like a kind of int type.
It is primarily a documentation-improving type, having just two values true and false. Actually,
these values can be interpreted as enum values for 1 and 0. Doing so would ignore the philosophy
behind the bool data type, but nevertheless: assigning true to an int variable neither produces
warnings nor errors.
Using the bool-type is usually clearer than using int. Consider the following prototypes:
bool exists(char const *fileName);
int exists(char const *fileName);
For the first prototype, readers will expect the function to return true if the given filename is the name
of an existing file. However, using the second prototype some ambiguity arises: intuitively the return
value 1 is appealing, as it allows constructions like
if (exists("myfile"))
cout << "myfile exists";
On the other hand, many system functions (like access, stat, and many other) return 0 to indicate a
successful operation, reserving other values to indicate various types of errors.
As a rule of thumb I suggest the following: if a function should inform its caller about the success or
failure of its task, let the function return a bool value. If the function should return success or various
types of errors, let the function return enum values, documenting the situation by its various symbolic
constants. Only when the function returns a conceptually meaningful integral value (like the sum of
two int values), let the function return an int value.
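The rule of thumb may be illustrated by the following sketch. All names in it (ParseStatus, parseStatus, isNumber, sum) are hypothetical and were chosen for this illustration only: a yes/no answer is returned as a bool, different kinds of failure are returned as enum values, and an int is only returned for a value that is conceptually a number.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Success or various types of errors: symbolic enum values.
enum ParseStatus
{
    PARSE_OK,
    PARSE_EMPTY,
    PARSE_NOT_A_NUMBER
};

ParseStatus parseStatus(std::string const &text)
{
    if (text.empty())
        return PARSE_EMPTY;

    for (std::string::size_type idx = 0; idx != text.size(); ++idx)
    {
        if (!std::isdigit(static_cast<unsigned char>(text[idx])))
            return PARSE_NOT_A_NUMBER;
    }
    return PARSE_OK;
}

// Plain success or failure: a bool suffices.
bool isNumber(std::string const &text)
{
    return parseStatus(text) == PARSE_OK;
}

// A conceptually meaningful integral value: an int.
int sum(int lhs, int rhs)
{
    return lhs + rhs;
}
```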
The wchar_t type is an extension of the char built-in type, to accommodate wide character values (but
see also the next section). The g++ compiler reports sizeof(wchar_t) as 4, which easily accommodates
all Unicode character values (Unicode code points range up to 0x10FFFF).
Note that Java’s char data type is somewhat comparable to C++’s wchar_t type. Java’s char type is
2 bytes wide, though. On the other hand, Java’s byte data type is comparable to C++’s char type: one
byte. Confusing?
In C++ string literals can be defined as ASCII-Z C strings. Prepending an ASCII-Z string by L (e.g.,
L"hello") defines a wchar_t string literal.
The new C++0x standard adds to this support for 8, 16 and 32 bit Unicode encoded strings. Further-
more, two new data types are introduced: char16_t and char32_t storing, respectively, a UTF-16
and UTF-32 unicode value.
In addition, char will be large enough to contain any UTF-8 unicode value as well (i.e., it will remain
an 8-bit value).
String literals for the various types of unicode encodings (and associated variables) can be defined as
follows:
Alternatively, unicode constants may be defined using the \u escape sequence, followed by a hexadec-
imal value. Depending on the type of the unicode variable (or constant) a UTF-8, UTF-16 or UTF-32
value is used. E.g.,
Unicode strings can be delimited by double quotes but raw string literals can also be used.
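The definitions meant above are not shown here, so the following is merely a sketch of what such definitions may look like, assuming the prefixes as they were eventually standardized: u for UTF-16 and U for UTF-32 (the u8 prefix selects UTF-8). The variable names are hypothetical.

```cpp
#include <cassert>

// UTF-16 and UTF-32 encoded string literals.
char16_t const utf16[] = u"hello";
char32_t const utf32[] = U"hello";

// \u followed by four hexadecimal digits specifies a unicode value.
char16_t const eAcute16 = u'\u00e9';    // e with an acute accent
char32_t const eAcute32 = U'\u00e9';

// A raw unicode string literal: the backslash is not an escape here.
char32_t const rawPath[] = UR"(C:\temp)";

// Number of elements, including the terminating zero.
unsigned utf16Elements()
{
    return sizeof(utf16) / sizeof(char16_t);
}
```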
The C++0x standard adds the type long long int to the set of standard types. On 32 bit systems it
will have at least 64 usable bits. Some compilers already supported long long int as an extension,
but C++0x officially adds it to C++.
The size_t type is not really a built-in primitive data type, but a data type that is promoted by POSIX
as a typename to be used for non-negative integral values answering questions like ‘how much’ and ‘how
many’, in which case it should be used instead of unsigned int. It is not a specific C++ type, but also
available in, e.g., C. Usually it is defined implicitly when a (any) system header file is included. The
header file ‘officially’ defining size_t in the context of C++ is cstddef.
Using size_t has the advantage of being a conceptual type, rather than a standard type that is then
modified by a modifier. Thus, it improves the self-documenting value of source code.
Sometimes functions explicitly require unsigned int to be used. E.g., on amd-architectures the
X-windows function XQueryPointer explicitly requires a pointer to an unsigned int variable as one
of its arguments. In such situations a pointer to a size_t variable can’t be used, but the address of an
unsigned int must be provided. Such situations are exceptional, though.
Other useful bit-represented types also exist. E.g., uint32_t is guaranteed to hold 32-bits unsigned
values. Analogously, int32_t holds 32-bits signed values. Corresponding types exist for 8, 16 and 64
bits values. These types are defined in the header file cstdint.
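As a small sketch of how these types may be combined: below, size_t answers a ‘how many’ question, while int32_t is used where exactly 32 bits are wanted. The function name countZeros is hypothetical.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

// 'How many of the values are zero?': the answer is a size_t.
std::size_t countZeros(std::int32_t const *values, std::size_t nValues)
{
    std::size_t count = 0;

    for (std::size_t idx = 0; idx != nValues; ++idx)
    {
        if (values[idx] == 0)
            ++count;
    }
    return count;
}

// The exact-width types have exactly the advertised number of bits.
static_assert(sizeof(std::uint32_t) * CHAR_BIT == 32, "uint32_t: 32 bits");
static_assert(sizeof(std::int8_t) * CHAR_BIT == 8, "int8_t: 8 bits");
```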
Traditionally, C offers the following cast syntax:
(typename)expression
C style casts are now deprecated. Although C++ offers function call notations using the following
syntax:
typename(expression)
the function call notation does not in fact represent a cast, but a request to the compiler to construct
an (anonymous) variable having type typename from expression. Although this form is very often
used in C++, it should not be used for casting. Instead, there are now four new-style casts available,
that are introduced in the following sections.
3.5. A NEW SYNTAX FOR CASTS 45
The C++0x standard defines the shared_ptr type (cf. section 18.4). To cast shared pointers specialized
casts should be used. These are discussed in section 18.4.5.
static_cast<type>(expression)
This type of cast is used to convert, e.g., a double to an int: both are numbers, but as the int has no
fractions precision is potentially reduced. But the converse also holds true. When the quotient of two
int values must be assigned to a double the fraction part of the division will get lost unless a cast is
used.
Here is an example of such a cast (assuming quotient is of type double and lhs and rhs are
int-typed variables):
quotient = static_cast<double>(lhs) / rhs;
If the cast is omitted, the division operator will ignore the remainder as its operands are int expres-
sions. Note that the division should be put outside of the cast expression. If the division is put inside
(as in static_cast<double>(lhs / rhs)) an integer division will have been performed before the
cast has had a chance to convert the type of an operand to double.
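The difference between both placements of the cast can be made visible by a small sketch. The function names are hypothetical:

```cpp
#include <cassert>

// The cast is applied to one operand: a floating point division.
double quotientOf(int lhs, int rhs)
{
    return static_cast<double>(lhs) / rhs;
}

// The division is put inside the cast: an integer division is
// performed first, so the fraction is lost before the cast is applied.
double truncatedQuotientOf(int lhs, int rhs)
{
    return static_cast<double>(lhs / rhs);
}
```

Called with the arguments 7 and 2 the first function returns 3.5, the second one 3.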
Another nice example of code in which it is a good idea to use the static_cast<>()-operator is in
situations where the arithmetic assignment operators are used in mixed-typed expressions. Consider
the following expression (assume doubleVar is a variable of type double):
intVar += doubleVar;
This statement actually evaluates to:
intVar = static_cast<int>(static_cast<double>(intVar) +
doubleVar);
Here intVar is first promoted to a double, and is then added as a double value to doubleVar. Next,
the sum is cast back to an int. These two casts are a bit overdone. As long as intVar and doubleVar
do not have opposite signs the same result is obtained by explicitly casting doubleVar to an int,
thus obtaining an int-value for the right-hand side of the expression:
intVar += static_cast<int>(doubleVar);
A static_cast can also be used to undo or introduce the signed-modifier of an int-typed variable.
The C function tolower requires an int representing the value of an unsigned char. But char by
default is a signed type. To call tolower using an available char ch we should use:
tolower(static_cast<unsigned char>(ch))
Casts like these provide information to the compiler about how to handle the provided data. Very often
(especially with data types differing only in size but not in representation) the cast won’t require any
additional code. Additional code will be required, however, to convert one representation to another,
e.g., when converting double to int.
The const keyword has been given a special place in casting. Normally anything const is const for
a good reason. Nonetheless situations may be encountered where the const can be ignored. For these
special situations the const_cast should be used. Its syntax is:
const_cast<type>(expression)
The need for a const_cast may occur in combination with functions from the standard C library
which traditionally weren’t always as const-aware as they should. A function strfun(char *s) might
be available, performing some operation on its char *s parameter without actually modifying the
characters pointed to by s. Passing char const hello[] = "hello"; to strfun will produce a
compiler diagnostic stating that the const qualifier is discarded. This can be prevented using the call
strfun(const_cast<char *>(hello));
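Here is a minimal sketch of this situation. The body of strfun is an assumption (it merely counts characters); what matters is that its parameter is not const although the characters are only read. The function lengthOfHello is hypothetical as well.

```cpp
#include <cassert>
#include <cstddef>

// A function in the style of older libraries: s is not declared
// const, although the characters s points to are never modified.
std::size_t strfun(char *s)
{
    std::size_t length = 0;

    while (s[length] != 0)
        ++length;
    return length;
}

std::size_t lengthOfHello()
{
    char const hello[] = "hello";

    // strfun(hello) is rejected: the const would be discarded.
    // The cast is safe here as strfun does not modify its argument.
    return strfun(const_cast<char *>(hello));
}
```

Note that the cast is only acceptable because strfun does not write to its argument: modifying an object that was defined const results in undefined behavior.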
The third new-style cast is used to change the interpretation of information: the reinterpret_cast.
It is somewhat reminiscent of the static_cast, but reinterpret_cast should be used when it is
known that the information as defined in fact is or can be interpreted as something completely different.
Its syntax is:
reinterpret_cast<type>(expression)
Another example of a reinterpret_cast is found in combination with the write functions that are
available for files and streams. In C++ streams are the preferred interface to, e.g., files. Output streams
(like cout) offer write members having the prototype
To write a double to a stream using write a reinterpret_cast is needed as well. E.g., to write the
raw bytes of a variable double value to cout we would use:
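A sketch of such a call is given below. It assumes the standard prototype of the write member, which expects a char const * and the number of bytes to write. The functions writeRaw and readRaw are hypothetical. To make the result verifiable the bytes are written to a stream in memory rather than to cout, but any ostream can be passed.

```cpp
#include <cassert>
#include <cstring>
#include <ostream>
#include <sstream>
#include <string>

// Writes the raw bytes of value to out: the address of the double
// is reinterpreted as the address of a series of chars.
void writeRaw(std::ostream &out, double value)
{
    out.write(reinterpret_cast<char const *>(&value), sizeof(double));
}

// Restores a double from raw bytes written on the same computer.
double readRaw(std::string const &bytes)
{
    double value = 0;
    std::memcpy(&value, bytes.data(), sizeof(double));
    return value;
}
```

To write to the standard output stream writeRaw(std::cout, value) could be called, after including iostream.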
All casts are potentially dangerous, but the reinterpret_cast is the most dangerous of all casts.
Effectively we tell the compiler: back off, we know what we’re doing, so stop fussing. All bets are off,
and we’d better know what we’re doing in situations like these. As a case in point consider the
following code:
int value = 0x12345678;     // assume a 32-bits int
cout << "Value's first byte has value: " << hex <<
static_cast<int>(
*reinterpret_cast<unsigned char *>(&value)
);
The above code will show different results on little and big endian computers. Little endian computers
will show the value 78, big endian computers the value 12. Also note that the different represen-
tations used by little and big endian computers renders the previous example (cout.write(...))
non-portable over computers of different architectures.
As a rule of thumb: if circumstances arise in which casts have to be used, clearly document the reasons
for their use in your code, making double sure that the cast will not eventually cause a program to
misbehave.
Finally there is a new style cast that is used in combination with polymorphism (see chapter 14). Its
syntax is:
dynamic_cast<type>(expression)
It is used at run-time to convert a pointer to an object of a class to a pointer to an object of a class that is
found further down its so-called class hierarchy (which is also called a downcast). At this point in the
Annotations a dynamic_cast cannot yet be discussed extensively, but we will return to this topic in
section 14.5.1.
C++’s keywords are a superset of C’s keywords. Here is a list of all keywords of the language:
Notes:
• The export keyword is removed from the language under the C++0x standard, but remains a
keyword, reserved for future use.
• The nullptr keyword is defined in the C++0x standard (not yet supported by the g++ compiler).
• The operator keywords: and, and_eq, bitand, bitor, compl, not, not_eq, or, or_eq,
xor and xor_eq are symbolic alternatives for, respectively, &&, &=, &, |, ~, !, !=, ||,
|=, ^ and ^=.
Keywords can only be used for their intended purpose and cannot be used as names for other entities
(e.g., variables, functions, class-names, etc.). In addition to keywords identifiers starting with an un-
derscore and living in the global namespace (i.e., not using any explicit namespace or using the mere
:: namespace specification) or living in the std namespace are reserved identifiers in the sense that
their use is a prerogative of the implementor.
Chapter 4
Name Spaces
4.1 Namespaces
Imagine a math teacher who wants to develop an interactive math program. For this program functions
like cos, sin, tan etc. are to be used accepting arguments in degrees rather than arguments in
radians. Unfortunately, the function name cos is already in use, and that function accepts radians as
its arguments, rather than degrees.
Problems like these are usually solved by defining another name, e.g., the function name cosDegrees
is defined. C++ offers an alternative solution through namespaces. Namespaces can be considered as
areas or regions in the code in which identifiers are defined that normally won’t conflict with names
already defined elsewhere.
Now that the ANSI/ISO standard has been implemented to a large degree in recent compilers, the use
of namespaces is more strictly enforced than in previous versions of compilers. This affects the setup
of class header files. At this point in the Annotations this cannot be discussed in detail, but in section
7.9.1 the construction of header files using entities from namespaces is discussed.
Namespaces are defined using the following syntax:
namespace identifier
{
// declared or defined entities
// (declarative region)
}
Within the declarative region, introduced in the above code example, functions, variables, structs,
classes and even (nested) namespaces can be defined or declared. Namespaces cannot be defined
within a function body. However, it is possible to define a namespace using multiple namespace decla-
rations. Namespaces are ‘open’ meaning that a namespace CppAnnotations could be defined in a file
file1.cc and also in a file file2.cc. Entities defined in the CppAnnotations namespace of files
file1.cc and file2.cc are then united in one CppAnnotations namespace region. For example:
// in file1.cc
namespace CppAnnotations
{
double cos(double argInDegrees)
{
...
}
}
// in file2.cc
namespace CppAnnotations
{
double sin(double argInDegrees)
{
...
}
}
Both sin and cos are now defined in the same CppAnnotations namespace.
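In the above example the function bodies were left out. A compilable sketch, in which both parts of the namespace are for convenience put in a single source file, could look like this. The function bodies and the constant pi are assumptions, added for this illustration:

```cpp
#include <cassert>
#include <cmath>

namespace CppAnnotations        // first part (e.g., from file1.cc)
{
    double const pi = 3.14159265358979323846;

    double cos(double argInDegrees)
    {
        return std::cos(argInDegrees * pi / 180);
    }
}

namespace CppAnnotations        // second part (e.g., from file2.cc)
{
    double sin(double argInDegrees)
    {
        // pi was declared in the first part of the namespace
        return std::sin(argInDegrees * pi / 180);
    }
}
```

When the parts are really put in different source files, each source file only sees the entities that were declared before being used in that particular file.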
Namespace entities can be defined outside of their namespaces. This topic is discussed in section
4.1.4.1.
Instead of defining entities in a namespace, entities may also be declared in a namespace. This allows
us to put all the declarations in a header file that can thereupon be included in sources using the
entities defined in the namespace. Such a header file could contain, e.g.,
namespace CppAnnotations
{
double cos(double degrees);
double sin(double degrees);
}
Namespaces can be defined without a name. Such an anonymous namespace restricts the visibility of
the defined entities to the source file defining the anonymous namespace.
Entities defined in the anonymous namespace are comparable to C’s static functions and variables. In
C++ the static keyword can still be used, but its preferred use is in class definitions (see chapter 7).
In situations where in C static variables or functions would have been used the anonymous namespace
should be used in C++.
The anonymous namespace is a closed namespace: it is not possible to add entities to the same anony-
mous namespace using different source files.
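The following sketch shows an anonymous namespace in action. The names counter, increment and nextTicket are hypothetical. The variable and function in the anonymous namespace can only be reached from within the source file itself, whereas nextTicket can be called from other source files as well.

```cpp
#include <cassert>

namespace               // anonymous: visible in this source file only
{
    int counter = 0;    // in C: static int counter = 0;

    void increment()    // in C: static void increment()
    {
        ++counter;
    }
}

// An ordinary function that may be called from other source files.
int nextTicket()
{
    increment();        // no prefix is required
    return counter;
}
```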
Given a namespace and its entities, the scope resolution operator can be used to refer to its entities.
For example, the function cos() defined in the CppAnnotations namespace may be used as follows:
int main()
{
    cout << CppAnnotations::cos(60) << '\n';    // cosine of 60 degrees
}
This is a rather cumbersome way to refer to the cos() function in the CppAnnotations namespace,
especially so if the function is frequently used. In cases like these an abbreviated form can be used
after specifying a using declaration. Following
using CppAnnotations::cos;
calling cos will call the cos function defined in the CppAnnotations namespace. This implies that
the standard cos function, accepting radians, is not automatically called anymore. To call that latter
cos function the plain scope resolution operator should be used:
int main()
{
using CppAnnotations::cos;
...
cout << cos(60) // calls CppAnnotations::cos()
<< ::cos(1.5) // call the standard cos() function
<< '\n';
}
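A compilable variant of this example is sketched below. It assumes that the standard cos function is made available in the global namespace by including math.h; the body of CppAnnotations::cos and the names of the two wrapper functions are assumptions.

```cpp
#include <cassert>
#include <math.h>       // declares the standard cos in the global namespace

namespace CppAnnotations
{
    double cos(double argInDegrees)
    {
        return ::cos(argInDegrees * 3.14159265358979323846 / 180);
    }
}

double cosineOfSixtyDegrees()
{
    using CppAnnotations::cos;  // just the name, no prototype

    return cos(60);             // calls CppAnnotations::cos()
}

double cosineOfZeroRadians()
{
    return ::cos(0.0);          // calls the standard cos() function
}
```

As the using declaration is specified inside a function, it does not affect the rest of the source file.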
A using declaration can have restricted scope. It can be used inside a block. The using declaration
prevents the definition of entities having the same name as the one used in the using declaration. It
is not possible to specify a using declaration for a variable value in some namespace, and to define
(or declare) an identically named object in a block also containing a using declaration. Example:
int main()
{
using CppAnnotations::value;
...
cout << value << '\n'; // uses CppAnnotations::value
int value; // error: value already declared.
}
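A generalized alternative to the using declaration is the using directive:
using namespace CppAnnotations;
Following this directive, all entities defined in the CppAnnotations namespace are used as if they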
were declared by using declarations.
While the using directive is a quick way to import all the names of a namespace (assuming the
namespace has previously been declared or defined), it is at the same time a somewhat dirty way to do so, as
it is less clear what entity will be used in a particular block of code.
If, e.g., cos is defined in the CppAnnotations namespace, CppAnnotations::cos will be used when
cos is called. However, if cos is not defined in the CppAnnotations namespace, the standard cos
function will be used. The using directive does not document as clearly as the using declaration what
entity will be used. Therefore use caution when applying the using directive.
52 CHAPTER 4. NAME SPACES
If Koenig lookup were called the ‘Koenig principle’, it could have been the title of a new Ludlum novel.
However, it is not. Instead it refers to a C++ technicality.
‘Koenig lookup’ refers to the fact that if a function is called without specifying its namespace, then
the namespaces of its arguments are used to determine its namespace. If the namespace in which the
arguments are defined contains such a function, then that function is used. This is called the ‘Koenig
lookup’.
This is illustrated in the following example. The function FBB::fun(FBB::Value v) is defined in the
FBB namespace. It can be called without explicitly mentioning its namespace:
#include <iostream>
namespace FBB
{
enum Value // defines FBB::Value
{
FIRST
};
void fun(Value x)
{
std::cout << "fun called for " << x << '\n';
}
}
int main()
{
fun(FBB::FIRST); // Koenig lookup: no namespace
// for fun() specified
}
/*
generated output:
fun called for 0
*/
The compiler is fairly smart when handling namespaces. If Value in the namespace FBB would have
been defined as typedef int Value then FBB::Value would be recognized as int, thus causing the
Koenig lookup to fail.
As another example, consider the next program. Here two namespaces are involved, each defining their
own fun function. There is no ambiguity, since the argument defines the namespace and FBB::fun is
called:
#include <iostream>
namespace FBB
{
enum Value // defines FBB::Value
{
FIRST
};
void fun(Value x)
{
std::cout << "FBB::fun() called for " << x << '\n';
}
}
namespace ES
{
void fun(FBB::Value x)
{
std::cout << "ES::fun() called for " << x << '\n';
}
}
int main()
{
fun(FBB::FIRST); // No ambiguity: argument determines
// the namespace
}
/*
generated output:
FBB::fun() called for 0
*/
Here is an example in which there is an ambiguity: fun has two arguments, one from each namespace.
The ambiguity must be resolved by the programmer:
#include <iostream>
namespace ES
{
enum Value // defines ES::Value
{
FIRST
};
}
namespace FBB
{
enum Value // defines FBB::Value
{
FIRST
};
void fun(Value x, ES::Value y)
{
std::cout << "FBB::fun() called\n";
}
}
namespace ES
{
void fun(FBB::Value x, Value y)
{
std::cout << "ES::fun() called\n";
}
}
int main()
{
// fun(FBB::FIRST, ES::FIRST); ambiguity: resolved by
// explicitly mentioning
// the namespace
ES::fun(FBB::FIRST, ES::FIRST);
}
/*
generated output:
ES::fun() called
*/
An interesting subtlety with namespaces is that definitions in one namespace may break the code
defined in another namespace. It shows that namespaces may affect each other and that may backfire
if we’re not aware of their peculiarities. Consider the following example:
namespace FBB
{
struct Value
{};
void fun(int x);
void gun(Value x);
}
namespace ES
{
void fun(int x)
{
fun(x);
}
void gun(FBB::Value x)
{
gun(x);
}
}
Whatever happens, the programmer’d better not use any of the ES::fun functions since it will doubtlessly
result in infinite recursion, but that’s not the point. The point is that the programmer won’t even be
given the opportunity to call ES::fun since the compilation fails.
Compilation fails for gun but not for fun. Why is that? Why is ES::fun flawlessly compiling while
ES::gun isn’t? In ES::fun fun(x) is called. As x’s type is not defined in a namespace the Koenig
lookup does not apply and fun calls itself with infinite recursion.
With ES::gun the argument is defined in the FBB namespace. Consequently, the FBB::gun function is
a possible candidate to be called. But ES::gun itself also is possible as ES::gun’s prototype perfectly
matches the call gun(x).
Now consider the situation where FBB::gun has not yet been declared. Then there is of course no
ambiguity. The programmer responsible for the ES namespace is resting happily. Some time after
that the programmer who’s maintaining the FBB namespace decides it may be nice to add a function
gun(Value x) to the FBB namespace. Now suddenly the code in the namespace ES breaks because of
an addition in a completely different namespace (FBB). Namespaces clearly are not completely independent
of each other and we should be aware of subtleties like the above. Later in the C++ Annotations (chapter
10) we’ll return to this issue.
The std namespace is reserved by C++. The standard defines many entities that are part of the
runtime available software (e.g., cout, cin, cerr). These entities, the templates defined in the Standard
Template Library (cf. chapter 18), and the Generic Algorithms (cf. chapter 19) are all defined in the std namespace.
Regarding the discussion in the previous section, using declarations may be used when referring to
entities in the std namespace. For example, to use the std::cout stream, the code may declare this
object as follows:
#include <iostream>
using std::cout;
Often, however, the identifiers defined in the std namespace can all be accepted without much thought.
Because of that, one frequently encounters a using directive, allowing the programmer to omit a
namespace prefix when referring to any of the entities defined in the namespace specified with the
using directive. Instead of specifying using declarations the following using directive is frequently
encountered:
#include <iostream>
using namespace std;
Should a using directive, rather than using declarations be used? As a rule of thumb one might decide
to stick to using declarations, up to the point where the list becomes impractically long, at which point
a using directive could be considered.
Two rules should be observed, though:
• Programmers should not declare or define anything inside the namespace std. This is not com-
piler enforced but is imposed upon user code by the standard;
• Using declarations and directives should not be imposed upon code written by third parties. In
practice this means that using directives and declarations should be banned from header files
and should only be used in source files (cf. section 7.9.1).
Namespaces can be nested. Consider the following definition:
namespace CppAnnotations
{
namespace Virtual
{
void *pointer;
}
}
The variable pointer is defined in the Virtual namespace, that itself is nested under the CppAnnotations
namespace. To refer to this variable the following options are available:
• The fully qualified name can be used. A fully qualified name of an entity is a list of all the
namespaces that are encountered until reaching the definition of the entity. The namespaces and
entity are glued together by the scope resolution operator:
int main()
{
CppAnnotations::Virtual::pointer = 0;
}
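• A using directive for the CppAnnotations namespace can be provided. Now Virtual can be used
without any prefix, but pointer must still be used with the Virtual:: prefix (a using declaration
cannot be used here, as a using declaration cannot refer to a namespace):
using namespace CppAnnotations;
int main()
{
Virtual::pointer = 0;
}
• A using declaration for CppAnnotations::Virtual::pointer can be used. Now pointer can be
used without any prefix:
using CppAnnotations::Virtual::pointer;
int main()
{
pointer = 0;
}
• A using directive for the nested namespace can be used:
using namespace CppAnnotations::Virtual;
int main()
{
pointer = 0;
}
• Alternatively, two separate using directives can be used:
using namespace CppAnnotations;
using namespace Virtual;
int main()
{
pointer = 0;
}
• A combination of using declarations and using directives can be used. E.g., a using directive
can be used for the CppAnnotations namespace, and a using declaration can be used for the
Virtual::pointer variable:
using namespace CppAnnotations;
using Virtual::pointer;
int main()
{
pointer = 0;
}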
At every using directive all entities of that namespace can be used without any further prefix. If a
namespace is nested, then that namespace can also be used without any further prefix. However, the
entities defined in the nested namespace still need the nested namespace’s name. Only after applying
a using declaration or directive the qualified name of the nested namespace can be omitted.
When fully qualified names are preferred but a long name like
CppAnnotations::Virtual::pointer
is considered too long, a namespace alias may be used:
namespace CV = CppAnnotations::Virtual;
This defines CV as an alias for the full name. The variable pointer may now be accessed using:
CV::pointer = 0;
A namespace alias can also be used in a using directive:
namespace CV = CppAnnotations::Virtual;
using namespace CV;
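That an alias merely introduces another name for the same namespace can be verified by a small sketch. The function aliasNamesSameVariable is hypothetical:

```cpp
#include <cassert>

namespace CppAnnotations
{
    namespace Virtual
    {
        void *pointer;
    }
}

namespace CV = CppAnnotations::Virtual;     // CV: alias for the full name

// Both names refer to one and the same variable, so the addresses
// obtained via the alias and via the full name are identical.
bool aliasNamesSameVariable()
{
    return &CV::pointer == &CppAnnotations::Virtual::pointer;
}
```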
It is not strictly necessary to define members of namespaces inside a namespace region. But before an
entity is defined outside of a namespace it must have been declared inside its namespace.
To define an entity outside of its namespace its name must be fully qualified by prefixing the member
by its namespaces. The definition may be provided at the global level or at intermediate levels in the
case of nested namespaces. This allows us to define an entity belonging to namespace A::B within the
region of namespace A.
Assume the type typedef int INT8[8] is defined in the CppAnnotations::Virtual namespace. Further-
more assume that it is our intent to define a function squares, inside the namespace
CppAnnotations::Virtual returning a pointer to CppAnnotations::Virtual::INT8.
Having defined the prerequisites within the CppAnnotations::Virtual namespace, our function
could be defined as follows (cf. chapter 8 for coverage of the memory allocation operator new[]):
namespace CppAnnotations
{
namespace Virtual
{
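void *pointer;
typedef int INT8[8];
INT8 *squares()
{
INT8 *ip = new INT8[1];
for (size_t idx = 0; idx != sizeof(INT8) / sizeof(int); ++idx)
    (*ip)[idx] = (idx + 1) * (idx + 1);
return ip;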
}
}
}
The function squares defines an array of one INT8 vector, and returns its address after initializing
the vector by the squares of the first eight natural numbers.
Now the function squares can be defined outside of the CppAnnotations::Virtual namespace:
namespace CppAnnotations
{
namespace Virtual
{
void *pointer;
typedef int INT8[8];
INT8 *squares();
}
}
CppAnnotations::Virtual::INT8 *CppAnnotations::Virtual::squares()
{
INT8 *ip = new INT8[1];
for (size_t idx = 0; idx != sizeof(INT8) / sizeof(int); ++idx)
    (*ip)[idx] = (idx + 1) * (idx + 1);
return ip;
}
Finally, note that the function could also have been defined in the CppAnnotations region. In that
case the Virtual namespace would have been required when defining squares() and when specifying
its return type, while the internals of the function would remain the same:
namespace CppAnnotations
{
namespace Virtual
{
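void *pointer;
typedef int INT8[8];
INT8 *squares();
}
Virtual::INT8 *Virtual::squares()
{
INT8 *ip = new INT8[1];
for (size_t idx = 0; idx != sizeof(INT8) / sizeof(int); ++idx)
    (*ip)[idx] = (idx + 1) * (idx + 1);
return ip;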
}
}
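To see how such a function may be used, consider the following sketch. It assumes that squares fills the array with the squares of the first eight natural numbers, as described above; the function sumOfSquares is hypothetical. Since squares allocates memory using new[], its caller is responsible for returning that memory using delete[].

```cpp
#include <cassert>
#include <cstddef>

namespace CppAnnotations
{
    namespace Virtual
    {
        typedef int INT8[8];
        INT8 *squares();        // declared inside its namespace
    }
}

// Defined at the global level: the return type and the function name
// are fully qualified, inside the body INT8 needs no prefix.
CppAnnotations::Virtual::INT8 *CppAnnotations::Virtual::squares()
{
    INT8 *ip = new INT8[1];

    for (std::size_t idx = 0; idx != sizeof(INT8) / sizeof(int); ++idx)
        (*ip)[idx] = static_cast<int>((idx + 1) * (idx + 1));
    return ip;
}

int sumOfSquares()
{
    using CppAnnotations::Virtual::INT8;

    INT8 *ip = CppAnnotations::Virtual::squares();
    int sum = 0;

    for (std::size_t idx = 0; idx != sizeof(INT8) / sizeof(int); ++idx)
        sum += (*ip)[idx];

    delete[] ip;                // the caller returns the memory
    return sum;
}
```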
Chapter 5
The ‘string’ Data Type
C++ offers many solutions for common problems. Most of these facilities are part of the Standard
Template Library or they are implemented as generic algorithms (see chapter 19).
Among the facilities C++ programmers have developed over and over again are those manipulating
chunks of text, commonly called strings. The C programming language offers rudimentary string
support. C’s ASCII-Z terminated series of characters form the foundation upon which an enormous
amount of code has been built (cf. footnote 1).
To process text C++ offers a std::string type. In C++ the traditional C library functions manipu-
lating ASCII-Z strings are deprecated in favor of using string objects. Many problems in C programs
are caused by buffer overruns, boundary errors and allocation problems that can be traced back to im-
properly using these traditional C string library functions. Many of these problems can be prevented
using C++ string objects.
Actually, string objects are class type variables, and in that sense they are comparable to stream
objects like cin and cout. In this section the use of string type objects is covered. The focus is on
their definition and their use. When using string objects the member function syntax is commonly
used:
stringVariable.operation(argumentList)
For example, if string1 and string2 are variables of type std::string, then
string1.compare(string2)
can be used to compare both strings.
In addition to the common member functions the string class also offers a wide variety of operators,
like the assignment (=) and the comparison operator (==). Operators often result in code that is easy to
understand and their use is generally preferred over the use of member functions offering comparable
functionality. E.g., rather than writing
if (string1.compare(string2) == 0)
the following is generally preferred:
if (string1 == string2)
To define and use string-type objects, sources must include the header file string. To merely declare
the string type the header should be included.
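Both notations are shown side by side in the following sketch. The function names are hypothetical. Both functions return the same value for any pair of strings:

```cpp
#include <cassert>
#include <string>

// Member function syntax: compare returns 0 for equal contents.
bool equalUsingMember(std::string const &string1,
                      std::string const &string2)
{
    return string1.compare(string2) == 0;
}

// Operator syntax: the same test, usually easier to read.
bool equalUsingOperator(std::string const &string1,
                        std::string const &string2)
{
    return string1 == string2;
}
```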
1 We define an ASCII-Z string as a series of ASCII-characters terminated by the ASCII-character zero (hence -Z), which has
the value zero, and should not be confused with character ’0’, which usually has the value 0x30
60 CHAPTER 5. THE ‘STRING’ DATA TYPE
Some of the operations that can be performed on strings return indices within the strings. Whenever
such an operation fails to find an appropriate index, the value string::npos is returned. This value
is a symbolic value of type string::size_type, which is an unsigned integral type (for all practical
purposes size_t).
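A typical use of string::npos is sketched below, using the string’s find member. The function contains is hypothetical:

```cpp
#include <cassert>
#include <string>

// Returns true if text contains the character ch.
bool contains(std::string const &text, char ch)
{
    std::string::size_type pos = text.find(ch);

    return pos != std::string::npos;    // npos: no index was found
}
```

Note that the value returned by find is stored in a string::size_type variable. Storing it in an int could prevent a correct comparison with string::npos.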
All string members accepting string objects as arguments also accept char const * (C ASCII-Z
string) arguments. The same usually holds true for operators accepting string objects.
Some string-members use iterators. Iterators are formally introduced in section 18.2. Member func-
tions using iterators are listed in the next section (5.2), but the iterator concept itself is not further
covered by this chapter.
Strings support a large variety of members and operators. A short overview listing their capabilities
is provided in this section, with subsequent sections offering a detailed discussion. The bottom line:
C++ strings are extremely versatile and there is hardly a reason for falling back on the C library
to process text. C++ strings handle all the required memory management and thus memory related
problems, which is the #1 source of problems in C programs, can be prevented when C++ strings are
used. Strings do come at a price, though. The class’s extensive capabilities have also turned it into
a beast. It’s hard to learn and master all its features and in the end you’ll find that not all that you
expected is actually there. For example, std::string doesn’t offer case-insensitive comparisons. But
in the end it even isn’t as simple as that. It is there, but it is somewhat hidden and at this point in the
C++ Annotations it’s too early to study into that hidden corner yet. Instead, realize that C’s standard
library does offer useful functions that can be used as long as we’re aware of their limitations and are
able to avoid their traps. So for now, to perform a classical case-insensitive comparison of the contents
of two std::string objects str1 and str2 the following (using the POSIX function strcasecmp) will do:
strcasecmp(str1.c_str(), str2.c_str());
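The strcasecmp function is offered by POSIX systems, and may not be available elsewhere. On other systems a comparable test could be sketched as shown below. The function equalIgnoringCase is hypothetical, and like strcasecmp it is only meaningful for single-byte character sets.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Compares two strings after converting their characters to lower case.
bool equalIgnoringCase(std::string const &str1, std::string const &str2)
{
    if (str1.length() != str2.length())
        return false;

    for (std::string::size_type idx = 0; idx != str1.length(); ++idx)
    {
        // tolower requires the value of an unsigned char
        int lhs = std::tolower(static_cast<unsigned char>(str1[idx]));
        int rhs = std::tolower(static_cast<unsigned char>(str2[idx]));

        if (lhs != rhs)
            return false;
    }
    return true;
}
```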
• initialization:
when string objects are defined they are always properly initialized. In other words,
they are always in a valid state. Strings may be initialized empty or already existing
text can be used to initialize strings.
• assignment:
strings may be given new values. New values may be assigned using member functions
(like assign) but a plain assignment operator (i.e., =) may also be used. Furthermore,
assignment to a character buffer is also supported.
• conversions:
the partial or complete contents of string objects may be interpreted as C strings but the
string’s contents may also be processed as a series of raw binary bytes, not necessarily
terminating in an ASCII-Z byte. Furthermore, in many situations plain characters and
C strings may be used where std::strings are accepted as well.
• breakdown:
the individual characters stored in a string can be accessed using the familiar index
operator ([]) allowing us to either access or modify information in the middle of a string.
• comparisons:
strings may be compared to other strings (or C ASCII-Z strings) using the familiar logi-
cal comparison operators ==, !=, <, <=, > and >=. There are also member functions
available offering a more fine-grained comparison.
5.2. A STD::STRING REFERENCE 61
• modification:
the contents of strings may be modified in many ways. Operators are available to add
information to string objects, to insert information in the middle of string objects, or to
replace or erase (parts of) a string’s contents.
• swapping:
the string’s swapping capability allows us in principle to exchange the contents of two
string objects without a byte-by-byte copying operation of the string’s contents.
• searching:
the locations of characters, sets of characters, or series of characters may be searched for
from any position within the string object and either searching in a forward or backward
direction.
• housekeeping:
several housekeeping facilities are offered: the string’s length, or its empty-state may
be interrogated. But string objects may also be resized.
• stream I/O:
strings may be extracted from or inserted into streams. In addition to plain string ex-
traction a line of a text file may be read without running the risk of a buffer overrun.
Since extraction and insertion operations are stream based the I/O facilities are device
independent.
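In this section the string members and string-related operations are referenced. The subsections cover,
respectively, the string’s initializers, iterators, operators, and member functions. The following
terminology is used throughout this section:
• object is always a string object;
• argument is a string const & or a char const *, unless indicated otherwise;
• opos refers to an offset into object;
• apos refers to an offset into argument;
• on represents a number of characters in object (starting at opos);
• an represents a number of characters in argument (starting at apos).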
Both opos and apos must refer to existing offsets, or an exception (cf. chapter 9) is generated. In
contrast, an and on may exceed the number of available characters, in which case only the available
characters will be considered.
Many members declare default values for on, an and apos. Some members declare default values
for opos. The default offset value is 0; the default value of on and an is string::npos, which can be
interpreted as ‘the required number of characters to reach the end of the string’.
With members starting their operations at the end of the string object’s contents proceeding backwards,
the default value of opos is the index of the object’s last character, with on by default equal to opos +
1, representing the length of the substring ending at opos.
In the overview of member functions presented below it may be assumed that all these parameters
accept default values unless indicated otherwise. Of course, the default argument values cannot be
used if a function requires additional arguments beyond the ones otherwise accepting default values.
62 CHAPTER 5. THE ‘STRING’ DATA TYPE
Some members have overloaded versions expecting an initial argument of type char const *. But
even if that is not the case the first argument can always be of type char const * where a parameter
of std::string is defined.
Several member functions accept iterators. Section 18.2 covers the technical aspects of iterators, but
these may be ignored at this point without loss of continuity. Like apos and opos, iterators must refer
to existing positions and/or to an existing range of characters within the string object’s contents.
Finally, all string-member functions computing indices return the predefined constant string::npos
on failure.
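The conventions described above can be illustrated with a minimal sketch. The helper names contains, tail and clipped are hypothetical; find and substr are standard string members:

```cpp
#include <string>

// find returns string::npos when argument does not occur in object
bool contains(std::string const &object, std::string const &argument)
{
    return object.find(argument) != std::string::npos;
}

// on is omitted here: it defaults to string::npos, meaning
// 'all characters up to the end of the string'
std::string tail(std::string const &object, std::string::size_type opos)
{
    return object.substr(opos);
}

// on may exceed the number of available characters: only the
// available characters are used
std::string clipped(std::string const &object, std::string::size_type opos,
                    std::string::size_type on)
{
    return object.substr(opos, on);
}
```

With substr an offset beyond the string's length results in a std::out_of_range exception, which is why callers of these helpers are assumed to pass valid offsets.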
5.2.1 Initializers
After defining string objects they are guaranteed to be in a valid state. At definition time string objects
may be initialized using one of the following string constructors:
• string object:
initializes object to an empty string. When defining a string this way no argument
list may be specified;
• string object(string::size_type count, char ch):
initializes object with count characters ch;
• string object(string const &argument):
initializes object with argument;
• string object(std::string const &argument, string::size_type apos,
string::size_type an):
initializes object with argument’s contents starting at index position apos, using at
most an of argument’s characters;
• string object(InputIterator begin, InputIterator end):
initializes object with the characters in the range of characters defined by the two
InputIterators.
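As an illustration, the constructors listed above might be used as follows. The function initializerDemo is a hypothetical wrapper that merely returns the strings it defines so they can be inspected:

```cpp
#include <string>
#include <vector>

// Hypothetical demo function returning the strings it constructs.
std::vector<std::string> initializerDemo()
{
    std::string text("hello world");    // initialized from a C string
    std::string empty;                  // empty string: no argument list
    std::string dashes(5, '-');         // count = 5, ch = '-'
    std::string copy(text);             // copy of an existing string
    std::string sub(text, 6, 3);        // apos = 6, an = 3: "wor"
    std::string rest(text, 6, 100);     // an exceeds what is available: "world"

    char const raw[] = "abc";
    std::string range(raw, raw + 3);    // pointers used as InputIterators

    std::vector<std::string> result;
    result.push_back(empty);
    result.push_back(dashes);
    result.push_back(copy);
    result.push_back(sub);
    result.push_back(rest);
    result.push_back(range);
    return result;
}
```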
5.2.2 Iterators
See section 18.2 for details about iterators. As a quick introduction to iterators: an iterator acts like a
pointer, and pointers can often be used in situations where iterators are requested. Iterators usually
come in pairs, defining a range of entities. The begin-iterator points to the first entity of the range, the
end-iterator points just beyond the last entity that will be considered. Their difference is equal to the
number of entities in the iterator-range.
Iterators play an important role in the context of generic algorithms (cf. chapter 19). The class
std::string defines the following iterator types:
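The standard string class is known to define four iterator types: string::iterator, string::const_iterator, string::reverse_iterator and string::const_reverse_iterator. The sketch below uses three of them; the function names toUpper, countChar and reversed are hypothetical:

```cpp
#include <cctype>
#include <string>

// iterator: the characters referred to may be modified
void toUpper(std::string &object)
{
    for (std::string::iterator it = object.begin(); it != object.end(); ++it)
        *it = static_cast<char>(std::toupper(static_cast<unsigned char>(*it)));
}

// const_iterator: read-only access to the characters
std::string::size_type countChar(std::string const &object, char ch)
{
    std::string::size_type total = 0;

    for (std::string::const_iterator it = object.begin();
            it != object.end(); ++it)
    {
        if (*it == ch)
            ++total;
    }
    return total;
}

// const_reverse_iterator: rbegin and rend define the range from the
// last character back to the first one
std::string reversed(std::string const &object)
{
    return std::string(object.rbegin(), object.rend());
}
```

For a string object the difference object.end() - object.begin() equals object.length(), in line with the description of iterator ranges given above.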
5.2.3 Operators
String objects may be manipulated by member functions but also by operators. Using operators often
results in more natural-looking code. Where an operator offers the same functionality as a member
function, the operator is practically always preferred.
The following operators are available for string objects (in the examples ‘object’ and ‘argument’ refer
to existing std::string objects).
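• plain assignment:
a char, a C string or a C++ string may be assigned to a string object using the =
operator. Example:
object = argument;
object = "C string";
object = 'x';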
• addition:
the arithmetic additive assignment operator and the addition operator add text to a
string object. The arithmetic assignment operator returns its left-hand side operand,
the addition operator returns its result in a temporary string object. When using the
addition operator either the left-hand side operand or the right-hand side operand must
be a std::string object. The other operand may be a char, a C string or a C++ string.
Example:
object += argument;
object += "hello";
object += 'x'; // integral expressions are OK
• index operator:
The index operator may be used to retrieve object’s individual characters, or to assign
new values to individual characters of a non-const string object. There is no range-
checking (use the at() member function for that). This operator returns a char & or
char const &. Example:
object[3] = argument[5];
• logical operators:
the logical comparison operators may be applied to two string objects or to a string object
and a C string to compare their contents. These operators return a bool value. The ==,
!=, >, >=, < and <= operators are available. The ordering operators perform a
lexicographical comparison of their contents using the ASCII character collating sequence.
Example:
object == object; // true
object != (object + 'x'); // true
object <= (object + 'x'); // true
• stream related operators:
the insertion-operator (cf. section 3.1.4) may be used to insert a string object into
an ostream, the extraction-operator may be used to extract a string object from an
istream. The extraction operator by default first ignores all white space characters
and then extracts all consecutively non-blank characters from an istream. Instead
of a string a character array may be extracted as well, but the advantage of using a
string object should be clear: the destination string object is automatically resized to
the required number of characters. Example:
cin >> object;
cout << object;
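The operators described above can be combined as in the following sketch. The function names greeting, capitalize and words are hypothetical, and an istringstream stands in for cin so the extraction can be shown without keyboard input:

```cpp
#include <sstream>
#include <string>
#include <vector>

// addition: += modifies its left-hand side operand, + returns a
// temporary string object
std::string greeting(std::string const &name)
{
    std::string object("hello");
    object += ' ';              // a char
    object += name;             // a C++ string
    return object + "!";        // a C string as right-hand side operand
}

// index operator: no range checking is performed, so the string
// is tested for being non-empty first
void capitalize(std::string &object)
{
    if (!object.empty() && object[0] >= 'a' && object[0] <= 'z')
        object[0] = static_cast<char>(object[0] - 'a' + 'A');
}

// extraction: white space is skipped, then a series of non-blank
// characters is extracted; word is resized automatically
std::vector<std::string> words(std::string const &line)
{
    std::istringstream in(line);
    std::vector<std::string> result;
    std::string word;

    while (in >> word)
        result.push_back(word);

    return result;
}
```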
The std::string class offers many member functions as well as additional non-member functions that
should be considered part of the string class. All these functions are listed below in alphabetic order.
The symbolic value string::npos is defined by the string class. It represents ‘index-not-found’ when
returned by member functions returning string offset positions. Example: when calling ‘object.find('x')’
(see below) on a string object not containing the character 'x', npos is returned, as the requested
position does not exist.
The final 0-byte used in C strings to indicate the end of an ASCII-Z string is not considered part of a
C++ string, and so find returns npos, rather than length(), when looking for 0 in
a string object containing the characters of a C string.
Here are the standard functions that operate on objects of the class string. When a parameter of
size_t is mentioned it may be interpreted as a parameter of type string::size_type, but without
defining a default argument value. The type size_type should be read as string::size_type.
With size_type the default argument values mentioned in section 5.2 apply. All quoted functions are
member functions of the class std::string, except where indicated otherwise.
a reference to the character at the indicated position is returned. When called with
string const objects a char const & is returned. The member function performs
range-checking, raising an exception (that by default aborts the program) if an invalid
index is passed.
the characters in the range defined by begin and end are appended to the current string
object.
argument (or a substring) is assigned to the string object. If argument is of type char
const * and one additional argument is provided the second argument is interpreted
as a value initializing an, using 0 to initialize apos.
the number of characters that can currently be stored in the string object without need-
ing to resize it is returned.
the text stored in the current string object and the text stored in argument are compared
using a lexicographical comparison using the ASCII character collating sequence. Zero
is returned if the two strings have identical contents, a negative value is returned if
the text in the current object should be ordered before the text in argument; a positive
value is returned if the text in the current object should be ordered after the text in
argument.
a substring of the text stored in the current string object is compared to the text stored
in argument. At most on characters starting at offset opos are compared to the text in
argument.
a substring of the text stored in the current string object is compared to a substring of
the text stored in argument. At most on characters of the current string object, starting
at offset opos, are compared to at most an characters of argument, starting at offset
apos. In this case argument must be a string object.
• int compare(size_t opos, size_t on, char const *argument, size_t an):
a substring of the text stored in the current string object is compared to a substring of
the text stored in argument. At most on characters of the current string object starting
at offset opos are compared to at most an characters of argument. Argument must
have at least an characters. The characters may have arbitrary values: the ASCII-Z
value has no special meaning.
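The compare variants described above might be used as in this sketch. The helper names startsWith, order and sameWord are hypothetical; only the compare members themselves belong to std::string:

```cpp
#include <string>

// compare(opos, on, argument): at most prefix.length() characters of
// object, starting at offset 0, are compared to prefix
bool startsWith(std::string const &object, std::string const &prefix)
{
    return object.compare(0, prefix.length(), prefix) == 0;
}

// compare(argument): only the sign of the returned value is meaningful,
// so it is reduced to -1, 0 or 1 here
int order(std::string const &object, std::string const &argument)
{
    int const result = object.compare(argument);
    return result < 0 ? -1 : result > 0 ? 1 : 0;
}

// compare(opos, on, argument, apos, an): a substring of object is
// compared to a substring of argument
bool sameWord(std::string const &object, std::string::size_type opos,
              std::string const &argument, std::string::size_type apos,
              std::string::size_type length)
{
    return object.compare(opos, length, argument, apos, length) == 0;
}
```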
the contents of the current string object are (partially) copied into argument. The actual
number of characters copied is returned. The second argument, specifying the number of
characters to copy from the current string object, is required. No ASCII-Z is appended
to the copied string but can be appended to the copied text using an idiom like the
following:
argument[object.copy(argument, string::npos)] = 0;
Of course, the programmer should make sure that argument’s size is large enough to
accommodate the additional 0-byte.
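A minimal sketch of a bounded variant of this idiom is given next. The function toBuffer and its size parameter are hypothetical; the sketch reserves one position of the destination buffer for the terminating 0-byte:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: copies at most size - 1 characters of object into
// argument and appends a 0-byte. Returns the number of characters copied.
std::size_t toBuffer(std::string const &object, char *argument,
                     std::size_t size)
{
    if (size == 0)                      // no room, not even for the 0-byte
        return 0;

    // copy returns the number of characters actually copied
    std::size_t const copied = object.copy(argument, size - 1);
    argument[copied] = 0;               // copy itself appends no ASCII-Z

    return copied;
}
```

Passing the buffer's size explicitly avoids the buffer overrun that string::npos as second argument could cause when the string is longer than the buffer.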
the raw contents of the current string object are returned. Since this member does
not return an ASCII-Z terminated C string (as c_str does), it can be used to retrieve
any kind of information stored inside the current string object including, e.g., series of
0-bytes:
string s(2, 0);
cout << static_cast<int>(s.data()[1]) << '\n';
the characters in the range defined by the begin and end iterators are erased. The parameter
end is optional: if omitted, only the single character referred to by begin is erased. An
iterator is returned referring to the position immediately following the last erased
character.
the first index in the current string object where argument is found is returned.
the first index in the current string object where argument is found is returned. When
all three arguments are specified the first argument must be a char const *.
the first index in the current string object where ch is found is returned.
the first index in the current string object where any character in argument is found is
returned.
the first index in the current string object where any character in argument is found
is returned. If opos is provided it refers to the first index in the current string object
where the search for argument should start. If omitted, the string object is scanned
completely. If an is provided it indicates the number of characters of the char const
* argument that should be used in the search. It defines a substring starting at the
beginning of argument. If omitted, all of argument’s characters are used.
the first index in the current string object where character ch is found is returned.
the first index in the current string object where another character than ch is found is
returned.
the last index in the current string object where any character in argument is found is
returned.
the last index in the current string object where any character in argument is found
is returned. If opos is provided it refers to the last index in the current string object
where the search for argument should start. If omitted, the string object is scanned
completely. If an is provided it indicates the number of characters of the char const
* argument that should be used in the search. It defines a substring starting at the
beginning of argument. If omitted, all of argument’s characters are used.
the last index in the current string object where character ch is found is returned.
the last index in the current string object where any character not appearing in argument
is found is returned.
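The searching members described above are often combined. The following sketch assumes the standard member names find_first_of, find_first_not_of and find_last_not_of; the helpers split and trimRight are hypothetical:

```cpp
#include <string>
#include <vector>

// Splits object into fields separated by any of the characters
// in separators.
std::vector<std::string> split(std::string const &object,
                               std::string const &separators)
{
    std::vector<std::string> fields;

    // index of the first character that is not a separator
    std::string::size_type begin = object.find_first_not_of(separators);

    while (begin != std::string::npos)
    {
        // index of the first separator at or beyond begin
        std::string::size_type const end =
            object.find_first_of(separators, begin);

        fields.push_back(
            object.substr(begin,
                end == std::string::npos ? std::string::npos : end - begin));

        // when end is npos the search starts beyond the string's
        // contents and returns npos as well
        begin = object.find_first_not_of(separators, end);
    }
    return fields;
}

// Removes trailing blanks, tabs and newlines.
std::string trimRight(std::string object)
{
    // if only white space is present npos is returned, and npos + 1
    // wraps around to 0, erasing the complete string
    object.erase(object.find_last_not_of(" \t\n") + 1);
    return object;
}
```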
a (sub)string of argument is inserted into the current string object at the current string
object’s index position opos. Arguments for apos and an must either both be provided
or they must both be omitted.
argument (of type char const *) is inserted at index opos into the current string
object.
Count characters ch are inserted at index opos into the current string object.
the character ch is inserted at the current object’s position referred to by begin. Begin
is returned.
Count characters ch are inserted at the current object’s position referred to by begin.
Begin is returned.
the characters in the range defined by the InputIterators abegin and aend are
inserted at the current object’s position referred to by begin. Begin is returned.
the maximum number of characters that can be stored in the current string object is
returned.
a (sub)string of characters in object is replaced by the (subset of) characters of argument.
If on is specified as 0, argument is inserted into object at offset opos.
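As a sketch of how replace and find cooperate, the hypothetical helper replaceAll below replaces every occurrence of one text by another; insertAt shows that replace with on equal to 0 has the effect of an insertion:

```cpp
#include <string>

// Hypothetical helper: replaces all occurrences of from by to.
std::string replaceAll(std::string object, std::string const &from,
                       std::string const &to)
{
    if (from.empty())                   // nothing sensible to search for
        return object;

    std::string::size_type opos = 0;

    while ((opos = object.find(from, opos)) != std::string::npos)
    {
        object.replace(opos, from.length(), to);
        opos += to.length();            // continue beyond the inserted text
    }
    return object;
}

// on == 0: no characters are replaced, so argument is merely inserted,
// just like object.insert(opos, argument)
std::string insertAt(std::string object, std::string::size_type opos,
                     std::string const &argument)
{
    object.replace(opos, 0, argument);
    return object;
}
```

Skipping the inserted text (opos += to.length()) prevents endless loops when to itself contains from.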
Extending the standard stream (FILE) approach, well known from the C programming language, C++
offers an input/output (I/O) library based on class concepts.
All C++ I/O facilities are defined in the namespace std. The std:: prefix is omitted below, except for
situations where this would result in ambiguities.
Earlier (in chapter 3) we’ve seen several examples of the use of the C++ I/O library, in particular
showing the insertion operator (<<) and the extraction operator (>>). In this chapter we’ll cover I/O in
more detail.
The discussion of input and output facilities provided by the C++ programming language heavily uses
the class concept and the notion of member functions. Although class construction has not yet been
covered (for that see chapter 7) and although inheritance will not be covered formally before chapter
13, we think it is quite possible to introduce I/O facilities long before the technical background of their
construction has been covered.
Most C++ I/O classes have names starting with basic_ (like basic_ios). However, these basic_
names are not regularly found in C++ programs, as most classes are also defined using typedef
definitions like:
typedef basic_ios<char> ios;
Since C++ supports various kinds of character types (e.g., char, wchar_t), I/O facilities were developed
using the template mechanism allowing for easy conversions to character types other than the tradi-
tional char type. As elaborated in chapter 20, this also allows the construction of generic software,
that could thereupon be used for any particular type representing characters. So, analogously to the
above typedef there exists a
typedef basic_ios<wchar_t> wios;
This type definition can be used for the wchar_t type. Because of the existence of these type defini-
tions, the basic_ prefix was omitted from the C++ Annotations without loss of continuity. The C++
Annotations primarily focus on the standard 8-bits char type.
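The effect of defining the I/O classes as templates can be sketched with a function template that works for both character types. The name repeated is hypothetical; basic_ostringstream and basic_string are the standard class templates behind ostringstream and string:

```cpp
#include <sstream>
#include <string>

// Inserts value count times into a string stream using characters of
// type Char, returning the resulting text.
template <typename Char>
std::basic_string<Char> repeated(int value, unsigned count)
{
    // basic_ostringstream<char> is ostringstream,
    // basic_ostringstream<wchar_t> is wostringstream
    std::basic_ostringstream<Char> out;

    for (unsigned idx = 0; idx != count; ++idx)
        out << value;

    return out.str();
}
```

Calling repeated<char>(12, 2) returns a std::string, while repeated<wchar_t>(12, 2) returns a std::wstring containing the same digits.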
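It must be stressed that it is not correct anymore to declare iostream objects using standard forward
declarations, like:
class std::ostream; // now erroneous
Instead, sources that merely need to declare iostream classes should include the <iosfwd> header file.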
Using C++ I/O offers the additional advantage of type safety. Objects (or plain values) are inserted
into streams. Compare this to the situation commonly encountered in C where the fprintf function
is used to indicate by a format string what kind of value to expect where. Compared to this latter
situation C++’s iostream approach immediately uses the objects where their values should appear, as
in
cout << "There were " << nMaidens << " virgins present\n";
The compiler notices the type of the nMaidens variable, inserting its proper value at the appropriate
place in the sentence inserted into the cout iostream.
Compare this to the situation encountered in C. Although C compilers are getting smarter and smarter,
and although a well-designed C compiler may warn you about a mismatch between a format specifier
and the type of a variable encountered in the corresponding position of the argument list of a printf
statement, it can’t do much more than warn you. The type safety seen in C++ prevents you from making
type mismatches, as there are no types to match.
Apart from this, iostreams offer more or less the same set of possibilities as the standard FILE-based I/O
used in C: files can be opened, closed, positioned, read, written, etc. In C++ the basic FILE structure,
as used in C, is still available. But C++ adds to this I/O based on classes, resulting in type safety,
extensibility, and a clean design.
In the ANSI/ISO standard the intent was to create architecture independent I/O. Previous implemen-
tations of the iostreams library did not always comply with the standard, resulting in many extensions
to the standard. The I/O sections of previously developed software may have to be partially rewritten.
This is tough for those who are now forced to modify old software, but every feature and extension
that was once available can be rebuilt easily using ANSI/ISO standard conforming I/O. Not all of these
reimplementations can be covered in this chapter, as many reimplementations rely on inheritance
and polymorphism, which will not be covered until chapters 13 and 14. Selected reimplementations
will be provided in chapter 23, and in this chapter references to particular sections in other chapters
will be given where appropriate. This chapter is organized as follows (see also Figure 6.1):
• The class ios_base is the foundation upon which the iostreams I/O library was built. It defines
the core of all I/O operations and offers, among other things, facilities for inspecting the state of
I/O streams and for output formatting.
• The class ios was directly derived from ios_base. Every class of the I/O library doing input or
output is itself derived from this ios class, and so inherits its (and, by implication, ios_base’s)
capabilities. The reader is urged to keep this in mind while reading this chapter. The concept of
inheritance is not discussed here, but rather in chapter 13.
The class ios is important in that it implements the communication with a buffer that is used
by streams. This buffer is a streambuf object which is responsible for the actual I/O to/from the
underlying device. Consequently iostream objects do not perform I/O operations themselves, but
leave these to the (stream)buffer objects with which they are associated.
• Next, basic C++ output facilities are discussed. The basic class used for output operations is
ostream, defining the insertion operator as well as other facilities writing information to streams.
Apart from inserting information into files it is possible to insert information into memory buffers,
for which the ostringstream class is available. Formatting output is to a great extent possible
using the facilities defined in the ios class, but it is also possible to insert formatting commands
directly into streams using manipulators. This aspect of C++ output is discussed as well.
• Basic C++ input facilities are implemented by the istream class. This class defines the extraction
operator and related input facilities. Comparable to inserting information into memory buffers
(using ostringstream), a class istringstream is available to extract information from memory
buffers.
• Finally, several advanced I/O-related topics are discussed, e.g., reading and writing from the
same stream and mixing C and C++ I/O using filebuf objects. Other I/O related topics are
covered elsewhere in the C++ Annotations, e.g., in chapter 23.
Stream objects have a limited but important role: they are the interface between, on the one hand, the
objects to be input or output and, on the other hand, the streambuf, which is responsible for the actual
input and output to the device accessed by a streambuf object.
This approach allows us to construct a new kind of streambuf for a new kind of device, and use that
streambuf in combination with the ‘good old’ istream- and ostream-class facilities. It is important to
understand the distinction between the formatting roles of iostream objects and the buffering interface
to an external device as implemented in a streambuf object. Interfacing to new devices (like sockets
or file descriptors) requires the construction of a new kind of streambuf, rather than a new kind of
istream or ostream object. A wrapper class may be constructed around the istream or ostream
classes, though, to ease the access to a special device. This is how the stringstream classes were
constructed.
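The division of labor between stream and streambuf can be sketched as follows. Here a standard stringbuf plays the role of the ‘device’; the function names report and viaStringbuf are hypothetical:

```cpp
#include <ostream>
#include <sstream>
#include <string>

// The ostream performs the formatting; the streambuf it is given
// handles the actual transfer of characters to the device.
void report(std::streambuf &device, int value)
{
    std::ostream out(&device);      // an ostream using an existing streambuf
    out << "value = " << value;
}

// A stringbuf (a streambuf storing its characters in memory)
// serves as the device in this example.
std::string viaStringbuf(int value)
{
    std::stringbuf buffer;
    report(buffer, value);
    return buffer.str();
}
```

The function report would work unaltered with any other class derived from streambuf, which is the point of the design described above.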
Several iostream related header files are available. Depending on the situation at hand, the following
header files should be used:
• <iosfwd>: sources should include this header file if only a declaration of the stream classes is
required. For example, if a function defines a reference parameter to an ostream then the compiler
does not need to know exactly what an ostream is. When declaring such a function the ostream
class merely needs to be declared. One cannot use a plain forward declaration (like
class std::ostream;) for this; the <iosfwd> header file must be included instead.
• <streambuf>: sources should include this header file when using streambuf or filebuf classes.
See sections 14.7 and 14.7.2.
• <istream>: sources should include this header file when using the class istream or
when using classes that do both input and output. See section 6.5.1.
• <ostream>: sources should include this header file when using the class ostream or when
using classes that do both input and output. See section 6.4.1.
• <iostream>: sources should include this header file when using the global stream objects (like
cin and cout).
• <fstream>: sources should include this header file when using the file stream classes. See sec-
tions 6.4.2, 6.5.2, and 6.6.2.
• <sstream>: sources should include this header file when using the string stream classes. See
sections 6.4.3 and 6.5.3.
• <iomanip>: sources should include this header file when using parameterized manipulators. See
section 6.3.2.
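How these header files divide the work may be sketched in a single listing. The three parts would normally live in separate files; the functions show and shown are hypothetical:

```cpp
// --- header file: only a declaration of ostream is required ---
#include <iosfwd>

void show(std::ostream &out, int value);

// --- source file defining show: the ostream class itself is needed ---
#include <ostream>

void show(std::ostream &out, int value)
{
    out << "value: " << value << '\n';
}

// --- a caller using a string stream as destination ---
#include <sstream>
#include <string>

std::string shown(int value)
{
    std::ostringstream out;
    show(out, value);
    return out.str();
}
```

Sources merely including the header file are not burdened with the full definition of the ostream class, which may reduce compilation times.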
The class std::ios_base forms the foundation of all I/O operations, and defines, among other things,
facilities for inspecting the state of I/O streams and most output formatting facilities. Every stream
6.3. INTERFACING ‘STREAMBUF’ OBJECTS: THE CLASS ‘IOS’ 75
class of the I/O library is, through the class ios, derived from this class, and inherits its capabilities.
As ios_base is the foundation on which all C++ I/O was built, we introduce it here as the first class of
the C++ I/O library.
Note that, as in C, I/O in C++ is not part of the language (although it is part of the ANSI/ISO standard
on C++). Although it is technically possible to ignore all predefined I/O facilities, nobody does so, and
the I/O library therefore represents a de facto I/O standard for C++. Also note that, as mentioned
before, the iostream classes themselves are not responsible for the eventual I/O, but delegate this to an
auxiliary class: the class streambuf or its derivatives.
It is neither possible nor required to construct an ios_base object directly. Its construction is always
a side-effect of constructing an object further down the class hierarchy, like std::ios. The class ios is the next
class down the iostream hierarchy (see figure 6.1). Since all stream classes in turn inherit from ios,
and thus also from ios_base, the distinction between ios_base and ios is in practice not important.
Therefore, facilities actually provided by ios_base will be discussed as facilities provided by ios. The
reader who is interested in the true class in which a particular facility is defined should consult the
relevant header files (e.g., ios_base.h and basic_ios.h).
The std::ios class is derived directly from ios_base, and it defines de facto the foundation for all
stream classes of the C++ I/O library.
Although it is possible to construct an ios object directly, this is seldom done. The purpose of the class
ios is to provide the facilities of the class basic_ios, and to add several new facilities, all related to
the streambuf object which is managed by objects of the class ios.
All other stream classes are either directly or indirectly derived from ios. This implies, as explained
in chapter 13, that all facilities of the classes ios and ios_base are also available to other stream
classes. Before discussing these additional stream classes, the features offered by the class ios (and
by implication: by ios_base) are now introduced.
In some cases it may be required to include <ios> explicitly. An example is the situation where the
formatting flags themselves (cf. section 6.3.2.2) are referred to in source code.
The class ios offers several member functions, most of which are related to formatting. Other fre-
quently used member functions are:
• std::streambuf *ios::rdbuf():
A pointer to the streambuf object forming the interface between the ios object and
the device with which the ios object communicates is returned. See section 23.1.2 for
further information about the class streambuf.
• std::streambuf *ios::rdbuf(std::streambuf *sb):
The current ios object is associated with another streambuf object. A pointer to the
ios object’s original streambuf object is returned. The object to which this pointer
points is not destroyed when the stream object goes out of scope, but is owned by the
caller of rdbuf.
• std::ostream *ios::tie():
A pointer to the ostream object that is currently tied to the ios object is returned (see
the next member). The return value 0 indicates that currently no ostream object is tied
to the ios object. See section 6.5.5 for details.
• std::ostream *ios::tie(std::ostream *outs):
The ostream object is tied to the current ios object. This means that the ostream object
is flushed every time before an input or output action is performed by the current ios
76 CHAPTER 6. THE IO-STREAM LIBRARY
object. A pointer to the ios object’s original ostream object is returned. To break the
tie, pass the argument 0. See section 6.5.5 for an example.
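The rdbuf and tie members might be used as in the following sketch. The function names captured and cinTiedToCout are hypothetical; the sketch temporarily lets a stream write to the streambuf of an ostringstream and then restores the original streambuf:

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Inserts text into stream while stream temporarily uses the
// streambuf of a string stream. Returns what was collected.
std::string captured(std::ostream &stream, std::string const &text)
{
    std::ostringstream sink;

    // rdbuf(streambuf *) returns the stream's original streambuf
    std::streambuf *original = stream.rdbuf(sink.rdbuf());

    stream << text;             // ends up in sink, not on the device

    stream.rdbuf(original);     // restore before sink goes out of scope
    return sink.str();
}

// By default cout is tied to cin, so cout is flushed before
// input is read from cin.
bool cinTiedToCout()
{
    return std::cin.tie() == &std::cout;
}
```

Restoring the original streambuf matters: after captured returns, sink no longer exists, and a stream still referring to its streambuf would be using a destroyed object.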
Operations on streams may fail for various reasons. Whenever an operation fails, further operations on
the stream are suspended. It is possible to inspect (and possibly: clear) the condition state of streams,
allowing a program to repair the problem rather than having to abort. Conditions are represented by
the following condition flags:
• ios::badbit:
if this flag has been raised an illegal operation has been requested at the level of the
streambuf object to which the stream interfaces. See the member functions below for
some examples.
• ios::eofbit:
if this flag has been raised, the ios object has sensed end of file.
• ios::failbit:
if this flag has been raised, an operation performed by the stream object has failed (like
an attempt to extract an int when no numeric characters are available on input). In
this case the stream itself could not perform the operation that was requested of it.
• ios::goodbit:
this flag is raised when none of the other three condition flags were raised.
Several condition member functions are available to manipulate or determine the states of ios objects.
Originally they returned int values, but their current return type is bool:
• ios::bad():
the value true is returned when the stream’s badbit has been set and false other-
wise. If true is returned it indicates that an illegal operation has been requested at the
level of the streambuf object to which the stream interfaces. What does this mean? It
indicates that the streambuf itself is behaving unexpectedly. Consider the following
example:
std::ostream error(0);
Here an ostream object is constructed without providing it with a working streambuf
object. Since this ‘streambuf’ will never operate properly, its badbit flag is raised from
the very beginning: error.bad() returns true.
• ios::eof():
the value true is returned when end of file (EOF) has been sensed (i.e., the eofbit flag
has been set) and false otherwise. Assume we’re reading lines line-by-line from cin,
but the last line is not terminated by a final \n character. In that case std::getline
attempting to read the \n delimiter hits end-of-file first. This raises the eofbit flag and
cin.eof() returns true. For example, assume std::string str and main executing
the statements:
getline(cin, str);
cout << cin.eof();
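Then
echo "hello world" | program
prints the value 0 (no EOF sensed, as echo appends a final \n). But after
echo -n "hello world" | program
the value 1 (EOF sensed) is printed.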
• ios::fail():
the value true is returned when bad returns true or when the failbit flag was set.
The value false is returned otherwise. In the above example, cin.fail() returns
false, whether we terminate the final line with a delimiter or not (as we’ve read a
line). However, trying to execute a second getline will set the failbit flag, causing
cin.fail() to return true. In general: fail returns true if the requested stream
operation failed. A simple example showing this consists of an attempt to extract an
int when the input stream contains the text hello world. The value not fail() is
returned by the bool interpretation of a stream object (see below).
• ios::good():
the value of the goodbit flag is returned. It equals true when none of the other condi-
tion flags (badbit, eofbit, failbit) was raised. Consider the following little pro-
gram:
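#include <iostream>
#include <string>
using namespace std;
void state()
{
    cout << "\n"
            "Bad: " << cin.bad() << " "
            "Fail: " << cin.fail() << " "
            "Eof: " << cin.eof() << " "
            "Good: " << cin.good() << '\n';
}
int main()
{
    string line;
    int x;
    cin >> x;               // fails: the input starts with non-numeric text
    state();
    cin.clear();            // clear the error state
    getline(cin, line);     // reads the first line
    state();
    getline(cin, line);     // reads the unterminated second line
    state();
}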
When this program processes a file having two lines, containing, respectively, hello
and world, while the second line is not terminated by a \n character the following is
shown:
Bad: 0 Fail: 1 Eof: 0 Good: 0
Bad: 0 Fail: 0 Eof: 0 Good: 1
Bad: 0 Fail: 0 Eof: 1 Good: 0
Thus, extracting x fails (good returning false). Then, the error state is cleared, and
the first line is successfully read (good returning true). Finally the second line is read
(incompletely): good returning false, and eof returning true.
• Interpreting streams as bool values:
streams may be used in expressions expecting logical values. Some examples are:
if (cin) // cin itself interpreted as bool
if (cin >> x) // cin interpreted as bool after an extraction
if (getline(cin, str)) // getline returning cin
When interpreting a stream as a logical value, it is actually ‘not fail()’ that is inter-
preted. The above examples may therefore be rewritten as:
if (not cin.fail())
if (not (cin >> x).fail())
if (not getline(cin, str).fail())
The former incantation, however, is used almost exclusively.
• ios::clear():
When an error condition has occurred, and the condition can be repaired, then clear
can be used to reset the error state of the stream. An overloaded version exists accepting
state flags, which are set after first clearing the current set of flags:
clear(ios::iostate state). Its return type is void.
• ios::rdstate():
The current set of flags of an ios object is returned (as an ios::iostate value). To test
for a particular flag, use the bitwise and operator:
if (!(iosObject.rdstate() & ios::failbit))
{
// last operation didn’t fail
}
Note that this test cannot be performed for the goodbit flag as its value equals zero.
To test for ‘good’ use a construction like:
if (iosObject.rdstate() == ios::goodbit)
{
// state is ‘good’
}
• ios::setstate(ios::iostate state):
A stream may be assigned a certain set of states using setstate. Its return type is
void. E.g.,
cin.setstate(ios::failbit); // set state to ‘fail’
To set multiple flags in one setstate() call use the bitor operator:
cin.setstate(ios::failbit | ios::eofbit);
The member clear is a shortcut to clear all error flags. Of course, clearing the flags doesn’t
automatically mean the error condition has been cleared too. The strategy should be:
– An error condition is detected;
– The error is repaired;
– The member clear is called.
C++ supports an exception mechanism to handle exceptional situations. According to the ANSI/ISO
standard, exceptions can be used with stream objects. Exceptions are covered in chapter 9. Using
exceptions with stream objects is covered in section 9.7.
The way information is written to streams (or, occasionally, read from streams) is controlled by format-
ting flags.
Formatting is used when it is necessary to, e.g., set the width of an output field or input buffer and
to determine the form (e.g., the radix) in which values are displayed. Most formatting features belong
to the realm of the ios class. Formatting is controlled by flags, defined by the ios class. These flags
may be manipulated in two ways: using specialized member functions or using manipulators, which
are directly inserted into or extracted from streams. There is no special reason for using either method;
usually both methods are possible. In the following overview the various member functions are first
introduced. Following this the flags and manipulators themselves are covered. Examples are provided
showing how the flags can be manipulated and what their effects are.
Many manipulators are parameterless and are available once a stream header file (e.g., iostream) has
been included. Some manipulators require arguments. To use the latter manipulators the header file
iomanip must be included.
Several member functions are available to manipulate the I/O formatting flags. Instead of the
members listed below, manipulators that can be inserted directly into or extracted directly from
streams are often available. The members are listed in alphabetical order, but the most important
ones in practice are setf, unsetf and width.
• ios::precision(int signif):
the number of significant digits to use when outputting real values is set to signif.
The previously used number of significant digits is returned. If the number of required
digits exceeds signif then the number is displayed in ‘scientific’ notation (cf. section
6.3.2.2). Manipulator: setprecision. Example:
cout.precision(3); // 3 digits precision
cout << setprecision(3); // same, using the manipulator
cout << 1.23 << " " << 12.3 << " " << 123.12 << " " << 1234.3 << '\n';
// displays: 1.23 12.3 123 1.23e+03
• ios::setf(ios::fmtflags flags):
sets one or more formatting flags (use the bitor operator to combine multiple flags).
Already set flags are not affected. The previous set of flags is returned. Instead of
using this member function the manipulator setiosflags may be used. Examples are
provided in the next section (6.3.2.2).
• ios::setf(ios::fmtflags flags, ios::fmtflags mask):
clears all flags mentioned in mask and sets the flags specified in flags. The previous
set of flags is returned. Some examples are (but see the next section (6.3.2.2) for a more
thorough discussion):
// left-adjust information in wide fields
cout.setf(ios::left, ios::adjustfield);
// display integral values as hexadecimal numbers
cout.setf(ios::hex, ios::basefield);
// display floating point values in scientific notation
cout.setf(ios::scientific, ios::floatfield);
• ios::unsetf(ios::fmtflags flags):
the specified formatting flags are cleared (leaving the remaining flags unaltered). Its
return type is void. A request to unset an active default flag (e.g., cout.unsetf(ios::dec))
is ignored. Instead of this member function the manipulator resetiosflags may also
be used. Example:
cout << 12.24; // displays 12.24
cout.setf(ios::fixed); // setf is a member, not a manipulator
cout << 12.24; // displays 12.240000
cout.unsetf(ios::fixed); // undo a previous ios::fixed setting.
cout << 12.24; // displays 12.24
cout << resetiosflags(ios::fixed); // using manipulator rather
// than unsetf
• ios::width() const:
the currently active output field width to use on the next insertion is returned. The
default value is 0, meaning ‘as many characters as needed to write the value’.
• ios::width(int nchars):
the field width of the next insertion operation is set to nchars, returning the previously
used field width. This setting is not persistent. It is reset to 0 after every insertion
operation. Manipulator: std::setw(int). Example:
cout.width(5);
cout << 12; // using 5 chars field width
cout << setw(12) << "hello"; // using 12 chars field width
Most formatting flags are related to outputting information. Information can be written to output
streams in basically two ways: using binary output, information is written directly to an output
stream without first converting it to some human-readable format; using formatted output, values
stored in the computer’s memory are first converted to human-readable text. Formatting flags are
used to define the way this conversion takes place. In this section all formatting flags are covered.
Formatting flags may be (un)set using member functions, but often manipulators having the same
effect may also be used. For each of the flags it is shown how they can be controlled by a member
function or -if available- a manipulator.
• ios::internal:
to add fill characters (blanks by default) between the minus sign of negative numbers
and the value itself. Other values and data types are right-adjusted. Manipulator:
std::internal. Example:
cout.setf(ios::internal, ios::adjustfield);
cout << internal; // same, using the manipulator
• ios::left:
to left-adjust values in fields that are wider than needed to display the values. Manipu-
lator: std::left. Example:
cout.setf(ios::left, ios::adjustfield);
cout << left; // same, using the manipulator
cout << '\'' << setw(5) << "hi" << "'\n"; // displays 'hi   '
• ios::right:
to right-adjust values in fields that are wider than needed to display the values. Manip-
ulator: std::right. This is the default. Example:
cout.setf(ios::right, ios::adjustfield);
cout << right; // same, using the manipulator
cout << '\'' << setw(5) << "hi" << "'\n"; // displays '   hi'
• ios::dec:
• ios::hex:
• ios::oct:
• std::setbase(int radix):
This is a manipulator that can be used to change the number representation to decimal,
hexadecimal or octal. Example:
cout << setbase(8); // octal numbers, use 10 for
// decimal, 16 for hexadecimal
cout << 16; // displays 20
• ios::boolalpha:
logical values may be displayed as the text ‘true’ for the true logical value and ‘false’
for the false logical value by setting boolalpha. By default this flag is not set. To clear
the flag use unsetf or the std::noboolalpha manipulator (no ios::noboolalpha flag
exists). Manipulators: std::boolalpha and std::noboolalpha. Example:
cout.setf(ios::boolalpha);
cout << boolalpha; // same, using the manipulator
cout << (1 == 1); // displays true
• ios::showbase:
to display the numeric base of integral values. With hexadecimal values the 0x prefix is
used, with octal values the prefix 0. For the (default) decimal value no particular prefix
is used. To clear the flag use unsetf or the std::noshowbase manipulator (no
ios::noshowbase flag exists). Manipulators: std::showbase and std::noshowbase. Example:
cout.setf(ios::showbase);
cout << showbase; // same, using the manipulator
cout << hex << 16; // displays 0x10
• ios::showpos:
to display the + sign with positive decimal (only) values. To clear the flag use unsetf or
the std::noshowpos manipulator (no ios::noshowpos flag exists). Manipulators:
std::showpos and std::noshowpos. Example:
cout.setf(ios::showpos);
cout << showpos; // same, using the manipulator
cout << 16; // displays +16
cout.unsetf(ios::showpos); // Undo showpos
cout << 16; // displays 16
• ios::uppercase:
• ios::fixed:
to display real values using a fixed decimal point (e.g., 12.25 rather than 1.225e+01), the
fixed formatting flag is used. It can be used to set a fixed number of digits behind the
decimal point. Manipulator: fixed. Example:
cout.setf(ios::fixed, ios::floatfield);
cout.precision(3); // 3 digits behind the .
// Alternatively:
cout << setiosflags(ios::fixed) << setprecision(3);
cout << 3.0 << " " << 3.01 << " " << 3.001 << '\n'
<< 3.0004 << " " << 3.0005 << " " << 3.0006 << '\n';
// Results in:
// 3.000 3.010 3.001
// 3.000 3.001 3.001
The example shows that 3.0005 is rounded away from zero, becoming 3.001 (likewise
-3.0005 becomes -3.001). First setting precision and then fixed has the same effect.
• ios::scientific:
to display real values in scientific notation (e.g., 1.24e+03). Manipulator: std::scientific.
Example:
cout.setf(ios::scientific, ios::floatfield);
cout << scientific; // same, using the manipulator
cout << 12.25; // displays 1.225000e+01 (default precision: 6)
• ios::showpoint:
to display a trailing decimal point and trailing decimal zeros when real numbers are
displayed. To clear the flag use unsetf or the std::noshowpoint manipulator (no
ios::noshowpoint flag exists). Manipulators: std::showpoint, std::noshowpoint. Example:
cout << fixed << setprecision(3); // 3 digits behind .
cout.setf(ios::showpoint); // set the flag
cout << showpoint; // same, using the manipulator
cout << 16.0 << ", " << 16.1 << ", " << 16;
// displays: 16.000, 16.100, 16
Note that the final 16 is an integral rather than a floating point number, so it has no dec-
imal point. So showpoint has no effect. If ios::showpoint is not active trailing zeros
are discarded. If the fraction is zero the decimal point is discarded as well. Example:
cout.unsetf(ios::fixed | ios::showpoint); // unset the flags
• std::endl:
manipulator inserting a newline character and flushing the stream. Often flushing the
stream is not required and doing so would needlessly slow down I/O processing.
Consequently, using endl should be avoided (in favor of inserting '\n') unless flushing the
stream is explicitly intended. Note that streams are automatically flushed when the
program terminates or when a stream is ‘tied’ to another stream (cf. tie in section 6.3).
Example:
cout << "hello" << endl; // prefer: << '\n';
• std::ends:
• std::flush:
a stream may be flushed using this manipulator. Often flushing the stream is not required
and doing so would needlessly slow down I/O processing. Consequently, using flush
should be avoided unless it is explicitly required to do so. Note that streams are auto-
matically flushed when the program terminates or when a stream is ‘tied’ to another
stream (cf. tie in section 6.3). Example:
cout << "hello" << flush; // avoid if possible.
• ios::skipws:
leading white space characters (blanks, tabs, newlines, etc.) are skipped when a value
is extracted from a stream. This is the default. If the flag is not set, leading white
space characters are not skipped. To clear the flag use unsetf or the std::noskipws
manipulator (no ios::noskipws flag exists). Manipulators: std::skipws, std::noskipws. Example:
cin.setf(ios::skipws);
cin >> skipws; // same, using the manipulator
int value;
cin >> value; // skips initial blanks
• ios::unitbuf:
the stream for which this flag is set will flush its buffer after every output operation.
Often flushing a stream is not required and doing so would needlessly slow down I/O
processing. Consequently, setting unitbuf should be avoided unless flushing the stream
is explicitly intended. Note that streams are automatically flushed when the program
terminates or when a stream is ‘tied’ to another stream (cf. tie in section 6.3). To clear
the flag use unsetf or the std::nounitbuf manipulator (no ios::nounitbuf flag exists).
Manipulators: std::unitbuf, std::nounitbuf.
Example:
cout.setf(ios::unitbuf);
cout << unitbuf; // same, using the manipulator
• std::ws:
manipulator removing all white space characters (blanks, tabs, newlines, etc.) at the
current file position. White space is removed if present, even if the ios::skipws flag
has been cleared. Example (assume the input contains 4 blank characters followed by the
character X):
cin >> ws; // skip white space
cin.get(); // returns 'X'
6.4 Output
In C++ output is primarily based on the std::ostream class. The ostream class defines the basic
operators and members inserting information into streams: the insertion operator (<<), and special
members like write writing unformatted information to streams.
The class ostream acts as base class for several other classes, all offering the functionality of the
ostream class, but adding their own specialties. In the upcoming sections we will introduce:
• The class ofstream, allowing us to write files (comparable to C’s fopen(filename, "w"));
• The class ostringstream, allowing us to write information to memory (comparable to C’s sprintf
function).
The class ostream defines basic output facilities. The cout, clog and cerr objects are all ostream
objects. All facilities related to output as defined by the ios class are also available in the ostream
class.
• ostream object(streambuf *sb):
this constructor creates an ostream object which is a wrapper around an existing std::streambuf
object. It isn’t possible to define a plain ostream object (e.g., using std::ostream
out;) that can thereupon be used for insertions. When cout or its friends are used, we
are actually using a predefined ostream object that has already been defined for us and
interfaces to the standard output stream using a (also predefined) streambuf object
handling the actual interfacing.
It is, however, possible to define an ostream object passing it a 0-pointer. Such an
object cannot be used for insertions (i.e., its ios::badbit flag is raised when something
is inserted into it), but it may be given a streambuf later. Thus it may be constructed
in advance, suspending its use until an appropriate streambuf becomes available.
To use the ostream class in C++ sources, the ostream header file must be included. To use the
predefined ostream objects (std::cout, std::cerr etc.) the iostream header must be included.
The insertion operator (<<) is used to insert values in a type safe way into ostream objects. This is
called formatted output, as binary values which are stored in the computer’s memory are converted to
human-readable ASCII characters according to certain formatting rules.
The insertion operator points to the ostream object to receive the information. The normal
associativity of << remains unaltered, so when a statement containing several insertions is
encountered, the leftmost two operands are evaluated first (cout << "hello "), and an ostream
& object, which is actually the same cout object, is returned. The statement is thereby reduced
to an insertion of the next operand into cout.
The << operator has a lot of (overloaded) variants, so many types of variables can be inserted into
ostream objects. There is an overloaded <<-operator expecting an int, a double, a pointer, etc.
Each operator returns the ostream object into which the information so far has been inserted,
so it can immediately be followed by the next insertion.
Streams lack facilities for formatted output like C’s printf and vprintf functions. Although it is not
difficult to implement these facilities in the world of streams, printf-like functionality is hardly ever
required in C++ programs. Furthermore, as it is potentially type-unsafe, it might be better to avoid
this functionality completely.
When binary files must be written, normally no text-formatting is used or required: an int value
should be written as a series of raw bytes, not as a series of ASCII numeric characters 0 to 9. The
following member functions of ostream objects may be used to write ‘binary files’:
Although not every ostream object supports repositioning, they usually do. This means that it is
possible to rewrite a section of the stream which was written earlier. Repositioning is frequently used
in database applications where it must be possible to access the information in the database at random.
The current position can be obtained and modified using the following members:
• ios::pos_type tellp():
the current (absolute) position in the file where the next write-operation to the stream
will take place is returned.
• ostream &seekp(ios::off_type step, ios::seekdir org):
modifies a stream’s actual position. The function expects an off_type step represent-
ing the number of bytes the current stream position is moved with respect to org. The
step value may be negative, zero or positive.
The origin of the step, org is a value in the ios::seekdir enumeration. Its values are:
– ios::beg:
the stepsize is computed relative to the beginning of the stream. This value
is used by default.
– ios::cur:
the stepsize is computed relative to the current position of the stream (as
returned by tellp).
– ios::end:
the stepsize is interpreted relative to the current end position of the stream.
It is OK to seek or write beyond the last file position. Writing bytes to a location beyond
EOF will pad the intermediate bytes with ASCII-Z values: null-bytes. Seeking before
ios::beg raises the ios::fail flag.
Unless the ios::unitbuf flag has been set, information written to an ostream object is not imme-
diately written to the physical stream. Rather, an internal buffer is filled during the write-operations,
and when full it is flushed.
• ostream& flush():
any buffered information stored internally by the ostream object is flushed to the device
to which the ostream object interfaces. A stream is flushed automatically when:
– the object ceases to exist;
– the endl or flush manipulators (see section 6.3.2.2) are inserted into an ostream
object;
– a stream supporting the close-operation is explicitly closed (e.g., a std::ofstream
object, cf. section 6.4.2).
The std::ofstream class is derived from the ostream class: it has the same capabilities as the
ostream class, but can be used to access files or create files for writing.
In order to use the ofstream class in C++ sources, the fstream header file must be included. Including
fstream will not automatically make available the standard streams cin, cout and cerr. Include
iostream to declare these standard streams.
• ofstream object:
this is the basic constructor. It defines an ofstream object which may be associated
with an actual file later, using its open() member (see below).
• ofstream object(char const *name, ios::openmode mode):
this constructor defines an ofstream object and associates it immediately with the file
named name using output mode mode. Section 6.4.2.1 provides an overview of available
output modes. Example:
ofstream out("/tmp/scratch");
It is not possible to open an ofstream using a file descriptor. The reason for this is (apparently)
that file descriptors are not universally available over different operating systems. Fortunately, file
descriptors can be used (indirectly) with a std::streambuf object (and in some implementations:
with a std::filebuf object, which is also a streambuf). Streambuf objects are discussed in section
14.7, filebuf objects are discussed in section 14.7.2.
Instead of directly associating an ofstream object with a file, the object can be constructed first, and
opened later.
• void open(char const *name, ios::openmode mode):
this member function is used to associate an ofstream object with an actual file. If the
ios::fail flag was set before calling open and opening succeeds the flag is cleared.
Opening an already open stream fails. To reassociate a stream with another file it must
first be closed:
ofstream out("/tmp/out");