System Programming in Linux 2025 9
SYSTEM PROGRAMMING IN LINUX
A Hands-On Introduction
STEWART N. WEISS

level languages and frameworks, writing code that runs on Linux without understanding how it
interacts with Linux.

In today’s world, that’s not enough to stand out. Especially as more companies turn to AI to write
their software, the question becomes: How do you stay relevant in an AI-driven world? You learn how
things really work.

If you’ve ever wondered how processes are created, how memory and files are managed, or how
programs communicate in a Unix environment, System Programming in Linux will make it all make sense.

This is a hands-on guide to writing software that interfaces directly with the Linux operating
system. You’ll go beyond shell commands and abstractions to understand what the kernel is doing and
how to leverage it through your own code. Rather than telling you how to solve each problem,
Professor Stewart N. Weiss guides you through the process of discovering the solution yourself.

Start with the core concepts of Unix and Linux, then work your way up to advanced topics like
process control, signals, interprocess communication, threading, and non-blocking I/O. Each chapter
includes conceptual diagrams, annotated source code, and practical projects to help you immediately
apply what you’ve learned.

• The mechanics of signals, timers, and interprocess communication
• Using synchronization tools to write multithreaded programs
• Interacting with filesystems, devices, and terminals
• Building text-based user interfaces using ncurses
• Developing programs that are robust, efficient, and portable

At Hunter College, Professor Weiss built the course this book is based on, and he has helped
thousands of students go from confusion to confidence in his over 40 years of teaching programming.
His clear, conversational style; technical depth; and focus on real-world application make this one
of the most approachable and powerful system programming books available.

As Linux continues to dominate development, server, and embedded environments, understanding the
system behind your software isn’t just helpful; it’s essential.

Whether you’re a student, developer, or sysadmin, this book gives you the tools to work directly
with Linux and the insight to understand what’s really happening under the hood.
THE FINEST IN GEEK ENTERTAINMENT™
nostarch.com
SYSTEM PROGRAMMING IN LINUX
A Hands-On Introduction
by Stewart N. Weiss
San Francisco
SYSTEM PROGRAMMING IN LINUX. Copyright © 2026 by Stewart N. Weiss.
All rights reserved. No part of this work may be reproduced or transmitted in any form or by any means, electronic
or mechanical, including photocopying, recording, or by any information storage or retrieval system, without the
prior written permission of the copyright owner and the publisher.
First printing
29 28 27 26 25     1 2 3 4 5
Sunflower photograph by Gilberto da Silva Moraes used under license from Shutterstock.com.
For customer service inquiries, please contact [email protected]. For information on distribution, bulk sales,
corporate sales, or translations: [email protected]. For permission to translate this work: [email protected].
To report counterfeit copies or piracy: [email protected]. The authorized representative in the EU
for product safety and compliance is EU Compliance Partner, Pärnu mnt. 139b-14, 11317 Tallinn, Estonia,
[email protected], +3375690241.
No Starch Press and the No Starch Press iron logo are registered trademarks of No Starch Press, Inc. Other product
and company names mentioned herein may be the trademarks of their respective owners. Rather than use a trademark
symbol with every occurrence of a trademarked name, we are using the names only in an editorial fashion and to the
benefit of the trademark owner, with no intention of infringement of the trademark.
The information in this book is distributed on an “As Is” basis, without warranty. While every precaution has been
taken in the preparation of this work, neither the author nor No Starch Press, Inc. shall have any liability to any person
or entity with respect to any loss or damage caused or alleged to be caused directly or indirectly by the information
contained in it.
To
Joanna, my hero, love, and soulmate;
Shayna, light of my life;
and
my parents, who taught me what it means to be honorable.
About the Author
Stewart N. Weiss was a professor in the Department of Computer Science
at Hunter College for 38 years and served on the faculty of the Graduate
Center of the City University of New York as well. He has taught a broad
range of courses, several of which he developed, including Unix system
programming, parallel computing, software testing, and open source software
development. He authored or co-authored nearly two dozen publications
on aspects of software engineering, including software testing and reliability
and open source software development. He was a principal investigator on
several grants from the National Science Foundation.
Stewart holds a PhD in computer science from the Courant Institute
of Mathematical Sciences at New York University. He started working with
Unix and C in 1983 while he was a graduate student there and has been a
Unix enthusiast ever since. He has always loved teaching and is very
passionate about sharing his appreciation and knowledge of Unix and Linux.
Acknowledgments
Preface
Introduction
Bibliography
Index
CONTENTS IN DETAIL
ACKNOWLEDGMENTS
PREFACE
INTRODUCTION
What Will You Learn from This Book?
How Will This Book Teach You?
    Using Open Source Software
    Presenting Different Perspectives
    Using Example Programs
What Should You Know to Understand This Book?
    The Role of C in This Book
    Utility Programs
    System Requirements
About UNIX, Unix, Linux, and More
Scope, Content, and Organization
    Chapter Organization
    Online Materials
Conventions and Format
    Typographical Conventions
    Notation
    Example Program Naming Conventions
    Dates and Identities
Suggestions and Corrections
1
CORE CONCEPTS
What Is System Programming?
    The Magic of Input and Output
    The Role of the C Library in I/O
    System Resources
    System Programs Explained
Fundamental Concepts of Unix
    The Unix Kernel
    Shells and Commands
    Users and Groups
    Privileged and Nonprivileged Instructions
    Environments
    Files, Directories, and the Single Directory Hierarchy
    Processes
    Threads
    Online Documentation
Using the Manual Pages
    The Pager
    The Structure of Man Pages
    Searching Through the Man Pages
Unix History and Standards
    The Birth of UNIX
    Early Branches
    The Free Software Foundation and GNU
    The Rise of Linux
    Many Unixes
    Unix and Related Standards
C Standards
Summary
Exercises
2
FUNDAMENTALS OF SYSTEM PROGRAMMING
Object Libraries
    System Libraries
    Static and Shared Libraries
    The Advantages of Shared Libraries
    Commands to Query a Library’s Contents
    Commands to Show the Libraries Linked to a Program
    The C Standard Library
System Calls
    Wrapper Functions
    System Call Execution
    Multiple Paths to Kernel Services
Handling Errors from System Calls and Library Functions
    System Call Errors
    Errors from Library Functions
Portability
    Feature Test Macros
    Other Portability Issues
System Limits
Internationalization
Processing the Command Line and Environment
    Extracting Command Line Arguments
    Accessing the Environment
    Reporting Usage Errors
    Extracting the Program Name
3
TIME, DATES, AND LOCALES
Learning System Programming
Organization of Common Code
    Functions for Extracting Numbers
    Common Error-Handling Functions
    File Organization
Planning Our First System Program
Designing the First Version of spl_date
    About Calendar Time in Unix
    Broken-Down Time
    Calendar Time System Calls
    Time Conversion Functions
Designing a Second Version of spl_date
Designing a Third Version of spl_date
    The User Interface
    Program Logic
Working with Locales
    Locale Categories
    About Time Zones
    The Command-Level Interface to Locales
    The Structure of Locales
    The Programming Interface to Locales
    An Internationalized Version of the spl_date Program
    Other Ways to Internationalize Programs
    Locale Objects
Summary
Exercises
4
BASIC CONCEPTS OF FILE I/O
High-Level vs. Low-Level File I/O
Universal I/O
File Permissions Revisited
    Applying the Umask
    Setting and Getting Umasks
    Propagating Umasks
A Process’s User IDs
The setuid Bit
Input/Output Mechanics
5
FILE I/O AND LOGIN ACCOUNTING
Controlling the Position of I/O Operations
    The lseek() System Call
    File Holes
Displaying Last Login Information
    The lastlog Command
    The lastlog File
    The lastlog Structure
Usernames, User IDs, and the passwd File
    The Password Database
    Accessing All User Entries
Developing a lastlog Program
    Design Considerations
    Program Logic
    Writing the Program
Developing a last Command
    Login Records
    The utmp Structure
    The utmpx API
    Logins, Logouts, and the utmp and wtmp Files
    A Program to Show the utmp and wtmp Files
    Analysis of the wtmp File
    Designing the spl_last Program
    User Space Buffering of Input
Summary
Exercises
7
THE DIRECTORY HIERARCHY
Directory Structure
Processing Directories
    The readdir() Library Function
    The dirent Structure
    Directory Streams
    The opendir() Library Function
    The closedir() Library Function
A Simple ls Program
Other Functions in the Directory API
    The telldir() and seekdir() Library Functions
    The scandir() Library Function
Processing the Directory Hierarchy
Mounting File Systems
    An Example of Filesystem Mounting
    Commands for Finding Mount Points
    Duplicate Inode Numbers
Tree Walks
    A Recursive Tree Walk Using readdir()
    A Recursive Tree Walk Using scandir()
    The nftw() Tree Walk Function
    Writing a du Command
    The fts Tree Traversal Functions
The pwd Command
    An Exercise in Constructing a Directory Tree
    A Strategy for Implementing the pwd Command
Summary
Exercises
8
INTRODUCTION TO SIGNALS
The Role of Signals
    A Signal Delivery Example
    Sources of Signals
Signal Concepts
    The Lifetime of a Signal
    Signal Types
Basic Signal Handling
    The signal() System Call
    The System V signal() Semantics
Sending Signals
Blocking Signals
    Signal Sets
    The sigprocmask() Function
9
TIMERS AND SLEEP FUNCTIONS
Keeping Track of Time
    Alarm Clocks and Timers
    Sleep Functions and Timers
Time, Clocks, and Timing
    Overview
    Hardware Clocks and Hardware Timers
    The System Clock
High-Resolution Sleep Functions
    The nanosleep() System Call
    The clock_nanosleep() System Call
Software Timers
    The alarm() System Call
    A Progress Bar Based on Alarms
Interval Timers
    Overview
    POSIX Timers
    A POSIX Timer-Based Progress Bar
    Resource Monitors
    Real-Time Signals and Multiple Timers
Summary
Exercises
10
PROCESS FUNDAMENTALS
Processes Revisited
The Process Tree
Process Groups
Sessions
Foreground and Background Processes and Process Groups
Program Files
    The Contents of an Executable File
    The Executable and Linking Format
The Virtual Memory Layout of a Process
    The Text Segment
    The Initialized Data Segment
11
PROCESS CREATION AND TERMINATION 539
The Lifetime of a Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 540
Creating Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 540
The Basics of fork() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 541
The Child’s Memory Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 543
The Child’s Process Descriptor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 545
Sharing of Open Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 546
Potential Race Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 551
Process Synchronization with Signals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 553
Other Functions That Create Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 556
Terminating Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 557
Executing Programs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 560
The execve() System Call . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 561
The exec() Library Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 565
Waiting for Children . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 569
The wait() and waitpid() System Calls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 570
The waitid() System Call . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 582
The SIGCHLD Signal and Asynchronous Waiting . . . . . . . . . . . . . . . . . . . . . . . . . . 582
Putting It All Together: A Simple Shell . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 590
The system() Library Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 593
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 594
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 595
12
INTRODUCTION TO INTERPROCESS COMMUNICATION 597
Why Do We Need IPC? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 598
An Overview of Interprocess Communication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 598
Shared Memory Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 598
Data Transfer Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 599
13
PIPES AND FIFOS 645
An Overview of Pipes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 645
Pipe Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 646
Unnamed Pipes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 648
The Behavior of Read Operations on Pipes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 651
The Behavior of Write Operations on Pipes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 652
A Producer-Consumer Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 653
A Shell Pipe Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 656
Best Practices Regarding Pipes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 662
The popen() and pclose() Library Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 663
FIFOs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 667
Creating Named Pipes in the Shell . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 667
Creating FIFOs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 669
Opening FIFOs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 670
Putting It All Together: A Simple FIFO-Based Server-Like Program . . . . . . . . . . . . . 671
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 676
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 677
14
CLIENT-SERVER APPLICATIONS AND DAEMONS 679
Introduction to Client-Server Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 680
System Logging Facilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 681
Daemons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 683
Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 683
Converting Processes into Daemons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 684
15
INTRODUCTION TO THREADS 709
Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 710
Threads and Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 711
Support in the Kernel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 711
Pros and Cons of Multithreading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 711
Shared Resources and Attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 712
Program Design Considerations with Threads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 715
Overview of the Pthreads Library . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 716
Thread Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 717
Creating a Thread . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 717
Exiting a Thread . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 718
Joining a Thread . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 718
Passing Data to Threads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 720
Identifying Threads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 722
Detaching Threads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 722
Canceling a Thread . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 724
Setting Thread Stack Size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 725
Signals and Threads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 727
Thread-Directed Signals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 727
Process-Directed Signals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 728
Signal Masks and Dispositions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 728
A Multithreaded Concurrent Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 731
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 735
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 736
16
THREAD SYNCHRONIZATION 739
Correctness and Performance Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 740
Mutexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 740
Declaring and Initializing a Mutex . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 741
Locking and Unlocking a Mutex . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 742
Destroying a Mutex . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 743
A Program Using a Normal Mutex . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 743
Other Types of Mutexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 749
Condition Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 751
Why Do We Need Condition Variables? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 752
The Typical Steps for Using Condition Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . 753
Declaring and Initializing a Condition Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . 754
Waiting on a Condition Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 754
Signaling a Condition Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 755
Destroying a Condition Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 756
Condition Attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 757
A Multithreaded Multiple Producer, Multiple Consumer Program . . . . . . . . . . . . . 757
Barrier Synchronization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 762
Pthreads Barriers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 764
A Program Using Barrier Synchronization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 765
Read-Write Locks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 774
Read-Write Lock API Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 775
Use and Semantics of Read-Write Locks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 776
Further Details About Pthreads Read-Write Locks . . . . . . . . . . . . . . . . . . . . . . . . . . . 778
Read-Write Lock Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 779
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 787
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 788
17
ALTERNATIVE METHODS OF I/O 791
Nonblocking I/O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 792
Enabling Nonblocking I/O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 792
A Program to Demonstrate Nonblocking Input . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 793
Signal-Driven I/O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 796
Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 797
Procedure for Enabling Signal-Driven I/O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 798
Events Causing Signal Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 799
Real-Time Signals and Signal-Driven I/O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 802
A Program Using Signal-Driven I/O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 802
POSIX Asynchronous I/O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 807
Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 807
The AIO API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 808
Performance Benefits of Asynchronous I/O with Disk Files . . . . . . . . . . . . . . . . . . . 813
An AIO-Based Implementation of spl_cp1.c . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 816
Multiplexed I/O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 819
Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 820
The select() System Call . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 820
An Example Program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 824
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 830
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 831
19
INTERACTIVE PROGRAMMING AND THE NCURSES LIBRARY 877
Canonical and Noncanonical Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 878
Canonical Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 878
Overview of Noncanonical Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 879
The MIN and TIME Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 880
An Interactive Program in Noncanonical Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 884
Program Features and Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 885
Terminal Control Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 887
Global Constants, Types, and Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 888
Support Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 889
The sprite.c main() Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 893
Curses and the ncurses Library . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 894
History, Standards, and Names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 895
Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 896
Compiling, Building, and Running Curses Programs . . . . . . . . . . . . . . . . . . . . . . . . 897
Curses Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 897
The Curses API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 901
A Program with Tiled Windows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 909
A Curses Version of sprite.c . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 911
A
CREATING LIBRARIES 943
About Libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 944
Static vs. Shared Libraries in Unix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 944
Identifying Libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 946
Creating a Static Library . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 946
Using (Linking to) a Static Library . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 947
Creating a Shared Library . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 949
Shared Library Names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 949
Steps to Create the Library . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 950
Using a Shared Library . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 951
B
UNICODE AND UTF-8 955
Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 956
Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 956
Unicode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 957
UTF-8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 957
Conversion Example 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 959
Conversion Example 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 960
C
DATE AND TIME FORMAT SPECIFIERS 961
BIBLIOGRAPHY 965
INDEX 969
I thank many people for their help, support, and guidance in the creation
of this book. Foremost is my wife, Joanna Klukowska, who had been
encouraging me to write this book for a long time and who gave me constant
support throughout the entire process. She was always ready and eager to step
through my code and find the bugs that seemed to escape my eyes, and she
was far more steadfast and successful at it than I. In what little spare time she
had, she also proofread many of the chapters.
I am sincerely grateful to Jill Franklin, the managing editor at No Starch
Press, who read my manuscript meticulously, made many improvements
in my writing, and offered many suggestions about making the big picture
clearer. The book would not be what it is without her help and guidance.
I’m also indebted to Mitch Frazier, the technical reviewer, for catching
several mistakes and inaccuracies in the manuscript, for finding bugs in the
code, and for detecting inconsistencies between the code that appears in the
book and its counterpart in the repository. It was reassuring when a chapter
passed his muster. I also thank everyone at No Starch Press who helped in
the production of the book.
This book’s origins date back to when I started teaching a course in Unix
system programming at Hunter College at the City University of New York.
That course was inspired by one taught by Matthew Smosna at New York
University, who was a friend and fellow graduate student there. He passed
away many years ago, but his legacy lives in this book. Many of the students
who took my course made comments and suggestions that ultimately altered
my lecture notes and found their way into this book. Thanks go to all of
them. I give particular thanks to Zhi Peng Lin and Syeda Rahman, who took
the course while I was writing the manuscript and made several suggestions
for the first Early Access edition that improved the book. My brother-in-law,
Jay Militscher, an IT professional, made several useful comments that led to
rewrites of the introductory material.
Lastly, I am grateful to the many authors who’ve written excellent books
about the Unix and Linux operating systems. Those resources were
invaluable during the writing of this book.
xxvi Acknowledgments
PREFACE
xxviii Preface
INTRODUCTION
xxx Introduction
In this sense, my approach tries to adhere to the principles embodied
in the well-known proverb often attributed to the Chinese philosopher Lao
Tzu, paraphrased as follows:
Give a person a fish and they will eat for a day. Teach a person to
fish and they will eat for a lifetime.
Using Example Programs
This book is predicated on a learning model in which you and I investigate
new components of the Unix/Linux programming interface and then
develop code based on them. To this end, I’ve written about 200 example
programs to accompany the book. You can download all of these programs in
order to read and experiment with them. However, I strongly believe that
the best way to learn how to program is to write code. I encourage you to
start by modifying those programs and then writing programs like them from
scratch. The more you do this, the better and more efficient you’ll be at
developing software on Linux systems. See “Online Materials” on page xxxvii
for details about how to obtain copies of the programs.
which they see as archaic. These functions are usually much more useful and
efficient than those found in C++. Where I taught for more than 40 years, the
basic programming classes used C++, but most students who took my classes
in Unix system programming quickly adapted to C, and some preferred it.
Utility Programs
At the very least, you need to know how to compile and build programs on
Linux or other Unix systems. I don’t cover how to use any program
development tools in this book other than showing you how to build the
programs using the GNU Compiler Collection (gcc). However, you’ll be a more
efficient programmer if you know how to use a few software development
system utilities. The most important of these tools are:
make For maintaining program collections
gdb An indispensable command line debugging tool
valgrind To help find memory leaks and bad pointers in code
git A command line tool for version control
If you’re used to using an integrated development environment (IDE) such
as Eclipse or NetBeans, try to toss its training wheels away and learn how to
use these tools instead.
System Requirements
The book assumes that you have access to a Linux system on which you can
develop programs. It doesn’t matter which flavor of Linux you use. I’ve been
using Ubuntu for several years; many other distributions are available. You
don’t need to install it on your machine if you have remote access to a Linux
system, although working remotely is generally slower. You can also install a
Linux virtual machine on your host computer as your work environment for
this book.
If you don’t have superuser privilege for your system, you’ll either need
to get sudo privilege or ask the person who maintains the machine to make
sure the system has the needed packages, which include:
• The GNU compiler collection (gcc)
• All man pages, including manpages-dev, man-db, manpages-posix, and
manpages-posix-dev
• The man-db package, which lets you search through man pages; you’ll
have to run the command mandb as a superuser to initialize the
database for searching
• The make utility, gdb, git, and valgrind
You’ll have to check which package manager your variant of Linux uses for
installing packages if you need to install any of these.
About UNIX, Unix, Linux, and More
Unix has a history dating back to 1969, and since that time many different
variants have been developed, of which Linux is one. In 1969, and for many
years after that, Unix was always written as UNIX because its name was a pun
based on an earlier system named MULTICS on which its original
developers worked. In fact, for a very short period of time, it was called UNICS. In
1993, UNIX became a registered trademark of The Open Group, a
consortium of companies. The term Unix is not trademarked and doesn’t refer to
any one operating system. In general, it refers to any operating system that
is what people often call Unix-like. In the interest of clarity, when the term
UNIX appears in the text, it refers very narrowly to any operating system that
has been certified by The Open Group as conforming to its branding of the
term or to those versions of Unix predating the trademark whose name was
written as UNIX at the time. I mostly use the word Unix, which in some
contexts has a precise meaning and in others does not.
One important consequence of the fact that there are so many different
varieties of Unix is that a program that works on one Unix system may not
work on another. This problem led over time to the standardization of Unix.
Chapter 1 contains a brief history of the various applicable standards. The
general problem of writing programs that work across a variety of operating
systems is called portability. Chapter 1 also describes steps that you can take
to make your code portable to Unix systems other than the one on which
you wrote it, provided that they conform to one standard or another.
The term Linux poses a slightly different problem. Technically speaking,
Linux is not an entire operating system with all of its utilities and programs
that come bundled together in an installation package. It’s just what’s
commonly called the kernel, a term defined in Chapter 1. The rest of the
operating system is mostly programs and libraries developed by GNU as part of
the GNU Project (https://www.gnu.org/gnu/gnu.html). (GNU is a recursively
defined acronym for GNU’s Not Unix.) For this reason, many people believe
it should be called GNU/Linux. I am one of those who believe that its name
should reflect the major contribution to it by the GNU Project; therefore,
when I want to refer specifically to the entire operating system, I’ll
sometimes call it GNU/Linux as a reminder, but when I refer specifically to its
kernel, I’ll call it Linux.
Chapter Organization
The book has 19 chapters that build upon one another. I wrote this book
as if I were teaching in a classroom and you were there with me, and we had
embarked on a journey in which we learn this material together. I don’t expect
a reader who starts in Chapter 7 to understand it any more than I would
expect a student who missed the first six classes of a course to understand
much in the seventh class.
Chapter 1: Core Concepts Explains what system programming is and
how it differs from other kinds of programming. It introduces the
fundamental concepts and components of the Unix operating system, such
as users and groups, files and directories, processes, and so on, and it
explains the man pages and how we’ll use them. It also covers some of the
history of Unix and the key standards.
Chapter 2: Fundamentals of System Programming Introduces
concepts related to programming in a Unix environment and working with
the kernel application programming interface (API). It covers object
libraries and the difference between static and shared libraries, system
calls, error handling, portability and feature test macros, system limits,
and internationalization of programs. It also covers how programs can
access the environment strings and their command line arguments, and
process command line options.
Chapter 3: Time, Dates, and Locales Presents the methodology for
learning system programming that the rest of the book follows and
explains how the source code repository that contains all example
programs is organized. It applies this methodology to the development of
programs that work with dates and times in Unix and introduces basic
methods of internationalizing programs.
Chapter 4: Basic Concepts of File I/O Introduces core concepts of
files and file I/O in Unix, including universal I/O, open file connections,
file descriptors, and the parts of the kernel API relevant to I/O. It also
covers file permissions, the types of user IDs, and the setuid facility.
It develops a simplified copy command and explores issues related to
performance and buffering.
Chapter 5: File I/O and Login Accounting Introduces the file pointer,
seeking operations, and a few more advanced methods of I/O. It
introduces system data files related to users and logins, and it develops
simplified versions of the lastlog and last commands.
Chapter 6: Overview of Filesystems and Files Dives into the
structure of disks, disk partitions, disk filesystems, and their internals. It
introduces parts of the kernel API for accessing filesystem attributes, file
attributes, and more, and it also introduces the Linux virtual filesystem
and how it works. It then develops simple versions of the stat and statfs
commands.
Chapter 7: The Directory Hierarchy Explains the structure of directo-
ries and the directory hierarchy. It explores the parts of the kernel API
and the standard libraries for processing directories and the directory
hierarchy, including methods of traversing the hierarchy. Here, we de-
velop simple ls, pwd, and du commands.
Chapter 8: Introduction to Signals Covers the core concepts of signals
and how they’re used in Unix systems. It introduces the parts of the ker-
nel API related to sending signals, signal handling, signal registration,
and signal blocking. It also discusses the design of signal handlers and
the concept of asynchronous signal safety.
Chapter 9: Timers and Sleep Functions Introduces timing elements
for programs and explains fundamental concepts related to timing, such
as clocks, hardware interval timers, and more. It introduces several dif-
ferent sleep functions and software interval timers, and it also develops a
couple of programs that act like system monitors.
Chapter 10: Process Fundamentals Introduces the fundamentals of
processes: what they are, how they’re organized, and how they’re man-
aged and represented internally by the kernel. It introduces the Ex-
ecutable and Linking Format (ELF) file format and how it is used to
create process images, and it also introduces the proc pseudofilesystem.
Here we develop a simplified ps command.
Chapter 11: Process Creation and Termination Introduces the parts
of the kernel API related to the creation, termination, and management
of processes, including calls for the synchronization of parent and child
processes. It develops a simplified shell program.
Chapter 12: Introduction to Interprocess Communication The first
of two chapters dedicated to interprocess communication (IPC). It cov-
ers POSIX shared memory, semaphores, and POSIX message queues. It
develops a few programs that demonstrate the application of these IPC
facilities.
Chapter 13: Pipes and FIFOs Introduces unnamed pipes and named
pipes, also called FIFOs, and goes into details of the semantics of open-
ing, reading, writing, and closing pipes and FIFOs. It develops a simple
FIFO-based server.
Chapter 14: Client-Server Applications and Daemons Covers con-
cepts related to the development of client-server applications, including
system logging facilities and conversion of processes into daemons. It
develops both an iterative server similar to the calc command and a con-
current server.
Chapter 15: Introduction to Threads The first of two chapters on
multithreaded programs. It covers thread basics, explores much of the
Pthreads library related to thread creation and management, and devel-
ops a multithreaded server.
Chapter 16: Thread Synchronization Covers the parts of the Pthreads
API related to the synchronization of threads, including mutexes, condi-
tion variables, barriers, and read-write locks.
Chapter 17: Alternative Methods of I/O Explores I/O models be-
yond the standard blocking I/O model. In particular, it covers non-
blocking I/O and polling, signal-driven I/O, POSIX asynchronous I/O,
and multiplexed I/O using the select() system call.
Chapter 18: Terminals and Terminal I/O Covers terminals and ter-
minal I/O, beginning with the special needs of interactive programs.
It examines the structure of terminal driver software and support for
terminal configuration in the kernel, after which it explores methods
of configuring the terminal such as the termios and ioctl interfaces. It
develops a simplified stty command.
Chapter 19: Interactive Programming and the ncurses Library Covers
configuring the terminal for interactive programs, including noncanon-
ical mode programming. It introduces the ncurses library’s API and de-
velops a few programs based on it, ending with a simple version of the
top command.
Appendix A: Creating Libraries Shows how to create and manage
static and shared libraries.
Appendix B: Unicode and UTF-8 Offers a short tutorial on Unicode
and the variable-length representation of Unicode known as UTF-8.
Appendix C: Date and Time Format Specifiers Presents a table of the
date and time specifiers, with examples, used in the formatting of dates
and times by various functions and system utilities.
Online Materials
To keep the book from becoming too long, programs are generally not pre-
sented in their entirety. Instead, they’re available online along with other
materials. You can access them on the book’s web page at
https://nostarch.com/introduction-system-programming-linux.
Source Code
All of the source code that appears in this book, as well as other example
programs, is available for download at
https://github.com/stewartweiss/intro-linux-sys-prog and as a ZIP file from
https://nostarch.com/introduction-system-programming-linux. The programs are
organized by chapter. Each chapter
directory contains a makefile for building and maintaining the programs in
that directory, and I include a master makefile that can build and maintain
all of the programs in this repository. The top-level directory has a README
file that explains the licensing and has instructions for maintaining the pro-
grams. I have tried to write thorough inline documentation for all programs.
There are also three directories named common, include, and lib. The
first contains source code and header files for functions that are used in
multiple chapters. The makefile there can build a library file that is copied
into lib and copy the headers into include.
All complete programs in the repository are covered by the GNU Gen-
eral Public License (Version 3), a copy of which is in the repository. The
source code for all library functions in the common directory is covered by
the GNU Lesser General Public License (Version 3), a copy of which is also
in the repository.
make Tutorial
I’ve written a make tutorial for those who want to know how to use this utility
program in elementary ways. The book’s web page has a link to a GitHub
repository that contains the tutorial and instructions for how to use it.
Solutions to Exercises
Solutions to selected exercises from the end of each chapter are available
online in a single ZIP file that you can download from the website.
Typographical Conventions
I use a monospaced font for all code, input and output of programs, file con-
tents, and the names of all commands and executable programs. For exam-
ple, I would write “bash is a popular shell in Linux.” I use italic text for the
names of all files and directories, as in “The executable program file for the
bash shell is /bin/bash.”
Notation
In the description of a command or function I use square brackets ([ ])
to enclose optional elements. The brackets are not part of the command.
Italic text denotes placeholders, not actual text that you type. An ellipsis
(...) means more than one copy of the preceding token. For example, in the
description
ls [option]... [directory_name]...
the words option and directory_name are placeholders. The square brackets in-
dicate that both the option specifiers and the argument to the ls command
are optional but that all option specifiers must precede any directory names
and that option specifiers and directory names can occur multiple times,
as in:
ls -l -t chapter01 chapter02
Here, -l and -t are two options and chapter01 and chapter02 are two arguments.
I use a vertical bar (|) to indicate exactly one choice among multiple al-
ternatives. For example, the description
indicates that after all options, you can supply either a command string or
the name of a file but not both.
Throughout the book, I’ll use the $ character as the prompt string dis-
played inside a terminal window. Any text that you would enter is shown
in boldface. For example, the echo command just prints whatever text you
enter after it on the command line. I would demonstrate how you use it as
follows:
Notice that the prompt character is displayed again. This is how I indicate
that you’re seeing all of the command’s output and that the command ter-
minated. When showing all output would require too many lines, I’ll snip
some of the lines by putting the word --snip-- in place of the removed out-
put, as in:
$ ls /var
backups/
cache/
--snip--
$
I’ll also use an ellipsis on a single line when I’ve deleted some of the text
on that line. In the Unix system that you use for following along with this
book, the prompt character that you see might be something other than $.
Many systems might have a default prompt that includes more information,
such as your login name or the name of the computer. In fact, you are usu-
ally able to customize your prompt.
In code listings, I’ll sometimes omit parts of the code to save space. I’ll
indicate this either with a --snip-- in place of multiple lines of code, as in
int main()
{
    --snip--
    return 0;
}
or, when I want to specify what’s missing, I’ll use this notation:
if ( argc < 2 )
// OMITTED: Handle missing argument
else
I’ll also write all pseudocode that appears in code listings in //-style com-
ment blocks, reserving /*...*/-style blocks for actual comments.
Suggestions and Corrections
I’ve tried my best to find bugs and mistakes in the example programs and
the text of the book, but I can’t imagine that I found and corrected them all.
Along the same lines, I sometimes rewrote an explanation many times over,
trying to make sure it is easy and enjoyable to read and accurate, but again, I
am not perfect, and you may find places in the book that you think could be
written better.
If you’d like to make suggestions or corrections to the text of the book,
please email me at [email protected] or email [email protected]. In-
clude either a page number or a piece of identifying text that is long enough
to be unique. If you find bugs or more serious flaws in the code (I hope
not) and you’re familiar with Git, please open an issue at
https://github.com/stewartweiss/intro-linux-sys-prog.
I hope that you learn a lot from this book. Even more important, I hope
that it shows you how you can learn what it doesn’t cover on your own. Fi-
nally, I hope it’s enjoyable to read and that you gain an appreciation for the
marvel and magic of Unix.
1
CORE CONCEPTS
The first line is an include directive, which starts with the keyword #include
and is followed by the specification of a file. It tells the C preprocessor to read
the contents of the file, in this case, the C header stdio.h, at that point in the
program. We need that action to take place because the main program makes
a call to the C printf() function, whose declaration is in stdio.h. Without it,
the compiler could not tell whether printf() was being called properly. The
C preprocessor has to find the header file before it can read it, and header
files can be in many possible places. The angle brackets (<>) around the name
of the file tell the preprocessor that it’s in one of the standard places that it
searches.
$ ./hello_world
hello, world
$
For someone who has never written a program before, this seems like
magic. All you have to do is include the stdio.h header file in the code and
give the printf() function the string that you want to print, and voilà: When
you run the program, the string appears.
It clearly isn’t magic though, and a lot must be going on behind the scenes
to make the characters appear on the screen. C has given us a very powerful
tool, printf(), so that we can write programs that print to the screen without
needing to learn a lot about terminals and other technology.
Let’s take this one step further. The preceding program outputs text but
has no input. Listing 1-2 performs both input and output.
The scanf() function is the C library’s formatted input function. It reads input,
by default from the keyboard, following a format that you give it. In general, its
first parameter is a string enclosed in double quotes followed by one or more
pointers. The double-quoted string is called the format specification. In this ex-
ample, it is "%255s", which specifies that the input data should be stored as a
string (s for string) with a maximum width of 255 characters. The argument fol-
lowing the format specification must be a pointer to the start of a character array
large enough to hold 255 characters plus the terminating null byte. Since array names can
be used in C wherever a constant pointer is expected, the array username is a
valid argument.
Let’s think about how this input and output actually take place. The pro-
gram makes calls to the scanf() and printf() functions, but where is their
code and how is it executed? Many beginning programmers mistakenly be-
lieve that the header files included by their programs contain the function
implementations because all they have to do is put appropriate #include di-
rectives into their programs for them to work. However, those implementa-
tions are not in the header files.
[Figure: Program, C library, Operating system]
When we run a program like hello.c in Listing 1-2, we have the illusion
that the program is connected directly to the keyboard and the display de-
vice via C library functions. If you run the program on your own personal
computing device, this illusion may not be far from reality. However, we
can also run it on a multiuser system in a terminal window, and the results
will be exactly the same. This fact complicates the picture even further. In
a Unix system, and in almost all modern operating systems, many people
can work on the system at the same time, and programs belonging to differ-
ent people can run at the same time, each receiving input from a different
keyboard and sending output to a different display. Each person will see
the same output as if they had run the program on a single-user machine.
The operating system is what makes this possible. It has to ensure that each
user’s programs do not interfere with each other.
System Resources
We can frame this problem in terms of resources. Resources are objects that
software uses and/or modifies. For example, a program’s input and out-
put data are resources, as are the values that it stores in its internal data
structures. A program has the privilege to access or modify any of its own
resources.
In Unix systems, some resources are protected from access by ordinary
programs and are accessible only by the operating system. These protected
resources are called system resources. System resources include hardware, such
as the CPU, physical memory, screen displays, storage devices, and network
connections. They also include objects that aren’t hardware, such as system
data structures and files. These are sometimes called soft resources. Figure 1-2
illustrates the way an operating system is layered in order to control access to
system resources.
[Figure 1-2: Users (people and/or devices); Operating system]
NOTE If you’re familiar with object-oriented programming, you may notice a resemblance
between the operating system’s API and a class interface. Both provide a set of meth-
ods for accessing protected data only through a well-defined set of access points.
or provide functions that higher-level applications can use. For example, we
could write a program that gets the current time from the operating system’s
internal clock and displays it in various formats for any user. This would be a
system program.
The term system program also applies to any program that can run in-
dependently of the operating system and extend its functionality, even if it
doesn’t make any direct calls to the API. Tools such as compilers, assemblers,
linkers, terminal emulators, and so on are considered to be system programs,
and they play a fundamental role in a computer system. As Richard Stallman
wrote, “The kernel is an essential part of an operating system, but useless by
itself; it can function only in the context of a complete operating system” [40].
In this view, system programs are like an extension to the operating system,
even though their definition is a bit fuzzy. The primary purpose of this book
is to show you how you can write programs of this nature, namely those that
interact directly with the operating system and, in effect, act like a part of that
system.
The Unix Kernel
It is perhaps unfortunate that the term operating system has no single, uni-
versally agreed upon definition. If you look at almost any textbook on oper-
ating systems [37, 41], you’ll find two different views of what constitutes an
operating system:
• The operating system is the collection of all software that provides
services to applications and users and manages and protects all hard-
ware resources. In this view, tools like user interfaces and browsers
are part of the operating system.
• The operating system is only the program that is loaded into mem-
ory on startup and remains in memory, controlling all computer
resources, until the computer is powered off.
Regardless of which definition you decide to adopt, the term kernel
unambiguously names the program described in the second definition. It’s an
appropriate name, since it’s the core of the Unix system. In the seminal book
on the design of the 4.4BSD operating system, The Design and Implementation
of the 4.4BSD Operating System, McKusick and co-authors define a kernel as “a
small nucleus of software that provides only the minimal facilities necessary
for implementing additional operating system services” [26]. In this book, I
use the narrow definition of an operating system, namely that it is the kernel
and nothing more.
The kernel is a program, or a collection of interacting programs, depend-
ing on the particular implementation of Unix, with many entry points. An
entry point is an instruction in a program at which execution can begin. Each
of these entry points provides a service that the kernel performs. If you are
used to thinking of programs as always starting at their first line, this may be
disconcerting.
Most likely, in the programs that you have written so far, there has been
a single entry point, namely the main() function. However, it’s possible to
create code that can have several entry points. Software libraries are code
modules with multiple entry points. You can think of entry points as func-
tions that can be called by other programs. They perform services such as
opening, reading, and writing files, creating new processes, allocating mem-
ory, and so on. Each of these functions expects a certain number of argu-
ments of certain types and produces well-defined results. The collection of
kernel entry points makes up a large part of its API. In fact, you can think of
the kernel as containing a collection of separate functions, bundled together
into a large package, and its API as the collection of signatures or prototypes
of these functions.
When a Unix system boots, a combination of firmware and software
loads the kernel into the portion of memory called system space or kernel
space, where it stays until the machine is shut down. User programs are not
allowed to access system space. If they try, the kernel terminates them.
The kernel has full access to all of the hardware attached to the com-
puter. The kernel maintains various system resources in order to perform
services for user programs. These system resources include many different
data structures that keep track of input/output (I/O), memory, and device
usage, for example.
The Unix kernel manages and protects all of these resources and pro-
vides an operating environment that allows all users to work efficiently,
safely, and happily. It prevents users and the programs that they run from
accessing any hardware resources directly. In other words, if a user’s run-
ning program wants to read from or write to a disk, it must ask the kernel
to do that on its behalf, rather than doing it on its own. The kernel will per-
form the task and transfer any data to or from a portion of memory that the
user’s program can access.
To understand why this is necessary, consider what would happen if
users’ programs could access the hard disk directly. A user could run a pro-
gram that could try to acquire all disk space, or even worse, try to erase the
disk, subverting the kernel’s ability to protect its resources.
The Unix kernel also protects users from each other and protects it-
self from users, while simultaneously giving users the impression that they
each have the computer entirely to themselves. This is precisely the illu-
sion described in the section “What Is System Programming?” on page 2.
Somehow, everyone is able to run programs that seem as if they have the
computer all to themselves, as if no one else were using the machine. Users
have their own disk space, their own private portion of memory, their fair
share of time on the CPU, and so on.
In order to achieve these objectives, the inventors of Unix incorporated
several key principles into its design:
• The system designates two levels of privilege (user privilege and ker-
nel privilege) such that certain instructions can be executed only
with kernel privilege.
• Each user has a unique identity. A privileged user can create groups
of users, and those groups have unique identities as well. These user
and group identifiers are assigned privileges and protections for all
user resources such as disk storage, running programs, and so on.
• The system of files supports creation, modification, retrieval, and re-
moval of persisted data and programs, as well as privacy, protection,
and the ability to share software and data.
• Physical memory is divided into two regions: user space, where ordi-
nary user programs are loaded, and system space, which is where the
operating system itself is stored.
• The kernel has exclusive control of the use of the processor, and it
decides at any given time what runs next.
• The kernel has the exclusive ability to load programs into memory,
run them, and terminate them. A running program cannot even
terminate itself; the best it can do is to ask the kernel to terminate it!
• The kernel has complete and exclusive control of all computer
hardware.
We’ll describe each of these principles in more depth in the remaining
sections of this chapter.
Kernel Services
I’ve mentioned reading and writing files and terminal I/O as some of the
types of services that the kernel provides, but to give you an even better
sense of the scope of its services, the following list shows the types of ser-
vices it performs:
• Process scheduling and management
• I/O handling
• Physical and virtual memory management
• Device management
• Filesystem management
• Signaling and interprocess communication
• Multithreading
• Protection and security
• Networking services
Figure 1-3 depicts how users and their programs access system resources
and services through the kernel’s application programming interface.
[Figure 1-3: Kernel service categories (program execution, I/O operations,
filesystems, communication, error detection and recovery, protection and
security, accounting) inside the kernel, which sits above the hardware]
Each of the boxes inside the kernel region represents a different service
category. The box labeled “System calls” represents the part of the API that
programs use to request and obtain these services, whereas the box labeled
“System programs” is the set of stand-alone programs that users can run to
obtain these services.
Commands
A command is an instruction that you enter by inputting text, usually (but not
always) using a keyboard. Commands may have options and arguments fol-
lowing the command name. Options modify the behavior of the command,
whereas arguments are the command’s inputs. For example:
$ firefox
If you give it the -P myprofile option, it starts up with the user profile
named myprofile. If you enter just
$ firefox -P
it displays a dialog asking you to pick a profile from a list. The profile name
is a nonrequired argument to -P.
The rules for giving option arguments are:
• The argument to a short option follows it immediately, possibly with
intervening space or TAB characters, as in -ohello or -o hello. The
one exception is that nonrequired arguments can’t have space be-
fore them.
• The argument to a long option follows the = operator without
intervening space, as in --date='Jan 01,1970'.
The typical command consists of the command name followed by op-
tions and then arguments, but some commands allow the options and argu-
ments to be intermixed. For example:
$ gcc -g myprog.c
$ gcc myprog.c -g
Shells
The word shell is the Unix term for a particular type of command line in-
terpreter. Command line interpreters have been provided with operating
systems since their inception. Early mainframes and personal computer op-
erating systems required people to interact with them exclusively through
a command line interpreter. DOS, for example, provided a command line
interpreter, which became the basis for the Microsoft Command window,
which was simply a DOS emulator.
A command line interpreter presents a prompt of some kind, indicating
that it’s waiting for you to enter a command. At the prompt, you type a com-
mand and press ENTER, causing the command to be executed, after which
the prompt reappears:
$ hostname
harpo
$
If you enter the hostname command, it shows the name of the computer
on which you’re working. Here it printed harpo, the name of my computer,
and redisplayed the prompt. The shell continues to run until you give it a
command to terminate itself, such as exit.
In Unix, a shell is not just a command line interpreter; it’s also a pro-
gramming language interpreter. You can use it to define variables, evaluate
expressions, perform I/O, use conditional control-of-flow statements such as
loops and branching statements, define and call functions, and much more.
In short, it has most of the features of a high-level programming language
such as C. You can save a sequence of shell commands into a file to be exe-
cuted at another time. Such a file is called a shell script. You can arrange for
the shell to execute these shell scripts in a few different ways.
Most shells also implement various frequently used commands as func-
tions inside the shell itself, which are called shell builtins or just builtins. Build-
ing a command directly into the shell speeds up its execution because calling
a function takes much less time than starting a separate program, which
requires kernel intervention.
In a typical Unix system, you can choose which shell you’d like to use
from among several different shells, depending on your preferences.
The oldest of the most commonly distributed shells, which was part of
Seventh Edition UNIX (released in 1979 by Bell Labs), is known as the Bourne
shell, so named because it was written by Stephen Bourne [3]. The name of
the shell program was sh, which is what you had to enter to run it. It was the
first extension to the original UNIX shell, written by Ken Thompson. The
Bourne shell is important because it is always part of any Unix distribution
and many administrative scripts are written in it, requiring that it’s installed.
Some commands will fail if it isn’t found on the system.
Other common shells that have been around a long time include the C
shell (csh) and the Korn shell (ksh).
However, the most commonly used shell in GNU/Linux systems is the
Bourne Again SHell, whose program name is bash, and that is the shell we’ll
use in this book. The GNU Project created bash by extending the Bourne
shell with features from the Korn shell and the C shell (https://www.gnu.org/
software/bash/).
The traditional method of authentication in Unix gives every user a
unique username and an associated unique, nonnegative integer user ID,
or UID for short. The username is the name a person enters to log in
to the system. Each user also has an associated password. Unix uses the
username/password pair to authenticate a user attempting to log in. If the
username does not exist or the password doesn’t match it, the system rejects
the user. System files store passwords not as plaintext but in a hashed (one-way encrypted) form.
LOGGING IN
To log in to a system is to log into it. One of the dictionary meanings of the verb
to log that existed long before computers did is to record something in a log-
book, as a sea captain or airplane pilot does. The term login conveys the idea
that the action is being recorded in a logbook. In Unix, logins are recorded in
a file that acts like a logbook. The system maintains a list of names of users who
are allowed to log in. We take this term for granted. We use the noun login
as a single word only because it has become a single word on millions of login
screens around the world. To log in, as a verb, really means to log into some-
thing; it requires an indirect object.
To be precise, in modern Unix systems, a user is any entity that can run
programs and own files. This entity need not be an actual person. For various
reasons, the definition of a user was generalized to allow abstract entities
and programs to be users as well. For example, root, syslog, and lp are
each nonperson users.
A group is a set of users. Just as each user has a username and user ID,
each group has a unique group name and an associated unique, nonnegative
integer group ID, or GID for short. Unix uses groups to provide a means
of resource sharing. For example, a file can be associated with a group, and
all users in that group would have the same access rights to that file. Since
a program is just an executable file, the same is true of programs; an exe-
cutable program can be associated with a group so that all members of that
group will have the same right to run that program.
Every user belongs to at least one group, called the user’s primary group.
You can use the id command to print your username and user ID and the
group name and group ID of all groups to which you belong:
$ id
uid=500(stewart) gid=500(stewart)
groups=500(stewart),4(adm),24(cdrom),27(sudo)
In fact, you can supply id with any username, and it will list their information:
$ id syslog
uid=102(syslog) gid=106(syslog) groups=106(syslog),4(adm),5(tty)
Alternatively, you can use the groups command to print a list of groups to
which you (or another user) belong:
$ groups
stewart adm cdrom sudo
$ groups syslog
syslog : syslog adm tty
Environments
When a program is run in Unix, one of the steps that the kernel takes prior
to running the program is to make available to it an array of name-value pairs
called the environment list, or simply the environment. Each name-value pair in
this list is a string of the form name=value, where value is a null-terminated C
string and there are no spaces around the = character. The name is called an
environment variable and name=value is called an environment string. For example
LOGNAME=stewart
is an environment string that specifies that the variable named LOGNAME has
the value stewart. Variable names are not allowed to contain the = character,
but otherwise they have no restrictions. However, for portability of any pro-
grams that use these variables, and by convention, they should contain only
uppercase letters, digits, and underscores and should not begin with a digit
(see The Open Group Base Specifications, Issue 7, 2018, Chapter 8 [14]).
In this example
COLUMNS=80
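#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *shell = getenv("SHELL");   /* NULL if SHELL is not set */
    printf("The current shell is %s.\n", shell ? shell : "(not set)");
    return 0;
}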
The program needs to include the stdio.h header file because it calls the
printf() function and the stdlib.h header file because it calls getenv(), which is
declared in that header. We compile it and run it as follows:
This is a sneak preview of how we compile code using the GNU gcc com-
piler. We give gcc the name of the source code file, getenv_demo.c, and the
option -o getenv_demo to store the output of the compiler in the executable
file named getenv_demo. Without that option, it would store the executable
in a file named a.out. In the next chapter we’ll explain thoroughly the pro-
cess of building executable code.
Files
For most people who use computers, files are simply objects that store in-
formation. These objects usually reside on nonvolatile storage devices, which
are storage devices that retain data even when power is not applied to them,
such as magnetic tapes and magnetic, optical, and electronic disks. (In con-
trast, volatile storage, such as main memory, does not retain data when it is
powered off.) These nonvolatile storage devices are called secondary storage
devices or external storage devices, even though they might appear to you to
be “inside” the computer. The nomenclature is a historical artifact.
In many non-Unix systems, the operating system recognizes different
types of files, each having its own specific structure, such as word processor
documents, image files, or spreadsheets. In fact, in those systems, files often
have names or extensions that can be used to infer their structure or even
cause a specific program to load them.
In Unix, however, the story is very different. From the kernel’s viewpoint,
an ordinary file is just an object that contains a linear sequence of bytes.
It does not impose any structure on the contents of this kind of file; any
structure that it might have is given to it by the user or program that creates
it. These files are called regular or plain files. Some of these files are what
we commonly call text files because when we open them we see plaintext.
These files contain sequences of characters with lines demarcated by newline
characters; programs that are designed to display them use the embedded
newline characters to create the line structure on the screen. Binary files, in
contrast, are files that contain byte sequences that are not necessarily text
characters, such as a program’s executable code.
File Types
The Unix kernel does define a small set of file types other than these regular
files:
• Directories
• Device files
• Pipes
• Sockets
• Symbolic links
Directories are described in “Directories” on page 19. Device files, pipes,
and sockets are collectively called special files. Special files are an unusual fea-
ture of the Unix system of files. They were invented to provide a method of
programming I/O in a device-independent way. Chapters 6 and 13 cover device
files, pipes, and device-independent I/O. Sockets are a type of special file
that allows processes to communicate with each other, and they’re primarily
used in network communication. Because they are a complex topic that
can fill a book by themselves, I don’t cover them in this book. I define and
discuss symbolic links in “Symbolic Links” on page 24.
Directories
A directory, often called a folder in other operating systems, is a type of file
that, from the user’s perspective, appears to contain other files. We tend to
visualize them as shown in Figure 1-4.
This is only an illusion; directories don’t contain files any more than the
table of contents contains the chapters of the book. What then is a directory?
To be precise, a directory is a file that contains a table of directory entries,
which are properly called links. A link is an object that associates a filename
with an actual file. It has two components: the filename and a reference to a
file’s inode. The links may reference any type of file, including directories,
implying that directories can be members of directories. However, a link isn’t
allowed to refer to a file that’s on a different device from the directory itself.
Directories are never empty because every directory contains two links,
named . (dot) and .. (dot-dot). These entries have a predefined meaning:
. is a link to the directory itself, and .. is a link to the directory containing
this directory, which is called the parent directory. Figure 1-5 shows what the
actual directory table for the directory named jammy in Figure 1-4 looks like.
jammy directory
Reference to file    Name
53                   .
2                    ..
12                   kernel
185                  drivers
282                  README
$ ls
chapters/ fonts/ images/ main.tex main.bib
$ ls chapters images
chapters:
appendix_a.tex chapter_02.tex chapter_05.tex preface.tex
back_matter.tex chapter_03.tex front_matter.tex
chapter_01.tex chapter_04.tex intro.tex
images:
chapter_01/ chapter_2/ chapter_3/ chapter_4/ chapter_5/
Notice that each directory’s name appears first, followed by the files that are
in that directory. The number of columns that ls uses is based on how many
names the directory has and their lengths.
We can change the current working directory with the cd command:
$ cd chapters
$ ls
appendix_a.tex chapter_02.tex chapter_05.tex preface.tex
back_matter.tex chapter_03.tex front_matter.tex
chapter_01.tex chapter_04.tex intro.tex
Notice that now the ls command displays the contents of the new working
directory, which is chapters. We can return to the previous directory via the
.. link:
$ cd ..
$ ls
chapters/ fonts/ images/ main.tex main.bib
The output of ls shows that the working directory is once again the parent
of chapters, since the list of filenames is the same as it was before we changed
directory to chapters.
Filenames
Files and filenames, as noted earlier, are different things. A filename is a
string that names a file. It is part of the link contained inside a directory. A
single nondirectory file may have names in different directories (on the same
logical device) and can therefore appear to be a member of many directories.
However, files exist independently of the directories in which they appear. If
the same file has names in different directories, the references associated to
those names in the links all point to the exact same inode, namely the unique
inode for that file. It’s like a person traveling with several passports. The
passports might have different names for the person and be used in different
countries, but they each represent the same person. Figure 1-6 illustrates
this idea.
In this figure, one file is known by three different names, each stored in a
link in a different directory.
Filenames are allowed to be quite long. The maximum number of bytes in a
filename is defined by a system-dependent constant NAME_MAX, which is usually
255. Filenames can contain almost any character except a forward slash (/)
and the NULL character (\0), but you shouldn’t use
certain characters in filenames even if they’re allowed. For example, a file-
name can have spaces and newlines, but if it does, you’ll usually need to put
quotes around the name to use it as an argument to commands. Certain
characters, such as $, &, *, and others, have a distinct meaning to various
programs and must be escaped by preceding them with a backslash if they’re
used in those contexts, so it’s best to avoid them. The convention is to use
only alphanumeric characters, the underscore, and the hyphen in filenames.
Unix is case-sensitive, such that source and Source would be treated as two dif-
ferent filenames.
Unlike most other operating systems, Unix doesn’t use filename exten-
sions for any purpose, although user-level software such as compilers and
word processors might use them as guides. Desktop environments such as
GNOME and KDE can create associations based on filename extensions in
much the same way that Windows and macOS do, but Unix itself doesn’t
have a notion of file type based on content, and it provides the same set of
operations for all files, regardless of their type. In Unix, we use the word
suffix for the part of a filename after a period, such as the c in myprog.c.
don’t depend on where they are stored, whereas bootloader files are specific
to a given machine and aren’t shareable.
Variable files are files whose contents can change during normal operation,
whereas static files are those whose contents don’t change without an
administrator’s intervention. Static files include, for example, executable
binaries, libraries, documentation files, and other files that don’t normally
change in the day-to-day operation of the computer. In modern Unix systems,
the shareability and variability of files are factors in deciding which ones are
in which parts of the hierarchy. Files that differ in either of these attributes
are placed into different directories, which makes it easy to store files with
different usage characteristics on different filesystems and also makes backing
up easier. For example, the /etc directory is unshareable—it contains files
specific to the particular computer—and it’s static because its contents are
configuration files that are modified only when updates are applied, new
software is installed, or the superuser decides to change configurations. The /var
directory is so named because it is variable. It contains many different types
of logfiles that the kernel and applications update on a regular basis. Some of
its subdirectories, such as /var/mail, may be shareable, whereas others such
as /var/log may be unshareable. The /usr directory is shareable and static. It
contains application binaries, libraries, and static data.
Symbolic Links
An ordinary link is a directory entry that points to the inode for a file, but
a symbolic link is a file whose contents are just the name of another file. The
file to which the link points is called the target of the link. The inode for a
symbolic link identifies that file as a symbolic link. A symbolic link is
similar to a shortcut in the Windows operating system. Symbolic links are often
called soft links in
contrast to ordinary links, which are called hard links.
Usually, when commands, programs, and the kernel itself are given a symbolic
link where a filename is expected, they operate on the target of the link, not
the link itself. They can easily see that the file is a symbolic link
because the inode indicates it. We say that a link is dereferenced or is followed
when the link is opened to access its target.
Symbolic links pose hazards for the operating system and applications
because of the possibility of circular references and infinite loops. The dan-
ger is that a symbolic link can point to a directory, which means that if a
program follows symbolic links, it might return to a directory that it already
visited and end up in a cycle. Chapters 6 and 7 address issues related to sym-
bolic links in more detail.
Pathnames
A pathname is a character string that identifies a file. There are two types of
pathnames: absolute and relative. An absolute pathname starts at the root
of the directory hierarchy and starts with a leading forward slash, /. Zero
or more filenames separated by slashes follow that leading slash, such as
/data/jammy/kernel/sched/sched.h. All filenames except the last must be di-
rectory names or symbolic links whose targets are directory names. Each of
the names in the example pathname except sched.h is a directory. The last
name in the path may be any type of file. Other examples of absolute path-
names are /usr/bin/, /usr/local/share/man, and /home/stewart/unixbook/figures/
figure01.png.
Terminating a pathname with a slash is acceptable if the last filename in
it is a directory, as in the pathname /usr/bin/.
If you accidentally insert more than one slash between the names in the
path, the extra slashes are ignored. The two absolute pathnames
/usr/local/share/man and /usr/local///share/man refer to the same file.
If a pathname doesn’t start with a leading slash, it’s called a relative
pathname. A relative pathname starts in the current working directory, which
we can now accurately define. The current working directory (also called the
present working directory) is the directory that any running program uses to re-
solve pathnames that do not begin with a /. For example, if the current work-
ing directory is /home/stewart/unix_book, the pathname chapters/chapter_01
refers to a file whose absolute pathname is /home/stewart/unix_book/chapters/
chapter_01.
The environment variable PWD contains the absolute pathname of the
current working directory. The pwd command prints the value of PWD:
$ pwd
/home/stewart/unix_book
$ printenv PWD
/home/stewart/unix_book
Pathnames can become very long if they contain symbolic links, and
Unix systems limit their length, expressed in bytes. POSIX.1-2024 speci-
fies that the constant PATH_MAX is the maximum number of bytes allowed in
a pathname, including the terminating NULL byte. On many Linux systems,
it is 4096 bytes.
Processes
People (and sometimes programs) write programs. Programs are sequences
of instructions to the computer, written in a programming language. The
language might be a high-level one, such as C or C++, or it might be a low-
level one, such as an assembly language. In general, programs can’t be exe-
cuted in the form in which they’re written; they must be translated into an
executable form. The exceptions to this are programs written in scripting
languages, such as JavaScript, PHP, and BASIC. These aren’t translated into
an executable; an interpreter program reads the source code directly and
executes its instructions one after another.
We call the first form of a program the source code and the second form
the executable code or, simply, the executable. For example, the source file
hello_world.c from Listing 1-1 is a human-readable text file. You can use the
GNU C compiler to build an executable from it named hello_world with the
following command:
$ gcc hello_world.c -o hello_world
The file hello_world will be an executable file residing, by default, in the same
directory as hello_world.c. You can’t use ordinary text editors to see or mod-
ify the contents of this file because it’s not plaintext; it’s a binary file.
Perhaps surprisingly, even running a program is a complex procedure
(we’ll cover the details in Chapter 11). The executable form of most programs
isn’t something we can actually run. We can’t just load it into memory and tell
the machine to start running that file from its first byte. That file is usually
a conglomeration of executable code, various tables, and instructions to a
linker/loader. When you enter the command
$ ./hello_world
a sequence of actions takes place that causes a linker/loader to use the in-
formation in that hello_world executable to load the file, as well as any shared
objects that it needs, into memory, prepare the program for execution, and
run that program.
Many users can run a single program at the same time on a given ma-
chine, or a single user can run one multiple times in different terminal win-
dows. Either way, it means that one executable can have many running
instances, which is what leads us to distinguish between programs and pro-
cesses. A process is an instance of a running program. Each separate instance
is a different process, although each and every one of them is executing the
exact same executable file.
This formal definition of a process doesn’t really tell you what a process
is in concrete terms, even though it’s the one you’ll likely see in an operat-
ing systems textbook. It’s like defining a baseball game as an instance of the
implementation of the set of rules created by Alexander Cartwright in 1845
by which two teams compete against each other on a playing field. Neither
definition gives you a mental picture of what’s being defined. Let’s make it
more concrete.
When a program is run on a computer, it uses resources such as pri-
mary memory and secondary storage space; kernel memory (kernel space)
for mappings and tables of various kinds, such as a table of which parts of
primary memory it uses; privileges, such as the right to read or write certain
files or devices; and much, much more. As a result, at any moment of time,
a process is associated with the collection of all resources allocated to that
instance of the running program, as well as any other properties and settings
that characterize that instance, such as the values of the processor’s regis-
ters. Thus, although the idea of a process sounds like an abstract idea, it is,
in fact, a very concrete thing, and an operating system must manage it.
Unix systems assign to each process a unique nonnegative integer called
its process identifier, or PID for short. We can learn a bit about processes us-
ing the ps command, which can display a list of running processes, as well as
selected information about each of them. It has various options to control
which processes it displays and what information it outputs. In its simplest
form, with no options, we can use it to see the PIDs of our own running
processes:
$ ps
PID TTY TIME CMD
10278 pts/0 00:00:00 bash
11087 pts/0 00:00:00 ps
This lists two processes: one running bash and the other running the ps
command itself. They use so little time that it shows up as zeros, and their
respective PIDs are 10278 and 11087. They’re both running in a terminal
whose device name is pts/0.
At the programming language level, we can call the getpid() function
to obtain the PID of the process that invokes it. We demonstrate this in the
getpid_demo.c program:
All this program does is print its own PID, but it illustrates how to use getpid().
The program includes the header file <unistd.h> because the getpid() function,
called inside the argument list of printf(), is a system call, and almost
all system call declarations are in <unistd.h>. This is our first program to
make a system call.
The return value of getpid() is the PID of the process that calls it. Because
PIDs are integers, we use the %d format specification in the format string of
printf() to print the return value as a decimal numeral. Assuming
that getpid_demo.c is in our working directory, we can compile and run it with
these commands:
$ gcc getpid_demo.c -o getpid_demo
$ ./getpid_demo
If we were to run this same program again, it would print a different PID,
proving a new process is created whenever it is run.
Threads
The programs that we’ve described so far in this chapter are assumed to
have a single thread of control. A thread of control is a single sequence of
instructions that’s executed one instruction at a time, one after the other,
during the execution of a program. Originally, all programs had a single
thread of control. As the cost of computer processors became smaller and
smaller, hardware vendors started building computers containing multiple
processors, and computer scientists sought ways to take advantage of this
new technology. They designed and created programming languages and
libraries that would allow a program to contain more than one thread of
control, each of which could run on a separate processor simultaneously.
These threads of control were named threads for simplicity.
POSIX.1-2024 formally defines a thread as a single flow of control through
a process together with the required system resources to support a flow of
control [14].
The traditional Unix process is a single thread, but in modern operating
systems, processes in general can have multiple threads. When a process has
multiple threads, it’s called a multithreaded process. A multithreaded process
has two types of resources: those that are shared among all of its threads,
which are generally called global or shared, and those that are unique to
each thread, commonly called either thread local, private, or per-thread. In
Chapter 15, we detail exactly which process resources are shared and which
are thread local.
Unix systems in general support multithreading, and Linux in particu-
lar supports several different types of threads. Linux handles threads in an
interesting way; it treats all threads as standard processes. It doesn’t provide
any special scheduling or data structures for threads. To the Linux kernel,
processes and threads are both called tasks and are both represented inter-
nally by the same data structure, called a task_struct [4]. In Linux, a task is
an entity that’s assigned system resources and can be scheduled on a proces-
sor. The difference between threads and ordinary processes in Linux is that
threads can share resources, such as their address space, whereas processes
don’t share any resources.
In many Unix implementations, a thread has a thread identifier (TID) that
is unique in the operating system, but POSIX.1-2024 doesn’t require this. It
requires only that within a single process, each thread’s TID is unique. Linux
handles TIDs with a two-pronged approach: In a single-threaded process, the
TID is equal to the process ID, whereas in a multithreaded process, all threads
have the same PID, but each one has a unique TID. In Linux, a thread can
call the gettid() function to obtain its thread ID. The gettid_demo.c program
demonstrates this idea:
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    printf("I am a thread with thread ID %d\n", gettid());
}
The program uses the C preprocessor #define directive to define the sym-
bol _GNU_SOURCE. Unless this symbol is defined, the compiler won’t see the
various declarations in the header files that are needed for the program to
call gettid(). This is an example of a feature test macro, which is explained in
“Portability” in Chapter 2. The #define directive must appear before all
include directives. We can compile and run it as shown in the following sample
session:
$ gcc gettid_demo.c -o gettid_demo
$ ./gettid_demo
If we run this program again, it too will display a different TID each time for
the same reasons as before: A new process runs, and its TID is the same as
its PID when it has one thread.
Online Documentation
Unix systems provide several different types of online documentation. In
this context, online means on the computer that you are using, not on the
World Wide Web.
The man pages are an important part of Unix documentation. They act
as an online reference when you want to learn about any part of the Unix
system, such as a command, a function from one of the libraries, a system
call, a device interface, a system file, various file formats, and much more.
Although the documentation is very thorough and detailed, it’s usually not
tutorial in nature. It can be overwhelming sometimes, but many pages have
code examples that you can compile, modify, and run.
Over the years in which I taught Unix system programming, students
would sometimes say that they didn’t need to learn how to use the man
pages because all that information is on the web and they just had to google
it. It’s true that you can find copies of the man pages on many websites and
read posts on discussion boards, but the reasons for reading the man pages
on your own Unix installation go beyond this:
• The versions of the man pages on your system were installed at the
time that the software they document was installed, and they are
updated whenever you update the software itself and the software
has updates to apply to them.
• Man pages are written by the people who wrote and maintain the
software and are trustworthy and accurate.
• The man pages on your system are self-contained in the sense that
any cross references they make are also on your system.
• You can read them even if your internet connection isn’t available.
To view the man page for a given topic, enter man followed by the topic in
which you’re interested, meaning the command name, function name, and
so on. For example, enter man man to read the man page for the man command
itself:
$ man man
MAN(1) Manual pager utils MAN(1)
NAME
man - an interface to the system reference manuals
--snip--
The output is just the first few lines of that page. The first line shows
that the man command is in Section 1 of the man pages because the title con-
tains MAN(1). The text Manual pager utils is not the name of Section 1; we’ll
call it the man page header or the header when the meaning is clear. Different
man pages in Section 1 may have different headers. After the word NAME is
the name of the command followed by a very brief description of what the
command does. This is the very first man page you should read, and we’ll
revisit it shortly.
All POSIX-conforming Unix systems are required to contain man pages
for all of the header files that might be included by a function in the kernel’s
API. To put it more precisely, each function in the System Interfaces volume
of POSIX.1-2024 specifies the headers that an application must include to
use that function, and a POSIX-conforming system must have a man page
for each of those headers. They may not be installed on the system you’re
using, but they’re available. They’re installed only if the system administra-
tor installed the application development files.
The man page for the scanf() function starts with the following lines:
SYNOPSIS
#include <stdio.h>
--snip--
It tells us that we need the header file stdio.h to use scanf(). We can enter
man stdio.h to read about that header file, which outputs the following:
PROLOG
This manual page is part of the POSIX Programmer's Manual. The Linux
implementation of this interface may differ (consult the corresponding
Linux manual page for details of Linux behavior), or the interface may
not be implemented on Linux.
NAME
stdio.h - standard buffered input/output
--snip--
Notice that this man page is in a section whose number is 7posix. On your
system, this page might be in a different section, such as Section 0.
One challenge with using the man pages is that you need to know the
name of the command or function in which you’re interested for them to be
of help. The man pages do have a relatively simple search mechanism, but
they are really intended as a reference manual for people who already have a
sense of what it is they need to look up, so if you know what you want to do
but don’t know the command name, the challenge is how to find it.
The man pages play a key role in helping you solve problems on your
own. My method of teaching how to write system programs is based on using
the man pages to guide the learning process. They’re inextricably linked
to learning system programming in this book, so I’ve included a separate
section, “Using the Manual Pages” on page 34, that explains their structure
and how to use them in greater depth, including the syntax they use for
specifying options and arguments.
$ info ls
Next: dir invocation, Up: Directory listing
The 'ls' program lists information about files (of any type, including
directories). Options and file arguments can be intermixed arbitrarily,
as usual.
--snip--
When there isn’t a page for a particular topic in the Info system, the Info
reader opens up the man page for that topic instead.
The Info pages use a method of navigation similar to the one in Emacs,
which people often find hard to use. There’s a method of reading an Info
document and bypassing the navigation in it by piping its output into a pager
program such as more or less, as shown here:
$ info ls | more
File: coreutils.info, Node: ls invocation, Next: dir invocation, Up: Directory
listing
The 'ls' program lists information about files (of any type, including
directories). Options and file arguments can be intermixed arbitrarily,
as usual.
--snip--
The same information is displayed, but it also mentions the file in which it’s
contained: coreutils.info. We’ll explain how this works and what pagers are in
“The Pager” on page 34.
Application-Provided Documentation
Sometimes you can also find information about a particular application or
program in one of the directories in /usr/share/doc. Many applications and
higher-level program installers place their documentation there. This doc-
umentation sometimes includes extensive usage examples, development
notes, and hints on where to find further information.
Some commands have a means of displaying their own help, usually by
providing an option such as --help:
$ ls --help
Usage: ls [OPTION]... [FILE]...
List information about the FILEs (the current directory by default).
Sort entries alphabetically if none of -cftuvSUX nor --sort is specified.
--snip--
Shell Help
Certain shells have a help feature for commands that are built into the shell. In
particular, bash has a help command, which when entered without arguments
prints a two-column list of all bash builtins with options and arguments listed:
$ help
GNU bash, version 5.1.16(1)-release (x86_64-pc-linux-gnu)
These shell commands are defined internally. Type `help' to see this list.
Type `help name' to find out more about the function `name'.
Use `info bash' to find out more about the shell in general.
Use `man -k' or `info' to find out more about commands not in this list.
When given the name of a particular bash builtin, it prints a short sum-
mary of how to use that command:
$ help pwd
pwd: pwd [-LP]
Print the name of the current working directory.
Options:
-L print the value of $PWD if it names the current working directory
-P print the physical directory, without any symbolic links
Exit Status:
Returns 0 unless an invalid option is given or the current directory
cannot be read.
$
The help command uses the same syntax as the man pages.
The Pager
A pager is a program that displays its input one screen at a time. The man
pages are stored in a compressed format in the directory hierarchy. The man
command decompresses and formats them and then displays them with its
pager. The default pager is actually named pager, but it’s usually a symbolic
link to the less command. Therefore, when you view a page, you’ll most likely
be using less. The : at the bottom of the screen is followed by your cursor
because the : is the less command’s prompt for you to type something on the
keyboard. You can change the pager that man uses by changing the value of
the PAGER environment variable. The following list describes some of the basic
navigation controls when you use the default pager:
• To see the next screen, press SPACEBAR or enter f (for forward).
• To go back one screen, enter b (for backward).
• To stop reading, enter q for quit.
• To go to line N, enter NG. If you just enter G, you’ll go to the bottom
of the page.
• To search forward for keyword, enter /keyword. Enter n to find the
next occurrence downward, or enter N to search upward.
• To search backward for keyword, enter ?keyword. Enter n to find the
next occurrence upward, or enter N to search downward.
To see the list of all possible navigation commands, read the man page
for the pager, which also describes the patterns with wildcards that both of
the search commands accept.
The Structure of Man Pages
Entering man followed by the name of any command or topic that has a man
page displays that man page. We saw earlier that the man command has a
page for itself as well. We’re about to study that page, but before we do, let’s
take a look at a couple of other, simpler pages.
Since we’ve already seen the echo command in the Introduction, let’s
start with that. If you want to learn more about how to use echo, you’d enter
man echo and you’d see several screens of output, beginning with:
NAME
echo - display a line of text
SYNOPSIS
echo [SHORT-OPTION]... [STRING]...
echo LONG-OPTION
DESCRIPTION
Echo the STRING(s) to standard output.
The top of the page often has everything you need to know, such as what
options are available and whether there are multiple forms of the command.
Sometimes the name of the man page that man displays is different from the
name of the command that you entered as an argument. For example, enter-
ing man view produces the following output:
NAME
vim - Vi IMproved, a programmer's text editor
SYNOPSIS
vim [options] [file ..]
vim [options] -
--snip--
ex
view
gvim gview evim eview
rvim rview rgvim rgview
--snip--
This is the page for vim, but the view command is listed on that page. Some-
times a single man page provides information about related commands.
Notice too that instead of the title User Commands, this page’s title is General
Commands Manual. People who write man pages follow a standard, but that
standard allows some variation, such as in the title of the page.
The sections of a man page are somewhat standardized. A few sections
are required, but most sections are optional. The following list shows some
common section names and describes their contents:
NAME The name of this manual page
SYNOPSIS A brief summary of the command’s or function’s interface
DESCRIPTION An explanation of what the program, function, or
format does
OPTIONS For commands only; a description of the command line op-
tions accepted by a program and how they change its behavior
USAGE For commands; a more thorough description of the use of the
command
ENVIRONMENT VARIABLES A list of all environment variables that affect the
command or function and how they affect it
EXIT STATUS For commands; a list of exit values returned by the
command
RETURN VALUE For functions; a list of the values the function will return
to the caller and the conditions that cause these values to be returned
ERRORS For functions; a list of the values that may be placed in the static
variable errno in the event of an error, along with information about the
cause of the errors
FILES A list of the files used by the command or function, and files that
might be modified
ATTRIBUTES Architectures on which it runs, availability, code indepen-
dence, and so on
VERSIONS A brief summary of the kernel or library versions where a
function appeared or changed significantly in its operation
CONFORMING TO The standards to which the implementation conforms
BUGS A list of limitations, known defects or inconveniences, and other
questionable activities
EXAMPLES If present, examples of how to use the command or function
AUTHORS A list of authors of the documentation or program
SEE ALSO A list of commands related to this command
NOTES General comments that do not fit elsewhere
NOTE It’s unfortunate nomenclature that the word section is used in two different ways.
Do not confuse the sections of a man page with the sections of the manual.
The most important sections to study when reading a man page for the
first time are NAME, SYNOPSIS, DESCRIPTION, and SEE ALSO, and if you’re reading
about a command, check the OPTIONS section also. The SYNOPSIS section con-
tains a brief summary of the command or function’s interface. If there’s an
EXAMPLES section, I often look at it right after reading the SYNOPSIS, which is
usually my first stop on the page. The examples typically include programs
you can copy and run or commands that you can try out.
The SYNOPSIS section for commands shows the command’s syntax, in-
cluding all arguments and options. Square brackets ([ ]) surround optional
elements, a vertical bar (|) (sometimes called an alternation operator) sepa-
rates choices among elements, angle brackets (< >) surround placeholders,
and an ellipsis (...) represents elements that can be repeated. When multi-
ple option letters are enclosed in square brackets, such as in [-aHvW], all of
them can be given together. If it were written as [-a | -H | -v | -W], only
one of the choices would be allowed. To illustrate, the git command, which
is a version control program, has the following complex synopsis:
For functions, the SYNOPSIS shows any required data declarations or #include
directives, followed by the function declaration. If there are feature test macro
requirements, which we cover in “Feature Test Macros” on page 67, these are
described as well. When you read about a function, you must read the ERRORS
and RETURN VALUE sections; they tell you what possible errors the function re-
ports, what values it can return, and how you need to handle them.
For learning how to use commands and functions, the man page by it-
self is usually sufficient. To understand how a command interacts with the
operating system or how it might be implemented, we’ll need to do more re-
search. In Chapter 3, we’ll go through an exercise that shows how to use the
man pages in more detail.
Searching Through the Man Pages
The man command has a number of options for performing searches. Let’s
look at the beginning of the man page for man:
NAME
man - an interface to the system reference manuals
SYNOPSIS
man [man options] [[section] page ...] ...
man -k [apropos options] regexp ...
man -K [man options] [section] term ...
man -f [whatis options] page ...
man -l [man options] file ...
man -w|-W [man options] page ...
DESCRIPTION
man is the system's manual pager. Each page argument given to man is
normally the name of a program, utility or function. The manual page
associated with each of these arguments is then found and displayed. A
section, if provided, will direct man to look only in that section of
the manual. The default action is to search in all of the available
sections following a pre-defined order (see DEFAULTS), and to show only
the first page found, even if page exists in several sections.
--snip--
You may not see all of the options that appear here. The POSIX.1-2024 stan-
dard (https://pubs.opengroup.org/onlinepubs/9699919799/utilities/man.html)
requires only the -k option, but most implementations provide more. The
output shown in this example is from the most recent version of the man
page from the Linux man-pages Project (https://www.kernel.org/doc/man-pages/),
which provides and standardizes man pages separately from the
POSIX.1-2024 standard. A number of Linux distributions, including Debian,
Fedora, Gentoo, openSUSE, and Ubuntu, as well as macOS and a few proprietary
Unix systems, conform to this latter standard. (See https://man-db.gitlab.io/man-db/
for an alternative set of man pages that can be installed on other systems.)
The most important options for us are -k and -K, which allow us to search
through the man pages for keywords. If you read further in the man page,
you’ll see the following example:
man -k printf
Search the short descriptions and manual page names for the
keyword printf as regular expression. Print out any matches.
Equivalent to apropos printf.
Further down the page, you’ll see a description of what this option and -K do:
-k, --apropos
Equivalent to apropos. Search the short manual page descriptions
for keywords and display any matches. See apropos(1) for details.
-K, --global-apropos
Search for text in all manual pages. This is a brute-force
search, and is likely to take some time; if you can, you should
specify a section to reduce the number of pages that need to be
searched. Search terms may be simple strings (the default), or
regular expressions if the --regex option is used.
The -k option allows us to search through all man pages to find those short
descriptions that match the word we give it. The short description is the NAME
section and its one-line description. The -K option searches the entire page,
not just the short description, for a match. We are warned that this is slow,
but we may occasionally find use for it.
The page also suggests that we should read about the apropos command.
If we look at its man page, we find exactly what we need:
$ man apropos
APROPOS(1) Manual pager utils APROPOS(1)
NAME
apropos - search the manual page names and descriptions
SYNOPSIS
apropos [-dalv?V] [-e|-w|-r] [-s list] [-m system[,...]] [-M path] [-L
locale] [-C file] keyword ...
DESCRIPTION
Each manual page has a short description available within it. apropos
searches the descriptions for instances of keyword.
We can use apropos for searching. If we give it the -r option, we can supply a
regular expression, which is a particular type of pattern, or we can give it -w
and use a different kind of pattern called wildcards, which are patterns used
for matching filenames. If we give it the -e option, it will match the keyword
exactly.
If we read more in this page, we’ll see that by default, matching is case-
insensitive. Also by default, apropos searches through all sections (volumes)
of the manual, but we can limit searches to specific sections with the -s op-
tion. The -a option forces the match to return only those pages that match
all of the search terms rather than any of the search terms. A few examples
will demonstrate:
$ apropos case
$ apropos Case
Both of these match any line containing the word case, case insensitively.
Matches can include lines that contain words that have case as a sub-
string, such as lowercase, case-insensitive, and so on, and the search will check
all sections. Here are two examples that clarify this:
The first command limits the search to Sections 2 and 3 and matches de-
scriptions with any words containing file, such as filename, FileProducer, and
so on. The second matches only lines that have the exact word file, so it ex-
cludes filename, FileProducer, and so on.
NOTE The apropos command may be implemented differently on your system than what I
describe here. The options may have slightly different usage. For example, in Ubuntu
Linux, the option -s3 searches through Sections 3, 3posix, 3perl, and so on. On your
system, you may have to specify all sections explicitly. You should base your use of it
on what your system’s apropos man page states.
Consider this example:
This command matches all pages whose short descriptions contain the two
words convert and case, not necessarily next to each other, such as convert
lowercase.
The following command matches just those lines containing a word
starting with case or a word in which case is part of a hyphenated word such
as case-sensitive:
Unix History and Standards
Finally, the number of UNIX installations has grown to 10,
with more expected.
—Ken Thompson and Dennis Ritchie, UNIX Programmer’s Manual,
2nd edition, 1972
Why should you learn anything about the history of Unix if all that you care
about is how to write system programs? The most compelling answer is that
Unix’s complex, haphazard history is the cause of its lack of a single stan-
dard and the consequent need to read documentation very carefully to de-
cide whether your code will be portable or even be able to run on your own
system. By knowing something about its history you’ll see that certain fea-
tures originated in different Unix distributions and are sometimes incompat-
ible and that some are fusions of ideas from different branches of the Unix
family tree.
Unix has a colorful history filled with many stories [36]. Many articles,
websites, and books describe that history in great detail, and at the end of
this section I include references to several of them. Here, I describe the ma-
jor milestones on the path from its birth as an experimental platform for
Ken Thompson’s “Space Travel” game through the present.
Early Branches
The University of California at Berkeley (UCB) was one of the universities
that obtained a copy of V4 from AT&T, and it embarked on a mission to
add more features to the operating system, thereby starting a new fork in
its development. When Ken Thompson spent 1975 and 1976 visiting UCB,
he and the students there added even more features to their copy of Unix.
These features weren’t present in the AT&T system from which it derived.
From 1974 to 1979, UCB and AT&T worked on independent copies of
Unix. By 1978, the various versions of Unix had most of the features found
in it today, but not all in one system. In the late 1970s, legal actions began
under US antitrust legislation to break up AT&T. Under the 1982 settlement
that led to the breakup, AT&T was allowed to sell its own
brand of UNIX. AT&T then staked proprietary rights to this UNIX and sold
it commercially. AT&T’s first major commercial Unix was called System V,
released in 1983.
The versions of Unix developed at UCB were named Berkeley Software
Distributions (BSDs) and had names such as 1BSD, 2BSD, and so on. BSD
systems were released under a much more generous license than AT&T’s and
required neither a license fee nor distribution with source
code. The result was that much BSD source code was incorporated into var-
ious commercial Unix variants. By the time that 4.3BSD was written, almost
none of the original AT&T source code was left in it. FreeBSD, NetBSD, and
OpenBSD were all forks of 4.3BSD, having none of the original AT&T source
code and no right to the UNIX trademark, but much of their code found
its way into commercial Unix operating systems as well. In short, two major
versions of UNIX had emerged: those based on the BSD family and those
based on the AT&T version.
The Rise of Linux
In 1991, the picture was further complicated by the creation of a new kernel
named Linux. The Linux kernel was developed from scratch, unlike the BSD
systems, which made Linux a lot less like AT&T UNIX than BSD was. Be-
cause Linux was just a kernel, without any tools or libraries, it was bundled
together with the GNU Project software to turn it into a full-fledged operat-
ing system.
Linux was started by Linus Torvalds, who at the time was a student at
the University of Helsinki. Many of his ideas were based on the Minix operating
system written by Andrew Tanenbaum, who was a professor at Vrije
Universiteit in Amsterdam. Tanenbaum made the sources for Minix available
with copies of his book on operating systems [41]. Minix ran on Intel
386 processors but wasn’t efficient. Torvalds wanted to build a Unix kernel
to run more efficiently on the Intel 386.
Many Unixes
In 1993, AT&T divested itself of UNIX, selling it to Novell, which one year
later sold the trademark to an industry consortium known as X/Open. There
are now dozens of different Unix distributions, each with its own behavior.
There are systems such as Solaris and UnixWare that are based on SVR4, the
AT&T version released in 1989, and FreeBSD and OpenBSD based on the
UCB distributions. Systems such as Linux are hybrids, as are AIX, IRIX,
and HP-UX.
It is natural to ask what makes a system Unix. The answer is that over
the course of the past 30 years or so, standards have been developed in order
to define Unix. Operating systems can be branded as conforming to one
standard or another. In the next section, we’ll explore the various Unix
standards.
You can read more about the history of various aspects of Unix in re-
sources such as Dennis Ritchie’s telling of its history [31]; Salus and Reed’s
The Daemon, the Gnu, and the Penguin [36]; Salus’s comprehensive telling in
A Quarter Century of UNIX [35]; Brian Kernighan’s memoir, Unix: A History
and a Memoir [17]; UNIX Internals [28]; and The Design and Implementation of
the 4.4BSD Operating System [26]. You can read transcripts of interviews with
many UNIX developers in the Oral History of UNIX [23] and read the his-
tory of the GNU project at https://www.gnu.org/gnu/gnu.html. Torvalds and
Diamond published an account of Linux development [46], and Appendix A
of Open Sources: Voices from the Open Source Revolution [7] has an interesting
exchange of ideas between Torvalds and Tanenbaum germane to the design
of the Linux kernel. The bibliography also has additional references on Unix
history [20] [32] [35] [38].
Unix in particular; it’s more general than that. POSIX is a family of standards
known formally as IEEE 1003. It was also published by the International
Organization for Standardization (ISO) with the name ISO/IEC 9945:2003;
these were one and the same document.
NOTE The most recent version of POSIX as of this writing is IEEE Std 1003.1-2024, also
known as POSIX.1-2024. It is simultaneously known as the Open Group Base
Specifications Issue 8. The POSIX.1-2024 standard consolidates the major standards
preceding it, including POSIX.1 and the Single UNIX Specification, Version
4 (SUSv4).
The spirit of POSIX is to define a Unix system, as is stated in the introduc-
tion to the specification (http://pubs.opengroup.org/onlinepubs/9799919799/):
POSIX.1-2024 defines a standard operating system interface and
environment, including a command interpreter (or “shell”), and
common utility programs to support applications portability at
the source code level. It is intended to be used by both application
developers and system implementors.
The Single UNIX Specification was derived from an earlier standard writ-
ten in 1994 known as the X/Open System Interface, which itself was devel-
oped around a Unix portability guide called the Spec 1170 Initiative, which
contained a description of exactly 1,170 distinct system calls, headers, com-
mands, and utilities covered in the spec.
The Single UNIX Specification was revised many times starting in 1997
by The Open Group, which was formed in 1996 as a merger of X/Open and
the Open Software Foundation (OSF), both industry consortia. The Open
Group owns the UNIX trademark. It uses the Single UNIX Specification to
define the interfaces an implementation must support to call itself a UNIX
system. The most recent edition, revised in 2018, contains 1,833 distinct
interfaces.
The specification standardizes the collection of all system calls, libraries,
and those utility programs such as grep, awk, and sed that make Unix feel like
Unix. The collection of system calls is what defines the Unix kernel. The sys-
tem calls and libraries together constitute the Unix application programming
interface, whereas the utility programs constitute the Unix user interface.
There are four major parts to the standard:
Base definitions General terms, concepts, and interfaces common to
all volumes of the standard, including utility conventions and C language
header definitions
System interfaces Definitions for system service functions and subrou-
tines; language-specific system services for the C programming language;
function issues, including portability, error handling, and error recovery
Shell and utilities Definitions for a standard source code–level inter-
face to command interpreters and common utility programs for applica-
tion programs
Rationale An informative section, which contains historical informa-
tion concerning the contents of POSIX.1-2024 and why features were
included or discarded by the standard developers
POSIX.1-2024 also defines areas as being outside of its scope:
• Graphics interfaces
• Database management system interfaces
• Record I/O considerations
• Object or binary code portability
• System configuration and resource availability
All interfaces defined by POSIX are written in C because much of Unix
was originally developed in C. Therefore, POSIX depends upon a standard
definition of C; in particular, POSIX.1-2024 is based on C17, whose official
standard is ISO/IEC 9899:2018. I’ll discuss more about C standards in the
next section.
The Single UNIX Specification, Version 4, from 2018 is essentially the
same as POSIX.1-2024, except that it includes a standard for the ncurses li-
brary, which is a terminal control library that can be used to create interactive
programs that run in terminal windows, such as text editors and games.
The fact that there are standards doesn’t imply that all Unix implementa-
tions adhere to them. Although there are systems such as AIX, Solaris, and
macOS that are fully POSIX conformant, most are partly compliant. Systems
such as FreeBSD and various versions of Linux fall into this category.
Any single Unix system may have features and interfaces that do not
comply with a standard. The challenge in system programming is being able
to write programs that will run on a broad range of systems in spite of this.
A Unix man page generally shows to which standards the topic of the
man page conforms. The standards man page, in Section 7, lists all of the
names used for the standards referenced in the man pages. If you enter
the command man standards, you will see the full list. In Chapter 2, we’ll go
over how feature test macros are used to provide a means to compile a single
program on a variety of different Unix systems.
C Standards
The C programming language has undergone several revisions since it was
first invented by Dennis Ritchie, each adding new features and sometimes
fixing defects. The most recent version as of this writing is C23. You can
download the latest free draft of the C23 standard as well as drafts of older
versions from various websites, such as https://iso-9899.info/wiki/The_Standard.
It’s a good idea to keep a local copy of the current standard for those times
when you encounter an unfamiliar construct in a program.
Because POSIX specifies not just what Unix must do but what the various
parts of the C Standard Library must do, in effect, it specifies an extension to
the C language. Therefore, a Unix system that is POSIX conformant contains
all of the library functions of the C language, such as the C Standard I/O
Library and the C Math Library, all part of what’s commonly called the C
Standard Library.
The C Standard Library provided for Linux as well as several other Unix
distributions is the GNU C Library, called GNU libc, or glibc. GNU often
extends the C library, and not everything in it conforms to the ISO standard,
nor to POSIX. What all of this amounts to is that the version of the C library
on one system is not necessarily the same as that found on another system.
This is one reason why it’s important to know the standard and know
what it does and doesn’t define. In general, the C standard describes what’s
required, what’s prohibited, and what’s allowed within certain limits. Specifi-
cally, it describes the following:
• The representation of C programs
• The syntax and constraints of the C language
• The semantic rules for interpreting C programs
• The representation of input data to be processed by C programs
• The representation of output data produced by C programs
• The restrictions and limits imposed by a conforming implementa-
tion of C
Not all compilers and C runtime libraries comply with the standard, and this
complicates programming in C.
The GNU compiler has command line options that let you compile ac-
cording to various standards. For example, if we wanted our hello.c program
to be compiled against the ANSI standard, we would enter:
Even though there are later ISO C standards, if we use the previous com-
mand, it will apply the most recent C standard anyway.
Understanding how to write programs for Unix requires knowing which
features are part of C and which are there because they are part of Unix. In
other words, you’ll need to understand what the C libraries do and what the
underlying Unix system defines. Having a good grasp of the C standard will
make this easier.
Summary
System programs are fundamentally different from the kinds of programs
that most beginning students learn how to write because they access pro-
tected resources inside the computer system. What actually happens when
a program makes a relatively simple call to print onto the terminal window
involves much more than what meets the eye. The sequence of steps includes
the use of system calls, which are function calls into the kernel code. The
kernel is the core of the operating system, the part that is memory resident
as long as the computer is powered on, and is responsible for protecting,
managing, and making available the wide range of resources in the computer
system.
Unix introduced many novel ideas in the design of operating systems.
Some of the most innovative ideas that made it so successful are the following:
• A programmable, interchangeable command line interpreter, called
a shell, that runs in userspace rather than as a part of the kernel
• The concept of processes and the method of process creation
• The use of two levels of privilege to provide protection of the kernel
and its resources
• Device-independent I/O operations
• The representation of files as sequences of bytes without structure
• I/O redirection and pipes in particular
• The concepts of users and groups and file permissions
• The single directory hierarchy
• The environment concept
The growth and spread of Unix led to many different Unix varieties and
distributions and a need for standardization. This in turn led to the creation
of a consortium that created the POSIX standards for its interfaces and
behavior.
Exercises
1. Who are the authors of the bash shell? (Hint: Use the man pages to
find out.)
2. What is the return type of the read() system call?
3. Using the man pages, find the names of all of the header files that
you would need to include to use the following functions in a pro-
gram. There might be more than one needed for some of these.
(a) _exit()
(b) setuid()
(c) fstat()
4. If your current working directory is /usr/share/gcc/python, what is the
shortest relative pathname of the file /usr/lib32/libc.so.6?
5. What command can be used to print the creation date of a file?
(Hint: This information is part of a file’s status.)
2
FUNDAMENTALS OF SYSTEM PROGRAMMING
Object Libraries
Most likely, almost every program you’ve written has made calls to functions
you didn’t write but that are part of some library installed on your system.
The functions that you call to read from or print to the screen are contained
in a library, most likely a standard library, such as the C Standard Library
(for example, printf()) or the C++ Input/Output Library based on iostreams
(for example, the insertion operator of the ostream cout object). You may
not have thought much about libraries before, but they play a key role in
programming.
When you’ve been writing programs for a while, you might realize that
you keep writing certain functions over and over again for different projects.
To avoid rewriting them each time, perhaps you copy them from one direc-
tory to another, possibly tweaking them a bit depending on how you plan to
reuse them.
Suppose you discover while working on your latest project that one of
the functions you’re reusing in this way has a bug. You can fix it in the cur-
rent copy, but then you’ll have to find all of the other projects that use that
function and fix the bug in them as well. It’s not a very efficient organizing
principle.
Wouldn’t it be better if you could create a repository of those frequently
used functions in such a way that each new project could just link to it? Al-
though such a repository could be a collection of source code files, it would
be even better if it were a bundle of object modules, code that’s already com-
piled and ready to link into a program.
One advantage of an object code bundle instead of a source code bun-
dle is that you don’t have to compile it every time. Also, if you plan on shar-
ing your work with others, you could distribute the object code and not
worry that it might be modified, unintentionally or otherwise, or possibly
broken. Those issues are possible if you distribute just the source code. If
you did distribute the object code, you’d most likely need to distribute a
header file that contained all declarations of the functions and other sym-
bols contained in the object code.
In Unix systems, doing so isn’t just possible, it’s also relatively easy. Unix
has tools that let you create your own libraries and tools that can view and
modify libraries. Appendix A contains detailed instructions on how to cre-
ate libraries in Unix.
An object library, also called a software library, is a file that bundles to-
gether, in a structured way, the compiled object code from multiple func-
tions so that programs can call them easily. Libraries aren’t stand-alone ex-
ecutables; they don’t have a main() function, and you can’t run them. They
contain function implementations and sometimes type definitions and con-
stants needed by those functions or by code that calls them. Figure 2-1 de-
picts a hypothetical library named libsnw.a.
Figure 2-1: The libsnw.a library: an index followed by the object files sort.o, cardinal.o, and binsearch.o
The index in Figure 2-1 is essentially a look-up table that contains the
addresses relative to the start of the file of all symbols defined in the library,
which makes those symbols easy to find.
System Libraries
System calls are usually very low-level primitives. They do very simple tasks
because the Unix operating system was designed to keep the kernel as small
as possible. For the same reason, the kernel typically doesn’t provide many
routines that do similar things. For example, the kernel has a single func-
tion to perform almost all read operations, and when it reads from storage
devices such as disks, it reads large blocks of data from a specified device
to specified system buffers. It doesn’t have a different system call to read a
character at a time, nor one that reads formatted input, both of which are
NOTE The fact that a shared library is also called a dynamically linked library doesn’t
imply that it’s the same as what Microsoft calls a DLL. Although DLL is also short for
“dynamically linked library,” DLLs are different from Unix shared libraries.
I’ll use the term shared library so as not to cause any confusion.
With shared libraries, calls to functions or references to other symbols
in the library are linked only when the program actually executes the calls or
accesses the symbols for the first time. Shared libraries have names ending
in .so, possibly followed by a numeric suffix of the form .<number>, such as
libc.so.6, where the number refers to a specific version. The .so suffix is short
for “shared object.”
Static linking, which was the original form of linking used in most op-
erating systems, including Unix, resolves references to externally defined
symbols such as functions by copying the library code directly into the ex-
ecutable file when the executable file is built. The linkage editor, also called
the link editor or simply the linker, performs static linking. The term linker
is a bit ambiguous, so I avoid using it. The ld program is the static linker in
Linux.
The primary reason to use static linking, perhaps now the only reason,
is that statically linked executables are self-contained and can run reliably on
multiple platforms. For example, a program might use a particular version
of a graphical toolkit such as GTK that may not be present on all systems. If
the toolkit’s libraries are statically linked into the executable, the executable
can run on other systems with the same machine architecture without re-
quiring that the users on those systems install the specific library files.
When a library is dynamically linked to a program, the linkage editor in-
serts records into the program for symbols from the library to indicate that
these symbols will be resolved when they are first reached during the pro-
gram’s execution. When the program is loaded into memory, the dynamic
linker checks whether that library is already in memory and, if not, finds a
place in memory for it and loads it. As the program executes, each time a
new symbol is reached, the dynamic linker links it to the library. Programs
can experience slightly longer running times with dynamic linking, because
whenever an unresolved symbol is found and must be resolved, there’s a bit
of overhead in locating the library and linking to it.
Linux systems have two dynamic linkers: ld.so and ld-linux.so. The for-
mer links and loads the old-style executable format known as a.out binaries,
and the latter links and loads executables in the modern Executable and
Linking Format (ELF). ELF is a standard format for executable files, object
files, and libraries. It replaces the older a.out format and the Common Ob-
ject File Format (COFF), which was created to replace a.out. ELF was devel-
oped by UNIX System Laboratories and has been adopted by almost all Unix
vendors.
$ ar t /lib/gcc/x86_64-linux-gnu/11/libstdc++.a
compatibility.o
--snip--
array_type_info.o
atexit_arm.o
atexit_thread.o
atomicity.o
bad_alloc.o
--snip--
The path to libstdc++.a may be different on your system. You can also use
the objdump command to view executable program files and shared libraries.
The -a option prints the index with information about the original object
files:
$ objdump -a libsnw.a
In archive libsnw.a:
$ objdump -t libsnw.a
In archive libsnw.a:
SYMBOL TABLE:
00000000 l df *ABS* 00000000 sort.cpp
00000000 l d .text 00000000 .text
00000000 l d .data 00000000 .data
00000000 l d .bss 00000000 .bss
00000000 l d .gcc_except_table 00000000 .gcc_except_table
00000000 l d .gnu.linkonce.t._ZStgtIcSt11char_traitsIcESaIcEEbRKSbIT...
00000000 l d .eh_frame 00000000 .eh_frame
00000000 l d .note.GNU-stack 00000000 .note.GNU-stack
--snip--
The man page for objdump explains how to read its output.
For shared libraries, you can use the nm command with the -D or --dynamic
option. The following shows how to use it to view the dynamically linkable
symbols in the C standard library:
$ nm -j -D /lib/x86_64-linux-gnu/libc.so.6
--snip--
_IO_do_write@@GLIBC_2.2.5
_IO_doallocbuf@@GLIBC_2.2.5
_IO_enable_locks@@GLIBC_PRIVATE
_IO_fclose@@GLIBC_2.2.5
_IO_fdopen@@GLIBC_2.2.5
_IO_feof@@GLIBC_2.2.5
_IO_ferror@@GLIBC_2.2.5
_IO_fflush@@GLIBC_2.2.5
--snip--
The -j option forces nm to print just the symbols and suppress other information.
Another tool you can use is readelf, which can display the contents of
any ELF file, including object files. The readelf command is an example
of a binary utility, a command designed to work with binary files such as
ELF files. On some systems such as Solaris, you need to use elfdump because
readelf isn’t available.
To understand the output of readelf, you need to understand the struc-
ture of ELF files and the notation used by readelf. But if all you want to do
is check what functions or other symbols are in an executable, you can enter
readelf -s elf-file | more, and you’ll see a large amount of output, a screen-
ful at a time.
For example, I can run readelf on a program, say myprogram, that was
linked to a libutils.so shared library and see all symbols, as shown here:
$ readelf -s myprogram
Symbol table '.dynsym' contains 17 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 00000000 0 NOTYPE LOCAL DEFAULT UND
1: 00000000 0 FUNC GLOBAL DEFAULT UND show_time
2: 00000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
--snip--
The fact that show_time has a value of 0 means that it is not yet bound to an
address. This is to be expected, because the actual binding will not take
place until runtime.
To learn more, first read the man page for ELF and then read the page
for readelf. You can also download the specification of ELF from various
websites such as the Linux Foundation
(https://refspecs.linuxfoundation.org/LSB_4.1.0/LSB-Core-generic/LSB-Core-generic/elf-generic.html).
Chapter 10 explains the structure of ELF files in detail.
Two other tools, hexdump and od, short for “octal dump,” are sometimes
useful. Each can display a file’s raw, uninterpreted bytes starting from byte
0, with byte addresses, in various output formats such as character when pos-
sible, hexadecimal, octal, and decimal.
$ ldd hello
linux-vdso.so.1 (0x00007ffdbe564000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc2e26a4000)
/lib64/ld-linux-x86-64.so.2 (0x00007fc2e28f6000)
This shows that hello is linked to just three objects: the dynamic linker
ld-linux-x86-64.so.2, the GNU C Library libc.so.6, and a library named
linux-vdso.so.1. We don’t need to know much about this last library; the
kernel maps it into every process so that the C Standard Library can make
certain frequently used system calls faster. The vdso man page, in Section 7
of the manual, explains its purpose in more detail.
Let’s look at what dynamic libraries the ls program uses:
$ ldd /bin/ls
linux-vdso.so.1 (0x00007ffd591a7000)
libselinux.so.1 => /lib/x86_64-linux-gnu/libselinux.so.1 (0x00007efc6271...
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007efc624f6000)
libpcre2-8.so.0 => /lib/x86_64-linux-gnu/libpcre2-8.so.0 (0x00007efc6245...
/lib64/ld-linux-x86-64.so.2 (0x00007efc62793000)
This output shows that ls is linked to two libraries besides the linking loader
and the C standard library. The libpcre2 library has functions for working
with Perl regular expressions, and libselinux is the SELinux runtime library.
SELinux is a security system for Linux that defines access controls for the
applications, processes, and files.
We can also use the ltrace and strace tools to see what a program actually
calls when it runs: ltrace traces calls to library functions, and strace
traces system calls. You can learn how to use them from their man pages.
$ /lib/x86_64-linux-gnu/libc.so.6
GNU C Library (Ubuntu GLIBC 2.39-0ubuntu3.1) stable release version 2.39.
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 11.4.0.
libc ABIs: UNIQUE IFUNC ABSOLUTE
For bug reporting instructions, please see:
<https://bugs.launchpad.net/ubuntu/+source/glibc/+bugs>.
#include <gnu/libc-version.h>
const char *gnu_get_libc_version(void);
The following program demonstrates its use; it just prints out the ver-
sion number:
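A minimal sketch of the core of such a program might look like the following. The helper name print_glibc_version() and the message text are illustrative assumptions; only gnu_get_libc_version() comes from the synopsis above.

```c
#include <gnu/libc-version.h>
#include <stdio.h>

/* Print the version of the GNU C Library that this process is using.
   Returns the number of characters written, as fprintf() does. */
int print_glibc_version(FILE *out)
{
    return fprintf(out, "GNU C Library version: %s\n",
                   gnu_get_libc_version());
}
```

A main() function would simply call print_glibc_version(stdout). This sketch assumes a glibc-based system; gnu/libc-version.h is not available with other C libraries.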
System Calls
An ordinary function call is a jump into and return from a routine that is part
of the code linked into the program making the call, regardless of whether
the routine is statically or dynamically linked to the code. A system call is like
a conventional function call in that it causes a jump to a routine followed by
a return to the caller. But it’s significantly different because it’s a call to a
function that is a part of the Unix kernel.
It’s usually easy to tell whether a function is a system call or a library
function. System call man pages are in Section 2, whereas library function
man pages are in Section 3. If the SYNOPSIS on a function’s man page shows
that you need to include unistd.h, then it’s most likely a system call,
although a few Section 3 library functions, such as getopt(), require
unistd.h as well.
The code that’s executed during a system call is actually kernel code.
Since the kernel code accesses hardware and contains privileged
instructions, it must be run in privileged mode. Since only the kernel runs
in privileged mode, this mode is also commonly called kernel mode or
supervisor mode. Therefore, during a system call, the process that made the
call runs in kernel mode.
Unlike an ordinary function call, a system call causes a change in the
execution mode of the processor; systems usually implement this with a trap
instruction.
NOTE A trap is a machine instruction that changes the processor mode and jumps to a
specific location in memory. On 32-bit x86 systems, the trap was originally
implemented with the int 0x80 instruction. Linux kernels 2.6 and later also support
the faster sysenter instruction, which the GNU C Library glibc 2.3.2 and later uses.
On x86-64 systems, the syscall instruction serves the same purpose.
The kernel supports a fixed number of system calls on any given sys-
tem. The syscalls man page lists the names of all calls supported on the
system. Each call is associated with a number that’s used as an index into a
table of addresses to which control is transferred inside the kernel. These
numbers are system dependent, but each has a symbolic name defined by
a macro. For example, the symbolic name for the getpid() system call num-
ber is __NR_getpid (as well as SYS_getpid for backward compatibility). As of this
writing, the latest Linux kernel has about 450 different system calls. The trap
instruction is typically invoked with a parameter that references this number
to specify which system call to run.
The number of parameters in system calls varies, and the method by
which they’re transferred to the kernel depends on how many there are.
Linux systems use a combination of two different methods:
Register method Parameters are placed into known registers in a spe-
cific order. When the number of parameters exceeds the number of
available registers, the block method is used instead.
Block method The parameters are stored in a block of consecutive
bytes in memory, and the address of the block is passed in a register.
The latest version of Linux does not allow more than six parameters to a
system call.
Wrapper Functions
Processes don’t usually invoke system calls directly. Instead they call wrap-
per functions. A wrapper function for a function named foo() does very little
other than repackage the parameters of the call to foo(), call foo(), collect
its return value, and possibly supply it in a different form to the caller. The
GNU C Library glibc has wrapper functions for almost all system calls.
Wrapper functions for system calls usually have the same name as the call
itself. They also have to execute the trap instruction to trap to kernel mode.
A wrapper is said to be thin if it does almost nothing but pass the argu-
ments in and receive the return values. The GNU C Library wrapper func-
tions are often very thin, doing little more than copying arguments to the
right registers before invoking the system call and then setting the value of a
global error variable.
Sometimes a wrapper is not so thin, as when the library function has
to decide which of several alternative functions to invoke, depending upon
what is available in the kernel. The truncate() system call is a good example.
It can truncate a file to a specified length, discarding the data beyond that
length.
The original truncate() function could handle only lengths that could
fit into a 32-bit integer. When filesystems were able to support very large
files, a newer version named truncate64() was added. The newer function
can process lengths representable in 64 bits. The wrapper for truncate() de-
cides which one is provided by the kernel and calls it.
Some system calls don’t have wrappers in the library, and for those,
the programmer has no other choice but to invoke the system call with
the syscall() function, passing the system call’s number and arguments.
Generally speaking, for a system call named foo, its number is defined by a
symbolic constant named either __NR_foo or SYS_foo. These macro definitions
are exposed by the header file sys/syscall.h, which you’d need to include in
the code. They may not be in that file itself, but in an included file, such
as asm/unistd_32.h or asm/unistd_64.h. The man page for syscall() lists the
headers to include.
An example of a system call that may not have a wrapper is gettid(),
which returns the caller’s thread ID. (A wrapper was added to glibc starting
in version 2.30.) In Chapter 1, we saw a slightly different program that called
this function. It’s the same as getpid() for a process with a single thread. The
gettid_demo.c program in Listing 2-1 uses syscall() to call gettid() and prints
the returned ID on the screen.
gettid_demo.c
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <stdio.h>
Listing 2-1: A program that uses the syscall() function to make a system call
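The technique the listing names, passing a system call number to syscall(), can be sketched as follows. This is an illustrative sketch, not the book’s listing: the function names thread_id() and print_ids() and the output format are assumptions.

```c
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <stdio.h>

/* Return the caller's thread ID by invoking the system call through
   its number, SYS_gettid, rather than through a wrapper. */
pid_t thread_id(void)
{
    return (pid_t) syscall(SYS_gettid);
}

/* Print the thread ID next to the process ID for comparison. */
void print_ids(void)
{
    printf("Thread ID: %ld  Process ID: %ld\n",
           (long) thread_id(), (long) getpid());
}
```

In a process with a single thread, the two numbers printed by print_ids() are the same.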
Figure 2-3: The different control paths for obtaining services, showing the relationship
between library function calls and system calls (diagram labels: Program, Kernel space,
System call handler, Kernel)
Your program must not declare the errno variable. Because errno is declared
in errno.h, including the header also includes its declaration. If you put another
declaration of it in your program, it would hide the real errno variable and the
one your program uses wouldn’t contain the error values. Also, the program must
inspect errno immediately after the system call because, if your program calls
any other function or makes another system call before it inspects that variable,
the error value may be overwritten by the error value resulting from the later call.
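One common way to follow this advice is to copy errno into a local variable on the line right after the failing call. The sketch below does that for open(); the function name open_for_reading() and its saved_errno parameter are illustrative assumptions.

```c
#include <errno.h>    /* declares errno; never declare it yourself */
#include <fcntl.h>    /* open(), O_RDONLY */
#include <stdio.h>

/* Try to open path for reading. On failure, return -1 and store the
   error number in *saved_errno before any other call can change it. */
int open_for_reading(const char *path, int *saved_errno)
{
    int fd = open(path, O_RDONLY);

    if ( -1 == fd ) {
        *saved_errno = errno;     /* capture immediately */
        /* fprintf() is itself a library call that may modify errno,
           which is why the value was saved first. */
        fprintf(stderr, "cannot open %s (errno %d)\n", path, *saved_errno);
    }
    return fd;
}
```

Calling it with the name of a file that doesn’t exist leaves ENOENT in the saved value, no matter what later calls do to errno.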
You can also enter the errno -l command to see the list of all possible er-
ror codes from all system calls. This command is part of the moreutils pack-
age, which may not be installed on your system. If you see an error message
after entering that command, you need to install the package. A normal run
looks like:
$ errno -l
EPERM 1 Operation not permitted
ENOENT 2 No such file or directory
ESRCH 3 No such process
EINTR 4 Interrupted system call
EIO 5 Input/output error
ENXIO 6 No such device or address
E2BIG 7 Argument list too long
ENOEXEC 8 Exec format error
--snip--
#include <unistd.h>
int gethostname(char *name, size_t len);
The type of the second parameter, size_t, is an unsigned integer type that
is defined by the POSIX.1 standard. Unix systems that conform to the stan-
dard employ this type for all symbols that are supposed to store the size of
any kind of object. It’s our first example of a Unix system type.
The man page explains the behavior of gethostname():
gethostname() returns the null-terminated hostname in the character
array name, which has a length of len bytes. If the null-terminated
hostname is too large to fit, then the name is truncated, and no
error is returned. POSIX.1-2024 states that if such truncation
occurs, then it is unspecified whether the returned buffer includes a
terminating null byte.
Based on this explanation, our program must check the value returned and
handle the error, because if the name array was truncated and is missing the
terminating null byte, the program may generate some type of fault, most
likely a segmentation fault, when we try to print the name.
The ERRORS section on the man page for gethostname() lists three possible
errors:
EFAULT When name is an invalid address
EINVAL When len is negative
ENAMETOOLONG When len is smaller than the actual size
This list implies that we should have code to handle each case. Because
there are only three, a sequence of if statements can handle them.
Listing 2-2 contains a complete program, gethostname_demo.c, that demon-
strates one way to handle the errors from the call to gethostname().
int main(void)
{
char name[4]; /* Declare string to hold returned value. */
size_t len = 3; /* Purposely too small so error is revealed */
int returnvalue;
--snip--
Listing 2-2: A program that demonstrates how to handle system call errors by inspecting
the errno variable
The program needs to include unistd.h on the first line because
gethostname() is a system call. It includes errno.h in order to use the
errno variable. If the if condition is true, an error occurred and the
switch statement selects a custom error message to print, after which the
program terminates. If not, the else clause prints the name that was
returned. I purposely made the array too small for most machine names so
that when this program is run we get to see the error message. By changing
the array size to a large enough number, we prevent the error from occurring.
The following run of the program shows what it outputs, assuming the
hostname is harpo and the executable is named gethostname_demo:
$ ./gethostname_demo
The hostname is too long for the allocated array.
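The approach described above, testing the return value and then selecting a message with a switch on errno, can be sketched as follows. This is an illustrative sketch rather than the book’s listing: the helper names hostname_error() and get_hostname() and the wording of the EFAULT, EINVAL, and default messages are assumptions. Only the ENAMETOOLONG message is taken from the run shown above.

```c
#include <unistd.h>
#include <errno.h>
#include <stdio.h>

/* Map an errno value set by gethostname() to a custom message. */
const char *hostname_error(int err)
{
    switch ( err ) {
    case EFAULT:       return "The name array is at an invalid address.";
    case EINVAL:       return "The length argument is negative.";
    case ENAMETOOLONG: return "The hostname is too long for the allocated array.";
    default:           return "An unexpected error occurred.";
    }
}

/* Store the hostname in name, an array of len bytes (len must be at
   least 1). Print the name and return 0, or print a message and
   return -1 if gethostname() fails. */
int get_hostname(char *name, size_t len)
{
    if ( -1 == gethostname(name, len) ) {
        int err = errno;          /* save before calling anything else */
        printf("%s\n", hostname_error(err));
        return -1;
    }
    name[len - 1] = '\0';         /* guard against silent truncation */
    printf("%s\n", name);
    return 0;
}
```

Writing the terminating byte by hand covers the case the man page warns about, in which a truncated name comes back without one.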
The perror() function writes a message onto the standard error stream
describing the last error encountered during a call to a system or library
function. Its synopsis is:
#include <stdio.h>
void perror(const char *s);
int main(void)
{
char name[4]; /* Declare string to hold returned value. */
size_t len = 3; /* Purposely declared too small so error is revealed */
int returnvalue;
--snip--
Listing 2-3: A program that uses perror() to handle system call errors
$ ./perror_demo
gethostname: File name too long
#include <string.h>
char *strerror(int errnum);
It returns a pointer to a string containing the error message for the error
number passed to it. Therefore, strerror(errno) is the error message associ-
ated with errno.
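As a small illustration, strerror() can be used to build a message with the same layout that perror() prints, a prefix followed by a colon, a space, and the error text. The function name format_error() is an illustrative assumption.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Write "prefix: message" for errnum into buf, where message is the
   text strerror() associates with errnum. Returns the length that
   snprintf() reports. */
int format_error(char *buf, size_t size, const char *prefix, int errnum)
{
    return snprintf(buf, size, "%s: %s", prefix, strerror(errnum));
}
```

With the prefix "gethostname" and the error number ENAMETOOLONG, the result has the same form as the line printed by the perror_demo run shown earlier. Unlike perror(), this approach lets the program send the message anywhere, such as to a log file.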
Portability
Portability refers to the degree to which your program can run on other com-
puters with little or no modification of the code itself. If, for example, your
code uses features available only in GNU/Linux and you try to run it on an-
other Unix system without that support, it won’t behave the same way, and
you may not even be able to build it unless you modify the source code.
Unix’s haphazard growth is partly the cause of this problem, because over
time, three major variants of Unix evolved: BSD, GNU/Linux, and System V
(see Chapter 1). These variants had different features and capabilities, and
people created standards to specify how those various systems were supposed
to behave. One Unix system can have functions with the same names as
another but whose semantics are different because they evolved in different
variants. We need to know which version of a function our program uses
when we compile it on the development machine and whether it will be the
same when we compile the program on a different machine.
If you’re distributing source code to be built on other computers, ideally
you would design it so that it will compile into an executable whose behavior
is what you expect even on other computers.
Portability is tied to the concept of standards because, for example, if
your program is intended to adhere to the POSIX.1-2024 standard but must
be built on a system conforming to a different, perhaps older, POSIX stan-
dard, you need to know how to design the code so that it uses features avail-
able on the other computer when the ones you hoped to use aren’t available.
The macro preprocessor’s ability to compile code conditionally based on the
values of macro objects is the key to solving this problem.
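As a small illustration of conditional compilation, the following sketch selects code at compile time based on _POSIX_VERSION, a macro that unistd.h defines on POSIX systems. The function name posix_level() and the strings it returns are illustrative assumptions.

```c
#include <unistd.h>   /* defines _POSIX_VERSION on POSIX systems */

/* Report which POSIX level the system headers advertise. Only one of
   the three return statements is ever compiled into the program. */
const char *posix_level(void)
{
#if defined(_POSIX_VERSION) && _POSIX_VERSION >= 200809L
    return "POSIX.1-2008 or later";
#elif defined(_POSIX_VERSION)
    return "a POSIX.1 revision older than 2008";
#else
    return "no POSIX version advertised";
#endif
}
```

The same pattern lets a program call a newer function where the headers provide it and fall back to an older one where they don’t.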
#ifdef __USE_GNU
/* Close all streams... */
extern int fcloseall (void);
#endif
NAME
getline, getdelim - delimited string input
SYNOPSIS
#include <stdio.h>
ssize_t getline(char **lineptr, size_t *n, FILE *stream);
ssize_t getdelim(char **lineptr, size_t *n, int delim, FILE *stream);
Feature Test Macro Requirements for glibc (see feature_test_macros(7)):
getline(), getdelim():
Since glibc 2.10:
_POSIX_C_SOURCE >= 200809L
Before glibc 2.10:
_GNU_SOURCE
--snip--
The page explicitly mentions Feature Test Macro Requirements. What are they,
and how are you supposed to use this information?
If the SYNOPSIS section of a function’s man page lists feature test macro
requirements, it means that the preprocessor will expose the given prototype
or constant declaration only if the macro is defined before any header file
is included, not just the header in which the declaration appears, as in:
#define _GNU_SOURCE
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
--snip--
If you understand how the header files use these definitions, the code
will make much more sense to you. I’ll use a simplified version of the stdio.h
header file to illustrate, because the actual header file is much more com-
plex. The declaration of the prototype for getline() in this file looks roughly
like this:
#ifdef _GNU_SOURCE
ssize_t getline (char **__lineptr, size_t *__n, FILE *__stream);
#endif
#define _GNU_SOURCE
#include <stdio.h>
--snip--
Doing this causes the lines that #ifdef _GNU_SOURCE ... #endif protects to
be read.
In essence, the man page tells us that to use getline() or getdelim() with
glibc 2.10 or later, we need to define _POSIX_C_SOURCE with a value of at
least 200809L before including any header files:
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
If our version of glibc is older than 2.10, we need to use this macro:
#define _GNU_SOURCE
#include <stdio.h>
If you don’t remember how to find which version of glibc you have, see
“The C Standard Library” on page 57.
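Putting this together, a source file that calls getline() might begin as in the following sketch. The function name line_length() is an illustrative assumption. The feature test macro is defined on the very first line so that it precedes every #include.

```c
#define _POSIX_C_SOURCE 200809L   /* must come before every #include */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>            /* ssize_t */

/* Read one line from stream and return its length, not counting the
   trailing newline, or -1 at end of file or on error. */
ssize_t line_length(FILE *stream)
{
    char *line = NULL;            /* getline() allocates the buffer */
    size_t capacity = 0;
    ssize_t nread = getline(&line, &capacity, stream);

    if ( nread > 0 && line[nread - 1] == '\n' )
        nread--;
    free(line);                   /* safe even when getline() failed */
    return nread;
}
```

If the #define were moved below the #include lines, it would have no effect on what those headers had already exposed.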
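As an alternative to defining the macro in the program source code, we
can enable the definition when we compile the code on the command line
using the -D option to gcc, as in
$ gcc -D_POSIX_C_SOURCE=200809L -o myprog myprog.c
or
$ gcc -D_GNU_SOURCE -o myprog myprog.c
where myprog.c is a placeholder for the name of your own source file.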
Some feature test macros are intended to make your program more
portable by preventing nonstandard definitions from being exposed. Other
macros serve the opposite purpose, exposing nonstandard definitions that
aren’t exposed by default. The syntax of the feature test macros on the man
page uses the logical-OR and logical-AND operators: || and &&. The example
shown in the feature_test_macros man page is for the acct() function. It’s not
important what this function does:
SYNOPSIS
#include <unistd.h>
int acct(const char *filename);
acct():
Since glibc 2.21:
_DEFAULT_SOURCE
In glibc 2.19 and 2.20:
_DEFAULT_SOURCE || (_XOPEN_SOURCE && _XOPEN_SOURCE < 500)
Up to and including glibc 2.19:
_BSD_SOURCE || (_XOPEN_SOURCE && _XOPEN_SOURCE < 500)
--snip--
• The sizes and ordering of data members in structures
• The set of macros actually available in header files
We’ll address these issues as they arise.
System Limits
All Unix systems set limits on system resources and properties, such as the
maximum length of a filename or a pathname and the maximum length of a
username. Various standards specify minimum values for these maximums.
For example, POSIX.1-2024 specifies that _POSIX_NAME_MAX is the least value
that any conforming system can use as the maximum length of a filename.
These specified values are called system limits.
A portable application needs to know what these limits are on each sys-
tem on which it runs, and it should be able to adjust its use of resources
accordingly. There are a few different means for getting these limits, depend-
ing on their category. POSIX.1-2024 divides system limits into three such
categories:
Runtime invariant Those whose values are constant for any particular
Unix system
Pathname variable Filesystem-related limits whose values can vary on
a single system, depending on which filesystem they limit
Runtime increasable Those whose values can be increased at runtime
For example, most runtime invariant limits are defined in the header
file limits.h. A program can call the functions pathconf() and sysconf() to get
the values of various limits at runtime. Several programs in later chapters of
the book provide examples of how to do this.
Internationalization
In the early days of computing, almost all software was developed for En-
glish speakers. Now, computer systems are used throughout the world, and
we need to design software so that it accommodates local languages and cul-
tural conventions. Sometimes differences in cultural conventions can lead to
ambiguities with serious consequences. Two simple examples illustrate this
issue:
• In the United States, people express dates in the format MM/DD/
YYYY, where MM is a two-digit month, DD is a two-digit day of the
month, and YYYY is a four-digit year, such as 07/11/2033. In Eu-
rope, the convention is DD/MM/YYYY. If a program is transported
from one side of the Atlantic to another and dates are input or out-
put, it would be hard to know which date is meant by 07/11/2033.
Is it November 7 or July 11, 2033?
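The C library’s locale support offers one way to avoid hard-coding either convention. In the sketch below, the %x conversion of strftime() formats a date according to the current locale. The function name format_date() is an illustrative assumption.

```c
#include <stdio.h>
#include <time.h>

/* Format a calendar date using the current locale's date convention.
   Returns the number of bytes written, or 0 if buf is too small. */
size_t format_date(char *buf, size_t size, int year, int month, int day)
{
    struct tm date = {0};

    date.tm_year = year - 1900;   /* struct tm counts years from 1900 */
    date.tm_mon  = month - 1;     /* and months from 0 */
    date.tm_mday = day;
    return strftime(buf, size, "%x", &date);
}
```

A program starts in the "C" locale, in which %x produces the MM/DD/YY form. If it first calls setlocale(LC_TIME, "") from locale.h, the same function follows the user’s own convention, provided that locale is installed.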
$ gcc -o main main.c utils.c fileio.c # Build executable main from sources.
and
When you type a command and press ENTER, the shell makes the words
following the command name available to the program executing that com-
mand. The program needs to distinguish between the words that are non-
option arguments to the command, such as main.c, utils.c, and fileio.c from
the previous example, and those that are command options, such as -o main
and -r. In this section, I’ll explain how you can design your programs to
extract words from the command line, separate them into options and argu-
ments, and obtain the values of any environment variables that may influence
the behavior of the program.
the arguments the shell finds are dir1, dir2, and dir3. The phrase > listing is
a redirection; you’re allowed to put redirections between those arguments,
even though it’s a confusing thing that you should never do.
A program’s main() function is allowed to have no parameters, as in
int main(void)
but in this case, it’s unable to access its command line arguments.
The C standard requires compliant implementations of C (C compilers)
to accept a main() function with two parameters, as follows:
int main(int argc, char *argv[])
argc: 5
argv:
0 "showargs\0"
2 ":\0"
3 "debug prog.c\0"
4 "email\0"
NULL
It displays the command name that the user enters to execute the pro-
gram, followed by the command line arguments that it receives from the
shell, numbered to show their positions, one per line.
Notice that the last argument is in argv[argc-1], not argv[argc]. Because
the array’s last element, argv[argc], is a NULL pointer, we can also iterate
through the arguments until the condition argv[i] == NULL is true, as shown
in Listing 2-5.
Listing 2-5: A program that prints its command line arguments until it finds the NULL
pointer in the argv[] array
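The loop this listing describes can be sketched as follows. This is an illustrative sketch, not the book’s listing: the function name show_args() and the output format are assumptions, and it assumes argv[0] is not NULL.

```c
#include <stdio.h>

/* Print the command name and then each argument, numbered, one per
   line, stopping at the NULL pointer that terminates argv.
   Returns the number of arguments after the command name. */
int show_args(char *argv[])
{
    int i;

    printf("command: %s\n", argv[0]);
    for ( i = 1; argv[i] != NULL; i++ )
        printf("%d: %s\n", i, argv[i]);
    return i - 1;
}
```

A main() function declared with argc and argv would call show_args(argv). Notice that the loop never consults argc.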
Using pointer arithmetic, we could dispense completely with the index
variable i. (This is left as an exercise for the reader.)
#include <stdlib.h>
char *getenv(const char *name);
Given name, it searches the environment list for a variable matching name,
and if it finds one, it returns a pointer to its value; otherwise, it returns NULL.
For example, the program in Listing 2-6 prints the value of the HOME environ-
ment variable, unless it’s not in the environment, in which case it prints an
error message.
char path_to_home[256];
since this allocates storage for it and makes path_to_home a constant char pointer.
The function wouldn’t be able to assign a value to it, and the compiler will
flag it as an error.
POSIX.1-2024 allows an implementation of this function to store the
string whose address is returned in a statically allocated storage location,
which means it will be overwritten by a subsequent call. If you intend to call
it again, copy the return value to a local variable. For example, the follow-
ing code may not work on some systems, because by the time that the value
of home is evaluated, the storage has been overwritten by the return value of
getenv() in user = getenv("USER"):
--snip--
char *home, *user;
home = getenv("HOME");
if ( NULL != home ) {
user = getenv("USER");
if ( NULL != user )
printf("USER=%s and HOME=%s\n", user, home);
}
--snip--
Instead, you could use a string copying function such as strncpy() to copy the
return value into home, as in:
char home[256] = "";
char *value = getenv("HOME");
if ( NULL != value )
    strncpy(home, value, sizeof(home) - 1); /* last byte stays '\0' */
If you do this, make sure to include the string.h header file, since the declara-
tions of the string copying functions are there.
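The copying step can also be wrapped in a small helper that reports when a variable is unset or too long for the buffer. The sketch below uses snprintf() in place of strncpy() because it always writes a terminating null byte and reports truncation. The function name copy_env() is an illustrative assumption.

```c
#include <stdio.h>
#include <stdlib.h>

/* Copy the value of the environment variable name into buf.
   Returns 0 on success, or -1 if the variable is not set or its
   value does not fit in size bytes. */
int copy_env(const char *name, char *buf, size_t size)
{
    const char *value = getenv(name);

    if ( NULL == value )
        return -1;
    int n = snprintf(buf, size, "%s", value);
    if ( n < 0 || (size_t) n >= size )
        return -1;                /* value was truncated */
    return 0;
}
```

Calling copy_env() once for HOME and once for USER, each with its own array, leaves both values intact, because neither depends on storage that a later getenv() call might overwrite.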
environ
0 "HOME=/home/stewart/\0"
1 "USER=stewart\0"
2 "SHELL=/bin/bash\0"
...
NULL
Listing 2-7: Using char **environ to search the environment sequentially and print its
environment strings
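A sequential search of this kind can be sketched as follows. This is an illustrative sketch, not the book’s listing: the function names find_in_environ() and print_environ() are assumptions.

```c
#include <stdio.h>
#include <string.h>

extern char **environ;   /* defined by the C library; only declared here */

/* Search the environment list sequentially for name. Return a pointer
   to its value, or NULL if the variable is not present. */
const char *find_in_environ(const char *name)
{
    size_t len = strlen(name);

    for ( char **entry = environ; *entry != NULL; entry++ ) {
        /* Each entry has the form "NAME=value". */
        if ( strncmp(*entry, name, len) == 0 && (*entry)[len] == '=' )
            return *entry + len + 1;
    }
    return NULL;
}

/* Print every environment string, one per line. */
void print_environ(void)
{
    for ( char **entry = environ; *entry != NULL; entry++ )
        printf("%s\n", *entry);
}
```

Checking for the = character after the name prevents a search for USER from matching a variable named USERNAME.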
This envp parameter points to the start of the environment list inherited by
the program in the same way that the environ variable does.
You could then access the environment list with a loop such as:
int n = 0;
while ( NULL != envp[n] ) {
/* Do something with envp[n]. */
printf("%s\n", envp[n++]);
}
If you need only a few variables’ values, it’s better to use getenv(). Even
though many systems support the envp parameter, POSIX.1-2024 doesn’t specify
it, which implies that on some systems, your code won’t work if you use it,
so I advise you not to use it.
The command line must have at least three words for this program to
run properly. If there are more than three, it can ignore the extras. The pro-
gram should be allowed to run only if the first parameter to main(), which is
int argc, is at least 3. The program in Listing 2-8 demonstrates how to check
for correct usage properly.
}
printf("About to copy from %s to %s\n", argv[1], argv[2]);
/* But no code for copying just yet */
return 0;
}
Listing 2-8: A program that checks for correct usage, printing a message if it is used
incorrectly
If the user doesn’t supply at least two arguments, the program displays a
usage message and exits. It prints the message by calling the C fprintf()
function, whose first parameter is the C Library file stream to which to
print, in this case, the standard error stream (stderr). Otherwise, it prints
a message saying that it will copy from the first named file to the second.
We’ll see how to copy files in Chapter 4. For now, we just say we’re doing so.
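The usage check itself can be isolated in a function, as in the following sketch. The function name check_usage() and its errstream parameter are illustrative assumptions; the message format follows the usage line that usagecheck_demo prints.

```c
#include <stdio.h>

/* Return 0 if at least two arguments follow the command name.
   Otherwise, print a usage message on errstream and return -1. */
int check_usage(int argc, char *argv[], FILE *errstream)
{
    if ( argc < 3 ) {
        fprintf(errstream, "usage: %s file1 file2\n", argv[0]);
        return -1;
    }
    return 0;
}
```

A main() function would call check_usage(argc, argv, stderr) first and exit with a nonzero status if it returns -1.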
$ ~/bin/usagecheck_demo infile
usage: /home/stewart/bin/usagecheck_demo file1 file2
When the program runs, the tilde ~ is expanded to the path /home/stewart
and argv[0] contains the entire pathname, /home/stewart/bin/usagecheck_demo.
If you don’t want to display the entire pathname of the program but
prefer that it displays only the more concise message
usage: usagecheck_demo file1 file2
regardless of where the executable is, then before you print it, strip off the
leading part of the argv[0] string so that the only thing left is what comes
after the final / character. There are two relatively portable ways to do this,
one more general than the other.
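One way is to use the strrchr() function declared in string.h, whose
prototype is:
char *strrchr(const char *s, int c);
It returns a pointer to the last occurrence of the character c in the string
s, or NULL if c doesn’t occur in s.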
Listing 2-9: A program that strips the program name of any leading directories using
strrchr()
For those unfamiliar with C, or if your C is a bit rusty, the instruction
suffixptr = forwardslashptr + 1; performs pointer arithmetic to make suffixptr
point to the first character after the forward slash.
When pointer arithmetic appears in code, the compiler translates addi-
tion of an integer n to a pointer of type basetype* into the addition of sizeof
(basetype) * n bytes to the pointer’s value. For example, if dblptr is a pointer
of type double* that contains the address 1024, and a double uses 8 bytes, then
dblptr + 6 is the address 1024 + (6 × 8) = 1072. It’s worth remembering the
strrchr() function because it’s a useful function for other purposes as well.
For example, we can use it to get the suffix of a filename or to get the por-
tion of the filename before the suffix.
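Both uses of strrchr() can be sketched as follows. The function names strip_directories() and filename_suffix() are illustrative assumptions; forwardslashptr is the variable name used in the discussion above.

```c
#include <string.h>

/* Return the part of path after the final '/', or path itself if it
   contains no '/'. */
const char *strip_directories(const char *path)
{
    const char *forwardslashptr = strrchr(path, '/');

    if ( NULL == forwardslashptr )
        return path;
    return forwardslashptr + 1;   /* first character after the slash */
}

/* Return the suffix of filename, meaning the text after its last '.',
   or NULL if the name contains no '.'. */
const char *filename_suffix(const char *filename)
{
    const char *dot = strrchr(filename, '.');

    return ( NULL == dot ) ? NULL : dot + 1;
}
```

Neither function copies or modifies the string. Each returns a pointer into the string it was given, so the result is valid only as long as that string is.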
An easier, but less general, method of stripping the directories from the
pathname in argv[0] is to use the basename() library function, of which there
are both POSIX and GNU versions. Their prototypes are the same:
char *basename(char *path);
However, the POSIX function is declared in libgen.h, whereas the GNU version
is declared in string.h. The POSIX function may modify the string passed to
it, which here is argv[0], but the GNU version doesn’t. Furthermore, the man
page for basename() states that the POSIX version implemented in glibc has
bugs. For these reasons, we’ll use the GNU version to demonstrate.
To use the GNU version, we need to define the _GNU_SOURCE macro before
including any header files. Listing 2-10 shows the program.
Listing 2-10: A program using basename() to strip the program pathname of its leading
directories
POSIX.1-2024 recommends that all options precede all of the arguments,
but some commands don’t follow this guideline. If a command has several
short options, none of which have arguments, we can write them in various
combinations, such as the following:
$ ssh -acCfGgKkMN
$ ssh -a -c -CfGg -Kk -M -N
$ ssh -CfGg -Kk -M -a -c -N
NAME
getopt, getopt_long, getopt_long_only, optarg, optind, opterr, optopt -
Parse command-line options
SYNOPSIS
#include <unistd.h>
#include <getopt.h>
Feature Test Macro Requirements for glibc (see feature_test_macros(7)):
getopt(): _POSIX_C_SOURCE >= 2 || _XOPEN_SOURCE
getopt_long(), getopt_long_only(): _GNU_SOURCE
--snip--
Even though these are library functions, to use them you must include unistd.h.
The variables optarg, optind, opterr, and optopt are externally defined, and you
must not declare them in your program.
The man page explains everything we need to know to use these functions.
If our program expects all arguments to follow all options, it should
define _POSIX_C_SOURCE with a value greater than or equal to 2, or define
_XOPEN_SOURCE, before the header files are included. If we want to allow a
user to intermingle options and arguments, it doesn’t need to define either
of these macros.
As mentioned previously, we must define _GNU_SOURCE to use getopt_long().
The getopt() function parses the command line arguments. Its first two
arguments, argc and argv, are the argument count and array passed to the
main() function. The third argument, optstring, is a string that identifies the
options and their arguments. The string is interpreted according to the fol-
lowing rules:
• A letter by itself is an option without arguments. For example, b rep-
resents -b.
• A letter with a single colon (:) after it has a required argument, and
getopt() will place a pointer to the argument in optarg if it exists, and
if it’s missing, it will return ?. (See the final rule regarding how a
leading : in optstring is used.)
• A letter with a double colon (::) after it has an optional argument,
and getopt() will place a pointer to it in optarg or will set optarg to 0
if it’s missing.
• If getopt() finds an undefined option, it will put the character in
optopt, print an error message on stderr, and return ?. You can set
opterr to 0 to suppress the message. It will also perform these ac-
tions if a required option argument is missing.
• If the leading character is a :, then if getopt() finds a missing required
option argument, instead of returning a ?, it returns a :, which makes
it possible to distinguish the type of error. A : implies a missing
option argument, and a ? implies an invalid option character.
Let’s look at a small program that uses getopt(). The option string
":hb::c:1" specifies that -h and -1 are options without arguments, -b is an op-
tion with an optional argument, and -c is an option with a required argument.
The getopt() function initializes the external variable optind to 1. When
getopt() is called repeatedly, it returns each of the option characters from
each of the option elements on the command line. When it can’t find any
more options, it sets optind to be the index in the argv array of the next ele-
ment to be processed, and it returns -1. Thus, when it returns -1, optind is
#define TRUE 1
#define FALSE 0
            if ( 0 != optarg )
                strcpy(b_arg, optarg);
            break;
        case 'c':   /* c has a required argument. */
            opt_c = TRUE;
            strcpy(c_arg, optarg);
            break;
        case '1':   /* 1 is a switch (no arg). */
            opt_1 = TRUE;
            break;
        case '?':
            printf("Found invalid option %c\n", optopt);
            break;
        case ':':
            printf("Missing required argument\n");
            break;
        default:
            printf("?? getopt returned character code 0%o ??\n", ch);
        }
    }
    /* Finished processing the command line */
    /* Process the options - in this case, just print what was found. */
    printf("Options found:\n");
    if ( opt_h ) printf("-h \n");
    if ( opt_1 ) printf("-1 \n");
    if ( opt_b ) {
        printf("-b ");
        if ( strlen(b_arg) > 0 )
            printf("with argument %s\n", b_arg);
        else
            printf("with no argument \n");
    }
    if ( opt_c )
        printf("-c with argument %s\n", c_arg);
    /* optind is the index of the 1st non-option word in the argv[] array. */
    /* If optind < argc, there is at least one word that is not an option. */
Listing 2-11: A program that parses the command line for options and arguments
Listing 2-11 models the usual way to process the options, using a loop
and an embedded switch statement in which the fact of finding an option is
recorded in a variable associated with that option. This variable is checked
later in the program.
    switch ( ch ) {
    --snip--
    case 'h':
        print_help = TRUE;
        break;
Then somewhere in the main program’s body, we’d put code such as:
    if ( print_help )
        print_help_message();   /* Print the help information. */
If the program allows the same option to be present multiple times on the
command line with different arguments, the switch case for that option
needs to store the successive arguments in a suitable data structure.
#include <stdlib.h>
long strtol(const char *nptr, char **endptr, int base);
    if ( argc < 2 ) {
        fprintf(stderr, "Usage: %s str \n", argv[0]);
        exit(EXIT_FAILURE);
    }
    errno = 0;    /* (1) To distinguish success/failure after call */
    val = strtol(argv[1], &endptr, 0);    /* (2) */
Listing 2-12: A program that calls strtol() to convert its first argument to a number
Let’s study some of the details in Listing 2-12. We set errno to 0 (1) so that
after the call, if it’s nonzero we’ll know that an error occurred. We need to
do this because the actual number might be 0, implying that we can’t interpret
a return value of 0 as an error.
We pass the address of endptr, not endptr itself (2), to strtol(). After the
call, endptr contains the address of the first invalid character. We also check
whether errno is 0 (3) when strtol() returns. If it isn’t 0, a conversion error
occurred, and in this case we exit the program because the number might be
out of range and we don’t want to attempt to store it.
If errno is 0 (4), there was no error, but it’s possible that the string was not
a number. If endptr points to the start of the string, it wasn’t a number.
Finally, we check for a different possibility (5). It’s possible that the string
is something like 1234abc, which has valid digits followed by nondigits. If endptr
doesn’t point to the end of the string, the string must have nondigits. It’s best
in this case to let the calling program know this.
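The three checks described above can be gathered into one function. This is a sketch under assumptions, not the book’s code: the enum parse_status type, its constants, and parse_long() are hypothetical names chosen for illustration.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical status codes for this sketch. */
enum parse_status { PARSE_OK, PARSE_RANGE, PARSE_NO_DIGITS, PARSE_TRAILING };

/* Convert s to a long, applying the checks in the order described above. */
enum parse_status parse_long(const char *s, long *out)
{
    char *endptr;
    long val;

    errno = 0;                      /* so a nonzero errno means strtol() set it */
    val = strtol(s, &endptr, 0);
    if (errno == ERANGE)            /* value does not fit in a long */
        return PARSE_RANGE;
    if (endptr == s)                /* nothing was converted */
        return PARSE_NO_DIGITS;
    *out = val;                     /* a number was found; store it */
    if (*endptr != '\0')            /* characters follow the number */
        return PARSE_TRAILING;
    return PARSE_OK;
}
```

Returning a status and storing the number through a pointer lets the caller decide whether trailing characters are an error or merely worth a warning.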
If we run this program with several different types of input, we’ll see
how it behaves. Assume the executable is named strtol_demo:
$ ./strtol_demo 100000000000000
strtol() returned 100000000000000
$ ./strtol_demo -817238172
strtol() returned -817238172
$ ./strtol_demo +871237abns
Characters following the number: "abns"
strtol() returned 871237
$ ./strtol_demo kjasdksd
No digits were found
$ ./strtol_demo 71238172381273687236817236
strtol: Numerical result out of range
$ ./strtol_demo 032
strtol() returned 26
The very last run is revealing. The leading 0 is interpreted by strtol() as an
indicator that the number is octal.
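The effect of the base argument is easy to see by converting the same string with different bases. The to_long() wrapper below is a hypothetical helper introduced only to keep the calls short.

```c
#include <stdlib.h>

/* Convert s with the given base, ignoring where the conversion stopped. */
long to_long(const char *s, int base)
{
    return strtol(s, NULL, base);
}
```

With base 0, to_long("032", 0) yields 26 because the leading 0 selects octal, and to_long("0x1A", 0) also yields 26 because the 0x prefix selects hexadecimal. With base 10, to_long("032", 10) yields 32. A program that should always read decimal input can pass 10 instead of 0.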
Because we’ll need to convert strings to numbers frequently, in Chapter 3
we’ll develop a few functions based on the strto* functions that we’ll use in
subsequent chapters of the book.
Before leaving this topic, however, let’s consider another very simple way
to extract the numeric value of a string using the sscanf() function. It’s essen-
tially the same as scanf() except it reads from a C string passed to it in its first
parameter instead of from the standard input stream. Its synopsis is:
#include <stdio.h>
int sscanf(const char *str, const char *format, ...);
Like scanf(), its return value is the number of items successfully read and
converted to the format specified. By giving it the %d format specifier and
passing the address of an integer as the third argument, we can obtain the
integer value of the string.
Listing 2-13 shows how to do this.
    if ( argc < 2 ) {
        fprintf(stderr,"usage: %s <number>\n", argv[0]);
        exit(1);
    }
    sscanf(argv[1], " %d", &x);
    printf("The number is %d\n", x);
    return 0;
}
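Since sscanf() reports how many items it converted, a caller can use that return value to tell whether a number was found at all. The sketch below goes one step further and uses the %n conversion, which stores the number of characters consumed so far, to detect trailing characters. The scan_int() function is a hypothetical helper, not part of the book’s code. Note that sscanf() offers no way to detect a value too large for an int, which is one reason to prefer strtol() when input can’t be trusted.

```c
#include <stdio.h>

/* Return 1 if s consists of optional leading whitespace followed by exactly
   one decimal integer, storing it in *out; otherwise return 0. When trailing
   characters are found, *out holds the leading number but 0 is returned. */
int scan_int(const char *s, int *out)
{
    int consumed = 0;

    /* %n is not counted in the return value, so success is exactly 1. */
    if (sscanf(s, " %d%n", out, &consumed) != 1)
        return 0;                   /* no digits, or the string was empty */
    return s[consumed] == '\0';     /* nonzero only if nothing follows */
}
```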
Summary
System programs make requests to the kernel for services that require kernel-
level privileges through the use of system calls. System calls are calls to func-
tions implemented within the kernel.
Exercises
1. This exercise is open ended. Navigate to the /usr/bin directory on
the host you’re using. There, run the ldd command on every exe-
cutable, and examine the sets of dynamic libraries to which each
executable is linked. Which libraries are used the most? Which
commands link to the most libraries?
2. The printargs2.c program in Listing 2-5 used an integer to iterate
through the argv[] array. Write a version of it that does not print
the argument numbers and does not use any local variables.
3. Write a program that prints out the words it receives on the com-
mand line in reverse order, one per line.
4. Write a program that prints out the words it receives on the com-
mand line sorted by their lengths, from shortest to longest, one per
line. Words of the same length can be in any order.
5. The program perror_demo.c in Listing 2-3 purposely used an array
of characters too small for the hostname. Read the man page that
describes the limits.h header file, find the system constant that spec-
ifies the maximum hostname length, and rewrite the program so
that this error cannot occur.
6. The seq command prints out sequences of numbers. In the sim-
plest case, seq num1 num2 prints every number from num1 through num2.
Write a program that implements this simple form of the command.
If any arguments are missing, or if they are not two integers such that
the first is less than or equal to the second, it should print an error
message.
#include "sys_hdrs.h"
/* Non-system headers */
#include "get_nums.h" /* String to number conversions */
/* General errors */
#define READ_ERROR -4 /* Incomplete read of a file */
#define MEM_ERROR -5 /* Insufficient memory */
#endif /* COMMON_HDRS_H */
#ifndef COMMON_HDRS_H
#define COMMON_HDRS_H
--snip--
#endif /* COMMON_HDRS_H */
See the “Header Guards” box if these are unfamiliar to you. Every header
file should have a header guard to prevent multiple-definition errors.
HEADER GUARDS
Suppose that a file named func.c contains the directive #include "common.h".
When the macro preprocessor cpp sees this directive, it copies the named file
common.h into a copy of func.c at the point at which the #include directive was
found. Every included file is copied into this temporary copy of the file that cpp
is processing.
96 Chapter 3
Suppose that a second header file, mylist.h, which contains the prototypes for
functions in mylist.c, uses some functions declared in common.h (as well as other
functions), and it therefore includes common.h. Finally, suppose that the main
program, main.c, uses functions declared in both common.h and mylist.h. Then
main.c will contain these directives:
#include "common.h"
#include "mylist.h"
When you run the compiler to build the executable for main.c, cpp sees the
#include directive to copy common.h and will copy it into its temporary copy
of main.c. It then sees the #include "mylist.h" directive and copies the file
mylist.h after it. But this file also includes common.h, so any definitions in
common.h will now appear twice in the copy of main.c that cpp passes to the
compiler, which will cause the compiler to report definition errors.
A header guard, also called an include guard, is a conditional macro-based
construction designed to prevent this.
By enclosing a header file, say one named file.h, in a conditional macro of the
following form, we prevent the file from being included twice:
#ifndef FILE_H
#define FILE_H
--snip--
#endif
This is because the first line #ifndef FILE_H has the meaning, “If the macro symbol
FILE_H is not defined, continue reading and processing code until the matching
occurrence of #endif.” In this case, the line immediately after this conditional
test, #define FILE_H, causes cpp to store the definition of FILE_H.
On the other hand, if the symbol FILE_H is already defined when cpp processes
#ifndef FILE_H, then cpp skips reading code until immediately after the matching
#endif. This implies that any code enclosed in the header guard will be included
only once, and the multiple definitions cannot occur.
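The effect can be seen in a single file by pasting the contents of a guarded header twice, which is what cpp produces when a header is included along two paths. The header name point.h, the macro POINT_H, and the struct point type are hypothetical examples.

```c
/* First copy of the contents of a hypothetical header, point.h. */
#ifndef POINT_H
#define POINT_H
struct point { int x; int y; };
#endif /* POINT_H */

/* Second copy, as if point.h were included again through another header.
   POINT_H is already defined, so cpp skips everything up to #endif. */
#ifndef POINT_H
#define POINT_H
struct point { int x; int y; };
#endif /* POINT_H */

/* Uses the single definition of struct point that survived. */
int point_sum(struct point p)
{
    return p.x + p.y;
}
```

Removing the two pairs of guard lines leaves two definitions of struct point in the same file, which the compiler rejects as a redefinition.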
Notice that the common_hdrs.h file in Listing 3-1 includes a header named
get_nums.h as well as error_exits.h. In the following section, we discuss the first
of these, and in “Common Error-Handling Functions” on page 102, we dis-
cuss the second.
#ifndef GET_NUMS_H
#define GET_NUMS_H
#include "sys_hdrs.h"
/* (2) Return codes */
#define VALID_NUMBER          0  /* Successful processing */
#define FATAL_ERROR          -1  /* ERANGE or EINVAL returned by strtol() */
#define TRAILING_CHARS_FOUND -2  /* Characters found after number */
#define OUT_OF_RANGE         -3  /* int requested but out of int range */
#define NO_DIGITS_FOUND      -4  /* No digits in string */
#define NEG_NUM_FOUND        -5  /* Negative number found but not allowed */
/** get_long()
    On successful processing, it returns VALID_NUMBER and stores the resulting
    number in *value; otherwise, it returns one of the nonzero error codes
    and puts a suitable message into *msg. flags is used to decide whether
    trailing characters, negative values, and zeros for strings without any
    digits are allowed or should be errors.
  * @param  char* arg    [IN]  String to parse
  * @param  int   flags  [IN]  Flag specifying how to handle anomalies
  * @param  long* value  [OUT] Returned long int
  * @param  char* msg    [OUT] If not empty, error message
  * @return int   VALID_NUMBER or a negative error code indicating the
                  type of error
  */
int get_long(char *arg, int flags, long *value, char *msg);
--snip--
#endif
or error (2). If there are no errors or anomalies, it returns VALID_NUMBER, which
is defined as zero. Callers can easily ignore the specific error codes or take
different actions depending on which they are.
The prototype has four parameters. The first is the string to be parsed.
The second argument is an integer interpreted by the function as a set of
flags. The following list includes the four possible flags (1) and their meanings:
• NO_TRAILING: Returns a TRAILING_CHARS_FOUND value for any string containing
trailing nonnumeric characters, including those that have no digits
at all, while still returning the value of the digits it found
• NON_NEG_ONLY: Returns NEG_NUM_FOUND if the numeric value is negative
• POS_ONLY: Returns NEG_NUM_FOUND if the numeric value is not positive
• ONLY_DIGITS: Returns NO_DIGITS_FOUND if the string has no digits and sets
*value to zero
int flag = 0;
flag = flag | NO_TRAILING | ONLY_DIGITS;
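Combining flags with the bitwise OR operator works when each flag occupies its own bit, so that a function can test each one independently with the bitwise AND operator. The sketch below assumes one bit per flag; the numeric values shown are hypothetical, since the header’s actual definitions are not reproduced here, and flag_is_set() is a helper invented for this illustration.

```c
/* Hypothetical bit values, one bit per flag (the book's header defines
   the real ones). */
#define NO_TRAILING   (1 << 0)
#define NON_NEG_ONLY  (1 << 1)
#define POS_ONLY      (1 << 2)
#define ONLY_DIGITS   (1 << 3)

/* Return 1 if the given flag bit is set in flags, and 0 otherwise. */
int flag_is_set(int flags, int flag)
{
    return (flags & flag) != 0;
}
```

After the assignment flag = flag | NO_TRAILING | ONLY_DIGITS, a test such as flag_is_set(flag, NO_TRAILING) succeeds, whereas flag_is_set(flag, POS_ONLY) does not.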
The third parameter is a pointer to a location that can store the long int on
successful return.
The fourth argument is the location in which to store an error message
if things go wrong. Thus, if get_long() returns VALID_NUMBER, the number is in
*value. If it returns anything else, the error message that it constructs is in msg.
Because the definition of get_long() is lengthy, to conserve space, I omit
parts of it as well as comments in Listing 3-3 (the book’s source code reposi-
tory provides the complete listing).
int get_long(char *arg, int flags, long *value, char *msg)
{
    char *endptr;
    long val;
    errno = 0;
    val = strtol(arg, &endptr, 0);
    if ( errno == ERANGE ) {
        if ( msg != NULL )
            sprintf(msg, "%s\n", strerror(errno));
        return FATAL_ERROR;
are trailing characters. Similar logic applies to the remaining flags, but that
code is not shown.
The get_int() function, displayed in Listing 3-4, is much shorter, because
it just calls get_long() and checks whether the number is within range for an
integer using the system constants INT_MAX and INT_MIN.
int get_int(char *arg, int flags, int *value, char *msg)
{
    long val;
    int res = get_long(arg, flags, &val, msg);
    if ( VALID_NUMBER == res ) {
        if ( val > INT_MAX || val < INT_MIN ) {
            if ( msg != NULL )   /* Guard against a NULL msg, as get_long() does. */
                sprintf(msg, "%ld is out of range\n", val);
            return OUT_OF_RANGE;
        }
        else {
            *value = val;
            return VALID_NUMBER;
        }
    }
    else { /* get_long failed in one way or another. */
        return res;
    }
}
Observe that get_int() doesn’t have to check for any errors other than
the numbers being out of range. It just passes the other error codes from
get_long() to its caller. What may not be obvious is that the message that
get_long() constructs will also be passed to the caller if get_int() doesn’t
overwrite it for a number that is out of range.
I wrote a couple of programs to call these functions (available in the
source code), passing various flags to illustrate some of their error handling.
The following shows some of their runs:
/** error_message()
This prints an error message associated with errnum on standard error
if errnum > 0. If errnum <= 0, it prints the msg passed to it.
It does not terminate the calling program.
This is used when there is a way to recover from the error. */
void error_mssge(int errornum, const char *msg);
/** fatal_error()
This prints an error message associated with errnum on standard error
before terminating the calling program, if errnum > 0.
If errnum <= 0, it prints the msg passed to it.
fatal_error() should be called for a nonrecoverable error. */
void fatal_error(int errornum, const char *msg);
/** usage_error()
This prints a usage error message on standard error, advising the
user of the correct way to call the program. */
void usage_error(const char *msg);
#endif /* ERROR_EXITS_H */
Listing 3-5: The header file with declarations of common error-handling functions
system-defined EXIT_FAILURE number as its argument. The usage_error() func-
tion prints a usage message on standard error and terminates the program.
Listing 3-6 provides their implementations.
Although the fatal_error() and usage_error() functions look similar, it’s
convenient to have a separate function for displaying a message specifically when
the user ran the program incorrectly.
File Organization
The numeric parsing and error-handling functions just described are used
by almost all programs in this book. To facilitate using them, I place their
source code into a single top-level directory named common at the same level
as the include and lib directories. Each chapter has a directory at this same
level, containing the sources for all programs referenced in the chapter. The
lib directory contains a static library named libspl.a that contains all common
object files from the common directory. This directory structure is depicted
in Figure 3-1.
Figure 3-1: The structure of the demo program directories with common code (a directory tree diagram; the only label that survives is demos)
Time, Dates, and Locales 103
To create the libspl.a library, I use the GNU ar command. Appendix A
contains an explanation of this command and detailed instructions for how
to create static and shared libraries in general.
On my system, 64 lines of output were displayed, but I’m showing only the
first five lines. Several of these lines are descriptions of commands that have
nothing to do with dates or times. Why is this? Remember that keyword
searches in their simplest form display any short descriptions that contain
the keyword, even as a substring.
We need to request an exact match instead:
From this short list, we can see that there are two man pages for a command
named date, one in Section 1 and the other in Section 1posix.
Here’s part of the man page for date in Section 1posix:
PROLOG
This manual page is part of the POSIX Programmer's Manual. The Linux
implementation of this interface may differ (consult the corresponding
Linux manual page for details of Linux behavior), or the interface may
not be implemented on Linux.
NAME
date - write the date and time
SYNOPSIS
date [-u] [+format]
date [-u] mmddhhmm[[cc]yy]
--snip--
NAME
date - print or set the system date and time
SYNOPSIS
date [OPTION]... [+FORMAT]
date [-u|--utc|--universal] [MMDDhhmm[[CC]YY][.ss]]
--snip--
$ date
Wed Mar 26 02:54:17 PM EDT 2025
$ date -d 'next Thu' # Note that we can put space between -d and its argument
Thu Mar 27 12:00:00 AM EDT 2025
$ date -d'next month'
Sat Apr 26 02:55:05 PM EDT 2025
$ date -d"2038-01-19 03:14:07 UTC" # Time of end of Unix Epoch
Mon Jan 18 10:14:07 PM EST 2038
$ date -d'5 years ago'
Thu Mar 26 02:56:13 PM EDT 2020
The -d option lets us request that date print dates other than the current
one, which is a useful feature, and it even allows expressions such as five years
ago and next Thursday. This option is not detailed much in the man page.
Instead, its author wrote the following note there: “The date string format
is more complex than is easily documented here but is fully described in the
info documentation.”
More important for us right now is that it has options for controlling the
format of the output date/time string. We can change the format by sup-
plying an option of the form +"FORMAT", where FORMAT is a string that contains
ordinary character sequences as well as character sequences called format
specifications, each of which is introduced by a % character and followed by
a second character called a format specifier character. Each format specifica-
tion defines one or more pieces of date or time information formatted in a
particular way. For example, %m is replaced on output by a two-digit month
number, such as 04.
The ordinary character sequences in FORMAT (called literals) are output ex-
actly as they’re written in the string. For example, the format "The month is %m"
is output as "The month is 04" if the current month is April.
Some common format specifiers are %a, which is replaced by the
three-letter weekday name, such as Sun for Sunday; %b, which is replaced by the
three-letter month name, such as Dec for December; and %D, which is replaced
by a date in the form mm/dd/yy, such as 01/01/72. Appendix C contains a
comprehensive list of specifiers with examples.
Here are a few examples of output when the date is the end of the Unix
Epoch, January 19, 2038, at 03:14:07 UTC:
$ date +"%A, %D" # Full day name, literal comma, and American date
Monday, 01/18/38
$ date # Default format
Mon Jan 18 10:14:07 PM EST 2038
$ date +"%c" # Locale's date and time
Mon 18 Jan 2038 10:14:07 PM EST
$ date +"It is %A at %R." # Full day name, 24-hour time
It is Monday at 22:14.
Notice that the format string contains a mix of format specifiers and literals.
The literals are displayed uninterpreted, where they appear relative to the
format specifiers. The %c format specifier is called the locale’s date and time in
the documentation.
Several of the format specifiers refer to the user’s locale in their
descriptions. For example, %a and %A are the locale’s abbreviated and full
weekday names, respectively. In the United States, these are names such as Sun
and Sunday, but we have yet to see what they would be if we
could choose a different locale. We’ll see how to do this in “Working with
Locales” on page 128.
The man page for date doesn’t specify what its default format is. We can
see what it looks like, and we know that it’s the locale’s format, but we don’t
know why it’s in that form. However, the SEE ALSO section tells us that the full
documentation is available in two places: the Info documentation and on
the GNU website (https://www.gnu.org/software/coreutils/date). Both sources
state that “invoking date with no format argument is equivalent to invoking it
with a default format that depends on the LC_TIME locale category.” In the default
C locale, this format is +"%a %b %e %H:%M:%S %Z %Y", so the output
looks like Thu Mar 3 13:47:51 PST 2005. Clearly, we need to know more about
locales to understand this explanation, but for now we’ll focus on writing
some simple programs that behave like date, and we’ll explore locales later
in this chapter.
Our first goal is to write a much simpler version of the date command
without any of its command line options to reproduce its default behavior.
Once we do that, we’ll add the ability to customize the output format using
format specifiers, and after that we’ll see how to make a version of it that’s
sensitive to locale settings. We’ll name the first version of the command
spl_date1 and name the program’s source code file spl_date1.c.
Of these, the time(7) man page looks like the best starting point. It summarizes
what we need to understand about time in Unix and Linux.
Broken-Down Time
The time(7) man page also mentions a type of time representation called
broken-down time, which is a time representation that’s broken down into var-
ious commonly used components. As Robert Grudin put it in Time and the
Art of Living [11]:
Our units of temporal measurement, from seconds on up to months,
are so complicated, asymmetrical and disjunctive so as to make co-
herent mental reckoning in time all but impossible. . . . It is as
though architects had to measure length in feet, width in meters
and height in ells; as though basic instruction manuals demanded
a knowledge of five different languages.
A broken-down time structure consolidates all of this information into
a single data structure, called a struct tm, which is used by several functions
that convert time and date formats from one form to another. The man
page mentions some of them and suggests looking at the ctime() man page.
If we look at that page, we see that asctime(), ctime(), gmtime(), localtime(),
mktime(), strftime(), and strptime(), as well as various thread-safe versions of
these, are all time-conversion functions.
We’ll examine these functions in “Time Conversion Functions” on
page 112 to decide which we should use, but first we’ll examine the struct
tm data structure, which is defined in the time.h header file as follows:
struct tm {
    int tm_sec;    /* Seconds (0-60) */
    int tm_min;    /* Minutes (0-59) */
    int tm_hour;   /* Hours (0-23) */
    int tm_mday;   /* Day of the month (1-31) */
    int tm_mon;    /* Month (0-11) */
    int tm_year;   /* Year - 1900 */
    int tm_wday;   /* Day of the week (0-6, Sunday = 0) */
    int tm_yday;   /* Day in the year (0-365, 1 Jan = 0) */
    int tm_isdst;  /* Daylight saving time */
};
The fields have their expected meanings, but there are two details to note:
• The tm_sec field stores the number of seconds after the minute,
which is normally in the range 0 to 59, but it can be up to 60 to
allow for leap seconds.
• The tm_isdst field is a flag that indicates whether daylight saving
time is in effect at the time described. The value is positive if day-
light saving time is in effect, zero if it is not, and negative if the
information is not available.
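Two of the fields are easy to misread: tm_mon counts months from 0, and tm_year counts years from 1900. The small helpers below, whose names are hypothetical, convert them to the values people usually expect.

```c
#include <time.h>

/* Return the calendar year, such as 1970, from a broken-down time. */
int tm_calendar_year(const struct tm *bdt)
{
    return bdt->tm_year + 1900;
}

/* Return the month in the range 1 to 12 from a broken-down time. */
int tm_calendar_month(const struct tm *bdt)
{
    return bdt->tm_mon + 1;
}
```

For the start of the Epoch expressed in UTC, the structure holds tm_year equal to 70, tm_mon equal to 0, tm_mday equal to 1, and tm_wday equal to 4, because January 1, 1970, was a Thursday.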
Now that we know that functions exist to convert formats and that they
use this broken-down time structure, we can turn to the question of how to
get the current time.
#include <time.h>
int clock_gettime(clockid_t clockid, struct timespec *tp);
struct timespec {
time_t tv_sec; /* Seconds */
long tv_nsec; /* Nanoseconds */
};
• In the SEE ALSO section, it suggests a few pages that we should read:
gettimeofday() and time(), both in Section 2. We should also read the
man page for time.h.
The time() system call is much simpler to use and understand. Its man
page tells us the following:
• time() returns the number of seconds since the Epoch.
#include <time.h>
time_t time(time_t *tloc);
The argument is a pointer to an integer of type time_t, but it’s al-
lowed to be NULL because its return value is also the current time.
• When tloc is NULL the function cannot fail, obviating the need for
error handling.
Before we decide which of these two functions to use, we look at the
man page for the gettimeofday() function suggested in the SEE ALSO section.
Its synopsis is
#include <sys/time.h>
int gettimeofday(struct timeval *tv, struct timezone *tz);
struct timeval {
time_t tv_sec; /* Seconds */
suseconds_t tv_usec; /* Microseconds */
};
On return, this stores the number of seconds and microseconds since the
Epoch. The timezone struct pointer tz should always be set to NULL because it
has been deprecated.
WARNING Whenever you see a feature marked as deprecated in documentation, avoid using it.
Deprecation means that the organization that supports the software discourages the
feature’s use and may stop supporting it or remove it in a future release.
In fact, the CONFORMING TO section notes that the function itself has been marked
as obsolete since POSIX.1-2008, and it recommends using clock_gettime()
instead.
The choice is thus reduced to clock_gettime() and time(). The differ-
ence is in how the returned time is represented and its granularity. Since the
tv_sec field of the struct timespec returned by clock_gettime() is the number
of seconds since the Epoch, it should be the same as the value returned by
time(). For our program, we don’t need subsecond granularity, so there’s lit-
tle benefit to using clock_gettime(). On the other hand, it’s a more adaptable
function.
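The relationship between the two return values can be checked directly by calling both functions back to back. This is a sketch under assumptions: read_both() is a hypothetical function, and CLOCK_REALTIME is assumed to be the clock of interest because it measures time since the Epoch.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Read the current time with time() and then with clock_gettime().
   Returns 0 on success and -1 if either call fails. */
int read_both(time_t *from_time, struct timespec *from_clock)
{
    *from_time = time(NULL);
    if (*from_time == (time_t)-1)
        return -1;
    if (clock_gettime(CLOCK_REALTIME, from_clock) == -1)
        return -1;
    return 0;
}
```

On return, from_clock->tv_sec should normally equal *from_time or exceed it by one if the second rolled over between the calls, while tv_nsec supplies the subsecond part that time() cannot provide.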
Another factor to consider is performance. To check whether there is
a price to pay for obtaining the finer resolution of the timespec structure,
I wrote two programs that called each function 10 million times and mea-
sured their elapsed times when run on my x86-64 system running Linux
5.15.0. The program that called time() required 0.032 seconds, whereas the
one running clock_gettime() required 0.171 seconds. Repeated runs had sim-
ilar results. Taking everything into consideration, we’ll choose time() for get-
ting the current time in this first version of our program.
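A measurement of this kind can be sketched as follows. This is not the pair of programs described above, whose source is not shown; the function names are hypothetical, and the choice of CLOCK_MONOTONIC for measuring elapsed time is an assumption made here because that clock is not affected by changes to the system date.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Seconds elapsed between two timespec readings. */
static double seconds_between(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Call time() n times and return the elapsed time in seconds. */
double bench_time(long n)
{
    struct timespec start, end;
    volatile time_t sink = 0;   /* volatile keeps the stores from being removed */
    long i;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (i = 0; i < n; i++)
        sink = time(NULL);
    clock_gettime(CLOCK_MONOTONIC, &end);
    (void)sink;
    return seconds_between(start, end);
}

/* Call clock_gettime() n times and return the elapsed time in seconds. */
double bench_clock_gettime(long n)
{
    struct timespec start, end, now;
    long i;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (i = 0; i < n; i++)
        clock_gettime(CLOCK_REALTIME, &now);
    clock_gettime(CLOCK_MONOTONIC, &end);
    return seconds_between(start, end);
}
```

Calling each function with n set to 10000000 and printing the results gives figures comparable in kind to those reported above, though the actual numbers depend on the hardware, kernel, and C library.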
#include <time.h>
We observe that:
• asctime() is given a broken-down time struct and returns a string.
• gmtime() and localtime() are each given a time_t value and return a
pointer to a broken-down time struct.
• mktime() is given a broken-down time struct and returns a time_t
value.
• None of these functions require time resolution smaller than sec-
onds, reinforcing our decision to use time() instead of clock_gettime()
to get the current time.
This shows that we need to use either asctime() or ctime() to create a for-
matted time string, but if we use asctime() we need to convert from calendar
time to the broken-down time first. Reading the DESCRIPTION section reveals
that the difference between gmtime() and localtime() is that gmtime() con-
verts its time_t argument to broken-down time expressed in UTC, whereas
localtime() converts its time_t argument to broken-down time expressed rel-
ative to the user’s specified time zone. We’ll have more to say about time
zones in “About Time Zones” on page 131.
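The difference is easiest to see by converting one time_t value both ways. The sketch below uses the thread-safe variants gmtime_r() and localtime_r() so that each result has its own storage; both_views() is a hypothetical function introduced for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Fill *utc and *local with the two broken-down views of t.
   Returns 0 on success and -1 if either conversion fails. */
int both_views(time_t t, struct tm *utc, struct tm *local)
{
    if (gmtime_r(&t, utc) == NULL)        /* expressed in UTC */
        return -1;
    if (localtime_r(&t, local) == NULL)   /* expressed in the user's time zone */
        return -1;
    return 0;
}
```

If the TZ environment variable is set to EST5 (five hours behind UTC, with no daylight saving time) and tzset() has been called, then for t equal to 2147483647 the UTC view holds day 19 and hour 3, whereas the local view holds day 18 and hour 22.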
The CONFORMING TO section of that page, however, states that both asctime()
and ctime() are marked as obsolete and that strftime() should be used in
their place, ruling them out. Reading the man page for strftime(), we learn
that it’s a much more powerful function than either of them:
$ man strftime
--snip--
SYNOPSIS
#include <time.h>
size_t strftime(char *s, size_t max, const char *format,
const struct tm *tm);
DESCRIPTION
The strftime() function formats the broken-down time tm according to
the format specification format and places the result in the character
array s of size max.
--snip--
The rest of its description tells us that strftime() lets us customize the
output date and time string by using a format specification, which is its third
parameter. In fact, the set of format specifiers is almost identical to those
used by the date command, making our job fairly easy.
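To see that strftime() accepts the same kind of format string, the following sketch formats a fixed time value in UTC so that the result does not depend on the time zone of the machine. The format_utc() function is a hypothetical helper, and the program is assumed not to have called setlocale(), so the default C locale supplies English names.

```c
#include <time.h>

/* Format the UTC broken-down time for t according to fmt.
   Returns the number of bytes stored in buf, or 0 on failure. */
size_t format_utc(time_t t, const char *fmt, char *buf, size_t size)
{
    struct tm *bdt = gmtime(&t);

    if (bdt == NULL)
        return 0;
    return strftime(buf, size, fmt, bdt);
}
```

For t equal to 2147483647, the format "%A, %D" produces Tuesday, 01/19/38, and "It is %A at %R." produces It is Tuesday at 03:14. These differ from the earlier date examples only because those were displayed in Eastern time.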
As a start, to approximate the default format of date, we can use %c.
(Although the man page states that there’s a format specification %+ that
produces a string in the exact same format as the date command, it isn’t
supported in glibc version 2.) Thus, to obtain a string in roughly the same
format as date such as
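[Figure: time() returns the calendar time (time_t); localtime() converts it to a broken-down time (struct tm); strftime() combines the broken-down time with a format specification string (char[]) to produce a time string (char[]), which printf() prints.]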
We use time() to get the current time in calendar time units and pass
that return value to localtime(), which constructs a broken-down time object.
We pass that in turn to strftime() in addition to the format specification %c,
stored in a variable. Finally, we print out the string produced by strftime().
    strcpy(format_str, "%c");
    current_time = time(NULL);   /* Get the current time. */
    /* Create a string from the broken down time using the %c format. */
    if ( 0 == strftime(formatted_date, sizeof(formatted_date),
                       format_str, broken_down_time) ) {   /* (1) */
        fatal_error(EXIT_FAILURE, "Conversion to a date-time string"
                    " failed or produced an empty string\n");
    }
    printf("%s\n", formatted_date);
    return 0;
}
Rather than hardcoding the %c format specifier (1) directly into the call
to strftime(), we store it in a string variable named format_str of length MAXLEN
(defined in common_hdrs.h) that we pass to the function. This makes it easier
to change the program in the next version.
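The whole sequence of calls, from time() through strftime(), can be sketched as a single function. This is an illustration under assumptions rather than the book’s program: current_time_string() is a hypothetical name, and it reports failure through its return value instead of calling fatal_error().

```c
#include <time.h>

/* Store the current local time, formatted according to format_str, in out.
   Returns 0 on success and -1 if any step fails. */
int current_time_string(const char *format_str, char *out, size_t size)
{
    time_t current_time;
    struct tm *broken_down_time;

    current_time = time(NULL);                     /* calendar time */
    if (current_time == (time_t)-1)
        return -1;
    broken_down_time = localtime(&current_time);   /* broken-down time */
    if (broken_down_time == NULL)
        return -1;
    if (strftime(out, size, format_str, broken_down_time) == 0)
        return -1;                  /* result was empty or did not fit */
    return 0;
}
```

A caller would pass "%c" as format_str to approximate the default output of date and then print the resulting string.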
In this way, different users could see the time in their format of choice.
To accomplish this, we have to make only a small change to the program.
Specifically, we need to check whether the command has an argument, and
if so, whether it starts with a + and is small enough to fit into format_str. If so,
we can pass the string following the + to strftime(). If strftime() returns 0,
which happens when the formatted result doesn’t fit in the output buffer or is
empty, we report an error. Otherwise, we print the string that it produces.
If there’s no argument to the program, we just print the current time in the
default format.
We can incorporate this logic into a function named getformat(), which is
passed a pointer to the command line and extracts the format string from it:
The function expects its first parameter nargs to be passed argc; this way,
argvec[nargs-1] is the last word on the command line.
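The checks described above can be sketched as follows. This is not the book’s getformat(): the function name extract_format(), its extra size parameter, and its return codes are hypothetical, and it returns an error code where a complete program would more likely print a message and exit.

```c
#include <stddef.h>
#include <string.h>

/* Copy the format that follows the '+' in the last command line word into
   format, which has room for size bytes. Returns 0 on success, -1 if there
   is no argument or it does not start with '+', and -2 if it is too long. */
int extract_format(int nargs, char *argvec[], char *format, size_t size)
{
    const char *last;

    if (nargs < 2)
        return -1;
    last = argvec[nargs - 1];           /* last word on the command line */
    if (last[0] != '+')
        return -1;
    if (strlen(last + 1) >= size)       /* must leave room for the '\0' */
        return -2;
    strcpy(format, last + 1);           /* skip over the '+' */
    return 0;
}
```

Checking the length before calling strcpy() is what keeps an overlong argument from overflowing the destination buffer.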
We add this function to the program, which we’ll name spl_date2.c. We’ll
call it immediately before the call to time() (see Listing 3-8). No other changes
are needed. The complete program is in the source code distribution for
the book.
--snip--
int main(int argc, char *argv[])
{
    char formatted_date[MAXLEN];
    time_t current_time;
    struct tm *broken_down_time;
    char format_string[MAXLEN];
    if ( argc < 2 )
        strcpy(format_string, "%c");
    else
        getformat(argc, argv, format_string);
    current_time = time(NULL);   /* Get the current time. */
    --snip--
    if ( 0 == strftime(formatted_date, sizeof(formatted_date),
                       format_string, broken_down_time) ) {
        fatal_error(BAD_FORMAT_ERROR, "Conversion to a date-time string"
                    " failed or produced an empty string\n");
Listing 3-8: A partial listing of the second version of spl_date, allowing an optional
user-supplied format string argument
Following are a few runs of this program that show how it handles some
possible errors and produces the expected output:
$ ./spl_date2
Wed Mar 26 15:07:25 2025
$ ./spl_date2 today
spl_date2: format should be +"format-string"
$ ./spl_date2 +"Today is day %e of %B. It is now %r"
Today is day 26 of March. It is now 03:08:49 PM
$ ./spl_date2 +"a very long string, longer than 1024 characters..."
spl_date2: format string length is too long
The last run is given a format string whose length exceeds the size of the
buffer that the program uses to show how it handles this error. You can see
that it detects it and exits without crashing.
Time Adjustment Specifications
When we write amounts of time in noncomputer contexts, we understand
that the expressions “1 month, 8 days,” “one month and eight days,” and
“one month, eight days” are equivalent amounts of time. If we allowed users
to enter amounts of time with that degree of flexibility, we’d be making the
task of parsing the input much harder than if we limited the form of the
input to something simpler. It amounts to a trade-off between what’s easy for
the user and what’s easy for the programmer. Since we’re not trying to write
production software yet, we need a compromise that provides a convenient
interface for the user and a relatively easy syntax to parse.
My compromise is to allow the user to enter time differences in the
customary units we use, specifically, years, months, weeks, days, hours,
minutes, and seconds, but not to enter phrases such as “next Monday” or “last
month,” which would add more parsing to the program.
To make the parsing easier, we’ll require the user to enter numerals
rather than words for the amounts. For example, the program should
accept a phrase such as “2 years 3 weeks” but not “two years three weeks” or
“two years, three weeks.”
Also, we’ll give users the ability to enter times in the past by allowing
negative numbers for the time quantities, so we’ll accept a phrase such as
“–4 hours 5 minutes,” which could also be entered as “–3 hours –55 minutes.”
Note that a negative number applies only to the time unit next to it:
“–4 hours 5 minutes” is not “–(4 hours 5 minutes).”
To simplify the program, we’ll forbid fractional amounts, such as “3.5
days,” but we’ll allow users to enter the same unit multiple times. For
example, they could enter a time adjustment such as:
The way that I’ve formulated this, commas between the units are not allowed.
I’ll write a specification of the time adjustment using the following
grammar, which uses the same syntax as the man page synopsis:
Notice that the time units can have an optional s on the end, and numbers can
start with an optional + or -. Numbers shouldn’t start with a leading zero,
because a number with a leading zero will be interpreted as an octal number.
Here are some examples:
Fuzzy Time
The last consideration before we start to map out the program logic
concerns the fuzziness of months and years as units of time. The number of
days in a month depends on the month, and the number of days in a year
changes for leap years. If we subtract one month from July 31, what is the
date? Since there is no June 31, is it June 30?
If you read the Info page for the date command, you’ll see that its
implementation uses the rule that adding (or subtracting) a month increments (or
decrements) the month number, and if the date doesn’t exist in that month,
it’s adjusted to the nearest date that’s valid.
We can test how the real date adjusts these dates using its --date= option:
For consistency, our program should use this same date calculation logic,
but that raises the question: Is there a library function that does this
calculation, or do we have to implement it ourselves?
If we return to the man page for ctime(), we’ll see that it has relevant
information about the mktime() function:
The mktime() function modifies the fields of the tm structure as
follows: tm_wday and tm_yday are set to values determined from the
contents of the other fields; if structure members are outside their
valid interval, they will be normalized (so that, for example, 40
October is changed into 9 November); tm_isdst is set (regardless of
its initial value) to a positive value or to 0, respectively, to indicate
whether DST is or is not in effect at the specified time.
In short, mktime() encapsulates the corrections for invalid dates and times
used in the date command, saving us from having to implement this logic
ourselves. Therefore, we can add time adjustments to a broken-down time
structure bd_time and call mktime(bd_time) to have mktime() normalize the time
for us.
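To see this normalization in isolation, here is a small sketch. The function name normalize_date() is hypothetical, and I set the hour to noon and tm_isdst to -1 so that daylight saving transitions don't affect the result. Given 40 October it produces 9 November, as the man page promises, and given 31 June it produces 1 July.

```c
#include <time.h>

/* Fills *out with the normalized form of the given calendar date.
   Returns 0 on success or -1 if mktime() can't represent the time. */
int normalize_date(int year, int month, int day, struct tm *out)
{
    struct tm t = {0};

    t.tm_year  = year - 1900;   /* tm_year counts years since 1900 */
    t.tm_mon   = month - 1;     /* tm_mon is zero-based */
    t.tm_mday  = day;           /* may be out of range, such as 40 */
    t.tm_hour  = 12;            /* noon, to stay clear of DST transitions */
    t.tm_isdst = -1;            /* let mktime() determine whether DST applies */
    if ( mktime(&t) == (time_t) -1 )
        return -1;
    *out = t;                   /* fields are now normalized */
    return 0;
}
```

After a successful call, tm_wday and tm_yday in the result are also filled in, which is why the program never has to compute the day of the week itself.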
Program Logic
How this version of spl_date differs from the preceding one will guide the
changes in the program logic. The first step is to list the changes:
• We have to add option parsing.
• We have to parse the time adjustment, if it’s present, into the
numbers of seconds, minutes, hours, and so on, that need to be added
to (or subtracted from) the current time.
• We need to add the time adjustment to the current time and display
the resulting time.
We can incorporate these differences into the program’s control flow,
ignoring error handling for the moment, in the following sequence of steps:
1. Parse the command line, checking whether the -d or -h option is
present.
2. If -h is present, ignore all other arguments and options, print out
help information, and exit.
3. Otherwise, if -d is present, allocate memory to store its argument
and copy the argument into that memory.
4. If there is a format specification, copy it into a string of
sufficient size.
5. Obtain and store the current time into a time_t variable using time().
6. Convert the current time into a broken-down time representation
using localtime().
7. If -d is present, parse the argument, creating a temporary
broken-down time structure that stores the time to add to the current time
in terms of years, months, days, and so on, and add the value of the
temporary structure to the broken-down current time.
8. Use the strftime() function to format the output string
representation of the broken-down time.
9. Print the formatted string using printf().
Figure 3-3 shows the control flow with the new logic in bold.
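[Figure 3-3: Control flow. The calendar time (time_t) is converted by localtime() to a broken-down time (struct tm). If the -d option is present, parse_time_adjustment() and update_time() produce the adjusted broken-down time; if not, the broken-down time is used as is. strftime() combines it with the format specification string (char[]) to produce the time string (char[]), which printf() prints. A separate branch prints help and exits.]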
We can now prototype and design the function that parses the
time-adjustment string. Since the function should receive a time-adjustment string
as its input and create a broken-down time representation of that string as its
output, a reasonable prototype for it is the following:
datetm->tm_year = 2
datetm->tm_mon = 4
datetm->tm_mday = 10
datetm->tm_hour = -6
--snip--
/* All other members set to zero */
In principle, we can design this code from scratch and parse the string
without any need to call a library function, reading each character from left
to right, processing them as needed. For example, we could skip whitespace,
build numbers when we see a plus or minus sign or a digit, and build
time-unit strings when the characters are alphabetic. Processing this way makes
one pass over the string and is the fastest possible approach. However, we
need to process only the command line, not thousands of large strings,
implying that the amount of time we’ll save with this approach is
imperceptible. It would be far better to take advantage of existing library functions that
have been well tested, even if we end up making two passes across the string.
There’s usually a trade-off between code that’s easy to read and code that
performs well. In designing a system program, we should certainly aim for good
performance, but we also want to write code that’s easy to understand and
maintain. What principles can guide the algorithms we choose?
• Code that isn’t executed much doesn’t need to be fast because even
if it’s a few orders of magnitude slower than it could be, it won’t add
any noticeable amount to the total running time. In contrast, code
that’s executed frequently should be fast.
• It’s safer to use code that has been already written and tested
thoroughly than to write new code to solve the exact same problem.
• Code that will be in service a long time should be easier to maintain
than code that you know will be obsolete sooner.
I usually ask myself these questions when I design algorithms and need to
decide how to make the trade-offs.
What functions can we use? Again, the first step is to consult the man
pages. If we try using apropos -s3 string or apropos -s3 -e string to see which
man pages in Section 3 are related to strings, we’ll get a very long list that we
can search by hand. Or, we could see if there’s a man page named string. If
we do that, we’ll discover a new resource:
$ man string
STRING(3) Linux Programmer's Manual STRING(3)
NAME
stpcpy, strcasecmp, strcat, strchr, strcmp, strcoll, strcpy, strcspn,
strdup, strfry, strlen, strncat, strncmp, strncpy, strncasecmp,
strpbrk, strrchr, strsep, strspn, strstr, strtok, strxfrm, index,
rindex - string operations
SYNOPSIS
#include <strings.h>
We could also read the string.h man page, but this one is better because
the string.h man page is a POSIX page saying what should be present in
a POSIX-compliant system, whereas this one is what actually is on our
system. On GNU/Linux, all of the functions listed in the string.h man page are
available, possibly with different behavior than POSIX.1-2024 requires. No
matter which you choose, it will be informative and will provide guidance
and clues for picking the right tool for the job.
The list of functions in the string.h man page includes one named strtok()
with this prototype:
#include <string.h>
char *strtok(char *s, const char *delim);
The description states that it extracts tokens from the string s that are
delimited by one of the bytes in delim. Tokens are pieces of a string to be parsed.
The strtok() library function is a great tool for breaking up a line into
tokens separated by any types of delimiters. For example, if you’re given a
comma-separated values (CSV) file and need to extract its fields, you could
use this function passing a comma as a delimiter.
The delim string is the set of characters that act as delimiters. If the string
is :,;, then each of those characters will be treated as a character separating
two tokens. For our purpose, we set delim = " \t" because the tokens in the
time-adjustment string are separated by whitespace characters, including tab
characters.
The first time we call strtok(), we pass the string to be parsed in the first
argument. Its return value will be a pointer to the first token it finds. All
returned tokens are terminated with a NULL byte (\0) so that string-processing
functions can be used safely with them.
In subsequent calls, we pass the NULL pointer in the first parameter. If
there are no more tokens, it returns NULL, so the standard way to use it is
essentially as follows:
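The strtok() function doesn’t make a copy of the string that you pass to
it; it modifies that string in place. As it finds each token, it replaces the
delimiter at the end of it with a terminating NULL byte (\0). For this reason,
the string must be writable, and you should copy it first if you need the
original later.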
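The calling pattern described here can be sketched as a small helper. The function split_tokens() and its parameters are hypothetical names of my own; the sketch only shows the first call with the string, the later calls with NULL, and the loop that ends when strtok() returns NULL.

```c
#include <string.h>

/* Splits str in place on spaces and tabs, storing up to max token
   pointers in tokens[]. Returns the number of tokens stored.
   str is modified: strtok() writes '\0' over the delimiters. */
int split_tokens(char *str, char *tokens[], int max)
{
    const char *delim = " \t";
    int count = 0;
    char *token = strtok(str, delim);      /* first call: pass the string */

    while ( token != NULL && count < max ) {
        tokens[count++] = token;
        token = strtok(NULL, delim);       /* later calls: pass NULL */
    }
    return count;
}
```

Because the tokens point into the original buffer, the caller must pass a writable array rather than a string literal.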
Since our program expects the time-adjustment string to be a sequence
of pairs of the form <number> <whitespace> <time-unit>, each iteration of the
loop should call strtok() twice: the first time to get a number and the second
to get a time unit. We’ll declare the following variables:
The function tries to get the first token before entering the loop. If
successful, it enters the loop and calls get_int() to extract the number from the
returned token, exits for any possible errors from a failed call to get_int(), and
calls strtok() to get the associated time unit. It exits if the time unit is
missing; otherwise, it adds the amount of time to the datetm structure before
calling strtok() again. Listing 3-9 contains the complete function
implementation, with some comments omitted to save space.
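The conversion step can be illustrated with a short sketch. This is not the book's get_int(); token_to_int() is a hypothetical stand-in that I've written with strtol() and base 0, an assumption chosen because it matches the earlier remark that a leading zero makes the number octal.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper, not the book's get_int(): converts token to an int.
   Returns 0 on success, -1 if token isn't entirely a valid number
   or is out of range for an int. */
int token_to_int(const char *token, int *value)
{
    char *end;
    long n;

    errno = 0;
    n = strtol(token, &end, 0);   /* base 0: leading 0 is octal, 0x is hex */
    if ( end == token || *end != '\0' )
        return -1;                /* no digits, or trailing characters */
    if ( errno == ERANGE || n < INT_MIN || n > INT_MAX )
        return -1;                /* doesn't fit in an int */
    *value = (int) n;
    return 0;
}
```

With this approach, "010" converts to 8, and "3.5" is rejected because the characters after the 3 aren't part of an integer.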
To add the time adjustment to the datetm structure, I use the strstr()
function, also described in that string man page. This is essentially a substring
searching function. Its man page shows the prototype:
#include <string.h>
char *strstr(const char *haystack, const char *needle);
As the parameter names suggest, it searches for the first occurrence of
substring needle in string haystack, returning a pointer to that occurrence or NULL
if it’s not there. As it’s presented here, this function will parse a string such
as “4 megadays” as “4 days.” It can be modified so that it is successful only
if the time units are exact words such as “day” or “days.” I leave this as an
exercise.
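To make the approach concrete, here is a minimal sketch of a parser in this style. It is not Listing 3-9: the names add_unit() and parse_adjustment(), the return codes, and the choice to store weeks as seven days in tm_mday are all my assumptions, and range checking of the number is left out for brevity. It does reproduce the substring behavior described above, so “4 megadays” is accepted as 4 days.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical sketch: adds amount to the member of *datetm named by unit.
   Uses strstr(), so any word containing a unit name matches.
   Returns 0 on success, -1 if no unit name is found. */
int add_unit(struct tm *datetm, int amount, const char *unit)
{
    if ( strstr(unit, "year") != NULL )
        datetm->tm_year += amount;
    else if ( strstr(unit, "month") != NULL )
        datetm->tm_mon += amount;
    else if ( strstr(unit, "week") != NULL )
        datetm->tm_mday += 7 * amount;      /* assumption: a week is 7 days */
    else if ( strstr(unit, "day") != NULL )
        datetm->tm_mday += amount;
    else if ( strstr(unit, "hour") != NULL )
        datetm->tm_hour += amount;
    else if ( strstr(unit, "minute") != NULL )
        datetm->tm_min += amount;
    else if ( strstr(unit, "second") != NULL )
        datetm->tm_sec += amount;
    else
        return -1;
    return 0;
}

/* Parses pairs "<number> <unit>" from str (modified in place by strtok())
   into *datetm, which is zeroed first. Returns 0 on success, -1 on error. */
int parse_adjustment(char *str, struct tm *datetm)
{
    const char *delim = " \t";
    char *token, *end;
    long n;

    memset(datetm, 0, sizeof *datetm);
    token = strtok(str, delim);             /* first number, if any */
    while ( token != NULL ) {
        n = strtol(token, &end, 0);
        if ( end == token || *end != '\0' )
            return -1;                      /* not a number */
        token = strtok(NULL, delim);        /* the time unit */
        if ( token == NULL || add_unit(datetm, (int) n, token) < 0 )
            return -1;                      /* missing or unknown unit */
        token = strtok(NULL, delim);        /* next number, if any */
    }
    return 0;
}
```

Passing the string "2 years 4 months 10 days -6 hours" leaves tm_year, tm_mon, tm_mday, and tm_hour set to 2, 4, 10, and -6, with all other members zero.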
NOTE You might wonder why I use a sequence of cascading if statements in
parse_time_adjustment() instead of a switch statement. In C, the switch statement requires an
integer type, but I need to compare strings, which are not an integer type. There are
more efficient ways to do this, but since this code is executed only relatively few times,
and since it’s clear and simple, it’s suitable.
The last function we’ll use is one that adds the values from one
broken-down time structure into another, which I name adjust_time(). It’s displayed
in Listing 3-10.
errno = 0;
mktime(datetm);
if ( errno != 0 )
fatal_error(errno, NULL);
return 0;
}
Listing 3-10: A function that adds time amounts to a broken-down time structure and
normalizes the fields
The only point to emphasize about adjust_time() is that it’s possible for
mktime() to fail, and because of this, the function checks for an error after
the call and terminates the program if something went wrong.
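The general shape of such a function can be sketched as follows. This is a hypothetical variant, not Listing 3-10: the name add_and_normalize() is mine, it returns an error code instead of terminating the program, and it detects failure through the return value of mktime() rather than errno. Setting tm_isdst to -1 is also my assumption, so that mktime() decides whether daylight saving time applies to the adjusted date.

```c
#include <time.h>

/* Hypothetical sketch: adds each field of *adjustment to *datetm, then
   calls mktime() to normalize the result. Returns 0 on success, -1 if
   mktime() reports that the time can't be represented. */
int add_and_normalize(struct tm *datetm, const struct tm *adjustment)
{
    datetm->tm_year += adjustment->tm_year;
    datetm->tm_mon  += adjustment->tm_mon;
    datetm->tm_mday += adjustment->tm_mday;
    datetm->tm_hour += adjustment->tm_hour;
    datetm->tm_min  += adjustment->tm_min;
    datetm->tm_sec  += adjustment->tm_sec;
    datetm->tm_isdst = -1;      /* let mktime() work out DST for the new date */

    if ( mktime(datetm) == (time_t) -1 )
        return -1;
    return 0;
}
```

One caveat of checking the return value is that (time_t) -1 is also a legitimate time, one second before the Epoch, which is rarely a concern for a date utility.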
Listing 3-11 shows fragments of the spl_date3.c program with the
preceding functions omitted to save space.
    current_time = time(NULL);
    bdtime = localtime(&current_time);
    if ( bdtime == NULL )
        fatal_error(EOVERFLOW, "localtime");
    if ( d_option ) {
        parse_time_adjustment(d_arg, &time_adjustment);
        update_time(bdtime, &time_adjustment);
      · free(d_arg); /* Allocated in option handling above */
    }
    if ( 0 == strftime(formatted_date, sizeof(formatted_date),
                       format_string, bdtime) )
        fatal_error(BAD_FORMAT_ERROR, "Conversion to a date-time string "
                    "failed or produced an empty string\n");
    printf("%s\n", formatted_date);
    return 0;
}
#include <stdlib.h>
void *malloc(size_t size);
Because it returns a void* result, we can assign that address to any C pointer,
such as a utlist* or a char*. Although unlikely, it can fail because there’s no
memory left to allocate and will return NULL and set errno to ENOMEM in this case.
The listing also introduces free() ·, which is used to free the memory
space pointed to by ptr, which must have been returned by a previous call to
malloc(), calloc(), or realloc(). Its synopsis is:
#include <stdlib.h>
void free(void *ptr);
If the memory ptr pointed to has been freed already, the consequences are
unpredictable. We’ll discuss allocating and deallocating memory more in
Chapter 10. Note that the absence of a break after each call to usage_error() in
the switch statement is justified because the function terminates the program.
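As a small illustration of how malloc() and free() pair up when copying an option argument, consider this sketch. The function name copy_arg() is hypothetical, and the program itself may organize the copy differently.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of arg, or NULL if malloc() fails.
   The caller must pass the result to free() exactly once. */
char *copy_arg(const char *arg)
{
    char *copy = malloc(strlen(arg) + 1);   /* +1 for the terminating '\0' */

    if ( copy == NULL )
        return NULL;                        /* allocation failed */
    strcpy(copy, arg);
    return copy;
}
```

A common habit is to set the pointer to NULL right after calling free() on it, since free(NULL) is defined to do nothing and this guards against an accidental second free.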
A few runs demonstrate the program’s behavior:
$ ./spl_date3 -h
usage: spl_date3 [-d "<time adjustment>"] [+"format specification"]
$ ./spl_date3 +"%a %b %d, %Y, at %R"
Wed Mar 26, 2025, at 15:24
$ ./spl_date3 -d "1 year" +"%a %b %d, %Y, at %R"
Thu Mar 26, 2026, at 15:30
$ ./spl_date3 -d "1 week 2 hours" +"%a %b %d, %Y, at %R"
Wed Apr 02, 2025, at 17:33
$ ./spl_date3 -d "-2 months +4 months" +"%a %b %d, %Y, at %R"
Mon May 26, 2025, at 15:38
$ ./spl_date3 -d '+120 minutes -2 hours' +"%a %b %d, %Y, at %R"
Wed Mar 26, 2025, at 15:39
$ man 7 locale
LOCALE(7) Linux Programmer's Manual LOCALE(7)
NAME
locale - description of multilanguage support
SYNOPSIS
#include <locale.h>
DESCRIPTION
A locale is a set of language and cultural rules. These cover aspects
such as language for messages, different character sets, lexicographic
conventions, and so on. A program needs to be able to determine its
locale and act accordingly to be portable to different cultures.
The header <locale.h> declares data types, functions and macros which
are useful in this task.
This page refers us to the locale.h header file for details about the data types,
functions, and macros. It also mentions two functions, setlocale() and
localeconv(), that we may need in our modified program. The rest of the
man page describes important fundamental concepts, summarized next.
Locale Categories
A locale consists of a collection of categories. Categories are parts of the
locale that control related aspects of a user’s cultural and language settings.
For example, the LC_CTYPE category consists of data that specifies character
classification, case conversion, and other character attributes, such as which
characters are letters, which are digits, which are punctuation, and so on.
The names that identify categories all begin with LC_ (short for “locale
category”). These names are integer-valued macros declared in locale.h for
use by programs. The names can also be placed into the environment, in
which case they’re also environment variables. Thus, LC_CTYPE is the macro
name of a category and can also be the name of an environment variable.
POSIX.1-2024 defines six categories, all of which should be in the
environment of most Unix systems that you might use. Some systems do not
add them to the environment by default. The GNU C library, starting with
version glibc 2.2, extends the set with six more categories.
Table 3-1 contains all of the categories present in the latest GNU/Linux
distribution as of this writing, with an indication of whether it is part of
POSIX or a GNU extension and a brief synopsis of what it controls.
The variables in Table 3-2 are not locale categories, but with the
exception of TZ, they’re used for managing locale information. For example,
LC_ALL acts like a global locale setting, overriding the values of all locale
variables; setting it to a specific locale assigns that locale to all of the categories,
whether or not they were set to a specific locale.
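The precedence among these variables can be sketched in a few lines of C. The function effective_locale() is a hypothetical helper of mine; it assumes the usual order in which LC_ALL is consulted first, then the category’s own variable, then LANG, and it assumes "C" as the fallback when none is set, which is what the GNU C Library does.

```c
#include <stdlib.h>

/* Returns the locale name that applies to the category whose environment
   variable is named by category_var (for example, "LC_TIME"), following
   the precedence LC_ALL, then the category variable, then LANG. */
const char *effective_locale(const char *category_var)
{
    const char *names[3];
    size_t i;

    names[0] = "LC_ALL";
    names[1] = category_var;
    names[2] = "LANG";
    for ( i = 0; i < 3; i++ ) {
        const char *value = getenv(names[i]);
        if ( value != NULL && value[0] != '\0' )
            return value;                   /* first nonempty setting wins */
    }
    return "C";                             /* assumed default when unset */
}
```

In practice, a program doesn’t need to do this lookup itself, because setlocale() performs it when passed an empty string as the locale name.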
The Section 7 man page also describes how to pass locale data to the
setlocale() function and shows the declaration of the lconv struct that
localeconv() returns. Although we’ll eventually need to learn about these
two functions, we’ll visit them later in this chapter in “The Programming
Interface to Locales” on page 138.
Before we explore how to manage locales at the user level, we need to
understand a bit about time zones.
TZ=":Europe/Paris"
TZ=":Europe/Dublin"
TZ=":Greenwich"
The POSIX page is a specification of what the command should do. The
other page describes the command implemented on the system you’re using.
Both are useful, but let’s see what the first page tells us:
$ man 1 locale
LOCALE(1) Linux User Manual LOCALE(1)
NAME
locale - get locale-specific information
SYNOPSIS
locale [option]
locale [option] -a
locale [option] -m
locale [option] name...
DESCRIPTION
The locale command displays information about the current locale, or
all locales, on standard output.
Let’s run it without arguments to see what it outputs (you’ll likely see
something different):
$ locale
LANG=en_US.UTF-8
LANGUAGE=en_US
LC_CTYPE=en_US.UTF-8
LC_NUMERIC=en_US.UTF-8
LC_TIME=en_US.UTF-8
LC_COLLATE=C.UTF-8
LC_MONETARY=en_US.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=en_US.UTF-8
LC_NAME=en_US.UTF-8
LC_ADDRESS=en_US.UTF-8
LC_TELEPHONE=en_US.UTF-8
LC_MEASUREMENT=en_US.UTF-8
LC_IDENTIFICATION=en_US.UTF-8
LC_ALL=
On my system the locale is set to be en_US.UTF-8 for all but one category,
LC_COLLATE, which is set to C.UTF-8. The LANG is en_US.UTF-8 as well. The LANGUAGE
variable has the same value, but the name is the short form of it. The LC_ALL
variable is assigned an empty string because if it were assigned a nonempty
string, it would override the values for all other categories, which would
prevent me, for example, from changing one category to a different value from
the others.
Locale names are typically of the form
language[_territory][.codeset][@modifier]
where language is an ISO 639 language code, territory is an ISO 3166
country code, codeset is a character set or encoding identifier such as ISO-8859-1
or UTF-8, and modifier is any string used to further refine the name.
In the locale name en_US.UTF-8, en is the English language, US is the United
States, and UTF-8 is the codeset. It has no modifier.
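To make the structure of a locale name concrete, here is a sketch that splits a name into its parts. The struct locale_parts type and the function names are hypothetical, and the sketch assumes the separators seen in names like en_US.UTF-8: an underscore before the territory, a period before the codeset, and an at sign before the modifier.

```c
#include <string.h>

/* Hypothetical container for the pieces of a locale name. */
struct locale_parts {
    char language[16];
    char territory[16];
    char codeset[32];
    char modifier[32];
};

/* Copies at most size - 1 bytes of src into dst and terminates it. */
static void copy_part(char *dst, size_t size, const char *src, size_t len)
{
    if ( len >= size )
        len = size - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';
}

/* Splits name into language, territory, codeset, and modifier.
   Parts that are absent are left as empty strings. */
void split_locale_name(const char *name, struct locale_parts *p)
{
    size_t len;

    memset(p, 0, sizeof *p);
    len = strcspn(name, "_.@");             /* language ends at a separator */
    copy_part(p->language, sizeof p->language, name, len);
    name += len;
    if ( *name == '_' ) {
        name++;
        len = strcspn(name, ".@");
        copy_part(p->territory, sizeof p->territory, name, len);
        name += len;
    }
    if ( *name == '.' ) {
        name++;
        len = strcspn(name, "@");
        copy_part(p->codeset, sizeof p->codeset, name, len);
        name += len;
    }
    if ( *name == '@' )
        copy_part(p->modifier, sizeof p->modifier, name + 1, strlen(name + 1));
}
```

For the name en_US.UTF-8, this yields the language en, the territory US, the codeset UTF-8, and an empty modifier.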
A codeset is a mapping from graphical characters to numeric values. The
numeric values are called code points, and codesets are also sometimes called
character maps or character sets. For example, ASCII is an early codeset that
maps the set of characters commonly found on old keyboards, as well as
certain other nonprinting characters, to 7-bit unsigned integers. It does not
map characters with diacritical marks or non-Latin characters.
The UTF-8 codeset is a variable-length codeset that is capable of
representing all Unicode code points in anywhere from 1 to 4 bytes per point.
Unicode is a numeric representation of the alphabets of almost all known
ancient and modern languages, including Japanese, Chinese, Greek, Cyrillic,
Canadian Aboriginal, and Arabic. Appendix B contains a brief history and
description of Unicode with detailed examples.
The locale command with the -a option outputs a list of the available
locales on your system. This is a fragment of the output on my system, for
example:
$ locale -a
C
C.utf8
POSIX
en_AG.utf8
en_AU.utf8
--snip--
fr_BE.utf8
fr_FR.utf8
pl_PL.utf8
--snip--
$ locale -av
--snip--
--snip--
You can change your locale to any of the ones this command lists by
assigning the environment variables their full names. For example, if I change
the LC_ALL variable to pl_PL.utf8, all of those functions and commands that
are sensitive to the locale will use the Polish settings for my locale.
The locales locale -a lists are a small subset of those that you can
generate. In some versions of Linux, the file /etc/locale.gen contains a list of locales
that you can generate by uncommenting them and rerunning the locale-gen
command, provided that you have superuser privilege. After you do that,
the locale’s name will be in the list that locale -a displays.
The /etc/locale.gen file typically contains several hundred locale names,
mostly commented out. Linux maintains a list of all supported locales in the
/usr/share/i18n/SUPPORTED file. The exact path might vary depending on
the particular Linux distribution that you’re using.
The directory name i18n in this path is the abbreviation that people use
for “internationalization” (that word has 18 letters between its first letter, i,
and its last letter, n). That file usually has about as many entries as /etc/locale.gen.
In bash, you can precede a command with one or more variable assignments.
If these variables are environment variables, the change in their value will be in
effect only for the execution of that individual command, because a temporary
environment is created with those changes and passed to a subshell in which the
command is run.
To demonstrate, I’ll run date +"%c" first and then set the time zone variable TZ
to Spain’s time zone and override all other category settings using the
territorial locale for Spain, es_ES.utf-8. Then I’ll run date +"%c" again, so you
can see the difference:
$ date +"%c"
Mon 06 Mar 2023 01:11:45 PM EST
$ TZ=Spain LC_ALL=es_ES.utf-8 date +"%c"
lun 06 mar 2023 18:11:47
The day lun is short for Lunes, the Spanish word for Monday, and mar is short
for marzo, the word for March.
LC_NUMERIC
decimal_point "<period>"
thousands_sep "<comma>"
grouping "3;0"
--snip--
END LC_NUMERIC
Every file begins with the name of the category and ends with END category
name. This category has three keywords: decimal_point, thousands_sep, and
grouping. The first two values are self-explanatory. The value for grouping
indicates that groups of three digits are separated by commas for all groups to
the left of the decimal point. The first digit (3) is the size of the first group
to the left of the decimal point. The second, 0, means that all groups to the
left have 3 as well.
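To see what this grouping rule means in practice, here is a sketch that applies a fixed group size and separator to a string of digits. The function group_digits() is a hypothetical helper written for illustration; a real program would usually obtain the separator and grouping from the locale, for example through localeconv(), rather than passing them in.

```c
#include <string.h>

/* Inserts sep between groups of group_size digits, counting from the
   right. digits must hold only the integer part of a number, and
   group_size must be positive. out needs room for at least
   2 * strlen(digits) + 1 bytes. */
void group_digits(const char *digits, char sep, int group_size, char *out)
{
    size_t len = strlen(digits);
    size_t i;

    for ( i = 0; i < len; i++ ) {
        /* A separator goes before digit i when the number of digits
           remaining is a multiple of the group size. */
        if ( i > 0 && (len - i) % (size_t) group_size == 0 )
            *out++ = sep;
        *out++ = digits[i];
    }
    *out = '\0';
}
```

With a comma and a group size of 3, the digits 1234567 come out as 1,234,567.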
The LC_CTYPE category has much more extensive data. So that you can
see how their definitions can vary, Listing 3-12 provides a fragment of a
typical definition file for the en_US.utf8 locale.
escape_char /
LC_CTYPE
upper <A>;<B>;<C>;<D>;<E>;<F>;<G>;<H>;<I>;<J>;<K>;<L>;<M>;/
<N>;<O>;<P>;<Q>;<R>;<S>;<T>;<U>;<V>;<W>;<X>;<Y>;<Z>
lower <a>;<b>;<c>;<d>;<e>;<f>;<g>;<h>;<i>;<j>;<k>;<l>;<m>;/
<n>;<o>;<p>;<q>;<r>;<s>;<t>;<u>;<v>;<w>;<x>;<y>;<z>
space <tab>;<newline>;<vertical-tab>;<form-feed>;/
<carriage-return>;<space>
cntrl <alert>;<backspace>;<tab>;<newline>;<vertical-tab>;/
<form-feed>;<carriage-return>;<NUL>;<SOH>;<STX>;/
<ETX>;<SEL>;<RNL>;<DEL>;<GE>;<SPS>;<RPT>;<SI>;<SO>;<DLE>;<DC1>;/
<DC2>;<DC3>;<RES>;<POC>;<CAN>;<EM>;<UBS>;<CU1>;<IFS>;/
<IGS>;<IRS>;<ITB>;<DS>;<SOS>;<fs>;<WUS>;<BYP>;<LF>;/
<ETB>;<ESC>;<SA>;<SM>;<CSP>;<MFA>;<ENQ>;<ACK>;/
<SYN>;<IR>;<PP>;<TRN>;<NBS>;<EOT>;<SBS>;<IT>;<RFF>;/
<CU3>;<DC4>;<NAK>;<SUB>
punct <exclamation-mark>;<quotation-mark>;<number-sign>;<dollar-sign>;/
<percent-sign>;<ampersand>;<apostrophe>;<left-parenthesis>;/
<right-parenthesis>;<asterisk>;<plus-sign>;<comma>;/
<hyphen-minus>;<period>;<slash>;<colon>;<semicolon>;/
<less-than-sign>;<equals-sign>;<greater-than-sign>;/
<question-mark>;<commercial-at>;<left-square-bracket>;/
<backslash>;<right-square-bracket>;<circumflex>;/
<underscore>;<grave-accent>;<left-curly-bracket>;/
<vertical-line>;<right-curly-bracket>;<tilde>
digit <zero>;<one>;<two>;<three>;<four>;/
<five>;<six>;<seven>;<eight>;<nine>
--snip--
tolower (<A>,<a>);(<B>,<b>);(<C>,<c>);(<D>,<d>);(<E>,<e>);/
(<F>,<f>);(<G>,<g>);(<H>,<h>);(<I>,<i>);(<J>,<j>);/
(<K>,<k>);(<L>,<l>);(<M>,<m>);(<N>,<n>);(<O>,<o>);/
(<P>,<p>);(<Q>,<q>);(<R>,<r>);(<S>,<s>);(<T>,<t>);/
(<U>,<u>);(<V>,<v>);(<W>,<w>);(<X>,<x>);(<Y>,<y>);(<Z>,<z>)
--snip--
END LC_CTYPE
Listing 3-12: A locale definition file for the English language in the United States
Notice the syntax that’s used for defining the keyword values in this
category. The tolower keyword provides the data that functions would need to
convert uppercase to lowercase, so it’s a semicolon-separated sequence of
pairs that essentially defines a function that maps characters to characters.
In contrast, the digit keyword’s value is just a list of the names of the
decimal digits that we use in the United States.
If you want to know what the keywords and values are for a locale
category, you could read the documentation, but fortunately, the locale -k
command will list them. Give it the name of the category, and it outputs a list:
$ locale -k LC_TIME
abday="Sun;Mon;Tue;Wed;Thu;Fri;Sat"
day="Sunday;Monday;Tuesday;Wednesday;Thursday;Friday;Saturday"
abmon="Jan;Feb;Mar;Apr;May;Jun;Jul;Aug;Sep;Oct;Nov;Dec"
mon="January;February;March;April;May;June;July;August;September;October;
November;December"
am_pm="AM;PM"
d_t_fmt="%a %d %b %Y %r %Z"
d_fmt="%m/%d/%Y"
t_fmt="%r"
t_fmt_ampm="%I:%M:%S %p"
--snip--
You can also give it a keyword. To see the format used by date, enter the
following:
The -c option prints the locale category, in this case LC_TIME, on a separate line.
With the -k keyword option, locale prints the supplied keyword and its value,
in this case date_fmt and its value, %a %b %e %r %Z %Y. Consulting Table C-1,
you can see that date with no format outputs exactly the same fields as the
format string "%a %b %e %r %Z %Y".
$ man setlocale
SETLOCALE(3) Linux Programmer's Manual SETLOCALE(3)
NAME
setlocale - set the current locale
SYNOPSIS
#include <locale.h>
char *setlocale(int category, const char *locale);
DESCRIPTION
The setlocale() function is used to set or query the program's current
locale.
--snip--
The program doesn’t save the return value, but it checks whether it’s NULL,
which is returned if the locale couldn’t be set. If we want to save the name of
the locale for later use, we copy it into a local string variable.
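The check-and-save pattern just described can be sketched like this. The function set_and_save_locale() and its parameters are hypothetical; the point is only that the string setlocale() returns should be copied before any later call to setlocale(), which may overwrite it.

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Sets category to the locale called name ("" means take it from the
   environment) and copies the resulting locale name into saved.
   Returns 0 on success, -1 if the locale couldn't be set. */
int set_and_save_locale(int category, const char *name,
                        char *saved, size_t size)
{
    const char *result = setlocale(category, name);

    if ( result == NULL )
        return -1;
    /* Copy now: a later setlocale() call may overwrite the returned string. */
    snprintf(saved, size, "%s", result);
    return 0;
}
```

A program like spl_date would typically call this with LC_TIME and an empty string, so that the user’s environment determines how dates are formatted.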
Because the program is nearly identical to spl_date3.c, Listing 3-13
contains only the part of it containing the updated code. The complete
program is in the source code distribution for the book.
Listing 3-13: The internationalized spl_date program, with most code omitted
Let’s see how this program behaves. We’ll run it under several different
locales, leaving the time zone unchanged, and with both the default format
and a custom format:
$ LC_TIME=da_DK.utf8 ./spl_date4
ons 26 mar 2025 15:50:40 EDT
$ LC_TIME=da_DK.utf8 ./spl_date4 "+%A, %d %B %Y"
onsdag, 26 marts 2025
$ LC_TIME=de_DE.utf8 ./spl_date4 "+%A, %d %B %Y"
Mittwoch, 26 März 2025
$ LC_TIME=es_ES.utf8 ./spl_date4 "+%A, %d %B %Y"
miércoles, 26 marzo 2025
$ LC_TIME=fi_FI.utf8 ./spl_date4
ke 26. maaliskuuta 2025 15.56.07
$ LC_TIME=fi_FI.utf8 ./spl_date4 "+%A, %d %B %Y"
keskiviikko, 26 maaliskuu 2025
$ LC_TIME=fr_FR.utf8 ./spl_date4
mer. 26 mars 2025 15:57:15
$ LC_TIME=ja_JP.utf8 ./spl_date4
2025年03月26日 15時57分40秒
$ LC_TIME=ja_JP.utf8 ./spl_date4 "+%A, %d %B %Y"
水曜日, 26 3月 2025
This final version of spl_date is able to display dates and times following the
conventions of a wide range of geographic locales. In the end, enabling this
feature required only a small modification to the previous program, but
understanding why and how it works was the real goal. Now, we’ll turn our
attention to other aspects of internationalization.
The underlying philosophy of the GNU C Library is that the
programmer should be freed as much as possible from the burden of handling
internationalization. If a program sets its locale using setlocale() or one of a few
other similar functions I haven’t mentioned yet, before calling any library
functions, all of the functions that are designed to use locale data will
modify their behavior according to the locale’s rules.
This reduces our problem to knowing which functions use locale
information and which locale categories they use. Unfortunately, the
documentation doesn’t contain a complete list of precisely those library functions that
use locale information, so I’ll provide some guidance that overcomes this
deficiency. Following is a list of functions that do use locale data:
fprintf() islower() iswcntrl() iswupper() strcoll() toupper()
fscanf() isprint() iswctype() iswxdigit() strerror() towlower()
isalnum() ispunct() iswdigit() isxdigit() strfmon() towupper()
isalpha() isspace() iswgraph() mblen() strftime() wcscoll()
isblank() isupper() iswlower() mbstowcs() strsignal() wcstod()
iscntrl() iswalnum() iswprint() mbtowc() strtod() wcstombs()
isdigit() iswalpha() iswpunct() perror() strxfrm() wcsxfrm()
Most of these use data from either the LC_CTYPE or LC_COLLATE category,
but some also use LC_NUMERIC, LC_TIME, or LC_MONETARY. Their man pages specify
which of these categories the function uses, either by naming which
locale-specific environment variables it uses or by stating that the function uses the
locale in a specific way. You can search for the keyword locale or the pattern
LC_ in the page using the pager’s search operator / followed by the keyword,
as in /LC_ to jump to the part of the page that references these terms.
If this list isn’t accessible and you can’t remember which functions use
the locale, refer to the SEE ALSO section of the Info page for setlocale() or visit
the POSIX.1-2024 website page for it at
https://pubs.opengroup.org/onlinepubs/9699919799/functions/setlocale.html,
where many of the functions are listed.
The strcoll() function is worth singling out. Here’s its prototype:
#include <string.h>
int strcoll(const char *s1, const char *s2);
It compares two strings, s1 and s2, and returns a negative integer if s1 < s2,
zero if s1 == s2, and a positive integer if s1 > s2.
Most people use strcmp() for comparing two strings in C. Its prototype
is the same, but strcmp() doesn’t use locale data in its comparisons, which
means that sorting algorithms based on strcmp() won’t sort according to the
true ordering of characters in the user’s locale.
In contrast, strcoll() does use the locale’s LC_COLLATE data. The following
program demonstrates its use:
if ( argc < 3 ) {
sprintf(usage_msg, "%s string string ...\n", basename(argv[0]));
usage_error(usage_msg);
}
if ( NULL == setlocale(LC_COLLATE, "") )
fatal_error(LOCALE_ERROR,
"setlocale() could not set the given locale");
smallest = argv[i];
for ( j = i + 1; j < argc; j++ )
if ( strcoll(smallest, argv[j]) > 0 )
smallest = argv[j];
printf("%s\n", smallest);
return 0;
}
If we compile and run this program, setting a different temporary locale for
each run, we see how it behaves:
The C locale uses the ASCII ordering of characters, with all uppercase
preceding all lowercase. In contrast, the en_US.utf8 locale sorting order is
case-insensitive. If, in strcoll_demo.c, we replaced strcoll() with strcmp() and ran
this program, in both locales the output would be Zebra, showing that strcmp()
doesn’t use locale data.
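The substitution described here is easy to try if the comparison function is passed in as a parameter. The following sketch uses a hypothetical helper, smallest_string(), that works with either strcmp or strcoll because the two have the same prototype. The caller is assumed to have already called setlocale() for LC_COLLATE.

```c
#include <locale.h>   /* for setlocale() in the caller */
#include <string.h>

/* Returns the smallest of the n strings in strs (n must be at least 1)
   according to cmp, which may be strcmp or strcoll. */
const char *smallest_string(char *strs[], int n,
                            int (*cmp)(const char *, const char *))
{
    const char *smallest = strs[0];
    int i;

    for ( i = 1; i < n; i++ )
        if ( cmp(smallest, strs[i]) > 0 )
            smallest = strs[i];
    return smallest;
}
```

In the C locale, both comparison functions pick Zebra from a list such as apple, Zebra, mango, because uppercase letters precede lowercase ones in ASCII. Only strcoll() changes its answer when LC_COLLATE is set to a locale with different collation rules.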
Sometimes no library function can handle the problem you’re trying to
solve in a locale-sensitive way. In that case, you need to access locale data
directly. The library has ways to do this. When we first searched for functions
to internationalize our spl_date program by entering apropos locale, the
output listed a few library functions that we overlooked. We’ll search again but
limit the search to Section 3: