Linux Programming by Example


PRENTICE HALL

Prentice Hall Open Source Software Development Series

Linux Programming by Example

ARNOLD ROBBINS
Prentice Hall
Open Source Software Development Series
Arnold Robbins, Series Editor

"Real world code from real world applications"

Open Source technology has revolutionized the computing world. Many large-scale projects are
in production use worldwide, such as Apache, MySQL, and Postgres, with programmers writing
applications in a variety of languages including Perl, Python, and PHP. These technologies are in
use on many different systems, ranging from proprietary systems, to Linux systems, to traditional
UNIX systems, to mainframes.

The Prentice Hall Open Source Software Development Series is designed to bring you the
best of these Open Source technologies. Not only will you learn how to use them for your
projects, but you will learn from them. By seeing real code from real applications, you will learn
the best practices of Open Source developers the world over.

Titles currently in the series include:

Linux® Debugging and Performance Tuning: Tips and Techniques
Steve Best
0131492470, Paper, 10/14/2005
The book is not only a high-level strategy guide but also a book that combines strategy with
hands-on debugging sessions and performance tuning tools and techniques.

Linux Programming by Example: The Fundamentals
Arnold Robbins
0131429647, Paper, 4/12/2004
Gradually, one step at a time, Robbins teaches both high level principles and "under the hood"
techniques. This book will help the reader master the fundamentals needed to build serious
Linux software.

The Linux® Kernel Primer: A Top-Down Approach for x86 and PowerPC Architectures
Claudia Salzberg, Gordon Fischer, Steven Smolski
0131181637, Paper, 9/21/2005
A comprehensive view of the Linux Kernel is presented in a top down approach: the big picture
first, with a clear view of all components, how they interrelate, and where the hardware/software
separation exists. The coverage of both the x86 and the PowerPC is unique to this book.

To my wife Miriam,
and my children,
Chana, Rivka, Nachum, and Malka.

Linux Programming
by Example

Arnold Robbins

PRENTICE HALL
Professional Technical Reference
Upper Saddle River, NJ 07458
www.phptr.com
© 2004 Pearson Education, Inc.
Publishing as Prentice Hall Professional Technical Reference
Upper Saddle River, New Jersey 07458
Prentice Hall PTR offers discounts on this book when ordered in quantity for bulk purchases or special sales. For
more information, please contact: U.S. Corporate and Government Sales, 1-800-382-3419,
corpsales@pearsontechgroup.com. For sales outside of the United States, please contact: International Sales,
1-317-581-3793, international@pearsontechgroup.com.

Portions of Chapter 1, Copyright © 1994 Arnold David Robbins, first appeared in an article in Issue 16 of Linux
Journal, reprinted by permission.
Portions of the documentation for Valgrind, Copyright © 2003 Julian Seward, reprinted by permission.
Portions of the documentation for the DBUG library, by Fred N. Fish, reprinted by permission.
The GNU programs in this book are Copyright © 1985-2003, Free Software Foundation, Inc. The full list of files
and copyright dates is provided in the Preface. Each program is "free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by the Free Software Foundation; either version
2 of the License, or (at your option) any later version." Appendix C of this book provides the text of the GNU
General Public License.
All V7 Unix code and documentation are Copyright © Caldera International Inc. 2001-2002. All rights reserved.
They are reprinted here under the terms of the Caldera Ancient UNIX License, which is reproduced in full in
Appendix B.
Cover image courtesy of Parks Sabers, Inc. The Arc-Wave(tm) saber is manufactured by Parks Sabers, Inc., Copyright
© 2001, www.parksabers.com. Parks Sabers is not associated with any Lucasfilm Ltd. property, film, or franchise.
The programs and applications presented in this book have been included for their instructional value. They have
been tested with care but are not guaranteed for any particular purpose. The publisher does not offer any warranties
or representations, nor does it accept any liabilities with respect to the programs or applications. UNIX is a registered
trademark of The Open Group in the United States and other countries.
Microsoft, MS, and MS-DOS are registered trademarks, and Windows is a trademark of Microsoft Corporation in
the United States and other countries. Linux is a registered trademark of Linus Torvalds.
All company and product names mentioned herein are the trademarks or registered trademarks of their respective
owners.
This material may be distributed only subject to the terms and conditions set forth in the Open Publication License,
v1.0 or later (the latest version is presently available at http://www.opencontent.org/openpub/), with License
Option B.
Printed in the United States of America
ISBN 0-13-142964-7
Text printed on recycled paper
First printing
Pearson Education LTD.
Pearson Education Australia PTY, Limited
Pearson Education Singapore, Pte. Ltd.
Pearson Education North Asia Ltd.
Pearson Education Canada, Ltd.
Pearson Educación de Mexico, S.A. de C.V.
Pearson Education-Japan
Pearson Education Malaysia, Pte. Ltd.
Contents

Preface ..... xvii

PART I Files and Users ..... 1


Chapter 1 Introduction ..... 3
1.1 The Linux/Unix File Model ..... 4
1.1.1 Files and Permissions ..... 4
1.1.2 Directories and Filenames ..... 6
1.1.3 Executable Files ..... 7
1.1.4 Devices ..... 9
1.2 The Linux/Unix Process Model ..... 10
1.2.1 Pipes: Hooking Processes Together ..... 12
1.3 Standard C vs. Original C ..... 12
1.4 Why GNU Programs Are Better ..... 14
1.4.1 Program Design ..... 15
1.4.2 Program Behavior ..... 16
1.4.3 C Code Programming ..... 16
1.4.4 Things That Make a GNU Program Better ..... 17
1.4.5 Parting Thoughts about the "GNU Coding Standards" ..... 19
1.5 Portability Revisited ..... 19
1.6 Suggested Reading ..... 20
1.7 Summary ..... 21
Exercises ..... 22
Chapter 2 Arguments, Options, and the Environment ..... 23
2.1 Option and Argument Conventions ..... 24
2.1.1 POSIX Conventions ..... 25
2.1.2 GNU Long Options ..... 27
2.2 Basic Command-Line Processing ..... 28
2.2.1 The V7 echo Program ..... 29
2.3 Option Parsing: getopt() and getopt_long() ..... 30
2.3.1 Single-Letter Options ..... 30
2.3.2 GNU getopt() and Option Ordering ..... 33
2.3.3 Long Options ..... 34
2.3.3.1 Long Options Table ..... 34
2.3.3.2 Long Options, POSIX Style ..... 37
2.3.3.3 getopt_long() Return Value Summary ..... 37
2.3.3.4 GNU getopt() or getopt_long() in User Programs ..... 39
2.4 The Environment ..... 40
2.4.1 Environment Management Functions ..... 41
2.4.2 The Entire Environment: environ ..... 43
2.4.3 GNU env ..... 44
2.5 Summary ..... 49
Exercises ..... 50
Chapter 3 User-Level Memory Management ..... 51
3.1 Linux/Unix Address Space ..... 52
3.2 Memory Allocation ..... 56
3.2.1 Library Calls: malloc(), calloc(), realloc(), free() ..... 56
3.2.1.1 Examining C Language Details ..... 57
3.2.1.2 Initially Allocating Memory: malloc() ..... 58
3.2.1.3 Releasing Memory: free() ..... 60
3.2.1.4 Changing Size: realloc() ..... 62
3.2.1.5 Allocating and Zero-filling: calloc() ..... 65
3.2.1.6 Summarizing from the GNU Coding Standards ..... 66
3.2.1.7 Using Private Allocators ..... 67
3.2.1.8 Example: Reading Arbitrarily Long Lines ..... 67
3.2.1.9 GLIBC Only: Reading Entire Lines: getline() and getdelim() ..... 73
3.2.2 String Copying: strdup() ..... 74
3.2.3 System Calls: brk() and sbrk() ..... 75
3.2.4 Lazy Programmer Calls: alloca() ..... 76
3.2.5 Address Space Examination ..... 78
3.3 Summary ..... 80
Exercises ..... 81
Chapter 4 Files and File I/O ..... 83
4.1 Introducing the Linux/Unix I/O Model ..... 84
4.2 Presenting a Basic Program Structure ..... 84
4.3 Determining What Went Wrong ..... 86
4.3.1 Values for errno ..... 87
4.3.2 Error Message Style ..... 90
4.4 Doing Input and Output ..... 91
4.4.1 Understanding File Descriptors ..... 92
4.4.2 Opening and Closing Files ..... 93
4.4.2.1 Mapping FILE * Variables to File Descriptors ..... 95
4.4.2.2 Closing All Open Files ..... 96
4.4.3 Reading and Writing ..... 96
4.4.4 Example: Unix cat ..... 99
4.5 Random Access: Moving Around within a File ..... 102
4.6 Creating Files ..... 106
4.6.1 Specifying Initial File Permissions ..... 106
4.6.2 Creating Files with creat() ..... 109
4.6.3 Revisiting open() ..... 110
4.7 Forcing Data to Disk ..... 113
4.8 Setting File Length ..... 114
4.9 Summary ..... 115
Exercises ..... 115
Chapter 5 Directories and File Metadata ..... 117
5.1 Considering Directory Contents ..... 118
5.1.1 Definitions ..... 118
5.1.2 Directory Contents ..... 120
5.1.3 Hard Links ..... 122
5.1.3.1 The GNU link Program ..... 123
5.1.3.2 Dot and Dot-Dot ..... 125
5.1.4 File Renaming ..... 125
5.1.5 File Removal ..... 126
5.1.5.1 Removing Open Files ..... 127
5.1.5.2 Using ISO C: remove() ..... 127
5.1.6 Symbolic Links ..... 128
5.2 Creating and Removing Directories ..... 130
5.3 Reading Directories ..... 132
5.3.1 Basic Directory Reading ..... 133
5.3.1.1 Portability Considerations ..... 136
5.3.1.2 Linux and BSD Directory Entries ..... 137
5.3.2 BSD Directory Positioning Functions ..... 138
5.4 Obtaining Information about Files ..... 139
5.4.1 Linux File Types ..... 139
5.4.2 Retrieving File Information ..... 141
5.4.3 Linux Only: Specifying Higher-Precision File Times ..... 143
5.4.4 Determining File Type ..... 144
5.4.4.1 Device Information ..... 147
5.4.4.2 The V7 cat Revisited ..... 150
5.4.5 Working with Symbolic Links ..... 151
5.5 Changing Ownership, Permission, and Modification Times ..... 155
5.5.1 Changing File Ownership: chown(), fchown(), and lchown() ..... 155
5.5.2 Changing Permissions: chmod() and fchmod() ..... 156
5.5.3 Changing Timestamps: utime() ..... 157
5.5.3.1 Faking utime(file, NULL) ..... 159
5.5.4 Using fchown() and fchmod() for Security ..... 161
5.6 Summary ..... 162
Exercises ..... 163
Chapter 6 General Library Interfaces - Part 1 ..... 165
6.1 Times and Dates ..... 166
6.1.1 Retrieving the Current Time: time() and difftime() ..... 167
6.1.2 Breaking Down Times: gmtime() and localtime() ..... 168
6.1.3 Formatting Dates and Times ..... 170
6.1.3.1 Simple Time Formatting: asctime() and ctime() ..... 170
6.1.3.2 Complex Time Formatting: strftime() ..... 171
6.1.4 Converting a Broken-Down Time to a time_t ..... 176
6.1.5 Getting Time-Zone Information ..... 178
6.1.5.1 BSD Systems Gotcha: timezone(), Not timezone ..... 179
6.2 Sorting and Searching Functions ..... 181
6.2.1 Sorting: qsort() ..... 181
6.2.1.1 Example: Sorting Employees ..... 183
6.2.1.2 Example: Sorting Directory Contents ..... 188
6.2.2 Binary Searching: bsearch() ..... 191
6.3 User and Group Names ..... 195
6.3.1 User Database ..... 196
6.3.2 Group Database ..... 199
6.4 Terminals: isatty() ..... 202
6.5 Suggested Reading ..... 203
6.6 Summary ..... 203
Exercises ..... 205
Chapter 7 Putting It All Together: ls ..... 207
7.1 V7 ls Options ..... 208
7.2 V7 ls Code ..... 209
7.3 Summary ..... 225
Exercises ..... 226
Chapter 8 Filesystems and Directory Walks ..... 227
8.1 Mounting and Unmounting Filesystems ..... 228
8.1.1 Reviewing the Background ..... 228
8.1.2 Looking at Different Filesystem Types ..... 232
8.1.3 Mounting Filesystems: mount ..... 236
8.1.4 Unmounting Filesystems: umount ..... 237
8.2 Files for Filesystem Administration ..... 238
8.2.1 Using Mount Options ..... 239
8.2.2 Working with Mounted Filesystems: getmntent() ..... 241
8.3 Retrieving Per-Filesystem Information ..... 244
8.3.1 POSIX Style: statvfs() and fstatvfs() ..... 244
8.3.2 Linux Style: statfs() and fstatfs() ..... 252
8.4 Moving Around in the File Hierarchy ..... 256
8.4.1 Changing Directory: chdir() and fchdir() ..... 256
8.4.2 Getting the Current Directory: getcwd() ..... 258
8.4.3 Walking a Hierarchy: nftw() ..... 260
8.4.3.1 The nftw() Interface ..... 261
8.4.3.2 The nftw() Callback Function ..... 263
8.5 Walking a File Tree: GNU du ..... 269
8.6 Changing the Root Directory: chroot() ..... 276
8.7 Summary ..... 277
Exercises ..... 278

PART II Processes, IPC, and Internationalization ..... 281


Chapter 9 Process Management and Pipes ..... 283
9.1 Process Creation and Management ..... 284
9.1.1 Creating a Process: fork() ..... 284
9.1.1.1 After the fork(): Shared and Distinct Attributes ..... 285
9.1.1.2 File Descriptor Sharing ..... 286
9.1.1.3 File Descriptor Sharing and close() ..... 288
9.1.2 Identifying a Process: getpid() and getppid() ..... 289
9.1.3 Setting Process Priority: nice() ..... 291
9.1.3.1 POSIX vs. Reality ..... 293
9.1.4 Starting New Programs: The exec() Family ..... 293
9.1.4.1 The execve() System Call ..... 294
9.1.4.2 Wrapper Functions: execl() et al. ..... 295
9.1.4.3 Program Names and argv[0] ..... 297
9.1.4.4 Attributes Inherited across exec() ..... 298
9.1.5 Terminating a Process ..... 300
9.1.5.1 Defining Process Exit Status ..... 300
9.1.5.2 Returning from main() ..... 301
9.1.5.3 Exiting Functions ..... 302
9.1.6 Recovering a Child's Exit Status ..... 305
9.1.6.1 Using POSIX Functions: wait() and waitpid() ..... 306
9.1.6.2 Using BSD Functions: wait3() and wait4() ..... 310
9.2 Process Groups ..... 312
9.2.1 Job Control Overview ..... 312
9.2.2 Process Group Identification: getpgrp() and getpgid() ..... 314
9.2.3 Process Group Setting: setpgid() and setpgrp() ..... 314
9.3 Basic Interprocess Communication: Pipes and FIFOs ..... 315
9.3.1 Pipes ..... 315
9.3.1.1 Creating Pipes ..... 316
9.3.1.2 Pipe Buffering ..... 318
9.3.2 FIFOs ..... 319
9.4 File Descriptor Management ..... 320
9.4.1 Duplicating Open Files: dup() and dup2() ..... 321
9.4.2 Creating Nonlinear Pipelines: /dev/fd/XX ..... 326
9.4.3 Managing File Attributes: fcntl() ..... 328
9.4.3.1 The Close-on-exec Flag ..... 329
9.4.3.2 File Descriptor Duplication ..... 331
9.4.3.3 Manipulation of File Status Flags and Access Modes ..... 332
9.4.3.4 Nonblocking I/O for Pipes and FIFOs ..... 333
9.4.3.5 fcntl() Summary ..... 336
9.5 Example: Two-Way Pipes in gawk ..... 337
9.6 Suggested Reading ..... 341
9.7 Summary ..... 342
Exercises ..... 344
Chapter 10 Signals ...... .... ...... .. .. ........................... .. .......... ........ ..... .... .. .. .... .. . 347

10.1 Introduction .......................... ... . ..................................... ................ ...... ............ 348


10.2 Signal Actions.......... ......... ... ..... ............ .... . ................ .. ..... .. .... .. ........................ 348
10.3 Standard C Signals: signal () and raise () ................... .... .... .. .. ............ .. .. 349
10.3.1 The signal () Function .............................. .................................. .. ...... 349
10.3. 2 Sending Signals Programmatically: raise () ....... ...... ............... ...... ........ 353
10.4 Signal H andlers in Action .......................................... .. ..... .. ...... .... .................. .. 353
10.4.1 Traditional Systems .................................................................................. 353
10.4.2 BSD and GNU/Linux ........ .......................... .. .. .. ........................ .. ........ .... 356
10.4. 3 Ignoring Signals ....................................................................................... 356
10.4.4 Restartable System Calls ........................................................................... 357
10.4.4.1 Example: GNU Coreutils safe_read () and safe_ wri te () ...... 359
10.4.4.2 GLIBC Only: TEMP_FA I LURE_RETRY () ....................................... 360
10.4.5 Race Conditions and sig_a tomic_t (ISO C) ........ .. ...................... .. .... 361
10.4.6 Additional Caveats ............ .. ...... .. ............. ..................... ........................... 363
10.4.7 Our Story So Far, Episode I .................. .................. .. ............................... 363
10.5 The System V Release 3 Signal APIs: sigset () et al. .................................... 365
10.6 POSIX Signals ......... ......................................................................................... 367
10.6.1 Uncovering the Problem...... ...... ...... .. ........ ........ .. ...... ........................ .... .. 367
10.6.2 Signal Sets: sigset_t and Related Functions .......... .. .. ...... .. .................. 368
10.6.3 Managing the Signal Mask: sigprocmask() et al. ..... 369
10.6.4 Catching Signals: sigaction() ..... 370
10.6.5 Retrieving Pending Signals: sigpending() ..... 375
10.6.6 Making Functions Interruptible: siginterrupt() ..... 376
10.6.7 Sending Signals: kill() and killpg() ..... 376
10.6.8 Our Story So Far, Episode II ..... 378
10.7 Signals for Interprocess Communication ..... 379
10.8 Important Special-Purpose Signals ..... 382
10.8.1 Alarm Clocks: sleep(), alarm(), and SIGALRM ..... 382
10.8.1.1 Harder but with More Control: alarm() and SIGALRM ..... 382
10.8.1.2 Simple and Easy: sleep() ..... 383
10.8.2 Job Control Signals ..... 383
10.8.3 Parental Supervision: Three Different Strategies ..... 385
10.8.3.1 Poor Parenting: Ignoring Children Completely ..... 385
10.8.3.2 Permissive Parenting: Supervising Minimally ..... 386
10.8.3.3 Strict Parental Control ..... 393
10.9 Signals Across fork() and exec() ..... 398
10.10 Summary ..... 399
Exercises ..... 401

Chapter 11 Permissions and User and Group ID Numbers ..... 403
11.1 Checking Permissions ..... 404
11.1.1 Real and Effective IDs ..... 405
11.1.2 Setuid and Setgid Bits ..... 406
11.2 Retrieving User and Group IDs ..... 407
11.3 Checking As the Real User: access() ..... 410
11.4 Checking as the Effective User: euidaccess() (GLIBC) ..... 412
11.5 Setting Extra Permission Bits for Directories ..... 412
11.5.1 Default Group for New Files and Directories ..... 412
11.5.2 Directories and the Sticky Bit ..... 414
11.6 Setting Real and Effective IDs ..... 415
11.6.1 Changing the Group Set ..... 416
11.6.2 Changing the Real and Effective IDs ..... 416
11.6.3 Using the Setuid and Setgid Bits ..... 419
11.7 Working with All Three IDs: getresuid() and setresuid() (Linux) ..... 421
11.8 Crossing a Security Minefield: Setuid root ..... 422
11.9 Suggested Reading ..... 423
11.10 Summary ..... 424
Exercises ..... 426

Chapter 12 General Library Interfaces - Part 2 ..... 427
12.1 Assertion Statements: assert() ..... 428
12.2 Low-Level Memory: The memXXX() Functions ..... 432
12.2.1 Setting Memory: memset() ..... 432
12.2.2 Copying Memory: memcpy(), memmove(), and memccpy() ..... 433
12.2.3 Comparing Memory Blocks: memcmp() ..... 434
12.2.4 Searching for a Byte Value: memchr() ..... 435
12.3 Temporary Files ..... 436
12.3.1 Generating Temporary Filenames (Bad) ..... 437
12.3.2 Creating and Opening Temporary Files (Good) ..... 441
12.3.3 Using the TMPDIR Environment Variable ..... 443
12.4 Committing Suicide: abort() ..... 445
12.5 Nonlocal Gotos ..... 446
12.5.1 Using Standard Functions: setjmp() and longjmp() ..... 447
12.5.2 Handling Signal Masks: sigsetjmp() and siglongjmp() ..... 449
12.5.3 Observing Important Caveats ..... 450
12.6 Pseudorandom Numbers ..... 454
12.6.1 Standard C: rand() and srand() ..... 455
12.6.2 POSIX Functions: random() and srandom() ..... 457
12.6.3 The /dev/random and /dev/urandom Special Files ..... 460
12.7 Metacharacter Expansions ..... 461
12.7.1 Simple Pattern Matching: fnmatch() ..... 462
12.7.2 Filename Expansion: glob() and globfree() ..... 464
12.7.3 Shell Word Expansion: wordexp() and wordfree() ..... 470
12.8 Regular Expressions ..... 471
12.9 Suggested Reading ..... 480
12.10 Summary ..... 481
Exercises ..... 482

Chapter 13 Internationalization and Localization ..... 485
13.1 Introduction ..... 486
13.2 Locales and the C Library ..... 487
13.2.1 Locale Categories and Environment Variables ..... 487
13.2.2 Setting the Locale: setlocale() ..... 489
13.2.3 String Collation: strcoll() and strxfrm() ..... 490
13.2.4 Low-Level Numeric and Monetary Formatting: localeconv() ..... 494
13.2.5 High-Level Numeric and Monetary Formatting: strfmon() and printf() ..... 498
13.2.6 Example: Formatting Numeric Values in gawk ..... 501
13.2.7 Formatting Date and Time Values: ctime() and strftime() ..... 503
13.2.8 Other Locale Information: nl_langinfo() ..... 504
13.3 Dynamic Translation of Program Messages ..... 507
13.3.1 Setting the Text Domain: textdomain() ..... 507
13.3.2 Translating Messages: gettext() ..... 508
13.3.3 Working with Plurals: ngettext() ..... 509
13.3.4 Making gettext() Easy to Use ..... 510
13.3.4.1 Portable Programs: "gettext.h" ..... 511
13.3.4.2 GLIBC Only: <libintl.h> ..... 513
13.3.5 Rearranging Word Order with printf() ..... 514
13.3.6 Testing Translations in a Private Directory ..... 515
13.3.7 Preparing Internationalized Programs ..... 516
13.3.8 Creating Translations ..... 517
13.4 Can You Spell That for Me, Please? ..... 521
13.4.1 Wide Characters ..... 523
13.4.2 Multibyte Character Encodings ..... 523
13.4.3 Languages ..... 524
13.4.4 Conclusion ..... 525
13.5 Suggested Reading ..... 526
13.6 Summary ..... 526
Exercises ..... 527

Chapter 14 Extended Interfaces ..... 529
14.1 Allocating Aligned Memory: posix_memalign() and memalign() ..... 530
14.2 Locking Files ..... 531
14.2.1 File Locking Concepts ..... 531
14.2.2 POSIX Locking: fcntl() and lockf() ..... 533
14.2.2.1 Describing a Lock ..... 533
14.2.2.2 Obtaining and Releasing Locks ..... 536
14.2.2.3 Observing Locking Caveats ..... 538
14.2.3 BSD Locking: flock() ..... 539
14.2.4 Mandatory Locking ..... 540
14.3 More Precise Times ..... 543
14.3.1 Microsecond Times: gettimeofday() ..... 544
14.3.2 Microsecond File Times: utimes() ..... 545
14.3.3 Interval Timers: setitimer() and getitimer() ..... 546
14.3.4 More Exact Pauses: nanosleep() ..... 550
14.4 Advanced Searching with Binary Trees ..... 551
14.4.1 Introduction to Binary Trees ..... 551
14.4.2 Tree Management Functions ..... 554
14.4.3 Tree Insertion: tsearch() ..... 554
14.4.4 Tree Lookup and Use of A Returned Pointer: tfind() and tsearch() ..... 555
14.4.5 Tree Traversal: twalk() ..... 557
14.4.6 Tree Node Removal and Tree Deletion: tdelete() and tdestroy() ..... 561
14.5 Summary ..... 562
Exercises ..... 563

PART III Debugging and Final Project ..... 565

Chapter 15 Debugging ..... 567
15.1 First Things First ..... 568
15.2 Compilation for Debugging ..... 569
15.3 GDB Basics ..... 570
15.3.1 Running GDB ..... 571
15.3.2 Setting Breakpoints, Single-Stepping, and Setting Watchpoints ..... 574
15.4 Programming for Debugging ..... 577
15.4.1 Compile-Time Debugging Code ..... 577
15.4.1.1 Use Debugging Macros ..... 577
15.4.1.2 Avoid Expression Macros If Possible ..... 580
15.4.1.3 Reorder Code If Necessary ..... 582
15.4.1.4 Use Debugging Helper Functions ..... 584
15.4.1.5 Avoid Unions When Possible ..... 591
15.4.2 Runtime Debugging Code ..... 595
15.4.2.1 Add Debugging Options and Variables ..... 595
15.4.2.2 Use Special Environment Variables ..... 597
15.4.2.3 Add Logging Code ..... 601
15.4.2.4 Runtime Debugging Files ..... 602
15.4.2.5 Add Special Hooks for Breakpoints ..... 603
15.5 Debugging Tools ..... 605
15.5.1 The dbug Library - A Sophisticated printf() ..... 606
15.5.2 Memory Allocation Debuggers ..... 612
15.5.2.1 GNU/Linux mtrace ..... 613
15.5.2.2 Electric Fence ..... 614
15.5.2.3 Debugging Malloc: dmalloc ..... 619
15.5.2.4 Valgrind: A Versatile Tool ..... 623
15.5.2.5 Other Malloc Debuggers ..... 629
15.5.3 A Modern lint ..... 631
15.6 Software Testing ..... 632
15.7 Debugging Rules ..... 633
15.8 Suggested Reading ..... 637
15.9 Summary ..... 638
Exercises ..... 639

Chapter 16 A Project That Ties Everything Together ..... 641
16.1 Project Description ..... 642
16.2 Suggested Reading ..... 644

PART IV Appendixes ..... 647

Appendix A Teach Yourself Programming in Ten Years ..... 649
Appendix B Caldera Ancient UNIX License ..... 655
Appendix C GNU General Public License ..... 657

Index ..... 667


Preface

One of the best ways to learn about programming is to read well-written programs.
This book teaches the fundamental Linux system call APIs (those that form the core
of any significant program) by presenting code from production programs that you
use every day.
By looking at concrete programs, you can not only see how to use the Linux APIs,
but you also can examine the real-world issues (performance, portability, robustness)
that arise in writing software.
While the book's title is Linux Programming by Example, everything we cover, unless
otherwise noted, applies to modern Unix systems as well. In general we use "Linux"
to mean the Linux kernel, and "GNU/Linux" to mean the total system (kernel,
libraries, tools). Also, we often say "Linux" when we mean all of Linux, GNU/Linux
and Unix; if something is specific to one system or the other, we mention it explicitly.

Audience
This book is intended for the person who understands programming and is familiar
with the basics of C, at least on the level of The C Programming Language by Kernighan
and Ritchie. (Java programmers wishing to read this book should understand C pointers,
since C code makes heavy use of them.) The examples use both the 1990 version of
Standard C and Original C.
In particular, you should be familiar with all C operators, control-flow structures,
variable and pointer declarations and use, the string management functions, the use of
exit(), and the <stdio.h> suite of functions for file input/output.
You should understand the basic concepts of standard input, standard output, and
standard error and the fact that all C programs receive an array of character strings
representing invocation options and arguments. You should also be familiar with the
fundamental command-line tools, such as cd, cp, date, ln, ls, man (and info if you
have it), rmdir, and rm, the use of long and short command-line options, environment
variables, and I/O redirection, including pipes.
We assume that you want to write programs that work not just under GNU/Linux
but across the range of Unix systems. To that end, we mark each interface as to its
availability (GLIBC systems only, or defined by POSIX, and so on), and portability
advice is included as an integral part of the text.
The programming taught here may be at a lower level than you're used to; that's
OK. The system calls are the fundamental building blocks for higher operations and
are thus low-level by nature. This in turn dictates our use of C: The APIs were designed
for use from C, and code that interfaces them to higher-level languages, such as C++
and Java, will necessarily be lower level in nature, and most likely, written in C. It may
help to remember that "low level" doesn't mean "bad," it just means "more challenging."

What You Will Learn


This book focuses on the basic APIs that form the core of Linux programming:
• Memory management
• File input/output
• File metadata
• Processes and signals
• Users and groups
• Programming support (sorting, argument parsing, and so on)
• Internationalization
• Debugging

We have purposely kept the list of topics short. We believe that it is intimidating to
try to learn "all there is to know" from a single book. Most readers prefer smaller, more
focused books, and the best Unix books are all written that way.
So, instead of a single giant tome, we plan several volumes: one on Interprocess
Communication (IPC) and networking, and another on software development and
code portability. We also have an eye toward possible additional volumes in a Linux
Programming by Example series that will cover topics such as thread programming and
GUI programming.
The APIs we cover include both system calls and library functions. Indeed, at the C
level, both appear as simple function calls. A system call is a direct request for system
services, such as reading or writing a file or creating a process. A library function, on the
other hand, runs at the user level, possibly never requesting any services from the
operating system. System calls are documented in section 2 of the reference manual (viewable
online with the man command), and library functions are documented in section 3.
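As a minimal sketch of that distinction (not code from the book), the two helpers below send the same text to an output stream, once through the write() system call and once through the <stdio.h> library. The helper names emit_with_syscall() and emit_with_library() are hypothetical, invented for this illustration; write(), fputs(), fflush(), and strlen() are the standard interfaces.

```c
#include <stdio.h>    /* fputs(), fflush(): library functions (section 3) */
#include <string.h>   /* strlen(): library function that needs no kernel help */
#include <unistd.h>   /* write(): system call (section 2) */

/*
 * Hypothetical helper: emit msg with the write() system call.
 * The request goes directly to the operating system.
 * Returns the number of bytes written, or -1 on error.
 */
ssize_t emit_with_syscall(int fd, const char *msg)
{
    return write(fd, msg, strlen(msg));
}

/*
 * Hypothetical helper: emit msg with the <stdio.h> library.
 * fputs() copies the text into a user-level buffer; the library
 * decides when to ask the system to write it. fflush() forces
 * that to happen now. Returns 0 on success, -1 on error.
 */
int emit_with_library(FILE *fp, const char *msg)
{
    if (fputs(msg, fp) == EOF)
        return -1;
    return (fflush(fp) == EOF) ? -1 : 0;
}
```

Called from main() as emit_with_syscall(STDOUT_FILENO, "hello\n") and emit_with_library(stdout, "hello\n"), both produce the same visible output. Note that strlen() is used in the first helper purely at the user level: it is a library function that never requests anything from the operating system.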
Our goal is to teach you the use of the Linux APIs by example: in particular, through
the use, wherever possible, of both original Unix source code and the GNU utilities.
Unfortunately, there aren't as many self-contained examples as we thought there'd be.
Thus, we have written numerous small demonstration programs as well. We stress
programming principles: especially those aspects of GNU programming, such as "no
arbitrary limits," that make the GNU utilities into exceptional programs.
The choice of everyday programs to study is deliberate. If you've been using
GNU/Linux for any length of time, you already understand what programs such as ls
and cp do; it then becomes easy to dive straight into how the programs work, without
having to spend a lot of time learning what they do.
Occasionally, we present both higher-level and lower-level ways of doing things.
Usually the higher-level standard interface is implemented in terms of the lower-level
interface or construct. We hope that such views of what's "under the hood" will help
you understand how things work; for all the code you write, you should always use the
higher-level, standard interface.
Similarly, we sometimes introduce functions that provide certain functionality and
then recommend (with a provided reason) that these functions be avoided! The primary
reason for this approach is so that you'll be able to recognize these functions when you
see them and thus understand the code using them. A well-rounded knowledge of a
topic requires understanding not just what you can do, but what you should and should
not do.
Finally, each chapter concludes with exercises. Some involve modifying or writing
code. Others are more in the category of "thought experiments" or "why do you
think ..." We recommend that you do all of them; they will help cement your
understanding of the material.

Small Is Beautiful: Unix Programs


Hoare's law:
"Inside every large program is a small program
struggling to get out."
-C.A.R. Hoare-

Initially, we planned to teach the Linux API by using the code from the GNU utilities.
However, the modern versions of even simple command-line programs (like mv and
cp) are large and many-featured. This is particularly true of the GNU variants of the
standard utilities, which allow long and short options, do everything required by POSIX,
and often have additional, seemingly unrelated options as well (like output highlighting).
It then becomes reasonable to ask, "Given such a large and confusing forest, how
can we focus on the one or two important trees?" In other words, if we present the
current full-featured program, will it be possible to see the underlying core operation
of the program?
That is when Hoare's law 1 inspired us to look to the original Unix programs for ex-
ample code. The original V7 Unix utilities are small and straightforward, making it
easy to see what's going on and to understand how the system calls are used. (V7 was
released around 1979; it is the common ancestor of all modern Unix systems, including
GNU/Linux and the BSD systems.)
For many years, Unix source code was protected by copyrights and trade secret license
agreements, making it difficult to use for study and impossible to publish. This is still
true of all commercial Unix source code. However, in 2002, Caldera (currently operating
as SCO) made the original Unix code (through V7 and 32V Unix) available under an
Open Source style license (see Appendix B, "Caldera Ancient UNIX License," page 655).
This makes it possible for us to include the code from the early Unix system in this book.

Standards
Throughout the book we refer to several different formal standards. A standard is a
document describing how something works. Formal standards exist for many things.
For example, the shape, placement, and meaning of the holes in the electrical outlet in
your wall are defined by a formal standard so that all the power cords in your country
work in all the outlets.

1 This famous statement was made at The International Workshop on Efficient Production of Large Programs in
Jablonna, Poland, August 10-14, 1970.

So, too, formal standards for computing systems define how they are supposed to
work; this enables developers and users to know what to expect from their software and
enables them to complain to their vendor when software doesn't work.
Of interest to us here are:

1. ISO/IEC International Standard 9899: Programming Languages - C, 1990.
The first formal standard for the C programming language.
2. ISO/IEC International Standard 9899: Programming Languages - C, Second
edition, 1999. The second (and current) formal standard for the C programming
language.
3. ISO/IEC International Standard 14882: Programming Languages - C++, 1998.
The first formal standard for the C++ programming language.
4. ISO/IEC International Standard 14882: Programming Languages - C++, 2003.
The second (and current) formal standard for the C++ programming language.
5. IEEE Standard 1003.1-2001: Standard for Information Technology - Portable
Operating System Interface (POSIX®). The current version of the POSIX
standard; describes the behavior expected of Unix and Unix-like systems. This
edition covers both the system call and library interface, as seen by the C/C++
programmer, and the shell and utilities interface, seen by the user. It consists
of several volumes:
• Base Definitions. The definitions of terms, facilities, and header files.
• Base Definitions - Rationale. Explanations and rationale for the choice of
facilities that both are and are not included in the standard.
• System Interfaces. The system calls and library functions. POSIX terms them
all "functions."
• Shell and Utilities. The shell language and utilities available for use with shell
programs and interactively.

Although language standards aren't exciting reading, you may wish to consider pur-
chasing a copy of the C standard: It provides the final definition of the language. Copies
can be purchased from ANSI 2 and from ISO.3 (The PDF version of the C standard is
quite affordable.)
The POSIX standard can be ordered from The Open Group.4 By working through
their publications catalog to the items listed under "CAE Specifications," you can find
individual pages for each part of the standard (named "C031" through "C034"). Each
one's page provides free access to the online HTML version of the particular volume.
The POSIX standard is intended for implementation on both Unix and Unix-like
systems, as well as non-Unix systems. Thus, the base functionality it provides is a subset
of what Unix systems have. However, the POSIX standard also defines optional
extensions: additional functionality, for example, for threads or real-time support. Of most
importance to us is the X/Open System Interface (XSI) extension, which describes facilities
from historical Unix systems.
Throughout the book, we mark each API as to its availability: ISO C, POSIX, XSI,
GLIBC only, or nonstandard but commonly available.

Features and Power: GNU Programs


Restricting ourselves to just the original Unix code would have made an interesting
history book, but it would not have been very useful in the 21st century. Modern
programs do not have the same constraints (memory, CPU power, disk space, and speed)
that the early Unix systems did. Furthermore, they need to operate in a multilingual
world: ASCII and American English aren't enough.
More importantly, one of the primary freedoms expressly promoted by the Free
Software Foundation and the GNU Project 5 is the "freedom to study." GNU programs
are intended to provide a large corpus of well-written programs that journeyman pro-
grammers can use as a source from which to learn.

2 http://www.ansi.org
3 http://www.iso.ch
4 http://www.opengroup.org
5 http://www.gnu.org

By using GNU programs, we want to meet both goals: show you well-written,
modern code from which you will learn how to write good code and how to use the
APIs well.
We believe that GNU software is better because it is free (in the sense of "freedom,"
not "free beer"). But it's also recognized that GNU software is often technically better
than the corresponding Unix counterparts, and we devote space in Section 1.4, "Why
GNU Programs Are Better," page 14, to explaining why.
A number of the GNU code examples come from gawk (GNU awk). The main
reason is that it's a program with which we're very familiar, and therefore it was easy
to pick examples from it. We don't otherwise make any special claims about it.

Summary of Chapters
Driving a car is a holistic process that involves multiple simultaneous tasks. In many
ways, Linux programming is similar, requiring understanding of multiple aspects
of the API, such as file I/O, file metadata, directories, storage of time information,
and so on.
The first part of the book looks at enough of these individual items to enable studying
the first significant program, the V7 ls. Then we complete the discussion of files and
users by looking at file hierarchies and the way filesystems work and are used.
Chapter 1, "Introduction," page 3,
describes the Unix and Linux file and process models, looks at the differences
between Original C and 1990 Standard C, and provides an overview of the principles
that make GNU programs generally better than standard Unix programs.
Chapter 2, "Arguments, Options, and the Environment," page 23,
describes how a C program accesses and processes command-line arguments and
options and explains how to work with the environment.
Chapter 3, "User-Level Memory Management," page 51,
provides an overview of the different kinds of memory in use and available in a
running process. User-level memory management is central to every nontrivial
application, so it's important to understand it early on.

Chapter 4, "Files and File I/O," page 83,
discusses basic file I/O, showing how to create and use files. This understanding
is important for everything else that follows.
Chapter 5, "Directories and File Metadata," page 117,
describes how directories, hard links, and symbolic links work. It then describes
file metadata, such as owners, permissions, and so on, as well as covering how to
work with directories.
Chapter 6, "General Library Interfaces - Part 1," page 165,
looks at the first set of general programming interfaces that we need so that we
can make effective use of a file's metadata.
Chapter 7, "Putting It All Together: ls," page 207,
ties together everything seen so far by looking at the V7 ls program.
Chapter 8, "Filesystems and Directory Walks," page 227,
describes how filesystems are mounted and unmounted and how a program
can tell what is mounted on the system. It also describes how a program can
easily "walk" an entire file hierarchy, taking appropriate action for each object
it encounters.

The second part of the book deals with process creation and management, interprocess
communication with pipes and signals, user and group IDs, and additional general
programming interfaces. Next, the book first describes internationalization with GNU
gettext and then several advanced APIs.

Chapter 9, "Process Management and Pipes," page 283,
looks at process creation, program execution, IPC with pipes, and file descriptor
management, including nonblocking I/O.
Chapter 10, "Signals," page 347,
discusses signals, a simplistic form of interprocess communication. Signals also
play an important role in a parent process's management of its children.
Chapter 11, "Permissions and User and Group ID Numbers," page 403,
looks at how processes and files are identified, how permission checking works,
and how the setuid and setgid mechanisms work.

Chapter 12, "General Library Interfaces - Part 2," page 427,
looks at the rest of the general APIs; many of these are more specialized than the
first general set of APIs.
Chapter 13, "Internationalization and Localization," page 485,
explains how to enable your programs to work in multiple languages, with almost
no pain.
Chapter 14, "Extended Interfaces," page 529,
describes several extended versions of interfaces covered in previous chapters, as
well as covering file locking in full detail.

We round the book off with a chapter on debugging, since (almost) no one gets
things right the first time, and we suggest a final project to cement your knowledge of
the APIs covered in this book.

Chapter 15, "Debugging," page 567,
describes the basics of the GDB debugger, transmits as much of our programming
experience in this area as possible, and looks at several useful tools for doing
different kinds of debugging.
Chapter 16, "A Project That Ties Everything Together," page 641,
presents a significant programming project that makes use of just about everything
covered in the book.

Several appendices cover topics of interest, including the licenses for the source code
used in this book.
Appendix A, "Teach Yourself Programming in Ten Years," page 649,
invokes the famous saying, "Rome wasn't built in a day." So too, Linux/Unix ex-
pertise and understanding only come with time and practice. To that end, we
have included this essay by Peter Norvig, which we highly recommend.
Appendix B, "Caldera Ancient UNIX License," page 655,
covers the Unix source code used in this book.
Appendix C, "GNU General Public License," page 657,
covers the GNU source code used in this book.

Typographical Conventions
Like all books on computer-related topics, we use certain typographical conventions
to convey information. Definitions or first uses of terms appear in italics, like the word
"Definitions" at the beginning of this sentence. Italics are also used for emphasis, for
citations of other works, and for commentary in examples. Variable items, such as
arguments or filenames, appear like this. Occasionally, we use a bold font when a point
needs to be made strongly.
Things that exist on a computer are in a constant-width font, such as filenames
(foo.c) and command names (ls, grep). Short snippets that you type are additionally
enclosed in single quotes: 'ls -l *.c'.
$ and > are the Bourne shell primary and secondary prompts and are used to display
interactive examples. User input appears in a different font from regular computer
output in examples. Examples look like this:
$ ls -1                       Look at files. Option is digit 1, not letter l
foo
bar
baz

We prefer the Bourne shell and its variants (ksh93, Bash) over the C shell; thus, all
our examples show only the Bourne shell. Be aware that quoting and line-continuation
rules are different in the C shell; if you use it, you're on your own!6
When referring to functions in programs, we append an empty pair of parentheses
to the function's name: printf(), strcpy(). When referring to a manual page (accessible
with the man command), we follow the standard Unix convention of writing the
command or function name in italics and the section in parentheses after it, in regular
type: awk(1), printf(3).

Where to Get Unix and GNU Source Code


You may wish to have copies of the programs we use in this book for your own ex-
perimentation and review. All the source code is available over the Internet, and your
GNU/Linux distribution contains the source code for the GNU utilities.

6 See the csh(1) and tcsh(1) man pages and the book Using csh & tcsh, by Paul DuBois, O'Reilly & Associates,
Sebastopol, CA, USA, 1995. ISBN: 1-56592-132-1.

Unix Code
Archives of various "ancient" versions of Unix are maintained by The UNIX Heritage
Society (TUHS), http://www.tuhs.org.
Of most interest is that it is possible to browse the archive of old Unix source code
on the Web. Start with http://minnie.tuhs.org/UnixTree/. All the example code
in this book is from the Seventh Edition Research UNIX System, also known as "V7."
The TUHS site is physically located in Australia, although there are mirrors of the
archive around the world; see http://www.tuhs.org/archive_sites.html.
This page also indicates that the archive is available for mirroring with rsync.
(See http://rsync.samba.org/ if you don't have rsync: It's standard on
GNU/Linux systems.)
You will need about 2-3 gigabytes of disk to copy the entire archive. To copy the
archive, create an empty directory, and in it, run the following commands:
mkdir Applications 4BSD PDP-11 PDP-11/Trees VAX Other

rsync -avz minnie.tuhs.org::UA_Root .
rsync -avz minnie.tuhs.org::UA_Applications Applications
rsync -avz minnie.tuhs.org::UA_4BSD 4BSD
rsync -avz minnie.tuhs.org::UA_PDP11 PDP-11
rsync -avz minnie.tuhs.org::UA_PDP11_Trees PDP-11/Trees
rsync -avz minnie.tuhs.org::UA_VAX VAX
rsync -avz minnie.tuhs.org::UA_Other Other

You may wish to omit copying the Trees directory, which contains extractions of
several versions of Unix, and occupies around 700 megabytes of disk.
You may also wish to consult the TUHS mailing list to see if anyone near you can
provide copies of the archive on CD-ROM, to avoid transferring so much data over
the Internet.
The folks at Southern Storm Software, Pty. Ltd., in Australia, have "modernized" a
portion of the V7 user-level code so that it can be compiled and run on current systems,
most notably GNU/Linux. This code can be downloaded from their web site.7
It's interesting to note that V7 code does not contain any copyright or permission
notices in it. The authors wrote the code primarily for themselves and their research,
leaving the permission issues to AT&T's corporate licensing department.

7 http://www.southern-storm.com.au/v7upgrade.html



GNU Code
If you're using GNU/Linux, then your distribution will have come with source code,
presumably in whatever packaging format it uses (Red Hat RPM files, Debian DEB
files, Slackware .tar.gz files, etc.). Many of the examples in the book are from the
GNU Coreutils, version 5.0. Find the appropriate CD-ROM for your GNU/Linux
distribution, and use the appropriate tool to extract the code. Or follow the instructions
in the next few paragraphs to retrieve the code.
If you prefer to retrieve the files yourself from the GNU ftp site, you will find them
at ftp://ftp.gnu.org/gnu/coreutils/coreutils-5.0.tar.gz.
You can use the wget utility to retrieve the file:
$ wget ftp://ftp.gnu.org/gnu/coreutils/coreutils-5.0.tar.gz    Retrieve the distribution
... lots of output here as file is retrieved ...
Alternatively, you can use good old-fashioned ftp to retrieve the file:
$ ftp ftp.gnu.org                                Connect to GNU ftp site
Connected to ftp.gnu.org (199.232.41.7).
220 GNU FTP server ready.
Name (ftp.gnu.org:arnold): anonymous             Use anonymous ftp
331 please specify the password.
Password:                                        Password does not echo on screen
230-If you have any problems with the GNU software or its downloading,
230-please refer your questions to <gnu@gnu.org>.
                                                 Lots of verbiage deleted
230 Login successful. Have fun.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> cd /gnu/coreutils                           Change to Coreutils directory
250 Directory successfully changed.
ftp> bin
200 Switching to Binary mode.
ftp> hash                                        Print # signs as progress indicators
Hash mark printing on (1024 bytes/hash mark).
ftp> get coreutils-5.0.tar.gz                    Retrieve file
local: coreutils-5.0.tar.gz remote: coreutils-5.0.tar.gz
227 Entering Passive Mode (199,232,41,7,86,107)
150 Opening BINARY mode data connection for coreutils-5.0.tar.gz (6020616 bytes)
#################################################################################
#################################################################################

226 File send OK.
6020616 bytes received in 2.03e+03 secs (2.9 Kbytes/sec)
ftp> quit                                        Log off
221 Goodbye.

Once you have the file, extract it as follows:

$ gzip -dc < coreutils-5.0.tar.gz | tar -xvpf -    Extract files
... lots of output here as files are extracted ...

Systems using GNU tar may use this incantation:

$ tar -xvpzf coreutils-5.0.tar.gz                  Extract files
... lots of output here as files are extracted ...

In compliance with the GNU General Public License, here is the Copyright
information for all GNU programs quoted in this book. All the programs are "free software;
you can redistribute it and/or modify it under the terms of the GNU General Public
License as published by the Free Software Foundation; either version 2 of the License,
or (at your option) any later version." See Appendix C, "GNU General Public License,"
page 657, for the text of the GNU General Public License.
Coreutils 5.0 File      Copyright dates
lib/safe-read.c         Copyright © 1993-1994, 1998, 2002
lib/safe-write.c        Copyright © 2002
lib/utime.c             Copyright © 1998, 2001-2002
lib/xreadlink.c         Copyright © 2001
src/du.c                Copyright © 1988-1991, 1995-2003
src/env.c               Copyright © 1986, 1991-2003
src/install.c           Copyright © 1989-1991, 1995-2002
src/link.c              Copyright © 2001-2002
src/ls.c                Copyright © 1985, 1988, 1990, 1991, 1995-2003
src/pathchk.c           Copyright © 1991-2003
src/sort.c              Copyright © 1988, 1991-2002
src/sys2.h              Copyright © 1997-2003
src/wc.c                Copyright © 1985, 1991, 1995-2002

Gawk 3.0.6 File         Copyright dates
eval.c                  Copyright © 1986, 1988, 1989, 1991-2000

Gawk 3.1.3 File         Copyright dates
awk.h                   Copyright © 1986, 1988, 1989, 1991-2003
builtin.c               Copyright © 1986, 1988, 1989, 1991-2003
eval.c                  Copyright © 1986, 1988, 1989, 1991-2003
io.c                    Copyright © 1986, 1988, 1989, 1991-2003
main.c                  Copyright © 1986, 1988, 1989, 1991-2003
posix/gawkmisc.c        Copyright © 1986, 1988, 1989, 1991-1998, 2001-2003

Gawk 3.1.4 File         Copyright dates
builtin.c               Copyright © 1986, 1988, 1989, 1991-2004

GLIBC 2.3.2 File        Copyright dates
locale/locale.h         Copyright © 1991, 1992, 1995-2002
posix/unistd.h          Copyright © 1991-2003
time/sys/time.h         Copyright © 1991-1994, 1996-2003

Make 3.80 File          Copyright dates
read.c                  Copyright © 1988-1997, 2002

Where to Get the Example Programs Used in This Book


The example programs used in this book can be found at
http://authors.phptr.com/robbins.

About the Cover


"This is the weapon of a Jedi Knight ..., an elegant weapon for
a more civilized age. For over a thousand generations the Jedi
Knights were the guardians of peace and justice in the Old
Republic. Before the dark times, before the Empire."
-Obi-Wan Kenobi-
You may be wondering why we chose to put a light saber on the cover and to use it
throughout the book's interior. What does it represent, and how does it relate to Linux
programming?

In the hands of a Jedi Knight, a light saber is both a powerful weapon and a thing
of beauty. Its use demonstrates the power, knowledge, control of the Force, and arduous
training of the Jedi who wields it.
The elegance of the light saber mirrors the elegance of the original Unix API design.
There, too, the studied, precise use of the APIs and the Software Tools and GNU design
principles lead to today's powerful, flexible, capable GNU/Linux system. This system
demonstrates the knowledge and understanding of the programmers who wrote all its
components.
And, of course, light sabers are just way cool!

Acknowledgments
Writing a book is lots of work, and doing it well requires help from many people.
Dr. Brian W. Kernighan, Dr. Doug McIlroy, Peter Memishian, and Peter van der
Linden reviewed the initial book proposal. David J. Agans, Fred Fish, Don Marti, Jim
Meyering, Peter Norvig, and Julian Seward provided reprint permission for various
items quoted throughout the book. Thanks to Geoff Collyer, Ulrich Drepper, Yosef
Gold, Dr. C.A.R. (Tony) Hoare, Dr. Manny Lehman, Jim Meyering, Dr. Dennis M.
Ritchie, Julian Seward, Henry Spencer, and Dr. Wladyslaw M. Turski, who provided
much useful general information. Thanks also to the other members of the GNITS
gang: Karl Berry, Akim DeMaille, Ulrich Drepper, Greg McGary, Jim Meyering,
François Pinard, and Tom Tromey, who all provided helpful feedback about good
programming practice. Karl Berry, Alper Ersoy, and Dr. Nelson H.F. Beebe provided
valuable technical help with the Texinfo and DocBook/XML toolchains.
Good technical reviewers not only make sure that an author gets his facts right, they
also ensure that he thinks carefully about his presentation. Dr. Nelson H.F. Beebe,
Geoff Collyer, Russ Cox, Ulrich Drepper, Randy Lechlitner, Dr. Brian W. Kernighan,
Peter Memishian, Jim Meyering, Chet Ramey, and Louis Taber acted as technical
reviewers for the entire book. Dr. Michael Brennan provided helpful comments on
Chapter 15. Both the prose and many of the example programs benefited from their
reviews. I hereby thank all of them. As most authors usually say here, "Any remaining
errors are mine."

I would especially like to thank Mark Taub of Pearson Education for initiating this
project, for his enthusiasm for the series, and for his help and advice as the book moved
through its various stages. Anthony Gemmellaro did a phenomenal job of realizing my
concept for the cover, and Gail Cocker's interior design is beautiful. Faye Gemmellaro
made the production process enjoyable, instead of a chore. Dmitry Kirsanov and
Alina Kirsanova did the figures, page layout, and indexing; they were a pleasure to
work with.
Finally, my deepest gratitude and love to my wife, Miriam, for her support and en-
couragement during the book's writing.

Arnold Robbins
Nof Ayalon
ISRAEL
Chapter 1 Introduction page 3

Chapter 2 Arguments, Options, and the Environment page 23

Chapter 3 User-Level Memory Management page 51

Chapter 4 Files and File I/O page 83

Chapter 5 Directories and File Metadata page 117

Chapter 6 General Library Interfaces - Part 1 page 165

Chapter 7 Putting It All Together: ls page 207

Chapter 8 Filesystems and Directory Walks page 227


In this chapter

• 1.1 The Linux/Unix File Model page 4
• 1.2 The Linux/Unix Process Model page 10
• 1.3 Standard C vs. Original C page 12
• 1.4 Why GNU Programs Are Better page 14
• 1.5 Portability Revisited page 19
• 1.6 Suggested Reading page 20
• 1.7 Summary page 21
• Exercises page 22

If there is one phrase that summarizes the primary GNU/Linux (and therefore
Unix) concepts, it's "files and processes." In this chapter we review the Linux
file and process models. These are important to understand because the system calls
are almost all concerned with modifying some attribute or part of the state of a file
or a process.
Next, because we'll be examining code in both styles, we briefly review the major
difference between 1990 Standard C and Original C. Finally, we discuss at some
length what makes GNU programs "better," programming principles that we'll see
in use in the code.
This chapter contains a number of intentional simplifications. The full details are
covered as we progress through the book. If you're already a Linux wizard, please
forgive us.

1.1 The Linux/Unix File Model


One of the driving goals in the original Unix design was simplicity. Simple concepts
are easy to learn and use. When the concepts are translated into simple APIs, simple
programs are then easy to design, write, and get correct. In addition, simple code is
often smaller and more efficient than more complicated designs.
The quest for simplicity was driven by two factors. From a technical point of view,
the original PDP-11 minicomputers on which Unix was developed had a small address
space: 64 Kilobytes total on the smaller systems, 64K code and 64K of data on the large
ones. These restrictions applied not just to regular programs (so-called user level code),
but to the operating system itself (kernel level code). Thus, "Small Is Beautiful" held
not only aesthetically, but also because there was no other choice!
The second factor was a negative reaction to contemporary commercial operating
systems, which were needlessly complicated, with obtuse command languages, multiple
kinds of file I/O, and little generality or symmetry. (Steve Johnson once remarked that
"Using TSO is like trying to kick a dead whale down a beach." TSO is one of the obtuse
mainframe time-sharing systems just described.)

1.1.1 Files and Permissions


The Unix file model is as simple as it gets: A file is a linear stream of bytes. Period.
The operating system imposes no preordained structure on files: no fixed or varying
record sizes, no indexed files, nothing. The interpretation of file contents is entirely up
to the application. (This isn't quite true, as we'll see shortly, but it's close enough for
a start.)
Once you have a file, you can do three things with the file's data: read them, write
them, or execute them.
Unix was designed for time-sharing minicomputers; this implies a multiuser
environment from the get-go. Once there are multiple users, it must be possible to specify a
file's permissions: Perhaps user jane is user fred's boss, and jane doesn't want fred
to read the latest performance evaluations.
For file permission purposes, users are classified into three distinct categories: user:
the owner of a file; group: the group of users associated with this file (discussed shortly);
and other: anybody else. For each of these categories, every file has separate read, write,
and execute permission bits associated with it, yielding a total of nine permission bits.
This shows up in the first field of the output of 'ls -l':
$ ls -l progex.texi
-rw-r--r--    1 arnold   devel        5614 Feb 24 18:02 progex.texi

Here, arnold and devel are the owner and group of progex.texi, and -rw-r--r--
are the file type and permissions. The first character is a dash for regular file, a d for
directories, or one of a small set of other characters for other kinds of files that aren't
important at the moment. Each subsequent group of three characters represents read,
write, and execute permission for the owner, group, and "other," respectively.
In this example, progex.texi is readable and writable by the owner, and readable
by the group and other. The dashes indicate absent permissions, thus the file is not
executable by anyone, nor is it writable by the group or other.
The owner and group of a file are stored as numeric values known as the user ID
(UID) and group ID (GID); standard library functions that we present later in the book
make it possible to print the values as human-readable names.
A file's owner can change the permission by using the chmod (change mode)
command. (As such, file permissions are sometimes referred to as the "file mode.")
A file's group can be changed with the chgrp (change group) and chown (change
owner) commands.1

1 Some systems allow regular users to change the ownership on their files to someone else, thus "giving them away."
The details are standardized by POSIX but are a bit messy. Typical GNU/Linux configurations do not allow it.

Group permissions were intended to support cooperative work: Although one person
in a group or department may own a particular file, perhaps everyone in that group
needs to be able to modify it. (Consider a collaborative marketing paper or data from
a survey.)
When the system goes to check a file access (usually upon opening a file), if the UID
of the process matches that of the file, the owner permissions apply. If those permissions
deny the operation (say, a write to a file with -r--rw-rw- permissions), the operation
fails; Unix and Linux do not proceed to test the group and other permissions.2 The
same is true if the UID is different but the GID matches; if the group permissions deny
the operation, it fails.
Unix and Linux support the notion of a superuser: a user with special privileges. This
user is known as root and has the UID of 0. root is allowed to do anything; all bets
are off, all doors are open, all drawers unlocked.3 (This can have significant security
implications, which we touch on throughout the book but do not cover exhaustively.)
Thus, even if a file is mode ----------, root can still read and write the file. (One
exception is that the file can't be executed. But as root can add execute permission,
the restriction doesn't prevent anything.)
The user/group/other, read/write/execute permissions model is simple, yet flexible
enough to cover most situations. Other, more powerful but more complicated, models
exist and are implemented on different systems, but none of them are well enough
standardized and broadly enough implemented to be worth discussing in a general-
purpose text like this one.

1.1.2 Directories and Filenames


Once you have a file, you need someplace to keep it. This is the purpose of the
directory (known as a "folder" on Windows and Apple Macintosh systems). A directory is
a special kind of file, which associates filenames with particular collections of file
metadata, known as inodes. Directories are special because they can only be updated by
the operating system, by the system calls described in Chapter 4, "Files and File I/O,"
page 83. They are also special in that the operating system dictates the format of
directory entries.

2 The owner can always change the permission, of course. Most users don't disable write permission for themselves.
3 There are some rare exceptions to this rule, all of which are beyond the scope of this book.
Filenames may contain any valid 8-bit byte except the / (forward slash) character
and ASCII NUL, the character whose bits are all zero. Early Unix systems limited file-
names to 14 bytes; modern systems allow individual filenames to be up to 255 bytes.
The inode contains all the information about a file except its name: the type, owner,
group, permissions, size, modification and access times. It also stores the locations on
disk of the blocks containing the file's data. All of these are data about the file, not the
file's data itself, thus the term metadata.
Directory permissions have a slightly different meaning from those for file permissions.
Read permission means the ability to search the directory; that is, to look through it to
see what files it contains. Write permission is the ability to create and remove files in
the directory. Execute permission is the ability to go through a directory when opening
or otherwise accessing a contained file or subdirectory.

NOTE If you have write permission on a directory, you can remove files in that
directory, even if they don't belong to you! When used interactively, the rm
command notices this, and asks you for confirmation in such a case.

The /tmp directory has write permission for everyone, but your files in /tmp
are quite safe because /tmp usually has the so-called sticky bit set on it:

$ ls -ld /tmp
drwxrwxrwt   11 root     root         4096 May 15 17:11 /tmp

Note the t is the last position of the first field. On most directories this position
has an x in it. With the sticky bit set, only you, as the file's owner, or root may
remove your files. (We discuss this in more detail in Section 11.5.2, "Directories
and the Sticky Bit," page 414.)

1.1.3 Executable Files


Remember we said that the operating system doesn't impose a structure on files?
Well, we've already seen that that was a white lie when it comes to directories. It's also
the case for binary executable files. To run a program, the kernel has to know what part
of a file represents instructions (code) and what part represents data. This leads to the
notion of an object file format, which is the definition for how these things are laid out
within a file on disk.

Although the kernel will only run a file laid out in the proper format, it is up to user-
level utilities to create these files. The compiler for a programming language (such as
Ada, Fortran, C, or C++) creates object files, and then a linker or loader (usually named
ld) binds the object files with library routines to create the final executable. Note that
even if a file has all the right bits in all the right places, the kernel won't run it if the
appropriate execute permission bit isn't turned on (or at least one execute bit for root).
Because the compiler, assembler, and loader are user-level tools, it's (relatively) easy
to change object file formats as needs develop over time; it's only necessary to "teach"
the kernel about the new format and then it can be used. The part that loads executables
is relatively small and this isn't an impossible task. Thus, Unix file formats have evolved
over time. The original format was known as a.out (Assembler OUTput). The next
format, still used on some commercial systems, is known as COFF (Common Object
File Format), and the current, most widely used format is ELF (Executable and Linking
Format). Modern GNU/Linux systems use ELF.
The kernel recognizes that an executable file contains binary object code by looking
at the first few bytes of the file for special magic numbers. These are sequences of two
or four bytes that the kernel recognizes as being special. For backwards compatibility,
modern Unix systems recognize multiple formats. ELF files begin with the four characters
"\177ELF".

Besides binary executables, the kernel also supports executable scripts. Such a file also
begins with a magic number: in this case, the two regular characters #!. A script is a
program executed by an interpreter, such as the shell, awk, Perl, Python, or Tcl. The
#! line provides the full path to the interpreter and, optionally, one single argument:
#! /bin/awk -f

BEGIN { print "hello, world" }

Let's assume the above contents are in a file named hello.awk and that the file is
executable. When you type 'hello.awk', the kernel runs the program as if you had
typed '/bin/awk -f hello.awk'. Any additional command-line arguments are also
passed on to the program. In this case, awk runs the program and prints the universally
known hello, world message.
The #! mechanism is an elegant way of hiding the distinction between binary
executables and script executables. If hello.awk is renamed to just hello, the user typing
'hello' can't tell (and indeed shouldn't have to know) that hello isn't a binary
executable program.

1.1.4 Devices
One of Unix's most notable innovations was the unification of file I/O and device
I/O.4 Devices appear as files in the filesystem, regular permissions apply to their access,
and the same I/O system calls are used for opening, reading, writing, and closing them.
All of the "magic" to make devices look like files is hidden in the kernel. This is just
another aspect of the driving simplicity principle in action: We might phrase it as no
special cases for user code.
Two devices appear frequently in everyday use, particularly at the shell level:
/ dev / null and / dev / tty.
/dev/null is the "bit bucket." All data sent to /dev/null is discarded by the
operating system, and attempts to read from it always return end-of-file (EOF) immediately.
/dev/tty is the process's current controlling terminal: the one to which it listens
when a user types the interrupt character (typically CTRL-C) or performs job control
(CTRL-Z).
GNU/Linux systems, and many modern Unix systems, supply /dev/stdin,
/dev/stdout, and /dev/stderr devices, which provide a way to name the open files
each process inherits upon startup.
Other devices represent real hardware, such as tape and disk drives, CD-ROM drives,
and serial ports. There are also software devices, such as pseudo-ttys, that are used for
networking logins and windowing systems. /dev/console represents the system console,
a particular hardware device on minicomputers. On modern computers, /dev/console
is the screen and keyboard, but it could be a serial port.
Unfortunately, device-naming conventions are not standardized, and each operating
system has different names for tapes, disks, and so on. (Fortunately, that's not an issue
for what we cover in this book.) Devices have either a b or c in the first character of
'ls -l' output:

4 This feature first appeared in Multics, but Multics was never widely used.

$ ls -l /dev/tty /dev/hda
brw-rw----    1 root     disk       3,   0 Aug 31 02:31 /dev/hda
crw-rw-rw-    1 root     root       5,   0 Feb 26 08:44 /dev/tty

The initial b represents block devices, and a c represents character devices. Device files
are discussed further in Section 5.4, "Obtaining Information about Files," page 139.

1.2 The Linux/Unix Process Model


A process is a running program.5 Processes have the following attributes:
• A unique process identifier (the PID)
• A parent process (with an associated identifier, the PPID)
• Permission identifiers (UID, GID, groupset, and so on)
• An address space, separate from those of all other processes
• A program running in that address space
• A current working directory ('.')
• A current root directory (/; changing this is an advanced topic)
• A set of open files, directories, or both
• A permissions-to-deny mask for use in creating new files
• A set of strings representing the environment
• A scheduling priority (an advanced topic)
• Settings for signal disposition (an advanced topic)
• A controlling terminal (also an advanced topic)

When the main() function begins execution, all of these things have already been
put in place for the running program. System calls are available to query and change
each of the above items; covering them is the purpose of this book.
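
As a small taste of those system calls, the following sketch collects a few of the attributes listed above. The structure proc_info and the function get_proc_info() are hypothetical names chosen for this illustration, and the fixed-size directory buffer is a simplification that real code would replace with dynamic allocation.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* A hypothetical snapshot of a few per-process attributes. */
struct proc_info {
    pid_t pid;              /* process ID */
    pid_t ppid;             /* parent process ID */
    uid_t uid;              /* real user ID */
    gid_t gid;              /* real group ID */
    mode_t mask;            /* permissions-to-deny mask */
    char cwd[4096];         /* current directory; sketch-sized buffer */
};

/* get_proc_info --- fill in *pi; return 0 on success, -1 on failure */
int get_proc_info(struct proc_info *pi)
{
    pi->pid = getpid();
    pi->ppid = getppid();
    pi->uid = getuid();
    pi->gid = getgid();

    /* umask() reports the old mask only when setting a new one,
       so set it to something and then put the old value back */
    pi->mask = umask(0);
    (void) umask(pi->mask);

    if (getcwd(pi->cwd, sizeof(pi->cwd)) == NULL) {
        perror("getcwd");
        return -1;
    }
    return 0;
}
```

Each call here queries one attribute; the corresponding calls for changing attributes (chdir(), umask() with a new value, and so on) follow the same pattern.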
New processes are always created by an existing process. The existing process is termed
the parent, and the new process is termed the child. Upon booting, the kernel handcrafts
the first, primordial process, which runs the program /sbin/init; it has process ID

5 Processes can be suspended, in which case they are not "running"; however, neither are they terminated. In any
case, in the early stages of the climb up the learning curve, it pays not to be too pedantic.

1 and serves several administrative functions. All other processes are descendants of
init. (init's parent is the kernel, often listed as process ID 0.)
The child-to-parent relationship is one-to-one; each process has only one parent,
and thus it's easy to find out the PID of the parent. The parent-to-child relationship
is one-to-many; any given process can create a potentially unlimited number of children.
Thus, there is no easy way for a process to find out the PIDs of all its children. (In
practice, it's not necessary, anyway.) A parent process can arrange to be notified when
a child process terminates ("dies"), and it can also explicitly wait for such an event.
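
The explicit wait can be sketched with fork() and waitpid(). This is only an illustration under our own assumptions: wait_for_child() is a hypothetical helper, and the child does nothing except exit with the status it is given.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * wait_for_child --- hypothetical helper: create a child that exits
 * with the given status, wait for it, and return the status that the
 * parent observed (or -1 on error).
 */
int wait_for_child(int status)
{
    pid_t child;
    int wstatus;

    child = fork();
    if (child < 0) {
        perror("fork");
        return -1;
    } else if (child == 0) {
        _exit(status);          /* child: terminate right away */
    }

    /* parent: fork() returned the child's PID, so wait for that child */
    if (waitpid(child, &wstatus, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : -1;
}
```

Note that the parent learns the child's PID only because fork() returns it; the child, by contrast, can call getppid() at any time.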
Each process's address space (memory) is separate from that of every other. Unless
two processes have made explicit arrangement to share memory, one process cannot
affect the address space of another. This is important; it provides a basic level of security
and system reliability. (For efficiency, the system arranges to share the read-only
executable code of the same program among all the processes running that program. This
is transparent to the user and to the running program.)
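
A short experiment makes the separation concrete. In this sketch (the variable counter and the function parent_value_after_child() are hypothetical names of ours), a child process changes a variable, and the parent then looks at its own copy.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int counter = 1;     /* each process has its own copy */

/*
 * parent_value_after_child --- hypothetical helper: let a child change
 * its copy of counter, then return the parent's copy (-1 on error).
 */
int parent_value_after_child(void)
{
    pid_t child = fork();

    if (child < 0) {
        perror("fork");
        return -1;
    } else if (child == 0) {
        counter = 42;       /* changes only the child's address space */
        _exit(0);
    }

    if (waitpid(child, NULL, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return counter;         /* the parent's copy was never touched */
}
```

The parent still sees the value 1, because the assignment happened in a different address space.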
The current working directory is the one to which relative pathnames (those that
don't start with a / ) are relative. This is the directory you are "in" whenever you issue
a 'cd someplace' command to the shell.
By convention, all programs start out with three files already open: standard input,
standard output, and standard error. These are where input comes from, output goes
to, and error messages go to, respectively. In the course of this book, we will see how
these are put in place. A parent process can open additional files and have them already
available for a child process; the child will have to know they're there, either by way of
some convention or by a command-line argument or environment variable.
The environment is a set of strings, each of the form 'name=value'. Functions exist
for querying and setting environment variables, and child processes inherit the
environment of their parents. Typical environment variables are things like PATH and HOME in
the shell. Many programs look for the existence and value of specific environment
variables in order to control their behavior.
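
The standard getenv() and POSIX setenv() functions are two of the functions in question. The sketch below wraps them in two hypothetical helpers of our own, env_or_default() and set_if_unset(), to show the usual "look it up, fall back to a default" pattern.

```c
#define _POSIX_C_SOURCE 200809L     /* ask for the setenv() declaration */
#include <stdlib.h>

/*
 * env_or_default --- hypothetical helper: return the value of the
 * environment variable name, or dflt if it is unset or empty.
 */
const char *env_or_default(const char *name, const char *dflt)
{
    const char *val = getenv(name);

    return (val != NULL && val[0] != '\0') ? val : dflt;
}

/* set_if_unset --- give name a value only if it has none; 0 on success */
int set_if_unset(const char *name, const char *value)
{
    return setenv(name, value, 0);  /* 0 means "do not overwrite" */
}
```

A program might call env_or_default("HOME", "/") to pick a starting directory; any child process it later creates inherits whatever set_if_unset() added.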
It is important to understand that a single process may execute multiple programs
during its lifetime. Unless explicitly changed, all of the other system-maintained
attributes (current directory, open files, PID, etc.) remain the same. The separation of
"starting a new process" from "choosing which program to run" is a key Unix innovation.

It makes many operations simple and straightforward. Other operating systems that
combine the two operations are less general and more complicated to use.
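
The two steps correspond to the fork() and exec family of system calls. The following sketch, built around a hypothetical helper named run_in_dir(), shows why the separation is handy: between the two steps, the child can adjust its own attributes (here, its current directory) before the new program ever starts. The exit statuses 126 and 127 are borrowed from shell convention and are an assumption of this example.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * run_in_dir --- hypothetical helper: run the program named by argv[0]
 * with directory dir as its current directory, and return the program's
 * exit status (-1 on error).
 */
int run_in_dir(const char *dir, char *const argv[])
{
    pid_t child;
    int wstatus;

    child = fork();                 /* step 1: start a new process */
    if (child < 0) {
        perror("fork");
        return -1;
    } else if (child == 0) {
        /* between the steps, the child changes its own attributes */
        if (chdir(dir) < 0) {
            perror(dir);
            _exit(126);
        }
        execvp(argv[0], argv);      /* step 2: choose the program */
        perror(argv[0]);            /* reached only if execvp() failed */
        _exit(127);
    }

    if (waitpid(child, &wstatus, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : -1;
}
```

The parent's own current directory is unaffected, and the PID of the child stays the same across the execvp() call.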

1.2.1 Pipes: Hooking Processes Together


You've undoubtedly used the pipe construct ('|') in the shell to connect two or more
running programs. A pipe acts like a file: One process writes to it using the normal
write operation, and the other process reads from it using the read operation. The
processes don't (usually) know that their input/output is a pipe and not a regular file.
Just as the kernel hides the "magic" for devices, making them act like regular files,
so too the kernel does the work for pipes, arranging to pause the pipe's writer when the
pipe fills up and to pause the reader when no data is waiting to be read.
The file I/O paradigm with pipes thus acts as a key mechanism for connecting running
programs; no temporary files are needed. Again, generality and simplicity at work: no
special cases for user code.
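
What the shell arranges with '|' can be sketched directly with the pipe() system call. In this illustration (read_from_child() is a hypothetical helper of ours), a child writes a message into the pipe with plain write(), and the parent collects it with plain read(), exactly as it would from a file.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * read_from_child --- hypothetical helper: a child process writes msg
 * into a pipe; the parent reads it back into buf (size bytes, always
 * '\0'-terminated).  Returns the number of bytes read, or -1 on error.
 */
ssize_t read_from_child(const char *msg, char *buf, size_t size)
{
    int fds[2];         /* fds[0] is the read end, fds[1] the write end */
    pid_t child;
    ssize_t n, total = 0;

    if (size == 0 || pipe(fds) < 0)
        return -1;

    child = fork();
    if (child < 0) {
        perror("fork");
        close(fds[0]);
        close(fds[1]);
        return -1;
    } else if (child == 0) {
        close(fds[0]);                      /* the child only writes */
        if (write(fds[1], msg, strlen(msg)) < 0)
            _exit(1);
        _exit(0);
    }

    close(fds[1]);                          /* the parent only reads */
    while ((size_t) total < size - 1
           && (n = read(fds[0], buf + total, size - 1 - total)) > 0)
        total += n;                         /* read() gives 0 at EOF */
    buf[total] = '\0';
    close(fds[0]);

    (void) waitpid(child, NULL, 0);
    return total;
}
```

Closing the unused end in each process matters: the reader sees end-of-file only after every copy of the write end has been closed.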

1.3 Standard C vs. Original C


For many years, the de facto definition of C was found in the first edition of the
book The C Programming Language, by Brian Kernighan and Dennis Ritchie. This
book described C as it existed for Unix and on the systems to which the Bell Labs de-
velopers had ported it. Throughout this book, we refer to it as "Original C," although
it's also common for it to be referred to as "K&R C," after the book's two authors.
(Dennis Ritchie designed and implemented C.)
The 1990 ISO Standard for C formalized the language's definition, including the
functions in the C library (such as printf() and fopen()). The C standards committee
did an admirable job of standardizing existing practice and avoided inventing new
features, with one notable exception (and a few minor ones). The most visible change in
the language was the use of function prototypes, borrowed from C++.
Standard C, C++, and the Java programming language use function prototypes for
function declarations and definitions. A prototype describes not only the function's
return value but also the number and type of its arguments. With prototypes, a compiler
can do complete type checking at the point of a function call:

    extern int myfunc(struct my_struct *a,          Declaration
                      struct my_struct *b,
                      double c, int d);

    int myfunc(struct my_struct *a,                 Definition
               struct my_struct *b,
               double c, int d)
    {
        ...
    }

    ...
    struct my_struct s, t;
    int j;

    ...
    /* Function call, somewhere else: */
    j = my_func(& s, & t, 3.1415, 42);

This function call is fine. But consider an erroneous call:

    j = my_func(-1, -2, 0);                 Wrong number and types of arguments

The compiler can immediately diagnose this call as being invalid. However, in
Original C, functions are declared without the argument list being specified:

    extern int myfunc();                    Returns int, arguments unknown

Furthermore, function definitions list the parameter names in the function header,
and then declare the parameters before the function body. Parameters of type int don't
have to be declared, and if a function returns int, that doesn't have to be declared either:

    myfunc(a, b, c, d)                      Return type is int
    struct my_struct *a, *b;
    double c;                               Note, no declaration of parameter d
    {
        ...
    }

Consider again the same erroneous function call: 'j = my_func(-1, -2, 0);'. In
Original C, the compiler has no way of knowing that you've (accidentally, we assume)
passed the wrong arguments to my_func(). Such erroneous calls generally lead to
hard-to-find runtime problems (such as segmentation faults, whereby the program dies), and
the Unix lint program was created to deal with these kinds of things.
So, although function prototypes were a radical departure from existing practice,
their additional type checking was deemed too important to be without, and they were
added to the language with little opposition.

In 1990 Standard C, code written in the original style, for both declarations and
definitions, is valid. This makes it possible to continue to compile millions of lines of
existing code with a standard-conforming compiler. New code, obviously, should be
written with prototypes because of the improved possibilities for compile-time
error checking.
1999 Standard C continues to allow original style declarations and definitions.
However, the "implicit int" rule was removed; functions must have a return type, and
all parameters must be declared.
Furthermore, when a program called a function that had not been formally declared,
Original C would create an implicit declaration for the function, giving it a return type
of int. 1990 Standard C did the same, additionally noting that it had no information
about the parameters. 1999 Standard C no longer provides this "auto-declare" feature.
Other notable additions in Standard C are the const keyword, also from C++, and
the volatile keyword, which the committee invented. For the code you'll see in this
book, understanding the different function declaration and definition syntaxes is the
most important thing.
For V7 code using original style definitions, we have added comments showing the
equivalent prototype. Otherwise, we have left the code alone, preferring to show it
exactly as it was originally written and as you'll see it if you download the code yourself.
Although 1999 C adds some additional keywords and features beyond the 1990
version, we have chosen to stick to the 1990 dialect, since C99 compilers are not yet
commonplace. Practically speaking, this doesn't matter: C89 code should compile and
run without change when a C99 compiler is used, and the new C99 features don't affect
our discussion or use of the fundamental Linux/Unix APIs.

1.4 Why GNU Programs Are Better


What is it that makes a GNU program a GNU program?6 What makes GNU software
"better" than other (free or non-free) software? The most obvious difference is the GNU
General Public License (GPL), which describes the distribution terms for GNU software.
But this is usually not the reason you hear people saying "Get the GNU version of xyz,

6 This section is adapted from an article by the author that appeared in Issue 16 of Linux Journal. (See
http://www.linuxjournal.com/article.php?sid=1135.) Reprinted and adapted by permission.

it's much better." GNU software is generally more robust, and performs better, than
standard Unix versions. In this section we look at some of the reasons why, and at the
document that describes the principles of GNU software design.
The GNU Coding Standards describes how to write software for the GNU
project. It covers a range of topics. You can read the GNU Coding Standards online at
http://www.gnu.org/prep/standards.html. See the online version for pointers
to the source files in other formats.
In this section, we describe only those parts of the GNU Coding Standards that relate
to program design and implementation.

1.4.1 Program Design


Chapter 3 of the GNU Coding Standards provides general advice about program
design. The four main issues are compatibility (with standards and Unix), the language
to write in, reliance on nonstandard features of other programs (in a word, "none"),
and the meaning of "portability."
Compatibility with Standard C and POSIX, and to a lesser extent, with Berkeley
Unix is an important goal. But it's not an overriding one. The general idea is to provide
all necessary functionality, with command-line switches to provide a strict ISO or
POSIX mode.
C is the preferred language for writing GNU software since it is the most commonly
available language. In the Unix world, Standard C is now common, but if you can
easily support Original C, you should do so. Although the coding standards prefer C
over C++, C++ is now commonplace too. One widely used GNU package written in
C++ is groff (GNU troff). With GCC supporting C++, it has been our experience
that installing groff is not difficult.
The standards state that portability is a bit of a red herring. GNU utilities are
ultimately intended to run on the GNU kernel with the GNU C Library.7 But since the
kernel isn't finished yet and users are using GNU tools on non-GNU systems,
portability is desirable, just not paramount. The standard recommends using Autoconf for
achieving portability among different Unix systems.

7 This statement refers to the HURD kernel, which is still under development (as of early 2004). GCC and GNU
C Library (GLIBC) development take place mostly on Linux-based systems today.

1.4.2 Program Behavior


Chapter 4 of the GNU Coding Standards provides general advice about program be-
havior. We will return to look at one of its sections in detail, below. The chapter focuses
on program design, formatting error messages, writing libraries (by making them
reentrant), and standards for the command-line interface.
Error message formatting is important since several tools, notably Emacs, use the
error messages to help you go straight to the point in the source file or data file at which
an error occurred.
GNU utilities should use a function named getopt_long() for processing
the command line. This function provides command-line option parsing for both
traditional Unix-style options ('gawk -F: ...') and GNU-style long options
('gawk --field-separator=: ...'). All programs should provide --help and
--version options, and when a long name is used in one program, it should be used
the same way in other GNU programs. To this end, there is a rather exhaustive list of
long options used by current GNU programs.
As a simple yet obvious example, --verbose is spelled exactly the same way in all
GNU programs. Contrast this to -v, -V, -d, etc., in many Unix programs. Most of
Chapter 2, "Arguments, Options, and the Environment," page 23, is devoted to the
mechanics of argument and option parsing.
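
To give a flavor of what is coming, here is a minimal sketch of getopt_long() handling both styles of option. The structure settings and the function parse_options() are hypothetical names of ours, and the three options are merely modeled on the gawk examples above; this is not gawk's actual option table.

```c
#include <stdio.h>
#include <getopt.h>

/* Hypothetical option settings for an imaginary utility. */
struct settings {
    int verbose;            /* set by -v or --verbose */
    int help;               /* set by -h or --help */
    const char *sep;        /* set by -F or --field-separator */
};

/*
 * parse_options --- fill in *s from the command line.  Returns the index
 * of the first operand in argv, or -1 for an unrecognized option.
 */
int parse_options(int argc, char **argv, struct settings *s)
{
    static const struct option longopts[] = {
        { "field-separator", required_argument, NULL, 'F' },
        { "help",            no_argument,       NULL, 'h' },
        { "verbose",         no_argument,       NULL, 'v' },
        { NULL, 0, NULL, 0 }
    };
    int c;

    s->verbose = s->help = 0;
    s->sep = " ";           /* an arbitrary default for this sketch */
    optind = 1;             /* start over, in case of an earlier scan */

    while ((c = getopt_long(argc, argv, "F:hv", longopts, NULL)) != -1) {
        switch (c) {
        case 'F':
            s->sep = optarg;
            break;
        case 'h':
            s->help = 1;
            break;
        case 'v':
            s->verbose = 1;
            break;
        default:            /* getopt_long() already printed a message */
            return -1;
        }
    }
    return optind;
}
```

Because each long option maps onto the same letter as its short form, '-F:' and '--field-separator=:' end up in the same case of the switch.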

1.4.3 C Code Programming


The most substantive part of the GNU Coding Standards is Chapter 5, which describes
how to write C code, covering things like formatting the code, correct use of comments,
using C cleanly, naming your functions and variables, and declaring, or not declaring,
standard system functions that you wish to use.
Code formatting is a religious issue; many people have different styles that they prefer.
We personally don't like the FSF's style, and if you look at gawk, which we maintain,
you'll see it's formatted in standard K&R style (the code layout style used in both edi-
tions of the Kernighan and Ritchie book). But this is the only variation in gawk from
this part of the coding standards.
Nevertheless, even though we don't like the FSF's style, we feel that when modifying
some other program, sticking to the coding style already used is of the utmost impor-
tance. Having a consistent coding style is more important than which coding style you

pick. The GNU Coding Standards also makes this point. (Sometimes, there is no
detectable consistent coding style, in which case the program is probably overdue for a
trip through either GNU indent or Unix's cb.)
What we find important about the chapter on C coding is that the advice is good
for any C coding, not just if you happen to be working on a GNU program. So, if
you're just learning C or even if you've been working in C (or C++) for a while, we
recommend this chapter to you since it encapsulates many years of experience.

1.4.4 Things That Make a GNU Program Better


We now examine the section titled Writing Robust Programs in Chapter 4, Program
Behavior for All Programs, of the GNU Coding Standards. This section provides the
principles of software design that make GNU programs better than their Unix counter-
parts. We quote selected parts of the chapter, with some examples of cases in which
these principles have paid off.
Avoid arbitrary limits on the length or number of any data structure, including
file names, lines, files, and symbols, by allocating all data structures
dynamically. In most Unix utilities, "long lines are silently truncated." This is not
acceptable in a GNU utility.

This rule is perhaps the single most important rule in GNU software design: no
arbitrary limits. All GNU utilities should be able to manage arbitrary amounts of data.
While this requirement perhaps makes it harder for the programmer, it makes things
much better for the user. At one point, we had a gawk user who regularly ran an awk
program on more than 650,000 files (no, that's not a typo) to gather statistics. gawk
would grow to over 192 megabytes of data space, and the program ran for around seven
CPU hours. He would not have been able to run his program using another awk
implementation.8
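
For the "long lines" case in particular, the rule comes down to growing a buffer on demand instead of declaring a fixed-size array. The sketch below is one simple way to do that; read_line() is a hypothetical helper of ours, not code from gawk or any other GNU utility. Because it returns an ordinary C string, it is only a starting point: a utility that must also cope with NUL characters in its input would need to return the length as well.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * read_line --- hypothetical helper: read one line of any length from fp
 * into freshly allocated storage, without the trailing newline.
 * Returns NULL at end-of-file or if memory runs out; the caller frees.
 */
char *read_line(FILE *fp)
{
    size_t size = 16, len = 0;      /* small start, grows as needed */
    char *buf = malloc(size), *tmp;
    int c;

    if (buf == NULL)
        return NULL;

    while ((c = getc(fp)) != EOF && c != '\n') {
        if (len + 1 >= size) {      /* keep room for the final '\0' */
            size *= 2;
            tmp = realloc(buf, size);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
        }
        buf[len++] = (char) c;
    }

    if (c == EOF && len == 0) {     /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

Doubling the buffer each time keeps the number of realloc() calls small even for very long lines; the only limit left is the amount of available memory.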
Utilities reading files should not drop NUL characters, or any other
nonprinting characters including those with codes above 0177. The only sensible
exceptions would be utilities specifically intended for interface to certain types of
terminals or printers that can't handle those characters.

8 This situation occurred circa 1993; the truism is even more obvious today, as users process gigabytes of log files
with gawk.

It is also well known that Emacs can edit any arbitrary file, including files containing
binary data!
Whenever possible, try to make programs work properly with sequences of
bytes that represent multibyte characters, using encodings such as UTF-8
and others.9 Check every system call for an error return, unless you know
you wish to ignore errors. Include the system error text (from perror or
equivalent) in every error message resulting from a failing system call, as well
as the name of the file if any and the name of the utility. Just "cannot open
foo.c" or "stat failed" is not sufficient.
Checking every system call provides robustness. This is another case in which life is
harder for the programmer but better for the user. An error message detailing what
exactly went wrong makes finding and solving any problems much easier.10
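
In code, following this advice might look like the sketch below. The helper open_or_complain() and the exact message layout are our own assumptions for illustration; the point is simply that the message carries the utility name, the file name, and the system's error text.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/*
 * open_or_complain --- hypothetical helper: open file for reading.  On
 * failure, print "utility: file: cannot open: system error text" on
 * standard error.  Returns the file descriptor, or -1 on failure.
 */
int open_or_complain(const char *progname, const char *file)
{
    int fd = open(file, O_RDONLY);

    if (fd < 0) {
        int saved = errno;      /* fprintf() itself could change errno */

        fprintf(stderr, "%s: %s: cannot open: %s\n",
                progname, file, strerror(saved));
        errno = saved;          /* let the caller examine it too */
    }
    return fd;
}
```

Saving errno before calling any other library function is a small habit that prevents the real cause of the failure from being overwritten.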
Finally, we quote from Chapter 1 of the GNU Coding Standards, which discusses
how to write your program differently from the way a Unix program may have
been written.
For example, Unix utilities were generally optimized to minimize memory
use; if you go for speed instead, your program will be very different. You
could keep the entire input file in core and scan it there instead of using
stdio. Use a smarter algorithm discovered more recently than the Unix pro-
gram. Eliminate use of temporary files. Do it in one pass instead of two (we
did this in the assembler).
Or, on the contrary, emphasize simplicity instead of speed. For some appli-
cations, the speed of today's computers makes simpler algorithms adequate.
Or go for generality. For example, Unix programs often have static tables or
fixed-size strings, which make for arbitrary limits; use dynamic allocation
instead. Make sure your program handles NULs and other funny characters
in the input files. Add a programming language for extensibility and write
part of the program in that language.

9 Section 13.4, "Can You Spell That for Me, Please?", page 521, provides an overview of multibyte characters and
encodings.
10 The mechanics of checking for and reporting errors are discussed in Section 4.3, "Determining What Went
Wrong," page 86.

Or turn some parts of the program into independently usable libraries. Or
use a simple garbage collector instead of tracking precisely when to free
memory, or use a new GNU facility such as obstacks.
An excellent example of the difference an algorithm can make is GNU diff. One
of our system's early incarnations was an AT&T 3B1: a system with a MC68010 pro-
cessor, a whopping two megabytes of memory and 80 megabytes of disk. We did
(and do) lots of editing on the manual for gawk, a file that is almost 28,000 lines long
(although at the time, it was only in the 10,000-lines range). We used to use 'diff -c'
quite frequently to look at our changes. On this slow system, switching to GNU diff
made a stunning difference in the amount of time it took for the context diff to appear.
The difference is almost entirely due to the better algorithm that GNU diff uses.
The final paragraph mentions the idea of structuring a program as an independently
usable library, with a command-line wrapper or other interface around it. One example
of this is GDB, the GNU debugger, which is partially implemented as a command-line
tool on top of a debugging library. (The separation of the GDB core functionality from
the command interface is an ongoing development project.) This implementation makes
it possible to write a graphical debugging interface on top of the basic debugging
functionality.

1.4.5 Parting Thoughts about the "GNU Coding Standards"


The GNU Coding Standards is a worthwhile document to read if you wish to develop
new GNU software, enhance existing GNU software, or just learn how to be a better
programmer. The principles and techniques it espouses are what make GNU software
the preferred choice of the Unix community.

1.5 Portability Revisited


Portability is something of a holy grail; always sought after, but not always obtainable,
and certainly not easily. There are several aspects to writing portable code. The GNU
Coding Standards discusses many of them. But there are others as well. Keep portability
in mind at both higher and lower levels as you develop. We recommend these practices:
Code to standards.
Although it can be challenging, it pays to be familiar with the formal standards
for the language you're using. In particular, pay attention to the 1990 and 1999

ISO standards for C and the 2003 standard for C++ since most Linux programming
is done in one of those two languages.
Also, the POSIX standard for library and system call interfaces, while large, has
broad industry support. Writing to POSIX greatly improves the chances of suc-
cessfully moving your code to other systems besides GNU/Linux. This standard
is quite readable; it distills decades of experience and good practice.
Pick the best interface for the job.
If a standard interface does what you need, use it in your code. Use Autoconf to
detect an unavailable interface, and supply a replacement version of it for deficient
systems. (For example, some older systems lack the memmove() function, which
is fairly easy to code by hand or to pull from the GLIBC library.)
Isolate portability problems behind new interfaces.
Sometimes, you may need to do operating-system-specific tasks that apply on
some systems but not on others. (For example, on some systems, each program
has to expand command-line wildcards instead of the shell doing it.) Create a new
interface that does nothing on systems that don't need it but does the correct thing
on systems that do.
Use Autoconf for configuration.
Avoid #ifdef if possible. If not, bury it in low-level library code. Use Autoconf
to do the checking for the tests to be performed with #ifdef.
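
As an illustration of the memmove() case mentioned above, here is what a hand-coded replacement might look like. This is only a sketch: the name lpe_memmove() is hypothetical, and the pointer comparison assumes the flat address space found on typical Unix systems. In a real package, an Autoconf check such as AC_CHECK_FUNCS(memmove) would define HAVE_MEMMOVE when the system version exists, and the replacement would be compiled in only when that symbol is missing.

```c
#include <stddef.h>

/*
 * lpe_memmove --- hypothetical stand-in for memmove() on a system that
 * lacks it: copy n bytes from src to dest, even when the areas overlap.
 */
void *lpe_memmove(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    if (d < s) {
        while (n-- > 0)         /* copy forward */
            *d++ = *s++;
    } else if (d > s) {
        d += n;                 /* copy backward so overlap is safe */
        s += n;
        while (n-- > 0)
            *--d = *--s;
    }
    return dest;
}
```

Keeping the replacement in its own low-level source file means the rest of the program can call the function unconditionally, with no #ifdef in sight.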

1.6 Suggested Reading


1. The C Programming Language, 2nd edition, by Brian W. Kernighan and Dennis
M. Ritchie. Prentice-Hall, Englewood Cliffs, New Jersey, USA, 1989. ISBN:
0-13-110370-9.
This is the "bible" for C, covering the 1990 version of Standard C. It is a rather
dense book, with lots of information packed into a startlingly small number
of pages. You may need to read it through more than once; doing so is well
worth the trouble.
2. C, A Reference Manual, 5th edition, by Samuel P. Harbison III and Guy L.
Steele, Jr. Prentice-Hall, Upper Saddle River, New Jersey, USA, 2002. ISBN:
0-13-089592-X.

This book is also a classic. It covers Original C as well as the 1990 and 1999
standards. Because it is current, it makes a valuable companion to The C
Programming Language. It covers many important items, such as
internationalization-related types and library functions, that aren't in the Kernighan and
Ritchie book.
3. Notes on Programming in C, by Rob Pike, February 21, 1989. Available
on the Web from many sites. Perhaps the most widely cited location is
http://www.lysator.liu.se/c/pikestyle.html. (Many other useful
articles are available from one level up: http://www.lysator.liu.se/c/.)
Rob Pike worked for many years at the Bell Labs research center where C and
Unix were invented and did pioneering development there. His notes distill
many years of experience into a "philosophy of clarity in programming" that
is well worth reading.
4. The various links at http://www.chris-lott.org/resources/cstyle/.
This site includes Rob Pike's notes and several articles by Henry Spencer. Of
particular note is the Recommended C Style and Coding Standards, originally
written at the Bell Labs Indian Hill site.

1.7 Summary
• "Files and processes" summarizes the Linux/Unix worldview. The treatment of
files as byte streams and devices as files, and the use of standard input, output,
and error, simplify program design and unify the data access model. The
permissions model is simple, yet flexible, applying to both files and directories.
• Processes are running programs that have user and group identifiers associated
with them for permission checking, as well as other attributes such as open files
and a current working directory.
• The most visible difference between Standard C and Original C is the use of
function prototypes for stricter type checking. A good C programmer should be
able to read Original-style code, since many existing programs use it. New code
should be written using prototypes.
• The GNU Coding Standards describe how to write GNU programs. They provide
numerous valuable techniques and guiding principles for producing robust, usable

software. The "no arbitrary limits" principle is perhaps the single most important
of these. This document is required reading for serious programmers.
• Making programs portable is a significant challenge. Guidelines and tools help,
but ultimately experience is needed too.

Exercises

1. Read and comment on the article "The GNU Project",11 by Richard M.
Stallman, originally written in August of 1998.

11 http://www.gnu.org/gnu/thegnuproject.html


In this chapter

• 2.1 Option and Argument Conventions                  page 24
• 2.2 Basic Command-Line Processing                    page 28
• 2.3 Option Parsing: getopt() and getopt_long()       page 30
• 2.4 The Environment                                  page 40
• 2.5 Summary                                          page 49
• Exercises                                            page 50

Command-line option and argument interpretation is usually the first task of
any program. This chapter examines how C (and C++) programs access their
command-line arguments, describes standard routines for parsing options, and takes
a look at the environment.

2.1 Option and Argument Conventions


The word arguments has two meanings. The more technical definition is "all the
'words' on the command line." For example:
    $ ls main.c opts.c process.c

Here, the user typed four "words." All four words are made available to the program
as its arguments.
The second definition is more informal: Arguments are all the words on the command
line except the command name. By default, Unix shells separate arguments from each
other with whitespace (spaces or TAB characters). Quoting allows arguments to include
whitespace:
    $ echo here    are   lots    of   spaces
    here are lots of spaces                         The shell "eats" the spaces
    $ echo "here    are   lots    of   spaces"
    here    are   lots    of   spaces               Spaces are preserved

Quoting is transparent to the running program; echo never sees the double-quote
characters. (Double and single quotes are different in the shell; a discussion of the rules
is beyond the scope of this book, which focuses on C programming.)
Arguments can be further classified as options or operands. In the previous two
examples all the arguments were operands: files for ls and raw text for echo.
Options are special arguments that each program interprets. Options change a pro-
gram's behavior, or they provide information to the program. By ancient convention,
(almost) universally adhered to, options start with a dash (a.k.a. hyphen, minus sign)
and consist of a single letter. Option arguments are information needed by an option,
as opposed to regular operand arguments. For example, the fgrep program's -f option
means "use the contents of the following file as a list of strings to search for." See
Figure 2.1.


    fgrep -f patfile foo.c bar.c baz.c

    fgrep                   Command name
    -f                      Option
    patfile                 Option argument
    foo.c bar.c baz.c       Operands

FIGURE 2.1
Command-line components

Thus, patfile is not a data file to search, but rather it's for use by fgrep in defining
the list of strings to search for.

2.1.1 POSIX Conventions


The POSIX standard describes a number of conventions that standard-conforming
programs adhere to. Nothing requires that your programs adhere to these standards,
but it's a good idea for them to do so : Linux and Unix users the world over understand
and use these conventions, and if your program doesn't follow them, your users will
be unhappy. (Or you won't have any users!) Furthermore, the functions we discuss
later in this chapter relieve you of the burden of manually adhering to these conventions
for each program you write. Here they are, paraphrased from the standard:

1. Program names should have no less than two and no more than nine characters.
2. Program names should consist of only lowercase letters and digits.
3. Option names should be single alphanumeric characters. Multidigit options
should not be allowed. For vendors implementing the POSIX utilities, the -W
option is reserved for vendor-specific options.
4. All options should begin with a '-' character.
5. For options that don't require option arguments, it should be possible to group
multiple options after a single '-' character. (For example, 'foo -a -b -c'
and 'foo -abc' should be treated the same way.)
6. When an option does require an option argument, the argument should be
separated from the option by a space (for example, 'fgrep -f patfile').

The standard, however, does allow for historical practice, whereby sometimes
the option and the operand could be in the same string: 'fgrep -fpatfile'.
In practice, the getopt() and getopt_long() functions interpret '-fpatfile'
as '-f patfile', not as '-f -p -a -t ...'.
7. Option arguments should not be optional.
This means that when a program documents an option as requiring an option
argument, that option's argument must always be present or else the program
will fail. GNU getopt() does provide for optional option arguments since
they're occasionally useful.
8. If an option takes an argument that may have multiple values, the program
should receive that argument as a single string, with values separated by commas
or whitespace.
For example, suppose a hypothetical program myprog requires a list of users
for its -u option. Then, it should be invoked in one of these two ways:

    myprog -u "arnold,joe,jane"             Separate with commas
    myprog -u "arnold joe jane"             Separate with whitespace

In such a case, you're on your own for splitting out and processing each value
(that is, there is no standard routine), but doing so manually is usually
straightforward.
9. Options should come first on the command line, before operands. Unix versions
of getopt() enforce this convention. GNU getopt() does not by default,
although you can tell it to.
10. The special argument '--' indicates the end of all options. Any subsequent
arguments on the command line are treated as operands, even if they begin with
a dash.
11. The order in which options are given should not matter. However, for mutu-
ally exclusive options, when one option overrides the setting of another, then
(so to speak) the last one wins. If an option that has arguments is repeated, the
program should process the arguments in order. For example, 'myprog - u
arnold - u jane' is the same as 'myprog - u "arno l d , j ane"'. (You have
to enforce this yourself; getopt ( ) doesn ' t help you. )

12. It is OK for the order of operands to matter to a program. Each program should
document such things.
13. Programs that read or write named files should treat the single argument '-' as
meaning standard input or standard output, as is appropriate for the program.

Note that many standard programs don't follow all of the above conventions. The
primary reason is historical compatibility; many such programs predate the codifying
of these conventions.
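
Convention 8 leaves the splitting of a multivalued option argument to the program. The following is a minimal sketch of one way to do it with the standard strtok() function; the helper name split_users() and its interface are our own invention for illustration, not part of any library. It accepts either separator style shown for the hypothetical myprog.

```c
#include <string.h>

/*
 * split_users --- hypothetical helper: split a -u style list in place.
 * Separators are commas and whitespace. Stores up to max pointers
 * into names[] and returns how many were stored. The input string
 * is modified: strtok() writes '\0' over the separators.
 */
int split_users(char *list, char **names, int max)
{
    int n = 0;
    char *tok;

    for (tok = strtok(list, ", \t\n"); tok != NULL && n < max;
         tok = strtok(NULL, ", \t\n"))
        names[n++] = tok;

    return n;
}
```

Because strtok() modifies its argument, this sketch assumes the list lives in writable memory, which is the case for the strings in argv. A program that needs the original string intact should split a copy instead.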

2.1.2 GNU Long Options


As we saw in Section 1.4.2, "Program Behavior," page 16, GNU programs are
encouraged to use long options of the form --help, --verbose, and so on. Such options,
since they start with '--', do not conflict with the POSIX conventions. They also can
be easier to remember, and they provide the opportunity for consistency across all GNU
utilities. (For example, --help is the same everywhere, as compared with -h for "help,"
-i for "information," and so on.) GNU long options have their own conventions,
implemented by the getopt_long() function:

1. For programs implementing POSIX utilities, every short (single-letter) option
should also have a long option.
2. Additional GNU-specific long options need not have a corresponding short
option, but we recommend that they do.
3. Long options can be abbreviated to the shortest string that remains unique.
For example, if there are two options --verbose and --verbatim, the
shortest possible abbreviations are --verbo and --verba.
4. Option arguments are separated from long options either by whitespace or by
an = sign. For example, --sourcefile=/some/file or --sourcefile
/some/file.
5. Options and arguments may be interspersed with operands on the command
line; getopt_long() will rearrange things so that all options are processed
and then all operands are available sequentially. (This behavior can be
suppressed.)
6. Option arguments can be optional. For such options, the argument is deemed
to be present if it's in the same string as the option. This works only for short
options. For example, if -x is such an option, given 'foo -xYANKEES -y', the
argument to -x is 'YANKEES'. For 'foo -x -y', there is no argument to -x.
7. Programs can choose to allow long options to begin with a single dash. (This
is common with many X Window programs.)

Much of this will become clearer when we examine getopt_long() later in
the chapter.
The GNU Coding Standards devotes considerable space to listing all the long and
short options used by GNU programs. If you're writing a program that accepts long
options, see if option names already in use might make sense for you to use as well.

2.2 Basic Command-Line Processing


A C program accesses its command-line arguments through its parameters, argc
and argv. The argc parameter is an integer, indicating the number of arguments there
are, including the command name. There are two common ways to declare main(),
varying in how argv is declared:
int main(int argc, char *argv[])        int main(int argc, char **argv)

Practically speaking, there's no difference between the two declarations, although
the first is conceptually clearer: argv is an array of pointers to characters. The second
is more commonly used: argv is a pointer to a pointer. Also, the second definition is
technically more correct, and it is what we use. Figure 2.2 depicts this situation.

FIGURE 2.2
Memory for argv
[Figure: argv, a char **, points to an array of char * pointers. The entries point to the C strings "cat", "file1", and "file2", each terminated with '\0'. The final entry is a NULL pointer (binary zero).]

By convention, argv[0] is the program's name. (For details, see Section 9.1.4.3,
"Program Names and argv[0]," page 297.) Subsequent entries are the command line
arguments. The final entry in the argv array is a NULL pointer.
argc indicates how many arguments there are; since C is zero-based, it is always true
that 'argv[argc] == NULL'. Because of this, particularly in Unix code, you will see
different ways of checking for the end of arguments, such as looping until a counter is
greater than or equal to argc, or until 'argv[i] == 0' or while '*argv != NULL' and
so on. These are all equivalent.
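
To make the equivalence concrete, here is a small sketch of three such loops. The function names are ours, chosen only for illustration. Each function counts the arguments that follow the program name, and each assumes that argv[0] is present (that is, that argc is at least 1).

```c
#include <stddef.h>

/* Count arguments after the program name by comparing an index with argc. */
int count_by_argc(int argc, char **argv)
{
    int i, n = 0;

    (void) argv;    /* not needed for this style of loop */
    for (i = 1; i < argc; i++)
        n++;
    return n;
}

/* Same count, but stop at the NULL pointer that terminates argv. */
int count_by_null(char **argv)
{
    int i, n = 0;

    for (i = 1; argv[i] != NULL; i++)
        n++;
    return n;
}

/* Same count again, advancing the pointer itself instead of an index. */
int count_by_pointer(char **argv)
{
    int n = 0;

    for (argv++; *argv != NULL; argv++)
        n++;
    return n;
}
```

For the command line 'cat file1 file2', all three functions return 2.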

2.2.1 The V7 echo Program


Perhaps the simplest example of command-line processing is the V7 echo program,
which prints its arguments to standard output, separated by spaces and terminated with
a newline. If the first argument is -n, then the trailing newline is omitted. (This is used
for prompting from shell scripts.) Here's the code: 1
 1  #include <stdio.h>
 2
 3  main(argc, argv)                    int main(int argc, char **argv)
 4  int argc;
 5  char *argv[];
 6  {
 7      register int i, nflg;
 8
 9      nflg = 0;
10      if(argc > 1 && argv[1][0] == '-' && argv[1][1] == 'n') {
11          nflg++;
12          argc--;
13          argv++;
14      }
15      for(i=1; i<argc; i++) {
16          fputs(argv[i], stdout);
17          if (i < argc-1)
18              putchar(' ');
19      }
20      if(nflg == 0)
21          putchar('\n');
22      exit(0);
23  }

Only 23 lines! There are two points of interest. First, decrementing argc and
simultaneously incrementing argv (lines 12 and 13) are common ways of skipping initial
arguments. Second, the check for -n (line 10) is simplistic. -no-newline-at-the-end
also works. (Compile it and try it!)

1 See /usr/src/cmd/echo.c in the V7 distribution.


Manual option parsing is common in V7 code because the getopt() function hadn't
been invented yet.
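
If you do parse an option by hand, comparing the whole argument avoids the looseness of the V7 test. The following is only a sketch; the helper name wants_no_newline() is hypothetical and is not how any particular echo is written.

```c
#include <string.h>

/*
 * wants_no_newline --- hypothetical helper: return 1 only when the
 * first argument is exactly "-n". The V7 test looks at just the first
 * two characters, so it also accepts "-no-newline-at-the-end".
 */
int wants_no_newline(int argc, char **argv)
{
    return argc > 1 && strcmp(argv[1], "-n") == 0;
}
```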
Finally, here and in other places throughout the book, we see use of the register
keyword. At one time, this keyword provided a hint to the compiler that the given
variables should be placed in CPU registers, if possible. Use of this keyword is obsolete;
modern compilers all base register assignment on analysis of the source code, ignoring
the register keyword. We've chosen to leave code using it alone, but you should be
aware that it has no real use anymore. 2

2.3 Option Parsing: getopt() and getopt_long()


Circa 1980, for System III, the Unix Support Group within AT&T noted that each
Unix program used ad hoc techniques for parsing arguments. To make things easier
for users and developers, they developed most of the conventions we listed earlier. (The
statement in the System III intro(1) man page is considerably less formal than what's
in the POSIX standard, though.)
The Unix Support Group also developed the getopt() function, along with several
external variables, to make it easy to write code that follows the standard conventions.
The GNU getopt_long() function supplies a compatible version of getopt(), as
well as making it easy to parse long options of the form described earlier.

2.3.1 Single-Letter Options


The getopt() function is declared as follows:
#include <unistd.h>                                                   POSIX

int getopt(int argc, char *const argv[], const char *optstring);

extern char *optarg;
extern int optind, opterr, optopt;

The arguments argc and argv are normally passed straight from those of main().
optstring is a string of option letters. If any letter in the string is followed by a colon,
then that option is expected to have an argument.

2 When we asked Jim Meyering, the Coreutils maintainer, about instances of register in the GNU Coreutils,
he gave us an interesting response. He removes them when modifying code, but otherwise leaves them alone to
make it easier to integrate changes submitted against existing versions.
To use getopt(), call it repeatedly from a while loop until it returns -1. Each time
that it finds a valid option letter, it returns that letter. If the option takes an argument,
optarg is set to point to it. Consider a program that accepts a -a option that doesn't
take an argument and a -b option that does:
int oc;             /* option character */
char *b_opt_arg;

while ((oc = getopt(argc, argv, "ab:")) != -1) {
    switch (oc) {
    case 'a':
        /* handle -a, set a flag, whatever */
        break;
    case 'b':
        /* handle -b, get arg value from optarg */
        b_opt_arg = optarg;
        break;
    case ':':
        /* error handling, see text */
        break;
    case '?':
    default:
        /* error handling, see text */
        break;
    }
}

As it works, getopt() sets several variables that control error handling.

char *optarg
The argument for an option, if the option accepts one.
int optind
The current index in argv. When the while loop has finished, remaining
operands are found in argv[optind] through argv[argc-1]. (Remember that
'argv[argc] == NULL'.)
int opterr
When this variable is nonzero (which it is by default), getopt() prints its own
error messages for invalid options and for missing option arguments.
int optopt
When an invalid option character is found, getopt() returns either a '?' or a
':' (see below), and optopt contains the invalid character that was found.

People being human, it is inevitable that programs will be invoked incorrectly, either
with an invalid option or with a missing option argument. In the normal case, getopt()
prints its own messages for these cases and returns the '?' character. However, you
can change its behavior in two ways.
First, by setting opterr to 0 before invoking getopt(), you can force getopt()
to remain silent when it finds a problem.

Second, if the first character in the optstring argument is a colon, then getopt()
is silent and it returns a different character depending upon the error, as follows:

Invalid option
getopt() returns a '?' and optopt contains the invalid option character. (This
is the normal behavior.)
Missing option argument
getopt() returns a ':'. If the first character of optstring is not a colon, then
getopt() returns a '?', making this case indistinguishable from the invalid
option case.

Thus, making the first character of optstring a colon is a good idea since it allows
you to distinguish between "invalid option" and "missing option argument." The cost
is that using the colon also silences getopt(), forcing you to supply your own error
messages. Here is the previous example, this time with error message handling:
int oc;             /* option character */
char *b_opt_arg;

while ((oc = getopt(argc, argv, ":ab:")) != -1) {
    switch (oc) {
    case 'a':
        /* handle -a, set a flag, whatever */
        break;
    case 'b':
        /* handle -b, get arg value from optarg */
        b_opt_arg = optarg;
        break;
    case ':':
        /* missing option argument */
        fprintf(stderr, "%s: option '-%c' requires an argument\n",
                argv[0], optopt);
        break;
    case '?':
    default:
        /* invalid option */
        fprintf(stderr, "%s: option '-%c' is invalid: ignored\n",
                argv[0], optopt);
        break;
    }
}
A word about flag or option variable-naming conventions: Much Unix code uses
names of the form xflg for any given option letter x (for example, nflg in the V7
echo; xflag is also common). This may be great for the program's author, who happens
to know what the x option does without having to check the documentation. But it's
unkind to someone else trying to read the code who doesn't know the meaning of all
the option letters by heart. It is much better to use names that convey the option's
meaning, such as no_newline for echo's -n option.

2.3.2 GNU getopt() and Option Ordering


The standard getopt() function stops looking for options as soon as it encounters
a command-line argument that doesn't start with a '-'. GNU getopt() is different:
It scans the entire command line looking for options. As it goes along, it permutes
(rearranges) the elements of argv, so that when it's done, all the options have been
moved to the front and code that proceeds to examine argv[optind] through
argv[argc-1] works correctly. In all cases, the special argument '--' terminates
option scanning.
You can change the default behavior by using a special first character in optstring,
as follows:

optstring[0] == '+'
GNU getopt() behaves like standard getopt(); it returns options in the order
in which they are found, stopping at the first nonoption argument. This will also
be true if POSIXLY_CORRECT exists in the environment.
optstring[0] == '-'
GNU getopt() returns every command-line argument, whether or not it represents
an option. In this case, for each argument that is not an option, the function
returns the integer 1 and sets optarg to point to the string.

As for standard getopt(), if the first character of optstring is a ':', then GNU
getopt() distinguishes between "invalid option" and "missing option argument" by
returning '?' or ':', respectively. The ':' in optstring can be the second character
if the first character is '+' or '-'.
Finally, if an option letter in optstring is followed by two colon characters, then
that option is allowed to have an optional option argument. (Say that three times fast!)
Such an argument is deemed to be present if it's in the same argv element as the option,
and absent otherwise. In the case that it's absent, GNU getopt() returns the option
letter and sets optarg to NULL. For example, given
while ((c = getopt(argc, argv, "ab::")) != -1)

for -bYANKEES, the return value is 'b', and optarg points to "YANKEES", while
for -b or '-b YANKEES', the return value is still 'b' but optarg is set to NULL. In the
latter case, "YANKEES" is a separate command-line argument.

2.3.3 Long Options


The getopt_long() function handles the parsing of long options of the form
described earlier. An additional routine, getopt_long_only() works identically, but it
is used for programs where all options are long and options begin with a single '-'
character. Otherwise, both work just like the simpler GNU getopt() function. (For
brevity, whenever we say "getopt_long()," it's as if we'd said "getopt_long() and
getopt_long_only().") Here are the declarations, from the GNU/Linux getopt(3)
manpage:
#include <getopt.h>                                                   GLIBC

int getopt_long(int argc, char *const argv[],
                const char *optstring,
                const struct option *longopts, int *longindex);

int getopt_long_only(int argc, char *const argv[],
                const char *optstring,
                const struct option *longopts, int *longindex);

The first three arguments are the same as for getopt(). The next argument is a pointer
to an array of struct option, which we refer to as the long options table and which
is described shortly. The longindex parameter, if not set to NULL, points to a variable
which is filled in with the index in longopts of the long option that was found. This
is useful for error diagnostics, for example.

2.3.3.1 Long Options Table


Long options are described with an array of struct option structures. The struct
option is declared in <getopt.h>; it looks like this:
struct option {
    const char *name;
    int has_arg;
    int *flag;
    int val;
};

The elements in the structure are as follows:

const char *name
This is the name of the option, without any leading dashes, for example, "help"
or "verbose".
int has_arg
This describes whether the long option has an argument, and if so, what kind of
argument. The value must be one of those presented in Table 2.1.
The symbolic constants are macros for the numeric values given in the table. While
the numeric values work, the symbolic constants are considerably easier to read,
and you should use them instead of the corresponding numbers in any code that
you write.
int *flag
If this pointer is NULL, then getopt_long() returns the value in the val field of
the structure. If it's not NULL, the variable it points to is filled in with the value
in val and getopt_long() returns 0. If the flag isn't NULL but the long option
is never seen, then the pointed-to variable is not changed.
int val
This is the value to return if the long option is seen or to load into *flag if flag
is not NULL. Typically, if flag is not NULL, then val is a true/false value, such as
1 or 0. On the other hand, if flag is NULL, then val is usually a character constant.
If the long option corresponds to a short one, the character constant should be
the same one that appears in the optstring argument for this option. (All of this
will become clearer shortly when we see some examples.)

Each long option has a single entry with the values appropriately filled in. The last
element in the array should have zeros for all the values. The array need not be sorted;
getopt_long() does a linear search. However, sorting it by long name may make it
easier for a programmer to read.
TABLE 2.1
Values for has_arg

Symbolic constant     Numeric value    Meaning
no_argument           0                The option does not take an argument.
required_argument     1                The option requires an argument.
optional_argument     2                The option's argument is optional.

The use of flag and val seems confusing at first encounter. Let's step back for a
moment and examine why it works the way it does. Most of the time, option processing
consists of setting different flag variables when different option letters are seen, like so:
while ((c = getopt(argc, argv, ":af:hv")) != -1) {
    switch (c) {
    case 'a':
        do_all = 1;
        break;
    case 'f':
        myfile = optarg;
        break;
    case 'h':
        do_help = 1;
        break;
    case 'v':
        do_verbose = 1;
        break;
    ...                             Error handling code here
    }
}

When flag is not NULL, getopt_long() sets the variable for you. This reduces the
three cases in the previous switch to one case. Here is an example long options table
and the code to go with it:
int do_all, do_help, do_verbose;    /* flag variables */
char *myfile;

struct option longopts[] = {
    { "all",     no_argument,       & do_all,     1   },
    { "file",    required_argument, NULL,         'f' },
    { "help",    no_argument,       & do_help,    1   },
    { "verbose", no_argument,       & do_verbose, 1   },
    { 0, 0, 0, 0 }
};
while ((c = getopt_long(argc, argv, ":f:", longopts, NULL)) != -1) {
    switch (c) {
    case 'f':
        myfile = optarg;
        break;
    case 0:
        /* getopt_long() set a variable, just keep going */
        break;
    ...                             Error handling code here
    }
}

Notice that the value passed for the optstring argument no longer contains 'a',
'h', or 'v'. This means that the corresponding short options are not accepted. To allow
both long and short options, you would have to restore the corresponding cases from
the first example to the switch.
Practically speaking, you should write your programs such that each short option
also has a corresponding long option. In this case, it's easiest to have flag be NULL and
val be the corresponding single letter.
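
As a sketch of that recommendation, the table below gives each long option a NULL flag and the matching short letter as val, so a single case in the switch serves both spellings. The option set (-a/--all and -f/--file) and the function name parse_long() are assumptions made for this illustration, and the optind reset relies on GNU behavior.

```c
#include <stddef.h>
#include <getopt.h>

/* Hypothetical option set: -a/--all and -f FILE/--file=FILE. */
static const struct option longopts[] = {
    { "all",  no_argument,       NULL, 'a' },
    { "file", required_argument, NULL, 'f' },
    { 0, 0, 0, 0 }
};

/*
 * parse_long --- set *do_all and *myfile from either spelling of each
 * option. Returns 0 on success, -1 on an invalid option or a missing
 * option argument.
 */
int parse_long(int argc, char **argv, int *do_all, char **myfile)
{
    int c;

    *do_all = 0;
    *myfile = NULL;
    optind = 0;     /* GNU-specific full reset, so the function can be reused */

    while ((c = getopt_long(argc, argv, ":af:", longopts, NULL)) != -1) {
        switch (c) {
        case 'a':           /* reached by both -a and --all */
            *do_all = 1;
            break;
        case 'f':           /* reached by both -f and --file */
            *myfile = optarg;
            break;
        default:            /* ':' or '?' */
            return -1;
        }
    }
    return 0;
}
```

With this arrangement, 'prog --all --file=/some/file' and 'prog -a -f /some/file' produce the same settings.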

2.3.3.2 Long Options, POSIX Style


The POSIX standard reserves the -W option for vendor-specific features. Thus, by
definition, -W isn't portable across different systems.
If W appears in the optstring argument followed by a semicolon (note: not a colon),
then getopt_long() treats -Wlongopt the same as --longopt. Thus, in the previous
example, change the call to be:
while ((c = getopt_long(argc, argv, ":f:W;", longopts, NULL)) != -1) {

With this change, -Wall is the same as --all and -Wfile=myfile is the same as
--file=myfile. The use of a semicolon makes it possible for a program to use -W as
a regular option, if desired. (For example, GCC uses it as a regular option, whereas
gawk uses it for POSIX conformance.)

2.3.3.3 getopt_long() Return Value Summary


As should be clear by now, getopt_long() provides a flexible mechanism for option
parsing. Table 2.2 summarizes the possible return values and their meaning.
TABLE 2.2
getopt_long() return values

Return code    Meaning
0              getopt_long() set a flag as found in the long option table.
1              optarg points at a plain command-line argument.
'?'            Invalid option.
':'            Missing option argument.
'x'            Option character 'x'.
-1             End of options.

Finally, we enhance the previous example code, showing the full switch statement:
int do_all, do_help, do_verbose;    /* flag variables */
char *myfile, *user;                /* input file, user name */

struct option longopts[] = {
    { "all",     no_argument,       & do_all,     1   },
    { "file",    required_argument, NULL,         'f' },
    { "help",    no_argument,       & do_help,    1   },
    { "verbose", no_argument,       & do_verbose, 1   },
    { "user",    optional_argument, NULL,         'u' },
    { 0, 0, 0, 0 }
};

while ((c = getopt_long(argc, argv, ":ahvf:u::W;", longopts, NULL)) != -1) {
    switch (c) {
    case 'a':
        do_all = 1;
        break;
    case 'f':
        myfile = optarg;
        break;
    case 'h':
        do_help = 1;
        break;
    case 'u':
        if (optarg != NULL)
            user = optarg;
        else
            user = "root";
        break;
    case 'v':
        do_verbose = 1;
        break;
    case 0:     /* getopt_long() set a variable, just keep going */
        break;
#if 0
    case 1:
        /*
         * Use this case if getopt_long() should go through all
         * arguments. If so, add a leading '-' character to optstring.
         * Actual code, if any, goes here.
         */
        break;
#endif
    case ':':   /* missing option argument */
        fprintf(stderr, "%s: option '-%c' requires an argument\n",
                argv[0], optopt);
        break;
    case '?':
    default:    /* invalid option */
        fprintf(stderr, "%s: option '-%c' is invalid: ignored\n",
                argv[0], optopt);
        break;
    }
}

In your programs, you may wish to have comments for each option letter explaining
what each one does. However, if you've used descriptive variable names for each option
letter, comments are not as necessary. (Compare do_verbose to vflg.)

2.3.3.4 GNU getopt() or getopt_long() in User Programs


You may wish to use GNU getopt() or getopt_long() in your own programs
and have them run on non-Linux systems. That's OK; just copy the source files from
a GNU program or from the GNU C Library (GLIBC) CVS archive. 3 The source files
are getopt.h, getopt.c, and getopt1.c. They are licensed under the GNU Lesser
General Public License, which allows library functions to be included even in proprietary
programs. You should include a copy of the file COPYING.LIB with your program,
along with the files getopt.h, getopt.c, and getopt1.c.
Include the source files in your distribution, and compile them along with any other
source files. In your source code that calls getopt_long(), use '#include
<getopt.h>', not '#include "getopt.h"'. Then, when compiling, add -I. to the
C compiler's command line. That way, the local copy of the header file will be
found first.

3 See http://sources.redhat.com.


You may be wondering, "Gee, I already use GNU/Linux. Why should I include
getopt_long () in my executable, making it bigger, if the routine is already in the C
library?" That's a good question. However, there's nothing to worry about. The source
code is set up so that if it's compiled on a system that uses GLIBC, the compiled files
will not contain any code! Here's the proof, on our system:
$ uname -a                                  Show system name and type
Linux example 2.4.18-14 #1 Wed Sep 4 13:35:50 EDT 2002 i686 i686 i386 GNU/Linux
$ ls -l getopt.o getopt1.o                  Show file sizes
-rw-r--r--    1 arnold   devel        9836 Mar 24 13:55 getopt.o
-rw-r--r--    1 arnold   devel       10324 Mar 24 13:55 getopt1.o
$ size getopt.o getopt1.o                   Show sizes included in executable
   text    data     bss     dec     hex filename
      0       0       0       0       0 getopt.o
      0       0       0       0       0 getopt1.o
The size command prints the sizes of the various parts of a binary object or
executable file. We explain the output in Section 3.1, "Linux/Unix Address Space," page 52.
What's important to understand right now is that, despite the nonzero sizes of the files
themselves, they don't contribute anything to the final executable. (We think this is
pretty neat.)

2.4 The Environment


The environment is a set of 'name=value' pairs for each program. These pairs are
termed environment variables. Each name consists of one to any number of alphanumeric
characters or underscores ('_'), but the name may not start with a digit. (This rule is
enforced by the shell; the C API can put anything it wants to into the environment, at
the likely cost of confusing subsequent programs.)
Environment variables are often used to control program behavior. For example, if
POSIXLY_CORRECT exists in the environment, many GNU programs disable extensions
or historical behavior that isn't compatible with the POSIX standard.
You can decide (and should document) the environment variables that your program
will use to control its behavior. For example, you may wish to use an environment
variable for debugging options instead of a command-line argument. The advantage
of using environment variables is that users can set them in their startup file and not
have to remember to always supply a particular set of command-line options.
Of course, the disadvantage to using environment variables is that they can silently
change a program's behavior. Jim Meyering, the maintainer of the Coreutils, put it
this way:

It makes it easy for the user to customize how the program works without
changing how the program is invoked. That can be both a blessing and a
curse. If you write a script that depends on your having a certain environment
variable set, but then have someone else use that same script, it may well fail
(or worse, silently produce invalid results) if that other person doesn't have
the same environment settings.

2.4.1 Environment Management Functions


Several functions let you retrieve the values of environment variables, change their
values, or remove them. Here are the declarations:
#include <stdlib.h>

char *getenv(const char *name);                     ISO C: Retrieve environment variable
int setenv(const char *name, const char *value,     POSIX: Set environment variable
           int overwrite);
int putenv(char *string);                           XSI: Set environment variable, uses string
void unsetenv(const char *name);                    POSIX: Remove environment variable
int clearenv(void);                                 Common: Clear entire environment
The getenv() function is the one you will use 99 percent of the time. The argument
is the environment variable name to look up, such as "HOME" or "PATH". If the variable
exists, getenv() returns a pointer to the character string value. If not, it returns NULL.
For example:
char *pathval;

/* Look for PATH; if not present, supply a default value */
if ((pathval = getenv("PATH")) == NULL)
    pathval = "/bin:/usr/bin:/usr/ucb";

Occasionally, environment variables exist, but with empty values. In this case, the
return value will be non-NULL, but the first character pointed to will be the zero byte,
which is the C string terminator, '\0'. Your code should be careful to check that the
return value is not NULL. Even if it isn't NULL, also check that the string is
not empty if you intend to use its value for something. In any case, don't just blindly
use the returned value.
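
Both checks fit naturally into one small wrapper. This is a sketch only; env_or_default() is a hypothetical helper name, not a library function.

```c
#include <stdlib.h>

/*
 * env_or_default --- hypothetical helper: return the value of the
 * environment variable name, or dflt if the variable is missing or
 * is set to the empty string.
 */
const char *env_or_default(const char *name, const char *dflt)
{
    const char *val = getenv(name);

    if (val == NULL || val[0] == '\0')
        return dflt;

    return val;
}
```

Whether an empty value should really be treated like a missing one depends on the program; some programs give an empty setting its own meaning.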
To change an environment variable or to add a new one to the environment,
use setenv():
if (setenv("PATH", "/bin:/usr/bin:/usr/ucb", 1) != 0) {
    /* handle failure */
}

It's possible that a variable already exists in the environment. If the third argument
is true (nonzero), then the supplied value overwrites the previous one. Otherwise, it
doesn't. The return value is -1 if there was no memory for the new variable, and 0
otherwise. setenv() makes private copies of both the variable name and the new value
for storing in the environment.
A simpler alternative to setenv() is putenv(), which takes a single "name=value"
string and places it in the environment:
if (putenv("PATH=/bin:/usr/bin:/usr/ucb") != 0) {
    /* handle failure */
}

putenv() blindly replaces any previous value for the same variable. Also, and perhaps
more importantly, the string passed to putenv() is placed directly into the environment.
This means that if your code later modifies this string (for example, if it was an array,
not a string constant), the environment is modified also. This in turn means that you
should not use a local variable as the parameter for putenv(). For all these reasons
setenv() is preferred.
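
The aliasing is easy to demonstrate. In the following sketch, the variable name LPE_DEMO and both function names are made up for the illustration. The array has static storage because putenv() keeps a pointer to it.

```c
#include <stdlib.h>

/*
 * Static storage: the array must outlive the call, since putenv()
 * keeps a pointer to it instead of copying. LPE_DEMO is a made-up
 * variable name used only for this illustration.
 */
static char demo_entry[] = "LPE_DEMO=one";

/* Install demo_entry in the environment; returns putenv()'s result. */
int install_demo(void)
{
    return putenv(demo_entry);
}

/* Overwrite the value part of demo_entry in place ("one" becomes "two"). */
void scribble_on_demo(void)
{
    demo_entry[9]  = 't';
    demo_entry[10] = 'w';
    demo_entry[11] = 'o';
}
```

After install_demo(), getenv("LPE_DEMO") yields "one". After scribble_on_demo(), the same lookup yields "two", even though no environment function was called in between. Had the variable been set with setenv(), the environment would hold its own copy and would not change.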

NOTE The GNU putenv() has an additional (documented) quirk to its
behavior. If the argument string is a name, then without an = character, the
named variable is removed. The GNU env program, which we look at later in
this chapter, relies on this behavior.

The unsetenv() function removes a variable from the environment:
unsetenv("PATH");

Finally, the clearenv() function clears the environment entirely:
if (clearenv() != 0) {
    /* handle failure */
}

This function is not standardized by POSIX, although it's available in GNU/Linux
and several commercial Unix variants. You should use it if your application must be
very security conscious and you want it to build its own environment entirely from
scratch. If clearenv() is not available, the GNU/Linux clearenv(3) manpage
recommends using 'environ = NULL;' to accomplish the task.
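
A program that builds its environment from scratch might do so along the following lines. This is a sketch that assumes the GNU C Library, where defining _DEFAULT_SOURCE makes <stdlib.h> declare clearenv(). The function name reset_environment() and the particular PATH and IFS values are illustrative assumptions, not recommendations.

```c
#define _DEFAULT_SOURCE     /* ask GLIBC to declare clearenv() */
#include <stdlib.h>

/*
 * reset_environment --- hypothetical helper: throw away the inherited
 * environment and install a small, known set of variables.
 * Returns 0 on success, -1 on failure.
 */
int reset_environment(void)
{
    if (clearenv() != 0)
        return -1;

    /* The values here are placeholders; choose ones suited to your program. */
    if (setenv("PATH", "/bin:/usr/bin", 1) != 0)
        return -1;
    if (setenv("IFS", " \t\n", 1) != 0)
        return -1;

    return 0;
}
```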

2.4.2 The Entire Environment: environ


The correct way to deal with the environment is through the functions described in
the previous section. However, it's worth a look at how things are managed "under
the hood."
The external variable environ provides access to the environment in the same way
that argv provides access to the command-line arguments. You must declare the variable
yourself. Although standardized by POSIX, environ is purposely not declared by any
standardized header file. (This seems to evolve from historical practice.) Here is the
declaration:
extern char **environ;      /* Look Ma, no header file! */            POSIX

Like argv, the final element in environ is NULL. There is no "environment count"
variable that corresponds to argc, however. This simple program prints out the entire
environment:
/* ch02-printenv.c --- Print out the environment. */

#include <stdio.h>

extern char **environ;

int main(int argc, char **argv)
{
    int i;

    if (environ != NULL)
        for (i = 0; environ[i] != NULL; i++)
            printf("%s\n", environ[i]);

    return 0;
}

Although it's unlikely to happen, this program makes sure that environ isn't NULL
before attempting to use it.
Variables are kept in the environment in random order. Although some Unix shells
keep the environment sorted by variable name, there is no formal requirement that this
be so, and many shells don't keep them sorted.
As something of a quirk of the implementation, you can access the environment by
declaring a third parameter to main():
int main(int argc, char **argv, char **envp)

You can then use envp as you would have used environ. Although you may see this
occasionally in old code, we don't recommend its use; environ is the official, standard,
portable way to access the entire environment, should you need to do so.

2.4.3 GNU env


To round off the chapter, here is the GNU version of the env command. This command
adds variables to the environment for the duration of one command. It can also be used
to clear the environment for that command or to remove specific environment variables.
The program serves double duty for us, since it demonstrates both getopt_long()
and several of the functions discussed in this section. Here is how the program is invoked:
$ env --help
Usage: env [OPTION]... [-] [NAME=VALUE]... [COMMAND [ARG]...]
Set each NAME to VALUE in the environment and run COMMAND.

  -i, --ignore-environment   start with an empty environment
  -u, --unset=NAME           remove variable from the environment
      --help                 display this help and exit
      --version              output version information and exit

A mere - implies -i.  If no COMMAND, print the resulting environment.

Report bugs to <[email protected]>.

Here are some sample invocations:

$ env - myprog arg1                             Clear environment, run program with args
$ env - PATH=/bin:/usr/bin myprog arg1          Clear environment, add PATH, run program
$ env -u IFS PATH=/bin:/usr/bin myprog arg1     Unset IFS, add PATH, run program
The code begins with a standard GNU copyright statement and explanatory comment.
We have omitted both for brevity. (The copyright statement is discussed in Appendix C,
"GNU General Public License," page 657. The --help output shown previously is
enough to understand how the program works.) Following the copyright and comments
are header includes and declarations. The 'N_("string")' macro invocation (line 93)
is for use in internationalization and localization of the software, topics covered in
Chapter 13, "Internationalization and Localization," page 485. For now, you can treat
it as if it were the contained string constant.
 80  #include <config.h>
 81  #include <stdio.h>
 82  #include <getopt.h>
 83  #include <sys/types.h>
 84  #include <getopt.h>
 85
 86  #include "system.h"
 87  #include "error.h"
 88  #include "closeout.h"
 89
 90  /* The official name of this program (e.g., no 'g' prefix). */
 91  #define PROGRAM_NAME "env"
 92
 93  #define AUTHORS N_ ("Richard Mlynarik and David MacKenzie")
 94
 95  int putenv ();
 96
 97  extern char **environ;
 98
 99  /* The name by which this program was run. */
100  char *program_name;
101
102  static struct option const longopts[] =
103  {
104    {"ignore-environment", no_argument, NULL, 'i'},
105    {"unset", required_argument, NULL, 'u'},
106    {GETOPT_HELP_OPTION_DECL},
107    {GETOPT_VERSION_OPTION_DECL},
108    {NULL, 0, NULL, 0}
109  };

The GNU Coreutils contain a large number of programs, many of which perform
the same common tasks (for example, argument parsing). To make maintenance easier,
many common idioms are defined as macros. GETOPT_HELP_OPTION_DECL and
GETOPT_VERSION_OPTION_DECL (lines 106 and 107) are two such. We examine their
definitions shortly. The first function, usage(), prints the usage information and exits.
The _("string") macro (line 115, and used throughout the program) is also for
internationalization, and for now you should also treat it as if it were the contained
string constant.

111  void
112  usage (int status)
113  {
114    if (status != 0)
115      fprintf (stderr, _("Try '%s --help' for more information.\n"),
116               program_name);
117    else
118      {
119        printf (_("\
120  Usage: %s [OPTION]... [-] [NAME=VALUE]... [COMMAND [ARG]...]\n"),
121                program_name);
122        fputs (_("\
123  Set each NAME to VALUE in the environment and run COMMAND.\n\
124  \n\
125    -i, --ignore-environment   start with an empty environment\n\
126    -u, --unset=NAME           remove variable from the environment\n\
127  "), stdout);
128        fputs (HELP_OPTION_DESCRIPTION, stdout);
129        fputs (VERSION_OPTION_DESCRIPTION, stdout);
130        fputs (_("\
131  \n\
132  A mere - implies -i.  If no COMMAND, print the resulting environment.\n\
133  "), stdout);
134        printf (_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
135      }
136    exit (status);
137  }

The first part of main() declares variables and sets up the internationalization. The
functions setlocale(), bindtextdomain(), and textdomain() (lines 147-149)
are all discussed in Chapter 13, "Internationalization and Localization," page 485. Note
that this program does use the envp argument to main() (line 140). It is the only one
of the Coreutils programs to do so. Finally, the call to atexit() on line 151 (see
Section 9.1.5.3, "Exiting Functions," page 302) registers a Coreutils library function that
flushes all pending output and closes stdout, reporting a message if there were problems.
The next bit processes the command-line arguments, using getopt_long().
139  int
140  main (register int argc, register char **argv, char **envp)
141  {
142    char *dummy_environ[1];
143    int optc;
144    int ignore_environment = 0;
145
146    program_name = argv[0];
147    setlocale (LC_ALL, "");
148    bindtextdomain (PACKAGE, LOCALEDIR);
149    textdomain (PACKAGE);
150
151    atexit (close_stdout);
152
153    while ((optc = getopt_long (argc, argv, "+iu:", longopts, NULL)) != -1)
154      {
155        switch (optc)
156          {
157          case 0:
158            break;
159          case 'i':
160            ignore_environment = 1;
161            break;
162          case 'u':
163            break;
164          case_GETOPT_HELP_CHAR;
165          case_GETOPT_VERSION_CHAR (PROGRAM_NAME, AUTHORS);
166          default:
167            usage (2);
168          }
169      }
170
171    if (optind != argc && !strcmp (argv[optind], "-"))
172      ignore_environment = 1;

Here are the macros, from src/sys2.h in the Coreutils distribution, that define
the declarations we saw earlier and the 'case_GETOPT_xxx' macros used above (lines
164-165):
/* Factor out some of the common --help and --version processing code. */

/* These enum values cannot possibly conflict with the option values
   ordinarily used by commands, including CHAR_MAX + 1, etc.  Avoid
   CHAR_MIN - 1, as it may equal -1, the getopt end-of-options value. */
enum
{
  GETOPT_HELP_CHAR = (CHAR_MIN - 2),
  GETOPT_VERSION_CHAR = (CHAR_MIN - 3)
};

#define GETOPT_HELP_OPTION_DECL \
  "help", no_argument, 0, GETOPT_HELP_CHAR
#define GETOPT_VERSION_OPTION_DECL \
  "version", no_argument, 0, GETOPT_VERSION_CHAR

#define case_GETOPT_HELP_CHAR \
  case GETOPT_HELP_CHAR: \
    usage (EXIT_SUCCESS); \
    break;

#define case_GETOPT_VERSION_CHAR(Program_name, Authors) \
  case GETOPT_VERSION_CHAR: \
    version_etc (stdout, Program_name, PACKAGE, VERSION, Authors); \
    exit (EXIT_SUCCESS); \
    break;

The upshot of this code is that --help prints the usage message and --version
prints version information. Both exit successfully. ("Success" and "failure" exit statuses
are described in Section 9.1.5.1, "Defining Process Exit Status," page 300.) Given that
the Coreutils have dozens of utilities, it makes sense to factor out and standardize as
much repetitive code as possible.
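To see why option values below CHAR_MIN are convenient, here is a small sketch
of our own (it is not Coreutils code) that gives two long-only options values that
no single-character option can ever have. The names OPT_HELP, OPT_VERSION,
sketch_longopts, and first_option are hypothetical, and resetting optind to 0 relies
on GNU getopt behavior, just as env.c does.

```c
#include <getopt.h>
#include <limits.h>
#include <stddef.h>

/* Values below CHAR_MIN can never collide with a real option letter.
   CHAR_MIN - 1 is skipped because it may equal -1, getopt's end marker. */
enum {
    OPT_HELP = CHAR_MIN - 2,
    OPT_VERSION = CHAR_MIN - 3
};

static const struct option sketch_longopts[] = {
    { "help",    no_argument, NULL, OPT_HELP },
    { "version", no_argument, NULL, OPT_VERSION },
    { NULL, 0, NULL, 0 }
};

/* Return the code for the first option found in argv, or -1 if there is none. */
int first_option(int argc, char **argv)
{
    optind = 0;     /* force GNU getopt to reinitialize (GNU-specific) */
    opterr = 0;     /* keep the sketch quiet about unknown options */
    return getopt_long(argc, argv, "+", sketch_longopts, NULL);
}
```

A switch on the return value can then use 'case OPT_HELP:' and 'case OPT_VERSION:'
alongside ordinary cases such as 'case 'i':' with no risk of a clash.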
Returning to env.c:
174    environ = dummy_environ;
175    environ[0] = NULL;
176
177    if (!ignore_environment)
178      for (; *envp; envp++)
179        putenv (*envp);
180
181    optind = 0;                  /* Force GNU getopt to re-initialize. */
182    while ((optc = getopt_long (argc, argv, "+iu:", longopts, NULL)) != -1)
183      if (optc == 'u')
184        putenv (optarg);         /* Requires GNU putenv. */
185
186    if (optind != argc && !strcmp (argv[optind], "-"))       Skip options
187      ++optind;
188
189    while (optind < argc && strchr (argv[optind], '='))      Set environment variables
190      putenv (argv[optind++]);
191
192    /* If no program is specified, print the environment and exit. */
193    if (optind == argc)
194      {
195        while (*environ)
196          puts (*environ++);
197        exit (EXIT_SUCCESS);
198      }

Lines 174-179 copy the existing environment into a fresh copy of the environment.
The global variable environ is set to point to an empty local array. The envp parameter
maintains access to the original environment.
Lines 181-184 remove any environment variables as requested by the -u option.
The program does this by rescanning the command line and removing names listed
there. Environment variable removal relies on the GNU putenv () behavior discussed
earlier: that when called with a plain variable name, putenv () removes the environment
variable.
After any options, new or replacement environment variables are supplied on the
command line. Lines 189-190 continue scanning the command line, looking for
environment variable settings of the form 'name=value'.
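Since removing a variable with putenv() depends on GNU behavior, a program that
wants the same effect portably can call unsetenv() for bare names instead. The
following sketch shows one way to do it; apply_env_arg is a hypothetical helper of
our own, not something taken from env.c.

```c
#include <stdlib.h>
#include <string.h>

/* Apply one env-style argument to the current environment:
   "NAME=value" sets the variable, a bare "NAME" removes it.
   Returns 0 on success, nonzero on failure.
   (apply_env_arg is a hypothetical helper for illustration.) */
int apply_env_arg(char *arg)
{
    if (strchr(arg, '=') != NULL)
        return putenv(arg);     /* arg itself becomes part of the environment */

    return unsetenv(arg);       /* POSIX; no reliance on GNU putenv() */
}
```

Note that putenv() stores the pointer it is given, so the string must stay valid
for as long as the variable is in the environment. That holds for argv strings,
which is why env can pass them directly.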

Upon reaching line 192, if nothing is left on the command line, env is supposed to
print the new environment, and exit. It does so (lines 195-197).
If arguments are left, they represent a command name to run and arguments to pass
to that new command. This is done with the execvp () system call (line 200), which
replaces the current program with the new one. (This call is discussed in Section 9.1.4,
"Starting New Programs: The exec() Family," page 293; don't worry about the details
for now.) If this call returns to the current program, it failed. In such a case, env prints
an error message and exits.
200    execvp (argv[optind], &argv[optind]);
201
202    {
203      int exit_status = (errno == ENOENT ? 127 : 126);
204      error (0, errno, "%s", argv[optind]);
205      exit (exit_status);
206    }
207  }

The exit status values, 126 and 127 (determined on line 203), conform to POSIX.
127 means the program that execvp() attempted to run didn't exist. (ENOENT means
the file doesn't have an entry in the directory.) 126 means that the file exists, but
something else went wrong.
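The decision made on line 203 can be isolated into a tiny function, which makes the
convention easy to reuse in your own programs that launch other programs. This is
only a sketch; the name exec_failure_status is ours.

```c
#include <errno.h>

/* Map the errno value left by a failed exec call to the exit status
   that env uses: 127 for "not found", 126 for anything else.
   (exec_failure_status is a hypothetical name for illustration.) */
int exec_failure_status(int err)
{
    return (err == ENOENT) ? 127 : 126;
}
```

The caller should save errno immediately after the failed call and pass the saved
value, since later library calls may change errno.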

2.5 Summary
• C programs access their command-line arguments through the parameters argc
and argv. The getopt() function provides a standard way for consistent parsing
of options and their arguments. The GNU version of getopt() provides some
extensions, and getopt_long() and getopt_long_only() make it possible to
easily parse long-style options.
• The environment is a set of 'name=value' pairs that each program inherits from
its parent. Programs can, at their author's whim, use environment variables to
change their behavior, in addition to any command-line arguments. Standard
routines (getenv(), setenv(), putenv(), and unsetenv()) exist for retrieving
environment variable values, changing them, or removing them. If necessary, the
entire environment is available through the external variable environ or
through the char **envp third argument to main(). The latter technique is
discouraged.

Exercises

1. Assume a program accepts options -a, -b, and -c, and that -b requires an
argument. Write the manual argument parsing code for this program, without
using getopt() or getopt_long(). Accept -- to end option processing.
Make sure that -ac works, as do -bYANKEES, -b YANKEES, and -abYANKEES.
Test your program.
2. Implement getopt(). For the first version, don't worry about the case in which
'optstring[0] == ':''. You may also ignore opterr.
3. Add code for 'optstring[0] == ':'' and opterr to your version of
getopt().
4. Print and read the GNU getopt.h, getopt.c, and getopt1.c files.
5. Write a program that declares both environ and envp and compares their
values.
6. Parsing command-line arguments and options is a wheel that many people
can't refrain from reinventing. Besides getopt() and getopt_long(), you
may wish to examine different argument-parsing packages, such as:
• The Plan 9 From Bell Labs arg(2) argument-parsing library,4
• Argp,5
• Argv,6
• Autoopts,7
• GNU Gengetopt,8
• Opt,9
• Popt.10 See also the popt(3) manpage on a GNU/Linux system.
7. Extra credit: Why can't a C compiler completely ignore the register keyword?
Hint: What operation cannot be applied to a register variable?

4 http://plan9.bell-labs.com/magic/man2html/2/arg
5 http://www.gnu.org/manual/glibc/html_node/Argp.html
6 http://256.com/sources/argv
7 http://autogen.sourceforge.net/autoopts.html
8 ftp://ftp.gnu.org/gnu/gengetopt/
9 http://nis-www.lanl.gov/-jt/Software/opt/opt-3.19.tar.gz
10 http://freshmeat.net/projects/popt/?topic_id=809
In this chapter

• 3.1 Linux/Unix Address Space    page 52
• 3.2 Memory Allocation           page 56
• 3.3 Summary                     page 80
• Exercises                       page 81
Without memory for storing data, it's impossible for a program to get any
work done. (Or rather, it's impossible to get any useful work done.) Real-world
programs can't afford to rely on fixed-size buffers or arrays of data structures.
They have to be able to handle inputs of varying sizes, from small to large. This in
turn leads to the use of dynamically allocated memory, that is, memory allocated at
runtime instead of at compile time. This is how the GNU "no arbitrary limits" principle
is put into action.
Because dynamically allocated memory is such a basic building block for real-world
programs, we cover it early, before looking at everything else there is to do. Our
discussion focuses exclusively on the user-level view of the process and its memory;
it has nothing to do with CPU architecture.

3.1 Linux/Unix Address Space


For a working definition, we've said that a process is a running program. This means
that the operating system has loaded the executable file for the program into memory,
has arranged for it to have access to its command-line arguments and environment
variables, and has started it running. A process has five conceptually different areas of
memory allocated to it:
Code
Often referred to as the text segment, this is the area in which the executable in-
structions reside. Linux and Unix arrange things so that multiple running instances
of the same program share their code if possible; only one copy of the instructions
for the same program resides in memory at any time. (This is transparent to the
running programs.) The portion of the executable file containing the text segment
is the text section.
Initialized data
Statically allocated and global data that are initialized with nonzero values live in
the data segment. Each process running the same program has its own data segment.
The portion of the executable file containing the data segment is the data section.


Zero-initialized data
Global and statically allocated data that are initialized to zero by default are kept
in what is colloquially called the BSS area of the process.1 Each process running
the same program has its own BSS area. When running, the BSS data are placed
in the data segment. In the executable file, they are stored in the BSS section.
The format of a Linux/Unix executable is such that only variables that are initialized
to a nonzero value occupy space in the executable's disk file. Thus, a large array
declared 'static char somebuf[2048];', which is automatically zero-filled,
does not take up 2 KB worth of disk space. (Some compilers have options that let
you place zero-initialized data into the data segment.)
Heap
The heap is where dynamic memory (obtained by malloc() and friends) comes
from. As memory is allocated on the heap, the process's address space grows, as
you can see by watching a running program with the ps command.
Although it is possible to give memory back to the system and shrink a process's
address space, this is almost never done. (We distinguish between releasing
no-longer-needed dynamic memory and shrinking the address space; this is discussed
in more detail later in this chapter.)
It is typical for the heap to "grow upward." This means that successive items that
are added to the heap are added at addresses that are numerically greater than
previous items. It is also typical for the heap to start immediately after the BSS
area of the data segment.
Stack
The stack segment is where local variables are allocated. Local variables are all
variables declared inside the opening left brace of a function body (or other left
brace) that aren't defined as static.
On most architectures, function parameters are also placed on the stack, as well
as "invisible" bookkeeping information generated by the compiler, such as room
for a function return value and storage for the return address representing the return
from a function to its caller. (Some architectures do all this with registers.)

1 BSS is an acronym for "Block Started by Symbol," a mnemonic from the IBM 7094 assembler.

It is the use of a stack for function parameters and return values that makes it
convenient to write recursive functions (functions that call themselves).
Variables stored on the stack "disappear" when the function containing them
returns; the space on the stack is reused for subsequent function calls.
On most modern architectures, the stack "grows downward," meaning that items
deeper in the call chain are at numerically lower addresses.
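The following sketch ties the areas just described to ordinary C declarations. It is
our own illustration, not one of the book's example programs: the names somebuf,
answer, and areas_demo are hypothetical, and where each object actually lands is
up to the compiler and the system.

```c
#include <stddef.h>
#include <stdlib.h>

static char somebuf[2048];      /* no initializer: zero-filled, typically BSS */
static int answer = 42;         /* nonzero initializer: initialized data */

/* Touch one object from each kind of storage.
   Returns 1 if everything looks as expected, 0 if not,
   and -1 if the heap allocation fails. */
int areas_demo(void)
{
    int local = 1;                              /* automatic variable: stack */
    int *dynamic = malloc(sizeof(*dynamic));    /* dynamic memory: heap */
    int ok;

    if (dynamic == NULL)
        return -1;

    *dynamic = answer;
    ok = (somebuf[0] == 0 && somebuf[sizeof(somebuf) - 1] == 0
          && *dynamic == 42 && local == 1);

    free(dynamic);
    return ok;
}
```

The instructions for areas_demo() itself live in the code (text) area, which
completes the set of five.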

When a program is running, the initialized data, BSS, and heap areas are usually
placed into a single contiguous area: the data segment. The stack segment and code
segment are separate from the data segment and from each other. This is illustrated in
Figure 3.1.

FIGURE 3.1
Linux/Unix process address space

    High address
        STACK SEGMENT:  Program stack (the stack grows downward)
                        Possible "hole" in address space
        DATA SEGMENT:   Heap (the heap grows upward)
                        BSS: zero-filled variables
                        Globals and static variables (data)
    Low address
        TEXT SEGMENT:   Executable code (shared)

Although it's theoretically possible for the stack and heap to grow into each other,
the operating system prevents that event, and any program that tries to make it happen
is asking for trouble. This is particularly true on modern systems, on which process
address spaces are large and the gap between the top of the stack and the end of the
heap is a big one. The different memory areas can have different hardware memory
protection assigned to them. For example, the text segment might be marked "execute
only," whereas the data and stack segments would have execute permission disabled.
This practice can prevent certain kinds of security attacks. The details, of course, are
hardware and operating-system specific and likely to change over time. Of note is that
both Standard C and C++ allow const items to be placed in read-only memory. The
relationship among the different segments is summarized in Table 3.1.

TABLE 3.1
Executable program segments and their locations

Program memory      Address space segment   Executable file section
Code                Text                    Text
Initialized data    Data                    Data
BSS                 Data                    BSS
Heap                Data
Stack               Stack

The size program prints out the size in bytes of each of the text, data, and BSS
sections, along with the total size in decimal and hexadecimal. (The ch03-memaddr.c
program is shown later in this chapter; see Section 3.2.5, "Address Space Examination,"
page 78.)
$ cc -O ch03-memaddr.c -o ch03-memaddr          Compile the program
$ ls -l ch03-memaddr                            Show total size
-rwxr-xr-x    1 arnold   devel   12320 Nov 24 16:45 ch03-memaddr
$ size ch03-memaddr                             Show component sizes
   text    data     bss     dec     hex filename
   1458     276       8    1742     6ce ch03-memaddr
$ strip ch03-memaddr                            Remove symbols
$ ls -l ch03-memaddr                            Show total size again
-rwxr-xr-x    1 arnold   devel    3480 Nov 24 16:45 ch03-memaddr
$ size ch03-memaddr                             Component sizes haven't changed
   text    data     bss     dec     hex filename
   1458     276       8    1742     6ce ch03-memaddr

The total size of what gets loaded into memory is only 1742 bytes, in a file that is
12,320 bytes long. Most of that space is occupied by the symbols, a list of the program's
variables and function names. (The symbols are not loaded into memory when the
program runs.) The strip program removes the symbols from the object file. This can
save significant disk space for a large program, at the cost of making it impossible to
debug a core dump2 should one occur. (On modern systems this isn't worth the trouble;
don't use strip.) Even after removing the symbols, the file is still larger than what gets
loaded into memory since the object file format maintains additional data about the
program, such as what shared libraries it may use, if any.3
Finally, we'll mention that threads represent multiple threads of execution within a
single address space. Typically, each thread has its own stack, and a way to get thread
local data, that is, dynamically allocated data for private use by the thread. We don't
otherwise cover threads in this book, since they are an advanced topic.

3.2 Memory Allocation


Four library functions form the basis for dynamic memory management from C.
We describe them first, followed by descriptions of the two system calls upon which
these library functions are built. The C library functions in turn are usually used to
implement other library functions that allocate memory and the C++ new and delete
operators.
Finally, we discuss a function that you will see used frequently, but which we don't
recommend.

3.2.1 Library Calls: malloc(), calloc(), realloc(), free()


Dynamic memory is allocated by either the malloc() or calloc() functions. These
functions return pointers to the allocated memory. Once you have a block of memory
of a certain initial size, you can change its size with the realloc() function. Dynamic
memory is released with the free() function.
Debugging the use of dynamic memory is an important topic in its own right. We
discuss tools for this purpose in Section 15.5.2, "Memory Allocation Debuggers,"
page 612.

2 A core dump is the memory image of a running process created when the process terminates unexpectedly. It may
be used later for debugging. Unix systems named the file core, and GNU/Linux systems use core.pid, where
pid is the process ID of the process that died.

3 The description here is a deliberate simplification. Running programs occupy much more space than the size
program indicates, since shared libraries are included in the address space. Also, the data segment will grow as a
program allocates memory.

3.2.1.1 Examining C Language Details


Here are the function declarations from the GNU/Linux malloc(3) manpage:
#include <stdlib.h>                                     ISO C

void *calloc(size_t nmemb, size_t size);                Allocate and zero fill
void *malloc(size_t size);                              Allocate raw memory
void free(void *ptr);                                   Release memory
void *realloc(void *ptr, size_t size);                  Change size of existing allocation
The allocation functions all return type void *. This is a typeless or generic pointer;
all you can do with such a pointer is cast it to a different type and assign it to a typed
pointer. Examples are coming up.
The type size_t is an unsigned integral type that represents amounts of memory.
It is used for dynamic memory allocation, and we see many uses of it throughout the
book. On most modern systems, size_t is unsigned long, but it's better to use
size_t explicitly than to use a plain unsigned integral type.

The ptrdiff_t type is used for address calculations in pointer arithmetic, such as
calculating where in an array a pointer may be pointing:
#define MAXBUF ...
char *p;
char buf[MAXBUF];
ptrdiff_t where;

p = buf;
while (some condition) {

    p += something;

    where = p - buf;    /* what index are we at? */

}

The <stdlib.h> header file declares many of the standard C library routines and
types (such as size_t), and it also defines the preprocessor constant NULL, which
represents the "null" or invalid pointer. (This is a zero value, such as 0 or '((void *) 0)'.
The C++ idiom is to use 0 explicitly; in C, however, NULL is preferred, and we find it
to be much more readable for C code.)

3.2.1.2 Initially Allocating Memory: malloc()


Memory is allocated initially with malloc(). The value passed in is the total number
of bytes requested. The return value is a pointer to the newly allocated memory or NULL
if memory could not be allocated. In the latter event, errno will be set to indicate the
error. (errno is a special variable that system calls and library functions set to indicate
what went wrong. It's described in Section 4.3, "Determining What Went Wrong, "
page 86.) For example, suppose we wish to allocate a variable number of some structure.
The code looks something like this:
struct coord {                  /* 3D coordinates */
    int x, y, z;
} *coordinates;
unsigned int count;             /* how many we need */
size_t amount;                  /* total amount of memory */

/* ... determine count somehow ... */

amount = count * sizeof(struct coord);              /* how many bytes to allocate */

coordinates = (struct coord *) malloc(amount);      /* get the space */
if (coordinates == NULL) {
    /* report error, recover or give up */
}
/* use coordinates ... */

The steps shown here are quite boilerplate. The order is as follows:
1. Declare a pointer of the proper type to point to the allocated memory.
2. Calculate the size in bytes of the memory to be allocated. This involves
multiplying a count of objects needed by the size of the individual object. This size
in turn is retrieved from the C sizeof operator, which exists for this purpose
(among others). Thus, while the size of a particular struct may vary across
compilers and architectures, sizeof always returns the correct value and the
source code remains correct and portable.
When allocating arrays for character strings or other data of type char, it is
not necessary to multiply by sizeof(char), since by definition this is always
1. But it won't hurt anything either.
3. Allocate the storage by calling malloc(), assigning the function's return value
to the pointer variable. It is good practice to cast the return value of malloc()
to that of the variable being assigned to. In C it's not required (although the
compiler may generate a warning). We strongly recommend always casting the
return value.
Note that in C++, assignment of a pointer value of one type to a pointer of
another type does require a cast, whatever the context. For dynamic memory
management, C++ programs should use new and delete, to avoid type problems,
and not malloc() and free().
4. Check the return value. Never assume that memory allocation will succeed. If
the allocation fails, malloc() returns NULL. If you use the value without
checking, it is likely that your program will immediately die from a segmentation
violation (or segfault), which is an attempt to use memory not in your address
space.
If you check the return value, you can at least print a diagnostic message and
terminate gracefully. Or you can attempt some other method of recovery.

Once we've allocated memory and set coordinates to point to it, we can then treat
coordinates as if it were an array, although it's really a pointer:
int cur_x, cur_y, cur_z;
size_t an_index;

an_index = something;
cur_x = coordinates[an_index].x;
cur_y = coordinates[an_index].y;
cur_z = coordinates[an_index].z;

The compiler generates correct code for indexing through the pointer to retrieve the
members of the structure at coordinates[an_index].

NOTE The memory returned by malloc() is not initialized. It can contain any
random garbage. You should immediately initialize the memory with valid data
or at least with zeros. To do the latter, use memset() (discussed in Section 12.2,
"Low-Level Memory: The memXXX() Functions," page 432):
memset(coordinates, '\0', amount);

Another option is to use calloc(), described shortly.
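Putting the pieces together, here is one way the allocate, check, and initialize
sequence might be packaged as a function. It is a sketch under our own assumptions:
the name alloc_coords is hypothetical, and the overflow test on the multiplication
is an extra precaution that the steps above do not mention.

```c
#include <stdlib.h>
#include <string.h>

struct coord {          /* 3D coordinates, as in the running example */
    int x, y, z;
};

/* Allocate room for count coordinates and zero them.
   Returns NULL if count is zero, if the byte count would overflow,
   or if the allocation fails.  (alloc_coords is a hypothetical helper.) */
struct coord *alloc_coords(size_t count)
{
    struct coord *coordinates;
    size_t amount;

    if (count == 0 || count > (size_t) -1 / sizeof(struct coord))
        return NULL;                    /* nothing to do, or size would overflow */

    amount = count * sizeof(struct coord);
    coordinates = (struct coord *) malloc(amount);
    if (coordinates == NULL)
        return NULL;                    /* caller reports the error */

    memset(coordinates, '\0', amount);  /* malloc() memory is uninitialized */
    return coordinates;
}
```

The caller checks the result against NULL, indexes it like an array (for example,
coords[2].y = 7), and eventually passes it to free().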

Geoff Collyer recommends the following technique for allocating memory:
some_type *pointer;

pointer = malloc(count * sizeof(*pointer));

This approach guarantees that the malloc() will allocate the correct amount of
memory without your having to consult the declaration of pointer. If pointer's type
later changes, the sizeof operator automatically ensures that the count of bytes to
allocate stays correct. (Geoff's technique omits the cast that we just discussed. Having
the cast there also ensures a diagnostic if pointer's type changes and the call to
malloc() isn't updated.)
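As a concrete instance of the sizeof(*pointer) idiom, the sketch below allocates an
array of doubles. The function name alloc_samples is ours. Note that sizeof does not
evaluate its operand here, so applying it to *samples before samples has a value
is fine.

```c
#include <stdlib.h>

/* Allocate an array of count doubles using the sizeof(*pointer) idiom.
   Changing the element type means changing only the declarations.
   (alloc_samples is a hypothetical name for illustration.) */
double *alloc_samples(size_t count)
{
    double *samples;

    samples = malloc(count * sizeof(*samples));   /* no type name repeated */
    return samples;                               /* may be NULL; caller checks */
}
```

If samples were later changed to 'float *', the malloc() call would need no edit.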

3.2.1.3 Releasing Memory: free()


When you're done using the memory, you "give it back" by using the free()
function. The single argument is a pointer previously obtained from one of the other
allocation routines. It is safe (although useless) to pass a null pointer to free():
free(coordinates);
coordinates = NULL;     /* not required, but a good idea */

Once free(coordinates) is called, the memory pointed to by coordinates is
off limits. It now "belongs" to the allocation subroutines, and they are free to manage
it as they see fit. They can change the contents of the memory or even release it from
the process's address space! There are thus several common errors to watch out for with
free():

Accessing freed memory
If unchanged, coordinates continues to point at memory that no longer belongs
to the application. This is called a dangling pointer. In many systems, you can get
away with continuing to access this memory, at least until the next time more
memory is allocated or freed. In many others though, such access won't work.
In sum, accessing freed memory is a bad idea: It's not portable or reliable, and the
GNU Coding Standards disallows it. For this reason, it's a good idea to immediately
set the program's pointer variable to NULL. If you then accidentally attempt to
access freed memory, your program will immediately fail with a segmentation
fault (before you've released it to the world, we hope).
Freeing the same pointer twice
This causes "undefined behavior." Once the memory has been handed back to
the allocation routines, they may merge the freed block with other free storage
under management. Freeing something that's already been freed is likely to lead
to confusion or crashes at best, and so-called double frees have been known to
lead to security problems.

Passing a pointer not obtained from malloc(), calloc(), or realloc()
This seems obvious, but it's important nonetheless. Even passing in a pointer to
somewhere in the middle of dynamically allocated memory is bad:
free(coordinates + 10);     /* Release all but first 10 elements. */

This call won't work, and it's likely to lead to disastrous consequences, such as a
crash. (This is because many malloc() implementations keep "bookkeeping"
information in front of the returned data. When free() goes to use that
information, it will find invalid data there. Other implementations have the bookkeeping
information at the end of the allocated chunk; the same issues apply.)
Buffer overruns and underruns
Accessing memory outside an allocated chunk also leads to undefined behavior,
again because this is likely to be bookkeeping information or possibly memory
that's not even in the address space. Writing into such memory is much worse,
since it's likely to destroy the bookkeeping data.
Failure to free memory
Any dynamic memory that's not needed should be released. In particular, memory
that is allocated inside loops or recursive or deeply nested function calls should
be carefully managed and released. Failure to take care leads to memory leaks,
whereby the process's memory can grow without bounds; eventually, the process
dies from lack of memory.
This situation can be particularly pernicious if memory is allocated per input
record or as some other function of the input: The memory leak won't be noticed
when run on small inputs but can suddenly become obvious (and embarrassing)
when run on large ones. This error is even worse for systems that must run
continuously, such as telephone switching systems. A memory leak that crashes such a
system can lead to significant monetary or other damage.
Even if the program never dies for lack of memory, constantly growing programs
suffer in performance, because the operating system has to manage keeping in-use
data in physical memory. In the worst case, this can lead to behavior known as
thrashing, whereby the operating system is so busy moving the contents of the
address space into and out of physical memory that no real work gets done.
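To make the per-record case concrete, here is a small sketch of a loop that allocates storage for every input line and releases it before reading the next one. The function name count_long_records and its interface are hypothetical, and lines longer than BUFSIZ are simply treated as several records to keep the example short:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * count_long_records --- count lines longer than `limit' characters.
 * Each line is copied into freshly allocated storage, examined, and
 * released before the next line is read, so memory use stays bounded
 * no matter how many records the input holds.
 */
long count_long_records(FILE *fp, size_t limit)
{
    char buf[BUFSIZ];
    long count = 0;

    while (fgets(buf, sizeof buf, fp) != NULL) {
        size_t len = strlen(buf);
        char *record = malloc(len + 1);    /* per-record allocation */

        if (record == NULL)
            return -1;                     /* out of memory */
        memcpy(record, buf, len + 1);

        if (len > 0 && record[len - 1] == '\n')
            record[--len] = '\0';          /* strip the newline */
        if (len > limit)
            count++;

        free(record);    /* leave this out and every record leaks */
    }
    return count;
}
```

Without the call to free(), the function still returns the right answer on a small file, which is exactly why this kind of leak tends to go unnoticed until the input gets large.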

While it's possible for free() to hand released memory back to the system and shrink
the process address space, this is almost never done. Instead, the released memory is
kept available for allocation by the next call to malloc(), calloc(), or realloc().
Given that released memory continues to reside in the process's address space, it may
pay to zero it out before releasing it. Security-sensitive programs may choose to do this,
for example.
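A minimal sketch of clearing a buffer before releasing it follows. The names wipe and wipe_and_free are hypothetical. The stores go through a volatile-qualified pointer because an optimizing compiler may otherwise discard a memset() whose result is never read before free(); this is a common precaution, not a guarantee from the C standard:

```c
#include <stdlib.h>

/*
 * wipe --- overwrite `len' bytes at `buf' with zeros.
 * The volatile-qualified pointer is meant to keep the compiler from
 * discarding the stores as "dead" when free() follows immediately.
 */
void wipe(void *buf, size_t len)
{
    volatile unsigned char *p = buf;

    while (len-- > 0)
        *p++ = 0;
}

/* wipe_and_free --- clear a sensitive buffer, then release it */
void wipe_and_free(void *buf, size_t len)
{
    if (buf != NULL) {
        wipe(buf, len);
        free(buf);
    }
}
```

The caller has to supply the length, since the allocation routines provide no portable way to ask how big a block is.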
See Section 15.5.2, "Memory Allocation Debuggers," page 612, for discussion of a
number of useful dynamic-memory debugging tools.

3.2.1.4 Changing Size: realloc()


Dynamic memory has a significant advantage over statically declared arrays, which
is that it's possible to use exactly as much memory as you need, and no more. It's not
necessary to declare a global, static, or automatic array of some fixed size and hope
that it's (a) big enough and (b) not too big. Instead, you can allocate exactly as much
as you need, no more and no less.
Additionally, it's possible to change the size of a dynamically allocated memory area.
Although it's possible to shrink a block of memory, more typically, the block is grown.
Changing the size is handled with realloc(). Continuing with the coordinates
example, typical code goes like this:
int new_count;
size_t new_amount;
struct coord *newcoords;

/* set new_count, for example: */
new_count = count * 2;    /* double the storage */
new_amount = new_count * sizeof(struct coord);

newcoords = (struct coord *) realloc(coordinates, new_amount);
if (newcoords == NULL) {
    /* report error, recover or give up */
}

coordinates = newcoords;
/* continue using coordinates ... */

As with malloc(), the steps are boilerplate in nature and are similar in concept:
1. Compute the new size to allocate, in bytes.
2. Call realloc() with the original pointer obtained from malloc() (or from
calloc() or an earlier call to realloc()) and the new size.
3. Cast and assign the return value of realloc(). More discussion of this shortly.
4. As for malloc(), check the return value to make sure it's not NULL. Any
memory allocation routine can fail.

When growing a block of memory, realloc() often allocates a new block of the
right size, copies the data from the old block into the new one, and returns a pointer
to the new one.
When shrinking a block of data, realloc() can often just update the internal
bookkeeping information and return the same pointer. This saves having to copy the
original data. However, if this happens, don't assume you can still use the memory beyond
the new size!
In either case, you can assume that if realloc() doesn't return NULL, the old data
has been copied for you into the new memory. Furthermore, the old pointer is no
longer valid, as if you had called free() with it, and you should not use it. This is true
of all pointers into that block of data, not just the particular one used to call free().
You may have noticed that our example code used a separate variable to point to the
changed storage block. It would be possible (but a bad idea) to use the same initial
variable, like so:
coordinates = realloc(coordinates, new_amount);

This is a bad idea for the following reason. When realloc() returns NULL, the
original pointer is still valid; it's safe to continue using that memory. However, if you
reuse the same variable and realloc() returns NULL, you've now lost the pointer to
the original memory. That memory can no longer be used. More important, that
memory can no longer be freed! This creates a memory leak, which is to be avoided.
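One way to make the safe pattern hard to get wrong is to wrap it in a small helper that updates the caller's pointer only on success. This is a sketch under our own assumptions; the name grow_block and its return convention are hypothetical:

```c
#include <stdlib.h>

/*
 * grow_block --- resize *blockp to new_size bytes.
 * On success, *blockp is updated and 0 is returned. On failure,
 * *blockp still points at the original, still-valid memory and
 * -1 is returned, so the caller can keep using it or free it.
 */
int grow_block(void **blockp, size_t new_size)
{
    void *tmp = realloc(*blockp, new_size);

    if (tmp == NULL)
        return -1;      /* original block untouched */

    *blockp = tmp;
    return 0;
}
```

The temporary variable plays the same role as newcoords in the earlier example: the only pointer to the original memory is never overwritten with NULL.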
There are some special cases for the Standard C version of realloc(): When the
ptr argument is NULL, realloc() acts like malloc() and allocates a fresh block of
storage. When the size argument is 0, realloc() acts like free() and releases the
memory that ptr points to. Because (a) this can be confusing and (b) older systems
don't implement this feature, we recommend using malloc() when you mean
malloc() and free() when you mean free().

Here is another, fairly subtle, "gotcha."4 Consider a routine that maintains a static
pointer to some dynamically allocated data, which the routine occasionally has to grow.
It may also maintain automatic (that is, local) pointers into this data. (For brevity, we
omit error checking code. In production code, don't do that.) For example:
void manage_table(void)
{
    static struct table *table;
    struct table *cur, *p;
    int i;
    size_t count;

    table = (struct table *) malloc(count * sizeof(struct table));
    /* fill table */
    cur = & table[i];      /* point at i'th item */

    cur->i = j;            /* use pointer */

    if (some condition) {  /* need to grow table */
        count += count/2;
        p = (struct table *) realloc(table, count * sizeof(struct table));
        table = p;
    }

    cur->i = j;            /* PROBLEM 1: update table element */

    other_routine();       /* PROBLEM 2: see text */
    cur->j = k;            /* PROBLEM 2: see text */
}

This looks straightforward; manage_table() allocates the data, uses it, changes the
size, and so on. But there are some problems that don't jump off the page (or the screen)
when you are looking at this code.
In the line marked 'PROBLEM 1', the cur pointer is used to update a table element.
However, cur was assigned on the basis of the initial value of table. If some
condition was true and realloc() returned a different block of memory, cur now
points into the original, freed memory! Whenever table changes, any pointers into
the memory need to be updated too. What's missing here is the statement 'cur = &
table[i];' after table is reassigned following the call to realloc().

4 It is derived from real-life experience with gawk.



The two lines marked 'PROBLEM 2' are even more subtle. In particular, suppose
other_routine() makes a recursive call to manage_table(). The table variable
could be changed again, completely invisibly! Upon return from other_routine(),
the value of cur could once again be invalid.
One might think (as we did) that the only solution is to be aware of this and supply
a suitably commented reassignment to cur after the function call. However, Brian
Kernighan kindly set us straight. If we use indexing, the pointer maintenance issue
doesn't even arise:
table = (struct table *) malloc(count * sizeof(struct table));
/* fill table */

table[i].i = j;            /* Update a member of the i'th element */

if (some condition) {      /* need to grow table */
    count += count/2;
    p = (struct table *) realloc(table, count * sizeof(struct table));
    table = p;
}

table[i].i = j;            /* PROBLEM 1 goes away */

other_routine();           /* Recursively calls us, modifies table */
table[i].j = k;            /* PROBLEM 2 goes away also */

Using indexing doesn't solve the problem if you have a global copy of the original
pointer to the allocated data; in that case, you still have to worry about updating your
global structures after calling realloc().
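The fragments above are deliberately incomplete. As a compilable illustration of the indexing approach, here is a sketch in which only an index, never a pointer into the block, is carried across the realloc() call. The two-member struct table and the function name set_after_grow are assumptions made for this example:

```c
#include <stdlib.h>

struct table { int i, j; };    /* hypothetical element type */

/*
 * set_after_grow --- grow *tablep to new_count elements, then update
 * element `idx'. Only the index is carried across the realloc() call;
 * no pointer into the old block is kept. Returns 0 on success, -1 if
 * realloc() fails (in which case *tablep is unchanged).
 */
int set_after_grow(struct table **tablep, size_t new_count,
                   size_t idx, int value)
{
    struct table *p = realloc(*tablep, new_count * sizeof(struct table));

    if (p == NULL)
        return -1;

    *tablep = p;
    (*tablep)[idx].i = value;    /* indexing uses the current base */
    return 0;
}
```

Because the element is located from the current value of the base pointer each time, it doesn't matter whether realloc() moved the block.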

NOTE As with malloc(), when you grow a piece of memory, the newly
allocated memory returned from realloc() is not zero-filled. You must clear
it yourself with memset() if that's necessary, since realloc() only allocates
the fresh memory; it doesn't do anything else.
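If cleared memory is needed after growing a block, the clearing can be folded into a small wrapper. The following sketch uses a hypothetical name, realloc_zero; it is not a standard or GLIBC function, and it relies on the caller to remember the old size:

```c
#include <stdlib.h>
#include <string.h>

/*
 * realloc_zero --- like realloc(), but clears any newly added bytes.
 * `old_size' is the caller's record of the current size; the
 * allocator offers no portable way to ask for it.
 */
void *realloc_zero(void *ptr, size_t old_size, size_t new_size)
{
    char *p = realloc(ptr, new_size);

    if (p != NULL && new_size > old_size)
        memset(p + old_size, 0, new_size - old_size);

    return p;    /* NULL on failure; original block still valid */
}
```

Only the bytes past old_size are touched, so the existing data is left exactly as realloc() copied it.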

3.2.1.5 Allocating and Zero-filling: calloc()


The calloc() function is a straightforward wrapper around malloc(). Its primary
advantage is that it zeros the dynamically allocated memory. It also performs the size
calculation for you by taking as parameters the number of items and the size of each:
coordinates = (struct coord *) calloc(count, sizeof(struct coord));

Conceptually, at least, the calloc() code is fairly simple. Here is one possible
implementation:
void *calloc(size_t nmemb, size_t size)
{
    void *p;
    size_t total;

    total = nmemb * size;           /* Compute size */
    p = malloc(total);              /* Allocate the memory */

    if (p != NULL)                  /* If it worked ... */
        memset(p, '\0', total);     /* Fill it with zeros */

    return p;                       /* Return value is NULL or pointer */
}

Many experienced programmers prefer to use calloc() since then there's never any
question about the contents of the newly allocated memory.
Also, if you know you'll need zero-filled memory, you should use calloc(), because
it's possible that the memory malloc() returns is already zero-filled. Although you,
the programmer, can't know this, calloc() can know about it and avoid the call
to memset().
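One detail the conceptual implementation leaves out is that the product of the item count and the item size can exceed what a size_t can hold, in which case the multiplication silently wraps around. A sketch of a work-alike that checks for this first follows; the name checked_calloc is hypothetical, and real calloc() implementations may handle the case differently:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * checked_calloc --- calloc() work-alike that refuses requests whose
 * total size cannot be represented in a size_t.
 */
void *checked_calloc(size_t nmemb, size_t size)
{
    size_t total;
    void *p;

    if (size != 0 && nmemb > SIZE_MAX / size)
        return NULL;            /* nmemb * size would wrap around */

    total = nmemb * size;
    p = malloc(total);
    if (p != NULL)
        memset(p, 0, total);

    return p;
}
```

The division test runs before the multiplication, so the overflow never actually happens.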

3.2.1.6 Summarizing from the GNU Coding Standards


To summarize, here is what the GNU Coding Standards has to say about using the
memory allocation routines:

Check every call to malloc or realloc to see if it returned zero. Check
realloc even if you are making the block smaller; in a system that rounds
block sizes to a power of 2, realloc may get a different block if you ask for
less space.
In Unix, realloc can destroy the storage block if it returns zero. GNU
realloc does not have this bug: If it fails, the original block is unchanged.
Feel free to assume the bug is fixed. If you wish to run your program on
Unix, and wish to avoid lossage in this case, you can use the GNU malloc.
You must expect free to alter the contents of the block that was freed.
Anything you want to fetch from the block, you must fetch before calling
free.

In three short paragraphs, Richard Stallman has distilled the important principles
for doing dynamic memory management with malloc(). It is the use of dynamic
memory and the "no arbitrary limits" principle that makes GNU programs so robust
and more capable than their Unix counterparts.
We do wish to point out that the C standard requires realloc() to not destroy the
original block if it returns NULL.

3.2.1.7 Using Private Allocators


The malloc() suite is a general-purpose memory allocator. It has to be able to
handle requests for arbitrarily large or small amounts of memory and do all the
bookkeeping when different chunks of allocated memory are released. If your program does
considerable dynamic memory allocation, you may thus find that it spends a large
proportion of its time in the malloc() functions.
One thing you can do is write a private allocator: a set of functions or macros that
allocates large chunks of memory from malloc() and then parcels out small chunks
one at a time. This technique is particularly useful if you allocate many individual
instances of the same relatively small structure.
For example, GNU awk (gawk) uses this technique. From the file awk.h in the gawk
distribution (edited slightly to fit the page):
#define getnode(n)   if (nextfree) n = nextfree, nextfree = nextfree->nextp;\
                     else n = more_nodes()

#define freenode(n)  ((n)->flags = 0, (n)->exec_count = 0,\
                     (n)->nextp = nextfree, nextfree = (n))

The nextfree variable points to a linked list of NODE structures. The getnode() macro
pulls the first structure off the list if one is there. Otherwise, it calls more_nodes() to
allocate a new list of free NODEs. The freenode() macro releases a NODE by putting it
at the head of the list.
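The gawk macros depend on definitions that aren't shown here. To see the whole technique in one place, here is a self-contained sketch of a free-list allocator written as functions. It is not gawk's code: struct node, CHUNK, get_node(), and free_node() are all names invented for this illustration, and the chunks obtained from malloc() are never handed back:

```c
#include <stdlib.h>

#define CHUNK 100               /* nodes obtained per malloc() call */

struct node {                   /* hypothetical small structure */
    int value;
    struct node *nextp;         /* link used only while on the free list */
};

static struct node *nextfree;   /* head of the free list */

/* more_nodes --- get CHUNK nodes at once; return one, chain the rest */
static struct node *more_nodes(void)
{
    struct node *block = malloc(CHUNK * sizeof(struct node));
    int i;

    if (block == NULL)
        return NULL;

    for (i = 1; i < CHUNK - 1; i++)
        block[i].nextp = &block[i + 1];
    block[CHUNK - 1].nextp = NULL;
    nextfree = &block[1];       /* called only when the list is empty */

    return &block[0];
}

/* get_node --- hand out a node, refilling the free list when empty */
struct node *get_node(void)
{
    struct node *n;

    if (nextfree != NULL) {
        n = nextfree;
        nextfree = nextfree->nextp;
    } else
        n = more_nodes();

    return n;
}

/* free_node --- put a node back at the head of the free list */
void free_node(struct node *n)
{
    n->value = 0;
    n->nextp = nextfree;
    nextfree = n;
}
```

Once the first chunk is in place, getting and releasing a node costs a couple of pointer assignments, with one call to malloc() per CHUNK nodes.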

NOTE When first writing your application, do it the simple way: use malloc()
and free() directly. If and only if profiling your program shows you that it's
spending a significant amount of time in the memory-allocation functions
should you consider writing a private allocator.

3.2.1.8 Example: Reading Arbitrarily Long Lines


Since this is, after all, Linux Programming by Example, it's time for a real-life
example. The following code is the readline() function from GNU Make 3.80
(ftp://ftp.gnu.org/gnu/make/make-3.80.tar.gz). It can be found in the file
read.c.

Following the "no arbitrary limits" principle, lines in a Makefile can be of any
length. Thus, this routine's primary job is to read lines of any length and make sure
that they fit into the buffer being used.
A secondary job is to deal with continuation lines. As in C, lines that end with a
backslash logically continue to the next line. The strategy used is to maintain a buffer.
As many lines as will fit in the buffer are kept there, with pointers keeping track of the
start of the buffer, the current line, and the next line. Here is the structure:
struct ebuffer
  {
    char *buffer;       /* Start of the current line in the buffer.  */
    char *bufnext;      /* Start of the next line in the buffer.  */
    char *bufstart;     /* Start of the entire buffer.  */
    unsigned int size;  /* Malloc'd size of buffer. */
    FILE *fp;           /* File, or NULL if this is an internal buffer.  */
    struct floc floc;   /* Info on the file in fp (if any).  */
  };

The size field tracks the size of the entire buffer, and fp is the FILE pointer for the
input file. The floc structure isn't of interest for studying the routine.
The function returns the number of lines in the buffer. (The line numbers here are
relative to the start of the function, not the source file.)
 1  static long
 2  readline (ebuf)                     static long readline(struct ebuffer *ebuf)
 3       struct ebuffer *ebuf;
 4  {
 5    char *p;
 6    char *end;
 7    char *start;
 8    long nlines = 0;
 9
10    /* The behaviors between string and stream buffers are different enough to
11       warrant different functions.  Do the Right Thing.  */
12
13    if (!ebuf->fp)
14      return readstring (ebuf);
15
16    /* When reading from a file, we always start over at the beginning of the
17       buffer for each new line.  */
18
19    p = start = ebuf->bufstart;
20    end = p + ebuf->size;
21    *p = '\0';

We start by noticing that GNU Make is written in K&R C for maximal portability.
The initial part declares variables, and if the input is coming from a string (such as
from the expansion of a macro), the code hands things off to a different function,
readstring() (lines 13 and 14). The test '!ebuf->fp' (line 13) is a shorter (and less
clear, in our opinion) test for a null pointer; it's the same as 'ebuf->fp == NULL'.
Lines 19-21 initialize the pointers, and insert a NUL byte, which is the C string
terminator character, at the start of the buffer, so that the buffer initially holds an
empty string. The function then starts a loop (lines 23-95), which runs as long as
there is more input.
23    while (fgets (p, end - p, ebuf->fp) != 0)
24      {
25        char *p2;
26        unsigned long len;
27        int backslash;
28
29        len = strlen (p);
30        if (len == 0)
31          {
32            /* This only happens when the first thing on the line is a '\0'.
33               It is a pretty hopeless case, but (wonder of wonders) Athena
34               lossage strikes again!  (xmkmf puts NULs in its makefiles.)
35               There is nothing really to be done; we synthesize a newline so
36               the following line doesn't appear to be part of this line.  */
37            error (&ebuf->floc,
38                   _("warning: NUL character seen; rest of line ignored"));
39            p[0] = '\n';
40            len = 1;
41          }

The fgets() function (line 23) takes a pointer to a buffer, a count of bytes to read,
and a FILE * variable for the file to read from. It reads one less than the count so that
it can terminate the buffer with '\0'. This function is good since it allows you to avoid
buffer overflows. It stops upon encountering a newline or end-of-file, and if the newline
is there, it's placed in the buffer. It returns NULL on failure or the (pointer) value of the
first argument on success.
In this case, the arguments are a pointer to the free area of the buffer, the amount
of room left in the buffer, and the FILE pointer to read from.
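The way fgets() signals "the line didn't fit" is easy to see in isolation. Here is a small sketch, separate from the GNU Make code, with a hypothetical function name read_piece; it reports whether what was read ended in a newline:

```c
#include <stdio.h>
#include <string.h>

/*
 * read_piece --- read up to size-1 characters of a line into buf.
 * Returns 1 if a complete line (ending in newline) was read,
 * 0 if the line was cut short by the buffer size or by end-of-file,
 * and -1 if nothing could be read.
 */
int read_piece(FILE *fp, char *buf, size_t size)
{
    size_t len;

    if (fgets(buf, (int) size, fp) == NULL)
        return -1;

    len = strlen(buf);
    return (len > 0 && buf[len - 1] == '\n');
}
```

A return value of 0 corresponds to the case in which readline() has to obtain a bigger buffer and read the rest of the line.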
The comment on lines 32-36 is self-explanatory; if a zero byte is encountered, the
program prints an error message and pretends it was an empty line. After compensating
for the NUL byte (lines 30-41), the code continues.

43        /* Jump past the text we just read.  */
44        p += len;
45
46        /* If the last char isn't a newline, the whole line didn't fit into the
47           buffer.  Get some more buffer and try again.  */
48        if (p[-1] != '\n')
49          goto more_buffer;
50
51        /* We got a newline, so add one to the count of lines.  */
52        ++nlines;

Lines 43-52 increment the pointer into the buffer past the data just read. The code
then checks whether the last character read was a newline. The construct p[-1] (line 48)
looks at the character in front of p, just as p[0] is the current character and p[1] is the
next. This looks strange at first, but if you translate it into terms of pointer math,
*(p-1), it makes more sense, and the indexing form is possibly easier to read.
If the last character was not a newline, this means that we've run out of space, and
the code goes off (with goto) to get more (line 49). Otherwise, the line count is
incremented.
54  #if !defined(WINDOWS32) && !defined(__MSDOS__)
55        /* Check to see if the line was really ended with CRLF; if so ignore
56           the CR.  */
57        if ((p - start) > 1 && p[-2] == '\r')
58          {
59            --p;
60            p[-1] = '\n';
61          }
62  #endif

Lines 54-62 deal with input lines that follow the Microsoft convention of ending
with a Carriage Return-Line Feed (CR-LF) combination, and not just a Line Feed (or
newline), which is the Linux/Unix convention. Note that the #if excludes the code
on Microsoft systems; apparently the <stdio.h> library on those systems handles this
conversion automatically. This is also true of other non-Unix systems that support
Standard C.
64        backslash = 0;
65        for (p2 = p - 2; p2 >= start; --p2)
66          {
67            if (*p2 != '\\')
68              break;
69            backslash = !backslash;
70          }
71
72        if (!backslash)
73          {
74            p[-1] = '\0';
75            break;
76          }
77
78        /* It was a backslash/newline combo.  If we have more space, read
79           another line.  */
80        if (end - p >= 80)
81          continue;
82
83        /* We need more space at the end of our buffer, so realloc it.
84           Make sure to preserve the current offset of p.  */
85      more_buffer:
86        {
87          unsigned long off = p - start;
88          ebuf->size *= 2;
89          start = ebuf->buffer = ebuf->bufstart = (char *) xrealloc (start,
90                                                                     ebuf->size);
91          p = start + off;
92          end = start + ebuf->size;
93          *p = '\0';
94        }
95      }

So far we've dealt with the mechanics of getting at least one complete line into the
buffer. The next chunk handles the case of a continuation line. It has to make sure,
though, that the final backslash isn't part of multiple backslashes at the end of the line.
It tracks whether the total number of such backslashes is odd or even by toggling the
backslash variable from 0 to 1 and back. (Lines 64-70.)
If the number is even, the test '!backslash' (line 72) will be true. In this case, the
final newline is replaced with a NUL byte, and the code leaves the loop.
On the other hand, if the number is odd, then the line contained an even number
of backslash pairs (representing escaped backslashes, \\ as in C), and a final
backslash-newline combination.5 In this case, if at least 80 free bytes are left in the buffer, the
program continues around the loop to read another line (lines 78-81). (The use of
the magic number 80 isn't great; it would have been better to define and use a symbolic
constant.)
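The odd/even toggle can be pulled out and examined on its own. This sketch is ours, not part of GNU Make, and the function name ends_with_continuation is hypothetical; it applies the same toggling to a line whose newline has already been removed:

```c
#include <stddef.h>

/*
 * ends_with_continuation --- return 1 if the `len' characters at
 * `line' (newline already excluded) end in an odd number of
 * backslashes, meaning the last one escapes the newline.
 */
int ends_with_continuation(const char *line, size_t len)
{
    int backslash = 0;

    while (len > 0 && line[len - 1] == '\\') {
        backslash = !backslash;     /* toggle for each backslash */
        len--;
    }
    return backslash;
}
```

A line ending in one or three backslashes is a continuation line; a line ending in two or four is not, since those backslashes pair up as escaped backslashes.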

5 This code has the scent of practical experience about it: It wouldn't be surprising to learn that earlier versions
simply checked for a final backslash before the newline, until someone complained that it didn't work when there
were multiple backslashes at the end of the line.

Upon reaching line 83, the program needs more space in the buffer. Here's where
the dynamic memory management comes into play. Note the comment about preserving
p (lines 83-84); we discussed this earlier in terms of reinitializing pointers into dynamic
memory. end is also reset. Line 89 resizes the memory.
Note that here the function being called is xrealloc(). Many GNU programs use
"wrapper" functions around malloc() and realloc() that automatically print an
error message and exit if the standard routines return NULL. Such a wrapper might look
like this:
#include <stdio.h>
#include <stdlib.h>

extern const char *myname;    /* set in main() */

void *xrealloc(void *ptr, size_t amount)
{
    void *p = realloc(ptr, amount);

    if (p == NULL) {
        fprintf(stderr, "%s: out of memory!\n", myname);
        exit(1);
    }

    return p;
}
Thus, if xrealloc() returns, it's guaranteed to return a valid pointer. (This strategy
complies with the "check every call for errors" principle while avoiding the code clutter
that comes with doing so using the standard routines directly.) In addition, this allows
valid use of the construct 'ptr = xrealloc(ptr, new_size)', which we otherwise
warned against earlier.
Note that it is not always appropriate to use such a wrapper. If you wish to handle
errors yourself, you shouldn't use it. On the other hand, if running out of memory is
always a fatal error, then such a wrapper is quite handy.
 97    if (ferror (ebuf->fp))
 98      pfatal_with_name (ebuf->floc.filenm);
 99
100    /* If we found some lines, return how many.
101       If we didn't, but we did find _something_, that indicates we read the last
102       line of a file with no final newline; return 1.
103       If we read nothing, we're at EOF; return -1.  */
104
105    return nlines ? nlines : p == ebuf->bufstart ? -1 : 1;
106  }

Finally, the readline() routine checks for I/O errors, and then returns a descriptive
return value. The function pfatal_with_name() (line 98) doesn't return.

3.2.1.9 GLIBC Only: Reading Entire Lines: getline() and getdelim()
Now that you've seen how to read an arbitrary-length line, you can breathe a sigh
of relief that you don't have to write such a function for yourself. GLIBC provides two
functions to do this for you:
#define _GNU_SOURCE 1                                                  GLIBC
#include <stdio.h>
#include <sys/types.h>     /* for ssize_t */

ssize_t getline(char **lineptr, size_t *n, FILE *stream);
ssize_t getdelim(char **lineptr, size_t *n, int delim, FILE *stream);

Defining the constant _GNU_SOURCE brings in the declaration of the getline()
and getdelim() functions. Otherwise, they're implicitly declared as returning int.
<sys/types.h> is needed so you can declare a variable of type ssize_t to hold the
return value. (An ssize_t is a "signed size_t." It's meant for the same use as a size_t,
but for places where you need to be able to hold negative values as well.)
Both functions manage dynamic storage for you, ensuring that the buffer containing
an input line is always big enough to hold the input line. They differ in that getline()
reads until a newline character, and getdelim() uses a user-provided delimiter character.
The common arguments are as follows:
char **lineptr
A pointer to a char * pointer to hold the address of a dynamically allocated
buffer. It should be initialized to NULL if you want getline() to do all the work.
Otherwise, it should point to storage previously obtained from malloc().
size_t *n
An indication of the size of the buffer. If you allocated your own buffer, *n should
contain the buffer's size. Both functions update *n to the new buffer size if they
change it.
FILE *stream
The location from which to get input characters.

The functions return -1 upon end-of-file or error. The strings hold the terminating
newline or delimiter (if there was one), as well as a terminating zero byte. Using
getline() is easy, as shown in ch03-getline.c:
/* ch03-getline.c --- demonstrate getline(). */

#define _GNU_SOURCE 1
#include <stdio.h>
#include <sys/types.h>

/* main --- read a line and echo it back out until EOF. */

int main(void)
{
    char *line = NULL;
    size_t size = 0;
    ssize_t ret;

    while ((ret = getline(& line, & size, stdin)) != -1)
        printf("(%lu) %s", (unsigned long) size, line);

    return 0;
}

Here it is in action, showing the size of the buffer. The third input and output lines
are purposely long, to force getline() to grow the buffer; thus, they wrap around:
$ ch03-getline                                               Run the program
this is a line
(120) this is a line
And another line.
(120) And another line.
A llllllllllllllllloooooooooooooooooooooooooooooooonnnnnnnnnnnnnnnnnnngggg
gggggggg llliiiiiiiiiiiiiiiiiiinnnnnnnnnnnnnnnnnnnneeeeeeeeee
(240) A llllllllllllllllloooooooooooooooooooooooooooooooonnnnnnnnnnnnnnnnnnngggg
gggggggg llliiiiiiiiiiiiiiiiiiinnnnnnnnnnnnnnnnnnnneeeeeeeeee
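getdelim() is used the same way when the input isn't divided by newlines. Here is a minimal sketch that counts delimiter-separated fields; the function name count_fields is hypothetical, and the single call to free() after the loop reflects our reading that the buffer belongs to the caller once the functions have allocated it:

```c
#define _GNU_SOURCE 1
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * count_fields --- count the delim-separated fields readable from fp.
 * getdelim() allocates and grows the buffer; it is released once,
 * after the loop. Returns the number of fields read.
 */
long count_fields(FILE *fp, int delim)
{
    char *field = NULL;     /* let getdelim() allocate */
    size_t size = 0;
    long count = 0;

    while (getdelim(&field, &size, delim, fp) != -1)
        count++;

    free(field);            /* release whatever getdelim() allocated */
    return count;
}
```

Calling count_fields(fp, ':') on a file containing alpha:beta:gamma reads three fields; the last one has no trailing delimiter because end-of-file came first.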

3.2.2 String Copying: strdup()


One extremely common operation is to allocate storage for a copy of a string. It's so
common that many programs provide a simple function for it instead of using inline
code, and often that function is named strdup():
#include <stdlib.h>
#include <string.h>

/* strdup --- malloc() storage for a copy of string and copy it */

char *strdup(const char *str)
{
    size_t len;
    char *copy;

    len = strlen(str) + 1;     /* include room for terminating '\0' */
    copy = malloc(len);

    if (copy != NULL)
        strcpy(copy, str);

    return copy;               /* returns NULL if error */
}

With the 2001 POSIX standard, programmers the world over can breathe a little
easier: This function is now part of POSIX as an XSI extension:
#include <string.h>                                                      XSI

char *strdup(const char *str);                                 Duplicate str

The return value is NULL if there was an error or a pointer to dynamically allocated
storage holding a copy of str. The returned value should be freed with free() when
it's no longer needed.
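A typical use is making a private copy of a string that must not be modified in place. The sketch below is only an illustration; the function name upcase_copy is hypothetical, and the _XOPEN_SOURCE definition is our assumption about how to request the XSI declaration on systems that need it:

```c
#define _XOPEN_SOURCE 600   /* ask for XSI extensions such as strdup() */
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * upcase_copy --- return a newly allocated uppercase copy of str,
 * or NULL if memory could not be allocated. The caller owns the
 * result and must free() it.
 */
char *upcase_copy(const char *str)
{
    char *copy = strdup(str);
    char *p;

    if (copy == NULL)
        return NULL;

    for (p = copy; *p != '\0'; p++)
        *p = toupper((unsigned char) *p);

    return copy;
}
```

The original string is untouched, and the copy is released with free() like any other block obtained from malloc().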

3.2.3 System Calls: brk() and sbrk()


The four routines we've covered (malloc(), calloc(), realloc(), and free())
are the standard, portable functions to use for dynamic memory management.
On Unix systems, the standard functions are implemented on top of two additional,
very primitive routines, which directly change the size of a process's address space. We
present them here to help you understand how GNU/Linux and Unix work ("under
the hood" again); it is highly unlikely that you will ever need to use these functions in
a regular program. They are declared as follows:
#include <unistd.h>                                                   Common
#include <malloc.h>        /* Necessary for GLIBC 2 systems */

int brk(void *end_data_segment);
void *sbrk(ptrdiff_t increment);

The brk() system call actually changes the process's address space. The address is a
pointer representing the end of the data segment (really the heap area, as shown earlier
in Figure 3.1). Its argument is an absolute logical address representing the new end of
the address space. It returns 0 on success or -1 on failure.
The sbrk() function is easier to use; its argument is the increment in bytes by which
to change the address space. By calling it with an increment of 0, you can determine
where the address space currently ends. Thus, to increase your address space by 32
bytes, use code like this:
char *p = (char *) sbrk(0);    /* get current end of address space */
if (brk(p + 32) < 0) {
    /* handle error */
}
/* else, change worked */

Practically speaking, you would not use brk() directly. Instead, you would use
sbrk() exclusively to grow (or even shrink) the address space. (We show how to do
this shortly, in Section 3.2.5, "Address Space Examination," page 78.)
Even more practically, you should never use these routines. A program using them
can't then use malloc() also, and this is a big problem, since many parts of the standard
library rely on being able to use malloc(). Using brk() or sbrk() is thus likely to
lead to hard-to-find program crashes.
But it's worth knowing about the low-level mechanics, and indeed, the malloc()
suite of routines is implemented with sbrk() and brk().

3.2.4 Lazy Programmer Calls: alloca()


"Danger, Will Robinson! Danger! "
-The Robot-
There is one additional memory allocation function that you should know about.
We discuss it only so that you'll understand it when you see it, but you should not use
it in new programs! This function is named alloca(); it's declared as follows:
/* Header on GNU/Linux, possibly not all Unix systems */              Common
#include <alloca.h>

void *alloca(size_t size);



The alloca() function allocates size bytes from the stack. What's nice about this
is that the allocated storage disappears when the function returns. There's no need to
explicitly free it because it goes away automatically, just as local variables do.
At first glance, alloca() seems like a programming panacea; memory can be
allocated that doesn't have to be managed at all. Like the Dark Side of the Force, this is
indeed seductive. And it is similarly to be avoided, for the following reasons:

• The function is nonstandard; it is not included in any formal standard, either ISO
C or POSIX.
• The function is not portable. Although it exists on many Unix systems and
GNU/Linux, it doesn't exist on non-Unix systems. This is a problem, since it's
often important for code to be multiplatform, above and beyond just Linux
and Unix.
• On some systems, alloca() can't even be implemented. All the world is not an
Intel x86 processor, nor is all the world GCC.
• Quoting the manpage (emphasis added): "The alloca function is machine
and compiler dependent. On many systems its implementation is buggy. Its use is
discouraged."
• Quoting the manpage again: "On many systems alloca cannot be used inside
the list of arguments of a function call, because the stack space reserved by alloca
would appear on the stack in the middle of the space for the function arguments."
• It encourages sloppy coding. Careful and correct memory management isn't hard;
you just have to think about what you're doing and plan ahead.

GCC generally uses a built-in version of the function that operates by using inline
code. As a result, there are other consequences of alloca(). Quoting again from
the manpage:
The fact that the code is inlined means that it is impossible to take the address
of this function, or to change its behavior by linking with a different library.
The inlined code often consists of a single instruction adjusting the stack
pointer, and does not check for stack overflow. Thus, there is no NULL error
return.

The manual page doesn't go quite far enough in describing the problem with GCC's
built-in alloca(). If there's a stack overflow, the return value is garbage. And you have
no way to tell! This flaw makes GCC's alloca() impossible to use in robust code.
All of this should convince you to stay away from alloca() for any new code that
you may write. If you're going to have to write portable code using malloc() and
free() anyway, there's no reason to also write code using alloca().
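For comparison, here is what the portable alternative looks like for a function that needs scratch storage only for the duration of one call. This is a sketch with hypothetical names (cmp_int and median); for an even number of elements it simply returns the upper of the two middle values:

```c
#include <stdlib.h>
#include <string.h>

/* cmp_int --- qsort() comparison function for ints */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a, y = *(const int *) b;

    return (x > y) - (x < y);
}

/*
 * median --- store the middle value of v[0..n-1] in *result.
 * The scratch copy comes from malloc() rather than alloca(), so a
 * request that can't be satisfied is reported (-1) instead of
 * silently overrunning the stack. The caller's array is not modified.
 */
int median(const int *v, size_t n, int *result)
{
    int *scratch;

    if (n == 0 || (scratch = malloc(n * sizeof(int))) == NULL)
        return -1;

    memcpy(scratch, v, n * sizeof(int));
    qsort(scratch, n, sizeof(int), cmp_int);
    *result = scratch[n / 2];

    free(scratch);      /* the one line alloca() would have saved */
    return 0;
}
```

The cost of doing it properly is one error check and one call to free().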

3.2.5 Address Space Examination


The following program, ch03-memaddr.c, summarizes everything we've seen about
the address space. It does many things that you should not do in practice, such as call
alloca() or use brk() and sbrk() directly:
 1  /*
 2   * ch03-memaddr.c --- Show address of code, data and stack sections,
 3   *                    as well as BSS and dynamic memory.
 4   */
 5
 6  #include <stdio.h>
 7  #include <malloc.h>     /* for definition of ptrdiff_t on GLIBC */
 8  #include <unistd.h>
 9  #include <alloca.h>     /* for demonstration only */
10
11  extern void afunc(void);    /* a function for showing stack growth */
12
13  int bss_var;            /* auto init to 0, should be in BSS */
14  int data_var = 42;      /* init to nonzero, should be data */
15
16  int
17  main(int argc, char **argv) /* arguments aren't used */
18  {
19      char *p, *b, *nb;
20
21      printf("Text Locations:\n");
22      printf("\tAddress of main: %p\n", main);
23      printf("\tAddress of afunc: %p\n", afunc);
24
25      printf("Stack Locations:\n");
26      afunc();
27
28      p = (char *) alloca(32);
29      if (p != NULL) {
30          printf("\tStart of alloca()'ed array: %p\n", p);
31          printf("\tEnd of alloca()'ed array: %p\n", p + 31);
32      }
33
3.2 Memory Allocarion 79

34 printf ( "Data Locations : \n " ) ;


35 printf( "\tAddress of data_var : %p\n", & data_var) ;
36
37 printf("BSS Locations : \n " ) ;
38 printf("\tAddress of bss_var : %p\n", & bss_var) ;
39
40 b = sbrk((ptrdiff_t) 32); 1* grow address space *1
41 nb = sbrk((ptrdiff_t) 0) ;
42 printf("Heap Locations : \ n" ) ;
43 printf("\tInitial end of heap : %p\n", b) ;
44 printf("\tNew end of heap : %p \n", nb) ;
45
46 b = sbrk( (ptrdiff_t) -16) ; 1* shrink it *1
47 nb = sbrk( (ptrdiff _ t) 0) ;
48 printf(" \tF inal end of heap : %p\n", nb) ;
49
50
51 void
52 afunc (void)
53 {
54 static int level = 0 ; 1* recursion level *1
55 auto int stack_var; 1* automatic variable, on stack * 1
56
57 if ( ++ level == 3) 1* a v oid infinite recursion * 1
58 return;
59
60 printf("\tStack leve l %d : address of stack_var : %p\n ",
61 level , & stack_var);
62 afunc () ; 1 * recursive call * 1
63

This program prints the locations of the two functions main() and afunc() (lines
22-23). It then shows how the stack grows downward, letting afunc() (lines 51-63)
print the address of successive instantiations of its local variable stack_var. (stack_var
is purposely declared auto, to emphasize that it's on the stack.) It then shows the
location of memory allocated by alloca() (lines 28-32). Finally it prints the locations of
data and BSS variables (lines 34-38), and then of memory allocated directly through
sbrk() (lines 40-48). Here are the results when the program is run on an Intel
GNU/Linux system:
$ ch03-memaddr
Text Locations:
        Address of main: 0x804838c
        Address of afunc: 0x80484a8
Stack Locations:
        Stack level 1: address of stack_var: 0xbffff864
        Stack level 2: address of stack_var: 0xbffff844     Stack grows downward
        Start of alloca()'ed array: 0xbffff860
        End of alloca()'ed array: 0xbffff87f                Addresses are on the stack
Data Locations:
        Address of data_var: 0x80496b8
BSS Locations:
        Address of bss_var: 0x80497c4                       BSS is above data variables
Heap Locations:
        Initial end of heap: 0x80497c8                      Heap is immediately above BSS
        New end of heap: 0x80497e8                          And grows upward
        Final end of heap: 0x80497d8                        Address spaces can shrink

3.3 Summary
• Every Linux (and Unix) program has different memory areas. They are stored in
separate parts of the executable program's disk file. Some of the sections are loaded
into the same part of memory when the program is run. All running copies of the
same program share the executable code (the text segment). The size program
shows the sizes of the different areas for relocatable object files and fully linked
executable files.
• The address space of a running program may have holes in it, and the size of the
address space can change as memory is allocated and released. On modern systems,
address 0 is not part of the address space, so don't attempt to dereference
NULL pointers.

• At the C level, memory is allocated or reallocated with one of malloc(),
calloc(), or realloc(). Memory is freed with free(). (Although realloc()
can do everything, using it that way isn't recommended.) It is unusual for freed
memory to be removed from the address space; instead, it is reused for
later allocations.
• Extreme care must be taken to
• Free only memory received from the allocation routines,
• Free such memory once and only once,
• Free unused memory, and
• Not "leak" any dynamically allocated memory.
• POSIX provides the strdup() function as a convenience, and GLIBC provides
getline() and getdelim() for reading arbitrary-length lines.

• The low-level system call interface functions, brk() and sbrk(), provide direct
but primitive access to memory allocation and deallocation. Unless you are writing
your own storage allocator, you should not use them.
• The alloca() function for allocating memory on the stack exists, but is not
recommended. Like being able to recognize poison ivy, you should know it only so
that you'll know to avoid it.

Exercises

1. Starting with the structure-


struct line {

char *buf;
FILE * fp;
};

-write your own readline() function that will read an any-length line.
Don't worry about backslash continuation lines. Instead of using fgets() to
read lines, use getc() to read characters one at a time.
2. Does your function preserve the terminating newline? Explain why or why not.
3. How does your function handle lines that end in CR-LF?
4. How do you initialize the structure? With a separate routine? With a
documented requirement for specific values in the structure?
5. How do you indicate end-of-file? How do you indicate that an I/O error has
occurred? For errors, should your function print an error message? Explain why
or why not.
6. Write a program that uses your function to test it, and another program to
generate input data to the first program. Test your function.
7. Rewrite your function to use fgets() and test it. Is the new code more complex
or less complex? How does its performance compare to the getc() version?
8. Study the V7 end(3) manpage (/usr/man/man3/end.3 in the V7 distribution).
Does it shed any light on how 'sbrk(0)' might work?
9. Enhance ch03-memaddr.c to print out the location of the arguments and the
environment. In which part of the address space do they reside?
In this chapter

• 4.1 Introducing the Linux/Unix I/O Model
• 4.2 Presenting a Basic Program Structure
• 4.3 Determining What Went Wrong
• 4.4 Doing Input and Output
• 4.5 Random Access: Moving Around within a File
• 4.6 Creating Files
• 4.7 Forcing Data to Disk
• 4.8 Setting File Length
• 4.9 Summary
• Exercises

This chapter describes basic file operations: opening and creating files, reading
and writing them, moving around in them, and closing them. Along the way
it presents the standard mechanisms for detecting and reporting errors. The chapter
ends off by describing how to set a file's length and force file data and metadata
to disk.

4.1 Introducing the Linux/Unix I/O Model


The Linux/Unix API model for I/O is straightforward. It can be summed up in four
words: open, read, write, close. In fact, those are the names of the system calls: open(),
read(), write(), close(). Here are their declarations:
    #include <sys/types.h>                                  POSIX
    #include <sys/stat.h>       /* for mode_t */
    #include <fcntl.h>          /* for flags for open() */
    #include <unistd.h>         /* for ssize_t */

    int open(const char *pathname, int flags, mode_t mode);

    ssize_t read(int fd, void *buf, size_t count);
    ssize_t write(int fd, const void *buf, size_t count);
    int close(int fd);

In the next and subsequent sections, we illustrate the model by writing a very simple
version of cat. It's so simple that it doesn't even have options; all it does is concatenate
the contents of the named files to standard output. It does do minimal error reporting.
Once it's written, we compare it to the V7 cat.
We present the program top-down, starting with the command line. In succeeding
sections, we present error reporting and then get down to brass tacks, showing how to
do actual file I/O.

4.2 Presenting a Basic Program Structure


Our version of cat follows a structure that is generally useful. The first part starts
with an explanatory comment, header includes, declarations, and the main() function:
 1  /*
 2   * ch04-cat.c --- Demonstrate open(), read(), write(), close(),
 3   *                errno and strerror().
 4   */
 5
 6  #include <stdio.h>      /* for fprintf(), stderr, BUFSIZ */
 7  #include <errno.h>      /* declare errno */
 8  #include <fcntl.h>      /* for flags for open() */
 9  #include <string.h>     /* declare strerror() */
10  #include <unistd.h>     /* for ssize_t */
11  #include <sys/types.h>
12  #include <sys/stat.h>   /* for mode_t */
13
14  char *myname;
15  int process(char *file);
16
17  /* main --- loop over file arguments */
18
19  int
20  main(int argc, char **argv)
21  {
22      int i;
23      int errs = 0;
24
25      myname = argv[0];
26
27      if (argc == 1)
28          errs = process("-");
29      else
30          for (i = 1; i < argc; i++)
31              errs += process(argv[i]);
32
33      return (errs != 0);
34  }
... continued later in the chapter ...

The myname variable (line 14) is used later for error messages; main() sets it to the
program name (argv[0]) as its first action (line 25). Then main() loops over the
arguments. For each argument, it calls a function named process() to do the work.
When given the filename - (a single dash, or minus sign), Unix cat reads standard
input instead of trying to open a file named -. In addition, with no arguments, cat
reads standard input. ch04-cat implements both of these behaviors. The check for
'argc == 1' (line 27) is true when there are no filename arguments; in this case, main()
passes "-" to process(). Otherwise, main() loops over all the arguments, treating
them as files to be processed. If one of them happens to be "-", the program then
processes standard input.
If process() returns a nonzero value, it means that something went wrong. Errors
are added up in the errs variable (lines 28 and 31). When main() ends, it returns 0
if there were no errors, and 1 if there were (line 33). This is a fairly standard convention,
whose meaning is discussed in more detail in Section 9.1.5.1, "Defining Process Exit
Status," page 300.

The structure presented in main() is quite generic: process() could do anything
we want to the file. For example (ignoring the special use of "-"), process() could
just as easily remove files as concatenate them!
Before looking at the process() function, we have to describe how system call errors
are represented and then how I/O is done. The process() function itself is presented
in Section 4.4.3, "Reading and Writing," page 96.

4.3 Determining What Went Wrong


"If anything can go wrong, it will."
-Murphy's Law-
"Be prepared."
-The Boy Scouts-
Errors can occur anytime. Disks can fill up, users can enter invalid data, the server
on a network from which a file is being read can crash, the network can die, and so on.
It is important to always check every operation for success or failure.
The basic Linux system calls almost universally return -1 on error, and 0 or a positive
value on success. This lets you know whether the operation has succeeded or failed:
    int result;

    result = some_system_call(param1, param2);
    if (result < 0) {
        /* error occurred, do something */
    }
    else
        /* all ok, proceed */
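
To make the pattern concrete, here is a sketch that applies it to a real system call, close(). The wrapper name try_close() is hypothetical; a real program would report the error where the comment says so, rather than just returning a status.

```c
#include <unistd.h>     /* close() */

/*
 * try_close --- hypothetical wrapper showing the usual check:
 *               returns 0 if close() succeeded, 1 if it failed.
 */
int
try_close(int fd)
{
    int result;

    result = close(fd);
    if (result < 0) {
        /* error occurred; a real program would report it here */
        return 1;
    }

    /* all ok, proceed */
    return 0;
}
```

Calling try_close(-1) takes the error branch, since -1 is never a valid file descriptor.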

Knowing that an error occurred isn't enough. It's necessary to know what error
occurred. For that, each process has a predefined variable named errno. Whenever a
system call fails, errno is set to one of a set of predefined error values. errno and the
predefined values are declared in the <errno.h> header file:
    #include <errno.h>                                      ISO C

    extern int errno;

errno itself may be a macro that acts like an int variable; it need not be a real integer.
In particular, in threaded environments, each thread will have its own private version
of errno. Practically speaking, though, for all the system calls and functions in this
book, you can treat errno like a simple int.

4.3.1 Values for errno


The 2001 POSIX standard defines a large number of possible values for errno.
Many of these are related to networking, IPC, or other specialized tasks. The man page
for each system call describes the possible errno values that can occur; thus, you can
write code to check for particular errors and handle them specially if need be. The
possible values are defined by symbolic constants. Table 4.1 lists the constants provided
by GLIBC.

TABLE 4.1
GLIBC values for errno

Name              Meaning
E2BIG             Argument list too long.
EACCES            Permission denied.
EADDRINUSE        Address in use.
EADDRNOTAVAIL     Address not available.
EAFNOSUPPORT      Address family not supported.
EAGAIN            Resource unavailable, try again (may be the same value as EWOULDBLOCK).
EALREADY          Connection already in progress.
EBADF             Bad file descriptor.
EBADMSG           Bad message.
EBUSY             Device or resource busy.
ECANCELED         Operation canceled.
ECHILD            No child processes.
ECONNABORTED      Connection aborted.
ECONNREFUSED      Connection refused.
ECONNRESET        Connection reset.
EDEADLK           Resource deadlock would occur.
EDESTADDRREQ      Destination address required.
EDOM              Mathematics argument out of domain of function.
EDQUOT            Reserved.
EEXIST            File exists.
EFAULT            Bad address.
EFBIG             File too large.
EHOSTUNREACH      Host is unreachable.
EIDRM             Identifier removed.
EILSEQ            Illegal byte sequence.
EINPROGRESS       Operation in progress.
EINTR             Interrupted function.
EINVAL            Invalid argument.
EIO               I/O error.
EISCONN           Socket is connected.
EISDIR            Is a directory.
ELOOP             Too many levels of symbolic links.
EMFILE            Too many open files.
EMLINK            Too many links.
EMSGSIZE          Message too large.
EMULTIHOP         Reserved.
ENAMETOOLONG      Filename too long.
ENETDOWN          Network is down.
ENETRESET         Connection aborted by network.
ENETUNREACH       Network unreachable.
ENFILE            Too many files open in system.
ENOBUFS           No buffer space available.
ENODEV            No such device.
ENOENT            No such file or directory.
ENOEXEC           Executable file format error.
ENOLCK            No locks available.
ENOLINK           Reserved.
ENOMEM            Not enough space.
ENOMSG            No message of the desired type.
ENOPROTOOPT       Protocol not available.
ENOSPC            No space left on device.
ENOSYS            Function not supported.
ENOTCONN          The socket is not connected.
ENOTDIR           Not a directory.
ENOTEMPTY         Directory not empty.
ENOTSOCK          Not a socket.
ENOTSUP           Not supported.
ENOTTY            Inappropriate I/O control operation.
ENXIO             No such device or address.
EOPNOTSUPP        Operation not supported on socket.
EOVERFLOW         Value too large to be stored in data type.
EPERM             Operation not permitted.
EPIPE             Broken pipe.
EPROTO            Protocol error.
EPROTONOSUPPORT   Protocol not supported.
EPROTOTYPE        Protocol wrong type for socket.
ERANGE            Result too large.
EROFS             Read-only file system.
ESPIPE            Invalid seek.
ESRCH             No such process.
ESTALE            Reserved.
ETIMEDOUT         Connection timed out.
ETXTBSY           Text file busy.
EWOULDBLOCK       Operation would block (may be the same value as EAGAIN).
EXDEV             Cross-device link.
90 Chapter 4 • Files and File I/O

Many systems provide other error values as well, and older systems may not have all
the errors just listed. You should check your local intro(2) and errno(2) manpages for
the full story.
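
As an illustration of checking for particular errors, the following sketch tries to open a file and classifies the outcome by comparing errno against two of the constants from Table 4.1. The function name open_status() and its return codes are hypothetical, chosen only for this example.

```c
#include <errno.h>      /* errno, ENOENT, EACCES */
#include <fcntl.h>      /* open(), O_RDONLY */
#include <unistd.h>     /* close() */

/*
 * open_status --- hypothetical helper: try to open path for reading.
 *   Returns 0 if the file could be opened, 1 if it does not exist,
 *   2 if permission was denied, and 3 for any other error.
 */
int
open_status(const char *path)
{
    int fd = open(path, O_RDONLY);

    if (fd >= 0) {
        (void) close(fd);   /* only probing; nothing was written */
        return 0;
    }

    switch (errno) {        /* examined right after the failing call */
    case ENOENT:
        return 1;
    case EACCES:
        return 2;
    default:
        return 3;
    }
}
```

A program could use such a classification to, say, create a missing file but give up at once on a permissions problem.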

NOTE errno should be examined only after an error has occurred and before
further system calls are made. Its initial value is 0. However, nothing changes
errno between errors, meaning that a successful system call does not reset it
to 0. You can, of course, manually set it to 0 initially or whenever you like, but
this is rarely done.
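
One way to follow this advice is to copy errno into a local variable as soon as a call fails, before making any further calls. The sketch below shows the idea; failed_close_errno() is a hypothetical demonstration function that forces close() to fail by passing it an invalid descriptor.

```c
#include <errno.h>      /* errno */
#include <unistd.h>     /* close() */

/*
 * failed_close_errno --- hypothetical demonstration: force close() to
 *   fail, save errno at once, make another system call, and return
 *   the saved value.
 */
int
failed_close_errno(void)
{
    int saved = 0;

    if (close(-1) < 0)      /* -1 is never a valid descriptor */
        saved = errno;      /* copy before anything else runs */

    /* A later call is free to overwrite errno; saved is unaffected. */
    (void) close(-1);

    return saved;
}
```

Here the function returns EBADF, the value that close() stores in errno for an invalid file descriptor.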

Initially, we use errno only for error reporting. There are two useful functions for
error reporting. The first is perror():
    #include <stdio.h>                                      ISO C

    void perror(const char *s);

The perror() function prints a program-supplied string, followed by a colon, and
then a string describing the value of errno:
    if (some_system_call(param1, param2) < 0) {
        perror("system call failed");
        return 1;
    }

We prefer the strerror() function, which takes an error value parameter and returns
a pointer to a string describing the error:
    #include <string.h>                                     ISO C

    char *strerror(int errnum);

strerror() provides maximum flexibility in error reporting, since fprintf()
makes it possible to print the error in any way we like:
    if (some_system_call(param1, param2) < 0) {
        fprintf(stderr, "%s: %d, %d: some_system_call failed: %s\n",
            argv[0], param1, param2, strerror(errno));
        return 1;
    }

You will see many examples of both functions throughout the book.
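
To show the flexibility that strerror() offers, here is a sketch that builds a complete message in a caller-supplied buffer instead of printing it. The helper format_error() and its "program: file: what: reason" layout are assumptions made for this example, loosely modeled on the messages used later by ch04-cat.

```c
#include <errno.h>      /* ENOENT and friends, for callers */
#include <stdio.h>      /* snprintf() */
#include <string.h>     /* strerror() */

/*
 * format_error --- hypothetical helper: build a message of the form
 *   "program: file: what: reason" in buf, where reason comes from
 *   strerror(errnum). Returns the value returned by snprintf().
 */
int
format_error(char *buf, size_t size, const char *program,
        const char *file, const char *what, int errnum)
{
    return snprintf(buf, size, "%s: %s: %s: %s",
            program, file, what, strerror(errnum));
}
```

Passing errnum explicitly, rather than reading errno inside the helper, lets the caller save errno right after the failing call and format the message later.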

4.3.2 Error Message Style


C provides several special macros for use in error reporting. The most widely used
are __FILE__ and __LINE__, which expand to the name of the source file and the
current line number in that file. These have been available in C since its beginning.
C99 defines an additional predefined identifier, __func__, which represents the name
of the current function as a character string. The macros are used like this:
    if (some_system_call(param1, param2) < 0) {
        fprintf(stderr, "%s: %s (%s %d): some_system_call(%d, %d) failed: %s\n",
            argv[0], __func__, __FILE__, __LINE__,
            param1, param2, strerror(errno));
        return 1;
    }

Here, the error message includes not only the program's name but also the function
name, source file name, and line number. The full list of identifiers useful for diagnostics
is provided in Table 4.2.

TABLE 4.2
C99 diagnostic identifiers

Identifier   C version   Meaning
__DATE__     C89         Date of compilation in the form "Mmm nn yyyy".
__FILE__     Original    Source-file name in the form "program.c".
__LINE__     Original    Source-file line number in the form 42.
__TIME__     C89         Time of compilation in the form "hh:mm:ss".
__func__     C99         Name of current function, as if declared
                         const char __func__[] = "name".

The use of __FILE__ and __LINE__ was quite popular in the early days of Unix,
when most people had source code and could find the error and fix it. As Unix systems
became more commercial, use of these identifiers gradually diminished, since knowing
the source code location isn't of much help to someone who only has a binary executable.
Today, although GNU/Linux systems come with source code, said source code often
isn't installed by default. Thus, using these identifiers for error messages doesn't seem
to provide much additional value. The GNU Coding Standards don't even mention them.
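
For readers who do want these identifiers in their own diagnostics, the following sketch wraps them in a macro. WHERE() and load_config() are hypothetical names used only for illustration. The wrapper has to be a macro rather than a function, so that __func__, __FILE__, and __LINE__ describe the place where it is used.

```c
#include <stdio.h>      /* snprintf() */

/*
 * WHERE --- hypothetical macro: store "function (file line)" for the
 *           point of use into buf. It must be a macro, not a function,
 *           so that the identifiers describe the caller's location.
 */
#define WHERE(buf, size) \
    snprintf((buf), (size), "%s (%s %d)", __func__, __FILE__, __LINE__)

/* load_config --- hypothetical function that records where it ran */
int
load_config(char *where, size_t size)
{
    return WHERE(where, size);
}
```

After a call to load_config(), the buffer begins with "load_config (", followed by the name of the source file and the line number of the WHERE() invocation.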

4.4 Doing Input and Output


All I/O in Linux is accomplished through file descriptors. This section introduces file
descriptors, describes how to obtain and release them, and explains how to do I/O
with them.

4.4.1 Understanding File Descriptors


A file descriptor is an integer value. Valid file descriptors start at 0 and go up to some
system-defined limit. These integers are in fact simple indexes into each process's table
of open files. (This table is maintained inside the operating system; it is not accessible
to a running program.) On most modern systems, the size of the table is large. The
command 'ulimit -n' prints the value:
$ ulimit -n
1024

From C, the maximum number of open files is returned by the getdtablesize()
(get descriptor table size) function:
    #include <unistd.h>                                     Common

    int getdtablesize(void);

This small program prints the result of the function:

    /* ch04-maxfds.c --- Demonstrate getdtablesize(). */

    #include <stdio.h>      /* for fprintf(), stderr, BUFSIZ */
    #include <stdlib.h>     /* for exit() */
    #include <unistd.h>     /* for ssize_t */

    int
    main(int argc, char **argv)
    {
        printf("max fds: %d\n", getdtablesize());
        exit(0);
    }

When compiled and run, not surprisingly the program prints the same value as
printed by ulimit:
    $ ch04-maxfds
    max fds: 1024

File descriptors are held in normal int variables; it is typical to see declarations of the
form 'int fd' for use with I/O system calls. There is no predefined type for
file descriptors.
In the usual case, every program starts running with three file descriptors already
opened for it. These are standard input, standard output, and standard error, on file
descriptors 0, 1, and 2, respectively. (If not otherwise redirected, each one is connected
to your keyboard and screen.)

Obvious Manifest Constants. An Oxymoron?

When working with file-descriptor-based system calls and the standard input, output
and error, it is common practice to use the integer constants 0, 1, and 2 directly in code.
In the overwhelming majority of cases, such manifest constants are a bad idea. You never
know what the meaning is of some random integer constant and whether the same
constant used elsewhere is related to it or not. To this end, the POSIX standard requires
the definition of the following symbolic constants in <unistd.h>:
STDIN_FILENO     The "file number" for standard input: 0.
STDOUT_FILENO    The file number for standard output: 1.
STDERR_FILENO    The file number for standard error: 2.
However, in our humble opinion, using these macros is overkill. First, it's painful to
type 12 or 13 characters instead of just 1. Second, the use of 0, 1, and 2 is so standard
and so well known that there's really no grounds for confusion as to the meaning of these
particular manifest constants.
On the other hand, use of these constants leaves no doubt as to what was intended.
Consider this statement:
    int fd = 0;

Is fd being initialized to refer to standard input, or is the programmer being careful to
initialize his variables to a reasonable value? You can't tell.
One approach (as recommended by Geoff Collyer) is to use the following enum definition:
    enum { Stdin, Stdout, Stderr };

These constants can then be used in place of 0, 1, and 2. They are both readable and
easier to type.
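
Here is a sketch of the enum approach in use. The helper put_string() is hypothetical; it simply passes its arguments on to the write() system call, which is described later in this chapter.

```c
#include <string.h>     /* strlen() */
#include <unistd.h>     /* write(), STDIN_FILENO and friends */

enum { Stdin, Stdout, Stderr };     /* 0, 1, and 2, in that order */

/*
 * put_string --- hypothetical helper: write the string s to descriptor
 *   fd. Returns the value returned by write().
 */
ssize_t
put_string(int fd, const char *s)
{
    return write(fd, s, strlen(s));
}
```

A call such as put_string(Stderr, "something went wrong\n") then reads more clearly than the same call with a bare 2.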

4.4.2 Opening and Closing Files


New file descriptors are obtained (among other sources) from the open() system
call. This system call opens a file for reading or writing and returns a new file descriptor
for subsequent operations on the file. We saw the declaration earlier:
    #include <sys/types.h>                                  POSIX
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>

    int open(const char *pathname, int flags, mode_t mode);

The three arguments are as follows:

const char *pathname
    A C string, representing the name of the file to open.
int flags
    The bitwise-OR of one or more of the constants defined in <fcntl.h>. We
    describe them shortly.
mode_t mode
    The permissions mode of a file being created. This is discussed later in the chapter,
    see Section 4.6, "Creating Files," page 106. When opening an existing file, omit
    this parameter.¹

The return value from open() is either the new file descriptor or -1 to indicate an
error, in which case errno will be set. For simple I/O, the flags argument should be
one of the values in Table 4.3.

TABLE 4.3
Flag values for open()

Symbolic constant   Value   Meaning
O_RDONLY            0       Open file only for reading; writes will fail.
O_WRONLY            1       Open file only for writing; reads will fail.
O_RDWR              2       Open file for reading and writing.

We will see example code shortly. Additional values for flags are described in
Section 4.6, "Creating Files," page 106. Much early Unix code didn't use the symbolic
values. Instead, the numeric value was used. Today this is considered bad practice, but
we present the values so that you'll recognize their meanings if you see them.
The close() system call closes a file: The entry for it in the system's file descriptor
table is marked as unused, and no further operations may be done with that file
descriptor. The declaration is
    #include <unistd.h>                                     POSIX

    int close(int fd);

¹ open() is one of the few variadic system calls.


The return value is 0 on success, -1 on error. There isn't much you can do if an error
does occur, other than report it. Errors closing files are unusual, but not unheard of,
particularly for files being accessed over a network. Thus, it's good practice to check
the return value, particularly for files opened for writing.
If you choose to ignore the return value, specifically cast it to void, to signify that
you don't care about the result:
    (void) close(fd);       /* throw away return value */

The flip side of this advice is that too many casts to void tend to clutter the code.
For example, despite the "always check the return value" principle, it's exceedingly rare
to see code that checks the return value of printf() or bothers to cast it to void. As
with many aspects of C programming, experience and judgment should be applied
here too.
As mentioned, the number of open files, while large, is limited, and you should always
close files when you're done with them. If you don't, you will eventually run out of file
descriptors, a situation that leads to a lack of robustness on the part of your program.
The system closes all open files when a process exits, but, except for 0, 1, and 2, it's
bad form to rely on this.
When open() returns a new file descriptor, it always returns the lowest unused integer
value. Always. Thus, if file descriptors 0-6 are open and the program closes file descriptor
5, then the next call to open() returns 5, not 7. This behavior is important; we see
later in the book how it's used to cleanly implement many important Unix features,
such as I/O redirection and piping.
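
The lowest-unused-descriptor rule is easy to observe. The following sketch opens a file twice, closes the first descriptor, and opens the file again; the third open() should hand back the number that was just freed. The function name reuses_lowest_fd() is hypothetical, and the sketch assumes that /dev/null exists and that nothing else in the process opens files at the same time.

```c
#include <fcntl.h>      /* open(), O_RDONLY */
#include <unistd.h>     /* close() */

/*
 * reuses_lowest_fd --- hypothetical demonstration: returns 1 if a
 *   descriptor number that was just closed is handed out again by
 *   the next open(), 0 if not, and -1 if any call failed.
 */
int
reuses_lowest_fd(void)
{
    int first, second, again;

    first = open("/dev/null", O_RDONLY);
    if (first < 0)
        return -1;

    second = open("/dev/null", O_RDONLY);   /* gets a higher number */
    if (second < 0) {
        (void) close(first);
        return -1;
    }

    if (close(first) < 0) {     /* first is now the lowest free slot */
        (void) close(second);
        return -1;
    }

    again = open("/dev/null", O_RDONLY);
    (void) close(second);
    if (again < 0)
        return -1;

    (void) close(again);

    return again == first;
}
```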

4.4.2.1 Mapping FILE * Variables to File Descriptors


The Standard I/O library functions and FILE * variables from <stdio.h>, such as
stdin, stdout, and stderr, are built on top of the file-descriptor-based system calls.
Occasionally, it's useful to directly access the file descriptor associated with a
<stdio.h> file pointer if you need to do something not defined by the ISO C standard.
The fileno() function returns the underlying file descriptor:
    #include <stdio.h>                                      POSIX

    int fileno(FILE *stream);

We will see an example later, in Section 4.4.4, "Example: Unix cat," page 99.

4.4.2.2 Closing All Open Files


Open files are inherited by child processes from their parent processes. They are, in
effect, shared. In particular, the position in the file is shared. We leave the details for
discussion later, in Section 9.1.1.2, "File Descriptor Sharing," page 286.
Since programs can inherit open files, you may occasionally see programs that close
all their files in order to start out with a "clean slate." In particular, code like this
is typical:
    int i;

    /* leave 0, 1, and 2 alone */
    for (i = 3; i < getdtablesize(); i++)
        (void) close(i);

Assume that the result of getdtablesize() is 1024. This code works, but it makes
(1024 - 3) * 2 = 2042 system calls. 1020 of them are needless, since the return value
from getdtablesize() doesn't change. Here is a better way to write this code:
    int i, fds;

    for (i = 3, fds = getdtablesize(); i < fds; i++)
        (void) close(i);

Such an optimization does not affect the readability of the code, and it can make a
difference, particularly on slow systems. In general, it's worth looking for cases in which
loops compute the same result repeatedly, to see if such a computation can't be pulled
out of the loop. In all such cases, though, be sure that you (a) preserve the code's
correctness and (b) preserve its readability!

4.4.3 Reading and Writing


I/O is accomplished with the read() and write() system calls, respectively:
    #include <sys/types.h>                                  POSIX
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>

    ssize_t read(int fd, void *buf, size_t count);
    ssize_t write(int fd, const void *buf, size_t count);

Each function is about as simple as can be. The arguments are the file descriptor for
the open file, a pointer to a buffer to read data into or to write data from, and the
number of bytes to read or write.
The return value is the number of bytes actually read or written. (This number can
be smaller than the requested amount: For a read operation this happens when fewer
than count bytes are left in the file, and for a write operation it happens if a disk fills
up or some other error occurs.) The return value is -1 if an error occurred, in which
case errno indicates the error. When read() returns 0, it means that end-of-file has
been reached.
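
The three kinds of return value from read() can be seen in a short sketch that counts the bytes available on a file descriptor. The function count_bytes() and its 512-byte buffer are hypothetical choices for this illustration; they are not part of ch04-cat.

```c
#include <unistd.h>     /* read(), ssize_t */

/*
 * count_bytes --- hypothetical helper: read descriptor fd until
 *   end-of-file and return the number of bytes seen, or -1 if
 *   read() reported an error (errno is left set by read()).
 */
long
count_bytes(int fd)
{
    char buffer[512];       /* arbitrary size for this sketch */
    ssize_t rcount;
    long total = 0;

    while ((rcount = read(fd, buffer, sizeof buffer)) > 0)
        total += rcount;    /* may be less than sizeof buffer */

    if (rcount < 0)         /* -1: error */
        return -1;

    return total;           /* rcount == 0: end-of-file */
}
```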
We can now show the rest of the code for ch04-cat. The process() routine uses
0 if the input filename is "-", for standard input (lines 50 and 51). Otherwise, it opens
the given file:
36  /*
37   * process --- do something with the file, in this case,
38   *             send it to stdout (fd 1).
39   *             Returns 0 if all OK, 1 otherwise.
40   */
41
42  int
43  process(char *file)
44  {
45      int fd;
46      ssize_t rcount, wcount;
47      char buffer[BUFSIZ];
48      int errors = 0;
49
50      if (strcmp(file, "-") == 0)
51          fd = 0;
52      else if ((fd = open(file, O_RDONLY)) < 0) {
53          fprintf(stderr, "%s: %s: cannot open for reading: %s\n",
54                  myname, file, strerror(errno));
55          return 1;
56      }

The buffer buffer (line 47) is of size BUFSIZ; this constant is defined by <stdio.h>
to be the "optimal" block size for I/O. Although the value for BUFSIZ varies across
systems, code that uses this constant is clean and portable.
The core of the routine is the following loop, which repeatedly reads data until either
end-of-file or an error is encountered:
58      while ((rcount = read(fd, buffer, sizeof buffer)) > 0) {
59          wcount = write(1, buffer, rcount);
60          if (wcount != rcount) {
61              fprintf(stderr, "%s: %s: write error: %s\n",
62                      myname, file, strerror(errno));
63              errors++;
64              break;
65          }
66      }

The rcount and wcount variables (line 46) are of type ssize_t, "signed size_t,"
which allows them to hold negative values. Note that the count value passed to write()
is the return value from read() (line 59). While we want to read fixed-size BUFSIZ
chunks, it is unlikely that the file itself is a multiple of BUFSIZ bytes big. When the
final, smaller, chunk of bytes is read from the file, the return value indicates how many
bytes of buffer received new data. Only those bytes should be copied to standard
output, not the entire buffer.
The test 'wcount != rcount' on line 60 is the correct way to check for write errors;
if some, but not all, of the data were written, then wcount will be positive but smaller
than rcount.
Finally, process() checks for read errors (lines 68-72) and then attempts to close
the file. In the (unlikely) event that close() fails (line 75), it prints an error message.
Avoiding the close of standard input isn't strictly necessary in this program, but it's a
good habit to develop for writing larger programs, in case other code elsewhere wants
to do something with it or if a child program will inherit it. The last statement (line 82)
returns 1 if there were errors, 0 otherwise.
68      if (rcount < 0) {
69          fprintf(stderr, "%s: %s: read error: %s\n",
70                  myname, file, strerror(errno));
71          errors++;
72      }
73
74      if (fd != 0) {
75          if (close(fd) < 0) {
76              fprintf(stderr, "%s: %s: close error: %s\n",
77                      myname, file, strerror(errno));
78              errors++;
79          }
80      }
81
82      return (errors != 0);
83  }

ch04-cat checks every system call for errors. While this is tedious, it provides
robustness (or at least clarity): When something goes wrong, ch04-cat prints an error
message that is as specific as possible. The combination of errno and strerror()
makes this easy to do. That's it for ch04-cat, only 88 lines of code!
To sum up, there are several points to understand about Unix I/O:

I/O is uninterpreted.
    The I/O system calls merely move bytes around. They do no interpretation of the
    data; all interpretation is up to the user-level program. This makes reading and
    writing binary structures just as easy as reading and writing lines of text (easier,
    really, although using binary data introduces portability problems).
I/O is flexible.
    You can read or write as many bytes at a time as you like. You can even read and
    write data one byte at a time, although doing so for large amounts of data is more
    expensive than doing so in large chunks.
I/O is simple.
    The three-valued return (negative for error, zero for end-of-file, positive for a
    count) makes programming straightforward and obvious.
I/O can be partial.
    Both read() and write() can transfer fewer bytes than requested. Application
    code (that is, your code) must always be aware of this.
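The last point deserves a concrete illustration. One common way to cope with short writes is to loop until every byte has been accepted. The following is only a sketch: the name write_all() is our own invention, not a standard function, and the retry on EINTR assumes that an interrupted call should simply be tried again.

```c
#include <errno.h>
#include <unistd.h>

/*
 * write_all --- hypothetical helper: call write() repeatedly until all
 * count bytes from buf have been written to fd.
 * Returns 0 on success, -1 on error (errno is left as set by write()).
 */
int write_all(int fd, const void *buf, size_t count)
{
    const char *p = buf;

    while (count > 0) {
        ssize_t n = write(fd, p, count);

        if (n < 0) {
            if (errno == EINTR)     /* interrupted before any data moved: retry */
                continue;
            return -1;              /* a real error */
        }
        p += n;                     /* skip past the bytes already written */
        count -= (size_t) n;
    }
    return 0;
}
```

A caller would use it just like write(), for example 'write_all(1, buf, rcount)', and check only for a -1 return.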

4.4.4 Example: Unix cat


As promised, here is the V7 version of cat.² It begins by checking for options. The
V7 cat accepts a single option, -u, for doing unbuffered output.
The basic design is similar to the one shown above; it loops over the files named by
the command-line arguments and reads each file, one character at a time, sending the
characters to standard output. Unlike our version, it uses the <stdio.h> facilities. In
many ways code using the Standard I/O library is easier to read and write, since all
buffering issues are hidden by the library.

2 See /usr/src/cmd/cat.c in the V7 distribution. The program compiles without change under GNU/Linux.

 1  /*
 2   * Concatenate files.
 3   */
 4
 5  #include <stdio.h>
 6  #include <sys/types.h>
 7  #include <sys/stat.h>
 8
 9  char    stdbuf[BUFSIZ];
10
11  main(argc, argv)                                int main(int argc, char **argv)
12  char **argv;
13  {
14      int fflg = 0;
15      register FILE *fi;
16      register c;
17      int dev, ino = -1;
18      struct stat statb;
19
20      setbuf(stdout, stdbuf);
21      for( ; argc>1 && argv[1][0]=='-'; argc--,argv++) {
22          switch(argv[1][1]) {                    Process options
23          case 0:
24              break;
25          case 'u':
26              setbuf(stdout, (char *)NULL);
27              continue;
28          }
29          break;
30      }
31      fstat(fileno(stdout), &statb);              Lines 31-36 explained in Chapter 5
32      statb.st_mode &= S_IFMT;
33      if (statb.st_mode!=S_IFCHR && statb.st_mode!=S_IFBLK) {
34          dev = statb.st_dev;
35          ino = statb.st_ino;
36      }
37      if (argc < 2) {
38          argc = 2;
39          fflg++;
40      }
41      while (--argc > 0) {                        Loop over files
42          if (fflg || (*++argv)[0]=='-' && (*argv)[1]=='\0')
43              fi = stdin;
44          else {
45              if ((fi = fopen(*argv, "r")) == NULL) {
46                  fprintf(stderr, "cat: can't open %s\n", *argv);
47                  continue;
48              }
49          }
50          fstat(fileno(fi), &statb);              Lines 50-56 explained in Chapter 5
51          if (statb.st_dev==dev && statb.st_ino==ino) {
52              fprintf(stderr, "cat: input %s is output\n",
53                  fflg?"-": *argv);
54              fclose(fi);
55              continue;
56          }
57          while ((c=getc(fi)) != EOF)             Copy file contents to stdout
58              putchar(c);
59          if (fi!=stdin)
60              fclose(fi);
61      }
62      return(0);
63  }

Of note is that the program always exits successfully (line 62); it could have been
written to note errors and indicate them in main()'s return value. (The mechanics of
process exiting and the meaning of different exit status values are discussed in
Section 9.1.5.1, "Defining Process Exit Status," page 300.)
The code dealing with the struct stat and the fstat() function (lines 31-36
and 50-56) is undoubtedly opaque, since we haven't yet covered these functions, and
won't until the next chapter. (But do note the use of fileno() on line 50 to get at the
underlying file descriptor associated with the FILE * variables.) The idea behind the
code is to make sure that no input file is the same as the output file. This is intended
to prevent infinite file growth, in case of a command like this:
    $ cat myfile >> myfile                  Append one copy of myfile onto itself?
And indeed, the check works:
    $ echo hi > myfile                      Create a file
    $ v7cat myfile >> myfile                Attempt to append it onto itself
    cat: input myfile is output

If you try this with ch04-cat, it will keep running, and myfile will keep growing
until you interrupt it. The GNU version of cat does perform the check. Note that
something like the following is beyond cat's control:
    $ v7cat < myfile > myfile
    cat: input - is output
    $ ls -l myfile
    -rw-r--r--    1 arnold   devel           0 Mar 24 14:17 myfile

In this case, it's too late because the shell truncated myfile (with the > operator) before
cat ever gets a chance to examine the file!
In Section 5.4.4.2, "The V7 cat Revisited," page 150, we explain the struct stat
code.

4.5 Random Access: Moving Around within a File


So far, we have discussed sequential I/O, whereby data are read or written beginning
at the front of the file and continuing until the end. Often, this is all a program needs
to do. However, it is possible to do random access I/O; that is, read data from an arbitrary
position in the file, without having to read everything before that position first.
The offset of a file descriptor is the position within an open file at which the next
read or write will occur. A program sets the offset with the lseek() system call:
    #include <sys/types.h>      /* for off_t */                                  POSIX
    #include <unistd.h>         /* declares lseek() and whence values */

    off_t lseek(int fd, off_t offset, int whence);

The type off_t (offset type) is a signed integer type representing byte positions
(offsets from the beginning) within a file. On 32-bit systems, the type is usually a long.
However, many modern systems allow very large files, in which case off_t may be a
more unusual type, such as a C99 int64_t or some other extended type. lseek() takes
three arguments, as follows:

int fd
    The file descriptor for the open file.
off_t offset
    A position to which to move. The interpretation of this value depends on the
    whence parameter. offset can be positive or negative: Negative values move
    toward the front of the file; positive values move toward the end of the file.
int whence
    Describes the location in the file to which offset is relative. See Table 4.4.

TABLE 4.4
whence values for lseek()

Symbolic constant   Value   Meaning
SEEK_SET            0       offset is absolute, that is, relative to the beginning of the file.
SEEK_CUR            1       offset is relative to the current position in the file.
SEEK_END            2       offset is relative to the end of the file.



Much old code uses the numeric values shown in Table 4.4. However, any new code
you write should use the symbolic values, whose meanings are clearer.
The meaning of the values and their effects upon file position are shown in Figure 4.1.
Assuming that the file has 3000 bytes and that the current offset is 2000 before each
call to lseek(), the new position after each call is as shown:

File start: offset 0        Current: offset 2000        File end: offset 3000

Call                                        New position
lseek(fd, (off_t) 40, SEEK_END);            3040
lseek(fd, (off_t) -40, SEEK_END);           2960
lseek(fd, (off_t) 40, SEEK_CUR);            2040
lseek(fd, (off_t) -40, SEEK_CUR);           1960
lseek(fd, (off_t) 40, SEEK_SET);            40

FIGURE 4.1
Offsets for lseek()

Negative offsets relative to the beginning of the file are meaningless; they fail with
an "invalid argument" error.
The return value is the new position in the file. Thus, to find out where in the file
you are, use

    curpos = lseek(fd, (off_t) 0, SEEK_CUR);
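The arithmetic behind Figure 4.1 is easy to check for yourself. The sketch below is ours, not part of the original programs: the helper names make_3000_byte_file() and seek_from_2000() are hypothetical, and the scratch file is whatever path the caller supplies.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * make_3000_byte_file --- hypothetical helper: create (or truncate) path
 * and fill it with 3000 zero bytes. Returns an open descriptor, or -1.
 */
int make_3000_byte_file(const char *path)
{
    char zeros[3000] = { 0 };
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    if (write(fd, zeros, sizeof zeros) != (ssize_t) sizeof zeros) {
        (void) close(fd);
        return -1;
    }
    return fd;
}

/*
 * seek_from_2000 --- hypothetical helper: set the offset to 2000, apply one
 * lseek() call, and return the new offset (or -1 if either call fails).
 */
off_t seek_from_2000(int fd, off_t offset, int whence)
{
    if (lseek(fd, (off_t) 2000, SEEK_SET) < 0)
        return -1;
    return lseek(fd, offset, whence);
}
```

Calling 'seek_from_2000(fd, (off_t) -40, SEEK_END)' on such a file should report position 2960, and a negative offset with SEEK_SET should fail with -1.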

The l in lseek() stands for long. lseek() was introduced in V7 Unix when file
sizes were extended; V6 had a simple seek() system call. As a result, much old
documentation (and code) treats the offset parameter as if it had type long, and instead
of a cast to off_t, it's not unusual to see an L suffix on constant offset values:
    curpos = lseek(fd, 0L, SEEK_CUR);

On systems with a Standard C compiler, where lseek() is declared with a prototype,
such old code continues to work since the compiler automatically promotes the 0L
from long to off_t if they are different types.
One interesting and important aspect of lseek() is that it is possible to seek beyond
the end of a file. Any data that are subsequently written at that point go into the file,
but with a "gap" or "hole" between the data at the previous end of the file and the new
data. Data in the gap read as if they are all zeros.
The following program demonstrates the creation of holes. It writes three instances
of a struct at the beginning, middle, and far end of a file. The offsets chosen (lines
16-18, the third element of each structure) are arbitrary but big enough to demonstrate
the point:
 1  /* ch04-holes.c --- Demonstrate lseek() and holes in files. */
 2
 3  #include <stdio.h>      /* for fprintf(), stderr, BUFSIZ */
 4  #include <errno.h>      /* declare errno */
 5  #include <fcntl.h>      /* for flags for open() */
 6  #include <string.h>     /* declare strerror() */
 7  #include <unistd.h>     /* for ssize_t */
 8  #include <sys/types.h>  /* for off_t, etc. */
 9  #include <sys/stat.h>   /* for mode_t */
10
11  struct person {
12      char name[10];      /* first name */
13      char id[10];        /* ID number */
14      off_t pos;          /* position in file, for demonstration */
15  } people[] = {
16      { "arnold", "123456789", 0 },
17      { "miriam", "987654321", 10240 },
18      { "joe",    "192837465", 81920 },
19  };
20
21  int
22  main(int argc, char **argv)
23  {
24      int fd;
25      int i, j;
26
27      if (argc < 2) {
28          fprintf(stderr, "usage: %s file\n", argv[0]);
29          return 1;
30      }
31
32      fd = open(argv[1], O_RDWR|O_CREAT|O_TRUNC, 0666);
33      if (fd < 0) {
34          fprintf(stderr, "%s: %s: cannot open for read/write: %s\n",
35                  argv[0], argv[1], strerror(errno));
36          return 1;
37      }
38
39      j = sizeof(people) / sizeof(people[0]);     /* count of elements */

Lines 27-30 make sure that the program was invoked properly. Lines 32-37 open
the named file and verify that the open succeeded.

The calculation on line 39 of j, the array element count, uses a lovely, portable trick:
The number of elements is the size of the entire array divided by the size of the first
element. The beauty of this idiom is that it's always right: No matter how many elements
you add to or remove from such an array, the compiler will figure it out. It also doesn't
require a terminating sentinel element; that is, one in which all the fields are set to zero,
NULL, or some such.
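The idiom is often wrapped in a macro. Here is a small sketch; the macro name NELEMS and the weekdays array are our own illustrations, not something from ch04-holes.c. Keep in mind that the trick works only on a true array whose definition is visible, not on a pointer to its first element.

```c
#include <stddef.h>

/* NELEMS --- hypothetical macro name: element count of a true array */
#define NELEMS(a)  (sizeof(a) / sizeof((a)[0]))

static const char *const weekdays[] = { "Mon", "Tue", "Wed", "Thu", "Fri" };

/* count_weekdays --- the compiler computes the count; no sentinel is needed */
size_t count_weekdays(void)
{
    return NELEMS(weekdays);
}
```

Adding "Sat" and "Sun" to the initializer changes the result with no other edits to the code.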

The work is done by a loop (lines 41-55), which seeks to the byte offset given in
each structure (line 42) and then writes the structure out (line 49):
41      for (i = 0; i < j; i++) {
42          if (lseek(fd, people[i].pos, SEEK_SET) < 0) {
43              fprintf(stderr, "%s: %s: seek error: %s\n",
44                  argv[0], argv[1], strerror(errno));
45              (void) close(fd);
46              return 1;
47          }
48
49          if (write(fd, &people[i], sizeof(people[i])) != sizeof(people[i])) {
50              fprintf(stderr, "%s: %s: write error: %s\n",
51                  argv[0], argv[1], strerror(errno));
52              (void) close(fd);
53              return 1;
54          }
55      }
56
57      /* all ok here */
58      (void) close(fd);
59      return 0;
60  }

Here are the results when the program is run:


    $ ch04-holes peoplelist                 Run the program
    $ ls -ls peoplelist                     Show size and blocks used
      16 -rw-r--r--    1 arnold   devel       81944 Mar 23 17:43 peoplelist
    $ echo 81944 / 4096 | bc -l             Show blocks if no holes
    20.00585937500000000000

We happen to know that each disk block in the file uses 4096 bytes. (How we know
that is discussed in Section 5.4.2, "Retrieving File Information," page 141. For now,
take it as a given.) The final bc command indicates that a file of size 81,944 bytes needs
21 disk blocks. However, the -s option to ls, which tells us how many blocks a file
really uses, shows that the file uses only 16 blocks!³ The missing blocks in the file are
the holes. This is illustrated in Figure 4.2.

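[Figure: the file drawn as a row of 21 disk blocks. The arnold, miriam, and joe
structures sit in blocks 1, 3, and 21; the blocks between them are holes. The logical
file size spans the whole row.]

FIGURE 4.2
Holes in a file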
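To convince yourself that data in a hole really do read back as zeros, something like the following sketch can be used. The function name hole_reads_as_zeros() and the 8192-byte gap are arbitrary choices of ours, and the code assumes that a read() from a regular file returns all the bytes requested when they are available.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define GAP_SIZE 8192       /* arbitrary size for the hole */

/*
 * hole_reads_as_zeros --- hypothetical helper: create path, seek past
 * GAP_SIZE bytes, write one byte, then read the gap back.
 * Returns 1 if every byte in the gap is zero, 0 if not, -1 on error.
 */
int hole_reads_as_zeros(const char *path)
{
    char buf[GAP_SIZE];
    size_t i;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;

    if (lseek(fd, (off_t) GAP_SIZE, SEEK_SET) < 0   /* seek beyond the end */
        || write(fd, "x", 1) != 1                   /* data after the gap */
        || lseek(fd, (off_t) 0, SEEK_SET) < 0       /* back to the front */
        || read(fd, buf, sizeof buf) != (ssize_t) sizeof buf) {
        (void) close(fd);
        return -1;
    }
    (void) close(fd);

    for (i = 0; i < sizeof buf; i++)
        if (buf[i] != 0)
            return 0;
    return 1;
}
```

Whether the filesystem actually leaves the gap unallocated is a separate question; 'ls -ls' on the resulting file shows what happened on your system.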

NOTE  ch04-holes.c does direct binary I/O. This nicely illustrates the beauty
of random access I/O: You can treat a disk file as if it were a very large array of
binary data structures.

In practice, storing live data by using binary I/O is a design decision that you
should consider carefully. For example, suppose you need to move the data to
a system using different byte orders for integers? Or different floating-point
formats? Or to a system with different alignment requirements? Ignoring such
issues can become significantly costly.

4.6 Creating Files


As described earlier, open() apparently opens existing files only. This section describes
how brand-new files are created. There are two choices: creat() and open() with
additional flags. Initially, creat() was the only way to create a file, but open() was
later enhanced with this functionality as well. Both mechanisms require specification
of the initial file permissions.

4.6.1 Specifying Initial File Permissions


As a GNU/Linux user, you are familiar with file permissions as printed by 'ls -l':
read, write, and execute for each of user (the file's owner), group, and other. The various
combinations are often expressed in octal, particularly for the chmod and umask
commands. For example, file permissions -rw-r--r-- is equivalent to octal 0644 and
-rwxr-xr-x is equivalent to octal 0755. (The leading 0 is C's notation for octal values.)

3 At least three of these blocks contain the data that we wrote out; the others are for use by the operating system
in keeping track of where the data reside.

When you create a file, you must know the protections to be given to the new file.
You can do this as a raw octal number if you choose, and indeed it's not uncommon
to see such numbers in older code. However, it is better to use a bitwise OR of one or
more of the symbolic constants from <sys/stat.h>, described in Table 4.5.

TABLE 4.5
POSIX symbolic constants for file modes

Symbolic constant   Value   Meaning
S_IRWXU             00700   User read, write, and execute permission.
S_IRUSR             00400   User read permission.
S_IREAD                     Same as S_IRUSR.
S_IWUSR             00200   User write permission.
S_IWRITE                    Same as S_IWUSR.
S_IXUSR             00100   User execute permission.
S_IEXEC                     Same as S_IXUSR.
S_IRWXG             00070   Group read, write, and execute permission.
S_IRGRP             00040   Group read permission.
S_IWGRP             00020   Group write permission.
S_IXGRP             00010   Group execute permission.
S_IRWXO             00007   Other read, write, and execute permission.
S_IROTH             00004   Other read permission.
S_IWOTH             00002   Other write permission.
S_IXOTH             00001   Other execute permission.

The following fragment shows how to create variables representing permissions
-rw-r--r-- and -rwxr-xr-x (0644 and 0755 respectively):

    mode_t rw_mode, rwx_mode;

    rw_mode  = S_IRUSR|S_IWUSR|S_IRGRP|S_IROTH;                 /* 0644 */
    rwx_mode = S_IRWXU|S_IRGRP|S_IXGRP|S_IROTH|S_IXOTH;         /* 0755 */

Older code used S_IREAD, S_IWRITE, and S_IEXEC together with bit shifting to
produce the same results:

    rw_mode  = (S_IREAD|S_IWRITE) | (S_IREAD >> 3) | (S_IREAD >> 6);    /* 0644 */
    rwx_mode = (S_IREAD|S_IWRITE|S_IEXEC) |
               ((S_IREAD|S_IEXEC) >> 3) | ((S_IREAD|S_IEXEC) >> 6);     /* 0755 */

Unfortunately, neither notation is incredibly clear. The modern version is preferred
since each permission bit has its own name and there is less opportunity to do the bitwise
operations incorrectly.
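To make the correspondence between the constants of Table 4.5 and the letters printed by 'ls -l' concrete, here is a short sketch. The function name mode_to_string() is hypothetical, and the code deliberately ignores everything except the nine basic permission bits.

```c
#include <sys/types.h>
#include <sys/stat.h>

/*
 * mode_to_string --- hypothetical helper: fill buf with the nine permission
 * letters for mode, in the order 'ls -l' prints them, plus a terminating '\0'.
 * Only the read, write, and execute bits for user, group, and other are examined.
 */
void mode_to_string(mode_t mode, char buf[10])
{
    static const mode_t bits[9] = {
        S_IRUSR, S_IWUSR, S_IXUSR,      /* user */
        S_IRGRP, S_IWGRP, S_IXGRP,      /* group */
        S_IROTH, S_IWOTH, S_IXOTH       /* other */
    };
    static const char letters[] = "rwxrwxrwx";
    int i;

    for (i = 0; i < 9; i++)
        buf[i] = (mode & bits[i]) ? letters[i] : '-';
    buf[9] = '\0';
}
```

Given 'S_IRUSR|S_IWUSR|S_IRGRP|S_IROTH', the buffer should end up holding rw-r--r--.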
The additional permission bits shown in Table 4.6 are available for use when you
are changing a file's permission, but they should not be used when you initially create
a file. Whether these bits may be included varies wildly by operating system. It's best
not to try; rather, you should explicitly change the permissions after the file is created.
(Changing permission is described in Section 5.5.2, "Changing Permissions: chmod()
and fchmod()," page 156. The meanings of these bits are discussed in Chapter 11,
"Permissions and User and Group ID Numbers," page 403.)

TABLE 4.6
Additional POSIX symbolic constants for file modes

Symbolic constant   Value   Meaning
S_ISUID             04000   Set user ID.
S_ISGID             02000   Set group ID.
S_ISVTX             01000   Save text.

When standard utilities create files, the default permissions they use are -rw-rw-rw-
(or 0666). Because most users prefer to avoid having files that are world-writable, each
process carries with it a umask. The umask is a set of permission bits indicating those
bits that should never be allowed when new files are created. (The umask is not used
when changing permissions.) Conceptually, the operation that occurs is
    actual-permissions = (requested-permissions & (~umask));

The umask is usually set by the umask command in $HOME/.profile when you
log in. From a C program, it's set with the umask() system call:

    #include <sys/types.h>                                    POSIX
    #include <sys/stat.h>

    mode_t umask(mode_t mask);

The return value is the old umask. Thus, to determine the current mask, you must
set it to a value and then reset it (or change it, as desired):
    mode_t mask = umask(0);     /* retrieve current mask */
    (void) umask(mask);         /* restore it */

Here is an example of the umask in action, at the shell level:

    $ umask                                 Show the current mask
    0022
    $ touch newfile                         Create a file
    $ ls -l newfile                         Show permissions of new file
    -rw-r--r--    1 arnold   devel           0 Mar 24 15:43 newfile
    $ umask 0                               Set mask to empty
    $ touch newfile2                        Create a second file
    $ ls -l newfile2                        Show permissions of new file
    -rw-rw-rw-    1 arnold   devel           0 Mar 24 15:44 newfile2
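The same experiment can be run from C. The sketch below is only an illustration: created_mode() is a name we made up, and it peeks at the resulting permissions with fstat(), which isn't covered until the next chapter. It uses open() with O_CREAT and O_EXCL so that the file is certain to be newly created.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * created_mode --- hypothetical helper: create path asking for 'requested'
 * permissions while the umask is 'mask'. Returns the permission bits the
 * new file actually received, or (mode_t) -1 on error.
 * The caller's umask is restored before returning.
 */
mode_t created_mode(const char *path, mode_t requested, mode_t mask)
{
    struct stat sb;
    mode_t old_mask, result = (mode_t) -1;
    int fd;

    (void) unlink(path);            /* make sure open() really creates it */

    old_mask = umask(mask);         /* install the mask under test */
    fd = open(path, O_WRONLY | O_CREAT | O_EXCL, requested);
    (void) umask(old_mask);         /* put the original mask back */

    if (fd >= 0) {
        if (fstat(fd, &sb) == 0)
            result = sb.st_mode & 0777;     /* keep only permission bits */
        (void) close(fd);
    }
    return result;
}
```

With a request of 0666 and a mask of 022, the result should be 0644, matching the conceptual formula given earlier.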

4.6.2 Creating Files with creat()


The creat()⁴ system call creates new files. It is declared as follows:
    #include <sys/types.h>                                    POSIX
    #include <sys/stat.h>
    #include <fcntl.h>

    int creat(const char *pathname, mode_t mode);

The mode argument represents the permissions for the new file (as discussed in the
previous section). The file named by pathname is created, with the given permission
as modified by the umask. It is opened for writing (only), and the return value is the
file descriptor for the new file or -1 if there was a problem. In this case, errno indicates
the error. If the file already exists, it will be truncated when opened.
In all other respects, file descriptors returned by creat() are the same as those
returned by open(); they're used for writing and seeking and must be closed with
close():

    int fd, count;

    /* Error checking omitted for brevity */

    fd = creat("/some/new/file", 0666);
    count = write(fd, "some data\n", 10);
    (void) close(fd);

4 Yes, that's how it's spelled. Ken Thompson, one of the two "fathers" of Unix, was once asked what he would
have done differently if he had it to do over again. He replied that he would have spelled creat() with an "e."
Indeed, that is exactly what he did for the Plan 9 From Bell Labs operating system.

4.6.3 Revisiting open()


You may recall the declaration for open():
    int open(const char *pathname, int flags, mode_t mode);

Earlier, we said that when opening a file for plain I/O, we could ignore the mode
argument. Having seen creat(), though, you can probably guess that open() can
also be used for creating files and that the mode argument is used in this case. This is
indeed true.
Besides the O_RDONLY, O_WRONLY, and O_RDWR flags, additional flags may be bitwise
OR'd when open() is called. The POSIX standard mandates a number of these
additional flags. Table 4.7 presents the flags that are used for most mundane applications.

TABLE 4.7
Additional POSIX flags for open()

Flag        Meaning
O_APPEND    Force all writes to occur at the end of the file.
O_CREAT     Create the file if it doesn't exist.
O_EXCL      When used with O_CREAT, cause open() to fail if the file already exists.
O_TRUNC     Truncate the file (set it to zero length) if it exists.

Given O_APPEND and O_TRUNC, you can imagine how the shell might open or create
files corresponding to the > and >> operators. For example:
    int fd;
    extern char *filename;
    mode_t mode = S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH;  /* 0666 */

    fd = open(filename, O_CREAT|O_WRONLY|O_TRUNC, mode);            /* for > */

    fd = open(filename, O_CREAT|O_WRONLY|O_APPEND, mode);           /* for >> */



Note that the O_EXCL flag would not be used here, since for both > and >>, it's not
an error for the file to exist. Remember also that the system applies the umask to the
requested permissions.
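By contrast, O_EXCL is exactly what you want when the file must not already exist. The following sketch shows one way to use it; the function name try_create() and its three-way return convention are our own, chosen only for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * try_create --- hypothetical helper: create path only if it does not exist.
 * Returns 1 if the file was created, 0 if it was already there,
 * and -1 for any other error (errno says which).
 */
int try_create(const char *path, mode_t mode)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, mode);

    if (fd < 0)
        return (errno == EEXIST) ? 0 : -1;  /* EEXIST: someone got there first */

    (void) close(fd);
    return 1;
}
```

Because the existence test and the creation happen inside a single open() call, two programs racing to create the same file cannot both get a return of 1.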
Also, it's easy to see that, at least conceptually, creat() could be written this easily:
    int creat(const char *path, mode_t mode)
    {
        return open(path, O_CREAT|O_WRONLY|O_TRUNC, mode);
    }

NOTE  If a file is opened with O_APPEND, all data will be written at the end of
the file, even if the current position has been reset with lseek().
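This behavior is easy to observe. In the sketch below (append_after_seek() is a hypothetical name of ours), the second write() follows an lseek() back to the start of the file, yet the file still ends up containing both pieces of data.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * append_after_seek --- hypothetical helper: write 6 bytes, seek back to
 * offset 0, and write 7 more, all on a descriptor opened with O_APPEND.
 * Returns the final size of the file, or -1 on error.
 */
off_t append_after_seek(const char *path)
{
    off_t size;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_APPEND, 0600);

    if (fd < 0)
        return -1;

    if (write(fd, "first\n", 6) != 6
        || lseek(fd, (off_t) 0, SEEK_SET) < 0   /* back to the front ... */
        || write(fd, "second\n", 7) != 7) {     /* ... but this still appends */
        (void) close(fd);
        return -1;
    }

    size = lseek(fd, (off_t) 0, SEEK_END);      /* where is the end now? */
    (void) close(fd);
    return size;
}
```

Had the second write() overwritten the first, the final size would be 7 bytes; with O_APPEND it should be 13.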

Modern systems provide additional flags whose uses are more specialized. Table 4.8
describes them briefly.

TABLE 4.8
Additional advanced POSIX flags for open()

Flag          Meaning
O_NOCTTY      If the device being opened is a terminal, it does not become the process's
              controlling terminal. (This is a more advanced topic, discussed briefly in
              Section 9.2.1, page 312.)
O_NONBLOCK    Disables blocking of I/O operations in certain cases (see Section 9.4.3.4,
              page 333).
O_DSYNC       Ensure that data written to a file make it all the way to physical storage before
              write() returns.
O_RSYNC       Ensure that any data that read() would read, which may have been written
              to the file being read, have made it all the way to physical storage before
              read() returns.
O_SYNC        Like O_DSYNC, but also ensure that all file metadata, such as access times, have
              also been written to physical storage.

The O_DSYNC, O_RSYNC, and O_SYNC flags need some explanation. Unix systems
(including Linux) maintain an internal cache of disk blocks, called the buffer cache.
When the write() system call returns, the data passed to the operating system have
been copied to a buffer in the buffer cache. They are not necessarily written out to
the disk.
The buffer cache provides considerable performance improvement: Since disk I/O
is often an order of magnitude or more slower than CPU and memory operations,
programs would slow down considerably if they had to wait for every write to go all
the way through to the disk. In addition, if data have recently been written to a file, a
subsequent read of that same data will find the information already in the buffer cache,
where it can be returned immediately instead of having to wait for an I/O operation
to read it from the disk.
Unix systems also do read-ahead; since most reads are sequential, upon reading one
block, the operating system will read several more consecutive disk blocks so that their
information will already be in the buffer cache when a program asks for it. If multiple
programs are reading the same file, they all benefit since they will all get their data from
the same copy of the file's disk blocks in the buffer cache.
All of this caching is wonderful, but of course there's no free lunch. While data are
in the buffer cache and before they have been written to disk, there's a small (but very
real) window in which disaster can strike; for example, if the power goes out. Modern
disk drives exacerbate this problem: Many have their own internal buffers, so while
data may have made it to the drive, they may not have made it onto the media when the
power goes out! This can be a significant issue for small systems that aren't in a data
center with controlled power or that don't have an uninterruptible power supply (UPS).⁵
For most applications, the chance that data in the buffer cache might be inadvertently
lost is acceptably small. However, for some applications, any such chance is not
acceptable. Thus, the notion of synchronous I/O was added to Unix systems, whereby a program
can be guaranteed that if a system call has returned, the data are safely written on a
physical storage device.
The O_DSYNC flag guarantees data integrity; the data and any other information that
the operating system needs to find the data are written to disk before write() returns.
However, metadata, such as access and modification times, may not be written to disk.
The O_SYNC flag requires that metadata also be written to disk before write() returns.
(Here too there is no free lunch; synchronous writes can seriously affect the performance
of a program, slowing it down noticeably.)

5 If you don't have a UPS and you use your system for critical work, we highly recommend investing in one. You
should also be doing regular backups.

The O_RSYNC flag is for data reads: If read() finds data in the buffer cache that were
scheduled for writing to disk, then read() won't return that data until they have been
written to disk. The other two flags can affect this: In particular, O_SYNC will cause
read() to wait until the file metadata have been written out as well.

NOTE  As of kernel version 2.4, Linux treats all three flags the same, with
essentially the meaning of O_SYNC. Furthermore, Linux defines additional flags
that are Linux specific and intended for specialized uses. Check the GNU/Linux
open(2) manpage for more information.

4.7 Forcing Data to Disk


Earlier, we described the O_DSYNC, O_RSYNC, and O_SYNC flags for open(). We
noted that using these flags could slow a program down since each write() does not
return until all data have been written to physical media.
For a slightly higher risk level, we can have our cake and eat it too. We do this by
opening a file without one of the O_xSYNC flags and then using one of the following
two system calls at whatever point it's necessary to have the data safely moved to
physical storage:
    #include <unistd.h>

    int fsync(int fd);                                        POSIX FSC
    int fdatasync(int fd);                                    POSIX SIO

The fdatasync() system call is like O_DSYNC: It forces all file data to be written to
the final physical device. The fsync() system call is like O_SYNC, forcing not just file
data, but also file metadata, to physical storage. The fsync() call is more portable; it
has been around in the Unix world for longer and is more likely to exist across a broad
range of systems.
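A typical pattern is to write a batch of data at full speed and then force it out once, at the point where it matters. The sketch below illustrates the idea; write_and_sync() is a hypothetical name, and for brevity the code treats a short write as a failure instead of retrying.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * write_and_sync --- hypothetical helper: write len bytes from buf to a
 * freshly created path, then force data and metadata to physical storage.
 * Returns 0 on success, -1 on any failure.
 */
int write_and_sync(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;

    if (write(fd, buf, len) != (ssize_t) len    /* data reach the buffer cache */
        || fsync(fd) < 0) {                     /* now push them to the device */
        (void) close(fd);
        return -1;
    }
    return close(fd);
}
```

Replacing fsync() with fdatasync() would skip the metadata, on systems that provide that call.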
You can use these calls with <stdio.h> file pointers by first calling fflush() and
then using fileno() to obtain the underlying file descriptor. Here is an fpsync()
function that can be used to wrap both operations in one call. It returns 0 on success:

    /* fpsync --- sync a stdio FILE * variable */

    int fpsync(FILE *fp)
    {
        if (fp == NULL || fflush(fp) == EOF || fsync(fileno(fp)) < 0)
            return -1;

        return 0;
    }

Technically, both of these calls are extensions to the base POSIX standard: fsync()
in the "File Synchronization" extension (FSC), and fdatasync() in the "Synchronized
Input and Output" extension (SIO). Nevertheless, you can use them on a GNU/Linux system
without any problem.

4.8 Setting File Length


Two system calls make it possible to adjust the size of a file:
    #include <unistd.h>
    #include <sys/types.h>

    int truncate(const char *path, off_t length);             XSI
    int ftruncate(int fd, off_t length);                      POSIX

As should be obvious from the parameters, truncate() takes a filename argument,
whereas ftruncate() works on an open file descriptor. (The xxx() and fxxx()
naming convention for system call pairs that work on a filename or file descriptor is
common. We see several examples in this and subsequent chapters.) For both, the
length argument is the new size of the file.
These system calls originated in 4.2 BSD Unix, and in early systems could only be used
to shorten a file's length, hence the name. (They were created to simplify implementation
of the truncate operation in Fortran.) On modern systems, including Linux, the name
is a misnomer, since it's possible to extend the length of a file with these calls, not
just shorten a file. (However, POSIX indicates that the ability to extend a file is an
XSI extension.)
For these calls, the file being truncated must have write permission (for truncate()),
or have been opened for writing (for ftruncate()). If the file is being shortened, any
data past the new end of the file are lost. (Thus, you can't shorten the file, lengthen it
again, and expect to find the original data.) If the file is extended, as with data written
after an lseek(), the data between the old end of the file and the new end of file read
as zeros.
These calls are very different from 'open(file, ... |O_TRUNC, mode)'. The latter
truncates a file completely, throwing away all its data. These calls simply set the file's
absolute length to the given value.
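Both effects, losing data when shortening and reading zeros after extending, can be seen with a sketch like the following. The function name shrink_then_grow() and the particular sizes are arbitrary choices of ours.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * shrink_then_grow --- hypothetical helper: write "0123456789" to path,
 * shorten the file to 4 bytes, extend it to 8, and read it back into out.
 * Returns the number of bytes read, or -1 on error.
 */
ssize_t shrink_then_grow(const char *path, char out[8])
{
    ssize_t n;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;

    if (write(fd, "0123456789", 10) != 10
        || ftruncate(fd, (off_t) 4) < 0     /* keeps "0123", the rest is lost */
        || ftruncate(fd, (off_t) 8) < 0     /* extends; new bytes read as zeros */
        || lseek(fd, (off_t) 0, SEEK_SET) < 0) {
        (void) close(fd);
        return -1;
    }

    n = read(fd, out, 8);
    (void) close(fd);
    return n;
}
```

After the call, the first four bytes of out should be '0' through '3' and the last four should be zero bytes, not '4' through '7'.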
These functions are fairly specialized; they're used only four times in all of the
GNU Coreutils code. We present an example use of ftruncate() in Section 5.5.3,
"Changing Timestamps: utime()," page 157.

4.9 Summary
• When a system call fails, it usually returns -1, and the global variable errno is set
  to a predefined value indicating the problem. The functions perror() and
  strerror() can be used for reporting errors.
• Files are manipulated by small integers called file descriptors. File descriptors for
  standard input, standard output, and standard error are inherited from a program's
  parent process. Others are obtained with open() or creat(). They are closed
  with close(), and getdtablesize() returns the maximum number of allowed
  open files. The value of the umask (set with umask()) affects the permissions
  given to new files created with creat() or the O_CREAT flag for open().
• The read() and write() system calls read and write data, respectively. Their
  interface is simple. In particular, they do no interpretation of the data; files are
  linear streams of bytes. The lseek() system call provides random access I/O: the
  ability to move around within a file.
• Additional flags for open() provide for synchronous I/O, whereby data make it
  all the way to the physical storage media before write() or read() return. Data
  can also be forced to disk on a controlled basis with fsync() or fdatasync().
• The truncate() and ftruncate() system calls set the absolute length of a file.
  (On older systems, they can only be used to shorten a file; on modern systems
  they can also extend a file.)

Exercises

1. Using just ope n ( ), read ( ) , wri te ( ) , and c l ose () , write a simple cop y
program that copies the file named by its first argument to the file named by
its second.
116 Chapter 4 • Files and File I/O

2. Enhance the copy program to accept" -" to mean "standard input" if used
as the first argument and "standard output" as the second. Does 'copy - -'
work correctly?
3. Look at the proc(5) manpage on a GNU/Linux system, in particular the fd
subsection. Do an 'ls -l /dev/fd' and examine the files in
/proc/self/fd directly. If /dev/stdin and friends had been around in the
early versions of Unix, how would that have simplified the code for the V7 cat
program? (Many other modern Unix systems have a /dev/fd directory or
filesystem. If you're not using GNU/Linux, see what you can discover about
your Unix version.)
4. Even though you don't understand it yet, try to copy the code segment from
the V7 cat.c that uses the struct stat and the fstat() function into
ch04-cat.c so that it too reports an error for 'cat file >> file'.
5. (Easy.) Assuming the existence of strerror(), write your own version of
perror().
6. What is the result of 'ulimit -n' on your system?
7. Write a simple version of the umask program, named myumask, that takes an
octal mask on the command line. Use strtol() with a base of 8 to convert
the character string command-line argument into an integer value. Change the
umask to the new mask with the umask() system call.
Compile and run myumask, and then examine the value of the umask
with the regular umask command. Explain the results. (Hint: in Bash, enter
'type umask'.)

8. Change the simple copy program you wrote earlier to use open() with the
O_SYNC flag. Using the time command, compare the performance of the
original version and the new version on a large file.
9. For ftruncate(), we said that the file must have been opened for writing.
How can a file be open for writing when the file itself doesn't have write
permission?
10. Write a truncate program whose usage is 'truncate file length'.

In this chapter

• 5.1 Considering Directory Contents
• 5.2 Creating and Removing Directories
• 5.3 Reading Directories
• 5.4 Obtaining Information about Files
• 5.5 Changing Ownership, Permission, and Modification Times
• 5.6 Summary
• Exercises

This chapter continues the climb up the learning curve toward the next plateau:
understanding directories and information about files.
In this chapter we explore how file information is stored in a directory, how
directories themselves are read, created, and removed, what information about files is
available, and how to retrieve it. Finally, we explore other ways to update file
metadata, such as the owner, group, permissions, and access and modification times.

5.1 Considering Directory Contents


All Unix systems, including Linux, use the same conceptual design for storing file
information on disk. Although there is considerable variation in the implementation
of the design, the interface at the C level remains consistent, making it possible to write
portable programs that compile and run on many different systems.

5.1.1 Definitions
[Cartoon: "User Friendly" by Illiad. Copyright 1997-2004 © J.D. "Illiad" Frazer.
Used with permission. http://www.userfriendly.org]

We start the discussion by defining some terms.

Partition
A unit of physical storage. Physical partitions are typically either part of a disk or
an entire disk. Modern systems make it possible to create logical partitions from
multiple physical ones.

Filesystem
A partition (physical or logical) that contains file data and metadata, information
about files (as opposed to the file contents, which is information in the files). Such
metadata include file ownership, permissions, size, and so on, as well as information
for use by the operating system in locating file contents.
You place filesystems "in" partitions (a one-to-one correspondence) by writing
standard information in them. This is done with a user-level program, such as
mke2fs on GNU/Linux, or newfs on Unix. (The Unix mkfs command makes
partitions but is difficult to use directly. newfs calls it with the correct parameters.
If your system is a Unix system, see the newfs(8) and mkfs(8) manpages for
the details.)
For the most part, GNU/Linux and Unix hide the existence of filesystems and
partitions. (Further details are given in Section 8.1, "Mounting and Unmounting
Filesystems," page 228.) Everything is accessed by pathnames, without reference
to which disk a file lives on. (Contrast this with almost every other commercial
operating system, such as OpenVMS, or the default behavior of any
Microsoft system.)

Inode
Short for "index node," initially abbreviated "i-node" and now written "inode."
A small block of information describing everything about a file except the file's
name(s). The number of inodes, and thus the number of unique files per filesystem,
is set and made permanent when the filesystem is created. 'df -i' can tell you
how many inodes you have and how many are used.

Device
In the context of files, filesystems, and file metadata, a unique number representing
an in-use ("mounted") filesystem. The (device, inode) pair uniquely identifies the
file: Two different files are guaranteed to have different (device, inode) pairs. This
is discussed in more detail later in this chapter.

Directory
A special file, containing a list of (inode number, name) pairs. Directories can be
opened for reading but not for writing; the operating system makes all the changes
to a directory's contents.
120 Chapter 5 • Directories and File Metadata

Conceptually, each disk block contains either some number of inodes, or file data.
The inode, in turn, contains pointers to the blocks that contain the file's data. See
Figure 5.1.

[Figure: disk blocks laid out linearly in a partition, with the inode blocks first, followed by the data blocks]

FIGURE 5.1
Conceptual view of inode and data blocks

The figure shows all the inode blocks at the front of the partition and the data blocks
after them. Early Unix filesystems were indeed organized this way. However, while all
modern systems still have inodes and data blocks, the organization has changed for
improved efficiency and robustness. The details vary from system to system, and even
within GNU/Linux systems there are multiple kinds of filesystems, but the concepts
are the same.

5.1.2 Directory Contents


Directories make the connection between a filename and an inode. Directory entries
contain an inode number and a filename. They also contain additional bookkeeping
information that is not of interest to us here. See Figure 5.2.

    Inode   Name         Meaning
       23   .            Dot
       19   ..           Dot-dot
      ...   ...          Filename
        0   tempdata     Empty slot
       37   .profile     Filename

FIGURE 5.2
Conceptual directory contents

Early Unix systems had two-byte inode numbers and up to 14-byte filenames. Here
is the entire content of the V7 /usr/include/sys/dir.h:

    #ifndef DIRSIZ
    #define DIRSIZ 14
    #endif
    struct direct
    {
        ino_t d_ino;
        char d_name[DIRSIZ];
    };

An ino_t is defined in the V7 <sys/types.h> as 'typedef unsigned int
ino_t;'. Since a PDP-11 int is 16 bits, so too is the ino_t. This organization made
it easy to read directories directly; since the size of an entry was fixed, the code was
simple. (The only thing to watch out for was that a full 14-character d_name was not
NUL-terminated.)
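
To make the NUL-termination caveat concrete, here is a minimal sketch, not taken from V7 itself. The struct name v7_direct and the helper v7_name() are hypothetical; the structure is redeclared with an explicit 16-bit member so that it has the V7 layout on a modern system.

```c
#include <string.h>

#define DIRSIZ 14

/* V7-style directory entry, redeclared here for illustration only */
struct v7_direct {
    unsigned short d_ino;       /* 16 bits, as an int was on the PDP-11 */
    char d_name[DIRSIZ];        /* NOT NUL-terminated when all 14 bytes are used */
};

/* v7_name --- hypothetical helper: copy an entry's name into buf, which
   must hold at least DIRSIZ + 1 bytes, and add the terminating NUL that
   a full-length name lacks. */
void v7_name(const struct v7_direct *d, char *buf)
{
    memcpy(buf, d->d_name, DIRSIZ);
    buf[DIRSIZ] = '\0';
}
```

Shorter names are padded with zero bytes in the entry, so the copy is a valid C string either way.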
Directory content management was also easy for the system. When a file was removed
from a directory, the system replaced the inode number with a binary zero, signifying
that the "slot" in the directory was unused. New files could then reuse the empty slot.
This helped keep the size of directory files themselves reasonable. (By convention, inode
number 1 is unused; inode number 2 is always the first usable inode. More details are
provided in Section 8.1, "Mounting and Unmounting Filesystems," page 228.)
Modern systems provide long filenames. Each directory entry is of variable length,
with a common limit of 255 bytes for the filename component of the directory. Later
on, we show how to read a directory's contents on a modern system. Modern systems
also provide 32-bit (or even 64-bit!) inode numbers.
5.1.3 Hard Links


When a file is created with open() or creat(), the system finds an unused inode
and assigns it to the new file. It creates the directory entry for the file, with the file's
name and inode number in it. The -i option to ls shows the inode number:
    $ echo hello, world > message           Create new file
    $ ls -il message                        Show inode number too
     228786 -rw-r--r--    1 arnold   devel          13 May  4 15:43 message

Since directory entries associate filenames with inodes, it is possible for one file to
have multiple names. Each directory entry referring to the same inode is called a link,
or hard link, to the file. Links are created with the ln command. The usage is 'ln
oldfile newfile':
    $ ln message msg                        Create a link
    $ cat msg                               Show contents of new name
    hello, world
    $ ls -il msg message                    Show inode numbers
     228786 -rw-r--r--    2 arnold   devel          13 May  4 15:43 message
     228786 -rw-r--r--    2 arnold   devel          13 May  4 15:43 msg

The output shows that the inode numbers of the two files are the same, and the third
field in the long output is now 2. This field is the link count, which reports how many
links (directory entries referring to the inode) the file has.
It cannot be emphasized enough: Hard links all refer to the same file. If you change
one, you have changed the others:
    $ echo "Hi, how ya doin' ?" > msg       Change file by new name
    $ cat message                           Show contents by old name
    Hi, how ya doin' ?
    $ ls -il message msg                    Show info. Size changed
     228786 -rw-r--r--    2 arnold   devel          19 May  4 15:51 message
     228786 -rw-r--r--    2 arnold   devel          19 May  4 15:51 msg

Although we've created two links to the same file in a single directory, hard links are
not restricted to being in the same directory; they can be in any other directory on the
same filesystem. (This is discussed a bit more in Section 5.1.6, "Symbolic Links,"
page 128.)
Additionally, you can create a link to a file you don't own as long as you have write
permission in the directory in which you're creating the link. (Such a file retains all the
attributes of the original file: the owner, permissions, and so on. This is because it is
the original file; it has only acquired an additional name.) User-level code cannot create
a hard link to a directory.
Once a link is removed, creating a new file by the same name as the original file
creates a new file:
    $ rm message                            Remove old name
    $ echo "What's happenin?" > message     Reuse the name
    $ ls -il msg message                    Show information
     228794 -rw-r--r--    1 arnold   devel          17 May  4 15:58 message
     228786 -rw-r--r--    1 arnold   devel          19 May  4 15:51 msg

Notice that the link counts for both files are now equal to 1.

At the C level, links are created with the link() system call:

    #include <unistd.h>                                         POSIX

    int link(const char *oldpath, const char *newpath);

The return value is 0 if the link was created successfully, or -1 otherwise, in which
case errno reflects the error. An important failure case is one in which newpath already
exists. The system won't remove it for you, since attempting to do so can cause
inconsistencies in the filesystem.

5.1.3.1 The GNU link Program


The ln program is complicated and large. However, the GNU Coreutils contains a
simple link program that just calls link() on its first two arguments. The following
example shows the code from link.c, with some irrelevant parts deleted. Line numbers
relate to the actual file.
     20  /* Implementation overview:
     21
     22     Simply call the system 'link' function */
     23
     ... #include statements omitted for brevity ...
     34
     35  /* The official name of this program (e.g., no 'g' prefix).  */
     36  #define PROGRAM_NAME "link"
     37
     38  #define AUTHORS "Michael Stone"
     39
     40  /* Name this program was run with.  */
     41  char *program_name;
     42
     43  void
     44  usage (int status)
     45  {
     ... omitted for brevity ...
     62  }
     63
     64  int
     65  main (int argc, char **argv)
     66  {
     67    program_name = argv[0];
     68    setlocale (LC_ALL, "");
     69    bindtextdomain (PACKAGE, LOCALEDIR);
     70    textdomain (PACKAGE);
     71
     72    atexit (close_stdout);
     73
     74    parse_long_options (argc, argv, PROGRAM_NAME, GNU_PACKAGE, VERSION,
     75                        AUTHORS, usage);
     76
     77    /* The above handles --help and --version.
     78       Since there is no other invocation of getopt, handle '--' here.  */
     79    if (1 < argc && STREQ (argv[1], "--"))
     80      {
     81        --argc;
     82        ++argv;
     83      }
     84
     85    if (argc < 3)
     86      {
     87        error (0, 0, _("too few arguments"));
     88        usage (EXIT_FAILURE);
     89      }
     90
     91    if (3 < argc)
     92      {
     93        error (0, 0, _("too many arguments"));
     94        usage (EXIT_FAILURE);
     95      }
     96
     97    if (link (argv[1], argv[2]) != 0)
     98      error (EXIT_FAILURE, errno, _("cannot create link %s to %s"),
     99             quote_n (0, argv[2]), quote_n (1, argv[1]));
    100
    101    exit (EXIT_SUCCESS);
    102  }

Lines 67-75 are typical Coreutils boilerplate, setting up internationalization, the
final action upon exit, and parsing the arguments. Lines 79-95 make sure that link is
called with only two arguments. The link() system call itself occurs on line 97. (The
quote_n() function provides quoting of the arguments in a style suitable for the current
locale; the details aren't important here.)
5.1.3.2 Dot and Dot-Dot


Rounding off the discussion of links, let's look at how the '.' and '..' special names
are managed. They are really just hard links. In the first case, '.' is a hard link to the
directory containing it, and '..' is a hard link to the parent directory. The operating
system creates these links for you; as mentioned earlier, user-level code cannot create a
hard link to a directory. This example illustrates the links:
    $ pwd                                   Show current directory
    /tmp
    $ ls -ldi /tmp                          Show its inode number
     225345 drwxrwxrwt   14 root     root         4096 May  4 16:15 /tmp
    $ mkdir x                               Create a new directory
    $ ls -ldi x                             And show its inode number
      52794 drwxr-xr-x    2 arnold   devel        4096 May  4 16:27 x
    $ ls -ldi x/. x/..                      Show . and .. inode numbers
      52794 drwxr-xr-x    2 arnold   devel        4096 May  4 16:27 x/.
     225345 drwxrwxrwt   15 root     root         4096 May  4 16:27 x/..

The root's parent directory (/..) is a special case; we defer discussion of it until
Chapter 8, "Filesystems and Directory Walks," page 227.

5.1.4 File Renaming


Given the way in which directory entries map names to inode numbers, renaming
a file is conceptually quite easy:

1. If the new name for the file names an existing file, remove the existing file first.
2. Create a new link to the file by the new name.
3. Remove the old name (link) for the file. (Removing names is discussed in the
next section.)

Early versions of the mv command did work this way. However, when done this way,
file renaming is not atomic; that is, it doesn't happen in one uninterruptible operation.
And, on a heavily loaded system, a malicious user could take advantage of race
conditions,1 subverting the rename operation and substituting a different file for the
original one.

1 A race condition is a situation in which details of timing can produce unintended side effects or bugs. In this case,
the directory, for a short period of time, is in an inconsistent state, and it is this inconsistency that introduces
the vulnerability.
For this reason, 4.2 BSD introduced the rename() system call:

    #include <stdio.h>                                          ISO C

    int rename(const char *oldpath, const char *newpath);

On Linux systems, the renaming operation is atomic; the manpage states:

    If newpath already exists it will be atomically replaced ..., so that there is
    no point at which another process attempting to access newpath will find
    it missing.
    If newpath exists but the operation fails for some reason, rename guarantees
    to leave an instance of newpath in place.
    However, when overwriting there will probably be a window in which both
    oldpath and newpath refer to the file being renamed.

As with other system calls, a 0 return indicates success, and a return value of -1 indi-
cates an error.
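
One common use of this atomic replacement is updating a file so that other processes see either the old contents or the new contents, never a half-written file. The following is a minimal sketch of that idea, not code from any particular program. The helper name replace_file() and the ".tmp" suffix are assumptions made for illustration; the temporary file must live on the same filesystem as the target for rename() to work.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* replace_file --- hypothetical helper: write len bytes of data to a
   temporary file next to path, then rename() it over path.
   Returns 0 on success, -1 on error. */
int replace_file(const char *path, const char *data, size_t len)
{
    char tmp[4096];
    int fd;
    ssize_t n;

    /* Build the temporary name; give up if it would not fit */
    if (snprintf(tmp, sizeof tmp, "%s.tmp", path) >= (int) sizeof tmp)
        return -1;

    fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    n = write(fd, data, len);
    if (n < 0 || (size_t) n != len) {
        close(fd);
        unlink(tmp);            /* don't leave a partial file behind */
        return -1;
    }

    /* Only a completely written file is moved into place */
    if (close(fd) < 0 || rename(tmp, path) < 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

A production version would likely use a unique temporary name and might call fsync() before the rename; both are omitted here to keep the sketch short.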

5.1.5 File Removal


Removing a file means removing the file's entry in the directory and decrementing
the file's link count (maintained in the inode). The contents of the file, and the disk
blocks holding them, are not freed until the link count reaches zero.
The system call is named unlink():
    #include <unistd.h>                                         POSIX

    int unlink(const char *pathname);

Given our discussion of file links, the name makes sense; this call removes the given
link (directory entry) for the file. It returns 0 on success and -1 on error.
The ability to remove a file requires write permission only for the directory and not for
the file itself. This fact can be confusing, particularly for new Linux/Unix users. However,
since the operation is one on the directory, this makes sense; it is the directory contents
that are being modified, not the file's contents.2

2 Indeed, the file's metadata are changed (the number of links), but that does not affect any other file attribute,
nor does it affect the file's contents. Updating the link count is the only operation on a file that doesn't involve
checking the file's permissions.
5.1.5.1 Removing Open Files


Since the earliest days of Unix, it has been possible to remove open files. Simply call
unlink() with the filename after a successful call to open() or creat().
At first glance, this seems to be a strange thing to do. Since the system frees the data
blocks when a file's link count goes to zero, is it even possible to use the open file?
The answer is yes, you can continue to use the open file normally. The system knows
that the file is open, and therefore it delays the release of the file's storage until the last
file descriptor on the file is closed. Once the file is completely unused, the storage is freed.
This operation also happens to be a useful one: It is an easy way for a program to
get temporary file storage that is guaranteed to be both private and automatically released
when no longer needed:
    /* Obtaining private temporary storage, error checking omitted for brevity */
    int fd;
    mode_t mode = O_CREAT|O_EXCL|O_TRUNC|O_RDWR;

    fd = open("/tmp/myfile", mode, 0000);       Open the file

    unlink("/tmp/myfile");                      Remove it

    ... continue to use file ...

    close(fd);                                  Close file, free storage
The downside to this approach is that it's also possible for a runaway application to
fill up a filesystem with an open but anonymous file, in which case the system
administrator has to try to find and kill the process. In olden days, a reboot and filesystem
consistency check might have been required; thankfully, this is exceedingly rare on
modern systems.
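
For comparison, here is a sketch of the same open-then-unlink idiom with the error checking put back. The helper name open_private() is hypothetical, and the permission value 0600 is our choice for the sketch rather than a requirement.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* open_private --- hypothetical helper: create path, open it read/write,
   and immediately unlink it. Returns an open descriptor, or -1 on error. */
int open_private(const char *path)
{
    int fd = open(path, O_CREAT | O_EXCL | O_TRUNC | O_RDWR, 0600);

    if (fd < 0) {
        perror(path);
        return -1;
    }

    /* After a successful unlink() the name is gone, but the storage
       stays allocated until the descriptor is closed. */
    if (unlink(path) < 0) {
        perror(path);
        close(fd);
        return -1;
    }
    return fd;
}
```

The caller reads, writes, and seeks on the returned descriptor as usual; closing it (or exiting) releases the storage.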

5.1.5.2 Using ISO C: remove()


ISO C provides the remove() function for removing files; this is intended to be
a general function, usable on any system that supports ISO C, not just Unix and
GNU/Linux:
    #include <stdio.h>                                          ISO C

    int remove(const char *pathname);

While not technically a system call, the return value is in the same vein: 0 on success
and -1 on error, with errno reflecting the value.
On GNU/Linux, remove() uses the unlink() system call to remove files, and the
rmdir() system call (discussed later in the chapter) to remove directories. (On older
GNU/Linux systems not using GLIBC, remove() is an alias for unlink(); this fails
on directories. If you have such a system, you should probably upgrade it.)

5.1.6 Symbolic Links


We started the chapter with a discussion of partitions, filesystems, and inodes. We
also saw that directory entries associate names with inode numbers. Because directory
entries contain no other information, hard links are restricted to files within the same
filesystem. This has to be; there is no way to distinguish inode 2341 on one filesystem
from inode 2341 on another filesystem. Here is what happens when we try:
    $ mount                                 Show filesystems in use
    /dev/hda2 on / type ext3 (rw)
    /dev/hda5 on /d type ext3 (rw)

    $ ls -li /tmp/message                   Earlier example was on filesystem for /
     228786 -rw-r--r--    2 arnold   devel          19 May  4 15:51 /tmp/message
    $ cat /tmp/message
    Hi, how ya doin' ?
    $ /bin/pwd                              Current directory is on a different filesystem
    /d/home/arnold
    $ ln /tmp/message                       Attempt the link
    ln: creating hard link './message' to '/tmp/message': Invalid cross-device link

Large systems often have many partitions, both on physically attached local disks
and on remotely mounted network filesystems. The hard-link restriction to the same
filesystem is inconvenient, for example, if some files or directories must be moved to a
new location, but old software uses a hard-coded filename for the old location.
To get around this restriction, 4.2 BSD introduced symbolic links. A symbolic link
(also referred to as a soft link) is a special kind of file (just as a directory is a special kind
of file). The contents of the file are the pathname of the file being "pointed to." All
modern Unix systems, including Linux, provide symbolic links; indeed they are now
part of POSIX.
Symbolic links may refer to any file anywhere on the system. They may also refer to
directories. This makes it easy to move directories from place to place, with a symbolic
link left behind in the original location pointing to the new location.
When processing a filename, the system notices symbolic links and instead performs
the action on the pointed-to file or directory. Symbolic links are created with the -s
option to ln:
    $ /bin/pwd                              Where are we
    /d/home/arnold                          On a different filesystem
    $ ln -s /tmp/message ./hello            Create a symbolic link
    $ cat hello                             Use it
    Hi, how ya doin' ?
    $ ls -l hello                           Show information about it
    lrwxrwxrwx    1 arnold   devel          12 May  4 16:41 hello -> /tmp/message

The file pointed to by the link need not exist. The system detects this at runtime
and acts appropriately:
    $ rm /tmp/message                       Remove pointed-to file
    $ cat ./hello                           Attempt to use it by the soft link
    cat: ./hello: No such file or directory
    $ echo hi again > hello                 Create new file contents
    $ ls -l /tmp/message                    Show pointed-to file info ...
    -rw-r--r--    1 arnold   devel           9 May  4 16:45 /tmp/message
    $ cat /tmp/message                      ... and contents
    hi again

Symbolic links are created with the symlink() system call:

    #include <unistd.h>                                         POSIX

    int symlink(const char *oldpath, const char *newpath);

The oldpath argument names the pointed-to file or directory, and newpath is the
name of the symbolic link to be created. The return value is 0 on success and -1 on
error; see your symlink(2) manpage for the possible errno values.
Symbolic links have their disadvantages:

• They take up extra disk space, requiring a separate inode and data block. Hard
links take up only a directory slot.
• They add overhead. The kernel has to work harder to resolve a pathname containing
symbolic links.
• They can introduce "loops." Consider the following:
    $ rm -f a b                             Make sure 'a' and 'b' don't exist
    $ ln -s a b                             Symlink old file 'a' to new file 'b'
    $ ln -s b a                             Symlink old file 'b' to new file 'a'
    $ cat a                                 What happens?
    cat: a: Too many levels of symbolic links
The kernel has to be able to detect this case and produce an error message.
• They are easy to break. If you move the pointed-to file to a different location or
rename it, the symbolic link is no longer valid. This can't happen with a hard link.

5.2 Creating and Removing Directories


Creating and removing directories is straightforward. The two system calls, not
surprisingly, are mkdir() and rmdir(), respectively:
    #include <sys/types.h>                                      POSIX
    #include <sys/stat.h>

    int mkdir(const char *pathname, mode_t mode);

    #include <unistd.h>                                         POSIX

    int rmdir(const char *pathname);

Both return 0 on success and -1 on error, with errno set appropriately. For mkdir(),
the mode argument represents the permissions to be applied to the directory. It is
completely analogous to the mode arguments for creat() and open() discussed in
Section 4.6, "Creating Files," page 106.
Both functions handle the '.' and '..' in the directory being created or removed. A
directory must be empty before it can be removed; errno is set to ENOTEMPTY if
the directory isn't empty. (In this case, "empty" means the directory contains only '.'
and '..'.)
New directories, like all files, are assigned a group ID number. Unfortunately, how
this works is complicated. We delay discussion until Section 11.5.1, "Default Group
for New Files and Directories," page 412.
Both functions work one directory level at a time. If /somedir exists and
/somedir/sub1 does not, 'mkdir("/somedir/sub1/sub2")' fails. Each component
in a long pathname has to be created individually (thus the -p option to mkdir,
see mkdir(1)).
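
To illustrate creating each component individually, here is a rough sketch of what a 'mkdir -p' style helper might look like. It is not the code of the real mkdir program. The name make_path() is hypothetical, the same mode is applied to every component, and an existing entry is accepted without checking that it is really a directory.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>

/* make_path --- hypothetical helper: create every missing component of
   path, one level at a time. Returns 0 on success, -1 on error. */
int make_path(const char *path, mode_t mode)
{
    char buf[4096];
    char *p;
    size_t len = strlen(path);

    if (len == 0) {
        errno = ENOENT;
        return -1;
    }
    if (len >= sizeof buf) {
        errno = ENAMETOOLONG;
        return -1;
    }
    strcpy(buf, path);

    /* Create each leading prefix in turn: /a, then /a/b, and so on */
    for (p = buf + 1; *p != '\0'; p++) {
        if (*p != '/')
            continue;
        *p = '\0';                      /* temporarily end the string here */
        if (mkdir(buf, mode) < 0 && errno != EEXIST)
            return -1;
        *p = '/';                       /* restore the separator */
    }

    /* Finally, the full path itself */
    if (mkdir(buf, mode) < 0 && errno != EEXIST)
        return -1;
    return 0;
}
```

Treating EEXIST as success is what lets the helper be called repeatedly on the same path.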
Also, if pathname ends with a / character, mkdir() and rmdir() will fail on some
systems and succeed on others. The following program, ch05-trymkdir.c,
demonstrates both aspects.

    /* ch05-trymkdir.c --- Demonstrate mkdir() behavior.
       Courtesy of Nelson H.F. Beebe. */

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>         /* for strerror() */
    #include <errno.h>
    #include <sys/types.h>      /* for mode_t */
    #include <sys/stat.h>       /* for mkdir() */

    #if !defined(EXIT_SUCCESS)
    #define EXIT_SUCCESS 0
    #endif

    void do_test(const char *path)
    {
        int retcode;

        errno = 0;
        retcode = mkdir(path, 0755);
        printf("mkdir(\"%s\") returns %d: errno = %d [%s]\n",
            path, retcode, errno, strerror(errno));
    }

    int main(void)
    {
        do_test("/tmp/t1/t2/t3/t4");    /* Attempt creation in subdirs */
        do_test("/tmp/t1/t2/t3");
        do_test("/tmp/t1/t2");
        do_test("/tmp/t1");

        do_test("/tmp/u1");             /* Make subdirs */
        do_test("/tmp/u1/u2");
        do_test("/tmp/u1/u2/u3");
        do_test("/tmp/u1/u2/u3/u4");

        do_test("/tmp/v1/");            /* How is trailing '/' handled? */
        do_test("/tmp/v1/v2/");
        do_test("/tmp/v1/v2/v3/");
        do_test("/tmp/v1/v2/v3/v4/");

        return (EXIT_SUCCESS);
    }

Here are the results under GNU/Linux:

    $ ch05-trymkdir
    mkdir("/tmp/t1/t2/t3/t4") returns -1: errno = 2 [No such file or directory]
    mkdir("/tmp/t1/t2/t3") returns -1: errno = 2 [No such file or directory]
    mkdir("/tmp/t1/t2") returns -1: errno = 2 [No such file or directory]
    mkdir("/tmp/t1") returns 0: errno = 0 [Success]
    mkdir("/tmp/u1") returns 0: errno = 0 [Success]
    mkdir("/tmp/u1/u2") returns 0: errno = 0 [Success]
    mkdir("/tmp/u1/u2/u3") returns 0: errno = 0 [Success]
    mkdir("/tmp/u1/u2/u3/u4") returns 0: errno = 0 [Success]
    mkdir("/tmp/v1/") returns 0: errno = 0 [Success]
    mkdir("/tmp/v1/v2/") returns 0: errno = 0 [Success]
    mkdir("/tmp/v1/v2/v3/") returns 0: errno = 0 [Success]
    mkdir("/tmp/v1/v2/v3/v4/") returns 0: errno = 0 [Success]

Note how GNU/Linux accepts a trailing slash. Not all systems do.

5.3 Reading Directories


On the original Unix systems, reading directory contents was easy. A program opened
the directory with open() and read binary struct direct structures directly, 16
bytes at a time. The following fragment of code is from the V7 rmdir program,3 lines
60-74. It shows the check for the directory being empty.
    60      if((fd = open(name,0)) < 0) {
    61          fprintf(stderr, "rmdir: %s unreadable\n", name);
    62          ++Errors;
    63          return;
    64      }
    65      while(read(fd, (char *)&dir, sizeof dir) == sizeof dir) {
    66          if(dir.d_ino == 0) continue;
    67          if(!strcmp(dir.d_name, ".") || !strcmp(dir.d_name, ".."))
    68              continue;
    69          fprintf(stderr, "rmdir: %s not empty\n", name);
    70          ++Errors;
    71          close(fd);
    72          return;
    73      }
    74      close(fd);

Line 60 opens the directory for reading (a second argument of 0, equal to O_RDONLY).
Line 65 reads the struct direct. Line 66 is the check for an empty directory slot;
that is, one with an inode number of 0. Lines 67 and 68 check for '.' and '..'. Upon
reaching line 69, we know that some other filename has been seen and, therefore, that
the directory isn't empty.
(The test '!strcmp(s1, s2)' is a shorter way of saying 'strcmp(s1, s2) == 0';
that is, testing that the strings are equal. For what it's worth, we consider the
'!strcmp(s1, s2)' form to be poor style. As Henry Spencer once said, "strcmp()
is not a boolean!")

3 See /usr/src/cmd/rmdir.c in the V7 distribution.

When 4.2 BSD introduced a new filesystem format that allowed longer filenames
and provided better performance, it also introduced several new functions to provide
a directory-reading abstraction. This suite of functions is usable no matter what the
underlying filesystem and directory organization are. The basic parts of it are what is
standardized by POSIX, and programs using it are portable across GNU/Linux and
Unix systems.

5.3.1 Basic Directory Reading


Directory entries are represented by a struct dirent (not the same as the V7
struct direct!):
    struct dirent {

        ino_t d_ino;            /* XSI extension --- see text */

        char d_name[...];       /* See text on the size of this array */

    };

For portability, POSIX specifies only the d_name field, which is a zero-terminated
array of bytes representing the filename part of the directory entry. The size of d_name
is not specified by the standard, other than to say that there may be at most NAME_MAX
bytes before the terminating zero. (NAME_MAX is defined in <limits.h>.) The XSI
extension to POSIX provides for the d_ino inode number field.
In practice, since filenames can be of variable length and NAME_MAX is usually fairly
large (like 255), the struct dirent contains additional members that aid in the
bookkeeping of variable-length directory entries on disk. These additional members
are not relevant for everyday code.
The following functions provide the directory-reading interface:
    #include <sys/types.h>                                      POSIX
    #include <dirent.h>

    DIR *opendir(const char *name);         Open a directory for reading
    struct dirent *readdir(DIR *dir);       Return one struct dirent at a time
    int closedir(DIR *dir);                 Close an open directory
    void rewinddir(DIR *dirp);              Return to the front of a directory

The DIR type is analogous to the FILE type in <stdio.h>. It is an opaque type,
meaning that application code is not supposed to know what's inside it; its contents
are for use by the other directory routines. If opendir() returns NULL, the named
directory could not be opened for reading and errno is set to indicate the error.
Once you have an open DIR * variable, it can be used to retrieve a pointer to a
struct dirent representing the next directory entry. readdir() returns NULL upon
end-of-file or error.
Finally, closedir() is analogous to the fclose() function in <stdio.h>; it closes
the open DIR * variable. The rewinddir() function can be used to start over at the
beginning of a directory.
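
As a small sketch of these routines, here is how the V7 rmdir empty-directory check shown earlier might be written on a modern system. The helper name dir_is_empty() and its three-way return value are our own invention for illustration.

```c
#include <dirent.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/* dir_is_empty --- hypothetical helper. Returns 1 if the directory holds
   only "." and "..", 0 if it holds anything else, and -1 on error. */
int dir_is_empty(const char *name)
{
    DIR *dp;
    struct dirent *ent;
    int empty = 1;

    if ((dp = opendir(name)) == NULL)
        return -1;

    errno = 0;          /* lets us tell end-of-directory from an error */
    while ((ent = readdir(dp)) != NULL) {
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;
        empty = 0;      /* some other name was seen */
        break;
    }
    if (ent == NULL && errno != 0)
        empty = -1;     /* readdir() failed rather than reaching the end */

    closedir(dp);
    return empty;
}
```

Unlike the V7 code, there is no check for an inode number of 0; this sketch assumes readdir() hands back only entries that are in use.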
With these routines in hand (or at least in the C library), we can write a simple
catdir program that "cats" the contents of a directory. Such a program is presented
in ch05-catdir.c:
     1  /* ch05-catdir.c --- Demonstrate opendir(), readdir(), closedir(). */
     2
     3  #include <stdio.h>      /* for printf() etc. */
     4  #include <errno.h>      /* for errno */
     5  #include <sys/types.h>  /* for system types */
     6  #include <dirent.h>     /* for directory functions */
     7
     8  char *myname;
     9  int process(char *dir);
    10
    11  /* main --- loop over directory arguments */
    12
    13  int main(int argc, char **argv)
    14  {
    15      int i;
    16      int errs = 0;
    17
    18      myname = argv[0];
    19
    20      if (argc == 1)
    21          errs = process(".");    /* default to current directory */
    22      else
    23          for (i = 1; i < argc; i++)
    24              errs += process(argv[i]);
    25
    26      return (errs != 0);
    27  }
27

This program is quite similar to ch04-cat.c (see Section 4.2, "Presenting a Basic
Program Structure," page 84); the main() function is almost identical. The primary
difference is that it defaults to using the current directory if there are no arguments
(lines 20-21).

29  /*
30   * process --- do something with the directory, in this case,
31   *             print inode/name pairs on standard output.
32   *             Returns 0 if all ok, 1 otherwise.
33   */
34
35  int
36  process(char *dir)
37  {
38      DIR *dp;
39      struct dirent *ent;
40
41      if ((dp = opendir(dir)) == NULL) {
42          fprintf(stderr, "%s: %s: cannot open for reading: %s\n",
43                  myname, dir, strerror(errno));
44          return 1;
45      }
46
47      errno = 0;
48      while ((ent = readdir(dp)) != NULL)
49          printf("%8ld %s\n", ent->d_ino, ent->d_name);
50
51      if (errno != 0) {
52          fprintf(stderr, "%s: %s: reading directory entries: %s\n",
53                  myname, dir, strerror(errno));
54          return 1;
55      }
56
57      if (closedir(dp) != 0) {
58          fprintf(stderr, "%s: %s: closedir: %s\n",
59                  myname, dir, strerror(errno));
60          return 1;
61      }
62
63      return 0;
64  }

The process() function does all the work, and the majority of it is error-checking
code. The heart of the function is lines 48 and 49:
    while ((ent = readdir(dp)) != NULL)
        printf("%8ld %s\n", ent->d_ino, ent->d_name);

This loop reads directory entries, one at a time, until readdir() returns NULL. The
loop body prints the inode number and filename of each entry. Here's what happens
when the program is run:

$ ch05-catdir                           Default to current directory
  639063 .
  639062 ..
  639064 proposal.txt
  639012 lightsabers.url
  688470 code
  638976 progex.texi
  639305 texinfo.tex
  639007 15-processes.texi
  639011 00-preface.texi
  639020 18-tty.texi
  638980 Makefile
  639239 19-i18n.texi

The output is not sorted in any way; it represents the linear contents of the directory.
(We describe how to sort the directory contents in Section 6.2, "Sorting and Searching
Functions," page 181.)

5.3.1.1 Portability Considerations


There are several portability considerations. First, you should not assume that the
first two entries returned by readdir() will always be '.' and '..'. Many filesystems
use directory organizations that are different from that of the original Unix design, and
'.' and '..' could be in the middle of the directory or possibly not even present. 4
Second, the POSIX standard is silent about possible values for d_ino. It does say
that the returned structures represent directory entries for files; this implies that empty
slots are not returned by readdir(), and thus the GNU/Linux readdir() implementation
doesn't bother returning entries when 'd_ino == 0'; it continues to the next
valid directory entry.
So, on GNU/Linux and Unix systems at least, it is unlikely that d_ino will ever be
zero. However, it is best to avoid using this field entirely if you can.
Finally, some systems use d_fileno instead of d_ino inside the struct dirent.
Be aware of this if you have to port directory-reading code to such systems.
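
As a minimal sketch of the first point, the following hypothetical helpers (is_dot_entry() and count_entries() are our names, not part of any library) recognize '.' and '..' by name instead of assuming that they come first. Checking errno after readdir() is omitted for brevity:

```c
#include <sys/types.h>
#include <dirent.h>
#include <string.h>

/* is_dot_entry --- hypothetical helper: true for the "." and ".." entries */
static int is_dot_entry(const char *name)
{
    return strcmp(name, ".") == 0 || strcmp(name, "..") == 0;
}

/* count_entries --- hypothetical helper: count the entries in dir other
   than "." and "..". Returns -1 if the directory cannot be opened. */
long count_entries(const char *dir)
{
    DIR *dp;
    struct dirent *ent;
    long count = 0;

    if ((dp = opendir(dir)) == NULL)
        return -1;

    while ((ent = readdir(dp)) != NULL)
        if (! is_dot_entry(ent->d_name))    /* test the name, not the position */
            count++;

    closedir(dp);
    return count;
}
```

Because the test is on the name, the count comes out the same whether the filesystem returns '.' and '..' first, in the middle, or not at all.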

4 GNU/Linux systems are capable of mounting filesystems from many non-Unix operating systems. Many
commercial Unix systems can also mount MS-DOS filesystems. Assumptions about Unix filesystems don't apply in
such cases.

Indirect System Calls

"Don't try this at home, kids!"
-Mr. Wizard-

Many system calls, such as open(), read(), and write(), are meant to be called
directly from user-level application code: in other words, from code that you, as a
GNU/Linux developer, would write.
However, other system calls exist only to make it possible to implement higher-level,
standard library functions and should not be called directly. The GNU/Linux
getdents() system call is one such; it reads multiple directory entries into a buffer
provided by the caller, in this case, the code that implements readdir(). The
readdir() code then returns valid directory entries from the buffer, one at a time,
refilling the buffer as needed.
These for-library-use-only system calls can be distinguished from for-user-use system
calls by their appearance in the manpage. For example, from getdents(2):
NAME
       getdents - get directory entries
SYNOPSIS
       #include <unistd.h>
       #include <linux/types.h>
       #include <linux/dirent.h>
       #include <linux/unistd.h>

       _syscall3(int, getdents, uint, fd, struct dirent *, dirp, uint, count);

       int getdents(unsigned int fd, struct dirent *dirp, unsigned int count);

Any system call that uses a _syscallX() macro should not be called by application
code. (More information on these calls can be found in the intro(2) manpage; you should
read that manpage if you haven't already.)
In the case of getdents(), many other Unix systems have a similar system call,
sometimes with the same name, sometimes with a different name. Thus, trying to use
these calls would only lead to a massive portability mess anyway; you're much better off
in all cases using readdir(), whose interface is well defined, standard, and portable.

5.3.1.2 Linux and BSD Directory Entries


Although we just said that you should only use the d_ino and d_name members of
the struct dirent, it's worth knowing about the d_type member in the BSD and
Linux struct dirent. This is an unsigned char value that stores the type of the
file named by the directory entry:

struct dirent {

    ino_t d_ino;             /* As before */
    char d_name[...];        /* As before */
    unsigned char d_type;    /* Linux and modern BSD */

};

d_type can have any of the values described in Table 5.1.

TABLE 5.1
Values for d_type

Name          Meaning

DT_BLK        Block device file.
DT_CHR        Character device file.
DT_DIR        Directory.
DT_FIFO       FIFO or named pipe.
DT_LNK        Symbolic link.
DT_REG        Regular file.
DT_SOCK       Socket.
DT_UNKNOWN    Unknown file type.
DT_WHT        Whiteout entry (BSD systems only).

Knowing the file's type just by reading the directory entry is very handy; it can save
a possibly expensive stat() system call. (The stat() call is described shortly, in
Section 5.4.2, "Retrieving File Information," page 141.)
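
To illustrate, here is a sketch of how code might use d_type when the filesystem supplies it and fall back on lstat() when it doesn't. The helper name entry_is_dir() and the fixed-size path buffer are assumptions made for the example; the sketch also assumes a system, such as GNU/Linux, whose struct dirent has a d_type member:

```c
#define _DEFAULT_SOURCE     /* glibc: expose d_type and the DT_ constants */
#include <stdio.h>
#include <dirent.h>
#include <sys/types.h>
#include <sys/stat.h>

/* entry_is_dir --- hypothetical helper: decide whether entry ent, found in
   directory dir, is itself a directory. Returns 1, 0, or -1 on error. */
int entry_is_dir(const char *dir, const struct dirent *ent)
{
    char path[4096];    /* assumed large enough for this sketch */
    struct stat sbuf;
    int n;

    if (ent->d_type != DT_UNKNOWN)      /* the filesystem supplied the type */
        return ent->d_type == DT_DIR;

    /* Type not available: build the full name and fall back on lstat() */
    n = snprintf(path, sizeof path, "%s/%s", dir, ent->d_name);
    if (n < 0 || n >= (int) sizeof path)
        return -1;
    if (lstat(path, & sbuf) < 0)
        return -1;

    return S_ISDIR(sbuf.st_mode) != 0;
}
```

The DT_UNKNOWN test matters: some filesystems never fill in d_type, so code that skips the fallback would silently treat every entry as a nondirectory.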

5.3.2 BSD Directory Positioning Functions


Occasionally, it's useful to mark the current position in a directory in order to be
able to return to it later. For example, you might be writing code that traverses a
directory tree and wish to recursively enter each subdirectory as you come across it. (How
to distinguish files from directories is discussed in the next section.) For this reason,
the original BSD interface included two additional routines:

#include <dirent.h>                                      XSI

/* Caveat Emptor: POSIX XSI uses long, not off_t, for both functions */
off_t telldir(DIR *dir);                  Return current position
void seekdir(DIR *dir, off_t offset);     Move to given position

These routines are similar to the ftell() and fseek() functions in <stdio.h>.
They return the current position in a directory and set the current position to a
previously retrieved value, respectively.
These routines are included in the XSI part of the POSIX standard, since they make
sense only for directories that are implemented with linear storage of directory entries.
Besides the assumptions made about the underlying directory structure, these routines
are riskier to use than the simple directory-reading routines. This is because the contents
of a directory might be changing dynamically: As files are added to or removed from a
directory, the operating system adjusts the contents of the directory. Since directory
entries are of variable length, it may be that the absolute offset saved at an earlier time
no longer represents the start of a directory entry! Thus, we don't recommend that you
use these functions unless you have to.

5.4 Obtaining Information about Files


Reading a directory to retrieve filenames is only half the battle. Once you have a
filename, you need to know how to retrieve the other information associated with a
file, such as the file's type, its permissions, owner, and so on.

5.4.1 Linux File Types


Linux (and Unix) supports the following different kinds of file types:

Regular files
    As the name implies; used for data, executable programs, and anything else you
    might like. In an 'ls -l' listing, they show up with a '-' in the first character of
    the permissions (mode) field.
Directories
    Special files for associating file names with inodes. In an 'ls -l' listing, they show
    up with a d in the first character of the permissions field.
Symbolic links
    As described earlier in the chapter. In an 'ls -l' listing, they show up with an l
    (letter "ell," not digit 1) in the first character of the permissions field.

Devices
    Files representing both physical hardware devices and software pseudo-devices.
    There are two kinds:
    Block devices
        Devices on which I/O happens in chunks of some fixed physical record size,
        such as disk drives and tape drives. Access to such devices goes through the
        kernel's buffer cache. In an 'ls -l' listing, they show up with a b in the first
        character of the permissions field.
    Character devices
        Also known as raw devices. Originally, character devices were those on which
        I/O happened a few bytes at a time, such as terminals. However, the character
        device is also used for direct I/O to block devices such as tapes and disks,
        bypassing the buffer cache. 5 In an 'ls -l' listing, they show up with a c in
        the first character of the permissions field.
Named pipes
    Also known as FIFOs ("first-in first-out") files. These special files act like pipes;
    data written into them by one program can be read by another; no data go to or
    from the disk. FIFOs are created with the mkfifo command; they are discussed
    in Section 9.3.2, "FIFOs," page 319. In an 'ls -l' listing, they show up with a
    p in the first character of the permissions field.
Sockets
    Similar in purpose to named pipes, 6 they are managed with the socket interprocess
    communication (IPC) system calls and are not otherwise dealt with in this book.
    In an 'ls -l' listing, they show up with an s in the first character of the
    permissions field.

5 Linux uses the block device for disks exclusively. Other systems use both.

6 Named pipes and sockets were developed independently by the System V and BSD Unix groups, respectively.
As Unix systems reconverged, both kinds of files became universally available.

5.4.2 Retrieving File Information


Three system calls return information about files:
#include <sys/types.h>                                   POSIX
#include <sys/stat.h>
#include <unistd.h>

int stat(const char *file_name, struct stat *buf);
int fstat(int filedes, struct stat *buf);
int lstat(const char *file_name, struct stat *buf);

The stat() function accepts a pathname and returns information about the given
file. It follows symbolic links; that is, when applied to a symbolic link, stat() returns
information about the pointed-to file, not about the link itself. For those times when
you want to know if a file is a symbolic link, use the lstat() function instead; it does
not follow symbolic links.
The fstat() function retrieves information about an already open file. It is
particularly useful for file descriptors 0, 1, and 2 (standard input, output, and error), which
are already open when a process starts up. However, it can be applied to any open file.
(An open file descriptor will never relate to a symbolic link; make sure you
understand why.)
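
The difference between stat() and lstat() can be sketched with two small helpers. Both names, is_symlink() and is_dangling(), are hypothetical; the second relies on the fact that stat() follows the link and therefore fails when the link's target doesn't exist:

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* is_symlink --- hypothetical helper: 1 if path is a symbolic link,
   0 if it is not, -1 if lstat() fails */
int is_symlink(const char *path)
{
    struct stat sbuf;

    if (lstat(path, & sbuf) < 0)    /* lstat() does not follow the link */
        return -1;

    return S_ISLNK(sbuf.st_mode) != 0;
}

/* is_dangling --- hypothetical helper: 1 if path is a symbolic link
   whose target cannot be stat'ed, 0 otherwise */
int is_dangling(const char *path)
{
    struct stat sbuf;

    if (is_symlink(path) != 1)
        return 0;

    return stat(path, & sbuf) < 0;  /* stat() follows the link */
}
```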
The value passed in as the second parameter should be the address of a struct
stat, declared in <sys/stat.h>. As with the struct dirent, the struct stat
contains at least the following members:
struct stat {

    dev_t     st_dev;      /* device */
    ino_t     st_ino;      /* inode */
    mode_t    st_mode;     /* type and protection */
    nlink_t   st_nlink;    /* number of hard links */
    uid_t     st_uid;      /* user ID of owner */
    gid_t     st_gid;      /* group ID of owner */
    dev_t     st_rdev;     /* device type (block or character device) */
    off_t     st_size;     /* total size, in bytes */
    blksize_t st_blksize;  /* blocksize for filesystem I/O */
    blkcnt_t  st_blocks;   /* number of blocks allocated */
    time_t    st_atime;    /* time of last access */
    time_t    st_mtime;    /* time of last modification */
    time_t    st_ctime;    /* time of last inode change */

};

(The layout may be different on different architectures.) This structure uses a number
of typedef'd types. Although they are all (typically) integer types, the use of specially
defined types allows them to have different sizes on different systems. This keeps
user-level code that uses them portable. Here is a fuller description of each field.
st_dev
    The device for a mounted filesystem. Each mounted filesystem has a unique value
    for st_dev.
st_ino
    The file's inode number within the filesystem. The (st_dev, st_ino) pair
    uniquely identifies the file.
st_mode
    The file's type and its permissions encoded together in one field. We will shortly
    see how to extract this information.
st_nlink
    The number of hard links to the file (the link count). This can be zero if the file
    was unlinked after being opened.
st_uid
    The file's UID (owner number).
st_gid
    The file's GID (group number).
st_rdev
    The device type if the file is a block or character device. st_rdev encodes
    information about the device. We will shortly see how to extract this information. This
    field has no meaning if the file is not a block or character device.
st_size
    The logical size of the file. As mentioned in Section 4.5, "Random Access: Moving
    Around within a File," page 102, a file may have holes in it, in which case the size
    may not reflect the true amount of storage space that it occupies.
st_blksize
    The "block size" of the file. This represents the preferred size of a data block for
    I/O to or from the file. This is almost always larger than a physical disk sector.
    Older Unix systems don't have this field (or st_blocks) in the struct stat.
    For the Linux ext2 and ext3 filesystems, this value is typically 4096.
st_blocks
    The number of "blocks" used by the file. On Linux, this is in units of 512-byte
    blocks. On other systems, the size of a block may be different; check your local
    stat(2) manpage. (This number comes from the DEV_BSIZE constant in
    <sys/param.h>. This constant isn't standardized, but it is fairly widely used on
    Unix systems.)
    The number of blocks may be more than 'st_size / 512'; besides the data
    blocks, a filesystem may use additional blocks to store the locations of the data
    blocks. This is particularly necessary for large files.
st_atime
    The file's access time; that is, the last time the file's data were read.
st_mtime
    The file's modification time; that is, the last time the file's data were written or
    truncated.
st_ctime
    The file's inode change time. This indicates the last time when the file's metadata
    changed, such as the permissions or the owner.

NOTE The st_ctime field is not the file's "creation time"! There is no such
thing in a Linux or Unix system. Some early documentation referred to the
st_ctime field as the creation time. This was a misguided effort to simplify
the presentation of the file metadata.

The time_t type used for the st_atime, st_mtime, and st_ctime fields represents
dates and times. These time-related values are sometimes termed timestamps. Discussion
of how to use a time_t value is delayed until Section 6.1, "Times and Dates," page 166.
Similarly, the uid_t and gid_t types represent user and group ID numbers, which
are discussed in Section 6.3, "User and Group Names," page 195. Most of the other
types are not of general interest.
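
As a brief illustration of the st_size and st_blocks fields, the following sketch compares a file's logical size with the storage allocated to it. The helper names are hypothetical, and the code assumes the Linux convention that st_blocks counts 512-byte units:

```c
#include <sys/types.h>
#include <sys/stat.h>

/* allocated_bytes --- hypothetical helper: storage allocated to a file,
   assuming st_blocks counts 512-byte units as it does on Linux */
long long allocated_bytes(const struct stat *sbuf)
{
    return (long long) sbuf->st_blocks * 512;
}

/* looks_sparse --- hypothetical helper: true if the logical size exceeds
   the allocated storage, which suggests that the file has holes */
int looks_sparse(const struct stat *sbuf)
{
    return (long long) sbuf->st_size > allocated_bytes(sbuf);
}
```

This is only a heuristic. A true result suggests holes, but a false result doesn't rule them out, since the blocks a filesystem uses for its own bookkeeping are counted too.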

5.4.3 Linux Only: Specifying Higher-Precision File Times


The 2.6 and later Linux kernel supplies three additional fields in the struct stat.
These provide nanosecond resolution on the file times:

st_atime_nsec
    The nanoseconds component of the file's access time.
st_mtime_nsec
    The nanoseconds component of the file's modification time.
st_ctime_nsec
    The nanoseconds component of the file's inode change time.
Some other systems also provide such high-resolution time fields, but the member
names for the struct stat are not standardized, making it difficult to write portable
code that uses these times. (See Section 14.3.2, "Microsecond File Times: utimes(),"
page 545, for a related advanced system call.)

5.4.4 Determining File Type


Recall that the st_mode field encodes both the file's type and its permissions.
<sys/stat.h> defines a number of macros that determine the file's type. In particular,
these macros return true or false when applied to the st_mode field. The macros
correspond to each of the file types described earlier. Assume that the following code has
been executed:
    struct stat stbuf;
    char filename[PATH_MAX];     /* PATH_MAX is from <limits.h> */

    ... fill in filename with a file name ...

    if (stat(filename, & stbuf) < 0) {
        /* handle error */
    }

Once stbuf has been filled in by the system, the following macros can be called,
being passed stbuf.st_mode as the argument:
S_ISREG(stbuf.st_mode)
    Returns true if filename is a regular file.
S_ISDIR(stbuf.st_mode)
    Returns true if filename is a directory.
S_ISCHR(stbuf.st_mode)
    Returns true if filename is a character device. Devices are shortly discussed in
    more detail.
S_ISBLK(stbuf.st_mode)
    Returns true if filename is a block device.
S_ISFIFO(stbuf.st_mode)
    Returns true if filename is a FIFO.

S_ISLNK(stbuf.st_mode)
    Returns true if filename is a symbolic link. (This can never return true if stat()
    or fstat() were used instead of lstat().)
S_ISSOCK(stbuf.st_mode)
    Returns true if filename is a socket.

NOTE It happens that on GNU/Linux, these macros return 1 for true and 0
for false. However, on other systems, it's possible that they return an arbitrary
nonzero value for true, instead of 1. (POSIX specifies only nonzero vs. zero.)
Thus, you should always use these macros as standalone tests instead of testing
the return value:
    if (S_ISREG(stbuf.st_mode)) ...            Correct
    if (S_ISREG(stbuf.st_mode) == 1) ...       Incorrect

Along with the macros, <sys/stat.h> provides two sets of bitmasks. One set is for
testing permission, and the other set is for testing the type of a file. We saw the
permission masks in Section 4.6, "Creating Files," page 106, when we discussed the mode_t
type and values for open() and creat(). The bitmasks, their values for GNU/Linux,
and their meanings are described in Table 5.2.
Several of these masks serve to isolate the different sets of bits encoded in the
st_mode field:

• S_IFMT represents bits 12-15, which are where the different types of files are
  encoded.
• S_IRWXU represents bits 6-8, which are the user's permission (read, write, execute
  for User).
• S_IRWXG represents bits 3-5, which are the group's permission (read, write, execute
  for Group).
• S_IRWXO represents bits 0-2, which are the "other" permission (read, write, execute
  for Other).

The permission and file type bits are depicted graphically in Figure 5.3.

TABLE 5.2
POSIX file-type and permission bitmasks in <sys/stat.h>

Mask        Value      Meaning

S_IFMT      0170000    Bitmask for the file type bitfields.
S_IFSOCK    0140000    Socket.
S_IFLNK     0120000    Symbolic link.
S_IFREG     0100000    Regular file.
S_IFBLK     0060000    Block device.
S_IFDIR     0040000    Directory.
S_IFCHR     0020000    Character device.
S_IFIFO     0010000    FIFO.
S_ISUID     0004000    Setuid bit.
S_ISGID     0002000    Setgid bit.
S_ISVTX     0001000    Sticky bit.
S_IRWXU     0000700    Mask for owner permissions.
S_IRUSR     0000400    Owner read permission.
S_IWUSR     0000200    Owner write permission.
S_IXUSR     0000100    Owner execute permission.
S_IRWXG     0000070    Mask for group permissions.
S_IRGRP     0000040    Group read permission.
S_IWGRP     0000020    Group write permission.
S_IXGRP     0000010    Group execute permission.
S_IRWXO     0000007    Mask for permissions for others.
S_IROTH     0000004    Other read permission.
S_IWOTH     0000002    Other write permission.
S_IXOTH     0000001    Other execute permission.

The file-type masks are standardized primarily for compatibility with older code;
they should not be used directly, because such code is less readable than the
corresponding macros. It happens that the macros are implemented, logically enough, with the
masks, but that's irrelevant for user-level code.

[Figure: the 16 bits of st_mode, numbered 15 (leftmost) down to 0; labeled fields
include File type, Group r/w/x, and Other r/w/x]

FIGURE 5.3
Permission and file-type bits

The POSIX standard explicitly states that no new bitmasks will be standardized in
the future and that tests for any additional kinds of file types that may be added will
be available only as S_ISxxx() macros.
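
While the file-type masks are best left alone, the permission masks from Table 5.2 are meant to be used directly. As a sketch, the hypothetical function perm_string() below tests each of the nine permission bits in turn and builds the familiar string that 'ls -l' shows after the file-type character. For simplicity it ignores the setuid, setgid, and sticky bits:

```c
#include <sys/types.h>
#include <sys/stat.h>

/* perm_string --- hypothetical helper: fill buf (at least 10 bytes) with
   the nine permission characters for mode, followed by a zero byte */
void perm_string(mode_t mode, char *buf)
{
    static const mode_t bits[9] = {
        S_IRUSR, S_IWUSR, S_IXUSR,      /* user */
        S_IRGRP, S_IWGRP, S_IXGRP,      /* group */
        S_IROTH, S_IWOTH, S_IXOTH       /* other */
    };
    static const char letters[] = "rwxrwxrwx";
    int i;

    for (i = 0; i < 9; i++)
        buf[i] = (mode & bits[i]) ? letters[i] : '-';
    buf[9] = '\0';
}
```

Testing the named masks, instead of shifting and masking with literal octal values, keeps the code independent of where a given system places the bits.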

5.4.4.1 Device Information


Because it is meant to apply to non-Unix systems as well as Unix systems, the POSIX
standard doesn't define the meaning for the dev_t type. However, it's worthwhile to
know what's in a dev_t.
When S_ISBLK(sbuf.st_mode) or S_ISCHR(sbuf.st_mode) is true, then the
device information is found in the sbuf.st_rdev field. Otherwise, this field does not
contain any useful information.
Traditionally, Unix device files encode a major device number and a minor device
number within the dev_t value. The major number distinguishes the device type, such
as "disk drive" or "tape drive." Major numbers also distinguish among different types
of devices, such as SCSI disk vs. IDE disk. The minor number distinguishes the unit
of that type, for example, the first disk or the second one. You can see these values with
'ls -l':

$ ls -l /dev/hda /dev/hda?              Show numbers for first hard disk
brw-rw----    1 root     disk       3,   0 Aug 31  2002 /dev/hda
brw-rw----    1 root     disk       3,   1 Aug 31  2002 /dev/hda1
brw-rw----    1 root     disk       3,   2 Aug 31  2002 /dev/hda2
brw-rw----    1 root     disk       3,   3 Aug 31  2002 /dev/hda3
brw-rw----    1 root     disk       3,   4 Aug 31  2002 /dev/hda4
brw-rw----    1 root     disk       3,   5 Aug 31  2002 /dev/hda5
brw-rw----    1 root     disk       3,   6 Aug 31  2002 /dev/hda6
brw-rw----    1 root     disk       3,   7 Aug 31  2002 /dev/hda7
brw-rw----    1 root     disk       3,   8 Aug 31  2002 /dev/hda8
brw-rw----    1 root     disk       3,   9 Aug 31  2002 /dev/hda9
$ ls -l /dev/null                       Show info for /dev/null, too
crw-rw-rw-    1 root     root       1,   3 Aug 31  2002 /dev/null

Instead of the file size, ls displays the major and minor numbers. In the case of the
hard disk, /dev/hda represents the whole drive. /dev/hda1, /dev/hda2, and so on,
represent partitions within the drive. They all share the same major device number (3),
but have different minor device numbers.
Note that the disk devices are block devices, whereas /dev/null is a character device.
Block devices and character devices are separate entities; even if a character device and
a block device share the same major device number, they are not necessarily related.
The major and minor device numbers can be extracted from a dev_t value with the
major() and minor() functions defined in <sys/sysmacros.h>:
#include <sys/types.h>                                   Common
#include <sys/sysmacros.h>

int major(dev_t dev);                     Major device number
int minor(dev_t dev);                     Minor device number
dev_t makedev(int major, int minor);      Create a dev_t value
(Some systems implement them as macros.)
The makedev() function goes the other way; it takes separate major and minor values
and encodes them into a dev_t value. Its use is otherwise beyond the scope of this
book; the morbidly curious should see mknod(2).
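
A short sketch shows the three calls working together. The helper names same_major() and next_unit() are hypothetical, and the code assumes a system, such as GNU/Linux, that provides <sys/sysmacros.h>:

```c
#include <sys/types.h>
#include <sys/sysmacros.h>

/* same_major --- hypothetical helper: true if two device numbers
   share the same major device number */
int same_major(dev_t a, dev_t b)
{
    return major(a) == major(b);
}

/* next_unit --- hypothetical helper: build the dev_t value for the unit
   that follows dev, keeping the same major device number */
dev_t next_unit(dev_t dev)
{
    return makedev(major(dev), minor(dev) + 1);
}
```

Going through major(), minor(), and makedev() keeps the code independent of how a particular system packs the two numbers into a dev_t.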
The following program, ch05-devnum.c, shows how to use the stat() system call,
the file-type test macros, and finally, the major() and minor() macros.
/* ch05-devnum.c --- Demonstrate stat(), major(), minor(). */

#include <stdio.h>
#include <stdlib.h>     /* for exit() */
#include <string.h>     /* for strerror() */
#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>

int main(int argc, char **argv)
{
    struct stat sbuf;
    char *devtype;

    if (argc != 2) {
        fprintf(stderr, "usage: %s path\n", argv[0]);
        exit(1);
    }

    if (stat(argv[1], & sbuf) < 0) {
        fprintf(stderr, "%s: stat: %s\n", argv[1], strerror(errno));
        exit(1);
    }

    if (S_ISCHR(sbuf.st_mode))
        devtype = "char";
    else if (S_ISBLK(sbuf.st_mode))
        devtype = "block";
    else {
        fprintf(stderr, "%s is not a block or character device\n", argv[1]);
        exit(1);
    }

    printf("%s: major: %d, minor: %d\n", devtype,
            major(sbuf.st_rdev), minor(sbuf.st_rdev));

    exit(0);
}

Here is what happens when the program is run:

$ ch05-devnum /tmp                      Try a nondevice
/tmp is not a block or character device
$ ch05-devnum /dev/null                 Character device
char: major: 1, minor: 3
$ ch05-devnum /dev/hda2                 Block device
block: major: 3, minor: 2

Fortunately, the output agrees with that of ls, giving us confidence 7 that we have
indeed written correct code.
Reproducing the output of ls is all fine and good, but is it really useful? The answer
is yes. Any application that works with file hierarchies must be able to distinguish among
all the different types of files. Consider an archiver such as tar or cpio. It would be
disastrous if such a program treated a disk device file as a regular file, attempting to
read it and store its contents in an archive! Or consider find, which can perform
arbitrary actions based on the type and other attributes of files it encounters. (find is
a complicated program; see find(1) if you're not familiar with it.) Or even something
as simple as a disk space accounting package has to distinguish regular files from
everything else.

7 The technical term is a warm fuzzy.

5.4.4.2 The V7 cat Revisited


In Section 4.4.4, "Example: Unix cat," page 99, we promised to return to the V7
cat program to review its use of the stat() system call. The first group of lines that
used it were these:
31      fstat(fileno(stdout), &statb);
32      statb.st_mode &= S_IFMT;
33      if (statb.st_mode!=S_IFCHR && statb.st_mode!=S_IFBLK) {
34          dev = statb.st_dev;
35          ino = statb.st_ino;
36      }

This code should now make sense. Line 31 calls fstat() on the standard output
to fill in the statb structure. Line 32 throws away all the information in
statb.st_mode except the file type, by ANDing the mode with the S_IFMT mask.
Line 33 checks that the file being used for standard output is not a device file. In that
case, the program saves the device and inode numbers in dev and ino. These values
are then checked for each input file in lines 50-56:
50          fstat(fileno(fi), &statb);
51          if (statb.st_dev==dev && statb.st_ino==ino) {
52              fprintf(stderr, "cat: input %s is output\n",
53                 fflg?"-": *argv);
54              fclose(fi);
55              continue;
56          }

If an input file's st_dev and st_ino values match those of the output file, then cat
complains and continues to the next file named on the command line.
The check is done unconditionally, even though dev and ino are set only if
the output is not a device file. This works out OK, because of how those variables
are declared:
17      int dev, ino = -1;

Since ino is initialized to -1, no valid inode number will ever be equal to it. 8 That
dev is not so initialized is sloppy, but not a problem, since the test on line 51 requires
that both the device and inode be equal. (A good compiler will complain that dev is
used without being initialized: 'gcc -Wall' does.)
Note also that neither call to fstat() is checked for errors. This too is sloppy,
although less so; it is unlikely that fstat() will fail on a valid file descriptor.
The test for input file equals output file is done only for nondevice files. This makes
it possible to use cat to copy input from device files to themselves, such as
with terminals:
$ tty                                   Print current terminal device name
/dev/pts/3
$ cat /dev/pts/3 > /dev/pts/3           Copy keyboard input to screen
this is a line of text                  Type in a line
this is a line of text                  cat repeats it
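
A more careful version of the V7 check might look like the following sketch. The function names same_file() and input_is_output() are ours; unlike the V7 code, the sketch checks the fstat() return values, and it leaves out the special case for device files:

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* same_file --- hypothetical helper: 1 if descriptors fd1 and fd2 refer
   to the same file, 0 if they do not, -1 if either fstat() fails */
int same_file(int fd1, int fd2)
{
    struct stat s1, s2;

    if (fstat(fd1, & s1) < 0 || fstat(fd2, & s2) < 0)
        return -1;

    /* The (st_dev, st_ino) pair uniquely identifies a file */
    return s1.st_dev == s2.st_dev && s1.st_ino == s2.st_ino;
}

/* input_is_output --- hypothetical helper: true if fd refers to the
   same file as standard output */
int input_is_output(int fd)
{
    return same_file(fd, STDOUT_FILENO) == 1;
}
```

Comparing both st_dev and st_ino, with no reliance on a sentinel value such as -1, avoids the assumption about valid inode numbers that the V7 code makes.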

5.4.5 Working with Symbolic Links


In general, symbolic links act like hard links; file operations such as open() and
stat() apply to the pointed-to file instead of to the symbolic link itself. However,
there are times when it really is necessary to work with the symbolic link instead of
with the file the link points to.
For this reason, the lstat() system call exists. It behaves exactly like stat(), but
if the file being checked happens to be a symbolic link, then the information returned
applies to the symbolic link, and not to the pointed-to file. Specifically:

• S_ISLNK(sbuf.st_mode) will be true.
• sbuf.st_size is the number of bytes used by the name of the pointed-to file.

We already saw that the symlink() system call creates a symbolic link. But given
an existing symbolic link, how can we retrieve the name of the file it points to? (ls
obviously can, so we ought to be able to also.)
Opening the link with open() in order to read it with read() won't work; open()
follows the link to the pointed-to file. Symbolic links thus necessitate an additional
system call, named readlink():

8 This statement was true for V7; there are no such guarantees on modern systems.

#include <unistd.h>                                      POSIX

int readlink(const char *path, char *buf, size_t bufsiz);

readlink() places the contents of the symbolic link named by path into the buffer
pointed to by buf. No more than bufsiz characters are copied. The return value is
the number of characters placed in buf or -1 if an error occurred. readlink() does
not supply the trailing zero byte.
Note that if the buffer passed in to readlink() is too small, you will lose
information; the full name of the pointed-to file won't be available. To properly use
readlink(), your code should do the following:

1. Use lstat() to verify that you have a symbolic link.
2. Make sure that your buffer to hold the link contents is at least 'sbuf.st_size
   + 1' bytes big; the '+ 1' is for the trailing zero byte to turn the buffer into a
   usable C string.
3. Call readlink(). It doesn't hurt to verify that the returned value is the same
   as sbuf.st_size.
4. Assign '\0' to the byte after the contents of the link, to make it into a C string.

Code to do all that would look something like this:

/* Error checking omitted for brevity */
int count;
char linkfile[PATH_MAX], realfile[PATH_MAX];  /* PATH_MAX is in <limits.h> */
struct stat sbuf;

... fill in linkfile with path to symbolic link of interest ...

lstat(linkfile, & sbuf);                          Get stat information
if (! S_ISLNK(sbuf.st_mode))                      Check that it's a symlink
    /* not a symbolic link, handle it */
if (sbuf.st_size + 1 > PATH_MAX)                  Check buffer size
    /* handle buffer size problems */

count = readlink(linkfile, realfile, PATH_MAX);   Read the link
if (count != sbuf.st_size)
    /* something weird going on, handle it */

realfile[count] = '\0';                           Make it into a C string


This example uses fixed-size buffers for simplicity of presentation. Real code would
use malloc() to allocate a buffer of the correct size since the fixed-size arrays might
be too small. The file lib/xreadlink.c in the GNU Coreutils does just this. It reads
the contents of a symbolic link into storage allocated by malloc(). We show here just
the function; most of the file is boilerplate definitions. Line numbers are relative to the
start of the file:
55  /* Call readlink to get the symbolic link value of FILENAME.
56     Return a pointer to that NUL-terminated string in malloc'd storage.
57     If readlink fails, return NULL (caller may use errno to diagnose).
58     If realloc fails, or if the link value is longer than SIZE_MAX :-),
59     give a diagnostic and exit.  */
60
61  char *
62  xreadlink (char const *filename)
63  {
64    /* The initial buffer size for the link value.  A power of 2
65       detects arithmetic overflow earlier, but is not required.  */
66    size_t buf_size = 128;
67
68    while (1)
69      {
70        char *buffer = xmalloc (buf_size);
71        ssize_t link_length = readlink (filename, buffer, buf_size);
72
73        if (link_length < 0)
74          {
75            int saved_errno = errno;
76            free (buffer);
77            errno = saved_errno;
78            return NULL;
79          }
80
81        if ((size_t) link_length < buf_size)
82          {
83            buffer[link_length] = 0;
84            return buffer;
85          }
86
87        free (buffer);
88        buf_size *= 2;
89        if (SSIZE_MAX < buf_size || (SIZE_MAX / 2 < SSIZE_MAX && buf_size == 0))
90          xalloc_die ();
91      }
92  }

The function body consists of an infinite loop (lines 68-91), broken at line 84 which
returns the allocated buffer. The loop starts by allocating an initial buffer (line 70) and
reading the link (line 71). Lines 73-79 handle the error case, saving and restoring errno
so that it can be used correctly by the calling code.
Lines 81-85 handle the "success" case, in which the link's contents' length is smaller
than the buffer size. In this case, the terminating zero is supplied (line 83) and then the
buffer returned (line 84), breaking the infinite loop. This ensures that the entire link
contents have been placed into the buffer, since readlink() has no way to indicate
"insufficient space in buffer."
Lines 87-88 free the buffer and double the buffer size for the next try at the top of
the loop. Lines 89-90 handle the case in which the link's size is too big: buf_size is
greater than SSIZE_MAX, or SSIZE_MAX is larger than the value that can be represented
in a signed integer of the same size as used to hold SIZE_MAX and buf_size has wrapped
around to zero. (These are unlikely conditions, but strange things do happen.) If either
condition is true, the program dies with an error message. Otherwise, the function
continues around to the top of the loop to make another try at allocating a buffer and
reading the link.
Some further explanation: The 'SIZE_MAX / 2 < SSIZE_MAX' condition is true
only on systems on which 'SIZE_MAX < 2 * SSIZE_MAX'; we don't know of any, but
only on such a system can buf_size wrap around to zero. Since in practice this condition
can't be true, the compiler can optimize away the whole expression, including the
following 'buf_size == 0' test.
After reading this code, you might ask, "Why not use
lstat() to retrieve the size of the symbolic link, allocate a buffer of the right size with
malloc(), and be done?" Well, there are a number of reasons.⁹

• lstat() is a system call; it's best to avoid the overhead of making it since the
contents of most symbolic links will fit in the initial buffer size of 128.
• Calling lstat() introduces a race condition: The link could change between the
execution of lstat() and readlink(), forcing the need to iterate anyway.
• Some systems don't properly fill in the st_size member for symbolic links. (Sad,
but true.) In a similar fashion, as we see in Section 8.4.2, "Getting the Current
Directory: getcwd()," page 258, Linux provides special symbolic links under
/proc whose st_size is zero, but for which readlink() does return
valid content.

Finally, when the buffer isn't big enough, xreadlink() uses free() and malloc()
with a bigger size, instead of realloc(), to avoid the useless copying that realloc()
does. (The comment on line 58 is thus out of date since realloc() isn't being used;
this is fixed in the post-5.0 version of the Coreutils.)

9 Thanks to Jim Meyering for explaining the issues.

5.5 Changing Ownership, Permission, and Modification Times


Several additional system calls let you change other file-related information: in particular,
the owner and group of a file, the file's permissions, and the file's access and
modification times.

5.5.1 Changing File Ownership: chown(), fchown(), and lchown()


File ownership and group are changed with three similar system calls:
#include <sys/types.h>                                             POSIX
#include <unistd.h>

int chown(const char *path, uid_t owner, gid_t group);
int fchown(int fd, uid_t owner, gid_t group);
int lchown(const char *path, uid_t owner, gid_t group);

chown() works on a pathname argument, fchown() works on an open file, and
lchown() works on symbolic links instead of on the files pointed to by symbolic links.
In all other respects, the three calls work identically, returning 0 on success and -1
on error.
It is noteworthy that one system call changes both the owner and group of a file. To
change only the owner or only the group, pass in a value of -1 for the ID number that
is to be left unchanged.
While you might think that you could pass in the corresponding value from a previously
retrieved struct stat for the file or file descriptor, that method is more error
prone. There's a race condition: The owner or group could have changed between the
call to stat() and the call to chown().
You might wonder, "Why be able to change ownership of a symbolic link? The
permissions and ownership on them don't matter." But what happens if a user leaves,
but all his files are still needed? It's necessary to be able to change the ownership on all
the person's files to someone else, including symbolic links.
GNU/Linux systems normally do not permit ordinary (non-root) users to change
the ownership of ("give away") their files. Changing the group to one of the user's
groups is allowed, of course. The restriction on changing owners follows BSD systems,
which also have this prohibition. The primary reason is that allowing users to give away
files can defeat disk accounting. Consider a scenario like this:
$ mkdir mywork                            Make a directory
$ chmod go-rwx mywork                     Set permissions to drwx------
$ cd mywork                               Go there
$ myprogram > large_data_file             Create a large file
$ chmod ugo+rw large_data_file            Set permissions to -rw-rw-rw-
$ chown otherguy large_data_file          Give file away to otherguy
In this example, large_data_file now belongs to user otherguy. The original
user can continue to read and write the file, because of the permissions. But otherguy
will be charged for the disk space it occupies. However, since it's in a directory that
belongs to the original user, which cannot be accessed by otherguy, there is no way
for otherguy to remove the file.
Some System V systems do allow users to give away files. (Setuid and setgid files have
the corresponding bit removed when the owner is changed.) This can be a particular
problem when files are extracted from a .tar or .cpio archive; the extracted files end
up belonging to the UID or GID encoded in the archive. On such systems, the tar
and cpio programs have options that prevent this, but it's important to know that
chown()'s behavior does vary across systems.

We will see in Section 6.3, "User and Group Names," page 195, how to relate user
and group names to their corresponding numeric values.

5.5.2 Changing Permissions: chmod() and fchmod()


After all the discussion in Chapter 4, "Files and File I/O," page 83, and in this
chapter, changing permissions is almost anticlimactic. It's done with one of two system
calls, chmod() and fchmod():
#include <sys/types.h>                                             POSIX
#include <sys/stat.h>

int chmod(const char *path, mode_t mode);
int fchmod(int fildes, mode_t mode);

chmod() works on a pathname argument, and fchmod() works on an open file.
(There is no lchmod() call in POSIX, since the system ignores the permission settings
on symbolic links. Some systems do have such a call, though.) As with most other system
calls, these return 0 on success and -1 on failure. Only the file's owner or root can
change a file's permissions.
The mode value is created in the same way as for open() and creat(), as discussed
in Section 4.6, "Creating Files," page 106. See also Table 5.2, which lists the permission
constants.
The system will not allow setting the setgid bit (S_ISGID) if the group of the file
does not match the effective group ID of the process or one of its supplementary groups.
(We have not yet discussed these issues in detail; see Section 11.1.1, "Real and Effective
IDs," page 405.) Of course, this check does not apply to root or to code running
as root.
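
The following sketch shows both ways a mode value is commonly built: from scratch out of the permission constants, and by modifying the file's current mode. The helper names make_private() and add_group_read() are hypothetical, made up for this example. Both work on an open file descriptor with fstat() and fchmod(), and the 07777 mask (the permission bits plus the setuid, setgid, and sticky bits) strips the file-type bits from st_mode before the value is handed back to the system.

```c
#include <sys/types.h>
#include <sys/stat.h>   /* fstat(), fchmod(), S_I* constants */

/* make_private: hypothetical helper. Set the permissions of the open
   file FD to rw------- (owner read and write only). */

int
make_private(int fd)
{
    return fchmod(fd, S_IRUSR | S_IWUSR);
}

/* add_group_read: hypothetical helper. Turn on group read permission
   for the open file FD, keeping all the other permission bits. */

int
add_group_read(int fd)
{
    struct stat sbuf;

    if (fstat(fd, & sbuf) < 0)      /* get the current mode */
        return -1;

    /* keep only the permission bits, then add S_IRGRP */
    return fchmod(fd, (sbuf.st_mode & 07777) | S_IRGRP);
}
```

For example, calling add_group_read() on a file whose mode is 0600 leaves it with mode 0640.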

5.5.3 Changing Timestamps: utime()


The struct stat structure contains three fields of type time_t:
st_atime    The time the file was last accessed (read).
st_mtime    The time the file was last modified (written).
st_ctime    The time the file's inode was last changed (for example, renamed).
A time_t value represents time in "seconds since the Epoch." The Epoch is the
Beginning of Time for computer systems. GNU/Linux and Unix use Midnight, January
1, 1970 UTC¹⁰ as the Epoch. Microsoft Windows systems use Midnight January 1,
1980 (local time, apparently) as the Epoch.
time_t values are sometimes referred to as timestamps. In Section 6.1, "Times and
Dates," page 166, we look at how these values are obtained and at how they're used.
For now, it's enough to know what a time_t value is and that it represents seconds
since the Epoch.
The utime() system call allows you to change a file's access and modification
timestamps:
#include <sys/types.h>                                             POSIX
#include <utime.h>

int utime(const char *filename, struct utimbuf *buf);

A struct utimbuf looks like this:
struct utimbuf {
    time_t actime;     /* access time */
    time_t modtime;    /* modification time */
};

10 UTC is a language-independent acronym for Coordinated Universal Time. Older code (and sometimes older
people) refer to this as "Greenwich Mean Time" (GMT), which is the time in Greenwich, England. When time
zones came into widespread use, Greenwich was chosen as the location to which all other time zones are relative,
either behind it or ahead of it.

If the call is successful, it returns 0; otherwise, it returns -1. If buf is NULL, then the
system sets both the access time and the modification time to the current time.
To change one time but not the other, use the original value from the struct stat.
For example:
/* Error checking omitted for brevity */
struct stat sbuf;
struct utimbuf ut;
time_t now;

time(& now);                              Get current time of day, see next chapter
stat("/some/file", & sbuf);               Fill in sbuf
ut.actime = sbuf.st_atime;                Access time unchanged
ut.modtime = now - (24 * 60 * 60);        Set modtime to 24 hours ago
utime("/some/file", & ut);                Set the values


About now, you may be asking yourself, "Why would anyone want to change a file's
access and modification times?" Good question.
To answer it, consider the case of a program that creates backup archives, such as
tar or cpio. These programs have to read the contents of a file in order to archive
them. Reading the file, of course, changes the file's access time.
However, that file might not have been read by a human in 10 years. Someone doing
an 'ls -lu', which displays the access time (instead of the default modification time),
should see that the last time the file was read was 10 years ago. Thus, the backup program
should save the original access and modification times, read the file in order to archive
it, and then restore the original times with utime().
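
The save, read, restore sequence just described can be sketched as follows. The function read_preserving_times() is hypothetical and written only for this illustration; in particular, it simply discards the data it reads, where a real archiver would write it to the archive. Note also that restoring the times with utime() necessarily updates the inode change time (st_ctime).

```c
#include <fcntl.h>      /* open() */
#include <sys/types.h>
#include <sys/stat.h>   /* stat() */
#include <unistd.h>     /* read(), close() */
#include <utime.h>      /* utime(), struct utimbuf */

/* read_preserving_times: hypothetical helper. Read all of PATH, then put
   back the access and modification times it had before we touched it.
   Return 0 on success, -1 on failure. */

int
read_preserving_times(const char *path)
{
    struct stat sbuf;
    struct utimbuf ut;
    char buf[4096];
    ssize_t n;
    int fd;

    if (stat(path, & sbuf) < 0)         /* save the original times */
        return -1;
    ut.actime = sbuf.st_atime;
    ut.modtime = sbuf.st_mtime;

    if ((fd = open(path, O_RDONLY)) < 0)
        return -1;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        continue;                       /* a real archiver would store buf */
    if (close(fd) < 0 || n < 0)         /* close() runs even if read() failed */
        return -1;

    return utime(path, & ut);           /* restore the original times */
}
```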
Similarly, consider the case of an archiving program restoring a file from an archive.
The archive stores the file's original access and modification times. However, when a
file is extracted from an archive to a newly created copy on disk, the new file has the
current date and time of day for its access and modification times.
However, it's more useful if the newly created file looks as if it's the same age as the
original file in the archive. Thus, the archiver needs to be able to set the access and
modification times to those stored in the archive.
NOTE In new code, you may wish to use the utimes() call (note the s in the
name), which is described later in the book, in Section 14.3.2, "Microsecond
File Times: utimes()," page 545.

5.5.3.1 Faking utime(file, NULL)


Some older systems don't set the access and modification times to the current time
when the second argument to utime() is NULL. Yet, higher-level code (such as GNU
touch) is simpler and more straightforward if it can rely on a single standardized
interface.
The GNU Coreutils library thus contains a replacement function for utime() that
handles this case, which can then be called by higher-level code. This reflects the "pick
the best interface for the job" design principle we described in Section 1.5, "Portability
Revisited," page 19.
The replacement function is in the file lib/utime.c in the Coreutils distribution.
The following code is the version from Coreutils 5.0. Line numbers are relative to the
start of the file:
24  #include <sys/types.h>
25
26  #ifdef HAVE_UTIME_H
27  # include <utime.h>
28  #endif
29
30  #include "full-write.h"
31  #include "safe-read.h"
32
33  /* Some systems (even some that do have <utime.h>) don't declare this
34     structure anywhere.  */
35  #ifndef HAVE_STRUCT_UTIMBUF
36  struct utimbuf
37  {
38    long actime;
39    long modtime;
40  };
41  #endif
42
43  /* Emulate utime (file, NULL) for systems (like 4.3BSD) that do not
44     interpret it to set the access and modification times of FILE to
45     the current time.  Return 0 if successful, -1 if not. */
46
47  static int
48  utime_null (const char *file)
49  {
50  #if HAVE_UTIMES_NULL
51    return utimes (file, 0);
52  #else
53    int fd;
54    char c;
55    int status = 0;
56    struct stat sb;
57
58    fd = open (file, O_RDWR);
59    if (fd < 0
60        || fstat (fd, &sb) < 0
61        || safe_read (fd, &c, sizeof c) == SAFE_READ_ERROR
62        || lseek (fd, (off_t) 0, SEEK_SET) < 0
63        || full_write (fd, &c, sizeof c) != sizeof c
64        /* Maybe do this -- it's necessary on SunOS4.1.3 with some combination
65           of patches, but that system doesn't use this code: it has utimes.
66           || fsync (fd) < 0
67        */
68        || (st.st_size == 0 && ftruncate (fd, st.st_size) < 0)
69        || close (fd) < 0)
70      status = -1;
71    return status;
72  #endif
73  }
74
75  int
76  rpl_utime (const char *file, const struct utimbuf *times)
77  {
78    if (times)
79      return utime (file, times);
80
81    return utime_null (file);
82  }

Lines 33-41 define the struct utimbuf; as the comment says, some systems don't
declare the structure. The utime_null() function does the work. If the utimes()
system call is available, it is used. (utimes() is a similar, but more advanced, system
call, which is covered in Section 14.3.2, "Microsecond File Times: utimes()," page 545.
It also allows NULL for the second argument, meaning use the current time.)
In the case that the times must be updated manually, the code does the update by
first reading a byte from the file, and then writing it back. (The original Unix touch
worked this way.) The operations are as follows:

1. Open the file, line 58.
2. Call fstat() on the file, line 60.
3. Read one byte, line 61. For our purposes, safe_read() acts like read(); it's
explained in Section 10.4.4, "Restartable System Calls," page 357.
4. Seek back to the front of the file with lseek(), line 62. This is done to write
the just-read byte back on top of itself.
5. Write the byte back, line 63. full_write() acts like write(); it is also covered
in Section 10.4.4, "Restartable System Calls," page 357.
6. If the file is of zero size, use ftruncate() to set it to zero size (line 68). This
doesn't change the file, but it has the side effect of updating the access and
modification times. (ftruncate() was described in Section 4.8, "Setting File
Length," page 114.)
7. Close the file, line 69.

These steps are all done in one long successive chain of tests, inside an if. The tests
are set up so that if any operation fails, utime_null() returns -1, like a regular system
call. errno is automatically set by the system, for use by higher-level code.
The rpl_utime() function (lines 75-82) is the "replacement utime()." If the
second argument is not NULL, then it calls the real utime(). Otherwise, it calls
utime_null().

5.5.4 Using fchown() and fchmod() for Security


The original Unix systems had only chown() and chmod() system calls. However,
on heavily loaded systems, these system calls are subject to race conditions, by which
an attacker could arrange to replace with a different file the file whose ownership or
permissions were being changed.
However, o nc