Linux Programming by Example
ARNOLD ROBBINS
Prentice Hall
Open Source Software Development Series
Arnold Robbins, Series Editor
Open Source technology has revolutionized the computing world. Many large-scale projects are
in production use worldwide, such as Apache, MySQL, and Postgres, with programmers writing
applications in a variety of languages including Perl, Python, and PHP. These technologies are in
use on many different systems, ranging from proprietary systems, to Linux systems, to traditional
UNIX systems, to mainframes.
The Prentice Hall Open Source Software Development Series is designed to bring you the
best of these Open Source technologies. Not only will you learn how to use them for your
projects, but you will learn from them. By seeing real code from real applications, you will learn
the best practices of Open Source developers the world over.
The Linux® Kernel Primer: A Top-Down Approach for x86 and PowerPC Architectures
Claudia Salzberg, Gordon Fischer, Steven Smolski
0131181637, Paper, 9/21/2005
A comprehensive view of the Linux Kernel is presented in a top-down approach: the big picture
first, with a clear view of all components, how they interrelate, and where the hardware/software
separation exists. The coverage of both the x86 and the PowerPC is unique to this book.
To my wife Miriam,
and my children,
Chana, Rivka, Nachum, and Malka.
Prentice Hall Professional Technical Reference
Upper Saddle River, NJ 07458
www.phptr.com
© 2004 Pearson Education, Inc.
Publishing as Prentice Hall Professional Technical Reference
Upper Saddle River, New Jersey 07458
Prentice Hall PTR offers discounts on this book when ordered in quantity for bulk purchases or special sales. For
more information, please contact: U.S. Corporate and Government Sales, 1-800-382-3419,
corpsales@pearsontechgroup.com. For sales outside of the United States, please contact: International Sales,
1-317-581-3793, international@pearsontechgroup.com.
Portions of Chapter 1, Copyright © 1994 Arnold David Robbins, first appeared in an article in Issue 16 of Linux
Journal, reprinted by permission.
Portions of the documentation for Valgrind, Copyright © 2003 Julian Seward, reprinted by permission.
Portions of the documentation for the DBUG library, by Fred N. Fish, reprinted by permission.
The GNU programs in this book are Copyright © 1985-2003, Free Software Foundation, Inc. The full list of files
and copyright dates is provided in the Preface. Each program is "free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by the Free Software Foundation; either version
2 of the License, or (at your option) any later version." Appendix C of this book provides the text of the GNU
General Public License.
All V7 Unix code and documentation are Copyright © Caldera International Inc. 2001-2002. All rights reserved.
They are reprinted here under the terms of the Caldera Ancient UNIX License, which is reproduced in full in
Appendix B.
Cover image courtesy of Parks Sabers, Inc. The Arc-Wave(tm) saber is manufactured by Parks Sabers, Inc., Copyright
© 2001, www.parksabers.com. Parks Sabers is not associated with any Lucasfilm Ltd. property, film, or franchise.
The programs and applications presented in this book have been included for their instructional value. They have
been tested with care but are not guaranteed for any particular purpose. The publisher does not offer any warranties
or representations, nor does it accept any liabilities with respect to the programs or applications. UNIX is a registered
trademark of The Open Group in the United States and other countries.
Microsoft, MS, and MS-DOS are registered trademarks, and Windows is a trademark of Microsoft Corporation in
the United States and other countries. Linux is a registered trademark of Linus Torvalds.
All company and product names mentioned herein are the trademarks or registered trademarks of their respective
owners.
This material may be distributed only subject to the terms and conditions set forth in the Open Publication License,
v1.0 or later (the latest version is presently available at http://www.opencontent.org/openpub/), with License
Option B.
Printed in the United States of America
ISBN 0-13-142964-7
Text printed on recycled paper
First printing
Pearson Education LTD.
Pearson Education Australia PTY, Limited
Pearson Education Singapore, Pte. Ltd.
Pearson Education North Asia Ltd.
Pearson Education Canada, Ltd.
Pearson Educación de Mexico, S.A. de C.V.
Pearson Education-Japan
Pearson Education Malaysia, Pte. Ltd.
Contents
3.1 Linux/Unix Address Space
3.2 Memory Allocation
3.2.1 Library Calls: malloc(), calloc(), realloc(), free()
3.2.1.1 Examining C Language Details
3.2.1.2 Initially Allocating Memory: malloc()
3.2.1.3 Releasing Memory: free()
3.2.1.4 Changing Size: realloc()
3.2.1.5 Allocating and Zero-filling: calloc()
3.2.1.6 Summarizing from the GNU Coding Standards
3.2.1.7 Using Private Allocators
3.2.1.8 Example: Reading Arbitrarily Long Lines
3.2.1.9 GLIBC Only: Reading Entire Lines: getline() and getdelim()
3.2.2 String Copying: strdup()
3.2.3 System Calls: brk() and sbrk()
3.2.4 Lazy Programmer Calls: alloca()
3.2.5 Address Space Examination
3.3 Summary
Exercises
5.2 Creating and Removing Directories
5.3 Reading Directories
5.3.1 Basic Directory Reading
5.3.1.1 Portability Considerations
5.3.1.2 Linux and BSD Directory Entries
5.3.2 BSD Directory Positioning Functions
5.4 Obtaining Information about Files
5.4.1 Linux File Types
5.4.2 Retrieving File Information
5.4.3 Linux Only: Specifying Higher-Precision File Times
5.4.4 Determining File Type
5.4.4.1 Device Information
5.4.4.2 The V7 cat Revisited
5.4.5 Working with Symbolic Links
5.5 Changing Ownership, Permission, and Modification Times
5.5.1 Changing File Ownership: chown(), fchown(), and lchown()
5.5.2 Changing Permissions: chmod() and fchmod()
5.5.3 Changing Timestamps: utime()
5.5.3.1 Faking utime(file, NULL)
5.5.4 Using fchown() and fchmod() for Security
5.6 Summary
Exercises
Chapter 6 General Library Interfaces - Part 1
6.1 Times and Dates
6.1.1 Retrieving the Current Time: time() and difftime()
6.1.2 Breaking Down Times: gmtime() and localtime()
6.1.3 Formatting Dates and Times
6.1.3.1 Simple Time Formatting: asctime() and ctime()
6.1.3.2 Complex Time Formatting: strftime()
6.1.4 Converting a Broken-Down Time to a time_t
6.1.5 Getting Time-Zone Information
6.1.5.1 BSD Systems Gotcha: timezone(), Not timezone
6.2 Sorting and Searching Functions
6.2.1 Sorting: qsort()
8.5 Walking a File Tree: GNU du
8.6 Changing the Root Directory: chroot()
8.7 Summary
Exercises
9.1 Process Creation and Management
9.1.1 Creating a Process: fork()
9.1.1.1 After the fork(): Shared and Distinct Attributes
9.1.1.2 File Descriptor Sharing
9.1.1.3 File Descriptor Sharing and close()
9.1.2 Identifying a Process: getpid() and getppid()
9.1.3 Setting Process Priority: nice()
9.1.3.1 POSIX vs. Reality
9.1.4 Starting New Programs: The exec() Family
9.1.4.1 The execve() System Call
9.1.4.2 Wrapper Functions: execl() et al.
9.1.4.3 Program Names and argv[0]
9.1.4.4 Attributes Inherited across exec()
9.1.5 Terminating a Process
9.1.5.1 Defining Process Exit Status
9.1.5.2 Returning from main()
9.1.5.3 Exiting Functions
9.1.6 Recovering a Child's Exit Status
9.1.6.1 Using POSIX Functions: wait() and waitpid()
9.1.6.2 Using BSD Functions: wait3() and wait4()
9.2 Process Groups
9.2.1 Job Control Overview
9.2.2 Process Group Identification: getpgrp() and getpgid()
9.2.3 Process Group Setting: setpgid() and setpgrp()
9.3 Basic Interprocess Communication: Pipes and FIFOs
9.3.1 Pipes
9.3.1.1 Creating Pipes
9.3.1.2 Pipe Buffering
9.3.2 FIFOs
9.4 File Descriptor Management
9.4.1 Duplicating Open Files: dup() and dup2()
9.4.2 Creating Nonlinear Pipelines: /dev/fd/XX
9.4.3 Managing File Attributes: fcntl()
9.4.3.1 The Close-on-exec Flag
9.4.3.2 File Descriptor Duplication
9.4.3.3 Manipulation of File Status Flags and Access Modes
9.4.3.4 Nonblocking I/O for Pipes and FIFOs
9.4.3.5 fcntl() Summary
9.5 Example: Two-Way Pipes in gawk
9.6 Suggested Reading
9.7 Summary
Exercises
Chapter 10 Signals
11.8 Crossing a Security Minefield: Setuid root
11.9 Suggested Reading
11.10 Summary
Exercises
12.1 Assertion Statements: assert()
12.2 Low-Level Memory: The memXXX() Functions
12.2.1 Setting Memory: memset()
12.2.2 Copying Memory: memcpy(), memmove(), and memccpy()
12.2.3 Comparing Memory Blocks: memcmp()
12.2.4 Searching for a Byte Value: memchr()
12.3 Temporary Files
12.3.1 Generating Temporary Filenames (Bad)
12.3.2 Creating and Opening Temporary Files (Good)
12.3.3 Using the TMPDIR Environment Variable
12.4 Committing Suicide: abort()
12.5 Nonlocal Gotos
12.5.1 Using Standard Functions: setjmp() and longjmp()
12.5.2 Handling Signal Masks: sigsetjmp() and siglongjmp()
12.5.3 Observing Important Caveats
12.6 Pseudorandom Numbers
12.6.1 Standard C: rand() and srand()
12.6.2 POSIX Functions: random() and srandom()
12.6.3 The /dev/random and /dev/urandom Special Files
12.7 Metacharacter Expansions
12.7.1 Simple Pattern Matching: fnmatch()
12.7.2 Filename Expansion: glob() and globfree()
12.7.3 Shell Word Expansion: wordexp() and wordfree()
12.8 Regular Expressions
12.9 Suggested Reading
12.10 Summary
Exercises
14.1 Allocating Aligned Memory: posix_memalign() and memalign()
14.2 Locking Files
14.2.1 File Locking Concepts
14.2.2 POSIX Locking: fcntl() and lockf()
14.2.2.1 Describing a Lock
14.2.2.2 Obtaining and Releasing Locks
14.2.2.3 Observing Locking Caveats
14.2.3 BSD Locking: flock()
14.2.4 Mandatory Locking
14.3 More Precise Times
14.3.1 Microsecond Times: gettimeofday()
14.3.2 Microsecond File Times: utimes()
14.3.3 Interval Timers: setitimer() and getitimer()
14.3.4 More Exact Pauses: nanosleep()
14.4 Advanced Searching with Binary Trees
14.4.1 Introduction to Binary Trees
14.4.2 Tree Management Functions
14.4.3 Tree Insertion: tsearch()
14.4.4 Tree Lookup and Use of A Returned Pointer: tfind() and tsearch()
14.4.5 Tree Traversal: twalk()
14.4.6 Tree Node Removal and Tree Deletion: tdelete() and tdestroy()
14.5 Summary
Exercises
15.4.1.4 Use Debugging Helper Functions
15.4.1.5 Avoid Unions When Possible
15.4.2 Runtime Debugging Code
15.4.2.1 Add Debugging Options and Variables
15.4.2.2 Use Special Environment Variables
15.4.2.3 Add Logging Code
15.4.2.4 Runtime Debugging Files
15.4.2.5 Add Special Hooks for Breakpoints
15.5 Debugging Tools
15.5.1 The dbug Library - A Sophisticated printf()
15.5.2 Memory Allocation Debuggers
15.5.2.1 GNU/Linux mtrace
15.5.2.2 Electric Fence
15.5.2.3 Debugging Malloc: dmalloc
15.5.2.4 Valgrind: A Versatile Tool
15.5.2.5 Other Malloc Debuggers
15.5.3 A Modern lint
15.6 Software Testing
15.7 Debugging Rules
15.8 Suggested Reading
15.9 Summary
Exercises
Chapter 16 A Project That Ties Everything Together
16.1 Project Description
16.2 Suggested Reading
Audience
This book is intended for the person who understands programming and is familiar
with the basics of C, at least on the level of The C Programming Language by Kernighan
and Ritchie. (Java programmers wishing to read this book should understand C pointers,
since C code makes heavy use of them.) The examples use both the 1990 version of
Standard C and Original C.
In particular, you should be familiar with all C operators, control-flow structures,
variable and pointer declarations and use, the string management functions, the use of
exit(), and the <stdio.h> suite of functions for file input/output.
You should understand the basic concepts of standard input, standard output, and
standard error and the fact that all C programs receive an array of character strings
representing invocation options and arguments. You should also be familiar with the
fundamental command-line tools, such as cd, cp, date, ln, ls, man (and info if you
have it), rmdir, and rm, the use of long and short command-line options, environment
variables, and I/O redirection, including pipes.
We assume that you want to write programs that work not just under GNU/Linux
but across the range of Unix systems. To that end, we mark each interface as to its
availability (GLIBC systems only, or defined by POSIX, and so on), and portability
advice is included as an integral part of the text.
The programming taught here may be at a lower level than you're used to; that's
OK. The system calls are the fundamental building blocks for higher operations and
are thus low-level by nature. This in turn dictates our use of C: The APIs were designed
for use from C, and code that interfaces them to higher-level languages, such as C++
and Java, will necessarily be lower level in nature, and most likely, written in C. It may
help to remember that "low level" doesn't mean "bad," it just means "more challenging."
We have purposely kept the list of topics short. We believe that it is intimidating to
try to learn "all there is to know" from a single book. Most readers prefer smaller, more
focused books, and the best Unix books are all written that way.
So, instead of a single giant tome, we plan several volumes: one on Interprocess
Communication (IPC) and networking, and another on software development and
code portability. We also have an eye toward possible additional volumes in a Linux
Programming by Example series that will cover topics such as thread programming and
GUI programming.
The APIs we cover include both system calls and library functions. Indeed, at the C
level, both appear as simple function calls. A system call is a direct request for system
services, such as reading or writing a file or creating a process. A library function, on the
other hand, runs at the user level, possibly never requesting any services from the operating
system. System calls are documented in section 2 of the reference manual (viewable
online with the man command), and library functions are documented in section 3.
Our goal is to teach you the use of the Linux APIs by example: in particular, through
the use, wherever possible, of both original Unix source code and the GNU utilities.
Unfortunately, there aren't as many self-contained examples as we thought there'd be.
Thus, we have written numerous small demonstration programs as well. We stress
programming principles: especially those aspects of GNU programming, such as "no
arbitrary limits," that make the GNU utilities into exceptional programs.
The choice of everyday programs to study is deliberate. If you've been using
GNU/Linux for any length of time, you already understand what programs such as ls
and cp do; it then becomes easy to dive straight into how the programs work, without
having to spend a lot of time learning what they do.
Occasionally, we present both higher-level and lower-level ways of doing things.
Usually the higher-level standard interface is implemented in terms of the lower-level
interface or construct. We hope that such views of what's "under the hood" will help
you understand how things work; for all the code you write, you should always use the
higher-level, standard interface.
Similarly, we sometimes introduce functions that provide certain functionality and
then recommend (with a provided reason) that these functions be avoided! The primary
reason for this approach is so that you'll be able to recognize these functions when you
see them and thus understand the code using them. A well-rounded knowledge of a
topic requires understanding not just what you can do, but what you should and should
not do.
Finally, each chapter concludes with exercises. Some involve modifying or writing
code. Others are more in the category of "thought experiments" or "why do you
think ..." We recommend that you do all of them; they will help cement your
understanding of the material.
Initially, we planned to teach the Linux API by using the code from the GNU utilities.
However, the modern versions of even simple command-line programs (like mv and
cp) are large and many-featured. This is particularly true of the GNU variants of the
standard utilities, which allow long and short options, do everything required by POSIX,
and often have additional, seemingly unrelated options as well (like output highlighting).
It then becomes reasonable to ask, "Given such a large and confusing forest, how
can we focus on the one or two important trees?" In other words, if we present the
current full-featured program, will it be possible to see the underlying core operation
of the program?
That is when Hoare's law¹ inspired us to look to the original Unix programs for example
code. The original V7 Unix utilities are small and straightforward, making it
easy to see what's going on and to understand how the system calls are used. (V7 was
released around 1979; it is the common ancestor of all modern Unix systems, including
GNU/Linux and the BSD systems.)
For many years, Unix source code was protected by copyrights and trade secret license
agreements, making it difficult to use for study and impossible to publish. This is still
true of all commercial Unix source code. However, in 2002, Caldera (currently operating
as SCO) made the original Unix code (through V7 and 32V Unix) available under an
Open Source style license (see Appendix B, "Caldera Ancient UNIX License," page 655).
This makes it possible for us to include the code from the early Unix system in this book.
1 This famous statement was made at The International Workshop on Efficient Production of Large Programs in
Jablonna, Poland, August 10-14, 1970.
Standards
Throughout the book we refer to several different formal standards. A standard is a
document describing how something works. Formal standards exist for many things,
for example, the shape, placement, and meaning of the holes in the electrical outlet in
your wall are defined by a formal standard so that all the power cords in your country
work in all the outlets.
So, too, formal standards for computing systems define how they are supposed to
work; this enables developers and users to know what to expect from their software and
enables them to complain to their vendor when software doesn't work.
Of interest to us here are:
Although language standards aren't exciting reading, you may wish to consider pur-
chasing a copy of the C standard: It provides the final definition of the language. Copies
can be purchased from ANSI 2 and from ISO.3 (The PDF version of the C standard is
quite affordable.)
The POSIX standard can be ordered from The Open Group.4 By working through
their publications catalog to the items listed under "CAE Specifications," you can find
individual pages for each part of the standard (named "C031" through "C034"). Each
one's page provides free access to the online HTML version of the particular volume.
The POSIX standard is intended for implementation on both Unix and Unix-like
systems, as well as non-Unix systems. Thus, the base functionality it provides is a subset
of what Unix systems have. However, the POSIX standard also defines optional exten-
sions-additional functionality, for example, for threads or real-time support. Of most
importance to us is the X/Open System Interface (XSI) extension, which describes facilities
from historical Unix systems.
Throughout the book, we mark each API as to its availability: ISO C, POSIX, XSI,
GLIBC only, or nonstandard but commonly available.
By using GNU programs, we want to meet both goals: show you well-written,
modern code from which you will learn how to write good code and how to use the
APIs well.
We believe that GNU software is better because it is free (in the sense of "freedom,"
not "free beer"). But it's also recognized that GNU software is often technically better
than the corresponding Unix counterparts, and we devote space in Section 1.4, "Why
GNU Programs Are Better," page 14, to explaining why.
A number of the GNU code examples come from gawk (GNU awk). The main
reason is that it's a program with which we're very familiar, and therefore it was easy
to pick examples from it. We don't otherwise make any special claims about it.
Summary of Chapters
Driving a car is a holistic process that involves multiple simultaneous tasks. In many
ways, Linux programming is similar, requiring understanding of multiple aspects
of the API, such as file I/O, file metadata, directories, storage of time information,
and so on.
The first part of the book looks at enough of these individual items to enable studying
the first significant program, the V7 ls. Then we complete the discussion of files and
users by looking at file hierarchies and the way filesystems work and are used.
Chapter 1, "Introduction," page 3,
describes the Unix and Linux file and process models, looks at the differences
between Original C and 1990 Standard C, and provides an overview of the principles
that make GNU programs generally better than standard Unix programs.
Chapter 2, "Arguments, Options, and the Environment," page 23,
describes how a C program accesses and processes command-line arguments and
options and explains how to work with the environment.
Chapter 3, "User-Level Memory Management," page 51,
provides an overview of the different kinds of memory in use and available in a
running process. User-level memory management is central to every nontrivial
application, so it's important to understand it early on.
The second part of the book deals with process creation and management, interprocess
communication with pipes and signals, user and group IDs, and additional general
programming interfaces. Next, the book first describes internationalization with GNU
gettext and then several advanced APIs.
We round the book off with a chapter on debugging, since (almost) no one gets
things right the first time, and we suggest a final project to cement your knowledge of
the APIs covered in this book.
Several appendices cover topics of interest, including the licenses for the source code
used in this book.
Appendix A, "Teach Yourself Programming in Ten Years," page 649,
invokes the famous saying, "Rome wasn't built in a day." So too, Linux/Unix ex-
pertise and understanding only come with time and practice. To that end, we
have included this essay by Peter Norvig which we highly recommend.
Appendix B, "Caldera Ancient UNIX License," page 655,
covers the Unix source code used in this book.
Appendix C, "GNU General Public License," page 657,
covers the GNU source code used in this book.
Typographical Conventions
Like all books on computer-related topics, we use certain typographical conventions
to convey information. Definitions or first uses of terms appear in italics, like the word
"Definitions" at the beginning of this sentence. Italics are also used for emphasis, for
citations of other works, and for commentary in examples. Variable items such as
arguments or filenames, appear like this. Occasionally, we use a bold font when a point
needs to be made strongly.
Things that exist on a computer are in a constant-width font, such as filenames
(foo.c) and command names (ls, grep). Short snippets that you type are additionally
enclosed in single quotes: 'ls -l *.c'.
$ and > are the Bourne shell primary and secondary prompts and are used to display
interactive examples. User input appears in a different font from regular computer
output in examples. Examples look like this:
$ ls -1                         Look at files. Option is digit 1, not letter l
foo
bar
baz
We prefer the Bourne shell and its variants (ksh93, Bash) over the C shell; thus, all
our examples show only the Bourne shell. Be aware that quoting and line-continuation
rules are different in the C shell; if you use it, you're on your own! 6
When referring to functions in programs, we append an empty pair of parentheses
to the function's name: printf(), strcpy(). When referring to a manual page
(accessible with the man command), we follow the standard Unix convention of writing the
command or function name in italics and the section in parentheses after it, in regular
type: awk(1), printf(3).
6 See the csh(1) and tcsh(1) man pages and the book Using csh & tcsh, by Paul DuBois, O'Reilly & Associates,
Sebastopol, CA, USA, 1995. ISBN: 1-56592-132-1.
Unix Code
Archives of various "ancient" versions of Unix are maintained by The UNIX Heritage
Society (TUHS), http://www.tuhs.org.
Of most interest is that it is possible to browse the archive of old Unix source code
on the Web. Start with http://minnie.tuhs.org/UnixTree/. All the example code
in this book is from the Seventh Edition Research UNIX System, also known as "V7."
The TUHS site is physically located in Australia, although there are mirrors of the
archive around the world; see http://www.tuhs.org/archive_sites.html.
This page also indicates that the archive is available for mirroring with rsync.
(See http://rsync.samba.org/ if you don't have rsync: It's standard on
GNU/Linux systems.)
You will need about 2-3 gigabytes of disk to copy the entire archive. To copy the
archive, create an empty directory, and in it, run the following commands:
mkdir Applications 4BSD PDP-11 PDP-11/Trees VAX Other
You may wish to omit copying the Trees directory, which contains extractions of
several versions of Unix, and occupies around 700 megabytes of disk.
You may also wish to consult the TUHS mailing list to see if anyone near you can
provide copies of the archive on CD-ROM, to avoid transferring so much data over
the Internet.
The folks at Southern Storm Software, Pty. Ltd., in Australia, have "modernized" a
portion of the V7 user-level code so that it can be compiled and run on current systems,
most notably GNU/Linux. This code can be downloaded from their web site. 7
It's interesting to note that V7 code does not contain any copyright or permission
notices in it. The authors wrote the code primarily for themselves and their research,
leaving the permission issues to AT&T's corporate licensing department.
GNU Code
If you're using GNU/Linux, then your distribution will have come with source code,
presumably in whatever packaging format it uses (Red Hat RPM files, Debian DEB
files, Slackware .tar.gz files, etc.). Many of the examples in the book are from the
GNU Coreutils, version 5.0. Find the appropriate CD-ROM for your GNU/Linux
distribution, and use the appropriate tool to extract the code. Or follow the instructions
in the next few paragraphs to retrieve the code.
If you prefer to retrieve the files yourself from the GNU ftp site, you will find them
at ftp://ftp.gnu.org/gnu/coreutils/coreutils-5.0.tar.gz.
You can use the wget utility to retrieve the file:
$ wget ftp://ftp.gnu.org/gnu/coreutils/coreutils-5.0.tar.gz     Retrieve the distribution
... lots of output here as file is retrieved ...
Alternatively, you can use good old-fashioned ftp to retrieve the file:
$ ftp ftp.gnu.org                               Connect to GNU ftp site
Connected to ftp.gnu.org (199.232.41.7).
220 GNU FTP server ready.
Name (ftp.gnu.org:arnold): anonymous            Use anonymous ftp
331 Please specify the password.
Password:                                       Password does not echo on screen
230-If you have any problems with the GNU software or its downloading,
230-please refer your questions to <gnu@gnu.org>.
Lots of verbiage deleted
230 Login successful. Have fun.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> cd /gnu/coreutils                          Change to Coreutils directory
250 Directory successfully changed.
ftp> bin
200 Switching to Binary mode.
ftp> hash                                       Print # signs as progress indicators
Hash mark printing on (1024 bytes/hash mark).
ftp> get coreutils-5.0.tar.gz                   Retrieve file
local: coreutils-5.0.tar.gz remote: coreutils-5.0.tar.gz
227 Entering Passive Mode (199,232,41,7,86,107)
150 Opening BINARY mode data connection for coreutils-5.0.tar.gz (6020616 bytes)
#################################################################################
#################################################################################
In compliance with the GNU General Public License, here is the Copyright
information for all GNU programs quoted in this book. All the programs are "free software;
you can redistribute it and/or modify it under the terms of the GNU General Public
License as published by the Free Software Foundation; either version 2 of the License,
or (at your option) any later version." See Appendix C, "GNU General Public License,"
page 657, for the text of the GNU General Public License.
Coreutils 5.0 File    Copyright dates
lib/safe-read.c       Copyright © 1993-1994, 1998, 2002
lib/safe-write.c      Copyright © 2002
lib/utime.c           Copyright © 1998, 2001-2002
lib/xreadlink.c       Copyright © 2001
src/du.c              Copyright © 1988-1991, 1995-2003
src/env.c             Copyright © 1986, 1991-2003
src/install.c         Copyright © 1989-1991, 1995-2002
src/link.c            Copyright © 2001-2002
src/ls.c              Copyright © 1985, 1988, 1990, 1991, 1995-2003
src/pathchk.c         Copyright © 1991-2003
src/sort.c            Copyright © 1988, 1991-2002
src/sys2.h            Copyright © 1997-2003
src/wc.c              Copyright © 1985, 1991, 1995-2002
In the hands of a Jedi Knight, a light saber is both a powerful weapon and a thing
of beauty. Its use demonstrates the power, knowledge, control of the Force, and arduous
training of the Jedi who wields it.
The elegance of the light saber mirrors the elegance of the original Unix API design.
There, too, the studied, precise use of the APIs and the Software Tools and GNU design
principles lead to today's powerful, flexible, capable GNU/Linux system. This system
demonstrates the knowledge and understanding of the programmers who wrote all its
components.
And, of course, light sabers are just way cool!
Acknowledgments
Writing a book is lots of work, and doing it well requires help from many people.
Dr. Brian W. Kernighan, Dr. Doug McIlroy, Peter Memishian, and Peter van der
Linden reviewed the initial book proposal. David J. Agans, Fred Fish, Don Marti, Jim
Meyering, Peter Norvig, and Julian Seward provided reprint permission for various
items quoted throughout the book. Thanks to Geoff Collyer, Ulrich Drepper, Yosef
Gold, Dr. C.A.R. (Tony) Hoare, Dr. Manny Lehman, Jim Meyering, Dr. Dennis M.
Ritchie, Julian Seward, Henry Spencer, and Dr. Wladyslaw M. Turski, who provided
much useful general information. Thanks also to the other members of the GNITS
gang: Karl Berry, Akim DeMaille, Ulrich Drepper, Greg McGary, Jim Meyering,
François Pinard, and Tom Tromey, who all provided helpful feedback about good
programming practice. Karl Berry, Alper Ersoy, and Dr. Nelson H.F. Beebe provided
valuable technical help with the Texinfo and DocBook/XML toolchains.
Good technical reviewers not only make sure that an author gets his facts right, they
also ensure that he thinks carefully about his presentation. Dr. Nelson H.F. Beebe,
Geoff Collyer, Russ Cox, Ulrich Drepper, Randy Lechlitner, Dr. Brian W. Kernighan,
Peter Memishian, Jim Meyering, Chet Ramey, and Louis Taber acted as technical re-
viewers for the entire book. Dr. Michael Brennan provided helpful comments on
Chapter 15. Both the prose and many of the example programs benefited from their
reviews. I hereby thank all of them. As most authors usually say here, "Any remaining
errors are mine."
I would especially like to thank Mark Taub of Pearson Education for initiating this
project, for his enthusiasm for the series, and for his help and advice as the book moved
through its various stages. Anthony Gemmellaro did a phenomenal job of realizing my
concept for the cover, and Gail Cocker's interior design is beautiful. Faye Gemmellaro
made the production process enjoyable, instead of a chore. Dmitry Kirsanov and
Alina Kirsanova did the figures, page layout, and indexing; they were a pleasure to
work with.
Finally, my deepest gratitude and love to my wife, Miriam, for her support and en-
couragement during the book's writing.
Arnold Robbins
Nof Ayalon
ISRAEL
Chapter 1 Introduction page 3
• 1.7 Summary page 21
If there is one phrase that summarizes the primary GNU/Linux (and therefore
Unix) concepts, it's "files and processes." In this chapter we review the Linux
file and process models. These are important to understand because the system calls
are almost all concerned with modifying some attribute or part of the state of a file
or a process.
Next, because we'll be examining code in both styles, we briefly review the major
difference between 1990 Standard C and Original C. Finally, we discuss at some
length what makes GNU programs "better," programming principles that we'll see
in use in the code.
This chapter contains a number of intentional simplifications. The full details are
covered as we progress through the book. If you're already a Linux wizard, please
forgive us.
1.1 The Linux/Unix File Model
record sizes, no indexed files, nothing. The interpretation of file contents is entirely up
to the application. (This isn't quite true, as we'll see shortly, but it's close enough for
a start.)
Once you have a file, you can do three things with the file's data: read them, write
them, or execute them.
Unix was designed for time-sharing minicomputers; this implies a multiuser
environment from the get-go. Once there are multiple users, it must be possible to specify a
file's permissions: Perhaps user jane is user fred's boss, and jane doesn't want fred
to read the latest performance evaluations.
For file permission purposes, users are classified into three distinct categories: user:
the owner of a file; group: the group of users associated with this file (discussed shortly);
and other: anybody else. For each of these categories, every file has separate read, write,
and execute permission bits associated with it, yielding a total of nine permission bits.
This shows up in the first field of the output of 'ls -l':
$ ls -l progex.texi
-rw-r--r--    1 arnold   devel        5614 Feb 24 18:02 progex.texi
Here, arnold and devel are the owner and group of progex.texi, and -rw-r--r--
are the file type and permissions. The first character is a dash for regular file, a d for
directories, or one of a small set of other characters for other kinds of files that aren't
important at the moment. Each subsequent group of three characters represents read,
write, and execute permission for the owner, group, and "other," respectively.
In this example, progex.texi is readable and writable by the owner, and readable
by the group and other. The dashes indicate absent permissions, thus the file is not
executable by anyone, nor is it writable by the group or other.
The owner and group of a file are stored as numeric values known as the user ID
(UID) and group ID (GID); standard library functions that we present later in the book
make it possible to print the values as human-readable names.
A file's owner can change the permission by using the chmod (change mode)
command. (As such, file permissions are sometimes referred to as the "file mode.")
A file's group can be changed with the chgrp (change group) and chown (change
owner) commands. 1
1 Some systems allow regular users to change the ownership on their files to someone else, thus "giving them away."
The details are standardized by POSIX but are a bit messy. Typical GNU/Linux configurations do not allow it.
Group permissions were intended to support cooperative work: Although one person
in a group or department may own a particular file, perhaps everyone in that group
needs to be able to modify it. (Consider a collaborative marketing paper or data from
a survey.)
When the system goes to check a file access (usually upon opening a file), if the UID
of the process matches that of the file, the owner permissions apply. If those permissions
deny the operation (say, a write to a file with -r--rw-rw- permissions), the operation
fails; Unix and Linux do not proceed to test the group and other permissions. 2 The
same is true if the UID is different but the GID matches; if the group permissions deny
the operation, it fails.
Unix and Linux support the notion of a superuser: a user with special privileges. This
user is known as root and has the UID of 0. root is allowed to do anything; all bets
are off, all doors are open, all drawers unlocked. 3 (This can have significant security
implications, which we touch on throughout the book but do not cover exhaustively.)
Thus, even if a file is mode ----------, root can still read and write the file. (One
exception is that the file can't be executed. But as root can add execute permission,
the restriction doesn't prevent anything.)
The user/group/other, read/write/execute permissions model is simple, yet flexible
enough to cover most situations. Other, more powerful but more complicated, models
exist and are implemented on different systems, but none of them are well enough
standardized and broadly enough implemented to be worth discussing in a general-
purpose text like this one.
2 The owner can always change the permission, of course. Most users don't disable write permission for themselves.
3 There are some rare exceptions to this rule, all of which are beyond the scope of this book.
page 83. They are also special in that the operating system dictates the format of
directory entries.
Filenames may contain any valid 8-bit byte except the / (forward slash) character
and ASCII NUL, the character whose bits are all zero. Early Unix systems limited file-
names to 14 bytes; modern systems allow individual filenames to be up to 255 bytes.
The inode contains all the information about a file except its name: the type, owner,
group, permissions, size, modification and access times. It also stores the locations on
disk of the blocks containing the file's data. All of these are data about the file, not the
file's data itself, thus the term metadata.
Directory permissions have a slightly different meaning from those for file permissions.
Read permission means the ability to search the directory; that is, to look through it to
see what files it contains. Write permission is the ability to create and remove files in
the directory. Execute permission is the ability to go through a directory when opening
or otherwise accessing a contained file or subdirectory.
drwxrwxrwt   11 root     root         4096 May 15 17:11 /tmp
Note the t is the last position of the first field. On most directories this position
has an x in it. With the sticky bit set, only you, as the file's owner, or root may
remove your files. (We discuss this in more detail in Section 11.5.2, "Directories
and the Sticky Bit," page 414.)
Although the kernel will only run a file laid out in the proper format, it is up to user-
level utilities to create these files. The compiler for a programming language (such as
Ada, Fortran, C , or C++) creates object files, and then a linker or loader (usually named
ld) binds the object files with library routines to create the final executable. Note that
even if a file has all the right bits in all the right places, the kernel won't run it if the
appropriate execute permission bit isn't turned on (or at least one execute bit for root).
Because the compiler, assembler, and loader are user-level tools, it's (relatively) easy
to change object file formats as needs develop over time; it's only necessary to "teach"
the kernel about the new format and then it can be used. The part that loads executables
is relatively small and this isn't an impossible task. Thus, Unix file formats have evolved
over time. The original format was known as a.out (Assembler OUTput). The next
format, still used on some commercial systems, is known as COFF (Common Object
File Format), and the current, most widely used format is ELF (Executable and Linking
Format). Modern GNU/Linux systems use ELF.
The kernel recognizes that an executable file contains binary object code by looking
at the first few bytes of the file for special magic numbers. These are sequences of two
or four bytes that the kernel recognizes as being special. For backwards compatibility,
modern Unix systems recognize multiple formats. ELF files begin with the four characters
"\177ELF".
Besides binary executables, the kernel also supports executable scripts. Such a file also
begins with a magic number: in this case, the two regular characters #!. A script is a
program executed by an interpreter, such as the shell, awk, Perl, Python, or Tcl. The
#! line provides the full path to the interpreter and, optionally, one single argument:
#! /bin/awk -f
Let's assume the above contents are in a file named hello.awk and that the file is
executable. When you type 'hello.awk', the kernel runs the program as if you had
typed '/bin/awk -f hello.awk'. Any additional command-line arguments are also
passed on to the program. In this case, awk runs the program and prints the universally
known hello, world message.
The #! mechanism is an elegant way of hiding the distinction between binary
executables and script executables. If hello.awk is renamed to just hello, the user typing
'hello' can't tell (and indeed shouldn't have to know) that hello isn't a binary
executable program.
1.1.4 Devices
One of Unix's most notable innovations was the unification of file I/O and device
I/O. 4 Devices appear as files in the filesystem, regular permissions apply to their access,
and the same I/O system calls are used for opening, reading, writing, and closing them.
All of the "magic" to make devices look like files is hidden in the kernel. This is just
another aspect of the driving simplicity principle in action: We might phrase it as no
special cases for user code.
Two devices appear frequently in everyday use, particularly at the shell level:
/dev/null and /dev/tty.
/dev/null is the "bit bucket." All data sent to /dev/null is discarded by the
operating system, and attempts to read from it always return end-of-file (EOF) immediately.
/dev/tty is the process's current controlling terminal, the one to which it listens
when a user types the interrupt character (typically CTRL-C) or performs job control
(CTRL-Z).
GNU/Linux systems, and many modern Unix systems, supply /dev/stdin,
/dev/stdout, and /dev/stderr devices, which provide a way to name the open files
each process inherits upon startup.
Other devices represent real hardware, such as tape and disk drives, CD-ROM drives,
and serial ports. There are also software devices, such as pseudo-ttys, that are used for
networking logins and windowing systems. /dev/console represents the system console,
a particular hardware device on minicomputers. On modern computers, /dev/console
is the screen and keyboard, but it could be a serial port.
4 This feature first appeared in Multics, but Multics was never widely used.
Unfortunately, device-naming conventions are not standardized, and each operating
system has different names for tapes, disks, and so on. (Fortunately, that's not an issue
for what we cover in this book.) Devices have either a b or c in the first character of
'ls -l' output:
$ ls -l /dev/tty /dev/hda
brw-rw----    1 root     disk       3,   0 Aug 31 02:31 /dev/hda
crw-rw-rw-    1 root     root       5,   0 Feb 26 08:44 /dev/tty
The initial b represents block devices, and a c represents character devices. Device files
are discussed further in Section 5.4, "Obtaining Information about Files," page 139.
When the main() function begins execution, all of these things have already been
put in place for the running program. System calls are available to query and change
each of the above items; covering them is the purpose of this book.
New processes are always created by an existing process. The existing process is termed
the parent, and the new process is termed the child. Upon booting, the kernel handcrafts
the first, primordial process, which runs the program /sbin/init; it has process ID
1 and serves several administrative functions. All other processes are descendants of
init. (init's parent is the kernel, often listed as process ID 0.)
5 Processes can be suspended, in which case they are not "running"; however, neither are they terminated. In any
case, in the early stages of the climb up the learning curve, it pays not to be too pedantic.
The child-to-parent relationship is one-to-one; each process has only one parent,
and thus it's easy to find out the PID of the parent. The parent-to-child relationship
is one-to-many; any given process can create a potentially unlimited number of children.
Thus, there is no easy way for a process to find out the PIDs of all its children. (In
practice, it's not necessary, anyway.) A parent process can arrange to be notified when
a child process terminates ("dies"), and it can also explicitly wait for such an event.
Each process's address space (memory) is separate from that of every other. Unless
two processes have made explicit arrangement to share memory, one process cannot
affect the address space of another. This is important; it provides a basic level of security
and system reliability. (For efficiency, the system arranges to share the read-only
executable code of the same program among all the processes running that program. This
is transparent to the user and to the running program.)
The current working directory is the one to which relative pathnames (those that
don't start with a /) are relative. This is the directory you are "in" whenever you issue
a 'cd someplace' command to the shell.
By convention, all programs start out with three files already open: standard input,
standard output, and standard error. These are where input comes from, output goes
to, and error messages go to, respectively. In the course of this book, we will see how
these are put in place. A parent process can open additional files and have them already
available for a child process; the child will have to know they're there, either by way of
some convention or by a command-line argument or environment variable.
The environment is a set of strings, each of the form 'name=value'. Functions exist
for querying and setting environment variables, and child processes inherit the
environment of their parents. Typical environment variables are things like PATH and HOME in
the shell. Many programs look for the existence and value of specific environment
variables in order to control their behavior.
It is important to understand that a single process may execute multiple programs
during its lifetime. Unless explicitly changed, all of the other system-maintained
attributes (current directory, open files, PID, etc.) remain the same. The separation of
"starting a new process" from "choosing which program to run" is a key Unix innovation.
It makes many operations simple and straightforward. Other operating systems that
combine the two operations are less general and more complicated to use.
struct my_struct s, t;
int j;
In Original C, functions are declared without the argument list being specified:
extern int myfunc();            Returns int, arguments unknown
Furthermore, function definitions list the parameter names in the function header,
and then declare the parameters before the function body. Parameters of type int don't
have to be declared, and if a function returns int, that doesn't have to be declared either:
myfunc(a, b, c, d)              Return type is int
struct my_struct *a, *b;
double c;                       Note, no declaration of parameter d
{
Consider again the same erroneous function call: 'j = my_func(-1, -2, 0);'. In
Original C, the compiler has no way of knowing that you've (accidentally, we assume)
passed the wrong arguments to my_func(). Such erroneous calls generally lead to hard-
to-find runtime problems (such as segmentation faults, whereby the program dies), and
the Unix lint program was created to deal with these kinds of things.
So, although function prototypes were a radical departure from existing practice,
their additional type checking was deemed too important to be without, and they were
added to the language with little opposition.
In 1990 Standard C, code written in the original style, for both declarations and
definitions, is valid. This makes it possible to continue to compile millions of lines of
existing code with a standard-conforming compiler. New code, obviously, should be
written with prototypes because of the improved possibilities for compile-time
error checking.
1999 Standard C continues to allow original style declarations and definitions.
However, the "implicit int" rule was removed; functions must have a return type, and
all parameters must be declared.
Furthermore, when a program called a function that had not been formally declared,
Original C would create an implicit declaration for the function, giving it a return type
of int. 1990 Standard C did the same, additionally noting that it had no information
about the parameters. 1999 Standard C no longer provides this "auto-declare" feature.
Other notable additions in Standard C are the const keyword, also from C++, and
the volatile keyword, which the committee invented. For the code you'll see in this
book, understanding the different function declaration and definition syntaxes is the
most important thing.
For V7 code using original style definitions, we have added comments showing the
equivalent prototype. Otherwise, we have left the code alone, preferring to show it
exactly as it was originally written and as you'll see it if you download the code yourself.
Although 1999 C adds some additional keywords and features beyond the 1990
version, we have chosen to stick to the 1990 dialect, since C99 compilers are not yet
commonplace. Practically speaking, this doesn't matter: C89 code should compile and
run without change when a C99 compiler is used, and the new C99 features don't affect
our discussion or use of the fundamental Linux/Unix APIs.
6 This section is adapted from an article by the author that appeared in Issue 16 of Linux Journal. (See
http://www.linuxjournal.com/article.php?sid=1135.) Reprinted and adapted by permission.
1.4 Why GNU Programs Are Better
it's much better." GNU software is generally more robust, and performs better, than
standard Unix versions. In this section we look at some of the reasons why, and at the
document that describes the principles of GNU software design.
The GNU Coding Standards describes how to write software for the GNU
project. It covers a range of topics. You can read the GNU Coding Standards online at
http://www.gnu.org/prep/standards.html. See the online version for pointers
to the source files in other formats.
In this section, we describe only those parts of the GNU Coding Standards that relate
to program design and implementation.
7 This statement refers to the HURD kernel, which is still under development (as of early 2004). GCC and GNU
C Library (GLIBC) development take place mostly on Linux-based systems today.
pick. The GNU Coding Standards also makes this point. (Sometimes, there is no
detectable consistent coding style, in which case the program is probably overdue for a
trip through either GNU indent or Unix's cb.)
What we find important about the chapter on C coding is that the advice is good
for any C coding, not just if you happen to be working on a GNU program. So, if
you're just learning C or even if you've been working in C (or C++) for a while, we
recommend this chapter to you since it encapsulates many years of experience.
This rule is perhaps the single most important rule in GNU software design: no
arbitrary limits. All GNU utilities should be able to manage arbitrary amounts of data.
While this requirement perhaps makes it harder for the programmer, it makes things
much better for the user. At one point, we had a gawk user who regularly ran an awk
program on more than 650,000 files (no, that's not a typo) to gather statistics. gawk
would grow to over 192 megabytes of data space, and the program ran for around seven
CPU hours. He would not have been able to run his program using another awk
implementation. 8
Utilities reading files should not drop NUL characters, or any other nonprinting
characters including those with codes above 0177. The only sensible exceptions
would be utilities specifically intended for interface to certain types of
terminals or printers that can't handle those characters.
8 This situation occurred circa 1993; the truism is even more obvious today, as users process gigabytes of log files
with gawk.
It is also well known that Emacs can edit any arbitrary file, including files containing
binary data!
Whenever possible, try to make programs work properly with sequences of
bytes that represent multibyte characters, using encodings such as UTF-8
and others. 9
Check every system call for an error return, unless you know
you wish to ignore errors. Include the system error text (from perror or
equivalent) in every error message resulting from a failing system call, as well
as the name of the file if any and the name of the utility. Just "cannot open
foo.c" or "stat failed" is not sufficient.
Checking every system call provides robustness. This is another case in which life is
harder for the programmer but better for the user. An error message detailing what
exactly went wrong makes finding and solving any problems much easier. 10
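As a rough illustration of that advice, the following sketch reports the utility name, the file name, and the system error text when open() fails. The function name open_or_complain() and the exact message wording are our own assumptions; the GNU Coding Standards do not prescribe them.

```c
#define _POSIX_C_SOURCE 200809L     /* for open() under strict compilers */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Open a file read-only. On failure, print the utility name, the file
 * name, and the system's error text, then return -1. */
int open_or_complain(const char *progname, const char *filename)
{
    int fd = open(filename, O_RDONLY);

    if (fd < 0)
        fprintf(stderr, "%s: cannot open %s: %s\n",
                progname, filename, strerror(errno));
    return fd;
}
```

A caller would typically pass argv[0] as progname, so that a message such as "myprog: cannot open foo.c: No such file or directory" identifies both the tool and the cause.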
Finally, we quote from Chapter 1 of the GNU Coding Standards, which discusses
how to write your program differently from the way a Unix program may have
been written.
For example, Unix utilities were generally optimized to minimize memory
use; if you go for speed instead, your program will be very different. You
could keep the entire input file in core and scan it there instead of using
stdio. Use a smarter algorithm discovered more recently than the Unix pro-
gram. Eliminate use of temporary files. Do it in one pass instead of two (we
did this in the assembler).
Or, on the contrary, emphasize simplicity instead of speed. For some appli-
cations, the speed of today's computers makes simpler algorithms adequate.
Or go for generality. For example, Unix programs often have static tables or
fixed-size strings, which make for arbitrary limits; use dynamic allocation
instead. Make sure your program handles NULs and other funny characters
in the input files. Add a programming language for extensibility and write
part of the program in that language.
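The "use dynamic allocation instead" advice can be sketched as follows: a line reader that grows its buffer on demand rather than relying on a fixed-size array. The function name read_any_line() and the starting size of 16 are arbitrary choices made for this illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read one line of any length from fp. Returns a malloc'ed string
 * (without the newline) that the caller must free, or NULL at end of
 * file or if memory runs out. */
char *read_any_line(FILE *fp)
{
    size_t size = 16, len = 0;
    char *buf = malloc(size);
    int c;

    if (buf == NULL)
        return NULL;

    while ((c = getc(fp)) != EOF && c != '\n') {
        if (len + 1 >= size) {          /* keep room for the terminator */
            char *bigger = realloc(buf, size * 2);

            if (bigger == NULL) {
                free(buf);
                return NULL;
            }
            buf = bigger;
            size *= 2;
        }
        buf[len++] = (char) c;
    }
    if (c == EOF && len == 0) {         /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

Note that this sketch still treats the result as a C string; a version that is fully safe for embedded NUL characters would also have to return the length.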
9 Section 13.4, "Can You Spell That for Me, Please?", page 521, provides an overview of multibyte characters and
encodings.
10 The mechanics of checking for and reporting errors are discussed in Section 4.3, "Determining What Went
Wrong," page 86.
1.5 Portability Revisited
ISO standards for C and the 2003 standard for C++ since most Linux programming
is done in one of those two languages.
Also, the POSIX standard for library and system call interfaces, while large, has
broad industry support. Writing to POSIX greatly improves the chances of suc-
cessfully moving your code to other systems besides GNU/Linux. This standard
is quite readable; it distills decades of experience and good practice.
Pick the best interface for the job.
If a standard interface does what you need, use it in your code. Use Autoconf to
detect an unavailable interface, and supply a replacement version of it for deficient
systems. (For example, some older systems lack the memmove () function, which
is fairly easy to code by hand or to pull from the GLIBC library.)
Isolate portability problems behind new interfaces.
Sometimes, you may need to do operating-system-specific tasks that apply on
some systems but not on others. (For example, on some systems, each program
has to expand command-line wildcards instead of the shell doing it.) Create a new
interface that does nothing on systems that don't need it but does the correct thing
on systems that do.
Use Autoconf for configuration.
Avoid #ifdef if possible. If not, bury it in low-level library code. Use Autoconf
to do the checking for the tests to be performed with #ifdef.
This book is also a classic. It covers Original C as well as the 1990 and 1999
standards. Because it is current, it makes a valuable companion to The C
Programming Language. It covers many important items, such as
internationalization-related types and library functions, that aren't in the Kernighan and
Ritchie book.
3. Notes on Programming in C, by Rob Pike, February 21, 1989. Available
on the Web from many sites. Perhaps the most widely cited location is
http://www.lysator.liu.se/c/pikestyle.html. (Many other useful
articles are available from one level up: http://www.lysator.liu.se/c/.)
Rob Pike worked for many years at the Bell Labs research center where C and
Unix were invented and did pioneering development there. His notes distill
many years of experience into a "philosophy of clarity in programming" that
is well worth reading.
4. The various links at http://www.chris-lott.org/resources/cstyle/.
This site includes Rob Pike's notes and several articles by Henry Spencer. Of
particular note is the Recommended C Style and Coding Standards, originally
written at the Bell Labs Indian Hill site.
1.7 Summary
• "Files and processes" summarizes the Linux/Unix worldview. The treatment of
files as byte streams and devices as files, and the use of standard input, output,
and error, simplify program design and unify the data access model. The
permissions model is simple, yet flexible, applying to both files and directories.
• Processes are running programs that have user and group identifiers associated
with them for permission checking, as well as other attributes such as open files
and a current working directory.
• The most visible difference between Standard C and Original C is the use of
function prototypes for stricter type checking. A good C programmer should be
able to read Original-style code, since many existing programs use it. New code
should be written using prototypes.
• The GNU Coding Standards describe how to write GNU programs. They provide
numerous valuable techniques and guiding principles for producing robust, usable
software. The "no arbitrary limits" principle is perhaps the single most important
of these. This document is required reading for serious programmers.
• Making programs portable is a significant challenge. Guidelines and tools help,
but ultimately experience is needed too.
Exercises
Command-line option and argument interpretation is usually the first task of
any program. This chapter examines how C (and C++) programs access their
command-line arguments, describes standard routines for parsing options, and takes
a look at the environment.
Here, the user typed four "words." All four words are made available to the program
as its arguments.
The second definition is more informal: Arguments are all the words on the command
line except the command name. By default, Unix shells separate arguments from each
other with whitespace (spaces or TAB characters). Quoting allows arguments to include
whitespace:
$ echo here    are  lots of    spaces
here are lots of spaces                      The shell "eats" the spaces
$ echo "here    are  lots of    spaces"
here    are  lots of    spaces               Spaces are preserved
Quoting is transparent to the running program; echo never sees the double-quote
characters. (Double and single quotes are different in the shell; a discussion of the rules
is beyond the scope of this book, which focuses on C programming.)
Arguments can be further classified as options or operands. In the previous two
examples all the arguments were operands: files for ls and raw text for echo.
Options are special arguments that each program interprets. Options change a
program's behavior, or they provide information to the program. By ancient convention,
(almost) universally adhered to, options start with a dash (a.k.a. hyphen, minus sign)
and consist of a single letter. Option arguments are information needed by an option,
as opposed to regular operand arguments. For example, the fgrep program's -f option
means "use the contents of the following file as a list of strings to search for." See
Figure 2.1.
2.1 Option and Argument Conventions
FIGURE 2.1
Command-line components (labels: command name, option, option argument)
Thus, patfile is not a data file to search, but rather it's for use by fgrep in defining
the list of strings to search for.
1. Program names should have no less than two and no more than nine characters.
2. Program names should consist of only lowercase letters and digits.
3. Option names should be single alphanumeric characters. Multidigit options
should not be allowed. For vendors implementing the POSIX utilities, the -W
option is reserved for vendor-specific options.
4. All options should begin with a '-' character.
5. For options that don't require option arguments, it should be possible to group
multiple options after a single '-' character. (For example, 'foo -a -b -c'
and 'foo -abc' should be treated the same way.)
6. When an option does require an option argument, the argument should be
separated from the option by a space (for example, 'fgrep -f patfile').
The standard, however, does allow for historical practice, whereby sometimes
the option and the operand could be in the same string: 'fgrep -fpatfile'.
In practice, the getopt() and getopt_long() functions interpret '-fpatfile'
as '-f patfile', not as '-f -p -a -t ...'.
7. Option arguments should not be optional.
This means that when a program documents an option as requiring an option
argument, that option's argument must always be present or else the program
will fail. GNU getopt() does provide for optional option arguments since
they're occasionally useful.
8. If an option takes an argument that may have multiple values, the program
should receive that argument as a single string, with values separated by commas
or whitespace.
For example, suppose a hypothetical program myprog requires a list of users
for its -u option. Then, it should be invoked in one of these two ways:
myprog -u "arnold, joe, jane"          Separate with commas
myprog -u "arnold joe jane"            Separate with whitespace
In such a case, you're on your own for splitting out and processing each value
(that is, there is no standard routine), but doing so manually is usually
straightforward.
9. Options should come first on the command line, before operands. Unix versions
of getopt() enforce this convention. GNU getopt() does not by default,
although you can tell it to.
10. The special argument '--' indicates the end of all options. Any subsequent
arguments on the command line are treated as operands, even if they begin with
a dash.
11. The order in which options are given should not matter. However, for
mutually exclusive options, when one option overrides the setting of another, then
(so to speak) the last one wins. If an option that has arguments is repeated, the
program should process the arguments in order. For example, 'myprog -u
arnold -u jane' is the same as 'myprog -u "arnold, jane"'. (You have
to enforce this yourself; getopt() doesn't help you.)
12. It is OK for the order of operands to matter to a program. Each program should
document such things.
13. Programs that read or write named files should treat the single argument '-' as
meaning standard input or standard output, as is appropriate for the program.
Note that many standard programs don't follow all of the above conventions. The
primary reason is historical compatibility; many such programs predate the codifying
of these conventions.
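Convention 8 above leaves splitting a multivalued option argument to the program. One possible approach, shown here only as a sketch, uses strtok() with both comma and whitespace as delimiters. The function name split_users() and the caller-supplied array bound are assumptions made for illustration; a production program would grow the array dynamically to avoid an arbitrary limit.

```c
#include <string.h>

/* Split a list such as "arnold, joe, jane" or "arnold joe jane" in place.
 * Stores up to max pointers in names[] and returns how many were stored.
 * The list is modified: strtok() overwrites delimiters with '\0'. */
size_t split_users(char *list, char *names[], size_t max)
{
    size_t n = 0;
    char *tok;

    for (tok = strtok(list, ", \t"); tok != NULL && n < max;
         tok = strtok(NULL, ", \t"))
        names[n++] = tok;
    return n;
}
```

Because strtok() writes into its argument, the list must be a modifiable array (optarg normally qualifies), never a string constant.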
7. Programs can choose to allow long options to begin with a single dash. (This
is common with many X Window programs.)
the chapter.
The GNU Coding Standards devotes considerable space to listing all the long and
short options used by GNU programs. If you're writing a program that accepts long
options, see if option names already in use might make sense for you to use as well.
FIGURE 2.2
Memory for argv (labels: char **, char *)
By convention, argv[0] is the program's name. (For details, see Section 9.1.4.3,
"Program Names and argv[0]," page 297.) Subsequent entries are the command line
arguments. The final entry in the argv array is a NULL pointer.
2.2 Basic Command-Line Processing
argc indicates how many arguments there are; since C is zero-based, it is always true
that 'argv[argc] == NULL'. Because of this, particularly in Unix code, you will see
different ways of checking for the end of arguments, such as looping until a counter is
greater than or equal to argc, or until 'argv[i] == 0' or while '*argv != NULL' and
so on. These are all equivalent.
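Two of these equivalent idioms are sketched below. The function names are hypothetical, and both functions assume the usual case in which argv[0] is present (argc is at least 1).

```c
#include <stddef.h>

/* Count the arguments after argv[0] with a counter compared to argc. */
int count_by_index(int argc, char **argv)
{
    int i, n = 0;

    (void) argv;                /* only argc is needed for this style */
    for (i = 1; i < argc; i++)
        n++;
    return n;
}

/* Count them by walking the array until the terminating NULL pointer. */
int count_by_null(char **argv)
{
    int n = 0;

    for (argv++; *argv != NULL; argv++)
        n++;
    return n;
}
```

For any argument vector set up by the system, both functions return the same value, which is argc - 1.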
Only 23 lines! There are two points of interest. First, decrementing argc and
simultaneously incrementing argv (lines 12 and 13) are common ways of skipping initial
arguments. Second, the check for -n (line 10) is simplistic. -no-newline-at-the-end
also works. (Compile it and try it!)
Manual option parsing is common in V7 code because the getopt() function hadn't
been invented yet.
Finally, here and in other places throughout the book, we see use of the register
keyword. At one time, this keyword provided a hint to the compiler that the given
variables should be placed in CPU registers, if possible. Use of this keyword is obsolete;
modern compilers all base register assignment on analysis of the source code, ignoring
the register keyword. We've chosen to leave code using it alone, but you should be
aware that it has no real use anymore. 2
The arguments argc and argv are normally passed straight from those of main().
optstring is a string of option letters. If any letter in the string is followed by a colon,
then that option is expected to have an argument.
2 When we asked Jim Meyering, the Coreutils maintainer, about instances of register in the GNU Coreutils,
he gave us an interesting response. He removes them when modifying code, but otherwise leaves them alone to
make it easier to integrate changes submitted against existing versions.
2.3 Option Parsing: getopt() and getopt_long()
To use getopt(), call it repeatedly from a while loop until it returns -1. Each time
that it finds a valid option letter, it returns that letter. If the option takes an argument,
optarg is set to point to it. Consider a program that accepts a -a option that doesn't
take an argument and a -b option that does:
int oc;                 /* option character */
char *b_opt_arg;
default:
    /* error handling, see text */
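Since only fragments of that listing appear here, the following is a complete sketch of the same idea, under our own assumptions: the struct ab_opts type and the parse_ab() wrapper are hypothetical names, and optind is reset to 1 at the top only so that the function can be called more than once.

```c
#define _POSIX_C_SOURCE 200809L     /* for getopt() under strict compilers */
#include <stdio.h>
#include <unistd.h>

struct ab_opts {
    int a_seen;             /* nonzero if -a was given */
    char *b_opt_arg;        /* argument of -b, or NULL if -b was absent */
    int first_operand;      /* index in argv of the first operand */
};

/* Parse -a and -b ARG. Returns 0 on success, -1 on a usage error. */
int parse_ab(int argc, char **argv, struct ab_opts *opts)
{
    int oc;                 /* option character */

    opts->a_seen = 0;
    opts->b_opt_arg = NULL;
    optind = 1;             /* restart scanning from the first argument */

    while ((oc = getopt(argc, argv, "ab:")) != -1) {
        switch (oc) {
        case 'a':
            opts->a_seen = 1;
            break;
        case 'b':
            opts->b_opt_arg = optarg;
            break;
        default:            /* '?': getopt() already printed a message */
            return -1;
        }
    }
    opts->first_operand = optind;
    return 0;
}
```

For the command line 'prog -a -b value file', the sketch records -a, points b_opt_arg at "value", and leaves first_operand at the index of "file".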
char *optarg
The argument for an option, if the option accepts one.
int optind
The current index in argv. When the while loop has finished, remaining
operands are found in argv[optind] through argv[argc-1]. (Remember that
'argv[argc] == NULL'.)
int opterr
When this variable is nonzero (which it is by default), getopt() prints its own
error messages for invalid options and for missing option arguments.
int optopt
When an invalid option character is found, getopt() returns either a '?' or a
':' (see below), and optopt contains the invalid character that was found.
People being human, it is inevitable that programs will be invoked incorrectly, either
with an invalid option or with a missing option argument. In the normal case, getopt()
prints its own messages for these cases and returns the '?' character. However, you
can change its behavior in two ways.
First, by setting opterr to 0 before invoking getopt(), you can force getopt()
to remain silent when it finds a problem.
Second, if the first character in the optstring argument is a colon, then getopt()
is silent and it returns a different character depending upon the error, as follows:
Invalid option
getopt() returns a '?' and optopt contains the invalid option character. (This
is the normal behavior.)
Missing option argument
getopt() returns a ':'. If the first character of optstring is not a colon, then
getopt() returns a '?', making this case indistinguishable from the invalid
option case.
Thus, making the first character of optstring a colon is a good idea since it allows
you to distinguish between "invalid option" and "missing option argument." The cost
is that using the colon also silences getopt(), forcing you to supply your own error
messages. Here is the previous example, this time with error message handling:
int oc;                 /* option character */
char *b_opt_arg;
default:
    /* invalid option */
    fprintf(stderr, "%s: option '-%c' is invalid: ignored\n",
            argv[0], optopt);
    break;
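The surviving fragment shows only the invalid-option branch. A fuller sketch of the leading-colon technique might look like the following; the function name parse_quietly() and the PARSE_* return codes are assumptions introduced for this illustration, and unlike the fragment above, this sketch stops at the first error rather than ignoring it.

```c
#define _POSIX_C_SOURCE 200809L     /* for getopt() under strict compilers */
#include <stdio.h>
#include <unistd.h>

enum { PARSE_OK = 0, PARSE_BAD_OPTION = 1, PARSE_MISSING_ARG = 2 };

/* Parse -a and -b ARG, printing our own diagnostics. */
int parse_quietly(int argc, char **argv, char **b_opt_arg)
{
    int oc;                         /* option character */

    *b_opt_arg = NULL;
    optind = 1;                     /* restart scanning on each call */

    /* The leading ':' silences getopt() and enables the ':' return. */
    while ((oc = getopt(argc, argv, ":ab:")) != -1) {
        switch (oc) {
        case 'a':
            break;
        case 'b':
            *b_opt_arg = optarg;
            break;
        case ':':                   /* missing option argument */
            fprintf(stderr, "%s: option '-%c' requires an argument\n",
                    argv[0], optopt);
            return PARSE_MISSING_ARG;
        case '?':
        default:                    /* invalid option */
            fprintf(stderr, "%s: option '-%c' is invalid\n",
                    argv[0], optopt);
            return PARSE_BAD_OPTION;
        }
    }
    return PARSE_OK;
}
```

Without the leading colon, both error cases would arrive as '?' and the caller could not tell them apart.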
A word about flag or option variable-naming conventions: Much Unix code uses
names of the form xflg for any given option letter x (for example, nflg in the V7
echo; xflag is also common). This may be great for the program's author, who happens
to know what the x option does without having to check the documentation. But it's
unkind to someone else trying to read the code who doesn't know the meaning of all
the option letters by heart. It is much better to use names that convey the option's
meaning, such as no_newline for echo's -n option.
As for standard getopt(), if the first character of optstring is a ':', then GNU
getopt() distinguishes between "invalid option" and "missing option argument" by
returning '?' or ':', respectively. The ':' in optstring can be the second character
if the first character is '+' or '-'.
Finally, if an option letter in optstring is followed by two colon characters, then
that option is allowed to have an optional option argument. (Say that three times fast!)
Such an argument is deemed to be present if it's in the same argv element as the option,
and absent otherwise. In the case that it's absent, GNU getopt() returns the option
letter and sets optarg to NULL. For example, given
while ((c = getopt(argc, argv, "ab::")) != -1)
for -bYANKEES, the return value is 'b', and optarg points to "YANKEES", while
for -b or '-b YANKEES', the return value is still 'b' but optarg is set to NULL. In the
latter case, "YANKEES" is a separate command-line argument.
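The difference between the attached and the separated form can be checked with a small sketch like this one. It relies on the two-colon extension described above, so it assumes GNU getopt() (or a compatible implementation); parse_optional_b() is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L     /* for getopt() under strict compilers */
#include <stdio.h>
#include <unistd.h>

/* Returns 1 if -b was seen, 0 if not, -1 on a usage error. *value
 * receives the attached argument, or NULL when -b stood alone. */
int parse_optional_b(int argc, char **argv, char **value)
{
    int c, seen = 0;

    *value = NULL;
    optind = 1;                     /* restart scanning on each call */

    while ((c = getopt(argc, argv, "ab::")) != -1) {
        switch (c) {
        case 'a':
            break;
        case 'b':
            seen = 1;
            *value = optarg;        /* NULL unless written as -bVALUE */
            break;
        default:
            return -1;
        }
    }
    return seen;
}
```

Called with '-bYANKEES', the sketch stores a pointer to "YANKEES"; called with '-b YANKEES', it stores NULL and leaves "YANKEES" as an operand.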
int getopt_long_only(int argc, char *const argv[],
                     const char *optstring,
                     const struct option *longopts, int *longindex);
The first three arguments are the same as for getopt(). The next argument is a pointer
to an array of struct option, which we refer to as the long options table and which
is described shortly. The longindex parameter, if not set to NULL, points to a variable
which is filled in with the index in longopts of the long option that was found. This
is useful for error diagnostics, for example.
struct option {
    const char *name;
    int has_arg;
    int *flag;
    int val;
};
Each long option has a single entry with the values appropriately filled in. The last
element in the array should have zeros for all the values. The array need not be sorted;
getopt_long() does a linear search. However, sorting it by long name may make it
easier for a programmer to read.
TABLE 2.1
Values for has_arg
The use of flag and val seems confusing at first encounter. Let's step back for a
moment and examine why it works the way it does. Most of the time, option processing
consists of setting different flag variables when different option letters are seen, like so:
while ((c = getopt(argc, argv, ":af:hv")) != -1) {
    switch (c) {
    case 'a':
        do_all = 1;
        break;
    case 'f':
        myfile = optarg;
        break;
    case 'h':
        do_help = 1;
        break;
    case 'v':
        do_verbose = 1;
        break;
    /* Error handling code here */
    }
}
When flag is not NULL, getopt_long() sets the variable for you. This reduces the
three cases in the previous switch to one case. Here is an example long options table
and the code to go with it:
int do_all, do_help, do_verbose;    /* flag variables */
char *myfile;
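The table and loop themselves did not survive here, so what follows is a sketch of how such a table might look, not necessarily the original listing. The long option names ("all", "file", "help", "verbose") and the wrapper function parse_long() are assumptions chosen to match the flag variables declared above.

```c
#include <stdio.h>
#include <getopt.h>

int do_all, do_help, do_verbose;    /* flag variables */
char *myfile;

/* Entries with a non-NULL flag make getopt_long() store val through
 * the pointer and return 0; "file" returns 'f' like a short option. */
struct option longopts[] = {
    { "all",     no_argument,       &do_all,     1   },
    { "file",    required_argument, NULL,        'f' },
    { "help",    no_argument,       &do_help,    1   },
    { "verbose", no_argument,       &do_verbose, 1   },
    { NULL, 0, NULL, 0 }
};

/* Returns 0 on success, -1 on a usage error. */
int parse_long(int argc, char **argv)
{
    int c;

    do_all = do_help = do_verbose = 0;
    myfile = NULL;
    optind = 1;                     /* restart scanning on each call */

    while ((c = getopt_long(argc, argv, ":f:", longopts, NULL)) != -1) {
        switch (c) {
        case 'f':
            myfile = optarg;
            break;
        case 0:     /* getopt_long() set a variable, just keep going */
            break;
        default:    /* ':' or '?': missing argument or invalid option */
            return -1;
        }
    }
    return 0;
}
```

Only 'f' and 0 need cases in the switch; the three flag options are handled entirely by the table.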
Notice that the value passed for the optstring argument no longer contains 'a',
'h', or 'v'. This means that the corresponding short options are not accepted. To allow
both long and short options, you would have to restore the corresponding cases from
the first example to the switch.
Practically speaking, you should write your programs such that each short option
also has a corresponding long option. In this case, it's easiest to have flag be NULL and
val be the corresponding single letter.
With this change, -Wall is the same as --all and -Wfile=myfile is the same as
--file=myfile. The use of a semicolon makes it possible for a program to use -W as
a regular option, if desired. (For example, GCC uses it as a regular option, whereas
gawk uses it for POSIX conformance.)
TABLE 2.2
getopt_long() return values
Finally, we enhance the previous example code, showing the full switch statement:
int do_all, do_help, do_verbose;    /* flag variables */
char *myfile, *user;                /* input file, user name */
    { 0, 0, 0, 0 }
};
while ((c = getopt_long(argc, argv, ":ahvf:u::W;", longopts, NULL)) != -1) {
    switch (c) {
    case 'a':
        do_all = 1;
        break;
    case 'f':
        myfile = optarg;
        break;
    case 'h':
        do_help = 1;
        break;
    case 'u':
        if (optarg != NULL)
            user = optarg;
        else
            user = "root";
        break;
    case 'v':
        do_verbose = 1;
        break;
    case 0:     /* getopt_long() set a variable, just keep going */
        break;
#if 0
    case 1:
        /*
         * Use this case if getopt_long() should go through all
         * arguments. If so, add a leading '-' character to optstring.
         * Actual code, if any, goes here.
         */
        break;
#endif
    case ':':   /* missing option argument */
        fprintf(stderr, "%s: option '-%c' requires an argument\n",
                argv[0], optopt);
        break;
    case '?':
    default:    /* invalid option */
        fprintf(stderr, "%s: option '-%c' is invalid: ignored\n",
                argv[0], optopt);
        break;
    }
}
In your programs, you may wish to have comments for each option letter explaining
what each one does. However, if you've used descriptive variable names for each option
letter, comments are not as necessary. (Compare do_verbose to vflg.)
You may be wondering, "Gee, I already use GNU/Linux. Why should I include
getopt_long () in my executable, making it bigger, if the routine is already in the C
library?" That's a good question. However, there's nothing to worry about. The source
code is set up so that if it's compiled on a system that uses GLIBC, the compiled files
will not contain any code! Here's the proof, on our system:
$ uname -a                                   Show system name and type
Linux example 2.4.18-14 #1 Wed Sep 4 13:35:50 EDT 2002 i686 i686 i386 GNU/Linux
$ ls -l getopt.o getopt1.o                   Show file sizes
-rw-r--r--    1 arnold   devel        9836 Mar 24 13:55 getopt.o
-rw-r--r--    1 arnold   devel       10324 Mar 24 13:55 getopt1.o
$ size getopt.o getopt1.o                    Show sizes included in executable
   text    data     bss     dec     hex filename
      0       0       0       0       0 getopt.o
      0       0       0       0       0 getopt1.o
The size command prints the sizes of the various parts of a binary object or
executable file. We explain the output in Section 3.1, "Linux/Unix Address Space," page 52.
What's important to understand right now is that, despite the nonzero sizes of the files
themselves, they don't contribute anything to the final executable. (We think this is
pretty neat.)
Of course, the disadvantage to using environment variables is that they can silently
change a program's behavior. Jim Meyering, the maintainer of the Coreutils, put it
this way:
It makes it easy for the user to customize how the program works without
changing how the program is invoked. That can be both a blessing and a
curse. If you write a script that depends on your having a certain environment
variable set, but then have someone else use that same script, it may well fail
(or worse, silently produce invalid results) if that other person doesn't have
the same environment settings.
Occasionally, environment variables exist, but with empty values. In this case, the
return value will be non-NULL, but the first character pointed to will be the zero byte,
which is the C string terminator, '\0'. Your code should be careful to check that the
return value pointed to is not NULL. Even if it isn't NULL, also check that the string is
not empty if you intend to use its value for something. In any case, don't just blindly
use the returned value.
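The two checks just described can be wrapped in a small helper. This is only a sketch; env_or_default() is a hypothetical name, and treating an empty value the same as an unset one is a policy choice that may not suit every program.

```c
#include <stdlib.h>

/* Return the variable's value, or fallback if it is unset or empty. */
const char *env_or_default(const char *name, const char *fallback)
{
    const char *value = getenv(name);

    if (value == NULL)          /* not in the environment at all */
        return fallback;
    if (value[0] == '\0')       /* present, but with an empty value */
        return fallback;
    return value;
}
```

A call such as env_or_default("PAGER", "more") then always yields a usable, nonempty string.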
It's possible that a variable already exists in the environment. If the third argument
is true (nonzero), then the supplied value overwrites the previous one. Otherwise, it
doesn't. The return value is -1 if there was no memory for the new variable, and 0
otherwise. setenv() makes private copies of both the variable name and the new value
for storing in the environment.
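A typical use of the third argument is to supply a default without disturbing a value the user has already chosen. The sketch below assumes a hypothetical helper name, set_default_env().

```c
#define _POSIX_C_SOURCE 200809L     /* for setenv() under strict compilers */
#include <stdio.h>
#include <stdlib.h>

/* Give name a default value without disturbing a value already set.
 * Returns 0 on success, -1 if setenv() failed. */
int set_default_env(const char *name, const char *value)
{
    if (setenv(name, value, 0) != 0) {      /* 0: do not overwrite */
        perror("setenv");
        return -1;
    }
    return 0;
}
```

Passing 1 instead of 0 as the third argument would make the new value replace any existing one.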
A simpler alternative to setenv() is putenv(), which takes a single "name=value"
string and places it in the environment:
if (putenv("PATH=/bin:/usr/bin:/usr/ucb") != 0) {
    /* handle failure */
}
putenv() blindly replaces any previous value for the same variable. Also, and perhaps
more importantly, the string passed to putenv() is placed directly into the environment.
This means that if your code later modifies this string (for example, if it was an array,
not a string constant), the environment is modified also. This in turn means that you
should not use a local variable as the parameter for putenv(). For all these reasons
setenv() is preferred.
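The aliasing hazard is easy to demonstrate. In this sketch, the variable name LPBE_COLOR and both function names are made up for the illustration; the array has static storage precisely because putenv() keeps the pointer it is given.

```c
#define _XOPEN_SOURCE 700           /* for putenv() under strict compilers */
#include <stdlib.h>

/* Static storage: the string must outlive the call, because putenv()
 * stores this very pointer in the environment. */
static char color_setting[] = "LPBE_COLOR=red";

/* Place the array itself into the environment; returns 0 on success. */
int install_color(void)
{
    return putenv(color_setting);
}

/* Editing the array afterward silently edits the environment too. */
void repaint_color(void)
{
    color_setting[11] = 'b';        /* value "red" becomes "bed" */
}
```

After install_color(), getenv("LPBE_COLOR") returns "red"; after repaint_color(), the same call returns "bed" even though no environment function was called in between.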
scratch. If clearenv() is not available, the GNU/Linux clearenv(3) manpage
recommends using 'environ = NULL;' to accomplish the task.
int i;
return 0;
Although it's unlikely to happen, this program makes sure that environ isn't NULL
before attempting to use it.
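Because only fragments of that program appear here, the following sketch shows one way such a loop could be written. The function name print_environment() and its return value are our own additions; the explicit extern declaration is included because the standard headers do not declare environ unless GNU extensions are enabled.

```c
#include <stdio.h>

extern char **environ;      /* the program must declare this itself */

/* Print every "name=value" string; returns how many were printed. */
int print_environment(void)
{
    int i, count = 0;

    if (environ != NULL)    /* unlikely to be NULL, but check anyway */
        for (i = 0; environ[i] != NULL; i++) {
            printf("%s\n", environ[i]);
            count++;
        }
    return count;
}
```

Like argv, the environ array is terminated by a NULL pointer, so no separate count is needed.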
Variables are kept in the environment in random order. Although some Unix shells
keep the environment sorted by variable name, there is no formal requirement that this
be so, and many shells don't keep them sorted.
You can then use envp as you would have used environ. Although you may see this
occasionally in old code, we don't recommend its use; environ is the official, standard,
portable way to access the entire environment, should you need to do so.
$ env - PATH=/bin:/usr/bin myprog arg1           Clear environment, add PATH, run program
$ env -u IFS PATH=/bin:/usr/bin myprog arg1      Unset IFS, add PATH, run program
The code begins with a standard GNU copyright statement and explanatory comment.
We have omitted both for brevity. (The copyright statement is discussed in Appendix C,
"GNU General Public License," page 657. The --help output shown previously is
enough to understand how the program works.) Following the copyright and comments
are header includes and declarations. The 'N_("string")' macro invocation (line 93)
is for use in internationalization and localization of the software, topics covered in
Chapter 13, "Internationalization and Localization," page 485. For now, you can treat
it as if it were the contained string constant.
80  #include <config.h>
81  #include <stdio.h>
82  #include <getopt.h>
83  #include <sys/types.h>
84  #include <getopt.h>
85
86  #include "system.h"
87  #include "error.h"
88  #include "closeout.h"
89
90  /* The official name of this program (e.g., no 'g' prefix). */
91  #define PROGRAM_NAME "env"
92
93  #define AUTHORS N_ ("Richard Mlynarik and David MacKenzie")
94
95  int putenv ();
96
97  extern char **environ;
98
99  /* The name by which this program was run. */
100 char *program_name;
101
102 static struct option const longopts[] =
103 {
104   {"ignore-environment", no_argument, NULL, 'i'},
105   {"unset", required_argument, NULL, 'u'},
106   {GETOPT_HELP_OPTION_DECL},
107   {GETOPT_VERSION_OPTION_DECL},
108   {NULL, 0, NULL, 0}
109 };
The GNU Coreutils contain a large number of programs, many of which perform
the same common tasks (for example, argument parsing). To make maintenance easier,
many common idioms are defined as macros. GETOPT_HELP_OPTION_DECL and
GETOPT_VERSION_OPTION_DECL (lines 106 and 107) are two such. We examine their
definitions shortly. The first function, usage(), prints the usage information and exits.
The _("string") macro (line 115, and used throughout the program) is also for
internationalization, and for now you should also treat it as if it were the contained
string constant.
111 void
112 usage (int status)
113 {
114   if (status != 0)
115     fprintf (stderr, _("Try '%s --help' for more information.\n"),
116              program_name);
117   else
118     {
119       printf (_("\
120 Usage: %s [OPTION]... [-] [NAME=VALUE]... [COMMAND [ARG]...]\n"),
121               program_name);
122       fputs (_("\
123 Set each NAME to VALUE in the environment and run COMMAND.\n\
124 \n\
125   -i, --ignore-environment   start with an empty environment\n\
126   -u, --unset=NAME           remove variable from the environment\n\
127 "), stdout);
128       fputs (HELP_OPTION_DESCRIPTION, stdout);
129       fputs (VERSION_OPTION_DESCRIPTION, stdout);
130       fputs (_("\
131 \n\
132 A mere - implies -i.  If no COMMAND, print the resulting environment.\n\
133 "), stdout);
134       printf (_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
135     }
136   exit (status);
137 }
The first part of main() declares variables and sets up the internationalization. The
functions setlocale(), bindtextdomain(), and textdomain() (lines 147-149)
are all discussed in Chapter 13, "Internationalization and Localization," page 485. Note
that this program does use the envp argument to main() (line 140). It is the only one
of the Coreutils programs to do so. Finally, the call to atexit() on line 151 (see
Section 9.1.5.3, "Exiting Functions," page 302) registers a Coreutils library function that
flushes all pending output and closes stdout, reporting a message if there were problems.
The next bit processes the command-line arguments, using getopt_long().
139  int
140  main (register int argc, register char **argv, char **envp)
141  {
142    char *dummy_environ[1];
143    int optc;
144    int ignore_environment = 0;
145
146    program_name = argv[0];
147    setlocale (LC_ALL, "");
148    bindtextdomain (PACKAGE, LOCALEDIR);
149    textdomain (PACKAGE);
150
151    atexit (close_stdout);
152
153    while ((optc = getopt_long (argc, argv, "+iu:", longopts, NULL)) != -1)
154      {
155        switch (optc)
156          {
157          case 0:
158            break;
159          case 'i':
160            ignore_environment = 1;
161            break;
162          case 'u':
163            break;
164          case_GETOPT_HELP_CHAR;
165          case_GETOPT_VERSION_CHAR (PROGRAM_NAME, AUTHORS);
166          default:
167            usage (2);
168          }
169      }
170
171    if (optind != argc && !strcmp (argv[optind], "-"))
172      ignore_environment = 1;
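To experiment with the option loop on lines 153-169 outside of env, a stripped-down version can be useful. The following is only a sketch: the names demo_longopts and demo_parse are our own, the --help and --version macros are left out, and setting optind to 0 to restart scanning relies on GNU getopt behavior.

```c
#include <getopt.h>
#include <stddef.h>

/* Hypothetical option table modeled on env's: -i takes no argument,
   -u requires one. */
static const struct option demo_longopts[] = {
    {"ignore-environment", no_argument, NULL, 'i'},
    {"unset", required_argument, NULL, 'u'},
    {NULL, 0, NULL, 0}
};

/* Scan the options in argv.  Returns the index of the first non-option
   argument, or -1 if an unrecognized option was seen. */
int demo_parse(int argc, char **argv, int *ignore_env, int *unset_count)
{
    int optc;

    *ignore_env = 0;
    *unset_count = 0;
    optind = 0;     /* GNU getopt: force re-initialization */
    opterr = 0;     /* suppress getopt's own error messages */

    /* The leading '+' stops the scan at the first non-option argument. */
    while ((optc = getopt_long(argc, argv, "+iu:", demo_longopts, NULL)) != -1) {
        switch (optc) {
        case 'i':
            *ignore_env = 1;
            break;
        case 'u':
            (*unset_count)++;   /* optarg holds the variable name here */
            break;
        default:
            return -1;
        }
    }
    return optind;
}
```

Because of the leading '+', an argument such as NAME=value ends option processing, so anything after it is left alone for the command that env will run.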
Here are the macros, from src/sys2.h in the Coreutils distribution, that define
the declarations we saw earlier and the 'case_GETOPT_xxx' macros used above (lines
164-165):
/* Factor out some of the common --help and --version processing code.  */
/* These enum values cannot possibly conflict with the option values
   ordinarily used by commands, including CHAR_MAX + 1, etc.  Avoid
   CHAR_MIN - 1, as it may equal -1, the getopt end-of-options value.  */
enum
{
  GETOPT_HELP_CHAR = (CHAR_MIN - 2),
  GETOPT_VERSION_CHAR = (CHAR_MIN - 3)
};
The upshot of this code is that --help prints the usage message and --version
prints version information. Both exit successfully. ("Success" and "failure" exit statuses
are described in Section 9.1.5.1, "Defining Process Exit Status," page 300.) Given that
the Coreutils have dozens of utilities, it makes sense to factor out and standardize as
much repetitive code as possible.
Returning to env. c:
174    environ = dummy_environ;
175    environ[0] = NULL;
176
177    if (!ignore_environment)
178      for (; *envp; envp++)
179        putenv (*envp);
180
181    optind = 0;                 /* Force GNU getopt to re-initialize. */
182    while ((optc = getopt_long (argc, argv, "+iu:", longopts, NULL)) != -1)
183      if (optc == 'u')
184        putenv (optarg);          /* Requires GNU putenv. */
185
186    if (optind != argc && !strcmp (argv[optind], "-"))      Skip options
187      ++optind;
188
189    while (optind < argc && strchr (argv[optind], '='))     Set environment variables
190      putenv (argv[optind++]);
191
192    /* If no program is specified, print the environment and exit. */
193    if (optind == argc)
194      {
195        while (*environ)
196          puts (*environ++);
197        exit (EXIT_SUCCESS);
198      }
Lines 174-179 copy the existing environment into a fresh copy of the environment.
The global variable environ is set to point to an empty local array. The envp parameter
maintains access to the original environment.
Lines 181-184 remove any environment variables as requested by the -u option.
The program does this by rescanning the command line and removing names listed
there. Environment variable removal relies on the GNU putenv () behavior discussed
earlier: that when called with a plain variable name, putenv () removes the environment
variable.
After any options, new or replacement environment variables are supplied on the
command line. Lines 189-190 continue scanning the command line, looking for
environment variable settings of the form 'name=value'.
Upon reaching line 192, if nothing is left on the command line, env is supposed to
print the new environment, and exit. It does so (lines 195-197).
If arguments are left, they represent a command name to run and arguments to pass
to that new command. This is done with the execvp() system call (line 200), which
replaces the current program with the new one. (This call is discussed in Section 9.1.4,
"Starting New Programs: The exec() Family," page 293; don't worry about the details
for now.) If this call returns to the current program, it failed. In such a case, env prints
an error message and exits.
200    execvp (argv[optind], &argv[optind]);
201
202    {
203      int exit_status = (errno == ENOENT ? 127 : 126);
204      error (0, errno, "%s", argv[optind]);
205      exit (exit_status);
206    }
207  }
The exit status values, 126 and 127 (determined on line 203), conform to POSIX.
127 means the program that execvp() attempted to run didn't exist. (ENOENT means
the file doesn't have an entry in the directory.) 126 means that the file exists, but
something else went wrong.
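The mapping on line 203 is small enough to isolate and test on its own. This is a sketch only; the function name exec_failure_status is hypothetical and does not appear in env.c.

```c
#include <errno.h>

/* Choose an exit status after a failed exec, following the convention
   described above: 127 if the program was not found, 126 for any
   other failure (for example, a permissions problem). */
int exec_failure_status(int err)
{
    return (err == ENOENT) ? 127 : 126;
}
```

A caller would typically save errno immediately after execvp() returns and pass the saved value in, since later library calls may change errno.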
2.5 Summary
• C programs access their command-line arguments through the parameters argc
and argv. The getopt() function provides a standard way for consistent parsing
of options and their arguments. The GNU version of getopt() provides some
extensions, and getopt_long() and getopt_long_only() make it possible to
easily parse long-style options.
• The environment is a set of 'name=value' pairs that each program inherits from
its parent. Programs can, at their author's whim, use environment variables to
change their behavior, in addition to any command-line arguments. Standard
routines (getenv(), setenv(), putenv(), and unsetenv()) exist for retrieving
environment variable values, changing them, or removing them. If necessary, the
entire environment is available through the external variable environ or
through the char **envp third argument to main(). The latter technique is
discouraged.
Exercises
1. Assume a program accepts options -a, -b, and -c, and that -b requires an
argument. Write the manual argument parsing code for this program, without
using getopt() or getopt_long(). Accept -- to end option processing.
Make sure that -ac works, as do -bYANKEES, -b YANKEES, and -abYANKEES.
Test your program.
2. Implement getopt(). For the first version, don't worry about the case in which
'optstring[0] == ':''. You may also ignore opterr.
3. Add code for 'optstring[0] == ':'' and opterr to your version of
getopt().
4. Print and read the GNU getopt.h, getopt.c and getopt1.c files.
5. Write a program that declares both environ and envp and compares their
values.
6. Parsing command line arguments and options is a wheel that many people
can't refrain from reinventing. Besides getopt() and getopt_long(), you
may wish to examine different argument-parsing packages, such as:
• The Plan 9 From Bell Labs arg(2) argument-parsing library,4
• Argp,5
• Argv,6
• Autoopts,7
• GNU Gengetopt,8
• Opt,9
• Popt.10 See also the popt(3) manpage on a GNU/Linux system.
7. Extra credit: Why can't a C compiler completely ignore the register keyword?
Hint: What operation cannot be applied to a register variable?
Without memory for storing data, it's impossible for a program to get any
work done. (Or rather, it's impossible to get any useful work done.) Real-
world programs can't afford to rely on fixed-size buffers or arrays of data structures.
They have to be able to handle inputs of varying sizes, from small to large. This in
turn leads to the use of dynamically allocated memory--memory allocated at runtime
instead of at compile time. This is how the GNU "no arbitrary limits" principle is
put into action.
Because dynamically allocated memory is such a basic building block for real-world
programs, we cover it early, before looking at everything else there is to do. Our
discussion focuses exclusively on the user-level view of the process and its memory;
it has nothing to do with CPU architecture.
3.1 Linux/Unix Address Space
Zero-initialized data
Global and statically allocated data that are initialized to zero by default are kept
in what is colloquially called the BSS area of the process.1 Each process running
the same program has its own BSS area. When running, the BSS data are placed
in the data segment. In the executable file, they are stored in the BSS section.
The format of a Linux/Unix executable is such that only variables that are initialized
to a nonzero value occupy space in the executable's disk file. Thus, a large array
declared 'static char somebuf[2048];', which is automatically zero-filled,
does not take up 2 KB worth of disk space. (Some compilers have options that let
you place zero-initialized data into the data segment.)
Heap
The heap is where dynamic memory (obtained by malloc() and friends) comes
from. As memory is allocated on the heap, the process's address space grows, as
you can see by watching a running program with the ps command.
Although it is possible to give memory back to the system and shrink a process's
address space, this is almost never done. (We distinguish between releasing
no-longer-needed dynamic memory and shrinking the address space; this is discussed
in more detail later in this chapter.)
It is typical for the heap to "grow upward." This means that successive items that
are added to the heap are added at addresses that are numerically greater than
previous items. It is also typical for the heap to start immediately after the BSS
area of the data segment.
Stack
The stack segment is where local variables are allocated. Local variables are all
variables declared inside the opening left brace of a function body (or other left
brace) that aren't defined as static.
On most architectures, function parameters are also placed on the stack, as well
as "invisible" bookkeeping information generated by the compiler, such as room
for a function return value and storage for the return address representing the return
from a function to its caller. (Some architectures do all this with registers.)
1 BSS is an acronym for "Block Started by Symbol," a mnemonic from the IBM 7094 assembler.
It is the use of a stack for function parameters and return values that makes it
convenient to write recursive functions (functions that call themselves).
Variables stored on the stack "disappear" when the function containing them
returns; the space on the stack is reused for subsequent function calls.
On most modern architectures, the stack "grows downward," meaning that items
deeper in the call chain are at numerically lower addresses.
When a program is running, the initialized data, BSS, and heap areas are usually
placed into a single contiguous area: the data segment. The stack segment and code
segment are separate from the data segment and from each other. This is illustrated in
Figure 3.1.
[FIGURE 3.1: Linux/Unix process address space. From high address to low address:
program stack (stack segment); possible "hole" in address space; heap; globals and
static variables (data); executable code, shared (text segment).]
Although it's theoretically possible for the stack and heap to grow into each other,
the operating system prevents that event, and any program that tries to make it happen
is asking for trouble. This is particularly true on modern systems, on which process
address spaces are large and the gap between the top of the stack and the end of the
heap is a big one. The different memory areas can have different hardware memory
protection assigned to them. For example, the text segment might be marked "execute
only," whereas the data and stack segments would have execute permission disabled.
This practice can prevent certain kinds of security attacks. The details, of course, are
hardware and operating-system specific and likely to change over time. Of note is that
both Standard C and C++ allow const items to be placed in read-only memory. The
relationship among the different segments is summarized in Table 3.1.
TABLE 3.1
Executable program segments and their locations
The size program prints out the size in bytes of each of the text, data, and BSS
sections, along with the total size in decimal and hexadecimal. (The ch03-memaddr.c
program is shown later in this chapter; see Section 3.2.5, "Address Space Examination,"
page 78.)
$ cc -O ch03-memaddr.c -o ch03-memaddr       Compile the program
$ ls -l ch03-memaddr                         Show total size
-rwxr-xr-x    1 arnold   devel       12320 Nov 24 16:45 ch03-memaddr
$ size ch03-memaddr                          Show component sizes
   text    data     bss     dec     hex filename
   1458     276       8    1742     6ce ch03-memaddr
$ strip ch03-memaddr                         Remove symbols
$ ls -l ch03-memaddr                         Show total size again
-rwxr-xr-x    1 arnold   devel        3480 Nov 24 16:45 ch03-memaddr
$ size ch03-memaddr                          Component sizes haven't changed
   text    data     bss     dec     hex filename
   1458     276       8    1742     6ce ch03-memaddr
The total size of what gets loaded into memory is only 1742 bytes, in a file that is
12,320 bytes long. Most of that space is occupied by the symbols, a list of the program's
variables and function names. (The symbols are not loaded into memory when the
program runs.) The strip program removes the symbols from the object file. This can
save significant disk space for a large program, at the cost of making it impossible to
debug a core dump2 should one occur. (On modern systems this isn't worth the trouble;
don't use strip.) Even after removing the symbols, the file is still larger than what gets
loaded into memory since the object file format maintains additional data about the
program, such as what shared libraries it may use, if any.3
Finally, we'll mention that threads represent multiple threads of execution within a
single address space. Typically, each thread has its own stack, and a way to get thread
local data, that is, dynamically allocated data for private use by the thread. We don't
otherwise cover threads in this book, since they are an advanced topic.
2 A core dump is the memory image of a running process created when the process terminates unexpectedly. It may
be used later for debugging. Unix systems named the file core, and GNU/Linux systems use core.pid, where
pid is the process ID of the process that died.
3 The description here is a deliberate simplification. Running programs occupy much more space than the size
program indicates, since shared libraries are included in the address space. Also, the data segment will grow as a
program allocates memory.
3.2 Memory Allocation
of a certain initial size, you can change its size with the realloc() function. Dynamic
memory is released with the free() function.
Debugging the use of dynamic memory is an important topic in its own right. We
discuss tools for this purpose in Section 15.5.2, "Memory Allocation Debuggers,"
page 612.
void *calloc(size_t nmemb, size_t size);        Allocate and zero fill
void *malloc(size_t size);                      Allocate raw memory
void free(void *ptr);                           Release memory
void *realloc(void *ptr, size_t size);          Change size of existing allocation
The allocation functions all return type void *. This is a typeless or generic pointer;
all you can do with such a pointer is cast it to a different type and assign it to a typed
pointer. Examples are coming up.
The type size_t is an unsigned integral type that represents amounts of memory.
It is used for dynamic memory allocation, and we see many uses of it throughout the
book. On most modern systems, size_t is unsigned long, but it's better to use
size_t explicitly than to use a plain unsigned integral type.
The ptrdiff_t type is used for address calculations in pointer arithmetic, such as
calculating where in an array a pointer may be pointing:
#define MAXBUF ...
char *p;
char buf[MAXBUF];
ptrdiff_t where;
p = buf;
while (some condition)
    p += something;
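As a concrete illustration of ptrdiff_t, here is a small, self-contained sketch. The function name index_of is our own invention; the point is only that subtracting two pointers into the same array yields a ptrdiff_t, which is the element index.

```c
#include <stddef.h>
#include <string.h>

/* Return the index within buf of the first occurrence of c,
   or -1 if c does not occur.  buf must be NUL-terminated. */
ptrdiff_t index_of(const char *buf, int c)
{
    const char *p = strchr(buf, c);

    if (p == NULL)
        return -1;
    return p - buf;     /* pointer difference has type ptrdiff_t */
}
```

Subtraction like this is meaningful only when both pointers refer to elements of the same array (or one past its end).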
The <stdlib.h> header file declares many of the standard C library routines and
types (such as size_t), and it also defines the preprocessor constant NULL, which
represents the "null" or invalid pointer. (This is a zero value, such as 0 or '((void *) 0)'.
The C++ idiom is to use 0 explicitly; in C, however, NULL is preferred, and we find it
to be much more readable for C code.)
The steps shown here are quite boilerplate. The order is as follows:
1. Declare a pointer of the proper type to point to the allocated memory.
2. Calculate the size in bytes of the memory to be allocated. This involves
multiplying a count of objects needed by the size of the individual object. This size
in turn is retrieved from the C sizeof operator, which exists for this purpose
(among others). Thus, while the size of a particular struct may vary across
compilers and architectures, sizeof always returns the correct value and the
source code remains correct and portable.
When allocating arrays for character strings or other data of type char, it is
not necessary to multiply by sizeof(char), since by definition this is always
1. But it won't hurt anything either.
3. Allocate the storage by calling malloc(), assigning the function's return value
to the pointer variable. It is good practice to cast the return value of malloc()
Once we've allocated memory and set coordinates to point to it, we can then treat
coordinates as if it were an array, although it's really a pointer:
int cur_x, cur_y, cur_z;
size_t an_index;
an_index = something;
cur_x = coordinates[an_index].x;
cur_y = coordinates[an_index].y;
cur_z = coordinates[an_index].z;
The compiler generates correct code for indexing through the pointer to retrieve the
members of the structure at coordinates[an_index].
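Putting the pieces together, the allocation steps and the array-style access might be sketched as follows. The layout of struct coord (three int members) and the function name alloc_coords are assumptions made for illustration; the overflow check on the multiplication is our addition.

```c
#include <stdint.h>
#include <stdlib.h>

struct coord {              /* assumed layout, for illustration only */
    int x, y, z;
};

/* Allocate room for count coordinates.  Returns NULL on failure.
   The caller owns the memory and must free() it. */
struct coord *alloc_coords(size_t count)
{
    struct coord *coordinates;      /* step 1: pointer of the proper type */
    size_t amount;

    if (count > SIZE_MAX / sizeof(struct coord))
        return NULL;                /* count * size would overflow */
    amount = count * sizeof(struct coord);          /* step 2: size in bytes */
    coordinates = (struct coord *) malloc(amount);  /* step 3: allocate */
    return coordinates;             /* caller must check for NULL */
}
```

After a successful call, expressions such as coordinates[an_index].x work exactly as they would for a declared array, provided an_index is less than count.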
NOTE The memory returned by malloc() is not initialized. It can contain any
random garbage. You should immediately initialize the memory with valid data
or at least with zeros. To do the latter, use memset() (discussed in Section 12.2,
"Low-Level Memory: The memXXX() Functions," page 432):
    memset(coordinates, '\0', amount);
Another option is to use calloc(), described shortly.
This approach guarantees that the malloc() will allocate the correct amount of
memory without your having to consult the declaration of pointer. If pointer's type
later changes, the sizeof operator automatically ensures that the count of bytes to
allocate stays correct. (Geoff's technique omits the cast that we just discussed. Having
the cast there also ensures a diagnostic if pointer's type changes and the call to
malloc() isn't updated.)
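We take the technique described in this paragraph to be the common idiom of applying sizeof to the dereferenced pointer rather than to a type name. A sketch under that assumption follows; the function name make_samples and the choice of double are hypothetical.

```c
#include <stdlib.h>

/* Allocate count objects of whatever type 'pointer' points to.
   If the declaration of 'pointer' changes, the size computation
   follows along without any other edit. */
double *make_samples(size_t count)
{
    double *pointer;

    pointer = malloc(count * sizeof(*pointer));  /* size taken from the pointer itself */
    return pointer;                              /* may be NULL; caller must check */
}
```

Note that sizeof does not evaluate its operand, so writing sizeof(*pointer) is safe even though pointer has not yet been assigned.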
Accessing freed memory
If unchanged, coordinates continues to point at memory that no longer belongs
to the application. This is called a dangling pointer. In many systems, you can get
away with continuing to access this memory, at least until the next time more
memory is allocated or freed. In many others though, such access won't work.
In sum, accessing freed memory is a bad idea: It's not portable or reliable, and the
GNU Coding Standards disallows it. For this reason, it's a good idea to immediately
set the program's pointer variable to NULL. If you then accidentally attempt to
access freed memory, your program will immediately fail with a segmentation
fault (before you've released it to the world, we hope).
Freeing the same pointer twice
This causes "undefined behavior." Once the memory has been handed back to
the allocation routines, they may merge the freed block with other free storage
under management. Freeing something that's already been freed is likely to lead
to confusion or crashes at best, and so-called double frees have been known to
lead to security problems.
This call won't work, and it's likely to lead to disastrous consequences, such as a
crash. (This is because many malloc() implementations keep "bookkeeping"
information in front of the returned data. When free() goes to use that
information, it will find invalid data there. Other implementations have the bookkeeping
information at the end of the allocated chunk; the same issues apply.)
Buffer overruns and underruns
Accessing memory outside an allocated chunk also leads to undefined behavior,
again because this is likely to be bookkeeping information or possibly memory
that's not even in the address space. Writing into such memory is much worse,
since it's likely to destroy the bookkeeping data.
Failure to free memory
Any dynamic memory that's not needed should be released. In particular, memory
that is allocated inside loops or recursive or deeply nested function calls should
be carefully managed and released. Failure to take care leads to memory leaks,
whereby the process's memory can grow without bounds; eventually, the process
dies from lack of memory.
This situation can be particularly pernicious if memory is allocated per input
record or as some other function of the input: The memory leak won't be noticed
when run on small inputs but can suddenly become obvious (and embarrassing)
when run on large ones. This error is even worse for systems that must run
continuously, such as telephone switching systems. A memory leak that crashes such a
system can lead to significant monetary or other damage.
Even if the program never dies for lack of memory, constantly growing programs
suffer in performance, because the operating system has to manage keeping in-use
data in physical memory. In the worst case, this can lead to behavior known as
thrashing, whereby the operating system is so busy moving the contents of the
address space into and out of physical memory that no real work gets done.
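The advice about dangling pointers and double frees in the list above can be packaged in a small macro. This is a sketch; the name FREE_AND_NULL is our own and is not a standard facility.

```c
#include <stdlib.h>

/* Release the memory p points to and reset p to NULL in one step.
   p must be a modifiable pointer variable; it is evaluated twice. */
#define FREE_AND_NULL(p)    \
    do {                    \
        free(p);            \
        (p) = NULL;         \
    } while (0)
```

Because free(NULL) is defined to do nothing, applying the macro a second time to the same variable is harmless. It does not help with other copies of the pointer held elsewhere in the program; those still dangle.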
While it's possible for free() to hand released memory back to the system and shrink
the process address space, this is almost never done. Instead, the released memory is
kept available for allocation by the next call to malloc(), calloc(), or realloc().
Given that released memory continues to reside in the process's address space, it may
pay to zero it out before releasing it. Security-sensitive programs may choose to do this,
for example.
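One way to zero a block before releasing it is sketched below. The function names scrub and scrub_and_free are hypothetical. The write goes through a volatile pointer because a compiler is otherwise permitted to drop stores to memory that is about to be freed; treat this as a common precaution, not a guarantee.

```c
#include <stdlib.h>

/* Overwrite size bytes at ptr with zeros.  The volatile qualifier
   discourages the compiler from optimizing the stores away. */
void scrub(void *ptr, size_t size)
{
    volatile unsigned char *p = ptr;

    while (size-- > 0)
        *p++ = 0;
}

/* Zero a dynamically allocated block of the given size, then free it. */
void scrub_and_free(void *ptr, size_t size)
{
    if (ptr == NULL)
        return;
    scrub(ptr, size);
    free(ptr);
}
```

The caller has to supply the size, since Standard C offers no way to ask the allocator how large a block is.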
See Section 15.5.2, "Memory Allocation Debuggers, " page 612, for discussion of a
number of useful dynamic-memory debugging tools.
coordinates = newcoords;
/* continue using coordinates ... */
As with malloc(), the steps are boilerplate in nature and are similar in concept:
1. Compute the new size to allocate, in bytes.
2. Call realloc() with the original pointer obtained from malloc() (or from
calloc() or an earlier call to realloc()) and the new size.
3. Cast and assign the return value of realloc(). More discussion of this shortly.
4. As for malloc(), check the return value to make sure it's not NULL. Any
memory allocation routine can fail.
When growing a block of memory, realloc() often allocates a new block of the
right size, copies the data from the old block into the new one, and returns a pointer
to the new one.
When shrinking a block of data, realloc() can often just update the internal
bookkeeping information and return the same pointer. This saves having to copy the
original data. However, if this happens, don't assume you can still use the memory beyond
the new size!
In either case, you can assume that if realloc() doesn't return NULL, the old data
has been copied for you into the new memory. Furthermore, the old pointer is no
longer valid, as if you had called free() with it, and you should not use it. This is true
of all pointers into that block of data, not just the particular one used to call free().
You may have noticed that our example code used a separate variable to point to the
changed storage block. It would be possible (but a bad idea) to use the same initial
variable, like so:
coordinates = realloc(coordinates, new_amount);
This is a bad idea for the following reason. When realloc() returns NULL, the
original pointer is still valid; it's safe to continue using that memory. However, if you
reuse the same variable and realloc() returns NULL, you've now lost the pointer to
the original memory. That memory can no longer be used. More important, that
memory can no longer be freed! This creates a memory leak, which is to be avoided.
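The safe pattern, with a separate variable for the result, can be wrapped in a helper. This is a minimal sketch; resize_ints is a hypothetical name, and the element type int is chosen only to keep the example short.

```c
#include <stdint.h>
#include <stdlib.h>

/* Resize the array *arrayp to hold new_count ints.
   Returns 0 on success.  Returns -1 on failure, in which case
   *arrayp is unchanged and still valid. */
int resize_ints(int **arrayp, size_t new_count)
{
    int *newp;

    if (new_count == 0 || new_count > SIZE_MAX / sizeof(int))
        return -1;              /* refuse zero-size and overflowing requests */

    newp = realloc(*arrayp, new_count * sizeof(int));
    if (newp == NULL)
        return -1;              /* original block is untouched */

    *arrayp = newp;             /* only now replace the old pointer */
    return 0;
}
```

On failure the caller still holds the original pointer and can either keep using the old block or free it.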
There are some special cases for the Standard C version of realloc(): When the
ptr argument is NULL, realloc() acts like malloc() and allocates a fresh block of
storage. When the size argument is 0, realloc() acts like free() and releases the
memory that ptr points to. Because (a) this can be confusing and (b) older systems
don't implement this feature, we recommend using malloc() when you mean
malloc() and free() when you mean free().
Here is another, fairly subtle, "gotcha."4 Consider a routine that maintains a static
pointer to some dynamically allocated data, which the routine occasionally has to grow.
It may also maintain automatic (that is, local) pointers into this data. (For brevity, we
omit error checking code. In production code, don't do that.) For example:
void manage_table (void)
{
static struct table *table;
struct table *cur, *p;
int i;
size_t count;
This looks straightforward; manage_table() allocates the data, uses it, changes the
size, and so on. But there are some problems that don't jump off the page (or the screen)
when you are looking at this code.
In the line marked 'PROBLEM 1', the cur pointer is used to update a table element.
However, cur was assigned on the basis of the initial value of table. If some
condition was true and realloc() returned a different block of memory, cur now
points into the original, freed memory! Whenever table changes, any pointers into
the memory need to be updated too. What's missing here is the statement
'cur = &table[i];' after table is reassigned following the call to realloc().
The two lines marked 'PROBLEM 2' are even more subtle. In particular, suppose
other_routine() makes a recursive call to manage_table(). The table variable
could be changed again, completely invisibly! Upon return from other_routine(),
the value of cur could once again be invalid.
One might think (as we did) that the only solution is to be aware of this and supply
a suitably commented reassignment to cur after the function call. However, Brian
Kernighan kindly set us straight. If we use indexing, the pointer maintenance issue
doesn't even arise:
table = (struct table *) malloc(count * sizeof(struct table));
/* fill table */
Using indexing doesn't solve the problem if you have a global copy of the original
pointer to the allocated data; in that case, you still have to worry about updating your
global structures after calling realloc().
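The indexing approach can be sketched as follows. Everything here is hypothetical: the one-member struct table, the table_count variable, and the table_set() function exist only to show that an index stays meaningful across a realloc() while a saved element pointer would not.

```c
#include <stdlib.h>

struct table {              /* hypothetical element type */
    int value;
};

static struct table *table;     /* the dynamically allocated data */
static size_t table_count;      /* number of elements currently allocated */

/* Store value in element i, growing the table first if necessary.
   Returns 0 on success, -1 if memory could not be obtained. */
int table_set(size_t i, int value)
{
    if (i >= table_count) {
        size_t new_count = i + 1;
        size_t k;
        struct table *newp = realloc(table, new_count * sizeof(*newp));

        if (newp == NULL)
            return -1;
        for (k = table_count; k < new_count; k++)
            newp[k].value = 0;      /* realloc() does not clear new memory */
        table = newp;
        table_count = new_count;
    }
    table[i].value = value;     /* index is applied after any reallocation */
    return 0;
}
```

Code that remembers "element 0" as the index 0 keeps working after the table moves; code that remembered it as a struct table * would be left with a dangling pointer.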
NOTE As with malloc(), when you grow a piece of memory, the newly
allocated memory returned from realloc() is not zero-filled. You must clear
it yourself with memset() if that's necessary, since realloc() only allocates
the fresh memory; it doesn't do anything else.
Conceptually, at least, the calloc() code is fairly simple. Here is one possible
implementation:
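A minimal sketch of the idea, built from malloc() and memset(), might look like this. The name my_calloc is hypothetical (chosen so as not to collide with the real library function), and the overflow check on the multiplication is an addition of ours.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Allocate zero-filled storage for nmemb objects of size bytes each.
   Returns NULL if the request overflows or malloc() fails. */
void *my_calloc(size_t nmemb, size_t size)
{
    void *p;
    size_t total;

    if (size != 0 && nmemb > SIZE_MAX / size)
        return NULL;                /* nmemb * size would overflow */

    total = nmemb * size;
    p = malloc(total);
    if (p != NULL)
        memset(p, '\0', total);     /* zero-fill the new block */
    return p;
}
```

A real C library can do better than this, as discussed next, because it knows where its memory comes from.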
Many experienced programmers prefer to use calloc() since then there's never any
question about the contents of the newly allocated memory.
Also, if you know you'll need zero-filled memory, you should use calloc(), because
it's possible that the memory malloc() returns is already zero-filled. Although you,
the programmer, can't know this, calloc() can know about it and avoid the call
to memset().
In three short paragraphs, Richard Stallman has distilled the important principles
for doing dynamic memory management with malloc(). It is the use of dynamic
memory and the "no arbitrary limits" principle that makes GNU programs so robust
and more capable than their Unix counterparts.
We do wish to point out that the C standard requires realloc() to not destroy the
original block if it returns NULL.
The nextfree variable points to a linked list of NODE structures. The getnode() macro
pulls the first structure off the list if one is there. Otherwise, it calls more_nodes() to
allocate a new list of free NODEs. The freenode() macro releases a NODE by putting it
at the head of the list.
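The free-list technique just described can be sketched with ordinary functions instead of macros. This is not the original code: the NODE layout, the NODECHUNK constant, and the bodies below are assumptions that reuse the names from the text for readability.

```c
#include <stdlib.h>

typedef struct node {           /* assumed layout, for illustration */
    struct node *nextp;         /* link used while on the free list */
    int value;
} NODE;

#define NODECHUNK 100           /* nodes obtained per malloc() call */

static NODE *nextfree;          /* head of the list of free nodes */

/* Allocate a chunk of nodes, return the first one to the caller,
   and thread the remaining ones onto the free list. */
static NODE *more_nodes(void)
{
    NODE *chunk = malloc(NODECHUNK * sizeof(*chunk));
    size_t i;

    if (chunk == NULL)
        return NULL;
    for (i = 1; i < NODECHUNK - 1; i++)
        chunk[i].nextp = &chunk[i + 1];
    chunk[NODECHUNK - 1].nextp = nextfree;
    nextfree = &chunk[1];
    return &chunk[0];
}

/* Take a node from the free list, allocating more if it is empty. */
NODE *getnode(void)
{
    NODE *n = nextfree;

    if (n != NULL)
        nextfree = n->nextp;
    else
        n = more_nodes();
    return n;
}

/* Return a node to the head of the free list. */
void freenode(NODE *n)
{
    n->nextp = nextfree;
    nextfree = n;
}
```

In this sketch the chunks are never handed back to free(); nodes are simply recycled, which trades some memory for far fewer calls into the allocator.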
(ftp://ftp.gnu.org/gnu/make/make-3.80.tar.gz). It can be found in the file
read.c.
Following the "no arbitrary limits" principle, lines in a Makefile can be of any
length. Thus, this routine's primary job is to read lines of any length and make sure
that they fit into the buffer being used.
A secondary job is to deal with continuation lines. As in C, lines that end with a
backslash logically continue to the next line. The strategy used is to maintain a buffer.
As many lines as will fit in the buffer are kept there, with pointers keeping track of the
start of the buffer, the current line, and the next line. Here is the structure:
struct ebuffer
The size field tracks the size of the entire buffer, and fp is the FILE pointer for the
input file. The floc structure isn't of interest for studying the routine.
The function returns the number of lines in the buffer. (The line numbers here are
relative to the start of the function, not the source file.)
 1  static long
 2  readline (ebuf)                   static long readline(struct ebuffer *ebuf)
 3       struct ebuffer *ebuf;
 4  {
 5    char *p;
 6    char *end;
 7    char *start;
 8    long nlines = 0;
 9
10    /* The behaviors between string and stream buffers are different enough to
11       warrant different functions.  Do the Right Thing.  */
12
13    if (!ebuf->fp)
14      return readstring (ebuf);
15
16    /* When reading from a file, we always start over at the beginning of the
17       buffer for each new line.  */
18
19    p = start = ebuf->bufstart;
20    end = p + ebuf->size;
21    *p = '\0';
We start by noticing that GNU Make is written in K&R C for maximal portability.
The initial part declares variables, and if the input is coming from a string (such as
from the expansion of a macro), the code hands things off to a different function,
readstring() (lines 13 and 14). The test '!ebuf->fp' (line 13) is a shorter (and less
clear, in our opinion) test for a null pointer; it's the same as 'ebuf->fp == NULL'.
Lines 19-21 initialize the pointers, and insert a NUL byte, which is the C string
terminator character, at the start of the buffer (p points to bufstart when line 21
executes). The function then starts a loop (lines 23-95), which runs as long as there
is more input.
23    while (fgets (p, end - p, ebuf->fp) != 0)
24      {
25        char *p2;
26        unsigned long len;
27        int backslash;
28
29        len = strlen (p);
30        if (len == 0)
31          {
32            /* This only happens when the first thing on the line is a '\0'.
33               It is a pretty hopeless case, but (wonder of wonders) Athena
34               lossage strikes again!  (xmkmf puts NULs in its makefiles.)
35               There is nothing really to be done; we synthesize a newline so
36               the following line doesn't appear to be part of this line.  */
37            error (&ebuf->floc,
38                   _("warning: NUL character seen; rest of line ignored"));
39            p[0] = '\n';
40            len = 1;
41          }
The fgets() function (line 23) takes a pointer to a buffer, a count of bytes to read,
and a FILE * variable for the file to read from. It reads at most one less than the count
so that it can terminate the buffer with '\0'. This function is good since it allows you to
avoid buffer overflows. It stops upon encountering a newline or end-of-file, and if the
newline is there, it's placed in the buffer. It returns NULL on failure or the (pointer)
value of the first argument on success.
In this case, the arguments are a pointer to the free area of the buffer, the amount
of room left in the buffer, and the FILE pointer to read from.
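As a minimal sketch of the fgets() behavior just described (this is our own illustration, not code from GNU Make), the hypothetical function count_lines() below reads a stream through a deliberately small buffer. A chunk that does not end in a newline means that either the buffer filled up or the file ended first:

```c
#include <stdio.h>
#include <string.h>

/* count_lines --- hypothetical helper: count the lines on fp, reading
   through a deliberately small buffer so that long lines arrive in
   several fgets() chunks. A final line without a newline still counts. */
long count_lines(FILE *fp)
{
    char buf[16];
    long nlines = 0;
    int in_line = 0;        /* nonzero while inside an unfinished line */

    while (fgets(buf, sizeof buf, fp) != NULL) {
        size_t len = strlen(buf);

        if (len > 0 && buf[len - 1] == '\n') {
            nlines++;       /* this chunk completed a line */
            in_line = 0;
        } else
            in_line = 1;    /* buffer filled (or EOF) before a newline */
    }
    return nlines + in_line;
}
```

Because fgets() never stores more than sizeof buf bytes, including the terminating '\0', the small buffer cannot overflow no matter how long the input lines are.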
The comment on lines 32-36 is self-explanatory; if a zero byte is encountered, the
program prints an error message and pretends it was an empty line. After compensating
for the NUL byte (lines 30-41), the code continues.
Lines 43-52 increment the pointer into the buffer past the data just read. The code
then checks whether the last character read was a newline. The construct p[-1] (line 48)
looks at the character in front of p, just as p[0] is the current character and p[1] is the
next. This looks strange at first, but if you translate it into terms of pointer math,
*(p-1), it makes more sense, and the indexing form is possibly easier to read.
If the last character was not a newline, this means that we've run out of space, and
the code goes off (with goto) to get more (line 49). Otherwise, the line count is
incremented.
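To make the negative index concrete, here is a small sketch of our own. The function name ends_with_newline() is hypothetical; start and p play the same roles as in readline(), with p pointing just past the data read so far:

```c
#include <string.h>

/* ends_with_newline --- hypothetical helper: return nonzero if the text
   in [start, p) is nonempty and its last character is a newline. */
int ends_with_newline(const char *start, const char *p)
{
    /* p[-1] means *(p - 1): the character just before p. The p > start
       test keeps us from looking in front of the buffer. */
    return p > start && p[-1] == '\n';
}
```

The guard matters: p[-1] is valid only when p points past at least one character of the same buffer.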
54  #if !defined(WINDOWS32) && !defined(__MSDOS__)
55        /* Check to see if the line was really ended with CRLF; if so ignore
56           the CR.  */
57        if ((p - start) > 1 && p[-2] == '\r')
58          {
59            --p;
60            p[-1] = '\n';
61          }
62  #endif
Lines 54-62 deal with input lines that follow the Microsoft convention of ending
with a Carriage Return-Line Feed (CR-LF) combination, and not just a Line Feed (or
newline), which is the Linux/Unix convention. Note that the #if excludes the code
on Microsoft systems; apparently the <stdio.h> library on those systems handles this
conversion automatically. This is also true of other non-Unix systems that support
Standard C.
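The same CR-LF cleanup can be sketched on an ordinary C string. This is an illustration under our own assumptions (the name strip_cr() and the string-based interface are not from GNU Make, which works on pointers into its buffer):

```c
#include <string.h>

/* strip_cr --- hypothetical helper: if line ends in "\r\n", rewrite the
   ending to "\n" in place. Returns the resulting string length. */
size_t strip_cr(char *line)
{
    size_t len = strlen(line);

    if (len >= 2 && line[len - 1] == '\n' && line[len - 2] == '\r') {
        line[len - 2] = '\n';   /* overwrite the CR with the newline */
        line[len - 1] = '\0';   /* and shorten the string by one */
        len--;
    }
    return len;
}
```

Lines that already end in a bare newline pass through unchanged.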
64        backslash = 0;
65        for (p2 = p - 2; p2 >= start; --p2)
66          {
67            if (*p2 != '\\')
68              break;
69            backslash = !backslash;
70          }
71
72        if (!backslash)
73          {
74            p[-1] = '\0';
75            break;
76          }
77
78        /* It was a backslash/newline combo.  If we have more space, read
79           another line.  */
80        if (end - p >= 80)
81          continue;
82
83        /* We need more space at the end of our buffer, so realloc it.
84           Make sure to preserve the current offset of p.  */
85      more_buffer:
86        {
87          unsigned long off = p - start;
88          ebuf->size *= 2;
89          start = ebuf->buffer = ebuf->bufstart = (char *) xrealloc (start,
90                                                                    ebuf->size);
91          p = start + off;
92          end = start + ebuf->size;
93          *p = '\0';
94        }
95      }
So far we've dealt with the mechanics of getting at least one complete line into the
buffer. The next chunk handles the case of a continuation line. It has to make sure,
though, that the final backslash isn't part of multiple backslashes at the end of the line.
It tracks whether the total number of such backslashes is odd or even by toggling the
backslash variable from 0 to 1 and back. (Lines 64-70.)
If the number is even, the test '!backslash' (line 72) will be true. In this case, the
final newline is replaced with a NUL byte, and the code leaves the loop.
On the other hand, if the number is odd, then the line contained zero or more
backslash pairs (representing escaped backslashes, \\ as in C), and a final
backslash-newline combination. 5 In this case, if at least 80 free bytes are left in the
buffer, the program continues around the loop to read another line (lines 78-81). (The
use of the magic number 80 isn't great; it would have been better to define and use a
symbolic constant.)
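The odd/even toggle can be tried in isolation. The following is a sketch under our own assumptions: the function name is_continued() is hypothetical, and it indexes from the end of a string instead of walking a pointer backward as GNU Make does:

```c
#include <string.h>

/* is_continued --- hypothetical helper: return 1 if line ends in a
   newline preceded by an odd number of backslashes, 0 otherwise. */
int is_continued(const char *line)
{
    size_t i = strlen(line);
    int backslash = 0;

    if (i == 0 || line[i - 1] != '\n')
        return 0;               /* no newline: not a complete line */
    i--;                        /* i now indexes the newline */
    while (i > 0 && line[i - 1] == '\\') {
        backslash = !backslash; /* toggle once per backslash seen */
        i--;
    }
    return backslash;           /* 1 if the count was odd */
}
```

One trailing backslash yields 1 (a continuation line), two yield 0 (an escaped backslash followed by a real end of line), three yield 1 again, and so on.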
5 This code has the scent of practical experience about it: It wouldn't be surprising to learn that earlier versions
simply checked for a final backslash before the newline, until someone complained that it didn't work when there
were multiple backslashes at the end of the line.
Upon reaching line 83, the program needs more space in the buffer. Here's where
the dynamic memory management comes into play. Note the comment about preserving
p (lines 83-84); we discussed this earlier in terms of reinitializing pointers into dynamic
memory. end is also reset. Line 89 resizes the memory.
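The save-the-offset idiom is worth seeing on its own. Here is a minimal sketch under stated assumptions: struct linebuf and grow_buffer() are hypothetical names, and the sketch calls plain realloc() so that the failure case stays visible:

```c
#include <stdlib.h>

/* Hypothetical buffer descriptor, much simpler than struct ebuffer. */
struct linebuf {
    char *start;        /* start of the allocated buffer */
    size_t size;        /* allocated size in bytes */
};

/* grow_buffer --- double the buffer, keeping *pp (a pointer into the
   buffer) at the same offset. Returns 0 on success, -1 on failure. */
int grow_buffer(struct linebuf *b, char **pp)
{
    size_t off = (size_t) (*pp - b->start); /* save offset BEFORE realloc */
    size_t newsize = b->size * 2;
    char *newbuf = realloc(b->start, newsize);

    if (newbuf == NULL)
        return -1;          /* old buffer is untouched and still valid */
    b->start = newbuf;      /* the block may have moved */
    b->size = newsize;
    *pp = newbuf + off;     /* rebuild the interior pointer */
    return 0;
}
```

Any other pointer into the old block, such as an end pointer, has to be recomputed from the new start in the same way.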
Note that here the function being called is xrealloc(). Many GNU programs use
"wrapper" functions around malloc() and realloc() that automatically print an
error message and exit if the standard routines return NULL. Such a wrapper might look
like this:
    extern const char *myname;      /* set in main() */

    if (p == NULL) {
        fprintf(stderr, "%s: out of memory!\n", myname);
        exit(1);
    }
Thus, if xrealloc() returns, it's guaranteed to return a valid pointer. (This strategy
complies with the "check every call for errors" principle while avoiding the code clutter
that comes with doing so using the standard routines directly.) In addition, this allows
valid use of the construct 'ptr = xrealloc(ptr, new_size)', which we otherwise
warned against earlier.
Note that it is not always appropriate to use such a wrapper. If you wish to handle
errors yourself, you shouldn't use it. On the other hand, if running out of memory is
always a fatal error, then such a wrapper is quite handy.
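The fragment shown earlier gives only the error check. A complete pair of wrappers might look like the following sketch. The details are our assumptions, not GNU Make's actual code: myname is given a placeholder value here, and a zero-byte request is rounded up to one byte so that a NULL return always means failure:

```c
#include <stdio.h>
#include <stdlib.h>

const char *myname = "example";     /* normally set in main() from argv[0] */

/* xrealloc --- like realloc(), but never returns NULL. */
void *xrealloc(void *ptr, size_t amount)
{
    void *p;

    if (amount == 0)
        amount = 1;     /* assumption: avoid realloc(ptr, 0) ambiguity */
    p = realloc(ptr, amount);
    if (p == NULL) {
        fprintf(stderr, "%s: out of memory!\n", myname);
        exit(1);
    }
    return p;
}

/* xmalloc --- like malloc(), but never returns NULL. */
void *xmalloc(size_t amount)
{
    return xrealloc(NULL, amount);  /* realloc(NULL, n) acts as malloc(n) */
}
```

With these in place, 'buf = xrealloc(buf, newsize)' cannot lose the old pointer, because the call either succeeds or never returns.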
 97    if (ferror (ebuf->fp))
 98      pfatal_with_name (ebuf->floc.filenm);
 99
100    /* If we found some lines, return how many.
101       If we didn't, but we did find _something_, that indicates we read the last
102       line of a file with no final newline; return 1.
103       If we read nothing, we're at EOF; return -1.  */
104
105    return nlines ? nlines : p == ebuf->bufstart ? -1 : 1;
106  }
Finally, the readline() routine checks for I/O errors, and then returns a descriptive
return value. The function pfatal_with_name() (line 98) doesn't return.
3.2.1.9 GLIBC Only: Reading Entire Lines: getline() and getdelim()
Now that you've seen how to read an arbitrary-length line, you can breathe a sigh
of relief that you don't have to write such a function for yourself. GLIBC provides two
functions to do this for you:
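#define _GNU_SOURCE 1                                              GLIBC
#include <stdio.h>
#include <sys/types.h>          /* for ssize_t */

ssize_t getline(char **lineptr, size_t *n, FILE *stream);
ssize_t getdelim(char **lineptr, size_t *n, int delim, FILE *stream);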
Defining the constant _GNU_SOURCE brings in the declaration of the getline()
and getdelim() functions. Otherwise, they're implicitly declared as returning int.
<sys/types.h> is needed so you can declare a variable of type ssize_t to hold the
return value. (An ssize_t is a "signed size_t." It's meant for the same use as a size_t,
but for places where you need to be able to hold negative values as well.)
Both functions manage dynamic storage for you, ensuring that the buffer containing
an input line is always big enough to hold the input line. They differ in that getline()
reads until a newline character, and getdelim() uses a user-provided delimiter character.
The common arguments are as follows:
char **lineptr
    A pointer to a char * pointer to hold the address of a dynamically allocated
    buffer. It should be initialized to NULL if you want getline() to do all the work.
    Otherwise, it should point to storage previously obtained from malloc().
size_t *n
    An indication of the size of the buffer. If you allocated your own buffer, *n should
    contain the buffer's size. Both functions update *n to the new buffer size if they
    change it.
FILE *stream
    The location from which to get input characters.
The functions return -1 upon end-of-file or error. The strings hold the terminating
newline or delimiter (if there was one), as well as a terminating zero byte. Using
getline() is easy, as shown in ch03-getline.c:
/* ch03-getline.c --- demonstrate getline(). */

#define _GNU_SOURCE 1
#include <stdio.h>
#include <sys/types.h>

int main(void)
{
    char *line = NULL;      /* let getline() allocate the buffer */
    size_t size = 0;

    while (getline(&line, &size, stdin) != -1)
        printf("(%lu) %s", (unsigned long) size, line);
    return 0;
}
Here it is in action, showing the size of the buffer. The third input and output lines
are purposely long, to force getline() to grow the buffer; thus, they wrap around:
$ ch03-getline Run the program
this is a line
(120) this is a line
And another line.
(120) And another line.
A llllllllllllllllloooooooooooooooooooooooooooooooonnnnnnnnnnnnnnnnnnngggg
gggggggg llliiiiiiiiiiiiiiiiiiinnnnnnnnnnnnnnnnnnnneeeeeeeeee
(240) A llllllllllllllllloooooooooooooooooooooooooooooooonnnnnnnnnnnnnnnnnnngggg
gggggggg llliiiiiiiiiiiiiiiiiiinnnnnnnnnnnnnnnnnnnneeeeeeeeee
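getdelim() works the same way, with the delimiter supplied by the caller. As a minimal sketch (the function name count_records() is our own, hypothetical choice), the following counts the records on a stream, letting getdelim() manage the buffer:

```c
#define _GNU_SOURCE 1
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* count_records --- hypothetical helper: count the delim-separated
   records on fp. A final record without a delimiter still counts. */
long count_records(FILE *fp, int delim)
{
    char *rec = NULL;       /* NULL and 0: getdelim() does the allocating */
    size_t size = 0;
    long n = 0;

    while (getdelim(&rec, &size, delim, fp) != -1)
        n++;                /* rec holds one record, delimiter included */
    free(rec);              /* the buffer belongs to us; release it */
    return n;
}
```

Note the free() at the end: the functions allocate and grow the buffer, but releasing it is still the caller's job.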
    if (copy != NULL)
        strcpy(copy, str);
With the 2001 POSIX standard, programmers the world over can breathe a little
easier: This function is now part of POSIX as an XSI extension:
#include <string.h>                                                XSI
The brk () system call actually changes the process's address space. The address is a
pointer representing the end of the data segment (really the heap area, as shown earlier
in Figure 3.1). Its argument is an absolute logical address representing the new end of
the address space. It returns 0 on success or -1 on failure.
The sbrk() function is easier to use; its argument is the increment in bytes by which
to change the address space. By calling it with an increment of 0, you can determine
where the address space currently ends. Thus, to increase your address space by 32
bytes, use code like this:
char *p = (char *) sbrk(0);     /* get current end of address space */
if (brk(p + 32) < 0) {
    /* handle error */
}
Practically speaking, you would not use brk() directly. Instead, you would use
sbrk() exclusively to grow (or even shrink) the address space. (We show how to do
this shortly, in Section 3.2.5, "Address Space Examination," page 78.)
Even more practically, you should never use these routines. A program using them
can't then use malloc() also, and this is a big problem, since many parts of the standard
library rely on being able to use malloc(). Using brk() or sbrk() is thus likely to
lead to hard-to-find program crashes.
But it's worth knowing about the low-level mechanics, and indeed, the malloc()
suite of routines is implemented with sbrk() and brk().
The alloca() function allocates size bytes from the stack. What's nice about this
is that the allocated storage disappears when the function returns. There's no need to
explicitly free it because it goes away automatically, just as local variables do.
At first glance, alloca() seems like a programming panacea; memory can be allocated
that doesn't have to be managed at all. Like the Dark Side of the Force, this is
indeed seductive. And it is similarly to be avoided, for the following reasons:
• The function is nonstandard; it is not included in any formal standard, either ISO
C or POSIX.
• The function is not portable. Although it exists on many Unix systems and
GNU/Linux, it doesn't exist on non-Unix systems. This is a problem, since it's
often important for code to be multiplatform, above and beyond just Linux
and Unix.
• On some systems, alloca() can't even be implemented. All the world is not an
Intel x86 processor, nor is all the world GCC.
• Quoting the manpage (emphasis added): "The alloca function is machine
and compiler dependent. On many systems its implementation is buggy. Its use is
discouraged."
• Quoting the manpage again: "On many systems alloca cannot be used inside
the list of arguments of a function call, because the stack space reserved by alloca
would appear on the stack in the middle of the space for the function arguments."
• It encourages sloppy coding. Careful and correct memory management isn't hard;
you just have to think about what you're doing and plan ahead.
GCC generally uses a built-in version of the function that operates by using inline
code. As a result, there are other consequences of alloca ( ). Quoting again from
the manpage:
The fact that the code is inlined means that it is impossible to take the address
of this function, or to change its behavior by linking with a different library.
The inlined code often consists of a single instruction adjusting the stack
pointer, and does not check for stack overflow. Thus, there is no NULL error
return.
The manual page doesn't go quite far enough in describing the problem with GCC's
built-in alloca(). If there's a stack overflow, the return value is garbage. And you have
no way to tell! This flaw makes GCC's alloca() impossible to use in robust code.
All of this should convince you to stay away from alloca() for any new code that
you may write. If you're going to have to write portable code using malloc() and
free() anyway, there's no reason to also write code using alloca().
This program prints the locations of the two functions main() and afunc() (lines
22-23). It then shows how the stack grows downward, letting afunc() (lines 51-63)
print the address of successive instantiations of its local variable stack_var. (stack_var
is purposely declared auto, to emphasize that it's on the stack.) It then shows the
location of memory allocated by alloca() (lines 28-32). Finally it prints the locations of
data and BSS variables (lines 34-38), and then of memory allocated directly through
sbrk() (lines 40-48). Here are the results when the program is run on an Intel
GNU/Linux system:
$ ch03-memaddr
Text Locations:
        Address of main: 0x804838c
        Address of afunc: 0x80484a8
Stack Locations:
        Stack level 1: address of stack_var: 0xbffff864
        Stack level 2: address of stack_var: 0xbffff844      Stack grows downward
        Start of alloca()'ed array: 0xbffff860
        End of alloca()'ed array: 0xbffff87f                 Addresses are on the stack
Data Locations:
        Address of data_var: 0x80496b8
BSS Locations:
        Address of bss_var: 0x80497c4                        BSS is above data variables
Heap Locations:
        Initial end of heap: 0x80497c8                       Heap is immediately above BSS
        New end of heap: 0x80497e8                           And grows upward
        Final end of heap: 0x80497d8                         Address spaces can shrink
3.3 Summary
• Every Linux (and Unix) program has different memory areas. They are stored in
separate parts of the executable program's disk file. Some of the sections are loaded
into the same part of memory when the program is run. All running copies of the
same program share the executable code (the text segment). The size program
shows the sizes of the different areas for relocatable object files and fully linked
executable files.
• The address space of a running program may have holes in it, and the size of the
address space can change as memory is allocated and released. On modern systems,
address 0 is not part of the address space, so don't attempt to dereference
NULL pointers.
• The low-level system call interface functions, brk() and sbrk(), provide direct
but primitive access to memory allocation and deallocation. Unless you are writing
your own storage allocator, you should not use them.
• The alloca() function for allocating memory on the stack exists, but is not
recommended. Like being able to recognize poison ivy, you should know it only so
that you'll know to avoid it.
Exercises
char *buf;
FILE * fp;
};
This chapter describes basic file operations: opening and creating files, reading
and writing them, moving around in them, and closing them. Along the way
it presents the standard mechanisms for detecting and reporting errors. The chapter
ends off by describing how to set a file's length and force file data and metadata
to disk.
In the next and subsequent sections, we illustrate the model by writing a very simple
version of cat. It's so simple that it doesn't even have options; all it does is concatenate
the contents of the named files to standard output. It does do minimal error reporting.
Once it's written, we compare it to the V7 cat.
We present the program top-down, starting with the command line. In succeeding
sections, we present error reporting and then get down to brass tacks, showing how to
do actual file I/O.
4.2 Presenting a Basic Program Structure
The myname variable (line 14) is used later for error messages; main() sets it to the
program name (argv[0]) as its first action (line 25). Then main() loops over the
arguments. For each argument, it calls a function named process() to do the work.
When given the filename - (a single dash, or minus sign), Unix cat reads standard
input instead of trying to open a file named -. In addition, with no arguments, cat
reads standard input. ch04-cat implements both of these behaviors. The check for
'argc == 1' (line 27) is true when there are no filename arguments; in this case, main()
passes "-" to process(). Otherwise, main() loops over all the arguments, treating
them as files to be processed. If one of them happens to be "-", the program then
processes standard input.
If process() returns a nonzero value, it means that something went wrong. Errors
are added up in the errs variable (lines 28 and 31). When main() ends, it returns 0
if there were no errors, and 1 if there were (line 33). This is a fairly standard convention,
whose meaning is discussed in more detail in Section 9.1.5.1, "Defining Process Exit
Status," page 300.
The structure presented in main() is quite generic: process() could do anything
we want to the file. For example (ignoring the special use of "-"), process() could
just as easily remove files as concatenate them!
Before looking at the process() function, we have to describe how system call errors
are represented and then how I/O is done. The process() function itself is presented
in Section 4.4.3, "Reading and Writing," page 96.
else
    /* all ok, proceed */
Knowing that an error occurred isn't enough. It's necessary to know what error
occurred. For that, each process has a predefined variable named errno. Whenever a
system call fails, errno is set to one of a set of predefined error values. errno and the
predefined values are declared in the <errno.h> header file:
#include <errno.h>                                                 ISO C
errno itself may be a macro that acts like an int variable; it need not be a real integer.
In particular, in threaded environments, each thread will have its own private version
of errno. Practically speaking, though, for all the system calls and functions in this
book, you can treat errno like a simple int.
TABLE 4.1
GLIBC values for errno

Name              Meaning
E2BIG             Argument list too long.
EACCES            Permission denied.
EADDRINUSE        Address in use.
EADDRNOTAVAIL     Address not available.
EAFNOSUPPORT      Address family not supported.
EAGAIN            Resource unavailable, try again (may be the same value as EWOULDBLOCK).
EALREADY          Connection already in progress.
EBADF             Bad file descriptor.
EBADMSG           Bad message.
EBUSY             Device or resource busy.
ECANCELED         Operation canceled.
ECHILD            No child processes.
ECONNABORTED      Connection aborted.
ECONNREFUSED      Connection refused.
ECONNRESET        Connection reset.
EDEADLK           Resource deadlock would occur.
EDESTADDRREQ      Destination address required.
EDOM              Mathematics argument out of domain of function.
EDQUOT            Reserved.
EEXIST            File exists.
Many systems provide other error values as well, and older systems may not have all
the errors just listed. You should check your local intro(2) and errno(2) manpages for
the full story.
NOTE errno should be examined only after an error has occurred and before
further system calls are made. Its initial value is 0. However, nothing changes
errno between errors, meaning that a successful system call does not reset it
to 0. You can, of course, manually set it to 0 initially or whenever you like, but
this is rarely done.
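The rule in the note (look at errno right after the failure, before calling anything else) can be sketched as follows. The function name open_errno() is hypothetical, and the sketch uses the open() and close() system calls, which this chapter describes in later sections:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* open_errno --- hypothetical helper: try to open path for reading.
   Returns 0 if the open succeeded, or the errno value if it failed. */
int open_errno(const char *path)
{
    int fd = open(path, O_RDONLY);

    if (fd < 0) {
        int saved = errno;  /* capture at once: a later call may change it */
        return saved;
    }
    (void) close(fd);       /* success: errno has no meaning here */
    return 0;
}
```

Saving errno in a local variable is a common defensive habit when more calls (even fprintf()) happen between the failure and the report.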
Initially, we use errno only for error reporting. There are two useful functions for
error reporting. The first is perror():
#include <stdio.h>                                                 ISO C

void perror(const char *s);
We prefer the strerror() function, which takes an error value parameter and returns
a pointer to a string describing the error:
#include <string.h>                                                ISO C

char *strerror(int errnum);
You will see many examples of both functions throughout the book.
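To compare the two, here is a short sketch under our own assumptions: both function names and the progname variable are hypothetical. perror() writes a fixed-format message to standard error, whereas strerror() just hands back the text, so the caller controls the layout and the destination:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

const char *progname = "example";   /* hypothetical; normally argv[0] */

/* report_with_perror --- prints "what: <error text>" on stderr,
   using the current value of errno. */
void report_with_perror(const char *what)
{
    perror(what);
}

/* format_error --- build our own message around strerror(err).
   Returns what snprintf() returns. */
int format_error(char *buf, size_t buflen, const char *file, int err)
{
    return snprintf(buf, buflen, "%s: %s: cannot open for reading: %s",
                    progname, file, strerror(err));
}
```

The second form makes it easy to include the program name and the file name in the message, which is one reason to prefer strerror().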
current line number in that file. These have been available in C since its beginning.
C99 defines an additional predefined identifier, __func__, which represents the name
of the current function as a character string. The macros are used like this:
if (some_system_call(param1, param2) < 0) {
    fprintf(stderr, "%s: %s (%s %d): some_system_call(%d, %d) failed: %s\n",
            argv[0], __func__, __FILE__, __LINE__,
            param1, param2, strerror(errno));
    return 1;
}
Here, the error message includes not only the program's name but also the function
name, source file name, and line number. The full list of identifiers useful for diagnostics
is provided in Table 4.2.
TABLE 4.2
C99 diagnostic identifiers
The use of __FILE__ and __LINE__ was quite popular in the early days of Unix,
when most people had source code and could find the error and fix it. As Unix systems
became more commercial, use of these identifiers gradually diminished, since knowing
the source code location isn't of much help to someone who only has a binary executable.
Today, although GNU/Linux systems come with source code, said source code often
isn't installed by default. Thus, using these identifiers for error messages doesn't seem
to provide much additional value. The GNU Coding Standards don't even mention them.
int getdtablesize(void);
int
main(int argc, char **argv)
When compiled and run, not surprisingly the program prints the same value as
printed by ulimit:
$ ch04-maxfds
max fds: 1024
File descriptors are held in normal int variables; it is typical to see declarations of the
form 'int fd' for use with I/O system calls. There is no predefined type for
file descriptors.
In the usual case, every program starts running with three file descriptors already
opened for it. These are standard input, standard output, and standard error, on file
descriptors 0, 1, and 2, respectively. (If not otherwise redirected, each one is connected
to your keyboard and screen.)
These constants can then be used in place of 0, 1, and 2. They are both readable and
easier to type.
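As a small illustration, the sketch below assumes the POSIX names STDIN_FILENO, STDOUT_FILENO, and STDERR_FILENO from <unistd.h>. The function put_message() is our own, hypothetical helper, and it uses the write() system call, which is described later in this chapter:

```c
#include <string.h>
#include <unistd.h>

/* put_message --- hypothetical helper: write msg to descriptor fd.
   Returns 0 if every byte was written, -1 otherwise. */
int put_message(int fd, const char *msg)
{
    size_t len = strlen(msg);

    return write(fd, msg, len) == (ssize_t) len ? 0 : -1;
}
```

A call such as 'put_message(STDERR_FILENO, "something went wrong\n")' says where the text is going more clearly than a bare 2 would.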
The return value from open() is either the new file descriptor or -1 to indicate an
error, in which case errno will be set. For simple I/O, the flags argument should be
one of the values in Table 4.3.
TABLE 4.3
Flag values for open()
We will see example code shortly. Additional values for flags are described in
Section 4.6, "Creating Files," page 106. Much early Unix code didn't use the symbolic
values. Instead, the numeric value was used. Today this is considered bad practice, but
we present the values so that you'll recognize their meanings if you see them.
The close() system call closes a file: The entry for it in the system's file descriptor
table is marked as unused, and no further operations may be done with that file
descriptor. The declaration is
#include <unistd.h>                                                POSIX

int close(int fd);
The return value is 0 on success, -1 on error. There isn't much you can do if an error
does occur, other than report it. Errors closing files are unusual, but not unheard of,
particularly for files being accessed over a network. Thus, it's good practice to check
the return value, particularly for files opened for writing.
If you choose to ignore the return value, specifically cast it to void, to signify that
you don't care about the result:
(void) close(fd);       /* throw away return value */
The flip side of this advice is that too many casts to void tend to clutter the code.
For example, despite the "always check the return value" principle, it's exceedingly rare
to see code that checks the return value of printf() or bothers to cast it to void. As
with many aspects of C programming, experience and judgment should be applied
here too.
As mentioned, the number of open files, while large, is limited, and you should always
close files when you're done with them. If you don't, you will eventually run out of file
descriptors, a situation that leads to a lack of robustness on the part of your program.
The system closes all open files when a process exits, but (except for 0, 1, and 2) it's
bad form to rely on this.
When open() returns a new file descriptor, it always returns the lowest unused integer
value. Always. Thus, if file descriptors 0-6 are open and the program closes file descriptor
5, then the next call to open() returns 5, not 7. This behavior is important; we see
later in the book how it's used to cleanly implement many important Unix features,
such as I/O redirection and piping.
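The lowest-unused-descriptor rule is easy to observe directly. Here is a minimal sketch of our own: the function name lowest_fd_reused() is hypothetical, and it assumes a single-threaded program in which the current directory can be opened for reading:

```c
#include <fcntl.h>
#include <unistd.h>

/* lowest_fd_reused --- hypothetical check: open two descriptors, close
   the lower one, and open again. Returns 1 if the third open() reused
   the closed descriptor, 0 if not, -1 if the setup itself failed. */
int lowest_fd_reused(void)
{
    int a, b, c, reused;

    a = open(".", O_RDONLY);
    if (a < 0)
        return -1;
    b = open(".", O_RDONLY);        /* b is necessarily greater than a */
    if (b < 0) {
        (void) close(a);
        return -1;
    }
    (void) close(a);                /* a is now the lowest unused descriptor */
    c = open(".", O_RDONLY);        /* so this open() should return a again */
    reused = (c == a);
    if (c >= 0)
        (void) close(c);
    (void) close(b);
    return reused;
}
```

Closing a descriptor and then immediately opening something else is exactly the sequence that redirection relies on.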
We will see an example later, in Section 4.4.4, "Example: Unix cat ," page 99.
Assume that the result of getdtablesize() is 1024. This code works, but it makes
(1024 - 3) * 2 = 2042 system calls. 1020 of them are needless, since the return value
from getdtablesize() doesn't change. Here is a better way to write this code:
int i, fds;
Such an optimization does not affect the readability of the code, and it can make a
difference, particularly on slow systems. In general, it's worth looking for cases in which
loops compute the same result repeatedly, to see if such a computation can't be pulled
out of the loop. In all such cases, though, be sure that you (a) preserve the code's
correctness and (b) preserve its readability!
Each function is about as simple as can be. The arguments are the file descriptor for
the open file, a pointer to a buffer to read data into or to write data from, and the
number of bytes to read or write.
The return value is the number of bytes actually read or written. (This number can
be smaller than the requested amount: For a read operation this happens when fewer
than count bytes are left in the file, and for a write operation it happens if a disk fills
up or some other error occurs.) The return value is -1 if an error occurred, in which
case errno indicates the error. When read() returns 0, it means that end-of-file has
been reached.
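The three possible outcomes of read() map directly onto a loop. As a minimal sketch (count_bytes() is a hypothetical name of our own, not part of ch04-cat), the following reads a descriptor to end-of-file and adds up the byte counts:

```c
#include <stdio.h>      /* for BUFSIZ */
#include <unistd.h>

/* count_bytes --- hypothetical helper: read fd until end-of-file.
   Returns the total number of bytes read, or -1 if read() failed
   (in which case errno says why). */
long count_bytes(int fd)
{
    char buffer[BUFSIZ];
    ssize_t rcount;
    long total = 0;

    /* positive: data arrived; 0: end-of-file; negative: error */
    while ((rcount = read(fd, buffer, sizeof buffer)) > 0)
        total += rcount;

    return rcount < 0 ? -1 : total;
}
```

Only the first rcount bytes of buffer are meaningful after each call; anything beyond them is left over from an earlier read.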
We can now show the rest of the code for ch04-cat. The process() routine uses
0 if the input filename is "-", for standard input (lines 50 and 51). Otherwise, it opens
the given file:
36  /*
37   * process --- do something with the file, in this case,
38   *             send it to stdout (fd 1).
39   *             Returns 0 if all OK, 1 otherwise.
40   */
41
42  int
43  process(char *file)
44  {
45      int fd;
46      ssize_t rcount, wcount;
47      char buffer[BUFSIZ];
48      int errors = 0;
49
50      if (strcmp(file, "-") == 0)
51          fd = 0;
52      else if ((fd = open(file, O_RDONLY)) < 0) {
53          fprintf(stderr, "%s: %s: cannot open for reading: %s\n",
54                  myname, file, strerror(errno));
55          return 1;
56      }
The buffer buffer (line 47) is of size BUFSIZ; this constant is defined by <stdio.h>
to be the "optimal" block size for I/O. Although the value for BUFSIZ varies across
systems, code that uses this constant is clean and portable.
The core of the routine is the following loop, which repeatedly reads data until either
end-of-file or an error is encountered:
The rcount and wcount variables (line 46) are of type ssize_t, "signed size_t,"
which allows them to hold negative values. Note that the count value passed to write()
is the return value from read() (line 59). While we want to read fixed-size BUFSIZ
chunks, it is unlikely that the file itself is a multiple of BUFSIZ bytes big. When the
final, smaller, chunk of bytes is read from the file, the return value indicates how many
bytes of buffer received new data. Only those bytes should be copied to standard
output, not the entire buffer.
The test 'wcount != rcount' on line 60 is the correct way to check for write errors;
if some, but not all, of the data were written, then wcount will be positive but smaller
than rcount.
Finally, process() checks for read errors (lines 68-72) and then attempts to close
the file. In the (unlikely) event that close() fails (line 75), it prints an error message.
Avoiding the close of standard input isn't strictly necessary in this program, but it's a
good habit to develop for writing larger programs, in case other code elsewhere wants
to do something with it or if a child program will inherit it. The last statement (line 82)
returns 1 if there were errors, 0 otherwise.
68      if (rcount < 0) {
69          fprintf(stderr, "%s: %s: read error: %s\n",
70                  myname, file, strerror(errno));
71          errors++;
72      }
73
74      if (fd != 0) {
75          if (close(fd) < 0) {
76              fprintf(stderr, "%s: %s: close error: %s\n",
77                      myname, file, strerror(errno));
78              errors++;
79          }
80      }
81
82      return (errors != 0);
83  }
ch04-cat checks every system call for errors. While this is tedious, it provides
robustness (or at least clarity): When something goes wrong, ch04-cat prints an error
message that is as specific as possible. The combination of errno and strerror()
makes this easy to do. That's it for ch04-cat, only 88 lines of code!
To sum up, there are several points to understand about Unix I/O:
I/O is uninterpreted.
    The I/O system calls merely move bytes around. They do no interpretation of the
    data; all interpretation is up to the user-level program. This makes reading and
    writing binary structures just as easy as reading and writing lines of text (easier,
    really, although using binary data introduces portability problems).
I/O is flexible.
    You can read or write as many bytes at a time as you like. You can even read and
    write data one byte at a time, although doing so for large amounts of data is more
    expensive than doing so in large chunks.
I/O is simple.
    The three-valued return (negative for error, zero for end-of-file, positive for a
    count) makes programming straightforward and obvious.
I/O can be partial.
    Both read() and write() can transfer fewer bytes than requested. Application
    code (that is, your code) must always be aware of this.
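One common way to cope with partial writes is to keep calling write() until everything has gone out. The following is a sketch of that idea under our own assumptions; the name write_all() is hypothetical, and this is a different policy from ch04-cat, which simply reports a short write as an error:

```c
#include <unistd.h>

/* write_all --- hypothetical helper: write all count bytes of buf to
   fd, retrying after partial writes. Returns 0 on success, -1 if
   write() fails or makes no progress. */
int write_all(int fd, const char *buf, size_t count)
{
    while (count > 0) {
        ssize_t n = write(fd, buf, count);

        if (n <= 0)
            return -1;          /* error, or nothing written at all */
        buf += n;               /* skip past the bytes already written */
        count -= (size_t) n;    /* and ask only for what remains */
    }
    return 0;
}
```

Which policy is right depends on the program: retrying suits pipes and sockets, while for a full disk the retry fails right away and the caller sees -1.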
2 See /usr/src/cmd/cat.c in the V7 distribution. The program compiles without change under GNU/Linux.
 1  /*
 2   * Concatenate files.
 3   */
 4
 5  #include <stdio.h>
 6  #include <sys/types.h>
 7  #include <sys/stat.h>
 8
 9  char    stdbuf[BUFSIZ];
10
11  main(argc, argv)                        int main(int argc, char **argv)
12  char **argv;
13  {
14      int fflg = 0;
15      register FILE *fi;
16      register c;
17      int dev, ino = -1;
18      struct stat statb;
19
20      setbuf(stdout, stdbuf);
21      for( ; argc>1 && argv[1][0]=='-'; argc--,argv++) {
22          switch(argv[1][1]) {                Process options
23          case 0:
24              break;
25          case 'u':
26              setbuf(stdout, (char *)NULL);
27              continue;
28          }
29          break;
30      }
31      fstat(fileno(stdout), &statb);          Lines 31-36 explained in Chapter 5
32      statb.st_mode &= S_IFMT;
33      if (statb.st_mode!=S_IFCHR && statb.st_mode!=S_IFBLK) {
34          dev = statb.st_dev;
35          ino = statb.st_ino;
36      }
37      if (argc < 2) {
38          argc = 2;
39          fflg++;
40      }
41      while (--argc > 0) {                    Loop over files
42          if (fflg || (*++argv)[0]=='-' && (*argv)[1]=='\0')
43              fi = stdin;
44          else {
45              if ((fi = fopen(*argv, "r")) == NULL) {
46                  fprintf(stderr, "cat: can't open %s\n", *argv);
47                  continue;
48              }
49          }
Of note is that the program always exits successfully (line 62); it could have been
written to note errors and indicate them in main()'s return value. (The mechanics of
process exiting and the meaning of different exit status values are discussed in
Section 9.1.5.1, "Defining Process Exit Status," page 300.)
The code dealing with the struct stat and the fstat() function (lines 31-36
and 50-56) is undoubtedly opaque, since we haven't yet covered these functions, and
won't until the next chapter. (But do note the use of fileno() on line 50 to get at the
underlying file descriptor associated with the FILE * variables.) The idea behind the
code is to make sure that no input file is the same as the output file. This is intended
to prevent infinite file growth, in case of a command like this:
$ cat myfile >> myfile                  Append one copy of myfile onto itself?
And indeed, the check works:
$ echo hi > myfile                      Create a file
$ v7cat myfile >> myfile                Attempt to append it onto itself
cat: input myfile is output
If you try this with ch04-cat, it will keep running, and myfile will keep growing
until you interrupt it. The GNU version of cat does perform the check. Note that
something like the following is beyond cat's control:
$ v7cat < myfile > myfile
cat: input - is output
$ ls -l myfile
-rw-r--r--    1 arnold   devel           0 Mar 24 14:17 myfile
In this case, it's too late because the shell truncated myfile (with the > operator) before
cat ever gets a chance to examine the file!
In Section 5.4.4.2, "The V7 cat Revisited," page 150, we explain the struct stat
code.
The type off_t (offset type) is a signed integer type representing byte positions
(offsets from the beginning) within a file. On 32-bit systems, the type is usually a long.
However, many modern systems allow very large files, in which case off_t may be a
more unusual type, such as a C99 int64_t or some other extended type. lseek() takes
three arguments, as follows:
int fd
The file descriptor for the open file.
off_t offset
A position to which to move. The interpretation of this value depends on the
whence parameter. offset can be positive or negative: Negative values move
toward the front of the file; positive values move toward the end of the file.
int whence
Describes the location in the file to which offset is relative. See Table 4.4.
TABLE 4.4
whence values for lseek()
Much old code uses the numeric values shown in Table 4.4. However, any new code
you write should use the symbolic values, whose meanings are clearer.
The meaning of the values and their effects upon file position are shown in Figure 4.1.
Assuming that the file has 3000 bytes and that the current offset is 2000 before each
call to lseek(), the new position after each call is as shown:
New position    Call
        3040    lseek(fd, (off_t) 40, SEEK_END);
        2960    lseek(fd, (off_t) -40, SEEK_END);
        2040    lseek(fd, (off_t) 40, SEEK_CUR);
        1960    lseek(fd, (off_t) -40, SEEK_CUR);
          40    lseek(fd, (off_t) 40, SEEK_SET);
FIGURE 4.1
Offsets for lseek()
Negative offsets relative to the beginning of the file are meaningless; they fail with
an "invalid argument" error.
The return value is the new position in the file. Thus, to find out where in the file
you are, use
but with a "gap" or "hole" between the data at the previous end of the file and the new
data. Data in the gap read as if they are all zeros.
The following program demonstrates the creation of holes. It writes three instances
of a struct at the beginning, middle, and far end of a file. The offsets chosen (lines
16-18, the third element of each structure) are arbitrary but big enough to demonstrate
the point:
 1  /* ch04-holes.c --- Demonstrate lseek() and holes in files. */
 2
 3  #include <stdio.h>      /* for fprintf(), stderr, BUFSIZ */
 4  #include <errno.h>      /* declare errno */
 5  #include <fcntl.h>      /* for flags for open() */
 6  #include <string.h>     /* declare strerror() */
 7  #include <unistd.h>     /* for ssize_t */
 8  #include <sys/types.h>  /* for off_t, etc. */
 9  #include <sys/stat.h>   /* for mode_t */
10
11  struct person {
12      char name[10];      /* first name */
13      char id[10];        /* ID number */
14      off_t pos;          /* position in file, for demonstration */
15  } people[] = {
16      { "arnold", "123456789", 0 },
17      { "miriam", "987654321", 10240 },
18      { "joe", "192837465", 81920 },
19  };
20
21  int
22  main(int argc, char **argv)
23  {
24      int fd;
25      int i, j;
26
27      if (argc < 2) {
28          fprintf(stderr, "usage: %s file\n", argv[0]);
29          return 1;
30      }
31
32      fd = open(argv[1], O_RDWR|O_CREAT|O_TRUNC, 0666);
33      if (fd < 0) {
34          fprintf(stderr, "%s: %s: cannot open for read/write: %s\n",
35                  argv[0], argv[1], strerror(errno));
36          return 1;
37      }
38
39      j = sizeof(people) / sizeof(people[0]);     /* count of elements */
Lines 27-30 make sure that the program was invoked properly. Lines 32-37 open
the named file and verify that the open succeeded.
4.5 Random Access: Moving Around within a File 105
The calculation on line 39 of j, the array element count, uses a lovely, portable trick:
The number of elements is the size of the entire array divided by the size of the first
element. The beauty of this idiom is that it's always right: No matter how many elements
you add to or remove from such an array, the compiler will figure it out. It also doesn't
require a terminating sentinel element; that is, one in which all the fields are set to zero,
NULL, or some such.
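The idiom is often wrapped in a macro. The sketch below shows one way to do so; the macro name NELEMS and the sample array are hypothetical, and the trick applies only to true arrays, not to pointers (for which sizeof yields the size of the pointer).

```c
#include <stddef.h>

/* NELEMS --- hypothetical macro name; valid only for true arrays */
#define NELEMS(a)   (sizeof(a) / sizeof((a)[0]))

static const char *const weekdays[] = { "Mon", "Tue", "Wed", "Thu", "Fri" };

/* weekday_count --- the compiler computes this; no sentinel entry is needed */
size_t weekday_count(void)
{
    return NELEMS(weekdays);
}
```

Adding or removing an initializer in weekdays[] changes the result automatically, with no other edits required.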
The work is done by a loop (lines 41-55), which seeks to the byte offset given in
each structure (line 42) and then writes the structure out (line 49):
41      for (i = 0; i < j; i++) {
42          if (lseek(fd, people[i].pos, SEEK_SET) < 0) {
43              fprintf(stderr, "%s: %s: seek error: %s\n",
44                      argv[0], argv[1], strerror(errno));
45              (void) close(fd);
46              return 1;
47          }
48
49          if (write(fd, &people[i], sizeof(people[i])) != sizeof(people[i])) {
50              fprintf(stderr, "%s: %s: write error: %s\n",
51                      argv[0], argv[1], strerror(errno));
52              (void) close(fd);
53              return 1;
54          }
55      }
56
57      /* all ok here */
58      (void) close(fd);
59      return 0;
60  }
We happen to know that each disk block in the file uses 4096 bytes. (How we know
that is discussed in Section 5.4.2, "Retrieving File Information," page 141. For now,
take it as a given.) The final bc command indicates that a file of size 81,944 bytes needs
21 disk blocks. However, the -s option to ls, which tells us how many blocks a file
really uses, shows that the file uses only 16 blocks!3 The missing blocks in the file are
the holes. This is illustrated in Figure 4.2.
            arnold              miriam              joe
Block:      1                   3                   21
FIGURE 4.2
Holes in a file
NOTE ch04-holes.c does direct binary I/O. This nicely illustrates the beauty
of random access I/O: You can treat a disk file as if it were a very large array of
binary data structures.
In practice, storing live data by using binary I/O is a design decision that you
should consider carefully. For example, suppose you need to move the data to
a system using different byte orders for integers? Or different floating-point
3 At least three of these blocks contain the data that we wrote out; the others are for use by the operating system
in keeping track of where the data reside.
4.6 Creating Files 107
combinations are often expressed in octal, particularly for the chmod and umask
commands. For example, file permissions -rw-r--r-- is equivalent to octal 0644 and
-rwxr-xr-x is equivalent to octal 0755. (The leading 0 is C's notation for octal values.)
When you create a file, you must know the protections to be given to the new file.
You can do this as a raw octal number if you choose, and indeed it's not uncommon
to see such numbers in older code. However, it is better to use a bitwise OR of one or
more of the symbolic constants from <sys/stat.h>, described in Table 4.5.
TABLE 4.5
POSIX symbolic constants for file modes
Older code used S_IREAD, S_IWRITE, and S_IEXEC together with bit shifting to
TABLE 4.6
Additional POSIX symbolic constants for file modes
When standard utilities create files, the default permissions they use are -rw-rw-rw-
(or 0666). Because most users prefer to avoid having files that are world-writable, each
process carries with it a umask. The umask is a set of permission bits indicating those
bits that should never be allowed when new files are created. (The umask is not used
when changing permissions.) Conceptually, the operation that occurs is
actual_permissions = (requested_permissions & (~umask));
The umask is usually set by the umask command in $HOME/.profile when you
log in. From a C program, it's set with the umask() system call:
The return value is the old umask. Thus, to determine the current mask, you must
set it to a value and then reset it (or change it, as desired):
mode_t mask = umask(0);     /* retrieve current mask */
(void) umask(mask);         /* restore it */
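To watch the umask at work, a program can set it, create a file, and inspect the permissions the file actually received. The sketch below does this; the helper name create_with_umask() is hypothetical, and fstat() with its st_mode field is borrowed from the next chapter.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * create_with_umask --- hypothetical helper: create path requesting mode
 * `requested' while the umask is `mask'.  Returns the permission bits the
 * file actually received, or (mode_t) -1 on error.
 */
mode_t create_with_umask(const char *path, mode_t requested, mode_t mask)
{
    struct stat sb;
    mode_t old = umask(mask);       /* install mask, remember old one */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, requested);

    (void) umask(old);              /* restore the caller's umask */
    if (fd < 0)
        return (mode_t) -1;

    if (fstat(fd, &sb) < 0) {
        (void) close(fd);
        return (mode_t) -1;
    }
    (void) close(fd);

    return sb.st_mode & 0777;       /* keep only the permission bits */
}
```

With a request of 0666 and a mask of 022, the result should be 0644, matching 'requested & ~mask'.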
The mode argument represents the permissions for the new file (as discussed in the
previous section). The file named by pathname is created, with the given permission
as modified by the umask. It is opened for writing (only), and the return value is the
file descriptor for the new file or -1 if there was a problem. In this case, errno indicates
the error. If the file already exists, it will be truncated when opened.
In all other respects, file descriptors returned by creat() are the same as those
returned by open(); they're used for writing and seeking and must be closed with
close():
4 Yes, that's how it's spelled. Ken Thompson, one of the two "fathers" of Unix, was once asked what he would
have done differently if he had it to do over again. He replied that he would have spelled creat() with an "e."
Indeed, that is exactly what he did for the Plan 9 From Bell Labs operating system.
Earlier, we said that when opening a file for plain I/O, we could ignore the mode
argument. Having seen creat(), though, you can probably guess that open() can
also be used for creating files and that the mode argument is used in this case. This is
indeed true.
Besides the O_RDONLY, O_WRONLY, and O_RDWR flags, additional flags may be bitwise
OR'd when open() is called. The POSIX standard mandates a number of these
additional flags. Table 4.7 presents the flags that are used for most mundane applications.
TABLE 4.7
Additional POSIX flags for open()
Flag        Meaning
O_APPEND    Force all writes to occur at the end of the file.
O_CREAT     Create the file if it doesn't exist.
O_EXCL      When used with O_CREAT, cause open() to fail if the file already exists.
O_TRUNC     Truncate the file (set it to zero length) if it exists.
Given O_APPEND and O_TRUNC, you can imagine how the shell might open or create
files corresponding to the > and >> operators. For example:
int fd;
extern char *filename;
mode_t mode = S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH;  /* 0666 */
Note that the O_EXCL flag would not be used here, since for both > and >>, it's not
an error for the file to exist. Remember also that the system applies the umask to the
requested permissions.
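Putting the pieces together, the two redirections might be opened roughly as follows. This is a minimal sketch under stated assumptions: the function name open_redirect() and its append parameter are hypothetical, and a real shell does considerably more work (such as rearranging file descriptors afterward).

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

/*
 * open_redirect --- hypothetical helper: open filename the way a shell
 * might for `> filename' (append == 0) or `>> filename' (append != 0).
 * Returns a file descriptor, or -1 on error.
 */
int open_redirect(const char *filename, int append)
{
    mode_t mode = S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH;  /* 0666 */
    int flags = O_WRONLY | O_CREAT | (append ? O_APPEND : O_TRUNC);

    return open(filename, flags, mode);
}
```

In both cases O_CREAT creates the file if needed; the only difference is whether existing contents are discarded (O_TRUNC) or preserved and added to (O_APPEND).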
Also, it's easy to see that, at least conceptually, creat() could be written this easily:
int creat(const char *path, mode_t mode)
NOTE If a file is opened with O_APPEND, all data will be written at the end of
the file, even if the current position has been reset with lseek().
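This behavior is easy to verify. The following sketch (the function name append_demo() is hypothetical) writes one line, seeks back to the start of the file, and writes a second line; with O_APPEND the second line still lands at the end instead of overwriting the first.

```c
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * append_demo --- hypothetical demonstration: returns the file offset
 * after both writes, or -1 on error.
 */
off_t append_demo(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_APPEND, 0600);
    off_t end;

    if (fd < 0)
        return -1;

    if (write(fd, "first\n", 6) != 6
        || lseek(fd, (off_t) 0, SEEK_SET) < 0   /* rewind to the start */
        || write(fd, "second\n", 7) != 7) {     /* still goes at the end */
        (void) close(fd);
        return -1;
    }

    end = lseek(fd, (off_t) 0, SEEK_CUR);       /* where are we now? */
    (void) close(fd);
    return end;
}
```

If nothing was overwritten, the file holds both lines, 13 bytes in all.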
Modern systems provide additional flags whose uses are more specialized. Table 4.8
describes them briefly.
TABLE 4.8
Additional advanced POSIX flags for open()
Flag          Meaning
O_NOCTTY      If the device being opened is a terminal, it does not become the process's
              controlling terminal. (This is a more advanced topic, discussed briefly in
              Section 9.2.1, page 312.)
O_NONBLOCK    Disables blocking of I/O operations in certain cases (see Section 9.4.3.4,
              page 333).
O_DSYNC       Ensure that data written to a file make it all the way to physical storage before
              write() returns.
O_RSYNC       Ensure that any data that read() would read, which may have been written
              to the file being read, have made it all the way to physical storage before
              read() returns.
O_SYNC        Like O_DSYNC, but also ensure that all file metadata, such as access times, have
              also been written to physical storage.
The O_DSYNC, O_RSYNC, and O_SYNC flags need some explanation. Unix systems
(including Linux) maintain an internal cache of disk blocks, called the buffer cache.
When the write() system call returns, the data passed to the operating system have
been copied to a buffer in the buffer cache. They are not necessarily written out to
the disk.
The buffer cache provides considerable performance improvement: Since disk I/O
is often an order of magnitude or more slower than CPU and memory operations,
programs would slow down considerably if they had to wait for every write to go all
the way through to the disk. In addition, if data have recently been written to a file, a
subsequent read of that same data will find the information already in the buffer cache,
where it can be returned immediately instead of having to wait for an I/O operation
to read it from the disk.
Unix systems also do read-ahead; since most reads are sequential, upon reading one
block, the operating system will read several more consecutive disk blocks so that their
information will already be in the buffer cache when a program asks for it. If multiple
programs are reading the same file, they all benefit since they will all get their data from
the same copy of the file's disk blocks in the buffer cache.
All of this caching is wonderful, but of course there's no free lunch. While data are
in the buffer cache and before they have been written to disk, there's a small-but very
real-window in which disaster can strike; for example, if the power goes out. Modern
disk drives exacerbate this problem: Many have their own internal buffers, so while
data may have made it to the drive, it may not have made it onto the media when the
power goes out! This can be a significant issue for small systems that aren't in a data
center with controlled power or that don't have an uninterruptible power supply (UPS).5
For most applications, the chance that data in the buffer cache might be inadvertently
lost is acceptably small. However, for some applications, any such chance is not
acceptable. Thus, the notion of synchronous I/O was added to Unix systems, whereby a program
can be guaranteed that if a system call has returned, the data are safely written on a
physical storage device.
The O_DSYNC flag guarantees data integrity; the data and any other information that
the operating system needs to find the data are written to disk before write() returns.
However, metadata, such as access and modification times, may not be written to disk.
The O_SYNC flag requires that metadata also be written to disk before write() returns.
(Here too there is no free lunch; synchronous writes can seriously affect the performance
of a program, slowing it down noticeably.)
5 If you don't have a UPS and you use your system for critical work, we highly recommend investing in one. You
should also be doing regular backups.
4.7 Forcing Data to Disk 113
The O_RSYNC flag is for data reads: If read() finds data in the buffer cache that were
scheduled for writing to disk, then read() won't return that data until they have been
written to disk. The other two flags can affect this: In particular, O_SYNC will cause
read() to wait until the file metadata have been written out as well.
NOTE As of kernel version 2.4, Linux treats all three flags the same, with
essentially the meaning of O_SYNC. Furthermore, Linux defines additional flags
that are Linux specific and intended for specialized uses. Check the GNU/Linux
open(2) manpage for more information.
The fdatasync() system call is like O_DSYNC: It forces all file data to be written to
the final physical device. The fsync() system call is like O_SYNC, forcing not just file
data, but also file metadata, to physical storage. The fsync() call is more portable; it
has been around in the Unix world for longer and is more likely to exist across a broad
range of systems.
You can use these calls with <stdio.h> file pointers by first calling fflush() and
then using fileno() to obtain the underlying file descriptor. Here is an fpsync()
function that can be used to wrap both operations in one call. It returns 0 on success:
    return 0;
}
Technically, both of these calls are extensions to the base POSIX standard: fsync()
in the "File Synchronization" extension (FSC), and fdatasync() in the "Synchronized
Input and Output" extension. Nevertheless, you can use them on a GNU/Linux system
without any problem.
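A wrapper that combines fflush(), fileno(), and fsync() can be sketched as follows. The name fpsync() and the 0-on-success convention come from the text; the body shown here is one plausible arrangement and is an assumption, as is the choice to return -1 for a NULL pointer.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * fpsync --- flush a stdio stream's buffer to the operating system,
 * then force the data (and metadata) to physical storage.
 * Returns 0 on success, -1 on failure.
 */
int fpsync(FILE *fp)
{
    if (fp == NULL || fflush(fp) == EOF || fsync(fileno(fp)) < 0)
        return -1;

    return 0;
}
```

The order matters: fflush() must come first, since fsync() knows nothing about data still sitting in the <stdio.h> buffer.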
As should be obvious from the parameters, truncate() takes a filename argument,
whereas ftruncate() works on an open file descriptor. (The xxx() and fxxx()
naming convention for system call pairs that work on a filename or file descriptor is
common. We see several examples in this and subsequent chapters.) For both, the
length argument is the new size of the file.
This system call originated in 4.2 BSD Unix, and in early systems could only be used
to shorten a file's length, hence the name. (It was created to simplify implementation
of the truncate operation in Fortran.) On modern systems, including Linux, the name
is a misnomer, since it's possible to extend the length of a file with these calls, not
just shorten a file. (However, POSIX indicates that the ability to extend a file is an
XSI extension.)
For these calls, the file being truncated must have write permission (for truncate()),
or have been opened for writing (for ftruncate()). If the file is being shortened, any
data past the new end of the file are lost. (Thus, you can't shorten the file, lengthen it
again, and expect to find the original data.) If the file is extended, as with data written
after an lseek(), the data between the old end of the file and the new end of file read
as zeros.
4.10 Exercises 115
These calls are very different from 'open(file, ... | O_TRUNC, mode)'. The latter
truncates a file completely, throwing away all its data. These calls simply set the file's
absolute length to the given value.
These functions are fairly specialized; they're used only four times in all of the
GNU Coreutils code. We present an example use of ftruncate() in Section 5.5.3,
"Changing Timestamps: utime()," page 157.
4.9 Summary
• When a system call fails, it usually returns -1, and the global variable errno is set
to a predefined value indicating the problem. The functions perror() and
strerror() can be used for reporting errors.
• Files are manipulated by small integers called file descriptors. File descriptors for
standard input, standard output, and standard error are inherited from a program's
parent process. Others are obtained with open() or creat(). They are closed
with close(), and getdtablesize() returns the maximum number of allowed
open files. The value of the umask (set with umask()) affects the permissions
given to new files created with creat() or the O_CREAT flag for open().
• The read() and write() system calls read and write data, respectively. Their
interface is simple. In particular, they do no interpretation of the data; files are
linear streams of bytes. The lseek() system call provides random access I/O: the
ability to move around within a file.
• Additional flags for open() provide for synchronous I/O, whereby data make it
all the way to the physical storage media before write() or read() return. Data
can also be forced to disk on a controlled basis with fsync() or fdatasync().
• The truncate() and ftruncate() system calls set the absolute length of a file.
(On older systems, they can only be used to shorten a file; on modern systems
they can also extend a file.)
Exercises
1. Using just open(), read(), write(), and close(), write a simple copy
program that copies the file named by its first argument to the file named by
its second.
2. Enhance the copy program to accept "-" to mean "standard input" if used
as the first argument and "standard output" as the second. Does 'copy - -'
work correctly?
3. Look at the proc(5) manpage on a GNU/Linux system. In particular the fd
subsection. Do an 'ls -l /dev/fd' and examine the files in the
/proc/self/fd directly. If /dev/stdin and friends had been around in the
early versions of Unix, how would that have simplified the code for the V7 cat
program? (Many other modern Unix systems have a /dev/fd directory or
filesystem. If you're not using GNU/Linux, see what you can discover about
your Unix version.)
4. Even though you don't understand it yet, try to copy the code segment from
the V7 cat.c that uses the struct stat and the fstat() function into
ch04-cat.c so that it too reports an error for 'cat file >> file'.
5. (Easy.) Assuming the existence of strerror(), write your own version of
perror().
6. What is the result of 'ulimit -n' on your system?
7. Write a simple version of the umask program, named myumask, that takes an
octal mask on the command line. Use strtol() with a base of 8 to convert
the character string command-line argument into an integer value. Change the
umask to the new mask with the umask() system call.
Compile and run myumask, and then examine the value of the umask
with the regular umask command. Explain the results. (Hint: in Bash, enter
'type umask'.)
8. Change the simple copy program you wrote earlier to use open() with the
O_SYNC flag. Using the time command, compare the performance of the
original version and the new version on a large file.
9. For ftruncate(), we said that the file must have been opened for writing.
How can a file be open for writing when the file itself doesn't have write
permission?
10. Write a truncate program whose usage is 'truncate file length'.
In this chapter
This chapter continues the climb up the learning curve toward the next plateau:
understanding directories and information about files.
In this chapter we explore how file information is stored in a directory, how direc-
tories themselves are read, created, and removed, what information about files is
available, and how to retrieve it. Finally, we explore other ways to update file
metadata, such as the owner, group, permissions, and access and modification times.
5.1.1 Definitions
[Cartoon: USER FRIENDLY by Illiad]
Partition
A unit of physical storage. Physical partitions are typically either part of a disk or
an entire disk. Modern systems make it possible to create logical partitions from
multiple physical ones.
5.1 Considering Directory Contents 119
Filesystem
A partition (physical or logical) that contains file data and metadata, information
about files (as opposed to the file contents, which is information in the files). Such
metadata include file ownership, permissions, size, and so on, as well as information
for use by the operating system in locating file contents.
You place filesystems "in" partitions (a one-to-one correspondence) by writing
standard information in them. This is done with a user-level program, such as
mke2fs on GNU/Linux, or newfs on Unix. (The Unix mkfs command makes
partitions but is difficult to use directly. newfs calls it with the correct parameters.
If your system is a Unix system, see the newfs(8) and mkfs(8) manpages for
the details.)
For the most part, GNU/Linux and Unix hide the existence of filesystems and
partitions. (Further details are given in Section 8.1, "Mounting and Unmounting
Filesystems," page 228). Everything is accessed by pathnames, without reference
to which disk a file lives on. (Contrast this with almost every other commercial
operating system, such as OpenVMS, or the default behavior of any
Microsoft system.)
Inode
Short for "index node," initially abbreviated "i-node" and now written "inode."
A small block of information describing everything about a file except the file's
name(s). The number of inodes, and thus the number of unique files per filesystem,
is set and made permanent when the filesystem is created. 'df -i' can tell you
how many inodes you have and how many are used.
Device
In the context of files, filesystems, and file metadata, a unique number representing
an in-use ("mounted") filesystem. The (device, inode) pair uniquely identifies the
file: Two different files are guaranteed to have different (device, inode) pairs. This
is discussed in more detail later in this chapter.
Directory
A special file, containing a list of (inode number, name) pairs. Directories can be
opened for reading but not for writing; the operating system makes all the changes
to a directory's contents.
120 Chapter 5 • Directories and File Metadata
Conceptually, each disk block contains either some number of inodes, or file data.
The inode, in turn, contains pointers to the blocks that contain the file's data. See
Figure 5.1.
[Figure: a partition shown as a row of inode blocks at the front, followed by data blocks]
FIGURE 5.1
Conceptual view of inode and data blocks
The figure shows all the inode blocks at the front of the partition and the data blocks
after them. Early Unix filesystems were indeed organized this way. However, while all
modern systems still have inodes and data blocks, the organization has changed for
improved efficiency and robustness. The details vary from system to system, and even
within GNU/Linux systems there are multiple kinds of filesystems, but the concepts
are the same.
 23    .        Dot
 19    ..       Dot-dot
 ...   ...      Filename
FIGURE 5.2
Conceptual directory contents
#ifndef DIRSIZ
#define DIRSIZ 14
#endif
struct direct
{
    ino_t d_ino;
    char d_name[DIRSIZ];
};
An ino_t is defined in the V7 <sys/types.h> as 'typedef unsigned int
ino_t;'. Since a PDP-11 int is 16 bits, so too is the ino_t. This organization made
it easy to read directories directly; since the size of an entry was fixed, the code was
simple. (The only thing to watch out for was that a full 14-character d_name was not
NUL-terminated.)
Directory content management was also easy for the system. When a file was removed
from a directory, the system replaced the inode number with a binary zero, signifying
that the "slot" in the directory was unused. New files could then reuse the empty slot.
This helped keep the size of directory files themselves reasonable. (By convention, inode
number 1 is unused; inode number 2 is always the first usable inode. More details are
provided in Section 8.1, "Mounting and Unmounting Filesystems," page 228.)
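The two pitfalls of the fixed-size format, the possibly unterminated name and the zero inode number marking an empty slot, can be illustrated with a short sketch. Everything here is hypothetical scaffolding: struct v7_direct merely imitates the V7 layout (using unsigned short on the assumption that it is 16 bits wide), and v7_entry_name() is not a function from any real system.

```c
#include <string.h>

#define V7_DIRSIZ 14            /* same value as DIRSIZ in the text */

struct v7_direct {              /* hypothetical stand-in for struct direct */
    unsigned short d_ino;       /* assumed 16 bits, like a PDP-11 unsigned int */
    char d_name[V7_DIRSIZ];     /* not NUL-terminated when all 14 bytes are used */
};

/*
 * v7_entry_name --- copy an entry's name into buf, which must hold at
 * least V7_DIRSIZ + 1 bytes, and always terminate it.
 * Returns buf, or NULL if the slot is unused.
 */
char *v7_entry_name(const struct v7_direct *dp, char *buf)
{
    if (dp->d_ino == 0)         /* inode number 0 marks an empty slot */
        return NULL;

    memcpy(buf, dp->d_name, V7_DIRSIZ);
    buf[V7_DIRSIZ] = '\0';      /* supply the terminator ourselves */
    return buf;
}
```

Shorter names are already padded with zero bytes in the entry, so the extra terminator is harmless for them and essential for 14-character names.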
Modern systems provide long filenames. Each directory entry is of variable length,
with a common limit of 255 bytes for the filename component of the directory. Later
on, we show how to read a directory's contents on a modern system. Modern systems
also provide 32-bit (or even 64-bit!) inode numbers.
Since directory entries associate filenames with inodes, it is possible for one file to
have multiple names. Each directory entry referring to the same inode is called a link,
or hard link, to the file. Links are created with the ln command. The usage is 'ln
oldfile newfile':
$ ln message msg                        Create a link
$ cat msg                               Show contents of new name
hello, world
$ ls -il msg message                    Show inode numbers
 228786 -rw-r--r--    2 arnold   devel          13 May  4 15:43 message
 228786 -rw-r--r--    2 arnold   devel          13 May  4 15:43 msg
The output shows that the inode numbers of the two files are the same, and the third
field in the long output is now 2. This field is the link count, which reports how many
links (directory entries referring to the inode) the file has.
It cannot be emphasized enough: Hard links all refer to the same file. If you change
one, you have changed the others:
$ echo "Hi, how ya doin' ?" > msg       Change file by new name
$ cat message                           Show contents by old name
Hi, how ya doin' ?
$ ls -il message msg                    Show info. Size changed
 228786 -rw-r--r--    2 arnold   devel          19 May  4 15:51 message
 228786 -rw-r--r--    2 arnold   devel          19 May  4 15:51 msg
Although we've created two links to the same file in a single directory, hard links are
not restricted to being in the same directory; they can be in any other directory on the
same filesystem. (This is discussed a bit more in Section 5.1.6, "Symbolic Links,"
page 128.)
Additionally, you can create a link to a file you don't own as long as you have write
permission in the directory in which you're creating the link. (Such a file retains all the
attributes of the original file: the owner, permissions, and so on. This is because it is
the original file; it has only acquired an additional name.) User-level code cannot create
a hard link to a directory.
Once a link is removed, creating a new file by the same name as the original file
creates a new file:
$ rm message                            Remove old name
$ echo "What's happenin?" > message     Reuse the name
$ ls -il msg message                    Show information
 228794 -rw-r--r--    1 arnold   devel          17 May  4 15:58 message
 228786 -rw-r--r--    1 arnold   devel          19 May  4 15:51 msg
Notice that the link counts for both files are now equal to 1.
The return value is 0 if the link was created successfully, or -1 otherwise, in which
case errno reflects the error. An important failure case is one in which newpath already
exists. The system won't remove it for you, since attempting to do so can cause
inconsistencies in the filesystem.
 64  int
 65  main (int argc, char **argv)
 66  {
 67    program_name = argv[0];
 68    setlocale (LC_ALL, "");
 69    bindtextdomain (PACKAGE, LOCALEDIR);
 70    textdomain (PACKAGE);
 71
 72    atexit (close_stdout);
 73
 74    parse_long_options (argc, argv, PROGRAM_NAME, GNU_PACKAGE, VERSION,
 75                        AUTHORS, usage);
 76
 77    /* The above handles --help and --version.
 78       Since there is no other invocation of getopt, handle `--' here.  */
 79    if (1 < argc && STREQ (argv[1], "--"))
 80      {
 81        --argc;
 82        ++argv;
 83      }
 84
 85    if (argc < 3)
 86      {
 87        error (0, 0, _("too few arguments"));
 88        usage (EXIT_FAILURE);
 89      }
 90
 91    if (3 < argc)
 92      {
 93        error (0, 0, _("too many arguments"));
 94        usage (EXIT_FAILURE);
 95      }
 96
 97    if (link (argv[1], argv[2]) != 0)
 98      error (EXIT_FAILURE, errno, _("cannot create link %s to %s"),
 99             quote_n (0, argv[2]), quote_n (1, argv[1]));
100
101    exit (EXIT_SUCCESS);
102  }
1. If the new name for the file names an existing file, remove the existing file first.
2. Create a new link to the file by the new name.
3. Remove the old name (link) for the file. (Removing names is discussed in the
next section.)
Early versions of the mv command did work this way. However, when done this way,
file renaming is not atomic; that is, it doesn't happen in one uninterruptible operation.
And, on a heavily loaded system, a malicious user could take advantage of race
conditions,1 subverting the rename operation and substituting a different file for the
original one.
1 A race condition is a situation in which details of timing can produce unintended side effects or bugs. In this case,
the directory, for a short period of time, is in an inconsistent state, and it is this inconsistency that introduces
the vulnerability.
126 Chapter 5 • Directories and File Metadata
As with other system calls, a 0 return indicates success, and a return value of -1
indicates an error.
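For contrast with the three-step approach, here is a minimal sketch of replacing one file with another through a single rename() call. The helper name replace_file() is hypothetical; the only behavior relied on is the return convention just described and the fact that rename() replaces an existing destination in one step.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* replace_file --- hypothetical helper: give oldname the name newname.
   An existing newname is replaced by the same single system call.
   Returns 0 on success, -1 on error with errno left as rename() set it. */
int
replace_file(const char *oldname, const char *newname)
{
    if (rename(oldname, newname) != 0) {
        int saved_errno = errno;        /* fprintf() may change errno */

        fprintf(stderr, "cannot rename %s to %s: %s\n",
                oldname, newname, strerror(saved_errno));
        errno = saved_errno;
        return -1;
    }
    return 0;
}
```

A common use is to write new contents to a temporary file in the same directory and then call replace_file(tempname, realname), so that readers see either the old file or the complete new one.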
Given our discussion of file links, the name makes sense; this call removes the given
link (directory entry) for the file. It returns 0 on success and -1 on error.
The ability to remove a file requires write permission only for the directory and not for
the file itself. This fact can be confusing, particularly for new Linux/Unix users. However,
since the operation is one on the directory, this makes sense; it is the directory contents
that are being modified, not the file's contents.2
2 Indeed, the file's metadata are changed (the number of links), but that does not affect any other file attribute,
nor does it affect the file's contents. Updating the link count is the only operation on a file that doesn't involve
checking the file's permissions.
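To make the "removes a link, not the file" idea concrete, the sketch below creates a file, unlinks its only name while it is still open, and looks at the link count through the open descriptor. The helper name links_after_unlink() is hypothetical, and the expectation of a zero link count assumes an ordinary local filesystem; network filesystems may behave differently.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* links_after_unlink --- hypothetical helper: create path, remove its only
   name while the file is still open, and return the link count reported
   for the open descriptor.  Returns -1 on any error. */
long
links_after_unlink(const char *path)
{
    struct stat sbuf;
    long count = -1;
    int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600);

    if (fd < 0)
        return -1;

    /* unlink() removes the directory entry; the open file lives on. */
    if (unlink(path) == 0 && fstat(fd, &sbuf) == 0)
        count = (long) sbuf.st_nlink;

    close(fd);      /* the storage is released once the file is closed */
    return count;
}
```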
5.1 Considering Directory Contents 127
While not technically a system call, the return value is in the same vein: 0 on success
and -1 on error, with errno reflecting the value.
On GNU/Linux, remove() uses the unlink() system call to remove files, and the
rmdir() system call (discussed later in the chapter) to remove directories. (On older
GNU/Linux systems not using GLIBC, remove() is an alias for unlink(); this fails
on directories. If you have such a system, you should probably upgrade it.)
Large systems often have many partitions, both on physically attached local disks
and on remotely mounted network filesystems. The hard-link restriction to the same
filesystem is inconvenient, for example, if some files or directories must be moved to a
new location, but old software uses a hard-coded filename for the old location.
To get around this restriction, 4.2 BSD introduced symbolic links. A symbolic link
(also referred to as a soft link) is a special kind of file (just as a directory is a special kind
of file). The contents of the file are the pathname of the file being "pointed to." All
modern Unix systems, including Linux, provide symbolic links; indeed they are now
part of POSIX.
Symbolic links may refer to any file anywhere on the system. They may also refer to
directories. This makes it easy to move directories from place to place, with a symbolic
link left behind in the original location pointing to the new location.
When processing a filename, the system notices symbolic links and instead performs
the action on the pointed-to file or directory. Symbolic links are created with the -s
option to ln:
$ /bin/pwd                        Where are we
/d/home/arnold                    On a different filesystem
$ ln -s /tmp/message ./hello      Create a symbolic link
$ cat hello                       Use it
Hi, how ya doin' ?
$ ls -l hello                     Show information about it
lrwxrwxrwx 1 arnold devel 12 May 4 16:41 hello -> /tmp/message
The file pointed to by the link need not exist. The system detects this at runtime
and acts appropriately:
$ rm /tmp/message                 Remove pointed-to file
$ cat ./hello                     Attempt to use it by the soft link
cat: ./hello: No such file or directory
$ echo hi again > hello           Create new file contents
$ ls -l /tmp/message              Show pointed-to file info ...
-rw-r--r-- 1 arnold devel 9 May 4 16:45 /tmp/message
$ cat /tmp/message                ... and contents
hi again
The oldpath argument names the pointed-to file or directory, and newpath is the
name of the symbolic link to be created. The return value is 0 on success and -1 on
error; see your symlink(2) man page for the possible errno values.
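Because the pointed-to file need not exist, a program sometimes wants to know whether a symbolic link is "dangling." The sketch below is one way to check, under the assumption that a link is dangling when lstat() succeeds on the link itself but stat(), which follows the link, fails with ENOENT. The helper name is_dangling() is hypothetical.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* is_dangling --- hypothetical helper: return 1 if path is a symbolic link
   whose target does not exist, 0 if it is anything else that can be
   examined, and -1 if path itself cannot be examined. */
int
is_dangling(const char *path)
{
    struct stat sbuf;

    if (lstat(path, &sbuf) != 0)        /* look at the link itself */
        return -1;
    if (!S_ISLNK(sbuf.st_mode))
        return 0;                       /* not a symbolic link at all */
    if (stat(path, &sbuf) == 0)         /* follow the link */
        return 0;
    return (errno == ENOENT) ? 1 : -1;
}
```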
Symbolic links have their disadvantages:
• They take up extra disk space, requiring a separate inode and data block. Hard
links take up only a directory slot.
• They add overhead. The kernel has to work harder to resolve a pathname
containing symbolic links.
• They can introduce "loops." Consider the following:
$ rm -f a b                       Make sure 'a' and 'b' don't exist
$ ln -s a b                       Symlink old file 'a' to new file 'b'
$ ln -s b a                       Symlink old file 'b' to new file 'a'
$ cat a                           What happens?
cat: a: Too many levels of symbolic links
The kernel has to be able to detect this case and produce an error message.
• They are easy to break. If you move the pointed-to file to a different location or
rename it, the symbolic link is no longer valid. This can't happen with a hard link.
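The loop case in the list above can also be observed from C. This is a small sketch, assuming the kernel reports the condition through the ELOOP errno value, which is what produces the "Too many levels of symbolic links" message. The helper name open_errno() is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* open_errno --- hypothetical helper: try to open path read-only.
   Returns 0 on success, or the errno value from the failed open(). */
int
open_errno(const char *path)
{
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return errno;

    close(fd);
    return 0;
}
```

With two links that point at each other, open_errno() on either name is expected to return ELOOP rather than hang.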
Both return 0 on success and -1 on error, with errno set appropriately. For mkdir(),
the mode argument represents the permissions to be applied to the directory. It is
completely analogous to the mode arguments for creat() and open() discussed in
Section 4.6, "Creating Files," page 106.
Both functions handle the '.' and '..' in the directory being created or removed. A
directory must be empty before it can be removed; errno is set to ENOTEMPTY if
the directory isn't empty. (In this case, "empty" means the directory contains only '.'
and '..'.)
New directories, like all files, are assigned a group ID number. Unfortunately, how
this works is complicated. We delay discussion until Section 11.5.1, "Default Group
for New Files and Directories," page 412.
Both functions work one directory level at a time. If /somedir exists and
/somedir/sub1 does not, 'mkdir("/somedir/sub1/sub2")' fails. Each component
in a long pathname has to be created individually (thus the -p option to mkdir,
see mkdir(1)).
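The "one level at a time" rule can be worked around in the manner of 'mkdir -p' by calling mkdir() once per component. The following is only a sketch of that idea, not the code of the mkdir command; the helper name make_path() is hypothetical. It treats EEXIST as success, so it does not notice a final component that exists but is not a directory, and the mode must give the owner write and search permission for the deeper levels to succeed.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>

/* make_path --- hypothetical helper: create every missing component of
   path, one mkdir() call per level.  Returns 0 on success, -1 on error. */
int
make_path(const char *path, mode_t mode)
{
    size_t len = strlen(path);
    char *buf, *p;
    int result = 0;
    int saved_errno;

    if (len == 0) {
        errno = ENOENT;
        return -1;
    }
    if ((buf = malloc(len + 1)) == NULL)
        return -1;
    memcpy(buf, path, len + 1);

    /* Stop at each '/' after the first character; create that prefix. */
    for (p = buf + 1; result == 0 && *p != '\0'; p++) {
        if (*p != '/')
            continue;
        *p = '\0';                      /* temporarily end the string */
        if (mkdir(buf, mode) != 0 && errno != EEXIST)
            result = -1;
        *p = '/';
    }

    /* Finally, the full pathname itself. */
    if (result == 0 && mkdir(buf, mode) != 0 && errno != EEXIST)
        result = -1;

    saved_errno = errno;                /* keep errno across free() */
    free(buf);
    errno = saved_errno;
    return result;
}
```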
Also, if pathname ends with a / character, mkdir() and rmdir() will fail on some
systems and succeed on others. The following program, ch05-trymkdir.c,
demonstrates both aspects.
5.2 Creating and Removing Directories 131
Line 60 opens the directory for reading (a second argument of 0, equal to O_RDONLY).
Line 65 reads the struct direct. Line 66 is the check for an empty directory slot;
that is, one with an inode number of 0. Lines 67 and 68 check for '.' and '..'. Upon
reaching line 69, we know that some other filename has been seen and, therefore, that
the directory isn't empty.
(The test '!strcmp(s1, s2)' is a shorter way of saying 'strcmp(s1, s2) == 0';
that is, testing that the strings are equal. For what it's worth, we consider the
'!strcmp(s1, s2)' form to be poor style. As Henry Spencer once said, "strcmp()
is not a boolean!")
When 4.2 BSD introduced a new filesystem format that allowed longer filenames
and provided better performance, it also introduced several new functions to provide
a directory-reading abstraction. This suite of functions is usable no matter what the
underlying filesystem and directory organization are. The basic parts of it are what is
standardized by POSIX, and programs using it are portable across GNU/Linux and
Unix systems.
};
For portability, POSIX specifies only the d_name field, which is a zero-terminated
array of bytes representing the filename part of the directory entry. The size of d_name
is not specified by the standard, other than to say that there may be at most NAME_MAX
bytes before the terminating zero. (NAME_MAX is defined in <limits.h>.) The XSI
extension to POSIX provides for the d_ino inode number field.
In practice, since filenames can be of variable length and NAME_MAX is usually fairly
large (like 255), the struct dirent contains additional members that aid in the
bookkeeping of variable-length directory entries on disk. These additional members
are not relevant for everyday code.
The following functions provide the directory-reading interface:
#include <sys/types.h>                POSIX
#include <dirent.h>
The DIR type is analogous to the FILE type in <stdio.h>. It is an opaque type,
meaning that application code is not supposed to know what's inside it; its contents
are for use by the other directory routines. If opendir() returns NULL, the named
directory could not be opened for reading and errno is set to indicate the error.
This program is quite similar to ch04-cat.c (see Section 4.2, "Presenting a Basic
Program Structure," page 84); the main() function is almost identical. The primary
difference is that it defaults to using the current directory if there are no arguments
(lines 20-21).
5.3 Reading Directories 135
29  /*
30   * process --- do something with the directory, in this case,
31   *             print inode/name pairs on standard output.
32   *             Returns 0 if all ok, 1 otherwise.
33   */
34
35  int
36  process(char *dir)
37  {
38      DIR *dp;
39      struct dirent *ent;
40
41      if ((dp = opendir(dir)) == NULL) {
42          fprintf(stderr, "%s: %s: cannot open for reading: %s\n",
43                  myname, dir, strerror(errno));
44          return 1;
45      }
46
47      errno = 0;
48      while ((ent = readdir(dp)) != NULL)
49          printf("%8ld %s\n", ent->d_ino, ent->d_name);
50
51      if (errno != 0) {
52          fprintf(stderr, "%s: %s: reading directory entries: %s\n",
53                  myname, dir, strerror(errno));
54          return 1;
55      }
56
57      if (closedir(dp) != 0) {
58          fprintf(stderr, "%s: %s: closedir: %s\n",
59                  myname, dir, strerror(errno));
60          return 1;
61      }
62
63      return 0;
64  }
The process() function does all the work, and the majority of it is error-checking
code. The heart of the function is lines 48 and 49:
while ((ent = readdir(dp)) != NULL)
    printf("%8ld %s\n", ent->d_ino, ent->d_name);
This loop reads directory entries, one at a time, until readdir() returns NULL. The
loop body prints the inode number and filename of each entry. Here's what happens
when the program is run:
The output is not sorted in any way; it represents the linear contents of the directory.
(We describe how to sort the directory contents in Section 6.2, "Sorting and Searching
Functions," page 181.)
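The same opendir(), readdir(), closedir() pattern works for tasks other than printing. As a sketch, the following counts the entries in a directory other than '.' and '..', which is also a way to tell whether a directory is "empty" in the sense rmdir() requires. The helper name count_entries() is hypothetical.

```c
#include <dirent.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/* count_entries --- hypothetical helper: count the entries in dir other
   than "." and "..".  Returns the count, or -1 on error. */
long
count_entries(const char *dir)
{
    DIR *dp;
    struct dirent *ent;
    long count = 0;

    if ((dp = opendir(dir)) == NULL)
        return -1;

    errno = 0;      /* lets us tell end-of-directory from a read error */
    while ((ent = readdir(dp)) != NULL) {
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;
        count++;
    }

    if (errno != 0) {
        int saved_errno = errno;

        closedir(dp);
        errno = saved_errno;
        return -1;
    }

    if (closedir(dp) != 0)
        return -1;

    return count;
}
```

A return value of 0 means the directory holds nothing but '.' and '..'.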
4 GNU/Linux systems are capable of mounting filesystems from many non-Unix operating systems. Many
commercial Unix systems can also mount MS-DOS filesystems. Assumptions about Unix filesystems don't apply in
such cases.
Many system calls, such as open(), read(), and write(), are meant to be called
directly from user-level application code: in other words, from code that you, as a
GNU/Linux developer, would write.
However, other system calls exist only to make it possible to implement higher-level,
standard library functions and should not be called directly. The GNU/Linux
getdents() system call is one such; it reads multiple directory entries into a buffer
provided by the caller (in this case, the code that implements readdir()). The
readdir() code then returns valid directory entries from the buffer, one at a time,
refilling the buffer as needed.
These for-library-use-only system calls can be distinguished from for-user-use system
calls by their appearance in the man page. For example, from getdents(2):
NAME
     getdents - get directory entries
SYNOPSIS
     #include <unistd.h>
     #include <linux/types.h>
     #include <linux/dirent.h>
     #include <linux/unistd.h>
     int getdents(unsigned int fd, struct dirent *dirp, unsigned int count);
Any system call that uses a _syscallX() macro should not be called by application
code. (More information on these calls can be found in the intro(2) manpage; you should
read that manpage if you haven't already.)
In the case of getdents(), many other Unix systems have a similar system call;
sometimes with the same name, sometimes with a different name. Thus, trying to use
these calls would only lead to a massive portability mess anyway; you're much better off
in all cases using readdir(), whose interface is well defined, standard, and portable.
struct dirent {
};
TABLE 5.1
Values for d_type
Name Meaning
Knowing the file's type just by reading the directory entry is very handy; it can save
a possibly expensive stat() system call. (The stat() call is described shortly, in
Section 5.4.2, "Retrieving File Information," page 141.)
/* Caveat Emptor: POSIX XSI uses long, not off_t, for both functions */
off_t telldir(DIR *dir);                    Return current position
void seekdir(DIR *dir, off_t offset);       Move to given position
5.4 Obtaining Information about Files 139
These routines are similar to the ftell() and fseek() functions in <stdio.h>.
They return the current position in a directory and set the current position to a
previously retrieved value, respectively.
These routines are included in the XSI part of the POSIX standard, since they make
sense only for directories that are implemented with linear storage of directory entries.
Besides the assumptions made about the underlying directory structure, these routines
are riskier to use than the simple directory-reading routines. This is because the contents
of a directory might be changing dynamically: As files are added to or removed from a
directory, the operating system adjusts the contents of the directory. Since directory
entries are of variable length, it may be that the absolute offset saved at an earlier time
no longer represents the start of a directory entry! Thus, we don't recommend that you
use these functions unless you have to.
Regular files
    As the name implies; used for data, executable programs, and anything else you
    might like. In an 'ls -l' listing, they show up with a '-' in the first character of
    the permissions (mode) field.
Directories
    Special files for associating file names with inodes. In an 'ls -l' listing, they show
    up with a d in the first character of the permissions field.
Symbolic links
    As described earlier in the chapter. In an 'ls -l' listing, they show up with an l
    (letter "ell," not digit 1) in the first character of the permissions field.
Devices
    Files representing both physical hardware devices and software pseudo-devices.
    There are two kinds:
    Block devices
        Devices on which I/O happens in chunks of some fixed physical record size,
        such as disk drives and tape drives. Access to such devices goes through the
        kernel's buffer cache. In an 'ls -l' listing, they show up with a b in the first
        character of the permissions field.
    Character devices
        Also known as raw devices. Originally, character devices were those on which
        I/O happened a few bytes at a time, such as terminals. However, the character
        device is also used for direct I/O to block devices such as tapes and disks,
        bypassing the buffer cache.5 In an 'ls -l' listing, they show up with a c in
        the first character of the permissions field.
Named pipes
    Also known as FIFOs ("first-in first-out") files. These special files act like pipes;
    data written into them by one program can be read by another; no data go to or
    from the disk. FIFOs are created with the mkfifo command; they are discussed
    in Section 9.3.2, "FIFOs," page 319. In an 'ls -l' listing, they show up with a
    p in the first character of the permissions field.
Sockets
    Similar in purpose to named pipes,6 they are managed with the socket interprocess
    communication (IPC) system calls and are not otherwise dealt with in this book.
    In an 'ls -l' listing, they show up with an s in the first character of the
    permissions field.
5 Linux uses the block device for disks exclusively. Other systems use both.
6 Named pipes and sockets were developed independently by the System V and BSD Unix groups, respectively.
As Unix systems reconverged, both kinds of files became universally available.
The stat() function accepts a pathname and returns information about the given
file. It follows symbolic links; that is, when applied to a symbolic link, stat() returns
information about the pointed-to file, not about the link itself. For those times when
you want to know if a file is a symbolic link, use the lstat() function instead; it does
not follow symbolic links.
The fstat() function retrieves information about an already open file. It is
particularly useful for file descriptors 0, 1, and 2 (standard input, output, and error), which
are already open when a process starts up. However, it can be applied to any open file.
(An open file descriptor will never relate to a symbolic link; make sure you
understand why.)
The value passed in as the second parameter should be the address of a struct
stat, declared in <sys/stat.h>. As with the struct dirent, the struct stat
contains at least the following members:
struct stat {
};
(The layout may be different on different architectures.) This structure uses a number
of typedef'd types. Although they are all (typically) integer types, the use of specially
defined types allows them to have different sizes on different systems. This keeps
user-level code that uses them portable. Here is a fuller description of each field.
st_dev
    The device for a mounted filesystem. Each mounted filesystem has a unique value
    for st_dev.
st_ino
    The file's inode number within the filesystem. The (st_dev, st_ino) pair
    uniquely identifies the file.
st_mode
    The file's type and its permissions encoded together in one field. We will shortly
    see how to extract this information.
st_nlink
    The number of hard links to the file (the link count). This can be zero if the file
    was unlinked after being opened.
st_uid
    The file's UID (owner number).
st_gid
    The file's GID (group number).
st_rdev
    The device type if the file is a block or character device. st_rdev encodes
    information about the device. We will shortly see how to extract this information. This
    field has no meaning if the file is not a block or character device.
st_size
    The logical size of the file. As mentioned in Section 4.5, "Random Access: Moving
    Around within a File," page 102, a file may have holes in it, in which case the size
    may not reflect the true amount of storage space that it occupies.
st_blksize
    The "block size" of the file. This represents the preferred size of a data block for
    I/O to or from the file. This is almost always larger than a physical disk sector.
    Older Unix systems don't have this field (or st_blocks) in the struct stat.
    For the Linux ext2 and ext3 filesystems, this value is 4096.
st_blocks
    The number of "blocks" used by the file. On Linux, this is in units of 512-byte
    blocks. On other systems, the size of a block may be different; check your local
    stat(2) manpage. (This number comes from the DEV_BSIZE constant in
    <sys/param.h>. This constant isn't standardized, but it is fairly widely used on
    Unix systems.)
    The number of blocks may be more than 'st_size / 512'; besides the data
    blocks, a filesystem may use additional blocks to store the locations of the data
    blocks. This is particularly necessary for large files.
st_atime
    The file's access time; that is, the last time the file's data were read.
st_mtime
    The file's modification time; that is, the last time the file's data were written or
    truncated.
st_ctime
    The file's inode change time. This indicates the last time when the file's metadata
    changed, such as the permissions or the owner.
NOTE The st_ctime field is not the file's "creation time"! There is no such
thing in a Linux or Unix system. Some early documentation referred to the
st_ctime field as the creation time. This was a misguided effort to simplify
the presentation of the file metadata.
The time_t type used for the st_atime, st_mtime, and st_ctime fields represents
dates and times. These time-related values are sometimes termed timestamps. Discussion
of how to use a time_t value is delayed until Section 6.1, "Times and Dates," page 166.
Similarly, the uid_t and gid_t types represent user and group ID numbers, which
are discussed in Section 6.3, "User and Group Names," page 195. Most of the other
types are not of general interest.
Once stbuf has been filled in by the system, the following macros can be called,
being passed stbuf.st_mode as the argument:
S_ISREG(stbuf.st_mode)
    Returns true if filename is a regular file.
S_ISDIR(stbuf.st_mode)
    Returns true if filename is a directory.
S_ISCHR(stbuf.st_mode)
    Returns true if filename is a character device. Devices are shortly discussed in
    more detail.
S_ISBLK(stbuf.st_mode)
    Returns true if filename is a block device.
S_ISFIFO(stbuf.st_mode)
    Returns true if filename is a FIFO.
S_ISLNK(stbuf.st_mode)
    Returns true if filename is a symbolic link. (This can never return true if stat()
    or fstat() were used instead of lstat().)
S_ISSOCK(stbuf.st_mode)
    Returns true if filename is a socket.
NOTE It happens that on GNU/Linux, these macros return 1 for true and 0
for false. However, on other systems, it's possible that they return an arbitrary
nonzero value for true, instead of 1. (POSIX specifies only nonzero vs. zero.)
Thus, you should always use these macros as standalone tests instead of testing
the return value:
if (S_ISREG(stbuf.st_mode)) ...           Correct
if (S_ISREG(stbuf.st_mode) == 1) ...      Incorrect
Along with the macros, <sys/stat.h> provides two sets of bitmasks. One set is for
testing permission, and the other set is for testing the type of a file. We saw the
permission masks in Section 4.6, "Creating Files," page 106, when we discussed the mode_t
type and values for open() and creat(). The bitmasks, their values for GNU/Linux,
and their meanings are described in Table 5.2.
Several of these masks serve to isolate the different sets of bits encoded in the
st_mode field:
• S_IFMT represents bits 12-15, which are where the different types of files are
encoded.
• S_IRWXU represents bits 6-8, which are the user's permission (read, write, execute
for User).
• S_IRWXG represents bits 3-5, which are the group's permission (read, write, execute
for Group).
• S_IRWXO represents bits 0-2, which are the "other" permission (read, write, execute
for Other).
The permission and file type bits are depicted graphically in Figure 5.3.
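As a sketch of how the permission masks isolate their fields, the functions below extract each three-bit group and build the nine-character string that 'ls -l' shows. The shift counts assume the bit positions listed above; the function names are hypothetical, and the setuid, setgid, and sticky bits are deliberately ignored.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Each function returns one three-bit permission group as a value 0-7. */
unsigned int user_bits(mode_t mode)  { return (mode & S_IRWXU) >> 6; }
unsigned int group_bits(mode_t mode) { return (mode & S_IRWXG) >> 3; }
unsigned int other_bits(mode_t mode) { return mode & S_IRWXO; }

/* perm_string --- hypothetical helper: fill buf (at least 10 bytes) with
   the rwx characters for mode, e.g. "rwxr-xr--".  Returns buf. */
char *
perm_string(mode_t mode, char *buf)
{
    static const mode_t bits[9] = {
        S_IRUSR, S_IWUSR, S_IXUSR,
        S_IRGRP, S_IWGRP, S_IXGRP,
        S_IROTH, S_IWOTH, S_IXOTH
    };
    static const char letters[] = "rwxrwxrwx";
    int i;

    for (i = 0; i < 9; i++)
        buf[i] = (mode & bits[i]) ? letters[i] : '-';
    buf[9] = '\0';
    return buf;
}
```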
TABLE 5.2
POSIX file-type and permission bitmasks in <sys/stat.h>
The file-type masks are standardized primarily for compatibility with older code;
they should not be used directly, because such code is less readable than the
corresponding macros. It happens that the macros are implemented, logically enough, with the
masks, but that's irrelevant for user-level code.
FIGURE 5.3
Permission and file-type bits (bit positions 15 down to 0)
The POSIX standard explicitly states that no new bitmasks will be standardized in
the future and that tests for any additional kinds of file types that may be added will
be available only as S_ISxxx() macros.
Instead of the file size, ls displays the major and minor numbers. In the case of the
hard disk, /dev/hda represents the whole drive. /dev/hda1, /dev/hda2, and so on,
represent partitions within the drive. They all share the same major device number (3),
but have different minor device numbers.
Note that the disk devices are block devices, whereas /dev/null is a character device.
Block devices and character devices are separate entities; even if a character device and
a block device share the same major device number, they are not necessarily related.
The major and minor device numbers can be extracted from a dev_t value with the
major() and minor() functions defined in <sys/sysmacros.h>:
#include <sys/types.h>                Common
#include <sys/sysmacros.h>
    if (argc != 2) {
        fprintf(stderr, "usage: %s path\n", argv[0]);
        exit(1);
    }
    if (S_ISCHR(sbuf.st_mode))
        devtype = "char";
    else if (S_ISBLK(sbuf.st_mode))
        devtype = "block";
    else {
        fprintf(stderr, "%s is not a block or character device\n", argv[1]);
        exit(1);
    }
    exit(0);
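The pieces shown above can be gathered into a self-contained helper. This is a sketch, not the book's program: the name device_numbers() and its interface are assumptions, and it relies on the GLIBC major() and minor() macros from <sys/sysmacros.h>.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>

/* device_numbers --- hypothetical helper: store the major and minor device
   numbers of the device file path in *majp and *minp.
   Returns 0 on success, -1 if path can't be examined or isn't a device. */
int
device_numbers(const char *path, unsigned int *majp, unsigned int *minp)
{
    struct stat sbuf;

    if (stat(path, &sbuf) != 0)
        return -1;

    /* st_rdev is meaningful only for block and character devices. */
    if (!S_ISCHR(sbuf.st_mode) && !S_ISBLK(sbuf.st_mode))
        return -1;

    *majp = major(sbuf.st_rdev);
    *minp = minor(sbuf.st_rdev);
    return 0;
}
```

A caller might print the result with something like 'printf("major: %u, minor: %u\n", maj, min)' after a successful call on a name such as /dev/null.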
Fortunately, the output agrees with that of ls, giving us confidence7 that we have
indeed written correct code.
Reproducing the output of ls is all fine and good, but is it really useful? The answer
is yes. Any application that works with file hierarchies must be able to distinguish among
all the different types of files. Consider an archiver such as tar or cpio. It would be
disastrous if such a program treated a disk device file as a regular file, attempting to
read it and store its contents in an archive! Or consider find, which can perform
arbitrary actions based on the type and other attributes of files it encounters. (find is
a complicated program; see find(1) if you're not familiar with it.) Or even something
as simple as a disk space accounting package has to distinguish regular files from
everything else.
This code should now make sense. Line 31 calls fstat() on the standard output
to fill in the statb structure. Line 32 throws away all the information in
statb.st_mode except the file type, by ANDing the mode with the S_IFMT mask.
Line 33 checks that the file being used for standard output is not a device file. In that
case, the program saves the device and inode numbers in dev and ino. These values
are then checked for each input file in lines 50-56:
50      fstat(fileno(fi), &statb);
51      if (statb.st_dev==dev && statb.st_ino==ino) {
52          fprintf(stderr, "cat: input %s is output\n",
53              fflg?"-": *argv);
54          fclose(fi);
55          continue;
56      }
If an input file's st_dev and st_ino values match those of the output file, then cat
complains and continues to the next file named on the command line.
The check is done unconditionally, even though dev and ino are set only if
the output is not a device file. This works out OK, because of how those variables
are declared:
17  int dev, ino = -1;
Since ino is initialized to -1, no valid inode number will ever be equal to it.8 That
dev is not so initialized is sloppy, but not a problem, since the test on line 51 requires
that both the device and inode be equal. (A good compiler will complain that dev is
used without being initialized: 'gcc -Wall' does.)
Note also that neither call to fstat() is checked for errors. This too is sloppy,
although less so; it is unlikely that fstat() will fail on a valid file descriptor.
The test for input file equals output file is done only for nondevice files. This makes
it possible to use cat to copy input from device files to themselves, such as
with terminals:
$ tty                                  Print current terminal device name
/dev/pts/3
$ cat /dev/pts/3 > /dev/pts/3          Copy keyboard input to screen
this is a line of text                 Type in a line
this is a line of text                 cat repeats it
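The same (st_dev, st_ino) comparison is easy to write in modern code. The sketch below is not the V7 logic itself: it checks stat() for errors and needs no sentinel value, since both structures are filled in before they are compared. The helper name same_file() is hypothetical.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* same_file --- hypothetical helper: return 1 if both paths name the same
   file, 0 if they name different files, -1 if either can't be examined. */
int
same_file(const char *path1, const char *path2)
{
    struct stat sb1, sb2;

    if (stat(path1, &sb1) != 0 || stat(path2, &sb2) != 0)
        return -1;

    /* Both the device and the inode number must match. */
    return sb1.st_dev == sb2.st_dev && sb1.st_ino == sb2.st_ino;
}
```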
We already saw that the symlink() system call creates a symbolic link. But given
an existing symbolic link, how can we retrieve the name of the file it points to? (ls
obviously can, so we ought to be able to also.)
Opening the link with open() in order to read it with read() won't work; open()
follows the link to the pointed-to file. Symbolic links thus necessitate an additional
system call, named readlink():
8 This statement was true for V7; there are no such guarantees on modern systems.
readlink() places the contents of the symbolic link named by path into the buffer
pointed to by buf. No more than bufsiz characters are copied. The return value is
the number of characters placed in buf or -1 if an error occurred. readlink() does
not supply the trailing zero byte.
Note that if the buffer passed in to readlink() is too small, you will lose
information; the full name of the pointed-to file won't be available. To properly use
readlink(), your code should do the following:
count = readlink(linkfile, realfile, PATH_MAX);      Read the link
if (count != sbuf.st_size)
    /* something weird going on, handle it */
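As a complement to the fragment above, here is a hedged sketch of a fixed-buffer wrapper that supplies the missing zero byte and refuses to return a possibly truncated name. Instead of comparing against a size from lstat(), it treats a result that fills the whole buffer as "too long." The helper name read_link() and the use of ENAMETOOLONG for that case are assumptions of this example.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* read_link --- hypothetical helper: copy the contents of the symbolic
   link linkfile into buf, which holds bufsize bytes, and terminate it.
   Returns the length of the link's contents, or -1 on error, including
   the case where the contents may not fit along with the zero byte. */
ssize_t
read_link(const char *linkfile, char *buf, size_t bufsize)
{
    ssize_t count = readlink(linkfile, buf, bufsize);

    if (count < 0)
        return -1;              /* not a symbolic link, no such file, ... */

    if ((size_t) count >= bufsize) {
        /* Buffer completely filled: contents may have been cut off,
           and there is no room for the terminating zero byte. */
        errno = ENAMETOOLONG;
        return -1;
    }

    buf[count] = '\0';          /* readlink() doesn't do this for us */
    return count;
}
```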
the contents of a symbolic link into storage allocated by malloc(). We show here just
the function; most of the file is boilerplate definitions. Line numbers are relative to the
start of the file:
55  /* Call readlink to get the symbolic link value of FILENAME.
56     Return a pointer to that NUL-terminated string in malloc'd storage.
57     If readlink fails, return NULL (caller may use errno to diagnose).
58     If realloc fails, or if the link value is longer than SIZE_MAX :-),
59     give a diagnostic and exit.  */
60
61  char *
62  xreadlink (char const *filename)
63  {
64    /* The initial buffer size for the link value.  A power of 2
65       detects arithmetic overflow earlier, but is not required.  */
66    size_t buf_size = 128;
67
68    while (1)
69      {
70        char *buffer = xmalloc (buf_size);
71        ssize_t link_length = readlink (filename, buffer, buf_size);
72
73        if (link_length < 0)
74          {
75            int saved_errno = errno;
76            free (buffer);
77            errno = saved_errno;
78            return NULL;
79          }
80
81        if ((size_t) link_length < buf_size)
82          {
83            buffer[link_length] = 0;
84            return buffer;
85          }
86
87        free (buffer);
88        buf_size *= 2;
89        if (SSIZE_MAX < buf_size || (SIZE_MAX / 2 < SSIZE_MAX && buf_size == 0))
90          xalloc_die ();
91      }
92  }
The function body consists of an infinite loop (lines 68-91), broken at line 84 which
returns the allocated buffer. The loop starts by allocating an initial buffer (line 70) and
reading the link (line 71). Lines 73-79 handle the error case, saving and restoring errno
so that it can be used correctly by the calling code.
Lines 81-85 handle the "success" case, in which the link's contents' length is smaller
than the buffer size. In this case, the terminating zero is supplied (line 83) and then the
buffer returned (line 84), breaking the infinite loop. This ensures that the entire link
contents have been placed into the buffer, since readlink() has no way to indicate
"insufficient space in buffer."
Lines 87-88 free the buffer and double the buffer size for the next try at the top of
the loop. Lines 89-90 handle the case in which the link's size is too big: buf_size is
greater than SSIZE_MAX, or SSIZE_MAX is larger than the value that can be represented
in a signed integer of the same size as used to hold SIZE_MAX and buf_size has wrapped
around to zero. (These are unlikely conditions, but strange things do happen.) If either
condition is true, the program dies with an error message. Otherwise, the function
continues around to the top of the loop to make another try at allocating a buffer and
reading the link.
Some further explanation: The 'SIZE_MAX / 2 < SSIZE_MAX' condition is true
only on systems on which 'SIZE_MAX < 2 * SSIZE_MAX'; we don't know of any, but
only on such a system can buf_size wrap around to zero. Since in practice this
condition can't be true, the compiler can optimize away the whole expression, including the
following 'buf_size == 0' test. After reading this code, you might ask, "Why not use
lstat() to retrieve the size of the symbolic link, allocate a buffer of the right size with
malloc(), and be done?" Well, there are a number of reasons.9
Finally, when the buffer isn't big enough, xreadlink() uses free() and malloc()
with a bigger size, instead of realloc(), to avoid the useless copying that realloc()
does. (The comment on line 58 is thus out of date since realloc() isn't being used;
this is fixed in the post-5.0 version of the Coreutils.)
chown() works on a pathname argument, fchown() works on an open file, and
lchown() works on symbolic links instead of on the files pointed to by symbolic links.
In all other respects, the three calls work identically, returning 0 on success and -1
on error.
It is noteworthy that one system call changes both the owner and group of a file. To
change only the owner or only the group, pass in a value of -1 for the ID number that
is to be left unchanged.
While you might think that you could pass in the corresponding value from a previously
retrieved struct stat for the file or file descriptor, that method is more error
prone. There's a race condition: The owner or group could have changed between the
call to stat() and the call to chown().
You might wonder, "Why be able to change ownership of a symbolic link? The
permissions and ownership on them don't matter." But what happens if a user leaves,
but all his files are still needed? It's necessary to be able to change the ownership on all
the person's files to someone else, including symbolic links.
GNU/Linux systems normally do not permit ordinary (non-root) users to change
the ownership of ("give away") their files. Changing the group to one of the user's
groups is allowed, of course. The restriction on changing owners follows BSD systems,
156 Chapter 5 • Directories and File Metadata
which also have this prohibition. The primary reason is that allowing users to give away
files can defeat disk accounting. Consider a scenario like this:
$ mkdir mywork                        Make a directory
$ chmod go-rwx mywork                 Set permissions to drwx------
$ cd mywork                           Go there
$ myprogram > large_data_file         Create a large file
$ chmod ugo+rw large_data_file        Set permissions to -rw-rw-rw-
$ chown otherguy large_data_file      Give file away to otherguy
In this example, large_data_file now belongs to user otherguy. The original
user can continue to read and write the file, because of the permissions. But otherguy
will be charged for the disk space it occupies. However, since it's in a directory that
belongs to the original user, which cannot be accessed by otherguy, there is no way
for otherguy to remove the file.
Some System V systems do allow users to give away files. (Setuid and setgid files have
the corresponding bit removed when the owner is changed.) This can be a particular
problem when files are extracted from a .tar or .cpio archive; the extracted files end
up belonging to the UID or GID encoded in the archive. On such systems, the tar
and cpio programs have options that prevent this, but it's important to know that
chown()'s behavior does vary across systems.
We will see in Section 6.3, "User and Group Names," page 195, how to relate user
and group names to their corresponding numeric values.
chmod() works on a pathname argument, and fchmod() works on an open file.
(There is no lchmod() call in POSIX, since the system ignores the permission settings
on symbolic links. Some systems do have such a call, though.) As with most other system
calls, these return 0 on success and -1 on failure. Only the file's owner or root can
change a file's permissions.
5.5 Changing Ownership, Permission, and Modification Times 157
The mode value is created in the same way as for open() and creat(), as discussed
in Section 4.6, "Creating Files," page 106. See also Table 5.2, which lists the permission
constants.
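For instance, a mode value can be built by ORing the permission constants together and passed to either call. The following is a minimal sketch; the helper names make_private() and make_world_readable_fd() are hypothetical, chosen only for this illustration.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* make_private: hypothetical helper. Sets permissions on 'path'
   to rw------- (read and write for the owner only). */
int make_private(const char *path)
{
    return chmod(path, S_IRUSR | S_IWUSR);
}

/* make_world_readable_fd: hypothetical helper. Sets permissions on
   an already open file to rw-r--r--, using fchmod(). */
int make_world_readable_fd(int fd)
{
    return fchmod(fd, S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);
}
```

Both helpers return 0 on success and -1 on failure, just as chmod() and fchmod() do.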
The system will not allow setting the setgid bit (S_ISGID) if the group of the file
does not match the effective group ID of the process or one of its supplementary groups.
(We have not yet discussed these issues in detail; see Section 11.1.1, "Real and Effective
IDs," page 405.) Of course, this check does not apply to root or to code running
as root.
10 UTC is a language-independent acronym for Coordinated Universal Time. Older code (and sometimes older
people) refer to this as "Greenwich Mean Time" (GMT), which is the time in Greenwich, England. When time
zones came into widespread use, Greenwich was chosen as the location to which all other time zones are relative,
either behind it or ahead of it.
struct utimbuf {
    time_t actime;     /* access time */
    time_t modtime;    /* modification time */
};
If the call is successful, it returns 0; otherwise, it returns -1. If buf is NULL, then the
system sets both the access time and the modification time to the current time.
To change one time but not the other, use the original value from the struct stat.
For example:
/* Error checking omitted for brevity */
struct stat sbuf;
struct utimbuf ut;
time_t now;

time(&now);                    Get current time of day, see next chapter
stat("/some/file", &sbuf);     Fill in sbuf
ut.actime = sbuf.st_atime;     Access time unchanged
ut.modtime = now;              Set modification time to the current time
utime("/some/file", &ut);      Apply both values
NOTE In new code, you may wish to use the utimes() call (note the s in the
name), which is described later in the book, in Section 14.3.2, "Microsecond
File Times: utimes()," page 545.
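Putting the pieces together with error checking, changing only the modification time might look like the sketch below. The function name set_mtime_keep_atime() is hypothetical. Note that the access time is fetched with stat() and then written back, so the race condition described earlier for chown() applies here as well: the access time could change between the two calls.

```c
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <utime.h>

/* set_mtime_keep_atime: hypothetical helper. Sets the modification
   time of 'path' to 'new_mtime' and writes back the access time
   that stat() reported. Returns 0 on success, -1 on error. */
int set_mtime_keep_atime(const char *path, time_t new_mtime)
{
    struct stat sbuf;
    struct utimbuf ut;

    if (stat(path, &sbuf) < 0)      /* fetch the current times */
        return -1;

    ut.actime = sbuf.st_atime;      /* access time unchanged */
    ut.modtime = new_mtime;         /* new modification time */

    return utime(path, &ut);
}
```

A caller could, for example, pass time(NULL) - 24 * 60 * 60 to make a file appear to have been modified one day ago.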
47  static int
48  utime_null (const char *file)
49  {
50  #if HAVE_UTIMES_NULL
51    return utimes (file, 0);
52  #else
53    int fd;
54    char c;
55    int status = 0;
56    struct stat sb;
57
58    fd = open (file, O_RDWR);
59    if (fd < 0
60        || fstat (fd, &sb) < 0
61        || safe_read (fd, &c, sizeof c) == SAFE_READ_ERROR
62        || lseek (fd, (off_t) 0, SEEK_SET) < 0
63        || full_write (fd, &c, sizeof c) != sizeof c
64        /* Maybe do this -- it's necessary on SunOS4.1.3 with some combination
65           of patches, but that system doesn't use this code: it has utimes.
66           || fsync (fd) < 0
67        */
68        || (sb.st_size == 0 && ftruncate (fd, sb.st_size) < 0)
69        || close (fd) < 0)
70      status = -1;
71    return status;
72  #endif
73  }
74
75  int
76  rpl_utime (const char *file, const struct utimbuf *times)
77  {
78    if (times)
79      return utime (file, times);
80
81    return utime_null (file);
82  }
Lines 33-41 define the struct utimbuf; as the comment says, some systems don't
declare the structure. The utime_null() function does the work. If the utimes()
system call is available, it is used. (utimes() is a similar, but more advanced, system
call, which is covered in Section 14.3.2, "Microsecond File Times: utimes()," page 545.
It also allows NULL for the second argument, meaning use the current time.)
In the case that the times must be updated manually, the code does the update by
first reading a byte from the file, and then writing it back. (The original Unix touch
worked this way.) The operations are as follows:
1. Open the file, line 58.
2. Call fstat() on the file, line 60.
3. Read one byte, line 61. For our purposes, safe_read() acts like read(); it's
explained in Section 10.4.4, "Restartable System Calls," page 357.
4. Seek back to the front of the file with lseek(), line 62. This is done to write
the just-read byte back on top of itself.
5. Write the byte back, line 63. full_write() acts like write(); it is also covered
in Section 10.4.4, "Restartable System Calls," page 357.
6. If the file is of zero size, use ftruncate() to set it to zero size (line 68). This
doesn't change the file, but it has the side effect of updating the access and
modification times. (ftruncate() was described in Section 4.8, "Setting File
Length," page 114.)
7. Close the file, line 69.
These steps are all done in one long successive chain of tests, inside an if. The tests
are set up so that if any operation fails, utime_null() returns -1, like a regular system
call. errno is automatically set by the system, for use by higher-level code.
The rpl_utime() function (lines 75-82) is the "replacement utime()." If the
second argument is not NULL, then it calls the real utime(). Otherwise, it calls
utime_null().