UNIX and Shell Programming (Zer07)

UNIX AND SHELL

PROGRAMMING
B.M. Harwani
Founder & Owner
Microchip Computer Education (MCE)
Ajmer


Oxford University Press is a department of the University of Oxford.
It furthers the University’s objective of excellence in research, scholarship,
and education by publishing worldwide. Oxford is a registered trade mark of
Oxford University Press in the UK and in certain other countries.

Published in India by
Oxford University Press
YMCA Library Building, 1 Jai Singh Road, New Delhi 110001, India

© Oxford University Press 2013

The moral rights of the author/s have been asserted.

First published in 2013

All rights reserved. No part of this publication may be reproduced, stored in
a retrieval system, or transmitted, in any form or by any means, without the
prior permission in writing of Oxford University Press, or as expressly permitted
by law, by licence, or under terms agreed with the appropriate reprographics
rights organization. Enquiries concerning reproduction outside the scope of the
above should be sent to the Rights Department, Oxford University Press, at the
address above.

You must not circulate this work in any other form
and you must impose this same condition on any acquirer.

ISBN-13: 978-0-19-808216-3
ISBN-10: 0-19-808216-9

Typeset in Times
by Quick Sort (India) Private Limited, Chennai
Printed in India by Raj Kamal Electric Press, Kundli, Haryana

Third-party website addresses mentioned in this book are provided
by Oxford University Press in good faith and for information only.
Oxford University Press disclaims any responsibility for the material contained therein.
Dedicated to my mother,
Nita Harwani
Mom, whatever I am today is
because of the moral values taught by you.

I also pay tribute to the officers,
men and women of all ranks,
of the Indian Armed Forces.
I salute these brave, patriotic, and disciplined people
for serving our country.
Preface
The Unix operating system, developed in the late 1960s, is regarded as one of the most powerful operating
systems because of its portability and its use in almost all kinds of environments. It is the result of the
combined efforts of many people: students, professors, researchers, and commercial companies. It is a
multitasking, multi-user operating system that is portable across several hardware platforms and is very
secure. It provides a rich set of tools and utilities that greatly help administrators, programmers, and
users in carrying out their tasks. Unix also offers the flexibility of controlling individual
jobs executed by a user.
Since its inception, Unix has evolved constantly and has given rise to various systems such
as Linux, Ubuntu, FreeBSD, SunOS, Solaris, SCO, and AIX. To understand and learn these
widely used systems, users need a clear understanding of the root, that
is, the actual Unix operating system: its features, management of devices and files, implementation of
security, CPU scheduling, and memory management.
Nowadays, Unix and its derivatives are used as servers and in developing mobile applications. Unix
has also served as a model for the development of the Internet, thus shifting the focus of computers
towards the creation of networks.

ABOUT THE BOOK


The book has been designed to cater to students, teachers, professionals, and developers to help them
learn the fundamental concepts of the Unix operating system. It follows a bottom-up approach, that is,
it explains basic commands and gradually moves towards advanced commands. Similarly, it begins with
small and easy scripts and makes the reader acquainted with the fundamental statements, loops, and
conditional statements in a systematic manner. Gradually, it moves on to explain large, complex, and
critical scripts. The book focuses on advanced Unix commands that perform critical functions such as
setting access permissions, changing ownerships of the files, sharing files among groups, performing
input/output (I/O) redirections, cutting or slicing the file vertically, pasting content, comparing files, and
printing documents. It explains in detail the manipulation of processes and signals and the role of system
calls. All the major editors in Unix, namely, stream editor (sed), visual editor (vi), and modeless editor
(emacs) are explained in detail.
The book describes Bourne, Korn, and C shell programming and covers all important topics and
commands associated with these shells. It also includes numerous programming scripts for better
understanding of the three types of shells. The later part of the book includes dedicated chapters
on language development tools (Yacc, Lex, and M4), text-formatting tools (troff and nroff), and Unix
networking and administration.

KEY FEATURES
The book is packed with numerous student-friendly features that are described here.
• Complete scripts along with their outputs are provided for easy implementation of the concepts learnt.
• Each command is explained along with its syntax and multiple examples.
• Several options of a single command have been provided in a tabular format along with their function,
description, and examples for quick understanding and usage.
• Numerous notes are interspersed with the text for providing additional relevant information.
• Around 1000 solved examples and over 900 end-chapter exercises (with answers to objective-type
questions) are provided.
• Specially designed brain teasers are provided at the end of most chapters for the readers to develop
an analytical approach to problem-solving.
• A variety of objective-type questions—state true or false, fill in the blanks, and multiple-choice
questions—are provided at the end of every chapter for testing the understanding of the concepts
learnt.
• Several review questions and programming exercises are provided for the reader to practise the
commands and scripts explained in the chapters.

ONLINE RESOURCES
The companion website of the book, http://oupinheonline.com/book/harwani-unix-shell-programming/9780198082163,
provides the following additional resources:
For faculty
• Chapter-wise PowerPoint Slides
• Answers to select programming exercises given in the book
For students
• Chapter-wise executable and complete shell scripts and codes for all the programs given in the
book
• Mail Organizer—a small project that sends mail to the desired recipient on a given date
• Inventory Management System—a small project that explains maintenance of inventory using
MySQL database server
• Debugging exercises with solutions
• Flashcards—for active recall of all important Unix commands

ORGANIZATION OF THE BOOK


The book is organized into 15 chapters.
Chapter 1, Unix: An Introduction, focuses on the fundamentals of operating systems, history of Unix,
structure of the Unix operating system, Unix environment, and different types of shells.
Chapter 2, Unix File System, explains the different types of regular and device files, organization of a
file system, accessing, mounting, and unmounting a file system, different blocks of a file system, and
structure of inode blocks.
Chapter 3, Basic Unix Commands, describes basic commands such as logging into the system, changing
the password, checking who is logged in, displaying date and time of the system, and dealing with
file operations such as creating files, displaying their contents, deleting files, creating links to files,
renaming files, and moving files. The chapter also explains commands for maintaining directories,
creating a directory, changing the current directory, removing a directory, displaying calendars, using
basic calculators, displaying information about current systems, deleting symbolic links, and exiting
from a Unix system.
Chapter 4, Advanced Unix Commands, discusses advanced commands such as setting access permissions
for the existing files and directories, setting default permissions for the newly created files and directories,
creating groups, changing ownerships of the files, and sharing files among groups. The chapter covers
commands for sorting content, performing I/O redirections, cutting the file vertically, pasting content,
splitting files, counting characters, words, and lines in files, using the pipe operator, comparing files,
eliminating and displaying duplicate lines, among others.
Chapter 5, File Management and Compression Techniques, explains the types of devices, role of device
drivers, and the way in which devices are represented in the Unix operating system. It details different
disk-related commands required for copying, formatting, finding usage, finding free space, and making
partitions. It also covers compression and decompression of files.
Chapter 6, Manipulating Processes and Signals, focuses on processes and their address space, structure,
data structures describing the processes and process states, commands related to scheduling processes
at the desired time, handling jobs, and switching jobs from the foreground to the background and vice
versa. It explains suspending, resuming, and terminating jobs, executing commands in a batch, ensuring
process execution even when a user logs out, increasing and decreasing the priority of processes, and
killing processes. The chapter also discusses signals, their types, and the methods of signal generation,
virtual memory and its role in executing large applications in a limited physical memory, and mapping
of a virtual address to the physical memory.
Chapter 7, System Calls, is devoted to the role of system calls in performing different tasks. The chapter
explains system calls that are used in file handling operations such as opening, creating, reading from
and writing to files, closing, deleting, and linking to files, changing file access permissions, accessing
file information, and relocating and duplicating file descriptors. The chapter covers the system calls that
perform different tasks related to directory handling such as changing, opening, and reading directories.
The chapter throws light on the system calls involved in process handling operations such as the exec(),
fork(), and wait() system calls and those that deal with memory management: allocating memory, freeing
memory, changing the size of the allocated memory, file locking, and record locking.
Chapter 8, Editors in Unix, explains the usage of the stream editor (sed) in filtering out the desired data
from the specified file, inserting lines, deleting lines, saving filtered content into another file, loading the
content of another file into the current file, and searching for content that matches specific patterns. The
chapter also explains the visual editor (vi) and the modeless editor (emacs).
Chapter 9, AWK Script, discusses the role of the AWK scripts in filtering and processing content. It
explains the different functions used in AWK for printing results, formatting output, and searching for
desired patterns. The chapter also details different operators (comparison, logical, arithmetic), functions
(string, arithmetic, and search and substitute), and built-in variables to perform the desired operations
quickly and with the least effort. It also discusses different loops to perform repetitive tasks, taking input
from the user to perform operations on the desired content.
Chapter 10, Bourne Shell Programming, explains different command line parameters used in Bourne
shell scripts, conditional statements, loops, reading input, displaying output, testing data, translating
content, and searching for patterns in files. The chapter also covers displaying the exit status of the
commands, applying command substitution, sending and receiving messages between users, creating
and using functions, setting and displaying terminal configurations, managing positional parameters,
and handling options in the command line with getopts.
Chapter 11, Korn Shell Programming, helps us in understanding different features of the Korn shell,
command line editing, file name completion, command name aliasing, command history substitution,
and meta characters. It explains different operators, shell variables, basic I/O commands, command line
arguments, if else and case statements, strings, files, loops, arrays, functions, and I/O redirection.
Chapter 12, C Shell Programming, describes the C shell and its different features. The chapter explains
command history, command substitution, filename substitution (globbing), filename completion, and
aliases. It also covers job control, running jobs in the background, and suspending, resuming, and killing
jobs. It aids in the understanding of environment variables, shell variables, built-in shell variables, and
customizing the shell and C shell operators. The chapter also discusses different flow control statements,
loops, arrays, and errors.
Chapter 13, Different Tools and Debuggers, describes language development tools Yacc, Lex, and M4 and
text-formatting tools, troff and nroff. The chapter covers different preprocessors for nroff and troff
such as tbl, eqn, and pic. The chapter also discusses debugger tools, dbx, adb, and sdb.
Chapter 14, Interprocess Communication, covers pipes and messages as well as accessing, attaching,
reading, writing, and detaching the shared memory segment. It helps the readers in getting acquainted
with initializing, managing, and performing operations on sockets (stream and datagram), I/O
multiplexing, filters, and semaphores.
Chapter 15, Unix System Administration and Networking, discusses the Unix booting procedure,
mounting and unmounting file systems, managing user accounts, network security, and backup and
restore.

ACKNOWLEDGEMENTS
I thank my family, my small world: my wife, Anushka, and my wonderful children, Chirag and Naman,
for inspiring and motivating me and forgiving me for spending long hours on the computer during
the course of development of this book.
Speaking of encouragement, I must thank my students who, with their innumerable queries, helped
me understand the essential expectations of a reader. This in turn made me add numerous examples
and exercises, thus giving a practical approach to the book.
My acknowledgements would remain incomplete if I did not thank the editorial team at Oxford
University Press, India, who supported me throughout the development of this book. My special
thanks are due to the reviewers for their constructive comments and valuable suggestions.
I have tried to cover the necessary topics and explain them in a simple and user-friendly manner.
Any comments or suggestions that can be incorporated in future editions of this book may be sent to me
at [email protected].
B.M. Harwani
Brief Contents
Features of the Book iv
Preface vi
Detailed Contents xi
1. Unix: An Introduction 1
2. Unix File System 13
3. Basic Unix Commands 27
4. Advanced Unix Commands 59
5. File Management and Compression Techniques 94
6. Manipulating Processes and Signals 148
7. System Calls 192
8. Editors in Unix 258
9. AWK Script 305
10. Bourne Shell Programming 378
11. Korn Shell Programming 480
12. C Shell Programming 558
13. Different Tools and Debuggers 624
14. Interprocess Communication 653
15. Unix System Administration and Networking 672
Index 697
Detailed Contents
Features of the Book iv
Preface vi
Brief Contents x
1. Unix: An Introduction 1
1.1 Operating System 1
1.1.1 Functions of Operating Systems 2
1.2 History of Unix 3
1.3 Overview and Features of Unix System 4
1.3.1 Multitasking 4
1.3.2 Multi-user 5
1.3.3 Portability 5
1.3.4 Job Control 5
1.3.5 Tools and Utilities 5
1.3.6 Security 6
1.4 Structure of Unix System 6
1.4.1 Hardware 6
1.4.2 Kernel 7
1.4.3 Shell 8
1.4.4 Tools and Applications 9
1.5 Unix Environment 9
1.5.1 Stand-alone Personal Environment 10
1.5.2 Time-sharing Environment 10
1.5.3 Client–Server Environment 10

2. Unix File System 13
2.1 Introduction to Files 13
2.1.1 Types of Files 13
2.1.2 Symbolic Links 15
2.1.3 Pipes 15
2.1.4 Sockets 16
2.2 Organization of File Systems 16
2.3 Accessing File Systems 17
2.3.1 Mounting File Systems 18
2.3.2 Unmounting File Systems 18
2.4 Structure of File Systems 20
2.4.1 Boot Block 20
2.4.2 Super Block 20
2.4.3 Inode Block 21
2.4.4 Data Block 24

3. Basic Unix Commands 27
3.1 login: Logging in to Systems 27
3.2 Overview of Commands 28
3.2.1 Structure 29
3.2.2 Types of Commands in Unix 29

4. Advanced Unix Commands 59
4.1 Overview 59
4.2 File Access Permissions 60
4.2.1 chmod: Changing File Access Permissions 61
4.2.2 umask: Setting Default Permissions 62
4.2.3 chown: Changing File Ownership 64
4.2.4 chgrp: Changing Group Command 65
4.2.5 groups: Displaying Group Membership 66
4.2.6 groups: Sharing Files Among Groups 66
4.3 Input/Output Redirection in Unix 67
4.3.1 Output Redirection Operator 67
4.3.2 Input Redirection Operator 68
4.4 Pipe Operator 68
4.5 cut: Cutting Data from Files 68
4.6 paste: Pasting Data in Files 71
4.7 split: Splitting Files into Lines or Bytes 71
4.8 wc: Counting Characters, Words, and Lines in Files 73
4.9 sort: Sorting Files 73
4.10 head: Displaying Top Contents of Files 74
4.11 tail: Displaying Bottom Contents of Files 75
4.12 diff: Finding Differences between Two Files 75
4.13 cmp: Comparing Files 77
4.14 uniq: Eliminating and Displaying Duplicate Lines 78
4.15 comm: Displaying and Suppressing Unique or Common Content in Two Files 79
4.16 time: Finding Consumed Time 81
4.17 pg: Showing Content Page-wise 82
4.18 lp: Printing Documents 82
4.19 cancel: Cancelling Print Command 84
4.20 Understanding .profile Files 84
4.21 calendar: Getting Reminders 85
4.22 script: Recording Sessions 85
4.23 Conversions between DOS and Unix 86
4.24 man: Displaying Manual 87
4.25 Correcting Typing Mistakes 88

5. File Management and Compression Techniques 94
5.1 Managing and Compressing Files 94
5.2 Computer Devices 95
5.2.1 Dealing with Devices 96
5.2.2 Block device 97
5.2.3 Major and Minor Numbers 98
5.3 Disk-related Commands 98
5.3.1 dd: Copying Disks 99
5.3.2 du: Disk Usage 99
5.3.3 df: Reporting Free and Available Space on File Systems 101
5.3.4 dfspace: Reporting Free Space on File Systems 103
5.3.5 fdisk: Dividing Disks into Partitions 103
5.4 Compressing and Uncompressing Files 105
5.4.1 gzip Command 105
5.4.2 gunzip Command 107
5.4.3 zip Command 109
5.4.4 unzip Command 111
5.4.5 compress Command 111
5.4.6 uncompress Command 114
5.4.7 pack Command 115
5.4.8 unpack Command 115
5.4.9 bzip2 and bunzip2 Commands 117
5.4.10 bunzip2 Command 119
5.4.11 7-zip—Implementing Maximum Compression 119
5.5 Dealing with Files 123
5.5.1 file: Determining File Type 124
5.5.2 find: Locating Files 124
5.5.3 locate: Searching for Files with Specific Strings 129
5.5.4 which/whence: Finding Locations of Programs or Utilities on Disks 130
5.5.5 fsck: Utility for Checking File Systems 130
5.6 Important Unix System Files 135
5.6.1 /etc/passwd 135
5.6.2 /etc/shadow 136
5.6.3 /etc/hosts 136
5.6.4 /etc/hosts.allow and /etc/hosts.deny 137
5.7 Shell Variables 138
5.7.1 User-created Shell Variables 138
5.7.2 System Shell Variables 138
5.8 Export of Local and Global Shell Variables 141

6. Manipulating Processes and Signals 148
6.1 Process Basics 148
6.1.1 Process Address Space 151
6.1.2 Process Structure 151
6.1.3 Creation and Termination of Processes 154
6.2 Process States and Transitions 154
6.3 Zombie Process 156
6.4 Context Switching 157
6.5 Threads 158
6.5.1 Comparison between Threads and Processes 158
6.6 ps: Status of Processes 159
6.7 Handling Jobs 161
6.7.1 fg: Foreground Jobs 162
6.7.2 bg: Background Jobs 162
6.7.3 Switching Jobs from Background to Foreground and Vice Versa 164
6.7.4 jobs: Showing Job Status 164
6.8 Scheduling of Processes 165
6.8.1 cron: Chronograph—Time-based Job Scheduler 166
6.8.2 crontab: Creating Crontab Files 166
6.8.3 at: Scheduling Commands at Specific Dates and Times 167
6.8.4 batch: Executing Commands Collectively 170
6.8.5 nohup: No Hangups 170
6.8.6 nice: Modifying Priority 171
6.8.7 kill: Killing Processes 172
6.9 Signals 173
6.9.1 Classes of Signals 175
6.9.2 Sending Signals Using kill() and raise() 176
6.9.3 Signal Handling Using signal() 177
6.10 Virtual Memory 183
6.10.1 Paging 184
6.10.2 Demand Paging 184
6.10.3 Segmentation 186
6.10.4 Memory-mapped Input/Output 187

7. System Calls 192
7.1 Introduction 192
7.1.1 Operation Modes 192
7.1.2 Kernel Mode 193
7.1.3 User Mode 193
7.2 File-related System Calls 194
7.2.1 open(): Opening Files 195
7.2.2 create(): Creating Files 196
7.2.3 read(): Reading from Files 196
7.2.4 write(): Writing to Files 197
7.2.5 lseek(): Relocating File Descriptors 199
7.2.6 close(): Closing Files 200
7.2.7 mknod(): Creating Files 201
7.2.8 dup() and dup2(): Duplicating File Descriptors 202
7.2.9 link() and symlink(): Linking to Files 203
7.2.10 unlink(): Unlinking Files 205
7.2.11 stat(), fstat(), and lstat(): Accessing File Status Information 205
7.2.12 access(): Checking Permissions 207
7.2.13 chown(), lchown(), and fchown(): Changing Owner and Group of Files 208
7.2.14 chmod() and fchmod(): Changing Permissions of Files 210
7.2.15 umask(): Setting File Mode Creation Mask 211
7.2.16 utime(): Changing Access and Modification Times 211
7.2.17 ioctl(): Controlling Devices 212
7.3 Directory Handling System Calls 213
7.3.1 mkdir() and rmdir(): Creating and Removing Directories 214
7.3.2 chdir(): Changing Directories 215
7.3.3 getcwd(): Determining Current Working Directory 216
7.3.4 opendir(): Opening Directories 217
7.3.5 readdir(): Reading Directories 217
7.3.6 telldir(), seekdir(), and rewinddir(): Knowing, Setting, and Resetting Position in Directory Streams 220
7.3.7 closedir(): Closing Directory Streams 222
7.4 Process-related System Calls 223
7.4.1 exec(): Replacing Executable Binaries with New Processes 223
7.4.2 fork(): Creating New Processes 225
7.4.3 wait(): Waiting 226
7.4.4 exit(): Terminating Processes 227
7.5 Interrupted System Call 228
7.6 Standard C Library Functions 230
7.6.1 Difference between System Calls and Library Functions 230
7.7 Streams and File Input/Output Library Functions 231
7.7.1 fopen(): Opening Files 232
7.7.2 fwrite(): Writing into Files 232
7.7.3 fread(): Reading Data from Files 233
7.7.4 fclose(): Closing Files 234
7.7.5 fflush(): Flushing out to Files 234
7.7.6 fseek(): Relocating File Pointers 234
7.7.7 fgetc(), getc(), and getchar(): Reading Characters 235
7.7.8 fgets() and gets(): Reading Strings 236
7.8 Error Handling 238
7.8.1 Using strerror Function 239
7.8.2 perror(): Displaying Errors 239
7.9 Stream Errors 241
7.10 Functions for Dynamic Memory Management 242
7.10.1 malloc(): Allocating Memory Block 242
7.10.2 calloc(): Allocating Arrays of Memory Blocks 243
7.10.3 realloc(): Resizing Allocated Memory 243
7.10.4 free(): Freeing Allocated Memory 243
7.11 File Locking 245
7.11.1 Creating Lock Files 245
7.11.2 Record Locking 247
7.11.3 Competing Locks 249
7.11.4 Deadlock 252

8. Editors in Unix 258
8.1 Introduction 258
8.2 Stream Editor 259
8.2.1 Actions with Sed 260
8.2.2 Remembered Patterns 269
8.3 Visual Editor 270
8.3.1 Creating and Editing Files 271
8.3.2 Inserting and Appending Text 271
8.3.3 Replacing Text 272
8.3.4 Inserting and Joining Lines 273
8.3.5 Exiting and Writing to Files 273
8.3.6 Navigating—Line Positioning and Cursor Positioning 274
8.3.7 Positioning Cursor on Words 275
8.3.8 Positioning Cursor on Sentences 275
8.3.9 Positioning Cursor on Paragraphs 276
8.3.10 Scrolling through Text 276
8.3.11 Marking Text 276
8.3.12 Deleting and Undoing Text 277
8.3.13 Repeating Previous Commands 278
8.3.14 Going to Specified Lines 278
8.3.15 Searching for and Repeating Search Patterns 278
8.3.16 Searching for Characters 279
8.3.17 Copying, Changing, Pasting, and Filtering Commands 280
8.3.18 Set Commands 280
8.3.19 Reading and Writing across Files 283
8.3.20 Global Substitution—Find and Replace 285
8.3.21 Ex Mode—Line Editor Mode 287
8.3.22 Abbreviating Text Input 294
8.3.23 Mapping Keys of Keyboard 295
8.3.24 Customizing vi Session 295
8.4 Emacs Editor 296
8.4.1 Cursor Movements 297
8.4.2 Quitting Emacs 297
8.4.3 Dealing with Buffers 298
8.4.4 Cutting and Pasting 298
8.4.5 Searching and Replacing 298
8.4.6 Miscellaneous Commands 299

9. AWK Script 305
9.1 AWK Command 305
9.1.1 Versions 305
9.1.2 Advantages and Disadvantages of Using AWK Filters 306
9.2 print: Printing Results 307
9.3 printf: Formatting Output 308
9.4 Displaying Content of Specified Patterns 308
9.5 Comparison Operators 309
9.5.1 ~ and !~: Matching Regular Expressions 310
9.6 Compound Expressions 312
9.7 Arithmetic Operators 315
9.8 Begin and End Sections 315
9.9 User-defined Variables 316
9.10 if else Statement 318
9.11 Built-in Variables 321
9.11.1 fs: Field Separator 322
9.11.2 ofs: Output Field Separator 322
9.12 Changing Input Field Separator 323
9.13 Functions 324
9.13.1 String Functions 325
9.13.2 Arithmetic Functions 334
9.14 Loops 337
9.14.1 for Loop 337
9.14.2 do while Loop 341
9.14.3 while Loop 342
9.15 Getting Input from User 343
9.15.1 getline Command: Reading Input 343
9.16 Search and Substitute Functions 345
9.16.1 sub() 345
9.16.2 gsub() 347
9.16.3 match() 348
9.16.4 toupper() 349
9.16.5 tolower() 349
9.17 Copying Results into Another File 361
9.18 Deleting Content from Files 363
9.19 Arrays 364
9.20 Associative Arrays 366

10. Bourne Shell Programming 378
10.1 Introduction 378
10.2 Beginning Bourne Shell Scripting 379
10.2.1 echo: Displaying Messages and Values 379
10.2.2 Variables 380
10.2.3 expr: Evaluating Expressions 380
10.2.4 let: Assigning and Evaluating Expressions 381
10.2.5 bc: Base Conversion 381
10.2.6 factor: Factorizing Numbers 382
10.2.7 units: Scale Conversion 383
10.3 Writing Shell Scripts 383
10.4 Command Line Parameters 385
10.5 read: Reading Input from Users 385
10.6 for Loop 386
10.7 while Loop 390
10.8 until Loop 392
10.9 if Statement 393
10.10 Bourne Shell Commands 394
10.10.1 test: Testing Expressions for Validity 395
10.10.2 [ ]: Test Command 397
10.10.3 tr: Applying Translation 400
10.10.4 wc: Counting Lines, Words, and Characters 403
10.10.5 grep: Searching Patterns 404
10.10.6 egrep: Searching Extended Regular Expressions 409
10.10.7 Command Substitution 411
10.10.8 cut: Slicing Input 412
10.10.9 paste: Pasting Content 413
10.10.10 sort: Sorting Input 415
10.10.11 uniq: Eliminating and Displaying Duplicate Lines 421
10.10.12 /dev/null: Suppressing Echo 422
10.10.13 Logical Operators 426
10.10.14 exec: Execute Command 429
10.10.15 sleep: Suspending Execution 434
10.10.16 exit: Terminating Programs 435
10.10.17 $?: Observing Exit Status 436
10.10.18 tty: Terminal Command 441
10.10.19 write: Sending and Receiving Messages 442
10.10.20 mesg: Controlling Delivery of Messages 443
10.10.21 wall: Broadcasting Message 444
10.10.22 stty: Setting and Configuring Terminals 444
10.10.23 w ; who: Activities of Logged in User 449
10.10.24 last: Listing Last Logged 449
10.10.25 case Statement 451
10.10.26 Functions 455
10.10.27 select: Creating Menus 457
10.10.28 basename: Extracting Base Filename 460
10.10.29 expr—Advanced Features 462
10.10.30 getopts: Handling Options in Command Line 464
10.10.31 set: Setting Positional Parameters 467
10.10.32 shift: Shifting Command Line Arguments 468
10.10.33 at: Scheduling Execution 469
10.11 Trapping Signals 470

11. Korn Shell Programming 480
11.1 Introduction 480
11.2 Features 480
11.2.1 Command Line Editing 481
11.2.2 Filename Completion 483
11.2.3 Command Name Aliasing 483
11.2.4 Command History Substitution 484
11.3 Korn Shell Meta Characters 484
11.4 Operators 485
11.4.1 Arithmetic and Logical Operators 485
11.4.2 Relational Operators 486
11.5 Variables 486
11.5.1 Shell Variables 486
11.5.2 Environment Variables 487
11.6 Setting Shell Prompts 491
11.6.1 PS1 Variable 492
11.6.2 PS2 Variable 493
11.6.3 PS3 Variable 493
11.6.4 PS4 Variable 495
11.7 Setting Display Environment Variable 496
11.7.1 Terminal 496
11.7.2 Display 497
11.8 Steps to Create and Run Korn Shell Scripts 497
11.9 Basic Input/Output Commands 499
11.9.1 echo 499
11.9.2 print 500
11.9.3 read 500
11.9.4 printf 501
11.9.5 typeset 502
11.9.6 Converting Base 10 to Octal 503
11.9.7 unset 504
11.10 Variable Substitution 505
11.11 Command Line Arguments 506
11.11.1 shift: Shifting Positional Parameters 508
11.11.2 set: Handling Positional Parameters 509
11.11.3 test Command 510
11.12 Pattern-matching Operators 511
11.12.1 If Else Statement 511
11.13 Testing Strings 513
11.14 case...esac Statement 521
11.15 while Loop 524
11.16 break: Breaking out of Loops 526
11.17 continue: Skipping Statements in Loops 527
11.18 until Loop 529
11.19 for Loop 530
11.20 Arrays 537
11.20.1 Indexed Array 538
11.20.2 Associative Array 539
11.21 Functions 540
11.21.1 return Command 541
11.21.2 Passing Arguments to Functions 542
11.21.3 Creating Local Variables 543
11.21.4 Recursion 544
11.22 exit() 546
11.23 $? 546
11.24 Input/Output Redirection 547

12. C Shell Programming 558
12.1 C Shell 558
12.1.1 Features 558
12.1.2 Command History 559
12.1.3 Command Substitution 561
12.1.4 Filename Substitution—Globbing 561
12.1.5 Filename Completion 562
12.1.6 Aliases 563
12.1.7 Job Control 564
12.2 Start-up Files 565
12.2.1 .cshrc File 566
12.2.2 .login File 566
12.2.3 .logout File 567
12.3 Variables 567
12.3.1 Environment Variables 567
12.3.2 Shell Variables 569
12.3.3 Built-in Shell Variables 569
12.3.4 Unsetting Variable 570
12.4 Customizing Shells 571
12.4.1 Setting Primary Prompt 571
12.4.2 Changing History Characters 572
12.4.3 Setting mail Variable 572
12.5 C Shell Operators 573
12.6 Writing and Executing First C Shell Script 576
12.6.1 Reading Data 578
12.6.2 User-defined Shell Variables 579
12.7 Flow Controlling Statements 582
12.7.1 if-then-else Statements 582
12.7.2 Branching with goto 591
12.7.3 exit Command 593
12.7.4 switch, case, breaksw, and endsw Statements 595
12.8 Loops 599
12.8.1 while end Loop 599
12.8.2 repeat Command 602
12.8.3 foreach end Loop 604
12.9 Arrays 612
12.10 Displaying Errors 617

13. Different Tools and Debuggers 624
13.1 Language Development Tools—Yacc, Lex, and M4 624
13.1.1 Yet Another Compiler–Compiler 624
13.1.2 Lexical Analyser 625
13.1.3 m4 626
13.2 Text-Formatting Tools 628
13.2.1 troff 628
13.2.2 nroff 629
13.3 Preprocessors for nroff and troff 630
13.3.1 tbl 630
13.3.2 eqn 633
13.3.3 pic 635
13.3.4 Commands Used in pic 638
13.4 Debugger Tools 639
13.4.1 dbx 640
13.4.2 adb 641
13.4.3 sdb 642
13.5 strip: Discarding Symbols from Object Files 647
13.6 Version-Control Systems 648
13.6.1 Manual Version Control 648
13.6.2 Automated Version Control 648

14. Interprocess Communication 653
14.1 Interprocess Communication 653
14.1.1 Pipes 654
14.1.2 Messages 654
14.1.3 Sockets 654
14.1.4 Shared Memory 657
14.2 Synchronization 661
14.2.1 Mutual Exclusion Locks 661
14.2.2 Semaphores 661
14.3 Input/Output Multiplexing 664
14.3.1 select() System Call 664
14.3.2 pselect() System Call 666
14.4 Filters 666
14.4.1 more Filter 667
14.4.2 less Filter 667
14.4.3 tee Command 668

15. Unix System Administration and Networking 672
15.1 Unix Booting Procedure 672
15.1.1 Single-user Mode 672
15.1.2 Multi-user Mode 673
15.2 Mounting Unix File System 673
15.3 Unmounting Unix File System 674
15.4 Managing User Accounts 674
15.4.1 Creating User Accounts 674
15.4.2 Modifying User Accounts 676
15.4.3 Deleting User Accounts 676
15.4.4 Creating Groups 677
15.4.5 Modifying Groups 677
15.4.6 Deleting Groups 677
15.5 Networking Tools 678
15.5.1 ping 678
15.5.2 nslookup 678
15.5.3 telnet 679
15.5.4 arp 680
15.5.5 netstat 681
15.5.6 route 681
15.5.7 ftp 681
15.5.8 Trivial File Transfer Protocol 683
15.5.9 finger 683
15.5.10 rlogin 683
15.5.11 Unix Network Security 684
15.6 mail Command 685
15.6.1 Sending E-mails 685
15.6.2 Reading Mails 686
15.6.3 Sending Replies 687
15.6.4 Mail Commands 687
15.6.5 Saving Messages 688
15.6.6 Deleting Messages 688
15.6.7 Undeleting Messages 689
15.6.8 Quitting Mail Command 689
15.7 Distributed File System 689
15.7.1 Andrew File System 690
15.8 Firewalls 691
15.8.1 Advantages 692
15.8.2 Building Simple Firewalls 692
15.9 Backup and Restore 692
15.9.1 tar 693
15.9.2 cpio 693
15.9.3 dd 693
15.10 Shut Down and Restart 693
Index 697
CHAPTER 1
Unix: An Introduction
After studying this chapter, the reader will be conversant with the following:
• Fundamentals of operating systems
• History of Unix
• Structure of the Unix operating system
• Various types of shells and their responsibilities
• Numerous features of the Unix operating system
• The Unix environment

Even after four decades of use, Unix is regarded as one of the most powerful operating
systems because of its portability and its use in almost every kind of environment, from
microcomputers to supercomputers.
A computer system cannot be used without an operating system. An operating system is
an interface that enables the use of a computer system’s resources; without one, the
computer is little more than an inert piece of electronics.
In this chapter, before delving into the history and structure of Unix, we will attempt
to understand the following: what an operating system is; why it is essential in running a
computer system; and in what manner Unix is different from the other operating systems
used earlier and in recent times.

1.1 OPERATING SYSTEM


An operating system is the main software component of a computer system. It provides users
with an environment that makes it possible to use the hardware devices of a computer. Without
an operating system, we cannot access any of the resources of the computer system, including
its hardware and software. Examples of popular operating systems available nowadays include
Android, BSD, iOS, Linux, Microsoft Windows, Mac OS X, and z/OS. Apart from Microsoft
Windows and z/OS, all the operating systems in this list are Unix-based or Unix-like.
Let us understand how an operating system is related to hardware, software, and the users
(see Fig. 1.1).
As Fig. 1.1 shows, users interact with the hardware through the operating system. The
operating system, together with the software, creates an environment for the user that
enables easy access to and use of the hardware. Basically, the operating system creates
an interface between the user and the hardware.
The following section discusses the functions that an operating system performs, which
enable easy operation of a computer system.
1.1.1 Functions of Operating Systems
An operating system performs the following functions:
Memory and data management All operating systems provide methods for controlling
data in the memory. When a job has to be performed, the operating system should allocate
the memory for loading that job into the memory.
Communication An operating system should provide methods by which different computer
systems can communicate with one another to exchange data.
Time sharing Time sharing enables several people to use the same computer
simultaneously. A few operating systems support time-sharing features.
Security In a multi-user environment, security should be provided by the operating system.
This security prevents one user from interfering with the work done or being done by another
user. It also prevents unauthorized personnel from using the computer system.
User-command interpretation Through this function, the operating system reads and
interprets the commands typed in by the user and thereby understands what the user wants.
Accounting Through this function, the operating system keeps an account of all the
resources used by different processes. Resources here include memory, CPU time, disk
space, and so on.
Program development tools All operating systems provide program development tools,
which assist users in writing and maintaining programs. Software development is one of the
important features provided by the operating system.
Scheduling A scheduler is the heart of all multi-user operating systems. This program
enables many people to use the computer simultaneously. The scheduler assigns a CPU time
slice to a ready process. When that time slice expires, the process is put back in the queue
to wait for its next turn, the next process in the ready queue is picked, and the CPU is
assigned to it.
Swapping When several users are working simultaneously, their processes are stored in the
memory. When the memory is full and a new process has to be activated, the scheduler takes
a process currently in the memory and copies it to the hard disk. Next, the scheduler starts
the new process in the space freed in the memory. This is known as swapping. When the turn
of the swapped-out process comes again, it is brought back into the memory (swapped in), and
some other process is swapped out. This feature is available in a virtual memory environment.
Fig. 1.1 Operating system in relation to hardware, software, and users (layers, from top to
bottom: Users, Software, Operating system, Hardware)
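The round-robin behaviour described under Scheduling can be illustrated with a small shell sketch. The process names (A, B, C), their remaining run times, and the quantum of 2 units are made-up values chosen for illustration; a real scheduler lives in the kernel and is considerably more elaborate.

```shell
#!/bin/sh
# Round-robin sketch. Each queue entry is name:remaining_time (illustrative values).
quantum=2
queue="A:3 B:5 C:2"
finish_order=""

while [ -n "$queue" ]; do
    set -- $queue              # split the ready queue into positional parameters
    head=$1                    # the process at the front of the queue
    shift
    queue="$*"                 # the rest of the queue
    name=${head%%:*}
    left=${head##*:}
    if [ "$left" -gt "$quantum" ]; then
        # Time slice used up: the process goes to the back of the queue.
        left=$((left - quantum))
        echo "$name runs for $quantum units, $left left; back to the queue"
        queue="${queue:+$queue }$name:$left"
    else
        # The process needs no more than one slice, so it finishes.
        echo "$name runs for $left units and finishes"
        finish_order="${finish_order:+$finish_order }$name"
    fi
done
echo "finish order: $finish_order"
```

With these values the short process C finishes first even though it was queued last, which is the fairness property that time slicing is meant to provide.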

1.2 HISTORY OF UNIX


The Unix operating system has its roots in work that began at Bell Labs in 1957. The
growth of this wonderful, multitasking, highly powerful, affordable, and secure
operating system was not an accident, but the result of the joint efforts of many people,
including students, professors, researchers, and commercial companies. A short
chronological history of the evolution of Unix is given here.
In 1957, Bell Labs required an operating system for their in-house computer centre. They
created BESYS to sequence their jobs and to control the system resources. However, they
wanted a more efficient operating system.
In 1964, researchers from General Electric, MIT, and Bell Labs came together to create
a new general-purpose, multi-user, time-sharing operating system known as the
Multiplexed Information and Computing Service (Multics).
In 1969, Bell Labs withdrew from the Multics project because of the high cost of
development and differences among the project's members. When this happened, Ken Thompson,
Dennis Ritchie, and Douglas McIlroy, along with a few others, began working on the
Uniplexed Information and Computing Service (Unics) by using an old PDP-7 computer.
The name Unics was later changed to Unix. While working on the early assembly versions
of Unix, Thompson worked on a FORTRAN compiler that evolved to support the language
B, which was a smaller version of BCPL.
In 1971, the first edition of Unix appeared along with the B compiler. It introduced
several well-known Unix commands including cat, chdir, chmod, chown, and cp. Altogether,
it included more than 60 commands. However, it did not have the pipe feature. Some of the
utilities in this first edition were written in B. In the next few years, Dennis
Ritchie reworked B into the C language and developed the C compiler.
In 1972, the second edition of Unix was released.
In 1973, the third edition of Unix appeared along with the Unix C compiler (cc). The
kernel was still written in assembly language. The pipe feature was also introduced in this
version.
Later in 1973, the fourth edition of Unix was released. The kernel was rewritten in C.
In 1974, the fifth edition of Unix was released. The source code was made freely available
to universities for educational purposes. Unix also spread outside AT&T and Bell Labs and
was provided to academic institutions at a very small charge. It became very popular, as it was
inexpensive, could run on the available hardware, was provided along with the source code,
and was written in a programming language that was easier to understand. In 1974, Thompson
taught Unix for a year at the University of California, Berkeley. When Thompson returned
to Bell Labs, students and professors at Berkeley continued to enhance Unix. This led to the
formation of the Berkeley Software Distribution, which was commonly known as BSD.
In 1975, the sixth edition of Unix was released. This edition, also known as V6 UNIX,
was the first edition that was widely available outside Bell Labs.
In 1977, 1BSD, the first edition of BSD, was released; in 1978, 2BSD, the second edition,
was released.
In 1979, the seventh edition of Unix was released. This edition was released along
with Steve Bourne’s shell (sh). The kernel was rewritten to make it more portable to other
types of architecture. At this point, the Unix systems group (USG) was created to focus
on enhancing the seventh edition. Three groups were at work in all: the original versions
of Unix were developed by the computer research group (CRSG) of Bell Labs, support for
internal releases was provided by the USG, and the task of developing and writing tools
was handled by another group at Bell Labs, the programmer’s workbench (PWB).
The development of Unix split into two main branches: System V (SYSV) and Berkeley
Software Distribution (BSD). BSD was developed by students and professors at the University
of California, Berkeley. SYSV was developed by AT&T and other commercial companies.
In 1979, 3BSD, the third edition, was released.
In 1980, 4.0BSD, the fourth version of the BSD Unix variant, was released.
In 1982, AT&T transferred its Unix development to Western Electric, which developed
the System III version of Unix.
In 1983, Western Electric released System V, whereas System IV was reserved for only
AT&T’s use.
In 1984, the USG group, which was renamed the UNIX system development laboratory
(USDL) group, released System V Release 2 (SVR2), which was the first version of Unix
that supported paging, shared memory, and other associated features.
In 1985, the eighth edition of Unix was released on the basis of the 4.1BSD version.
In 1987, the USDL group, which was renamed AT&T Information Systems (ATTIS)
group, released System V Release 3 (SVR3).
In 1988, the ninth version of Unix was developed, and it was based on the 4.3BSD version.
In 1989, the tenth version of Unix was developed.
As the aforementioned timeline shows, Unix, one of the most popular operating systems,
was developed step by step over several decades.
Let us now have a broad overview of the Unix system.

1.3 OVERVIEW AND FEATURES OF UNIX SYSTEM


The Unix system is a multitasking, multi-user operating system that is portable across
several hardware platforms and is quite secure. It also provides a rich set of tools and
utilities that greatly help administrators, programmers, and users in executing their tasks.
Besides this, the system provides the flexibility of controlling individual jobs executed by
the user.
Some of the important features of the Unix operating system are as follows:
1. Multitasking
2. Multi-user
3. Portability
4. Job Control
5. Tools and Utilities
6. Security

1.3.1 Multitasking
Unix is a multitasking operating system, that is, it can work on multiple tasks at the same
time. In a multitasking environment, the CPU processes a task, and when that task waits for
an input/output (I/O) operation to be completed, the CPU switches to another task. The
switching between tasks is so fast that the operating system appears to be executing all
the tasks simultaneously. For instance, commands for printing a file, editing text, and
managing files can be given together, and all these tasks appear to be performed at once.
With the help of this feature, Unix maximizes the computer resource utilization and hence,
the computer’s efficiency.
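As a small illustration of multitasking from the shell, the following sketch starts three tasks in the background with `&` and then waits for all of them. The task names and the one-second `sleep` that stands in for real work are assumptions made for the example.

```shell
#!/bin/sh
# Start three tasks at once; the shell does not wait for each one in turn.
workdir=$(mktemp -d)
for task in print edit manage; do
    # Each subshell runs in the background; sleep stands in for real work.
    ( sleep 1; echo "$task finished" > "$workdir/$task.log" ) &
done
wait    # block until every background task has exited

# Count the log files' lines to confirm that all three tasks completed.
finished=$(cat "$workdir"/*.log | wc -l | tr -d ' ')
echo "tasks finished: $finished"
rm -rf "$workdir"
```

Because the three tasks overlap, the whole script takes roughly as long as one task rather than the sum of all three.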

1.3.2 Multi-user
The multi-user feature of Unix enables several users to work simultaneously and access
system resources concurrently. The operating system not only receives commands from all
the users, but also carries out the desired processing and responds accordingly. The operating
system manages the consumption of system resources among the users and implements the
locking mechanism to maintain the integrity and consistency of applications and data that
are accessed simultaneously. The multi-user approach maximizes the computer resource
utilization and hence reduces the cost per user. Since the system resources are shared,
resource management is done so as to avoid any deadlock.
Note: Deadlock is a situation wherein two or more competing actions wait for each other to finish and, as a
result, none of them reaches completion.

1.3.3 Portability
Unix is portable, that is, it is available on a wide range of hardware. Since the Unix operating
system is coded in a high-level language, the C programming language, it is less hardware
dependent and, hence, can be easily moved from one brand of computer to another without
a major code rewrite. In addition, the kernel provides the interface between the hardware
and the application modules. The application modules interface with the kernel and not the
hardware; hence, when Unix is ported to another hardware platform, only the kernel, and
not the application modules, requires modification. This makes the operating system almost
hardware independent.

1.3.4 Job Control


Unix enables us to control the execution of jobs. For example, we can suspend or resume
any job, switch a job between the background and the foreground, and kill a job. Jobs that
require frequent user interaction, need I/O operations, or have specific time constraints
are executed in the foreground. For a foreground job, the shell waits for the job to be
completed and only then displays a prompt to execute another job. Background jobs are those
that are executed behind the scenes. Lower-priority jobs that do not require user interaction
are executed in the background. Suspended jobs are paused for a while and can be resumed in
either the background or the foreground. The job control feature of Unix enables users to
execute several jobs and control them on the basis of their priority.
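At an interactive prompt, job control is normally exercised with Ctrl-Z and the `jobs`, `fg`, and `bg` commands. Those need a terminal, so the sketch below shows the same suspend, resume, and kill steps in a script by sending signals to a background job. The 30-second `sleep` is only a stand-in for a long-running job.

```shell
#!/bin/sh
sleep 30 &                 # start a long-running job in the background
pid=$!                     # $! holds the process ID of that job

kill -STOP "$pid"          # suspend the job (what Ctrl-Z does to a foreground job)
kill -CONT "$pid"          # resume the suspended job
kill -TERM "$pid"          # kill the job

wait "$pid" 2>/dev/null    # collect the job's exit status
status=$?
echo "job $pid ended with status $status"
```

An exit status above 128 indicates that the job was ended by a signal rather than finishing on its own.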

1.3.5 Tools and Utilities


Unix supports a number of tools and utilities that make the users’ job easier. Tasks such as
splitting files, merging files, searching for content in files, arranging files, and sending mail
can be simply done by issuing certain commands. Unix not only has a vast library of system
tools, but also has programming tools that provide a flexible platform for programmers and
developers to create portable and efficient applications.

1.3.6 Security
Unix is considered a comparatively secure operating system. Each user has an identity
through a unique user ID and a group ID. In order to prevent unauthorized access, each
file and directory has an owner and a group associated with it. Three permissions
are attached to each file and directory: read, write, and execute. This set of permissions,
r, w, and x, is defined separately for three classes of users: the owner, the group, and
others. Hence, we can individually assign the desired permissions to these three classes
of users. Next let us explore the structure of the Unix system.
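The permission scheme described in this section can be tried out with `id`, `chmod`, and `ls -l`. The following is a minimal sketch; the file name report.txt and the chosen modes are arbitrary examples.

```shell
#!/bin/sh
echo "user ID: $(id -u), group ID: $(id -g)"   # identity of the current user

workdir=$(mktemp -d)
file="$workdir/report.txt"
echo "quarterly figures" > "$file"

chmod 640 "$file"          # owner: rw-, group: r--, others: ---
perms=$(ls -l "$file" | cut -c1-10)
echo "$perms"

chmod u+x,o+r "$file"      # symbolic form: add x for the owner, r for others
perms2=$(ls -l "$file" | cut -c1-10)
echo "$perms2"

rm -rf "$workdir"
```

In the ten-character string printed by `ls -l`, the first character is the file type and the next nine are the r, w, and x flags for the owner, the group, and others, in that order.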

1.4 STRUCTURE OF UNIX SYSTEM


The structure of the Unix operating system consists of four parts (as shown in Fig. 1.2):
hardware, kernel, shell, and tools and applications. The various parts of the Unix system are
discussed in detail in the following sections.

1.4.1 Hardware
Hardware refers to the physical components that collectively form a computer machine.
The following three primary components constitute the hardware of a computer system:

I/O devices Data is supplied or entered into the computer for processing through input
devices such as keyboards, mice, track balls, and magnetic ink character recognition (MICR),
optical character recognition (OCR), and optical mark recognition (OMR) readers. Output
devices display processed data. The two most common output devices are the screen and
the printer.

Central processing unit The central processing unit (CPU) is the heart of the computer.
It obtains the data from the user through input devices, processes the entered data into
information, and displays the information through output devices. The processed data can be
saved in the memory for future use.

Memory It is used for storing data and is of two types: primary and secondary. The primary
memory includes RAM and ROM, out of which RAM is volatile in nature. While the data is being
processed by the CPU, it is temporarily stored in the RAM. After processing, it is removed
and replaced by new data, which has to be processed further. ‘Volatile’ means that the data
stored in the RAM is temporary in nature (i.e., it is overwritten by the new data and the
whole data is lost on switching off the computer). The secondary memory includes hard disk
drives, pen drives, CDs/DVDs, and so on. These devices are of a permanent nature, that is,
once data is written in them, it will be stored until deleted by the user.
Fig. 1.2 Structure of the Unix system (layers: Tools and applications, Shell, Kernel, Hardware)

Note: Networking components such as LAN cards, cables, routers, and switches are also considered part of
the hardware.

1.4.2 Kernel
The kernel is the heart of any operating system. Its main purpose is to ensure that the jobs
of the operating system are performed properly. These jobs mainly include the scheduling
of tasks, resource management, process management, and file management. Resource
management refers to the allotment of CPU time, disk space, memory space, and so on to
different processes. Process management includes the allocation of resources such as CPU,
memory, and other devices. File management includes the management of files and their
permissions, among others.
The kernel hides all the complexities of accessing hardware and provides a user-friendly
interface by doing all the tasks behind the scenes.
A brief view of the different tasks performed by the kernel is provided in Fig. 1.3.
Let us take a quick look at the operations that a kernel can perform:
1. It controls the execution of processes by enabling their creation, termination or suspension,
and communication.
2. It schedules processes fairly for execution on the CPU. The processes share the CPU
in a time-shared manner. The CPU executes a process; the kernel suspends it when its
time quantum elapses and schedules another process to be executed. Later, the kernel
reschedules the suspended process.
3. It allocates the main memory for an executing process. The kernel enables processes to
share portions of their address space under certain conditions, but protects the private
address space of a process from outside tampering. If the system runs low on free memory,
the kernel frees the memory by writing a process temporarily to the secondary memory,
which is called a swap device. If the kernel writes entire processes to a swap device,
the implementation of the Unix system is called a swapping system, whereas if it writes
pages of memory to a swap device, it is called a paging system.
4. It allocates secondary memory for efficient storage and retrieval of user data. This
service constitutes the file system. The kernel allocates secondary storage for user files,
reclaims unused storage, structures the file system in a well-understood manner, and
protects user files from illegal access by unauthorized users.
5. It allows processes controlled access to peripheral devices such as terminals, tape
drives, disk drives, and network devices.
6. It provides the necessary functionality to applications, shells, and utilities through the
system call interface. Applications invoke the relevant system calls in order to get
certain tasks performed by the kernel.
Fig. 1.3 Different tasks performed by the kernel (applications, shells, and utilities reach the
kernel through the system call interface; the kernel handles memory management, process
scheduling, file systems, and peripheral devices)
After having understood the kernel and the tasks that it performs, we are ready to
understand the next part of the Unix structure—the shell.

1.4.3 Shell
The shell is an interface between the user and the kernel. The kernel does not understand
human language; hence, the shell accepts commands from the user and converts them into a
form that the kernel can understand. It is a program that interprets user requests, loads
the requested programs into the memory, and executes them one at a time. Several shells
such as the Bourne, Korn, Bourne-again, and C shells are available.
The shell also provides the facility of chaining or pipelining commands. This means the
output of one command is sent to the input of another command for further processing. In
this manner, one input data can be processed by several commands.
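The following sketch shows such a pipeline. The list of shell names fed in by `printf` is made-up sample data; each `|` sends the output of the command on its left to the input of the command on its right.

```shell
#!/bin/sh
# Find the name that occurs most often in a list:
#   sort      groups identical lines together
#   uniq -c   collapses each group and prefixes it with its count
#   sort -rn  orders the groups by count, largest first
#   head      keeps only the first line
#   awk       prints the second field, that is, the name without the count
most_common=$(printf 'korn\nbash\ncsh\nbash\nsh\n' |
    sort | uniq -c | sort -rn | head -n 1 | awk '{print $2}')
echo "$most_common"
```

None of the five commands knows anything about the others; the shell alone connects them, which is why small single-purpose utilities combine so readily.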
There are two major parts of a shell. The first is the interpreter, which reads commands
and works with the kernel to execute them. The second part of the shell is a
programming capability that enables us to write a shell (command) script. A shell script is a
file that contains a collection of shell commands to perform a specified task. It is also known
as a shell program.
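A minimal sketch of a shell script follows. The script name greet.sh and its contents are hypothetical; the example writes the script to a temporary directory and then asks `sh` to interpret it.

```shell
#!/bin/sh
workdir=$(mktemp -d)

# Create the script file. The quoted 'EOF' keeps $1 and $name from being
# expanded now, so they are stored in the file literally.
cat > "$workdir/greet.sh" <<'EOF'
#!/bin/sh
# greet.sh: greet the name given as the first argument (default: world)
name=${1:-world}
echo "Hello, $name"
EOF

# Run the script by passing it to the shell interpreter.
output=$(sh "$workdir/greet.sh" Unix)
echo "$output"
rm -rf "$workdir"
```

A script can also be made directly executable with `chmod +x` and then run by its path, in which case the first line (`#!/bin/sh`) tells the system which interpreter to use.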

Types of shells
Shells are independent of the underlying Unix kernel. This fact has enabled the development
of several shells for Unix systems. Each type of shell has its own special features.
Bourne shell It is the most common shell in Unix systems and was the first major shell. It
was developed by Steve Bourne at AT&T Bell Labs. This shell was released in 1977 and was
called ‘sh’.
Korn shell It was developed by David Korn at AT&T Bell Labs. It is built on the Bourne
shell. The most stable version of this shell was released in 1988 by AT&T’s Unix System
Laboratories as ‘ksh’. The Korn shell also incorporates the features of the C shell (e.g.,
process control). One of the important features of this shell is that it can run Bourne shell
scripts without any modification at all.
Bourne-again shell An enhanced version of the Bourne shell, the Bourne-again shell, which
is also known as ‘bash’, is distributed as the standard shell on most Linux systems and
many other Unix systems. It is a free software shell from the Free Software Foundation
(FSF), for which it was developed by Brian Fox and Chet Ramey.
C shell It is also called the programmer’s shell and exists as ‘csh’. It was developed by Bill
Joy at the University of California, Berkeley. The C shell got its name because its syntax and
usage are very similar to those of the C programming language. A compatible version of the
C shell, ‘tcsh’, is used in Linux.
The C shell is not available on all machines. In addition, shell scripts written in the
C shell are not compatible with the Bourne shell. Such scripts should be modified for
working with the Bourne shell. One of the major advantages of the C shell (compared with
the original Bourne shell), however, is its job control, which lets users suspend jobs and
move them between the foreground and the background.
The four shells are shown in Fig. 1.4. Tcsh is a compatible version of the C shell that is
used in Linux.
Fig. 1.4 Different shells in Unix operating system (Bourne shell, Bourne-again shell, Korn
shell, C shell, and Tcsh)
Both the Korn shell (ksh) and the Bourne-again shell (bash) are extensions of, and
compatible with, the basic Bourne shell (sh). The original C shell (csh) is only partially
based on the Bourne shell and has been extended into a shell called the ‘TC shell’ (tcsh),
which is a C shell with some additional features. Since the TC shell is completely
compatible with the C shell, it is also frequently referred to as the ‘C shell’.
So far, we have looked at the hardware, the kernel, and the shell; the remaining part of
the Unix structure consists of its tools and applications.
1.4.4 Tools and Applications
Tools and applications are built-in modules that are used by the operating system to perform
the tasks assigned by the user. These are available in the form of libraries that add special
capabilities to the operating system. Whether the task is to display the date and time, find
files, copy files, list files, or translate characters, among others, it is performed
through Unix utilities. The tools and utilities are categorized on the basis of the
kind of tasks they perform. For example, file utilities perform tasks related to files:
breaking text files into pieces, combining text files together, and sorting their contents.
Other utilities such as grep, sed, and awk help in filtering or searching for the desired
content in files.
Some of the most commonly used file-related Unix utilities are as follows:
1. cp: Copying files
2. ln: Linking one file to another
3. ls: Listing files or directory contents
4. mv: Moving or renaming files
5. rm: Removing files
6. pr: Printing files
7. tr: Translating characters
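Most of the utilities listed above can be tried in a scratch directory, as in the sketch below. The file names are arbitrary examples, and `pr` is left out because it only formats a file for printing.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo "unix shell" > notes.txt
cp notes.txt copy.txt          # cp: copy a file
ln notes.txt link.txt          # ln: give the same file a second name (hard link)
mv copy.txt backup.txt         # mv: rename (or move) a file
rm link.txt                    # rm: remove one name; notes.txt itself remains

listing=$(ls)                  # ls: list the directory contents
upper=$(tr '[:lower:]' '[:upper:]' < notes.txt)   # tr: translate characters

echo "$listing"
echo "$upper"
cd / && rm -rf "$workdir"
```

After these steps the directory holds only backup.txt and notes.txt, and the translated text is the contents of notes.txt in capital letters.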
Usually an operating system is used in a single environment, but Unix is an operating
system that can be used in several environments. Let us now have a brief discussion of the
Unix environment.

1.5 UNIX ENVIRONMENT


Unix is a multi-user and multiprocessing operating system that can be used in three
environments: stand-alone personal environment, time-sharing environment, and client–
server environment.

1.5.1 Stand-alone Personal Environment


Unix can be installed on personal computers, which are then used as stand-alone machines. Though
the major features of the Unix operating system are exploited in a multi-user environment,
its security features, multitasking capability, and portability make it an attractive choice for
installation on personal computers.

1.5.2 Time-sharing Environment


A time-sharing environment is an environment in which a computer is connected to several
terminals and all the terminals share the resources of the central computer: CPU time, hard
disk, and printer. The central computer divides its CPU time into small time slices and serves
each terminal in the time slot assigned to it. Hence, each terminal waits for its time slot
to get its jobs processed by the central computer. Though this environment is economical,
the total dependency on the central computer is its major drawback. If the central machine
fails, all the terminals connected to it stop working, and hence, this environment is not very
popular nowadays. Since all the tasks of the terminals are performed by the central CPU, it
is overloaded and hence its response is very poor.

1.5.3 Client–Server Environment


The client–server environment is better than the time-sharing environment, as here, the
central computer is not connected to dumb terminals but to workstations or PCs that have
their individual processing power. As a result, all the processing tasks are not assigned to the
central computer but are divided among the central computer and the connected workstations
so that the local and small tasks can be processed at the workstation level (without bothering
the central computer), and the main tasks (that require more resources) are transferred to the
central computer. The workstations in this environment are known as clients, and the central
computer (that serves the requests sent by the clients) is known as server. In this environment,
dependency on the central computer is decreased, and since the local tasks are performed at
the client level, the server is not overloaded, which improves its response time.

Note: We can customize the Unix shell environment by also making use of system variables known as
environment variables, which will be discussed in Chapter 10.
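As a brief preview, the sketch below contrasts an ordinary shell variable with an environment variable. The names demo_local and DEMO_EXPORTED are invented for the example; the point is only that a variable marked with `export` is passed on to child processes, while an ordinary one is not.

```shell
#!/bin/sh
demo_local="hello"           # ordinary shell variable: known only to this shell
export DEMO_EXPORTED="vi"    # environment variable: inherited by child processes

# Start a child shell and ask it what it can see of the two variables.
child_sees=$(sh -c 'echo "${demo_local:-unset} ${DEMO_EXPORTED:-unset}"')
echo "$child_sees"
```

The child shell reports the first variable as unset and the second with its value, which is why settings meant for other programs are placed in environment variables.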

■ SUMMARY ■

1. Operating systems provide an environment that makes it possible for us to use the
resources of a computer, namely hardware and software. A few examples of modern-day
operating systems include Android, BSD, iOS, Linux, Microsoft Windows, Mac OS X, and z/OS.
2. The various functions that an operating system performs include memory and data
management, interprocess communication, time sharing, security, user-command
interpretation, accounting, program development, scheduling, and swapping.
3. In the 1960s, work on Multics led to the development of the now well-known Unix
operating system. Unix became commercially viable in 1973 when it was entirely recoded
in C, thereby facilitating portability to other hardware. A typical structure of the Unix
operating system consists of hardware, a kernel, a shell, and various tools and applications.
4. The kernel is the heart of the operating system. It is defined as a nucleus of the
operating system that manages all the resources and gets the task performed by the
desired hardware.
5. A shell acts as an interface between a user and a kernel. Mainly four types of shells are
available in the Unix operating system, namely Bourne shell (sh), C shell (csh), Korn
shell (ksh), and Bourne-again shell (bash).
6. The main features of the Unix operating system are portability, multitasking, and
multi-user capability.
7. Since Unix is a multiprocessing and multitasking operating system, it can be used in
three different types of environments: stand-alone personal environment, time-sharing
environment, and client–server environment.
8. Currently, Unix is also portable on mobile devices. Almost all mobile operating systems,
including iOS, Android, and webOS, run on Unix or Linux kernels.

■ EXERCISES ■

Objective-type Questions
State True or False
1.1 The Unics operating system was further developed to Unix.
1.2 An operating system creates an environment that enables us to use different resources
of a computer system.
1.3 The Korn shell is the oldest of all shells.
1.4 The Korn shell and Bourne-again shell are not compatible with the Bourne shell.
1.5 Unix is a multi-user and multitasking operating system.
1.6 The Bourne-again shell (bash) was developed by David Korn.
1.7 The Korn shell was developed by Brian Fox and Chet Ramey.
1.8 The Bourne shell derives its name from Stephen Bourne.
1.9 The shell manages all the resources and gets the tasks performed by the desired hardware.
1.10 Unix enables a user to run only one process at a time.

Fill in the Blanks


1.1 Unix operating system is written in ________ language.
1.2 BSD stands for ________.
1.3 The operating system creates an ________ between the user and the hardware.
1.4 C shell was developed by ________.
1.5 When a job has to be performed, the ________ should allocate the memory for loading
that job into the memory.
1.6 In a ________ environment, the central computer is connected to the workstations or PCs.
1.7 Unix treats each job or task as a ________.
1.8 Both the Korn shell (ksh) and the Bourne-again shell (bash) are extensions of, and
compatible with, the ________ shell.
1.9 ________ prevents unauthorized personnel from using the computer system.
1.10 ________ schedules processes for execution on the CPU.

Multiple-choice Questions
1.1 Which of the following is the heart of any operating system?
(a) Hardware (b) Kernel (c) Software (d) Users
1.2 Korn Shell was developed by
(a) David Korn (b) Steve Bourne (c) Bill Joy (d) Ken Thompson
1.3 The default prompt of the C shell is
(a) $ (b) % (c) cs (d) >
1.4 The three environments in which the Unix operating system can be used are stand-alone
personal environment, time-sharing environment, and
(a) client–server environment
(b) LAN environment
(c) WAN environment
(d) isolated environment
1.5 The shell that is completely compatible with the C shell is
(a) Korn shell
(b) Bourne-again shell
(c) TC shell
(d) Bourne shell

Review Questions
1.1 Write short notes on the following:
(a) Different tasks performed by the kernel
(b) Role of shell in the Unix operating system
(c) Structure of the Unix system
1.2 Explain the functions performed by an operating system.
1.3 How did the Unix operating system come into the picture? Briefly explain its history.
1.4 How many different types of shells are there? Explain in detail.
1.5 Explain the time-sharing and client–server environment of the Unix operating system.

■ ANSWERS TO OBJECTIVE-TYPE QUESTIONS ■


State True or False
1.1 True  1.2 True  1.3 False  1.4 False  1.5 True
1.6 False  1.7 False  1.8 True  1.9 True  1.10 False

Fill in the Blanks
1.1 ‘C’  1.2 Berkeley Software Distribution  1.3 interface  1.4 Bill Joy  1.5 operating system
1.6 client–server  1.7 process  1.8 Bourne  1.9 Security  1.10 Scheduler

Multiple-choice Questions
1.1 (b)  1.2 (a)  1.3 (b)  1.4 (a)  1.5 (c)
CHAPTER 2

Unix File System

After studying this chapter, the reader will be conversant with the following:
• Unix files and their types
• Different types of device files
• Organization of a file system
• Accessing, mounting, and unmounting a file system
• Different blocks of a file system
• Structure of inode blocks

2.1 INTRODUCTION TO FILES


A file is a container of text, images, codes, and so on. Everything is a file on a Unix system.
Not only the data, programs, and applications, but also the directories and input/output (I/O)
devices are considered special kinds of files.
Generally, files are ordered in a hierarchical tree-like fashion with a root represented by
the character, ‘/’. The directories are the internal nodes of the tree structure, while the files
are considered to be the leaves. Let us learn about the different types of files in Unix.

2.1.1 Types of Files


The files are divided into the following three categories in the Unix operating system:
Ordinary files These files contain only data.
Directory files These files act as a container and can contain ordinary files and device files
along with directory files.
Device files These files represent all the hardware devices.
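
The three categories can be told apart from the shell with the test operators -d, -c, -b, and -f. The following is only a minimal sketch: the function name describe and the sample paths are chosen for illustration, and it assumes that /tmp and /dev/null exist, as they do on most Unix systems.

```shell
# describe: print the category of the file named in $1 (illustrative helper)
describe() {
    if   [ -d "$1" ]; then kind="directory file"
    elif [ -c "$1" ]; then kind="device file (character)"
    elif [ -b "$1" ]; then kind="device file (block)"
    elif [ -f "$1" ]; then kind="ordinary file"
    else                   kind="not one of the three categories"
    fi
    echo "$1: $kind"
}

sample=$(mktemp)        # a scratch ordinary file created for the demonstration
describe "$sample"
describe /tmp
describe /dev/null
rm -f "$sample"
```

The directory test comes first because a directory is itself a kind of file; the remaining tests then separate device files from ordinary files.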
Ordinary files
We can store anything we want in these files. These files include data, source programs,
objects, executable codes, Unix commands, and any file created by the user. Commands such
as cat and ls are treated as ordinary files. An ordinary file is also referred to as a regular file.

The most common type of ordinary file is the text file. This is just a regular file that contains
printable characters. For example, the source programs that we write are text files. However, the
Unix commands that we use and the compiled C programs that we execute are binary files and do not
fall into the category of text files.
The characteristic feature of text files is that the data stored inside them is divided into
groups of lines, with each line terminated by the newline character. This character is not
visible, and it does not appear in the hard copy output. It is generated by the system when we
press the <Enter> key.
Examples letter.txt, bank.sh, payment

The files in Unix may or may not have any extension. The first two examples depict files with
extensions .txt and .sh, respectively. The third example depicts a file without any extension.
In most Unix systems, a filename can have up to 255 characters. If we enter more
than 255 characters while specifying a filename, older systems silently keep only the first
255 characters, whereas most current systems reject the name with an error.
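Note: Unix itself does not require extensions, but some tools expect them; for example, a C compiler
expects C source files to end in .c. It is also customary to give AWK program files a recognizable extension.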

Directory files
A directory contains no external data, but it stores some details of the files and sub-directories
it contains. The Unix file system is organized into a number of such directories and sub-
directories, which can also be created as and when needed. We often need to group a set of
files pertaining to a specific application. This enables two or more files in separate directories
to have the same filename.
If a directory contains, for example, 10 files, there will be 10 entries in the directory file,
one for each file. Each entry records the name of the file and its inode number; details such as
the size of the file and the date and time of last modification are kept in the inode of the file
(see Section 2.4.3). When an ordinary file is created or removed, its entry in the corresponding
directory file is automatically updated by the kernel.
Note: The directory file contains the names of all resident files in the directory.

Examples projects, shell_scripts

These examples show two directories named projects and shell_scripts.


Device files
In the Unix operating system, peripheral devices such as terminals, printers, CD-ROMs, modems,
disks, and tapes are treated as special files that are termed device files. The representation of
devices in the form of device files simplifies the task of using them. For example, printing
content on a printer is as simple as copying that content to the printer device file. A device
file interacts with the device driver, making it possible for the user to directly interact
with the device driver using standard I/O system calls, hence controlling the device more
precisely.
There are two types of device files based on how data is read or written into them: character
devices and block devices.
Character devices Character devices are those in which the read and write operations
are performed character by character, that is, one byte at a time. These devices are also
known as raw devices. The read and write operations in these device files are performed in
the actual transfer units of the device, that is, single characters at a time without collecting
or combining them into a block. It is quite obvious that character devices are comparatively
slow and have a large access time.
Examples include virtual terminals, terminals, and serial modems.
Block devices Block devices are those in which the read and write operations are performed
one block at a time, where the size of one block can range from 512 bytes to 32 KB. When
compared with character devices in which transactions are performed one character at a time,
block devices are quite fast. Moreover, block devices use caching to reduce the access time.
By caching, we mean that when a block device is accessed, the kernel reads the whole block
into a buffer in the memory, so that future read and write operations are performed to the
cached version in the memory, hence reducing the access time to a great extent. Finally, the
modified buffer contents are written to block devices. The only drawback in using memory
buffers is that if the system crashes before modified buffers are written into the block device,
the data will be inconsistent. Hence, we need to periodically flush out the modified buffers
to the block device.
Examples include hard disk, DVD/CD ROM, and memory regions.
Note: All device files are stored in the /dev directory.
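
Since device files are read and written like any other file, ordinary redirection is enough to use them. The sketch below assumes the two character devices /dev/null and /dev/zero, which are present on most Unix systems; the variable name byte_count is illustrative.

```shell
# In a long listing, the first character of each line gives the file type:
# c = character device, b = block device (-L follows links to the real device)
ls -lL /dev/null /dev/zero

# Writing to a device file: /dev/null simply discards whatever it receives
echo "this line is discarded" > /dev/null

# Reading from a device file: take 16 bytes from /dev/zero and count them
byte_count=$(( $(dd if=/dev/zero bs=16 count=1 2>/dev/null | wc -c) ))
echo "read $byte_count bytes from /dev/zero"
```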

2.1.2 Symbolic Links


A symbolic link is a special file that points to another existing file on the system. This link
contains the path name of the file it is pointing to. We can create several names for the same
file through symbolic links. In order to create symbolic links, the ln command is used, and
for listing them, the long-listing command ls -l is used. These commands will be discussed
in detail in Chapter 3. A brief introduction to these commands is as follows.
For example, let us assume we have a file letter.txt. Through the ln command, we can
create its symbolic link in the file memo.txt as shown here:
Syntax ln -s source destination

Here, source is the absolute or relative path of the file whose link we want to create, and
destination is the name of the link.

Example ln -s letter.txt memo.txt

The two filenames, letter.txt and memo.txt, refer to the same file, and changes made in
either file will be reflected in the other file.
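
The behaviour described above can be tried out in a scratch directory. This is a minimal sketch; the directory comes from mktemp, and the variable names are illustrative. The last step shows that the link stores only a path name: once letter.txt is removed, memo.txt still exists but points to nothing.

```shell
workdir=$(mktemp -d)
echo "Dear Sir" > "$workdir/letter.txt"

# Create the symbolic link; the link stores the path name "letter.txt"
ln -s letter.txt "$workdir/memo.txt"
ls -l "$workdir"                      # memo.txt is listed as: memo.txt -> letter.txt

# A change made through the link shows up in the original file
echo "Regards" >> "$workdir/memo.txt"
letter_contents=$(cat "$workdir/letter.txt")
echo "$letter_contents"

# Removing the original leaves a dangling link behind
rm "$workdir/letter.txt"
if [ -L "$workdir/memo.txt" ] && [ ! -e "$workdir/memo.txt" ]; then
    dangling=yes
else
    dangling=no
fi
echo "dangling link: $dangling"
rm -r "$workdir"
```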

2.1.3 Pipes
Pipes are used for sending the output of a command as the input to another command. A pipe
is created with the vertical bar character ‘|’, with a command on either side.
The output of the command on the left-hand side is sent as input to the command on the
right-hand side. The syntax for creating a pipe is as follows:
Syntax command1 | command2

Example ls | sort

We will discuss two commands, ls and sort, in Chapter 3, but for the time being, it is
enough to understand that the output of the ls command is sent to the sort command before
outputting the result on the screen.
The pipe created through this syntax is known as an anonymous pipe, because it has no name in the
file system; it is created when the commands start and destroyed when they are over. command1 and
command2 on either side of the pipe have their own file descriptors that are automatically closed
when the process is over.
Apart from anonymous pipes, we can also create named pipes. As the name suggests,
named pipes have specific names that are assigned to them, and exist as special files within
the file system. Named pipes are known as first in, first out (FIFO) files for two reasons.
First, once the data is read from the pipe, it cannot be read again. Second, the data is always
read in the order in which it was written. The named pipes are not automatically deleted as in the
case of anonymous pipes but have to be explicitly deleted using the rm or unlink command.
The command used for creating named pipes is mknod. The three commands, mknod, rm, and
unlink, will be discussed in detail in Chapter 3.
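
A named pipe can be tried out as follows. This is a sketch under one stated assumption: it uses mkfifo, the POSIX utility for creating FIFOs, in place of the mknod command named above, because mkfifo needs no extra arguments or privileges. The file and variable names are illustrative.

```shell
workdir=$(mktemp -d)
fifo="$workdir/demo_fifo"

mkfifo "$fifo"          # create the named pipe as a special file
ls -l "$fifo"           # the listing starts with p, marking a pipe

# The writer runs in the background and blocks until a reader opens the pipe
printf 'first\nsecond\n' > "$fifo" &

# The reader receives the lines in the order in which they were written
received=$(cat "$fifo")
wait
echo "$received"

rm "$fifo"              # a named pipe stays in the file system until it is removed
rmdir "$workdir"
```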

2.1.4 Sockets
Socket files are used for transferring information between two processes, which may be running
on the same machine or on different machines. Socket files are basically used as an interface
between our Unix process and the networking protocol. For example, while accessing the Internet
through a web browser, sockets are used to establish communication between the browser process
and the web server. The creation of socket files is explained in detail in Chapter 14.

2.2 ORGANIZATION OF FILE SYSTEMS


The file system is organized as a tree with a single root node called root (written as ‘/’); every
non-leaf node of the file system structure is a directory of files, and files at the leaf nodes of
the tree are directories, regular files, or special device files. The name of a file is indicated by
a path name that describes how to locate the file in the file system hierarchy.
A path name is a sequence of component names that are separated by slash characters.
A component is an arrangement of characters that designates a filename that is uniquely
contained in the previous (directory) component. A full path name starts with a slash
character and specifies a file that can be found by starting at the file system root and
traversing the file tree, following the branches, which lead to successive component names
of the path name.
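
The split between the directory components and the final component of a path name can be seen with the standard dirname and basename utilities. The path used here is a hypothetical example; neither utility requires the file to exist, since both work on the text of the path name alone.

```shell
path=/usr/projects/letter.txt     # hypothetical full path name

dir_part=$(dirname "$path")       # everything up to the last component
file_part=$(basename "$path")     # the last component only

echo "directory part: $dir_part"
echo "file part:      $file_part"
```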
The Unix file system is divided into directories and sub-directories. The
system programs and libraries are categorized according to their functions and placed in
their respective directories. The forward slash (/) at the top of the tree is the root of the tree.
All other directories are the sub-directories of the root directory as shown in Fig. 2.1.
The following is the list of directories and their contents:
bin Executable files are kept in this directory and these files can be run by users. After
compilation, a program is converted into an executable binary and is usually kept in the
/bin directory.

dev All the special files in the Unix file system, such as the keyboard or terminal device
drivers, are kept in this directory.
etc All administrative files of Unix are kept in this directory.
lib This is the central library storage for files that are commonly used by other programs. A
library is a collection of files (usually binary) that can be shared among many processes. The
advantage of having a library is that it is a single source of data and each program can use it
without needing a unique copy of these executable functions for itself.
lost+found This is the most likely place where files can be found after the system crashes.
tmp Programs usually need extra space to store data on a disk. The /tmp directory is a
directory used by programs that need extra buffer area in order to be executed.

/
bin   dev   etc   lib   lost+found   tmp
Fig. 2.1 List of files and directories in the Unix operating system

Since the memory of a computer is limited in nature, we need to swap in the desired process
and swap out the process whose task is done. This swapping is handled by a special file
system known as the swap file system, which is discussed here.

Swap file system


Swapping is a useful technique that enables us to execute programs and manage files that are
larger than the computer’s primary memory. The program or file that we wish to execute or
manage is logically split into small blocks that are known as pages or segments. One of the
blocks is loaded into the primary memory where it can be worked on, whereas the remaining
blocks of the program or file are stored in the physical disk drive. When a part of the program
or file that is not in the primary memory is required, swapping takes place.
Swapping is an operation by which the block of the program or file in the primary memory
is swapped out to the physical disk drive and the next block from the physical disk drive is
loaded into the primary memory. Swapping continues until the desired block is loaded in the
primary memory. The concept by which a system appears to have more memory than what
it actually has is known as virtual memory.
In the Unix operating system, a partition of the hard disk can be treated as virtual memory.
Thus, a separate partition known as swap partition is created on the disk that is meant to hold
the swapped pages of the program or file. Every system should have a swap file system that
is used by the kernel to control the movement of processes. When the system memory is
heavily loaded, the kernel has to move processes out of the memory to this file system. When
these swapped processes are ready to run, they are loaded back to the memory. Users cannot
access this file system directly.

2.3 ACCESSING FILE SYSTEMS


Let us assume we have a device such as a floppy or CD containing a few files, and we want
to view and access those files. The files on these devices make up an individual file system
with its root as ‘/’. The file system on these devices will not be accessible by the Unix system
unless it is mounted.
Mounting a file system means assigning the root directory of the new file system to a sub-
directory of the root directory of our Unix system. The subdirectory on which the new file
system is mounted is called the mount point of a file system. The files and directories in the
new file system or mounted file system are accessible when we go into that subdirectory. Once
mounted, the new file system becomes an indistinguishable part of the existing file system.
Basically, mounting is a procedure of making the main existing file system aware of the new
file system.

2.3.1 Mounting File Systems


By mounting a file system of any device, we make its files and directories accessible through
the existing Unix file system. The format of mounting a file system is as follows:
Syntax mount filesystem/devicename directory

The device name or file system is mounted on the given directory. The directory, also known as
mount point, is the name of the directory that the newly mounted file system will be assigned to.
Note: For the file system to be mounted on a particular directory, the directory should already exist on the
current file system.

In order to mount a file system that has the special device name /dev/fd00 (for floppy disk 0)
onto the existing /mnt directory, the following command is used:
# mount /dev/fd00 /mnt

The new file system is simply an extension of the /mnt directory. We can view and access
the files and directories of the mounted file system by changing the directory to the /mnt
directory. We can also create directories and files in the /mnt directory sub-tree.
The mount point (/mnt) should usually be an empty directory, as we will not be able to
access its original files and subdirectories once a file system is mounted on it. The files of the
/mnt directory will be accessible only when the file system is unmounted.
The file system that is mounted to the main file system should be unmounted after its
job is done. Before shutting down the Unix system, all the mounted file systems need to be
unmounted; otherwise this may result in corruption of the content.

2.3.2 Unmounting File Systems


Unmounting a file system means detaching the mounted file system from the directory of the
Unix system on which it was mounted. Once a file system is unmounted, we will not be able
to access its files or directories.
A file system cannot be unmounted if any of its files or directories are still active, that
is, if they are currently in use. If we try unmounting a file system whose file or directory is
currently open, we get an error message, ‘device busy’.
To unmount a file system that is mounted on any directory, first of all, we need to close all
the open files and directories and then proceed to unmount it. We use the following command
without any arguments, in order to know the file systems mounted within a file system:
mount

It gives a list of the mounted file systems. We might obtain the following output:
mounted        mounted over
/dev/fd00      /mnt

This output shows that the file system /dev/fd00 is mounted on the /mnt directory.
The command that is used for unmounting the mounted file system is umount. The
following format is adopted for using the umount command:
Syntax umount filesystem name/mount point

Example The command to unmount the file system /dev/fd00 that we mounted on the /mnt
directory will be as follows:
umount /mnt
We cannot unmount a file system while our current directory lies inside it. Thus,
unmounting the /dev/fd00 file system while sitting in the /mnt directory is not possible. We
have to come out of the /mnt directory before giving the umount command.
Note: A file system cannot be dismounted if it is busy, that is, when a file or directory on that file system is
being accessed.
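
Before unmounting, it helps to know which file system a directory belongs to. The following sketch uses the standard df command with the -P option, which prints one line per file system with the device in the first field and the mount point in the last. The directory /tmp and the variable names are illustrative, and the umount line is left as a comment because it requires administrator privileges.

```shell
target=/tmp

# df -P prints a header line followed by one line for the file system holding $target
fs_line=$(df -P "$target" | tail -n 1)
device=$(echo "$fs_line" | awk '{print $1}')
mount_point=$(echo "$fs_line" | awk '{print $NF}')

echo "$target lives on $device, which is mounted on $mount_point"

# To unmount it, we would first leave the mounted tree and then run, as root:
# umount "$mount_point"
```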

Nowadays, floppy disk drives are no longer manufactured or used. Only for the sake of
explaining mount and umount commands, the concept of floppy disk drives is used. In the
currently available Unix operating systems (like Oracle Solaris 10, which we are using in
this book) and Linux systems, CD ROMs, DVDs, and USB storage devices are automatically
mounted without using the mount command. Thus, mount and umount commands are seldom
needed for removable media in the currently available Unix operating systems or equivalents. The USB storage
device is automatically mounted and is available under the /rmdisk directory, whereas the
CD ROM/DVD is automatically mounted and available under the /cdrom directory. This also
means that the following commands will navigate us to the CD ROM/DVD drive and will
depict its contents:
$ cd /cdrom
$ ls

Table 2.1 gives a brief comparison of the file systems of Windows and Unix operating
systems.

Table 2.1 Comparison between file systems of Windows and Unix operating systems

Path separator
Unix: The / (forward slash) represents a separator while defining the path to indicate a new
directory level. The following command represents two directory levels, usr and projects:
cd /usr/projects
Windows: The \ (backslash) is used for defining the path. For example, the directory levels,
usr and projects, in Windows are represented as follows:
cd \usr\projects

Top-level directory
Unix: The forward slash (/) indicates the root directory, that is, the directory from where all
other directories begin. All other hard disk drives, pen drives, CD ROM/DVD drives, etc., are
accessed via the root (/) directory. For example, /cdrom represents the CD ROM drive.
Windows: In Windows (and in DOS), C:\ indicates the top-level directory of the file system. Other
hard disk drives, floppy disk drives, and CD ROM/DVD drives are indicated by various top-level
directory equivalents such as D:\ and E:\.

Administrator
Unix: The root account acts as the Unix administrator.
Windows: There is an administrator account that performs the administrative tasks.

2.4 STRUCTURE OF FILE SYSTEMS


A file system is a group of files and directories that exists in the form of a tree-like structure
with its root in the form of a root directory. A hard disk can be partitioned into several
parts with each part having its own file system; there can be several file systems in a hard
disk. A file system cannot be split on two different disks; it has to be entirely on a single disk.
A file system has the following four sections that are known as blocks: boot, super, inode,
and data.
The first block of a file system is called the boot block, and it is followed by super, inode,
and data blocks, as shown in Fig. 2.2.

Boot block Super block Inode block Data blocks

Fig. 2.2 File system of Unix

2.4.1 Boot Block


The boot block is the first block of the file system that contains a small bootstrap program.
The bootstrap program is a short program that is loaded by the basic input/output system
(BIOS) on starting any computer. It checks and initializes the I/O devices and also loads the
operating system. During bootstrapping, which is also known as booting, the master boot
record (MBR) is loaded into the memory, which in turn, loads the kernel into the memory.

2.4.2 Super Block


The boot block is followed by the super block. This block contains global file information
about disk usage and availability of data blocks and inodes. It also contains a pointer to the
head of the free list of data blocks.
The super block consists of the following information:
1. Size of the file system
2. Number of free data blocks available in the file system
3. List of free data blocks available in the file system
4. The index of the next free data block in the free data block list
5. Size of the inode list
6. Number of free inodes available in the file system
7. List of free inodes in the file system
8. The index of the next free inode in the free inode list
Figure 2.3 shows a super block containing an array of free data block numbers and a
pointer to the head of the free list of data blocks.

Super block: 109 125 104 175 138
Data block:  207 204 292 275 250
Fig. 2.3 List of free data blocks in super block

The values 109, 125, 104, 175, and 138 in this figure refer to the free data block numbers.
The entry 109 is a data block that contains a pointer to an array of free data blocks. We assume
that the index, which points to the next free data block in the free data block list, is
pointing to the last data block number, which is 138.
When a process requests a data block, the kernel searches the free data blocks in the super
block and returns the available data block pointed to by the index, that is, the data block
number 138 will be returned and the index will shift to another data block in the list. Thus,
after having assigned data block 138 to the requesting process, the index will shift to point at
the data block numbered 175, and the procedure will continue. If the super block contains
only one entry, which is a pointer to the array of free data blocks, all the entries from that
block will be copied to the super block free list as shown in Fig. 2.4.

Super block: 207 204 292 275 250
Fig. 2.4 List of free data blocks copied from data block to super block

As usual, the requesting process will continue to get block numbers from the ones listed
in the super block.
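
The allocation procedure can be imitated with a short shell sketch that uses the block numbers of Figs 2.3 and 2.4. It is only a simulation on strings, not kernel code; the names free_list, link_block_109, and alloc_block are illustrative. One assumption goes beyond the text: after the contents of block 109 have been copied into the super block, the sketch hands block 109 itself to the caller.

```shell
free_list="109 125 104 175 138"          # free list held in the super block
link_block_109="207 204 292 275 250"     # block numbers stored inside data block 109

# alloc_block: take the entry under the index (the last one) from the free list
alloc_block() {
    case "$free_list" in
        *" "*)
            allocated=${free_list##* }   # last entry in the list
            free_list=${free_list% *}    # the index moves one entry to the left
            ;;
        *)
            # Only the pointer entry is left: copy its contents into the super block
            allocated=$free_list         # assumption: the block itself is then handed out
            free_list=$link_block_109
            ;;
    esac
}

order=""
for request in 1 2 3 4 5 6; do
    alloc_block
    order="$order $allocated"
done

echo "blocks handed out:$order"
echo "free list now: $free_list"
```

The first four requests return 138, 175, 104, and 125; the fifth finds only the entry 109 and refills the list, so the sixth request is served from the refilled list.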

2.4.3 Inode Block


Every file or directory has an inode number, a unique number that identifies the file or
directory in the file system. All inodes are stored in inode blocks.
We know that a hard disk is organized into blocks (or sectors) and a file stored in a
hard disk may be scattered in different blocks. The addresses of these blocks (containing
file parts) are stored in the form of a linked list in an inode block, that is, a table of blocks
is maintained for each file. Each Unix file system has its own inode table in which inode
blocks are stored. Each inode is referenced by a device + inode number pair. There are three
reserved inode numbers—0, 1, and 2—defined as follows:
0 refers to the deleted files and directories.
1 refers to file system creation time, bad blocks count, and so on.
2 refers to the root directory of the file system.
The inode block contains information on each file in the data block. The information
comprises owner of the file, file type, permissions, address of the file, and so on. Each inode
block usually contains the following entries:
1. Owner: Indicates the owner of the inode
2. Group: Indicates the group to which the inode belongs
3. File type: Indicates the type of file, that is, whether the inode represents a file, a directory,
a FIFO, a character device, or a block device (The type value is set to 0 to indicate that
the inode is free).
4. Permissions: Indicates the read, write, and execute permissions of the owner (user),
group, and other members, for the file
5. Access time: Indicates the time at which the file was last accessed
6. Modification time: Indicates the time at which the file was last modified
7. Inode modification time: Indicates the time at which the inode was last modified. (The
inode is modified when the contents of the file are changed, the permission of the file is
changed, link for the file is created, and so on.)
8. Number of links to the file: Indicates the number of links of the file
9. Physical addresses: Indicates the blocks containing the file parts
10. Size: Indicates the actual size of the file
A file’s inode number can be found using the ls -i command.
Example $ ls -i letter.txt
45267 letter.txt

Here, 45267 is the inode number.


The Unix system also maintains an inode table in the memory for the files that are in use. When
a file is opened, its inode is copied from the hard disk to the system’s inode table.
Note: The inode contains all the attributes of a file except the filename. The filename is stored in the directory
in which the file is kept. The i number is also not stored in the inode, but is used to locate the position of the
inode in the inode blocks.

Directory
The directory contains only two file attributes: inode number and filename. When we create
a link for a file, no separate inode is allocated for it, but the link count in the inode is
incremented by one. A directory entry is also created with the new filename. When we
remove a linked file with the rm command, the link count in the inode is decremented,
and the directory entry for that link is also removed. A file is removed when its link count
becomes zero. The associated disk blocks are also freed in order to make them available for
new files.
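
The link count can be watched from the shell. In the sketch below, ln is used without the -s option, so it creates a second directory entry for the same inode rather than a symbolic link. The scratch directory and the variable names are illustrative; the inode number is read from the first field of ls -i and the link count from the second field of ls -l.

```shell
workdir=$(mktemp -d)
echo "Dear Sir" > "$workdir/letter.txt"

# Create a second name for the same file; no new inode is allocated
ln "$workdir/letter.txt" "$workdir/copy.txt"

inode_a=$(ls -i "$workdir/letter.txt" | awk '{print $1}')
inode_b=$(ls -i "$workdir/copy.txt" | awk '{print $1}')
links_before=$(ls -l "$workdir/letter.txt" | awk '{print $2}')
echo "inode numbers: $inode_a and $inode_b, link count: $links_before"

# Removing one name only decrements the link count; the data is still reachable
rm "$workdir/letter.txt"
links_after=$(ls -l "$workdir/copy.txt" | awk '{print $2}')
survivor=$(cat "$workdir/copy.txt")
echo "link count after rm: $links_after, contents: $survivor"

rm -r "$workdir"
```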
A file is internally identified by Unix through a unique inode number that is associated
with it. A directory file contains the names of the files and the subdirectories present in that
directory along with an inode number for each. The inode number is nothing but an index
to the inode table in which information about the file is stored. For example, if the inode
number of the file letter.txt is 45267, it means that the slot number 45267 in the
inode table contains information about the file letter.txt.
Suppose the file letter.txt is present in a directory called India. If we attempt to cat
the letter.txt file, Unix will first check if the user has the read permission for the directory
India. If so, it will find out whether this directory has an entry with letter.txt. If such an
entry is found, its inode number is fetched from India. This inode number is an index to the
inode table. The contents of the file letter.txt are read from the disk addresses mentioned
in the inode entry of letter.txt and then displayed on the screen.
The file contents are placed in the form of data blocks dispersed throughout the disk. In
each inode, an array is maintained to keep track of the data blocks. The first 10 elements of
the array indicate direct indexing, that is, they directly point to the data blocks that contain
the file content. Thus, a file that needs less than or equal to 10 data blocks is accessible
via the direct index entries. After direct indexing comes single indirect indexing, which
in turn, is followed by double indirect indexing and triple indirect indexing, as shown in
Fig. 2.5.
If the file needs more than 10 blocks, it uses single indirect indexing. It contains a pointer
that points to a block, which in turn, contains an array of pointers pointing to the file’s data
blocks.
Double indirect indexing is used for larger files where a pointer points to a block of
pointers that point to other blocks of pointers, which in turn, point to the file’s data blocks.
Triple indirect indexing is used for extremely large files where a pointer points to a block
of pointers that point to other blocks of pointers, which in turn, point to other blocks of
pointers, which finally point to the file’s data blocks.
A question arises with regard to the maximum size of a file that can be pointed to by an
inode.

[Figure: an inode listing owner, group, file type, permissions, access time, modification time,
inode modification time, and size, followed by direct pointers to the file’s data blocks and by
single, double, and triple indirect pointers that reach the data blocks through blocks of pointers]
Fig. 2.5 Single, double, and triple indirect addressing for large files

Assuming a data block is of size 4KB and there are 10 direct pointers in an inode, the
directly addressable data block size is 10 × 4KB = 40KB.
In case of single indirect indexing, a pointer points to an entire block of pointers. If a
block is of size 4KB, and each pointer is of 4 bytes, there will be 4KB/4 bytes, that is,
1024 pointers in a block, where each pointer points to a 4KB block. This means that a single
indirect addressing can address a file that is 1024 × 4KB = 4MB in size.
Similarly, in double indirect indexing, a pointer points to a block of pointers, which in
turn, point to blocks of pointers. Hence, a double indirect addressing can address a file that
is 1024 × 1024 × 4KB = 4GB in size. By following the same pattern, a triple indirect addressing can
address a file of 1024 × 1024 × 1024 × 4KB = 4TB size.

Note: The maximum file size that Unix supports is the sum of sizes accessible by the direct, single indirect,
double indirect, and triple indirect addressing.
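
The sum mentioned in the note can be worked out with shell arithmetic. The figures are the ones assumed in the text (4KB blocks, 4-byte pointers, and 10 direct pointers); the variable names are illustrative, and the calculation assumes a shell with 64-bit arithmetic, since the total does not fit in 32 bits.

```shell
block_kb=4            # size of one data block in KB
pointer_bytes=4       # size of one block pointer in bytes
direct_pointers=10    # number of direct pointers in the inode

pointers_per_block=$(( block_kb * 1024 / pointer_bytes ))

direct_kb=$(( direct_pointers * block_kb ))
single_kb=$(( pointers_per_block * block_kb ))
double_kb=$(( pointers_per_block * pointers_per_block * block_kb ))
triple_kb=$(( pointers_per_block * pointers_per_block * pointers_per_block * block_kb ))
total_kb=$(( direct_kb + single_kb + double_kb + triple_kb ))

echo "pointers per block: $pointers_per_block"
echo "direct: ${direct_kb}KB  single: ${single_kb}KB  double: ${double_kb}KB  triple: ${triple_kb}KB"
echo "maximum file size: ${total_kb}KB"
```

With these figures the total is 4299165736KB, that is, slightly more than 4TB; almost all of it comes from the triple indirect pointer.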

2.4.4 Data Block


The actual data is stored in data blocks. Apart from these direct data blocks, there are also
indirect blocks that contain the addresses of the direct blocks. The inode maintains a list of
these indirect block addresses.
When a file is created, the kernel looks up the list available in the super block to look for
a free inode. The efficiency of the system is increased to a great extent, as the list is always
updated, and hence, quite reliable. The kernel reads and writes the copy of the super block
in the memory when controlling the allocation of inodes and data blocks.
Since the kernel works with the memory copy of the super block rather than with the disk
copy, the disk copy gradually falls out of date, and the kernel has to update it from the memory
copy. This is done with the sync operation. The information in the file system must, in
particular, be written to the disk before the power of the Unix system is turned off; the system
checks for a possible mismatch during booting.

Note: The kernel always maintains a copy of the super block in the memory. It is the in-memory copy,
rather than the disk copy, that contains the latest and correct file system status.

The information stored in the inode table changes whenever we use any file or change its
permissions; hence, copies of the super block and the inode table are kept in the memory (RAM)
at start-up time, and every modification is made to these RAM copies. The original super block
and inode table on the disk are updated after a fixed interval of time, say every 30 seconds,
by a command called sync. This command synchronizes the inode table in the memory with the
one on the disk by simply writing the memory copy over the disk copy.
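The sync command can also be run by hand, for example before powering off a machine. The following is a minimal sketch; the scratch file created with mktemp is only an illustration and is not part of the original text.

```shell
# Write some data; it may sit in the in-memory buffers for a while.
scratch=$(mktemp)
printf 'data waiting in the buffer cache\n' > "$scratch"

# Ask the kernel to write its in-memory file system data to the disk.
sync
sync_status=$?
echo "sync exit status: $sync_status"

rm -f "$scratch"
```

sync prints nothing on success; an exit status of 0 indicates that the request was accepted.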
The disk space allotted to a Unix file system is made up of blocks, each of which is
typically 512 bytes in size. Some file systems may have blocks of 1024 or 2048 bytes.
Note: The standard system block size is 1024 bytes (known as logical block) and the physical block size is
512 bytes long (i.e., one logical block contains two physical blocks).

■ SUMMARY ■

1. In the Unix operating system, there are three types of files: ordinary files, directory files,
   and device files. Ordinary files are also referred to as regular files, and they may contain
   printable characters.
2. The device files are of two types: character device files and block device files. In character
   devices, read and write operations are performed character by character, that is, 1 byte at a
   time, whereas in block devices, read and write operations are performed one block at a time.
3. Caching is a process in which the block of the disk accessed is kept in buffer in the memory so
   that in future, read and write operations are performed in the cached block of the memory.
4. All device files are stored in the /dev directory.
5. A symbolic link is a special file that points to another existing file on the system. These links
   are used to create several names for the same file. Through the ln command, we can create the
   symbolic link of a file.
6. A pipe is represented as a vertical bar character (|) and is used for sending the output of a
   command as an input to another command. Pipes are of two types: anonymous pipes and named
   pipes. Named pipes are known as FIFO, as once the data is read from the pipe, it cannot be
   read again.
7. Socket files are used for transferring information between two processes that are running on
   different machines.
8. The file system is organized as a tree with a single root node called root that is represented
   as ‘/’.
9. The concept by which a system appears to have more memory than what it actually has is known
   as virtual memory.
10. Mounting a file system means assigning the root directory of the new file system to a
    subdirectory of the root directory of our Unix system. The subdirectory on which the new file
    system is mounted is called the mount point of a file system.
11. Unmounting a file system means detaching the mounted file system from the directory of the
    Unix system on which it was mounted.
12. A Unix file system typically consists of four blocks: boot, super, inode, and data.
13. Every file or directory has an inode number, a unique number that recognizes the file or
    directory in the file system.
14. A file’s inode number can be found using the ls -i command.

■ EXERCISES ■

Objective-type Questions
State True or False
2.1 The first block of the Unix file system is known as super block.
2.2 Every file or directory has a unique inode number.
2.3 Unix also treats the physical devices as files.
2.4 tmp is the folder in which all administrative files are kept.
2.5 Double indirection is used for smaller files.
2.6 We can create several names for the same file through symbolic links.
2.7 Named pipes are also known as last in first out (LIFO).
2.8 In order to see the files or directories of any device, its file system needs to be mounted.
2.9 In block devices, read and write operations are performed one byte at a time.
2.10 Pipes are of two types: anonymous and named.

Fill in the Blanks


2.1 In the Unix system, a filename may approximately be ______ characters long.
2.2 Every Unix system has a ______ file system in it.
2.3 The ______ command is used to unmount a file system.
2.4 There are three types of files in Unix: ______, ______, and ______.
2.5 Every file or directory is represented by a unique number known as ______.
2.6 The concept by which our system appears to have more memory than what it actually has is
    known as ______.
2.7 Character devices are also known as ______.
2.8 The boot block is the first block of the file system that contains a ______ program.
2.9 Inodes are maintained in an array form and are accessed through their indices known as ______.
2.10 Ordinary files are also referred to as ______, and they contain printable characters.

Multiple-choice Questions
2.1 The first block of a file system is
    (a) super block (b) data block (c) inode block (d) boot block
2.2 If a directory has 10 files, the number of entries in the directory file will be
    (a) 10 (b) 11 (c) 9 (d) 0
2.3 In the Unix operating system, the files are divided into three categories: ordinary,
    directory, and
    (a) special files (b) hidden files (c) device files (d) inode files
2.4 The directory in which executable files of the Unix operating system are kept is
    (a) lib (b) etc (c) dev (d) bin
2.5 The Unix file system is organized as a tree with a single node at the top known as
    (a) foundation (b) root (c) seed (d) stem
2.6 The indexing by which a pointer points to a block of pointers that point to other blocks of
    pointers, which in turn, point to the file’s data blocks is known as
    (a) direct addressing
    (b) single indirect addressing
    (c) double indirect addressing
    (d) triple indirect addressing
2.7 The command that synchronizes the inode table in the memory with the one on the disk is
    (a) sync (b) synchronizer (c) tally (d) matcher
2.8 The reserved inode number 0 refers to the
    (a) linked files
    (b) deleted files and directories
    (c) directories
    (d) device files
2.9 The bootstrap program is a short program loaded by
    (a) data block (b) hard disk (c) BIOS (d) named pipe
2.10 The number of sections or blocks that a file system has is
    (a) 1 (b) 2 (c) 3 (d) 4

Review Questions
2.1 Write short notes on the following:
    (a) Inode block
    (b) Ordinary files
    (c) Pipes
    (d) Symbolic link
    (e) Inode table
2.2 What are the different blocks that constitute a Unix file system?
2.3 Explain the procedure of mounting and unmounting a file in a Unix operating system.
    What is the significance of this process?
2.4 Differentiate the following:
    (a) Character and block devices
    (b) Boot block and data block
    (c) Single and double indirect addressing
2.5 Explain the role of default files and directories in the Unix operating system.

■ ANSWERS TO OBJECTIVE-TYPE QUESTIONS ■


State True or False
2.1 False  2.2 True  2.3 True  2.4 False  2.5 False
2.6 True  2.7 False  2.8 True  2.9 False  2.10 True

Fill in the Blanks
2.1 255  2.2 swap  2.3 umount  2.4 ordinary files, directory files, device files
2.5 inode number  2.6 virtual memory  2.7 raw devices  2.8 bootstrap  2.9 i number
2.10 regular file

Multiple-choice Questions
2.1 (d)  2.2 (a)  2.3 (c)  2.4 (d)  2.5 (b)
2.6 (c)  2.7 (a)  2.8 (b)  2.9 (c)  2.10 (d)
CHAPTER 3

Basic Unix Commands

After studying this chapter, the reader will be conversant with the following:
• Some basic commands that are frequently used
• Logging in to the system, changing password, checking who is logged in,
and displaying date and time of the system
• Dealing with file operations such as creating files, displaying their contents,
deleting files, creating links to files, renaming files, and moving files
• Maintaining directories, creating a directory, changing the current directory,
and removing a directory
• Displaying calendars, using basic calculators, displaying information about
current systems, deleting symbolic links, and exiting from a Unix system

Unix has a large family of commands. However, even before we discuss how to perform a
task with the help of these commands, we need to first log in to the system. Let us see how
this is done.

3.1 LOGIN: LOGGING IN TO SYSTEMS


The first step involved while working on the Unix system is to log in or identify ourselves
with the system. We should be assigned a user ID and password by the administrator,
which would enable us to log in to the system. The user ID and password (along with
other information) are assigned while adding a new user to the system. As soon as we
switch on the Unix system, we are prompted to log in. The login prompt appears as Login.
At the login prompt, we type the user ID (a unique login name provided by the
administrator).
After having typed the user ID, we are prompted for a password. For security, the password
is not echoed as typed; here it is shown as a string of asterisk symbols (many Unix systems
display nothing at all while the password is being entered), in the following manner.
Example login: chirag


password: *****

Note: One of the main security features of the Unix operating system is the displaying of asterisks while typing
the password and storing the actual password in an encrypted format (also known as the hash of the password)
in the /etc/shadow file that can be accessed only by the root.

In case the user ID or password is wrongly entered, we get the following error message:
Login incorrect
login:

This message informs the user that either the user ID or the password has been entered
incorrectly, and a new login prompt is displayed to try again.
If the user ID and password are correct, we are logged in to the Unix system and placed in
our home directory, that is, the directory in which our personal files and settings are stored.
In addition, a message indicating when we last logged in, along with the shell prompt, is
displayed:
Last login: Fri Dec 15 10:30:05 on ttys17
$

This message indicates the date, time, and terminal from which we last logged in. The
message is followed by the default Unix shell prompt, at which we can type and execute
Unix commands. The default prompt for the Bourne, Bash, and Korn shells is the
dollar sign ($). For the C and tcsh shells, the prompt is the percent sign (%).
You must be wondering who the administrator refers to. Let us understand this term.
System administrator A system administrator is a person who is responsible for setting up
and maintaining the Unix operating system. He/She is responsible for the proper functioning
of the Unix system and also ensures that the system resources are optimally utilized. The
following are a few of the tasks performed by the system administrator:
1. Set up and maintain user accounts
2. Monitor access and privileges and set up security policies
3. Monitor system performance and ensure proper utilization of resources
4. Install and upgrade software whenever desired
5. Take backup at regular intervals and restore systems in case of a crash
6. Perform proper starting and shutting down of systems

3.2 OVERVIEW OF COMMANDS


Commands refer to one-line statements that operate on the supplied operands to perform some
action. Each command is supported with certain options that add extra features to the command.
These options let the user direct the command to carry out the desired action. Options
are mostly prefixed with a hyphen (-), and more than one option can be used in a command.
We are going to look at very basic internal and external commands that users execute while
working on the Unix operating system. These command functions include listing of files
and directories, making new directories, changing directories, removing directories, creating
files, looking at the content of the files, copying, renaming, and deleting files, viewing system
date and time, and knowing the list of users who are logged in, among others.
These are the general operations that a user performs most often while working with the Unix
operating system.

3.2.1 Structure
As mentioned in Section 3.2, a traditional Unix command consists of options and arguments
(operands). Options are generally a single character prefixed by a hyphen (-) and are used for
selecting a particular feature of the command. The argument refers to the content or data to
which the command has to be applied. An argument can be a file, directory, terminal, device, etc.
The syntax of a Unix command is as follows:
Unix_Command [-option1] [-option2] ... [Argument]
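The general syntax can be illustrated with the ls command. The following is a minimal sketch that passes two options and one argument, first with the options written separately and then combined behind a single hyphen; the scratch directory and the names in it are assumptions made for the example.

```shell
# Scratch directory that serves as the argument of the command
demo=$(mktemp -d)
touch "$demo/notes.txt" "$demo/.profile"

# Two options given separately, followed by the argument
separate=$(ls -l -a "$demo")

# The same two options combined behind one hyphen
combined=$(ls -la "$demo")

if [ "$separate" = "$combined" ]; then
    echo "Both forms produce the same listing"
fi
```

Most commands accept either form, although a few options take a value of their own and therefore cannot be merged freely.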
Let us understand the different types of commands in Unix.

3.2.2 Types of Commands in Unix


The basic commands in Unix are divided into two broad categories:
Internal commands The shell has a number of built-in commands that are known as internal
commands. Some of these, such as cd and echo, do not generate a process, and are directly
executed by the shell. These commands are built into the shell and do not exist as separate files.
External commands External commands are Unix utilities and programs such as cat and
ls. These commands exist in the form of individual files and are distributed in different
directories. The commonly used user commands are placed in the /bin directory, and the
commands that are usually used only by system administrators are placed in the /etc directory.
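The shell itself can report whether a command is internal or external. The following is a minimal sketch using the standard type and command -v facilities; the exact wording of the messages and the directory reported for ls vary from system to system.

```shell
# cd is built into the shell, so no separate file is reported for it
cd_kind=$(type cd)
echo "$cd_kind"

# ls is an external command; command -v prints the file that would be run
ls_path=$(command -v ls)
echo "ls is stored in: $ls_path"
```

On many current systems, ls is found in /bin or /usr/bin.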
In the subsequent sections, we will be learning about the usage of the basic commands in
Unix, namely passwd, ls, mkdir, cd, rmdir, pwd, uname, touch, cat, cp, rm, mv, ln, unlink, tput,
who, finger, date, cal, echo, bc, and exit, along with filename substitution (globbing) and
line-continuation characters.

➢ passwd: Changing password


It is considered good practice to change the login password at regular intervals so as to
prevent unauthorized access to our files. The command for changing the password is passwd.
Syntax $ passwd

On executing the passwd command, we will be prompted to enter the old password before
giving the new password (to confirm that only an authorized person is changing the password).
In addition, the new password should be significantly different from the older one. It should
be at least six characters long, and have at least two alphabetic characters, one digit, and one
special character. On executing the command, we may get the output shown in the following example.
Example
$passwd
Changing password for chirag
Old password: *********
New password: **********
Re-enter new password: **********
If the new password and the old password are not very different from each other, we may get
the following error:
Passwords must differ by at least 3 positions
The two passwords entered in New password and Re-enter new password should be the same;
else we will get the following error:
They don't match
Try again
In case the two passwords entered in New password and Re-enter new password are exactly
the same, the password of the user will be changed and we will get a confirming message:
Password updated successfully.

➢ ls: Listing files and directories


This command lists the files and directories on the disk. By default, they are displayed in
alphabetical order. If the name of a directory is not specified, the command lists the files
and directories of the current working directory.
Syntax $ ls [-options] [directory_name]

There is a list of options available with the ls command, as shown in Table 3.1.
Table 3.1 List of options available with the ls command
Option   Syntax                  Description
-x       ls -x                   Shows files in multiple columns (default)
-F       ls -F                   Shows files and directories; directories have / as suffix
-r       ls -r                   Shows files sorted in reverse alphabetical order
-R       ls -R                   Shows the recursive listing, that is, files of directories as well as subdirectories are also displayed
-a       ls -a                   Shows all the hidden and visible files; hidden files start with a dot (.)
-d       ls -d directory_name    Shows only the directory name instead of listing its content; used with the -l option to know the status of the directory
-l       ls -l                   Shows files in the long-listing format (shows seven attributes of a file, that is, file permissions, number of links, owner, group, size, date and time, and file/directory name)
-t       ls -t                   Sorts files by modification time; the latest file is on the top
-u       ls -u                   Sorts files according to the last access time, starting with the most recent file
-i       ls -i                   Shows inode number of all the files

While listing and searching for files and directories, we can also make use of wild-card
characters. These characters help in finding files and directories that begin with specific
character(s), contain specific character(s) or a range of characters in their names, consist of
names of a specific length, and so on. They provide a quick and convenient way of searching
for the desired files and directories.
Wild card matching A string is a wild-card pattern if it contains one of the following
characters: ‘?’, ‘*’, or ‘[’.
1. ‘?’ matches any single character.
2. ‘*’ matches 0 or more instances of any character, that is, it also matches the empty string.
3. [c1-c2] matches a single instance of any character within the range c1 to c2. For
   example, [a-d] is equivalent to [abcd]. Similarly, [0-9] represents a digit from 0 to 9,
   and [a-zA-Z] represents all lower-case and upper-case letters.
We will learn more about wild cards and filename substitution (globbing) at the end of this chapter.
In order to understand these options, let us for a while assume that on giving the ls command
in the current directory, we get a list of the following files and directories:
$ls
courses
notes.txt
programs.doc
university
..

To get all the files beginning with a specific character, we can give a command using the
following syntax:
Syntax $ ls charactername*

Here, * represents 0 or more occurrences of any character.


Example In order to obtain all the files beginning with character n, we can give the
following command.
$ ls n*
notes.txt

In order to get all the files beginning with a character in a given range, we give the command
in the following syntax.
Syntax $ ls [c1-c2]*

Here, c1 and c2 represent the beginning and ending character of the range, respectively.
Example In order to get all the files beginning with characters a to d, we can give the
following command.
$ ls [a-d]*
courses

Similarly, we can use the wild-card character, ?, which represents a single character, to get
the desired files. For example, to get all the files that consist of three characters and begin
with character a, we can give the following command:
$ ls a??
However, since none of the files meet these criteria (assuming no filename exists that is three
characters long and begins with character a), no file is listed. In Bourne-family shells, the
unmatched pattern is passed to ls unchanged, so ls reports that no such file exists.
In order to get all the files that begin with character a followed by any digit, we can give
the following command:
$ ls a[0-9]*
Again, no name in our list begins with character a followed by a digit, and thus no file is
listed.
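The wild-card patterns discussed above are expanded by the shell before the command runs. The following is a minimal sketch that recreates the four sample names in a scratch directory (an assumption made for the example) and uses echo to display what each pattern expands to in a Bourne-family shell.

```shell
# Recreate the sample names in a scratch directory
demo=$(mktemp -d)
cd "$demo"
touch courses notes.txt programs.doc university

# Names beginning with n
starts_n=$(echo n*)
echo "n*      -> $starts_n"

# Names beginning with any character from a to d
in_range=$(echo [a-d]*)
echo "[a-d]*  -> $in_range"

# No three-character name begins with a, so the pattern is left unchanged
no_match=$(echo a??)
echo "a??     -> $no_match"
```

Since the last pattern matches nothing, the shell hands the literal text a?? to the command, which is why ls would complain about a missing file in that case.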
If we use the -l option for long listing, we may get the following output:
$ ls -l
-rwxr--r-- 2 chirag it 48 Nov 11:31 courses
-rw-rwxr-- 1 chirag it 669 Dec 09:15 notes.txt
-rwxrwxrwx 1 chirag it 1560 Nov 11:21 programs.doc
-rwxr-xrw- 2 chirag it 65 Dec 05:10 university

Seven attributes are displayed: file permissions, number of links, owner, group, size, date
and time, and file/directory name.
In order to see all the files, including the hidden files, we use the -a option. The output is
as follows:
$ ls -al
-rwxr--r-- 2 chirag it 80 Nov 11:31 .
-rwxr--r-- 2 chirag it 72 Nov 11:31 ..
-rwxr--r-- 1 chirag it 210 Nov 11:31 .profile
-rwxr--r-- 2 chirag it 48 Nov 11:31 courses
-rw-rwxr-- 1 chirag it 669 Dec 09:15 notes.txt
-rwxrwxrwx 1 chirag it 1560 Nov 11:21 programs.doc
-rwxr-xrw- 2 chirag it 65 Dec 05:10 university

Note: Filenames that begin with the dot (.) are considered hidden files in Unix.

By default, the file and directory names are sorted alphabetically. We can use the -t option to
sort them according to the modification time; the most recently modified file is displayed at the top.
$ ls -lt
-rw-rwxr-- 1 chirag it 669 Dec 09:15 notes.txt
-rwxr-xrw- 2 chirag it 65 Dec 05:10 university
-rwxr--r-- 2 chirag it 48 Nov 11:31 courses
-rwxrwxrwx 1 chirag it 1560 Nov 11:21 programs.doc

In order to get the inode number of the specified file, we can use the -i option, as shown here:
$ ls -li programs.doc
39984 -rwxrwxrwx 1 chirag it 1560 Nov 11:21 programs.doc

The number 39984 is the inode number of the file programs.doc. Let us recall a concept from
Chapter 2: each file or directory in the Unix operating system has a unique number known as
the inode number, which identifies the file or directory in the file system.
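Several of the ls options described in this section can be tried safely in a scratch directory. The following is a minimal sketch; the directory and the names created in it are assumptions that mirror the sample listing, with courses and university created as directories.

```shell
# Build a small sample tree
demo=$(mktemp -d)
cd "$demo"
mkdir courses university
touch notes.txt programs.doc .profile

echo "--- ls";    ls        # visible names only
echo "--- ls -a"; ls -a     # also shows ., .., and .profile
echo "--- ls -F"; ls -F     # directories are marked with a trailing /
echo "--- ls -r"; ls -r     # reverse alphabetical order
echo "--- ls -di courses"; ls -di courses   # inode number of the directory itself

# Count the entries with and without the hidden ones
visible=$(($(ls | wc -l)))
with_hidden=$(($(ls -a | wc -l)))
echo "visible: $visible, including hidden: $with_hidden"
```

The inode number printed by the -i option differs from one system to another.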

➢ mkdir: Making directories


The mkdir command enables us to create one or more new directories.
Syntax $ mkdir [-m mode] [-p] dirname

The option -m stands for mode and is used for creating the directory with certain specific
permissions.
The option -p stands for parent and is used for first creating all the non-existing parent
directories that are mentioned in the given path.
dirname is the directory name that may be either an absolute path name or a relative path
name. We may specify more than one directory name on a single command line.
Note: Absolute and relative paths. A path refers to the exact location of a given file or directory. Basically,
directories exist in a tree hierarchy, one inside another, and a directory or file is referred to through a path, where
the path components are delimited by the forward slash (/).
A path can be an absolute path or a relative path. The absolute path points to the given file or directory
regardless of the current working directory and is written in reference to the root directory, whereas the relative
path is a path for a given file or directory in relation to the current working directory. Remember, the absolute
path always starts with a forward slash, which represents the root directory. Moreover, the absolute path of the
given file or directory is always the same, whereas the relative path changes according to the current directory
location. The following are the examples:
(a) Assuming a directory projects exists inside another directory usr, which exists in the root directory, and that
the current working directory is usr, the following are the two paths to the projects directory:
Absolute path: /usr/projects
Relative path: projects
(b) Similarly, if there is another directory experiments inside the directory /usr, and the current working
directory is projects, then the following are the two paths to the experiments directory:
Absolute path: /usr/experiments
Relative path: ../experiments
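The two kinds of paths in example (b) can be checked with a short experiment. The following is a minimal sketch in which a scratch directory stands in for the root directory (an assumption, so that the real /usr is left untouched); a file is created through the relative path and then looked up through the absolute one.

```shell
# $root plays the role of / in this experiment
root=$(mktemp -d)
mkdir -p "$root/usr/projects" "$root/usr/experiments"

# Make projects the current working directory
cd "$root/usr/projects"

# Create a file through the relative path ...
touch ../experiments/via_relative.txt

# ... and look for it through the absolute path
if [ -f "$root/usr/experiments/via_relative.txt" ]; then
    same_place=yes
else
    same_place=no
fi
echo "Both paths lead to the same directory: $same_place"
```

The relative path works only while projects is the current directory, whereas the absolute path works from anywhere.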

Consider the following example.


$ mkdir courses

This command creates a directory by the name courses under the current directory.
$ mkdir courses faculty placement

This command will create three directories by the names courses, faculty, and placement.
Note: If dirname already exists, the mkdir command aborts and does not overwrite the existing directory.

$ mkdir courses

Since a directory with the name courses already exists, this command generates the following
error:
mkdir: can't make directory courses

By default, directories are created with read, write, and execute permissions for the owner and
with read and execute permissions for the group and others. However, in order to create
a directory with a particular set of permissions of our choice, we can use the following command:
$ mkdir -m 746 country

This command creates a directory country with read, write, and execute permissions for the
owner; only read permission for the group; and read and write permissions for others.
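The effect of the -m option can be confirmed with a long listing. The following is a minimal sketch run in a scratch directory (an assumption made for the example); cut keeps only the first ten characters of the listing, which hold the file type and the nine permission characters.

```shell
demo=$(mktemp -d)
cd "$demo"

# 7 = rwx for the owner, 4 = r-- for the group, 6 = rw- for others
mkdir -m 746 country

# Keep the file type and the nine permission characters
perms=$(ls -ld country | cut -c1-10)
echo "$perms"
```

The leading d in the result marks the entry as a directory.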
The option -p stands for parent and is used for creating the missing parent directories in the given path.
Example $ mkdir -p university/colleges/professors

This command creates a directory university; within university, a subdirectory
colleges gets created, and under that, a sub-subdirectory professors is created.
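The difference that the -p option makes can be seen by attempting the same path with and without it. The following is a minimal sketch run in a scratch directory (an assumption made for the example).

```shell
demo=$(mktemp -d)
cd "$demo"

# Without -p, the missing parent directories cause a failure
if mkdir university/colleges/professors 2>/dev/null; then
    without_p=created
else
    without_p=failed
fi

# With -p, the whole chain of directories is created
mkdir -p university/colleges/professors
if [ -d university/colleges/professors ]; then
    with_p=created
else
    with_p=failed
fi

# Repeating the command with -p is not treated as an error
mkdir -p university/colleges/professors
again_status=$?

echo "without -p: $without_p, with -p: $with_p, repeat status: $again_status"
```

Because -p does not complain about directories that already exist, it is convenient in scripts that may be run more than once.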
There are several situations in which the directory is not created and the mkdir command
is aborted while displaying the following error:
mkdir: Failed to make directory

The reasons for this error can be any of the following:


1. A directory with the same name already exists.
2. An ordinary file by the same name exists in the current directory.
3. The user does not have the read and write permissions to create files and directories in
the current directory.
Line-continuation characters Sometimes, we come across commands that are too long to
be accommodated in a single line. Such commands are continued in the next consecutive line by
making use of the line-continuation character. The line-continuation character is \ (backslash).
Hence while writing a long command, when we come to the end of a line, we have to type
\ (backslash) and press the Enter key without any space or character following it. The shell
displays the ‘>’ symbol to indicate that the current line is in continuation of the previous line. We
can continue typing the remaining part of the command to the right of the ‘>’ symbol and press the
Enter key when the command is over in order to get its output. We can use the line-continuation
character, that is, \ (backslash) any number of times for a single command.
Figure 3.1 demonstrates this by using the mkdir command for creating three directories,
courses, faculty, and placement, while using the line-continuation character
(i.e., backslash).

$ mkdir courses \
> faculty \
> placement
$
Fig. 3.1 Line-continuation character used in the mkdir command

➢ cd: Changing directories


We use the cd command to change to any directory in the file system.
Syntax $ cd pathname

Here, pathname is either an absolute or a relative path name for the desired target directory.
Example $ cd ajmer

This command changes our current directory to ajmer (that is assumed to exist in the current
directory). When we directly give the directory name (without using ‘/’ as prefix), it means
that it is a relative path (i.e., a path related to the current directory).
$ cd /home/chirag/ajmer
This command takes us to the sub-subdirectory ajmer, which is in the chirag
subdirectory of the home directory. The path used in the aforementioned
example is an absolute path.
$ cd ..
This command takes us to the parent directory.
Note: .. refers to the parent directory.

We can return to our home directory from any other directory by simply typing the
cd command without an argument. We do not need to specify our home directory as an
argument, because our shell always knows the name of our home directory.
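The shell keeps the name of the home directory in the HOME variable, which is what cd uses when no argument is given. The following is a minimal sketch, assuming HOME is set as it normally is after login.

```shell
# Remember where we started
start=$(pwd)

# Move somewhere else, here the root directory
cd /

# cd without an argument returns to the home directory
cd
landed=$(pwd)
echo "cd without an argument took us to: $landed"

# Go back to where we started
cd "$start"
```

The directory reported should be the one stored in the HOME variable.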

➢ rmdir: Removing directories


This command is used to remove a directory.
Syntax $ rmdir [-p] pathname

Here, the -p option is used for also deleting each parent directory named in the path, provided it becomes empty.
Note: The rmdir command cannot remove a directory unless it is empty.

Examples
(a) In order to remove a single directory, consider the following example.
$ rmdir ajmer

This removes the directory ajmer if it is empty; else, we will get the following error:
rmdir: ajmer: Directory not empty

(b) We can delete more than one directory using the following single command.
$ rmdir courses placement

The directories that are empty will be deleted with this command.
$ rmdir university/colleges/professors university/colleges university

This command first deletes the professors sub-subdirectory from the colleges subdirectory;
then it deletes the colleges subdirectory from the university directory; and finally, it deletes
the university directory itself.
We can get the same result using the -p option as follows:
$ rmdir -p university/colleges/professors
Remember, we cannot use rmdir to remove our current working directory. If we wish to
remove our working directory, we have to first come out of it.
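The behaviour of rmdir with and without the -p option can be tried in a scratch directory. The following is a minimal sketch; the scratch directory is an assumption, and the directory names are the ones used in the examples above.

```shell
demo=$(mktemp -d)
cd "$demo"
mkdir -p university/colleges/professors

# rmdir refuses to remove a directory that is not empty
if rmdir university 2>/dev/null; then
    non_empty=removed
else
    non_empty=refused
fi

# With -p, professors, colleges, and university are removed in turn
rmdir -p university/colleges/professors
if [ -d university ]; then
    remaining=yes
else
    remaining=no
fi

echo "non-empty directory: $non_empty, university still present: $remaining"
```

Each directory is removed only after the one below it has been removed and it has thereby become empty.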

➢ pwd: Print working directory


pwd stands for print working directory. It displays the absolute path name of our working
directory.
Syntax $ pwd

Example $ pwd
/home/chirag

This output indicates that we are in the home directory of the user ID chirag. We can see that
the pwd command displays the full path name of the current directory.
The pwd command is a valuable utility when we are moving around in the file system
hierarchy. If we change our directory, pwd confirms the change of our location, as shown in
the following sequence of commands:
$ pwd
/home/chirag
$ cd ajmer
$ pwd
/home/chirag/ajmer

We can see that when we change our directory to the ajmer subdirectory, the output displayed
by the pwd command confirms the same.

➢ uname: Displaying information about current system


The uname command displays information about the current system: hardware platform,
name of the operating system, and release level.
Syntax uname [-a] [-i] [-n] [-r] [-v] [-s] [-S system_name]

The options and arguments shown in the aforementioned syntax are briefly explained in
Table 3.2.
Table 3.2 Brief description of the options in the uname command
Option   Description
-a       Displays all the basic information available about the system
-i       Displays the name of the hardware platform
-n       Displays the node name, that is, the name by which the system is known to the communication network
-r       Displays the operating system release level
-v       Displays the operating system version
-s       Displays the name of the operating system (default)
-S       Used to get basic information of the specified system name (Only the super user can use this option.)

Note: Super user and root user refer to the Unix administrator.

Examples
(a) $ uname -a
SunOS station1 5.10 Generic_147441-01 i86pc i386 i86pc
This output shows the basic information of the system, including the hardware
platform, the operating system, its version, and so on.
(b) $ uname -n
station1
This output shows that our machine is known on the network by the name station1.
(c) $ uname -i
i86pc
This output is the name of the hardware platform; i86pc denotes an x86-based PC.
(d) $ uname -r
5.10
This output shows the operating system release level.
(e) $ uname -s
SunOS
This output indicates that our machine is running the SunOS (Solaris) operating system.
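The output of uname is often used in scripts that have to behave differently on different systems. The following is a minimal sketch that uses only the -s, -r, and -n options; the labels assigned in the case statement are illustrative and cover just a few common system names.

```shell
os_name=$(uname -s)   # name of the operating system
release=$(uname -r)   # release level
node=$(uname -n)      # node name on the network

# Map a few well-known system names to a readable label
case "$os_name" in
    SunOS)  family="Solaris" ;;
    Linux)  family="Linux" ;;
    Darwin) family="macOS" ;;
    *)      family="another Unix-like system" ;;
esac

echo "$node runs $os_name release $release ($family)"
```

The values printed depend entirely on the machine on which the commands are run.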

➢ touch: Creating files and changing time stamps


The touch command is used for creating files and changing time stamps. Here, time stamps
means both the times, that is, the time the file was last accessed and the time the file was last
modified.
Syntax touch [-m] [-a] [time_expression] filename

Here, the -m option is used for changing the modification time, and the -a option is used
for changing the access time. The time_expression that we provide should be in the
following format: MMDDhhmm, where MM: month, DD: day, hh: hour, and mm: minute.
When the touch command is given without any option and time expression, it simply
creates a file of zero bytes.
Examples
(a) $ touch chirag.txt
This creates a file called chirag.txt of zero byte.
We can create several empty files quickly with the touch command.
(b) $ touch chirag1 chirag2 chirag3 chirag4
This command creates four new files with the following names: chirag1, chirag2,
chirag3, and chirag4 (without any contents in them).
(c) $ touch 09211520 chirag.txt
This sets the modification and access time of the file chirag.txt to Sep 21 15:20.
(d) $ touch -m 11071015 chirag.txt
This command sets the modification time of the file chirag.txt to Nov 07 10:15.
(e) $ touch -a 07120820 chirag.txt
This will set the access time of the file chirag.txt to Jul 12 08:20.
Note: The commands ls -l and ls -lu can be used to display the modification time and access time,
respectively, of any file.
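On current POSIX systems, the time expression is passed to touch through the -t option; without it, a string such as 09211520 would be treated as just another filename. The following is a minimal sketch under that assumption, run in a scratch directory, which sets the time stamps and then displays them with ls -l and ls -lu.

```shell
demo=$(mktemp -d)
cd "$demo"

# Create an empty file
touch chirag.txt

# Set both time stamps to Sep 21 15:20 of the current year
touch -t 09211520 chirag.txt

# Change only the access time to Jul 12 08:20
touch -a -t 07120820 chirag.txt

# ls -l shows the modification time; ls -lu shows the access time
modified=$(LC_ALL=C ls -l chirag.txt)
accessed=$(LC_ALL=C ls -lu chirag.txt)
echo "$modified"
echo "$accessed"
```

LC_ALL=C is used here only so that the month names are printed in English regardless of the local language settings.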

➢ cat: Showing, creating, and concatenating files


The cat command is basically a short form of concatenate, which means ‘combine’. With
the cat command, we can not only create files and see their contents but also combine their
contents. The operator > can be used with the cat command to combine multiple files into a
single one. The operator >> can be used to append to an existing file.
Syntax cat [-n] [-s] [-v] files

The options and arguments shown in the aforementioned syntax are briefly explained in
Table 3.3.
Table 3.3 Brief description of options available with the cat command
Options Description
-n It precedes each line output with its line number.
-s It suppresses messages when non-existent files are used in the command.
-v It displays non-printing characters, except tabs, new lines, and form feeds, that exist in a file. To
display new lines, the -e option is used along with the -v option. To display tabs and form feeds, the -t
option is used along with the -v option. The new lines are represented by ‘$’, tabs are represented by
‘^I’, and form feeds are represented by ‘^L’.

Showing content To display the contents of any file, we just need to specify the filename
after the cat command
$ cat chirag

This command shows the contents of the file chirag.


If more than one filename is specified, cat will display the contents of all the files one after the
other, that is, the contents of the first file followed by the contents of the second file, and so on.
$ cat chirag notes.txt
This command displays the contents of the file chirag followed by the contents of the file
notes.txt.
To get the line numbering along with the file contents, we use the -n option with the cat
command:
$ cat -n chirag
1 Today it might rain
2 Thanks so much
3 This is supposed to be a leap year

Note: We assume that the file chirag contains a couple of tabs that are deliberately added to the file.

Creating files For creating files through the cat command, we redirect the standard output
to a file instead of the monitor, as shown in the following example:
$ cat >chirag

After we press the Enter key, the cursor moves to the next line and waits for us to
type the text that we want to store in the file chirag. After typing a few lines, we press Ctrl-d.
Note: Pressing Ctrl-d signals the end of file (EOF).

Example $ cat >chirag


Today it might rain
Thanks so much
This is supposed to be a leap year
Ctrl-d

Showing hidden characters in files The following command shows the hidden characters
and new lines in the form of $ (refer to Fig. 3.2):
$ cat -ve chirag

Today it might rain$
Thanks so much$
This is supposed to be a leap year$

Fig. 3.2 New lines displayed in the form of $

The following command shows the hidden characters, with tabs in the form of ‘^I’ and form
feeds as ‘^L’, as shown in Fig. 3.3.
$ cat -vt chirag

Today^Iit^Imight^I^Irain
Thanks so much
This is^Isupposed^Ito be a leap year

Fig. 3.3 Tabs and form feeds displayed as ^I and ^L

The cat command, apart from displaying the contents of a file, also helps concatenate the
contents of files.
Concatenating files To concatenate the contents of two files and store them in the third
file, we can use the following command:
$ cat chirag1 chirag2 >chirag3

This command stores the contents of the file chirag1 followed by the contents of the file
chirag2 into the file chirag3. If chirag3 already contains something, it would be overwritten.
If we want it to remain intact and the contents of chirag1 and chirag2 to be appended, we
should use the following command:
$ cat chirag1 chirag2 >>chirag3
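The difference between > and >> can be tried out safely in a scratch directory. This is a minimal sketch: a here-document stands in for typing the text and pressing Ctrl-d, and the temporary directory is an assumption made only for the demonstration.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Create two small files without interactive typing.
cat > chirag1 <<'EOF'
Today it might rain
Thanks so much
EOF
cat > chirag2 <<'EOF'
This is supposed to be a leap year
EOF

cat chirag1 chirag2 > chirag3     # > overwrites chirag3 with both files
cat chirag1 >> chirag3            # >> appends chirag1 to what is already there

cat -n chirag3                    # show the result with line numbers
line_count=$(( $(wc -l < chirag3) ))
echo "chirag3 has $line_count lines"
```

Running the first cat command again would reset chirag3 to three lines, whereas repeating the second one keeps adding two lines each time.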

➢ cp: Copying files


The cp command is used to create a duplicate copy of ordinary file(s) under another name.
Syntax cp [-ir] srcfile destfile

Here, srcfile is the original or source filename, and destfile stands for destination filename.
If a file by the destination filename already exists, it will be overwritten with the contents of
the source file without any warning.
The option -i is used for interactive copying, that is, if a file by the destination filename
already exists, then cp will prompt us before overwriting the file.
The option -r is used for recursive copying, that is, when we want to make a copy
of an entire directory (along with its subdirectories and files) using another directory name.
Example $ cp chirag chirag1
This example makes a copy of the file chirag under the name chirag1. We can confirm this by
looking at the contents of both the files. If the contents of both the files are found to be the
same, it indicates that the chirag file has been successfully copied to chirag1. With the
help of the cat command, we can look at the contents of the files chirag and chirag1.
$ cat chirag
Microchip Computer Education
Sri Nagar Road, Ajmer
Gone time never returns

$ cat chirag1
Microchip Computer Education
Sri Nagar Road, Ajmer
Gone time never returns

The contents of chirag and chirag1 are the same, which confirms that
the file chirag has been copied to chirag1.
$ cp /home/chirag/ajmer/a.bat .

This command copies the file a.bat from the directory ajmer (subdirectory of chirag)
into the current directory. The period (.) at the end of the cp command denotes the current
directory.
For interactive copying, we use the following command:
$ cp -i chirag chirag1

If a file by the name chirag1 already exists, then, before overwriting it, we will be notified
with the following message:
cp: overwrite chirag1 (yes/no)?

Here, we need to enter y followed by the Enter key if we want to overwrite the file.
For copying an entire directory along with its subdirectories, we use the following
command:
$ cp -r courses latestcourses

It will make a copy of the courses directory (along with its files and subdirectories) with the
name latestcourses.
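Instead of comparing the two files by eye with cat, the copy can be verified mechanically. The sketch below assumes the standard cmp utility, which is not covered in the text above; the directory layout under courses is hypothetical.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

printf 'Microchip Computer Education\n' > chirag
cp chirag chirag1

# cmp -s prints nothing and succeeds only if the files are byte-for-byte identical.
if cmp -s chirag chirag1; then
    copy_ok=yes
else
    copy_ok=no
fi
echo "copy identical: $copy_ok"

# Recursive copy of a small directory tree.
mkdir -p courses/unix
printf 'shell basics\n' > courses/unix/notes.txt
cp -r courses latestcourses
ls -R latestcourses
```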

➢ mv: Renaming files


The mv (move) command renames a file or moves it to another location. When the source and
the destination are in the same file system, the data of the file is not copied: in the directory
entry, the link to the old filename is removed, and it is replaced by a link to the new filename.
Only when the destination lies in a different file system is the data copied to the new location
and then removed from the old one.
Syntax mv oldname newname

This command will change the filename from oldname to newname.
Example $ mv chirag chirag2

This command moves or renames the file chirag to chirag2. When we look at the contents
of the file chirag2, we get the same contents that were in the file chirag, which is shown
here.
$ cat chirag2
Microchip Computer Education
Sri Nagar Road, Ajmer
Gone time never returns

It indicates that the file chirag2 is nothing but the same file that existed earlier with the name
chirag.

$ mv chirag2 ajmer
or
$ mv chirag2 /home/chirag/ajmer

This moves the file chirag2 into the ajmer subdirectory. Now, the file chirag2 is no longer
available in the current directory.
For moving more than one file, we use the following command:
$ mv notes.txt programs.doc /home/chirag/ajmer
The files notes.txt and programs.doc are removed from the current location and moved to
the ajmer subdirectory.
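That a rename only rewrites the directory entry can be observed through the inode number, which stays the same. The following is a small sketch under the assumption that both names live in the same file system (here, one temporary directory); it uses ls -i and awk to extract the inode number.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

printf 'Gone time never returns\n' > chirag
inode_before=$(ls -i chirag | awk '{print $1}')

mv chirag chirag2                 # rename within the same directory
inode_after=$(ls -i chirag2 | awk '{print $1}')
echo "before: $inode_before  after: $inode_after"

mkdir ajmer
mv chirag2 ajmer                  # move into a subdirectory; the name is kept
ls ajmer
```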

➢ rm: Removing files


This command removes one or more ordinary files from a directory. The file is removed by
deleting its pointer in the appropriate directory. In this way, the link between that filename
and the physical file is broken; hence, the file can no longer be accessed through that name.
Syntax rm [-irf] filename ...

Here, each filename is separated by white space. The options and arguments shown in the
aforementioned syntax are briefly explained in Table 3.4.
Table 3.4 Brief description of options available with the rm command
Options Description
-i It is used for interactive file deletion, i.e., we will be prompted for confirmation before the
file is deleted.
-r It is used for recursive deletion, i.e., it is used for removing an entire directory along with
its files and subdirectories.
-f It is used to forcibly remove a file for which we do not have the write permission.

We will learn about file permissions shortly.


Examples
(a) $ rm notes.txt
This command deletes the file notes.txt.
(b) $ rm -i programs.doc
This command prompts us for confirmation before removing the file. The prompt may
be as follows:
rm: remove programs.doc (yes/no)?
To delete the file, we type y followed by the Enter key (or type character n in case we
do not want to delete the file).
(c) We can delete more than one file using a single rm command.
$ rm syllabus notes.txt programs.doc
This command deletes all the three files syllabus, notes.txt, and programs.doc.
(d) $ rm -r courses
This command removes all the files and subdirectories of the courses directory and,
finally, the courses directory itself.

(e) To remove a file that is write protected (for which we do not have the write permission),
we can use the following command.
$ rm -f results
This command deletes the results file even if we do not have the write permission for
the file.
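The three options can be exercised together on throwaway files. This is a minimal sketch in a temporary directory; the filenames follow the examples above, and chmod (covered later with file permissions) is used here only to make one file write-protected.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

touch notes.txt results
mkdir -p courses/unix
touch courses/unix/syllabus

chmod a-w results      # make results write-protected
rm -f results          # -f removes it without asking for confirmation
rm notes.txt
rm -r courses          # removes the directory and everything below it

remaining=$(( $(ls -A | wc -l) ))
echo "entries left: $remaining"
```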

➢ ln: Linking files


The Unix file system allows the creation of more than one filename for the same physical
file. In other words, it is possible to have aliases (links) for any given file.
The ln command may be used for establishing additional links. There are two types of
links—hard as well as symbolic links—and both can be created with this command.
Syntax ln [-sf] oldname newname

After the ln linking, both newname and oldname refer to the same file.
The default link type is hard. In order to create a symbolic link, the symbolic option (-s)
is used.
Example $ ln chirag1 mce1
$ ls

Through the ln command, a hard link will be created for the file chirag1 by the name mce1.
Note: When the -s option is not used with the ln command, a hard link is created.

The ls command lists the filenames and directories in the current directory, including the two filenames
mce1 and chirag1. When we write the following command,

$ ls -l chirag1

we get the following output:


-rwxrwxrwx 2 chirag it 7669 Nov 11:21 chirag1

The three groups of rwx are the permissions for the owner, the group, and others; 2 is the number of
links (also known as link count) of the file; and chirag is the owner. The group name is it.
The size of the file is 7669 bytes. Next comes the date and time the file was last modified.
The output ends with the filename chirag1.
If another link were to be created, the link count would change to 3.
Note: A link count is an integer value that is maintained for each file or directory and indicates the total number
of links pointing to it. When a new link is created, the link count value is increased by one. Similarly, when a link
is removed, the value is decreased by one. When a link count becomes zero, it means the file or directory has
no links, and hence, the disk space allocated to it is deallocated.

Both mce1 and chirag1 point to the same file. When we look at the contents of the file mce1,
we get the same contents as in chirag1.
$ cat mce1
Microchip Computer Education
Sri Nagar Road, Ajmer
Gone time never returns

Note: If we change the contents of the file mce1, the contents of chirag1 will also change, because although
the names mce1 and chirag1 are different, both of them refer to the same file.

To see the inode numbers of the linked files, we give the following command:
$ ls -li chirag1 mce1
20985 -rwxrwxrwx 2 chirag it 320 Nov 11:21 chirag1
20985 -rwxrwxrwx 2 chirag it 320 Nov 11:21 mce1

The -li option with the ls command displays the inode number along with the long listing
of the specified files. We can see that both the files have the same inode number, 20985,
which confirms that both point to the same file.
In order to remove a file with more than one link from the file system, we should delete all
the links with the rm command. For example, let us delete the link mce1 using the following
command:
$ rm mce1

The file still exists under the name chirag1 as confirmed by the following command:
$ cat chirag1
Microchip Computer Education
Sri Nagar Road, Ajmer
Gone time never returns

Let us remove the file chirag1.


$ rm chirag1

The file is now completely inaccessible.


If the destination file already exists, the link will not be created. Assume we want to make
xyz.txt as the link of the file abc.txt and xyz.txt is already an existing file that has some
contents. Let us give the following command:
$ ln abc.txt xyz.txt

The link will not be created and the following error will be displayed:
ln: xyz.txt: File exists

The –f option stands for force option and is used when we want to overwrite an existing file
(while creating a link) without getting any message.
$ ln -f abc.txt xyz.txt

Hard links The default link that is created is a hard link (which we have been using until
now). The following are the characteristics of hard links:
1. Unix hard links can point to programs and files, but not to directories.
2. If the original program or file is renamed, moved, or deleted, the hard link is not broken.
3. Hard links in Unix cannot span different file systems, that is, we cannot have a hard link on
the /usr file system that refers to a program or file on the /tmp file system. The reason is that
hard links share an inode number, whereas each file system has its own set of inode numbers.
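The characteristics listed above can be confirmed with a short experiment. This is a sketch in a temporary directory, using ls -i and ls -l with awk to pick out the inode number and the link count; the filenames follow the earlier examples.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

printf 'Microchip Computer Education\n' > chirag1
ln chirag1 mce1                     # hard link: a second name for the same inode

inode_a=$(ls -i chirag1 | awk '{print $1}')
inode_b=$(ls -i mce1 | awk '{print $1}')
links=$(ls -l chirag1 | awk '{print $2}')
echo "inodes: $inode_a $inode_b  link count: $links"

rm chirag1                          # removes one name only
survivor=$(cat mce1)                # the data is still reachable through mce1
echo "$survivor"
```

Deleting the original name does not break the hard link, because neither name is more "original" than the other once the link exists.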

Symbolic links Hard links cannot be created across file systems; that is, they can
be made only within a single file system. Symbolic links (symlinks) are used to link
to a file on a different file system. The symbolic link, also referred to as a soft link, is a special type
of file that references another file or directory. It simply contains the path name of the file that it
references and contains none of that file’s data. It gives us power and flexibility to manage files: we
can change the symlink to point to the desired file. Access through a soft link is governed by the
permissions of the file or directory it points to. To create a symbolic link in Unix, let us use the following
syntax:
Syntax ln -s target_file symbolic_link

Here, target_file is the name of the existing file for which we want to create the symbolic
link, and the symbolic_link is the symbolic link for the target_file.
Example Consider a file named chirag1. Let us create a symlink called mce1, which points
to the original file, chirag1.

$ ln -s chirag1 mce1

We first specify the target file, the file that we want our symlink to point to, and then specify the
name of our symbolic link. On executing the ls -al command, we will find that the mce1 file
will have an ‘l’ in the long format of the ls command, which confirms that it is a symbolic link.

20985 -rwxrwxrwx 2 chirag it 320 Nov 11:21 chirag1
20985 lrwxrwxrwx 2 chirag it 320 Nov 11:21 mce1 -> chirag1

On a real system, a symbolic link is a separate file with its own inode number, so the inode number
and link count shown for mce1 will generally differ from those of chirag1.

Note: An orphan symlink is a symbolic link that points nowhere, that is, the original target file it used to point
to earlier is either deleted or renamed.

The difference between a symbolic link and a hard link is that a symbolic link can point
to a directory or to a file on another file system, including one mounted from a remote computer.
In addition, when we delete a target file, the symbolic links to that file become unusable, whereas
the remaining hard links still give access to the contents of the file.
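The behaviour of a symbolic link when its target disappears can be sketched as follows. The example assumes the standard test operators -L (is a symbolic link) and -e (exists, after following links); the temporary directory is used only for the demonstration.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

printf 'Sri Nagar Road, Ajmer\n' > chirag1
ln -s chirag1 mce1                  # symbolic link that stores the name chirag1

if [ -L mce1 ]; then is_symlink=yes; else is_symlink=no; fi
via_link=$(cat mce1)                # reading follows the link to chirag1
echo "symlink: $is_symlink, content: $via_link"

rm chirag1                          # delete the target
if [ -L mce1 ] && [ ! -e mce1 ]; then
    state=orphan                    # the link remains but points nowhere
else
    state=ok
fi
echo "after removing the target: $state"
```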

➢ unlink: Deleting symbolic links


The unlink command removes the specified file, including symbolic links.
Syntax unlink filename

Example unlink accounts.txt


If the file accounts.txt exists, this name is removed. If it is a symbolic link, only the link is
deleted, not the file it points to.

➢ tput: Exploiting terminal capabilities


The tput command is used for exploiting terminal capabilities through the terminfo database.
Hence, we can use the tput command to clear the screen, move the cursor, and underline
text. The tput command uses the terminfo database to know the different features that are
supported by a terminal, and converts the commands given by the user through the tput
command into the code that the terminal understands.

Terminfo is a database that defines terminal and printer attributes and capabilities. It contains
information such as the number of rows and columns in a terminal and the attributes of text
displayed on the terminal.
Syntax tput [clear][cup row col][cols][lines][sc][rc][civis][cnorm][dl n][setb]
[setf][bold][sgr0][smul][rmul]

Table 3.5 gives a list of options available with the tput command.

Table 3.5 Brief description of the options available with the tput command
Options Description
clear It clears the whole screen.
cup row col It moves the cursor to the given row and column position. Rows and columns are counted from 0.
cols It displays the number of columns on the terminal screen.
lines It displays the number of lines on the terminal screen.
sc It saves the current cursor location.
rc It restores the cursor position, i.e., it returns the cursor to its last saved location.
civis It makes the cursor invisible.
cnorm It returns the cursor to its normal, visible state.
dl n It deletes n number of lines below, including the current row, i.e., the row in which the
cursor is positioned.
setb n It sets the background colour to colour number n.
setf n It sets the foreground colour to colour number n.
bold It makes the text appear in bold.
sgr0 It turns off all text attributes, including bold.
smul It begins underlining text.
rmul It stops underlining text.

Examples
(a) $ tput cup 10 5
This statement moves the cursor to row 10 and column 5 (both counted from 0).
(b) $ tput cols
This statement displays a value 80, which represents the number of columns of the
terminal screen.
(c) $ tput dl 4
This statement deletes four lines below, including the current row.
(d) $ tput bold
This statement will make the text appear in bold until the sgr0 command is invoked.
(e) $ tput clear
It clears the whole screen.

Note: Although the tput command is mostly used in scripts, it is deliberately introduced here, as its option
clear is frequently used for clearing the screen while running commands at the command prompt.
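In a script, the output of tput is usually captured in variables and reused. The following is a minimal sketch of that pattern; the fallbacks to an empty string and to 80 columns are assumptions added so that the script still runs when no terminal description is available (for example, when the output is redirected).

```shell
# Capture the control sequences once; fall back quietly if tput cannot supply them.
bold=$(tput bold 2>/dev/null || true)
reset=$(tput sgr0 2>/dev/null || true)
columns=$(tput cols 2>/dev/null || echo 80)

message="${bold}Warning:${reset} disk almost full"
printf '%s\n' "$message"
printf 'terminal width: %s columns\n' "$columns"
```

Storing the sequences in variables avoids running tput every time a highlighted message is printed.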

➢ who: Who is online


The who command displays all users who are currently logged in to the system. It returns the
user’s name (ID), terminal, and the time at which he or she logged in.
Syntax who [-u] [-H] [am i]

These options are briefly explained in Table 3.6.

Table 3.6 Brief description of options in the who command
Options Description
-u It displays, for each user who is logged in, the idle time and the process ID in addition to the login
name, the name of the terminal through which the user is logged in, and the date and time of login.
-H It displays information pertaining to the users who are logged in along with the column headings.
am i It displays information only about the user who runs the command.



Examples
(a) $ who
anil tty1 Oct 15 10:56
chirag tty2 Oct 10 11:25
ravi tty5 Oct 15 13:07

To know whether the user is active, we use the -u option, which also indicates how long it
has been since there was any activity. This is known as idle time. It also returns the process
ID for the user.
$ who -u
anil tty1 Feb 10 14:25 0:45 1103
chirag tty2 Feb 08 11:25 old 1568
ravi tty5 Feb 10 15:10 . 1456

If we look at this example carefully, we will see three different formats for idle time. The
first user has had no activity for 0 hours and 45 minutes. The second user has had no activity
for more than 24 hours. Since there is only enough room for 24 hours in the idle time format,
when a user is inactive for more than 24 hours, the system simply says ‘old’. The third
user’s idle time is a period (.), that is, he/she has carried out an activity in the last minute.
(b) If we use the -H option, Unix displays a header that explains each column.
$ who -uH
NAME LINE TIME IDLE PID COMMENTS
anil tty1 Feb 10 14:25 0:45 1103 (:0)
chirag tty2 Feb 08 11:25 old 1568 (:0.0)
ravi tty5 Feb 10 15:10 . 1456 (:0.0)
(c) If we want to view information about ourselves, we can use the argument am i along with
the who command.
$ who am i
chirag tty2 Feb 08 11:25
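Because who prints one line per login with the login name in the first column, its output is easy to process further. The sketch below works on sample lines copied from example (a) rather than on a live system, so the function sample_who is purely illustrative; on a real machine it would be replaced by the who command itself.

```shell
# Sample who output taken from the example above (stands in for the real command).
sample_who() {
    cat <<'EOF'
anil tty1 Oct 15 10:56
chirag tty2 Oct 10 11:25
ravi tty5 Oct 15 13:07
EOF
}

# The first field is the login name; sort -u removes duplicates for users
# who are logged in on more than one terminal.
users=$(sample_who | awk '{print $1}' | sort -u)
user_count=$(( $(printf '%s\n' "$users" | wc -l) ))

echo "$users"
echo "users logged in: $user_count"
```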

➢ finger: Online user’s details


The finger command displays information about users who are logged in. Compared to the
who command, the finger command displays more elaborate information pertaining to these
users. Apart from the login name, terminal name, date, and time of the logged-in users, the
command also displays other information such as the user’s home directory, phone number,
login shell, and mail status, among others.
Syntax finger [-b] [-l] [-s] [-w] [username]

These options are briefly explained in Table 3.7.

Table 3.7 Brief description of the options in the finger command
Options Description
-l Displays information of the user in a long format comprising login name, real name, terminal name, write status,
idle time, login time, office location, office phone number, user’s home directory, home phone number, login
shell, mail status, and the contents of the files, .plan, .project, and so on, from the user’s home directory
-s Displays information of the user in a short format comprising login name, real name, terminal name, write
status, idle time, login time, office location, and office phone number
-b Suppresses printing the user’s home directory and shell in a long format display
-w Suppresses printing the full name in a short format display



If no options are specified, finger defaults to the -l option output if a username is provided;
otherwise, the -s option output is chosen. Fields whose information (such as office
location and phone number) is not available are not displayed in the output.
To find out who is logged in to which terminal, we use the finger command without an
argument, as in the following example.
Example $ finger
Login Name TTY Idle When Where
chirag chirag tty2 3:10 Sat 08:15 :0
ravi ravi tty3 Sat 11:33 :0.0
root root tty1 10d Sat 08:15 :0.0

This list shows three active logins. The actual time at which each user logged in and the time
the terminal has been idle are also listed. The idle time is the time that has elapsed since the
last keystroke. From the idle time, we can usually tell whether someone is at the terminal.
For example, we can say that there is no root user at the tty1 terminal, because there has
been no keystroke for 10 days. The user on the tty2 terminal has not used the terminal for
more than three hours.
The finger command can also be used to get the details of a single user, as shown in the
following example.
$ finger chirag
Login: chirag In real life: (null)
Directory: /home/chirag Shell: /bin/bash
On since Mon Dec 26 02:15 on tty2 from :0.0
No mail.
No Plan.

This output shows the login name, real name (null), home directory, login shell, login time,
terminal name, mail status, and so on.

➢ date: Displaying system date and time


This command displays the system date and time.
$ date

If no argument is given, the current date and time are displayed:


Sunday 12 February 2012 05:32:20 AM IST
The command can also be used with suitable format specifiers as arguments. The format string is
preceded by a + symbol, and each specifier in it consists of the % character followed by a single
character describing the format. There must be no space between the + symbol and the format.
Syntax date [arguments]

The arguments are used for displaying the date in the desired format. The list of available
arguments is given in Table 3.8.
Table 3.8 Brief description of the arguments used in the date command
Arguments Description
%d For displaying day (01–31)
%m For displaying month (01–12)
%b For displaying abbreviated month name (Jan, Feb, etc.)
%y For displaying the year—last two digits (00,…, 99)
%Y For displaying the year with century—four digits
%H For displaying hours—military format (00,01,…, 23)
%I For displaying hours in 12-hour format (01,02,…, 12)
%p For displaying a.m./p.m.
%M For displaying minutes (00,01,…, 59)
%S For displaying seconds (00,01,…, 59)
%x For displaying only date (07/15/12)
%X For displaying only time (17:15:30)
%a For displaying abbreviated weekday (Fri)

Examples
(a) $ date +%m
It prints only the month, that is, 07.
(b) $ date +%b
It prints the month name, that is, Jul.
(c) $ date +%Y
It prints the year with century, that is, 2012.
(d) $ date +"%I %p"
It displays the hour with a.m./p.m.
05 PM
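Several specifiers can be combined in one format string, which is handy for building dated filenames in scripts. This is a small sketch using only specifiers from Table 3.8; the filename prefix notes- is hypothetical.

```shell
# Build a sortable stamp (year-month-day) from three specifiers.
today=$(date +%Y-%m-%d)

# A format containing spaces must be quoted, with no space after the + sign.
clock=$(date +"%I:%M %p")

backup_name="notes-$today.txt"      # hypothetical filename for illustration
echo "$backup_name"
echo "$clock"
```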

➢ cal: Displaying calendar


The cal command is used to display the calendar of a specified month and year.

Syntax cal [month] [year]

Here, month takes the values 1–12, and year takes the values 1–9999.
Examples
(a) To display the current month’s calendar, just use the cal command without any arguments
(refer to Fig. 3.4).
$ cal
(b) To display the calendar of March 2012, write the following command (refer to Fig. 3.4).
$ cal 3 2012
(c) To display the calendar for a whole year, specify the year in the cal command as shown
in Fig. 3.5.
$ cal 2012

➢ echo: Displaying messages and results


The echo command is used to display messages and the results of computation on the screen.
Syntax echo [-n] message/variables

The option is briefly explained in Table 3.9.


Table 3.9 Brief description of an option in the echo command
Option Description
-n It suppresses a new line after the echoed message or variables. The output of the next echo
statement will appear on the current line. This option is usually used while scripting.

The echo command recognizes the following Escape characters:


\\ represents backslash.
\a rings a bell.
\b represents the backspace key.
\f represents the form feed.
\n represents a new line character.
\r represents carriage return.
\t represents a horizontal tab character.
\v represents a vertical tab character.
Example echo "Hello World"

This example displays the message, Hello World on the screen.


Figure 3.6 shows how escape sequences can be used with the echo command. We can see that
\n results in a new line character, so that the following word, World, appears on the
next line. Similarly, \t results in a horizontal tab between the words Hello and World. The third
example displays a backslash (\) between Hello and World. The fourth example shows how \b
takes the cursor one character back, hence overwriting the character ‘o’ of Hello and displaying
the word HellWorld. The fifth example inserts a vertical tab between the words Hello and World.
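Whether echo interprets these escape sequences varies between shells; in bash, for instance, they are interpreted only when echo is given the -e option. Where a script must behave the same everywhere, the printf command is a common alternative. The following is a minimal sketch using printf, which is not covered in the text above.

```shell
# printf interprets backslash escapes in its format string in every POSIX shell.
printf 'Hello\nWorld\n'
printf 'Hello\tWorld\n'
printf 'Hello\\World\n'

# Capture some results to inspect them.
tabbed=$(printf 'Hello\tWorld')
slashed=$(printf 'Hello\\World')
line_total=$(( $(printf 'Hello\nWorld\n' | wc -l) ))
echo "lines printed by the first command: $line_total"
```

Unlike echo, printf does not add a new line on its own, so \n is written explicitly at the end of each format string.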

# cal
     April 2012
 S  M Tu  W Th  F  S
 1  2  3  4  5  6  7
 8  9 10 11 12 13 14
15 16 17 18 19 20 21
22 23 24 25 26 27 28
29 30

# cal 3 2012
     March 2012
 S  M Tu  W Th  F  S
             1  2  3
 4  5  6  7  8  9 10
11 12 13 14 15 16 17
18 19 20 21 22 23 24
25 26 27 28 29 30 31

Fig. 3.4 Calendar of current month and specified month

        Jan                    Feb                    Mar
 S  M Tu  W Th  F  S    S  M Tu  W Th  F  S    S  M Tu  W Th  F  S
 1  2  3  4  5  6  7             1  2  3  4                1  2  3
 8  9 10 11 12 13 14    5  6  7  8  9 10 11    4  5  6  7  8  9 10
15 16 17 18 19 20 21   12 13 14 15 16 17 18   11 12 13 14 15 16 17
22 23 24 25 26 27 28   19 20 21 22 23 24 25   18 19 20 21 22 23 24
29 30 31               26 27 28 29            25 26 27 28 29 30 31

        Apr                    May                    Jun
 S  M Tu  W Th  F  S    S  M Tu  W Th  F  S    S  M Tu  W Th  F  S
 1  2  3  4  5  6  7          1  2  3  4  5                   1  2
 8  9 10 11 12 13 14    6  7  8  9 10 11 12    3  4  5  6  7  8  9
15 16 17 18 19 20 21   13 14 15 16 17 18 19   10 11 12 13 14 15 16
22 23 24 25 26 27 28   20 21 22 23 24 25 26   17 18 19 20 21 22 23
29 30                  27 28 29 30 31         24 25 26 27 28 29 30

        Jul                    Aug                    Sep
 S  M Tu  W Th  F  S    S  M Tu  W Th  F  S    S  M Tu  W Th  F  S
 1  2  3  4  5  6  7             1  2  3  4                      1
 8  9 10 11 12 13 14    5  6  7  8  9 10 11    2  3  4  5  6  7  8
15 16 17 18 19 20 21   12 13 14 15 16 17 18    9 10 11 12 13 14 15
22 23 24 25 26 27 28   19 20 21 22 23 24 25   16 17 18 19 20 21 22
29 30 31               26 27 28 29 30 31      23 24 25 26 27 28 29
                                              30

        Oct                    Nov                    Dec
 S  M Tu  W Th  F  S    S  M Tu  W Th  F  S    S  M Tu  W Th  F  S
    1  2  3  4  5  6                1  2  3                      1
 7  8  9 10 11 12 13    4  5  6  7  8  9 10    2  3  4  5  6  7  8
14 15 16 17 18 19 20   11 12 13 14 15 16 17    9 10 11 12 13 14 15
21 22 23 24 25 26 27   18 19 20 21 22 23 24   16 17 18 19 20 21 22
28 29 30 31            25 26 27 28 29 30      23 24 25 26 27 28 29
                                              30 31

Fig. 3.5 Calendar of the entire year

$ echo "Hello\nWorld"
Hello
World
$ echo "Hello\tWorld"
Hello   World
$ echo "Hello\\World"
Hello\World
$ echo "Hello\bWorld"
HellWorld
$ echo "Hello\vWorld"
Hello
     World

Fig. 3.6 Echo command output

➢ bc: Basic calculator


The bc command activates a basic calculator that is meant for doing simple calculations. The
command is executed in the interactive mode, that is, we enter the expression we wish to compute
on the command line, and the command immediately displays the result on pressing the Enter
key. To quit the interactive mode, we either press Ctrl-d or type quit followed by the Enter key.
Syntax bc [-l]

-l defines the math functions and initializes the scale to 20, instead of the default zero.
The functions that can be used with the bc command are given in Table 3.10.
Table 3.10 List of functions available with the bc command
Function Description
sqrt() It calculates the square root of the supplied number.
s() It calculates the sine value. The argument should be in radians.
c() It calculates the cosine value. The argument should be in radians.
a() It calculates the arctangent. The result of the function is displayed in radians.
l() It calculates the natural logarithm of the supplied number.
e() It calculates the exponential of the supplied number.

We can use all operators including +, -, *, /, %, ^, where % represents the mod operator, that
is, it returns the remainder and ^ represents ‘to the power’.
Apart from the -l option, we can also use the scale to specify the number of digits to the
right of the decimal point.
Examples
$ bc
5/3
1
quit

$ bc -l
5/3
1.66666666666666666666
quit
$ bc
2 + 2
4
5/3
1
scale = 2
5/3
1.66
3^2
9
sqrt(81)
9.00
quit
$ x=`echo "5/3" | bc -l`
$ echo $x
1.66666666666666666666
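The same piping technique lets a script use bc without entering the interactive mode. The following is a minimal sketch that feeds the expressions from the session above to bc through a pipe; it uses the $( ) form of command substitution, which is equivalent to the backquotes shown earlier, and assumes that bc is installed.

```shell
# scale sets the number of digits kept after the decimal point.
ratio=$(echo "scale=2; 5/3" | bc)
power=$(echo "3^2" | bc)
root=$(echo "scale=2; sqrt(81)" | bc)

echo "5/3 = $ratio"
echo "3^2 = $power"
echo "sqrt(81) = $root"
```

Several statements can be sent in one line by separating them with semicolons, as done here for scale and the expression.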

Filename substitution: Globbing Filename substitution is the process by which the
shell expands a string containing wild cards into a list of filenames. The process of
filename substitution is also known as globbing. Apart from the wild cards *, ?, and
[c1-c2], which we discussed while learning the ls command, Table 3.11 shows the
wild cards that are used in filename substitution.

Table 3.11 Brief description of the wild cards used in filename substitution
Wild card Description
! Used with [ ] to negate the meaning
~ Substitutes the user's home directory
{characters} Matches the given set of characters
Examples

(a) $ ls *
It displays all the names of the files and directories in the current directory.
(b) $ ls a*
It displays all the names of the files and directories that begin with the character a.
(c) $ ls *a
It displays all the names of the files and directories that end with the character a.
(d) $ ls *ab*
It displays all the names of the files and directories that contain ab.
(e) $ ls a*/*
It displays the names of all the files and directories contained in those directories, one level
under the current directory, whose names begin with the character a.
The filename substitution applies to the files in the current directory. To match filenames
in the subdirectories, we need to use the / character.
(f) $ ls a*/*/*
It displays the names of all the files and directories located two levels under the current
directory, inside the subdirectories of the directories whose names begin with the character a.
(g) $ ls ???
It displays all the names of the files and directories that consist of three characters.
(h) $ ls ???*
It displays all the names of the files and directories that consist of at least three characters.
(i) $ ls student?.txt
It displays all the names of the files and directories that begin with the word student
followed by one character followed by the extension .txt, such as student1.txt,
student2.txt, and studenta.txt.
(j) $ ls [ab]*
It displays all the names of the files and directories that begin with either character a
or character b followed by zero or more occurrences of any character.
(k) $ ls [ab]*[12]
It displays all the names of the files and directories that begin with either character a or
character b followed by zero or more occurrences of any character and which end with
either the digit 1 or 2.

(l) $ ls [ab]*[1-5]
It displays all the names of the files and directories that begin with either character a or
character b followed by zero or more occurrences of any character and which end with
any digit from 1 to 5.
(m) $ ls [a-d]*
It displays all the names of the files and directories that begin with any character from a
through d followed by zero or more occurrences of any character.
(n) $ ls [a-d]??
It displays all the names of the files and directories that begin with any character from a
through d followed by exactly two characters.
(o) $ ls [!a-d]*
It displays all the names of the files and directories that begin with any character except
a through d followed by any number of characters.
(p) $ ls [A-Za-z]*
It displays all the names of the files and directories that begin with any character from a
through z in either upper case or lower case followed by any number of characters.
(q) $ ls [A-Za-z][a-z]*
It displays all the names of the files and directories that begin with any character from a
through z in either upper case or lower case, followed by any character from a through z
in lower case, followed by any number of characters.
(r) $ ls [A-Za-z][a-z][12]
It displays all the names of the files and directories that begin with any character from a
through z in either upper case or lower case, followed by any character from a through
z in lower case, followed by either digit 1 or 2.
(s) $ ls {aa,bb,cc}*
It displays all the names of the files and directories that begin with the characters aa, bb,
or cc followed by any number of characters.
(t) $ ls a*{d,1,z}
It displays all the names of the files and directories that begin with the character a
followed by any number of characters and, which end with d, 1, or z.
(u) $ ls a*{d,[1-3],[ab]}
It displays all the names of the files and directories that begin with the character a
followed by any number of characters and which end with d, a number from 1 through
3, or by either character a or b.
(v) The tilde (~) character by itself expands to the full path name of the user’s home directory.
The following echo command confirms this:
$ echo ~
/home/bintu

(w) When the tilde is placed before a path, it expands to the home directory followed by the
rest of the path name. Consider the following command.
$ cd ~/data

(x) We will be taken into the directory data that is present within the user’s home directory.
The following pwd command confirms this.
$ pwd
/home/bintu/data
(y) When the tilde is placed before a username, it expands to the full path name of that
user’s home directory. Consider the following command.
$ cd ~john
We will be taken to the user john’s home directory. The following command confirms this:
$ pwd
/home/john
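
The wild-card patterns listed above can be tried safely in a scratch directory. The following is a minimal sketch, not a definitive recipe: the file names and the variable names are invented for the illustration, and echo is used instead of ls so that we see exactly what the shell expanded each pattern to.

```shell
# Work in a throwaway directory so that no real files are touched.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Sample names, chosen only for this illustration.
touch student1.txt student2.txt studenta.txt student10.txt a1 b5 bz zebra

# echo prints exactly what the shell expanded each pattern to.
one_char=$(echo student?.txt)   # ? stands for exactly one character
ab_digit=$(echo [ab]*[1-5])     # begins with a or b, ends with a digit from 1 to 5
not_abs=$(echo [!abs]*)         # does not begin with a, b, or s

echo "$one_char"
echo "$ab_digit"
echo "$not_abs"

# A leading tilde is replaced by the home directory before echo runs.
echo ~/data

# Leave the scratch directory and remove it.
cd / && rm -rf "$demo_dir"
```

Here student10.txt is left out of the first result because ? matches only a single character, and bz is left out of the second because it does not end with a digit from 1 to 5.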

➢ exit: Exiting
The exit command is used to log out of the Unix system, exit from a shell, and exit from a
shell script.
Syntax exit

Example exit
Ctrl-d is a shortcut for logging out of the Unix shell. Before exiting from the Unix
system, we should make sure that all open files are saved and closed; otherwise they
might get corrupted. Usually, when we exit from the shell, any process or command still
running in it is killed automatically. To keep a task running in the background even after
exiting from the shell, we should use the nohup command (discussed in Chapter 6).

■ SUMMARY ■

1. When compared with the who command, the finger command displays more elaborate
   information pertaining to users who are logged in.
2. The finger command not only displays the login name, terminal, date, and time of the
   logged-in users, but also displays other information such as the user’s home directory,
   phone number, login shell, mail status, and much more.
3. Filename substitution or globbing is the process by which the shell expands a string
   containing wild cards into a list of filenames.

■ FUNCTION SPECIFICATION ■

Command   Function

ls        To see a list of files and directories, including hidden files
mkdir     To create directories. We can also create directories with specific permissions
          with this command.
cd        For changing a directory. We can use both relative and absolute paths for
          changing the directory.
pwd       To know the current working directory
touch     To create empty files and also change the modification and access time of a file
mv        For moving files from one directory to another as well as for renaming files
passwd    For changing the password
ln        For creating links of files. There are two types of links: hard links and
          symbolic links.
who       To know how many users are currently online
finger    To know who is online, when the user is logged in, and for how long his/her
          terminal has been idle
date      For displaying system date and time
cat       For displaying contents of files, creating files, and concatenating files
rmdir     For removing a directory provided it is empty
cp        For copying files as well as an entire directory
rm        For deleting files as well as an entire directory
cal       To display the calendar of the specified month and year. By default, it displays
          the calendar of the current month.
cal -y    To display the calendar of the current year
uname     For displaying information of the current system such as its hardware platform,
          name of the operating system, and its release level
unlink    To remove the specified file, including symbolic links
bc        To activate a basic calculator that is meant for doing simple calculations. It can
          also be used to compute square root, sine value, cosine value, natural
          logarithms, and exponential values.
exit      To log out from the Unix system, a shell, or a shell script

■ EXERCISES ■

Objective-type Questions
State True or False
3.1 The ls command shows the list of files and directories that are sorted alphabetically
    by default.
3.2 The option used with the ls command to see the names of the files and directories in
    reverse order is -R.
3.3 We can create only one directory at a time using the mkdir command.
3.4 The cd command, if given without any arguments, will take us to our home directory.
3.5 With the touch command, we can only change the timestamps of the files but cannot
    create files.
3.6 While creating a file with the cat command, we need to use Ctrl-d to specify the end
    of the file.
3.7 With the rmdir command, we can remove the non-empty directory as well.
3.8 If we use the -i option with the cp command, it will prompt us before overwriting the
    destination file if it already exists.
3.9 The cp command is used for making a copy of the files; we cannot use it for copying
    an entire directory with its files and subdirectories.
3.10 We can delete more than one file with a single rm command.
3.11 With the rm command, we can forcibly delete a file even if we do not have its write
     permission.
3.12 With the mv command, we can move a file from one directory to another but cannot
     rename it.
3.13 The hard link should be created within the current directory structure.
3.14 We can log out of the Unix system using Ctrl-d.
3.15 Through the cal command, we cannot see the calendar of the previous month.
3.16 The mail status of the user can be seen through the finger command.
3.17 The uname command can be used to know the version and release of the operating
     system.
3.18 The wild-card character ‘?’ represents a single character.
3.19 The bc or the basic calculator command can be used to find the square root of a
     number.
3.20 The unlink command cannot delete symbolic links.

Fill in the Blanks


3.1 The option used with the ls command to see hidden files is ______.
3.2 The option used with mkdir for creating directories with specified permission
    is ______.
3.3 With the cd command, we can give absolute as well as ______ path names.
3.4 The command used to know our current working directory is ______.
3.5 The format of time expression used to change modification or access time in the
    touch command is ______.
3.6 The option used with the rmdir command to delete an empty parent directory
    is ______.
3.7 The ______ option is used with the rm command to recursively delete all the files and
    subdirectories of the specified directory.
3.8 The command used to create a link for a file is known as ______.
3.9 There are two types of links to a file: ______ and ______.
3.10 The option used with the date command to display only the time is ______.
3.11 The function used to find the natural logarithm in the bc command is ______.
3.12 The command used to display the calendar of the current year is ______.
3.13 The command used to display information of the logged-in user, including home
     directory, login shell, mail status, and phone number is ______.
3.14 The option used with the ls command to display the inode number of files is ______.
3.15 The option used with the cat command that displays non-printing characters in the
     file is ______.

Multiple-choice Questions

3.1 The command bc -l sets the scale to
    (a) 20    (b) 5    (c) 10    (d) 6
3.2 The tput cup 7 5 command moves the cursor to the
    (a) seventh row and fifth column
    (b) fifth row and seventh column
    (c) top left corner of the screen
    (d) right bottom corner of the screen
3.3 The command date +%M will display
    (a) month in character form
    (b) month in numerical form
    (c) minutes
    (d) a.m./p.m.
3.4 The option used in the cp command for interactive copying is
    (a) -i    (b) -r    (c) -c    (d) -d
3.5 The following option is used in the cat command to suppress messages when a
    non-existent file is used in the command:
    (a) -o    (b) -v    (c) -n    (d) -s
3.6 Apart from displaying contents of the files, the command used for concatenating
    files is
    (a) concat    (b) cat    (c) merge    (d) add_files
3.7 There are two types of links of files: hard and
    (a) tough    (b) robust    (c) volatile    (d) symbolic
3.8 The command echo ~ will display
    (a) error
    (b) list of files and directories
    (c) home directory of the user
    (d) profile file
3.9 The following command is used to display the names of the files and directories that
    consist of at least two characters:
    (a) ls ??*    (b) ls    (c) ls *    (d) ls ?*
3.10 The option used in the ls command to show files and directories that are sorted on
     their modification time is
     (a) -m    (b) -a    (c) -t    (d) -u

Programming Exercises
3.1 What will the following commands do?
    (a) $ ls [a-d]??
    (b) $ ls [a-z][0-9]*
    (c) $ ls -Rt
    (d) $ mkdir -m 740 apple
    (e) $ mkdir -p fruits/delicious/apple
    (f) $ touch 07151000 mbacourse.txt
    (g) $ cat mbacourse.txt lawcourse.txt
    (h) $ rmdir -p fruits/delicious/apple
    (i) $ cp /fruits/delicious/apple/juice.txt /college/students
    (j) $ rm -r college
    (k) $ mv mbacourse.txt management.txt
    (l) $ ln -f juice.txt energy.txt
    (m) $ finger Charles
    (n) $ bc
        scale = 2
        17/3
    (o) $ cal 10 2012
3.2 Write the command for the following tasks:
    (a) To display the list of files and directories that begin with a vowel
    (b) To change the access time of the file mbacourse.txt to Feb 10 09:15
    (c) To show the contents of the file mbacourse.txt along with line numberings
    (d) To concatenate the contents of the two files mbacourse.txt and lawcourse.txt and
        store them in a third file career.txt
    (e) To remove the empty subdirectories, students and teachers, from the college
        directory
    (f) To copy the entire directory teachers along with its subdirectories in the name
        faculty
    (g) To forcibly remove the file mbacourse.txt from the college directory
    (h) To move the file mbacourse.txt from the current directory to the professional
        subdirectory of the college directory
    (i) To change the password
    (j) To create a link of the file mbacourse.txt in the name management.txt (If a file
        by the name management.txt already exists, we should be asked for a confirmation
        before overwriting its contents.)
    (k) To get the list of all online users with their activity and column headers
    (l) To display day, month, and year in the format 17 Nov 2012
    (m) To log out from the Unix system
    (n) To show all the names of the files and directories that begin with any character
        from a through z followed by exactly three characters
    (o) To find the square root of number 17 (The result should be displayed up to five
        places of decimals.)

Review Questions
3.1 Explain the following commands with their syntax and examples.
    (a) ls
    (b) who
    (c) touch
    (d) rmdir
    (e) cp
3.2 Explain the differences between the following:
    (a) Hard and symbolic links
    (b) who and finger commands
    (c) cat and touch commands
    (d) rm and rmdir commands
3.3 What is the use of the bc command? Explain a few functions that are associated
    with it.
3.4 What do you mean by escape characters? Explain their usage through the echo
    command.
3.5 Explain the term globbing with examples.
3.6 What is the use of the date command? Name the options that are used with the date
    command to display only the year, hour in military format, and only the day.
3.7 Explain the command used to exploit terminal capabilities.
3.8 Explain with examples the command that is used to display the calendar of the
    desired month and year.

Brain Teasers
3.1 In the long-listing command ls -li, if you find two or more files having the same
    inode number, what does it mean?
3.2 Identify the error in the following command and correct it to display all the files
    that consist of exactly four characters.
    $ ls ****
3.3 Identify the error in the following command and correct it to display the hardware
    platform of the current machine.
    $ uname -v
3.4 Can you display the node name, that is, the name by which your machine is connected
    in the communication network? If yes, mention the command.
3.5 Consider the following cat command:
    $ cat chirag notes.txt
    It displays an error indicating that the file notes.txt does not exist. How can you
    avoid this error message?
3.6 If on using s() function in the bc command for finding sine value, a wrong answer
    was obtained, identify the error.
3.7 You want to change your password but the following command is not working. Where is
    the error?
    $ password
3.8 Is there any way to copy the content of the files a.txt and b.txt to a file c.txt
    without deleting the earlier content of file c.txt? If yes, what is that?
3.9 What should the command given to display the hardware platform and name of the
    operating system on a machine be?
3.10 You wish that a confirmation prompt appears before deleting the files. However, by
     using the following command, the confirmation message is not prompted. Where and
     what is the error?
     $ rm -f a*.*
3.11 The following command creates a hard link of the file a.txt in the name b.txt. What
     changes are required to be made to this command in order to create a symbolic link
     instead of a hard link?
     $ ln a.txt b.txt
3.12 The following bc command displays the result to 0 places of decimal. What change is
     required to be made in order to get the result up to 20 decimal places?
     $ bc
     17/3
3.13 What is the mistake in the following command for changing the modification time of
     the file a.txt to Oct 15 04:15?
     $ touch -a 10150415 a.txt
3.14 The following command to recursively copy the content of the directory projects to
     experiments is not working. Identify the error and correct it.
     $ cp projects experiments
3.15 The following date command is not displaying the year with its century in four
     digits. Identify the error and correct it.
     $ date +%y

■ ANSWERS TO OBJECTIVE-TYPE QUESTIONS ■


State True or False
3.1 True     3.2 False    3.3 False    3.4 True     3.5 False
3.6 True     3.7 False    3.8 True     3.9 False    3.10 True
3.11 True    3.12 False   3.13 True    3.14 True    3.15 False
3.16 True    3.17 True    3.18 True    3.19 True    3.20 False

Fill in the Blanks
3.1 -a       3.2 -m       3.3 relative 3.4 pwd      3.5 MMDDhhmm
3.6 -p       3.7 -r       3.8 ln       3.9 hard, symbolic
3.10 %x      3.11 l()     3.12 cal -y  3.13 finger  3.14 -i
3.15 -v

Multiple-choice Questions
3.1 (a)      3.2 (b)      3.3 (c)      3.4 (a)      3.5 (d)
3.6 (b)      3.7 (d)      3.8 (c)      3.9 (a)      3.10 (c)
CHAPTER 4

Advanced Unix Commands
After studying this chapter, the reader will be conversant with the following:
• Advanced commands used in the Unix operating system such as setting access permissions for the
existing files and directories, setting default permissions for the newly created files and directories,
creating groups, changing ownerships of the files, and sharing files among groups
• Sorting content and performing input/output (I/O) redirections, that is, diverting the output of a command
to a file or providing input to a command from a file
• Cutting or slicing the file vertically, pasting content, splitting files, counting characters, words, and lines
in files or other content, and using a pipe operator, that is, sending the output of a command as input to
another command
• Displaying the top and bottom contents of a file, presenting content page-wise, and displaying the
manual of any command
• Comparing files, eliminating and displaying duplicate lines in two files, and displaying and suppressing
the unique and common content in two files
• Printing documents, setting reminders of appointments, carrying out conversions between DOS and
Unix files, and measuring time usage in the execution of commands

4.1 OVERVIEW
The advanced Unix commands help us perform several tasks such as setting access permissions
for the existing files and directories, setting default permissions for the newly created files and
directories, changing ownership of the files, and sharing files among groups. These commands
also include sorting file content, performing input/output (I/O) redirections, and piping the
output of a command as input to another command. Unix also offers commands for operations
such as cutting or slicing the file vertically, pasting content, splitting files, counting characters,
words, and lines in files, extracting the top and bottom contents of files, presenting content
page-wise, and displaying the manual of any command. These commands also include comparing files,
eliminating and displaying duplicate lines in two files, suppressing the unique and common content
in two files, printing documents, setting reminders of appointments, carrying out conversions
between DOS and Unix files, and measuring the time usage in the execution of commands.

The list of advanced commands that will be covered in this chapter is as follows:
chmod, umask, chown, chgrp, groups, input/output redirection in Unix, pipe operator, cut,
paste, split, wc, sort, head, tail, diff, cmp, uniq, comm, time, pg, lp, .profile, calendar,
script, dos2unix, and man.

4.2 FILE ACCESS PERMISSIONS


The data in the Unix system is contained in files. We may restrict or permit access to this
data by restricting or permitting access to the files containing the data. There are three types
of Unix files: ordinary, directory, and special files.
We may use file permissions to avoid any accidental modifications. We can retain the
ability to read the file while restricting the ability to write in them. Similarly, we can also
restrict other users in a multi-user environment from reading our files.
There are three classes of system users. The first is the user. The user is usually the system
user who created the file. The user has full control over restricting or permitting access to the
file at any time. In addition to individual file ownership, it is possible for one or more system
users to own the file collectively in a kind of group ownership. A system user who is not the
file owner may access the file if this user belongs to the group of system users who are allowed
to access the file. The last category of system users is the one who is neither the owner nor part
of the group and is known as the other user. Hence, there are three classes of system users:
1. User refers to the system user who created the file and is also sometimes called owner.
2. Group refers to one or more users who may access the file as a group.
3. Other refers to any other users of the system.
Each file carries three kinds of permission. System users with read permission may read
the contents of an ordinary file, users with write permission may change its contents, and
users with execute permission may run it as a command. Deleting a file with the rm command
depends on write permission for the directory that contains it; if the file itself is
write-protected, rm asks for confirmation before removing it (Table 4.1).
Table 4.1 Access modes and permissions

Access mode Ordinary file Directory file


Read Allows examination of file contents Allows listing of files within the directory
Write Allows changing of contents of the file Allows creation of new files and removal of
old ones
Execute Allows execution of the file as a command Allows searching of the directory

We can view the permissions of a file or directory through the long listing command. The
following example shows the long listing of file mce1.

Example $ ls -l mce1
This statement requests the long directory listing for the ordinary file called mce1. We might
get the output shown in Fig. 4.1.

-rw-rw-rw- 1 chirag it 120 Mar 15 12:20 mce1
(Fields, from left to right: file type and permissions, links, owner, group, size, date and
time of last modification, filename)
Fig. 4.1 Output of the long listing command for the file mce1

The dash (-) in the file type field indicates that it is an ordinary file. The access permissions
field tells us what kinds of access permissions are granted. The number, 1, indicates that
there is only one link for this file from the directory, which means that this file only has one
name associated with it. The word chirag is the owner’s name; the word it is the name of the
group that has access to this file; 120 refers to the file size; Mar 15 12:20 is the date and
time the file was last modified; and mce1 is the filename.
We have seen that long listing shows the permissions for all the three classes of system
users (User, Group, and Other) besides other information such as the name of the file (or
directory), its size, and the date and time of last modification. Assume that the permissions
for the file mce1 are as follows:
r w x r - x r - - 1 chirag it 120 Mar 15 12:20 mce1
The first three characters, r, w, and x, are the permissions for the User. These are followed
by the three characters for the Group members. The last three characters represent the
permissions for the Other users. The aforementioned output indicates that the User has all
the three permissions, r w x (read, write, and execute), for the file mce1. The permissions
r - x indicate that the Group members have read and execute permissions for the file mce1.
A missing permission is represented by a hyphen (-). The Other users have only r, that is,
read permission for the file mce1.
Suppose the permissions for the file mce1 are as follows:
r - x - - x - - - 1 chirag it 120 Mar 15 12:20 mce1

The permissions indicate that the User has r - x, that is, read and execute permissions for
the file mce1. The Group members have - - x, that is, only execute permission for the file,
and the Other members have no permission (- - -), that is, the Other members cannot read,
write, or execute the file mce1.
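
The way a permission string is read, three characters at a time, can be sketched as a small shell function. This is only an illustration: describe_perms is a hypothetical helper written for this example, not a Unix command, and it expects the nine permission characters without spaces and without the leading file type character.

```shell
# describe_perms: print, for User, Group, and Other, which permissions a
# nine-character string such as rwxr-xr-- grants (hypothetical helper).
describe_perms() {
    perms=$1
    start=1
    for class in User Group Other; do
        # Cut out the three characters that belong to this class.
        triplet=$(printf '%s' "$perms" | cut -c"$start"-$((start + 2)))
        granted=""
        case $triplet in r??) granted="$granted read" ;; esac
        case $triplet in ?w?) granted="$granted write" ;; esac
        case $triplet in ??x) granted="$granted execute" ;; esac
        # An empty list means that every position held a hyphen.
        printf '%s:%s\n' "$class" "${granted:- none}"
        start=$((start + 3))
    done
}

describe_perms rwxr-xr--
describe_perms r-x--x---
```

For the first string the function reports read, write, and execute for User, read and execute for Group, and read for Other; for the second it reports none for Other, matching the two readings given above.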
Let us take a look at how we can assign and remove permissions from a file or directory.

4.2.1 chmod: Changing File Access Permissions


chmod stands for change mode and the command is used for changing the access permissions of
files and directories. Only the owner or the super user can change the access permissions of
a file.
Syntax chmod [option] mode files

Here, option refers to the elements given in Table 4.2.

Table 4.2 Brief description of options used with the chmod command

Option   Description
u        Represents User or the owner of the file
g        Represents Group
o        Represents Other
a        Represents all (User, Group, and Other). It is the default option
+        Adds access permission
-        Removes access permission
=        Assigns permission to u, g, o, or a

The keyword mode refers to the three access permissions given in Table 4.3.

Table 4.3 Brief description of modes used with the chmod command

Mode     Description
r or 4   Represents read permission
w or 2   Represents write permission
x or 1   Represents execute permission

Examples
(a) $ chmod 751 a.txt
This command assigns permission 7 to user (i.e., owner), 5 to group, and 1 to other.
Permission 7 means 4(r) + 2(w) + 1(x), that is, the user has all the three permissions,
read, write, and execute for the file a.txt. Similarly, the group members have 4(r) + 1(x),
that is, read and execute, but no write permission for the file a.txt. However, the other
users can only execute the file. Refer to Fig. 4.2 to view the output of the command.
(b) $ chmod 760 a.txt
This command assigns permission 7, 4(r) + 2(w) + 1(x), that is, read, write, and
execute permissions for the file a.txt to the user (or owner) of the file. Permission
6, 4(r) + 2(w), that is, read and write permission is assigned to the group members of
the file, and 0 or no permission to other users. The other users cannot read, write, or
execute the file a.txt. Refer to Fig. 4.2 to view the output of the command.
(c) $chmod o+r a.txt
This command adds the read permission to the other members for the file a.txt. Other
existing permissions are left undisturbed. Refer to Fig. 4.2 to view the output of the
command.
(d) $chmod u-x,g-w+x,o+wx a.txt
It removes the execute permission of the user (i.e., owner), removes the write permis-
sion of the group members, adds execute permission to the group members, and adds
write and execute permissions to the other users. The existing permissions are left
undisturbed. Refer to Fig. 4.2 to view the output of the command.
Note: There should not be any space after the comma (,) or while specifying permissions of the user, group,
and others in the command.

(e) $ chmod u=rwx,g=rx,o=x a.txt


This command assigns permission, rwx, that is, read, write, and execute permission
to u, that is, the user or owner of the file. It also assigns, rx, that is, read and execute
permission to the group members of the file and x, that is, execute permission to the
other users. Previous permissions assigned to user, group, and other members will be
removed. Refer to Fig. 4.2 to view the output of the command.
(f) $ chmod u=w a.txt
This command assigns write permission to the user, that is, owner of the file a.txt.
Previous permissions assigned to the user will be removed. Existing permissions to
the group and other users will be undisturbed. Refer to Fig. 4.2 to view the output
of the command.
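
The numeric and symbolic forms described above can be tried on a scratch file. The following is a minimal sketch under a few assumptions: the file is created with mktemp rather than being a real a.txt, and show_mode is a helper defined only for this illustration.

```shell
# Scratch file standing in for a.txt; mktemp picks the actual name.
work_file=$(mktemp)

# show_mode (a helper defined only for this sketch) prints the file type
# character plus the nine permission characters from the long listing.
show_mode() { ls -l "$1" | cut -c1-10; }

chmod 751 "$work_file"               # numeric form: rwx for user, r-x for group, --x for other
mode_numeric=$(show_mode "$work_file")

chmod o+r "$work_file"               # add read for other, leave the rest alone
mode_added=$(show_mode "$work_file")

chmod u=rwx,g=rx,o=x "$work_file"    # assign all three classes explicitly
mode_assigned=$(show_mode "$work_file")

chmod u=w "$work_file"               # reset only the user's permissions
mode_user_reset=$(show_mode "$work_file")

printf '%s\n' "$mode_numeric" "$mode_added" "$mode_assigned" "$mode_user_reset"
rm -f "$work_file"
```

Note that 751 and u=rwx,g=rx,o=x produce the same listing, which shows that the numeric and symbolic forms are two ways of writing the same permissions.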

$ chmod 751 a.txt
$ ls -al a.txt
-rwxr-x--x 1 bintu None 21 Dec 30 19:10 a.txt
$ chmod 760 a.txt
$ ls -al a.txt
-rwxrw---- 1 bintu None 21 Dec 30 19:10 a.txt
$ chmod o+r a.txt
$ ls -al a.txt
-rwxrw-r-- 1 bintu None 21 Dec 30 19:10 a.txt
$ chmod u-x,g-w+x,o+wx a.txt
$ ls -al a.txt
-rw-r-xrwx 1 bintu None 21 Dec 30 19:10 a.txt
$ chmod u=rwx,g=rx,o=x a.txt
$ ls -al a.txt
-rwxr-x--x 1 bintu None 21 Dec 30 19:10 a.txt
$ chmod u=w a.txt
$ ls -al a.txt
--w-r-x--x 1 bintu None 21 Dec 30 19:10 a.txt
Fig. 4.2 Output of application of the chmod command on file a.txt

4.2.2 umask: Setting Default Permissions

The umask command sets the default permissions for the files that will be created in the
future.
Syntax umask ugo

Here, u, g, and o refer to the permissions that we do not want the user, group, and others
to have for the new files. Yes, you have read correctly: umask is not for assigning but for
removing permissions of the three categories of system users (user or owner, group, and
other) for the future files.
To understand this better, let us first create an empty file called chirag using the touch
command and then try to list it.
Example
$ touch chirag
$ ls -l chirag
-rw-r--r-- 1 mce it 0 Oct 27 10:12 chirag

The permission of this file is 644. Whenever we create a file, Unix uses a value called
umask to decide the default permissions. umask stands for user file creation mask, and is
used for defining the permissions to mask or hide, that is, the permissions that we want to
deny. The current value of umask can be easily determined by typing umask followed by the
Enter key.
$ umask
0022

The first 0 indicates that what follows is an octal number. The three digits that follow the
first zero refer to the permissions to be denied to the owner, group, and others. This means
that for the owner no permission is denied, whereas for both the group and others, write
permission (2) is denied.
Whenever a file is created, Unix assumes that the permissions for this file should be 666.
However, since our umask value is 022, Unix removes the masked permissions from the
system-wide default (666), resulting in a value 644. This value is then used as the
permissions for the file that we create.
This is the reason why the permissions turned out to be 644 or rw-r--r-- for the file
chirag that we created.
Similarly, the system-wide default permissions for a directory are 777. This implies that
when we create a directory with a umask of 022, its permissions would be 755.

Note: If a directory does not have an execute permission, we will not be able to enter it or
reach the files inside it.

To change umask value:

$ umask 342

The mask removes permissions digit by digit; it is not an arithmetic subtraction, because a
permission that is masked but was never in the default has no effect. For a new file the
default is 666: the owner digit 6 (rw-) with 3 (-wx) masked leaves 4 (r--), the group digit 6
(rw-) with 4 (r--) masked leaves 2 (-w-), and the other digit 6 (rw-) with 2 (-w-) masked
leaves 4 (r--). From this point onwards, any new file that we create would therefore have the
permissions 424, and any directory that we create (default 777) would have the permissions
435.
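
The effect of a mask can be checked with shell arithmetic. The sketch below assumes that the default is 666 for files and 777 for directories, as stated above; apply_umask is a hypothetical helper written for this illustration. It clears the masked permission bits instead of subtracting, so a mask of 342 applied to 666 gives 424. The last lines create a real file in a scratch directory under a umask of 022.

```shell
# apply_umask DEFAULT MASK: clear from DEFAULT every permission bit that is
# set in MASK. Both arguments are octal digits, e.g. 666 and 022.
# (Hypothetical helper for this illustration, not a Unix command.)
apply_umask() {
    printf '%03o\n' $(( 0$1 & ~0$2 ))
}

apply_umask 666 022   # new file under umask 022
apply_umask 777 022   # new directory under umask 022
apply_umask 666 342   # new file under umask 342
apply_umask 777 342   # new directory under umask 342

# The same rule observed on a real file. The parentheses run the commands
# in a subshell, so the umask of the calling shell is left unchanged.
demo_dir=$(mktemp -d)
new_file_mode=$( umask 022; touch "$demo_dir/chirag"; ls -l "$demo_dir/chirag" | cut -c1-10 )
echo "$new_file_mode"
rm -rf "$demo_dir"
```

The four calls print 644, 755, 424, and 435 in that order.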

4.2.3 chown: Changing File Ownership


The chown command is used for changing the owner and group owner of a file.
Syntax chown [-R] new_owner[:[new_group]] filenames

The options and arguments of this command are briefly explained in Table 4.4.
Table 4.4 Brief description of options used in the chown command

Option Description
-R The command applies recursively to the files and subdirectories of the specified directory.
new_owner It is the new owner of the files, that is, new_owner will become the new owner of the files
and hence gets all the permissions to access the file and modify its access permissions.
new_group It is the group name to which we want to assign the files.
filenames These are the files whose ownership we wish to change.

To change both the owner and the group of the file, new_owner must be followed by a
colon and a new_group with no space in between.

Note: If no new_group is specified after the new_owner and colon, the owner and group of the file is changed
to new_owner and group of new_owner, respectively.
If the new_owner is missing but colon and new_group are specified then only the group of the files is changed,
that is, the command will act as the chgrp command. We will learn about the chgrp command next.

Examples By default, when we create or copy a file, we become its owner. For example,
suppose we have a file named notes.txt and we want to change its ownership to another
person named Ravi.
Let us first view the current owner of the file by giving the following command:
(a) $ ls -l notes.txt
-rwxrwxr-x 1 chirag it 120 Mar 15 12:20 notes.txt
We can see that chirag is the current owner of the file. Now chirag can give the following
command to give the ownership of the file notes.txt to ravi.
(b) $ chown ravi notes.txt
To see whether the ownership is changed, let us again give the ls -l command.
(c) $ ls -l notes.txt
-rwxrwxr-x 1 ravi it 120 Mar 15 12:20 notes.txt

We can see that the owner of the file notes.txt is changed from chirag to ravi.
Now, chirag will no longer be able to change the permissions of the file notes.txt and
only ravi can do so.

Note: This process is one way because we must either be the owner of the file or the super user to
change its ownership. After we give the file to ravi, we cannot get its ownership back until and unless
ravi issues the chown command to return the ownership to us.

Examples The following are a few more examples.


(a) $ chown chirag:mba notes.txt
This command will change the owner and group of the file notes.txt to chirag and mba,
respectively.
Note: The user chirag, and group mba must exist before giving the command. The commands to add a new
user and group to the system are useradd and groupadd, respectively (discussed in Chapter 15).

(b) $ chown chirag: notes.txt


This command will change the owner of the file notes.txt to chirag and the group of
the file is changed to the one to which chirag belongs.
(c) $ chown :mba notes.txt
This command will change the group of the file notes.txt to mba.
(d) $ chown -R chirag projects
This command will change the ownership of the files and subdirectories of the directory
projects to chirag.
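
Giving a file to another user normally requires super user rights, so the forms of chown shown above cannot be tried freely on most systems. The following sketch works around this by handing a scratch file to ourselves, which an ordinary user is allowed to do. It is written under some assumptions: numeric IDs from the id command are used in place of names such as chirag and mba, the variable names are invented for the illustration, and the :group form is assumed to be accepted by the installed chown, as described in the Note above.

```shell
# Scratch file standing in for notes.txt.
notes_file=$(mktemp)

my_uid=$(id -u)   # numeric ID of the current user
my_gid=$(id -g)   # numeric ID of the current user's primary group

# owner:group form; chown accepts numeric IDs as well as names.
chown "$my_uid:$my_gid" "$notes_file"

# ls -n lists numeric IDs: field 3 is the owner, field 4 is the group.
owner_and_group=$(ls -ln "$notes_file" | awk '{print $3 ":" $4}')
echo "$owner_and_group"

# :group form changes only the group, in the same way as chgrp.
chown ":$my_gid" "$notes_file"

rm -f "$notes_file"
```

With real user and group names, the same commands would read chown chirag:mba notes.txt and chown :mba notes.txt.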

4.2.4 chgrp: Changing Group Command


The chgrp command is used for changing the group of the specified files. The files
will thereby be made accessible to the specified group.
Syntax chgrp [-R] [-h] new_group filenames

The options and arguments of this command are briefly explained in Table 4.5.
Table 4.5 Brief description of the options used in the chgrp command

Option Description
-R It recursively changes the group of the files and subdirectories of the specified directory.
-h If the specified file is a symbolic link, its group is changed. In the absence of a -h option,
the group of the file referenced by the symbolic link is changed and not the symbolic link.
new_group It is the group name we want to assign the files to.
filenames Specifies the files whose group we want to change.

By default the file we create gets group ownership in the group we belong to, that is, the
group to which the owner belongs becomes the default group ownership of the file.
For example, if we belong to the group it, our file will also have the same group ownership,
as can be seen by the following command:
$ ls -l notes.txt
-rwxrwxr-x 1 chirag it 120 Mar 15 12:20 notes.txt

The following command is used to change the group ownership of a file named notes.txt
from group it to group hospital:

$ chgrp hospital notes.txt

Note: The group hospital must exist before giving this command.

Now, the group ownership may appear as follows:


$ ls -l notes.txt
-rwxrwxr-x 1 chirag hospital 120 Mar 15 12:20 notes.txt

Note: Since we are still the owner of the file, we can again change its group ownership any time.

Examples
(a) $ chgrp -R hospital projects
This command changes the group of all the files and subdirectories present in the projects
directory to hospital.
(b) $ chgrp -h hospital finance.txt
This changes the group of the symbolic link finance.txt to hospital.

4.2.5 groups: Displaying Group Membership


The groups command is used for finding the group to which a user belongs.
Syntax groups username1 [ username2 [username3 …]]

Example
(a) % groups chirag
mba
This example asks the group name of the user, chirag. The output mba signifies that the
user chirag belongs to the group named mba.
We can also find the group membership of more than one user simultaneously as follows:
(b) % groups chirag ravi
chirag : mba
ravi : other
This command asks the group names of the two users, chirag and ravi. The output indicates
that the user chirag belongs to the mba group and the user ravi belongs to the other group.

4.2.6 groups: Sharing Files Among Groups


Files can be shared with a group of users so that they can simultaneously read, work on, and
operate on the file(s). For this to happen, a group needs to be created by the system administrator.
To create a group, we give the command with the following syntax:
Syntax groupadd group_name

Example % groupadd bankproject

This will create a group by the name bankproject. After creating a group, the next step is to
set the group ownership of the file(s) to the given group using the chgrp command.
Syntax $ chgrp groupname filename

Example
(a) $ chgrp bankproject accounts.txt
This will set the group owner of the file accounts.txt to our newly created group bankproject.
Similarly, we need to change the group ownership of all the files that we wish to share with
the users of our group bankproject. Thereafter, we need to set the file permissions so that
everybody in the group can read and write the file through the following syntax:
Syntax $ chmod g+rw filename

We can also assign access permissions to the group in the following way:
Syntax $ chmod 770 filename
This example assigns read, write, and execute permissions to the owner and group members
of the file and no permission to the other users.
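The sharing steps above can be tried safely on a scratch file. The following is only a minimal sketch: the file name accounts_demo.txt and the temporary directory are assumptions made for illustration, and the file is assigned to the caller's own primary group (by numeric ID, from id -g) because creating a new group such as bankproject requires administrator rights.

```shell
# Scratch directory so that no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch accounts_demo.txt              # hypothetical file name
chmod 600 accounts_demo.txt          # start from owner-only access

# Assign the file to a group we belong to (here: our own primary group,
# given by its numeric ID so the sketch does not depend on group names).
chgrp "$(id -g)" accounts_demo.txt

# Symbolic form: add read and write permission for the group.
chmod g+rw accounts_demo.txt
perm_symbolic=$(ls -l accounts_demo.txt | cut -c1-10)
echo "$perm_symbolic"                # -rw-rw----

# Octal form: rwx for owner and group, nothing for others.
chmod 770 accounts_demo.txt
perm_octal=$(ls -l accounts_demo.txt | cut -c1-10)
echo "$perm_octal"                   # -rwxrwx---
```

The symbolic form (g+rw) only adds bits to whatever is already set, whereas the octal form (770) replaces the whole permission set in one step.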

4.3 INPUT/OUTPUT REDIRECTION IN UNIX


The input to a command or a shell is usually provided through the standard input, that is, the
keyboard, while the output of the command is displayed on the standard output, that is, on
the terminal screen. By default, each command takes its input from the standard input and
sends the results to the standard output.
By making use of the I/O redirection operators, we can change the location of providing
input to a command and displaying the output of the command. Let us first understand the
output redirection operator.

4.3.1 Output Redirection Operator


The output redirection operator is used to redirect the output (from a command) that is
supposed to go to a terminal by default, to a file instead. This process of diverting the output
from its default destination is known as output redirection. For redirecting the output, the
operator that we have to use is the ‘>’ operator in shell command. The ‘>’ symbol is known
as the output redirection operator and we can use it to divert the output of any command to
a file instead of the terminal screen.
Syntax command [> | >>] output_file
Here, output_file is the name of the file where we wish to direct and save the output of the
command.
Example $ ls > kk
On using this command, nothing will appear on the output screen and all output, that is, the
list of files and directories from the ls command is redirected to the file kk.
On viewing the contents of the file kk, we get the list of files and directories in it.
Note: If the file kk does not exist, the redirection operator will first create it, and if it already exists, its contents
will be overwritten.
In order to append output to the file, the append operator, >> is used as shown in the following
command:
$ ls >> kk
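The difference between overwriting and appending can be checked with a short experiment. This is a minimal sketch that uses echo instead of ls so that the result is predictable; the temporary directory is an assumption made for illustration.

```shell
# Scratch directory so that an existing file named kk is not disturbed.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo "first"  > kk     # kk does not exist yet, so it is created
echo "second" > kk     # kk exists, so its old content is overwritten
echo "third" >> kk     # >> appends to the end instead of overwriting

cat kk                 # prints "second" and "third" on separate lines
```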

4.3.2 Input Redirection Operator


In order to redirect the standard input, we use the input redirection operator, the < (less than)
symbol.
Syntax command < input_file

Here, input_file is the name of the file from where the data will be supplied to the command
for the purpose of computation.
Examples
(a) $ sort < kk
The sort command in the example receives the input stream of bytes from the file kk.
We can also combine input and output redirection operators.
(b) $ sort < kk > mm
On using the command, nothing will appear on the terminal screen; instead the content
of the file kk will be sorted and sent directly to the file mm.
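Combined input and output redirection can be sketched as follows. The file contents are assumptions made for illustration; note that the input file itself is left unchanged.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'pear\napple\nmango\n' > kk   # sample input, one word per line

# sort reads kk through standard input and writes to mm through
# standard output; nothing is shown on the terminal.
sort < kk > mm

cat mm                               # apple, mango, pear (one per line)
```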

4.4 PIPE OPERATOR


The pipe operator, represented by the symbol ‘|’, is used on the command line for the purpose
of sending the output of a command as an input to another command. The pipe operator is
different from the output redirection operator ‘>’ in that the output redirection operator
‘>’ is used for sending the output of a command to a file, whereas the pipe operator
is used for sending the output of a command to some other command for further processing.
Syntax command1 | command2 [| command3…]

Example $ cat notes.txt | wc


4 20 75

Here, the output of the cat command is sent as input to another command, wc. The wc command
counts the lines, words, and characters in the file notes.txt whose content is passed to it.
We can combine several commands with pipes on a single command line as follows:
$ cat notes.txt | sort | lp
This command sorts the content of the file notes.txt and sends the sorted content to the
printer for printing.
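Note: The pipe operator provides a one-way flow of data, that is, from the command on its left to the command
on its right, whereas the redirection operators connect a command to a file in either direction: > sends the
output of a command to a file and < supplies the content of a file to a command.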
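To see what a pipe saves, the sketch below runs the same two-stage job twice: once as a pipeline and once with intermediate files. The file names notes_demo.txt, stage1.tmp, and stage2.tmp are assumptions made for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'delta\nalpha\ncharlie\nbravo\n' > notes_demo.txt

# Pipeline: sort the lines and keep only the first two.
first_two=$(sort notes_demo.txt | head -n 2)
echo "$first_two"

# The same work without a pipe needs a temporary file per stage.
sort notes_demo.txt > stage1.tmp
head -n 2 stage1.tmp > stage2.tmp
cat stage2.tmp
```

Both variants print the same two lines; the pipeline simply avoids creating and cleaning up the temporary files.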

4.5 cut: CUTTING DATA FROM FILES


The cut command is used for slicing (cutting) a file vertically.
Syntax cut [-c -f] file_name

Here, -c refers to columns or characters and -f refers to the fields, that is, words delimited
by a tab, which is the default field delimiter.

Examples
(a) $ cut -c 6-22,30-35 bank.lst
This command retrieves characters (columns) 6 to 22 and 30 to 35 of each line of the file
bank.lst and displays them on the screen.
Let us look at another example.
(b) $ cut -f2 bank.lst
We get the content of the second field of the file bank.lst displayed on the screen.
Let us assume the file bank.lst has the following content.
101 Anil
102 Ravi
103 Sunil
104 Chirag
105 Raju
Note: The fields in the file bank.lst are separated by a tab space.

Here, the cut command will display the second field of the file bank.lst, that is, we will get
the output shown in Fig. 4.3.
$ cut -f2 bank.lst
Anil
Ravi
Sunil
Chirag
Raju
Fig. 4.3 Output displaying second field of the file bank.lst

The fields in the file bank.lst are delimited by a tab. If they are separated by a delimiter
other than tab, then the output of the cut command will be different.
Let us assume the file bank.lst has the following content.
101,Anil
102,Ravi
103,Sunil
104,Chirag
105,Raju

We can see that the fields of the file bank.lst are delimited by a comma (,) and not by a tab.
The default delimiter for identifying fields is the tab, so the following command finds no
delimiter on any line and displays every line unchanged (lines without a delimiter are
suppressed only when the -s option is given).
$ cut -f2 bank.lst
Hence, the file bank.lst will be considered to be consisting of a single field on each
line.
When the fields are delimited by a character other than tab, as in the aforementioned file,
we use the -d (delimiter) option to specify the field delimiter, as shown in the following
example:
$ cut -f2 -d "," bank.lst
This statement will show the second field of the file bank.lst where the fields are delimited
by commas (,).
Assume that the fields are delimited by a comma (,). The following statement cuts the
fields, starting from the first, from the file bank.lst:
$ cut -d"," -f1- bank.lst

Assuming that the fields are delimited by the pipe character (|), the following statement cuts the first
field and every field from the fourth onwards from the file bank.lst:
$ cut -d"|" -f1,4- bank.lst
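The effect of the delimiter on cut can be checked with a small comma-separated file. This is a minimal sketch; the file name bank_demo.lst and its three records are assumptions made for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '101,Anil\n102,Ravi\n103,Sunil\n' > bank_demo.lst

# No -d given: cut looks for tabs, finds none, and prints each line whole.
no_delim=$(cut -f2 bank_demo.lst)
echo "$no_delim"

# -s suppresses lines that do not contain the delimiter at all.
suppressed=$(cut -s -f2 bank_demo.lst)
echo "suppressed output is empty: [$suppressed]"

# With the right delimiter, the second field is extracted.
names=$(cut -d, -f2 bank_demo.lst)
echo "$names"

# Character positions 1-3 of every line (the employee code).
codes=$(cut -c1-3 bank_demo.lst)
echo "$codes"
```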

Can we cut the fields of two separate files and paste them to make a third file? Yes, of course.
Let us see how.
Assume there are two files, Names and Telephone, with the following contents.
The Names file consists of employee codes and names as follows:
101 Anil
102 Ravi
103 Sunil
104 Chirag
105 Raju

The Telephone file consists of employee codes and telephone numbers as follows:
101 2429193
102 3334444
103 7777888
104 9990000
105 5555111

Let us cut the second field from both the files and paste them to make a third file, that is, cut
the employee names from the Names file and telephone numbers from the Telephone file and
paste them to create a third file.
To cut out the second word (field) from the file Names, we give the following
command:
$ cut -f2 Names

We get the output as shown in Fig. 4.4.
Anil
Ravi
Sunil
Chirag
Raju
Fig. 4.4 Output showing second field of file Names

Similarly, to cut the telephone numbers, that is, the second field from
the Telephone file, we give the following command:
$ cut -f2 Telephone
2429193
3334444
7777888
9990000
5555111

We can save the output by redirecting the standard output to a file.


$ cut -f2 Names > names.txt
$ cut -f2 Telephone > numbers.txt

The names and telephone numbers will be saved in two files, names.txt and numbers.txt,
respectively. To paste the content of the two files, we need to understand the paste command.
Let us now study this command.

4.6 paste: PASTING DATA IN FILES


The paste command is used to join textual data together and is very useful if we want to put
together textual information located in various files.
Syntax paste [-s] [-d "delimiter"] files

These options are briefly explained in Table 4.6.

Table 4.6 Brief description of the options used in the paste command

Option Description
-s The paste command usually displays the corresponding lines of each specified file.
The -s option refers to a serial option and is used to combine all the lines of each file
into one line and display them one below the other.
-d This option is for specifying the delimiter to be used for pasting lines from the specified
files. The default delimiter used to separate the lines from the files is the Tab character.

Example Consider the files names.txt and numbers.txt created in Section 4.5.
$ paste names.txt numbers.txt
The output will be as shown in Fig. 4.5.
Anil    2429193
Ravi    3334444
Sunil   7777888
Chirag  9990000
Raju    5555111
Fig. 4.5 Pasting of two files names.txt and numbers.txt with the default tab character in between

We can see that the corresponding lines of the files names.txt and numbers.txt are pasted with
a tab character in between. By default, the paste command uses the tab character for pasting
lines; however, we can specify a delimiter of our choice with the -d option as shown in the
following example.
$ paste -d"|" names.txt numbers.txt
This joins the two files with the help of the | delimiter and not tab (i.e., between names and
telephone numbers, there will be a ‘|’ symbol instead of the tab character, as shown in Fig. 4.6).
Anil|2429193
Ravi|3334444
Sunil|7777888
Chirag|9990000
Raju|5555111
Fig. 4.6 Two files names.txt and numbers.txt pasted with the ‘|’ symbol in between

The example shown in Fig. 4.7 serially pastes the contents from the files. It combines all the
lines of each file into one line and displays them one below the other.
$ paste -s names.txt numbers.txt
Anil Ravi Sunil Chirag Raju
2429193 3334444 7777888 9990000 5555111
Fig. 4.7 Two files names.txt and numbers.txt pasted one below the other
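The complete cut-and-paste workflow of Sections 4.5 and 4.6 can be sketched as a short script. It assumes tab-separated Names and Telephone files with only two records each, and the output file name directory.txt is made up for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '101\tAnil\n102\tRavi\n' > Names
printf '101\t2429193\n102\t3334444\n' > Telephone

# Cut the second field out of each file.
cut -f2 Names > names.txt
cut -f2 Telephone > numbers.txt

# Paste corresponding lines side by side, separated by '|'.
paste -d'|' names.txt numbers.txt > directory.txt
cat directory.txt

# Serial paste: each file becomes one tab-separated line.
serial=$(paste -s names.txt numbers.txt)
echo "$serial"
```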

4.7 split: SPLITTING FILES INTO LINES OR BYTES


The split command is used to split a file into pieces.
Syntax split [-b n [K | M]] [ -l n] [-n] file_name dest_file

The options and arguments are briefly explained in Table 4.7.



Table 4.7 Brief description of the options and arguments used in the split command

Option Description
-b n It splits the specified file into pieces that are n bytes in size.
-b nK It splits the specified file into pieces that are n kilo bytes in size.
-b nM It splits the specified file into pieces that are n mega bytes in size.
-l n It splits the specified file into pieces of n lines each (default option). The default value of n is 1000.
-n It is the same as -l n.
file_name It is the name of the file to be split.
dest_file It is the name of the file in which the split pieces will be stored. If the dest_file is, say, demo, the
split pieces will be stored in the files demoaa, demoab, demoac, and so on.

Example When the split command is given without any option, the file is split into pieces
that are 1000 lines each, that is, the default option is -l as shown in Fig. 4.8.
$ split numbers.txt trial
$ ls trial*
trialaa
$ cat trialaa
2429193
3334444
7777888
9990000
5555111
Fig. 4.8 File numbers.txt split into the file trialaa

We can see that the file numbers.txt is split into a single file trialaa consisting of the
complete content of the file numbers.txt (because the file numbers.txt has fewer than
1000 lines).
The example in Fig. 4.9 splits the file numbers.txt into files that are 20 bytes each.
$ split -b 20 numbers.txt temp
$ ls temp*
tempaa tempab
$ cat tempaa
2429193
3334444
7777
$ cat tempab
888
9990000
5555111
Fig. 4.9 File numbers.txt split into two 20-byte files, tempaa and tempab

The file numbers.txt is split into two pieces, tempaa and tempab, where tempaa
contains the first 20 bytes of the file numbers.txt while the file tempab contains the remaining
number of bytes.
The example in Fig. 4.10 splits the file numbers.txt into pieces, each of which consists
of at most two lines.
$ split -l 2 numbers.txt demo
$ ls demo*
demoaa demoab demoac
$ cat demoaa
2429193
3334444
$ cat demoab
7777888
9990000
$ cat demoac
5555111
Fig. 4.10 File numbers.txt split into three files, demoaa, demoab, and demoac, of at most two lines each

The file numbers.txt is split into three files with the following names: demoaa, demoab, and
demoac. The first two files contain two lines each and the last file contains the remaining
line (as shown in Fig. 4.10).
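A useful check after splitting is that the pieces, concatenated in name order, reproduce the original file. The following minimal sketch uses the same five telephone numbers; the name rejoined.txt is an assumption made for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '2429193\n3334444\n7777888\n9990000\n5555111\n' > numbers.txt

# Split into pieces of two lines; the prefix "demo" gets the
# suffixes aa, ab, ac, ...
split -l 2 numbers.txt demo
pieces=$(ls demo*)
echo "$pieces"

# The last piece holds the single leftover line.
last_lines=$(wc -l < demoac)

# The shell expands demo* in sorted order, so cat restores the file.
cat demo* > rejoined.txt
cmp -s numbers.txt rejoined.txt && echo "pieces reassemble to the original"
```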

4.8 wc: COUNTING CHARACTERS, WORDS, AND LINES IN FILES


The wc (word count) command counts the lines, words, and characters in a file. By
default, it displays all three counts for any given file.
Syntax wc [-l -w -c] [filename]

Here, -l counts the number of lines.
-w counts the number of words delimited by white space or a tab.
-c counts the number of characters.
Example $ wc phone.lst
12 124 650 phone.lst
This command displays the count of lines, words, and characters in the file phone.lst, that
is, the numerical values 12, 124, and 650 represent the count of lines, words, and characters,
respectively in the file phone.lst.
If we wish to view only the number of lines in the file, we need to only use the -l option,
as in the following example:
$ wc -l phone.lst
12 phone.lst
As we can see, the command displays the count of the number of lines in the file phone.lst.
Similarly, the -w option will give the total number of words in a file, and the -c option gives
the total number of characters in a file.
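When a count is needed inside a script, it helps to feed the file to wc through input redirection, because wc then prints only the number and not the file name. A minimal sketch, assuming a two-line sample file named phone_demo.lst:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'one two\nthree four five\n' > phone_demo.lst

wc phone_demo.lst            # three counts followed by the file name

# Reading from standard input leaves the file name out of the output.
lines=$(wc -l < phone_demo.lst)
words=$(wc -w < phone_demo.lst)
chars=$(wc -c < phone_demo.lst)

# $(( )) also strips the padding that some wc versions add.
echo "lines=$((lines)) words=$((words)) chars=$((chars))"
```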

4.9 sort: SORTING FILES


It is used for sorting files either line-wise or on the basis of certain fields, where the fields refer
to the words that are separated by one of the following: white space, tab, or special symbol.
Syntax sort [-n][-r][-f][-u] filename

The options and arguments shown here are briefly explained in Table 4.8.
Table 4.8 Brief description of the options used in the sort command

Option Description
-n Sorts numerical values instead of ASCII, ignoring blanks and tabs
-r Sorts in reverse order
-f Sorts upper and lower case together, that is, ignores the difference in upper and lower case
-u Displays unique lines, that is, it eliminates duplicate lines in the output
filename Represents the file to be sorted

All lines in the filename will be arranged in alphabetical order on the basis of the first
character of the line.
The other syntax for using the sort command is as follows:
Syntax sort +p1 -p2 filename
This limits the sort to be applied on the basis of the characters that begin after the first p1
fields and end with field p2. If p2 is omitted, then sorting will be done on the basis of the
characters that begin after the first p1 fields and continue till the end of the line.
Examples
(a) $ sort +2 -4 bnk.lst

This command skips the first two fields and uses the third and fourth fields for sorting
the file bnk.lst.
(b) $ sort +3 -4 bnk.lst
This command skips the first three fields and uses the fourth field for sorting the file
bnk.lst.
(c) $ sort +2 bnk.lst
This command skips the first two fields and uses the third and the rest of the fields up till
the end of the line for sorting the file bnk.lst.
(d) $ sort bnk.lst -o bank.lst
This command sorts the file bnk.lst and stores the result in bank.lst.
(e) $ sort +0 -1 bnk.lst
This command sorts the file bnk.lst on the basis of the first field.
(f) $ sort +1 -4 bnk.lst
This command sorts the file bnk.lst from the second to the fourth fields.
(g) $ sort +2b bnk.lst
This command sorts the file bnk.lst on the third field after ignoring leading blank spaces.
The -f option is used to ignore the upper and lower case distinction.
(h) $ sort +2bf bnk.lst
The command will sort the third field after ignoring leading blank spaces and sort the
upper and lower case data together.
The -n option is used for sorting the file on the basis of numerical values rather than
ASCII values.
(i) $ sort -n +2 -3 a.bat
The command sorts the file a.bat on the third field, on the assumption that it is a
numerical field.
The -r option is used for sorting a given file in reverse order.
(j) $ sort -r link.lst
The command will sort the file link.lst in the reverse order. The -u option will eliminate
duplicate lines in the sorted output.
(k) $ sort -nu +2 -3 a.bat
The command sorts the file a.bat on the third field after eliminating duplicate lines.
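The +p1 -p2 notation is the historical way of naming sort fields; current POSIX sort expresses the same thing with -k, counting fields from 1, and some present-day implementations may no longer accept the older form. As a hedged sketch, sort +1 -2 corresponds to sort -k2,2 and sort -n +2 -3 to sort -n -k3,3. The file bnk_demo.lst and its content are assumptions made for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '3 pear 20\n1 apple 100\n2 mango 3\n' > bnk_demo.lst

# Sort on the second field only (historical form: sort +1 -2).
by_name=$(sort -k2,2 bnk_demo.lst)
echo "$by_name"

# Sort on the third field as a number (historical form: sort -n +2 -3).
by_amount=$(sort -n -k3,3 bnk_demo.lst)
echo "$by_amount"

# Without -n the third field is compared as text, so 100 precedes 20.
by_amount_text=$(sort -k3,3 bnk_demo.lst)
echo "$by_amount_text"
```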

4.10 head: DISPLAYING TOP CONTENTS OF FILES


The head command is used for selecting the specified number of lines from the beginning of
the file and displaying them on the screen.
Syntax head [-n] filename

Here, n is the number of lines that we want to select.

Example $ head bnk.lst

When used without an option, this displays the first ten records (lines) of the specified file.
$ head -3 bnk.lst

It displays the first three lines of the file bnk.lst.



We can also specify more than one file.


$ head -3 bnk.lst notes.txt

It will display the first three lines of both the files, bnk.lst and notes.txt, one after the other.

4.11 tail: DISPLAYING BOTTOM CONTENTS OF FILES


The tail command is used for selecting the specified number of lines from the end of the file
and displaying them on the screen.
Syntax tail [-ncbr] filename

The options of the command are briefly explained in Table 4.9.


Table 4.9 Brief description of the options used in the tail command

Option Description
-n Selects the last n lines
-c Selects the last c number of characters
-b Selects a specified number of disk blocks
-r Sorts the selected lines in reverse order

Note: There is one more option that is used with the tail command. +n gives an instruction to skip
n-1 lines and select the rest until the end of the file.

Examples
(a) $ tail -3 bnk.lst
It will display the last three lines.
(b) $ tail -10 bnk.lst
It will display the last 10 lines.
(c) $ tail +10 bnk.lst
It will start extracting from the tenth line (it will skip nine lines) up to the end of the file.
(d) $ tail -50c bnk.lst
It will display the last 50 characters of the file bnk.lst.
(e) $ tail -2b bnk.lst
It will display the last two disk blocks of the file bnk.lst. A disk block is usually 512
bytes big.
(f) $ tail -2r bnk.lst
It will display the last two lines of the file bnk.lst in reverse order, that is, the last line
will appear first followed by the second last line.
(g) $ head -25 a.txt | tail +20 > b.txt
It extracts lines numbering from 20 to 25 from the file a.txt.
The head utility extracts the first 25 lines from the file a.txt and pipes them to the tail
utility, which skips the first 19 lines and extracts lines 20 to 25. The results are then
stored in file b.txt.
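The line-range extraction in example (g) can be reproduced with the option spellings defined by current POSIX, head -n 25 and tail -n +20, since the bare +20 form may not be accepted by every present-day tail. This is a minimal sketch; the 30-line sample file a_demo.txt is generated only for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a 30-line sample file: "line 1" ... "line 30".
awk 'BEGIN { for (i = 1; i <= 30; i++) print "line " i }' > a_demo.txt

# Keep the first 25 lines, then everything from line 20 onwards.
head -n 25 a_demo.txt | tail -n +20 > b_demo.txt

cat b_demo.txt               # "line 20" through "line 25"
```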

4.12 diff: FINDING DIFFERENCES BETWEEN TWO FILES


The diff command is used for comparing two files. If there are no differences between the
two files being compared, the command does not display any output. Otherwise it displays
the output indicating the changes that need to be made to the first file to make it same as the
second file.
Syntax diff file1 file2

All the differences found in the two files are displayed in a format consisting of two numbers
and a character in between. The number to the left of the character represents the line number
in the first file, and the number to the right of the character represents the line number in the
second file. The character can be any of the following:
1. d: delete
2. c: change
3. a: add
Example Assume we have two files, users.txt and customers.txt, with the following
content.
users.txt customers.txt
John John
Peter Charles
Troy Troy

Now on comparing the two files, we get the following output.


$ diff users.txt customers.txt
2c2
< Peter
---
> Charles

Note: The < character precedes the lines from the first file and > precedes the lines from the second file.

This output indicates that the two files differ by only one line. It indicates that if the second
line, Peter, in the first file (users.txt) is changed to the second line, Charles, of the second
file (customers.txt), both files will be exactly the same.
To better understand the diff command, let us twist the content of the first file users.txt
as follows:
users.txt
John
Peter
Charles

Keeping the content of the file customers.txt same as before, when we compare the two
files, we get the following output.
$ diff users.txt customers.txt
2d1
< Peter
3a3
> Troy

The output indicates that to make the file users.txt the same as customers.txt, we have to
delete the second line, Peter, and add the third line, Troy, from customers.txt after the third
line in users.txt.
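In scripts, diff is often used for its exit status as much as for its report: 0 means the files are identical and 1 means they differ. The sketch below recreates the first pair of files from this section in a scratch directory and captures both the report and the status.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'John\nPeter\nTroy\n'   > users.txt
printf 'John\nCharles\nTroy\n' > customers.txt

# Capture the report; "|| status=$?" records the non-zero exit status
# without aborting a script that runs under "set -e".
status=0
report=$(diff users.txt customers.txt) || status=$?

echo "$report"
echo "exit status: $status"
```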

4.13 cmp: COMPARING FILES


The cmp command compares two files byte by byte and indicates the byte and line number where
the first difference in the files occurs. The cmp command does not display anything if the
files being compared are exactly the same.
Syntax cmp [[-l][-s]] file1 file2 [skip1] [skip2]

The related options and arguments are briefly explained in Table 4.10.
Table 4.10 Brief description of the options and arguments used in the cmp command

Option Description
-l It prints the byte number and the differing byte values in octal for each difference.
-s It displays nothing on the screen; the result is reported only through the exit status,
which can be any of the following:
0: If the two files are identical
1: If the two files are different
>1: If an error occurs while reading the files
file1 and file2 These are the files to be compared.
skip1 and skip2 These are the optional byte offsets from the beginning of file1 and file2 respectively,
where we wish to begin the comparison of files. The offset can be specified in
decimal, octal, and hexadecimal. The offsets in hexadecimal and octal formats have
to be preceded by ‘0x’ and ‘0’, respectively.

Example Consider we have two files, users.txt and customers.txt, with the following
content.
users.txt customers.txt
John John
Peter Charles
Troy Troy

The following are examples of commands that are used to compare the two files.

$ cmp users.txt customers.txt

The cmp command compares the files users.txt and customers.txt and displays the
following output.
users.txt customers.txt differ: byte 6, line 2

The output indicates that the byte location where the first difference between the two files
(users.txt and customers.txt) occurs is 6.
The following example shows the list of byte locations and the differing byte values in
octal format for every difference found in the two files:
$ cmp -l users.txt customers.txt
We get the output shown in Fig. 4.11.
$ cmp -l users.txt customers.txt
6 120 103
7 145 150
8 164 141
9 145 162
10 162 154
11 12 145
12 124 163
13 162 12
14 157 124
15 171 162
16 12 157
cmp: EOF on users.txt
Fig. 4.11 List of byte locations and the differing bytes in the two files users.txt and customers.txt

The output displays the byte locations of the difference between the two files and also shows
that the users.txt file is smaller than the customers.txt file as it encounters its end of file
(EOF) marker while being compared with the content of customers.txt.
The following example shows the status returned on comparing the two files.
$ cmp -s users.txt customers.txt
This example returns the exit status. On displaying the exit status value (Fig. 4.12), we get
a value one, confirming that the two files are different.
The following example compares the two files after skipping the offset of 12 and 14 bytes
from the two files respectively.
$ cmp -s users.txt customers.txt 12 14
On displaying the return status (Fig. 4.12), we get an output 0, which confirms that the two
files are exactly the same after giving the offset values.
$ cmp -s users.txt customers.txt
$ echo $?
1
$ cmp -s users.txt customers.txt 12 14
$ echo $?
0
Fig. 4.12 Comparison between files and display of status
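Because cmp -s reports only through its exit status, it fits naturally into an if statement. A minimal sketch, recreating the two sample files in a scratch directory; the copy users_copy.txt is an assumption added to show the identical case.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'John\nPeter\nTroy\n'   > users.txt
printf 'John\nCharles\nTroy\n' > customers.txt
cp users.txt users_copy.txt

# Exit status 0 (files identical) selects the "then" branch.
if cmp -s users.txt customers.txt; then
    verdict_a="identical"
else
    verdict_a="different"
fi

if cmp -s users.txt users_copy.txt; then
    verdict_b="identical"
else
    verdict_b="different"
fi

echo "users vs customers: $verdict_a"
echo "users vs copy:      $verdict_b"
```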

4.14 uniq: ELIMINATING AND DISPLAYING DUPLICATE LINES


The uniq command is used to find and display duplicate lines in a file. In addition, we can
use it to eliminate duplicate lines and display only unique lines.
Syntax uniq [-c | -d | -u ] [ -f fields ] [ -s char ] [input_file [ output_file ] ]

The related options and arguments are briefly explained in Table 4.11.

Table 4.11 Brief description of the options and arguments used in the uniq command

Option Description
-c It precedes each line with a count of the number of occurrences.
-d It displays only repeated lines (duplicate) in the input.
-u It displays only unique lines in the input.
-f fields It ignores the first given number of fields on each input line.
-s char It ignores the first given number of characters of each input line. If this option is used along
with the -f option, the first given number of characters after the first fields will be ignored.
input_file It is the name of the file whose content we need to compare.
output_file It is the name of the file where the output of the command will be stored. If no output file is
specified, the output will appear on the standard output.

Example $ uniq a.txt > b.txt

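This command removes adjacent duplicate lines in the file a.txt and saves the result in another
file b.txt. Because uniq compares only neighbouring lines, a file is usually sorted first so
that identical lines end up next to each other.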
Let us assume the file a.txt contains the following content:
a.txt
It may rain today
I am leaving now
It may rain today
Lovely weather
I am leaving now

The following is the command for removing all duplicate lines from a file.
$ sort a.txt | uniq
This command sorts and removes all the duplicate lines in the file a.txt and displays only
the unique lines on the screen. We get the following output.
I am leaving now
It may rain today
Lovely weather

The following command is used to display only the unique lines.


$ sort a.txt | uniq -u
The command sorts and displays only the unique lines in the file a.txt on the screen. We get
the following output.
Lovely weather

The following command is used to display all the duplicate lines in a file.
$ sort a.txt | uniq -d

We get the following output.


I am leaving now
It may rain today

The following command is used to display the count of duplicate occurrences in a file.
$ sort a.txt | uniq -c

We get the following output.


2 I am leaving now
2 It may rain today
1 Lovely weather
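The reason every example above pipes sort into uniq can be checked directly: uniq only compares neighbouring lines, so on the unsorted a.txt it removes nothing. A minimal sketch using the same five lines in a scratch directory:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'It may rain today' 'I am leaving now' 'It may rain today' \
              'Lovely weather' 'I am leaving now' > a.txt

# No two identical lines are adjacent, so uniq alone keeps all five.
unsorted_count=$(uniq a.txt | wc -l)

# After sorting, duplicates are adjacent and collapse to three lines.
sorted_count=$(sort a.txt | uniq | wc -l)

only_once=$(sort a.txt | uniq -u)    # lines that occur exactly once
repeated=$(sort a.txt | uniq -d)     # one copy of each repeated line

echo "unsorted: $((unsorted_count)) lines, sorted: $((sorted_count)) lines"
echo "$only_once"
echo "$repeated"
```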

4.15 comm: DISPLAYING AND SUPPRESSING UNIQUE OR COMMON CONTENT IN TWO FILES

The comm command displays or suppresses the content common to two files. The output is
displayed in a column format, where the first column represents the output related to the first
file and the second column displays the output related to the second file.

Syntax comm [-1] [-2] [-3 ] file1 file2

The options and arguments are briefly explained in Table 4.12.


Table 4.12 Brief description of the options and arguments used in the comm command

Option Description
-1 It suppresses the display of the content that is unique to file1. It also displays the unique
content in file2.
-2 It suppresses the display of the content that is unique to file2. It also displays the unique
content in file1.
-3 It suppresses the display of the content that is common to both file1 and file2, that is, it
displays the unique content in file1 and file2.
file1 and file2 These are the two files being compared.

Note: When the comm command is executed without any options, the output will comprise three columns,
where the first column displays content unique to the first file, the second column displays content unique to
the second file, and the third column displays content common to both the files.

Examples Suppose we have two files, users.txt and customers.txt, with the following
content.
users.txt customers.txt
John John
Peter Charles
Troy Troy

(a) This example compares the two files (users.txt and customers.txt) and displays the
output in three columns. The first column displays content unique to the first file, the
second column displays content unique to the second file, and the third column displays
content common to both the files (Fig. 4.13).
$ comm users.txt customers.txt
(b) This example compares the two files, users.txt and customers.txt, suppresses the
content that is unique in users.txt, and also displays the unique content in customers.txt
(Fig. 4.13).
$ comm -1 users.txt customers.txt
(c) This example compares the two files, users.txt and customers.txt, and suppresses the
content that is unique in customers.txt and also displays the unique content in users.txt
(refer to Fig. 4.13).
$ comm -2 users.txt customers.txt
(d) This example compares the aforementioned two files and suppresses the content that is
common in customers.txt and users.txt (Fig. 4.13).
$ comm -3 users.txt customers.txt

$ comm users.txt customers.txt
                John
        Charles
Peter
                Troy
$ comm -1 users.txt customers.txt
        John
Charles
        Troy
$ comm -2 users.txt customers.txt
        John
Peter
        Troy
$ comm -3 users.txt customers.txt
        Charles
Peter
Fig. 4.13 Output of the comm command on comparing two files
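The comm command expects both of its input files to be sorted, and combining two of its options leaves exactly one column. The sketch below therefore sorts the customer list first; the file name customers_sorted.txt is an assumption made for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'John\nPeter\nTroy\n' > users.txt
printf 'John\nCharles\nTroy\n' | sort > customers_sorted.txt

# -12 hides columns 1 and 2: only lines common to both files remain.
common=$(comm -12 users.txt customers_sorted.txt)

# -23 hides columns 2 and 3: only lines unique to the first file remain.
only_users=$(comm -23 users.txt customers_sorted.txt)

# -13 hides columns 1 and 3: only lines unique to the second file remain.
only_customers=$(comm -13 users.txt customers_sorted.txt)

echo "common:         $(echo $common)"
echo "only users:     $only_users"
echo "only customers: $only_customers"
```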

4.16 time: FINDING CONSUMED TIME


The time command executes a specified command and also displays the time consumed in its
execution. The time usage of the specified command is displayed on the screen.
Syntax time command [arguments]

In the aforementioned syntax, command is the command whose time usage is to be
determined.
Examples We can determine the time taken to perform the sorting operation by preceding
the sort command with the time command.
(a) $ time sort -o newlist invoice.lst
real 0m1.18s
user 0m0.73s
sys 0m0.38s

The real time refers to the time elapsed from the invocation of the command till its
termination. The user time shows the CPU time spent by the command in executing its own
code, while sys indicates the CPU time spent by the kernel on behalf of the command.
(b) Let us see how much time it takes to store the recursive long listing of files and directories
sorted on modification time in a file.
$ time ls -ltR >k.out
real 0m0.04s
user 0m0.01s
sys 0m0.01s

Real time The real time represents the time taken by the command (from its initiation to
termination) to execute.
User time The user time represents the time taken by the command to execute its own
code, that is, the code run in user mode. It represents the actual CPU time used in executing
the command. For small programs that take milliseconds to execute, this time is often
reported as 0.0.
Sys time The sys time is the amount of CPU time spent in the kernel for running the
command. It represents the CPU time spent in executing the system calls that are invoked by
the command within the kernel.
The time command can be used to isolate the commands that are time consuming so that
they can be run in the background. We will learn the process of executing the commands in
the background in Chapter 6.

Note: The combination of user and sys time is known as CPU time.
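
The arithmetic behind this note can be sketched in a few lines of shell. The example below does not run time itself; it assumes the output of example (a) has been saved in a file (here named timing.txt, a hypothetical name) and adds the user and sys figures with awk to obtain the CPU time.

```shell
# Sample output copied from example (a); timing.txt is a hypothetical file name.
workdir=$(mktemp -d)
cat > "$workdir/timing.txt" <<'EOF'
real 0m1.18s
user 0m0.73s
sys 0m0.38s
EOF

# Convert each "XmY.YYs" value to seconds and add the user and sys rows.
cpu_seconds=$(awk '
    function secs(t,    parts) {
        sub(/s$/, "", t)          # drop the trailing "s"
        split(t, parts, "m")      # parts[1] = minutes, parts[2] = seconds
        return parts[1] * 60 + parts[2]
    }
    $1 == "user" || $1 == "sys" { total += secs($2) }
    END { printf "%.2f\n", total }
' "$workdir/timing.txt")

echo "CPU time: ${cpu_seconds}s"
```

For the figures in example (a), the CPU time (0.73 + 0.38 seconds) is slightly less than the real time of 1.18 seconds; the remainder is time the command spent waiting, for example on disk input and output.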

4.17 pg: SHOWING CONTENT PAGE-WISE


The pg command displays a long file page-wise, that is, one screen page at a time. It also
enables us to navigate to the previous or following screen and to search for a desired
pattern in the file. On giving this command, it shows the first screen page of the given
file and a colon (:) prompt at the bottom of the screen, where we can type the commands
listed in Table 4.13 to view the desired content.
Syntax pg [-number] [+linenumber] [+/pattern/] [filename]

Here, -number specifies the screen size in lines. The default screen size is 23 lines.
+linenumber shows the file from the given line number. +/pattern/ shows the file from the
line where the given pattern first occurs. filename specifies the file that we wish to
view page-wise, along with its path.
The commands that can be given on execution of the pg command are briefly explained in
Table 4.13.

Table 4.13 Brief description of the list of commands given on execution of the pg command

Command Description
h Displays help information
q or Q Quits the pg command
<blank> or <newline> Moves to the next page
$ Moves to the last page
f Skips the next page
/pattern Searches forward for the given pattern and displays it
?pattern Searches backward for the given pattern and displays it

Examples
(a) $ pg letter.txt
This command displays the file letter.txt one screen page at a time.
(b) $ pg -10 letter.txt
This command displays the content of the file letter.txt one screen page at a time where
a page consists of 10 lines.
(c) $ pg +25 letter.txt
This command displays the content of the file letter.txt page-wise from the 25th line.
(d) $ pg +/happy/ letter.txt
This command displays the content of the file letter.txt from the location where the
word happy occurs in the file.
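
The pg command is not installed on every modern system. As a rough sketch only, the starting points selected by the +linenumber and +/pattern/ options can be approximated with the standard sed and head commands. The sample file below is generated for illustration, and sed and head are stand-ins for pg, not part of it.

```shell
# Build a 30-line sample file; line 27 contains the word "happy".
workdir=$(mktemp -d)
i=1
while [ "$i" -le 30 ]; do
    if [ "$i" -eq 27 ]; then
        echo "line $i: happy birthday"
    else
        echo "line $i"
    fi
    i=$((i + 1))
done > "$workdir/letter.txt"

# Roughly the first screen of "pg -10 +25 letter.txt":
# start at line 25 and show at most 10 lines.
first_screen=$(sed -n '25,$p' "$workdir/letter.txt" | head -n 10)

# Roughly where "pg +/happy/ letter.txt" starts:
# everything from the first line that contains the pattern.
from_pattern=$(sed -n '/happy/,$p' "$workdir/letter.txt")

echo "$first_screen"
echo "$from_pattern"
```

Unlike pg, this sketch prints the selected lines once and exits; it offers no prompt for moving between pages.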

4.18 lp: PRINTING DOCUMENTS


The term lp stands for line printer and the command is used for printing files.

Syntax lp [ -d printer_destination ] [ -n number_of_copies ] [ -q priority ]
       [ -H action ] [ -P page_list ] [ -i job_id ] file(s)

The options used in this command are briefly explained in Table 4.14.

Table 4.14 Brief description of the options used in the lp command

Option Description
-d It is used for defining the printer destination, that is, the name of the printer we wish to print
the file(s) with.
-n It is used to define the number of copies to print. The valid range is from 1 to 100.
-P It is used to define the pages of a selected file that we wish to print. The page list contains the
page numbers and page ranges separated by commas (,) without spaces. Example: 1,5,9-11,20.
-i It is used to identify the job ID assigned to the print command. On giving the lp command,
it notifies the job ID assigned to the task.
-H It is used to control the printing job. The values used with this option are as follows:
1. hold: Holds the printing job
2. resume: Resumes the printing job
3. HH:MM: Holds the job till the specified time
4. immediate: Prints the job immediately
-q It is used to set the priority of the print job. The valid values are from 1 (indicates lowest
priority) till 100 (indicates highest priority). The default priority value is 50.

Examples The following are the examples of the lp command.

(a) $lp notes.txt
This command prints the file notes.txt on the default printer.
(b) $lp -d Deskjet1001 notes.txt
This command prints the file notes.txt on the printer named Deskjet1001.
(c) $lp -d Deskjet1001 -n 2 notes.txt
This command prints two copies of the file notes.txt on the printer named
Deskjet1001.
(d) $lp -d Deskjet1001 -P 2,5-7,10,15- notes.txt
This command prints pages 2, 5, 6, 7, 10, and from 15 till the end of the file notes.txt
on the printer Deskjet1001.

Note: The hyphen after 15 means from page 15 onwards. The page list is written without spaces so that the
shell passes it to lp as a single argument.

(e) $lp -d Deskjet1001 notes.txt accounts.txt
This command prints the files notes.txt and accounts.txt on the printer Deskjet1001.
(f) $lp -i 1207 -H hold
The print job number 1207 is held for a while.
(g) $lp -i 1207 -H resume
The print job number 1207 is resumed for printing.
(h) $lp -i 1207 -q 100
The print job number 1207 is given the highest priority.
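
To make the page-list notation of the -P option concrete, the following sketch expands a list such as 2,5-7,10,15- into individual page numbers. The function expand_pages is a hypothetical helper written only for illustration; it is not part of lp, and the total page count (17 here) is an assumed value needed to close the open-ended range.

```shell
# expand_pages LIST LAST_PAGE
# Hypothetical helper: prints the page numbers selected by a -P style list.
expand_pages() {
    list=$1
    last=$2
    old_ifs=$IFS
    IFS=,
    set -- $list              # split the list on commas
    IFS=$old_ifs
    out=
    for item in "$@"; do
        case $item in
            *-*)              # a range such as 5-7, or an open range such as 15-
                start=${item%-*}
                end=${item#*-}
                [ -n "$end" ] || end=$last
                ;;
            *)                # a single page
                start=$item
                end=$item
                ;;
        esac
        p=$start
        while [ "$p" -le "$end" ]; do
            out="$out $p"
            p=$((p + 1))
        done
    done
    echo "${out# }"
}

# Pages selected by "-P 2,5-7,10,15-" for a document assumed to have 17 pages.
expand_pages "2,5-7,10,15-" 17
```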
A command that goes along with the lp command is the cancel command. Let us now
discuss this briefly.

4.19 cancel: CANCELLING PRINT COMMAND


The cancel command cancels existing print jobs.
Syntax cancel [ id ] [ printer_destination ]

The options and arguments used in this command are briefly explained in Table 4.15.
Table 4.15 Brief description of the options used in the cancel command

Option Description
id It indicates the print job ID that we wish to cancel.
printer_destination It removes all jobs from the specified printer destination.

Examples
(a) $cancel Deskjet1001
This command cancels all print jobs sent for printing at the Deskjet1001 printer.
(b) $cancel 1207
This command cancels the print job with ID 1207.

4.20 UNDERSTANDING .profile FILES


The .profile file exists in our home directory and is the start-up file that is automatically
executed when we log in to the Unix system. The file can be used to customize our
environment, setting PATH variable and terminal type, and also to write the commands and
scripts that we wish to automatically execute when we log in.
Note: The Unix operating system executes several system files including the .profile file before returning
the command prompt to the user.

The most basic variables used in the .profile file to set up an environment for us are as follows:
1. The PATH variable defines the search path to find the commands and applications that we
execute. Through the PATH variable, the commands and scripts can be executed in directories
other than their source directories (directories where the command or script exists).
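2. HOME holds the path of our home directory, that is, the directory from where we begin
our Unix session. Its value is referred to as $HOME.
3. ENV names a file of commands that shells such as the Korn shell execute whenever a new
interactive shell is started.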
We will learn about these variables in detail in Chapter 5.
Using any editor, we can add to the .profile file the commands that we wish to execute
automatically when we log in. Chapter 8 will help you use different editors. A new command
added to .profile will come into effect either when we log out and log in again or when we
run the .profile file in the current shell through the dot (.) command, as follows:
$ . $HOME/.profile
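
As a minimal sketch of what such a start-up file might contain, the following example writes a sample profile to a scratch file and runs it in the current shell with the dot command. The scratch location, the personal bin directory, and the terminal type vt100 are all assumed values chosen for illustration; the real ~/.profile is deliberately left untouched.

```shell
# Write a sample start-up file to a scratch location (not the real ~/.profile).
profile_file=${TMPDIR:-/tmp}/sample_profile.$$
cat > "$profile_file" <<'EOF'
# Add a personal bin directory to the command search path.
PATH=$PATH:$HOME/bin
export PATH

# Set the terminal type; vt100 is only an illustrative value.
TERM=vt100
export TERM
EOF

# Run the file in the current shell so that its settings take effect here.
. "$profile_file"

echo "PATH is now: $PATH"
echo "TERM is now: $TERM"
```

Running the file with the dot command matters: if it were executed as an ordinary script, the settings would apply only to a child shell and be lost when that shell exits.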

4.21 calendar: GETTING REMINDERS


The calendar command reads the calendar file and displays appointments and reminders for
the current day.
Syntax calendar

Example For this command to work, we need to create a file named calendar in our home
directory and write our appointments or reminders in it in the following format.
10/7/2012 Today is Board Meeting
10/8/2012 Visiting Doctor

Now, if today is 7 October 2012, and we execute the calendar command, the line Today is
Board Meeting will appear on the screen.

Note: To avoid executing the calendar command every day, add it at the end of our .profile file that we
just discussed.
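
The date matching that calendar performs can be imitated with grep, which may help in understanding the file format. This is only a sketch under stated assumptions: reminders_for is a hypothetical helper, the date is passed in explicitly rather than read from the system clock, and each entry is assumed to start with its date followed by a space.

```shell
# Create a sample calendar file in a scratch location.
cal_file=${TMPDIR:-/tmp}/calendar.$$
cat > "$cal_file" <<'EOF'
10/7/2012 Today is Board Meeting
10/8/2012 Visiting Doctor
EOF

# reminders_for DATE FILE
# Hypothetical helper: prints the text of entries that start with DATE.
reminders_for() {
    grep "^$1 " "$2" | cut -d' ' -f2-
}

# Reminders for 7 October 2012.
reminders_for 10/7/2012 "$cal_file"
```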

4.22 script: RECORDING SESSIONS


The script command is used for recording our interaction with the Unix system. It runs in
the background recording everything that is displayed on our screen.
Syntax script [-a] filename

The options and arguments used in the command are briefly explained in Table 4.16.

Table 4.16 Brief description of the options used in the script command

Option Description
-a It appends the session into the filename. If this option is not specified, the filename
will be overwritten with the new data.
filename This gives the name of the file where our session will be recorded. If we do not
provide a filename to the script command, it places its output in a default file
named typescript.

Example The following example will begin recording the session in the file transact.txt:
$ script transact.txt

To exit from the scripting session, either press Ctrl-d or write exit on the command prompt
followed by the Enter key.
Figure 4.14(a) shows how the session is recorded in the file transact.txt. The commands
executed, cat, sort, mkdir, rmdir, etc., are recorded into the file transact.txt. To stop
recording, Ctrl-d keys are pressed. To confirm if the session is properly recorded in the file,
we execute the cat command to view the contents of the file transact.txt. Figure 4.14(b)
confirms that the session is correctly recorded in the file transact.txt.

(a)
$ script transact.txt
Script started, file is transact.txt
$ cat bank.lst
101 Aditya 0 14/11/2012 current
102 Anil 10000 20/05/2011 saving
103 Naman 0 20/08/2009 current
104 Rama 10000 15/08/2010 saving
105 Jyotsna 5000 16/06/2012 saving
106 Mukesh 14000 20/12/2009 current
107 Yashasvi 14500 30/11/2011 saving
108 Chirag 0 15/12/2012 current
109 Arya 16000 14/12/2010 current
110 Puneet 130 16/11/2009 saving
$ sort -r bank.lst
110 Puneet 130 16/11/2009 saving
109 Arya 16000 14/12/2010 current
108 Chirag 0 15/12/2012 current
107 Yashasvi 14500 30/11/2011 saving
106 Mukesh 14000 20/12/2009 current
105 Jyotsna 5000 16/06/2012 saving
104 Rama 10000 15/08/2010 saving
103 Naman 0 20/08/2009 current
102 Anil 10000 20/05/2011 saving
101 Aditya 0 14/11/2012 current
$ mkdir projects
$ rmdir projects
$ ^d
$ Script done, file is transact.txt

(b)
$ cat transact.txt
Script started on 21 February 2012 10:24:40 PM IST
$ cat bank.lst
101 Aditya 0 14/11/2012 current
102 Anil 10000 20/05/2011 saving
103 Naman 0 20/08/2009 current
104 Rama 10000 15/08/2010 saving
105 Jyotsna 5000 16/06/2012 saving
106 Mukesh 14000 20/12/2009 current
107 Yashasvi 14500 30/11/2011 saving
108 Chirag 0 15/12/2012 current
109 Arya 16000 14/12/2010 current
110 Puneet 130 16/11/2009 saving
$ sort -r bank.lst
110 Puneet 130 16/11/2009 saving
109 Arya 16000 14/12/2010 current
108 Chirag 0 15/12/2012 current
107 Yashasvi 14500 30/11/2011 saving
106 Mukesh 14000 20/12/2009 current
105 Jyotsna 5000 16/06/2012 saving
104 Rama 10000 15/08/2010 saving
103 Naman 0 20/08/2009 current
102 Anil 10000 20/05/2011 saving
101 Aditya 0 14/11/2012 current
$ mkdir projects
$ rmdir projects
$ Script done on 21 February 2012 10:25:19 PM IST

Fig. 4.14 Recording a session (a) Recording in the file transact.txt (b) Recorded session

4.23 CONVERSIONS BETWEEN DOS AND UNIX


There is a difference in format between Unix and DOS text files. In DOS (or Windows) files,
each line ends with a carriage return followed by a line feed, whereas in Unix files each
line ends with only the line feed character. In other words, the new line marker in DOS
files comprises two characters (carriage return and line feed), whereas in Unix files it
comprises only the line feed character. The following are the two commands used in
conversions between DOS and Unix files:

1. dos2unix: Converts text files from DOS to Unix format


2. unix2dos: Converts text files from Unix format to DOS format

The syntax for the dos2unix command is as follows:

Syntax dos2unix [-b] file1 [file2]

The options and arguments used in this command are briefly explained in Table 4.17.

Table 4.17 Brief description of the options used in the dos2unix command

Option Description
-b It creates a backup of file1 with the name file1.bak before converting it into Unix format.
file1 It is the file in DOS format.
file2 It is the file in which we wish to store the Unix format of the file. If file2 is not used, the original
file, file1, will be converted into the Unix format.

Examples
(a) $ dos2unix a.txt b.txt
This converts the DOS file a.txt into b.txt. The new line characters consisting of the
carriage return and line feed characters in a.txt will be converted to the line feed
character and stored in the file b.txt, which is a file in Unix format.
(b) $ dos2unix -b a.txt
The command converts the DOS file a.txt into the Unix format. The original DOS format
will be backed up and stored as a.txt.bak.

The syntax for the unix2dos command is as follows:

Syntax unix2dos [-b] file1 [file2]

The options and arguments used in this command are briefly explained in Table 4.18.

Table 4.18 Brief description of the options and arguments used in the unix2dos command

Option Description
-b It creates a backup of file1 by name file1.bak before converting it into DOS format.
file1 It is a file in the Unix format.
file2 It is the file in which we wish to store the DOS format of the file. If file2 is not used, then the
original file, file1, will be converted into the DOS format.

Examples
(a) $ unix2dos a.txt b.txt
The command converts the Unix file a.txt into b.txt. The new line characters consisting
of line feed character in a.txt will therefore be converted to a combination of carriage
return and line feed character and stored in file b.txt, which is a file in DOS format.
(b) $ unix2dos -b a.txt
The command converts the Unix file a.txt into DOS format. The original Unix format
will be backed up and stored as a.txt.bak.
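
The effect of both conversions can be seen with standard tools alone, which is useful on systems where dos2unix and unix2dos are not installed. The following is a sketch rather than a replacement for those commands: it uses tr and awk as stand-ins, the file names are assumed, and tr removes every carriage return in the file, not only those at the ends of lines.

```shell
workdir=$(mktemp -d)

# A two-line file in DOS format: each line ends with carriage return + line feed.
printf 'first line\r\nsecond line\r\n' > "$workdir/dos.txt"

# DOS to Unix: delete the carriage returns.
tr -d '\r' < "$workdir/dos.txt" > "$workdir/unix.txt"

# Unix to DOS: write each line back with carriage return + line feed.
awk '{ printf "%s\r\n", $0 }' "$workdir/unix.txt" > "$workdir/dos_again.txt"

# The DOS file is larger by one byte per line.
wc -c "$workdir/dos.txt" "$workdir/unix.txt" "$workdir/dos_again.txt"
```

The byte counts make the difference visible: the Unix copy is two bytes smaller than the DOS original because one carriage return has been removed from each of the two lines.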

4.24 man: DISPLAYING MANUAL


The man command (short for manual) displays the online documentation of the given Unix
command. It is meant for helping the user by providing the usage, syntax, and examples of
the given command.
Syntax man [-] [-k pattern] command

Here, - (hyphen) displays the information without stopping; -k pattern searches the man
pages for all the commands whose entries contain the specified pattern and displays the
list of matching commands.
Example $ man cp

This example displays the manual of the cp command. If the manual consists of several pages,
the first page will be displayed and we can press the spacebar to move on to the next page.
$ man -k backup

$ man -k backup
/usr/dt/man/windex: No such file or directory
/usr/man/windex: No such file or directory
/usr/openwin/share/man/windex: No such file or directory

$ catman -w
/usr/lib/getNAME: gnome-session-save.1 - repeated date

$ man -k backup
asadmin-backup-domain asadmin-backup-domain (1as) - performs a backup on the domain
asadmin-list-backups asadmin-list-backups (1as) - lists all backups and restores
asadmin-restore-domain asadmin-restore-domain (1as) - restores files from backup
backup-domain asadmin-backup-domain (1as) - performs a backup on the domain
list-backups asadmin-list-backups (1as) - lists all backups and restores
nisbackup nisbackup (1m) - backup NIS+ directories
nisrestore nisrestore (1m) - restore NIS+ directory backup
restore-domain asadmin-restore-domain (1as) - restores files from backup
tdbbackup tdbbackup (1m) - tool for backing up and for validating the integrity of samba \&.tdb files

Fig. 4.15 Manual containing the specified pattern

This example searches the documentation in the man pages and displays the list of commands
whose entries contain the pattern backup. In case we get the error that the windex file is
not found, as shown in Fig. 4.15, we need to create the windex index file by giving the
catman -w command. Once the windex file is created, man -k shows the manual entries that
match the desired pattern. Figure 4.15 shows the manual entries having the pattern backup.

4.25 CORRECTING TYPING MISTAKES


While typing commands on the terminal screen, we will inevitably make typing mistakes.
To correct them, the default key combinations listed in Table 4.19 are used.

Table 4.19 Default key combinations

Keys Description
Ctrl-h It erases the character to the left of the cursor.
Ctrl-c The Interrupt key terminates any currently running process and returns to the prompt.
Ctrl-d It represents the exit or end of a transaction. The keys are used to indicate that the entering of text is complete.
Ctrl-j It represents the Enter key.
Ctrl-s It suspends the output temporarily and is usually used to stop the scrolling of screen output.
Ctrl-q Its function is opposite to that of Ctrl-s. It resumes the scrolling of output.
Ctrl-z It temporarily suspends a program and provides another shell prompt. To resume the program, we use the jobs
command to find it and restart it with the fg command.
Ctrl-u It kills the command line, that is, clears the complete line.
Ctrl-\ It terminates the running command and creates a core file containing the memory image of the command.

We can change these default keys for erasing characters and killing a line through the stty
command. The stty command is discussed in detail in Chapter 10.
This chapter dealt with numerous advanced Unix commands. It covered the essential
commands for changing the permissions of files and directories, changing ownership and
groups, sharing files among groups, pipe operators, etc. In addition, the chapter covered
commands such as cut, paste, head, and tail that are used to extract desired regions from
given files. For comparing files, diff, cmp, uniq, and comm commands were discussed.
Commands for printing, measuring the time consumed in running certain commands,
showing calendar, recording sessions, and configuring the environment through .profile
have also been explained in detail.

■ SUMMARY ■

1. There are three classes of system users in Unix: Owner, Group, and Other. The read
permission has a value = 4, the write permission has a value = 2, and the execute
permission has a value = 1.
2. Unix assumes the default permissions of a directory to be 777 and that of a file as 666
and subtracts the permissions specified in the umask command to define their permissions
at the time of their creation.
3. By default, each command takes its input from the standard input and sends the results
to the standard output; however, through I/O redirection, we can change the default
location of input and output.
4. The '>' (greater than) symbol is known as the output redirection operator and we can use
it to divert the output of any command to a file instead of the terminal screen. The
append operator, '>>', is used for appending the output of a command to a file, that is,
without overwriting its older content. To redirect the standard input, we use the input
redirection operator, that is, the '<' (less than) symbol.
5. The pipe operator '|' is used for sending the output of one command as the input to
another command.
6. The difference between the pipe operator and the output redirection operator '>' is
that the output redirection operator '>' is mostly used for sending the output of a
command to a file, whereas the pipe operator is used for sending the output of a command
to another command for further processing.
7. In DOS (or Windows) files, each line ends with a carriage return followed by a line
feed, whereas in Unix files each line ends with only the line feed character.
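
The arithmetic in point 2 can be checked with a short experiment. The sketch below assumes a umask of 022 (a common value, chosen here for illustration) and a scratch directory: a new file should receive 666 - 022 = 644 (rw-r--r--) and a new directory 777 - 022 = 755 (rwxr-xr-x). Strictly speaking, the mask clears permission bits rather than subtracting numbers, but for masks such as 022 the two views give the same result.

```shell
workdir=$(mktemp -d)

# Run in a subshell so that the umask of the current session is not changed.
(
    umask 022
    touch "$workdir/notes.txt"    # file:      666 - 022 = 644
    mkdir "$workdir/reports"      # directory: 777 - 022 = 755
)

# The first column of the long listing shows the resulting permissions.
ls -ld "$workdir/notes.txt" "$workdir/reports"
```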

■ FUNCTION SPECIFICATION ■

Command Function

chmod It changes file/directory permissions.
umask It stands for user file creation mask and sets the default permissions of the files that will be created in the future.
chown It transfers ownership of a file to another user. Once the ownership of a file is transferred to another user, one cannot change its permissions until they become the owner again.
chgrp It changes the group ownership of the file.
groups It displays the groups to which a user belongs.
sort It sorts files either line-wise or on the basis of certain fields.
cut It slices (cuts) a file vertically. Files can be cut on the basis of characters and fields too.
paste It joins content from different files.
wc (word count) It calculates the number of characters, words, and lines in a file.
head It selects the specified number of lines or characters from the beginning of the given file; the default number of lines selected is 10.
tail It selects a specified number of lines or characters from the bottom of the specified file; the default number of lines selected is 10.
pg It displays a long file page-wise, that is, one screen page at a time.
man It displays the online documentation or manual of the given Unix command.
diff It displays the difference between two files in a format that consists of two numbers and a character in between. The number to the left of the character represents the line number in the first file, and the number to the right of the character represents the line number in the second file.
uniq It finds and displays duplicate lines in a file. In addition, it can be used to display only the unique lines in a file. The -u option of the uniq command removes all duplicate lines from a file. The -d option of the uniq command is used to display all duplicate lines in a file.
split It splits a file into pieces of a specified number of lines or bytes.
cmp It compares two files and indicates the line number where the first difference in the files occurs. It does not display anything if the files being compared are exactly the same.
comm It displays or suppresses the content common to two files.
time It displays the time usage by a specific command. The real time is the elapsed time from the invocation of the command till its termination. The user time represents the amount of CPU time that the command takes to execute its own code. The sys time represents the CPU time spent in the kernel on behalf of the command.
lp It stands for line printer and prints files. Using the lp command, we can define the printer we wish to use through the -d option, the number of copies through the -n option, the pages to print through the -P option, and priority through the -q option.
cancel It cancels existing print jobs.
.profile file It exists in the home directory and is the start-up file that automatically executes when we log in to the Unix system. It can be used to customize our environment. It can also be used to write the commands and scripts that we wish to execute automatically when we log in.
calendar It reads the calendar file and displays appointments and reminders for the current day.
script It records our interaction with the Unix system. It runs in the background recording everything that shows up on the screen.
dos2unix It converts text files from a DOS to a Unix format.
unix2dos It converts text files from a Unix format to a DOS format.

■ EXERCISES ■

Objective-type Questions
State True or False
4.1 The three classes of system users that are used in assigning permissions to the files
and directories are Owner, Group, and Family.
4.2 To delete a file, a write permission is not required but an execute permission is required.
4.3 By using the umask command, we can specify the permissions that we want to deny.
4.4 The system-wide default permission for a directory is 666.
4.5 If we transfer the ownership of our file to another person, we can no longer change its
file permissions.
4.6 Either the owner or the super user can change the ownership of a file.
4.7 We can make a group of users share permissions on a given set of files.
4.8 The sort command can sort the file on the basis of a given field in the file.
4.9 We cannot sort a file in the reverse order through the sort command.
4.10 The '<' symbol is the output redirection operator and the '>' symbol is the input
redirection operator.
4.11 The '>>' symbol redirects the output of a command to a file after overwriting its
earlier content.
4.12 Several commands can be attached using the pipe operator.
4.13 The default number of lines into which the split command splits a file is 100 lines.
4.14 The cmp command compares two files and indicates the line number where the first
difference in the file occurs.
4.15 The cmp command displays a message 'exactly same' if the files compared are exactly
the same.
4.16 The comm command either displays or hides the content common to two files.
4.17 The time command displays the system time and even allows it to be modified.
4.18 The real time is the elapsed time from the invocation of the command till its termination.
4.19 The calendar command displays the calendar of a specified month and year.

Fill in the Blanks


4.1 There are three types of system users: owner, group, and ________.
4.2 The command used to change the permission of the file or directory is ________.
4.3 The symbol x in the chmod command represents ________ permission.
4.4 The command used to get the specified number of lines from the beginning of a file is ________.
4.5 The option used with the tail command to skip the specified number of lines is ________.
4.6 The option used with the tail command to get the specified number of characters from
the end of file is ________.
4.7 Unix assumes the default permissions for a file to be ________.
4.8 The input redirection operator is represented as ________.
4.9 To display content one screen page at a time, ________ command is used.
4.10 The ________ command is used for displaying the documentation of a command.
4.11 The diff command displays the differences between two files that are being compared
in a format that consists of two ________ and a ________ in between.
4.12 The ________ command is used for identifying and displaying duplicate lines in a file.
4.13 The ________ option of the uniq command removes all duplicate lines from a file.
4.14 The ________ file can be used to customize our environment.
4.15 For recording our interaction with the Unix system, ________ command is used.
4.16 The ________ option of the lp command is used for defining the destination printer
while the ________ option is used for defining the number of copies to be printed.
4.17 The ________ command is used to display appointments and reminders for the current day.
4.18 The ________ command converts text files from the DOS format to the Unix format.
4.19 The ________ command is used for cancelling a print job.

Multiple-choice Questions
4.1 The command used for setting default permissions of files and directories is
(a) chmod (b) umask (c) default (d) chstat
4.2 The three types of system users are User, Group, and
(a) Other (b) Society (c) Community (d) Everyone
4.3 The option used with the chgrp command to change the group of a symbolic link is
(a) -s (b) -l (c) -g (d) -h
4.4 The command used for comparing two files is
(a) comp (b) compare (c) uniq (d) diff
4.5 The option of the uniq command that removes all duplicate lines is
(a) -d (b) -u (c) -r (d) -m
4.6 The command used to change the group of a file is
(a) groups (b) chmod (c) chgrp (d) ls -g
4.7 The statement $chown :accounts a.txt will change
(a) group of the file
(b) owner of the file
(c) nothing
(d) owner and group of the file
4.8 The option used with the sort command to remove duplicate lines in a sorted output is
(a) -d (b) -q (c) -u (d) -n
4.9 The option used with the tail command that selects and displays lines in reverse order
from the bottom to the top is
(a) -t (b) -r (c) -b (d) -c
4.10 The statement $ head -c 10 a.txt b.txt displays
(a) the first 10 lines of a.txt file only
(b) the first 10 lines of a.txt and b.txt files
(c) the first 10 characters of a.txt file only
(d) the first 10 characters of a.txt and b.txt files

Programming Exercises
4.1 What will the following commands do?
(a) $chmod 410 management.txt
(b) $umask 233
(c) $chgrp jobs mbacourse.txt
(d) $head -c 100 mbacourse.txt management.txt
(e) $tail -2 management.txt
(f) $man -K disk
(g) $cut -d"," -f3 bank.lst
(h) $paste -d"<>" names.txt numbers.txt
(i) $sort a.txt > b.txt
(j) $ split -5 numbers.txt temp
(k) $ cmp -s a.txt b.txt 3 5
(l) $ time ls | sort | lp
(m) $ lp -d Epson100 -P 10-15, 20 a.txt
(n) $ comm a.txt b.txt
4.2 Write the command for the following tasks:
(a) To assign read, write, and execute permissions to the owner; read and write permission
to the group; and only read permission to others for the file mbacourse.txt
(b) To set permissions for the directories to be created in the future as read, write, and
execute for the owner; read and write for the group; and only read for others
(c) To change the ownership of the file mbacourse.txt to charles
(d) To display the first two lines of the files mbacourse.txt and management.txt
(e) To display lines starting from the fifth till the end of the file in mbacourse.txt
(f) To show the content of the file finance.txt located in accounts directory page-wise
(g) To sort the file a.txt in reverse order and store it in file b.txt
(h) To cut the first and third fields of the file letter.txt that is delimited by a tab space
(i) To create a group by the following name: latestprojects
(j) To compare two files, accounts.txt and finance.txt, and show the changes that need to
be made in the file accounts.txt to make it similar to finance.txt
(k) To display all duplicate lines in the file accounts.txt
(l) To remove all duplicate lines in the file accounts.txt and save it in another file
correct.txt
(m) To split a file accounts.txt into the files accountaa, accountab, accountac, and so
on, each consisting of 20 bytes
(n) To compare two files, a.txt and b.txt, and display the first character that is
different in the two files

Review Questions
4.1 Explain the following commands with syntax and examples.
(a) pg (b) wc (c) dos2unix (d) tail
4.2 (a) What is the difference between the chown and chgrp commands?
(b) What is the difference between the cmp and diff commands?
4.3 (a) Explain the different options used in the lp command while printing a file.
(b) Explain how a file is sorted.
4.4 What is the difference between the following pairs of commands that are used for
extracting content from the files?
(a) cut and split commands
(b) head and tail commands
4.5 Briefly explain how the file access permissions are handled in the Unix operating system.

Brain Teasers
4.1 Suppose you want to assign read, write, and execute permissions to the user, that is,
the owner of the file a.txt using the following command. What is wrong with the following
command? Correct the mistake.
$ chmod o=rwx a.txt
4.2 Correct the following command to change the owner and group of the file a.txt to user
chirag and accounts respectively.
$ chown accounts:chirag a.txt
4.3 Correct the mistake in the following command in order to change the group of the
symbolic file b.txt to accounts.
$ chgrp accounts b.txt
4.4 Can you sort the file a.txt on the second and third field skipping the first field? How?
4.5 The following command overwrites the content of the file a.txt. What command will you
use to avoid the accidental overwriting of an existing file?
$ ls > a.txt
4.6 Correct the mistake in the following command to cut the first and third fields of the
file a.txt delimited by the '|' symbol.
$ cut -f1,3 a.txt
4.7 Is there a way to split a file a.txt into pieces that are 10 kB each? If yes, what is that?
4.8 When you compare two files, a.txt and b.txt, with the cmp command, no output appears on
the screen. What does this mean?
4.9 Correct the mistake in the following command to suppress the display of the content
that is common in the files a.txt and b.txt.
$ comm -1 a.txt b.txt
4.10 Correct the mistake in the following command in order to print two copies of the file
a.txt.
$ lp a.txt -q 2
4.11 What will happen if you add the calendar command in the .profile file?
4.12 Correct the mistake in the following command to extract line numbers 10 to 15 from the
file a.txt.
$ head -10 a.txt | tail +15

■ ANSWERS TO OBJECTIVE-TYPE QUESTIONS ■


State True or False
4.1 False 4.2 False 4.3 True 4.4 False 4.5 True
4.6 True 4.7 True 4.8 True 4.9 False 4.10 False
4.11 False 4.12 True 4.13 False 4.14 True 4.15 False
4.16 True 4.17 False 4.18 True 4.19 False

Fill in the Blanks
4.1 Other 4.2 chmod 4.3 execute 4.4 head 4.5 +n
4.6 -c 4.7 666 4.8 < 4.9 pg 4.10 man
4.11 numbers, character 4.12 uniq 4.13 -u 4.14 .profile 4.15 script
4.16 -d, -n 4.17 calendar 4.18 dos2unix 4.19 cancel

Multiple-choice Questions
4.1 (b) 4.2 (a) 4.3 (d) 4.4 (d) 4.5 (b)
4.6 (c) 4.7 (a) 4.8 (c) 4.9 (b) 4.10 (d)
CHAPTER 5
File Management and Compression Techniques

After studying this chapter, the reader will be conversant with the following:
• The types of devices, role of device drivers, and the way in which devices
are represented in the Unix operating system
• Using disk-related commands for copying disks, formatting disks, finding
disk usage, finding free disk space, and dividing the disk into partitions
• Compressing and uncompressing files using different commands such as
gzip, gunzip, zip, compress, uncompress, pack, unpack, bzip2,
bunzip2, and 7-zip
• The types of files, locating files, searching for files with a specific string, and
finding utility on a disk
• Checking a file system for corruption
• Important files of the Unix system, where and how passwords are kept,
where the list of hosts is kept, and how to allow or deny any user from
accessing certain resources

5.1 MANAGING AND COMPRESSING FILES


File management deals with the different types of files that are managed in the Unix system.
It helps one understand the various ways of searching for the desired files, repairing the
file system, and the important files of the Unix system that manage user passwords, store
addresses of hosts, and the list of users that are allowed or denied access to the system.
Since files are stored on disks, different disk-related commands have also been referred to.
Compression techniques encompass the various methods of compressing and uncompressing
files. Compressing files is an effective way to reduce disk usage. Moreover, compressed
files are easy to manage, that is, we can back up and restore them easily.
We will see the pros and cons of different commands and the extent of compression they
carry out. In this chapter, we will cover the following topics:
1. Dealing with devices
2. Device drivers
3. Block and character devices


4. Major and minor numbers
5. Disk-related commands: dd, du, df, dfspace, and fdisk
6. Compressing and uncompressing files: gzip, gunzip, zip, unzip, compress, uncompress,
pack, unpack, bzip2, bunzip2, and 7-zip
7. Dealing with files: file, find, which, locate, and fsck (file system check utility)
8. Important files of the Unix system: /etc/passwd, /etc/shadow, /etc/hosts,
/etc/hosts.allow, and /etc/hosts.deny
9. Shell variables:
(a) User-created shell variables
(b) System shell variables: CDPATH, HOME, PATH, Primary Prompt (PS1 Prompt), and TERM
(c) Local and global shell variables: export
Disks are considered to be the essential devices of a computer. Let us first understand what
the different devices are and how they are dealt with in the Unix system before discussing
the different disks and file management commands.

5.2 COMPUTER DEVICES


While working on a computer, we deal with various peripherals such as hard drives, floppy
and CD-ROM drives, audio and video cards, and serial and parallel ports. These peripherals
are also known as devices. These devices combine to make the computer the system it is. In
Unix, all devices are considered to be files, which are also known as device files. We learnt
in Chapter 1 that there are several categories of files, namely ordinary files, directory files,
device files, symbolic links, pipes, and sockets. The question that arises here is how the
different categories of files can be differentiated. The answer lies in the long listing command.
On executing the long listing command, ls -al, the list of files and directories that is
displayed as a result helps us distinguish the different categories of files. The mode field, the
first character in the listing, indicates what type of file it represents. The first character in the
listing may be either a hyphen (-) or one of the following letters: l, c, b, p, s, or d. Table 5.1
explains the different characters that may be displayed in the mode field (of the long listing)
and the type of file represented by it.
Table 5.1 Characters used in the long listing command
Character   File type
-           Regular file
l           Symbolic link
c           Character special
b           Block special
p           Named pipe
s           Socket
d           Directory file

Let us observe the output of the long listing command, which is as follows:
$ ls -al
-rwxrwxrwx 1 chirag it 344 Dec 2 09:20 letter.txt
drwxrwxrwx 1 chirag it 10 Oct 12 10:45 projects
lrwxrwxrwx 2 chirag it 669 Feb 8 03:15 xyz.txt
prw-r--r-- 1 root root 0 Apr 15 05:20 pipe
srwx------ 1 root root 0 May 12 12:30 log
crw------- 12 bin 6, 0 Dec 5 09:11 lp0
brw-rw-rw- 1 root 51, 0 Jul 31 07:28 cd0
The output of this long listing command shows different types of files
and directories, which are as follows:
Regular file The file letter.txt that is represented by a hyphen (-) in the mode field is a
regular file. This is the simplest and most common type of file in the Unix system. It is just
a collection of bytes.
Directory The file projects that is represented by character d in the mode field is a
directory, that is, a container for files and other directories.
Symbolic link The file xyz.txt that is represented by character l in the mode field is a
symbolic link that refers to other file(s) of the file system.
Named pipe The file pipe that is represented by character p in the mode field is a named
pipe and is used in interprocess communication, that is, sending the output of one process as
input to another process.
Socket The file log that is represented by character s in the mode field is a socket and is a
special file used for advanced interprocess communication.
Special device file The files lp0 and cd0 represented by the characters c and b in the mode
field are special device files. They may be either character or block device files.
Now we can understand how the device files can be recognized through the long listing of
files and directories.
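The mode-field check described above can be tried safely in a scratch directory. The following is a minimal sketch, assuming a shell in which mktemp and mkfifo are available; the helper name file_type_char and the scratch directory are illustrative and are not part of the listing shown earlier.

```shell
#!/bin/sh
# Hypothetical helper: print the first character of the mode field for one path.
file_type_char() {
    ls -ld "$1" | cut -c1
}

# Build a scratch directory holding one file of each easily created type.
workdir=$(mktemp -d)
echo "hello" > "$workdir/letter.txt"      # regular file
mkdir "$workdir/projects"                 # directory
ln -s letter.txt "$workdir/xyz.txt"       # symbolic link
mkfifo "$workdir/pipe"                    # named pipe

# Show the type character next to each name.
for name in letter.txt projects xyz.txt pipe; do
    echo "$(file_type_char "$workdir/$name") $name"
done
# The scratch directory can be removed afterwards with: rm -rf "$workdir"
```

Sockets and device files are left out of the sketch because creating them needs a running server process or superuser rights.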
Next, we will see how the Unix operating system manages and deals with all the devices
of a computer system.

5.2.1 Dealing with Devices


As mentioned in Section 5.2, all devices are represented as files in the Unix operating system.
Like files, we can open and read a device, write into it, and then close it. The functions for
opening, reading, and writing into a device are built into the kernel for every device of
the system. These functions or routines for specific devices are known as the device drivers.
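Because devices appear as files, ordinary redirection and commands work on them. The following is a small sketch, assuming that the pseudo-devices /dev/null and /dev/zero exist (they do on most Unix-like systems, although they are not among the devices discussed in this chapter); the variable name bytes_read is illustrative.

```shell
#!/bin/sh
# Writing to a device file: /dev/null accepts the data and discards it.
echo "this text is discarded" > /dev/null

# Reading from a device file: take 16 bytes from /dev/zero and count them.
bytes_read=$(dd if=/dev/zero bs=16 count=1 2>/dev/null | wc -c | tr -d ' ')
echo "read $bytes_read bytes from /dev/zero"
```

In both cases the shell simply opens a file; it is the driver behind the device file that decides what a read or a write actually does.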
Although the terms device driver and device files appear to be similar, they are totally
different. A device file is the representation of a device on the file system hierarchy. It is
basically a special type of file that points to an inode that contains information about the
device that it actually represents. The information in the inode includes major and minor
device numbers where the major device number defines the type of device and the minor
device number identifies a particular device in that type.
On the other hand, a device driver is a program that establishes communication between
the computer and the device by translating the calls given by the user into calls that the
device understands. It hides the inner complexities of how a device works. The commands
given by the user to operate a device are passed to the device driver in the form of calls. The
device driver then maps those calls to device-specific operations.
In order to access a particular device, the kernel calls its device driver. The kernel must
not only know the type of device but also certain details about that device such as the density
of a floppy or the partition of the disk for using the device efficiently.
All device files are stored in /dev or in its subdirectories and can be listed by executing
the following command:
$ ls -l /dev
brw-rw-rw- 1 root 51, 0 Jul 31 07:28 cd0
brw-rw-rw- 1 bin 2, 48 Oct 22 12:10 fd0135ds18
brw-rw-rw- 1 bin 2, 42 Nov 30 19:44 fd196ds15
crw------- 1 bin 6, 0 Dec 5 09:11 lp0
cr--r--r-- 1 root 50, 0 Mar 31 06:15 rcdt0
crw-rw-rw- 1 bin 2, 48 Jun 22 11:25 rfd0135ds15
… … … … … … … …
… … … … … … …
The files that we see in this listing are not device drivers but are just pointers to where the
driver code can be found in the kernel. In the mode field of the file permissions, we can see
that there is a character c or b. The character c represents a character device whereas the
character b represents a block device. Following the mode field are the permissions, the
links count, and finally the owner of the file. After the owner, we see two numbers that are
separated by a comma (,). These two numbers refer to the major and minor device numbers
respectively. The major device number refers to the device type and the minor number refers
to different instances of the device. For example, two floppy disk drives (second and third
rows in the aforementioned listing) can have the same major number (2), but different minor
numbers as one represents the 1.2 MB and the other represents the 1.44 MB floppy disk
drive. Following the major and minor numbers are the date and time of last modification.
The last column displays the device filenames.
When we provide commands to operate a device, the system uses the major and minor
numbers of the device file to identify the device and henceforth, determines the device driver
that will be used to communicate with the device.
The device driver simplifies the input/output (I/O) tasks performed with the respective
devices. Hardware devices such as printers, disk controllers, network devices, and serial ports
are attached with a device driver that enables the kernel to communicate with them, and hence
get the desired task performed. Device drivers drive the device to perform
according to the requests received by the kernel, as shown in Fig. 5.1.

User -> Kernel -> Device driver -> Hardware
Fig. 5.1 Interactions between user, kernel, device driver, and hardware

A device driver has the following uses:
1. It connects and communicates with the hardware device.
2. It is the software that operates the device controller.
3. It resides within the Unix kernel and provides an interface to hardware devices.
The two general kinds of device files in the Unix-like operating systems
are character special files and block special files. The difference between
them lies in how data is written into them and read from them, and how
it is processed by the operating system and hardware. These together
can be called device special files, in contrast to named pipes, which
are not connected to a device but are not ordinary files either. A brief
introduction of block and character devices is given in Section 5.2.2.

5.2.2 Block and Character Devices


Block special files represent the devices that move data in the form of blocks. When such a
device file is accessed for reading or writing data, the kernel provides the address of a kernel
buffer (i.e., buffer cache) that can be used for data transmission to the device driver. Hence,
while reading a block device, the data is first read in the block and then written into the buffer
cache, so that when the same data is again required, it is read from the buffer cache instead of
being read from the device. Similarly, while writing on a block device, the data is first stored
in the buffer cache before writing on the device. The block devices enable random access.
In other words, the data can be accessed from these devices in a random order. Examples of
block devices include hard disks, CD-ROM drives, and flash drives.
The character devices (or raw devices) are those that can be accessed directly bypassing
the operating system’s buffer caches. This means that the data is read or written into these
devices directly without being stored in the buffer cache. In addition, the name ‘character
device’ itself signifies that the data from such a device is accessed one character at a time.
Data is accessed from a character device sequentially (not randomly) in the form of a stream
of characters. Examples of character devices include serial port, mouse, keyboard, virtual
terminal, and printer.
In the long listing, the first character in the mode field is c or b. Refer to the long listing
shown in Section 5.2.1, where the floppy drive, CD-ROM, and the hard disk have b prefixed to
their permissions confirming that they are block devices. Similarly, printers, raw floppy drives,
and tape drives have c prefixed to their permissions, which confirms that they are raw devices.
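The same distinction can be checked from a script without reading the long listing, because the standard test command has -b and -c operators for block and character special files. The following is a minimal sketch; the function name classify_device is illustrative, and /dev/null is assumed to be present as a character device.

```shell
#!/bin/sh
# Hypothetical helper: report whether a path is a block or character device.
classify_device() {
    if [ -b "$1" ]; then
        echo "block"
    elif [ -c "$1" ]; then
        echo "character"
    else
        echo "other"
    fi
}

classify_device /dev/null    # a character device on typical systems
classify_device /            # a directory, so neither kind of device
```

A disk device file such as those named in Section 5.2.1 could be passed in the same way on a system where it exists.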

5.2.3 Major and Minor Numbers


Devices are grouped into classes, and each class is identified by a major device number. For
instance, all small computer system interface (SCSI) disks have major number 8, floppy disks
have major number 2, and so on. Further, each individual device has a minor device number too.
For example, the device /dev/sda has minor device number 0 and /dev/fd0135ds18 has minor
number 48. The major number helps the kernel in recognizing the device category and the
minor number makes the recognition more precise. Thus, the major number 8 informs
the kernel that the device is a SCSI disk, and the minor number 0 informs that it is the first
disk drive. Similarly, the major number 2 informs the kernel that the device is a floppy disk
drive and the minor number 48 informs that the device is the first drive, A:. Hence, both the
major and minor device numbers collectively identify the device to the kernel.
In the output of the long listing command, ls -al, the column that would otherwise show the
file size displays a pair of numbers separated by a comma. These numbers are the major and
minor device numbers. As the major number represents the type of device, we can see in the
output that all floppy disk drives have the same major number 2. The minor number indicates
the special characteristics of the device to recognize it precisely. For example, fd0135ds18
and fd196ds15 represent two floppy disk drives; hence both of them have the same major number
(2), but different minor numbers (48 and 42) to distinguish that one is floppy disk drive A:
and the other is floppy disk drive B: respectively.
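The two numbers can also be pulled out of the long listing with a short script. The following is a sketch only, assuming that ls -l prints both an owner and a group column (so that the major and minor numbers are the fifth and sixth fields) and that /dev/null exists; the helper name device_numbers is illustrative.

```shell
#!/bin/sh
# Hypothetical helper: print "major minor" for a device file.
# Assumes the ls -l layout: mode, links, owner, group, major, minor, date, name.
device_numbers() {
    ls -l "$1" | awk '{ sub(",", "", $5); print $5, $6 }'
}

ls -l /dev/null            # full long listing, for comparison
device_numbers /dev/null   # just the two numbers
```

On a system whose ls omits the group column, as in the listings of this chapter, the field numbers in the awk program would have to be lowered by one.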
Note: Taking backup is an essential task in an operating system. A backup helps in restoring the data in case
of any disk failure or system crash. The commands that we are going to learn in Section 5.3 are concerned with
formatting disks, backing up data, restoring, etc.

5.3 DISK-RELATED COMMANDS


In this section, we will focus on different disk-related commands. We will learn about
the commands that are used for copying data from one disk to another, formatting disks,
displaying usage of disk space, that is, the space used by different files and directories of
the disk, the amount of free disk space in all the file systems in our machines, the amount of
free disk space in terms of megabytes (MB) and percentage, and dividing the disk drive into
different partitions. Let us see how disks are copied.

5.3.1 dd: Copying Disks


The dd (data dump) command is used for copying data from one medium to another. It reads
and writes data in block-sized chunks, where the default size of the block is 512 bytes.
Syntax dd if=INPUT-FILE-NAME of=OUTPUT-FILE-NAME [options]

Here, if represents input and of represents output.


Examples
(a) To back up a hard disk to a file, type the following command.
dd if=/dev/hda of=/file.dd
It copies the entire disk, hda, to the file file.dd.
(b) To back up a hard disk to another disk, type the following command.
dd if=/dev/hda of=/dev/hdb
It copies the entire disk, hda, to another disk, hdb.
Note: The output from dd can be a new file or another storage device.
Table 5.2 shows the common options used with the dd command.
Table 5.2 Options used with the dd command
Options        Description
bs=n           Sets the block size to n bytes
count=n        Copies n blocks and then stops
skip=n         Copies after skipping n blocks
conv=noerror   Prevents dd from stopping on encountering an error
conv=sync      Pads the input block with null bytes to make it equal to the block size

Examples
(a) dd if=/dev/hda of=/dev/hdb conv=noerror,sync
Here, hda is the source disk, hdb is the destination disk, noerror is for continuing
the copy operation even if there are read errors, and sync pads each short input block
with null bytes so that it equals the block size.
(b) dd if=/dev/hda count=1 of=file.dd
It copies just one sector of the disk hda.
(c) dd if=/dev/hda skip=1 count=1 of=file.dd
It skips the first sector and copies just the second sector of the disk hda.
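The bs, count, and skip options can be practised on an ordinary file, which avoids touching a real disk. The following is a minimal sketch, assuming mktemp is available; the file names source.img, first.dd, and second.dd are illustrative.

```shell
#!/bin/sh
workdir=$(mktemp -d)
src="$workdir/source.img"

# Build a 4-block source file; each 512-byte block is filled with one letter.
for letter in A B C D; do
    dd if=/dev/zero bs=512 count=1 2>/dev/null | tr '\0' "$letter"
done > "$src"

# count=1: copy only the first block.
dd if="$src" of="$workdir/first.dd" bs=512 count=1 2>/dev/null
# skip=1 count=1: skip the first block and copy only the second.
dd if="$src" of="$workdir/second.dd" bs=512 skip=1 count=1 2>/dev/null

ls -l "$workdir"
```

Because every block holds a different letter, viewing first.dd and second.dd shows at once which block each command copied.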
Note: When copying a whole disk, dd reads the device file directly rather than going through the file system.

5.3.2 du: Disk Usage


This utility is used to get complete information about the usage of disk space by each file and
directory of the system. If we specify a directory name along with the du utility, we get the list of
disk space consumed by the directory and all of its subdirectories.
Syntax du [options] directories

Table 5.3 shows the common options used with the du command.
Table 5.3 Options used with the du command
Options   Description
-k        Displays the usage in units of 1024 bytes, rather than the default 512 bytes
-a        Displays the blocks used by each file
-s        Displays the summary (total) for each of the specified files

Examples
(a) By default, the du command without any options displays the directories (in the
current directory) and the blocks consumed by each of those directories.
$ du .
2 ./.snap
31068 ./bin
96 ./include/altq
68 ./include/arpa
128 ./include/bsm

Here, du reports the number of blocks used by the current directory (denoted by .) and
those used by subdirectories within the current directory.
(b) The number of blocks used by the etc directory and its subdirectories is displayed
using the following command.
$ du /etc
54 /etc/defaults
2 /etc/X11
8 /etc/bluetooth
4 /etc/devd
These blocks (to the left of each directory) are 512 bytes in size.
(c) To ascertain the blocks (that are 1024 bytes in size) that are used by the subdirectories
in the etc directory, we will use the following command.
$ du -k /etc
27 /etc/defaults
1 /etc/X11
4 /etc/bluetooth
2 /etc/devd

The blocks shown in this output are 1024 bytes in size.


(d) To find the usage of every file, we can use the -a option, which displays all the
files and the blocks used by each file.
$ du -a
2 ./.snap
82 ./bin/ctfconvert
20 ./bin/ctfdump
56 ./bin/ctfmerge
18 ./bin/sgsmsg
(e) If we want to view only the total number of blocks occupied by a specific directory, we
have to use the summary (-s) option. The following example displays the total number
of blocks used by the current directory.
$ du -s
3616480
(f) To ascertain the number of blocks used by specific files, we can use the following
command.
$ du -s *.txt
10 abc.txt
7 pqr.txt
11 xyz.txt

This output shows the number of blocks used by the different files with extension .txt.
Note: The du command displays information in terms of 512-byte blocks independent of the actual disk block size.
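The three options can be compared side by side on a small scratch directory. The following is a sketch, assuming mktemp is available; the directory and file names are illustrative, and the block counts printed will depend on the file system in use.

```shell
#!/bin/sh
workdir=$(mktemp -d)
mkdir "$workdir/docs"
# Two small files so that du has something to report.
dd if=/dev/zero of="$workdir/docs/abc.txt" bs=1024 count=8 2>/dev/null
echo "short note" > "$workdir/docs/pqr.txt"

du -k "$workdir"      # one line per directory, sizes in 1024-byte units
du -ak "$workdir"     # -a adds one line per file
du -sk "$workdir"     # -s prints only the total for the named directory
```

Here -k is combined with each of the other options so that all three listings use the same unit.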

5.3.3 df: Reporting Free and Available Space on File Systems


This command reports the free disk space for all the file systems installed on our machines
in terms of disk blocks. The command displays the capacity of each file system, the space in
use, the free space, and the number of free files.
Syntax df [-options][filesystem]

Table 5.4 shows the common options used with the df command.

Table 5.4 Options used with the df command
Options   Description
-h        Displays the size in human readable formats (KB, MB, and GB)
-e        Displays only the number of files free
-k        Displays the size in terms of blocks where a block is of 1 KB

Examples
(a) If we want to have information regarding the free and available disk space of a particular
file system, we can mention it in the df command. Furthermore, we can use the df
command without any option or file system (as shown here) in order to obtain information
about all the file systems installed on our machines.
$ df
Filesystem 1K-blocks Used Avail Capacity Mounted on
/dev/ad0s1a 507630 165380 301640 35% /
devfs 1 1 0 100% /dev
/dev/ad0s1e 507630 12 467008 0% /tmp
/dev/ad0s1f 73138272 3616480 63670732 5% /usr
/dev/ad0s1d 1185230 2050 1088362 0% /var

The first column displays the different partitions on the disk of our system. The
second column displays the size of the partitions in terms of blocks of size 1 KB.
For example, the first partition, represented by ad0s1a, is of size 507630 KB
(about 507 MB). Out of the 507630 KB, 165380 KB is used up and 301640 KB is available,
as represented by the third and fourth columns respectively. The fifth column shows
the used (consumed) percentage of the disk. The last column indicates where the
partition is connected to the Unix file system. For example, the partition ad0s1a
(shown in the first row) is the root partition and hence is shown as mounted on /.
(b) To know the amount of free space in a particular partition, we can specify that while
giving the df command. For example, in order to know the amount of free disk space in
the root partition, we need to give the following command.
$ df /
Filesystem 1K-blocks Used Avail Capacity Mounted on
/dev/ad0s1a 507630 165380 301640 35% /

This output shows the total size of the root partition in terms of KB, the amount of used
space, free space, and percentage of disk space used.
(c) To make the sizes of the partitions easier to read, we can use the -h option, which
displays the sizes in human readable form.
$ df -h
Filesystem Size Used Avail Capacity Mounted on
/dev/ad0s1a 506 MB 164 MB 300 MB 35% /
devfs 1 KB 1 KB 0 100% /dev
/dev/ad0s1e 506 MB 0 506 MB 0% /tmp
/dev/ad0s1f 71 GB 5.4 GB 62 GB 5% /usr
/dev/ad0s1d 1 GB 1.8 MB 1 GB 0% /var

In this output, the sizes are displayed in KB, MB, or GB as appropriate; a value in
megabytes is computed by dividing the size in KB by 1024.
(d) The option -k of the df command displays the size of the file systems in kilobytes as
shown in the following example.
$ df -k
Filesystem KBytes Used Avail Capacity Mounted on
/dev/ad0s1a 518144 167936 307200 35% /
devfs 1 1 0 100% /dev
/dev/ad0s1e 518144 0 518144 0% /tmp
/dev/ad0s1f 74448896 5662310 65011712 5% /usr
/dev/ad0s1d 1048576 18432 1030144 0% /var
(e) The option -e of the df command displays the number of files that are free on the file
systems as shown in the following example.
$ df -e
Filesystem ifree
/dev/ad0s1a 34596
devfs 0
/dev/ad0s1e 7483620
/dev/ad0s1f 8402
/dev/ad0s1d 56129

This output shows the numbers of files free on each of the file systems.
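The Capacity column in the first df listing can be reproduced by hand. In that listing the percentage agrees with used / (used + available) rather than used / total size, because part of each file system is not available to ordinary users. The following is a sketch under that assumption; the helper name percent_used is illustrative, and it rounds to the nearest whole number, which may differ from the rounding used by a particular df implementation.

```shell
#!/bin/sh
# Hypothetical helper: percentage used, rounded to the nearest whole number.
percent_used() {
    awk -v used="$1" -v avail="$2" \
        'BEGIN { printf "%d\n", used * 100 / (used + avail) + 0.5 }'
}

percent_used 165380 301640       # Used and Avail figures for /dev/ad0s1a
percent_used 3616480 63670732    # Used and Avail figures for /dev/ad0s1f

# Live figures for the root file system; -P asks for one line per file system.
df -kP /
```

With the figures from the listing, the two calls give 35 and 5, matching the Capacity values shown for / and /usr.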
5.3.4 dfspace: Reporting Free Space on File Systems


The dfspace command is specific to the SCO Unix system. It works in a manner similar to
the df command and presents information regarding free space on file systems on our disks
in a more readable format, that is, it reports the free disk space in terms of megabytes and
percentage of the total disk space.
Note: This command will work with SCO Unix and not on Oracle Solaris 10, which this book focuses on.

Syntax dfspace [file system]

Here, the file system is used to find out the free disk space available on it. If the file system
is not specified, all file systems on the disk are displayed along with the information on
available disk space on each of them.
Example $ /etc/dfspace
/ : Disk Space: 6.32 MB of 137.74 MB available (4.59 %)
Total disk Space: 10.50 MB of 200 MB available (3.89%)
In the aforementioned example, we have written /etc/dfspace instead of dfspace, because
the dfspace command exists in the etc directory. The output reports free disk space for the
root file system. If there had been other file systems installed, their free space would have
also been reported. It also reports the total disk space available.
It is to be noted that the df and dfspace commands report the disk space available in the file
system as a whole, whereas du reports the disk space used by specified files and directories.

5.3.5 fdisk: Dividing Disks into Partitions


Dividing the hard disk into one or more logical disks is called partitioning. The partitions,
the divisions of the disk, are described in the partition table found in sector 0 of the disk.
Note that fdisk in Linux only edits the partition table; a file system must then be created
on each new partition with a separate utility such as mkfs.
A large disk drive is partitioned into smaller segments to increase system performance,
since searching for or interacting with a file in a smaller disk drive segment can be
faster than in a larger one. There are two types of partitions: primary partition
and extended partition. A hard drive can contain up to four primary partitions. A primary
partition is necessary to make the drive bootable, since an operating system is installed
in it. It is not used for data storage. Multiple primary partitions are created to make a
multiboot system. For a single boot system, one primary partition is sufficient. In order to
overcome the limitation of having a maximum of four primary partitions on a drive, we make
use of the extended partition. An extended partition is the only kind of partition that can
have multiple partitions inside. The partitions created inside the extended partition are
known as logical drives. An extended partition acts as a container for the logical drives. It
cannot hold any data until a logical drive has been created in it. We can create as many
logical drives as we want on an extended partition.
Note: On an IDE drive, the first drive is called hda, and the partitions are shown as hda1, hda2, etc.
The second drive is called hdb, and the partitions are shown as hdb1, hdb2, etc. On an SCSI drive, the first
drive is called sda, and the partitions are sda1, sda2, and so on. The second drive is called sdb and the
partitions are sdb1, sdb2, etc.
The fdisk command is used to create, delete, and activate partitions.


Syntax fdisk [-l] [-u] [-b sector_size] [-v] [device] [-s partition]

Table 5.5 shows the aforementioned options.


Table 5.5 Options used with the fdisk command
Option Description
-l It lists the partition tables for the specified device. The device is usually one of the following:
/dev/hda
/dev/hdb
/dev/sda
/dev/sdb
-u It displays the sizes in terms of sectors instead of cylinders while listing partition tables.
-b sector_size It specifies sector size of the disk (valid values are 512, 1024, or 2048).
-s partition It displays the size of the specified partition in blocks.
-v It prints the version number of the fdisk command.

When the fdisk command is active, it displays a menu of options that we can
use to create, list, display, and delete partitions. Table 5.6 gives the menu
options of the fdisk command.

Table 5.6 Menu options of the fdisk command
Option   Description
d        Deletes a partition
l        Lists the known partition types
m        Displays this menu
n        Creates a new partition
p        Prints the partition table
q        Quits without saving changes
w        Writes the partition table to the disk and exits

We can create a primary partition with one file system on it, or an extended
partition with multiple logical drives in the partition.
Example $ fdisk -l

This command lists the partition information of the disk drive on our computer system as
given here:
Disk /dev/hda: 64 heads, 63 sectors, 1023 cylinders
Units = cylinders of 4032 * 512 bytes

Device Boot Begin Start End Blocks Id System


/dev/hda1 636 636 902 538272 64 Linux native
/dev/hda2 903 903 1024 245952 8 Extended
/dev/hda3 229 229 635 819189 5 Linux
/dev/hda4 903 903 1024 245920+ 4 Linux swap

The first hard disk as a whole is represented as /dev/hda, while individual partitions in this
disk take on names hda1, hda2, and so forth. hda1 here is a primary partition, hda2 is an
extended partition containing a logical partition hda4. The active partition is indicated by
an * in the second column. The second hard disk will have the name /dev/hdb with similar
numeric extensions.
5.4 COMPRESSING AND UNCOMPRESSING FILES


In this section, we will learn about the different ways of compressing and uncompressing
files, using commands such as gzip, zip, compress, pack, and bzip2 for compression and
gunzip, unzip, uncompress, unpack, and bunzip2 for decompression, along with their syntax
and examples. We begin with the gzip command in Section 5.4.1.

5.4.1 gzip Command


The gzip command compresses the specified file and replaces it with the .gz extension file,
that is, the original file is deleted and is replaced by the compressed version having the same
primary name (as that of the original file) and the secondary name as .gz.
Syntax gzip [-d][-l][-f][-c] file_name

Table 5.7 shows the aforementioned options.


Table 5.7 Options used with the gzip command
Option Description
-d It decompresses the specified file.
-l It lists the information of each compressed file. The information includes compressed size,
uncompressed size, compression ratio, and name of the uncompressed file.
-f It means force compression or decompression. This option performs the operation without giving
a confirmation message and overwrites the existing file, if the corresponding file already exists.
-c It writes the compressed output to the standard output (the screen, unless redirected), keeping
the original file unchanged. If there are several input files, the output contains one compressed
member for each input file.
file_name It refers to the filename that we wish to compress.

Figure 5.2 shows two files names.txt and numbers.txt with the initial content that we wish
to compress.
Examples
(a) $ gzip -c names.txt
This command does not compress the file names.txt, but displays the compressed output
on the screen (refer to Fig. 5.2).
(b) $ gzip names.txt
The file names.txt is compressed and renamed names.txt.gz and is confirmed using the
ls command (refer to Fig. 5.2).
(c) $ cat names.txt.gz
The command shows the compressed content of the file names.txt. We can see (refer to
Fig. 5.2) that the output displayed using the -c option matches the output of this example.
(d) $ gzip numbers.txt
The file numbers.txt is also compressed into the file numbers.txt.gz and is confirmed
using the ls command, which is shown in Fig. 5.2.
(e) $ gzip -l *.gz
It lists the information of compressed files, names.txt and numbers.txt. This is evident
from the list of commands shown in Fig. 5.2. This figure shows the compressed size,
uncompressed size, compression ratio, and the name of the uncompressed file.
$ ls n*
names.txt numbers.txt
$ cat names.txt
Anil
Ravi
Sunil
Chirag
Raju
$ cat numbers.txt
2429193
3334444
7777888
9990000
5555111
$ gzip -c names.txt
? N ͉names.txt s
J,
.͋1 32 ?μμY \ xA
$ gzip names.txt
$ ls n*
names.txt.gz numbers.txt
$ cat names.txt.gz
? N ͉names.txt s
J,
- ͋1 32 ?μμY \ xA
$ gzip numbers.txt
$ ls n*
names.txt.gz numbers.txt.gz
$ gzip -l *.gz
compressed uncompressed ratio uncompressed_name
54 28 7.1% names.txt
63 40 17.5% numbers.txt
117 68 -27.9% <totals>
$ gzip -d names.txt.gz
$ ls n*
names.txt numbers.txt.gz
$ cat >numbers.txt
12345
$ ls n*
names.txt numbers.txt numbers.txt.gz
$ gzip -d numbers.txt.gz
gzip: numbers.txt already exists; do you wish to overwrite <y or n>? n
not overwritten
$ gzip -df numbers.txt.gz
$ ls n*
names.txt numbers.txt

Fig. 5.2 Compression and uncompression of files names.txt and numbers.txt using
the gzip command
(f) $ gzip -d names.txt.gz


The compressed file names.txt.gz is uncompressed or decompressed to names.txt and is
confirmed using the ls command, which is shown in Fig. 5.2.
(g) $ gzip -d numbers.txt.gz
The compressed file numbers.txt.gz is supposed to be uncompressed to numbers.txt.
However, since a file numbers.txt already exists, the warning message 'gzip: numbers.txt
already exists; do you wish to overwrite (y or n)?' is displayed. Answering n leaves the
existing file untouched.
(h) $ gzip -df numbers.txt.gz
The option -f forces decompression and overwrites the existing file numbers.txt
without displaying any warning message. Figure 5.2 shows both the uncompressed
files names.txt and numbers.txt.
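The gzip options described above can be combined in a short script. The following is a minimal sketch rather than part of Fig. 5.2: the scratch directory workdir and the file sample.txt are hypothetical names chosen for illustration. It uses gzip -c with output redirection so that the original file is kept alongside its compressed copy.

```shell
#!/bin/sh
# Sketch: keep the original file while creating a compressed copy.
# workdir and sample.txt are hypothetical names used only for illustration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'Anil\nRavi\nSunil\nChirag\nRaju\n' > sample.txt

# -c writes the compressed data to standard output; redirect it to a file.
gzip -c sample.txt > sample.txt.gz

# Both files now exist; -l reports sizes, ratio, and the uncompressed name.
ls sample.txt sample.txt.gz
gzip -l sample.txt.gz

# Decompress to standard output and compare with the original content.
restored=$(gzip -dc sample.txt.gz)
original=$(cat sample.txt)
if [ "$restored" = "$original" ]; then
    echo "round trip matches"
fi
```

Because the compressed data is redirected instead of replacing the input, no overwrite prompt arises for sample.txt itself.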

5.4.2 gunzip Command


This command is used to uncompress files that were compressed using the commands gzip,
compress, or pack.

Syntax gunzip [-l][-f][-c] file_name

Table 5.8 shows the aforementioned options.


Table 5.8 Options used with the gunzip command
Option Description
-l It lists the information of each compressed file. This includes compressed size, uncompressed
size, compression ratio, and name of the uncompressed file.
-f This means force decompression. It overwrites the existing file without confirmation, if the
corresponding uncompressed file already exists
-c It displays the uncompressed content of the compressed file on the screen while
leaving the file itself in compressed form. When several input files are given, their
uncompressed contents appear one below the other without any blank line in between.
file_name It refers to the filename that we wish to uncompress.

Examples We compressed the files names.txt and numbers.txt into names.txt.gz and
numbers.txt.gz. Let us look at examples that uncompress them using the gunzip command.
(a) $ gunzip -l *.gz
It lists the information of compressed files names.txt and numbers.txt. Figure 5.3 shows
that the output is the same as gzip -l *.gz. The listing shows the compressed size,
uncompressed size, compression ratio, and the name of the uncompressed file.
(b) $ gunzip -c names.txt.gz
This command does not uncompress the file names.txt.gz, but displays its
uncompressed content on the screen (refer to Fig. 5.3).
(c) $ gunzip -c names.txt.gz numbers.txt.gz
When more than one compressed file is used with the -c option, their uncompressed
contents will be displayed on the screen one below the other without any blank line in
between (refer to Fig. 5.3). The files remain unchanged.

$ ls n*
names.txt.gz numbers.txt.gz
$ gunzip -l *.gz
compressed uncompressed ratio uncompressed_name
54 28 7.1% names.txt
63 40 17.5% numbers.txt
117 68 -27.9% <totals>
$ gunzip -c names.txt.gz
Anil
Ravi
Sunil
Chirag
Raju
$ gunzip -c names.txt.gz numbers.txt.gz
Anil
Ravi
Sunil
Chirag
Raju
2429193
3334444
7777888
9990000
5555111
$ gunzip names.txt.gz
$ ls n*
names.txt numbers.txt.gz
$ cat >numbers.txt
12345
$ ls n*
names.txt numbers.txt numbers.txt.gz
$ gunzip numbers.txt.gz
gzip: numbers.txt already exists; do you wish to overwrite <y or n>? n
not overwritten
$ gunzip -f numbers.txt.gz
$ ls n*
names.txt numbers.txt

Fig. 5.3 Uncompression of files names.txt and numbers.txt using the gunzip command
(d) $ gunzip names.txt.gz
The file names.txt.gz is uncompressed and renamed names.txt. This is confirmed using
the ls command (refer to Fig. 5.3).
(e) $ gunzip numbers.txt.gz
The compressed file numbers.txt.gz is supposed to be uncompressed to numbers.txt.
However, as the file numbers.txt already exists, the warning message 'gzip:
numbers.txt already exists; do you wish to overwrite (y or n)?' is displayed.
(f) $ gunzip -f numbers.txt.gz
The option -f results in force decompression and hence overwrites the existing file
numbers.txt without displaying any warning message. We can see the uncompressed
files names.txt and numbers.txt in Fig. 5.3.
Note: When we uncompress a file, the compressed file is automatically deleted from the system.
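Since plain gunzip removes the .gz file, the -c option is useful when the compressed file should stay in place. The following sketch (with an invented scratch directory and sample data) pipes the uncompressed content into another command and then confirms that the .gz file is still present.

```shell
#!/bin/sh
# Sketch: read a .gz file without removing it.
# workdir and the sample numbers are illustrative, not taken from a real system.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '2429193\n3334444\n7777888\n' > numbers.txt
gzip numbers.txt                 # leaves only numbers.txt.gz

# gunzip -c writes the uncompressed data to standard output,
# so it can feed another command; the .gz file stays in place.
line_count=$(gunzip -c numbers.txt.gz | wc -l | tr -d ' ')
echo "lines: $line_count"

ls numbers.txt.gz
```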

5.4.3 zip Command


The zip command compresses a set of files into a single archive. The syntax for zipping a set
of files into a compressed form is as follows:
Syntax zip [-g][-F][-q][-r] file_name files
Table 5.9 shows the aforementioned options.
Table 5.9 Options used with the zip command
Option Description
-g Adds files to an existing zip file
-F Fixes any zip file, if damaged
-q Makes the zip command run in the quiet mode, so that the files are compressed without
displaying any response on the screen
-r Compresses the files in the current directory as well as subdirectories
file_name Refers to the archive in which compressed files are stored (an extension .zip will be
automatically appended to file_name)
files Refers to the files that we wish to compress

Examples
(a) $ zip abc *
All the files in the current directory are compressed into a single file abc.zip.
Note: The gzip command compresses each file individually and cannot combine several files into
one archive, whereas the zip command can compress multiple files into a single archive.
A range of filenames can be given using wild cards. As the zip command compresses
the files, the progress will be reported on the screen. When we compress these files, the
original files remain unchanged.
(b) If we wish to add a file(s) that we forgot to add in the zip file, the following statement
will solve the purpose.
$ zip -g abc a.txt
This example adds the file a.txt to an existing zip file abc.zip.
(c) The following is the option to correct the damaged zip file.
$ zip -F abc --out pqr
This example fixes the zip file abc.zip if damaged, and copies the fixed version into
another zip file pqr.zip.
(d) The following example compresses the files with extension .txt from the current
directory in the quiet mode, that is, without displaying any response on the screen.
$ zip -q abc *.txt
(e) In order to compress the files of subdirectories, we use the -r option.
$ zip -r abc projects
This example compresses all the files in the projects directory as well as in its
subdirectories and saves them in the abc.zip file.
The execution of the aforementioned commands is shown in Fig. 5.4.
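The -q and -r options are often used together in scripts. The following is a minimal sketch under stated assumptions: the directory layout is invented for the example, and the script skips the zip step when the zip command is not installed.

```shell
#!/bin/sh
# Sketch: archive a directory tree quietly and confirm the originals remain.
# The directory layout below is invented for the example.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir -p projects
printf 'John\nCharles\nTroy\n' > customers.txt
printf 'account list\n' > projects/bank.lst

zip_status="skipped"
if command -v zip >/dev/null 2>&1; then
    # -q suppresses progress messages; -r descends into projects/.
    zip -q -r abc customers.txt projects
    zip_status=$?
    # The original files are still present after zipping.
    ls customers.txt projects/bank.lst abc.zip
else
    echo "zip is not installed; skipping" >&2
fi
```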

$ ls -l
total 12
-rw-r--r-- 1 root root 18 Feb 22 14:51 customers.txt
--w-r-x--x 1 root root 6 Feb 22 14:51 letter.txt
-r-----r-- 1 root root 113 Feb 22 14:51 matter.txt
drwxr-xr-x 2 root root 512 Feb 22 14:53 projects
-rw-r--r-- 1 root root 892 Feb 22 14:51 transact.txt
-rw-r--r-- 1 root root 16 Feb 22 14:51 users.txt

$ zip abc *
adding: customers.txt (stored 0%)
adding: letter.txt (stored 0%)
adding: matter.txt (deflated 21%)
adding: projects/ (stored 0%)
adding: transact.txt (deflated 64%)
adding: users.txt (stored 0%)

$ ls -l
total 16
-rw-r--r-- 1 root root 1370 Feb 22 14:55 abc.zip
-rw-r--r-- 1 root root 18 Feb 22 14:51 customers.txt
--w-r-x--x 1 root root 6 Feb 22 14:51 letter.txt
-r-----r-- 1 root root 113 Feb 22 14:51 matter.txt
drwxr-xr-x 2 root root 512 Feb 22 14:53 projects
-rw-r--r-- 1 root root 892 Feb 22 14:51 transact.txt
-rw-r--r-- 1 root root 16 Feb 22 14:51 users.txt

$ cat > a.txt


Testing
^D

$ zip -g abc a.txt


adding: a.txt (stored 0%)

$ zip -F abc --out pqr


Fix archive (-F) - assume mostly intact archive
Zip entry offsets do not need adjusting
copying: customers.txt
copying: letter.txt
copying: matter.txt
copying: projects/
copying: transact.txt
copying: users.txt
copying: a.txt

$ ls -l
total 158
-rw-r--r-- 1 root root 8 Feb 22 14:56 a.txt
-rw-r--r-- 1 root root 1516 Feb 22 14:56 abc.zip
-rw-r--r-- 1 root root 18 Feb 22 14:51 customers.txt
--w-r-x--x 1 root root 6 Feb 22 14:51 letter.txt
-r-----r-- 1 root root 113 Feb 22 14:51 matter.txt
-rw-r--r-- 1 root root 1516 Feb 22 14:58 pqr.zip
drwxr-xr-x 2 root root 512 Feb 22 14:53 projects
-rw-r--r-- 1 root root 892 Feb 22 14:51 transact.txt
-rw-r--r-- 1 root root 16 Feb 22 14:51 users.txt

$ zip -q abc *.txt

$ zip -r abc projects


updating: projects/ (stored 0%)
adding: projects/bank.lst (deflated 45%)

Fig. 5.4 Compression of files using the zip command



5.4.4 unzip Command


The unzip command is used to unzip the archive and extract all the files that were compressed
in it.
Syntax unzip [-p][-t][-l][-d directory_name][-f] file_name
Table 5.10 shows the aforementioned options.
Table 5.10 Options used with the unzip command
Option Description
-p Extracts files in the archive (zip file) to the screen (i.e., the files' content is
displayed on the screen)
-t Tests the archive file and determines if it is consistent and prints only the summary
message to indicate whether the archive is OK or not
-l Lists the archive file, which shows the names of the compressed files, their size,
modification date, etc.
-d directory_name Extracts the compressed files from the zip file into the specified directory
-f Updates only the existing files, that is, only the files that exist in the current directory
and are newer than the current disk copies are uncompressed from the archive
Examples
(a) To unzip a zipped archive, we use the following unzip command.
$ unzip abc
This example extracts all the files stored in the zip file abc.zip into the current
directory.
(b) $ unzip -d temp abc
This command extracts the files in the archive abc.zip into the directory temp.
(c) $ unzip -p abc
This command extracts the files in the archive abc.zip on the screen. The file content
is displayed on the screen.
(d) $ unzip -t abc
This command tests the archive abc.zip and displays a summary message informing us
if the archive is OK or not.
(e) $ unzip -l abc
This command lists the archive abc.zip and shows the names of the compressed files,
their size, modification date, etc.
(f) $ unzip -f abc
This command extracts and updates only those files from the archive abc.zip that already
exist in the current directory. Files that are absent from the directory, such as a.txt and
matter.txt after they are removed in Fig. 5.5, are not extracted.
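In a script, the -t and -d options can be combined so that an archive is extracted only after it passes the integrity test. The following sketch assumes invented file names and skips its work if zip or unzip is not installed; it relies on unzip -t returning exit status 0 when no errors are detected.

```shell
#!/bin/sh
# Sketch: test an archive and extract it only if the test succeeds.
# workdir, letter.txt, and temp are hypothetical names for illustration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'hello\n' > letter.txt

extract_status="skipped"
if command -v zip >/dev/null 2>&1 && command -v unzip >/dev/null 2>&1; then
    zip -q abc letter.txt
    # unzip -t exits with status 0 when no errors are detected.
    if unzip -t abc >/dev/null; then
        # -d names the directory that receives the extracted files.
        unzip abc -d temp >/dev/null
        extract_status=$?
    else
        echo "abc.zip failed the integrity test" >&2
        extract_status=1
    fi
fi
echo "extract status: $extract_status"
```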
Figure 5.5 demonstrates the execution of the aforementioned commands.

5.4.5 compress Command

The compress command compresses the specified file. It replaces the original file with its
compressed version that has the same filename with a .Z extension added to it.
Syntax compress [-c] [-f] [-v] file
Table 5.11 shows the aforementioned options.
Table 5.11 Options used with the compress command
Option Description
-c Compresses the file and displays the compressed version on the screen; retains the
original file, that is, no .Z file is created
-f Applies force compression of the files and overwrites the corresponding .Z file if it
exists without verification
-v Displays the size of the compressed files
file Represents the files that have to be compressed

$ ls -l
total 4
-rw-r--r-- 1 root root 1869 Feb 22 15:01 abc.zip
$ unzip abc
Archive: abc.zip
extracting: customers.txt
extracting: letter.txt
inflating: matter.txt
creating: projects/
inflating: transact.txt
extracting: users.txt
extracting: a.txt
inflating: projects/bank.lst
$ ls -l
total 18
-rw-r--r-- 1 root root 8 Feb 22 14:56 a.txt
-rw-r--r-- 1 root root 1869 Feb 22 15:01 abc.zip
-rw-r--r-- 1 root root 18 Feb 22 14:51 customers.txt
--w-r-x--x 1 root root 6 Feb 22 14:51 letter.txt
-r-----r-- 1 root root 113 Feb 22 14:51 matter.txt
drwxr-xr-x 2 root root 512 Feb 22 14:53 projects
-rw-r--r-- 1 root root 892 Feb 22 14:51 transact.txt
-rw-r--r-- 1 root root 16 Feb 22 14:51 users.txt
$ ls projects
bank.lst
$ unzip -d temp abc
Archive: abc.zip
extracting: temp/customers.txt
extracting: temp/letter.txt
inflating: temp/matter.txt
creating: temp/projects/
inflating: temp/transact.txt
extracting: temp/users.txt
extracting: temp/a.txt
inflating: temp/projects/bank.lst
$ ls -l
total 152
-rw-r--r-- 1 root root 8 Feb 22 14:56 a.txt
-rw-r--r-- 1 root root 1869 Feb 22 15:01 abc.zip
-rw-r--r-- 1 root root 18 Feb 22 14:51 customers.txt
--w-r-x--x 1 root root 6 Feb 22 14:51 letter.txt
-r-----r-- 1 root root 113 Feb 22 14:51 matter.txt
drwxr-xr-x 2 root root 512 Feb 22 14:53 projects
drwxr-xr-x 3 root root 512 Feb 22 15:50 temp
-rw-r--r-- 1 root root 892 Feb 22 14:51 transact.txt
-rw-r--r-- 1 root root 16 Feb 22 14:51 users.txt
$ unzip -p abc
John
Charles
Troy
hello
Hello this is testing of cut command
I think it is working as per the expected
result. it is going to rain today
$ unzip -t abc
Archive: abc.zip
testing: customers.txt OK
testing: letter.txt OK
testing: matter.txt OK
testing: projects/ OK
testing: transact.txt OK
testing: users.txt OK

Fig. 5.5 Screenshots of the unzip command (Contd)



testing: a.txt OK
testing: projects/bank.lst OK
No errors detected in compressed data of abc.zip.
$ unzip -l abc
Archive: abc.zip
Length Date Time Name
----------- --------- ----- -----
18 02-22-2006 14:51 customers.txt
6 02-22-2006 14:51 letter.txt
113 02-22-2006 14:51 matter.txt
0 02-22-2006 14:53 projects/
892 02-22-2006 14:51 transact.txt
16 02-22-2006 14:51 users.txt
8 02-22-2006 14:56 a.txt
347 02-22-2006 14:53 projects/bank.lst
-------- --------
1400 8 files
$ ls -l
total 360
-rw-r--r-- 1 root root 8 Feb 22 14:56 a.txt
-rw-r--r-- 1 root root 1869 Feb 22 15:01 abc.zip
-rw-r--r-- 1 root root 18 Feb 22 14:51 customers.txt
--w-r-x--x 1 root root 6 Feb 22 14:51 letter.txt
-r-----r-- 1 root root 113 Feb 22 14:51 matter.txt
drwxr-xr-x 2 root root 512 Feb 22 14:53 projects
drwxr-xr-x 3 root root 512 Feb 22 15:50 temp
-rw-r--r-- 1 root root 892 Feb 22 14:51 transact.txt
-rw-r--r-- 1 root root 16 Feb 22 14:51 users.txt
$ rm a.txt
$ rm matter.txt
$ unzip -f abc
Archive: abc.zip
$ ls -l
total 356
-rw-r--r-- 1 root root 1869 Feb 22 15:01 abc.zip
-rw-r--r-- 1 root root 18 Feb 22 14:51 customers.txt
--w-r-x--x 1 root root 6 Feb 22 14:51 letter.txt
drwxr-xr-x 2 root root 512 Feb 22 14:53 projects
drwxr-xr-x 3 root root 512 Feb 22 15:50 temp
-rw-r--r-- 1 root root 892 Feb 22 14:51 transact.txt
-rw-r--r-- 1 root root 16 Feb 22 14:51 users.txt

Fig. 5.5 (Contd)

Examples
(a) $ compress transact.txt
This example compresses the file transact.txt and renames it transact.txt.Z.
Note: The original file is replaced by another file, which has the same name with a .Z extension added to it
(i.e., transact.txt is replaced by the file transact.txt.Z).

(b) $ compress -c customers.txt
It displays the compressed form of the file customers.txt on the screen, but does not
replace the file; no .Z file is created.
(c) $ compress -f transact.txt
If a file transact.txt.Z already exists, this command overwrites it with the compressed
version of the earlier file transact.txt without confirmation. If the -f option is not used, the
compress command asks for confirmation before overwriting any existing file.
(d) $ compress -v matter.txt
This example displays how much compression was carried out by showing the output
given here.
matter.txt: Compression: 12.38% -- replaced with matter.txt.Z
The output of these commands is given as a screenshot in Fig. 5.6.

$ ls -l transact*
-rw-r--r-- 1 root root 892 Feb 22 14:51 transact.txt
$ compress transact.txt
$ ls -l transact*
-rw-r--r-- 1 root root 551 Feb 22 14:51 transact.txt.Z
$ compress -c customers.txt
.੄੄ ř“੄੄ɥ”੄੄੄੄੄੄ɮ›ɏɤ
$ cat > transact.txt
testing
^D
$ compress transact.txt
transact.txt.Z already exists; do you wish to overwrite transact.txt.Z (yes or no)? n
not overwritten
$ compress -f transact.txt
$ ls -l transact*
-rw-r--r-- 1 root root 12 Feb 22 16:51 transact.txt.Z
$ compress -v matter.txt
matter.txt: Compression: 12.38% -- replaced with matter.txt.Z

Fig. 5.6 Compression of files using the compress command

5.4.6 uncompress Command


This command is used to get the compressed file back to its original form. The uncompressed
file will have the same filename with the extension .Z removed.
Syntax uncompress [-c] [-f] file
Table 5.12 shows the aforementioned options.
Table 5.12 Options used with the uncompress command
Option Description
-c It displays the content of the compressed file without uncompressing it.
-f It applies force uncompression to the file, that is, it overwrites the corresponding
file if it exists without verification.
file It represents the file that we wish to uncompress.
Examples
(a) $ uncompress transact.txt.Z
Using this command, the compressed file transact.txt.Z is uncompressed into the file
transact.txt, that is, the file transact.txt.Z is deleted and the original file transact.txt
is recreated in its original size.
(b) $ uncompress -c matter.txt.Z
It displays the uncompressed version of the file matter.txt.Z on the screen, keeping the
compressed file intact.

(c) $ uncompress -f matter.txt.Z
Using this command, the compressed file matter.txt.Z is uncompressed into the file
matter.txt. The -f option performs force uncompression. In other words, if a file
matter.txt already exists, it is overwritten by the uncompressed version of the file
matter.txt.Z without confirmation.
To see the contents of the compressed files, we use the zcat command.
Syntax zcat file_name.Z

Here, file_name.Z represents the compressed file.


Example
$ zcat matter.txt.Z
This example will display the content of the compressed file matter.txt.Z without uncompressing
it. Figure 5.7 shows the output of the aforementioned commands.

5.4.7 pack Command


It compresses or shrinks files. The original file is replaced with a packed version. The original
filename will have the .z extension appended to it.
Syntax pack [-f] file_name

-f It forces packing of the file. If little or no space would be saved, the pack command
refuses to pack the file; the -f option packs it into a .z file anyway.
file It represents the file we wish to pack.
Examples
(a) $ pack a.png
pack: a.png: 0.3% compression
The compressed file will be stored in the name a.png.z and the original file will be
deleted. In the pack command, the degree of compression is low.
(b) $ pack -f matter.txt
It packs the file matter.txt into matter.txt.z forcefully, that is, the file is packed
even when not much compression is possible.
To view the contents of a packed file, we use the pcat command.
Syntax pcat file_name.z

Here, file_name.z represents the packed file whose content we wish to see.
Example $ pcat matter.txt.z

This example will display the content of the packed file matter.txt.z without unpacking it.
Figure 5.8 shows the output of the aforementioned commands.

5.4.8 unpack Command


This command is used to get back the original file from the packed file.
Syntax unpack file_name

$ ls -l transact*
-rw-r--r-- 1 root root 551 Feb 22 17:04 transact.txt.Z
$ uncompress transact.txt.Z
$ ls -l transact*
-rw-r--r-- 1 root root 892 Feb 22 17:04 transact.txt
$ ls -l matter*
-r-----r-- 1 root root 99 Feb 22 17:08 matter.txt.Z
$ uncompress -c matter.txt.Z
Hello this is testing of cut command
I think it is working as per the expected
result. it is going to rain today
$ cat > matter.txt
Hello
^D
$ uncompress matter.txt.Z
matter.txt already exists; do you wish to overwrite matter.txt (yes or no)? n
not overwritten
$ uncompress -f matter.txt.Z
$ ls -l matter*
-r-----r-- 1 root root 113 Feb 22 17:08 matter.txt
$ compress matter.txt
$ ls -l matter*
-r-----r-- 1 root root 99 Feb 22 17:08 matter.txt.Z
$ zcat matter.txt.Z
Hello this is testing of cut command
I think it is working as per the expected
result. it is going to rain today

Fig. 5.7 Uncompression of files using the uncompress command

$ ls -l a*
-rw-r--r-- 1 root root 34878 Feb 22 17:28 a.png
$ pack a.png
pack: a.png: 0.3% Compression
$ ls -l a*
-rw-r--r-- 1 root root 34779 Feb 22 17:28 a.png.z
$ ls -l matter*
-r-----r-- 1 root root 113 Feb 22 17:08 matter.txt
$ pack matter.txt
pack: matter.txt: no saving - file unchanged
$ pack -f matter.txt
pack: matter.txt: 11.5% Compression
$ ls -l matter*
-r-----r-- 1 root root 100 Feb 22 17:08 matter.txt.z
$ pcat matter.txt.z
Hello this is testing of cut command
I think it is working as per the expected
result. it is going to rain today

Fig. 5.8 Compression of files using the pack command



Here, file_name is the packed file. The unpack command restores the original file and removes the .z extension.
Example $ unpack matter.txt.z

The packed file matter.txt.z will be unpacked to the file matter.txt as shown in Fig. 5.9.

$ ls -l matter*
-r-----r-- 1 root root 100 Feb 22 17:08 matter.txt.z

$ unpack matter.txt.z
unpack: matter.txt: unpacked

$ ls -l matter*
-r-----r-- 1 root root 113 Feb 22 17:08 matter.txt

Fig. 5.9 Uncompression of files using the unpack command

5.4.9 bzip2 and bunzip2 Commands


bzip2 and bunzip2 are compression commands similar to gzip/gunzip, but they use a different
compression method. bzip2 generally achieves better compression than gzip, but takes
comparatively longer to compress and uncompress files. The bzip2 command compresses
the specified file by replacing it with its compressed version having a .bz2 extension.
Syntax bzip2 [-d][-f][-k][-v] filenames
Table 5.13 shows the aforementioned options.
Table 5.13 Options used with the bzip2 command
Option Description
-d It decompresses the file.
-f It performs force operation, that is, it overwrites the corresponding file without
warning.
-k It keeps the original file and creates another compressed file with an extension .bz2.
-v Verbose mode shows the compression ratio for each compressed file.
filenames It represents the files that have to be compressed.
Assume that we have two files, names.txt and numbers.txt, with the initial content shown
in Fig. 5.10, which we wish to compress.
Examples
(a) $ bzip2 names.txt
The file names.txt is compressed and is renamed names.txt.bz2, which is confirmed by
the ls command (refer to Fig. 5.10).
(b) $ cat names.txt.bz2
The command shows the compressed content of the file names.txt.bz2 (refer to Fig. 5.10).
(c) $ bzip2 -v numbers.txt
The file numbers.txt is compressed into the file numbers.txt.bz2, but this time in
verbose mode. This means that it displays the information regarding compression
ratio, number of bits per byte, and other related information, as shown in Fig. 5.10.

(d) $ bzip2 -d names.txt.bz2 numbers.txt.bz2


The compressed files names.txt.bz2 and numbers.txt.bz2 are uncompressed or
decompressed to names.txt and numbers.txt, respectively, which is confirmed by the ls
command shown in Fig. 5.10.
(e) $ bzip2 -k names.txt
The file names.txt is compressed and its compressed version is stored in another file
names.txt.bz2, keeping the original file intact. Hence, the original file names.txt is
not replaced by its compressed version; a separate file holds the compressed data.
(f) $ bzip2 -d names.txt.bz2
The names.txt.bz2 file is supposed to be uncompressed into the file names.txt. However,
as the file names.txt already exists, uncompression does not take place and the following
warning message is displayed: bzip2: Output file names.txt already exists.
(g) $ bzip2 -df names.txt.bz2
The option -f results in force decompression. The file names.txt.bz2 is uncompressed,
overwriting the existing file names.txt, without displaying any warning message. We
can see both the uncompressed files, names.txt and numbers.txt, in Fig. 5.10.
In addition to the bzip2 -d command, we can also uncompress files using the bunzip2 command.
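To see how the two tools compare on a given input, their output sizes can be measured directly. The following is a rough sketch, assuming a hypothetical, deliberately repetitive test file; it makes no claim about which tool wins on other data, and it skips the bzip2 step if bzip2 is not installed.

```shell
#!/bin/sh
# Sketch: compare gzip and bzip2 output sizes for the same input.
# workdir and the generated content are hypothetical test data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a repetitive test file of 2000 lines.
i=0
while [ "$i" -lt 2000 ]; do
    echo "Anil Ravi Sunil Chirag Raju $i"
    i=$((i + 1))
done > names.txt

orig_size=$(wc -c < names.txt | tr -d ' ')
gzip -c names.txt > names.txt.gz          # -c keeps names.txt in place
gz_size=$(wc -c < names.txt.gz | tr -d ' ')
echo "original: $orig_size bytes, gzip: $gz_size bytes"

if command -v bzip2 >/dev/null 2>&1; then
    bzip2 -k names.txt                     # -k keeps names.txt as well
    bz_size=$(wc -c < names.txt.bz2 | tr -d ' ')
    echo "bzip2: $bz_size bytes"
fi
```

For very small files the compressed version can be larger than the original, as the negative saving reported for numbers.txt in Fig. 5.10 shows.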

$ ls n*
names.txt numbers.txt

$ cat names.txt
Anil
Ravi
Sunil
Raju

$ cat numbers.txt
2429193
3334444
7777888
9990000
5555111

$ bzip2 names.txt

$ ls n*
names.txt.bz2 numbers.txt

$ cat names.txt.bz2
H AY&SY={ : 䉴< ↑ ! 4=L P Q lννz PE{
W

$ bzip2 -v numbers.txt
numbers.txt: 0.635:1, 12.600 bits/byte, -57.50% saved, 40 in, 63 out.

$ ls n*
names.txt.bz2 numbers.txt.bz2

Fig. 5.10 Compression and uncompression of files names.txt and numbers.txt using the
bzip2 command (Contd)

$ bzip2 -d names.txt.bz2 numbers.txt.bz2

$ ls n*
names.txt numbers.txt

$ bzip2 -k names.txt

$ ls n*
names.txt names.txt.bz2 numbers.txt

$ bzip2 -d names.txt.bz2
bzip2: Output file names.txt already exists.

$ bzip2 -fd names.txt.bz2

$ ls n*
names.txt numbers.txt

Fig. 5.10 (Contd)

5.4.10 bunzip2 Command


This command uncompresses the file that is compressed by the bzip2 command.
Syntax bunzip2 filename
Example $ bunzip2 numbers.txt.bz2

The file numbers.txt.bz2 is uncompressed into the file numbers.txt (i.e., the file
numbers.txt.bz2 will be deleted).

5.4.11 7-zip—Implementing Maximum Compression


Besides zip, bzip2, gzip, and other similar commands, Unix also supports the 7-zip command
(7z). 7-zip is a file archiver that typically achieves a higher compression ratio than the other
zip formats (around 30–50% better). This is because it uses the Lempel–Ziv–Markov chain
algorithm (LZMA). The syntax for this command is as follows:
Syntax 7z [a][d][e][x][l][t] compressed_file [files_to_compress]
Table 5.14 shows the aforementioned options.
Table 5.14 Options used with the 7-zip command
Option Description
a It adds file(s) to the compressed_file.
d It deletes file(s) from the compressed_file.
e It extracts the content from the compressed_file in the current directory, that is, extracts
the files of the directories (if in the compressed form) into the current directory.
x It extracts the content from the compressed_file along with the full paths, that is, the files
of the directories (if in the compressed form) will be extracted into their respective
directories. These directories will be created if they do not exist and the files will be
extracted into them.
l It lists the content in the compressed_file.
t It tests whether the compressed file is OK or not (i.e., corrupted).
Examples
(a) The following example compresses all the files with extension .txt in the current
directory into the file data.7z.
$ 7z a data.7z *.txt

(b) The following example displays the list of files compressed in the file data.7z.
$ 7z l data.7z
(c) The following example tests whether the files in the compressed file data.7z are OK or
not. If the files are found to be OK, the filenames are displayed along with a summary of
the compressed files: their size and the number of files and folders compressed in it.
$ 7z t data.7z
(d) The following example adds the files of the directory projects to an existing compressed
file data.7z.
$ 7z a data.7z projects
(e) The following example extracts the files found in the compressed file data.7z into the
current directory.
$ 7z e data.7z

Note: The compressed files of subdirectories will also be uncompressed into the current directory, that is, the
respective subdirectories will not be created.

To create subdirectories and uncompress files into the respective subdirectories, option
x is used instead of e.
(f) The following example deletes the directory projects and its files from the compressed
file data.7z.
$ 7z d data.7z projects
The screenshot of the aforementioned examples is shown in Fig. 5.11.
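The difference between the e and x commands is easiest to see by extracting the same archive twice. The sketch below assumes that a 7z executable is on the PATH (it skips its work otherwise) and uses invented directory names flat and tree.

```shell
#!/bin/sh
# Sketch: contrast 7z e (flat extraction) with 7z x (paths preserved).
# workdir, flat, and tree are hypothetical names for illustration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir -p projects
printf 'John\n' > customers.txt
printf 'account list\n' > projects/bank.lst

if command -v 7z >/dev/null 2>&1; then
    7z a data.7z customers.txt projects >/dev/null

    mkdir flat tree
    # e: every file lands directly in the current directory.
    (cd flat && 7z e ../data.7z >/dev/null)
    # x: files are recreated under their original subdirectories.
    (cd tree && 7z x ../data.7z >/dev/null)

    ls flat
    ls tree tree/projects
else
    echo "7z is not installed; skipping" >&2
fi
```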

$ ls -l
total 10
-rw-r--r-- 1 root root 18 Feb 22 14:51 customers.txt
-rwx--xr-x 1 root root 6 Feb 22 14:51 letter.txt
-rwx--xr-x 1 root root 113 Feb 22 17:08 matter.txt
drwxr-xr-x 2 root root 512 Feb 27 20:30 projects
-rw-r--r-- 1 root mba 892 Feb 22 17:21 transact.txt

$ ls -l projects
total 4
-rwxr-xr-x 1 root root 347 Feb 22 14:53 bank.lst
-rw-r--r-- 1 root root 6 Feb 27 20:30 hello.txt

$ 7z a data.7z *.txt

7-Zip 4.55 beta Copyright (c) 1999-2007 Igor Pavlov 2007-09-05


p7zip Version 4.55 (locale=en_IN.UTF-8,Utf16=on,HugeFiles=on,1 CPU)
Scanning
Creating archive data.7z

Compressing customers.txt
Compressing letter.txt
Compressing matter.txt
Compressing transact.txt

Fig. 5.11 Compression and uncompression of files using the 7-zip command (Contd)

Everything is ok
$ ls -l
total 102
-rw-r--r-- 1 root root 18 Feb 22 14:51 customers.txt
-rw------- 1 root root 639 Feb 27 20:31 data.7z
-rwx--xr-x 1 root root 6 Feb 22 14:51 letter.txt
-rwx--xr-x 1 root root 113 Feb 22 17:08 matter.txt
drwxr-xr-x 2 root root 512 Feb 27 20:30 projects
-rw-r--r-- 1 root mba 892 Feb 22 17:21 transact.txt

$ 7z l data.7z

7-Zip 4.55 beta Copyright (c) 1999-2007 Igor Pavlov 2007-09-05


p7zip Version 4.55 (locale=en_IN.UTF-8,Utf16=on,HugeFiles=on,1 CPU)

Listing archive: data.7z

Method = LZMA
Solid = +
Blocks = 1
Date Time Attr Size Compressed Name
------------------- ----- ----------- ----------- ----------------------------
2006-02-22 14:51:18 ....A 18 423 customers.txt
2006-02-22 14:51:18 ....A 6 letter.txt
2006-02-22 17:08:46 ....A 113 matter.txt
2006-02-22 17:21:48 ....A 892 transact.txt
------------------- ----- ----------- ----------- --------------------
1029 423 4 files, 0 folders
$ 7z t data.7z

7-Zip 4.55 beta Copyright (c) 1999-2007 Igor Pavlov 2007-09-05


p7zip Version 4.55 (locale=en_IN.UTF-8,Utf16=on,HugeFiles=on,1 CPU)

Processing archive: data.7z

Testing customers.txt
Testing letter.txt
Testing matter.txt
Testing transact.txt

Everything is ok

Total:
Folders: 0
Files: 4
Size: 1029
Compressed: 639

$ 7z a data.7z projects

7-Zip 4.55 beta Copyright (c) 1999-2007 Igor Pavlov 2007-09-05


p7zip Version 4.55 (locale=en_IN.UTF-8,Utf16=on,HugeFiles=on,1 CPU)

Scanning
Fig. 5.11 (Contd)

Updating archive data.7z

Compressing projects/hello.txt
Compressing projects/bank.lst

Everything is ok
$ 7z l data.7z

7-Zip 4.55 beta Copyright (c) 1999-2007 Igor Pavlov 2007-09-05


p7zip Version 4.55 (locale=en_IN.UTF-8,Utf16=on,HugeFiles=on,1 CPU)

Listing archive: data.7z

Method = LZMA
Solid = +
Blocks = 1

Date Time Attr Size Compressed Name


------------------- ----- ----------- ----------- --------------------
--------
2006-02-22 14:51:18 ....A 18 423 customers.txt
2006-02-22 14:51:18 ....A 6 letter.txt
2006-02-22 17:08:46 ....A 113 matter.txt
2006-02-22 17:21:48 ....A 892 transact.txt
2006-02-27 20:30:21 ....A 6 198 projects/hello.txt
2006-02-22 14:53:14 ....A 347 projects/bank.lst
2006-02-27 20:30:19 D.... 0 0 projects
------------------- ----- ----------- ----------- --------------------
1382 621 6 files, 1 folders
$ rm *.txt

$ rm -r projects

$ ls -l
total 276
-rw------- 1 root root 909 Feb 27 20:36 data.7z

$ 7z e data.7z

7-Zip 4.55 beta Copyright (c) 1999-2007 Igor Pavlov 2007-09-05


p7zip Version 4.55 (locale=en_IN.UTF-8,Utf16=on,HugeFiles=on,1 CPU)

Processing archive: data.7z

Extracting customers.txt
Extracting letter.txt
Extracting matter.txt
Extracting transact.txt
Extracting projects/hello.txt
Extracting projects/bank.lst
Extracting projects

Everything is ok

Fig. 5.11 (Contd)



Total:
Folders: 1
Files: 6
Size: 1382
Compressed: 909

$ ls -l
total 462
-rwxr-xr-x 1 root root 347 Feb 22 14:53 bank.lst
-rw-r--r-- 1 root root 18 Feb 22 14:51 customers.txt
-rw------- 1 root root 909 Feb 27 20:36 data.7z
-rw-r--r-- 1 root root 6 Feb 27 20:30 hello.txt
-rwx--xr-x 1 root root 6 Feb 22 14:51 letter.txt
-rwx--xr-x 1 root root 113 Feb 22 17:08 matter.txt
drwxr-xr-x 2 root root 512 Feb 27 20:30 projects
-rw-r--r-- 1 root root 892 Feb 22 17:21 transact.txt

$ 7z d data.7z projects

7-Zip 4.55 beta Copyright (c) 1999-2007 Igor Pavlov 2007-09-05


p7zip Version 4.55 (locale=en_IN.UTF-8,Utf16=on,HugeFiles=on,1 CPU)

Updating archive data.7z

Everything is ok

$ 7z l data.7z

7-Zip 4.55 beta Copyright (c) 1999-2007 Igor Pavlov 2007-09-05


p7zip Version 4.55 (locale=en_IN.UTF-8,Utf16=on,HugeFiles=on,1 CPU)

Listing archive: data.7z

Method = LZMA
Solid = +
Blocks = 1

Date Time Attr Size Compressed Name


------------------- ----- ----------- ----------- --------------------
--------
2006-02-22 14:51:18 ....A 18 423 customers.txt
2006-02-22 14:51:18 ....A 6 letter.txt
2006-02-22 17:08:46 ....A 113 matter.txt
2006-02-22 17:21:48 ....A 892 transact.txt
------------------- ----- ----------- ----------- --------------------
1029 423 4 files, 0 folders

Fig. 5.11 (Contd)

5.5 DEALING WITH FILES


In this section, we will learn the different commands that deal with files such as finding
the file type (finding whether the specified file is a regular file, directory, device file, or
something else), locating or searching for files with the given criteria, confirming the
presence of a specified application program or system utility on the disk drive, and checking
the file system. Let us see how we can find the file type.

5.5.1 file: Determining File Type


This command determines the file type, that is, whether the specified name belongs to a file
or a directory. The command does certain checks or tests on the specified file to determine
its type.
Syntax file [-f filelist] files

-f filelist   Determines the type of each file whose name is listed in the file filelist.
files         Filenames whose file type we wish to determine.
Examples
(a) $ file matter.txt
matter.txt: English text
This command determines if the file matter.txt is a directory, a special device file, or an
ordinary file. If matter.txt is an ordinary file, it checks to see whether it is empty and,
if so, displays the message ‘empty’.
If the file command finds that the file matter.txt is not empty, it checks to see if it
begins with a magic number, that is, a numeric or string constant used to indicate the
type of file if it is not suitable for human reading. It is to be remembered that files that are
not ASCII text files, such as executable files, archive, and library files, contain a magic
number at the beginning of the file.
If the file matter.txt does not have a magic number, the file command examines
its first block comprising 512 bytes and finds its language (i.e., whether its language is
English, a programming language, or some other special data). On the basis of all these
analyses, the file command displays the type of the specified file.
(b) The following example displays the file type of all the files and directories in the current
directory.
$ file *
(c) The following example displays the file type of all the files that are mentioned in the file,
filenames.
$ file -f filenames
The output of these commands is shown in Fig. 5.12.
In this output of the file command, the file letter.txt is reported as commands text
because the execute permission is assigned to this file.
Note: For the file that does not have the read permission, the file command displays the output ‘cannot open
for reading’.
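The magic-number test described above can be tried by hand. The following is a minimal sketch and not the way the file command itself is implemented: the helper name magic_hex is hypothetical, and the sketch assumes the standard dd, od, and tr utilities. It prints the first four bytes of a file in hexadecimal; a PNG image always begins with the bytes 89 50 4e 47.

```shell
# Print the first four bytes of a file as hexadecimal digits.
# magic_hex is an illustrative helper, not part of the file command.
magic_hex() {
    dd if="$1" bs=4 count=1 2>/dev/null | od -An -tx1 | tr -d ' \n'
    echo
}

tmp=$(mktemp -d)
# Write the first four bytes of the PNG signature (octal escape for byte 0x89).
printf '\211PNG' > "$tmp/sample.png"
printf 'hello\n' > "$tmp/sample.txt"

magic_hex "$tmp/sample.png"   # PNG signature: 89504e47
magic_hex "$tmp/sample.txt"   # plain text: the ASCII codes of "hell"
rm -rf "$tmp"
```

A text file has no such signature, which is why the file command falls back on examining the contents of the first block.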

5.5.2 find: Locating Files


This command is used to locate one or more files that satisfy the given criteria. We can also
perform certain operations or actions on the files that are found.
$ ls -l
total 148
-rw-r--r-- 1 root root 34878 Feb 22 17:30 a.png
-rw-r--r-- 1 root root 34779 Feb 22 17:28 a.png.z
-rw-r--r-- 1 root root 18 Feb 22 14:51 customers.txt
--w-r-x--x 1 root root 6 Feb 22 14:51 letter.txt
-r-----r-- 1 root root 113 Feb 22 17:08 matter.txt
drwxr-xr-x 2 root root 512 Feb 22 16:39 projects
-rw-r--r-- 1 root root 892 Feb 22 17:21 transact.txt

$ file matter.txt
matter.txt: English text

$ file customers.txt
customers.txt: ascii text

$ file a.png
a.png: PNG image data

$ file a.png.z
a.png.z: packed data

$ file projects
projects: directory

$ file *
a.png: PNG image data
a.png.z: packed data
customers.txt: ascii text
letter.txt: commands text
matter.txt: English text
projects: directory
transact.txt: ascii text

$ cat filenames
letter.txt
a.png
transact.txt

$ file -f filenames
letter.txt: commands text
a.png: PNG image data
transact.txt: ascii text

Fig. 5.12 File types using the file command

Syntax # find path criteria action_list

Here, path refers to the directory or location on the disk in which we want the find command
to search for the desired files. The path may include more than one directory separated by
a space. The find command searches each directory specified in the path, along with all its
subdirectories, to find the file(s) that meet the given criteria. Table 5.15 shows the options
for writing criteria.
The action_list in the syntax indicates that we can apply several actions to the files that
are found by the find command. Table 5.16 lists the most frequent actions that are
applied to the found files.
Table 5.15 Options used with the find command


Options             Description
-atime n            It finds files that were accessed n days ago.
-ctime n            It finds files whose status (inode information such as permissions or ownership)
                    was last changed n days ago.
-mtime n            It finds files that were modified more than n days ago (+n), less than n days
                    ago (-n), or exactly n days ago (n).
-size n[c]          It finds files that are n blocks in size, or n bytes in size when n is followed
                    by c. One block is equal to 512 bytes.
-name pattern       It finds files where the filename matches the pattern.
-perm octal_num     It finds files that have the given permissions.
-type               It finds files of the specified type. The type is represented through the
                    following characters:
                    b: Block file
                    c: Character file
                    d: Directory
                    l: Symbolic link
                    f: Regular file
                    p: Named pipe
                    s: Socket
-user name          It finds files that are owned by the given username. We can also use the user ID
                    instead of the username.
-group group_name   It finds files that belong to the given group_name. We can also use the group ID
                    instead of the group name.

Table 5.16 Actions applied to the found files


Action          Description
-print          Default action that displays the path name of the files that meet the given criteria
-exec command   Executes the Unix system command on the files that meet the given criteria
-ok command     Same as -exec, except that it prompts the user for confirmation, that is, the user
                has to press ‘y’ for executing the command

In the case of the -exec or -ok action, a pair of braces {} is used to represent the files that
are found by the find command. In other words, each file located by find replaces the
braces, and the specified command is applied to the found files one by one. A semicolon (;)
marks the end of the command being executed, separating its arguments from the rest of the
find expression. Since the shell also treats the semicolon as special, we must escape it
(with a backslash or quotes) so that it reaches find unchanged. The format of the find command
when a command is executed on the found files is as follows:
$ find pathname-list condition-list -exec command {} \;

While using complex expressions for finding files, we can use the operators explained in the
next sub-section.
Using find operators


While writing expressions for finding files in the find command, we can use the following
find operators (see Table 5.17).
Table 5.17 List of find operators used to connect expressions
Operator   Description
!          (Negation operator) This performs reverse action, that is, it finds the files that do
           not satisfy the specified expression.
-o         (OR operator) This is used to connect two or more expressions. On using this operator,
           the files that satisfy even a single expression will be displayed.
-a         (AND operator) This is the default operator. Only the files that satisfy all the
           expressions connected with the -a operator are displayed.

Note: While using the -a or -o operators, we may use parentheses ( ) for grouping, but they must be ‘escaped’
as they are also used by the shell. This means that the parentheses must be prefixed with a backslash: ‘\(’, ‘\)’.
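A small sandbox shows why the grouping matters. This is a sketch under assumptions: the helper names find_ungrouped and find_grouped and the sample filenames are made up for illustration. Because -a (even when it is only implied) binds more tightly than -o, a -print written without parentheses attaches only to the last test.

```shell
# Both helpers are illustrative; each lists matches under the directory given as $1.

# Parsed as: -name '*.txt' -o ( -name '*.doc' -a -print ),
# so only the .doc files are printed.
find_ungrouped() {
    (cd "$1" && find . -name '*.txt' -o -name '*.doc' -print | sort)
}

# The escaped parentheses group the two tests; -print then applies to both.
find_grouped() {
    (cd "$1" && find . \( -name '*.txt' -o -name '*.doc' \) -print | sort)
}

tmp=$(mktemp -d)
touch "$tmp/a.txt" "$tmp/b.doc" "$tmp/c.c"
echo "without parentheses:"; find_ungrouped "$tmp"   # ./b.doc
echo "with parentheses:";    find_grouped "$tmp"     # ./a.txt and ./b.doc
rm -rf "$tmp"
```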

Examples

(a) The following command displays the files and their path names that have not been
accessed for over a month (+30).
$ find / -atime +30 -print
(b) To find files that are of a size larger than 20 blocks and which have not been accessed
for over a month (+30), use the following command.
$ find / -atime +30 -size +20 -print
(c) To search for files that are of a size between 1000 and 2000 bytes, use the following command.
$ find . -size +1000c -size -2000c -print
We can see in this command that a minus sign designates ‘less than,’ and the plus sign
designates ‘greater than’.
(d) To remove files that are of a size larger than 20 blocks and have not been accessed for
over a month, confirming each removal through the interactive action -ok, enter the following.
$ find / -atime +30 -size +20 -ok rm -f {} \;
(e) To list all files and directories under the current directory, use the following command.
$ find . -print
(f) To search for the file a.txt in the current directory and its subdirectories, use the
following command.
$ find . -name 'a.txt' -print
(g) To search for the file a.txt on the root and all its subdirectories, use the following
command.
$ find / -name 'a.txt' -print
(h) To display all .c files under the current directory, use the following command.
$ find . -name '*.c' -print
(i) To print all files beginning with the word test in the current directory and its
subdirectories, use the following command.
$ find . -name 'test*' -print
(j) To print all filenames comprising three characters that begin with an upper-case or a lower-
case character in the current directory and its subdirectories, use the following command.
$ find . -name '[a-zA-Z]??' -print
(k) To display the list of the directories, use the following command.
$ find . -type d -print
(l) To find all those .c files that were last modified less than three days ago, use the following
command.
$ find . -mtime -3 -name "*.c" -print

Note: We can use single quotes as well as double quotes for defining the pattern.

(m) To find all those .c files that were last modified more than three days ago, use the
following command.
$ find . -mtime +3 -name "*.c" -print
(n) To find all those .c files that were modified exactly three days ago, use the following
command.
$ find . -mtime 3 -name "*.c" -print
(o) To find the .txt files that have the 755 permission, use the following command.
$ find . -name '*.txt' -perm 755 -print
We can see that 755 is an octal number representing read, write, and execute permissions
for the owner, and read and execute permission for the group and other members.
We can also use the and operator, that is, the -a operator in the aforementioned
command. The and operator shows only those files that satisfy both the specified
expressions. With the and operator, this command can be written as follows.
$ find . -name '*.txt' -a -perm 755 -print
Remember, -a is the default operator, so we can optionally omit it.
(p) To find the subdirectories under the current directory having the 755 permission, use the
following command.
$ find . -type d -perm 755 -print
(q) To find all the files that have the User (owner) as root, use the following command.
$ find . -user root -print
Instead of the username, we can use the user ID. The following command is used.
$ find . -user 0 -print
(r) To find all the files that belong to the group projects, use the following command.
$ find . -group projects -print
As with the username, instead of the group name, we can use the group ID. The
following command is used.
$ find . -group 15 -print
(s) To find all the files except the a.txt file, use the following command.
$ find . ! -name 'a.txt' -print
In this command, ! is the negation operator and it reverses the meaning of the
expression.
(t) To find all the files except the ones with the extension .txt, use the following command.
$ find . ! -name '*.txt' -print
(u) To find .txt files or files that have the 755 permission, use the following command.
$ find . \( -name '*.txt' -o -perm 755 \) -print

The -o operator is the ‘OR’ operator and hence the files that satisfy either expression will
be displayed. We have used the ‘escaped’ parentheses in this expression, that is, they
are prefixed by a backslash to prevent them from being interpreted by the shell.
(v) To find .txt as well as .doc files, use the following command.
$ find . \( -name '*.txt' -o -name '*.doc' \) -print

We can also execute commands on the files that we find. The following is an example.
$ find . -name "*.txt" -exec wc -l '{}' ';'
This command counts the number of lines in every .txt file in and under the current
directory. The count of the lines is displayed before the name of the respective file. In
this command, each .txt file that is found replaces the braces ‘{}’ in turn, that is, the
wc -l command is applied to each of the files that is found. The ‘;’ ends the -exec clause.
(w) To display the names of the files and subdirectories in the current directory, use the
following command.
$ find . -exec echo {} ';'
We can see that the semicolon is quoted.
(x) The following example finds the .txt files that have the 755 permission. From the files
that are found, the group read permission is removed, as shown here.
$ find . -name '*.txt' -perm 755 -exec chmod g-r '{}' ';'
The find command has several uses, which are as follows:
1. Searching for files with a specific pattern
2. Searching for files that are accessed a specific number of days ago
3. Searching for files of a specific size
4. Searching for files with specific permissions
5. Searching for files belonging to a specified user or group
6. Applying commands on the found files
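Several of the searches listed above can be tried safely in a scratch directory. The following is a sketch under assumptions: make_sandbox is a hypothetical helper, and the filenames, the 1500-byte size, and the back-dated timestamp are chosen only for illustration.

```shell
# make_sandbox DIR: populate DIR with a few sample files (names are illustrative).
make_sandbox() (
    cd "$1" || exit 1
    # A 1500-byte file, which falls between the +1000c and -2000c bounds.
    dd if=/dev/zero of=mid.dat bs=1500 count=1 2>/dev/null
    : > empty.dat
    touch new.c
    # Back-date one source file (1 January 2000) so that -mtime +3 matches it.
    touch -t 200001010000 old.c
    printf 'line\n' > notes.txt
    chmod 755 notes.txt
)

tmp=$(mktemp -d)
make_sandbox "$tmp"
(
    cd "$tmp" || exit 1
    find . -type f -size +1000c -size -2000c -print   # ./mid.dat
    find . -name '*.c' -mtime +3 -print               # ./old.c
    find . -name '*.c' -mtime -3 -print               # ./new.c
    # Remove the group read permission from every .txt file with mode 755.
    find . -name '*.txt' -perm 755 -exec chmod g-r {} \;
    ls -l notes.txt                                   # mode is now rwx--xr-x
)
rm -rf "$tmp"
```

Working in a directory created by mktemp keeps the -exec action away from real files while the expressions are being tested.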

5.5.3 locate: Searching for Files with Specific Strings


This command is used for searching for files whose name or path matches a particular search
string and for which, the user has access permissions.
Syntax locate [-q] [-n number] [-i] pattern_to_search

Table 5.18 explains these options.

Table 5.18 Options used with the locate command
Options             Description
-q                  It suppresses error messages that are displayed for files for which the user
                    does not have access permissions.
-n                  It limits the result to a specified number.
-i                  It ignores the case while searching, that is, it returns the result that
                    matches the pattern in upper case or lower case.
pattern_to_search   It represents the string that we wish to search for in the path names or in
                    the filenames. All the files that contain the pattern_to_search string in
                    their path or filename will be listed on the screen.

Examples

(a) $ locate ".txt"
This command will find all filenames in the file system that contain .txt anywhere in their
full paths and for which the user has access permissions. It may display error messages
when it comes across the .txt files for which the user does not have access permissions.
(b) $ locate -q -n 10 ".txt"
This command will find the first 10 files that contain .txt anywhere in their full paths and
for which the user has access permissions. It will not display any error messages on finding
the files for which the user does not have access permissions.
(c) $ locate -i "project.txt"
This command will find all project.txt files, be it in the upper case or lower case, for
which the user has access permissions.

One disadvantage of locate is that it stores all filenames on the system in an index that is
usually updated only once a day. This means locate will not find files that have been created
very recently. It may also report filenames as being present even though the file has just
been deleted. Unlike find, locate cannot track down files on the basis of their permissions,
size, and so on.
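Because locate answers from an index, a file created moments ago may not be listed. When freshness matters more than speed, a live search can stand in for locate -i. The following is only a sketch: live_locate is a hypothetical helper built on find and grep -F (fixed-string matching), and unlike locate it walks the directory tree on every call, so it is much slower on large trees.

```shell
# live_locate PATTERN [DIR]: case-insensitive substring search of the full
# path names under DIR (default: current directory). Hypothetical helper.
live_locate() {
    find "${2:-.}" -print 2>/dev/null | grep -iF -- "$1"
}

tmp=$(mktemp -d)
mkdir "$tmp/Reports"
touch "$tmp/Reports/Project.TXT" "$tmp/Reports/budget.xls"

# The file was created a moment ago, yet it is found, and case is ignored.
live_locate "project.txt" "$tmp"
rm -rf "$tmp"
```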

5.5.4 which/whence: Finding Locations of Programs or Utilities on Disks

This command is used to find out where the specified application program or system utility
is stored on the disk.
Note: The command whence works only in the Korn shell. The syntax for using whence is the same as that
of the which command.

Syntax which program_name/utility

Here, program_name/utility represents the command or program whose location we wish
to find out.
Example $ which ls

Output /usr/bin/ls
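Where which or whence is unavailable, the shell's own command -v (specified by POSIX) gives a similar answer. The following sketch also includes where_all, a hypothetical helper written for illustration, which walks the PATH variable by hand to show how such a lookup works; the paths printed will differ from system to system.

```shell
# command -v is built into the shell.
command -v ls    # prints the full path of ls, for example /usr/bin/ls
command -v cd    # a shell built-in has no separate file; only its name is printed

# where_all NAME: report every directory on PATH holding an executable NAME.
# Hypothetical helper for illustration.
where_all() {
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        # An empty PATH entry stands for the current directory.
        if [ -x "${dir:-.}/$1" ] && [ ! -d "${dir:-.}/$1" ]; then
            echo "${dir:-.}/$1"
        fi
    done
    IFS=$old_ifs
}

where_all ls
```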

5.5.5 fsck: Utility for Checking File Systems


The Unix file system can easily be corrupted if it is not properly shut down. Therefore, we
need to periodically check the file system and repair it. If the errors in the file system are
not repaired then and there, the whole system may crash. The fsck command checks our file
system, reports the errors that it may come across, and interactively prompts us for ‘Yes’ or
‘No’ decisions to correct those error conditions. It operates in two modes: interactive and
non-interactive. In the interactive mode, this command examines the file system and stops
at each error found in the file system. It displays the error description and asks for the user’s
response. On the basis of the user action, either the error is removed or fsck will continue
checking without making any changes to the file system. In the non-interactive mode, fsck
tries to repair all the errors found in the file system without waiting for the user response.
Although this mode is faster, it may delete some important files that have become corrupted.
Syntax # fsck [-y] [-n] [ filesystem ]

Note: The option -y or -n, if used, runs the fsck command in the non-interactive mode.

Here, filesystem is the name of the file system to be checked. If we do not specify the file
system, fsck will use the files /etc/checklist or /etc/fstab to know the names of the file
systems to be checked.
The options -y and -n are used to automatically provide the answers Yes and No, respectively,
to all the queries that appear when using the fsck command.
Examples
(a) # fsck -y
This command checks all the file systems installed in our machine and answers
‘Yes’ (meaning granted) to all the queries that come up.
(b) # fsck -n
This command checks all the file systems installed in our machine and answers
‘No’ to all the queries that come up.
The fsck command runs in several phases as follows:
# fsck /dev/root
** Currently Mounted on /
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3a - Check Connectivity
** Phase 3b - Verify Shadows/ACLs
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cylinder Groups
7899 files, 406203 used, 279169 free (257 frags, 34864 blocks, 0.0% fragmentation)
Let us have a quick view of all the phases of the fsck command.
In phase 1, each inode in the file system is checked and then the disk blocks pointed to by the
inode are checked. Error messages may appear at this stage if the block address in the inode
is invalid, a block is already being used by another inode, the expected number of blocks for
an ordinary file does not match the actual number of blocks used by the inode, or other
similar errors occur. In short, phase 1 performs the following tasks:
1. Checks the inodes, looks for valid inode types, and corrects the inode size and format.
2. Checks for bad or duplicate blocks.
In phase 2, fsck checks all directory inodes in the file system. First, the inode for the root
directory is examined. In case the root inode is corrupted, the fsck will abort. If the inode
number of the directory entry is invalid, the inode field of the directory entry is set to zero. It
is ensured that none of the directory entries points to an unallocated inode. In short, this phase
is focused on removing the directory entries that are invalid or pointing to invalid inode(s).
Thus, this phase reports errors that result from root inode mode and status, directory inode
pointers in a range, directory entries pointing to bad inodes, etc.
Note: This phase removes directory entries pointing to bad inodes found in phase 1.

In phase 3, all the allocated inodes are scanned for unreferenced directories, that is, directories
where the inodes corresponding to the parent directory entry do not exist. In this case, we
will be prompted to reconnect any orphaned directories. If our answer is yes, then a link
between the orphan directory and the special directory /lost+found will be made. When the
fsck command is over, we can examine the entries in /lost+found and can move them to
their respective directories.
Phase 4 deals with the inode count or reference count information, which was accumulated
in phases 2 and 3. In phase 1, the reference count is first set to the link count value stored
in the inode. The link count is the number of links to a physical file. Then, in phases 2 and
3, the reference count is decremented each time a valid link is found while scanning the file
system. Therefore, the reference count value should be zero when phase 4 begins.
Phase 5 checks the free block list. Any bad or duplicate blocks in this list are flagged,
which are later salvaged. On salvaging the free list, phase 6 is initiated that reconstructs the
free block list.
If a file system was corrupted and then fixed, the system is rebooted without a sync
operation (to prevent the ‘file system fixing’ from being undone). The reboot process
modifies the file system to repair it.
Note: Unless fsck is used in the single-user mode, the file system corruption will spread to other mounted file systems.

The fsck command checks the integrity of the file systems, especially the superblock, which
stores summary information of the volume. Whenever data is added or changed on a disk, it
is the superblock that is frequently modified to reflect the changes. There are many chances
of the superblock getting corrupted. Hence, the fsck checks the superblock for any errors.
The following two checks are essentially done:
1. The size of the file system must be greater than the number of blocks identified
in the superblock.
2. The total number of inodes must be less than the maximum number of inodes.
Besides checking the superblock, the fsck command also checks the number and status of
the cylinder group blocks, inodes, indirect blocks, and data blocks. This command checks if
all the blocks that are marked as free are not being used by any files. If they are being used,
it means the files may be corrupted. In addition, fsck confirms if the number of free blocks
plus the number of used blocks equals the total number of blocks in the file system. In case
of any ambiguity, the maps of unallocated blocks are rebuilt.
When inodes are examined, fsck searches for any inconsistency in the format and type,
link count, duplicate blocks, bad block numbers, and inode size. Inodes should always be in
one of the three states: allocated (being used by a file), unallocated (not being used by a file),
and partially allocated (the procedure of allocation and unallocation is performed, but the
data that was supposed to be deleted is still there). The fsck command will clear the inode if
inconsistency of any type is detected.
The link count is the number of directory entries that are linked to a particular inode.
The entire directory structure is examined to find the number of links for every inode. If the
stored link count and the actual link count do not match, it confirms that the disk was not
synchronized before the shutdown, that is, while saving the changes in the file system, the
link count was not updated. In case the stored count is not zero but the actual count is zero,
then disconnected files are placed in the lost+found directory. In other cases, the actual count
replaces the stored count.
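The comparison between the stored and the actual link count can be imitated from user space. The following is only a sketch of the idea and not how fsck works internally (fsck reads the raw, unmounted file system): check_links is a hypothetical helper that counts the directory entries found for each inode inside one directory tree and compares the result with the link count reported by ls -li.

```shell
# check_links DIR: print the number of inodes under DIR whose stored link
# count differs from the number of directory entries found in the scan.
# In ls -li output, field 1 is the inode number and field 3 the link count.
check_links() {
    find "$1" -type f -exec ls -li {} + |
    awk '
        { stored[$1] = $3; seen[$1]++ }
        END {
            for (i in stored)
                if (stored[i] + 0 != seen[i]) bad++
            print bad + 0
        }'
}

tree=$(mktemp -d)
outside=$(mktemp -d)
echo data > "$tree/a"
ln "$tree/a" "$tree/b"       # a second directory entry for the same inode
check_links "$tree"          # prints 0: link count 2, two entries found
ln "$tree/a" "$outside/c"    # a third link that the scan of $tree cannot see
check_links "$tree"          # prints 1: link count 3, only two entries found
rm -rf "$tree" "$outside"
```

In the sketch the mismatch is caused by a link outside the scanned tree; on a damaged file system the same kind of mismatch is what fsck reports and corrects.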
The output of the fsck command is shown in Fig. 5.13.

# fsck -y
/dev/dsk/c0d0s0 IS CURRENTLY MOUNTED READ/WRITE.
CONTINUE? yes

** /dev/dsk/c0d0s0
** Currently Mounted on /
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3a - Check Connectivity
** Phase 3b - Verify Shadows/ACLs
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cylinder Groups
FILESYSTEM MAY STILL BE INCONSISTENT.
7899 files, 406203 used, 279169 free (257 frags, 34864 blocks,
0.0% fragmentation)

***** PLEASE RERUN FSCK ON UNMOUNTED FILE SYSTEM *****


/dev/dsk/c0d0s6 IS CURRENTLY MOUNTED READ/WRITE.
CONTINUE? yes

** /dev/dsk/c0d0s6
** Currently Mounted on /usr
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3a - Check Connectivity
** Phase 3b - Verify Shadows/ACLs
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cylinder Groups
FILESYSTEM MAY STILL BE INCONSISTENT.
150119 files, 3244347 used, 1892040 free (5304 frags, 235842 blocks,
0.1% fragmentation)

***** PLEASE RERUN FSCK ON UNMOUNTED FILE SYSTEM *****


/dev/dsk/c0d0s3 IS CURRENTLY MOUNTED READ/WRITE.
CONTINUE? yes

** /dev/dsk/c0d0s3
** Currently Mounted on /var
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3a - Check Connectivity
** Phase 3b - Verify Shadows/ACLs
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cylinder Groups
FILESYSTEM MAY STILL BE INCONSISTENT.
19560 files, 83088 used, 156495 free (455 frags, 19505 blocks,
0.2% fragmentation)

***** PLEASE RERUN FSCK ON UNMOUNTED FILE SYSTEM *****


/dev/dsk/c0d0s7 IS CURRENTLY MOUNTED READ/WRITE.
CONTINUE? yes

** /dev/dsk/c0d0s7
** Currently Mounted on /export/home
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3a - Check Connectivity
** Phase 3b - Verify Shadows/ACLs
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cylinder Groups
FILESYSTEM MAY STILL BE INCONSISTENT.
2 files, 9 used, 31295480 free (16 frags, 3911933 blocks,
0.0% fragmentation)

***** PLEASE RERUN FSCK ON UNMOUNTED FILE SYSTEM *****


/dev/dsk/c0d0s5 IS CURRENTLY MOUNTED READ/WRITE.
CONTINUE? yes

** /dev/dsk/c0d0s5
** Currently Mounted on /opt
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3a - Check Connectivity
** Phase 3b - Verify Shadows/ACLs
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cylinder Groups
FILESYSTEM MAY STILL BE INCONSISTENT.
98 files, 25985 used, 24505 free (9 frags, 3062 blocks,
0.0% fragmentation)

***** PLEASE RERUN FSCK ON UNMOUNTED FILE SYSTEM *****


/dev/dsk/c0d0s1 IS CURRENTLY MOUNTED READ/WRITE.
CONTINUE? yes

** /dev/dsk/c0d0s1
** Currently Mounted on /usr/openwin
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3a - Check Connectivity
** Phase 3b - Verify Shadows/ACLs
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cylinder Groups
FILESYSTEM MAY STILL BE INCONSISTENT.
8305 files, 206932 used, 116320 free (400 frags, 14490 blocks,
0.1% fragmentation)

***** PLEASE RERUN FSCK ON UNMOUNTED FILE SYSTEM *****

Fig. 5.13 Output displayed while running the fsck command


5.6 IMPORTANT UNIX SYSTEM FILES


In this section, we will learn about the important files of the Unix system such as
/etc/passwd, /etc/shadow, /etc/hosts, /etc/hosts.allow, and /etc/hosts.deny. These are
the files where important information is kept, such as the passwords of the users, the IP
addresses of the computers on our network, and permissions to access different services,
among others. These files are very critical and can impair the working of the Unix system
if modified incorrectly. Let us begin with the /etc/passwd file.

5.6.1 /etc/passwd
passwd is a file found in the /etc directory. It contains login names, passwords, home
directories, and other information about users. Each line of the file contains a series of fields,
which defines a login account. The fields in each line of the /etc/passwd file are separated
by colons. Table 5.19 shows the aforementioned fields.
Table 5.19 Fields found in the /etc/passwd file
Field                Description
user name            The username is the string entered in response to the login prompt. It is a
                     unique identifier for the user throughout the session. The program that prompts
                     for the login name reads this file to get information pertaining to the user.
encrypted password   The program that prompts for the login name reads the information found in this
                     field and uses it to validate the password entered by the user.
user ID number       Each user has an ID number that can be used as a synonym for the username.
                     Both the ID number and the username are unique within the system.
group ID number      Each user has one group ID number. Any number of users can be assigned to the
                     same group. The group ID number is used to assign group access permissions to
                     files, directories, and devices.
real name            This is a sort of comment or complete name of the user (login names are usually
                     unique identifiers only).
home directory       It is the directory that the user reaches after entering the correct login name
                     and password. This is the name that gets stored in the HOME environment variable.
shell program        This is a shell program that is run once the user logs in. If nothing is
                     specified, /bin/sh is assumed.

Example $ grep ravi /etc/passwd


ravi:x:235:614:ravi sharma:/home/ravi:/bin/sh
Note: The grep command is used to search for a given pattern in a file and displays all the lines where the
pattern is found. We will learn about the grep command in detail in Chapter 10.

We can always find a root login in the /etc/passwd file, which is as follows:
root:x:0:0:root:/root:/bin/sh
The root user has a user ID of 0 and a group ID of 0. For security reasons, some systems
move the encrypted passwords out of this file and into the shadow file.
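Since the fields are separated by colons, a record can be taken apart with awk -F:. The following is a minimal sketch: passwd_summary is a hypothetical helper name, and the record is the sample line shown above rather than a line read from a real /etc/passwd.

```shell
# passwd_summary: read passwd-style records on standard input and print the
# login name, user ID, group ID, home directory, and shell of each.
# An empty shell field is reported as /bin/sh, the default mentioned in Table 5.19.
passwd_summary() {
    awk -F: '{
        shell = ($7 == "") ? "/bin/sh" : $7
        printf "%s uid=%s gid=%s home=%s shell=%s\n", $1, $3, $4, $6, shell
    }'
}

# Sample record from the text; on a real system the input would be /etc/passwd.
record='ravi:x:235:614:ravi sharma:/home/ravi:/bin/sh'
printf '%s\n' "$record" | passwd_summary
# prints: ravi uid=235 gid=614 home=/home/ravi shell=/bin/sh
```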
5.6.2 /etc/shadow
Passwords are stored in encrypted form for security. To check a login, the system only needs
to encrypt the newly entered password with the same algorithm and then compare the result
against the encrypted version stored in the file; if they match, the password is correct.
The /etc/passwd file must be readable by everyone because it is used by so many programs
to find the user ID number, group membership, and home directory. This allows one to get
a copy of the /etc/passwd file, and thus get a copy of all the password encryptions. It may
result in security problems.
The solution is to hide the passwords in another file. The file holding the passwords
is known as the shadow file and is normally named /etc/shadow. The shadow file is only
readable by its owner, which is the root. This means that no one can read the passwords
unless they have the root access.
We can tell by looking at the data in the /etc/passwd file whether the actual password is
in a shadow file, because the password field displays an x rather than an encrypted password.
The shadow file is a text file, and each line displays the password information of a user. The
fields in each line of the /etc/shadow file are separated by colons. Table 5.20 gives a brief
description of the fields found in this file.
Table 5.20 Fields found in the /etc/shadow file
Field                       Description
user name                   The same name found in the /etc/passwd file
encrypted password          The encrypted form of the password
password last changed       The day the password was last changed (The date is a count of the
                            number of days since 1 January 1970.)
password may be changed     The number of days before the user has permission to change the
                            password (A value of -1 means that it can be changed any time.)
password must be changed    The number of days from the time the password is set until the
                            password expires and must be changed
password expire warning     The number of days prior to the password expiry date that the user
                            has to be warned
disable after expires       The number of days after the password expires that the account is
                            to be automatically disabled
disabled                    The date that the account was disabled (The date is a count of the
                            number of days since 1 January 1970.)

The shadow file also contains dates and day counters, which can be used to force the users to
change their passwords from time to time, under the threat of their account getting disabled.
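The day counters in Table 5.20 combine by simple addition. The following sketch works through the arithmetic for one record: shadow_dates is a hypothetical helper, and the sample record is invented for illustration (it is not taken from a real shadow file, which only root can read).

```shell
# shadow_dates: for each shadow-style record on standard input, print the day
# numbers on which the password expires, the warning period starts, and the
# account is disabled. All values are days since 1 January 1970.
# Fields: 3 = last changed, 5 = must be changed, 6 = warning, 7 = disable after expiry.
shadow_dates() {
    awk -F: '$5 != "" {
        expires = $3 + $5          # last changed + maximum password age
        warn    = expires - $6     # the warning starts this many days earlier
        disable = expires + $7     # grace period after expiry
        printf "%s expires=%d warn_from=%d disabled_from=%d\n", $1, expires, warn, disable
    }'
}

# Hypothetical record: changed on day 19000, must be changed after 90 days,
# 7 days of warning, account disabled 14 days after expiry.
printf '%s\n' 'ravi:*:19000:0:90:7:14:' | shadow_dates
# prints: ravi expires=19090 warn_from=19083 disabled_from=19104

# Today's day number, for comparison (assumes date supports the %s format).
echo "today is day $(( $(date +%s) / 86400 ))"
```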

5.6.3 /etc/hosts
The hosts file contains static address information for computers on our local network.
Whenever we refer to a computer by its name, the commands that we use must have some
way to translate that name into an IP address. Our Internet service provider (ISP) should
provide us with the address of one or more name servers that we can use. If we use a dial-up
connection, the address of a name server is normally returned to our computer as part of the
initial connection sequence. However, in some cases, we must configure the address of the
name server manually (on most systems, in the resolver file /etc/resolv.conf).
If we have a local network, we will need to provide each member of our network with
the address of all the other members. If there are many computers on our local network, it
is easier to use one of them as a name server by configuring a daemon to respond to address
requests and then configuring other computers to send address queries to our local name
server.
The contents of /etc/hosts file may be as follows:
127.0.0.1 localhost localhost.localdomain
192.168.0.1 mce1 mce1.localdomain
192.168.0.2 mce2 mce2.localdomain
192.168.0.3 mce3 mce3.localdomain

The same list of addresses is required for every computer on the network. A computer's own
address can be included in the file, so the file can be duplicated everywhere on the
network by simply copying it from one computer to another. Each line in the file contains
an IP address, followed by a list of alias names for the computer. In this example, each
computer can be located by its simple name or its domain name.
The first line of the file conventionally names the local host, which always has the address
127.0.0.1. This special loopback address is used by programs on the local computer to address
its own services.
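The line format described above (an address followed by one or more names) can be searched with awk. This is only a sketch: lookup_host is a hypothetical helper name, and the real name resolution on a system may also consult name servers.

```shell
#!/bin/sh
# lookup_host NAME [FILE]: print the IP address of the line that lists NAME.
# FILE defaults to /etc/hosts; comment lines are skipped.
lookup_host() {
    awk -v name="$1" '
        /^[ \t]*#/ { next }                 # skip comment lines
        {
            for (i = 2; i <= NF; i++)       # fields 2..NF are the names
                if ($i == name) { print $1; exit }
        }
    ' "${2:-/etc/hosts}"
}
```

With the sample file shown earlier, the command lookup_host mce2.localdomain would print 192.168.0.2.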

5.6.4 /etc/hosts.allow and /etc/hosts.deny


When a connection request arrives, the contents of the hosts.allow file are scanned, and if
a specific permission is found for the requested action, access is granted and no further
checking is made. If the hosts.allow file does not specifically grant permission, the
hosts.deny file is scanned, and if access is not specifically denied, it is granted.
Each line specifies a service followed by a colon, which separates it from the list of hosts
being granted or denied that particular service. The keyword ALL can be used to specify all
services or all hosts.
$ cat /etc/hosts.deny
ALL: ALL

If the content of hosts.deny file is as given, it means that every service is denied to every
host. The following example of hosts.allow begins by granting all permissions to every
host in the local domain and every host in the domain philips.com. All permissions are
also granted to the computer with the IP address 234.51.135.18. Finally, HTTP web service
(specified by naming the daemon to receive the message) is granted to every host except the
ones in the godrej.com domain.
$ cat /etc/hosts.allow
ALL: LOCAL .philips.com
ALL: 234.51.135.18
httpd: ALL EXCEPT .godrej.com
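The order in which the two files are consulted can be sketched in the shell itself. This is a deliberately simplified illustration, not the real access-control code: the function names rule_matches and access_decision are hypothetical, and only exact host names, the ALL keyword, and leading-dot domain suffixes are handled (keywords such as LOCAL and EXCEPT are ignored).

```shell
#!/bin/sh
# rule_matches FILE SERVICE HOST: succeed if some line in FILE matches.
rule_matches() {
    file=$1 service=$2 host=$3
    [ -r "$file" ] || return 1
    while IFS=: read -r daemons clients; do
        case $daemons in ''|'#'*) continue ;; esac   # blank or comment line
        svc_ok=no
        for d in $daemons; do
            if [ "$d" = ALL ] || [ "$d" = "$service" ]; then svc_ok=yes; fi
        done
        [ "$svc_ok" = yes ] || continue
        for c in $clients; do
            case $c in
                ALL) return 0 ;;                            # every host
                .*)  case $host in *"$c") return 0 ;; esac ;;  # domain suffix
                *)   [ "$c" = "$host" ] && return 0 ;;      # exact name or address
            esac
        done
    done < "$file"
    return 1
}

# access_decision SERVICE HOST ALLOW_FILE DENY_FILE: print granted or denied.
access_decision() {
    if rule_matches "$3" "$1" "$2"; then
        echo granted            # hosts.allow matched: no further checking
    elif rule_matches "$4" "$1" "$2"; then
        echo denied             # hosts.deny matched
    else
        echo granted            # neither file matched
    fi
}
```

For example, access_decision httpd www.philips.com /etc/hosts.allow /etc/hosts.deny reports the decision for one service and one host under these simplified rules.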

5.7 SHELL VARIABLES


A variable is a named location that stores a value for later use or manipulation in a
programming language. It offers a symbolic way to represent and manipulate data. The shell
also has variables to serve the same purpose. Shell variables are of two types: those created
and maintained by the Unix system itself, and those created by the user.

5.7.1 User-created Shell Variables


To create a shell variable, we can simply use the following syntax:
Syntax name=value

Examples
(a) radius=5
Creates a shell variable having the name radius.
Similarly, name="ravi"
name is the variable with the value ravi.
(b) A null string is a string with no characters.
area=""
We can use letters, digits, and the underscore character in variable names.
area_circle=56
(c) To find out the value of a shell variable, we can use the echo command. Ordinarily, echo
merely echoes its arguments on the screen.
$ echo radius
radius

However, if we use a variable name preceded by a $ as an argument to echo, the value of
the variable is echoed.
$ echo $radius
5
$ echo The radius of circle is $radius
The radius of circle is 5
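The rules above can be tried out in a short script. This is a sketch for illustration only; the use of braces and of ${#name} goes slightly beyond the text and is standard POSIX shell syntax.

```shell
#!/bin/sh
radius=5                      # no spaces around the = sign
name="ravi"                   # quotes are optional here, required if the value has blanks
empty=""                      # null string: no characters at all
area_circle=56                # letters, digits, and underscores are allowed

echo "The radius of circle is $radius"
echo "Hello, $name"

# Braces mark where the variable name ends when text follows directly.
# Without them the shell would look for a variable named radiuscm.
echo "Radius label: ${radius}cm"

# ${#var} gives the number of characters in the value.
echo "Length of the null string: ${#empty}"
```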

5.7.2 System Shell Variables


The shell maintains its own set of shell variables. To find what your system is using, just
type set.
$ set

You may get the following output:


EXINIT='set ai nu'
HOME=/usr/ravi
IFS=
MAIL=/usr/mail/ravi
PATH=.:/bin:/usr/bin
PS1=$
PS2=>
TERM=adm5
Table 5.21 shows the description of these shell variables.
Table 5.21 Shell variables
Shell variable                    Description
EXINIT                            This refers to the initialization instructions for the ex and vi editors.
HOME                              This is set to the path name of our home directory.
IFS (Internal field separator)    This is set to a list of the characters that are used to separate words
                                  in a command line. Normally, this list consists of the space character,
                                  the tab character, and the newline character.
LOGNAME                           This gives the user's login name.
MAIL                              This variable's value is the name of the directory in which electronic
                                  mail addressed to us is placed. The shell checks the contents of this
                                  directory very often, and when new content shows up, we are informed
                                  about it.
PATH                              This names the directories that the shell will search in to find the
                                  commands that we execute. A colon is used to separate the directory
                                  names without spaces.
PS1 (Prompt string 1)             This symbol is used as our prompt. Normally, it is set to $, but we can
                                  redefine it by merely assigning a new value. For example, the command
                                  PS1=# resets the prompt to a # symbol.
PS2 (Prompt string 2)             This prompt is used when a new line is started without finishing a
                                  command (command continuation symbol).
TERM                              This identifies the kind of terminal we use (it helps the shell
                                  understand what to interpret as erase key, kill line, etc.)

Note: Some variables like PS1 are defined by default. Others like PATH are defined in our .profile file.
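As a quick check of what a particular system provides, the variables in Table 5.21 can be printed one by one. This is a minimal sketch: describe_env is a hypothetical function name, and the ${name:-text} form (standard shell syntax not covered above) substitutes the given text when a variable is unset or empty.

```shell
#!/bin/sh
# describe_env: print a few system shell variables, one per line.
describe_env() {
    printf 'HOME=%s\n'    "${HOME:-<not set>}"
    printf 'LOGNAME=%s\n' "${LOGNAME:-<not set>}"
    printf 'MAIL=%s\n'    "${MAIL:-<not set>}"
    printf 'PATH=%s\n'    "${PATH:-<not set>}"
    printf 'TERM=%s\n'    "${TERM:-<not set>}"
}

describe_env
```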

A description of these terms is provided in the following sections.

CDPATH variable
The CDPATH variable contains a list of path names separated by colons (:) as shown.
:$HOME:/bin/usr/files

There are three paths in this example. Since the path starts with a colon, the first directory
is the current working directory. The second directory is our home directory. The third
directory is an absolute path name to a directory of files.
The contents of CDPATH are used by the cd command using the following rules:
1. If CDPATH is not defined, the cd command searches the working directory to locate the
requested directory. If the requested directory is found, cd moves to it. If it is not found,
cd displays an error message.
2. If CDPATH is defined as shown in the previous example, the actions listed are taken when
the following command is executed.
$ cd ajmer
(a) The cd command searches the current directory for the ajmer directory. If it is found,
the current directory is changed to ajmer.
(b) If the ajmer directory is not found in the current directory, the cd command searches
in the home directory, which is the second entry in CDPATH. If the ajmer directory is
found in the home directory, it becomes the current directory.
(c) If the ajmer directory is not found in the home directory, cd tries to find it in
/bin/usr/files, which is the third directory in CDPATH. If the ajmer directory is found in
/bin/usr/files, it becomes the current directory.
(d) If the ajmer directory is not found in /bin/usr/files, the cd command displays an
error message and terminates.
$ echo $CDPATH
$ CDPATH=:$HOME:/bin/usr/files
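The search rules can be observed safely in a throwaway directory tree. The sketch below is an illustration only: where_cd_lands is a hypothetical helper, the directory names are made up, and the function runs in a subshell so that neither our current directory nor our CDPATH is changed.

```shell
#!/bin/sh
# where_cd_lands DIR CDPATH_VALUE: print the directory that "cd DIR" reaches
# when CDPATH has the given value, or "not found".
where_cd_lands() (
    CDPATH=$2
    cd "$1" > /dev/null 2>&1 || { echo "not found"; exit 1; }
    pwd
)

# Build a temporary tree: ajmer exists under files, but not under work.
base=$(mktemp -d)
mkdir -p "$base/files/ajmer" "$base/work"
(
    cd "$base/work" || exit 1
    where_cd_lands ajmer ":$base/files"    # found through the second entry
    where_cd_lands ajmer "" || true        # no search path: only ./ajmer is tried
)
rm -rf "$base"
```

The output of cd is discarded inside the function because cd prints the new directory name whenever it finds the target through a CDPATH entry.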

HOME variable
The HOME variable contains the path to our home directory. The default is our login directory.
Some commands use the value of this variable when they need the path to our home directory.
For example, when we use the cd command without any argument, the command uses the
value of the HOME variable as the argument.
$ echo $HOME
/mnt/disk1/usr/chirag
$ oldHOME=$HOME
$ echo $oldHOME
/mnt/disk1/usr/chirag

$ cd ajmer
$ HOME=$(pwd)
$ echo $HOME
/mnt/disk1/usr/chirag/ajmer

$ HOME=$oldHOME
$ echo $HOME
/mnt/disk1/usr/chirag
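The link between HOME and a cd command without arguments can be confirmed without disturbing our real settings. In this sketch, home_of_cd is a hypothetical helper whose body runs in a subshell, so the temporary value of HOME disappears when the function returns.

```shell
#!/bin/sh
# home_of_cd DIR: show where a bare "cd" lands when HOME is set to DIR.
home_of_cd() (
    HOME=$1
    cd || exit 1        # cd without an argument uses the value of HOME
    pwd
)

home_of_cd /
echo "Our own HOME is still $HOME"
```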

PATH variable
The PATH variable lists the directories that are searched for commands. The entries in the
PATH variable must be separated by colons. It works just like CDPATH: when the shell
encounters a command, it searches each directory named in the PATH variable, in order, for
that command. The major difference is that the current directory is usually mentioned at the
end of this variable rather than at the beginning.
If we set the PATH variable as follows,
$ PATH=/bin:/usr/bin::
then the shell will look for the commands that we execute in this sequence: it will first
search the /bin directory, followed by the /usr/bin directory, and finally the current working
directory (an empty entry in PATH stands for the current directory).
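The search order can be imitated with a small function. This is a sketch of the idea, not the shell's actual implementation: find_in_path is a hypothetical name, and the optional second argument lets us try a search list other than our real PATH.

```shell
#!/bin/sh
# find_in_path CMD [LIST]: print the first executable named CMD found in the
# colon-separated LIST (default: $PATH). The body runs in a subshell so the
# change to IFS does not leak into the caller.
find_in_path() (
    cmd=$1
    list=${2-$PATH}
    # Field splitting drops one trailing colon, so add another to keep
    # a trailing empty entry (which means the current directory).
    case $list in *:) list=$list: ;; esac
    IFS=:
    for dir in $list; do
        [ -n "$dir" ] || dir=.          # empty entry = current directory
        if [ -f "$dir/$cmd" ] && [ -x "$dir/$cmd" ]; then
            echo "$dir/$cmd"
            exit 0
        fi
    done
    exit 1
)

find_in_path sh || echo "sh was not found in PATH"
```

Because the loop stops at the first match, a command placed in an earlier directory hides a command of the same name in a later one.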
Primary prompt variable
The primary prompt (PS1 prompt) is set in the variable PS1 for the Korn and Bash shells and
in the variable prompt for the C shell. The shell uses the primary prompt when it accepts a
command. The default is the dollar sign ($) for the Korn and Bash shells and the percent sign
(%) for the C shell.
We begin by changing the primary prompt to a custom string. Since we have a blank at the end
of the prompt, we must use quotes to set it. As soon as it is set, the new prompt is displayed.
At the end, we change it back to the default.
$ PS1="mce> "
mce> echo $PS1
mce>
mce> PS1="$ "
$

SHELL variable
The SHELL variable holds the path of our login shell.

TERM variable
It holds the description for the computer terminal or terminal emulator we are using. The
value of this variable determines the keys that we can use for the purpose of editing. The
default value of TERM variable is vt100 (a terminal type).

Setting/Unsetting system shell variables


The following are the ways by which we can set or unset system shell variables:
An assignment operator is used to set the value for a system shell variable.
Syntax variable=value

Example $ TERM=vt100

The unset command is used to unset a system shell variable. The syntax for using the unset
command is as follows:
Syntax unset shell_variable

Example unset TERM


We can look at the value assigned to a system shell variable through the echo command.
$ echo $TERM

We can use the set command with no arguments to display the variables that are currently set.
$ set
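A variable that has been unset is different from a variable that holds an empty value. The sketch below makes the distinction visible; state_of is a hypothetical helper, demo_term is a made-up variable used instead of TERM so that the terminal settings are left alone, and eval is used only because the variable name is passed as an argument.

```shell
#!/bin/sh
# state_of NAME: print "unset", "empty", or "set" for the variable NAME.
# NAME must be a plain variable name, since it is passed to eval.
state_of() {
    if eval "[ -z \"\${$1+x}\" ]"; then
        echo unset                  # ${NAME+x} is empty only when NAME is unset
    elif eval "[ -z \"\$$1\" ]"; then
        echo empty
    else
        echo set
    fi
}

demo_term=vt100
echo "after assignment: $(state_of demo_term)"
demo_term=
echo "after assigning nothing: $(state_of demo_term)"
unset demo_term
echo "after unset: $(state_of demo_term)"
```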

5.8 EXPORT OF LOCAL AND GLOBAL SHELL VARIABLES


When a process is created by the shell, the shell makes certain features of its own
environment available to the child process. The created process (i.e., the command) uses
these inherited parameters in order to operate.
These parameters include the following:
1. The PID of the parent process
2. The user and group owner of the process
3. The current working directory


4. The three standard files
5. Other open files used by the parent process
6. Some environment variables available in the parent process
By default, the values stored in shell variables are local to the shell, that is, they are available
only in the shell in which they are defined. They are not passed on to a child shell. However,
the shell can also export these variables recursively to all child processes so that, once
defined, they are available globally. This is done using the export command.
Note: A variable defined in a process is only local to the process in which it is defined and is not available in a
child process. However, when it is exported, it is available recursively to all child processes.

The syntax for the export command is as follows:


Syntax export variable[=value]

Examples
(a) $ export welcomemsg
This example exports an earlier defined shell variable welcomemsg to make it available to
child processes.
(b) $ export welcomemsg='Good Morning'
This example defines a shell variable welcomemsg as well as exports it to be available to
child processes.

Note: While defining a shell variable, there should not be any space on either side of the ‘=’ sign.

(c) The following example creates a new shell variable radius.


$ radius=5
$ echo $radius
5
$ sh - Create new shell
$ echo $radius
$

We can see that the value of the shell variable radius is not seen in the new shell as it does
not know about this.
If we want the new shell to know about the shell variables created by us, we use the export
command. By using the export command, the shell variables are exported to child processes,
making them global variables. The following example demonstrates this implementation.
Example $ radius=5

$ echo $radius
5
$ export radius
$ sh - Create new shell
$ echo $radius
5 - The new shell has a copy of radius

$ radius=30 - The copy gets a new value


$ echo $radius
30

$ Ctrl-d - Return to old shell


$ echo $radius
5 - We get the original value of radius

The export command causes a new shell to be given a copy of the original variable. This
copy has the same name and value as the original. Subsequently, the value of the copy
can be changed but when the subshell dies, the copy is gone though the original variable
remains.
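The same experiment can be run as a script, using sh -c to start each child shell instead of typing into it. This is a sketch: child_sees is a hypothetical helper, and the text <not inherited> is simply a marker printed when the child has no such variable.

```shell
#!/bin/sh
# child_sees NAME: ask a freshly started child shell for the value of NAME.
child_sees() {
    sh -c "echo \"\${$1-<not inherited>}\""
}

unset radius            # make sure no exported copy is left over
radius=5
child_sees radius       # local variable: the child does not have it

export radius
child_sees radius       # exported variable: the child gets a copy

sh -c 'radius=30'       # the child changes only its own copy
echo "parent: $radius"  # the original value is unchanged
```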
To erase or remove a global variable, we use the unset command.
Syntax unset variable_name

Example $ unset radius

Note: To find out the list of variables exported, just type the export command without arguments followed by the enter key: $ export

In this chapter, we understood the different types of files, the role of device drivers while
operating the devices, differences between block and character devices, usage of disk space,
amount of free disk space in all file systems, and partition in a disk drive. We learnt how
commands such as gzip, gunzip, zip, unzip, compress, uncompress, pack, unpack, bzip2,
and bunzip2 can be used for compressing and uncompressing files. We also discussed how
desired files can be found and how specific commands can be executed on them. In addition, we learnt
how a corrupted file system can be repaired. We have also seen the role of important files
of the Unix system, shell variables, and system shell variables. In Chapter 6, we will learn
about handling processes, jobs, and signals in detail.

■ SUMMARY ■

1. All devices are considered to be files in Unix. Devices such as floppy drive, CD-ROM, and
   hard disk are known as block devices as data is read from and written into these devices
   in terms of blocks. Character devices, on the other hand, are also known as raw devices as
   the read/write operations in these devices are done directly, that is, 'raw' without using
   the buffer cache.
2. A disk can be divided into several partitions. It can have a primary partition and an
   extended partition. There can be multiple logical drives in an extended partition.
3. The hosts file contains static address information for computers on our local network.
4. The hosts.allow file defines the list of hosts for whom the services are allowed; the
   hosts.deny file defines the list of hosts for whom the services are denied.
5. By default, the values stored in shell variables are local to the shell, that is, they are
   available only in the shell in which they are defined.
6. The export statement exports the local variables recursively to all child processes so
   that they are available globally.

■ FUNCTION SPECIFICATION ■

Command           Function
dd (disk data)    Used for copying data from one medium to another.
format            Used for formatting disks.
du (disk usage)   Used to display information about the usage of disk space by each file and
                  directory of the system.
dfspace           Used to report the free disk space in terms of megabytes and percentage of
                  the total disk space.
fdisk             Used to create, delete, and activate partitions.
gzip              Used to compress the specified file and replace it with the compressed file
                  having the extension .gz. The -l option is used with gzip to know the extent
                  to which a file is compressed.
gunzip            Used to uncompress a compressed file.
zip               Used to compress a set of files into a single file.
unzip             Used to unzip a zipped archive.
compress          Used to compress a specified file. It replaces the original file with its
                  compressed version that has the same filename with a .Z extension added to it.
uncompress        Used to uncompress the compressed file back to its original form.
zcat              Used to see the contents of the compressed files.
pack              Used to compress the given file and replace the original file with the same
                  filename having a .z extension added to it. The degree of compression in the
                  pack command is less than the compress command.
pcat              Used to view the contents of a packed file.
unpack            Used to uncompress the packed file into the original file.
bzip2             Used to compress a specified file by replacing it with its compressed version
                  having a .bz2 extension.
7-zip             Used to compress files at the highest compression ratio (around 30–50% more
                  than the other zip formats).
file              Used to determine the file type, that is, if it is a regular file, directory,
                  device file, etc.
find              Used to locate one or more files that satisfy the given criteria.
which/whence      Used to find out the location of a specified application program or system
                  utility on the disk.
locate            Used to search for files whose names match a particular search string.
fsck              Used to check and repair a file system if corrupted.
passwd            A file found in the /etc directory containing login names, passwords, home
                  directories, and other information about users.
set               Used to see the list of shell variables.

■ EXERCISES ■
Objective-type Questions
State True or False
5.1 All devices are considered as files in Unix.
5.2 All device files are stored in /etc or in its subdirectories.
5.3 CD-ROM is a character device.
5.4 Printer is a character device.
5.5 The minor number represents the type of device.
5.6 The dd command is used for copying data from one medium to another.
5.7 The term bs in the dd command stands for block size.
5.8 The du utility displays complete information about the usage of disk space by each file
    and directory.
5.9 By default, the du command displays information in terms of 1024-byte blocks.
5.10 The df command reports only the free disk space of the file system installed on our
     machines.
5.11 The dfspace command reports the used disk space of the file system.
5.12 The -q option of the zip command makes it run in quiet mode.
5.13 The extension added to the file that is compressed by the compress command is .C.
5.14 The find command is used for searching for files.
5.15 The file command is used for displaying filenames.
5.16 The gunzip command is used to uncompress a compressed file that is compressed by the
     gzip, compress, or pack commands.
5.17 The fdisk command can be used to create and delete partitions on a disk, but cannot
     activate partitions.
5.18 A disk can have several primary partitions.
5.19 The gzip command compresses the specified file and replaces it with the compressed file
     having the extension .gz.
5.20 By default, the values stored in shell variables are local to the shell, that is, they
     are available only in the shell in which they are defined.

Fill in the Blanks


5.1 Devices are of two types: ________ and ________.
5.2 The term 'if' in the dd command refers to ________.
5.3 The option used with the du command to see the usage of every file is ________.
5.4 The ________ command compresses the specified file and replaces it with the extension .gz.
5.5 The command used to uncompress any compressed file is ________.
5.6 The command used to compress a set of files into a single, compact archive is ________.
5.7 The option used with the zip command to fix any damaged zip file is ________.
5.8 The option of the compress command that displays the amount of compression is ________.
5.9 The command used to repair the file system is ________.
5.10 The option used with the find command to search for files that have not been accessed
     for a given time length is ________.
5.11 The option used with the find command to search for files of the specific owner is
     ________.
5.12 The ________ option in the compress command stands for verbose and displays how much
     compression has been done.
5.13 To see the contents of the compressed files, the ________ command is used.
5.14 The pack command compresses the given file and replaces the original file with the same
     filename having ________ extension added to it.
5.15 The ________ compresses the specified file by replacing it with its compressed version
     having a .bz2 extension.
5.16 The ________ option of the bzip2 command can be used to uncompress a file.
5.17 The ________ command is used to convert a local shell variable into a global variable.
5.18 The system shell variable ________ stores the information of the terminal that we are
     using.
5.19 The system shell variable ________ indicates the location where emails of a user are
     stored.
5.20 The path name of the home directory of the user is stored in the ________ variable.

Multiple-choice Questions

5.1 The fdisk command is used to
    (a) format a disk
    (b) remove bad sectors from a disk
    (c) create partitions
    (d) repair a file system
5.2 The gzip command compresses the file with the extension
    (a) .gzip
    (b) .gz
    (c) .gp
    (d) .g
5.3 The command used to view the contents of a packed file is
    (a) pcat
    (b) cat
    (c) show
    (d) catpack
5.4 The user's log name is stored in
    (a) LOGNAME
    (b) USER
    (c) OWNER
    (d) LOGIN
5.5 The command to see the list of shell variables is
    (a) showvar
    (b) showshell
    (c) disp
    (d) set
5.6 The command that is used to find out where an application program or system utility is
    stored on a disk is
    (a) search
    (b) findapp
    (c) whence
    (d) util
5.7 The fsck command is used for
    (a) finding a file
    (b) compressing a file
    (c) uncompressing a file
    (d) repairing a file system
5.8 The list of hosts for whom the services are allowed is stored in the file
    (a) hosts.allow
    (b) services.txt
    (c) hosts
    (d) allowed
5.9 The shell variable that sets the symbol for the primary shell prompt is
    (a) PS2
    (b) sprompt
    (c) shellpr
    (d) PS1
5.10 The TERM shell variable stores
    (a) shell duration
    (b) terminal description
    (c) logged-in time
    (d) booting time
5.8 The list of hosts for whom the services are

Programming Exercises
5.1 Write the command for the following tasks:
    (a) To copy the entire disk, hdb, to a file called back.dd
    (b) To find the disk usage of every file in /project directory
    (c) To find the total number of blocks occupied by the /project directory
    (d) To display a report of the free disk space for all the file systems installed on our
        machines
    (e) To display the free disk space in terms of megabytes and percentage of total disk space
    (f) To compress a file a.txt to a.txt.gz
    (g) To add a file account.txt to a zipped file finance.zip
    (h) To fix a zipped file finance.zip
    (i) To compress a file a.txt and also show how much compression was done
    (j) To uncompress a file a.txt.bz2
    (k) To set the secondary prompt, the prompt that is displayed when a command is continued
        to the second line, to '>>>'
    (l) To display the list of path names
    (m) To determine the type of the file, accounts.txt
    (n) To display the files and their path names that have not been accessed for over 10 days
    (o) To check the file system
5.2 What will the following commands do?
    (a) $ export project_name
    (b) $ PS1="UnixPrompt>"
    (c) $ passwd
    (d) $ grep john /etc/passwd
    (e) $ which cat
    (f) $ find . -mtime -10 -name "*.txt" -print
    (g) $ bunzip2 accounts.txt.bz2
    (h) $ pack accounts.txt
    (i) $ zip -q accounts.zip *.txt
    (j) $ df -h
    (k) $ du -s *.txt
    (l) $ du /projects
    (m) $ locate "projects"
    (n) $ find / -size +15 -print
    (o) $ echo $HOME

Review Questions
5.1 Explain the following commands with syntax and examples:
    (a) dd
    (b) format
    (c) uncompress
    (d) unpack
5.2 (a) What are the points of comparison between the following commands: gzip, zip,
        compress, and pack?
    (b) What is the difference among the following commands: du, df, and dfspace?
5.3 (a) Explain the different options used in the find command to search for a desired file.
    (b) How is the file system repaired in Unix? Explain.
5.4 What is the difference between the following files?
    (a) /etc/passwd and /etc/shadow
    (b) /etc/hosts.allow and /etc/hosts.deny
5.5 How is a shell variable created and how can a local shell variable be made a global
    variable?
5.6 Explain the usage of the following system shell variables:
    (a) HOME
    (b) MAIL
    (c) PS2
    (d) PATH
    (e) TERM

Brain Teasers
5.1 In long listing command ls -l, if you find a file with mode field set to l, what does
    it mean?
5.2 Correct the following command to backup a hard disk to a file.
    $ dd if=/file.dd of=/dev/hda
5.3 Correct the mistake in the following command for compressing few .txt files in the name
    abc.zip in quiet mode.
    $ zip *.txt abc.zip
5.4 Can you uncompress a .bz2 file to the standard output? If yes, how?
5.5 How will you know whether a particular file in the /dev directory represents a character
    device or block device?
5.6 If a device has a major number 8 and minor number 0, what does it represent?
5.7 Is there any way to uncompress a .bz2 file without using the bunzip2 command? If yes,
    what is that?
5.8 If we provide the command file a.txt, we get the output, 'cannot open for reading'. What
    does it mean?
5.9 What command must be given to delete all the files that have not been accessed for the
    last six months?
5.10 Correct the following command to display all .txt files in the current directory.
     $ find . - name "*.txt" - ls
5.11 What will happen if answer Yes is provided to the question “CLEAR?”, which appears while
     running fsck command?
5.12 Is the following command to set the primary prompt correct? If not, identify the mistake.
     PS2='UnixPrompt>'

■ ANSWERS TO OBJECTIVE-TYPE QUESTIONS ■


State True or False
5.1 True    5.2 False   5.3 False   5.4 True    5.5 False
5.6 True    5.7 True    5.8 True    5.9 False   5.10 True
5.11 False  5.12 True   5.13 False  5.14 True   5.15 False
5.16 True   5.17 False  5.18 False  5.19 True   5.20 True

Fill in the Blanks
5.1 character, block    5.2 input     5.3 -a      5.4 gzip          5.5 gunzip
5.6 zip                 5.7 -F        5.8 -v      5.9 fsck          5.10 -atime n
5.11 -user name         5.12 -v       5.13 zcat   5.14 .Z           5.15 bzip2
5.16 -d                 5.17 export   5.18 TERM   5.19 MAIL         5.20 HOME

Multiple-choice Questions
5.1 (c)   5.2 (b)   5.3 (a)   5.4 (a)   5.5 (d)
5.6 (c)   5.7 (d)   5.8 (a)   5.9 (d)   5.10 (b)
CHAPTER 6
Manipulating Processes and Signals
After studying this chapter, the reader will be conversant with the following:
• Processes and their address space, structure, data structures describing
the processes, and process states
• Difference between a process and a thread
• Commands related to scheduling processes at the desired time, handling
jobs, switching jobs from the foreground to the background and vice
versa, etc.
• Suspending, resuming, and terminating jobs, executing commands in a
batch, ensuring process execution even when a user logs out, increasing
and decreasing priority of processes, and killing processes
• Signals, their types, and the methods of signal generation
• Virtual memory and its role in executing large applications in a limited
physical memory and mapping of a virtual address to the physical memory

6.1 PROCESS BASICS


All processes in the Unix system are created when an existing process executes a process
creation system call known as fork. The first process in the Unix system, also known as process
0, is related to bootstrapping. The process of starting a computer is known as bootstrapping or
booting. During bootstrapping, a computer runs a self test, and loads a boot program into the
memory from the boot device. The boot program loads the kernel and passes the control to the
kernel, which in turn, configures the devices, performs hardware status verification, detects
new hardware, and initializes the existing devices and system processes. After performing
these initial activities, the kernel creates an init process with process identification 1 (PID 1).
The process 0 is a part of the kernel itself and basically functions as a sched (or swapper).
It also does the job of swapping, that is, moving in and out of the processes.
The init process always remains in the background while the system is running. It is the
ancestor of all further processes. It is the init process that forks the getty process, which
enables users to log in to the Unix system. When a user logs in, the command shell runs
as the first process from where other processes are forked in response to the commands,
programs, utilities, etc., executed by the user.

Note: The process that calls fork is known as the parent process and the process that is created through
fork is known as the child process. The child process is an exact clone of the parent process: it starts
with a copy of the parent's memory, registers, and environment, and it inherits the parent's open files.
However, the parent and child processes have separate address spaces, enabling them to execute independently.
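The parent-child relationship can be observed from the shell itself. The following is a small sketch, assuming a shell that creates subshells and background commands as child processes (most shells do this by forking): the child works on its own copy of the variables, so a change made in the child is not seen by the parent.

```shell
#!/bin/sh
counter=1
echo "parent PID: $$"

# Parentheses start a subshell: a child that receives a copy of the variables.
( counter=2; echo "child sees counter=$counter" )
echo "parent still sees counter=$counter"

# A background command also runs as a child process; $! holds its PID.
true &
child_pid=$!
wait "$child_pid"
echo "child $child_pid exited with status $?"
```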

A process operates in either user mode or kernel mode:

User mode User mode is the mode in which processes related to user activities get
executed. Commands, programs, utilities, etc., executed by the user are run in this mode.
Since these processes are not critical to the system, the code in the user mode runs in a
non-privileged protection mode. Switching from user to kernel mode takes place either when a
user's process requests services from the operating system by making a system call or when an
interrupt is raised by an event such as a timer expiry, a key press, or hard disk input/output (I/O).
Kernel mode In kernel mode, the system processes, that is, the processes related to
managing a computer system and its resources get executed. The processes used to allocate
memory or to access hardware peripherals such as printers and disk drives run in this mode.
These processes are critical in nature, that is, they can make an operating system inconsistent
if they are not handled properly. Hence for security reasons, these processes are run in a
privileged protection mode.

The user and kernel modes can be better understood with the help of a block diagram
(Fig. 6.1) of the kernel architecture.
In Fig. 6.1, the users initially execute their processes in the user mode. When the user
process needs some kernel service (such as accessing memory, disk file, printer, or other
hardware peripherals), it interacts with the kernel through the system call. System calls are
functions that run in the kernel mode. Hence while executing system calls, the user process
switches from the user mode to the kernel mode.
Figure 6.1 shows the following two main components that make up the kernel:

File subsystem
The file subsystem manages the files of the Unix system. In the previous chapters, we
learnt that everything in Unix is in the form of files, that is, all devices and peripherals are
considered files. Communication between the hardware and their respective device drivers
is managed by the file subsystem. Even the buffers that are used for storing the data that
is either fetched from the devices or is to be written to the devices are managed by the file
subsystem.

Process control subsystem


The process control subsystem manages all the tasks required for successful execution
of processes. It allocates memory to the processes and schedules, synchronizes, and even
implements communication between them. The processes are basically executable files
that are designated for certain tasks. For loading the executable file into the memory,
the process control system interacts with the file subsystem and thereafter executes it to
perform the required action.

[Figure: the user, in user mode, reaches the kernel through the system call interface. In
kernel mode, the file subsystem (with the buffer cache and device drivers) and the process
control subsystem (with interprocess communication, scheduler, and memory management) sit
above the hardware drivers/interface, which in turn sits above the hardware.]
Fig. 6.1 Block diagram of kernel architecture

The process control subsystem comprises the following three modules:
Interprocess communication An application usually consists of several processes that
undergo execution simultaneously. In addition, the data processed by one process has to be
input into another process for further processing. This module performs all the tasks required
to establish communication among the different processes and also synchronizes them. By
process synchronization, we imply that the module manages the locks when two processes
update a particular type of content, that is, it ensures that no two processes update the same
data simultaneously.
Memory management This module manages memory allocation. It allocates memory to
the required process. If the memory is not enough, it transfers certain selected pages of the
current process to the secondary storage, hence creating space for the required process. In
addition, it frees the memory assigned to the process when it is terminated so that memory
can be assigned to some other process.

Scheduler The task of this module is to pick a ready-to-run process from the memory and
assign the CPU to it. When the current process suspends for some I/O operation, its job is
to seek the next process and schedule it for execution. In addition, when some higher
priority process comes in, the scheduler pre-empts the current process, brings in the higher
priority process, and assigns the CPU to it.
Both the file and process subsystems are used for managing the hardware of a system.
These interact with the drivers and hardware interface (part of the kernel) for getting the
desired task performed by the hardware.
We will now deal with processes in more detail, including the segments that
make them up and the structures that are involved in handling them.

6.1.1 Process Address Space


Each process runs in its private address space. A process running in user mode refers to a
stack, data, and code areas. When running in kernel mode, the process addresses the kernel
data and code areas and uses a kernel stack. In short, a process includes three segments:
1. Text: It represents the program code, that is, the executable instructions.
2. Data: It represents the program variables and other data processed by the program code.
It is a global content that can be accessed by the program and its subroutines (if any).
3. Stack: It represents a program segment that is used while implementing procedure calls
for storing information pertaining to parameters, return addresses, etc.
Besides these three segments, the process also uses a memory heap to store dynamic
structures. These are structures that are created during the execution of the process and
released when they are no longer needed, so that the memory they occupied can be reused
for other structures.

6.1.2 Process Structure


A process structure comprises a complex set of data structures that provide the operating
system with all the information necessary to manage and dispatch processes. It consists of an
address space and a set of data structures in the kernel to keep track of that process. The address
space is a section of the memory that contains the execution code, data, signal handlers, open
files, etc. The information about processes is described in the following data structures:

Process table
The process table (also known as kernel process table) is an array of structures that contains
an entry per process. Every process entry contains process control information required
by the kernel to manage the process and is hence maintained in the main memory. The
process entry is also known as process control block (PCB) and contains the following
information:
Process state It represents the process state, that is, whether it is in ready, running, waiting,
sleeping, or zombie mode.
Process identification information It uniquely identifies a process and consists of the
following three elements:
1. Process identifier (PID): This refers to a unique number assigned to identify a process.
2. User identifier (UID): This refers to the ID of the user who created the process. The process
identification also includes the group ID of the user (GID), the effective user ID (EUID), saved set
user ID (SUID), file system user ID (FSUID), the effective group ID (EGID), saved set group ID
(SGID), and file system group ID (FSGID) associated with the process.
3. Parent process identifier (PPID): This refers to the identifier of the parent process that
created the process.
Program counter It stores the address of the next instruction to be executed by this process.
CPU registers It stores the contents of the general-purpose and other registers so that the
process can be resumed from the point at which it was interrupted.
CPU scheduling information It includes the priority and other parameters on the basis of
which the scheduling of the process is determined.
Memory-management information It stores information of the memory used and released
by the process.
Accounting information It stores information such as process numbers, job numbers, and
CPU time consumed.
I/O status information It stores information such as the list of I/O devices and the status
of open files allocated to the process.
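Some of the identification fields described above can be inspected from the shell itself. The following is a minimal sketch, assuming a POSIX shell; $$, $PPID, and the id utility are standard, whereas the variable names are chosen only for illustration.

```sh
#!/bin/sh
# Identification information of the current shell process.
my_pid=$$               # PID of this shell
my_ppid=$PPID           # PID of the process that started this shell
my_uid=$(id -u -r)      # real user ID
my_euid=$(id -u)        # effective user ID
my_gid=$(id -g -r)      # real group ID

echo "PID=$my_pid PPID=$my_ppid UID=$my_uid EUID=$my_euid GID=$my_gid"
```

The values printed differ from one run to another, since the kernel assigns a new PID every time the script is started.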

User area
The Unix kernel executes in the context of certain processes. The user area (U area) refers
to private information in the context of a process. The U area of a process contains the
following:
1. User IDs that determine user privileges
2. Current working directory
3. Timer fields that store the time the process spent in the user and kernel modes
4. Information for signal handling
5. Identification of any associated control terminal
6. Identification of data areas relevant to I/O activity
7. Return values and error conditions from system calls
8. Information on the file system environment of the process
9. User file descriptor table that stores the file descriptors of the files that the process has
opened
Note: The process entry also contains certain pointers such as pointers to the user and shared text areas.
You may recall that all the information of a file, such as file data, access permissions, and access
times, is stored in an inode. Inodes are maintained in the inode table. Besides inode table, the
kernel has two other file structures known as the file table and the user file descriptor table.
File table It is a global kernel structure that contains, for each of the currently opened
files, the byte offset in the file (the location from where the next read/write operation will
start), the mode of opening, and a reference count. The file table
also contains the permissions that are assigned to the process.
User file descriptor table An individual user file descriptor table is allocated per process.
It keeps track of the files that are opened by the process.
When a process opens or creates a file, a file descriptor for it is returned by the kernel, which
is stored as a new entry created in the user file descriptor table. For reading and writing into
a file, the file descriptor in the user file descriptor table is located and the pointers from it to
the file table and inode table are used to read or write the file data (refer to Fig. 6.2).
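The effect of the byte offset kept for an open file can be observed from the shell. The following is a minimal sketch, assuming a POSIX shell and the widely available mktemp utility; the descriptor number 3, the file contents, and the variable names are arbitrary choices made for illustration.

```sh
#!/bin/sh
# Create a scratch file with three lines.
tmpfile=$(mktemp)
printf 'first\nsecond\nthird\n' > "$tmpfile"

exec 3< "$tmpfile"      # open the file for reading on descriptor 3
read -r line1 <&3       # the read starts at offset 0 and advances the offset
read -r line2 <&3       # the next read continues from the saved offset
exec 3<&-               # close descriptor 3

rm -f "$tmpfile"
echo "$line1 $line2"
```

The second read returns the second line rather than the first one, because both reads go through the same open file and therefore share one offset.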

Fig. 6.2 Relation between user file descriptor table, file table, and inode table

After understanding the process table, let us discuss the next structure that stores information
that is private to the process.

Per process region table


The kernel process table points to per process region table as each process has a per process
region (pregion) table associated with it. The per process region table in turn points to the
region table to indicate the regions that are private to it and the regions that are shared with other
processes. This is to say that an entry in the region table may be shared with other processes
too. The per process region table is used to keep the following information of a pregion:
1. A pointer to an inode of the source file that contains a copy of the region, if any exists
2. The virtual address of the region
3. Permissions of the regions, that is, whether the region is read-only, read–write, or read–
execute
4. The region types (e.g., text, data, and stack)

Region table
A region is a contiguous area of a process’s address space such as text, data, and stack.
Region table entries indicate whether the region is shared or private. They also point to the
location of the region in the memory (refer to Fig. 6.3). A region table stores the following
information:
1. Pointers to inodes of files in the region
2. The type of region
3. Region size
4. Pointers to page tables that store the region
5. Bit indicating if the region is locked
6. The process numbers currently accessing the region

Fig. 6.3 Structures that make up a process (blocks shown: process table with entries for Process1 and Process2, a U area with a user file descriptor table for each process, per process region table, and region table with text, data, and stack regions, one region being shared by the two processes)

6.1.3 Creation and Termination of Processes


Besides the system processes that are created automatically on booting the Unix system, we can also
create our own processes. In addition, the processes can be terminated after their tasks are
completed in order to release the resources acquired by them. In Chapter 7, we will learn
about the system calls that are required to create, suspend, and terminate a process.
A process consumes system resources, such as memory, disk space, and CPU time. If
there is more than one process running at a time, the kernel allocates system resources to one
process, while keeping other processes waiting.
Let us see how the processes change their states and undergo transitions.

6.2 PROCESS STATES AND TRANSITIONS


A process is created through the fork system call, and depending on the availability of the
primary memory, it is either kept in the memory in the ready to run state or is swapped out to
the secondary memory in the ready to run swapped out state, as shown in Fig. 6.4. The kernel
monitors the processes that are in the ready to run state in the memory and schedules a process
depending on the algorithm used by the operating system. When scheduled, the process
executes in the kernel mode, that is, it switches to the kernel running state. From the kernel
running state, the process can be moved to the pre-empted state if a process of a higher
priority is scheduled. As a result, process switching takes place, wherein the current process
is switched to the pre-empted state and another process is scheduled to switch to the kernel
running state. In addition, the process running in kernel mode can return to the user mode,
that is, to the user running state. Besides this, a process in the kernel running state can also
switch to the sleep state, waiting for the occurrence of an event (like waiting for the user to
enter some data). This state is known as the asleep in memory state. The process in the kernel
running state can also terminate, switching itself to the zombie state. A zombie process is a
dead child process that has completed its execution and whose parent has been sent a SIGCHLD
signal, allowing the parent to read its exit status. Until and unless the parent reads the exit
status of the child process, its entry remains in the process table. The process sleeping in the
memory can either be swapped out to the secondary storage, moving to the sleep and swapped out
state, or be woken up to move to the ready to run in memory state, if the event that it was
waiting for occurs. When the awaited event occurs, the process in the sleep and swapped out
state is moved to the ready to run swapped out state, where it waits for the swapper to move it
to the ready to run in memory state whenever it is required.

Fig. 6.4 Different states of a process (states shown: created, ready to run in memory, ready to run swapped out, kernel running, user running, pre-empted, asleep in memory, sleep and swapped out, and zombie)
Note: When a process is required, space for it is created in the primary memory and the swapper swaps the
process into the primary memory, switching it to the ready to run state.
The process in the pre-empted state returns to the user mode, that is, the user running state,
when the scheduler selects it to run again. The process running in the user mode switches to the
kernel mode when an interrupt occurs, a system call is made to access operating system services, or
when some fault or exception occurs.
Note: The scheduler decides the process that has to be submitted next to the CPU for action.
The different states of a process are briefly described in Table 6.1.
Almost all the process states discussed are self-explanatory, except one, the zombie
process. We will elaborate on this in Section 6.3.

Table 6.1 Unix process states


State Description
User running The process executes in user mode.
Kernel running The process executes in kernel mode.
Ready to run in memory The process is ready to run as soon as the scheduler schedules it.
Asleep in memory The process is unable to execute until an event occurs; the process is in main memory
(a blocked state).
Ready to run, swapped out The process is ready to run, but the swapper must swap the process into the main memory
before the scheduler can schedule it to execute.
Sleeping, swapped The process is waiting for an event and has been swapped to the secondary storage
(a blocked state).
Pre-empted The process is in the suspended mode as the higher priority process is scheduled and
switched to the kernel running state.
Created The process is newly created and not yet ready to run.
Zombie The process has completed execution and is currently dead, but still has an entry occupied in
the process table, waiting for the parent to read its exit status.

6.3 ZOMBIE PROCESS


A zombie process is a process that has completed execution and is currently dead, but still has
an entry occupied in the process table, waiting for the process that started it to read its exit status.
Apart from this process table entry, the zombie process does not consume any memory or other
resources. A zombie arises when a child process terminates before its parent process has read its
exit status. Since the parent process has not yet received its return value, the process remains a zombie.
We can identify a zombie process by executing the ps command. Zombie processes
contain a character Z in their state field (S), as shown in the following output:

$ ps -el
F S UID PID PPID C PRI NI     ADDR   SZ    WCHAN TTY
1 Z   0 146    0 0   0 20 fec20000    0 d6dd12f2 tty01
0 S   1   6    0 0  40 15 d0b5c488  635 d08f8bfc tty01
1 R   0   3    1 0   0 12 d0b5aad8 1175 d29bb8f2 tty01

The description of the output is as follows:


1. F represents flags associated with the process.
2. S represents the state of the process.
3. UID represents the user ID.
4. PID represents the process ID.
5. PPID represents the parent process ID.
6. C represents the utilization of the processor.
7. PRI represents the scheduling parameters for a process.
8. NI represents the nice value (discussed in Section 6.8.6) assigned to the process.
9. ADDR represents the memory address of the process.
10. SZ represents the total number of pages in the process.
11. WCHAN represents the address of the event on which the process is sleeping.
12. TTY represents the terminal from where the process is created.
The character Z in the S column confirms that it is a zombie process. We can see in this
output that the process with PID 146 is a zombie. The other characters that may appear in the
S column to show the current state of the process are shown in Table 6.2.

Table 6.2 Brief description of the characters that may appear in the S column
State Description
D Indicates a process waiting on disk I/O (uninterruptible sleep)
I Indicates an idle process
R Indicates a runnable process
S Indicates a sleeping process
T Indicates a stopped process
Z Indicates a zombie process
A zombie process cannot be removed by sending it a signal, because it has already terminated.
Hence, even the following statement has no effect on the zombie process with PID 146 shown in
the output:
$ kill -9 146
To remove a zombie process, its parent is informed that the child has died by
sending a SIGCHLD signal manually to the parent using the kill command. Thereafter, the signal
handler of the parent executes the wait system call that reads the exit status of the zombie and
removes it. In case a parent fails to call the wait system call, the zombie will be left in the
process table. Once removed from the process table, the zombie’s process ID and entry in this
table can be reused. In case the parent process refuses to remove the zombie, we can forcefully
remove the zombie by terminating the parent process; the zombie is then inherited by the init
process, which reads its exit status.
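The role of the parent in reading the exit status can be seen from the shell, whose wait built-in collects the exit status of a child started in the background. The following is a minimal sketch, assuming a POSIX shell; the exit value 7 and the variable names are arbitrary choices made for illustration.

```sh
#!/bin/sh
# Start a child in the background that terminates at once with exit status 7.
sh -c 'exit 7' &
child=$!                # PID of the child, as seen by the parent shell

# The parent collects the exit status of the child. Once the status has
# been read, the kernel can release the process table entry of the child.
wait "$child"
status=$?

echo "child $child exited with status $status"
```

A parent that never performs this step is what leaves the zombie entries described above in the process table.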
Note: A zombie process is not the same as an orphan process. An orphan process is a process that is still
executing, but whose parent has died. Orphan processes do not become zombie processes, because when a
process loses its parent, the init process becomes its new parent.
What is the name of the task that suspends the execution of one process on the CPU while
resuming execution of some other process? It is called context switching. We will now
discuss this in detail.

6.4 CONTEXT SWITCHING


Depending on the priority, a current running process can be switched from the running state
to the blocked state (pre-empted, sleeping, etc.) at any time, and higher priority processes
can be scheduled to run. The state of the blocked process is saved so that in the future, it
can run further from the state at which it was held. The tasks conducted during process
switching, that is, saving the state of the current process and loading the saved state of the
new process, are together known as context switching. During a context switch, the contents of
the program counter and other registers of the blocked process are saved. In addition, its PCB is
updated to change its state from the running state to a pre-empted, sleeping, or other state.
The PCB of the process that is scheduled to run is updated to indicate its running state.
We usually get confused while differentiating between processes and threads, as both are
meant for processing. Then what is the difference between the two? Let us clarify this confusion.

6.5 THREADS
A thread is the smallest unit of processing. A process can have one or more threads. This
is shown in Fig. 6.5. Multiple threads within a process share memory resources whereas
different processes do not share these resources. In multithreading, a processor switches
between different threads. A thread has its own independent flow of control as long as its
parent process exists and dies if the parent process dies.
Figure 6.5 shows two processes, Process 1 and Process 2, in the user space. Process 1
consists of a single thread whereas Process 2 is multi-threaded.

Fig. 6.5 Threads within processes

Threads have some properties of processes. Like processes, a thread consists of the following:
1. A program counter to indicate which instruction to execute next
2. Registers to hold the values it is currently working with
3. A stack to store information related to the procedures called
Having properties similar to processes, threads are also known as lightweight processes.
6.5.1 Comparison Between Threads and Processes
Table 6.3 shows the differences between processes and threads.
Table 6.3 Differences between processes and threads
Process                                             Thread
Processes are individual entities.                  Threads are part of processes.
It takes quite a long time to create and            It takes less time to create a new thread than a
terminate a process.                                process, because the newly created thread uses the
                                                    current process address space. Similarly, it takes
                                                    less time to terminate a thread than a process.
It takes longer to switch between two processes     It takes less time to switch between two threads
as they have their individual address spaces.       within the same process as they use the same
                                                    address space.
Communication of data among processes is quite      Communication of data among threads is quite easy
complex as it requires an interprocess              as they share a common address space.
communication mechanism.

Similar to a traditional process, a thread can be in any one of the following states: running,
blocked, ready, or terminated. A running thread is the one to which the CPU is assigned
and is currently active. A blocked thread is the one that is waiting for some event to occur.
On occurrence of the event, the blocked thread turns into a ready state. A ready thread is a
thread that has all the resources except the CPU, and hence waits for the CPU’s attention.
The thread that has completed its work is said to be in a terminated state. A thread can also
be terminated in between, if desired by the process.

When a process is created, it is assigned a unique identification number known as the process
identifier (PID) by the kernel. The PID value can be any value from 0 to 32767. However, this
range depends on the particular Unix variant. The PID is of type pid_t, whose size may vary from
system to system. The name of the process remains the same as the name of the program being
executed. Every process is created from a parent process. The process that is created is known
as the child process and the process from which it is created is known as its parent process.
Unix creates the first process with PID 0 when the system is booted.
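The parent and child relationship can be observed with a short script. The following is a minimal sketch, assuming a POSIX shell and the widely available mktemp utility; the temporary file and the variable names are used only for illustration.

```sh
#!/bin/sh
tmpfile=$(mktemp)

# Start a child shell. The child writes the PID of its parent to the file.
sh -c 'echo "$PPID"' > "$tmpfile"
child_ppid=$(cat "$tmpfile")
rm -f "$tmpfile"

echo "PID of this shell:               $$"
echo "parent PID reported by the child: $child_ppid"
```

When the script is run directly, the two numbers printed are expected to be the same, because the script's shell is the parent of the child shell.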
Let us take a look at the commands that give us information of the processes running in
our system.

6.6 ps: STATUS OF PROCESSES


This command is used to display the list of processes that are running at the moment. The list
of the processes along with their PID number, terminal from where the process is executed,
the elapsed time (time consumed by the process since it started), and the name of the process
will be displayed.
Syntax ps [-a] [-e] [-f] [-l] [-L] [ -u user_name ] [ -g group_IDs ]
[ -t terminal] [ -p process_IDs ]

The options of the ps command are briefly described in Table 6.4.


Table 6.4 Brief description of the options used in the ps command
Options Description
-a It displays information regarding the processes of all users that are associated with a terminal.
-e It displays information regarding every process on the system, including system processes.
-f It displays full information regarding each process.
-l It displays a long listing.
-L It displays threads with the lightweight process (LWP) and number of lightweight
processes (NLWP) columns.
-u user_name It displays the list of processes of the specified user. We can also specify the
user ID instead of the login name. In addition, we can specify more than one
username or user IDs separated by a space or comma (,).
-g group_IDs It displays the list of processes of the specified group_ID. We can specify more
than one group ID, separated by a space or comma (,).
-t terminal It displays the list of processes associated with the specified terminal.
-p process_IDs It displays information regarding specified process ID numbers.

Examples
(a) $ ps
PID TTY TIME CMD
739 tty01 00:00:03 sh
894 tty01 00:00:12 ps

By default, the ps command displays only the processes that are running at the user’s terminal.
(b) To get the list of processes of the other users logged in to the system, we use the following
command.
$ ps -a
PID TTY TIME CMD
739 tty01 00:00:03 sh
894 tty01 00:00:12 ps -a
224 tty02 00:00:10 sh
901 tty02 00:00:07 cat
724 tty03 00:00:08 sh

The option -a is used for displaying the processes of all the users.
(c) To get the list of processes of a particular user, we give the following command.
$ ps -u ravi
Here, the option -u is used for displaying a list of processes of only the specified user and ravi
is the login ID of the user whose process list we want to see. We may get the following output.
PID TTY TIME CMD
224 tty02 00:00:10 sh
901 tty02 00:00:07 cat
(d) To get complete (full) information of the processes, including the login ID of the user,
ID of the parent process, CPU time consumed, etc., we give the following command.
$ ps -f
Here, the option -f stands for full information. We may get the following output.
UID PID PPID C STIME TTY TIME CMD
ravi 423 341 3 13:01:39 tty01 00:00:01 -sh
ravi 661 423 9 13:05:78 tty01 00:00:01 ps -f

The first column (UID) displays the login ID of the user. PID stands for the process identifier
and is used for the identification of the process. PPID is the identification of the parent
process from where the current process was born (or created). C is the processor utilization of
the process, which is used for scheduling. STIME is the time when the process started. TIME is
the CPU time consumed by the process. The login shell has
PID 423 and PPID 341, which implies that the shell is the child process that was created by
a system process with PID 341. The parent PID (i.e., PPID) of the ps -f command is 423 as this
command was launched by the shell (hence the shell is the parent process of the ps -f command).
(e) To get the list of processes that are created by the user from a particular terminal, we give
the following command.
$ ps -t tty02
PID TTY TIME CMD
224 tty02 00:00:10 sh
901 tty02 00:00:07 cat

The option -t is used for specifying the terminal number.


Besides the processes that we create, there are several processes that are automatically
created by the Unix operating system at the time of booting and are used for managing
different tasks that include handling memory and other resources.

(f) To see the list of the processes that are system-generated and the ones that are running at
the current instant, we give the following command.
$ ps -e
PID TTY TIME CMD
0 ? 00:00:00 sched
1 ? 00:00:01 init
2 ? 00:00:00 vhand
3 ? 00:01:01 bdflush
970 ? 00:00:00 getty
975 ? 00:01:00 getty
Most of the processes that we see in this listing are very important for the functioning of
the Unix operating system and hence keep running continuously in the background until
the system shuts down. These processes are known as daemons as they run automatically
without any request generated from the user. Since these system processes or daemons
are not executed from any terminal, we see a ? in the column TTY in the listing provided.
We also see in the aforementioned listing that the first process is the sched (scheduler) that
schedules the next process from the ready queue and submits it to the CPU for necessary
action. The init is the parent process of a daemon and its PID is 1. The vhand is a sort of page
stealing daemon that releases pages of the memory for use by other processes. The rest of
the processes (found in the list) also help in some way or the other in the proper functioning
of the Unix system and do different tasks such as initializing the processes, swapping in
and out the active processes, and flushing the buffer for different I/O operations, among
others.
(g) To see the threads of the currently running processes, we use the following command.
$ ps -L
PID LWP TTY LTIME CMD
739 1 tty01 0:00 sh
894 1 tty01 0:00 ps
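This command shows the threads of the processes; the LWP column gives the ID of each thread. On
many systems, the NLWP column is displayed when the -L option is combined with -f. As described in
Table 6.4, LWP and NLWP represent lightweight processes and the number of lightweight processes,
respectively.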

6.7 HANDLING JOBS


A job refers to a command or program executed by the user to perform some task. As
discussed in Section 6.6, a process is nothing but a program in execution mode. In other
words, we can call our jobs as processes. The jobs or processes are controlled by the shell.
For example, the following command is a job or process.
Example $ cat letter.txt

There are two types of jobs—foreground and background. Foreground jobs are those
that appear active on the terminal and need continuous interaction with the user for their
execution. In other words, a foreground job might require input from the user and until and
unless it is completed or suspended, no other job or command can be executed, whereas
background jobs are those, which on execution, immediately display the shell prompt
allowing the user to execute other jobs. This means that the background job does not lock
the input and output terminals and instead allows the user to execute more processes.

6.7.1 fg: Foreground Jobs


Jobs that require a high level of user interaction are executed as foreground jobs. In addition,
the most preferred jobs, whose results we want to see immediately, are executed in the
foreground. The foreground job locks the standard input and output terminals and does not
allow any other job to begin until and unless it is either suspended or complete. To start a
foreground job, just type in a command followed by the Enter key.
Syntax fg [%job]

Here, %job represents the job we wish to run in the foreground.


Examples
(a) $ fg
When the fg command is used without any arguments, it resumes the current job, that is, the job
that was most recently suspended or placed in the background.
(b) $ fg %1
This statement resumes the job whose ID is 1.
(c) Any command that is issued initially runs the job in the foreground. The following sort
command executes in the foreground.
$ sort letter.txt

Suspending, resuming, and terminating foreground jobs


We can suspend any running foreground job and resume it any time we want. To suspend a
running foreground job, we press the Ctrl-z keys and to resume the suspended job, we use
the fg command.
Example
$ sort a.lst > b.lst Foreground job
Ctrl-z Suspended job
On suspending a foreground job, we immediately get the shell prompt. We can then give any
other command that we want to execute. For example,
$ date
Tuesday 10 Sep 2012 12:43:44 AM IST

To resume the suspended job, we use the fg command in the following way.
$ fg
This resumes the same suspended job, sort a.lst > b.lst (i.e., the sort command).
To terminate (kill) a running foreground job, we use Ctrl-c. After terminating the job, we
press the Enter key for getting the command prompt.

6.7.2 bg: Background Jobs


The jobs whose results are not urgent, that is, jobs that have no time constraint and usually
take a longer time to complete are executed in the background. As mentioned in the
beginning of this section, the background jobs do not lock the standard input and output
terminals and immediately display the shell prompt, allowing us to execute jobs of a higher
preference. To execute any job in the background, simply add the ampersand symbol (&)
after the command.
Syntax bg [%job]

Here, %job represents the job we want to run in the background.


Examples
(a) $ bg
When the bg command is used without any arguments, it resumes the current stopped job in the background.
(b) $ bg %1
This statement resumes or restarts the stopped background job with job ID 1.
Assuming we have a file letter.txt, we use the following command to sort the file
letter.txt in the background and save the sorted rows in the file better.txt.
$ sort letter.txt > better.txt &
[1] 53702
Since several jobs (commands) can be executed in the background, the shell issues and
displays a unique job number, along with the PID number, for each background job for our
reference. Hence, the number [1] (1 within square brackets) is the job number and 53702
is the PID number of the job (sorting of file letter.txt). We can use the job number to
stop, restart, or kill the desired background job.
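The background sort described above can also be written as a small script. The following is a minimal sketch, assuming a POSIX shell and the widely available mktemp utility; the scratch directory, the sample file contents, and the variable names are assumptions made for illustration. In a script, the PID of the last background job is available in $! and the wait built-in pauses the script until that job has finished.

```sh
#!/bin/sh
# Work in a scratch directory so that no existing file is overwritten.
tmpdir=$(mktemp -d)
printf 'pear\napple\nmango\n' > "$tmpdir/letter.txt"

# Start the sort in the background; control returns to the script at once.
sort "$tmpdir/letter.txt" > "$tmpdir/better.txt" &
sort_pid=$!                       # PID of the background job

echo "sorting in the background, PID $sort_pid"
wait "$sort_pid"                  # pause until the background job is done

sorted=$(cat "$tmpdir/better.txt")
echo "$sorted"
rm -rf "$tmpdir"
```

Any other commands placed between the & line and the wait line run while the sort is still in progress.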

Suspending, resuming, and terminating background jobs


To suspend a background job, we use the stop command. To restart it, we use the bg
command. To terminate it, we use the kill command. For all the three commands (stop, bg,
and kill), we need to specify the job number of the desired background job prefixed by the
percent (%) sign. In shells that do not provide the stop command, the command kill -s STOP
has the same effect.
Syntax stop %job|pid

Here, %job (or pid) represents the job number (or process ID) of the job that we wish to suspend.


Examples
To understand how the background jobs are stopped, resumed, or killed, let us look at the
following steps:
(a) To execute a job in the background, we give the following command.
$ sort letter.txt > better.txt &
[1] 53702
Here, [1] is the job number of the given background job.
(b) To stop the job of sorting the file letter.txt, we specify its job number in the stop
command.
$ stop %1
[1] + 53702 stopped (SIGSTOP) sort letter.txt > better.txt &
(c) To resume or restart the stopped background job (of sorting the file letter.txt), we
specify its job number in the bg command.
$ bg %1
[1] sort letter.txt > better.txt &
(d) If we do not want to sort the file letter.txt and wish to terminate the background job,
we kill the job by specifying its job number by using the following command.
$ kill %1
[1] + Terminated sort letter.txt > better.txt &
We can see that all the three commands—stop, bg, and kill—display the program name on
the right.
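The same stop, resume, and terminate sequence can be sketched at the level of signals. The following is a minimal sketch, assuming a POSIX shell; it uses the PID from $! instead of a job number, and kill -s STOP and kill -s CONT stand in for the stop and bg commands, respectively. The sleep command is only a placeholder for a long-running job.

```sh
#!/bin/sh
sleep 30 &                  # placeholder for a long-running background job
pid=$!

kill -s STOP "$pid"         # suspend the job (SIGSTOP)
kill -s CONT "$pid"         # let it continue in the background (SIGCONT)
kill "$pid"                 # terminate it with the default signal, SIGTERM

wait "$pid"                 # collect the exit status of the terminated job
status=$?
echo "job $pid ended with status $status"
```

An exit status greater than 128 indicates that the job was ended by a signal and did not finish on its own.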

6.7.3 Switching Jobs from Background to Foreground and Vice Versa


Sometimes, we might want a task running in the background to finish a little faster, or we may
expect a background job to request user input. In such cases, we switch the background job
to the foreground. Similarly, we may need to switch a task running in the foreground
to the background so that we can execute other jobs of a higher priority. A foreground job
must first be suspended (by pressing Ctrl-z) before it can be switched to the background with
the bg command. To switch a background job to the foreground, we use the fg command.
Since the jobs run in the background, we might forget their job numbers and would also like
to see their status (i.e., whether they are stopped or running). To get a list of all the
background jobs along with their statuses, we use the jobs command.

6.7.4 jobs: Showing Job Status


The jobs command displays all the jobs with their job number and the current status (running
or stopped mode).
Syntax jobs [-l] [-p] [%job_id] [%str] [%?str] [%%] [%+] [%-]

The options of the jobs command are briefly described in Table 6.5.

Table 6.5 Brief description of the options of the jobs command

Options   Description
-l        Displays the process ID along with the job ID for each job
-p        Displays only the process ID for each job, without the job ID
%job_id   Represents the identification number of the job whose status we wish to find out
%str      Represents the job whose command begins with the string, str
%?str     Represents the job whose command contains the string, str
%%        Represents the current job
%+        Represents the current job (same as %%)
%-        Represents the previous job

All the jobs running in the background, as well as any suspended jobs, will be displayed.
The output of the jobs command displays the job number, currency flag, and the status of
the job.
Examples
(a) $jobs
[3] + Stopped(SIGTSTP) sort letter.txt > better.txt &
[2] − Running cat abc.txt | lp
[1] Running chirag1.sh&
In this listing, we see that job 3 has a plus (+) and job 2 has a minus (−) in the second
column. These + and − signs are known as the currency flags. The plus sign (+) indicates
the default job.
The default job is the job that commands such as bg and fg act on when they are given
without a job number; it can also be named explicitly as %% or %+. For example, if we
issue the command kill %%, job number 3 will be killed as it is the default job. The
minus sign (−) marks the job that is next in line: when the default job terminates or
completes, the job with the minus sign becomes the default job, that is, its currency
flag changes from − to +.
When any job is suspended (by pressing Ctrl-z), it automatically becomes
the default job and is assigned the + currency flag. When another job is then suspended,
that one becomes the default job (getting the + currency flag) and the earlier suspended
job gets the − currency flag, and so on.
(b) To display the process ID along with the job ID, we use the -l option in the following way.
$ jobs -l
[3] + 30178 Stopped(SIGTSTP) sort letter.txt > better.txt &
[2] − 30189 Running cat abc.txt | lp
[1] 30190 Running chirag1.sh&
(c) To display the status of the job with ID 2, we give the following command.
$jobs %2
[2] - Running cat abc.txt | lp
(d) To display the status of the job that contains the lp command, we give the following
command.
$jobs %?lp
[2] - Running cat abc.txt | lp
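The jobs options described above can also be used from a script. The following is a minimal sketch, assuming a POSIX shell; the two sleep commands are stand-ins for real background jobs, and the PID list is written to a temporary file because some shells print nothing when jobs -p is run inside $( ).

```shell
#!/bin/sh
sleep 30 &                     # stand-in for the first background job
sleep 40 &                     # stand-in for the second background job

jobs -l                        # job number, currency flag, PID, state, and command

# Collect one PID per job in a scratch file.
pid_list=$(mktemp)
jobs -p > "$pid_list"
job_count=$(wc -l < "$pid_list" | tr -d ' ')
echo "background jobs: $job_count"

# Clean up the stand-in jobs (the PID list is split into separate arguments).
kill $(cat "$pid_list")
wait
rm -f "$pid_list"
```

The exact layout of the jobs -l listing differs between shells, so a script should rely on jobs -p rather than parse the listing.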

Note: Process synchronization—When more than one process runs simultaneously, it is quite possible that
they try to access and modify the same content (of a file or its region) simultaneously. This situation may
result in inconsistency and ambiguity, that is, modifications made by one process may be lost or overwritten
by the modifications performed by another. Synchronization among the processes is implemented to maintain
consistency and avoid ambiguity. Process synchronization sets up a mechanism where only one process is
able to modify the content and other processes that wish to modify the same content are compelled to wait until
the first process is complete. Enabling only a single process to modify the content ensures the integrity of the
content. We will discuss process synchronization through semaphore in detail in Chapter 14.
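The idea in the note can be sketched at shell level without semaphores. This is a simplified illustration, not the mechanism covered in Chapter 14: it assumes that mkdir fails when the directory already exists, so a lock directory can act as the "only one process at a time" gate. The names add_one, worker, counter, and lock are hypothetical.

```shell
#!/bin/sh
work_dir=$(mktemp -d)
counter="$work_dir/counter"
lock="$work_dir/lock"
echo 0 > "$counter"

# add_one: read, increment, and rewrite the counter while holding the lock.
add_one() {
    # Only one process can create the directory; the others wait and retry.
    until mkdir "$lock" 2>/dev/null; do
        sleep 1
    done
    n=$(cat "$counter")
    echo $((n + 1)) > "$counter"
    rmdir "$lock"              # release the lock
}

# worker: perform five updates.
worker() {
    i=0
    while [ "$i" -lt 5 ]; do
        add_one
        i=$((i + 1))
    done
}

worker &                       # two processes modify the same file
worker &
wait
final=$(cat "$counter")
echo "final count: $final"
rm -rf "$work_dir"
```

Without the lock, both workers could read the same value and one of the increments would be lost, which is the inconsistency the note describes.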

6.8 SCHEDULING OF PROCESSES


Scheduling of processes is a mechanism for defining a timetable according to which different
processes are executed automatically at a prescribed date and time. Any task that has to be
executed at a defined time or at regular intervals can be scheduled. For example, tasks such
as sending reminders to save files, taking a backup of data, or mailing important information
can all be scheduled to run at a specific date or time.
Unix provides several commands for scheduling a process to execute at a desired time.
The first topic we will discuss in this section is cron, a time-based job scheduler.
6.8.1 cron: Chronograph—Time-based Job Scheduler


cron is a daemon that keeps running and ticks (fires) every minute, that is, it wakes up every
minute and opens its special file to check whether any processes are waiting to be executed in
that particular minute. If no process is waiting, it goes to sleep again (to fire in the next
minute). If there are processes to be executed in that minute, it executes them and goes back
to sleep. This daemon continues to run until the Unix system shuts down.
The cron daemon starts automatically when the Unix system boots. During booting, Unix
executes the file /etc/cron (to start cron) and displays the message ‘cron started’ on
the terminal. The special file that cron opens to view the list of processes to be
executed is stored in the /usr/spool/cron/crontabs directory. We can also
create our own crontab file containing the list of processes along with their schedule and
place it in the /usr/spool/cron/crontabs directory. Let us see how a crontab file is created.

6.8.2 crontab: Creating Crontab Files


The crontab command creates a crontab file (containing the list of processes and their
schedule time) and places it in the /usr/spool/cron/crontabs directory with our login name.
For example, if your login name is Ravi, then the crontab file made for you will be created
with the name Ravi in the /usr/spool/cron/crontabs directory. The crontab file is made from
a local file that we create in our home directory. The local file can be given any name,
say, a.bat, and it must contain a list of the processes that we want to execute along with the
schedule (date and time at which we want them to be executed) in a specific format.
Syntax crontab [-l | -r | -e] [filename]
The options of the crontab command are briefly described in Table 6.6.

Table 6.6 Brief description of the options of the crontab command

Options    Description
-l         It displays the crontab file.
-r         It removes the crontab file.
-e         It edits the crontab file using the editor defined through the VISUAL or EDITOR
           environment variables. The modified crontab file is taken into consideration when saved.
filename   It refers to the optional file where the list of commands and their schedules are
           defined. The crontab file has five fields for specifying day, date, and time followed by
           the command that we wish to run at that time. The five fields are given here:
           Minute: The valid value is from 0–59.
           Hour: The valid value is from 0–23.
           Day of month: The valid value is from 1–31.
           Month: The valid value is from 1–12.
           Day of week: The valid value is from 0–6. Sunday is represented by 0.

The format in which the command and schedule are specified in the local file is as follows:
Minute  Hour  Day of month  Month  Day of week  Command
Example
Let us assume that we want the following tasks to be executed at the specified date and time.
$ cat a.bat
15 12 10,20 * * echo "Keep smiling and work hard"
0 10 1 1 * date > /dev/console
The first command will echo the message at 12:15 on the 10th and 20th day of every month.
The second command will write the date and time to the console at 10 a.m. on January 1,
every year. The asterisk in any field designates a wild card that matches any value. For
specifying more than one value we can use a comma (,). In this example we have used a comma
to specify both the 10th and 20th day of every month.
Each field in a.bat is separated by either a space or a tab. The first day of the week,
Sunday, is represented by 0.
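The way the five fields select a moment in time can be sketched as a small matching routine. This is an illustration only, not how any particular cron is implemented: it handles just the asterisk and comma-separated lists described above, it requires all five fields to match, and the function names field_matches and entry_due are hypothetical.

```shell
#!/bin/sh
set -f    # disable filename expansion so that * is kept as a literal field

# field_matches SPEC VALUE: succeed if VALUE satisfies one schedule field.
# Only "*" and comma-separated lists are handled in this sketch.
field_matches() {
    spec=$1
    value=$2
    [ "$spec" = "*" ] && return 0
    saved_ifs=$IFS
    IFS=,
    for item in $spec; do
        if [ "$item" -eq "$value" ]; then
            IFS=$saved_ifs
            return 0
        fi
    done
    IFS=$saved_ifs
    return 1
}

# entry_due SCHEDULE MINUTE HOUR DAY MONTH WEEKDAY (hypothetical helper):
# succeed if all five fields of SCHEDULE match the given moment.
entry_due() {
    schedule=$1
    shift
    set -- $schedule "$@"      # $1-$5: schedule fields, $6-${10}: current values
    field_matches "$1" "$6" &&
        field_matches "$2" "$7" &&
        field_matches "$3" "$8" &&
        field_matches "$4" "$9" &&
        field_matches "$5" "${10}"
}

# Check the first entry of a.bat against 12:15 on the 10th of month 3, weekday 2.
if entry_due "15 12 10,20 * *" 15 12 10 3 2; then
    echo "run: echo \"Keep smiling and work hard\""
fi
```

A daemon following this idea would call entry_due once per minute for every line of the crontab file and run the command part of each line that matches.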
When we execute the crontab command, the following occurs:
$ crontab a.bat
The contents of a.bat are automatically transferred to the /usr/spool/cron/crontabs directory
where they are stored in a file with our login name. From there onwards, the cron daemon will
read this file (crontab file) and execute the commands (processes) specified in it regularly.
If we want to make some changes in the scheduling of the processes, we need to edit
our local file a.bat (in our home directory) and, after saving the changes, execute
the crontab command again to transfer it to the /usr/spool/cron/crontabs directory under
our login name (the earlier crontab file will be replaced by the new one).
To view the commands that we have supplied to our crontab file, we use the -l option with
the crontab command:
$crontab -l
To remove the crontab file, we use the following command:
$ crontab -r
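Because crontab replaces the earlier crontab file each time a local file is submitted, it can help to check the local file before installing it. The following is a minimal sketch, assuming a POSIX shell and awk; check_schedule is a hypothetical helper that only verifies that each entry has five time fields followed by a command, and the crontab call itself is left commented out.

```shell
#!/bin/sh
# Write the schedule from the example to a scratch file.
schedule_file=$(mktemp)
cat > "$schedule_file" <<'EOF'
15 12 10,20 * * echo "Keep smiling and work hard"
0 10 1 1 * date > /dev/console
EOF

# check_schedule FILE: fail if any entry has fewer than six fields.
check_schedule() {
    awk '
        /^[ \t]*#/ || /^[ \t]*$/ { next }   # skip comments and blank lines
        NF < 6 {
            printf "line %d: expected five time fields and a command\n", NR
            bad = 1
        }
        END { exit bad }
    ' "$1"
}

if check_schedule "$schedule_file"; then
    echo "schedule looks well formed"
    # crontab "$schedule_file"    # uncomment to install the schedule
fi
rm -f "$schedule_file"
```

The check is deliberately shallow: it does not validate the range of each field, only the shape of each line.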
Another command that allows the scheduling of processes is the at command. We will now
study this.

6.8.3 at: Scheduling Commands at Specific Dates and Times


The at command is used for executing Unix commands once at a specific date and time. Tasks
such as taking a backup of the disk or sending mail messages at odd hours can be easily
accomplished using the at command. We can type the Unix commands (to be executed) at the
command prompt or save them in a file and use the file to execute the commands.
Syntax at [-f filename] [-m] [-l] [-r] time

The options of the at command are briefly described in Table 6.7.

Table 6.7 Brief description of the options of the at command

Option        Description
-f filename   It reads the commands to be executed from the specified filename instead of the
              standard input.
-m            It mails the user when the commands are executed.
-l            It lists the commands that are scheduled to run.
-r            It cancels the scheduled command.
time          It indicates the time at which we wish to execute command(s). We can define the
              time either specifically or relatively. The specific time can be given in the
              following format:
              hh:mm am/pm
              Here, am/pm indicates that the time is in the 12-hour format. Without am/pm, the
              time is assumed to follow a 24-hour clock. We can also specify an optional time
              zone such as EST and GMT after the time.
              The time can be relatively specified in any of the following ways:
              now: This indicates the current day and time.
              today: This indicates the current day.
              tomorrow: This indicates the day following the current day.
              midnight: This indicates the time 12:00 a.m., that is, 00:00.
              noon: This indicates the time 12:00 p.m.

We can also specify a future time by adding a plus sign (+) followed by a number of minutes,
hours, days, weeks, months, or years.
Examples
(a) $at 18:00
echo "Office time over. Time to log out"> /dev/tty02
Ctrl-d
Job 3434443.a at Sun Nov 16 18:00:00 IST 2010
On pressing Ctrl-d, the at command displays the job number and the date and time
of the scheduled execution of the echo command. The job number terminates with ‘a’,
indicating that this job has been submitted using the at command.
Now, the following message will be echoed on the tty02 terminal at 6:00 p.m.
Office time over. Time to log out

Note: When the output is redirected to a terminal, as is done in the aforementioned command (/dev/tty02),
the message is echoed on that terminal's screen. When no redirection is specified, the output is sent
to the user through mail.
(b) We can also execute the commands stored in a file as shown in the following example.
$at 18:00
jobstodo.sh
Ctrl-d
Job 3434443.a at Sun Nov 16 18:00:00 IST 2010
By executing this command, all the commands stored in the script file jobstodo.sh will
be executed at 6:00 p.m. and their outputs will be mailed to us. You may recall that if the
redirection is not specified for any command, its output is sent to the user through mail.
(c) We can also add am or pm to the time. For example, in the
aforementioned command, we can write $at 18:00 as $ at 6pm
When this at job runs at 6 p.m., we will see the message ‘you have mail’ on our
screen.
(d) To view the output of the aforementioned command, we use the following mail command.
$mail
message 1:
To: ravi
Date: Sun Nov 16 18:00:00 IST 2010

Office time over. Time to log out
$
(e) To schedule jobs from the given file, we give the following command.
$ at -f jobs.txt 11:00 today
All the commands specified in the file jobs.txt will be executed at 11:00 on that
particular day.

Note: The commands specified in jobs.txt will still run even if we exit from the system.

(f) To view the list of jobs submitted using the at command, we give the following
command.
$ at -l
(g) To remove scheduled jobs from the job queue, we use the following command.
$ at -r 3434443
This command will remove job 3434443 from the job queue.
(h) We can use a lot of keywords when specifying the time for scheduling jobs such as now, today,
tomorrow, noon, day, year, month, hours, and minutes. The following are some examples.
(i) $ at now + 2 hours
(ii) $ at now +1 week
(iii) $ at 6pm today
(iv) $ at 6pm next month
(v) $ at 6pm Fri
(vi) $ at 0915 am Nov 16
(vii) $ at 9:15 am Nov 16
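In a script, the commands can be passed to at on its standard input instead of being typed and ended with Ctrl-d. The following is a minimal sketch under assumptions: submit_at and DRY_RUN are hypothetical names, and with DRY_RUN=1 (the default here) the helper only prints the pipeline it would run, so nothing is actually scheduled.

```shell
#!/bin/sh
# submit_at TIMESPEC COMMAND (hypothetical helper): hand COMMAND to at on its
# standard input. With DRY_RUN=1 it only prints the pipeline it would run.
DRY_RUN=${DRY_RUN:-1}

submit_at() {
    timespec=$1
    command_line=$2
    if [ "$DRY_RUN" = 1 ]; then
        printf "would run: echo '%s' | at %s\n" "$command_line" "$timespec"
    else
        # $timespec is left unquoted so that "now + 2 hours" becomes separate words
        printf '%s\n' "$command_line" | at $timespec
    fi
}

submit_at "now + 2 hours" "sort letter.txt > better.txt"
submit_at "6pm tomorrow" "date"
```

Setting DRY_RUN=0 before calling the script would submit the jobs for real, provided the at command is installed and the user is permitted to use it.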
The two commands that are often discussed along with the at command are atq and atrm.

atq This command lists the jobs that are scheduled to run, similar to the at -l command.
The jobs are displayed along with their job number, date, hour, etc.
Syntax atq

Example
$ atq
324556 2012-10-15 10:30 a sort a.txt
324557 2012-10-16 07:00 a date
atrm This command deletes the specified job number, similar to the at -r command.
Syntax atrm job_no

Example $ atrm 3434443

This command will remove job 3434443 from the job queue.

Note: The difference between the at and crontab commands is that a job scheduled by the at command
runs only once and has to be rescheduled after its execution (if we want to execute it again). On the
other hand, crontab carries out the submitted job repeatedly according to its schedule, without the
need for rescheduling.
6.8.4 batch: Executing Commands Collectively


As the name implies, the batch command is used for issuing a set of commands that we want
to execute collectively (in a batch). The commands given in the batch will be executed later
when the system load permits, that is, when the CPU is free, it will execute the commands
specified by the batch command.
Syntax batch [-f filename] [-m] [-l] [-r] time
The options of the batch command are briefly described in Table 6.8.
Table 6.8 Brief description of the options of the batch Examples
command
(a) $batch
Option Description echo "Keep smiling and work
-f filename It reads the commands to be executed hard"
from the specified filename instead of the date > /dev/console
standard input. sort letter.txt > better.txt
Ctrl d
-m It sends mails to the user when the
job 6646566.b at Sun Nov 16
commands are executed.
18:00:00 IST 2010
-l It lists the commands that are collected to
On pressing Ctrl-d, we will get a
run in a batch.
job number that terminates with b
-r