UNIX and Shell Programming
B.M. Harwani
Founder & Owner
Microchip Computer Education (MCE)
Ajmer
Oxford University Press is a department of the University of Oxford.
It furthers the University’s objective of excellence in research, scholarship,
and education by publishing worldwide. Oxford is a registered trade mark of
Oxford University Press in the UK and in certain other countries.
Published in India by
Oxford University Press
YMCA Library Building, 1 Jai Singh Road, New Delhi 110001, India
ISBN-13: 978-0-19-808216-3
ISBN-10: 0-19-808216-9
Typeset in Times
by Quick Sort (India) Private Limited, Chennai
Printed in India by Raj Kamal Electric Press, Kundli, Haryana
KEY FEATURES
The book is packed with numerous student-friendly features that are described here.
• Complete scripts along with their outputs are provided for easy implementation of the concepts learnt.
• Each command is explained with its syntax with the help of multiple examples.
• Several options of a single command have been provided in a tabular format along with their function,
description, and examples for quick understanding and usage.
• Numerous notes are interspersed with the text for providing additional relevant information.
• Around 1000 solved examples and over 900 end-chapter exercises (with answers to objective-type
questions) are provided.
• Specially designed brain teasers are provided at the end of most chapters for the readers to develop
an analytical approach to problem-solving.
• A variety of objective-type questions—state true or false, fill in the blanks, and multiple-choice
questions—are provided at the end of every chapter for testing the understanding of the concepts
learnt.
• Several review questions and programming exercises are provided for the reader to practise the
commands and scripts explained in the chapters.
ONLINE RESOURCES
The companion website of the book, http://oupinheonline.com/book/harwani-unix-shell-programming/9780198082163, provides the following additional resources:
For faculty
• Chapter-wise PowerPoint Slides
• Answers to select programming exercises given in the book
For students
• Chapter-wise executable and complete shell scripts and codes for all the programs given in the
book
• Mail Organizer—a small project that sends mail to the desired recipient on a given date
• Inventory Management System—a small project that explains maintenance of inventory using
MySQL database server
• Debugging exercises with solutions
• Flashcards—for active recall of all important Unix commands
basic calculators, displaying information about current systems, deleting symbolic links, and exiting
from a Unix system.
Chapter 4, Advanced Unix Commands, discusses advanced commands such as setting access permissions
for the existing files and directories, setting default permissions for the newly created files and directories,
creating groups, changing ownerships of the files, and sharing files among groups. The chapter covers
commands for sorting content, performing I/O redirections, cutting the file vertically, pasting content,
splitting files, counting characters, words, and lines in files, using the pipe operator, comparing files,
eliminating and displaying duplicate lines, among others.
Chapter 5, File Management and Compression Techniques, explains the types of devices, role of device
drivers, and the way in which devices are represented in the Unix operating system. It details different
disk-related commands required for copying, formatting, finding usage, finding free space, and making
partitions. It also covers compression and decompression of files.
Chapter 6, Manipulating Processes and Signals, focuses on processes and their address space, structure,
data structures describing the processes and process states, commands related to scheduling processes
at the desired time, handling jobs, and switching jobs from the foreground to the background and vice
versa. It explains suspending, resuming, and terminating jobs, executing commands in a batch, ensuring
process execution even when a user logs out, increasing and decreasing the priority of processes, and
killing processes. The chapter also discusses signals, their types, and the methods of signal generation,
virtual memory and its role in executing large applications in a limited physical memory, and mapping
of a virtual address to the physical memory.
Chapter 7, System Calls, is devoted to the role of system calls in performing different tasks. The chapter
explains system calls that are used in file handling operations such as opening, creating, reading from
and writing to files, closing, deleting, and linking to files, changing file access permissions, accessing
file information, and relocating and duplicating file descriptors. The chapter covers the system calls that
perform different tasks related to directory handling such as changing, opening, and reading directories.
The chapter throws light on the system calls involved in process handling operations such as exec(),
fork(), and wait(), and on those that deal with memory management (allocating memory, freeing
memory, and changing the size of the allocated memory), as well as file locking and record locking.
Chapter 8, Editors in Unix, explains the usage of the stream editor (sed) in filtering out the desired data
from the specified file, inserting lines, deleting lines, saving filtered content into another file, loading the
content of another file into the current file, and searching for content that matches specific patterns. The
chapter also explains the visual editor (vi) and the modeless editor (emacs).
Chapter 9, AWK Script, discusses the role of the AWK scripts in filtering and processing content. It
explains the different functions used in AWK for printing results, formatting output, and searching for
desired patterns. The chapter also details different operators (comparison, logical, arithmetic), functions
(string, arithmetic, and search and substitute), and built-in variables to perform the desired operations
quickly and with the least effort. It also discusses the different loops used to perform repetitive tasks, and reading input
from the user to perform operations on the desired content.
Chapter 10, Bourne Shell Programming, explains different command line parameters used in Bourne
shell scripts, conditional statements, loops, reading input, displaying output, testing data, translating
content, and searching for patterns in files. The chapter also covers displaying the exit status of the
commands, applying command substitution, sending and receiving messages between users, creating
and using functions, setting and displaying terminal configurations, managing positional parameters,
and using fetch options in the command line.
Chapter 11, Korn Shell Programming, helps us in understanding different features of the Korn shell,
command line editing, file name completion, command name aliasing, command history substitution,
and metacharacters. It explains different operators, shell variables, basic I/O commands, command line
arguments, if else and case statements, strings, files, loops, arrays, functions, and I/O redirection.
Chapter 12, C Shell Programming, describes the C shell and its different features. The chapter explains
command history, command substitution, filename substitution (globbing), filename completion, and
aliases. It also covers job control, running jobs in the background, and suspending, resuming, and killing
jobs. It explains environment variables, shell variables, built-in shell variables, customizing the shell,
and C shell operators. The chapter also discusses different flow control statements, loops, arrays, and
errors.
Chapter 13, Different Tools and Debuggers, describes language development tools Yacc, Lex, and M4 and
text-formatting tools, troff and nroff. The chapter covers different preprocessors for nroff and troff
such as tbl, eqn, and pic. The chapter also discusses debugger tools, dbx, adb, and sdb.
Chapter 14, Interprocess Communication, covers pipes and messages as well as accessing, attaching,
reading, writing, and detaching shared memory segments. It helps readers get acquainted
with initializing, managing, and performing operations on sockets (stream and datagram), I/O
multiplexing, filters, and semaphores.
Chapter 15, Unix System Administration and Networking, discusses the Unix booting procedure,
mounting and unmounting file systems, managing user accounts, network security, and backup and
restore.
ACKNOWLEDGEMENTS
I thank my family, my small world: my wife, Anushka, and my wonderful children, Chirag and Naman,
for inspiring and motivating me and forgiving me for spending long hours on the computer during
the course of development of this book.
Speaking of encouragement, I must thank my students who, with their innumerable queries, helped
me understand the essential expectations of a reader. This in turn made me add numerous examples
and exercises, thus giving a practical approach to the book.
My acknowledgements would remain incomplete if I did not thank the editorial team at Oxford
University Press, India, who supported me throughout the development of this book. My special
thanks are due to the reviewers for their constructive comments and valuable suggestions.
I have tried to cover the necessary topics and explain them in a simple and user-friendly manner.
Any comments or suggestions that can be incorporated in future editions of this book may be sent to me
at [email protected].
B.M. Harwani
Brief Contents
Features of the Book iv
Preface vi
Detailed Contents xi
1. Unix: An Introduction 1
2. Unix File System 13
3. Basic Unix Commands 27
4. Advanced Unix Commands 59
5. File Management and Compression Techniques 94
6. Manipulating Processes and Signals 148
7. System Calls 192
8. Editors in Unix 258
9. AWK Script 305
10. Bourne Shell Programming 378
11. Korn Shell Programming 480
12. C Shell Programming 558
13. Different Tools and Debuggers 624
14. Interprocess Communication 653
15. Unix System Administration and Networking 672
Index 697
Detailed Contents
Features of the Book iv
Preface vi
Brief Contents x
1. Unix: An Introduction 1
1.1 Operating System 1
1.1.1 Functions of Operating Systems 2
1.2 History of Unix 3
1.3 Overview and Features of Unix System 4
1.3.1 Multitasking 4
1.3.2 Multi-user 5
1.3.3 Portability 5
1.3.4 Job Control 5
1.3.5 Tools and Utilities 5
1.3.6 Security 6
1.4 Structure of Unix System 6
1.4.1 Hardware 6
1.4.2 Kernel 7
1.4.3 Shell 8
1.4.4 Tools and Applications 9
1.5 Unix Environment 9
1.5.1 Stand-alone Personal Environment 10
1.5.2 Time-sharing Environment 10
1.5.3 Client–Server Environment 10

2. Unix File System 13
2.1 Introduction to Files 13
2.1.1 Types of Files 13
2.1.2 Symbolic Links 15
2.1.3 Pipes 15
2.1.4 Sockets 16
2.2 Organization of File Systems 16
2.3 Accessing File Systems 17
2.3.1 Mounting File Systems 18
2.3.2 Unmounting File Systems 18
2.4 Structure of File Systems 20
2.4.1 Boot Block 20
2.4.2 Super Block 20
2.4.3 Inode Block 21
2.4.4 Data Block 24

3. Basic Unix Commands 27
3.1 login: Logging in to Systems 27
3.2 Overview of Commands 28
3.2.1 Structure 29
3.2.2 Types of Commands in Unix 29

4. Advanced Unix Commands 59
4.1 Overview 59
4.2 File Access Permissions 60
4.2.1 chmod: Changing File Access Permissions 61
4.2.2 umask: Setting Default Permissions 62
4.2.3 chown: Changing File Ownership 64
4.2.4 chgrp: Changing Group Command 65
4.2.5 groups: Displaying Group Membership 66
4.2.6 groups: Sharing Files Among Groups 66
4.3 Input/Output Redirection in Unix 67
4.3.1 Output Redirection Operator 67
4.3.2 Input Redirection Operator 68
4.4 Pipe Operator 68
4.5 cut: Cutting Data from Files 68
4.6 paste: Pasting Data in Files 71
4.7 split: Splitting Files into Lines or Bytes 71
4.8 wc: Counting Characters, Words, and Lines in Files 73
4.9 sort: Sorting Files 73
9.1.2 Advantages and Disadvantages of Using AWK Filters 306
9.2 print: Printing Results 307
9.3 printf: Formatting Output 308
9.4 Displaying Content of Specified Patterns 308
9.5 Comparison Operators 309
9.5.1 ~ and !~: Matching Regular Expressions 310
9.6 Compound Expressions 312
9.7 Arithmetic Operators 315
9.8 Begin and End Sections 315
9.9 User-defined Variables 316
9.10 if else Statement 318
9.11 Built-in Variables 321
9.11.1 fs: Field Separator 322
9.11.2 ofs: Output Field Separator 322
9.12 Changing Input Field Separator 323
9.13 Functions 324
9.13.1 String Functions 325
9.13.2 Arithmetic Functions 334
9.14 Loops 337
9.14.1 for Loop 337
9.14.2 do while Loop 341
9.14.3 while Loop 342
9.15 Getting Input from User 343
9.15.1 getline Command: Reading Input 343
9.16 Search and Substitute Functions 345
9.16.1 sub() 345
9.16.2 gsub() 347
9.16.3 match() 348
9.16.4 toupper() 349
9.16.5 tolower() 349
9.17 Copying Results into Another File 361
9.18 Deleting Content from Files 363
9.19 Arrays 364
9.20 Associative Arrays 366

10. Bourne Shell Programming 378
10.1 Introduction 378
10.2 Beginning Bourne Shell Scripting 379
10.2.1 echo: Displaying Messages and Values 379
10.2.2 Variables 380
10.2.3 expr: Evaluating Expressions 380
10.2.4 let: Assigning and Evaluating Expressions 381
10.2.5 bc: Base Conversion 381
10.2.6 factor: Factorizing Numbers 382
10.2.7 units: Scale Conversion 383
10.3 Writing Shell Scripts 383
10.4 Command Line Parameters 385
10.5 read: Reading Input from Users 385
10.6 for Loop 386
10.7 while Loop 390
10.8 until Loop 392
10.9 if Statement 393
10.10 Bourne Shell Commands 394
10.10.1 test: Testing Expressions for Validity 395
10.10.2 [ ]: Test Command 397
10.10.3 tr: Applying Translation 400
10.10.4 wc: Counting Lines, Words, and Characters 403
10.10.5 grep: Searching Patterns 404
10.10.6 egrep: Searching Extended Regular Expressions 409
10.10.7 Command Substitution 411
10.10.8 cut: Slicing Input 412
10.10.9 paste: Pasting Content 413
10.10.10 sort: Sorting Input 415
10.10.11 uniq: Eliminating and Displaying Duplicate Lines 421
10.10.12 /dev/null: Suppressing Echo 422
10.10.13 Logical Operators 426
10.10.14 exec: Execute Command 429
10.10.15 sleep: Suspending Execution 434
10.10.16 exit: Terminating Programs 435
10.10.17 $?: Observing Exit Status 436
10.10.18 tty: Terminal Command 441
10.10.19 write: Sending and Receiving Messages 442
14.4.2 less Filter 667
14.4.3 tee Command 668

15. Unix System Administration and Networking 672
15.1 Unix Booting Procedure 672
15.1.1 Single-user Mode 672
15.1.2 Multi-user Mode 673
15.2 Mounting Unix File System 673
15.3 Unmounting Unix File System 674
15.4 Managing User Accounts 674
15.4.1 Creating User Accounts 674
15.4.2 Modifying User Accounts 676
15.4.3 Deleting User Accounts 676
15.4.4 Creating Groups 677
15.4.5 Modifying Groups 677
15.4.6 Deleting Groups 677
15.5 Networking Tools 678
15.5.1 ping 678
15.5.2 nslookup 678
15.5.3 telnet 679
15.5.4 arp 680
15.5.5 netstat 681
15.5.6 route 681
15.5.7 ftp 681
15.5.8 Trivial File Transfer Protocol 683
15.5.9 finger 683
15.5.10 rlogin 683
15.5.11 Unix Network Security 684
15.6 mail Command 685
15.6.1 Sending E-mails 685
15.6.2 Reading Mails 686
15.6.3 Sending Replies 687
15.6.4 Mail Commands 687
15.6.5 Saving Messages 688
15.6.6 Deleting Messages 688
15.6.7 Undeleting Messages 689
15.6.8 Quitting Mail Command 689
15.7 Distributed File System 689
15.7.1 Andrew File System 690
15.8 Firewalls 691
15.8.1 Advantages 692
15.8.2 Building Simple Firewalls 692
15.9 Backup and Restore 692
15.9.1 tar 693
15.9.2 cpio 693
15.9.3 dd 693
15.10 Shut Down and Restart 693
Index 697
CHAPTER 1
Unix: An Introduction
After studying this chapter, the reader will be conversant with the following:
• Fundamentals of operating systems
• History of Unix
• Structure of the Unix operating system
• Various types of shells and their responsibilities
• Numerous features of the Unix operating system
• The Unix environment
Even after four decades of use, Unix is regarded as one of the most powerful operating
systems, owing to its portability and its use in almost all kinds of environments, ranging from
microcomputers to supercomputers.
We cannot even think of using a computer system without an operating system. An
operating system is an interface that enables the use of a computer system’s resources;
without an operating system, the computer would be little more than an inert piece of electronics.
In this chapter, before delving into the history and structure of Unix, we will attempt
to understand the following: what an operating system is; why it is essential in running a
computer system; and in what manner Unix is different from the other operating systems
used earlier and in recent times.
As depicted in Fig. 1.1, users interact with the hardware through the operating system.
Together with other software, the operating system creates an environment that lets the user
access and use the hardware easily. Basically, the operating system acts as an interface
between the user and the hardware.
The following section discusses the functions that an operating system performs, which
enable easy operation of a computer system.
1.1.1 Functions of Operating Systems
An operating system performs the following functions:
Memory and data management All operating systems provide methods for controlling
data in the memory. When a job has to be performed, the operating system allocates
the memory required to load that job.
Communication An operating system should provide mechanisms through which processes,
as well as different computer systems, can communicate with one another to exchange data.
Time sharing Time sharing enables several people to use the same computer
simultaneously. Only some operating systems support time sharing.
Security In a multi-user environment, the operating system should provide security.
This security prevents one user from interfering with the work done or being done by another
user. It also prevents unauthorized personnel from using the computer system.
User-command interpretation Through this function, the operating system reads and
interprets the commands typed in by the user, and thus understands what the user wants.
Accounting Through this function, the operating system keeps an account of all the
resources used by different processes. Here, resources means memory, CPU time, disk space,
and so on.
Program development tools Operating systems generally provide program development
tools, which assist users in writing and maintaining programs. Support for software
development is one of the important features provided by the operating system.
Scheduling A scheduler is the heart of all multi-user operating systems. This program
enables many people to use the computer simultaneously. The scheduler assigns a CPU time
slice to a ready process. When that time slice expires, the process is placed back in the ready
queue (a process that is waiting for I/O is placed in a wait queue instead), the next process in
the ready queue is picked, and the CPU runs it.
Swapping When several users are working simultaneously, their processes are stored in
the memory. When the memory is full and a new process has to be activated, the scheduler
takes a process currently in memory and copies it to the hard disk. Next, the scheduler starts
the new process in the space freed in memory. This is known as swapping. When the turn of
the swapped-out process comes again, it is brought back into memory (swapped in), and some
other process is swapped out to make room. This feature is available in a virtual memory
environment.
Fig. 1.1 Operating system in relation to hardware, software, and users (layers, from top to bottom: Users, Software, Operating system, Hardware)
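The time slicing described under Scheduling can be illustrated with a small Bourne shell simulation of round-robin scheduling. This is only a sketch under simplifying assumptions: the function name rr_simulate, the jobs P1 to P3, their CPU times, and the quantum of 2 time units are all hypothetical, and a real kernel scheduler must also handle priorities and processes blocked on I/O.

```shell
#!/bin/sh
# Illustrative round-robin simulation; it models the idea only and does not
# interact with the real kernel scheduler.
# Usage: rr_simulate QUANTUM "name:time name:time ..."
rr_simulate() {
    quantum=$1
    queue=$2
    clock=0
    while [ -n "$queue" ]; do
        # Take the job at the head of the ready queue.
        set -- $queue
        job=$1
        shift
        queue=$*
        name=${job%%:*}        # text before the colon: the job name
        left=${job#*:}         # text after the colon: CPU time still needed
        # Run for one time slice, or less if the job needs less.
        run=$left
        if [ "$left" -gt "$quantum" ]; then
            run=$quantum
        fi
        clock=$((clock + run))
        left=$((left - run))
        if [ "$left" -gt 0 ]; then
            echo "t=$clock: $name preempted, $left left"
            queue="$queue $name:$left"   # rejoin the tail of the ready queue
        else
            echo "t=$clock: $name finished"
        fi
    done
}

# Hypothetical jobs P1, P2, P3 needing 5, 3, and 1 time units; quantum of 2.
rr_simulate 2 "P1:5 P2:3 P3:1"
```

Run as shown, the simulation reports P3 finishing at t=5, P2 at t=8, and P1 at t=9; P1 and P2 each rejoin the tail of the ready queue after using up a full time slice.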
types of architecture. At this point, the Unix Support Group (USG) was created, and it
focused on enhancing the seventh edition. In all, three groups were working on Unix. The
original versions of Unix were developed by the computer research group (CRSG) of Bell Labs.
Support for internal releases was provided by the USG. The task of developing and writing
tools was done by another group at Bell Labs, the Programmer’s Workbench (PWB).
The development of Unix split into two main branches: System V (SysV) and the Berkeley
Software Distribution (BSD). BSD was developed by students and professors at the University
of California, Berkeley. SysV was developed by AT&T and other commercial companies.
In 1979, 3BSD, the third edition, was released.
In 1980, 4.0BSD, the fourth version of the BSD Unix variant, was released.
In 1982, AT&T transferred its Unix development to Western Electric, which developed
the System III version of Unix.
In 1983, Western Electric released System V, whereas System IV was reserved for
AT&T’s internal use only.
In 1984, the USG, which had been renamed the Unix System Development Laboratory
(USDL), released System V Release 2 (SVR2), which was the first version of Unix
that supported paging, shared memory, and other associated features.
In 1985, the eighth edition of Unix was released; it was based on 4.1BSD.
In 1987, the USDL, which had been renamed AT&T Information Systems (ATTIS),
released System V Release 3 (SVR3).
In 1988, the ninth edition of Unix was developed; it was based on 4.3BSD.
In 1989, the tenth edition of Unix was developed.
As the timeline above shows, Unix, one of the most popular operating systems, was
developed step by step.
Let us now have a broad overview of the Unix system.
1.3.1 Multitasking
Unix is a multitasking operating system; that is, it can execute multiple tasks at the same time.
In a multitasking environment, the CPU processes a task, and when that task waits for
an input/output (I/O) operation to be completed, the CPU switches to another task. The
switching between tasks is so fast that the operating system appears to execute all the tasks
simultaneously. For instance, commands for printing a file, editing text, and managing files
can be given together, and all these tasks proceed concurrently. With the help of this feature,
Unix maximizes the utilization of computer resources and, hence, the computer’s efficiency.
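As a small illustration (a sketch, not one of the book’s own scripts), the Bourne shell script below starts three tasks in the background with & and then waits for all of them. The function name task is hypothetical, and sleep 1 merely stands in for real work. Because the tasks overlap, the whole run takes roughly one second instead of three; date +%s, used here for timing, is widely available but is not required by POSIX.

```shell
#!/bin/sh
# Sketch: three tasks started with '&' run concurrently.
task() {
    sleep 1                 # stands in for real work such as printing or editing
    echo "task $1 done"
}

start=$(date +%s)
task A &                    # '&' runs the task in the background
task B &
task C &
wait                        # block until every background task has exited
end=$(date +%s)
# Typically 1 or 2 seconds (whole-second timer), not the 3 seconds
# that running the tasks one after another would take.
echo "elapsed: $((end - start)) s"
```

The three "task ... done" lines may appear in any order, because the kernel decides when each background process runs.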
1.3.2 Multi-user
The multi-user feature of Unix enables several users to work simultaneously and access
system resources concurrently. The operating system not only receives commands from all
the users, but also carries out the desired processing and responds accordingly. The operating
system manages the consumption of system resources among the users and implements
locking mechanisms to maintain the integrity and consistency of applications and data that
are accessed simultaneously. The multi-user approach maximizes the utilization of computer
resources and hence reduces the cost per user. Since the system resources are shared, they
are managed so as to avoid deadlock.
Note: Deadlock is a situation wherein two or more competing actions each wait for another to finish and, as a
result, none of them ever completes.
1.3.3 Portability
Unix is portable; that is, it is available on a wide range of hardware. Since the Unix operating
system is written largely in a high-level language, C, it is less hardware dependent and,
hence, can be moved from one brand of computer to another without a major code rewrite.
In addition, it is the kernel that provides the interface between the hardware and the other
application modules. The application modules interface with the kernel and not with the
hardware; hence, when Unix is ported to another hardware platform, only the kernel, and not
the application modules, requires modification. This makes the operating system almost
hardware independent.
1.3.6 Security
Unix is considered a comparatively secure operating system. Each user is identified by a
unique user ID and belongs to a group identified by a group ID. To prevent unauthorized
access, each file and directory has an owner and a group associated with it. Three permissions
are attached to each file and directory: read (r), write (w), and execute (x). These permissions
are specified separately for three types of users: the owner, the group, and others. Hence,
we can individually assign the desired permissions to each of these three types of users. Next,
let us explore the structure of the Unix system.
1.4.1 Hardware
Hardware refers to the physical components that collectively form a computer.
The following three primary components constitute the hardware of a computer system:
I/O devices Data is supplied or entered into the computer for processing through input
devices such as the keyboard, mouse, trackball, magnetic ink character recognition (MICR),
optical character recognition (OCR), and optical mark recognition (OMR). Output devices
display processed data. The two most common output devices are the screen and the printer.
Central processing unit The central processing unit (CPU) is the heart of the computer.
It obtains the data from the user through input devices, processes the entered data into
information, and displays the information through output devices. The processed data can be
saved in the memory for future use.
Note: Networking components such as LAN cards, cables, routers, and switches are also considered part of
the hardware.
1.4.2 Kernel
The kernel is the heart of any operating system. Its main purpose is to ensure that the jobs
of the operating system are performed properly. These jobs mainly include the scheduling
of tasks, resource management, process management, and file management. Resource
management refers to the allotment of CPU time, disk space, memory space, and so on to
different processes. Process management includes the allocation of resources such as CPU,
memory, and other devices. File management includes the management of files and their
permissions, among others.
The kernel hides all the complexities of accessing hardware and provides a simpler interface
by performing these tasks behind the scenes.
A brief view of the different tasks performed by the kernel is provided in Fig. 1.3.
Let us take a quick look at the operations that a kernel can perform:
1. It controls the execution of processes by enabling their creation, termination or suspension,
and communication.
2. It schedules processes fairly for execution on the CPU. The processes share the CPU
in a time-shared manner. The CPU executes a process; the kernel suspends it when its
time quantum elapses and schedules another process to be executed. Later, the kernel
reschedules the suspended process.
3. It allocates the main memory for an executing process. The kernel enables processes to
share portions of their address space under certain conditions, but protects the private
address space of a process from outside tampering. If the system runs low on free memory,
the kernel frees memory by temporarily writing a process to secondary memory, which is
called a swap device. If the kernel writes entire processes to a swap device, the
implementation of the Unix system is called a swapping system, whereas if it writes pages
of memory to a swap device, it is called a paging system.
4. It allocates secondary memory for efficient storage and retrieval of user data. This service
constitutes the file system. The kernel allocates secondary storage for user files, reclaims
unused storage, structures the file system in a well-understood manner, and protects user
files from illegal access by unauthorized users.
Fig. 1.3 Different tasks performed by the kernel (applications, shells, and utilities sit above the system call interface; below it, the kernel handles memory management, process scheduling, file systems, and peripheral devices)
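The first kernel operation listed above, controlling the creation, suspension, and termination of processes, can be observed from the shell. In this sketch, sleep 60 is just a convenient long-running process; kill -s asks the kernel to deliver a signal, and wait collects the child’s exit status. The exact status reported for a process killed by a signal varies between shells, so the comment only promises a value above 128.

```shell
#!/bin/sh
# Sketch: ask the kernel to create, suspend, resume, and terminate a process.
sleep 60 &                  # create: the shell starts a child process
pid=$!                      # $! is the PID of the most recent background job
kill -s STOP "$pid"         # suspend: the kernel stops scheduling it
kill -s CONT "$pid"         # resume: the process becomes runnable again
kill -s TERM "$pid"         # terminate: deliver SIGTERM
wait "$pid"                 # reap the child and collect its exit status
status=$?
# A child terminated by a signal yields a status above 128 (143 in bash and dash).
echo "exit status: $status"
```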
1.4.3 Shell
The shell is an interface between the user and the kernel. The kernel does not understand
commands typed in human-readable form; hence, the shell accepts commands from the user,
interprets them, and converts them into requests that the kernel can act on. It is a program
that interprets user requests, locates the requested programs, and has them executed one at a
time. Several shells such as Bourne, Korn, Bourne-again, and C shell are available.
The shell also provides the facility of chaining, or pipelining, commands. This means that the
output of one command is sent as the input of another command for further processing. In
this manner, the same input data can be processed by several commands in turn.
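A minimal pipeline sketch: the made-up word list below passes through four commands joined by |, each one reading the previous command’s output.

```shell
#!/bin/sh
# Sketch: one stream of data processed by several commands joined with '|'.
# sort groups identical lines, uniq -c counts each group,
# sort -rn puts the largest count first, and head keeps only that line.
top=$(printf '%s\n' pear apple pear fig apple pear | sort | uniq -c | sort -rn | head -n 1)
echo "$top"                 # the count 3 followed by "pear" (padding varies)
```

No temporary files are needed: each command starts working as soon as data arrives from the one before it.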
There are two major parts of a shell. The first is the interpreter, which reads commands and
works with the kernel to execute them. The second part of the shell is a programming
capability that enables us to write shell (command) scripts. A shell script is a file that contains
a collection of shell commands to perform a specified task. It is also known as a shell
program.
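To make the idea concrete, the sketch below writes a two-line script to a scratch directory and runs it with sh. The script name greet.sh and its contents are hypothetical examples, and mktemp -d is assumed to be available. The script could instead be made executable with chmod +x and run directly, but some systems mount temporary directories with execution disabled, so invoking sh explicitly is the safer choice here.

```shell
#!/bin/sh
# Sketch: a shell script is an ordinary text file containing commands.
# greet.sh is a hypothetical name; it is written to a scratch directory.
dir=$(mktemp -d) || exit 1
cat > "$dir/greet.sh" <<'EOF'
#!/bin/sh
# Print a greeting and report how many arguments were passed.
echo "Hello, ${1:-world}"
echo "Arguments: $#"
EOF
out=$(sh "$dir/greet.sh" Unix)   # run the script, passing one argument
echo "$out"
rm -r "$dir"
```

Called with the argument Unix, the script prints "Hello, Unix" followed by "Arguments: 1".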
Types of shells
Shells are independent of the underlying Unix kernel. This fact has enabled the development
of several shells for Unix systems. Each type of shell has its own special features.
Bourne shell It is the most common shell in Unix systems and was the first major shell. It
was developed by Stephen Bourne at AT&T Bell Labs. This shell was released in 1977 and was
called ‘sh’.
Korn shell It was developed by David Korn at AT&T Bell Labs. It is built on the Bourne
shell. The most stable version of this shell was released in 1988 by AT&T’s Unix System
Laboratories as ‘ksh’. The Korn shell also incorporates features of the C shell (e.g.,
job control). One of the important features of this shell is that it can run Bourne shell
scripts without modification.
Bourne-again shell The Bourne-again shell, also known as ‘bash’, is an enhanced version
of the Bourne shell and is distributed as the standard shell on most Linux systems. It is free
software from the Free Software Foundation (FSF), developed by Brian Fox and
Chet Ramey.
C shell It is also called the programmer’s shell and exists as ‘csh’. It was developed by Bill
Joy at the University of California, Berkeley. The C shell got its name because its syntax and
usage are very similar to those of the C programming language. A compatible version of the
C shell, ‘tcsh’, is used in Linux.
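Because shells are ordinary programs, more than one may be installed on the same system. This sketch uses the POSIX command -v built-in to report which of the shells named above can be found on the current PATH; the results depend entirely on the system it is run on.

```shell
#!/bin/sh
# Sketch: report which of the shells discussed above are on this system's PATH.
# Results vary from system to system; only sh is required by POSIX.
for shell in sh ksh bash csh tcsh; do
    if loc=$(command -v "$shell"); then
        echo "$shell: $loc"
    else
        echo "$shell: not found"
    fi
done
```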
Note: We can also customize the Unix shell environment by making use of system variables known as
environment variables, which will be discussed in Chapter 10.
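As a brief preview (a sketch; the variable name GREETING is hypothetical), the script below shows the difference between an ordinary shell variable and an environment variable: a child process such as sh -c sees the variable only after it has been exported.

```shell
#!/bin/sh
# Sketch: an ordinary shell variable stays in the current shell, while an
# exported (environment) variable is copied to every child process.
unset GREETING                            # start from a clean state
GREETING=hello                            # hypothetical variable name
before=$(sh -c 'echo "${GREETING:-unset}"')
export GREETING
after=$(sh -c 'echo "${GREETING:-unset}"')
echo "child before export: $before"       # child before export: unset
echo "child after export: $after"         # child after export: hello
```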
■ SUMMARY ■
1. Operating systems provide an environment that makes it possible for us to use the resources of a
computer, namely hardware and software. A few examples of modern-day operating systems include
Android, BSD, iOS, Linux, Microsoft Windows, Mac OS X, and z/OS.
2. The various functions that an operating system performs include memory and data management,
interprocess communication, time sharing, security, user-command interpretation, accounting, program
development, scheduling, and swapping.
3. Unix grew out of the Multics project of the 1960s; its development began at Bell Labs in 1969. Unix
became commercially viable after 1973, when it was largely recoded in C, thereby facilitating portability
to other hardware. A typical structure of the Unix operating system consists of hardware, a kernel, a
shell, and various tools and applications.
4. The kernel is the heart of the operating system. It is the nucleus of the operating system that
manages all the resources and gets tasks performed by the appropriate hardware.
5. A shell acts as an interface between a user and the kernel. Four main types of shells are available in
the Unix operating system, namely Bourne shell (sh), C shell (csh), Korn shell (ksh), and Bourne-again
shell (bash).
6. The main features of the Unix operating system are portability, multitasking, and multi-user capability.
7. Since Unix is a multiprocessing and multitasking operating system, it can be used in three different
types of environments: the stand-alone personal environment, the time-sharing environment, and the
client–server environment.
8. Unix has also been carried over to mobile devices. Almost all mobile operating systems, including
iOS, Android, and webOS, run on Unix-like or Linux kernels.
■ EXERCISES ■
Objective-type Questions
State True or False
1.1 The Unics operating system was further developed into Unix.
1.2 An operating system creates an environment that enables us to use different resources of a computer system.
1.3 The Korn shell is the oldest of all shells.
1.4 The Korn shell and Bourne-again shell are not compatible with the Bourne shell.
1.5 Unix is a multi-user and multitasking operating system.
1.6 The Bourne-again shell (bash) was developed by David Korn.
1.7 The Korn shell was developed by Brian Fox and Chet Ramey.
1.8 The Bourne shell derives its name from Stephen Bourne.
1.9 The shell manages all the resources and gets the tasks performed by the desired hardware.
1.10 Unix enables a user to run only one process at a time.
Multiple-choice Questions
1.1 Which of the following is the heart of any operating system?
(a) Hardware (b) Kernel (c) Software (d) Users
1.2 Korn Shell was developed by
(a) David Korn (b) Steve Bourne (c) Bill Joy (d) Ken Thompson
12 Unix and Shell Programming
Review Questions
1.1 Write short notes on the following:
(a) Different tasks performed by the kernel
(b) Role of shell in the Unix operating system
(c) Structure of the Unix system
1.2 Explain the functions performed by an operating system.
1.3 How did the Unix operating system come into the picture? Briefly explain its history.
1.4 How many different types of shells are there? Explain in detail.
1.5 Explain the time-sharing and client–server environments of the Unix operating system.
2
Unix File System
After studying this chapter, the reader will be conversant with the following:
• Unix files and their types
• Different types of device files
• Organization of a file system
• Accessing, mounting, and unmounting a file system
• Different blocks of a file system
• Structure of inode blocks
The most common type of ordinary file is the text file. This is just a regular file that contains
printable characters. For example, the source programs that we write are text files. However, the Unix
commands that we run and the compiled C programs that we execute are binary files, not text files.
The characteristic feature of text files is that the data stored inside them is organized into
lines, with each line terminated by the newline character. This character is not
visible, and it does not appear in the hard copy output. It is generated by the system when we
press the <Enter> key.
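The newline characters are easy to inspect from the shell. The following minimal sketch creates a two-line text file, counts its lines, and dumps its bytes so that each invisible newline shows up as \n. It is written for POSIX sh, and the filename sample.txt is only an example.

```shell
#!/bin/sh
# Create a two-line text file (sample.txt is a hypothetical name).
# printf writes "\n" at the end of each line, just as <Enter> does in an editor.
printf 'first line\nsecond line\n' > sample.txt

# wc -l counts newline characters, i.e. the number of lines in the file.
wc -l < sample.txt

# od -c prints every byte; the invisible newlines appear as \n.
od -c sample.txt
```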
Examples letter.txt, bank.sh, payment
The files in Unix may or may not have any extension. The first two examples depict files with
extensions .txt and .sh, respectively. The third example depicts a file without any extension.
In most Unix systems, a filename can be up to 255 characters long. Most modern systems reject a
longer name with a "File name too long" error rather than silently using only the first 255
characters.
Note: Some programming files need extensions. For example, the C compiler expects C source files to end in .c. For AWK files, an extension such as .awk is a useful convention.
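We can ask the system for its limit instead of relying on the 255-character figure. The sketch below assumes a POSIX getconf, which reports NAME_MAX for the file system holding a given path. The 300-character name is an arbitrary test value.

```shell
#!/bin/sh
# Maximum filename length for the file system that holds the current directory.
getconf NAME_MAX .

# Try to create a file with a 300-character name (all zeros).
long=$(printf '%0300d' 0)
if touch "$long" 2>/dev/null; then
    echo "a 300-character name was accepted"
    rm -f "$long"
else
    echo "a 300-character name was rejected"
fi
```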
Directory files
A directory contains no data of its own; instead, it stores some details of the files and sub-directories
it contains. The Unix file system is organized into a number of such directories and sub-directories,
which can be created as and when needed. Directories let us group a set of
files pertaining to a specific application. Because each directory holds its own set of names, two or more
files in separate directories can have the same filename.
If a directory contains, for example, 10 files, there will be 10 entries for them in the directory file
(in addition to the . and .. entries that every directory has). Each entry holds the filename and its inode
number; details such as the size of the file and the date and time of last modification are stored in the
file's inode. When an ordinary file is created or removed, the kernel automatically updates the
corresponding directory file by adding or removing its entry.
Note: The directory file contains the names of all resident files in the directory.
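We can see what a directory file records by listing a directory with two ls options: -a shows all entries and -i shows inode numbers. In this sketch, demo_dir, a.txt, and b.txt are hypothetical names.

```shell
#!/bin/sh
# A hypothetical directory holding two ordinary files.
rm -rf demo_dir
mkdir demo_dir
touch demo_dir/a.txt demo_dir/b.txt

# -a includes the . and .. entries, -i prints the inode number of each entry.
# Name plus inode number is essentially what a directory file records.
ls -ai demo_dir
```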
the actual transfer units of the device, that is, single characters at a time without collecting
or combining them into a block. As a result, character devices are comparatively
slow and have a large access time.
Examples include virtual terminals, terminals, and serial modems.
Block devices Block devices are those in which the read and write operations are performed
one block at a time, where the size of one block can range from 512 bytes to 32 KB. Compared
with character devices, in which transfers are performed one character at a time, block devices
are quite fast. Moreover, the kernel uses caching with block devices to reduce the access time.
By caching, we mean that when a block device is accessed, the kernel reads the whole block
into a buffer in memory, so that future read and write operations are performed on the
cached copy in memory, reducing the access time to a great extent. The modified buffer
contents are later written back to the block device. The only drawback of using memory
buffers is that if the system crashes before the modified buffers are written to the block device,
the data on the device will be inconsistent. Hence, the modified buffers need to be flushed
to the block device periodically.
Examples include hard disk, DVD/CD ROM, and memory regions.
Note: All device files are stored in the /dev directory.
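The first character of a device file's long listing shows its type: c for character devices and b for block devices. This minimal sketch uses /dev/null, which is a character device on Linux and most Unix systems.

```shell
#!/bin/sh
# In a long listing, the first character of the permissions field gives the file type:
#   c = character device, b = block device.
# -L makes ls report on the file a symbolic link points to
# (on Solaris, the entries in /dev are links into /devices).
ls -lL /dev/null

# The test command can check the type directly.
if [ -c /dev/null ]; then
    echo "/dev/null is a character device"
fi
```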
Here, source is the absolute or relative path of the file whose link we want to create, and
destination is the name of the link.
The two filenames, letter.txt and memo.txt, refer to the same file, and changes made in
either file will be reflected in the other file.
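This excerpt does not include the syntax line for creating the link. The following hedged sketch assumes a symbolic link created with ln -s, where the source comes first and the link name second. It reproduces the letter.txt and memo.txt example.

```shell
#!/bin/sh
# letter.txt and memo.txt follow the example in the text.
rm -f letter.txt memo.txt
echo "Dear Sir," > letter.txt

# Create memo.txt as a symbolic link to letter.txt (source first, then link name).
ln -s letter.txt memo.txt

# Appending through the link modifies the original file ...
echo "Thank you." >> memo.txt

# ... so both names show the same two lines.
cat letter.txt
cat memo.txt
ls -l memo.txt   # the leading 'l' marks a symbolic link: memo.txt -> letter.txt
```

Using ln without -s would instead create a hard link, which shares the same inode as letter.txt.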
2.1.3 Pipes
Pipes are used for sending the output of a command as the input to another command. A pipe
is created by placing the vertical bar character ‘|’ between two commands.
The output of the command on the left-hand side is sent as input to the command on the
right-hand side. The syntax for creating a pipe is as follows:
Syntax command1 | command2
Example ls | sort
We will discuss the two commands, ls and sort, in Chapter 3, but for the time being, it is
enough to understand that the output of the ls command is sent to the sort command, which
then displays the sorted result on the screen.
The pipe created through this syntax is known as an anonymous pipe because it has no name in
the file system. It is created when the commands start and is destroyed when they finish. The file
descriptors that connect command1 and command2 to the pipe are closed automatically when the processes end.
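Here are a few anonymous pipes in action (POSIX sh). The word list is made-up sample data. The last command is the ls | sort example from the text, with -r added to reverse the order.

```shell
#!/bin/sh
# The output of printf (three unsorted words) becomes the input of sort.
printf 'pear\napple\nmango\n' | sort

# Pipes can be chained: sort's output is passed on to head,
# which keeps only the first line (apple).
printf 'pear\napple\nmango\n' | sort | head -n 1

# The example from the text: the directory listing is sorted in reverse order.
ls | sort -r
```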
Apart from anonymous pipes, we can also create named pipes. As the name suggests,
named pipes have specific names assigned to them and exist as special files within
the file system. Named pipes are also known as FIFOs (first in, first out) because data is read
from them in exactly the order in which it was written, and once data has been read from the
pipe, it cannot be read again. Named pipes are not automatically deleted like anonymous pipes;
they have to be explicitly deleted using the rm or unlink command.
The command used for creating named pipes is mknod (many systems also provide the mkfifo
command for this purpose). The three commands, mknod, rm, and unlink, will be discussed in detail in Chapter 3.
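This minimal sketch walks through a named pipe's life cycle. It uses mkfifo, the POSIX command for creating FIFOs; mknod mypipe p does the same on systems that support that form. The name mypipe is hypothetical.

```shell
#!/bin/sh
# mypipe is a hypothetical name; 'mknod mypipe p' is the traditional equivalent.
rm -f mypipe
mkfifo mypipe
ls -l mypipe                 # the leading 'p' marks a named pipe

# The writer runs in the background; it blocks until a reader opens the pipe.
printf 'line 1\nline 2\n' > mypipe &

# The reader receives the data in the order it was written (first in, first out).
cat < mypipe
wait

# Named pipes are not removed automatically.
rm mypipe                    # 'unlink mypipe' would also work
```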
2.1.4 Sockets
Sockets are used for transferring information between two processes. Socket files (Unix domain
sockets) connect processes running on the same machine, while network sockets, which use the same
programming interface, connect processes running on different machines. Sockets are basically used
as an interface between our Unix process and the networking protocol. For example, when we access the
Internet through a web browser, the browser uses sockets to communicate with the remote web server.
The creation of socket files is explained in detail in Chapter 14.
dev All the special (device) files in the Unix file system, such as those for the keyboard and
terminals, are kept in this directory.
Unix File System 17
Since the memory of a computer is limited, the system swaps out processes that are not currently
running to the disk and swaps them back in when they are needed. This swapping is handled by a special file
system known as the swap file system, which is discussed here.
system is mounted is called the mount point of a file system. The files and directories in the
new file system or mounted file system are accessible when we go into that subdirectory. Once
mounted, the new file system becomes an indistinguishable part of the existing file system.
Basically, mounting is the procedure of making the existing main file system aware of the new
file system.
The device name or file system is mounted on the given directory. This directory, known as the
mount point, is the directory to which the newly mounted file system will be attached.
Note: For the file system to be mounted on a particular directory, the directory should already exist on the
current file system.
In order to mount a file system that has the special device name /dev/fd0 (for floppy disk 0)
onto the existing /mnt directory, the following command is used:
# mount /dev/fd0 /mnt
The new file system is simply an extension of the /mnt directory. We can view and access
the files and directories of the mounted file system by changing the directory to the /mnt
directory. We can also create directories and files in the /mnt directory sub-tree.
The mount point (/mnt) should usually be an empty directory, as we will not be able to
access its original files and subdirectories once a file system is mounted on it. The files of the
/mnt directory will be accessible only when the file system is unmounted.
The file system that is mounted to the main file system should be unmounted after its
job is done. Before shutting down the Unix system, all the mounted file systems need to be
unmounted; otherwise this may result in corruption of the content.
It gives a list of the mounted file systems. We might obtain the following output:
mounted       mounted over
/dev/fd0      /mnt
This output shows that the file system /dev/fd0 is mounted on the /mnt directory.
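The POSIX df -P command complements mount. For any directory, it shows the file system that holds it and, in the last column, that file system's mount point. This is a minimal sketch, and the actual output depends on the system.

```shell
#!/bin/sh
# -P selects the portable POSIX output format:
# file system, total blocks, used, available, capacity, mounted on.
df -P /
df -P /tmp

# Print just the mount point (the last field) of the file system that holds /tmp.
df -P /tmp | awk 'NR == 2 { print $NF }'
```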
The command that is used for unmounting the mounted file system is umount. The
following format is adopted for using the umount command:
Syntax umount filesystem_name
       umount mount_point
Example The command to unmount the file system /dev/fd0 that we mounted on the /mnt
directory will be as follows:
umount /mnt
We cannot unmount a file system while we are working inside it. Thus,
unmounting the /dev/fd0 file system while sitting in the /mnt directory is not possible. We
have to come out of the /mnt directory (for example, with cd /) before giving the umount command.
Note: A file system cannot be unmounted if it is busy, that is, when a file or directory on that file system is
being accessed.
Nowadays, floppy disk drives are rarely manufactured or used. The floppy disk drive is used here
only for the sake of explaining the mount and umount commands. In the
currently available Unix operating systems (like Oracle Solaris 10, which we are using in
this book) and Linux systems, CD ROMs, DVDs, and USB storage devices are automatically
mounted without using the mount command. Thus, we seldom need to run the mount and umount
commands ourselves for such removable media. On Solaris, the USB storage
device is automatically mounted and is available under the /rmdisk directory, whereas the
CD ROM/DVD is automatically mounted and available under the /cdrom directory. This also
means that the following commands will navigate us to the CD ROM/DVD drive and will
display its contents:
$ cd /cdrom
$ ls
Table 2.1 gives a brief comparison of the file systems of Windows and Unix operating
systems.
Table 2.1 Comparison between file systems of Windows and Unix operating systems
Unix | Windows
In Unix, the / (forward slash) represents a separator while defining the path to indicate a new directory level. The following command represents two directory levels, usr and projects: cd /usr/projects | In Windows, the \ (backslash) is used for defining the path. For example, the directory levels, usr and projects, are represented in Windows as follows: cd \usr\projects
In Unix, the forward slash (/) indicates the root directory, that is, the directory from which all other directories begin. All other hard disk drives, pen drives, CD ROM/DVD drives, etc., are accessed via the root (/) directory. For example, /cdrom represents the CD ROM drive. | In Windows (and in DOS), C:\ indicates the top-level directory of the file system. Other hard disk drives, floppy disk drives, and CD ROM/DVD drives are indicated by various top-level directory equivalents such as D:\ and E:\.
In Unix, the root account acts as the Unix administrator. | In Windows, there is an administrator account that performs the administrative tasks.
pointed to by the index, that is, the data block number 138 will be returned and the index will
shift to another data block in the list. Thus, after having assigned data block 138 to the requesting
process, the index will shift to point at the data block numbered 175, and the procedure will
continue. If the super block contains only one entry, which is a pointer to the array of free data
blocks, all the entries from that block will be copied to the super block free list as shown in Fig. 2.4.
[Fig. 2.4 List of free data blocks copied from data block to super block]
As usual, the requesting process will continue to get block numbers from the ones listed
in the super block.
Directory
The directory contains only two file attributes: inode number and filename. When we create
a (hard) link for a file, no separate inode is allocated for it; instead, the link count in the inode is
incremented by one, and a directory entry is created with the new filename. When we
remove a linked file with the rm command, the link count in the inode is decremented,
and the directory entry for that link is also removed. A file is removed when its link count
becomes zero. The associated disk blocks are then freed in order to make them available for
new files.
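We can watch the link count with ls -li, whose first column is the inode number and third column is the link count. This minimal sketch uses hypothetical filenames and runs ln without -s to create a hard link.

```shell
#!/bin/sh
# original.txt and alias.txt are hypothetical names.
rm -f original.txt alias.txt
echo "some data" > original.txt
ls -li original.txt              # link count (third column) is 1

ln original.txt alias.txt        # hard link: new directory entry, same inode
ls -li original.txt alias.txt    # same inode number, link count is now 2

rm original.txt                  # removes one entry; link count drops to 1
ls -li alias.txt
cat alias.txt                    # the data is still reachable via the remaining link
```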
A file is internally identified by Unix through a unique inode number that is associated
with it. A directory file contains the names of the files and the subdirectories present in that
directory along with an inode number for each. The inode number is nothing but an index
to the inode table in which information about the file is stored. For example, if the inode
number of the file letter.txt is 45267, it means that the slot number 45267 in the
inode table contains information about the file letter.txt.
Suppose the file letter.txt is present in a directory called India. If we attempt to cat
the letter.txt file, Unix will first check whether the user has search (execute) permission for the
directory India. If so, it will find out whether this directory has an entry for letter.txt. If such an
entry is found, its inode number is fetched from India. This inode number is an index to the
inode table. After checking that the user has read permission for letter.txt, Unix reads the contents
of the file from the disk addresses mentioned in its inode entry and displays them on the screen.
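We can observe the name-to-inode mapping directly. This sketch recreates the India/letter.txt example and reads the inode number with ls -i. It then finds the same entry by inode number with find -inum, an option supported by GNU, BSD, and Solaris find.

```shell
#!/bin/sh
# Recreate the example from the text: letter.txt inside a directory called India.
rm -rf India
mkdir India
echo "Hello" > India/letter.txt

# ls -i prints the inode number that the directory entry maps letter.txt to.
inode=$(ls -i India/letter.txt | awk '{ print $1 }')
echo "inode number of letter.txt: $inode"

# find -inum locates the entry by inode number instead of by name.
find India -inum "$inode"
```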
The file contents are stored in data blocks dispersed throughout the disk. In
each inode, an array is maintained to keep track of the data blocks. The first 10 elements of
the array provide direct indexing, that is, they point directly to the data blocks that contain
the file content. Thus, a file that needs 10 or fewer data blocks is accessible
through the direct index entries alone. After direct indexing comes single indirect indexing, which
in turn is followed by double indirect indexing and triple indirect indexing, as shown in
Fig. 2.5.
If the file needs more than 10 blocks, it uses single indirect indexing. The single indirect entry
contains a pointer to a block, which in turn contains an array of pointers to the file’s data
blocks.
Double indirect indexing is used for larger files: a pointer points to a block of
pointers that point to other blocks of pointers, which in turn point to the file’s data blocks.
Triple indirect indexing is used for extremely large files: a pointer points to a block
of pointers that point to other blocks of pointers, which in turn point to yet other blocks of
pointers, which finally point to the file’s data blocks.
This raises the question of the maximum size of a file that can be addressed through an
inode.
[Fig. 2.5 Single, double, and triple indirect addressing for large files. The inode holds the owner, group, file type, permissions, access time, modification time, inode modification time, and size, followed by direct pointers to the file’s data blocks and single, double, and triple indirect pointers that reach data blocks through blocks of pointers.]
Assuming a data block is of size 4 KB and there are 10 direct pointers in an inode, the
directly addressable data is 10 × 4 KB = 40 KB.
In case of single indirect indexing, a pointer points to an entire block of pointers. If a
block is of size 4 KB and each pointer is of 4 bytes, there will be 4 KB/4 bytes, that is,
1024 pointers in a block, where each pointer points to a 4 KB block. This means that single
indirect addressing can address 1024 × 4 KB = 4 MB of a file.
Similarly, in double indirect indexing, a pointer points to a block of pointers, which in
turn point to blocks of pointers. Hence, double indirect addressing can address
1024 × 1024 × 4 KB = 4 GB. By following the same pattern, triple indirect addressing can
address 1024 × 1024 × 1024 × 4 KB = 4 TB.
Note: The maximum file size that can be addressed through an inode is the sum of the sizes accessible by the direct, single indirect,
double indirect, and triple indirect addressing (about 4 TB with the values assumed above).
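Shell arithmetic expansion can check the calculation above. This sketch uses the same assumptions as the text: 4 KB blocks, 4-byte pointers, and 10 direct pointers. It needs 64-bit shell arithmetic, which common shells provide on 64-bit systems.

```shell
#!/bin/sh
# Assumptions from the text: 4 KB blocks, 4-byte block pointers, 10 direct pointers.
block=4096
ptrs=$((block / 4))                       # 1024 pointers per block of pointers

direct=$((10 * block))                    # 40 KB
single=$((ptrs * block))                  # 4 MB
double=$((ptrs * ptrs * block))           # 4 GB
triple=$((ptrs * ptrs * ptrs * block))    # 4 TB
total=$((direct + single + double + triple))

echo "direct:          $direct bytes"
echo "single indirect: $single bytes"
echo "double indirect: $double bytes"
echo "triple indirect: $triple bytes"
echo "maximum size:    $total bytes"      # 4402345713664 bytes, about 4 TB
```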
Note: The kernel always maintains a copy of the super block in the memory. It is the in-memory copy, not the disk copy,
that contains the latest and correct file system status.
The information stored in the inode table changes whenever we use any file or change its
permissions; hence, copies of the super block and inode table are kept in the memory (RAM)
at start-up time, and all changes are made to these RAM copies every time some modification
occurs. The original super block and inode table on the disk are updated at fixed intervals,
say every 30 seconds, through the sync operation, which can also be requested at any time
with the sync command. This operation synchronizes the inode table on the disk with the one
in memory by overwriting the disk copy with the memory copy.
The disk space allotted to a Unix file system is made up of blocks, each of which is
typically 512 bytes in size. Some file systems may have blocks of 1024 or 2048 bytes.
Note: The standard system block size is 1024 bytes (known as logical block) and the physical block size is
512 bytes long (i.e., one logical block contains two physical blocks).
■ SUMMARY ■
1. In the Unix operating system, there are three types of files: ordinary files, directory files, and device files. Ordinary files are also referred to as regular files, and they may contain printable characters.
2. The device files are of two types: character device files and block device files. In character devices, read and write operations are performed character by character, that is, 1 byte at a time, whereas in block devices, read and write operations are performed one block at a time.
3. Caching is a process in which an accessed disk block is kept in a buffer in memory so that future read and write operations are performed on the cached block in memory.
4. All device files are stored in the /dev directory.
5. A symbolic link is a special file that points to another existing file on the system. These links are used to create several names for the same file. Through the ln command (with the -s option), we can create the symbolic link of a file.
6. A pipe is represented as a vertical bar character (|) and is used for sending the output of a command as an input to another command. Pipes are of two types: anonymous pipes and named pipes. Named pipes are known as FIFOs because data is read from them in the order in which it was written, and once the data is read from the pipe, it cannot be read again.
7. Sockets are used for transferring information between two processes, which may be running on the same machine (through socket files) or on different machines (through network sockets).
8. The file system is organized as a tree with a single root node called root that is represented as ‘/’.
9. The concept by which a system appears to have more memory than it actually has is known as virtual memory.
10. Mounting a file system means attaching the root directory of the new file system to a directory of our existing Unix file system. The directory on which the new file system is mounted is called the mount point of the file system.
11. Unmounting a file system means detaching the mounted file system from the directory of the Unix system on which it was mounted.
12. A Unix file system typically consists of four blocks: boot, super, inode, and data.
13. Every file or directory has an inode number: a unique number that identifies the file or directory in the file system.
14. A file’s inode number can be found using the ls -i command.
■ EXERCISES ■
Objective-type Questions
State True or False
2.1 The first block of the Unix file system is known as super block.
2.2 Every file or directory has a unique inode number.
2.3 Unix also treats the physical devices as files.
2.4 tmp is the folder in which all administrative files are kept.
2.5 Double indirection is used for smaller files.
2.6 We can create several names for the same file through symbolic links.
2.7 Named pipes are also known as last in first out (LIFO).
2.8 In order to see the files or directories of any device, its file system needs to be mounted.
2.9 In block devices, read and write operations are performed one byte at a time.
2.10 Pipes are of two types: anonymous and named.
Multiple-choice Questions
2.1 The first block of a file system is
(a) super block (b) data block (c) inode block (d) boot block
2.2 If a directory has 10 files, the number of entries in the directory file will be
(a) 10 (b) 11 (c) 9 (d) 0
2.3 In the Unix operating system, the files are divided into three categories: ordinary, directory, and
(a) special files (b) hidden files (c) device files (d) inode files
2.4 The directory in which executable files of the Unix operating system are kept is
(a) lib (b) etc (c) dev (d) bin
2.5 The Unix file system is organized as a tree with a single node at the top known as
(a) foundation (b) root (c) seed (d) stem
2.6 The indexing by which a pointer points to a block of pointers that point to other blocks of pointers, which in turn point to the file’s data blocks is known as
(a) direct addressing (b) single indirect addressing (c) double indirect addressing (d) triple indirect addressing
2.7 The command that synchronizes the inode table in the memory with the one on the disk is
(a) sync (b) synchronizer (c) tally (d) matcher
2.8 The reserved inode number 0 refers to the
(a) linked files (b) deleted files and directories (c) directories (d) device files
2.9 The bootstrap program is a short program loaded by
(a) data block (b) hard disk (c) BIOS (d) named pipe
2.10 The number of sections or blocks that a file system has is
(a) 1 (b) 2 (c) 3 (d) 4
Review Questions
2.1 Write short notes on the following:
(a) Inode block
(b) Ordinary files
(c) Pipes
(d) Symbolic link
(e) Inode table
2.2 What are the different blocks that constitute a Unix file system?
2.3 Explain the procedure of mounting and unmounting a file system in a Unix operating system. What is the significance of this process?
2.4 Differentiate the following:
(a) Character and block devices
(b) Boot block and data block
(c) Single and double indirect addressing
2.5 Explain the role of default files and directories in the Unix operating system.
3
Basic Unix Commands
After studying this chapter, the reader will be conversant with the following:
• Some basic commands that are frequently used
• Logging in to the system, changing password, checking who is logged in,
and displaying date and time of the system
• Dealing with file operations such as creating files, displaying their contents,
deleting files, creating links to files, renaming files, and moving files
• Maintaining directories, creating a directory, changing the current directory,
and removing a directory
• Displaying calendars, using basic calculators, displaying information about
current systems, deleting symbolic links, and exiting from a Unix system
Unix has a large family of commands. However, even before we discuss how to perform a
task with the help of these commands, we need to first log in to the system. Let us see how
this is done.
Note: One of the main security features of the Unix operating system is that the password is not displayed as it is typed
(at most, masking characters such as asterisks appear), and the actual password is never stored. Only its one-way hash
(often loosely called the encrypted password) is kept, in the /etc/shadow file, which can be accessed only by the root.
In case the user ID or password is wrongly entered, we get the following error message:
Login incorrect
login:
This message informs the user that either the user ID or the password has been entered
incorrectly, and a new login prompt is displayed to try again.
If the user ID and password are correct, we will be allowed to log in to the Unix system
and will be navigated to our home directory, that is, the directory in which our personal files
and settings are stored. In addition, a message indicating when we last logged in, along with
the shell prompt, is displayed:
Last login: Fri Dec 15 10:30:05 on ttys17
$
This message indicates the date, time, and terminal from which we last logged in. The
message is followed by the default Unix shell prompt by which we can write and execute
Unix commands. The default Unix prompt for the Bourne, Bash, and Korn shells is the
dollar sign ($). For the C and tcsh shells, the prompt is the percent sign (%).
You must be wondering who the administrator refers to. Let us understand this term.
System administrator A system administrator is a person who is responsible for setting up
and maintaining the Unix operating system. He/She is responsible for the proper functioning
of the Unix system and also ensures that the system resources are optimally utilized. The
following are a few of the tasks performed by the system administrator:
1. Set up and maintain user accounts
2. Monitor access and privileges and set up security policies
3. Monitor system performance and ensure proper utilization of resources
4. Install and upgrade software whenever desired
5. Take backup at regular intervals and restore systems in case of a crash
6. Perform proper starting and shutting down of systems
files, looking at the content of the files, copying, renaming, and deleting files, viewing system
date and time, and knowing the list of users who are logged in, among others.
The user performs very general operations while working with the Unix operating system.
3.2.1 Structure
As mentioned in Section 3.2, a traditional Unix command consists of options and arguments (operands).
Options generally take the form of a character prefixed by a hyphen (-) and are used for
exploiting a particular feature of the command. An argument refers to the content or data to
which the command has to be applied. An argument can be a file, directory, terminal, device, etc.
The syntax of a Unix command is as follows:
Unix_Command [-option1] [-option2] ... [argument]
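For example, the following commands fit this structure. The examples use /tmp only because it exists on virtually every Unix system.

```shell
#!/bin/sh
# Command with two separate options and one argument (the directory /tmp).
ls -l -a /tmp

# Most commands also accept several single-letter options grouped behind one hyphen;
# this is equivalent to the previous command.
ls -la /tmp

# Options and arguments are optional: with neither, ls lists the current directory.
ls
```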
Let us understand the different types of commands in Unix.
On executing the passwd command, we will be prompted to enter the old password before
giving the new password (to confirm that only authorized people are changing the password).
In addition, the new password should be significantly different from the older one. The exact rules
depend on the password policy configured on the system; a typical policy requires the password to be
at least six characters long and to have at least two letters, one digit, and one special
character. On executing the command, we may get the output shown in the following example.
Example
$passwd
Changing password for chirag
Old password: *********
New password: **********
Re-enter new password: **********
If the new password and the old password are not very different from each other, we may get
the following error:
Passwords must differ by at least 3 positions
The two passwords entered in New password and Re-enter new password should be the same;
else we will get the following error:
They don't match
Try again
In case the two passwords entered in New password and Re-enter new password are exactly
the same, the password of the user will be changed and we will get a confirming message:
Password updated successfully.
There is a list of options available with the ls command, as shown in Table 3.1.
Table 3.1 List of options available with the ls command
Options Syntax Description
-x ls -x Shows files in multiple columns, sorted across the rows
-F ls -F Marks each entry with its type: directories have / as suffix, executable files *, and symbolic links @
-r ls -r Reverses the sort order (for example, reverse alphabetical order)
-R ls -R Shows the recursive listing, that is, files of directories as well as subdirectories are also displayed
-a ls -a Shows all the hidden and visible files; hidden files start with a dot (.)
-d ls -d directory_name Shows only the directory name instead of listing its content; used with the -l option to know the status of the directory
-l ls -l Shows files in the long-listing format (shows seven attributes of a file, that is, file permissions, number of links, owner, group, size, date and time, and file/directory name)
-t ls -t Sorts files by modification time; the most recently modified file is on the top
-u ls -u Uses the last access time instead of the last modification time for sorting (with -t) or displaying (with -l)
-i ls -i Shows inode number of all the files
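You can try a few of these options on a small scratch directory. In this sketch, ls_demo and the files inside it are hypothetical names.

```shell
#!/bin/sh
# Build a small hypothetical directory to try the options on.
rm -rf ls_demo
mkdir -p ls_demo/projects
touch ls_demo/notes.txt ls_demo/.profile

ls ls_demo                # .profile is hidden and not listed
ls -a ls_demo             # also lists ., .., and .profile
ls -F ls_demo             # the directory is shown as projects/
ls -i ls_demo/notes.txt   # inode number followed by the filename
ls -ld ls_demo            # -d: details of the directory itself, not its contents
```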
While listing and searching for files and directories, we can also make use of wild-card
characters. These characters help in finding files and directories that begin with specific
character(s), contain specific character(s) or a range of characters in their names, consist of
names of a specific length, and so on. They provide a quick and convenient way of searching
for the desired files and directories.
Wild card matching A string is a wild-card pattern if it contains one of the following
characters: ‘?’, ‘*’, or ‘[’.
Basic Unix Commands 31
To get all the files beginning with a specific character, we can give a command using the
following syntax:
Syntax $ ls charactername*
In order to get all the files beginning with a character in a given range, we give the command
in the following syntax.
Syntax $ ls [c1-c2]*
Here, c1 and c2 represent the beginning and ending character of the range, respectively.
Example In order to get all the files beginning with characters a to d, we can give the
following command.
$ ls [a-d]*
courses
Similarly, we can use the wild-card character, ?, which represents a single character, to get
the desired files. For example, to get all the files that consist of three characters and begin
with character a, we can give the following command:
$ ls a??
However, since none of the files meets these criteria (assuming no filename exists that is three
characters long and begins with character a), no file is listed; instead, ls reports that a?? does not exist.
In order to get all the files that begin with character a followed by any digit, we can give
the following command:
$ ls a[0-9]*
Again, since no file in our list of directories begins with character a followed by a digit, no file
is listed, and ls reports that a[0-9]* does not exist.
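You can try the patterns above safely in a scratch directory. In this sketch, wild_demo and the filenames are hypothetical. The files abc, a1, and b22 are added so that each pattern matches something.

```shell
#!/bin/sh
# Hypothetical files, including some that match the patterns in the text.
rm -rf wild_demo
mkdir wild_demo
cd wild_demo || exit 1
touch courses notes.txt programs.doc university abc a1 b22

ls [a-d]*     # names starting with a, b, c, or d: a1, abc, b22, courses
ls a??        # three-character names starting with a: abc
ls a[0-9]*    # names starting with a followed by a digit: a1
ls n*         # names starting with n: notes.txt
cd ..
```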
If we use the -l option for long listing, we may get the following output:
$ ls -l
-rwxr--r-- 2 chirag it 48 Nov 11:31 courses
-rw-rwxr-- 1 chirag it 669 Dec 09:15 notes.txt
-rwxrwxrwx 1 chirag it 1560 Nov 11:21 programs.doc
-rwxr-xrw- 2 chirag it 65 Dec 05:10 university
Seven attributes are displayed: file permissions, number of links, owner, group, size, date
and time, and file/directory name.
In order to see all the files, including the hidden files, we use the -a option. The output is
as follows:
$ ls -al
drwxr--r-- 2 chirag it 80 Nov 11:31 .
drwxr--r-- 2 chirag it 72 Nov 11:31 ..
-rwxr--r-- 1 chirag it 210 Nov 11:31 .profile
-rwxr--r-- 2 chirag it 48 Nov 11:31 courses
-rw-rwxr-- 1 chirag it 669 Dec 09:15 notes.txt
-rwxrwxrwx 1 chirag it 1560 Nov 11:21 programs.doc
-rwxr-xrw- 2 chirag it 65 Dec 05:10 university
Note: Filenames that begin with the dot (.) are considered hidden files in Unix.
By default, the file and directory names are sorted alphabetically. We can use the -t option to sort them by modification time instead; the most recently modified file is displayed first.
$ ls -lt
-rwxrwxrwx 1 chirag it 1560 Nov 11:21 programs.doc
-rwxr--r-- 2 chirag it 48 Nov 11:31 courses
-rwxr-xrw- 2 chirag it 65 Dec 05:10 university
-rw-rwxr-- 1 chirag it 669 Dec 09:15 notes.txt
In order to get the inode number of a specified file, we can use the -i option, as shown here:
$ ls -li programs.doc
39984 -rwxrwxrwx 1 chirag it 1560 Nov 11:21 programs.doc
The number 39984 is the inode number of the file programs.doc. Let us recall a concept from Chapter 1: each file or directory in the Unix operating system has a number, known as its inode number, that uniquely identifies it within its file system.
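We can check the effects of -a, -t, and -i in a scratch directory. This is a sketch that assumes a POSIX shell and the mktemp utility. The touch -t option sets a file's time to a given [[CC]YY]MMDDhhmm value. It is used here only to give the hypothetical sample files distinct modification times.

```shell
demo=$(mktemp -d) && cd "$demo" || exit 1

# Hypothetical sample files with distinct modification times
touch -t 202001010900 programs.doc   # oldest
touch -t 202106150900 notes.txt
touch -t 202212310900 courses        # newest
touch .profile                       # hidden: its name begins with a dot

ls              # .profile is not listed
ls -a           # ., .. and .profile are listed as well
ls -t           # newest first: courses, notes.txt, programs.doc
ls -i courses   # the inode number, followed by the name
```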
The option -m stands for mode and is used for creating the directory with specific permissions.
The option -p stands for parent and is used for creating all the parent directories mentioned in the given path that do not already exist.
dirname is the directory name that may be either an absolute path name or a relative path
name. We may specify more than one directory name on a single command line.
Note: Absolute and relative paths. A path specifies the exact location of a given file or directory. Directories exist in a tree hierarchy, one inside another, and a directory or file is referred to by a path whose components are separated by the forward slash (/).
A path can be an absolute path or a relative path. The absolute path points to the given file or directory regardless of the current working directory and is written in reference to the root directory, whereas the relative path locates the file or directory in relation to the current working directory. Remember, the absolute path always starts with a forward slash, which represents the root directory. Moreover, the absolute path of a given file or directory is always the same, whereas the relative path changes according to the current directory. The following are examples:
(a) Assuming that a directory projects exists inside the directory usr, which in turn exists in the root directory, and that the current working directory is usr, the following are the two paths to the projects directory:
Absolute path: /usr/projects
Relative path: projects
(b) Similarly, if there is another directory experiments inside the directory /usr, and the current working directory is /usr/projects, then the following are the two paths to the experiments directory:
Absolute path: /usr/experiments
Relative path: ../experiments
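To try both examples without touching the real /usr, the following sketch builds the same tree inside a scratch directory. It assumes a POSIX shell and the mktemp utility. The variable $base is our stand-in for the root directory.

```shell
# $base stands in for the root directory so the real /usr is not touched
base=$(mktemp -d) || exit 1
mkdir -p "$base/usr/projects" "$base/usr/experiments"

cd "$base/usr" || exit 1
cd projects                  # relative path, resolved from the current directory (usr)
pwd                          # prints $base/usr/projects
cd ../experiments            # .. refers to the parent directory, usr
pwd                          # prints $base/usr/experiments
cd "$base/usr/projects"      # absolute path: works the same from any directory
pwd                          # prints $base/usr/projects
```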
Example $ mkdir courses
This command creates a directory named courses under the current directory.
$ mkdir courses faculty placement
This command will create three directories by the names courses, faculty, and placement.
Note: If dirname already exists, the mkdir command aborts and does not overwrite the existing directory.
$ mkdir courses
Since a directory with the name courses already exists, this command generates the following
error:
mkdir: can't make directory courses
With the commonly used default umask of 022, directories are created with read, write, and execute permissions for the owner and with read and execute permissions for the group and others. However, in order to create a directory with a particular set of permissions of our choice, we can use the following command:
$ mkdir -m 746 country
This command creates a directory country with read, write, and execute permissions for the owner; only read permission for the group; and read and write permissions for others.
The -p option (parent), mentioned earlier, creates any parent directories in the given path that do not already exist.
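The following sketch runs the mkdir examples together in a scratch directory. It assumes a POSIX shell and the mktemp utility, and the directory names are the sample names from this section. It also shows that a numeric mode given with -m is applied exactly as written, and that -p creates missing parents and does not complain if the directory already exists.

```shell
demo=$(mktemp -d) && cd "$demo" || exit 1

mkdir courses faculty placement          # three directories at once
mkdir courses 2>/dev/null || echo "mkdir refused: courses already exists"

mkdir -m 746 country                     # rwx for owner, r for group, rw for others
ls -ld country                           # permissions field: drwxr--rw-

mkdir -p university/colleges/professors  # missing parents are created too
mkdir -p university/colleges/professors  # no error if it already exists
```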
Here, path name is either an absolute or a relative path name for the desired target directory.
Example $ cd ajmer
This command changes our current directory to ajmer (which is assumed to exist in the current directory). When we give the directory name directly (without ‘/’ as a prefix), it is a relative path, that is, a path relative to the current directory.
$ cd /home/chirag/ajmer
This command takes us to the sub-subdirectory ajmer, which is in the chirag subdirectory of the /home directory. The path used in this example is an absolute path.
$ cd ..
This command takes us to the parent directory.
Fig. 3.1 Line-continuation character used in the mkdir command
$ mkdir courses \
> faculty \
> placement
$
We can return to our home directory from any other directory by simply typing the cd command without an argument. We do not need to specify our home directory as an argument, because the shell takes its name from the HOME variable.
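The following sketch shows cd, cd .., and cd without an argument working together. It assumes a POSIX shell and the mktemp utility. To keep the example self-contained, it sets HOME to a hypothetical directory inside a subshell, so the calling shell's real HOME is left unchanged.

```shell
demo=$(mktemp -d) || exit 1
mkdir -p "$demo/home/chirag/ajmer"

# Run in a subshell so that changing HOME does not affect the calling shell
(
  HOME=$demo/home/chirag     # hypothetical home directory for this sketch
  cd "$HOME/ajmer" && pwd    # $demo/home/chirag/ajmer
  cd .. && pwd               # the parent directory: $demo/home/chirag
  cd / && cd && pwd          # cd with no argument returns to $HOME
)
```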
Here, the -p option removes the specified directory and then each parent directory named in the path, provided each one is empty by the time it is reached.
Note: The rmdir command cannot remove a directory unless it is empty.
Examples
(a) In order to remove a single directory, consider the following example.
$ rmdir ajmer
This removes the directory ajmer if it is empty; else, we will get the following error:
rmdir: ajmer: Directory not empty
(b) We can delete more than one directory using the following single command.
$ rmdir courses placement
The directories that are empty will be deleted with this command.
$ rmdir university/colleges/professors university/colleges university
This command deletes the professors sub-subdirectory from the colleges subdirectory; then it deletes the colleges subdirectory from the university directory and, finally, deletes the university directory itself.
We can get the same result using the -p option as follows:
$ rmdir -p university/colleges/professors
Remember, we cannot use rmdir to remove our current working directory. If we wish to
remove our working directory, we have to first come out of it.
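The following sketch shows both rmdir behaviours in a scratch directory. It assumes a POSIX shell and the mktemp utility, and the names are the sample names used above. First, rmdir refuses a non-empty directory; then rmdir -p removes a whole chain of empty directories.

```shell
demo=$(mktemp -d) && cd "$demo" || exit 1

mkdir ajmer && touch ajmer/notes
rmdir ajmer 2>/dev/null || echo "rmdir refused: ajmer is not empty"
rm ajmer/notes && rmdir ajmer            # succeeds once the directory is empty

mkdir -p university/colleges/professors
rmdir -p university/colleges/professors  # removes professors, colleges, university
```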
Example $ pwd
/home/chirag
This output indicates that we are in the home directory of the user ID chirag. We can see that
the pwd command displays the full path name of the current directory.
The pwd command is a valuable utility when we are moving around in the file system
hierarchy. If we change our directory, pwd confirms the change of our location, as shown in
the following sequence of commands:
$ pwd
/home/chirag
$ cd ajmer
$ pwd
/home/chirag/ajmer
We can see that when we change our directory to the ajmer subdirectory, the output displayed
by the pwd command confirms the same.
The options and arguments shown in the aforementioned syntax are briefly explained in
Table 3.2.
Table 3.2 Brief description of the options in the uname command
Options Description
-a Displays all the basic information available about the system
-i Displays the name of the hardware platform
-n Displays the node name, the name by which the system is known on the communication network
-r Displays the operating system release level
-v Displays the operating system version
-s Displays the name of the operating system (default)
-S Changes the node name of the system to the specified system name (only the super user can use this option)
Note: Super user and root user refer to the Unix administrator.
Examples
(a) $ uname -a
SunOS station1 5.10 Generic_147441-01 i86pc i386 i86pc
This output shows the basic information of the system, including the hardware
platform, the operating system, its version, and so on.
(b) $ uname -n
station1
This output shows that the node name of our machine on the network is station1.
(c) $ uname -i
i86pc
This output shows the name of the hardware platform; i86pc is the name Solaris uses for x86-based PC hardware.
(d) $uname -r
5.10
This output shows the operating system release level.
(e) $ uname -s
SunOS
This output indicates that our machine is running the SunOS (Solaris) operating system.
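Because uname writes to standard output, a script can capture its fields. The sketch below uses only -s, -n, -r, and -m (the machine hardware name), which POSIX defines. It leaves out -i because some systems do not support it or print unknown for it. The output naturally differs from machine to machine.

```shell
# Print a small system summary; the values differ from system to system
os=$(uname -s)        # operating system name, e.g., SunOS or Linux
node=$(uname -n)      # node (network) name
release=$(uname -r)   # operating system release level
machine=$(uname -m)   # machine hardware name
printf 'OS: %s\nNode: %s\nRelease: %s\nMachine: %s\n' \
  "$os" "$node" "$release" "$machine"
```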
Here, the -m option is used for changing the modification time, and the -a option is used for changing the access time. The time_expression should be in the format MMDDhhmm, where MM is the month, DD the day, hh the hour, and mm the minute.
When the touch command is given without any option and time expression, it creates an empty (zero-byte) file if the file does not exist; if the file already exists, touch sets its access and modification times to the current time.
Examples
(a) $ touch chirag.txt
This creates an empty file called chirag.txt (zero bytes in size).
We can create several empty files quickly with the touch command.
(b) $ touch chirag1 chirag2 chirag3 chirag4
This command creates four new files with the following names: chirag1, chirag2,
chirag3, and chirag4 (without any contents in them).
(c) $ touch 09211520 chirag.txt
This sets the modification and access time of the file chirag.txt to Sep 21 15:20.
(d) $ touch -m 11071015 chirag.txt
This command sets the modification time of the file chirag.txt to Nov 07 10:15.
(e) $ touch -a 07120820 chirag.txt
This will set the access time of the file chirag.txt to Jul 12 08:20.
Note: The commands ls -l and ls -lu can be used to display the modification time and access time, respectively, of any file.
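The examples above use the older MMDDhhmm operand form. The following sketch uses the -t option instead, which POSIX specifies for giving the time as [[CC]YY]MMDDhhmm. The year 2021 is an arbitrary choice. LC_ALL=C is set only so that ls prints English month names. The sketch assumes a POSIX shell and the mktemp utility.

```shell
demo=$(mktemp -d) && cd "$demo" || exit 1
export LC_ALL=C                        # predictable month names in ls output

touch chirag.txt                       # creates an empty (zero-byte) file
touch -m -t 202111071015 chirag.txt    # modification time: Nov 7 2021, 10:15
touch -a -t 202107120820 chirag.txt    # access time: Jul 12 2021, 08:20
ls -l chirag.txt                       # long listing shows the modification time
ls -lu chirag.txt                      # with -u, ls shows the access time instead
```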
The options and arguments shown in the aforementioned syntax are briefly explained in
Table 3.3.
Table 3.3 Brief description of options available with the cat command
Options Description
-n It precedes each output line with its line number.
-s It suppresses messages about non-existent files named in the command.
-v It displays non-printing characters (except tabs, new lines, and form feeds) that exist in a file. To mark the end of each line, the -e option is used along with the -v option. To display tabs and form feeds, the -t option is used along with the -v option. Line ends are marked by ‘$’, tabs are represented by ‘^I’, and form feeds are represented by ‘^L’.
Showing content To display the contents of any file, we just need to specify the filename after the cat command:
$ cat chirag
Note: We assume that the file chirag contains a couple of tabs that are deliberately added to the file.
Creating files For creating files through the cat command, we redirect the standard output
to a file instead of the monitor, as shown in the following example:
$ cat >chirag
When we press the Enter key, the cursor moves to the next line and waits for us to type the text that we want to store in the file chirag. After typing a few lines, we press Ctrl-d at the beginning of a new line. If a file named chirag already exists, its contents are overwritten.
Note: Pressing Ctrl-d signals the end of file (EOF).
Showing hidden characters in files The following command shows the hidden characters and marks the end of each line with $ (refer to Fig. 3.2):
$ cat -ve chirag
Fig. 3.2 New lines displayed in the form of $
The following command shows the hidden characters, displaying tabs as ‘^I’ and form feeds as ‘^L’ (refer to Fig. 3.3):
$ cat -vt chirag
Fig. 3.3 Tabs and form feeds displayed as ^I and ^L
Apart from displaying the contents of a file, the cat command also helps concatenate the contents of files.
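To see the -v options at work without building the file by hand, the sketch below writes a hypothetical file containing a tab and a form feed using printf. It assumes a POSIX shell and the mktemp utility.

```shell
demo=$(mktemp -d) && cd "$demo" || exit 1

# Hypothetical file with a tab between the first two words and a form feed after the second
printf 'Microchip\tComputer\fEducation\n' > chirag

cat -vt chirag    # Microchip^IComputer^LEducation
cat -ve chirag    # the end of the line is marked with $
```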
Concatenating files To concatenate the contents of two files and store them in the third
file, we can use the following command:
$ cat chirag1 chirag2 >chirag3
This command stores the contents of the file chirag1 followed by the contents of the file
chirag2 into the file chirag3. If chirag3 already contains something, it would be overwritten.
If we want it to remain intact and the contents of chirag1 and chirag2 to be appended, we
should use the following command:
$ cat chirag1 chirag2 >>chirag3
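In a script, nobody is there to type the lines and press Ctrl-d. A here-document, which is a shell feature, can supply cat's input instead. The following sketch assumes a POSIX shell and the mktemp utility. It builds chirag1 and chirag2 this way and then shows the difference between > and >>.

```shell
demo=$(mktemp -d) && cd "$demo" || exit 1

# Here-documents stand in for typing the text and pressing Ctrl-d
cat > chirag1 <<'EOF'
Microchip Computer Education
EOF
cat > chirag2 <<'EOF'
Sri Nagar Road, Ajmer
EOF

cat chirag1 chirag2 > chirag3    # > creates or overwrites chirag3
cat chirag1 chirag2 >> chirag3   # >> appends, keeping the existing lines
cat chirag3                      # four lines: the two files, twice
```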
Here, srcfile is the original or source filename, and destfile is the destination filename. If a file with the destination filename already exists, it will be overwritten with the contents of the source file without any warning.
The -i option is used for interactive copying; that is, if a file with the destination filename already exists, cp prompts us before overwriting it.
The -r option is used for recursive copying, that is, when we want to copy an entire directory (along with its subdirectories and files) under another directory name.
Example $ cp chirag chirag1
This example makes a copy of the file chirag under the name chirag1. We can confirm this by looking at the contents of both files: if they are the same, the file chirag has been copied successfully. With the help of the cat command, we can look at the contents of the files chirag and chirag1.
$ cat chirag
Microchip Computer Education
Sri Nagar Road, Ajmer
Gone time never returns
$cat chirag1
Microchip Computer Education
Sri Nagar Road, Ajmer
Gone time never returns
The contents of chirag and chirag1 are the same, which confirms that the file chirag has been copied to chirag1.
$ cp /home/chirag/ajmer/a.bat .
This command copies the file a.bat from the directory ajmer (subdirectory of chirag)
into the current directory. The period (.) at the end of the cp command denotes the current
directory.
For interactive copying, we use the following command:
$ cp -i chirag chirag1
If a file by the name chirag1 already exists, then, before overwriting it, we will be notified
with the following message:
cp: overwrite chirag1 (yes/no)?
Here, we need to enter y followed by the Enter key if we want to overwrite the file.
For copying an entire directory along with its subdirectories, we use the following
command:
$ cp -r courses latestcourses
It will make a copy of the courses directory (along with its files and subdirectories) with the
name latestcourses.
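The sketch below repeats the cp examples in a scratch directory. It assumes a POSIX shell and the mktemp utility, and the file contents are hypothetical. Instead of comparing the two files by eye with cat, it uses cmp, which compares them byte by byte and prints nothing when they are identical.

```shell
demo=$(mktemp -d) && cd "$demo" || exit 1
printf 'Microchip Computer Education\n' > chirag
mkdir -p courses/unix && printf 'ls, cp, mv\n' > courses/unix/topics

cp chirag chirag1                  # copy under a new name
cmp chirag chirag1 && echo "chirag1 is an exact copy"
cp -r courses latestcourses        # copy the whole directory tree
ls latestcourses/unix              # topics
```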
The command in this syntax renames the file oldname to newname.
Example $mv chirag chirag2
This command moves or renames the file chirag to chirag2. When we look at the contents
of the file chirag2, we get the same contents that were in the file chirag, which is shown
here.
$ cat chirag2
Microchip Computer Education
Sri Nagar Road, Ajmer
Gone time never returns
It indicates that the file chirag2 is nothing but the same file that existed earlier with the name
chirag.
$ mv chirag2 ajmer
or
$ mv chirag2 /home/chirag/ajmer
This moves the file chirag2 into the ajmer subdirectory. Now, the file chirag2 is no longer
available in the current directory.
For moving more than one file, we use the following command:
$ mv notes.txt programs.doc /home/chirag/ajmer
The files notes.txt and programs.doc are removed from the current location and moved to
the ajmer subdirectory.
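The three uses of mv can be combined in one sketch. It assumes a POSIX shell and the mktemp utility, with sample names from this section. When mv is given more than two arguments, the last one must be an existing directory.

```shell
demo=$(mktemp -d) && cd "$demo" || exit 1
mkdir ajmer
printf 'Gone time never returns\n' > chirag
touch notes.txt programs.doc

mv chirag chirag2                  # rename within the same directory
mv chirag2 ajmer                   # move into a subdirectory
mv notes.txt programs.doc ajmer    # several files: the last argument is the directory
ls ajmer                           # chirag2 notes.txt programs.doc
```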
Here, each filename is separated by white space. The options and arguments shown in the
aforementioned syntax are briefly explained in Table 3.4.
Table 3.4 Brief description of options available with the rm command
Options Description
-i It is used for interactive file deletion; that is, we are prompted for confirmation before each file is deleted.
-r It is used for recursive deletion; that is, it removes an entire directory along with its files and subdirectories.
-f It forcibly removes files, including write-protected ones, without prompting for confirmation. (Removing a file requires write permission on the directory that contains it, not on the file itself.)
(e) To remove a file that is write protected (for which we do not have the write permission), we can use the following command.
$ rm -f results
This command deletes the results file without asking for confirmation, even though the file itself is write protected.
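The following sketch tries -f on a write-protected file and -r on a small directory tree, all inside a scratch directory. It assumes a POSIX shell and the mktemp utility. The names results and old are hypothetical.

```shell
demo=$(mktemp -d) && cd "$demo" || exit 1

printf 'marks\n' > results
chmod a-w results                  # write-protect the file
rm -f results                      # removed without a confirmation prompt

mkdir -p old/drafts && touch old/drafts/a old/b
rm -r old                          # removes the directory and everything in it
```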
After the ln linking, both newname and oldname refer to the same file.
The default link type is hard. In order to create a symbolic link, the symbolic option (-s)
is used.
Example $ln chirag1 mce1
$ls
Through this command, a hard link will be created for the file chirag1 by the name mce1.
Note: When the –s option is not used with the ln command, a hard link is created.
The ls command lists several files and directories in the current directory, including the two filenames mce1 and chirag1. Next, we give the following command to get a long listing of chirag1:
$ ls -l chirag1
The three groups of rwx are the permissions for the owner, the group, and others; 2 is the number of links (also known as the link count) of the file; and chirag is the owner. The group name is it. The size of the file is 7669 bytes. Next comes the date and time the file was last modified. The output ends with the filename chirag1.
If another link were created, the link count would change to 3.
Note: A link count is an integer value that is maintained for each file or directory and indicates the total number
of links pointing to it. When a new link is created, the link count value is increased by one. Similarly, when a link
is removed, the value is decreased by one. When a link count becomes zero, it means the file or directory has
no links, and hence, the disk space allocated to it is deallocated.
Both mce1 and chirag1 point to the same file. When we look at the contents of the file mce1,
we get the same contents as in chirag1.
$cat mce1
Microchip Computer Education
Sri Nagar Road, Ajmer
Gone time never returns
Note: If we change the contents of the file mce1, the contents of chirag1 will also change, because although
the names mce1 and chirag1 are different, both of them refer to the same file.
To see the inode numbers of the linked files, we give the following command:
$ ls -li chirag1 mce1
20985 -rwxrwxrwx 2 chirag it 320 Nov 11:21 chirag1
20985 -rwxrwxrwx 2 chirag it 320 Nov 11:21 mce1
The -l and -i options together display the inode number along with the long listing of the specified files. We can see that both files have the same inode number, 20985, which confirms that both names point to the same file.
In order to remove a file with more than one link from the file system, we should delete all
the links with the rm command. For example, let us delete the link mce1 using the following
command:
$ rm mce1
The file still exists under the name chirag1 as confirmed by the following command:
$cat chirag1
Microchip Computer Education
Sri Nagar Road, Ajmer
Gone time never returns
If we try to create a link with a name that already exists, for example,
$ ln abc.txt xyz.txt
when the file xyz.txt already exists, the link will not be created and the following error will be displayed:
ln: xyz.txt: File exists
The -f (force) option is used when we want to replace an existing file while creating a link, without getting any message.
$ ln -f abc.txt xyz.txt
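The hard-link behaviour described above can be checked step by step. This sketch assumes a POSIX shell, awk, and the mktemp utility; the file contents are hypothetical. It compares the inode numbers printed by ls -i, shows that a change made through one name is visible through the other, and reads the link count from the second field of ls -l.

```shell
demo=$(mktemp -d) && cd "$demo" || exit 1
printf 'Microchip Computer Education\n' > chirag1

ln chirag1 mce1                          # hard link: a second name for the same file
ls -li chirag1 mce1                      # same inode number, link count 2
printf 'Gone time never returns\n' >> mce1
cat chirag1                              # the appended line appears here too

ino1=$(ls -i chirag1 | awk '{print $1}')
ino2=$(ls -i mce1 | awk '{print $1}')
[ "$ino1" = "$ino2" ] && echo "same inode: $ino1"

printf 'abc\n' > abc.txt && printf 'xyz\n' > xyz.txt
ln abc.txt xyz.txt 2>/dev/null || echo "ln refused: xyz.txt exists"
ln -f abc.txt xyz.txt                    # -f replaces xyz.txt with a link to abc.txt

rm mce1                                  # chirag1 remains; its link count drops to 1
```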
Hard links The default link that is created is a hard link (which we have been using until
now). The following are the characteristics of hard links:
1. Unix hard links can point to programs and files, but not to directories.
2. If the original program or file is renamed, moved, or deleted, the hard link is not broken.
3. Hard links in Unix cannot span different file systems, that is, we cannot have a hard link on
the /usr file system that refers to a program or file on the /tmp file system. The reason is that
hard links share an inode number, whereas each file system has its own set of inode numbers.
Symbolic links Hard links cannot span different file systems; that is, they can be created only within the same file system. Symbolic links (symlinks) are used to link across file systems. The symbolic link, also referred to as soft link, is a special type of file that references another file or directory. It simply contains the path name of the file that it references and contains no actual data. This gives us flexibility in managing files, because we can change a symlink to point to a different file whenever required. Access through a symbolic link is governed by the permissions of the file or directory it points to. To create a symbolic link in Unix, we use the following syntax:
Syntax ln -s target_file symbolic_link
Here, target_file is the name of the existing file for which we want to create the symbolic
link, and the symbolic_link is the symbolic link for the target_file.
Example Consider a file named chirag1. Let us create a symlink called mce1, which points to the original file, chirag1.
$ ln -s chirag1 mce1
We first specify the target file, the file that we want our symlink to point to, and then specify the name of our symbolic link. On executing the ls -al command, we will find that, in the long listing, the permissions field for mce1 begins with ‘l’ and the name is shown as mce1 -> chirag1, which confirms that it is a symbolic link.
Note: An orphan symlink is a symbolic link that points nowhere, that is, the original target file it used to point
to earlier is either deleted or renamed.
The difference between a symbolic link and a hard link is that a symbolic link can point to directories and to files on other file systems, including network-mounted ones. In addition, when we delete a target file, the symbolic links to that file become unusable (orphaned), whereas the hard links preserve the contents of the file.
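The following sketch contrasts the two link types. It assumes a POSIX shell and the mktemp utility, and the names are hypothetical. A symlink can point to a directory, but a hard link to a directory is refused on common systems. Deleting a symlink's target leaves an orphan link: test -L still sees the link, while test -e reports that its target is gone.

```shell
demo=$(mktemp -d) && cd "$demo" || exit 1
printf 'Microchip Computer Education\n' > chirag1
mkdir courses

ln -s chirag1 mce1          # symlink to a file
ln -s courses courselink    # symlinks may also point to directories
ls -l mce1                  # permissions begin with l; name shown as mce1 -> chirag1
ln courses hardlink 2>/dev/null || echo "hard link to a directory refused"

rm chirag1                  # delete the target file
if [ -L mce1 ] && [ ! -e mce1 ]; then
  echo "mce1 is now an orphan symlink"
fi
```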
Terminfo is a database that defines terminal and printer attributes and capabilities. It contains
information such as the number of rows and columns in a terminal and the attributes of text
displayed on the terminal.
Syntax tput [clear][cup row col] [cols][lines][sc][rc][civis][cnorm][dl n][setb]
[setf][bold][sgr0][smul][rmul]
Table 3.5 gives a list of options available with the tput command.
Table 3.5 Brief description of the options available with the tput command
Options Description
clear It clears the whole screen.
cup row col It moves the cursor to the given row and column (both counted from 0).
cols It displays the number of columns on the terminal screen.
lines It displays the number of lines on the terminal screen.
sc It saves the current cursor location.
rc It restores the cursor position, i.e., it returns the cursor to its last saved location.
dl n It deletes n lines, starting with the current row, i.e., the row in which the cursor is positioned.
bold It makes the text appear in bold.
sgr0 It turns off all text attributes, such as bold and underlining.
smul It begins underlining text.
rmul It stops underlining text.
Examples
(a) $ tput cup 10 5
This statement moves the cursor to row 10, column 5 (counting from 0, so this is the eleventh row and the sixth column).
(b) $ tput cols
This statement displays a value 80, which represents the number of columns of the
terminal screen.
(c) $ tput dl 4
This statement deletes four lines below, including the current row.
(d) $ tput bold
This statement makes the text appear in bold until the sgr0 capability is invoked.
(e) $ tput clear
It clears the whole screen.
Note: The tput command is mostly used in scripts, but it is introduced here because its clear option is frequently used for clearing the screen while running commands at the command prompt.
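The sketch below shows a typical scripted use of tput. It reads the terminal width with cols and wraps a title in bold and sgr0 to print it centred and in bold. The fallbacks (80 columns, no bold) are our own assumption for the case where tput or a terminal description is unavailable, for example when TERM is not set. The title text is arbitrary.

```shell
# Fall back to plain output when tput or the terminal description is missing
bold=$(tput bold 2>/dev/null) || bold=''
normal=$(tput sgr0 2>/dev/null) || normal=''
cols=$(tput cols 2>/dev/null) || cols=80
case $cols in ''|*[!0-9]*) cols=80 ;; esac   # guard against empty or odd output

msg='Unix and Shell Programming'
pad=$(( (cols - ${#msg}) / 2 ))              # spaces needed to centre the title
[ "$pad" -gt 0 ] || pad=0
line=$(printf "%${pad}s%s%s%s" '' "$bold" "$msg" "$normal")
printf '%s\n' "$line"
```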
To know whether a user is active, we use the -u option, which also indicates how long it has been since there was any activity. This is known as idle time. The output also shows the process ID (PID) of each user's login process.
$ who -u
anil tty1 Feb 10 14:25 0:45 1103
chirag tty2 Feb 08 11:25 old 1568
ravi tty5 Feb 10 15:10 . 1456
If we look at this example carefully, we will see three different formats for idle time. The first user has had no activity for 0 hours and 45 minutes. The second user has had no activity for more than 24 hours; since the idle time field has room only for up to 24 hours, the system simply shows ‘old’ when a user has been inactive for longer. The third user's idle time is a period (.), which means that he/she has been active within the last minute.
(b) If we use the -H option, Unix displays a header that explains each column.
$ who -uH
NAME LINE TIME IDLE PID COMMENTS
anil tty1 Feb 10 14:25 0:45 1103 (:0)
chirag tty2 Feb 08 11:25 old 1568 (:0.0)
ravi tty5 Feb 10 15:10 . 1456 (:0.0)
(c) If we want to view information about ourselves, we can use the argument am I along with
the who command.
$who am I
chirag tty2 Feb 08 11:25
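Each line of who output begins with the login name, so who combines well with other commands. The following sketch assumes that awk and sort are available. It lists each logged-in user once and counts them. On a machine with no login sessions, such as a container, the count is simply 0.

```shell
# The first field of each line of who output is the login name
users=$(who 2>/dev/null | awk '{print $1}' | sort -u)
count=$(printf '%s' "$users" | awk 'NF {n++} END {print n + 0}')
echo "Users logged in: $count"
printf '%s\n' "$users"
```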
Options Description
-l Displays information of the user in a long format comprising login name, real name, terminal name, write status,
idle time, login time, office location, office phone number, user’s home directory, home phone number, login
shell, mail status, and the contents of the files, .plan, .project, and so on, from the user’s home directory
-s Displays information of the user in a short format comprising login name, real name, terminal name, write
status, idle time, login time, office location, and office phone number
-b Suppresses printing the user’s home directory and shell in a long format display
-w Suppresses printing the full name in a short format display
users. Apart from the login name, terminal name, date, and time of the logged-in users, the
command also displays other information such as the user’s home directory, phone number,
login shell, and mail status, among others.
Syntax finger [-b] [-l] [-s] [-w] [username]
This list shows three active logins. The actual time at which each user logged in and the time
the terminal has been idle are also listed. The idle time is the time that has elapsed since the
last keystroke. From the idle time, we can usually tell whether someone is at the terminal.
For example, we can infer that the root user is probably not at the tty1 terminal, because there has been no keystroke for 10 days. The user on the tty2 terminal has not used the terminal for more than three hours.
The finger command can also be used to get the details of a single user, as shown in the
following example.
$ finger chirag
Login: chirag In real life: (null)
Directory: /home/chirag Shell: /bin/bash
On since Mon Dec 26 02:15 on tty2 from :0.0
No mail.
No Plan.
This output shows the login name, real name (null), home directory, login shell, login time,
terminal name, mail status, and so on.
The arguments are used for displaying the date in the desired format. The list of available
arguments is given in Table 3.8.
Table 3.8 Brief description of the arguments used in the date command
Arguments Description
%d For displaying day (01–31)
%m For displaying month (01–12)
%b For displaying abbreviated month name (Jan, Feb, etc.)
%y For displaying the year—last two digits (00,…, 99)
%Y For displaying the year with century—four digits
%H For displaying hours in 24-hour format (00, 01,…, 23)
%I For displaying hours in 12-hour format (01, 02,…, 12)
%p For displaying AM/PM
%M For displaying minutes (00, 01,…, 59)
%S For displaying seconds (00, 01,…, 59)
%x For displaying only date (07/15/12)
%X For displaying only time (17:15:30)
%a For displaying abbreviated weekday (Fri)
Examples
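(a) $ date +%m
It prints only the month, for example, 07.
(b) $ date +%b
It prints the abbreviated month name, for example, Jul.
(c) $ date +%Y
It prints the year with century, for example, 2012.
(d) $ date +"%I %p"
It displays the hour with AM/PM, for example:
05 PM
There must be no space between + and the format. A format that contains spaces must be quoted so that the shell passes it to date as a single argument.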
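In scripts, date is often used to build readable dates or names for files. The sketch below assumes a POSIX date. The output depends on the current date and time, so the comments show only sample values.

```shell
today=$(date +%d/%m/%Y)       # e.g., 15/07/2012
now=$(date '+%I:%M %p')       # 12-hour clock with AM/PM; quoted because of the space
stamp=$(date +%Y%m%d)         # compact form, handy in filenames such as backup-20120715
printf 'Date: %s\nTime: %s\nStamp: %s\n' "$today" "$now" "$stamp"
```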
Here, values 1–12 represent the month, and values 1–9999 represent the year.
Examples
(a) To display the current month’s calendar, just use the cal command without any arguments
(refer to Fig. 3.4).
$cal
(b) To display the calendar of March 2012, write the following command (refer to Fig. 3.4).
$ cal 3 2012
(c) To display the calendar for a whole year, specify the year in the cal command as shown
in Fig. 3.5.
$ cal 2012
on the command line, and the command immediately displays the result on pressing the Enter
key. To quit the interactive mode, we either press Ctrl-d or type quit followed by the Enter key.
Syntax bc [-l]
-l defines the math functions and initializes the scale to 20, instead of the default zero.
The functions that can be used with the bc command are given in Table 3.10.
Table 3.10 List of functions available with the bc command
Function Description
sqrt() It calculates the square root of the supplied number.
s() It calculates the sine value. The argument should be in radians.
c() It calculates the cosine value. The argument should be in radians.
a() It calculates the arctangent. The result of the function is displayed in radians.
l() It calculates the natural logarithm of the supplied number.
e() It calculates the exponential of the supplied number.
We can use arithmetic operators such as +, -, *, /, %, and ^, where % is the modulus (remainder) operator and ^ is the exponentiation (‘to the power’) operator.
Apart from the -l option, we can also use the scale variable to specify the number of digits to the right of the decimal point.
Examples
$bc
5/3
1
quit
$ bc -l
5/3
1.66666666666666666666
quit
$ bc
2 + 2
4
5/3
1
scale = 2
5/3
1.66
3^2
9
sqrt(81)
9.00
quit
Filename substitution (Globbing)
Filename substitution is the process by which the shell expands a string containing wild cards into a list of filenames. The process of filename substitution is also known as globbing. Apart from the wild cards *, ?, and [c1-c2], which we discussed while learning the ls command, Table 3.11 shows the wild cards that are used in filename substitution.
Table 3.11 Brief description of the wild cards used in filename substitution
Wild card Description
! Used with [ ] to negate the meaning, as in [!a-d]
~ Substitutes the user's home directory
{characters} Matches any one of the comma-separated strings given within the braces (supported by shells such as bash and the C shell, but not by the POSIX sh)
Examples
(a) $ ls *
It displays all the names of the files and directories in the current directory.
(b) $ ls a*
It displays all the names of the files and directories that begin with the character a.
(c) $ ls *a
It displays all the names of the files and directories that end with the character a.
(d) $ ls *ab*
It displays all the names of the files and directories that contain ab.
(e) $ ls a*/*
It displays the names of all the files and directories one level inside each directory whose name begins with the character a.
A wild card never matches the / character; it matches names within a single directory level only. To match names in subdirectories, we need to include the / character in the pattern.
(f) $ ls a*/*/*
It displays the names of all the files and directories two levels inside each directory whose name begins with the character a.
(g) $ ls ???
It displays all the names of the files and directories that consist of three characters.
(h) $ ls ???*
It displays all the names of the files and directories that consist of at least three characters.
(i) $ ls student?.txt
It displays all the names of the files and directories that begin with the word student followed by exactly one character and the extension .txt, such as student1.txt, student2.txt, and studenta.txt.
(j) $ ls [ab]*
It displays all the names of the files and directories that begin with either character a
or character b followed by zero or more occurrences of any character.
(k) $ ls [ab]*[12]
It displays all the names of the files and directories that begin with either character a or
character b followed by zero or more occurrences of any character and which end with
either the digit 1 or 2.
(l) $ ls [ab]*[1-5]
It displays all the names of the files and directories that begin with either character a or
character b followed by zero or more occurrences of any character and which end with
any digit from 1 to 5.
(m) $ ls [a-d]*
It displays all the names of the files and directories that begin with any character from a
through d followed by zero or more occurrences of any character.
(n) $ ls [a-d]??
It displays all the names of the files and directories that begin with any character from a
through d followed by exactly two characters.
(o) $ ls [!a-d]*
It displays all the names of the files and directories that begin with any character except
a through d followed by any number of characters.
(p) $ ls [A-Za-z]*
It displays all the names of the files and directories that begin with any character from a
through z in either upper case or lower case followed by any number of characters.
(q) $ ls [A-Za-z][a-z]*
It displays all the names of the files and directories that begin with any character from a
through z in either upper case or lower case, followed by any character from a through z
in lower case, followed by any number of characters.
(r) $ ls [A-Za-z][a-z][12]
It displays all the names of the files and directories that begin with any character from a
through z in either upper case or lower case, followed by any character from a through
z in lower case, followed by either digit 1 or 2.
(s) $ ls {aa,bb,cc}*
It displays all the names of the files and directories that begin with the characters aa, bb,
or cc followed by any number of characters.
(t) $ ls a*{d,1,z}
It displays all the names of the files and directories that begin with the character a
followed by any number of characters and that end with d, 1, or z.
(u) $ ls a*{d,[1-3],[ab]}
It displays all the names of the files and directories that begin with the character a
followed by any number of characters and that end with d, with a digit from 1 through
3, or with either character a or b.
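Examples (s) to (u) use brace expansion, which, unlike the bracket wild cards, is not part of POSIX sh; it is provided by shells such as bash and ksh. The shell first rewrites a*{d,1,z} into the three patterns a*d a*1 a*z and only then matches each one against the file names. The sketch below assumes bash and uses hypothetical file names. Note that the matches come out grouped by pattern, not as one sorted list.

```shell
#!/usr/bin/env bash
# Sketch (assumes bash): brace expansion runs BEFORE filename matching.
LC_ALL=C; export LC_ALL
demo=$(mktemp -d) || exit 1
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
touch aa1 bb2 cc3 dd4 a1 ab ab2 abd az      # hypothetical test files

b_s=$(echo {aa,bb,cc}*);       echo "{aa,bb,cc}*      -> $b_s"   # aa* bb* cc*
b_t=$(echo a*{d,1,z});         echo "a*{d,1,z}        -> $b_t"   # a*d a*1 a*z
b_u=$(echo a*{d,[1-3],[ab]});  echo "a*{d,[1-3],[ab]} -> $b_u"
# Brace expansion produces words even when no file exists:
echo x{1,2,3}                  # prints x1 x2 x3
```

The second line prints `abd a1 aa1 az`: abd matches a*d, then a1 and aa1 match a*1, then az matches a*z.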
(v) The tilde (~) character by itself expands to the full path name of the user’s home directory.
The following echo command confirms this:
$ echo ~
/home/bintu
(w) When the tilde is placed at the beginning of a path, it expands to the home directory followed by the rest
of the path name. Consider the following command.
$ cd ~/data
(x) We will be taken into the directory data, which is present within the user’s home directory.
The following pwd command confirms this.
$ pwd
/home/bintu/data
(y) When the tilde is placed immediately before a username, it expands to the full path name of that
user’s home directory. Consider the following command.
$ cd ~john
We will be taken to the user john’s home directory. The following command confirms this:
$ pwd
/home/john
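Because the shell, not cd or echo, performs tilde expansion, the same rules apply to every command. The sketch below assumes a POSIX shell with the HOME variable set. It shows the two forms above and one case that often surprises beginners: a quoted tilde is left alone.

```shell
#!/bin/sh
# Sketch: tilde expansion is done by the shell before the command runs,
# and only for an unquoted ~ at the start of a word.
t_home=$(echo ~)            # same as $HOME
t_data=$(echo ~/data)       # $HOME/data (the directory need not exist)
t_quoted=$(echo "~/data")   # quoted: stays as the literal text ~/data
echo "~      -> $t_home"
echo "~/data -> $t_data"
echo "quoted -> $t_quoted"
```

The ~john form works the same way but needs that user to exist on the machine, so it is not exercised here.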
➢ exit: Exiting
The exit command is used to log out of the Unix system, exit from a shell, and exit from a
shell script.
Syntax exit
Example exit
Pressing Ctrl-d at the shell prompt is a shortcut for logging out of the Unix shell. Before exiting from the Unix
system, we should make sure that all open files are saved and closed; otherwise, they
might get corrupted. Usually, when we exit from the shell, any process or command still running from it
is automatically killed. To keep a task running in the background even after
exiting from the shell, we should use the nohup command (discussed in Chapter 6).
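Inside a shell script, exit also accepts an optional numeric exit status, which the caller can read through the special parameter $?. By convention, 0 means success and any other value signals failure. The sketch below starts a hypothetical one-line child shell and inspects the status it returns.

```shell
#!/bin/sh
# Sketch: exit hands a numeric status back to whoever started the shell.
sh -c 'echo "child shell running"; exit 3'
child_status=$?                 # $? holds the status of the last command
echo "child shell exited with status $child_status"
```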
■ SUMMARY ■
1. When compared with the who command, the finger command displays more elaborate information pertaining to users who are logged in.
2. The finger command not only displays the login name, terminal, date, and time of the logged-in users, but also displays other information such as the user’s home directory, phone number, login shell, mail status, and much more.
3. Filename substitution or globbing is the process by which the shell expands a string containing wild cards into a list of filenames.
■ FUNCTION SPECIFICATION ■
■ EXERCISES ■
Objective-type Questions
State True or False
3.1 The ls command shows the list of files and directories that are sorted alphabetically by default.
3.2 The option used with the ls command to see the names of the files and directories in reverse order is -R.
3.3 We can create only one directory at a time using the mkdir command.
3.4 The cd command, if given without any arguments, will take us to our home directory.
3.5 With the touch command, we can only change the timestamps of the files but cannot create files.
3.6 While creating a file with the cat command, we need to use Ctrl-d to specify the end of the file.
3.7 With the rmdir command, we can remove the non-empty directory as well.
3.8 If we use the -i option with the cp command, it will prompt us before overwriting the destination file if it already exists.
3.9 The cp command is used for making a copy of the files; we cannot use it for copying an entire directory with its files and subdirectories.
3.10 We can delete more than one file with a single rm command.
3.11 With the rm command, we can forcibly delete a file even if we do not have its write permission.
3.12 With the mv command, we can move a file from one directory to another but cannot rename it.
3.13 The hard link should be created within the current directory structure.
3.14 We can log out of the Unix system using Ctrl-d.
3.15 Through the cal command, we cannot see the calendar of the previous month.
3.16 The mail status of the user can be seen through the finger command.
3.17 The uname command can be used to know the version and release of the operating system.
3.18 The wild-card character ‘?’ represents a single character.
3.19 The bc or the basic calculator command can be used to find the square root of a number.
3.20 The unlink command cannot delete symbolic links.
delete an empty parent directory is ______.
3.7 The ______ option is used with the rm command to recursively delete all the files and subdirectories of the specified directory.
3.8 The command used to create a link for a file is known as ______.
3.9 There are two types of links to a file: ______ and ______.
3.10 The option used with the date command to display only the time is ______.
3.11 The function used to find the natural logarithm in the bc command is ______.
3.12 The command used to display the calendar of the current year is ______.
3.13 The command used to display information of the logged-in user, including home directory, login shell, mail status, and phone number is ______.
3.14 The option used with the ls command to display the inode number of files is ______.
3.15 The option used with the cat command that displays non-printing characters in the file is ______.
Multiple-choice Questions
3.1 The command bc -l sets the scale to
(a) 20 (b) 5 (c) 10 (d) 6
3.2 The tput cup 7 5 command moves the cursor to the
(a) seventh row and fifth column
(b) fifth row and seventh column
(c) top left corner of the screen
(d) right bottom corner of the screen
3.3 The command date +%M will display
(a) month in character form
(b) month in numerical form
(c) minutes
(d) a.m./p.m.
3.4 The option used in the cp command for interactive copying is
(a) -i (b) -r (c) -c (d) -d
3.5 The following option is used in the cat command to suppress messages when a non-existent file is used in the command:
(a) -o (b) -v (c) -n (d) -s
3.6 Apart from displaying contents of the files, the command used for concatenating files is
(a) concat (b) cat (c) merge (d) add_files
3.7 There are two types of links to files: hard and
(a) tough (b) robust (c) volatile (d) symbolic
3.8 The command echo ~ will display
(a) error
(b) list of files and directories
(c) home directory of the user
(d) profile file
3.9 The following command is used to display the names of the files and directories that consist of at least two characters:
(a) ls ??* (b) ls (c) ls * (d) ls ?*
3.10 The option used in the ls command to show files and directories that are sorted on their modification time is
(a) -m (b) -a (c) -t (d) -u
Programming Exercises
3.1 What will the following commands do?
(a) $ ls [a-d]??
(b) $ ls [a-z][0-9]*
(c) $ ls -Rt
(d) $ mkdir -m 740 apple
(e) $ mkdir -p fruits/delicious/apple
(f) $ touch 07151000 mbacourse.txt
(g) $ cat mbacourse.txt lawcourse.txt
(h) $ rmdir -p fruits/delicious/apple
(i) $ cp /fruits/delicious/apple/juice.txt/college/students
(j) $ rm -r college
(k) $ mv mbacourse.txt management.txt
(l) $ ln -f juice.txt energy.txt
(m) $ finger Charles
(n) $ bc
scale = 2
17/3
(o) $ cal 10 2012
3.2 Write the command for the following tasks:
(a) To display the list of files and directories that begin with a vowel
(b) To change the access time of the file mbacourse.txt to Feb 10 09:15
(c) To show the contents of the file mbacourse.txt along with line numbering
(d) To concatenate the contents of the two files mbacourse.txt and lawcourse.txt and store them in a third file career.txt
(e) To remove the empty subdirectories, students and teachers, from the college directory
(f) To copy the entire directory teachers along with its subdirectories in the name faculty
(g) To forcibly remove the file mbacourse.txt from the college directory
(h) To move the file mbacourse.txt from the current directory to the professional subdirectory of the college directory
(i) To change the password
(j) To create a link of the file mbacourse.txt in the name management.txt (If a file by the name management.txt already exists, we should be asked for a confirmation before overwriting its contents.)
(k) To get the list of all online users with their activity and column headers
(l) To display day, month, and year in the format 17 Nov 2012
(m) To log out from the Unix system
(n) To show all the names of the files and directories that begin with any character from a through z followed by exactly three characters
(o) To find the square root of number 17 (The result should be displayed up to five places of decimals.)
Review Questions
3.1 Explain the following commands with their syntax and examples.
(a) ls (b) who (c) touch (d) rmdir (e) cp
3.2 Explain the differences between the following:
(a) Hard and symbolic links
(b) who and finger commands
(c) cat and touch commands
(d) rm and rmdir commands
3.3 What is the use of the bc command? Explain a few functions that are associated with it.
3.4 What do you mean by escape characters? Explain their usage through the echo command.
3.5 Explain the term globbing with examples.
3.6 What is the use of the date command? Name the options that are used with the date command to display only the year, hour in military format, and only the day.
3.7 Explain the command used to exploit terminal capabilities.
3.8 Explain with examples the command that is used to display the calendar of the desired month and year.
Brain Teasers
3.1 In the long-listing command ls -li, if you find two or more files having the same inode number, what does it mean?
3.2 Identify the error in the following command and correct it to display all the files that consist of exactly four characters.
$ ls ****
3.3 Identify the error in the following command and correct it to display the hardware platform of the current machine.
$ uname -v
3.4 Can you display the node name, that is, the name by which your machine is connected in the communication network? If yes, mention the command.
3.5 Consider the following cat command:
$ cat chirag notes.txt
It displays an error indicating that the file notes.txt does not exist. How can you avoid this error message?
3.6 If on using the s() function in the bc command for finding a sine value, a wrong answer was obtained, identify the error.
3.7 You want to change your password but the following command is not working. Where is the error?
$password
3.8 Is there any way to copy the content of the files a.txt and b.txt to a file c.txt without deleting the earlier content of file c.txt? If yes, what is that?
3.9 What command should be given to display the hardware platform and name of the operating system on a machine?
3.10 You wish that a confirmation prompt appears before deleting the files. However, by using the following command, the confirmation message is not prompted. Where and what is the error?
$ rm -f a*.*
3.11 The following command creates a hard link of the file a.txt in the name b.txt. What changes are required to be made to this command in order to create a symbolic link instead of a hard link?
$ ln a.txt b.txt
3.12 The following bc command displays the result to 0 places of decimal. What change is required to be made in order to get the result up to 20 decimal places?
$ bc
17/3
3.13 What is the mistake in the following command for changing the modification time of the file a.txt to Oct 15 04:15?
$ touch -a 10150415 a.txt
3.14 The following command to recursively copy the content of the directory projects to experiments is not working. Identify the error and correct it.
$ cp projects experiments
3.15 The following date command is not displaying the century in four digits. Identify the error and correct it.
$ date +%y
4
Advanced Unix Commands
After studying this chapter, the reader will be conversant with the following:
• Advanced commands used in the Unix operating system such as setting access permissions for the
existing files and directories, setting default permissions for the newly created files and directories,
creating groups, changing ownerships of the files, and sharing files among groups
• Sorting content and performing input/output (I/O) redirections, that is, diverting the output of a command
to a file or providing input to a command from a file
• Cutting or slicing the file vertically, pasting content, splitting files, counting characters, words, and lines
in files or other content, and using a pipe operator, that is, sending the output of a command as input to
another command
• Displaying the top and bottom contents of a file, presenting content page-wise, and displaying the
manual of any command
• Comparing files, eliminating and displaying duplicate lines in two files, and displaying and suppressing
the unique and common content in two files
• Printing documents, setting reminders of appointments, carrying out conversions between DOS and
Unix files, and measuring time usage in the execution of commands
4.1 OVERVIEW
The advanced Unix commands help us perform several tasks such as setting access permissions
for the existing files and directories, setting default permissions for the newly created files and
directories, changing ownership of the files, and sharing files among groups. These commands
also include sorting file content, performing input/output (I/O) redirections, and piping the
output of a command as input to another command. Unix also offers commands for operations
such as cutting or slicing the file vertically, pasting content, splitting files, counting characters,
words, and lines in files, extracting the top and bottom contents of files, presenting content
page-wise, and displaying the manual pages of commands. These commands also include comparing files,
eliminating and displaying duplicate lines in two files, suppressing the unique and common content
in two files, printing documents, setting reminders of appointments, carrying out conversions
between DOS and Unix files, and measuring the time usage in the execution of commands.
The list of advanced commands that will be covered in this chapter is as follows:
chmod, umask, chown, chgrp, groups, input/output redirection in Unix, pipe operator, cut,
paste, split, wc, sort, head, tail, diff, cmp, uniq, comm, time, pg, lp, .profile, calendar,
script, dos2unix, and man.
We can view the permissions of a file or directory through the long listing command. The
following example shows the long listing of file mce1.
Example $ ls -l mce1
This statement requests the long directory listing for the ordinary file called mce1. We might
get the output shown in Fig. 4.1.
The dash (-) in the file type field indicates that it is an ordinary file. The access permissions
field tells us what kinds of access permissions are granted. The number 1 in the links field indicates that
there is only one link to this file from a directory, which means that the file has only one
name associated with it. The word chirag is the owner’s name; the word it is the name of the group that has
access to this file; 120 is the file size in bytes; Mar 15 12:20 is the date and time the file was last
modified; and mce1 is the filename.
Fig. 4.1 Output of the long listing command for the file mce1 (fields: file type, permissions, links, owner, group, size, date and time of last modification, filename)
We have seen that the long listing shows the permissions for all three categories of users (User,
Group, and Other), besides other information such as the name of the file (or directory), its size, and the
date and time of its last modification. Assume that the permissions for the file mce1 are as follows:
rwxr-xr-- 1 chirag it 120 Mar 15 12:20 mce1
The first three characters, r, w, and x, are the permissions for the User. The next three characters
are the permissions for the Group members, and the last three characters represent the permissions
for Other users. The aforementioned output indicates that the User has all three permissions,
rwx (read, write, and execute), for the file mce1. The permissions r-x indicate that the
Group members have read and execute permissions for the file mce1; a missing permission is
represented by a hyphen (-). The Other users have only r, that is, read permission for the file mce1.
Suppose the permissions for the file mce1 are as follows:
r-x--x--- 1 chirag it 120 Mar 15 12:20 mce1
The permissions indicate that the User has r-x, that is, read and execute permissions for
the file mce1. The Group members have --x, that is, only execute permission for the file,
and the Other members have no permission (---), that is, the Other members cannot read,
write, or execute the file mce1.
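Each group of three characters corresponds to one octal digit, because r, w, and x carry the values 4, 2, and 1 (the same values chmod uses in Table 4.3). The sketch below defines a hypothetical helper, perm_to_octal, for POSIX sh. It converts the nine permission characters of a long listing into that three-digit number and deliberately ignores special flags such as s and t.

```shell
#!/bin/sh
# perm_to_octal is a hypothetical helper, not a standard Unix command.
# Argument: the nine permission characters, e.g. rwxr-xr-- (no file-type dash).
perm_to_octal() {
    octal=""
    for start in 1 4 7; do                       # User, Group, Other triples
        triple=$(printf '%s\n' "$1" | cut -c "$start-$((start + 2))")
        digit=0
        case $triple in r??) digit=$((digit + 4)) ;; esac   # read    = 4
        case $triple in ?w?) digit=$((digit + 2)) ;; esac   # write   = 2
        case $triple in ??x) digit=$((digit + 1)) ;; esac   # execute = 1
        octal="$octal$digit"
    done
    printf '%s\n' "$octal"
}

perm_to_octal rwxr-xr--    # first listing above: prints 754
perm_to_octal r-x--x---    # second listing above: prints 510
```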
Let us take a look at how we can assign and remove permissions from a file or directory.
Table 4.2 Brief description of options used with the chmod command
Option Description
u Represents User or the owner of the file
g Represents Group
o Represents Other
a Represents all (User, Group, and Other). It is the default option
+ Adds access permission
- Removes access permission
= Assigns permission to u, g, o, or a
Table 4.3 Brief description of modes used with the chmod command
Mode Description
r or 4 Represents read permission
w or 2 Represents write permission
x or 1 Represents execute permission
This command assigns permission 7, 4(r) + 2(w) + 1(x), that is, read, write, and
execute permissions for the file a.txt to the user (or owner) of the file. Permission
6, 4(r) + 2(w), that is, read and write permission, is assigned to the group members of the
file, and 0, or no permission, to other users. The other users cannot read, write, or execute
the file a.txt. Refer to Fig. 4.2 to view the output of the command.
(c) $chmod o+r a.txt
This command adds the read permission to the other members for the file a.txt. Other
existing permissions are left undisturbed. Refer to Fig. 4.2 to view the output of the
command.
(d) $chmod u-x,g-w+x,o+wx a.txt
It removes the execute permission of the user (i.e., owner), removes the write permis-
sion of the group members, adds execute permission to the group members, and adds
write and execute permissions to the other users. The existing permissions are left
undisturbed. Refer to Fig. 4.2 to view the output of the command.
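To see the examples side by side, the sketch below applies them in turn to a throwaway file a.txt in a temporary directory, assuming a POSIX shell. The first step uses 760, the octal mode matching the permissions described above (7 for the user, 6 for the group, 0 for others). After each step, ls -l piped through cut shows just the nine permission characters.

```shell
#!/bin/sh
# Sketch: watch chmod change a throwaway file step by step.
demo=$(mktemp -d) || exit 1
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
touch a.txt
perms() { ls -l a.txt | cut -c 2-10; }   # the nine rwx characters only

chmod 760 a.txt;            step1=$(perms)   # user rwx, group rw-, others ---
chmod o+r a.txt;            step2=$(perms)   # example (c): add read for others
chmod u-x,g-w+x,o+wx a.txt; step3=$(perms)   # example (d)
echo "760            -> $step1"
echo "o+r            -> $step2"
echo "u-x,g-w+x,o+wx -> $step3"
```

The last line prints `rw-r-xrwx`: the user loses x, the group trades w for x, and others gain w and x on top of the r added in example (c).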
Note: There should not be any space after the comma (,) or while specifying permissions of the user, group,
and others in the command.
The first 0 indicates that what follows is an octal number. The three digits that follow the first
zero refer to the permissions to be denied to the owner, group, and others. This means that for
the owner no permission is denied, whereas for both the group and others, write permission
(2) is denied.
Whenever a file is created, Unix assumes that the permissions for this file should be 666.
However, since our umask value is 022, Unix masks off these bits (write permission for the group
and others) from the default system-wide permissions (666), resulting in the value 644. This value
is then used as the permissions for the file that we create.
This is the reason why the permissions turned out to be 644 or rw-r--r-- for the file
chirag that we created.
Similarly, the system-wide default permissions for a directory are 777. This implies that
when we create a directory its permission would be 777 − 022, that is, 755.
Note: If a directory does not have execute permission, we cannot enter it (for example, with cd) or access the files it contains.
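This would ensure that from this point onwards, any new file that we create would have the
permissions 424 and any directory that we create would have the permissions 435. The umask
bits are masked off rather than subtracted: for the owner, the mask digit 3 (write and execute)
removes w from rw- (6) and leaves r-- (4), whereas the subtraction 6 − 3 would wrongly give 3.
For directories the two methods happen to agree here, since 777 − 342 = 435.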
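The sketch below, assuming a POSIX shell, lets the system do the masking itself. It creates a file and a directory under umask 022 and again under umask 342 in a scratch directory, then prints the resulting permission strings.

```shell
#!/bin/sh
# Sketch: observe how umask masks bits off 666 (files) and 777 (directories).
demo=$(mktemp -d) || exit 1
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
perms() { ls -ld "$1" | cut -c 2-10; }

umask 022; touch f022; mkdir d022
umask 342; touch f342; mkdir d342
umask 022                                  # back to a common default

f022=$(perms f022); d022=$(perms d022)
f342=$(perms f342); d342=$(perms d342)
echo "umask 022: file $f022 (644)  directory $d022 (755)"
echo "umask 342: file $f342 (424)  directory $d342 (435)"
```

The file created under umask 342 shows r---w-r--, that is 424, confirming that the owner digit is masked (6 with bits 3 removed gives 4) rather than subtracted.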
The options and arguments of this command are briefly explained in Table 4.4.
Table 4.4 Brief description of options used in the chown command
Option Description
-R The command applies recursively to the specified directory and to the files and subdirectories within it.
new_owner It is the new owner of the files, that is, new_owner becomes the owner of the files
and hence gets the owner’s (User) permissions on them and the right to modify their access permissions.
new_group It is the group name to which we want to assign the files.
filenames These are the files whose ownership we wish to change.
To change both the owner and the group of the file, new_owner must be followed by a
colon and a new_group with no space in between.
Note: If no new_group is specified after the new_owner and colon, the owner of the file is changed
to new_owner and its group is changed to the login group of new_owner.
If the new_owner is missing but colon and new_group are specified then only the group of the files is changed,
that is, the command will act as the chgrp command. We will learn about the chgrp command next.
Examples By default, when we create or copy a file, we become its owner. For example,
suppose we have a file named notes.txt and we want to change its ownership to another
person named Ravi.
Let us first view the current owner of the file by giving the following command:
(a) $ ls -l notes.txt
-rwxrwxr-x 1 chirag it 120 Mar 15 12:20 notes.txt
We can see that chirag is the current owner of the file. Now chirag can give the following
command to give the ownership of the file notes.txt to Ravi.
(b) $ chown ravi notes.txt
To see whether the ownership is changed, let us again give the ls –l command.
(c) $ ls -l notes.txt
-rwxrwxr-x 1 ravi it 120 Mar 15 12:20 notes.txt
We can see that the owner of the file notes.txt is changed from chirag to ravi.
Now, chirag will no longer be able to change the permissions of the file notes.txt and
only ravi can do so.
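Note: This process is one-way because we must be either the owner of the file or the super user to
change its ownership. After we give the file to ravi, we cannot get its ownership back unless
ravi issues the chown command to return the ownership to us. On many systems, including Linux, only the
super user may give a file away at all, so chirag would need super-user privileges to run the command in example (b).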
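Because handing a file to another user normally needs super-user privileges, the sketch below sticks to a form an ordinary user can run. It uses the colon syntax with no owner before the colon, which changes only the group, as the earlier note describes. The caller's own primary group (from id -gn) stands in for a hypothetical group such as it. The commented lines list the other forms for reference only; the last one follows the earlier note and matches GNU chown.

```shell
#!/bin/sh
# Sketch: chown with ":group" changes only the group, just like chgrp.
demo=$(mktemp -d) || exit 1
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
touch notes.txt
my_group=$(id -gn)                 # our own primary group (always allowed)

chown ":$my_group" notes.txt       # owner untouched, group set to my_group
new_group=$(ls -l notes.txt | awk '{ print $4 }')   # 4th column = group
echo "group of notes.txt is now $new_group"

# Other forms (need super-user rights when the owner really changes):
#   chown ravi notes.txt            -> owner ravi, group unchanged
#   chown ravi:hospital notes.txt   -> owner ravi, group hospital
#   chown ravi: notes.txt           -> owner ravi, group = ravi's login group
```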
The options and arguments of this command are briefly explained in Table 4.5.
Table 4.5 Brief description of the options used in the chgrp command
Option Description
-R It recursively changes the group of the files and subdirectories of the specified directory.
-h If the specified file is a symbolic link, its group is changed. In the absence of a -h option,
the group of the file referenced by the symbolic link is changed and not the symbolic link.
new_group It is the group name we want to assign the files to.
filenames Specifies the files whose group we want to change.
By default the file we create gets group ownership in the group we belong to, that is, the
group to which the owner belongs becomes the default group ownership of the file.
For example, if we belong to the group it, our file will also have the same group ownership,
as can be seen by the following command:
$ ls -l notes.txt
-rwxrwxr-x 1 chirag it 120 Mar 15 12:20 notes.txt
The following command is used to change the group ownership of a file named notes.txt
from group it to group hospital:
Note: The group hospital must exist before giving this command.
Note: Since we are still the owner of the file, we can again change its group ownership any time.
Examples
(a) $chgrp -R hospital projects
This command changes the group of all the files and subdirectories present in the projects
directory to hospital.
(b) $chgrp -h hospital finance.txt
This changes the group of the symbolic link finance.txt itself, rather than that of the file it points to, to hospital.
Example
(a) % groups chirag
mba
This example displays the group of the user chirag. The output mba signifies that the
user chirag belongs to the group named mba.
We can also find the group membership of more than one user simultaneously as follows:
(b) % groups chirag ravi
chirag : mba
ravi : other
This command displays the groups of the two users, chirag and ravi. The output indicates
that the user chirag belongs to the mba group and the user ravi belongs to the other group.
This will create a group by the name bankproject. After creating a group, the next step is to
set the group ownership of the file(s) to the given group using the chgrp command.
Syntax $ chgrp groupname filename
Example
(a) $ chgrp bankproject accounts.txt
This will set the group owner of the file accounts.txt to our newly created group bankproject.
Similarly, we need to change the group ownership of all the files that we wish to share with
the users of our group bankproject. Thereafter, we need to set the file permissions so that
everybody in the group can read and write the file through the following syntax:
Syntax $ chmod g+rw filename
We can also assign access permissions to the group in the following way:
Syntax $ chmod 770 filename
This example assigns read, write, and execute permissions to the owner and group members
of the file and no permission to the other users.
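The sharing steps can be rehearsed without administrator rights. Creating a new group such as bankproject needs those rights, so the sketch below substitutes the caller's own primary group (from id -gn). It then runs the same chgrp and chmod commands on a throwaway accounts.txt and prints the permissions after each chmod.

```shell
#!/bin/sh
# Sketch of the sharing steps; our own primary group stands in for the
# hypothetical group bankproject.
demo=$(mktemp -d) || exit 1
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
touch accounts.txt
chmod 600 accounts.txt                   # start with owner-only access
share_group=$(id -gn)

chgrp "$share_group" accounts.txt        # step 1: set the group ownership
chmod g+rw accounts.txt                  # step 2: group may read and write
after_grw=$(ls -l accounts.txt | cut -c 2-10)
chmod 770 accounts.txt                   # alternative: rwx for owner and group
after_770=$(ls -l accounts.txt | cut -c 2-10)
file_group=$(ls -l accounts.txt | awk '{ print $4 }')
echo "group $file_group; after g+rw: $after_grw; after 770: $after_770"
```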
Here, input_file is the name of the file from where the data will be supplied to the command
for the purpose of computation.
Examples
(a) $ sort < kk
The sort command in the example receives the input stream of bytes from the file kk.
We can also combine input and output redirection operators.
(b) $ sort < kk > mm
On using the command, nothing will appear on the terminal screen; instead the content
of the file kk will be sorted and sent directly to the file mm.
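A minimal sketch of both examples, assuming a POSIX shell and hypothetical contents for kk, makes the difference visible. The first sort writes to the screen. The second writes nothing there, and the result only appears when mm is read back.

```shell
#!/bin/sh
# Sketch: < feeds a file to sort; > captures sort's output in another file.
LC_ALL=C; export LC_ALL
demo=$(mktemp -d) || exit 1
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
printf 'pear\napple\nmango\n' > kk       # hypothetical contents for kk

sort < kk            # example (a): apple, mango, pear appear on the screen
sort < kk > mm       # example (b): nothing on screen; the result goes to mm
in_mm=$(cat mm)
echo "mm now holds:"
echo "$in_mm"
```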
Here, the output of the cat command is sent as input to another command, wc. The wc command
counts the lines, words, and characters in the file notes.txt whose content is passed to it.
We can combine several commands with pipes on a single command line as follows:
$ cat notes.txt | sort | lp
This command sorts the content of the file notes.txt and sends the sorted content to the
printer for printing.
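Note: The pipe operator provides a one-way flow of data, from the command on the left to the command on its right,
whereas the redirection operators work in both directions: < supplies a file as input to a command, and > sends a command’s output to a file.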
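The sketch below, assuming a POSIX shell and a hypothetical three-line notes.txt, chains commands the same way. Because a printer may not be available, head -n 1 takes the place of lp at the end of the sort pipeline. tr removes the padding some wc versions add.

```shell
#!/bin/sh
# Sketch: chain commands with |; data flows left to right only.
LC_ALL=C; export LC_ALL
demo=$(mktemp -d) || exit 1
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
printf 'one two\nthree\nfour five six\n' > notes.txt   # hypothetical notes.txt

line_count=$(cat notes.txt | wc -l | tr -d ' ')   # wc reads the pipe, not a file
word_count=$(cat notes.txt | wc -w | tr -d ' ')
first_sorted=$(cat notes.txt | sort | head -n 1)  # head stands in for lp here
echo "lines=$line_count words=$word_count first sorted line: $first_sorted"
```

It prints `lines=3 words=6 first sorted line: four five six`.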
Here, -c refers to columns or characters and -f refers to fields, which by default are delimited
by the tab character.
Examples
(a) $ cut -c 6-22,30-35 bank.lst
This command retrieves characters 6 through 22 and 30 through 35 of each line of the file
bank.lst and displays them on the screen.
Let us look at another example.
(b) $ cut -f2 bank.lst
We get the content of the second field of the file bank.lst displayed on the screen.
Let us assume the file bank.lst has the following content.
101 Anil
102 Ravi
103 Sunil
104 Chirag
105 Raju
Note: The fields in the file bank.lst are separated by a tab space.
Here, the cut command will display the second field of the file bank.lst, that is, we will get
the output shown in Fig. 4.3.
$ cut -f2 bank.lst
Anil
Ravi
Sunil
Chirag
Raju
Fig. 4.3 Output displaying second field of the file bank.lst
The fields in the file bank.lst are delimited by a tab. If they are separated by a delimiter other than
tab, then the output of the cut command will be different.
Let us assume the file bank.lst has the following content.
101,Anil
102,Ravi
103,Sunil
104,Chirag
105,Raju
We can see that the fields of the file bank.lst are delimited by a comma (,) and not by a tab.
The following command will therefore not extract the names, as the default delimiter for
identifying fields is the tab character.
$ cut -f2 bank.lst
Since these lines contain no tab, the file bank.lst is considered to consist of a single field on each
line, and cut prints every line unchanged (with the -s option, such lines would be suppressed instead).
To specify the delimiter when the fields are delimited by a character other than tab, as in the
aforementioned file, we use the -d (delimiter) option to specify the field delimiter, as shown in the
following example:
$ cut -f2 -d "," bank.lst
This statement will show the second field of the file bank.lst where the fields are delimited
by commas (,).
Assume that the fields are delimited by a comma (,). The following statement cuts all the fields,
from the first field to the end of the line, from the file bank.lst:
$ cut -d"," -f1- bank.lst
Assuming that the fields are delimited by the pipe character (|), the following statement cuts the first
field and all the fields from the fourth to the end of the line from the file bank.lst:
$ cut -d"|" -f1,4- bank.lst
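A small sketch with hypothetical two-line files, assuming a POSIX cut, contrasts the default tab delimiter with -d. It also shows what happens when a line contains no tab at all: POSIX cut passes such lines through unchanged unless -s is given.

```shell
#!/bin/sh
# Sketch: cut on tab-separated, comma-separated, and |-separated data.
demo=$(mktemp -d) || exit 1
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
printf '101\tAnil\n102\tRavi\n' > tab.lst        # hypothetical sample files
printf '101,Anil\n102,Ravi\n' > bank.lst
printf '1|a|b|c|d\n' > pipes.lst

tab_f2=$(cut -f2 tab.lst)              # tab is the default delimiter -> names
comma_f2=$(cut -f2 bank.lst)           # no tab on the line -> printed unchanged
comma_s=$(cut -s -f2 bank.lst)         # -s drops lines lacking the delimiter
comma_d=$(cut -d"," -f2 bank.lst)      # correct delimiter -> names
chars=$(cut -c 1-3 bank.lst)           # character positions 1 to 3
pipe_f=$(cut -d"|" -f1,4- pipes.lst)   # field 1, then fields 4 to the end

printf '%s\n' "$tab_f2" "$comma_f2" "$comma_d" "$chars" "$pipe_f"
```

The last value is `1|c|d`: cut keeps the input delimiter between the selected fields.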
Can we cut the fields of two separate files and paste them to make a third file? Yes, of course.
Let us see how.
Assume there are two files, Names and Telephone, with the following contents.
The Names file consists of employee codes and names as follows:
101 Anil
102 Ravi
103 Sunil
104 Chirag
105 Raju
The Telephone file consists of employee codes and telephone numbers as follows:
101 2429193
102 3334444
103 7777888
104 9990000
105 5555111
Let us cut the second field from both the files and paste them to make a third file, that is, cut
the employee names from the Names file and telephone numbers from the Telephone file and
paste them to create a third file.
To cut out the second word (field) from the file Names, we give the following
command:
$ cut -f2 Names
By redirecting the output of this command, and of the same command applied to the file Telephone, the names and
telephone numbers are saved in two files, names.txt and numbers.txt, respectively. To paste the content of the two files, we need to understand the paste command.
Let us now study this command.
Table 4.6 Brief description of the options used in the paste command
Option Description
-s The paste command usually displays the corresponding lines of each specified file.
The -s option refers to a serial option and is used to combine all the lines of each file
into one line and display them one below the other.
-d This option is for specifying the delimiter to be used for pasting lines from the specified
files. The default delimiter used to separate the lines from the files is the Tab character.
The output will be as shown in Fig. 4.5. We can see that the corresponding lines of the files
names.txt and numbers.txt are pasted with a tab character in between. By default, the paste
command uses the tab character for pasting lines; however, we can specify a delimiter of our
choice with the -d option, as shown in the following example.
Fig. 4.5 Pasting of two files names.txt and numbers.txt with the default tab character in between
$ paste -d"|" names.txt numbers.txt
This joins the two files with the help of the | delimiter and not tab (i.e., between names and
telephone numbers, there will be a ‘|’ symbol instead of the tab character, as shown in Fig. 4.6).
Fig. 4.6 Two files names.txt and numbers.txt pasted with the ‘|’ symbol in between
The example shown in Fig. 4.7 serially pastes the contents from the files. It combines all the
lines of each file into one line and displays them one below the other.
$ paste -s names.txt numbers.txt
Anil Ravi Sunil Chirag Raju
2429193 3334444 7777888 9990000 5555111
Fig. 4.7 Two files names.txt and numbers.txt pasted one below the other
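The whole cut-and-paste workflow fits in a few lines. The sketch below assumes tab-separated Names and Telephone files, as the cut -f2 command above expects, shortened to three employees. It saves the two cut results with output redirection and then pastes them in the three ways just described.

```shell
#!/bin/sh
# Sketch: cut the second field of Names and Telephone, then paste the results.
demo=$(mktemp -d) || exit 1
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
printf '101\tAnil\n102\tRavi\n103\tSunil\n' > Names        # shortened, tab-separated
printf '101\t2429193\n102\t3334444\n103\t7777888\n' > Telephone

cut -f2 Names > names.txt              # Anil, Ravi, Sunil
cut -f2 Telephone > numbers.txt        # 2429193, 3334444, 7777888

side_tab=$(paste names.txt numbers.txt)          # tab between the columns
side_bar=$(paste -d"|" names.txt numbers.txt)    # | between the columns
serial=$(paste -s names.txt numbers.txt)         # one line per file
printf '%s\n' "$side_tab" "$side_bar" "$serial"
```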
Table 4.7 Brief description of the options and arguments used in the split command
Option Description
-b n It splits the specified file into pieces that are n bytes in size.
-b nK It splits the specified file into pieces that are n kilobytes in size.
-b nM It splits the specified file into pieces that are n megabytes in size.
-l n It splits the specified file into pieces of n lines each (default option). The default value of n is 1000.
-n It is the same as -l n.
File_name It is the name of the file to be split.
dest_file It is the name of the file in which the split pieces will be stored. If the dest_file is, say, demo, the
split pieces will be stored in the files demoaa, demoab, demoac, and so on.
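The sketch below, assuming a POSIX shell, builds a 2500-line test file with awk, splits it into 1000-line pieces using demo as the dest_file prefix, and checks that the pieces join back into the original.

```shell
#!/bin/sh
# Sketch: split a 2500-line file into 1000-line pieces named demoaa, demoab, ...
work=$(mktemp -d) || exit 1
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1
awk 'BEGIN { for (i = 1; i <= 2500; i++) print i }' > big.txt   # test data

split -l 1000 big.txt demo             # demo is the dest_file prefix
pieces=$(echo demo*)                   # demoaa demoab demoac
last_lines=$(wc -l < demoac | tr -d ' ')
echo "pieces: $pieces (last piece has $last_lines lines)"
cat demoaa demoab demoac | cmp -s - big.txt && echo "pieces rejoin to the original"
```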
The options and arguments shown here are briefly explained in Table 4.8.
Table 4.8 Brief description of the options used in the sort command
Option Description
-n Sorts numerical values instead of ASCII, ignoring blanks and tabs
-r Sorts in reverse order
-f Sorts upper and lower case together, that is, ignores the difference in upper and lower case
-u Displays unique lines, that is, it eliminates duplicate lines in the output
filename Represents the file to be sorted
All lines in the file will be arranged in alphabetical (ASCII) order, comparing each line as a whole,
starting from its first character.
The other syntax for using the sort command is as follows:
Syntax sort +p1 -p2 filename
This limits the sort to the characters beginning from field p1 and ending at field p2. Fields are
counted from 0 in this syntax, so +p1 skips the first p1 fields. If p2 is omitted, then sorting will be
done on the basis of the characters beginning from field p1 till the end of the line. This is the
traditional form; the POSIX standard uses the -k option instead, where sort +p1 -p2 is written as
sort -k p1+1,p2 (for example, sort +2 -4 becomes sort -k 3,4), and current versions of sort may
not accept the traditional form.
Examples
(a) $ sort +2 -4 bnk.lst
This command skips the first two fields and uses the third and fourth fields for sorting
the file bnk.lst.
(b) $ sort +3 -4 bnk.lst
This command skips the first three fields and uses the fourth field for sorting the file
bnk.lst.
(c) $ sort +2 bnk.lst
This command skips the first two fields and uses the fields from the third one up to
the end of the line for sorting the file bnk.lst.
(d) $ sort -o bank.lst bnk.lst
This command sorts the file bnk.lst and stores the result in bank.lst.
(e) $ sort +0 -1 bnk.lst
This command sorts the file bnk.lst on the basis of the first field.
(f) $ sort +1 -4 bnk.lst
This command sorts the file bnk.lst from the second to the fourth fields.
(g) $ sort +2b bnk.lst
This command sorts the file bnk.lst on the third field after ignoring leading blank spaces.
The -f option is used to ignore the upper and lower case distinction.
(h) $ sort +2bf bnk.lst
This command sorts on the third field, ignoring leading blank spaces and treating upper
and lower case letters as equal.
The -n option is used for sorting the file on the basis of numerical values rather than
ASCII values.
(i) $ sort -n +2 -3 a.bat
The command sorts the file a.bat on the third field, on the assumption that it is a
numerical field.
The -r option is used for sorting a given file in reverse order.
(j) $ sort -r link.lst
The command will sort the file link.lst in the reverse order.
The -u option will eliminate duplicate lines in the sorted output.
(k) $ sort -nu +2 -3 a.bat
The command sorts the file a.bat numerically on the third field and, because -u is applied to this key, displays only one line for each distinct value of the third field.
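The +p1 -p2 notation used in these examples is the historical form. Current POSIX versions of sort replace it with -k start,end, where fields are numbered from 1, so +2 -3 becomes -k3,3 and +2 alone becomes -k3. Some current implementations may not accept the + form. The sketch below assumes a hypothetical four-field bnk.lst (account number, name, branch, balance) and sets LC_ALL=C so that comparisons follow plain ASCII order.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
export LC_ALL=C        # plain byte (ASCII) comparison

# Hypothetical bnk.lst: account number, name, branch, balance.
printf '%s\n' \
    '101 john ajmer 5000' \
    '102 Peter Delhi 750' \
    '103 troy ajmer 12000' \
    '104 anil Pune 750' > bnk.lst

sort -k3,3 bnk.lst        # like sort +2 -3: branch only, case-sensitive
sort -k3,3f bnk.lst       # like sort +2f -3: branch, case ignored
sort -n -k4,4 bnk.lst     # like sort -n +3 -4: numeric balance
sort -nu -k4,4 bnk.lst    # like sort -nu +3 -4: one line per balance
sort -r -k2 bnk.lst       # like sort -r +1: reverse, field 2 to end of line
```

In the -nu case, the two lines with a balance of 750 count as duplicates because only the key is compared, so three lines are displayed.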
When used without an option, the head command displays the first ten records (lines) of the specified file.
$ head -3 bnk.lst notes.txt
It will display the first three lines of both the files, bnk.lst and notes.txt, one after the other, each group preceded by a header line showing the filename.
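A small sketch of head with its default count and with two files. The files bnk.lst and notes.txt are generated placeholders, not the book's data; -n 3 is the POSIX spelling of -3.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Hypothetical files: 12 records and 4 notes.
i=1
while [ "$i" -le 12 ]; do
    echo "record $i"
    i=$((i + 1))
done > bnk.lst
printf 'note 1\nnote 2\nnote 3\nnote 4\n' > notes.txt

head bnk.lst                   # record 1 to record 10
head -n 3 bnk.lst notes.txt
# With several files, each group of lines is preceded by a header
# of the form "==> bnk.lst <==".
```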
All the differences found in the two files are displayed in a format consisting of two numbers
and a character in between. The number to the left of the character represents the line number
in the first file, and the number to the right of the character represents the line number in the
second file. The character can be any of the following:
1. d: delete
2. c: change
3. a: add
Example Assume we have two files, users.txt and customers.txt, with the following
content.
users.txt customers.txt
John John
Peter Charles
Troy Troy
$ diff users.txt customers.txt
2c2
< Peter
---
> Charles
Note: The < character precedes the lines from the first file and > precedes the lines from the second file.
This output indicates that the two files differ by only one line. It indicates that if the second
line, Peter, in the first file (users.txt) is changed to the second line, Charles, of the second
file (customers.txt), both files will be exactly the same.
To better understand the diff command, let us modify the content of the first file, users.txt,
as follows:
users.txt
John
Peter
Charles
Keeping the content of the file customers.txt the same as before, when we compare the two
files, we get the following output.
$ diff users.txt customers.txt
2d1
< Peter
3a3
> Troy
The output indicates that to make the file users.txt the same as customers.txt, we have to
delete the second line, Peter, from users.txt and add the third line of customers.txt, Troy,
after the third line of users.txt.
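The second example can be reproduced with the short script below, which also shows how diff reports its result through the exit status (0 when the files are identical, 1 when they differ, and 2 if an error occurs). This is useful in scripts. The file contents are the ones listed above.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# The modified users.txt and the original customers.txt from the text.
printf 'John\nPeter\nCharles\n' > users.txt
printf 'John\nCharles\nTroy\n' > customers.txt

diff users.txt customers.txt
# 2d1
# < Peter
# 3a3
# > Troy

# Only the exit status matters here, so the listing is discarded.
if diff users.txt customers.txt > /dev/null; then
    echo "the files are the same"
else
    echo "the files differ"
fi
```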
The related options and arguments are briefly explained in Table 4.10.
Table 4.10 Brief description of the options and arguments used in the cmp command
Option Description
-l It prints the byte number and the differing byte values in octal for each difference.
-s It displays nothing on the screen; the result is reported only through the exit status,
which can be any of the following:
0: If the two files are identical
1: If the two files are different
>1: If an error occurs while reading the files
file1 and file2 These are the files to be compared.
skip1 and skip2 These are the optional byte offsets from the beginning of file1 and file2, respectively,
at which we wish to begin the comparison. The offsets can be specified in
decimal, octal, or hexadecimal. The offsets in hexadecimal and octal formats have
to be preceded by ‘0x’ and ‘0’, respectively.
Example Consider we have two files, users.txt and customers.txt, with the following
content.
users.txt customers.txt
John John
Peter Charles
Troy Troy
The following are examples of commands that are used to compare the two files.
$ cmp users.txt customers.txt
The cmp command compares the files users.txt and customers.txt and displays the
following output.
users.txt customers.txt differ: byte 6, line 2
The output indicates that the first difference between the two files (users.txt and
customers.txt) occurs at byte 6, which lies on line 2.
The following example shows the list of byte locations and the differing byte values in
octal format for every difference found in the two files:
$cmp -l users.txt customers.txt
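The sketch below repeats the cmp comparison on the same two files and shows how the silent -s form is typically used in a script through its exit status. The exact wording of the message varies between systems (some print char instead of byte), so treat the comment as indicative only.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf 'John\nPeter\nTroy\n' > users.txt
printf 'John\nCharles\nTroy\n' > customers.txt

cmp users.txt customers.txt
# e.g. users.txt customers.txt differ: byte 6, line 2

# Skip the first 5 bytes ("John" plus its newline) of both files.
cmp users.txt customers.txt 5 5

# -s prints nothing; the answer is in the exit status.
cmp -s users.txt customers.txt
status=$?
echo "exit status: $status"    # 1 means the files differ
```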
The related options and arguments are briefly explained in Table 4.11.
Table 4.11 Brief description of the options and arguments used in the uniq command
Option Description
-c It precedes each line with a count of the number of occurrences.
-d It displays only the repeated (duplicate) lines in the input, one copy of each.
-u It displays only the lines that are not repeated in the input.
-f fields It ignores the first given number of fields on each input line.
-s char It ignores the first given number of characters of each input line. If this option is used along
with the -f option, the given number of characters following the skipped fields is ignored.
input_file It is the name of the file whose content we need to compare. Only adjacent lines are compared, so the input is usually sorted first.
output_file It is the name of the file where the output of the command will be stored. If no output file is
specified, the output will appear on the standard output.
$ uniq a.txt b.txt
This command removes adjacent duplicate lines in the file a.txt and saves the result in another file, b.txt.
Let us assume the file a.txt contains the following content:
a.txt
It may rain today
I am leaving now
It may rain today
Lovely weather
I am leaving now
The following is the command for removing all duplicate lines from a file.
$ sort a.txt | uniq
This command sorts and removes all the duplicate lines in the file a.txt and displays only
the unique lines on the screen. We get the following output.
I am leaving now
It may rain today
Lovely weather
The following command is used to display all the duplicate lines in a file, one copy of each.
$ sort a.txt | uniq -d
The following command is used to display each distinct line of a file preceded by the number of times it occurs.
$ sort a.txt | uniq -c
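The following sketch runs these commands on the a.txt shown above. It also shows why sort comes first: uniq only compares adjacent lines, so on the unsorted file nothing is removed. LC_ALL=C is set as an assumption so that the sort order is plain ASCII.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
export LC_ALL=C        # plain byte-order sorting

printf '%s\n' 'It may rain today' 'I am leaving now' 'It may rain today' \
    'Lovely weather' 'I am leaving now' > a.txt

# uniq compares only adjacent lines, so on the unsorted file
# all five lines survive.
uniq a.txt | wc -l

sort a.txt | uniq       # one copy of every distinct line
sort a.txt | uniq -d    # lines that occur more than once
sort a.txt | uniq -u    # lines that occur exactly once
sort a.txt | uniq -c    # every distinct line with its count
```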
Option Description
-1 It suppresses the display of the content that is unique to file1 (the first column). The content unique
to file2 and the content common to both files are still displayed.
-2 It suppresses the display of the content that is unique to file2 (the second column). The content unique
to file1 and the content common to both files are still displayed.
-3 It suppresses the display of the content that is common to both file1 and file2, that is, it
displays only the unique content in file1 and file2.
file1 and file2 These are the two files being compared. Both files should be sorted in the same order.
Note: When the comm command is executed without any options, the output will comprise three columns,
where the first column displays content unique to the first file, the second column displays content unique to
the second file, and the third column displays content common to both the files.
Examples Suppose we have two files, users.txt and customers.txt, with the following
content.
users.txt customers.txt
John John
Peter Charles
Troy Troy
(d) This example compares the aforementioned two files and suppresses the content that is
common to users.txt and customers.txt (Fig. 4.13).
$ comm -3 users.txt customers.txt
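comm assumes that both files are sorted. As listed above, customers.txt (John, Charles, Troy) is not in sorted order, so the sketch below compares sorted copies. The .sorted file names are hypothetical, and LC_ALL=C is set so that the sort order is plain ASCII.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
export LC_ALL=C

printf 'John\nPeter\nTroy\n' > users.txt
printf 'John\nCharles\nTroy\n' > customers.txt

# comm requires sorted input, so compare sorted copies.
sort users.txt > users.sorted
sort customers.txt > customers.sorted

comm users.sorted customers.sorted       # three columns
comm -3 users.sorted customers.sorted    # common lines hidden
comm -12 users.sorted customers.sorted   # only the common lines
```

In the -3 output, Peter (unique to users.txt) appears in the first column and Charles (unique to customers.txt) appears in the second, indented by one tab.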
The real time refers to the time elapsed from the invocation of the command till its
termination. The user time shows the CPU time spent by the command in executing its own code,
while sys indicates the CPU time spent by the Unix kernel on behalf of the command (for example, in executing its system calls).
(b) Let us see how much time it takes to store the recursive long listing of files and directories
sorted on modification time in a file.
$ time ls -ltR >k.out
real 0m0.04s
user 0m0.01s
sys 0m0.01s
Real time The real time represents the elapsed (wall-clock) time from the initiation of the
command to its termination, including any time spent waiting.
User time The user time represents the CPU time spent by the command in executing its own
code, that is, the code run in user mode. For small programs that take milliseconds to execute, this time is often
reported as 0.0.
Sys time The sys time is the amount of CPU time spent in the kernel for running the
command. It represents the CPU time spent in executing the system calls that are invoked by
the command within the kernel.
The time command can be used to isolate the commands that are time consuming so that
they can be run in the background. We will learn the process of executing the commands in
the background in Chapter 6.
Note: The combination of user and sys time is known as CPU time.
Here, -number specifies the screen size in lines. The default screen size is one line less than the
height of the terminal, that is, 23 lines on a 24-line terminal. +line_number shows the file from the given
line number. +/pattern/ shows the file from the first line that contains the given pattern. filename
specifies the name of the file, along with its path, that we wish to view page-wise.
The list of commands that can be given on execution of the pg command is briefly explained in Table 4.13.
Table 4.13 Brief description of the list of commands given on execution of the pg command
Command Description
h Displays help information
q or Q Quits the pg command
<blank> or <newline> Moves to the next page
$ Moves to the last page
f Skips the next page
/pattern Searches forward for the given pattern and displays it
?pattern Searches backward for the given pattern and displays it
Examples
(a) $ pg letter.txt
This command displays the file letter.txt one screen page at a time.
(b) $ pg -10 letter.txt
This command displays the content of the file letter.txt one screen page at a time, where
a page consists of 10 lines.
The options used in this command are briefly explained in Table 4.14.
Option Description
-d It is used for defining the printer destination, that is, the name of the printer on which we wish to print
the file(s).
-n It is used to define the number of copies to print. The valid range is from 1 to 100.
-P It is used to define the pages of a selected file that we wish to print. The page list contains the
page numbers and page range separated by commas (,). Examples: 1, 5, 9–11, 20.
-i It is used to specify the job ID of an existing print job whose options we wish to change. On giving the lp command,
the job ID assigned to the task is displayed.
-H It is used to control the printing job. The values used with this option are as follows:
1. Hold: Holds the printing job
2. Resume: Resumes the printing job
3. HH:MM: Holds the job till the specified time
4. Immediate: Prints the job immediately
-q It is used to set the priority of the print job. The valid values are from 1 (lowest
priority) to 100 (highest priority). The default priority value is 50.
The options and arguments used in this command are briefly explained in Table 4.15.
Table 4.15 Brief description of the options used in the cancel command
Option Description
id It indicates the print job ID that we wish to cancel.
printer_destination It removes all jobs from the specified printer destination.
Examples
(a) $cancel Deskjet1001
This command cancels all print jobs sent for printing at the Deskjet1001 printer.
(b) $cancel 1207
This command cancels the print job with ID 1207.
The most basic variables used in the .profile file to set up an environment for us are as follows:
1. The PATH variable defines the search path, that is, the list of directories in which the shell looks for the commands and
applications that we execute. Because of the PATH variable, commands and scripts can be executed from any directory,
not only from the directory in which the command or script resides.
2. The HOME variable ($HOME) holds the path of our home directory, the directory in which we begin our Unix session.
3. ENV holds the name of a file of shell commands that the shell (for example, the Korn shell) executes whenever an interactive shell starts.
We will learn about these variables in detail in Chapter 5.
Using any editor, we can add to the .profile file the commands that we wish to execute
automatically when we log in. Chapter 8 will help you use different editors. A new command
added to .profile comes into effect either when we log out and log in again or when we
run the .profile file at the command prompt through the following command:
$ . $HOME/.profile
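For example, lines like the following could be added to .profile to extend the search path. Here $HOME/bin is a hypothetical directory for our own scripts, and export makes the new value available to the commands started from the login shell.

```shell
# Append our own (hypothetical) script directory to the search path.
PATH="$PATH:$HOME/bin"
export PATH
```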
Example For this command to work, we need to create a file named calendar in our home
directory and write our appointments or reminders in it in the following format.
10/7/2012 Today is Board Meeting
10/8/2012 Visiting Doctor
Now, if today is 7 October 2012 and we execute the calendar command, the line Today is
Board Meeting will appear on the screen. Because calendar displays the reminders for both today
and tomorrow, the line Visiting Doctor will appear as well.
Note: To avoid executing the calendar command manually every day, add it at the end of the .profile file that we
just discussed.
The options and arguments used in the command are briefly explained in Table 4.16.
Table 4.16 Brief description of the options used in the script command
Option Description
-a It appends the session to the file filename. If this option is not specified, the file
will be overwritten with the new data.
filename This gives the name of the file where our session will be recorded. If we do not
provide a filename to the script command, it places its output in a default file
named typescript.
Example The following example will begin recording the session in the file transact.txt:
$ script transact.txt
To exit from the scripting session, either press Ctrl-d or type exit at the command prompt
and press the Enter key.
Figure 4.14(a) shows how the session is recorded in the file transact.txt. The commands
executed, cat, sort, mkdir, rmdir, etc., are recorded into the file transact.txt. To stop
recording, Ctrl-d keys are pressed. To confirm if the session is properly recorded in the file,
we execute the cat command to view the contents of the file transact.txt. Figure 4.14(b)
confirms that the session is correctly recorded in the file transact.txt.
Fig. 4.14 Recording a session (a) Recording in the file transact.txt (b) Recorded session
The options and arguments used in this command are briefly explained in Table 4.17.
Here, - (hyphen) displays the information without stopping; -k pattern searches the short
descriptions of all the commands documented in the man pages for the specified pattern, and displays the
list of matching commands.
Example $ man cp
This example displays the manual of the cp command. If the manual consists of several pages,
the first page will be displayed and we can press the spacebar to move on to the next page.
$ man -k backup
Fig. 4.15 Output of the man -k backup command before and after running catman -w
This example searches the documentation in the man pages and displays the list of commands
whose descriptions contain the pattern backup. If we get an error stating that the windex file is not found,
as shown in Fig. 4.15, we need to create the windex database by giving the catman -w command.
Once the windex database is created, man -k shows the manual entries that match the desired pattern.
Figure 4.15 shows the manual entries having the pattern backup.
Keys Description
Ctrl-h It erases the previous character (backspace).
Ctrl-c The Interrupt key terminates any currently running process and returns to the prompt.
Ctrl-d It represents the end of input (end of file). The keys are used to indicate that the entering of text is complete; given at the shell prompt, they end the shell session.
Ctrl-j It represents the Enter key.
Ctrl-s It suspends the output temporarily and is usually used to stop the scrolling of screen output.
Ctrl-q Its function is opposite to that of Ctrl-s. It resumes the scrolling of output.
Ctrl-z It temporarily suspends the running program and returns to the shell prompt. To resume the program, we use the jobs
command to find its job number and restart it with the fg command.
Ctrl-u It kills the command line, that is, clears the complete line.
Ctrl-\ It terminates the running command and creates a core file containing the memory image of the command.
We can change these default keys for erasing characters and killing a line through the stty
command. The stty command is discussed in detail in Chapter 10.
This chapter dealt with numerous advanced Unix commands. It covered the essential
commands for changing the permissions of files and directories, changing ownership and
groups, sharing files among groups, pipe operators, etc. In addition, the chapter covered
commands such as cut, paste, head, and tail that are used to extract desired regions from
given files. For comparing files, diff, cmp, uniq, and comm commands were discussed.
Commands for printing, measuring the time consumed in running certain commands,
showing calendar, recording sessions, and configuring the environment through .profile
have also been explained in detail.
■ SUMMARY ■
1. There are three classes of system users in Unix: Owner, Group, and Other. The read permission has a value = 4, the write permission has a value = 2, and the execute permission has a value = 1.
2. Unix assumes the default permissions of a directory to be 777 and that of a file to be 666, and masks out (removes) the permissions specified in the umask command to define their permissions at the time of their creation.
3. By default, each command takes its input from the standard input and sends the results to the standard output; however, through I/O redirection, we can change the default location of input and output.
4. The ‘>’ (greater than) symbol is known as the output redirection operator and we can use it to divert the output of any command to a file instead of the terminal screen. The append operator, ‘>>’, is used for appending the output of a command to a file, that is, without overwriting its older content. To redirect the standard input, we use the input redirection operator, that is, the ‘<’ (less than) symbol.
5. The pipe operator ‘|’ is used for sending the output of one command as the input to another command.
6. The difference between the pipe operator and the output redirection operator ‘>’ is that the output redirection operator ‘>’ is mostly used for sending the output of a command to a file, whereas the pipe operator is used for sending the output of a command to another command for further processing.
7. In DOS (or Windows) files, each line ends with both a carriage return and a line feed, whereas in Unix files each line ends only with the line feed character.
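The effect of umask can be checked with the sketch below. It shows that the mask switches permission bits off rather than subtracting digits: with a umask of 033, a new file gets 644 (666 with the write and execute bits of group and others cleared), not 633. The file and directory names are hypothetical.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

umask 022              # deny write permission to group and others
touch f1               # new files start from 666
mkdir d1               # new directories start from 777
ls -ld f1 d1           # -rw-r--r-- for f1, drwxr-xr-x for d1

umask 033              # deny write and execute to group and others
touch f2
mkdir d2
# The mask switches bits off, so 666 with 033 gives 644 (not 633).
ls -ld f2 d2           # -rw-r--r-- for f2, drwxr--r-- for d2
```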
■ FUNCTION SPECIFICATION ■
■ EXERCISES ■
Objective-type Questions
State True or False
4.1 The three classes of system users that are used in assigning permissions to the files and directories are Owner, Group, and Family.
4.2 To delete a file, a write permission is not required but an execute permission is required.
4.3 By using the umask command, we can specify the permissions that we want to deny.
4.4 The system-wide default permission for a directory is 666.
4.5 If we transfer the ownership of our file to another person, we can no longer change its file permissions.
4.6 Either the owner or the super user can change the ownership of a file.
4.7 We can make a group of users share permissions on a given set of files.
4.8 The sort command can sort the file on the basis of a given field in the file.
4.9 We cannot sort a file in the reverse order through the sort command.
4.10 The ‘<’ symbol is the output redirection operator and the ‘>’ symbol is the input redirection operator.
4.11 The ‘>>’ symbol redirects the output of a command to a file after overwriting its earlier content.
4.12 Several commands can be attached using the pipe operator.
4.13 The default number of lines into which the split command splits a file is 100 lines.
4.14 The cmp command compares two files and indicates the line number where the first difference in the file occurs.
4.15 The cmp command displays a message ‘exactly same’ if the files compared are exactly the same.
4.16 The comm command either displays or hides the content common to two files.
4.17 The time command displays the system time and even allows it to be modified.
4.18 The real time is the elapsed time from the invocation of the command till its termination.
4.19 The calendar command displays the calendar of a specified month and year.
Multiple-choice Questions
4.1 The command used for setting default permissions of files and directories is
(a) chmod (b) umask (c) default (d) chstat
4.2 The three types of system users are User, Group, and
(a) Other (b) Society (c) Community (d) Everyone
4.3 The option used with the chgrp command to change the group of a symbolic link is
(a) -s (b) -l (c) -g (d) -h
4.4 The command used for comparing two files is
(a) comp (b) compare (c) uniq (d) diff
4.5 The option of the uniq command that removes all duplicate lines is
(a) -d (b) -u (c) -r (d) -m
4.6 The command used to change the group of a file is
(a) groups (b) chmod (c) chgrp (d) ls -g
4.7 The statement $chown :accounts a.txt will change the
(a) group of the file
(b) owner of the file
(c) nothing
(d) owner and group of the file
4.8 The option used with the sort command to remove duplicate lines in a sorted output is
(a) -d (b) -q (c) -u (d) -n
4.9 The option used with the tail command that selects and displays lines in reverse order from the bottom to the top is
(a) -t (b) -r (c) -b (d) -c
4.10 The statement $ head -c 10 a.txt b.txt displays
(a) the first 10 lines of a.txt file only
(b) the first 10 lines of a.txt and b.txt files
(c) the first 10 characters of a.txt file only
(d) the first 10 characters of a.txt and b.txt files
Programming Exercises
4.1 What will the following commands do?
(a) $chmod 410 management.txt
(b) $umask 233
(c) $chgrp jobs mbacourse.txt
(d) $head -c 100 mbacourse.txt management.txt
(e) $tail -2 management.txt
(f) $man -K disk
(g) $cut -d"," -f3 bank.lst
(h) $paste -d"<>" names.txt numbers.txt
(i) $sort a.txt > b.txt
(j) $ split -5 numbers.txt temp
(k) $ cmp -s a.txt b.txt 3 5
(l) $ time ls | sort | lp
(m) $ lp -d Epson100 -P 10-15, 20 a.txt
(n) $ comm a.txt b.txt
4.2 Write the command for the following tasks:
(a) To assign read, write, and execute permissions to the owner; read and write permission to the group; and only read permission to others for the file mbacourse.txt
(b) To set permissions for the directories to be created in the future as read, write, and execute for the owner; read and write for the group; and only read for others
(c) To change the ownership of the file mbcourse.txt to charles
(d) To display the first two lines of the files mbacourse.txt and management.txt
(e) To display lines starting from the fifth till the end of the file in mbacourse.txt
(f) To show the content of the file finance.txt located in accounts directory page-wise
(g) To sort the file a.txt in reverse order and store it in file b.txt
(h) To cut the first and third fields of the file letter.txt that is delimited by a tab space
(i) To create a group by the following name: latestprojects
(j) To compare two files, accounts.txt and finance.txt, and show the changes that need to be made in the file accounts.txt to make it similar to finance.txt
(k) To display all duplicate lines in the file accounts.txt
(l) To remove all duplicate lines in the file accounts.txt and save it in another file correct.txt
(m) To split a file accounts.txt into the files accountaa, accountab, accountac, and so on, each consisting of 20 bytes
(n) To compare two files, a.txt and b.txt, and display the first character that is different in the two files
Review Questions
4.1 Explain the following commands with syntax and examples.
(a) pg (b) wc (c) dos2unix (d) tail
4.2 (a) What is the difference between the chown and chgrp commands?
(b) What is the difference between the cmp and diff commands?
4.3 (a) Explain the different options used in the lp command while printing a file.
(b) Explain how a file is sorted.
4.4 What is the difference between the following pairs of commands that are used for extracting content from the files?
(a) cut and split commands
(b) head and tail commands
4.5 Briefly explain how the file access permissions are handled in the Unix operating system.
Brain Teasers
4.1 Suppose you want to assign read, write, and execute permissions to the user, that is, the owner of the file a.txt using the following command. What is wrong with the following command? Correct the mistake.
$ chmod o=rwx a.txt
4.2 Correct the following command to change the owner and group of the file a.txt to user chirag and accounts respectively.
$ chown accounts:chirag a.txt
4.3 Correct the mistake in the following command in order to change the group of the symbolic file b.txt to accounts.
$ chgrp accounts b.txt
4.4 Can you sort the file a.txt on the second and third field skipping the first field? How?
4.5 The following command overwrites the content of the file a.txt. What command will you use to avoid the accidental overwriting of an existing file?
$ ls > a.txt
4.6 Correct the mistake in the following command to cut the first and third fields of the file a.txt delimited by the ‘|’ symbol.
$ cut -f1,3 a.txt
4.7 Is there a way to split a file a.txt into pieces that are 10 kB each? If yes, what is that?
4.8 When you compare two files, a.txt and b.txt, with the cmp command, no output appears on the screen. What does this mean?
4.9 Correct the mistake in the following command to suppress the display of the content that is common to the files a.txt and b.txt.
$ comm -1 a.txt b.txt
4.10 Correct the mistake in the following command in order to print two copies of the file a.txt.
$ lp a.txt -q 2
4.11 What will happen if you add the calendar command in the .profile file?
4.12 Correct the mistake in the following command to extract line numbers 10 to 15 from the file a.txt.
$ head -10 a.txt | tail +15
5
and Compression
Techniques
After studying this chapter, the reader will be conversant with the following:
• The types of devices, role of device drivers, and the way in which devices
are represented in the Unix operating system
• Using disk-related commands for copying disks, formatting disks, finding
disk usage, finding free disk space, and dividing the disk into partitions
• Compressing and uncompressing files using different commands such as
gzip, gunzip, zip, compress, uncompress, pack, unpack, bzip2,
bunzip2, and 7-zip
• The types of files, locating files, searching for files with a specific string, and
finding utility on a disk
• Checking a file system for corruption
• Important files of the Unix system, where and how passwords are kept,
where the list of hosts is kept, and how to allow or deny any user from
accessing certain resources
Regular file The file letter.txt that is represented by a hyphen (-) in the mode field is a
regular file. This is the simplest and most common type of file in the Unix system. It is just
a collection of bytes.
Directory The file projects that is represented by character d in the mode field is a
directory, that is, a container of files and other directories.
Symbolic link The file xyz.txt that is represented by character l in the mode field is a
symbolic link that refers to another file or directory in the file system.
Named pipe The file pipe that is represented by character p in the mode field is a named
pipe and is used in interprocess communication, that is, sending the output of one process as
input to another process.
Socket The file log that is represented by character s in the mode field is a socket and is a
special file used for advanced interprocess communication.
Special device file The files lp0 and cd0 represented by the characters c and b in the mode
field are special device files. They may be either character device files or block device files.
Now we can understand how the device files can be recognized through the long listing of
files and directories.
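The test command has an operator for each of these file types, so a script can identify a file's type without relying only on the ls listing. The sketch below uses a hypothetical function file_type and checks for symbolic links first, because -d, -f, and the other tests follow links. It creates sample files named after the ones in the long listing above and prints the first character of each mode field next to the detected type.

```shell
#!/bin/sh
# Print a description of a file's type (hypothetical helper).
file_type() {
    if [ -L "$1" ]; then echo "symbolic link"
    elif [ -d "$1" ]; then echo "directory"
    elif [ -p "$1" ]; then echo "named pipe"
    elif [ -S "$1" ]; then echo "socket"
    elif [ -c "$1" ]; then echo "character device"
    elif [ -b "$1" ]; then echo "block device"
    elif [ -f "$1" ]; then echo "regular file"
    else echo "unknown"
    fi
}

cd "$(mktemp -d)" || exit 1
: > letter.txt              # regular file
mkdir projects              # directory
ln -s letter.txt xyz.txt    # symbolic link
mkfifo pipe                 # named pipe

for f in letter.txt projects xyz.txt pipe /dev/null; do
    # First character of the mode field, as shown by ls -l
    mode=$(ls -ld "$f" | cut -c1)
    printf '%s  %-12s %s\n' "$mode" "$f" "$(file_type "$f")"
done
```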
Next, we will see how the Unix operating system manages and deals with all the devices
of a computer system.
while reading a block device, the data is first read in blocks and stored in the buffer
cache, so that when the same data is required again, it is read from the buffer cache instead of
from the device. Similarly, while writing on a block device, the data is first stored
in the buffer cache before being written on the device. Block devices enable random access.
In other words, the data can be accessed from these devices in any order. Examples of
block devices include hard disks, CD-ROM drives, and flash drives.
The character devices (or raw devices) are those that can be accessed directly, bypassing
the operating system’s buffer cache. This means that the data is read from or written to these
devices directly without being stored in the buffer cache. In addition, the name ‘character
device’ itself signifies that the data from such a device is accessed one character at a time.
Data is accessed from a character device sequentially (not randomly) in the form of a stream
of characters. Examples of character devices include serial ports, mice, keyboards, virtual
terminals, and printers.
For a device file, the first character of the mode field in the long listing is c or b. Refer to the long listing
shown in Section 5.2.1, where the floppy drive, CD-ROM, and the hard disk have b prefixed to
their permissions, confirming that they are block devices. Similarly, printers, raw floppy drives,
and tape drives have c prefixed to their permissions, which confirms that they are character (raw) devices.
displaying usage of disk space, that is, the space used by different files and directories of
the disk, the amount of free disk space in all the file systems on our machine, the amount of
free disk space in terms of megabytes (MB) and percentage, and dividing the disk drive into
different partitions. Let us see how disks are copied.
Table 5.3 shows the common options used with the du command.
Here, du reports the number of blocks used by the current directory (denoted by .) and
those used by subdirectories within the current directory.
(b) The number of blocks used by the etc directory and its subdirectories is displayed
using the following command.
$ du /etc
54 /etc/defaults
2 /etc/X11
8 /etc/bluetooth
4 /etc/devd
These blocks (to the left of each directory) are 512 bytes in size.
(c) To display the blocks used by the subdirectories of the etc directory in units of 1024 bytes,
we will use the following command.
$ du -k /etc
27 /etc/defaults
1 /etc/X11
4 /etc/bluetooth
2 /etc/devd
(g) To ascertain the number of blocks used by specific files, we can use the following
command.
$ du -s *.txt
10 abc.txt
7 pqr.txt
11 xyz.txt
This output shows the number of blocks used by the different files with extension .txt.
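Note: The du command used here displays information in terms of 512-byte blocks, independent of the actual disk block size. GNU du, by contrast, reports 1024-byte blocks by default unless the POSIXLY_CORRECT environment variable is set.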
Table 5.4 shows the common options used with the df command.
Examples
(a) If we want information regarding the free and available disk space of a particular
file system, we can mention it in the df command. Alternatively, we can use the df
command without any option or file system (as shown here) in order to obtain information
about all the file systems mounted on our machine.
$ df
Filesystem 1K-blocks Used Avail Capacity Mounted on
/dev/ad0s1a 507630 165380 301640 35% /
devfs 1 1 0 100% /dev
/dev/ad0s1e 507630 12 467008 0% /tmp
/dev/ad0s1f 73138272 3616480 63670732 5% /usr
/dev/ad0s1d 1185230 2050 1088362 0% /var
The first column displays the different partitions on the disk of our system. The
second column displays the size of each partition in blocks of 1 KB. For example, the
first partition, ad0s1a, is 507630 KB (about 507 MB) in size. Of these 507630 KB,
165380 KB is used and 301640 KB is available, as shown in the third and fourth
columns, respectively. The fifth column shows the used (consumed) percentage of the
partition. The last column indicates where the partition is attached (mounted) in the
Unix file system. For example, the partition ad0s1a (shown in the first row) is the root
partition and hence is shown as mounted on /.
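The Capacity column can be checked by hand. In the first row, Used + Avail = 165380 + 301640 = 467020 KB, which is less than the partition size of 507630 KB; the missing 40610 KB (8%) is most likely the space that FreeBSD's UFS reserves for the superuser by default. Capacity is therefore Used / (Used + Avail) = 165380 / 467020, about 35.4%, shown as 35%. The hypothetical function df_capacity below repeats this calculation with awk; rounding to the nearest whole percent is an assumption that fits this output (GNU df rounds up instead).

```shell
#!/bin/sh
# Hypothetical helper: recompute df's Capacity column from Used and Avail.
# Capacity = Used / (Used + Avail), rounded to the nearest whole percent.
df_capacity() {
    awk -v used="$1" -v avail="$2" 'BEGIN {
        total = used + avail
        if (total == 0) { print "0%"; exit }   # avoid dividing by zero
        printf "%.0f%%\n", used * 100 / total
    }'
}

df_capacity 165380 301640      # root (/)  -> 35%
df_capacity 3616480 63670732   # /usr      -> 5%
```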
(b) To know the amount of free space in a particular partition, we can specify the
partition (or its mount point) in the df command. For example, to find the amount of
free disk space in the root partition, we give the following command.
$ df /
Filesystem 1K-blocks Used Avail Capacity Mounted on
/dev/ad0s1a 507630 165380 301640 35% /
This output shows the total size of the root partition in terms of KB, the amount of used
space, free space, and percentage of disk space used.
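In scripts, df output is often filtered to spot file systems that are nearly full. The sketch below assumes a df that supports the POSIX -P option (one line per file system, with Capacity in the fifth field and the mount point in the sixth); the function name check_usage and its 90% default threshold are hypothetical choices, and mount points containing spaces would need extra handling.

```shell
#!/bin/sh
# Hypothetical helper: warn about file systems whose Capacity exceeds a limit.
# df -P prints: Filesystem, blocks, Used, Available, Capacity, Mounted on.
check_usage() {
    limit=${1:-90}
    df -P | awk -v limit="$limit" 'NR > 1 {
        pct = $5
        sub(/%/, "", pct)            # "35%" -> "35"
        if (pct + 0 > limit)
            printf "WARNING: %s (%s) is %s%% full\n", $6, $1, pct
    }'
}

check_usage 90    # prints nothing unless a file system is over 90% full
```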
(c) To make the sizes of the partitions easier to read, we use the -h option, which
displays them in human-readable form.
$ df -h
Filesystem Size Used Avail Capacity Mounted on
/dev/ad0s1a 506 MB 164 MB 300 MB 35% /
devfs 1 KB 1 KB 0 100% /dev
/dev/ad0s1e 506 MB 0 506 MB 0% /tmp
/dev/ad0s1f 71 GB 5.4 GB 62 GB 5% /usr
/dev/ad0s1d 1 GB 1.8 MB 1 GB 0% /var
In this output, each size is displayed in a unit (KB, MB, or GB) chosen to keep the number
short; each unit is obtained from the previous one by dividing by 1024.
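The conversion behind -h is repeated division by 1024: kilobytes become megabytes, then gigabytes. A minimal sketch with a hypothetical function human_kb follows; implementations of df -h round differently, so its digits need not match the sample output above exactly.

```shell
#!/bin/sh
# Hypothetical helper: turn a size in KB into a human-readable string by
# dividing by 1024 until the value drops below 1024 (or the units run out).
human_kb() {
    awk -v size="$1" 'BEGIN {
        n = split("KB MB GB TB", unit, " ")
        i = 1
        while (size >= 1024 && i < n) { size /= 1024; i++ }
        printf "%.1f %s\n", size, unit[i]
    }'
}

human_kb 2048        # 2.0 MB
human_kb 73138272    # size of /usr from the df output, expressed in GB
```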
(d) The -k option of the df command displays the size of the file systems in kilobytes, as
shown in the following example.
$ df -k
Filesystem KBytes Used Avail Capacity Mounted on
/dev/ad0s1a 518144 167936 307200 35% /
devfs 1 1 0 100% /dev
/dev/ad0s1e 518144 0 518144 0% /tmp
/dev/ad0s1f 74448896 5662310 65011712 5% /usr
/dev/ad0s1d 1048576 18432 1030144 0% /var
(e) The -e option of the df command displays the number of free files (inodes) on each file
system, that is, how many more files can be created, as shown in the following example.
$ df -e
Filesystem ifree
/dev/ad0s1a 34596
devfs 0
/dev/ad0s1e 7483620
/dev/ad0s1f 8402
/dev/ad0s1d 56129
This output shows the number of free files (inodes) on each of the file systems.
File Management and Compression Techniques 103
Here, filesystem is the file system whose free disk space we want to find. If no file system
is specified, all file systems on the disk are displayed along with the available disk space
on each of them.
Example $ /etc/dfspace
: Disk Space: 6.32 MB of 137.74 MB available (4.59 %)
Total disk Space: 10.50 MB of 200 MB available (3.89%)
In the aforementioned example, we have written /etc/dfspace instead of dfspace, because
the dfspace command resides in the /etc directory, which is usually not in the search path. The
output reports free disk space for the root file system. If other file systems had been mounted,
their free space would also have been reported. The command also reports the total disk space available.
It is to be noted that the df and dfspace commands report the disk space available in the file
system as a whole, whereas du reports the disk space used by specified files and directories.
When the fdisk command is active, it displays a menu of options that we can use to create,
list, display, and delete partitions. Table 5.6 gives the menu options of the fdisk command.
Table 5.6 Menu options of the fdisk command
Option   Description
d        Deletes a partition
l        Lists the known partition types
m        Displays this menu
n        Creates a new partition
p        Prints the partition table
q        Quits without saving changes
w        Writes the partition table to the disk and exits
We can create a primary partition with one file system on it, or an extended partition with
multiple logical drives in it.
Example $ fdisk -l
This command lists the partition information of the disk drive on our computer system as
given here:
Disk /dev/hda: 64 heads, 63 sectors, 1023 cylinders
Units = cylinders of 4032 * 512 bytes
The first hard disk as a whole is represented as /dev/hda, while individual partitions on this
disk take on the names hda1, hda2, and so forth. Here, hda1 is a primary partition, and hda2 is an
extended partition containing the logical partition hda5 (logical partitions are numbered from 5
upward, because numbers 1 to 4 are reserved for primary and extended partitions). The active
partition is indicated by an * in the second column. The second hard disk has the name /dev/hdb,
with similar numeric extensions.
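The naming scheme can be decoded mechanically: strip /dev/ and the trailing digits to get the disk, then use the number to tell primary or extended partitions (1 to 4 on an MBR disk) from logical partitions (5 and above). The function describe_partition below is a hypothetical sketch of that rule for traditional IDE names such as /dev/hda5; it only parses the name and does not inspect any real disk.

```shell
#!/bin/sh
# Hypothetical helper: explain an IDE/MBR device name such as /dev/hda5.
describe_partition() {
    dev=${1#/dev/}             # hda5
    num=${dev##*[!0-9]}        # trailing digits: 5 (empty for a whole disk)
    disk=${dev%"$num"}         # hda
    if [ -z "$num" ]; then
        echo "$dev: whole disk"
    elif [ "$num" -le 4 ]; then
        echo "$dev: primary or extended partition $num on /dev/$disk"
    else
        echo "$dev: logical partition $num on /dev/$disk"
    fi
}

describe_partition /dev/hda     # hda: whole disk
describe_partition /dev/hda1    # hda1: primary or extended partition 1 on /dev/hda
describe_partition /dev/hdb5    # hdb5: logical partition 5 on /dev/hdb
```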
Figure 5.2 shows two files, names.txt and numbers.txt, with the initial content that we wish
to compress.
Examples
(a) $ gzip -c names.txt
This command does not replace the file names.txt with a compressed file; instead, it writes the
compressed output to the standard output, which here is the screen (refer to Fig. 5.2).
(b) $ gzip names.txt
The file names.txt is compressed and replaced by names.txt.gz, as confirmed using the
ls command (refer to Fig. 5.2).
(c) $ cat names.txt.gz
The command shows the compressed content of the file names.txt. We can see (refer to
Fig. 5.2) that the output displayed using the -c option matches the output of this example.
(d) $ gzip numbers.txt
The file numbers.txt is also compressed into the file numbers.txt.gz and is confirmed
using the ls command, which is shown in Fig. 5.2.
(e) $ gzip -l *.gz
This command lists information about the compressed files names.txt.gz and numbers.txt.gz,
as shown in Fig. 5.2: the compressed size, uncompressed size, compression ratio, and the
name of the uncompressed file.
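The same sequence can be reproduced in a scratch directory. This is a sketch assuming GNU gzip; the directory comes from mktemp -d, and names.copy.gz is a hypothetical file name. Because gzip -c writes binary data, its output is normally redirected to a file rather than displayed on the terminal as in Fig. 5.2.

```shell
#!/bin/sh
# Reproduce the gzip examples in a temporary directory.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf 'Anil\nRavi\nSunil\nChirag\nRaju\n' > names.txt

gzip -c names.txt > names.copy.gz   # -c: compress to stdout; names.txt is kept
gzip names.txt                      # names.txt is replaced by names.txt.gz
ls                                  # lists names.copy.gz and names.txt.gz
gzip -l names.txt.gz                # compressed/uncompressed sizes and ratio
gzip -d names.txt.gz                # restores names.txt and removes the .gz
cat names.txt
```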
$ ls n*
names.txt numbers.txt
$ cat names.txt
Anil
Ravi
Sunil
Chirag
Raju
$ cat numbers.txt
2429193
3334444
7777888
9990000
5555111
$ gzip -c names.txt
[binary compressed data displayed on the screen]
$ gzip names.txt
$ ls n*
names.txt.gz numbers.txt
$ cat names.txt.gz
[binary compressed data displayed on the screen]
$ gzip numbers.txt
$ ls n*
names.txt.gz numbers.txt.gz
$ gzip -l *.gz
compressed uncompressed ratio uncompressed_name
54 28 7.1% names.txt
63 40 17.5% numbers.txt
117 68 -27.9% <totals>
$ gzip -d names.txt.gz
$ ls n*
names.txt numbers.txt.gz
$ cat >numbers.txt
12345
$ ls n*
names.txt numbers.txt numbers.txt.gz
$ gzip -d numbers.txt.gz
gzip: numbers.txt already exists; do you wish to overwrite (y or n)? n
not overwritten
$ gzip -df numbers.txt.gz
$ ls n*
names.txt numbers.txt
Fig. 5.2 Compression and uncompression of files names.txt and numbers.txt using
the gzip command
$ ls n*
names.txt.gz numbers.txt.gz
$ gunzip -l *.gz
compressed uncompressed ratio uncompressed_name
54 28 7.1% names.txt
63 40 17.5% numbers.txt
117 68 -27.9% <totals>
$ gunzip -c names.txt.gz
Anil
Ravi
Sunil
Chirag
Raju
$ gunzip -c names.txt.gz numbers.txt.gz
Anil
Ravi
Sunil
Chirag
Raju
2429193
3334444
7777888
9990000
5555111
$ gunzip names.txt.gz
$ ls n*
names.txt numbers.txt.gz
$ cat >numbers.txt
12345
$ ls n*
names.txt numbers.txt numbers.txt.gz
$ gunzip numbers.txt.gz
gzip: numbers.txt already exists; do you wish to overwrite (y or n)? n
not overwritten
$ gunzip -f numbers.txt.gz
$ ls n*
names.txt numbers.txt
Fig. 5.3 Uncompression of files names.txt and numbers.txt using the gunzip command
(d) $ gunzip names.txt.gz
The file names.txt.gz is uncompressed and renamed names.txt. This is confirmed using
the ls command (refer to Fig. 5.3).
(e) $ gunzip numbers.txt.gz
The compressed file numbers.txt.gz would normally be uncompressed to numbers.txt.
However, as the file numbers.txt already exists, the following warning message is displayed:
gzip: numbers.txt already exists; do you wish to overwrite (y or n)?
(f) $ gunzip -f numbers.txt.gz
The -f option forces decompression and hence overwrites the existing file
numbers.txt without displaying any warning message. We can see the uncompressed
files names.txt and numbers.txt in Fig. 5.3.
Note: When we uncompress a file, the compressed file is automatically deleted from the system, unless the -c option is used to write the uncompressed data to the standard output.
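The overwrite behaviour in examples (e) and (f) can be checked non-interactively. In this sketch (assuming GNU gzip, with a scratch directory from mktemp -d), standard input is redirected from /dev/null so gunzip cannot prompt; it reports that numbers.txt was not overwritten and leaves both files alone, while -f replaces the file and deletes the .gz.

```shell
#!/bin/sh
# Show how gunzip treats an existing output file, with and without -f.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf '2429193\n3334444\n' > numbers.txt
gzip numbers.txt                 # only numbers.txt.gz remains
echo 12345 > numbers.txt         # a new file with the same name

# Without -f, gunzip refuses to replace numbers.txt; with stdin from
# /dev/null it cannot ask, so it just reports "not overwritten".
gunzip numbers.txt.gz < /dev/null || echo "gunzip kept the existing file"
cat numbers.txt                  # still 12345

gunzip -f numbers.txt.gz         # -f overwrites and removes the .gz
cat numbers.txt                  # the original two lines
```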
Examples
(a) $ zip abc *
All the files in the current directory are compressed into a single file abc.zip.
Note: The gzip command compresses each file into a separate .gz file, whereas the zip command can pack
multiple files into a single archive.
A range of filenames can be given using wild cards. As the zip command compresses
the files, the progress will be reported on the screen. When we compress these files, the
original files remain unchanged.
(b) If we wish to add files that we forgot to include in the zip file, we use the -g (grow)
option.
$ zip -g abc a.txt
This example adds the file a.txt to an existing zip file abc.zip.
(c) The -F option fixes a damaged zip file.
$ zip -F abc --out pqr
This example fixes the zip file abc.zip, if it is damaged, and writes the fixed version to
another zip file, pqr.zip.
(d) The following example compresses the files with extension .txt from the current
directory in quiet mode, that is, without displaying any response on the screen.
$ zip -q abc *.txt
(e) To compress the files of subdirectories as well, we use the -r option.
$ zip -r abc projects
This example compresses all the files in the projects directory as well as in its
subdirectories and saves them in the abc.zip file.
The execution of the aforementioned commands is shown in Fig. 5.4.
$ ls -l
total 12
-rw-r--r-- 1 root root 18 Feb 22 14:51 customers.txt
--w-r-x--x 1 root root 6 Feb 22 14:51 letter.txt
-r-----r-- 1 root root 113 Feb 22 14:51 matter.txt
drwxr-xr-x 2 root root 512 Feb 22 14:53 projects
-rw-r--r-- 1 root root 892 Feb 22 14:51 transact.txt
-rw-r--r-- 1 root root 16 Feb 22 14:51 users.txt
$ zip abc *
adding: customers.txt (stored 0%)
adding: letter.txt (stored 0%)
adding: matter.txt (deflated 21%)
adding: projects/ (stored 0%)
adding: transact.txt (deflated 64%)
adding: users.txt (stored 0%)
$ ls -l
total 16
-rw-r--r-- 1 root root 1370 Feb 22 14:55 abc.zip
-rw-r--r-- 1 root root 18 Feb 22 14:51 customers.txt
--w-r-x--x 1 root root 6 Feb 22 14:51 letter.txt
-r-----r-- 1 root root 113 Feb 22 14:51 matter.txt
drwxr-xr-x 2 root root 512 Feb 22 14:53 projects
-rw-r--r-- 1 root root 892 Feb 22 14:51 transact.txt
-rw-r--r-- 1 root root 16 Feb 22 14:51 users.txt
$ ls -l
total 158
-rw-r--r-- 1 root root 8 Feb 22 14:56 a.txt
-rw-r--r-- 1 root root 1516 Feb 22 14:56 abc.zip
-rw-r--r-- 1 root root 18 Feb 22 14:51 customers.txt
--w-r-x--x 1 root root 6 Feb 22 14:51 letter.txt
-r-----r-- 1 root root 113 Feb 22 14:51 matter.txt
-rw-r--r-- 1 root root 1516 Feb 22 14:58 pqr.zip
drwxr-xr-x 2 root root 512 Feb 22 14:53 projects
-rw-r--r-- 1 root root 892 Feb 22 14:51 transact.txt
-rw-r--r-- 1 root root 16 Feb 22 14:51 users.txt
$ ls -l
total 4
-rw-r--r-- 1 root root 1869 Feb 22 15:01 abc.zip
$ unzip abc
Archive: abc.zip
extracting: customers.txt
extracting: letter.txt
inflating: matter.txt
creating: projects/
inflating: transact.txt
extracting: users.txt
extracting: a.txt
inflating: projects/bank.lst
$ ls -l
total 18
-rw-r--r-- 1 root root 8 Feb 22 14:56 a.txt
-rw-r--r-- 1 root root 1869 Feb 22 15:01 abc.zip
-rw-r--r-- 1 root root 18 Feb 22 14:51 customers.txt
--w-r-x--x 1 root root 6 Feb 22 14:51 letter.txt
-r-----r-- 1 root root 113 Feb 22 14:51 matter.txt
drwxr-xr-x 2 root root 512 Feb 22 14:53 projects
-rw-r--r-- 1 root root 892 Feb 22 14:51 transact.txt
-rw-r--r-- 1 root root 16 Feb 22 14:51 users.txt
$ ls projects
bank.lst
$ unzip -d temp abc
Archive: abc.zip
extracting: temp/customers.txt
extracting: temp/letter.txt
inflating: temp/matter.txt
creating: temp/projects/
inflating: temp/transact.txt
extracting: temp/users.txt
extracting: temp/a.txt
inflating: temp/projects/bank.lst
$ ls -l
total 152
-rw-r--r-- 1 root root 8 Feb 22 14:56 a.txt
-rw-r--r-- 1 root root 1869 Feb 22 15:01 abc.zip
-rw-r--r-- 1 root root 18 Feb 22 14:51 customers.txt
--w-r-x--x 1 root root 6 Feb 22 14:51 letter.txt
-r-----r-- 1 root root 113 Feb 22 14:51 matter.txt
drwxr-xr-x 2 root root 512 Feb 22 14:53 projects
drwxr-xr-x 3 root root 512 Feb 22 15:50 temp
-rw-r--r-- 1 root root 892 Feb 22 14:51 transact.txt
-rw-r--r-- 1 root root 16 Feb 22 14:51 users.txt
$ unzip -p abc
John
Charles
Troy
hello
Hello this is testing of cut command
I think it is working as per the expected
result. it is going to rain today
$ unzip -t abc
Archive: abc.zip
testing: customers.txt OK
testing: letter.txt OK
testing: matter.txt OK
testing: projects/ OK
testing: transact.txt OK
testing: users.txt OK
testing: a.txt OK
testing: projects/bank.lst OK
No errors detected in compressed data of abc.zip.
$ unzip -l abc
Archive: abc.zip
Length Date Time Name
----------- --------- ----- -----
18 02-22-2006 14:51 customers.txt
6 02-22-2006 14:51 letter.txt
113 02-22-2006 14:51 matter.txt
0 02-22-2006 14:53 projects/
892 02-22-2006 14:51 transact.txt
16 02-22-2006 14:51 users.txt
8 02-22-2006 14:56 a.txt
347 02-22-2006 14:53 projects/bank.lst
-------- --------
1400 8 files
$ ls -l
total 360
-rw-r--r-- 1 root root 8 Feb 22 14:56 a.txt
-rw-r--r-- 1 root root 1869 Feb 22 15:01 abc.zip
-rw-r--r-- 1 root root 18 Feb 22 14:51 customers.txt
--w-r-x--x 1 root root 6 Feb 22 14:51 letter.txt
-r-----r-- 1 root root 113 Feb 22 14:51 matter.txt
drwxr-xr-x 2 root root 512 Feb 22 14:53 projects
drwxr-xr-x 3 root root 512 Feb 22 15:50 temp
-rw-r--r-- 1 root root 892 Feb 22 14:51 transact.txt
-rw-r--r-- 1 root root 16 Feb 22 14:51 users.txt
$ rm a.txt
$ rm matter.txt
$ unzip -f abc
Archive: abc.zip
$ ls -l
total 356
-rw-r--r-- 1 root root 1869 Feb 22 15:01 abc.zip
-rw-r--r-- 1 root root 18 Feb 22 14:51 customers.txt
--w-r-x--x 1 root root 6 Feb 22 14:51 letter.txt
drwxr-xr-x 2 root root 512 Feb 22 14:53 projects
drwxr-xr-x 3 root root 512 Feb 22 15:50 temp
-rw-r--r-- 1 root root 892 Feb 22 14:51 transact.txt
-rw-r--r-- 1 root root 16 Feb 22 14:51 users.txt
Examples
(a) $ compress transact.txt
This example compresses the file transact.txt and renames it transact.txt.Z.
Note: The original file is replaced by a file with the same name plus a .Z extension
(i.e., transact.txt is replaced by the file transact.txt.Z).
$ ls -l transact*
-rw-r--r-- 1 root root 892 Feb 22 14:51 transact.txt
$ compress transact.txt
$ ls -l transact*
-rw-r--r-- 1 root root 551 Feb 22 14:51 transact.txt.Z
$ compress -c customers.txt
[binary compressed data displayed on the screen]
$ cat > transact.txt
testing
^D
$ compress transact.txt
[compress asks for confirmation before overwriting transact.txt.Z]
$ compress -f transact.txt
$ ls -l transact*
-rw-r--r-- 1 root root 12 Feb 22 16:51 transact.txt.Z
(c) $ compress -f transact.txt
This example overwrites the existing file transact.txt.Z with the compressed version of the file
transact.txt without asking for confirmation. If the -f option is not used, the compress command
asks for confirmation before overwriting an existing file.
(d) $ compress -v matter.txt
The -v (verbose) option reports how much compression was achieved, as shown in the
following output.
matter.txt: Compression: 12.38% -- replaced with matter.txt.Z
The output of these commands is given as a screenshot in Fig. 5.6.
-f It forces the file to be packed. If not much compression is possible, the pack command
normally refuses to pack the file; the -f option packs the file into a .z file even if there is
little saving.
file It represents the file we wish to pack.
Examples
(a) $ pack a.png
pack: a.png: 0.3% compression
The compressed file is stored as a.png.z and the original file is deleted. The degree of
compression achieved by pack is generally low, and here it is especially small because a PNG
image is already compressed.
(b) $ pack -f matter.txt
This packs the file matter.txt into matter.txt.z forcefully; that is, even when not much
compression is possible, the file is still packed into matter.txt.z.
To view the contents of a packed file, we use the pcat command.
Syntax pcat file_name.z
Here, file_name.z represents the packed file whose content we wish to see.
Example $ pcat matter.txt.z
This example will display the content of the packed file matter.txt.z without unpacking it.
Figure 5.8 shows the output of the aforementioned commands.
$ ls -l transact*
-rw-r--r-- 1 root root 551 Feb 22 17:04 transact.txt.Z
$ uncompress transact.txt.Z
$ ls -l transact*
-rw-r--r-- 1 root root 892 Feb 22 17:04 transact.txt
$ ls -l matter*
-r-----r-- 1 root root 99 Feb 22 17:08 matter.txt.Z
$ uncompress -c matter.txt.Z
Hello this is testing of cut command
I think it is working as per the expected
result. it is going to rain today
$ cat > matter.txt
Hello
^D
$ uncompress matter.txt.Z
matter.txt already exists; do you wish to overwrite matter.txt (yes or no)? n
not overwritten
$ uncompress -f matter.txt.Z
$ ls -l matter*
-r-----r-- 1 root root 113 Feb 22 17:08 matter.txt
$ compress matter.txt
$ ls -l matter*
-r-----r-- 1 root root 99 Feb 22 17:08 matter.txt.Z
$ zcat matter.txt.Z
Hello this is testing of cut command
I think it is working as per the expected
result. it is going to rain today
$ ls -l a*
-rw-r--r-- 1 root root 34878 Feb 22 17:28 a.png
$ pack a.png
pack: a.png: 0.3% Compression
$ ls -l a*
-rw-r--r-- 1 root root 34779 Feb 22 17:28 a.png.z
$ ls -l matter*
-r-----r-- 1 root root 113 Feb 22 17:08 matter.txt
$ pack matter.txt
pack: matter.txt: no saving - file unchanged
$ pack -f matter.txt
pack: matter.txt: 11.5% Compression
$ ls -l matter*
-r-----r-- 1 root root 100 Feb 22 17:08 matter.txt.z
$ pcat matter.txt.z
Hello this is testing of cut command
I think it is working as per the expected
result. it is going to rain today
Here, file_name.z is the packed file; unpack restores the original file by removing the .z extension.
Example $ unpack matter.txt.z
The packed file matter.txt.z will be unpacked to the file matter.txt as shown in Fig. 5.9.
$ ls -l matter*
-r-----r-- 1 root root 100 Feb 22 17:08 matter.txt.z
$ unpack matter.txt.z
unpack: matter.txt: unpacked
$ ls -l matter*
-r-----r-- 1 root root 113 Feb 22 17:08 matter.txt
$ ls n*
names.txt numbers.txt
$ cat names.txt
Anil
Ravi
Sunil
Raju
$ cat numbers.txt
2429193
3334444
7777888
9990000
5555111
$ bzip2 names.txt
$ ls n*
names.txt.bz2 numbers.txt
$ cat names.txt.bz2
[binary compressed data displayed on the screen]
$ bzip2 -v numbers.txt
numbers.txt: 0.635:1, 12.600 bits/byte, -57.50% saved, 40 in, 63 out.
$ ls n*
names.txt.bz2 numbers.txt.bz2
Fig. 5.10 Compression and uncompression of files names.txt and numbers.txt using the
bzip2 command (Contd)
$ ls n*
names.txt numbers.txt
$ bzip2 -k names.txt
$ ls n*
names.txt names.txt.bz2 numbers.txt
$ bzip2 -d names.txt.bz2
bzip2: Output file names.txt already exists.
$ ls n*
names.txt numbers.txt
The file numbers.txt.bz2 is uncompressed into the file numbers.txt (that is, the file
numbers.txt.bz2 is deleted).
(b) The following example displays the list of files compressed in the file data.7z.
$ 7z l data.7z
(c) The following example tests whether the files in the compressed file data.7z are intact.
If they are, the filenames are displayed along with information about the compressed file:
its size and the number of files and folders compressed in it.
$ 7z t data.7z
(d) The following example adds the files of the directory projects to an existing compressed
file data.7z.
$ 7z a data.7z projects
(e) The following example extracts the files found in the compressed file data.7z into the
current directory.
$ 7z e data.7z
Note: The compressed files of subdirectories will also be uncompressed into the current directory, that is, the
respective subdirectories will not be created.
To create subdirectories and uncompress files into the respective subdirectories, option
x is used instead of e.
(f) The following example deletes the directory projects and its files from the compressed
file data.7z.
$ 7z d data.7z projects
The screenshot of the aforementioned examples is shown in Fig. 5.11.
$ ls -l
total 10
-rw-r--r-- 1 root root 18 Feb 22 14:51 customers.txt
-rwx--xr-x 1 root root 6 Feb 22 14:51 letter.txt
-rwx--xr-x 1 root root 113 Feb 22 17:08 matter.txt
drwxr-xr-x 2 root root 512 Feb 27 20:30 projects
-rw-r--r-- 1 root mba 892 Feb 22 17:21 transact.txt
$ ls -l projects
total 4
-rwxr-xr-x 1 root root 347 Feb 22 14:53 bank.lst
-rw-r--r-- 1 root root 6 Feb 27 20:30 hello.txt
$ 7z a data.7z *.txt
Compressing customers.txt
Compressing letter.txt
Compressing matter.txt
Compressing transact.txt
Fig. 5.11 Compression and uncompression of files using the 7-zip command (Contd)
Everything is ok
$ ls -l
total 102
-rw-r--r-- 1 root root 18 Feb 22 14:51 customers.txt
-rw------- 1 root root 639 Feb 27 20:31 data.7z
-rwx--xr-x 1 root root 6 Feb 22 14:51 letter.txt
-rwx--xr-x 1 root root 113 Feb 22 17:08 matter.txt
drwxr-xr-x 2 root root 512 Feb 27 20:30 projects
-rw-r--r-- 1 root mba 892 Feb 22 17:21 transact.txt
$ 7z l data.7z
Method = LZMA
Solid = +
Blocks = 1
Date Time Attr Size Compressed Name
------------------- ----- ----------- ----------- --------------------
--------
2006-02-22 14:51:18....A 18 423 customers.txt
2006-02-22 14:51:18....A 6 letter.txt
2006-02-22 17:08:46....A 113 matter.txt
2006-02-22 17:21:48....A 892 transact.txt
------------------- ----- ----------- ----------- --------------------
1029 423 4 files, 0 folders
$ 7z t data.7z
Testing customers.txt
Testing letter.txt
Testing matter.txt
Testing transact.txt
Everything is ok
Total:
Folders: 0
Files: 4
Size: 1029
Compressed: 639
$ 7z a data.7z projects
Scanning
Fig. 5.11 (Contd)
Compressing projects/hello.txt
Compressing projects/bank.lst
Everything is ok
$ 7z l data.7z
Method = LZMA
Solid = +
Blocks = 1
$ rm -r projects
$ ls -l
total 276
-rw------- 1 root root 909 Feb 27 20:36 data.7z
$ 7z e data.7z
Extracting customers.txt
Extracting letter.txt
Extracting matter.txt
Extracting transact.txt
Extracting projects/hello.txt
Extracting projects/bank.lst
Extracting projects
Everything is ok
Total:
Folders: 1
Files: 6
Size: 1382
Compressed: 909
$ ls -l
total 462
-rwxr-xr-x 1 root root 347 Feb 22 14:53 bank.lst
-rw-r--r-- 1 root root 18 Feb 22 14:51 customers.txt
-rw------- 1 root root 909 Feb 27 20:36 data.7z
-rw-r--r-- 1 root root 6 Feb 27 20:30 hello.txt
-rwx--xr-x 1 root root 6 Feb 22 14:51 letter.txt
-rwx--xr-x 1 root root 113 Feb 22 17:08 matter.txt
drwxr-xr-x 2 root root 512 Feb 27 20:30 projects
-rw-r--r-- 1 root root 892 Feb 22 17:21 transact.txt
$ 7z d data.7z projects
Everything is ok
$ 7z l data.7z
Method = LZMA
Solid = +
Blocks = 1
presence of a specified application program or system utility on the disk drive, and checking
the file system. Let us see how we can find the file type.
$ ls -l
total 148
-rw-r--r-- 1 root root 34878 Feb 22 17:30 a.png
-rw-r--r-- 1 root root 34779 Feb 22 17:28 a.png.z
-rw-r--r-- 1 root root 18 Feb 22 14:51 customers.txt
--w-r-x--x 1 root root 6 Feb 22 14:51 letter.txt
-r-----r-- 1 root root 113 Feb 22 17:08 matter.txt
drwxr-xr-x 2 root root 512 Feb 22 16:39 projects
-rw-r--r-- 1 root root 892 Feb 22 17:21 transact.txt
$ file matter.txt
matter.txt: English text
$ file customers.txt
customers.txt: ascii text
$ file a.png
a.png: PNG image data
$ file a.png.z
a.png.z: packed data
$ file projects
projects: directory
$ file *
a.png: PNG image data
a.png.z: packed data
customers.txt: ascii text
letter.txt: commands text
matter.txt: English text
projects: directory
transact.txt: ascii text
$ cat filenames
letter.txt
a.png
transact.txt
$ file -f filenames
letter.txt: commands text
a.png: PNG image data
transact.txt: ascii text
Here, path refers to the directory (or directories) in which we want the find command
to search for the desired files. The path may include more than one directory, separated by
spaces. The find command recursively searches each directory in the path, including all its
subdirectories, for files that meet the given criteria. Table 5.15 shows the options for writing criteria.
The action-list in the syntax indicates that we can apply several actions on the files that
are searched through the find command. Table 5.16 lists the most frequent actions that are
applied to the found files.
In the case of the -exec or -ok action, a pair of braces {} represents the files that
are found by the find command. In other words, each file located by find replaces the
braces, and the specified command is applied to each found file one by one. A semicolon (;)
marks the end of the command's arguments, so that find knows where the command ends.
Since the shell also uses the semicolon, we escape it with a backslash or quotes so that the
shell passes it to find unchanged. The format of the find command when a command is
executed on the found files is as follows:
$ find pathname-list condition-list -exec command {} \;
While using complex expressions for finding files, we can use the operators explained in the
next sub-section.
Note: While using the -a or -o operators, we may use parentheses ( ) for grouping, but they must be escaped
because they have a special meaning to the shell. This means that the parentheses must be prefixed with a backslash: \( and \).
Examples
(a) The following command displays the files and their path names that have not been
accessed for over a month (+30).
$ find / -atime +30 -print
(b) To find files larger than 20 blocks (of 512 bytes each) that have not been accessed
for over a month (+30), use the following command.
$ find / -atime +30 -size +20 -print
(c) To search for files that are of a size between 1000 and 2000 bytes, use the following command.
$ find . -size +1000c -size -2000c -print
We can see in this command that a minus sign designates ‘less than’, the plus sign
designates ‘greater than’, and the c suffix means that the size is given in bytes (characters).
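The +/- semantics are easy to confirm with files of known size. The sketch below uses a scratch directory from mktemp -d and illustrative file names, and assumes head -c is available (as it is with GNU and BSD tools).

```shell
#!/bin/sh
# Create three files of known size and find the one between 1000 and 2000 bytes.
tmp=$(mktemp -d)
head -c 500  /dev/zero > "$tmp/small.dat"
head -c 1500 /dev/zero > "$tmp/medium.dat"
head -c 2500 /dev/zero > "$tmp/large.dat"

# +1000c: more than 1000 bytes; -2000c: fewer than 2000 bytes.
find "$tmp" -type f -size +1000c -size -2000c -print   # prints only .../medium.dat
```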
(d) To interactively remove files that are larger than 20 blocks and have not been accessed
for over a month, use the -ok action, which asks for confirmation before running rm on each file.
$ find / -atime +30 -size +20 -ok rm -f {} \;
(e) To list all files and directories under the current directory, use the following command.
$ find . -print
(f) To search for the file a.txt in the current directory and its subdirectories, use the
following command.
$ find . -name 'a.txt' -print
(g) To search for the file a.txt in the root directory and all its subdirectories, use the following
command.
$ find / -name 'a.txt' -print
(h) To display all .c files under the current directory, use the following command.
$ find . -name '*.c' -print
(i) To print all files beginning with the word test in the current directory and its
subdirectories, use the following command.
$ find . -name 'test*' -print
(j) To print all filenames that consist of three characters and begin with an upper-case or lower-case
letter, in the current directory and its subdirectories, use the following command.
$ find . -name '[a-zA-Z]??' -print
(k) To display the list of directories under the current directory, use the following command.
$ find . -type d -print
(l) To find all those .c files that were last modified less than three days ago, use the following
command.
$ find . -mtime -3 -name "*.c" -print
Note: We can use single quotes as well as double quotes for defining the pattern.
(m) To find all those .c files that were last modified more than three days ago, use the
following command.
$ find . -mtime +3 -name "*.c" -print
(n) To find all those .c files that were modified exactly three days ago (that is, between 72 and
96 hours ago, because find counts ages in whole 24-hour periods), use the following command.
$ find . -mtime 3 -name "*.c" -print
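The difference between -mtime -3 and -mtime +3 can be demonstrated by back-dating one file. This sketch assumes GNU touch, whose -d option accepts relative dates such as "5 days ago"; the scratch directory and the file names new.c and old.c are illustrative.

```shell
#!/bin/sh
# Assumes GNU touch (-d with a relative date).
tmp=$(mktemp -d)
touch "$tmp/new.c"                    # modified just now
touch -d '5 days ago' "$tmp/old.c"    # modified five days ago

find "$tmp" -mtime -3 -name '*.c' -print   # new.c: modified less than 3 days ago
find "$tmp" -mtime +3 -name '*.c' -print   # old.c: modified more than 3 days ago
```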
(o) To find the .txt files that have the 755 permission, use the following command.
$ find . -name '*.txt' -perm 755 -print
We can see that 755 is an octal number representing read, write, and execute permissions
for the owner, and read and execute permissions for the group and others. With -perm 755,
only files whose permissions are exactly 755 are matched.
We can also use the and operator, that is, the -a operator in the aforementioned
command. The and operator shows only those files that satisfy both the specified
expressions. With the and operator, this command can be written as follows.
$ find . -name '*.txt' -a -perm 755 -print
Remember, -a is the default operator, so we can optionally omit it.
(p) To find the subdirectories under the current directory having the 755 permission, use the
following command.
$ find . -type d -perm 755 -print
(q) To find all the files that have the User (owner) as root, use the following command.
$ find . -user root -print
Instead of the username, we can use the user ID. The following command is used.
$ find . -user 0 -print
(r) To find all the files that belong to the group projects, use the following command.
$ find . -group projects -print
As with the username, instead of the group name, we can use the group ID. The
following command is used.
$ find . -group 15 -print
(s) To find all the files except the a.txt file, use the following command.
$ find . ! -name 'a.txt' -print
In this command, ! is the negation operator and it reverses the meaning of the
expression.
(t) To find all the files except the ones with the extension .txt, use the following command.
$ find . ! -name '*.txt' -print
(u) To find .txt files or files that have the 755 permission, use the following command.
$ find . \( -name '*.txt' -o -perm 755 \) -print
The -o operator is the ‘OR’ operator, and hence the files that satisfy either expression will
be displayed. We have used ‘escaped’ parentheses in this expression, that is, they
are prefixed by a backslash to prevent them from being interpreted by the shell.
(v) To find .txt as well as .doc files, use the following command.
$ find . \( -name '*.txt' -o -name '*.doc' \) -print
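The parentheses matter whenever an explicit action such as -print follows an -o. In the sketch below (scratch directory and file names are illustrative), the grouped form prints every .txt and .doc file, while the ungrouped form prints only the .doc file, because -print then binds only to the right-hand side of -o.

```shell
#!/bin/sh
# Build a small tree and compare grouped and ungrouped OR expressions.
tmp=$(mktemp -d)
mkdir "$tmp/sub"
touch "$tmp/a.txt" "$tmp/b.doc" "$tmp/c.c" "$tmp/sub/d.txt"

# Escaped parentheses reach find unchanged: prints a.txt, b.doc, sub/d.txt.
find "$tmp" -type f \( -name '*.txt' -o -name '*.doc' \) -print

# Without grouping this means (-type f -name '*.txt') OR (-name '*.doc' -print),
# so only b.doc is printed.
find "$tmp" -type f -name '*.txt' -o -name '*.doc' -print
```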
We can also execute commands on the files that we find. The following is an example.
$ find . -name "*.txt" -exec wc -l '{}' ';'
This command counts the number of lines in every .txt file in and under the current
directory. The line count is displayed before the name of the respective file. Each .txt
file that is found replaces the ‘{}’ braces; that is, the wc -l command is applied to each
file that is found. The ‘;’ ends the -exec clause.
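The sketch below runs the same idea on two sample files in a scratch directory (names are illustrative). It also shows the POSIX variant -exec ... {} +, which passes many found files to a single invocation of the command; with wc that adds a total line.

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf 'one\ntwo\n' > "$tmp/a.txt"
printf 'one\ntwo\nthree\n' > "$tmp/b.txt"

# {} \; runs wc once per file: one line of output for each file found.
find "$tmp" -name '*.txt' -exec wc -l {} \;

# {} + passes the found files to one wc run, which also prints a total.
find "$tmp" -name '*.txt' -exec wc -l {} +
```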
(w) To display the names of the files and subdirectories in the current directory, use the
following command.
$ find . -exec echo {} ';'
We can see that the semicolon is quoted.
(x) The following example finds the .txt files that have the 755 permission. From the files
that are found, the group read permission is removed, as shown here.
$ find . -name '*.txt' -perm 755 -exec chmod g-r '{}' ';'
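To see the effect, the sketch below (scratch directory and file names are illustrative) creates one file with mode 755 and one with 644. Only the 755 file matches, and removing group read permission turns rwxr-xr-x (755) into rwx--xr-x (715).

```shell
#!/bin/sh
tmp=$(mktemp -d)
touch "$tmp/run.txt" "$tmp/notes.txt"
chmod 755 "$tmp/run.txt"      # rwxr-xr-x
chmod 644 "$tmp/notes.txt"    # rw-r--r--

# -perm 755 matches the exact mode, so only run.txt is selected.
find "$tmp" -name '*.txt' -perm 755 -print

# Remove group read permission from the matched files: 755 becomes 715.
find "$tmp" -name '*.txt' -perm 755 -exec chmod g-r {} \;
ls -l "$tmp/run.txt"          # mode column now reads -rwx--xr-x
```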
The find command is commonly used for the following tasks:
1. Searching for files with a specific pattern
2. Searching for files that were accessed a specific number of days ago
3. Searching for files of a specific size
4. Searching for files with specific permissions
5. Searching for files belonging to a specified user or group
6. Applying commands on the found files
Table 5.18 Options used with the locate command
Option              Description
-q                  It suppresses error messages that are displayed for files for which the user does not have access permissions.
-n                  It limits the result to a specified number.
-i                  It ignores the case while searching, that is, it returns the result that matches the pattern in upper case or lower case.
pattern_to_search   It represents the string that we wish to search for in the path names or in the filenames. All the files that contain the pattern_to_search string in their path or filename will be listed on the screen.

This command will find the first 10 files that contain .txt anywhere in their full paths and
for which the user has access permissions. It will not display any error messages on finding
the files for which the user does not have access permissions.
(c) $ locate -i "project.txt"
This command will find all project.txt files, be it in the upper case or lower case, for which
the user has access permissions.
One disadvantage of locate is that it stores all filenames on the system in an index that is
usually updated only once a day. This means locate will not find files that have been created
very recently. It may also report filenames as being present even though the file has just
been deleted. Unlike find, locate cannot track down files on the basis of their permissions,
size, and so on.
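The index-based design of locate explains both problems. The following sketch imitates it with find and grep; it is not how locate is implemented, and the helpers build_index and lookup are hypothetical. On many Linux systems, the real locate database is rebuilt by the updatedb command, usually from a scheduled daily job.

```shell
#!/bin/sh
# Imitate locate's index-based lookup to show why its results can be stale.
# build_index and lookup are hypothetical helpers, not part of locate.
root=$(mktemp -d) || exit 1
index=$(mktemp) || exit 1
trap 'rm -rf "$root" "$index"' EXIT

build_index() {              # plays the role of the periodic database update
    find "$root" > "$index"
}
lookup() {                   # like locate -i: search the saved index, not the disk
    grep -i -F -- "$1" "$index"
}

touch "$root/Project.txt"
build_index                  # the index is built here ...
touch "$root/report.txt"     # ... so this new file is missing from it
rm "$root/Project.txt"       # ... and this deleted file is still listed

lookup project.txt           # reported, although the file no longer exists
lookup report.txt || echo "report.txt: not in the index"
```

Running build_index again brings the index up to date, which is exactly what the periodic database update does for locate.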
Output /usr/bin/ls
response. On the basis of the user action, either the error is removed or fsck will continue
checking without making any changes to the file system. In the non-interactive mode, fsck
tries to repair all the errors found in the file system without waiting for the user response.
Although this mode is faster, it may delete some important files that have become corrupted.
Syntax # fsck [-y] [-n] [filesystem]
Note: The option -y or -n, if used, runs the fsck command in the non-interactive mode.
Here, filesystem is the name of the file system to be checked. If we do not specify the file
system, fsck will use the files /etc/checklist or /etc/fstab to know the names of the file
systems to be checked.
The options -y and -n automatically answer Yes and No, respectively, to all the queries that
appear when the fsck command runs.
Examples
(a) # fsck -y
This command checks all the file systems on our machine and automatically answers ‘Yes’
to every query, so every repair that fsck proposes is carried out.
(b) # fsck -n
This command checks all the file systems on our machine and automatically answers ‘No’
to every query, so errors are reported but nothing is repaired.
The fsck command runs in several phases as follows:
# fsck /dev/root
** Currently Mounted on /
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3a - Check Connectivity
** Phase 3b - Verify Shadows/ACLs
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cylinder Groups
7899 files, 406203 used, 279169 free (257 frags, 34864 blocks, 0.0% fragmentation)
Let us have a quick view of all the phases of the fsck command.
In phase 1, each inode in the file system is checked, and then the disk blocks pointed to by
the inode are checked. Error messages may appear at this stage if the block address in an
inode is invalid, if a block is already being used by another inode, if the expected number of
blocks for an ordinary file does not match the actual number of blocks used by the inode, or
if other similar inconsistencies are found. In short, phase 1 performs the following tasks:
1. Checks the inodes, looks for valid inode types, and corrects the inode size and format.
2. Checks for bad or duplicate blocks.
In phase 2, fsck checks all directory inodes in the file system. First, the inode of the root
directory is examined; if the root inode is corrupted, fsck aborts. If the inode number in a
directory entry is invalid, the inode field of that directory entry is set to zero. fsck also
ensures that no directory entry points to an unallocated inode. In short, this phase removes
the directory entries that are invalid or that point to invalid inodes. Thus, this phase reports
errors that result from the mode and status of the root inode, directory inode pointers that are
out of range, directory entries pointing to bad inodes, etc.
Note: This phase removes directory entries pointing to bad inodes found in phase 1.
In phase 3, all the allocated inodes are scanned for unreferenced directories, that is,
directories that are not referenced by any entry in a parent directory (orphaned directories).
In this case, we are prompted to reconnect each orphaned directory. If our answer is yes, a
link between the orphaned directory and the special directory /lost+found is made. When the
fsck command is over, we can examine the entries in /lost+found and move them to their
respective directories.
Phase 4 deals with the inode count or reference count information, which was accumulated
in phases 2 and 3. In phase 1, the reference count is first set to the link count value stored
in the inode. The link count is the number of links to a physical file. Then, in phases 2 and
3, the reference count is decremented each time a valid link is found while scanning the file
system. Therefore, the reference count value should be zero when phase 4 begins.
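The bookkeeping of phases 1 to 4 can be sketched in a few lines of awk. The input format below (inode records carrying stored link counts, and entry records naming the inode each directory entry points to) is hypothetical; it only stands in for the on-disk structures that fsck actually reads.

```shell
#!/bin/sh
# Sketch of the phase 1-4 reference-count bookkeeping (input format is hypothetical):
#   inode NUMBER STORED_LINK_COUNT   one record per allocated inode
#   entry NAME NUMBER                one record per directory entry found
check_refcounts() {
    awk '
    $1 == "inode" { ref[$2] = $3 }     # phase 1: start from the stored link count
    $1 == "entry" { ref[$3]-- }        # phases 2 and 3: one decrement per valid link
    END {                              # phase 4: every count should now be zero
        for (i in ref)
            if (ref[i] != 0)
                printf "inode %s: stored link count off by %d\n", i, ref[i]
    }'
}

check_refcounts <<'EOF'
inode 12 2
inode 13 1
inode 14 1
entry report.txt 12
entry report.bak 12
entry notes.txt 13
EOF
```

Here only inode 14 is reported: it has a stored link count of 1 but no directory entry, which is the kind of disconnected file that fsck places in the lost+found directory.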
Phase 5 checks the free block list. Any bad or duplicate blocks in this list are flagged, and
the list is then salvaged. When the free list is salvaged, phase 6 is initiated, which
reconstructs the free block list.
If fsck repairs a file system that is mounted (typically the root file system), the system
should be rebooted immediately without a sync operation. Otherwise, out-of-date file system
information held in memory may be written back to the disk and undo the repairs.
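Note: fsck should be run on unmounted file systems, preferably in the single-user mode. If a file system is active while it is being checked, fsck may report spurious errors, and its repairs may cause further corruption.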
The fsck command checks the integrity of the file systems, especially the superblock, which
stores summary information about the volume. Whenever data is added or changed on a disk,
the superblock is modified to reflect the changes. Because it is modified so frequently, the
superblock is particularly likely to become corrupted, so fsck checks it for errors.
The following two checks are essentially done:
1. The size of the file system must be greater than the number of blocks used by the
superblock and the list of inodes.
2. The total number of inodes must be less than the maximum number of inodes allowed for
the file system.
Besides checking the superblock, the fsck command also checks the number and status of
the cylinder group blocks, inodes, indirect blocks, and data blocks. It checks that none of the
blocks marked as free is being used by a file; if such a block is in use, the file may be
corrupted. In addition, fsck confirms that the number of free blocks plus the number of used
blocks equals the total number of blocks in the file system. If there is any discrepancy, the
maps of unallocated blocks are rebuilt.
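The summary line that fsck prints for each file system can be cross-checked with simple arithmetic. In the summary line for /dev/root shown earlier, the free count equals the free fragments plus eight times the free whole blocks (257 + 8 × 34864 = 279169), which suggests that these file systems use 8 fragments per block; the sketch below treats that ratio as an assumption and accepts it as an optional argument. Adding the used and free counts then appears to give the total space, in fragments, that the check accounts for. The helper check_summary is hypothetical.

```shell
#!/bin/sh
# Cross-check one fsck summary line such as the one shown for /dev/root.
# Assumption: each block holds 8 fragments (override with the first argument).
check_summary() {
    awk -v per_block="${1:-8}" '
    {
        line = $0
        gsub(/[(),%]/, " ", line)          # drop punctuation, keep the numbers
        split(line, f, " ")
        # f[1]=files f[3]=used f[5]=free f[7]=free frags f[9]=free whole blocks
        used = f[3]; free = f[5]; frags = f[7]; blocks = f[9]
        ok = (free == frags + blocks * per_block) ? "yes" : "no"
        printf "total=%d free_matches=%s\n", used + free, ok
    }'
}

echo '7899 files, 406203 used, 279169 free (257 frags, 34864 blocks, 0.0% fragmentation)' |
    check_summary                          # prints: total=685372 free_matches=yes
```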
When inodes are examined, fsck looks for inconsistencies in the format and type, link count,
duplicate blocks, bad block numbers, and inode size. An inode can be in one of three states:
allocated (being used by a file), unallocated (not being used by a file), or partially allocated
(an allocation or deallocation was not completed, so data that was supposed to be deleted is
still there). The fsck command clears the inode if an inconsistency of any type is detected.
The link count is the number of directory entries that are linked to a particular inode.
The entire directory structure is examined to find the number of links to every inode. If the
stored link count and the actual link count do not match, it usually means that the disk was
not synchronized before the shutdown, that is, the link count was not updated while the
changes in the file system were being saved. If the stored count is not zero but the actual
count is zero, the disconnected file is placed in the lost+found directory. In other cases, the
actual count replaces the stored count.
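The link count that fsck compares can be observed directly with hard links. In this sketch (the file names original, alias, and the scratch directory are hypothetical), ls -li shows the inode number in its first column and the link count in its third.

```shell
#!/bin/sh
# Show that the link count is the number of directory entries for one inode.
dir=$(mktemp -d) || exit 1
trap 'cd / && rm -rf "$dir"' EXIT
cd "$dir" || exit 1

echo data > original
ls -li original          # link count (third column) is 1
ln original alias        # a second directory entry for the same inode
ls -li original alias    # same inode number on both lines, link count 2
rm original              # removing one entry only decrements the count
ls -li alias             # link count is back to 1; the data is still there
```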
The output of the fsck command is shown in Fig. 5.13.
# fsck -y
/dev/dsk/c0d0s0 IS CURRENTLY MOUNTED READ/WRITE.
CONTINUE? yes
** /dev/dsk/c0d0s0
** Currently Mounted on /
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3a - Check Connectivity
** Phase 3b - Verify Shadows/ACLs
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cylinder Groups
FILESYSTEM MAY STILL BE INCONSISTENT.
7899 files, 406203 used, 279169 free (257 frags, 34864 blocks,
0.0% fragmentation)
** /dev/dsk/c0d0s6
** Currently Mounted on /usr
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3a - Check Connectivity
** Phase 3b - Verify Shadows/ACLs
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cylinder Groups
FILESYSTEM MAY STILL BE INCONSISTENT.
150119 files, 3244347 used, 1892040 free (5304 frags, 235842 blocks,
0.1% fragmentation)
** /dev/dsk/c0d0s3
** Currently Mounted on /var
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3a - Check Connectivity
** Phase 3b - Verify Shadows/ACLs
** /dev/dsk/c0d0s7
** Currently Mounted on /export/home
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3a - Check Connectivity
** Phase 3b - Verify Shadows/ACLs
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cylinder Groups
FILESYSTEM MAY STILL BE INCONSISTENT.
2 files, 9 used, 31295480 free (16 frags, 3911933 blocks,
0.0% fragmentation)
***** PLEASE RERUN FSCK ON UNMOUNTED FILE SYSTEM *****
/dev/dsk/c0d0s5 IS CURRENTLY MOUNTED READ/WRITE.
CONTINUE? yes
** /dev/dsk/c0d0s5
** Currently Mounted on /opt
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3a - Check Connectivity
** Phase 3b - Verify Shadows/ACLs
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cylinder Groups
FILESYSTEM MAY STILL BE INCONSISTENT.
98 files, 25985 used, 24505 free (9 frags, 3062 blocks,
0.0% fragmentation)
***** PLEASE RERUN FSCK ON UNMOUNTED FILE SYSTEM *****
/dev/dsk/c0d0s1 IS CURRENTLY MOUNTED READ/WRITE.
CONTINUE? yes
** /dev/dsk/c0d0s1
** Currently Mounted on /usr/openwin
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3a - Check Connectivity
** Phase 3b - Verify Shadows/ACLs
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cylinder Groups
FILESYSTEM MAY STILL BE INCONSISTENT.
8305 files, 206932 used, 116320 free (400 frags, 14490 blocks,
0.1% fragmentation)
***** PLEASE RERUN FSCK ON UNMOUNTED FILE SYSTEM *****
Fig. 5.13 Output displayed while running the fsck command
5.6.1 /etc/passwd
passwd is a file found in the /etc directory. It contains login names, passwords, home
directories, and other information about users. Each line of the file contains a series of fields,
which together define a login account. The fields in each line of the /etc/passwd file are separated
by colons. Table 5.19 shows the aforementioned fields.
Table 5.19 Fields found in the /etc/passwd file
Field                Description
user name            The username is the string entered in response to the login prompt. It is a unique identifier for the user throughout the session. The program that prompts for the login name reads this file to get information pertaining to the user.
encrypted password   The program that prompts for the login name reads the information found in this field and uses it to validate the password entered by the user.
user ID number       Each user has an ID number that can be used as a synonym for the username. Both the ID number and the username are unique within the system.
group ID number      Each user has one group ID number. Any number of users can be assigned to the same group. The group ID number is used to assign group access permissions to files, directories, and devices.
real name            This is a comment field, usually holding the complete name of the user (login names are usually just unique identifiers).
home directory       It is the directory in which the user is placed after entering the correct login name and password. This is the name that gets stored in the HOME environment variable.
shell program        This is the shell program that is run once the user logs in. If nothing is specified, /bin/sh is assumed.
We can always find a root login in the /etc/passwd file, which looks as follows:
root:x:0:0:root:/root:/bin/sh
The root user has a user ID of 0 and a group ID of 0. For security reasons, many systems
move the encrypted passwords out of /etc/passwd into the shadow file, which is why the
password field of this entry contains x.
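Because the fields are separated by colons, a shell loop can split an /etc/passwd line by setting IFS to a colon. The helper show_fields below is a hypothetical sketch; it uses the root entry shown above as input and supplies /bin/sh when the shell field is empty, as described in Table 5.19.

```shell
#!/bin/sh
# Split /etc/passwd-style lines into their seven colon-separated fields.
# show_fields is a hypothetical helper; pass and name are read but not printed.
show_fields() {
    while IFS=: read -r user pass uid gid name home shell; do
        # An empty shell field means /bin/sh, as described in Table 5.19.
        printf 'user=%s uid=%s gid=%s home=%s shell=%s\n' \
            "$user" "$uid" "$gid" "$home" "${shell:-/bin/sh}"
    done
}

echo 'root:x:0:0:root:/root:/bin/sh' | show_fields
# prints: user=root uid=0 gid=0 home=/root shell=/bin/sh
# On a real system:  show_fields < /etc/passwd
```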
5.6.2 /etc/shadow
Passwords are stored in encrypted form for security. To check a password, the system
encrypts the newly entered password with the same algorithm and compares the result with
the encrypted version stored in the file; if they match, the password is correct.
The /etc/passwd file must be readable by everyone because it is used by so many programs
to find the user ID number, group membership, and home directory. Anyone can therefore
copy the /etc/passwd file and, with it, all the encrypted passwords, which can then be attacked
offline, for example, by encrypting guessed passwords and comparing the results.
The solution is to hide the passwords in another file. The file holding the passwords
is known as the shadow file and is normally named /etc/shadow. The shadow file is readable
only by its owner, root. This means that no one can read the encrypted passwords without
root access.
We can tell by looking at the data in the /etc/passwd file whether the actual password is
in a shadow file, because the password field displays an x rather than an encrypted password.
The shadow file is a text file, and each line displays the password information of a user. The
fields in each line of the /etc/shadow file are separated by colons. Table 5.20 gives a brief
description of the fields found in this file.
Table 5.20 Fields found in the /etc/shadow file
Field                      Description
user name                  The same name found in the /etc/passwd file
encrypted password         The encrypted form of the password
password last changed      The day the password was last changed (The date is a count of the number of days since 1 January 1970.)
password may be changed    The number of days before the user has permission to change the password (A value of -1 means that it can be changed any time.)
password must be changed   The number of days from the time the password is set until the password expires and must be changed
password expire warning    The number of days prior to the password expiry date that the user has to be warned
disable after expires      The number of days after the password expires that the account is to be automatically disabled
disabled                   The date that the account was disabled (The date is a count of the number of days since 1 January 1970.)
The shadow file also contains dates and day counters, which can be used to force the users to
change their passwords from time to time, under the threat of their account getting disabled.
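The day counters are easier to read with a little arithmetic. Since the dates are counts of days since 1 January 1970, day 19000, for example, is 8 January 2022, and a password last changed on that day with a must-change value of 90 expires on day 19090. The helper password_status and the sample line for the user ravi are hypothetical; a real /etc/shadow file is readable only by root.

```shell
#!/bin/sh
# Interpret the day counters of /etc/shadow-style lines.
# Fields used: 1 user name, 3 last change, 5 must-change (maximum age), 6 warning days.
# password_status is a hypothetical helper; its argument is today's day count.
password_status() {
    awk -F: -v today="$1" '
    $5 == "" { print $1 ": no maximum age"; next }
    {
        expires = $3 + $5                    # day on which the password expires
        if (today >= expires)
            print $1 ": expired"
        else if (today >= expires - $6)      # inside the warning period
            print $1 ": expires in " (expires - today) " day(s)"
        else
            print $1 ": valid until day " expires
    }'
}

# date +%s (seconds since 1 January 1970) works with GNU and BSD date,
# although POSIX does not require it.
today=$(( $(date +%s) / 86400 ))
echo 'ravi:ENCRYPTED:19000:0:90:7:::' | password_status "$today"
```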
5.6.3 /etc/hosts
The hosts file contains static address information for computers on our local network.
Whenever we refer to a computer by its name, the commands that we use must have some
way to translate that name into an IP address. Our Internet service provider (ISP) should
provide us with the address of one or more name servers that we can use. If we use a dial-up
connection, the address of a name server is normally returned to our computer as part of the
initial connection sequence. However, in some cases, we must configure the address of the
name server ourselves (on most Unix systems, in the /etc/resolv.conf file).
If we have a local network, we will need to provide each member of our network with
the address of all the other members. If there are many computers on our local network, it
is easier to use one of them as a name server by configuring a daemon to respond to address
requests and then configuring other computers to send address queries to our local name
server.
The contents of /etc/hosts file may be as follows:
127.0.0.1 localhost localhost.localdomain
192.168.0.1 mce1 mce1.localdomain
192.168.0.2 mce2 mce2.localdomain
192.168.0.3 mce3 mce3.localdomain
The same list of addresses is required on every computer on the network. Because each
computer’s own address can also be included in the list, a single file can be duplicated
everywhere on the network by simply copying it from one computer to another. Each line in
the file contains an IP address, followed by a list of names (aliases) for the computer. In this
example, each computer can be located by its simple name or by its domain name.
The first line of the file conventionally defines localhost, which has the address 127.0.0.1.
This special loopback address is used by programs on the local computer to address its own
services.
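A static lookup in the hosts file amounts to scanning it line by line for the requested name. The following simplified sketch does this with awk; lookup_host is a hypothetical helper, real resolvers also handle details such as trailing comments and case-insensitive names, and the sample data is the file shown above.

```shell
#!/bin/sh
# Resolve a name the way a static hosts lookup does: scan the file line by line
# and print the address of the first line that lists the name.
lookup_host() {               # usage: lookup_host NAME [FILE]
    awk -v name="$1" '
    /^[ \t]*#/ { next }                           # skip comment lines
    {
        for (i = 2; i <= NF; i++)                 # fields 2 onwards are names and aliases
            if ($i == name) { print $1; exit }
    }' "${2:-/etc/hosts}"
}

hosts_file=$(mktemp) || exit 1
trap 'rm -f "$hosts_file"' EXIT
cat > "$hosts_file" <<'EOF'
127.0.0.1 localhost localhost.localdomain
192.168.0.1 mce1 mce1.localdomain
192.168.0.2 mce2 mce2.localdomain
192.168.0.3 mce3 mce3.localdomain
EOF

lookup_host mce2.localdomain "$hosts_file"    # prints 192.168.0.2
```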
If the hosts.deny file contains the entry ALL: ALL, every service is denied to every host.
The following example of hosts.allow begins by granting all permissions to every host in the
local domain and every host in the domain philips.com. All permissions are also granted to
the computer with the IP address 234.51.135.18. Finally, HTTP web service (specified by
naming the daemon that receives the request) is granted to every host except the ones in the
domain .godrej.com.
$ cat /etc/hosts.allow
ALL: LOCAL, .philips.com
ALL: 234.51.135.18
httpd: ALL EXCEPT .godrej.com
Examples
(a) radius=5
Creates a shell variable having the name radius.
Similarly, name="ravi"
name is the variable with the value ravi.
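(b) A null string is a string with no characters. To assign one, we use a pair of quotes with
nothing between them.
area=""
We can use letters, digits, and the underscore character in variable names, but a name
cannot begin with a digit.
area_circle=56
(c) To find out the value of a shell variable, we can use the echo command. Ordinarily, echo
merely echoes its arguments on the screen, so giving it the variable name prints only the name.
$ echo radius
radius
To display the value, we precede the variable name with a $ sign.
$ echo $radius
5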
PS1=$
PS2=>
TERM=adm5
Table 5.21 shows the description of these shell variables.
Table 5.21 Shell variables
Shell variable                   Description
EXINIT                           This refers to the initialization instructions for the ex and vi editors.
HOME                             This is set to the path name of our home directory.
IFS (internal field separator)   This is set to a list of the characters that are used to separate words in a command line. Normally, this list consists of the space, tab, and newline characters.
LOGNAME                          This gives the user’s login name.
MAIL                             This variable’s value is the path name of the file (mailbox) in which electronic mail addressed to us is placed. The shell checks this file periodically, and when new mail arrives, we are informed about it.
PATH                             This names the directories that the shell searches to find the commands that we execute. A colon, without spaces, is used to separate the directory names.
PS1 (prompt string 1)            This symbol is used as our prompt. Normally, it is set to $, but we can redefine it by merely assigning a new value. For example, the command PS1=# resets the prompt to a # symbol.
PS2 (prompt string 2)            This prompt is displayed when a new line is started without finishing a command (the command continuation prompt).
TERM                             This identifies the kind of terminal we use (it helps the shell understand what to interpret as the erase key, kill line, etc.).
Note: Some variables like PS1 are defined by default. Others like PATH are defined in our .profile file.
CDPATH variable
The CDPATH variable contains a list of path names separated by colons (:), as shown:
:$HOME:/bin/usr/files
There are three paths in this example. Since the path starts with a colon, the first directory
is the current working directory. The second directory is our home directory. The third
directory is an absolute path name to a directory of files.
The contents of CDPATH are used by the cd command using the following rules:
1. If CDPATH is not defined, the cd command searches the working directory to locate the
requested directory. If the requested directory is found, cd moves to it. If it is not found,
cd displays an error message.
2. If CDPATH is defined as shown in the previous example, the actions listed are taken when
the following command is executed.
$ cd ajmer
(a) The cd command searches the current directory for the ajmer directory. If it is found,
the current directory is changed to ajmer.
(b) If the ajmer directory is not found in the current directory, the cd command searches
in the home directory, which is the second entry in CDPATH. If the ajmer directory is
found in the home directory, it becomes the current directory.
(c) If the ajmer directory is not found in the home directory, cd tries to find it in
/bin/usr/files, which is the third directory in CDPATH. If the ajmer directory is found
in /bin/usr/files, it becomes the current directory.
(d) If the ajmer directory is not found in /bin/usr/files, the cd command displays an
error message and terminates.
$ echo $CDPATH
$ CDPATH=:$HOME:/bin/usr/files
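The search order can be observed with a scratch directory tree. This sketch assumes a POSIX shell; the directory names work, files, and ajmer are hypothetical stand-ins for the paths in the example. POSIX also requires cd to print the new directory when it was found through a non-empty CDPATH entry, so the path appears twice.

```shell
#!/bin/sh
# Watch cd search CDPATH (directory names are hypothetical).
base=$(mktemp -d) || exit 1
trap 'cd / && rm -rf "$base"' EXIT
mkdir -p "$base/work" "$base/files/ajmer"
cd "$base/work" || exit 1

CDPATH=":$base/files"   # empty first entry = current directory, then $base/files
cd ajmer                # not under work, so cd finds $base/files/ajmer
pwd                     # now inside $base/files/ajmer
```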
HOME variable
The HOME variable contains the path name of our home directory. The default is our login
directory. Some commands use the value of this variable when they need the path to our home
directory. For example, when we use the cd command without any argument, the command
uses the value of the HOME variable as the argument.
$ echo $HOME
/mnt/disk1/usr/chirag
$ oldHOME=$HOME
$ echo $oldHOME
/mnt/disk1/usr/chirag
$ HOME=$(pwd)
$ echo $HOME
/mnt/disk1/usr/chirag/ajmer
$ HOME=$oldHOME
$ echo $HOME
/mnt/disk1/usr/chirag
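The effect of HOME on cd can be checked without touching the real home directory. This sketch temporarily points HOME at a scratch directory (a hypothetical location created with mktemp) and then restores it, as in the example above.

```shell
#!/bin/sh
# cd with no argument goes to $HOME.
oldHOME=$HOME
scratch=$(mktemp -d) || exit 1
trap 'rm -rf "$scratch"' EXIT
HOME=$scratch            # point HOME somewhere else for a moment
cd                       # no argument: cd uses the value of HOME
pwd                      # prints the scratch directory
HOME=$oldHOME            # restore the original value
```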
PATH variable
The PATH variable lists the directories in which the shell searches for commands. The entries
in the PATH variable must be separated by colons. It works just like CDPATH: when the shell
encounters a command, it searches for the command in each directory listed in PATH, in order.
The main difference is that the current directory, if it is searched at all, is usually placed at the
end of PATH rather than at the beginning.
If we set the PATH variable as follows,
$ PATH=/bin:/usr/bin:
then the shell will look for the commands that we execute in this sequence: first the /bin
directory, then the /usr/bin directory, and finally the current working directory (represented by
the empty entry after the last colon).
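The search that the shell performs can be sketched as a loop over the PATH entries. The function find_command is hypothetical and simplified: real shells also handle built-in commands and remember locations they have already found. As in the example above, an empty entry stands for the current directory.

```shell
#!/bin/sh
# Walk PATH the way the shell does when it looks for a command.
# find_command is a hypothetical helper; it uses the global variables rest and dir.
find_command() {                    # usage: find_command NAME
    rest=$PATH:                     # trailing ':' lets the loop see the last entry
    while [ -n "$rest" ]; do
        dir=${rest%%:*}             # first remaining entry
        rest=${rest#*:}             # drop it from the list
        [ -n "$dir" ] || dir=.      # an empty entry means the current directory
        if [ -f "$dir/$1" ] && [ -x "$dir/$1" ]; then
            echo "$dir/$1"
            return 0
        fi
    done
    return 1                        # not found in any PATH directory
}

find_command ls
```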
Primary prompt variable
The primary prompt (PS1 prompt) is set in the variable PS1 for the Korn and Bash shells and
in the variable prompt for the C shell. The shell displays the primary prompt when it is ready to
accept a command. The default is the dollar sign ($) for the Korn and Bash shells and the
percent sign (%) for the C shell.
In the following Korn shell session, we change the primary prompt to mce> followed by a
blank. Since the prompt ends with a blank, we must use quotes to set it. As soon as it is set,
the new prompt is displayed. At the end, we change it back to the default.
$ PS1="mce> "
mce> echo $PS1
mce>
mce> PS1="$ "
$
SHELL variable
The SHELL variable holds the path of our login shell.
TERM variable
It holds the name of the type of computer terminal or terminal emulator we are using. The
value of this variable determines the keys that we can use for the purpose of editing. A
common value of the TERM variable is vt100 (a terminal type).
Example $ TERM=vt100
The unset command is used to remove a shell variable. The syntax for using the unset
command is as follows:
Syntax unset shell_variable
We can use the set command with no arguments to display the variables that are currently set.
$ set
Examples
(a) $ export welcomemsg
This example exports an earlier defined shell variable welcomemsg to make it available to
child processes.
(b) $ export welcomemsg='Good Morning'
This example defines a shell variable welcomemsg as well as exports it to be available to
child processes.
Note: While defining a shell variable, there should not be any space on either side of the ‘=’ sign.
We can see that the value of the shell variable radius is not seen in the new shell, as the new
shell does not know about it.
If we want a new shell to know about the shell variables created by us, we use the export
command. The export command exports the shell variables to child processes, making them
global variables. The following example demonstrates this.
Example $ radius=5
$ echo $radius
5
$ export radius
$ sh                      # create a new shell
$ echo $radius
5                         (the new shell has a copy of radius)
The export command causes a new shell to be given a copy of the original variable. This
copy has the same name and value as the original. The value of the copy can subsequently be
changed, but when the subshell terminates, the copy is gone and the original variable
remains unchanged.
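The difference between a local and an exported variable can be seen by starting child shells with sh -c. This sketch assumes a POSIX sh; each sh -c runs a separate child shell.

```shell
#!/bin/sh
# A child shell sees a variable only after it is exported, and only as a copy.
unset radius                                     # start from a clean state
radius=5
sh -c 'echo "before export: ${radius:-unset}"'  # prints: before export: unset
export radius
sh -c 'echo "after export: ${radius:-unset}"'   # prints: after export: 5
sh -c 'radius=9; echo "child copy: $radius"'    # prints: child copy: 9
echo "parent: $radius"                           # prints: parent: 5
```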
To erase or remove a global variable, we use the unset command.
Syntax unset variable_name
Note: To find out the list of exported variables, type the export -p command (or env); the set command, by contrast, lists all the shell variables: $ export -p
In this chapter, we understood the different types of files, the role of device drivers while
operating the devices, differences between block and character devices, usage of disk space,
amount of free disk space in all file systems, and partition in a disk drive. We learnt how
commands such as gzip, gunzip, zip, unzip, compress, uncompress, pack, unpack, bzip2,
and bunzip2 can be used for compressing and uncompressing files. We also discussed how
desired files can be found and how specific commands can be executed on them. In addition, we learnt
how a corrupted file system can be repaired. We have also seen the role of important files
of the Unix system, shell variables, and system shell variables. In Chapter 6, we will learn
about handling processes, jobs, and signals in detail.
■ SUMMARY ■
1. All devices are considered to be files in Unix. Devices such as floppy drive, CD-ROM, and
hard disk are known as block devices as data is read from and written into these devices in
terms of blocks. Character devices, on the other hand, are also known as raw devices as the
read/write operations in these devices are done directly, that is, ‘raw’ without using the
buffer cache.
2. A disk can be divided into several partitions. It can have a primary partition and an
extended partition. There can be multiple logical drives in an extended partition.
3. The hosts file contains static address information for computers on our local network.
4. The hosts.allow file defines the list of hosts for whom the services are allowed; the
hosts.deny file defines the list of hosts for whom the services are denied.
5. By default, the values stored in shell variables are local to the shell, that is, they are
available only in the shell in which they are defined.
6. The export statement exports the local variables recursively to all child processes so that
they are available globally.
■ FUNCTION SPECIFICATION ■
■ EXERCISES ■
Objective-type Questions
State True or False
5.1 All devices are considered as files in Unix.
5.2 All device files are stored in /etc or in its subdirectories.
5.3 CD-ROM is a character device.
5.4 Printer is a character device.
5.5 The minor number represents the type of device.
5.6 The dd command is used for copying data from one medium to another.
5.7 The term bs in the dd command stands for block size.
5.8 The du utility displays complete information about the usage of disk space by each file and directory.
5.9 By default, the du command displays information in terms of 1024-byte blocks.
5.10 The df command reports only the free disk space of the file system installed on our machines.
5.11 The dfspace command reports the used disk space of the file system.
5.12 The -q option of the zip command makes it run in quiet mode.
5.13 The extension added to the file that is compressed by the compress command is .C.
5.14 The find command is used for searching for files.
5.15 The file command is used for displaying filenames.
5.16 The gunzip command is used to uncompress a compressed file that is compressed by the gzip, compress, or pack commands.
5.17 The fdisk command can be used to create and delete partitions on a disk, but cannot activate partitions.
5.18 A disk can have several primary partitions.
5.19 The gzip command compresses the specified file and replaces it with the compressed file having the extension .gz.
5.20 By default, the values stored in shell variables are local to the shell, that is, they are available only in the shell in which they are defined.
Multiple-choice Questions
5.1 The fdisk command is used to
(a) format a disk
(b) remove bad sectors from a disk
(c) create partitions
(d) repair a file system
5.2 The gzip command compresses the file with the extension
(a) .gzip
(b) .gz
(c) .gp
(d) .g
5.3 The command used to view the contents of a packed file is
(a) pcat
(b) cat
(c) show
(d) catpack
5.4 The user’s log name is stored in
(a) LOGNAME
(b) USER
(c) OWNER
(d) LOGIN
5.5 The command to see the list of shell variables is
(a) showvar
(b) showshell
(c) disp
(d) set
5.6 The command that is used to find out where an application program or system utility is stored on a disk is
(a) search
(b) findapp
(c) whence
(d) util
5.7 The fsck command is used for
(a) finding a file
(b) compressing a file
(c) uncompressing a file
(d) repairing a file system
5.8 The list of hosts for whom the services are allowed is stored in the file
(a) hosts.allow
(b) services.txt
(c) hosts
(d) allowed
5.9 The shell variable that sets the symbol for the primary shell prompt is
(a) PS2
(b) sprompt
(c) shellpr
(d) PS1
5.10 The TERM shell variable stores
(a) shell duration
(b) terminal description
(c) logged-in time
(d) booting time
Programming Exercises
5.1 Write the command for the following tasks:
(a) To copy the entire disk, hdb, to a file called back.dd
(b) To find the disk usage of every file in the /project directory
(c) To find the total number of blocks occupied by the /project directory
(d) To display a report of the free disk space for all the file systems installed on our machines
(e) To display the free disk space in terms of megabytes and percentage of total disk space
(f) To compress a file a.txt to a.txt.gz
(g) To add a file account.txt to a zipped file finance.zip
(h) To fix a zipped file finance.zip
(i) To compress a file a.txt and also show how much compression was done
(j) To uncompress the file a.txt.bz2
(k) To set the secondary prompt, the prompt that is displayed when a command is continued to the second line, to '>>>'
(l) To display the list of path names
(m) To determine the type of the file, accounts.txt
(n) To display the files and their path names that have not been accessed for over 10 days
(o) To check the file system
5.2 What will the following commands do?
(a) $ export project_name
(b) $ PS1="UnixPrompt>"
(c) $ passwd
(d) $ grep john /etc/passwd
(e) $ which cat
(f) $ find . -mtime -10 -name "*.txt" -print
(g) $ bunzip2 accounts.txt.bz2
(h) $ pack accounts.txt
(i) $ zip -q accounts.zip *.txt
(j) $ df -h
(k) $ du -s *.txt
(l) $ du /projects
(m) $ locate "projects"
(n) $ find / -size +15 -print
(o) $ echo $HOME
Review Questions
5.1 Explain the following commands with syntax and examples:
(a) dd (b) format (c) uncompress (d) unpack
5.2 (a) What are the points of comparison between the following commands: gzip, zip, compress, and pack?
(b) What is the difference among the following commands: du, df, and dfspace?
5.3 (a) Explain the different options used in the find command to search for a desired file.
(b) How is the file system repaired in Unix? Explain.
5.4 What is the difference between the following files?
(a) /etc/passwd and /etc/shadow
(b) /etc/hosts.allow and /etc/hosts.deny
5.5 How is a shell variable created and how can a local shell variable be made a global variable?
5.6 Explain the usage of the following system shell variables:
(a) HOME (b) MAIL (c) PS2 (d) PATH (e) TERM
Brain Teasers
5.1 In long listing command ls -l, if you find a file with mode field set to l, what does it mean?
5.2 Correct the following command to backup a hard disk to a file.
$ dd if=/file.dd of=/dev/hda
5.3 Correct the mistake in the following command for compressing few .txt files in the name abc.zip in quiet mode.
$ zip *.txt abc.zip
5.4 Can you uncompress a .bz2 file to the standard output? If yes, how?
5.5 How will you know whether a particular file in the /dev directory represents a character device or block device?
5.6 If a device has a major number 8 and minor number 0, what does it represent?
5.7 Is there any way to uncompress a .bz2 file without using the bunzip2 command? If yes, what is that?
5.8 If we provide the command file a.txt, we get the output, ‘cannot open for reading’. What does it mean?
5.9 What command must be given to delete all the files that have not been accessed for the last six months?
5.10 Correct the following command to display all .txt files in the current directory.
$ find . - name "*.txt" - ls
5.11 What will happen if answer Yes is provided to the question “CLEAR?”, which appears while running fsck command?
5.12 Is the following command to set the primary prompt correct? If not, identify the mistake.
PS2='UnixPrompt>'
6
Manipulating Processes and Signals
After studying this chapter, the reader will be conversant with the following:
• Processes and their address space, structure, data structures describing
the processes, and process states
• Difference between a process and a thread
• Commands related to scheduling processes at the desired time, handling
jobs, switching jobs from the foreground to the background and vice
versa, etc.
• Suspending, resuming, and terminating jobs, executing commands in a
batch, ensuring process execution even when a user logs out, increasing
and decreasing priority of processes, and killing processes
• Signals, their types, and the methods of signal generation
• Virtual memory and its role in executing large applications in a limited
physical memory and mapping of a virtual address to the physical memory
enables users to log in to the Unix system. When a user logs in, the command shell runs
as the first process from where other processes are forked in response to the commands,
programs, utilities, etc., executed by the user.
Note: The process that calls fork is known as the parent process, and the process created through fork is known as the child process. The child process starts as a near-exact clone of the parent process. It receives a copy of the parent's memory image, register values, and environment. It also inherits the parent's open files, so both processes share the same file table entries. However, the parent and child processes have separate address spaces, which enables them to execute independently.
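The effect of separate address spaces can be observed from the shell itself. A command list written in parentheses runs in a subshell, which the shell creates with fork, so the child starts with a copy of the parent's variables. The following minimal sketch changes a variable in the child and shows that the parent's copy is not affected. It assumes a POSIX sh, and the function name show_separate_copies is hypothetical.

```shell
#!/bin/sh
# Sketch: ( ... ) runs in a forked child that gets its own copy of the parent's variables.
show_separate_copies() {
  count=1                        # variable set in the parent shell
  (
    count=99                     # changes only the child's copy
    echo "child:  count=$count"
  )
  echo "parent: count=$count"    # the parent's copy is still 1
}

show_separate_copies
```

Running it should print child: count=99 followed by parent: count=1.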
User mode User mode is the mode in which processes related to user activities get executed. Commands, programs, utilities, etc., executed by the user are run in this mode. Such code does not need direct access to the hardware or to the kernel's data structures, so it runs in a non-privileged protection mode. Switching from user to kernel mode takes place in two cases. The first is when a user's process requests services from the operating system by making a system call. The second is when an interrupt occurs, for example, from a timer, the keyboard, or hard disk input/output (I/O).
Kernel mode In kernel mode, the system processes get executed, that is, the processes related to managing a computer system and its resources. The code that allocates memory and accesses hardware peripherals such as the printer and disk drive runs in this mode. This code is critical in nature: it can make an operating system inconsistent if it is not handled properly. Hence, for security reasons, it runs in a privileged protection mode.
The user and kernel modes can be better understood with the help of a block diagram
(Fig. 6.1) of the kernel architecture.
In Fig. 6.1, the users initially execute their processes in the user mode. When the user
process needs some kernel service (such as accessing memory, disk file, printer, or other
hardware peripherals), it interacts with the kernel through the system call. System calls are
functions that run in the kernel mode. Hence while executing system calls, the user process
switches from the user mode to the kernel mode.
Figure 6.1 shows the following two main components that make up the kernel:
File subsystem
The file subsystem manages the files of the Unix system. In the previous chapters, we learnt that everything in Unix is in the form of files, that is, all devices and peripherals are considered files. Communication between the hardware and the respective device drivers is managed by the file subsystem. The file subsystem also manages the buffers used to store data that is fetched from the devices or is to be written to them.
Fig. 6.1 Block diagram of the kernel architecture (labels: user; user mode and kernel mode; system call interface; file subsystem with buffer cache and device drivers; process control subsystem with interprocess communication, scheduler, and memory management; hardware drivers/interface; hardware)
implements communication between them. A process is basically an executable file in execution that is designated for a certain task. To load the executable file into the memory, the process control subsystem interacts with the file subsystem. It then executes the file to perform the required action. The process control subsystem comprises the following three modules:
Interprocess communication An application usually consists of several processes that
undergo execution simultaneously. In addition, the data processed by one process has to be
input into another process for further processing. This module performs all the tasks required
to establish communication among the different processes and also synchronizes them. By
process synchronization, we imply that the module manages the locks when two processes
update a particular type of content, that is, it ensures that no two processes update the same
data simultaneously.
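A simple form of interprocess communication that can be tried from the shell is a named pipe (FIFO). One process writes data into it, and another process reads that data for further processing. The kernel makes the reader wait until the writer supplies data. The sketch below assumes a POSIX shell with mkfifo and mktemp available. The function name fifo_demo and the file name channel are illustrative only.

```shell
#!/bin/sh
# Sketch: two processes communicating through a named pipe (FIFO).
fifo_demo() {
  dir=$(mktemp -d)
  fifo="$dir/channel"
  mkfifo "$fifo"                          # create the named pipe

  # Writer process (in the background): produces the data.
  printf 'banana\napple\ncherry\n' > "$fifo" &

  # Reader process: sort receives the writer's data and processes it further.
  sort < "$fifo"

  wait                                    # wait for the writer to finish
  rm -rf "$dir"
}

fifo_demo
```

The output should be the three words in sorted order: apple, banana, and cherry.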
Memory management This module manages memory allocation. It allocates memory to
the required process. If the memory is not enough, it transfers certain selected pages of the
current process to the secondary storage, hence creating space for the required process. In
addition, it frees the memory assigned to the process when it is terminated so that memory
can be assigned to some other process.
Scheduler The task of this module is to pick a ready-to-run process from the memory and assign the CPU to it. When the current process suspends for some I/O operation, the scheduler selects the next process and schedules it for execution. In addition, when a higher priority process comes in, the scheduler pre-empts the current process and assigns the CPU to the higher priority process.
Both the file and process subsystems are used for managing the hardware of a system.
These interact with the drivers and hardware interface (part of the kernel) for getting the
desired task performed by the hardware.
We will now be dealing with the processes in more detail, including the segments that
create them and the structures that are involved in handling them.
Process table
The process table (also known as kernel process table) is an array of structures that contains
an entry per process. Every process entry contains process control information required
by the kernel to manage the process and is hence maintained in the main memory. The
process entry is also known as process control block (PCB) and contains the following
information:
Process state It represents the process state, that is, whether the process is ready, running, waiting, sleeping, or a zombie.
Process identification information It uniquely identifies a process and consists of the
following three elements:
1. Process identifier (PID): This refers to a unique number assigned to identify a process.
2. User identifier (UID): This refers to the ID of the user who created the process. The process identification also includes the following IDs of the user who started the process:
- the group ID (GID)
- the effective user ID (EUID)
- the set user ID (SUID)
- the file system user ID (FSUID)
- the effective group ID (EGID)
- the set group ID (SGID)
- the file system group ID (FSGID)
3. Parent process identifier (PPID): This refers to the identifier of the parent process that
created the process.
Program counter It stores the address of the next instruction to be executed by this process.
CPU registers It stores the contents of the general-purpose and other CPU registers so that the process can resume correctly after it has been interrupted.
CPU scheduling information It includes information, such as the process priority, that determines how the process is scheduled.
Memory-management information It stores information of the memory used and released
by the process.
Accounting information It stores information such as process numbers, job numbers, and
CPU time consumed.
I/O status information It stores information such as the list of I/O devices and the status
of open files allocated to the process.
User area
The Unix kernel executes in the context of certain processes. The user area (U area) refers
to private information in the context of a process. The U area of a process contains the
following:
1. User IDs that determine user privileges
2. Current working directory
3. Timer fields that store the time the process spent in the user and kernel modes
4. Information for signal handling
5. Identification of any associated control terminal
6. Identification of data areas relevant to I/O activity
7. Return values and error conditions from system calls
8. Information on the file system environment of the process
9. User file descriptor table that stores the file descriptors of the files that the process has
opened
Note: The process entry also contains certain pointers such as pointers to the user and shared text areas.
You may recall that all the information of a file, such as file data, access permissions, and access
times, is stored in an inode. Inodes are maintained in the inode table. Besides inode table, the
kernel has two other file structures known as the file table and the user file descriptor table.
File table It is a global kernel structure with an entry for each currently opened file. Each entry stores the following:
- the byte offset in the file, which indicates where the next read/write operation will start
- the mode in which the file was opened, which determines the access permissions granted to the process for that file
- a reference count
User file descriptor table A separate user file descriptor table is allocated for each process. It keeps track of the files that are opened by the process.
When a process opens or creates a file, the kernel returns a file descriptor for it. This descriptor is stored as a new entry in the user file descriptor table. To read or write a file, the kernel locates the file descriptor in the user file descriptor table. It then follows the pointers from that entry to the file table and inode table to read or write the file data (refer to Fig. 6.2).
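The shell's exec built-in can open a file on a chosen descriptor, which makes the per-process descriptor and the shared file table entry visible. A child created by fork inherits the parent's descriptors, and both point to the same file table entry. A read by the child therefore advances the byte offset from which the parent continues. The following is a minimal POSIX sh sketch, assuming mktemp is available. Descriptor 3 and the function name demo_shared_offset are arbitrary choices.

```shell
#!/bin/sh
# Sketch: a descriptor inherited across fork shares the file table entry (and its offset).
demo_shared_offset() {
  f=$(mktemp)
  printf 'line1\nline2\nline3\n' > "$f"
  exec 3< "$f"                           # open the file on descriptor 3
  ( read -r first <&3; echo "child read:  $first" )   # forked child reads via descriptor 3
  read -r second <&3                     # parent continues from the shared offset
  echo "parent read: $second"
  exec 3<&-                              # close descriptor 3
  rm -f "$f"
}

demo_shared_offset
```

The parent should print line2, not line1, because the child's read already moved the shared offset past the first line.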
Fig. 6.2 Relation between user file descriptor table, file table, and inode table
After understanding the process table, let us discuss the next structure that stores information
that is private to the process.
Region table
A region is a contiguous area of a process’s address space such as text, data, and stack.
Region table entries indicate whether the region is shared or private. They also point to the
location of the region in the memory (refer to Fig. 6.3). A region table stores the following
information:
1. Pointers to inodes of files in the region
2. The type of region
3. Region size
4. Pointers to page tables that store the region
5. Bit indicating if the region is locked
6. The process numbers currently accessing the region
Fig. 6.3 Structures that make up a process (labels: process table with Process1 and Process2; U area and user file descriptor table of each process; per process region table; region table with text, data, and stack regions; a region shared by two processes)
Process state transition diagram (labels: fork; created; user running; kernel running; ready to run in memory; ready to run swapped out; swap in; swap out; preempt; reschedule process; system call, interrupt; return; sleep; wakeup; exit)
that is, to the user running state. Besides this, a process in the kernel running state can also switch to the sleep state, waiting for the occurrence of an event (like waiting for the user to enter some data). This state is known as the asleep in memory state.

The process in the kernel running state can also terminate, which switches it to the zombie state. A zombie process is a dead child process that has completed its execution. The kernel sends a SIGCHLD signal to its parent, which allows the parent to read the child's exit status. Until the parent reads this exit status, the child's entry remains in the process table.

The process sleeping in the memory can either be swapped out to the secondary storage (the sleep and swapped out state) or be woken up. If the event that it was waiting for occurs, it moves to the ready to run in memory state. A process in the sleep and swapped out state moves to the ready to run swapped out state when its event occurs. There it waits for the swapper to move it to the ready to run in memory state whenever it is required.
Note: When a swapped-out process is required, space for it is created in the primary memory. The swapper then swaps the process into the primary memory, switching it to the ready to run in memory state.
The process in the preempted state returns to the user mode, that is, the user running state, when the scheduler reschedules it. The process running in the user mode switches to the kernel mode when an interrupt occurs, when a system call is made to access operating system services, or when some fault or exception occurs.
Note: The scheduler decides the process that has to be submitted next to the CPU for action.
The different states of a process are briefly described in Table 6.1.
Almost all the process states discussed are self-explanatory, except one, the zombie
process. We will elaborate on this in Section 6.3.
$ ps -el
9. ADDR represents the memory address of the process.
10. SZ represents the total number of pages in the process.
11. WCHAN represents the address of the event for which the process is sleeping.
12. TTY represents the terminal from where the process is created.
The character Z in the S column confirms that it is a zombie process. We can see in this output that the process with PID 146 is a zombie. The other characters that may appear in the S column to show the current state of the process are shown in Table 6.2.
Table 6.2 Brief description of the characters that may appear in the S column
Process state | Description
D | Indicates a process in uninterruptible sleep, usually waiting for disk I/O
I | Indicates an idle process
R | Indicates a runnable process
S | Indicates a sleeping process
T | Indicates a stopped process
Z | Indicates a zombie process
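On Linux, the state letter that ps shows in the S column can also be read from /proc/PID/stat. This is an assumption: other Unix variants expose process state differently. The sketch below deliberately creates a short-lived zombie. A subshell starts sleep 1 in the background and then replaces itself with sleep 4 using exec. Because sleep never calls wait, the finished sleep 1 stays in state Z until its parent ends. The names proc_state and make_zombie_demo are hypothetical, and the timings are arbitrary.

```shell
#!/bin/sh
# Linux-specific sketch: read a process's state letter from /proc/PID/stat.
proc_state() {
  # /proc/PID/stat starts with: pid (command) state ppid ...
  # This simple parse assumes the command name contains no spaces.
  read -r _pid _comm state _rest < "/proc/$1/stat" && echo "$state"
}

make_zombie_demo() {
  pidfile=$(mktemp)
  # The subshell starts 'sleep 1' in the background, records its PID, and then
  # replaces itself with 'sleep 4', a program that never calls wait.
  ( sleep 1 & echo $! > "$pidfile"; exec sleep 4 ) &
  holder=$!
  sleep 2                        # 'sleep 1' has exited, but its parent has not reaped it
  child=$(cat "$pidfile")
  echo "state of $child: $(proc_state "$child")"   # expected: Z
  kill "$holder"                 # end the parent; the orphaned zombie is normally reaped by init
  wait "$holder" 2>/dev/null
  rm -f "$pidfile"
}

make_zombie_demo
```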
It may seem that a zombie process, such as the one with PID 146 shown in the output, can be removed with the kill command:
$ kill -9 146
However, this has no effect. A zombie has already terminated and cannot receive signals, and its entry disappears only when its parent reads its exit status.

Conventionally, to remove a zombie process, we remind its parent that the child has died. To do this, we send the parent a SIGCHLD signal manually using the kill command. If the parent's signal handler then executes the wait system call, it reads the child's exit status and removes the zombie. If a parent fails to call the wait system call, the zombie will be left in the process table. Once the zombie is removed from the process table, its process ID and its entry in this table can be reused.

If the parent process refuses to remove the zombie, we can forcefully remove it by terminating the parent process. The zombie is then adopted by init, which reads its exit status and removes it.
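The shell is itself a parent process, so the parent's side of this exchange can be tried from it. The wait built-in waits for a background child and returns that child's exit status; the shell collects the status through the wait family of system calls. The following is a minimal POSIX sh sketch. The exit status 7 and the function name reap_child are arbitrary choices.

```shell
#!/bin/sh
# Sketch: the shell, acting as a parent, collects a child's exit status with wait.
reap_child() {
  ( exit 7 ) &               # child process that terminates with exit status 7
  child=$!
  wait "$child"              # wait for (reap) the child; $? holds its exit status
  echo "exit status of child: $?"
}

reap_child
```

Running it should print exit status of child: 7.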
Note: A zombie process is not the same as an orphan process. An orphan process is a process that is still
executing, but whose parent has died. Orphan processes do not become zombie processes, because when a
process loses its parent, the init process becomes its new parent.
What is the name of the task that suspends the execution of one process on the CPU while
resuming execution of some other process? It is called context switching. We will now
discuss this in detail.
6.5 THREADS
A thread is the smallest unit of processing. A process can have one or more threads. This
is shown in Fig. 6.5. Multiple threads within a process share memory resources whereas
different processes do not share these resources. In multithreading, a processor switches
between different threads. A thread has its own independent flow of control as long as its
parent process exists and dies if the parent process dies.
Figure 6.5 shows two processes, Process 1 and Process 2, in the user space. Process 1 consists of a single thread whereas Process 2 is multi-threaded.
Fig. 6.5 Threads within processes (user space containing Process 1 with a single thread and Process 2 with multiple threads)
Threads have some properties of processes. Like processes, a thread consists of the following:
1. A program counter to indicate which instruction to execute next
2. Registers that hold its current working variables
3. A stack to store information related to the procedures called
Having properties similar to processes, threads are also known as lightweight processes.
6.5.1 Comparison Between Threads and Processes
Table 6.3 shows the differences between processes and threads.
Table 6.3 Differences between processes and threads
Process | Thread
Processes are individual entities. | Threads are part of processes.
It takes quite a long time to create and terminate a process. | It takes comparatively less time to create a new thread than a process, because the newly created thread uses the current process address space. Similarly, it takes less time to terminate a thread than a process.
It takes longer to switch between two processes as they have their individual address spaces. | It takes less time to switch between two threads within the same process as they use the same address space.
Communication of data among processes is quite complex as it requires an interprocess communication mechanism. | Communication of data among threads is quite easy as they share a common address space.
Similar to a traditional process, a thread can be in any one of the following states: running,
blocked, ready, or terminated. A running thread is the one to which the CPU is assigned
and is currently active. A blocked thread is the one that is waiting for some event to occur.
On occurrence of the event, the blocked thread turns into a ready state. A ready thread is a
thread that has all the resources except the CPU, and hence waits for the CPU’s attention.
The thread that has completed its work is said to be in a terminated state. A thread can also
be terminated in between, if desired by the process.
When a process is created, the kernel assigns it a unique identification number known as the process identifier (PID). The PID value can be any value from 0 to 32767. However, this range depends on a particular Unix variant. Its data type is pid_t, whose size may vary from system to system. The name of the process remains the same as the name of the program being executed. Every process is created from a parent process. The process that is created is known as the child process, and the process from which it is created is known as its parent process. Unix creates the first process with PID 0 when the system is booted.
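The PID and the parent-child relationship can be inspected from the shell. The special parameter $$ holds the PID of the current shell, and the PPID variable (specified by POSIX) holds the PID of its parent. In the sketch below, a child shell started with sh -c should report a PPID equal to the PID printed by the shell that started it. The function name show_pid_ppid is hypothetical.

```shell
#!/bin/sh
# Sketch: every process has a PID and records the PID of its parent (PPID).
show_pid_ppid() {
  echo "parent shell: PID=$$"
  # Start a child shell; single quotes make the child expand $$ and $PPID itself.
  sh -c 'echo "child shell:  PID=$$ PPID=$PPID"'
}

show_pid_ppid
```

The child's PID differs from the parent's, while its PPID repeats the parent's PID.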
Let us take a look at the commands that give us information of the processes running in
our system.
Examples
(a) $ ps
PID TTY TIME CMD
739 tty01 00:00:03 sh
894 tty01 00:00:12 ps
By default, the ps command displays only the processes that are running at the user’s terminal.
(b) To get the list of processes of the other users logged in to the system, we use the following
command.
$ ps -a
PID TTY TIME CMD
739 tty01 00:00:03 sh
894 tty01 00:00:12 ps -a
224 tty02 00:00:10 sh
901 tty02 00:00:07 cat
724 tty03 00:00:08 sh
The option -a is used for displaying the processes of all the users.
(c) To get the list of processes of a particular user, we give the following command.
$ ps -u ravi
Here, the option -u is used for displaying a list of processes of only the specified user and ravi
is the login ID of the user whose process list we want to see. We may get the following output.
PID TTY TIME CMD
224 tty02 00:00:10 sh
901 tty02 00:00:07 cat
(d) To get complete (full) information of the processes, including the login ID of the user,
ID of the parent process, CPU time consumed, etc., we give the following command.
$ ps -f
Here, the option -f stands for full information. We may get the following output.
UID PID PPID C STIME TTY TIME CMD
ravi 423 341 3 13:01:39 tty01 00:00:01 -sh
ravi 661 423 9 13:05:78 tty01 00:00:01 ps -f
The first column (UID) displays the login ID of the user. PID stands for the process identifier and is used for the identification of the process. PPID is the identification of the parent process from which the current process was born (or created). C indicates the processor utilization of the process, which is used for scheduling. The CPU time consumed so far is shown in the TIME column. STIME is the time when the process started. The login shell has PID 423 and PPID 341. This implies that the shell is a child process created by a system process with PID 341. The parent PID (i.e., PPID) of the ps -f command is 423, because this command was launched by the shell; hence the shell is the parent process of the ps -f command.
(e) To get the list of processes that are created by the user from a particular terminal, we give
the following command.
$ ps -t tty02
PID TTY TIME CMD
224 tty02 00:00:10 sh
901 tty02 00:00:07 cat
(f) To see the list of the processes that are system-generated and the ones that are running at
the current instant, we give the following command.
$ ps -e
PID TTY TIME CMD
0 ? 00:00:00 sched
1 ? 00:00:01 init
2 ? 00:00:00 vhand
3 ? 00:01:01 bdflush
970 ? 00:00:00 getty
975 ? 00:01:00 getty
Most of the processes that we see in this listing are very important for the functioning of
the Unix operating system and hence keep running continuously in the background until
the system shuts down. These processes are known as daemons as they run automatically
without any request generated from the user. Since these system processes or daemons
are not executed from any terminal, we see a ? in the column TTY in the listing provided.
We also see in the aforementioned listing that the first process is the sched (scheduler) that
schedules the next process from the ready queue and submits it to the CPU for necessary
action. The init is the parent process of a daemon and its PID is 1. The vhand is a sort of page
stealing daemon that releases pages of the memory for use by other processes. The rest of
the processes (found in the list) also help in some way or the other in the proper functioning
of the Unix system and do different tasks such as initializing the processes, swapping in
and out the active processes, and flushing the buffer for different I/O operations, among
others.
(g) To see the threads of the currently running processes, we use the following command.
$ ps -L
PID LWP TTY LTIME CMD
739 1 tty01 0:00 sh
894 1 tty01 0:00 ps
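This command shows one line for each thread. The LWP column gives the identifier of each lightweight process (thread), and LTIME gives the CPU time used by that thread. As said in Section 6.6, LWP and NLWP represent lightweight processes and number of lightweight processes, respectively. The NLWP column is displayed when the -L option is combined with -f (ps -Lf).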
There are two types of jobs: foreground and background. Foreground jobs are those that appear active on the terminal and need continuous interaction with the user for their execution. In other words, a foreground job might require input from the user, and until it is completed or suspended, no other job or command can be executed. Background jobs, on the other hand, display the shell prompt immediately on execution, allowing the user to execute other jobs. This means that a background job does not lock the input and output terminals and instead allows the user to execute more processes.
To resume the suspended job, we use the fg command in the following way.
$ fg
It resumes the same suspended job, sort a.lst > b.lst (i.e., the sort command).
To terminate (kill) a running foreground job, we use Ctrl-c. After terminating the job, we
press the Enter key for getting the command prompt.
beginning of this section, the background jobs do not lock the standard input and output
terminals and immediately display the shell prompt, allowing us to execute jobs of a higher
preference. To execute any job in the background, simply add the ampersand symbol (&)
after the command.
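The effect of the ampersand can be seen in a short script. The shell starts the job in the background, immediately continues with the next command, and records the background job's PID in the special parameter $!. The wait built-in is used here only so that the script does not exit before the job finishes. This is a POSIX sh sketch; the function name background_demo and the one-second sleep are arbitrary.

```shell
#!/bin/sh
# Sketch: a job started with & does not block the shell.
background_demo() {
  sleep 1 &                                 # start the job in the background
  job_pid=$!                                # $! holds the PID of the latest background job
  echo "background job started (PID $job_pid)"
  echo "shell continues with other work"    # runs while sleep is still going
  wait "$job_pid"                           # block only when we choose to
  echo "background job finished"
}

background_demo
```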
Syntax bg [%job]
$ bg %1
[1] sort letter.txt > better.txt &
(d) If we do not want to sort the file letter.txt and wish to terminate the background job,
we kill the job by specifying its job number by using the following command.
$ kill %1
[1] + Terminated sort letter.txt > better.txt &
We can see that all the three commands—stop, bg, and kill—display the program name on
the right.
The options of the jobs command are briefly described in Table 6.5.
Table 6.5 Options of the jobs command
Option | Description
-l | Displays the process ID along with the job ID for each job
-p | Displays only the process ID for each job, without the job ID
%job_id | Represents the identification number of the job whose status we wish to find out
%str | Represents the job whose command begins with the string, str
%?str | Represents the job whose command contains the string, str
%% | Represents the current job
%+ | Represents the current job (same as %%)
%- | Represents the previous job
All the jobs that are running in the background or are stopped will be displayed. The output of the jobs command displays the job number, currency flag, and the status of the job.
Examples
(a) $ jobs
[3] + Stopped(SIGTSTP) sort letter.txt > better.txt &
[2] - Running cat abc.txt | lp
[1]   Running chirag1.sh&
In this listing, we see that job 3 has a plus (+) and job 2 has a minus (−) in the second column. These + and − signs are known as the currency flags. The plus sign (+) indicates the default job.
The default job is the job that will be considered when any of the commands, namely stop, bg, fg, and kill, is given without specifying the job number. For example, if we issue kill %% instead of giving the job number of the job that we want to kill, job number 3 will be killed as it is the default job. The minus sign currency flag indicates the job that will become the default job next. In other words, when the current default job is terminated or is complete, the job with the minus sign becomes the default job, that is, its currency flag changes from − to +.
When any job is suspended (by pressing Ctrl-z), it automatically becomes the default job and is assigned the + currency flag. When another job is also suspended, that one becomes the default job (getting the + currency flag) and the earlier suspended job gets the − currency flag, and so on.
(b) To display the process ID along with the job ID, use the -l option in the following way.
$ jobs -l
[3] + 30178 Stopped(SIGTSTP) sort letter.txt > better.txt &
[2] - 30189 Running cat abc.txt | lp
[1]   30190 Running chirag1.sh&
(c) To display the status of the job with ID 2, we give the following command.
$ jobs %2
[2] - Running cat abc.txt | lp
(d) To display the status of the job that contains the lp command, we give the following command.
$ jobs %?lp
[2] - Running cat abc.txt | lp
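The current (+) and previous (−) jobs can also be selected with the %+ and %- specifiers listed in Table 6.5. The sketch below starts two background jobs and checks which PID each specifier selects; the most recently started job should become the current job. It assumes a shell such as bash that keeps a job table in scripts. The function name currency_demo is hypothetical, and plain PIDs are used for the final kill so that the cleanup does not depend on job control being enabled.

```shell
#!/bin/sh
# Sketch: the current (+) and previous (-) jobs, selected with %+ and %-.
currency_demo() {
  tmp=$(mktemp)
  sleep 30 &                    # job 1
  first=$!
  sleep 30 &                    # job 2: started last, so it becomes the current job
  second=$!
  jobs                          # lists both jobs; job 2 carries +, job 1 carries -
  jobs -p %+ > "$tmp"           # PID of the current job
  read -r current < "$tmp"
  jobs -p %- > "$tmp"           # PID of the previous job
  read -r previous < "$tmp"
  [ "$current" = "$second" ] && echo "%+ selects job 2"
  [ "$previous" = "$first" ] && echo "%- selects job 1"
  kill "$first" "$second"       # clean up both background jobs
  wait 2>/dev/null
  rm -f "$tmp"
}

currency_demo
```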
Note: Process synchronization—When more than one process runs simultaneously, it is quite possible that
they try to access and modify the same content (of a file or its region) simultaneously. This situation may
result in inconsistency and ambiguity, that is, modifications made by one process may be lost or overwritten
by the modifications performed by another. Synchronization among the processes is implemented to maintain
consistency and avoid ambiguity. Process synchronization sets up a mechanism where only one process is
able to modify the content and other processes that wish to modify the same content are compelled to wait until
the first process is complete. Enabling only a single process to modify the content ensures the integrity of the
content. We will discuss process synchronization through semaphore in detail in Chapter 14.
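Mutual exclusion between processes can be observed from the shell by using mkdir as a lock, because creating a directory either succeeds or fails as a single atomic operation. This is only an illustrative technique and not the semaphore mechanism mentioned above. The sketch assumes a POSIX shell with mktemp, and the names increment_counter and counter_demo are hypothetical. Three background processes each increment a shared counter file 50 times. Each read-modify-write happens while the lock is held, so no update should be lost.

```shell
#!/bin/sh
# Sketch: mutual exclusion between processes using mkdir as an atomic lock.

increment_counter() {          # $1 = counter file, $2 = lock directory
  i=0
  while [ "$i" -lt 50 ]; do
    until mkdir "$2" 2>/dev/null; do :; done   # wait until we own the lock
    n=$(cat "$1")                              # critical section begins
    echo $((n + 1)) > "$1"                     # read-modify-write under the lock
    rmdir "$2"                                 # release the lock
    i=$((i + 1))
  done
}

counter_demo() {
  dir=$(mktemp -d)
  echo 0 > "$dir/counter"
  # Three processes update the same file concurrently.
  increment_counter "$dir/counter" "$dir/lock" &
  increment_counter "$dir/counter" "$dir/lock" &
  increment_counter "$dir/counter" "$dir/lock" &
  wait
  echo "final count: $(cat "$dir/counter")"    # 3 x 50 updates, none lost
  rm -rf "$dir"
}

counter_demo
```

Without the mkdir/rmdir pair, two processes could read the same value and one increment would overwrite the other, giving a final count below 150.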
use a comma (,). In this example we have used a comma to specify both the 10th and 20th
day of every month.
Each field in a.bat is separated by either a space or a tab. The first day of the week,
Sunday, is represented by 0.
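The following POSIX sh sketch makes the field order concrete. It labels the five time fields of a single crontab entry, followed by the command. The five fields are minute, hour, day of month, month, and day of week, in which Sunday is 0. The helper describe_cron_line and the script path /home/ravi/backup.sh are hypothetical. The helper only splits the line on blanks (spaces or tabs) and does not check that the values are valid.

```shell
#!/bin/sh
# Hypothetical helper: label the five time fields of a crontab entry.
describe_cron_line() {
  set -f                      # stop * from being expanded as a filename pattern
  set -- $1                   # split the entry on spaces or tabs
  set +f
  echo "minute=$1 hour=$2 day-of-month=$3 month=$4 day-of-week=$5"
  shift 5
  echo "command=$*"           # everything after the fifth field is the command
}

describe_cron_line '30 10 10,20 * * /home/ravi/backup.sh'
```

For this entry, the output should show day-of-month=10,20, that is, the command runs at 10:30 a.m. on the 10th and 20th of every month.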
Suppose we execute the crontab command as follows:
$ crontab a.bat
The contents of a.bat are automatically transferred to the /usr/spool/cron/crontabs directory, where they are stored in a file with our login name. From then on, the cron daemon reads this file (the crontab file) and regularly executes the commands (processes) specified in it.
To change the scheduling of the processes, we edit our local file a.bat (in our home directory) and save the changes. We then execute the crontab command again to re-transfer the file to the /usr/spool/cron/crontabs directory under our login name. The earlier crontab file is replaced by the new one.
To view the commands that we have supplied to our crontab file, we use the -l option with the crontab command:
$ crontab -l
To remove the crontab file, we use the following command:
$ crontab -r
Another command that allows the scheduling of processes is the at command. We will now
study this.
We can also specify a future time by adding a plus sign (+) followed by the minute, hours,
days, weeks, months, or years.
Examples
(a) $ at 18:00
echo "Office time over. Time to log out" > /dev/tty02
Ctrl-d
Job 3434443.a at Sun Nov 16 18:00:00 IST 2012
On pressing Ctrl-d, the at command displays the job number and the date and time of the scheduled execution of the echo command. The job number ends with .a, indicating that this job has been submitted using the at command.
Now, the following message will be echoed on the tty02 terminal at 6:00 p.m.
Office time over. Time to log out.
Note: When the output is redirected to a terminal, as is done in the aforementioned command (/dev/tty02), the message is echoed on that terminal's screen. When redirection is not specified, the output is sent to the user by mail.
(b) We can also execute the commands stored in a file as shown in the following example.
$ at 18:00
jobstodo.sh
Ctrl-d
Job 3434443.a at Sun Nov 16 18:00:00 IST 2010
By executing this command, all the commands stored in the script file jobstodo.sh will
be executed at 6:00 p.m. and their outputs will be mailed to us. You may recall that if the
redirection is not specified for any command, its output is sent to the user through mail.
(c) We can also add am or pm to the time. For example, in the aforementioned command, we can write $ at 18:00 as $ at 6pm.
On executing this at command, we will see a message on our screen displaying ‘you have mail’ at 6 p.m.
(d) To view the output of the aforementioned command, we use the following mail command.
$ mail
message 1:
To: ravi
Note: The commands specified in jobs.txt will still run even if we exit from the system.
(f) To view the list of jobs submitted using the at command, we give the following
command.
$ at -l
(g) To remove scheduled jobs from the job queue, we use the following command.
$ at -r 3434443
This command will remove job 3434443 from the job queue.
(h) We can use a lot of keywords when specifying the time for scheduling jobs such as now, today,
tomorrow, noon, day, year, month, hours, and minutes. The following are some examples.
(i) $ at now + 2 hours
(ii) $ at now +1 week
(iii) $ at 6pm today
(iv) $ at 6pm next month
(v) $ at 6pm Fri
(vi) $ at 0915 am Nov 16
(vii) $ at 9:15 am Nov 16
The two commands that are often discussed along with the at command are atq and atrm.
atq This command lists the jobs that are scheduled to run, similar to the at -l command.
The jobs are displayed along with their job number, date, hour, etc.
Syntax atq
Example
$ atq
324556 2012-10-15 10:30 a sort a.txt
324557 2012-10-16 07:00 a date
atrm This command deletes the specified job number, similar to the at -r command.
Syntax atrm job_no
Example
$ atrm 3434443
This command will remove job 3434443 from the job queue.
Note: The difference between the at and crontab commands is that a job scheduled by the at command runs only once. To execute it again, we have to reschedule it. On the other hand, crontab carries out the submitted jobs repeatedly at the scheduled times, for as long as the crontab file exists, without the need for rescheduling.