
Chapter I

Introduction to Operating Systems


It is a common occurrence to find users who are not even aware of what operating system they are
running. On occasion, you may also find an administrator who knows the name of the operating
system, but nothing about the inner workings of it. In many cases, they have no time as they are
often clerical workers or other personnel who were reluctantly appointed to be the system
administrator.
Being able to run or work on a Linux system does not mean you must understand the intricate
details of how it functions internally. However, there are some operating system concepts that will
help you to interact better with the system. They will also serve as the foundation for many of the
issues we're going to cover in this section.
In this chapter we are going to go through the basic composition of an operating system. First, we'll
talk about what an operating system is and why it is important. We are also going to address how
the different components work independently and together.
My goal is not to make you an expert on operating system concepts. Instead, I want to provide you
with a starting point from which we can go on to other topics. If you want to go into more detail
about operating systems, I would suggest Modern Operating Systems by Andrew Tanenbaum,
published by Prentice Hall, and Operating System Concepts by Silberschatz, Peterson, and Galvin,
published by Addison-Wesley. Another is Inside Linux by Randolph Bentson, which gives you a
quick introduction to operating system concepts from the perspective of Linux.

What Is an Operating System


In simple terms, an operating system is a manager. It manages all the available resources on a
computer. These resources can be the hard disk, a printer, or the monitor screen. Even memory is a
resource that needs to be managed. Within an operating system are the management functions that
determine who gets to read data from the hard disk, what file is going to be printed next, what
characters appear on the screen, and how much memory a certain program gets.
Once upon a time, there was no such thing as an operating system. The computers of forty years ago
ran one program at a time. The computer programmer would load the program he (they were almost
universally male at that time) had written and run it. If there was a mistake that caused the program
to stop sooner than expected, the programmer had to start over. Because there were many other
people waiting for their turn to try their programs, it may have been several days before the first
programmer got a chance to run his deck of cards through the machine again. Even if the program
did run correctly, the programmer probably never got to work on the machine directly. The program
(punched cards) was fed into the computer by an operator who then passed the printed output back
to the programmer several hours later.
As technology advanced, many such programs, or jobs, were all loaded onto a single tape. This tape
was then loaded and manipulated by another program, which was the ancestor of today's operating
systems. This program would monitor the behavior of the running program and if it misbehaved
(crashed), the monitor could then immediately load and run another. Such programs were called
(logically) monitors.
In the 1960's, technology and operating system theory advanced to the point that many different
programs could be held in memory at once. This was the concept of "multiprogramming." If one
program needed to wait for some external event such as the tape to rewind to the right spot, another
program could have access to the CPU. This improved performance dramatically and allowed the
CPU to be busy almost 100 percent of the time.
By the end of the 1960's, something wonderful happened: UNIX was born. It began as a one-man
project designed by Ken Thompson of Bell Labs and has grown to become the most widely used
operating system. In the time since UNIX was first developed, it has gone through many different
generations and even mutations. Some differ substantially from the original version, like BSD
(Berkeley Software Distribution) UNIX or Linux. Others still contain major portions that are based
on the original source code. (A friend of mine described UNIX as the only operating system where
you can throw the manual onto the keyboard and get a real command.)
Linux is an operating system like many others, such as DOS, VMS, OS/360, or CP/M. It performs
many of the same tasks in very similar manners. It is the manager and administrator of all the
system resources and facilities. Without it, nothing works. Despite this, most users can go on
indefinitely without knowing even which operating system they are using, let alone the basics of
how the operating system works.
For example, if you own a car, you don't really need to know the details of the internal combustion
engine to understand that this is what makes the car move forward. You don't need to know the
principles of hydraulics to understand what isn't happening when pressing the brake pedal has no
effect.
An operating system is like that. You can work productively for years without even knowing what
operating system you're running on, let alone how it works. Sometimes things go wrong. In many
companies, you are given a number to call when problems arise, you report what happened, and it is
dealt with.
If the computer is not back up within a few minutes, you get upset and call back, demanding to
know when "that darned thing will be up and running again." When the technician (or whoever has
to deal with the problem) tries to explain what is happening and what is being done to correct the
problem, the response is usually along the lines of, "Well, I need it back up now."
The problem is that many people hear the explanation, but don't understand it. It is common for
people to be unwilling to acknowledge that they didn't understand the answer. Instead, they try to
deflect the other person's attention away from that fact. Had they understood the explanation, they
would be in a better position to understand what the technician is doing and that he/she is actually
working on the problem.
By having a working knowledge of the principles of an operating system you are in a better position
to understand not only the problems that can arise, but also what steps are necessary to find a
solution. There is also the attitude that you have a better relationship with things you understand.
Like in a car, if you see steam pouring out from under the hood, you know that you need to add
water. This also applies to the operating system.
In this section, I am going to discuss what goes into an operating system, what it does, how it does
it, and how you, the user, are affected by all this.
Because of advances in both hardware design and performance, computers are able to process
increasingly larger amounts of information. The speed at which computer transactions occur is often
talked about in terms of billionths of a second. Because of this speed, today's computers can give
the appearance of doing many things simultaneously by actually switching back and forth between
each task extremely fast. This is the concept of multitasking. That is, the computer is working on
multiple tasks "at the same time."
Another function of the operating system is to keep track of what each program is doing. That is,
the operating system needs to keep track of whose program, or task, is currently writing its file to
the printer or which program needs to read a certain spot on the hard disk, etc. This is the concept of
multi-users, as multiple users have access to the same resources.
In subsequent sections, I will be referring to UNIX as an abstract entity. The concepts we will be
discussing are the same for Linux and any other dialect. When necessary, I will specifically
reference where Linux differs.

Processes
One basic concept of an operating system is the process. If we think of the program as the file
stored on the hard disk or floppy and the process as that program in memory, we can better
understand the difference between a program and a process. Although these two terms are often
interchanged or even misused in "casual" conversation, the difference is very important for issues
that we talk about later. A process is often referred to as an instance of a command or program.
A process is more than just a program. Especially in a multi-user, multi-tasking operating system
such as UNIX, there is much more to consider. Each program has a set of data that it uses to do
what it needs. Often, this data is not part of the program. For example, if you are using a text editor,
the file you are editing is not part of the program on disk, but is part of the process in memory. If
someone else were to be using the same editor, both of you would be using the same program.
However, each of you would have a different process in memory. See the figure below to see how
this looks graphically.

Under UNIX, many different users can be on the system at the same time. In other words, they have
processes that are in memory all at the same time. The system needs to keep track of what user is
running what process, which terminal the process is running on, and what other resources the
process has (such as open files). All of this is part of the process.
With the exception of the init process (PID 1) every process is the child of another process. In
general, every process has the potential to be the parent of another process. Perhaps the program is
coded in such a way that it will never start another process. However, this is a limitation of that
program and not of the operating system.
When you log onto a UNIX system, you usually get access to a command line interpreter, or shell.
This takes your input and runs programs for you. If you are familiar with DOS, you already have
used a command line interpreter: the COMMAND.COM program. Under DOS, your shell gives
you the C:> prompt (or something similar). Under UNIX, the prompt is usually something like $, #,
or %. This shell is a process and it belongs to you. That is, the in-memory (or in-core) copy of the
shell program belongs to you.
If you were to start up an editor, your file would be loaded and you could edit your file. The
interesting thing is that the shell has not gone away. It is still in memory. Unlike what operating
systems like DOS do with some programs, the shell remains in memory. The editor is simply
another process that belongs to you. Because it was started by the shell, the editor is considered a
"child" process of the shell. The shell is the parent process of the editor. (A process has only one
parent, but may have many children.)
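You can see this parent-child relationship for yourself. The ps command with the -f option shows, among other things, the process ID (PID) and the parent process ID (PPID) of each of your processes. On my system, a session like the one just described might look something like this (your PIDs, user name and terminal will certainly differ):
$ ps -f
UID        PID  PPID  C STIME TTY          TIME CMD
jimmo     1234  1230  0 09:15 pts/0    00:00:00 bash
jimmo     1250  1234  0 09:17 pts/0    00:00:00 vi letter.txt
jimmo     1251  1234  0 09:18 pts/0    00:00:00 ps -f
Note that the PPID of both vi and ps is the PID of the shell (bash) that started them; the shell is their parent.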
As you continue to edit, you delete words, insert new lines, sort your text and write it out
occasionally to the disk. During this time, the backup is continuing. Someone else on the system
may be adding figures to a spreadsheet, while a fourth person may be inputting orders into a
database. No one seems to notice that there are other people on the system. For them, it appears as
though the processor is working for them alone.
We see another example in the next figure. When you log in, you normally have a single process,
which is your login shell (bash). If you start the X Windowing System, your shell starts another
process, xinit. At this point, both your shell and xinit are running, but the shell is waiting for xinit to
complete. Once X starts, you may want a terminal in which you can enter commands, so you start
xterm.

From the xterm, you might then start the ps command, to see what other processes are running. In
addition, you might have something like I do, where a clock is automatically started when X starts.
At this point, your process tree might look like the figure above.
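If your system has the pstree command installed, it will draw this tree for you. For the session just described, the output might look something along these lines (the exact programs and PIDs will differ from one system to the next, and some intermediate processes are left out here):
$ pstree -p
init(1)---bash(212)---xinit(234)-+-X(235)
                                 |-xclock(267)
                                 `-xterm(287)---bash(288)---pstree(301)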
The nice thing about UNIX is that while the administrator is backing up the system, you could be
continuing to edit your file. This is because UNIX knows how to take advantage of the hardware to
have more than one process in memory at a time. (Note: It is not a good idea to do a backup with
people on the system as data may become inconsistent. This was only used as an illustration.)
As I write this sentence, the operating system needs to know whether the characters I press are part
of the text or commands I want to pass to the editor. Each key that I press needs to be interpreted.
Despite the fact that I can clip along at about thirty words per minute, the Central Processing
Unit (CPU) is spending approximately 99 percent of its time doing nothing.
The reason for this is that for a computer, the time between successive keystrokes is an eternity.
Let's take my Intel Pentium running at a clock speed of 1.7 GHz as an example. The clock speed of
1.7 GHz means that there are 1.7 billion(!) clock cycles per second. Because the Pentium gets close
to one instruction per clock cycle, this means that within one second, the CPU can get close to
executing 1.7 billion instructions! No wonder it is spending most of its time idle. (Note: This is an
oversimplification of what is going on.)
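You can watch this idleness yourself with a tool such as top or vmstat. For example, running top in batch mode on my system reports something like the following (the exact wording of this line varies between versions of top):
$ top -b -n 1 | grep -i "cpu"
CPU states:  0.7% user,  0.5% system,  0.0% nice, 98.8% idle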
A single computer instruction doesn't really do much. However, being able to do 1.7 billion little
things in one second allows the CPU to give the user an impression of being the only one on the
system. It is simply switching between the different processes so fast that no one is aware of it.
Each user, that is, each process, gets complete access to the CPU for an incredibly short period of
time. This period of time (referred to as a time slice) is typically 1/100th of a second. That means
that at the end of that 1/100th of a second, it's someone else's turn and the current process is forced
to give up the CPU. (In reality, it is much more complicated than this. We'll get into more details
later.)
Compare this to an operating system like standard Windows (not Windows NT/2000). The program
will hang onto the CPU until it decides to give it up. An ill-behaved program can hold onto the CPU
forever. This is the cause of a system hanging because nothing, not even the operating system itself,
can gain control of the CPU. Linux uses the concept of pre-emptive multi-tasking. Here, the system
can pre-empt a running process to let another have a turn. Older versions of Windows use co-
operative multi-tasking. This means the process must be "cooperative" and give up control of the
CPU.
Depending on the load of the system (how busy it is), a process may get several time slices per
second. However, after it has run for its time slice, the operating system checks to see if some other process
needs a turn. If so, that process gets to run for a time slice and then it's someone else's turn: maybe
the first process, maybe a new one.
As your process is running, it will be given full use of the CPU for the entire 1/100th of a second
unless one of three things happens. Your process may need to wait for some event. For example, the
editor I am using to write this in is waiting for me to type in characters. I said that I type about 30
words per minute, so if we assume an average of six letters per word, that's 180 characters per
minute, or three characters per second. That means that on average, a character is pressed once
every 1/3 of a second. Because a time slice is 1/100th of a second, more than 30 processes can have a turn on
the CPU between each keystroke! Rather than tying everything up, the program waits until the next
key is pressed. It puts itself to sleep until it is awoken by some external event, such as the press of a
key. Compare this to a "busy loop" where the process keeps checking for a key being pressed.
When I want to write to the disk to save my file, it may appear that it happens instantaneously, but
like the "complete-use-of-the-CPU myth," this is only appearance. The system will gather requests
to write to or read from the disk and do it in chunks. This is much more efficient than satisfying
everyone's request when they ask for it.
Gathering up requests and accessing the disk all at once has another advantage. Often, the data that
was just written is needed again, for example, in a database application. If the system wrote
everything to the disk immediately, you would have to perform another read to get back that same
data. Instead, the system holds that data in a special buffer; in other words, it "caches" that data in
the buffer. This is called the buffer cache.
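You can get a rough idea of how much memory your system is devoting to the buffer cache with the free command. The numbers below are just an example from a machine with 384 MB of RAM; look at the "buffers" and "cached" columns:
$ free -m
             total       used       free     shared    buffers     cached
Mem:           384        370         14          0         42        180
-/+ buffers/cache:        148        236
Swap:          258          3        255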
If a file is being written to or read from, the system first checks the buffer cache. If on a read it finds what
it's looking for in the buffer cache, it has just saved itself a trip to the disk. Because the buffer cache
is in memory, it is substantially faster to read from memory than from the disk. Writes are normally
written to the buffer cache, which is then written out in larger chunks. If the data being written already
exists in the buffer cache, it is overwritten. The flow of things might look like this:

When your process is running and you make a request to read from the hard disk, you typically
cannot do anything until that disk access has completed. If you haven't completed your
time slice yet, it would be a waste not to let someone else have a turn. That's exactly what the
system does. If you decide you need access to some resource that the system cannot immediately
give to you, you are "put to sleep" to wait. It is said that you are put to sleep waiting on an event,
the event being the disk access. This is the second case in which you may not get your full time on
the CPU.
The third way that you might not get your full time slice is also the result of an external event. If a device
(such as a keyboard, the clock, hard disk, etc.) needs to communicate with the operating system, it
signals this need through the use of an interrupt. When an interrupt is generated, the CPU itself will
stop execution of the process and immediately start executing a routine in the operating system to
handle interrupts. Once the operating system has satisfied this interrupt, it returns to its regularly
scheduled process. (Note: Things are much more complicated than that. The "priority" of both the
interrupt and process are factors here. We will go into more detail in the section on the CPU.)
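On Linux, you can actually watch the interrupt counts grow. The file /proc/interrupts (more on the /proc filesystem later) contains a running count for each interrupt line. A shortened example might look like this; the devices and numbers on your system will be different:
$ cat /proc/interrupts
           CPU0
  0:    5093721          XT-PIC  timer
  1:      12428          XT-PIC  keyboard
 14:      88371          XT-PIC  ide0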
As I mentioned earlier, there are certain things that the operating system keeps track of as a process
is running. The information the operating system is keeping track of is referred to as the process
context. This might be the terminal you are running on or what files you have open. The context
even includes the internal state of the CPU, that is, what the content of each register is.
What happens when a process's time slice has run out or for some other reason another process gets to run? If
things go right (and they usually do), eventually that process gets a turn again. However, to do
things right, the process must be allowed to return to the exact place where it left off. Any difference
could result in disaster.
You may have heard of the classic banking problem concerning deducting from your account. If the
process returned to a place before it made the deduction, you would deduct twice. If the process
hadn't yet made the deduction but started up again at a point after which it would have made the
deduction, it appears as though the deduction was made. Good for you, but not so good for the
bank. Therefore, everything must be put back the way it was.
The processors used by Linux (Intel 80386 and later, as well as the DEC Alpha, and SPARC) have
built-in capabilities to manage both multiple users and multiple tasks. We will get into the details of
this in later chapters. For now, just be aware of the fact that the CPU assists the operating system in
managing users and processes. This shows how multiple processes might look in memory:

In addition to user processes, such as shells, text editors, and databases, there are system processes
running. These are processes that were started by the system. Several of these deal with managing
memory and scheduling turns on the CPU. Others deal with delivering mail, printing, and other
tasks that we take for granted. In principle, both of these kinds of processes are identical. However,
system processes can run at much higher priorities and therefore run more often than user processes.
Typically, a system process of this kind is referred to as a daemon process, or simply a daemon, because it runs
behind the scenes (i.e., in the background) without user intervention. It is also possible for a user to
put one of his or her processes in the background. This is done by using the ampersand (&)
metacharacter at the end of the command line. (I'll talk more about metacharacters in the section on
shells .)
What normally happens when you enter a command is that the shell will wait for that command to
finish before it accepts a new command. By putting a command in the background, the shell does
not wait, but rather is ready immediately for the next command. If you wanted, you could put the
next command in the background as well.
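As a quick illustration, the sleep command does nothing but wait the given number of seconds, so it is a harmless way to try this out. The job number and PID you see will, of course, be different:
$ sleep 300 &        # start the command in the background
[1] 1422             # the shell reports the job number and the PID
$ jobs               # list this shell's background jobs
[1]+  Running        sleep 300 &
$                    # and the prompt is back immediately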
I have talked to customers who have complained about their systems grinding to a halt after they put
dozens of processes in the background. The misconception is that because they didn't see the
process running, it must not be taking up any resources. (Out of sight, out of mind.) The issue here
is that even though the process is running in the background and you can't see it, it still behaves like
any other process.

Virtual Memory Basics


One interesting aspect about modern operating systems is the fact that they can run programs that
require more memory than the system actually has. Like the Tardis in Dr. Who, Linux memory is
much bigger on the inside than on the outside.
At the extreme end, this means that if your CPU is 32-bit (meaning that it has registers that are 32 bits wide),
you can address up to 2^32 bytes (that's 4,294,967,296, or roughly 4 billion). That means you would need 4 GB
of main memory (RAM) in order to completely take advantage of this. Although many systems are
currently available (2003) with 256 MB or even 512 MB, more RAM than that is rare; and 4 GB is
extremely rare for a home PC.
The interesting thing is that when you sum the memory requirements of the programs you are
running, you often reach far beyond the physical memory you have. Currently my system appears to
need about 570 MB, although my machine only has 384 MB. Surprisingly enough, I don't notice any
performance problems. So, how is this possible?
Well, Linux, Unix and many other operating systems take advantage of the fact that most programs
don't use all of the memory that they "require", as you typically do not use every part of the
program at once. For example, you might be using a word processor, but not currently using the
spell checking feature, or printing function, so there is no need to keep these in memory at the same
time. Also, while you are using your word processor, your email program is probably sitting around
doing nothing.
From the user's perspective the email program (or parts of the word processor) are loaded into
memory. However, the system only loads what it needs. In some cases, they might all be in memory
at once. However, if you load enough programs, you eventually reach a point where you have more
programs than you have memory. To solve this problem, Linux uses something called "virtual
memory". It's virtual because it can use more than you actually have. In fact, with virtual memory
you can use the whole 2^32 bytes. Basically, what this means is that you can run more programs at
once without the need for buying more memory.
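If you are curious where a figure like the 570 MB above comes from, you can make a similar rough estimate on your own system. The ps command can report the virtual size (in kilobytes) of every process, and awk can add the numbers up. The total overstates things somewhat, because memory shared between processes gets counted more than once, but it makes the point:
$ ps -eo vsz= | awk '{ total += $1 } END { print total/1024, "MB requested" }'
570.4 MB requested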
When a program starts, Linux does not load all of it, just the portion it takes to get started. One
aspect of virtual memory is keeping parts of the program that are not needed on the hard disk. As
the process runs, when it finds it needs other parts of the program, it goes and gets them. Those
parts that are never needed are never loaded and the system does not use all of the memory it
appears to "require".
If you have more data than physical memory, the system might store it temporarily on the hard disk
should it not be needed at the moment. The process of moving data to and from the hard disk like
this is called swapping, as the data is "swapped" in and out. Typically, when you install the system, you
define a specific partition as the swap partition, or swap "space". However, Linux can also swap to a
physical file, although with older Linux versions this is much slower than a special partition. An old
rule of thumb is that you should have at least as much swap space as you do physical RAM; this ensures
that all of the data can be swapped out, if necessary. You will also find that some texts say that you
should have at least twice as much swap as physical RAM. We go into details on swap in the section on
installing and upgrading.
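You can see what swap space a system has, and how much of it is currently in use, with a couple of simple commands. The sizes here are only an example:
$ swapon -s                  # or: cat /proc/swaps
Filename        Type        Size      Used    Priority
/dev/hda2       partition   265032    3120    -1
$ free -m | grep Swap
Swap:          258          3        255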
In order to do all this, the system needs to manage your memory. This function is logically called
"memory management" and is one of the core aspects of any modern operating system. Although
the details are different from one operating system to the next, the basic principles apply, even
between different types.
In other sections of the tutorial, we will talk about the details of memory management from both the
perspective of the CPU and the operating system kernel.

Files and Directories


Another key aspect of any operating system is the concept of a file. A file is nothing more than a
related set of bytes on disk or other media. These bytes are labeled with a name, which is then used
as a means of referring to that set of bytes. In most cases, it is through the name that the operating
system is able to track down the file's exact location on the disk.
There are three kinds of files with which most people are familiar: programs, text files, and data
files. However, on a UNIX system, there are other kinds of files. One of the most common is the
device file, also referred to as a device node. Under UNIX, every device is
treated as a file. Access is gained to the hardware by the operating system through the device files.
These tell the system what specific device driver needs to be used to access the hardware.
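A long listing of a couple of entries in /dev shows how device files differ from ordinary files. The first character of the listing is a b (block device) or c (character device), and in place of a file size there are two numbers, the major and minor device numbers, which tell the kernel which driver to use. Your device names and dates will differ:
$ ls -l /dev/hda /dev/tty1
brw-rw----   1 root    disk    3,   0 Mar 23  2002 /dev/hda
crw--w----   1 root    tty     4,   1 Mar 23  2002 /dev/tty1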
Another kind of file is a pipe. Like a real pipe, stuff goes in one end and out the other. Some are
named pipes. That is, they have a name and are located permanently on the hard disk. Others are
temporary and are unnamed pipes. Although these do not exist once the process using them has
ended, they do take up physical space on the hard disk. We'll talk more about pipes later.
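Named pipes are easy to experiment with. The mkfifo command creates one, and whatever one process writes into it, another can read out. A minimal example, using two separate terminals, might look like this:
$ mkfifo /tmp/mypipe             # create a named pipe
$ ls -l /tmp/mypipe              # the leading "p" marks it as a pipe
prw-r--r--   1 jimmo   users   0 Jun  1 10:15 /tmp/mypipe
$ echo hello > /tmp/mypipe       # terminal 1: blocks until someone reads
$ cat /tmp/mypipe                # terminal 2: prints "hello" and both commands finish
hello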
Unlike operating systems like DOS, there is no pattern for file names that is expected or followed.
DOS will not even attempt to execute programs that do not end with .EXE, .COM, or .BAT. UNIX,
on the other hand, is just as happy to execute a program called program as it is a program called
program.txt. In fact, you can use any character in a file name except for "/" and NULL.
However, completely random things can happen if the operating system tries to execute a text file as
if it were a binary program. To prevent this, UNIX has two mechanisms to ensure that text does not
get randomly executed. The first is the file's permission bits. The permission bits determine who can
read, write, and execute a particular file. You can see the permissions of a file by doing a long
listing of that file. What the permissions are all about, we get into a little later. The second is that the
system must recognize a magic number within the program indicating that it is a binary executable.
To see what kinds of files the system recognizes, take a look in /etc/magic. This file contains a list
of file types and information that the system uses to determine a file's type.
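The file command uses this information to report what kind of file it is looking at. For example (the exact wording of the output varies from version to version):
$ file /bin/ls /etc/passwd
/bin/ls:     ELF 32-bit LSB executable, Intel 80386, dynamically linked
/etc/passwd: ASCII text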
Even if a file was set to allow you to execute it, the beginning portion of the file must contain the
right information to tell the operating system how to start this program. If that information is
missing, it will attempt to start it as a shell script (similar to a DOS batch file). If the lines in the file
do not belong to a shell script and you try to execute the program, you end up with a screen full of
errors.
What you name your file is up to you. You are not limited by the eight-letter name and three-letter
extension as you are in DOS. You can still use periods as separators, but that's all they are. They do
not have the same "special" meaning that they do under DOS. For example, you could have files
called
letter.txt
letter.text
letter_txt
letter_to_jim
letter.to.jim
Only the first file example is valid under DOS, but all are valid under Linux. Note that even in older
versions of UNIX where you were limited to 14 characters in a file name, all of these are still valid.
With Linux, I have been able to create file names that are 255 characters long. However, such long
file names are not easy to work with. Note that if you are running either Windows NT or Windows
95, you can create file names that are basically the same as with Linux.
Also keep in mind that although you can create file names with spaces in them, it can cause
problems. Spaces are used to separate the different components on the command line. You can tell
your shell to treat a name with spaces as a single unit by including it in quotes. However, you need
to be careful. Typically, I simply use an underline (_) when the file name ought to have a space. It
almost looks the same and I don't run into problems.
One naming convention does have special meaning in Linux: "dot" files. In these files, the first
character is a "." (dot). If you have such a file, it will by default be invisible to you. That is, when
you do a listing of a directory containing a "dot" file, you won't see it.
However, unlike the DOS/Windows concept of "hidden" files, "dot" files can be seen by simply
using the -a (all) option to ls, as in ls -a. (ls is a command used to list the contents of directories.)
With DOS/Windows the "dir" command can show you hidden files and directories, but has no
option to show these along with the others.
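For example, in a home directory you might see something like this; without the -a option, none of the entries beginning with a dot would show up at all:
$ ls
letter.txt  letters
$ ls -a
.  ..  .bashrc  .exrc  letter.txt  letters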
The ability to group your files together into some kind of organizational structure is very helpful.
Instead of having to wade through thousands of files on your hard disk to find the one you want,
Linux, along with other operating systems, enables you to group the files into a directory. Under Linux, a
directory is actually nothing more than a file itself with a special format.
It contains the names of the files associated with it and some pointers or other information to tell
the system where the data for the file actually reside on the hard disk.
Directories do not actually "contain" the files that are associated with them. Physically (that is, how
they exist on the disk), directories are just files in a certain format. The structure is imposed on them
by the program you use, such as ls.
The directories have information that points to where the real files are. In comparison, you might
consider a phone book. A phone book does not contain the people listed in it, just their names and
telephone numbers. A directory has the same information: the names of files and their numbers. In this case,
instead of a telephone number, there is an information node number, or inode number.
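You can ask ls to show you these "telephone numbers" with the -i option. The inode number appears in front of each name (the numbers themselves mean nothing to a person, but they are what the system uses to find the file):
$ ls -i
 14518 letter_to_jim    14532 chris.txt    14561 letter.txt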
The logical structure in a telephone book is that names are grouped alphabetically. It is very
common for two entries (names) that appear next to each other in the phone book to be in different
parts of the city. Just like names in the phone book, names that are next to each other in a directory may be in
distant parts of the hard disk.
As I mentioned, directories are logical groupings of files. In fact, directories are nothing more than
files that have a particular structure imposed on them. It is common to say that the directory "contains" those
files or the file is "in" a particular directory. In a sense, this is true. The file that is the directory
"contains" the name of the file. However, this is the only connection between the directory and file,
but we will continue to use this terminology. You can find more details about this in the section on
files and file systems.
One kind of file is a directory. What this kind of file can contain are files and more directories. These, in
turn, can contain still more files and directories. The result is a hierarchical tree structure of
directories, files, more directories, and more files. Directories that contain other directories are
referred to as the parent directory of the child or subdirectory that they contain. (Most references I
have seen refer only to parent and subdirectories. Rarely have I seen references to child directories.)
When referring to directories under UNIX, there is often either a leading or trailing slash ("/"), and
sometimes both. The top of the directory tree is referred to with a single "/" and is called the "root"
directory. Subdirectories are referred to by this slash followed by their name, such as /bin or /dev.
As you proceed down the directory tree, each subsequent directory is separated by a slash. The
concatenation of slashes and directory names is referred to as a path. Several levels down, you
might end up with a path such as /home/jimmo/letters/personal/chris.txt, where chris.txt is the actual
file and /home/jimmo/letters/personal is all of the directories leading to that file. The directory
/home contains the subdirectory jimmo, which contains the subdirectory letters, which contains the
subdirectory personal. This directory contains the file chris.txt.
Movement up and down the tree is accomplished by the means of the cd (change directory)
command, which is part of your shell. Although this is often difficult to grasp at first, you are not
actually moving anywhere. One of the things that the operating system keeps track of within the
context of each process is the process's current directory, also referred to as the current working
directory. This is merely the name of a directory on the system. Your process has no physical
contact with this directory; it is just keeping the directory name in memory.
When you change directories, this portion of the process memory is changed to reflect your new
"location." You can "move" up and down the tree or make jumps to completely unrelated parts of
the directory tree. However, all that really happens is that the current working directory portion of
your process gets changed.
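The pwd (print working directory) command shows you what the system currently has stored as your working directory, so you can watch this "movement" happen:
$ pwd
/home/jimmo
$ cd letters/personal       # relative to where we are now
$ pwd
/home/jimmo/letters/personal
$ cd /                      # jump to a completely different part of the tree
$ pwd
/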
Although there can be many files with the same name, each combination of directories and file
name must be unique. This is because the operating system refers to every file on the system by this
unique combination of directories and file name. In the example above, I have a personal letter
called chris.txt. I might also have a business letter by the same name. Its path (or the combination of
directory and file name) would be /home/jimmo/letters/business/chris.txt. Someone else named
John might also have a business letter to Chris. John's path (or combination of path and file name)
might be /home/john/letters/business/chris.txt. This might look something like this:

One thing to note is that John's business letter to Chris may be the exact same file as Jim's. I am not
talking about one being a copy of the other. Rather, I am talking about a situation where both names
point to the same physical locations on the hard disk. Because both files are referencing the same
bits on the disk, they must therefore be the same file.
This is accomplished through the concept of a link. Like a chain link, a file link connects two pieces
together. I mentioned above that the "telephone number" for a file was its inode number. This number actually
points to a special place on the disk called the inode table, with the inode number being the offset
into this table. Each entry in this table not only contains the file's physical location on this disk, but
the owner of the file, the access permissions, and the number of links, as well as many other things.
In the case where the two files are referencing the same entry in the inode table, these are referred to
as hard links. A soft link or symbolic link is where a file is created that contains the path of the other
file. We will get into the details of this later.
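You can create both kinds of links with the ln command and compare the results with ls -li (the -i option shows the inode number). In this sketch, the hard link shares the inode with the original and the link count climbs to 2, while the symbolic link gets an inode of its own and merely stores the path. The inode numbers are, of course, made up:
$ ln chris.txt chris_hard.txt        # hard link
$ ln -s chris.txt chris_soft.txt     # symbolic (soft) link
$ ls -li chris*
14532 -rw-r--r--   2 jimmo  users  2211 Jun  1 10:20 chris.txt
14532 -rw-r--r--   2 jimmo  users  2211 Jun  1 10:20 chris_hard.txt
14598 lrwxrwxrwx   1 jimmo  users     9 Jun  1 10:21 chris_soft.txt -> chris.txt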
An inode does not contain the name of a file. The name is only contained within the directory.
Therefore, it is possible to have multiple directory entries that have the same inode, just as there can
be multiple entries in the phone book, all with the same phone number. We'll get into a lot more
detail about inodes in the section on filesystems. A directory and where the inodes point to on the
hard disk might look like this:
Let's think about the telephone book analogy once again. Although it is not common for an
individual to have multiple listings, there might be two people with the same number. For example,
if you were sharing a house with three of your friends, there might be only one telephone. However,
each of you would have an entry in the phone book. I could get the same phone to ring by dialing
the telephone number of four different people. I could also get to the same inode with four different
file names.
Under Linux, files and directories are grouped into units called filesystems. A filesystem is a portion
of your hard disk that is administered as a single unit. Filesystems exist within a section of the hard
disk called a partition. Each hard disk can be broken down into multiple partitions and the
filesystem is created within the partition. Each partition has specific starting and ending points that are
managed by the system. (Note: Some dialects of UNIX allow multiple filesystems within a
partition.) When you create a filesystem under Linux, this is comparable to formatting the partition
under DOS. The filesystem structure is laid out and a table is created to tell you where the actual
data are located. This table, called the inode table in UNIX, is where almost all the information
related to the file is kept.
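The df command lists the filesystems that are currently mounted, which partition each one lives on, and how full it is. A small system might look something like this:
$ df -h
Filesystem    Size  Used Avail Use% Mounted on
/dev/hda2     3.8G  2.1G  1.5G  59% /
/dev/hda3     7.5G  4.0G  3.1G  57% /home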
In an operating system such as Linux, a file is more than just the basic unit of data. Instead, almost
everything is either treated as a file or is only accessed through files. For example, to read the
contents of a data file, the operating system must access the hard disk. Linux treats the hard disk as
if it were a file. It opens it like a file, reads it like a file, and closes it like a file. The same applies to
other hardware such as tape drives and printers. Even memory is treated as a file. The files used to
access the physical hardware are the device files that I mentioned earlier.
When the operating system wants to access any hardware device, it first opens a file that "points"
toward that device (the device node). Based on information it finds in the inode, the operating
system determines what kind of device it is and can therefore access it in the proper manner. This
includes opening, reading, and closing, just like any other file.
If, for example, you are reading a file from the hard disk, not only do you have the file open that
you are reading, but the operating system has opened the file that relates to the filesystem within the
partition, the partition on the hard disk, and the hard disk itself (more about these later). Three
additional files are opened every time you log in or start a shell. These are the files that relate to
input, output, and error messages.
Normally, when you login, you get to a shell prompt. When you type a command on the keyboard
and press enter, a moment later something comes onto your screen. If you made a mistake or the
program otherwise encountered an error, there will probably be some message on your screen to
that effect. The keyboard where you are typing in your data is the input, referred to as standard input
(standard in or stdin) and that is where input comes from by default. The program displays a
message on your screen, which is the output, referred to as standard output (standard out or stdout).
Although it appears on that same screen, the error message appears on standard error (stderr).
Although stdin and stdout appear to be separate physical devices (keyboard and monitor), there is
only one connection to the system. This is one of those device files I talked about a moment ago.
When you log in, the file (device) is opened for both reading, so you can get data from the
keyboard, and writing, so that output can go to the screen and you can see the error messages.
These three concepts (standard in, standard out, and standard error) may be somewhat difficult to
understand at first. At this point, it suffices to understand that these represent input, output, and
error messages. We'll get into the details a bit later.
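Even without understanding the details, you can see that output and errors really are two separate things by redirecting them to different files. The > symbol redirects standard output and 2> redirects standard error (much more on this in the section on shells):
$ ls /etc/passwd /no/such/file
ls: /no/such/file: No such file or directory
/etc/passwd
$ ls /etc/passwd /no/such/file > out.txt 2> err.txt
$ cat out.txt
/etc/passwd
$ cat err.txt
ls: /no/such/file: No such file or directory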

Operating System Layers


Conceptually, the Linux operating system is similar to an onion. It consists of many layers, one on
top of the other. At the very core is the interface with the hardware. The operating system must
know how to communicate with the hardware or nothing can get done. This is the most privileged
aspect of the operating system.
Because it needs to access the hardware directly, this part of the operating system is the most
powerful as well as the most dangerous. What accesses the hardware is a set of functions within the
operating system itself (the kernel) called device drivers. If it does not behave correctly, a device
driver has the potential of wiping out data on your hard disk or "crashing" your system. Because a
device driver needs to be sure that it has properly completed its task (such as accurately writing or
reading from the hard disk), it cannot quit until it has finished. For this reason, once a driver has
started, very little can get it to stop. We'll talk about what can stop it in the section on the kernel.
Above the device driver level is what is commonly thought of when talking about the operating
system, the management functions. This is where the decision is made about what gets run and
when, what resources are given to what process, and so on.
In our previous discussion on processes, we talked about having several different processes all in
memory at the same time. Each gets a turn to run and may or may not get to use up its time slice. It is at this
level that the operating system determines who gets to run next when your time slice runs out, what
should be done when an interrupt comes in, and where it keeps track of the events on which a
sleeping process may be waiting. It's even the alarm clock to wake you up when you're sleeping.
The actual processes that the operating system is managing are at levels above the operating system
itself. Generally, the first of these levels is for programs that interact directly with the operating
system, such as the various shells. These interpret the commands and pass them along to the
operating system for execution. It is from the shell that you usually start application programs such
as word processors, databases, or compilers. Because these often rely on other programs that
interact directly with the operating system, these are often considered a separate level. This is how
the different levels (or layers) might look graphically:

If you are running Linux with a graphical interface (e.g. the X Windowing System), you have an
additional layer. Your shell might start the graphical interface, which then starts the other programs
and applications as we discussed.
Under Linux, there are many sets of programs that serve common functions. This includes things
like mail or printing. These groups of related programs are referred to as "System Services",
whereas individual programs such as vi or fdisk are referred to as utilities. Programs that perform a
single function such as ls or date are typically referred to as commands.

Moving On
So you now have an understanding of the basics of how Linux works. We talked about the different
functions that the operating system is responsible for, what it manages, and a little about how
everything fits together. As we move on through the book, we'll build on these ideas and concepts to
give you a complete understanding of a Linux system.
I came from the DOS world before I started on UNIX. I had many preconceptions about the way an
operating system "should" behave and react. The way DOS did things was the "right" way. As I
learned UNIX, I began to see a completely different world. The hardest part was not that I had to
learn a whole new set of commands, but rather that I was fighting myself because I was so used to
DOS. Therefore, I need to make one general comment about UNIX before I let you move on.
Always remember that UNIX is not DOS. Nor is it any other operating system for that matter.
UNIX is UNIX and Linux is Linux. There are probably as many "dialects" of Linux as there are
dialects of UNIX. All have their own subtle differences. As you go through this book, keep that in
mind.
For example, I believed that the way commands were given arguments or options was better in
DOS. Every time I used a UNIX command, I grumbled about how wrong it was to do things like
that. As I learned more about UNIX, I came to realize that many of the decisions on how things
work or appear are completely arbitrary. There is no right way of doing many things. There is a DOS
way and a UNIX way. Neither is right. You might be used to the DOS way or whatever system you
use. However, that does not make it right.
When I started working with Linux, I had several years' experience with a half-dozen different
dialects of UNIX. It was much easier for me to adjust; I simply said to myself, "Oh, so this is the
way Linux does it."
If you are new to Linux, keep in mind that there are going to be differences. There are even
differences among the various distributions. If you keep this in mind, you will have a much more
enjoyable time learning about the Linux way.
I have always found that the best way to learn something is by doing it. That applies to learning a
new operating system as well. Therefore, I suggest that when you find something interesting in this
book, go look at your Linux system and see what it looks like on your system. Play with it. Twist it.
Tweak it. See if it behaves the way in which you expect and understand.
Chapter II
Linux Basics
On many of the UNIX systems that are around, the user is unaware that the operating system is a UNIX
system. Many companies, for example, have point-of-sale systems hooked up to a UNIX host; the
users at the cash register may never see what is being run. Therefore, there is really no need to go
into details about the system other than for pure curiosity, assuming those users ever find out that they are
running on a UNIX system at all.
On the other hand, if you do have access to the command line or interact with the system by some
other means, knowing how the system is put together is useful information. Knowing how things
interact helps expand your knowledge. Knowing what's on your system is helpful in figuring out
just what your system can do.
That's what this chapter is about: what's out there. We're going to talk about what makes up Linux.
This brings up the question "What is Linux?" There are more than a dozen versions commercially
available, in several different countries, all with their own unique characteristics. How can you call
any one of them the Linux distribution? The answer is you can't. What I will do instead is to
synthesize all the different versions into a single pseudo-version that we can talk about. Although
there are differences in the different versions, the majority of the components are the same. There
has been a great deal of effort in the past few years to standardize Linux, with a great deal of
success. I will therefore address this standard Linux and then mention those areas where specific
versions diverge.

What Linux is All About


Linux is available from many companies and in many versions. Often, a company will produce its
own version with specific enhancements or changes. These are then released commercially and
called distributions. Although Linux is technically only the kernel, it is commonly considered to be
all of the associated programs and utilities. Combined with the kernel, the utilities and often some
applications comprise a commercial distribution.

Guided Tour
Unless you are on familiar ground, you usually need a map to get around any large area. To get from
one place to another, the best map is a road map (or street map). If you are staying in one general
area and are looking for places of interest, you need a tourist map. Because we are staying within
the context of Linux and we're looking for things of interest, what I am going to give you now is a
tourist map of Linux directories.
In later chapters, we'll go into detail about many of the directories that we are going to encounter
here. For now, I am going to briefly describe where they are and what their functions are. As we get
into different sections of the book, it will be a lot easier to move about and know how files relate if
we already have an understanding of the basic directory structure.
One thing I would like to point out is that (for the most part) the directories of most UNIX systems
are laid out according to the functionality of the files and programs within the directory. One
enhancement that Linux makes is allowing things to be in more than one place. For example, files
that the system uses may be in one place and those that normal users need may be in another place.
Linux takes advantage of links to allow the necessary files to be in both places. We'll talk more
about links as we move on.
One question people often ask is why it is necessary to know what all the directories are for. Well, it
isn't. It isn't necessary to know them all, just the more important ones. While working in tech
support, I have talked numerous times with administrators who were trying to clean up their
systems a little. Because they had little experience with UNIX systems, they ended up removing
things that they thought were unnecessary, but turned out to be vital for the operation of the system.
If they knew more about where things were and what they were for, they wouldn't have made these
mistakes.
As we go through these directories, keep in mind that your system may not be like this. I have tried
to follow the structure of the Linux Filesystem Standard as well as to find some commonality
among the different versions that I've installed. On your system, the files and directories may be in a
different place, have different names, or may be gone altogether. Note that depending on your
distribution and the packages you have installed, these files and directories will look different. In
addition, although my system has every conceivable package installed (well, almost), I did not list
all the files and directories I have. I included this list with the intention of giving you a
representative overview. In addition, some of the directories are mentioned only briefly in the text, as there
is little more to say about them than a short description of their purpose.
With that said, let's have a look.
The top-most directory is the root directory. In verbal conversation, you say "root directory" or
"slash," whereas it may be referred to in text as simply "/."
So when you hear someone talking about the /bin directory, you may hear them say "slash bin."
This is also extended to other directories, so /usr/bin would be "slash user, slash bin." However,
once you get the feeling and begin to talk "Linux-ese," you will start talking about the directories as
"bin" or "user bin." Note that usr is read as "user."
Under the root, there are several subdirectories with a wide range of functions. The image below
shows the key subdirectories of /. This representation does not depict every subdirectory of /, just
the more significant ones that appear with most default installations. In subsequent diagrams, I will
continue to limit myself to the most significant directories to keep from losing perspective.

One of these files, one could say, is the single most important file: vmlinuz. This file is the operating
system proper. It contains all the functions that make everything go. When referring to the file on
the hard disk, one refers to /vmlinuz, whereas the in-memory, executing version is referred to as the
kernel.
The first directory we get to is /bin. Its name is derived from the word "binary." Often, the word
"binary" is used to refer to executable programs or other files that contains non-readable characters.
The /bin directory is where many of the system-related binaries are kept, hence the name. Although
several of the files in this directory are used for administrative purposes and cannot be run by
normal users, everyone has read permission on this directory, so you can at least see what the
directory contains.
The /boot directory is used to boot the system. There are several files here that the system uses at
different times during the boot process. For example, the files /boot/boot.???? are copies of the
original boot sector from your hard disk (for example, boot.0300). Files ending in .b are "chain
loaders," secondary loaders that the system uses to boot the various operating systems that you
specify.
The /dev directory contains the device nodes. As I mentioned in our previous discussion on
operating system basics, device files are the way both the operating system and users gain access to
the hardware. Every device has at least one device file associated with it. If it doesn't, you can't gain
access to it. We'll get into more detail on individual device files later.
The /etc directory contains files and programs that are used for system configuration. Its name
comes from the common abbreviation etc., for et cetera, meaning "and so on." This seems to come
from the fact that on many systems, /etc contains files that don't seem to fit elsewhere.
Under /etc are several subdirectories of varying importance to both administrators and users. The
following image shows a number of important sub-directories. Depending on what software you
have installed you may not have some of these or you may have many more not listed.

In some Linux distributions you will find the /etc/lilo directory, which is used for the Linux loader
(lilo). This directory contains a single file, install, which is a link to /sbin/lilo. This file is used
(among other things) to install the boot configuration options. On some systems, the lilo
configuration file (lilo.conf) is found directly in the /etc directory. We'll get into this more in the
section on starting and stopping your system.
There are several directories named /etc/cron*. As you might guess, these are used by the cron daemon.
The /etc/cron.d directory contains configuration files used by cron. Typically, what is here are various
system-related cron jobs, such as /etc/cron.d/seccheck, which does various security checks. The
directories /etc/cron.hourly, /etc/cron.daily, /etc/cron.weekly, /etc/cron.monthly contain files with
cron jobs which run hourly, daily, weekly and monthly, respectively. There is a cron job listed in
/etc/crontab that runs the program /usr/lib/cron/run-crons, which checks the other files.
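The format of /etc/crontab is straightforward: five fields describing when to run, then the user to run as, then the command. A typical entry that starts run-crons every fifteen minutes might look roughly like this (the exact entry varies between distributions):
# minute  hour  day-of-month  month  day-of-week  user  command
*/15 * * * *   root  test -x /usr/lib/cron/run-crons && /usr/lib/cron/run-crons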
The /etc/init.d directory contains scripts that the system uses when starting up or shutting down.
Which files are read depends on whether the system is being started or shut down. We'll talk more
about these directories and their associated files in the section on starting up and shutting down the
system. You may also find that these files are located in /etc/rc.d. On SuSE, /etc/rc.d is a symbolic
link to /etc/init.d.
The /etc/skel directory is used when you create a new user with the adduser command. This is the
"skeleton" of files that is copied to the user's home directory when it's created (hence the name
"skel"). If you want to ensure that each user gets other files at startup, place them in here. For
example, you may want everyone to have a configuration file for vi (.exrc) or for mail (.mailrc).
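For example, to give every new user a basic vi configuration, you might place a file like this in /etc/skel (the settings shown are only examples):
cat > /etc/skel/.exrc << EOF
set autoindent
set showmode
EOF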
Depending on your Linux distribution, either the /etc/sysconfig or /etc/rc.config.d directory contains
default system configuration information. For example, the keyboard file defines which keyboard
table is to be used and the network file contains network parameters, such as the hostname.
The /etc/pam.d directory contains configuration files used by the Pluggable Authentication Modules
(PAM). PAM is a system of libraries that are responsible for authentication tasks of applications and
services on your system. These libraries provide an Application Programming Interface (API)
allowing for a standardization of authorization functions. Previously, where necessary each program
did its own authorization/authentication. With PAM, a single set of configuration files allows for a
more consistent security policy. In some cases, an /etc/pam.conf file is used instead of the
/etc/pam.d directory.
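To give you an idea of the format, each line of a file under /etc/pam.d names a module type, a control flag and the module to use (plus any arguments). A simplified sketch for a service might look like this; the exact modules and flags vary between distributions and services:
# type     control    module
auth       required   pam_unix.so
account    required   pam_unix.so
password   required   pam_unix.so
session    required   pam_unix.so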
The /etc/profile.d directory contains default configuration for many of the shells that Linux provides. As
we talk about in the section on shells, each shell has an environment which contains a number of different
characteristics. Many of the defaults are defined in the files under /etc/profile.d. The name of each
file gives an indication of the appropriate shell.
The /etc/security directory contains security-related configuration files. Whereas PAM concerns
itself with the methods used to authenticate any given user, the files under /etc/security are
concerned with just what a user can or cannot do. For example, the file /etc/security/access.conf is a
list of which users are allowed to log in and from where (for example, when using telnet). The
/etc/security/limits.conf file contains various system limits, such as the maximum number of processes.
(Yes, these are really related to security!)
Moving back up to the root directory, we next find /home. As its name implies, this is the default
location for users' home directories. However, as we'll talk about later, you can have the home
directory anywhere.
The /lost+found directory is used to store files that are no longer associated with a directory. These
are files that have no home and are, therefore, lost. Often, if your system crashes and the filesystem
is cleaned when it reboots, the system can save much of the data and the files will end up here. Note
that a lost+found directory is created automatically for each filesystem you create. We'll get into
more detail about this in the section on filesystems.
The /lib directory (for library) contains the libraries needed by the operating system as it is running.
You will also find several subdirectories here.
The /proc directory takes a little while to get used to, especially if you come from a non-UNIX
world or have used a version of UNIX without this directory. This is a "pseudo-filesystem" that is
used to access information in the running system. Rather than having you access kernel memory
directly (i.e., through the special device /dev/kmem), you can access the files within this directory.
There are directories for every running process as well. We will get into more detail about this when
we talk about monitoring your system. If you are curious now, check out the proc(8) man-page.
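Because the entries under /proc behave like ordinary text files, you can look at them with the usual tools. For example (the exact entries available depend on your kernel version):
cat /proc/cpuinfo     # information about your processor(s)
cat /proc/meminfo     # current memory usage
ls /proc/1            # the directory describing the process with PID 1 (init)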
The /root directory is the home directory for the user root. This is different from many UNIX
dialects that have the root's home directory in /. On SuSE, the /root directory is actually a symbolic
link to /home/root.
The /sbin directory contains programs that are used (more or less) to administer the system. In other
words, the system binaries. Many documentation sources say that this is only for system
administrators. However, most of these files are executable by normal users, as well. Whether the
support files or device nodes are accessible is another matter. If a normal user cannot access the
device nodes or other files, the program won't run.
The /usr directory contains many user-related subdirectories. Note the 'e' is missing from "user". In
general, one can say that the directories and files under /usr are used by and related to users. There
are programs and utilities here that users use on a daily basis. Unless changed on some systems, /usr
is where users have their home directory. The figure below shows what the subdirectories of /usr
would look like graphically.

Where /bin contains programs that are used by both users and administrators, /usr/bin contains files
that are almost exclusively used by users. (However, like everything in UNIX, there are exceptions.)
Here again, the bin directory contains binary files. In general, you can say that the programs and
utilities that all users more or less require are stored in /bin, whereas the "nice-to-have" programs and
utilities are stored in /usr/bin. Programs and utilities needed for administrative tasks are stored in
/sbin. Note that it is common to separate files like this, but it is not an absolute rule.
The /usr/adm directory contains mostly administrative data. The name "adm" comes from
"administration," which is no wonder considering this contains a lot of the administrative
information that relates to users. This may be a symbolic link to the /var directory.
The /usr/include directory and its various subdirectories contain all the include files. These contain
information that is needed both by the kernel when it is being recreated and by programs when they
are being compiled. For normal users and even most system administrators, the information here is
more a place to get one's curiosity satisfied. (For those of you who know that this is dramatic over-
simplification, all I can say is that you already know what this directory is for anyway.)
The /usr/src directory contains the source code for both the Linux kernel and for any program that
you specifically install.
Many system parameters and values are stored inside the files underneath /usr/src/linux/include.
Because of the information provided in many of the files, I will be making reference to them
through the book. Rather than spelling out the full path of the directory, I will make a reference to
the files relative to the /usr/src/linux/include directory, the same way that it is done in C source
code. For example, when I refer to something like <linux/user.h>, I mean the full path
/usr/src/linux/include/linux/user.h. When you see something enclosed in angled brackets like this,
you can make the expansion yourself.
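If you are curious about a particular value, you can search the headers directly with grep. For example, something like the following might show where the limit on the number of tasks is defined; the exact symbol and file depend on your kernel version:
grep -r NR_TASKS /usr/src/linux/include/linux/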
The /usr/lib directory is difficult to explain. We could say that it contains the user-related library
files (based on its name). However, that still does not accurately describe the complete contents.
One thing it contains is the library files that are less general than those you find in /lib. This
directory contains many of the systemwide configuration files for user-level programs such as perl
and emacs.
The /usr/lib/kbd directory contains files that are used to configure the system console keyboard.
Through these files, you can configure your keyboard to accommodate one of several different
languages. You can even configure it for dialects of the same language, such as the German
keyboard as used in Switzerland or Germany. You can also change these files to create a totally new
keyboard layout, such as the Dvorak.
If you have switched to the more secure npasswd program, the /usr/lib/npasswd directory is used to
contain some configuration information.
The /usr/lib/terminfo directory contains both the source files and compiled versions of the terminfo
database. Terminfo is the mechanism by which the system can work with so many different types of
terminals and know which key is being pressed. For more information, see the terminfo(5) man-
page.
When configuring UUCP, all the necessary files are contained in the /usr/lib/uucp directory. Not
only are the configuration files here, but this is also home for most of the UUCP programs. UUCP
(Unix-to-Unix Copy) is a package that allows you to transfer files and communicate with remote
systems using serial lines. We'll talk in more detail about this directory in the section on networking.
There are typically many more directories under /usr/lib. Most are related to user programs and
operations. We'll get to some of them as we move along.
The directory /usr/X11R6 contains all the X Windows System files. This makes upgrading to newer
releases of X much easier as the files are not spread out over the entire system. If you have an older
version of Linux, you might still have X11R5 or if a newer release comes out you might have
X11R7. To simplify things even further, the directory /usr/X11 is what many things look at instead.
This is then linked to the appropriate directory (i.e., /usr/X11R6, /usr/X11R5).
Underneath this directory are the subdirectories bin, lib, and man, which have the same
functionality as those under /usr. In most cases, links in other directories point here. For example,
you should have a directory /usr/bin/X11. This is a symbolic link to the directory /usr/X11R6/bin.
The directory /usr/lib/X11 is a symbolic link to /usr/X11R6/lib. The reason for this is to maintain
the directory structure, but still make upgrading easy. When X11R7 comes out, all that you need to
do is make the links point to the X11R7 directories and not copy the individual files.
Next, /usr/sbin contains more system binaries, including the daemon programs that run in the
background. In some UNIX dialects, these files may be in /etc.
Moving back up to the /usr directory, we find the /usr/local sub-directory. This may or may not
contain anything. In fact, there are no rules governing its contents. It is designed to contain
programs, data files, and other information that is specific to your local system, hence the name.
There is often a bin directory that contains local programs and a lib directory that contains data files
or libraries used by the programs in /usr/local/bin.
Also in the /usr directory is /usr/man. This is where the man-pages and their respective indices are
kept. This directory contains the index files, which you can search through to find a command you
are looking for. You can also create and store your own manual pages here. The /usr/info and
/usr/doc directories contain GNU Info documents and other documentation files.
The /usr/spool directory is the place where many different kinds of files are stored temporarily. The
word "spool" is an acronym for simultaneous peripheral operation off-line, the process whereby
jobs destined for some peripheral are queued to be processed later. This may be a link to /var/spool.
Several subdirectories are used as holding areas for the applicable programs. For example, the
/usr/spool/cron directory contains the data files used by cron and at. The /usr/spool/lp directory not
only contains print jobs as they are waiting to be printed, it also contains the configuration files for
the printers.

The /var directory contains files that vary as the system is running, such as log files. This was
originally intended to be used when the /usr directory is shared across multiple systems. In such a
case, you don't want things like the mail or print spoolers to be shared.
The /var/man/cat directory is a cache for man-pages when they are formatted. Some are stored in a
pre-formatted form, and those that need to be formatted are cached here in case they are needed
again soon.
Many system lock files are kept in /var/lock. These are used to indicate that one program or another
is currently using a particular file or maybe even a device. If other programs are written to check in
here first, you don't have collisions.
As you might guess, the /var/log directory contains log files. The /var/run contains information that
is valid until the system is rebooted. For example, the process ID of the inetd daemon can be found
here. It is often important to know this information when changes are made to the system and
storing them here makes them quickly accessible.
The /var/yp directory contains the changing files that are used with the Network Information
Service (NIS), also known as Yellow Pages, or YP.
As I mentioned before, the /usr/adm directory is a link to /var/adm. There are several key log files
stored here. Perhaps, the most important is the messages file that contains all the system service,
kernel, and device driver messages. This is where the system logs messages from the syslogd
daemon.
There were many directories that I skipped, as I said I would at the beginning of this section. Think
about the comparison that I made to a tourist map. We visited all the museums, 200-year-old
churches, and fancy restaurants, but I didn't show you where the office of city planning was.
Granted, such offices are necessary for a large city, but you really don't care about them when you're
touring the city; just as there are certain directories and files that are not necessary to appreciate and
understand the Linux directory structure.

What Linux is Made of


There are many aspects of the Linux operating system that are difficult to define. We can refer to
individual programs as either utilities or commands, depending on the extent of their functions.
However, it is difficult to label collections of files. Often, the labels we try to place on these
collections do not accurately describe the relationship of the files. However, I am going to try.
Linux comes with essentially all the basic UNIX commands and utilities that you have grown to
know and love (plus some that you don't love so much). Basic commands like ls and cat, as well as
text manipulation programs like sed and awk are available. If you don't come from a Unix
background, then many of the commands may seem a little obscure and even intimidating.
However, as you learn more about them you will see how useful and powerful they can be, even if it
takes longer to learn them.
Linux also comes with a wide range of programming tools and environments, including the GNU
gcc compiler, make, rcs, and even a debugger. Several languages are available, including Perl,
Python, Fortran, Pascal, ADA, and even Modula-3.
Unless you have an extremely low-level distribution, you probably have X11R6 in the form of
XFree86 3.x, which contains drivers for a wide range of video cards. There are a dozen text editors
(vi, emacs, jove) and shells (bash, zsh, ash, pdksh), plus a wide range of text processing tools, like
TeX and groff. If you are on a network, there is also a wide range of networking tools and programs.
Even if you have been working with a Linux or any UNIX dialect for a while, you may have heard
of certain aspects of the operating system but not fully understood what they do. In this section, I'm
going to talk about functions that the system performs as well as some of the programs and files that
are associated with these functions. I'm also going to talk about how many of the system files are
grouped together into what are referred to as "packages," and discuss some of the more important
packages.
To install, remove, and administer these packages on a Slackware-derived system, use the pkgtool
tool, which is actually a link to the shell script cpkgtool. This tool can be called from the command
line directly or by the /sbin/setup program. Each package comes on its own set of disks. These
packages are:
• A Base Linux System
• AP various applications that do not need X
• D Program Development (C, C++, Lisp, Perl, etc.)
• E GNU emacs
• F FAQ lists, HOWTO documentation
• I Info files readable with info, JED, or emacs
• IV InterViews Development + Doc and Idraw apps for X
• N Networking (TCP/IP, UUCP, Mail, News)
• OOP Object-Oriented Programming (GNU Smalltalk 1.1.1)
• Q Extra Linux kernels with custom drivers
• T TeX ,text processing system
• TCL Tcl/Tk/TclX, Tcl language and Tk toolkit for X
• X XFree-86 3.1 X Window System
• XAP X applications
• XD XFree-86 3.1 X11 Server Development System
• XV XView 3.2 (OpenLook Window Manager, apps)
• Y games (that do not require X)

Why is it important to know the names of the different packages? Well, for the average user, it
really isn't. However, the average user logs on, starts an application and has very little or no
understanding of what lies under the application. The mere fact that you are reading this says to me
that you want to know more about the operating system and how things work. Because these
packages are the building blocks of the operating system (at least in terms of how it exists on the
hard disk), knowing about them is an important part of understanding the whole system.
Plus one of the key advantages that Linux has over Windows is the ability to selectively install and
remove packages with much finer granularity. For example, you can add and remove individual
programs to a greater extent with Linux than you can with Windows. Further, there are fewer groups
of programs in Windows (such a group of programs is often called a "package" in Linux). This
allows you to pick and choose what you want to install to a greater extent. Therefore, knowing where
each package resides (or at least having a starting point) is a big help. To be able to do any work
on a Linux system, you must first install software. Most people think of installing software as
adding a word processing program or database application; but any program on the operating
system needs to be installed at one time or another. Even the operating system itself was installed.
Earlier, I referred to the Linux operating system as all the files and programs on the hard disk. For
the moment, I want to restrict the definition of "operating system" to just those files that are
necessary for "normal" operation. Linux (at least Slackware) has defined that set of programs and
files as the Base Linux System, or Base Package. Although there are many files in the Base Package
that could be left out to have a running system, this is the base set that is usually installed.
Many versions of Linux are now using the Red Hat Package Manager (RPM) format. In fact, RPM
is perhaps the format most commonly found on the Internet. Most sites will have new or updated
programs as RPM files. You can identify this format by the rpm extension to the file name.
This has proven itself to be a much more robust mechanism for adding and removing packages, as it
is much easier to add and manage single programs than with Slackware. We'll get into more detail
about this when I talk about installing. You will also find that RPM packages are also grouped into
larger sets like those in Slackware, so the concepts are the same.
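Just as a quick illustration (the package name here is made up), installing, querying and removing a package in RPM format looks something like this:
rpm -ivh somepackage-1.0-1.i386.rpm    # install, showing progress hashes
rpm -qi somepackage                    # show information about an installed package
rpm -qa | grep somepackage             # search the list of installed packages
rpm -e somepackage                     # erase (remove) the package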
Although most commercial distributions use the RPM format, there are often a number of
differences in which package groups there are and which programs and applications appear in which
group. For example, a later SuSE distribution has the following package groups:
• a1 - Linux Base System (required)
• ap1 - Applications which do not need X
• ap4 - Applications which do not need X
• d1 - Development (C, C++, Lisp, etc.)
• doc1 - Documentation
• doc4 - Documentation
• e1 - Emacs
• fun1 - Games and more
• gra1 - All about graphics
• gra3 - All about graphics
• k2de1 - KDE2 - K Desktop Environment (Version 2)
• n1 - Network-Support (TCP/IP, UUCP, Mail, News)
• perl1 - Perl modules
• sec1 - Security related software
• snd1 - Sound related software
• spl1 - Spell checking utilities and databases
• tcl1 - Tcl/Tk/TclX, Tcl-Language and Tk-Toolkit for X
• tex1 - TeX/LaTeX and applications
• x1 - Base X Window System - XFree86
• x3d1 - 3D software for X11 and console
• xap1 - X Applications
• xdev1 - Development under X11
• xsrv1 - Several X Servers (XFree86)
• xsrv2 - Several X Servers (XFree86)
• xsrv3 - Several X Servers (XFree86)
• xsrv4 - Several X Servers (XFree86)
• xv1 - XView (OpenLook, Applications)
• xwm1 - Window managers and desktop environments
• yast1 - YaST Components
• zq - source packages

Note that in the case of SuSE, when you are in the administration tool (YAST), the names of these
groups will probably appear somewhat different. For example, there are two groups of applications:
those that need X-Windows and those that do not. When you are in YAST, there are two dozen
application groups, such as spreadsheets, math and databases. The groups listed above are how you
might find them on the CD and date from a time when you did not have many applications and there
were few distributions. Most people got Linux from the net and these package groups were pretty
convenient.
Today, SuSE is on several CDs and just to make things easier, you are also given a DVD or two
depending on which package you get. Also the package groups have changed as you see in the
following figure:

If you compare the previous list to the groups you see here, you will notice that the groupings are
similar but not identical. Tools like YaST are able to determine what other packages are required
and today there is really no need to group packages to make downloading easier. Typically, you will
either order or download them. There are a number of places where you can download complete
packages, but you have to spend the time downloading them and then burning the CDs or DVDs. Or, you
can save yourself time and money by ordering them from places like OS Heaven.

What Linux Does


On any operating system, a core set of tasks is performed. On multi-user or server systems such as
Linux, these tasks include adding and configuring printers, adding and administering users, and
adding new hardware to the system. Each of these tasks could take up an entire chapter in this book.
In fact, I do cover all of these, and many others, in a fair bit of detail later on.
I think it's important to briefly cover all of the basic tasks that an administrator needs to perform in
one place. There are a couple of reasons for this. First, many administrators of Linux systems are
not only novice administrators, they are novice users. They get into the position as they are the only
ones in the company or department with computer experience. (They've worked with DOS before.)
Second, by introducing the varied aspects of system administration here, I hope to lay the
foundation for later chapters. If you are not familiar with these issues, you may have trouble later.
Keep in mind that depending on what packages are installed, any Linux distribution can do a lot
more. Here we will be discussing just the basic administrative functions.
The average user may not want to get into the details that the later chapters provide. So here I give
an overview of the more important components. Hopefully, this will give you a better understanding
of what goes into an operating system as well as just how complex the job is that your system
administrator does.
The first job of a system administrator is to add users to the system. Access is gained to the system
only through user accounts. Although it may be all that a normal user is aware of, these accounts
consist of substantially more than just a name and password. Each user must also be assigned one of
the shells, a home directory, and a set of privileges to access system resources.
Although the system administrator could create a single user account for all users to use to log in, it
ends up creating more problems than it solves. Each user has his/her own password and home
directory. If there were a single user, everyone's files would be stored in the same place and
everyone would have access to everyone else's data. This may be fine in certain circumstances, but
not in most.
Users are normally added to the system through the adduser command. Here, when adding a user,
you can input that user's default shell, his/her home directory as well as his/her access privileges.
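On some distributions the underlying command is useradd rather than adduser, and the exact options differ, but creating an account from the command line might be sketched like this (the user name, home directory and shell are only examples):
useradd -m -d /home/jdoe -s /bin/bash jdoe    # create the account and its home directory
passwd jdoe                                   # set the initial password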
Another very common function is the addition and configuration of system printers. This includes
determining what physical connection the printer has to the system, what characteristics the printer
has as well as making the printer available for printing. Generically, all the files and programs that
are used to access and manage printers are called the print spool, although not all of them are in the
spool directory. Adding a printer is accomplished as in many UNIX dialects: you do it manually by
editing the primary configuration file, /etc/printcap. The printcap man-page lists all the capabilities
that your version of Linux supports. You must also add the appropriate directory and enable printing
on the port. We'll get into more detail about it as we move on.
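Just to give you an idea of what such an entry looks like, a minimal /etc/printcap entry for a printer on the first parallel port might be sketched like this (the printer name and spool directory are only examples):
lp|mylaser:\
        :lp=/dev/lp0:\
        :sd=/var/spool/lpd/lp:\
        :mx#0:\
        :sh:
Here, lp names the device, sd the spool directory, mx#0 removes the limit on job size and sh suppresses the banner page.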
What happens when you want to remove a file and inadvertently end up removing the wrong one
(or maybe more than one)? If you are like me with my first computer, you're in big trouble. The
files are gone, never to show up again. I learned the hard way about the need to do backups. If you
have a good system administrator, he/she has probably already learned the lesson and makes regular
backups of your system.
There are several ways of making backups and several different utilities for doing them. Which
program to use and how often to make backups completely depends on the circumstances. The
system administrator needs to take into account things like how much data needs to be backed up,
how often the data are changed, how much can be lost, and even how much will fit on the backup
media.
There are tasks that an administrator may need to perform at regular intervals, such as backups,
cleaning up temporary directories, or calling up remote sites to check for incoming mail. The
system administrator could have a checklist of these things and a timer that goes off once a day or
every hour to remind him/her of these chores, which he/she then executes manually.
Fortunately, performing regular tasks can be automated. One basic utility in every UNIX version is
cron. Cron (the "o" is short) is a program that sits in the background and waits for specific times.
When these times are reached, it starts pre-defined programs to accomplish various, arbitrarily
defined tasks. These tasks can be set to run at intervals ranging from once a minute to once a year,
depending on the needs of the system administrator.
Cron "jobs" (as they are called) are grouped together into files, called cron tables, or crontabs for
short. There are several that are created by default on your system and many users and even system
administrators can go quite a long time before they notice them. These monitor certain aspects of
system activity, clean up temporary files, and even check to see if you have UUCP jobs that need to
be sent.
What about a program that you only want to run one time at a specific time and then never again?
Linux provides a mechanism: at. Like cron, at will run a job at a specific time, but once it has
completed, the job is never run again.
A third command that relates to cron and at, the batch command, differs from the other two in that
batch runs the job you submit whenever it has time; that is, when the system load permits.
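For example, you could schedule a one-time job by feeding the command to at on its standard input (the time and the script name are just examples):
echo "/usr/local/bin/report.sh" | at 22:00    # run once tonight at 10 p.m.
atq                                           # list your pending at jobs
echo "/usr/local/bin/report.sh" | batch       # run when the system load permits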
Linux supports the idea of virtual consoles (VCs), like SCO. With this, the system console (the
keyboard and monitor attached to the computer itself) can work like multiple terminals. By default,
the system is configured with at least four VCs that you switch between by pressing the ALT key
and one of the function keys F1-F6.
Normally, you will only find the first six VCs active. Also, if you are using the X Windowing
System, it normally starts up on VC 7. To switch from the X-Windows screen to one of the virtual
consoles, you need to press CTRL-ALT plus the appropriate function key.
Keeping the data on your system safe is another important task for the system administrator. Linux
provides a couple of useful tools for this: tar and cpio. Each has its own advantages and
disadvantages. Check out the details on the respective man-page.
What goes with Linux
Throughout this site, we are going to be talking a great deal about what makes up the Linux
operating system. In its earliest form, Linux consisted of the base operating system and many of the
tools that were provided on a standard UNIX system. For many companies or businesses, that was
enough. These companies may have only required a single computer with several serial terminals
attached, running a word processor, database, or other application. However, when a single
computer is not enough, the base Linux package does not provide you with everything that you
need.
Suppose you want to be able to connect all the computers in your company into a computer
network. The first thing that you could use is the networking capabilities of UUCP, which is
included in Linux's network package. However, this is limited to exchanging files, remotely
executing programs, and simple terminal emulation. Also, it is limited to serial lines and the speed
at which data can be transferred is limited as well.
So it was in the dark recesses of ancient computer history. Today, products exist that allow
simultaneous connection between multiple machines with substantially higher performance. One
such product is TCP/IP (Transmission Control Protocol/Internet Protocol). If a company decides it
needs an efficient network, it might decide to install TCP/IP, which has become the industry
standard for connecting not only UNIX systems, but other systems as well.
There is a problem with TCP/IP that many companies run into. Suppose you want everyone in the
company to be able to access a specific set of files. With TCP/IP you could devise a scheme that
copies the files from a central machine to the others. However, if the files need to be changed, you
need to ensure that the updated files are copied back to your source machine. This is not only prone
to errors, but it is also inefficient.
Why not have a single location where the source files themselves can be edited? That way, changes
made to a file are immediately available to everyone. The problem is that TCP/IP by itself has
nothing built in to allow you to share files. You need a way to make a directory (or set of
directories) on a remote machine appear as though it were local to your machine.
Like many operating systems, Linux provides an answer: NFS (Network File System). With NFS,
directories or even entire filesystems can appear as if they are local. One central computer can have
the files physically on its hard disk and make them available via NFS to the rest of the network.
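As a simple sketch (the host names and directories are made up, and the mount point is assumed to exist), the server lists what it is willing to share in /etc/exports and the client then mounts it as if it were a local filesystem:
# on the server, a line in /etc/exports:
/data    client.example.com(ro)

# on the client:
mount -t nfs server.example.com:/data /mnt/data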
Two other products are worth mentioning. To incorporate the wonders of a graphical user interface
(GUI), you have a solution in the form of X-Windows. And if you just switched to Linux and still
have quite a few DOS applications that you can't live without, Linux provides a solution: dosemu or
the DOS Emulator package.

Linux Documentation
Software documentation is a very hot subject. It continues to be debated in all sorts of forums from
USENET newsgroups to user groups. Unless the product is very intuitive, improperly documented
software can be almost worthless to use. Even if intuitive to use, many functions remain hidden
unless you have decent documentation. Unfortunately for many, UNIX is not very intuitive.
Therefore, good documentation is essential to be able to use Linux to its fullest extent.
Unlike a commercial UNIX implementation, Linux does not provide you with a bound set of
manuals that you can refer to. The documentation that is available is found in a large number of
documents usually provided with your Linux distribution. Because the documentation was
developed by many different people at many different locations, there is no single entity that
manages it all.
The Linux Documentation Project (LDP) was organized for this very reason. More and more
documents are being produced as Linux develops. There are many HOWTOs available that give
step-by-step instructions to perform various tasks. These are typically quite long, but go into the
detail necessary to not only solve specific problems, but help you configure detailed aspects of your
system. There are also a number of "mini" HOWTOs, which discuss less extensive topics.
In many cases, these were written by the program developers themselves, giving you insights into
the software that you normally wouldn't get. You'll find ASCII versions on the CD under the
doc/HOWTO directory and HTML versions under doc/HTML. The most current HOWTOs can be
found on the LDP Web site.
Many HOWTOs will have a section of frequently asked questions (FAQs). As their name implies,
these are lists of questions that are most frequently asked about the particular topic. Sometimes
these are questions about specific error messages, but are often questions about implementing
certain features. These can also be found on the LDP Web site. The Brief Linux FAQ (BLFAQ)
provides answers to basic questions about working with Linux.
Unfortunately, in my experience in tech support, few administrators and even fewer users take the
time to read the manuals. This is not good for two important reasons. The first is obviously the
wasted time spent calling support or posting messages to the Internet for help on things in the
manual. The second is that you miss many of the powerful features of the various programs. When
you call support, you usually get a quick and simple answer. Tech support does not have the time to
train you how to use a particular program. Two weeks later, when you try to do something else with
the same program, you're on the phone again.
The biggest problem is that people see the long list of files containing the necessary information and
are immediately intimidated. Although they would rather spend the money to have support explain
things rather than spend time "wading" through documentation, it is not as easy with Linux. There is
no tech support office. There is an increasing number of consulting firms specializing in Linux, but
most companies cannot afford the thousands of dollars needed to get that kind of service.
The nice thing is that you don't have to. You neither have to wade through the manuals nor spend
the money to have support hold your hand. Most of the necessary information is available on-line in
the form of manual pages (man-pages) and other documentation.
Built into the system is a command to read these man-pages: man. By typing man <command>, you
can find out many details about the command <command>. There are several different options to
man that you can use. You can find out more about them by typing man man, which will bring up
the man man-page (or, the man-page for man).
When referring to a particular command in Linux documentation, you very often will see the name
followed by a letter or number in parenthesis, such as ls(1). This indicates that the ls command can
be found in section 1 of the man-pages. This dates back to the time when man-pages came in books
(as they often still do). By including the section, you could more quickly find what you were
looking for. Here I will be making references to files usually as examples. I will say only what
section the files are in when I explicitly point you toward the man-page.
For a list of what sections are available, see the table below or the man man-page. If you are looking
for the man-page of a particular command and know what section it is in, it is often better to specify
the section. Sometimes there are multiple man-pages in different sections. For example, the passwd
man-page in section 1 lists the details of the passwd command. The passwd man-page in section 5,
lists the details of the /etc/passwd file. Therefore, if you wanted the man-page on the passwd file,
you would use the -S option (for "section") to specify section 5, and you would call up the
man-page like this:
man -S 5 passwd

Section Description
1 Commands, utilities and other executable programs, which are typically user-related
2 System calls
3 Library functions
4 Special files, typically device nodes in /dev
5 File formats and their respective conventions, layout
6 Games
7 Macro packages
8 System administration commands
9 Kernel routines
Table - Manual Page Sections
Man-pages usually have the same basic format, although not all of the different sections are there
for every man-page. At the very top is the section NAME. This is simply the name of the command
or file being discussed. Next is the SYNOPSIS section, which provides a brief overview of the
command or file. If the man-page is talking about a command or utility, the SYNOPSIS section may
list generalized examples of how the command syntax is put together. The tar man-page is a good
example.
The DESCRIPTION section, is just that: a description of the command. Here you get a detailed
overview about what the command does or what information a particular file contains. Under
OPTIONS, you will find details of the various command line switches and parameters, if any. The
SEE ALSO section lists other man-pages or other documentation, if any, that contain additional
information. Often if there is an info page (see below) for this man-page it is listed here. BUGS is a
list of known bugs, other problems and limitations the program might have. Sometimes, there is an
AUTHOR section, which lists the author(s) of the program and possibly how to contact them.
Note that these sections are just a sampling and not all man-pages have these sections. Some man-
pages have other sections that are not applicable to other man-pages. In general, the section
headings are pretty straightforward. If all else fails, look at the man man-page.
In many cases, each section has its own man page. By running
man -k intro

you can see which sections have an introduction, which sometimes provides useful information
about that section of man-pages.
Sometimes applications will provide their own man-pages and end up putting them in a directory
that the normal man command doesn't use. If the installation routine for the application is well
written, then you should not have a problem. Otherwise you need to tell the man command where to
look. Some distributions use the /etc/manpath.config file (which has its own man-page), which
contains (among other things) the directories that man should search. You might also have to define
the MANPATH variable explicitly to tell the system where to look. Note that typically, if the
MANPATH variable is set, the manpath.config file is ignored.
Often the manual pages are not stored in their original form, but in a pre-formatted form called "cat pages".
This is done to speed up the display, so that the man pages do not need to be processed each time
they are called. I have worked on some systems where these pages are not created by default and
every single man-page reports "No manual entry for (whatever)". To solve this problem simply run
the command catman. It may take a while so be patient.
If you want to look at multiple man-pages, you can simply input them on the same line. For
example, to look at the grep and find man-pages, you might have a command that looks like this:
man grep find

By pressing 'q' or waiting until the page is displayed, you will be prompted to go to the next file. If
the same term is in multiple sections, you can use the -a option to display all of them. For example:
man -a passwd

Sometimes it will happen that you know there is a command that performs a certain function, but
you are not sure what the name is. If you don't know the name of the command, it is hard to look for
the man-page. Well, that is what the -k option is for (-k for "keyword"). The basic syntax is:
man -k keyword

where "keyword" is a keyword in the description of the command you are looking for. Note that
"man -k" is the same thing as the apropos command. If you have a command and want to know
what the command does, you can use the whatis command. For example, like this:
whatis diff

which would give you this:


diff (1) - find differences between two files
Paired with whatis is the whereis command. The whereis command will tell you the path to the
command that is being executed, just like the which command that we discussed in the section on directory
paths. However, whereis will also show you other information, like the location of the man-pages,
source code, and so forth. Running
whereis find
might give us something like this:
find: /usr/bin/find /usr/share/man/man1/find.1.gz /usr/share/man/mann/find.n.gz
Should there be other, related files (like /usr/bin/passwd and /etc/passwd), whereis will display
these, as well.
For many commands, as well as general system information, there are additional info files that you
access using the info command. Although there are not as many info files as there are commands,
the info files contain information on more aspects of your system. In many cases, the information
contained in the info files is identical to the man-pages. To get started, simply type "info" at the
command line. To get the information page for a particular command, as with man, you give the
command name as an argument to the info command. So, to get information about the tar command,
you would input:
info tar

which would bring up something like this:

If you are familiar with the emacs editor, then navigation is fairly easy. However, for the most part,
you can move around fairly well using the arrow keys and the enter key. As you can see in the
image above, menu items are indicated with an asterisk (*). Move with an arrow key or the tab key
until the desired item is highlighted and then press enter to select that item. Depending on how your
keyboard is laid out, you can move up and down within each page using the page-up and page-
down keys.
Rather than moving through the menu items, you can simply press 'm' which will prompt you to
input the text of the menu item you want to select. You don't have to input the complete text, but
just enough to differentiate it from other items.
Some commands (including info itself) have a tutorial section. This provides examples and step-by-
step instructions how to use the specific command. To reach the info tutorial from the info page for
any command, simply press 'h' (for "help").
SuSE takes this even further by providing you with a copy of their online support knowledge
base. This can also be accessed on the internet here.
Before installing any Linux system it is best to know if there is anything to watch out for. For
commercial software, this is usually the release notes. Often there is a file in the root directory of
the CD (if that's what you are installing from) called README or README.1ST which mentions
the things to look out for. Typically when you download software or even source code from the
Internet, there is a README file. If this file does not give you specific information about installing
the software, it will tell you where to find it.

Other Resources
The Internet is full of resources to find out more about your Linux system. The most obvious places
are the home page of the particular distribution you have, but there are many, many more sites that
provide information, such as Linux.Org and the Linux Documentation Project, as we discussed in
the section on Linux documentation. For a collection of links that I have found useful, check out the
main
One extremely useful place to find information is netnews. Actually, it would be more appropriate
to say "places" as there are thousands of newsgroups, hundreds which apply to computers and
dozens which apply specifically to Linux. Most are archived on www.deja.com, which, as of this
writing, is being redirected to Google Groups. They have a 20 year archive of the various news
groups, not just those related to computers. Here you can also post, but you need to register first.
If you have an Internet Service Provider (ISP) that also provides its own news server, then you
might want to consider a local newsreader such as knode, which comes with the KDE. Using a local
reader has the advantage of being able to subscribe to newsgroups from various topics, such as both
Linux and music, allowing you to easily bounce between the groups you like.
Newsgroups are broken into "hierarchies", or general groupings of particular topics. For example,
the "comp" hierarchy is about computers, the "rec" hierarchy is for recreation. For the
comp.os.linux newsgroup, click here.
Other good sources of information are mailing lists. The difference between a mailing list and
newsgroup is that a copy of each message sent to a mailing list is also sent to every single member.
This means that depending on the mailing list and how many you get, you could have hundreds of
email messages each day. With newsgroups, you download them as you need them. Depending on
your newsreader, you might download all of the messages (which could take quite a long time) or
you can download just the headers and then the contents of the messages as you need to.
Mailing lists also have the advantage of being able to filter messages into sub-directories based on
their content, sender and so forth. Also, most mailing lists allow only members to submit messages,
whereas typically anyone can post to a newsgroup. This means there is often a lot of junk in the
newsgroups, such as advertisements, Linux opponents who just want to start arguments and so
forth. Since you are required to provide your email address for a mailing list, you cannot be so
anonymous and things are usually a lot more pleasant.
To get a list of some of the currently available mailing lists, send a message containing just the word
"lists" to [email protected]. To get detailed help information,
send a message containing the word "help".
Chapter III
Working with the System
Whether you login using the GUI or a character console, the way you interact with a Linux system
is essentially the same. You must first be able to identify yourself to the system by providing the
appropriate information. This information is your user name or login name and a password. As we
discuss in other sections, by default Linux will prompt you for this information when the system is
started. Once you have correctly identified yourself, you are given access to the system. What
happens next will depend on whether you are using the GUI or a character console.
For simplicity's sake, we will first talk about interacting with a system from a character console,
which can also be referred to as a character terminal. One important reason for this is that even if
you are using a GUI, you still have access to a character terminal window and the way you interact
is the same. Also, when you connect to a remote system (i.e. using something like telnet) the way
you interact is the same as well.

Backing-up and Restoring Files


If you're using Linux in your company, the system administrator probably does regular backups
(assuming he wants to keep his job). However, if you are administering your own Linux system (i.e.
it's your home workstation), then it is up to you to ensure that your data and important system files
are safe.
The computer boom of the 1990's put a PC in everyone's house, but it did not provide them with the
same awareness and knowledge that computer users of the 1970's and 80's had. With point-n-click
and plug-n-play, computers became a "black box" whose insides are an unknown. You turn on
your computer and it just works. When you turn on your computer and it doesn't work, people don't
know what to do. It's possible that the computer can be repaired, but if the hard disk is damaged, the
data may be unrecoverable.
If all you use your computer for is to surf the internet, then there may not be any valuable data on
your system. However, if you write letters, manage your bank accounts or do many other things on
your computer, you may have files you want to keep. Although you may think your data is safe, it is
surprising how quickly even a small defect can make it inaccessible. Therefore,
you need to be able to store that data on an external medium to keep it safe.
The data stored on an external medium like a floppy or CD ROM is called a backup. The process of
storing the data (or making the copy) is called "making a backup". Sometimes, I will copy files onto
a different hard disk. If the first one crashes, I still have access. Even if you don't have a different
drive, you can still protect your data to a limited extent by copying it onto a different partition or
even a different directory. If the drive develops a problem at the exact spot where your data is, it
might be safe some place else. However, if the whole drive dies, your data is gone.
One advantage of storing it on an external device, is that if the computer completely crashes the
data is completely safe. In many cases, companies will actually store the data at a different location
in case the building burns down or there is some other disaster (no kidding!). Linux provides a
number of different useful tools to help you backup your system. Perhaps the most commonly used
tool is tar, probably because of its simplicity. For example let's say you wanted to make a backup
copy of the entire directory /data, the command might look like this:
tar cvf data.backup /data
Where data.backup is the name of the file in which you want to store the backups of your files.
When tar completes, you have a single file which contains a copy of everything in the /data
directory. One thing that we discussed in another section, is that Linux treats hardware just like
regular files. Therefore instead of using a filename you could use the name of a device file, like this:
tar cvf /dev/tape /data
Assuming you had a tape drive on your system, and you had named it /dev/tape, this command
would backup your data to your tape drive.
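Restoring works much the same way in the other direction: the t option lists what is in an archive and the x option extracts it. Using the examples above:
tar tvf data.backup    # list the contents of the archive
tar xvf data.backup    # extract everything into the current directory
tar xvf /dev/tape      # restore the backup from the tape drive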
Note that there are tools available for Linux which allow you to recover files which you have
removed from your system. This goes into too much depth for now, but there is a how-to.
There are other options which you can use with tar that are very useful:
-z - This compresses the archive using gzip after it has been created. You can also tell tar to use a
different compression program; see the tar man-page for details.
-T, --files-from=FILENAME - Here you can specify a file which contains a list of files you want to
archive. This is useful for system configuration files spread out across your system. Although you
could copy all of your system files into one directory prior to making a backup, this method is much
more efficient.
Typically, when files are removed on Linux, they're gone for good. You can create
your own "trash can" by creating a shell function that actually moves the files into a different
directory, for example:
function rm() {
    # move the files into a trash directory instead of actually deleting them
    mv "$@" /home/jimmo/trashcan
}
Then when you want to clear out the trash, you would use the full path to the rm command:
/bin/rm.

Keep in mind that simply being able to backup files is not enough. Often you do not have enough
space on your tapes to do a complete backup of your system every day. Sometimes, doing a
complete backup takes so long that even if you start right as people go home, there is not enough
time to finish before they come back to work. Further, when trying to restore a complete backup of
your system, it will take longer to find the files you need and thus will take longer to get people
back to work. Therefore, you need a backup strategy, which we discuss in the section on problem
solving.

Interacting with the System


It is common to have people working on UNIX systems that have never worked on a computer
before or have only worked in pure windowing environments, like on a Macintosh. When they get
to the command line, they are lost. On more than one occasion, I have talked to customers and I
have asked them to type in cd /. There is a pause and I hear: click-click-click-click-click-click-click-
click-click-click-click-click. "Hmmm," I think to myself, "that's too many characters." So I ask
them what they typed, and they respond, "cd-space-slash."
We need to adhere to some conventions throughout this site to make things easier. One is that
commands that I talk about will be in your path unless I say otherwise. Therefore, to access them,
all you need to do is input the name of the command without the full path.
The second convention is the translation of the phrases "input the command," "enter the command,"
and "type in the command." These are translated to mean "input/enter/type in the command and
press Enter." I don't know how many times I have talked with customers and have said "type in the
command" and then asked them for what happens and their response is, "Oh, you want me to press
Enter?" Yes! Unless I say otherwise, always press Enter after inputting, entering, or typing in a
command.
Simply having a shell is probably not enough for most users. Although you could probably come up
with an interesting and possibly useful shell script, more than likely you're going to need some
commands to run. There are literally hundreds of different commands that come with your system
by default and there are many more different variations of these commands, which you can
download from the Internet.
Sometimes the commands you issue are not separate files on the hard disk, but rather are built-in to
your shell. For example, the cd command, which is used to change directories, is part of the shell,
whereas the ls command, which is used to display the contents of directories is a separate program.
In some cases one shell has a particular command built-in, but it is not available in another shell.
In general, a command is broken down into three parts:
programname option(s) argument(s)
Note that not all commands have options and you do not always need to have arguments to a
command. For example, the date command does not need any arguments and works just fine without any options.
Some commands are built in to the shell you are using, but may be an external command with a
different shell. For example, the echo command is internal to the bash shell, but you will probably also find
the /bin/echo command on your system.
If you ever run into trouble and are confused about the behavior of your shell, one important thing
to know is what shell you have. If you weren't told what shell you had when your account was
created or you are installing Linux for the first time and really don't know, there are a couple of
ways of finding out. The first is to simply ask the shell. This is done by accessing the $SHELL
environment variable. (We discuss environment variables in detail in the section on shell variables.)
You can display it with the echo command like this:
echo $SHELL
As you might guess, the echo command simply displays on the screen exactly what you told it, in
this case we told it to display the $SHELL variable. (We know it is a variable because of the leading
$, which we also will discuss in the section on shell variables.) What should probably happen is you
get something like this:
/bin/bash
In this case, the shell is /bin/bash. We can also find out what shell we are using by seeing which
programs we are currently running. With Linux, as with other Unix dialects, a running program is
called a " process", and you check your processes using the ps command . You can start it with an
argument simply by inputting ps and pressing the enter key. This will probably get you something
like this: PID TTY TIME CMD 21797 pts/1 00:00:00 bash 6060 pts/1 00:00:00 ps
In this case we see under the heading CMD (for command) only "bash" and not the full pathname as
in the previous example. However, there are options to the ps command which will show us the
path.
The shell you are using is just one piece of information the system maintains in regard to your
current session. Much of this information is stored in the form of variables, like your shell. These
variables are set for you when you login to the system. You can also set variables yourself. In a
Bourne-style shell such as bash, this is a simple assignment, whereas C-shell variants use the set
command. In bash, this might look like this:
VAR=value
Where VAR is the variable name and "value" is the value which you assigned to that variable. Note
that it is not until you want to access the value of the variable that you precede it with the $. To find
out the contents of all variables, you would use the set command by itself with no arguments. This
gives you a long list of variables.
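In bash, for example, you could assign a variable, make it available to programs you start from the shell (with export) and then display it like this (the name and value are arbitrary):
MYVAR="some value"
export MYVAR
echo $MYVAR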
When you login to the system you start in your "home" directory, which can be stored in the
$HOME variable. As we discussed earlier, to change your current directory (also called your
working directory) you use the cd command. If you wanted to return to your home directory, you
can issue the command cd $HOME and your shell will pass the value of the $HOME variable to the
cd, which would then change directories for you. (Note that typically if you use the cd command
with no arguments at all, you change to your home directory by default.)
One part of your environment which is extremely useful to know is the directory you are currently in. To find out, you can simply tell the system to print your current working directory. This is done with the pwd command, which displays the full path to your current directory.
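For example, a short session might look like this (assuming your home directory is /home/jimmo):
pwd
/home/jimmo
cd /tmp
pwd
/tmp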
It is also useful to see what files and directories reside in your current directory. This is done with the ls command (short for "list"). Without any options, the ls command gives you a simple list of what is in your current directory, without any additional information. The output might look like this:
prompt# ls
letter.txt  memo.txt  picture.jpg
You can use the -l option to get a "long" listing of the files and directories. This might show you something like this:
prompt# ls -l
-rw-r--r--   1 jimmo    users     2457 Feb 13 22:00 letter.txt
-rw-r--r--   1 jimmo    users     7426 Feb 15 21:33 memo.txt
-rw-r--r--   1 jimmo    users    34104 Feb 14 21:31 picture.jpg
This information includes the permissions on the file, who owns the file, the size, and so forth. Details of this can be found in the section on permissions.
For a more detailed discussion on how various shells behave see the section on shells.
There are many ways to do the things you want to do. Some use a hammer approach and force the
answer out of the system. In many cases, there are other commands that do the exact same thing
without all the gyrations. So, what I am going to try to do here is step through some of the logic
(and illogic) that I went through when first learning Linux. That way, we can all laugh together at
how silly I was, and maybe you won't make the same mistakes I did.
Every dialect of UNIX that I have seen has the ls command. This gives a directory listing of either
the current directory if no argument is given, or a listing of a particular file or directory if arguments
are specified. The default behavior under Linux for the ls command is to list the names of the files
in a single column. Try it and see.
It is a frequent (maybe not common) misconception for new users to think that they have to be in a
particular directory to get a listing of it. They will spend a great deal of time moving up and down
the directory tree looking for a particular file. Fortunately, they don't have to do it that way. What this misunderstanding overlooks is that every command is capable of working with paths, because it is actually the operating system that does the work. Remember our discussion of Linux basics. Paths can be
relative to our current directory, such as ./directory, or absolute, such as /home/jimmo/directory.
For example, assume that you have a subdirectory of your current working directory called letters.
In it are several subdirectories for types of letters, such as business, school, family, friends, and
taxes. To get a listing of each of these directories, you could write
ls ./letters/business

ls ./letters/school

ls ./letters/family

ls ./letters/friends

ls ./letters/taxes

Because the ls command lets you name several directories on the same line, you also could have issued the command like this:
ls ./letters/business ./letters/school ./letters/family
./letters/friends ./letters/taxes

Both will give you a listing of each of the five directories. Even for five directories, typing all of
that is a pain. You might think you could save some typing if you simply entered
ls ./letters

However, this gives you a listing of all the files and directories in ./letters, not the subdirectories.
Instead, if you entered
ls ./letters/*

the shell would expand the wildcard (*) and give you a listing of both the ./letters directory as well
as the directories immediately below ./letters, like the second example above. If each of the
subdirectories is small, then this might fit onto one screen. If, on the other hand, you have 50 letters
in each subdirectory, they are not all going to fit on the screen at once. Remember our discussion on
shell basics? You can use the pipe (|) to send the command through something like more so that you
could read it a page at a time.
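Using the directories from the example above, that might look like this:
ls ./letters/* | more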
It is common to run commands one right after the other. If you simply press the enter key after the first command, the shell executes it and then returns to the prompt, where you can enter the next one. Often, however, you want to issue two commands in sequence on a single line. This is done by separating the commands with a semi-colon, like this:
command1; command2
Note that these commands are not really connected in any way. The shell simply executes one after
the other. To actually "connect" the commands, you would need to use a pipe. Details on pipes can
be found in the section on shells.
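As a quick illustration, the first line below simply runs date and then ls, one after the other, whereas the second line connects two commands so that wc -l counts the lines of output that ls produces:
date; ls
ls | wc -l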

Logging In
As in many other contexts, the name you use as a "real person" is not necessarily the way the system
identifies you. With Linux you see yourself as a particular user, such as jimmo, whereas the system
might see you as the number 12709. For most of the time this difference is pretty much irrelevant,
as the system makes the conversion between user name and this number (the user ID or UID) itself.
There are a few cases where the difference is important, which we will get to in other sections. In
addition, you could make the conversion yourself to see what your user ID is, which we will also
get to elsewhere.
The place where the system makes this conversion is the file /etc/passwd. You can take a look at it
by typing
cat /etc/passwd
from the command line. Here you find one user per line. The details of this file can be found in the
section on administering user accounts. Note that there are a number of predefined, system users in
the /etc/passwd file and they do not relate to real users. For security reasons, some system
administrators will delete many of these users. However, you should leave them alone unless you
know what you are doing.
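A typical entry for a regular user might look something like this (the values here are purely illustrative); the colon-separated fields are the user name, a password placeholder, the UID, the group ID, a comment, the home directory and the login shell:
jimmo:x:500:100:James Mohr:/home/jimmo:/bin/bash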
If you are installing Linux on your system at home, more than likely you are prompted to select a
user name and password during installation. This is the account you should use for normal day-to-
day work. You should not use the system administration account: root. The root account is "all
powerful" and can essentially do anything to the system you can imagine. If you make a mistake,
the system is unforgiving and if you are working as root, the results can be catastrophic. If you are
used to Windows, this is a major difference. If you administer a Windows NT/2000 system, you are
typically in the Administrators group. This means you automatically have administrator privileges,
which means you can accidentally cause damage. With Linux, you generally have to make a
conscious effort to switch to the root user to carry out your administrative tasks (which is fairly easy and safer).
If you are using a Linux system that someone else installed (perhaps at work), then an account will
need to be created for you. The name of the account will more than likely be unique to your
company, so there's no real need to discuss the different possibilities here. Ask your system
administrator for details.
Keep in mind that both the user name and password are case sensitive. That means it will make a
difference if you spell either with upper or lowercase letters. Using a lowercase letter when creating
the account, then using an uppercase letter when attempting to login, will prevent you from gaining
access to the system.
The process of identifying yourself to the system, whereby you provide your user name and
password, is referred to as " logging in". When the system is ready for you to login you are
presented with a login prompt. That is, you are prompted to login. How the login prompt looks
differs among the various Linux distributions, but generally has the word "login:". Once you input
your username and press the enter key, you are then prompted to input your password. This is done
simply by displaying the word " password:". Typically, you end up seeing something like this:
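On a text console, for example, it might look something like this (the hostname "linux" is just a sample; the exact text varies from one distribution to the next):
linux login: jimmo
Password: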

It is possible that even after installing Linux and rebooting you do not see a login prompt. One possible reason is that the login prompt is there, but you just don't see it. Perhaps someone turned off the monitor, the machine has gone into power-saving mode, and so forth. It is also possible that
your Linux system has been configured to automatically start into the GUI. If the video card was
not correctly configured, you may not be able to see anything on the screen at all. In this case all is
not lost because Linux provides you with something called "virtual consoles". We go into these in detail in the section on virtual consoles.
Bear in mind that the system keeps track of who is currently logged into the system as well as who
has logged in in the past. Since you might be held accountable for things which were done with
your account, it is important to keep your password secret in order to prevent someone from gaining
improper access to the system. This is the reason that when you login you see your username
displayed on the screen as you type it, but not your password. Otherwise, someone could be looking over your shoulder and see your password as you type.
Once you login to the system, it starts your "login shell". In essence, a shell is a command line
interpreter, meaning that the shell interprets and executes the commands you enter. If you are
familiar with either DOS or Windows, the command prompt is basically the same thing as a Linux
shell, in that it is also a command line interpreter. The shell indicates that it is ready to accept a new
command by displaying a shell prompt or command prompt. Typical prompts are #, %, or $. It is also possible that your system administrator has defined a different prompt. It is common to include
your username, your current directory, the system you are working on or other pieces of
information. You can find more details about how the system perceives the shell in the section on
processes in the operating system introduction.
One of the first things that you should do when you login to a system where someone else created
the user account for you is to change your password. Obviously, the person creating your account
will have to know your password in order to be able to tell you what it is. Your password should be
something which is easy for you to remember (so you do not need to write it down), but extremely
difficult for someone else to guess. What constitutes good and bad passwords is something we get
into in the section on security.
The Linux command which is used to change your password is called passwd, which you can start
from any shell prompt. As a security feature, the passwd program will first prompt you for your old
password before allowing you to change to a new one. This is to ensure that you are the person the
system thinks you are. Perhaps you have left your desk for a moment and someone wants to play a
trick on you and changes your password.
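A typical session might look something like this, though the exact prompts vary between distributions:
passwd
Changing password for jimmo.
Old Password:
New Password:
Retype new password: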
The exception is the root account. The system administrator must have the ability to change any
password and could not do this in every case if the old password was always required. For example,
you may have forgotten your password and need the administrator to change it for you. It would do
no good to ask the administrator to change your password if he had to know it first. This is one
reason why you need to be careful when you are working with the root user account.

Logging Out
If you are running Linux at home, then there is probably no need to stop your sessions when you're
finished working. However, if your session is accessible by others, then it is not a bad idea to "log
out" when you're done. Where "logging in" connects you to a session, "logging out" disconnects
you from that session. In most cases it is sufficient simply to type in the word exit to end your
session (exit is actually a command built-in to many shells). It is also possible to exit a shell session
by pressing CTRL-D (holding down the control key and pressing the letter "d").
After you log out, the system typically sends a new login prompt to your terminal.
The details of this process can be found in the section on logging in .

When Things Go Wrong


Until you become very accustomed to using Linux you're likely to make mistakes (which also
happens to people who have been working with Linux for a long time). In this section, we'll be
talking about some common mistakes and problems that occur when you first start using Linux.
Usually when you make mistakes the system will let you know in some way. When using the
command line, the system will tell you in the form of error messages. For example, if you try to
execute a command and the command does not exist, the system may report something like this:
bash: some_command: command not found
Such an error might occur if the command exists, but it does not reside in a directory in your search
path. You can find more about this in the section on directory paths.
The system may still report an error even if it can execute the command, for example, if the command acts on a file that does not exist. The more command, for instance, displays the contents of a file. If the file you want to look at does not exist, you might get the error:
some_file: No such file or directory
In the first example, the error came from your shell as it tried to execute the command. In the
second case, the error came from the more command as it encountered the error when trying to
access the file.
In both these cases, the problem is pretty obvious. In other cases, you are not so sure. Often
you include such commands within shell scripts and want to change the flow of the script based on
errors or success of the program. When a command ends, it provides its "exit code" or "return code"
in the special variable $?. So after a command fails, running this command will show you the exit
code:
echo $?
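As a minimal sketch (some_file and the phrase are just placeholders), a shell script might test the return code like this:
grep "some phrase" some_file > /dev/null
if [ $? -ne 0 ]
then
    echo "grep reported a problem or did not find the phrase"
fi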

Note that it is up to the program to both provide the text message and the return code. Sometimes
you end up with a text message that does not make sense (or there is no text at all), so all you get is
the return code, which is probably even less understandable. To make a translation between the
return code and a text message, check the file /usr/include/asm/errno.h.
You need to be aware that errors on one system (i.e. one Linux distribution) are not necessarily
errors on other systems. For example, if you forget the space in this command, some distributions
will give you an error:
ls-l
However, on SUSE Linux, this will generate the same output as if you had not forgotten the space.
This is because the ls-l is an to the command ls -l. As the name implies, an alias is a way of
referring to something by a different name. For details take a look at the section on aliases.
It has happened before that I have done a directory listing and saw a particular file. When I tried to
remove it, the system told me the file did not exist. The most likely explanation is that I misspelled
the filename, but that wasn't it. What can happen sometimes is that a control character ends up
becoming part of the filename. This typically happens with the backspace as it is not always defined
as the same character on every system. Often the backspace is CTRL-H, but it could happen that
you create a file on a system with a different backspace key and end up creating a filename with
CTRL-H in it. When the filename is displayed, the terminal prints each character and, when it reaches the backspace, backs up one character before continuing. For example, your ls output might show you this file:
jimmo
However, when trying to erase it, you get an error message. To see any "non-printable" characters you can use the -q option to ls. This might show you:
jimmoo?
This says the file name actually contains two o's followed by a non-printable character, in this case a backspace. Since the backspace causes the last 'o' to disappear from the display, you do not see it when the file name is displayed normally.
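One way out of this situation is to let the shell match the name with a wildcard and have rm ask for confirmation before each deletion, something like this:
rm -i jimmo*
You can then answer y only for the file you really want to remove.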
Sometimes you lose control of programs and they seem to "run away". In other cases, a program
may seem to hang and freeze your terminal. Although this can be caused by a bug in the software or a flaky piece of hardware, oftentimes the user has made a mistake he was not even aware of. This
can be extremely frustrating for the beginner, since you do not even know how you got yourself into
the situation, let alone how to get out.
When I first started learning Unix (even before Linux was born) I would start programs and quickly
see that I needed to stop them. I knew I could stop the program with some combination of the
control key and some other letter. In my rush to stop the program, I would press the control key and
many different letters in sequence. On some occasions, the program would simply stop and go no
further. On other occasions, the program would appear to stop, but I would later discover that it was
still running. What happened was that I hit a combination that did not stop the program but did
something else.
In the first example, where the program would stop and go no further, I had "suspended" the
program. In essence, I'd put it to sleep and it would wait for me to tell it to start up again. This is
typically done by pressing CTRL-S. This feature can obviously be useful in the proper
circumstance, but when it is unexpected and you don't know what you did, it can be very unnerving.
To put things right, you resume the command with CTRL-Q.
In the second example, where the program seemed to have disappeared, I had also suspended the program but at the same time had put it in the "background". This special feature of Unix shells dates
from the time before graphical interfaces were common. It was a great waste of time to start a
program and then have to wait for it to complete, when all you were interested in was the output
which you could simply write to file. Instead you put a program in the background and the shell
returned to the prompt, ready for the next command. It's sometimes necessary to do this once a
command is started, which you do by pressing CTRL-Z, which suspends the program, but returns to
the prompt. You then issue the bg command, which starts the previous command in the background.
(This is all part of "job control" which is discussed in
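A minimal sketch of what this looks like in bash, where sleep 600 is just a harmless stand-in for a long-running program (press CTRL-Z after starting it):
sleep 600
[1]+  Stopped                 sleep 600
bg
[1]+ sleep 600 &
jobs
[1]+  Running                 sleep 600 &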
To stop the program, what I actually wanted to do was to "interrupt" it. This is typically done with CTRL-C.
What this actually does is to send a signal to the program, in this case an interrupt signal. You can
define which signal is sent when you press any given combination of keys. We talk about this in the
section on terminal settings.
When you put a command in the background which sends output to the screen, you need to be
careful about running other programs in the meantime. What could happen is that your output gets
mixed up, making it difficult to see which output belongs to which command.
There have been occasions where I have issued a command and the shell jumps to the next line,
then simply displays a greater than symbol (>). What this often means is that the shell does not
think you are done with the command. This typically happens when you are enclosing something on the command line in quotes and you forget to close the quotes. For example if I wanted to search for my
name in a file I would use the grep command. If I were to do it like this:
grep James Mohr filename.txt

I would get an error message saying that the file "Mohr" did not exist.
To issue this command correctly I would have to include my name inside quotes, like this:
grep "James Mohr" filename.txt

However, if I forgot the final quote, for example, the shell would not think the command was done
yet and would perceive the enter key that I pressed as part of the command. What I would need to
do here is to interrupt the command, as we discussed previously. Note this can also happen if you
use single quotes. Since the shell does not see any difference between a single quote and an
apostrophe, you need to be careful with what you type. For example if I wanted to print the phrase
"I'm Jim", I might be tempted to do it like this:/P>
echo I'm Jim
However, the system does not understand contractions and thinks I have not finished the command.
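To get the apostrophe past the shell, you can either enclose the whole phrase in double quotes or escape the apostrophe with a backslash; both of these work in bash:
echo "I'm Jim"
echo I\'m Jim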
As we will discuss in the section on pipes and redirection, you can send the output of a command to
a file. This is done with the greater than symbol (>). The generic syntax looks like this:
command > filename
This can cause problems if the command you issue expects more arguments than you gave it. For
example, if I were searching the contents of a file for occurrences of a particular phrase
grep phrase > filename

What would happen is the shell would drop down to the next line and simply wait forever or until
you interrupted the command. The reason is that the grep command can also take input from the
command line. It is waiting for you to type in text, before it will begin searching. Then if it finds the
phrase you are looking for it will write it into the file. If that's not what you want, the solution here is also to interrupt the command. You can also enter the end-of-file character (CTRL-D), which would
tell grep to stop reading input.
One thing to keep in mind is that you can put a program in the background even if the shell does not understand job control. In this case, it is impossible to bring the command back to the foreground in order to interrupt it. You need to do something else. As we discussed earlier, Linux
provides you a tool to display the processes which you are currently running (the ps command).
Simply typing ps on the command line might give you something like this:
  PID TTY          TIME CMD
29518 pts/3    00:00:00 bash
30962 pts/3    00:00:00 ps
The PID column in the ps output is the process identifier (PID).
If not run in the background, a child process will continue to do its job until it is finished and then report back to its parent when it is done. A little housecleaning is done and the process disappears from the system. However, sometimes the child doesn't end like it is supposed to. One case is when it becomes a "runaway" process. There are a number of causes of runaway processes, but essentially it means that the process is no longer needed but does not disappear from the system. The result is often that the parent cannot end either. In general, the parent should not end until all of its children are done (although there are cases where this is desired). If such processes continue to run, they take up resources and can even bring the system to a standstill.
In cases where you have "runaway" processes, or any other time a process is running that you need to stop, you can send that process a signal to stop execution if you know its PID. This is done with the kill command and the syntax is quite simple:
kill <PID>

By default, the kill command sends a termination signal to that process. Unfortunately, there are
some cases where a process can ignore that termination signal. However, you can send a much more
urgent "kill" signal like this:
kill -9 <PID>

Where "9" is the number of the SIGKILL or kill signal. In general, you should first try to use signal
15 or SIGTERM. This sends a terminate singal and gives the process a chance to end "gracefully".
You should also look to see if the process you want to stop has any children.
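Putting the pieces together, a typical sequence might look something like this, where "runaway_prog" is just a stand-in name for the program causing trouble and the numbers are illustrative:
ps
  PID TTY          TIME CMD
29518 pts/3    00:00:00 bash
30962 pts/3    00:01:23 runaway_prog
kill 30962
If the process is still there after a few seconds, you can resort to kill -9 30962.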
For details on what other signals can be sent and the behavior in different circumstances, look at the kill man-page or simply try kill -l:
 1) SIGHUP       2) SIGINT       3) SIGQUIT      4) SIGILL       5) SIGTRAP
 6) SIGABRT      7) SIGBUS       8) SIGFPE       9) SIGKILL     10) SIGUSR1
11) SIGSEGV     12) SIGUSR2     13) SIGPIPE     14) SIGALRM     15) SIGTERM
17) SIGCHLD     18) SIGCONT     19) SIGSTOP     20) SIGTSTP     21) SIGTTIN
22) SIGTTOU     23) SIGURG      24) SIGXCPU     25) SIGXFSZ     26) SIGVTALRM
27) SIGPROF     28) SIGWINCH    29) SIGIO       30) SIGPWR      31) SIGSYS
35) SIGRTMIN    36) SIGRTMIN+1  37) SIGRTMIN+2  38) SIGRTMIN+3  39) SIGRTMIN+4
40) SIGRTMIN+5  41) SIGRTMIN+6  42) SIGRTMIN+7  43) SIGRTMIN+8  44) SIGRTMIN+9
45) SIGRTMIN+10 46) SIGRTMIN+11 47) SIGRTMIN+12 48) SIGRTMIN+13 49) SIGRTMIN+14
50) SIGRTMAX-14 51) SIGRTMAX-13 52) SIGRTMAX-12 53) SIGRTMAX-11 54) SIGRTMAX-10
55) SIGRTMAX-9  56) SIGRTMAX-8  57) SIGRTMAX-7  58) SIGRTMAX-6  59) SIGRTMAX-5
60) SIGRTMAX-4  61) SIGRTMAX-3  62) SIGRTMAX-2  63) SIGRTMAX-1  64) SIGRTMAX
Keep in mind that sending signals to a process is not just to kill a process. In fact, sending signals to
processes is a common way for processes to communicate with each other. You can find more
details about signals in the section on interprocess communication.
In some circumstances, it is not easy to kill processes by their PID. For example, if something starts dozens of other processes, it is impractical to try to input all of their PIDs. To solve this problem Linux has the killall command, which takes the command name instead of the PID. You can also use the -i, --interactive option to have it ask you whether each process should be killed, or the -w, --wait option to wait for all killed processes to die. Note that if a process ignores the signal or if it is a zombie, then killall may end up waiting forever.
There have been cases where I have frantically tried to stop a runaway program and repeatedly
pressed Ctrl-C. The result is that the terminal gets into an undefined state whereby it does not react properly when you press the various keys. For example, pressing the enter key may not bring you to a new line (which it normally should do). If you try executing a command, it's possible the command is not executed properly because the system has not identified the enter key correctly. You can return your terminal to a "sane" condition by inputting:
stty sane Ctrl-J

The Ctrl-J character is the line feed character and is necessary as the system does not recognize the
enter key.
It has happened to me a number of times that the screen saver was activated and it was as if the system had simply frozen. There were no error messages, no keys worked and the machine did not even respond across the network (telnet, ping, etc.). Unfortunately, the only thing to do in this case is to turn the computer off and then on again.
On the other hand, you can prevent these problems in advance. The most likely cause is that the Advanced Power Management (APM) support is having problems. In this case, you should disable APM within the system. Some machines also have something called "hardware monitoring". This can cause problems as well, and should be disabled.
Problems can also be caused by the Advanced Programmable Interrupt Controller (APIC). This can be deactivated by changing the boot string used by either LILO or GRUB, for example by adding "noapic" to the kernel boot line.

Accessing Disks
For the most part, you need to tell Linux what to do. This gives you a lot of freedom, because it
does what you tell it, but people new to Linux have a number of pre-conceptions from Windows.
One thing you need to do is to tell the system to mount devices like hard disks and CD-ROMs.
Typically Linux sees the CD-ROMs the same way it does hard disks, since they are usually all on
the controllers. The device /dev/hda is the master device on the first controller, /dev/hdb is the slave
device on the first controller, /dev/hdc is the master on the second controller and /dev/hdd is the
slave device on the second controller. To mount a filesystem/disk you use the mount
command.Details on this can be found in the section on hard disks and file systems. Assuming that
your CD-ROM is the master device on the second controller you might mount it like this:
mount DEVICE DIRECTORY

mount /dev/hdc /media/cdrom

Sometimes /media/cdrom does not exist, so you might want to try this instead:
mount /dev/hdc /mnt

Sometimes the system already knows about the CD-ROM device (typically through an entry in /etc/fstab), so you can leave off one component or the other:
mount /media/cdrom

mount /dev/hdc
Chapter IV
Shells and Utilities
Most UNIX users are familiar with "the shell"; it is where you input commands and get output on
your screen. Often, the only contact users have with the shell is logging in and immediately starting
some application. Some administrators, however, have modified the system to the point where users
never even see the shell, or in extreme cases, have eliminated the shell completely for the users.
Because Linux has become so easy to use, it is possible that you can go for quite a long time without having to input commands at a shell prompt. If your only interaction with the operating system is logging into the GUI and starting applications, most of this entire site can only serve to satisfy your curiosity. Obviously, if all you ever do is start a graphical application, then understanding the shell is not all that important. However, if you are like most Linux users,
understanding the basic workings of the shell will do wonders to improve your ability to use the
system to its fullest extent.
Up to this point, we have referred to the shell as an abstract entity. In fact, in most texts, it is usually
referred to as simply "the shell", although there are many different shells that you can use, and there
is always a program that must be started before you can interact with "the shell". Each has its own
characteristics (or even quirks), but all behave in the same general fashion. Because the basic
concepts are the same, I will avoid talking about specific shells until later.
In this chapter, we are going to cover the basic aspects of the shell. We'll talk about how to issue
commands and how the system responds. Along with that, we'll cover how commands can be made
to interact with each other to provide you with the ability to make your own commands. We'll also
talk about the different kinds of shells, what each has to offer, and some details of how particular
shells behave.

The Shell
As I mentioned in the section on introduction to operating systems, the shell is essentially a user's
interface to the operating system. The shell is a command line interpreter, just like those found in other operating systems. In Windows you open up a "command window" or "DOS box" to input commands, which
is nothing other than a command line interpreter. Through it, you issue commands that are
interpreted by the system to carry out certain actions. Often, the state where the system is sitting at a
prompt, waiting for you to type input, is referred to (among other things) as being at the shell
prompt or at the command line.
For many years before the invention of graphical user interfaces, such as X-Windows (the X Window System, for purists), the only way to input commands to the operating system was
through a command line interpreter, or shell. In fact, shells themselves were thought of as wondrous
things during the early days of computers because prior to them, users had no direct way to interact
with the operating system.
Most shells, be they under DOS, UNIX, VMS, or other operating systems, have the same input
characteristics. To get the operating system to do anything, you must give it a command. Some
commands, such as the date command under UNIX, do not require anything else to get them to
work. If you type in date and press Enter, that's what appears on your screen: the date.
Some commands need something else to get them to work: an argument. Some commands, like
mkdir (used to create directories), work with only one argument, as in mkdir directory_name.
Others, like cp (to copy files), require multiple arguments, as in
cp file1 file2

In many cases, you can pass flags to commands to change their behavior. These flags are generally
referred to as options. For example, if you wanted to create a series of sub-directories without
creating every one individually, you could run mkdir with the -p option, like this:
mkdir -p one/two/three/four

In principle, anything added to the command line after the command itself is an argument to that
command. The convention is that an option changes the behavior, whereas an argument is acted
upon by the command. Let's take the mkdir command as an example:
mkdir dir_name

Here we have a single argument which is the name of the directory to be created. Next, we add an
option:
mkdir -p sub_dir/dir_name

The -p is an option. Using the terminology discussed, some arguments are optional and some
options are required. That is, with some commands you must always have an option, such as the tar
command. Some commands don't always need to have an argument, like the date command.
Generally, options are preceded by a dash (-), whereas arguments are not. I've said it before and I
will say it again, nothing is certain when it comes to Linux or UNIX, in general. By realizing that
these two terms are often interchanged, you won't get confused when you come across one or the
other. I will continue to use option to reflect something that changes the command's behavior and
argument to indicate something that is acted upon. In some places, you will also see arguments
referred to as "operands". An operand is simply something on which the shell "operates", such as a
file, directory or maybe even simple text.
Each program or utility has its own set of arguments and options, so you will have to look at the
man-pages for the individual commands. You can call these up from the command line by typing in
man <command_name>

where <command_name> is the name of the command you want information about. Also, if you are
not sure what the command is, many Linux versions have the whatis command that will give you a
brief description. There is also the apropos command, which searches through the man-pages for
words you give as arguments. Therefore, if you don't know the name of the command, you can still
find it.
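For example, the output of whatis and apropos might look something like this (the exact descriptions come from the man-pages installed on your system):
whatis date
date (1)             - print or set the system date and time
apropos calendar
cal (1)              - displays a calendar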
Arguments (whether they are options or operands) which are enclosed in square brackets ([ ]) are
optional. In some cases, there are optional components to the optional arguments, so you may end
up having brackets within brackets.
An ellipsis (...) Indicates that the preceding arguments can be repeated. For example, the ls
command can take multiple file or directory names as arguments as well as multiple options.
Therefore, you might have a usage message that looks like this:
ls [OPTION] ... [FILE] ...

This tells us that no options are required, but if you wanted you could use multiple options. It also
tells us that no file name is required, but if you wanted you could use multiple ones.
Words that appear in angle brackets (< >), or possibly in italics in the printed form, indicate that the word is a place holder, as in the example below:
man <filename>

Many commands require that an option appear immediately after the command and before any
arguments. Others have options and arguments interspersed. Again, look at the man-page for the
specifics of a particular command.
Often, you just need a quick reminder as to what the available options are and what their syntax is.
Rather than going through the hassle of calling up the man-page, a quick way is to get the command
to give you a usage message. As its name implies, a usage message reports the usage of a particular
command. I normally use -? as the option to force the usage message, as I cannot think of a
command where -? is a valid option. Your system may also support the --help (two dashes) option.
More recent versions of the various commands will typically give you a usage message if you use
the wrong option. Note that fewer and fewer commands support the -?.
To make things easier, the letter used for a particular option is often related to the function it serves.
For example, the -a option to ls says to list "all" files, even those that are "hidden". On older
versions of both Linux and Unix, options typically consisted of a single letter, often both upper and lowercase letters. Although this meant you could have 52 different options, it made remembering them difficult if there were multiple functions that all began with the same letter. Multiple options
can either be placed separately, each preceded by a dash, or combined. For example, both of these
commands are valid and have the exact same effect:
ls -a -l

ls -al

In both cases you get a long listing which also includes all of the hidden files.
Newer versions of commands typically allow for both single letter options and "long options"
which use full words. For example, the long equivalent of -a would be --all. Note that the long options are preceded by two dashes because --all would otherwise be indistinguishable from -a followed by two -l options.
Although it doesn't happen too often, you might end up with a situation where one of the arguments
to your command starts with a dash (-), for example a file name. Since options typically start with a
dash, the command cannot figure out that it is an argument and not a long line of options. Let's assume that some application created a file called "-jim". If I wanted to do a simple listing of the file, I might try this:
ls -jim
However, since the ls command first tries to figure out what options are being used before it shows you the listing, it thinks that these are all options and gives you the error message:
ls: invalid option -- j
Try `ls --help' for more information.
You can solve this problem with some commands by using two dashes to tell the command that
what follows is actually an argument. So to get the listing in the previous example, the command
might look like this:
ls -- -jim
The Search Path
It may happen that you know there is a program by a particular name on the system, but when you try to start it from the command line, you are told that the file is not found. Because you just ran it yesterday, you assume it has gotten removed or you don't remember the spelling.
The most common reason for this is that the program you want to start is not in your search path. Your search path is a predefined set of directories in which the system looks for the program you type in from the command line (or that is started by some other command). This saves time because the system does not have to look through every directory trying to find the program. Unfortunately, if the program is not in one of the directories specified in your path, the system cannot start the program unless you explicitly tell it where to look. To do this, you must specify either the full path of the command or a path relative to where you are currently located.
Let's look at this issue for a minute. Think back to our discussion of files and directories. I mentioned that every file on the system can be referred to by a unique combination of path and file name. This applies to executable programs as well. By inputting the complete path, you can run any program, whether it is in your path or not.
Let's take a program that is in everyone's path, like date (at least it should be). The date program resides in the /bin directory, so its full path is /bin/date. If you wanted to run it, you could type in /bin/date, press Enter, and you might get something that looks like this:
Sat Jan 28 16:51:36 PST 1995
However, because date is in your path, you need to input only its name, without the path, to get it to run.
One problem that regularly crops up for users coming from a DOS environment is that the only place Linux looks for commands is in your path. Under DOS, even if the current directory is not specified in your path, it is the first place DOS looks. This is not so for Linux. UNIX shells only look in your path.
For most users, this is not a problem, as the current directory is usually included in your path by default. Therefore, the shell will still be able to execute something in your current directory. Root does not have the current directory in its path. In fact, this is the way it should be. If you want to include the current directory in root's path, make sure it is the last entry in the path so that all "real" commands are executed before any other command that a user might try to "force" on you. In fact, I suggest that every user add the current directory only at the end of their path.
Assume a malicious user created a "bad" program in his/her directory called more. If root were to run more in that user's directory, it could have potentially disastrous results. (Note that the current directory normally appears at the end of the path. So, even if there were a program called more in the current directory, the one in /bin would probably get executed first. However, you can see how this could cause problems for root.) To figure out exactly which program is actually being run, you
can use the (what else?) which command.
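For example, to see which more would actually be run, you could type the following (the path shown depends on your system):
which more
/bin/more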
Newer versions of the bash shell can be configured to not only complete commands automatically by inputting just part of the command, but also arguments to the command, as well as directory names. See the bash man-page for more details.
Commands can also be started by including a directory path, whether or not they are in your search path. You can use relative or absolute paths, usually with the same result. Details on this can be
found in the section on directory paths.
One very important environment variable is the PATH variable. Remember that the PATH tells the shell where it needs to look when determining what command it should run. One of the things the shell does to make sense of your command is to find out exactly what program you mean. This is done by looking for the program in the places specified by your PATH.
Although it is more accurate to say that the shell looks in the directories specified by your PATH, it is commonly said that the shell "searches your path." Because this is easier to type, I am going to use that convention here.
If you specify a path in the command name, the shell does not use your PATH to do any searching. That is, if you issued the command bin/date, the shell would interpret that to mean that you wanted to execute the command date that was in the bin subdirectory of your current directory. If you were in / (the root directory), all would be well and it would effectively execute /bin/date. If you were somewhere else, the shell might not be able to find a match.
If you do not specify any path (that is, the command does not contain any slashes), the system will search through your path. If it finds the command, great. If not, you get a message saying the command was not found.
Let's take a closer look at how this works by looking at my path. From the command line, if I type
echo $PATH
I get
/usr/local/bin:/bin:/usr/bin:/usr/X11/bin:/home/jimmo/bin:/:.
WATCH THE DOT!
If I type in date, the shell first looks in /usr/local/bin. Not finding it there, it looks in /bin. Because that's where date resides, it is executed as /bin/date. If I type in vi, the shell looks in /usr/local/bin and /bin, doesn't find it, then looks in /usr/bin, where it does find vi. Now I type in getdev. (This is a program I wrote to translate major device numbers into the driver name. Don't worry if you don't know what a major number is. You will shortly.) The shell looks in /usr/local/bin and doesn't find it. It then looks in /bin. Still not there. It then tries /usr/bin and /usr/X11/bin and still can't find it. When it finally gets to /home/jimmo/bin, it finds the getdev command and executes it. (Note that because I wrote this program, you probably won't have it on your system.)
What would happen if I had not yet copied the program into my personal bin directory? Well, if the getdev program is in my current directory, the shell finds a match with the last "." in my path. (Remember that the "." is translated into the current directory, so the program is executed as ./getdev.) If that final "." was missing or the getdev program was somewhere else, the shell could not find it and would tell me so with something like
getdev: not found
One convention that I (as well as most other authors) will use is that the commands we talk about will be in your path unless we specifically say otherwise. Therefore, to access them, all you need to do is input the name of the command without the full path.

Directory Paths
As we discussed in the section on the search path, you can often start programs simply by inputting their name, provided they lie in your search path. You could also start a program by referencing it
through a relative path, the path in relation to your current working directory. To understand the
syntax of relative paths, we need to backtrack a moment. As I mentioned, you can refer to any file
or directory by specifying the path to that directory. Because they have special significance, there is
a way of referring to either your current directory or its parent directory. The current directory is
referenced by "." and its parent by ".." (often referred to in conversation as "dot" and "dot-dot").
Because directories are separated from files and other directories by a /, a file in the current
directory could be referenced as ./file_name and a file in the parent directory would be referenced as
../file_name. You can reference the parent of the parent by just tacking on another ../, and then
continue on to the root directory if you want. So the file ../../file_name is in a directory two levels
up from your current directory. This slash (/) is referred to as a forward slash, as compared to a
back-slash (\), which is used in DOS to separate path components.
When interpreting your command line, the shell interprets everything up to the last / as a directory
name. If we were in the root (upper-most) directory, we could access date in one of several ways.
The first two, date and /bin/date, we already know about. Knowing that ./ refers to the current
directory means that we could also get to it like this: ./bin/date. This is saying relative to our current
directory (./), look in the bin subdirectory for the command date. If we were in the /bin directory, we
could start the command like this: ./date. This is useful when the command you want to execute is
in your current directory, but the directory is not in your path. (More on this in a moment.)
We can also get the same results from the root directory by starting the command like this: bin/date.
If there is a ./ at the beginning, the shell knows that everything is relative to the current directory. If the command begins with a /, the system knows that everything is relative to the root directory. If no
slash is at the beginning, the system searches until it gets to the end of the command or encounters a
slash whichever comes first. If there is a slash there (as in our example), it translates this to be a
subdirectory of the current directory. So executing the command bin/date is translated the same
as ./bin/date.
Let's now assume that we are in our home directory, /home/jimmo (for example). We can obviously
access the date command simply as date because it's in our path. However, to access it by a relative
path, we could say ../../bin/date. The first ../ moves up one level into /home. The second ../ moves up
another level to /. From there, we look in the subdirectory bin for the command date. Keep in mind
that throughout this whole process, our current directory does not change. We are still in
/home/jimmo.
Searching your path is only done for commands. If we were to enter vi file_name (vi is a text editor)
and there was no file called file_name in our current directory, vi would start editing a new file. If
we had a subdirectory called text where file_name was, we would have to access it either as vi
./text/file_name or vi text/file_name. Of course, we could access it with the absolute path of vi
/home/jimmo/text/file_name.
When you input a path yourself (either to a command or to a file), the shell interprets each component of the pathname before passing it to the appropriate command. This allows you to come up with some
pretty convoluted pathnames if you so choose. For example:
cd /home/jimmo/data/../bin/../../chuck/letters
This example would be interpreted as first changing into the directory /home/jimmo/data/, moving
back up to the parent directory (..), then into the subdirectory bin, back into the parent and its parent
(../../) and then into the subdirectory chuck/letters. Although this is a pretty contrived example, I
know many software packages that rely on relative paths and end up with directory references
similar to this example.

Current directory          Target directory                Relative path        Absolute path
/data/home/jimmo/letter    /data/home/jimmo/letter/dave    ./dave or dave       /data/home/jimmo/letter/dave
/data/home/jimmo/letter    /data/home/jimmo                ../                  /data/home/jimmo
/data/home/jimmo/letter    /data/home                      ../..                /data/home
/data/home/jimmo/letter    /tmp                            ../../../../tmp      /tmp

Shell Variables
The shell's environment is all the information that the shell will use as it runs. This includes such
things as your command search path, your logname (the name you logged in under), and the
terminal type you are using. Collectively, they are referred to as your environment variables and
individually, as the "so-and-so" environment variable, such as the TERM environment variable,
which contains the type of terminal you are using.
When you log in, most of these are set for you in one way or another. (The mechanism that sets all
environment variables is shell-dependent, so we will talk about it when we get to the individual
shells.) Each environment variable can be viewed by simply typing echo $VARIABLE. For
example, if I type
echo $LOGNAME

I get:
jimmo
Typing
echo $TERM

I get:
ansi
In general, variables that are pre-defined by the system (e.g. PATH, LOGNAME, HOME) are
written in capital letters. Note that this is not a requirement as there are exceptions.
Note that shell variables are only accessible from the current shell. In order for them to be
accessible to child processes (i.e. sub-processes) they must be made available using the export command. In the system-wide shell configuration file or "profile" (/etc/profile), many variables, such as PATH, are exported. More information on processes can be found in the section on processes in
the chapter "Introduction to Operating Systems".
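As a minimal illustration (MYVAR is just an example name), you could define a variable and make it available to child processes like this:
MYVAR=value
export MYVAR
In bash you can also combine the two steps as export MYVAR=value.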
It is very common that a user's shell prompt is defined by the system. For example, you might have something that looks like this:
PS1='\u@\h:\w> '
What this does is to set the first level prompt variable PS1 to include the username, hostname and
the current working directory. This ends up looking something like this:
jimmo@linux:/tmp>
Adding the \A to display the time, we end up with something that looks like this:
10:09 jimmo@linux:/tmp>
Variable Meaning
\u Username
\h Hostname
\H The fully-qualified hostname
\w Current working directory
\d date
\t the current time in 24-hour HH:MM:SS format
\T the current time in 12-hour HH:MM:SS format
\@ the current time in 12-hour am/pm format
\A the current time in 24-hour HH:MM format
\l the basename of the shell's terminal device
\e Escape character
\n newline
\r carriage return
One way of using the escape character in your prompt is to send a terminal control sequence. This can be used, for example, to change the prompt so that the time is shown in red:
PS1='\e[31m\A\e[0m \u@\h:\w> '
Which then looks like this: 10:09 jimmo@linux:/tmp>

Permissions
All this time we have been talking about finding and executing commands, but there is one issue
that I haven't mentioned. That is the concept of permissions. To access a file, you need to have
permission to do so. If you want to read a file, you need to have read permission. If you want to
write to a file, you need to have write permission. If you want to execute a file, you must have
execute permission.
Permissions are set on a file using the chmod command or when the file is created (the details of
which I will save for later). You can read the permissions on a file by using either the l command or
ls -l. At the beginning of each line will be ten characters, which can either be dashes or letters. The
first position is the type of the file, whether it is a regular file (-), a directory (d), a block device file
(b), and so on. Below are some examples of the various file types.

- - regular file
c - character device
b - block device
d - directory
p - named pipe
l - symbolic link
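On a typical system, a listing of a few of these file types might look something like this (the owners, sizes and dates shown here are only illustrative):
ls -ld /bin /dev/hda1 /dev/tty1 /usr/bin/vi
drwxr-xr-x   2 root  root      4096 Feb 10 09:13 /bin
brw-rw----   1 root  disk    3,  1 Feb 10 09:13 /dev/hda1
crw-rw----   1 root  tty     4,  1 Feb 10 09:13 /dev/tty1
lrwxrwxrwx   1 root  root         3 Feb 10 09:13 /usr/bin/vi -> vim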
We'll get into the details of these files as we move along. If you are curious about the format of each
entry, you can look at the ls man-page.
The next nine positions are broken into three groups. Each group consists of three characters
indicating the permissions. They are, in order, read(r), write(w), and execute(x). The first set of
characters indicates what permissions the owner of the file has. The second set of characters
indicates the permissions for the group of that file. The last set of characters indicates the
permissions for everyone else.
If a particular permission is not given, a dash (-) will appear here. For example, rwx means all three
permissions have been given. In our example above, the symbolic link /usr/bin/vi has read, write,
and execute permissions for everyone. The device nodes /dev/tty1 and /dev/hda1 have permissions
rw- for the owner and group, meaning only read and write, but not execute permissions have been
given. The directory /bin has read and execute permissions for everyone (r-x), but only the owner
can write to it (rwx).
For directories, the situation is slightly different than for regular files. If you do not have read
permission on a directory, you cannot read the contents of that directory. Also, if you do not have
write permission on a directory, you cannot write to it. This means that you cannot create a new file
in that directory. Execute permission on a directory means that you can search it, that is, access the files and subdirectories within it. If the execute bit is not set on a directory but the read bit is, you can see what files are in the directory but cannot access any of the files or even change into that directory. If you have execute permission but no read permission, you can access the files and change into the directory, but you cannot see what files the directory contains. Write permission on a directory also has an interesting side effect. Because
you need to have write permission on a directory to create a new file, you also need to have write
permission to remove an existing file. Even if you do not have write permission on the file itself, if
you can write to the directory, you can erase the file.
At first this sounds odd. However, remember that a directory is nothing more than a file in a special
format. If you have write permission to a directory-file, you can remove the references to other files,
thereby removing the files themselves.
If we were to set the permissions for all users so that they could read, write, and execute a file, the
command would look this:
chmod 777 filename

You can also use symbolic permissions to accomplish the same thing. We use the letters u, g, and o
to specify the user(owner), group, and others for this file, respectively. The permissions are then r
for read, w for write, and x for execute. So to set the permissions so that the owner can read and
write a file, the command would look like this:
chmod u=rw filename

Note that in contrast to the absolute numbers, setting the permissions symbolically is additive. So,
in this case, we would just change the user's permissions to read and write, but the others would
remain unchanged. If we changed the command to this
chmod u+w filename

we would be adding write permission for the user of that file. Again, the permissions for the others
would be unchanged.
To make the permissions for the group and others to be the same as for the user, we could set it like
this
chmod go=u filename

which simply means "change the mode so that the permissions for the group and others equals the
user." We also could have set them all explicitly in one command, like this
chmod u=rw,g=rw,o=rw filename

which has the effect of setting the permissions for everyone to read and write. However, we don't
need to write that much.
Combining the commands, we could have something that looks like this:
chmod u=rw,go=u filename

This means "set the permissions for the user to read and write, then set the permissions for group
and others to be equal to the user."
Note that each of these changes is done in sequence. So be careful what changes are made. For
example, let's assume we have a file that is read-only for everyone. We want to give everyone write
permission for it, so we try
chmod u+w,gu=o filename

This is a typo because we meant to say go=u. The effect is that we added write permission for the user, but then set the permissions on the group and user to the same as others.
We might want to try adding the write permissions like this:
chmod +w filename

This works on some systems, but not on the Linux distributions that I have seen. According to the
man-page, this will not change those permissions where the bits in the UMASK are set. (More on
this later. See the chmod man-page for details.)
To get around this, we use "a" to specify all users. Therefore, the command would be
chmod a+w filename

There are a few other things that you can do with permissions. For example, you can set a program
to change the UID of the process when the program is executed. For example, some programs need
to run as root to access other files. Rather than giving the user the root password, you can set the
program so that when it is executed, the process is run as root. This is a Set-UID, or SUID program.
If you want a program to run with a particular group ID, you would make it a Set-GID, or SGID, program by using the s option to chmod, like this
chmod u+s program or chmod g+s program
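When one of these bits is set, it shows up in place of the x in the long listing. A classic example is the
passwd program, which needs root privileges to update the password database. On most Linux systems
the listing looks something like this (the exact size and date will of course differ):
ls -l /usr/bin/passwd
-rwsr-xr-x 1 root root 27936 Mar 23 2019 /usr/bin/passwd
The s in the owner's execute position shows that the SUID bit is set; an SGID program would show the
s in the group's execute position instead.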

There are a few other special cases, but I will leave it up to you to check out the chmod man-page if
you are interested.
When you create a file, the access permissions are determined by the file creation mask. This is
defined by the UMASK variable and can be set using the umask command. One thing to keep in
mind is that this is a mask. That is, it masks out permissions rather than assigning them. If you
remember, permissions on a file can be set using the chmod command and a three-digit value. For
example
chmod 600 letter.john

explicitly sets the permissions on the file letter.john to 600 (read and write permission for the user
and nothing for everyone else). If we create a new file, the permissions might be 660 (read/write for
user and group). This is determined by the UMASK. To understand how the UMASK works, you
need to remember that the permissions are octal values, which are determined by the permissions
bits. Looking at one set of permissions we have

bit: 2 1 0
value: 4 2 1
symbol: r w x
which means that if the bit with value 4 is set (bit 2), the file can be read; if the bit with value 2 is
set (bit 1), the file can be written to; and if the bit with value 1 is set (bit 0), the file can be executed.
If multiple bits are set, their values are added together. For example, if bits 2 and 1 are set
(read/write), the value is 4+2=6. Just as in the example above, if all three are set, we have 4+2+1=7.
Because there are three sets of permissions (owner, group, others), the permissions are usually used
in triplets, just as in the chmod example above.
The UMASK value masks out the bits. The permissions that each position in the UMASK masks
out are the same as the file permissions themselves. So, the left-most position masks out the owner
permission, the middle position the group, and the right most masks out all others. If we have
UMASK=007, the permissions for owner and group are not touched. However, for others, we have
the value 7, which is obtained by setting all bits. Because this is a mask, all bits are unset. (The way
I remember this is that the bits are inverted. Where it is set in the UMASK, it will be unset in the
permissions, and vice versa.)
The problem many people have is that the umask command does not force permissions, but rather
limits them. For example, if we had UMASK=007, we could assume that any file created has
permissions of 770. However, this depends on the program that is creating the file. If the program is
creating a file with permissions 777, the umask will mask out the last bits and the permissions will,
in fact, be 770. However, if the program creates permissions of 666, the last bits are still masked
out. However, the new file will have permissions of 660, not 770. Some programs, like the C
compiler, do generate files with the execution bit (bit 0) set. However, most do not. Therefore,
setting UMASK=007 does not force creation of executable programs unless the program
creating the file sets the execute bit itself.
Let's look at a more complicated example. Assume we have UMASK=047. If our program creates a
file with permissions 777, then our UMASK does nothing to the first digit, but masks out the 4 from
the second digit, giving us 3. Then, because the last digit of the UMASK is 7, this masks out
everything, so the permissions here are 0. As a result, the permissions for the file are 730. However,
if the program creates the file with permissions 666, the resulting permissions are 620. The easy
way to figure out the effect of the UMASK is to subtract the UMASK from the default permissions
that the program sets, treating any negative digit as 0. (Strictly speaking, the mask clears individual
bits rather than subtracting values, but for the usual masks this rule of thumb gives the same result.)
As I mentioned, one way the UMASK is set is through the environment variable UMASK. You can
change it anytime using the umask command. The syntax is simply
umask <new_umask>
Here the <new_umask> can either be the numeric value (e.g., 007) or symbolic. Note that in the
symbolic notation you specify the permissions that are to be kept, not the ones to be masked out. For
example, to set the umask to 047 using the symbolic notation, we have
umask u=rwx,g=wx,o=
This has the effect of removing no permissions from the user, removing read permission from the
group, and removing all permissions from others.
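To see the effect, here is a quick sketch. The touch command asks for permissions of 666, so with a
umask of 047 the group is left with only write permission (the owner, group, size, and date in the
listing will differ on your system):
umask 047
touch newfile
ls -l newfile
-rw--w---- 1 jimmo users 0 Feb 01 16:39 newfile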
Being able to change the permissions on a file is often not enough. What if the only person that
should be able to change a file is not the owner? Simple! You change the owner. This is
accomplished with the chown command, which has the general syntax:
chown new_owner filename

Where "new_owner" is the name of the user account we want to sent the owner of the file to, and
"filename" is the file we want to change. In addition, you can use chown to change not only the
owner, but the group of the file as well. This has the general syntax:
chown new_owner:new_group filename

Another useful trick is the ability to set the owner and group to the same ones as another file. This is
done with the --reference= option, which you set to the name of the file you are referencing. If you
want to change just the group, you can use the chgrp command, which has the same basic syntax as
chown. Note that both chgrp and chmod can also take the --reference= option. Further, all three of
these commands take the -R option, which recursively changes the permissions, owner, or group.
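For example, assuming a user jimmo and a group users exist on your system, the commands might look
like this:
chown jimmo letter.chris                  # change the owner
chown jimmo:users letter.chris            # change the owner and the group
chgrp users letter.chris                  # change just the group
chown --reference=note.chris letter.chris # give letter.chris the same owner and group as note.chris
chown -R jimmo:users letters              # recursively change everything under the letters directory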

Regular Expressions and Metacharacters


Often, the arguments that you pass to commands are file names. For example, if you wanted to edit
a file called letter, you could enter the command vi letter. In many cases, typing the entire name is
not necessary. Built into the shell are special characters that it will use to expand the name. These
are called metacharacters.
The most common metacharacter is *. The * is used to represent any number of characters,
including zero. For example, if we have a file in our current directory called letter and we input
vi let*

the shell would expand this to


vi letter

Or, if we had a file simply called let, this would match as well.
Instead, what if we had several files called letter.chris, letter.daniel, and letter.david? The shell
would expand them all out to give me the command
vi letter.chris letter.daniel letter.david

We could also type in vi letter.da*, which would be expanded to


vi letter.daniel letter.david

If we only wanted to edit the letter to chris, we could type it in as vi *chris. However, if there were
two files, letter.chris and note.chris, the command vi *chris would have the same results as if we
typed in:
vi letter.chris note.chris

In other words, no matter where the asterisk appears, the shell expands it to match every name it
finds. If my current directory contained files with matching names, the shell would expand them
properly. However, if there were no matching names, file name expansion couldn't take place and
the file name would be taken literally.
For example, if there were no file name in our current directory that began with letter, the command
vi letter*

could not be expanded and we would end up editing a new file called (literally) letter*, including
the asterisk. This would not be what we wanted.
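Whether an unmatched pattern is passed through literally like this is actually configurable in some
shells. In bash, for example, you can change the behavior with shell options; a quick sketch (these
settings affect only the current shell session):
shopt -s nullglob     # unmatched patterns expand to nothing instead of themselves
shopt -s failglob     # or: treat an unmatched pattern as an error
shopt -u nullglob failglob    # turn both options back off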
What if we had a subdirectory called letters? If it contained the three files letter.chris, letter.daniel,
and letter.david, we could get to them by typing
vi letters/letter*

This would expand to be: vi letters/letter.chris letters/letter.daniel letters/letter.david


The same rules for path names with commands also apply to file names. The command
vi letters/letter.chris

is the same as
vi ./letters/letter.chris

which is the same as


vi /home/jimmo/letters/letter.chris

This is because the shell is doing the expansion before it is passed to the command. Therefore, even
directories are expanded. And the command
vi le*/letter.*

could be expanded as both letters/letter.chris and lease/letter.joe, or any similar combination


The next wildcard is ?. This is expanded by the shell as one, and only one, character. For example,
the command vi letter.chri? is the same as vi letter.chris. However, if we were to type in vi
letter.chris? (note that the "?" comes after the "s" in chris), the result would be that we would begin
editing a new file called (literally) letter.chris?. Again, not what we wanted. This wildcard could be
used if, for example, there were two files named letter.chris1 and letter.chris2. The command vi
letter.chris? would be the same as
vi letter.chris1 letter.chris2

Another commonly used metacharacter is actually a pair of characters: [ ]. The square brackets are
used to represent a list of possible characters. For example, if we were not sure whether our file was
called letter.chris or letter.Chris, we could type in the command as: vi letter.[Cc]hris. So, no matter
if the file was called letter.chris or letter.Chris, we would find it. What happens if both files exist?
Just as with the other metacharacters, both are expanded and passed to vi. Note that in this example,
vi letter.[Cc]hris appears to be the same as vi letter.?hris, but it is not always so.
The list that appears inside the square brackets does not have to be an upper- and lowercase
combination of the same letter. The list can be made up of any letter, number, or even punctuation.
(Note that some punctuation marks have special meaning, such as *, ?, and [ ], which we will cover
shortly.) For example, if we had five files, letter.chris1-letter.chris5, we could edit all of them with
vi letter.chris[12435].
A nice thing about this list is that if it is consecutive, we don't need to list all possibilities. Instead,
we can use a dash (-) inside the brackets to indicate that we mean a range. So, the command
vi letter.chris[12345]

could be shortened to
vi letter.chris[1-5]

What if we only wanted the first three and the last one? No problem. We could specify it as
vi letter.chris[1-35]
This does not mean that we want files letter.chris1 through letter.chris35! Rather, we want
letter.chris1, letter.chris2, letter.chris3, and letter.chris5. All entries in the list are seen as individual
characters.
Inside the brackets, we are not limited to just numbers or just letters; we can use both. The
command vi letter.chris[abc123] has the potential for editing six files: letter.chrisa, letter.chrisb,
letter.chrisc, letter.chris1, letter.chris2, and letter.chris3.
If we are so inclined, we can mix and match any of these metacharacters any way we want. We can
even use them multiple times in the same command. Let's take as an example the command
vi *.?hris[a-f1-5]

Should they exist in our current directory, this command would match all of the following:
letter.chrisa note.chrisa letter.chrisb note.chrisb letter.chrisc
note.chrisc letter.chrisd note.chrisd letter.chrise note.chrise
letter.chris1 note.chris1 letter.chris2 note.chris2 letter.chris3
note.chris3 letter.chris4 note.chris4 letter.chris5 note.chris5
letter.Chrisa note.Chrisa letter.Chrisb note.Chrisb letter.Chrisc
note.Chrisc letter.Chrisd note.Chrisd letter.Chrise note.Chrise
letter.Chris1 note.Chris1 letter.Chris2 note.Chris2 letter.Chris3
note.Chris3 letter.Chris4 note.Chris4 letter.Chris5 note.Chris5
Also, any name with something other than letter or note at the beginning would match, as long as
the rest of the name fits the pattern. Or, if we issued the
command:
vi *.d*

these would match


letter.daniel note.daniel letter.david note.david
Remember, I said that the shell expands the metacharacters only with respect to the name specified.
This obviously works for file names as I described above. However, it also works for command
names as well.
If we were to type dat* and there was nothing in our current directory that started with dat, we
would get a message like
dat*: not found
However, if we were to type /bin/dat*, the shell could successfully expand this to be /bin/date,
which it would then execute. The same applies to relative paths. If we were in / and entered
./bin/dat* or bin/dat*, both would be expanded properly and the right command would be executed.
If we entered the command /bin/dat[abcdef], we would get the right response as well because the
shell tries all six letters listed and finds a match with /bin/date.
An important thing to note is that the shell expands as long as it can before it attempts to interpret a
command. I was reminded of this fact by accident when I input /bin/l*. If you do an
ls /bin/l*

you should get the output:
-rwxr-xr-x 1 root root 22340 Sep 20 06:24 /bin/ln
-r-xr-xr-x 1 root root 25020 Sep 20 06:17 /bin/login
-rwxr-xr-x 1 root root 47584 Sep 20 06:24 /bin/ls
At first, I expected each one of the files in /bin that began with an "l" (ell) to be executed. Then I
remembered that expansion takes place before the command is interpreted. Therefore, the command
that I input, /bin/l*, was expanded to be
/bin/ln /bin/login /bin/ls

Because /bin/ln was the first command in the list, the system expected that I wanted to link the two
files together (what /bin/ln is used for). I ended up with the error message:
/bin/ln: /bin/ls: File exists
This is because the system thought I was trying to link the file /bin/login to /bin/ls, which already
existed. Hence the message.
The same thing happens when I input /bin/l? because the /bin/ln is expanded first. If I issue the
command /bin/l[abcd], I get the message that there is no such file. If I type in
/bin/l[a-n]
I get:
/bin/ln: missing file argument
because the /bin/ln command expects two file names as arguments and the only thing that matched
is /bin/ln.
I first learned about this aspect of shell expansion after a couple of hours of trying to extract a
specific subdirectory from a tape that I had made with the cpio command. Because I made the tape
using absolute paths, I attempted to restore the files as /home/jimmo/data/*. Rather than restoring
the entire directory as I expected, it did nothing. It worked its way through the tape until it got to the
end and then rewound itself without extracting any files.
At first I assumed I made a typing error, so I started all over. The next time, I checked the command
before I sent it on its way. After half an hour or so of whirring, the tape was back at the beginning.
Still no files. Then it dawned on me that I hadn't told cpio to overwrite existing files
unconditionally. So I started it all over again.
Now, those of you who know cpio realize that this wasn't the issue either. At least not entirely.
When the tape got to the right spot, it started overwriting everything in the directory (as I told it to).
However, the files that were missing (the ones that I really wanted to get back) were still not copied
from the backup tape.
The next time, I decided to just get a listing of all the files on the tape. Maybe the files I wanted
were not on this tape. After a while it reached the right directory and lo and behold, there were the
files that I wanted. I could see them on the tape, I just couldn't extract them.
Well, the first idea that popped into my mind was to restore everything. That's sort of like fixing a
flat tire by buying a new car. Then I thought about restoring the entire tape into a temporary
directory where I could then get the files I wanted. Even if I had the space, this still seemed like the
wrong way of doing things.
Then it hit me. I was going about it the wrong way. The solution was to go ask someone what I was
doing wrong. I asked one of the more senior engineers (I had only been there less than a year at the
time). When I mentioned that I was using wildcards, it was immediately obvious what I was doing
wrong (obvious to him, not to me).
Let's think about it for a minute. It is the shell that does the expansion, not the command itself (as
when I ran /bin/l*). The shell interprets the command as starting with /bin/l. Therefore, I get a
listing of all the files in /bin that start with "l". With cpio, the situation is similar.
When I first ran it, the shell interpreted the files (/home/jimmo/data/*) before passing them to cpio.
Because I hadn't told cpio to overwrite the files, it did nothing. When I told cpio to overwrite the
files, it only did so for the files that it was told to. That is, only the files that the shell saw when it
expanded /home/jimmo/data/*. In other words, cpio did what it was told. I just told it to do
something that I hadn't expected.
The solution is to find a way to pass the wildcards to cpio. That is, the shell must ignore the special
significance of the asterisk. Fortunately, there is a way to do this. By placing a backslash (\) before the
metacharacter, you remove its special significance. This is referred to as "escaping" that character.
So, in my situation with cpio, when I referred to the files I wanted as /home/jimmo/data/\*, the shell
passed the arguments to cpio as /home/jimmo/data/*. It was then cpio that expanded the * to mean
all the files in that directory. Once I did that, I got the files I wanted.
You can also protect the metacharacters from being expanded by enclosing the entire expression in
single quotes. This works because it is the shell that expands wildcards before passing them to the
program. Note also that if a wildcard cannot be expanded, the entire expression (including the
metacharacters) is passed as an argument to the program. Some programs are capable of expanding
the metacharacters themselves.
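Using the earlier /bin/l* example, the difference is easy to see from the command line:
echo /bin/l*      # the shell expands the pattern: /bin/ln /bin/login /bin/ls
echo /bin/l\*     # the backslash protects it, so echo prints: /bin/l*
echo '/bin/l*'    # single quotes do the same: /bin/l*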
As in other places, the exclamation mark (!) has a special meaning. (That is, it is also a
metacharacter.) When used inside the square brackets of a filename pattern, the exclamation mark
negates the set of characters. For example, if we wanted to list all files that did not have a number at
the end, we could
do something like this
ls *[!0-9]
This is certainly faster than typing this
ls *[a-zA-Z]
However, this second example does not mean the same thing. In the first case, we are saying we do
not want numbers. In the second case, we are saying we only want letters. There is a key difference
because in the second case we do not include the punctuation marks and other symbols.
Another symbol with special meaning is the dollar sign ($). This is used as a marker to indicate that
something is a variable. I mentioned earlier in this section that you could get access to your login
name environment variable by typing: echo $LOGNAME

The system stores your login name in the environment variable LOGNAME (note no "$"). The
system needs some way of knowing that when you input this on the command line, you are talking
about the variable LOGNAME and not the literal string LOGNAME. This is done with the "$".
Several variables are set by the system. You can also set variables yourself and use them later
on. I'll get into more detail about shell variables later.
So far, we have been talking about metacharacters used for searching the names of files. However,
metacharacters can often be used in the arguments to certain commands. One example is the grep
command, which is used to search for strings within files. The name grep comes from Global
Regular Expression Print (or Parser). As its name implies, it has something to do with regular
expressions. Let's assume we have a text file called documents, and we wish to see if the string
"letter" exists in that text. The command might be
grep letter documents

This will search for and print out every line containing the string "letter." This includes such things
as "letterbox," "lettercarrier," and even "love-letter." However, it will not find "Letterman," because
we did not tell grep to ignore upper- and lowercase (using the -i option). To do so using regular
expressions, the command might look like this
grep [Ll]etter documents

Now, because we specified to look for either "L" or "l" followed by "etter," we get both "letter" and
"Letterman." We can also specify that we want to look for this string only when it appears at the
beginning of a line using the caret (^) symbol. For example
grep ^[Ll]etter documents

This searches for all strings that start with the "beginning-of-line," followed by either "L" or "l,"
followed by "etter." Or, if we want to search for the same string at the end of the line, we would use
the dollar sign to indicate the end of the line. Note that when the dollar sign appears at the beginning
of a word, it introduces a variable, whereas at the end of a regular expression, it indicates the end of
the line.
Confused? Let's look at an example. Let's define a string like this:
VAR=^[Ll]etter
If we echo that string, we simply get ^[Ll]etter. Note that this includes the caret at the beginning of
the string. When we do a search like this
grep $VAR documents

it is equivalent to
grep ^[Ll]etter documents

Now, if we write the same command like this


grep $VAR$ documents

This says to find the string defined by the VAR variable (^[Ll]etter), but only if it is at the end of the
line. Here we have an example, where the dollar sign has both meanings. If we then take it one step
further:
grep ^$VAR$ documents

This says to find the string defined by the VAR variable, but only if it takes up the entire line. In
other words, the line consists only of the beginning of the line (^), the string defined by VAR, and
the end of the line ($).
Here I want to sidestep a little. An expression like $VAR$ can be confusing. Further, if you combine
a variable with other characters, you may end up with something you do not expect because of what
the shell decides to include as part of the variable name. To prevent this, it is a good idea to enclose
the variable name in curly-braces, like this:
${VAR}$
The curly-braces tell the shell exactly what belongs to the variable name. I try to always include the
variable name within curly-braces to ensure that there is no confusion. You also need to use the
curly-braces when combining variables, like this:
${VAR1}${VAR2}
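A quick sketch of the difference:
VAR=letter
echo "$VARs"      # the shell looks for a variable named VARs, which is not set, so an empty line is printed
echo "${VAR}s"    # prints: letters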
Often you need to match a series of repeated characters, such as spaces, dashes and so forth.
Although you could simply use the asterisk to specify any number of that particular character, you
can run into problems on both ends. First, maybe you want to match a minimum number of that
character. This is easily solved by repeating that character a certain number of times before you use
the asterisk. For example, the expression
====*
would match at least three equal signs. Why three? Well, we have explicitly put in four equal signs,
and the asterisk applies only to the fourth. Since the asterisk means zero or more, the fourth equal
sign can match zero times, and therefore the expression matches as few as three.
The next problem occurs when we want to limit the maximum number of characters that are
matched. If you know exactly how many to match, you could simply use that many characters.
What do you do if you have a minimum and a maximum? For this, you enclose the range with
curly-braces: {min,max}. For example, to specify at least 5 and at most 10, it would look like this:
{5,10}. Keep in mind that the curly braces have a special meaning for the shell, so we would need
to escape them with a back-slash when using them on the command line. So, let's say we wanted to
search a file for all numbers between 5 and 10 digits long. We might have something
like this:
grep "[0-9]\{5,10\}" FILENAME

This might seem a little complicated, but it would be far more complicated to write a regular
expression that searches for each combination individually.
As we mentioned above, to define a specific number of a particular character you could simply
input that character the desired number of times. However, try counting 17 periods on a line or 17
lower-case letters ([a-z]). Imagine trying to type in this combination 17 times! You could specify a
range with a maximum of 17 and a minimum of 17, like this: {17,17}. Although this would work,
you could save yourself a little typing by simply including just the single value. Therefore, to match
exactly 17 lower-case letters, you might have something like this:
grep "[a-z]\{17\}" FILENAME

If we want to specify a minimum number of times, without a maximum, we simply leave off the
maximum, like this:
grep "[a-z]\{17,\}" FILENAME
This would match a pattern of at least 17 lower-case letters.
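As a side note, if you use extended regular expressions (grep -E, or the older egrep), the curly-braces
do not need to be escaped:
grep -E "[0-9]{5,10}" FILENAME
grep -E "[a-z]{17}" FILENAME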
Another problem occurs when you are trying to parse data that is not in English. If you were
looking for all letters in an English text, you could use something like this: [a-zA-Z]. However, this
would not include German letters, like ä,Ö,ß and so forth. To do so, you would use the expressions
[:lower:], [:upper:] or [:alpha:] for the lower-case letters, upper-case letters or all letters,
respectively, regardless of the language. (Note that this assumes that national language support (NLS) is
configured on your system, which it normally is for newer Linux distributions.)
Other expressions include:
• [:alnum:] - Alpha-numeric characters.
• [:cntrl:] - Control characters.
• [:digit:] - Digits.
• [:graph:] - Printable characters, excluding space.
• [:print:] - Printable characters, including space.
• [:punct:] - Punctuation.
• [:space:] - White spaces.
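For example, to find all lines in documents that begin with an upper-case letter, regardless of the
language, you could use:
grep "^[[:upper:]]" documents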

One very important thing to note is that the brackets are part of the expression. Therefore, if you
want to include more than one class in a bracket expression, you need to make sure you have the
correct number of brackets. For example, if you wanted to match any number of alpha-numeric or
punctuation characters, you might have an expression like this: [[:alnum:][:punct:]]*.
Another thing to note is that in most cases, regular expressions match as much text as possible.
For example, let's assume I was parsing an HTML file and wanted to match the first tag on the line.
You might think to try an expression like this: "<.*>". This says to match any number of characters
between the angle brackets. This works if there is only one tag on the line. However, if you have
more than one tag, this expression would match everything from the first opening angle-bracket to
the last closing angle-bracket, with everything in between.
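One common way around this is to exclude the closing angle-bracket from the characters being
repeated, so each match stops at the first closing bracket it finds. A quick sketch, using GNU grep's -o
option (which prints only the matched text, one match per line) and a hypothetical file page.html:
grep -o "<[^>]*>" page.html    # each tag on the line is matched separately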
There are a number of rules that are defined for regular expression, the understanding of which
helps avoid confusion:
1. A non-special character is equivalent to that character.
2. When preceded by a backslash (\), a special character is equivalent to itself.
3. A period specifies any single character.
4. An asterisk specifies zero or more copies of the preceding character.
5. When used by itself, an asterisk specifies everything or nothing.
6. A range of characters is specified within square brackets ([ ]).
7. The beginning of the line is specified with a caret (^) and the end of the line with a dollar
sign ($).
8. If included within square brackets, a caret (^) negates the set of characters.

Quotes
One last issue that causes its share of confusion is quotes. In Linux, there are three kinds of quotes:
double-quotes ("), single-quotes ('), and back-quotes(``) (also called back-ticks). On most US
keyboards, the single-quotes and double-quotes are on the same key, with the double-quotes
accessed by pressing Shift and the single-quote key. Usually this key is on the right-hand side of the
keyboard, next to the Enter key. On a US-American keyboard the back-quote is usually in the upper
left-hand corner of the keyboard, next to the 1.
To best understand the difference between the behavior of these quotes, I need to talk about them in
reverse order. I will first describe the back-quotes, or back-ticks.
When something is enclosed inside back-ticks, the shell interprets it to mean "the output of the
command inside the back-ticks." This is referred to as command substitution, as the output of the
command inside the back-ticks is substituted for the command itself. This is often used to assign the
output of a command to a variable. As an example, let's say we wanted to keep track of how many
files are in a directory. From the command line, we could say
ls | wc

The wc command gives me a word count, along with the number of lines and number of characters.
The | is a "pipe" symbol that is used to pass the output of one command through another. In this
example, the output of the ls command is passed or piped through wc. Here, the command might
come up as:
7 7 61
However, once the command is finished and the value has been output, we can only get it back
again by rerunning the command. Instead, if we said:
count=`ls |wc`

The entire line of output would be saved in the variable count. If we then say echo $count, we get
7 7 61
showing me that count now contains the output of that line. If we wanted, we could even assign a
multi-line output to this variable. We could use the ps command, like this
trash=`ps`

then we could type in


echo $trash

which gives us:


PID TTY TIME CMD 29519 pts/6 00:00:00 bash 12565 pts/6 00:00:00 ps
This is different from the output that ps would give when not assigned to the variable trash:
PID TTY TIME CMD
29519 pts/6 00:00:00 bash
12564 pts/6 00:00:00 ps
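As a side note, bash and ksh also support a second form of command substitution, $(command), which
does the same thing as back-ticks but is easier to read and can be nested:
count=$(ls | wc)
today=$(date)
parent_files=$(ls $(dirname $(pwd)))    # nesting works without any special quoting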

The next kind of quote, the single-quote ('), tells the system not to do any expansion at all. Let's take
the example above, but this time, use single quotes:
count='ls |wc'
If we were to now type
echo $count

we would get
ls |wc

And what we got was exactly what we expected. The shell did no expansion and simply assigned
the literal string "ls | wc" to the variable count. This even applies to the variable operator "$." For
example, if we simply say
echo '$LOGNAME'

what comes out on the screen is


$LOGNAME
No expansion is done at all and even the "$" is left unchanged.
The last set of quotes is the double-quote. This has partially the same effect as single-quotes, but to
a limited extent. If we include something inside of double-quotes, everything loses its special
meaning except for the variable operator ($), the back-slash (\), the back-tick (`), and the double-
quote itself. Everything else takes on its absolute meaning. For example, we could say
echo "`date`"

which gives us
Wed Feb 01 16:39:30 PST 1995
This is a round-about way of getting the date, but it is good for demonstration purposes. Plus, I
often use this in shell scripts when I want to log something and keep track of the date. Remember
that the back-tick first expands the command (by running it) and then the echo echoes it to the
screen.
That pretty much wraps up the quote characters. For details on other characters that have special
meaning to the shell check out the section on regular expressions. You can get more details from
any number of references books on Linux or UNIX in general (if you need it). However, the best
way to see what's happening is to try a few combinations and see if they behave as you expect.

Previously, I mentioned that some punctuation marks have special meaning, such as *, ?, and [ ]. In
fact, most of the other punctuation marks have special meaning, as well. We'll get into more detail
about them in the section on basic shell scripting.
It may happen that you forget to close the quotes, and you end up on a new line that starts with
(typically) a greater than symbol >. This is the secondary prompt (PS2) and is simply telling you
that your previous line continues. You can continue the line and then close the quotes later, like this:
VAR="Now is the time for all good admins
> to come to the aid of their operating system."
It is as if you wrote the entire line at once.
Sometimes it is necessary to include literal quotes in your output or in a variable. This is a problem
because the shell interprets the quotes before assigning the value to the variable. To get around this,
you need to "escape" or "protect" the quotes using a backslash (\), like this:
echo \"hello, world\"

Pipes and Redirection


Perhaps the most commonly used character is "|", which is referred to as the pipe symbol, or simply
pipe. This enables you to pass the output of one command through the input of another. For
example, say you would like to do a long directory listing of the /bin directory. If you type ls -l and
then press Enter, the names flash by much too fast for you to read. When the display finally stops,
all you see is the last twenty entries or so.
If instead we ran the command
ls -l | more

the output of the ls command will be "piped through more". In this way, we can scan through the list
a screenful at a time.
In our discussion of standard input and standard output in Chapter 1, I talked about standard input
as being just a file that usually points to your terminal. In this case, standard output is also a file that
usually points to your terminal. The standard output of the ls command is changed to point to the
pipe, and the standard input of the more command is changed to point to the pipe as well.
The way this works is that when the shell sees the pipe symbol, it asks the kernel to create a pipe, a
special kind of file that has no name or directory entry and whose contents are normally buffered in
memory rather than written out to the hard disk. Because both the terminal and the pipe are seen as
files from the perspective of the operating system, all we are saying is that the system should use
different files instead of standard input and standard output.
Under Linux (as well as other UNIX dialects), there exist the concepts of standard input, standard
output, and standard error. When you log in and are working from the command line, standard input
is taken from your terminal keyboard and both standard output and standard error are sent to your
terminal screen. In other words, the shell expects to be getting its input from the keyboard and
showing the output (and any error messages) on the terminal screen.
Actually, the three (standard input, standard output, and standard error) are references to files that
the shell automatically opens. Remember that in UNIX, everything is treated as a file. When the
shell starts, the three files it opens are usually the ones pointing to your terminal.
When we run a command like cat, it gets input from a file that it displays to the screen. Although it
may appear that the standard input is coming from that file, the standard input (referred to as stdin)
is still the keyboard. This is why when the file is large enough and you are using something like
more to display the file one screen at a time and it stops after each page, you can continue by
pressing either the Spacebar or Enter key. That's because standard input is still the keyboard.
As it is running, more is displaying the contents of the file to the screen. That is, it is going to
standard output (stdout). If you try to do a more on a file that does not exist, the message
file_name: No such file or directory
shows up on your terminal screen as well. However, although it appears to be in the same place, the
error message was written to standard error (stderr). (I'll show how this differs shortly.)
One pair of characters that is used quite often, "<" and ">," also deal with stdin and stdout. The
more common of the two, ">," redirects the output of a command into a file. That is, it changes
standard output. An example of this would be ls /bin > myfile. If we were to run this command, we
would have a file (in my current directory) named myfile that contained the output of the ls /bin
command. This is because stdout is the file myfile and not the terminal. Once the command
completes, stdout returns to being the terminal. What this looks like graphically, we see in the figure
below.

Now, we want to see the contents of the file. We could simply say more myfile, but that wouldn't
explain about redirection. Instead, we input more <myfile

This tells the more command to take its standard input from the file myfile instead of from the
keyboard or some other file. (Remember, even when stdin is the keyboard, it is still seen as a file.)
What about errors? As I mentioned, stderr appears to be going to the same place as stdout. A quick
way of showing that it doesn't is by using output redirection and forcing an error. If we wanted to list
two directories and have the output go to a file, we would run this command:
ls /bin /jimmo > /tmp/junk
We then get this message:
/jimmo not found
However, if we look in /tmp, there is indeed a file called junk that contains the output of the ls /bin
portion of the command. What happened here was that we redirected stdout into the file /tmp/junk.
It did this with the listing of /bin. However, because there was no directory /jimmo (at least not on
my system), we got the error /jimmo not found. In other words, stdout went into the file, but stderr
still went to the screen.
If we want to get the output and any error messages to go to the same place, we can do that. Using
the same example with ls, the command would be:
ls /bin /jimmo > /tmp/junk 2>&1
The new part of the command is 2>&1, which says that file descriptor 2 (stderr) should go to the
same place as file descriptor 1 (stdout). By changing the command slightly
ls /bin /jimmo > /tmp/junk 2>/tmp/errors
we can tell the shell to send any errors someplace else. You will find quite often in shell scripts
throughout the system that the file that error messages are sent to is /dev/null. This has the effect of
ignoring the messages completely. They are neither displayed on the screen nor sent to a file.
Note that this command does not work as you would think:
ls /bin /jimmo 2>&1 > /tmp/junk
The reason is that we redirect stderr to the same place as stdout before we redirect stdout. So, stderr
goes to the screen, but stdout goes to the file specified.
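As a side note, bash offers a shorthand that redirects both stdout and stderr at the same time, and you
can also send both of them through a pipe:
ls /bin /jimmo &> /tmp/junk       # both stdout and stderr go to /tmp/junk
ls /bin /jimmo 2>&1 | more        # both are piped through more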
Redirection can also be combined with pipes like this:
sort < names | head or ps | grep sh > ps.save

In the first example, the standard input of the sort command is redirected to point to the file names.
Its output is then passed to the pipe. The standard input of the head command (which takes the first
ten lines) also comes from the pipe. This would be the same as the command
sort names | head

Which we see here:

In the second example, the ps command (process status) is piped through grep and all of the output
is redirected to the file ps.save.
If we want to redirect stderr, we can. The syntax is similar:
command 2> file
It's possible to input multiple commands on the same command line. This can be accomplished by
using a semi-colon (;) between commands. I have used this on occasion to create command lines
like this:
man bash | col -b > man.tmp; vi man.tmp; rm man.tmp

This command redirects the output of the man-page for bash into the file man.tmp. (The pipe
through col -b is necessary because of the way the man-pages are formatted.) Next, we are brought
into the vi editor with the file man.tmp. After I exit vi, the command continues and removes my
temporary file man.tmp. (After about the third time of doing this, it got pretty monotonous, so I
created a shell script to do this for me. I'll talk more about shell scripts later.)

Interpreting the Command


When you input a command, the shell needs to be able to interpret it correctly in order to know exactly
what to do. Maybe you have multiple options or redirect the output to a file. In any event, the shell goes
through several steps to figure out what needs to be done.
One question I had was, "In what order does everything get done?" We have shell variables to
expand, maybe an alias or function to process, "real" commands, pipes and input/output redirection.
There are a lot of things that the shell must consider when figuring out what to do and when.
For the most part, this is not very important. Commands do not get so complex that knowing the
evaluation order becomes an issue. However, on a few occasions I have run into situations in which
things did not behave as I thought they should. By evaluating the command myself (as the shell
would), it became clear what was happening. Let's take a look.
The first thing that gets done is that the shell figures out how many commands there are on the line.
(Remember, you can separate multiple commands on a single line with a semicolon.) This process
determines how many tokens there are on the command line. In this context, a token could be an
entire command or it could be a control word such as "if." Here, too, the shell must deal with
input/output redirection and pipes.
Once the shell determines how many tokens there are, it checks the syntax of each token. Should
there be a syntax error, the shell will not try to start any of the commands. If the syntax is correct, it
begins interpreting the tokens.
First, any alias you might have is expanded. Aliases are a way for some shells to allow you to define
your own commands. If any token on the command line is actually an alias that you have defined, it
is expanded before the shell proceeds. If it happens that an alias contains another alias, they are both
expanded before continuing with the next step.
The next thing the shell checks for is functions. Like the functions in programming languages such
as C, a shell function can be thought of as a small subprogram. Check the other sections for details
on aliases and functions.
Once aliases and functions have all been completely expanded, the shell evaluates variables.
Finally, it uses any wildcards to expand them to file names. This is done according to the rules we
talked about previously.
After the shell has evaluated everything, it is still not ready to run the command. It first checks to
see if the first token represents a command built into the shell or an external one. If it's not internal,
the shell needs to go through the search path.
At this point, it sets up the redirection, including the pipes. These obviously must be ready before
the command starts because the command may be getting its input from someplace other than the
keyboard and may be sending it somewhere other than the screen. The figure below shows how the
evaluation looks graphically.
This is an oversimplification. Things happen in this order, though many more things occur in and
around the steps than I have listed here. What I am attempting to describe is the general process that
occurs when the shell is trying to interpret your command.
Once the shell has determined what each command is and each command is an executable binary
program (not a shell script or a built-in command), the shell makes a copy of itself using the fork() system call. This copy is a child
process of the shell. The copy then uses the exec() system call to overwrite itself with the binary it
wants to execute. Keep in mind that even though the child process is executing, the original shell is
still in memory, waiting for the child to complete (assuming the command was not started in the
background with &).
If the program that needs to be executed is a shell script, the program that is created with fork() and
exec() is another shell. This new shell starts reading the and interprets it, one line at a time. This is
why a syntax error in a is not discovered when the script is started, but rather when the erroneous
line is first encountered.
Understanding that a new process is created when you run a shell script helps to explain a very
common misconception under UNIX. When you run a shell script and that script changes
directories, your original shell knows nothing about the change. This confuses a lot of people who
are new to UNIX as they come from the DOS world, where changing the directory from within a
batch file does change the original shell. This is because DOS does not have the same concept of a
process as UNIX does.
Look at it this way: The sub-shell's environment has been changed because the current directory is
different. However, this is not passed back to the parent. Like "real" parent-child relationships, only
the children can inherit characteristics from their parent, not the other way around. Therefore, any
changes to the environment, including directory changes, are not noticed by the parent. Again, this
is different from the behavior of DOS .bat files.
You can get around this by either using aliases or shell functions (assuming that your shell has
them). Another way is to use the dot command in front of the shell script you want to execute. For
example:
. myscript

<--NOTICE THE DOT!


This script will be interpreted directly by the current shell, without forking a sub-shell. If the script
makes changes to the environment, it is this shell's environment that is changed.
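A quick demonstration, using a hypothetical one-line script called godown that contains nothing but
cd /tmp:
$ pwd
/home/jimmo
$ sh godown
$ pwd
/home/jimmo      (the parent shell did not change directories)
$ . godown
$ pwd
/tmp             (sourcing it with the dot command did)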
You can use this same functionality if you ever need to reset your environment. Normally, your
environment is defined by the start-up files in your home directory. On occasion, things get a little
confused (maybe a variable is changed or removed) and you need to reset things. You can use the
dot command to do so. For example, with either sh or ksh, you can write it like this:
. $HOME/.profile

<--NOTICE THE DOT!


Or, using a feature of bash, you can also write
. ~/.profile

<--NOTICE THE DOT!


This uses the tilde (~), which I haven't mentioned yet. Under many shells, you can use the tilde as a
shortcut to refer to a particular user's home directory.
If you have csh, the command is issued like this:
source $HOME/.login

<--NOTICE THE DOT!


Some shells keep track of your last directory in the OLDPWD environment variable. Whenever you
change directories, the system saves your current directory in OLDPWD before it changes you to
the new location.
You can use this by simply entering cd $OLDPWD. Because the variable $OLDPWD is expanded
before the cd command is executed, you end up back in your previous directory. Although this has
more characters than just popd, it's easier because the system keeps track of your position, current
and previous, for you. Also, because it's a variable, you can access it in the same way that you can
access other environment variables.
For example, if there were a file in your old directory that you wanted to move to your current one,
you could do this by entering:
cp $OLDPWD/<file_name> ./
However, things are not as difficult as they seem. Typing in cd $OLDPWD is still a bit cumbersome.
It is a lot fewer characters to type popd, as in the csh. Why isn't there something like that in the
ksh or bash? There is. In fact, it's much simpler. When I first found out about it, the adjective that
first came to mind was "sweet." To change directories to your previous directory, simply type "cd -".
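For example (the directory names here are just for illustration):
$ cd /usr/share/doc
$ cd /tmp
$ cd -
/usr/share/doc
Note that cd - also prints the name of the directory it changes to.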
Different Kinds of Shells
The great-grandfather of all shells is /bin/sh, called simply sh or the Bourne Shell, named after its
developer, Stephen Bourne. When it was first introduced in the mid-1970s, this was almost a godsend
as it allowed interaction with the operating system. This is the "standard" shell that you will find on
every version of UNIX (at least all those I have seen). Although many changes have been made to
UNIX, sh has remained basically unchanged.
All the capabilities of "the shell" I've talked about so far apply to sh. Anything I've talked about that
sh can do, the others can do as well. So rather than going on about what sh can do (which I already
did), I am going to talk about the characteristics of some other shells.
Later, I am going to talk about the C-Shell, which kind of throws a monkey wrench into this entire
discussion. Although the concepts are much the same between the C-Shell and other shells, the
constructs are often quite different. On the other hand, the other shells are extensions of the Bourne
Shell, so the syntax and constructs are basically the same.
Be careful here. This is one case in which I have noticed that the various versions of Linux are
different. Not every shell is in every version. Therefore, the shells I am going to talk about may not
be in your distribution. Have no fear! If there is a feature that you really like, you can either take the
source code from one of the other shells and add it or you can find the different shells all over the
Internet, which is much easier. Linux includes several different shells and we will get into the
specific of many of them as we move along. In addition, many different shells are available as either
public domain, shareware, or commercial products that you can install on Linux.
As I mentioned earlier, environment variables are set up for you as you are logging in or you can set
them up later. Depending on the shell you use, the files used and where they are located is going to
be different. Some variables are made available to everyone on the system and are accessed through
a common file. Others reside in the user's home directory.
Normally, the files residing in a users home directory can be modified. However, a system
administrator may wish to prevent users from doing so. Often, menus are set up in these files to
either make things easier for the user or to prevent the user from getting to the command line.
(Often users never need to get that far.) In other cases, environment variables that shouldn't be
changed need to be set up for the user.
One convention I will be using here is how I refer to the different shells. Often, I will say "the bash"
or just "bash" to refer to the Bourne-Again Shell as a concept and not the program /bin/bash. I will
use "bash" to refer to the "Bourne Shell" as an abstract entity and not specifically to the program
/bin/sh.
Why the Bourne-Again Shell? Well, this shell is compatible with the Bourne Shell, but has many of
the same features as both the Korn Shell (ksh) and C-Shell (csh). This is especially important to me
as I flail violently when I don't have a Korn Shell.
Most of the issues I am going to address here are detailed in the appropriate man-pages and other
documents. Why cover them here? Well, in keeping with one basic premise of this book, I want to
show you the relationships involved. In addition, many of the things we are going to look at are not
emphasized as much as they should be. Often, users will go for months or years without learning the
magic that these shells can do.
Only one oddity really needs to be addressed: the behavior of the different shells when moving
through symbolic links. As I mentioned before, symbolic links are simply pointers to files or
directories elsewhere on the system. If you change directories into symbolic links, your "location"
on the disk is different than what you might think. In some cases, the shell understands the
distinction and hides from you the fact that you are somewhere else. This is where the problem lies.
Although the concept of a symbolic link exists in most versions of UNIX, it is a relatively new
aspect. As a result, not all applications and programs behave in the same way. Let's take the
directory /usr/spool as an example. Because it contains a lot of administrative information, it is a
useful and commonly accessed directory. It is actually a symbolic link to /var/spool. If we are using
ash as our shell, when we do a cd /usr/spool and then pwd, the system responds with: /var/spool.
This is where we are "physically" located, despite the fact that we did a cd /usr/spool. If we do a
cd .. (to move up to our parent directory), we are now located in /var. All this seems logical. This is
also the behavior of csh and sh on some systems.
If we use bash, things are different. This time, when we do a cd /usr/spool and then pwd, the system
responds with /usr/spool. This is where we are "logically". If we now do a cd .., we are located
in /usr. Which of these is the "correct" behavior? Well, I would say both. There is nothing to define
what the "correct" behavior is. Depending on your preference, either is correct. I tend to prefer the
logical behavior of bash and ksh. However, the behavior of ash is also valid.
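If you are using the logical behavior and want to know where you are "physically", bash's pwd
command can tell you (assuming, as above, that /usr/spool is a symbolic link to /var/spool on your
system):
$ cd /usr/spool
$ pwd
/usr/spool
$ pwd -P
/var/spool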

Command Line Editing


When I first started working in tech support, I was given a csh and once I figured out all it could do,
I enjoyed using it. I found the editing to be cumbersome from time to time, but it was better than
retyping everything.
One of my co-workers, Kamal (of IguanaCam fame), was an avid proponent of the Korn Shell.
Every time he wanted to show me something on my terminal, he would grumble when he forgot
that I wasn't using ksh. Many times he tried to convert me, but learning a new shell wasn't high on
my list of priorities.
I often complained to Kamal how cumbersome vi was (at least I thought so at the time). One day I
asked him for some pointers on vi, because every time I saw him do something in vi, it looked like
magic. He agreed with the one condition that I at least try the ksh. All he wanted to do was to show
me one thing and if after that I still wanted to use the csh, that was my own decision. Not that he
would stop grumbling, just that it was my own choice.
The one thing that Kamal showed me convinced me of the error of my ways. Within a week, I had
requested the system administrator to change my login shell to ksh.
What was that one thing? Kamal showed me how to configure the ksh to edit previous commands
using the same syntax as the vi editor. I felt like the csh editing mechanism was like using a sledge-
hammer to pound in a nail. It does what you want, but it is more work than you need.
Many different shells have a history mechanism. The history mechanism of both the ksh and bash
has two major advantages over that of the csh. First, the information is actually saved to a file. This
is either defined by the HISTFILE environment variable before the shell is invoked, or it defaults
to .bash_history (for the bash) in your home directory. At any point you can edit this file and make
changes to what the ksh perceives as your command history.
This could be useful if you knew you were going to be issuing the same commands every time you
logged in and you didn't want to create aliases or functions. If you copied a saved version of this file
(or any other text file) and named it .sh_history, you would immediately have access to this new
history. (Rewriting history? I shudder at the ramifications.)
The second advantage is the ability to edit directly any of the lines in your .bash_history file from
the command line. If your EDITOR environment variable is set to vi or you use the set -o vi
command, you can edit previous commands using many of the standard vi editing commands.
To enter edit mode, press Esc. You can now scroll through the lines of your history file using the vi
movement keys (h-j-k-l). Once you have found the line you are looking for, you can use other vi
commands to delete, add, change, or whatever you need. If you press "v," you are brought into the
full-screen version of vi (which I found out by accident). For more details, check out the vi or ksh
man-page or the later section on vi.
Note that by default, the line editing commands are similar to the emacs editor. If vi-mode is
activated, you can switch back to emacs-mode with set -o emacs. Turning either off can be done with
set +o emacs or set +o vi.
One exciting thing that bash can do is extend the command line editing. There are a large number of
key combinations to which you can get bash to react. You say that the key combinations are
"bound" to certain actions. The command you use is bind. To see what keys are currently bound, use
bind -v. This is useful for finding out all the different editing commands to which you can bind
keys.
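For example, on most terminals the up and down arrow keys send the sequences \e[A and \e[B, so the
following commands make the arrow keys search the history for lines beginning with whatever you
have already typed (the same bindings, without the bind command and outer quotes, could go in your
~/.inputrc):
bind '"\e[A": history-search-backward'
bind '"\e[B": history-search-forward'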

Functions
Most (all?) shells have the means of creating new "internal" commands. This is done by creating
shell functions. Shell functions are just like those in a programming language. Sets of commands are
grouped together and jointly called by a single name.
The format for functions is:

function_name()
{
first thing to do
second thing to do
third thing to do
}

Functions can be defined anywhere, including from the command line. All you need to do is simply
type in the lines one at a time, similar to the way shown above. The thing to bear in mind is that if
you type a function from a command line, once you exit that shell, the function is gone.
Shell functions have the ability to accept arguments, just like commands. A simple example is a
script that looks like this:
display()
{
echo $1
}

display Hello
The output would be
Hello
Here we need to be careful. The variable $1 is the positional parameter from the call to the display
function and not to the script. We can see this when we change the script to look like this:
display()
{
echo $1
}

echo $1

display Hello
Let's call the script display.sh and start it like this:
display.sh Hi

The output would then look like this:


Hi
Hello
The first echo shows us the parameter from the command line and the second one shows us the
parameter from the function.
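Functions are also handy for the little things you type all day. For example, assuming a bash-compatible
shell, you could define a long-listing shortcut directly on the command line:
ll()
{
ls -l "$@"
}
ll /bin
Here "$@" stands for all of the arguments passed to the function, so ll behaves like ls -l no matter how
many file names you give it.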

Job Control
Job control is the ability to move processes between the foreground and background. This is very
useful when you need to do several things at once, but only have one terminal. For example, let's
say there are several files spread out across the system that we want to edit. Because we don't know
where they are, we can't use full paths. Because they don't have anything common in their names,
we can't use find. So we try ls -R | more.
After a minute or two, we find the first file we want to edit. We can then suspend this job by
pressing Ctrl+Z. We then see something that looks like this:
[1]+ Stopped ls -R | more
This means that the process has been stopped or suspended. One very important thing to note is that
this process is not in the background as if we had put an "&" at the end. When a process is
suspended, it stops doing anything, unlike a process in the background, which keeps on working.
Once the ls is suspended, we can run vi. When we are done with vi, we can bring the ls
command back with the fg (foreground) command.
If we wanted to, we could have more than just one job suspended. I have never had the need to have
more than two running like this, but I have gotten more than ten during tests. One thing that this
showed me was the meaning of the plus sign (+). This is the "current" job, or the one we suspended
last.
The number in brackets is the process entry in the job table, which is simply a table containing all of
your jobs. Therefore, if we already had three jobs, the next time we suspended a job, the entry
would look like this:
[4]+ Stopped ls -R >> output
To look at the entire job table, we simply enter the command jobs, which might give us
[1] Stopped ls -R /usr >> output.usr
[2] Stopped find / -print > output.find
[3]- Stopped ls -R /var >> output.var
[4]+ Stopped ls -R >> output.root
The plus sign indicates the job that we suspended last. So this is the one that gets called if we run fg
without a job number. In this case, it was Job 4. Note that there is a minus sign (-) right after Job 3.
This was the second to last job that we suspended. Now, we bring Job 2 in the foreground with fg 2
and then immediately suspend it again with Ctrl+Z. The table now looks like this:
[1] Stopped ls -R /usr >> output.usr
[2]+ Stopped find / -print > output.find
[3] Stopped ls -R /var >> output.var
[4]- Stopped ls -R >> output.root
Note that Job 2 now has the plus sign following it and Job 4 has the minus sign.
In each of these cases, we suspended a job that was running in the foreground. If we had started a
job and put it in the background from the command line, the table might have an entry that looked
like this:
[3] Running ls -R /var >> output &
This shows us that although we cannot see the process (because it is in the background), it is still
running. We could call it to the foreground if we wanted by running fg 3. And, if we wanted, we
could use the bg command to send one of the stopped jobs to the background. So
bg %1

would send Job 1 to the background just as if we had included & from the command line.
One nice thing is that we don't have to use just the job numbers when we are pulling something into
the foreground. Because we know that we started a process with the find command, we can get it by
using
fg %find
Actually, we could have used %f or anything else that was not ambiguous. In this case, we were
looking for a process that started with the string we input. We could even look for strings anywhere
within the command. To do this, the command might be:
fg %?print

which would have given us the same command. Or, if we had tried
fg %?usr

we would have gotten Job 1 because it contains the string usr.


If we find that there is a job that we want to kill (stop completely), we can use the kill command.
This works the same way, so kill %<nr> kills the job with number <nr>, kill %<string> kills the job
starting with string, and so on.
Keep in mind that a process takes up resources whether it is in the foreground or not. That is,
background processes take up resources, too.
If you do not remember the process ID of the last process that was placed in the background, you
can reference it at any time using the $! system variable. You can also use the wait command to stop
processing until that particular process is done. The syntax is simply:
wait PID
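As a small sketch of how $! and wait work together (bigfile and output.sorted are just example names):

sort bigfile > output.sorted &
echo "The sort is running as process $!"
wait $!
echo "The sort has finished"

The script continues only after the background sort has completed.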
Although generally considered part of "job control," you can change the default priority a process
has when it starts, as well as the priority of a running process. Details of this can be found in the
section on process scheduling.

Aliases
What is an alias? It isn't the ability to call yourself Thaddeus Jones when your real name is Jedediah
Curry. Instead, in a Linux-context it is the ability to use a different name for a command. In
principle, personal aliases can be anything you want. They are special names that you define to
accomplish tasks. They aren't shell scripts, as a shell script is external to your shell. To start up a
shell script, type in its name. The system then starts a shell as a child process of your current shell to
run the script.
Aliases, too, are started by typing them in. However, they are internal to the shell (provided your
shell uses aliases). That is, they are internal to your shell process. Instead of starting a sub-shell, the
shell executes the alias internally. This has the obvious advantage of being quicker, as there is no
overhead of starting the new shell or searching the hard disk.
Another major advantage is the ability to create new commands. You can do this with shell scripts
(which we will get into later), but the overhead of creating a new process does not make it
worthwhile for simple tasks. Aliases can be created with multiple commands strung together. For
example, I created an alias, t, that shows me the time. Although the date command does that, all I
want to see is the time. So, I created an alias, t, like this:
alias t='date | cut -c12-16'
When I type in t, I get the hours and minutes, just exactly the way I want.
Aliases can be defined in the .profile, .login, .cshrc or .bashrc, depending on your shell. However,
as I described above, if you want them available in all sub-shells, they need to go in the file that is
read by every new shell (.cshrc for the csh, .bashrc for bash). If you are running a plain Bourne Shell,
which has no aliases, this may be the first good reason to switch to another shell.
Be careful when creating aliases or functions so that you don't redefine existing commands. Either
you end up forgetting the alias, or some other program uses the original program and fails because
the alias gets called first. I once had a call from a customer with a system in which he could no
longer install software.
We tried replacing several programs on his system, but to no avail. Fortunately, he had another copy
of the same product, but it, too, died with the same error. It didn't seem likely that it was bad media.
At this point, I had been with him for almost an hour, so I decided to hand it off to someone else
(often, a fresh perspective is all that is needed).
About an hour later, one of the other engineers came into my cubicle with the same problem. He
couldn't come up with anything either, which relieved me, so he decided that he needed to research
the issue. Well, he found the exact same message in the source code and it turned out that this
message appeared when a command could not run the sort command. Ah, a corrupt sort binary.
Nope! Not that easy. What else was there? As it turned out, the customer had created an alias called
sort that he used to sort directories in a particular fashion. Because the Linux command couldn't
work with this version of sort, it died.
Why use one over the other? Well, if there is something that can be done with a short shell script,
then it can be done with a function. However, there are things that are difficult to do with an alias.
One thing is making long, relatively complicated commands. Although you can do this with an
alias, it is much simpler and easier to read if you do it with a function. I will go into some more
detail about shell functions later in the section on shell scripting. You can also find more details in
the bash man-page.
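To illustrate the difference, here is a sketch of both; the names are made up:

# an alias works well for a fixed substitution
alias ll='ls -l'

# a function is better when the arguments have to land in the middle of the command
findname()
{
find . -name "$1" -print
}

findname '*.txt'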
On some systems, you will find that a number of aliases have already been provided for you. To see
which aliases are currently configured, just run alias with no options and you might get something like
this:
alias +='pushd .'
alias -='popd'
alias ..='cd ..'
alias ...='cd ../..'
alias beep='echo -en "\007"'
alias dir='ls -l'
alias l='ls -alF'
alias la='ls -la'
alias ll='ls -l'
alias ls='ls $LS_OPTIONS'
alias ls-l='ls -l'
alias md='mkdir -p'
alias o='less'
alias rd='rmdir'
alias rehash='hash -r'
alias unmount='echo "Error: Try the command: umount" 1>&2; false'
alias which='type -p'
alias you='yast2 online_update'
As you can see there are many different ways you can use aliases.

A Few More Constructs


There are a few more loop constructs that we ought to cover as you are likely to come across them
in some of the system scripts. The first is for a for-loop and has the following syntax:

for var in word1 word2 ...


do
list of commands
done

We might use this to list a set of pre-defined directories like this:


for dir in bin etc usr
do
ls -R $dir
done

This script does a recursive listing three times. The first time through the loop, the variable dir is
assigned the value bin, next etc, and finally usr.
You may also see that the do/done pair can be replaced by curly braces ({ }). So, the script above
would look like this:
for dir in bin etc usr
{
ls -R $dir
}

Next, we have while loops. This construct is used to repeat a loop while a given expression is true.
Although you can use it by itself, as in
while [ "$VARIABLE" = value ]

I almost exclusively use it at the end of a pipe. For example:


cat filename | while read line
do
commands
done

This sends the contents of the file filename through the pipe, which reads one line at a time. Each
line is assigned to variable line. I can then process each line, one at a time. This is also the format
that many of the system scripts use.
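As a concrete sketch of this construct, the following reads /etc/passwd one line at a time and prints just the login name (tr turns the colons into spaces so that read can split each line into words):

cat /etc/passwd | tr ':' ' ' | while read user rest
do
echo "found user: $user"
done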
For those of you who have worked with UNIX shells before, you most certainly should have
noticed that I have left out some constructs. Rather than turning this into a book on shell
programming, I decided to show you the constructs that occur most often in the shell scripts on your
system. I will get to others as we move along. The man-pages of each of the shells provide more
details.

The C-Shell
One of the first "new" shells to emerge was the csh or C-Shell. It is so named because much of the
syntax it uses is very similar to the C programming language. This isn't to say that this shell is only
for C programmers, or programmers in general. Rather, knowing C makes learning the syntax much
easier. However, it isn't essential. (Note: The csh syntax is similar to C, so don't get your dander up
if it's not exactly the same.)
The csh is normally the shell that users get on many UNIX systems. Every place I ever got a UNIX
account, it was automatically assumed that I wanted csh as my shell. When I first started out with
UNIX, that was true. In fact, this is true for most users. Because they don't know any other shells,
the csh is a good place to start. You might actually have tcsh on your system, but the principles are
the same as for csh.
As you login with csh as your shell, the system first looks in the global file /etc/cshrc. Here, the
system administrator can define variables or actions that should be taken by every csh user. Next,
the system reads two files in your home directory: .login and .cshrc. The .login file normally
contains the variables you want to set and the actions you want to occur each time you log in.
In both of these files, setting variables has a syntax that is unique to the csh. This is one major
difference between the csh and other shells. It is also a reason why it is not a good idea to give root
the csh as its default shell. The syntax for the csh is
set variable_name=value
whereas for Bourne-compatible shells, it is simply
variable=value
Because many of the system commands are Bourne scripts, executing them with csh ends up giving
you a lot of syntax errors. Once the system has processed your .login file, your .cshrc is processed.
The .cshrc contains things that you want executed or configured every time you start a csh. At first,
I wasn't clear with this concept. If you are logging in with the csh, don't you want to start a csh?
Well, yes. However, the reverse is not true. Every time I start a csh, I don't want the system to
behave as if I were logging in.
Let's take a look as this for a minute. One of the variables that gets set for you is the SHELL
variable. This is the shell you use anytime you do a shell escape from a program. A shell escape is
starting a shell as a subprocess of a program. An example of a program that allows a shell escape is
vi.
When you do a shell escape, the system starts a shell as a new (child) process of whatever program
you are running at the time. As we talked about earlier, once this shell exits, you are back to the
original program. Because there is no default, the variable must be set to a shell. If the variable is set
to something else, you end up with an error message like the following from vi:
invalid SHELL value: <something_else>
where <something_else> is whatever your SHELL variable is defined as.
If you are running csh and your SHELL variable is set to /bin/csh, every time you do a shell escape,
the shell you get is csh. If you have a .cshrc file in your home directory, not only is this started when
you log in, but anytime you start a new csh. This can be useful if you want to access personal aliases
from inside of subshells.
One advantage that the csh offered over the Bourne Shell is its ability to repeat, and even edit,
previous commands. Newer shells also have this ability, but the mechanism is slightly different.
Commands are stored in a shell "history list," which, by default, contains the last 20 commands.
This is normally defined in your .cshrc file, or you can define it from the command line. The command
set history=100
history=100
would change the size of your history list to 100. However, keep in mind that everything you type at
the command line is saved in the history file. Even if you mistype something, the shell tosses it into
the history file.
What good is the history file? Well, the first thing is that by simply typing "history" with nothing
else you get to see the contents of your history file. That way, if you can't remember the exact
syntax of a command you typed five minutes ago, you can check your history file.
This is a nice trick, but it goes far beyond that. Each time you issue a command from the csh
prompt, the system increments an internal counter that tells the shell how many commands have
been input up to that point. By default, the csh often has the prompt set to be a number followed by
a %. That number is the current command, which you can use to repeat those previous commands.
This is done with an exclamation mark (!), followed by the command number as it appears in the
shell history. For example, if the last part of your shell history looked like this:
21 date
22 vi letter.john
23 ps
24 who
You could edit letter.john again by simply typing in !22. This repeats the command vi letter.john and
adds this command to your history file. After you finish editing the file, this portion of the history
file would look like
21 date
22 vi letter.john
23 ps
24 who
25 vi letter.john
Another neat trick that's built into this history mechanism is the ability to repeat commands without
using the numbers. If you know that sometime within your history you edited a file using vi, you
could edit it again by simply typing !vi. This searches backward through the history file until it finds
the last time you used vi. If no other command beginning with "v" has been issued since then, you
could also simply enter !v. To redo the last command you entered, just type !!.
This history mechanism can also be used to edit previously issued commands. Let's say that instead
of typing vi letter.john, we had typed in vi letter.jonh. Maybe we know someone named jonh, but
that's not who we meant to address this letter to. So, rather than typing in the whole command, we
can edit it. The command we would issue would be !!:s/nh/hn/.
At first, this seems a little confusing. The first part, however, should be clear. The "!!" tells the
system to repeat the previous command. The colon (:) tells the shell to expect some editing
commands. The "s/nh/hn/" says to substitute hn for the pattern nh. (If you are familiar with vi or
sed, you understand this. If not, we get into this syntax in the section on regular expressions and
metacharacters.)
What would happen if we had edited a letter to john, done some other work and then decided we wanted
to edit a letter to chris instead? We could simply type !22:s/john/chris/. Granted, this is actually more
keystrokes than if we had typed everything over again. However, you hopefully see the potential for
this. Check out the csh man-page for many different tricks for editing previous commands.
Built into the csh are two commands that I found quite useful: pushd and popd. These commands
are used to maintain a directory "stack". When you run pushd <dir_name>, your current
directory is pushed onto (added to) the stack and you change the directory to <dir_name>. When
you use popd, it pops (removes) the top of the directory stack and you change directories to it.
Like other kinds of stacks, this directory stack can be several layers deep. For example, let's say that
we are currently in our home directory. A "pushd /bin" makes our current directory /bin with our
home directory the top of the stack. A "pushd /etc" brings us to /etc. We do it one more time with
pushd /usr/bin, and now we are in /usr/bin. The directory /usr/bin is now the top of the stack.
If we run popd (no argument), /usr/bin is popped from the stack and /etc is our new directory.
Another popd, and /bin is popped, and we are now in /bin. One more pop brings us back to the
home directory. (In all honesty, I have never used this to do anything more than to switch
directories, then jump back to where I was. Even that is a neat trick.)
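A short session might look like this; pushd and popd print the directory stack after each call, and the prompt and output shown here are only illustrative:

% pushd /bin
/bin ~
% pushd /etc
/etc /bin ~
% pushd /usr/bin
/usr/bin /etc /bin ~
% popd
/etc /bin ~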
There is another useful trick built into the csh for changing directories. This is the concept of a
directory path. Like the execution search path, the directory path is a set of values that are searched
for matches. Rather than searching for commands to execute, the directory path is searched for
directories to change into.
The way this works is by setting the cdpath variable. This is done like any other variable in csh. For
example, if, as system administrator, we wanted to check up on the various spool directories, we
could define cdpath like this:
set cdpath = /usr/spool
Then, we could enter
cd lp
If the shell can't find a subdirectory named lp, it looks in the cdpath variable. Because it is defined
as /usr/spool and there is a /usr/spool/lp directory, we jump into /usr/spool/lp. From there, if we type
cd mail
we jump to /usr/spool/mail. We can also set this to be several directories, like this:
set cdpath = ( /usr/spool /usr/lib /etc )
In doing so, each of the three named directories will be searched.
The csh can also make guesses about where you might want to change directories. This is
accomplished through the cdspell variable. This is a Boolean variable (true/false) that is set simply
by typing
set cdspell
When set, the cdspell variable tells the csh that it should try to guess what is really meant when we
misspell a directory name. For example, if we typed
cd /sur/bin (instead of /usr/bin)
the cdspell mechanism attempts to figure out what the correct spelling is. You are then prompted
with the name that it guessed as being correct. By typing in anything other than "n" or "N," you are
changing into this directory. There are limitations, however. Once it finds what it thinks is a match,
it doesn't search any further.
For example, we have three directories, "a," "b," and "c." If we type "cd d," any of the three could
be the one we want. The shell will make a guess and choose one, which may or may not be correct.
Note that you may not have the C-Shell on your system. Instead, you might have something called
tcsh. The primary difference is that tcsh does command line completion and command line editing.

Commonly Used Commands and Utilities


There are hundreds of commands and utilities plus thousands of support files in a normal Linux
installation. Very few people I have met know what they all do. As a matter of fact, I don't know
anyone who knows what they all do. Some are obvious and we use them everyday, such as date.
Others are not so obvious and I have never met anyone who has used them. Despite their
overwhelming number and often cryptic names and even more cryptic options, many commands are
very useful and powerful. I have often encountered users, as well as system administrators, who
combine many of these commands into something fairly complicated. The only real problem is that
there is often a single command that would do all of this for them.
In this section, we are going to cover some of the more common commands. I am basing my choice
on a couple of things. First, I am going to cover those commands that I personally use on a regular
basis. These commands are those that I use to do things I need to do, or those that I use to help end
users get done what they need to. Next, I will discuss the Linux system itself. There are dozens of
scripts scattered all through the system that contain many of these commands. By talking about
them here, you will be in a better position to understand existing scripts should you need to expand
or troubleshoot them.
Because utilities are usually part of some larger process (such as installing a new hard disk or
adding a new user), I am not going to talk about them here. I will get to the more common utilities
as we move along. However, to whet your appetite, here is a list of programs used when working
with files.

File and Directory Basics


Command Function
cd change directory
cp copy files
file determine a file's contents
ls list files or directories
ln make a link to a file
mkdir make a directory
mv move (rename) a file
rm remove a file
rmdir remove a directory

File Viewing
Command Function
cat Display the contents of file
less Page through files
head show the top portion of a file
more display screenfuls of a file
tail display bottom portion of a file
nl number the lines of a file
wc count the number of lines, words and characters
in a file
od View a binary file
tee display output on the screen and write it to a file simultaneously

File Management
Command Function
ls display file attributes
stat display file attributes
wc count the number of lines, words and characters in a file
file identify file types
touch set the time stamp of a file or directory
chgrp change the group of a file
chmod change the permissions (mode) of a file
chown change the owner of a file
chattr change advanced file attributes
lsattr display advanced file attributes

File Manipulation
Command Function
awk pattern-matching, programming language
csplit split a file
cut display columns of a file
paste append columns in a file
dircmp compare two directories
find find files and directories
perl scripting language
sed Stream Editor
sort sort a file
tr translate characters in a file
uniq find unique or repeated lines in a file
xargs process multiple arguments

File Editing
Command Function
vi text editor
emacs text editor
sed Stream Editor

Locate Files
Command Function
find find files and directories
which locate a command within your search path
whereis locate standard files
File Compression and Archiving
Command Function
gzip compress a file using GNU Zip
gunzip uncompress a file using GNU Zip
compress compress a file using UNIX compress
uncompress uncompress a file using UNIX compress
bzip2 compress a file using the block-sorting file compressor
bunzip2 uncompress a file using the block-sorting file compressor
zip compress a file using Windows/DOS zip
unzip uncompress a file using Windows/DOS zip
tar read/write (tape) archives
cpio copy files to and from archives
dump dump a disk to tape
restore restore a dump
mt tape control program

File Comparison
Command Function
diff find differences in two files
cmp compare two files
comm compare sorted files
md5sum compute the MD5 checksum of a file
sum compute the checksum of a file

Disks and File Systems


Command Function
df display free space
du display disk usage
mount mount a filesystem
fsck check and repair a filesystem
sync Flush disk caches

Printing
Command Function
lpr print files
lpq view the print queue
lprm Remove print jobs
lpc line printer control program

Process Management
Command Function
ps list processes
w list users' processes
uptime view the system load, amount of time it has been running, etc.
top monitor processes
free display free memory
kill send signals to processes
killall kill processes by name
nice set a process's nice value
renice set the nice value of a running process.
at run a job at a specific time
crontab schedule repeated jobs
batch run a job when the system load permits
watch run a program at specific intervals
sleep wait for a specified interval of time

Host Information
Command Function
uname Print system information
hostname Print the system's hostname
ifconfig Display or set network interface configuration
host lookup DNS information
nslookup lookup DNS information (deprecated)
whois Lookup domain registrants
ping Test reachability of a host
traceroute Display network path to a host

Networking Tools
Command Function
ssh Secure remote access
telnet Log into remote hosts
scp Securely copy files between hosts
ftp Copy files between hosts
wget Recursively download files from a remote host
lynx Character based web-browser

Examples of Commonly Used Utilities

Directory listings: ls
When doing a long listing of a directory or file and looking at the date, you typically only want to
see when the contents of the file were last changed. This is the default behavior with the -l
option. However, there may be cases where you want to see when other aspects of the file were
changed, such as the permissions. This is done by adding the -c option (i.e. -lc). Note that if you leave
off the -l option, you may not see any dates at all. Instead, the output is sorted in columns by the
time the file was changed.
Typically, when you do a simple ls of a directory, the only piece of information you get is the
filename. However, you could use the -F option to display a little bit more. For example, you might
end up with something that looks like this:
Data/ letter.txt script* script2@
Here you can see that at the end of many of the files are a number of different symbols. The /
(forward slash) indicates it is a directory, the @ says it is a symbolic link, and * (asterisk) says it is
executable.
For many years, this was the extent of what you could do (that is, differentiate file types by which
symbol was displayed). However, with newer systems there is a lot more that you can do. If your
terminal can display colors, it is possible to color-code the output of ls. Newer versions of ls have
the option --color= followed by a word saying when it should display colors: never, always, or
auto. If set to auto, the output will only be in color if you are connected to a terminal. If, for example,
you used ls in a script, it may not be useful to have the output displayed in color. In fact, it might
mess up your script. On some systems, you can also set it to tty so that color is only turned on when
running on a console or terminal that supports colors.
By default, a number of different file types and their associated colors are specified in the
/etc/DIR_COLORS file. For example, dark red is used for executable files, light red is used for
archives (tar, rpm), dark blue is for directories, magenta is for image files and so forth. If you have a
symbolic link that points nowhere (i.e., the target file does not exist), the name will blink red. If you
want to change the system defaults, copy /etc/DIR_COLORS to .dir_colors in your home directory.
On some Linux distributions, the ls command is by default an alias, defined like this:
alias ls='/bin/ls $LS_OPTIONS'
where $LS_OPTIONS might contain --color=tty. I have run into cases where the different colors are
very hard to see. In such cases, the easiest thing to do is simply disable the alias like this:
unalias ls
In many cases, you may have a long list of files where you want to find the most recent ones (such
as log files). You could do a long listing and check the date of each one individually to find the most
recent ones. Instead, you could use the -t option to ls, which sorts the files by their modification
time, that is, when the data was last changed. Using the -r option, ls prints them in reverse order, so
the most recent ones are at the bottom of the list. So, to get the 10 most recent files, you would have
a command like this:
ls -ltr | tail

Removing files: rm
-i queries you before removing the file
-r recursively removes files
-f forces removal
The way files and directories are put together in Linux has some interesting side effects. In the
section on files and filesystems, we talk about the fact that a directory is essentially just a list of the
files and pointers to where the files are on the hard disk. If you were to remove the entry in the
directory list, the system would not know where to find the file. That basically means you have
removed the file. That means that even if you did not have write permission on a file, you could
remove it if you had write permission on its parent directory. The same thing applies in reverse: if
you did not have write permission on the directory, you could not remove the file, even if you could
write to the file itself.
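As a sketch of that rule (the file name, owner and date are made up), assume a long listing showed a file you cannot write to, sitting in a directory you can write to:

-r--r--r-- 1 root root 29 Mar 19 19:00 notes/readonly.txt

rm -f notes/readonly.txt

Because you have write permission on the notes directory, the rm succeeds; the -f option merely suppresses the confirmation prompt that rm would otherwise give for a write-protected file.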

Copying files: cp
More than likely you'll sometimes need to make a copy of an existing file. This is done with the cp
command, which typically takes two arguments, the source name and destination name. By default,
the cp command does not work on directories. To do that, you would use the -r option which says to
recursively copy the files.
Typically the cp command only takes two arguments, the source and destination of the copy.
However, you can use more than two arguments if the last argument is a directory. That way you
could copy multiple files into a directory with a single command.
One thing to keep in mind is that the system will open the target file for writing, and if the file does
not yet exist, it will be created using default values for permissions, owner and group. However, if the
file already exists, the contents are written to the target file, which keeps its old values for permissions,
owner and group. Assume we have the following two files:
-rw-r--r-- 1 root root 29 Mar 19 18:59 file1
-rw-r--r-- 1 jimmo root 538 Mar 19 19:01 file2
If I ran as root this command:
cp file1 file3

I end up with a new file that looks like this:


-rw-r--r-- 1 root root 29 Mar 19 19:06 file3
However, if I ran this command:
cp file1 file2

I end up with an overwritten file2, and all of the files now look like this:
-rw-r--r-- 1 root root 29 Mar 19 18:59 file1
-rw-r--r-- 1 jimmo root 29 Mar 19 19:09 file2
-rw-r--r-- 1 root root 29 Mar 19 19:06 file3
The owner of file2 did not change. This was because the file was not created, but rather the contents
were simply overwritten with the contents of file1. You can use the -p option to "preserve" the
attributes of the source file on the new copy.
Oftentimes you do not want to overwrite a file if it already exists. This is where the -i,
--interactive option comes in. It will query you to ask whether the target file should be overwritten
or not. The opposite of this is the -f, --force option, which forces cp to overwrite the target file.
I also use the -R, -r, --recursive option to recursively copy a directory tree from one place to another.
That means all of the files and directories from the source directory are copied into the target.
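Two short sketches of these options in use (the file and directory names are only examples):

# copy a file, preserving mode, timestamps and (where permitted) ownership
cp -p letter.john letter.john.bak

# copy an entire directory tree, including all subdirectories
cp -R letters /backup/letters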

Option Meaning
-a, --archive same as -dpR
--backup[=CONTROL] make a backup of each existing destination file
-b like --backup but does not accept an argument
--copy-contents copy contents of special files when recursive
-d same as --no-dereference --preserve=link
--no-dereference never follow symbolic links
-f, --force if an existing destination file cannot be opened, remove it and try again
-i, --interactive prompt before overwrite
-H follow command-line symbolic links
-l, --link link files instead of copying
-L, --dereference always follow symbolic links
-p same as --preserve=mode,ownership,timestamps
--preserve[=ATTR_LIST] preserve the specified attributes (default: mode,ownership,timestamps), and if possible additional attributes: links, all
--no-preserve=ATTR_LIST don't preserve the specified attributes
-R, -r, --recursive copy directories recursively
-s, --symbolic-link make symbolic links instead of copying
--target-directory=DIRECTORY copy all SOURCE arguments into DIRECTORY
-u, --update copy only when the SOURCE file is newer than the destination file or when the destination file is missing
-v, --verbose explain what is being done
Renaming and moving files: mv
To rename files, you use the mv command (for move). The logic here is that you are moving the
files from one name to another. You would also use this command if moving a file between
directories. Theoretically one could say you are "renaming" the entire path to the file, therefore,
"rename" might be a better command name.
You can see the effects of this if you compare the time it takes to copy a very large file as opposed to
moving it. In the first case, the entire contents need to be rewritten. In the second case, only the
directory entry is changed, which is obviously a lot faster.
Note that simply changing the file name only works if the source and target files are on the same
file system. If you move files between file systems, the data must be rewritten, which basically
takes the same time as a copy.
Like the cp command, mv also takes the -i, --interactive option to query you prior to overwriting an
existing file.

Option Meaning
--backup[=CONTROL] make a backup of each existing destination file
-b like --backup but does not accept an argument
-f, --force do not prompt before overwriting (equivalent to --reply=yes)
-i, --interactive prompt before overwrite (equivalent to --reply=query)
-u, --update move only when the SOURCE file is newer than the destination file or when the destination file is missing
-v, --verbose explain what is being done
Linking files: ln
Linux provides a couple of different ways of giving a file multiple names. One place this is
frequently used is for scripts that either start a program or stop it, depending on the name. If you
were to simply copy one file to another, and you needed to make a change, you would have to
change both files. Instead, you would create a "link". Links are nothing more than multiple files,
with different names, but referring to the exact same data on the hard disk.
There are actually two different kinds of links: "hard" and "soft". A hard link simply creates a new
directory entry for that particular file. This new directory entry can be in the current directory, or
any other directory on the same file system. This is an important aspect because Linux keeps track
of files using a numbered table, with each number representing a single set of data on your hard
disk. This number (the inode number) will be unique within each file system. Therefore, you cannot
have hard links between files on different file systems. (We'll get into the details of inodes in the
section on the hard disk layout.) You can actually see this number if you want by using the -i option
to the ls command. You might end up with output that looks like this:
184494 -rw-r--r-- 2 root root 2248 Aug 11 17:54 chuck
184494 -rw-r--r-- 2 root root 2248 Aug 11 17:54 jimmo
184502 -rw-r--r-- 1 root root 761 Aug 11 17:55 john
Look at the inode number associated with files jimmo and chuck; they are the same (184494). This
means that the two files are linked together and therefore are the exact same file. If you were to
change one file the other one would be changed as well.
To solve the limitation that hard links cannot cross filesystems, you would use a soft or "symbolic" link.
Rather than creating a new directory entry, like in the case of a hard link, a symbolic link is actually
a file that contains the pathname to the other file. Since a symbolic link contains the path, it can
point to files on other file systems, including files on completely different machines (for example, if
you are using NFS).
The downside of symbolic links is that when you remove the target file for a symbolic link, your
data is gone, even though the symbolic link still exists. To create either kind of link you use the ln
command, adding the -s option when you want to create a symbolic link. The syntax is basically the
same as the cp or mv command:

ln [-s] source destination

In this case "source" is the original file and "destination" is the new link.
In addition to being able to link files across file systems using symbolic links, symbolic links can be
used to link directories. Creating links to directories is not possible with hard links.

Display the contents of files: cat


You can display the contents of a file using the cat command. The syntax is very simple:

cat filename

If the file is large, it may scroll off the screen, so you won't be able to see it all at one time. In that
case, you would probably use either the more or less command. Both allow you to display
the contents of a file, while less also allows you to scroll forward and backward throughout the file.
However, less is not found on every Unix operating system, so becoming familiar with more is
useful.
One might think that cat isn't very useful. However, it is often used to send the contents of a file
through another command (using a pipe). For example:

cat filename | sed 's/James/Jim/g'

This would send the file through the sed, replacing all occurrences of "James" with "Jim".
I often use cat to quickly create files without having to use an editor. One common thing is short-n-
sweet shell scripts like this:
cat > script
cat textfile | while read line
do
set -- $line
echo $1
done
<CTRL-D>

The first line redirects the output of cat into the file script. Since we did not pass a filename to cat,
it reads its input from standard input, that is, the keyboard. The input is read into the file until I press
CTRL-D. I then change the permissions to make the file executable and have a working shell script.
For details on these constructs, see the section on basic shell scripting.
Note that the CTRL-D key combination is normally the default for the end-of-file character. This
can be displayed and changed using the stty command. The end-of-file character is shown by the
value of eof=.
cat also has a few options that change its behavior. For example, the -E or --show-ends option will show
a dollar sign ($) at the end of each line. Using the -v or --show-nonprinting option will display non-
printable characters. Both of these are useful in determining whether there are characters in your file
that you would not normally see.

Option Meaning
-b, --number-nonblank number nonblank output lines
-E, --show-ends display $ at end of each line
-n, --number number all output lines
-s, --squeeze-blank never display more than one single blank line
-T, --show-tabs display TAB characters as ^I
-t equivalent to -vT
-v, --show-nonprinting use ^ and M- notation to show non-printable characters, except for LFD and TAB
-e equivalent to -vE
-A, --show-all equivalent to -vET
The cat command can also be used to combine multiple files. Here we need to consider two things.
First, the cat command simply displays all of the files listed on standard output. So to display three
files we might have this command:
cat filename1 filename2 filename3

In the section on pipes and redirection, we talked about being able to redirect standard output to a
file using the greater-than symbol (>). So combining these concepts we might end up with this:
cat filename1 filename2 filename3 > new_file

This sends the contents of the three files (in the order given) into the file "new_file". Note that if the
file already exists, it will be overwritten. As we also discussed, you can also append to an existing
file using two greater-than symbols (>>).
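One more quick sketch (the file name is made up): if you suspect a file contains stray control characters, the -A option makes them visible.

# tabs appear as ^I, line ends as $, and a DOS carriage return shows up as ^M
cat -A suspicious.txt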

Display the contents of files with line numbers: nl


As in many other cases, the nl command does some of the same things that other commands do. For
example, cat -n will show you line numbers in the output, just as nl does.

Option Meaning
-b, --body-numbering=STYLE use STYLE for numbering body lines
-d, --section-delimiter=CC use CC for separating logical pages
-f, --footer-numbering=STYLE use STYLE for numbering footer lines
-h, --header-numbering=STYLE use STYLE for numbering header lines
-i, --page-increment=NUMBER line number increment at each line
-l, --join-blank-lines=NUMBER group of NUMBER empty lines counted as one
-n, --number-format=FORMAT insert line numbers according to FORMAT
-p, --no-renumber do not reset line numbers at logical pages
-s, --number-separator=STRING add STRING after (possible) line number
-v, --first-page=NUMBER first line number on each logical page
-w, --number-width=NUMBER use NUMBER columns for line numbers
Display the beginning of files: head
The head command displays the beginning or "head" of a file. By default, it displays the first 10
lines. Using the -n or --lines= option, you can specify how many lines to display. In some versions
of head you can simply precede the number of lines with a dash, like this: head -20
I commonly use head -1 when I want just the first line of a file.
You can also specify multiple files on the command line, in which case head will show you the
name of each file before its output. This can be suppressed with the -q or --quiet option. Conversely,
the -v, --verbose option will always display the header.
Note that head can also read from standard input. This means that it can serve as one end of a pipe.
Therefore, you can send the output of other commands through head. For example:
sort filename | head -5
This will sort the file and then give you the first five lines.

Option Meaning
-c, --bytes=SIZE print first SIZE bytes
-n, --lines=NUMBER print first NUMBER lines instead of first 10
-q, --quiet, --silent never print headers giving file names
-v, --verbose always print headers giving file names
Display the end of files: tail
The counterpart to the head command is tail. Instead of printing the start of a file, tail prints the end
of the file.
One very useful option that I use all of the time is -f. This "follows" a file or, in other words, it
continues to display the end of the file as it is being written to. I use this quite often when
analyzing log files. Sometimes entries come too fast, so I have to pipe the whole thing through
more, like this:
tail -f logfile | more
Once this is running, you can end it by pressing Ctrl-C. (Or whatever stty says your interrupt key
is).
If you use a plus sign (+) along with a number, tail will start at that line number and then display the
rest of the file. This is often useful if you want the output of a particular command, but not the
header information the command displays. For example, I often use it like this:
vmstat 3 10 | tail +3
This starts at line 3 and displays the rest of the output. (Newer versions of tail may require you to
write this as tail -n +3.)
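Because both head and tail read standard input, you can also combine them to pull a specific range of lines out of a file; a minimal sketch:

# show lines 20 through 25 of a file
head -25 filename | tail -6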

Option Meaning
--retry keep trying to open a file even if it is inaccessible when tail starts or if it becomes inaccessible later -- useful only with -f
-c, --bytes=N output the last N bytes
-f, --follow[={name|descriptor}] output appended data as the file grows; -f, --follow, and --follow=descriptor are equivalent
-F same as --follow=name --retry
Seperating files by column: cut
The cut command is, as its name implies, used to cut up files. This can be done after a specific
number of characters or at specific "fields" within the file. If you look in the /etc/init.d/ directory,
you will find that there are quite a few scripts that use cut in one way or another.
In some cases, the file (or output) has fields that are a certain width. For example, a particular
column always starts at character 18 and the next one starts at character 35. If you wanted to
display just the one field, your command might look like this:
cut -c18-34 filename

Note that if you only specify a single number, you will get the single character at that position. If
you leave off the last number, then cut will display everything from the given position to the end of
the line.
If the file (or output) separates the fields with a particular character (e.g., a tab or colon), you
cannot split the file at a specific character number; instead, you need to split it by field number. For
example, if you wanted a list of the real names of all users in /etc/passwd, your command might
look like this:
cut -f 5 -d: /etc/passwd

Here too, you can specify a range of fields. For example, -f 5-8 would display fields 5 through 8. If
you wanted specific, non-adjoining fields, you separate them with a comma. For example, to display
the 1st and 5th fields in the previous example, the command might look like this:
cut -f 1,5 -d: /etc/passwd
Option Meaning
-b, --bytes=LIST output only these bytes
-c, --characters=LIST output only these characters
-d, --delimiter=DELIM use DELIM instead of TAB for field delimiter
-f, --fields=LIST output only these fields; also print any line that contains no delimiter character, unless the -s option is specified
-n (ignored)
-s, --only-delimited do not print lines not containing delimiters
--output-delimiter=STRING use STRING as the output delimiter; the default is to use the input delimiter
Combining files: paste
The paste command is used to combine files. Lines in the second file that correspond sequentially to
lines in the first file are appended to the lines in the first file. Assume the first file consists of these
lines:
jim
david
daniel
and the second file looks like this:
jan
dec
sept
When you paste the two together, you end up with this:
jim jan
david dec
daniel sept
Option Meaning
-d, --delimiters=LIST reuse characters from LIST instead of TABs
-s, --serial paste one file at a time instead of in parallel
Combining files: join
You can think of join as an enhanced version of paste. However, in the case of join, the files you are
combining must have a field in common. For example, assume the first file consists of these lines:
jim jan
david dec
daniel sept
and the second looks like this:
jim pizza
david soda
daniel ice cream
When you join the two together (here the files are named three and four), you end up with this:
join three four
jim jan pizza
david dec soda
daniel sept ice cream
This only works because both of the files have a common field. Note that the common field is not
repeated, as it would have been had you used paste. To avoid problems with not being able to find
matches, I suggest that you first sort the files before you use join. Note that you do not necessarily
need to match on the first field as we did in the example. If necessary, the fields that match can be
in any position in either file. The -1 option defines which field to use in file 1, and -2 defines the
field to use in file 2.

Option Meaning
-a FILENUM print unpairable lines coming from file FILENUM, where FILENUM is 1 or 2, corresponding to FILE1 or FILE2
-e EMPTY replace missing input fields with EMPTY
-i, --ignore-case ignore differences in case when comparing fields
-j FIELD (obsolescent) equivalent to `-1 FIELD -2 FIELD'
-j1 FIELD (obsolescent) equivalent to `-1 FIELD'
-j2 FIELD (obsolescent) equivalent to `-2 FIELD'
-o FORMAT obey FORMAT while constructing output line
-t CHAR use CHAR as input and output field separator
-v FILENUM like -a FILENUM, but suppress joined output lines
-1 FIELD join on this FIELD of file 1
-2 FIELD join on this FIELD of file 2
Copying and converting files: dd
The dd command is used to create a "digital dump" of a file. It works very simply by opening the
source and destination files in binary mode and copying the contents of one to the other. In essence,
this is what the cp command does. However, dd also works with device nodes. Thus, you can use dd
to copy entire devices from one to the other.
Note that if you were to use the dd command to copy a filesystem from one device to another (for
example, /dev/hda1 to /dev/hdb1), you would not be copying individual files. Instead, you would be
copying an image of the filesystem. This means that all of the metadata of the target file system (i.e.,
the inode table) would be overwritten and you would lose any existing data there. If the target device
were smaller, you won't be able to get all of the old data onto the new one. Further, if the target device
is just one of several partitions, you may end up overwriting parts of other filesystems.
In its simplest form, dd looks like this:
dd if=input_file of=output_file

Where it reads from the input file and writes to the output file.
Two very useful options are ibs= (input block size) and obs= (output block size). Here you tell dd how
many bytes to read or write at a time. When used properly, this can save a great deal of time. For
example, if you are copying from one hard disk to another, the system reads one block and then
writes it. Because of the latency of the spinning hard disk, it takes time for the disk to rotate
back to the correct position. If you choose a block size equal to the sector size of the hard disk, you
can read a whole sector at once, thus saving time.
The dd command can also be used to convert from one encoding to another. For example, using the
conv= option you can convert files from EBCDIC to ASCII.
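Two short sketches of dd in use (the device and file names are only examples; double-check device names before copying to them):

# copy a floppy to an image file, reading and writing 4096 bytes at a time
dd if=/dev/fd0 of=floppy.img ibs=4096 obs=4096

# the same thing using bs=, which sets both block sizes at once
dd if=/dev/fd0 of=floppy.img bs=4096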

Option Meaning
bs=BYTES force ibs=BYTES and obs=BYTES
cbs=BYTES convert BYTES bytes at a time
conv=KEYWORDS convert the file as per the comma separated keyword list
count=BLOCKS copy only BLOCKS input blocks
ibs=n input block size (defaults to 512 byte blocks)
if=FILE read from FILE instead of stdin
obs=n output block size (defaults to 512 byte blocks)
of=FILE write to FILE instead of stdout
seek=BLOCKS skip BLOCKS obs-sized blocks at start of output
skip=BLOCKS skip BLOCKS ibs-sized blocks at start of input

Looking for Files


In the section on Interacting with the System we talked about using the ls command to look for files.
There we had the example of looking in the sub-directory ./letters/taxes for specific files. Using the
ls command, we might have something like this:
ls ./letters/*
What if the taxes directory contained a subdirectory for each year for the past five years, each of
these contained a subdirectory for each month, each of these contained a subdirectory for federal,
state, and local taxes, and each of these contained 10 letters?
If we knew that the letter we were looking for was somewhere in the taxes subdirectory, the
command: ls ./letters/taxes/*

would show us the sub-directories of taxes (federal, local, state), and it would show their contents.
We could then look through this output for the file we were looking for.
What if the file we were looking for was five levels deeper? We could keep adding wildcards (*)
until we reached the right directory, as in: ls ./letters/taxes/*/*/*/*/*

This might work, but what happens if the files were six levels deeper. Well, we could add an extra
wildcard. What if it were 10 levels deeper and we didn't know it? Well, we could fill the line with
wildcards. Even if we had too many, we would still find the file we were looking for.
Fortunately for us, we don't have to type in 10 asterisks to get what we want. We can use the -R
option to ls to do a recursive listing. The -R option also avoids the "argument list too long" error
that we might get with wildcards. So, the solution here is to use the ls command like this:
ls -R ./letters/taxes | more

The problem is that we now have 1,800 files to look through. Piping them through more and
looking for the right file will be very time consuming. If we knew that it was there, but we missed it
on the first pass, we would have to run through the whole thing again.
The alternative is to have the more command search for the right file for you. Because the output is
more than one screen, more will display the first screen and at the bottom display --More--. Here,
we could type a slash (/) followed by the name of the file and press Enter. Now more will search
through the output until it finds the name of the file. Now we know that the file exists.
The problem here is the output of the ls command. We can find out whether a file exists by this
method, but we cannot really tell where it is. If you try this, you will see that more jumps to the spot
in the output where the file is (if it is there). However, all we see is the file name, not what directory
it is in. Actually, this problem exists even if we don't execute a search.
If you use more as the command and not the end of a pipe, instead of just seeing --More--, you will
probably see something like
--More--(16%)
This means that you have read 16 percent of the file.
However, we don't need to use more for that. Because we don't want to look at the entire output
(just search for a particular file), we can use one of three commands that Linux provides to do
pattern searching: grep, egrep, and fgrep. The names sound a little odd to the Linux beginner, but
grep stands for global regular expression print. The other two are newer versions that do similar
things. For example, egrep searches for patterns that are full regular expressions and fgrep searches
for fixed strings and is a bit faster. We go into details about the grep command in the section on
looking through files.
Let's assume that we are tax consultants and have 50 subdirectories, one for each client. Each
subdirectory is further broken down by year and type of tax (state, local, federal, sales, etc.). A
couple years ago, a client of ours bought a boat. We have a new client who also wants to buy a boat,
and we need some information in that old file.
Because we know the name of the file, we can use grep to find it, like this:
ls -R ./letters/taxes | grep boat

If the file is called boats, boat.txt, boats.txt, or letter.boat, the grep will find it because grep is only
looking for the pattern boat. Because that pattern exists in all four of those file names, all four
would be potential matches.
The problem is that the file may not be called boat.txt, but rather Boat.txt. Remember, unlike DOS,
UNIX is case-sensitive. Therefore, grep sees boat.txt and Boat.txt as different files. The solution
here would be to tell grep to look for both.
Remember our discussion on regular expressions in the section on shell basics? Not only can we use
regular expressions for file names, we can use them in the arguments to commands. The term
regular expression is even part of grep's name. Using regular expressions, the command might look
like this: ls -R ./letters/taxes | grep [Bb]oat

This would now find both boat.txt and Boat.txt.


Some of you may see a problem with this as well. Not only does Linux see a difference between
boat.txt and Boat.txt, but also between Boat.txt and BOAT.TXT. To catch all possibilities, we would
have to have a command something like this:
ls -R ./letters/taxes | grep [Bb][Oo][Aa][Tt]

Although this is perfectly correct syntax and it will find the files no matter what case the word
"boat" is in, it is too much work. The programmers who developed grep realized that people
would want to look for things regardless of what case they are in. Therefore, they built in the -i
option, which simply says ignore the case. Therefore, the command
ls -R ./letters/taxes | grep -i boat

will not only find boats, boat.txt, boats.txt, and letter.boat, but it will also find Boat.txt and
BOAT.TXT as well.
If you've been paying attention, you might have noticed something. Although the grep command
will tell you about the existence of a file, it won't tell you where it is. This is just like piping it
through more. The only difference is that we're filtering out something. Therefore, it still won't tell
you the path.
Now, this isn't grep's fault. It did what it was supposed to do. We told it to search for a particular
pattern and it did. Also, it displayed that pattern for us. The problem is still the fact that the ls
command is not displaying the full paths of the files, just their names.
Instead of ls, let's use a different command. Let's use find instead. Just as its name implies, find is
used to find things. What it finds is files. If we change the command to look like this:
find ./letters/taxes -print | grep -i boat
This finds what we are looking for and gives us the paths as well.
Before we go on, let's look at the syntax of the find command. There are a lot of options and it does
look foreboding, at first. We find it is easiest to think of it this way:
find <starting_where> <search_criteria> <do_something>

In this case, the "where" is ./letters/taxes. Therefore, find starts its search in the ./letters/taxes
directory. Here, we have no search criteria; we simply tell it to do something. That something was to
-print out what it finds. Because the files it finds all have a path relative to ./letters/taxes, this is
included in the output. Therefore, when we pipe it through grep, we get the path to the file we are
looking for.
We also need to be careful because the find command we are using will also find directories named
boat. This is because we did not specify any search criteria. If instead we wanted it just to look for
regular files (which is often a good idea), we could change the command to look like this:
find ./letters/taxes -type f -print | grep -i boat

Here we see the option -type f as the search criteria. This will find all the files of type f for regular
files. This could also be a d for directories, c for character special files, b for block special files, and
so on. Check out the find man-page for other types that you can use.
Too complicated? Let's make things easier by avoiding grep. There are many different things that
we can use as search criteria for find. Take a quick look at the man-page and you will see that you
can search for a specific owner, groups, permissions, and even names. Instead of having grep do the
search for us, let's save a step (and time) by having find do the search for us. The command would
then look like this: find ./letters/taxes -name boat -print

This will find any file named boat and list its respective path. The problem here is that it will only
find the files named boat. It won't find the files boat.txt, boats.txt, or even Boat.
The nice thing is that find understands about regular expressions, so we could issue the command
like this: find ./letters/taxes -name '[Bb]oat' -print

(Note that we included the single quotes (') to prevent the square brackets ([]) from being
interpreted first by the shell.)
This command tells find to look for all files named either boat or Boat. However, this won't find
BOAT. We are almost there.
We have two alternatives. One is to expand the find to include all possibilities, as in
find ./letters/taxes -name '[Bb][Oo][Aa][Tt]' -print

This will find all the files with any combination of those four letters and print them out. However, it
won't find boat.txt. Therefore, we need to change it yet again. This time we have
find ./letters/taxes -name '[Bb][Oo][Aa][Tt]*' -print

Here we have passed the wildcard (*) to find to tell it to find anything that starts with "boat"
(upper- or lowercase), followed by anything else. If we add an extra asterisk, as in
find ./letters/taxes -name '*[Bb][Oo][Aa][Tt]*' -print
we not only get boat.txt, but also newboat.txt, which the first example would have missed.
This works. Is there an easier way? Well, sort of. There is a way that is easier in the sense that there
are fewer characters to type in. This is: find ./letters/taxes -print | grep -i boat

Isn't this the same command that we issued before? Yes, it is. In this particular case, this
combination of find and grep is the easier solution, because all we are looking for is the path to a
specific file. However, these examples show you different options of find and different ways to use
them. That's one of the nice things about Linux. There are many ways to get the same result.
Note that more recent versions of find do not require the -print option, as this is the default
behavior.
Looking for files with specific names is only one use of find. However, if you look at the find man-
page, you will see there are many other options you can use. One thing I frequently do is to look for
files that are older than a specific age. For example, on many systems, I don't want to hang on to log
files that are older than six months. Here I could use the -mtime options like this:
find /usr/log/mylogs -mtime +180

This says to find everything in the /usr/log/mylogs directory that is older than 180 days (not
exactly six months, but close enough). If I wanted, I could have used the -name option to specify
a particular file pattern:
find ./letters/taxes -name '*[Bb][Oo][Aa][Tt]*' -mtime +180

One question with this is: what determines how "old" a file is? The first answer for many people is
that the age of a file is how long it has been since the file was created. Well, if I created a file two
years ago, but added new data to it a minute ago, is it "older" than a file that I created yesterday, but
have not changed since then? It really depends on what you are interested in. For log files, I would
say that the time the data in the file was last changed is more significant than when the file was created.
Therefore, the -mtime is fitting as it bases its time on when the data was changed.
However, that's not always the case. Sometimes, you are interested in the last time the file was used,
or accessed. This is when you would use the -atime option. This is helpful in finding old files on your
system that no one has used for a long time.
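For example, to get a rough idea of what is just gathering dust, we might look for regular files under /home (only an example starting point) that no one has read in over a year:
find /home -type f -atime +365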
You could also use the -ctime option, which is based on when the file's "status" was last changed.
The status is changed when the permissions or file owner is changed. I have used this option in
security contexts. For example, on some of our systems there are only a few places, such as /var/log,
that contain files that should change at all. If I search for all files that were changed at all
(content or status), it might give me an indication of improper activity on the system. I can run a
script a couple of times an hour to show me the files that have changed within the last day. If
anything shows up, I suspect a security problem.
Three files that we specifically monitor are /etc/passwd, /etc/group and /etc/shadow. Interestingly
enough, we actually want one of these files (/etc/shadow) to change once a month. This is our "proof" that the
root password was changed as it should be at regular intervals. Note that we have other mechanisms
to ensure that it was the root password that was changed and not simply changing something else in
the file, but you get the idea. One place you see this mechanism at work is your /usr/lib/cron/run-
crons file, which is started from /etc/crontab every 15 minutes.
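A bare-bones sketch of such a check might look like the following (the directory and the mail recipient are only examples, and the if construct is covered in the section on basic shell scripting):
# list any file under /etc whose status changed within the last day
CHANGED=`find /etc -ctime -1 -print`
if [ -n "$CHANGED" ]
then
echo "$CHANGED" | mail -s "Files changed on `hostname`" root
fi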
One shortcoming of -mtime and the others is that they measure time in 24-hour increments starting
from now. That means that you cannot find anything that was changed within the last hour, for
example. For this, newer versions of find have the -cmin, -amin and -mmin options, which measure
times in minutes. So, to find all of the files accessed within the last hour (i.e. the last 60 minutes) we
might have something like this: find / -amin -60

In this example, the value was preceded with a minus sign (-), which means that we are looking for
files with a value less than what we specified. In this case, we want values less than 60 minutes. In
the examples above, we used a plus-sign (+) before the value, which means values greater than what
we specified. If you use neither one, then the time is exactly what you specified.
Along the same lines are the options -newer, -anewer and -cnewer, which find files that are newer
than the file specified.
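For example, assuming we keep a timestamp file around (the name /tmp/last_backup is made up for this example), we could list everything under ./letters that has been modified since that file was last touched:
find ./letters -newer /tmp/last_backup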
Note also that these commands find everything in the specified path older or younger than what we
specify. This includes files, directories, device nodes and so forth. Maybe this is what you want, but
not always. Particularly if you are using the -exec option and want to search through each file you
find, looking at "non-files" is not necessarily a good idea. To specify a file type, find provides you
with the -type option. Among the possible file types are:
• b - block device
• c - character device
• d - directory
• p - named pipe (FIFO)
• f - regular file
• l - symbolic link
• s - socket

As you might expect, you can combine the -type option with the other options we discussed, to give
you something like this:

find ./letters/taxes -name '*[Bb][Oo][Aa][Tt]*' -type f -mtime +180

The good news and the bad news at this point is that there are many, many more options you can
use. For example, you can search for files based on their permissions (-perm), their owner (-user),
their size (-size), and so forth. Many I occasionally use, some I have never used. See the find man-
page for a complete list.
In addition to the -exec option, there are a number of other options that are applied to the files that are
found (rather than used to restrict what files are found). Note that in most documentation, the
options used to restrict the search are called tests and the options that perform an operation on the
files are called actions. One very simple action is -ls, which does a listing of the files the same as
using the -dils options to the ls command.
A variant of the -exec action is -ok. Rather than simply performing the action on each file, -ok will
first ask you to confirm that it should do it. Pressing "Y" or "y" will run the command, pressing
anything else will not.
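As a simple illustration (the pattern is just an example), the following would walk through the .bak files under the current directory and ask before removing each one:
find . -type f -name '*.bak' -ok rm {} \;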
With what we have discussed so far, you might run into a snag if there is more than one criterion
you want to search on (i.e. more than one test). Find addresses that by allowing you to combine tests
using either OR (-o or -or) or AND (-a or -and). Furthermore, you can negate the result of any test (! or
-not). Let's say we wanted to find all of the files that were not owned by the user jimmo. Our
command might look like this:
find ./ -name *.html -not -user jimmo

This brings up an important issue. In the section on interpreting the command, we talk about the fact
that the shell expands wildcards before passing them to the command to be executed. In this
example, if there were a file in the current directory ending in .html, the shell would first expand
the *.html to that name before passing it to find. We therefore need to "protect" the pattern before we pass it.
This is done using single quotes and the resulting command might look like this:
find ./ -name '*.html' -not -user jimmo

For details on how quoting works, check out the section on quotes.
It is important to keep in mind the order in which things are evaluated. First comes negation (! or -not),
followed by AND (-a or -and), then finally OR (-o or -or). In order to force evaluation in a particular way,
you can include expressions in parentheses. For example, if we wanted all of the files or directories
owned by either root or bin, the command might look like this:
find / \( -type f -o -type d \) -a \( -user root -o -user bin \)

This requires a little explanation. I said that you would use parentheses to group the tests together.
However, they are preceded here with a backslash (\). The reason is that otherwise the shell would see the
parentheses and try to execute what is inside in a separate shell, which is not what we want.

Looking Through Files


In the section on looking for files, we talk about various methods for finding a particular file on
your system. Let's assume for a moment that we were looking for a particular file, so we used the
find command to look for a specific file name, but none of the commands we issued came up with a
matching file. There was not a single match of any kind. This might mean that we removed the file.
On the other hand, we might have named it yacht.txt or something similar. What can we do to find
it?
We could jump through the same hoops for using various spelling and letter combinations, such as
we did for yacht and boat. However, what if the customer had a canoe or a junk? Are we stuck with
trying every possible word for boat? Yes, unless we know something about the file, even if that something
is in the file.
The nice thing is that grep doesn't have to be the end of a pipe. One of the arguments can be the
name of a file. If you want, you can use several files, because grep will take the first argument as
the pattern it should look for. If we were to enter
grep [Bb]oat ./letters/taxes/*

we would search the contents of all the files in the directory ./letters/taxes looking for the word Boat
or boat.
If the file we were looking for happened to be in the directory ./letters/taxes, then all we would need
to do is run more on the file. If things are like the examples above, where we have dozens of
directories to look through, this is impractical. So, we turn back to find.
One useful option to find is -exec. When a file is found, you use -exec to execute a command. We
can therefore use find to find the files, then use -exec to run grep on them. Still, you might be asking
yourself what good this is to you. Because you probably don't have dozens of files on your system
related to taxes, let's use an example from files that you most probably have.
Let's find all the files in the /etc directory containing /bin/sh. This would be run as
find ./etc -exec grep /bin/sh {} \;

The curly braces ({ }) are replaced by the name of each file found by the search, so the actual grep command
would be something like
grep /bin/sh ./etc/filename

The "\;" is a flag saying that this is the end of the command.
What the find command does is search for all the files that match the specified criteria and then run
grep on each file it finds, looking for the given pattern. In the boat example, that means running grep
with the pattern [Bb]oat on every file under ./letters/taxes (there were no search criteria, so it found them all).
Do you know what this tells us? It says that there is a file somewhere under the directory
./letters/taxes that contains either "boat" or "Boat." It doesn't tell me what the file name is because
of the way the -exec is handled. Each file name is handed off one at a time, replacing the {}. It is as
though we had entered individual lines for
grep [Bb]oat ./letters/taxes/file1

grep [Bb]oat ./letters/taxes/file2

grep [Bb]oat ./letters/taxes/file3

If we had entered
grep [Bb]oat ./letters/taxes/*

grep would have output the name of the file in front of each matching line it found. However,
because each line is treated separately when using find, we don't see the file names. We could use
the -l option to grep, but that would only give us the file name. That might be okay if there was one
or two files. However, if a line in a file mentioned a "boat trip" or a "boat trailer," these might not be
what we were looking for. If we used the -l option to grep, we wouldn't see the actual line. It's a
catch-22.
To get what we need, we must introduce a new command: xargs. By using it as one end of a pipe,
you can repeat the same command on different files without actually having to input the command
multiple times.
In this case, we would get what we wanted by typing
find ./letters/taxes -print | xargs grep [Bb]oat
The first part is the same as we talked about earlier. The find command simply prints all the names
it finds (all of them, in this case, because there were no search criteria) and passes them to xargs.
Next, xargs bundles them up and builds grep commands from them. Because grep is handed more than
one file name at a time, unlike with the -exec option to find, it will output the name of the file before each matching line.
Obviously, this example does not find those instances where the file we were looking for contained
words like "yacht" or "canoe" instead of "boat." Unfortunately, the only way to catch all
possibilities is to actually specify each one. So, that's what we might do. Rather than listing the
different possible synonyms for boat, let's just take three: boat, ship, and yacht.
To do this, we need to run the find | xargs command three times. However, rather than typing in the
command each time, we are going to take advantage of a useful aspect of the shell. In some
instances, the shell knows when you want to continue with a command and gives you a secondary
prompt. If you are running sh or ksh, then this is probably denoted as ">."
For example, if we typed
find ./letters/taxes -print |
the shell knows that a command cannot end with a pipe (|), so there must be more to come. It then gives us a > or ? prompt
where we can continue typing
> xargs grep -i boat
The shell interprets these two lines as if we had typed them all on the same line. We can use this
with a shell construct that lets us do loops. This is the for/in construct for sh and ksh, and the
foreach construct in csh. It would look like this:
for j in boat ship yacht
> do
> find ./letters/taxes -print | xargs grep -i $j
> done

In this case, we are using the variable j, although we could have called it anything we wanted. When
we put together quick little commands, we save ourselves a little typing by using single letter
variables.
In the bash/sh/ksh example, we need to enclose the body of the loop inside the do-done pair. In the
csh version, the loop would end with the keyword end instead. In both cases, this little command we have written will
loop through three times. Each time, the variable $j is replaced with one of the three words that we
used. If we had thought up another dozen or so synonyms for boat, then we could have included
them all. Remember also that the shell knows that the pipe (|) is not the end of the command, so this
would work as well.
for j in boat ship yacht
> do
> find ./letters/taxes -print |
> xargs grep -i $j
> done

Doing this from the command line has a drawback. If we want to use the same command again, we
need to retype everything. However, using another trick, we can save the command. Remember that
both the ksh and csh have history mechanisms to allow you to repeat and edit commands that you
recently edited. However, what happens tomorrow when you want to run the command again?
Granted, ksh has the .sh_history file, but what about sh and csh?
Why not save commands that we use often in a file that we have all the time? To do this, you would
create a basic shell script, and we have a whole section just on that topic.
When looking through files, I am often confronted with the situation where I am not just looking for
a single text, but possible multiple matches. Imagine a data file that contains a list of machines and
their various characteristics, each on a separate line, which starts with that characteristic. For
example:
Name: lin-db-01
IP: 192.168.22.10
Make: HP
CPU: 700
RAM: 512
Location: Room 3
All I want is the computer name, the IP address and the location, but not the others. I could do three
individual greps, each with a different pattern. However, it would be difficult to make the
association between the separate entries. That is, the first time I would have a list of machine's
names, the second time a list of addresses and the third time a list of locations. I have written scripts
before that handle this kind of situation, but in this case it would be easier to use a standard Linux
command: egrep.
The egrep command is an extension of the basic grep command. (The 'e' stands for extended) In
older versions of grep, you did not have the ability to use things like [:alpha:] to represent
alphabetic characters, so extended grep was born. For details on representing characters like this
check out the section in regular expressions.
One extension is the ability to have multiple search patterns that are checked simultaneously. That
is, if any of the patterns are found, the line is displayed. So in the problem above we might have a
command like this: egrep "Name:|IP:|Location:" FILENAME

This would then list all of the respective lines in order, making association between name and the
other values a piece of cake.
Another variant of grep is fgrep, which interprets the search pattern as a list of fixed strings,
separated by newlines, any of which is to be matched. On some systems, grep, egrep and fgrep are
all hard links to the same file.
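The practical difference is that fgrep treats characters such as the dot and the asterisk literally rather than as regular expression operators. For example, to find the machine with a particular IP address in the data file above, where each dot should match only a real dot, we might use:
fgrep '192.168.22.10' FILENAME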
I am often confronted with files where I want to filter out the "noise". That is, there is a lot of stuff
in the files that I don't want to see. A common example is looking through large shell scripts or
configuration files when I am not sure exactly what I am looking for. I will know it when I see it, but
simply grepping for a particular term is impossible, as I am not sure what the term is. Therefore, it would
be nice to ignore things like comments and empty lines.
Once again we could use egrep as there are two expressions we want to match. However, this time
we also use the -v option, which simply flips or inverts the meaning of the match. Let's say there
was a start-up script that contained a variable you were looking for. You might have something like
this: egrep -v "^$|^#" /etc/rc.d/*|more

The first part of the expression says to match on the beginning of the line (^) followed immediately
by the end of the line ($), which turns out to match all empty lines. The second part of the expression
says to match on all lines that start with the pound-sign (a comment). This ends up giving me all of
the "interesting" lines in the file. The long option is easier to remember: --invert-match.
You may also run into a case where all you are interested in is which files contain a particular
expression. This is where the -l option comes in (long version: --files-with-matches). For example,
when I made some style changes to my web site I wanted to find all of the files that contained a
table. This means the file had to contain the <TABLE> tag. Since this tag could contain some
options, I was interested in all of the file which contained "<TABLE". This could be done like this:
grep -l '<TABLE' FILENAME

There is an important thing to note here. In the section on interpreting the command, we learn that
the shell sets up file redirection before it tries to execute the command. If we don't include the less-
than symbol inside the single quotes, the shell will try to redirect the input from a file named "TABLE".
See the section on quotes for details on this.
The -l option (long version: --files-with-matches) says to simply list the file names. The -L
option (long version: --files-without-match) does the opposite: it lists the files that contain no matching
lines at all. Note that in both cases, the lines containing the matches are not displayed, just the file
name.
Another common option is -q (long: --quiet or --silent). This does not display anything. So, what's
the use in that? Well, often, you simply want to know if a particular value exists in a file. Regardless
of the options you use, grep will return 0 if any matches were found, and 1 if no matches were
found. You can check the $? variable after running grep -q: if it is 0, you found a match. Check out the
section on basic shell scripting for details on the $? and other variables.
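As a quick sketch (FILENAME is just a placeholder), the following prints 0 if the file mentions a boat in any case and 1 if it does not:
grep -q -i boat FILENAME
echo $?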
Keep in mind that you do not need to use grep to read through files. Instead, it can be one end of a
pipe. For example, I have a number of scripts that look through the process list to see if a particular
process is running. If so, then I know all is well. However, if the process is not running, a message
is sent to the administrators.
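A stripped-down sketch of such a check might look like this (the process name and the mail recipient are only examples, and a real script would do a bit more):
ps aux | grep -v grep | grep -q httpd
if [ $? -ne 0 ]
then
echo "httpd does not appear to be running" | mail -s "httpd down" root
fi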

Basic Shell Scripting


In many of the other sections of the shell and utilities, we talked about a few programming
constructs that you could use to create a quick script to perform some complex task. What if you
wanted to repeat that task with different parameters each time? One simple solution is to re-
type everything each time. Obviously, that is not a happy thing.
We could use vi or some other text editor to create the file. However, we could take advantage of a
characteristic of the cat command, which is normally used to output the contents of a file to the
screen. You can also redirect the output of cat to another file.
If we wanted to combine the contents of several files, we could do something like this:
cat file1 file2 file3 >newfile

This would combine file1, file2, and file3 into newfile.


What happens if we leave the names of the source files out? In this instance, our command would
look like this:
cat > newfile
Now, cat will take its input from the default input file, stdin. We can now type in lines, one at a
time. When we are done, we tell cat to close the file by sending it an end-of-file character, Ctrl-D.
So, to create the new command, we would issue the cat command as above and type in our
command as the following:

for j in boat ship yacht


do
find ./letters/taxes -print | xargs grep -i $j
done

<CTRL-D>
Note that here the secondary prompt, >, does not appear because it is cat that is reading our input
and not the shell. We now have a file containing the lines that we typed in, which we can use as a
shell script.
However, right now, all that we have is a file named newfile containing a few lines of text. We need to tell
the system that it is a shell script that can be executed. Remember in our discussion on operating
system basics that I said that a file's permissions need to be set to be able to execute the file. To
change the permissions, we need a new command: chmod. (Read as "change mode" because we are
changing the mode of the file.)
The chmod command is used to not only change access to a file, but also to tell the system that it
should try to execute the command. I said "try" because the system would read that file, line-by-
line, and would try to execute each line. If we typed in some garbage in a shell script, the system
would try to execute each line and would probably report not found for every line.
To make a file execute, we need to give it execute permissions. To give everyone execution
permissions, you use the chmod command like this:
chmod +x newfile

Now the file newfile has execute permissions, so, in a sense, it is executable. However, remember
that I said the system would read each line. In order for a shell script to function correctly, it also
needs to be readable by the person executing it. In order to read a file, you need to have read
permission on that file. More than likely, you already have read permissions on the file since you
created it. However, since we gave everyone execution permissions, let's give them all read
permissions as well, like this:
chmod +r newfile

You now have a new command called newfile. This can be executed just like any command the system
provides for you. If that file resides in a directory somewhere in your path, all you need to do is type
it in. Otherwise, (as we talked about before) you need to enter in the path as well. Keep in mind that
the system does not need to be able to read binary programs. All it needs to be able to do is execute
them. Now you have your first shell script and your first self-written UNIX command.
What happens if, after looking through all of the files, you don't find the one you are looking for?
Maybe you were trying to be sophisticated and used "small aquatic vehicle" instead of boat. Now,
six months later, you cannot remember what you called it. Looking through every file might take a
long time. If only you could shorten the search a little. Because you remember that the letter you
wrote was to the boat dealer, if you could remember the name of the dealer, you could find the
letter.
The problem is that six months after you wrote it, you can no more remember the dealer's name
than you can remember whether you called it a "small aquatic vehicle" or not. If you are like me,
seeing the dealer's name will jog your memory. Therefore, if you could just look at the top portion
of each letter, you might find what you are looking for. You can take advantage of the fact that the
address is always at the top of the letter and use a command that is designed to look there. This is
the head command, and we use it like this: find ./letters/taxes -exec head {} \;

This will look at the first 10 (the default for head) lines of each of the files that it finds. If the
addressee were not in the first ten lines, but rather in the first 20 lines, we could change the
command to be: find ./letters/taxes -exec head -20 {} \;

The problem with this is that 20 lines is almost an entire screen. If you ran this, it would be
comparable to running more on every file and hitting q to exit after it showed the first screen.
Fortunately, we can add another command to restrict the output even further. This is the tail
command, which is just the opposite of head as it shows you the bottom of a file. So, if we knew
that the address resided on lines 15-20, we could run a command like this:
find ./letters/taxes -exec head -20 {} \; | tail -5

This command passes the first 20 lines of each file through the pipe, and then tail displays the last
five lines. So you would get lines 15-20 of every file, right? Not quite.
The problem is that the shell sees this as a single pipeline made up of two commands: find
./letters/taxes -exec head -20 {} \; and tail -5. All of the output of the find is sent through the pipe and it is
the last five lines of this that tail shows. Therefore, if the find | head had found 100 files, we would
not see the contents of the first 99 files!
The solution is to add two other shell constructs: while and read. The first construct carries out a
particular command (or set of commands) while some criteria are true. The read can read input
either from the command line, or as part of a more complicated construction. So, using cat again to
create a command as we did above, we could have something like this:
find ./letters/taxes -print | while read FILE
do
echo $FILE
head -20 $FILE | tail -5
done

In this example, the while and read work together. The while will continue so long as it can read
something into the variable FILE; that is, so long as there is output coming from find. Here again,
we also need to enclose the body of the loop within the do-done pair.
The first line of the loop simply echoes the name of the file so we can keep track of what file is
being looked at. Once we find the correct name, we can use it as the search criteria for a find | grep
command. This requires looking through each file twice. However, if all you need to see is the
address, then this is a lot quicker than doing a more on every file.
If you have read through the other sections, you have a pretty good idea of how commands can be
put together to do a wide variety of tasks. However, to create more complicated scripts, we need
more than just a few commands. There are several shell constructs that you need to be familiar with
to make complicated scripts. A couple (the while and for-in constructs) we already covered.
However, there are several more that can be very useful in a wide range of circumstances.
There are several things we need to talk about before we jump into things. The first is the idea of
arguments. Like binary programs, you can pass arguments to shell scripts and have them use these
arguments as they work. For example, let's assume we have a script called myscript that takes three
arguments. The first is the name of a directory, the second is a file name, and the third is a word to
search for. The script will search for all files in the directory with any part of their name being the
file name and then search in those files for the word specified. A very simple version of the script
might look like this:

ls $1 | grep $2 | while read file


do
grep $3 ${1}/${file}
done

The syntax is: myscript directory file_name word

I discussed the while-do-done construct when I discussed different commands like find and grep.
The one difference here is that we are sending the output of a command through a second pipe
before we send it to the while.
This also brings up a new construct: ${1}/${file}. By enclosing a variable name inside of curly
braces, we can combine variables. In this case, we take the name of the directory (${1}), and tack
on a "/" for a directory separator, followed by the name of a file that grep found (${file}). This
builds up the path name to the file.
When we run the program like this
myscript /home/jimmo trip boat
the three arguments /home/jimmo, trip, and boat are assigned to the positional parameters 1, 2, and
3, respectively. "Positional" because the number they are assigned is based on where they appear in
the command. Because the positional parameters are shell variables, we need to refer to them with
the leading dollar sign ($).
When the shell interprets the command, what is actually run is
ls /home/jimmo | grep trip | while read file
do
grep boat /home/jimmo/${file}
done

If we wanted, we could make the script a little more self-documenting by assigning the values of the
positional parameters to variables. The new script might look like this:
DIR=$1
FILENAME=$2
WORD=$3
ls -1 $DIR | grep $FILENAME | while read file
do
grep $WORD ${DIR}/${file}
done
If we started the script again with the same arguments, first /home/jimmo would get assigned to the
variable DIR, trip would get assigned to the variable FILENAME, and boat would get assigned to
WORD. When the command was interpreted and run, it would still be evaluated the same way.
Being able to assign positional parameters to variables is useful for a couple of reasons. First is the
issue of self-documenting code. In this example, the script is very small and because we know what
the script is doing, we probably would not have made the assignments to the variables. However, if
we had a larger script, then making the assignment is very valuable in terms of keeping track of
things.
The next issue is that many older shells can only reference 10 positional
parameters. The first, $0, refers to the script itself. What this can be used for, we'll get to in a minute.
The others, $1-$9, refer to the arguments that are passed to the script. What happens if you have
more than nine arguments? This is where the shift instruction comes in. It moves the arguments
"down" in the positional parameters list.
For example, let's assume we changed the first part of the script like this:
DIR=$1
shift
FILENAME=$1

On the first line, the value of positional parameter 1 is /home/jimmo and we assign it to the variable
DIR. In the next line, the shift moves every positional parameter down. Because $0 remains
unchanged, what was in $1 (/home/jimmo) drops out of the bottom. Now, the value of positional
parameter 1 is trip, which is assigned to the variable FILENAME, and positional parameter 2 (boat)
is assigned to WORD. If we had 10 arguments, the tenth would initially be unavailable to us.
However, once we do the shift, what was the tenth argument is shifted down and becomes the ninth.
It is now accessible through the positional parameter 9. If we had more than 10, there are a couple
of ways to get access to them. First, we could issue enough shifts until the arguments all moved
down far enough. Or, we could use the fact that shift can take as an argument the number of shifts it
should do. Therefore, using shift 9 makes the tenth argument positional parameter 1.
What about the other nine arguments? Are they gone? If you never assigned them to a variable, then
yes, they are gone. However, if you assigned them to a variable before you made the shift, you still
have access to their values. New versions of many shells (such as bash) can handle greater number
of position parameters. However, being able to shift positional parameters comes in handy in other
instances, which brings up the issue of a new parameter: $*. This parameter refers to all the
positional parameters (except $0). So, if we had 10 positional parameters and did a shift 2 (ignoring
whatever we did with the first two), the parameter $* would contain the value of the last eight
arguments. In our sample script above, if we wanted to search for a phrase and not just a single
word, we could change the script to look like this:
DIR=$1
FILENAME=$2
shift 2
WORD=$*
ls -1 $DIR | grep $FILENAME | while read file
do
grep "$WORD" ${DIR}/${file}
done
The first change was that after assigning positional parameters 1 and 2 to variables, we shifted
twice, effectively removing the first two arguments. We then assigned the remaining argument to
the variable WORD (WORD=$*). Because this could have been a phrase, we needed to enclose the
variable in double-quotes ("$WORD"). Now we can search for phrases as well as single words. If
we did not include the double quotes, the system would view our entry as individual arguments to
grep.
Another useful parameter keeps track of the total number of parameters: $#. In the previous script,
what would happen if we had only two arguments? The grep would fail because there would be
nothing for it to search for. Therefore, it would be a good thing to keep track of the number of
arguments. We need to first introduce a new construct: if-then-fi. This is similar to the while-do-
done construct, where the if-fi pair marks the beginning and end of the block (fi is simply if reversed). The
difference is that instead of repeating the commands within the block while the specific condition is
true, we do it only once, if the condition is true. In general, it looks like this:
if [ condition ]
then
do something
fi

The conditions are all defined in the test man-page. They can be string comparisons, arithmetic
comparisons, and even conditions where we test specific files, such as whether the files have write
permission. Check out the test man-page for more examples. Because we want to check the number
of arguments passed to our script, we will do an arithmetic comparison. We can check if the values
are equal, the first is less than the second, the second is less than the first, the first is greater than or
equal to the second, and so on. In our case, we want to ensure that there are at least three arguments,
because having more is valid if we are going to be searching for a phrase. Therefore, we want to
compare the number of arguments and check if it is greater than or equal to 3. So, we might have
something like this:
if [ $# -ge 3 ]
then
body_of_script
fi

If we have only two arguments, the test inside the brackets is false, the if fails, and we do not enter
the block. Instead, the program simply exits silently. However, to me, this is not enough. We want to
know what's going on, therefore, we use another construct: else. When this construct is used with
the if-then-fi, we are saying that if the test evaluates to true, do one thing; otherwise, do something
else. In our example program, we might have something like this:
if [ $# -ge 3 ]
then
DIR=$1
FILENAME=$2
shift 2
WORD=$*
ls -1 $DIR | grep $FILENAME | while read file
do
grep "$WORD" ${DIR}/${file}
done
else
echo "Insufficient number of arguments"
fi
If we only put in two arguments, the if fails and the commands between the else and the fi are
executed. To make the script a little more friendly, we usually tell the user what the correct syntax
is; therefore, we might change the end of the script to look like this:
else
echo "Insufficient number of arguments"
echo "Usage: $0 <directory> <file_name> <word>"
fi

The important part of this change is the use of the $0. As I mentioned a moment ago, this is used to
refer to the program itself: not just its name, but rather the way it was called. Had we hard-coded the
line to look like this
echo "Usage: myscript <directory> <file_name> <word>"

then no matter how we started the script, the output would always be
Usage: myscript <directory> <file_name> <word>

However, if we used $0 instead, we could start the program like this


/home/jimmo/bin/myscript /home/jimmo file

and the output would be


Usage: /home/jimmo/bin/myscript <directory> <file_name> <word>

On the other hand, if we started it like this


./bin/myscript /home/jimmo file

the output would be


Usage: ./bin/myscript <directory> <file_name> <word>

One thing to keep in mind is that the else needs to be within the matching if-fi pair. The key here is
the word matching. We could nest the if-then-else-fi several layers if we wanted. We just need to
keep track of things. The key issues are that each fi matches the most recent unmatched if and each else is enclosed
within an if-fi pair. Here is how multiple sets might look:
if [ $condition1 = "TRUE" ]
then
if [ $condition2 = "TRUE" ]
then
if [ $condition3 = "TRUE" ]
then
echo "Conditions 1, 2 and 3 are true"
else
echo "Only Conditions 1 and 2 are true"
fi
else
echo "Only Condition 1 is true"
fi
else
echo "No conditions are true"
fi
This doesn't take into account the possibility that condition1 is false, but that either condition2 or
condition3 is true or that conditions 1 and 3 are true, but 2 is false. However, you should see how to
construct nested conditional statements.
What if we had a single variable that could take on several values? Depending on the value that it
acquired, the program would behave differently. This could be used as a menu, for example. Many
system administrators build such a menu into their user's .profile (or .login) so that they never need
to get to a shell. They simply input the number of the program that they want to run and away they
go.
To do something like this, we need to introduce yet another construct: the case-esac pair. Like the if-
fi pair, esac is the reverse of case. So to implement a menu, we might have something like this:
read choice
case $choice in
a) program1;;
b) program2;;
c) program3;;
*) echo "No such Option";;
esac

If the value of choice that we input is a, b, or c, the appropriate program is started. The things to
note are the in on the first line, the expected value that is followed by a closing parenthesis, and that
there are two semi-colons at the end of each block.
It is the closing parenthesis that indicates the end of the possibilities. If we wanted, we could have
included other possibilities for the different options. In addition, because the double semi-colons
mark the end of the block, we could have simply added another command before we got to the end
of the block. For example, if we wanted our script to recognize either upper- or lowercase, we
could change it to look like this:
read choice
case $choice in
a|A) program1
program2
program3;;
b|B) program2
program3;;
c|C) program3;;
*) echo "No such Option";;
esac

If necessary, we could also include a range of characters, as in


case $choice in
[a-z] ) echo "Lowercase";;
[A-Z] ) echo "Uppercase";;
[0-9] ) echo "Number";;
esac

Now, whatever is called as the result of one of these choices does not have to be a UNIX command.
Because each line is interpreted as if it were executed from the command line, we could have
included anything that we could have typed at a shell prompt. Provided they
are known to the shell script, this also includes aliases, variables, and even shell functions.
A shell function behaves similarly to functions in other programming languages. It is a portion of
the script that is set off from the rest of the program and is accessed through its name. These are the
same as the functions we talked about in our discussion of shells. The only apparent difference is
that functions created inside of a shell script will disappear when the shell exits. To prevent this,
start the script with a . (dot) followed by a space, which tells the shell to read the script in the current shell.
For example, if we had a function inside a script called myscript, we would start it like this:
. ./myscript
One construct that I find very useful is select. With select, you can have a quick menuing system. It
takes the form
select name in word1 word2 ...
do
list
done

where each word is presented in a list and preceded by a number. Inputting that number sets the
value of name to the word following that number. Confused? Let's look at an example. Assume we
have a simple script that looks like this:
select var in date "ls -l" w exit
do
$var
done

When we run this script, we get


1) date
2) ls -l
3) w
4) exit
#?

The "#?" is whatever you have defined as the PS3 (third-level prompt) variable. Here, we have just
left it at the default, but we could have set it to something else. For example:
export PS3="Enter choice: "

This would make the prompt more obvious, but you need to keep in mind that PS3 would be valid
everywhere (assuming you didn't set it in the script).
In our example, when we input 1, we get the date. First, however, the word "date" is assigned to the
variable "var." The single line within the list expands that variable and the line is executed. This
gives us the date. If we were to input 2, the variable "var" would be assigned the word "ls -l" and we
would get a long listing of the current directory (not where the script resides). If we input 4, when
the line was executed, we would exit from the script.
In an example above we discussed briefly the special parameter $#. This is useful in scripts, as it
keeps track of how many positional parameters there were and if there are not enough, we can
report an error. Another parameter is $*, which contains all of the positional parameters. If you want
to check the status of the last command you executed, use the $? variable.
The process ID of the current shell is stored in the $$ parameter. Paired with this is the $!
parameter, which is the process ID of the last command executed in the background.
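A tiny example shows how these fit together:
echo "This script is running as process $$"
sleep 60 &
echo "The sleep was started in the background with process ID $!"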
One thing I sort of glossed over up to this point was the tests we made in the if-statements in the
examples above. In one case we had this:
if [ $# -ge 3 ]

As we mentioned, this checks the number of command-line arguments ($#) and tests whether it is
greater than or equal to 3. We could have written it like this:
if test $# -ge 3

with the exact same result. In the case of bash, both [ and test are built into the shell. However,
with other shells, they may be external commands (typically the same program under two names). If you look at
either the test or bash man-page, you will see that there are many more things we can test. In our
examples, we were either testing two strings or testing numerical values. We can also test many
different conditions related to files, not just variables as we did in these examples.
It is common with many of the system scripts (i.e. those under /etc/rc.d) that they will first test if a
particular file exists before proceeding. For example, a script might want to test if a configuration
file exists. If so, it will read that file and use the values found in that file. Otherwise it will use
default values. Sometimes these scripts will check whether a file exists and is executable. In both
cases, a missing file could mean an error occurred or simply that a particular package was not
installed.
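A minimal sketch of this pattern might look like the following (the configuration file name and the variable are made up for the example):
if [ -f /etc/myscript.conf ]
then
# read the configuration file in the current shell
. /etc/myscript.conf
else
# fall back to a reasonable default
LOGDIR=/tmp
fi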

Managing Scripts
A very common use of shell scripts that you write is to automate work. If you need to run the
command by hand each time, it often defeats the intent of the automation. Therefore, it is also very
common that commands are started from cron.
As Murphy's Law would have it, sometimes something will prevent the script from ending.
However, each time cron starts the script, a new process is created, so you end up with dozens, if not hundreds of
processes. Depending on the script, this could have a dramatic effect on the performance of your
system. The solution is to make sure that the process can only start once, or if it is already running,
you want to stop any previous instances.
So, the first question is how to figure out what processes are running, which is something we go
into details about in another section. In short, you can use the ps command to see what processes are
running:
ps aux | grep your_process_name | grep -v grep

Note that when you run this command, it will also appear in the process table. Since your process
name is an argument to the grep command, grep ends up finding itself. The grep -v grep says to skip
entries containing the word "grep", which means you do not find the command you just issued.
Assuming that the script is only started from cron, the only entries found will be those started by
cron. If the return code of the command is 0, you know the process is running (or at least grep found
a match).
In your script, you check the return code and if it is 0, the script exits; otherwise it does the
intended work. Alternatively, you can make the assumption that if it is still running, there is a
problem and you want to kill the process. You could use ps, grep, and awk to get the PID of that
process (or even multiple processes). However, it is a lot easier using the pidof command. You
end up with something like this:
kill `pidof your_process_name`

The problem with that is the danger of killing a process that you hadn't intended. Therefore, you
need to be sure that you kill the correct process. This is done by storing the PID of the process in a file
and then checking for the existence of that file each time your script starts. If the file does not exist,
it is assumed the process is not running, so the very next thing the script does is create the file. This
could be done like this:
echo $$ > PID_file

This is already done by many system processes and typically these files are stored in /var/run and
have the ending .pid. Therefore, the file containing the PID of your HTTP server is /var/run/httpd.pid.
You can then be sure you get the right process with a command like this: kill `cat PID_file`

Note that in your script, you should first check for the existence of the file before you try to kill the
process. If the process does not exist, but the file does, maybe the process died. Depending on how
long ago the process died, it is possible that the PID has been re-used and now belongs to a completely
different process. So as an added safety measure you could verify that the PID belongs to the correct
process.
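Putting the pieces together, the skeleton of such a script might look like this (the file name is made up, and as just mentioned, a real script would also verify that the PID actually belongs to our process before killing it):
PIDFILE=/var/run/myscript.pid
if [ -f $PIDFILE ]
then
# a previous instance may still be running, so stop it
kill `cat $PIDFILE`
fi
echo $$ > $PIDFILE
# ... the real work of the script goes here ...
rm -f $PIDFILE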
To get some ideas on how existing scripts manage processes take a look at the init scripts in
/etc/rc.d.
Details on if-then constructs in scripts can be found here.
Details on using back-quotes can be found here.
Details on file redirection can be found here.

Odds and Ends


This section includes a few tidbits that I wasn't sure where to put.
You can get the shell to help you debug your script. If you place set -x in your script, each command
with its corresponding arguments is printed as it is executed. If you want to just show a section of
your script, include the set -x before that section, then another set +x at the end. The set +x turns off
the output.
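For example, to watch just the troublesome part of a script, you might wrap it like this:
set -x # start printing each command as it is executed
ls -1 $DIR | grep $FILENAME
set +x # turn the debugging output off again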
If you want, you can capture output into another file, without having it go to the screen. This is done
using the fact that output generated as a result of the set -x is going to stderr and not stdout. If you
redirect stdout somewhere, the output from set -x still goes to the screen. On the other hand, if you
redirect stderr, stdout still goes to your screen. To redirect stderr to a file, start the script like this:
myscript 2>/tmp/output

This says to send file descriptor 2 (stderr) to the file /tmp/output.


To create a directory that is several levels deep, you do not have to change directories to the parent
and then run mkdir from there. The mkdir command takes as an argument the path name of the
directory you want to create. It doesn't matter if it is a subdirectory, relative path, or absolute path.
The system will do that for you. Also, if you want to create several levels of directories, you don't
have to make each parent directory before you make the subdirectories. Instead, you can use the -p
option to mkdir, which will automatically create all the necessary directories.
For example, imagine that we want to create the subdirectory ./letters/personal/john, but the
subdirectory letters does not exist yet. This also means that the subdirectory personal doesn't exist,
either. If we run mkdir like this:
mkdir -p ./letters/personal/john

then the system will create ./letters, then ./letters/personal, and then ./letters/personal/john.
Assume that you want to remove a file that has multiple links; for example, assume that ls, lc, lx, lf,
etc., are links to the same file. The system keeps track of how many names reference the file
through the link count (more on this concept later). Such links are called hard links. If you remove
one of them, the file still exists as there are other names that reference it. Only when we remove the
last link (and with that, the link count goes to zero) will the file be removed.
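You can watch the link count change yourself with something like this (the file names are just examples):
ln letter.john letter.john.copy
ls -l letter.john letter.john.copy
After the ln, both names show a link count of 2 in the second column of the ls -l output; remove either one and the count drops back to 1, but the data stays on disk until the last name is gone.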
There is also the issue of symbolic links. A symbolic link (also called a soft link) is nothing more
than a path name that points to some other file, or even to some directory. It is not until the link is
accessed that the path is translated into the "real" file. This has some interesting effects. For
example, if we create a link like this
ln -s /home/jimmo/letter.john /home/jimmo/text/letter.john

you would see the symbolic link as something like this:


lrw-r--r-- 1 jimmo support 29 Sep 15 10:06 letter.john -> /home/jimmo/letter.john

Then, the file /home/jimmo/text/letter.john is a symbolic link to /home/jimmo/letter.john. Note that


the link count on /home/jimmo/letter.john doesn't change, because the system sees these as two
separate files. It is easier to think of the file /home/jimmo/text/letter.john as a text file that contains
the path to /home/jimmo/letter.john. If we remove /home/jimmo/letter.john, /home/jimmo/text/letter.john
will still exist. However, it will point to something that doesn't exist. Even if there are other hard
links that point to the same file like /home/jimmo/letter.john, that doesn't matter. The symbolic
link, /home/jimmo/text/letter.john, points to the path /home/jimmo/letter.john. Because the path no
longer exists, the file can no longer be accessed via the symbolic link. It is also possible for you to
create a symbolic link to a file that does not exist, as the system does not check until you access the
file.
Another important aspect is that symbolic links can extend across file systems. A regular or hard
link is nothing more than a different name for the same physical file and uses the same inode
number. Therefore it must be on the same filesystem. Symbolic links contain a path, so the
destination can be on another filesystem (and in some cases on another machine). For more on
inodes, see the section on filesystems.
The file command can be used to tell you the type of a file. With DOS and Windows, it's fairly
easy to determine a file's type by looking at its extension. For example, files ending in
.exe are executables (programs), files ending in .txt are text files, and files ending in .doc are
documents (usually from some word processor). However, a program in UNIX can just as easily
have the ending .doc or .exe, or no ending at all.
The file command uses the file /etc/magic to make an assumption about the contents of a file. The
file command reads the header (first part of the file) and uses the information in /etc/magic to make
its guess. Executables of a specific type (a.out, ELF) all have the same basic format, so file can
easily recognize them. However, there are certain similarities between C source code, shell scripts,
and even text files that could confuse file.
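For example, you might run it on a couple of files whose types you already know:
file /bin/ls /etc/passwd
On most systems, the first would be reported as some kind of executable (for example, an ELF binary) and the second as ASCII text, although the exact wording depends on the entries in your /etc/magic.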
For a list of some of the more commonly used commands, take a look here.
Chapter V
Editing Files
Because my intent here is not to make you shell or awk programming experts, there are obviously
things that we didn't have a chance to cover. However, I hope I have given you the basics to create
your own tools and configure at least your shell environment the way you need or want it.
Like any tool or system, the way to get better is to practice. Therefore, my advice is that you play
with the shell and programs on the system to get a better feeling for how they behave. By creating
your own scripts, you will become more familiar with both vi and shell script syntax, which will
help you to create your own tools and understand the behavior of the system scripts. As you learn
more, you can add awk and sed components to your system to make some very powerful commands
and utilities.

Vi
No one can force you to learn vi, just as no one can force you to do backups. However, in my
opinion, doing both will make you a better administrator. There will come a time when having done
regular backups will save your career. There may also come a time when knowing vi will save you
the embarrassment of having to tell your client or boss that you can't accomplish a task because you
need to edit a file and the only editor is the system default: vi.
On the other hand it is my favorite editor. In fact, most of my writing is done using vi. That includes
both books and articles. I find it a lot easier than using a so-called wysiwyg editor as I generally
don't care what the text is going to look like as my editors are going to change the appearance
anyway. Therefore, whether I am writing on Linux, Solaris, or even Windows, I have the same,
familiar editor. Then there is the fact that the files edited with vi are portable to any word processor,
regardless of the operating system. Plus it makes making global changes a whole lot easier.

vi Basics
The uses and benefits of any editor like vi are almost religious. Often, the reasons people choose
one editor over another are purely a matter of personal taste. Each offers its own advantages and
functionality. Some versions of UNIX provide other editors, such as emacs. However, the nice thing
about vi is that every dialect of UNIX has it. You can sit down at any UNIX system and edit a file.
For this reason more than any other, I think it is worth learning.
One problem vi has is that it can be very intimidating. I know, I didn't like it at first. I frequently get
into discussions with people who have spent less than 10 minutes using it and then have ranted
about how terrible it was. Often, I then saw them spending hours trying to find a free or relatively
cheap add-on so they didn't have to learn vi. The problem with that approach is that if they had spent
as much time learning vi as they did trying to find an alternative, they actually could have become
quite proficient with vi.
There is more to vi than just its availability on different UNIX systems. To me, vi is magic. Once
you get over the initial intimidation, you will see that there is a logical order to the way the
commands are laid out and fit together. Things fit together in a pattern that is easy to remember. So,
as we get into it, let me tempt you a little.
Among the "magical" things vi can do:
• Automatically correct words that you misspell often
• Accept user-created vi commands
• Insert the output of UNIX commands into the file you are editing
• Automatically indent each line
• Shift sets of lines left or right
• Check for pairs of {}, () and [] (great for programmers)
• Automatically wrap around at the end of a line
• Cut and paste between documents

I am not going to mention every single vi command. Instead, I am going to show you a few and
how they fit together. At the end of this section, there is a table containing the various commands
you can use inside vi. You can then apply the relationships to the commands I don't mention.
To see what is happening when you enter commands, first find a file that you can poke around in.
Make a copy of the termcap file (/etc/termcap) in a temporary directory and then edit it (cd /tmp;
cp /etc/termcap . ; vi termcap). The termcap file contains a list of the capabilities of various
terminals. It is usually quite large and gives you a lot of things to play with in vi.
Before we can jump into the more advanced features of vi, I need to cover some of the basics. Not
command basics, but rather some behavioral basics. In vi, there are two modes: command mode and
input mode. While you are in command mode, every keystroke is considered part of a command.
This is where you normally start when you first invoke vi. The reverse is also true. While in input
mode, everything is considered input.
Well, that isn't entirely true and we'll talk about that in a minute. However, just remember that there
are these two modes. If you are in command mode, you go into input mode using a command to get
you there, such as append or insert (I'll talk about these in a moment). If you want to go from input
mode to command mode, press Esc.
When vi starts, it goes into full-screen mode (assuming your terminal is set up correctly) and it
essentially clears the screen (see the following image). If we start the command as
vi search
at the bottom of the screen, you see
"search" [New File]
Your cursor is at the top left-hand corner of the screen, and there is a column of tildes (~) down the
left side to indicate that these lines are nonexistent.
In the image below we see a vi session started from a terminal window running under X-Windows.
This is essentially the same thing you will see when starting vi from any command line.
As with most text editors or word processors, vi gives you the ability to save the file you are editing
without stopping the program. To issue the necessary command, we first input a colon (:) while in
command mode. We then press w (for write) and then press the Enter key. This might look like the
following figure:

After you press the enter key, you end up with something like the following image:

If you are editing a file that already exists and try to save it like this, you may get an error
message that says the file is read-only. You will also get this message when trying to save a file
from "view", which is the "read-only" version of vi. To force the file to be written, you follow the w
with an exclamation mark. (:w!)
The ex-mode (or command mode) also allows you to do many other things with the file itself.
Among them are
• :q to quit the file (:q! if the file has been changed and you don't want to save the changes)
• :wq to write the file and quit
• :e to edit a new file (or even the same file)
• :r to read in a new file starting at the current location
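To tie these together, here is a minimal example session (the file names here are just placeholders, not files the book assumes you have):
• :w saves the current file without leaving vi
• :e /etc/hosts stops editing the current file and starts editing /etc/hosts instead
• :r notes.txt reads the contents of notes.txt in below the current line
• :wq writes the file and quits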
Changing Text in vi
In addition to "standard" editing, there are a several special editing commands. Pressing dd will
delete the entire line you are on; 5dd would then delete five complete lines. To open up a line for
editing, we press o to open the line after the line you are currently on and O for the line before. Use
x to delete the character (including numbers) that the is on.
When we want to move something we just deleted, we put the cursor on the spot where we want it. Then
press either p to put that text after the current cursor position or P to put it before the current
position. A nice trick that I always use to swap characters is xp. The x deletes the character you are
on and the p immediately inserts it. The result is that you swap characters. So if I had typed the
word "into" as "inot," I would place the cursor on the "o" and type xp, which would swap the "o"
and the "t."
To repeat the edit we just did, be it deleting 18 lines or inputting "I love you," we could do so by
pressing "." (dot) from command mode. In fact, any edit command can be repeated with the dot.
To make a change, press c followed by a movement command or number and movement command.
For example, to change everything from where you are to the next word, press cw. To change
everything from where you are to the end of the line, press C or c$. If you do that, then a dollar sign
will appear, indicating how much you intend to change.
If we go back into command mode (press Esc) before we reach the dollar sign, then everything from
the current position to the dollar sign is removed. When you think about this, it is actually logical.
By pressing C, you tell vi that you want to change everything to the end of the line. When you press
Enter, you are basically saying that you are done inputting text; however, the changes should
continue to the end of the line, thereby deleting the rest of the line.
To undo the last edit, what would we press? Well, what's the first letter of the word "undo"? Keep in
mind that pressing u will only undo the last change. For example, let's assume we enter the
following:
o to open a new line and go into input mode
I love
Esc to go back to command mode
a to append from current location
you
Esc to return to command mode
The result of what we typed was to open a new line with the text "I love you." We see it as one
change, but from the perspective of vi, two changes were made. First we entered "I love," then we
entered "you." If we were to press u, only "you" would be removed. However, if u undoes that last
change, what command do you think returns the line to its original state? What else: U. As you are
making changes, vi keeps track of the original state of a line. When you press U, the line is returned
to that original state.
If you want to replace all of the text on the current line, you could simply delete the line and insert a
new one. However, you could also replace the existing line by using the R (for replace) command.
This puts vi into replace mode and each character you type replaces the existing characters as you
write.

Moving Around in vi
Most editing and movement commands are single letters and are almost always the first letter of
what they do. For example, to insert text at your current cursor position, press i. To append text,
press a. To move forward to the beginning of the next word, press w. To move back to the beginning
of the previous word, press b.
The capital letter of each command has a similar behavior. Use I to insert at the beginning of a line.
Use A to start the append from the end of the line. To move "real" words, use W to move forward
and B to move back.
Real words are those terminated by whitespaces (space, tab, newline). Assume we wanted to move
across the phrase 'static-free bag'. If we start on the 's', pressing 'w' will move us to the '-'. Pressing
'w' again moves us to the 'f' and then to the 'b'. If we are on the 's' and press 'W', we jump
immediately to the 'b'. That is, to the next "real" word.
Moving in vi is also accomplished in other ways. Depending on your terminal type, you can use the
traditional method of arrow keys to move within the file. If vi doesn't like your terminal type, you
can use the keys h-j-k-l. If we want to move to the left we press 'h'. If you think about it, this makes
sense since 'h' is on the left end of these four characters. To move right, press l. Again, this makes
sense as the 'l' is on the right end.
Movement up and down is not as intuitive. One of the two remaining characters (j and k) will move
us up and the other will move us down. But which one moves in which direction? Unfortunately, I
don't have a very sophisticated way of remembering. If you look at the two letters physically, maybe
it helps. If you imagine a line running through the middle of these characters, then you see that j
hangs down below that line. Therefore, use j to move down. On the other hand, k sticks up above
the middle, so we use k to move up. However, in most cases, the arrow keys will work, so you won't
need to remember. But it is nice to know them, as you can then leave your fingers on the keyboard.
As I mentioned, some keyboard types will allow you to use the arrow keys. However, you might be
surprised by their behavior in input mode. This is especially true if you are used to a word processor
where the arrow and other movement keys are the same all the time. The problem lies in the fact
that most keyboards actually send more than one character to indicate something like a left-arrow or
page-up key. The first of these is normally an escape (Esc). When you press one of these characters
in input mode, the Esc is interpreted as your wish to leave input mode.
If we want to move to the first character on a line, we press '0' (zero) or '^'. To move to the last
character, press $. Now, these are not intuitive. However, if you think back to our discussion on
regular expressions, you'll remember that ^ (caret) represents the beginning of a line and $ (dollar
sign) represents the end of a line. Although these two characters do not necessarily have an
intuitive logic, they do fit in with other commands and programs that you find on a Linux system.
We can also take advantage of the fact that vi can count as well as combine movement with this
ability to count. By pressing a number before the movement command, vi will behave as if we had
pressed the movement key that many times. For example, 4w will move us forward four words or 6j
will move us six lines down.
If we want to move to a particular line, we input the number and G. So, to move to line 42, we
would press 42G, kind of like 42-Go! If instead of G we press Enter, we would move ahead that
many lines. For example, if we were on line 85, pressing 42 and Enter would put us on line 127.
(No, you don't have to count lines; vi can display them for you, as we'll see in a minute.)
As you might have guessed, we can also use these commands in conjunction with the movement
keys (all except Ctrl-u and Ctrl-d). So, to delete everything from your current location to line 83, we
would input d83G. (Note that delete begins with d.) Or, to change everything from the current
cursor position down 12 lines, we would input c12+ or press c12 Enter.

Searching in vi
If you are trying to find a particular text, you can get vi to do it for you. You tell vi that you want to
enter a search pattern by pressing / (slash). This will bring you down to the bottom line of the screen
where you will see your slash. You then can type in what you want to look for. When you press
Enter, vi will start searching from your current location down toward the bottom of the file. If you
press ? instead of /, then vi will search from your current position toward the top of the file.
If the search is successful, that is, the string is found, you are brought to that point in the text. If you
decide that you want to search again, you have three choices. You can press ? or / and input the
search string again; press n, which is the first letter of the word "next"; or simply press ? or / with
no text following it for vi to continue the search in the applicable direction. If you wanted to find
the next string that matches but in the opposite direction, what do you think the command would
be? (Hint: the capital form of the "next" command.)
Once you have found what you are looking for, you can edit the text all you want and then continue
searching. This is because the search string you entered is kept in a buffer. So, when you press /, ?,
n, or N, the system remembers what you were looking for.
You can also include movement commands in these searches. First, you enclose the search pattern
with the character used to search (/ or ?), then add the movement command. For example, if you
wanted to search backward for the phrase "hard disk" and then move up a line, you would enter
?hard disk?-. If you wanted to search forward for the phrase "operating system" and then move down
three lines, you would enter /operating system/+3.
All this time, we have been referring to the text patterns as search strings. As you just saw, you can
actually enter phrases. In fact, you can use any regular expression you want when searching for patterns. For example,
if you wanted to search for the pattern "Linux," but only when it appears at the beginning of a line,
you would enter /^Linux. If you wanted to search for it at the end of the line, you would enter
/Linux$.
You can also do more complicated searches such as /^new [Bb][Oo][Aa][Tt], which will search for
the word "new" at the beginning of a line, followed by the word "boat" with each letter in either
case.
No good text editor would be complete without the ability to not only search for text but to replace
it as well. One way of doing this is to search for a pattern and then edit the text. Obviously, this
starts to get annoying after the second or third instance of the pattern you want to replace. Instead,
you could combine several of the tools you have learned so far.
For example, let's say that everywhere in the text you wanted to replace "Unix" with "UNIX." First,
do a search on Unix with /Unix, tell vi that you want to change that word with cw, then input UNIX.
Now, search for the pattern again with /, and simply press . (dot). Remember that the dot command
repeats your last command. Now do the search and press the dot command again.
Actually, this technique is good if you have a pattern that you want to replace, but not every time it
appears. Instead, you want to replace the pattern selectively. You can just press n (or whatever) to
continue the search without carrying out the replacement.
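As a minimal sketch of that keystroke sequence, using the Unix/UNIX example from above:
/Unix and Enter finds the first occurrence
cwUNIX and Esc changes that word
n finds the next occurrence
. repeats the change (or press n again to skip this one and keep searching)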
What if you know that you want to replace every instance of a pattern with something else? Are you
destined to search and replace all 50 occurrences? Of course not. Silly you. There is another way.
Here I introduce what is referred to as escape or ex-mode, because the commands you enter are the
same as in the ex editor. To get to ex-mode, press : (colon). As with searches, you are brought down
to the bottom of the screen. This time you see the : (colon). The syntax is
: <scope> <command>
An example of this would be:
:45,100s/Unix/UNIX/
This tells vi the scope is lines 45 through 100. The command is s/Unix/UNIX/, which says you want
to substitute (s) the first pattern (Unix) with the second pattern (UNIX). Normally in English, we
would say "substitute UNIX for Unix." However, the order here is in keeping with the UNIX
pattern of source first, then destination (or, what it was is first, and what it will become is second,
like mv source destination).
Note that this only replaces the first occurrence on each line. To get all occurrences, we must
include g for global at the end of the command, like this:
:45,100s/Unix/UNIX/g
A problem arises if you want to modify only some of the occurrences. In this instance, you could
add the modifier c for confirm. The command would then look like this:
:45,100s/Unix/UNIX/gc
This causes vi to ask for confirmation before it makes the change.
If you wanted to search and replace on every line in the file, you could specify every line, such as
:1,48, assuming there were 48 lines in the file. (By the way, use Ctrl-g to find out what line you are
on and how many lines there are in the file.) Instead of checking how many lines there are each
time, you can simply use the special character $ to indicate the end of the file. (Yes, $ also means
the end of the line, but in this context, it means the end of the file.) So, the scope of the command
would look like :1,$.
Once again, the developers of vi made life easy for you. They realized that making changes
throughout a file is something that is probably done a lot. They included a special character to mean
the entire file: %. Therefore, writing % as the scope is the same as writing 1,$.
Here again, the search patterns can be regular expressions. For example, if we wanted to replace
every occurrence of "boat" (in either case) with the word "ship," the command would look like this:
:%s/[Bb][Oo][Aa][Tt]/ship/g
As with regular expressions in other cases, you can use the asterisk (*) to mean any number of the
preceding characters or a period (.) to mean any single character. So, if you wanted to look for the
word "boat" (again, in either case), but only when it was at the beginning of a line and only if it
were preceded by at least one dash, the command would look like this:
:%s/^--*[Bb][Oo][Aa][Tt]/ship/g
The reason you have two dashes is that the search criteria specified at least one dash. Because the
asterisk can be any number, including zero, you must consider the case where it would mean zero.
That is, where the word "boat" was at the beginning of a line and there were no dashes at all. If you didn't
care what the character was as long as there was at least one, you could use the fact that in a search
context, a dot means any single character. The command would look like this:
:%s/^..*[Bb][Oo][Aa][Tt]/ship/g

vi Buffers
Remember when we first started talking about searching, I mentioned that the expression you were
looking for was held in a buffer. Also, whatever was matched by /[Bb][Oo][Aa][Tt] can be held in a
buffer. We can then use that buffer as part of the replacement expression. For example, if we wanted
to replace every occurrence of "UNIX" with "Linux," we could do it like this:
:%s/UNIX/Linux/g
The scope of this command is defined by the %, the shortcut way of referring to the entire text. Or,
you could first save "UNIX" into a buffer, then use it in the replacement expression. To place
something in a buffer, we enclose it within a matching pair of escaped parentheses, \( and \), which define the
extent of a buffer. You can even have multiple pairs that define the extent of multiple buffers. These
are referenced by \#, where # is the number of the buffer.
In this example
:%s/\(UNIX\)/Linux \1/g
the text "UNIX," is placed into the first buffer. You then reference this buffer with \1 to say to vi to
plug in the contents of the first buffer. Because the entire search pattern is the same as the pattern
buffer, you could also have written it like this
:%s/\(UNIX\)/Linux &/g
in which the ampersand represents the entire search pattern.
This obviously doesn't save much typing. In fact, in this example, it requires more typing to save
"UNIX" into the buffer and then use it. However, if what you wanted to save was longer, you would
save time. You also save time if you want to use the buffer twice. For example, assume you have a
file with a list of other files, some of them C language source files. All of them end in .c. You now
want to change just the names of the C files so that the ending is "old" instead of .c. To do this,
insert mv at the beginning of each line as well as produce two copies of the file name: one with .c
and one with .old. You could do it like this:
:%s/^\(.*\)\.c/mv \1.c \1.old/g
In English, this line says:
• For every line (%)
• substitute (s)
• for the pattern starting at the beginning of the line (^), consisting of any number of
characters ( \(.*\) ) (placing this pattern into buffer #1) followed by .c
• and use the pattern mv, followed by the contents of buffer #1 (\1), followed by a .c, which is
again followed by the contents of buffer #1, (\1) followed by .old
• and do this for every line (g), (i.e., globally)

Now each line is of the form: mv file.c file.old

Note the slash preceeding the dot in the expression "\.c". The slash "protects" the dot from being
interpreted as the metacharacter for "any character". Instead, you want to search for a literal dot, so
you need to protect it. We can now change the permissions to make this a shell script and execute it.
We would then move all the files as described above.
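If it helps to see the whole round trip, here is a minimal sketch; the list file name rename.list is just a placeholder, and it assumes the C files are in the current directory:
ls *.c > rename.list
vi rename.list
Inside vi we run :%s/^\(.*\)\.c/mv \1.c \1.old/g followed by :wq, and then back at the shell:
chmod +x rename.list
./rename.list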
Using numbers like this is useful if there is more that one search pattern that you want to process.
For example, assume that we have a three-column table for which we want to change the order of
the columns. For simplicity's sake, let's also assume that each column is separated by a space so as
not to make the search pattern too complicated.
Before we start, we need to introduce a new concept to vi, but one that you have seen before: [ ].
Like the shell, the square bracket pair ([ ]) of vi is used to limit sets of characters. Inside of the
brackets, the caret (^) takes on a new meaning. Rather than indicating the beginning of a line, here it
negates the character we are searching for. So we could type
%s/\([^ ]*\) \([^ ]*\) \([^ ]*\)/\3 \1 \2/g
Here we have three regular expressions, all referring to the same thing: \([^ ]*\). As we discussed
above, the slash pair \( and \) delimits each of the buffers, so everything inside is the search pattern.
Here, we are searching for [^ ]*, which is any number of matches to the set enclosed within the
brackets. Because the brackets limit a set, the set is ^, followed by a space. Because the ^ indicates
negation, we are placing any number of characters that is not a space into the buffer. In the
replacement pattern, we are telling vi to print pattern3, a space, pattern1, another space, then
pattern2.
In the first two instances, we followed the pattern with a space. As a result, those spaces were not
saved into any of the buffers. We did this because we may have wanted to define our column
separator differently. Here we just used another space.
I have often had occasion to want to use the pattern buffers more than once. Because they are not
cleared after each use, you can use them as many times as you want. Using the example above, if
we change it to
%s/\([^ ]*\) \([^ ]*\) \([^ ]*\)/\3 \1 \2 \1/g
we would get pattern3, then pattern1, then pattern2, and at the end, pattern1 again.
Believe it or not, there are still more buffers. In fact, there are dozens that we haven't touched on.
The first set is the numbered buffers, which are numbered 1-9. These are used when we delete text
and they behave like a stack. That is, the first time we delete something, say a word, it is placed in
buffer number 1. We next delete a line that is placed in buffer 1 and the word that was in buffer 1 is
placed in buffer 2. Once all the numbered buffers are full, any new deletions push the oldest ones out
the bottom of the stack and are no longer available.
To access these buffers, we first tell vi that we want to use one of the buffers by pressing the double-
quote ("). Next, we specify then the number of the buffer, say 6, then we type either p or P to put it,
as in "6p. When you delete text and then do a put without specifying any buffer, it automatically
comes from buffer 1.
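As a quick sketch of how this stack behaves:
dd deletes a line, which lands in buffer 1
dd deletes a second line; it goes into buffer 1 and the first deletion slides down into buffer 2
"2p puts the older deletion (from buffer 2) back after the current line
p puts the most recent deletion back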
There are some other buffers, in fact, 26 of them, that you can use by name. These are the named
buffers. If you can't figure out what their names are, think about how many of them there are (26).
With these buffers, we can intentionally and specifically place something into a particular buffer.
First, we say which buffer we want by preceding its name with a double-quote ("); for example, "f.
This says that we want to place some text in the named buffer f. Then, we place the data in the
buffer, for example, by deleting an entire line with dd or by deleting two words with d2w. We can
later put the contents of that buffer with "fp. Until we place something new in that buffer, it will
contain what we originally deleted.
If you want to put something into a buffer without having to delete it, you can. You do this by
"yanking it." To yank an entire line, you could done one of several things. First, there is yy. Next, Y.
Then, you could use y, followed by a movement commands, as in y-4, which would yank the next
four lines (including the current one), or y/expression, which would yank everything from your
current position up to and including expression.
To place yanked data into a named buffer (rather than the default buffer, buffer number 1), it is the
same procedure as when you delete. For example, to yank the next 12 lines into named buffer h, we
would do "h12yy. Now those 12 lines are available to us. Keep in mind that we do not have to store
full lines. Inputting "h12yw will put the next 12 words into buffer h.
Some of the more observant readers might have noticed that because there are 26 letters and each
has both an upper- and lowercase form, we could have 52 named buffers. Well, up to now, the uppercase
version of a command has always done something a little different from its lowercase counterpart. If
uppercase letters simply designated 26 more buffers, that pattern would be broken. Have no fear, it isn't.
Instead of being different buffers than their lowercase brethren, the uppercase letters are the same
buffer. The difference is that yanking or deleting something into an uppercase buffer appends the
contents rather than overwriting them.
You can also have vi keep track of up to 26 different places within the file you are editing. These
marks function just like bookmarks in word processors.
To mark a spot, move to that place in the file, type m for mark (what else?), then a single back quote
(`), followed by the letter you want to use for this bookmark. To go back to that spot, press the back
quote (`), followed by the appropriate letter. So, to assign a bookmark q to a particular spot, you
would enter m`q, and to return to that spot later you would enter `q. Keep in mind that reloading the
current file or editing a new one makes you lose the bookmarks.
Note that with newer versions of vi (particularly vim) you don't press the backquote to set the mark,
just m followed by the appropriate letter.
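As a short example of using a mark to get back to where you were: press mq (or m`q in the older vi just described) to mark the current spot, jump elsewhere with a search such as /pattern, and later press `q to return to the marked spot.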

vi Magic
I imagine that long before now, you have wondered how to turn on all that magic I said that vi could
do. Okay, let's do it.
The first thing I want to talk about is abbreviations. You can tell vi that when you type in a specific
set of characters it is supposed to automagically change it to something else. For example, we could
have vi always change USA to United States of America. This is done with the abbr command.
To create a new abbreviation, you must get into ex-mode by pressing the colon (:) in command
mode. Next, type in abbr, followed by what you want to type in, and what vi should change it to.
For example:
:abbr USA United States of America

Note that the abbreviation cannot contain any spaces because vi interprets everything after the
second word as being part of the expansion.
If we later decide we don't want that abbreviation anymore, we enter
:unabbr USA
Because it is likely that we will want to type the word USA itself at some point, it is not a good idea
to use an abbreviation that occurs normally in text, such as USA. It would be better, instead, to use an
abbreviation that doesn't occur normally, like Usa. Keep in mind that abbreviations only apply to
complete words. Therefore, something like the name "Sousa" won't have the "usa" inside it expanded. In
addition, when your abbreviation is followed by a space, Tab, Enter, or Esc, the change is made.
Let's take this one step further. What if we were always spelling "the" as "teh"? We could then create
an abbreviation: :abbr teh the

Every time we misspell "the" as "teh," vi would automatically correct it. If we had a whole list of
words that we regularly misspelled and created similar abbreviations, then every time we entered
one of these misspelled words, it would be replaced with the correctly spelled word. Wouldn't that
be automatic spell correction?
If we ever want to "force" the spelling to be a particular way (that is, turn off the abbreviation
momentarily), we simply follow the abbreviation with a Ctrl-V. This tells vi to ignore the special
meaning of the following character. Because the next character is a white space, which would force
the expansion of the abbreviation (which makes the white space special in this case), "turning off"
the white space keeps the abbreviation from being expanded.
We can also use vi to re-map certain key sequences with the "map" command. For example, I have
created a mapping so that all I need to do to save a file is press Ctrl-W (for write), and if I want to
save the file and quit, I press Ctrl-X.
The most common maps that I have seen have used control sequences, because most of the other
characters are already taken up. Therefore, we need to side-step a moment. First, we need to know
how to access control characters from within vi. This is done in either command mode or input
mode by first pressing Ctrl-V and then pressing the control character we want. So to get Ctrl-W, I
would type Ctrl-V, then Ctrl-W. This would appear on the screen as ^W. This looks like two
characters, but if you inserted it into a text and moved over it with the cursor, you would realize that
vi sees it as only one character. Note that although I pressed the lowercase w, it will appear as
uppercase on the screen.
So, to map Ctrl-W so that every time we press it, we write our current file to disk, the command
would be: map ^W :w^M

This means that when we press Ctrl-W, vi interprets it as though we actually typed :w and pressed
Enter (the Ctrl-M, ^M). The Enter at the end of the command is a good idea because you usually
want the command to be executed right away. Otherwise, you would have to press Enter yourself.
Also keep in mind that this can be used with the function keys. Because I am accustomed to many
Windows and DOS applications in which the F2 key means to save, I map F2 to Ctrl-V, then F2. It
looks like this:
map ^[[N :w^M
(The ^[[N is what the F2 key displays on the screen)
If we want, we can also use shifted function characters. Therefore, we can map Shift-F2 to
something else. Or, for that matter, we can also use shifted and control function keys.
It has been my experience that, for the most part, if you use Shift and Ctrl with non-function keys,
vi only sees Ctrl and not Shift. Also, Alt may not work because on the system console, Alt plus a
function key tells the system to switch to multiscreens.
I try not to use the same key sequences that vi already does. First, it confuses me because I often
forget that I remapped something. Second, the real vi commands are then inaccessible. However, if
you are used to a different command set (that is, from a different editor), you can "program" vi to
behave like that other editor.
Never define a mapping that contains its own name, as this ends up expanding recursively. The
classic example is :map! n banana. Every time you typed the letter n (or any word containing it),
each n would expand to banana, whose own n's would expand in turn, and you'd get: bababababababababababababababababa...

and depending on what version you were running, vi would catch the fact that this is an infinite
translation and stop.

Command Output in vi
It often happens that we want the output of UNIX commands in the file we are editing. The
sledgehammer approach is to run the command and redirect it to a file, then edit that file. If a file
containing the command's output already exists, we can use :r from ex-mode to read it in. But
what if it doesn't yet exist? For example, I often want the date in text files as a log of when I input
things. This is done with a combination of the :r (for read) from ex-mode and a shell-escape.
A shell-escape is when we start from one program and jump out of it (escape) to a shell. Our
original program is still running, but we are now working in a shell that is a child process of that
program.
To do a shell-escape, we need to be in ex-mode. Next, press the exclamation mark (!) followed by
the command. For example, to see what time it is, type :!date. We then get the date at the bottom of
the screen with the message to press any key to continue. Note that this didn't change our original
text; it just showed us the output of the date command.
To read in a command's output, we need to include the :r command, as in :r!date. Now, the output of
the date is read into the file (it is inserted into the file). We could also have the output replace the
current line by pressing ! twice, as in !!date. Note that we are brought down to the last line on the
screen, where there is a single !.
If we want, we can also read in other commands. What is happening is that vi is seeing the output of
the command as a file. Remember that :r <file_name> will read a file into the one we are editing.
Why not read from the output of a command? With pipes and redirection, both stdin and stdout can be
files.
We can also take this one step further. Imagine that we are editing a file containing a long list. We
know that many lines are duplicated and we also want the list sorted. We could do :%!sort; the %, if
we remember from our earlier discussion, is a special symbol meaning all the lines in the file. These
lines are then sent through the command on the other side of the !. Now we can type :%!uniq

to remove all the duplicate lines.


Remember that this is a shell-escape. From the shell, we can combine multiple commands using
pipes. We can do it here as well. So to save time, we could enter :%!sort | uniq

which would sort all the lines and remove all duplicate lines. If we only wanted to sort a set of lines,
we could do it like this: :45,112!sort

which would sort lines 45 through 112. We can take this one step further by writing lines 45-
112 to a new file with :45,112w file_name, or by reading a whole file in below a given line with, say,
:112r file_name. (To actually replace lines 45-112 with the contents of a file, first delete them with
:45,112d and then read the file in with :44r file_name.)
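Any filter that reads stdin and writes stdout can be used this way, not just sort and uniq. A few more hedged examples, assuming the fmt and tr commands are installed (they are on most Linux systems):
:45,112!sort -u sorts lines 45 through 112 and drops the duplicates in a single step
:%!fmt -w 70 re-wraps every line of the file to at most 70 columns
:.!tr 'a-z' 'A-Z' converts just the current line to uppercase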

More vi Magic
If we need to, we can also edit multiple files. This is done like this: vi file1 file2 file3

Once we are editing, we can switch between files with :n for the next file and, at least in vim, :N (or
:prev) for the previous one. Keep in mind that the file names do not wrap around. In other words, if we keep pressing :n
and get to file3, doing it again does not wrap around and bring us to file1. If we know the name of
the file, we can jump directly there with the ex-mode edit command, as in :e file3

The ability to edit multiple files has another advantage. Do you remember those numbered and
named buffers? They are assigned for a single instance of vi, not on a per-file basis. Therefore, you
can delete or yank text from one file, switch to the next and then insert it. This is a crude but
effective cut and paste mechanism between files.
You can specify line numbers to set your position within a file. If you switch to editing another file
(using :n or :e), or rewind to the first file (using :rew!), the contents of the deletion buffers are
preserved so that you can cut and paste between files. The contents of all buffers are lost, however,
when you quit vi.
vi Odds and Ends
You will find as you work with vi that you will often use the same vi commands over and again.
Here too, vi can help. Because the named buffers are simply sequences of characters, you can store
commands in them for later use. For example, when editing a file in vi, I needed to mark new
paragraphs in some way as my word processor normally sees all end-of-line characters as new
paragraphs. Therefore, I created a command that entered a "para-marker" for me.
First, I created the command. To do this, I opened up a new line in my current document and typed
in the following text:
Para
Had I typed this from command mode, it would have inserted the text "Para" at the beginning of the
line. I next loaded it into a named buffer with "pdd, which deletes the line and loads it into buffer p.
To execute it, I entered @p. The @ is what tells vi to execute the contents of the buffer.
Keep in mind that many commands, abbreviations, etc., are transitive. For example, when I want to
add a new paragraph, I don't write Para as the only characters on a line. Instead, I use something
less common: {P}. I am certain that I will never have {P} at the beginning of a line; however, there
are contexts where I might have Para at the beginning of a line. Instead, I have an abbreviation,
Para, that I translated to {P}.
Now, I can type in Para at the beginning of a line in input mode and it will be translated to {P}.
When I execute the command I have in buffer p, it inserts Para, which is then translated to {P}.
So why don't I just have {P} in buffer p? Because the curly braces are one set of movement keys
that I did not mention yet. The { moves you back to the beginning of the paragraph and } moves
you forward. Because paragraphs are defined by vi as being separated by a blank line or delimited
by nroff macros, I never use them (nroff is an old text processing language). Because vi sees the
brackets as something special in command mode, I need to use this transitivity.
If you are a C programmer, you can take advantage of a couple of nifty tricks of vi. The first is the
ability to show you matching pairs of parentheses ( () ), square brackets ([]), and curly braces ({}).
In ex-mode (:), type set showmatch. Afterward, every time you enter the closing parenthesis,
bracket, or brace, you are bounced back to its match. This is useful in checking whether or not you
have the right number of each.
We can also jump back and forth between these pairs by using %. No matter where we are within a
curly braces pair ({}), pressing % once moves us to the first (opening) brace. Press % again and we
are moved to its match (the closing brace). We can also place the cursor on the closing brace and
press % to move us to the opening brace.
If you are a programmer, you may like to indent blocks of code to make things more readable.
Sometimes, changes within the code may make you want to shift blocks to the left or right to keep
the spacing the same. To do this, use << (two less-than signs) to move the text one "shift-width" to
the left, and >> (two greater-than signs) to move the text one "shift-width" to the right. A "shift-
width" is defined in ex-mode with set shiftwidth=n, where n is some number. When you shift a line,
it moves left or right n characters.
To shift multiple lines, input a number before you shift. For example, if you input 23>>, you shift
the next 23 lines one shiftwidth to the right.
There are a lot of settings that can be used with vi to make life easier. These are done in ex-mode,
using the set command. For example, use :set autoindent to have vi automatically indent. To get a
listing of options which have been changed from their default, simply input ":set" and you get
something like in the following image:

Inputting ":set all" will show you the value of all options. Watch out! There are a lot and typically
spread across multiple screens. See the vi(C) man-page for more details of the set command and
options.
Some useful set commands include:
• wrapmargin=n: automatically "word wraps" when you get within n spaces of the end of the line
• showmode: tells you whether you are in insert mode
• number: displays line numbers at the left-hand edge of the screen
• autowrite: saves any changes that have been made to the current file when you issue the :n, :rew, or :! command
• ignorecase: ignores the case of text while searching
• list: prints end-of-line characters such as $ and tab characters such as ^I, which are normally invisible
• tabstop=n: sets the number of spaces between each tab stop on the screen to n
• shiftwidth=n: sets the number of spaces << and >> shifts each line

Configuring vi
When we first started talking about vi, I mentioned that there were a lot of things we could do to
configure it. There are mappings and abbreviations and settings that we can control. The problem is
that once we leave vi, everything we added is lost.
Fortunately, there is hope. Like many programs, vi has its own configuration file: .exrc (note the dot
at the front). Typically, vi just uses its standard settings and does not create this file. However, if this
file resides in our home directory, it will be valid every time we start vi unless we have an .exrc file in our current
directory, which will then take precedence. Having multiple .exrc files is useful when doing
programming as well as when editing text. When writing text, I don't need line numbers or
autoindent like I do when programming.
The content and syntax of the lines are exactly the same as in ex-mode; however, we don't include the leading
colon. Part of the .exrc file in my text editing directory looks like this:
map! ^X :wq
map x :wq
map! ^W :w
map w :w
set showmode
set wm=3
abbr Unix UNIX
abbr btwn between
abbr teh the
abbr refered referred
abbr waht what
abbr Para {P}
abbr inot into

Sed
Suppose you have a file in which you need to make some changes. You could load up vi and make
the changes that way, but what if what you wanted to change was the output of some command
before you sent it to a file? You could first send it to a file and then edit that file, or you could use
sed, which is a stream editor that is specifically designed to edit data streams.
If you read the previous section or are already familiar with either the search and replace
mechanisms in vi or the editor ed, you already have a jump on learning sed. Unlike vi, sed is non-
interactive, but can handle more complicated editing instructions. Because it is non-interactive,
commands can be saved in text files and used over and over. This makes debugging the more
complicated sed constructs much easier. For the most part, sed is line-oriented, which allows it to
process files of almost any size. However, this has the disadvantage that sed cannot do editing that
is dependent on relative addressing.
Unlike the section on vi, I am not going to go into as many details about sed. However, sed is a
useful tool and I use it often. The reason I am not going to cover it in too much detail is three-fold.
First, much of what is true about pattern searches, addressing, etc., in vi is also true in sed.
Therefore, I don't feel the need to repeat. Second, it is not that important that you become a sed
expert to be a good system administrator. In a few cases, scripts on a Linux system will use sed.
However, they are not that difficult to understand, provided you have a basic understanding of sed
syntax. Third, sed is like any programming language, you can get by with simple things. However,
to get really good, you need to practice and we just don't have the space to go beyond the basics.
In this section, I am going to talk about the basics of sed syntax, as well as some of the more
common sed commands and constructs. If you want to learn more, I recommend getting sed &
awk by Dale Dougherty from O'Reilly and Associates. This will also help you in the section on awk,
which is coming up next.
The way sed works is that it reads input one line at a time, and then carries out whatever editing
changes you specify. When it has finished making the changes, it writes them to stdout. Like
commands such as grep and sort, sed acts like a filter. However, with sed you can create very
complicated programs. Because I normally use sed as one end of a pipe, most of the sed commands
that I use have the following structure:
first_cmd | sed <options> <edit_description>

This is useful when the edit descriptions you are using are fairly simple. However, if you want to
perform multiple edits on each line, then this way is not really suitable. Instead, you can put all of
your changes into one file and start up sed like this
first_cmd | sed -f editscript or sed -f editscript <inputfile

As I mentioned before, the addressing and search/replace mechanisms within sed are basically the
same as within vi. It has the structure : [address1[,address2]] edit_description [arguments]
As with vi, addresses do not necessarily need to be line numbers, but can be regular expressions that
sed needs to search for. If you omit the address, sed will make the changes globally, as applicable.
The edit_description tells sed what changes to make. Several arguments can be used, and we'll get
to them as we move along. As sed reads the file, it copies each line into its pattern space. This
pattern space is a special buffer that sed uses to hold a line of text as it processes it. As soon as it has
finished reading the line, sed begins to apply the changes to the pattern space based on the edit
description.
Keep in mind that even though sed will read a line into the pattern space, it will only make changes
to lines that match the addresses specified, and it does not print any warnings when a line does not match.
In general, sed either silently ignores errors or terminates abruptly with an error message as a result
of a syntax error, not because there were no matches. If there are no lines that contain the pattern, no
lines match, and the edit commands are simply not carried out.
Because you can have multiple changes on any given line, sed will carry them each out in turn.
When there are no more changes to be made, sed sends the result to its output. The next line is read
in and the whole process starts over. As it reads in each line, sed will increment an internal line
counter, which keeps track of the total number of lines read, not lines per file. This is an important
distinction if you have multiple files that are being read. For example, if you had two 50-line files,
from sed's perspective, line 60 would be the tenth line in the second file.
Each sed command can have 0, 1, or 2 addresses. A command with no addresses specified is applied
to every line in the input. A command with one address is applied to all lines that match that
address. For example: /mike/s/fred/john/

substitutes "john" for the first instance of "fred", but only on those lines containing "mike". A command
with two addresses is applied to the first line that matches the first address, then to all subsequent
lines until a match for the second address is processed. An attempt is made to match the first address
on subsequent lines, and the process is repeated. Two addresses are separated by a comma.
For example
50,100s/fred/john/
substitutes "john" for the first instance of "fred" on each line from line 50 to line 100, inclusive. (Note that there
should be no space between the second address and the s command.) If an address is followed by an
exclamation mark (!), the command is applied only to lines that do not match the address. For
example
50,100!s/fred/john/
substitutes "john" for the first instance of "fred" everywhere except lines 50 to 100, inclusive.
Also, sed can be told to do input and output based on what it finds. The action it should perform is
identified by an argument at the end of the sed command. For example, if we wanted to print out
lines 5-10 of a specific file, the sed command would be cat file | sed -n 5,10p

The -n is necessary so that every line isn't output in addition to the lines that match.
Remember the script we created in the first section of this chapter, where we wanted just lines 5-10
of every file. Now that we know how to use sed, we can change the script to be a lot more efficient.
It would now look like this:
find ./letters/taxes -print | while read FILE
do
echo $FILE
cat $FILE | sed -n 5,10p
done

Rather than sending the file through head and then the output through tail, we send the whole file
through sed. It can keep track of which line is line 1, and then print the necessary lines.
In addition, sed allows you to write lines that match. For example, if we wanted all the comments in
a shell script to be output to a file, we could use sed like this:
cat filename | sed -n '/^#/w comment_file'
Note that there must be exactly one space between the w and the name of the file. If we wanted to
read in a file, we could do that as well. Instead of a w to write, we could use an r to read. The
contents of the file would be appended after the lines specified in the address. Also keep in mind
that writing to or reading from a file are independent of what happens next. For example, if we
write every line in a file containing the name "John," but in a subsequent sed command change
"John" to "Chris," the file would contain references to "John," as no changes are made. This is
logical because sed works on each line and the lines are already in the file before the changes are
made.
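As a hedged illustration of both commands (the file names and the INSERT HERE marker are made up for the example):
sed -n '/^#/w comments.txt' myscript.sh
copies every comment line of myscript.sh into comments.txt, while
sed '/^INSERT HERE$/r boilerplate.txt' draft.txt
appends the contents of boilerplate.txt after each line of draft.txt that consists of the text INSERT HERE. Note the quotes, which keep the shell from treating the file name after w or r as a separate argument.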
Keep in mind that every time a line is read in, the contents of the pattern space are overwritten. To
save certain data across multiple commands, sed provides what is called the "hold space." Changes
are not made to the hold space directly; rather, the contents of either space can be copied into the other
for processing. The contents can even be exchanged, if needed. The table below contains a list of the
more common sed commands, including the commands used to manipulate the hold and pattern
spaces.
Table: sed Commands

a    append text after the pattern space
b    branch to a label
c    change (replace) the selected lines with new text
d    delete the pattern space
D    delete all the characters from the start of the pattern space up to and including the first newline
g    overwrite the pattern space with the hold space
G    append the hold space to the pattern space, separated by a newline
h    overwrite the hold space with the pattern space
H    append the pattern space to the hold space, separated by a newline
i    insert text before the pattern space
l    list the contents of the pattern space
n    read the next input line into the pattern space
N    append the next input line to the pattern space, separated by a newline
p    print the pattern space
P    print from the start of the pattern space up to and including the first newline
r    read in a file
s    substitute patterns
t    branch only if a substitution has been made to the current pattern space
w    write to a file
x    exchange the contents of the pattern space and the hold space
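To see the hold space in action, here is one classic example (the file name is a placeholder):
sed -n '1!G;h;$p' somefile
prints somefile with its lines in reverse order, much like the tac command. For every line except the first, G appends the hold space (the previously seen lines, already in reverse order) to the pattern space; h then copies the result back into the hold space; and $p prints the accumulated text once the last line has been read.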

Awk
Another language that Linux provides and is standard on many (most?) UNIX systems is awk. The
abbreviation awk is an acronym composed of the first letter of the last names of its developers:
Alfred Aho, Peter Weinberger, and Brian Kernighan. Like sed, awk is an interpreted pattern-
matching language. In addition, awk, like sed, can also read stdin. It can also be passed the name of
a file containing its instructions.
The most useful aspect of awk (at least useful for me and the many Linux scripts that use it) is its
idea of a field. Like sed, awk will read whole lines, but unlike sed, awk can immediately break them into
segments (fields) based on some criteria. Each field is separated by a field separator. By default,
this separator is a space. By using the -F option on the command line or the FS variable within an
awk program, you can specify a new field separator. For example, if you specified a colon (:) as a field separator,
you could read in the lines from the /etc/passwd file and immediately break them into fields.
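For example, a quick one-liner along these lines prints the login name and login shell (fields 1 and 7) from each entry:
awk -F: '{ print $1, $7 }' /etc/passwd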
A programming language in its own right, awk has become a staple of UNIX systems. The basic
purposes of the language are manipulating and processing text files. However, awk is also a useful
tool when combined with output from other commands, and allows you to format that output in
ways that might be easier to process further. One major advantage of awk is that it can accomplish
in a few lines what would normally require dozens of lines in sh or csh shell script, or may even
require writing something in a lower-level language, like C.
The basic layout of an awk command is : pattern { action }

where the action to be performed is included within the curly braces ({}). Like sed, awk reads its
input a line at a time, but awk sees each line as a record broken up into fields. Fields are separated
by an input Field Separator (FS), which by default is a Tab or space. The FS can be changed to
something else, for example, a semi-colon (;), with FS=";". This is useful when you want to process
text that contains blanks; for example, data of the following form:
Blinn, David;42 Clarke Street;Sunnyvale;California;95123;33
Dickson, Tillman;8250 Darryl Lane;San Jose;California;95032;34
Giberson, Suzanne;102 Truck Stop Road;Ben Lomond;California;96221;26
Holder, Wyliam; 1932 Nuldev Street;Mount Hermon;California;95431;42
Nathanson, Robert;12 Peabody Lane;Beaverton;Oregon;97532;33
Richards, John;1232 Bromide Drive;Boston;Massachusetts;02134;36
Shaffer, Shannon;98 Whatever Way;Watsonville;California;95332;24
Here we have name, address, city, state, zip code, and age. Without using ; as a field separator, "Blinn," and
"David;42" would be two separate fields. Here, we want to treat each name, address, city, etc., as a single
unit, rather than as multiple fields.
The basic format of an awk program or awk script, as it is sometimes called, is a pattern followed
by a particular action. Like sed, each line of the input is checked by awk to see if it matches that
particular pattern. Both sed and awk do well when comparing string values. However, whereas
checking numeric values is difficult with sed, this functionality is an integral part of awk.
If we wanted, we could use the data previously listed and output only the names and cities of those
people under 30. First, we need an awk script, called awk.scr, that looks like this:
BEGIN { FS = ";" }
$6 < 30 { print $1, $3 }
Next, assume that we have a data file containing the seven lines of data above, called awk.data. We
could process the data file in one of two ways. First
awk -f awk.scr awk.data
The -f option tells awk that it should read its instructions from the file that follows. In this case,
awk.scr. At the end, we have the file from which awk needs to read its data.
Alternatively, we could start it like this:
cat awk.data | awk -f awk.scr

We can even make string comparisons, as in


$4 == "California" { print $1, $3 }
Although it may make little sense, we could make string comparisons on what would normally be
numeric values, as in
$6 == "33" { print $1, $3 }
This prints out fields 1 and 3 from only those lines in which the sixth field equals the string 33.
Not to be outdone by sed, awk will also allow you to use regular expressions in your search criteria.
A very simple example is one where we want to print every line containing the characters "on."
(Note: The characters must be adjacent and in the appropriate case.) This line would look like this:
/on/ {print $0}
However, the regular expressions that awk uses can be as complicated as those used in sed. One
example would be
/[^s]on[^;]/ {print $0}
This says to print every line containing the pattern on, but only if it is not preceded by an "s" nor
followed by a semi-colon (;). The semi-colon test eliminates the two town names ending in "on"
(Boston and Beaverton) and the "s" test eliminates all the names ending in "son." When we run
awk with this line, our output is
Giberson, Suzanne;102 Truck Stop Road;Ben Lomond;California;96221;26
But doesn't the name "Giberson" end in "son"? Shouldn't it be ignored along with the others? Well,
yes. However, that's not the case. The reason this line was printed out was because of the "on" in
Ben Lomond, the city in which Giberson resides.
We can also use addresses as part of the search criteria. For example, assume that we need to print
out only those lines in which the first field (i.e., the person's last name) is in the first half of
the alphabet. Because this list is sorted, we could look for all the lines between those starting with
"A" and those starting with "M." Therefore, we could use a line like this:
/^A/,/^M/ {print $0}
When we run it, we get nothing at all.
What happened? There are certainly several names in the first half of the alphabet. Why didn't this
print anything? Well, it printed exactly what we told it to print. Like the addresses in both vi and
sed, awk searches for a line that matches the criteria we specified. So, what we really said was
"Find the first line that starts with an A and then print all the lines up to and including the last one
starting with an M." Because there was no line starting with an "A," the start address didn't exist.
Instead, the code to get what we really want would look like this:
/^[A-M]/ {print $0}
This says to print all the lines whose first character is in the range A-M. Because this checks every
line and isn't looking for starting and ending addresses, we could have even used an unsorted file
and would have gotten all the lines we wanted. The output then looks like this:
Blinn, David;42 Clarke Street;Sunnyvale;California;95123;33
Dickson, Tillman;8250 Darryl Lane;San Jose;California;95032;34
Giberson, Suzanne;102 Truck Stop Road;Ben Lomond;California;96221;26
Holder, Wyliam; 1932 Nuldev Street;Mount Hermon;California;95431;42
If we wanted to use a starting and ending address, we would have to specify the starting letter of the
name that actually existed in our file. For example:
/^B/,/^H/ {print $0}
Because printing is a very useful aspect of awk, it's nice to know that there are actually two ways of
printing with awk. The first we just mentioned. However, if you use printf instead of print, you can
specify the format of the output in greater detail. If you are familiar with the C programming
language, you already have a head start, as the format of printf is essentially the same as in C.
However, there are a couple of differences that you will see immediately if you are a C programmer.
For example, if we wanted to print both the name and age with this line
$6 > 30 {printf "%20s %5d\n", $1, $6}
the output would look like this:
Blinn, David 33
Dickson, Tillman 34
Holder, Wyliam 42
Nathanson, Robert 33
Richards, John 36
The field used to print each name is 20 characters wide, followed by a field five characters wide for the age.
Because awk reads each line as a single record and blocks of text in each record as fields, it needs to keep track of how many records there are and how many fields. These are kept in the built-in variables NR (number of records) and NF (number of fields).
Another way of using awk is at the end of a pipe. For example, you may have multiple-line output
from one command or another but only want one or two fields from that line. To be more specific,
you may only want the permissions and file names from an ls -l output. You would then pipe it
through awk, like this
ls -l | awk '{ print $1" "$9 }'
and the output might look something like this:
-rw-r--r-- mike.letter
-rw-r--r-- pat.note
-rw-r--r-- steve.note
-rw-r--r-- zoli.letter
This brings up the concept of variables. Like other languages, awk enables you to define variables.
A couple are already predefined and come in handy. For example, what if we didn't know off the
top of our heads that there were nine fields in the ls -l output? Because we know that we wanted
the first and the last field, we can use the variable that specifies the number of fields. The line would
then look like this:
ls -l | awk '{ print $1" "$NF }'
In this example, the space enclosed in quotes is necessary; otherwise, awk would print $1 and $NF right next to each other.
Another variable that awk uses to keep track of the number of records read so far is NR. This can be
useful, for example, if you only want to see a particular part of the text. Remember our example at
the beginning of this section where we wanted to see lines 5-10 of a file (to look for an address in
the header)? In the last section, I showed you how to do it with sed, and now I'll show you with
awk.
We can use the fact that the NR variable keeps track of the number of records, and because each line
is a record, the NR variable also keeps track of the number of lines. So, we'll tell awk that we want
to print out each line between 5 and 10, like this:
cat datafile | awk 'NR >= 5 && NR <= 10'
This brings up four new issues. The first is the NR variable itself. The second is the use of the
double ampersand (&&). As in C, this means a logical AND. Both the right and the left sides of the
expression must be true for the entire expression to be true. In this example, if we read a line and
the value of NR is greater than or equal to 5 (i.e., we have read in at least five lines) and the number
of lines read is not more than 10, the expression meets the logical AND criteria. The third issue is
that there is no print statement. The default action of awk, when it doesn't have any additional
instructions, is to print out each line that matches the pattern. (You can find a list of other built in
variables in the table below)
The last issue is the use of the variable NR. Note that here, there is no dollar sign ($) in front of the
variable because we are looking for the value of NR, not what it points to. We do not need to prefix
it with $ unless it is a field variable. Confused? Let's look at another example.
Let's say we wanted to print out only the lines where there were more than nine fields. We could do it like this:
cat datafile | awk 'NF > 9'
Compare this to
cat datafile | awk '{ print $NF }'
which prints out the last field in every line.
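To tie these together, here is a small, hedged illustration of both variables at once (the command is just an example; any multi-line input would do):
ls -l | awk '{ print NR": "$0" ("NF" fields)" }'
This numbers each line of the ls -l output and appends how many fields awk found in it.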
Up to now, we've been talking about one line awk commands. These have all performed a single
action on each line. However, awk has the ability to do multiple tasks on each line as well as a task
before it begins reading and after it has finished reading.
We use the BEGIN and END pair as markers. These are treated like any other pattern. Therefore,
anything appearing after the BEGIN pattern is done before the first line is read. Anything after the
END pattern is done after the last line is read. Let's look at this script:
BEGIN { FS=";"}
{printf"%s\n", $1}
{printf"%s\n", $2}
{printf"%s, %s\n",$3,$4}
{printf"%s\n", $5}
END {print "Total Names:" NR}
Following the BEGIN pattern is a definition of the FS variable, the input field separator. This is therefore done before the first line is read. Each line is then processed four times, where we print a different set of fields each time. When we
finish, our output looks like this:
Blinn, David
42 Clarke Street
Sunnyvale, California
95123
Dickson, Tillman
8250 Darryl Lane
San Jose, California
95032
Giberson, Suzanne
102 Truck Stop Road
Ben Lomond, California
96221
Holder, Wyliam
1932 Nuldev Street
Mount Hermon, California
95431
Nathanson, Robert
12 Peabody Lane
Beaverton, Oregon
97532
Richards, John
1232 Bromide Drive
Boston, Massachusetts
02134
Shaffer, Shannon
98 Whatever Way
Watsonville, California
95332
Total Names:7
Aside from having a pre-defined set of variables to use, awk allows us to define variables ourselves.
If in the last awk script we had wanted to print out, let's say, the average age, we could add a line in
the middle of the script that looked like this:
{total = total + $6 }
Because $6 denotes the age of each person, every time we run through the loop, it is added to the
variable total. Unlike other languages, such as C, we don't have to initialize the variables; awk will
do that for us. Strings are initialized to the null string and numeric variables are initialized to 0.
After the END pattern, we can include another block to print out our average, like this:
END {print "Average age: " total/NR}
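If you prefer to do the whole thing as a one-liner, here is a minimal sketch; it assumes the same semicolon-separated datafile with the age in the sixth field, as in the examples above:
awk -F";" '{ total += $6 } END { print "Average age: " total/NR }' datafile
The -F option sets the field separator on the command line, which saves us the BEGIN block.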
Table awk Comparison Operators

Operator   Meaning
<          less than
<=         less than or equal to
==         equal to
!=         not equal to
>=         greater than or equal to
>          greater than
Table Default Values of awk Built-in Variables

Variable   Meaning                                  Default
ARGC       number of command-line arguments         -
ARGV       array of command-line arguments          -
FILENAME   name of current input file               -
FNR        record number in current file            -
FS         input field separator                    space or tab
NF         number of fields in the current record   -
NR         number of records read                   -
OFMT       numeric output format                    %.6g
OFS        output field separator                   space
ORS        output record separator                  new line
RS         input record separator                   new line
Is that all there is to it? No. In fact, we haven't even touched the surface. awk is a very complex
programming language and there are dozens more issues that we could have addressed. Built into
the language are mathematical functions, if and while loops, the ability to create your own
functions, strings and array manipulation, and much more.
Unfortunately, this is not a book on UNIX programming languages. Some readers may be
disappointed that I do not have the space to cover awk in more detail. I am also disappointed.
However, I have given you a basic introduction to the constructs of the language to enable you to
better understand the more than 100 scripts on your system that use awk in some way.
Perl
If you plan to do anything serious on the Web, I suggest that you learn perl. In fact, if you plan to do
anything serious on your machine, then learning perl is also a good idea. Although it is not included with a lot of commercial UNIX versions, perl is almost universally available with Linux.
Now, I am not saying that you shouldn't learn sed, awk, and shell programming. Rather, I am saying
that you should learn all four. Both sed and awk have been around for quite a while, so they are
deeply ingrained in the thinking of most system administrators. Although you could easily find a
shell script on the system that didn't have elements of sed or awk in it, you would be very hard
pressed to find a script that had no shell programming in it. On the other hand, most of the scripts
that process information from other programs use either sed or awk. Therefore, it is likely that you
will eventually come across one or the other.
perl is another matter altogether. None of the standard scripts have perl in them. This does not say
anything about the relative value of perl, but rather the relative availability of it. Because it can be
expected that awk and sed are available, it makes sense that they are commonly used. perl may not
be on your machine and including it in a system shell script might cause trouble.
In this section, I am going to talk about the basics of perl. We'll go through the mechanics of
creating perl scripts and the syntax of the perl language. There are many good books on perl, so I
would direct you to them to get into the nitty-gritty. Here we are just going to cover the basics. Later
on, we'll address some of the issues involved with making perl scripts to use on your Web site.
One aspect of perl that I like is that it contains the best of everything. It has aspects of C, shell, awk,
sed and many other things. perl is also free. The source code is readily available and the versions
that I have came with configuration scripts that determined the type of system I had and set up the
make-files accordingly. Aside from Linux, I was able to compile the exact same source on my Sun
Solaris workstation. Needless to say, the scripts that I write at home run just as well at work.
I am going to make assumptions as to what level of programming background you have. If you read
and understood the sections on sed, awk, and the shell, then you should be ready for what comes
next. In this section, I am going to jump right in. I am not going to amaze you with demonstrations
of how perl can do I/O, as that's what we are using it for in the first place. Instead, I am going to
assume that you want to do I/O and jump right into how to do it.
Let's create a perl script called hello.pl. The .pl extension has no real meaning, although I have seen
many places where it is always used as an extension. It is more or less conventional to do this, just
as text files traditionally have the extension .txt, shell scripts end in .sh, etc.
We'll start off with the traditional
print "Hello, World!\n";
This script consists of a single perl statement, whose purpose is to output the text inside the
double-quotes. Each statement in perl is followed by a semi-colon. Here, we are using the perl print
function to output the literal string "Hello, World!\n" (including the trailing new line). Although we
don't see it, there is the implied file handle to stdout. The equivalent command with the explicit
reference would be
print STDOUT "Hello, World!\n";
Along with STDOUT, perl has the default file handles STDIN and STDERR. Here is a quick script
that demonstrates all three as well as introduces a couple of familiar programming constructs:
while (<STDIN>)
{
    if ( $_ eq "\n" )
    {
        print STDERR "Error: \n";
    } else {
        print STDOUT "Input: $_ \n";
    }
}
Functioning the same as in C and most shells, the while line at the top says that as long as there is
something coming from STDIN, do the loop. Here we have the special format (<STDIN>), which
tells perl where to get input. If we wanted, we could use a file handle other than STDIN. However,
we'll get to that in a little bit.
One thing that you need to watch out for is that you must include blocks of statements (such as after
while or if statements) inside the curly braces ({}). This is different from the way you do it in C,
where a single line can follow while or if. For example, this statement is not valid in perl:
while ( $a < $b )
    $a++;
You would need to write it something like this:
while ( $a < $b ) {
    $a++;
}
Inside the while loop, we get to an if statement. We compare the value of the special variable $_ to the newline character to see if the line is empty. The variable $_ serves several functions. In this case, it represents the line we are reading from STDIN. In other cases, it represents the pattern space, as in sed. If the line we just read in is equal to the newline character (just a blank line), then only the Enter key was pressed. In either branch, we use the print function, which has the syntax
print [filehandle] "text_to_print";
In the first case, the filehandle is STDERR and in the second case it is STDOUT. In each case, we could have left out the filehandle and the output would go to STDOUT.
Each time we print a line, we need to include a newline (\n) ourselves.
We can format the print line in different ways. In the second print line, where the input is not a
blank line, we can print "Input: " before we print the line just input. Although this is a very simple
way of outputting lines, it gets the job done. More complex formatting is possible with the perl
printf function. Like its counterpart in C or awk, you can come up with some very elaborate outputs.
We'll get into more details later.
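Just to give you a taste here (the name and age are made up for illustration), a perl printf line similar to the awk example earlier might look like this:
printf "%-20s %5d\n", "Blinn, David", 33;
The %-20s left-justifies the name in a field 20 characters wide and %5d prints the age right-justified in a field of five.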
One more useful function for processing lines of input is split. The split function is used, as its name implies, to split a line based on a field separator that you define, say, for example, a space.
The line is then stored in an array as individual elements. So, in our example, if we wanted to input
multiple words and have them parsed correctly, we could change the script to look like this:
while (<STDIN>)
{
    @field = split(/ /, $_);

    if ( $_ eq "\n" )
    {
        print STDERR "Error: \n";
    } else {
        print STDOUT "$_ \n";
        print $field[0];
        print $field[1];
        print $field[2];
    }
}
The split function has the syntax
split(pattern,line);
where pattern is our field separator and line is the input line. So our line
@field = split(/ /, $_);
says to split the line we just read in (stored in $_) and use a space as the field separator. Each field is then placed into an element of the array field. The @ is needed in front of the variable field to indicate that it's an array. In perl, there are several types of variables. The first kind we have
already met before. The special variable $_ is an example of a scalar variable. Each scalar variable
is preceded by a dollar sign ($) and can contain a single value, whether a character string or a
number. How does perl tell the difference? It depends on the context. perl will behave correctly by
looking at what you tell it to do with the variable. Other examples of scalars are
$name = "jimmo";
$initial = "j";
$answertolifetheuniverseandeverything = 42;
Another kind of variable is an array, as we mentioned before. If we precede a variable with %, we have an array. But don't we have an array with @? Yes, so what's the difference? The difference is that arrays starting with @ are referenced by numbers, while those starting with % (associative arrays) are referenced by a string. We'll get to how that works as we move along.
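As a quick side-by-side sketch of the three kinds of variables (the names and values here are invented purely for illustration):
$name = "jimmo";                              # scalar
@fruits = ("apple", "banana", "cherry");      # array, referenced by number
%ages = ("jimmo", 45, "david", 33);           # associative array, referenced by string
print $fruits[1], "\n";                       # prints banana
print $ages{"jimmo"}, "\n";                   # prints 45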
In our example, we are using the split function to fill up the array @field. This array will be
referenced by number. We see the way it is referenced in the three print statements toward the end
of the script.
If our input line had a different field separator (for example, %), the line might look like this:
@field = split(/%/, $_);
In this example, we are outputting the first three words that are input. But what if there are more
words? Obviously we just add more print statements. What if there are fewer words? Now we run
into problems. In fact, we run into problems when adding more print statements. The question is,
where do we stop? Do we set a limit on the number of words that can be input? Well, we can avoid
all of these problems by letting the system count for us. Changing the script a little, we get
while (<STDIN>)
{
    @field = split(/ /, $_);

    if ( $_ eq "\n" )
    {
        print STDERR "Error: \n";
    } else {
        foreach $word (@field){
            print $word,"\n";
        }
    }
}
In this example, we introduce the foreach construct. This has the same behavior as a for loop. In fact, in perl, for and foreach are interchangeable, provided you have the right syntax. In this case, the syntax is
foreach $variable (@array)
where $variable is our loop variable and @array is the name of the array. When the script is run, @array is expanded to its components. So, if we had input four fruits, our line might have looked like this:
foreach $word (apple, banana, cherry, orange)
Because I don't know how many elements there are in the array field, foreach comes in handy. In
this example, every word separated by a space will be printed on a line by itself, like this:
perl script.pl
one two three
one
two
three
^D
Our next enhancement is to change the field separator. This time we'll use an ampersand (&)
instead. The split line now looks like this:
@field = split(/&/, $_);
When we run the script again with the same input, what we get is a bit
different:
# perl script.pl
one two three
one two three
The reason we get the output on one line is that the space is no longer a field separator. If
we run it again, this time using &, we get something different:
# perl script.pl
one&two&three
one
two
three
This time, the three words were recognized as separate fields.
Although it doesn't seem too likely that you would be inputting data like this from the keyboard, it
is conceivable that you might want to read a file that has data stored like this. To make things easy, I
have provided a file that represents a simple database of books. Each line is a record and represents a single book, with the fields separated by ampersands (&).
To be able to read from a file, we must create a file handle. To do this, we add a line and change the
while statement so it looks like this:
open ( INFILE,"< bookdata.txt");
while (<INFILE>)
The syntax of the open function is
open(file_handle,openwhat_&_how);
The way we open a file depends on the way we want to read it. Here, we use standard shell
redirection symbols to indicate how we want to read the specified file. In our example, we indicate
redirection from the file bookdata.txt. This says we want to read from the file. If we wanted to open
the file for writing, the line would look like this:
open ( INFILE,"> bookdata.txt");
If we wanted to append to the file, we could change the redirections so the line would look like this:
open ( INFILE,">> bookdata.txt");
Remember I said that we use standard redirection symbols. This also includes the pipe symbol. As
the need presents itself, your perl script can open a pipe for either reading or writing. Assuming that
we want to open a pipe for writing that sends the output through sort, the line might look like this:
open ( INFILE,"| sort ");
Remember that this would work the same as from the command line. Therefore, the output is not
being written to a file; it is just being piped through sort. However, we could redirect the output of
sort , if we wanted. For example:
open ( INFILE,"| sort > output_file");
This opens the file output_file for writing, but the output is first piped through sort . In our example,
we are opening the file bookdata.txt for reading. The while loop continues through and outputs each
line read. However, instead of being on a single line, the individual fields (separated by &) are
output on a separate line.
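One habit worth picking up here (the script above does not bother, but I would recommend it) is to check whether the open actually succeeded, so you notice a missing or unreadable file right away:
open(INFILE,"< bookdata.txt") || die "Cannot open bookdata.txt: $!\n";
The special variable $! contains the system's error message.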
We can now take this one step further. Let's now assume that a couple of the fields are actually
composed of subfields. These subfields are separated by a plus sign (+). We now want to break up
every field containing + into its individual subfields.
As you have probably guessed, we use the split command again. This time, we use a different
variable and instead of reading out of the input line ($_), we read out of the string $field. Therefore,
the line would look like this:
@subfield = split(/\+/, $field);
Aside from changing the search pattern, I added the back slash (\) because + is used in the search
pattern to represent one or more occurrences of the preceding character. If we don't escape it, we
generate an error. The whole script now looks like this:
open(INFILE,"< bookdata.txt");

while (<INFILE>)
{
    @data = split(/&/, $_);

    if ( $_ eq "\n" )
    {
        print STDERR "Error: \n";
    } else {
        foreach $field (@data){
            @subfield = split(/\+/, $field);
            foreach $word (@subfield){
                print $word,"\n";
            }
        }
    }
}
If we wanted, we could have written the script to split the incoming lines at both & and +. This
would have given us a split line that looked like this:
@data = split(/[&\+]/, $_);
The reason for writing the script like we did was that it was easier to separate subfields and still
maintain their relationships. Note that the search pattern used here could have been any regular
expression. For example, we could have split the strings every place there was the pattern Di
followed by e, g, or r, but not if it was followed by i. The regular expression would be
Di[reg][^i]
so the split function would be:
@data = split(/Di[reg][^i]/, $_);
At this point, we can read in lines from an ASCII file, separate the lines based on what we have
defined as fields, and then output each line. However, the lines don't look very interesting. All we
are seeing is the content of each field and do not know what each field represents. Let's change the
script once again. This time we will make the output show us the field names as well as their
content.
Let's change the script so that we have control over where the fields end up. We still use the split
statement to extract individual fields from the input string. This is not necessary because we can do
it all in one step, but I am doing it this way to demonstrate the different constructs and to reiterate
that in perl, there is always more than one way to do something. So, we end up with the following
script:
open(INFILE,"< bookdata.txt");

while (<INFILE>)
{
    @data = split(/&/, $_);

    if ( $_ eq "\n" )
    {
        print STDERR "Error: \n";
    } else {
        $fields = 0;
        foreach $field (@data){
            $fieldarray[$fields] = $field;
            print $fieldarray[$fields++]," ";
        }
    }
}
Each time we read a line, we first split it into the array @data, which is then copied into the fields
array. Note that there is no new line in the print statement, so each field will be printed with just a
space and the newline read at the end of each input line will then be output. Each time through the
loop, we reset our counter (the variable $fields) to 0.
Although the array is re-filled every time through the loop and we lose the previous values, we
could assign the values to specific variables.
Let's now make the output a little more attractive by outputting the field headings first. To make things simpler, let's label the fields as follows
title, author, publisher, char0, char1, char2, char3, char4, char5
where char0-char5 are simply characteristics of a book. We need a handful of if statements to make
the assignment, which look like this:
foreach $field (@data){
    if ( $fields == 0 ){
        print "Title: ",$field;
    }
    if ( $fields == 1 ){
        print "Author: ",$field;
    }
    # ...and similar if statements for the fields in between...
    if ( $fields == 8 ){
        print "Char 5: ",$field;
    }
}
Here, too, we would be losing the value of each variable every time through the loop as they get overwritten. Let's just assume we only want to save this information from the first line (our reasoning will become clear in a minute). First we need a counter to keep track of what line we are on and an if statement to enter the block where we make the assignment. Rather than a print statement, we change the line to an assignment, so it might look like this:
$title = $field;
When we read subsequent lines, we can output headers for each of the fields. We do this by having
another set of if statements that output the header and then the value, which is based on its position.
Actually, there is a way of doing things a little more efficiently. When we read the first line, we can
assign the values to variables on a single line. Instead of the line
foreach $field (@data) {
we add the if statement to check if this is the first line. Then we add the line
($field0,$field1,$field2,$field3,$field4,$field5,$field6,$field7,$field8) = split(/&/, $_);
Rather than assigning values to elements in an array, we are assigning them to specific variables.
(Note that if there are more fields generated by the split command than we specified variables for,
the remaining fields are ignored.) The other advantage of this is that we saved ourselves a lot of
space. We could also call these $field1, $field2, etc., thereby making the field names a little more
generic. We could also modify the split line so that instead of several separate variables, we have
them in a single array called field and we could use the number as the offset into the array.
Therefore, the first field would be referenced like this:
$field[0]
The split command for this would look like this
@field = split(/&/, $_);
which looks like something we already had. It is. This is just another example of the fact that there
are always several different ways of doing things in perl.
At this point, we still need the series of if statements inside of the foreach loop to print out the line.
However, that seems like a lot of wasted space. Instead, I will introduce the concept of an associative list. An associative list is just like any other list, except that you reference the elements by a label rather than a number.
Another difference is that associative arrays, also referred to as associative lists, are always an even length. This is because elements come in pairs: label and value. For example, we have:
%list = ("name", "James Mohr", "logname", "jimmo", "department", "IS");
Note that instead of $ or @ to indicate that this is an array, we use %. This specifies that this is an
associative array, so we can refer to the value by label; however, when we finally reference the
value, we use $. To print out the name, the line would look like this:
print "Name:",$list{name};
Also, the brackets we use are different. Here we use curly braces ({}) instead of square brackets
([]).
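Once an associative array has been filled, you can also walk through all of its labels with the keys function. A minimal sketch using the %list array above might look like this:
foreach $label (keys %list) {
    print $label, ": ", $list{$label}, "\n";
}
Note that keys returns the labels in no particular order.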
The introduction of the associative array allows us to define field labels within the data itself and access the values using these labels. As I mentioned, the first line of the data file contains the field labels. We can use these labels to reference the values. Let's look at the program itself:
open(INFILE,"< bookdata.txt");

$lines = 0;

while (<INFILE>)
{
    chop;
    @data = split(/&/, $_);

    if ( $lines == 0 )
    {
        @headlist = split(/&/, $_);
        foreach $field (0..@headlist-1){
            %headers = ( $headlist[$field], "" );
        }
        $lines++;
    }
    else {
        foreach $field (0..@data-1){
            $headers{$headlist[$field]} = @data[$field];
            print $headlist[$field],": ", $headers{$headlist[$field]},"\n";
        }
    }
}
At the beginning of the script, we added the chop function, which "chops" off the last character of a
list or variable and returns that character. If you don't mention the list or variable, chop affects the
$_ variable. This function is useful to chop off the newline character that gets read in. The next
change is that we removed the block that checked for blank lines and generated an error.
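As a quick, hedged illustration of chop (and of chomp, which newer perl versions also provide):
$line = "some text\n";
chop($line);      # $line is now "some text"; chop removes the last character, whatever it is
$line = "some text\n";
chomp($line);     # also "some text", but chomp only removes a trailing newline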
The first time we read a line, we entered the appropriate block. Here, we just read in the line
containing the field labels and we put each entry into the array headlist via the split function. The
foreach loop also added some new elements:
foreach $field (0..@headlist-1){
    %headers = ( $headlist[$field], "" );
}
The first addition is the element (0.. @headlist-1). Two numbers separated by two dots indicate a
range. We can use @headlist as a variable to indicate how many elements are in the array headlist.
This returns a human number, not a computer number (one that starts at 0). Because I chose to
access all my variables starting with 0, I needed to subtract 1 from the value of @headlist. There are
nine elements per line in the file bookdata.txt; therefore, their range is 0..9-1.
However, we don't need to know that! In fact, we don't even know how many elements there are to
make use of this functionality. The system knows how many elements it read in, so we don't have
to. We just use @headlist-1 (or whatever).
The next line fills in the elements of our associative array:
%headers = ( $headlist[$field], "" );
However, we are only filling in the labels and not the values themselves. Therefore, the second element of the pair is the empty string (""). One by one, we write the label into the first element of each pair.
After the first line is read, we load the values. Here again, we have a foreach loop that goes from 0
to the last element of the array. Like the first loop, we don't need to know how many elements were
read, as we let the system keep track of this for us. The second element in each pair of the associative list is loaded with this line:
$headers{$headlist[$field]}=@data[$field];
Let's take a look at this line starting at the left end. From the array @data (which is the line we just
read in), we are accessing the element at the offset that is specified by the variable $field. Because
this is just the counter used for our foreach loop, we go through each element of the array data one
by one. The value retrieved is then assigned to the left-hand side.
On the left, we have an array offset being referred to by an array offset. Inside we have
$headlist[$field]
The array headlist is what we filled up in the first block. In other words, the list of field headings.
When we reference the offset with the $field variable, we get the field heading. This will be used as
the string for the associative array. The element specified by
$headers{$headlist[$field]}
corresponds to the field value. For example, if the expression
$headlist[$field]
evaluated to title, the second time through the loop, the expression $headers{$headlist[$field]}
would evaluate to "2010: Odyssey Two."
At this point, we are ready to make our next jump. We are going to add the functionality to search
for specific values in the data. Let's assume that we know what the fields are and wish to search for a
particular value. For example, we want all books that have scifi as field char0. Assuming that the
script was called book.pl, we would specify the field label and value like this:
perl book.pl char0=scifi
Or we could add #!/usr/bin/perl to the top of the script to force the system to use perl as the
interpreter. We would run the script like this:
book.pl char0=scifi
The completed script looks like this:
($searchfield,$searchvalue) = split(/=/, $ARGV[0]);
open(INFILE,"< bookdata.txt");

$lines = 0;

while (<INFILE>)
{
    chop;
    @data = split(/&/, $_);

    if ( $_ eq "\n" )
    {
        print STDERR "Error: \n";
    } else {
        if ( $lines == 0 )
        {
            @headlist = split(/&/, $_);
            foreach $field (0..@headlist-1){
                %headers = ( $headlist[$field], "" );
            }
            $lines++;
        } else {
            foreach $field (0..@data-1){
                $headers{$headlist[$field]} = @data[$field];
                if ( ($searchfield eq $headlist[$field] ) &&
                     ($searchvalue eq $headers{$headlist[$field]} )) {
                    $found = 1;
                }
            }
            if ( $found == 1 )
            {
                foreach $field (0..@data-1){
                    print $headlist[$field],": ", $headers{$headlist[$field]},"\n";
                }
                $found = 0;
            }
        }
    }
}
We added a line at the top of the script that splits the first argument on
the command line:
($searchfield,$searchvalue) = split(/=/, $ARGV[0]);
Note that we are accessing ARGV[0]. This is not the command being called, as one would expect in
a C or shell program. Our command line has the string char0=scifi as its $ARGV[0]. After the split,
we have $searchfield=char0 and $searchvalue=scifi.
Some other new code looks like this:
if ( ($searchfield eq $headlist[$field] ) &&
($searchvalue eq $headers{$headlist[$field]} )) {
$found=1;
Instead of outputting each line in the second foreach loop, we are changing it so that we check whether the field we input ($searchfield) matches the heading we just read in ($headlist[$field]) and whether the value we are looking for ($searchvalue) equals the value we just read in.
Here we add another new concept: logical operators. These are just like in C, where && means a
logical AND and || is a logical OR. If we want a logical comparison of two variables and each has a
specific value, we use the logical AND, like
if ( $a == 1 && $b == 2 )
which says if $a equals 1 AND $b equals 2, execute the following block. If we wrote it like this
if ( $a == 1 || $b == 2 )
it would read as follows: if $a equals 1 OR $b equals 2, execute the block. In our example, we are
saying that if the search field ($searchfield) equals the corresponding value in the heading list
($headlist[$field]) AND the search value we input ($searchvalue) equals the value from the file
($headers{$headlist[$field]}), we then execute the following block. Our block is simply a flag to
say we found a match.
Later, after we read in all the values for each record, we check the flag. If the flag was set, the
foreach loop is executed:
if ( $found == 1 )
{
    foreach $field (0..@data-1){
        print $headlist[$field],": ", $headers{$headlist[$field]},"\n";
    }
}
Here we output the headings and then their corresponding values. But what if we aren't sure of the exact text we are looking for? For example, what if we want all books by the author Eddings, but do not know that his first name is David? It's now time to introduce the perl function index. As its name
implies, it delivers an index. The index it delivers is an offset of one string in another. The syntax is
index(STRING,SUBSTRING,POSITION)
where STRING is the name of the string that we are looking in, SUBSTRING is the substring that
we are looking for, and POSITION is where to start looking. That is, what position to start from. If
POSITION is omitted, the function starts at the beginning of STRING. For example
index("applepie","pie");
will return 5, as the substring pie starts at position 5 of the string applepie. To take advantage of this,
we only need to change one line. We change this
if ( ($searchfield eq $headlist[$field] ) &&
($searchvalue eq $headers{$headlist[$field]} )) {
to this
if ( (index($headlist[$field],$searchfield)) != -1 &&
index($headers{$headlist[$field]},$searchvalue) != -1 ) {
Here we are looking for an offset of -1. This indicates the condition where the substring is not
within the string. (The offset comes before the start of the string.) So, if we were to run the script
like this:
script.pl author=Eddings
we would look through the field author for any entry containing the string Eddings. Because there
are records with an author named Eddings, if we looked for Edding, we would still find it because
Edding is a substring of "David Eddings."
As you might have noticed, we have a limitation in this mechanism. We must ensure that we spell
things with the right case. Because Eddings is uppercase both on the command line and in the file,
there is no problem. Normally names are capitalized, so it would make sense to input them as such.
But what about the title of a book? Often, words like "the" and "and" are not capitalized. However,
what if the person who input the data, input them as capitals? If you looked for them in lowercase,
but they were in the file as uppercase, you'd never find them.
To consider this possibility, we need to compare both the input and the fields in the file in the same
case. We do this by using the tr (translate) function. It has the syntax
tr/SEARCHLIST/REPLACEMENTLIST/[options]
where SEARCHLIST is the list of characters to look for and REPLACEMENTLIST is the
characters to use to replace those in SEARCHLIST. To see what options are available, check the
perl man-page. We change part of the script to look like this:
foreach $field (0..@data-1){
    $headers{$headlist[$field]} = @data[$field];

    ($search1 = $searchfield) =~ tr/A-Z/a-z/;
    ($search2 = $headlist[$field]) =~ tr/A-Z/a-z/;
    ($search3 = $searchvalue) =~ tr/A-Z/a-z/;
    ($search4 = $headers{$headlist[$field]}) =~ tr/A-Z/a-z/;

    if ( (index($search2,$search1) != -1) && (index($search4,$search3) != -1) ) {
        $found = 1;
    }
}
In the middle of this section are four lines where we do the translations. This demonstrates a special
aspect of the tr function. We can do a translation as we are assigning one variable to another. This is
useful because the original strings are left unchanged. We must also change the statement with the index function so that the comparisons use the new variables.
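As a side note, perl 5 also provides the lc and uc functions, which return a lowercased or uppercased copy of a string without touching the original. Assuming they are available on your version, the first two translations could just as well be written as:
$search1 = lc($searchfield);
$search2 = lc($headlist[$field]);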
So at this point, we have created an interface in which we can access a "database" and search for
specific values.
When writing conditional statements, you must be sure of the condition you are testing. Truth, like
many other things, is in the eye of the beholder. In this case, it is the perl interpreter that is
beholding your concept of true. It may not always be what you expect. In general, you can say that a value is true unless it is the null string (""), the number zero (0), or the literal string zero ("0").
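A few illustrative cases (invented purely as examples):
if (0)    { print "never printed\n"; }
if ("")   { print "never printed\n"; }
if ("0")  { print "never printed\n"; }
if ("00") { print "printed, since the string 00 is not the same as the string 0\n"; }
if (1)    { print "printed\n"; }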
One important feature of perl is the comparison operators. Unlike C, there are different operators for
numeric comparison and for string comparison. They're all easy to remember and you have certainly
seen both sets before, but keep in mind that they are different. Table 0-8 contains a list of the perl
comparison operators and Table 0-9 contains a list of perl operations.
Numeric   String   Comparison

==        eq       equal to
!=        ne       not equal to
>         gt       greater than
<         lt       less than
>=        ge       greater than or equal to
<=        le       less than or equal to
<=>       cmp      compare, returning -1, 0, or 1
                   (-1 - first operand less, 0 - operands equal, 1 - first operand greater)

Table 0-8 perl Comparison Operators
Another important aspect that you need to keep in mind is that there is really no such thing as a numeric variable. Well, sort of. perl is capable of distinguishing between the two without you interfering. If a variable is used in a context where it can only be a string, then perl will interpret it as a string.
Let's take two variables: $a=2 and $b=10. As you might expect, the expression $a < $b evaluates to true because we are using the numeric comparison operator <. However, if the expression were $a lt $b, it would evaluate to false. This is because the string "10" comes before "2" lexicographically (it comes first alphabetically).
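You can see this for yourself with a couple of lines like these (just an illustration):
$a = 2;
$b = 10;
print "numeric: ", ($a < $b ? "true" : "false"), "\n";     # true, 2 is less than 10
print "string: ", ($a lt $b ? "true" : "false"), "\n";     # false, "2" sorts after "10"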
Besides simply translating sets of letters, perl can also do substitution. To show you this, I am going
to show you another neat trick of perl. Having been designed as a text and file processing language,
it is very common to read in a number of lines of data and process them all in turn. We can tell perl that it should assume we want to read in lines although we don't explicitly say so. Let's take a script that we call fix.pl. This script looks like this:
s/James/JAMES/g;
s/Eddings/EDDINGS/g;
This syntax is the same as you would find in sed; however, perl has a much larger set of regular
expressions. Trying to run this as a script by itself will generate an error; instead, we run it like this:
perl -p fix.pl bookdata.txt
The -p option tells perl to put a wrapper around your script. Therefore, our script would behave as
though we had written it like this:
while (<>) {
    s/James/JAMES/g;
    s/Eddings/EDDINGS/g;
} continue {
    print;
}
This would read each line from a file specified on the command line, carry out the substitution, and
then print out each line, changed or not. We could also take advantage of the ability to specify the
interpreter with #!. The script would then look like
#!/usr/bin/perl -p
s/James/JAMES/g;
s/Eddings/EDDINGS/g;
Another command line option is -i, which stands for "in-place" editing. In the example above, the changed lines would be output to the screen and we would have to
redirect them to a file ourselves. The -i option takes an argument, which indicates the extension you
want for the old version of the file. So, to use the option, we would change the first line, like this:
#!/usr/bin/perl -pi.old
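You do not even need a separate script file for a quick job like this; a hedged one-liner using the same substitutions might look like this (bookdata.txt is just the sample data file from above):
perl -pi.old -e 's/James/JAMES/g; s/Eddings/EDDINGS/g;' bookdata.txt
The original file is kept as bookdata.txt.old and the edited version takes its place.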
With perl, you can also make your own subroutines. These subroutines can be written to return
values, so that you have functions as well. Subroutines are first defined with the sub keyword and
are called using &. For example:
#!/usr/bin/perl

sub usage {
    print "Invalid arguments: @ARGV\n";
    print "Usage: $0 [-t] filename\n";
}

if ( @ARGV < 1 || @ARGV > 2 ) {
    &usage;
}
This says that if the number of arguments from the command line @ARGV is less than 1 or greater
than 2, we call the subroutine usage, which prints out a usage message.
To create a function, we first create a subroutine. When we call the subroutine, we call it as part of
an expression. The value returned by the subroutine/function is the value of the last expression
evaluated.
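As a minimal sketch (the name add_two is made up for this example), remember that the arguments show up in the array @_ and the last expression evaluated becomes the return value:
sub add_two {
    ($first, $second) = @_;
    $first + $second;
}
$sum = &add_two(3, 4);
print "Sum: $sum\n";       # prints Sum: 7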
Let's create a function that prompts you for a yes/no response:
#!/usr/bin/perl

if (&getyn("Do you *really* want to remove all the files in this directory? ")
     eq "y\n" )
{
    print "Don't be silly!\n";
}

sub getyn {
    print @_;
    $response = (<STDIN>);
}
The calling if statement passes the actual prompt as an argument to the subroutine, so the getyn subroutine could be used in other circumstances as well. As mentioned, the value returned includes the newline from the Enter key; therefore, we must check for "y\n." This is not simply "y" or "n," but rather "y" followed by a newline.
Alternatively, we could check the response inside the subroutine. In other words, we could have added the line
$response =~ /^y/i;
We addressed the =~ characters earlier in connection with the tr function. In this case, we use a pattern-matching construct: /^y/i. This has the same behavior as in sed, where we are looking for a y at the beginning of the line. The trailing i simply says to ignore the case. Unlike tr or a substitution, a simple match does not change $response; instead, the expression itself evaluates to 1 if the line begins with y or Y, and to a null string if not.
We now change the calling statement and simply leave off the comparison to "y\n". Because the return value of the subroutine is the value of the last expression evaluated, the value returned now is either 1 or a null string. Therefore, we don't have to do any kind of comparison, as the if statement will react according to the return value.
I wish I could go on. I haven't even hit on a quarter of what perl can do. Unfortunately, like the
sections on sed and awk, more details are beyond the scope of this book. Instead, I want to refer you
to a few other sources. First, there are two books from O'Reilly and Associates. The first is
Learning perl by Randal Schwartz. This is a tutorial. The other is Programming perl by Larry Wall
and Randal Schwartz. If you are familiar with other UNIX scripting languages, I feel you would be
better served by getting the second book.
The next suggestion I have is that you get the perl CD-ROM from Walnut Creek CD-ROM
(www.cdrom.com). This is loaded with hundreds of megabytes of perl code and the April 1996
version, which I used, contains the source code for perl 4 (4.036) and perl5 (5.000m). In many
cases, I like this approach better because I can see how to do the things I need to do. Books are
useful to get the basics and reminders of syntax, options, etc. However, seeing someone else's code
shows me how to do it.
Another good CD-ROM is the Mother of PERL CD from InfoMagic (www.infomagic.com). It, too,
is loaded with hundreds of megabytes of perl scripts and information.
There are a lot of places to find sample scripts while you are waiting for the CD to arrive. One place
is the Computers and Internet: Programming Languages: Perl hierarchy at Yahoo.
(www.yahoo.com). You can use this as a springboard to many sites that not only have information
on perl but data on using perl on the Web (e.g., in CGI scripts).
Chapter VI
Basic Administration
It's difficult to put together a simple answer when I'm asked about the job of a system administrator.
Every aspect of the system can fall within the realm of a system administrator. Entire books have
been written about just the software side, and for most system administrators, hardware, networks,
and even programming fall into their laps.
I work for the largest developer of online broker software in Germany. In addition to the software,
we also run the data centers for several online brokers. I am responsible for monitoring the systems
and providing reports on several levels, covering performance and many other things. I am expected to understand how our software works with all of its various components and how they work with third-party products, as well as the workings of the network, firewalls, Solaris, Linux, Windows 2000 and XP, perl, shell scripting, and so forth.
There is very little here on my site that does not directly relate to my job as a system administrator.
For the most part, you need to be a jack of all trades. Although Linux has come a long way in the
last few years and you no longer need to be a "guru" to get it to work, knowing how to administer
your system allows you to go beyond what is delivered to you out of the box.
In this chapter, we are just going to go through the basics. We won't necessarily be talking about
individual steps or processes used by the administrator, but rather about functional areas. With this,
I hope to be able to give you enough background to use the programs and utilities that the system
provides for you.
Starting and Stopping the System
Almost every user and many administrators never see what happens while the system boots, and
those who do often do not understand what they are seeing. From the time you flip the power switch to the time you get that first login: prompt,
dozens of things must happen, many of which happen long before the system knows that it's
running Linux. Knowing what is happening as the system boots and in what order it is happening is
very useful when your system does not start the way it should.
In this chapter, I will first talk about starting your system. Although you can get it going by flipping
on the power switch and letting the system boot by itself, there are many ways to change the
behavior of your system as it boots. How the system boots depends on the situation. As we move
along through the chapter, we'll talk about the different ways to influence how the system boots.
After we talk about how to start your system, we'll look at a few ways to alter your system's
behavior when it shuts down.
The Boot Process
The process of turning on your computer and having it jump through hoops to bring up the
operating system is called booting, which derives from the term bootstrapping. This is an allusion to
the idea that a computer pulls itself up by its bootstraps, in that smaller pieces of simple code start
larger, more complex pieces to get the system running.
The process a computer goes through is similar among different computer types, whether it is a PC,
Macintosh, or SPARC Workstation. In the next section, I will be talking specifically about the PC,
though the concepts are still valid for other machines.
The first thing that happens is the Power-On Self-Test (POST). Here the hardware checks itself to
see that things are all right. It compares the hardware settings in the CMOS (Complementary Metal
Oxide Semiconductor) to what is physically on the system. Some errors, like the floppy types not
matching, are annoying, but your system still can boot. Others, like the lack of a video card, can
keep the boot process from continuing. Often, there is nothing to indicate what the problem is,
except for a few little "beeps."
Once the POST is completed, the hardware jumps to a specific, predefined location in RAM. The
instructions located here are relatively simple and basically tell the hardware to go look for a boot
device. Depending on how your CMOS is configured, the hardware first checks your floppy and
then your hard disk.
When a boot device is found (let's assume that it's a hard disk), the hardware is told to go to the 0th (first) sector (head 0, cylinder 0, sector 0), then load and execute the instructions there. This is the master boot record, or MBR for you DOS-heads (sometimes also called the master boot block). This code
small enough to fit into one block but is intelligent enough to read the partition table (located just
past the master boot block) and find the active partition. Once it finds the active partition, it begins
to read and execute the instructions contained within the first block.
It is at this point that viruses can affect/infect Linux systems. The master boot block has the same format for essentially all PC-based operating systems, and all it does is find and execute code at the beginning of the active partition. But if the master boot block contains code that tells it to go to the
very last sector of the hard disk and execute the code there, which then tells the system to execute
code at the beginning of the active partition, you would never know anything was wrong.
Let's assume that the instructions at the very end of the disk are larger than a single 512-byte sector.
If the instructions took up a couple of kilobytes, you could get some fairly complicated code.
Because it is at the end of the disk, you would probably never know it was there. What if that code
checked the date in the CMOS and, if the day of the week was Friday and the day of the month was
13, it would erase the first few kilobytes of your hard disk? If that were the case, then your system
would be infected with the Friday the 13th virus, and you could no longer boot your hard disk.
Viruses that behave in this way are called "boot viruses," as they affect the master boot block and
can only damage your system if this is the disk from which you are booting. These kinds of viruses
can affect all PC-based systems. Some computers will allow you to configure them (more on that
later) so that you cannot write to the master boot block. Although this is a good safeguard against
older viruses, the newer ones can change the CMOS to allow writing to the master boot block. So,
just because you have enabled this feature does not mean your system is safe. However, I must
point out that boot viruses can only affect Linux systems if you boot from an infected disk. This
usually will be a floppy, more than likely a DOS floppy. Therefore, you need to be especially
careful when booting from floppies.
Now back to our story...
As I mentioned, the code in the master boot block finds the active partition and begins executing the
code there. On an MS-DOS system, these are the IO.SYS and MSDOS.SYS files. On a Linux system, this is often the LILO or Linux loader "program." Although IO.SYS and MSDOS.SYS are
"real" files that you can look at and even remove if you want to, the LILO program is not. The
LILO program is part of the partition, but not part of the file system; therefore, it is not a "real" file.
Regardless of what program is booting your system and loading the kernel, it is generally referred to
as a "boot loader".
Often, LILO is installed in the master boot block of the hard disk itself. Therefore, it will be the first
code to run when your system is booted. In this case, LILO can be used to start other operating
systems. On one machine, I have LILO start either Windows 95 or one of two different versions of
Linux.
In other cases, LILO is installed in the boot sector of a given partition. In this case, it is referred to
as a "secondary" boot loader and is used just to load the Linux installed on that partition. This is
useful if you have another operating system such as OS/2 or Windows NT and you use the boot
software from that OS to load any others. However, neither of these was designed with Linux in
mind. Therefore, I usually have LILO loaded in the master boot block and have it do all the work.
Assuming that LILO has been written to the master boot record and is, therefore, the master boot
record, it is loaded by the system BIOS into a specific memory location (0x7C00) and then
executed. The primary boot loader then uses the system BIOS to load the secondary boot loader into a specific memory location (0x9B000). The reason that the BIOS is still used at this point is that if the secondary boot loader had to include the code necessary to access the hardware itself, it would be extremely large (at least by comparison to its current size). Furthermore, it would need to be able to recognize
and access different hardware types such as IDE and EIDE, as well as SCSI, and so forth.
This limits LILO, because it is obviously dependent on the BIOS. As a result, LILO and the secondary boot loader cannot access cylinders on the hard disk above 1023. In fact, this is a
problem for other PC-based operating systems, as well. There are two solutions to this problem. The
original solution is simply to create the partitions so that the LILO and the secondary boot loader
are at cylinder 1023 or below. This is one reason for moving the boot files into the /boot directory, which is often on a separate file system that lies at the start of the hard disk.
The other solution is something called "Logical Block Addresses" (LBA). With LBA, the BIOS "thinks" there are fewer cylinders than there actually are. Details on LBA can be found in the section on
hard disks.
Contrary to common belief, it is actually the secondary boot loader that provides the prompt and
accepts the various options. The secondary boot loader is what reads the /boot/map file to determine
the location of kernel image to load.
You can configure LILO with a wide range of options. Not only can you boot with different
operating systems, but with Linux you can boot different versions of the kernel as well as use
different root file systems. This is useful if you are a developer because you can have multiple
versions of the kernel on a single system. You can then boot them and test your product in different
environments. We'll go into details about configuring LILO in the section on Installing your Linux
kernel.
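Just to give you a feel for what such a configuration looks like (the device names, paths, and labels here are only an example; the details belong in the section mentioned above), a minimal /etc/lilo.conf might contain something like this:
boot=/dev/hda
prompt
timeout=50
default=linux

image=/boot/vmlinuz
    label=linux
    root=/dev/hda1
    read-only

image=/vmlinuz.old
    label=old
    root=/dev/hda1
    read-only
Remember that after changing this file you need to re-run /sbin/lilo so that the boot map is rewritten.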
In addition, I always have three copies of my kernel on the system and have configured LILO to be
able to boot any one of them. The first copy is the current kernel I am using. When I rebuild a new
kernel and install it, it gets copied to /vmlinuz.old, which is the second kernel I can access. I then
have a copy called /vmlinuz.orig, which is the original kernel from when I installed that particular
release. This, at least, contains the drivers necessary to boot and access my hard disk and CD-ROM.
If I can get that far, I can reinstall what I need to.
Typically on newer Linux versions, the kernel is no longer stored in the root directory, but rather in
the /boot directory. Also, you will find that it is common that the version number of the respective
kernel is added onto the end. For example, /boot/vmlinuz.2.4.18, which would indicate that this
kernel is version 2.4.18. What is important is that the kernel can be located when the system boots
and not what it is called.
During the course of writing this material, I often had more than one distribution of Linux
installed on my system. It was very useful to see whether the application software provided with
one release was compatible with the kernel from a different distribution. Using various options to
LILO, I could boot one kernel but use the root file system from a different version. This was also
useful on at least one occasion when I had one version that didn't have the correct drivers in the
kernel on the hard disk and I couldn't even boot it.
Once your system boots, you will see the kernel being loaded and started. As it is loaded and begins
to execute, you will see screens of information flash past. For the uninitiated, this is overwhelming,
but after you take a closer look at it, most of the information is very straightforward.
Once you're booted, you can see this information in the file /usr/adm/messages. Depending on your
system, this file might be in /var/adm or even /var/log, although /var/log seems to be the most
common, as of this writing. In the messages file, as well as during the boot process, you'll see
several types of information that the system logging daemon (syslogd) is writing. The syslogd
daemon usually continues logging as the system is running, although you can turn it off if you want.
To look at the kernel messages after the system boots, you can use the dmesg command.
The general format for the entries is:
time hostname program: message
where time is the system time when the message is generated, hostname is the host that generated
the message, program is the program that generated the message, and message is the text of the
message. For example, a message from the kernel might look like this:
May 13 11:34:23 localhost kernel ide0: do_ide_reset: success
As the system is booting, all you see are the messages themselves and not the other information.
Most of what you see as the system boots are messages from the kernel, with a few other things, so you would see this message just as:
ide0: do_ide_reset: success
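For example, on a running system you might look at these messages with something like the following (the exact log file location depends on your distribution, as noted above):
dmesg | less
tail -f /var/log/messages
The second command keeps printing new messages as they are logged, which is handy while you are testing hardware.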
Much of the information that the syslogd daemon writes comes from device drivers that perform
any initialization routines. If you have hardware problems on your system, this is very useful
information. One example I encountered was with two pieces of hardware that were both software-
configurable. However, in both cases, the software wanted to configure them to use the same IRQ. I
could then change the source code and recompile so that one of them was assigned a different IRQ.
You will also notice the kernel checking the existing hardware for specific capability, such as
whether an FPU is present, whether the CPU has the hlt (halt) instruction, and so on.
What is logged and where it is logged is based on the /etc/syslog.conf file. Each entry is broken
down into facility.priority, where facility is the part of the system generating the message, such as the
kernel, the printer spooler, or the security subsystem, and priority indicates the severity of the message.
The priority ranges from none, when no messages are logged, to emerg, which represents very
significant events like kernel panics. Messages are generally logged to one file or another, though
emergency messages should be displayed to everyone (usually done by default). See the syslog.conf
man-page for details.
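As an illustration only (the file names here are assumptions and not taken from any particular distribution), entries in /etc/syslog.conf might look like this:

# log all kernel messages to their own file
kern.*      /var/log/kernel
# emergency messages go to everyone who is logged in
*.emerg     *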
One last thing that the kernel does is start the init process, which reads the /etc/inittab file. It looks
for any entry that should be run when the system is initializing (the entry has a sysinit in the third
field) and then executes the corresponding command. (I'll get into details about different run-levels
and these entries shortly.)
The first thing init runs out of the inittab is the script /etc/rc.d/rc.sysinit, which is similar to the
bcheckrc script on other systems. As with everything else under /etc/rc.d, this is a shell script, so
you can take a look at it if you want. Actually, I feel that looking through the scripts and becoming
familiar with which script does what and in what order is a good way of learning about your system.
Among the myriad of things done here are checking and mounting file systems, removing old lock
and PID files, and enabling the swap space.
Note that if the file system check notes some serious problems, the rc.sysinit will stop and bring you
to a shell prompt, where you can attempt to clean up by hand. Once you exit this shell, the next
command to be executed (aside from an echo) is a reboot. This is done to ensure the validity of the
file systems.
Next, init looks through inittab for the line with initdefault in the third field. The initdefault entry
tells the system what run-level to enter initially, normally run-level 3 (without X Windows) or run-
level 5 (with X Windows). Other systems have the default run-level 1 to bring you into single-user
or maintenance mode. Here you can perform certain actions without worrying users or too many
other things happening on your system. (Note: You can keep users out simply by creating the file
/etc/nologin. See the nologin man-page for details.)
What kind of actions can you perform here? The action with the most impact is adding new or
updating software. Often, new software will affect old software in such a way that it is better not to
have other users on the system. In such cases, the installation procedures for that software should
keep you from installing unless you are in maintenance mode.
This is also a good place to configure hardware that you added or otherwise change the kernel.
Although these actions rarely impact users, you will have to do a kernel rebuild. This takes up a lot
of system resources and degrades overall performance. Plus, you need to reboot after doing a kernel
rebuild and it takes longer to reboot from run-level 3 than from run-level 1.
If the changes you made do not require you to rebuild the kernel (say, adding new software), you
can go directly from single-user to multi-user mode by running
init 3
The argument to init is simply the run level you want to go into, which, for most purposes, is run-
level 3. However, to shut down the system, you could bring the system to run-level 0 or 6. (See the
init man-page for more details.)
Init looks for any entry that has a 3 in the second field. This 3 corresponds to the run-level where we
currently are. Run-level 3 is the same as multi-user mode.
Within the inittab, there is a line for every run level that starts the script /etc/rc.d/rc, passing the run
level as an argument. The /etc/rc.d/rc script, after a little housekeeping, then starts the scripts for
that run level. For each run level, there is a directory underneath /etc/rc.d, such as rc3.d, which
contains the scripts that will be run for that run level.
In these directories, you may find two sets of scripts. The scripts beginning with K are the kill
scripts, which are used to shutdown/stop a particular subsystem. The S scripts are the start scripts.
Note that the kill and start scripts are links to the files in /etc/rc.d/init.d. If there are K and S scripts
with the same number, these are both linked to the same file.
This is done because the scripts are started with an argument of either start or stop. The script itself
then changes its behavior based on whether you told it to start or stop. Naming them something
(slightly) different allows us to start only the K scripts if we want to stop things and only the S
scripts when we want to start things.
When the system changes to a particular run level, the first scripts that are started are the K scripts.
This stops any of the processes that should not be running in that level. Next, the S scripts are run to
start the processes that should be running.
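To make the mechanism clearer, here is a much-simplified sketch of what the /etc/rc.d/rc script does when entering run-level 3; the real script does considerably more housekeeping:

# first stop anything that should not run in this level
for script in /etc/rc.d/rc3.d/K*; do
    $script stop
done
# then start everything that should be running
for script in /etc/rc.d/rc3.d/S*; do
    $script start
done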
Let's look at an example. On most systems, run-level 3 is almost the same as run-level 2. The only
difference is that in run-level 2, NFS is not running. If you were to change from run-level 3 to run-
level 2, NFS would go down. In run-level 1 (maintenance mode), almost everything is stopped.

Run Levels
Most users are only familiar with two run states or run levels. The one that is most commonly
experienced is what is referred to as multiuser mode. This is where logins are enabled on terminals,
the network is running, and the system is behaving "normally." The other run level is system
maintenance or single-user mode, when only a single user is on the system (root), probably doing
some kind of maintenance tasks. Although it could be configured to allow logins by other users,
usually the system is so configured that only one login is allowed on the system console.
On most systems that I have encountered, Linux will automatically boot into run-level 3 (or run-level 5
if a graphical login is configured). This is the
normal operating mode. To get to a lower run level (for example, to do system maintenance), the
system administrator must switch levels manually.
It is generally said that the "system" is in a particular run-level. However, it is more accurate to say
that the init process is in a particular run level, because init determines what other processes are
started at each run-level.
In addition to the run levels most of us are familiar with, there are several others that the system can
run in. However, most of them are hardly ever used. For more details on what these run levels
are, take a look at the init man-page.
The system administrator can change to a particular run level by using that run level as the
argument to init. For example, running init 2 would change the system to run-level 2. To determine
what processes to start in each run level, init reads the /etc/inittab file. This is defined by the second
field in the /etc/inittab file. Init reads this file and executes each program defined for that run level
in order. When the system boots, it decides what run level to go into based on the initdefault entry in
/etc/inittab.
The fields in the inittab file are:

id - unique identity for that entry
rstate - run level in which the entry will be processed
action - tells init how to treat the process specifically
process - what process will be started
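For example, a typical entry that keeps a login program running on the first virtual console might look like this (the exact program and options vary between distributions):

1:2345:respawn:/sbin/mingetty tty1

Here, 1 is the id, 2345 are the run levels in which the entry is processed, respawn is the action and /sbin/mingetty tty1 is the process that is started.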
One thing I need to point out is that the entries in inittab are not run exactly according to the order in
which they appear. If you are entering a run level other than S for the first time since boot-up, init
will first execute those entries with a boot or bootwait in the third column. These are those
processes that should be started before users are allowed access to the system, such as checking the
status of the file systems and then mounting them.
In run-level 3, the /sbin/mingetty process is started on the terminals specified. The getty process
gives you your login: prompt. When you have entered your logname for the first time, getty starts
the login process, which asks you for your password. If your password is incorrect, you are
prompted to input your logname again. If your password is correct, then the system starts your
"login shell. " Note that what gets started may not be a shell at all, but some other program. The
term "login shell" is the generic term for whatever program is started when you login. This is
defined by the last field of the corresponding entry in /etc/passwd.
Keep in mind that you can move in either direction, that is, from a lower to higher run level or from
a higher to lower run level without having to first reboot. init will read the inittab and start or stop
the necessary processes. If a particular process is not defined at a particular run level, then init will
kill it. For example, assume you are in run-level 3 and switch to run-level 1. Many of the processes
defined do not have a 1 in the second field. Therefore, when you switch to run-level 1, those
processes and all their children will be stopped.
If we look at the scripts in rc1.d, we see there all the scripts are kill scripts, with the exception of
one start script. It is this start script that actually kills all the processes. It does exec init -t1 S, which
brings the system into maintenance mode, waiting one second (-t1) between sending processes the
SIGTERM and SIGKILL signals.
To shutdown the system immediately, you could run: init 0

which will bring the system immediately into run-level 0. As with run-level 1, there is only one start
script for run-level 0. It is this script that kills all the processes, unmounts all the file systems, turns
off swap, and brings the system down.
After it has started the necessary process from inittab, init just waits. When one of its "descendants"
dies (a child process of a child process of a child process, etc., of a process that init started), init
rereads the inittab to see what should be done. If, for example, there is a respawn entry in the third
field, init will start the specified process again. This is why when you log out, you immediately get
a new login: prompt.
Because init just waits for processes to die, you cannot simply add an entry to inittab and expect the
process to start up. You have to tell init to reread the inittab, which you can force by running init Q
(or telinit Q).
In addition to the run levels we discussed here, several more are possible. There are three "pseudo"
run-levels a, b, and c. These are used to start specific programs as needed or "on demand". Any entry
listed in inittab with the appropriate run-level will be started, however no actual run-level change
occurs. If you're curious about the details, take a look at the init(8) man-page or the section on init-
scripts.

Action Meaning
boot Executed during system boot.
bootwait Executed during system boot, but init waits until they have completed.
initdefault The default run level init starts after the system boots.
ondemand Executed when one of the "on demand" run levels is called (a, b, and c).
powerwait Executed when the system power fails. Init will wait until the command completes.
powerfail Executed when the system power fails, but init will not wait for completion.
powerokwait Executed when init is informed that power has been restored.
powerfailnow Executed when init is informed that the external battery is empty.
resume Executed when init is told by the kernel that the system is resuming from "Software Suspend".
sysinit Executed during system boot before any boot or bootwait entries.
respawn Restarted if the process stops.
wait Started once when the specific run-level is entered and init waits for completion.
once Started once when the specific run-level is entered but init does not wait for completion.
ctrlaltdel Executed when someone on the system console presses CTRL-ALT-DEL.
Table - List of inittab actions.
If necessary, you can add your own entries into /etc/inittab. However, what is typically done is that
init-scripts are added to the appropriate directory for the run-level where you want to start it.
Depending on your Linux distribution, you could simply copy it into /etc/rc.d and use the
appropriate admin tool, like YaST2, to add the script to the appropriate directory. For more details see
the section on init-scripts.
Note however, that simply changing /etc/inittab is not enough. You need to tell the init process to re-
read it. Normally init will re-read the file when it changes run levels, or you can send it a hangup
signal with
kill -HUP 1
since the init process always has PID 1.
Also running the command
telinit q
will tell init to reread it.
Be extremely careful if you edit the /etc/inittab file by hand. An editing mistake could prevent your
system from booting into a specific run level. If you use the boot, bootwait or sysinit actions, you
could prevent your system from booting at all. Therefore, as with any system file, it is a good idea to
make a backup copy first. If you make a mistake that prevents a particular program from starting,
and the action is respawn, init might get caught in a loop. That is, init tries to start the program,
cannot for whatever reason, and then tries to start it again. If init finds that it is starting the program
more than 10 times within 2 minutes, it will treat this as an error and stop trying. Typically you will
get messages in the system log that the process is "respawning too rapidly".

Init Scripts
If you were to just install the Linux operating system on your hard disk, you would not be able to do
very much. What actually makes Linux so useful is all of the extra things which are brought with it.
This is essentially true for every operating system.
What makes Linux so useful as well as powerful are all of the services, which are generally referred
to as daemons. These daemons typically run without user intervention providing everything from
printing to file services to Web pages and beyond. Because they are not part of the operating system
proper they are normally loaded separately from the kernel. Although many of these services could
be made part of the kernel, they are mostly separate programs. Because they are separate programs
something needs to configure and start them.
In most cases, simply installing a particular package is sufficient to activate the appropriate daemon.
However, there are times when you need to make changes to how these daemons behave, which
often means changing the way the program starts up. In order to be able to do that, you obviously
need to know just how and where these daemons are started in the first place. That's exactly what
we're going to talk about here.
Once the kernel is loaded, one of the last things it does is to start the init process. The job of the init
process (or simply init) is to start all of the daemons at the appropriate time. What the appropriate
time is depends on a number of different things. For example, you may be performing
administrative tasks and you do not want certain daemons to be running. Although you can stop
those daemons you do not need, the system provides a mechanism to do this automatically.
To understand this mechanism we need to talk about something called "run states" or "run levels".
Most users (and many administrators, for that matter) are familiar with only one run level. This is
the run level in which the system is performing all of its normal functions. Users can login, submit
print jobs, access Web pages, and do everything else one would expect. This run level is commonly
referred to as multiuser mode. In contrast, maintenance or single user mode is normally
recommended for administrative tasks.
Each run level is referred to by its number. When the system is not doing anything, that is the
system is stopped, this is run level 0. Single user mode is run-level 1. Multiuser mode is actually
multiple run levels. Depending on which distribution or which version of Unix you have, this can
be run-level 2, run-level 3 or run-level 5. Most Linux systems automatically boot into run-level
3 when the system starts. Run level 2 is very similar to run level 3, although a number of things do
not run in level 2. In fact, on some systems (SCO UNIX for example), run level 2 is the standard
multi-user mode. Run-level 5 is where the GUI (X Windows) starts automatically. (For more details on the run levels,
take a look at the init(8) man-page.)
Like many other aspects of the system, init has its own configuration file: /etc/inittab (see the table
below). This file contains the init table (inittab), which tells init what to do and when to do it.
Each activity init does is represented by a single line in the inittab, which consists of four entries,
separated by a colon. The first field is a unique identifier for that entry, which enables init to keep
track of each daemon as it runs. The second field is the run level in which each particular entry is
run.
The third entry is the action, which tells init how to behave in regard to this entry. For example,
some entries are only processed when the system boots. Others are automatically re-started should
that particular process stop (such as terminal logins). The last entry is what program will be started
and often a number of options for that program.
If you take look in inittab on your system you may notice something peculiar. More than likely, you
are not going to find any entries for the system daemons we have been talking about. The reason is
quite simply that the daemons are not started through the inittab, but rather through scripts which
are started from the inittab. These scripts we see as the entries labeled l0 through l6, for run levels 0
through 6 (the letter "ell", not the number one).
In the example below, the "action" is that init waits until the program has terminated before
continuing on and processing other entries for this run level. This also means that the entry will only
be processed once as the system enters that particular run level.
The key to all of this is the program which is run for each run level. In every case, it is the shell
script rc, which is given the appropriate run level as an argument. This script is often called the "run
level master script" as it is responsible for loading all of the other init scripts. Where this script lies
and what it is called will be different for different Linux distributions. Under older versions of SuSe
it was in /etc/rc.d, but now it's in /etc/init.d/. Under Caldera the script resides under /etc/rc.d. Note that
starting with version 8.0, SuSe also has an /etc/rc.d directory, which is actually a symbolic link to
/sbin/init.d.
Not just the location of the script is different between distributions, but so is the actual code.
However, the basic functionality is generally the same. That is, to start other scripts which finally
start the daemons we have been talking about all along.
One of the key aspects is how the system determines which daemon to start in which run level. As
you might guess, this is accomplished through the run-level that is passed as an argument to the rc
script. At least that's part of it. In addition, the system needs a list of which scripts should be started
in which run level. This is accomplished not by a text file, but rather by separating the programs or
scripts into different directories, one for each run level.
If you look in the /sbin/init.d or /etc/rc.d directory you'll see a number of subdirectories of the form
rc#.d, where # is a particular run level. For example, the directory rc3.d is for run level 3. Within the
subdirectories are not the actual scripts, as you might have guessed, but rather symbolic links to the
actual scripts. The primary reason for this is that a script can be started in more than one run level.
If the files were not links, but rather copies, any change would have to be made to every copy. The
reason they are symbolic links rather than hard links is that they may point to files on other file
systems, which is only possible with symbolic links.
With SuSe, the /sbin/init.d directory is also where the real scripts reside. On Caldera, the scripts
reside under /etc/rc.d/init.d.
At first glance, the filenames may be a little confusing. Although it is fairly simple to figure out
what daemon is started by looking at the name, the way these links are named takes a little
explanation.
As you might guess, the link ending in "apache" points to the script which starts the Apache Web
server.
However, you'll see there are two files with this ending. The really odd thing is that both of these
links point to the exact same file. So, what's the deal?
Part of the explanation lies in the first letter of each of these links. As you see, each starts with
either the letter S or the letter K. Those which begin with the letter S are used to start the particular
service and those which begin with the letter K are used to stop or kill that same service.
That leaves us with just the numbers. These are used to define the order in which the scripts are run.
When the files are listed, they automatically appear in numerical order. In this way, the system can
ensure the scripts are run in the correct order. For example, you do not want to start the Apache Web
server before you start the network. Therefore, the link used to start the network is S05network
whereas the link used to start Apache is S20apache as S05 comes before S20 no matter what comes
afterwards.
Note also, the same applies when the system shuts down. K20apache is used to shut down the
Apache server and K40network is used to shut down the network. As in the first case, the network is
not shut down until after Apache has been.
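As an illustration (the numbers and the directory name will differ from system to system), a shortened listing of such a directory might contain links like these:

ls -l /etc/rc.d/rc3.d
K20apache  -> ../init.d/apache
K40network -> ../init.d/network
S05network -> ../init.d/network
S20apache  -> ../init.d/apache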
It is interesting to note that this system could work even if the name of the link consisted of just S or
K and the appropriate number. That is, it would still work if the link told us nothing of the service
being started. There is actually more to it than making things simpler for us non-computers. Having
the names at the end allows the system to avoid the unnecessary stopping and starting
of the various services. When a new run level is entered, only those services are started which were
not started in the previous run level. When leaving a run level, the only services that are stopped are
those that are not started in the new level.
Let's look at an example. In the directory /etc/init.d/rc3.d (for run level 3), there are links used to
both start and stop the network. However, this means the network will always be re-started when
moving from run level 1 to run level 3. This also means the network will always be stopped when
moving from run level 3 to run level 1. On the other hand, both links exist in rc2.d (for run level 2).
Therefore, when leaving either run level 2 or 3 and moving to the other, the network is not stopped
as there is a start link for it in the new run level. When entering the new run level, the network is not
started, as there was already a start link for the previous level. However, in moving from a run level
when network is running (e.g. 2,3 or 5) to run level 1, the network is stopped because there is no
link to start the network in run level 1.
We're not done yet.
Since the links to both start and stop a service can be to the exact same file, the script needs some
way of knowing whether it should start or stop the service. This is done by passing an argument to
the script: start to start the service and stop to stop the service (simple, huh?). Inside each script, this
argument is read (typically $1) and different activities are performed based on what the argument
was.
Note that for many scripts, you can pass other arguments than just start and stop. For example, one
common argument is restart. As its name implies, this is used to stop then start the service again, in
other words, restart a running service. Many will also accept the argument status, which is used to
deliver status information about that service.
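For example, assuming the Apache init script lives under /etc/rc.d/init.d (the exact path differs between distributions), you could control the service by hand like this:

/etc/rc.d/init.d/apache stop
/etc/rc.d/init.d/apache start
/etc/rc.d/init.d/apache restart
/etc/rc.d/init.d/apache status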
# Default runlevel.
id:3:initdefault:
# System initialization.
si::sysinit:/etc/rc.d/rc.modules default
bw::bootwait:/etc/rc.d/rc.boot
# What to do in single-user mode.
~1:S:wait:/etc/rc.d/rc 1
~~:S:wait:/sbin/sulogin
l0:0:wait:/etc/rc.d/rc 0
l1:1:wait:/etc/rc.d/rc 1
l2:2:wait:/etc/rc.d/rc 2
l3:3:wait:/etc/rc.d/rc 3
l4:4:wait:/etc/rc.d/rc 4
l5:5:wait:/etc/rc.d/rc 5
l6:6:wait:/etc/rc.d/rc 6
Figure - Excerpts from the /etc/inittab file
On some systems, the scripts we have talked about so far are not the only scripts which are started
when the system boots. Remember that init reads the inittab to find out what to do, so there are any
number of things that "could" be started through the inittab, as compared to the rc-scripts. Even so,
for people who are used to other versions of UNIX, the inittab looks pretty barren.
One type of script that is often run from the inittab deals with system initialization. One example is the
boot script, which is found directly in /sbin/init.d. The entry in the inittab might look like this:
si:I:bootwait:/sbin/init.d/boot
The run level this script runs in is "I", which is not a traditional run level, but used by some
distributions (e.g., SuSe) to indicate system initialization. However, because the action is bootwait,
the run-level field is ignored. Bootwait means that this entry will be processed while the system
boots, and init will wait until the command or script has completed.
In this case, the script is /sbin/init.d/boot, which performs basic system initialization such as starting
the bdflush daemon (which writes dirty buffers to the disk), checking the filesystems (with fsck),
mounting filesystems, starting the kernel module daemon (kerneld), and many other things. Other
versions (as well as other UNIX dialects) may have several different entries in the inittab that
combine to do the same work as the /sbin/init.d/boot script under SuSe Linux.
The counterpart to the /sbin/init.d/boot script is /sbin/init.d/halt. These are the procedures that are
carried out when the system is brought down. In general, these are the reverse of the procedures in
the boot scripts, such as stopping kerneld and unmounting filesystems.
SuSe also uses the system configuration file /etc/rc.config. This file contains a large number of
variables that are used to configure the various services. Reading this file and setting the variables is
one of the first things done by the script /sbin/init.d/rc. The counterpart to this file on Caldera is
/etc/sysconfig/daemons. Instead of a single configuration file, you will find separate files for a
number of different daemons.
Creating your own init scripts
Sometimes the scripts your particular distribution provides are not sufficient and you need to add
your own. On a number of systems where I have added my own system services, I have
needed to create my own init scripts. The method that works on any system is to simply follow the
conventions used by your distribution.
SuSe has realized the need for creating your own init scripts, so it has provided a template for you.
This is the file /sbin/init.d/skeleton and as its name implies, is a "skeleton" init script. In its default
state, this is a completely runnable init script. At the same time it is completely useless as there is no
daemon behind it. Instead, you simply uncomment the lines you need, change the name of the
daemon or service and you are ready to run.
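To give you an idea of the structure, below is a stripped-down sketch of such a script for a hypothetical daemon called mydaemon; the real skeleton file does quite a bit more (reporting status, evaluating configuration variables, and so on):

#!/bin/sh
# minimal init script sketch for a hypothetical daemon
case "$1" in
    start)
        echo "Starting mydaemon"
        /usr/sbin/mydaemon &
        ;;
    stop)
        echo "Stopping mydaemon"
        killall mydaemon
        ;;
    restart)
        $0 stop
        $0 start
        ;;
    *)
        echo "Usage: $0 {start|stop|restart}"
        exit 1
        ;;
esac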

LILO - The Linux Loader


LILO is the LInux LOader. It is basically a set of instructions to tell the system how to boot.
These instructions include what operating system to boot and from what partition, as well as a
number of different options. If LILO is installed in your master boot record, it can be used to boot
basically any operating system that you can install on that hardware. For example, on my machine, I
have had LILO boot various Windows versions (including NT), SCO Unix, Sun Solaris, and, of
course, Linux. Actually, most of the work was done by the boot loader of the respective operating
system, but LILO was used to start the boot process.
In this section we are going to talk about some of the basics of LILO from a user's perspective; the
details of configuring it are covered in the section on installing your Linux kernel.
When your system reaches the point where LILO is executed, you are usually given the prompt:

LILO:
By simply pressing the enter key, LILO will execute its default instructions, which usually means
loading and executing a Linux kernel. However, starting Linux is not a requirement. In the before
time, when I typically worked on Windows, rather than Linux, I had LILO as my boot loader, but it
booted Windows 95 by default. Either I pressed the enter key at the LILO prompt or simply waited
until LILO had reached the configured timeout (which I configured to be 10 seconds).
In order to boot different operating systems, you need to be able to tell LILO what to boot. This is
done by simply inputting the appropriate text at the LILO prompt. This text is configured in the
LILO configuration file (/etc/lilo.conf). A problem arises three months later when you have
forgotten what text you used. Fortunately, to get a list of available options, all you need to do is
press the TAB key, which will display the different texts. For example, I had three, which were
labeled "win95", "linux" and "linuxgood". The win95 was the default (before I knew better), linux
started my current kernel and linuxgood was a kernel that I had compiled with just the basic options
that I knew worked and I used it as a failsafe should something go wrong when installing a new kernel.
Interestingly enough, SuSE added their own LILO entry in the meantime, which they simply called
"failsafe" with the same purpose as my entry.
In addition to accepting the tag, or label, for a specific entry, you can pass configuration options to
LILO directly at the prompt. One thing I commonly pass is the location of the root filesystem. I
used to have a couple of different Linux distributions on my system, particularly when a new
version came out.I would install the new version on a different partition to make sure things worked
correctly. I could then boot from either the new or old kernel and select which root filesystem I
wanted. This might be done like this:
linux root=/dev/hda6
Here /dev/hda6 is the partition where my root filesystem is. Note that LILO does not do anything
with these options, instead they are passed to the kernel. LILO is not very smart, but knows enough
to pass anything given it at the LILO prompt to the kernel. You can also pass options to tell the
kernel that the root filesystem should be mounted read-only (root=/dev/hda6 ro) or read-write (root=/dev/hda6 rw).
Another useful option is the keyword "single". This tells the kernel to boot into "single-user mode",
which is also referred to as "maintenance mode". As the names imply, only a single user can log into
the system and it is used to perform system maintenance.
If it runs into problems while booting, LILO will provide you with information about what went wrong
(albeit not as obviously as you might hope). When loading, the letters of "LILO" are not printed all
at once, but rather as each phase of the boot process is reached. Therefore, you can figure out where
the boot process stopped by how much of the word LILO is displayed. The following table shows
you the various stages and what possible problems could be.

Characters Description
(none) LILO has not yet started. Either it was not installed or the partition is not active.
L errorcode The first stage boot loader has been loaded and started, but the second stage boot loader cannot be loaded. The error code typically indicates a media problem, such as a hard disk error or incorrect hard disk geometry.
LI The second stage boot loader was loaded, but could not be executed. Either a geometry mismatch has occurred or /boot/boot.b was moved without running the map installer.
LIL The second stage boot loader starts, but cannot load the descriptor table from the map file. Typically caused by a media failure or a geometry mismatch.
LIL? The second stage boot loader was loaded at an incorrect address. Typically caused by a geometry mismatch or by moving /boot/boot.b without running the map installer.
LIL- The descriptor table is corrupt. Typically caused by either a geometry mismatch or by moving /boot/boot.b without running the map installer.
LILO Everything successfully loaded and executed.
Table - LILO boot stages and possible problems.

Stopping the System


For those of you who hadn't noticed, Linux isn't like DOS or Windows. Despite the superficial
similarity at the command prompt and similarities in the GUI, they have little in common. One very
important difference is the way you stop the system.
In DOS or Windows 95/98/ME, you are completely omnipotent. You know everything that's going
on. You have complete control over everything. If you decide that you've had enough and flip the
power switch, you are the only one who will be affected. However, with dozens of people working
on a Linux system and dozens more using its resources, simply turning off the machine is not
something you want to do. Besides the fact that you will annoy quite a few people, it can cause
damage to your system, depending on exactly what was happening when you killed the power.
On a multi-user system like Linux, many different things are going on. You may not see any disk
activity, but the system may still have things in its buffers which are waiting for the chance to be written to
the hard disk. If you turn off the power before this data is written, what is on the hard disk may be
inconsistent.
Normally, pressing Ctrl-Alt-Del will reboot your system. You can prevent this by creating the file
/etc/shutdown.allow, which contains a list (one entry per line) of users. If this file exists, the system
will first check whether one of the users listed in shutdown.allow is logged in on the system
console. If none are, you see the message: shutdown: no authorized users logged in.

To make sure that things are stopped safely, you need to shut down your system "properly." What is
considered proper can be a couple of things, depending on the circumstances. Linux provides
several tools to stop the system and allows you to decide what is proper for your particular
circumstance. Flipping the power switch is not shutting down properly.
Note that the key combination Ctrl-Alt-Del is just a convention. There is nothing magic about that
key combination, other than people are used to it from DOS/Windows. By default, the combination
Ctrl-Alt-Del is assigned to the special keymap "Boot". This is typically defined by default in the file
/usr/src/linux/drivers/char/defkeymap.map, which is the keyboard mapping the kernel uses when it
boots. However, you can use the loadkeys program to change this if you need to.
If necessary, you could define that the combination Ctrl-Alt-Del is not assigned to anything,
therefore it would not shutdown your system. However, should the system get stuck in a state that
you cannot correct, shutting it down with Ctrl-Alt-Del is often the only safe alternative (as
compared with simply flipping the power switch.)
When you press the "boot" key combination, the init program is sent the signal SIGINT. What init
does will depend on how the /etc/inittab is configured. In the section on run levels, we talked about
the various actions in /etc/inittab that tell init what to do when the key combination Ctrl-Alt-Del is
pressed (one being ctrlaltdel). On my system it is defined as "/sbin/shutdown -r -t 4 now", which
says to run the shutdown command immediately (now) and reboot (-r), waiting four seconds
between the time the warning message is sent and the shutdown procedure is started (-t 4).
The first two tools to stop your system are actually two links in /sbin: halt and reboot, which link to
the same file. If either of these is called and the system is not in run-level 0 or 6, then shutdown
(also in /sbin) is called instead.
Running shutdown is really the safest way of bringing your system down, although you could get
away with running init 0. This would bring the system down, but would not give the users any
warning. Shutdown can be configured to give the users enough time to stop what they are working
on and save all of their data.
Using the shutdown command, you have the ability not only to warn your users that the system is
going down but also to give them the chance to finish up what they were doing. For example, if you
were going to halt the system in 30 minutes to do maintenance, the command might look like this:
shutdown -h +30 "System going down for maintenance. Back up after lunch."
This message will appear on everyone's screen immediately, then at increasing intervals, until the
system finally goes down.
If you have rebuilt your kernel or made other changes that require you to reboot your system, you
can use shutdown as well, by using the -r option.
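For example, to reboot five minutes from now after warning your users, you might run something like:

shutdown -r +5 "System rebooting to load the new kernel. Please save your work and log off."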
Option Description
-d Do not write to the /var/log/wtmp.
-f Force a reboot, i.e. do not call shutdown.
-i Shutdown the network interfaces before halting or rebooting.
-n Do not sync (write data to disk) before halting or rebooting.
-p Power off after shutdown.
-w Do not actually stop the system, just write to /var/log/wtmp.
Table - Options to halt and reboot.

Option Description
-c Cancel a shutdown that is in progress.
-f Don't run fsck when the system reboots (i.e. a "fast" reboot).
-F Force fsck on reboot.
-h Halt the system when the shutdown is completed.
-k Send a warning message, but do not actually shutdown the system.
-n Shutdown without calling init. DEPRECATED.
-r Reboot the system after shutdown.
-t seconds Seconds to wait before starting the shutdown.
-z Shutdown using "software suspend".
Table - Options to shutdown.

User Accounts
Users gain access to the system only after the system administrator has created user accounts for
them. These accounts are more than just a user name and password; they also define the
environment the user works under, including the level of access he or she has.
Users are added to Linux systems in one of two ways. You could create the necessary entries in the
appropriate file, create the directories, and copy the start-up files manually. Or, you could use the
adduser command, which does that for you.
Adding a user to a Linux system is often referred to as "creating a user" or "creating a user
account". The terms "user" and "user account" are often interchanged in different contexts. For the
most part, the term "user" is used for the person actually working on the system and "user account"
is used to refer to the files and programs that create the user's environment when he or she logs in.
However, these two phrases can be interchanged and people will know what you are referring to.
When an account is created, a shell is assigned along with the default configuration files that go
with that shell. Users are also assigned a home directory, which is their default directory when they
login, usually in the form /home/<username>. Note that the parent directory of users' home directories
may be different on some systems.
When user accounts are created, each user is assigned a User Name (login name or logname), which
is associated with a User ID (UID). Each is assigned to at least one group, with one group
designated as their login group. Each group has an associated Group ID (GID). The UID is a
number used to identify the user. The GID is a number used to identify the login group of that user.
Both are used to keep track of that user and determine what files he or she can access.
In general, programs and commands that interact with us humans report information about the user
by logname or group name. However, most identification from the operating system's point of view
is done through the UID and GID. The UID is associated with the user's logname. The GID is
associated with the user's login group. In general, the group a user is a part of is only used for
determining access to files.
User accounts are defined in /etc/passwd and groups are defined in /etc/group. If you look on your
system, you will see that everyone can read both of these files. Years ago, my first reaction was that
this was a security problem, but when I was told what this was all about, I realized that this was
necessary. I was also concerned that the password was accessible, even in encrypted format. Because
I know what my password is, I can compare my password to the encrypted version and figure out
the encryption mechanism, right? Nope! It's not that easy.
At the beginning of each encrypted password is a seed. Using this seed, the system creates the
encrypted version. When you login, the system takes the seed from the encrypted password and
encrypts the password that you input. If this matches the encrypted password, you are allowed in.
Nowhere on the system is the unencrypted password stored, nor do any of the utilities or
commands generate it.
Next, let's talk about the need to be able to access this information. Remember that the operating
system knows only about numbers. When we talked about operating system basics, I mentioned that
the information about the owner and group of a file was stored as a number in the inode. However,
when you do a long listing of a file (ls -l), you don't see the number, but rather, a name. For
example, if we do a long listing of /bin/mkdir, we get:
-rwxr-xr-x 1 root root 7593 Feb 25 1996 /bin/mkdir

The entries are:


permissions links owner group size date filename
Here we see that the owner and group of the file is root. Because the owner and group are stored as
numerical values in the inode table, the system must be translating this information before it
displays it on the screen. Where does it get the translation? From the /etc/passwd and /etc/group
files. You can see what the "untranslated" values are by entering
ls -ln /bin/mkdir
which gives us: -rwxr-xr-x 1 0 0 7593 Feb 25 1996 /bin/mkdir

If we look in /etc/passwd, we see that the 0 is the UID for root, and if we look in /etc/group, we see
that 0 is also the GID for the group root, which are the numbers we got above. If the /etc/passwd
and /etc/group files were not readable by everyone, then no translation could be made like this
without some major changes to most of the system commands and utilities.
On a number of occasions, I have talked to customers who claimed to have experienced corruption
when transferring files from one system to another. Sometimes it's with cpio, sometimes it's tar. In
every case, files have arrived on the destination machine and have had either "incorrect" owners or
groups and sometimes both. Sometimes, the "corruption" is so bad that there are no names for the
owner and group, just numbers.
Numbers, you say? Isn't that how the system stores the owner and group information for the files?
Exactly. What does it use to make the translation from these numbers to the names that we normally
see? As I mentioned, it uses /etc/passwd and /etc/group. When you transfer files from one system to
another, the only owner information that is transferred are the numbers. When the file arrives on the
destination machine, weird things can happen. Let's look at an example.
At work, my user name was jimmo and I had UID 12709. All my files were stored with 12709 in
the owner field of the inode. Let's say that I create a user on my machine at home, also named
jimmo. Because there are far fewer users on my system at home than at work, jimmo ended up with
UID 500. When I transferred files from work to home, the owner of all "my" files was 12709. That
is, where there normally is a name when I do a long listing, there was the number 12709, not jimmo.
The reason for this is that the owner of the file is stored as a number in the inode. When I copied the
files from my system at work, certain information from the inode was copied along with the file,
including the owner. Not the user's name, but the numerical value in the inode. When the files were
listed on the new system, there was no user with UID 12709, and therefore no translation could be
made from the number to the name. The only thing that could be done was to display the number.
This makes sense because what if there were no user jimmo on the other system? What value should
be displayed in this field? At least this way there is some value and you have a small clue as to what
is going on.
To keep things straight, I had to do one of two things. Either I create a shell script that changed the
owner on all my files when I transferred them or I figure out some way to give jimmo UID 12709
on my system at home. So I decided to give jimmo UID 12709.
Here, too, there are two ways I can go about it. I could create 12208 users on my system so the
12709th would be jimmo. (Why 12208? By default, the system starts with a UID 500 for normal
users.) This bothered me though, because I would have to remove the user jimmo with UID 500
then create it again. I felt that this would be a waste of time.
The other alternative was to change the system files. Now, there is nothing that Linux provides that
would do that. I could change many aspects of the user jimmo; however, the UID was not one of
them. After careful consideration, I realized that there was a tool that Linux provided to make the
changes: vi. Because this information is kept in simple text files, you can use a text editor to change
them. After reading the remainder of this chapter, you should have the necessary information to
make the change yourself.
One thing I would like to point out is that vi is not actually the tool you should use. Although you
could use it, something could happen while you are editing the file and your password file could get
trashed. Linux provides you with a tool (that's actually available on many systems) specifically
designed to edit the password file: vipw (for "vi password").
What vipw does is create a copy of the password file, which is what you actually edit. When you are
finished editing, vipw replaces the /etc/passwd with that copy. Should the system go down while
you are editing the file, the potential for problems is minimized. Note that despite its name, the
editor that is called is defined by your EDITOR environment variable.
On many systems, the adduser program is used to add users (what else?). Note that when you create
a user, the new user is assigned a value for the UID, usually one number higher than the previously
assigned UID. Because adduser is a shell script, you can change the algorithm used, if you really
want to.
When the first customer called with the same situation, I could immediately tell him why it was
happening, how to correct it, and assure him that it worked.
You can also change a user's group if you want. Remember, however, that all this does is change the
GID for that user in /etc/passwd. Nothing else! Therefore, all files that were created before you
make the change will still have the old group.
You can change your UID while you are working by using the su command. What does su stand
for? Well, that's a good question. I have seen several different translations in books and from people
on the Internet. I say that it means "switch UID, " as that's what it does. However, other possibilities
include "switch user" and "super-user." This command sets your UID to a new one. The syntax is
su <user_name>

where <user_name> is the logname of the user whose UID you want to use. After running the
command, you have a UID of that user. The shortcoming with this is that all that is changed is the
UID and GID; you still have the environment of the original user. If you want the system to
"pretend" as though you had actually logged in, include a dash (-). The command would then be
su - <user_name>

What is actually happening is that you are running a new shell as that user. (Check the ps output to
see that this is a new process.) Therefore, to switch back, you don't need to use su again, but just
exit that shell.
We need to remember that a shell is the primary means by which users gain access to the system.
Once they do gain access, their ability to move around the system (in terms of reading files or
executing programs) depends on two things: permissions and privileges.
In general, there is no need to switch groups. A user can be listed in more than one group in
/etc/group and the system will grant access to files and directories accordingly.
Permissions are something that most people are familiar with if they have ever worked on a Linux
(or similar) system before. Based on what has been granted, different users have different access to
files, programs, and directories. You can find out what permissions a particular file has by doing a
long listing of it. The permissions are represented by the first 10 characters on the line. This is
something that we covered in a fair bit of detail in the section on shell basics, so there is no need to
repeat it here.
Removing users is fairly straightforward. Unfortunately, I haven't found a utility that will remove
them as simply as you can create them. Therefore, you will need to do it manually. The simplest
way is to use vipw to remove the user's entry from /etc/passwd and to remove its home directory and
mailbox.
However, this is not necessarily the best approach. I have worked in companies where once a user
was created, it was never removed. This provides a certain level of accountability.
Remember that the owner is simply a number in the inode table. Converting this number to a name
is done through the entry in /etc/passwd. If that entry is gone, there can be no conversion. If a new
user were to get the UID of an old, removed user, it may suddenly have access to a file that it
shouldn't (i.e., a file owned by the old user that it now owns).
Even if no new users get that UID, what do you do if you find an "unowned" file on your system,
that is, one with just a number as the owner and without an associated entry in /etc/passwd? What you
do is up to your company, but I think it is safer to "retire" that user.
You could remove its home directory and mailbox. However, change its password to something like
NOLOGIN. This password is shorter than an encrypted password, so it is impossible that any input
password will encrypt to this. Then change its login shell to something like /bin/true. This closes
one more door. By making it /bin/true, no error message will be generated to give a potential hacker
a clue that there is "something" about this account. Alternatively, you could replace the login shell
with a message to say that the account has been disabled and the owner should report to have it re-
activated. This helps to dissuade would-be hackers.
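To illustrate, a "retired" entry in /etc/passwd might end up looking something like this (the UID, GID and comment field are just examples; if your system uses shadow passwords, as discussed next, the password field to change is in /etc/shadow instead):

jimmo:NOLOGIN:12709:100:account retired:/home/jimmo:/bin/true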
Another useful tool for thwarting hackers is password shadowing. With this, the encrypted password
is not kept in /etc/passwd, but rather /etc/shadow. This is useful when someone decides to steal your
password file. Why is this a problem? I will get into details about it later, but lets say now that the
password file could be used to crack passwords and gain access to the system.
Because you must have the /etc/passwd file world-readable to make translations from UID to user
name, you cannot protect it simply by changing the permissions. However, the /etc/shadow
file, where the real password is stored, is not readable by regular users and therefore is less of
a security risk. (I say "less" because if an intruder gets in as root, all bets are off).

Logging in
Users gain access to the system through "accounts." This is the first level of security. Although it is
possible to configure applications that start directly on specific terminals, almost everyone has
logged into a Linux system at least once. More than likely, if you are one of those people who
never login, you never see a shell prompt and are probably not reading this book.
Most Linux systems have a standard login. The figure below shows what the login process looks
like. You see the name of the system, followed by a brief message (the contents of /etc/issue) and
the login prompt, which usually consists of the system name and the word login. The /etc/issue file is a
plain text file, so you can edit it as you please. Because it is read dynamically, the changes will appear
the next time someone tries to log in. After the contents of /etc/issue, you see the login prompt, such as
jmohr!login:
When you login, you are first asked your user name and your password. Having been identified and
your password verified, you are allowed access to the system. This often means that the system
starts a shell for you. However, many programs can be used in place of a shell.
One entry in the password file is your home directory, the directory that you have as your current
directory when you log in. This is also the place to which the shell returns you if you enter cd with
no arguments.
After determining your login shell and placing you in your home directory, the system will set up
some systemwide defaults. If you have a Bourne or Bourne Again shell, these are done through
the /etc/profile file. If bash is your login shell, the system runs through the commands stored in
.bash_profile (or .profile) in your home directory and then the .bashrc file, provided they exist. If you
have sh, then there is no equivalent for the .bashrc file. If you have a Z-shell, the system defaults are established
in the /etc/zprofile file. The system then executes the commands in the .zshrc and .zlogin files in
your home directory, provided they exist. See the appropriate man-page and the section on shell
basics for more details.
During the login process, you are shown several pieces of information about the local system.
Before the login prompt, you usually see the contents of the /etc/issue file, as I mentioned earlier.
After your login is successful, you will normally see a message about the last login and the message
of the day. The message of the day is the contents of the file /etc/motd.
In some cases, all of this information is bothersome. For example, many businesses have either
menus that their users log into or applications that start from their users' .profile or .login. In some
cases, the information is of little value.
In some cases, even knowing that this is a UNIX system could be a problem. There are many
hackers in the world who would just love the chance to try to crack your security. By not even
telling them what kind of system you have, you reduce the temptation. At
least, that's one more piece of information that they need to figure out. Therefore, we need a way to
disable these messages.
The two obvious ways are by using /etc/issue and /etc/motd. By default, both of these files contain
information about your system. By either changing the contents or removing the files altogether,
you can eliminate that source of information.
Another way is the login: prompt itself. Again, by default, this prompt contains the name of your
system. This may not concern most system administrators, however, in cases where security is an
issue, you might want to disable it. The prompt comes from the /etc/gettydefs file. The gettydefs file
contains information the getty program uses when it starts the login program on a terminal. The
more common lines in the gettydefs file contain an entry that looks like this:
@S login:
Take a look at the
login:
prompt and you will see that it also contains the literal string login: immediately following the name
of the system. The name of the system comes from @S. By changing either of the parts (or both),
you can change the appearance of your login prompt, even removing the name of the system, if you
want.
The getty(1m) man-page contains a list of the different information that you can include with the
login: prompt. If you are providing PPP services, I recommend that you do not include anything in
your login prompt that changes, such as the date/time or the port name. That would make creating chat
scripts difficult, as the users trying to login will not know what to expect.
At this point, we are left with the last login messages. Unfortunately, these are not contained in files
that are as easily removed as /etc/motd and /etc/issue. However, by creating the file
.hushlogin in your home directory, you can remove them. It has no contents; rather, the existence of
this file is the key. You can create it simply by changing to a user's home directory (yours, if you are
that user) and running : touch .hushlogin

Often administrators want to keep users' knowledge of the system as limited as possible. This is
particularly important for systems with a high level of security in which users start applications and
never see the shell prompt. One give-away to what kind of system you are on is the following line
when you login: Last login: ...

System administrators often call support asking for a way to turn this feature off. Fortunately, there is a way: this, too, is disabled by creating the .hushlogin file. You can simplify things by having this file created automatically every time a new user is created. This is done by simply adding a .hushlogin file to the /etc/skel directory. As with every other file in this directory, it will be copied to the user's home directory whenever a new user is created.
One thing to consider before you turn this feature off is that seeing when the last login was done
may indicate a security problem. If you see that the last login was done at a time when you were not
there, it may indicate that someone is trying to break into your account.
You can see who is currently logged in by running either the who or w command. The information these commands display is kept in the file utmp in your system log directory (/usr/adm, /var/log, etc.). Once the system reboots, this information is gone.
You can also see the history of recent logins by using the last command. This information is kept in wtmp in the system log directory and is preserved between reboots. Depending on how active your system gets, this file can grow quite large; I have seen it grow to more than a megabyte. Therefore, it might not be a bad idea to truncate this file at regular intervals. (Note that some Linux distributions do this automatically.)
One way to limit security risks is to keep the root account from logging in from somewhere other
than the system console. This is done by setting the appropriate terminals in /etc/securetty. If root
tries to log into a terminal that is not listed here, it will be denied access. It is a good idea to list only
terminals that are on the system console (tty1, tty2, etc.).
If you really need root access, you can use telnet from a regular account and then su to root. This
then provides a record of who used su.
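As an example, an /etc/securetty that restricts root logins to the first few virtual consoles might contain nothing more than one terminal name per line, something like this:

tty1
tty2
tty3
tty4

Any terminal not listed here (a serial line, a pseudo-terminal from a network login, and so on) will refuse a direct root login.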

Terminals
Unless your Linux machine is an Internet server or gateway machine, there will probably be users on it. Users need to access the system somehow. They might access it across a network using a remote terminal program like telnet or rlogin, or access its file systems using NFS. Also, as users typically do on Windows, they might log in directly to the system. With Linux, this is (probably) done from a terminal, and the system must be told how to behave with the specific terminal that you are using.
Increasingly, people are using graphical user interfaces (GUIs) to do much of their work. Nevertheless, with many distributions a lot of the work is still done using the command line, which means they need a terminal, whether or not it is displayed within a graphical window.
In live environments that use Linux (such as where I work), you do not have access to a graphical interface on all systems (for security reasons, among other things). Therefore, the only way to remotely administer the system is through telnet, which typically requires a terminal window. In cases like this, it is common to move from one operating system type to another (Linux to Sun, or vice versa). Therefore, knowledge of terminal settings and capabilities is often very useful.
When we talk about terminals, we are not just talking about the old fashioned CRT that is hooked
up to your computer through a serial port. Instead, we are talking about any command-line (or shell)
interface to the system. This includes serial terminals, telnet connections and even the command-
line window that you can start from your GUI.

Terminal Capabilities
If you are interacting with the system solely through command line input, you have few occasions
to encounter the terminal capabilities. As the name implies, terminal capabilities determine what the
terminal is capable of. For example, can the terminal move the cursor to a specific spot on the
screen?
The terminal capabilities are defined by one of two databases. Older applications generally use
termcap, while newer ones use terminfo. For the specifics on each, please see the appropriate man-
page. Here I am going to talk about the concept of terminal capabilities and what it means to you as
a user.
Within each of these databases is a mapping of the character or character sequence the terminal expects for certain behavior. For example, on some terminals, pressing the backspace key sends a Ctrl-? character. On others, Ctrl-H is sent. When your TERM environment variable is set correctly for your terminal, pressing the backspace key sends a signal to the system which, in turn, tells the application that the backspace characteristic was called. The application is told not just that you pressed the key with the left arrow (←) on it. Instead, the application is told that that key was the backspace. It is then up to the application to determine what is to be done.
The key benefit of a system like this is that you do not have to recompile or rewrite your application to work on different terminals. Instead, you link in the appropriate library to access either termcap or terminfo and wait for the capability that the OS sends you. When the application receives that capability (not the key), it reacts accordingly.
There are three types of capabilities. The first type is Boolean capabilities, which determine whether the terminal has a particular feature. For example, does the terminal have an extra "status" line? The next type is numeric values. Examples of this capability are the number of columns and lines the terminal can display. In some cases, this may not remain constant, as terminals such as the Wyse 60 can change between 80- and 132-column mode. Last are the string capabilities, which provide a character sequence to be used to perform a particular operation. Examples of this would be clearing the line from the current cursor position to the end of the line, or deleting the contents of an entire line (with or without removing the line completely).
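If the ncurses utilities are installed, the tput command gives you a quick, hedged way to poke at all three kinds of capabilities for the current TERM without writing any code (the capability names here are standard terminfo names, but not every terminal will have all of them):

tput cols                                   # numeric capability: number of columns
tput clear                                  # string capability: the clear-screen sequence
tput hs && echo "terminal has a status line"   # Boolean capability: exit status says yes/no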
Although there are hundreds of possible capabilities, any given terminal will have only a small subset of them. In addition, many of the capabilities do not apply to terminals at all, but rather to printers.
Both the termcap and terminfo databases have their own advantages and disadvantages. The
termcap database is defined by the file /etc/termcap, an ASCII file that is easily modified. In
contrast to this is the terminfo database, which starts out as an ASCII file but must be compiled
before it can be used.
The termcap entries can be converted to terminfo with the captoinfo command and then compiled
using tic, the terminfo compiler. The tic utility will usually place the compiled version in a directory
under /usr/lib/terminfo based on the name of the entry. For example, the ANSI terminal ends up
in /usr/lib/terminfo/a and Wyse terminals end up in /usr/lib/terminfo/w.
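Putting those two steps together, converting and compiling an entry might look something like this (the file name entries.ti is just an arbitrary working name, and tic will place the compiled result under the terminfo directory tree on its own):

captoinfo /etc/termcap > entries.ti     # translate termcap entries into terminfo source
tic entries.ti                          # compile them into the terminfo database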

Terminal Settings
Whenever you work with an application, what you see is governed by a couple of mechanisms. If
you have a serial terminal, the flow of data is controlled by the serial line characteristics, including
the baud rate, the number of data bits, parity, and so on. One aspect that is often forgotten or even
unknown to many users is the terminal characteristics, which are used to control the physical
appearance on the screen. However, most of the characteristics still apply, even if you are not
connected through a serial terminal.
The reason is that these conventions date back to the time of tele-typewriters. You had a keyboard
on one end of the connection connected to a printer that printed out every single character you
typed. At that time, it was essential that both ends knew what characteristics the connection had.
Even as technology advanced there was still a need to ensure both sides communicated in the exact
same way. Since you could not guarantee that the default settings were the same on both ends, you
needed a way to change the characteristics so that both ends matched.
As I mentioned elsewhere, the serial line characteristics are initially determined by the gettydefs file. The characteristics are often changed within the users' startup scripts (.profile, .login, etc.). In addition, you can change them yourself by using the stty command. Rather than jumping straight to changing them, let's take a look at what our current settings are, which we also do with the stty command. With no arguments, stty might give us something like this:
speed 38400 baud; line = 0; -brkint ixoff -imaxbel -iexten -echoctl
This is pretty straightforward. Settings that are Boolean values (on or off) are listed by themselves if
they are on (ixoff) or have a minus sign in front if they are turned off (-brkint). Settings that can
take on different values (like the baud rate) appear in two formats: one in which the value simply
follows the setting name (speed 38400 baud) and one in which an equal sign is between them
(line=0).
In general, if a setting has discrete values, like the baud rate, there is no equal sign. There is only a
discrete number of baud rates you could have (i.e., there is no 2678 baud). If the stty setting is for
something that could take on "any" value (like the interrupt key), then there is an equal sign.
Normally, the interrupt key is something like Ctrl-C or the Delete key. However, it could be the f
key or the Down-Arrow or whatever.
This example shows the more "significant" terminal (stty) settings. The top line shows the input and
output speed of this terminal, which is 38400. On the second line, we see that sending a break sends
an interrupt signal (-brkint).
Setting these values is very straightforward. For Boolean settings (on or off), the syntax is simply
stty <setting>

to turn it on or
stty -<setting>

(note the minus sign in front)


to turn it off.
For example, if I wished to turn on input stripping (in which the character is stripped to 7 bits), the
command would look like this:
stty istrip

Settings that require a value have the following syntax:


stty <setting> <value>

So, to set the speed (baud rate) to 19200, the syntax would look like this:
stty speed 19200

To set the interrupt character to Ctrl-C, we would enter


stty intr ^C

Note that ^C is not two separate characters. Instead, when you type it, hold down the Ctrl key and press "c." In most documentation the letter appears as a capital, although you actually press the lowercase letter. Sometimes you want to assign a particular characteristic to just a single key. For example, it is often the case that you want the backspace key to send an "erase" character. The erase character tells the system to erase the last character, which is exactly what the backspace is supposed to do. Settings for single keys are made in the same way as the control-key combinations. For example, you would type "stty erase " and then press the backspace key (followed by the Enter key, of course). What you would see might look like this:
stty erase ^?

The ^? is typically what the backspace key sends (at least, that is the visual representation of what the backspace sends). You can get the same result by pressing Ctrl-?.
If the default output does not show the particular characteristic you are looking for, you can use the
-a option to show all the characteristics. You might end up with output like this:
speed 38400 baud; rows 25; columns 80; line = 0; intr = ^C; quit = ^\; erase =
^?; kill = ^U; eof = ^D; eol = <undef>; eol2 = <undef>; start = ^Q; stop = ^S;
susp = ^Z; rprnt = ^R; werase = ^W; lnext = ^V; flush = ^O; min = 1; time = 0;
-parenb -parodd cs8 hupcl -cstopb cread -clocal -crtscts -ignbrk -brkint -ignpar
-parmrk -inpck -istrip -inlcr -igncr icrnl ixon ixoff -iuclc -ixany -imaxbel
opost -olcuc -ocrnl onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0 vt0 ff0
isig icanon -iexten echo echoe echok -echonl -noflsh -xcase -tostop -echoprt
-echoctl echoke

Here we see a number of well-known characteristics, such as the baud rate, the number of rows and columns, the interrupt character, the end-of-file character, and so on, some of which we talked about in the section on working with the system. For details on what the rest of these mean, please see the stty(1L) man-page.
In principle, you can set any key to any one of the terminal characteristics. For example, I could set the interrupt key to be the letter g: stty intr g

Although this does not make too much sense, it is possible. What does make more sense is to set the characteristic to something fitting for your keyboard. For example, you might be using telnet to move between systems. The key sequence that your backspace sends may not be ^? (often it is ^H) and you want to set it accordingly (or the case is reversed, as we discussed above).
To save, change, and then restore the original values of your stty settings, use the -g option. This option outputs the stty settings as a string of hexadecimal values. For example, I might get something like this:
stty -g
500:5:d050d:3b:7f:1c:8:15:4:0:0:0:0:0:1a:11:13:0:0:0:0:0:0:0:0:0
We can run the stty command to get these values and make the changes, then run stty again and use these values as the argument. We don't have to type in everything manually; we simply take advantage of the fact that variables are expanded by the shell before being passed to the command. You could use this to add an additional password to your system:

echo "Enter your password: \c"
oldstty=`stty -g`
stty -echo intr ^-
read password
stty $oldstty

Assign the output of the stty command to the variable oldstty, then change the stty settings so that the characters you input are not echoed to the screen and the interrupt key is disabled (this is done with stty -echo intr ^-). Then read a line from the keyboard and reset the stty settings to their old value.

Printers and Interfaces


Under Linux, printing is managed and administered by several commands and files located in
various parts of the system. The primary administrative directory is /usr/spool/. Each printer that
you have configured has its own subdirectory, /usr/spool/lpd/<name>, where <name> is the name of
the printer. In this subdirectory, you will find status information about the printer, as well as
information about the jobs currently being printed.
The actual printing is done by the lpd daemon. On system start-up, lpd is started through one of the
rc scripts (normally somewhere under /etc/rc.d). As it starts, lpd looks through the printer
configuration file, /etc/printcap, and prints any files still queued (normally after a system crash).
In each spool directory is a lock file that contains the process id (PID) of the lpd process. The PID
helps keep multiple printer daemons from running and potentially sending multiple jobs to the same
printer at the same time. The second line in the lock file contains the control file for the current print
job.
Management of the print system, or print spool, is accomplished through the lpc utility. This is
much more than a "command" because it performs a wide range of functions. One function is
enabling printing on a printer. By default, there is probably one printer defined on your system
(often lp). The entry is a very simple print definition that basically sends all the characters in the file
to the predefined port. (For the default printer on a parallel port, this is probably /dev/lp1.)
When a job is submitted to a local printer, two files are created in the appropriate directory in
/usr/spool. (For the default printer, this would be /usr/spool/lp1). The first file, starting with cf, is the
control file for this print job. Paired with the cf file is the data file, which starts with df and is the
data to be printed. If you are printing a pre-existing file, the df file will be a copy of that file. If you
pipe a command to the lpr command, the df file will contain the output of the command. Using the
-s option, you can force the system to create a symbolic link to the file to be printed.
The cf file contains one piece of information on each of several lines. The first character on each
line is an abbreviation that indicates the information contained. The information contained within
the cf file includes the name of the host from which the print job was submitted (H), the user/person
who submitted the job (P), the job name (J), the classification of the print job (C), the literal string
used on the banner page to identify the user (L), the file containing the data (this is the df file) (f),
which file to remove or "unlink" when the job is completed (U), and the name of the file to include
on the banner page (N). If you check the lpd man-page, you will find about a dozen more pieces of
information that you could include in the cf file. However, this list represents the most common
ones.
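As an illustration only, a control file for a hypothetical job submitted by the user jimmo on the host jupiter might contain lines like the following (the job number 123 and file name report.txt are invented, and the letters are the ones just listed):

Hjupiter
Pjimmo
Jreport.txt
Ljimmo
fdfA123jupiter
UdfA123jupiter
Nreport.txt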
In the same directory, you will find a status file for that printer. This file is called simply "status"
and normally contains a single line such as
printing disabled
If you were to re-enable the printer, the line would then change to
lp is ready and printing
Looking at this line, you might have noticed something that might seem a little confusing. (Well, at
least it confused me the first time). That is, we've been talking about the directory lp1 all along, but
this says the printer is lp. Does this mean that we are talking about two separate printers? No, it
doesn't. The convention is to give the directory the same name as the printer, but there is no rule that
says you have to. You can define both the printer name and the directory any way you want.
This is probably a good time to talk about the printer configuration file, /etc/printcap. This file
contains not only the printer definitions but the printer "capabilities" as well. In general, you can say
the printcap file is a shortened version of the termcap file (/etc/termcap), which defines the
capabilities of terminals.
In the printcap file, you can define a wide range of capabilities or characteristics, such as the length
and width of each line, the remote machine name (if you are remote printing), and, as we discussed,
the name of the spool directory. I will get into shortly what each of the entries means.
As mentioned previously, the lpc command is used to manage the print spooler. Not only can you use it to start and stop printing, but you can use it to check the status of all the printer queues and even change the order in which jobs are printed.
There are two ways to get this information and to manage the printer queues. The first is to call lpc by itself. You are then given the lpc> prompt, where you can type in the command you want, such as start, disable, or any other administrative command. Following the command name, you must either enter "all," so the command applies to all printers, or the name of a specific printer.
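A short interactive session might look something like this (lp is just the default printer name used in this chapter; substitute your own):

lpc
lpc> status all
lpc> disable lp
lpc> enable lp
lpc> quit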
The lpc program will also accept these same commands as arguments. For example, to disable our
printer, the command would be

lpc disable lp1

For a list of options, see the lpc man-page. A list of the print queue commands can be found in the table below.
One aspect of the Linux print system that might be new to you is that you enable or disable the printing functionality within the kernel. Even though printer functionality is configured, you may not be able to print if you have hardware conflicts. When you run 'make config' to configure the kernel, one of the options is to enable printer support.
Once you have added printer support to the kernel, the first thing you should do is test the connectivity by using the ls command and sending the output to the printer device. This will probably be /dev/lp0, /dev/lp1, or /dev/lp2, which correspond to the DOS devices LPT1, LPT2, and LPT3, respectively. For example, to test the first parallel port you could use
ls > /dev/lp0
What comes out of the printer might look something like this, with each name starting where the previous line left off:
INSTALL@
        dead.letter
                   linux@
                         lodlin15.txt
                                     ...
However, if you were to issue the command without the redirection, the output on your screen would probably look like this:
INSTALL@ dead.letter linux@ lodlin15.txt lodlin15.zip mbox sendmail.cf tests/
The reason for this is that the ls command puts a single new-line character at the end of each name when its output is not a terminal. Normally, the terminal driver sees that new-line character and adds a carriage return to it. The printer, however, has not been told to do this. When it sees a bare new line, it advances the paper but leaves the print head where it is, so printing continues from that position on the next line. This behavior is called "stair-stepping" because the output looks like stair steps. When a carriage return is added, the print head returns to the left margin as it advances to the new line.

Command   Function
lpc       Printer control program
lpd       Print spooler daemon
lpr       Print program
lpq       Print queue administration program
lprm      Remove jobs from print queue
pr        Convert text files for printing

Table - Print Queue Commands

Advanced Formatting
Being able to output to paper is an important issue for any business. Just having something on paper is not the whole issue, however. Compare a letter that you type on a typewriter to what you print with a word processor. With a word processor, you can get different sizes and types of fonts, and sometimes you can even create drawings directly in the word processor.
Many of you who have dealt with UNIX before might have the misconception that UNIX is only
capable of printing simple text files. Some of you might have seen UNIX systems with a word
processor that did fancy things with the output. Fortunately for us, these fancy tricks are not limited
to the word processing packages. Using vi and a couple of commonly available tools, you can
output in a wide range of styles.
Readily available from a number of sites, the TeX or LaTeX (pronounced Tech and Lahtech) text
formatting package can be used to create professional-looking output. Many academic and research
institutions running UNIX use (La)TeX as their primary text processing system. Not only is it free
but the source code is also available, allowing you to extend it to suit your needs. (In many cases,
the only way to get it onto your system is to get the source code and compile it.)
Like the *roff family, TeX is input directly by the writer. These source files are then run through a
processor that formats the output based on codes that were input. This process generates a device
independent file, usually with the extension .dvi. The .dvi files are analogous to .o files in C because
they need to be manipulated further to be useful. Unfortunately, this does not work for every kind of
printer.
Because printers do not understand the .dvi format directly, the dvips program (included on your system, provided you installed the TeX package) converts the .dvi files to PostScript. These PostScript files can be printed on any PostScript-compatible printer. If your printer doesn't support PostScript, you can use Ghostscript to convert the output to a format your printer can understand.
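As a sketch, the whole pipeline from TeX source to paper might look like this (report.tex is just a placeholder name for your own document):

latex report.tex              # produces report.dvi
dvips report.dvi -o report.ps # convert the device-independent file to PostScript
lpr report.ps                 # send it to the print spooler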
At first, this may sound a little confusing and annoying: you have to use so many tools just to get a simple printout. First, if all you really need is a simple printout, you probably won't need to go through all of these steps. Second, it demonstrates that no matter what standard you choose to use, there are Linux tools available to help you get your job done.
Many different programs are available to allow you to print, view, and manipulate PostScript files. Ghostscript is a program used to view PostScript files. These need not be files that you generated on your local machine, but any PostScript files you have. Ghostscript can also be used to print PostScript files on non-PostScript printers. Ghostscript supports the resolutions that most printers can handle. However, if you are printing to a dot-matrix printer, you need to be especially careful about getting the right resolution because it is not normally the standard 300 DPI.
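As a minimal sketch, printing a PostScript file on a non-PostScript printer with Ghostscript might look something like this; the device name and resolution depend entirely on your printer, so deskjet and 300 DPI are only examples, and /dev/lp1 is the parallel port used elsewhere in this chapter:

gs -q -dNOPAUSE -dBATCH -sDEVICE=deskjet -r300 -sOutputFile=/dev/lp1 file.ps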
I have to pause here to remind you about working with PostScript files and printers. Sometimes the
printer is PostScript-compatible, but you have to tell it to process the file as PostScript and not as
raw text. This applies to older models of certain laser jet printers. Once, I wanted to print out a 50-
page document and forgot to set the flag to say that it was a PostScript file. The result was that
instead of 50 pages, I ended up with more than 500 pages of PostScript source.
Under Linux, printers are not the only way you can get words on paper. As of this writing, there are
at least three packages with which you can fax documents from your Linux system. First, however,
you must have a fax modem with which you can connect.
Here I need to side-step for a minute. The older type of fax modem, Class 1, did not have much processing power built into the hardware. Instead, the software took over this job. This works fine on single-user systems like Windows, but under pre-emptive multitasking systems like Linux, you can run into timing problems. (Pre-emptive multitasking is where the operating system decides which process will run and therefore could pause the fax program at a crucial moment. More details can be found in the chapter on operating system concepts.)
In addition to classes, faxes fall into different groups. To work correctly, the fax software needs to convert the document you are sending into a Group III-compatible image. This can be done with Ghostscript.
The GNU netfax program accepts several different file formats (as of this writing, PostScript, dvi, and ASCII). Originally available from prep.ai.mit.edu, it is no longer supported by the GNU project. More extensive than netfax is HylaFAX (renamed from FlexFax to avoid trademark conflicts). This is available (as of this writing) via ftp from sgi.com under /sgi/fax/. With this package, not only can you send faxes, but you can configure it to receive them as well.
Man-pages are something that you may need to print. If you have files in ASCII format (the cat pages), this is not an issue. However, with pages that have *roff formatting, you have a couple of choices. The man program has the ability to process files with *roff formatting. By redirecting the output of man to a file (often piping it through col first), you can get clean ASCII text that you can then print.
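For example, to get a plain-text copy of the lpr man-page and print it, something like the following should work (col -b strips out the backspace overstriking that man uses for bold and underlined text):

man lpr | col -b > lpr.txt
lpr lpr.txt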

Printcap
As with the termcap file, each entry in the printcap file is separated by a colon. Boolean
characteristics, such as suppressing the header (sh), exist by themselves. Characteristics that can
take on a value, such as the name of the output device, are followed by an equal sign (=) and the
value (lp=/dev/lp1). For a complete list of characteristics, see the printcap man-page.
Each entry in the /etc/printcap file consists of a single logical line. There is one entry for each printer on your system. To make the entry easier to read, you can break each logical line into several physical lines. As an example, let's look at the entry for the default, generic printer:
lp:lp=/dev/lp1:sd=/usr/spool/lp1:sh
The first part of the line is the name of the printer, in this case, lp. Each field is separated from the others with a colon, so in this example, there are three fields (plus the printer name).
If we were to break this example into multiple physical lines, it might look like this:
lp:\
    :lp=/dev/lp1:\
    :sd=/usr/spool/lp1:\
    :sh
At the end of each physical line, there is a backslash to tell lpd that the logical line continues. You'll also see that each field now has a colon before it and after it.
Although it is not necessary, you may find a file minfree in each of the spool directories. This is a
simple text file that contains the number of disk blocks that should be left to keep the print spooler
from filling up the disk. As a safety mechanism on a system with a lot of print jobs, the spool
directory can be put on a separate file system. Should it fill up, the rest of the system won't suffer.
Often, data is sent directly to the printer devices, either because it is supposed to be raw ASCII text
or because the program that created the data did its own formatting. This is referred to as raw data
as the system doesn't do anything with it.
Sometimes the data is sent by the lpd daemon through another program that processes the data in preparation for sending it to the printer. Such programs are called filters. The stdin of the input filter receives what lpd puts out, and the stdout of the filter then goes to the printer. Such filters are often called input filters and are specified in the printcap file with if=.
Because of this behavior, a print filter can be anything that understands the concept of stdin and stdout. In most cases on Linux, the input filters that I have seen are simply shell scripts, although they can also be Perl scripts; a minimal sketch of such a filter follows below. With the exception of an input filter or a log file (which is specified using lf=), I have rarely used any other option for local printing. However, using the printcap file, you can configure your printer to print on a remote system, which is the subject of the next section.
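Here is a minimal sketch of an input filter that cures the stair-stepping problem described earlier by adding a carriage return to every line before passing the data on. The path /usr/local/bin/crlf_filter is only an example; you would point to it from the printcap entry with something like if=/usr/local/bin/crlf_filter:

#!/bin/sh
# crlf_filter - sample input filter (a sketch, not a full lpd filter)
# Read the job from stdin, append a carriage return to each line so plain
# text does not stair-step on the printer, and write the result to stdout.
awk '{ printf "%s\r\n", $0 }'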

Remote Printing
Setting up your system to print from another machine requires just a couple of alterations in your
printcap file. Use the rm= field to specify the remote machine and the rp= field to specify the
remote printer on that machine. Sending the print job to the printer is the last thing that happens, so
any other options, including input filters, are also honored.
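Putting this together, a printcap entry for a remote printer might look something like the sketch below. The names rlp and printserver are invented for the example; rm= names the remote machine, rp= the printer on that machine, and sd= a local spool directory that must still exist:

rlp:\
    :lp=:\
    :rm=printserver:\
    :rp=lp:\
    :sd=/usr/spool/lpd/rlp:\
    :sh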
On the destination side, you must be allowed to access the other machine. If you are already a
trusted host and have an entry in /etc/hosts.equiv, then there is no problem. If not, you will be
denied access. (This is a good time to start thinking about a log file.)
If the sole reason the remote machine needs to trust your machine is to do remote printing, I would
recommend not including it in the hosts.equiv file. This opens up more holes. Instead, put your host
name in the file /etc/hosts.lpd. The only thing this file does is decide who can access the printers
remotely. Putting remote machine names here is much safer.
System Logging
I am regularly confronted by Windows NT users who are overwhelmed by how much information
you can collect and process using the Windows NT Event Viewer. It is so nice, they maintain, that
occurrences (events) are sorted by system, security and applications. They go on with how much
you can filter the entries and search for specific values.
The problem is, that's where it stops. With the exception of a few security related events, what you
are able to log (or not log) is not configurable under Windows NT. You get whatever Microsoft has
decided is necessary. No more and no less. You can filter what is displayed, but there is little you
can do to restrict what is logged.
With Linux the situation is completely different. Windows NT always logs specific events to a specific file and differentiates between only three types of logs, which means you may need to wade through hundreds if not thousands of entries looking for the right one. With Linux, not only can you say what is logged and what is not, you can specifically define where to log any given type of message, including sending all (or whatever part you define) of the messages to another machine, and even go so far as to execute commands based on the messages being logged.

Syslogd
The workhorse of the Linux logging system is the system logging daemon or syslogd. This daemon
is normally started from the system start-up (rc) scripts when the system goes into run level 1. Once
running, almost any part of the system, including applications, drivers, as well as other daemons can
make log entries. There is even a command line interface so you can make entries from scripts or
anywhere else.
With Windows NT, each system maintains its own log files. There is no central location where they
are all stored. Although the Event Viewer can access event logs on other machines, this can often
take a great deal of time especially when there are a lot of entries and you have a slow connection.
Instead, syslogd can be configured to send all (or just some) of the messages to a remote machine, which processes them and writes them to the necessary files. It is thus possible that all the log messages of a particular type from all machines in your network are stored in a single file, which makes accessing and administering them much easier.
Another advantage is that syslogd stores configuration information and log entries in text files. Therefore, it is a simple matter to write a script that parses the entries and splits them into separate files, or processes them in other ways.
Part of this ability lies in the standard format of each log entry. Although it is possible that a rogue
program could write information in any order, all system daemons and most programs follow the
standard, which is:
date time system facility message
Here "system" is the host name which generated the message. The "facility" is a component of the
system generating the message. This could be anything like the kernel itself, system daemons and
even applications. Finally, there is the text of the message itself. Here are two messages on the
system jupiter. One is from syslogd and the other from the kernel:
Jun 5 09:20:52 jupiter syslogd 1.3-0: restart.
Jun 5 09:20:55 jupiter kernel: VFS: Mounted root (ext2 file system) readonly.
As you can see, even if you could not separate the log entries into different files, it would be fairly
easy to separate them using a script.
Configuring syslogd
What is logged and when is determined by the syslogd configuration file, syslog.conf, which is usually in /etc. (I have never seen it anywhere else.) This is a typical Linux configuration file with one item (or rule) per line; comment lines begin with a pound sign (#). Each rule consists of a selector portion, which determines the events to react to, and an action portion, which determines what is to be done.
The selector portion is itself broken into two parts, which are separated by a dot. The facility part
says what aspect of the system is to be recorded and the priority says what level of messages to
react to. The selector has the general syntax:
facility.priority
You can see a list of facilities in table 1 and a list of the priorities in table 2.
For both facilities and priorities there is a "wildcard" that can be used (an asterisk - *) which means
any facility or any priorities. For example, *.emerg would mean all emergency messages. mail.*
would mean all messages coming from the mail facility. Logically, *.* means all priorities of
messages from all facilities.
The word "none" is used to refer to no priority for the specified facility. For example, the selector
mail.none would say not to perform the action for any mail event. At first, this might not make
sense. Why not simply leave off that facility? The answer lies in the previous paragraph. Using the
wildcard, you could say that all info messages were to be logged to a certain file. However, for
obvious reasons, you want all of the security (regardless of the priority) written to another file.
Another possibility is to specify a sub-set of facilities, rather than all of them. This is done by
separating the facilities with a comma and then the priority follows the last facility listed. For
example, to refer to information messages for mail, uucp and news, the selector entry would look
like this:
mail,uucp,news.info
One thing I need to point out here is that when you specify a priority, you are actually specifying
everything at that priority or *higher*. Therefore, in this example, we are selecting all of the
priorities at info and higher.
There are three primary things you can do with these events (the actions). Probably the most
common action is to write them to a file. However, there is more to this than it appears. Remember
that Linux (as well as other UNIX dialects) treat devices as files. Therefore, you can send the
logging messages to a specific device.
Here, I am not talking about sending them to a tape drive (although that might not be a bad idea). Instead, I am talking about something like the system console (/dev/console). It is a common practice to send emergency messages to the system console, where someone will see the messages no matter which console terminal they are logged on to. In other cases, kernel messages are sent to one of the console terminals (e.g., /dev/tty7). You might end up with an entry like this:
kern.* /dev/tty7
When writing to files, you should consider that the system will actually write the information to the disk with each event. This ensures the entry actually makes it to the file should the system crash. The problem is that writing to the hard disk takes time, which is why the system normally saves up a number of writes before sending them all to the disk.
If overall system performance becomes an important factor in regard to logging, you can tell
syslogd **not** to sync the disk each time it writes to a log file. This is done by putting a minus
sign (-) in front of the file name, like this:
lpr.info -/var/adm/printer.log
If you disable syncing the log file like this, one important thing to remember is that you stand the
chance of losing information. If the system goes down for some reason before the information is
written to the file, you may lose an important clue as to why the system went down. One solution
would be to have a central log server where all of the information is sent and where you do not
disable syncing. That way no matter what, you have a record of what happened.
Sending the log messages to another machine is done by using an at-sign (@) in front of the
machine name as the action. For example:
*.emerg @logserver
This sends all emergency messages to the machine logserver. I would suggest that you do not create a log server that is connected to the Internet. An ill-intentioned person might be able to bring the system to a halt, or at least affect its performance, by flooding it with erroneous log messages.
Another useful feature is the ability to send messages to named pipes. This is done by preceding the
name of the pipe by the pipe-symbol (|). I find this a useful way of sending log messages to other
programs, where I can process them further. Named pipes are created using the mkfifo(1) command
and must exist prior to syslogd starting.
Another action is the ability to send messages to particular users, provided they are logged in at the
moment. To do this you simply put their username as the action. To send it to multiple users,
separate the names by a comma. This might give you something like this:
*.emerg root,jimmo
By using an asterisk in place of the list of user names, you can send a message to everyone logged
in.
In some cases, you want multiple actions for a specific facility or priority. This is no problem. You
simply create multiple rules. One common example is broadcasting all of the emergency messages
to every user, as well as writing them to a log file **and** sending them to another server in case
the local machine crashes. This might be done like this:
*.emerg /var/adm/messages
*.emerg *
*.emerg @logserver
Previously, I mentioned the ability to cause a single action based on the same kind of messages for multiple facilities. This is still an example of a single selector resulting in a specific action. Taking this one step further, you might want multiple selectors all to result in the same action. Although it could be done with multiple rules, it is possible to have multiple selectors all on the same line. This is done by separating the selectors with a semi-colon (;).
*.emerg;kernel.critical root,jimmo
This would notify the users root and jimmo for all emergency messages as well as critical messages
from the kernel facility.
The Linux syslogd has added a couple of functions that are not available in other versions of UNIX. By preceding a priority with an equal sign (=), you tell syslogd to react only to that one priority. This is useful since syslogd normally reacts to everything with that priority and higher. One place where this is handy is when you want all debugging messages to be logged to a specific file, but everything else logged to another file.
You can also explicitly exclude priorities by preceding them with an exclamation mark. Note that this will exclude the priority listed as well as anything higher. You can combine the equal sign and the exclamation mark and thereby exclude just a single priority. If you do so, you need to put the exclamation mark before the equal sign, as what you are saying is not to include anything that equals that particular priority.
All of these features can be combined in many different ways. For example, you can have multiple
selectors, which include as well as exclude specific priorities. For example:
*.warn;mail.=info;lpr.none;uucp.!crit /dev/tty7
This would send warning messages (and higher) from all facilities to the console terminal /dev/tty7, plus the mail messages at only the info priority, no printer messages at all, and finally excluding the uucp critical messages and higher. Granted, this is a rather contrived example, but it does show you how complex you can get.
Note that multiple selectors on a single line can cause some confusion when there are conflicting
components within a selector. The thing to keep in mind is that the last component takes
precedence. In the previous example, we specified warning messages for all facilities and then
"overwrote" portions of that for the mail, lpr and uucp facilities.

Managing System Logs


It is often useful to log messages from scripts. This can be done using the logger command (usually found in /usr/bin). Without any options, it uses "user" as the facility and "notice" as the priority. However, you can specify both a facility and priority on the command line by using the -p option, for example:
logger -p kern.warning The kernel has been recompiled.
This would send the specified message to the same place other kernel messages are sent. For details
on the other options, see the logger(1) man-page.
One common problem is what to do with all of the log messages. If you do a lot of logging (particularly if everything is sent to a central server), you can fill up your filesystem faster than you think. The most obvious and direct solution is to remove the logs after a specific length of time or when they reach a particular size.
It is a fairly simple matter to write a shell script that is started from cron, which looks at the log files
and takes specific actions. The nice thing is that you do not have to. Linux provides this
functionality for you in the form of the logrotate command.
As its name implies, the goal of the logrotate program is to "rotate" log files. This could be as
simple as moving a log file to a different name and replacing the original with an empty file.
However, there is much more to it.
Two files define how logrotate behaves. The state file (specified with the -s or --state option)
basically tells logrotate when the last actions were taken. The default is /var/state/logrotate.
The configuration file tells logrotate when to rotate each of the respective files. If necessary, you can have multiple configuration files, which can all be specified on the same command line, or you can include configuration files within another one.
The logrotate configuration file is broken into two parts. At the beginning are the global configuration options, which apply to all log files. Next are the configuration sections for each of the individual files (the logfile definitions). Note that some options can be set globally or for a specific log file, in which case the logfile setting overrides the global one. However, there are some options that can only be used within a logfile definition.
A very simple logrotate configuration file to rotate /var/log/messages might look like this:

errors root@logserver
compress

/var/log/messages {
    rotate 4
    weekly
    postrotate
        /sbin/killall -HUP syslogd
    endscript
}
At the top are two global options, followed by a logfile definition for /var/log/messages. In this
case, we could have included the global definitions within the log file definition. However, there is
normally more than one logfile definition.
The first line says that all error messages are sent (mailed) to root at the logserver. The second line
says that log files are to be compressed after they are rotated.
The logfile definition consists of the logfile name and the directives to apply, which are enclosed within curly brackets. The first line in the logfile definition says to rotate the log 4 times before it is removed. The next line says to rotate the files once a week. Together, these two lines mean that any given copy of the /var/log/messages file will be saved for 4 weeks before it is removed.
The next three lines are actually a set. The postrotate directive says that what follows should be
done immediately after the log file has been rotated. In this case, syslogd is sent a HUP signal to
restart itself. There is also a prerotate directive, which has the same basic functionality, but does
everything before the log is rotated.
It is also possible to specify an entire directory. For example, you could rotate all of the samba logs
by specifying the directory /var/log/samba.d/*.
As I mentioned, you can also rotate logs based on their size. This is done by using the size= option. Setting size=100k would rotate logs larger than 100 KB, and size=100M would rotate logs larger than 100 MB.
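Combining the directory example with size-based rotation, a sketch of a logfile definition for the samba logs might look like this (keeping two compressed copies of each log once it passes 100 KB):

/var/log/samba.d/* {
    size=100k
    rotate 2
    compress
}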
Although you can ease the management of your log files with just the options we discussed, there
are an incredible number of additional options which you can use. Table 3 contains a list of options
you can use with a brief explanation. For more details see the logrotate(1) man-page.
Table 1 - Syslogd Facilities
authpriv
cron
daemon
kern
lpr
mail
mark
news
security
syslog
user
uucp
local0 through local7.
The facility "security" should no longer be used and the "mark" facility is used internally and should not be used within applications. The facilities local0 through local7 are intended for local events on your local system when there is no other applicable facility.
Table 2 - Syslogd Priorities in increasing significance
debug
info
notice
warning or warn
err or error
crit
alert
emerg or panic
The priorities error, warn and panic are deprecated and should no longer be used.
Table 3 - logrotate options
compress/nocompress - compresses or does not compress old versions of logs.
delaycompress - Wait until the next cycle to compress the previous log.
create mode owner group - Log file is recreated with this mode, owner and group. (nocreate
overrides this.)
daily, weekly, monthly - Rotate logs in the indicated interval.
errors address - Send errors to the address indicated.
ifempty - Rotate the logs even if they are empty. (notifempty overrides this.)
include file_or_directory - Include the indicated file at this point. If a directory is given, all real files in that directory are read.
mail address - Logs rotated out of existence are mailed to this address. (nomail overrides this.)
olddir directory - old logs are moved to this directory, which must be on the same physical device.
(noolddir overrides this.)
postrotate/endscript - delimits commands run after the log is rotated. Both must appear on a line by themselves.
prerotate/endscript - delimits commands run before the log is rotated. Both must appear on a line by themselves.
rotate count - Rotates the log the given number of times before it is removed.
size size - Log files greater than the given size are rotated.
tabooext [+] list - list of files not to include. A plus-sign means the files are added to the list rather
than replacing it.

Backups
In the section on backing up and restoring files under Working with the System, we talked briefly
about the process of backing up files and how to restore them. However, simply knowing what tools
you need is usually not enough. You might not have enough time or space to do a complete backup,
or restoring from a complete backup is not efficient. An advantage of doing a complete backup
every day is that it is very simple. If everything fits on a single tape, you stick in a tape when you
are done for the day and have something like cron schedule a backup in the middle of the night. If
you have more than will fit on one tape, there are hardware solutions, such as multiple tape drives
or a tape loader.
Rather than doing a complete backup every day, there are a number of different strategies that you can employ to keep your data safe. For example, one way is to back up all of the data at regular intervals and then once a day back up only the files that have changed since this full backup. If you need to restore, you load your master backup and one extra tape.
Alternatively, you could make a full backup and then each day back up only the files that changed on that day. This can be a problem if you have made changes to files on several different days, as you then need to load each of those tapes to restore. This can be very time consuming.
What this basically says is that you need to make some kind of decision about what kind of backup strategy you will use. Also consider that backups should be done at a time when they have the least influence on users, for example in the middle of the night or on weekends. The details of all this I save for the section on problem solving.

cron
cron is a commonly misunderstood and misconfigured aspect of the operating system. Technically, cron is just the clock daemon (/usr/sbin/cron or perhaps /usr/sbin/crond) that executes commands at specific times. However, a handful of configuration files and programs go into making up the cron package. Like many system processes, cron never ends.
The controlling files for cron are the cron-tables or crontabs. The crontabs are often located in
/var/spool/cron/crontab. However, on SuSE you will find them in /var/spool/cron/tabs. The names
of the files in this directory are the names of the users that submit the cron jobs.
Unlike other UNIX dialects, the Linux cron daemon does not sleep until the next cron job is ready.
Instead, when cron completes one job, it will keep checking once a minute for more jobs to run.
Also, you should not edit the files directly. You can edit them with a text editor like vi, though there
is the potential for messing things up. Therefore, you should use the tool that Linux provides:
crontab. (see the man-page for more details)
The crontab utility has several functions. It is the means by which files containing the cron jobs are
submitted to the system. Second, it can list the contents of your crontab. If you are root, it can also
submit and list jobs for any user. The problem is that jobs cannot be submitted individually. Using
crontab, you must submit all of the jobs at the same time.
At first, that might sound a little annoying. However, let's take a look at the process of "adding" a
job. To add a cron job, you must first list out the contents of the existing crontab with the -l option.
If you are root and wish to add something to another user's crontab, use the -u option followed by
the user's logname. Then redirect this crontab to a file, which you can then edit. (Note that on some
systems crontab has -e (for "edit"), which will do all the work for you. See the man-page for more
details.)
For example, let's say that you are the root user and want to add something to the UUCP user's crontab. First, get the output of the existing crontab entry with this command:
crontab -l -u uucp >/tmp/crontab.uucp
To add an entry, simply include a new line. Save the file, get out of your editor, and run the crontab
utility again. This time, omit the -l to list the file but include the name of the file. The crontab utility
can also accept input from stdin, so you could leave off the file name and crontab would allow you
to input the cronjobs on the command line. Keep in mind that any previous crontab is removed no
matter what method you use.
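Putting the whole workflow together, a session for the UUCP user might look something like this (any editor will do in the middle step):

crontab -l -u uucp > /tmp/crontab.uucp     # dump the current crontab to a file
vi /tmp/crontab.uucp                       # add or change entries
crontab -u uucp /tmp/crontab.uucp          # resubmit the entire file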
The file /tmp/crontab.uucp now contains the contents of UUCP's crontab. It might look something like this:
39,9 * * * * /usr/lib/uucp/uudemon.hour > /dev/null
10 * * * * /usr/lib/uucp/uudemon.poll > /dev/null
45 23 * * * ulimit 5000; /usr/lib/uucp/uudemon.clean > /dev/null
48 10,14 * * 1-5 /usr/lib/uucp/uudemon.admin > /dev/null
Despite its appearance, each crontab entry consists of only six fields. The first five represent the
time the job should be executed and the sixth is the actual command. The first five fields are
separated by either a space or a tab and represent the following units, respectively:
• minutes (0-59)
• hour (0-23)
• day of the month (1-31)
• month of the year (1-12)
• day of the week (0-6, 0=Sunday)
To specify all possible values, use an asterisk (*). You can specify a single value simply by
including that one value. For example, the second line in the previous example has a value of 10 in
the first field, meaning 10 minutes after the hour. Because all of the other four time fields are
asterisks, this means that the command is run every hour of every day at 10 minutes past the hour.
Ranges of values are composed of the first value, a dash, and the ending value. For example, the
fourth line has a range (1-5) in the day of the week column, meaning that the command is only
executed on days 1-5, Monday through Friday.
To specify different values that are not within a range, separate the individual values by a comma.
In the fourth example, the hour field has the two values 10 and 14. This means that the command is
run at 10 a.m. and 2 p.m.
Note that times are additive. Let's look at an example:
10 * 1,16 * 1-5 /usr/local/bin/command
The command is run 10 minutes after every hour on the first and sixteenth, as well as Monday
through Friday. If either the first or the sixteenth were on a weekend, the command would still run
because the day of the month field would apply. However, this does not mean that if the first is a
Monday, the command is run twice.
The crontab entry can be defined to run at different intervals than just every hour or every day. The
granularity can be specified to every two minutes or every three hours without having to put each
individual entry in the crontab.
Let's say we wanted to run the previous command not at 10 minutes after the hour, but every ten minutes. We could make an entry that looks like this:
0,10,20,30,40,50 * 1,16 * 1-5 /usr/local/bin/command
This runs every 10 minutes: at the top of the hour, 10 minutes after, 20 minutes after, and so on. To
make life easier, we could simply create the entry like this:
*/10 * 1,16 * 1-5 /usr/local/bin/command
This syntax may be new to some administrators. (It was to me.) The slash (/) says that within the
specific interval (in this case, every minute), run the command every so many minutes; in this case,
every 10 minutes.
We can also use this even when we specify a range. For example, if the job was only supposed to
run between 20 minutes after the hour and 40 minutes after the hour, the entry might look like this:
20-40 * 1,16 * 1-5 /usr/local/bin/command
What if you wanted it to run at these times, but only every three minutes? The line might look like
this:
20-40/3 * 1,16 * 1-5 /usr/local/bin/command
To make things even more complicated, you could say that you wanted the command to run every
two minutes between the hour and 20 minutes after, every three minutes between 20 and 40 minutes
after, then every 5 minutes between 40 minutes after and the hour.
0-20/2,21-40/3,41-59/5 * 1,16 * 1-5 /usr/local/bin/command
One really nice thing that a lot of Linux dialects do is allow you to specify abbreviations for the days of the week and the months. It's a lot easier to remember that fri is for Friday instead of 5.
With the exception of certain errors in the time fields, errors are not reported until cron runs the command. All error messages and output are mailed to the user. At least, that's what the crontab man-page says, and it is basically true. However, as you saw in the previous examples, you are redirecting stdout to /dev/null. If you wanted to, you could also redirect stderr there and you would never see whether there were any errors.
Output is mailed to the user because there is no real terminal on which the cronjobs are being
executed. Therefore, there is no screen to display the errors. Also, there is no keyboard to accept
input. Does that mean you cannot give input to a cron job? No. Think back to the discussion on
shell scripts. We can redefine stdin, stdout and stderr. This way they can all point to files and behave
as we expect.
One thing I would like to point out is that I do not advocate doing redirection in the command field
of the crontab. I like doing as little there as possible. Instead, I put the absolute path to a shell script.
I can then test the crontab entry with something simple. Once that works, I can make changes to the
shell script without having to resubmit the cronjob.
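For example, an entry following that approach might look like the one below; the script name nightly_cleanup is purely hypothetical, and any redirection or error handling lives inside the script itself:

30 2 * * * /usr/local/bin/nightly_cleanup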
Keep in mind that cron is not exact. It synchronizes itself to the top of each minute. On a busy
system in which you lose clock ticks, jobs may not be executed until a couple minutes after the
scheduled time. In addition, there may be other processes with higher priorities that delay cron jobs.
In some cases, (particularly on very busy systems) jobs might end up being skipped if they are run
every minute.
Access to the cron facility is controlled through two files, both in /etc. If you have a file cron.allow,
you can specify which users are allowed to use cron. The cron.deny file says who is specifically not
allowed to use cron. If neither file exists, only the system users have access. However, if you want
everyone to have access, create an empty cron.deny file. In other words, no one is denied access.
It is often useful for root to run jobs as a different user without having to switch users (for example,
using the su command). Most Linux dialects provide a mechanism in the form of the /etc/crontab
file. This file is typically only writable by root and in some cases, only root can read it (which is
often necessary in high security environments). The general syntax is the same as the standard
crontabs, with a couple of exceptions. The first difference is the header, which you can see here:
SHELL=/bin/sh
PATH=/usr/bin:/usr/sbin:/sbin:/bin:/usr/lib/news/bin
MAILTO=root
#
# check scripts in cron.hourly, cron.daily, cron.weekly, and cron.monthly
#
59 *  * * *   root  rm -f /var/spool/cron/lastrun/cron.hourly
14 0  * * *   root  rm -f /var/spool/cron/lastrun/cron.daily
29 0  * * 6   root  rm -f /var/spool/cron/lastrun/cron.weekly
44 0  1 * *   root  rm -f /var/spool/cron/lastrun/cron.monthly
The SHELL variable defines the shell under which each command will run. The PATH variable is
like the normal PATH environment variable and defines the search path. The MAILTO variable says
who should get email messages, which includes error messages and the standard output of the
executed commands.
The structure of the actual entries is pretty much the same with the exception of the user name (root
in each case here). This way, the root user (or whoever can edit /etc/crontab) can define which user
executes the command. Keep in mind that this can be a big security hole. If someone can write to
this file, they can create an entry that runs as root and therefore has complete control of the system.
The next command in the cron "suite" is at. Its function is to execute a command at a specific time.
The difference is that once the at job has run, it disappears from the system. As with cron, two files,
at.allow and at.deny, have the same effect on the at program.
The batch command is also used to run commands once. However, commands submitted with batch
are run when the system gets around to it, which means when the system is less busy, for example,
in the middle of the night. It's possible that such jobs are spread out over the entire day, depending
on the load of the system.
One thing to note is the behavior of at and batch. Both accept the commands to run from standard
input rather than as arguments on the command line. You first run at (or batch), which brings you to a
new line, where you type the commands you want to execute. After each command, press
Enter. When you are done, press Ctrl-D.
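Depending on your version of at, such a session might look something like this (the <EOT> is what appears when you press Ctrl-D; the command is only an example):

at now + 1 hour
at> /usr/local/bin/command
at> <EOT>
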
Because these two commands accept commands from stdin, you can provide the commands without
having to type them in interactively. One possibility is to redirect input from a file. For example

at now +1 hour < command_list

where command_list is a file containing a list of commands. You could also have at (or batch) as the
end of a pipe
cat command_list | at now + 1 hour

cat command_list | batch


Another interesting thing about both at and batch is that they create a kind of shell script to execute
your command. When you run at or batch, a file is created in /usr/spool/cron/atjobs. This file
contains the system variables that you would normally have defined, plus some other information
that is contained in /usr/lib/cron.proto. This essentially creates an environment as though you had
logged in.

User Communication
If you are running a multiuser system like Linux, you should expect to find other users on your
system. (I guess that's why it is a multi-user system.) Although there are many built-in mechanisms
to keep users separated, sometimes you will want to communicate with other users.
Linux provides several tools to do this, depending on exactly what you want to accomplish. If you
simply want to send a quick message to someone, for example, to remind him or her of a meeting,
you might use the write program, which sends (writes) a message to his or her terminal.
In contrast to some other systems (say, the winpop mechanism under Windows), each line is sent
when you press Enter. If you are on the receiving end of the message, the system lets you know who
sent you the message.
If the person you are trying to contact is logged in more than once, you need to specify the terminal
to which you want to send the message. So, if I wanted to talk to the user jimmo on terminal tty6,
the command would look like this:
write jimmo tty6

If you omit the terminal, write is kind enough to let you select the terminal to which you want to
send the message.
It might happen that someone tries the above command and receives the following message:
write: jimmo has messages disabled.
This message means that jimmo has used the mesg command to turn off such messages. The syntax
for this command is
mesg n

to turn it off and


mesg y

to turn it on. Unless the system administrator has decided otherwise, the command is on by default.
I have worked on some systems in which the administrator changed the default to off.
An extension of write is the wall command. Instead of simply writing the message to a single user,
wall writes as if it were writing on a (where else) wall. Just as everyone can see a message written
on a wall, every user sees a message sent with wall. The wall command is often used by root to send
messages about system status (e.g., when the system is about to be shut down). Even if a user has
disabled messages, the root user can still send them messages using wall.
If you want to have an interactive session, you could send write messages back and forth. On the
other hand, you could use the talk program that was designed to do just that. When talk first
connects to the other user, that other user sees on his or her screen
Message from TalkDaemon@source_machine...
talk: connection requested by callers_name@his_machine
talk: respond with: talk callers_name@his_machine
As the message indicates, to respond, you would enter
talk callers_name@his_machine
You might have noticed that you can use talk to communicate with users on other machines. If you
omitted the machine name, talk would try to contact the user on the local machine (localhost). The
preceding message would simply say
talk: connection requested by callers_name@localhost
You can also disable talk by using the mesg command.
It is common practice to use a couple of terms from radio communication when using talk. Because
you cannot always tell when someone is finished writing, it is common to end the line with -o (or
use a separate line) to indicate that your turn is "over." When you are finished with the conversation
and wish to end it, use oo (over and out).
Both of these mechanisms have some major problems if the user is not logged in: they don't work!
Instead, there's mail or, more accurately, electronic mail (or e-mail).
On most UNIX systems (including Linux), e-mail is accessed through the mail command.
Depending on your system, the mail program may be linked to something else. On my system, the
default was to link to /usr/bin/mail.
There are several different programs for sending and viewing mail. You could use one mail program
(or mailer) to send the message and another to read it. Often the program that you use to read your
mail is called a mail reader or, simply, reader. Before we go on to the more advanced mail
programs, I want to talk about the most common mail program and the one that is most likely to be
on your system. (From here on, I will be referring to e-mail simply as mail.)
Mail comes in units called messages. Whether you use UUCP or the Internet, mail is sent back and
forth in messages. However, once the message has reached its destination, it is usually tacked onto
the end of an existing mail file. There is usually one mail file per user, but that single file contains
all of a user's messages (that is, all those that haven't yet been deleted).
To read your mail, you can use three primary character-based programs: elm, pine, and the default
reader, mail. Actually, you can use all three programs to send mail as well as read it. Each program
has its own advantages and disadvantages. Although the mail interface looks menu-driven, it simply
scrolls the information across the screen. Both elm and pine have much more complex menuing
systems. Because of this, mail is easier to learn, but you can do much more with the other two
programs.
All three programs understand the concept of a "folder" in which you can store messages. This
allows you to develop a hierarchy of files that is no different from the normal file system. How the
folders are created and managed depends on the program you are using. Therefore, I would suggest
that once you decide to use a specific program, stick with it because the files may not be
compatible.
In keeping with the basic premise of this book, I must treat these programs as applications.
Therefore, I won't go into any more detail about them. Instead, I suggest that you install all three
and see which one suits your needs best. If you have the space, you may consider providing all three
for your users. The man-pages provide a great deal of information and each program has its own on-
line help.
If you are using the X-Windowing System and a desktop environment such as the KDE, you have a
much larger and varied choice, such as my favorite Kmail. Prior to using kmail, I was using
Netscape Communicator. Although the Netscape Communicator has many useful features, Kmail
had the features I really need. Plus, I use the KDE as my desktop environment and Kmail fits into
the KDE architecture. (I will talk more about the KDE and many of the programs when I get the
time.)

Webmin
Linux advocates regularly hear from fans of other operating systems about how unstable Linux is.
They say there is no support for Linux and there are no applications. Some go so far as to claim
Linux is nothing more than a collection of programs created by a small group of hackers and
inexperienced programmers.
The sheer number of Linux Internet servers attests to Linux's stability. Even a quick search of any of
a number of Linux sites leads to hundreds of companies that provide Linux support. The fact that
major software vendors like Corel and Oracle have already ported their products to Linux
demonstrates that the applications are there. Looking (glancing even) at some of the names
responsible for the Linux source code shows the quality of the people working on the various
components of Linux.
All of these aspects seem to show that these statements of Linux opponents are blatantly untrue and
demonstrate the ability of Linux to fit in well in most any environment. However, one place where
Linux advocates often lose the battle is when talking about graphical administration tools.
Especially when compared to Windows NT, Linux seems to be lagging behind.
Or so it seems.
In my mind, one of the problems lies in the modularity of Linux. Although Linux is technically
just the kernel, the name is now used to include all of the files, scripts and programs that are
delivered with the various distributions. However, no two distributions are identical and the tools
each provides vary (sometimes greatly). What this means is that the tools are often not easy
to find, which leads some people to believe that the tools do not exist. In some cases, the tools that
are provided are lacking in features and functionality.
The real truth is that powerful graphical administration tools are not lacking. In fact, like many
aspects of Linux, you actually have a choice of several different packages. It is just a simple matter
of what tools you like working with.
One tool that I have grown fond of recently is Webmin, developed by Jamie Cameron. As you might
be able to tell from its name, Webmin is used to administer your system using a Web browser. That
means, you can administer your system from any system with a web browser. Webmin has taken
this one step further by enabling you to administer a wide range of systems, including several
different Linux distributions, Solaris, DEC OSF1, AIX, HP/UX, Irix and FreeBSD.
In essence, Webmin provides a mini-HTTP server (written in Perl), which creates the forms,
processes the input, and executes the commands. Because you need to be root to make most of the
administration changes to your system, Webmin needs to be able to do that as well. This means that
Webmin runs by default with super-user privileges.
Some people may wince at the thought of allowing root access through a web browser. Although
there are some potential security holes, Webmin has a number of different features which increase
the overall security.
The first place where Webmin addresses the issue of security is by requiring an extra username and
password to access the server. By default this is the user "admin" who has the same password as
root. I would suggest that once you have Webmin installed, you change both the account name and
the password.
Webmin also allows you to assign administration to different users. You can create additional users,
to which you can then assign privileges to administer different aspects of your system. For example,
it is possible to define a user (or users) who are allowed to just administer the printers, whereas
another user can administer just DNS. This is done using the Webmin Users module, which also
gives you an overview of which users have which privileges.
One of the basic security problems HTTP has is that the information is transferred across your
network (or the Internet) in clear text. That is, it is possible to intercept the connection and read the
administrator's password. Webmin can easily protect against this by using the Secure Socket Layer
(SSL), provided you have the Perl SSL libraries installed on your system.

The figure above shows you the initial start up page for Webmin. As you can see the interface is
very simple, while at the same time being very practical. Behind each of the buttons is a different
administrative function which is contained within a single Webmin module. One of the modules is
used to administer the modules themselves. Part of the administration is to remove or install the
modules as needed.
Because Webmin is modular, it is very easy to add your own modules, without the need of changing
any of the existing scripts. Although the developer of a particular module needs to make sure the
right components are available, anyone using the module can plop it in like a Netscape plug-in.
When writing your own modules, there are two requirements that must be met. First, there
needs to be an icon for the module, which is stored as <module>/images/icon.gif. Second, there must be
a <module>/module.info file. In both cases, <module> is the name of the particular module. The
module.info file is a set of parameters, which take the form parameter = value and contains
information about the module, like its name, a description, what operating system it supports and so
on.
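As a purely illustrative sketch (these parameter names are examples, not a complete or authoritative list), a module.info file might contain lines like these:

desc=Printer Administration
os_support=linux
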
By convention, the module should produce a page which looks like the other Webmin modules.
This can be done by using any programming language, such as C. However, one of the design goals
of Webmin is to have it run unchanged on as many platforms as possible. Therefore, if you write a
module in C, or the module uses any special programs, you will need to recompile on each new
machine. By convention, modules are written in Perl, which makes them portable across platforms.
Webmin has a particular advantage for administrators who are either new to a particular aspect of
administering a system or new to Linux in general, in that it already knows the syntax of the various
configuration files. This ensures that the syntax is correct. I also know experienced administrators
who use Webmin to setup the basic configuration and then edit the files by hand to make any
additional changes.
The first step is to get Webmin from the Webmin home page, which is provided as a gzipped tar
archive. When you unpack the archive it creates a sub-directory based on the version you have. For
example, the current version (as of this writing) might create the directory /usr/local/webmin-0.73.
This becomes the root directory for the HTTP server, so make sure you are extracting it in the right
place before you go on.
Next change into the directory where the Webmin archive was extracted and run the script setup.sh.
This is the primary setup/configuration script. This asks you a series of questions such as where to
put the configuration directory for Webmin, which defaults to /etc/webmin.
The setup script also asks you your operating system, administrator's username and password, and
other details about your existing configuration. Make sure that you choose the right operating
system and, if available, the right version. This is extremely important, as the location of the scripts
and programs which Webmin uses, as well as their options, can differ among different
operating systems. In addition, Webmin uses this information to determine which modules it should
include. If you don't get this right, Webmin won't work correctly.
During the setup process the script will also ask you if Webmin should be started when the system
boots. This adds the Webmin startup script to the appropriate rc-directory (i.e. /etc/rc.d/rc2.d to start
in run-level 2). In addition, if you have a previous version of Webmin in the config directory,
Webmin knows to upgrade it.
Part of the configuration process is to include the necessary modules. In many cases, the same
module can be used for multiple operating systems with little or no changes. In other cases, there
are specific modules for different operating systems. For example, there are separate modules to
configure NFS on a number of different systems. This is one reason why it is important to choose the
correct operating system during the setup.
If you look in the configuration directory, you will see that it contains a few scripts, text files and a
large number of directories. Here you find the start and stop scripts, which are called from the
appropriate rc-script if you have configured Webmin to start at boot time. The file miniserv.conf
contains the configuration information for the mini-server, such as the port it uses, hostname,
whether SSL is used and so forth. Some of these values are assigned when you first setup Webmin
and they can be changed using Webmin itself.
If you look at the directory names, it is fairly straightforward to figure out what each one is for,
even if you have never used Webmin before. There is a directory for each module, which contains
the configuration information for that module. These directories mirror the directories under the
server root directory, which contain all of the various perl scripts. When you connect to the server
using the defined port, the script index.cgi in the server root directory is run. This checks for which
modules are installed and displays the necessary icons for each module. Since index.cgi is a script,
the menu it presents is dynamic. Therefore, if a module is removed or added, there is no need to edit
any pages to reflect the changes you make.
The icons you see are hyperlinks to their respective directories. Here too the default page is the
script index.cgi, which once again builds the page as appropriate based on the current configuration.
These scripts are dynamic as well. Therefore, as I mentioned previously, it is possible to edit the
normal system configuration files by hand and then re-load the configuration from Webmin. That
means, there is no conflict if one administrator prefers to edit the files by hand and another chooses
to use Webmin. When you access the particular module, the appropriate configuration files are read
with any changes that have been made by hand.
With many of the modules, the first page is simply an overview of what can be configured. For
example, clicking on the Samba button brings you to the page in Figure 2. At the top is a list of the
configured shares. Clicking on one allows you to configure that particular share. At the bottom of
the page are the global configuration options.
There are two modules which I feel require special attention as they are not directly related to
configuring your system. The first is the File Manager module, which is just what its name implies: a
Java applet that provides a full-featured file manager for the files and directories on the remote
system (the one being administered). This includes all of the expected features, such as copy, delete,
move, rename, cut, paste, and so forth. You even have the ability to view text files.
Sometimes configuring the files through Webmin or even the File Manager is not enough. For
example, you may need to execute commands on the remote machine. Webmin makes this a lot
easier by providing a Java telnet client. This means you don't need to start an external program
and can do it right from Webmin. Note that this is truly a telnet client, so if root is denied telnet
access, it will also be denied through this Webmin applet.
As of this writing, there are 8 Webmin third party modules in addition to the over 30 modules that
form the base product. The third party modules typically provide functionality which is only
necessary for users with specific applications, such as managing the secure shell (SSH), configuring
the SAP router/proxy, administering MiniVend shops, or managing Qmail or Zmail.
There is also a set of network utilities from Tim Niemueller (http://www.niemueller.de/webmin-
modules/nettools/) that use the Webmin interface to give you access to standard monitoring tools,
such as ping, traceroute and nslookup. It also provides an "IP subnet Calculator," which calculates
the smallest possible network (i.e. netmask) for a given number of nodes.
Chapter VII
The Operating System
In this section, we are going to go into some detail about what makes a Linux operating system. I
am not talking about the "product" Linux or any of the bundled distributions such as SuSE, RedHat,
or Mandrake. Here, I am talking strictly about the software that manages and controls your
computer. The collection of functions that do all the work are collectively called the "kernel".
Because an operating system is of little use without hardware and other software, we are going to
discuss how the operating system interacts with other parts of the various Linux distributions. I will
also talk about what goes into making the kernel, what components it is made of, and what you can
do to influence the creation of a new kernel.
Much of this information is far beyond what many system administrators are required to have for
their jobs. So why go over it? Because what is required and what the administrator should know are
two different things. Many calls I received while in tech support and many questions posted to
newsgroups could have been avoided had the administrator understood the meaning of a message
on the system console or the effects of making changes. By going over the details of how the kernel
behaves, I hope to put you in a better position to understand what is happening.
The contents of this discussion are based primarily on two sources. The first is my book Linux User's
Resource. The second is David Rusling's "The Linux Kernel". In our separate documents, David and
I covered different topics at different levels of detail, so you didn't get the full story by reading
either one by itself. Rather than rewriting everything from scratch, David has graciously given me
permission to include his material with mine. Perhaps "merge" is a better term than "include",
because in spite of much commonality between the two documents, one often included information
that the other did not include.

Hardware Basics
An operating system has to work closely with the hardware system that acts as its foundation. The
operating system needs certain services that can only be provided by the hardware. In order to fully
understand the Linux operating system, you need to understand the basics of the underlying
hardware. This section gives a brief introduction to that hardware: the modern PC.
Note that some of this material is a repeat of what you will find elsewhere.
When the "Popular Electronics" magazine for January 1975 was printed with an illustration of the
Altair 8800 on its front cover, a revolution started. The Altair 8800, named after the destination of
an early Star Trek episode, could be assembled by home electronics enthusiasts for a mere $397.
With its Intel 8080 processor and 256 bytes of memory but no screen or keyboard, it was puny by
today's standards. Its inventor, Ed Roberts, coined the term "personal computer" to describe his new
invention, but the term PC is now used to refer to almost any computer that you can pick up without
needing help. By this definition, even some of the very powerful Alpha AXP systems are PCs.
Enthusiastic hackers saw the Altair's potential and started to write software and build hardware for
it. To these pioneers it represented freedom; the freedom from huge batch processing mainframe
systems run and guarded by an elite priesthood. Overnight fortunes were made by college dropouts
fascinated by this new phenomenon, a computer that you could have at home on your kitchen table.
A lot of hardware appeared, all different to some degree, and software hackers were happy to write
software for these new machines. Paradoxically it was IBM who firmly cast the mould of the
modern PC by announcing the IBM PC in 1981 and shipping it to customers early in 1982. With its
Intel 8088 processor, 64K of memory (expandable to 256K), two floppy disks and an 80 character
by 25 line Colour Graphics Adapter (CGA) it was not very powerful by today's standards but it sold
well. It was followed, in 1983, by the IBM PC-XT which had the luxury of a 10Mbyte hard drive. It
was not long before IBM PC clones were being produced by a host of companies such as Compaq,
and the architecture of the PC became a de-facto standard. This de-facto standard helped a multitude
of hardware companies to compete in a growing market which, happily for consumers, kept prices
low. Many of the system architectural features of these early PCs have carried over into the modern
PC. For example, even the most powerful Intel Pentium Pro based system starts running in the Intel
8086's addressing mode. In the early 1990's, when Linus Torvalds started writing what was to
become Linux, he picked the most plentiful and reasonably priced hardware, an Intel 80386 PC.

Looking at a PC from the outside, the most obvious components are a system box, a keyboard, a
mouse and a video monitor. On the front of the system box are some buttons, a little display
showing some numbers and a floppy drive. Most systems these days have a DVD drive, or at the
very least a CD ROM. If you feel that you have to protect your data, then there will also be a tape
drive for backups. These devices are collectively known as the peripherals.
Although the CPU is in overall control of the system, it is not the only intelligent device. All of the
peripheral controllers, for example the IDE controller, have some level of intelligence. Inside the PC
you will see a motherboard containing the CPU or microprocessor, the memory and a number of slots for
the ISA or PCI peripheral controllers. Some of the controllers, for example the disk controller, may
be built directly onto the system board.
Although all controllers are different, they usually act in accordance with information stored in
registers, a kind of dedicated, quickly accessed memory location. Software running on the CPU must be
able to read and write those controlling registers. One register might contain status describing an
error. Another might be used for control purposes such as changing the mode of the controller. Each
controller on a bus can be individually addressed by the CPU. This is so that the software device driver
can write to its registers and thus control it. The IDE ribbon cable is a good example, as it gives you the ability
to access each drive on the bus separately. Another good example is the PCI bus, which allows each
device (for example a graphics card) to be accessed independently.
There are times when controllers need to read or write large amounts of data directly to or from
system memory, for example when user data is being written to the hard disk. In this case, Direct
Memory Access (DMA) controllers are used to allow hardware peripherals to directly access system
memory. However, this access is under the strict control and supervision of the CPU.

CPU Basics
The CPU, or rather the microprocessor, is the heart of any computer system. The microprocessor calculates,
performs logical operations and manages data flows by reading instructions from memory and then
executing them. In the early days of computing the functional components of the microprocessor
were separate (and physically large) units. This is when the term Central Processing Unit was
coined. The modern microprocessor combines these components onto an integrated circuit etched
onto a very small piece of silicon. The terms CPU, microprocessor and processor are all used
interchangeably.
Microprocessors operate on binary data; that is data composed of ones and zeros. These ones and
zeros correspond to electrical switches being either on or off. Just as 42 is a decimal number
meaning "4 10s and 2 units", a binary number is a series of binary digits each one representing a
power of 2. In this context, a power means the number of times that a number is multiplied by itself.
10 to the power 1 (10^1) is 10, 10 to the power 2 (10^2) is 10x10, 10^3 is 10x10x10 and so on.
Binary 0001 is decimal 1, binary 0010 is decimal 2, binary 0011 is 3, binary 0100 is 4 and so on.
So, 42 decimal is 101010 binary or (2 + 8 + 32, that is 2^1 + 2^3 + 2^5). Rather than using binary to
represent numbers in computer programs, another base, hexadecimal, is usually used.
In this base, each digit represents a power of 16. As decimal numbers only go from 0 to 9, the
numbers 10 to 15 are represented as a single digit by the letters A, B, C, D, E and F. For example,
hexadecimal E is decimal 14 and hexadecimal 2A is decimal 42 (two 16s + 10). Using the C
programming language notation (as I do throughout this book) hexadecimal numbers are prefaced
by "0x"; hexadecimal 2A is written as 0x2A .
Microprocessors can perform arithmetic operations such as add, multiply and divide and logical
operations such as "is X greater than Y?".
The processor's execution is governed by an external clock. This clock, the system clock, generates
regular clock pulses to the processor and, at each clock pulse, the processor does some work. For
example, a processor could execute an instruction every clock pulse. A processor's speed is
described in terms of the rate of the system clock ticks. A 100Mhz processor will receive
100,000,000 clock ticks every second. It is misleading to describe the power of a CPU by its clock rate, as
different processors perform different amounts of work per clock tick. However, all things being
equal, a faster clock speed means a more powerful processor. The instructions executed by the
processor are very simple; for example "read the contents of memory at location X into Y".
Registers are the microprocessor's internal storage, used for storing data and performing operations
on it. The operations performed may cause the processor to stop what it is doing and jump to
another instruction somewhere else in memory. These tiny building blocks give the modern
microprocessor almost limitless power as it can execute millions or even billions of instructions a
second.
The instructions have to be fetched from memory as they are executed. Instructions may themselves
reference data within memory and that data must be fetched from memory and saved there when
appropriate. The size, number and type of registers within a microprocessor is entirely dependent on its type.
An Intel 80486 processor has a different register set to an Alpha AXP processor; for a start, the Intel's registers
are 32 bits wide and the Alpha AXP's are 64 bits wide. In general, though, any given processor will have a
number of general purpose registers and a smaller number of dedicated registers. Most processors have the
following special purpose, dedicated, registers:
Program Counter (PC)
This contains the address of the next instruction to be executed. The contents of the PC are
automatically incremented each time an instruction is fetched.

Stack Pointer (SP)


Processors have to have access to large amounts of external read/write random access
memory (RAM) which facilitates temporary storage of data. The stack is a way of easily saving and
restoring temporary values in external memory. Usually, processors have special instructions
which allow you to push values onto the stack and to pop them off again later. The stack
works on a last in first out (LIFO) basis. In other words, if you push two values, x and y, onto
a stack and then pop a value off of the stack then you will get back the value of y.

Some processors' stacks grow upwards towards the top of memory whilst others grow
downwards towards the bottom, or base, of memory. Some processors support both types of
stacks, for example ARM.

Processor Status (PS)


Instructions may yield results; for example "is the content of X greater than the content of Y?"
will yield true or false as a result. The PS holds this and other information about the current
state of the processor. For example, most processors have at least two modes of operation,
kernel (or supervisor) and user. The PS would hold information identifying the current mode.

Memory Basics
All systems have a memory hierarchy with memory at different speeds and sizes at different points
in the hierarchy. The fastest memory is known as cache memory and is what it sounds like -
memory that is used to temporarily hold, or cache, contents of the main memory. This sort of
memory is very fast but expensive, therefore most processors have a small amount of on-chip cache
memory and more system based (on-board) cache memory. Some processors have one cache to
contain both instructions and data, but others have two, one for instructions and the other for data.
The Alpha AXP processor has two internal memory caches; one for data (the D-Cache) and one for
instructions (the I-Cache). The external cache (or B-Cache) mixes the two together. Finally there is
the main memory which, relative to the external cache memory, is very slow. Relative to the on-chip
cache, main memory is positively crawling. The cache and main memories must be kept in step
(coherent). In other words, if a word of main memory is held in one or more locations in cache, then
the system must make sure that the contents of cache and memory are the same. The job of cache
coherency is done partially by the hardware and partially by the operating system. This is also true
for a number of major system tasks where the hardware and software must cooperate closely to
achieve their aims.

Bus Basics
The individual components of the system board are interconnected by multiple connection systems
known as buses. The system bus is divided into three logical functions; the address bus, the data bus
and the control bus. The address bus specifies the memory locations (addresses) for the data
transfers. The data bus holds the data transferred. The data bus is bidirectional; it allows data to be
read into the CPU and written from the CPU. The control bus contains various lines used to route timing and
control signals throughout the system. Many flavours of buses exist; for example, the ISA and PCI buses are
popular ways of connecting peripherals to the system.

Controller and Peripheral Basics


Peripherals are real devices, such as graphics cards or disks controlled by controller chips on the
system board or cards plugged into it. The IDE disks are controlled by the IDE controller chip and the
SCSI disks by the SCSI disk controller chips and so on. These controllers are connected to the CPU and to each
other by a variety of buses. Most systems built now use PCI and ISA buses to connect the main system
components. The controllers are processors like the CPU itself. They can be viewed as intelligent helpers
to the CPU. The CPU is in overall control of the system.

Address Spaces
The system bus connects the CPU with the main memory and is separate from the buses connecting the
CPU with the system's hardware peripherals. Collectively, the memory space that the hardware
peripherals exist in is known as I/O space. I/O space may itself be further subdivided, but we will
not worry too much about that for the moment. The CPU can access both the system space memory and
the I/O space memory, whereas the controllers themselves can only access system memory
indirectly and then only with the help of the CPU. For example, the floppy disk controller can only see
the address space that its control registers are in, and not the system memory.
Typically a CPU will have separate instructions for accessing the memory and I/O space. For example,
there might be an instruction that means "read a byte from I/O address 0x3f8 into X". This is exactly
how the CPU controls the system's hardware peripherals, by reading and writing to their registers in I/O
space. Where in I/O space the common peripherals have their registers has been set by convention
over the years as the PC architecture has developed. The I/O space address 0x3f8 just happens to be
the address of one of the serial port's (COM1) control registers.
Timers
All operating systems need to know the time, so the modern PC includes a special peripheral called
the Real Time Clock (RTC). This provides two things: a reliable time of day and an accurate timing
interval. The RTC has its own battery so that it continues to run even when the PC is not powered
on. This is how your PC always "knows" the correct date and time. The interval timer allows the
operating system to accurately schedule essential work.

Software Basics
A program is a set of computer instructions that perform a particular task. That program can be
written in assembler, a very low level computer language, or in a high level, machine independent
language such as the C programming language.
An operating system is a special program that allows the user to run applications such as
spreadsheets and word processors. This chapter introduces basic programming principles and gives
an overview of the aims and functions of an operating system.
A process could be thought of as a program in action. Each process is a separate entity that is
running a particular program. If you look at the processes on your Linux system, you will see that
there are rather a lot of processes running at any given moment.

Computer Languages
The instructions that a CPU fetches from memory and executes are not at all understandable to human
beings. They are machine codes which tell the computer precisely what to do. The hexadecimal
number 0x89E5 is an Intel 80486 instruction which copies the contents of the ESP to the EBP
register. One of the first software tools invented for the earliest computers was an assembler, a
program which takes a human readable source file and assembles it into machine code. Assembly
languages explicitly handle registers and operations on data and they are specific to a particular
microprocessor. The assembly language for an Intel X86 microprocessor is very different from the
assembly language for an Alpha AXP microprocessor. The following Alpha AXP assembly code
shows the sort of operations that a program can perform:
ldr r16, (r15) ; Line 1
ldr r17, 4(r15) ; Line 2
beq r16,r17,100 ; Line 3
str r17, (r15) ; Line 4
100: ; Line 5

The first statement (on line 1) loads register 16 from the address held in register 15. The next
instruction loads register 17 from the next location in memory. Line 3 compares the contents of
register 16 with that of register 17 and, if they are equal, branches to label 100. If the registers do
not contain the same value then the program continues to line 4 where the contents of r17 are saved
into memory. If the registers do contain the same value then no data needs to be saved. Assembly
level programs are tedious and tricky to write and prone to errors. Very little of the Linux kernel is
written in assembly language, and those parts that are exist only for efficiency and are
specific to particular microprocessors.
The C Programming Language and Compiler
Writing large programs in assembly language is a difficult and time consuming task. It is prone to
error and the resulting program is not portable, being tied to one particular processor family. It is far
better to use a machine independent language like C. C allows you to describe programs in terms of
their logical algorithms and the data that they operate on. Special programs called compilers read
the C program and translate it into assembly language, generating machine specific code from it. A
good compiler can generate assembly instructions that are very nearly as efficient as those written
by a good assembly programmer. Most of the Linux kernel is written in the C language. The
following C fragment:
if (x != y)
x = y ;

performs exactly the same operations as the previous example assembly code. If the contents of the
variable x are not the same as the contents of variable y then the contents of y will be copied to x.
C code is organized into routines, each of which performs a task. Routines may return any value or
data type supported by C. Large programs like the Linux kernel comprise many separate C source
modules each with its own routines and data structures. These C source code modules group logical
functions such as filesystem handling code.
C supports many types of variables, or locations in memory which can be referenced by a symbolic
name. In the above C fragment x and y refer to locations in memory. The programmer does not care
where in memory the variables are put, it is the linker (see below) that has to worry about that.
Some variables contain different sorts of data, integer and floating point and others are pointers.
Pointers are variables that contain the address, the location in memory of other data. Consider a
variable called x, it might live in memory at address 0x80010000. You could have a pointer, called
px, which points at x. px might live at address 0x80010030. The value of px would be 0x80010000:
the address of the variable x.
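Here is a minimal C sketch of that idea (the actual addresses printed will of course differ from those in the text):

#include <stdio.h>

int main(void)
{
    int x = 42;      /* an integer variable somewhere in memory */
    int *px = &x;    /* px holds the address of x, not its value */

    printf("x lives at %p and contains %d\n", (void *) px, *px);
    return 0;
}
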
C allows you to bundle together related variables into data structures. For example,
struct {
int i ;
char b ;
} my_struct ;

is a data structure called my_struct which contains two elements, an integer (32 bits of data
storage) called i and a character (8 bits of data) called b.

Linkers are programs that link together several object modules and libraries to form a single,
coherent, program. Object modules are the machine code output from an assembler or compiler and
contain executable machine code and data together with information that allows the linker to
combine the modules together to form a program. For example one module might contain all of a
program's database functions and another module its command line argument handling functions.
Linkers fix up references between these object modules, where a routine or data structure referenced
in one module actually exists in another module. The Linux kernel is a single, large program linked
together from its many constituent object modules.
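To make the idea concrete, here is a hedged sketch using the GNU tools (the file names are purely hypothetical): each source module is compiled into an object module, and the linker step then combines the object modules into one program.

gcc -c database.c                      # produces the object module database.o
gcc -c cmdline.c                       # produces the object module cmdline.o
gcc -o program database.o cmdline.o    # the linker combines them into one program
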
Memory Management Basics
With infinite resources, for example memory, many of the things that an operating system has to do
would be redundant. One of the basic tricks of any operating system is the ability to make a small
amount of physical memory behave like rather more memory. This apparently large memory is
known as virtual memory. The idea is that the software running in the system is fooled into
believing that it is running in a lot of memory. The system divides the memory into easily handled
pages and swaps these pages onto a hard disk as the system runs. The software does not notice
because of another trick, multi-processing.

Device Driver Basics


Device drivers make up the major part of the Linux kernel. Like other parts of the operating system,
they operate in a highly privileged environment and can cause disaster if they get things wrong.
Device drivers control the interaction between the operating system and the hardware device that
they are controlling. For example, the filesystem makes use of a general block device interface
when writing blocks to an IDE disk. The driver takes care of the details and makes device specific things
happen. Device drivers are specific to the controller chip that they are driving which is why, for
example, you need the NCR810 SCSI driver if your system has an NCR810 controller.

Kernel Data Structures


The operating system must keep a lot of information about the current state of the system. As things
happen within the system these data structures must be changed to reflect the current reality. For
example, a new process might be created when a user logs onto the system. The kernel must create a
data structure representing the new process and link it with the data structures representing all of the
other processes in the system.
Mostly these data structures exist in physical memory and are accessible only by the kernel and its
subsystems. Data structures contain data and pointers, addresses of other data structures, or the
addresses of routines. Taken all together, the data structures used by the Linux kernel can look very
confusing. Every data structure has a purpose and although some are used by several kernel
subsystems, they are more simple than they appear at first sight. Understanding the Linux kernel
hinges on understanding its data structures and the use that the various functions within the Linux
kernel makes of them. This section bases its description of the Linux kernel on its data structures. It
talks about each kernel subsystem in terms of its algorithms, which are its methods of getting things
done, and their usage of the kernel's data structures.

Linked Lists
Linux uses a number of software engineering techniques to link together its data structures. On a lot
of occasions it uses linked or chained data structures. If each data structure describes a single
instance or occurrence of something, for example a process or a network device, the kernel must be
able to find all of the instances. In a linked list a root pointer contains the address of the first data
structure, or element, in the list, then each subsequent data structure contains a pointer to the next
element in the list. The last element's next pointer would be 0 or NULL to show that it is the end of
the list. In a doubly linked list, each element contains a pointer to the next element in the list as well as
a pointer to the previous element. Using doubly linked lists makes it easier to add or
remove elements from the middle of the list, although you do need more memory accesses. This is a
typical operating system trade off: memory accesses versus cycles.
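To illustrate, here is a small C sketch of a doubly linked list element and an insert routine. This is only an illustration; the kernel's own list structures are more elaborate:

#include <stddef.h>

struct element {
    struct element *next;    /* NULL at the end of the list */
    struct element *prev;    /* NULL at the head of the list */
    /* ... whatever data this element describes ... */
};

/* Insert elem into the list immediately after pos. */
void insert_after(struct element *pos, struct element *elem)
{
    elem->next = pos->next;
    elem->prev = pos;
    if (pos->next != NULL)
        pos->next->prev = elem;
    pos->next = elem;
}
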

Hash Tables
Linked lists are handy ways of tying data structures together, but navigating linked lists can be inefficient. If
you were searching for a particular element, you might easily have to look at the whole list before
you find the one that you need. Linux uses another technique, hashing, to get around this restriction.
A hash table is an array or vector of pointers. An array, or vector, is simply a set of things coming
one after another in memory. A bookshelf could be said to be an array of books. Arrays are accessed
by an index, which is an offset into the array's associated area in memory. Taking the bookshelf
analogy a little further, you could describe each book by its position on the shelf; you might ask for
the 5th book.
A hash table is an array of pointers to data structures and its index is derived from information in
those data structures. If you had data structures describing the population of a village then you could
use a person's age as an index. To find a particular person's data you could use their age as an index
into the population hash table and then follow the pointer to the data structure containing the
person's details. Unfortunately many people in the village are likely to have the same age and so the
hash table pointer becomes a pointer to a chain or list of data structures each describing people of
the same age. However, searching these shorter chains is still faster than searching all of the data
structures.
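To make the village example concrete, here is a small C sketch (illustrative only, not kernel code) in which the age itself serves as the hash index and people of the same age are chained together:

#include <stddef.h>
#include <string.h>

#define MAX_AGE 120

struct person {
    char name[32];
    int age;
    struct person *next;           /* chain of people with the same age */
};

static struct person *age_hash[MAX_AGE + 1];   /* one chain per possible age */

struct person *find_person(int age, const char *name)
{
    struct person *p;

    /* Jump straight to the chain for this age, then search only that chain. */
    for (p = age_hash[age]; p != NULL; p = p->next)
        if (strcmp(p->name, name) == 0)
            return p;
    return NULL;                   /* no one by that name with that age */
}
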
As a hash table speeds up access to commonly used data structures, Linux often uses hash tables to
implement caches. Caches hold information that needs to be accessed quickly and are usually
a subset of the full set of information available. Data structures are put into a cache and kept there
because the kernel often accesses them. The drawback to caches is that they are more complex to
use and maintain than simple linked lists or hash tables. If the data structure can be found in the
cache (this is known as a cache hit), then all well and good. If it cannot then all of the relevant data
structures must be searched and, if the data structure exists at all, it must be added into the cache. In
adding new data structures into the cache an old cache entry may need discarding. Linux must
decide which one to discard, the danger being that the discarded data structure may be the next one
that Linux needs.

Abstract Interfaces
The Linux kernel often abstracts its interfaces. An interface is a collection of routines and data
structures which operate in a well-defined way. For example, all network device drivers have to
provide certain routines to operate on particular data structures. This way there can be generic layers
of code using the services (interfaces) of lower layers of specific code. The network layer is generic
and it is supported by device specific code that conforms to a standard interface.
Often these lower layers register themselves with the upper layer at boot time. This registration
usually involves adding a data structure to a linked list. For example each filesystem built into the
kernel registers itself with the kernel at boot time or, if you are using modules, when the filesystem
is first used. You can see which filesystems have registered themselves by looking at the file
/proc/filesystems.
The registration data structure often includes pointers to functions. These are the addresses of
software functions that perform particular tasks. Again, using filesystem registration as an example,
the data structure that each filesystem passes to the Linux kernel as it registers includes the address
of a filesystem specific routine which must be called whenever that filesystem is mounted.
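The following C fragment sketches the general shape of such a registration structure. It is purely illustrative and much simpler than the kernel's real filesystem registration structure:

#include <stddef.h>

struct fs_type {
    const char *name;                  /* e.g. "ext2"                         */
    int (*mount)(const char *device);  /* called when this filesystem mounts  */
    struct fs_type *next;              /* linked into the list of filesystems */
};

static struct fs_type *fs_list = NULL; /* head of the list of registered types */

void register_filesystem(struct fs_type *fs)
{
    fs->next = fs_list;                /* add to the front of the linked list */
    fs_list = fs;
}
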

The Kernel
If any single aspect of a Linux distribution could be called "Linux," then it would be the kernel. So
what is the kernel? Well, on the hard disk, it is represented by the file /vmlinuz. Just as a program
like /bin/date is a collection of bytes that isn't very useful until it is loaded in memory and running,
the same applies to /vmlinuz.
However, once the /vmlinuz program is loaded into memory and starts its work, it becomes "the
kernel" and has many responsibilities. Perhaps the two most important responsibilities are process
management and file management. However, the kernel is responsible for many other things. One
aspect is I/O management, which is essentially the accessing of all the peripheral devices.
In the following sections, we are going to look at what's under the hood of your Linux system.
Rather than turning this into a book on operating system theory, I am going to have to gloss over
some things. I will go into detail about those issues that can and will affect your ability to run and
administer a Linux system, however.
These sections are based on the Intel 386 (i386) architecture, for which Linux was originally
designed. Linux also runs on 486, Pentium, and Pentium Pro processors and has been ported to
other architectures, and the concepts discussed here are common to all of them. However, because this
site cannot be a "do-all" and "be-all" for everyone, I felt it necessary to limit the bulk of my
discussion to the i386, as it is the most widespread version.

Memory Management
The memory management subsystem is one of the most important parts of the operating system.
Since the early days of computing, there has been a need for more memory than exists physically in
a system. Strategies have been developed to overcome this limitation and the most successful of
these is virtual memory. Virtual memory makes the system appear to have more memory than is
physically present by sharing it among competing processes as they need it.
Virtual memory does more than just make your computer's memory go farther. The memory
management subsystem provides:
Large Address Spaces
The operating system makes the system appear as if it has a larger amount of memory than it
actually has. The virtual memory can be many times larger than the physical memory in the
system.

Protection
Each process in the system has its own virtual address space. These virtual address spaces are
completely separate from each other and so a process running one application cannot affect
another. Also, the hardware virtual memory mechanisms allow areas of memory to be
protected against writing. This protects code and data from being overwritten by rogue
applications.
Memory Mapping
Memory mapping is used to map image and data files into a process' address space. In
memory mapping, the contents of a file are linked directly into the virtual address space of a
process.

Fair Physical Memory Allocation


The memory management subsystem allows each running process in the system a fair share of
the physical memory of the system.

Shared Virtual Memory


Although virtual memory allows processes to have separate (virtual) address spaces, there are
times when you need processes to share memory. For example there could be several
processes in the system running the bash command shell. Rather than have several copies of
bash, one in each process's virtual address space, it is better to have only one copy in physical
memory and all of the processes running bash share it. Dynamic libraries are another
common example of executing code shared between several processes.

Shared memory can also be used as an Inter Process Communication (IPC) mechanism, with
two or more processes exchanging information via memory common to all of them. Linux
supports the Unix System V shared memory IPC.

Virtual Memory

An Abstract Model of Virtual Memory

Figure: Abstract model of Virtual to Physical address mapping


Before considering the methods that Linux uses to support virtual memory it is useful to consider an
abstract model that is not cluttered by too much detail.
As the processor executes a program it reads an instruction from memory and decodes it. In
decoding the instruction, the processor may need to fetch or store the contents of a location in
memory. The processor then executes the instruction and moves on to the next instruction in the
program. In this way the processor is always accessing memory either to fetch instructions or to
fetch and store data.
In a virtual memory system all of these addresses are virtual addresses and not physical addresses.
These virtual addresses are converted into physical addresses by the processor based on information
held in a set of tables maintained by the operating system.
To make this translation easier, virtual and physical memory are divided into handy sized chunks
called pages. These pages are all the same size. They need not be but if they were not, the system
would be very hard to administer. Linux on Alpha AXP systems uses 8 Kbyte pages and on Intel
x86 systems it uses 4 Kbyte pages. Each of these pages is given a unique number: the page frame
number (PFN).
In this paged model, a virtual address is composed of two parts: an offset and a virtual page frame
number. If the page size is 4 Kbytes, bits 0 through 11 of the virtual address contain the offset and bits 12
and above are the virtual page frame number. The processor extracts the virtual page frame number
and offset from a virtual address every time it encounters one. Then it matches the virtual page
frame number to a physical page and uses the offset to specify how far to go into the page. The
processor uses page tables to match the virtual page frame number to the physical page.
The figure above shows the virtual address spaces of two processes, process X and process Y, each
with their own page tables. These page tables map each process' virtual pages into physical pages in
memory. This shows that process X's virtual page frame number 0 is mapped into memory in
physical page frame number 1 and that process Y's virtual page frame number 1 is mapped into
physical page frame number 4. Each entry in the page table contains the following information:
• Valid flag. This indicates if this page table entry (PTE) is valid,
• The physical page frame number that this entry describes
• Access control information. This describes how the page may be used. Can it be written to?
Does it contain executable code?
The page table is accessed using the virtual page frame number as an offset. Virtual page frame 5
would be the 6th element of the table (0 is the first element).
To translate a virtual address into a physical one, the processor must first work out the virtual
address' page frame number and the offset within that virtual page. By making the page size a
power of 2 this can be easily done by masking and shifting. Looking again at the figures and
assuming a page size of 0x2000 bytes (which is decimal 8192) and an address of 0x2194 in process
Y's virtual address space then the processor would translate that address into offset 0x194 into
virtual page frame number 1.
The processor uses the virtual page frame number as an index into the process' page table to retrieve
its page table entry. If the page table entry at that offset is valid, the processor takes the physical
page frame number from this entry. If the entry is invalid, the process has accessed a non-existent
area of its virtual memory. In this case, the processor cannot resolve the address and must pass
control to the operating system so that it can fix things up.
Just how the processor notifies the operating system that the current process has attempted to access
a virtual address for which there is no valid translation is specific to the processor. However the
processor delivers it, this is known as a page fault and the operating system is notified of the
faulting virtual address and the reason for the page fault.
For a valid page table entry, the processor takes that physical page frame number and multiplies it
by the page size to get the address of the base of the page in physical memory. Finally, the processor
adds in the offset to the instruction or data that it needs. Using the above example again, process Y's
virtual page frame number 1 is mapped to physical page frame number 4 which starts at 0x8000 (4 x
0x2000). Adding in the 0x194 byte offset gives us a final physical address of 0x8194.
By mapping virtual to physical addresses this way, the virtual memory can be mapped into the
system's physical pages in any order. In the figure above, process X's virtual page frame number 0 is
mapped to physical page frame number 1, whereas virtual page frame number 7 is mapped to
physical page frame number 0 although it is higher in virtual memory than virtual page frame
number 0. This demonstrates an interesting byproduct of virtual memory; the pages of virtual
memory do not have to be present in physical memory in any particular order.

Shared Virtual Memory


Virtual memory makes it easy for several processes to share memory. All memory access are made
via page tables and each process has its own separate page table. For two processes sharing a
physical page of memory, its physical page frame number must appear in a page table entry in both
of their page tables.
The figure above shows two processes that each share physical page frame number 4. For process X
this is virtual page frame number 4 whereas for process Y this is virtual page frame number 6. This
illustrates an interesting point about sharing pages: the shared physical page does not have to exist
at the same place in virtual memory for any or all of the processes sharing it.

Physical and Virtual Addressing Modes


It does not make much sense for the operating system itself to run in virtual memory. This would be
a nightmare situation where the operating system must maintain page tables for itself. Most multi-
purpose processors support the notion of a physical address mode as well as a virtual address mode.
Physical addressing mode requires no page tables and the processor does not attempt to perform any
address translations in this mode. The Linux kernel is linked to run in physical address space.
The Alpha AXP processor does not have a special physical addressing mode. Instead, it divides up
the memory space into several areas and designates two of them as physically mapped addresses.
This kernel address space is known as KSEG address space and it encompasses all addresses
upwards from 0xfffffc0000000000. In order to execute from code linked in KSEG (by definition,
kernel code) or access data there, the code must be executing in kernel mode. The Linux kernel on
Alpha is linked to execute from address 0xfffffc0000310000.

Access Control
The page table entries also contain access control information. As the processor is already using the
page table entry to map a process' virtual address to a physical one, it can easily use the access
control information to check that the process is not accessing memory in a way that it should not.
There are many reasons why you would want to restrict access to areas of memory. Some memory,
such as that containing executable code, is naturally read only memory; the operating system should
not allow a process to write data over its executable code. By contrast, pages containing data can be
written to, but attempts to execute that memory as instructions should fail. Most processors have at
least two modes of execution: kernel and user. This adds a level of security to your operating
system. Because it is the core of the operating system and therefore can do most anything, kernel
code is only run when the CPU is in kernel mode. You would not want kernel code executed by a
user or kernel data structures to be accessible except when the processor is running in kernel mode.

Figure: Alpha AXP Page Table Entry


The access control information is held in the PTE and is processor specific; the figure above shows
the PTE for Alpha AXP. The bit fields have the following meanings:
V
Valid, if set this PTE is valid,
FOE
``Fault on Execute'', Whenever an attempt to execute instructions in this page occurs, the
processor reports a page fault and passes control to the operating system,
FOW
``Fault on Write'', as above but page fault on an attempt to write to this page,
FOR
``Fault on Read'', as above but page fault on an attempt to read from this page,
ASM
Address Space Match. This is used when the operating system wishes to clear only some of
the entries from the Translation Buffer,
KRE
Code running in kernel mode can read this page,
URE
Code running in user mode can read this page,
GH
Granularity hint used when mapping an entire block with a single Translation Buffer entry
rather than many,
KWE
Code running in kernel mode can write to this page,
UWE
Code running in user mode can write to this page,
page frame number
For PTEs with the V bit set, this field contains the physical Page Frame Number for this PTE.
For invalid PTEs, if this field is not zero, it contains information about where the page is in
the swap file.

The following two bits are defined and used by Linux:


_PAGE_DIRTY
if set, the page needs to be written out to the swap file,
_PAGE_ACCESSED
Used by Linux to mark a page as having been accessed.
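
The short C sketch below shows, purely schematically, how status bits like these might be tested and set if a page table entry were a plain word of flag bits. The bit positions and macro names are invented for this sketch and do not match Linux's or any processor's real layout.

/* Schematic only: invented bit positions standing in for the flags above. */
#include <stdio.h>

#define PTE_VALID     (1u << 0)   /* hypothetical "V" bit          */
#define PTE_DIRTY     (1u << 1)   /* stands in for _PAGE_DIRTY     */
#define PTE_ACCESSED  (1u << 2)   /* stands in for _PAGE_ACCESSED  */

int main(void)
{
    unsigned int pte = PTE_VALID;

    pte |= PTE_ACCESSED;                      /* page was read or executed */
    pte |= PTE_DIRTY;                         /* page was written to       */

    if (pte & PTE_DIRTY)
        printf("page must be written to the swap file before being discarded\n");
    return 0;
}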

Demand Paging

Once an executable image has been memory mapped into a process' virtual memory it can start to
execute. As only the very start of the image is physically pulled into memory it will soon access an
area of virtual memory that is not yet in physical memory. When a process accesses a virtual
address that does not have a valid page table entry, the processor will report a page fault to Linux.
The page fault describes the virtual address where the page fault occurred and the type of memory
access that caused the fault. Linux must find the area of memory in which the page fault occurred
in. This is done through the vm_area_struct kernel data structure. As searching through the
vm_area_struct data structures is critical to the efficient handling of page faults, these are
linked together in an AVL (Adelson-Velskii and Landis) tree structure. (An AVL tree structure is a
balanced binary search tree where the height of the two subtrees (children) of a node differs by at
most one, thus optimizing searches.) If there is no vm_area_struct data structure for this
faulting virtual address, this process has accessed an illegal virtual address. Linux will signal the
process, sending a SIGSEGV signal and if the process does not have a handler for that signal it will
be terminated.
Linux next checks the type of page fault that occurred against the types of accesses allowed for this
area of virtual memory. If the process is accessing the memory in an illegal way, say writing to an
area that it is only allowed to read from, it is also signalled with a memory error.
Now that Linux has determined that the page fault is legal, it must deal with it.
Linux must differentiate between pages that are in the swap file and those that are part of an
executable image on a disk somewhere. It does this by using the page table entry for this faulting
virtual address.
If the page's page table entry is invalid but not empty, the page fault is for a page currently being
held in the swap file. For Alpha AXP page table entries, these are entries which do not have their
valid bit set but which have a non-zero value in their PFN field. In this case the PFN field holds
information about where in the swap (and which swap file) the page is being held. How pages in the
swap file are handled is described later in this chapter.
Not all vm_area_struct data structures have a set of virtual memory operations and even those
that do may not have a nopage operation. This is because by default Linux will fix up the access by
allocating a new physical page and creating a valid page table entry for it. If there is a nopage
operation for this area of virtual memory, Linux will use it.
The generic Linux nopage operation is used for memory mapped executable images and it uses the
page cache to bring the required image page into physical memory.
However the required page is brought into physical memory, the process' page tables are updated. It
may be necessary for hardware specific actions to update those entries, particularly if the processor
uses translation look aside buffers. Now that the page fault has been handled it can be dismissed and
the process is restarted at the instruction that made the faulting virtual memory access.
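
The following condensed C sketch walks through the same decisions in miniature: find the area containing the faulting address, check the access type, and either use a nopage-style operation or allocate a fresh page. The structures and helper functions are simplified stand-ins for the kernel's vm_area_struct handling, not its actual code, and the AVL tree search is reduced to a single area.

/* A simplified model of the page fault path; types and helpers are invented. */
#include <stdio.h>
#include <stdbool.h>

struct vm_area {                              /* stand-in for vm_area_struct  */
    unsigned long start, end;
    bool writable;
    bool (*nopage)(unsigned long addr);       /* optional per-area operation  */
};

static bool file_backed_nopage(unsigned long addr)
{
    printf("bringing the page for 0x%lx in via the page cache\n", addr);
    return true;
}

static struct vm_area text_area = { 0x0000, 0x4000, false, file_backed_nopage };

static void handle_fault(struct vm_area *vma, unsigned long addr, bool write)
{
    if (!vma || addr < vma->start || addr >= vma->end) {
        printf("illegal address 0x%lx: send SIGSEGV\n", addr);
        return;
    }
    if (write && !vma->writable) {
        printf("illegal write to 0x%lx: signal a memory error\n", addr);
        return;
    }
    if (vma->nopage)
        vma->nopage(addr);   /* e.g. demand-page from the executable image      */
    else
        printf("allocate a fresh physical page and map it at 0x%lx\n", addr);
}

int main(void)
{
    handle_fault(&text_area, 0x2194, false);  /* legal read: demand page it in */
    handle_fault(&text_area, 0x2194, true);   /* write to a read-only area     */
    return 0;
}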

Paging and Swapping


The operating system uses capabilities of the CPU to make it appear as though you have more
memory than you really do. This is the concept of virtual memory. Later, I'll go into detail about
how this is accomplished, that is, how the operating system and CPU work together to keep up this
illusion. However, to make this section on the kernel complete, I should talk about this a little from
a software perspective.
One basic concept in the Linux implementation of virtual memory is the concept of a page. A page
is a 4Kb area of memory (on x86 systems) and is the basic unit of memory with which both the kernel and the CPU
deal. Although both can access individual bytes (or even bits), the amount of memory that is
managed is usually in pages.
If you are reading a book, you do not need to have all the pages spread out on a table for you to
work effectively, just the page you are currently using. I remember many times in college when I
had the entire table top covered with open books, including my notebook. As I was studying, I
would read a little from one book, take notes on what I read, and, if I needed more details on that
subject, I would either go to a different page or a completely different book.
Virtual memory in Linux is very much like that. Just as I only need to have open the pages I am
working with currently, a process needs to have only those pages in memory with which it is
working. Like me, if the process needs a page that is not currently available (not in physical
memory), it needs to go get it (usually from the hard disk).
If another student came along and wanted to use that table, there might be enough space for him or
her to spread out his or her books as well. If not, I would have to close some of my books (maybe
putting bookmarks at the pages I was using). If another student came along or the table was fairly
small, I might have to put some of the books away. Linux does that as well. Imagine that the
textbooks represent the unchanging text portion of the program and the notebook represents the
changing data, and the analogy might become a little clearer.
It is the responsibility of both the kernel and the CPU to ensure that I don't end up reading someone
else's textbook or writing in someone else's notebook. That is, both the kernel and the CPU ensure
that one process does not have access to the memory locations of another process (a discussion of
cell replication would look silly in my calculus notebook). The CPU also helps the kernel by
recognizing when the process tries to access a page that is not yet in memory. It is the kernel's job to
figure out which process it was, what page it was, and to load the appropriate page.
It is also the kernel's responsibility to ensure that no one process hogs all available memory, just like
the librarian telling me to make some space on the table. If there is only one process running (not
very likely), there may be enough memory to keep the entire process loaded as it runs. More likely
is the case in which dozens of processes are in memory and each gets a small part of the total
memory. (Note: Depending on how much memory you have, it is still possible that the entire
program is in memory.)
Processes generally adhere to the principle of locality of reference. This means that typically processes
will access the same portions of their code over and over again. The kernel could establish a
working set of pages for each process, the pages that have been accessed within the last n memory
references. If n is small, the processes may not have enough pages in memory to do their job.
Instead of letting the processes work, the kernel is busy spending all of its time reading in the
needed pages. By the time the system has finished reading in the needed pages, it is some other
process's turn. Now, some other process needs more pages, so the kernel needs to read them in. This
is called thrashing. Large values of n may lead to cases in which there is not enough memory for all
the processes to run.
The solution is to use a portion of hard disk as a kind of temporary storage for data pages that are
not currently needed. This area of the hard disk is called the swap space or swap device and is a
separate area used solely for the purpose of holding data pages from programs.
The size and location of the swap device is normally set when the system is first installed.
Afterward, more swap space can be added if needed. (Swap space is added with the mkswap
command and the system is told to use it with the swapon command.)
Eventually, the process that was swapped out will get a turn on the CPU and will need to be
swapped back in. Before it can be swapped back in, the system needs to ensure that there is at least
enough memory for the task structure and a set of structures called page tables. Page tables are an
integral part of the virtual memory scheme and point to the actual pages in memory. I talk more
about this when I talk about the CPU in the hardware section.
Often you don't want to swap in certain pages. For example, it doesn't make sense to swap in pages
for a process that is sleeping on some event. Because that event hasn't occurred yet, swapping them in
means that the process will just need to go right back to sleep. Therefore, only processes in the TASK_RUNNING
state are eligible to have pages swapped back in. That is, only the processes that are runnable get
pages swapped back in.
Keep in mind that accessing the hard disk is hundreds of times slower than accessing memory.
Although swapping does allow you to have more programs in memory than the physical RAM will allow,
using it slows down the system. If possible, it is a good idea to keep from swapping by adding more
RAM.
Until kernel 2.3.24 on the x86 platform, the Linux memory manager limited the size of each swap
area to 127.5 MB. You could have created a larger swap space, but only the first 127.5 MB would be
used. To solve this limitation, a system could have had up to 16 swap spaces for a total of 2GB in
swap space. Now Linux supports up to 64 GB of physical memory and several TB of swap.

Demand Paging
As there is much less physical memory than virtual memory the operating system must be careful
that it does not use the physical memory inefficiently. One way to save physical memory is to only
load virtual pages that are currently being used by the executing program. For example, a database
program may be run to query a database. In this case not all of the database needs to be loaded into
memory, just those data records that are being examined. If the database query is a search query
then it does not make sense to load the code from the database program that deals with adding new
records. This technique of only loading virtual pages into memory as they are accessed is known as
demand paging.
When a process attempts to access a virtual address that is not currently in memory, the processor
cannot find a page table entry for the virtual page being referenced. For example, in the abstract model
figure above, there is no entry in process X's page table for virtual page frame number 2 and so if process X
attempts to read from an address within virtual page frame number 2 the processor cannot translate
the address into a physical one. At this point the processor notifies the operating system that a page
fault has occurred.
If the faulting virtual address is invalid this means that the process has attempted to access a virtual
address that it should not have. Maybe the application has gone wrong in some way, for example
writing to random addresses in memory. In this case the operating system will terminate it,
protecting the other processes in the system from this rogue process.
If the faulting virtual address was valid but the page that it refers to is not currently in memory, the
operating system must bring the appropriate page into memory from the image on disk. Disk access
takes a long time, relatively speaking, and so the process must wait quite a while until the page has
been fetched. If there are other processes that could run, then the operating system will select one of
them to run. The fetched page is written into a free physical page frame and an entry for the virtual
page frame number is added to the process' page table. The process is then restarted at the machine
instruction where the memory fault occurred. This time the virtual memory access is made, the
processor can make the virtual to physical address translation and so the process continues to run.
Linux uses demand paging to load executable images into a process's virtual memory. Whenever a
command is executed, the file containing it is opened and its contents are mapped into the process's
virtual memory. This is done by modifying the data structures describing this process' memory map
and is known as memory mapping. However, only the first part of the image is actually brought into
physical memory. The rest of the image is left on disk. As the image executes, it generates page
faults and Linux uses the process's memory map in order to determine which parts of the image to
bring into memory for execution.

Swapping
If a process needs to bring a virtual page into physical memory and there are no free physical pages
available, the operating system must make room for this page by discarding another page from
physical memory. If the page to be discarded from physical memory came from an image or data
file and has not been written to then the page does not need to be saved. Instead it can be discarded
and brought back into memory from the original image or data file if it is needed again.
However, if the page has been modified, the operating system must preserve the contents of that
page so that it can be accessed at a later time. This type of page is known as a dirty page. When
dirty pages are removed from memory, they are saved in a special sort of file called the swap file.
Since access to the swap file takes a long time relative to the speed of the processor and physical
memory, the operating system must juggle the need to write pages to disk with the need to retain
them in memory.
If the swap algorithm, which is used to decide which pages to discard or swap is not efficient, then a
condition known as thrashing occurs. In the case of thrashing, pages are constantly being written to
and read back from disk. This causes the operating system to be too busy to perform enough real
work. If, for example, physical page frame number 1 in the abstract model figure above is being regularly accessed, then
it is not a good candidate for swapping to hard disk. The set of pages that a process is currently
using is called the working set. An efficient swap scheme would make sure that all processes have
their working set in physical memory.
Linux uses a Least Recently Used (LRU) page aging technique to fairly choose pages which might
be removed from the system. This scheme involves every page in the system having an age which
changes as the page is accessed. The more that a page is accessed, the younger it is; the less that it is
accessed, the older and more stale it becomes. Old pages are good candidates for swapping.

Linux Page Tables

Figure: Three Level Page Tables


Linux assumes that there are three levels of page tables. Each Page Table contains the page frame
number of the next level of Page Table. The Figure above shows how a virtual address can be
broken into a number of fields; each field providing an offset into a particular Page Table. To
translate a virtual address into a physical one, the processor must take the contents of each level
field, convert it into an offset into the physical page containing the Page Table and read the page
frame number of the next level of Page Table. This is repeated three times until the page frame
number of the physical page containing the virtual address is found. Now the final field in the
virtual address, the byte offset, is used to find the data inside the page.
Each platform that Linux runs on must provide translation macros that allow the kernel to traverse
the page tables for a particular process. This way, the kernel does not need to know the format of the
page table entries or how they are arranged.
This is so successful that Linux uses the same page table manipulation code for the Alpha processor,
which has three levels of page tables, and for Intel x86 processors, which have two levels of page
tables.
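
A toy example of such a multi-level walk is sketched below in C. The table sizes, field widths and page size are chosen only to keep the example small; they do not correspond to any particular processor's layout.

/* A toy three-level walk: each level of the virtual address indexes one
   table, and the last table yields the page frame number. All sizes are
   invented for illustration. */
#include <stdio.h>
#include <stdint.h>

#define LEVEL_BITS  2            /* 4 entries per table in this toy model */
#define PAGE_SHIFT  12           /* 4 Kbyte pages                         */

static unsigned   level3[4] = { 7, 8, 9, 10 };     /* PFNs of data pages   */
static unsigned  *level2[4] = { level3, 0, 0, 0 };
static unsigned **level1[4] = { level2, 0, 0, 0 };

int main(void)
{
    uint64_t vaddr = (0u << (PAGE_SHIFT + 2 * LEVEL_BITS))   /* level 1 index */
                   | (0u << (PAGE_SHIFT + LEVEL_BITS))       /* level 2 index */
                   | (2u << PAGE_SHIFT)                      /* level 3 index */
                   | 0x123;                                  /* byte offset   */

    unsigned i1     = (vaddr >> (PAGE_SHIFT + 2 * LEVEL_BITS)) & 3;
    unsigned i2     = (vaddr >> (PAGE_SHIFT + LEVEL_BITS)) & 3;
    unsigned i3     = (vaddr >> PAGE_SHIFT) & 3;
    unsigned offset = vaddr & ((1u << PAGE_SHIFT) - 1);

    unsigned pfn = level1[i1][i2][i3];               /* walk the three levels */
    printf("PFN %u, physical address 0x%x\n", pfn, (pfn << PAGE_SHIFT) | offset);
    return 0;
}
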
Page Allocation and Deallocation
There are many demands on the physical pages in the system. For example, when an image is
loaded into memory the operating system needs to allocate pages. These will be freed when the
image has finished executing and is unloaded. Another use for physical pages is to hold kernel
specific data structures such as the page tables themselves. The mechanisms and data structures
used for page allocation and deallocation are perhaps the most critical in maintaining the efficiency
of the virtual memory subsystem.
All of the physical pages in the system are described by the mem_map data structure which is a list
of mem_map_t structures and is initialized at boot time. Each mem_map_t describes a single
physical page in the system. Important fields (so far as memory management is concerned) are:
count
This is a count of the number of users of this page. The count is greater than one when the
page is shared between many processes,
age
This field describes the age of the page and is used to decide if the page is a good candidate
for discarding or swapping,
map_nr
This is the physical page frame number that this mem_map_t describes.

The free_area vector is used by the page allocation code to find and free pages. The whole
buffer management scheme is supported by this mechanism and, as far as the code is concerned, the
size of the page and physical paging mechanisms used by the processor are irrelevant.
Each element of free_area contains information about blocks of pages. The first element in the
array describes single pages, the next blocks of 2 pages, the next blocks of 4 pages and so on
upwards in powers of two. The list element is used as a queue head and has pointers to the page
data structures in the mem_map array. Free blocks of pages are queued here. map is a pointer to a
bitmap which keeps track of allocated groups of pages of this size. Bit N of the bitmap is set if the
Nth block of pages is free.
The figure below, shows the free_area structure. Element 0 has one free page (page frame
number 0) and element 2 has 2 free blocks of 4 pages, the first starting at page frame number 4 and
the second at page frame number 56.

Page Allocation
Linux uses the Buddy algorithm to effectively allocate and deallocate blocks of pages. The page
allocation code attempts to allocate a block of one or more physical pages. Pages are allocated in blocks which are
powers of 2 in size. That means that it can allocate a block 1 page, 2 pages, 4 pages and so on. So
long as there are enough free pages in the system to grant this request (nr_free_pages >
min_free_pages) the allocation code will search the free_area for a block of pages of the size
requested. Each element of the free_area has a map of the allocated and free blocks of pages for
that sized block. For example, element 2 of the array has a memory map that describes free and
allocated blocks each of 4 pages long.
The allocation algorithm first searches for blocks of pages of the size requested. It follows the chain
of free pages that is queued on the list element of the free_area data structure. If no blocks of
pages of the requested size are free, blocks of the next size (which is twice that of the size
requested) are looked for. This process continues until all of the free_area has been searched or
until a block of pages has been found. If the block of pages found is larger than that requested it
must be broken down until there is a block of the right size. Because the blocks are each a power of
2 pages big then this breaking down process is easy as you simply break the blocks in half. The free
blocks are queued on the appropriate queue and the allocated block of pages is returned to the caller.

Figure: The free_area data structure

For example, in the Figure above if a block of 2 pages was requested, the first block of 4 pages
(starting at page frame number 4) would be broken into two 2 page blocks. The first, starting at
page frame number 4 would be returned to the caller as the allocated pages and the second block,
starting at page frame number 6 would be queued as a free block of 2 pages onto element 1 of the
free_area array.

Page Deallocation
Allocating blocks of pages tends to fragment memory with larger blocks of free pages being broken
down into smaller ones. The page deallocation code
recombines pages into larger blocks of free pages whenever it can. In fact the page block size is
important as it allows for easy combination of blocks into larger blocks.
Whenever a block of pages is freed, the adjacent or buddy block of the same size is checked to see
if it is free. If it is, then it is combined with the newly freed block of pages to form a new free block
of pages for the next size block of pages. Each time two blocks of pages are recombined into a
bigger block of free pages the page deallocation code attempts to recombine that block into a yet
larger one. In this way the blocks of free pages are as large as memory usage will allow.
For example, in the Figure above, if page frame number 1 were to be freed, then that would be
combined with the already free page frame number 0 and queued onto element 1 of the
free_area as a free block of size 2 pages.
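
The sketch below models both halves of the buddy scheme just described: allocation splits a larger free block in half until a block of the requested power-of-two size is obtained, and freeing recombines a block with its buddy whenever the buddy is also free. The data structures are a simplified model for illustration, not the kernel's free_area implementation.

/* A small, self-contained model of buddy allocation and coalescing. */
#include <stdio.h>
#include <stdbool.h>

#define MAX_ORDER 4                       /* blocks of 1, 2, 4, 8 and 16 pages */
#define NPAGES    (1 << MAX_ORDER)

static bool free_block[MAX_ORDER + 1][NPAGES];   /* free_block[o][p]: a block of
                                                    2^o pages starting at p is free */

static int alloc_block(int order)
{
    for (int o = order; o <= MAX_ORDER; o++)
        for (int p = 0; p < NPAGES; p += (1 << o))
            if (free_block[o][p]) {
                free_block[o][p] = false;
                while (o > order) {               /* split, freeing the upper half */
                    o--;
                    free_block[o][p + (1 << o)] = true;
                }
                return p;                         /* first page frame of the block */
            }
    return -1;                                    /* out of memory */
}

static void free_pages_block(int p, int order)
{
    while (order < MAX_ORDER) {
        int buddy = p ^ (1 << order);             /* buddy of a 2^order block */
        if (!free_block[order][buddy])
            break;
        free_block[order][buddy] = false;         /* coalesce with the buddy  */
        if (buddy < p)
            p = buddy;
        order++;
    }
    free_block[order][p] = true;
}

int main(void)
{
    free_block[MAX_ORDER][0] = true;              /* start with one 16-page block   */
    int a = alloc_block(1);                       /* ask for a 2-page block         */
    printf("allocated 2 pages starting at page frame %d\n", a);
    free_pages_block(a, 1);                       /* give it back; blocks recombine */
    printf("16-page block free again: %s\n", free_block[MAX_ORDER][0] ? "yes" : "no");
    return 0;
}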

Memory Mapping
When an image is executed, the contents of the executable image must be brought into the process'
virtual address space. The same is also true of any shared libraries that the executable image has
been linked to use. The executable file is not actually brought into physical memory, instead it is
merely linked into the process' virtual memory. Then, as the parts of the program are referenced by
the running application, the image is brought into memory from the executable image. This linking
of an image into a process' virtual address space is known as memory mapping.

Figure: Areas of Virtual Memory


Every process' virtual memory is represented by an mm_struct data structure. This contains
information about the image that it is currently executing (for example bash) and also has pointers
to a number of vm_area_struct data structures. Each vm_area_struct data structure
describes the start and end of the area of virtual memory, the process' access rights to that memory
and a set of operations for that memory. These operations are a set of routines that Linux must use
when manipulating this area of virtual memory. For example, one of the virtual memory operations
performs the correct actions when the process has attempted to access this virtual memory but finds
(via a page fault) that the memory is not actually in physical memory. This operation is the nopage
operation. The nopage operation is used when Linux demand pages the pages of an executable
image into memory.
When an executable image is mapped into a process' virtual address a set of vm_area_struct
data structures is generated. Each vm_area_struct data structure represents a part of the
executable image; the executable code, initialized data (variables), uninitialized data and so on.
Linux supports a number of standard virtual memory operations and, as the vm_area_struct
data structures are created, the correct set of virtual memory operations are associated with them.
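
As a rough picture of what such a mapping might produce, the sketch below lists one simplified area descriptor per part of an image. The field names, addresses and the three-area split are assumptions made for illustration and are much simpler than the real mm_struct and vm_area_struct definitions.

/* Simplified stand-ins for mm_struct and vm_area_struct; addresses are arbitrary. */
#include <stdio.h>

struct area {                        /* stand-in for vm_area_struct */
    const char   *name;
    unsigned long start, end;
    int           readable, writable, executable;
};

struct mm {                          /* stand-in for mm_struct */
    struct area areas[3];
    int         count;
};

int main(void)
{
    struct mm mm = {
        .areas = {
            { "text (code)",        0x08048000, 0x08060000, 1, 0, 1 },
            { "initialized data",   0x08060000, 0x08070000, 1, 1, 0 },
            { "uninitialized data", 0x08070000, 0x08080000, 1, 1, 0 },
        },
        .count = 3,
    };
    for (int i = 0; i < mm.count; i++)
        printf("%-20s 0x%08lx-0x%08lx %c%c%c\n", mm.areas[i].name,
               mm.areas[i].start, mm.areas[i].end,
               mm.areas[i].readable   ? 'r' : '-',
               mm.areas[i].writable   ? 'w' : '-',
               mm.areas[i].executable ? 'x' : '-');
    return 0;
}
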
The Linux Page Cache

Figure: The Linux Page Cache


The role of the Linux page cache is to speed up access to files on disk. Memory mapped files are
read a page at a time and these pages are stored in the page cache. The figure above shows that the
page cache consists of the page_hash_table, a vector of pointers to mem_map_t data
structures.
Each file in Linux is identified by a VFS inode data structure (described previously). Each VFS
inode is unique and fully describes one and only one file. The index into the page table is derived
from the file's VFS inode and the offset into the file.

Whenever a page is read from a memory mapped file, for example when it needs to be brought back
into memory during demand paging, the page is read through the page cache. If the page is present
in the cache, a pointer to the mem_map_t data structure representing it is returned to the page fault
handling code. Otherwise the page must be brought into memory from the file system that holds the
image. Linux allocates a physical page and reads the page from the file on disk.
If it is possible, Linux will initiate a read of the next page in the file. This single page read ahead
means that if the process is accessing the pages in the file serially, the next page will be waiting in
memory for the process.
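
A minimal sketch of such a lookup, keyed by an inode number and a page-aligned file offset and hashed into a table of cached pages, is shown below. The hash function, structure layout and names are invented for illustration and are not the kernel's page_hash_table code; error handling is omitted.

/* A toy page cache: lookup keyed by (inode, page-aligned offset). */
#include <stdio.h>
#include <stdlib.h>

#define PAGE_SIZE 4096u
#define HASH_SIZE 64

struct cached_page {
    unsigned long       inode;       /* identifies the file             */
    unsigned long       offset;      /* page-aligned offset in the file */
    struct cached_page *next;        /* chain for colliding entries     */
};

static struct cached_page *page_hash_table[HASH_SIZE];

static unsigned hash(unsigned long inode, unsigned long offset)
{
    return (unsigned)((inode ^ (offset / PAGE_SIZE)) % HASH_SIZE);
}

static struct cached_page *find_page(unsigned long inode, unsigned long offset)
{
    for (struct cached_page *p = page_hash_table[hash(inode, offset)]; p; p = p->next)
        if (p->inode == inode && p->offset == offset)
            return p;
    return NULL;                     /* not cached: must be read from disk */
}

static void add_page(unsigned long inode, unsigned long offset)
{
    unsigned h = hash(inode, offset);
    struct cached_page *p = malloc(sizeof(*p));
    p->inode  = inode;
    p->offset = offset;
    p->next   = page_hash_table[h];
    page_hash_table[h] = p;
}

int main(void)
{
    add_page(42, 0);                              /* cache the first page of inode 42 */
    printf("page 0 of inode 42 %s\n", find_page(42, 0) ? "cached" : "missing");
    printf("page 1 of inode 42 %s\n", find_page(42, PAGE_SIZE) ? "cached" : "missing");
    return 0;
}
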
Over time the page cache grows as images are read and executed. Pages will be removed from the
cache as they are no longer needed, say as an image is no longer being used by any process. As
Linux uses memory it can start to run low on physical pages. In this case Linux will reduce the size
of the page cache.
Reducing the Size of the Page and Buffer Caches
The pages held in the page and buffer caches are good candidates for being freed into the
free_area vector. The Page Cache, which contains pages of memory mapped files, may contain
unnecessary pages that are filling up the system's memory. Likewise the Buffer Cache, which
contains buffers read from or being written to physical devices, may also contain unneeded buffers.
When the physical pages in the system start to run out, discarding pages from these caches is
relatively easy as it requires no writing to physical devices (unlike swapping pages out of memory).
Discarding these pages does not have too many harmful side effects other than making access to
physical devices and memory mapped files slower. However, if the discarding of pages from these
caches is done fairly, all processes will suffer equally.
Every time the kernel swap daemon tries to shrink these caches it examines a block of pages in the
mem_map page vector to see if any can be discarded from physical memory. The size of the block
of pages examined is higher if the kernel swap daemon is intensively swapping; that is if the
number of free pages in the system has fallen dangerously low. The blocks of pages are examined in
a cyclical manner; a different block of pages is examined each time an attempt is made to shrink the
memory map. This is known as the clock algorithm as, rather like the minute hand of a clock, the
whole mem_map page vector is examined a few pages at a time.

Each page being examined is checked to see if it is cached in either the page cache or the buffer
cache. You should note that shared pages are not considered for discarding at this time and that a
page cannot be in both caches at the same time. If the page is not in either cache then the next page
in the mem_map page vector is examined.

Pages are cached in the buffer cache (or rather the buffers within the pages are cached) to make
buffer allocation and deallocation more efficient. The memory map shrinking code tries to free the
buffers that are contained within the page being examined.

If all the buffers are freed, then the pages that contain them are also freed. If the examined page
is in the Linux page cache, it is removed from the page cache and freed.
When enough pages have been freed on this attempt then the kernel swap daemon will wait until the
next time it is periodically awakened. As none of the freed pages were part of any process' virtual
memory (they were cached pages), then no page tables need updating. If there were not enough
cached pages discarded then the swap daemon will try to swap out some shared pages.

Caches
If you were to implement a system using the above theoretical model then it would work, but not
particularly efficiently. Both operating system and processor designers try hard to extract more
performance from the system. Apart from making the processors, memory and so on faster the best
approach is to maintain caches of useful information and data that make some operations faster.
Linux uses a number of memory management related caches:
Buffer Cache

The buffer cache contains data buffers that are used by the block device drivers.

These buffers are of fixed sizes (for example 512 bytes) and contain blocks of information
that have either been read from a block device or are being written to it. A block device is one
that can only be accessed by reading and writing fixed sized blocks of data. All hard disks are
block devices.

The buffer cache is indexed via the device identifier and the desired block number and is used
to quickly find a block of data. Block devices are only ever accessed via the buffer cache. If
data can be found in the buffer cache then it does not need to be read from the physical block
device, for example a hard disk, and access to it is much faster.

Page Cache

This is used to speed up access to images and data on disk. It is used to cache the logical
contents of a file a page at a time and is accessed via the file and offset within the file. As
pages are read into memory from disk, they are cached in the page cache.

Swap Cache

Only modified (or dirty) pages are saved in the swap file. So long as these pages are not
modified after they have been written to the swap file then the next time the page is swapped
out there is no need to write it to the swap file as the page is already in the swap file. Instead
the page can simply be discarded. In a heavily swapping system this saves many unnecessary
and costly disk operations.

Hardware Caches

One commonly implemented hardware cache is in the processor; a cache of Page Table
Entries. In this case, the processor does not always read the page table directly but instead
caches translations for pages as it needs them. These are the Translation Look-aside Buffers
and contain cached copies of the page table entries from one or more processes in the system.

When the reference to the virtual address is made, the processor will attempt to find a
matching TLB entry. If it finds one, it can directly translate the virtual address into a physical
one and perform the correct operation on the data. If the processor cannot find a matching
TLB entry then it must get the operating system to help. It does this by signalling the
operating system that a TLB miss has occurred. A system specific mechanism is used to
deliver that exception to the operating system code that can fix things up. The operating
system generates a new TLB entry for the address mapping. When the exception has been
cleared, the processor will make another attempt to translate the virtual address. This time it
will work because there is now a valid entry in the TLB for that address.

The drawback of using caches, hardware or otherwise, is that in order to save effort Linux must use
more time and space maintaining these caches and, if the caches become corrupted, the system will
crash.

Swapping Out and Discarding Pages


When physical memory becomes scarce the Linux memory management subsystem must attempt to
free physical pages. This task falls to the kernel swap daemon (kswapd).
The kernel swap daemon is a special type of process, a kernel thread. Kernel threads are processes
that have no virtual memory, instead they run in kernel mode in the physical address space. The
kernel swap daemon is slightly misnamed in that it does more than merely swap pages out to the
system's swap files. Its role is to make sure that there are enough free pages in the system to keep the
memory management system operating efficiently.
The Kernel swap daemon (kswapd) is started by the kernel init process at startup time and sits
waiting for the kernel swap timer to periodically expire.
Every time the timer expires, the swap daemon looks to see if the number of free pages in the
system is getting too low. It uses two variables, free_pages_high and free_pages_low to decide if it
should free some pages. So long as the number of free pages in the system remains above
free_pages_high, the kernel swap daemon does nothing; it sleeps again until its timer next expires.
For the purposes of this check the kernel swap daemon takes into account the number of pages
currently being written out to the swap file. It keeps a count of these in nr_async_pages, which is
incremented each time a page is queued waiting to be written out to the swap file and decremented
when the write to the swap device has completed. free_pages_low and free_pages_high are set at
system startup time and are related to the number of physical pages in the system. If the number of
free pages in the system has fallen below free_pages_high or worse still free_pages_low, the kernel
swap daemon will try three ways to reduce the number of physical pages being used by the system:
Reducing the size of the buffer and page caches,
Swapping out System V shared memory pages,
Swapping out and discarding pages.

If the number of free pages in the system has fallen below free_pages_low, the kernel swap daemon
will try to free 6 pages before it next runs. Otherwise it will try to free 3 pages. Each of the above
methods are tried in turn until enough pages have been freed. The kernel swap daemon remembers
which method it used the last time that it attempted to free physical pages. Each time it runs it will
start trying to free pages using this last successful method.
After it has freed sufficient pages, the swap daemon sleeps again until its timer expires. If the reason
that the kernel swap daemon freed pages was that the number of free pages in the system had fallen
below free_pages_low, it only sleeps for half its usual time. Once the number of free pages is more
than free_pages_low the kernel swap daemon goes back to sleeping longer between checks.
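
The decision the swap daemon is described as making can be summarized by the small sketch below: compare the free page count (plus pages already queued for write-out) against the two thresholds and choose how aggressively to free memory. The numbers are made up for illustration.

/* A sketch of the kswapd threshold check; all values are invented. */
#include <stdio.h>

static int nr_free_pages   = 180;
static int nr_async_pages  = 10;     /* already queued to be written to swap */
static int free_pages_low  = 128;
static int free_pages_high = 256;

int main(void)
{
    /* pages on their way out to swap are counted as good as free here */
    int effectively_free = nr_free_pages + nr_async_pages;

    if (effectively_free >= free_pages_high)
        printf("plenty of memory: sleep until the timer next expires\n");
    else if (effectively_free >= free_pages_low)
        printf("getting low: try to free 3 pages\n");
    else
        printf("dangerously low: try to free 6 pages and sleep half as long\n");
    return 0;
}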

Swapping Out and Discarding Pages


The swap daemon looks at each process in the system in turn to see if it is a good candidate for
swapping. Good candidates are processes that can be swapped (some cannot) and that have one or
more pages which can be swapped or discarded from memory. Pages are swapped out of physical
memory into the system's swap files only if the data in them cannot be retrieved another way.
A lot of the contents of an executable image come from the image's file and can easily be re-read
from that file. For example, the executable instructions of an image will never be modified by the
image and so will never be written to the swap file. These pages can simply be discarded; when they
are again referenced by the process, they will be brought back into memory from the executable
image. Once the process to swap has been located, the swap daemon looks through all of its virtual
memory regions looking for areas which are not shared or locked. Linux does not swap out all of
the swappable pages of the process that it has selected. Instead it removes only a small number of
pages. Pages cannot be swapped or discarded if they are locked in memory.
The Linux swap algorithm uses page aging. Each page has a counter (held in the mem_map_t data
structure) that gives the Kernel swap daemon some idea whether or not a page is worth swapping.
Pages age when they are unused and rejuvenate on access; the swap daemon only swaps out old
pages. The default action when a page is first allocated is to give it an initial age of 3. Each time it
is touched, its age is increased by 3 to a maximum of 20. Every time the Kernel swap daemon runs
it ages pages, decrementing their age by 1. These default actions can be changed and for this reason
they (and other swap related information) are stored in the swap_control data structure. If the
page is old (age = 0), the swap daemon will process it further. Dirty pages are pages which can be
swapped out. Linux uses an architecture specific bit in the PTE to describe pages this way.
However, not all dirty pages are necessarily written to the swap file. Every virtual memory region of
a process may have its own swap operation (pointed at by the vm_ops pointer in the
vm_area_struct) and that method is used. Otherwise, the swap daemon will allocate a page in
the swap file and write the page out to that device.
The page's page table entry is replaced by one which is marked as invalid but which contains
information about where the page is in the swap file. This is an offset into the swap file where the
page is held and an indication of which swap file is being used. Whatever the swap method used,
the original physical page is made free by putting it back into the free_area. Clean (or rather not
dirty) pages can be discarded and put back into the free_area for re-use.
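
The aging rules described above can be captured in a few lines, shown below as a small simulation: a new page starts at age 3, each access adds 3 up to a cap of 20, each pass of the swap daemon subtracts 1, and a page that reaches age 0 becomes a candidate for swapping or discarding. This is only a model of the policy, not the kernel's code.

/* A toy simulation of the page aging policy described above. */
#include <stdio.h>

#define INITIAL_AGE 3
#define TOUCH_BONUS 3
#define MAX_AGE     20

static int age = INITIAL_AGE;

static void touch(void)
{
    age += TOUCH_BONUS;                  /* page was accessed: it gets younger */
    if (age > MAX_AGE)
        age = MAX_AGE;
}

static void swap_daemon_pass(void)
{
    if (age > 0)
        age--;                           /* each pass of the swap daemon ages the page */
}

int main(void)
{
    touch();
    touch();                             /* a busy page climbs to age 9 */
    printf("after two touches, age = %d\n", age);

    int passes = 0;
    while (age > 0) {
        swap_daemon_pass();              /* the page falls idle and ages away */
        passes++;
    }
    printf("after %d idle passes the page is a candidate for swap out\n", passes);
    return 0;
}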

If enough of the swappable process' pages have been swapped out or discarded, the swap daemon
will again sleep. The next time it wakes it will consider the next process in the system. In this way,
the swap daemon nibbles away at each process' physical pages until the system is again in balance.
This is much fairer than swapping out whole processes.

Swapping Out System V Shared Memory Pages


System V shared memory is an inter-process communication mechanism which allows two or more
processes to share virtual memory in order to pass information amongst themselves. How processes
share memory in this way is described in more detail in the section on inter-process communication.
For now it is enough to say that each area of System V shared memory is described by a
shmid_ds data structure. This contains a pointer to a list of vm_area_struct data structures,
one for each process sharing this area of virtual memory. The vm_area_struct data structures
describe where in each process' virtual memory this area of System V shared memory goes. Each
vm_area_struct data structure for this System V shared memory is linked together using the
vm_next_shared and vm_prev_shared pointers. Each shmid_ds data structure also
contains a list of page table entries each of which describes the physical page that a shared virtual
page maps to.
The kernel swap daemon also uses a clock algorithm when swapping out System V shared memory
pages. Each time it runs it remembers which page of which shared virtual memory area it last
swapped out. It does this by keeping two indices: the first is an index into the set of shmid_ds data
structures, the second into the list of page table entries for this area of System V shared memory.
This makes sure that it fairly "victimizes" the areas of System V shared memory.
The kernel swap daemon must modify the page table of every process sharing an area of virtual
memory to reflect that a page has been moved from memory to the swap file because the physical
page frame number for any given virtual page of System V shared memory is contained in the page
tables of all of the processes sharing this area of virtual memory. For each shared page it is
swapping out, the kernel swap daemon finds the page table entry in each of the sharing process'
page tables (by following a pointer from each vm_area_struct data structure). If this process'
page table entry for this page of System V shared memory is valid, it converts it into an invalid but
swapped out page table entry and reduces this (shared) page's count of users by one. The format of a
swapped out System V shared page table entry contains an index into the set of shmid_ds data
structures and an index into the page table entries for this area of System V shared memory.
If the page's count is zero after the page tables of the sharing processes have all been modified, the
shared page can be written out to the swap file. The page table entry in the list pointed at by the
shmid_ds data structure for this area of System V shared memory is replaced by a swapped out
page table entry. A swapped out page table entry is invalid but contains an index into the set of open
swap files and the offset in that file where the swapped out page can be found. This information will
be used when the page has to be brought back into physical memory.

Swapping Pages In
The dirty pages saved in the swap files may be needed again, for example when an application
writes to an area of virtual memory whose contents are held in a swapped out physical page.
Accessing a page of virtual memory that is not held in physical memory causes a page fault to
occur. The page fault is the processor signalling the operating system that it cannot translate a
virtual address into a physical one. In this case this is because the page table entry describing this
page of virtual memory was marked as invalid when the page was swapped out. The processor
cannot handle the virtual to physical address translation and so hands control back to the operating
system describing as it does so the virtual address that faulted and the reason for the fault. The
format of this information and how the processor passes control to the operating system is processor
specific.
The processor specific page fault handling code must locate the vm_area_struct data structure
that describes the area of virtual memory that contains the faulting virtual address. It does this by
searching the vm_area_struct data structures for this process until it finds the one containing
the faulting virtual address. This is very time critical code and a process' vm_area_struct data
structures are so arranged as to make this search take as little time as possible.
Having carried out the appropriate processor specific actions and found that the faulting virtual
address is for a valid area of virtual memory, the page fault processing becomes generic and
applicable to all processors that Linux runs on.
The generic page fault handling code looks for the page table entry for the faulting virtual address.
If the page table entry it finds is for a swapped out page, Linux must swap the page back into
physical memory. The format of the page table entry for a swapped out page is processor specific
but all processors mark these pages as invalid and put the information necessary to locate the page
within the swap file into the page table entry. Linux needs this information in order to bring the
page back into physical memory.
At this point, Linux knows the faulting virtual address and has a page table entry containing
information about where this page has been swapped to. The vm_area_struct data structure
may contain a pointer to a routine which will swap any page of the area of virtual memory that it
describes back into physical memory. This is its swapin operation. If there is a swapin operation for
this area of virtual memory then Linux will use it. This is, in fact, how swapped out System V
shared memory pages are handled, as they require special handling because the format of a swapped
out System V shared page is a little different from that of an ordinary swapped out page. There may
not be a swapin operation, in which case Linux will assume that this is an ordinary page that does
not need to be specially handled.
It allocates a free physical page and reads the swapped out page back from the swap file.
Information telling it where in the swap file (and which swap file) the page is held is taken from the
invalid page table entry.
If the access that caused the page fault was not a write access then the page is left in the swap cache
and its page table entry is not marked as writable. If the page is subsequently written to, another
page fault will occur and, at that point, the page is marked as dirty and its entry is removed from the
swap cache. If the page is not written to and it needs to be swapped out again, Linux can avoid the
write of the page to its swap file because the page is already in the swap file.
If the access that caused the page to be brought in from the swap file was a write operation, this
page is removed from the swap cache and its page table entry is marked as both dirty and writable.

The Swap Cache


When swapping pages out to the swap files, Linux avoids writing pages if it does not have to. There
are times when a page is both in a swap file and in physical memory. This happens when a page that
was swapped out of memory was then brought back into memory when it was again accessed by a
process. So long as the page in memory is not written to, the copy in the swap file remains valid.
Linux uses the swap cache to track these pages. The swap cache is a list of page table entries, one
per physical page in the system. This is a page table entry for a swapped out page and describes
which swap file the page is being held in together with its location in the swap file. If a swap cache
entry is non-zero, it represents a page which is being held in a swap file that has not been modified.
If the page is subsequently modified (by being written to), its entry is removed from the swap cache.
When Linux needs to swap a physical page out to a swap file it consults the swap cache and, if there
is a valid entry for this page, it does not need to write the page out to the swap file. This is because
the page in memory has not been modified since it was last read from the swap file.
The entries in the swap cache are page table entries for swapped out pages. They are marked as
invalid but contain information which allow Linux to find the right swap file and the right page
within that swap file.
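
A sketch of the shortcut this gives the swap daemon appears below: if a page is unmodified and an entry for it still exists in the swap cache, the page can simply be dropped rather than written out again. The structures are invented for illustration.

/* A schematic swap cache check before writing a page out. */
#include <stdio.h>
#include <stdbool.h>

struct page {
    bool          dirty;            /* modified since it was last read in?       */
    unsigned long swap_entry;       /* non-zero: location of a copy in swap file */
};

static void swap_out(struct page *p)
{
    if (!p->dirty && p->swap_entry != 0)
        printf("copy in the swap file is still valid: discard page, no disk write\n");
    else
        printf("write the page to the swap file, then free it\n");
}

int main(void)
{
    struct page reread  = { .dirty = false, .swap_entry = 0x1234 };
    struct page changed = { .dirty = true,  .swap_entry = 0x1234 };
    swap_out(&reread);
    swap_out(&changed);
    return 0;
}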

Processes
Processes carry out tasks within the operating system. A program is a set of machine code
instructions and data stored in an executable image on disk and is, as such, a passive entity; a
process can be thought of as a computer program in action.
A process is a dynamic entity, constantly changing as the machine code instructions are executed by
the processor. As well as the program's instructions and data, the process also includes the program
counter and all of the CPU's registers as well as the process stacks containing temporary data such as
routine parameters, return addresses and saved variables. In short a process is an executing program
encompassing all of the current activity in the microprocessor. Linux is a multiprocessing operating
system. Each process is a separate task with its own rights and responsibilities. If one process
crashes it will not cause another process in the system to crash. Each individual process runs in its
own virtual address space and is not capable of interacting with another process except through
secure, kernel-managed mechanisms.
During the lifetime of a process it will use many system resources. It will use the CPUs in the
system to run its instructions and the system's physical memory to hold it and its data. It will open
and use files within the filesystems and may directly or indirectly use the physical devices in the
system. Linux must keep track of the process and its system resources to fairly manage it and the
other processes in the system. It would not be fair to the other processes in the system if one process
monopolized most of the system's physical memory or its CPUs.
The most precious resource in the system is the CPU, of which there is usually only one. Linux is a
multiprocessing operating system that maximizes CPU utilization by ensuring that there is a running
process on each CPU in the system at all times. If there are more processes than CPUs (and there
usually are), the rest of the processes must wait until a CPU becomes free before they can be run.
Multiprocessing is a simple idea; a process is executed until it must wait, usually for some system
resource. It may resume once the resource becomes available. In a single-tasking system such as DOS, the
CPU simply sits idly until the system resource becomes available, wasting the waiting time. In a
multiprocessing system many processes are kept in memory at the same time. Whenever a process
has to wait the operating system takes the CPU away from that process and gives it to another, more
deserving process. The Linux scheduler uses a number of scheduling strategies to ensure fairness
when deciding which process to run next. Linux supports a number of different executable file formats,
such as ELF and Java. These must be managed transparently, as must the process' use of the system's shared
libraries.
From the user's perspective, perhaps the most obvious aspect of a kernel is process management.
This is the part of the kernel that ensures that each process gets its turn to run on the CPU. This is
also the part that makes sure that the individual processes don't "trounce" on other processes by
writing to areas of memory that belong to someone else. To do this, the kernel keeps track of many
different structures that are maintained both on a per-user basis as well as systemwide.
As we talked about in the section on operating system basics, a process is the running instance of a
program (a program simply being the bytes on the disks). One of the most powerful aspects of
Linux is its ability not only to keep many processes in memory at once but also to switch between
them fast enough to make it appear as though they were all running at the same time. (Note: In
much of the Linux code, the references are to tasks, not to processes. Because the term process
seems to be more common in UNIX literature and I am used to that term, I will be using process.
However, there is no difference between a task and a process, so you can interchange them to your
heart's content.)
A process runs within its context. It is also common to say that the CPU is operating within the
context of a specific process. The context of a process is all of the characteristics, settings, values,
etc., that a particular program uses as it runs, as well as those that it needs to run. Even the internal
state of the CPU and the contents of all its registers are part of the context of the process. When a
process has finished having its turn on the CPU and another process gets to run, the act of changing
from one process to another is called a context switch. This is represented graphically by the figure
below.

We can say that a process' context is defined by two structures: its task structure (also called its
uarea or ublock in some operating system texts) and its process table entry. These contain the
necessary information to manage each process, such as the user ID (UID) of the process, the
group ID (GID), the system call error return value, and dozens of other things. To see where it is all
kept (that is, the structure of the task structure), see the task_struct in <linux/sched.h>.
There is a special part of the kernel's private memory that holds the task structure of the currently
running process. When a context switch occurs, the task structure is switched out. All other parts of
the process remain where they are. The task structure of the next process is copied into the same
place in memory as the task structure for the old process. This way the kernel does not have to make
any adjustments and knows exactly where to look for the task structure. It will always be able to
access the task structure of the currently running process by accessing the same area in memory.
This is the current process, which the kernel accesses through the current pointer (a pointer to a task_struct).
One piece of information that the process table entry (PTE) contains is the process' Local
Descriptor Table (LDT). A descriptor is a data structure the process uses to gain access to different parts
of the system (that is, different parts of memory or different segments). Despite a common
misunderstanding, Linux does use a segmented memory architecture. In older CPUs, segments were
a way to get around memory access limitations. By referring to memory addresses as offsets within
a given segment, more memory could be addressed than if memory were looked at as a single block.
The key difference with Linux is that each of these segments is 4GB and not the 64K they were
originally.
The descriptors are held in descriptor tables. The LDT keeps track of a process' segments, also called
regions. That is, these descriptors are local to the process. The Global Descriptor Table (GDT) keeps
track of the kernel's segments. Because there are many processes running, there will be many LDTs.
These are part of the process' context. However, there is only one GDT, as there is only one kernel.
Within the task structure is a pointer to another key aspect of a process' context: its Task State
Segment (TSS). The TSS contains all the registers in the CPU. The contents of all the registers
define the state in which the CPU is currently running. In other words, the registers say what a
given process is doing at any given moment. Keeping track of these registers is vital to the concept
of multitasking.
By saving the registers in the TSS, you can reload them when this process gets its turn again and
continue where you left off because all of the registers are reloaded to their previous value. Once
reloaded, the process simply starts over where it left off as though nothing had happened.
This brings up two new issues: system calls and stacks. A system call is a programming term for a
very low-level function, one that is "internal" to the operating system and that is used to
access the internals of the operating system, such as the device drivers that ultimately access the
hardware. Compare this to library calls, which are higher-level functions that are usually built on top of system calls.
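To make the distinction concrete, here is a minimal sketch in C. It is only an illustration: write() is a genuine system call (reached through its C library wrapper), while printf() is a library call that is ultimately implemented on top of write().

#include <stdio.h>      /* printf(): a library call */
#include <unistd.h>     /* write(): the wrapper for a system call */

int main(void)
{
    /* Library call: formats and buffers the text, and eventually
       issues a write() system call on file descriptor 1 itself. */
    printf("hello via the C library\n");

    /* System call: asks the kernel directly to write 24 bytes to
       file descriptor 1 (standard output). Because printf() is
       buffered, the two lines may appear in either order. */
    write(1, "hello via a system call\n", 24);

    return 0;
}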
A stack is a means of keeping track of where a process has been. Like a stack of plates, objects are
pushed onto the stack and popped off the stack. Therefore, objects that are pushed onto the stack are
then popped off in reverse. When calling routines, certain values are pushed onto the stack for safe-
keeping, including the variables to be passed to the function and the location to which the system
should return after completing the function. When returning from that routine, these values are
retrieved by being popped off the stack.
Part of the task structure is a pointer to that process' entry in the process table. The process table, as
its name implies, is a table containing information about all the processes on the system, whether
that process is currently running or not. Each entry in the process table is defined in
<linux/sched.h>. The principle that a process may be in memory but not actually running is
important and I will get into more detail about the life of a process shortly.
The size of this table is a set value and is determined by the kernel parameter NR_TASKS. Though
you could change this value, you need to build a new kernel and reboot for the change to take effect.
If there is a runaway process that keeps creating more and more processes or if you simply have a very
busy system, it is possible that the process table will fill up. If it were to fill up, root would be
unable to even stop them because it needs to start a new process to do so (even if root were logged
in already). The nice thing is that there is a set number of processes reserved for root. This is
defined by MIN_TASKS_LEFT_FOR_ROOT in <linux/tasks.h>. On my system, this defaults to 4.
Just how is a process created? First, one process uses the fork() system call. Like a fork in the road,
the fork() system call starts off as a single entity and then splits into two. When one process uses the
fork() system call, an exact copy of itself is created in memory and the task structures are essentially
identical. However, memory is not copied; instead, the pages are shared between the two processes and marked copy-on-write, a technique described later in this chapter.
The value in each CPU register is the same, so both copies of this process are at the exact same
place in their code. Each of the variables also has the exact same value. There are two exceptions:
the process ID number and the return value of the fork() system call. (You can see the details of the
fork() system call in kernel/fork.c.) You can see how the fork()-exec() pair looks graphically in the figure
below:
Like users and their UID, each process is referred to by its process ID number, or PID, which is a
unique number. Although your system could have approximately 32K processes at a time, on even
the busiest systems it rarely gets that high.
You may, however, find a very large PID on your system (running ps, for example). This does not
mean that there are actually that many processes. Instead, it demonstrates the fact that the system
does not immediately re-use the PID. This is to prevent a "race condition", for example where one
process sends a signal (message) to another process, but before the message arrives, the other
process has stopped and its PID has been re-used by a new process. The result is that the wrong process could get the message.
When a fork() system call is made, the value returned by the fork() to the calling process is the PID
of the newly created process. Because the new copy didn't actually make the fork() call, the return
value in the copy is 0. This is how a process spawns or forks a child process. The process that called
the fork() is the parent process of this new process, which is the child process. Note that I
intentionally said the parent process and a child process. A process can fork many child processes,
but the process has only one parent. Almost always, a program will keep track of that return value
and will then change its behavior based on that value. It is very common for the child to issue an
exec() system call. Although it takes the fork() system call to create the space that will be utilized by
the new process, it is the exec() system call that causes this space to be overwritten with the new
program.
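As a sketch of this fork()-exec() pattern, consider the following C fragment; the choice of /bin/date as the program to run is just an example. The parent sees the child's PID as the return value of fork(), while the child sees 0, exactly as described above.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    pid_t pid = fork();               /* one process goes in, two come out */

    if (pid < 0) {                    /* fork() failed; no child was created */
        perror("fork");
        exit(1);
    } else if (pid == 0) {            /* return value 0: this is the child */
        execl("/bin/date", "date", (char *) NULL);
        perror("execl");              /* only reached if the exec failed */
        exit(1);
    } else {                          /* non-zero return value: the child's PID */
        printf("parent %d created child %d\n", (int) getpid(), (int) pid);
        wait(NULL);                   /* wait for the child to terminate */
    }
    return 0;
}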
At the beginning of every executable program is an area simply called the "header." This header
describes the contents of the file; that is, how the file is to be interpreted. The header contains the
locations of the text and data segments. As we talked about before, a segment is a portion of the
program. The portion of the program that contains the executable instructions is called the text
segment. The portion containing pre-initialized data is the data segment. Pre-initialized data are
variables, structures, arrays, etc. that have their value already set even before the program is run.
The process is given descriptors for each of the segments.
In contrast to other operating systems running on Intel-based CPUs, Linux has only one segment
each for the text, data, and stack. I haven't mentioned the stack segment until now because the stack
segment is created when the process is created. Because the stack is used to keep track of where the
process has been and what it has done, there is no need to create it until the process starts.
Another segment that I haven't talked about until now is not always used. This is the shared data
segment. Shared data is an area of memory that is accessible by more than one process. Do you
remember from our discussion on operating system basics when I said that part of the job of the
operating system was to keep processes from accessing areas of memory that they weren't supposed
to? So, what if they need to? What if they are supposed to? That is where the shared data region
comes in.
If one process tells the other where the shared memory segment is (by giving a pointer to it), then
any process can access it. The way to keep unwanted processes away is simply not to tell them. In
this way, each process that is allowed can use the data and the segment only goes away when that
last process disappears. Figure 0-3 shows how several processes would look in memory.

In the figure above, we see three processes. In all three instances, each process has its own data and
stack segments. However, process A and process B share a text segment. That is, process A and
process B have called the same executable off the hard disk. Therefore, they are sharing the same
instructions. Note that in reality, this is much more complicated because the two processes may
not be executing the exact same instructions at any given moment.
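To make the shared data segment described above a little more concrete, here is a minimal sketch using the System V shared memory calls (covered later in the section on IPC). The key 0x1234 and the 4096-byte size are arbitrary values chosen for the example.

#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int main(void)
{
    /* Create (or find) a 4096-byte shared segment identified by the key. */
    int shmid = shmget(0x1234, 4096, IPC_CREAT | 0600);
    if (shmid < 0) {
        perror("shmget");
        return 1;
    }

    /* Attach the segment to this process's address space. Any other
       process that attaches with the same key sees the same memory. */
    char *data = shmat(shmid, NULL, 0);
    if (data == (char *) -1) {
        perror("shmat");
        return 1;
    }

    strcpy(data, "visible to every process attached to this segment");
    printf("wrote: %s\n", data);

    shmdt(data);                      /* detach from the segment */
    shmctl(shmid, IPC_RMID, NULL);    /* mark the segment for removal */
    return 0;
}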
Each process has at least a text, data, and stack segment. In addition, each process is created in the
same way. An existing process will (normally) use the fork()-exec() system call pair to create
another process. However, this brings up an interesting question, similar to "Who or what created
God?": If every process has to be created by another, who or what created the first process?
When the computer is turned on, it goes through some wild gyrations that we talk about in the
section on the boot process. At the end of the boot process, the system loads and executes the
/vmlinuz binary, the kernel itself. One of the last things the kernel does is "force" the creation of a
single process, which then becomes the great-grandparent of all the other processes.
The first created process is init, with a PID of 1. All other processes can trace their ancestry back to
init. It is init's job to read the entries in the file /etc/inittab and execute different programs. One
thing it does is start the getty program on all the login terminals, which eventually provides every
user with a shell.
Another system process is bdflush, the buffer flushing daemon. Its job is to clean out any "dirty"
buffers inside the system's buffer cache. A dirty buffer contains data that has been written to by a
program but hasn't yet been written to the disk. It is the job of bdflush to write this data out to the hard
disk at regular intervals. These intervals are 30 seconds for data buffers and 5 seconds
for metadata buffers. (Metadata is the data used to administer the file system, such as the
superblock.)
You may find on your system that two daemons are running, bdflush and update. Both are used to
write back blocks, but with slightly different functions. The update daemon writes back modified
blocks (including superblocks and inode tables) after a specific period of time to ensure that blocks
are not kept in memory too long without being written to the disk. On the other hand, bdflush writes
back a specific number of dirty block buffers. This keeps the ratio of dirty blocks to total blocks in
the buffer at a "safe" level.
All processes, including those I described above, operate in one of two modes: user or system mode
(see Figure 0-4 Process Modes). In the section on the CPU in the hardware chapter, I will talk about
the privilege levels. An Intel 80386 and later has four privilege levels, 0-3. Linux uses only the two
most extreme: 0 and 3. Processes running in user mode run at privilege level 3 within the CPU.
Processes running in system mode run at privilege level 0 (more on this in a moment).
In user mode, a process executes instructions from within its own text segment, references its own
data segment, and uses its own stack. Processes switch from user mode to kernel mode by making
system calls. Once in system mode, the instructions within the kernel's text segment are executed,
the kernel's data segment is used, and a system stack is used within the process task structure.
Although the process goes through a lot of changes when it makes a system call, keep in mind that
this is not a context switch. It is still the same process but it is just operating at a higher privilege.

Linux Processes
So that Linux can manage the processes in the system, each process is represented by a
task_struct data structure (task and process are terms that Linux uses interchangeably). The
task vector is an array of pointers to every task_struct data structure in the system.

This means that the maximum number of processes in the system is limited by the size of the task
vector; by default it has 512 entries. As processes are created, a new task_struct is allocated
from system memory and added into the task vector. To make it easy to find, the currently running
process is pointed to by the current pointer.

As well as the normal type of process, Linux supports real time processes. These processes have to
react very quickly to external events (hence the term "real time") and they are treated differently
from normal user processes by the scheduler. Although the task_struct data structure is quite
large and complex, its fields can be divided into a number of functional areas:
State
As a process executes it changes state according to its circumstances. Linux processes have
the following states:
Running
The process is either running (it is the current process in the system) or it is ready to run
(it is waiting to be assigned to one of the system's CPUs).
Waiting
The process is waiting for an event or for a resource. Linux differentiates between two
types of waiting process; interruptible and uninterruptible. Interruptible waiting
processes can be interrupted by signals whereas uninterruptible waiting processes are
waiting directly on hardware conditions and cannot be interrupted under any
circumstances.
Stopped
The process has been stopped, usually by receiving a signal. A process that is being
debugged can be in a stopped state.
Zombie
This is a halted process which, for some reason, still has a task_struct data
structure in the task vector. It is what it sounds like, a dead process.

Scheduling Information
The scheduler needs this information in order to fairly decide which process in the system
most deserves to run.

Identifiers
Every process in the system has a process identifier. The process identifier is not an index into
the task vector; it is simply a number. Each process also has user and group identifiers,
which are used to control this process's access to the files and devices in the system.

Inter-Process Communication
Linux supports the classic UNIX IPC mechanisms of signals, pipes and semaphores and also
the System V IPC mechanisms of shared memory, semaphores and message queues. The IPC
mechanisms supported by Linux are described in the section on IPC.

Links
In a Linux system no process is independent of any other process. Every process in the
system, except the initial process, has a parent process. New processes are not created from scratch; they are
copied, or rather cloned from previous processes. Every task_struct representing a
process keeps pointers to its parent process and to its siblings (those processes with the same
parent process) as well as to its own child processes. You can see the family relationship
between the running processes in a Linux system using the pstree command:

init(1)-+-crond(98)
|-emacs(387)
|-gpm(146)
|-inetd(110)
|-kerneld(18)
|-kflushd(2)
|-klogd(87)
|-kswapd(3)
|-login(160)---bash(192)---emacs(225)
|-lpd(121)
|-mingetty(161)
|-mingetty(162)
|-mingetty(163)
|-mingetty(164)
|-login(403)---bash(404)---pstree(594)
|-sendmail(134)
|-syslogd(78)
`-update(166)
Additionally all of the processes in the system are held in a doubly linked list whose root is
the init processes task_struct data structure. This list allows the Linux kernel to look
at every process in the system. It needs to do this to provide support for commands such as ps
or kill.

Times and Timers


The kernel keeps track of a process's creation time as well as the CPU time that it consumes
during its lifetime. Each clock tick, the kernel updates the amount of time in jiffies that
the current process has spent in system and in user mode. Linux also supports process-specific
interval timers; processes can use system calls to set up timers to send signals to themselves
when the timers expire. These timers can be single-shot or periodic timers.

File system
Processes can open and close files as they wish and the process's task_struct contains
pointers to descriptors for each open file as well as pointers to two VFS inodes. Each VFS
inode uniquely describes a file or directory within a file system and also provides a uniform
interface to the underlying file systems. How file systems are supported under Linux is
described in the section on the filesystems. The first of the two inodes is the root of the process (its root
directory) and the second is its current or pwd directory. pwd is derived from the UNIX
command pwd, print working directory. These two VFS inodes have their count fields

incremented to show that one or more processes are referencing them. This is why you cannot
delete a directory while a process has it, or one of its sub-directories, set as its pwd
directory.

Virtual memory
Most processes have some virtual memory (kernel threads and daemons do not) and the Linux
kernel must track how that virtual memory is mapped onto the system's physical memory.

Processor Specific Context


A process could be thought of as the sum total of the system's current state. Whenever a
process is running it is using the processor's registers, stacks and so on. This is the process's
context and, when a process is suspended, all of that CPU specific context must be saved in
the task_struct for the process. When a process is restarted by the scheduler its context is
restored from here.

Executing Programs
In Linux, as in Unix, programs and commands are normally executed by a command interpreter. A
command interpreter is a user process like any other process and is called a shell.

There are many shells in Linux, some of the most popular are sh, bash and tcsh. With the
exception of a few built in commands, such as cd and pwd, a command is an executable binary file.
For each command entered, the shell searches the directories in the process's search path, held in
the PATH environment variable, for an executable image with a matching name. If the file is found
it is loaded and executed. The shell clones itself using the fork mechanism described above and then
the new child process replaces the binary image that it was executing, the shell, with the contents of
the executable image file just found. Normally the shell waits for the command to complete, or
rather for the child process to exit. You can cause the shell to run again by pushing the child process
to the background by typing control-Z, which causes a SIGSTOP signal to be sent to the child
process, stopping it. You then use the shell command bg to push it into the background; the shell
sends it a SIGCONT signal to restart it, and it will stay in the background until either it ends or it needs to do
terminal input or output.
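The fork-and-exec behaviour of a shell can be sketched in a few lines of C. This toy loop handles only single-word commands and has no job control, but it shows the same pattern the text describes: the shell clones itself and the child replaces its image with the command.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    char line[256];

    for (;;) {
        printf("toy-shell> ");
        fflush(stdout);
        if (fgets(line, sizeof(line), stdin) == NULL)
            break;                            /* end of input: leave the shell */
        line[strcspn(line, "\n")] = '\0';     /* strip the trailing newline */
        if (line[0] == '\0')
            continue;

        pid_t pid = fork();                   /* clone the shell */
        if (pid < 0) {
            perror("fork");
            break;
        }
        if (pid == 0) {
            /* Child: replace the cloned shell with the command.
               execlp() searches the PATH, much as a real shell does. */
            execlp(line, line, (char *) NULL);
            perror(line);                     /* only reached if the exec failed */
            _exit(127);
        }
        waitpid(pid, NULL, 0);                /* parent: wait for the command to exit */
    }
    return 0;
}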
An executable file can have many formats or even be a script file. Script files have to be recognized
and the appropriate interpreter run to handle them; for example /bin/sh interprets shell scripts.
Executable object files contain executable code and data together with enough information to allow
the operating system to load them into memory and execute them. The most commonly used object
file format used by Linux is ELF but, in theory, Linux is flexible enough to handle almost any
object file format.

Figure: Registered Binary Formats


As with file systems, the binary formats supported by Linux are either built into the kernel at kernel
build time or available to be loaded as modules. The kernel keeps a list of supported binary formats
(see the figure above) and when an attempt is made to execute a file, each binary format is tried in turn
until one works.
Commonly supported Linux binary formats are a.out and ELF. Executable files do not have to be
read completely into memory, a technique known as demand loading is used. As each part of the
executable image is used by a process it is brought into memory. Unused parts of the image may be
discarded from memory.

ELF
The ELF (Executable and Linkable Format) object file format, designed by the Unix System
Laboratories, is now firmly established as the most commonly used format in Linux. Whilst there is
a slight performance overhead when compared with other object file formats such as ECOFF and
a.out, ELF is felt to be more flexible. ELF executable files contain executable code, sometimes
referred to as text, and data. Tables within the executable image describe how the program should be
placed into the process's virtual memory. Statically linked images are built by the linker (ld), or link
editor, into one single image containing all of the code and data needed to run this image. The
image also specifies the layout in memory of this image and the address in the image of the first
code to execute.
Consider, as an example, a simple C program that prints ``hello world'' and then exits, compiled into a statically linked image. The header describes it as an ELF
image with two physical headers (e_phnum is 2) starting 52 bytes (e_phoff) from the start of the
image file. The first physical header describes the executable code in the image. It goes at virtual
address 0x8048000 and there are 65532 bytes of it. This is because it is a statically linked image
which contains all of the library code for the printf() call to output ``hello world''. The entry
point for the image, the first instruction for the program, is not at the start of the image but at virtual
address 0x8048090 (e_entry). The code starts immediately after the second physical header. This
physical header describes the data for the program and is to be loaded into virtual memory at
address 0x8059BB8. This data is both readable and writeable. You will notice that the size of the
data in the file is 2200 bytes (p_filesz) whereas its size in memory is 4248 bytes. This is because
the first 2200 bytes contain pre-initialized data and the next 2048 bytes contain data that will be
initialized by the executing code. When Linux loads an ELF executable image into the process's
virtual address space, it does not actually load the image.
It sets up the virtual memory data structures, the process's vm_area_struct tree and its page
tables. When the program is executed page faults will cause the program's code and data to be
fetched into physical memory. Unused portions of the program will never be loaded into memory.
Once the ELF binary format loader is satisfied that the image is a valid ELF executable image it
flushes the process's current executable image from its virtual memory. As this process is a cloned
image (all processes are), this old image is the program that the parent process was executing, for
example the command interpreter shell such as bash. This flushing of the old executable image
discards the old virtual memory data structures and resets the process's page tables. It also clears
away any signal handlers that were set up and closes any files that are open. At the end of the flush
the process is ready for the new executable image. No matter what format the executable image is,
the same information gets set up in the process's mm_struct. There are pointers to the start and
end of the image's code and data. These values are found as the ELF executable image's physical
headers are read and the sections of the program that they describe are mapped into the process's
virtual address space. That is also when the vm_area_struct data structures are set up and the
process's page tables are modified. The mm_struct data structure also contains pointers to the
parameters to be passed to the program and to this process's environment variables.
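If you have the GNU binutils installed, you can inspect these headers on a program of your own with the readelf command; /bin/date is used here purely as an example binary. For instance,
readelf -h /bin/date
prints the ELF file header (including e_entry and e_phnum), and
readelf -l /bin/date
prints the program (physical) headers, showing where the code and data are to be placed in the process's virtual memory. The numbers will, of course, differ from the values quoted above.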

ELF Shared Libraries


A dynamically linked image, on the other hand, does not contain all of the code and data required to
run. Some of it is held in shared libraries that are linked into the image at run time. The ELF shared
library's tables are also used by the dynamic linker when the shared library is linked into the image
at run time. Linux uses several dynamic linkers, ld.so.1, libc.so.1 and ld-linux.so.1,
all to be found in /lib. The libraries contain commonly used code such as language subroutines.
Without dynamic linking, all programs would need their own copy of these libraries and would
need far more disk space and virtual memory. In dynamic linking, information is included in the
ELF image's tables for every library routine referenced. The information indicates to the dynamic
linker how to locate the library routine and link it into the program's address space.
Script Files
Script files are executables that need an interpreter to run them. There are a wide variety of
interpreters available for Linux; for example wish, perl and command shells such as tcsh. Linux
uses the standard UNIX convention of having the first line of a script file contain the name of the
interpreter. So, a typical script file would start:
#!/usr/bin/wish

The script binary loader tries to find the interpreter for the script.
It does this by attempting to open the executable file that is named in the first line of the script. If it
can open it, it has a pointer to its VFS inode and it can go ahead and have it interpret the script file.
The name of the script file becomes argument zero (the first argument) and all of the other
arguments move up one place (the original first argument becomes the new second argument and so
on). Loading the interpreter is done in the same way as Linux loads all of its executable files. Linux
tries each binary format in turn until one works. This means that you could in theory stack several
interpreters and binary formats making the Linux binary format handler a very flexible piece of
software.

Process Files

Figure: A Process's Files


The figure above shows that there are two data structures that describe file system specific
information for each process in the system. The first, the fs_struct, contains pointers to this
process's VFS inodes and its umask. The umask determines the default mode that new files will be created
with (it masks off permission bits), and it can be changed via system calls.
The second data structure, the files_struct, contains information about all of the files that this
process is currently using. Programs read from standard input and write to standard output. Any
error messages should go to standard error. These may be files, terminal input/output or a real
device but so far as the program is concerned they are all treated as files. Every file has its own
descriptor and the files_struct contains pointers to up to 256 file data structures, each one
describing a file being used by this process. The f_mode field describes what mode the file has
been opened in: read only, read and write, or write only. f_pos holds the position in the file where
the next read or write operation will occur. f_inode points at the VFS inode describing the file
and f_ops is a pointer to a vector of routine addresses; one for each function that you might wish
to perform on a file. There is, for example, a write data function. This abstraction of the interface is
very powerful and allows Linux to support a wide variety of file types. In Linux, pipes are
implemented using this mechanism as we shall see later.
Every time a file is opened, one of the free file pointers in the files_struct is used to point
to the new file structure. Linux processes expect three file descriptors to be open when they start.
These are known as standard input, standard output and standard error and they are usually
inherited from the creating parent process. All accesses to files are via standard system calls which
pass or return file descriptors. These descriptors are indices into the process's fd vector, so
standard input, standard output and standard error have file descriptors 0, 1 and 2. Each access to
the file uses the file data structure's file operation routines together with the VFS inode to
achieve its needs.
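A small C sketch shows that the three standard descriptors really are just ordinary entries in the process's fd vector; the message strings here are arbitrary.

#include <string.h>
#include <unistd.h>

int main(void)
{
    const char *out = "this goes to standard output (descriptor 1)\n";
    const char *err = "this goes to standard error (descriptor 2)\n";

    /* Descriptors 0, 1 and 2 are inherited from the parent process
       (normally the shell) and can be used like any other descriptor. */
    write(1, out, strlen(out));
    write(2, err, strlen(err));
    return 0;
}

Redirecting standard error in the shell, for example with 2>errors.log, sends only the second message to the file, because the redirection simply changes what descriptor 2 refers to before the program starts.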

Identifiers
Linux, like all UNIX systems, uses user and group identifiers to check for access rights to files and images
in the system. All of the files in a Linux system have ownerships and permissions; these permissions
describe what access the system's users have to that file or directory. Basic permissions are read,
write and execute and are assigned to three classes of user; the owner of the file, processes
belonging to a particular group and all of the processes in the system. Each class of user can have
different permissions, for example a file could have permissions which allow its owner to read and
write it, the file's group to read it and for all other processes in the system to have no access at all.
Groups are Linux's way of assigning privileges to files and directories for a group of users rather
than to a single user or to all processes in the system. You might, for example, create a group for all
of the users in a software project and arrange it so that only they could read and write the source
code for the project. A process can belong to several groups (a maximum of 32 is the default) and
these are held in the groups vector in the task_struct for each process. So long as a file has
access rights for one of the groups that a process belongs to then that process will have appropriate
group access rights to that file.
There are four pairs of user and group identifiers held in a process's task_struct:

uid, gid
The user identifier and group identifier of the user that the process is running on behalf of.
effective uid and gid
There are some programs which change the uid and gid from that of the executing process
into their own (held as attributes in the VFS inode describing the executable image). These
programs are known as setuid programs and they are useful because it is a way of restricting
accesses to services, particularly those that run on behalf of someone else, for example a
network daemon. The effective uid and gid are those from the setuid program and the uid and
gid remain as they were. The kernel checks the effective uid and gid whenever it checks for
privilege rights.
file system uid and gid
These are normally the same as the effective uid and gid and are used when checking file
system access rights. They are needed for NFS mounted filesystems where the user mode
NFS server needs to access files as if it were a particular process. In this case only the file
system uid and gid are changed (not the effective uid and gid). This avoids a situation where
malicious users could send a kill signal to the NFS server. Kill signals are delivered to
processes with a particular effective uid and gid.
saved uid and gid
These are mandated by the POSIX standard and are used by programs which change the
process's uid and gid via system calls. They are used to save the real uid and gid during the
time that the original uid and gid have been changed.
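A process can inspect its own identifiers with a handful of system calls. The sketch below prints the real and effective ids; run normally they will match, while running it as a setuid program would make them differ.

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>

int main(void)
{
    /* Real ids: who started the process.
       Effective ids: what the kernel checks when granting access. */
    printf("uid=%d euid=%d gid=%d egid=%d\n",
           (int) getuid(), (int) geteuid(),
           (int) getgid(), (int) getegid());
    return 0;
}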

The Life Cycle of Processes


From the time a process is created with a fork() until it has completed its job and disappears from
the process table, it goes through many different states. The state a process is in changes many times
during its "life." These changes can occur, for example, when the process makes a system call, it is
someone else's turn to run, an interrupt occurs, or the process asks for a resource that is currently
not available.
A commonly used model shows processes operating in one of six separate states, which you can
find in sched.h:
1. executing in user mode
2. executing in kernel mode
3. ready to run
4. sleeping
5. newly created, not ready to run, and not sleeping
6. issued exit system call (zombie)
The states listed here describe what is happening conceptually and do not indicate what "official"
state a process is in. The official states are listed below:
TASK_RUNNING           task (process) currently running
TASK_INTERRUPTIBLE     process is sleeping but can be woken up (interrupted)
TASK_UNINTERRUPTIBLE   process is sleeping but cannot be woken up (interrupted)
TASK_ZOMBIE            process terminated but its status was not collected (it was not waited for)
TASK_STOPPED           process stopped by a debugger or job control
TASK_SWAPPING          (removed in 2.3.x kernel)
Table - Process States in sched.h
In my list of states, there was no mention of a process actually being on the processor
(TASK_RUNNING). Processes that are running in kernel mode or in user mode are both in the
TASK_RUNNING state. Although there is no 1:1 match-up, I hope you'll see what each state means
as we go through the following description. You can see how this all looks graphically in the figure
below.
A newly created process enters the system in state 5. If the process is simply a copy of the original
process (a fork but no exec), it then begins to run in the state that the original process was in (1 or
2). (Why none of the other states? It has to be running to fork a new process.) If an exec() is made,
then this process will end up in kernel mode (2). It is possible that the fork()-exec() was done in
system mode and the process goes into state 1. However, this is highly unlikely.
When a process is running, an interrupt may be generated (more often than not, this is the system
clock) and the currently running process is pre-empted, putting it back into state 3. It goes into state 3 because
it is still ready to run and in main memory. The only difference is that the process was just kicked
off the processor.
When the process makes a system call while in user mode (1), it moves into state 2 where it begins
to run in kernel mode. Assume at this point that the system call made was to read a file on the hard
disk. Because the read is not carried out immediately, the process goes to sleep, waiting on the event
that the system has read the disk and the data is ready. It is now in state 4. When the data is ready,
the process is awakened. This does not mean it runs immediately, but rather it is once again ready to
run in main memory (3).
If a process that was asleep is awakened (perhaps when the data is ready), it moves from state 4
(sleeping) to state 3 (ready to run). This can be in either user mode (1) or kernel mode (2). A process
can end its life by either explicitly calling the exit() system call or having it called for it. The
exit() system call releases all the data structures that the process was using. One exception is the slot
in the process table, which is only cleaned up once the exit status has been collected by the parent process (or by init, if the parent has already exited).
The reason for hanging around is that the slot in the process table is used for the exit code of the
exiting process. This can be used by the parent process to determine whether the process did what it
was supposed to do or whether it ran into problems. The process shows that it has terminated by
putting itself into state 6, and it becomes a "zombie." Once here, it can never run again because
nothing exists other than the entry in the process table.
This is why you cannot "kill" a zombie process. There is nothing there to kill. To kill a process, you
need to send it a signal (more on signals later). Because there is nothing there to receive or process
that signal, trying to kill it makes little sense. The only thing to do is to let the system clean it up.
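The following C sketch shows the normal way a zombie disappears: the parent collects the exit code that was left behind in the process table. While the parent sleeps, running ps in another window would show the child in the Z (zombie) state.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    int status;
    pid_t pid = fork();

    if (pid == 0) {
        exit(42);                    /* child: terminate with an exit code */
    } else if (pid > 0) {
        sleep(5);                    /* the terminated child is now a zombie */
        waitpid(pid, &status, 0);    /* collect the exit code; the zombie is gone */
        if (WIFEXITED(status))
            printf("child %d exited with code %d\n",
                   (int) pid, WEXITSTATUS(status));
    }
    return 0;
}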
If the exiting process has any children, they are "inherited" by init. One value stored in the process
structure is the PID of that process' parent process. This value is (logically) referred to as the parent
process ID or PPID. When a process is inherited by init, the value of its PPID is changed to 1 (the
PID of init).
A process' state change can cause a context switch in several different cases. One case is when the
process voluntarily goes to sleep, which can happen when the process needs a resource that is not
immediately available. A very common example is your login shell. You type in a command, the
command is executed, and you are back to a shell prompt. Between the time the command is
finished and you input your next command, a very long time could pass - at least two or three
seconds.
Rather than constantly checking the keyboard for input, the shell puts itself to sleep while waiting
on an event. That event might be an interrupt from the keyboard to say "Hey! I have input for you."
When a process puts itself to sleep, it sleeps on a particular wait channel (WCHAN). When the
event that is associated with that wait channel occurs, every process waiting on that wait channel is
woken up. To find out what wait channels processes are waiting on for your system see the section
on system monitoring.
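For a quick look right now, the ps command will show you the wait channel: ps -l (or ps -el for every process on the system) includes a WCHAN column giving the name of the kernel routine each sleeping process is waiting in, while a running or runnable process typically shows a dash there instead.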
There is probably only one process waiting on input from your keyboard at any given time.
However, many processes could be waiting for data from the hard disk. If so, there might be dozens
of processes all waiting on the same wait channel. All are woken up when the hard disk is ready. It
may be that the hard disk has read only the data for a subset of the processes waiting. Therefore, if
the program is correctly written, the processes check to see whether their data is ready for them. If
not, they put themselves to sleep on the same wait channel.
When a process puts itself to sleep, it is voluntarily giving up the CPU. It may be that this process
had just started its turn when it noticed that it didn't have some resource it needed. Rather than
forcing other processes to wait until the first one gets its "fair share" of the CPU, that process is nice
and lets some other process have a turn on the CPU.
Because the process is being so nice to let others have a turn, the kernel will be nice to the process.
One thing the kernel allows is that a process that puts itself to sleep can set the priority at which it
will run when it wakes. Normally, the kernel process scheduling algorithm calculates the priorities
of all the processes. In exchange for voluntarily giving up the CPU, however, the process is allowed
to choose its own priority.

Process Scheduling
As in many dialects of UNIX, the Linux process scheduler is a function inside the kernel, not a separate
process. Actually, it's better to say that process scheduling is done by two functions working
together, both of which are a part of sched.c. The first function is schedule(), which does the actual
scheduling. The other is do_timer(), which is called at different times and whose function is to
update the times of each process. Essentially, this is used to keep track of how long each process has
been running, how long it has had the processor, how long it has been in user mode, how long it
has been in kernel mode, etc.
In the section on operating system basics, I mentioned that each process gets a time slice that's
1/100th of a second long. At the end of each time slice, the do_timer() function is called and
priorities are recalculated. Each time a system call returns to user mode, do_timer() is also called to
update the times.
Scheduling processes is not as easy as finding the process that has been waiting the longest. Some
operating systems do this kind of scheduling, which is referred to as "round-robin." The processes
could be thought of as sitting in a circle. The scheduling algorithm could then be thought of as a
pointer that moves around the circle, getting to each process in turn. The Linux scheduler does a
modified version of round-robin scheduling, however, so that processes with a higher priority get to
run more often and longer.
Linux also allows you to be nice to your fellow processes. If you feel that your work is not as
important as someone else's, you might want to consider being nice to them. This is done with the
nice command, the syntax of which is
nice <nice_value> <command>
For example, if you wanted to run the date command with a lower priority, you could run it like
this:
nice -10 date
This decreases the start priority of the date command by 10. Nice values range from 19 (lowest
priority) to -20 (highest priority). Note that only root can increase a process' priority, that is, use a
negative nice value. The nice value affects only the process being started, but child processes inherit the
nice value of their parent. By default, processes that users start have a nice value of 0.
The numeric value calculated for the priority is the opposite of what we normally think of as
priority. A better way of thinking about it is like the pull-down number tickets you get at the ice
cream store. The lower the number, the sooner you'll be served. So it is for processes as well.
Although the nice command only works when you start a process, you can change the nice value on
a running process by using the renice command. It uses the same priorities as nice, but is used on
processes that are already running. It can take the -p option for a specific PID, the -g option for a
process group, or -u for the processes belonging to a specific user.
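For example, assuming a process with PID 1234 that you own,
renice 5 -p 1234
lowers its priority by setting its nice value to 5, and
renice 10 -u fred
would do the same for every process belonging to the (hypothetical) user fred. Raising a priority, that is, giving a negative value, is again something only root is allowed to do.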
The number of times the clock interrupts per second, and therefore the number of times the priority
is recalculated, is defined by the HZ system variable. This is defined by default to be 100, that is, 100
times a second. For the discussion here, however, we assume for simplicity that the priorities are only calculated once a second
instead of 100 times.

Scheduling in Multiprocessor Systems


Systems with multiple CPUs are reasonably rare in the Linux world but a lot of work has already
gone into making Linux an SMP (Symmetric Multi-Processing) operating system. That is, one that
is capable of evenly balancing work between the CPUs in the system. Nowhere is this balancing of
work more apparent than in the scheduler.
In a multiprocessor system, hopefully, all of the processors are busily running processes. Each will
run the scheduler separately as its current process exhausts its time-slice or has to wait for a system
resource. The first thing to notice about an SMP system is that there is not just one idle process in
the system. In a single processor system the idle process is the first task in the task vector; in an
SMP system there is one idle process per CPU, and you could have more than one idle CPU.
Additionally there is one current process per CPU, so SMP systems must keep track of the current
and idle processes for each processor.
In an SMP system each process's task_struct contains the number of the processor that it is
currently running on (processor) and the number of the last processor that it ran on
(last_processor). There is no reason why a process should not run on a different CPU each
time it is selected to run but Linux can restrict a process to one or more processors in the system
using the processor_mask. If bit N is set, then this process can run on processor N. When the
scheduler is choosing a new process to run it will not consider one that does not have the
appropriate bit set for the current processor's number in its processor_mask. The scheduler also
gives a slight advantage to a process that last ran on the current processor because there is often a
performance overhead when moving a process to a different processor.
Creating a Process
When the system starts up it is running in kernel mode and there is, in a sense, only one process, the
initial process. Like all processes, the initial process has a machine state represented by stacks,
registers and so on. These will be saved in the initial process's task_struct data structure when
other processes in the system are created and run. At the end of system initialization, the initial
process starts up a kernel thread (called init) and then sits in an idle loop doing nothing.
Whenever there is nothing else to do, the scheduler will run this idle process. The idle process's
task_struct is the only one that is not dynamically allocated, it is statically defined at kernel
build time and is, rather confusingly, called init_task.

The init kernel thread or process has a process identifier of 1 as it is the system's first real
process. It does some initial setting up of the system and then executes the system initialization
program. This is one of /etc/init, /bin/init or /sbin/init depending on your system.
The init program uses /etc/inittab as a script file to create new processes within the
system. These new processes may themselves go on to create new processes. For example the
getty process may create a login process when a user attempts to login. All of the processes in
the system are descended from the init kernel thread.

New processes are created by cloning old processes, or rather by cloning the current process. A new
task is created by a system call (fork or clone)
and the cloning happens within the kernel in kernel mode. At the end of the system call there is a
new process waiting to run once the scheduler chooses it. A new task_struct data structure is
allocated from the system's physical memory with one or more physical pages for the cloned
process's stacks (user and kernel). A new process identifier may be created, one that is unique within
the set of process identifiers in the system. However, it is perfectly reasonable for the cloned
process to keep its parent's process identifier. The new task_struct is entered into the task
vector and the contents of the old process's task_struct are copied into the cloned
task_struct.

When cloning processes Linux allows the two processes to share resources rather than have two
separate copies. This applies to the process's files, signal handlers and virtual memory. When the
resources are to be shared their respective count fields are incremented so that Linux will not
deallocate these resources until both processes have finished using them. So, for example, if the
cloned process is to share virtual memory, its task_struct will contain a pointer to the
mm_struct of the original process and that mm_struct has its count field incremented to
show the number of current processes sharing it.
Cloning a process's virtual memory is rather tricky. A new set of vm_area_struct data
structures must be generated together with their owning mm_struct data structure and the cloned
process's page tables. None of the process's virtual memory is copied at this point. That would be a
rather difficult and lengthy task for some of that virtual memory would be in physical memory,
some in the executable image that the process is currently executing and possibly some would be in
the swap file. Instead Linux uses a technique called ``copy on write'' which means that virtual
memory will only be copied when one of the two processes tries to write to it. Any virtual memory
that is not written to, even if it can be, will be shared between the two processes without any harm
occurring. The read only memory, for example the executable code, will always be shared. For
``copy on write'' to work, the writeable areas have their page table entries marked as read only and
the vm_area_struct data structures describing them are marked as ``copy on write''. When one
of the processes attempts to write to this virtual memory a page fault will occur. It is at this point
that Linux will make a copy of the memory and fix up the two processes' page tables and virtual
memory data structures.
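Copy-on-write is invisible to user programs, but its effect, each process ending up with its own private copy of any data it writes to, is easy to demonstrate. In the sketch below the child's change to the variable is never seen by the parent.

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int counter = 100;    /* writable data, shared copy-on-write after the fork */

int main(void)
{
    pid_t pid = fork();

    if (pid == 0) {
        counter = 999;    /* this write triggers the copy; only the child's page changes */
        printf("child sees counter = %d\n", counter);
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    printf("parent still sees counter = %d\n", counter);    /* still 100 */
    return 0;
}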

Executing Programs
In Linux, as in Unix TM, programs and commands are normally executed by a command interpreter.
A command interpreter is a user process like any other process and is called a shell 2.

There are many shells in Linux, some of the most popular are sh, bash and tcsh. With the
exception of a few built in commands, such as cd and pwd, a command is an executable binary file.
For each command entered, the shell searches the directories in the process's search path, held in
the PATH environment variable, for an executable image with a matching name. If the file is found
it is loaded and executed. The shell clones itself using the fork mechanism described above and then
the new child process replaces the binary image that it was executing, the shell, with the contents of
the executable image file just found. Normally the shell waits for the command to complete, or
rather for the child process to exit. You can cause the shell to run again by pushing the child process
to the background by typing control-Z, which causes a SIGSTOP signal to be sent to the child
process, stopping it. You then use the shell command bg to push it into a background, the shell
sends it a SIGCONT signal to restart it, where it will stay until either it ends or it needs to do
terminal input or output.
An executable file can have many formats or even be a script file. Script files have to be recognized
and the appropriate interpreter run to handle them; for example /bin/sh interprets shell scripts.
Executable object files contain executable code and data together with enough information to allow
the operating system to load them into memory and execute them. The most commonly used object
file format used by Linux is ELF but, in theory, Linux is flexible enough to handle almost any
object file format.
Figure: Registered Binary Formats
As with file systems, the binary formats supported by Linux are either built into the kernel at kernel
build time or available to be loaded as modules. The kernel keeps a list of supported binary formats
(see figure 4.3) and when an attempt is made to execute a file, each binary format is tried in turn
until one works.
Commonly supported Linux binary formats are a.out and ELF. Executable files do not have to be
read completely into memory, a technique known as demand loading is used. As each part of the
executable image is used by a process it is brought into memory. Unused parts of the image may be
discarded from memory.

ELF
The ELF (Executable and Linkable Format) object file format, designed by the Unix System
Laboratories, is now firmly established as the most commonly used format in Linux. Whilst there is
a slight performance overhead when compared with other object file formats such as ECOFF and
a.out, ELF is felt to be more flexible. ELF executable files contain executable code, sometimes
refered to as text, and data. Tables within the executable image describe how the program should be
placed into the process's virtual memory. Statically linked images are built by the linker (ld), or link
editor, into one single image containing all of the code and data needed to run this image. The
image also specifies the layout in memory of this image and the address in the image of the first
code to execute.
It is a simple C program that prints ``hello world'' and then exits. The header describes it as an ELF
image with two physical headers (e_phnum is 2) starting 52 bytes (e_phoff) from the start of the
image file. The first physical header describes the executable code in the image. It goes at virtual
address 0x8048000 and there is 65532 bytes of it. This is because it is a statically linked image
which contains all of the library code for the printf() call to output ``hello world''. The entry
point for the image, the first instruction for the program, is not at the start of the image but at virtual
address 0x8048090 (e_entry). The code starts immediately after the second physical header. This
physical header describes the data for the program and is to be loaded into virtual memory at
address 0x8059BB8. This data is both readable and writeable. You will notice that the size of the
data in the file is 2200 bytes (p_filesz) whereas its size in memory is 4248 bytes. This because
the first 2200 bytes contain pre-initialized data and the next 2048 bytes contain data that will be
initialized by the executing code. When Linux loads an ELF executable image into the process's
virtual address space, it does not actually load the image.
It sets up the virtual memory data structures, the process's vm_area_struct tree and its page
tables. When the program is executed page faults will cause the program's code and data to be
fetched into physical memory. Unused portions of the program will never be loaded into memory.
Once the ELF binary format loader is satisfied that the image is a valid ELF executable image it
flushes the process's current executable image from its virtual memory. As this process is a cloned
image (all processes are) this, old, image is the program that the parent process was executing, for
example the command interpreter shell such as bash. This flushing of the old executable image
discards the old virtual memory data structures and resets the process's page tables. It also clears
away any signal handlers that were set up and closes any files that are open. At the end of the flush
the process is ready for the new executable image. No matter what format the executable image is,
the same information gets set up in the process's mm_struct. There are pointers to the start and
end of the image's code and data. These values are found as the ELF executable image's physical
headers are read and the sections of the program that they describe are mapped into the process's
virtual address space. That is also when the vm_area_struct data structures are set up and the
process's page tables are modified. The mm_struct data structure also contains pointers to the
parameters to be passed to the program and to this process's environment variables.

ELF Shared Libraries


A dynamically linked image, on the other hand, does not contain all of the code and data required to
run. Some of it is held in shared libraries that are linked into the image at run time. The ELF shared
library's tables are also used by the dynamic linker when the shared library is linked into the image
at run time. Linux uses several dynamic linkers, ld.so.1, libc.so.1 and ld-linux.so.1,
all to be found in /lib. The libraries contain commonly used code such as language subroutines.
Without dynamic linking, all programs would need their own copy of these libraries and would
need far more disk space and virtual memory. In dynamic linking, information is included in the
ELF image's tables for every library routine referenced. The information indicates to the dynamic
linker how to locate the library routine and link it into the program's address space.
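Run-time linking can also be requested explicitly from a program through the dlopen() and dlsym() library calls.
The sketch below is only an illustration of the idea (it is not how the dynamic linker itself works); it maps the
shared math library and looks up the cos() routine at run time. The library name libm.so.6 is typical for glibc
systems but may differ on yours; compile the program with -ldl.

#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    /* ask the dynamic linker to map the shared math library */
    void *handle = dlopen("libm.so.6", RTLD_LAZY);
    double (*cosine)(double);

    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return 1;
    }
    /* look up the address of cos() inside the mapped library */
    cosine = (double (*)(double)) dlsym(handle, "cos");
    if (cosine != NULL)
        printf("cos(0.0) = %f\n", cosine(0.0));
    dlclose(handle);
    return 0;
}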

Script Files
Script files are executables that need an interpreter to run them. There are a wide variety of
interpreters available for Linux; for example wish, perl and command shells such as tcsh. Linux
uses the standard Unix convention of having the first line of a script file contain the name of the
interpreter. So, a typical script file would start:
#!/usr/bin/wish
The script binary loader tries to find the interpreter for the script. It does this by attempting to open
the executable file that is named in the first line of the script. If it can open it, it has a pointer to its
VFS inode and it can go ahead and have it interpret the script file. The name of the script file
becomes argument zero (the first argument) and all of the other arguments move up one place (the
original first argument becomes the new second argument and so on). Loading the interpreter is
done in the same way as Linux loads all of its executable files. Linux tries each binary format in
turn until one works. This means that you could in theory stack several interpreters and binary
formats making the Linux binary format handler a very flexible piece of software.

Processes in Action
If you are like me, knowing how things work in theory is not enough. You want to see how things
work on your system. Linux provides several tools for you to watch what is happening. The first
tool is perhaps the only one that the majority of users have ever seen. This is the ps command,
which gives you the process status of particular processes.
Although users can look at processes using the ps command, they cannot look at the insides of the
processes themselves. This is because the ps command simply reads the process table, which
contains only the control and data structures necessary to administer and manage the process and
not the process itself. Despite this, using ps can not only show you a lot about what your system is
doing but can give you insights into how the system works. Because much of what I will talk about
is documented in the ps man-page, I suggest in advance that you look there for more details.
If you start ps from the command line with no options, the default behavior is to show the processes
running for the user who ran the command. With a login on two terminals, it looks something like this:
  PID TTY          TIME CMD
 1857 pts/5    00:00:00 su
 1858 pts/5    00:00:00 bash
24375 pts/6    00:00:00 su
24376 pts/6    00:00:00 bash
25620 pts/6    00:00:00 ps
This shows the process ID (PID), the terminal that the process is running on (TTY), the total
amount of time the process has had on the CPU (TIME), and the command that was run (CMD).
The PID is assigned to the process when it is created. This is a unique number that starts at 1 with
the init process when the system is first booted, and then increments until it reaches a pre-defined
limit and then it starts over. Note that if a particular process is still running, the system will not re-
use that ID, but will skip to the next free ID.
In the example above, the processes are actually running on a pseudo-terminal. In a nutshell, this means that the
terminal is not really a physical device. You typically have pseudo-terminals when opening a
console within a GUI or connecting to a remote machine via telnet or ssh.
Note that I have read some books that claim that since ps is showing only the processes for the
current user, the terminal should always be the same. Although this appears logical at first, it is not
entirely correct. If I were running a GUI, I could have many console sessions open, each running
on its own pseudo-terminal. So, as we see in this example, there are different processes running on different
pseudo-terminals. However, newer versions of ps that I have worked with only show the processes running on the
current pseudo-terminal. If you want all of the processes for a given user, you can run ps like this:
ps -u jimmo
The time is listed in hours, minutes and seconds. It is possible that the process has been on the
system for hours and even days and it still reports zero time. The reason is that this is the amount of
time that the process is actually running. If, for example, I were to immediately start vi when I
logged in, I could continue to use vi for hours and my original shell process would not get a chance
to run again. It would be sleeping until the vi process finished.
Although this is useful in many circumstances, it doesn't say much about these processes. Let's see
what the long output looks like, which is run like this:
ps -l
  F S   UID   PID  PPID  C PRI  NI ADDR  SZ WCHAN  TTY      TIME     CMD
100 S   501  1857  1844  0  69   0    - 537 wait4  pts/5    00:00:00 su
000 S   501  1858  1857  0  69   0    - 683 read_c pts/5    00:00:00 bash
100 S   501 24375 24362  0  69   0    - 537 wait4  pts/6    00:00:00 su
000 S   501 24376 24375  0  72   0    - 683 wait4  pts/6    00:00:00 bash
000 R   501 25621 24376  0  76   0    - 729 -      pts/6    00:00:00 ps
This output looks a little better. At least there are more entries, so maybe it is more interesting. The
columns PID, TTY, TIME, and COMMAND are the same as in the previous example. The first
column (F) is the octal representation of what flags the process has. Most have no flags, but in this
example, both su processes have a flag value of 100, which indicates super-user privileges. This
makes sense because I used su to switch to root and then used su again to switch to another user. For
a list of possible flags, see the ps man-page.
Note that this example assumes the behaviour of ps that displays all of the processes for the user
across all pseudo-terminals. You may need to use the -u USERNAME option to see all of the
processes.
The S column tells us what state the process is in. All but the last ps process are in state S, or
sleeping. This is the typical state when they are waiting on something. However, ps must be running
to give us this output, so it has a state R, for running. Note that on a single processor system, only
one process can be running at once. Therefore, you will never see more than one process in state R.
Here we see that the bash process on line 5 is sleeping. Although I can't tell from this output, I know
that the event, on which the shell is waiting, is the completion of the ps command. The PID and
PPID columns, the Process ID and Parent Process ID, respectively, are one indication. Notice that
the PPID of the ps process is the same as the PID of the bash process. This is because I started the
ps command from the bash command line and the bash had to do a fork() -exec() to start up the ps.
This makes ps a child process of the bash. Because I didn't start the ps in the background, I know the
bash is waiting on the completion of the ps. (More on this in a moment.)
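To make this concrete, here is a minimal sketch of what the shell does when you start a command in the
foreground: it forks, the child execs the program (ps in this example), and the parent sleeps in waitpid() until
the child exits, which is exactly the wait4 wait channel shown in the listing above.

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t child = fork();               /* create a copy of this process */

    if (child == 0) {
        /* child: replace the copy with the ps program */
        execlp("ps", "ps", "-l", (char *) NULL);
        perror("execlp");               /* only reached if exec failed */
        exit(1);
    }
    /* parent: sleep until the child finishes, just as the shell does */
    waitpid(child, NULL, 0);
    printf("child %d has exited\n", (int) child);
    return 0;
}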
The C column is the processor utilization. With the exception of the ps command, all of the other
processes listed are waiting for some event, such as another process to exit or (as in the case of shell
processes) are waiting for user input. Note that in this example all of the entries have a 0 (zero) in
the C column. This does not mean they have used no CPU time, but rather it is so low that it is
reported as zero.
The PRI column shows us the priority of the process followed by the nice value (NI). The nice
value is a way of influencing how often the process is scheduled to run. The higher the nice value
the "nicer" the processes is, and lets other processes run. The SZ column is the size in blocks of the
core image of the process. This is followed by the process' wait channel, or what the process is
waiting on. In the case of the bash process on the second line, it is waiting on the wait channel
"read_c". That is, it is waiting to read a character from the keyboard.
The ps process is on the processor and in the state R-RUNNABLE. As a matter of fact, I have never
run a ps command where ps was not RUNNABLE. Why? Well, the only way for ps to read the
process table is to be running and the only way for a process to be running is to be runnable.
However, with this output of ps, we cannot actually see the state, but guess that ps is running in that
it is not waiting on anything (there is a - in the WCHAN column). To see what state the process is in
we need to use a different option, for example ps ux, which gives us this:
USER       PID %CPU %MEM  VSZ  RSS TTY   STAT START   TIME CMD
linux-tu  1857  0.0  0.0 2148    4 pts/5 S    Mar27   0:00 su - linux-tu
linux-tu  1858  0.0  0.0 2732    4 pts/5 S    Mar27   0:00 -bash
linux-tu 24375  0.0  0.2 2148  824 pts/6 S    16:36   0:00 su - linux-tu
linux-tu 24376  0.0  0.3 2732 1484 pts/6 S    16:36   0:00 -bash
linux-tu 25659  0.0  0.3 2452 1488 pts/6 R    21:52   0:00 ps -ux
Because I am running these processes as the user linux-tu, the USER column shows this. The owner
is almost always the user who started the process. However, you can change the owner of a process
using the setuid() or the seteuid() system call.
The VSZ column is the size of the virtual memory image in kilobytes. The RSS column is
the "resident set size," or how many kilobytes of the program are in memory. The difference is that
the virtual memory size is essentially what the total memory size of the process would be if it
were all in memory, while the RSS is what is actually in memory.
Though I can't prove this, I can make some inferences. First, let's look at the ps output again. This
time, let's start ps -l in the background, which gives us the following output:
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
100 S 501 1857 1844 0 69 0 - 537 wait4 pts/5 00:00:00 su
000 S 501 1858 1857 0 69 0 - 683 read_c pts/5 00:00:00 bash
100 S 501 24375 24362 0 69 0 - 537 wait4 pts/6 00:00:00 su
000 S 501 24376 24375 0 72 0 - 684 read_c pts/6 00:00:00 bash
000 R 501 29043 24376 0 78 0 - 729 - pts/6 00:00:00 ps

We now see that the same process (24376) has a different wait channel. This time it is read_c, which
tells us it is waiting for keyboard input.
In the second example, bash did the same fork()-exec(), but because we put the ps in the background, bash
returned to the prompt immediately and did not wait for the ps to complete. Instead, it was waiting for more
input from the keyboard. In the first example, we did not put the command in the background, so the
wait channel of the bash was the completion of the ps.
So far we have only seen the "short" version of the command. In essence, this is only the actual
command name. If we use the -f option (for "full"), we can see the command and any options, like this:
UID        PID  PPID  C STIME TTY          TIME CMD
jimmo     2271  2184  0 Mar17 pts/11   00:00:00 /bin/bash
jimmo    17344  2271  0 10:03 pts/11   00:00:00 ps -f

Note that in the case of the bash command, we see the full path. This is because that is what was
used to start the process. Without the -f all we would see is bash. In the case of the ps command, the
full listing shows the option -f as well. However, without it, all we would see is ps.
To see the relationship between various processes you can use the --forest option. If we wanted to see a
forest (i.e. many process "trees") of the user jimmo, the command might look like this:
ps -u jimmo --forest
Which gives us output like this:
  PID TTY          TIME CMD
 2184 ?        00:00:37  \_ kdeinit
 2260 pts/10   00:00:00  |   \_ bash
17058 pts/10   00:00:00  |   |   \_ man
17059 pts/10   00:00:00  |   |       \_ sh
17063 pts/10   00:00:00  |   |           \_ nroff
17069 pts/10   00:00:00  |   |           |   \_ groff
17071 pts/10   00:00:00  |   |           |   |   \_ grotty
17072 pts/10   00:00:00  |   |           |   \_ cat
17064 pts/10   00:00:00  |   |   \_ less
11607 ?        00:00:18  \_ soffice.bin
11643 ?        00:00:00  |   \_ soffice.bin
11644 ?        00:00:00  |       \_ soffice.bin
11645 ?        00:00:00  |       \_ soffice.bin

Note that this is only a portion of the actual process forest. At this moment on my machine I
literally have dozens of processes. However, this example gives you a good idea of what a process
looks like. Give it a try on your system to see what you might be running.
One interesting thing to note is the tree starting at PID 17058. This is all part of a man command
that I started. For the most part, I am not even aware that these other processes (e.g. nroff) are running.
Using the -a option, you show all of the processes associated with a terminal, with the exception of the
so-called "session leader". In a nutshell, the session leader is the process which started the session. Typically, this is
the login shell, or often kdeinit if you are running a console from within KDE (for example).
If you did a listing of all processes using, for example, ps -e you might get something like this:
  PID TTY          TIME CMD
    1 ?        00:00:05 init
    2 ?        00:00:00 keventd
    3 ?        00:00:00 ksoftirqd_CPU0
    4 ?        00:00:09 kswapd
    5 ?        00:00:00 bdflush
    6 ?        00:00:00 kupdated
    7 ?        00:00:05 kinoded
    8 ?        00:00:00 mdrecoveryd
   13 ?        00:00:00 kjournald
  387 ?        00:00:00 lvm-mpd
  435 ?        00:00:00 kjournald
  440 ?        00:00:00 pagebufd
  441 ?        00:00:00 xfslogd/0
  442 ?        00:00:00 xfsdatad/0
  443 ?        00:00:00 xfssyncd
  444 ?        00:00:00 xfssyncd
  445 ?        00:00:07 kjournald
  446 ?        00:00:00 xfssyncd
  447 ?        00:00:00 kjournald
  448 ?        00:00:00 kjournald
  923 ?        00:00:01 syslogd
  926 ?        00:00:02 klogd
  985 ?        00:00:00 khubd
 1130 ?        00:00:00 resmgrd
 1166 ?        00:00:00 portmap
 1260 ?        00:00:00 mysqld_safe
 1325 ?        00:00:00 acpid
 1340 ?        00:00:00 mysqld
 1352 ?        00:00:00 sshd
 1474 ?        00:00:00 cupsd
 1479 ?        00:00:00 mysqld
 1480 ?        00:00:00 mysqld
 1481 ?        00:00:00 mysqld
 1482 ?        00:00:00 mysqld
 1676 ?        00:00:00 master
 1747 ?        00:00:00 qmgr
 1775 ?        00:00:00 httpd
 1842 ?        00:00:00 cron
 1846 ?        00:00:00 nscd
 1847 ?        00:00:00 nscd
 1848 ?        00:00:00 nscd
 1849 ?        00:00:00 nscd
 1850 ?        00:00:00 nscd
 1851 ?        00:00:00 nscd
 1852 ?        00:00:00 nscd
 1856 ?        00:00:00 smbd
 1876 ?        00:00:44 httpd
 1972 ?        00:00:00 kdm
 2022 ?        01:10:11 X
 2023 ?        00:00:00 kdm
 2026 tty1     00:00:00 mingetty
 2027 tty2     00:00:00 mingetty
 2028 tty3     00:00:00 mingetty
 2029 tty4     00:00:00 mingetty
 2030 tty5     00:00:00 mingetty
 2031 tty6     00:00:00 mingetty
 2058 ?        00:00:00 kde
 2094 ?        00:00:00 gpg-agent
 2095 ?        00:00:00 ssh-agent
 2158 ?        00:00:01 kamix
 2160 ?        00:00:14 kdeinit
 2164 ?        00:00:00 kdeinit
 2166 ?        00:00:01 susewatcher
 2167 ?        00:03:54 suseplugger
 2178 ?        00:00:07 kdeinit
 2180 ?        00:04:33 kmail
 2182 ?        00:00:05 kscd
 2184 ?        00:00:38 kdeinit
 2187 ?        00:08:20 quanta
 2188 pts/3    00:00:00 bash
 2191 pts/5    00:00:00 bash
 2199 pts/8    00:00:00 bash
 2220 pts/9    00:00:00 bash
 2224 ?        00:01:10 quanta
 2260 pts/10   00:00:00 bash
 2271 pts/11   00:00:00 bash
 2273 pts/13   00:00:00 bash
 2277 pts/14   00:00:00 bash
 2314 ?        00:00:00 kalarmd
 2402 ?        00:00:35 httpd
 2831 ?        00:00:01 kdeinit
10244 ?        00:00:00 kdeinit
11607 ?        00:00:18 soffice.bin
11643 ?        00:00:00 soffice.bin
11644 ?        00:00:00 soffice.bin
11692 ?        00:02:21 kdeinit
12035 ?        00:00:10 kdeinit
12036 pts/4    00:00:00 bash
12058 pts/6    00:00:00 bash
12088 ?        00:00:58 kdeinit
12148 pts/7    00:00:00 bash
12238 pts/7    00:00:00 man
12239 pts/7    00:00:00 sh
12240 pts/7    00:00:00 less
12241 pts/7    00:00:00 gzip
12242 pts/7    00:00:00 zsoelim
12243 pts/7    00:00:00 tbl
12244 pts/7    00:00:00 nroff
12249 pts/7    00:00:00 groff
12250 pts/7    00:00:00 cat
12251 pts/7    00:00:00 troff
12252 pts/7    00:00:00 grotty
12260 pts/12   00:00:00 bash
12300 pts/15   00:00:00 bash
13010 pts/6    00:00:00 man
13011 pts/6    00:00:00 sh
13016 pts/6    00:00:00 less
13979 ?        00:00:00 smbd
16049 ?        00:00:08 smbd
16138 pts/3    00:00:00 man
16139 pts/3    00:00:00 sh
16144 pts/3    00:00:00 less
16251 ?        00:00:00 mysqld
16273 ?        00:00:00 mysqld
16387 ?        00:00:48 grip
16388 ?        00:00:00 grip
16506 ?        00:00:11 kdeinit
16574 ?        00:00:00 pickup
16683 ?        00:00:00 smbd
16883 ?        00:00:00 login
16922 ?        00:00:00 kdeinit
17058 pts/10   00:00:00 man
17059 pts/10   00:00:00 sh
17063 pts/10   00:00:00 nroff
17064 pts/10   00:00:00 less
17069 pts/10   00:00:00 groff
17071 pts/10   00:00:00 grotty
17072 pts/10   00:00:00 cat
17165 pts/11   00:00:00 ps

Note that in many cases, there is no terminal associated with a specific process. This is common for
system processes (for example, the kernel daemons such as kswapd or kjournald shown above) as there is no real terminal associated with them.
As a side note, consider that a terminal may belong to a process as its "controlling terminal". This is a
unique terminal device associated with the process, or better yet, associated with the session. Each
process of that session will have the same controlling terminal. Although a session or process can
have only a single controlling terminal, a process does not need to have a controlling terminal at all.
Without going into the details of UNIX programming, we can simply say that a process can break that
connection and run without a controlling terminal, as is common for graphical processes. These
processes can start child processes, which also do not have a controlling terminal. You can see in
the output here that processes without a controlling terminal have a question mark (?) in the TTY
column.
Another thing to note is that system processes that are started when the system was first loaded have
very low PIDs. On some systems, there may be so little going on that all such system processes have
PIDs below 100. However, in the example above, there are a number of system
processes with relatively high PIDs (i.e. close to 1000). All this means is that many other processes
were started (and probably exited almost immediately) before these were.
For the most part, all of the processes on this system that are associated with a terminal are
associated with a pseudo-terminal. However, you will see a number of mingetty processes running on physical
terminals tty1 through tty6. These represent the virtual consoles and although no user is using them
at the moment, there is a process associated with them.
Note that by default mingetty runs on terminals tty1 through tty6, but these still need to be
explicitly activated in the /etc/inittab file. For details on /etc/inittab, look at the section on run-
levels. Similar to the --forest option to ps is the pstree command. If we wanted to highlight the
current process and all of its ancestors we would use the -h option, which would give us something
like this:
init-+-acpid
     |-bdflush
     |-cron
     |-cupsd
     |-httpd---2*[httpd]
     |-kalarmd
     |-kamix
     |-kdeinit-+-artsd
     |         |-grip---grip
     |         |-18*[kdeinit]
     |         |-kdeinit-+-bash---man---sh---less
     |         |         |-5*[bash]
     |         |         |-bash---man---sh-+-less
     |         |         |                 `-nroff-+-cat
     |         |         |                         `-groff---grotty
     |         |         `-bash---pstree
     |         |-kdeinit-+-3*[bash]
     |         |         |-bash---man---sh---less
     |         |         `-bash---man-+-gzip
     |         |                      |-less
     |         |                      `-sh-+-nroff-+-cat
     |         |                           |       `-groff-+-grotty
     |         |                           |               `-troff
     |         |                           |-tbl
     |         |                           `-zsoelim
     |         |-2*[quanta]
     |         `-soffice.bin---soffice.bin---4*[soffice.bin]
     |-11*[kdeinit]
     |-kdm-+-X
     |     `-kdm---kde-+-gpg-agent
     |                 |-kwrapper
     |                 `-ssh-agent
     |-keventd
     |-khubd
     |-kinoded
     |-8*[kjournald]
     |-klogd
     |-kmail
     |-kscd
     |-ksoftirqd_CPU0
     |-kswapd
     |-kupdated
     |-login---bash
     |-lvm-mpd
     |-master-+-pickup
     |        `-qmgr
     |-mdrecoveryd
     |-5*[mingetty]
     |-mysqld_safe---mysqld---mysqld---10*[mysqld]
     |-nscd---nscd---5*[nscd]
     |-pagebufd
     |-portmap
     |-resmgrd
     |-smbd---3*[smbd]
     |-sshd
     |-suseplugger
     |-susewatcher
     |-syslogd
     |-xfsdatad/0
     |-xfslogd/0
     `-3*[xfssyncd]
Note that the processes here are sorted alphabetically. You can use the -n option to sort the tree by
process id. You can find more details about interpreting this output in the section on monitoring
processes.

Virtual Memory
A process's virtual memory contains executable code and data from many sources. First there is the
program image that is loaded; for example a command like ls. This command, like all executable
images, is composed of both executable code and data. The image file contains all of the
information necessary to load the executable code and associated program data into the virtual
memory of the process. Secondly, processes can allocate (virtual) memory to use during their
processing, say to hold the contents of files that they are reading. This newly allocated virtual memory
needs to be linked into the process's existing virtual memory so that it can be used. Thirdly, Linux
processes use libraries of commonly useful code, for example file handling routines. It does not
make sense for each process to have its own copy of the library, so Linux uses shared libraries that can be
used by several running processes at the same time. The code and the data from these shared
libraries must be linked into this process's virtual address space and also into the virtual address
space of the other processes sharing the library.
In any given time period a process will not have used all of the code and data contained within its
virtual memory. It could contain code that is only used during certain situations, such as during
initialization or to process a particular event. It may only have used some of the routines from its
shared libraries. It would be wasteful to load all of this code and data into physical memory where it
would lie unused. Multiply this wastage by the number of processes in the system and the system
would run very inefficiently. Instead, Linux uses a technique called demand paging where the
virtual memory of a process is brought into physical memory only when a process attempts to use it.
So, instead of loading the code and data into physical memory straight away, the Linux kernel alters
the process's page table, marking the virtual areas as existing but not in memory. When the process
attempts to access the code or data, the system hardware will generate a page fault and hand control
to the Linux kernel to fix things up. Therefore, for every area of virtual memory in the process's
address space Linux needs to know where that virtual memory comes from and how to get it into
memory so that it can fix up these page faults.

Figure: A Process's Virtual Memory


The Linux kernel needs to manage all of these areas of virtual memory and the contents of each
process's virtual memory is described by a mm_struct data structure pointed at from its
task_struct. The process's mm_struct

data structure also contains information about the loaded executable image and a pointer to the
process's page tables. It contains pointers to a list of vm_area_struct data structures, each
representing an area of virtual memory within this process.
This linked list is in ascending virtual memory order; the figure above shows the layout in virtual
memory of a simple process together with the kernel data structures managing it. As those areas of
virtual memory are from several sources, Linux abstracts the interface by having the
vm_area_struct point to a set of virtual memory handling routines (via vm_ops). This way all
of the process's virtual memory can be handled in a consistent way no matter how the underlying
services managing that memory differ. For example, there is a routine that will be called when the
process attempts to access memory that does not exist; this is how page faults are handled.
The process's set of vm_area_struct data structures is accessed repeatedly by the Linux kernel
as it creates new areas of virtual memory for the process and as it fixes up references to virtual
memory not in the system's physical memory. This makes the time that it takes to find the correct
vm_area_struct critical to the performance of the system. To speed up this access, Linux also
arranges the vm_area_struct data structures into an AVL (Adelson-Velskii and Landis) tree.
This tree is arranged so that each vm_area_struct (or node) has a left and a right pointer to its
neighbouring vm_area_struct structure. The left pointer points to the node with a lower starting
virtual address and the right pointer points to a node with a higher starting virtual address. To find
the correct node, Linux goes to the root of the tree and follows each node's left and right pointers
until it finds the right vm_area_struct. Of course, nothing is for free and inserting a new
vm_area_struct into this tree takes additional processing time.

When a process allocates virtual memory, Linux does not actually reserve physical memory for the
process. Instead, it describes the virtual memory by creating a new vm_area_struct data
structure. This is linked into the process's list of virtual memory. When the process attempts to write
to a virtual address within that new virtual memory region then the system will page fault. The
processor will attempt to decode the virtual address, but as there are no Page Table Entries for any
of this memory, it will give up and raise a page fault exception, leaving the Linux kernel to fix
things up. Linux looks to see if the virtual address referenced is in the current process's virtual
address space. If it is, Linux creates the appropriate PTEs and allocates a physical page of memory
for this process. The code or data may need to be brought into that physical page from the
filesystem or from the swap disk. The process can then be restarted at the instruction that caused the
page fault and, this time as the memory physically exists, it may continue.
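You can watch this behaviour from user space. The following sketch (illustrative only) maps a large anonymous
region with mmap() and then touches one byte per page; if you watch it with ps -o pid,vsz,rss while it runs,
VSZ jumps as soon as the mapping is made, but RSS only grows as the pages are actually touched and the
resulting page faults allocate physical memory.

#define _DEFAULT_SOURCE                 /* for MAP_ANONYMOUS */
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    size_t length = 64 * 1024 * 1024;   /* 64 MB of virtual memory */
    long pagesize = sysconf(_SC_PAGESIZE);

    /* this only creates a vm_area_struct; no physical pages are allocated yet */
    char *region = mmap(NULL, length, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    sleep(10);                          /* time to look at VSZ and RSS */

    /* each write to a new page causes a page fault that allocates the page */
    for (size_t i = 0; i < length; i += (size_t) pagesize)
        region[i] = 1;

    sleep(10);                          /* RSS has now grown */
    munmap(region, length);
    return 0;
}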

Times and Timers


The kernel keeps track of a process's creation time as well as the CPU time that it consumes during
its lifetime. Each clock tick, the kernel updates the amount of time in jiffies that the current
process has spent in system and in user mode.
In addition to these accounting timers, Linux supports process specific interval timers.
A process can use these timers to send itself various signals each time that they expire. Three sorts
of interval timers are supported:
Real
the timer ticks in real time, and when the timer has expired, the process is sent a SIGALRM
signal.
Virtual
This timer only ticks when the process is running and when it expires it sends a SIGVTALRM
signal.
Profile
This timer ticks both when the process is running and when the system is executing on behalf
of the process itself. SIGPROF is signalled when it expires.

One or all of the interval timers may be running and Linux keeps all of the necessary information
in the process's task_struct data structure. System calls can be made to set up these interval
timers and to start them, stop them and read their current values. The virtual and profile timers are
handled the same way.
Every clock tick the current process's interval timers are decremented and, if they have expired, the
appropriate signal is sent.
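From user space, the interval timers are set up with the setitimer() system call. The following is a minimal
sketch using the real timer, which delivers SIGALRM once a second as described above:

#include <signal.h>
#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

static volatile sig_atomic_t ticks;

static void on_alarm(int sig)
{
    (void) sig;
    ticks++;                            /* count each SIGALRM delivery */
}

int main(void)
{
    struct sigaction sa;
    struct itimerval timer;

    sa.sa_handler = on_alarm;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, NULL);

    /* fire after one second, then every second after that */
    timer.it_value.tv_sec = 1;
    timer.it_value.tv_usec = 0;
    timer.it_interval.tv_sec = 1;
    timer.it_interval.tv_usec = 0;
    setitimer(ITIMER_REAL, &timer, NULL);

    while (ticks < 5)
        pause();                        /* sleep until a signal arrives */
    printf("received %d SIGALRM signals\n", (int) ticks);
    return 0;
}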
Real time interval timers are a little different. For these, Linux uses the kernel's general timer mechanism:
the real interval timer is described by a timer_list data structure and, when it is running, this is queued
on the system timer list. When the timer expires the timer bottom half handler removes it from the
queue and calls the interval timer handler.
This generates the SIGALRM signal and restarts the interval timer, adding it back into the system
timer queue.
Interprocess Communication
Processes communicate with each other and with the kernel to coordinate their activities. Linux
supports a number of Inter-Process Communication (IPC) mechanisms. Signals and pipes are two of
them but Linux also supports the System V IPC mechanisms named after the Unix release in which
they first appeared.

Signals
Signals are a way of sending simple messages to processes. Most of these messages are already
defined and can be found in <linux/signal.h>. However, signals can only be processed when the
process is in user mode. If a signal has been sent to a process that is in kernel mode, it is dealt with immediately on
returning to user mode.
Signals are one of the oldest inter-process communication methods used by Unix TM systems. They
are used to signal asynchronous events to one or more processes. A signal could be generated by a
keyboard or an error condition such as the process attempting to access a non-existent location in its
virtual memory. Signals are also used by the shells to signal job control commands to their child
processes.

There are a set of defined signals that the kernel can generate or that can be generated by other
processes in the system, provided that they have the correct privileges. You can list a system's set of
signals using the kill command (kill -l), on my Intel Linux box this gives:

 1) SIGHUP       2) SIGINT       3) SIGQUIT      4) SIGILL
 5) SIGTRAP      6) SIGIOT       7) SIGBUS       8) SIGFPE
 9) SIGKILL     10) SIGUSR1     11) SIGSEGV     12) SIGUSR2
13) SIGPIPE     14) SIGALRM     15) SIGTERM     17) SIGCHLD
18) SIGCONT     19) SIGSTOP     20) SIGTSTP     21) SIGTTIN
22) SIGTTOU     23) SIGURG      24) SIGXCPU     25) SIGXFSZ
26) SIGVTALRM   27) SIGPROF     28) SIGWINCH    29) SIGIO
30) SIGPWR
The numbers are different for an Alpha AXP Linux box. Processes can choose to ignore most of the
signals that are generated, with two notable exceptions: neither the SIGSTOP signal which causes a
process to halt its execution nor the SIGKILL signal which causes a process to exit can be ignored.
Otherwise though, a process can choose just how it wants to handle the various signals. Processes
can block the signals and, if they do not block them, processes can either choose to handle the
signals themselves or allow the kernel to handle them. If the kernel handles the signals, it will do
the default actions required for this signal. For example, the default action when a process receives
the SIGFPE (floating point exception) signal is to core dump and then exit. Signals have no
inherent relative priorities. If two signals are generated for a process at the same time then they may
be presented to the process or handled in any order. Also there is no mechanism for handling
multiple signals of the same kind. There is no way that a process can tell if it received 1 or 42
SIGCONT signals.
Linux implements signals using information stored in the task_struct for the process. The
number of supported signals is limited to the word size of the processor. Processes with a word size
of 32 bits can have 32 signals whereas 64 bit processors like the Alpha AXP may have up to 64
signals. The currently pending signals are kept in the signal field with a mask of blocked signals
held in blocked. With the exception of SIGSTOP and SIGKILL, all signals can be blocked. If a
blocked signal is generated, it remains pending until it is unblocked. Linux also holds information
about how each process handles every possible signal and this is held in an array of sigaction
data structures pointed at by the task_struct for each process. Amongst other things it contains
either the address of a routine that will handle the signal or a flag which tells Linux that the process
either wishes to ignore this signal or let the kernel handle the signal for it. The process modifies the default
signal handling by making system calls and these calls alter the sigaction for the appropriate
signal as well as the blocked mask.
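From a program's point of view, this is the sigaction() system call. The sketch below installs a handler for
SIGUSR1 and asks for SIGUSR2 to be added to the blocked mask while the handler runs; start it and then send it
signals with kill -USR1 <pid> from another terminal.

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static void handler(int sig)
{
    /* write() is safe inside a signal handler, printf() is not */
    const char msg[] = "caught SIGUSR1\n";
    write(STDOUT_FILENO, msg, sizeof(msg) - 1);
    (void) sig;
}

int main(void)
{
    struct sigaction sa;

    sa.sa_handler = handler;            /* address of the handling routine */
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);           /* signals blocked while the handler runs */
    sigaddset(&sa.sa_mask, SIGUSR2);

    sigaction(SIGUSR1, &sa, NULL);

    printf("send SIGUSR1 to process %d\n", (int) getpid());
    for (;;)
        pause();                        /* wait for signals forever */
}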

Not every process in the system can send signals to every other process. For security reasons (among other
things) only the kernel and the superuser can send signals to all processes. Normal processes can
only send signals to processes with the same uid and gid or to processes in the same process group.
Signals are generated by setting the appropriate bit in the task_struct's signal field. If the
process has not blocked the signal and is waiting but interruptible (in state Interruptible) then it is
woken up by changing its state to Running and making sure that it is in the run queue. That way the
scheduler will consider it a candidate for running when the system next schedules. If the default
handling is needed, then Linux can optimize the handling of the signal. For example, if the signal is
SIGWINCH (the window changed size) and the default handler is being used, then there is
nothing to be done.
Signals are not presented to the process immediately after they are generated. Instead, they must
wait until the process is running again. Every time a process exits from a system call, its signal
and blocked fields are checked and, if there are any unblocked signals, they can now be
delivered. This might seem a very unreliable method since it is dependent on the processes checking
the signals, but every process in the system is making system calls, for example to write a character
to the terminal, all of the time. Processes can elect to wait for signals if they wish; they are
suspended in state Interruptible until a signal is presented. The Linux signal processing code looks
at the sigaction structure for each of the current unblocked signals.

If a signal's handler is set to the default action, then the kernel will handle it. The SIGSTOP signal's
default handler will change the current process' state to Stopped, then run the scheduler to select a
new process to run. The default action for the SIGFPE signal core dumps the process and then
causes it to exit. Alternatively, a process may have specified its own signal handler. This is a routine
that is called whenever the signal is generated and the sigaction structure holds the routine's address.
It is the kernel's job to call the process' signal handling routine. How this happens is processor
specific but all CPUs must cope with the fact that the current process that has been running in
kernel mode is just about to return to the calling process in user mode. The problem is solved by
doing several things such as manipulating the process' stack and registers, resetting the process'
program counter to the address of its signal handling routine and either adding the parameters of the
routine to the call frame or passing in registers. Whatever the CPU's exact mechanism, when the
process resumes operation it appears as if the signal handling routine had been called normally.
Linux is POSIX compatible and so the process can specify which signals are blocked when a
particular signal handling routine is called. This entails changing the blocked mask during the
call to the process's signal handler. The blocked mask must be returned to its original value when
the signal handling routine has finished. Therefore Linux adds a call to a tidy up routine that will
restore the original blocked mask onto the call stack of the signalled process. Linux also
optimizes the case where several signal handling routines need to be called by stacking them so that
each time one handling routine exits, the next one is called until the tidy up routine is called.
Many signals (such as 9, SIGKILL), have the ability to immediately terminate a process. However,
most of these signals can be either ignored or dealt with by the process itself. If not, the kernel will take the
default action specified for that signal. You can send signals to processes yourself by means of the kill
command, as well as by the Delete key and Ctrl+\. However, you can only send signals to processes
that you own unless you are root. If you are root, you can send signals to any process.
It's possible that the process to which you want to send the signal is sleeping. If that process is sleeping at
an interruptible priority, then the process will awaken to handle the signal. The kernel keeps track of
pending signals in each process' process structure. This is a 32-bit value in which each bit represents
a single signal. Because it is only one bit per signal, there can only be one signal pending of each type. If there
are different kinds of signals pending, the kernel has no way of determining which came in when. It will
therefore process the signals starting at the lowest numbered signal and moving up.
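The kill command itself is only a thin wrapper around the kill() system call, which any program can use,
subject to the same permission checks. A short sketch (the target PID is supplied on the command line):

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

int main(int argc, char *argv[])
{
    pid_t target;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <pid>\n", argv[0]);
        return 1;
    }
    target = (pid_t) atoi(argv[1]);

    /* SIGTERM is the polite request; SIGKILL cannot be caught or ignored */
    if (kill(target, SIGTERM) == -1) {
        perror("kill");                 /* e.g. EPERM if you do not own the process */
        return 1;
    }
    return 0;
}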

Pipes
The common Linux shells all allow redirection. For example
$ ls | pr | lpr

pipes the output from the ls command listing the directory's files into the standard input of the pr
command which paginates them. Finally the standard output from the pr command is piped into the
standard input of the lpr command which prints the results on the default printer. Pipes then are
unidirectional byte streams which connect the standard output from one process into the standard
input of another process. Neither process is aware of this redirection and behaves just as it would
normally. It is the shell that sets up these temporary pipes between the processes.
Figure: Pipes
In Linux, a pipe is implemented using two file data structures which both point at the same
temporary VFS inode which, in turn, points at a physical page within memory. Figure 5.1 shows
that each file data structure contains pointers to different file operation routine vectors: one for
writing to the pipe, the other for reading from the pipe. This hides the underlying differences from
the generic system calls which read and write to ordinary files. As the writing process writes to the
pipe, bytes are copied into the shared data page and when the reading process reads from the pipe,
bytes are copied from the shared data page. Linux must synchronize access to the pipe. It must
make sure that the reader and the writer of the pipe are in step and to do this it uses locks, wait
queues and signals.
When the writer wants to write to the pipe it uses the standard write library functions. These all pass
file descriptors that are indices into the process' set of file data structures, each one representing
an open file or, as in this case, an open pipe. The Linux system call uses the write routine pointed at
by the file data structure describing this pipe. That write routine uses information held in the VFS
inode representing the pipe to manage the write request.
If there is enough room to write all of the bytes into the pipe and, so long as the pipe is not locked
by its reader, Linux locks it for the writer and copies the bytes to be written from the process'
address space into the shared data page. If the pipe is locked by the reader or if there is not enough
room for the data then the current process is made to sleep on the pipe inode's wait queue and the
scheduler is called so that another process can run. It is interruptible, so it can receive signals and it
will be awakened by the reader when there is enough room for the write data or when the pipe is
unlocked. When the data has been written, the pipe's VFS inode is unlocked and any waiting readers
sleeping on the inode's wait queue will themselves be awakened. Reading data from the pipe is a
very similar process to writing to it.
Processes are allowed to do non-blocking reads (depending on the mode in which they opened the
file or pipe); in that case, if there is no data to be read or if the pipe is locked, an error is returned and
the process can continue to run. The alternative is to wait on the pipe
inode's wait queue until the write process has finished. When both processes have finished with the
pipe, the pipe inode is discarded along with the shared data page.
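The same mechanism is available to programs directly through the pipe() system call. The sketch below creates
a pipe, forks, and sends a short message from the child to the parent; this is essentially what the shell arranges
for you in the ls | pr | lpr example, except that the shell also attaches the pipe ends to standard input and
output with dup2().

#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int fd[2];                          /* fd[0] = read end, fd[1] = write end */
    char buffer[64];

    if (pipe(fd) == -1) {
        perror("pipe");
        return 1;
    }
    if (fork() == 0) {
        /* child: write into the pipe and exit */
        close(fd[0]);
        write(fd[1], "hello through the pipe\n", 23);
        close(fd[1]);
        _exit(0);
    }
    /* parent: read what the child wrote */
    close(fd[1]);
    ssize_t n = read(fd[0], buffer, sizeof(buffer) - 1);
    if (n > 0) {
        buffer[n] = '\0';
        printf("parent read: %s", buffer);
    }
    close(fd[0]);
    wait(NULL);
    return 0;
}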
Linux also supports named pipes, also known as FIFOs because pipes operate on a First In, First
Out principle. The first data written into the pipe is the first data read from the pipe. Unlike pipes,
FIFOs are not temporary objects, they are entities in the file system and can be created using the
mkfifo command. Processes are free to use a FIFO so long as they have appropriate access rights to
it. The way that FIFOs are opened is a little different from pipes. A pipe (its two file data
structures, its VFS inode and the shared data page) is created in one go whereas a FIFO already
exists and is opened and closed by its users. Linux must handle readers opening the FIFO before
writers open it as well as readers reading before any writers have written to it. That aside, FIFOs are
handled almost exactly the same way as pipes and they use the same data structures and operations.
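From a program, a FIFO is created with mkfifo() and then used like an ordinary file. A minimal sketch of the
writing side (the path /tmp/myfifo is just an example; run cat /tmp/myfifo in a second terminal to act as the
reader):

#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/tmp/myfifo";

    /* create the FIFO in the file system; same as the mkfifo command */
    if (mkfifo(path, 0666) == -1)
        perror("mkfifo (it may already exist)");

    /* open blocks until another process opens the other end for reading */
    int fd = open(path, O_WRONLY);
    if (fd == -1) {
        perror("open");
        return 1;
    }
    write(fd, "hello via the FIFO\n", 19);
    close(fd);
    return 0;
}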

Semaphores
In its simplest form a semaphore is a location in memory whose value can be tested and set by more
than one process. The test and set operation is, so far as each process is concerned, uninterruptible
or atomic; once started nothing can stop it. The result of the test and set operation is the addition of
the current value of the semaphore and the set value, which can be positive or negative. Depending
on the result of the test and set operation one process may have to sleep until the semaphore's value
is changed by another process. Semaphores can be used to implement critical regions, areas of
critical code that only one process at a time should be executing.
Although a program variable could be considered "a location in memory whose value can be tested
and set", the key different is that with a semaphore is accessible to other processes, whereas a
variable is only accessible to the one process that created it. The fact that it is accessible from
multiple processes is the key feature of a semaphore.
Say you had many cooperating processes reading records from and writing records to a single data
file. You would want that file access to be strictly coordinated. You could use a semaphore with an
initial value of 1 and, around the file operating code, put two semaphore operations, the first to test
and decrement the semaphore's value and the second to test and increment it. The first process to
access the file would try to decrement the semaphore's value and it would succeed, the semaphore's
value now being 0. This process can now go ahead and use the data file but if another process
wishing to use it now tries to decrement the semaphore's value it would fail as the result would be
-1. That process will be suspended until the first process has finished with the data file. When the
first process has finished with the data file it will increment the semaphore's value, making it 1
again. Now the waiting process can be awakened and this time its attempt to decrement the
semaphore will succeed.
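Written with the System V semaphore calls, the coordination just described looks roughly like the sketch below
(error checking trimmed; the key 0x1234 is only an example, and a real program would initialize the semaphore's
value only once, not on every run):

#include <stdio.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <unistd.h>

union semun {                           /* the caller must define this itself */
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

int main(void)
{
    /* one semaphore in the array, created if necessary */
    int semid = semget((key_t) 0x1234, 1, IPC_CREAT | 0666);
    union semun arg;
    struct sembuf lock   = { 0, -1, SEM_UNDO };  /* decrement: enter the region */
    struct sembuf unlock = { 0, +1, SEM_UNDO };  /* increment: leave the region */

    arg.val = 1;
    semctl(semid, 0, SETVAL, arg);      /* initial value 1: the region is free */

    semop(semid, &lock, 1);             /* sleeps here if another process holds it */
    printf("process %d is in the critical region\n", (int) getpid());
    sleep(5);                           /* pretend to work on the data file */
    semop(semid, &unlock, 1);           /* wake up any waiting process */
    return 0;
}

The SEM_UNDO flag asks the kernel to maintain the adjustment described later in this section, so the semaphore
is put back if the process exits without releasing it.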
Figure: System V IPC Semaphores
System V IPC semaphore objects each describe a semaphore array and Linux uses the semid_ds
data structure to represent this. All of the semid_ds data structures in the system are pointed at by
the semary, a vector of pointers. There are sem_nsems in each semaphore array, each one
described by a sem data structure pointed at by sem_base. All of the processes that are allowed to
manipulate the semaphore array of a System V IPC semaphore object may make system calls that
perform operations on them. The system call can specify many operations and each operation is
described by three inputs: the semaphore index, the operation value and a set of flags. The
semaphore index is an index into the semaphore array and the operation value is a numerical value
that will be added to the current value of the semaphore.
First Linux tests whether or not all of the operations would succeed. An operation will succeed if
the operation value added to the semaphore's current value would be greater than zero or if both the
operation value and the semaphore's current value are zero. If any of the semaphore operations
would fail Linux may suspend the process but only if the operation flags have not requested that the
system call is non-blocking. If the process is to be suspended then Linux must save the state of the
semaphore operations to be performed and put the current process onto a wait queue. It does this by
building a sem_queue data structure on the stack and filling it out. The new sem_queue data
structure is put at the end of this semaphore object's wait queue (using the sem_pending and
sem_pending_last pointers). The current process is put on the wait queue in the sem_queue
data structure (sleeper) and the scheduler called to choose another process to run.
If all of the semaphore operations would have succeeded and the current process does not need to be
suspended, Linux goes ahead and applies the operations to the appropriate members of the
semaphore array. Now Linux must check that any waiting, suspended, processes may now apply
their semaphore operations. It looks at each member of the operations pending queue
(sem_pending) in turn, testing to see if the semaphore operations will succeed this time. If they
will then it removes the sem_queue data structure from the operations pending list and applies the
semaphore operations to the semaphore array. It wakes up the sleeping process making it available
to be restarted the next time the scheduler runs. Linux keeps looking through the pending list from
the start until there is a pass where no semaphore operations can be applied and so no more
processes can be awakened.
There is a problem with semaphores: deadlocks. These occur when one process has altered the
semaphore's value as it enters a critical region but then fails to leave the critical region because it
crashed or was killed. Linux protects against this by maintaining lists of adjustments to the
semaphore arrays. The idea is that when these adjustments are applied, the semaphores will be put
back to the state that they were in before the process' set of semaphore operations was applied.
These adjustments are kept in sem_undo data structures queued both on the semid_ds data
structure and on the task_struct data structure for the processes using these semaphore arrays.

Each individual semaphore operation may request that an adjustment be maintained. Linux will
maintain at most one sem_undo data structure per process for each semaphore array. If the
requesting process does not have one, then one is created when it is needed. The new sem_undo
data structure is queued both onto this process' task_struct data structure and onto the
semaphore array's semid_ds data structure. As operations are applied to the semaphores in the
semaphore array, the negation of the operation value is added to this semaphore's entry in the
adjustment array of this process' sem_undo data structure. So, if the operation value is 2, then -2 is
added to the adjustment entry for this semaphore.
When processes are deleted, as they exit Linux works through their set of sem_undo data
structures applying the adjustments to the semaphore arrays. If a semaphore set is deleted, the
sem_undo data structures are left queued on the process' task_struct but the semaphore array
identifier is made invalid. In this case the semaphore clean up code simply discards the sem_undo
data structure.

Message Queues
Message queues allow one or more processes to write messages that will be read by one or more
reading processes. Linux maintains a list of message queues, the msgque vector: each element of
which points to a msqid_ds data structure that fully describes the message queue. When message
queues are created, a new msqid_ds data structure is allocated from system memory and inserted
into the vector.
Figure: System V IPC Message Queues
Each msqid_ds data structure contains an ipc_perm data structure and pointers to the messages
entered onto this queue. In addition, Linux keeps queue modification times such as the last time that
this queue was written to and so on. The msqid_ds also contains two wait queues: one for the
writers to the queue and one for the readers of the queue.
Each time a process attempts to write a message to the write queue, its effective user and group
identifiers are compared with the mode in this queue's ipc_perm data structure. If the process can
write to the queue then the message may be copied from the process' address space into a msg data
structure and put at the end of this message queue. Each message is tagged with an application
specific type, agreed between the cooperating processes. However, there may be no room for the
message as Linux restricts the number and length of messages that can be written. In this case the
process will be added to this message queue's write wait queue and the scheduler will be called to
select a new process to run. It will be awakened when one or more messages have been read from
this message queue.
Reading from the queue is similar. Again, the process' access rights to the queue are checked.
A reading process may choose to either get the first message in the queue regardless of its type or
select messages with particular types. If no messages match these criteria, the reading process will be
added to the message queue's read wait queue and the scheduler run. When a new message is
written to the queue this process will be awakened and run again.
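In code, the sending side looks roughly like the sketch below; a reader would call msgrcv() with the same key
and either 0 (first message of any type) or a specific type. The key 0x4242 and the message type 1 are only
examples.

#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>

struct message {
    long mtype;                         /* application specific type, must be > 0 */
    char mtext[64];
};

int main(void)
{
    int msqid = msgget((key_t) 0x4242, IPC_CREAT | 0666);
    struct message msg;

    if (msqid == -1) {
        perror("msgget");
        return 1;
    }
    msg.mtype = 1;
    strcpy(msg.mtext, "hello via the message queue");

    /* the message is copied onto the queue; this sleeps if the queue is full */
    if (msgsnd(msqid, &msg, strlen(msg.mtext) + 1, 0) == -1)
        perror("msgsnd");
    return 0;
}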

Shared Memory
Shared memory allows one or more processes to communicate via memory that appears in all of
their virtual address spaces. The pages of the virtual memory are referenced by page table entries in
each of the sharing processes' page tables. It does not have to be at the same address in all of the
processes' virtual memory. As with all System V IPC objects, access to shared memory areas is
controlled via keys and access rights checking. Once the memory is being shared, there are no
checks on how the processes use it. They must rely on other mechanisms, for example System V
semaphores, to synchronize access to the memory.

Figure: System V IPC Shared Memory


Each newly created shared memory area is represented by a shmid_ds data structure. These are
kept in the shm_segs vector.

The shmid_ds data structure decribes how big the area of shared memory is, how many processes
are using it and information about how that shared memory is mapped into their address spaces. It is
the creator of the shared memory that controls the access permissions to that memory and whether
its key is public or private. If it has enough access rights it may also lock the shared memory into
physical memory.
Each process that wishes to share the memory must attach to that virtual memory via a system call.
This creates a new vm_area_struct data structure describing the shared memory for this
process. The process can choose where in its virtual address space the shared memory goes or it can
let Linux choose a free area large enough. The new vm_area_struct structure is put into the list
of vm_area_struct pointed at by the shmid_ds. The vm_next_shared and
vm_prev_shared pointers are used to link them together. The virtual memory is not actually
created during the attachment; it happens when the first process attempts to access it.
The first time that a process accesses one of the pages of the shared virtual memory, a page fault
will occur. When Linux fixes up that page fault it finds the vm_area_struct data structure
describing it. This contains pointers to handler routines for this type of shared virtual memory. The
shared memory page fault handling code looks in the list of page table entries for this shmid_ds
to see if one exists for this page of the shared virtual memory. If it does not exist, it will allocate a
physical page and create a page table entry for it.
This entry is saved in the current process' page tables and in the shmid_ds. Consequently, when the
next process that attempts to access this memory gets a page fault, the shared memory fault
handling code will use this newly created physical page for that process too. So, the first process
that accesses a page of the shared memory causes it to be created and thereafter access by the other
processes cause that page to be added into their virtual address spaces.
When processes no longer wish to share the virtual memory, they detach from it. So long as other
processes are still using the memory the detach only affects the current process. Its
vm_area_struct is removed from the shmid_ds data structure and deallocated. The current
process's page tables are updated to invalidate the area of virtual memory that it once shared. When
the last process sharing the memory detaches from it, the pages of the shared memory currently in
physical memory are freed, as is the shmid_ds data structure for this shared memory.
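
From a process's point of view, the create, attach and detach steps described above correspond to the
shmget(), shmat() and shmdt() system calls. The sketch below is illustrative only; the key 0x5678 and
the one-page segment size are arbitrary choices.

#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int main(void)
{
    /* 0x5678 is an arbitrary example key; 4096 bytes is one page on most systems */
    int shmid = shmget(0x5678, 4096, IPC_CREAT | 0666);
    if (shmid < 0) { perror("shmget"); return 1; }

    /* attach wherever Linux chooses; a second process attaching with the same
       key may well see the same pages at a different virtual address */
    char *area = shmat(shmid, NULL, 0);
    if (area == (char *) -1) { perror("shmat"); return 1; }

    strcpy(area, "shared data");    /* the first access faults the page in */

    shmdt(area);                    /* detach; other attached processes are unaffected */
    shmctl(shmid, IPC_RMID, NULL);  /* mark the segment for deletion */
    return 0;
}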

Further complications arise when shared virtual memory is not locked into physical memory. In this
case the pages of the shared memory may be swapped out to the system's swap disk during periods
of high memory usage. How shared memory is swapped into and out of physical memory
is described in the section on memory management.

Sockets

System V IPC Mechanisms


Linux supports three types of interprocess communication mechanism that first appeared in Unix
System V (1983): message queues, semaphores and shared memory. These System V IPC
mechanisms all share common authentication methods. Processes may access these resources only
by passing a unique reference identifier to the kernel via system calls. Access to these System V IPC
objects is checked using access permissions, much like accesses to files are checked. The access
rights to a System V IPC object are set by the creator of the object via system calls. The object's
reference identifier is used by each mechanism as an index into a table of resources. It is not a
straightforward index but requires some manipulation to generate it.
All Linux data structures representing System V IPC objects in the system include an ipc_perm
structure, which contains the owner and creator processes' user and group identifiers, the access
mode for this object (owner, group and other) and the IPC object's key. The key is used as a way of
locating the System V IPC object's reference identifier. Two sets of keys are supported: public and
private. If the key is public then any process in the system, subject to rights checking, can find the
reference identifier for the System V IPC object. System V IPC objects can never be referenced
with a key, only by their reference identifier.
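
As a rough illustration of public versus private keys, the sketch below creates one semaphore set that
unrelated processes could locate by key and one that they cannot. The rendezvous path passed to
ftok() is a hypothetical example and must exist for ftok() to succeed; nothing about the path or the
permissions shown here is required by Linux.

#include <stdio.h>
#include <sys/ipc.h>
#include <sys/sem.h>

int main(void)
{
    /* A public key, derived here from a well-known (hypothetical) path, lets
       any process with sufficient access rights find the same IPC object. */
    key_t pub = ftok("/tmp/example-rendezvous", 'A');
    int pub_id = semget(pub, 1, IPC_CREAT | 0666);

    /* IPC_PRIVATE always creates a fresh object; only processes that are handed
       the returned reference identifier (for example a child after fork) can use it. */
    int priv_id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);

    printf("public id %d, private id %d\n", pub_id, priv_id);
    return 0;
}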

Kernel Mechanisms
This chapter describes some of the general tasks and mechanisms that the Linux kernel needs to
supply so that other parts of the kernel work effectively together.
Bottom Half Handling

Figure: Bottom Half Handling Data Structures


There are often times when the kernel should not be doing any work at all. A good example of this is
during interrupt processing. When the interrupt was asserted, the processor stopped what it was doing and the
operating system delivered the interrupt to the appropriate device driver. Device drivers should not spend too
much time handling interrupts as, during this time, nothing else in the system can run. There is often
some work that could just as well be done later on. Linux's bottom half handlers were invented so
that device drivers and other parts of the Linux kernel could queue work to be done later on. The figure above shows the
kernel data structures associated with bottom half handling.
There can be up to 32 different bottom half handlers, which are referenced through a vector of
pointers called bh_base. These pointers point to each of the kernel's bottom half handling routines.
bh_active and bh_mask have their bits set according to what handlers have been installed and are
active. If bit N of bh_mask is set then the Nth element of bh_base contains the address of a bottom
half routine. If bit N of bh_active is set then the Nth bottom half handler routine should be called as
soon as the scheduler deems reasonable. These indices are statically defined. The timer bottom half
handler (index 0) is the highest priority, the console bottom half handler (index 1) is next in priority
and so on. Typically the bottom half handling routines have lists of tasks associated with them. For
example, the immediate bottom half handler works its way through the immediate tasks
(tq_immediate), which contains tasks that need to be performed immediately.
Some of the kernel's bottom half handlers are device specific, but others are more generic:
TIMER
This handler is marked as active each time the system's periodic timer interrupts and is used to
drive the kernel's timer mechanisms,
CONSOLE
This handler is used to process console messages,
TQUEUE
This handler is used to process tty messages,
NET
This handler handles general network processing,
IMMEDIATE
This is a generic handler used by several device drivers to queue work to be done later.

Whenever a device driver, or some other part of the kernel, needs to schedule work to be done later, it adds that work
to the appropriate system queue, for example the timer queue, and then signals the kernel that some bottom half
handling needs to be done. It does this by setting the appropriate bit in bh_active. Bit 8 is set if the
driver has queued something on the immediate queue and wishes the immediate bottom half handler
to run and process it. The bh_active bitmask is checked at the end of each system call, just before
control is returned to the calling process. If it has any bits set, the bottom half handler routines that
are active are called. Bit 0 is checked first, then 1 and so on until bit 31.
The bit in bh_active is cleared as each bottom half handling routine is called. bh_active is transient;
it only has meaning between calls to the scheduler and is a way of not calling bottom half handling
routines when there is no work for them to do.
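
The sketch below is a simplified user-space model of this bookkeeping, not the kernel's actual code:
bh_base holds the handler addresses, bh_mask records which slots have a handler installed and
bh_active records which handlers need to run before control returns to the calling process.

#include <stdio.h>

#define BH_COUNT 32

typedef void (*bh_fn)(void);

static bh_fn         bh_base[BH_COUNT]; /* handler routines, statically indexed */
static unsigned long bh_mask;           /* which slots have a handler installed */
static unsigned long bh_active;         /* which handlers should run soon */

static void install_bh(int nr, bh_fn fn)
{
    bh_base[nr] = fn;
    bh_mask |= 1UL << nr;
}

static void mark_bh(int nr)             /* e.g. called from an interrupt handler */
{
    bh_active |= 1UL << nr;
}

static void do_bottom_halves(void)      /* e.g. run at the end of a system call */
{
    for (int nr = 0; nr < BH_COUNT; nr++) {
        if ((bh_active & bh_mask) & (1UL << nr)) {
            bh_active &= ~(1UL << nr);  /* clear the bit as the handler is called */
            bh_base[nr]();
        }
    }
}

static void timer_bh(void) { puts("timer bottom half ran"); }

int main(void)
{
    install_bh(0, timer_bh);            /* index 0: the highest priority handler */
    mark_bh(0);
    do_bottom_halves();
    return 0;
}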

Task Queues

Figure: A Task Queue


Task queues are the kernel's way of deferring work until later. Linux queues work using a generic mechanism
so that it can process the queues later. Task queues are often used in conjunction with bottom half handlers;
the timer task queue is processed when the timer queue bottom half handler runs. A task queue is a
simple data structure (see the figure above) which consists of a singly linked list of tq_struct data
structures, each of which contains the address of a routine and a pointer to some data.
The routine will be called when the element on the task queue is processed and it will be passed a
pointer to the data. Anything in the kernel, for example a device driver, can create and use task queues, but
there are three task queues created and managed by the kernel:
timer
This queue is used to queue work that will be done as soon after the next system clock tick as is possible.
Each clock tick, this queue is checked to see if it contains any entries and, if it does, the timer queue
bottom half handler is made active. The timer queue bottom half handler is processed, along with all
the other bottom half handlers, when the scheduler next runs. This queue should not be
confused with system timers, which are a much more sophisticated mechanism.
immediate
This queue is also processed when the scheduler processes the active bottom half handlers. The
immediate bottom half handler is not as high in priority as the timer bottom half handler and
so these tasks will be run later.
scheduler
This task queue is processed directly by the scheduler. It is used to support other task queues in the
system and, in this case, the task to be run will be a routine that processes a task queue, say
for a device driver.

When task queues are processed, the pointer to the first element in the queue is removed from the
queue and replaced with a null pointer. In fact, this removal is an atomic operation, one that cannot be interrupted.
Then each element in the queue has its handling routine called in turn. The elements in the queue are often
statically allocated data. However, there is no inherent mechanism for discarding allocated memory;
the task queue processing routine simply moves on to the next element in the list. It is the job of the task
itself to ensure that it properly cleans up any allocated memory.
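
The following is a stripped-down model of a task queue as just described. The structure mirrors the
kernel's tq_struct, but the helper names carrying an _example suffix are invented for this sketch, the
queue order is simplified, and the atomic removal of the list head is only noted in a comment.

#include <stdio.h>

struct tq_struct {
    struct tq_struct *next;        /* singly linked list of queued tasks */
    void (*routine)(void *);       /* routine called when the task is processed */
    void *data;                    /* pointer passed to the routine */
};

static struct tq_struct *tq_example;   /* head of one task queue */

static void queue_task_example(struct tq_struct *t, struct tq_struct **q)
{
    t->next = *q;                  /* order is simplified in this model */
    *q = t;
}

static void run_task_queue_example(struct tq_struct **q)
{
    struct tq_struct *list = *q;   /* in the kernel this swap is atomic */
    *q = NULL;
    while (list) {
        struct tq_struct *t = list;
        list = list->next;
        t->routine(t->data);       /* the task itself must free any memory it owns */
    }
}

static void hello(void *data) { printf("deferred: %s\n", (const char *) data); }

int main(void)
{
    static struct tq_struct task = { .routine = hello, .data = "later work" };
    queue_task_example(&task, &tq_example);
    run_task_queue_example(&tq_example);   /* e.g. from a bottom half handler */
    return 0;
}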

Wait Queues
There are many times when a process must wait for a system resource. For example a process may
need the VFS inode describing a directory in the file system and that inode may not be in the buffer
cache. In this case the process must wait for that inode to be fetched from the physical media
containing the file system before it can carry on.
Figure: A Wait Queue

The Linux kernel uses a simple data structure, a wait queue (see the figure above), which consists of
a pointer to the process's task_struct and a pointer to the next element in the wait queue.

When processes are added to the end of a wait queue they can either be interruptible or
uninterruptible. Interruptible processes may be interrupted by events such as timers expiring or
signals being delivered whilst they are waiting on a wait queue. The waiting process' state will
reflect this and either be INTERRUPTIBLE or UNINTERRUPTIBLE. As this process can not now
continue to run, the scheduler is run and, when it selects a new process to run, the waiting process
will be suspended.
When the wait queue is processed, the state of every process in the wait queue is set to RUNNING.
If the process has been removed from the run queue, it is put back onto the run queue. The next time
the scheduler runs, the processes that are on the wait queue are now candidates to be run as they are
now no longer waiting. When a process on the wait queue is scheduled, the first thing that it will do
is remove itself from the wait queue. Wait queues can be used to synchronize access to system
resources and they are used by Linux in its implementation of semaphores (see here).
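
As a rough model of what has just been described, the sketch below shows a wait queue element
pointing at a task and at the next element, together with the add and wake-up steps. The _model names
are invented for this illustration; the kernel's real versions also remove the waiting process from the
run queue and invoke the scheduler.

#include <stdio.h>

enum task_state { RUNNING, INTERRUPTIBLE, UNINTERRUPTIBLE };

struct task_struct_model {
    const char     *name;
    enum task_state state;
};

struct wait_queue_model {
    struct task_struct_model *task;   /* the waiting process */
    struct wait_queue_model  *next;   /* next element in the wait queue */
};

static void add_wait_queue_model(struct wait_queue_model **q,
                                 struct wait_queue_model *entry,
                                 enum task_state how)
{
    entry->task->state = how;         /* INTERRUPTIBLE or UNINTERRUPTIBLE */
    entry->next = *q;
    *q = entry;                       /* the scheduler would now pick another process */
}

static void wake_up_model(struct wait_queue_model **q)
{
    for (struct wait_queue_model *e = *q; e; e = e->next)
        e->task->state = RUNNING;     /* candidates to run again; each process
                                         removes itself from the queue when scheduled */
}

int main(void)
{
    struct task_struct_model t = { "reader", RUNNING };
    struct wait_queue_model entry = { &t, NULL };
    struct wait_queue_model *inode_wait = NULL;    /* e.g. waiting for a VFS inode */

    add_wait_queue_model(&inode_wait, &entry, INTERRUPTIBLE);
    wake_up_model(&inode_wait);                    /* the resource became available */
    printf("%s is runnable again: %d\n", t.name, t.state == RUNNING);
    return 0;
}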

Timers

Figure: System Timers


An operating system needs to be able to schedule an activity sometime in the future. A mechanism
is needed whereby activities can be scheduled to run at some relatively precise time. Any
microprocessor that wishes to support an operating system must have a programmable interval timer
that periodically interrupts the processor. This periodic interrupt is known as a system clock tick and it acts
like a metronome, orchestrating the system's activities.
Linux has a very simple view of what time it is; it measures time in clock ticks since the system booted. All
system times are based on this measurement, which is known as jiffies after the globally available
variable of the same name.
Linux has two types of system timers, both are routines to be called at some system time but they
are slightly different in their implementations. The Figure above shows both mechanisms.
The first, the old timer mechanism, has a static array of 32 pointers to timer_struct data
structures and a mask of active timers, timer_active.

Where the timers go in the timer table is statically defined (rather like the bottom half handler table
bh_base). Entries are added into this table mostly at system initialization time. The second, newer,
mechanism uses a linked list of timer_list data structures held in ascending expiry time order.
Both methods use the time in jiffies as an expiry time so that a timer that wished to run in 5s would
have to convert 5s to units of jiffies and add that to the current system time to get the system time in
jiffies when the timer should expire. Every system clock tick the timer bottom half handler is
marked as active so that, when the scheduler next runs, the timer queues will be processed. The
timer bottom half handler processes both types of system timer. For the old system timers the
timer_active bit mask is checked for bits that are set.

If the expiry time for an active timer has expired (expiry time is less than the current system jiffies),
its timer routine is called and its active bit is cleared. For new system timers, the entries in the
linked list of timer_list data structures are checked.

Every expired timer is removed from the list and its routine is called. The new timer mechanism has
the advantage of being able to pass an argument to the timer routine.
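
The sketch below models the newer, list-based timer mechanism: an expiry time in jiffies, a routine and
an argument. The _model names and the run_timers() helper are invented for the illustration, and
turning 5 seconds into 5 * HZ clock ticks follows the conversion described above.

#include <stdio.h>

#define HZ 100                        /* clock ticks per second */

static unsigned long jiffies;         /* ticks since boot (simplified) */

struct timer_list_model {
    struct timer_list_model *next;    /* kept in ascending expiry order in the kernel */
    unsigned long expires;            /* absolute expiry time, in jiffies */
    void (*function)(unsigned long);  /* routine to call on expiry */
    unsigned long data;               /* argument passed to the routine */
};

static struct timer_list_model *timers;

static void run_timers(void)          /* done by the timer bottom half handler */
{
    while (timers && timers->expires <= jiffies) {
        struct timer_list_model *t = timers;
        timers = t->next;             /* expired timers are removed from the list */
        t->function(t->data);
    }
}

static void ding(unsigned long data) { printf("timer %lu fired\n", data); }

int main(void)
{
    static struct timer_list_model t = { NULL, 0, ding, 42 };
    t.expires = jiffies + 5 * HZ;     /* "run this in 5 seconds" */
    timers = &t;

    jiffies += 5 * HZ;                /* pretend 5 seconds of clock ticks have passed */
    run_timers();
    return 0;
}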

Buzz Locks
Buzz locks, better known as spin locks, are a primitive way of protecting a data structure or piece of
code. They only allow one process at a time to be within a critical region of code. They are used in
Linux to restrict access to fields in data structures, using a single integer field as a lock. Each
process wishing to enter the region attempts to change the lock's initial value from 0 to 1. If its
current value is 1, the process tries again, spinning in a tight loop of code. The access to the
memory location holding the lock must be atomic: the action of reading its value, checking that it is
0 and then changing it to 1 cannot be interrupted by any other process. Most architectures provide
support for this via special instructions but you can also implement buzz locks using uncached main
memory.
When the owning process leaves the critical region of code it decrements the buzz lock, returning its
value to 0. Any processes spinning on the lock will now read it as 0; the first one to do this will
increment it to 1 and enter the critical region.
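
A buzz lock can be sketched in C using the GCC/Clang atomic builtins to make the read-check-set step
uninterruptible. This is an illustration of the idea rather than the kernel's implementation, and the
function names are invented for the sketch.

#include <stdio.h>

typedef volatile int buzz_lock_t;     /* a single integer used as the lock */

static void buzz_lock(buzz_lock_t *lock)
{
    /* atomically set the lock to 1; if its previous value was already 1,
       another process owns it, so spin and try again */
    while (__atomic_exchange_n(lock, 1, __ATOMIC_ACQUIRE) == 1)
        ;                             /* spin in a tight loop */
}

static void buzz_unlock(buzz_lock_t *lock)
{
    __atomic_store_n(lock, 0, __ATOMIC_RELEASE);   /* return the lock's value to 0 */
}

static buzz_lock_t lock;              /* initial value 0: unlocked */
static int protected_field;

int main(void)
{
    buzz_lock(&lock);
    protected_field++;                /* the critical region */
    buzz_unlock(&lock);
    printf("%d\n", protected_field);
    return 0;
}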

Semaphores
Semaphores are used to protect critical regions of code or data structures. Remember that each
access of a critical piece of data such as a VFS inode describing a directory is made by kernel code
running on behalf of a process. It would be very dangerous to allow one process to alter a critical
data structure that is being used by another process. One way to achieve this would be to use a buzz
lock around the critical piece of data that is being accessed, but this is a simplistic approach that
would degrade system performance.
Instead Linux uses semaphores to allow just one process at a time to access critical regions of code
and data; all other processes wishing to access this resource will be made to wait until it becomes
free. The waiting processes are suspended, other processes in the system can continue to run as
normal.
A Linux semaphore data structure contains the following information:

count
This field keeps track of the count of processes wishing to use this resource. A positive value
means that the resource is available. A negative or zero value means that processes are waiting
for it. An initial value of 1 means that one and only one process at a time can use this
resource. When processes want this resource they decrement the count and when they have
finished with this resource they increment the count,
waking
This is the count of processes waiting for this resource, which is also the number of processes
waiting to be awakened when this resource becomes free,
wait queue
When processes are waiting for this resource they are put onto this wait queue,
lock
A buzz lock used when accessing the waking field.

Suppose the initial count for a semaphore is 1, the first process to come along will see that the count
is positive and decrement it by 1, making it 0. The process now "owns" the critical piece of code or
resource that is being protected by the semaphore. When the process leaves the critical region it
increments the semaphore's count. The optimal case is where there are no other processes
contending for ownership of the critical region. Linux has implemented semaphores to work
efficiently for this, the most common case.
If another process wishes to enter the critical region whilst it is owned by a process it too will
decrement the count. As the count is now negative (-1) the process cannot enter the critical region.
Instead it must wait until the owning process exits it. Linux makes the waiting process sleep until
the owning process wakes it on exiting the critical region. The waiting process adds itself to the
semaphore's wait queue and sits in a loop checking the value of the waking field and calling the
scheduler until waking is non-zero. The owner of the critical region increments the semaphore's
count and if it is less than or equal to zero then there are processes sleeping, waiting for this
resource. In the optimal case the semaphore's count would have been returned to its initial value of
1 and no further work would be necessary. Otherwise, the owning process increments the waking counter and
wakes up the process sleeping on the semaphore's wait queue. When the waiting process wakes up,
the waking counter is now 1 and it knows that it may enter the critical region. It decrements the
waking counter, returning it to a value of zero, and continues. All accesses to the waking field of
the semaphore are protected by a buzz lock using the semaphore's lock field.
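
The sketch below models only the count and waking bookkeeping described above. It is not the kernel's
implementation: the wait queue and the buzz lock are omitted, and where the kernel would put the
caller to sleep and run the scheduler this model merely notes it in a comment.

#include <stdio.h>

struct semaphore_model {
    int count;     /* positive: resource free; zero or negative: owned, with waiters */
    int waking;    /* how many sleepers have been told they may proceed */
    /* wait queue and buzz lock omitted from this sketch */
};

static void down_model(struct semaphore_model *sem)
{
    sem->count--;
    if (sem->count < 0) {
        /* contended case: the process would add itself to the wait queue and
           sleep, calling the scheduler, until waking becomes non-zero */
        while (sem->waking == 0)
            ;                         /* schedule() here in the kernel */
        sem->waking--;
    }
}

static void up_model(struct semaphore_model *sem)
{
    sem->count++;
    if (sem->count <= 0)
        sem->waking++;                /* and wake a process on the wait queue */
}

int main(void)
{
    struct semaphore_model sem = { 1, 0 };   /* one owner at a time */

    down_model(&sem);   /* uncontended fast path: count goes from 1 to 0 */
    /* ... critical region ... */
    up_model(&sem);     /* count returns to 1, nobody to wake */

    printf("count back to %d\n", sem.count);
    return 0;
}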

Interrupts Exceptions and Traps


Normally, processes are asleep, waiting on some event. When that event happens, these processes
are called into action. Remember, it is the responsibility of the sched process to free memory when
a process runs short of it. So, it is not until memory is needed that sched starts up.
How does sched know that memory is needed? When a process makes reference to a place in its
virtual memory space that does not yet exist in physical memory, a page fault occurs. Faults belong
to a group of system events called exceptions. An exception is simply something that occurs outside
of what is normally expected. Faults (exceptions) can occur either before or during the execution of
an instruction.
For example, if an instruction that is not yet in memory needs to be read, the exception (page fault)
occurs before the instruction starts being executed. On the other hand, if the instruction is supposed
to read data from a virtual memory location that isn't in physical memory, the exception occurs
during the execution of the instruction. In cases like these, once the missing memory location is
loaded into physical memory, the CPU can start the instruction.
Traps are exceptions that occur after an instruction has been executed. For example, attempting to
divide by zero generates an exception. However, in this case it doesn't make sense to restart the
instruction because every time we try to run that instruction, it still comes up with a Divide-by-
Zero exception. That is, all memory references are read before we start to execute the command.
It is also possible for processes to generate exceptions intentionally. These programmed exceptions
are called software interrupts.
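
A system call is the most familiar example of such a programmed exception: the process deliberately
traps into the kernel. The sketch below makes this explicit by using the syscall() wrapper with
SYS_getpid, a harmless call available on Linux; ordinarily the getpid() library routine hides this
detail.

#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

int main(void)
{
    /* Asking the kernel for our process ID by raising a software interrupt
       (the system call entry path) rather than through the usual wrapper. */
    long pid = syscall(SYS_getpid);
    printf("pid via explicit system call: %ld\n", pid);
    return 0;
}
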
When any one of these exceptions occurs, the system must react to the exception. To react, the
system will usually switch to another process to deal with the exception, which means a context
switch. In our discussion of process scheduling, I mentioned that at every clock tick the priority of
every process is recalculated. To make those calculations, something other than those processes
have to run.
In Linux, the system timer (or clock) is programmed to generate a hardware interrupt 100 times a
second (as defined by the HZ system parameter). The interrupt is accomplished by sending a signal
to a special chip on the motherboard called an interrupt controller. (We go into more detail about
these in the section on hardware.) The interrupt controller then sends an interrupt to the CPU. When
the CPU receives this signal, it knows that the clock tick has occurred and it jumps to a special part
of the kernel that handles the clock interrupt. Scheduling priorities are also recalculated within this
same section of code.
Because the system might be doing something more important when the clock generates an
interrupt, you can turn interrupts off using "masking". In other words, there is a way to mask out
interrupts. Interrupts that can be masked out are called maskable interrupts. An example of
something more important than the clock would be accepting input from the keyboard. This is why
clock ticks are lost on systems with a lot of users inputting a lot of data. As a result, the system
clock appears to slow down over time.
Sometimes events occur on the system that you want to know about no matter what. Imagine what
would happen if memory was bad. If the system was in the middle of writing to the hard disk when
it encountered the bad memory, the results could be disastrous. If the system recognizes the bad
memory, the hardware generates an interrupt to alert the CPU. If the CPU is told to ignore all
hardware interrupts, it would ign