Basic File Attributes
The UNIX file system lets users access files that do not belong to them without compromising security. A file has a number of attributes (properties) that are stored in its inode. In this chapter, we discuss:
ls -l to display file attributes (properties)
Listing of a specific directory
Ownership and group ownership
Different file permissions
Listing File Attributes
The ls command is used to obtain a list of all filenames in the current directory. The output in UNIX lingo is often referred to as the listing. The -l option produces the long listing, and we sometimes combine it with other options to display other attributes or to order the list in a different sequence. ls looks up the file's inode to fetch its attributes. ls -l lists seven attributes of all files in the current directory:
File type and Permissions
Links
Ownership
Group ownership
File size
Last Modification date and time
File name
The file type and its permissions are associated with each file. Links indicate the number of file names maintained by the system for the file. This does not mean that there are that many copies of the file. The third field shows the owner, the user who created the file. The fourth field shows the group owner; every user belongs to a group. The file size is displayed in bytes. The last modification time is the next field. If you change only the permissions or ownership of the file, the modification time remains unchanged. The last field displays the file name.
For example,
$ ls -l
total 72
-rw-r--r-- 1 kumar metal 19514 may 10 13:45 chap01
-rw-r--r-- 1 kumar metal 4174 may 10 15:01 chap02
-rw-rw-rw- 1 kumar metal 84 feb 12 12:30 dept.lst
-rw-r--r-- 1 kumar metal 9156 mar 12 1999 genie.sh
drwxr-xr-x 2 kumar metal 512 may 9 10:31 helpdir
drwxr-xr-x 2 kumar metal 512 may 9 09:57 progs
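The seven fields described above can be picked apart with awk. This is only a sketch: demo.txt is a hypothetical file created in a temporary directory, and it assumes the usual POSIX long-listing layout in which the size is the fifth field.

```shell
# Sketch: split one "ls -l" line into some of the attributes listed above.
# demo.txt is a hypothetical file created only for this illustration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'hello\n' > demo.txt
chmod 644 demo.txt

line=$(ls -l demo.txt)
# Keep only the first ten characters of field 1 (file type + nine permission bits);
# some systems append an extra marker character after them.
perms=$(printf '%s\n' "$line" | awk '{print substr($1, 1, 10)}')
links=$(printf '%s\n' "$line" | awk '{print $2}')   # number of links
size=$(printf '%s\n' "$line" | awk '{print $5}')    # size in bytes
name=$(printf '%s\n' "$line" | awk '{print $NF}')   # file name (last field)

echo "type+permissions: $perms"
echo "links: $links  size: $size  name: $name"
```

The owner and group owner are fields 3 and 4; they are left out here because their values depend on who runs the commands.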
Listing Directory Attributes
ls -d by itself does not list all subdirectories in the current directory; it is combined with -l and a directory name to show the attributes of that directory.
For example,
ls -ld helpdir progs
drwxr-xr-x 2 kumar metal 512 may 9 10:31 helpdir
drwxr-xr-x 2 kumar metal 512 may 9 09:57 progs
Directories are easily identified in the listing by the first character of the first column, which here shows a d. The significance of the attributes of a directory differs a good deal from that of an ordinary file. To see the attributes of a directory rather than the files contained in it, use ls -ld with the directory name. Note that simply using ls -d will not list all subdirectories in the current directory. Strange though it may seem, ls has no option to list only directories.
File Ownership
When you create a file, you become its owner. Every owner belongs to a group, which becomes the group owner of the file. Several users may belong to a single group, but the privileges of the group are set by the owner of the file and not by the group members. When the system administrator creates a user account, the administrator has to assign these parameters to the user:
The user-id (UID): both its name and numeric representation
The group-id (GID): both its name and numeric representation
File Permissions
UNIX follows a three-tiered file protection system that determines a file's access rights. It is displayed in the following format:
Filetype owner (rwx) groupowner (rwx) others (rwx)
For example:
-rwxr-xr-- 1 kumar metal 20500 may 10 19:21 chap02
rwx          r-x           r--
owner/user   group owner   others
The first group has all three permissions. The file is readable, writable and executable by the owner of the file. The second group has a hyphen in the middle slot, which indicates the absence of write permission for the group owner of the file. The third group has the write and execute bits absent. This set of permissions applies to others.
You can set different permissions for the three categories of users: owner, group and others. It's important that you understand them because a little learning here can be a dangerous thing. A faulty file permission is a sure recipe for disaster.
Changing File Permissions
A file or a directory is created with a default set of permissions, which is determined by umask. Let us assume that the permissions of the created file are -rw-r--r--. Using the chmod command, we can change the file permissions and allow the owner to execute the file. The command can be used in two ways:
In a relative manner, by specifying the changes to the current permissions
In an absolute manner, by specifying the final permissions
Relative Permissions
Used in this way, chmod changes only the permissions specified on the command line and leaves the other permissions unchanged. Its syntax is:
chmod category operation permission filename(s)
chmod takes as its argument an expression which contains:
user category (user, group, others)
operation to be performed (assign or remove a permission)
type of permission (read, write, execute)
Category        Operation      Permission
u - user        + assign       r - read
g - group       - remove       w - write
o - others      = absolute     x - execute
a - all (ugo)
Let us discuss some examples:
Initially,
-rw-r--r-- 1 kumar metal 1906 sep 23:38 xstart
chmod u+x xstart
-rwxr--r-- 1 kumar metal 1906 sep 23:38 xstart
The command assigns (+) execute (x) permission to the user (u); the other permissions remain unchanged.
chmod ugo+x xstart or
chmod a+x xstart or
chmod +x xstart
-rwxr-xr-x 1 kumar metal 1906 sep 23:38 xstart
chmod accepts multiple file names on the command line:
chmod u+x note note1 note3
Suppose that initially,
-rwxr-xr-x 1 kumar metal 1906 sep 23:38 xstart
chmod go-r xstart
Then, it becomes
-rwx--x--x 1 kumar metal 1906 sep 23:38 xstart
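The sequence of relative changes above can be replayed on a scratch file. The sketch below assumes a temporary directory and a small helper named show (hypothetical, not a standard command) that prints only the type and permission characters of the listing.

```shell
# Sketch: replay the relative chmod examples on a scratch copy of xstart.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch xstart
chmod 644 xstart                       # start from -rw-r--r--

# show is a hypothetical helper: first ten characters of the long listing.
show() { ls -l "$1" | cut -c1-10; }

chmod u+x xstart                       # add execute for the user only
after_ux=$(show xstart)
chmod a+x xstart                       # add execute for user, group and others
after_ax=$(show xstart)
chmod go-r xstart                      # remove read from group and others
after_gor=$(show xstart)

echo "$after_ux  $after_ax  $after_gor"
```

Each step touches only the bits named in the expression, which is what makes the relative form convenient when the current permissions are already known.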
Absolute Permissions
Here, we need not know the current file permissions. We can set all nine permissions explicitly. A string of three octal digits is used as the expression. The permissions of each category are represented by one octal digit, obtained by adding the values of the permissions granted to that category:
Read permission 4 (binary 100)
Write permission 2 (binary 010)
Execute permission 1 (binary 001)
Octal   Permissions   Significance
0       ---           no permissions
1       --x           execute only
2       -w-           write only
3       -wx           write and execute
4       r--           read only
5       r-x           read and execute
6       rw-           read and write
7       rwx           read, write and execute
We have three categories and three permissions for each category, so three octal digits can describe a file's permissions completely. The most significant digit represents the user and the least significant one represents others. chmod can use this three-digit string as the expression.
Using relative permission, we have,
chmod a+rw xstart
Using absolute permission, we have,
chmod 666 xstart
chmod 644 xstart
chmod 761 xstart
will assign all permissions to the owner, read and write permissions for
the group and only execute permission to the others.
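The addition of 4, 2 and 1 for each category can be sketched as a small shell function. perm_to_octal is a hypothetical helper written for this illustration; it expects the nine permission characters without the leading file-type character.

```shell
# Sketch: convert a nine-character permission string (e.g. rwxr-xr--) to octal.
# perm_to_octal is a hypothetical helper, not a standard UNIX command.
perm_to_octal() {
    perms=$1
    result=""
    for start in 1 4 7; do
        # Take the three characters of one category: user, group, then others.
        triplet=$(printf '%s' "$perms" | cut -c"$start"-$((start + 2)))
        digit=0
        case $triplet in r??) digit=$((digit + 4)) ;; esac   # read    = 4
        case $triplet in ?w?) digit=$((digit + 2)) ;; esac   # write   = 2
        case $triplet in ??x) digit=$((digit + 1)) ;; esac   # execute = 1
        result="$result$digit"
    done
    printf '%s\n' "$result"
}

perm_to_octal rwxrw---x
perm_to_octal rw-r--r--
```

The first call corresponds to the chmod 761 example above, and the second to the permissions set by chmod 644.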
777 signifies all permissions for all categories, yet we can still prevent such a file from being deleted. 000 signifies the absence of all permissions for all categories, yet we can still delete such a file. It is the directory permissions that determine whether a file can be deleted or not. Only the owner can change a file's permissions; a user cannot change the permissions of another user's files. The system administrator, however, can do anything.
The Security Implications
Suppose the default permissions of the file xstart are
-rw-r--r--
chmod u-rw,go-r xstart or
chmod 000 xstart
----------
A file without any permissions is simply useless, but the user can still delete it.
On the other hand,
chmod a+rwx xstart or
chmod 777 xstart
-rwxrwxrwx
By default, the UNIX system never creates a file with these permissions, because a system where every file is universally writable can never be secure. Directory permissions also play a vital role here.
We can also use chmod recursively:
chmod -R a+x shell_scripts
This makes all the files and subdirectories found in the shell_scripts directory executable by all users. chmod -R can also be given * or . as its argument to work on the current directory. When you know the shell metacharacters well, you will appreciate that * doesn't match filenames beginning with a dot. The dot is generally the safer choice, but note that both forms change the permissions of directories also.
Directory Permissions
It is possible that a file cannot be accessed even though it has read permission, and can be removed even when it is write-protected, because the permissions of its directory also apply. The default permissions of a directory are
rwxr-xr-x (755)
A directory must never be writable by group and others.
Example:
mkdir c_progs
ls -ld c_progs
drwxr-xr-x 2 kumar metal 512 may 9 09:57 c_progs
If a directory has write permission for group and others also, be assured that every user can remove every file in the directory. As a rule, you must not make directories universally writable unless you have definite reasons to do so.
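That removal is governed by the directory rather than by the file can be checked with a scratch directory. In this sketch, readonly.txt is a hypothetical file name, and rm is given -f only to suppress the confirmation prompt that rm normally issues for a write-protected file.

```shell
# Sketch: a write-protected file can still be removed from a writable directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir c_progs
printf 'data\n' > c_progs/readonly.txt
chmod 444 c_progs/readonly.txt     # the file itself is write-protected
chmod 755 c_progs                  # but its directory is writable by the owner

rm -f c_progs/readonly.txt         # -f only skips the confirmation prompt
if [ -e c_progs/readonly.txt ]; then removed=no; else removed=yes; fi
echo "write-protected file removed: $removed"
```

Removing a file changes the directory, not the file, so it is the write permission of c_progs that matters here.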
Changing File Ownership
Usually, on BSD and AT&T systems, there are two commands meant to change the ownership of a file or directory. Let kumar be the owner and metal be the group owner. If sharma copies a file of kumar, then sharma becomes the owner of the copy and can manipulate its attributes.
chown changes the file owner and chgrp changes the group owner
On BSD, only the system administrator can use chown
On other systems, only the owner can change both
chown
Changing ownership requires superuser permission, so first use the su command.
ls -l note
-rwxr----x 1 kumar metal 347 may 10 20:30 note
chown sharma note; ls -l note
-rwxr----x 1 sharma metal 347 may 10 20:30 note
Once ownership of the file has been given away to sharma, the user file permissions that previously applied to kumar now apply to sharma. Thus, kumar can no longer edit note since there is no write privilege for group and others. He cannot get back the ownership either. But he can copy the file to his own directory, in which case he becomes the owner of the copy.
chgrp
This command changes the file's group owner. No superuser permission is required.
ls -l dept.lst
-rw-r--r-- 1 kumar metal 139 jun 8 16:43 dept.lst
chgrp dba dept.lst; ls -l dept.lst
-rw-r--r-- 1 kumar dba 139 jun 8 16:43 dept.lst
In this chapter we considered two important file attributes: permissions and ownership. After we complete the first round of discussions related to files, we will take up the other file attributes.
Source: Sumitabha Das, UNIX Concepts and Applications, 4th
edition, Tata McGraw Hill, 2006
FILTERS USING REGULAR EXPRESSIONS: grep and sed
We often need to search a file for a pattern, either to see the lines containing (or not containing) it or to have it replaced with something else. This chapter discusses two important filters that are specially suited for these tasks: grep and sed. grep takes care of all the search requirements we may have. sed goes further and can even manipulate the individual characters in a line. In fact, sed can do several things, some of them quite well.
grep: searching for a pattern
grep scans the file or its input for a pattern and displays the lines containing the pattern, the line numbers, or the filenames where the pattern occurs. It's a command from a special family in UNIX for handling search requirements.
grep options pattern filename(s)
grep "sales" emp.lst
will display the lines containing sales from the file emp.lst. The pattern can be given with or without quotes, but it's generally safe to quote it. Quoting is mandatory when the pattern involves more than one word. grep simply returns the prompt when the pattern can't be located:
grep president emp.lst
When grep is used with multiple filenames, it displays the filenames along with the output.
grep "director" emp1.lst emp2.lst
Here each matching line is preceded by the name of the file that contains it.
grep options
grep is one of the most important UNIX commands, and we must know the options that POSIX requires grep to support. Linux supports all of these options.
-i        ignores case for matching
-v        doesn't display lines matching the expression
-n        displays line numbers along with lines
-c        displays a count of the lines containing the pattern
-l        displays a list of filenames only
-e exp    specifies the expression with this option
-x        matches the pattern with the entire line
-f file   takes patterns from file, one per line
-E        treats the pattern as an extended RE
-F        matches multiple fixed strings
grep -i "agarwal" emp.lst
grep -v "director" emp.lst > otherlist
wc -l otherlist
will display 11 otherlist
grep -n "marketing" emp.lst
grep -c "director" emp.lst
grep -c "director" emp*.lst
will print each filename prefixed to its line count
grep -l "manager" *.lst
will display filenames only
grep -e "Agarwal" -e "aggarwal" -e "agrawal" emp.lst
will print the lines matching any of the multiple patterns
grep -f pattern.lst emp.lst
gives the same result when all the above three patterns are stored in a separate file pattern.lst
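The options above can be tried on a small sample file. The five records below are made-up data in the id|name|designation|department|date of birth|salary layout; they are an assumption for this sketch and not the actual contents of emp.lst.

```shell
# Sketch: common grep options run against a hypothetical five-line emp.lst.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > emp.lst <<'EOF'
2233|a.k.shukla|g.m.|sales|12/12/52|6000
9876|jai sharma|director|production|12/03/50|7000
5678|sumit chakrobarty|d.g.m.|marketing|19/04/43|6000
2365|barun sengupta|director|personnel|11/05/47|7800
3564|sudhir Agarwal|executive|personnel|06/07/47|7500
EOF

directors=$(grep -c 'director' emp.lst)                    # count of matching lines
others=$(grep -v 'director' emp.lst | wc -l | tr -d ' ')   # lines that do not match
agarwal=$(grep -i 'agarwal' emp.lst)                       # case-insensitive match
numbered=$(grep -n 'marketing' emp.lst)                    # line number, colon, line

echo "directors: $directors  others: $others"
echo "$agarwal"
echo "$numbered"
```

Note that -c counts matching lines, so a line that contains the pattern twice is still counted once.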
Basic Regular Expressions (BRE): An Introduction
It is tedious to specify each pattern separately with the -e option. grep uses an expression of a different type to match a group of similar patterns. If an expression uses metacharacters, it is termed a regular expression. Some of the characters used by regular expressions are also meaningful to the shell.
BRE character subset
The basic regular expression character subset uses an elaborate metacharacter set, overshadowing the shell's wild-cards, and can perform amazing matches.
*          zero or more occurrences of the previous character
g*         nothing or g, gg, ggg, etc.
.          a single character
.*         nothing or any number of characters
[pqr]      a single character p, q or r
[c1-c2]    a single character within the ASCII range represented by c1 and c2
The character class
grep supports basic regular expressions (BRE) by default and extended regular expressions (ERE) with the -E option. A regular expression allows a group of characters to be enclosed within a pair of [ ], in which case the match is performed for a single character in the group.
grep "[aA]g[ar][ar]wal" emp.lst
A single pattern has matched two similar strings, agarwal and agrawal, with either case of the first letter. The pattern [a-zA-Z0-9] matches a single alphanumeric character. When we use a range, make sure that the character on the left of the hyphen has a lower ASCII value than the one on the right.
Negating a class (^): the caret can be used to negate the character class. When the character class begins with this character, all characters other than the ones grouped in the class are matched.
The *
The asterisk refers to the immediately preceding character. * indicates zero or more occurrences of the previous character.
g* nothing or g, gg, ggg, etc.
grep "[aA]gg*[ar][ar]wal" emp.lst
Notice that we don't need to use the -e option three times to get the same output.
The dot
A dot matches a single character. The shell uses the ? character for the same purpose.
.* signifies any number of characters or none
grep "j.*saxena" emp.lst
Specifying Pattern Locations (^ and $)
Most of the regular expression characters are used for matching patterns, but there are two that match a pattern at the beginning or end of a line. Anchoring a pattern is often necessary when it can occur in more than one place in a line and we are interested in its occurrence only at a particular location.
^ for matching at the beginning of a line
$ for matching at the end of a line
grep '^2' emp.lst
Selects lines where emp_id starts with 2
grep '7...$' emp.lst
Selects lines where emp_salary ranges from 7000 to 7999
grep '^[^2]' emp.lst
Selects lines where emp_id doesn't start with 2
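The effect of the two anchors can be seen on a few sample records. The four lines below are hypothetical data in the same field layout, held in a shell variable so that the sketch needs no file.

```shell
# Sketch: ^ and $ anchors applied to four made-up employee records.
emp='2233|a.k.shukla|g.m.|sales|12/12/52|6000
9876|jai sharma|director|production|12/03/50|7000
2365|barun sengupta|director|personnel|11/05/47|7800
1006|chanchal singhvi|director|sales|03/09/38|6700'

# Lines whose first character is 2.
starts_with_2=$(printf '%s\n' "$emp" | grep -c '^2')
# Lines ending in a 7 followed by exactly three more characters (7000 to 7999).
seven_thousands=$(printf '%s\n' "$emp" | grep -c '7...$')
# Inside [ ], the caret negates the class: first character is anything but 2.
not_2=$(printf '%s\n' "$emp" | grep -c '^[^2]')

echo "$starts_with_2 $seven_thousands $not_2"
```

The salary 6700 is not selected by the second pattern: it contains a 7, but the 7 is not the fourth character from the end of the line.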
When meta characters lose their meaning
It is possible that some of these special characters actually exist as part of the text. Sometimes we need to escape these characters with a \ so that they lose their special meaning.
To look for g*, we use g\*
To look for [, we use \[
To look for .*, we use \.\*
Extended Regular Expression (ERE) and grep
The -E option makes grep treat the pattern as an ERE. If the current version of grep doesn't support -E, use egrep instead, without the -E option.
+ matches one or more occurrences of the previous character
? matches zero or one occurrence of the previous character
b+ matches b, bb, bbb, etc.
b? matches either a single instance of b or nothing
These characters restrict the scope of the match as compared to the *
grep -E "[aA]gg?arwal" emp.lst
The ERE # ?include +<stdio.h> matches both #include <stdio.h> and # include <stdio.h>, with one or more spaces before <stdio.h>.
The ERE set
ch+          matches one or more occurrences of character ch
ch?          matches zero or one occurrence of character ch
exp1|exp2    matches exp1 or exp2
(x1|x2)x3    matches x1x3 or x2x3
Matching multiple patterns (|, ( and ))
grep -E 'sengupta|dasgupta' emp.lst
We can locate both without using the -e option twice, or
grep -E '(sen|das)gupta' emp.lst
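A short sketch of the ERE characters in action follows. The names in the two lists are hypothetical and chosen only so that some of them match and some do not.

```shell
# Sketch: ERE alternation, grouping and ? with grep -E on made-up names.
names='barun sengupta
n.k. gupta
anil dasgupta'

# The two forms below are equivalent ways to find sengupta or dasgupta.
alternation=$(printf '%s\n' "$names" | grep -E -c 'sengupta|dasgupta')
grouped=$(printf '%s\n' "$names" | grep -E -c '(sen|das)gupta')

spellings='agarwal
aggarwal
agrawal'
# gg? allows one or two g characters; agrawal fails because r follows the g.
optional_g=$(printf '%s\n' "$spellings" | grep -E -c '[aA]gg?arwal')

echo "$alternation $grouped $optional_g"
```

The quotes around the patterns matter: without them the shell would treat | as a pipe and the parentheses as its own syntax.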
sed: The Stream Editor
sed is a multipurpose tool which combines the work of several filters. sed uses instructions to act on text. An instruction combines an address for selecting lines with an action to be taken on them.
sed options 'address action' file(s)
sed supports only the BRE set. The address specifies either one line number, to select a single line, or a pair of line numbers, to select a group of contiguous lines. The action specifies whether to print, insert, delete or substitute text.
sed processes several instructions in a sequential manner. Each instruction operates on the output of the previous instruction. In this context, two options are relevant, and probably they are the only ones we will use with sed: the -e option that lets us use multiple instructions, and the -f option to take instructions from a file. grep uses both options in an identical manner.
Line Addressing
sed '3q' emp.lst
Similar to head -n 3 emp.lst. Prints the first three lines and quits
sed -n '1,2p' emp.lst
By default sed prints every line, so the p command alone would print the selected lines twice. To suppress the default output, we use -n whenever we use the p command
sed -n '$p' emp.lst
Selects the last line of the file
sed -n '9,11p' emp.lst
Selects lines from anywhere in the file, here lines 9 to 11
sed -n '1,2p
7,9p
$p' emp.lst
Selects multiple groups of lines
sed -n '3,$!p' emp.lst
Negates the action: prints all lines except lines 3 to the end, which is the same as 1,2p
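The line-addressing forms above can be compared directly on a five-line scratch file. lines.txt and its contents are assumptions made for this sketch.

```shell
# Sketch: sed line addressing on a hypothetical five-line file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'line %s\n' 1 2 3 4 5 > lines.txt

first3=$(sed '3q' lines.txt)          # print up to line 3, then quit
head3=$(head -n 3 lines.txt)          # the head equivalent
last=$(sed -n '$p' lines.txt)         # $ addresses the last line
first_two=$(sed -n '1,2p' lines.txt)  # lines 1 and 2 only
negated=$(sed -n '3,$!p' lines.txt)   # every line NOT in the range 3 to end

echo "$last"
echo "$first_two"
```

The single quotes keep the shell from interpreting $ and ! before sed sees them.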
Using Multiple Instructions (-e and -f)
There is ample scope for using the -e and -f options whenever sed is used with multiple instructions.
sed -n -e '1,2p' -e '7,9p' -e '$p' emp.lst
Let us consider,
cat instr.fil
1,2p
7,9p
$p
The -f option directs sed to take its instructions from the file
sed -n -f instr.fil emp.lst
We can combine the -e and -f options and use them as many times as we want
sed -n -f instr.fil1 -f instr.fil2 emp.lst
sed -n -e '/saxena/p' -f instr.fil1 -f instr.fil2 emp.lst
Context Addressing
We can specify one or more patterns to locate lines
sed -n '/director/p' emp.lst
We can also specify a comma-separated pair of context addresses to select a group of lines.
sed -n '/dasgupta/,/saxena/p' emp.lst
Line and context addresses can also be mixed
sed -n '1,/dasgupta/p' emp.lst
Using regular expressions
Context addresses can also use regular expressions.
sed -n '/[aA]gg*[ar][ar]wal/p' emp.lst
Selects all agarwals.
sed -n '/sa[kx]s*ena/p
/gupta/p' emp.lst
Selects saxenas and guptas.
We can also use ^ and $ as part of the regular expression syntax.
sed -n '/50.....$/p' emp.lst
Selects all people born in the year 1950: the year is followed by exactly five characters (the | delimiter and a four-digit salary) at the end of the line.
Writing Selected Lines to a File (w)
We can use the w command to write the selected lines to a separate file.
sed -n '/director/w dlist' emp.lst
Saves the lines of directors in the file dlist
sed -n '/director/w dlist
/manager/w mlist
/executive/w elist' emp.lst
Splits the file among three files
sed -n '1,500w foo1
501,$w foo2' foo.main
Line addressing is also possible. Saves the first 500 lines in foo1 and the rest in foo2
Text Editing
sed supports the inserting (i), appending (a), changing (c) and deleting (d) commands for text.
$ sed '1i\
> #include <stdio.h>\
> #include <unistd.h>
> ' foo.c > $$
This adds two include lines at the beginning of foo.c. sed identifies the line without the \ as the last line of input. The output is redirected to the temporary file $$. This technique has to be followed when using the a and c commands also. To insert a blank line after each line of the file is printed (double-spacing text), we have,
sed 'a\

' emp.lst
Deleting lines (d)
sed '/director/d' emp.lst > olist or
sed -n '/director/!p' emp.lst > olist
Selects all lines except those containing director, and saves them in olist
Note that the -n option is not to be used with d
Substitution (s)
Substitution is the most important feature of sed, and this is one job that sed does exceedingly well.
[address]s/expression1/expression2/flags
The syntax is similar to that of substitution in the vi editor.
sed 's/|/:/' emp.lst | head -n 2
2233:a.k.shukla |gm |sales |12/12/52|6000
9876:jai sharma |director|production|12/03/50|7000
Only the first instance of | in a line has been replaced. We need to use the g (global) flag to replace all the pipes.
sed 's/|/:/g' emp.lst | head -n 2
We can limit the vertical boundaries too by specifying an address (here, the first three lines only).
sed '1,3s/|/:/g' emp.lst
Replace the word director with member in the first five lines of emp.lst
sed '1,5s/director/member/' emp.lst
sed also uses regular expressions for the patterns to be substituted. To replace all occurrences of agarwal, aggarwal and agrawal with simply Agarwal, we have,
sed 's/[Aa]gg*[ar][ar]wal/Agarwal/g' emp.lst
We can also use ^ and $ with the same meaning. To add the prefix 2 to all emp-ids,
sed 's/^/2/' emp.lst | head -n 1
22233 | a.k.shukla | gm | sales | 12/12/52 | 6000
To add the suffix .00 to all salaries,
sed 's/$/.00/' emp.lst | head -n 1
2233 | a.k.shukla | gm | sales | 12/12/52 | 6000.00
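The substitutions described above can be checked on a single record. The record below is a made-up line in the same field layout, piped into sed so that no file is needed.

```shell
# Sketch: the s command with and without the g flag, and with ^ and $.
line='2233|a.k.shukla|g.m.|sales|12/12/52|6000'

first_only=$(printf '%s\n' "$line" | sed 's/|/:/')     # leftmost | only
all_pipes=$(printf '%s\n' "$line" | sed 's/|/:/g')     # every | on the line
prefixed=$(printf '%s\n' "$line" | sed 's/^/2/')       # insert at line start
suffixed=$(printf '%s\n' "$line" | sed 's/$/.00/')     # append at line end

printf '%s\n' "$first_only" "$all_pipes" "$prefixed" "$suffixed"
```

Here ^ and $ match positions rather than characters, so nothing is removed from the line; the replacement text is simply inserted at the beginning or the end.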
Performing multiple substitutions
sed 's/<I>/<EM>/g
s/<B>/<STRONG>/g
s/<U>/<EM>/g' form.html
An instruction processes the output of the previous instruction, as sed is a stream editor and works on a data stream. In the following example, every <I> therefore ends up as <STRONG>:
sed 's/<I>/<EM>/g
s/<EM>/<STRONG>/g' form.html
When a g is used at the end of a substitution instruction, the change is performed globally along the line. Without it, only the leftmost occurrence is replaced. When there is a group of instructions to execute, you should place these instructions in a file instead and use sed with the -f option.
Compressing multiple spaces
sed 's/ *|/|/g' emp.lst | tee empn.lst | head -n 3
2233|a.k.shukla|g.m|sales|12/12/52|6000
9876|jai sharma|director|production|12/03/50|7000
5678|sumit chakrobarty|dgm|mrking|19/04/43|6000
The remembered patterns
Consider the three lines below, all of which do the same job:
sed 's/director/member/' emp.lst
sed '/director/s//member/' emp.lst
sed '/director/s/director/member/' emp.lst
The //, representing an empty regular expression, is interpreted to mean that the search and substituted patterns are the same.
sed 's/|//g' emp.lst
removes every | from the file
Basic Regular Expressions (BRE) Revisited
Three more types of expressions are:
The repeated pattern - &
The interval regular expression (IRE) - { }
The tagged regular expression (TRE) - ( )
The repeated pattern - &
To make the entire source pattern appear at the destination also:
sed 's/director/executive director/' emp.lst
sed 's/director/executive &/' emp.lst
sed '/director/s//executive &/' emp.lst
Each command replaces director with executive director; & is the repeated pattern and stands for the entire matched text.
The interval RE - { }
sed and grep use the IRE, which uses an integer (or two) to specify the number of times the preceding character can occur. The IRE uses an escaped pair of curly braces and takes three forms:
ch\{m\}      ch can occur m times
ch\{m,n\}    ch can occur between m and n times
ch\{m,\}     ch can occur at least m times
The values of m and n can't exceed 255. Suppose teledir.txt maintains landline and mobile phone numbers. To select only mobile numbers, use an IRE to indicate that a digit must occur 10 times.
grep '[0-9]\{10\}' teledir.txt
Line length between 101 and 150
grep '^.\{101,150\}$' foo
Line length at least 101
sed -n '/.\{101,\}/p' foo
The Tagged Regular Expression (TRE)
You have to identify the segments of a line that you wish to extract and enclose each segment within a matched pair of escaped parentheses. If we need to extract a number, we use \([0-9]*\). If we need to extract non-alphabetic characters,
\([^a-zA-Z]*\)
Every grouped pattern automatically acquires the numeric label n, where n signifies the nth group from the left. The groups are recalled as \1, \2 and so on.
sed 's/\([a-z]*\) *\([a-z]*\)/\2, \1/' teledir.txt
This places the surname first, followed by a comma, and then the name and the rest of the line. sed does not require a / to delimit the patterns in a substitution. We can use any character, provided it doesn't occur in the patterns themselves. Choosing a different delimiter lets us avoid escaping a / that actually occurs in the pattern.
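The three expression types, and the use of an alternative delimiter, can be tried on single lines of input. The names, phone numbers and pathnames below are made up for this sketch, and the choice of : as the delimiter is only one possibility.

```shell
# Sketch: &, \{ \}, \( \) and an alternative delimiter, on hypothetical input.

# & repeats the whole matched text in the replacement.
repeated=$(printf 'director\n' | sed 's/director/executive &/')

# \{10\} demands ten consecutive occurrences of [0-9]; only one line qualifies.
mobiles=$(printf '%s\n' '9845012345' '26612345' | grep -c '[0-9]\{10\}')

# \( \) tags two groups; \2 and \1 recall them in swapped order.
swapped=$(printf 'henry higgins 9845012345\n' |
          sed 's/\([a-z]*\) *\([a-z]*\)/\2, \1/')

# With : as the delimiter, the / characters in the pathnames need no escaping.
newpath=$(printf '/usr/local/bin\n' | sed 's:/usr/local:/opt:')

printf '%s\n' "$repeated" "$mobiles" "$swapped" "$newpath"
```

With / as the delimiter, the last command would have to be written as s/\/usr\/local/\/opt/, which is much harder to read.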
Source: Sumitabha Das, UNIX Concepts and Applications, 4th edition, Tata
McGraw Hill, 2006
MORE FILE ATTRIBUTES
Apart from permissions and ownership, a UNIX file has several other attributes, and in this chapter, we look at most of the remaining ones. A file also has properties related to its time stamps and links. It is important to know how these attributes are interpreted when applied to a directory or a device.
This chapter also introduces the concept of the file system. It also looks at the inode, the lookup table that contains almost all file attributes. Though a detailed treatment of file systems is taken up later, knowledge of the basics is essential to our understanding of the significance of some of the file attributes. The chapter on basic file attributes covered ls -l to display file attributes (properties), listing of a specific directory, ownership and group ownership, and different file permissions. ls -l provides attributes like permissions, links, owner, group owner, size, date and the file name.
File Systems and inodes
The hard disk is split into distinct partitions, with a separate file system in each
partition. Every file system has a directory structure headed by root.
n partitions = n file systems = n separate root directories
All attributes of a file except its name and contents are stored in a table called the inode (index node), which is accessed by the inode number. The inode contains the following attributes of a file:
File type
File permissions
Number of links
The UID of the owner
The GID of the group owner
File size in bytes
Date and time of last modification
Date and time of last access
Date and time of last change of the inode
An array of pointers that keep track of all disk blocks used by the file
Please note that neither the name of the file nor the inode number is stored in the inode. To know the inode number of a file:
ls -il tulec05
9059 -rw-r--r-- 1 kumar metal 51813 Jan 31 11:15 tulec05
Here, 9059 is the inode number, and no other file can have the same inode number in the same file system.
Hard Links
The link count is displayed in the second column of the listing. This count is normally 1, but the following files have two links,
-rwxr-xr-- 2 kumar metal 163 jul 13 21:36 backup.sh
-rwxr-xr-- 2 kumar metal 163 jul 13 21:36 restore.sh
All attributes seem to be identical, but the files could still be copies. It's the link count that seems to suggest that the files are linked to each other. But this can only be confirmed by using the -i option to ls.
ls -li backup.sh restore.sh
478274 -rwxr-xr-- 2 kumar metal 163 jul 13 21:36 backup.sh
478274 -rwxr-xr-- 2 kumar metal 163 jul 13 21:36 restore.sh
ln: Creating Hard Links
A file is linked with the ln command, which takes two filenames as arguments (like the cp command). The command can create both a hard link and a soft link and has a syntax similar to the one used by cp. The following command links emp.lst with employee:
ln emp.lst employee
The -i option to ls shows that they have the same inode number, meaning that they are actually one and the same file:
ls -li emp.lst employee
29518 -rwxr-xr-x 2 kumar metal 915 may 4 09:58 emp.lst
29518 -rwxr-xr-x 2 kumar metal 915 may 4 09:58 employee
The link count, which is normally one for unlinked files, is shown to be two. You can increase the number of links by adding a third file name, emp.dat:
ln employee emp.dat ; ls -li emp*
29518 -rwxr-xr-x 3 kumar metal 915 may 4 09:58 emp.dat
29518 -rwxr-xr-x 3 kumar metal 915 may 4 09:58 emp.lst
29518 -rwxr-xr-x 3 kumar metal 915 may 4 09:58 employee
You can link multiple files in one command, but then the destination must be a directory. A file is considered to be completely removed from the file system when its link count drops to zero. ln returns an error when the destination file exists. Use the -f option to force the removal of the existing link before the creation of the new one.
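The behaviour of the link count can be followed step by step in a scratch directory. The file contents below are an assumption; the file names follow the example above.

```shell
# Sketch: create a hard link, compare inode numbers, then remove one name.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'data\n' > emp.lst
ln emp.lst employee

inode_a=$(ls -i emp.lst | awk '{print $1}')       # inode number of first name
inode_b=$(ls -i employee | awk '{print $1}')      # inode number of second name
links=$(ls -l emp.lst | awk '{print $2}')         # link count, second column

rm emp.lst                                        # removes one name only
remaining=$(cat employee)                         # data is still reachable
links_after=$(ls -l employee | awk '{print $2}')  # link count drops by one

echo "inodes: $inode_a $inode_b  links: $links -> $links_after"
```

rm only removes a directory entry and decrements the link count; the data blocks are released when the count reaches zero.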
Where to use Hard Links
ln data/foo.txt input_files
This creates a link in the directory input_files. With this link available, your existing programs will continue to find foo.txt in the input_files directory. It is more convenient to do this than to modify all programs to point to the new path. Links provide some protection against accidental deletion, especially when they exist in different directories. Because of links, we don't need to maintain two programs as two separate disk files if there is very little difference between them. A file's name is available to a C program and to a shell script. A single file with two links can have its program logic make it behave in two different ways depending on the name by which it is called.
We can't have two linked filenames in two file systems, and we can't link a directory even within the same file system. Both limitations can be overcome by using symbolic links (soft links).
Symbolic Links
Unlike a hard link, a symbolic link doesn't have the file's contents, but simply provides the pathname of the file that actually has the contents.
ln -s note note.sym
ls -li note note.sym
9948 -rw-r--r-- 1 kumar group 80 feb 16 14:52 note
9952 lrwxrwxrwx 1 kumar group 4 feb 16 15:07 note.sym -> note
Here, l indicates the symbolic link file category, and -> indicates that note.sym contains the pathname for the filename note. The size of the symbolic link is only 4 bytes; it is the length of the pathname note.
It's important to note that this time we indeed have two files, and they are not identical. Removing note.sym won't affect us much because we can easily recreate the link. But if we remove note, we would lose the file containing the data. In that case, note.sym would point to a nonexistent file and become a dangling symbolic link.
Symbolic links can also be used with relative pathnames. Unlike hard links, they can also span multiple file systems and also link directories. If you have to link all filenames in a directory to another directory, it makes sense to simply link the directories. Like other files, a symbolic link has a separate directory entry with its own inode number. This means that rm can remove a symbolic link even if it points to a directory.
A symbolic link has an inode number separate from that of the file it points to. In most cases, the pathname is stored in the symbolic link and occupies space on disk. However, Linux uses a fast symbolic link which stores the pathname in the inode itself, provided it doesn't exceed 60 characters.
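The dangling-link situation described above can be reproduced safely in a scratch directory. The file contents are an assumption; the names note and note.sym follow the example.

```shell
# Sketch: create a symbolic link, read through it, then leave it dangling.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'some text\n' > note
ln -s note note.sym

if [ -L note.sym ]; then is_link=yes; else is_link=no; fi   # -L: is a symlink
via_link=$(cat note.sym)                # cat follows the link to note

rm note                                 # remove the file that holds the data
# -L tests the link itself; -e follows it and fails when the target is gone.
if [ -L note.sym ] && [ ! -e note.sym ]; then dangling=yes; else dangling=no; fi

echo "symlink: $is_link  read via link: $via_link  dangling: $dangling"
```

Unlike the hard-link case, removing note here really does lose the data, because note.sym holds only a pathname.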
The Directory
A directory has its own permissions, owners and links. The significance of the file attributes changes a great deal when they are applied to a directory. For example, the size of a directory is in no way related to the size of the files that exist in the directory, but rather to the number of files housed by it. The higher the number of files, the larger the directory size. Permission acquires a different meaning when the term is applied to a directory.
ls -l -d progs
drwxr-xr-x 2 kumar metal 320 may 9 09:57 progs
The default permissions are different from those of ordinary files. The user has all permissions, and group and others have read and execute permissions only. The permissions of a directory also impact the security of its files. To understand how that can happen, we must know what permissions for a directory really mean.
Read permission
Read permission for a directory means that the list of filenames stored in that directory is accessible. Since ls reads the directory to display filenames, if a directory's read permission is removed, ls won't work. Consider removing the read permission from the directory progs,
ls -ld progs
drwxr-xr-x 2 kumar metal 128 jun 18 22:41 progs
chmod -r progs ; ls progs
progs: permission denied
Write permission
We can't write to a directory file. Only the kernel can do that. If that were possible, any user could destroy the integrity of the file system. Write permission for a directory implies that you are permitted to create or remove files in it. To try that out, restore the read permission and remove the write permission from the directory before you try to copy a file to it.
chmod 555 progs ; ls -ld progs
dr-xr-xr-x 2 kumar metal 128 jun 18 22:41 progs
cp emp.lst progs
cp: cannot create progs/emp.lst: permission denied
The write permission for a directory determines whether we can create or remove files in it, because these actions modify the directory.
Whether we can modify a file depends on whether the file itself has write permission. Changing a file doesn't modify its directory entry.
Execute permission
If a single directory in the pathname doesn't have execute permission, then it
can't be searched for the name of the next directory. That's why the execute privilege of a
directory is often referred to as the search permission. A directory has to be searched for
the next directory, so the cd command won't work if the search permission for the
directory is turned off.
chmod 666 progs ; ls -ld progs
drw-rw-rw- 2 kumar metal 128 jun 18 22:41 progs
cd progs
permission denied to search and execute it
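The permission strings shown above can be reproduced safely in a scratch directory. The following is only a sketch: the names scratch, progs_demo and mode_of are made up for the illustration, and the failure of touch applies to an ordinary user (root bypasses these checks).

```shell
# Scratch area so that no real directory is affected (names are illustrative).
scratch=$(mktemp -d)
mkdir "$scratch/progs_demo"

# Helper (hypothetical): print only the type-and-permissions field of ls -ld.
mode_of() { ls -ld "$1" | cut -c1-10; }

chmod 755 "$scratch/progs_demo"; mode_of "$scratch/progs_demo"   # drwxr-xr-x
chmod 555 "$scratch/progs_demo"; mode_of "$scratch/progs_demo"   # dr-xr-xr-x

# Without write permission on the directory, an ordinary user cannot create a file in it.
touch "$scratch/progs_demo/emp.lst" 2>/dev/null || echo "cannot create file in progs_demo"

# Restore the usual directory permissions.
chmod 755 "$scratch/progs_demo"
```

Removing the scratch directory afterwards (rm -r "$scratch") leaves the system as it was.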
umask: DEFAULT FILE AND DIRECTORY PERMISSIONS
When we create files and directories, the permissions assigned to them depend on
the system's default setting. The UNIX system has the following default permissions for
all files and directories.
rw-rw-rw- (octal 666) for regular files
rwxrwxrwx (octal 777) for directories
The default is transformed by subtracting the user mask from it to remove one or
more permissions. We can evaluate the current value of the mask by using umask without
arguments,
$ umask
022
This becomes 644 (666-022) for ordinary files and 755 (777-022) for directories.
A setting of umask 000 indicates that we are not subtracting anything, so the default
permissions remain unchanged. Note that umask changes the default only for files and
directories created afterwards; the permissions of existing files are changed with chmod,
which has no effect on the defaults.
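The effect of the mask can be checked with a short experiment. This is a minimal sketch with made-up names (um_dir, perm_of, file022 and so on); each umask is set inside a subshell so that the login shell's own mask is left alone. Strictly speaking, the mask bits are cleared from the default rather than arithmetically subtracted, which gives the same result for the values used here.

```shell
# Illustrative scratch directory; all names below are assumptions.
um_dir=$(mktemp -d)
perm_of() { ls -ld "$1" | cut -c1-10; }

(
  umask 022                    # applies to this subshell only
  touch "$um_dir/file022"
  mkdir "$um_dir/dir022"
)
(
  umask 077                    # remove everything for group and others
  touch "$um_dir/file077"
  mkdir "$um_dir/dir077"
)

perm_of "$um_dir/file022"      # -rw-r--r--  (666 with 022 removed)
perm_of "$um_dir/dir022"       # drwxr-xr-x  (777 with 022 removed)
perm_of "$um_dir/file077"      # -rw-------
perm_of "$um_dir/dir077"       # drwx------
```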
MODIFICATION AND ACCESS TIMES
A UNIX file has three time stamps associated with it. Among them, two are:
Time of last file modification ls -l
Time of last access ls -lu
The access time is displayed when ls -l is combined with the -u option. Knowledge of
a file's modification and access times is extremely important for the system administrator.
Many of the tools used by them look at these time stamps to decide whether a particular
file will participate in a backup or not.
touch : changing the time stamps
To set the modification and access times to predefined values, we have,
touch options expression filename(s)
touch emp.lst (without options and expression)
Here both times are set to the current time, and the file is created if it doesn't exist.
touch (without options but with expression) can also be used. The expression
consists of MMDDhhmm (month, day, hour and minute).
touch 03161430 emp.lst ; ls -l emp.lst
-rw-r--r-- 1 kumar metal 870 mar 16 14:30 emp.lst
ls -lu emp.lst
-rw-r--r-- 1 kumar metal 870 mar 16 14:30 emp.lst
It is possible to change the two times individually. The -m and -a options change the
modification and access times, respectively:
touch command (with options and expression)
-m for changing modification time
-a for changing access time
touch -m 02281030 emp.lst ; ls -l emp.lst
-rw-r--r-- 1 kumar metal 870 feb 28 10:30 emp.lst
touch -a 01261650 emp.lst ; ls -lu emp.lst
-rw-r--r-- 1 kumar metal 870 jan 26 16:50 emp.lst
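On current systems the time stamp is usually passed with the -t option in the form [[CC]YY]MMDDhhmm, rather than as a bare expression. The sketch below uses that form on two made-up files (old.lst and new.lst in a scratch directory) and then compares their modification times with the shell's -nt (newer than) test.

```shell
# Scratch directory and file names are illustrative only.
ts_dir=$(mktemp -d)

touch -t 202401261650 "$ts_dir/old.lst"      # both times: 26 Jan 2024, 16:50
touch -t 202403161430 "$ts_dir/new.lst"      # both times: 16 Mar 2024, 14:30

# -m changes only the modification time, -a only the access time.
touch -m -t 202402281030 "$ts_dir/new.lst"
touch -a -t 202401010000 "$ts_dir/new.lst"

ls -l  "$ts_dir"                             # shows modification times
ls -lu "$ts_dir"                             # shows access times

# new.lst (modified 28 Feb) is still newer than old.lst (modified 26 Jan).
if [ "$ts_dir/new.lst" -nt "$ts_dir/old.lst" ]; then
  echo "new.lst was modified after old.lst"
fi
```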
find : locating files
It recursively examines a directory tree to look for files matching some criteria,
and then takes some action on the selected files. It has a difficult command line, and if
you have ever wondered why UNIX is hated by many, then you should look up the
cryptic find documentation. However, find is easily tamed if you break up its arguments
into three components:
find path_list selection_criteria action
where,
Recursively examines all files specified in path_list
It then matches each file for one or more selection-criteria
It takes some action on those selected files
The path_list comprises one or more subdirectories separated by white space. There can
also be a host of selection_criteria that you use to match a file, and multiple actions to
dispose of the file. This makes the command difficult to use initially, but it is a program
that every user must master since it lets them select files under practically any
condition.
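The three components can be seen in a small experiment. This is a sketch on a made-up directory tree (find_dir and the files in it are assumptions); -name, -type and -print are standard find criteria and actions.

```shell
# Build a small illustrative tree.
find_dir=$(mktemp -d)
mkdir -p "$find_dir/progs/old"
touch "$find_dir/emp.lst" "$find_dir/progs/dept.lst" "$find_dir/progs/old/chap01"

# path_list = $find_dir, selection_criteria = -name "*.lst", action = -print
find "$find_dir" -name "*.lst" -print

# Two criteria together: regular files whose names end in .lst, counted with wc.
lst_count=$(find "$find_dir" -type f -name "*.lst" -print | wc -l)
echo "$lst_count"

# Only directories below the starting point.
find "$find_dir" -type d -print
```

Quoting the pattern ("*.lst") keeps the shell from expanding it before find sees it.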
Source: Sumitabha Das, UNIX Concepts and Applications, 4th edition, Tata
McGraw Hill, 2006
SIMPLE FILTERS
Filters are commands that accept data from standard input, manipulate it and
write the results to standard output. Filters are the central tools of the UNIX tool kit, and
each filter performs a simple function. Some commands use a delimiter, such as the pipe (|)
or colon (:). Many filters work well with delimited fields, and some simply won't work without
them. The piping mechanism allows the standard output of one filter to serve as the standard
input of another. Filters read data from standard input when used without a
filename as argument, and from the file otherwise.
The Simple Database
Several UNIX commands are provided for text editing and shell programming. The
examples in this chapter use the file emp.lst, a text file designed in a fixed format and
containing a personnel database. The details of an employee are stored in one single line.
There are 15 lines, and each line has six fields separated by five | delimiters.
$ cat emp.lst
2233 | a.k.shukla | g.m | sales | 12/12/52 | 6000
9876 | jai sharma | director | production | 12/03/50 | 7000
5678 | sumit chakrobarty | d.g.m. | marketing | 19/04/43 | 6000
2365 | barun sengupta | director | personnel | 11/05/47 | 7800
5423 | n.k.gupta | chairman | admin | 30/08/56 | 5400
1006 | chanchal singhvi | director | sales | 03/09/38 | 6700
6213 | karuna ganguly | g.m. | accounts | 05/06/62 | 6300
1265 | s.n. dasgupta | manager | sales | 12/09/63 | 5600
4290 | jayant choudhury | executive | production | 07/09/50 | 6000
2476 | anil aggarwal | manager | sales | 01/05/59 | 5000
6521 | lalit chowdury | directir | marketing | 26/09/45 | 8200
3212 | shyam saksena | d.g.m. | accounts | 12/12/55 | 6000
3564 | sudhir agarwal | executive | personnel | 06/07/47 | 7500
2345 | j. b. sexena | g.m. | marketing | 12/03/45 | 8000
0110 | v.k.agrawal | g.m.| marketing | 31/12/40 | 9000
pr : paginating files
We know that,
cat dept.lst
01|accounts|6213
02|progs|5423
03|marketing|6521
04|personnel|2365
05|production|9876
06|sales|1006
pr command adds suitable headers, footers and formatted text. pr adds five lines of
margin at the top and bottom. The header shows the date and time of last modification of
the file along with the filename and page number.
pr dept.lst
May 06 10:38 1997 dept.lst page 1
01|accounts|6213
02|progs|5423
03|marketing|6521
04|personnel|2365
05|production|9876
06|sales|1006
...blank lines...
pr options
The different options for pr command are:
-k prints k (integer) columns
-t suppresses the header and footer
-h uses a header of the user's choice
-d double-spaces the input
-n numbers each line, which helps in debugging
-o n offsets the lines by n spaces, increasing the left margin of the page
pr +10 chap01
starts printing from page 10
pr -l 54 chap01
this option sets the page length to 54
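A few of these options can be tried on a shortened copy of dept.lst. The sketch below assumes a pr that supports the options listed above (as the common implementations do); the scratch directory pr_dir is made up for the illustration.

```shell
# Three lines of dept.lst in a scratch directory (illustrative).
pr_dir=$(mktemp -d)
printf '01|accounts|6213\n02|progs|5423\n03|marketing|6521\n' > "$pr_dir/dept.lst"

# -t drops the header and footer, -n numbers each line.
pr -t -n "$pr_dir/dept.lst"

# -d double-spaces the text, -o 5 shifts it five spaces to the right.
pr -t -d -o 5 "$pr_dir/dept.lst"

# A header of our own choice; the output fills a whole page.
pr -h "Department list" "$pr_dir/dept.lst" | head -n 8
```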
head : displaying the beginning of the file
The command displays the top of the file. It displays the first 10 lines of the file,
when used without an option.
head emp.lst
-n to specify a line count
head -n 3 emp.lst
will display the first three lines of the file.
tail : displaying the end of a file
This command displays the end of the file. It displays the last 10 lines of the file,
when used without an option.
tail emp.lst
-n to specify a line count
tail -n 3 emp.lst
displays the last three lines of the file. We can also address lines from the
beginning of the file instead of the end. The +count option allows us to do that, where count
represents the line number from where the selection should begin.
tail -n +11 emp.lst
will display the 11th line onwards (older systems also accept the form tail +11).
Different options for tail are:
Monitoring the file growth (-f)
Extracting bytes rather than lines (-c)
Use tail -f when we are running a program that continuously writes to a file, and we want
to see how the file is growing. We have to terminate this command with the interrupt key.
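The line and byte counts of head and tail can be tried on a file whose contents make the selection obvious. In this sketch a made-up file of 15 numbered lines (ht_file) stands in for emp.lst.

```shell
# 15 numbered lines stand in for emp.lst (illustrative file).
ht_file=$(mktemp)
i=1
while [ "$i" -le 15 ]; do
  echo "line $i"
  i=$((i + 1))
done > "$ht_file"

head -n 3 "$ht_file"       # line 1 to line 3
tail -n 3 "$ht_file"       # line 13 to line 15
tail -n +11 "$ht_file"     # line 11 onwards
tail -c 8 "$ht_file"       # last 8 bytes: "line 15" and the newline
```

tail -f is left out because it keeps running until interrupted.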
cut : slitting a file vertically
It is used for slitting the file vertically. head -n 5 emp.lst | tee shortlist will select
the first five lines of emp.lst and save them to shortlist. We can cut by using the -c option with a
list of column numbers, delimited by a comma (cutting columns).
cut -c 6-22,24-32 shortlist
cut -c -3,6-22,28-34,55- shortlist
The expression 55- indicates column number 55 to end of line. Similarly, -3 is the same
as 1-3.
Most files don't contain fixed length lines, so we have to cut fields rather than columns
(cutting fields).
-d for the field delimiter
-f for the field list
cut -d \| -f 2,3 shortlist | tee cutlist1
will display the second and third fields of shortlist and save the output in
cutlist1. Here | is escaped to prevent the shell from treating it as the pipeline character.
To print the remaining fields, we have
cut -d \| -f 1,4- shortlist > cutlist2
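Both ways of cutting can be tried on two lines of the employee file. In this sketch the fields are written without the padding spaces, and the scratch directory cut_dir is made up; quoting the delimiter as '|' has the same effect as escaping it with a backslash.

```shell
# Two lines of the employee file, without padding spaces (illustrative).
cut_dir=$(mktemp -d)
cat > "$cut_dir/shortlist" <<'EOF'
2233|a.k.shukla|g.m.|sales|12/12/52|6000
9876|jai sharma|director|production|12/03/50|7000
EOF

# Cutting columns: characters 1 to 4 of every line.
cut -c 1-4 "$cut_dir/shortlist"

# Cutting fields: name and designation, then all remaining fields.
cut -d '|' -f 2,3  "$cut_dir/shortlist" > "$cut_dir/cutlist1"
cut -d '|' -f 1,4- "$cut_dir/shortlist" > "$cut_dir/cutlist2"
cat "$cut_dir/cutlist1" "$cut_dir/cutlist2"
```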
paste : pasting files
What we cut with cut can be pasted back with the paste command, but vertically rather
than horizontally. We can view two files side by side by pasting them. In the previous
topic, cut was used to create the two files cutlist1 and cutlist2 containing two cut-out
portions of the same file.
paste cutlist1 cutlist2
We can specify one or more delimiters with -d
paste -d "|" cutlist1 cutlist2
Where each field will be separated by the delimiter |. Even though paste uses at least two
files for concatenating lines, the data for one file can be supplied through the standard
input.
Joining lines (-s)
Let us consider that the file addressbook contains the details of three persons, one detail per line.
cat addressbook
paste -s addressbook               prints all lines as one single line
paste -s -d "||\n" addressbook     the delimiters in the list are used in a circular manner
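The circular use of delimiters is easier to see on concrete data. The sketch below uses made-up files: two small cut-out lists and an addressbook that holds three lines (name, e-mail, phone) per person; none of this data comes from the original file.

```shell
# Illustrative data only.
paste_dir=$(mktemp -d)
printf 'a.k.shukla|g.m.\njai sharma|director\n' > "$paste_dir/cutlist1"
printf '2233|sales\n9876|production\n'          > "$paste_dir/cutlist2"

# Side by side, joined with | instead of the default tab.
paste -d '|' "$paste_dir/cutlist1" "$paste_dir/cutlist2"

# Hypothetical address book: three lines per person.
printf 'name1\nmail1\nphone1\nname2\nmail2\nphone2\n' > "$paste_dir/addressbook"

# -s joins lines; the delimiters |, | and newline are used in rotation,
# so every group of three lines becomes one record.
paste -s -d '||\n' "$paste_dir/addressbook"
```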
sort : ordering a file
Sorting is the ordering of data in ascending or descending sequence. The sort command
orders a file and by default, the entire line is sorted
sort shortlist
This default sorting sequence can be altered by using certain options. We can also sort
on one or more keys (fields) or use a different ordering rule.
sort options
The important sort options are:
-tchar uses delimiter char to identify fields
-k n sorts on nth field
-k m,n starts sort on mth field and ends sort on nth field
-k m.n starts sort on nth column of mth field
-u removes repeated lines
-n sorts numerically
-r reverses sort order
-f folds lowercase to equivalent uppercase
-m list merges sorted files in list
-c checks if file is sorted
-o flname places output in file flname
sort -t"|" -k 2 shortlist
sorts on the second field (name).
sort -t"|" -r -k 2 shortlist or
sort -t"|" -k 2r shortlist
The sort order can be reversed with the -r option.
sort -t"|" -k 3,3 -k 2,2 shortlist
Sorting on a secondary key is also possible, as shown above.
sort -t"|" -k 5.7,5.8 shortlist
We can also specify a character position within a field to be the beginning of the sort,
as shown above (sorting on columns).
sort -n numfile
When sort acts on numerals, strange things can happen. When we sort a file
containing only numbers, we get a curious result because the lines are compared
character by character. This can be overridden with the -n (numeric) option.
cut -d"|" -f3 emp.lst | sort -u | tee desigx.lst
Repeated lines can be removed with the -u option, as shown above. If we
cut out the designation field from emp.lst, we can pipe it to sort to find out the unique
designations that occur in the file.
Other sort options are:
sort -o sortedlist -k 3 shortlist
sort -o shortlist shortlist
sort -c shortlist
sort -t"|" -c -k 2 shortlist
sort -m foo1 foo2 foo3
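The key options can be compared on four lines of the employee file. This is a sketch on made-up scratch files; LC_ALL=C is set as an assumption so that the ordering is plain byte order and does not depend on the local language settings.

```shell
# Plain byte-order collation, so results do not vary with the locale.
LC_ALL=C
export LC_ALL

sort_dir=$(mktemp -d)
cat > "$sort_dir/shortlist" <<'EOF'
2233|a.k.shukla|g.m.|sales|12/12/52|6000
9876|jai sharma|director|production|12/03/50|7000
5678|sumit chakrobarty|d.g.m.|marketing|19/04/43|6000
2365|barun sengupta|director|personnel|11/05/47|7800
EOF

sort -t '|' -k 2 "$sort_dir/shortlist"            # by name
sort -t '|' -k 3,3 -k 2,2 "$sort_dir/shortlist"   # by designation, then name
sort -t '|' -k 5.7,5.8 "$sort_dir/shortlist"      # by year of birth (columns 7-8 of field 5)

printf '10\n2\n27\n4\n' | sort                    # character order: 10 2 27 4
printf '10\n2\n27\n4\n' | sort -n                 # numeric order:   2 4 10 27
```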
uniq : locating repeated and nonrepeated lines
When we concatenate or merge files, we will face the problem of duplicate entries
creeping in. We saw how sort removes them with the -u option. UNIX offers a special
tool to handle these lines: the uniq command. Consider a sorted dept.lst that includes
repeated lines:
cat dept.lst
displays all lines with duplicates, whereas
uniq dept.lst
simply fetches one copy of each line and writes it to the standard output. Since uniq
requires a sorted file as input, the general procedure is to sort a file and pipe its output to
uniq. The following pipeline also produces the same output, except that the output is
saved in the file uniqlist:
sort dept.lst | uniq - uniqlist
Different uniq options are:
Selecting the nonrepeated lines (-u)
cut -d"|" -f3 emp.lst | sort | uniq -u
Selecting the duplicate lines (-d)
cut -d"|" -f3 emp.lst | sort | uniq -d
Counting frequency of occurrence (-c)
cut -d"|" -f3 emp.lst | sort | uniq -c
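The three options are easiest to compare on a short list of designations. In this sketch the helper desig is made up and simply prints six sample designations; LC_ALL=C is an assumption that keeps the sort order independent of the locale.

```shell
LC_ALL=C
export LC_ALL

# Hypothetical helper: prints a designation column with repetitions.
desig() {
  printf 'director\ng.m.\ndirector\nmanager\ng.m.\ndirector\n'
}

desig | sort | uniq        # one copy of each line
desig | sort | uniq -u     # only lines that occur once: manager
desig | sort | uniq -d     # one copy of each repeated line: director, g.m.
desig | sort | uniq -c     # each line preceded by its count
```

The sort step matters: uniq compares only adjacent lines, so unsorted input would leave scattered duplicates in place.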
tr : translating characters
The tr filter manipulates the individual characters in a line. It translates characters
using one or two compact expressions.
tr options expn1 expn2 standard input
It takes input only from standard input; it doesn't take a filename as argument. By
default, it translates each character in expression1 to its mapped counterpart in
expression2. The first character in the first expression is replaced with the first character
in the second expression, and similarly for the other characters.
tr '|/' '~-' < emp.lst | head -n 3
exp1='|/' ; exp2='~-'
tr "$exp1" "$exp2" < emp.lst
Changing the case of text from lower to upper is possible, here for the first three lines of the file:
head -n 3 emp.lst | tr '[a-z]' '[A-Z]'
Different tr options are:
Deleting characters (-d)
tr -d '|/' < emp.lst | head -n 3
Compressing multiple consecutive characters (-s)
tr -s ' ' < emp.lst | head -n 3
Complementing values of expression (-c)
tr -cd '|/' < emp.lst
Using ASCII octal values and escape sequences
tr '|' '\012' < emp.lst | head -n 6
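The effect of each option can be checked on a single line. This sketch uses the first employee record without padding spaces, held in a made-up variable tr_line.

```shell
# One record of the employee file (illustrative, without padding spaces).
tr_line='2233|a.k.shukla|g.m.|sales|12/12/52|6000'

echo "$tr_line" | tr '|/' '~-'      # 2233~a.k.shukla~g.m.~sales~12-12-52~6000
echo "$tr_line" | tr 'a-z' 'A-Z'    # 2233|A.K.SHUKLA|G.M.|SALES|12/12/52|6000
echo "$tr_line" | tr -d '|/'        # 2233a.k.shuklag.m.sales1212526000
echo "a    b" | tr -s ' '           # a b
echo "$tr_line" | tr -cd '|/'       # ||||//|  (everything else, even the newline, is deleted)
echo
echo "$tr_line" | tr '|' '\012'     # one field per line
```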
Source: Sumitabha Das, UNIX Concepts and Applications, 4th edition, Tata
McGraw Hill, 2006
The vi Editor
To write and edit programs and scripts, we require editors. UNIX provides the vi
editor, created by Bill Joy for the BSD system. Bram Moolenaar improved the vi editor and
called it vim (vi improved), which is used on Linux.
vi Basics
To add some text to a file, we invoke,
vi <filename>
In all probability, the file doesn't exist, and vi presents you a full screen with the
filename shown at the bottom with the qualifier. The cursor is positioned at the top and all
remaining lines of the screen show a ~. They are non-existent lines. The last line is
reserved for commands that you can enter to act on text. This line is also used by the
system to display messages. This is the command mode. This is the mode where you can
pass commands to act on text, using most of the keys of the keyboard. This is the default
mode of the editor where every key pressed is interpreted as a command to run on text.
You will have to be in this mode to copy and delete text.
For text editing, vi uses 24 out of the 25 lines that are normally available in the
terminal. To enter text, you must switch to the input mode. First press the key i, and you
are in this mode, ready to input text. Subsequent key presses will then show up on the
screen as text input.
After text entry is complete, the cursor is positioned on the last character of the
last line. This is known as the current line, and the character where the cursor is stationed is
the current cursor position. If a word has been misspelled, use ctrl-w to erase the entire word.
The ex mode (the last line mode) is used to handle files and perform substitution. After
an ex mode command is run, you are back in the default command mode.
Now press esc key to revert to command mode. Press it again and you will hear a
beep. A beep in vi indicates that a key has been pressed unnecessarily. Actually, the text
entered has not been saved on disk but exists in some temporary storage called a buffer.
To save the entered text, you must switch to the execute mode (the last line mode).
Invoke the execute mode from the command mode by entering a : (colon), which shows up in the
last line.
The Repeat Factor
vi provides a repeat factor for command mode and input mode commands. The command
mode command k moves the cursor one line up; 10k moves the cursor 10 lines up.
To undo whenever you make a mistake, press
Esc u
To clear the screen in command mode, press
ctrl-l
Don't use caps lock - vi commands are case-sensitive
Avoid using the PC navigation keys
Input Mode: Entering and Replacing Text
It is possible to display the mode the user is in by typing,
:set showmode
Messages like INSERT MODE, REPLACE MODE, CHANGE MODE, etc will appear in
the last line.
Pressing i changes the mode from command to input mode. To append text to the right
of the cursor position, we use a. I and A behave the same as i and a, but at line extremes:
I inserts text at the beginning of the line and A appends text at the end of the line. o opens
a new line below the current line.
r<letter> replacing a single character
s<text/word> replacing text with s
R<text/word> replacing text with R
Press esc key to switch to command mode after you have keyed in text
Some of the input mode commands are:
COMMAND   FUNCTION
i         inserts text
a         appends text
I         inserts at beginning of line
A         appends text at end of line
o         opens line below
O         opens line above
r         replaces a single character
s         replaces with a text
S         replaces entire line
Saving Text and Quitting: The ex Mode
When you edit a file using vi, the original file is not disturbed as such, but only a
copy of it that is placed in a buffer. From time to time, you should save your work by
writing the buffer contents to disk to keep the disk file current. When we talk of saving a
file, we actually mean saving this buffer. You may also need to quit vi after or without
saving the buffer. Some of the save and exit commands of the ex mode are:
Command          Action
:w               saves file and remains in editing mode
:x               saves and quits editing mode
:wq              saves and quits editing mode
:w <filename>    save as
:w! <filename>   save as, but overwrites existing file
:q               quits editing mode
:q!              quits editing mode by rejecting changes made
:sh              escapes to UNIX shell
:recover         recovers file from a crash
Navigation
A command mode command doesn't show up on screen but simply performs a function.
To move the cursor in four directions,
k moves cursor up
j moves cursor down
h moves cursor left
l moves cursor right
Word Navigation
Moving by one character is not always enough. You will often need to move faster
along a line. vi understands a word as a navigation unit which can be defined in two
ways, depending on the key pressed. If your cursor is a number of words away from your
desired position, you can use the word-navigation commands to go there directly. There
are three basic commands:
b moves back to beginning of word
e moves forward to end of word
w moves forward to beginning of word
Example,
5b takes the cursor 5 words back
3w takes the cursor 3 words forward
Moving to Line Extremes
Moving to the beginning or end of a line is a common requirement.
To move to the first character of a line
0 or |
30| moves cursor to column 30
$ moves to the end of the current line
The use of these commands along with b, e, and w is allowed
Scrolling
Faster movement can be achieved by scrolling text in the window using the
control keys. The two commands for scrolling a page at a time are
ctrl-f scrolls forward
ctrl-b scrolls backward
10ctrl-f scroll 10 pages and navigate faster
ctrl-d scrolls half page forward
ctrl-u scrolls half page backward
The repeat factor can also be used here.
Absolute Movement
The editor displays the total number of lines in the last line
Ctrl-g to know the current line number
40G goes to line number 40
1G goes to line number 1
G goes to end of file
Editing Text
The editing facilities in vi are very elaborate and involve the use of
operators, such as,
d delete
y yank (copy)
Deleting Text
x deletes a single character
dd delete entire line
yy copy entire line
6dd deletes the current line and five lines below
Moving Text
Moving text (p) puts the text at the new location.
p and P place text on the right and left only when you delete parts of lines. But the same keys
place text below and above the current line when you delete complete lines.
Copying Text
Copying text (y and p) is achieved as,
yy copies current line
10yy copies current line & 9 lines below
Joining Lines
J to join the current line and the line following it
4J joins following 3 lines with current line
Undoing Last Editing Instructions
In command mode, to undo the last change made, we use u
To discard all changes made to the current line, we use U
vim (LINUX) lets you undo and redo multiple editing instructions. u behaves
differently here; repeated use of this key progressively undoes your previous actions. You
could even have the original file in front of you. Further 10u reverses your last 10 editing
actions. The function of U remains the same.
You may overshoot the desired mark when you keep u pressed, in which case use
ctrl-r to redo your undone actions. Further, undoing with 10u can be completely reversed
with 10ctrl-r. The undoing limit is set by the execute mode command :set undolevels=n,
where n is set to 1000 by default.
Repeating the Last Command
The . (dot) command is used for repeating the last editing instruction, whether it was
an input mode or a command mode command.
For example:
2dd deletes 2 lines from the current line; to repeat this operation, type . (dot)
Searching for a Pattern
/ search forward
? search backward
/printf
The search begins forward to position the cursor on the first instance of the word
?pattern
Searches backward for the nearest previous instance of the pattern
Repeating the Last Pattern Search
n repeats search in same direction of original search
n doesn't necessarily repeat a search in the forward direction. The direction
depends on the search command used. If you used ?printf to search in the reverse
direction in the first place, then n also follows the same direction. In that case, N will
repeat the search in the forward direction, and not n.
Search and repeat commands
Command   Function
/pat      searches forward for pattern pat
?pat      searches backward for pattern pat
n         repeats search in same direction along which previous search was made
N         repeats search in direction opposite to that along which previous search was made
Substitution: search and replace
We can perform search and replace in execute mode using :s. Its syntax is,
:address s/source_pattern/target_pattern/flags
:1,$s/director/member/g      all lines; can also use % instead of 1,$
:1,50s/unsigned//g           deletes unsigned everywhere in lines 1 to 50
:3,10s/director/member/g     substitutes in lines 3 through 10
:.s/director/member/g        only the current line
:$s/director/member/g        only the last line
Interactive substitution: sometimes you may like to selectively replace a string. In that
case, add the c parameter as the flag at the end:
:1,$s/director/member/gc
Each line is selected in turn, followed by a sequence of carets in the next line, just below
the pattern that requires substitution. The cursor is positioned at the end of this caret
sequence, waiting for your response.
The ex mode is also used for substitution. Both search and replace operations also
use regular expressions for matching multiple patterns.
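vi is interactive, so these commands can't be run from a script as they are. The address and s/source/target/flags notation is, however, shared with sed, so the effect of the addresses can be tried non-interactively. In the sketch below sed is swapped in purely for illustration (it is not vi), and the three-line file sub_file is made up.

```shell
# Illustrative three-line file.
sub_file=$(mktemp)
printf 'director sales\nmanager sales\ndirector director\n' > "$sub_file"

sed '1,$s/director/member/g' "$sub_file"   # all lines, every occurrence
sed '$s/director/member/g'   "$sub_file"   # only the last line
sed '1,2s/sales//g'          "$sub_file"   # deletes sales in lines 1 and 2
sed '1,$s/director/member/'  "$sub_file"   # without g: first occurrence per line only
```

Unlike :w in vi, sed writes the result to standard output and leaves the file itself unchanged.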
The features of vi editor that have been highlighted so far are good enough for a
beginner who should not proceed any further before mastering most of them. There are
many more functions that make vi a very powerful editor. Can you copy three words or
even the entire file using simple keystrokes? Can you copy or move multiple sections of
text from one file to another in a single file switch? How do you compile your C and Java
programs without leaving the editor? vi can do all this.
Source: Sumitabha Das, UNIX Concepts and Applications, 4th edition, Tata
McGraw Hill, 2006