UNIX™ TIME-SHARING SYSTEM:

UNIX PROGRAMMER’S MANUAL

Seventh Edition, Volume 2A

January, 1979

Bell Telephone Laboratories, Incorporated


Murray Hill, New Jersey
UNIX Programmer’s Manual
Volume 2 — Supplementary Documents

Seventh Edition
January 10, 1979

This volume contains documents which supplement the information contained in Volume 1 of The
UNIX† Programmer’s Manual. The documents here are grouped roughly into the areas of basics, editing,
language tools, document preparation, and system maintenance. Further general information may be
found in the Bell System Technical Journal special issue on UNIX, July-August, 1978.
Many of the documents cited within this volume as Bell Laboratories internal memoranda or Com-
puting Science Technical Reports (CSTR) are also contained here.
These documents contain occasional localisms, typically references to other operating systems like
GCOS and IBM. In all cases, such references may be safely ignored by UNIX users.

General Works
1. 7th Edition UNIX — Summary.
A concise summary of the facilities available on UNIX.
2. The UNIX Time-Sharing System. D. M. Ritchie and K. Thompson.
The original UNIX paper, reprinted from CACM.

Getting Started
3. UNIX for Beginners — Second Edition. B. W. Kernighan.
An introduction to the most basic use of the system.
4. A Tutorial Introduction to the UNIX Text Editor. B. W. Kernighan.
An easy way to get started with the editor.
5. Advanced Editing on UNIX. B. W. Kernighan.
The next step.
6. An Introduction to the UNIX Shell. S. R. Bourne.
An introduction to the capabilities of the command interpreter, the shell.
7. Learn — Computer Aided Instruction on UNIX. M. E. Lesk and B. W. Kernighan.
Describes a computer-aided instruction program that walks new users through the basics of
files, the editor, and document preparation software.

Document Preparation
8. Typing Documents on the UNIX System. M. E. Lesk.
Describes the basic use of the formatting tools. Also describes ‘‘-ms’’, a standardized
package of formatting requests that can be used to lay out most documents (including those
in this volume).

__________________
†UNIX is a Trademark of Bell Laboratories.
9. A System for Typesetting Mathematics. B. W. Kernighan and L. L. Cherry.
Describes EQN, an easy-to-learn language for doing high-quality mathematical typesetting.
10. TBL — A Program to Format Tables. M. E. Lesk.
A program to permit easy specification of tabular material for typesetting. Again, easy to
learn and use.
11. Some Applications of Inverted Indexes on the UNIX System. M. E. Lesk.
Describes, among other things, the program REFER which fills in bibliographic citations
from a data base automatically.
12. NROFF/TROFF User’s Manual. J. F. Ossanna.
The basic formatting program.
13. A TROFF Tutorial. B. W. Kernighan.
An introduction to TROFF for those who really want to know such things.

Programming
14. The C Programming Language — Reference Manual. D. M. Ritchie.
Official statement of the syntax and semantics of C. Should be supplemented by The C
Programming Language, B. W. Kernighan and D. M. Ritchie, Prentice-Hall, 1978, which
contains a tutorial introduction and many examples.
15. Lint, A C Program Checker. S. C. Johnson.
Checks C programs for syntax errors, type violations, portability problems, and a variety of
probable errors.
16. Make — A Program for Maintaining Computer Programs. S. I. Feldman.
Indispensable tool for making sure that large programs are properly compiled with minimal
effort.
17. UNIX Programming. B. W. Kernighan and D. M. Ritchie.
Describes the programming interface to the operating system and the standard I/O library.
18. A Tutorial Introduction to ADB. J. F. Maranzano and S. R. Bourne.
How to use the ADB debugger.

Supporting Tools and Languages


19. YACC: Yet Another Compiler-Compiler. S. C. Johnson.
Converts a BNF specification of a language and semantic actions written in C into a com-
piler for the language.
20. LEX — A Lexical Analyzer Generator. M. E. Lesk and E. Schmidt.
Creates a recognizer for a set of regular expressions; each regular expression can be fol-
lowed by arbitrary C code which will be executed when the regular expression is found.
21. A Portable Fortran 77 Compiler. S. I. Feldman and P. J. Weinberger.
The first Fortran 77 compiler, and still one of the best.
22. Ratfor — A Preprocessor for a Rational Fortran. B. W. Kernighan.
Converts a Fortran with C-like control structures and cosmetics into real, ugly Fortran.
23. The M4 Macro Processor. B. W. Kernighan and D. M. Ritchie.
M4 is a macro processor useful as a front end for C, Ratfor, Cobol, and in its own right.
24. SED — A Non-interactive Text Editor. L. E. McMahon.
A variant of the editor for processing large inputs.
25. AWK — A Pattern Scanning and Processing Language. A. V. Aho, B. W. Kernighan and
P. J. Weinberger.
Makes it easy to specify many data transformation and selection operations.
26. DC — An Interactive Desk Calculator. R. H. Morris and L. L. Cherry.


A super HP calculator, if you don’t need floating point.
27. BC — An Arbitrary Precision Desk-Calculator Language. L. L. Cherry and R. H. Morris.
A front end for DC that provides infix notation, control flow, and built-in functions.
28. UNIX Assembler Reference Manual. D. M. Ritchie.
The ultimate dead language.

Implementation, Maintenance, and Miscellaneous


29. Setting Up UNIX — Seventh Edition. C. B. Haley and D. M. Ritchie.
How to configure and get your system running.
30. Regenerating System Software. C. B. Haley and D. M. Ritchie.
What to do when you have to change things.
31. UNIX Implementation. K. Thompson.
How the system actually works inside.
32. The UNIX I/O System. D. M. Ritchie.
How the I/O system really works.
33. A Tour Through the UNIX C Compiler. D. M. Ritchie.
How the PDP-11 compiler works inside.
34. A Tour Through the Portable C Compiler. S. C. Johnson.
How the portable C compiler works inside.
35. A Dial-Up Network of UNIX Systems. D. A. Nowitz and M. E. Lesk.
Describes UUCP, a program for communicating files between UNIX systems.
36. UUCP Implementation Description. D. A. Nowitz.
How UUCP works, and how to administer it.
37. On the Security of UNIX. D. M. Ritchie.
Hints on how to break UNIX, and how to avoid doing so.
38. Password Security: A Case History. R. H. Morris and K. Thompson.
How the bad guys used to be able to break the password algorithm, and why they can’t
now, at least not so easily.
7th Edition UNIX — Summary

September 6, 1978
Bell Laboratories
Murray Hill, New Jersey 07974

A. What’s new: highlights of the 7th edition UNIX† System


Aimed at larger systems. Devices are addressable to 2^31 bytes, files to 2^30 bytes. 128K memory
(separate instruction and data space) is needed for some utilities.
Portability. Code of the operating system and most utilities has been extensively revised to minimize
its dependence on particular hardware.
Fortran 77. F77 compiler for the new standard language is compatible with C at the object level. A
Fortran structurer, STRUCT, converts old, ugly Fortran into RATFOR, a structured dialect usable with
F77.
Shell. Completely new SH program supports string variables, trap handling, structured programming,
user profiles, settable search path, multilevel file name generation, etc.
Document preparation. TROFF phototypesetter utility is standard. NROFF (for terminals) is now
highly compatible with TROFF. MS macro package provides canned commands for many common for-
matting and layout situations. TBL provides an easy to learn language for preparing complicated tabular
material. REFER fills in bibliographic citations from a data base.
UNIX-to-UNIX file copy. UUCP performs spooled file transfers between any two machines.
Data processing. SED stream editor does multiple editing functions in parallel on a data stream of
indefinite length. AWK report generator does free-field pattern selection and arithmetic operations.
Program development. MAKE controls re-creation of complicated software, arranging for minimal
recompilation.
Debugging. ADB does postmortem and breakpoint debugging, handles separate instruction and data
spaces, floating point, etc.
C language. The language now supports definable data types, generalized initialization, block structure,
long integers, unions, explicit type conversions. The LINT verifier does strong type checking and detec-
tion of probable errors and portability problems even across separately compiled functions.
Lexical analyzer generator. LEX converts specification of regular expressions and semantic actions
into a recognizing subroutine. Analogous to YACC.
Graphics. Simple graph-drawing utility, graphic subroutines, and generalized plotting filters adapted to
various devices are now standard.
Standard input-output package. Highly efficient buffered stream I/O is integrated with formatted
input and output.
Other. The operating system and utilities have been enhanced and freed of restrictions in many other
ways too numerous to relate.

B. Hardware
The 7th edition UNIX operating system runs on a DEC PDP-11/45 or 11/70* with at least the fol-
lowing equipment:
128K to 2M words of managed memory; parity not used.
disk: RP03, RP04, RP06, RK05 (more than 1 RK05) or equivalent.
console typewriter.
clock: KW11-L or KW11-P.
The following equipment is strongly recommended:
communications controller such as DL11 or DH11.
full duplex 96-character ASCII terminals.
9-track tape or extra disk for system backup.
The system is normally distributed on 9-track tape. The minimum memory and disk space specified is
enough to run and maintain UNIX. More will be needed to keep all source on line, or to handle a large
number of users, big data bases, diversified complements of devices, or large programs. The resident
code occupies 12-20K words depending on configuration; system data occupies 10-28K words.
There is no commitment to provide 7th edition UNIX on PDP-11/34, 11/40 and 11/60 hardware.

C. Software
Most of the programs available as UNIX commands are listed. Source code and printed manuals
are distributed for all of the listed software except games. Almost all of the code is written in C. Com-
mands are self-contained and do not require extra setup information, unless specifically noted as
‘‘interactive.’’ Interactive programs can be made to run from a prepared script simply by redirecting
input. Most programs intended for interactive use (e.g., the editor) allow for an escape to command
level (the Shell). Most file processing commands can also go from standard input to standard output
(‘‘filters’’). The piping facility of the Shell may be used to connect such filters directly to the input or
output of other programs.
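
As an illustration of the filter convention just described, the following is a minimal sketch in C of a routine that reads one stream and writes another. The name upcase_filter is invented for this example and is not part of the distributed software; mapping to upper case is an arbitrary choice of transformation.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Copy every character from `in` to `out`, mapping lower case to upper.
 * Returns the number of characters copied.
 * `upcase_filter` is a hypothetical name used only in this sketch. */
long upcase_filter(FILE *in, FILE *out)
{
    long n = 0;
    int c;

    assert(in != NULL && out != NULL);
    while ((c = getc(in)) != EOF) {
        putc(toupper(c), out);
        n++;
    }
    return n;
}
```

A main routine that simply calls upcase_filter(stdin, stdout) would behave as a filter: it needs no file name arguments and can be placed anywhere in a Shell pipeline.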

1. Basic Software
This includes the time-sharing operating system with utilities, a machine language assembler and a
compiler for the programming language C—enough software to write and run new applications and to
maintain or modify UNIX itself.

1.1. Operating System


UNIX The basic resident code on which everything else depends. Supports the system calls,
and maintains the file system. A general description of UNIX design philosophy and
system facilities appeared in the Communications of the ACM, July, 1974. A more
extensive survey is in the Bell System Technical Journal for July-August 1978. Capa-
bilities include:
Reentrant code for user processes.
Separate instruction and data spaces.
‘‘Group’’ access permissions for cooperative projects, with overlapping member-
ships.
Alarm-clock timeouts.
Timer-interrupt sampling and interprocess monitoring for debugging and measurement.
Multiplexed I/O for machine-to-machine communication.

__________________
*PDP is a Trademark of Digital Equipment Corporation.

DEVICES All I/O is logically synchronous. I/O devices are simply files in the file system. Nor-
mally, invisible buffering makes all physical record structure and device characteristics
transparent and exploits the hardware’s ability to do overlapped I/O. Unbuffered phy-
sical record I/O is available for unusual applications. Drivers for these devices are
available; others can be easily written:
Asynchronous interfaces: DH11, DL11. Support for most common ASCII terminals.
Synchronous interface: DP11.
Automatic calling unit interface: DN11.
Line printer: LP11.
Magnetic tape: TU10 and TU16.
DECtape: TC11.
Fixed head disk: RS11, RS03 and RS04.
Pack type disk: RP03, RP04, RP06; minimum-latency seek scheduling.
Cartridge-type disk: RK05, one or more physical devices per logical device.
Null device.
Physical memory of PDP-11, or mapped memory in resident system.
Phototypesetter: Graphic Systems System/1 through DR11C.
BOOT Procedures to get UNIX started.
MKCONF Tailor device-dependent system code to hardware configuration. As distributed, UNIX
can be brought up directly on any acceptable CPU with any acceptable disk, any
sufficient amount of core, and either clock. Other changes, such as optimal assignment
of directories to devices, inclusion of floating point simulator, or installation of device
names in file system, can then be made at leisure.

1.2. User Access Control

LOGIN Sign on as a new user.


Verify password and establish user’s individual and group (project) identity.
Adapt to characteristics of terminal.
Establish working directory.
Announce presence of mail (from MAIL).
Publish message of the day.
Execute user-specified profile.
Start command interpreter or other initial program.
PASSWD Change a password.
User can change his own password.
Passwords are kept encrypted for security.
NEWGRP Change working group (project). Protects against unauthorized changes to projects.

1.3. Terminal Handling

TABS Set tab stops appropriately for specified terminal type.


STTY Set up options for optimal control of a terminal. In so far as they are deducible from
the input, these options are set automatically by LOGIN.
Half vs. full duplex.
Carriage return+line feed vs. newline.
Interpretation of tabs.
Parity.
Mapping of upper case to lower.
Raw vs. edited input.
Delays for tabs, newlines and carriage returns.

1.4. File Manipulation

CAT Concatenate one or more files onto standard output. Particularly used for unadorned
printing, for inserting data into a pipeline, and for buffering output that comes in dribs
and drabs. Works on any file regardless of contents.
CP Copy one file to another, or a set of files to a directory. Works on any file regardless
of contents.
PR Print files with title, date, and page number on every page.
Multicolumn output.
Parallel column merge of several files.
LPR Off-line print. Spools arbitrary files to the line printer.
CMP Compare two files and report if different.
TAIL Print last n lines of input.
May print last n characters, or from n lines or characters to end.
SPLIT Split a large file into more manageable pieces. Occasionally necessary for editing
(ED).
DD Physical file format translator, for exchanging data with foreign systems, especially
IBM 370’s.
SUM Sum the words of a file.

1.5. Manipulation of Directories and File Names

RM Remove a file. Only the name goes away if any other names are linked to the file.
Step through a directory deleting files interactively.
Delete entire directory hierarchies.
LN ‘‘Link’’ another name (alias) to an existing file.
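
The remark under RM that only the name goes away while other names remain can be made concrete with a short sketch in C. It assumes the link, unlink and stat calls as found on present-day POSIX systems; the function names link_count and rename_by_linking are invented for this example and are not the source of RM or LN.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Return the number of names (links) that refer to `path`, or -1 on error. */
long link_count(const char *path)
{
    struct stat st;

    assert(path != NULL);
    if (stat(path, &st) != 0) {
        perror(path);               /* e.g. the last name has been removed */
        return -1;
    }
    return (long)st.st_nlink;
}

/* Give an existing file a second name, remove the first name, and return
 * the number of names that remain. The contents stay reachable through
 * `alias`. Returns -1 if either step fails. */
long rename_by_linking(const char *name, const char *alias)
{
    if (link(name, alias) != 0)     /* the step LN performs: add a name */
        return -1;
    if (unlink(name) != 0)          /* the step RM performs: drop a name */
        return -1;
    return link_count(alias);
}
```

Between the two steps the file has two names and a link count of 2. The space is reclaimed only when the count falls to zero.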
MV Move a file or files. Used for renaming files.
CHMOD Change permissions on one or more files. Executable by files’ owner.
CHOWN Change owner of one or more files.
CHGRP Change group (project) to which a file belongs.
MKDIR Make a new directory.
RMDIR Remove a directory.
CD Change working directory.
FIND Prowl the directory hierarchy finding every file that meets specified criteria.
Criteria include:
name matches a given pattern,
creation date in given range,
date of last use in given range,
given permissions,
given owner,
given special file characteristics,
boolean combinations of above.
Any directory may be considered to be the root.
Perform specified command on each file found.

1.6. Running of Programs

SH The Shell, or command language interpreter.


Supply arguments to and run any executable program.
Redirect standard input, standard output, and standard error files.
Pipes: simultaneous execution with output of one process connected to the input of
another.
Compose compound commands using:
if ... then ... else,
case switches,
while loops,
for loops over lists,
break, continue and exit,
parentheses for grouping.
Initiate background processes.
Perform Shell programs, i.e., command scripts with substitutable arguments.
Construct argument lists from all file names satisfying specified patterns.
Take special action on traps and interrupts.
User-settable search path for finding commands.
Executes user-settable profile upon login.
Optionally announces presence of mail as it arrives.
Provides variables and parameters with default setting.
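
The pipe facility listed above, in which the output of one process is connected to the input of another, can be sketched in C as follows. This is not the Shell's own code. It assumes the pipe, fork, read, write and waitpid calls of a present-day POSIX system, and pipe_from_child is a name invented for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start a child process that writes `msg` into a pipe while the parent
 * reads from the other end. Up to cap-1 bytes are stored in `buf`, which
 * is always null-terminated. Returns the number of bytes read, or -1 if
 * the pipe or the child could not be created. */
long pipe_from_child(const char *msg, char *buf, size_t cap)
{
    int fd[2];
    pid_t pid;
    ssize_t n;
    size_t total = 0;
    int status;

    assert(msg != NULL && buf != NULL && cap > 0);
    if (pipe(fd) != 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    if (pid == 0) {                     /* child: the producing process */
        close(fd[0]);
        if (write(fd[1], msg, strlen(msg)) < 0)
            _exit(1);
        close(fd[1]);
        _exit(0);
    }

    close(fd[1]);                       /* parent: the consuming process */
    while (total < cap - 1 &&
           (n = read(fd[0], buf + total, cap - 1 - total)) > 0)
        total += (size_t)n;
    buf[total] = '\0';
    close(fd[0]);
    waitpid(pid, &status, 0);           /* collect the finished child */
    return (long)total;
}
```

Both processes run simultaneously. The reader sees end of file only after every copy of the write end has been closed, which is why each side closes the end it does not use.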
TEST Tests for use in Shell conditionals.
String comparison.
File nature and accessibility.
Boolean combinations of the above.
EXPR String computations for calculating command arguments.
Integer arithmetic.
Pattern matching.
WAIT Wait for termination of asynchronously running processes.
READ Read a line from terminal, for interactive Shell procedure.
ECHO Print remainder of command line. Useful for diagnostics or prompts in Shell pro-
grams, or for inserting data into a pipeline.
SLEEP Suspend execution for a specified time.
NOHUP Run a command immune to hanging up the terminal.
NICE Run a command in low (or high) priority.
KILL Terminate named processes.
CRON Schedule regular actions at specified times.
Actions are arbitrary programs.
Times are conjunctions of month, day of month, day of week, hour and minute.
Ranges are specifiable for each.
AT Schedule a one-shot action for an arbitrary time.
TEE Pass data between processes and divert a copy into one or more files.

1.7. Status Inquiries

LS List the names of one, several, or all files in one or more directories.
Alphabetic or temporal sorting, up or down.
Optional information: size, owner, group, date last modified, date last accessed, per-
missions, i-node number.
FILE Try to determine what kind of information is in a file by consulting the file system
index and by reading the file itself.
DATE Print today’s date and time. Has considerable knowledge of calendric and horological
peculiarities.
May set UNIX’s idea of date and time.
DF Report amount of free space on file system devices.
DU Print a summary of total space occupied by all files in a hierarchy.
QUOT Print summary of file space usage by user id.
WHO Tell who’s on the system.
List of presently logged in users, ports and times on.
Optional history of all logins and logouts.
PS Report on active processes.
List your own or everybody’s processes.
Tell what commands are being executed.
Optional status information: state and scheduling info, priority, attached terminal,
what it’s waiting for, size.
IOSTAT Print statistics about system I/O activity.
TTY Print name of your terminal.
PWD Print name of your working directory.

1.8. Backup and Maintenance

MOUNT Attach a device containing a file system to the tree of directories. Protects against
nonsense arrangements.
UMOUNT Remove the file system contained on a device from the tree of directories. Protects
against removing a busy device.
MKFS Make a new file system on a device.
MKNOD Make an i-node (file system entry) for a special file. Special files are physical devices,
virtual devices, physical memory, etc.
TP
TAR Manage file archives on magnetic tape or DECtape. TAR is newer.
Collect files into an archive.
Update DECtape archive by date.
Replace or delete DECtape files.
Print table of contents.
Retrieve from archive.
DUMP Dump the file system stored on a specified device, selectively by date, or indiscrim-
inately.
RESTOR Restore a dumped file system, or selectively retrieve parts thereof.


SU Temporarily become the super user with all the rights and privileges thereof. Requires
a password.
DCHECK
ICHECK
NCHECK Check consistency of file system.
Print gross statistics: number of files, number of directories, number of special files,
space used, space free.
Report duplicate use of space.
Retrieve lost space.
Report inaccessible files.
Check consistency of directories.
List names of all files.
CLRI Peremptorily expunge a file and its space from a file system. Used to repair damaged
file systems.
SYNC Force all outstanding I/O on the system to completion. Used to shut down gracefully.

1.9. Accounting
The timing information on which the reports are based can be manually cleared or shut off completely.
AC Publish cumulative connect time report.
Connect time by user or by day.
For all users or for selected users.
SA Publish Shell accounting report. Gives usage information on each command executed.
Number of times used.
Total system time, user time and elapsed time.
Optional averages and percentages.
Sorting on various fields.

1.10. Communication

MAIL Mail a message to one or more users. Also used to read and dispose of incoming
mail. The presence of mail is announced by LOGIN and optionally by SH.
Each message can be disposed of individually.
Messages can be saved in files or forwarded.
CALENDAR Automatic reminder service for events of today and tomorrow.
WRITE Establish direct terminal communication with another user.
WALL Write to all users.
MESG Inhibit receipt of messages from WRITE and WALL.
CU Call up another time-sharing system.
Transparent interface to remote machine.
File transmission.
Take remote input from local file or put remote output into local file.
Remote system need not be UNIX.
UUCP UNIX to UNIX copy.
Automatic queuing until line becomes available and remote machine is up.
Copy between two remote machines.
Differences, mail, etc., between two machines.

1.11. Basic Program Development Tools


Some of these utilities are used as integral parts of the higher level languages described in section 2.
AR Maintain archives and libraries. Combines several files into one for housekeeping
efficiency.
Create new archive.
Update archive by date.
Replace or delete files.
Print table of contents.
Retrieve from archive.
AS Assembler. Similar to PAL-11, but different in detail.
Creates object program consisting of
code, possibly read-only,
initialized data or read-write code,
uninitialized data.
Relocatable object code is directly executable without further transformation.
Object code normally includes a symbol table.
Multiple source files.
Local labels.
Conditional assembly.
‘‘Conditional jump’’ instructions become branches or branches plus jumps depend-
ing on distance.
Library The basic run-time library. These routines are used freely by all software.
Buffered character-by-character I/O.
Formatted input and output conversion (SCANF and PRINTF) for standard input and
output, files, in-memory conversion.
Storage allocator.
Time conversions.
Number conversions.
Password encryption.
Quicksort.
Random number generator.
Mathematical function library, including trigonometric functions and inverses,
exponential, logarithm, square root, bessel functions.
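
A few of the library facilities listed above (in-memory formatted conversion and Quicksort) can be seen working together in the following sketch. The routine sort_numbers and the limit MAX_NUMS are invented for the example. It also uses snprintf, a bounded form of the in-memory PRINTF that is an assumption here, since it comes from later standard C libraries.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_NUMS 16     /* arbitrary limit chosen for this sketch */

/* Comparison routine for qsort: ascending order of int. */
static int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Read whitespace-separated integers from the string `text`, sort them,
 * and write them back, space-separated, into `out` (capacity `cap`).
 * Returns how many numbers were handled, or -1 if `out` is too small. */
int sort_numbers(const char *text, char *out, size_t cap)
{
    int nums[MAX_NUMS];
    int n = 0, used = 0, i;
    size_t len = 0;

    assert(text != NULL && out != NULL && cap > 0);
    /* %n reports how many characters the conversion consumed. */
    while (n < MAX_NUMS && sscanf(text, "%d%n", &nums[n], &used) == 1) {
        text += used;
        n++;
    }
    qsort(nums, (size_t)n, sizeof nums[0], compare_ints);

    out[0] = '\0';
    for (i = 0; i < n; i++) {
        int w = snprintf(out + len, cap - len, "%s%d", i ? " " : "", nums[i]);
        if (w < 0 || (size_t)w >= cap - len)
            return -1;                  /* output buffer too small */
        len += (size_t)w;
    }
    return n;
}
```

Given the text "42 7 -3 19", the routine leaves "-3 7 19 42" in the output buffer and returns 4.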
ADB Interactive debugger.
Postmortem dumping.
Examination of arbitrary files, with no limit on size.
Interactive breakpoint debugging with the debugger as a separate process.
Symbolic reference to local and global variables.
Stack trace for C programs.
Output formats:
1-, 2-, or 4-byte integers in octal, decimal, or hex
single and double floating point
character and string
disassembled machine instructions
Patching.
Searching for integer, character, or floating patterns.
Handles separated instruction and data space.
OD Dump any file. Output options include any combination of octal or decimal by words,
octal by bytes, ASCII, opcodes, hexadecimal.
Range of dumping is controllable.
LD Link edit. Combine relocatable object files. Insert required routines from specified
libraries.
Resulting code may be sharable.
Resulting code may have separate instruction and data spaces.
LORDER Places object file names in proper order for loading, so that files depending on others
come after them.
NM Print the namelist (symbol table) of an object program. Provides control over the style
and order of names that are printed.
SIZE Report the core requirements of one or more object files.
STRIP Remove the relocation and symbol table information from an object file to save space.
TIME Run a command and report timing information on it.
PROF Construct a profile of time spent per routine from statistics gathered by time-sampling
the execution of a program. Uses floating point.
Subroutine call frequency and average times for C programs.
MAKE Controls creation of large programs. Uses a control file specifying source file depen-
dencies to make new version; uses time last changed to deduce minimum amount of
work necessary.
Knows about CC, YACC, LEX, etc.
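
The rule MAKE applies, using time last changed to deduce the minimum amount of work, can be sketched as a comparison of modification times. The code below is an illustration only, not MAKE's source. The names mtime_of and is_stale are invented, and two details are assumptions of the sketch: a dependency must be strictly newer than the target to force work, and a dependency that cannot be examined also forces work.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <time.h>

/* Time of last change of `path`, or (time_t)-1 if it cannot be examined. */
time_t mtime_of(const char *path)
{
    struct stat st;

    assert(path != NULL);
    return stat(path, &st) == 0 ? st.st_mtime : (time_t)-1;
}

/* Decide from timestamps alone whether a target must be remade.
 * `target` is the target's time of last change, (time_t)-1 if absent.
 * `deps` holds the times of the `ndeps` files it depends on.
 * Returns 1 if work is needed, 0 if the target is up to date. */
int is_stale(time_t target, const time_t *deps, size_t ndeps)
{
    size_t i;

    if (target == (time_t)-1)
        return 1;                       /* no target yet: it must be made */
    for (i = 0; i < ndeps; i++)
        if (deps[i] == (time_t)-1 || deps[i] > target)
            return 1;                   /* a dependency changed later */
    return 0;
}
```

A caller would fill deps with mtime_of for each source file named in the control file and recompile only when is_stale returns 1.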

1.12. UNIX Programmer’s Manual

Manual Machine-readable version of the UNIX Programmer’s Manual.


System overview.
All commands.
All system calls.
All subroutines in C and assembler libraries.
All devices and other special files.
Formats of file system and kinds of files known to system software.
Boot and maintenance procedures.
MAN Print specified manual section on your terminal.

1.13. Computer-Aided Instruction

LEARN A program for interpreting CAI scripts, plus scripts for learning about UNIX by using
it.
Scripts for basic files and commands, editor, advanced files and commands, EQN,
MS macros, C programming language.

2. Languages

2.1. The C Language


CC Compile and/or link edit programs in the C language. The UNIX operating system,
most of the subsystems and C itself are written in C. For a full description of C, read
The C Programming Language, Brian W. Kernighan and Dennis M. Ritchie, Prentice-
Hall, 1978.
General purpose language designed for structured programming.
Data types include character, integer, float, double, pointers to all types, functions
returning above types, arrays of all types, structures and unions of all types.
Operations intended to give machine-independent control of full machine facility,
including to-memory operations and pointer arithmetic.
Macro preprocessor for parameterized code and inclusion of standard files.
All procedures recursive, with parameters by value.
Machine-independent pointer manipulation.
Object code uses full addressing capability of the PDP-11.
Runtime library gives access to all system facilities.
Definable data types.
Block structure.
LINT Verifier for C programs. Reports questionable or nonportable usage such as:
Mismatched data declarations and procedure interfaces.
Nonportable type conversions.
Unused variables, unreachable code, no-effect operations.
Mistyped pointers.
Obsolete syntax.
Full cross-module checking of separately compiled programs.
CB A beautifier for C programs. Does proper indentation and placement of braces.

2.2. Fortran

F77 A full compiler for ANSI Standard Fortran 77.


Compatible with C and supporting tools at object level.
Optional source compatibility with Fortran 66.
Free format source.
Optional subscript-range checking, detection of uninitialized variables.
All widths of arithmetic: 2- and 4-byte integer; 4- and 8-byte real; 8- and 16-byte
complex.
RATFOR Ratfor adds rational control structure à la C to Fortran.
Compound statements.
If-else, do, for, while, repeat-until, break, next statements.
Symbolic constants.
File insertion.
Free format source.
Translation of relationals like >, >=.
Produces genuine Fortran to carry away.
May be used with F77.
STRUCT Converts ordinary ugly Fortran into structured Fortran (i.e., Ratfor), using statement
grouping, if-else, while, for, repeat-until.

2.3. Other Algorithmic Languages

BAS An interactive interpreter, similar in style to BASIC. Interpret unnumbered statements
immediately, numbered statements upon ‘run’.
Statements include:
comment,
dump,
for...next,
goto,
if...else...fi,
list,
print,
prompt,
return,
run,
save.
All calculations double precision.
Recursive function defining and calling.
Builtin functions include log, exp, sin, cos, atn, int, sqr, abs, rnd.
Escape to ED for complex program editing.
DC Interactive programmable desk calculator. Has named storage locations as well as con-
ventional stack for holding integers or programs.
Unlimited precision decimal arithmetic.
Appropriate treatment of decimal fractions.
Arbitrary input and output radices, in particular binary, octal, decimal and hexade-
cimal.
Reverse Polish operators:
+ - * /
remainder, power, square root,
load, store, duplicate, clear,
print, enter program text, execute.
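
The reverse Polish style of operation listed for DC can be illustrated by a very small stack evaluator. This sketch is not DC: it works on fixed-width long integers instead of unlimited precision, accepts only non-negative literals and the operators + - * / %, and the name rpn_eval and the limit STACK_MAX are invented for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

#define STACK_MAX 64    /* arbitrary stack depth for this sketch */

/* Evaluate a reverse Polish expression such as "2 3 + 4 *".
 * On success stores the single remaining value in *result and returns 0.
 * Returns -1 on malformed input, stack overflow, or division by zero. */
int rpn_eval(const char *expr, long *result)
{
    long stack[STACK_MAX];
    int depth = 0;
    const char *p = expr;

    assert(expr != NULL && result != NULL);
    while (*p != '\0') {
        if (isspace((unsigned char)*p)) {
            p++;
        } else if (isdigit((unsigned char)*p)) {
            char *end;
            if (depth == STACK_MAX)
                return -1;
            stack[depth++] = strtol(p, &end, 10);   /* push an operand */
            p = end;
        } else {
            long a, b;
            if (depth < 2)
                return -1;
            b = stack[--depth];                     /* pop right operand */
            a = stack[--depth];                     /* pop left operand */
            switch (*p++) {
            case '+': stack[depth++] = a + b; break;
            case '-': stack[depth++] = a - b; break;
            case '*': stack[depth++] = a * b; break;
            case '/':
                if (b == 0) return -1;
                stack[depth++] = a / b;
                break;
            case '%':
                if (b == 0) return -1;
                stack[depth++] = a % b;
                break;
            default:
                return -1;                          /* unknown operator */
            }
        }
    }
    if (depth != 1)
        return -1;
    *result = stack[0];
    return 0;
}
```

Operands are pushed as they are read, and each operator replaces the top two stack entries with its result, so no parentheses or precedence rules are needed.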
BC A C-like interactive interface to the desk calculator DC.
All the capabilities of DC with a high-level syntax.
Arrays and recursive functions.
Immediate evaluation of expressions and evaluation of functions upon call.
Arbitrary precision elementary functions: exp, sin, cos, atan.
Go-to-less programming.

2.4. Macroprocessing

M4 A general purpose macroprocessor.


Stream-oriented, recognizes macros anywhere in text.
Syntax fits with functional syntax of most higher-level languages.
Can evaluate integer arithmetic expressions.

2.5. Compiler-compilers

YACC An LR(1)-based compiler writing system. During execution of resulting parsers, arbi-
trary C functions may be called to do code generation or semantic actions.
BNF syntax specifications.
Precedence relations.
Accepts formally ambiguous grammars with non-BNF resolution rules.
LEX Generator of lexical analyzers. Arbitrary C functions may be called upon isolation of
each lexical token.
Full regular expression, plus left and right context dependence.
Resulting lexical analyzers interface cleanly with YACC parsers.

3. Text Processing

3.1. Document Preparation

ED Interactive context editor. Random access to all lines of a file.


Find lines by number or pattern. Patterns may include: specified characters, don’t
care characters, choices among characters, repetitions of these constructs, beginning
of line, end of line.
Add, delete, change, copy, move or join lines.
Permute or split contents of a line.
Replace one or all instances of a pattern within a line.
Combine or split files.
Escape to Shell (command language) during editing.
Do any of above operations on every pattern-selected line in a given range.
Optional encryption for extra security.
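
The pattern constructs listed for ED (specified characters, don't care characters, choices among characters, repetitions, beginning and end of line) correspond closely to what later became POSIX basic regular expressions. The sketch below assumes the regcomp and regexec routines of a present-day POSIX C library as a stand-in; it is not ED's own matcher, and line_matches is a name invented for the example.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return 1 if `line` contains a match for the basic regular expression
 * `pattern`, 0 if it does not, and -1 if the pattern cannot be compiled. */
int line_matches(const char *pattern, const char *line)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && line != NULL);
    if (regcomp(&re, pattern, REG_NOSUB) != 0)  /* basic syntax by default */
        return -1;
    rc = regexec(&re, line, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

For instance, the pattern "^UNIX" selects lines that begin with UNIX, "r.ff" accepts any single character between r and ff, "[nt]roff" accepts either nroff or troff, and "ab*c" accepts an a followed by any number of b's and then c.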
PTX Make a permuted (key word in context) index.
SPELL Look for spelling errors by comparing each word in a document against a word list.
25,000-word list includes proper names.
Handles common prefixes and suffixes.
Collects words to help tailor local spelling lists.
LOOK Search for words in dictionary that begin with specified prefix.
TYPO Look for spelling errors by a statistical technique; not limited to English.
CRYPT Encrypt and decrypt files for security.

3.2. Document Formatting

ROFF A typesetting program for terminals. Easy for nontechnical people to learn, and good
for simple documents. Input consists of data lines intermixed with control lines, such
as

ROFF is deemed to be obsolete; it is intended only for casual use.
Justification of either or both margins.
Automatic hyphenation.
Generalized running heads and feet, with even-odd page capability, numbering, etc.
Definable macros for frequently used control sequences (no substitutable arguments).
All 4 margins and page size dynamically adjustable.
Hanging indents and one-line indents.
Absolute and relative parameter settings.
Optional legal-style numbering of output lines.
Multiple file capability.
Not usable as a filter.
TROFF
NROFF Advanced typesetting. TROFF drives a Graphic Systems phototypesetter; NROFF
drives ascii terminals of all types. This summary was typeset using TROFF. TROFF
and NROFF style is similar to ROFF, but they are capable of much more elaborate
feats of formatting, when appropriately programmed. TROFF and NROFF accept the
same input language.


All ROFF capabilities available or definable.
Completely definable page format keyed to dynamically planted ‘‘interrupts’’ at
specified lines.
Maintains several separately definable typesetting environments (e.g., one for body
text, one for footnotes, and one for unusually elaborate headings).
Arbitrary number of output pools can be combined at will.
Macros with substitutable arguments, and macros invocable in mid-line.
Computation and printing of numerical quantities.
Conditional execution of macros.
Tabular layout facility.
Positions expressible in inches, centimeters, ems, points, machine units or arithmetic
combinations thereof.
Access to character-width computation for unusually difficult layout problems.
Overstrikes, built-up brackets, horizontal and vertical line drawing.
Dynamic relative or absolute positioning and size selection, globally or at the char-
acter level.
Can exploit the characteristics of the terminal being used, for approximating special
characters, reverse motions, proportional spacing, etc.
The Graphic Systems typesetter has a vocabulary of several 102-character fonts (4 simultaneously) in 15
sizes. TROFF provides terminal output for rough sampling of the product.
NROFF will produce multicolumn output on terminals capable of reverse line feed, or through the post-
processor COL.
High programming skill is required to exploit the formatting capabilities of TROFF and NROFF,
although unskilled personnel can easily be trained to enter documents according to canned formats such
as those provided by MS, below. TROFF and EQN are essentially identical to NROFF and NEQN so it
is usually possible to define interchangeable formats to produce approximate proof copy on terminals
before actual typesetting. The preprocessors MS, TBL, and REFER are fully compatible with TROFF
and NROFF.
MS A standardized manuscript layout package for use with NROFF/TROFF. This docu-
ment was formatted with MS.
Page numbers and draft dates.
Automatically numbered subheads.
Footnotes.
Single or double column.
Paragraphing, display and indentation.
Numbered equations.
EQN A mathematical typesetting preprocessor for TROFF. Translates easily readable formu-
las, either in-line or displayed, into detailed typesetting instructions. Formulas are
written in a style like this:
sigma sup 2 ~=~ 1 over N sum from i=1 to N ( x sub i - x bar ) sup 2
which produces:
$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2$$
Automatic calculation of size changes for subscripts, sub-subscripts, etc.
Full vocabulary of Greek letters and special symbols, such as ‘gamma’, ‘GAMMA’,
‘integral’.
Automatic calculation of large bracket sizes.


Vertical ‘‘piling’’ of formulae for matrices, conditional alternatives, etc.
Integrals, sums, etc., with arbitrarily complex limits.
Diacriticals: dots, double dots, hats, bars, etc.
Easily learned by nonprogrammers and mathematical typists.
NEQN A version of EQN for NROFF; accepts the same input language. Prepares formulas
for display on any terminal that NROFF knows about, for example, those based on
the Diablo printing mechanism.
Same facilities as EQN within graphical capability of terminal.
TBL A preprocessor for NROFF/TROFF that translates simple descriptions of table layouts
and contents into detailed typesetting instructions.
Computes column widths.
Handles left- and right-justified columns, centered columns and decimal-point align-
ment.
Places column titles.
Table entries can be text, which is adjusted to fit.
Can box all or parts of table.
REFER Fills in bibliographic citations in a document from a data base (not supplied).
References may be printed in any style, as they occur or collected at the end.
May be numbered sequentially, by name of author, etc.
TC Simulate Graphic Systems typesetter on Tektronix 4014 scope. Useful for checking
TROFF page layout before typesetting.
GREEK Fancy printing on Diablo-mechanism terminals like DASI-300 and DASI-450, and on
Tektronix 4014.
Gives half-line forward and reverse motions.
Approximates Greek letters and other special characters by overstriking.
COL Canonicalize files with reverse line feeds for one-pass printing.
DEROFF Remove all TROFF commands from input.
CHECKEQ Check document for possible errors in EQN usage.

4. Information Handling

SORT Sort or merge ASCII files line-by-line. No limit on input size.


Sort up or down.
Sort lexicographically or on numeric key.
Multiple keys located by delimiters or by character position.
May sort upper case together with lower into dictionary order.
Optionally suppress duplicate data.
TSORT Topological sort — converts a partial order into a total order.
UNIQ Collapse successive duplicate lines in a file into one line.
Publish lines that were originally unique, duplicated, or both.
May give redundancy count for each line.
TR Do one-to-one character translation according to an arbitrary code.
May coalesce selected repeated characters.
May delete selected characters.
DIFF Report line changes, additions and deletions necessary to bring two files into agree-
ment.
May produce an editor script to convert one file into another.


A variant compares two new versions against one old one.
COMM Identify common lines in two sorted files. Output in up to 3 columns shows lines
present in first file only, present in both, and/or present in second only.
JOIN Combine two files by joining records that have identical keys.
GREP Print all lines in a file that satisfy a pattern as used in the editor ED.
May print all lines that fail to match.
May print count of hits.
May print first hit in each file.
LOOK Binary search in sorted file for lines with specified prefix.
WC Count the lines, ‘‘words’’ (blank-separated strings) and characters in a file.
SED Stream-oriented version of ED. Can perform a sequence of editing operations on each
line of an input stream of unbounded length.
Lines may be selected by address or range of addresses.
Control flow and conditional testing.
Multiple output streams.
Multi-line capability.
AWK Pattern scanning and processing language. Searches input for patterns, and performs
actions on each line of input that satisfies the pattern.
Patterns include regular expressions, arithmetic and lexicographic conditions,
boolean combinations and ranges of these.
Data treated as string or numeric as appropriate.
Can break input into fields; fields are variables.
Variables and arrays (with non-numeric subscripts).
Full set of arithmetic operators and control flow.
Multiple output streams to files and pipes.
Output can be formatted as desired.
Multi-line capabilities.

5. Graphics
The programs in this section are predominantly intended for use with Tektronix 4014 storage scopes.
GRAPH Prepares a graph of a set of input numbers.
Input scaled to fit standard plotting area.
Abscissae may be supplied automatically.
Graph may be labeled.
Control over grid style, line style, graph orientation, etc.
SPLINE Provides a smooth curve through a set of points intended for GRAPH.
PLOT A set of filters for printing graphs produced by GRAPH and other programs on various
terminals. Filters provided for 4014, DASI terminals, Versatec printer/plotter.

6. Novelties, Games, and Things That Didn’t Fit Anywhere Else

BACKGAMMON
A player of modest accomplishment.
CHESS Plays good class D chess.
CHECKERS Ditto, for checkers.


BCD Converts ascii to card-image form.
PPT Converts ascii to paper tape form.
BJ A blackjack dealer.
CUBIC An accomplished player of 4×4×4 tic-tac-toe.
MAZE Constructs random mazes for you to solve.
MOO A fascinating number-guessing game.
CAL Print a calendar of specified month and year.
BANNER Print output in huge letters.
CHING The I Ching. Place your own interpretation on the output.
FORTUNE Presents a random fortune cookie on each invocation. Limited jar of cookies included.
UNITS Convert amounts between different scales of measurement. Knows hundreds of units.
For example, how many km/sec is a parsec/megayear?
TTT A tic-tac-toe program that learns. It never makes the same mistake twice.
ARITHMETIC
Speed and accuracy test for number facts.
FACTOR Factor large integers.
QUIZ Test your knowledge of Shakespeare, Presidents, capitals, etc.
WUMP Hunt the wumpus, thrilling search in a dangerous cave.
REVERSI A two person board game, isomorphic to Othello.
HANGMAN Word-guessing game. Uses the dictionary supplied with SPELL.
FISH Children’s card-guessing game.
The UNIX Time-Sharing System*

D. M. Ritchie and K. Thompson

ABSTRACT

UNIX† is a general-purpose, multi-user, interactive operating system for the larger
Digital Equipment Corporation PDP-11 and the Interdata 8/32 computers. It offers a
number of features seldom found even in larger operating systems, including
i A hierarchical file system incorporating demountable volumes,
ii Compatible file, device, and inter-process I/O,
iii The ability to initiate asynchronous processes,
iv System command language selectable on a per-user basis,
v Over 100 subsystems including a dozen languages,
vi High degree of portability.
This paper discusses the nature and implementation of the file system and of the user
command interface.

I. INTRODUCTION
There have been four versions of the UNIX time-sharing system. The earliest (circa 1969-70) ran
on the Digital Equipment Corporation PDP-7 and -9 computers. The second version ran on the unpro-
tected PDP-11/20 computer. The third incorporated multiprogramming and ran on the PDP-11/34, /40,
/45, /60, and /70 computers; it is the one described in the previously published version of this paper, and
is also the most widely used today. This paper describes only the fourth, current system that runs on the
PDP-11/70 and the Interdata 8/32 computers. In fact, the differences among the various systems are rather
small; most of the revisions made to the originally published version of this paper, aside from those con-
cerned with style, had to do with details of the implementation of the file system.
Since PDP-11 UNIX became operational in February, 1971, over 600 installations have been put into
service. Most of them are engaged in applications such as computer science education, the preparation
and formatting of documents and other textual material, the collection and processing of trouble data
from various switching machines within the Bell System, and recording and checking telephone service
orders. Our own installation is used mainly for research in operating systems, languages, computer net-
works, and other topics in computer science, and also for document preparation.
Perhaps the most important achievement of UNIX is to demonstrate that a powerful operating sys-
tem for interactive use need not be expensive either in equipment or in human effort: it can run on
hardware costing as little as $40,000, and less than two man-years were spent on the main system
software. We hope, however, that users find that the most important characteristics of the system are its
simplicity, elegance, and ease of use.
Besides the operating system proper, some major programs available under UNIX are
__________________
* Copyright 1974, Association for Computing Machinery, Inc., reprinted by permission. This is a revised version of an
article that appeared in Communications of the ACM, 17, No. 7 (July 1974), pp. 365-375. That article was a revised
version of a paper presented at the Fourth ACM Symposium on Operating Systems Principles, IBM Thomas J. Watson
Research Center, Yorktown Heights, New York, October 15-17, 1973.
†UNIX is a Trademark of Bell Laboratories.
C compiler
Text editor based on QED [1]
Assembler, linking loader, symbolic debugger
Phototypesetting and equation setting programs [2, 3]
Dozens of languages including Fortran 77, Basic, Snobol, APL, Algol 68, M6, TMG, Pascal
There is a host of maintenance, utility, recreation and novelty programs, all written locally. The UNIX
user community, which numbers in the thousands, has contributed many more programs and languages.
It is worth noting that the system is totally self-supporting. All UNIX software is maintained on the sys-
tem; likewise, this paper and all other documents in this issue were generated and formatted by the UNIX
editor and text formatting programs.

II. HARDWARE AND SOFTWARE ENVIRONMENT


The PDP-11/70 on which the Research UNIX system is installed is a 16-bit word (8-bit byte) com-
puter with 768K bytes of core memory; the system kernel occupies 90K bytes about equally divided
between code and data tables. This system, however, includes a very large number of device drivers and
enjoys a generous allotment of space for I/O buffers and system tables; a minimal system capable of
running the software mentioned above can require as little as 96K bytes of core altogether. There are
even larger installations; see the description of the PWB/UNIX systems [4, 5], for example. There are also
much smaller, though somewhat restricted, versions of the system [6].
Our own PDP-11 has two 200-Mb moving-head disks for file system storage and swapping. There
are 20 variable-speed communications interfaces attached to 300- and 1200-baud data sets, and an addi-
tional 12 communication lines hard-wired to 9600-baud terminals and satellite computers. There are
also several 2400- and 4800-baud synchronous communication interfaces used for machine-to-machine
file transfer. Finally, there is a variety of miscellaneous devices including nine-track magnetic tape, a
line printer, a voice synthesizer, a phototypesetter, a digital switching network, and a chess machine.
The preponderance of UNIX software is written in the abovementioned C language [7]. Early versions
of the operating system were written in assembly language, but during the summer of 1973, it was
rewritten in C. The size of the new system was about one-third greater than that of the old. Since the
new system not only became much easier to understand and to modify but also included many func-
tional improvements, including multiprogramming and the ability to share reentrant code among several
user programs, we consider this increase in size quite acceptable.

III. THE FILE SYSTEM


The most important role of the system is to provide a file system. From the point of view of the
user, there are three kinds of files: ordinary disk files, directories, and special files.

3.1 Ordinary files


A file contains whatever information the user places on it, for example, symbolic or binary
(object) programs. No particular structuring is expected by the system. A file of text consists simply of
a string of characters, with lines demarcated by the newline character. Binary programs are sequences
of words as they will appear in core memory when the program starts executing. A few user programs
manipulate files with more structure; for example, the assembler generates, and the loader expects, an
object file in a particular format. However, the structure of files is controlled by the programs that use
them, not by the system.

3.2 Directories
Directories provide the mapping between the names of files and the files themselves, and thus
induce a structure on the file system as a whole. Each user has a directory of his own files; he may also
create subdirectories to contain groups of files conveniently treated together. A directory behaves
exactly like an ordinary file except that it cannot be written on by unprivileged programs, so that the
system controls the contents of directories. However, anyone with appropriate permission may read a
directory just like any other file.
The system maintains several directories for its own use. One of these is the root directory. All
files in the system can be found by tracing a path through a chain of directories until the desired file is
reached. The starting point for such searches is often the root. Other system directories contain all the
programs provided for general use; that is, all the commands. As will be seen, however, it is by no
means necessary that a program reside in one of these directories for it to be executed.
Files are named by sequences of 14 or fewer characters. When the name of a file is specified to
the system, it may be in the form of a path name, which is a sequence of directory names separated by
slashes, ‘‘/ ’’, and ending in a file name. If the sequence begins with a slash, the search begins in the
root directory. The name /alpha/beta/gamma causes the system to search the root for directory alpha,
then to search alpha for beta, finally to find gamma in beta. gamma may be an ordinary file, a direc-
tory, or a special file. As a limiting case, the name ‘‘/ ’’ refers to the root itself.
A path name not starting with ‘‘/ ’’ causes the system to begin the search in the user’s current
directory. Thus, the name alpha/beta specifies the file named beta in subdirectory alpha of the current
directory. The simplest kind of name, for example, alpha, refers to a file that itself is found in the
current directory. As another limiting case, the null file name refers to the current directory.
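
The lookup rules above can be sketched in C. The helper below is hypothetical (the name path_components is not part of the system); it touches no file system and only reports where a search would start and how many names would be looked up in turn.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not a system call: reports whether the search for
 * `path` starts at the root (leading '/') or in the current directory, and
 * returns how many names would be looked up in turn.
 */
int path_components(const char *path, int *from_root)
{
    int count = 0;

    assert(path != NULL && from_root != NULL);
    *from_root = (path[0] == '/');

    while (*path != '\0') {
        while (*path == '/')                    /* skip separators */
            path++;
        if (*path == '\0')
            break;
        count++;                                /* one more name to search for */
        while (*path != '\0' && *path != '/')   /* skip over the name itself */
            path++;
    }
    return count;
}
```

For /alpha/beta/gamma the helper reports three names and a search starting at the root; for the two limiting cases, ‘‘/ ’’ and the null name, it reports no names at all, leaving only the starting directory.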
The same non-directory file may appear in several directories under possibly different names.
This feature is called linking; a directory entry for a file is sometimes called a link. The UNIX system
differs from other systems in which linking is permitted in that all links to a file have equal status. That
is, a file does not exist within a particular directory; the directory entry for a file consists merely of its
name and a pointer to the information actually describing the file. Thus a file exists independently of
any directory entry, although in practice a file is made to disappear along with the last link to it.
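
The equal status of links can be observed from a program. The following is a minimal sketch, assuming the present-day C interfaces link, unlink, creat and stat; the function names link_count and link_demo, and the file names passed to them, are illustrative only.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Number of links to the file named by `path`, or -1 if it cannot be examined. */
long link_count(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return (long)st.st_nlink;
}

/*
 * Creates a file under the name `a`, links it as `b`, then removes `a`.
 * counts[0..2] receive the link count after each step.  Returns 0 on
 * success.  Both names are scratch names chosen by the caller and must lie
 * on the same file system.
 */
int link_demo(const char *a, const char *b, long counts[3])
{
    int fd;

    assert(a != NULL && b != NULL && counts != NULL);
    unlink(a);                        /* start from a clean slate */
    unlink(b);

    fd = creat(a, 0644);
    if (fd < 0)
        return -1;
    close(fd);
    counts[0] = link_count(a);        /* one directory entry */

    if (link(a, b) != 0) {
        unlink(a);
        return -1;
    }
    counts[1] = link_count(a);        /* two entries, equal in status */

    if (unlink(a) != 0) {
        unlink(b);
        return -1;
    }
    counts[2] = link_count(b);        /* the file survives under the other name */

    unlink(b);                        /* last link removed: the file disappears */
    return 0;
}
```

Removing the name under which the file was created does not disturb it; the file goes away only when the second name is removed as well.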
Each directory always has at least two entries. The name ‘‘ . ’’ in each directory refers to the
directory itself. Thus a program may read the current directory under the name ‘‘ . ’’ without knowing
its complete path name. The name ‘‘ . . ’’ by convention refers to the parent of the directory in which it
appears, that is, to the directory in which it was created.
The directory structure is constrained to have the form of a rooted tree. Except for the special
entries ‘‘ . ’’ and ‘‘ . . ’’, each directory must appear as an entry in exactly one other directory, which is
its parent. The reason for this is to simplify the writing of programs that visit subtrees of the directory
structure, and more important, to avoid the separation of portions of the hierarchy. If arbitrary links to
directories were permitted, it would be quite difficult to detect when the last connection from the root to
a directory was severed.

3.3 Special files


Special files constitute the most unusual feature of the UNIX file system. Each supported I/O dev-
ice is associated with at least one such file. Special files are read and written just like ordinary disk
files, but requests to read or write result in activation of the associated device. An entry for each special
file resides in directory /dev, although a link may be made to one of these files just as it may to an ordi-
nary file. Thus, for example, to write on a magnetic tape one may write on the file /dev/mt. Special
files exist for each communication line, each disk, each tape drive, and for physical main memory. Of
course, the active disks and the memory special file are protected from indiscriminate access.
There is a threefold advantage in treating I/O devices this way: file and device I/O are as similar
as possible; file and device names have the same syntax and meaning, so that a program expecting a file
name as a parameter can be passed a device name; finally, special files are subject to the same protection
mechanism as regular files.
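
As an illustration of the second advantage, the sketch below copies bytes between two names using only open, read and write. It is written against the present-day C interface, and copy_bytes is a hypothetical name. Nothing in it depends on whether either argument names an ordinary file or a special file.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Copies up to `limit` bytes from `from` to `to`.  Both files must already
 * exist.  Returns the number of bytes copied, or -1 on error.
 */
long copy_bytes(const char *from, const char *to, long limit)
{
    char buf[512];
    long total = 0;
    int in, out;

    assert(from != NULL && to != NULL && limit >= 0);
    in = open(from, O_RDONLY);
    if (in < 0)
        return -1;
    out = open(to, O_WRONLY);
    if (out < 0) {
        close(in);
        return -1;
    }
    while (total < limit) {
        long want = limit - total;
        ssize_t n;

        if (want > (long)sizeof buf)
            want = (long)sizeof buf;
        n = read(in, buf, (size_t)want);
        if (n < 0) {                     /* read error */
            total = -1;
            break;
        }
        if (n == 0)                      /* end of file */
            break;
        if (write(out, buf, (size_t)n) != n) {
            total = -1;
            break;
        }
        total += n;
    }
    close(in);
    close(out);
    return total;
}
```

On a present-day system, for instance, copy_bytes("/dev/zero", "/dev/null", 1000) moves 1000 bytes between two special files. Those two device names are assumptions about the host system and are not mentioned in this paper.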

3.4 Removable file systems


Although the root of the file system is always stored on the same device, it is not necessary that
the entire file system hierarchy reside on this device. There is a mount system request with two argu-
ments: the name of an existing ordinary file, and the name of a special file whose associated storage
volume (e.g., a disk pack) should have the structure of an independent file system containing its own
directory hierarchy. The effect of mount is to cause references to the heretofore ordinary file to refer
instead to the root directory of the file system on the removable volume. In effect, mount replaces a
leaf of the hierarchy tree (the ordinary file) by a whole new subtree (the hierarchy stored on the remov-
able volume). After the mount, there is virtually no distinction between files on the removable volume
and those in the permanent file system. In our installation, for example, the root directory resides on a
small partition of one of our disk drives, while the other drive, which contains the users’ files, is
mounted by the system initialization sequence. A mountable file system is generated by writing on its
corresponding special file. A utility program is available to create an empty file system, or one may
simply copy an existing file system.
There is only one exception to the rule of identical treatment of files on different devices: no link
may exist between one file system hierarchy and another. This restriction is enforced so as to avoid the
elaborate bookkeeping that would otherwise be required to assure removal of the links whenever the
removable volume is dismounted.

3.5 Protection
Although the access control scheme is quite simple, it has some unusual features. Each user of
the system is assigned a unique user identification number. When a file is created, it is marked with the
user ID of its owner. Also given for new files is a set of ten protection bits. Nine of these specify
independently read, write, and execute permission for the owner of the file, for other members of his
group, and for all remaining users.
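
The nine permission bits can be decoded as in the following sketch. The bit order is an assumption not spelled out in the text: owner bits are taken as most significant, then group, then all others, with read, write and execute in that order within each group of three. The name mode_string is illustrative.

```c
#include <assert.h>
#include <string.h>

/*
 * Renders the nine permission bits of `mode` as "rwxrwxrwx", with '-' for
 * a bit that is off.  `out` must have room for 10 characters.
 */
void mode_string(unsigned mode, char out[10])
{
    static const char letters[] = "rwxrwxrwx";
    int i;

    assert(out != NULL);
    memcpy(out, "---------", 10);         /* nine dashes plus the terminator */
    for (i = 0; i < 9; i++) {
        /* bit 8 is owner-read, bit 0 is others-execute (assumed layout) */
        if (mode & (1u << (8 - i)))
            out[i] = letters[i];
    }
}
```

Under that assumed layout the octal mode 0754 comes out as rwxr-xr--: everything for the owner, read and execute for the group, and read only for all remaining users.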
If the tenth bit is on, the system will temporarily change the user identification (hereafter, user ID)
of the current user to that of the creator of the file whenever the file is executed as a program. This
change in user ID is effective only during the execution of the program that calls for it. The set-user-ID
feature provides for privileged programs that may use files inaccessible to other users. For example, a
program may keep an accounting file that should neither be read nor changed except by the program
itself. If the set-user-ID bit is on for the program, it may access the file although this access might be
forbidden to other programs invoked by the given program’s user. Since the actual user ID of the
invoker of any program is always available, set-user-ID programs may take any measures desired to
satisfy themselves as to their invoker’s credentials. This mechanism is used to allow users to execute
the carefully written commands that call privileged system entries. For example, there is a system entry
invokable only by the ‘‘super-user’’ (below) that creates an empty directory. As indicated above, direc-
tories are expected to have entries for ‘‘ . ’’ and ‘‘ . . ’’. The command which creates a directory is
owned by the super-user and has the set-user-ID bit set. After it checks its invoker’s authorization to
create the specified directory, it creates it and makes the entries for ‘‘ . ’’ and ‘‘ . . ’’.
Because anyone may set the set-user-ID bit on one of his own files, this mechanism is generally
available without administrative intervention. For example, this protection scheme easily solves the MOO
accounting problem posed by ‘‘Aleph-null’’ [8].
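
A set-user-ID program can still discover who actually invoked it. The sketch below assumes the present-day C interface (getuid and the S_ISUID mode bit); the functions has_set_user_id and invoker_allowed are hypothetical examples of the kind of check such a program might make, not part of the system.

```c
#include <assert.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Nonzero if the set-user-ID bit is on in a file's mode word. */
int has_set_user_id(mode_t mode)
{
    return (mode & S_ISUID) != 0;
}

/*
 * Admits the invoker only if the actual (real) user ID of the calling
 * process appears in the caller-supplied list `allowed` of `n` entries.
 * The real user ID is unaffected by the set-user-ID bit.
 */
int invoker_allowed(const uid_t *allowed, int n)
{
    uid_t real = getuid();
    int i;

    assert(n == 0 || allowed != NULL);
    for (i = 0; i < n; i++)
        if (allowed[i] == real)
            return 1;
    return 0;
}
```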
The system recognizes one particular user ID (that of the ‘‘super-user’’) as exempt from the usual
constraints on file access; thus (for example), programs may be written to dump and reload the file sys-
tem without unwanted interference from the protection system.

3.6 I/O calls


The system calls to do I/O are designed to eliminate the differences between the various devices
and styles of access. There is no distinction between ‘‘random’’ and ‘‘sequential’’ I/O, nor is any logi-
cal record size imposed by the system. The size of an ordinary file is determined by the number of
bytes written on it; no predetermination of the size of a file is necessary or possible.
To illustrate the essentials of I/O, some of the basic calls are summarized below in an anonymous
language that will indicate the required parameters without getting into the underlying complexities.
Each call to the system may potentially result in an error return, which for simplicity is not represented
in the calling sequence.
To read or write a file assumed to exist already, it must be opened by the following call:
filep = open ( name, flag )
where name indicates the name of the file. An arbitrary path name may be given. The flag argument
indicates whether the file is to be read, written, or ‘‘updated,’’ that is, read and written simultaneously.
The returned value filep is called a file descriptor. It is a small integer used to identify the file in
subsequent calls to read, write, or otherwise manipulate the file.
To create a new file or completely rewrite an old one, there is a create system call that creates the
given file if it does not exist, or truncates it to zero length if it does exist; create also opens the new file
for writing and, like open, returns a file descriptor.
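
The truncating behavior of create can be checked with a short sketch. In the present-day C library the call is spelled creat, and the mode argument 0644 and the function name size_after_recreate are choices made here for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Writes five bytes to the scratch file `path`, calls creat on the same
 * name a second time, and returns the size of the file as seen through the
 * new descriptor.  Returns -1 on failure.  The scratch file is removed.
 */
long size_after_recreate(const char *path)
{
    struct stat st;
    long size = -1;
    int fd;

    assert(path != NULL);
    fd = creat(path, 0644);          /* created if absent, opened for writing */
    if (fd < 0)
        return -1;
    if (write(fd, "hello", 5) != 5) {
        close(fd);
        unlink(path);
        return -1;
    }
    close(fd);

    fd = creat(path, 0644);          /* already exists: truncated to zero length */
    if (fd < 0) {
        unlink(path);
        return -1;
    }
    if (fstat(fd, &st) == 0)
        size = (long)st.st_size;
    close(fd);
    unlink(path);                    /* remove the scratch file */
    return size;
}
```

The second call returns a fresh descriptor for the same file, now of zero length.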
The file system maintains no locks visible to the user, nor is there any restriction on the number
of users who may have a file open for reading or writing. Although it is possible for the contents of a
file to become scrambled when two users write on it simultaneously, in practice difficulties do not arise.
We take the view that locks are neither necessary nor sufficient, in our environment, to prevent interfer-
ence between users of the same file. They are unnecessary because we are not faced with large, single-
file data bases maintained by independent processes. They are insufficient because locks in the ordinary
sense, whereby one user is prevented from writing on a file that another user is reading, cannot prevent
confusion when, for example, both users are editing a file with an editor that makes a copy of the file
being edited.
There are, however, sufficient internal interlocks to maintain the logical consistency of the file sys-
tem when two users engage simultaneously in activities such as writing on the same file, creating files in
the same directory, or deleting each other’s open files.
Except as indicated below, reading and writing are sequential. This means that if a particular byte
in the file was the last byte written (or read), the next I/O call implicitly refers to the immediately fol-
lowing byte. For each open file there is a pointer, maintained inside the system, that indicates the next
byte to be read or written. If n bytes are read or written, the pointer advances by n bytes.
Once a file is open, the following calls may be used:
n = read ( filep, buffer, count )
n = write ( filep, buffer, count )
Up to count bytes are transmitted between the file specified by filep and the byte array specified by
buffer. The returned value n is the number of bytes actually transmitted. In the write case, n is the
same as count except under exceptional conditions, such as I/O errors or end of physical medium on
special files; in a read, however, n may without error be less than count. If the read pointer is so near
the end of the file that reading count characters would cause reading beyond the end, only sufficient
bytes are transmitted to reach the end of the file; also, typewriter-like terminals never return more than
one line of input. When a read call returns with n equal to zero, the end of the file has been reached.
For disk files this occurs when the read pointer becomes equal to the current size of the file. It is possi-
ble to generate an end-of-file from a terminal by use of an escape sequence that depends on the device
used.
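
The end-of-file convention leads to a simple reading loop. The following sketch, assuming the present-day C interface and an illustrative function name count_bytes, counts the bytes in a file without knowing its size in advance.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Counts the bytes in the file named by `path` by reading until read
 * returns zero.  Returns the count, or -1 on error.
 */
long count_bytes(const char *path)
{
    char buffer[64];
    long total = 0;
    ssize_t n;
    int filep;

    assert(path != NULL);
    filep = open(path, O_RDONLY);
    if (filep < 0)
        return -1;
    /* n may be less than the count requested; only n == 0 means end of file */
    while ((n = read(filep, buffer, sizeof buffer)) > 0)
        total += n;
    close(filep);
    return n < 0 ? -1 : total;
}
```

The loop never compares n with the requested count, so it behaves the same whether the bytes arrive in full buffers or, as from a terminal, one line at a time.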
Bytes written affect only those parts of a file implied by the position of the write pointer and the
count; no other part of the file is changed. If the last byte lies beyond the end of the file, the file is
made to grow as needed.
To do random (direct-access) I/O it is only necessary to move the read or write pointer to the
appropriate location in the file.
location = lseek ( filep, offset, base )
The pointer associated with filep is moved to a position offset bytes from the beginning of the file, from
the current position of the pointer, or from the end of the file, depending on base. offset may be nega-
tive. For some devices (e.g., paper tape and terminals) seek calls are ignored. The actual offset from
the beginning of the file to which the pointer was moved is returned in location.
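
As a sketch of random access, the function below fetches a single byte counted back from the end of a file. It assumes the present-day C interface, in which the three values of base are named SEEK_SET, SEEK_CUR and SEEK_END; byte_from_end is an illustrative name.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Returns the byte `back` positions before the end of the file (back = 1
 * is the final byte), or -1 if it cannot be read.
 */
int byte_from_end(const char *path, long back)
{
    unsigned char c;
    int result = -1;
    int filep;

    assert(path != NULL && back > 0);
    filep = open(path, O_RDONLY);
    if (filep < 0)
        return -1;
    /* a negative offset, measured from the end of the file */
    if (lseek(filep, (off_t)-back, SEEK_END) >= 0 && read(filep, &c, 1) == 1)
        result = c;
    close(filep);
    return result;
}
```

No bytes before the requested one are read; the pointer is simply moved and a one-byte read follows.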
There are several additional system entries having to do with I/O and with the file system that will
not be discussed. For example: close a file, get the status of a file, change the protection mode or the
owner of a file, create a directory, make a link to an existing file, delete a file.
IV. IMPLEMENTATION OF THE FILE SYSTEM


As mentioned in Section 3.2 above, a directory entry contains only a name for the associated file
and a pointer to the file itself. This pointer is an integer called the i-number (for index number) of the
file. When the file is accessed, its i-number is used as an index into a system table (the i-list ) stored in
a known part of the device on which the directory resides. The entry found thereby (the file’s i-node )
contains the description of the file:
i the user and group-ID of its owner
ii its protection bits
iii the physical disk or tape addresses for the file contents
iv its size
v time of creation, last use, and last modification
vi the number of links to the file, that is, the number of times it appears in a directory
vii a code indicating whether the file is a directory, an ordinary file, or a special file.
The purpose of an open or create system call is to turn the path name given by the user into an i-
number by searching the explicitly or implicitly named directories. Once a file is open, its device, i-
number, and read/write pointer are stored in a system table indexed by the file descriptor returned by the
open or create. Thus, during a subsequent call to read or write the file, the descriptor may be easily
related to the information necessary to access the file.
When a new file is created, an i-node is allocated for it and a directory entry is made that contains
the name of the file and the i-node number. Making a link to an existing file involves creating a direc-
tory entry with the new name, copying the i-number from the original file entry, and incrementing the
link-count field of the i-node. Removing (deleting) a file is done by decrementing the link-count of the
i-node specified by its directory entry and erasing the directory entry. If the link-count drops to 0, any
disk blocks in the file are freed and the i-node is de-allocated.
The space on all disks that contain a file system is divided into a number of 512-byte blocks logi-
cally addressed from 0 up to a limit that depends on the device. There is space in the i-node of each
file for 13 device addresses. For nonspecial files, the first 10 device addresses point at the first 10
blocks of the file. If the file is larger than 10 blocks, the eleventh device address points to an indirect block
containing up to 128 addresses of additional blocks in the file. Still larger files use the twelfth device
address of the i-node to point to a double-indirect block naming 128 indirect blocks, each pointing to
128 blocks of the file. If required, the thirteenth device address is a triple-indirect block. Thus files
may conceptually grow to [(10 + 128 + 128^2 + 128^3) × 512] bytes. Once opened, bytes numbered below 5120
can be read with a single disk access; bytes in the range 5120 to 70,656 require two accesses; bytes in
the range 70,656 to 8,459,264 require three accesses; bytes from there to the largest file (1,082,201,088)
require four accesses. In practice, a device cache mechanism (see below) proves effective in eliminating
most of the indirect fetches.
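The thresholds quoted above follow directly from the block size and the number of addresses per indirect block. As a check on the arithmetic, here is a small sketch in Python; the names BLOCK, DIRECT, PER_INDIRECT, LIMITS, and disk_accesses are ours, and the count assumes the i-node itself is already in memory and ignores the device cache.

```python
BLOCK = 512          # bytes per block
DIRECT = 10          # direct device addresses in the i-node
PER_INDIRECT = 128   # addresses held by one indirect block

# Byte offsets at which each further level of indirection begins.
LIMITS = [
    DIRECT * BLOCK,
    (DIRECT + PER_INDIRECT) * BLOCK,
    (DIRECT + PER_INDIRECT + PER_INDIRECT**2) * BLOCK,
    (DIRECT + PER_INDIRECT + PER_INDIRECT**2 + PER_INDIRECT**3) * BLOCK,
]

def disk_accesses(offset):
    """Disk accesses needed to read the byte at `offset` of an open file."""
    for accesses, limit in enumerate(LIMITS, start=1):
        if offset < limit:
            return accesses
    raise ValueError("offset beyond the largest possible file")
```

LIMITS evaluates to 5120, 70,656, 8,459,264, and 1,082,201,088, matching the figures in the text.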
The foregoing discussion applies to ordinary files. When an I/O request is made to a file whose
i-node indicates that it is special, the last 12 device address words are immaterial, and the first specifies
an internal device name, which is interpreted as a pair of numbers representing, respectively, a device
type and subdevice number. The device type indicates which system routine will deal with I/O on that
device; the subdevice number selects, for example, a disk drive attached to a particular controller or one
of several similar terminal interfaces.
In this environment, the implementation of the mount system call (Section 3.4) is quite straight-
forward. mount maintains a system table whose argument is the i-number and device name of the ordi-
nary file specified during the mount, and whose corresponding value is the device name of the indicated
special file. This table is searched for each i-number/device pair that turns up while a path name is
being scanned during an open or create; if a match is found, the i-number is replaced by the i-number
of the root directory and the device name is replaced by the table value.
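The table lookup can be pictured as follows. This is only a sketch of the idea in Python, not the system's actual data structure: device names are represented as small integers, and the value chosen for ROOT_INUMBER is an assumption made for illustration.

```python
ROOT_INUMBER = 2   # assumed i-number of the root directory of a file system

# key:   (i-number, device name) of the ordinary file named in the mount call
# value: device name of the special file that was mounted there
mount_table = {}

def mount(inumber, device, mounted_device):
    mount_table[(inumber, device)] = mounted_device

def cross_mount_point(inumber, device):
    """Applied to each i-number/device pair met while a path name is scanned."""
    mounted_device = mount_table.get((inumber, device))
    if mounted_device is None:
        return inumber, device              # not a mount point: unchanged
    return ROOT_INUMBER, mounted_device     # continue at the root of the mounted system
```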
To the user, both reading and writing of files appear to be synchronous and unbuffered. That is,
immediately after return from a read call the data are available; conversely, after a write the user’s
workspace may be reused. In fact, the system maintains a rather complicated buffering mechanism that
reduces greatly the number of I/O operations required to access a file. Suppose a write call is made
specifying transmission of a single byte. The system will search its buffers to see whether the affected
disk block currently resides in main memory; if not, it will be read in from the device. Then the
affected byte is replaced in the buffer and an entry is made in a list of blocks to be written. The return
from the write call may then take place, although the actual I/O may not be completed until a later time.
Conversely, if a single byte is read, the system determines whether the secondary storage block in which
the byte is located is already in one of the system’s buffers; if so, the byte can be returned immediately.
If not, the block is read into a buffer and the byte picked out.
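The single-byte write and read just described can be modeled in a few lines. The class below is a toy sketch in Python under simplifying assumptions: a dictionary stands in for the device, buffers are never evicted, and there is no read-ahead. None of the names are taken from the system itself.

```python
BLOCK = 512

class BufferCache:
    """Toy write-behind block cache; `device` maps block number -> bytes."""

    def __init__(self, device):
        self.device = device
        self.buffers = {}           # block number -> bytearray held in memory
        self.dirty = []             # list of blocks to be written later
        self.device_reads = 0
        self.device_writes = 0

    def _buffer(self, blockno):
        if blockno not in self.buffers:             # not resident: read it in
            self.device_reads += 1
            data = self.device.get(blockno, bytes(BLOCK))
            self.buffers[blockno] = bytearray(data)
        return self.buffers[blockno]

    def write_byte(self, offset, value):
        blockno, within = divmod(offset, BLOCK)
        self._buffer(blockno)[within] = value       # replace the byte in the buffer
        if blockno not in self.dirty:
            self.dirty.append(blockno)              # schedule the block for writing
        # The call returns here; the device has not yet been written.

    def read_byte(self, offset):
        blockno, within = divmod(offset, BLOCK)
        return self._buffer(blockno)[within]

    def flush(self):
        for blockno in self.dirty:                  # the deferred output
            self.device[blockno] = bytes(self.buffers[blockno])
            self.device_writes += 1
        self.dirty.clear()
```

Two single-byte writes to the same block cost one device read and, at flush time, one device write.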
The system recognizes when a program has made accesses to sequential blocks of a file, and asyn-
chronously pre-reads the next block. This significantly reduces the running time of most programs while
adding little to system overhead.
A program that reads or writes files in units of 512 bytes has an advantage over a program that
reads or writes a single byte at a time, but the gain is not immense; it comes mainly from the avoidance
of system overhead. If a program is used rarely or does no great volume of I/O, it may quite reasonably
read and write in units as small as it wishes.
The notion of the i-list is an unusual feature of UNIX. In practice, this method of organizing the
file system has proved quite reliable and easy to deal with. To the system itself, one of its strengths is
the fact that each file has a short, unambiguous name related in a simple way to the protection, address-
ing, and other information needed to access the file. It also permits a quite simple and rapid algorithm
for checking the consistency of a file system, for example, verification that the portions of each device
containing useful information and those free to be allocated are disjoint and together exhaust the space
on the device. This algorithm is independent of the directory hierarchy, because it need only scan the
linearly organized i-list. At the same time the notion of the i-list induces certain peculiarities not found
in other file system organizations. For example, there is the question of who is to be charged for the
space a file occupies, because all directory entries for a file have equal status. Charging the owner of a
file is unfair in general, for one user may create a file, another may link to it, and the first user may
delete the file. The first user is still the owner of the file, but it should be charged to the second user.
The simplest reasonably fair algorithm seems to be to spread the charges equally among users who have
links to a file. Many installations avoid the issue by not charging any fees at all.
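The consistency check mentioned earlier in this section, that used and free blocks are disjoint and together exhaust the device, needs only a linear pass over the i-list. A simplified sketch in Python follows; it assumes each i-node has already been expanded into a plain list of block numbers (indirect blocks are not modeled), and the function name and argument layout are ours.

```python
def check_blocks(ilist, free_list, device_blocks):
    """Return (blocks claimed more than once, blocks claimed by no one).

    ilist:         one list of block numbers per allocated i-node
    free_list:     block numbers recorded as free
    device_blocks: every allocatable block number on the device
    """
    seen = set()
    duplicated = set()
    for blocks in list(ilist) + [list(free_list)]:
        for b in blocks:
            if b in seen:
                duplicated.add(b)       # claimed twice: by two files, or file and free list
            seen.add(b)
    missing = set(device_blocks) - seen  # neither in a file nor free
    return duplicated, missing
```

A consistent file system yields two empty sets; note that the directory hierarchy is never consulted.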

V. PROCESSES AND IMAGES


An image is a computer execution environment. It includes a memory image, general register
values, status of open files, current directory and the like. An image is the current state of a pseudo-
computer.
A process is the execution of an image. While the processor is executing on behalf of a process,
the image must reside in main memory; during the execution of other processes it remains in main
memory unless the appearance of an active, higher-priority process forces it to be swapped out to the
disk.
The user-memory part of an image is divided into three logical segments. The program text seg-
ment begins at location 0 in the virtual address space. During execution, this segment is write-protected
and a single copy of it is shared among all processes executing the same program. At the first hardware
protection byte boundary above the program text segment in the virtual address space begins a non-
shared, writable data segment, the size of which may be extended by a system call. Starting at the
highest address in the virtual address space is a stack segment, which automatically grows downward as
the stack pointer fluctuates.

5.1 Processes
Except while the system is bootstrapping itself into operation, a new process can come into
existence only by use of the fork system call:
processid = fork ( )
When fork is executed, the process splits into two independently executing processes. The two
processes have independent copies of the original memory image, and share all open files. The new
processes differ only in that one is considered the parent process: in the parent, the returned processid
actually identifies the child process and is never 0, while in the child, the returned value is always 0.
Because the values returned by fork in the parent and child process are distinguishable, each pro-
cess may determine whether it is the parent or child.
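The way the two processes tell themselves apart can be shown with a short sketch. It is written in Python, whose os.fork wraps the system call on present-day systems; the function name fork_demo and the exit code 7 are arbitrary choices for illustration.

```python
import os

def fork_demo():
    """Fork once; return (child's process ID as seen by the parent,
    exit code reported for the child)."""
    pid = os.fork()
    if pid == 0:
        # Child: fork returned 0. Leave at once with a recognizable code.
        os._exit(7)
    # Parent: fork returned the child's process ID, which is never 0.
    reaped, status = os.waitpid(pid, 0)
    assert reaped == pid and os.WIFEXITED(status)
    return pid, os.WEXITSTATUS(status)
```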

5.2 Pipes
Processes may communicate with related processes using the same system read and write calls
that are used for file-system I/O. The call:
filep = pipe ( )
returns a file descriptor filep and creates an inter-process channel called a pipe. This channel, like other
open files, is passed from parent to child process in the image by the fork call. A read using a pipe file
descriptor waits until another process writes using the file descriptor for the same pipe. At this point,
data are passed between the images of the two processes. Neither process need know that a pipe, rather
than an ordinary file, is involved.
Although inter-process communication via pipes is a quite valuable tool (see Section 6.2), it is not
a completely general mechanism, because the pipe must be set up by a common ancestor of the
processes involved.
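A sketch of a parent and child communicating through a pipe follows, in Python. One difference from the call shown above should be noted: Python's os.pipe returns a pair of descriptors, one for reading and one for writing, rather than a single filep. The function name and message are illustrative.

```python
import os

def pipe_demo(message=b"hello through a pipe"):
    """Create a pipe, fork, and return what the parent reads from the child."""
    read_end, write_end = os.pipe()     # set up by the common ancestor, before the fork
    pid = os.fork()
    if pid == 0:
        os.close(read_end)              # child: the writer
        os.write(write_end, message)
        os._exit(0)
    os.close(write_end)                 # parent: the reader
    received = b""
    while True:
        chunk = os.read(read_end, 512)  # waits until data arrive or every writer has closed
        if not chunk:
            break
        received += chunk
    os.close(read_end)
    os.waitpid(pid, 0)
    return received
```

The read and write calls are the same ones used for ordinary files; only the origin of the descriptors differs.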

5.3 Execution of programs


Another major system primitive is invoked by
execute ( file, arg1, arg2, . . . , argn )
which requests the system to read in and execute the program named by file, passing it string arguments
arg1 , arg2 , . . . , argn. All the code and data in the process invoking execute is replaced from the file,
but open files, current directory, and inter-process relationships are unaltered. Only if the call fails, for
example because file could not be found or because its execute-permission bit was not set, does a return
take place from the execute primitive; it resembles a ‘‘jump’’ machine instruction rather than a subrou-
tine call.
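The "jump" character of the primitive shows up clearly in code: nothing after a successful call is ever reached. The sketch below uses Python's os.execv as a stand-in for execute; the exit code 127 for a failed call is our choice, as are the names PYTHON and run_and_wait, and by present-day convention the first element of argv repeats the program name.

```python
import os
import sys

PYTHON = sys.executable     # stands in for "file"; any executable file would do

def run_and_wait(path, argv):
    """Fork, execute `path` in the child, and return the child's exit code."""
    pid = os.fork()
    if pid == 0:
        try:
            os.execv(path, argv)    # on success this call never returns
        except OSError:
            os._exit(127)           # reached only if the call failed
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)
```

For example, run_and_wait(PYTHON, [PYTHON, "-c", "raise SystemExit(3)"]) reports 3, while a path that names no file reports 127.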

5.4 Process synchronization


Another process control system call:
processid = wait ( status )
causes its caller to suspend execution until one of its children has completed execution. Then wait
returns the processid of the terminated process. An error return is taken if the calling process has no
descendants. Certain status from the child process is also available.

5.5 Termination
Lastly:
exit ( status )
terminates a process, destroys its image, closes its open files, and generally obliterates it. The parent is
notified through the wait primitive, and status is made available to it. Processes may also terminate as
a result of various illegal actions or user-generated signals (Section VII below).
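The interplay of exit and wait can be sketched as follows, in Python. The sketch uses os.waitpid, which waits for one named child, rather than the plain wait described above, so that it cannot collect unrelated children of the calling program; the function name and exit codes are illustrative.

```python
import os

def wait_demo(codes=(0, 1, 42)):
    """Return the exit codes collected from one child per entry in `codes`,
    and whether a further wait, with nothing left to wait for, takes the
    error return."""
    pids = []
    for code in codes:
        pid = os.fork()
        if pid == 0:
            os._exit(code)                  # exit(status) in the child
        pids.append(pid)
    collected = []
    for pid in pids:
        _, status = os.waitpid(pid, 0)      # suspend until this child has terminated
        collected.append(os.WEXITSTATUS(status))
    try:
        os.waitpid(pids[-1], 0)             # already collected
        error_return = False
    except ChildProcessError:
        error_return = True
    return collected, error_return
```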

VI. THE SHELL


For most users, communication with the system is carried on with the aid of a program called the
shell. The shell is a command-line interpreter: it reads lines typed by the user and interprets them as
requests to execute other programs. (The shell is described fully elsewhere [9], so this section will discuss
only the theory of its operation.) In simplest form, a command line consists of the command name fol-
lowed by arguments to the command, all separated by spaces:
command arg1 arg2 . . . argn
The shell splits up the command name and the arguments into separate strings. Then a file with name
command is sought; command may be a path name including the ‘‘/’’ character to specify any file in
the system. If command is found, it is brought into memory and executed. The arguments collected by
the shell are accessible to the command. When the command is finished, the shell resumes its own exe-
cution, and indicates its readiness to accept another command by typing a prompt character.
If file command cannot be found, the shell generally prefixes a string such as /bin/ to command
and attempts again to find the file. Directory /bin contains commands intended to be generally used.
(The sequence of directories to be searched may be changed by user request.)

6.1 Standard I/O


The discussion of I/O in Section III above seems to imply that every file used by a program must
be opened or created by the program in order to get a file descriptor for the file. Programs executed by
the shell, however, start off with three open files with file descriptors 0, 1, and 2. As such a program
begins execution, file 1 is open for writing, and is best understood as the standard output file. Except
under circumstances indicated below, this file is the user’s terminal. Thus programs that wish to write
informative output ordinarily use file descriptor 1. Conversely, file 0 starts off open for reading,
and programs that wish to read messages typed by the user read this file.
The shell is able to change the standard assignments of these file descriptors from the user’s termi-
nal printer and keyboard. If one of the arguments to a command is prefixed by ‘‘>’’, file descriptor 1
will, for the duration of the command, refer to the file named after the ‘‘>’’. For example:
ls
ordinarily lists, on the typewriter, the names of the files in the current directory. The command:
ls >there
creates a file called there and places the listing there. Thus the argument >there means ‘‘place output
on there.’’ On the other hand:
ed
ordinarily enters the editor, which takes requests from the user via his keyboard. The command
ed <script
interprets script as a file of editor commands; thus <script means ‘‘take input from script.’’
Although the file name following ‘‘<’’ or ‘‘>’’ appears to be an argument to the command, in fact
it is interpreted completely by the shell and is not passed to the command at all. Thus no special coding
to handle I/O redirection is needed within each command; the command need merely use the standard
file descriptors 0 and 1 where appropriate.
File descriptor 2 is, like file 1, ordinarily associated with the terminal output stream. When an
output-diversion request with ‘‘>’’ is specified, file 2 remains attached to the terminal, so that commands
may produce diagnostic messages that do not silently end up in the output file.

6.2 Filters
An extension of the standard I/O notion is used to direct output from one command to the input of
another. A sequence of commands separated by vertical bars causes the shell to execute all the com-
mands simultaneously and to arrange that the standard output of each command be delivered to the stan-
dard input of the next command in the sequence. Thus in the command line:
ls | pr -2 | opr
ls lists the names of the files in the current directory; its output is passed to pr, which paginates its input
with dated headings. (The argument ‘‘-2’’ requests double-column output.) Likewise, the output from
pr is input to opr; this command spools its input onto a file for off-line printing.
This procedure could have been carried out more clumsily by:
ls >temp1
pr -2 <temp1 >temp2
opr <temp2
followed by removal of the temporary files. In the absence of the ability to redirect output and input, a
still clumsier method would have been to require the ls command to accept user requests to paginate its
output, to print in multi-column format, and to arrange that its output be delivered off-line. Actually it
would be surprising, and in fact unwise for efficiency reasons, to expect authors of commands such as ls
to provide such a wide variety of output options.
A program such as pr which copies its standard input to its standard output (with processing) is
called a filter. Some filters that we have found useful perform character transliteration, selection of lines
according to a pattern, sorting of the input, and encryption and decryption.
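A filter of the transliteration kind can be very small. The following sketch, in Python, maps lower case to upper case; the names UPPER, transliterate, and main are ours. The point to notice is that the program refers only to its standard input and standard output and never to a file name.

```python
import sys

UPPER = str.maketrans("abcdefghijklmnopqrstuvwxyz",
                      "ABCDEFGHIJKLMNOPQRSTUVWXYZ")

def transliterate(lines, table=UPPER):
    """Yield each input line with its characters mapped through `table`."""
    for line in lines:
        yield line.translate(table)

def main():
    # Standard input in, standard output out: files 0 and 1, whatever they are.
    sys.stdout.writelines(transliterate(sys.stdin))
```

A script that calls main() could then sit in the middle of a pipeline or have its input and output redirected, with no change to the code.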

6.3 Command separators; multitasking


Another feature provided by the shell is relatively straightforward. Commands need not be on dif-
ferent lines; instead they may be separated by semicolons:
ls; ed
will first list the contents of the current directory, then enter the editor.
A related feature is more interesting. If a command is followed by ‘‘&,’’ the shell will not wait
for the command to finish before prompting again; instead, it is ready immediately to accept a new com-
mand. For example:
as source >output &
causes source to be assembled, with diagnostic output going to output; no matter how long the assem-
bly takes, the shell returns immediately. When the shell does not wait for the completion of a com-
mand, the identification number of the process running that command is printed. This identification may
be used to wait for the completion of the command or to terminate it. The ‘‘&’’ may be used several
times in a line:
as source >output & ls >files &
does both the assembly and the listing in the background. In these examples, an output file other than
the terminal was provided; if this had not been done, the outputs of the various commands would have
been intermingled.
The shell also allows parentheses in the above operations. For example:
( date; ls ) >x &
writes the current date and time followed by a list of the current directory onto the file x. The shell also
returns immediately for another request.

6.4 The shell as a command; command files


The shell is itself a command, and may be called recursively. Suppose file tryout contains the
lines:
as source
mv a.out testprog
testprog
The mv command causes the file a.out to be renamed testprog. a.out is the (binary) output of the
assembler, ready to be executed. Thus if the three lines above were typed on the keyboard, source
would be assembled, the resulting program renamed testprog, and testprog executed. When the lines
are in tryout, the command:
sh <tryout
would cause the shell sh to execute the commands sequentially.
The shell has further capabilities, including the ability to substitute parameters and to construct
argument lists from a specified subset of the file names in a directory. It also provides general condi-
tional and looping constructions.

6.5 Implementation of the shell


The outline of the operation of the shell can now be understood. Most of the time, the shell is
waiting for the user to type a command. When the newline character ending the line is typed, the
shell’s read call returns. The shell analyzes the command line, putting the arguments in a form
appropriate for execute. Then fork is called. The child process, whose code of course is still that of
the shell, attempts to perform an execute with the appropriate arguments. If successful, this will bring
in and start execution of the program whose name was given. Meanwhile, the other process resulting
from the fork, which is the parent process, waits for the child process to die. When this happens, the
shell knows the command is finished, so it types its prompt and reads the keyboard to obtain another
command.
Given this framework, the implementation of background processes is trivial; whenever a com-
mand line contains ‘‘&,’’ the shell merely refrains from waiting for the process that it created to execute
the command.
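The handling of one command line can be sketched in Python as below. This is an outline under stated assumptions, not the shell's actual code: arguments are split on white space only, os.execvp (which searches a list of directories) stands in for execute together with the /bin search, and the exit code 127 for a command that cannot be executed is our choice.

```python
import os

def run_line(line, background=False):
    """Handle one command line; return the command's exit code, or the
    child's process ID if the command was left running in the background."""
    argv = line.split()                 # command name and arguments
    if not argv:
        return 0                        # empty line: nothing to do
    pid = os.fork()
    if pid == 0:
        try:
            os.execvp(argv[0], argv)    # child: still the shell's code until this succeeds
        except OSError:
            os._exit(127)               # command not found or not executable
    if background:
        return pid                      # "&": merely refrain from waiting
    _, status = os.waitpid(pid, 0)      # otherwise wait for the child to die
    return os.WEXITSTATUS(status)
```

The only difference between a foreground and a background command is whether the parent waits.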
Happily, all of this mechanism meshes very nicely with the notion of standard input and output
files. When a process is created by the fork primitive, it inherits not only the memory image of its
parent but also all the files currently open in its parent, including those with file descriptors 0, 1, and 2.
The shell, of course, uses these files to read command lines and to write its prompts and diagnostics, and
in the ordinary case its children—the command programs—inherit them automatically. When an argu-
ment with ‘‘<’’ or ‘‘>’’ is given, however, the offspring process, just before it performs execute, makes
the standard I/O file descriptor (0 or 1, respectively) refer to the named file. This is easy because, by
agreement, the smallest unused file descriptor is assigned when a new file is opened (or created); it is
only necessary to close file 0 (or 1) and open the named file. Because the process in which the com-
mand program runs simply terminates when it is through, the association between a file specified after
‘‘<’’ or ‘‘>’’ and file descriptor 0 or 1 is ended automatically when the process dies. Therefore the shell
need not know the actual names of the files that are its own standard input and output, because it need
never reopen them.
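The close-then-open step for ‘‘>’’ can be sketched as follows, in Python. Two details are specific to the sketch rather than to the shell described here: Python marks newly opened descriptors as not inherited across an exec, so the sketch must undo that, and it falls back on dup2 in the unusual case that descriptor 0 was also closed. The function name and the exit code 127 are ours.

```python
import os

def run_redirected(argv, stdout_path):
    """Run argv with file descriptor 1 referring to stdout_path, as for
    `command >stdout_path`; return the command's exit code."""
    pid = os.fork()
    if pid == 0:
        try:
            os.close(1)                     # free descriptor 1 ...
            fd = os.open(stdout_path,
                         os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
            # ... so the smallest unused descriptor, handed to this open, is 1.
            if fd != 1:                     # only if descriptor 0 was closed as well
                os.dup2(fd, 1)
                os.close(fd)
            os.set_inheritable(1, True)     # Python detail: keep it open across the exec
            os.execvp(argv[0], argv)        # the command never sees the file name
        except OSError:
            pass
        os._exit(127)
    _, status = os.waitpid(pid, 0)          # the parent's own descriptor 1 is untouched
    return os.WEXITSTATUS(status)
```

Because the rearrangement happens in the child, the parent has nothing to restore afterwards.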
Filters are straightforward extensions of standard I/O redirection with pipes used instead of files.
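A two-command pipeline can be sketched along the same lines, in Python. The sketch uses dup2 to attach the pipe to descriptors 0 and 1, which is a present-day convenience and not necessarily how the shell described here does it; the function name is ours, and descriptors 0 and 1 are assumed to be open already in the calling process.

```python
import os

def pipeline(first_argv, second_argv):
    """Run `first | second`; return the exit code of the second command."""
    read_end, write_end = os.pipe()
    producer = os.fork()
    if producer == 0:
        os.dup2(write_end, 1)           # standard output now refers to the pipe
        os.close(read_end)
        os.close(write_end)
        try:
            os.execvp(first_argv[0], first_argv)
        except OSError:
            os._exit(127)
    consumer = os.fork()
    if consumer == 0:
        os.dup2(read_end, 0)            # standard input now refers to the pipe
        os.close(read_end)
        os.close(write_end)
        try:
            os.execvp(second_argv[0], second_argv)
        except OSError:
            os._exit(127)
    os.close(read_end)                  # the parent keeps neither end, so the
    os.close(write_end)                 # consumer sees end-of-file in due course
    os.waitpid(producer, 0)
    _, status = os.waitpid(consumer, 0)
    return os.WEXITSTATUS(status)
```

Both commands run simultaneously; neither knows that its standard file is a pipe.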
In ordinary circumstances, the main loop of the shell never terminates. (The main loop includes
the branch of the return from fork belonging to the parent process; that is, the branch that does a wait,
then reads another command line.) The one thing that causes the shell to terminate is discovering an
end-of-file condition on its input file. Thus, when the shell is executed as a command with a given
input file, as in:
sh <comfile
the commands in comfile will be executed until the end of comfile is reached; then the instance of the
shell invoked by sh will terminate. Because this shell process is the child of another instance of the
shell, the wait executed in the latter will return, and another command may then be processed.

6.6 Initialization
The instances of the shell to which users type commands are themselves children of another pro-
cess. The last step in the initialization of the system is the creation of a single process and the invoca-
tion (via execute) of a program called init. The role of init is to create one process for each terminal
channel. The various subinstances of init open the appropriate terminals for input and output on files 0,
1, and 2, waiting, if necessary, for carrier to be established on dial-up lines. Then a message is typed
out requesting that the user log in. When the user types a name or other identification, the appropriate
instance of init wakes up, receives the log-in line, and reads a password file. If the user’s name is
found, and if he is able to supply the correct password, init changes to the user’s default current direc-
tory, sets the process’s user ID to that of the person logging in, and performs an execute of the shell. At
this point, the shell is ready to receive commands and the logging-in protocol is complete.
Meanwhile, the mainstream path of init (the parent of all the subinstances of itself that will later
become shells) does a wait. If one of the child processes terminates, either because a shell found an end
of file or because a user typed an incorrect name or password, this path of init simply recreates the
defunct process, which in turn reopens the appropriate input and output files and types another log-in
message. Thus a user may log out simply by typing the end-of-file sequence to the shell.

6.7 Other programs as shell


The shell as described above is designed to allow users full access to the facilities of the system,
because it will invoke the execution of any program with appropriate protection mode. Sometimes,
however, a different interface to the system is desirable, and this feature is easily arranged for.
Recall that after a user has successfully logged in by supplying a name and password, init ordi-
narily invokes the shell to interpret command lines. The user’s entry in the password file may contain
the name of a program to be invoked after log-in instead of the shell. This program is free to interpret
the user’s messages in any way it wishes.
For example, the password file entries for users of a secretarial editing system might specify that
the editor ed is to be used instead of the shell. Thus when users of the editing system log in, they are
inside the editor and can begin work immediately; also, they can be prevented from invoking programs
not intended for their use. In practice, it has proved desirable to allow a temporary escape from the edi-
tor to execute the formatting program and other utilities.
Several of the games (e.g., chess, blackjack, 3D tic-tac-toe) available on the system illustrate a
much more severely restricted environment. For each of these, an entry exists in the password file
specifying that the appropriate game-playing program is to be invoked instead of the shell. People who
log in as a player of one of these games find themselves limited to the game and unable to investigate
the (presumably more interesting) offerings of the UNIX system as a whole.

VII. TRAPS
The PDP-11 hardware detects a number of program faults, such as references to non-existent
memory, unimplemented instructions, and odd addresses used where an even address is required. Such
faults cause the processor to trap to a system routine. Unless other arrangements have been made, an
illegal action causes the system to terminate the process and to write its image on file core in the current
directory. A debugger can be used to determine the state of the program at the time of the fault.
Programs that are looping, that produce unwanted output, or about which the user has second
thoughts may be halted by the use of the interrupt signal, which is generated by typing the ‘‘delete’’
character. Unless special action has been taken, this signal simply causes the program to cease execu-
tion without producing a core file. There is also a quit signal used to force an image file to be pro-
duced. Thus programs that loop unexpectedly may be halted and the remains inspected without prear-
rangement.
The hardware-generated faults and the interrupt and quit signals can, by request, be either ignored
or caught by a process. For example, the shell ignores quits to prevent a quit from logging the user out.
The editor catches interrupts and returns to its command level. This is useful for stopping long printouts
without losing work in progress (the editor manipulates a copy of the file it is editing). In systems
without floating-point hardware, unimplemented instructions are caught and floating-point instructions
are interpreted.
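The three possible treatments of the interrupt signal (default action, ignored, caught) can be compared with a short sketch in Python. Each child arranges its own disposition and then sends the signal to itself, which avoids any timing dependence on the parent; the function name and the exit code 42 are arbitrary, and the quit signal with its core file is not exercised.

```python
import os
import signal
import time

def interrupt_self(disposition):
    """Fork a child that sets its interrupt disposition, interrupts itself,
    and report how the child ended."""
    pid = os.fork()
    if pid == 0:
        if disposition == "ignore":
            signal.signal(signal.SIGINT, signal.SIG_IGN)
        elif disposition == "catch":
            signal.signal(signal.SIGINT, lambda signum, frame: os._exit(42))
        else:
            signal.signal(signal.SIGINT, signal.SIG_DFL)    # no special action
        os.kill(os.getpid(), signal.SIGINT)
        time.sleep(0.2)             # leave time for a caught signal's handler to run
        os._exit(0)                 # reached only if the interrupt was ignored
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return "killed by signal %d" % os.WTERMSIG(status)
    return "exited with code %d" % os.WEXITSTATUS(status)
```

With no special action the child simply ceases execution; when the signal is ignored it carries on; when caught, control passes to the routine the program named.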

VIII. PERSPECTIVE
Perhaps paradoxically, the success of the UNIX system is largely due to the fact that it was not
designed to meet any predefined objectives. The first version was written when one of us (Thompson),
dissatisfied with the available computer facilities, discovered a little-used PDP-7 and set out to create a
more hospitable environment. This (essentially personal) effort was sufficiently successful to gain the
interest of the other author and several colleagues, and later to justify the acquisition of the PDP-11/20,
specifically to support a text editing and formatting system. When in turn the 11/20 was outgrown, the
system had proved useful enough to persuade management to invest in the PDP-11/45, and later in the
PDP-11/70 and Interdata 8/32 machines, upon which it developed to its present form. Our goals
throughout the effort, when articulated at all, have always been to build a comfortable relationship with
the machine and to explore ideas and inventions in operating systems and other software. We have not
been faced with the need to satisfy someone else’s requirements, and for this freedom we are grateful.
Three considerations that influenced the design of UNIX are visible in retrospect.
First: because we are programmers, we naturally designed the system to make it easy to write,
test, and run programs. The most important expression of our desire for programming convenience was
that the system was arranged for interactive use, even though the original version only supported one
user. We believe that a properly designed interactive system is much more productive and satisfying to
use than a ‘‘batch’’ system. Moreover, such a system is rather easily adaptable to noninteractive use,
while the converse is not true.
Second: there have always been fairly severe size constraints on the system and its software.
Given the partially antagonistic desires for reasonable efficiency and expressive power, the size con-
straint has encouraged not only economy, but also a certain elegance of design. This may be a thinly
disguised version of the ‘‘salvation through suffering’’ philosophy, but in our case it worked.
Third: nearly from the start, the system was able to, and did, maintain itself. This fact is more
important than it might seem. If designers of a system are forced to use that system, they quickly
become aware of its functional and superficial deficiencies and are strongly motivated to correct them
before it is too late. Because all source programs were always available and easily modified on-line, we
were willing to revise and rewrite the system and its software when new ideas were invented,
discovered, or suggested by others.
The aspects of UNIX discussed in this paper exhibit clearly at least the first two of these design
considerations. The interface to the file system, for example, is extremely convenient from a program-
ming standpoint. The lowest possible interface level is designed to eliminate distinctions between the
various devices and files and between direct and sequential access. No large ‘‘access method’’ routines
are required to insulate the programmer from the system calls; in fact, all user programs either call the
system directly or use a small library program, less than a page long, that buffers a number of characters
and reads or writes them all at once.
Another important aspect of programming convenience is that there are no ‘‘control blocks’’ with
a complicated structure partially maintained by and depended on by the file system or other system calls.
Generally speaking, the contents of a program’s address space are the property of the program, and we
have tried to avoid placing restrictions on the data structures within that address space.
Given the requirement that all programs should be usable with any file or device as input or out-
put, it is also desirable to push device-dependent considerations into the operating system itself. The
only alternatives seem to be to load, with all programs, routines for dealing with each device, which is
expensive in space, or to depend on some means of dynamically linking to the routine appropriate to
each device when it is actually needed, which is expensive either in overhead or in hardware.
Likewise, the process-control scheme and the command interface have proved both convenient and
efficient. Because the shell operates as an ordinary, swappable user program, it consumes no ‘‘wired-
down’’ space in the system proper, and it may be made as powerful as desired at little cost. In particu-
lar, given the framework in which the shell executes as a process that spawns other processes to perform
commands, the notions of I/O redirection, background processes, command files, and user-selectable sys-
tem interfaces all become essentially trivial to implement.

Influences
The success of UNIX lies not so much in new inventions but rather in the full exploitation of a
carefully selected set of fertile ideas, and especially in showing that they can be keys to the implementa-
tion of a small yet powerful operating system.
The fork operation, essentially as we implemented it, was present in the GENIE time-sharing
system [10]. On a number of points we were influenced by Multics, which suggested the particular
form of the I/O system calls [11] and both the name of the shell and its general functions. The notion
that the shell should create a process for each command was also suggested to us by the early design
of Multics, although in that system it was later dropped for efficiency reasons. A similar scheme is
used by TENEX [12].

IX. STATISTICS
The following numbers are presented to suggest the scale of the Research UNIX operation. Those
of our users not involved in document preparation tend to use the system for program development,
especially language work. There are few important ‘‘applications’’ programs.
Overall, we have today:

125 user population
33 maximum simultaneous users
1,630 directories
28,300 files
301,700 512-byte secondary storage blocks used

There is a ‘‘background’’ process that runs at the lowest possible priority; it is used to soak up any idle
CPU time. It has been used to produce a million-digit approximation to the constant e, and other semi-
infinite problems. Not counting this background work, we average daily:

13,500 commands
9.6 CPU hours
230 connect hours
62 different users
240 log-ins

X. ACKNOWLEDGMENTS
The contributors to UNIX are, in the traditional but here especially apposite phrase, too numerous
to mention. Certainly, collective salutes are due to our colleagues in the Computing Science Research
Center. R. H. Canaday contributed much to the basic design of the file system. We are particularly
appreciative of the inventiveness, thoughtful criticism, and constant support of R. Morris, M. D. McIl-
roy, and J. F. Ossanna.

References
1. L. P. Deutsch and B. W. Lampson, ‘‘An online editor,’’ Comm. Assoc. Comp. Mach. 10(12),
pp.793-799, 803 (December 1967).
2. B. W. Kernighan and L. L. Cherry, ‘‘A System for Typesetting Mathematics,’’ Comm. Assoc.
Comp. Mach. 18, pp.151-157 (March 1975).
3. B. W. Kernighan, M. E. Lesk, and J. F. Ossanna, ‘‘UNIX Time-Sharing System: Document
Preparation,’’ Bell Sys. Tech. J. 57(6), pp.2115-2135 (1978).
4. T. A. Dolotta and J. R. Mashey, ‘‘An Introduction to the Programmer’s Workbench,’’ Proc. 2nd
Int. Conf. on Software Engineering, pp.164-168 (October 13-15, 1976).
5. T. A. Dolotta, R. C. Haight, and J. R. Mashey, ‘‘UNIX Time-Sharing System: The Programmer’s
Workbench,’’ Bell Sys. Tech. J. 57(6), pp.2177-2200 (1978).
6. H. Lycklama, ‘‘UNIX Time-Sharing System: UNIX on a Microprocessor,’’ Bell Sys. Tech. J. 57(6),
pp.2087-2101 (1978).
7. B. W. Kernighan and D. M. Ritchie, The C Programming Language, Prentice-Hall, Englewood
Cliffs, New Jersey (1978).
8. Aleph-null, ‘‘Computer Recreations,’’ Software Practice and Experience 1(2), pp.201-204 (April-
June 1971).
9. S. R. Bourne, ‘‘UNIX Time-Sharing System: The UNIX Shell,’’ Bell Sys. Tech. J. 57(6), pp.1971-
1990 (1978).
10. L. P. Deutsch and B. W. Lampson, ‘‘SDS 930 time-sharing system preliminary reference manual,’’
Doc. 30.10.10, Project GENIE, Univ. Cal. at Berkeley (April 1965).
11. R. J. Feiertag and E. I. Organick, ‘‘The Multics input-output system,’’ Proc. Third Symposium on
Operating Systems Principles, pp.35-41 (October 18-20, 1971).
12. D. G. Bobrow, J. D. Burchfiel, D. L. Murphy, and R. S. Tomlinson, ‘‘TENEX, a Paged Time Shar-
ing System for the PDP-10,’’ Comm. Assoc. Comp. Mach. 15(3), pp.135-143 (March 1972).
UNIX For Beginners — Second Edition

Brian W. Kernighan
Bell Laboratories
Murray Hill, New Jersey 07974

ABSTRACT

This paper is meant to help new users get started on the UNIX† operating system. It
includes:
• basics needed for day-to-day use of the system — typing commands, correcting
typing mistakes, logging in and out, mail, inter-terminal communication, the file
system, printing files, redirecting I/O, pipes, and the shell.
• document preparation — a brief discussion of the major formatting programs and
macro packages, hints on preparing documents, and capsule descriptions of some
supporting software.
• UNIX programming — using the editor, programming the shell, programming in C,
other languages and tools.
• An annotated UNIX bibliography.

October 2, 1978

_______________
†UNIX is a Trademark of Bell Laboratories.

INTRODUCTION
From the user’s point of view, the UNIX operating system is easy to learn and use, and presents few
of the usual impediments to getting the job done. It is hard, however, for the beginner to know where
to start, and how to make the best use of the facilities available. The purpose of this introduction is
to help new users get used to the main ideas of the UNIX system and start making effective use of it
quickly.

You should have a couple of other documents with you for easy reference as you read this one. The
most important is The UNIX Programmer’s Manual; it’s often easier to tell you to read about something
in the manual than to repeat its contents here. The other useful document is A Tutorial Introduction to
the UNIX Text Editor, which will tell you how to use the editor to get text — programs, data,
documents — into the computer.

A word of warning: the UNIX system has become quite popular, and there are several major variants
in widespread use. Of course details also change with time. So although the basic structure of UNIX and
how to use it is common to all versions, there will certainly be a few things which are different on
your system from what is described here. We have tried to minimize the problem, but be aware of it. In
cases of doubt, this paper describes Version 7 UNIX.

This paper has five sections:

1.  Getting Started: How to log in, how to type, what to do about mistakes in typing, how to log
    out. Some of this is dependent on which system you log into (phone numbers, for example) and
    what terminal you use, so this section must necessarily be supplemented by local information.
2.  Day-to-day Use: Things you need every day to use the system effectively: generally useful
    commands; the file system.
3.  Document Preparation: Preparing manuscripts is one of the most common uses for UNIX systems.
    This section contains advice, but not extensive instructions on any of the formatting tools.
4.  Writing Programs: UNIX is an excellent system for developing programs. This section talks
    about some of the tools, but again is not a tutorial in any of the programming languages
    provided by the system.
5.  A UNIX Reading List. An annotated bibliography of documents that new users should be aware of.

I. GETTING STARTED

Logging In
You must have a UNIX login name, which you can get from whoever administers your system. You
also need to know the phone number, unless your system uses permanently connected terminals. The
UNIX system is capable of dealing with a wide variety of terminals: Terminet 300’s; Execuport, TI
and similar portables; video (CRT) terminals like the HP2640, etc.; high-priced graphics terminals
like the Tektronix 4014; plotting terminals like those from GSI and DASI; and even the venerable
Teletype in its various forms. But note: UNIX is strongly oriented towards devices with lower case.
If your terminal produces only upper case (e.g., model 33 Teletype, some video and portable
terminals), life will be so difficult that you should look for another terminal.

Be sure to set the switches appropriately on your device. Switches that might need to be adjusted
include the speed, upper/lower case mode, full duplex, even parity, and any others that local wisdom
advises. Establish a connection using whatever magic is needed for your terminal; this may involve
dialing a telephone call or merely flipping a switch. In either case, UNIX should type ‘‘login:’’ at
you. If it types garbage, you may be at the wrong speed; check the switches. If that fails, push the
‘‘break’’ or ‘‘interrupt’’ key a few times, slowly. If that fails to produce a login message, consult
a guru.

When you get a login: message, type your login name in lower case. Follow it by a RETURN; the
system will not do anything until you type a RETURN. If a password is required, you will be asked for
it, and (if possible) printing will be turned off while you type it. Don’t forget RETURN.

The culmination of your login efforts is a ‘‘prompt character,’’ a single character that indicates
that the system is ready to accept commands from you. The prompt character is usually a dollar sign $
or a percent sign %. (You may also get a message of the day just before the prompt character, or a
notification that you have mail.)

Typing Commands
Once you’ve seen the prompt character, you can type commands, which are requests that the system
do something. Try typing

    date

followed by RETURN. You should get back something like

    Mon Jan 16 14:17:10 EST 1978

Don’t forget the RETURN after the command, or nothing will happen. If you think you’re being ignored,
type a RETURN; something should happen. RETURN won’t be mentioned again, but don’t forget it — it
has to be there at the end of each line.

Another command you might try is who, which tells you everyone who is currently logged in:

    who

gives something like

    mb      tty01   Jan 16 09:11
    ski     tty05   Jan 16 09:33
    gam     tty11   Jan 16 13:07

The time is when the user logged in; ‘‘ttyxx’’ is the system’s idea of what terminal the user is on.

If you make a mistake typing the command name, and refer to a non-existent command, you will
be told. For example, if you type

    whom

you will be told

    whom: not found

Of course, if you inadvertently type the name of some other command, it will run, with more or less
mysterious results.

Strange Terminal Behavior
Sometimes you can get into a state where your terminal acts strangely. For example, each letter may
be typed twice, or the RETURN may not cause a line feed or a return to the left margin. You can often
fix this by logging out and logging back in. Or you can read the description of the command stty in
section I of the manual. To get intelligent treatment of tab characters (which are much used in UNIX)
if your terminal doesn’t have tabs, type the command

    stty -tabs

and the system will convert each tab into the right number of blanks for you. If your terminal does
have computer-settable tabs, the command tabs will set the stops correctly for you.

Mistakes in Typing
If you make a typing mistake, and see it before RETURN has been typed, there are two ways to
recover. The sharp-character # erases the last character typed; in fact successive uses of # erase
characters back to the beginning of the line (but not beyond). So if you type badly, you can correct
as you go:

    dd#atte##e

is the same as date.

The at-sign @ erases all of the characters typed so far on the current input line, so if the line is
irretrievably fouled up, type an @ and start the line over.

What if you must enter a sharp or at-sign as part of the text? If you precede either # or @ by a
backslash \, it loses its erase meaning. So to enter a sharp or at-sign in something, type \# or \@.
The system will always echo a newline at you after your at-sign, even if preceded by a backslash.
Don’t worry — the at-sign has been recorded.

To erase a backslash, you have to type two sharps or two at-signs, as in \##. The backslash is
used extensively in UNIX to indicate that the following character is in some way special.

Read-ahead
UNIX has full read-ahead, which means that you can type as fast as you want, whenever you want,
even when some command is typing at you. If you type during output, your input characters will appear
intermixed with the output characters, but they will be stored away and interpreted in the correct
order. So you can type several commands one after another without waiting for the first to finish or
even begin.

Stopping a Program
You can stop most programs by typing the character ‘‘DEL’’ (perhaps called ‘‘delete’’ or ‘‘rubout’’
on your terminal). The ‘‘interrupt’’ or ‘‘break’’ key found on most terminals can also be used. In a
few programs, like the text editor, DEL stops whatever the program is doing but leaves you in that
program. Hanging up the phone will stop most programs.

Logging Out
The easiest way to log out is to hang up the phone. You can also type

    login

and let someone else use the terminal you were on. It is usually not sufficient just to turn off the
terminal. Most UNIX systems do not use a time-out mechanism,
so you’ll be there forever unless you hang up.

Mail
When you log in, you may sometimes get the message

    You have mail.

UNIX provides a postal system so you can communicate with other users of the system. To read your
mail, type the command

    mail

Your mail will be printed, one message at a time, most recent message first. After each message, mail
waits for you to say what to do with it. The two basic responses are d, which deletes the message, and
RETURN, which does not (so it will still be there the next time you read your mailbox). Other responses
are described in the manual. (Earlier versions of mail do not process one message at a time, but are
otherwise similar.)

How do you send mail to someone else? Suppose it is to go to ‘‘joe’’ (assuming ‘‘joe’’ is
someone’s login name). The easiest way is this:

    mail joe
    now type in the text of the letter
    on as many lines as you like ...
    After the last line of the letter
    type the character ‘‘control-d’’,
    that is, hold down ‘‘control’’ and type
    a letter ‘‘d’’.

And that’s it. The ‘‘control-d’’ sequence, often called ‘‘EOF’’ for end-of-file, is used throughout
the system to mark the end of input from a terminal, so you might as well get used to it.

For practice, send mail to yourself. (This isn’t as strange as it might sound — mail to oneself is a
handy reminder mechanism.)

There are other ways to send mail — you can send a previously prepared letter, and you can mail to
a number of people all at once. For more details see mail(1). (The notation mail(1) means the command
mail in section 1 of the UNIX Programmer’s Manual.)

Writing to other users
At some point, out of the blue will come a message like

    Message from joe tty07...

accompanied by a startling beep. It means that Joe wants to talk to you, but unless you take explicit
action you won’t be able to talk back. To respond, type the command

    write joe

This establishes a two-way communication path. Now whatever Joe types on his terminal will appear
on yours and vice versa. The path is slow, rather like talking to the moon. (If you are in the middle
of something, you have to get to a state where you can type a command. Normally, whatever program you
are running has to terminate or be terminated. If you’re editing, you can escape temporarily from the
editor — read the editor tutorial.)

A protocol is needed to keep what you type from getting garbled up with what Joe types. Typically
it’s like this:

    Joe types write smith and waits.

    Smith types write joe and waits.

    Joe now types his message (as many lines as he likes). When he’s ready for a reply, he
    signals it by typing (o), which stands for ‘‘over’’.

    Now Smith types a reply, also terminated by (o).

    This cycle repeats until someone gets tired; he then signals his intent to quit with (oo),
    for ‘‘over and out’’.

    To terminate the conversation, each side must type a ‘‘control-d’’ character alone on a line.
    (‘‘Delete’’ also works.) When the other person types his ‘‘control-d’’, you will get the
    message EOF on your terminal.

If you write to someone who isn’t logged in, or who doesn’t want to be disturbed, you’ll be told. If
the target is logged in but doesn’t answer after a decent interval, simply type ‘‘control-d’’.

On-line Manual
The UNIX Programmer’s Manual is typically kept on-line. If you get stuck on something, and
can’t find an expert to assist you, you can print on your terminal some manual section that might
help. This is also useful for getting the most up-to-date information on a command. To print a manual
section, type ‘‘man command-name’’. Thus to read up on the who command, type

    man who

and, of course,

    man man

tells all about the man command.

Computer Aided Instruction
Your UNIX system may have available a program called learn, which provides computer aided
instruction on the file system and basic commands, the editor, document preparation, and even C
programming. Try typing the command

    learn

If learn exists on your system, it will tell you what to do from there.
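
The erase and kill characters described under ‘‘Mistakes in Typing’’ can be modelled with a short
script. What follows is only a sketch, assuming a present-day POSIX shell and sed rather than Version 7.
The function name tty_edit is hypothetical, the backslash escapes (\# and \@) are deliberately left
out, and on a real system this processing is done by the terminal handling code, not by any command.

```shell
# tty_edit: hypothetical helper that applies the erase (#) and kill (@)
# conventions to one line of typed text. Backslash escapes are not modelled.
tty_edit() {
    printf '%s\n' "$1" |
    sed -e 's/.*@//' \
        -e ':a' -e 's/[^#]#//' -e 'ta' \
        -e 's/^#*//'
}

tty_edit 'dd#atte##e'    # the example from the text
tty_edit 'dtae@date'     # a fouled-up line killed with @ and retyped
```

The first sed expression throws away everything up to the last @, the loop removes one character
together with the # that erases it until no such pair is left, and the last expression drops any #
typed at the start of the line, where there is nothing left to erase. Both calls print date.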

II. DAY-TO-DAY USE

Creating Files — The Editor
If you have to type a paper or a letter or a program, how do you get the information stored in the
machine? Most of these tasks are done with the UNIX ‘‘text editor’’ ed. Since ed is thoroughly
documented in ed(1) and explained in A Tutorial Introduction to the UNIX Text Editor, we won’t spend
any time here describing how to use it. All we want it for right now is to make some files. (A file is
just a collection of information stored in the machine, a simplistic but adequate definition.)

To create a file called junk with some text in it, do the following:

    ed junk     (invokes the text editor)
    a           (command to ‘‘ed’’, to add text)
    now type in
    whatever text you want ...
    .           (signals the end of adding text)

The ‘‘.’’ that signals the end of adding text must be at the beginning of a line by itself. Don’t
forget it, for until it is typed, no other ed commands will be recognized — everything you type will
be treated as text to be added.

At this point you can do various editing operations on the text you typed in, such as correcting
spelling mistakes, rearranging paragraphs and the like. Finally, you must write the information you
have typed into a file with the editor command w:

    w

ed will respond with the number of characters it wrote into the file junk.

Until the w command, nothing is stored permanently, so if you hang up and go home the information
is lost.† But after w the information is there permanently; you can re-access it any time by typing

    ed junk

_____________________
† This is not strictly true — if you hang up while editing, the data you were working on is saved in
a file called ed.hup, which you can continue with at your next session.

Type a q command to quit the editor. (If you try to quit without writing, ed will print a ? to remind
you. A second q gets you out regardless.)

Now create a second file called temp in the same manner. You should now have two files, junk
and temp.

What files are out there?
The ls (for ‘‘list’’) command lists the names (not contents) of any of the files that UNIX knows
about. If you type

    ls

the response will be

    junk
    temp

which are indeed the two files just created. The names are sorted into alphabetical order
automatically, but other variations are possible. For example, the command

    ls -t

causes the files to be listed in the order in which they were last changed, most recent first. The -l
option gives a ‘‘long’’ listing:

    ls -l

will produce something like

    -rw-rw-rw- 1 bwk 41 Jul 22 2:56 junk
    -rw-rw-rw- 1 bwk 78 Jul 22 2:57 temp

The date and time are of the last change to the file. The 41 and 78 are the number of characters
(which should agree with the numbers you got from ed). bwk is the owner of the file, that is, the
person who created it. The -rw-rw-rw- tells who has permission to read and write the file, in this
case everyone.

Options can be combined: ls -lt gives the same thing as ls -l, but sorted into time order. You can
also name the files you’re interested in, and ls will list the information about them only. More
details can be found in ls(1).

The use of optional arguments that begin with a minus sign, like -t and -lt, is a common convention
for UNIX programs. In general, if a program accepts such optional arguments, they precede any filename
arguments. It is also vital that you separate the various arguments with spaces: ls-l is not the same
as ls -l.

Printing Files
Now that you’ve got a file of text, how do you print it so people can look at it? There are a host of
programs that do that, probably more than are needed.

One simple thing is to use the editor, since printing is often done just before making changes
anyway. You can say

    ed junk
    1,$p

ed will reply with the count of the characters in junk and then print all the lines in the file.
After you learn how to use the editor, you can be selective about the parts you print.

There are times when it’s not feasible to use the editor for printing. For example, there is a limit
on how big a file ed can handle (several thousand lines). Secondly, it will only print one file at a
time, and
sometimes you want to print several, one after another. So here are a couple of alternatives.

First is cat, the simplest of all the printing programs. cat simply prints on the terminal the
contents of all the files named in a list. Thus

    cat junk

prints one file, and

    cat junk temp

prints two. The files are simply concatenated (hence the name ‘‘cat’’) onto the terminal.

pr produces formatted printouts of files. As with cat, pr prints all the files named in a list. The
difference is that it produces headings with date, time, page number and file name at the top of each
page, and extra lines to skip over the fold in the paper. Thus,

    pr junk temp

will print junk neatly, then skip to the top of a new page and print temp neatly.

pr can also produce multi-column output:

    pr -3 junk

prints junk in 3-column format. You can use any reasonable number in place of ‘‘3’’ and pr will do its
best. pr has other capabilities as well; see pr(1).

It should be noted that pr is not a formatting program in the sense of shuffling lines around and
justifying margins. The true formatters are nroff and troff, which we will get to in the section on
document preparation.

There are also programs that print files on a high-speed printer. Look in your manual under opr
and lpr. Which to use depends on what equipment is attached to your machine.

Shuffling Files About
Now that you have some files in the file system and some experience in printing them, you can try
bigger things. For example, you can move a file from one place to another (which amounts to giving it
a new name), like this:

    mv junk precious

This means that what used to be ‘‘junk’’ is now ‘‘precious’’. If you do an ls command now, you will
get

    precious
    temp

Beware that if you move a file to another one that already exists, the already existing contents are
lost forever.

If you want to make a copy of a file (that is, to have two versions of something), you can use the cp
command:

    cp precious temp1

makes a duplicate copy of precious in temp1.

Finally, when you get tired of creating and moving files, there is a command to remove files from the
file system, called rm.

    rm temp temp1

will remove both of the files named.

You will get a warning message if one of the named files wasn’t there, but otherwise rm, like most
UNIX commands, does its work silently. There is no prompting or chatter, and error messages are
occasionally curt. This terseness is sometimes disconcerting to newcomers, but experienced users find
it desirable.

What’s in a Filename
So far we have used filenames without ever saying what’s a legal name, so it’s time for a couple of
rules. First, filenames are limited to 14 characters, which is enough to be descriptive. Second,
although you can use almost any character in a filename, common sense says you should stick to ones
that are visible, and that you should probably avoid characters that might be used with other
meanings. We have already seen, for example, that in the ls command, ls -t means to list in time
order. So if you had a file whose name was -t, you would have a tough time listing it by name. Besides
the minus sign, there are other characters which have special meaning. To avoid pitfalls, you would do
well to use only letters, numbers and the period until you’re familiar with the situation.

On to some more positive suggestions. Suppose you’re typing a large document like a book.
Logically this divides into many small pieces, like chapters and perhaps sections. Physically it must
be divided too, for ed will not handle really big files. Thus you should type the document as a number
of files. You might have a separate file for each chapter, called

    chap1
    chap2
    etc...

Or, if each chapter were broken into several files, you might have

    chap1.1
    chap1.2
    chap1.3
    ...
    chap2.1
    chap2.2
    ...

You can now tell at a glance where a particular file fits into the whole.
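
The mv, cp and rm steps described under ‘‘Shuffling Files About’’ can be rehearsed without risk in a
scratch directory. The following is a minimal sketch assuming a present-day POSIX shell. The use of
mktemp, the echo redirections and the file contents are illustrative assumptions and not part of the
walkthrough above, which creates its files with ed.

```shell
scratch=$(mktemp -d)          # throwaway directory (assumption: mktemp exists)
cd "$scratch" || exit 1

echo 'some text' >junk        # stand-ins for the files made with ed
echo 'more text' >temp

mv junk precious              # junk is now called precious
cp precious temp1             # temp1 is a second copy of precious
after_copy=$(ls)              # names at this point, one per line

rm temp temp1                 # silent when it succeeds
after_rm=$(ls)                # names that survive the rm

echo "$after_copy"
echo "$after_rm"
```

Working in a fresh directory matters here: mv and cp overwrite an existing target without asking, and
rm does not ask either, so a mistake made in your own directory cannot be undone.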

There are advantages to a systematic naming convention which are not obvious to the novice UNIX
user. What if you wanted to print the whole book? You could say

    pr chap1.1 chap1.2 chap1.3 ......

but you would get tired pretty fast, and would probably even make mistakes. Fortunately, there is a
shortcut. You can say

    pr chap*

The * means ‘‘anything at all,’’ so this translates into ‘‘print all files whose names begin with
chap’’, listed in alphabetical order.

This shorthand notation is not a property of the pr command, by the way. It is system-wide, a
service of the program that interprets commands (the ‘‘shell,’’ sh(1)). Using that fact, you can see
how to list the names of the files in the book:

    ls chap*

produces

    chap1.1
    chap1.2
    chap1.3
    ...

The * is not limited to the last position in a filename — it can be anywhere and can occur several
times. Thus

    rm *junk* *temp*

removes all files that contain junk or temp as any part of their name. As a special case, * by itself
matches every filename, so

    pr *

prints all your files (alphabetical order), and

    rm *

removes all files. (You had better be very sure that’s what you wanted to say!)

The * is not the only pattern-matching feature available. Suppose you want to print only chapters 1
through 4 and 9. Then you can say

    pr chap[12349]*

The [...] means to match any of the characters inside the brackets. A range of consecutive letters or
digits can be abbreviated, so you can also do this with

    pr chap[1-49]*

Letters can also be used within brackets: [a-z] matches any character in the range a through z.

The ? pattern matches any single character, so

    ls ?

lists all files which have single-character names, and

    ls -l chap?.1

lists information about the first file of each chapter (chap1.1, chap2.1, etc.).

Of these niceties, * is certainly the most useful, and you should get used to it. The others are
frills, but worth knowing.

If you should ever have to turn off the special meaning of *, ?, etc., enclose the entire argument in
single quotes, as in

    ls '?'

We’ll see some more examples of this shortly.

What’s in a Filename, Continued
When you first made that file called junk, how did the system know that there wasn’t another junk
somewhere else, especially since the person in the next office is also reading this tutorial? The
answer is that generally each user has a private directory, which contains only the files that belong
to him. When you log in, you are ‘‘in’’ your directory. Unless you take special action, when you create
a new file, it is made in the directory that you are currently in; this is most often your own
directory, and thus the file is unrelated to any other file of the same name that might exist in
someone else’s directory.

The set of all files is organized into a (usually big) tree, with your files located several
branches into the tree. It is possible for you to ‘‘walk’’ around this tree, and to find any file in
the system, by starting at the root of the tree and walking along the proper set of branches.
Conversely, you can start where you are and walk toward the root.

Let’s try the latter first. The basic tool is the command pwd (‘‘print working directory’’), which
prints the name of the directory you are currently in.

Although the details will vary according to the system you are on, if you give the command pwd, it
will print something like

    /usr/your-name

This says that you are currently in the directory your-name, which is in turn in the directory /usr,
which is in turn in the root directory called by convention just /. (Even if it’s not called /usr on
your system, you will get something analogous. Make the corresponding changes and read on.)

If you now type

    ls /usr/your-name

you should get exactly the same list of file names as you get from a plain ls: with no arguments, ls
lists the contents of the current directory; given the name of a directory, it lists the contents of
that directory.
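
Because the shell, not the individual command, expands the pattern characters *, ? and [...], you can
see what a pattern will match before handing it to something as final as rm. The sketch below assumes
a present-day POSIX shell; the scratch directory, the use of mktemp and echo, and the particular file
names are illustrative assumptions.

```shell
scratch=$(mktemp -d)         # empty directory, so the matches are predictable
cd "$scratch" || exit 1
: >chap1.1; : >chap1.2; : >chap2.1; : >chap5.1; : >chap9.1; : >x

all=$(echo chap*)            # every name beginning with chap
some=$(echo chap[1-49]*)     # chapters 1 through 4 and 9 only
first=$(echo chap?.1)        # first file of each chapter
single=$(echo ?)             # single-character names
literal=$(echo '?')          # quoted: no expansion, just a question mark

printf '%s\n' "$all" "$some" "$first" "$single" "$literal"
```

Here chap5.1 is matched by chap* and chap?.1 but not by chap[1-49]*, and the quoted '?' reaches echo
unchanged. Typing echo in front of a pattern first is a cheap way to check what rm would be given.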

Next, try

    ls /usr

This should print a long series of names, among which is your own login name your-name. On many
systems, usr is a directory that contains the directories of all the normal users of the system, like
you.

The next step is to try

    ls /

You should get a response something like this (although again the details may be different):

    bin
    dev
    etc
    lib
    tmp
    usr

This is a collection of the basic directories of files that the system knows about; we are at the
root of the tree.

Now try

    cat /usr/your-name/junk

(if junk is still around in your directory). The name

    /usr/your-name/junk

is called the pathname of the file that you normally think of as ‘‘junk’’. ‘‘Pathname’’ has an obvious
meaning: it represents the full name of the path you have to follow from the root through the tree of
directories to get to a particular file. It is a universal rule in the UNIX system that anywhere you
can use an ordinary filename, you can use a pathname.

Here is a picture which may make this clearer:

                     (root)
                    /  |  \
                   /   |   \
                  /    |    \
       bin     etc    usr    dev     tmp
      / | \   / | \  / | \  / | \   / | \
                    /  |  \
                   /   |   \
               adam   eve   mary
               /     /   \      \
                    /     \     junk
                 junk     temp

Notice that Mary’s junk is unrelated to Eve’s.

This isn’t too exciting if all the files of interest are in your own directory, but if you work with
someone else or on several projects concurrently, it becomes handy indeed. For example, your friends
can print your book by saying

    pr /usr/your-name/chap*

Similarly, you can find out what files your neighbor has by saying

    ls /usr/neighbor-name

or make your own copy of one of his files by

    cp /usr/your-neighbor/his-file yourfile

If your neighbor doesn’t want you poking around in his files, or vice versa, privacy can be arranged.
Each file and directory has read-write-execute permissions for the owner, a group, and everyone else,
which can be set to control access. See ls(1) and chmod(1) for details. As a matter of observed fact,
most users most of the time find openness of more benefit than privacy.

As a final experiment with pathnames, try

    ls /bin /usr/bin

Do some of the names look familiar? When you run a program, by typing its name after the prompt
character, the system simply looks for a file of that name. It normally looks first in your directory
(where it typically doesn’t find it), then in /bin and finally in /usr/bin. There is nothing magic
about commands like cat or ls, except that they have been collected into a couple of places to be easy
to find and administer.

What if you work regularly with someone else on common information in his directory? You could
just log in as your friend each time you want to, but you can also say ‘‘I want to work on his files
instead of my own’’. This is done by changing the directory that you are currently in:

    cd /usr/your-friend

(On some systems, cd is spelled chdir.) Now when you use a filename in something like cat or pr, it
refers to the file in your friend’s directory. Changing directories doesn’t affect any permissions
associated with a file — if you couldn’t access a file from your own directory, changing to another
directory won’t alter that fact. Of course, if you forget what directory you’re in, type

    pwd

to find out.

It is usually convenient to arrange your own files so that all the files related to one thing are in
a directory separate from other projects. For example, when you write your book, you might want to
keep all the text in a directory called book. So make one with

    mkdir book

then go to it with

    cd book

then start typing chapters. The book is now found in (presumably)

    /usr/your-name/book
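
The relation between the current directory and a pathname can be checked with a few commands. This is
a sketch under stated assumptions: a present-day POSIX shell, a scratch directory from mktemp standing
in for /usr/your-name, and a one-line file standing in for a chapter typed with ed.

```shell
home=$(mktemp -d)            # stands in for /usr/your-name (assumption)
cd "$home" || exit 1

mkdir book                   # make the directory
cd book || exit 1            # go to it
echo 'Chapter one' >chap1    # stand-in for a chapter typed with ed

here=$(pwd)                  # pathname of the book directory
by_path=$(cat "$here/chap1") # a full pathname works from anywhere

cd "$home" || exit 1         # back where we started
by_name=$(cat book/chap1)    # a name relative to the current directory

echo "$here"
echo "$by_path"
echo "$by_name"
```

The pathname printed by pwd ends in /book, and both cat commands print the same line, since
"$here/chap1" and book/chap1 name the same file reached by different routes through the tree.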

To remove the directory book, type

    rm book/*
    rmdir book

The first command removes all files from the directory; the second removes the empty directory.

You can go up one level in the tree of files by saying

    cd ..

‘‘..’’ is the name of the parent of whatever directory you are currently in. For completeness, ‘‘.’’
is an alternate name for the directory you are in.

Using Files instead of the Terminal
Most of the commands we have seen so far produce output on the terminal; some, like the editor,
also take their input from the terminal. It is universal in UNIX systems that the terminal can be
replaced by a file for either or both of input and output. As one example,

    ls

makes a list of files on your terminal. But if you say

    ls >filelist

a list of your files will be placed in the file filelist (which will be created if it doesn’t already
exist, or overwritten if it does). The symbol > means ‘‘put the output on the following file, rather
than on the terminal.’’ Nothing is produced on the terminal. As another example, you could combine
several files into one by capturing the output of cat in a file:

    cat f1 f2 f3 >temp

The symbol >> operates very much like > does, except that it means ‘‘add to the end of.’’ That is,

    cat f1 f2 f3 >>temp

means to concatenate f1, f2 and f3 to the end of whatever is already in temp, instead of overwriting
the existing contents. As with >, if temp doesn’t exist, it will be created for you.

In a similar way, the symbol < means to take the input for a program from the following file, instead
of from the terminal. Thus, you could make up a script of commonly used editing commands and put them
into a file called script. Then you can run the script on a file by saying

    ed file <script

As another example, you can use ed to prepare a letter in file let, then send it to several people
with

    mail adam eve mary joe <let

Pipes
One of the novel contributions of the UNIX system is the idea of a pipe. A pipe is simply a way to
connect the output of one program to the input of another program, so the two run as a sequence of
processes — a pipeline.

For example,

    pr f g h

will print the files f, g, and h, beginning each on a new page. Suppose you want them run together
instead. You could say

    cat f g h >temp
    pr <temp
    rm temp

but this is more work than necessary. Clearly what we want is to take the output of cat and connect it
to the input of pr. So let us use a pipe:

    cat f g h | pr

The vertical bar | means to take the output from cat, which would normally have gone to the terminal,
and put it into pr to be neatly formatted.

There are many other examples of pipes. For example,

    ls | pr -3

prints a list of your files in three columns. The program wc counts the number of lines, words and
characters in its input, and as we saw earlier, who prints a list of the people currently logged on,
one per line. Thus

    who | wc

tells how many people are logged on. And of course

    ls | wc

counts your files.

Any program that reads from the terminal can read from a pipe instead; any program that writes on
the terminal can drive a pipe. You can have as many elements in a pipeline as you wish.

Many UNIX programs are written so that they will take their input from one or more files if file
arguments are given; if no arguments are given they will read from the terminal, and thus can be used
in pipelines. pr is one example:

    pr -3 a b c

prints files a, b and c in order in three columns. But in

    cat a b c | pr -3

pr prints the information coming down the pipeline, still in three columns.
-9-
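The redirection and pipe operators described above can be tried out with a short script. This is only a sketch: the file names f1, f2, f3 and temp follow the text, but the scratch directory (made with mktemp, which is assumed to be available), the file contents, and the variable names lines and files are inventions for illustration.

```shell
# Work in a scratch directory so that no existing files are touched.
dir=$(mktemp -d) && cd "$dir" || exit 1

echo one   >f1            # > creates f1, or overwrites it if it exists
echo two   >f2
echo three >f3

cat f1 f2 f3 >temp        # combine the three files into temp
cat f1 f2 f3 >>temp       # >> adds to the end: temp now holds six lines

# < makes wc read temp instead of the terminal; tr only strips padding.
lines=$(wc -l <temp | tr -d ' ')

# A pipe: the output of ls becomes the input of wc.
files=$(ls | wc -l | tr -d ' ')

echo "temp has $lines lines; the directory holds $files files"
```

Running the first cat line twice with > instead of >> would leave temp at three lines, since > overwrites rather than appends.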

The Shell

We have already mentioned once or twice the mysterious "shell," which
is in fact sh(1). The shell is the program that interprets what you
type as commands and arguments. It also looks after translating *,
etc., into lists of filenames, and <, >, and | into changes of input
and output streams.

The shell has other capabilities too. For example, you can run two
programs with one command line by separating the commands with a
semicolon; the shell recognizes the semicolon and breaks the line into
two commands. Thus

    date; who

does both commands before returning with a prompt character.

You can also have more than one program running simultaneously if you
wish. For example, if you are doing something time-consuming, like the
editor script of an earlier section, and you don’t want to wait around
for the results before starting something else, you can say

    ed file <script &

The ampersand at the end of a command line says "start this command
running, then take further commands from the terminal immediately,"
that is, don’t wait for it to complete. Thus the script will begin, but
you can do something else at the same time. Of course, to keep the
output from interfering with what you’re doing on the terminal, it
would be better to say

    ed file <script >script.out &

which saves the output lines in a file called script.out.

When you initiate a command with &, the system replies with a number
called the process number, which identifies the command in case you
later want to stop it. If you do, you can say

    kill process-number

If you forget the process number, the command ps will tell you about
everything you have running. (If you are desperate, kill 0 will kill
all your processes.) And if you’re curious about other people, ps a
will tell you about all programs that are currently running.

You can say

    (command-1; command-2; command-3) &

to start three commands in the background, or you can start a
background pipeline with

    command-1 | command-2 &

Just as you can tell the editor or some similar program to take its
input from a file instead of from the terminal, you can tell the shell
to read a file to get commands. (Why not? The shell, after all, is just
a program, albeit a clever one.) For instance, suppose you want to set
tabs on your terminal, and find out the date and who’s on the system
every time you log in. Then you can put the three necessary commands
(tabs, date, who) into a file, let’s call it startup, and then run it
with

    sh startup

This says to run the shell with the file startup as input. The effect
is as if you had typed the contents of startup on the terminal.

If this is to be a regular thing, you can eliminate the need to type
sh: simply type, once only, the command

    chmod +x startup

and thereafter you need only say

    startup

to run the sequence of commands. The chmod(1) command marks the file
executable; the shell recognizes this and runs it as a sequence of
commands.

If you want startup to run automatically every time you log in, create
a file in your login directory called .profile, and place in it the
line startup. When the shell first gains control when you log in, it
looks for the .profile file and does whatever commands it finds in it.
We’ll get back to the shell in the section on programming.

III. DOCUMENT PREPARATION

UNIX systems are used extensively for document preparation. There are
two major formatting programs, that is, programs that produce a text
with justified right margins, automatic page numbering and titling,
automatic hyphenation, and the like. nroff is designed to produce
output on terminals and line-printers. troff (pronounced "tee-roff")
instead drives a phototypesetter, which produces very high quality
output on photographic paper. This paper was formatted with troff.

Formatting Packages

The basic idea of nroff and troff is that the text to be formatted
contains within it "formatting commands" that indicate in detail how
the formatted text is to look. For example, there might be commands
that specify how long lines are, whether to use single or double
spacing, and what running titles to use on each page.

Because nroff and troff are relatively hard to learn to use
effectively, several "packages" of canned formatting requests are
available to let you specify paragraphs, running titles, footnotes,
multi-column output, and so on, with little effort and without having
to learn nroff and troff. These packages take a modest effort to learn,
but the rewards for using them are so great that it is time well spent.

In this section, we will provide a hasty look at the "manuscript"
package known as -ms. Formatting requests typically consist of a period
and two upper-case letters, such as .TL, which is used to introduce a
title, or .PP to begin a new paragraph.

A document is typed so it looks something like this:

    .TL
    title of document
    .AU
    author name
    .SH
    section heading
    .PP
    paragraph ...
    .PP
    another paragraph ...
    .SH
    another section heading
    .PP
    etc.

The lines that begin with a period are the formatting requests. For
example, .PP calls for starting a new paragraph. The precise meaning of
.PP depends on what output device is being used (typesetter or
terminal, for instance), and on what publication the document will
appear in. For example, -ms normally assumes that a paragraph is
preceded by a space (one line in nroff, 1/2 line in troff), and the
first word is indented. These rules can be changed if you like, but
they are changed by changing the interpretation of .PP, not by
re-typing the document.

To actually produce a document in standard format using -ms, use the
command

    troff -ms files ...

for the typesetter, and

    nroff -ms files ...

for a terminal. The -ms argument tells troff and nroff to use the
manuscript package of formatting requests.

There are several similar packages; check with a local expert to
determine which ones are in common use on your machine.

Supporting Tools

In addition to the basic formatters, there is a host of supporting
programs that help with document preparation. The list in the next few
paragraphs is far from complete, so browse through the manual and check
with people around you for other possibilities.

eqn and neqn let you integrate mathematics into the text of a document,
in an easy-to-learn language that closely resembles the way you would
speak it aloud. For example, the eqn input

    sum from i=0 to n x sub i ~=~ pi over 2

produces the output

    $\sum_{i=0}^{n} x_i = \frac{\pi}{2}$

The program tbl provides an analogous service for preparing tabular
material; it does all the computations necessary to align complicated
columns with elements of varying widths.

refer prepares bibliographic citations from a data base, in whatever
style is defined by the formatting package. It looks after all the
details of numbering references in sequence, filling in page and volume
numbers, getting the author’s initials and the journal name right, and
so on.

spell and typo detect possible spelling mistakes in a document. spell
works by comparing the words in your document to a dictionary, printing
those that are not in the dictionary. It knows enough about English
spelling to detect plurals and the like, so it does a very good job.
typo looks for words which are "unusual", and prints those. Spelling
mistakes tend to be more unusual, and thus show up early when the most
unusual words are printed first.

grep looks through a set of files for lines that contain a particular
text pattern (rather like the editor’s context search does, but on a
bunch of files). For example,

    grep 'ing$' chap*

will find all lines that end with the letters ing in the files chap*.
(It is almost always a good practice to put single quotes around the
pattern you’re searching for, in case it contains characters like * or
$ that have a special meaning to the shell.) grep is often useful for
finding out in which of a set of files the misspelled words detected by
spell are actually located.

diff prints a list of the differences between two files, so you can
compare two versions of something automatically (which certainly beats
proofreading by hand).

wc counts the words, lines and characters in a set of files. tr
translates characters into other characters; for example it will
convert upper to lower case and vice versa. This translates upper into
lower:

    tr A-Z a-z <input >output

sort sorts files in a variety of ways; cref makes cross-references; ptx
makes a permuted index (keyword-in-context listing). sed provides many
of the editing facilities of ed, but can apply them to arbitrarily long
inputs. awk provides the ability to do both pattern matching and
numeric computations, and

to conveniently process fields within lines. These programs are for
more advanced users, and they are not limited to document preparation.
Put them on your list of things to learn about.

Most of these programs are either independently documented (like eqn
and tbl), or are sufficiently simple that the description in the UNIX
Programmer’s Manual is adequate explanation.

Hints for Preparing Documents

Most documents go through several versions (always more than you
expected) before they are finally finished. Accordingly, you should do
whatever possible to make the job of changing them easy.

First, when you do the purely mechanical operations of typing, type so
that subsequent editing will be easy. Start each sentence on a new
line. Make lines short, and break lines at natural places, such as
after commas and semicolons, rather than randomly. Since most people
change documents by rewriting phrases and adding, deleting and
rearranging sentences, these precautions simplify any editing you have
to do later.

Keep the individual files of a document down to modest size, perhaps
ten to fifteen thousand characters. Larger files edit more slowly, and
of course if you make a dumb mistake it’s better to have clobbered a
small file than a big one. Split into files at natural boundaries in
the document, for the same reasons that you start each sentence on a
new line.

The second aspect of making change easy is to not commit yourself to
formatting details too early. One of the advantages of formatting
packages like -ms is that they permit you to delay decisions to the
last possible moment. Indeed, until a document is printed, it is not
even decided whether it will be typeset or put on a line printer.

As a rule of thumb, for all but the most trivial jobs, you should type
a document in terms of a set of requests like .PP, and then define them
appropriately, either by using one of the canned packages (the better
way) or by defining your own nroff and troff commands. As long as you
have entered the text in some systematic way, it can always be cleaned
up and re-formatted by a judicious combination of editing commands and
request definitions.

IV. PROGRAMMING

There will be no attempt made to teach any of the programming languages
available but a few words of advice are in order. One of the reasons
why the UNIX system is a productive programming environment is that
there is already a rich set of tools available, and facilities like
pipes, I/O redirection, and the capabilities of the shell often make it
possible to do a job by pasting together programs that already exist
instead of writing from scratch.

The Shell

The pipe mechanism lets you fabricate quite complicated operations out
of spare parts that already exist. For example, the first draft of the
spell program was (roughly)

    cat ...       collect the files
    | tr ...      put each word on a new line
    | tr ...      delete punctuation, etc.
    | sort        into dictionary order
    | uniq        discard duplicates
    | comm        print words in text but not in dictionary

More pieces have been added subsequently, but this goes a long way for
such a small effort.

The editor can be made to do things that would normally require special
programs on other systems. For example, to list the first and last
lines of each of a set of files, such as a book, you could laboriously
type

    ed
    e chap1.1
    1p
    $p
    e chap1.2
    1p
    $p
    etc.

But you can do the job much more easily. One way is to type

    ls chap* >temp

to get the list of filenames into a file. Then edit this file to make
the necessary series of editing commands (using the global commands of
ed), and write it into script. Now the command

    ed <script

will produce the same output as the laborious hand typing. Alternately
(and more easily), you can use the fact that the shell will perform
loops, repeating a set of commands over and over again for a set of
arguments:

    for i in chap*
    do
        ed $i <script
    done

This sets the shell variable i to each file name in turn, then does the
command. You can type this command at the terminal, or put it in a file
for later execution.

Programming the Shell

An option often overlooked by newcomers is that the shell is itself a
programming language, with variables, control flow (if-else, while,
for, case), subroutines, and interrupt handling. Since there are many

building-block programs, you can sometimes avoid writing a new program
merely by piecing together some of the building blocks with shell
command files.

We will not go into any details here; examples and rules can be found
in An Introduction to the UNIX Shell, by S. R. Bourne.

Programming in C

If you are undertaking anything substantial, C is the only reasonable
choice of programming language: everything in the UNIX system is tuned
to it. The system itself is written in C, as are most of the programs
that run on it. It is also an easy language to use once you get
started. C is introduced and fully described in The C Programming
Language by B. W. Kernighan and D. M. Ritchie (Prentice-Hall, 1978).
Several sections of the manual describe the system interfaces, that is,
how you do I/O and similar functions. Read UNIX Programming for more
complicated things.

Most input and output in C is best handled with the standard I/O
library, which provides a set of I/O functions that exist in compatible
form on most machines that have C compilers. In general, it’s wisest to
confine the system interactions in a program to the facilities provided
by this library.

C programs that don’t depend too much on special features of UNIX (such
as pipes) can be moved to other computers that have C compilers. The
list of such machines grows daily; in addition to the original PDP-11,
it currently includes at least Honeywell 6000, IBM 370, Interdata 8/32,
Data General Nova and Eclipse, HP 2100, Harris /7, VAX 11/780, SEL 86,
and Zilog Z80. Calls to the standard I/O library will work on all of
these machines.

There are a number of supporting programs that go with C. lint checks C
programs for potential portability problems, and detects errors such as
mismatched argument types and uninitialized variables.

For larger programs (anything whose source is on more than one file)
make allows you to specify the dependencies among the source files and
the processing steps needed to make a new version; it then checks the
times that the pieces were last changed and does the minimal amount of
recompiling to create a consistent updated version.

The debugger adb is useful for digging through the dead bodies of C
programs, but is rather hard to learn to use effectively. The most
effective debugging tool is still careful thought, coupled with
judiciously placed print statements.

The C compiler provides a limited instrumentation service, so you can
find out where programs spend their time and what parts are worth
optimizing. Compile the routines with the -p option; after the test
run, use prof to print an execution profile. The command time will give
you the gross run-time statistics of a program, but they are not super
accurate or reproducible.

Other Languages

If you have to use Fortran, there are two possibilities. You might
consider Ratfor, which gives you the decent control structures and
free-form input that characterize C, yet lets you write code that is
still portable to other environments. Bear in mind that UNIX Fortran
tends to produce large and relatively slow-running programs.
Furthermore, supporting software like adb, prof, etc., are all
virtually useless with Fortran programs. There may also be a Fortran 77
compiler on your system. If so, this is a viable alternative to Ratfor,
and has the non-trivial advantage that it is compatible with C and
related programs. (The Ratfor processor and C tools can be used with
Fortran 77 too.)

If your application requires you to translate a language into a set of
actions or another language, you are in effect building a compiler,
though probably a small one. In that case, you should be using the yacc
compiler-compiler, which helps you develop a compiler quickly. The lex
lexical analyzer generator does the same job for the simpler languages
that can be expressed as regular expressions. It can be used by itself,
or as a front end to recognize inputs for a yacc-based program. Both
yacc and lex require some sophistication to use, but the initial effort
of learning them can be repaid many times over in programs that are
easy to change later on.

Most UNIX systems also make available other languages, such as Algol
68, APL, Basic, Lisp, Pascal, and Snobol. Whether these are useful
depends largely on the local environment: if someone cares about the
language and has worked on it, it may be in good shape. If not, the
odds are strong that it will be more trouble than it’s worth.

V. UNIX READING LIST

General:

K. L. Thompson and D. M. Ritchie, The UNIX Programmer’s Manual, Bell
Laboratories, 1978. Lists commands, system routines and interfaces,
file formats, and some of the maintenance procedures. You can’t live
without this, although you will probably only need to read section 1.

Documents for Use with the UNIX Time-sharing System. Volume 2 of the
Programmer’s Manual. This contains more extensive descriptions of major
commands, and tutorials and reference manuals. All of the papers listed
below are in it, as are descriptions of most of the programs mentioned
above.

D. M. Ritchie and K. L. Thompson, "The UNIX Time-sharing System," CACM,
July 1974. An overview of the system, for people interested in
operating systems. Worth reading by anyone who programs. Contains a
remarkable number of one-sentence observations on how to do things
right.

The Bell System Technical Journal (BSTJ) Special Issue on UNIX,
July/August, 1978, contains many papers describing recent developments,
and some retrospective material.

The 2nd International Conference on Software Engineering (October,
1976) contains several papers describing the use of the Programmer’s
Workbench (PWB) version of UNIX.

Document Preparation:

B. W. Kernighan, "A Tutorial Introduction to the UNIX Text Editor" and
"Advanced Editing on UNIX," Bell Laboratories, 1978. Beginners need the
introduction; the advanced material will help you get the most out of
the editor.

M. E. Lesk, "Typing Documents on UNIX," Bell Laboratories, 1978.
Describes the -ms macro package, which isolates the novice from the
vagaries of nroff and troff, and takes care of most formatting
situations. If this specific package isn’t available on your system,
something similar probably is. The most likely alternative is the
PWB/UNIX macro package -mm; see your local guru if you use PWB/UNIX.

B. W. Kernighan and L. L. Cherry, "A System for Typesetting
Mathematics," Bell Laboratories Computing Science Tech. Rep. 17.

M. E. Lesk, "Tbl — A Program to Format Tables," Bell Laboratories CSTR
49, 1976.

J. F. Ossanna, Jr., "NROFF/TROFF User’s Manual," Bell Laboratories CSTR
54, 1976. troff is the basic formatter used by -ms, eqn and tbl. The
reference manual is indispensable if you are going to write or maintain
these or similar programs. But start with:

B. W. Kernighan, "A TROFF Tutorial," Bell Laboratories, 1976. An
attempt to unravel the intricacies of troff.

Programming:

B. W. Kernighan and D. M. Ritchie, The C Programming Language,
Prentice-Hall, 1978. Contains a tutorial introduction, complete
discussions of all language features, and the reference manual.

B. W. Kernighan and D. M. Ritchie, "UNIX Programming," Bell
Laboratories, 1978. Describes how to interface with the system from C
programs: I/O calls, signals, processes.

S. R. Bourne, "An Introduction to the UNIX Shell," Bell Laboratories,
1978. An introduction and reference manual for the Version 7 shell.
Mandatory reading if you intend to make effective use of the
programming power of this shell.

S. C. Johnson, "Yacc — Yet Another Compiler-Compiler," Bell
Laboratories CSTR 32, 1978.

M. E. Lesk, "Lex — A Lexical Analyzer Generator," Bell Laboratories
CSTR 39, 1975.

S. C. Johnson, "Lint, a C Program Checker," Bell Laboratories CSTR 65,
1977.

S. I. Feldman, "MAKE — A Program for Maintaining Computer Programs,"
Bell Laboratories CSTR 57, 1977.

J. F. Maranzano and S. R. Bourne, "A Tutorial Introduction to ADB,"
Bell Laboratories CSTR 62, 1977. An introduction to a powerful but
complex debugging tool.

S. I. Feldman and P. J. Weinberger, "A Portable Fortran 77 Compiler,"
Bell Laboratories, 1978. A full Fortran 77 for UNIX systems.
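As a closing sketch of the shell facilities this paper describes (command files run with sh, chmod +x, the for loop, and background commands with &), the script below strings them together. The names startup and chap* follow the paper; everything else is an assumption made for illustration: the scratch directory from mktemp, the file contents, the output file names, and the use of wc in place of the ed script. The file is run as ./startup rather than plain startup because many current systems do not search the current directory for commands.

```shell
# Work in a scratch directory so that no existing files are touched.
dir=$(mktemp -d) && cd "$dir" || exit 1

# A command file in the spirit of startup; its contents are made up.
cat >startup <<'EOF'
echo "hello from startup"
date
EOF

sh startup >first.out      # run the shell with the file startup as input
chmod +x startup           # mark the file executable, once only
./startup >second.out      # thereafter the file can be run by name

# Repeat a command for each of a set of files, as in the chap* loop.
echo "chapter one" >chap1.1
echo "chapter two" >chap1.2
for i in chap*
do
    wc -l <"$i"            # wc stands in for "ed $i <script"
done >counts

# Start a command in the background, then wait for it to finish.
sleep 1 &
wait
```

Putting the same lines in a file and naming that file in .profile would, as the paper notes, run them at every login.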
A Tutorial Introduction to the UNIX Text Editor

Brian W. Kernighan
Bell Laboratories
Murray Hill, New Jersey 07974

ABSTRACT

Almost all text input on the UNIX† operating system is done with the text-editor ed.
This memorandum is a tutorial guide to help beginners get started with text editing.
Although it does not cover everything, it does discuss enough for most users’
day-to-day needs. This includes printing, appending, changing, deleting, moving and
inserting entire lines of text; reading and writing files; context searching and line
addressing; the substitute command; the global commands; and the use of special
characters for advanced editing.

September 21, 1978

_______________
†UNIX is a Trademark of Bell Laboratories.
Introduction

Ed is a "text editor", that is, an interactive program for creating and
modifying "text", using directions provided by a user at a terminal.
The text is often a document like this one, or a program or perhaps
data for a program.

This introduction is meant to simplify learning ed. The recommended way
to learn ed is to read this document, simultaneously using ed to follow
the examples, then to read the description in section I of the UNIX
Programmer’s Manual, all the while experimenting with ed. (Solicitation
of advice from experienced users is also useful.)

Do the exercises! They cover material not completely discussed in the
actual text. An appendix summarizes the commands.

Disclaimer

This is an introduction and a tutorial. For this reason, no attempt is
made to cover more than a part of the facilities that ed offers
(although this fraction includes the most useful and frequently used
parts). When you have mastered the Tutorial, try Advanced Editing on
UNIX. Also, there is not enough space to explain basic UNIX procedures.
We will assume that you know how to log on to UNIX, and that you have
at least a vague understanding of what a file is. For more on that,
read UNIX for Beginners.

You must also know what character to type as the end-of-line on your
particular terminal. This character is the RETURN key on most
terminals. Throughout, we will refer to this character, whatever it is,
as RETURN.

Getting Started

We’ll assume that you have logged in to your system and it has just
printed the prompt character, usually either a $ or a %. The easiest
way to get ed is to type

    ed        (followed by a return)

You are now ready to go – ed is waiting for you to tell it what to do.

Creating Text – the Append command "a"

As your first problem, suppose you want to create some text starting
from scratch. Perhaps you are typing the very first draft of a paper;
clearly it will have to start somewhere, and undergo modifications
later. This section will show how to get some text in, just to get
started. Later we’ll talk about how to change it.

When ed is first started, it is rather like working with a blank piece
of paper – there is no text or information present. This must be
supplied by the person using ed; it is usually done by typing in the
text, or by reading it into ed from a file. We will start by typing in
some text, and return shortly to how to read files.

First a bit of terminology. In ed jargon, the text being worked on is
said to be "kept in a buffer." Think of the buffer as a work space, if
you like, or simply as the information that you are going to be
editing. In effect the buffer is like the piece of paper, on which we
will write things, then change some of them, and finally file the whole
thing away for another day.

The user tells ed what to do to his text by typing instructions called
"commands." Most commands consist of a single letter, which must be
typed in lower case. Each command is typed on a separate line.
(Sometimes the command is preceded by information about what line or
lines of text are to be affected – we will discuss these shortly.) Ed
makes no response to most commands – there is no prompting or typing of
messages like "ready". (This silence is preferred by experienced users,
but sometimes a hangup for beginners.)

The first command is append, written as the letter

    a

all by itself. It means "append (or add) text lines to the buffer, as I
type them in." Appending is rather like writing fresh material on a
piece of paper.

So to enter lines of text into the buffer, just type an a followed by a
RETURN, followed by the lines of text you want, like this:

    a
    Now is the time
    for all good men
    to come to the aid of their party.
    .

The only way to stop appending is to type a line that contains only a
period. The "." is used to tell ed that you have finished appending.
(Even experienced users forget that terminating "." sometimes. If ed
seems to be ignoring you, type an extra line with just "." on it. You
may then find you’ve added some garbage lines to your text, which
you’ll have to take out later.)

After the append command has been done, the buffer will contain the
three lines

    Now is the time
    for all good men
    to come to the aid of their party.

The "a" and "." aren’t there, because they are not text.

To add more text to what you already have, just issue another a
command, and continue typing.

Error Messages – "?"

If at any time you make an error in the commands you type to ed, it
will tell you by typing

    ?

This is about as cryptic as it can be, but with practice, you can
usually figure out how you goofed.

Writing text out as a file – the Write command "w"

It’s likely that you’ll want to save your text for later use. To write
out the contents of the buffer onto a file, use the write command

    w

followed by the filename you want to write on. This will copy the
buffer’s contents onto the specified file (destroying any previous
information on the file). To save the text on a file named junk, for
example, type

    w junk

Leave a space between w and the file name. Ed will respond by printing
the number of characters it wrote out. In this case, ed would respond
with

    68

(Remember that blanks and the return character at the end of each line
are included in the character count.) Writing a file just makes a copy
of the text – the buffer’s contents are not disturbed, so you can go on
adding lines to it. This is an important point. Ed at all times works
on a copy of a file, not the file itself. No change in the contents of
a file takes place until you give a w command. (Writing out the text
onto a file from time to time as it is being created is a good idea,
since if the system crashes or if you make some horrible mistake, you
will lose all the text in the buffer but any text that was written onto
a file is relatively safe.)

Leaving ed – the Quit command "q"

To terminate a session with ed, save the text you’re working on by
writing it onto a file using the w command, and then type the command

    q

which stands for quit. The system will respond with the prompt
character ($ or %). At this point your buffer vanishes, with all its
text, which is why you want to write it out before quitting.†

† Actually, ed will print ? if you try to quit without writing. At that
point, write if you want; if not, another q will get you out
regardless.

Exercise 1:

Enter ed and create some text using

    a
    . . . text . . .
    .

Write it out using w. Then leave ed with the q command, and print the
file, to see that everything worked. (To print a file, say

    pr filename

or

    cat filename

in response to the prompt character. Try both.)

Reading text from a file – the Edit command "e"

A common way to get text into the buffer is to read it from a file in
the file system. This is what you do to edit text that you saved with
the w command in a previous session. The edit command e fetches the
entire contents of a file into the buffer. So if you had saved the
three lines "Now is the time", etc., with a w command in an earlier
session, the ed command

    e junk

would fetch the entire contents of the file junk into the buffer, and
respond

    68

which is the number of characters in junk. If anything was already in
the buffer, it is deleted first.

If you use the e command to read a file into the buffer, then you need
not use a file name after a subsequent w command; ed remembers the last
file name

used in an e command, and w will write on this file. is exactly equivalent to


Thus a good way to operate is
eedd
eedd e fifilleennaam
mee
e fi fillee
What does
[[eeddiittiinngg sseessssiioonn]]
w f fi
filleennaam
mee
q
do?
This way, you can simply say w from time to time,
and be secure in the knowledge that if you got the file Printing the contents of the buffer – the Print
name right at the beginning, you are writing into the command ‘‘p’’
proper file each time. To print or list the contents of the buffer (or parts
You can find out at any time what file name ed is of it) on the terminal, use the print command
remembering by typing the file command f. In this
p
example, if you typed
The way this is done is as follows. Specify the lines
f
where you want printing to begin and where you want
ed would reply it to end, separated by a comma, and followed by the
letter p. Thus to print the first two lines of the
jjuunnkk
buffer, for example, (that is, lines 1 through 2) say
11,,22pp ((ssttaarrttiinngg lliinnee=
=11,, eennddiinngg lliinnee=
=22 pp))
Reading text from a file – the Read command ‘‘r’’
Sometimes you want to read a file into the buffer Ed will respond with
without destroying anything that is already there. NNoow w iiss tthhee ttiim
mee
This is done by the read command r. The command ffoorr aallll ggoooodd m
meenn
r jjuunnkk
Suppose you want to print all the lines in the
will read the file junk into the buffer; it adds it to the buffer. You could use 1,3p as above if you knew
end of whatever is already in the buffer. So if you there were exactly 3 lines in the buffer. But in gen-
do a read after an edit: eral, you don’t know how many there are, so what do
you use for the ending line number? Ed provides a
e jjuunnkk
shorthand symbol for ‘‘line number of last line in
r jjuunnkk
buffer’’ – the dollar sign $. Use it this way:
the buffer will contain two copies of the text (six
11,,$$pp
lines).
This will print all the lines in the buffer (line 1 to last
NNoow w iiss tthhee ttiim
mee
line.) If you want to stop the printing before it is
ffoorr aallll ggoooodd m
meenn
finished, push the DEL or Delete key; ed will type
ttoo ccoommee ttoo tthhee aaiidd ooff tthheeiirr ppaarrttyy..
NNoow w iiss tthhee ttiim
mee ?
ffoorr aallll ggoooodd m
meenn
and wait for the next command.
ttoo ccoommee ttoo tthhee aaiidd ooff tthheeiirr ppaarrttyy..
To print the last line of the buffer, you could use
Like the w and e commands, r prints the number of
characters read in, after the reading operation is com- $$,,$$pp
plete. but ed lets you abbreviate this to
Generally speaking, r is much less used than e.
$$pp
Exercise 2: You can print any single line by typing the line
Experiment with the e command – try reading number followed by a p. Thus
and printing various files. You may get an error 11pp
?name, where name is the name of a file; this means
that the file doesn’t exist, typically because you produces the response
spelled the file name wrong, or perhaps that you are N
Noow
w iiss tthhee ttiim
mee
not allowed to read or write it. Try alternately read-
ing and appending to see that they work similarly. which is the first line of the buffer.
Verify that In fact, ed lets you abbreviate even further: you
can print any single line by typing just the line
eedd fi
filleennaam
mee
number – no need to type the letter p. So if you say
    $

ed will print the last line of the buffer.

You can also use $ in combinations like

    $-1,$p

which prints the last two lines of the buffer. This helps when you want to
see how far you got in typing.

Exercise 3:

As before, create some text using the a command and experiment with the p
command. You will find, for example, that you can’t print line 0 or a line
beyond the end of the buffer, and that attempts to print a buffer in
reverse order by saying

    3,1p

don’t work.

The current line – ‘‘Dot’’ or ‘‘.’’

Suppose your buffer still contains the six lines as above, that you have
just typed

    1,3p

and ed has printed the three lines for you. Try typing just

    p        (no line numbers)

This will print

    to come to the aid of their party.

which is the third line of the buffer. In fact it is the last (most recent)
line that you have done anything with. (You just printed it!) You can
repeat this p command without line numbers, and it will continue to print
line 3.

The reason is that ed maintains a record of the last line that you did
anything to (in this case, line 3, which you just printed) so that it can
be used instead of an explicit line number. This most recent line is
referred to by the shorthand symbol

    .        (pronounced ‘‘dot’’).

Dot is a line number in the same way that $ is; it means exactly ‘‘the
current line’’, or loosely, ‘‘the line you most recently did something
to.’’ You can use it in several ways – one possibility is to say

    .,$p

This will print all the lines from (including) the current line to the end
of the buffer. In our example these are lines 3 through 6.

Some commands change the value of dot, while others do not. The p command
sets dot to the number of the last line printed; the last command will set
both . and $ to 6.

Dot is most useful when used in combinations like this one:

    .+1        (or equivalently, .+1p)

This means ‘‘print the next line’’ and is a handy way to step slowly
through a buffer. You can also say

    .-1        (or .-1p)

which means ‘‘print the line before the current line.’’ This enables you to
go backwards if you wish. Another useful one is something like

    .-3,.-1p

which prints the previous three lines.

Don’t forget that all of these change the value of dot. You can find out
what dot is at any time by typing

    .=

Ed will respond by printing the value of dot.

Let’s summarize some things about the p command and dot. Essentially p can
be preceded by 0, 1, or 2 line numbers. If there is no line number given,
it prints the ‘‘current line’’, the line that dot refers to. If there is
one line number given (with or without the letter p), it prints that line
(and dot is set there); and if there are two line numbers, it prints all
the lines in that range (and sets dot to the last line printed.) If two
line numbers are specified the first can’t be bigger than the second (see
Exercise 3.)

Typing a single return will cause printing of the next line – it’s
equivalent to .+1p. Try it. Try typing a - ; you will find that it’s
equivalent to .-1p.

Deleting lines: the ‘‘d’’ command

Suppose you want to get rid of the three extra lines in the buffer. This is
done by the delete command

    d

Except that d deletes lines instead of printing them, its action is similar
to that of p. The lines to be deleted are specified for d exactly as they
are for p:

    starting line, ending line d

Thus the command

    4,$d

deletes lines 4 through the end. There are now three lines left, as you can
check by using

    1,$p

And notice that $ now is line 3! Dot is set to the next line after the last
line deleted, unless the last line deleted is the last line in the buffer.
In that case, dot is set to $.
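The bookkeeping for dot and $ described in this section can be sketched as a small model. This is only an illustration in Python, not ed itself: the class name Buffer, its methods, and the choice to start dot at the last line are assumptions made for the sketch.

```python
class Buffer:
    """Toy model of ed's buffer; lines are numbered from 1, as in ed."""

    def __init__(self, lines):
        self.lines = list(lines)
        self.dot = len(self.lines)  # assumption: dot starts at the last line

    @property
    def last(self):
        """The value ed calls $: the number of the last line."""
        return len(self.lines)

    def p(self, start, end):
        """Model of 'start,end p': return the lines, set dot to the last one."""
        if not 1 <= start <= end <= self.last:
            raise ValueError("?")  # ed would answer with ?
        self.dot = end
        return self.lines[start - 1:end]

    def d(self, start, end):
        """Model of 'start,end d': delete the lines and reposition dot."""
        if not 1 <= start <= end <= self.last:
            raise ValueError("?")
        del self.lines[start - 1:end]
        # The line that followed the deleted range now has number `start`;
        # if the range reached the end of the buffer, dot falls back to $.
        self.dot = min(start, self.last)


text = [
    "Now is the time",
    "for all good men",
    "to come to the aid of their party.",
]
buf = Buffer(text + text)          # six lines, as after  e junk / r junk
buf.p(1, 3)                        # 1,3p
dot_after_print = buf.dot
tail = buf.p(buf.dot, buf.last)    # .,$p
buf.d(4, buf.last)                 # 4,$d
print(dot_after_print, len(tail), buf.last, buf.dot)
```

The range check also reproduces the two failures mentioned in Exercise 3: line 0 and a reversed range such as 3,1 are both rejected.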
Exercise 4:

Experiment with a, e, r, w, p and d until you are sure that you know what
they do, and until you understand how dot, $, and line numbers are used.

If you are adventurous, try using line numbers with a, r and w as well. You
will find that a will append lines after the line number that you specify
(rather than after dot); that r reads a file in after the line number you
specify (not necessarily at the end of the buffer); and that w will write
out exactly the lines you specify, not necessarily the whole buffer. These
variations are sometimes handy. For instance you can insert a file at the
beginning of a buffer by saying

    0r filename

and you can enter lines at the beginning of the buffer by saying

    0a
    . . . text . . .
    .

Notice that .w is very different from

    .
    w

Modifying text: the Substitute command ‘‘s’’

We are now ready to try one of the most important of all commands – the
substitute command

    s

This is the command that is used to change individual words or letters
within a line or group of lines. It is what you use, for example, for
correcting spelling mistakes and typing errors.

Suppose that by a typing error, line 1 says

    Now is th time

– the e has been left off the. You can use s to fix this up as follows:

    1s/th/the/

This says: ‘‘in line 1, substitute for the characters th the characters
the.’’ To verify that it works (ed will not print the result automatically)
say

    p

and get

    Now is the time

which is what you wanted. Notice that dot must have been set to the line
where the substitution took place, since the p command printed that line.
Dot is always set this way with the s command.

The general way to use the substitute command is

    starting-line, ending-line s/change this/to this/

Whatever string of characters is between the first pair of slashes is
replaced by whatever is between the second pair, in all the lines between
starting-line and ending-line. Only the first occurrence on each line is
changed, however. If you want to change every occurrence, see Exercise 5.
The rules for line numbers are the same as those for p, except that dot is
set to the last line changed. (But there is a trap for the unwary: if no
substitution took place, dot is not changed. This causes an error ? as a
warning.)

Thus you can say

    1,$s/speling/spelling/

and correct the first spelling mistake on each line in the text. (This is
useful for people who are consistent misspellers!)

If no line numbers are given, the s command assumes we mean ‘‘make the
substitution on line dot’’, so it changes things only on the current line.
This leads to the very common sequence

    s/something/something else/p

which makes some correction on the current line, and then prints it, to
make sure it worked out right. If it didn’t, you can try again. (Notice
that there is a p on the same line as the s command. With few exceptions, p
can follow any command; no other multi-command lines are legal.)

It’s also legal to say

    s/ . . . //

which means ‘‘change the first string of characters to nothing’’, i.e.,
remove them. This is useful for deleting extra words in a line or removing
extra letters from words. For instance, if you had

    Nowxx is the time

you can say

    s/xx//p

to get

    Now is the time

Notice that // (two adjacent slashes) means ‘‘no characters’’, not a blank.
There is a difference! (See below for another meaning of //.)

Exercise 5:

Experiment with the substitute command. See what happens if you substitute
for some word on a line with several occurrences of that word. For example,
do this:

    a
    the other side of the coin
    .
    s/the/on the/p

You will get
    on the other side of the coin

A substitute command changes only the first occurrence of the first string.
You can change all occurrences by adding a g (for ‘‘global’’) to the s
command, like this:

    s/ . . . / . . . /gp

Try other characters instead of slashes to delimit the two sets of
characters in the s command – anything should work except blanks or tabs.

(If you get funny results using any of the characters

    ^ . $ [ * \ &

read the section on ‘‘Special Characters’’.)

Context searching – ‘‘/ . . . /’’

With the substitute command mastered, you can move on to another highly
important idea of ed – context searching.

Suppose you have the original three line text in the buffer:

    Now is the time
    for all good men
    to come to the aid of their party.

Suppose you want to find the line that contains their so you can change it
to the. Now with only three lines in the buffer, it’s pretty easy to keep
track of what line the word their is on. But if the buffer contained
several hundred lines, and you’d been making changes, deleting and
rearranging lines, and so on, you would no longer really know what this
line number would be. Context searching is simply a method of specifying
the desired line, regardless of what its number is, by specifying some
context on it.

The way to say ‘‘search for a line that contains this particular string of
characters’’ is to type

    /string of characters we want to find/

For example, the ed command

    /their/

is a context search which is sufficient to find the desired line – it will
locate the next occurrence of the characters between slashes (‘‘their’’).
It also sets dot to that line and prints the line for verification:

    to come to the aid of their party.

‘‘Next occurrence’’ means that ed starts looking for the string at line
.+1, searches to the end of the buffer, then continues at line 1 and
searches to line dot. (That is, the search ‘‘wraps around’’ from $ to 1.)
It scans all the lines in the buffer until it either finds the desired line
or gets back to dot again. If the given string of characters can’t be found
in any line, ed types the error message

    ?

Otherwise it prints the line it found.

You can do both the search for the desired line and a substitution all at
once, like this:

    /their/s/their/the/p

which will yield

    to come to the aid of the party.

There were three parts to that last command: context search for the desired
line, make the substitution, print the line.

The expression /their/ is a context search expression. In their simplest
form, all context search expressions are like this – a string of characters
surrounded by slashes. Context searches are interchangeable with line
numbers, so they can be used by themselves to find and print a desired
line, or as line numbers for some other command, like s. They were used
both ways in the examples above.

Suppose the buffer contains the three familiar lines

    Now is the time
    for all good men
    to come to the aid of their party.

Then the ed line numbers

    /Now/+1
    /good/
    /party/-1

are all context search expressions, and they all refer to the same line
(line 2). To make a change in line 2, you could say

    /Now/+1s/good/bad/

or

    /good/s/good/bad/

or

    /party/-1s/good/bad/

The choice is dictated only by convenience. You could print all three lines
by, for instance

    /Now/,/party/p

or

    /Now/,/Now/+2p

or by any number of similar combinations. The first one of these might be
better if you don’t know how many lines are involved. (Of course, if there
were only three lines in the buffer, you’d use

    1,$p

but not if there were several hundred.)
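The ‘‘wrap around’’ order of a context search can be written out as a short sketch. It is a Python model under stated assumptions, not ed's implementation: the function name context_search is hypothetical, and the match is a plain substring test, so none of the special characters are handled.

```python
def context_search(lines, dot, text):
    """Return the number of the next line containing `text`, or None.

    Lines are numbered from 1. The scan starts at dot+1, runs to the last
    line, wraps around to line 1, and finishes with line dot itself.
    """
    n = len(lines)
    for step in range(1, n + 1):
        number = (dot - 1 + step) % n + 1   # dot+1, ..., $, 1, ..., dot
        if text in lines[number - 1]:
            return number
    return None                             # ed would type ?


lines = [
    "Now is the time",
    "for all good men",
    "to come to the aid of their party.",
]
found_their = context_search(lines, 1, "their")    # /their/ with dot at 1
wrapped = context_search(lines, 3, "Now")          # wraps from $ to 1
missing = context_search(lines, 1, "xyzzy")        # no such line
same_line = (
    context_search(lines, 3, "Now") + 1,           # /Now/+1
    context_search(lines, 3, "good"),              # /good/
    context_search(lines, 3, "party") - 1,         # /party/-1
)
print(found_their, wrapped, missing, same_line)
```

Because the result is just a line number, arithmetic such as +1 or -1 applies to it directly, which is why the three expressions in same_line all name line 2.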
The basic rule is: a context search expression is the same as a line
number, so it can be used wherever a line number is needed.

Exercise 6:

Experiment with context searching. Try a body of text with several
occurrences of the same string of characters, and scan through it using the
same context search.

Try using context searches as line numbers for the substitute, print and
delete commands. (They can also be used with r, w, and a.)

Try context searching using ?text? instead of /text/. This scans lines in
the buffer in reverse order rather than normal. This is sometimes useful if
you go too far while looking for some string of characters – it’s an easy
way to back up.

(If you get funny results with any of the characters

    ^ . $ [ * \ &

read the section on ‘‘Special Characters’’.)

Ed provides a shorthand for repeating a context search for the same string.
For example, the ed line number

    /string/

will find the next occurrence of string. It often happens that this is not
the desired line, so the search must be repeated. This can be done by
typing merely

    //

This shorthand stands for ‘‘the most recently used context search
expression.’’ It can also be used as the first string of the substitute
command, as in

    /string1/s//string2/

which will find the next occurrence of string1 and replace it by string2.
This can save a lot of typing. Similarly

    ??

means ‘‘scan backwards for the same expression.’’

Change and Insert – ‘‘c’’ and ‘‘i’’

This section discusses the change command

    c

which is used to change or replace a group of one or more lines, and the
insert command

    i

which is used for inserting a group of one or more lines.

‘‘Change’’, written as

    c

is used to replace a number of lines with different lines, which are typed
in at the terminal. For example, to change lines .+1 through $ to something
else, type

    .+1,$c
    . . . type the lines of text you want here . . .
    .

The lines you type between the c command and the . will take the place of
the original lines between start line and end line. This is most useful in
replacing a line or several lines which have errors in them.

If only one line is specified in the c command, then just that line is
replaced. (You can type in as many replacement lines as you like.) Notice
the use of . to end the input – this works just like the . in the append
command and must appear by itself on a new line. If no line number is
given, line dot is replaced. The value of dot is set to the last line you
typed in.

‘‘Insert’’ is similar to append – for instance

    /string/i
    . . . type the lines to be inserted here . . .
    .

will insert the given text before the next line that contains ‘‘string’’.
The text between i and . is inserted before the specified line. If no line
number is specified dot is used. Dot is set to the last line inserted.

Exercise 7:

‘‘Change’’ is rather like a combination of delete followed by insert.
Experiment to verify that

    start, end d
    i
    . . . text . . .
    .

is almost the same as

    start, end c
    . . . text . . .
    .

These are not precisely the same if line $ gets deleted. Check this out.
What is dot?

Experiment with a and i, to see that they are similar, but not the same.
You will observe that

    line-number a
    . . . text . . .
    .

appends after the given line, while

    line-number i
    . . . text . . .
    .

inserts before it. Observe that if no line number is
given, i inserts before line dot, while a appends after line dot.

Moving text around: the ‘‘m’’ command

The move command m is used for cutting and pasting – it lets you move a
group of lines from one place to another in the buffer. Suppose you want to
put the first three lines of the buffer at the end instead. You could do it
by saying:

    1,3w temp
    $r temp
    1,3d

(Do you see why?) but you can do it a lot easier with the m command:

    1,3m$

The general case is

    start line, end line m after this line

Notice that there is a third line to be specified – the place where the
moved stuff gets put. Of course the lines to be moved can be specified by
context searches; if you had

    First paragraph
    ...
    end of first paragraph.
    Second paragraph
    ...
    end of second paragraph.

you could reverse the two paragraphs like this:

    /Second/,/end of second/m/First/-1

Notice the -1: the moved text goes after the line mentioned. Dot gets set
to the last line moved.

The global commands ‘‘g’’ and ‘‘v’’

The global command g is used to execute one or more ed commands on all
those lines in the buffer that match some specified string. For example

    g/peling/p

prints all lines that contain peling. More usefully,

    g/peling/s//pelling/gp

makes the substitution everywhere on the line, then prints each corrected
line. Compare this to

    1,$s/peling/pelling/gp

which only prints the last line substituted. Another subtle difference is
that the g command does not give a ? if peling is not found where the s
command will.

There may be several commands (including a, c, i, r, w, but not g); in that
case, every line except the last must end with a backslash \:

    g/xxx/.-1s/abc/def/\
    .+2s/ghi/jkl/\
    .-2,.p

makes changes in the lines before and after each line that contains xxx,
then prints all three lines.

The v command is the same as g, except that the commands are executed on
every line that does not match the string following v:

    v/ /d

deletes every line that does not contain a blank.

Special Characters

You may have noticed that things just don’t work right when you used some
characters like ., *, $, and others in context searches and the substitute
command. The reason is rather complex, although the cure is simple.
Basically, ed treats these characters as special, with special meanings.
For instance, in a context search or the first string of the substitute
command only, . means ‘‘any character,’’ not a period, so

    /x.y/

means ‘‘a line with an x, any character, and a y,’’ not just ‘‘a line with
an x, a period, and a y.’’ A complete list of the special characters that
can cause trouble is the following:

    ^ . $ [ * \

Warning: The backslash character \ is special to ed. For safety’s sake,
avoid it where possible. If you have to use one of the special characters
in a substitute command, you can turn off its magic meaning temporarily by
preceding it with the backslash. Thus

    s/\\\.\*/backslash dot star/

will change \.* into ‘‘backslash dot star’’.

Here is a hurried synopsis of the other special characters. First, the
circumflex ^ signifies the beginning of a line. Thus

    /^string/

finds string only if it is at the beginning of a line: it will find

    string

but not

    the string...

The dollar-sign $ is just the opposite of the circumflex; it means the end
of a line:

    /string$/

will only find an occurrence of string that is at the end of some line.
This implies, of course, that

    /^string$/
will find only a line that contains just string, and

    /^.$/

finds a line containing exactly one character.

The character ., as we mentioned above, matches anything;

    /x.y/

matches any of

    x+y
    x-y
    x y
    x.y

This is useful in conjunction with *, which is a repetition character; a*
is a shorthand for ‘‘any number of a’s,’’ so .* matches any number of
anythings. This is used like this:

    s/.*/stuff/

which changes an entire line, or

    s/.*,//

which deletes all characters in the line up to and including the last
comma. (Since .* finds the longest possible match, this goes up to the last
comma.)

[ is used with ] to form ‘‘character classes’’; for example,

    /[0123456789]/

matches any single digit – any one of the characters inside the brackets
will cause a match. This can be abbreviated to [0-9].

Finally, the & is another shorthand character – it is used only on the
right-hand part of a substitute command where it means ‘‘whatever was
matched on the left-hand side’’. It is used to save typing. Suppose the
current line contained

    Now is the time

and you wanted to put parentheses around it. You could just retype the
line, but this is tedious. Or you could say

    s/^/(/
    s/$/)/

using your knowledge of ^ and $. But the easiest way uses the &:

    s/.*/(&)/

This says ‘‘match the whole line, and replace it by itself surrounded by
parentheses.’’ The & can be used several times in a line; consider using

    s/.*/&? &!!/

to produce

    Now is the time? Now is the time!!

You don’t have to match the whole line, of course: if the buffer contains

    the end of the world

you could type

    /world/s//& is at hand/

to produce

    the end of the world is at hand

Observe this expression carefully, for it illustrates how to take advantage
of ed to save typing. The string /world/ found the desired line; the
shorthand // found the same word in the line; and the & saves you from
typing it again.

The & is a special character only within the replacement text of a
substitute command, and has no special meaning elsewhere. You can turn off
the special meaning of & by preceding it with a \:

    s/ampersand/\&/

will convert the word ‘‘ampersand’’ into the literal symbol & in the
current line.

Summary of Commands and Line Numbers

The general form of ed commands is the command name, perhaps preceded by
one or two line numbers, and, in the case of e, r, and w, followed by a
file name. Only one command is allowed per line, but a p command may follow
any other command (except for e, r, w and q).
a: Append, that is, add lines to the buffer (at line dot, unless a
different line is specified). Appending continues until . is typed on a new
line. Dot is set to the last line appended.
c: Change the specified lines to the new text which follows. The new lines
are terminated by a ., as with a. If no lines are specified, replace line
dot. Dot is set to last line changed.
d: Delete the lines specified. If none are specified, delete line dot. Dot
is set to the first undeleted line, unless $ is deleted, in which case dot
is set to $.
e: Edit new file. Any previous contents of the buffer are thrown away, so
issue a w beforehand.
f: Print remembered filename. If a name follows f the remembered name will
be set to it.
g: The command
    g/---/commands
will execute the commands on those lines that contain ---, which can be any
context search expression.
i: Insert lines before specified line (or dot) until a . is typed on a new
line. Dot is set to last line inserted.
m: Move lines specified to after the line named after m. Dot is set to the
last line moved.
p: Print specified lines. If none specified, print line
dot. A single line number is equivalent to line-
number p. A single return prints .+1, the next line.
q: Quit ed. Wipes out all text in buffer if you give
it twice in a row without first giving a w command.
r: Read a file into buffer (at end unless specified
elsewhere.) Dot set to last line read.
s: The command
    s/string1/string2/
replaces the characters string1 by string2 in the specified lines. If no
lines are specified, make the substitution in line dot. Dot is set to last
line in which a substitution took place, which means that if no
substitution took place, dot is not changed. s changes only the first
occurrence of string1 on a line; to change all of them, type a g after the
final slash.
v: The command
    v/---/commands
executes commands on those lines that do not contain ---.
w: Write out buffer onto a file. Dot is not changed.
.=: Print value of dot. (= by itself prints the value of
$.)
!: The line
    !command-line
causes command-line to be executed as a UNIX command.
/-----/: Context search. Search for next line which
contains this string of characters. Print it. Dot is set
to the line where string was found. Search starts at
.+1, wraps around from $ to 1, and continues to dot,
if necessary.
?-----?: Context search in reverse direction. Start search at .-1, scan to
1, wrap around to $.
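The substitute rules summarized above (first occurrence only unless g is given, & standing for the matched text, \& for a literal ampersand) can be tried out with a small sketch. It is written in Python and leans on Python's re module, whose pattern syntax agrees with ed's only for simple cases such as . * ^ $ and [0-9]; the function name substitute and the parameter everywhere are inventions for this illustration.

```python
import re


def substitute(line, pattern, replacement, everywhere=False):
    """Model of ed's s/pattern/replacement/ (with a trailing g if everywhere).

    In `replacement`, & stands for the matched text and a backslash makes
    the following character literal, as described for ed.
    """

    def expand(match):
        out = []
        i = 0
        while i < len(replacement):
            ch = replacement[i]
            if ch == "\\" and i + 1 < len(replacement):
                out.append(replacement[i + 1])   # \& gives a literal &
                i += 2
            elif ch == "&":
                out.append(match.group(0))       # whatever was matched
                i += 1
            else:
                out.append(ch)
                i += 1
        return "".join(out)

    # count=0 asks re.sub for every occurrence, count=1 for the first only.
    return re.sub(pattern, expand, line, count=0 if everywhere else 1)


first_only = substitute("speling and more speling", "speling", "spelling")
all_of_them = substitute("speling and more speling", "speling", "spelling",
                         everywhere=True)
wrapped = substitute("Now is the time", ".*", "(&)")
doubled = substitute("Now is the time", ".*", "&? &!!")
literal = substitute("ampersand", "ampersand", r"\&")
after_last_comma = substitute("a, b, c", ".*,", "")
print(first_only, all_of_them, wrapped, doubled, literal, after_last_comma,
      sep="\n")
```

The last call shows the ‘‘longest possible match’’ rule: .*, consumes everything up to the final comma, leaving only the text after it.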
Advanced Editing on UNIX

Brian W. Kernighan
Bell Laboratories
Murray Hill, New Jersey 07974

ABSTRACT

This paper is meant to help secretaries, typists and programmers to make effec-
tive use of the UNIX† facilities for preparing and editing text. It provides explanations
and examples of
• special characters, line addressing and global commands in the editor ed;
• commands for ‘‘cut and paste’’ operations on files and parts of files, including
the mv, cp, cat and rm commands, and the r, w, m and t commands of the edi-
tor;
• editing scripts and editor-based programs like grep and sed.
Although the treatment is aimed at non-programmers, new users with any back-
ground should find helpful hints on how to get their jobs done more easily.

August 4, 1978

_______________
†UNIX is a Trademark of Bell Laboratories.
1. INTRODUCTION

Although UNIX† provides remarkably effective tools for text editing, that
by itself is no guarantee that everyone will automatically make the most
effective use of them. In particular, people who are not computer
specialists (typists, secretaries, casual users) often use the system less
effectively than they might.

This document is intended as a sequel to A Tutorial Introduction to the
UNIX Text Editor [1], providing explanations and examples of how to edit
with less effort. (You should also be familiar with the material in UNIX
For Beginners [2].) Further information on all commands discussed here can
be found in The UNIX Programmer’s Manual [3].

Examples are based on observations of users and the difficulties they
encounter. Topics covered include special characters in searches and
substitute commands, line addressing, the global commands, and line moving
and copying. There are also brief discussions of effective use of related
tools, like those for file manipulation, and those based on ed, like grep
and sed.

A word of caution. There is only one way to learn to use something, and
that is to use it. Reading a description is no substitute for trying
something. A paper like this one should give you ideas about what to try,
but until you actually try something, you will not learn it.

2. SPECIAL CHARACTERS

The editor ed is the primary interface to the system for many people, so it
is worthwhile to know how to get the most out of ed for the least effort.

The next few sections will discuss shortcuts and labor-saving devices. Not
all of these will be instantly useful to any one person, of course, but a
few will be, and the others should give you ideas to store away for future
use. And as always, until you try these things, they will remain
theoretical knowledge, not something you have confidence in.

The List command ‘l’

ed provides two commands for printing the contents of the lines you’re
editing. Most people are familiar with p, in combinations like

    1,$p

to print all the lines you’re editing, or

    s/abc/def/p

to change ‘abc’ to ‘def’ on the current line. Less familiar is the list
command l (the letter ‘l’), which gives slightly more information than p.
In particular, l makes visible characters that are normally invisible, such
as tabs and backspaces. If you list a line that contains some of these, l
will print each tab as −> and each backspace as −<. This makes it much
easier to correct the sort of typing mistake that inserts extra spaces
adjacent to tabs, or inserts a backspace followed by a space.

The l command also ‘folds’ long lines for printing: any line that exceeds
72 characters is printed on multiple lines; each printed line except the
last is terminated by a backslash \, so you can tell it was folded. This is
useful for printing long lines on short terminals.

Occasionally the l command will print in a line a string of numbers
preceded by a backslash, such as \07 or \16. These combinations are used to
make visible characters that normally don’t print, like form feed or
vertical tab or bell. Each such combination is a single character. When you
see such characters, be wary: they may have surprising meanings when
printed on some terminals. Often their presence means that your finger
slipped while you were typing; you almost never want them.

The Substitute Command ‘s’

Most of the next few sections will be taken up with a discussion of the
substitute command s. Since this is the command for changing the contents
of individual lines, it probably has the most complexity of any ed command,
and the most potential for effective use.

As the simplest place to begin, recall the meaning of a trailing g after a
substitute command.
With
    s/this/that/
and
    s/this/that/g
the first one replaces the first ‘this’ on the line with ‘that’. If there
is more than one ‘this’ on the line, the second form with the trailing g
changes all of them.

Either form of the s command can be followed by p or l to ‘print’ or
‘list’ (as described in the previous section) the contents of the line:
    s/this/that/p
    s/this/that/l
    s/this/that/gp
    s/this/that/gl
are all legal, and mean slightly different things. Make sure you know
what the differences are.

Of course, any s command can be preceded by one or two ‘line numbers’ to
specify that the substitution is to take place on a group of lines. Thus
    1,$s/mispell/misspell/
changes the first occurrence of ‘mispell’ to ‘misspell’ on every line of
the file. But
    1,$s/mispell/misspell/g
changes every occurrence in every line (and this is more likely to be
what you wanted in this particular case).

You should also notice that if you add a p or l to the end of any of
these substitute commands, only the last line that got changed will be
printed, not all the lines. We will talk later about how to print all
the lines that were modified.

The Undo Command ‘u’

Occasionally you will make a substitution in a line, only to realize too
late that it was a ghastly mistake. The ‘undo’ command u lets you ‘undo’
the last substitution: the last line that was substituted can be restored
to its previous state by typing the command
    u

The Metacharacter ‘.’

As you have undoubtedly noticed when you use ed, certain characters have
unexpected meanings when they occur in the left side of a substitute
command, or in a search for a particular line. In the next several
sections, we will talk about these special characters, which are often
called ‘metacharacters’.

The first one is the period ‘.’. On the left side of a substitute
command, or in a search with ‘/.../’, ‘.’ stands for any single
character. Thus the search
    /x.y/
finds any line where ‘x’ and ‘y’ occur separated by a single character,
as in
    x+y
    x-y
    x␣y
    x.y
and so on. (We will use ␣ to stand for a space whenever we need to make
it visible.)

Since ‘.’ matches a single character, that gives you a way to deal with
funny characters printed by l. Suppose you have a line that, when printed
with the l command, appears as
    .... th\07is ....
and you want to get rid of the \07 (which represents the bell character,
by the way).

The most obvious solution is to try
    s/\07//
but this will fail. (Try it.) The brute force solution, which most people
would now take, is to re-type the entire line. This is guaranteed, and is
actually quite a reasonable tactic if the line in question isn’t too big,
but for a very long line, re-typing is a bore. This is where the
metacharacter ‘.’ comes in handy. Since ‘\07’ really represents a single
character, if we say
    s/th.is/this/
the job is done. The ‘.’ matches the mysterious character between the ‘h’
and the ‘i’, whatever it is.

Bear in mind that since ‘.’ matches any single character, the command
    s/./,/
converts the first character on a line into a ‘,’, which very often is
not what you intended.

As is true of many characters in ed, the ‘.’ has several meanings,
depending on its context. This line shows all three:
    .s/././
The first ‘.’ is a line number, the number of the line we are editing,
which is called ‘line dot’. (We will discuss line dot more in Section 3.)
The second ‘.’ is a metacharacter that matches any single character on
that line. The third ‘.’ is the only one that really is an honest literal
period. On the right side of a substitution, ‘.’ is not special. If you
apply this command to the line
    Now is the time.
the result will be
    .ow is the time.
which is probably not what you intended.
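The behavior of ‘.’ described above can also be tried outside ed. What follows is only a sketch in Python, whose re module happens to treat ‘.’ the same way in these simple cases; the helper name sub_first is invented for illustration, and Python's regular expressions differ from ed's in other respects.

```python
import re

def sub_first(pattern, replacement, line):
    """Mimic ed's s/pattern/replacement/ with no trailing g:
    only the first match on the line is changed."""
    return re.sub(pattern, replacement, line, count=1)

# chr(7) is the bell character that ed's l command displays as \07.
belled = "say th" + chr(7) + "is now"

# '.' stands for the one mysterious character between 'h' and 'i'.
print(repr(sub_first("th.is", "this", belled)))

# The pattern '.' matches the first character, whatever it is; the
# replacement '.' is an ordinary period.
print(sub_first(".", ".", "Now is the time."))
```

The second call reproduces the ‘.ow is the time.’ result of the ‘.s/././’ command discussed in the text.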

The Backslash ‘\’

Since a period means ‘any character’, the question naturally arises of
what to do when you really want a period. For example, how do you convert
the line
    Now is the time.
into
    Now is the time?
The backslash ‘\’ does the job. A backslash turns off any special meaning
that the next character might have; in particular, ‘\.’ converts the ‘.’
from a ‘match anything’ into a period, so you can use it to replace the
period in
    Now is the time.
like this:
    s/\./?/
The pair of characters ‘\.’ is considered by ed to be a single real
period.

The backslash can also be used when searching for lines that contain a
special character. Suppose you are looking for a line that contains
    .PP
The search
    /.PP/
isn’t adequate, for it will find a line like
    THE APPLICATION OF ...
because the ‘.’ matches the letter ‘A’. But if you say
    /\.PP/
you will find only lines that contain ‘.PP’.

The backslash can also be used to turn off special meanings for
characters other than ‘.’. For example, consider finding a line that
contains a backslash. The search
    /\/
won’t work, because the ‘\’ isn’t a literal ‘\’, but instead means that
the second ‘/’ no longer delimits the search. But by preceding a
backslash with another one, you can search for a literal backslash. Thus
    /\\/
does work. Similarly, you can search for a forward slash ‘/’ with
    /\//
The backslash turns off the meaning of the immediately following ‘/’ so
that it doesn’t terminate the /.../ construction prematurely.

As an exercise, before reading further, find two substitute commands each
of which will convert the line
    \x\.\y
into the line
    \x\y
Here are several solutions; verify that each works as advertised.
    s/\\\.//
    s/x../x/
    s/..y/y/

A couple of miscellaneous notes about backslashes and special characters.
First, you can use any character to delimit the pieces of an s command:
there is nothing sacred about slashes. (But you must use slashes for
context searching.) For instance, in a line that contains a lot of
slashes already, like
    //exec //sys.fort.go // etc...
you could use a colon as the delimiter; to delete all the slashes, type
    s:/::g
Second, if # and @ are your character erase and line kill characters, you
have to type \# and \@; this is true whether you’re talking to ed or any
other program.

When you are adding text with a or i or c, backslash is not special, and
you should only put in one backslash for each one you really want.

The Dollar Sign ‘$’

The next metacharacter, the ‘$’, stands for ‘the end of the line’. As its
most obvious use, suppose you have the line
    Now is the
and you wish to add the word ‘time’ to the end. Use the $ like this:
    s/$/ time/
to get
    Now is the time
Notice that a space is needed before ‘time’ in the substitute command, or
you will get
    Now is thetime

As another example, replace the second comma in the following line with a
period without altering the first:
    Now is the time, for all good men,
The command needed is
    s/,$/./
The $ sign here provides context to make specific which comma we mean.
Without it, of course, the s command would operate on the first comma to
produce
    Now is the time. for all good men,

As another example, to convert
    Now is the time.
into
    Now is the time?
as we did earlier, we can use
    s/.$/?/

Like ‘.’, the ‘$’ has multiple meanings depending on context. In the line
    $s/$/$/
the first ‘$’ refers to the last line of the file, the second refers to
the end of that line, and the third is a literal dollar sign, to be added
to that line.

The Circumflex ‘^’

The circumflex (or hat or caret) ‘^’ stands for the beginning of the
line. For example, suppose you are looking for a line that begins with
‘the’. If you simply say
    /the/
you will in all likelihood find several lines that contain ‘the’ in the
middle before arriving at the one you want. But with
    /^the/
you narrow the context, and thus arrive at the desired one more easily.

The other use of ‘^’ is of course to enable you to insert something at
the beginning of a line:
    s/^/ /
places a space at the beginning of the current line.

Metacharacters can be combined. To search for a line that contains only
the characters
    .PP
you can use the command
    /^\.PP$/

The Star ‘*’

Suppose you have a line that looks like this:
    text x                y text
where text stands for lots of text, and there are some indeterminate
number of spaces between the x and the y. Suppose the job is to replace
all the spaces between x and y by a single space. The line is too long to
retype, and there are too many spaces to count. What now?

This is where the metacharacter ‘*’ comes in handy. A character followed
by a star stands for as many consecutive occurrences of that character as
possible. To refer to all the spaces at once, say
    s/x␣*y/x␣y/
The construction ‘␣*’ means ‘as many spaces as possible’. Thus ‘x␣*y’
means ‘an x, as many spaces as possible, then a y’.

The star can be used with any character, not just space. If the original
example was instead
    text x--------y text
then all ‘-’ signs can be replaced by a single space with the command
    s/x-*y/x␣y/

Finally, suppose that the line was
    text x..................y text
Can you see what trap lies in wait for the unwary? If you blindly type
    s/x.*y/x␣y/
what will happen? The answer, naturally, is that it depends. If there are
no other x’s or y’s on the line, then everything works, but it’s blind
luck, not good management. Remember that ‘.’ matches any single
character? Then ‘.*’ matches as many single characters as possible, and
unless you’re careful, it can eat up a lot more of the line than you
expected. If the line was, for example, like this:
    text x text x................y text y text
then saying
    s/x.*y/x␣y/
will take everything from the first ‘x’ to the last ‘y’, which, in this
example, is undoubtedly more than you wanted.

The solution, of course, is to turn off the special meaning of ‘.’ with
‘\.’:
    s/x\.*y/x␣y/
Now everything works, for ‘\.*’ means ‘as many periods as possible’.

There are times when the pattern ‘.*’ is exactly what you want. For
example, to change
    Now is the time for all good men ....
into
    Now is the time.
use ‘.*’ to eat up everything after the ‘for’:
    s/ for.*/./

There are a couple of additional pitfalls associated with ‘*’ that you
should be aware of. Most notable is the fact that ‘as many as possible’
means zero or more. The fact that zero is a legitimate possibility is
sometimes rather surprising. For example, if our line contained
    text xy text x                y text
and we said
    s/x␣*y/x␣y/
the first ‘xy’ matches this pattern, for it consists of an ‘x’, zero
spaces, and a ‘y’. The result is that the substitute acts on the first
‘xy’, and does not touch the later one that actually contains some
intervening spaces.

The way around this, if it matters, is to specify a pattern like
    /x␣␣*y/
which says ‘an x, a space, then as many more spaces as possible, then a
y’, in other words, one or more spaces.

The other startling behavior of ‘*’ is again related to the fact that
zero is a legitimate number of occurrences of something followed by a
star. The command
    s/x*/y/g
when applied to the line
    abcdef
produces
    yaybycydyeyfy
which is almost certainly not what was intended. The reason for this
behavior is that zero is a legal number of matches, and there are no x’s
at the beginning of the line (so that gets converted into a ‘y’), nor
between the ‘a’ and the ‘b’ (so that gets converted into a ‘y’), nor ...
and so on. Make sure you really want zero matches; if not, in this case
write
    s/xx*/y/g
‘xx*’ is one or more x’s.

The Brackets ‘[ ]’

Suppose that you want to delete any numbers that appear at the beginning
of all lines of a file. You might first think of trying a series of
commands like
    1,$s/^1*//
    1,$s/^2*//
    1,$s/^3*//
and so on, but this is clearly going to take forever if the numbers are
at all long. Unless you want to repeat the commands over and over until
finally all numbers are gone, you must get all the digits on one pass.
This is the purpose of the brackets [ and ].

The construction
    [0123456789]
matches any single digit; the whole thing is called a ‘character class’.
With a character class, the job is easy. The pattern ‘[0123456789]*’
matches zero or more digits (an entire number), so
    1,$s/^[0123456789]*//
deletes all digits from the beginning of all lines.

Any characters can appear within a character class, and just to confuse
the issue there are essentially no special characters inside the
brackets; even the backslash doesn’t have a special meaning. To search
for special characters, for example, you can say
    /[.\$^[]/
Within [...], the ‘[’ is not special. To get a ‘]’ into a character
class, make it the first character.

It’s a nuisance to have to spell out the digits, so you can abbreviate
them as [0-9]; similarly, [a-z] stands for the lower case letters, and
[A-Z] for upper case.

As a final frill on character classes, you can specify a class that means
‘none of the following characters’. This is done by beginning the class
with a ‘^’:
    [^0-9]
stands for ‘any character except a digit’. Thus you might find the first
line that doesn’t begin with a tab or space by a search like
    /^[^(space)(tab)]/

Within a character class, the circumflex has a special meaning only if it
occurs at the beginning. Just to convince yourself, verify that
    /^[^^]/
finds a line that doesn’t begin with a circumflex.

The Ampersand ‘&’

The ampersand ‘&’ is used primarily to save typing. Suppose you have the
line
    Now is the time
and you want to make it
    Now is the best time
Of course you can always say
    s/the/the best/
but it seems silly to have to repeat the ‘the’. The ‘&’ is used to
eliminate the repetition. On the right side of a substitute, the
ampersand means ‘whatever was just matched’, so you can say
    s/the/& best/
and the ‘&’ will stand for ‘the’. Of course this isn’t much of a saving
if the thing matched is just ‘the’, but if it is something truly long or
awful, or if it is something like ‘.*’ which matches a lot of text, you
can save some tedious typing. There is also much less chance of making a
typing error in the replacement text. For example, to parenthesize a
line, regardless of its length,
    s/.*/(&)/

The ampersand can occur more than once on the right side:
    s/the/& best and & worst/
makes
    Now is the best and the worst time
and
    s/.*/&? &!!/
converts the original line into
    Now is the time? Now is the time!!

To get a literal ampersand, naturally the backslash is used to turn off
the special meaning:
    s/ampersand/\&/
converts the word into the symbol. Notice that ‘&’ is not special on the
left side of a substitute, only on the right side.

Substituting Newlines

ed provides a facility for splitting a single line into two or more
shorter lines by ‘substituting in a newline’. As the simplest example,
suppose a line has gotten unmanageably long because of editing (or merely
because it was unwisely typed). If it looks like
    text xy text
you can break it between the ‘x’ and the ‘y’ like this:
    s/xy/x\
    y/
This is actually a single command, although it is typed on two lines.
Bearing in mind that ‘\’ turns off special meanings, it seems relatively
intuitive that a ‘\’ at the end of a line would make the newline there no
longer special.

You can in fact make a single line into several lines with this same
mechanism. As a large example, consider underlining the word ‘very’ in a
long line by splitting ‘very’ onto a separate line, and preceding it by
the roff or nroff formatting command ‘.ul’.
    text a very big text
The command
    s/ very /\
    .ul\
    very\
    /
converts the line into four shorter lines, preceding the word ‘very’ by
the line ‘.ul’, and eliminating the spaces around the ‘very’, all at the
same time.

When a newline is substituted in, dot is left pointing at the last line
created.

Joining Lines

Lines may also be joined together, but this is done with the j command
instead of s. Given the lines
    Now is
    ␣the time
and supposing that dot is set to the first of them, then the command
    j
joins them together. No blanks are added, which is why we carefully
showed a blank at the beginning of the second line.

All by itself, a j command joins line dot to line dot+1, but any
contiguous set of lines can be joined. Just specify the starting and
ending line numbers. For example,
    1,$jp
joins all the lines into one big one and prints it. (More on line numbers
in Section 3.)

Rearranging a Line with \( ... \)

(This section should be skipped on first reading.) Recall that ‘&’ is a
shorthand that stands for whatever was matched by the left side of an s
command. In much the same way you can capture separate pieces of what was
matched; the only difference is that you have to specify on the left side
just what pieces you’re interested in.

Suppose, for instance, that you have a file of lines that consist of
names in the form
    Smith, A. B.
    Jones, C.
and so on, and you want the initials to precede the name, as in
    A. B. Smith
    C. Jones
It is possible to do this with a series of editing commands, but it is
tedious and error-prone. (It is instructive to figure out how it is done,
though.)

The alternative is to ‘tag’ the pieces of the pattern (in this case, the
last name, and the initials), and then rearrange the pieces. On the left
side of a substitution, if part of the pattern is enclosed between \( and
\), whatever matched that part is remembered, and available for use on
the right side. On the right side, the symbol ‘\1’ refers to whatever
matched the first \(...\) pair, ‘\2’ to the second \(...\), and so on.

The command
    1,$s/^\([^,]*\), *\(.*\)/\2 \1/
although hard to read, does the job. The first \(...\) matches the last
name, which is any string up to the comma; this is referred to on the
right side with ‘\1’. The second \(...\) is whatever follows the comma
and any spaces, and is referred to as ‘\2’.

Of course, with any editing sequence this complicated, it’s foolhardy to
simply run it and hope. The global commands g and v discussed in section
4 provide a way for you to print exactly those lines which were affected
by the substitute command, and thus verify that it did what you wanted in
all cases.

3. LINE ADDRESSING IN THE EDITOR

The next general area we will discuss is that of line addressing in ed,
that is, how you specify what lines are to be affected by editing
commands. We have already used constructions like
    1,$s/x/y/
to specify a change on all lines. And most users are long since familiar
with using a single newline (or return) to print the next line, and with
    /thing/
to find a line that contains ‘thing’. Less familiar, surprisingly enough,
is the use of
    ?thing?
to scan backwards for the previous occurrence of ‘thing’. This is
especially handy when you realize that the thing you want to operate on
is back up the page from where you are currently editing.

The slash and question mark are the only characters you can use to
delimit a context search, though you can use essentially any character in
a substitute command.

Address Arithmetic

The next step is to combine the line numbers like ‘.’, ‘$’, ‘/.../’ and
‘?...?’ with ‘+’ and ‘-’. Thus
    $-1
is a command to print the next to last line of the current file (that is,
one line before line ‘$’). For example, to recall how far you got in a
previous editing session,
    $-5,$p
prints the last six lines. (Be sure you understand why it’s six, not
five.) If there aren’t six, of course, you’ll get an error message.

As another example,
    .-3,.+3p
prints from three lines before where you are now (at line dot) to three
lines after, thus giving you a bit of context. By the way, the ‘+’ can be
omitted:
    .-3,.3p
is absolutely identical in meaning.

Another area in which you can save typing effort in specifying lines is
to use ‘-’ and ‘+’ as line numbers by themselves.
    -
by itself is a command to move back up one line in the file. In fact, you
can string several minus signs together to move back up that many lines:
    ---
moves up three lines, as does ‘-3’. Thus
    -3,+3p
is also identical to the examples above.

Since ‘-’ is shorter than ‘.-1’, constructions like
    -,.s/bad/good/
are useful. This changes ‘bad’ to ‘good’ on the previous line and on the
current line.

‘+’ and ‘-’ can be used in combination with searches using ‘/.../’ and
‘?...?’, and with ‘$’. The search
    /thing/--
finds the line containing ‘thing’, and positions you two lines before it.

Repeated Searches

Suppose you ask for the search
    /horrible thing/
and when the line is printed you discover that it isn’t the horrible
thing that you wanted, so it is necessary
to repeat the search again. You don’t have to re-type the search, for the
construction
    //
is a shorthand for ‘the previous thing that was searched for’, whatever
it was. This can be repeated as many times as necessary. You can also go
backwards:
    ??
searches for the same thing, but in the reverse direction.

Not only can you repeat the search, but you can use ‘//’ as the left side
of a substitute command, to mean ‘the most recent pattern’.
    /horrible thing/
    .... ed prints line with ‘horrible thing’ ...
    s//good/p
To go backwards and change a line, say
    ??s//good/
Of course, you can still use the ‘&’ on the right hand side of a
substitute to stand for whatever got matched:
    //s//& &/p
finds the next occurrence of whatever you searched for last, replaces it
by two copies of itself, then prints the line just to verify that it
worked.

Default Line Numbers and the Value of Dot

One of the most effective ways to speed up your editing is always to know
what lines will be affected by a command if you don’t specify the lines
it is to act on, and on what line you will be positioned (i.e., the value
of dot) when a command finishes. If you can edit without specifying
unnecessary line numbers, you can save a lot of typing.

As the most obvious example, if you issue a search command like
    /thing/
you are left pointing at the next line that contains ‘thing’. Then no
address is required with commands like s to make a substitution on that
line, or p to print it, or l to list it, or d to delete it, or a to
append text after it, or c to change it, or i to insert text before it.

What happens if there was no ‘thing’? Then you are left right where you
were: dot is unchanged. This is also true if you were sitting on the only
‘thing’ when you issued the command. The same rules hold for searches
that use ‘?...?’; the only difference is the direction in which you
search.

The delete command d leaves dot pointing at the line that followed the
last deleted line. When line ‘$’ gets deleted, however, dot points at the
new line ‘$’.

The line-changing commands a, c and i by default all affect the current
line; if you give no line number with them, a appends text after the
current line, c changes the current line, and i inserts text before the
current line.

a, c, and i behave identically in one respect: when you stop appending,
changing or inserting, dot points at the last line entered. This is
exactly what you want for typing and editing on the fly. For example, you
can say
    a
    ... text ...
    ... botch ...              (minor error)
    .
    s/botch/correct/           (fix botched line)
    a
    ... more text ...
without specifying any line number for the substitute command or for the
second append command. Or you can say
    a
    ... text ...
    ... horrible botch ...     (major error)
    .
    c                          (replace entire line)
    ... fixed up line ...

You should experiment to determine what happens if you add no lines with
a, c or i.

The r command will read a file into the text being edited, either at the
end if you give no address, or after the specified line if you do. In
either case, dot points at the last line read in. Remember that you can
even say 0r to read a file in at the beginning of the text. (You can also
say 0a or 1i to start adding text at the beginning.)

The w command writes out the entire file. If you precede the command by
one line number, that line is written, while if you precede it by two
line numbers, that range of lines is written. The w command does not
change dot: the current line remains the same, regardless of what lines
are written. This is true even if you say something like
    /^\.AB/,/^\.AE/w abstract
which involves a context search.

Since the w command is so easy to use, you should save what you are
editing regularly as you go along just in case the system crashes, or in
case you do something foolish, like clobbering what you’re editing.

The least intuitive behavior, in a sense, is that of the s command. The
rule is simple: you are left sitting on the last line that got changed.
If there were no changes, then dot is unchanged.
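The rule for where the s command leaves dot can be written down as a small model. This is a sketch in Python, not ed's actual implementation; the function name substitute_range and the sample buffer are made up for illustration, and Python's re module stands in for ed's pattern matcher.

```python
import re

def substitute_range(lines, first, last, pattern, replacement, dot):
    """Apply a first-match substitution to lines first..last
    (1-based, inclusive), changing the list in place.

    Returns the new value of dot: the number of the last line that
    was changed, or the old dot if no line in the range matched.
    """
    for number in range(first, last + 1):
        new_text, changed = re.subn(pattern, replacement,
                                    lines[number - 1], count=1)
        if changed:
            lines[number - 1] = new_text
            dot = number
    return dot

buffer = ["red apple", "green pear", "red plum", "blue fig"]

# Dot starts on line 2; lines 1 and 3 change, so dot ends on line 3.
dot = substitute_range(buffer, 1, 4, "red", "dark", dot=2)
print(dot, buffer)
```

A second call with a pattern that matches nothing returns the dot it was given, which is the ‘no changes’ case of the rule.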

To illustrate, suppose that there are three lines in the buffer, and you
are sitting on the middle one:
    x1
    x2
    x3
Then the command
    -,+s/x/y/p
prints the third line, which is the last one changed. But if the three
lines had been
    x1
    y2
    y3
and the same command had been issued while dot pointed at the second
line, then the result would be to change and print only the first line,
and that is where dot would be set.

Semicolon ‘;’

Searches with ‘/.../’ and ‘?...?’ start at the current line and move
forward or backward respectively until they either find the pattern or
get back to the current line. Sometimes this is not what is wanted.
Suppose, for example, that the buffer contains lines like this:
    .
    .
    .
    ab
    .
    .
    .
    bc
    .
    .
Starting at line 1, one would expect that the command
    /a/,/b/p
prints all the lines from the ‘ab’ to the ‘bc’ inclusive. Actually this
is not what happens. Both searches (for ‘a’ and for ‘b’) start from the
same point, and thus they both find the line that contains ‘ab’. The
result is to print a single line. Worse, if there had been a line with a
‘b’ in it before the ‘ab’ line, then the print command would be in error,
since the second line number would be less than the first, and it is
illegal to try to print lines in reverse order.

This is because the comma separator for line numbers doesn’t set dot as
each address is processed; each search starts from the same place. In ed,
the semicolon ‘;’ can be used just like comma, with the single difference
that use of a semicolon forces dot to be set at that point as the line
numbers are being evaluated. In effect, the semicolon ‘moves’ dot. Thus
in our example above, the command
    /a/;/b/p
prints the range of lines from ‘ab’ to ‘bc’, because after the ‘a’ is
found, dot is set to that line, and then ‘b’ is searched for, starting
beyond that line.

This property is most often useful in a very simple situation. Suppose
you want to find the second occurrence of ‘thing’. You could say
    /thing/
    //
but this prints the first occurrence as well as the second, and is a
nuisance when you know very well that it is only the second one you’re
interested in. The solution is to say
    /thing/;//
This says to find the first occurrence of ‘thing’, set dot to that line,
then find the second and print only that.

Closely related is searching for the second previous occurrence of
something, as in
    ?something?;??
Printing the third or fourth or ... in either direction is left as an
exercise.

Finally, bear in mind that if you want to find the first occurrence of
something in a file, starting at an arbitrary place within the file, it
is not sufficient to say
    1;/thing/
because this fails if ‘thing’ occurs on line 1. But it is possible to say
    0;/thing/
(one of the few places where 0 is a legal line number), for this starts
the search at line 1.

Interrupting the Editor

As a final note on what dot gets set to, you should be aware that if you
hit the interrupt or delete or rubout or break key while ed is doing a
command, things are put back together again and your state is restored as
much as possible to what it was before the command began. Naturally, some
changes are irrevocable: if you are reading or writing a file or making
substitutions or deleting lines, these will be stopped in some clean but
unpredictable state in the middle (which is why it is not usually wise to
stop them). Dot may or may not be changed.

Printing is more clear cut. Dot is not changed until the printing is
done. Thus if you print until you see an interesting line, then hit
delete, you are not sitting on that line or even near it. Dot is left
where it was when the p command was started.
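Looking back at the section on the semicolon, the difference between ‘,’ and ‘;’ as address separators can be modelled in a few lines. This is a simplified sketch in Python, not a description of how ed is written; the function names search_forward and address_pair are hypothetical, and only forward searches are covered.

```python
import re

def search_forward(lines, pattern, dot):
    """Return the 1-based number of the first line after dot that
    matches pattern, wrapping around and ending at dot itself."""
    count = len(lines)
    for step in range(1, count + 1):
        number = (dot - 1 + step) % count + 1
        if re.search(pattern, lines[number - 1]):
            return number
    raise LookupError("pattern not found: " + pattern)

def address_pair(lines, first_pattern, second_pattern, dot, separator):
    """Evaluate /first/,/second/ or /first/;/second/ starting at dot."""
    first = search_forward(lines, first_pattern, dot)
    if separator == ";":
        dot = first  # the semicolon moves dot before the second search
    second = search_forward(lines, second_pattern, dot)
    return first, second

buffer = [".", ".", ".", "ab", ".", ".", ".", "bc", ".", "."]

print(address_pair(buffer, "a", "b", 1, ","))  # both searches start at line 1
print(address_pair(buffer, "a", "b", 1, ";"))  # second search starts at 'ab'
```

With the comma both addresses land on the ‘ab’ line, so a single line would be printed; with the semicolon the pair spans ‘ab’ through ‘bc’. Passing 0 as dot mimics ‘0;/thing/’, since the search then begins at line 1.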

4. GLOBAL COMMANDS

The global commands g and v are used to perform one or more editing
commands on all lines that either contain (g) or don’t contain (v) a
specified pattern.

As the simplest example, the command
    g/UNIX/p
prints all lines that contain the word ‘UNIX’. The pattern that goes
between the slashes can be anything that could be used in a line search
or in a substitute command; exactly the same rules and limitations apply.

As another example, then,
    g/^\./p
prints all the formatting commands in a file (lines that begin with ‘.’).

The v command is identical to g, except that it operates on those lines
that do not contain an occurrence of the pattern. (Don’t look too hard
for mnemonic significance to the letter ‘v’.) So
    v/^\./p
prints all the lines that don’t begin with ‘.’, that is, the actual text
lines.

The command that follows g or v can be anything:
    g/^\./d
deletes all lines that begin with ‘.’, and
    g/^$/d
deletes all empty lines.

Probably the most useful command that can follow a global is the
substitute command, for this can be used to make a change and print each
affected line for verification. For example, we could change the word
‘Unix’ to ‘UNIX’ everywhere, and verify that it really worked, with
    g/Unix/s//UNIX/gp
Notice that we used ‘//’ in the substitute command to mean ‘the previous
pattern’, in this case, ‘Unix’. The p command is done on every line that
matches the pattern, not just those on which a substitution took place.

The global command operates by making two passes over the file. On the
first pass, all lines that match the pattern are marked. On the second
pass, each marked line in turn is examined, dot is set to that line, and
the command executed. This means that it is possible for the command that
follows a g or v to use addresses, set dot, and so on, quite freely.
    g/^\.PP/+
prints the line that follows each ‘.PP’ command (the signal for a new
paragraph in some formatting packages). Remember that ‘+’ means ‘one line
past dot’. And
    g/topic/?^\.SH?1
searches for each line that contains ‘topic’, scans backwards until it
finds a line that begins ‘.SH’ (a section heading) and prints the line
that follows that, thus showing the section headings under which ‘topic’
is mentioned. Finally,
    g/^\.EQ/+,/^\.EN/-p
prints all the lines that lie between lines beginning with ‘.EQ’ and
‘.EN’ formatting commands.

The g and v commands can also be preceded by line numbers, in which case
the lines searched are only those in the range specified.

Multi-line Global Commands

It is possible to do more than one command under the control of a global
command, although the syntax for expressing the operation is not
especially natural or pleasant. As an example, suppose the task is to
change ‘x’ to ‘y’ and ‘a’ to ‘b’ on all lines that contain ‘thing’. Then
    g/thing/s/x/y/\
    s/a/b/
is sufficient. The ‘\’ signals the g command that the set of commands
continues on the next line; it terminates on the first line that does not
end with ‘\’. (As a minor blemish, you can’t use a substitute command to
insert a newline within a g command.)

You should watch out for this problem: the command
    g/x/s//y/\
    s/a/b/
does not work as you expect. The remembered pattern is the last pattern
that was actually executed, so sometimes it will be ‘x’ (as expected),
and sometimes it will be ‘a’ (not expected). You must spell it out, like
this:
    g/x/s/x/y/\
    s/a/b/

It is also possible to execute a, c and i commands under a global
command; as with other multi-line constructions, all that is needed is to
add a ‘\’ at the end of each line except the last. Thus to add a ‘.nf’
and ‘.sp’ command before each ‘.EQ’ line, type
    g/^\.EQ/i\
    .nf\
    .sp
There is no need for a final line containing a ‘.’ to terminate the i
command, unless there are further commands being done under the global.
On the other
hand, it does no harm to put it in either.

5. CUT AND PASTE WITH UNIX COMMANDS

One editing area in which non-programmers seem not very confident is in
what might be called ‘cut and paste’ operations: changing the name of a
file, making a copy of a file somewhere else, moving a few lines from one
place to another in a file, inserting one file in the middle of another,
splitting a file into pieces, and splicing two or more files together.

Yet most of these operations are actually quite easy, if you keep your
wits about you and go cautiously. The next several sections talk about
cut and paste. We will begin with the UNIX commands for moving entire
files around, then discuss ed commands for operating on pieces of files.

Changing the Name of a File

You have a file named ‘memo’ and you want it to be called ‘paper’
instead. How is it done?

The UNIX program that renames files is called mv (for ‘move’); it ‘moves’
the file from one name to another, like this:
    mv memo paper
That’s all there is to it: mv from the old name to the new name.
    mv oldname newname
Warning: if there is already a file around with the new name, its present
contents will be silently clobbered by the information from the other
file. The one exception is that you can’t move a file to itself:
    mv x x
is illegal.

Making a Copy of a File

Sometimes what you want is a copy of a file, an entirely fresh version.
This might be because you want to work on a file, and yet save a copy in
case something gets fouled up, or just because you’re paranoid.

In any case, the way to do it is with the cp command. (cp stands for
‘copy’; the system is big on short command names, which are appreciated
by heavy users, but sometimes a strain for novices.) Suppose you have a
file called ‘good’ and you want to save a copy before you make some
dramatic editing changes. Choose a name (‘savegood’ might be acceptable),
then type
    cp good savegood
This copies ‘good’ onto ‘savegood’, and you now have two identical copies
of the file ‘good’. (If ‘savegood’ previously contained something, it
gets overwritten.)

Now if you decide at some time that you want to get back to the original
state of ‘good’, you can say
    mv savegood good
(if you’re not interested in ‘savegood’ any more), or
    cp savegood good
if you still want to retain a safe copy.

In summary, mv just renames a file; cp makes a duplicate copy. Both of
them clobber the ‘target’ file if it already exists, so you had better be
sure that’s what you want to do before you do it.

Removing a File

If you decide you are really done with a file forever, you can remove it
with the rm command:
    rm savegood
throws away (irrevocably) the file called ‘savegood’.

Putting Two or More Files Together

The next step is the familiar one of collecting two or more files into
one big one. This will be needed, for example, when the author of a paper
decides that several sections need to be combined into one. There are
several ways to do it, of which the cleanest, once you get used to it, is
a program called cat. (Not all programs have two-letter names.) cat is
short for ‘concatenate’, which is exactly what we want to do.

Suppose the job is to combine the files ‘file1’ and ‘file2’ into a single
file called ‘bigfile’. If you say
    cat file
the contents of ‘file’ will get printed on your terminal. If you say
    cat file1 file2
the contents of ‘file1’ and then the contents of ‘file2’ will both be
printed on your terminal, in that order. So cat combines the files, all
right, but it’s not much help to print them on the terminal; we want them
in ‘bigfile’.

Fortunately, there is a way. You can tell the system that instead of
printing on your terminal, you want the same information put in a file.
The way to do it is to add to the command line the character > and the
name of the file where you want the output to go. Then you can say
    cat file1 file2 >bigfile
and the job is done. (As with cp and mv, you’re putting something into
‘bigfile’, and anything that was already there is destroyed.)

This ability to ‘capture’ the output of a pro-
- 12 -

gram is one of the most useful aspects of the system. Filenames


Fortunately it’s not limited to the cat program — you The first step is to ensure that you know the ed
can use it with any program that prints on your termi- commands for reading and writing files. Of course
nal. We’ll see some more uses for it in a moment. you can’t go very far without knowing r and w.
Naturally, you can combine several files, not Equally useful, but less well known, is the ‘edit’ com-
just two: mand e. Within ed, the command
cat file1 file2 file3 ... >bigfile e newfile
collects a whole bunch. says ‘I want to edit a new file called newfile, without
Question: is there any difference between leaving the editor.’ The e command discards what-
ever you’re currently working on and starts over on
cp good savegood newfile. It’s exactly the same as if you had quit with
and the q command, then re-entered ed with a new file
name, except that if you have a pattern remembered,
cat good >savegood then a command like // will still work.
Answer: for most purposes, no. You might reason- If you enter ed with the command
ably ask why there are two programs in that case,
ed file
since cat is obviously all you need. The answer is
that cp will do some other things as well, which you ed remembers the name of the file, and any subse-
can investigate for yourself by reading the manual. quent e, r or w commands that don’t contain a
For now we’ll stick to simple usages. filename will refer to this remembered file. Thus
ed file1
Adding Something to the End of a File
... (editing) ...
Sometimes you want to add one file to the end w (writes back in file1)
of another. We have enough building blocks now e file2 (edit new file, without leaving editor)
that you can do it; in fact before reading further it ... (editing on file2) ...
would be valuable if you figured out how. To be w (writes back on file2)
specific, how would you use cp, mv and/or cat to add
the file ‘good1’ to the end of the file ‘good’? (and so on) does a series of edits on various files
without ever leaving ed and without typing the name
You could try of any file more than once. (As an aside, if you
cat good good1 >temp examine the sequence of commands here, you can see
mv temp good why many UNIX systems use e as a synonym for
ed.)
which is probably most direct. You should also
understand why You can find out the remembered file name at
any time with the f command; just type f without a
cat good good1 >good file name. You can also change the name of the
doesn’t work. (Don’t practice with a good ‘good’!) remembered file name with f; a useful sequence is

The easy way is to use a variant of >, called ed precious


>>. In fact, >> is identical to > except that instead of f junk
clobbering the old file, it simply tacks stuff on at the ... (editing) ...
end. Thus you could say which gets a copy of a precious file, then uses f to
cat good1 >>good guarantee that a careless w command won’t clobber
the original.
and ‘good1’ is added to the end of ‘good’. (And if
‘good’ didn’t exist, this makes a copy of ‘good1’ Inserting One File into Another
called ‘good’.)
Suppose you have a file called ‘memo’, and
6. CUT AND PASTE WITH THE EDITOR you want the file called ‘table’ to be inserted just
after the reference to Table 1. That is, in ‘memo’
Now we move on to manipulating pieces of somewhere is a line that says
files — individual lines or groups of lines. This is
another area where new users seem unsure of them- Table 1 shows that ...
selves. and the data contained in ‘table’ has to go there,
probably so it will be formatted properly by nroff or
troff. Now what?
This one is easy. Edit ‘memo’, find ‘Table 1’,
and add the file ‘table’ right there:

     ed memo
     /Table 1/
     Table 1 shows that ...     [response from ed]
     .r table

The critical line is the last one. As we said earlier, the r command reads a file; here you asked
for it to be read in right after line dot. An r command without any address adds lines at the end,
so it is the same as $r.

Writing out Part of a File

The other side of the coin is writing out part of the document you’re editing. For example, maybe
you want to split out into a separate file that table from the previous example, so it can be
formatted and tested separately. Suppose that in the file being edited we have

     .TS
     ...[lots of stuff]
     .TE

which is the way a table is set up for the tbl program. To isolate the table in a separate file
called ‘table’, first find the start of the table (the ‘.TS’ line), then write out the interesting
part:

     /^\.TS/
     .TS                        [ed prints the line it found]
     .,/^\.TE/w table

and the job is done. If you are confident, you can do it all at once with

     /^\.TS/;/^\.TE/w table

The point is that the w command can write out a group of lines, instead of the whole file. In
fact, you can write out a single line if you like; just give one line number instead of two. For
example, if you have just typed a horribly complicated line and you know that it (or something
like it) is going to be needed later, then save it; don’t re-type it. In the editor, say

     a
     ...lots of stuff...
     ...horrible line...
     .
     .w temp
     a
     ...more stuff...
     .
     .r temp
     a
     ...more stuff...
     .

This last example is worth studying, to be sure you appreciate what’s going on.

Moving Lines Around

Suppose you want to move a paragraph from its present position in a paper to the end. How would
you do it? As a concrete example, suppose each paragraph in the paper begins with the formatting
command ‘.PP’. Think about it and write down the details before reading on.

The brute force way (not necessarily bad) is to write the paragraph onto a temporary file, delete
it from its current position, then read in the temporary file at the end. Assuming that you are
sitting on the ‘.PP’ command that begins the paragraph, this is the sequence of commands:

     .,/^\.PP/-w temp
     .,//-d
     $r temp

That is, from where you are now (‘.’) until one line before the next ‘.PP’ (‘/^\.PP/-’) write onto
‘temp’. Then delete the same lines. Finally, read ‘temp’ at the end.

As we said, that’s the brute force way. The easier way (often) is to use the move command m that
ed provides; it lets you do the whole set of operations at one crack, without any temporary file.

The m command is like many other ed commands in that it takes up to two line numbers in front
that tell what lines are to be affected. It is also followed by a line number that tells where
the lines are to go. Thus

     line1, line2 m line3

says to move all the lines between ‘line1’ and ‘line2’ after ‘line3’. Naturally, any of ‘line1’
etc., can be patterns between slashes, $ signs, or other ways to specify lines.

Suppose again that you’re sitting at the first line of the paragraph. Then you can say

     .,/^\.PP/-m$

That’s all.

As another example of a frequent operation, you can reverse the order of two adjacent lines by
moving the first one to after the second. Suppose that you are positioned at the first. Then

     m+

does it. It says to move line dot to after one line after line dot. If you are positioned on the
second line,

     m--

does the interchange.

As you can see, the m command is more succinct and direct than writing, deleting and re-reading.
When is brute force better anyway? This is a matter of personal taste: do what you have most
confidence in. The main difficulty with the m command is that if you use patterns to specify both
the lines you are moving and the target, you have to take care that you specify them properly, or
you may well not move the lines you thought you did. The result of a botched m command can be a
ghastly mess. Doing the job a step at a time makes it easier for you to verify at each step that
you accomplished what you wanted to. It’s also a good idea to issue a w command before doing
anything complicated; then if you goof, it’s easy to back up to where you were.

Marks

ed provides a facility for marking a line with a particular name so you can later reference it by
name regardless of its actual line number. This can be handy for moving lines, and for keeping
track of them as they move. The mark command is k; the command

     kx

marks the current line with the name ‘x’. If a line number precedes the k, that line is marked.
(The mark name must be a single lower case letter.) Now you can refer to the marked line with the
address

     'x

Marks are most useful for moving things around. Find the first line of the block to be moved, and
mark it with 'a. Then find the last line and mark it with 'b. Now position yourself at the place
where the stuff is to go and say

     'a,'bm.

Bear in mind that only one line can have a particular mark name associated with it at any given
time.

Copying Lines

We mentioned earlier the idea of saving a line that was hard to type or used often, so as to cut
down on typing time. Of course this could be more than one line; then the saving is presumably
even greater.

ed provides another command, called t (for ‘transfer’) for making a copy of a group of one or more
lines at any point. This is often easier than writing and reading.

The t command is identical to the m command, except that instead of moving lines it simply
duplicates them at the place you named. Thus

     1,$t$

duplicates the entire contents that you are editing. A more common use for t is for creating a
series of lines that differ only slightly. For example, you can say

     a
     .......... x .........     (long line)
     .
     t.                         (make a copy)
     s/x/y/                     (change it a bit)
     t.                         (make third copy)
     s/y/z/                     (change it a bit)

and so on.

The Temporary Escape ‘!’

Sometimes it is convenient to be able to temporarily escape from the editor to do some other UNIX
command, perhaps one of the file copy or move commands discussed in section 5, without leaving the
editor. The ‘escape’ command ! provides a way to do this.

If you say

     !any UNIX command

your current editing state is suspended, and the UNIX command you asked for is executed. When the
command finishes, ed will signal you by printing another !; at that point you can resume editing.

You can really do any UNIX command, including another ed. (This is quite common, in fact.) In
this case, you can even do another !.

7. SUPPORTING TOOLS

There are several tools and techniques that go along with the editor, all of which are relatively
easy once you know how ed works, because they are all based on the editor. In this section we
will give some fairly cursory examples of these tools, more to indicate their existence than to
provide a complete tutorial. More information on each can be found in [3].

Grep

Sometimes you want to find all occurrences of some word or pattern in a set of files, to edit them
or perhaps just to verify their presence or absence. It may be possible to edit each file
separately and look for the pattern of interest, but if there are many files this can get very
tedious, and if the files are really big, it may be impossible because of limits in ed.

The program grep was invented to get around these limitations. The search patterns that we have
described in the paper are often called ‘regular expressions’, and ‘grep’ stands for

     g/re/p

That describes exactly what grep does: it prints every line in a set of files that contains a
particular pattern. Thus

     grep 'thing' file1 file2 file3 ...

finds ‘thing’ wherever it occurs in any of the files ‘file1’, ‘file2’, etc. grep also indicates
the file in which the line was found, so you can later edit it if you like.

The pattern represented by ‘thing’ can be any pattern you can use in the editor, since grep and ed
use exactly the same mechanism for pattern searching. It is wisest always to enclose the pattern
in the single quotes '...' if it contains any non-alphabetic characters, since many such
characters also mean something special to the UNIX command interpreter (the ‘shell’). If you
don’t quote them, the command interpreter will try to interpret them before grep gets a chance.

There is also a way to find lines that don’t contain a pattern:

     grep -v 'thing' file1 file2 ...

finds all lines that don’t contain ‘thing’. The -v must occur in the position shown. Given grep
and grep -v, it is possible to do things like selecting all lines that contain some combination of
patterns. For example, to get all lines that contain ‘x’ but not ‘y’:

     grep x file... | grep -v y

(The notation | is a ‘pipe’, which causes the output of the first command to be used as input to
the second command; see [2].)

Editing Scripts

If a fairly complicated set of editing operations is to be done on a whole set of files, the
easiest thing to do is to make up a ‘script’, i.e., a file that contains the operations you want
to perform, then apply this script to each file in turn.

For example, suppose you want to change every ‘Unix’ to ‘UNIX’ and every ‘Gcos’ to ‘GCOS’ in a
large number of files. Then put into the file ‘script’ the lines

     g/Unix/s//UNIX/g
     g/Gcos/s//GCOS/g
     w
     q

Now you can say

     ed file1 <script
     ed file2 <script
     ...

This causes ed to take its commands from the prepared script. Notice that the whole job has to be
planned in advance.

And of course by using the UNIX command interpreter, you can cycle through a set of files
automatically, with varying degrees of ease.

Sed

sed (‘stream editor’) is a version of the editor with restricted capabilities but which is capable
of processing unlimited amounts of input. Basically sed copies its input to its output, applying
one or more editing commands to each line of input.

As an example, suppose that we want to do the ‘Unix’ to ‘UNIX’ part of the example given above,
but without rewriting the files. Then the command

     sed 's/Unix/UNIX/g' file1 file2 ...

applies the command ‘s/Unix/UNIX/g’ to all lines from ‘file1’, ‘file2’, etc., and copies all lines
to the output. The advantage of using sed in such a case is that it can be used with input too
large for ed to handle. All the output can be collected in one place, either in a file or perhaps
piped into another program.

If the editing transformation is so complicated that more than one editing command is needed,
commands can be supplied from a file, or on the command line, with a slightly more complex syntax.
To take commands from a file, for example,

     sed -f cmdfile input-files...

sed has further capabilities, including conditional testing and branching, which we cannot go into
here.

Acknowledgement

I am grateful to Ted Dolotta for his careful reading and valuable suggestions.

References

[1] Brian W. Kernighan, A Tutorial Introduction to the UNIX Text Editor, Bell Laboratories
    internal memorandum.
[2] Brian W. Kernighan, UNIX For Beginners, Bell Laboratories internal memorandum.
[3] Ken L. Thompson and Dennis M. Ritchie, The UNIX Programmer’s Manual. Bell Laboratories.

An Introduction to the UNIX Shell

S. R. Bourne
Bell Laboratories
Murray Hill, New Jersey 07974

ABSTRACT

The shell is a command programming language that provides an interface to the UNIX†
operating system. Its features include control-flow primitives, parameter passing, vari-
ables and string substitution. Constructs such as while, if then else, case and for are
available. Two-way communication is possible between the shell and commands.
String-valued parameters, typically file names or flags, may be passed to a command.
A return code is set by commands that may be used to determine control-flow, and the
standard output from a command may be used as shell input.
The shell can modify the environment in which commands run. Input and output can
be redirected to files, and processes that communicate through ‘pipes’ can be invoked.
Commands are found by searching directories in the file system in a sequence that can
be defined by the user. Commands can be read either from the terminal or from a file,
which allows command procedures to be stored for later use.

November 12, 1978

_______________
†UNIX is a Trademark of Bell Laboratories.

1.0 Introduction
The shell is both a command language and a programming language that provides an interface to the
UNIX operating system. This memorandum describes, with examples, the UNIX shell. The first section
covers most of the everyday requirements of terminal users. Some familiarity with UNIX is an advantage
when reading this section; see, for example, "UNIX for beginners" (reference 1). Section 2 describes those
features of the shell primarily intended for use within shell procedures. These include the control-flow
primitives and string-valued variables provided by the shell. A knowledge of a programming language
would be a help when reading this section. The last section describes the more advanced features of the
shell. References of the form "see pipe (2)" are to a section of the UNIX manual (reference 2).

1.1 Simple commands


Simple commands consist of one or more words separated by blanks. The first word is the name of the
command to be executed; any remaining words are passed as arguments to the command. For example,
who
is a command that prints the names of users logged in. The command
ls -l
prints a list of files in the current directory. The argument -l tells ls to print status information, size and
the modification date for each file.

1.2 Background commands


To execute a command the shell normally creates a new process and waits for it to finish. A command
may be run without waiting for it to finish. For example,
cc pgm.c &
calls the C compiler to compile the file pgm.c . The trailing & is an operator that instructs the shell not
to wait for the command to finish. To help keep track of such a process the shell reports its process
number following its creation. A list of currently active processes may be obtained using the ps com-
mand.
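
As a minimal sketch of a background command, assuming a modern POSIX shell: sleep stands in for a long-running job such as a compilation, $! (described later in this memorandum) holds the process number of the background command, and wait, which is not discussed in this section, pauses until background commands have finished.

```shell
sleep 1 &            # the shell does not wait; sleep runs in the background
pid=$!               # process number of the background command
echo "started process $pid"
wait                 # pause here until the background command has finished
echo "process $pid has finished"
```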

1.3 Input output redirection


Most commands produce output on the standard output that is initially connected to the terminal. This
output may be sent to a file by writing, for example,
ls -l >file
The notation >file is interpreted by the shell and is not passed as an argument to ls. If file does not exist
then the shell creates it; otherwise the original contents of file are replaced with the output from ls. Out-
put may be appended to a file using the notation
ls -l >>file
In this case file is also created if it does not already exist.

The standard input of a command may be taken from a file instead of the terminal by writing, for exam-
ple,
wc <file
The command wc reads its standard input (in this case redirected from file) and prints the number of
characters, words and lines found. If only the number of lines is required then
wc -l <file
could be used.
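
The three redirection forms can be tried safely in a scratch directory. The following is a minimal sketch, assuming a modern POSIX shell; mktemp, the $(...) notation and the file name out are illustrative choices and do not come from the text above.

```shell
# Work in a throwaway directory so nothing of value is overwritten.
# (mktemp and $(...) are assumptions about a modern system.)
dir=$(mktemp -d) && cd "$dir" || exit 1

echo first >out      # > creates out, or replaces its contents
echo second >out     # the earlier contents are gone
echo third >>out     # >> appends instead of replacing

lines=$(wc -l <out)  # < connects out to the standard input of wc
contents=$(cat out)
echo "out holds $lines lines:"
echo "$contents"

cd / && rm -r "$dir"
```

Because the shell interprets the redirections itself, echo and wc never see out as an argument.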

1.4 Pipelines and filters


The standard output of one command may be connected to the standard input of another by writing the
‘pipe’ operator, indicated by | , as in,
ls -l | wc
Two commands connected in this way constitute a pipeline and the overall effect is the same as
ls -l >file; wc <file
except that no file is used. Instead the two processes are connected by a pipe (see pipe (2)) and are run
in parallel. Pipes are unidirectional and synchronization is achieved by halting wc when there is nothing
to read and halting ls when the pipe is full.
A filter is a command that reads its standard input, transforms it in some way, and prints the result as
output. One such filter, grep, selects from its input those lines that contain some specified string. For
example,
ls | grep old
prints those lines, if any, of the output from ls that contain the string old. Another useful filter is sort.
For example,
who | sort
will print an alphabetically sorted list of logged in users.
A pipeline may consist of more than two commands, for example,
ls | grep old | wc -l
prints the number of file names in the current directory containing the string old.
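
The equivalence between a pipeline and a sequence of commands joined by temporary files can be checked directly. This is a sketch only, assuming a modern POSIX shell; the directory layout and the file names old.c, old.h, new.c, names and matches are invented for the illustration.

```shell
# Scratch area: work/ holds the files to be listed, tmp/ the intermediate files.
top=$(mktemp -d) || exit 1
mkdir "$top/work" "$top/tmp" && cd "$top/work" || exit 1
: >old.c; : >old.h; : >new.c       # three empty files

# Pipeline: the three commands run together and no file is used.
piped=$(ls | grep old | wc -l)

# The same computation, one step at a time, through temporary files.
ls >"$top/tmp/names"
grep old <"$top/tmp/names" >"$top/tmp/matches"
staged=$(wc -l <"$top/tmp/matches")

echo "pipeline: $piped   temporary files: $staged"
cd / && rm -r "$top"
```

The temporary files are kept outside the directory being listed; otherwise ls would report them too, which is one practical reason to prefer the pipeline.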

1.5 File name generation


Many commands accept arguments which are file names. For example,
ls -l main.c
prints information relating to the file main.c .
The shell provides a mechanism for generating a list of file names that match a pattern. For example,
ls -l *.c
generates, as arguments to ls, all file names in the current directory that end in .c . The character * is a
pattern that will match any string including the null string. In general patterns are specified as follows.
*        Matches any string of characters including the null string.
?        Matches any single character.
[...]    Matches any one of the characters enclosed. A pair of characters separated by a minus
         will match any character lexically between the pair.
For example,

[a-z]*
matches all names in the current directory beginning with one of the letters a through z.
/usr/fred/test/?
matches all names in the directory /usr/fred/test that consist of a single character. If no file name is
found that matches the pattern then the pattern is passed, unchanged, as an argument.
This mechanism is useful both to save typing and to select names according to some pattern. It may
also be used to find files. For example,
echo /usr/fred/*/core
finds and prints the names of all core files in sub-directories of /usr/fred . (echo is a standard UNIX
command that prints its arguments, separated by blanks.) This last feature can be expensive, requiring a
scan of all sub-directories of /usr/fred .
There is one exception to the general rules given for patterns. The character ‘.’ at the start of a file
name must be explicitly matched.
echo *
will therefore echo all file names in the current directory not beginning with ‘.’ .
echo .*
will echo all those file names that begin with ‘.’ . This avoids inadvertent matching of the names ‘.’ and
‘..’ which mean ‘the current directory’ and ‘the parent directory’ respectively. (Notice that ls
suppresses information for the files ‘.’ and ‘..’ .)
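
The pattern rules of this section can be observed with echo in a scratch directory. The sketch below assumes a modern POSIX shell with default settings; the file names a.c, b.c, x and .hidden are made up for the purpose.

```shell
dir=$(mktemp -d) && cd "$dir" || exit 1
: >a.c; : >b.c; : >x; : >.hidden

all=$(echo *)          # every name that does not begin with '.'
csrc=$(echo *.c)       # names ending in .c
single=$(echo ?)       # names exactly one character long
range=$(echo [a-b]*)   # names beginning with a or b
nomatch=$(echo *.zzz)  # nothing matches, so the pattern is passed unchanged

echo "$all"; echo "$csrc"; echo "$single"; echo "$range"; echo "$nomatch"
cd / && rm -r "$dir"
```

Note that .hidden never appears, since the leading ‘.’ is not matched by * or ? .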

1.6 Quoting
Characters that have a special meaning to the shell, such as < > * ? | & , are called metacharacters. A
complete list of metacharacters is given in appendix B. Any character preceded by a \ is quoted and
loses its special meaning, if any. The \ is elided so that
echo \?
will echo a single ? , and
echo \\
will echo a single \ . To allow long strings to be continued over more than one line the sequence \new-
line is ignored.
\ is convenient for quoting single characters. When more than one character needs quoting the above
mechanism is clumsy and error prone. A string of characters may be quoted by enclosing the string
between single quotes. For example,
echo xx'****'xx
will echo
xx****xx
The quoted string may not contain a single quote but may contain newlines, which are preserved. This
quoting mechanism is the most simple and is recommended for casual use.
A third quoting mechanism using double quotes is also available that prevents interpretation of some but
not all metacharacters. Discussion of the details is deferred to section 3.4 .
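
The two quoting mechanisms described so far can be compared side by side. This is a small sketch assuming a modern POSIX shell; printf is used for the backslash case only because present-day echo implementations differ in how they treat backslashes, which is an assumption about current systems and not something stated above.

```shell
q1=$(echo \?)             # backslash quotes a single character
q2=$(printf '%s\n' \\)    # a quoted backslash yields one backslash
q3=$(echo xx'****'xx)     # single quotes protect a whole string
q4=$(echo 'two
lines')                   # a newline inside single quotes is preserved

echo "$q1"; echo "$q3"; echo "$q4"
```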

1.7 Prompting
When the shell is used from a terminal it will issue a prompt before reading a command. By default
this prompt is ‘$ ’ . It may be changed by saying, for example,
PS1=yesdear

which sets the prompt to be the string yesdear . If a newline is typed and further input is needed then the
shell will issue the prompt ‘> ’ . Sometimes this can be caused by mistyping a quote mark. If it is
unexpected then an interrupt (DEL) will return the shell to read another command. This prompt may be
changed by saying, for example,
PS2=more

1.8 The shell and login


Following login (1) the shell is called to read and execute commands typed at the terminal. If the user’s
login directory contains the file .profile then it is assumed to contain commands and is read by the shell
before reading any commands from the terminal.

1.9 Summary

• ls
Print the names of files in the current directory.
• ls >file
Put the output from ls into file.
• ls | wc -l
Print the number of files in the current directory.
• ls | grep old
Print those file names containing the string old.
• ls | grep old | wc -l
Print the number of files whose name contains the string old.
• cc pgm.c &
Run cc in the background.

2.0 Shell procedures


The shell may be used to read and execute commands contained in a file. For example,
sh file [ args . . . ]
calls the shell to read commands from file. Such a file is called a command procedure or shell pro-
cedure. Arguments may be supplied with the call and are referred to in file using the positional parame-
ters $1, $2, . . . . For example, if the file wg contains
who | grep $1
then
sh wg fred
is equivalent to
who | grep fred

UNIX files have three independent attributes, read, write and execute. The UNIX command chmod (1)
may be used to make a file executable. For example,
chmod +x wg
will ensure that the file wg has execute status. Following this, the command
wg fred
is equivalent to
sh wg fred
This allows shell procedures and programs to be used interchangeably. In either case a new process is
created to run the command.
As well as providing names for the positional parameters, the number of positional parameters in the call
is available as $# . The name of the file being executed is available as $0 .
A special shell parameter $* is used to substitute for all positional parameters except $0 . A typical use
of this is to provide some default arguments, as in,
nroff -T450 -ms $*
which simply prepends some arguments to those already given.
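
To see the positional parameters at work, a procedure can be written to a file and called with sh. The following is a sketch under the assumption of a modern POSIX shell; the procedure name showargs and its message are hypothetical.

```shell
dir=$(mktemp -d) || exit 1

# A hypothetical procedure, showargs, that reports how it was called.
# The single quotes keep $#, $1 and $* from being substituted here.
echo 'echo "$# arguments; first is $1; all are: $*"' >"$dir/showargs"

by_sh=$(sh "$dir/showargs" fred bert)
echo "$by_sh"

rm -r "$dir"
```

After chmod +x, and provided the directory is searched for commands, the same file could be called by name as described above.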

2.1 Control flow - for


A frequent use of shell procedures is to loop through the arguments ($1, $2, . . .) executing commands
once for each argument. An example of such a procedure is tel that searches the file /usr/lib/telnos that
contains lines of the form
...
fred mh0123
bert mh0789
...
The text of tel is
for i
do grep $i /usr/lib/telnos; done
The command
tel fred
prints those lines in /usr/lib/telnos that contain the string fred .

tel fred bert


prints those lines containing fred followed by those for bert.
The for loop notation is recognized by the shell and has the general form
for name in w1 w2 . . .
do command-list
done
A command-list is a sequence of one or more simple commands separated or terminated by a newline or
semicolon. Furthermore, reserved words like do and done are only recognized following a newline or
semicolon. name is a shell variable that is set to the words w1 w2 . . . in turn each time the command-
list following do is executed. If in w1 w2 . . . is omitted then the loop is executed once for each posi-
tional parameter; that is, in $* is assumed.
Another example of the use of the for loop is the create command whose text is
for i do >$i; done
The command
create alpha beta
ensures that the two files alpha and beta exist and are empty. The notation >file may be used on its
own to create or clear the contents of a file. Notice also that a semicolon (or newline) is required before
done.
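
Both forms of the for loop can be exercised in a scratch directory. The sketch below assumes a modern POSIX shell; it stores the create procedure from the text in a file, and the word list one two three is purely illustrative.

```shell
dir=$(mktemp -d) && cd "$dir" || exit 1

# The create procedure from the text, stored in a file of the same name.
echo 'for i do >$i; done' >create

echo data >alpha          # alpha starts out non-empty
sh create alpha beta      # empties alpha and creates beta
size=$(wc -c <alpha)      # number of characters now in alpha
rm create
made=$(echo *)            # names now present in the directory

# The general form, with an explicit word list after in.
out=
for w in one two three
do out="$out<$w>"
done

echo "made: $made   loop saw: $out"
cd / && rm -r "$dir"
```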

2.2 Control flow - case


A multiple way branch is provided for by the case notation. For example,
case $# in
1) cat >>$1 ;;
2) cat >>$2 <$1 ;;
*) echo 'usage: append [ from ] to' ;;
esac
is an append command. When called with one argument as
append file
$# is the string 1 and the standard input is copied onto the end of file using the cat command.
append file1 file2
appends the contents of file1 onto file2. If the number of arguments supplied to append is other than 1
or 2 then a message is printed indicating proper usage.
The general form of the case command is
case word in
pattern ) command-list ;;
...
esac
The shell attempts to match word with each pattern, in the order in which the patterns appear. If a
match is found the associated command-list is executed and execution of the case is complete. Since *
is the pattern that matches any string it can be used for the default case.
A word of caution: no check is made to ensure that only one pattern matches the case argument. The
first match found defines the set of commands to be executed. In the example below the commands fol-
lowing the second * will never be executed.

case $# in
*) . . . ;;
*) . . . ;;
esac

Another example of the use of the case construction is to distinguish between different forms of an argu-
ment. The following example is a fragment of a cc command.
for i
do case $i in
-[ocs]) . . . ;;
-*) echo "unknown flag $i" ;;
*.c) /lib/c0 $i . . . ;;
*) echo "unexpected argument $i" ;;
esac
done

To allow the same commands to be associated with more than one pattern the case command provides
for alternative patterns separated by a | . For example,
case $i in
-x | -y) ...
esac
is equivalent to
case $i in
-[xy]) ...
esac

The usual quoting conventions apply so that


case $i in
\?) . . .
will match the character ? .
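
The matching rules for case (first match wins, alternatives separated by | , and quoting of metacharacters) can be combined in one loop. This is a sketch only, assuming a modern POSIX shell; the labels flag, xy, source and so on are invented for the illustration.

```shell
result=
for i in -o -y -q main.c notes \?
do
    case $i in
    -[ocs]) kind=flag ;;          # one of -o, -c, -s
    -x|-y)  kind=xy ;;            # alternative patterns
    -*)     kind=unknown-flag ;;  # any other argument starting with -
    *.c)    kind=source ;;
    \?)     kind=question ;;      # the quoted ? matches only a literal ?
    *)      kind=other ;;         # default case, so it must come last
    esac
    result="$result $i:$kind"
done
echo "$result"
```

Moving the final *) pattern to the top would make every argument come out as other, since the first match defines the commands to be executed.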

2.3 Here documents


The shell procedure tel in section 2.1 uses the file /usr/lib/telnos to supply the data for grep. An alter-
native is to include this data within the shell procedure as a here document, as in,
for i
do grep $i <<!
...
fred mh0123
bert mh0789
...
!
done
In this example the shell takes the lines between <<! and ! as the standard input for grep. The string !
is arbitrary, the document being terminated by a line that consists of the string following << .
Parameters are substituted in the document before it is made available to grep as illustrated by the fol-
lowing procedure called edg .

ed $3 <<%
g/$1/s//$2/g
w
%
The call
edg string1 string2 file
is then equivalent to the command
ed file <<%
g/string1/s//string2/g
w
%
and changes all occurrences of string1 in file to string2 . Substitution can be prevented using \ to quote
the special character $ as in
ed $3 <<+
1,\$s/$1/$2/g
w
+
(This version of edg is equivalent to the first except that ed will print a ? if there are no occurrences of
the string $1 .) Substitution within a here document may be prevented entirely by quoting the terminat-
ing string, for example,
grep $i <<\#
...
#
The document is presented without modification to grep. If parameter substitution is not required in a
here document this latter form is more efficient.
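
The three treatments of $ in a here document can be compared by using cat in place of grep or ed. The sketch assumes a modern POSIX shell; the variable name and the scratch files a, b and c are illustrative.

```shell
dir=$(mktemp -d) || exit 1
name=fred

# Unquoted terminator: $name is substituted before cat reads the text.
cat >"$dir/a" <<!
hello $name
!

# \$ prevents substitution for that one dollar sign.
cat >"$dir/b" <<!
hello \$name
!

# Quoted terminator: the document is passed through without modification.
cat >"$dir/c" <<\!
hello $name
!

subst=$(cat "$dir/a"); single=$(cat "$dir/b"); literal=$(cat "$dir/c")
echo "$subst / $single / $literal"
rm -r "$dir"
```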

2.4 Shell variables


The shell provides string-valued variables. Variable names begin with a letter and consist of letters,
digits and underscores. Variables may be given values by writing, for example,
user=fred box=m000 acct=mh0000
which assigns values to the variables user, box and acct. A variable may be set to the null string by
saying, for example,
null=
The value of a variable is substituted by preceding its name with $ ; for example,
echo $user
will echo fred.
Variables may be used interactively to provide abbreviations for frequently used strings. For example,
b=/usr/fred/bin
mv pgm $b
will move the file pgm from the current directory to the directory /usr/fred/bin . A more general nota-
tion is available for parameter (or variable) substitution, as in,
echo ${user}
which is equivalent to
echo $user
and is used when the parameter name is followed by a letter or digit. For example,
tmp=/tmp/ps
ps a >${tmp}a
will direct the output of ps to the file /tmp/psa, whereas,
ps a >$tmpa
would cause the value of the variable tmpa to be substituted.
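The effect of the braces can be checked without running ps. In this sketch the variable tmpa and its value are hypothetical, introduced only so that the two forms give visibly different results.

```shell
tmp=/tmp/ps
tmpa=/somewhere/else            # hypothetical second variable

with_braces=${tmp}a             # value of tmp followed by the letter a
without_braces=$tmpa            # value of the different variable tmpa

echo "$with_braces"
echo "$without_braces"
```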
Except for $? the following are set initially by the shell. $? is set after executing each command.
$? The exit status (return code) of the last command executed as a decimal string. Most
commands return a zero exit status if they complete successfully, otherwise a non-zero
exit status is returned. Testing the value of return codes is dealt with later under if and
while commands.
$# The number of positional parameters (in decimal). Used, for example, in the append
command to check the number of parameters.
$$ The process number of this shell (in decimal). Since process numbers are unique among
all existing processes, this string is frequently used to generate unique temporary file
names. For example,
ps a >/tmp/ps$$
...
rm /tmp/ps$$

$! The process number of the last process run in the background (in decimal).
$− The current shell flags, such as −x and −v .
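A short sketch of $?, $# and $$ at work, assuming a POSIX sh. The helper count_args and the use of TMPDIR to pick a scratch directory are assumptions made for the example.

```shell
# count_args: hypothetical helper that reports $# for its own arguments.
count_args() { echo $#; }

true  && ok=$?          # $? after a command that succeeds
false || failed=$?      # $? after a command that fails

# A scratch file name made unique by $$ (the TMPDIR fallback is an assumption).
scratch=${TMPDIR:-/tmp}/demo$$
echo hello >"$scratch"
contents=`cat "$scratch"`
rm "$scratch"

echo "ok=$ok failed=$failed args=`count_args a b c` contents=$contents"
```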
Some variables have a special meaning to the shell and should be avoided for general use.
$MAIL When used interactively the shell looks at the file specified by this variable before it
issues a prompt. If the specified file has been modified since it was last looked at the
shell prints the message you have mail before prompting for the next command. This
variable is typically set in the file .profile, in the user’s login directory. For example,
MAIL=/usr/mail/fred

$HOME The default argument for the cd command. The current directory is used to resolve file
name references that do not begin with a / , and is changed using the cd command. For
example,
cd /usr/fred/bin
makes the current directory /usr/fred/bin .
cat wn
will print on the terminal the file wn in this directory. The command cd with no argu-
ment is equivalent to
cd $HOME
This variable is also typically set in the user’s login profile.
$PATH A list of directories that contain commands (the search path ). Each time a command is
executed by the shell a list of directories is searched for an executable file. If $PATH is
not set then the current directory, /bin, and /usr/bin are searched by default. Otherwise
$PATH consists of directory names separated by : . For example,
PATH=:/usr/fred/bin:/bin:/usr/bin

specifies that the current directory (the null string before the first : ), /usr/fred/bin, /bin
and /usr/bin are to be searched in that order. In this way individual users can have their
own ‘private’ commands that are accessible independently of the current directory. If
the command name contains a / then this directory search is not used; a single attempt is
made to execute the command.
$PS1 The primary shell prompt string, by default, ‘$ ’.
$PS2 The shell prompt when further input is needed, by default, ‘> ’.
$IFS The set of characters used by blank interpretation (see section 3.4).
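The layout of a search path, including the null entry that stands for the current directory, can be made visible by splitting it on : with $IFS. This is a sketch for a POSIX sh; list_path is a hypothetical helper, and the ${dir:-.} form used to show a null entry as . is the modern colon variant of the default notation.

```shell
# list_path: hypothetical helper that prints one search directory per line.
list_path() {
    saved_ifs=$IFS
    IFS=:                       # split on : instead of blank, tab and newline
    for dir in $1
    do  echo "${dir:-.}"        # a null entry stands for the current directory
    done
    IFS=$saved_ifs
}

list_path :/usr/fred/bin:/bin:/usr/bin
```

The directories are printed in the order in which they would be searched.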

2.5 The test command


The test command, although not part of the shell, is intended for use by shell programs. For example,
test −f file
returns zero exit status if file exists and non-zero exit status otherwise. In general test evaluates a predi-
cate and returns the result as its exit status. Some of the more frequently used test arguments are given
here; see test (1) for a complete specification.
test s true if the argument s is not the null string
test −f file true if file exists
test −r file true if file is readable
test −w file true if file is writable
test −d file true if file is a directory

2.6 Control flow - while


The actions of the for loop and the case branch are determined by data available to the shell. A while
or until loop and an if then else branch are also provided whose actions are determined by the exit
status returned by commands. A while loop has the general form
while command-list1
do command-list2
done

The value tested by the while command is the exit status of the last simple command following while.
Each time round the loop command-list1 is executed; if a zero exit status is returned then command-list2
is executed; otherwise, the loop terminates. For example,
while test $1
do . . .
shift
done
is equivalent to
for i
do . . .
done
shift is a shell command that renames the positional parameters $2, $3, . . . as $1, $2, . . . and loses $1 .
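The while and shift idiom is sketched below for a POSIX sh. The helper show_args is hypothetical. It tests $# with the −gt operator of test (see test (1)) rather than testing $1, an assumption made so that a null argument does not end the loop early.

```shell
# show_args: hypothetical helper that brackets each argument in turn.
show_args() {
    out=
    while test $# -gt 0
    do  out="$out<$1>"
        shift                   # $2 becomes $1, $3 becomes $2, and so on
    done
    echo "$out"
}

show_args a "" c
```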
Another kind of use for the while/until loop is to wait until some external event occurs and then run
some commands. In an until loop the termination condition is reversed. For example,
until test −f file
do sleep 300; done
commands
will loop until file exists. Each time round the loop it waits for 5 minutes before trying again. (Presum-
ably another process will eventually create the file.)

2.7 Control flow - if


Also available is a general conditional branch of the form,
if command-list
then command-list
else command-list
fi
that tests the value returned by the last simple command following if.
The if command may be used in conjunction with the test command to test for the existence of a file as
in
if test −f file
then process file
else do something else
fi

An example of the use of if, case and for constructions is given in section 2.10 .
A multiple test if command of the form
if . . .
then ...
else if . . .
then ...
else if . . .
...
fi
fi
fi
may be written using an extension of the if notation as,
if . . .
then ...
elif ...
then ...
elif ...
...
fi
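The elif form combines naturally with test. The following is a sketch for a POSIX sh; the helper kind_of, the scratch directory and the TMPDIR fallback are all assumptions made for the example.

```shell
# kind_of: hypothetical helper built on the if / elif / else form.
kind_of() {
    if test -d "$1"
    then echo directory
    elif test -f "$1"
    then echo file
    else echo missing
    fi
}

scratch=${TMPDIR:-/tmp}/kind$$          # assumed scratch location
mkdir "$scratch"
: >"$scratch/plain"

kind_of "$scratch"
kind_of "$scratch/plain"
kind_of "$scratch/absent"

rm -r "$scratch"
```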

The following example is the touch command which changes the ‘last modified’ time for a list of files.
The command may be used in conjunction with make (1) to force recompilation of a list of files.
flag=
for i
do case $i in
−c) flag=N ;;
*) if test −f $i
then ln $i junk$$; rm junk$$
elif test $flag
then echo file \´$i\´ does not exist
else >$i
fi
esac
done
The −c flag is used in this command to force subsequent files to be created if they do not already exist.
Otherwise, if the file does not exist, an error message is printed. The shell variable flag is set to some
non-null string if the −c argument is encountered. The commands
ln . . .; rm . . .
make a link to the file and then remove it thus causing the last modified date to be updated.
The sequence
if command1
then command2
fi
may be written
command1 && command2
Conversely,
command1 || command2
executes command2 only if command1 fails. In each case the value returned is that of the last simple
command executed.
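Both short forms are sketched below for a POSIX sh. The helper report_dir and the path /no/such/directory are hypothetical.

```shell
# report_dir: hypothetical helper using && and || in place of if.
report_dir() {
    test -d "$1" && echo "$1 is a directory"
    test -d "$1" || echo "$1 is not a directory"
}

report_dir /
report_dir /no/such/directory
```

Each call prints exactly one line, since only one of the two echo commands is reached.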

2.8 Command grouping


Commands may be grouped in two ways,
{ command-list ; }
and
( command-list )

In the first form, command-list is simply executed. The second form executes command-list as a separate
process. For example,
(cd x; rm junk )
executes rm junk in the directory x without changing the current directory of the invoking shell.
The commands
cd x; rm junk
have the same effect but leave the invoking shell in the directory x.
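The two forms of grouping can be compared by recording the current directory after each. A minimal sketch, assuming a POSIX sh; the variable names are illustrative.

```shell
start=`pwd`

( cd / )                # runs as a separate process
after_paren=`pwd`       # unchanged

{ cd / ; }              # runs in the invoking shell
after_brace=`pwd`       # now /

cd "$start"             # restore the original directory
echo "$after_paren" "$after_brace"
```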
2.9 Debugging shell procedures


The shell provides two tracing mechanisms to help when debugging shell procedures. The first is
invoked within the procedure as
set −v
(v for verbose) and causes lines of the procedure to be printed as they are read. It is useful to help iso-
late syntax errors. It may be invoked without modifying the procedure by saying
sh −v proc . . .
where proc is the name of the shell procedure. This flag may be used in conjunction with the −n flag
which prevents execution of subsequent commands. (Note that saying set −n at a terminal will render
the terminal useless until an end-of-file is typed.)
The command
set −x
will produce an execution trace. Following parameter substitution each command is printed as it is exe-
cuted. (Try these at the terminal to see what effect they have.) Both flags may be turned off by saying
set −
and the current setting of the shell flags is available as $− .

2.10 The man command


The following is the man command which is used to print sections of the UNIX manual. It is called,
for example, as
man sh
man −t ed
man 2 fork
In the first, the manual section for sh is printed. Since no section is specified, section 1 is used. The
second example will typeset (−t option) the manual section for ed. The last prints the fork manual page
from section 2.
cd /usr/man

: ´colon is the comment command´
: ´default is nroff ($N), section 1 ($s)´
N=n s=1

for i
do case $i in
[1−9]*) s=$i ;;
−t) N=t ;;
−n) N=n ;;
−*) echo unknown flag \´$i\´ ;;

*) if test −f man$s/$i.$s
then ${N}roff man0/${N}aa man$s/$i.$s
else : ´look through all manual sections´
found=no
for j in 1 2 3 4 5 6 7 8 9
do if test −f man$j/$i.$j
then man $j $i
found=yes
fi
done
case $found in
no) echo "$i: manual page not found"
esac
fi
esac
done
Figure 1. A version of the man command
3.0 Keyword parameters


Shell variables may be given values by assignment or when a shell procedure is invoked. An argument
to a shell procedure of the form name=value that precedes the command name causes value to be
assigned to name before execution of the procedure begins. The value of name in the invoking shell is
not affected. For example,
user=fred command
will execute command with user set to fred. The −k flag causes arguments of the form name=value to
be interpreted in this way anywhere in the argument list. Such names are sometimes called keyword
parameters. If any arguments remain they are available as positional parameters $1, $2, . . . .
The set command may also be used to set positional parameters from within a procedure. For example,
set − *
will set $1 to the first file name in the current directory, $2 to the next, and so on. Note that the first
argument, −, ensures correct treatment when the first file name begins with a − .
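A sketch of both mechanisms, assuming a POSIX sh in which sh −c runs a command string in a new shell. The helper second_word is hypothetical, and set −− is used as the present-day spelling of the set − form in the text.

```shell
user=original

# The assignment before the command name applies to that command only.
seen=`user=fred sh -c 'echo $user'`

# second_word: hypothetical helper that uses set to re-split its argument.
second_word() {
    set -- $1           # the words of $1 become $1, $2, ...
    echo "$2"
}

echo "child saw $seen, invoking shell still has $user"
second_word 'one two three'
```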

3.1 Parameter transmission


When a shell procedure is invoked both positional and keyword parameters may be supplied with the
call. Keyword parameters are also made available implicitly to a shell procedure by specifying in
advance that such parameters are to be exported. For example,
export user box
marks the variables user and box for export. When a shell procedure is invoked copies are made of all
exportable variables for use within the invoked procedure. Modification of such variables within the
procedure does not affect the values in the invoking shell. It is generally true of a shell procedure that it
may not modify the state of its caller without explicit request on the part of the caller. (Shared file
descriptors are an exception to this rule.)
Names whose value is intended to remain constant may be declared readonly . The form of this com-
mand is the same as that of the export command,
readonly name . . .
Subsequent attempts to set readonly variables are illegal.
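The behaviour of export and readonly can be sketched as follows for a POSIX sh. The variable limit is hypothetical, and the rejected assignment is tried inside ( ) on the assumption that a shell may abandon execution when a readonly variable is set.

```shell
user=fred box=m000
export user                     # box is deliberately not exported

child_view=`sh -c 'echo "user=$user box=$box"'`

# A change made by the invoked shell does not come back to the caller.
sh -c 'user=bert'

limit=10
readonly limit
# Try the assignment in a separate process so that the failure is contained.
if ( limit=20 ) 2>/dev/null
then outcome=changed
else outcome=refused
fi

echo "$child_view; user is still $user; readonly assignment $outcome"
```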

3.2 Parameter substitution


If a shell parameter is not set then the null string is substituted for it. For example, if the variable d is
not set
echo $d
or
echo ${d}
will echo nothing. A default string may be given as in
echo ${d−.}
which will echo the value of the variable d if it is set and ‘.’ otherwise. The default string is evaluated
using the usual quoting conventions so that
echo ${d−´*´}
will echo * if the variable d is not set. Similarly
echo ${d−$1}
will echo the value of d if it is set and the value (if any) of $1 otherwise. A variable may be assigned a
default value using the notation
echo ${d=.}
which substitutes the same string as
echo ${d−.}
and if d were not previously set then it will be set to the string ‘.’ . (The notation ${. . .=. . .} is not
available for positional parameters.)
If there is no sensible default then the notation
echo ${d?message}
will echo the value of the variable d if it has one, otherwise message is printed by the shell and execu-
tion of the shell procedure is abandoned. If message is absent then a standard message is printed. A
shell procedure that requires some parameters to be set might start as follows.
: ${user?} ${acct?} ${bin?}
...
Colon (:) is a command that is built in to the shell and does nothing once its arguments have been
evaluated. If any of the variables user, acct or bin are not set then the shell will abandon execution of
the procedure.
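The three notations can be tried together. This is a sketch for a POSIX sh using ASCII hyphens; the variable names a, b, c and verdict are illustrative, and the ${acct?} check is run inside ( ) so that only that separate process is abandoned.

```shell
unset d

a=${d-.}                # d is unset, so the default . is used
b=${d-'*'}              # the default is quoted, so it stays a literal *
c=${d=.}                # as for a, and d is now set to .

# ${acct?} abandons the shell that evaluates it, so contain it in ( ).
unset acct
if ( : ${acct?} ) 2>/dev/null
then verdict=continued
else verdict=abandoned
fi

echo "a=$a b=$b c=$c d=$d verdict=$verdict"
```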

3.3 Command substitution


The standard output from a command can be substituted in a similar way to parameters. The command
pwd prints on its standard output the name of the current directory. For example, if the current directory
is /usr/fred/bin then the command
d=`pwd`
is equivalent to
d=/usr/fred/bin

The entire string between grave accents (`. . .`) is taken as the command to be executed and is replaced
with the output from the command. The command is written using the usual quoting conventions except
that a ` must be escaped using a \ . For example,
ls `echo "$1"`
is equivalent to
ls $1
Command substitution occurs in all contexts where parameter substitution occurs (including here docu-
ments) and the treatment of the resulting text is the same in both cases. This mechanism allows string
processing commands to be used within shell procedures. An example of such a command is basename
which removes a specified suffix from a string. For example,
basename main.c .c
will print the string main . Its use is illustrated by the following fragment from a cc command.
case $A in
...
*.c) B=`basename $A .c`
...
esac
that sets B to the part of $A with the suffix .c stripped.
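The basename fragment can be wrapped into something that runs on its own. In this sketch for a POSIX sh, the helper object_name and the idea of appending .o are assumptions made for illustration; they are not taken from the cc command.

```shell
# object_name: hypothetical helper mapping a C source name to an object name.
object_name() {
    case $1 in
    *.c) B=`basename "$1" .c`   # strip the directory part and the .c suffix
         echo "$B.o" ;;
    *)   echo "$1" ;;
    esac
}

object_name main.c
object_name /usr/src/cmd/sh/main.c
object_name lib.a
```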
Here are some composite examples.
• for i in `ls −t`; do . . .
The variable i is set to the names of files in time order, most recent first.
• set `date`; echo $6 $2 $3, $4
will print, e.g., 1977 Nov 1, 23:59:59

3.4 Evaluation and quoting


The shell is a macro processor that provides parameter substitution, command substitution and file name
generation for the arguments to commands. This section discusses the order in which these evaluations
occur and the effects of the various quoting mechanisms.
Commands are parsed initially according to the grammar given in appendix A. Before a command is
executed the following substitutions occur.
• parameter substitution, e.g. $user
• command substitution, e.g. `pwd`
Only one evaluation occurs so that if, for example, the value of the variable X is the string
$y then
echo $X
will echo $y .
• blank interpretation
Following the above substitutions the resulting characters are broken into non-blank words
(blank interpretation). For this purpose ‘blanks’ are the characters of the string $IFS. By
default, this string consists of blank, tab and newline. The null string is not regarded as a
word unless it is quoted. For example,
echo ´´
will pass on the null string as the first argument to echo, whereas
echo $null
will call echo with no arguments if the variable null is not set or set to the null string.
• file name generation
Each word is then scanned for the file pattern characters *, ? and [. . .] and an alphabetical
list of file names is generated to replace the word. Each such file name is a separate argu-
ment.
The evaluations just described also occur in the list of words associated with a for loop. Only substitu-
tion occurs in the word used for a case branch.
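The single evaluation and the effect of blank interpretation can be observed by counting arguments. A sketch for a POSIX sh with the default $IFS; the helper count and the variable words are hypothetical.

```shell
# count: hypothetical helper that reports how many arguments it received.
count() { echo $#; }

null=
words='a  b c'          # two blanks between a and b
X='$y' y=pqr

count ''                # 1: a quoted null string is a word
count $null             # 0: an unquoted null value disappears
count $words            # 3: a run of blanks separates words only once
echo $X                 # $y: the substituted text is not evaluated again
```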
As well as the quoting mechanisms described earlier using \ and ´. . .´ a third quoting mechanism is pro-
vided using double quotes. Within double quotes parameter and command substitution occurs but file
name generation and the interpretation of blanks does not. The following characters have a special
meaning within double quotes and may be quoted using \ .
$ parameter substitution
` command substitution
" ends the quoted string
\ quotes the special characters $ ` " \
For example,
echo "$x"
will pass the value of the variable x as a single argument to echo. Similarly,
echo "$*"
will pass the positional parameters as a single argument and is equivalent to
echo "$1 $2 . . ."


The notation $@ is the same as $* except when it is quoted.
echo "$@"
will pass the positional parameters, unevaluated, to echo and is equivalent to
echo "$1" "$2" . . .
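The difference between the two quoted forms shows up in the number of arguments passed on. A sketch for a POSIX sh; count, as_star and as_at are hypothetical helpers.

```shell
# count: hypothetical helper that reports how many arguments it received.
count() { echo $#; }

as_star() { count "$*"; }       # one argument: "$1 $2 ..."
as_at()   { count "$@"; }       # one argument per positional parameter

as_star a 'b c' d
as_at   a 'b c' d
```

The first call reports 1 and the second reports 3; the argument containing a blank stays intact in both.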

The following table gives, for each quoting mechanism, the shell metacharacters that are evaluated.
              metacharacter
        \    $    *    `    "    ´
   ´    n    n    n    n    n    t
   `    y    n    n    t    n    n
   "    y    y    n    y    t    n

t terminator
y interpreted
n not interpreted

Figure 2. Quoting mechanisms

In cases where more than one evaluation of a string is required the built-in command eval may be used.
For example, if the variable X has the value $y, and if y has the value pqr then
eval echo $X
will echo the string pqr .
In general the eval command evaluates its arguments (as do all commands) and treats the result as input
to the shell. The input is read and the resulting command(s) executed. For example,
wg=´eval who | grep´
$wg fred
is equivalent to
who | grep fred
In this example, eval is required since there is no interpretation of metacharacters, such as | , following
substitution.
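The first eval example can be run directly. A minimal sketch for a POSIX sh; the variables once and twice are introduced only to hold the two results.

```shell
X='$y' y=pqr

once=`echo $X`          # one evaluation: the text $y
twice=`eval echo $X`    # eval reads "echo $y" as shell input and runs it

echo "$once"
echo "$twice"
```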

3.5 Error handling


The treatment of errors detected by the shell depends on the type of error and on whether the shell is
being used interactively. An interactive shell is one whose input and output are connected to a terminal
(as determined by gtty (2)). A shell invoked with the −i flag is also interactive.
Execution of a command (see also 3.7) may fail for any of the following reasons.
• Input output redirection may fail. For example, if a file does not exist or cannot be created.
• The command itself does not exist or cannot be executed.
• The command terminates abnormally, for example, with a "bus error" or "memory fault". See
Figure 3 below for a complete list of UNIX signals.
• The command terminates normally but returns a non-zero exit status.
In all of these cases the shell will go on to execute the next command. Except for the last case an error
message will be printed by the shell. All remaining errors cause the shell to exit from a command pro-
cedure. An interactive shell will return to read another command from the terminal. Such errors include
the following.
• Syntax errors, e.g., if . . . then . . . done
• A signal such as interrupt. The shell waits for the current command, if any, to finish execution
and then either exits or returns to the terminal.
• Failure of any of the built-in commands such as cd.
The shell flag −e causes the shell to terminate if any error is detected.
 1    hangup
 2    interrupt
 3*   quit
 4*   illegal instruction
 5*   trace trap
 6*   IOT instruction
 7*   EMT instruction
 8*   floating point exception
 9    kill (cannot be caught or ignored)
10*   bus error
11*   segmentation violation
12*   bad argument to system call
13    write on a pipe with no one to read it
14    alarm clock
15    software termination (from kill (1))

Figure 3. UNIX signals

Those signals marked with an asterisk produce a core dump if not caught. However, the shell itself
ignores quit which is the only external signal that can cause a dump. The signals in this list of potential
interest to shell programs are 1, 2, 3, 14 and 15.

3.6 Fault handling


Shell procedures normally terminate when an interrupt is received from the terminal. The trap com-
mand is used if some cleaning up is required, such as removing temporary files. For example,
trap ´rm /tmp/ps$$; exit´ 2
sets a trap for signal 2 (terminal interrupt), and if this signal is received will execute the commands
rm /tmp/ps$$; exit
exit is another built-in command that terminates execution of a shell procedure. The exit is required;
otherwise, after the trap has been taken, the shell will resume executing the procedure at the place where
it was interrupted.
UNIX signals can be handled in one of three ways. They can be ignored, in which case the signal is
never sent to the process. They can be caught, in which case the process must decide what action to
take when the signal is received. Lastly, they can be left to cause termination of the process without it
having to take any further action. If a signal is being ignored on entry to the shell procedure, for exam-
ple, by invoking it in the background (see 3.7) then trap commands (and the signal) are ignored.
The use of trap is illustrated by this modified version of the touch command (Figure 4). The cleanup
action is to remove the file junk$$ .
flag=
trap ´rm −f junk$$; exit´ 1 2 3 15
for i
do case $i in
−c) flag=N ;;
*) if test −f $i
then ln $i junk$$; rm junk$$
elif test $flag
then echo file \´$i\´ does not exist
else >$i
fi
esac
done

Figure 4. The touch command

The trap command appears before the creation of the temporary file; otherwise it would be possible for
the process to die without removing the file.
Since there is no signal 0 in UNIX it is used by the shell to indicate the commands to be executed on
exit from the shell procedure.
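A trap on signal 0 gives a simple way to clean up on any exit. The sketch below assumes a POSIX sh and uses ( ) to stand in for a separate shell procedure; the scratch file name and the TMPDIR fallback are assumptions.

```shell
scratch=${TMPDIR:-/tmp}/trapdemo$$      # assumed scratch location

(
    # Set the trap before creating the file.
    trap 'rm -f "$scratch"' 0
    : >"$scratch"
    test -f "$scratch" && inside=present
    echo "inside the procedure: ${inside-absent}"
)

if test -f "$scratch"
then echo "after exit: still present"
else echo "after exit: removed"
fi
```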
A procedure may, itself, elect to ignore signals by specifying the null string as the argument to trap.
The following fragment is taken from the nohup command.
trap ´´ 1 2 3 15
which causes hangup, interrupt, quit and software termination (the signal sent by kill) to be ignored both by the
procedure and by invoked commands.
Traps may be reset by saying
trap 2 3
which resets the traps for signals 2 and 3 to their default values. A list of the current values of traps
may be obtained by writing
trap
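Ignoring a signal can be demonstrated safely inside a separate shell. This sketch assumes a POSIX sh whose kill accepts a signal number; the variable survived is illustrative.

```shell
# The inner shell ignores signal 15, sends that signal to itself, and carries on.
survived=`sh -c 'trap "" 15; kill -15 $$; echo survived'`
echo "$survived"
```

Without the trap the inner shell would terminate before reaching echo.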

The procedure scan (Figure 5) is an example of the use of trap where there is no exit in the trap com-
mand. scan takes each directory in the current directory, prompts with its name, and then executes com-
mands typed at the terminal until an end of file or an interrupt is received. Interrupts are ignored while
executing the requested commands but cause termination when scan is waiting for input.
d=`pwd`
for i in *
do if test −d $d/$i
then cd $d/$i
while echo "$i:"
trap exit 2
read x
do trap : 2; eval $x; done
fi
done

Figure 5. The scan command

read x is a built-in command that reads one line from the standard input and places the result in the
variable x . It returns a non-zero exit status if either an end-of-file is read or an interrupt is received.

3.7 Command execution


To run a command (other than a built-in) the shell first creates a new process using the system call fork.
The execution environment for the command includes input, output and the states of signals, and is esta-
blished in the child process before the command is executed. The built-in command exec is used in the
rare cases when no fork is required and simply replaces the shell with a new command. For example, a
simple version of the nohup command looks like
trap ´´ 1 2 3 15
exec $*
The trap turns off the signals specified so that they are ignored by subsequently created commands and
exec replaces the shell by the command specified.
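That exec replaces the shell, rather than returning to it, can be seen from a command that follows it. A minimal sketch for a POSIX sh; the variable result is illustrative.

```shell
# The inner shell is replaced by echo, so the second command never runs.
result=`sh -c 'exec echo replaced; echo never reached'`
echo "$result"
```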
Most forms of input output redirection have already been described. In the following, word is subject
only to parameter and command substitution. No file name generation or blank interpretation takes place
so that, for example,
echo . . . >*.c
will write its output into a file whose name is *.c . Input output specifications are evaluated left to right
as they appear in the command.
> word The standard output (file descriptor 1) is sent to the file word which is created if it does
not already exist.
>> word The standard output is sent to file word. If the file exists then output is appended (by
seeking to the end); otherwise the file is created.
< word The standard input (file descriptor 0) is taken from the file word.
<< word The standard input is taken from the lines of shell input that follow up to but not includ-
ing a line consisting only of word. If word is quoted then no interpretation of the docu-
ment occurs. If word is not quoted then parameter and command substitution occur and \
is used to quote the characters \ $ ` and the first character of word. In the latter case
\newline is ignored (cf. quoted strings).
>& digit The file descriptor digit is duplicated using the system call dup (2) and the result is used
as the standard output.
<& digit The standard input is duplicated from file descriptor digit.
<&− The standard input is closed.
>&− The standard output is closed.
Any of the above may be preceded by a digit in which case the file descriptor created is that specified
by the digit instead of the default 0 or 1. For example,
. . . 2>file
runs a command with message output (file descriptor 2) directed to file.
. . . 2>&1
runs a command with its standard output and message output merged. (Strictly speaking file descriptor
2 is created by duplicating file descriptor 1 but the effect is usually to merge the two streams.)
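Because input output specifications are evaluated left to right, the position of 2>&1 matters. The sketch below assumes a POSIX sh; emit is a hypothetical helper that writes one line to each stream.

```shell
# emit: hypothetical helper writing one line to each output stream.
emit() { echo out; echo err >&2; }

merged=`emit 2>&1`                  # both streams are captured
swapped=`emit 2>&1 >/dev/null`      # 2 copies the old 1, then 1 is redirected
silent=`emit >/dev/null 2>&1`       # 1 is redirected, then 2 copies the new 1

echo "merged: $merged"
echo "swapped: $swapped"
echo "silent: $silent"
```

Only the message output survives in swapped, and nothing at all in silent.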
The environment for a command run in the background such as
list *.c | lpr &
is modified in two ways. Firstly, the default standard input for such a command is the empty file
/dev/null . This prevents two processes (the shell and the command), which are running in parallel,
from trying to read the same input. Chaos would ensue if this were not the case. For example,
ed file &
would allow both the editor and the shell to read from the same input at the same time.
The other modification to the environment of a background command is to turn off the QUIT and
INTERRUPT signals so that they are ignored by the command. This allows these signals to be used at
the terminal without causing background commands to terminate. For this reason the UNIX convention
for a signal is that if it is set to 1 (ignored) then it is never changed even for a short time. Note that the
shell command trap has no effect for an ignored signal.

3.8 Invoking the shell


The following flags are interpreted by the shell when it is invoked. If the first character of argument
zero is a minus, then commands are read from the file .profile .
−c string
If the −c flag is present then commands are read from string .
−s If the −s flag is present or if no arguments remain then commands are read from the standard
input. Shell output is written to file descriptor 2.
−i If the −i flag is present or if the shell input and output are attached to a terminal (as told by gtty)
then this shell is interactive. In this case TERMINATE is ignored (so that kill 0 does not kill an
interactive shell) and INTERRUPT is caught and ignored (so that wait is interruptible). In all
cases QUIT is ignored by the shell.

Acknowledgements
The design of the shell is based in part on the original UNIX shell [3] and the PWB/UNIX shell [4], some
features having been taken from both. Similarities also exist with the command interpreters of the
Cambridge Multiple Access System [5] and of CTSS [6].
I would like to thank Dennis Ritchie and John Mashey for many discussions during the design of the
shell. I am also grateful to the members of the Computing Science Research Center and to Joe Maran-
zano for their comments on drafts of this document.

References
1. B. W. Kernighan, UNIX for Beginners, 1978.
2. K. Thompson and D. M. Ritchie, UNIX Programmer’s Manual, Bell Laboratories (1978). Seventh
Edition.
3. K. Thompson, ‘‘The UNIX Command Language,’’ pp. 375-384 in Structured Programming—
Infotech State of the Art Report, Infotech International Ltd., Nicholson House, Maidenhead,
Berkshire, England (March 1975).
4. J. R. Mashey, PWB/UNIX Shell Tutorial, September 30, 1977.
5. D. F. Hartley (Ed.), The Cambridge Multiple Access System – Users Reference Manual, Univer-
sity Mathematical Laboratory, Cambridge, England (1968).
6. P. A. Crisman (Ed.), The Compatible Time-Sharing System, M.I.T. Press, Cambridge, Mass.
(1965).
Appendix A - Grammar

item:            word
                 input-output
                 name = value

simple-command:  item
                 simple-command item

command:         simple-command
                 ( command-list )
                 { command-list }
                 for name do command-list done
                 for name in word . . . do command-list done
                 while command-list do command-list done
                 until command-list do command-list done
                 case word in case-part . . . esac
                 if command-list then command-list else-part fi

pipeline:        command
                 pipeline | command

andor:           pipeline
                 andor && pipeline
                 andor || pipeline

command-list:    andor
                 command-list ;
                 command-list &
                 command-list ; andor
                 command-list & andor

input-output:    > file
                 < file
                 >> word
                 << word

file:            word
                 & digit
                 &−

case-part:       pattern ) command-list ;;

pattern:         word
                 pattern | word

else-part:       elif command-list then command-list else-part
                 else command-list
                 empty

empty:

word:            a sequence of non-blank characters

name:            a sequence of letters, digits or underscores starting with a letter

digit:           0123456789
Appendix B - Meta-characters and Reserved Words


a) syntactic
| pipe symbol
&& ‘andf’ symbol
|| ‘orf’ symbol
; command separator
;; case delimiter
& background commands
() command grouping
< input redirection
<< input from a here document
> output creation
>> output append

b) patterns
* match any character(s) including none
? match any single character
[...] match any of the enclosed characters

c) substitution
${...} substitute shell variable
`...` substitute command output

d) quoting
\ quote the next character
´...´ quote the enclosed characters except for ´
"..." quote the enclosed characters except for $ ` \ "

e) reserved words
if then else elif fi
case in esac
for while until do done
{ }
LEARN — Computer-Aided Instruction on UNIX
(Second Edition)

Brian W. Kernighan
Michael E. Lesk
Bell Laboratories
Murray Hill, New Jersey 07974

ABSTRACT
This paper describes the second version of the learn program for interpreting
CAI scripts on the UNIX† operating system, and a set of scripts that provide a compu-
terized introduction to the system.
Six current scripts cover basic commands and file handling, the editor, additional
file handling commands, the eqn program for mathematical typing, the ‘‘−ms’’ package
of formatting macros, and an introduction to the C programming language. These
scripts now include a total of about 530 lessons.
Many users from a wide variety of backgrounds have used learn to acquire basic
UNIX skills. Most usage involves the first two scripts, an introduction to UNIX files
and commands, and the UNIX editor.
The second version of learn is about four times faster than the previous one in
CPU utilization, and much faster in perceived time because of better overlap of com-
puting and printing. It also requires less file space than the first version. Many of the
lessons have been revised; new material has been added to reflect changes and
enhancements in UNIX itself. Script-writing is also easier because of revisions to the
script language.

January 30, 1979

_______________
†UNIX is a Trademark of Bell Laboratories.
1. Educational Assumptions and Design.


First, the way to teach people how to do something is to have them do it. Scripts should not con-
tain long pieces of explanation; they should instead frequently ask the student to do some task. So
teaching is always by example: the typical script fragment shows a small example of some technique
and then asks the user to either repeat that example or produce a variation on it. All are intended to be
easy enough that most students will get most questions right, reinforcing the desired behavior.
Most lessons fall into one of three types. The simplest presents a lesson and asks for a yes or no
answer to a question. The student is given a chance to experiment before replying. The script checks
for the correct reply. Problems of this form are sparingly used.
The second type asks for a word or number as an answer. For example a lesson on files might
say
How many files are there in the current directory? Type ‘‘answer N’’, where N is the number of
files.
The student is expected to respond (perhaps after experimenting) with
answer 17
or whatever. Surprisingly often, however, the idea of a substitutable argument (i.e., replacing N by 17)
is difficult for non-programmer students, so the first few such lessons need real care.
The third type of lesson is open-ended — a task is set for the student, appropriate parts of the
input or output are monitored, and the student types ready when the task is done. Figure 1 shows a
sample dialog that illustrates the last of these, using two lessons about the cat (concatenate, i.e., print)
command taken from early in the script that teaches file handling. Most learn lessons are of this form.
After each correct response the computer congratulates the student and indicates the lesson number
that has just been completed, permitting the student to restart the script after that lesson. If the answer
is wrong, the student is offered a chance to repeat the lesson. The ‘‘speed’’ rating of the student
(explained in section 4) is given after the lesson number when the lesson is completed successfully; it is
printed only for the aid of script authors checking out possible errors in the lessons.
It is assumed that there is no foolproof way to determine if the student truly ‘‘understands’’ what
he or she is doing; accordingly, the current learn scripts only measure performance, not comprehension.
If the student can perform a given task, that is deemed to be ‘‘learning.’’ [1]
The main point of using the computer is that what the student does is checked for correctness
immediately. Unlike many CAI scripts, however, these scripts provide few facilities for dealing with
wrong answers. In practice, if most of the answers are not right the script is a failure; the universal
solution to student error is to provide a new, easier script. Anticipating possible wrong answers is an
endless job, and it is really easier as well as better to provide a simpler script.
Along with this goes the assumption that anything can be taught to anybody if it can be broken
into sufficiently small pieces. Anything not absorbed in a single chunk is just subdivided.

_______________________________________________
Figure 1: Sample dialog from basic files script

(Student responses in italics; ‘$’ is the prompt)

A file can be printed on your terminal
by using the "cat" command. Just say
"cat file" where "file" is the file name.
For example, there is a file named
"food" in this directory. List it
by saying "cat food"; then type "ready".
$ cat food
this is the file
named food.
$ ready

Good. Lesson 3.3a (1)

Of course, you can print any file with "cat".
In particular, it is common to first use
"ls" to find the name of a file and then "cat"
to print it. Note the difference between
"ls", which tells you the name of the file,
and "cat", which tells you the contents.
One file in the current directory is named for
a President. Print the file, then type "ready".
$ cat President
cat: can’t open President
$ ready

Sorry, that’s not right. Do you want to try again? yes
Try the problem again.
$ ls
.ocopy
X1
roosevelt
$ cat roosevelt
this file is named roosevelt
and contains three lines of
text.
$ ready

Good. Lesson 3.3b (0)

The "cat" command can also print several files
at once. In fact, it is named "cat" as an abbreviation
for "concatenate"....
_______________________________________________

To avoid boring the faster students, however, an effort is made in the files and editor scripts to
provide three tracks of different difficulty. The fastest sequence of lessons is aimed at roughly the bulk
and speed of a typical tutorial manual and should be adequate for review and for well-prepared students.
The next track is intended for most users and is roughly twice as long. Typically, for example, the fast
track might present an idea and ask for a variation on the example shown; the normal track will first ask
the student to repeat the example that was shown before attempting a variation. The third and slowest
track, which is often three or four times the length of the fast track, is intended to be adequate for any-
one. (The lessons of Figure 1 are from the third track.) The multiple tracks also mean that a student
repeating a course is unlikely to hit the same series of lessons; this makes it profitable for a shaky user
to back up and try again, and many students have done so.
The tracks are not completely distinct, however. Depending on the number of correct answers the
student has given for the last few lessons, the program may switch tracks. The driver is actually capable
of following an arbitrary directed graph of lesson sequences, as discussed in section 4. Some more
structured arrangement, however, is used in all current scripts to aid the script writer in organizing the
material into lessons. It is sufficiently difficult to write lessons that the three-track theory is not fol-
lowed very closely except in the files and editor scripts. Accordingly, in some cases, the fast track is
produced merely by skipping lessons from the slower track. In others, there is essentially only one
track.
The main reason for using the learn program rather than simply writing the same material as a
workbook is not the selection of tracks, but actual hands-on experience. Learning by doing is much
more effective than pencil and paper exercises.
Learn also provides a mechanical check on performance. The first version in fact would not let
the student proceed unless it received correct answers to the questions it set and it would not tell a stu-
dent the right answer. This somewhat Draconian approach has been moderated in version 2. Lessons
are sometimes badly worded or even just plain wrong; in such cases, the student has no recourse. But if
a student is simply unable to complete one lesson, that should not prevent access to the rest. Accord-
ingly, the current version of learn allows the student to skip a lesson that he cannot pass; a ‘‘no’’
answer to the ‘‘Do you want to try again?’’ question in Figure 1 will pass to the next lesson. It is still
true that learn will not tell the student the right answer.
Of course, there are valid objections to the assumptions above. In particular, some students may
object to not understanding what they are doing; and the procedure of smashing everything into small
pieces may provoke the retort ‘‘you can’t cross a ditch in two jumps.’’ Since writing CAI scripts is
considerably more tedious than ordinary manuals, however, it is safe to assume that there will always be
alternatives to the scripts as a way of learning. In fact, for a reference manual of 3 or 4 pages it would
not be surprising to have a tutorial manual of 20 pages and a (multi-track) script of 100 pages. Thus the
reference manual will exist long before the scripts.

2. Scripts.
As mentioned above, the present scripts try at most to follow a three-track theory. Thus little of
the potential complexity of the possible directed graph is employed, since care must be taken in lesson
construction to see that every necessary fact is presented in every possible path through the units. In
addition, it is desirable that every unit have alternate successors to deal with student errors.
In most existing courses, the first few lessons are devoted to checking prerequisites. For example,
before the student is allowed to proceed through the editor script the script verifies that the student
understands files and is able to type. It is felt that the sooner lack of student preparation is detected, the
easier it will be on the student. Anyone proceeding through the scripts should be getting mostly correct
answers; otherwise, the system will be unsatisfactory both because the wrong habits are being learned
and because the scripts make little effort to deal with wrong answers. Unprepared students should not
be encouraged to continue with scripts.
There are some preliminary items which the student must know before any scripts can be tried. In
particular, the student must know how to connect to a UNIX† system, set the terminal properly, log in,
and execute simple commands (e.g., learn itself). In addition, the character erase and line kill conven-
tions (# and @) should be known. It is hard to see how this much could be taught by computer-aided
instruction, since a student who does not know these basic skills will not be able to run the learning pro-
gram. A brief description on paper is provided (see Appendix A), although assistance will be needed for
the first few minutes. This assistance, however, need not be highly skilled.

The first script in the current set deals with files. It assumes the basic knowledge above and
teaches the student about the ls, cat, mv, rm, cp and diff commands. It also deals with the abbreviation
characters *, ?, and [ ] in file names. It does not cover pipes or I/O redirection, nor does it present
the many options on the ls command.
This script contains 31 lessons in the fast track; two are intended as prerequisite checks, seven are
review exercises. There are a total of 75 lessons in all three tracks, and the instructional passages typed
at the student to begin each lesson total 4,476 words. The average lesson thus begins with a 60-word
message. In general, the fast track lessons have somewhat longer introductions, and the slow tracks
somewhat shorter ones. The longest message is 144 words and the shortest 14.
The second script trains students in the use of the UNIX context editor ed, a sophisticated editor
using regular expressions for searching. [2] All editor features except encryption, mark names and ‘;’ in
addressing are covered. The fast track contains 2 prerequisite checks, 93 lessons, and a review lesson.
It is supplemented by 146 additional lessons in other tracks.
A comparison of sizes may be of interest. The ed description in the reference manual is 2,572
words long. The ed tutorial [3] is 6,138 words long. The fast track through the ed script is 7,407 words of
explanatory messages, and the total ed script, 242 lessons, has 15,615 words. The average ed lesson is
thus also about 60 words; the largest is 171 words and the smallest 10. The original ed script represents
about three man-weeks of effort.
The advanced file handling script deals with ls options, I/O diversion, pipes, and supporting programs
like pr, wc, tail, spell and grep. (The basic file handling script is a prerequisite.) It is not as
refined as the first two scripts; this is reflected at least partly in the fact that it provides much less of a
full three-track sequence than they do. On the other hand, since it is perceived as ‘‘advanced,’’ it is
hoped that the student will have somewhat more sophistication and be better able to cope with it at a
reasonably high level of performance.
A fourth script covers the eqn language for typing mathematics. This script must be run on a ter-
minal capable of printing mathematics, for instance the DASI 300 and similar Diablo-based terminals, or
the nearly extinct Model 37 teletype. Again, this script is relatively short of tracks: of 76 lessons, only
17 are in the second track and 2 in the third track. Most of these provide additional practice for stu-
dents who are having trouble in the first track.
The -ms script for formatting macros is a short, one-track-only script. The macro package it
describes is no longer the standard, so this script will undoubtedly be superseded in the future. Further-
more, the linear style of a single learn script is somewhat inappropriate for the macros, since the macro
package is composed of many independent features, and few users need all of them. It would be better
to have a selection of short lesson sequences dealing with the features independently.
The script on C is in a state of transition. It was originally designed to follow a tutorial on C, but
that document has since become obsolete. The current script has been partially converted to follow the
order of presentation in The C Programming Language, [4] but this job is not complete. The C script was
never intended to teach C; rather it is supposed to be a series of exercises for which the computer pro-
vides checking and (upon success) a suggested solution.
This combination of scripts covers much of the material which any UNIX user will need to know to
make effective use of the system. With enlargement of the advanced files course to include more on the
command interpreter, there will be a relatively complete introduction to UNIX available via learn.
Although we make no pretense that learn will replace other instructional materials, it should provide a
useful supplement to existing tutorials and reference manuals.

3. Experience with Students.


Learn has been installed on many different UNIX systems. Most of the usage is on the first two
scripts, so these are more thoroughly debugged and polished. As a (random) sample of user experience,
the learn program has been used at Bell Labs at Indian Hill for 10,500 lessons in a four month period.
About 3600 of these are in the files script, 4100 in the editor, and 1400 in advanced files. The passing
rate is about 80%, that is, about 4 lessons are passed for every one failed. There have been 86 distinct
users of the files script, and 58 of the editor. On our system at Murray Hill, there have been nearly
2000 lessons over two weeks that include Christmas and New Year. Users have ranged in age from six
up.
It is difficult to characterize typical sessions with the scripts; many instances exist of someone
doing one or two lessons and then logging out, as do instances of someone pausing in a script for
twenty minutes or more. In the earlier version of learn, the average session in the files course took 32
minutes and covered 23 lessons. The distribution is quite broad and skewed, however; the longest ses-
sion was 130 minutes and there were five sessions shorter than five minutes. The average lesson took
about 80 seconds. These numbers are roughly typical for non-programmers; a UNIX expert can do the
scripts at approximately 30 seconds per lesson, most of which is the system printing.
In the present version, working through a section in the middle of the files script took about 1.4 seconds of
processor time per lesson, and a system expert typing quickly took 15 seconds of real time per lesson.
A novice would probably take at least a minute. Thus a UNIX system could support ten students work-
ing simultaneously with some spare capacity.
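The capacity estimate can be checked with a little arithmetic. The sketch below assumes that processor time is the only limiting resource (printing and disk traffic are ignored) and uses the per-lesson figures quoted above; the function name is ours, not part of learn.

```python
def cpu_load(students, cpu_seconds_per_lesson, real_seconds_per_lesson):
    """Fraction of one processor kept busy by `students` working at once."""
    return students * cpu_seconds_per_lesson / real_seconds_per_lesson


# Ten novices, each taking about a minute of real time per lesson.
novice_load = cpu_load(10, 1.4, 60.0)
# Ten experts typing quickly, at 15 seconds of real time per lesson.
expert_load = cpu_load(10, 1.4, 15.0)

print(f"ten novices: {novice_load:.0%} of the processor")
print(f"ten experts: {expert_load:.0%} of the processor")
```

Under these assumptions ten novices occupy roughly a quarter of the processor, which is consistent with the claim of spare capacity; ten experts working flat out would come close to saturating it.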

4. The Script Interpreter.


The learn program itself merely interprets scripts. It provides facilities for the script writer to cap-
ture student responses and their effects, and simplifies the job of passing control to and recovering con-
trol from the student. This section describes the operation and usage of the driver program, and indi-
cates what is required to produce a new script. Readers only interested in the existing scripts may skip
this section.
The file structure used by learn is shown in Figure 2. There is one parent directory (named lib)
containing the script data. Within this directory are subdirectories, one for each subject in which a
course is available, one for logging (named log), and one in which user sub-directories are created
(named play). The subject directory contains master copies of all lessons, plus any supporting material
for that subject. In a given subdirectory, each lesson is a single text file. Lessons are usually named
systematically; the file that contains lesson n is called Ln.

___________________________________________________
Figure 2: Directory structure for learn

lib
    play
        student1
            files for student1...
        student2
            files for student2...
    files
        L0.1a        lessons for files course
        L0.1b
        ...
    editor
        ...
    (other courses)
    log
___________________________________________________

When learn is executed, it makes a private directory for the user to work in, within the learn por-
tion of the file system. A fresh copy of all the files used in each lesson (mostly data for the student to
operate upon) is made each time a student starts a lesson, so the script writer may assume that every-
thing is reinitialized each time a lesson is entered. The student directory is deleted after each session;
any permanent records must be kept elsewhere.
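The life cycle of the student directory can be sketched as follows. This is an illustration only, not the code of learn: the class and method names are hypothetical, the directory is named by the Python library rather than after the student, and clearing leftover files at the start of each lesson is our assumption about how "reinitialized" is achieved.

```python
import os
import shutil
import tempfile


class Session:
    """One student's private working directory under a 'play' directory."""

    def __init__(self, play_root):
        # A uniquely named directory per session; the naming is an assumption.
        self.workdir = tempfile.mkdtemp(dir=play_root)

    def start_lesson(self, files):
        """Reinitialize the directory, then write this lesson's files.

        `files` maps a file name to its initial text.
        """
        for name in os.listdir(self.workdir):
            path = os.path.join(self.workdir, name)
            if os.path.isdir(path) and not os.path.islink(path):
                shutil.rmtree(path)
            else:
                os.remove(path)
        for name, text in files.items():
            with open(os.path.join(self.workdir, name), "w") as f:
                f.write(text)

    def close(self):
        """Delete the working directory at the end of the session."""
        shutil.rmtree(self.workdir)


play = tempfile.mkdtemp()            # stands in for lib/play
session = Session(play)
session.start_lesson({"food": "this is the file\nnamed food.\n"})
session.start_lesson({"roosevelt": "this file is named roosevelt\n"
                                   "and contains three lines of\ntext.\n"})
files_now = os.listdir(session.workdir)   # only the current lesson's data
session.close()
left_behind = os.listdir(play)            # nothing survives the session
shutil.rmtree(play)
```

Because nothing in the working directory outlives the session, anything worth keeping (the log, for instance) has to be written outside it.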

The script writer must provide certain basic items in each lesson:
(1) the text of the lesson;
(2) the set-up commands to be executed before the user gets control;
(3) the data, if any, which the user is supposed to edit, transform, or otherwise process;
(4) the evaluating commands to be executed after the user has finished the lesson, to decide whether
the answer is right; and
(5) a list of possible successor lessons.
Learn tries to minimize the work of bookkeeping and installation, so that most of the effort involved in
script production is in planning lessons, writing tutorial paragraphs, and coding tests of student perfor-
mance.
The basic sequence of events is as follows. First, learn creates the working directory. Then, for
each lesson, learn reads the script for the lesson and processes it a line at a time. The lines in the script
are: (1) commands to the script interpreter to print something, to create a file, to test something, etc.;
(2) text to be printed or put in a file; (3) other lines, which are sent to the shell to be executed. One line
in each lesson turns control over to the user; the user can run any UNIX commands. The user mode
terminates when the user types yes, no, ready, or answer. At this point, the user’s work is tested; if the
lesson is passed, a new lesson is selected, and if not the old one is repeated.
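The three kinds of script line described above can be told apart mechanically. The following sketch is ours and makes simplifying assumptions: it treats a bare #print, #create and #next as the only commands followed by in-line text, and it does nothing but label the lines. The sample lines are abbreviated from the lesson of Figure 3.

```python
def classify(script_lines):
    """Label each script line as 'command', 'text', or 'shell'."""
    labelled = []
    in_text = False          # True while the lines belong to a command
    for line in script_lines:
        if line.startswith("#"):
            words = line.split()
            # A bare #print, a #create, or a #next is followed by text
            # (this short list of commands is an assumption of the sketch).
            in_text = words == ["#print"] or words[0] in ("#create", "#next")
            labelled.append(("command", line))
        elif in_text:
            labelled.append(("text", line))
        else:
            labelled.append(("shell", line))
    return labelled


lesson = [
    "#print",
    'Print the file, then type "ready".',
    "#create roosevelt",
    "this file is named roosevelt",
    "#copyout",
    "#user",
    "#uncopyout",
    "tail -3 .ocopy >X1",
    "#cmp X1 roosevelt",
    "#log",
    "#next",
    "3.2b 2",
]
kinds = [kind for kind, _ in classify(lesson)]
```

Only the tail line is labelled as a shell command here; everything else is either an interpreter command or text that belongs to one.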
Let us illustrate this with the script for the second lesson of Figure 1; this is shown in Figure 3.

_______________________________________
Figure 3: Sample Lesson

#print
Of course, you can print any file with "cat".
In particular, it is common to first use
"ls" to find the name of a file and then "cat"
to print it. Note the difference between
"ls", which tells you the name of the files,
and "cat", which tells you the contents.
One file in the current directory is named for
a President. Print the file, then type "ready".
#create roosevelt
this file is named roosevelt
and contains three lines of
text.
#copyout
#user
#uncopyout
tail -3 .ocopy >X1
#cmp X1 roosevelt
#log
#next
3.2b 2
_______________________________________

Lines which begin with # are commands to the learn script interpreter. For example,
#print
causes printing of any text that follows, up to the next line that begins with a sharp.
#print file
prints the contents of file ; it is the same as cat file but has less overhead. Both forms of #print have the
added property that if a lesson is failed, the #print will not be executed the second time through; this
avoids annoying the student by repeating the preamble to a lesson.


#create filename
creates a file of the specified name, and copies any subsequent text up to a # to the file. This is used for
creating and initializing working files and reference data for the lessons.
#user
gives control to the student; each line he or she types is passed to the shell for execution. The #user
mode is terminated when the student types one of yes, no, ready or answer. At that time, the driver
resumes interpretation of the script.
#copyin
#uncopyin
Anything the student types between these commands is copied onto a file called .copy. This lets the
script writer interrogate the student’s responses upon regaining control.
#copyout
#uncopyout
Between these commands, any material typed at the student by any program is copied to the file .ocopy.
This lets the script writer interrogate the effect of what the student typed, which true believers in the
performance theory of learning usually prefer to the student’s actual input.
#pipe
#unpipe
Normally the student input and the script commands are fed to the UNIX command interpreter (the
‘‘shell’’) one line at a time. This won’t do if, for example, a sequence of editor commands is provided,
since the input to the editor must be handed to the editor, not to the shell. Accordingly, the material
between #pipe and #unpipe commands is fed continuously through a pipe so that such sequences work.
If copyout is also desired the copyout brackets must include the pipe brackets.
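The test in Figure 3 shows how captured output is used: the last three lines of .ocopy are compared with the reference file. A minimal sketch of that check follows, assuming that .ocopy holds only program output (not the student's own typing); the function name is hypothetical.

```python
def output_matches(ocopy_text, reference_text, n=3):
    """True if the last n lines of the captured output equal the reference."""
    return ocopy_text.splitlines()[-n:] == reference_text.splitlines()


roosevelt = ("this file is named roosevelt\n"
             "and contains three lines of\n"
             "text.\n")

# First attempt in Figure 1: "cat President" only produced an error message.
first_try = "cat: can't open President\n"

# Second attempt: the output of "ls" followed by that of "cat roosevelt".
second_try = ".ocopy\nX1\nroosevelt\n" + roosevelt

first_ok = output_matches(first_try, roosevelt)
second_ok = output_matches(second_try, roosevelt)
```

The check looks only at what finally appeared on the terminal, so any sequence of commands that ends by printing the file passes.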
There are several commands for setting status after the student has attempted the lesson.
#cmp file1 file2
is an in-line implementation of cmp , which compares two files for identity.
#match stuff
The last line of the student’s input is compared to stuff , and the success or fail status is set according to
it. Extraneous things like the word answer are stripped before the comparison is made. There may be
several #match lines; this provides a convenient mechanism for handling multiple ‘‘right’’ answers.
Any text up to a # on subsequent lines after a successful #match is printed; this is illustrated in Figure
4, another sample lesson.
#bad stuff
This is similar to #match , except that it corresponds to specific failure answers; this can be used to pro-
duce hints for particular wrong answers that have been anticipated by the script writer.
#succeed
#fail
print a message upon success or failure (as determined by some previous mechanism).
When the student types one of the ‘‘commands’’ yes, no, ready, or answer, the driver terminates
the #user command, and evaluation of the student’s work can begin. This can be done either by the
built-in commands above, such as #match and #cmp, or by status returned by normal UNIX commands,
typically grep and test. The last command should return status true (0) if the task was done successfully
and false (non-zero) otherwise; this status return tells the driver whether or not the student has
successfully passed the lesson.
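As an illustration of #match and of the status convention, here is a sketch of how a lesson with two acceptable answers might be evaluated. It assumes that the comparison is a literal one and that only a leading word ‘‘answer’’ is stripped; neither detail is specified above, and the names are ours.

```python
def strip_answer(line):
    """Drop a leading 'answer' and surrounding blanks from the input line."""
    words = line.split()
    if words and words[0] == "answer":
        words = words[1:]
    return " ".join(words)


def evaluate(last_input, messages):
    """Return (status, message): status 0 on a match, 1 otherwise.

    `messages` maps each acceptable answer to the text printed after it.
    """
    answer = strip_answer(last_input)
    if answer in messages:
        return 0, messages[answer]
    return 1, ""


# Two right answers; the second earns a remark.
MESSAGES = {"m$": "", ".m$": '"m$" is easier.'}

best = evaluate("answer m$", MESSAGES)
longer = evaluate("answer .m$", MESSAGES)
wrong = evaluate("answer d", MESSAGES)
```

The zero-for-success convention is the same one the shell uses, which is why ordinary commands such as grep can serve as tests.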

_____________________________________________________
Figure 4: Another Sample Lesson

#print
What command will move the current line
to the end of the file? Type
"answer COMMAND", where COMMAND is the command.
#copyin
#user
#uncopyin
#match m$
#match .m$
"m$" is easier.
#log
#next
63.1d 10
_____________________________________________________

Performance can be logged:
#log file
writes the date, lesson, user name and speed rating, and a success/failure indication on file. The com-
mand
#log
by itself writes the logging information in the logging directory within the learn hierarchy, and is the
normal form.
#next
is followed by a few lines, each with a successor lesson name and an optional speed rating on it. A typ-
ical set might read
25.1a 10
25.2a 5
25.3a 2
indicating that unit 25.1a is a suitable follow-on lesson for students with a speed rating of 10 units,
25.2a for students with speed near 5, and 25.3a for speed near 2. Speed ratings are maintained for each
session with a student; the rating is increased by one each time the student gets a lesson right and
decreased by four each time the student gets a lesson wrong. Thus the driver tries to maintain a level
such that the users get 80% right answers. The maximum rating is limited to 10 and the minimum to 0.
The initial rating is zero unless the student specifies a different rating when starting a session.
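The arithmetic of the speed rating can be sketched as follows. The update rule and the limits are taken from the text; choosing the successor whose listed speed is nearest the current rating is our assumption about how ‘‘near’’ is decided, and the function names are hypothetical.

```python
def update_rating(rating, passed, lowest=0, highest=10):
    """Return the speed rating after one lesson: +1 if passed, -4 if failed."""
    rating += 1 if passed else -4
    return max(lowest, min(highest, rating))


def pick_successor(successors, rating):
    """Choose the successor whose listed speed is nearest the rating.

    `successors` is a list of (lesson name, speed) pairs, as on the
    lines that follow #next.
    """
    return min(successors, key=lambda pair: abs(pair[1] - rating))[0]


successors = [("25.1a", 10), ("25.2a", 5), ("25.3a", 2)]

rating = 0                                   # the default initial rating
for passed in [True, True, True, True, False, True]:
    rating = update_rating(rating, passed)

next_lesson = pick_successor(successors, rating)
```

Four passes (+4) and one failure (-4) leave the rating where it started, which is why the rating is steady when the student gets four answers in five, or 80%, right.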
If the student passes a lesson, a new lesson is selected and the process repeats. If the student
fails, a false status is returned and the program reverts to the previous lesson and tries another alternative.
If it cannot find another alternative, it skips forward a lesson. The student can end a session by typing bye,
which causes a graceful exit from the learn system. Hanging up is the usual novice’s way out.
The lessons may form an arbitrary directed graph, although the present program imposes a limita-
tion on cycles in that it will not present a lesson twice in the same session. If the student is unable to
answer one of the exercises correctly, the driver searches for a previous lesson with a set of alternatives
as successors (following the #next line). From the previous lesson with alternatives one route was taken
earlier; the program simply tries a different one.
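The search for a different route can be sketched in a few lines. This is an illustration under our own assumptions, not the code of the driver: the lesson names are invented, the history is a simple list, and a lesson counts as having ‘‘alternatives’’ only if it lists at least two successors.

```python
def alternative_after_failure(history, successors, seen):
    """Find an untried successor of the most recent lesson with alternatives.

    history:    lessons presented so far, oldest first; the last one failed.
    successors: maps a lesson name to the names listed after its #next.
    seen:       lessons already presented in this session (never repeated).
    Returns a lesson name, or None if no alternative remains.
    """
    for lesson in reversed(history[:-1]):
        choices = successors.get(lesson, [])
        if len(choices) < 2:
            continue                      # no alternatives at this lesson
        for candidate in choices:
            if candidate not in seen:
                return candidate
    return None


# A hypothetical fragment of a script: two routes from 1.1a to 1.3a.
successors = {"1.1a": ["1.2a", "1.2b"], "1.2a": ["1.3a"], "1.2b": ["1.3a"]}

history = ["1.1a", "1.2a"]                # the student has just failed 1.2a
first_retry = alternative_after_failure(history, successors, set(history))

history.append(first_retry)               # ... and then fails that one too
second_retry = alternative_after_failure(history, successors, set(history))
```

When the search returns nothing, as in the second call, the driver has no route left to try and skips forward a lesson instead.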
It is perfectly possible to write sophisticated scripts that evaluate the student’s speed of response,
or try to estimate the elegance of the answer, or provide detailed analysis of wrong answers. Lesson
writing is so tedious already, however, that most of these abilities are likely to go unused.
The driver program depends heavily on features of UNIX that are not available on many other
operating systems. These include the ease of manipulating files and directories, file redirection, the abil-
ity to use the command interpreter as just another program (even in a pipeline), command status testing
and branching, the ability to catch signals like interrupts, and of course the pipeline mechanism itself.
Although some parts of learn might be transferable to other systems, some generality will probably be
lost.
A bit of history: The first version of learn had fewer built-in words in the driver program, and
made more use of the facilities of UNIX. For example, file comparison was done by creating a cmp pro-
cess, rather than comparing the two files within learn . Lessons were not stored as text files, but as
archives. There was no concept of the in-line document; even #print had to be followed by a file name.
Thus the initialization for each lesson was to extract the archive into the working directory (typically 4-8
files), then #print the lesson text.
The combination of such things made learn slower. The new version is about 4 or 5 times faster.
Furthermore, it appears even faster to the user because in a typical lesson, the printing of the message
comes first, and file setup with #create can be overlapped with the printing, so that when the program
finishes printing, it is really ready for the user to type at it.
It is also a great advantage to the script maintainer that lessons are now just ordinary text files.
They can be edited without any difficulty, and UNIX text manipulation tools can be applied to them. The
result has been that there is much less resistance to going in and fixing substandard lessons.

5. Conclusions
The following observations can be made about secretaries, typists, and other non-programmers
who have used learn :
(a) A novice must have assistance with the mechanics of communicating with the computer to get
through to the first lesson or two; once the first few lessons are passed people can proceed on their
own.
(b) The terminology used in the first few lessons is obscure to those inexperienced with computers. It
would help if there were a low-level reference card for UNIX to supplement the existing
programmer-oriented bulky manual and bulky reference card.
(c) The concept of ‘‘substitutable argument’’ is hard to grasp, and requires help.
(d) They enjoy the system for the most part. Motivation matters a great deal, however.
It takes an hour or two for a novice to get through the script on file handling. The total time for a rea-
sonably intelligent and motivated novice to proceed from ignorance to a reasonable ability to create new
files and manipulate old ones seems to be a few days, with perhaps half of each day spent on the
machine.
The normal way of proceeding has been to have students in the same room with someone who
knows UNIX and the scripts. Thus the student is not brought to a halt by difficult questions. The burden
on the counselor, however, is much lower than that on a teacher of a course. Ideally, the students
should be encouraged to proceed with instruction immediately prior to their actual use of the computer.
They should exercise the scripts on the same computer and the same kind of terminal that they will later
use for their real work, and their first few jobs for the computer should be relatively easy ones. Also,
both training and initial work should take place on days when the UNIX hardware and software are work-
ing reliably. Rarely is all of this possible, but the closer one comes the better the result. For example,
if it is known that the hardware is shaky one day, it is better to attempt to reschedule training for
another one. Students are very frustrated by machine downtime; when nothing is happening, it takes
some sophistication and experience to distinguish an infinite loop, a slow but functioning program, a
program waiting for the user, and a broken machine.*
One disadvantage of training with learn is that students come to depend completely on the CAI
system, and do not try to read manuals or use other learning aids. This is unfortunate, not only because
of the increased demands for completeness and accuracy of the scripts, but because the scripts do not
cover all of the UNIX system. New users should have manuals (appropriate for their level) and read
them; the scripts ought to be altered to recommend suitable documents and urge students to read them.
__________________
* We have even known an expert programmer to decide the computer was broken when he had simply left his terminal
in local mode. Novices have great difficulties with such problems.

There are several other difficulties which are clearly evident. From the student’s viewpoint, the
most serious is that lessons still crop up which simply can’t be passed. Sometimes this is due to poor
explanations, but just as often it is some error in the lesson itself — a botched setup, a missing file, an
invalid test for correctness, or some system facility that doesn’t work on the local system in the same
way it did on the development system. It takes knowledge and a certain healthy arrogance on the part
of the user to recognize that the fault is not his or hers, but the script writer’s. Permitting the student to
get on with the next lesson regardless does alleviate this somewhat, and the logging facilities make it
easy to watch for lessons that no one can pass, but it is still a problem.
The biggest problem with the previous learn was speed (or lack thereof) — it was often excruciat-
ingly slow and made a significant drain on the system. The current version so far does not seem to have
that difficulty, although some scripts, notably eqn , are intrinsically slow. eqn , for example, must do a
lot of work even to print its introductions, let alone check the student responses, but delay is perceptible
in all scripts from time to time.
Another potential problem is that it is possible to break learn inadvertently, by pushing interrupt at
the wrong time, or by removing critical files, or any number of similar slips. The defenses against such
problems have steadily been improved, to the point where most students should not notice difficulties.
Of course, it will always be possible to break learn maliciously, but this is not likely to be a problem.
One area is more fundamental — some UNIX commands are sufficiently global in their effect that
learn currently does not allow them to be executed at all. The most obvious is cd , which changes to
another directory. The prospect of a student who is learning about directories inadvertently moving to
some random directory and removing files has deterred us from even writing lessons on cd , but ulti-
mately lessons on such topics probably should be added.

6. Acknowledgments
We are grateful to all those who have tried learn, for we have benefited greatly from their sugges-
tions and criticisms. In particular, M. E. Bittrich, J. L. Blue, S. I. Feldman, P. A. Fox, and M. J. McAl-
pin have provided substantial feedback. Conversations with E. Z. Rothkopf also provided many of the
ideas in the system. We are also indebted to Don Jackowski for serving as a guinea pig for the second
version, and to Tom Plum for his efforts to improve the C script.

References
1. B. F. Skinner, ‘‘Why We Need Teaching Machines,’’ Harvard Educational Review 31, pp.377-398
(1961).
2. K. Thompson and D. M. Ritchie, UNIX Programmer’s Manual, Bell Laboratories (May 1975). See
section ed (I).
3. B. W. Kernighan, A Tutorial Introduction to the Unix Editor ed, 1974.
4. B. W. Kernighan and D. M. Ritchie, The C Programming Language, Prentice Hall (1978).
Typing Documents on the UNIX System:
Using the – ms Macros with Troff and Nroff

M. E. Lesk
Bell Laboratories
Murray Hill, New Jersey 07974

ABSTRACT

This document describes a set of easy-to-use macros for preparing documents on
the UNIX system. Documents may be produced on either the phototypesetter or on a
computer terminal, without changing the input.
The macros provide facilities for paragraphs, sections (optionally with automatic
numbering), page titles, footnotes, equations, tables, two-column format, and cover
pages for papers.
This memo includes, as an appendix, the text of the ‘‘Guide to Preparing Docu-
ments with – ms’’ which contains additional examples of features of – ms.
This manual is a revision of, and replaces, ‘‘Typing Documents on UNIX,’’
dated November 22, 1974.

November 13, 1978



Introduction. This memorandum describes a package of commands to produce papers using the
troff and nroff formatting programs on the UNIX system. As with other roff-derived programs, text is
prepared interspersed with formatting commands. However, this package, which itself is written in troff
commands, provides higher-level commands than those provided with the basic troff program. The
commands available in this package are listed in Appendix A.
Text. Type normally, except that instead of indenting for paragraphs, place a line reading ‘‘.PP’’
before each paragraph. This will produce indenting and extra space.
Alternatively, the command .LP that was used here will produce a left-aligned (block) paragraph. The
paragraph spacing can be changed: see below under ‘‘Registers.’’
Beginning. For a document with a paper-type cover sheet, the input should start as follows:
[optional overall format .RP – see below]
.TL
Title of document (one or more lines)
.AU
Author(s) (may also be several lines)
.AI
Author’s institution(s)
.AB
Abstract; to be placed on the cover sheet of a paper.
Line length is 5/6 of normal; use .ll here to change.
.AE (abstract end)
text ... (begins with .PP, which see)
To omit some of the standard headings (e.g. no abstract, or no author’s institution) just omit the
corresponding fields and command lines. The word ABSTRACT can be suppressed by writing ‘‘.AB no’’
for ‘‘.AB’’. Several interspersed .AU and .AI lines can be used for multiple authors. The headings are
not compulsory: beginning with a .PP command is perfectly OK and will just start printing an ordinary
paragraph. Warning: You can’t just begin a document with a line of text. Some -ms command must
precede any text input. When in doubt, use .LP to get proper initialization, although any of the com-
mands .PP, .LP, .TL, .SH, .NH is good enough. Figure 1 shows the legal arrangement of commands at
the start of a document.
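As a sketch of the multiple-author case described above, the following input intersperses .AU and .AI lines and suppresses the word ABSTRACT with ‘‘.AB no’’. The title, names, and institutions are hypothetical placeholders, not taken from the original memo.

```troff
.\" Sketch only: title, authors, and institutions are made-up placeholders.
.TL
Title of a Hypothetical Memo
.AU
First Author
.AI
First Institution
.AU
Second Author
.AI
Second Institution
.AB no
Abstract text, printed on the cover sheet without the word ABSTRACT.
.AE
.PP
Body text begins here.
```

Dropping the .AI lines or the .AB/.AE pair should simply omit those headings, as the paragraph above describes.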
Cover Sheets and First Pages. The first line of a document signals the general format of the first
page. In particular, if it is ".RP" a cover sheet with title and abstract is prepared. The default format is
useful for scanning drafts.
In general – ms is arranged so that only one form of a document need be stored, containing all
information; the first command gives the format, and unnecessary items for that format are ignored.
Warning: don’t put extraneous material between the .TL and .AE commands. Processing of the
titling items is special, and other data placed in them may not behave as you expect. Don’t forget that
some – ms command must precede any input text.
Page headings. The -ms macros, by default, will print a page heading containing a page number
(if greater than 1). A default page footer is provided only in nroff, where the date is used. The user
can make minor adjustments to the page headings/footings by redefining the strings LH, CH, and RH
which are the left, center and right portions of the page headings, respectively; and the strings LF, CF,
and RF, which are the left, center and right portions of the page footer. For more complex formats, the
user can redefine the macros PT and BT, which are invoked respectively at the top and bottom of each
page. The margins (taken from registers HM and FM for the top and bottom margin respectively) are
normally 1 inch; the page header/footer are in the middle of that space. The user who redefines these
macros should be careful not to change parameters such as point size or font without resetting them to
default values.
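A minimal sketch of the simpler adjustment, redefining the heading and footing strings, might look like the following. The heading text is hypothetical. Placing the definitions after the first -ms command is a cautious choice, on the assumption that package initialization could otherwise reset them. The doubled backslash delays interpolation of the page-number register PN until the heading is actually printed.

```troff
.\" Sketch only: the heading and footer text are illustrative placeholders.
.LP
.\" left, center, and right portions of the page heading
.ds LH Draft
.ds CH
.ds RH Page \\n(PN
.\" center portion of the page footer
.ds CF Preliminary
```

An empty definition, as for CH here, leaves that portion of the heading blank.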
Multi-column formats. If you place the command ‘‘.2C’’ in your document, the document will be
printed in double column format beginning at that point. This feature is not too useful in computer
terminal output, but is often desirable on the typesetter. The command ‘‘.1C’’ will go back to
one-column format and also skip to a new page. The ‘‘.2C’’ command is actually a special case of the
command
.MC [column width [gutter width]]
which makes multiple columns with the specified column and gutter width; as many columns as will fit
across the page are used. Thus triple, quadruple, ... column pages can be printed. Whenever the number
of columns is changed (except going from full width to some larger number of columns) a new page is
started.
Headings. To produce a special heading, there are two commands. If you type
.NH
type section heading here
may be several lines
you will get automatically numbered section headings (1, 2, 3, ...), in boldface. For example,
.NH
Care and Feeding of Department Heads
produces
1. Care and Feeding of Department Heads
Alternatively,
.SH
Care and Feeding of Directors
will print the heading with no number added:
Care and Feeding of Directors
Every section heading, of either type, should be followed by a paragraph beginning with .PP or .LP,
indicating the end of the heading. Headings may contain more than one line of text.
The .NH command also supports more complex numbering schemes. If a numerical argument is given, it
is taken to be a ‘‘level’’ number and an appropriate sub-section number is generated. Larger level
numbers indicate deeper sub-sections, as in this example:
.NH
Erie-Lackawanna
.NH 2
Morris and Essex Division
.NH 3
Gladstone Branch
.NH 3
Montclair Branch
.NH 2
Boonton Line
generates:
2. Erie-Lackawanna
2.1. Morris and Essex Division
2.1.1. Gladstone Branch
2.1.2. Montclair Branch
2.2. Boonton Line
An explicit ‘‘.NH 0’’ will reset the numbering of level 1 to one, as here:
.NH 0
Penn Central
1. Penn Central
Indented paragraphs. (Paragraphs with hanging numbers, e.g. references.) The sequence
.IP [1]
Text for first paragraph, typed
normally for as long as you would
like on as many lines as needed.
.IP [2]
Text for second paragraph, ...
produces
[1] Text for first paragraph, typed normally for as long as you would like on as many lines as
    needed.
[2] Text for second paragraph, ...
A series of indented paragraphs may be followed by an ordinary paragraph beginning with .PP or .LP,
depending on whether you wish indenting or not. The command .LP was used here.
More sophisticated uses of .IP are also possible. If the label is omitted, for example, a plain block
indent is produced.
.IP
This material will
just be turned into a
block indent suitable for quotations or
such matter.
.LP
will produce
     This material will just be turned into a block indent suitable for quotations or such matter.
If a non-standard amount of indenting is required, it may be specified after the label (in character
positions) and will remain in effect until the next .PP or .LP. Thus, the general form of the .IP
command contains two additional fields: the label and the indenting length. For example,
.IP first: 9
Notice the longer label, requiring larger
indenting for these paragraphs.
.IP second:
And so forth.
.LP
produces this:
first:   Notice the longer label, requiring larger indenting for these paragraphs.
second:  And so forth.
It is also possible to produce multiple nested indents; the command .RS indicates that the next .IP
starts from the current indentation level. Each .RE will eat up one level of indenting so you should
balance .RS and .RE commands. The .RS command should be thought of as ‘‘move right’’ and the .RE
command as ‘‘move left’’. As an example
.IP 1.
Bell Laboratories
.RS
.IP 1.1
Murray Hill
.IP 1.2
Holmdel
.IP 1.3
Whippany
.RS
.IP 1.3.1
Madison
.RE
.IP 1.4
Chester
.RE
.LP
will result in
1. Bell Laboratories
     1.1 Murray Hill
     1.2 Holmdel
     1.3 Whippany
          1.3.1 Madison
     1.4 Chester
All of these variations on .LP leave the right margin untouched. Sometimes, for purposes such as
setting off a quotation, a paragraph indented on both right and left is required.
     A single paragraph like this is obtained by preceding it with .QP. More complicated
     material (several paragraphs) should be bracketed with .QS and .QE.
Emphasis. To get italics (on the typesetter) or underlining (on the terminal) say
.I
as much text as you want
can be typed here
.R
as was done for these three words. The .R command restores the normal (usually Roman) font. If only
one word is to be italicized, it may be just given on the line with the .I command,
.I word
and in this case no .R is needed to restore the previous font. Boldface can be produced by
.B
Text to be set in boldface
goes here
.R
and also will be underlined on the terminal or line printer. As with .I, a single word can be placed in
boldface by placing it on the same line as the .B command.
A few size changes can be specified similarly with the commands .LG (make larger), .SM (make
smaller), and .NL (return to normal size). The size change is two points; the commands may be repeated
for increased effect (here one .NL canceled two .SM commands).
If actual underlining as opposed to italicizing is required on the typesetter, the command
.UL word
will underline a word. There is no way to underline multiple words on the typesetter.
Footnotes. Material placed between lines with the commands .FS (footnote) and .FE (footnote end)
will be collected, remembered, and finally placed at the bottom of the current page*. By default,
footnotes are 11/12th the length of normal text, but this can be changed using the FL register (see
below).
__________________
* Like this.
Displays and Tables. To prepare displays of lines, such as tables, in which the lines should not be
re-arranged, enclose them in the commands .DS and .DE
.DS
table lines, like the
examples here, are placed
between .DS and .DE
.DE
By default, lines between .DS and .DE are indented and left-adjusted. You can also center lines, or
retain the left margin. Lines bracketed by .DS C and .DE commands are centered (and not re-arranged);
lines bracketed by .DS L and .DE are left-adjusted, not indented, and not re-arranged. A plain .DS is
equivalent to .DS I, which indents and left-adjusts. Thus,
                    these lines were preceded
                    by .DS C and followed by
                         a .DE command;
whereas
these lines were preceded
by .DS L and followed by
a .DE command.
Note that .DS C centers each line; there is a variant .DS B that makes the display into a
left-adjusted block of text, and then centers that entire block. Normally a display is kept together,
on one page. If you wish to have a long display which may be split across page boundaries, use .CD,
.LD, or .ID in place of the commands .DS C, .DS L, or .DS I respectively. An extra argument to the
.DS I or .DS command is taken as an amount to indent. Note: it is tempting to assume that .DS R will
right adjust lines, but it doesn’t work.
Boxing words or lines. To draw rectangular boxes around words the command
.BX word
will print word as shown. The boxes will not be neat on a terminal, and this should not be used as a
substitute for italics. Longer pieces of text may be boxed by enclosing them with .B1 and .B2:
.B1
text...
.B2
as has been done here.
Keeping blocks together. If you wish to keep a table or other block of lines together on a page,
there are ‘‘keep - release’’ commands. If a block of lines preceded by .KS and followed by .KE does not
fit on the remainder of the current page, it will begin on a new page. Lines bracketed by .DS and .DE
commands are automatically kept together this way. There is also a ‘‘keep floating’’ command: if the
block to be kept together is preceded by .KF instead of .KS and does not fit on the current page, it
will be moved down through the text until the top of the next page. Thus, no large blank space will be
introduced in the document.
Nroff/Troff commands. Among the useful commands from the basic formatting programs are the
following. They all work with both typesetter and computer terminal output:
.bp - begin new page.
.br - ‘‘break’’, stop running text from line to line.
.sp n - insert n blank lines.
.na - don’t adjust right margins.
Date. By default, documents produced on computer terminals have the date at the bottom of each
page; documents produced on the typesetter don’t. To force the date, say ‘‘.DA’’. To force no date, say
‘‘.ND’’. To lie about the date, say ‘‘.DA July 4, 1776’’ which puts the specified date at the bottom of
each page. The command
.ND May 8, 1945
in ".RP" format places the specified date on the cover sheet and nowhere else. Place this line before
the title.
Signature line. You can obtain a signature line by placing the command .SG in the document. The
authors’ names will be output in place of the .SG line. An argument to .SG is used as a typing
identification line, and placed after the signatures. The .SG command is ignored in released paper
format.
Registers. Certain of the registers used by -ms can be altered to change default settings. They
should be changed with .nr commands, as with
.nr PS 9
to make the default point size 9 point. If the effect is needed immediately, the normal troff command
should be used in addition to changing the number register.
Register   Defines            Takes effect   Default
PS         point size         next para.     10
VS         line spacing       next para.     12 pts
LL         line length        next para.     6′′
LT         title length       next para.     6′′
PD         para. spacing      next para.     0.3 VS
PI         para. indent       next para.     5 ens
FL         footnote length    next FS        11/12 LL
CW         column width       next 2C        7/15 LL
GW         intercolumn gap    next 2C        1/15 LL
PO         page offset        next page      26/27′′
HM         top margin         next page      1′′
FM         bottom margin      next page      1′′
You may also alter the strings LH, CH, and RH which are the left, center, and right headings
respectively; and similarly LF, CF, and RF which are strings in the page footer. The page number on
output is taken from register PN, to permit changing its output style. For more complicated headers and
footers the macros PT and BT can be redefined, as explained earlier.
Accents. To simplify typing certain foreign words, strings representing common accent marks are
defined. They precede the letter over which the mark is to appear. Here are the strings:
Input   Output   Input   Output
\*'e    é        \*~a    ã
\*`e    è        \*Ce    ě
\*:u    ü        \*,c    ç
\*^e    ê
Use. After your document is prepared and stored on a file, you can print it on a terminal with the
command*
nroff -ms file
and you can print it on the typesetter with the command
troff -ms file
(many options are possible). In each case, if your document is stored in several files, just list all
the filenames where we have used ‘‘file’’. If equations or tables are used, eqn and/or tbl must be
invoked as preprocessors.
__________________
* If .2C was used, pipe the nroff output through col; make the first line of the input
‘‘.pi /usr/bin/col.’’
References and further study. If you have to do Greek or mathematics, see eqn [1] for equation
setting. To aid eqn users, -ms provides definitions of .EQ and .EN which normally center the equation
and set it off slightly. An argument on .EQ is taken to be an equation number and placed in the right
margin near the equation. In addition, there are three special arguments to EQ: the letters C, I, and L
indicate centered (default), indented, and left adjusted equations, respectively. If there is both a
format argument and an equation number, give the format argument first, as in
.EQ L (1.3a)
for a left-adjusted equation numbered (1.3a).
Similarly, the macros .TS and .TE are defined to separate tables (see [2]) from text with a little
space. A very long table with a heading may be broken across pages by beginning it with .TS H instead
of .TS, and placing the line .TH in the table data after the heading. If the table has no heading
repeated from page to page, just use the ordinary .TS and .TE macros.
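The multi-page table form just described can be sketched as follows. The table contents are hypothetical placeholders, the columns are separated by tab characters, and the file must still be passed through tbl before nroff or troff.

```troff
.\" Sketch only: the column titles and entries are illustrative placeholders.
.\" Columns in the data lines are separated by tab characters.
.TS H
center;
c c
l n.
Item	Count
.TH
first entry	1
second entry	2
third entry	3
.TE
```

The line above .TH is the heading that is repeated on each page; the lines after it are ordinary table data.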
To learn more about troff see [3] for a
general introduction, and [4] for the full details
(experts only). Information on related UNIX
commands is in [5]. For jobs that do not seem
well-adapted to -ms, consider other macro
packages. It is often far easier to write a specific
macro package for such tasks as imitating
particular journals than to try to adapt -ms.
Acknowledgment. Many thanks are due
to Brian Kernighan for his help in the design
and implementation of this package, and for his
assistance in preparing this manual.

References
[1] B. W. Kernighan and L. L. Cherry,
Typesetting Mathematics — Users Guide
(2nd edition), Bell Laboratories Comput-
ing Science Report no. 17.
[2] M. E. Lesk, Tbl — A Program to Format
Tables, Bell Laboratories Computing Sci-
ence Report no. 45.
[3] B. W. Kernighan, A Troff Tutorial, Bell
Laboratories, 1976.
[4] J. F. Ossanna, Nroff /Troff Reference
Manual, Bell Laboratories Computing Sci-
ence Report no. 51.
[5] K. Thompson and D. M. Ritchie, UNIX
Programmer’s Manual, Bell Laboratories,
1978.
Appendix A
List of Commands
1C Return to single column format. LG Increase type size.
2C Start double column format. LP Left aligned block paragraph.
AB Begin abstract.
AE End abstract.
AI Specify author’s institution.
AU Specify author. ND Change or cancel date.
B Begin boldface. NH Specify numbered heading.
DA Provide the date on each page. NL Return to normal type size.
DE End display. PP Begin paragraph.
DS Start display (also CD, LD, ID).
EN End equation. R Return to regular font (usually Roman).
EQ Begin equation. RE End one level of relative indenting.
FE End footnote. RP Use released paper format.
FS Begin footnote. RS Relative indent increased one level.
SG Insert signature line.
I Begin italics. SH Specify section heading.
SM Change to smaller type size.
IP Begin indented paragraph. TL Specify title.
KE Release keep.
KF Begin floating keep. UL Underline one word.
KS Start keep.

Register Names
The following register names are used by – ms internally. Independent use of these names in
one’s own macros may produce incorrect output. Note that no lower case letters are used in any – ms
internal name.
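Since no -ms internal name contains a lower case letter, one way to avoid collisions is to give private macros and registers lower case names. The following is a sketch under that assumption; the names nt and nc and the text of the macro are hypothetical.

```troff
.\" Sketch only: "nc" and "nt" are made-up names chosen to be lower case.
.\" nc is a private counter, starting at 0 with an auto-increment of 1
.nr nc 0 1
.\" nt leaves a blank line and prints a numbered note label
.de nt
.sp
Note \\n+(nc:
..
.LP
.nt
Text of the first note.
```

Each call of the hypothetical .nt macro increments nc before printing it, so successive notes are numbered 1, 2, 3, and so on.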
Number registers used in – ms
: DW GW HM IQ LL NA OJ PO T. TV
#T EF H1 HT IR LT NC PD PQ TB VS
1T FL H3 IK KI MM NF PF PX TD YE
AV FM H4 IM L1 MN NS PI RO TN YY
CW FP H5 IP LE MO OI PN ST TQ ZN

String registers used in – ms


′ A5 CB DW EZ I KF MR R1 RT TL
` AB CC DY FA I1 KQ ND R2 S0 TM
ˆ AE CD E1 FE I2 KS NH R3 S1 TQ
˜ AI CF E2 FJ I3 LB NL R4 S2 TS
: AU CH E3 FK I4 LD NP R5 SG TT
, B CM E4 FN I5 LG OD RC SH UL
1C BG CS E5 FO ID LP OK RE SM WB
2C BT CT EE FQ IE ME PP RF SN WH
A1 C D EL FS IM MF PT RH SY WT
A2 C1 DA EM FV IP MH PY RP TA XD
A3 C2 DE EN FY IZ MN QF RQ TE XF
A4 CA DS EQ HO KE MO R RS TH XK

RP

TL

AU

AI

AB

AE

NH, SH

PP, LP

text ...

Figure 1
A Guide to Preparing
Documents with -ms

M. E. Lesk
Bell Laboratories                                  August 1978

This guide gives some simple examples of document preparation on Bell Labs computers,
emphasizing the use of the -ms macro package. It enormously abbreviates information in
1. Typing Documents on UNIX and GCOS, by M. E. Lesk;
2. Typesetting Mathematics – User’s Guide, by B. W. Kernighan and L. L. Cherry; and
3. Tbl – A Program to Format Tables, by M. E. Lesk.
These memos are all included in the UNIX Programmer’s Manual, Volume 2. The new user
should also have A Tutorial Introduction to the UNIX Text Editor, by B. W. Kernighan.
For more detailed information, read Advanced Editing on UNIX and A Troff Tutorial, by B. W.
Kernighan, and (for experts) Nroff/Troff Reference Manual by J. F. Ossanna. Information on related
commands is found (for UNIX users) in UNIX for Beginners by B. W. Kernighan and the UNIX
Programmer’s Manual by K. Thompson and D. M. Ritchie.

Contents
A TM . . . . . . . . . . . . . . . . . . . . 2
A released paper . . . . . . . . . . . . . . 3
An internal memo, and headings . . . . . 4
Lists, displays, and footnotes . . . . . . . 5
Indents, keeps, and double column . . . . 6
Equations and registers . . . . . . . . . . 7
Tables and usage . . . . . . . . . . . . . . 8

Throughout the examples, input is shown in
this Helvetica sans serif font
while the resulting output is shown in
this Times Roman font.

UNIX Document no. 1111

2

Commands for a TM

Input:
.TM 1978-5b3 99999 99999-11
.ND April 1, 1976
.TL
The Role of the Allen Wrench in Modern
Electronics
.AU "MH 2G-111" 2345
J. Q. Pencilpusher
.AU "MH 1K-222" 5432
X. Y. Hardwired
.AI
.MH
.OK
Tools
Design
.AB
This abstract should be short enough to
fit on a single page cover sheet.
It must attract the reader into sending for
the complete memorandum.
.AE
.CS 10 2 12 5 6 7
.NH
Introduction.
.PP
Now the first paragraph of actual text ...
...
Last line of text.
.SG MH-1234-JQP/XYH-unix
.NH
References ...

Commands not needed in a particular format are ignored.

Output:
Bell Laboratories                                   Cover Sheet for TM
This information is for employees of Bell Laboratories. (GEI 13.9-3)
Title- The Role of the Allen Wrench in Modern Electronics
Date- April 1, 1976
TM- 1978-5b3
Other Keywords- Tools
                Design
Author               Location     Ext.     Charging Case- 99999
J. Q. Pencilpusher   MH 2G-111    2345     Filing Case- 99999a
X. Y. Hardwired      MH 1K-222    5432
ABSTRACT
This abstract should be short enough to fit on a single page cover sheet. It must attract the
reader into sending for the complete memorandum.
Pages Text 10   Other 2   Total 12
No. Figures 5   No. Tables 6   No. Refs. 7
E-1932-U (6-73)   SEE REVERSE SIDE FOR DISTRIBUTION LIST
3

A Released Paper with Mathematics

Input:
.EQ
delim $$
.EN
.RP

... (as for a TM)

.CS 10 2 12 5 6 7
.NH
Introduction
.PP
The solution to the torque handle equation
.EQ (1)
sum from 0 to inf F ( x sub i ) = G ( x )
.EN
is found with the transformation $ x = rho over
theta $ where $ rho = G prime (x) $ and $theta$
is derived ...

Output (cover sheet):
The Role of the Allen Wrench
in Modern Electronics
J. Q. Pencilpusher
X. Y. Hardwired
Bell Laboratories
Murray Hill, New Jersey 07974
ABSTRACT
This abstract should be short enough to fit on a single page cover sheet. It must attract the reader
into sending for the complete memorandum.
April 1, 1976

Output (first page):
The Role of the Allen Wrench
in Modern Electronics
J. Q. Pencilpusher
X. Y. Hardwired
Bell Laboratories
Murray Hill, New Jersey 07974
1. Introduction
The solution to the torque handle equation
\[ \sum_{0}^{\infty} F(x_i) = G(x) \qquad (1) \]
is found with the transformation \( x = \rho / \theta \) where \( \rho = G'(x) \) and \( \theta \) is
derived from well-known principles.

4

An Internal Memorandum

Input:
.IM
.ND January 24, 1956
.TL
The 1956 Consent Decree
.AU
Able, Baker &
Charley, Attys.
.PP
Plaintiff, United States of America, having filed
its complaint herein on January 14, 1949; the
defendants having appeared and filed their
answer to such complaint denying the
substantive allegations thereof; and the parties,
by their attorneys, ...

Output:
Bell Laboratories
Subject: The 1956 Consent Decree                    date: January 24, 1956
                                                    from: Able, Baker &
                                                          Charley, Attys.
Plaintiff, United States of America, having filed its complaint herein on January 14, 1949; the
defendants having appeared and filed their answer to such complaint denying the substantive
allegations thereof; and the parties, by their attorneys, having severally consented to the entry of
this Final Judgment without trial or adjudication of any issues of fact or law herein and without this
Final Judgment constituting any evidence or admission by any party in respect of any such issues;
Now, therefore before any testimony has been taken herein, and without trial or adjudication of any
issue of fact or law herein, and upon the consent of all parties hereto, it is hereby
Ordered, adjudged and decreed as follows:
I. [Sherman Act]
This Court has jurisdiction of the subject matter herein and of all the parties hereto. The complaint
states a claim upon which relief may be granted against each of the defendants under Sections 1, 2 and
3 of the Act of Congress of July 2, 1890, entitled ‘‘An act to protect trade and commerce against
unlawful restraints and monopolies,’’ commonly known as the Sherman Act, as amended.
II. [Definitions]
For the purposes of this Final Judgment:
(a) ‘‘Western’’ shall mean the defendant Western Electric Company, Incorporated.

Other formats possible (specify before .TL) are: .MR (‘‘memo for record’’), .MF (‘‘memo for file’’),
.EG (‘‘engineer’s notes’’) and .TR (Computing Science Tech. Report).

Headings

Input:
.NH                              .SH
Introduction.                    Appendix I
.PP                              .PP
text text text                   text text text

Output:
1. Introduction                  Appendix I
text text text                   text text text
5

A Simple List

Input:
.IP 1.
J. Pencilpusher and X. Hardwired,
.I
A New Kind of Set Screw,
.R
Proc. IEEE
.B 75
(1976), 23-255.
.IP 2.
H. Nails and R. Irons,
.I
Fasteners for Printed Circuit Boards,
.R
Proc. ASME
.B 23
(1974), 23-24.
.LP (terminates list)

Output:
1. J. Pencilpusher and X. Hardwired, A New Kind of Set Screw, Proc. IEEE 75 (1976), 23-255.
2. H. Nails and R. Irons, Fasteners for Printed Circuit Boards, Proc. ASME 23 (1974), 23-24.

Displays

Input:
text text text text text text
.DS
and now
for something
completely different
.DE
text text text text text text

Output:
hoboken harrison newark roseville avenue grove street east orange brick church orange highland
avenue mountain station south orange maplewood millburn short hills summit new providence
     and now
     for something
     completely different
murray hill berkeley heights gillette stirling millington lyons basking ridge bernardsville far hills
peapack gladstone

Options: .DS L: left-adjust; .DS C: line-by-line center; .DS B: make block, then center.

Footnotes

Input:
Among the most important occupants
of the workbench are the long-nosed pliers.
Without these basic tools*
.FS
* As first shown by Tiger & Leopard
(1975).
.FE
few assemblies could be completed. They may
lack the popular appeal of the sledgehammer

Output:
Among the most important occupants of the workbench are the long-nosed pliers. Without these basic
tools* few assemblies could be completed. They may lack the popular appeal of the sledgehammer
________________
* As first shown by Tiger & Leopard (1975).

6

Multiple Indents

Input:
This is ordinary text to point out
the margins of the page.
.IP 1.
First level item
.RS
.IP a)
Second level.
.IP b)
Continued here with another second
level item, but somewhat longer.
.RE
.IP 2.
Return to previous value of the
indenting at this point.
.IP 3.
Another
line.

Output:
This is ordinary text to point out the margins of the page.
1. First level item
     a) Second level.
     b) Continued here with another second level item, but somewhat longer.
2. Return to previous value of the indenting at this point.
3. Another line.

Keeps

Lines bracketed by the following commands are kept together, and will appear entirely on one page:
.KS   not moved                  .KF   may float
.KE   through text               .KE   in text

Double Column

Input:
.TL
The Declaration of Independence
.2C
.PP
When in the course of human events, it becomes
necessary for one people to dissolve the political
bonds which have connected them with another, and
to assume among the powers of the earth the
separate and equal station to which the laws of
Nature and of Nature’s God entitle them, a decent
respect to the opinions of

Output (set in two columns):
The Declaration of Independence
When in the course of human events, it becomes necessary for one people to dissolve the political
bonds which have connected them with another, and to assume among the powers of the earth the
separate and equal station to which the laws of Nature and of Nature’s God entitle them, a decent
respect to the opinions of mankind requires that they should declare the causes which impel them to
the separation.
We hold these truths to be self-evident, that all men are created equal, that they are endowed by
their creator with certain unalienable rights, that among these are life, liberty, and the pursuit of
happiness. That to secure these rights, governments are instituted among men,
7

Equations

Input:
A displayed equation is marked
with an equation number at the right margin
by adding an argument to the EQ line:
.EQ (1.3)
x sup 2 over a sup 2 ~=~ sqrt {p z sup 2 +qz+r}
.EN

Output:
A displayed equation is marked with an equation number at the right margin by adding an argument to
the EQ line:
\[ \frac{x^2}{a^2} = \sqrt{p z^2 + q z + r} \qquad (1.3) \]

Input:
.EQ I (2.2a)
bold V bar sub nu~=~left [ pile {a above b above
c } right ] + left [ matrix { col { A(11) above .
above . } col { . above . above .} col {. above .
above A(33) }} right ] cdot left [ pile { alpha
above beta above gamma } right ]
.EN

Output:
\[ \bar{\mathbf{V}}_{\nu} =
   \begin{bmatrix} a \\ b \\ c \end{bmatrix} +
   \begin{bmatrix} A(11) & . & . \\ . & . & . \\ . & . & A(33) \end{bmatrix} \cdot
   \begin{bmatrix} \alpha \\ \beta \\ \gamma \end{bmatrix} \qquad (2.2a) \]

Input:
.EQ L
F hat ( chi ) ~ mark = ~ | del V | sup 2
.EN
.EQ L
lineup =~ {left ( {partial V} over {partial x} right ) }
sup 2 + { left ( {partial V} over {partial y} right ) }
sup 2 ~~~~~~ lambda -> inf
.EN

Output:
\[ \begin{aligned}
   \hat{F}(\chi) &= | \nabla V |^2 \\
                 &= \left( \frac{\partial V}{\partial x} \right)^2
                  + \left( \frac{\partial V}{\partial y} \right)^2
                    \qquad \lambda \to \infty
   \end{aligned} \]

8

Tables

Input (T indicates a tab):
.TS
allbox;
c s s
c c c
n n n.
AT&T Common Stock
Year T Price T Dividend
1971 T 41-54 T $2.60
2 T 41-54 T 2.70
3 T 46-55 T 2.87
4 T 40-53 T 3.24
5 T 45-52 T 3.40
6 T 51-59 T .95*
.TE
* (first quarter only)

Output (all entries boxed):
AT&T Common Stock
Year    Price    Dividend
1971    41-54    $2.60
2       41-54    2.70
3       46-55    2.87
4       40-53    3.24
5       45-52    3.40
6       51-59    .95*
* (first quarter only)

The meanings of the key-letters describing the alignment of each entry are:
c   center          n   numerical
r   right-adjust    a   subcolumn
l   left-adjust     s   spanned
The global table options are center, expand, box, doublebox, allbox, tab (x) and linesize (n).

Input (with delim $$ on, see panel 3):
.TS
doublebox, center;
c c
l l.
Name T Definition
.sp
Gamma T $GAMMA (z) = int sub 0 sup inf \
t sup {z-1} e sup -t dt$
Sine T $sin (x) = 1 over 2i ( e sup ix - e sup -ix )$
Error T $ roman erf (z) = 2 over sqrt pi \
int sub 0 sup z e sup {-t sup 2} dt$
Bessel T $ J sub 0 (z) = 1 over pi \
int sub 0 sup pi cos ( z sin theta ) d theta $
Zeta T $ zeta (s) = \
sum from k=1 to inf k sup -s ~~( Re~s > 1)$
.TE

$ a dot $, $ b dotdot$, $ xi tilde times y vec$:
. .. __________________________________
_______________________________
a , b , ξ̃×y→. (with delim $$ on, see panel 3).
 Name Definition 
See also the equations in the second table, panel 8.  
 ∞ 
Gamma Γ(z )=∫ t z −1e −t dt 
S
Soom
mee R
Reeggiisstteerrss Y
Yoouu C
Caann C
Chhaannggee  0
1 
−ix
Sine sin(x )= ___ (e −e )
ix

 2i
z 
Line length Paragraph spacing Error erf(z )= ___ 2
∫ e −t dt 
2

.nr LL 7i .nr PD 0  √π 0



π
 
J 0(z )= ∫ cos(z sinθ)d θ 
Title length Page offset _1_
.nr LT 7i .nr PO 0.5i Bessel π 0 
 ∞

Point size Page heading Zeta ζ(s )= Σ k −s (Re s >1) 
.nr PS 9 .ds CH Appendix __________________________________
_______________________________
k =1 
Vertical spacing (center)
.nr VS 11 .ds RH 7-25-76 U
Ussaaggee
(right)
Column width .ds LH Private Documents with just text:
.nr CW 3i (left) troff -ms files
Intercolumn spacing Page footer With equations only:
.nr GW .5i .ds CF Draft eqn files  troff -ms
.ds LF With tables only:
Margins – head and foot tbl files  troff -ms
.nr HM .75i .ds RF similar
With both tables and equations:
.nr FM .75i Page numbers tbl files  eqn  troff -ms
______________________________
Paragraph indent .nr % 3
.nr PI 2n The above generates STARE output on GCOS: replace – st
with – ph for typesetter output.
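
The register settings, footnote, equation, and table requests summarized on this card can be combined in a single input file. What follows is only a minimal sketch, not part of the original card: the file name sample.ms, the particular register values, and the use of @ as the tbl tab character (set with the tab(x) option) are illustrative assumptions.

```troff
.\" sample.ms (hypothetical file name). Because the file contains
.\" both a table and an equation, it would be formatted with:
.\"     tbl sample.ms | eqn | troff -ms
.\"
.\" Registers are set before the first paragraph macro.
.nr LL 6i
.nr LT 6i
.nr PS 10
.nr VS 12
.ds CF Draft
.LP
Among the most important occupants of the workbench
are the long-nosed pliers.*
.FS
* As first shown by Tiger & Leopard (1975).
.FE
.\" A displayed equation, numbered at the right margin.
.EQ (1.3)
x sup 2 over a sup 2 ~=~ sqrt {p z sup 2 +qz+r}
.EN
.\" A boxed table; tab(@) makes @ the column separator.
.TS
allbox, tab(@);
c s s
c c c
n n n.
AT&T Common Stock
Year@Price@Dividend
1971@41-54@$2.60
2@41-54@2.70
.TE
```

tbl runs first so that eqn can still see any equations placed inside table entries; the order in the pipeline matches the "both tables and equations" line of the Usage panel.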
A System for Typesetting Mathematics

Brian W. Kernighan and Lorinda L. Cherry


Bell Laboratories
Murray Hill, New Jersey 07974

ABSTRACT

This paper describes the design and implementation of a system for typesetting mathematics.
The language has been designed to be easy to learn and to use by people (for example,
secretaries and mathematical typists) who know neither mathematics nor typesetting. Experience
indicates that the language can be learned in an hour or so, for it has few rules and fewer e